CI and evals
Quality documentation tells contributors which checks protect which behavior. foxctl separates production CI expectations from local developer feedback loops and experimental evaluation harnesses.
Local checks
Core commands
These make targets are the canonical contract for local quality checks:
| Command | What it does |
|---|---|
| make test | Unit tests (no network), all packages |
| make test-short | Unit tests with -short flag (fastest feedback loop) |
| make test-race | Unit tests with -race detection |
| make lint | golangci-lint + staticcheck + govet |
| make fmt | gofumpt formatting |
| make check | Runs fmt, lint, vet, and test together |
Coverage enforcement
| Command | Lines | Functions | Branches |
|---|---|---|---|
| make check-coverage | ≥ 40% | ≥ 40% | ≥ 40% |
| make check-coverage-strict | ≥ 85% | ≥ 80% | ≥ 75% |
The default gate enforces the current repository floor (40%). Local development should aim for the stricter check-coverage-strict target.
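The gate amounts to comparing a measured percentage against a floor. The following is a minimal sketch of that comparison, assuming the input is the text printed by `go tool cover -func=coverage.out`, whose final `total:` line reports statement coverage. The helpers `parseTotal` and `meetsFloor` are hypothetical names for illustration, not foxctl's implementation, and the sketch does not show how the function and branch columns are measured.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTotal extracts the percentage from the "total:" line printed by
// `go tool cover -func=coverage.out`, e.g. "total:  (statements)  41.7%".
// Hypothetical helper for illustration only.
func parseTotal(report string) (float64, error) {
	for _, line := range strings.Split(report, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 3 && fields[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(fields[2], "%"), 64)
		}
	}
	return 0, fmt.Errorf("no total line in coverage report")
}

// meetsFloor reports whether measured coverage satisfies the floor.
func meetsFloor(measured, floor float64) bool {
	return measured >= floor
}

func main() {
	// Sample report text; the per-function line is ignored by parseTotal.
	report := "example.com/pkg/a.go:10:\tRun\t\t75.0%\ntotal:\t\t\t(statements)\t41.7%\n"
	total, err := parseTotal(report)
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%.1f%% default=%v strict=%v\n",
		total, meetsFloor(total, 40), meetsFloor(total, 85))
}
```

A run that clears the 40% floor can still fall short of the strict lines target, which is why the two gates are reported separately here.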
Documentation checks
    # Check markdown links in repo docs
    make check-doc-links

    # Check for whitespace errors
    git diff --check

Docs-site checks

    # Build the Starlight site
    bun run --cwd packages/docs-site build

    # Run Astro checks
    bun run --cwd packages/docs-site check

CI pipeline
GitHub Actions (.github/workflows/ci.yml) runs a containerized CI pipeline using a pre-warmed Go image built from Dockerfile.ci. Key jobs:
| Job | What it runs |
|---|---|
| lint | make lint in the CI image |
| test | CGO_ENABLED=0 go test -short ./... with 40% coverage floor |
| race/tests/coverage | Race detection and coverage reporting |
CI guidelines
- Keep CI and local `make` targets in sync.
- Do not add new CGO dependencies or networked tests without explicit human approval.
- The default test suite is deterministic: no network access in `go test ./...`.
Integration tests
Integration tests live in two locations, both gated with //go:build integration:
test/integration/
Full integration tests that may require network access, LLM API keys (FOXCTL_LLM_API_KEY, GEMINI_API_KEY), or external binaries. They cover end-to-end workflows such as agent spawning, symbol indexing, and the SWE Grep pipeline.

    make test-integration

cmd/foxctl/cmd/
Command integration tests that verify CLI behavior with real skill binaries. Run make skills-build before running them.

    make skills-build
    make test-integration-cmd

Must-have test suites
For new features or changes in these subsystems, expect tests of these kinds:
Envelope and protocol
- Valid and invalid envelopes
- Large-output → CAS wrapper (summary + artifact + optional `meta.cas_digest`)
- Error envelopes with actionable `error.code` + `data.hint`
- Integrity failures (digest mismatch)
- Concurrent `Put`/`Get`
- Tagging and pinning behavior
- Lifecycle transitions (`queued → running → ok|error|canceled`)
- Crash recovery and resumption
- `--dedupe` behavior
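The lifecycle transitions listed above lend themselves to a table-driven test over a small state machine. This is a sketch under stated assumptions: the `State` type, the `allowed` table, and `CanTransition` are hypothetical names, and the table encodes only the transitions shown in the list (for example, it does not allow canceling a queued item, which the real system may permit).

```go
package main

import "fmt"

// State is a hypothetical lifecycle state; the values mirror the
// transitions listed above.
type State string

const (
	Queued   State = "queued"
	Running  State = "running"
	OK       State = "ok"
	Error    State = "error"
	Canceled State = "canceled"
)

// allowed maps each state to the states it may move to. Terminal states
// have no entry, so every transition out of them is rejected.
var allowed = map[State][]State{
	Queued:  {Running},
	Running: {OK, Error, Canceled},
}

// CanTransition reports whether from -> to is a legal lifecycle step.
func CanTransition(from, to State) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	// Table-driven cases: legal steps first, then illegal ones.
	cases := []struct {
		from, to State
		want     bool
	}{
		{Queued, Running, true},
		{Running, OK, true},
		{Running, Canceled, true},
		{Queued, OK, false},
		{OK, Running, false},
	}
	for _, c := range cases {
		got := CanTransition(c.from, c.to)
		fmt.Printf("%s -> %s: got=%v want=%v\n", c.from, c.to, got, c.want)
	}
}
```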
OpenAPI (http/openapi)
- Dry-run output (`request_plan`)
- Auth: bearer, apiKey, basic, OAuth2 client-credentials
- Pagination: link, cursor, offset
- Retries and rate limiting (429/5xx → backoff → `ERATELIMIT`/`ERUNTIME`)
- Non-UTF-8 bodies rejected with `EPARSE`
Plugins
- Example auth and pagination plugins invoked via subprocess (WASI and exec)
Golden tests
Golden fixtures in testdata/*.json and test/golden/ must be reproducible:
- Sort keys and arrays in output
- Inject timestamps via clock interface (no `time.Now()` in core logic)
- Use stable IDs or inject UUID generator
- Prefer testing the functional core with table-driven tests (no IO)
Test watcher and feedback hooks
The test infrastructure provides fast local feedback and surfaces results to agents:
Test watcher
    foxctl watch tests

A daemon that watches the workspace and runs configured test commands. Status persists in SQLite (~/.foxctl/storage/test_watch.db) via the testwatch store.
Test feedback hook
The hooks/test_feedback skill reads the test watch store and returns a summary of failing watchers/tests back to Claude after edits.
Recommended workflow
When making code changes that affect tests:
- Add or update watcher configurations: `foxctl test-watch add`
- Let the watcher and feedback hook surface failures automatically
- Prefer this over hard-coding bespoke test commands into docs
Storage builds
Turso is the canonical SQLite-family storage path:

    make build
    make test

The old libsqlite3/sqlite-vector build lane has been removed. Do not add github.com/mattn/go-sqlite3, -tags=libsqlite3, or sqlite-vector extension loading back to the storage path.
Determinism rules
| Rule | Why it matters |
|---|---|
| No time.Now() in core logic | Breaks determinism; use injected clock interface |
| No rand or UUID generation in core | Use injected UUID generator |
| No map[string]any deep in domain logic | Stringly-typed bugs; parse at boundary |
| Sort before emit | Prevents flaky golden tests |
| Core packages must not import os, database/sql, adapters | Architecture violation; invert dependency |
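Two of the rules in the table, injecting the ID generator and parsing at the boundary, can be illustrated together. This is a sketch under stated assumptions: `IDGen`, `sequentialIDs`, `TaskInput`, `Task`, `parseTask`, and `newTask` are hypothetical names and do not describe foxctl's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IDGen is a hypothetical injected ID source. Production code could pass
// a UUID-backed generator; tests pass a deterministic one.
type IDGen func() string

// sequentialIDs returns a deterministic generator for tests.
func sequentialIDs(prefix string) IDGen {
	n := 0
	return func() string {
		n++
		return fmt.Sprintf("%s-%04d", prefix, n)
	}
}

// TaskInput is the typed form of an incoming payload.
type TaskInput struct {
	Name string `json:"name"`
}

// Task is the domain value handled by core logic.
type Task struct {
	ID   string
	Name string
}

// parseTask is the boundary: raw JSON becomes a validated, typed value
// here, so no map[string]any leaks into the core.
func parseTask(raw []byte) (TaskInput, error) {
	var in TaskInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return TaskInput{}, err
	}
	if in.Name == "" {
		return TaskInput{}, fmt.Errorf("name is required")
	}
	return in, nil
}

// newTask is core logic: typed input and an injected ID source only.
func newTask(in TaskInput, nextID IDGen) Task {
	return Task{ID: nextID(), Name: in.Name}
}

func main() {
	in, err := parseTask([]byte(`{"name":"index symbols"}`))
	if err != nil {
		panic(err)
	}
	ids := sequentialIDs("task")
	fmt.Printf("%+v\n", newTask(in, ids))
	fmt.Printf("%+v\n", newTask(in, ids))
}
```

Keeping validation in `parseTask` means a malformed payload fails once at the edge, and the core function stays a pure mapping from typed input to a typed result.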
Review gates
Before merging, every PR must pass:
- `make check` (fmt, lint, vet, test)
- `make check-doc-links` (when markdown changed)
- Coverage floor: `make check-coverage`
- CI pipeline (lint, test, race)
- At least one human approval