mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
2014557e7f
* feat(setup-gbrain): add gstack-gbrain-repo-policy bin helper

  Per-remote trust-tier store for the forthcoming /setup-gbrain skill. Tiers are
  the D3 triad (read-write / read-only / deny), keyed by a normalized remote URL
  so ssh-shorthand and https variants collapse to the same entry. The file
  carries _schema_version: 2 (D2-eng); legacy `allow` values from pre-D3
  experiments auto-migrate to `read-write` on first read, idempotent, with a
  one-shot log line. Pure bash + jq to match the existing gstack-brain-* family.
  Atomic writes via tmpfile + rename. Policy file mode 0600. Corrupt files
  quarantine to .corrupt-<ts> and start fresh.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(setup-gbrain): unit tests for gstack-gbrain-repo-policy

  24 tests covering normalize (ssh/https/shorthand/uppercase collapse to one
  key), set/get round-trip, all three D3 tiers accepted, invalid tiers rejected,
  file mode 0600, _schema_version field written on fresh files, legacy allow
  migration (including idempotence and preservation of non-allow entries),
  corrupt-JSON quarantine + fresh-file recovery, list output sorting, and
  get-without-arg auto-detect against a git repo with no origin. All tests green
  against a per-test tmpdir GSTACK_HOME so nothing leaks into the real ~/.gstack.

* feat(setup-gbrain): add gstack-gbrain-detect state reporter

  Pure-introspection JSON emitter for the /setup-gbrain skill's start-up
  branching. Reports: gbrain presence + version on PATH, ~/.gbrain/config.json
  existence + engine, `gbrain doctor --json` health (wrapped in timeout 5s to
  match the /health D6 pattern), gstack-brain-sync mode via gstack-config, and
  ~/.gstack/.git presence for the memory-sync feature. Never modifies state.
  Always emits valid JSON even when every check is false. Handles malformed
  ~/.gbrain/config.json without crashing — gbrain_engine is null in that case,
  not an error.

* feat(setup-gbrain): add gstack-gbrain-install with D5 detect-first + D19 PATH-shadow guard

  Clones gbrain at a pinned commit (v0.18.2) and registers it via `bun link`.

  Before any clone: D5 detect-first — probes ~/git/gbrain, ~/gbrain, and the
  install target for a valid pre-existing clone (package.json with name "gbrain"
  and bin.gbrain set). If one is found, `bun link` runs there instead of cloning
  a second copy. Prevents the day-one duplicate-install footgun on the skill
  author's own machine.

  After install: D19 PATH-shadow guard — reads the install-dir's package.json
  version, compares to `gbrain --version` on PATH. On mismatch: exits 3, prints
  every gbrain binary on PATH via `type -a`, and gives a remediation menu. Setup
  skills refuse broken environments instead of warning and continuing.

  Prereq checks (bun, git, https://github.com reachability) fail fast with
  install hints. --dry-run and --validate-only flags let the skill probe the
  plan without touching state; tests use them to cover D5 and D19 without
  exercising real bun link.

  Pin is a load-bearing version: setup-gbrain v1 verified against gbrain
  v0.18.2. Updating requires re-running Pre-Impl Gate 1 to verify gbrain's
  CLI + config shapes haven't drifted.

* test(setup-gbrain): unit tests for gstack-gbrain-detect + install

  15 tests covering: detect emits valid JSON when nothing configured, reports
  gstack_brain_git on GSTACK_HOME/.git presence, reads ~/.gbrain/config.json
  engine, tolerates malformed config, detects a mocked gbrain binary on PATH
  with version parsing. For install: D5 detect-first uses ~/git/gbrain fixtures
  under a sandboxed HOME, verifies fall-through to fresh clone when no valid
  clone exists, rejects invalid package.json shapes. D19 PATH-shadow validation
  uses a fake gbrain on a minimal SAFE_PATH to simulate version mismatch,
  same-version-pass, v-prefix tolerance, missing binary on PATH, and missing
  version field in package.json. --validate-only mode in the install bin makes
  the D19 check unit-testable without running real bun link (which touches
  ~/.bun/bin).

* feat(setup-gbrain): add gstack-gbrain-lib.sh with read_secret_to_env (D3-eng)

  Shared secret-read helper for PAT (D11) and pooler URL paste (D16). One
  implementation of the hardest-to-get-right pattern: stty -echo +
  SIGINT/TERM/EXIT trap that restores terminal mode, read into a named env var,
  optional redacted preview. Validates the target var name against
  [A-Z_][A-Z0-9_]* to prevent bash name-injection via `read -r "$varname"`.
  When stdin is not a TTY (CI, piped tests) the stty branches skip cleanly —
  piped input doesn't echo anyway. Exports the var after read so subprocesses
  inherit it; callers own the `unset` at handoff time. Sourced, not executed —
  no +x bit.

* feat(setup-gbrain): add gstack-gbrain-supabase-verify structural URL check

  Zero-network validator for Supabase Session Pooler URLs before handing them
  to `gbrain init`. Canonical shape verified per gbrain init.ts:266:

    postgresql://postgres.<ref>:<password>@aws-0-<region>.pooler.supabase.com:6543/postgres

  Rejects direct-connection URLs (db.*.supabase.co:5432) with a distinct exit
  code 3 and clear IPv6-failure remediation — that's the most common paste
  mistake users make, so it earns its own UX path rather than a generic
  "bad URL" error. Never echoes the URL (contains a password) in error
  messages; tests verify a distinct seed password never appears in stderr on
  any reject path. Accepts URL from argv[1] or stdin ("-" or no arg).

* test(setup-gbrain): unit tests for supabase-verify + lib.sh secret helper

  22 tests. verify: accepts canonical pooler URL (argv + stdin modes), rejects
  direct-connection URL with exit 3, rejects wrong scheme, wrong port, empty
  password, missing userinfo, plain 'postgres' user (catches direct-URL paste
  errors), wrong host, empty URL. Case-insensitive host match. Explicit
  negative: error messages never echo the URL password. lib.sh
  read_secret_to_env: reads piped stdin into the named env var, exports to
  subprocesses, redacted-preview emits masked form on stderr with the seed
  password absent, rejects invalid var names (lowercase, leading digit,
  hyphens), rejects missing/unknown flags, secret value never appears on
  stdout.

* feat(setup-gbrain): add gstack-gbrain-supabase-provision Management API wrapper

  Four subcommands: list-orgs, create, wait, pooler-url. Built against the
  verified Supabase Management API shape (Pre-Impl Gate 1):

  - POST /v1/projects with {name, db_pass, organization_slug, region} — not
    the original plan's /v1/organizations/{ref}/projects
  - No `plan` field; subscription tier is org-level per the OpenAPI description
    ("Subscription Plan is now set on organization level and is ignored in
    this request")
  - GET /v1/projects/{ref}/config/database/pooler for pooler config — not
    /config/database

  Secrets discipline: SUPABASE_ACCESS_TOKEN (PAT) and DB_PASS read from env
  only, never from argv (D8 grep test enforces this). `set +x` at the top as a
  defensive default so debug tracing never leaks secrets. Management API
  hostname hardcoded to SUPABASE_API_BASE env override — no user-controlled
  URL portion (SSRF guard).

  HTTP error paths: 401/403 → exit 3 (auth), 402 → 4 (quota), 409 → 5
  (conflict), 429 + 5xx → exponential-backoff retry up to 3 attempts, then
  exit 8.

  Wait subcommand polls every 5s until ACTIVE_HEALTHY with a configurable
  timeout; terminal states (INIT_FAILED, REMOVED, etc.) exit 7 immediately
  with a clear message. Timeout emits the --resume-provision hint so the skill
  can recover.

  Pooler-url constructs the URL locally from db_user/host/port/name + DB_PASS
  rather than trusting the API response's connection_string field, which is
  templated with [PASSWORD] rather than the real value. Handles both object
  and array response shapes, preferring session pool_mode when Supabase
  returns multiple pooler configs.

* test(setup-gbrain): unit tests for gstack-gbrain-supabase-provision via mock API

  22 tests covering D21 HTTP error suite (401/403/402/409/429/5xx) and happy
  paths for all four subcommands. Every test spins up a Bun.serve mock server
  bound to SUPABASE_API_BASE so nothing hits the real API. Uses Bun.spawn
  (async) rather than spawnSync because spawnSync blocks the Bun event loop,
  which prevents Bun.serve mocks from responding — calls would hit curl's own
  timeout instead of round-tripping. Verifies: POST body contains
  organization_slug (not organization_id) and no `plan` field, bearer-token
  auth header, retry-on-429 with eventual success, exit-8 on persistent 5xx
  after max retries, wait succeeds on ACTIVE_HEALTHY, exits 7 on INIT_FAILED,
  exits 6 with --resume-provision hint on timeout, pooler-url builds URL
  locally from db_user/host/port/name + DB_PASS (not response
  connection_string template), handles array pooler responses.

* feat(setup-gbrain): add SKILL.md.tmpl — user-facing skill prompt

  Stitches together every slice built so far (repo-policy, detect, install,
  lib.sh secret helper, supabase-verify, supabase-provision) into a single
  interactive flow.

  Paths: Supabase existing-URL, Supabase auto-provision (D7), Supabase manual,
  PGLite local, switch (PGLite ↔ Supabase via gbrain migrate wrapped in
  timeout 180s per D9). Secrets discipline per D8/D10/D11: PAT + DB_PASS +
  pooler URL all read via read_secret_to_env from lib.sh and handed to gbrain
  via GBRAIN_DATABASE_URL env, never argv. PAT carries the full D11 scope
  disclosure before collection and an explicit revocation reminder after
  success. D12 SIGINT recovery prints the in-flight ref + resume command. D18
  MCP registration is scoped honestly to Claude Code — skips with a
  manual-register hint when `claude` is not on PATH. D6 per-remote trust-triad
  question (read-write/read-only/deny/skip-for-now) gates repo import; the
  triad values compose with the D2-eng schema-version policy file so future
  migrations stay deterministic. Skill runs concurrent-run-locked via mkdir
  ~/.gstack/.setup-gbrain.lock.d (atomic, same pattern as gstack-brain-sync).
  Telemetry (D4) payload carries enumerated categorical values only — never
  URL, PAT, or any postgresql:// substring. --repo, --switch,
  --resume-provision, --cleanup-orphans shortcut modes documented inline; the
  skill parses its own invocation args.

* feat(health): integrate gbrain as D6 composite dimension

  Adds a GBrain row to the /health dashboard rubric with weight 10%. Three
  sub-signals rolled into one 0-10 score: doctor status (0.5), sync queue
  depth (0.3), last-push age (0.2). Redistributes when gbrain_sync_mode is off
  so the dimension stays fair. Weights rebalance: typecheck 25→22, lint 20→18,
  test 30→28, deadcode 15→13, shell 10→9, gbrain +10 — sums to 100. gbrain
  doctor --json wrapped in timeout 5s so a hung gbrain never stalls the
  /health dashboard. Dimension is omitted (not red) when gbrain is not
  installed — running /health on a non-gbrain machine shouldn't penalize that
  choice. History-JSONL adds a `gbrain` field. Pre-D6 entries read as null for
  trend comparison; new tracking starts from first post-D6 run.

* feat(test): add secret-sink-harness for negative-space leak testing (D21 #5)

  Runs a subprocess with a seeded secret, captures every channel the
  subprocess could leak through, and asserts the seed never appears. Built per
  the D1-eng tightened contract: per-run tmp $HOME, four seed match rules
  (exact + URL-decoded + first-12-char prefix + base64), fd-level
  stdout/stderr capture via Bun.spawn, post-mortem walk of every file written
  under $HOME, separate buckets for telemetry JSONL. Reusable: any future
  skill that handles secrets can import runWithSecretSink and run
  positive/negative controls against its own bins. The harness itself is ~180
  lines of TS with no external deps beyond Bun + node:fs. Out of scope for v1
  (documented as follow-ups): subprocess env dump (portable /proc reading),
  the user's real shell history (bins don't modify it).

* test: secret-sink harness positive controls + real-bin negative controls

  11 tests. Positive controls deliberately leak a seed in every covered
  channel (stdout, stderr, a file under $HOME, the telemetry JSONL path,
  base64-encoded, first-12-char prefix) and assert the harness catches each
  one. Without these, a harness that silently under-reports would look
  identical to a harness that works. Negative controls run real setup-gbrain
  bins with distinctive seeds:

  - supabase-verify rejects a mysql:// URL and a direct-connection URL,
    password never appears in any captured channel
  - lib.sh read_secret_to_env reads piped stdin, emits only the length, seed
    value stays invisible
  - supabase-provision on an auth-failure path fails fast without leaking the
    PAT to any channel

  Covers D21 #5 leak harness + uses it to validate D3-eng, D10, D11 discipline
  end-to-end on the already-shipped bins.

* feat(setup-gbrain): add list-orphans + delete-project subcommands (D20)

  Powers /setup-gbrain --cleanup-orphans. list-orphans filters the
  authenticated user's Supabase projects by name prefix (default "gbrain") and
  excludes the project the local ~/.gbrain/config.json currently points at, so
  only unclaimed gbrain-shaped projects come back. Active-ref detection parses
  the pooler URL's user portion (postgres.<ref>:<pw>@...). delete-project is a
  thin DELETE /v1/projects/{ref} wrapper with no confirmation of its own — the
  skill's UI layer owns the per-project confirm AskUserQuestion loop. Keeps
  responsibilities clean: the bin manages HTTP; the skill manages user intent.
  Both subcommands reuse the existing api_call retry+backoff and the same PAT
  discipline (env only, never argv).

* test(setup-gbrain): list-orphans active-ref filtering + delete-project 404

  6 new tests bringing the supabase-provision suite to 28. list-orphans:

  - Filters to gbrain-prefixed projects, excludes the active-ref derived from
    ~/.gbrain/config.json's pooler URL
  - Treats all gbrain-prefixed projects as orphans when no config exists
    (first run on a new machine)
  - Respects custom --name-prefix for users who named their brain something
    else

  delete-project:

  - Happy path sends DELETE /v1/projects/<ref> and returns {deleted_ref}
  - 404 surfaces cleanly (exit 2, "404" in stderr)
  - Missing <ref> positional rejected with exit 2

  Uses per-test tmpdir HOME with a stubbed ~/.gbrain/config.json so active-ref
  extraction runs against deterministic fixtures.

* chore: regenerate setup-gbrain SKILL.md after main merge

* chore: bump version and changelog (v1.12.0.0)

  Ships /setup-gbrain and its supporting infrastructure end-to-end: per-remote
  trust policy, installer with PATH-shadow guard, shared secret-read helper,
  structural URL verifier, Supabase Management API wrapper, /health GBrain
  dimension, secret-sink test harness. 100 new tests across 5 suites, all
  green. Three pre-existing test failures noted as P0 in TODOS.md.

* docs: add USING_GBRAIN_WITH_GSTACK.md + update README for /setup-gbrain

  README changes:

  - Rewrote the "Cross-machine memory with GBrain sync" section into "GBrain —
    persistent knowledge for your coding agent." Covers the three
    /setup-gbrain paths (Supabase existing URL, auto-provision, PGLite local),
    MCP registration, per-remote trust triad, and the (still-separate) memory
    sync feature.
  - Added /setup-gbrain row to the skills table pointing at the full guide.
  - Added /setup-gbrain to both skill-list install snippets.
  - Added USING_GBRAIN_WITH_GSTACK.md to the Docs table.

  New doc (USING_GBRAIN_WITH_GSTACK.md):

  - All three setup paths with trust-surface caveats
  - MCP registration details (and honest Claude-Code-v1 scoping)
  - Per-remote trust triad semantics + how to change a policy
  - Switching engines (PGLite ↔ Supabase) via --switch
  - GStack memory sync + its relationship to the gbrain knowledge base
  - /setup-gbrain --cleanup-orphans for orphan Supabase projects
  - Full command + flag reference, every bin helper, every env var
  - Security model: what's enforced in code, what's enforced by the leak
    harness, and the honest limits of v1
  - Troubleshooting: PATH shadowing, direct-connection URL reject,
    auto-provision timeout, stale lock, policy file hand-edits, migrate hang
  - Why-this-design section explaining the non-obvious choices

* fix(brain-sync): secret scanner now catches Bearer-prefixed auth tokens in JSON

  The bearer-token-json regex value charset was [A-Za-z0-9_./+=-]{16,}, which
  does NOT permit spaces. Real HTTP auth headers embed the scheme name with a
  literal space — "Bearer <token>" — so the value portion actually starts with
  "Bearer " and the existing regex couldn't match. Result: any JSON blob
  containing "authorization":"Bearer ..." would slip past the scanner and sync
  to the user's private brain repo with the bearer token inline.

  Added optional (Bearer |Basic |Token )? prefix in front of the value
  charset. Now matches the common auth-scheme forms without broadening the
  matcher to tolerate arbitrary whitespace (which would false-positive on lots
  of benign JSON). Verified against 5 positive cases (bearer-in-json, clean
  bearer, apikey no-prefix, token with Bearer, password no-prefix) + 3
  negative cases (too-short tokens, non-secret field names like username,
  random JSON).

  This closes the P0 security regression first noticed during v1.12.0.0
  /ship. brain-sync.test.ts now passes all 7 secret-scan fixtures.

* test: mock-gh integration tests for gstack-brain-init auto-create path

  8 tests covering the gh-repo-create happy path that had zero coverage
  before. Existing brain-sync.test.ts always passes --remote <bare-url> to
  bypass gh entirely, so the interactive default ("press Enter, we'll run gh
  repo create for you") was shipping on trust. Test strategy: write a bash
  stub for gh that records every call into a file, then run gstack-brain-init
  with that stub on PATH. Assertions verify: gh auth status is checked, gh
  repo create fires with the computed gstack-brain-<user> default name +
  --private + --source flags, fall-through to gh repo view when create reports
  already-exists, user-provided URL bypasses gh entirely, gh-not-on-path and
  gh-not-authed branches both prompt for URL, --remote flag short-circuits all
  gh calls, conflicting-remote re-runs exit 1 with a clear message. No real
  GitHub, no live auth. Gate tier — runs on every commit.

* test(e2e): privacy-gate AskUserQuestion fires from preamble (periodic tier)

  Two periodic-tier E2E tests exercising the preamble's privacy gate
  end-to-end via the Agent SDK + canUseTool. Previously uncovered:

  - Positive: stages a fake gbrain on PATH + gbrain_sync_mode_prompted=false
    in config, runs a real skill, intercepts tool-use. Asserts the preamble
    fires a 3-option AskUserQuestion matching the canonical prose ("publish
    session memory" / "artifact" / "decline") and does NOT fire a second time
    in the same run (idempotency within session).
  - Negative: same staging but prompted=true. Asserts the gate stays silent
    even with gbrain detected on the host.

  Registered in test/helpers/touchfiles.ts as `brain-privacy-gate` (periodic)
  with dependency tracking on generate-brain-sync-block.ts, the three
  gstack-brain-* bins, gstack-config, and the Agent SDK runner. Diff-based
  selection re-runs the E2E when any of those change. Cost: ~$0.30-$0.50 per
  run. Only fires under EVALS=1 EVALS_TIER=periodic; gate tier stays free.

* docs: update TODOS for bearer-json fix + new brain-sync test coverage

  Moves the bearer-json secret-scan regression from the P0 "pre-existing
  failures" block into the Completed section with full context on the fix, the
  mock-gh tests, the E2E privacy-gate tests, and the touchfile registration.
  Remaining P0s are the GSTACK_HOME config-isolation bug and the stale Opus
  4.7 overlay pacing assertion, both unrelated.

* fix(test): E2E privacy gate — ambient env + skill-file prompt

  Two fixes to get the E2E actually running end-to-end (first attempt failed
  at the SDK auth step, second at the assertion step):

  1. Don't pass an explicit `env:` object to runAgentSdkTest. The SDK's auth
     pipeline misses ANTHROPIC_API_KEY when env is supplied as an object
     (verified against the plan-mode-no-op test, which passes no env and auths
     cleanly). Mutate process.env before the call instead, and restore the
     originals in finally so other tests don't inherit the ambient mutation.
  2. The "Run /learn with no arguments" user prompt was too narrow — the model
     reduced it to a direct action and skipped the preamble privacy-gate
     directives entirely, so zero AskUserQuestions fired. Mirror the
     plan-mode-no-op pattern: point the model at the skill file on disk and
     ask it to follow every preamble directive. Bumped maxTurns from 6 to 10
     to give the preamble room to execute.

  Verified both tests pass under `EVALS=1 EVALS_TIER=periodic bun test
  test/skill-e2e-brain-privacy-gate.test.ts` against a real
  ANTHROPIC_API_KEY. Cost per run: ~$0.30-$0.50 per test.

* docs(CLAUDE.md): source ANTHROPIC/OPENAI keys from ~/.zshrc for paid evals

  Conductor workspaces don't inherit the interactive shell env, so both API
  keys are absent from the default process env even though they're set in
  ~/.zshrc. Documents the source-from-zshrc pattern (grep + eval, never echo
  the value) plus the Agent SDK gotcha: do NOT pass env as an object to
  runAgentSdkTest — mutate process.env ambiently and restore in finally.
  Discovered this during the brain-privacy-gate E2E. First run failed at SDK
  auth with 401; second failed because explicit env handoff bypassed the SDK's
  own auth routing. Fix pattern now codified so the next paid-eval session in
  a Conductor workspace doesn't hit the same two dead ends.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
321 lines
11 KiB
Cheetah
---
name: health
preamble-tier: 2
version: 1.0.0
description: |
  Code quality dashboard. Wraps existing project tools (type checker, linter,
  test runner, dead code detector, shell linter), computes a weighted composite
  0-10 score, and tracks trends over time. Use when: "health check",
  "code quality", "how healthy is the codebase", "run all checks",
  "quality score". (gstack)
triggers:
  - code health check
  - quality dashboard
  - how healthy is codebase
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - AskUserQuestion
---

{{PREAMBLE}}

# /health -- Code Quality Dashboard

You are a **Staff Engineer who owns the CI dashboard**. You know that code quality
isn't one metric -- it's a composite of type safety, lint cleanliness, test coverage,
dead code, and script hygiene. Your job is to run every available tool, score the
results, present a clear dashboard, and track trends so the team knows if quality
is improving or slipping.

**HARD GATE:** Do NOT fix any issues. Produce the dashboard and recommendations only.
The user decides what to act on.

## User-invocable

When the user types `/health`, run this skill.

---

## Step 1: Detect Health Stack

Read CLAUDE.md and look for a `## Health Stack` section. If found, parse the tools
listed there and skip auto-detection.

If no `## Health Stack` section exists, auto-detect available tools:

```bash
# Type checker
[ -f tsconfig.json ] && echo "TYPECHECK: tsc --noEmit"

# Linter
[ -f biome.json ] || [ -f biome.jsonc ] && echo "LINT: biome check ."
setopt +o nomatch 2>/dev/null || true
ls eslint.config.* .eslintrc.* .eslintrc 2>/dev/null | head -1 | xargs -I{} echo "LINT: eslint ."
# Python: ruff when configured in pyproject.toml; pylint when .pylintrc exists
[ -f pyproject.toml ] && grep -Eq "pylint|ruff" pyproject.toml 2>/dev/null && echo "LINT: ruff check ."
[ -f .pylintrc ] && echo "LINT: pylint --recursive=y ."

# Test runner
[ -f package.json ] && grep -q '"test"' package.json 2>/dev/null && echo "TEST: $(node -e "console.log(JSON.parse(require('fs').readFileSync('package.json','utf8')).scripts.test)" 2>/dev/null)"
[ -f pyproject.toml ] && grep -q "pytest" pyproject.toml 2>/dev/null && echo "TEST: pytest"
[ -f Cargo.toml ] && echo "TEST: cargo test"
[ -f go.mod ] && echo "TEST: go test ./..."

# Dead code
command -v knip >/dev/null 2>&1 && echo "DEADCODE: knip"
[ -f package.json ] && grep -q '"knip"' package.json 2>/dev/null && echo "DEADCODE: npx knip"

# Shell linting
command -v shellcheck >/dev/null 2>&1 && ls *.sh scripts/*.sh bin/*.sh 2>/dev/null | head -1 | xargs -I{} echo "SHELL: shellcheck"

# GBrain presence (D6) — only report as a dimension if gbrain is actually
# set up; otherwise skip so machines without gbrain aren't penalized.
if command -v gbrain >/dev/null 2>&1 && [ -f "$HOME/.gbrain/config.json" ]; then
  echo "GBRAIN: gbrain doctor --json (wrapped in timeout 5s)"
fi
```

Use Glob to search for shell scripts:
- `**/*.sh` (shell scripts in the repo)

After auto-detection, present the detected tools via AskUserQuestion:

"I detected these health check tools for this project:

- Type check: `tsc --noEmit`
- Lint: `biome check .`
- Tests: `bun test`
- Dead code: `knip`
- Shell lint: `shellcheck *.sh`

A) Looks right -- persist to CLAUDE.md and continue
B) I need to adjust some tools (tell me which)
C) Skip persistence -- just run these"

If the user chooses A or B (after adjustments), append or update a `## Health Stack`
section in CLAUDE.md:

```markdown
## Health Stack

- typecheck: tsc --noEmit
- lint: biome check .
- test: bun test
- deadcode: knip
- shell: shellcheck *.sh scripts/*.sh
```

---
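
The `## Health Stack` parse from Step 1 can be sketched as below (a minimal sketch, assuming the `- key: command` list form shown above; the `health_stack` function name is introduced here for illustration, it is not an existing bin):

```shell
# Extract "- key: command" entries from the "## Health Stack" section of
# CLAUDE.md. Emits "KEY<TAB>command" lines; stops at the next "## " heading.
health_stack() {
  awk '
    /^## Health Stack/ { in_section = 1; next }
    /^## /             { in_section = 0 }
    in_section && /^- / {
      line = substr($0, 3)              # drop the leading "- "
      sep = index(line, ": ")
      if (sep) printf "%s\t%s\n", toupper(substr(line, 1, sep - 1)), substr(line, sep + 2)
    }
  ' "${1:-CLAUDE.md}"
}
```

If the emitted list is empty, fall through to the auto-detection block above.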

## Step 2: Run Tools

Run each detected tool. For each tool:

1. Record the start time
2. Run the command, capturing both stdout and stderr
3. Record the exit code
4. Record the end time
5. Capture the last 50 lines of output for the report

```bash
# Example for each tool — run each independently.
# Capture output first, THEN read $? — piping straight into tail would
# report tail's exit code, not the tool's.
START=$(date +%s)
OUTPUT=$(tsc --noEmit 2>&1)
EXIT_CODE=$?
END=$(date +%s)
echo "$OUTPUT" | tail -50
echo "TOOL:typecheck EXIT:$EXIT_CODE DURATION:$((END-START))s"
```

Run tools sequentially (some may share resources or lock files). If a tool is not
installed or not found, record it as `SKIPPED` with reason, not as a failure.

---
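
The capture steps above can be wrapped once and reused for every tool (a sketch; `run_tool` is a name introduced here for illustration). Capturing the output before piping matters because `$?` after a pipeline reflects the last stage, not the tool itself:

```shell
# Run one tool, capturing output, exit code, and duration.
# Usage: run_tool <name> <command> [args...]
run_tool() {
  name="$1"; shift
  start=$(date +%s)
  output=$("$@" 2>&1)   # capture BEFORE piping so $? is the tool's exit code
  code=$?
  end=$(date +%s)
  echo "$output" | tail -50
  echo "TOOL:$name EXIT:$code DURATION:$((end - start))s"
}
```

For example, `run_tool typecheck tsc --noEmit` emits the tool's tail output followed by the `TOOL:typecheck EXIT:... DURATION:...s` summary line.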

## Step 3: Score Each Category

Score each category on a 0-10 scale using this rubric:

| Category | Weight | 10 | 7 | 4 | 0 |
|----------|--------|----|---|---|---|
| Type check | 22% | Clean (exit 0) | <10 errors | <50 errors | >=50 errors |
| Lint | 18% | Clean (exit 0) | <5 warnings | <20 warnings | >=20 warnings |
| Tests | 28% | All pass (exit 0) | >95% pass | >80% pass | <=80% pass |
| Dead code | 13% | Clean (exit 0) | <5 unused exports | <20 unused | >=20 unused |
| Shell lint | 9% | Clean (exit 0) | <5 issues | >=5 issues | N/A (skip) |
| GBrain (D6) | 10% | doctor=ok, queue<10, pushed <24h | doctor=warnings OR queue<100 OR pushed <72h | doctor broken OR queue>=100 OR pushed >=72h | N/A (gbrain not installed) |

**Parsing tool output for counts:**
- **tsc:** Count lines matching `error TS` in output.
- **biome/eslint/ruff:** Count lines matching error/warning patterns. Parse the summary line if available.
- **Tests:** Parse pass/fail counts from the test runner output. If the runner only reports exit code, use: exit 0 = 10, exit non-zero = 4 (assume some failures).
- **knip:** Count lines reporting unused exports, files, or dependencies.
- **shellcheck:** Count distinct findings (lines starting with "In ... line").

**Composite score:**
```
composite = (typecheck_score * 0.22) + (lint_score * 0.18) + (test_score * 0.28)
          + (deadcode_score * 0.13) + (shell_score * 0.09) + (gbrain_score * 0.10)
```

If a category is skipped (tool not available — includes GBrain when gbrain
is not installed), redistribute its weight proportionally among the
remaining categories.
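
A minimal sketch of the composite plus the redistribution rule (the `composite_score` name is introduced here for illustration; pass `-` for a skipped category and its weight is spread proportionally across the rest):

```shell
# Composite score with proportional redistribution of skipped weights.
# Args: six 0-10 scores in the fixed order
#   typecheck lint test deadcode shell gbrain
# Pass "-" for a skipped category.
composite_score() {
  echo "$@" | awk '{
    split("0.22 0.18 0.28 0.13 0.09 0.10", w, " ")
    tw = 0; s = 0
    for (i = 1; i <= 6; i++) if ($i != "-") tw += w[i]      # total live weight
    for (i = 1; i <= 6; i++) if ($i != "-") s += $i * w[i] / tw  # renormalize
    printf "%.1f\n", s
  }'
}

composite_score 10 8 10 7 10 -   # gbrain skipped: its 10% spreads over the rest
```

Dividing each live weight by the live-weight total is what "redistribute proportionally" means here: the weights still sum to 1, so the composite stays on the 0-10 scale.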

**GBrain sub-score computation (D6):**

```
doctor_component: 10 if `gbrain doctor --json | jq -r .status` == "ok";
                  7 if "warnings"; 0 otherwise (or command times out after 5s).
queue_component:  10 if ~/.gstack/.brain-queue.jsonl has <10 lines;
                  7 if 10-100; 0 if >=100 (suggests secret-scan rejections
                  piling up). N/A if gbrain_sync_mode == off.
push_component:   10 if (now - mtime of ~/.gstack/.brain-last-push) < 24h;
                  7 if <72h; 0 if >=72h. N/A if gbrain_sync_mode == off.

gbrain_score = 0.5 * doctor_component + 0.3 * queue_component + 0.2 * push_component
(redistribute 0.3 + 0.2 into doctor when sync_mode is off:
 gbrain_score = doctor_component in that case)
```

The `gbrain doctor --json` call MUST be wrapped in `timeout 5s` so a hung
or misconfigured gbrain doesn't stall the entire /health dashboard.
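
A testable sketch of that computation, with measurement separated from scoring (the `gbrain_score` helper name is introduced here for illustration; the caller gathers the raw signals as described above):

```shell
# Scoring half of the GBrain dimension (D6). The caller supplies:
#   doctor_status  from: timeout 5s gbrain doctor --json | jq -r .status
#   queue_lines    from: wc -l < ~/.gstack/.brain-queue.jsonl
#   push_age_h     hours since mtime of ~/.gstack/.brain-last-push
# Pass "-" for queue_lines and push_age_h when gbrain_sync_mode is off.
gbrain_score() {
  doctor_status="$1"; queue_lines="$2"; push_age_h="$3"
  case "$doctor_status" in
    ok)       d=10 ;;
    warnings) d=7 ;;
    *)        d=0 ;;   # doctor broken, or the 5s timeout fired
  esac
  if [ "$queue_lines" = "-" ]; then
    # sync off: the 0.3 + 0.2 weights fold into doctor
    awk -v d="$d" 'BEGIN { printf "%.1f\n", d }'
    return
  fi
  if [ "$queue_lines" -lt 10 ]; then q=10; elif [ "$queue_lines" -lt 100 ]; then q=7; else q=0; fi
  if [ "$push_age_h" -lt 24 ]; then p=10; elif [ "$push_age_h" -lt 72 ]; then p=7; else p=0; fi
  awk -v d="$d" -v q="$q" -v p="$p" 'BEGIN { printf "%.1f\n", 0.5*d + 0.3*q + 0.2*p }'
}
```

Keeping the thresholds in a pure function means the rubric can be unit-tested without a live gbrain on PATH.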

---

## Step 4: Present Dashboard

Present results as a clear table:

```
CODE HEALTH DASHBOARD
=====================

Project: <project name>
Branch:  <current branch>
Date:    <today>

Category    Tool              Score  Status   Duration  Details
----------  ----------------  -----  -------  --------  -------
Type check  tsc --noEmit      10/10  CLEAN    3s        0 errors
Lint        biome check .      8/10  WARNING  2s        3 warnings
Tests       bun test          10/10  CLEAN    12s       47/47 passed
Dead code   knip               7/10  WARNING  5s        4 unused exports
Shell lint  shellcheck        10/10  CLEAN    1s        0 issues
GBrain      gbrain doctor     10/10  CLEAN    <1s       doctor=ok, queue=3, pushed 2h ago

COMPOSITE SCORE: 9.3 / 10

Duration: 23s total
```

Use these status labels:
- 10: `CLEAN`
- 7-9: `WARNING`
- 4-6: `NEEDS WORK`
- 0-3: `CRITICAL`

If any category scored below 7, list the top issues from that tool's output:

```
DETAILS: Lint (3 warnings)
biome check . output:
  src/utils.ts:42 — lint/complexity/noForEach: Prefer for...of
  src/api.ts:18 — lint/style/useConst: Use const instead of let
  src/api.ts:55 — lint/suspicious/noExplicitAny: Unexpected any
```

---

## Step 5: Persist to Health History

```bash
{{SLUG_SETUP}}
```

Append one JSONL line to `~/.gstack/projects/$SLUG/health-history.jsonl`:

```json
{"ts":"2026-03-31T14:30:00Z","branch":"main","score":9.3,"typecheck":10,"lint":8,"test":10,"deadcode":7,"shell":10,"gbrain":10,"duration_s":23}
```

Fields:
- `ts` -- ISO 8601 timestamp
- `branch` -- current git branch
- `score` -- composite score (one decimal)
- `typecheck`, `lint`, `test`, `deadcode`, `shell`, `gbrain` -- individual category scores (integer 0-10)
- `duration_s` -- total time for all tools in seconds

If a category was skipped, set its value to `null`. Pre-D6 history entries
won't have a `gbrain` field — treat them as `null` for trend comparison
and start new tracking from the first post-D6 run.

---
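
Appending the history line needs no extra dependencies; a sketch (`append_health_entry` is an illustrative name, not an existing bin — pass the literal string `null` for any skipped category so it lands as JSON null):

```shell
# Append one health-history entry as a JSONL line.
# Args: file score typecheck lint test deadcode shell gbrain duration_s
# Pass "null" for any skipped category.
append_health_entry() {
  file="$1"; shift
  printf '{"ts":"%s","branch":"%s","score":%s,"typecheck":%s,"lint":%s,"test":%s,"deadcode":%s,"shell":%s,"gbrain":%s,"duration_s":%s}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)" \
    "$@" >> "$file"
}
```

Because scores and durations are plain numbers (or `null`), `printf` is enough here; only `ts` and `branch` are quoted strings.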

## Step 6: Trend Analysis + Recommendations

Read the last 10 entries from `~/.gstack/projects/$SLUG/health-history.jsonl` (if the
file exists and has prior entries).

```bash
{{SLUG_SETUP}}
tail -10 ~/.gstack/projects/$SLUG/health-history.jsonl 2>/dev/null || echo "NO_HISTORY"
```

**If prior entries exist, show the trend:**

```
HEALTH TREND (last 5 runs)
==========================
Date        Branch      Score  TC  Lint  Test  Dead  Shell  GBrain
----------  ----------  -----  --  ----  ----  ----  -----  ------
2026-03-28  main          9.4  10     9    10     8     10      10
2026-03-29  feat/auth     8.8  10     7    10     7     10      10
2026-03-30  feat/auth     8.2  10     6     9     7     10       7
2026-03-31  feat/auth     9.3  10     8    10     7     10      10

Trend: IMPROVING (+1.1 since last run)
```

**If score dropped vs the previous run:**
1. Identify WHICH categories declined
2. Show the delta for each declining category
3. Correlate with tool output -- what specific errors/warnings appeared?

```
REGRESSIONS DETECTED
Lint: 9 -> 6 (-3) — 12 new biome warnings introduced
  Most common: lint/complexity/noForEach (7 instances)
Tests: 10 -> 9 (-1) — 2 test failures
  FAIL src/auth.test.ts > should validate token expiry
  FAIL src/auth.test.ts > should reject malformed JWT
```

**Health improvement suggestions (always show these):**

Prioritize suggestions by impact (weight * score deficit):

```
RECOMMENDATIONS (by impact)
============================
1. [HIGH] Address 12 lint warnings (Lint: 6/10, weight 18%, impact 0.72)
   Run: biome check . --write to auto-fix
2. [MED] Remove 4 unused exports (Dead code: 7/10, weight 13%, impact 0.39)
   Run: knip --fix to auto-remove
3. [LOW] Fix 2 failing tests (Tests: 9/10, weight 28%, impact 0.28)
   Run: bun test --verbose to see failures
```

Rank by `weight * (10 - score)` descending. Only show categories below 10.
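
The ranking rule reduces to a tiny filter (a sketch; `rank_recommendations` is an illustrative name — input lines are `category weight score`, weights as decimals):

```shell
# Rank categories by impact = weight * (10 - score), descending.
# Reads "category weight score" lines on stdin; categories at 10 are dropped.
rank_recommendations() {
  awk '$3 < 10 { printf "%.2f %s\n", $2 * (10 - $3), $1 }' | sort -rn
}

printf 'tests 0.28 9\nlint 0.18 6\ndeadcode 0.13 7\nshell 0.09 10\n' | rank_recommendations
```

Note how a low score in a mid-weight category (lint at 6/10) outranks a single failing point in the heaviest category — that is exactly what the deficit-weighted rule is for.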

---

## Important Rules

1. **Wrap, don't replace.** Run the project's own tools. Never substitute your own analysis for what the tool reports.
2. **Read-only.** Never fix issues. Present the dashboard and let the user decide.
3. **Respect CLAUDE.md.** If `## Health Stack` is configured, use those exact commands. Do not second-guess.
4. **Skipped is not failed.** If a tool isn't available, skip it gracefully and redistribute weight. Do not penalize the score.
5. **Show raw output for failures.** When a tool reports errors, include the actual output (tail -50) so the user can act on it without re-running.
6. **Trends require history.** On first run, say "First health check -- no trend data yet. Run /health again after making changes to track progress."
7. **Be honest about scores.** A codebase with 100 type errors and all tests passing is not healthy. The composite score should reflect reality.