mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 05:56:41 +02:00
garrytan/ship-full-commit-coverage
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0bff8d66a2 | fix: add codex skill metadata for gstack skills (#339) | ||
|
|
b7a3bf108d |
fix: Codex compatibility — 1024-char cap, duplicate skills, repo-local installs, kiro support (v0.11.2.0) (#346)
* fix: cap gstack skill descriptions for codex (#251)
Compresses SKILL.md.tmpl root description to <1024 chars (Codex token limit).
Adds description-length validation test. Includes /autoplan in compressed
skill list (added since PR was branched).
Co-authored-by: cweill <cweill@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip sidecar dir in Codex skill linking (#269)
Adds guard to skip .agents/skills/gstack in link_codex_skill_dirs() — it's a
runtime asset sidecar, not a standalone skill. Prevents duplicate skill
discovery and symlink overwriting.
Fixes #261
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: generate .agents directory at setup time instead of shipping duplicates (#308)
Removes 14K+ lines of committed generated Codex skill files from git.
.agents/ is now gitignored and generated at setup time via
`bun run gen:skill-docs --host codex`. Updates CI workflow to validate
generation instead of checking committed file freshness.
Co-authored-by: cskwork <cskwork@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: avoid duplicate Codex skill discovery (#236)
Adds migrate_direct_codex_install() to move old direct installs from
~/.codex/skills/gstack to ~/.gstack/repos/gstack. Adds
create_codex_runtime_root() to expose only runtime assets (bin/, browse/,
review files) via symlinks instead of symlinking the entire repo.
Fixes #235
Co-authored-by: shichangs <shichangs@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: support repo-local Codex installs (#317)
Changes gen-skill-docs.ts to use dynamic $GSTACK_ROOT/$GSTACK_BIN/$GSTACK_BROWSE
variables in generated Codex preambles instead of hardcoded ~/.codex/ paths.
Renames GSTACK_DIR → SOURCE_GSTACK_DIR/INSTALL_GSTACK_DIR throughout setup
for clarity. Supports both global (~/.codex/skills/) and repo-local
(.agents/skills/) Codex installs.
Co-authored-by: pengwk <pengwk@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add --host kiro support to setup script (#309)
Adds Kiro CLI as a supported agent platform. Setup detects kiro-cli,
copies+sed-rewrites SKILL.md paths from Codex/Claude to Kiro format, and
symlinks runtime assets (bin/, browse/).
Co-authored-by: AnshulDesai <AnshulDesai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add sidecar skip, GSTACK_ROOT, and kiro coverage (T1-T3)
Adds 3 tests identified during CEO/Eng review:
- T1: link_codex_skill_dirs() contains sidecar skip guard
- T2: generated Codex preambles use dynamic $GSTACK_ROOT paths
- T3: setup supports --host kiro with INSTALL_KIRO and sed rewrites
Also fixes existing test to expect kiro in --host case statement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: review fixes — ETHOS.md, runtime root, repo-local guard, kiro assets, upgrade paths
Paranoid 4-pass review found 7 issues, all fixed:
- Add ETHOS.md to create_codex_runtime_root
- Clean old real dirs (not just symlinks) on upgrade
- Skip runtime root for repo-local installs (prevent self-referential symlinks)
- Add review/, ETHOS.md, gstack-upgrade/ to Kiro install
- Update gstack-upgrade to detect ~/.gstack/repos/ and .agents/skills/
- Guard --host without value from silent exit
- Fix Kiro sed patterns + timeout instruction in gen-skill-docs.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.11.2.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove last tracked .agents/ file from git index
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: cweill <cweill@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-authored-by: cskwork <cskwork@users.noreply.github.com>
Co-authored-by: shichangs <shichangs@users.noreply.github.com>
Co-authored-by: pengwk <pengwk@users.noreply.github.com>
Co-authored-by: AnshulDesai <AnshulDesai@users.noreply.github.com>
|
||
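The description cap in #251 can be illustrated with a small check. This is only a sketch under assumptions: the function name, the input shape (skill name mapped to description text), and the use of `String.length` (UTF-16 code units) as the measure are all hypothetical, and the repo's actual validation test may look quite different.

```typescript
// Limit stated in the commit message: descriptions must be <1024 chars.
const MAX_DESCRIPTION_CHARS = 1024;

// Hypothetical helper: returns the names of skills whose description
// is at or over the cap, sorted for stable test output.
function findOverlongDescriptions(
  descriptions: Record<string, string>
): string[] {
  return Object.entries(descriptions)
    .filter(([, text]) => text.length >= MAX_DESCRIPTION_CHARS)
    .map(([name]) => name)
    .sort();
}
```

A test in this style would assert that the returned list is empty for every generated skill.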
|
|
264c1ca234 |
feat: plan files always show review status (v0.11.1.1) (#345)
* feat: plan files always show review status via preamble footer
Add Plan Status Footer to generateCompletionStatus() in the preamble.
When in plan mode before ExitPlanMode, Claude writes a GSTACK REVIEW REPORT
section to the plan file — either populated from review logs or a
"NO REVIEWS YET" placeholder. Skips if a review skill already wrote a richer
report.
* chore: bump version and changelog (v0.11.1.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
7ff0f84b1e |
feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259)
* refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver
DRY extraction of the test coverage audit methodology into a shared
generator function with three explicit placeholders:
- TEST_COVERAGE_AUDIT_PLAN (plan-eng-review)
- TEST_COVERAGE_AUDIT_SHIP (ship)
- TEST_COVERAGE_AUDIT_REVIEW (review)
Shared across all modes: codepath tracing, ASCII diagram format,
quality scoring rubric, E2E test decision matrix, regression rule,
and test framework detection via CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: plan-eng-review uses shared test coverage audit
Replace the thin 6-line Section 3 test review with the full shared
methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now:
- Traces every codepath with full ASCII diagrams
- Adds missing tests to the plan (not just "check for tests")
- Writes test plan artifact for /qa consumption
- Includes E2E/eval recommendations and regression detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: ship uses shared test coverage audit
Replace 135 lines of inline Step 3.4 methodology with
{{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus:
- E2E test decision matrix (marks paths needing E2E vs unit)
- Eval recommendations for LLM prompt changes
- Regression detection iron rule
- Test framework detection via CLAUDE.md first
- Test plan artifact for /qa consumption
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /review Step 4.75 test coverage diagram
Add codepath tracing to the pre-landing review via
{{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode:
- Produces ASCII coverage diagram (same methodology as plan/ship)
- Generates tests for gaps via Fix-First (ASK user)
- Subsumes Pass 2 "Test Gaps" checklist category
- Gaps are INFORMATIONAL findings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: mode differentiation + regression guard for coverage audit
10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders:
- All modes share: codepath tracing, E2E matrix, regression rule
- Plan mode: adds to plan + artifact, no ship-specific content
- Ship mode: auto-generates + before/after count + coverage summary
- Review mode: Fix-First ASK + INFORMATIONAL, no artifact
- Regression guard: ship SKILL.md preserves all key phrases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: extract shared coverage audit fixture + review E2E
- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY)
- Refactor ship-coverage-audit E2E to use shared fixture
- Add review-coverage-audit E2E for Step 4.75
- Update touchfiles: both E2Es depend on shared fixture
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: strengthen E2E assertions for coverage audit tests
The coverage audit E2E tests (ship + review) were only asserting
exitReason === 'success' and readCalls > 0 — they passed even
if the agent produced no coverage diagram. Add assertion that
the output contains either GAP or TESTED markers.
Found during /review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
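The strengthened assertion described above amounts to checking the agent output for a coverage marker. A minimal sketch, assuming the markers appear as the standalone words GAP or TESTED; the function name and the word-boundary matching are illustrative choices, not the repo's actual test code.

```typescript
// Hypothetical helper: true when the output contains a coverage-diagram
// marker. \b keeps "TESTED" from matching inside words like "UNTESTED".
function hasCoverageMarkers(output: string): boolean {
  return /\b(GAP|TESTED)\b/.test(output);
}
```

An E2E test would then require this in addition to the exit-reason and read-call checks, so a run that produces no diagram fails.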
* fix: plan mode traces the plan, not the git diff
Codex adversarial review caught that plan-eng-review was inheriting
"git diff origin/<base>...HEAD" from the shared resolver, but plan mode
reviews a plan document, not a code diff. Plan mode now says:
"Trace every codepath in the plan" and "Read the plan document."
Ship and review modes keep the git diff instruction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.9.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: test coverage catalog + failure triage (merged branches) (#285)
* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching
Detects whether a repo is solo-dev (one person does 80%+ of recent commits)
or collaborative. Uses 90-day git shortlog window with 7-day cache in
~/.gstack/projects/{SLUG}/repo-mode.json. Config override via
`gstack-config set repo_mode solo|collaborative` takes precedence over
the heuristic. Minimum 5 commits required to classify (otherwise unknown).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
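The heuristic above can be sketched in a few lines. The real helper is a shell script; this TypeScript version is an assumption-laden illustration only. The input shape (author mapped to commit count, as one might parse from `git shortlog -sn`), the names, and the choice of `>=` for the 80% threshold are all hypothetical.

```typescript
type RepoModeGuess = "solo" | "collaborative" | "unknown";

// Thresholds taken from the commit message: 80%+ share, minimum 5 commits.
const SOLO_SHARE = 0.8;
const MIN_COMMITS = 5;

// Hypothetical classifier over per-author commit counts in the sample window.
function classifyRepoMode(
  commitsByAuthor: Record<string, number>
): RepoModeGuess {
  const counts = Object.values(commitsByAuthor);
  const total = counts.reduce((sum, n) => sum + n, 0);
  // Too little history to classify (also covers the empty case).
  if (total < MIN_COMMITS) return "unknown";
  const top = Math.max(...counts);
  return top / total >= SOLO_SHARE ? "solo" : "collaborative";
}
```

The 90-day window, the 7-day cache, and the `gstack-config` override sit outside this function and are not modeled here.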
* feat: test failure ownership triage — see something say something
Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle
Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options
/ship Step 3 now triages test failures instead of hard-stopping on all
failures. In-branch failures still block shipping. Pre-existing failures
get user-directed triage based on repo mode.
Adds P2 TODO for gstack notes system (deferred lightweight reminder).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files for Claude and Codex hosts
All 22 Claude skills and 21 Codex skills regenerated with new preamble
sections (Repo Ownership Mode, See Something Say Something) and
{{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: validate repo mode values to prevent shell injection
Codex adversarial review found that unvalidated config/cache values
could be injected into shell via source <(gstack-repo-mode). Added
validate_mode() that only allows solo|collaborative|unknown — anything
else becomes "unknown". Prevents persistent code execution through
malicious config.yaml or tampered cache JSON.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
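The fix above is an allowlist: any value outside the three known modes collapses to "unknown" before it can reach the shell. The actual validate_mode() is a shell function whose body is not shown here; the following is a TypeScript rendering of the same idea, with hypothetical names.

```typescript
// The only values the commit message says are allowed through.
const ALLOWED_MODES = ["solo", "collaborative", "unknown"] as const;
type ValidatedMode = (typeof ALLOWED_MODES)[number];

// Hypothetical validator: exact-match allowlist, everything else is "unknown".
function validateMode(raw: string): ValidatedMode {
  const match = ALLOWED_MODES.find((mode) => mode === raw);
  return match ?? "unknown";
}
```

Because the comparison is exact string equality, a tampered value such as `solo; rm -rf ~` never matches and is replaced rather than escaped.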
* fix: shell injection via branch names + feature-branch sampling bias
Codex code review found two issues:
P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as
shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and
would execute arbitrary commands. Fix: compute SLUG directly with sed
instead of eval'ing gstack-slug output.
P2: git shortlog HEAD only sees current branch history. On feature
branches that haven't merged main recently, other contributors disappear
from the sample. Fix: use git shortlog on the default branch
(origin/main) instead of HEAD.
Also improved blame lookup in collaborative triage to check both the
test file and the production code it covers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: broaden codex-host stripping test to accommodate triage section
"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the
Codex review step). Use CODEX_REVIEWS config string as a more specific
marker for detecting the Codex review step in Codex-hosted skills.
* fix: replace template placeholder in TODOS.md with readable text
{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed
by gen-skill-docs — replaced with human-readable reference.
* chore: bump version and changelog (v0.9.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add bin/ directory to project structure in CLAUDE.md
* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E
- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4),
REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath
(gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and
pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate stale Codex SKILL.md for retro
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gstack-repo-mode handles repos without origin remote
Split `git remote get-url origin` into a separate variable with `|| true`
so the script doesn't crash under `set -euo pipefail` in local-only repos.
Falls back to REPO_MODE=unknown gracefully.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: REPO_MODE defaults to unknown when helper emits nothing
Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't
catch empty output) to `source <(...) || true` followed by
`REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: triage E2E runs both test files in subprocesses
math.test.js called process.exit(1) which killed the runner before
string.test.js could execute. Changed test runner to use child_process
so each test runs independently and both failure classes are exercised.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gstack-repo-mode handles repos without origin remote
Fall back through origin/main → origin/master → HEAD when
git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents
shortlog crash in repos where origin/HEAD isn't configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
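The fallback order described above can be sketched as a pure function. This is an illustration under assumptions: the script itself is shell, and the function name, the `null` convention for an unset origin/HEAD, and the set of existing refs passed in are all hypothetical.

```typescript
// Fallback order from the commit message.
const FALLBACK_REFS = ["origin/main", "origin/master", "HEAD"];

// Hypothetical helper: prefer the ref origin/HEAD points at; otherwise take
// the first fallback that exists, ending at HEAD, which always resolves in a
// repo with at least one commit.
function pickShortlogRef(
  originHead: string | null,
  existingRefs: ReadonlySet<string>
): string {
  if (originHead !== null && originHead !== "") return originHead;
  const found = FALLBACK_REFS.find((ref) => existingRefs.has(ref));
  return found ?? "HEAD";
}
```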
* fix: triage E2E runs both test files in subprocesses
Add assertions verifying both math.test.js (pre-existing failure) and
string.test.js (in-branch failure) actually executed during triage.
Prevents false passes where only one failure class is exercised.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: REPO_MODE defaults to unknown when helper emits nothing
- Remove head -20 truncation that biased solo classification by
dropping low-volume contributors from the denominator
- Use atomic write (mktemp + mv) for cache to prevent concurrent
preamble reads from seeing partial JSON
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add test coverage catalog to CHANGELOG + update project structure
- CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E
recommendations, regression iron rule, failure triage, repo-mode fix
- CLAUDE.md: add missing skill directories (autoplan, benchmark, canary,
codex, land-and-deploy, setup-deploy) to project structure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.10.1.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: CHANGELOG rules — branch-scoped versions, never fold into old entries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
f075cb757f |
feat: Search Before Building — builder ethos + skill integrations (v0.9.5.0) (#298)
* feat: ETHOS.md — gstack builder philosophy
Standalone document capturing the four principles: The Golden Age, Boil the
Lake, Search Before Building, and Build for Yourself. Introduces the
three-layer knowledge framework (tried-and-true, new-and-popular,
first-principles) and the Eureka Moment concept — when first-principles
reasoning reveals conventional wisdom is wrong.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Search Before Building preamble section + CLAUDE.md
Add generateSearchBeforeBuildingSection(ctx) to gen-skill-docs.ts.
Every workflow skill now gets a compact router section covering:
- Three layers of knowledge (tried-and-true, new-and-popular, first-principles)
- Eureka moment format and jq-based JSONL logging
- WebSearch fallback clause
- ETHOS.md reference via ctx.paths.skillRoot resolver
Also adds compact "Search before building" section to CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: skill-specific Search Before Building integrations
8 template changes:
- /office-hours: Phase 2.75 Landscape Awareness (WebSearch + three-layer synthesis)
- /plan-eng-review: Step 0 search check with layer provenance annotations
- /investigate: external pattern search + search escalation on hypothesis failure
- /plan-ceo-review: Landscape Check before scope challenge
- /review: search-before-recommending for fix patterns
- /qa-only: WebSearch in allowed-tools
- /design-consultation: three-layer synthesis backport in Phase 2 Step 3
- /retro: eureka moment tracking from ~/.gstack/analytics/eureka.jsonl
All search steps include WebSearch fallback clause.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: v0.9.5.0 — Builder Ethos (CHANGELOG + VERSION + TODOS)
ETHOS.md + Search Before Building across all workflow skills.
Deferred: first-time intro flow (blocked on blog post).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address Codex review — sanitize search, privacy gate, ETHOS.md sidecar
Three fixes from adversarial Codex review:
- /investigate: sanitize error messages before searching (strip hostnames,
IPs, file paths, SQL, customer data). Skip search if unsanitizable.
- /office-hours: add privacy gate before landscape search. Use generalized
category terms, never the user's specific product name or stealth idea.
- setup: link ETHOS.md into .agents/skills/gstack/ sidecar so workspace-local
Codex sessions can find the builder philosophy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sanitize Phase 2 external pattern search in /investigate
The Phase 2 external search also sent raw error messages to WebSearch.
Apply same sanitization rule as Phase 3 search escalation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: sync documentation with shipped changes
- ARCHITECTURE.md: preamble now handles 5 things (add Search Before Building)
- CLAUDE.md: add ETHOS.md to project structure tree
- README.md: add ETHOS.md to docs table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
91bea06675 |
fix: plan mode exception for review log + telemetry writes (v0.9.0.1) (#234)
* fix: plan mode exception for review log + telemetry writes
Add explicit plan-mode exception notes to review log sections in all 3
plan review skill templates and the telemetry section in gen-skill-docs.ts.
When Claude runs in plan mode, it self-censors bash writes — but review
logging and telemetry write to ~/.gstack/ (user metadata, not project files).
The preamble already writes to the same directory successfully. The exception
note gives Claude a reasoning chain: safety argument, precedent, and
consequence of skipping.
* chore: regenerate Codex/agents SKILL.md files with plan-mode exception
* chore: bump version and changelog (v0.9.0.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: community-first telemetry opt-in with anonymous fallback
Default opt-in is now "Help gstack get better!" (community mode with stable
device ID). If declined, offers anonymous mode as a softer alternative before
fully off.
* chore: regenerate SKILL.md files with community-first telemetry prompt
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
8ddfab233d |
feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226)
* refactor: host-aware gen-skill-docs + --host codex generation
Refactor gen-skill-docs.ts for multi-agent support:
- Add Host type, HostPaths interface, HOST_PATHS config
- Decompose generatePreamble() into 7 composable sub-functions
- Replace all hardcoded .claude/skills/gstack paths with ctx.paths
- Replace static findTemplates() list with dynamic filesystem scan
- Add --host codex|agents flag (aliases, same output)
- Add processTemplate host routing to .agents/skills/gstack-*/
- Add codexSkillName() with double-prefix prevention
- Add transformFrontmatter() — keeps only name + description for Codex
- Add extractHookSafetyProse() — converts hooks to inline advisory
- Add body text path rewriting for remaining hardcoded paths
- Exclude /codex skill from Codex generation (self-referential)
Claude output is unchanged (verified via --dry-run).
SKILL.md is an open standard: .agents/skills/ works on Codex, Gemini CLI,
and Cursor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: generate Codex/Gemini/Cursor skills into .agents/skills/
Generated 21 skill files for the open SKILL.md standard:
- Output: .agents/skills/gstack-*/SKILL.md (one per skill)
- Frontmatter: name + description only (no allowed-tools/version)
- No .claude/skills/ paths in any generated file
- /codex skill excluded (Claude wrapper, self-referential on Codex)
- Hook skills (careful/freeze/guard) get inline safety prose
- Build script generates both hosts: bun run build
Supported agents (all read .agents/skills/):
- Codex CLI
- Gemini CLI
- Cursor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: dual-host setup + find-browse for Codex/Gemini/Cursor
- setup: add --host codex|claude|auto flag, install to ~/.codex/skills/
when targeting Codex, auto-detect installed agents
- find-browse: priority chain .codex > .agents > .claude (both
workspace-local and global)
- dev-setup/teardown: create .agents/skills/gstack symlinks for dev mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: Codex generation tests + CI + docs for multi-agent support
Tests (28 new):
- Codex output path routing, frontmatter validation (name+description only)
- No .claude/skills/ path leaks in Codex output (regression guard)
- /codex skill exclusion, hook→prose conversion, multiline YAML
- --host agents alias, dynamic template discovery
- Codex skill validation + $B command validation
- find-browse priority chain verification
- Replace static ALL_SKILLS list with dynamic filesystem scan
CI:
- Add Codex freshness check to skill-docs workflow
Docs:
- AGENTS.md: Codex-facing project instructions
- README: multi-agent installation section
- CONTRIBUTING: dual-host development workflow
- CHANGELOG: v0.9.0 multi-agent support entry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Codex E2E test harness — verify skills work on Codex CLI
New test infrastructure:
- CodexSessionRunner: spawns codex exec, parses JSONL stream, returns
structured results (output, reasoning, toolCalls, tokens)
- JSONL parser ported from Python (codex/SKILL.md.tmpl) to TypeScript
- Temp HOME skill installation for Codex discovery testing
E2E tests (gated behind EVALS=1 + codex + OPENAI_API_KEY):
- codex-discover-skill: installs skill, verifies Codex finds it
- codex-review-findings: runs gstack-review via Codex, validates output
Integrates with existing eval infrastructure:
- Diff-based test selection via touchfiles
- Eval persistence via EvalCollector
- bun run test:codex / test:codex:all convenience scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: bump VERSION to 0.9.0 to match CHANGELOG
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Codex sidecar paths + setup installs generated skills
Two bugs found by Codex adversarial review:
1. Sidecar path mismatch: generated Codex skills referenced
.agents/skills/gstack-review/checklist.md but setup creates sidecars at
.agents/skills/gstack/review/. Fixed path rewriter to emit
.agents/skills/gstack/review/ (matching setup layout).
2. Setup installed Claude-format source dirs for Codex global install
instead of the generated Codex-format skills. Split link_skill_dirs into
link_claude_skill_dirs (source dirs for Claude) and link_codex_skill_dirs
(generated .agents/skills/gstack-* dirs for Codex).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: comprehensive Codex path rewriting + setup install tests
17 new tests covering:
- Sidecar path rewriting: .claude/skills/review → .agents/skills/gstack/review/
(catches the bug where checklist.md was unreachable at gstack-review/)
- All 4 path rewrite rules tested individually across all skills
- Greptile triage sidecar path correctness
- Ship skill sidecar paths for pre-landing review
- Claude output regression guard: zero Codex paths in any Claude skill
- Setup script validation: separate link functions for Claude vs Codex,
link_codex_skill_dirs reads from .agents/skills/, create_agents_sidecar
links runtime assets (bin, browse, review, qa)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: regenerate Codex skills after investigate rename merge
Remove stale gstack-debug, add gstack-investigate, regenerate all Codex
skills to pick up changes merged from main (investigate rename,
platform-agnostic templates, review helpers).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Codex E2E uses ~/.codex/ auth, not OPENAI_API_KEY
- Remove OPENAI_API_KEY gate from test prerequisites
- Copy real ~/.codex/ auth config into temp HOME so codex can authenticate
- Increase review test timeout to 540s (codex does thorough 60+ tool call reviews)
- Document in CLAUDE.md that Codex uses its own auth config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
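The "double-prefix prevention" mentioned for codexSkillName() in #226 can be pictured as follows. The real function's body is not shown in this log, so this is a guess at the behavior: the function name, the `gstack-` prefix constant, and the special case for the root `gstack` skill are assumptions for illustration.

```typescript
const SKILL_PREFIX = "gstack-";

// Hypothetical sketch: map a source skill directory name to its generated
// Codex skill name, without producing names like "gstack-gstack-upgrade".
function codexSkillNameSketch(skillDir: string): string {
  // Assumed: the root skill and already-prefixed names are left untouched.
  if (skillDir === "gstack" || skillDir.startsWith(SKILL_PREFIX)) {
    return skillDir;
  }
  return `${SKILL_PREFIX}${skillDir}`;
}
```

Under these assumptions, `review` maps to `gstack-review`, which matches the `.agents/skills/gstack-*/` output layout described above, while `gstack-upgrade` stays as it is.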