diff --git a/CLAUDE.md b/CLAUDE.md
index 3ff25fffe..305c60c02 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -27,25 +27,16 @@ bun run slop:diff # slop findings in files changed on this branch only
 `test:evals` requires `ANTHROPIC_API_KEY`. Codex E2E tests (`test/codex-e2e.test.ts`) use Codex's own auth from `~/.codex/` config — no `OPENAI_API_KEY` env var needed.
 
-**Where the keys live on this machine.** Conductor workspaces don't inherit the
-user's interactive shell env, so `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` aren't
-in the default process env. Before running any paid eval / E2E, source them from
-`~/.zshrc` (that's where Garry keeps them):
+**Env keys in Conductor workspaces.** The `GSTACK_*` env-shim (v1.39.2.0+,
+`lib/conductor-env-shim.ts`) promotes `GSTACK_ANTHROPIC_API_KEY` /
+`GSTACK_OPENAI_API_KEY` to their canonical names inside gstack's TS binaries.
+Tests run through gstack entrypoints inherit this promotion automatically.
+Don't echo the key value to stdout, logs, or shell history. When passing to a
+test's Agent SDK, do NOT pass `env: {...}` to `runAgentSdkTest` — the SDK's
+auth pipeline doesn't pick up the key the same way when env is supplied as an
+object (confirmed failure mode). Mutate `process.env.ANTHROPIC_API_KEY`
+ambiently before the call and restore in `finally`.
 
-```bash
-bash -c '
- eval "$(grep -E "^export (ANTHROPIC_API_KEY|OPENAI_API_KEY)=" ~/.zshrc)"
- export ANTHROPIC_API_KEY OPENAI_API_KEY
- EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-.test.ts
-'
-```
-
-Do not echo the key value anywhere (stdout, logs, shell history). The grep+eval
-pattern keeps it in process env only. When passing to a test's Agent SDK, do NOT
-pass `env: {...}` to `runAgentSdkTest` — the SDK's auth pipeline doesn't pick up
-the key the same way when env is supplied as an object (confirmed failure mode).
-Instead, mutate `process.env.ANTHROPIC_API_KEY` ambiently before the call and
-restore in `finally`.
 
E2E tests stream progress in real-time (tool-by-tool via `--output-format stream-json --verbose`). Results are persisted to `~/.gstack-dev/evals/` with auto-comparison against the previous run.
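
The added paragraph's advice to mutate `process.env.ANTHROPIC_API_KEY` ambiently and restore it in `finally` could be sketched roughly as below. This is a minimal illustration, not code from the repository: `withEnvVar` is a hypothetical helper name, and the call to `runAgentSdkTest` appears only in a comment because its actual signature is not shown in this diff.

```typescript
// Hypothetical helper (not part of gstack): set one environment variable for
// the duration of `fn`, then put the previous state back even if `fn` throws.
async function withEnvVar<T>(
  name: string,
  value: string,
  fn: () => Promise<T>,
): Promise<T> {
  // Remember whatever was there before, including "not set at all".
  const previous = process.env[name];
  process.env[name] = value;
  try {
    return await fn();
  } finally {
    if (previous === undefined) {
      // Delete rather than assign undefined, so the variable is truly unset.
      delete process.env[name];
    } else {
      process.env[name] = previous;
    }
  }
}

// Assumed usage; the argument shape of runAgentSdkTest is a guess:
//
//   const result = await withEnvVar("ANTHROPIC_API_KEY", key, () =>
//     runAgentSdkTest({ /* test options, with no `env: {...}` field */ }),
//   );
```

Because `process.env` is shared by the whole process, the key is visible to anything else that runs while `fn` is in flight, so a helper like this is best kept to tests that run serially.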