docs(CLAUDE.md): source ANTHROPIC/OPENAI keys from ~/.zshrc for paid evals

Conductor workspaces don't inherit the interactive shell env, so
both API keys are absent from the default process env even though
they're set in ~/.zshrc. Documents the source-from-zshrc pattern
(grep + eval, never echo the value) plus the Agent SDK gotcha: do
NOT pass env as an object to runAgentSdkTest — mutate process.env
ambiently and restore in finally.

Discovered this during the brain-privacy-gate E2E. First run failed
at SDK auth with 401; second failed because explicit env handoff
bypassed the SDK's own auth routing. Fix pattern now codified so
the next paid-eval session in a Conductor workspace doesn't hit the
same two dead ends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-24 01:26:24 -07:00
parent 371a7e684a
commit 3139cf40fb
+20
View File
@@ -26,6 +26,26 @@ bun run slop:diff # slop findings in files changed on this branch only
`test:evals` requires `ANTHROPIC_API_KEY`. Codex E2E tests (`test/codex-e2e.test.ts`)
use Codex's own auth from `~/.codex/` config — no `OPENAI_API_KEY` env var needed.
**Where the keys live on this machine.** Conductor workspaces don't inherit the
user's interactive shell env, so `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` aren't
in the default process env. Before running any paid eval / E2E, source them from
`~/.zshrc` (that's where Garry keeps them):
```bash
bash -c '
eval "$(grep -E "^export (ANTHROPIC_API_KEY|OPENAI_API_KEY)=" ~/.zshrc)"
export ANTHROPIC_API_KEY OPENAI_API_KEY
EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-<whatever>.test.ts
'
```
Do not echo the key value anywhere (stdout, logs, shell history). The grep+eval
pattern keeps it in process env only. When passing to a test's Agent SDK, do NOT
pass `env: {...}` to `runAgentSdkTest` — the SDK's auth pipeline doesn't pick up
the key the same way when env is supplied as an object (confirmed failure mode).
Instead, mutate `process.env.ANTHROPIC_API_KEY` ambiently before the call and
restore in `finally`.
E2E tests stream progress in real-time (tool-by-tool via `--output-format stream-json
--verbose`). Results are persisted to `~/.gstack-dev/evals/` with auto-comparison
against the previous run.