mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
fix(security): unbreak Haiku transcript classifier — wrong model + too-tight timeout
Two bugs that made checkTranscript return degraded on every call:
1. --model 'haiku-4-5' returns 404 from the Claude CLI. The accepted
shorthand is 'haiku' (resolves to claude-haiku-4-5-20251001
today, stays on the latest Haiku as models roll). Symptom: every
call exited non-zero with api_error_status=404.
2. 2000ms timeout is below the floor. A fresh `claude -p` spawn has
~2-3s CLI cold-start + 5-12s inference on ~1KB prompts, so even
with the model fixed, every call that would have succeeded timed
out before it could return. Measured: 0% firing rate.
Fix: model alias + 15s timeout. Sanity check against DAN-style
injection now returns confidence 0.99 with reasoning ("Tool output
contains multiple injection patterns: instruction override, jailbreak
attempt (DAN), system prompt exfil request, and malicious curl
command to attacker domain") in 8.7s.
This was the silent cause of the 15.3% detection rate on
BrowseSafe-Bench — the ensemble numbers matched L4-alone because
Haiku never actually voted.
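For reference, the fixed spawn-race-timeout shape can be sketched standalone. This is a minimal sketch, not the real checkTranscript: `finish`, the `LayerResult` shape, and the success/exit metadata here are simplified stand-ins, and it is exercised with generic commands rather than the claude CLI.

```typescript
import { spawn } from 'node:child_process';

type LayerResult = {
  layer: string;
  confidence: number;
  meta?: Record<string, unknown>;
};

// Spawn a classifier CLI and race it against a hard timeout. All three exit
// paths (spawn error, process close, timeout) funnel through one guard so
// the promise resolves exactly once.
function classify(cmd: string, args: string[], timeoutMs: number): Promise<LayerResult> {
  return new Promise((resolve) => {
    let done = false;
    const finish = (r: LayerResult) => {
      if (!done) { done = true; resolve(r); }
    };

    const p = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });

    let out = '';
    p.stdout.on('data', (chunk) => { out += chunk; });

    p.on('error', () =>
      finish({ layer: 'transcript_classifier', confidence: 0, meta: { degraded: true, reason: 'spawn_error' } }));

    // Hard timeout: kill the child and return a degraded result instead of
    // hanging the caller.
    const timer = setTimeout(() => {
      try { p.kill('SIGTERM'); } catch {}
      finish({ layer: 'transcript_classifier', confidence: 0, meta: { degraded: true, reason: 'timeout' } });
    }, timeoutMs);

    p.on('close', (code) => {
      clearTimeout(timer);
      if (code === 0) {
        finish({ layer: 'transcript_classifier', confidence: 1, meta: { raw: out.trim() } });
      } else {
        finish({ layer: 'transcript_classifier', confidence: 0, meta: { degraded: true, reason: `exit_${code}` } });
      }
    });
  });
}
```

The real call would then look like `classify('claude', ['-p', prompt, '--model', 'haiku', '--output-format', 'json'], 15000)`; substituting `echo` (fast) and `sleep` (slow) makes both paths observable without the CLI installed.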
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -456,9 +456,13 @@ export async function checkTranscript(params: {
   ].join('\n');
 
   return new Promise((resolve) => {
+    // Model alias 'haiku' resolves to the latest Haiku (currently
+    // claude-haiku-4-5-20251001). The pinned form 'haiku-4-5' returned 404
+    // because the CLI doesn't accept that shorthand. Using the alias keeps
+    // us on the latest Haiku as models roll forward.
     const p = spawn('claude', [
       '-p', prompt,
-      '--model', 'haiku-4-5',
+      '--model', 'haiku',
       '--output-format', 'json',
     ], { stdio: ['ignore', 'pipe', 'pipe'] });
 
@@ -502,11 +506,17 @@ export async function checkTranscript(params: {
     p.on('error', () => {
       finish({ layer: 'transcript_classifier', confidence: 0, meta: { degraded: true, reason: 'spawn_error' } });
     });
-    // Hard timeout — per plan §E1 (2000ms cap)
+    // Hard timeout. Original spec was 2000ms but real-world `claude -p`
+    // spawns a fresh CLI per call with ~2-3s cold-start + 5-12s inference
+    // on ~1KB prompts. At 2s every call timed out, defeating the
+    // classifier entirely (measured: 0% firing rate). At 15s we catch the
+    // long tail; faster prompts return in under 5s. The stream handler
+    // runs this in parallel with the content scan so the latency is
+    // bounded by this timer, not additive to session wall time.
     setTimeout(() => {
       try { p.kill('SIGTERM'); } catch {}
       finish({ layer: 'transcript_classifier', confidence: 0, meta: { degraded: true, reason: 'timeout' } });
-    }, 2000);
+    }, 15000);
   });
 }
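The parallelism claim in the new timeout comment (latency bounded by the timer, not additive to session wall time) reduces to a Promise.all over both legs. A toy sketch, where scanContent and the fixed classifier delay are hypothetical stand-ins rather than the real gstack implementations:

```typescript
// Toy model of the stream handler's concurrency: the fast, local content
// scan and the slow, timeout-capped classifier run in parallel, so total
// wall time is roughly max(scan, classifier), not their sum.
function scanContent(text: string): { layer: string; hit: boolean } {
  // Hypothetical stand-in pattern for a content-scan rule.
  return { layer: 'content_scan', hit: /curl\s+https?:\/\//i.test(text) };
}

function classifierStub(delayMs: number): Promise<{ layer: string; confidence: number }> {
  // Stand-in for the spawned Haiku call; resolves after a fixed delay.
  return new Promise((resolve) =>
    setTimeout(() => resolve({ layer: 'transcript_classifier', confidence: 0.99 }), delayMs));
}

async function runChecks(text: string) {
  const t0 = Date.now();
  const [scan, clf] = await Promise.all([
    Promise.resolve(scanContent(text)), // effectively instant
    classifierStub(300),                // dominates wall time
  ]);
  // elapsedMs lands near 300ms: the scan cost is absorbed, not added.
  return { scan, clf, elapsedMs: Date.now() - t0 };
}
```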