d5253215c5
Two real bugs found by the BrowseSafe-Bench smoke harness.
1. Truncation wasn't happening.
The TextClassificationPipeline in transformers.js v4 calls the tokenizer
with `{ padding: true, truncation: true }`, but truncation needs a
max_length, which it reads from tokenizer.model_max_length. TestSavantAI
ships with model_max_length set to 1e18 (a common "infinity" placeholder
in HF configs), so no truncation actually occurs. Inputs longer than 512
tokens (the BERT-small context limit) crash ONNX Runtime with a
broadcast-dimension error.
Fix: override tokenizer._tokenizerConfig.model_max_length = 512 right
after pipeline load. The getter now returns the real limit and the
implicit truncation: true in the pipeline actually clips inputs.
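A minimal sketch of that override, for illustration only. The `PipelineLike` and `TokenizerLike` interfaces and the `clampModelMaxLength` helper are hypothetical names, not part of transformers.js. The only field taken from this commit is `_tokenizerConfig.model_max_length`, and access to the tokenizer through a `tokenizer` property on the pipeline is an assumption. Unlike the unconditional assignment described above, this variant leaves a sane smaller limit alone:

```typescript
// Hypothetical minimal shapes: only the field named in this commit is modelled.
interface TokenizerLike {
  _tokenizerConfig: { model_max_length?: number };
}

interface PipelineLike {
  tokenizer: TokenizerLike; // assumption: the pipeline exposes its tokenizer here
}

// BERT-small context limit, as stated in this commit message.
const BERT_SMALL_MAX_TOKENS = 512;

// Replace a missing, non-finite, or oversized model_max_length (e.g. the 1e18
// "infinity" placeholder) with the real limit so truncation has a length to use.
function clampModelMaxLength(
  pipe: PipelineLike,
  limit: number = BERT_SMALL_MAX_TOKENS,
): PipelineLike {
  const cfg = pipe.tokenizer._tokenizerConfig;
  const current = cfg.model_max_length;
  if (typeof current !== "number" || !Number.isFinite(current) || current > limit) {
    cfg.model_max_length = limit;
  }
  return pipe;
}
```

Calling this once right after pipeline load keeps the workaround in one place, so it is easy to delete if the upstream config is ever corrected.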
2. Classifier was receiving raw HTML.
TestSavantAI is trained on natural language, not markup. Feeding it a
blob of <div style="..."> dilutes the injection signal with tag noise.
When the Perplexity BrowseSafe-Bench fixture had an attack buried inside
HTML, the classifier returned SAFE at confidence 0 across the board.
Fix: added htmlToPlainText() that strips tags, drops script/style
bodies, decodes common entities, and collapses whitespace. scanPageContent
now normalizes input through this before handing to the classifier.
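The four normalization steps can be sketched roughly as follows. This is a regex-based illustration under stated assumptions, not the function added in this commit: the entity table is a guess at what "common entities" covers, and the regexes are not a full HTML parser.

```typescript
// Assumed set of "common" entities; the real list may differ.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&apos;": "'",
  "&nbsp;": " ",
};

function htmlToPlainText(html: string): string {
  return html
    // 1. Drop <script> and <style> elements together with their bodies.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // 2. Strip every remaining tag, leaving a space so words do not fuse.
    .replace(/<[^>]+>/g, " ")
    // 3. Decode entities in a single pass so "&amp;lt;" is not decoded twice.
    .replace(/&(?:amp|lt|gt|quot|apos|nbsp|#39);/g, (m) => ENTITIES[m] ?? m)
    // 4. Collapse whitespace runs and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}
```

Script and style bodies are removed before the generic tag strip because otherwise their contents would survive as text and add noise to the classifier input.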
Result: the BrowseSafe-Bench smoke run completes without errors. Detection
rate is only 15% at WARN=0.6 (see the bench test docstring for why:
TestSavantAI wasn't trained on this distribution). In prod, the ensemble with
the Haiku transcript classifier filters FPs; a DeBERTa-v3 ensemble is a
tracked P2 improvement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>