Garry Tan d5253215c5 fix(security-classifier): truncation + HTML preprocessing
Two real bugs found by the BrowseSafe-Bench smoke harness.

1. Truncation wasn't happening.
   The TextClassificationPipeline in transformers.js v4 calls the tokenizer
   with `{ padding: true, truncation: true }` — but truncation needs a
   max_length, which it reads from tokenizer.model_max_length. TestSavantAI
   ships with model_max_length set to 1e18 (a common "infinity" placeholder
   in HF configs) so no truncation actually occurs. Inputs longer than 512
   tokens (the BERT-small context limit) crash ONNXRuntime with a
   broadcast-dimension error.
   Fix: override tokenizer._tokenizerConfig.model_max_length = 512 right
   after pipeline load. The getter then returns the real limit, and the
   pipeline's implicit truncation: true actually clips inputs.

2. Classifier was receiving raw HTML.
   TestSavantAI is trained on natural language, not markup. Feeding it a
   blob of <div style="..."> dilutes the injection signal with tag noise.
   On Perplexity BrowseSafe-Bench fixtures with the attack buried inside
   HTML, the classifier returned SAFE at confidence 0 across the board.
   Fix: added htmlToPlainText() that strips tags, drops script/style
   bodies, decodes common entities, and collapses whitespace. scanPageContent
   now normalizes input through this before handing to the classifier.
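
A sketch of the normalization step, assuming regex-based stripping is acceptable for classifier input (a real HTML parser would be more robust; entity list is deliberately partial):

```javascript
// Convert an HTML blob to plain text before classification: drop
// script/style bodies, strip tags, decode common entities, collapse
// whitespace. Note &amp; is decoded last to avoid double-decoding
// sequences like "&amp;lt;".
function htmlToPlainText(html) {
  return html
    // Remove script/style elements including their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Strip remaining tags.
    .replace(/<[^>]+>/g, ' ')
    // Decode a handful of common entities.
    .replace(/&nbsp;/gi, ' ')
    .replace(/&lt;/gi, '<')
    .replace(/&gt;/gi, '>')
    .replace(/&quot;/gi, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/gi, '&')
    // Collapse runs of whitespace.
    .replace(/\s+/g, ' ')
    .trim();
}
```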

Result: BrowseSafe-Bench smoke runs without errors. Detection rate is only
15% at WARN=0.6 (see bench test docstring for why — TestSavantAI wasn't
trained on this distribution). Ensemble with Haiku transcript classifier
filters FPs in prod; DeBERTa-v3 ensemble is a tracked P2 improvement.
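
For reference, the WARN=0.6 cut reduces to a simple threshold on the classifier's injection score (function and verdict names here are illustrative, not the shipped API):

```javascript
// Hypothetical verdict mapping; 0.6 matches the WARN cut used in the
// bench numbers above. Scores at or above the threshold are flagged.
const WARN_THRESHOLD = 0.6;

function verdictFromScore(injectionScore, warnAt = WARN_THRESHOLD) {
  return injectionScore >= warnAt ? 'WARN' : 'SAFE';
}
```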

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:50:53 +08:00