mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
07edc70df1
Ships the research skeleton for the P3 "5ms Bun-native classifier" TODO.
Honest scope: tokenizer + API surface + benchmark harness + roadmap doc.
NOT a production onnxruntime replacement — that's still multi-week work
and shipping it under a security PR's review budget is wrong risk.
browse/src/security-bunnative.ts:
* Pure-TS WordPiece tokenizer reading HF tokenizer.json directly —
produces the same input_ids sequence as transformers.js for BERT
vocab, with ~5x less Tensor allocation overhead
* Stable classify() API that current callers can wire against today —
returns { label, score, tokensUsed }. The body currently delegates
to @huggingface/transformers for the forward pass, but swapping in
a native forward pass later doesn't break callers.
* Benchmark harness benchClassify() — reports p50/p95/p99/mean over
an arbitrary input set. Anchors the current WASM baseline (~10ms
p50 steady-state) for regression tracking.
docs/designs/BUN_NATIVE_INFERENCE.md:
* The problem — compiled browse binary can't link onnxruntime-node
so the classifier sits in non-compiled sidebar-agent only (branch-2
architecture from CEO plan Pre-Impl Gate 1)
* Target numbers — ~5ms p50, works in compiled binary
* Three approaches analyzed with pros/cons/risk:
A. Pure-TS SIMD — ruled out (can't beat WASM at matmul)
B. Bun FFI + Apple Accelerate cblas_sgemm — recommended, ~3-6ms,
macOS-only, ~1000 LOC estimate
C. Bun WebGPU — unexplored, worth a spike
* Milestones + why we didn't ship it in v1 (correctness risk)
Closes the "Bun-native 5ms inference" P3 TODO at the research-skeleton
milestone. Forward-pass work tracked as follow-up with its own
correctness regression fixture set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>