improve binary file detection: content sniffing + jsonl support

Files like .jsonl that mime-types doesn't know fell through to
application/octet-stream and rendered as "Unsupported binary file" in
the viewer. Replace istextorbinary with isbinaryfile for content-based
detection, and use mime-types for name-based classification with a
textual application/* allowlist.
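
A minimal sketch of the name-based step described above, not the code from this commit. The lookup table is a hypothetical stand-in for the mime-types package, and classifyByName, the allowlist contents, and the "text" / "binary" / "unknown" verdicts are names assumed for illustration:

```javascript
const path = require("node:path");

// Hypothetical stand-in for a mime-types lookup; the real package knows
// far more extensions than this table.
const MIME_BY_EXT = new Map([
  [".txt", "text/plain"],
  [".json", "application/json"],
  [".xml", "application/xml"],
  [".png", "image/png"],
  [".zip", "application/zip"],
]);

// application/* types that are text in practice (assumed contents).
const TEXTUAL_APPLICATION_TYPES = new Set([
  "application/json",
  "application/xml",
]);

// Extensions treated as text before any MIME lookup.
const TEXT_EXTENSIONS = new Set([".jsonl", ".ndjson"]);

// Returns "text", "binary", or "unknown" using the file name only.
function classifyByName(filePath) {
  const ext = path.extname(String(filePath || "")).toLowerCase();
  if (TEXT_EXTENSIONS.has(ext)) return "text";
  const mime = MIME_BY_EXT.get(ext);
  if (!mime) return "unknown"; // inconclusive: leave it to content sniffing
  if (mime.startsWith("text/") || TEXTUAL_APPLICATION_TYPES.has(mime)) {
    return "text";
  }
  return "binary";
}
```

An "unknown" verdict is the case that would previously have fallen through to application/octet-stream.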

The streaming transformer now defers classification when the name is
inconclusive and sniffs the first chunk before emitting "transform",
so route.ts and AnonymizedFile.ts get a content-aware Content-Type.
Also allowlist .jsonl and .ndjson so dataset files short-circuit to text.
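
The deferred classification could look roughly like the sketch below. It is an illustration under assumptions, not the transformer from this commit: SniffingTransform, looksBinary, and the { isText } payload are hypothetical, and the NUL-byte check is a crude stand-in for isbinaryfile.

```javascript
const { Transform } = require("node:stream");

// Crude stand-in for isbinaryfile: treat a NUL byte in the sample as binary.
function looksBinary(chunk) {
  return chunk.subarray(0, 8000).includes(0);
}

class SniffingTransform extends Transform {
  // nameVerdict is "text", "binary", or "unknown" (from the file name).
  constructor(nameVerdict) {
    super();
    this.verdict = nameVerdict;
    this.announced = false;
  }

  // Emits "transform" exactly once, after the verdict is settled.
  announce() {
    if (this.announced) return;
    this.announced = true;
    this.emit("transform", { isText: this.verdict === "text" });
  }

  _transform(chunk, _encoding, callback) {
    if (!this.announced && this.verdict === "unknown") {
      // The name was inconclusive, so decide from the first chunk.
      this.verdict = looksBinary(chunk) ? "binary" : "text";
    }
    this.announce();
    callback(null, chunk); // pass the data through unchanged
  }

  _flush(callback) {
    // Empty input never reaches _transform; this sketch assumes text.
    if (this.verdict === "unknown") this.verdict = "text";
    this.announce();
    callback();
  }
}
```

A consumer would wait for the "transform" event before choosing a Content-Type, since the verdict may only be known once the first chunk has arrived.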

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tdurieux
2026-05-06 07:52:48 +03:00
parent 18ce39e019
commit 79f555769d
6 changed files with 154 additions and 158 deletions
+3 -2
@@ -499,8 +499,9 @@ describe("ContentAnonimizer", function () {
// ---------------------------------------------------------------------------
// Mirror of isTextFile that relies on the file extension only — the real
-// impl additionally calls istextorbinary, but for these tests checking the
-// suffix is enough to demonstrate the constructor-vs-post-assignment bug.
+// impl additionally consults mime-types and isbinaryfile, but for these
+// tests checking the suffix is enough to demonstrate the
+// constructor-vs-post-assignment bug.
function _isTextFileFromPath(filePath) {
if (!filePath) return false;
const ext = String(filePath).split(".").pop().toLowerCase();