improve binary file detection: content sniffing + jsonl support

Files like .jsonl that mime-types doesn't know fell through to
application/octet-stream and rendered as "Unsupported binary file" in
the viewer. Replace istextorbinary with isbinaryfile for content-based
detection, and use mime-types for name-based classification with a
textual application/* allowlist.
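
A minimal sketch of the name-based step, under stated assumptions: `MIME_BY_EXT` is a hypothetical stand-in for a mime-types lookup, and the entries in `TEXTUAL_APPLICATION` and `TEXT_EXTENSIONS` are illustrative rather than the lists this commit ships. The function name `classifyByName` and the "unknown" verdict are also assumptions made for illustration.

```typescript
// Sketch only: MIME_BY_EXT is a hypothetical stand-in for a mime-types lookup,
// and the allowlist entries are illustrative, not the commit's actual lists.
type NameVerdict = "text" | "binary" | "unknown";

const MIME_BY_EXT: Record<string, string> = {
  ".txt": "text/plain",
  ".json": "application/json",
  ".xml": "application/xml",
  ".png": "image/png",
  ".zip": "application/zip",
};

// application/* types whose payload is really text.
const TEXTUAL_APPLICATION = new Set<string>(["application/json", "application/xml"]);

// Extensions treated as text even though the MIME table does not know them.
const TEXT_EXTENSIONS = new Set<string>([".jsonl", ".ndjson"]);

function extensionOf(name: string): string {
  const base = name.slice(name.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  // dot > 0 skips dotfiles such as ".gitignore", which have no extension.
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

function classifyByName(name: string): NameVerdict {
  const ext = extensionOf(name);
  if (TEXT_EXTENSIONS.has(ext)) return "text";
  const mime = MIME_BY_EXT[ext];
  // No MIME type for this name: the caller has to look at the content.
  if (mime === undefined) return "unknown";
  if (mime.startsWith("text/") || TEXTUAL_APPLICATION.has(mime)) return "text";
  return "binary";
}
```

Returning "unknown" instead of defaulting to application/octet-stream is what leaves room for content-based detection to make the final call.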

The streaming transformer now defers classification when the name is
inconclusive and sniffs the first chunk before emitting "transform",
so route.ts and AnonymizedFile.ts get a content-aware Content-Type.
Also allowlists .jsonl and .ndjson so dataset files are classified as
text without sniffing.
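
The deferral described above can be sketched roughly as follows. This is not the transformer from this commit: `DeferredClassifier`, its `write`/`end` methods, and the `onTransform` callback are hypothetical names, a plain class stands in for the stream, and `looksBinary` is a crude NUL-byte check used in place of isbinaryfile. Treating an empty file as text is likewise an assumption.

```typescript
type Kind = "text" | "binary";
type NameVerdict = Kind | "unknown";

// Crude stand-in for a content sniffer: a NUL byte in the sample means binary.
function looksBinary(sample: Uint8Array): boolean {
  for (let i = 0; i < sample.length; i++) {
    if (sample[i] === 0) return true;
  }
  return false;
}

class DeferredClassifier {
  private decided = false;
  private readonly onTransform: (kind: Kind) => void;

  constructor(nameVerdict: NameVerdict, onTransform: (kind: Kind) => void) {
    this.onTransform = onTransform;
    // Conclusive name: announce right away, no sniffing needed.
    if (nameVerdict !== "unknown") this.decide(nameVerdict);
  }

  // Called for every chunk; passes the chunk through unchanged.
  write(chunk: Uint8Array): Uint8Array {
    // Inconclusive name: the first chunk settles the classification.
    if (!this.decided) this.decide(looksBinary(chunk) ? "binary" : "text");
    return chunk;
  }

  // Called at end of stream; an empty file is treated as text (assumption).
  end(): void {
    if (!this.decided) this.decide("text");
  }

  private decide(kind: Kind): void {
    this.decided = true;
    this.onTransform(kind);
  }
}
```

Because the callback fires at most once and never before the first chunk when the name is inconclusive, a consumer can wait for it before setting the Content-Type header.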

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tdurieux
2026-05-06 07:52:48 +03:00
parent 18ce39e019
commit 79f555769d
6 changed files with 154 additions and 158 deletions
+1 -1
@@ -51,7 +51,7 @@
"express-slow-down": "^2.0.1",
"got": "^11.8.6",
"inquirer": "^8.2.6",
- "istextorbinary": "^9.5.0",
+ "isbinaryfile": "^6.0.0",
"marked": "^5.1.2",
"mime-types": "^2.1.35",
"mongoose": "^7.6.10",