mirror of https://github.com/tdurieux/anonymous_github.git, synced 2026-05-15 22:48:00 +02:00
improve binary file detection: content sniffing + jsonl support
Files like .jsonl that mime-types doesn't know fell through to application/octet-stream and rendered as "Unsupported binary file" in the viewer.

Replace istextorbinary with isbinaryfile for content-based detection, and use mime-types for name-based classification with a textual application/* allowlist. The streaming transformer now defers classification when the name is inconclusive and sniffs the first chunk before emitting "transform", so route.ts and AnonymizedFile.ts get a content-aware Content-Type. Whitelists .jsonl and .ndjson to short-circuit dataset files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
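The two-stage detection described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `MIME_BY_EXT` is a tiny hand-written stand-in for a mime-types lookup, `looksBinary` is a crude NUL-byte check standing in for isbinaryfile, and the names `classifyByName`, `classify`, and `createSniffingStream` are hypothetical. For brevity the sketch always reports on the first chunk, whereas the commit describes deferring only when the name is inconclusive.

```javascript
const path = require("path");
const { Transform } = require("stream");

// Hypothetical stand-in for a mime-types lookup (extension -> MIME type).
const MIME_BY_EXT = {
  ".txt": "text/plain",
  ".md": "text/markdown",
  ".json": "application/json",
  ".xml": "application/xml",
  ".png": "image/png",
  ".zip": "application/zip",
};

// Extensions treated as text up front (the dataset formats named in the commit).
const TEXT_EXT_ALLOWLIST = new Set([".jsonl", ".ndjson"]);

// application/* types that are textual despite not being text/* (illustrative subset).
const TEXTUAL_APPLICATION_TYPES = new Set(["application/json", "application/xml"]);

// Name-based stage: returns "text", "binary", or "unknown".
function classifyByName(filePath) {
  const ext = path.extname(String(filePath || "")).toLowerCase();
  if (TEXT_EXT_ALLOWLIST.has(ext)) return "text";
  const mime = MIME_BY_EXT[ext];
  if (!mime) return "unknown"; // inconclusive: defer to content sniffing
  if (mime.startsWith("text/") || TEXTUAL_APPLICATION_TYPES.has(mime)) return "text";
  return "binary";
}

// Content-based stage (assumption: a NUL byte in the first 512 bytes means binary).
function looksBinary(chunk) {
  return chunk.subarray(0, 512).includes(0);
}

// Combined decision: trust the name when it is conclusive, otherwise sniff.
function classify(filePath, firstChunk) {
  const byName = classifyByName(filePath);
  if (byName !== "unknown") return byName;
  return looksBinary(firstChunk) ? "binary" : "text";
}

// Pass-through stream that emits "transform" once, after seeing the first chunk.
// An empty file produces no chunk, so this sketch emits nothing for it.
function createSniffingStream(filePath) {
  let decided = false;
  return new Transform({
    transform(chunk, _encoding, callback) {
      if (!decided) {
        decided = true;
        this.emit("transform", { isText: classify(filePath, chunk) === "text" });
      }
      callback(null, chunk);
    },
  });
}
```

A caller could listen for the "transform" event on the returned stream and choose a Content-Type before forwarding data. Emitting after the first chunk is what lets the header reflect content rather than the file name alone.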
@@ -499,8 +499,9 @@ describe("ContentAnonimizer", function () {
 // ---------------------------------------------------------------------------
 
 // Mirror of isTextFile that relies on the file extension only — the real
-// impl additionally calls istextorbinary, but for these tests checking the
-// suffix is enough to demonstrate the constructor-vs-post-assignment bug.
+// impl additionally consults mime-types and isbinaryfile, but for these
+// tests checking the suffix is enough to demonstrate the
+// constructor-vs-post-assignment bug.
 function _isTextFileFromPath(filePath) {
   if (!filePath) return false;
   const ext = String(filePath).split(".").pop().toLowerCase();