* fix: anonymize Windows batch scripts (#735)
mime-types maps .bat to application/x-msdownload, the same MIME type as
.exe/.dll, so batch scripts were classified as binary and streamed
through without any anonymization. Special-case .bat/.cmd as text before
the MIME lookup, keeping .exe/.dll binary.
* fix: recover files missing from truncated tree listings (#738)
GitHub truncates tree listings of very large repositories. Folders whose
listing was truncated are recorded in truncatedFolders, but files that
fell outside the listing never reached the database, so requesting them
returned 404 file_not_found even though they exist on GitHub — and a
force refresh could not help.
When a file lookup misses and its directory is under a truncated folder,
fetch the file metadata directly from GitHub's contents API (object
media type, so it works past the 1MB inline limit), cache it in the
database, and serve it normally.
* feat: warn when a repository uses git submodules (#737)
GitHub archives and tree listings never include submodule contents, so
submodules end up as empty folders in the anonymized repository, which
surprises users. Detect a root .gitmodules file and show a warning
banner in the explorer explaining that submodule contents are not
included.
* feat: allow users to delete their account (#741)
Add DELETE /api/user: removes all anonymized repositories, gists, and
pull requests owned by the user, best-effort revokes the GitHub OAuth
grant, and scrubs personal data (username, emails, tokens, GitHub id,
photo) from the user record. The record itself is kept with a
placeholder username so removed repoIds stay reserved and owner
references remain resolvable.
The settings page gains an Account section with a confirmed delete
button.
* fix: add missing error translations for token_expired and job_is_active
The error-code coverage test failed because both backend codes had no
frontend translation.
Files like .jsonl that mime-types doesn't know fell through to
application/octet-stream and rendered as "Unsupported binary file" in
the viewer. Replace istextorbinary with isbinaryfile for content-based
detection, and use mime-types for name-based classification with a
textual application/* allowlist.
The streaming transformer now defers classification when the name is
inconclusive and sniffs the first chunk before emitting "transform",
so route.ts and AnonymizedFile.ts get a content-aware Content-Type.
Whitelists .jsonl and .ndjson to short-circuit dataset files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
istextorbinary returns null for filenames with no extension, and the
isTextFile() guard treated null as "not text" — so terms in LICENSE,
COPYING, AUTHORS, README (extensionless), CHANGELOG, NOTICE, and
similar conventional filenames went through the binary passthrough
in AnonymizeTransformer and were never anonymized.
Add a small whitelist of these names ahead of the istextorbinary call.
Fixes#493.