anonymous_github/public
tdurieux ef78e8ff3c feat: preserve raw bytes when anonymization is a no-op
When the anonymizer doesn't change a slice's text, the streamer used
to push `Buffer.from(out, "utf8")`, which loses any invalid UTF-8 bytes
in the input (StringDecoder replaces them with U+FFFD). Files
mistakenly classified as text (binary blobs without a known extension,
text with stray non-UTF-8 bytes, BOMs) came out corrupted even though
nothing in the term list matched.
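
The loss described above can be reproduced in isolation. This is a minimal sketch assuming Node.js with `@types/node`. `decodeThenReencode` is a hypothetical helper that mimics the old decode-then-`Buffer.from(out, "utf8")` passthrough. It is not code from the repository.

```typescript
import { StringDecoder } from "node:string_decoder";

// Hypothetical helper: decode bytes the way a text stream would, then
// re-encode them the way the old passthrough did (Buffer.from(out, "utf8")).
function decodeThenReencode(raw: Buffer): Buffer {
  const decoder = new StringDecoder("utf8");
  // write() + end() also flushes any trailing incomplete multi-byte sequence.
  const text = decoder.write(raw) + decoder.end();
  return Buffer.from(text, "utf8");
}

// Valid UTF-8 survives the round trip unchanged.
const ascii = Buffer.from("hello", "utf8");
console.log(decodeThenReencode(ascii).equals(ascii)); // true

// A stray 0xFF byte is not valid UTF-8: the decoder turns it into U+FFFD,
// which re-encodes as EF BF BD, so the output no longer matches the input.
const binary = Buffer.from([0x68, 0x69, 0xff, 0x21]); // "hi", 0xFF, "!"
const roundTripped = decodeThenReencode(binary);
console.log(roundTripped.equals(binary)); // false
console.log(roundTripped.toString("hex")); // 6869efbfbd21
```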

Track the raw chunk bytes alongside the decoded `pending` string. On
flush, where every byte is already buffered, emit the original buffer
directly when the output equals the input, so a pure passthrough is
bit-exact. In the streaming OVERLAP path, do the same when decoding
that slice round-trips losslessly; otherwise fall back to the encoded
output, as before.
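
The two rules above can be sketched as a pair of functions. This is a simplified illustration, not the streamer's actual code. `Anonymize`, `flushSlice` and `emitSlice` are hypothetical names. The sketch also assumes the OVERLAP caller already knows which raw bytes (`sliceRaw`) produced a given decoded slice.

```typescript
import { StringDecoder } from "node:string_decoder";

// Hypothetical stand-in for the project's term replacement.
type Anonymize = (text: string) => string;

// Flush path: every raw byte is buffered, so when anonymization changes
// nothing, return the original bytes verbatim (bit-exact passthrough),
// even if they were not valid UTF-8.
function flushSlice(raw: Buffer, anonymize: Anonymize): Buffer {
  const decoder = new StringDecoder("utf8");
  const pending = decoder.write(raw) + decoder.end();
  const out = anonymize(pending);
  return out === pending ? raw : Buffer.from(out, "utf8");
}

// OVERLAP path: reuse the raw bytes only when the slice is unchanged AND
// re-encoding its decoded text reproduces those bytes exactly.
function emitSlice(sliceText: string, sliceRaw: Buffer, anonymize: Anonymize): Buffer {
  const out = anonymize(sliceText);
  const lossless = Buffer.from(sliceText, "utf8").equals(sliceRaw);
  if (out === sliceText && lossless) return sliceRaw;
  return Buffer.from(out, "utf8"); // encoded output, the previous behaviour
}

// Usage: an undecodable blob passes through the flush path untouched.
const identity: Anonymize = (t) => t;
const blob = Buffer.from([0x68, 0x69, 0xff, 0x21]);
console.log(flushSlice(blob, identity) === blob); // true
```

When the lossless check passes, `Buffer.from(out, "utf8")` would produce the same bytes anyway. In this sketch, the byte-exact gain for undecodable input therefore comes from the flush path.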

Also add the "missing_content" locale entry for the
/api/anonymize-preview route.
2026-05-04 11:52:03 +02:00