tdurieux ef78e8ff3c feat: preserve raw bytes when anonymization is a no-op
When the anonymizer doesn't change a slice's text, the streamer used
to push Buffer.from(out, "utf8") — which loses any invalid-UTF-8 bytes
in the input (replaced by U+FFFD via StringDecoder). Files
mistakenly classified as text (binary blobs without a known extension,
text with stray non-UTF-8 bytes, BOMs) came out corrupted even though
nothing in the term list matched.

Track the raw chunk bytes alongside the decoded `pending`. On flush —
where we have every byte buffered — emit the original buffer directly
when the output equals the input, so a pure passthrough is bit-exact.
In the streaming OVERLAP path, do the same when the decode for that
slice round-trips losslessly; fall back to encoded output otherwise
(unchanged from before for that case).

Also add the "missing_content" locale entry for the
/api/anonymize-preview route.
2026-05-04 11:52:03 +02:00
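
The passthrough rule described in the commit message above can be illustrated with a minimal sketch. The helper names `roundTripsLosslessly` and `anonymizeBuffer` are hypothetical and are not taken from the project's code. The sketch covers only the simple case where every byte is already buffered (the flush case), not the streaming overlap logic.

```typescript
// Hypothetical helpers for illustration; not the project's actual API.
type Anonymizer = (text: string) => string;

/** True when decoding `raw` as UTF-8 and re-encoding it yields the same bytes. */
function roundTripsLosslessly(raw: Buffer): boolean {
  return Buffer.from(raw.toString("utf8"), "utf8").equals(raw);
}

/**
 * Anonymize a fully buffered input. If the anonymizer leaves the decoded
 * text unchanged, return the original bytes so the passthrough is bit-exact.
 */
function anonymizeBuffer(raw: Buffer, anonymize: Anonymizer): Buffer {
  // Invalid UTF-8 sequences are decoded as U+FFFD at this point.
  const text = raw.toString("utf8");
  const out = anonymize(text);
  if (out === text) {
    return raw; // no-op: keep the raw bytes, including invalid ones
  }
  // Something was replaced, so the re-encoded text is the only option.
  return Buffer.from(out, "utf8");
}
```

In a streaming setting, a check such as `roundTripsLosslessly` could decide per slice whether the raw bytes can safely stand in for the decoded text.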

Anonymous Github

Anonymous Github is a system that helps anonymize GitHub repositories for double-anonymous paper submissions. A public instance of Anonymous Github is hosted at https://anonymous.4open.science/.


Anonymous Github anonymizes the following:

  • GitHub repository owner, organization, and name
  • File and directory names
  • File contents, whatever the extension (Markdown, plain text, Java, etc.)
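
As a rough illustration of how one term list can cover repository names, paths, and file contents alike, here is a minimal sketch. The function `anonymizeText` and the `XXXX-n` placeholder format are assumptions made for this example; the tool's real matching rules may differ.

```typescript
/** Escape characters that have a special meaning in a regular expression. */
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Replace every occurrence of each term (case-insensitive) with a numbered
 * placeholder. Hypothetical helper: placeholder format is an assumption.
 */
function anonymizeText(text: string, terms: string[], mask = "XXXX"): string {
  return terms.reduce((acc, term, i) => {
    if (term.trim() === "") return acc; // ignore empty terms
    const pattern = new RegExp(escapeRegExp(term), "gi");
    return acc.replace(pattern, `${mask}-${i + 1}`);
  }, text);
}

// The same function can be applied to a path or to file contents.
const maskedPath = anonymizeText("src/alice/Main.java", ["alice"]);
const maskedLine = anonymizeText("github.com/alice/secret-tool by Alice", ["alice", "secret-tool"]);
```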

Usage

Public instance

https://anonymous.4open.science/

CLI

This CLI tool allows you to anonymize your GitHub repositories locally, generating an anonymized zip file based on your configuration settings.

# Install the Anonymous GitHub CLI tool
npm install -g @tdurieux/anonymous_github

# Run the Anonymous GitHub CLI tool
anonymous_github

Own instance

1. Clone the repository

git clone https://github.com/tdurieux/anonymous_github/
cd anonymous_github
npm i

2. Configure the GitHub token

Create a .env file with the following contents:

GITHUB_TOKEN=<GITHUB_TOKEN>
CLIENT_ID=<CLIENT_ID>
CLIENT_SECRET=<CLIENT_SECRET>
PORT=5000
DB_USERNAME=
DB_PASSWORD=
AUTH_CALLBACK=http://localhost:5000/github/auth
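
To make the role of each variable concrete, the following sketch shows one way such settings could be read and validated at startup. `ServerConfig` and `loadConfig` are hypothetical names, and the choice of which variables are treated as required is an assumption for illustration, not a description of the server's code.

```typescript
// Hypothetical typed view of the .env values listed above.
interface ServerConfig {
  githubToken: string;
  clientId: string;
  clientSecret: string;
  port: number;
  dbUsername: string;
  dbPassword: string;
  authCallback: string;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Assumption: the three GitHub credentials must be present.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required variable ${name}`);
    return value;
  };
  const port = Number(env["PORT"] ?? "5000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return {
    githubToken: required("GITHUB_TOKEN"),
    clientId: required("CLIENT_ID"),
    clientSecret: required("CLIENT_SECRET"),
    port,
    dbUsername: env["DB_USERNAME"] ?? "",
    dbPassword: env["DB_PASSWORD"] ?? "",
    // Default mirrors the example value shown in the .env listing.
    authCallback: env["AUTH_CALLBACK"] ?? `http://localhost:${port}/github/auth`,
  };
}
```

In a Node.js process this could be called as `loadConfig(process.env)` after the .env file has been loaded.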

3. Start Anonymous Github server

docker-compose up -d

4. Go to Anonymous Github

Go to http://localhost:5000. By default, Anonymous Github uses port 5000; this can be changed in docker-compose.yml. I recommend putting Anonymous Github behind nginx to handle the HTTPS certificates.

What is the scope of anonymization?

In double-anonymous peer review, the boundary of anonymization is the paper plus its online appendix, and only that; it is not the whole world. Googling any part of the paper or the online appendix can be considered a deliberate attempt to break anonymity (explanation).

How does it work?

Anonymous Github either downloads the complete repository and anonymizes the contents of its files, or proxies the request to GitHub. In both cases, the original and anonymized versions of each file are cached on the server.
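
The caching behaviour described above can be pictured with a small in-memory sketch. `AnonymizedFileCache` is a hypothetical class, the loader is synchronous only to keep the example short, and the real server's storage and fetching logic are not shown here.

```typescript
// Hypothetical cache entry holding both versions of a file.
interface CachedFile {
  original: string;
  anonymized: string;
}

class AnonymizedFileCache {
  private readonly entries = new Map<string, CachedFile>();
  private readonly loadOriginal: (path: string) => string;
  private readonly anonymize: (text: string) => string;

  constructor(
    loadOriginal: (path: string) => string, // e.g. read from a download or a proxied request
    anonymize: (text: string) => string
  ) {
    this.loadOriginal = loadOriginal;
    this.anonymize = anonymize;
  }

  /** Return the anonymized file, loading and anonymizing it only on a cache miss. */
  get(path: string): string {
    const hit = this.entries.get(path);
    if (hit) return hit.anonymized;
    const original = this.loadOriginal(path);
    const anonymized = this.anonymize(original);
    this.entries.set(path, { original, anonymized });
    return anonymized;
  }
}
```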

gitmask is a tool for contributing anonymously to a GitHub repository.

blind-reviews is a browser add-on that enables a person reviewing a GitHub pull request to hide identifying information about the person submitting it.
