feat: add cli to anonymize repositories locally

tdurieux
2023-02-06 15:48:21 +01:00
parent d01c839616
commit dcf7f36917
9 changed files with 676 additions and 123 deletions

1
.gitignore vendored

@@ -1,4 +1,5 @@
.env .env
build
/repositories /repositories
repo/ repo/
db_backups db_backups


@@ -1,27 +1,25 @@
Anonymous Github # Anonymous Github
================
Anonymous Github is a system to anonymize Github repositories before referring to them in a double-anonymous paper submission. Anonymous Github is a system to anonymize Github repositories before referring to them in a double-anonymous paper submission.
To start using Anonymous Github right now: **[http://anonymous.4open.science/](http://anonymous.4open.science/)** To start using Anonymous Github right now: **[http://anonymous.4open.science/](http://anonymous.4open.science/)**
Indeed, in a double-anonymous review process, the open-science data or code that is in the online appendix must be anonymized, similarly to paper anonymization. The authors must Indeed, in a double-anonymous review process, the open-science data or code that is in the online appendix must be anonymized, similarly to paper anonymization. The authors must
* anonymize URLs: the name of the institution/department/group/authors should not appear in the URLs of the open-science appendix - anonymize URLs: the name of the institution/department/group/authors should not appear in the URLs of the open-science appendix
* anonymize the appendix content itself - anonymize the appendix content itself
Anonymizing an open-science appendix needs some work, but fortunately, this can be automated, this is what Anonymous Github is about. Anonymizing an open-science appendix needs some work, but fortunately, this can be automated, this is what Anonymous Github is about.
Anonymous Github anonymizes: Anonymous Github anonymizes:
* the Github owner / organization / repository name
* the content of the repository - the Github owner / organization / repository name
* file contents (all extensions, md/txt/java/etc) - the content of the repository
* file and directory names - file contents (all extensions, md/txt/java/etc)
- file and directory names
Question / Feedback / Bug report: please open an issue in this repository. Question / Feedback / Bug report: please open an issue in this repository.
Using Anonymous Github ## Using Anonymous Github
-----------------------
## How to create a new anonymized repository ## How to create a new anonymized repository
@@ -42,15 +40,25 @@ To start using Anonymous Github right now, a public instance of anonymous_github
In double-anonymous peer-review, the boundary of anonymization is the paper plus its online appendix, and only this, it's not the whole world. Googling any part of the paper or the online appendix can be considered as a deliberate attempt to break anonymity ([explanation](http://www.monperrus.net/martin/open-science-double-anonymous)) In double-anonymous peer-review, the boundary of anonymization is the paper plus its online appendix, and only this, it's not the whole world. Googling any part of the paper or the online appendix can be considered as a deliberate attempt to break anonymity ([explanation](http://www.monperrus.net/martin/open-science-double-anonymous))
## CLI
How does it work? This CLI tool allows you to anonymize your GitHub repositories locally, generating an anonymized zip file based on your configuration settings.
-----------------
```bash
# Install the Anonymous GitHub CLI tool
npm install -g @tdurieux/anonymous_github
# Run the Anonymous GitHub CLI tool
anonymous_github
```
## How does it work?
Anonymous Github either downloads the complete repository and anonymizes the content of the files or proxies the request to GitHub. In both cases, the original and anonymized versions of the file are cached on the server. Anonymous Github either downloads the complete repository and anonymizes the content of the files or proxies the request to GitHub. In both cases, the original and anonymized versions of the file are cached on the server.
Installing Anonymous Github ## Installing Anonymous Github
----------------------------
1. Clone the repository 1. Clone the repository
```bash ```bash
git clone https://github.com/tdurieux/anonymous_github/ git clone https://github.com/tdurieux/anonymous_github/
cd anonymous_github cd anonymous_github
@@ -76,6 +84,7 @@ AUTH_CALLBACK=http://localhost:5000/github/auth,
The callback of the GitHub app needs to be defined as `https://<host>/github/auth` (the same as defined in AUTH_CALLBACK). The callback of the GitHub app needs to be defined as `https://<host>/github/auth` (the same as defined in AUTH_CALLBACK).
3. Run Anonymous Github 3. Run Anonymous Github
```bash ```bash
docker-compose up -d docker-compose up -d
``` ```
@@ -84,14 +93,12 @@ docker-compose up -d
By default, Anonymous Github uses port 5000. It can be changed in `docker-compose.yml`. By default, Anonymous Github uses port 5000. It can be changed in `docker-compose.yml`.
## Related tools
Related tools
--------------
[gitmask](https://www.gitmask.com/) is a tool to anonymously contribute to a Github repository. [gitmask](https://www.gitmask.com/) is a tool to anonymously contribute to a Github repository.
[blind-reviews](https://github.com/zombie/blind-reviews/) is a browser add-on that enables a person reviewing a GitHub pull request to hide identifying information about the person submitting it. [blind-reviews](https://github.com/zombie/blind-reviews/) is a browser add-on that enables a person reviewing a GitHub pull request to hide identifying information about the person submitting it.
See also ## See also
--------
* [Open-science and double-anonymous Peer-Review](https://www.monperrus.net/martin/open-science-double-blind) - [Open-science and double-anonymous Peer-Review](https://www.monperrus.net/martin/open-science-double-blind)

99
cli.ts Normal file

@@ -0,0 +1,99 @@
#!/usr/bin/env node
import { config as dot } from "dotenv";
dot();
import { writeFile } from "fs/promises";
import { join } from "path";
import { tmpdir } from "os";
import * as gh from "parse-github-url";
import * as inquirer from "inquirer";
import config from "./config";
import GitHubDownload from "./src/source/GitHubDownload";
import Repository from "./src/Repository";
import AnonymizedRepositoryModel from "./src/database/anonymizedRepositories/anonymizedRepositories.model";
function generateRandomFileName(size: number) {
const characters =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
let result = "";
for (let i = 0; i < size; i++) {
result += characters.charAt(Math.floor(Math.random() * characters.length));
}
return result;
}
async function main() {
config.STORAGE = "filesystem";
const inq = await inquirer.prompt([
{
type: "string",
name: "token",
message: `Enter your GitHub token. You can create one at https://github.com/settings/personal-access-tokens/new.`,
default: process.env.GITHUB_TOKEN,
},
{
type: "string",
name: "repo",
message: `URL of the repository to anonymize (if you want to download a specific branch or commit use the GitHub URL of that branch or commit).`,
},
{
type: "string",
name: "terms",
message: `Terms to remove from your repository (separated with comma).`,
},
]);
const ghURL = gh(inq.repo) || { owner: "", name: "", branch: "", commit: "" };
const repository = new Repository(
new AnonymizedRepositoryModel({
repoId: "test",
source: {
type: "GitHubDownload",
accessToken: inq.token,
branch: ghURL.branch || "master",
commit: ghURL.branch || "HEAD",
repositoryName: `${ghURL.owner}/${ghURL.name}`,
},
options: {
terms: inq.terms.split(","),
expirationMode: "never",
update: false,
image: true,
pdf: true,
notebook: true,
link: true,
page: false,
},
})
);
const source = new GitHubDownload(
{
type: "GitHubDownload",
accessToken: inq.token,
repositoryName: inq.repo,
},
repository
);
console.info("[INFO] Downloading repository...");
await source.download(inq.token);
const outputFileName = join(tmpdir(), generateRandomFileName(8) + ".zip");
console.info("[INFO] Anonymizing repository and creating zip file...");
await writeFile(outputFileName, repository.zip());
console.log(`Anonymized repository saved at ${outputFileName}`);
}
if (require.main === module) {
if (process.argv[2] == "server") {
// start the server
require("./src/server").default();
} else {
// use the cli interface
main();
}
}
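
Review note on the `terms` prompt: `inq.terms.split(",")` keeps surrounding whitespace, and an empty answer yields `[""]` rather than an empty list. The sketch below shows one way the answer could be normalized before it reaches the repository options. `parseTerms` is a hypothetical helper, not part of this commit, and whether the anonymizer already tolerates blank or padded terms is not visible in this diff.

```typescript
// Hypothetical helper (not part of the commit): normalizes the raw answer
// to the "terms" prompt into a clean list of terms.
function parseTerms(raw: string): string[] {
  const seen = new Set<string>();
  for (const part of raw.split(",")) {
    const term = part.trim();
    // Skip empty entries such as those produced by "a,,b" or a trailing comma.
    if (term.length > 0) seen.add(term);
  }
  // A Set preserves insertion order, so the first occurrence of each term wins.
  return Array.from(seen);
}

console.log(parseTerms("tdurieux, KTH ,,tdurieux"));
```

Under that assumption, the call site would read `terms: parseTerms(inq.terms)`.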

599
package-lock.json generated

File diff suppressed because it is too large


@@ -1,8 +1,10 @@
{ {
"name": "anonymous_github", "name": "@tdurieux/anonymous_github",
"version": "2.1.0", "version": "2.1.0",
"description": "Anonymise Github repositories for double-anonymous reviews", "description": "Anonymise Github repositories for double-anonymous reviews",
"main": "index.ts", "bin": {
"anonymous_github": "build/cli.js"
},
"scripts": { "scripts": {
"test": "mocha --reporter spec", "test": "mocha --reporter spec",
"start": "node --inspect=5858 -r ts-node/register ./index.ts", "start": "node --inspect=5858 -r ts-node/register ./index.ts",
@@ -23,6 +25,10 @@
"url": "https://github.com/sponsors/tdurieux" "url": "https://github.com/sponsors/tdurieux"
}, },
"homepage": "https://github.com/tdurieux/anonymous_github#readme", "homepage": "https://github.com/tdurieux/anonymous_github#readme",
"files": [
"public",
"build"
],
"dependencies": { "dependencies": {
"@octokit/oauth-app": "^4.1.0", "@octokit/oauth-app": "^4.1.0",
"@octokit/rest": "^19.0.5", "@octokit/rest": "^19.0.5",
@@ -39,6 +45,7 @@
"express-session": "^1.17.3", "express-session": "^1.17.3",
"express-slow-down": "^1.5.0", "express-slow-down": "^1.5.0",
"got": "^11.8.5", "got": "^11.8.5",
"inquirer": "^8.2.5",
"istextorbinary": "^6.0.0", "istextorbinary": "^6.0.0",
"marked": "^4.1.1", "marked": "^4.1.1",
"mime-types": "^2.1.35", "mime-types": "^2.1.35",
@@ -63,6 +70,7 @@
"@types/express-session": "^1.17.5", "@types/express-session": "^1.17.5",
"@types/express-slow-down": "^1.3.2", "@types/express-slow-down": "^1.3.2",
"@types/got": "^9.6.12", "@types/got": "^9.6.12",
"@types/inquirer": "^8.0.0",
"@types/marked": "^4.0.7", "@types/marked": "^4.0.7",
"@types/mime-types": "^2.1.0", "@types/mime-types": "^2.1.0",
"@types/parse-github-url": "^1.0.0", "@types/parse-github-url": "^1.0.0",


@@ -15,6 +15,7 @@ import Conference from "./Conference";
import ConferenceModel from "./database/conference/conferences.model"; import ConferenceModel from "./database/conference/conferences.model";
import AnonymousError from "./AnonymousError"; import AnonymousError from "./AnonymousError";
import { downloadQueue } from "./queue"; import { downloadQueue } from "./queue";
import { isConnected } from "./database/database";
export default class Repository { export default class Repository {
private _model: IAnonymizedRepositoryDocument; private _model: IAnonymizedRepositoryDocument;
@@ -208,6 +209,7 @@ export default class Repository {
* Update the last view and view count * Update the last view and view count
*/ */
async countView() { async countView() {
if (!isConnected) return this.model;
this._model.lastView = new Date(); this._model.lastView = new Date();
this._model.pageView = (this._model.pageView || 0) + 1; this._model.pageView = (this._model.pageView || 0) + 1;
return this._model.save(); return this._model.save();
@@ -219,9 +221,11 @@ export default class Repository {
* @param errorMessage a potential error message to display * @param errorMessage a potential error message to display
*/ */
async updateStatus(status: RepositoryStatus, statusMessage?: string) { async updateStatus(status: RepositoryStatus, statusMessage?: string) {
if (!status) return this.model;
this._model.status = status; this._model.status = status;
this._model.statusDate = new Date(); this._model.statusDate = new Date();
this._model.statusMessage = statusMessage; this._model.statusMessage = statusMessage;
if (!isConnected) return this.model;
return this._model.save(); return this._model.save();
} }
@@ -247,13 +251,12 @@ export default class Repository {
* Reset/delete the state of the repository * Reset/delete the state of the repository
*/ */
async resetSate(status?: RepositoryStatus, statusMessage?: string) { async resetSate(status?: RepositoryStatus, statusMessage?: string) {
if (status) this._model.status = status; const p = this.updateStatus(status, statusMessage);
if (statusMessage) this._model.statusMessage = statusMessage;
// remove attribute // remove attribute
this._model.size = { storage: 0, file: 0 }; this._model.size = { storage: 0, file: 0 };
this._model.originalFiles = null; this._model.originalFiles = null;
// remove cache // remove cache
return Promise.all([this._model.save(), this.removeCache()]); return Promise.all([p, this.removeCache()]);
} }
/** /**
@@ -281,15 +284,15 @@ export default class Repository {
}> { }> {
if (this.status != "ready") return { storage: 0, file: 0 }; if (this.status != "ready") return { storage: 0, file: 0 };
if (this._model.size.file) return this._model.size; if (this._model.size.file) return this._model.size;
function recursiveCount(files) { function recursiveCount(files: Tree): { storage: number; file: number } {
const out = { storage: 0, file: 0 }; const out = { storage: 0, file: 0 };
for (const name in files) { for (const name in files) {
const file = files[name]; const file = files[name];
if (file.size && parseInt(file.size) == file.size) { if (file.size && parseInt(file.size.toString()) == file.size) {
out.storage += file.size as number; out.storage += file.size as number;
out.file++; out.file++;
} else if (typeof file == "object") { } else if (typeof file == "object") {
const r = recursiveCount(file); const r = recursiveCount(file as Tree);
out.storage += r.storage; out.storage += r.storage;
out.file += r.file; out.file += r.file;
} }
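
Review note on `recursiveCount`: the typed version in this hunk still relies on `parseInt(file.size.toString()) == file.size` and a cast to tell files from directories. A type guard is one alternative, sketched below. The `Tree` and `TreeFile` shapes here are assumptions made for illustration; the project's real types are not shown in this diff and may differ. Unlike the code in the hunk, this sketch also counts zero-byte files.

```typescript
// Assumed shapes, for illustration only; the project's real Tree and
// TreeFile types are defined elsewhere and may differ.
interface TreeFile {
  sha: string;
  size: number;
}
interface Tree {
  [name: string]: Tree | TreeFile;
}

// A node is treated as a file when it carries a numeric size.
function isFile(node: Tree | TreeFile): node is TreeFile {
  return typeof (node as TreeFile).size === "number";
}

// Sums file sizes and counts files across nested directories.
function countTree(files: Tree): { storage: number; file: number } {
  const out = { storage: 0, file: 0 };
  for (const name of Object.keys(files)) {
    const node = files[name];
    if (isFile(node)) {
      out.storage += node.size;
      out.file++;
    } else {
      const r = countTree(node);
      out.storage += r.storage;
      out.file += r.file;
    }
  }
  return out;
}

const example: Tree = {
  "README.md": { sha: "a1", size: 120 },
  src: { "index.ts": { sha: "b2", size: 300 }, lib: {} },
};
console.log(countTree(example));
```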


@@ -10,10 +10,13 @@ const MONGO_URL = `mongodb://${config.DB_USERNAME}:${config.DB_PASSWORD}@${confi
export const database = mongoose.connection; export const database = mongoose.connection;
export let isConnected = false;
export async function connect() { export async function connect() {
await mongoose.connect(MONGO_URL + "production", { await mongoose.connect(MONGO_URL + "production", {
authSource: "admin", authSource: "admin",
} as ConnectOptions); } as ConnectOptions);
isConnected = true;
return database; return database;
} }
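
Review note on `isConnected`: the flag lets `Repository` skip `save()` when the CLI runs without MongoDB. The same guard appears twice in `Repository.ts` (`countView` and `updateStatus`), so it could be factored into one helper, roughly as sketched below. `Saveable`, `FakeDoc`, and `saveIfConnected` are hypothetical names used only for this illustration, and the local `isConnected` variable stands in for the one exported by `database.ts`.

```typescript
// Stand-in for the flag exported by database.ts; false until connect() succeeds.
let isConnected = false;

// Minimal stand-in for a Mongoose document (hypothetical, for illustration).
interface Saveable {
  save(): Promise<this>;
}

// Persists the document only when a database connection exists;
// otherwise returns it unchanged so a CLI run never touches MongoDB.
async function saveIfConnected<T extends Saveable>(doc: T): Promise<T> {
  if (!isConnected) return doc;
  return doc.save();
}

// Fake document that records how often save() was called.
class FakeDoc implements Saveable {
  saves = 0;
  async save(): Promise<this> {
    this.saves++;
    return this;
  }
}
```

With such a helper, `countView` and `updateStatus` could end in `return saveIfConnected(this._model)`, assuming the model type satisfies the interface.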


@@ -38,7 +38,7 @@ export default class GitHubDownload extends GitHubBase implements SourceBase {
}); });
} }
async download() { async download(token?: string) {
const fiveMinuteAgo = new Date(); const fiveMinuteAgo = new Date();
fiveMinuteAgo.setMinutes(fiveMinuteAgo.getMinutes() - 5); fiveMinuteAgo.setMinutes(fiveMinuteAgo.getMinutes() - 5);
if ( if (
@@ -51,7 +51,10 @@ export default class GitHubDownload extends GitHubBase implements SourceBase {
}); });
let response: OctokitResponse<unknown, number>; let response: OctokitResponse<unknown, number>;
try { try {
response = await this._getZipUrl(await this.getToken()); if (!token) {
token = await this.getToken();
}
response = await this._getZipUrl(token);
} catch (error) { } catch (error) {
if (error.status == 401 && config.GITHUB_TOKEN) { if (error.status == 401 && config.GITHUB_TOKEN) {
try { try {
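
Review note on `download(token?: string)`: the new optional parameter lets the CLI pass the token it collected from the prompt, while the server path keeps calling `getToken()`. The fallback logic can be read in isolation as the sketch below. `TokenProvider` and `resolveToken` are hypothetical names, and the provider stands in for `this.getToken()`, whose implementation is not part of this diff.

```typescript
// Hypothetical token source standing in for GitHubBase.getToken().
type TokenProvider = () => Promise<string>;

// Returns the explicit token when one was given, otherwise asks the provider.
async function resolveToken(
  explicit: string | undefined,
  fallback: TokenProvider
): Promise<string> {
  // An empty string counts as "not provided", matching the `if (!token)` check in the diff.
  if (explicit) return explicit;
  return fallback();
}
```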


@@ -5,13 +5,13 @@
"compilerOptions": { "compilerOptions": {
"target": "es6", "target": "es6",
"module": "commonjs", "module": "commonjs",
"outDir": "dist", "outDir": "build",
"removeComments": true, "removeComments": true,
"preserveConstEnums": true, "preserveConstEnums": true,
"forceConsistentCasingInFileNames": true, "forceConsistentCasingInFileNames": true,
"sourceMap": false, "sourceMap": false,
"skipLibCheck": true "skipLibCheck": true
}, },
"include": ["src/**/*.ts", "index.ts", "tests3.ts"], "include": ["src/**/*.ts", "index.ts", "cli.ts"],
"exclude": ["node_modules", ".vscode"] "exclude": ["node_modules", ".vscode"]
} }