Mirror of https://github.com/tdurieux/anonymous_github.git (synced 2026-02-12 18:32:44 +00:00)
feat: add cli to anonymize repositories locally
.gitignore (1 change, vendored)
@@ -1,4 +1,5 @@
 .env
+build
 /repositories
 repo/
 db_backups
README.md (47 changes)
@@ -1,27 +1,25 @@
-Anonymous Github
-================
+# Anonymous Github
 
 Anonymous Github is a system to anonymize Github repositories before referring to them in a double-anonymous paper submission.
 To start using Anonymous Github right now: **[http://anonymous.4open.science/](http://anonymous.4open.science/)**
 
 Indeed, in a double-anonymous review process, the open-science data or code that is in the online appendix must be anonymized, similarly to paper anonymization. The authors must
 
-* anonymize URLs: the name of the institution/department/group/authors should not appear in the URLs of the open-science appendix
-* anonymize the appendix content itself
+- anonymize URLs: the name of the institution/department/group/authors should not appear in the URLs of the open-science appendix
+- anonymize the appendix content itself
 
 Anonymizing an open-science appendix needs some work, but fortunately this can be automated: this is what Anonymous Github is about.
 
 Anonymous Github anonymizes:
-* the Github owner / organization / repository name
-* the content of the repository
-* file contents (all extensions, md/txt/java/etc)
-* file and directory names
+
+- the Github owner / organization / repository name
+- the content of the repository
+- file contents (all extensions, md/txt/java/etc)
+- file and directory names
 
 Question / Feedback / Bug report: please open an issue in this repository.
 
-Using Anonymous Github
------------------------
+## Using Anonymous Github
 
-
 ## How to create a new anonymized repository
 
@@ -42,15 +40,25 @@ To start using Anonymous Github right now, a public instance of anonymous_github
 
 In double-anonymous peer-review, the boundary of anonymization is the paper plus its online appendix, and only this, it's not the whole world. Googling any part of the paper or the online appendix can be considered as a deliberate attempt to break anonymity ([explanation](http://www.monperrus.net/martin/open-science-double-anonymous))
 
-How does it work?
------------------
+## CLI
+
+This CLI tool allows you to anonymize your GitHub repositories locally, generating an anonymized zip file based on your configuration settings.
+
+```bash
+# Install the Anonymous GitHub CLI tool
+npm install -g @tdurieux/anonymous_github
+
+# Run the Anonymous GitHub CLI tool
+anonymous_github
+```
+## How does it work?
 
 Anonymous Github either downloads the complete repository and anonymizes the content of the files, or proxies the request to GitHub. In both cases, the original and anonymized versions of the file are cached on the server.
 
-Installing Anonymous Github
-----------------------------
+## Installing Anonymous Github
+
 1. Clone the repository
 
 ```bash
 git clone https://github.com/tdurieux/anonymous_github/
 cd anonymous_github
@@ -76,6 +84,7 @@ AUTH_CALLBACK=http://localhost:5000/github/auth,
 The callback of the GitHub app needs to be defined as `https://<host>/github/auth` (the same as defined in AUTH_CALLBACK).
 
 3. Run Anonymous Github
+
 ```bash
 docker-compose up -d
 ```
@@ -84,14 +93,12 @@ docker-compose up -d
 
 By default, Anonymous Github uses port 5000. It can be changed in `docker-compose.yml`.
 
+## Related tools
 
-Related tools
---------------
 [gitmask](https://www.gitmask.com/) is a tool to anonymously contribute to a Github repository.
 
 [blind-reviews](https://github.com/zombie/blind-reviews/) is a browser add-on that enables a person reviewing a GitHub pull request to hide identifying information about the person submitting it.
 
-See also
---------
+## See also
 
-* [Open-science and double-anonymous Peer-Review](https://www.monperrus.net/martin/open-science-double-blind)
+- [Open-science and double-anonymous Peer-Review](https://www.monperrus.net/martin/open-science-double-blind)
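Illustration (not part of the commit): the README says that file contents are anonymized, and the CLI asks for a comma-separated list of terms to remove. A minimal sketch of what term-based masking could look like is given below. The helper names `escapeRegExp` and `anonymizeContent`, the case-insensitive matching, and the `XXXX-n` mask format are assumptions made for this example; the project's actual replacement logic is not shown in this diff.

```typescript
// Hypothetical helpers, written for illustration only.

// Escape characters that have a special meaning in a regular expression,
// so that each term is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every occurrence of each term with a numbered mask.
// The mask format ("XXXX-1", "XXXX-2", ...) is an assumption.
function anonymizeContent(
  content: string,
  terms: string[],
  mask: string = "XXXX"
): string {
  let result = content;
  terms.forEach((term, index) => {
    const trimmed = term.trim();
    if (trimmed === "") return; // ignore empty entries such as a trailing comma
    const pattern = new RegExp(escapeRegExp(trimmed), "gi");
    // A replacer function avoids "$" being interpreted in the replacement.
    result = result.replace(pattern, () => `${mask}-${index + 1}`);
  });
  return result;
}

const sample = "Code by Jane Doe, Example University.";
console.log(anonymizeContent(sample, ["jane doe", "Example University"]));
// prints "Code by XXXX-1, XXXX-2."
```

Numbering the masks by term position keeps distinct terms distinguishable in the anonymized output without revealing them.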
cli.ts (99 changes, normal file)
@@ -0,0 +1,99 @@
+#!/usr/bin/env node
+
+import { config as dot } from "dotenv";
+dot();
+
+import { writeFile } from "fs/promises";
+import { join } from "path";
+import { tmpdir } from "os";
+
+import * as gh from "parse-github-url";
+import * as inquirer from "inquirer";
+
+import config from "./config";
+import GitHubDownload from "./src/source/GitHubDownload";
+import Repository from "./src/Repository";
+import AnonymizedRepositoryModel from "./src/database/anonymizedRepositories/anonymizedRepositories.model";
+
+function generateRandomFileName(size: number) {
+  const characters =
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
+  let result = "";
+  for (let i = 0; i < size; i++) {
+    result += characters.charAt(Math.floor(Math.random() * characters.length));
+  }
+  return result;
+}
+
+async function main() {
+  config.STORAGE = "filesystem";
+  const inq = await inquirer.prompt([
+    {
+      type: "string",
+      name: "token",
+      message: `Enter your GitHub token. You can create one at https://github.com/settings/personal-access-tokens/new.`,
+      default: process.env.GITHUB_TOKEN,
+    },
+    {
+      type: "string",
+      name: "repo",
+      message: `URL of the repository to anonymize (if you want to download a specific branch or commit use the GitHub URL of that branch or commit).`,
+    },
+    {
+      type: "string",
+      name: "terms",
+      message: `Terms to remove from your repository (separated with comma).`,
+    },
+  ]);
+
+  const ghURL = gh(inq.repo) || { owner: "", name: "", branch: "", commit: "" };
+
+  const repository = new Repository(
+    new AnonymizedRepositoryModel({
+      repoId: "test",
+      source: {
+        type: "GitHubDownload",
+        accessToken: inq.token,
+        branch: ghURL.branch || "master",
+        commit: ghURL.branch || "HEAD",
+        repositoryName: `${ghURL.owner}/${ghURL.name}`,
+      },
+      options: {
+        terms: inq.terms.split(","),
+        expirationMode: "never",
+        update: false,
+        image: true,
+        pdf: true,
+        notebook: true,
+        link: true,
+        page: false,
+      },
+    })
+  );
+
+  const source = new GitHubDownload(
+    {
+      type: "GitHubDownload",
+      accessToken: inq.token,
+      repositoryName: inq.repo,
+    },
+    repository
+  );
+
+  console.info("[INFO] Downloading repository...");
+  await source.download(inq.token);
+  const outputFileName = join(tmpdir(), generateRandomFileName(8) + ".zip");
+  console.info("[INFO] Anonymizing repository and creating the zip file...");
+  await writeFile(outputFileName, repository.zip());
+  console.log(`Anonymized repository saved at ${outputFileName}`);
+}
+
+if (require.main === module) {
+  if (process.argv[2] == "server") {
+    // start the server
+    require("./src/server").default();
+  } else {
+    // use the cli interface
+    main();
+  }
+}
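Illustration (not part of the commit): `main()` passes the prompt answers on almost unchanged, so `terms` may contain padded or empty entries (for example after a trailing comma). The sketch below shows one way such answers could be normalized before use. `parseRepoUrl` is a hypothetical, dependency-free stand-in for `parse-github-url` that only understands `https://github.com/<owner>/<name>` and `/tree/<branch>` URLs; `parseTerms` and the `RepoRef` type are likewise names invented for this example. The `"master"` default mirrors the fallback used in `cli.ts`.

```typescript
// Hypothetical helpers, written for illustration only.

interface RepoRef {
  owner: string;
  name: string;
  branch: string;
}

// Extract owner, repository name, and optional branch from a GitHub URL.
// Returns null for anything that is not a plain repository or /tree/ URL.
function parseRepoUrl(url: string, defaultBranch: string = "master"): RepoRef | null {
  const pattern =
    /^https?:\/\/github\.com\/([^\/\s]+)\/([^\/\s#?]+?)(?:\.git)?(?:\/tree\/([^\s#?]+?))?\/?$/;
  const match = pattern.exec(url.trim());
  if (!match) return null;
  return {
    owner: match[1],
    name: match[2],
    branch: match[3] || defaultBranch,
  };
}

// Split a comma-separated answer into trimmed, non-empty terms.
function parseTerms(raw: string): string[] {
  return raw
    .split(",")
    .map((term) => term.trim())
    .filter((term) => term.length > 0);
}

console.log(parseRepoUrl("https://github.com/tdurieux/anonymous_github/tree/main"));
console.log(parseTerms(" Jane Doe, ,Example University,"));
```

Returning `null` for unrecognized URLs lets the caller stop with a clear message instead of continuing with an empty owner and name.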
package-lock.json (599 changes, generated)
File diff suppressed because it is too large
package.json (12 changes)
@@ -1,8 +1,10 @@
 {
-  "name": "anonymous_github",
+  "name": "@tdurieux/anonymous_github",
   "version": "2.1.0",
   "description": "Anonymise Github repositories for double-anonymous reviews",
-  "main": "index.ts",
+  "bin": {
+    "anonymous_github": "build/cli.js"
+  },
   "scripts": {
     "test": "mocha --reporter spec",
     "start": "node --inspect=5858 -r ts-node/register ./index.ts",
@@ -23,6 +25,10 @@
     "url": "https://github.com/sponsors/tdurieux"
   },
   "homepage": "https://github.com/tdurieux/anonymous_github#readme",
+  "files": [
+    "public",
+    "build"
+  ],
   "dependencies": {
     "@octokit/oauth-app": "^4.1.0",
     "@octokit/rest": "^19.0.5",
@@ -39,6 +45,7 @@
     "express-session": "^1.17.3",
     "express-slow-down": "^1.5.0",
     "got": "^11.8.5",
+    "inquirer": "^8.2.5",
     "istextorbinary": "^6.0.0",
     "marked": "^4.1.1",
     "mime-types": "^2.1.35",
@@ -63,6 +70,7 @@
     "@types/express-session": "^1.17.5",
     "@types/express-slow-down": "^1.3.2",
     "@types/got": "^9.6.12",
+    "@types/inquirer": "^8.0.0",
     "@types/marked": "^4.0.7",
     "@types/mime-types": "^2.1.0",
     "@types/parse-github-url": "^1.0.0",
src/Repository.ts
@@ -15,6 +15,7 @@ import Conference from "./Conference";
 import ConferenceModel from "./database/conference/conferences.model";
 import AnonymousError from "./AnonymousError";
 import { downloadQueue } from "./queue";
+import { isConnected } from "./database/database";
 
 export default class Repository {
   private _model: IAnonymizedRepositoryDocument;
@@ -208,6 +209,7 @@ export default class Repository {
    * Update the last view and view count
    */
   async countView() {
+    if (!isConnected) return this.model;
     this._model.lastView = new Date();
     this._model.pageView = (this._model.pageView || 0) + 1;
     return this._model.save();
@@ -219,9 +221,11 @@ export default class Repository {
    * @param errorMessage a potential error message to display
    */
   async updateStatus(status: RepositoryStatus, statusMessage?: string) {
+    if (!status) return this.model;
     this._model.status = status;
     this._model.statusDate = new Date();
     this._model.statusMessage = statusMessage;
+    if (!isConnected) return this.model;
     return this._model.save();
   }
 
@@ -247,13 +251,12 @@ export default class Repository {
    * Reset/delete the state of the repository
    */
   async resetSate(status?: RepositoryStatus, statusMessage?: string) {
-    if (status) this._model.status = status;
-    if (statusMessage) this._model.statusMessage = statusMessage;
+    const p = this.updateStatus(status, statusMessage);
     // remove attribute
     this._model.size = { storage: 0, file: 0 };
     this._model.originalFiles = null;
     // remove cache
-    return Promise.all([this._model.save(), this.removeCache()]);
+    return Promise.all([p, this.removeCache()]);
   }
 
   /**
@@ -281,15 +284,15 @@ export default class Repository {
   }> {
     if (this.status != "ready") return { storage: 0, file: 0 };
     if (this._model.size.file) return this._model.size;
-    function recursiveCount(files) {
+    function recursiveCount(files: Tree): { storage: number; file: number } {
       const out = { storage: 0, file: 0 };
       for (const name in files) {
         const file = files[name];
-        if (file.size && parseInt(file.size) == file.size) {
+        if (file.size && parseInt(file.size.toString()) == file.size) {
           out.storage += file.size as number;
           out.file++;
-        } else if (typeof file == "object") {
-          const r = recursiveCount(file);
+        } else if (typeof file == "object") {
+          const r = recursiveCount(file as Tree);
           out.storage += r.storage;
           out.file += r.file;
         }
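Illustration (not part of the commit): the last hunk types `recursiveCount`, which walks a nested file tree and sums file sizes and file counts. A self-contained sketch of the same traversal follows. The `FileNode` and `Tree` shapes, the `isFile` type guard, and the name `countFiles` are assumptions for this example; the project's real `Tree` type is not shown in the diff. Unlike the original condition, this version also counts files whose size is 0.

```typescript
// Hypothetical types, written for illustration only.
interface FileNode {
  size: number;
}

interface Tree {
  [name: string]: Tree | FileNode;
}

// A node is treated as a file when it carries a numeric size.
function isFile(node: Tree | FileNode): node is FileNode {
  return typeof (node as FileNode).size === "number";
}

// Recursively sum the size and the number of files in the tree.
function countFiles(tree: Tree): { storage: number; file: number } {
  const out = { storage: 0, file: 0 };
  for (const name of Object.keys(tree)) {
    const node = tree[name];
    if (isFile(node)) {
      out.storage += node.size;
      out.file++;
    } else {
      // Directories contribute the totals of their children.
      const sub = countFiles(node);
      out.storage += sub.storage;
      out.file += sub.file;
    }
  }
  return out;
}

const exampleTree: Tree = {
  "README.md": { size: 120 },
  src: {
    "index.ts": { size: 300 },
    lib: { "util.ts": { size: 80 } },
  },
};

console.log(countFiles(exampleTree)); // prints { storage: 500, file: 3 }
```

Using a type guard removes the need for the `as Tree` and `as number` casts that the hunk relies on.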
src/database/database.ts
@@ -10,10 +10,13 @@ const MONGO_URL = `mongodb://${config.DB_USERNAME}:${config.DB_PASSWORD}@${confi
 
 export const database = mongoose.connection;
 
+export let isConnected = false;
+
 export async function connect() {
   await mongoose.connect(MONGO_URL + "production", {
     authSource: "admin",
   } as ConnectOptions);
+  isConnected = true;
+
   return database;
 }
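Illustration (not part of the commit): the `isConnected` flag lets `Repository` skip `save()` when no database connection exists, which is the situation in CLI mode. The guard can be sketched in isolation as below. `markConnected`, `saveIfConnected`, and `FakeModel` are hypothetical names for this example, and `FakeModel` is an in-memory stand-in rather than a Mongoose document.

```typescript
// Module-level flag, flipped once a connection has been established.
let isConnected = false;

// Hypothetical helper standing in for the end of connect().
function markConnected(): void {
  isConnected = true;
}

// Persist the model only when a database connection exists;
// otherwise hand the in-memory object back unchanged.
async function saveIfConnected<T extends { save(): Promise<unknown> }>(
  model: T
): Promise<T> {
  if (!isConnected) return model;
  await model.save();
  return model;
}

// In-memory stand-in for a database document, used to observe save() calls.
class FakeModel {
  saves = 0;
  async save(): Promise<void> {
    this.saves++;
  }
}

const model = new FakeModel();
saveIfConnected(model).then(() => {
  console.log(`saves without a connection: ${model.saves}`);
});
```

Centralizing the check in one helper would avoid repeating `if (!isConnected) return this.model;` in every method that persists state.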
src/source/GitHubDownload.ts
@@ -38,7 +38,7 @@ export default class GitHubDownload extends GitHubBase implements SourceBase {
     });
   }
 
-  async download() {
+  async download(token?: string) {
     const fiveMinuteAgo = new Date();
     fiveMinuteAgo.setMinutes(fiveMinuteAgo.getMinutes() - 5);
     if (
@@ -51,7 +51,10 @@ export default class GitHubDownload extends GitHubBase implements SourceBase {
     });
     let response: OctokitResponse<unknown, number>;
     try {
-      response = await this._getZipUrl(await this.getToken());
+      if (!token) {
+        token = await this.getToken();
+      }
+      response = await this._getZipUrl(token);
     } catch (error) {
       if (error.status == 401 && config.GITHUB_TOKEN) {
         try {
tsconfig.json
@@ -5,13 +5,13 @@
   "compilerOptions": {
     "target": "es6",
     "module": "commonjs",
-    "outDir": "dist",
+    "outDir": "build",
     "removeComments": true,
     "preserveConstEnums": true,
     "forceConsistentCasingInFileNames": true,
     "sourceMap": false,
     "skipLibCheck": true
   },
-  "include": ["src/**/*.ts", "index.ts", "tests3.ts"],
+  "include": ["src/**/*.ts", "index.ts", "cli.ts"],
   "exclude": ["node_modules", ".vscode"]
 }