Security hardening + gist UI fixes (#731)

* security: harden against XSS, ReDoS, path traversal, and injection

Defensive fixes across the server, storage, and viewer:

- XSS (CWE-79): sanitise rendered notebooks with DOMPurify, escape file
  names interpolated into AngularJS expressions (escapeNgString), set
  Mermaid securityLevel to 'strict', and stop urlRel2abs from returning
  javascript:/vbscript:/data:text/html URLs.
- Path traversal / zip-slip (CWE-22/23/24): validate URL-derived path
  components before they reach the storage layer (file/webview routes +
  StorageBase.assertSafePath) and sanitise zip entry names on extract for
  both the filesystem and S3 backends.
- ReDoS (CWE-1333): treat anonymization terms with catastrophic-backtracking
  shapes as escaped literals instead of compiling them as regexes.
- Secret hardening (CWE-798): require SESSION_SECRET, OAuth credentials, and
  the DB password in production; fall back to a random SESSION_SECRET in dev.
- Rate-limit spoofing (CWE-290): derive request.ip via trust-proxy hop
  count instead of the client-settable cf-connecting-ip header.
- NoSQL injection (CWE-943): allow only plain field paths as admin sort keys.
- Reject malformed streamer requests missing required string fields.
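
A minimal sketch of the kind of path check described above, assuming POSIX-style relative paths that arrive from URL segments or zip entry names. `isSafeRelativePath` and `safeExtractPath` are hypothetical helpers for illustration; the commit's actual `StorageBase.assertSafePath` and zip-entry sanitising may differ (for instance by stripping unsafe segments rather than rejecting the whole path).

```javascript
const path = require("path");

// Hypothetical helper: accept only plain relative paths that cannot climb out
// of the storage root. Rejects NUL bytes, backslashes ("..\" payloads),
// absolute or drive-letter paths, and any ".." segment.
function isSafeRelativePath(p) {
  if (typeof p !== "string" || p.length === 0) return false;
  if (p.includes("\0") || p.includes("\\")) return false;
  if (p.startsWith("/") || /^[A-Za-z]:/.test(p)) return false;
  return !p.split("/").includes("..");
}

// Hypothetical zip-slip guard: resolve the entry name under `root` and refuse
// it unless the resolved target is strictly inside `root`.
function safeExtractPath(root, entryName) {
  const rootAbs = path.resolve(root);
  const target = path.resolve(rootAbs, entryName);
  if (!target.startsWith(rootAbs + path.sep)) {
    throw new Error(`zip entry escapes extraction dir: ${entryName}`);
  }
  return target;
}

console.log(isSafeRelativePath("src/index.js")); // true
console.log(isSafeRelativePath("..\\secrets")); // false
console.log(safeExtractPath("/tmp/out", "a/b.txt")); // /tmp/out/a/b.txt on POSIX
```

Resolving first and then checking containment catches absolute entry names as well as `..` sequences, which a purely textual check on the entry name can miss.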

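The secret-hardening rule can be sketched as follows, assuming configuration is read from a `process.env`-like object. `resolveSessionSecret` is a hypothetical name; the commit applies the same "required in production" rule to OAuth credentials and the DB password, which this sketch leaves out.

```javascript
const crypto = require("crypto");

// Hypothetical helper: fail fast in production when SESSION_SECRET is unset,
// otherwise fall back to a random per-process secret for local development.
function resolveSessionSecret(env) {
  if (env.SESSION_SECRET) return env.SESSION_SECRET;
  if (env.NODE_ENV === "production") {
    throw new Error("SESSION_SECRET must be set in production");
  }
  // 32 random bytes, hex-encoded (64 characters). Not persisted anywhere.
  return crypto.randomBytes(32).toString("hex");
}

const secret = resolveSessionSecret(process.env);
```

The trade-off of a random dev fallback is that local sessions are invalidated on every restart, in exchange for never shipping a hard-coded default secret.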
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(ui): make gists reachable/visible and clarify the ZIP button

- Gist & PR routes now accept a trailing slash (/gist/:id/:path*?), so the
  dashboard links (which end in "/") resolve to the gist/PR page instead of
  falling through to the 404 route (#725).
- Gist viewer picks the default tab after content loads, defaulting to
  "files" when files exist; previously the ng-init ran before the async
  load and a files-only gist rendered blank under the hidden comments tab.
- Explorer toolbar: relabel ZIP to "Full repo ZIP" with a tooltip, and add
  tooltips to Raw/Download clarifying they apply to the current file (#721).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: report SAML-enforced orgs clearly instead of "token expired"

When a repo's organization enforces SAML SSO, GitHub returns a 403 whose
message differs from the OAuth-App-restriction case. That 403 fell through
to the generic handler and surfaced as "token_expired", pushing users to
re-login when the real fix is authorizing their token for the org. Detect
the "SAML enforcement" message and raise a dedicated, actionable error
instead (#379, #550).
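
A minimal sketch of the detection, assuming the handler only sees the 403 message text. GitHub's SAML 403 body contains the phrase "SAML enforcement" (the commit keys off that text); the OAuth App phrase and the error codes other than `token_expired` are assumptions made for this example, not the project's own identifiers.

```javascript
// Hypothetical classifier for a GitHub 403 response message.
function classifyGitHub403(message) {
  const msg = String(message || "");
  // Org enforces SAML SSO: the token must be authorized for that org.
  if (/SAML enforcement/i.test(msg)) return "saml_enforced";
  // Assumed wording for the OAuth-App-restriction case.
  if (/OAuth App access restrictions/i.test(msg)) return "oauth_app_restricted";
  // Anything else keeps the previous generic outcome.
  return "token_expired";
}

console.log(
  classifyGitHub403(
    "Resource protected by organization SAML enforcement. " +
      "You must grant your Personal Access token access to this organization.",
  ),
); // saml_enforced
```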

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* security: catch nested quantified groups in ReDoS guard and backslash path traversal

- hasCatastrophicBacktracking now scans across nested parens ([\s\S]*?)
  so shapes like ((a+))+ are detected; comment reframed as a heuristic
  backstop rather than a proof.
- file route path-traversal check now rejects backslash separators and a
  leading backslash, covering Windows-style "..\" payloads (CWE-22/25).
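
A sketch of a nested-quantifier heuristic in the spirit of `hasCatastrophicBacktracking`, assuming terms are user-supplied regex source strings. The pattern below is an illustrative guess, not the commit's exact regex, and `escapeRegExp`/`termToRegExp` are hypothetical helpers. Like the commit's version, it is a backstop rather than a proof: overlapping alternations such as `(a|a)*` are not detected, and some harmless shapes like `(a+)(b)+` are flagged and simply matched literally.

```javascript
// A group containing a + or * quantifier that is itself quantified.
// [\s\S]*? lets the scan cross nested parens, so ((a+))+ is caught too.
const NESTED_QUANTIFIER = /\([\s\S]*?[+*][\s\S]*?\)[+*{]/;

function hasCatastrophicBacktracking(pattern) {
  return NESTED_QUANTIFIER.test(pattern);
}

// Escape every regex metacharacter so the term matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive global matcher for an anonymization term.
// Risky or syntactically invalid patterns degrade to a literal match.
function termToRegExp(term) {
  if (hasCatastrophicBacktracking(term)) {
    return new RegExp(escapeRegExp(term), "gi");
  }
  try {
    return new RegExp(term, "gi");
  } catch {
    return new RegExp(escapeRegExp(term), "gi");
  }
}

console.log("x (a+)+ y".replace(termToRegExp("(a+)+"), "XXX")); // x XXX y
```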

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(dev): track dev-proxy script, ignore .DS_Store and .claude/

scripts/dev-proxy.js is referenced by the "dev:ui" npm script but was
never committed, breaking the command on a fresh clone. Add it and
ignore local-only macOS/Claude Code files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Author: Thomas Durieux (committed by GitHub)
Date: 2026-06-18 04:50:55 -07:00
Parent: bdfcc56d81
Commit: e4ffd74068
21 changed files with 484 additions and 23 deletions
+186 scripts/dev-proxy.js (new file)
@@ -0,0 +1,186 @@
/**
 * Dev proxy for local UI iteration.
 *
 * Serves the local `public/` folder for HTML/CSS/JS/partials/images so you
 * see your design changes instantly, and proxies everything else (API,
 * auth, repo content, …) to the live https://anonymous.4open.science site.
 *
 *   npm run dev:ui              # default port 4001
 *   PORT=5000 npm run dev:ui
 *
 * Notes
 * - Set UPSTREAM to proxy a different site (defaults to the live host).
 * - Cookies from upstream are rewritten so they stick on localhost:
 *     • `Secure` flag stripped
 *     • `Domain=anonymous.4open.science` stripped
 *     • `SameSite=None` downgraded to `SameSite=Lax`
 * - GitHub OAuth callback points at the production host, so live sign-in
 *   won't complete against localhost. You can still browse as an anonymous
 *   visitor (landing page, FAQ, anonymous repo mirrors) with full data.
 */
const fs = require("fs");
const path = require("path");
const express = require("express");
const {
  createProxyMiddleware,
  responseInterceptor,
} = require("http-proxy-middleware");

const UPSTREAM = process.env.UPSTREAM || "https://anonymous.4open.science";
const PORT = parseInt(process.env.PORT || "4001", 10);
const PUBLIC_DIR = path.resolve(__dirname, "..", "public");

// Re-read the manifest on each request so gulp rebuilds are picked up
// instantly. Falls back to the unhashed name if the manifest is missing,
// unreadable, or has no entry for `name`.
const manifestPath = path.join(PUBLIC_DIR, "asset-manifest.json");
function asset(name) {
  try {
    const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf-8"));
    return manifest[name] || name;
  } catch {
    return name;
  }
}

// Paths that should always be served from the local `public/` folder.
// Anything else falls through to the proxy.
const LOCAL_PREFIXES = [
  "/css/",
  "/script/",
  "/partials/",
  "/fonts/",
  "/imgs/",
  "/i18n/",
  "/favicon/",
  "/favicon.ico",
  "/robots.txt",
];

function isLocalPath(urlPath) {
  if (urlPath === "/" || urlPath === "/index.html") return true;
  return LOCAL_PREFIXES.some((p) => urlPath.startsWith(p));
}

const app = express();

// Render the SPA shell from the local public/index.html with the asset-hash
// placeholders filled in. Read on every call so edits show up immediately.
function renderIndex() {
  return fs
    .readFileSync(path.join(PUBLIC_DIR, "index.html"), "utf-8")
    .replace("__CORE_JS__", asset("core.min.js"))
    .replace("__VENDOR_JS__", asset("vendor.min.js"))
    .replace("__MERMAID_JS__", asset("mermaid.min.js"))
    .replace("__ALL_CSS__", asset("all.min.css"));
}

// Static file server for public/, created once and reused for every request.
const serveStatic = express.static(PUBLIC_DIR, {
  fallthrough: true,
  etag: false,
  cacheControl: false,
});

// 0) Serve hashed asset filenames (name.<10 hex>.ext) by stripping the hash.
app.get(/^\/(script|css)\/(.+)\.([a-f0-9]{10})\.(min\.\w+|\w+)$/, (req, res, next) => {
  const dir = req.params[0];
  const base = req.params[1]; // decoded by Express, so "%2F.." becomes "/.."
  const ext = req.params[3];
  const filePath = path.resolve(PUBLIC_DIR, dir, `${base}.${ext}`);
  // Only serve files that stay inside public/ and actually exist.
  if (!filePath.startsWith(PUBLIC_DIR + path.sep) || !fs.existsSync(filePath)) {
    return next();
  }
  res.sendFile(filePath);
});

// 1) Local static for the UI shell.
app.use((req, res, next) => {
  if (req.method !== "GET" || !isLocalPath(req.path)) return next();
  res.setHeader("Cache-Control", "no-store, max-age=0");
  // The SPA entry: serve index.html with asset-hash placeholders filled in.
  if (req.path === "/" || req.path === "/index.html") {
    res.type("html").send(renderIndex());
    return;
  }
  serveStatic(req, res, next);
});

// 2) SPA catch-all: serve the local index.html for HTML page navigations so
//    every client-side route uses the local shell (with split bundles).
//    /api/, /github/ and /w/ requests still go to the proxy.
app.use((req, res, next) => {
  const accept = req.headers.accept || "";
  if (
    req.method === "GET" &&
    accept.includes("text/html") &&
    !req.path.startsWith("/api/") &&
    !req.path.startsWith("/github/") &&
    !req.path.startsWith("/w/")
  ) {
    res.type("html").send(renderIndex());
    return;
  }
  next();
});

// 3) Proxy everything else to the live site.
//    onProxyReq/onProxyRes/logLevel are http-proxy-middleware v2 options;
//    v3 moved them under `on: { proxyReq, proxyRes }` and `logger`.
const UPSTREAM_ORIGIN = new URL(UPSTREAM).origin;
// Matches absolute references to the upstream origin inside HTML bodies.
const UPSTREAM_ORIGIN_RE = new RegExp(
  UPSTREAM_ORIGIN.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"),
  "g",
);

app.use(
  createProxyMiddleware({
    target: UPSTREAM,
    changeOrigin: true,
    secure: true,
    ws: true,
    xfwd: false,
    followRedirects: false,
    selfHandleResponse: true, // so we can rewrite Set-Cookie + HTML
    // With selfHandleResponse, http-proxy skips its own response-header
    // passes, so these two are effectively no-ops; onProxyRes rewrites
    // cookies itself.
    cookieDomainRewrite: "",
    cookiePathRewrite: "/",
    onProxyReq(proxyReq, req) {
      // changeOrigin already sets Host; also send same-origin Origin and
      // Referer headers so upstream origin checks accept the request.
      proxyReq.setHeader("origin", UPSTREAM_ORIGIN);
      proxyReq.setHeader("referer", UPSTREAM_ORIGIN + req.originalUrl);
    },
    onProxyRes: responseInterceptor(async (buffer, proxyRes, req, res) => {
      // Rewrite Set-Cookie so cookies stick on localhost.
      const setCookie = proxyRes.headers["set-cookie"];
      if (setCookie) {
        const rewritten = setCookie.map((c) =>
          c
            .replace(/;\s*Secure/gi, "")
            .replace(/;\s*Domain=[^;]+/gi, "")
            .replace(/;\s*SameSite=None/gi, "; SameSite=Lax"),
        );
        res.setHeader("set-cookie", rewritten);
      }

      // Turn 3xx redirects back to upstream into local, path-only redirects.
      const location = proxyRes.headers["location"];
      if (location && typeof location === "string") {
        try {
          const u = new URL(location, UPSTREAM);
          if (u.origin === UPSTREAM_ORIGIN) {
            res.setHeader("location", u.pathname + u.search + u.hash);
          }
        } catch {
          /* leave as-is */
        }
      }

      const ct = String(proxyRes.headers["content-type"] || "");
      if (ct.includes("text/html")) {
        // Strip the upstream origin from HTML so absolute links become
        // root-relative and navigation stays on localhost.
        return buffer.toString("utf8").replace(UPSTREAM_ORIGIN_RE, "");
      }
      return buffer;
    }),
    logLevel: "warn",
  }),
);

app.listen(PORT, () => {
  console.log(
    `\n dev-proxy http://localhost:${PORT}` +
      `\n → local: ${PUBLIC_DIR}` +
      `\n → upstream ${UPSTREAM}\n`,
  );
});
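
The cookie and redirect rewrites performed in `onProxyRes` can be checked without a running proxy. The sketch below pulls them into two pure functions; `rewriteSetCookie` and `rewriteLocation` are hypothetical names (the script inlines this logic), and `UPSTREAM` is hard-coded to the default host for the example.

```javascript
const UPSTREAM = "https://anonymous.4open.science";

// Drop Secure and Domain attributes and downgrade SameSite=None to Lax so
// the browser keeps the cookie for http://localhost.
function rewriteSetCookie(cookie) {
  return cookie
    .replace(/;\s*Secure/gi, "")
    .replace(/;\s*Domain=[^;]+/gi, "")
    .replace(/;\s*SameSite=None/gi, "; SameSite=Lax");
}

// Turn redirects to the upstream origin into local, path-only ones; other
// hosts and unparsable values are returned unchanged.
function rewriteLocation(location, upstream = UPSTREAM) {
  try {
    const u = new URL(location, upstream);
    if (u.origin === new URL(upstream).origin) {
      return u.pathname + u.search + u.hash;
    }
  } catch {
    /* leave as-is */
  }
  return location;
}

console.log(
  rewriteSetCookie(
    "sid=abc; Path=/; Domain=anonymous.4open.science; Secure; HttpOnly; SameSite=None",
  ),
); // sid=abc; Path=/; HttpOnly; SameSite=Lax
console.log(rewriteLocation("https://anonymous.4open.science/r/abc?x=1#top")); // /r/abc?x=1#top
```

The SameSite downgrade is needed because browsers reject `SameSite=None` cookies that lack `Secure`, so once `Secure` is stripped for plain-HTTP localhost, leaving `None` in place would cause the cookie to be dropped.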