* security: harden against XSS, ReDoS, path traversal, and injection
Defensive fixes across the server, storage, and viewer:
- XSS (CWE-79): sanitise rendered notebooks with DOMPurify, escape file
names interpolated into AngularJS expressions (escapeNgString), set
Mermaid securityLevel to 'strict', and stop urlRel2abs from returning
javascript:/vbscript:/data:text/html URLs.
- Path traversal / zip-slip (CWE-22/23/24): validate URL-derived path
components before they reach the storage layer (file/webview routes +
StorageBase.assertSafePath) and sanitise zip entry names on extract for
both the filesystem and S3 backends.
- ReDoS (CWE-1333): treat anonymization terms whose shape risks
catastrophic backtracking as escaped literals instead of compiling
them as regexes.
- Secret hardening (CWE-798): require SESSION_SECRET / OAuth creds / DB
password in production; fall back to a random SESSION_SECRET in dev only.
- Rate-limit spoofing (CWE-290): derive request.ip via trust-proxy hop
count instead of the client-settable cf-connecting-ip header.
- NoSQL injection (CWE-943): allow only plain field paths as admin sort keys.
- Reject malformed streamer requests missing required string fields.
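For reviewers, a minimal sketch of what "plain field paths only" can look
like for the admin sort key. The names isPlainFieldPath and safeSortKey
and the exact character set are assumptions for illustration, not the
code in this commit.

```typescript
// Hypothetical helper: accept only dotted identifiers such as
// "source.repositoryName". Keys starting with "$", empty path parts,
// and non-string values (e.g. objects from a parsed query) are refused.
const PLAIN_FIELD_PATH = /^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$/;

function isPlainFieldPath(key: unknown): key is string {
  return typeof key === "string" && PLAIN_FIELD_PATH.test(key);
}

// Hypothetical wrapper: fall back to a known-good key instead of
// passing an attacker-shaped value into the sort specification.
function safeSortKey(key: unknown, fallback: string): string {
  return isPlainFieldPath(key) ? key : fallback;
}
```

An allowlist of known column names would be stricter still; the pattern
check is the smallest change that keeps arbitrary operators out.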
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(ui): make gists reachable/visible and clarify the ZIP button
- Gist & PR routes now accept a trailing slash (/gist/:id/:path*?), so the
dashboard links (which end in "/") resolve to the gist/PR page instead of
falling through to the 404 route (#725).
- Gist viewer picks the default tab after content loads, defaulting to
"files" when files exist; previously the ng-init ran before the async
load and a files-only gist rendered blank under the hidden comments tab.
- Explorer toolbar: relabel ZIP to "Full repo ZIP" with a tooltip, and add
tooltips to Raw/Download clarifying they apply to the current file (#721).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix: report SAML-enforced orgs clearly instead of "token expired"
When a repo's organization enforces SAML SSO, GitHub returns a 403 whose
message differs from the OAuth-App-restriction case. That 403 fell through
to the generic handler and surfaced as "token_expired", pushing users to
re-login when the real fix is authorizing their token for the org. Detect
the "SAML enforcement" message and raise a dedicated, actionable error
instead (#379, #550).
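
A minimal sketch of the detection described above. The function name
classifyForbidden and the error code "org_saml_enforced" are hypothetical;
only the "SAML enforcement" substring and the "token_expired" fallback
come from this change.

```typescript
type AccessErrorCode = "org_saml_enforced" | "token_expired";

// Hypothetical classifier for a 403 from the GitHub API: check the
// response message before falling back to the generic handling.
function classifyForbidden(message: string | undefined): AccessErrorCode {
  if (message !== undefined && /SAML enforcement/i.test(message)) {
    // The user must authorize the token for the org; re-login won't help.
    return "org_saml_enforced";
  }
  return "token_expired";
}
```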
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* security: catch nested quantified groups in ReDoS guard and backslash path traversal
- hasCatastrophicBacktracking now scans across nested parens ([\s\S]*?)
so shapes like ((a+))+ are detected; comment reframed as a heuristic
backstop rather than a proof.
- file route path-traversal check now rejects backslash separators and a
leading backslash, covering Windows-style "..\" payloads (CWE-22/25).
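A minimal sketch of the guard plus the literal fallback, for illustration
only. The pattern below is an assumed approximation of
hasCatastrophicBacktracking, and compileTerm and escapeRegExp are
hypothetical names. It is a heuristic: it over-matches some safe patterns
and does not prove a pattern is safe.

```typescript
// A quantified group whose body also contains a quantifier. The lazy
// [\s\S]*? lets the match span nested parens, so "((a+))+" is caught.
const NESTED_QUANTIFIER = /\([\s\S]*?[+*][\s\S]*?\)[+*]/;

function hasNestedQuantifier(term: string): boolean {
  return NESTED_QUANTIFIER.test(term);
}

// Escape every regex metacharacter so the term matches literally.
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Risky shapes are matched as literal text; other terms compile as-is.
// Note: a term that is not a valid regex still throws here.
function compileTerm(term: string): RegExp {
  const source = hasNestedQuantifier(term) ? escapeRegExp(term) : term;
  return new RegExp(source, "g");
}
```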
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(dev): track dev-proxy script, ignore .DS_Store and .claude/
scripts/dev-proxy.js is referenced by the "dev:ui" npm script but was
never committed, breaking the command on a fresh clone. Add it and
ignore local-only macOS/Claude Code files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
- AnonymizedFile.anonymizedContent(): propagate content errors to the
anonymizer so callers see the failure instead of hanging.
- AnonymizedFile.send() local path: add error handler on the anonymizer
transform between content and response pipes.
- S3.send(): handle errors on the S3 body stream so an unhandled
"error" event cannot crash the process.
- S3.archive() / FileSystem.archive(): propagate read-stream errors
to the file transformer so archiver sees the failure.
- Add frontend translations for new error codes.
Add admin endpoints to ban and activate users, block banned users
from all auth flows (OAuth, token login, bearer auth), and invalidate
existing sessions on next request. Includes frontend translations and
ban/activate buttons on the user detail page.
Rewrite repo_access_limited, repo_not_found, repo_empty, and
repo_not_accessible to point users at concrete next steps instead
of stating only that something failed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up review pass after the cache fixes turned up several bugs in
the same family — silent failures that look like success to the client,
plus content-correctness issues in the ZIP and per-file delivery paths.
- zipStream: stop calling archive.finalize() on upstream/parser errors.
That produced a valid-looking ZIP (200 OK, archive opens) silently
missing entries — same class as #694, but worse because the user has
no signal anything went wrong. Destroy the response on failure
instead so the client sees a connection drop.
- zipStream: apply per-repo image/pdf gates inside the entry handler.
The single-file /file/... endpoint refuses to serve those types
via AnonymizedFile.isFileSupported when image=false / pdf=false, but
the ZIP shipped them anyway — privacy-relevant for maintainers who
toggle image=false to suppress identifying screenshots. Threaded
contentOptions through both ZIP entry points (direct and streamer).
- GitHubUtils.getToken: validate the OAuth token-refresh response
before persisting. On a non-2xx response or a body without a string
token, we used to overwrite the stored token with `undefined`, which
then propagated as `Authorization: token undefined` to every API
call — 401 even on public repos, with the config.GITHUB_TOKEN
fallback unreachable because the field was no longer falsy.
- AnonymizedFile.send (streamer branch): forward Content-Type from the
upstream streamer response. got.stream(...).pipe(res) carries body
bytes only, so the parent response had no Content-Type and browsers
guessed (text rendered as download, etc.). Also resolve on
res.on("finish") in addition to "close" — keep-alive sockets stay
open long after the response is delivered, delaying countView().
- Repository.updateIfNeeded: persist a renamed source.repositoryName
even when the commit hasn't changed. Previously the new value lived
in memory only and was overwritten on the next reload, so the
rename detection ran every request.
- Repository.anonymize: stop materialising a dummy {path:"",name:""}
FileModel for empty repos. That row collided with the special case
in AnonymizedFile.getFileInfo and surfaced in unfiltered listings.
- streamer/route POST /: reject filePath segments containing ".." or
empty parts. Defence in depth — the parent server validates against
FileModel before calling, but the streamer joins filePath straight
into the storage path, so any future caller forwarding an
unvalidated path could traverse out of the repo root.
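
A minimal sketch of the streamer-side filePath check from the last item.
The name isSafeFilePath is hypothetical, and treating only an exact ".."
segment as unsafe is an assumption about the rule.

```typescript
// Hypothetical validator: split on "/" and refuse any empty part
// (leading, trailing, or doubled slash) and any ".." segment.
function isSafeFilePath(filePath: string): boolean {
  if (filePath.length === 0) return false;
  return filePath
    .split("/")
    .every((segment) => segment !== "" && segment !== "..");
}
```

Names that merely contain two dots, such as "a..b", still pass, since
they cannot move up a directory when joined into the storage path.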
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the anonymizer doesn't change a slice's text, the streamer used
to push Buffer.from(out, "utf8") — which loses any invalid-UTF-8 bytes
in the input (replaced by U+FFFD via StringDecoder). Files
mistakenly classified as text (binary blobs without a known extension,
text with stray non-UTF-8 bytes, BOMs) came out corrupted even though
nothing in the term list matched.
Track the raw chunk bytes alongside the decoded `pending`. On flush —
where we have every byte buffered — emit the original buffer directly
when the output equals the input, so a pure passthrough is bit-exact.
In the streaming OVERLAP path, do the same when the decode for that
slice round-trips losslessly; fall back to encoded output otherwise
(unchanged from before for that case).
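
A minimal sketch of the flush-path idea, assuming the whole input is
buffered. emitSlice is a hypothetical name, and TextDecoder stands in for
the StringDecoder used by the streamer.

```typescript
// If anonymization leaves the decoded text unchanged, return the
// original bytes so invalid UTF-8 and BOMs survive untouched.
function emitSlice(
  raw: Uint8Array,
  anonymize: (text: string) => string
): Uint8Array {
  // Non-fatal decode: invalid bytes become U+FFFD, a leading BOM is dropped.
  const text = new TextDecoder("utf-8").decode(raw);
  const out = anonymize(text);
  if (out === text) {
    return raw; // pure passthrough, bit-exact
  }
  // Text changed: re-encode, accepting the lossy decode as before.
  return new TextEncoder().encode(out);
}
```

Comparing the output against the decoded text, not the raw bytes, is what
makes the passthrough safe: an equal string means no term matched.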
Also add the "missing_content" locale entry for the
/api/anonymize-preview route.