* security: harden against XSS, ReDoS, path traversal, and injection
Defensive fixes across the server, storage, and viewer:
- XSS (CWE-79): sanitise rendered notebooks with DOMPurify, escape file
names interpolated into AngularJS expressions (escapeNgString), set
Mermaid securityLevel to 'strict', and stop urlRel2abs from returning
javascript:/vbscript:/data:text/html URLs.
- Path traversal / zip-slip (CWE-22/23/24): validate URL-derived path
components before they reach the storage layer (file/webview routes +
StorageBase.assertSafePath) and sanitise zip entry names on extract for
both the filesystem and S3 backends.
- ReDoS (CWE-1333): treat anonymization terms whose shape can trigger
catastrophic backtracking as escaped literals instead of compiling
them as regexes.
- Secret hardening (CWE-798): require SESSION_SECRET / OAuth creds / DB
password in production; fall back to a random SESSION_SECRET in
development.
- Rate-limit spoofing (CWE-290): derive request.ip via trust-proxy hop
count instead of the client-settable cf-connecting-ip header.
- NoSQL injection (CWE-943): allow only plain field paths as admin sort keys.
- Reject malformed streamer requests missing required string fields.
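
A minimal sketch of the ReDoS fallback described above, not the actual
implementation. The names escapeRegExp, looksCatastrophic, and
compileTerm are hypothetical, and the shape check is a heuristic
backstop rather than a proof of safety:

```typescript
/** Escape every regex metacharacter so the term matches only itself. */
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Heuristic (assumption): flag a quantified group whose body contains
 * another quantifier, e.g. (a+)+ or ((a+))+. It can produce false
 * positives, which only cost a literal match instead of a regex match.
 */
function looksCatastrophic(term: string): boolean {
  return /\([\s\S]*?[+*][\s\S]*?\)[+*]/.test(term);
}

/** Compile a user-supplied term, falling back to a literal when risky or invalid. */
function compileTerm(term: string): RegExp {
  let source = term;
  if (looksCatastrophic(term)) {
    source = escapeRegExp(term);
  } else {
    try {
      new RegExp(term); // syntax check only
    } catch {
      source = escapeRegExp(term);
    }
  }
  return new RegExp(source, "g");
}
```

With this sketch, compileTerm("(a+)+") matches only the literal text
"(a+)+", while a benign term such as "a+" is still compiled as a regex.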
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(ui): make gists reachable/visible and clarify the ZIP button
- Gist & PR routes now accept a trailing slash (/gist/:id/:path*?), so the
dashboard links (which end in "/") resolve to the gist/PR page instead of
falling through to the 404 route (#725).
- Gist viewer picks the default tab after content loads, defaulting to
"files" when files exist; previously the ng-init ran before the async
load and a files-only gist rendered blank under the hidden comments tab.
- Explorer toolbar: relabel ZIP to "Full repo ZIP" with a tooltip, and add
tooltips to Raw/Download clarifying they apply to the current file (#721).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix: report SAML-enforced orgs clearly instead of "token expired"
When a repo's organization enforces SAML SSO, GitHub returns a 403 whose
message differs from the OAuth-App-restriction case. That 403 fell through
to the generic handler and surfaced as "token_expired", pushing users to
re-login when the real fix is authorizing their token for the org. Detect
the "SAML enforcement" message and raise a dedicated, actionable error
instead (#379, #550).
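
A rough sketch of the classification, assuming the 403 is told apart by
its message text. The error code names other than token_expired, the
GitHubHttpError shape, and the OAuth-App message pattern are
assumptions for illustration:

```typescript
// Hypothetical error codes; only "token_expired" appears in the text above.
type ForbiddenCode = "saml_sso_required" | "oauth_app_restricted" | "token_expired";

// Assumed minimal shape of the HTTP error surfaced by the GitHub client.
interface GitHubHttpError {
  status: number;
  message: string;
}

/** Map a GitHub 403 to a specific code; returns undefined for other statuses. */
function classifyForbidden(err: GitHubHttpError): ForbiddenCode | undefined {
  if (err.status !== 403) return undefined;
  // Check the SAML case first so it no longer falls through to the generic branch.
  if (/SAML enforcement/i.test(err.message)) return "saml_sso_required";
  if (/OAuth App access restrictions/i.test(err.message)) return "oauth_app_restricted";
  return "token_expired";
}
```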
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* security: catch nested quantified groups in ReDoS guard and backslash path traversal
- hasCatastrophicBacktracking now scans across nested parens ([\s\S]*?)
so shapes like ((a+))+ are detected; comment reframed as a heuristic
backstop rather than a proof.
- file route path-traversal check now rejects backslash separators and a
leading backslash, covering Windows-style "..\" payloads (CWE-22/25).
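
For illustration, a path check along these lines could look like the
sketch below. isSafeRepoPath is a hypothetical name, and the input is
assumed to be already URL-decoded:

```typescript
/**
 * Return true only for relative, forward-slash paths with no ".." segment.
 * Backslashes are rejected outright so "..\" payloads cannot slip through
 * on platforms that treat "\" as a separator.
 */
function isSafeRepoPath(path: string): boolean {
  if (path.includes("\0") || path.includes("\\")) return false;
  if (path.startsWith("/")) return false;
  return !path.split("/").some((segment) => segment === "..");
}
```

Rejecting every backslash is stricter than necessary on POSIX hosts,
but it keeps the check independent of the platform's separator rules.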
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(dev): track dev-proxy script, ignore .DS_Store and .claude/
scripts/dev-proxy.js is referenced by the "dev:ui" npm script but was
never committed, breaking the command on a fresh clone. Add it and
ignore local-only macOS/Claude Code files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
- AnonymizedFile.anonymizedContent(): propagate content errors to the
anonymizer so callers see the failure instead of hanging.
- AnonymizedFile.send() local path: add error handler on the anonymizer
transform between content and response pipes.
- S3.send(): handle errors on the S3 body stream so an unhandled
'error' event cannot crash the process.
- S3.archive() / FileSystem.archive(): propagate read-stream errors
to the file transformer so archiver sees the failure.
- Add frontend translations for new error codes.
Add the missing error handler on the anonymizer transform stream in the
streamer route. Without it, an upstream error tears down the pipe and
the anonymizer emits an unhandled error that crashes the process
(surfacing as ECONNRESET to the main server).
Classify transient network errors (ReadError, ECONNRESET, ETIMEDOUT)
as upstream_error/502 instead of file_not_found/404 so they are
distinguishable in logs and do not poison downstream caches.
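
One way to express that classification, as a sketch only. The function
name and the FetchFailure shape are hypothetical; the error names and
codes are the ones listed above:

```typescript
interface Classified {
  code: "upstream_error" | "file_not_found";
  httpStatus: 502 | 404;
}

// Assumed shape: HTTP client errors expose `name`, Node socket errors expose `code`.
interface FetchFailure {
  name?: string;
  code?: string;
}

const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT"]);

/** Separate transient upstream failures from genuinely missing files. */
function classifyFetchFailure(err: FetchFailure): Classified {
  const transient =
    err.name === "ReadError" ||
    (err.code !== undefined && TRANSIENT_CODES.has(err.code));
  return transient
    ? { code: "upstream_error", httpStatus: 502 }
    : { code: "file_not_found", httpStatus: 404 };
}
```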
Update handleError tests to match the existing sanitization behavior
that returns internal_error for non-AnonymousError instances.
When a GitHub repo has no commits, the API returns a 409, which was
unhandled and surfaced as raw HttpError warnings. Now throw a
repo_empty AnonymousError, consistent with the existing convention.
All warn/error log calls now use field names the dashboard's decorate()
function recognizes: `code` for the error code pill, `httpStatus` for the
status badge and severity bucket, `url` for the sidebar link, and
`repoId` for the repository link.
Key changes:
- Streamer errors surface code, httpStatus, url, and nested err in Raw tab
- Nested `{ err: serializeError(e) }` replaced with spread pattern so
error fields (name, message, status) appear at the top level
- Raw Error objects in catch blocks now go through serializeError()
- Rate limit, token, and PR 404 warnings include code + httpStatus
- Dashboard stack walker traverses both `cause` and `err` chains
- Dashboard Raw tab renders repoId, filePath, upstream*, err, and cause
- trimRawArg recursively trims stacks in nested err/cause chains
- clampPayload strips heavy nested fields before falling back to a
truncated placeholder, preserving flat diagnostic fields
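
The spread pattern from the list above might look roughly like this.
The serializeError body here is a simplified stand-in, not the
project's implementation, and buildLogFields is a hypothetical helper:

```typescript
interface SerializedError {
  name: string;
  message: string;
  stack?: string;
  status?: number;
}

// Simplified stand-in for the project's serializeError().
function serializeError(e: unknown): SerializedError {
  if (e instanceof Error) {
    const status = (e as { status?: unknown }).status;
    return {
      name: e.name,
      message: e.message,
      stack: e.stack,
      ...(typeof status === "number" ? { status } : {}),
    };
  }
  return { name: "NonError", message: String(e) };
}

interface LogContext {
  code: string;
  httpStatus: number;
  url?: string;
  repoId?: string;
}

/** Spread the error so name/message/status sit at the top level, next to the context fields. */
function buildLogFields(e: unknown, ctx: LogContext) {
  return { ...serializeError(e), ...ctx };
}
```

Compared with `{ err: serializeError(e) }`, the flat object exposes
`code` and `httpStatus` directly, which is what the field-name
convention above relies on.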
Add admin endpoints to ban and activate users, block banned users
from all auth flows (OAuth, token login, bearer auth), and invalidate
existing sessions on next request. Includes frontend translations and
ban/activate buttons on the user detail page.
Floating button now initializes with theme-aware colors and updates
on toggle. Status page iframe uses a tuned CSS filter in dark mode
to blend with the warm palette.
- Use $location.search() instead of window.location.search for URL
params so cross-page links (owner, conference, search filters) work
with AngularJS client-side navigation
- Add missing removeRepository() in both repos and user detail controllers
- Remove the spurious $scope.$apply() in removeCache() that caused digest errors
- Add confirmation prompts and list refresh after remove/cache operations
Refresh button now always updates the commit to the latest SHA instead
of preserving the stale one in edit mode. Both create and update routes
verify the commit still exists on GitHub before persisting.
Mongoose treats `gist` as a nested path, not a sub-schema, so
set("gist", payload) mis-casts the inner subdoc arrays and fails
validation with 'Cast to [string] failed' at gist.files.0. Set each
subpath individually so the files/comments arrays cast correctly.
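
A small sketch of setting each subpath separately. setNestedPaths and
the PathSettable interface are hypothetical; the only Mongoose API
assumed is Document.set(path, value):

```typescript
// Minimal structural type covering the one document method used here.
interface PathSettable {
  set(path: string, value: unknown): unknown;
}

/** Call set() once per top-level key so each nested path is cast on its own. */
function setNestedPaths(
  doc: PathSettable,
  root: string,
  payload: Record<string, unknown>
): void {
  for (const [key, value] of Object.entries(payload)) {
    doc.set(`${root}.${key}`, value);
  }
}
```

Usage would be along the lines of setNestedPaths(doc, "gist", payload),
which issues set("gist.files", ...) and set("gist.comments", ...)
instead of a single set("gist", payload).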
Surface the path/repo/url tied to an AnonymousError when it gets
serialized for logging — previously logs only carried name, message,
and httpStatus, which made file_not_found entries impossible to trace
back to a specific file or repo. Extract the existing detail formatting
out of toString() into a public detail() method, harden it against
AnonymizedFile getters that can throw, and have serializeError include
the result as a "detail" field.
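
As an illustration of the hardening, a detail formatter can read each
field inside its own try/catch so one throwing getter does not lose
the rest. The ErrorSubject shape and detailOf name are hypothetical:

```typescript
// Assumed shape of the value attached to an error (file, repo, or URL info).
interface ErrorSubject {
  repoId?: string;
  filePath?: string;
  url?: string;
}

/** Build a "key=value" summary, skipping fields that are empty or whose getter throws. */
function detailOf(subject: ErrorSubject | undefined): string | undefined {
  if (!subject) return undefined;
  const parts: string[] = [];
  for (const key of ["repoId", "filePath", "url"] as const) {
    try {
      const value = subject[key]; // may invoke a getter that throws
      if (typeof value === "string" && value.length > 0) {
        parts.push(`${key}=${value}`);
      }
    } catch {
      // Ignore the failing field; logging must never throw.
    }
  }
  return parts.length > 0 ? parts.join(" ") : undefined;
}
```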
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Give the deps and prod-deps stages distinct BuildKit cache IDs with
sharing=locked so concurrent npm ci runs no longer race on
/root/.npm and fail with ENOTEMPTY.
- Drop the post-install 'npm cache clean --force' in prod-deps; it
was wiping the cache mount it had just filled.
- Update base images from node:21 to node:22 (LTS) to silence
EBADENGINE warnings from packages requiring node >= 22.
- Multi-stage Dockerfile with BuildKit npm cache mounts and a separate
prod-deps stage so source edits don't reinstall or prune.
- Tighter .dockerignore to shrink build context.
- Healthchecks: add start_period and tighten interval/retries so
containers report healthy as soon as the process is actually ready
instead of after a full polling interval.
- Move recoverStuckPreparing() off the startup critical path; the
recovery sweep now runs in the background after app.listen.
- depends_on uses condition: service_healthy and the obsolete
compose 'version' key is gone.
- New scripts/build.sh + scripts/deploy.sh: deploy.sh builds, exits
early if the image is unchanged, runs a blue/green streamer swap
(scale to 2N, wait for healthy in parallel, drop the old containers),
then recreates the API with --no-deps to avoid compose's depends_on
re-poll.
The LFS-pointer probe buffered up to 150 bytes before deciding whether
to forward the blob or swap to the raw URL. For blobs that fit entirely
in the probe, decide() ran from the source's end event and attached
data/end listeners to an already-ended stream, so out.end() was never
called. The response hung until upstream timed out and storage.write
left an incomplete cached copy, which then forced a re-fetch on every
subsequent read.
Pass a sourceEnded flag through decide() and end the output directly
when the source has already finished. Also skip the GitHub blob fetch
when the tree size is already over MAX_FILE_SIZE, surfacing
file_too_big instead of a translated 422.
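
The sourceEnded fix can be sketched as follows. Source and Sink are
hypothetical minimal interfaces standing in for the real streams, the
probe is modelled as a string, and swapToRaw is an assumed callback:

```typescript
interface Sink {
  write(chunk: string): void;
  end(): void;
}

interface Source {
  on(event: "data" | "end", listener: (chunk?: string) => void): void;
}

// First line of a Git LFS pointer file.
const LFS_PREFIX = "version https://git-lfs.github.com/spec/v1";

/** Decide what to do once the probe is filled or the source has ended. */
function decide(
  probe: string,
  source: Source,
  out: Sink,
  sourceEnded: boolean,
  swapToRaw: () => void
): void {
  if (probe.startsWith(LFS_PREFIX)) {
    swapToRaw(); // pointer file: fetch the real content from the raw URL
    return;
  }
  out.write(probe);
  if (sourceEnded) {
    // The whole blob fit in the probe. Listeners attached now would never
    // fire, so finish the output here instead of waiting for "end".
    out.end();
    return;
  }
  source.on("data", (chunk) => {
    if (chunk !== undefined) out.write(chunk);
  });
  source.on("end", () => out.end());
}
```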