Commit Graph

654 Commits

Author SHA1 Message Date
tdurieux 1edc9b7221 fix: remove never expiration 2026-07-02 15:06:12 +03:00
tdurieux 1c1993f972 fix: try to fix gist 2026-07-02 14:53:26 +03:00
Thomas Durieux 839582c657 Fix .bat anonymization, truncated-tree misses, submodule warning, account deletion (#742)
* fix: anonymize Windows batch scripts (#735)

mime-types maps .bat to application/x-msdownload, the same MIME type as
.exe/.dll, so batch scripts were classified as binary and streamed
through without any anonymization. Special-case .bat/.cmd as text before
the MIME lookup, keeping .exe/.dll binary.
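
A minimal sketch of the ordering described above, not the project's actual code. The names `TEXT_OVERRIDES`, `MIME_BY_EXTENSION`, `extensionOf`, and `isTextFile` are hypothetical, and the small lookup table stands in for a real MIME library:

```typescript
// Extensions treated as text regardless of what the MIME table says (assumed list).
const TEXT_OVERRIDES = [".bat", ".cmd"];

// Stand-in for a MIME lookup; only the entries needed for the illustration.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".bat": "application/x-msdownload",
  ".exe": "application/x-msdownload",
  ".dll": "application/x-msdownload",
  ".txt": "text/plain",
};

// Returns the lower-cased extension of the last path segment, or "" if none.
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

function isTextFile(filePath: string): boolean {
  const ext = extensionOf(filePath);
  // The override is checked before the MIME lookup, so .bat never reaches it.
  if (TEXT_OVERRIDES.indexOf(ext) !== -1) return true;
  const mime = MIME_BY_EXTENSION[ext] || "application/octet-stream";
  return mime.indexOf("text/") === 0;
}
```

With this ordering, `isTextFile("scripts/build.bat")` is true while `isTextFile("bin/tool.exe")` stays false.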

* fix: recover files missing from truncated tree listings (#738)

GitHub truncates tree listings of very large repositories. Folders whose
listing was truncated are recorded in truncatedFolders, but files that
fell outside the listing never reached the database. Requesting them
returned 404 file_not_found even though they exist on GitHub, and a
force refresh could not help.

When a file lookup misses and its directory is under a truncated folder,
fetch the file metadata directly from GitHub's contents API (object
media type, so it works past the 1MB inline limit), cache it in the
database, and serve it normally.
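
The fallback decision can be sketched as follows. This is an illustration under assumptions: `isUnderTruncatedFolder` and `shouldQueryGitHubDirectly` are hypothetical names, and folder paths are assumed to be relative to the repository root with `""` meaning the root itself.

```typescript
// True when the file's directory is a truncated folder or lies below one.
function isUnderTruncatedFolder(filePath: string, truncatedFolders: string[]): boolean {
  const slash = filePath.lastIndexOf("/");
  const dir = slash === -1 ? "" : filePath.slice(0, slash);
  return truncatedFolders.some((folder) => {
    const normalized = folder.replace(/\/+$/, ""); // drop trailing slashes
    if (normalized === "") return true; // the root listing itself was truncated
    return dir === normalized || dir.indexOf(normalized + "/") === 0;
  });
}

// Only a database miss inside a truncated folder justifies a direct GitHub call.
function shouldQueryGitHubDirectly(
  filePath: string,
  foundInDatabase: boolean,
  truncatedFolders: string[]
): boolean {
  return !foundInDatabase && isUnderTruncatedFolder(filePath, truncatedFolders);
}
```

Matching on whole path segments keeps a truncated `src` folder from triggering lookups for unrelated paths such as `srcx/file.ts`.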

* feat: warn when a repository uses git submodules (#737)

GitHub archives and tree listings never include submodule contents, so
submodules end up as empty folders in the anonymized repository, which
surprises users. Detect a root .gitmodules file and show a warning
banner in the explorer explaining that submodule contents are not
included.
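
As a rough illustration of the detection step, assuming the repository's file paths are available as a flat list relative to the root (the function name `usesSubmodules` is hypothetical):

```typescript
// Only a .gitmodules file at the repository root counts; nested copies are ignored.
function usesSubmodules(repoPaths: string[]): boolean {
  return repoPaths.some((path) => path === ".gitmodules");
}
```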

* feat: allow users to delete their account (#741)

Add DELETE /api/user: removes all anonymized repositories, gists, and
pull requests owned by the user, best-effort revokes the GitHub OAuth
grant, and scrubs personal data (username, emails, tokens, GitHub id,
photo) from the user record. The record itself is kept with a
placeholder username so removed repoIds stay reserved and owner
references remain resolvable.
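
A sketch of the scrubbing step only, under assumed names: the `UserRecord` shape, its field names, and the placeholder format are hypothetical and chosen for illustration. The point is that the record survives with its identifier while every personal field is cleared.

```typescript
// Hypothetical user shape; the real schema may differ.
interface UserRecord {
  id: string;
  username: string;
  emails: string[];
  accessTokens: string[];
  githubId?: string;
  photo?: string;
}

// Returns a copy that keeps the id (so references still resolve)
// but carries no personal data.
function scrubUser(user: UserRecord): UserRecord {
  return {
    id: user.id,
    username: `deleted-user-${user.id}`, // placeholder format is an assumption
    emails: [],
    accessTokens: [],
    githubId: undefined,
    photo: undefined,
  };
}
```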

The settings page gains an Account section with a confirmed delete
button.

* fix: add missing error translations for token_expired and job_is_active

The error-code coverage test failed because both backend codes had no
frontend translation.
2026-07-02 13:35:48 +02:00
dependabot[bot] e4e102eccd chore(deps-dev): bump http-proxy-middleware from 3.0.5 to 3.0.7 (#732)
Bumps [http-proxy-middleware](https://github.com/chimurai/http-proxy-middleware) from 3.0.5 to 3.0.7.
- [Release notes](https://github.com/chimurai/http-proxy-middleware/releases)
- [Changelog](https://github.com/chimurai/http-proxy-middleware/blob/v3.0.7/CHANGELOG.md)
- [Commits](https://github.com/chimurai/http-proxy-middleware/compare/v3.0.5...v3.0.7)

---
updated-dependencies:
- dependency-name: http-proxy-middleware
  dependency-version: 3.0.7
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-26 08:05:27 +02:00
dependabot[bot] b98ad338fe chore(deps-dev): bump form-data from 2.5.5 to 2.5.6 (#733)
Bumps [form-data](https://github.com/form-data/form-data) from 2.5.5 to 2.5.6.
- [Changelog](https://github.com/form-data/form-data/blob/master/CHANGELOG.md)
- [Commits](https://github.com/form-data/form-data/compare/v2.5.5...v2.5.6)

---
updated-dependencies:
- dependency-name: form-data
  dependency-version: 2.5.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-26 08:05:14 +02:00
dependabot[bot] 1a146ea0c7 chore(deps-dev): bump js-yaml from 4.1.1 to 4.2.0 (#734)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.1 to 4.2.0.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.1...4.2.0)

---
updated-dependencies:
- dependency-name: js-yaml
  dependency-version: 4.2.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-22 15:23:36 +02:00
Thomas Durieux e4ffd74068 Security hardening + gist UI fixes (#731)
* security: harden against XSS, ReDoS, path traversal, and injection

Defensive fixes across the server, storage, and viewer:

- XSS (CWE-79): sanitise rendered notebooks with DOMPurify, escape file
  names interpolated into AngularJS expressions (escapeNgString), set
  Mermaid securityLevel to 'strict', and stop urlRel2abs from returning
  javascript:/vbscript:/data:text/html URLs.
- Path traversal / zip-slip (CWE-22/23/24): validate URL-derived path
  components before they reach the storage layer (file/webview routes +
  StorageBase.assertSafePath) and sanitise zip entry names on extract for
  both the filesystem and S3 backends.
- ReDoS (CWE-1333): escape anonymization terms with catastrophic
  backtracking shapes to literals instead of compiling them as regexes.
- Secret hardening (CWE-798): require SESSION_SECRET / OAuth creds / DB
  password in production, random dev SESSION_SECRET fallback.
- Rate-limit spoofing (CWE-290): derive request.ip via trust-proxy hop
  count instead of the client-settable cf-connecting-ip header.
- NoSQL injection (CWE-943): allow only plain field paths as admin sort keys.
- Reject malformed streamer requests missing required string fields.
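
The ReDoS item in the list above can be sketched like this. It is a deliberately simple heuristic and not the project's implementation: `looksCatastrophic`, `escapeRegExp`, and `compileTerm` are hypothetical names, and the check only flags a single quantified group that itself contains a quantifier, such as `(a+)+`.

```typescript
// Escape every regex metacharacter so the term matches literally.
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Heuristic backstop, not a proof: a group containing + or * that is
// itself followed by +, * or {.
function looksCatastrophic(term: string): boolean {
  return /\([^()]*[+*][^()]*\)[+*{]/.test(term);
}

// Suspicious terms are compiled as literals; everything else as a regex.
function compileTerm(term: string): RegExp {
  const source = looksCatastrophic(term) ? escapeRegExp(term) : term;
  return new RegExp(source, "g");
}
```

Falling back to a literal keeps the term usable for anonymization without ever handing a risky pattern to the regex engine.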

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(ui): make gists reachable/visible and clarify the ZIP button

- Gist & PR routes now accept a trailing slash (/gist/:id/:path*?), so the
  dashboard links (which end in "/") resolve to the gist/PR page instead of
  falling through to the 404 route (#725).
- Gist viewer picks the default tab after content loads, defaulting to
  "files" when files exist; previously the ng-init ran before the async
  load and a files-only gist rendered blank under the hidden comments tab.
- Explorer toolbar: relabel ZIP to "Full repo ZIP" with a tooltip, and add
  tooltips to Raw/Download clarifying they apply to the current file (#721).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: report SAML-enforced orgs clearly instead of "token expired"

When a repo's organization enforces SAML SSO, GitHub returns a 403 whose
message differs from the OAuth-App-restriction case. That 403 fell through
to the generic handler and surfaced as "token_expired", pushing users to
re-login when the real fix is authorizing their token for the org. Detect
the "SAML enforcement" message and raise a dedicated, actionable error
instead (#379, #550).
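
A minimal sketch of the message check, assuming the 403 body text is available as a string. The function name and the `saml_sso_required` code are hypothetical; only `token_expired` and the "SAML enforcement" marker come from the description above.

```typescript
type ForbiddenErrorCode = "saml_sso_required" | "token_expired";

// Distinguish the SAML SSO case from the generic 403 before falling through.
function classifyForbidden(message: string): ForbiddenErrorCode {
  return /SAML enforcement/i.test(message) ? "saml_sso_required" : "token_expired";
}
```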

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* security: catch nested quantified groups in ReDoS guard and backslash path traversal

- hasCatastrophicBacktracking now scans across nested parens ([\s\S]*?)
  so shapes like ((a+))+ are detected; comment reframed as a heuristic
  backstop rather than a proof.
- file route path-traversal check now rejects backslash separators and a
  leading backslash, covering Windows-style "..\" payloads (CWE-22/25).
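
The backslash rule can be illustrated with a small validator. This is a sketch with a hypothetical name (`isSafeRelativePath`), not the route's actual check; it assumes paths are expected to be relative and to use forward slashes only.

```typescript
// Accepts only relative, forward-slash paths without ".." segments.
function isSafeRelativePath(input: string): boolean {
  if (input.length === 0) return false;
  if (input.indexOf("\0") !== -1) return false; // NUL bytes
  if (input.indexOf("\\") !== -1) return false; // any backslash, leading or not
  if (input.charAt(0) === "/") return false; // absolute paths
  return input.split("/").every((segment) => segment !== "..");
}
```

Rejecting every backslash outright is stricter than normalizing separators, but it avoids having to reason about how each storage backend interprets `..\`.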

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(dev): track dev-proxy script, ignore .DS_Store and .claude/

scripts/dev-proxy.js is referenced by the "dev:ui" npm script but was
never committed, breaking the command on a fresh clone. Add it and
ignore local-only macOS/Claude Code files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 13:50:55 +02:00
dependabot[bot] bdfcc56d81 chore(deps): bump qs and express (#719)
Bumps [qs](https://github.com/ljharb/qs) to 6.15.2 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `qs` from 6.14.2 to 6.15.2
- [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ljharb/qs/compare/v6.14.2...v6.15.2)

Updates `express` from 4.22.1 to 4.22.2
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/v4.22.2/History.md)
- [Commits](https://github.com/expressjs/express/compare/v4.22.1...v4.22.2)

---
updated-dependencies:
- dependency-name: qs
  dependency-version: 6.15.2
  dependency-type: indirect
- dependency-name: express
  dependency-version: 4.22.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-23 06:32:36 +03:00
dependabot[bot] d0a1380884 chore(deps): bump sanitize-html from 2.17.3 to 2.17.4 (#717)
Bumps [sanitize-html](https://github.com/apostrophecms/apostrophe/tree/HEAD/packages/sanitize-html) from 2.17.3 to 2.17.4.
- [Changelog](https://github.com/apostrophecms/apostrophe/blob/main/packages/sanitize-html/CHANGELOG.md)
- [Commits](https://github.com/apostrophecms/apostrophe/commits/HEAD/packages/sanitize-html)

---
updated-dependencies:
- dependency-name: sanitize-html
  dependency-version: 2.17.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-21 20:46:10 +03:00
tdurieux 39fadd6cf0 fix: auth issue & profile save issue 2026-05-18 16:04:09 +03:00
tdurieux e73ad48115 fix: file download large repo 2026-05-12 21:00:47 +03:00
tdurieux 898f18919e fix rate limit 2026-05-12 20:51:08 +03:00
tdurieux 427e26062e fix 2026-05-12 20:34:32 +03:00
tdurieux 9b6c8dbe62 improve admin overview 2026-05-11 12:23:24 +03:00
tdurieux afd9f36cfb improve admin overview 2026-05-11 12:10:17 +03:00
tdurieux 03e18fd572 repo change + daily stat improvements 2026-05-11 12:10:17 +03:00
dependabot[bot] b03c4b437c chore(deps): bump fast-xml-builder from 1.1.5 to 1.2.0 (#707)
Bumps [fast-xml-builder](https://github.com/NaturalIntelligence/fast-xml-builder) from 1.1.5 to 1.2.0.
- [Changelog](https://github.com/NaturalIntelligence/fast-xml-builder/blob/main/CHANGELOG.md)
- [Commits](https://github.com/NaturalIntelligence/fast-xml-builder/compare/v1.1.5...v1.2.0)

---
updated-dependencies:
- dependency-name: fast-xml-builder
  dependency-version: 1.2.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 13:31:56 +02:00
tdurieux 3eeed23609 handle memory issues 2026-05-07 21:01:07 +03:00
tdurieux 369fd8edb2 redis cache 2026-05-07 15:55:28 +03:00
tdurieux b37a814f3a improve queue 2026-05-07 14:58:36 +03:00
tdurieux f817a29a4b loading improvements 2026-05-07 08:30:31 +03:00
tdurieux 2de08c3df3 Add missing error handlers on stream pipelines
- AnonymizedFile.anonymizedContent(): propagate content errors to the
  anonymizer so callers see the failure instead of hanging.
- AnonymizedFile.send() local path: add error handler on the anonymizer
  transform between content and response pipes.
- S3.send(): handle errors on the S3 body stream to avoid unhandled
  emits crashing the process.
- S3.archive() / FileSystem.archive(): propagate read-stream errors
  to the file transformer so archiver sees the failure.
- Add frontend translations for new error codes.
2026-05-07 07:47:29 +03:00
tdurieux 7a163f2d35 Fix streamer crash and misclassified transient GitHub errors
Add missing error handler on the anonymizer transform stream in the
streamer route. Without it, an upstream error tears down the pipe and
the anonymizer emits an unhandled error that crashes the process
(surfacing as ECONNRESET to the main server).

Classify transient network errors (ReadError, ECONNRESET, ETIMEDOUT)
as upstream_error/502 instead of file_not_found/404 so they are
distinguishable in logs and don't cache-poison downstream.
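
The classification rule might look like the sketch below. The function and type names are hypothetical, and the assumption is that transient failures can be recognised from the error's `name` or `code` property; the error codes and statuses are the ones named above.

```typescript
interface ClassifiedError {
  code: string;
  httpStatus: number;
}

// Markers treated as transient network failures.
const TRANSIENT_MARKERS = ["ReadError", "ECONNRESET", "ETIMEDOUT"];

function classifyFetchError(err: { name?: string; code?: string }): ClassifiedError {
  const transient =
    TRANSIENT_MARKERS.indexOf(err.name || "") !== -1 ||
    TRANSIENT_MARKERS.indexOf(err.code || "") !== -1;
  // Transient failures must not be reported (or cached) as a missing file.
  return transient
    ? { code: "upstream_error", httpStatus: 502 }
    : { code: "file_not_found", httpStatus: 404 };
}
```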

Update handleError tests to match the existing sanitization behavior
that returns internal_error for non-AnonymousError instances.
2026-05-07 07:44:15 +03:00
tdurieux 4ab8e0d1cd Handle GitHub 409 "repository is empty" error in getCommitInfo
When a GitHub repo has no commits, the API returns 409, which was
unhandled and caused raw HttpError warnings. getCommitInfo now throws a
repo_empty AnonymousError, consistent with the existing convention.
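
A small sketch of the status-to-code mapping, with a hypothetical function name. The 409 case is the one described here; the 422 case reflects the neighbouring commit in this log that maps 422 to commit_not_found.

```typescript
// Returns the application error code for a GitHub status, or undefined
// when the status has no dedicated mapping and should be handled generically.
function mapCommitInfoStatus(status: number): string | undefined {
  switch (status) {
    case 409:
      return "repo_empty"; // repository exists but has no commits
    case 422:
      return "commit_not_found";
    default:
      return undefined;
  }
}
```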
2026-05-07 07:42:05 +03:00
tdurieux fbbc694747 improve styling 2026-05-07 07:34:30 +03:00
tdurieux e59527bc78 Remove all user repositories when banning
Use removeQueue instead of cacheQueue so each repo transitions to
REMOVING status and is fully deleted, not just cache-cleared.
2026-05-07 06:04:55 +03:00
tdurieux 9292c19392 Handle GitHub 422 errors as commit_not_found and sanitize error responses 2026-05-07 05:54:51 +03:00
tdurieux 9403f15ac3 Align error logging with admin dashboard field conventions
All warn/error log calls now use field names the dashboard's decorate()
function recognizes: `code` for the error code pill, `httpStatus` for the
status badge and severity bucket, `url` for the sidebar link, and
`repoId` for the repository link.

Key changes:
- Streamer errors surface code, httpStatus, url, and nested err in Raw tab
- Nested `{ err: serializeError(e) }` replaced with spread pattern so
  error fields (name, message, status) appear at the top level
- Raw Error objects in catch blocks now go through serializeError()
- Rate limit, token, and PR 404 warnings include code + httpStatus
- Dashboard stack walker traverses both `cause` and `err` chains
- Dashboard Raw tab renders repoId, filePath, upstream*, err, and cause
- trimRawArg recursively trims stacks in nested err/cause chains
- clampPayload strips heavy nested fields before falling back to
  truncated placeholder, preserving flat diagnostic fields
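
The spread pattern from the list above, sketched with stand-in helpers. This `serializeError` is a simplified illustration rather than the project's own function, and `buildLogFields` is a hypothetical name.

```typescript
// Simplified stand-in: flattens an unknown thrown value into plain fields.
function serializeError(e: unknown): { name: string; message: string; stack?: string } {
  if (e instanceof Error) {
    return { name: e.name, message: e.message, stack: e.stack };
  }
  return { name: "NonError", message: String(e) };
}

// Spreading puts name/message at the top level next to code, httpStatus, etc.,
// instead of hiding them under a nested `err` key.
function buildLogFields(
  e: unknown,
  extra: { code: string; httpStatus: number; repoId?: string; url?: string }
) {
  return { ...serializeError(e), ...extra };
}
```

For example, `buildLogFields(new Error("boom"), { code: "upstream_error", httpStatus: 502 })` yields an object whose `message`, `code`, and `httpStatus` are all top-level keys.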
2026-05-07 05:54:18 +03:00
tdurieux b8cfe293ea Fix BullMQ "Custom Id cannot be integers" error by prefixing jobId 2026-05-07 05:53:26 +03:00
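
A sketch of the prefixing idea: a custom job id consisting only of digits triggers the error quoted above, so the id is given a non-numeric prefix. The helper name and the `repo-` prefix are assumptions for illustration.

```typescript
// Guarantees the job id is never a bare integer string.
function toJobId(repoId: string): string {
  return `repo-${repoId}`;
}

// The result would then be passed as the jobId option when enqueuing,
// e.g. queue.add(name, data, { jobId: toJobId(repoId) }).
```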
tdurieux 8fc7ac5175 Add user ban/activate feature
Add admin endpoints to ban and activate users, block banned users
from all auth flows (OAuth, token login, bearer auth), and invalidate
existing sessions on next request. Includes frontend translation and
user detail page ban/activate buttons.
2026-05-07 05:41:12 +03:00
tdurieux 48256e743c Style Ko-fi widgets to match light/dark theme
Floating button now initializes with theme-aware colors and updates
on toggle. Status page iframe uses a tuned CSS filter in dark mode
to blend with the warm palette.
2026-05-06 21:38:52 +03:00
tdurieux dfa5a2e2fd Fix repo link on admin errors page to point to repo view 2026-05-06 21:38:31 +03:00
tdurieux 1204eaffa9 Fix admin repository links and remove buttons
- Use $location.search() instead of window.location.search for URL
  params so cross-page links (owner, conference, search filters) work
  with AngularJS client-side navigation
- Add missing removeRepository() in both repos and user detail controllers
- Fix removeCache() spurious $scope.$apply() that caused digest errors
- Add confirmation prompts and list refresh after remove/cache operations
2026-05-06 21:27:57 +03:00
tdurieux d9104c2ec2 Update commit on branch refresh and validate commit exists on save
Refresh button now always updates the commit to the latest SHA instead
of preserving the stale one in edit mode. Both create and update routes
verify the commit still exists on GitHub before persisting.
2026-05-06 21:14:53 +03:00
tdurieux d1d6257512 fix audio url 2026-05-06 20:37:50 +03:00
tdurieux 2f6ec41a2c block indexing webpage as well 2026-05-06 20:36:04 +03:00
tdurieux bd8656206a fix persistence bugs 2026-05-06 20:00:59 +03:00
tdurieux 67cb2538b1 fix old github download repos 2026-05-06 19:37:16 +03:00
tdurieux da78708b7b Improve error handling 2026-05-06 18:43:36 +03:00
tdurieux aae6eae6eb handle rate limit 2026-05-06 17:50:01 +03:00
tdurieux c1e18f82a9 Improve error handling 2026-05-06 17:39:43 +03:00
tdurieux 6bad6c2f09 fix bugs and report better errors 2026-05-06 17:26:47 +03:00
tdurieux 3b27816702 fix incremental 2026-05-06 17:12:58 +03:00
tdurieux 804bbffb7a Improve error handling 2026-05-06 17:03:19 +03:00
tdurieux 48e782946a Improve error handling 2026-05-06 16:56:07 +03:00
tdurieux b2461088e8 fix test 2026-05-06 16:55:50 +03:00
tdurieux cf2f172aca fix(gist): set gist subpaths individually to avoid CastError
Mongoose treats `gist` as a nested path, not a sub-schema, so
set("gist", payload) mis-casts the inner subdoc arrays and fails
validation with 'Cast to [string] failed' at gist.files.0. Set each
subpath individually so the files/comments arrays cast correctly.
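
The per-subpath approach can be sketched against a minimal interface. `SettableDoc` is a hypothetical stand-in exposing only a `set(path, value)` method of the kind Mongoose documents provide, and `setGistFields` is an illustrative name.

```typescript
// Minimal stand-in for a document that supports path-based set().
interface SettableDoc {
  set(path: string, value: unknown): unknown;
}

// Instead of doc.set("gist", payload), set gist.files, gist.comments, ...
// one by one so each array is cast against its own schema path.
function setGistFields(doc: SettableDoc, payload: Record<string, unknown>): void {
  for (const key of Object.keys(payload)) {
    doc.set(`gist.${key}`, payload[key]);
  }
}
```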
2026-05-06 16:52:48 +03:00
tdurieux dcb524c8c1 Improve error handling 2026-05-06 16:45:22 +03:00
tdurieux 3613c895c8 improve logging 2026-05-06 16:31:10 +03:00
tdurieux 873c910dd3 Improve error dashboard 2026-05-06 16:12:37 +03:00