Floating button now initializes with theme-aware colors and updates
on toggle. Status page iframe uses a tuned CSS filter in dark mode
to blend with the warm palette.
- Use $location.search() instead of window.location.search for URL
params so cross-page links (owner, conference, search filters) work
with AngularJS client-side navigation
- Add missing removeRepository() in both repos and user detail controllers
- Fix removeCache() spurious $scope.$apply() that caused digest errors
- Add confirmation prompts and list refresh after remove/cache operations
Refresh button now always updates the commit to the latest SHA instead
of preserving the stale one in edit mode. Both create and update routes
verify the commit still exists on GitHub before persisting.
Mongoose treats `gist` as a nested path, not a sub-schema, so
set("gist", payload) mis-casts the inner subdoc arrays and fails
validation with 'Cast to [string] failed' at gist.files.0. Set each
subpath individually so the files/comments arrays cast correctly.
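The per-subpath fix can be sketched as a small helper that expands a payload into individual set operations. The payload field names (description, files, comments) and the helper itself are illustrative, not the project's actual code:

```typescript
// Instead of doc.set("gist", payload), set each known subpath so Mongoose
// casts the nested arrays individually rather than mis-casting the subdoc.
type GistPayload = { description?: string; files?: string[]; comments?: string[] };

function gistSetOps(payload: GistPayload): Array<[string, unknown]> {
  const ops: Array<[string, unknown]> = [];
  if (payload.description !== undefined) ops.push(["gist.description", payload.description]);
  if (payload.files !== undefined) ops.push(["gist.files", payload.files]);
  if (payload.comments !== undefined) ops.push(["gist.comments", payload.comments]);
  return ops;
}

// With a real document: for (const [p, v] of gistSetOps(payload)) doc.set(p, v);
```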
Surface the path/repo/url tied to an AnonymousError when it gets
serialized for logging — previously logs only carried name, message,
and httpStatus, which made file_not_found entries impossible to trace
back to a specific file or repo. Extract the existing detail formatting
out of toString() into a public detail() method, harden it against
AnonymizedFile getters that can throw, and have serializeError include
the result as a "detail" field.
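A minimal sketch of the pattern, assuming a simplified class shape (the real AnonymousError carries more state):

```typescript
// Sketch: detail() is public so serializeError can reuse the formatting
// that toString() already had, and it is hardened against value getters
// (e.g. AnonymizedFile accessors) that may throw.
class AnonymousErrorSketch extends Error {
  httpStatus = 404;
  constructor(message: string, private value?: { toString(): string }) {
    super(message);
    this.name = "AnonymousError";
  }
  detail(): string | undefined {
    try {
      return this.value?.toString(); // may throw; swallow and return undefined
    } catch {
      return undefined;
    }
  }
  toString(): string {
    const d = this.detail();
    return d ? `${this.name}: ${this.message} (${d})` : `${this.name}: ${this.message}`;
  }
}

function serializeError(err: AnonymousErrorSketch) {
  // "detail" is the extra field that makes file_not_found traceable
  return { name: err.name, message: err.message, httpStatus: err.httpStatus, detail: err.detail() };
}
```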
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Give the deps and prod-deps stages distinct BuildKit cache IDs with
sharing=locked so concurrent npm ci runs no longer race on
/root/.npm and fail with ENOTEMPTY.
- Drop the post-install 'npm cache clean --force' in prod-deps; it
was wiping the cache mount it had just filled.
- Update base images from node:21 to node:22 (LTS) to silence
EBADENGINE warnings from packages requiring node >= 22.
- Multi-stage Dockerfile with BuildKit npm cache mounts and a separate
prod-deps stage so source edits don't reinstall or prune.
- Tighter .dockerignore to shrink build context.
- Healthchecks: add start_period and tighten interval/retries so
containers report healthy as soon as the process is actually ready
instead of after a full polling interval.
- Move recoverStuckPreparing() off the startup critical path; the
recovery sweep now runs in the background after app.listen.
- depends_on uses condition: service_healthy and the obsolete
compose 'version' key is gone.
- New scripts/build.sh + scripts/deploy.sh: deploy.sh builds, exits
  early if the image is unchanged, runs a blue/green streamer swap
  (scale to 2N, wait for health in parallel, drop the old containers),
  then recreates the API with --no-deps to avoid compose's depends_on
  re-poll.
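The cache-mount setup described above can be sketched as a Dockerfile fragment; stage names and flags are illustrative, not the project's actual Dockerfile:

```dockerfile
FROM node:22 AS deps
WORKDIR /app
COPY package*.json ./
# Distinct id per stage + sharing=locked serializes concurrent access
# to /root/.npm, avoiding the ENOTEMPTY race between parallel npm ci runs
RUN --mount=type=cache,id=npm-deps,sharing=locked,target=/root/.npm \
    npm ci

FROM node:22 AS prod-deps
WORKDIR /app
COPY package*.json ./
# No post-install `npm cache clean --force` here: it would empty the
# cache mount this very step just filled
RUN --mount=type=cache,id=npm-prod-deps,sharing=locked,target=/root/.npm \
    npm ci --omit=dev
```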
The LFS-pointer probe buffered up to 150 bytes before deciding whether
to forward the blob or swap to the raw URL. For blobs that fit entirely
in the probe, decide() ran from the source's end event and attached
data/end listeners to an already-ended stream, so out.end() was never
called. The response hung until upstream timed out and storage.write
left an incomplete cached copy, which then forced a re-fetch on every
subsequent read.
Pass a sourceEnded flag through decide() and end the output directly
when the source has already finished. Also skip the GitHub blob fetch
when the tree size is already over MAX_FILE_SIZE, surfacing
file_too_big instead of a translated 422.
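The probe decision can be reduced to a pure helper; the real code works on live streams, so the shapes below (and the `classifyProbe` name) are assumptions:

```typescript
const LFS_HEADER = "version https://git-lfs.github.com/spec/v1";
const PROBE_LIMIT = 150; // bytes buffered before deciding

// The key fix: thread `sourceEnded` through the decision. If the whole blob
// fit in the probe buffer, the source has already emitted "end" and no
// further data/end events will fire — the caller must write the buffered
// bytes and call out.end() itself instead of attaching listeners.
function classifyProbe(buffered: Buffer, sourceEnded: boolean):
  { kind: "lfs-pointer" | "blob"; mustEndOutputNow: boolean } {
  const head = buffered.subarray(0, PROBE_LIMIT).toString("utf8");
  return {
    kind: head.startsWith(LFS_HEADER) ? "lfs-pointer" : "blob",
    mustEndOutputNow: sourceEnded,
  };
}
```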
Replace check-then-insert with atomic findOneAndUpdate upsert keyed on
externalId, plus a single E11000 retry fallback. Eliminates the duplicate
key race when two requests resolve the same gh_<id> concurrently.
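A sketch of the retry guard; the Mongoose call itself is shown only in a comment since the model and field names here are stand-ins:

```typescript
// MongoDB duplicate-key errors surface as code 11000 (the E11000 family).
const DUPLICATE_KEY = 11000;

function isDuplicateKeyError(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: number }).code === DUPLICATE_KEY;
}

// The atomic pattern with Mongoose would look like:
//   await Repo.findOneAndUpdate(
//     { externalId },
//     { $setOnInsert: { externalId, name } },
//     { upsert: true, new: true }
//   );
// plus a single retry of the same query when isDuplicateKeyError(err) —
// two concurrent upserts on the same new key can still collide once.
```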
getFiles blindly appended fetched entries to $scope.files, so
re-opening a folder duplicated its children in the tree. Drop any
existing entries at the requested path before appending.
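The dedup step, sketched on plain arrays (the original mutates $scope.files in an AngularJS controller; entry shape is assumed):

```typescript
interface TreeEntry { path: string; name: string }

// Drop any entries already at the requested folder path before appending
// the freshly fetched children, so re-opening a folder cannot duplicate them.
function mergeFolder(files: TreeEntry[], folderPath: string, fetched: TreeEntry[]): TreeEntry[] {
  const kept = files.filter((f) => f.path !== folderPath);
  return kept.concat(fetched);
}
```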
* chore(deps): bump uuid and bullmq
Removes [uuid](https://github.com/uuidjs/uuid). It's no longer used after updating ancestor dependency [bullmq](https://github.com/taskforcesh/bullmq). These dependencies need to be updated together.
Removes `uuid`
Updates `bullmq` from 2.4.0 to 5.76.5
- [Release notes](https://github.com/taskforcesh/bullmq/releases)
- [Commits](https://github.com/taskforcesh/bullmq/compare/v2.4.0...v5.76.5)
---
updated-dependencies:
- dependency-name: bullmq
dependency-version: 5.76.5
dependency-type: direct:production
- dependency-name: uuid
dependency-version:
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix(bullmq): adapt isRunning + getJobs typing for v5 API
Worker.isRunning became a method (was a property in v2), and
Queue.getJobs now requires a mutable JobType[] (was string[]).
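The two call-site changes can be sketched with stand-in interfaces (the real types come from bullmq v5; `runningAndFailed` is illustrative):

```typescript
type JobType = "completed" | "failed" | "active" | "waiting";

interface WorkerV5 { isRunning(): boolean }                // method in v5, was a property in v2
interface QueueV5 { getJobs(types: JobType[]): string[] }  // wants a mutable JobType[], was string[]

function runningAndFailed(w: WorkerV5, q: QueueV5): { running: boolean; jobs: string[] } {
  const types: JobType[] = ["failed"]; // not `as const`: getJobs needs a mutable array
  return { running: w.isRunning(), jobs: q.getJobs(types) };
}
```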
* clean up
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: tdurieux <durieuxthomas@hotmail.com>
Co-authored-by: Thomas Durieux <5577568+tdurieux@users.noreply.github.com>
Files like .jsonl that mime-types doesn't recognize fell through to
application/octet-stream and rendered as "Unsupported binary file" in
the viewer. Replace istextorbinary with isbinaryfile for content-based
detection, and use mime-types for name-based classification with a
textual application/* allowlist.
The streaming transformer now defers classification when the name is
inconclusive and sniffs the first chunk before emitting "transform",
so route.ts and AnonymizedFile.ts get a content-aware Content-Type.
Whitelists .jsonl and .ndjson to short-circuit dataset files.
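The name-based half of the classification, sketched with a hand-rolled lookup table; the real code uses the mime-types package plus isbinaryfile for content sniffing, and the table and allowlist below are illustrative:

```typescript
const EXTENSION_MIME: Record<string, string> = {
  ".json": "application/json",
  ".js": "application/javascript",
  ".jsonl": "application/x-ndjson", // whitelisted: dataset files short-circuit
  ".ndjson": "application/x-ndjson",
  ".png": "image/png",
};

// Textual application/* allowlist: these are application/* types but text.
const TEXTUAL_APPLICATION = new Set([
  "application/json", "application/javascript", "application/x-ndjson", "application/xml",
]);

// "unknown" is the inconclusive case: the streaming transformer then defers
// classification and sniffs the first chunk before emitting "transform".
function classifyByName(filename: string): "text" | "binary" | "unknown" {
  const mime = EXTENSION_MIME[filename.slice(filename.lastIndexOf("."))];
  if (!mime) return "unknown";
  return mime.startsWith("text/") || TEXTUAL_APPLICATION.has(mime) ? "text" : "binary";
}
```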
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrite repo_access_limited, repo_not_found, repo_empty, and
repo_not_accessible to point users at concrete next steps instead
of stating only that something failed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up review pass after the cache fixes turned up several bugs in
the same family — silent failures that look like success to the client,
plus content-correctness issues in the ZIP and per-file delivery paths.
- zipStream: stop calling archive.finalize() on upstream/parser errors.
That produced a valid-looking ZIP (200 OK, archive opens) silently
missing entries — same class as #694, but worse because the user has
no signal anything went wrong. Destroy the response on failure
instead so the client sees a connection drop.
- zipStream: apply per-repo image/pdf gates inside the entry handler.
The single-file /file/... endpoint refuses to serve those types
via AnonymizedFile.isFileSupported when image=false / pdf=false, but
the ZIP shipped them anyway — privacy-relevant for maintainers who
toggle image=false to suppress identifying screenshots. Threaded
contentOptions through both ZIP entry points (direct and streamer).
- GitHubUtils.getToken: validate the OAuth token-refresh response
before persisting. On a non-2xx response or a body without a string
token, we used to overwrite the stored token with `undefined`, which
then propagated as `Authorization: token undefined` to every API
call — 401 even on public repos, with the config.GITHUB_TOKEN
fallback unreachable because the field was no longer falsy.
- AnonymizedFile.send (streamer branch): forward Content-Type from the
upstream streamer response. got.stream(...).pipe(res) carries body
bytes only, so the parent response had no Content-Type and browsers
guessed (text rendered as download, etc.). Also resolve on
res.on("finish") in addition to "close" — keep-alive sockets stay
open long after the response is delivered, delaying countView().
- Repository.updateIfNeeded: persist a renamed source.repositoryName
even when the commit hasn't changed. Previously the new value lived
in memory only and was overwritten on the next reload, so the
rename detection ran every request.
- Repository.anonymize: stop materialising a dummy {path:"",name:""}
FileModel for empty repos. That row collided with the special case
in AnonymizedFile.getFileInfo and surfaced in unfiltered listings.
- streamer/route POST /: reject filePath segments containing ".." or
empty parts. Defence in depth — the parent server validates against
FileModel before calling, but the streamer joins filePath straight
into the storage path, so any future caller forwarding an
unvalidated path could traverse out of the repo root.
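Two of the guards above reduce to pure helpers; the function names and response shape are assumptions, not the project's actual signatures:

```typescript
// GitHubUtils.getToken guard: only persist a real string token. Returning
// null keeps the stored token (or the config.GITHUB_TOKEN fallback) usable
// instead of writing `undefined`, which would turn every API call into
// `Authorization: token undefined`.
function extractRefreshedToken(status: number, body: unknown): string | null {
  if (status < 200 || status >= 300) return null;
  const token = (body as { token?: unknown } | null)?.token;
  return typeof token === "string" && token.length > 0 ? token : null;
}

// Streamer filePath guard: reject ".." and empty segments so a joined path
// can never traverse out of the repo root.
function isSafeFilePath(filePath: string): boolean {
  const segments = filePath.split("/");
  return segments.every((s) => s !== "" && s !== "..");
}
```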
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up to the GitHubStream cache fixes. The same poisoned-cache
class existed in the GitHubDownload path and a few related spots:
- GitHubDownload.download: wipe pre-existing state before extracting
and write a .anon-complete marker only after a successful extract.
On error, rm the partial cache so a retry starts clean. getFileContent
and getFiles now gate on the marker instead of "any file/folder
exists," so a half-extracted tree can never be served as canonical.
- GitHubDownload.getFileContent: validate cached file size against the
upstream FileModel size (via the new AnonymizedFile.size()), same
guard as GitHubStream. getFiles filters the marker from the listing.
- FileSystem.listFiles: drop the bogus stats.ino.toString() as sha.
An inode isn't a content hash; anything comparing it to a Git blob
sha would silently disagree. Leave undefined.
- S3.write: remove the fire-and-forget data.on("error") -> this.rm(...).
Multipart Upload doesn't commit partial objects, so there was nothing
to clean up, and the handler raced retries and could delete a
previously-good object on a transient source-stream hiccup. The
size-validated read path recovers from any other undersized objects.
- GitHubStream.resolveLfsPointer: drop the post-decision early-return
in blobStream.on("error"). Currently redundant with the inner
listener, but removes the future-refactor footgun.
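The completeness-marker gate can be sketched as follows; ".anon-complete" is the marker name from the change, while the directory layout and helper names are assumed:

```typescript
import * as fs from "fs";
import * as path from "path";

const MARKER = ".anon-complete";

// Written only after a successful extract; on error the partial cache
// directory is removed so a retry starts clean.
function markExtractComplete(cacheDir: string): void {
  fs.writeFileSync(path.join(cacheDir, MARKER), "");
}

// Reads gate on the marker rather than "any file/folder exists", so a
// half-extracted tree is treated as absent and re-fetched.
function cacheIsComplete(cacheDir: string): boolean {
  return fs.existsSync(path.join(cacheDir, MARKER));
}

// Listings must filter the marker back out so it never appears as a file.
function listCached(cacheDir: string): string[] {
  return fs.readdirSync(cacheDir).filter((name) => name !== MARKER);
}
```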
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A failed/interrupted GitHub fetch could leave a 0-byte or truncated
file in the local cache. Subsequent reads happily streamed the empty
content as the file's body — visible to users as an "Empty file" with
HTTP 200. Reproduced on artifact-70B6/Lethe/configs.py (#694).
- FileSystem.write: stream into a sibling .tmp and rename into place
only on finish. Stream errors discard the tmp and leave any prior
cached file untouched. Drop the utf-8 encoding that was silently
corrupting binary blobs.
- GitHubStream.getFileContentCache: accept an expected size and treat
cached.size < expected as a poisoned cache (truncated fetch) → rm
and re-fetch. cached.size >= expected is accepted, which keeps
Git LFS-resolved files (whose FileModel.size is the pointer size)
working.
- AnonymizedFile: expose size() and pass it through to the streamer
alongside sha so the cache check has the upstream size.
Existing poisoned entries self-heal on next access.
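The two mechanisms can be sketched as follows; the stream write is reduced to a synchronous helper for illustration, and the function names are assumptions:

```typescript
import * as fs from "fs";

// FileSystem.write sketch: write into a sibling ".tmp" and rename into
// place only on success. No encoding option is passed, so binary blobs
// pass through byte-for-byte (the old "utf-8" flag corrupted them).
function atomicWrite(dest: string, data: Buffer): void {
  const tmp = dest + ".tmp";
  try {
    fs.writeFileSync(tmp, data);
    fs.renameSync(tmp, dest);        // prior cached file untouched until here
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // discard the partial tmp, keep old cache
    throw err;
  }
}

// getFileContentCache size check: below the upstream FileModel size means a
// truncated fetch (poisoned cache → rm and re-fetch); >= is accepted so
// LFS-resolved files (FileModel.size is the pointer size) still pass.
function cacheVerdict(cachedSize: number, expectedSize: number): "ok" | "poisoned" {
  return cachedSize < expectedSize ? "poisoned" : "ok";
}
```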
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a GitHub repo is renamed and looked up by its new name, the lookup
by name misses but a record with the same externalId still exists,
causing E11000 on save. Fall back to a lookup by externalId before
creating a new document.
Fixes #500
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split into builder (node:21-slim, full deps + tsc/gulp) and runtime
(node:21-alpine, production deps only). Drops hundreds of MB from
the published image and removes dev tooling from the runtime layer.
Builds and pushes tdurieux/anonymous_github to Docker Hub on push to
main, v* tags, and manual dispatch. Multi-arch (amd64/arm64) with GHA
layer caching. Closes #478.