anonymous_github

mirror of https://github.com/tdurieux/anonymous_github.git synced 2026-05-16 14:59:07 +02:00

Author	SHA1	Message	Date
tdurieux	f413a30313	fix(cache): make Zip-source caches atomic and robust to partial state Follow-up to the GitHubStream cache fixes. The same poisoned-cache class existed in the GitHubDownload path and a few related spots: - GitHubDownload.download: wipe pre-existing state before extracting and write a .anon-complete marker only after a successful extract. On error, rm the partial cache so a retry starts clean. getFileContent and getFiles now gate on the marker instead of "any file/folder exists," so a half-extracted tree can never be served as canonical. - GitHubDownload.getFileContent: validate cached file size against the upstream FileModel size (via the new AnonymizedFile.size()), same guard as GitHubStream. getFiles filters the marker from the listing. - FileSystem.listFiles: drop the bogus stats.ino.toString() as sha. An inode isn't a content hash; anything comparing it to a Git blob sha would silently disagree. Leave undefined. - S3.write: remove the fire-and-forget data.on("error") -> this.rm(...). Multipart Upload doesn't commit partial objects, so there was nothing to clean up, and the handler raced retries and could delete a previously-good object on a transient source-stream hiccup. The size-validated read path recovers from any other undersized objects. - GitHubStream.resolveLfsPointer: drop the post-decision early-return in blobStream.on("error"). Currently redundant with the inner listener, but removes the future-refactor footgun. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-05 08:54:42 +03:00
tdurieux	9adff11e74	fix(cache): atomic file writes and size-validated cache reads A failed/interrupted GitHub fetch could leave a 0-byte or truncated file in the local cache. Subsequent reads happily streamed the empty content as the file's body — visible to users as an "Empty file" with HTTP 200. Reproduced on artifact-70B6/Lethe/configs.py (#694). - FileSystem.write: stream into a sibling .tmp and rename into place only on finish. Stream errors discard the tmp and leave any prior cached file untouched. Drop the utf-8 encoding that was silently corrupting binary blobs. - GitHubStream.getFileContentCache: accept an expected size and treat cached.size < expected as a poisoned cache (truncated fetch) → rm and re-fetch. cached.size >= expected is accepted, which keeps Git LFS-resolved files (whose FileModel.size is the pointer size) working. - AnonymizedFile: expose size() and pass it through to the streamer alongside sha so the cache check has the upstream size. Existing poisoned entries self-heal on next access. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-05 08:47:41 +03:00
Thomas Durieux	188066e91d	Fix 9 bugs and add 103 tests for core anonymization, config, and routing (#669)	2026-04-15 09:41:00 +02:00
Thomas Durieux	f4209110c7	Fix all 93 ESLint issues (3 errors, 90 warnings) (#666)	2026-04-15 09:04:22 +02:00
Thomas Durieux	655ae92c4c	Remove OpenTelemetry tracing infrastructure (#662)	2026-04-15 04:39:08 +02:00
Thomas Durieux	f3641c8ce3	Set up CI with ESLint linter and Mocha test runner (#661)	2026-04-15 04:34:03 +02:00
tdurieux	b0fa5e6689	fix: hot fix, replace repoID by repoId	2024-04-26 12:40:56 +01:00
tdurieux	710f7328e7	feat: flatten file tree for better performance	2024-04-26 10:32:09 +01:00
tdurieux	83c55fdfbf	fix: typo	2024-04-03 13:27:05 +01:00
tdurieux	db67f53b2c	fix: fix GitHubDownload	2024-04-03 13:24:34 +01:00
tdurieux	4d12641c7e	feat: introduce streamers that handle the stream and anonymization from github	2024-04-03 11:13:01 +01:00

11 Commits
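
The two cache fixes described in 9adff11e74 (write to a sibling .tmp and rename only on success; treat a cached file smaller than the expected upstream size as poisoned) can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: `atomicWrite` and `validCachedPath` are hypothetical names, and the use of `pipeline` from `node:stream/promises` is an assumption about how one might wire the stream handling.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: stream `data` into a sibling .tmp file and rename it
// into place only after the stream has finished without error.
export async function atomicWrite(
  target: string,
  data: Readable
): Promise<void> {
  const tmp = `${target}.tmp`;
  await fs.promises.mkdir(path.dirname(target), { recursive: true });
  try {
    // No text encoding is set, so binary content passes through unchanged.
    await pipeline(data, fs.createWriteStream(tmp));
    await fs.promises.rename(tmp, target);
  } catch (err) {
    // Discard the partial tmp file; a previously cached file stays untouched.
    await fs.promises.rm(tmp, { force: true });
    throw err;
  }
}

// Hypothetical helper: return the cached path when it is usable, or
// undefined when the file is missing or smaller than the expected size.
export async function validCachedPath(
  target: string,
  expectedSize?: number
): Promise<string | undefined> {
  let stat: fs.Stats;
  try {
    stat = await fs.promises.stat(target);
  } catch {
    return undefined; // nothing cached yet
  }
  if (expectedSize !== undefined && stat.size < expectedSize) {
    // Truncated entry: remove it so the caller re-fetches from upstream.
    await fs.promises.rm(target, { force: true });
    return undefined;
  }
  // size >= expected is accepted, which tolerates LFS-resolved files whose
  // recorded size is only the pointer size.
  return target;
}
```

A caller would check `validCachedPath(file, expectedSize)` first and fall back to fetching and `atomicWrite` when it returns `undefined`, so a truncated entry is replaced on the next access.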