fix: silent-truncation, token-refresh, and content-type bugs across hot paths

Follow-up review pass after the cache fixes turned up several bugs in
the same family — silent failures that look like success to the client,
plus content-correctness issues in the ZIP and per-file delivery paths.

- zipStream: stop calling archive.finalize() on upstream/parser errors.
  That produced a valid-looking ZIP (200 OK, archive opens) silently
  missing entries — same class as #694, but worse because the user has
  no signal anything went wrong. Destroy the response on failure
  instead so the client sees a connection drop.
- zipStream: apply per-repo image/pdf gates inside the entry handler.
  The single-file /file/... endpoint refuses to serve those types
  via AnonymizedFile.isFileSupported when image=false / pdf=false, but
  the ZIP shipped them anyway — privacy-relevant for maintainers who
  toggle image=false to suppress identifying screenshots. Threaded
  contentOptions through both ZIP entry points (direct and streamer).
- GitHubUtils.getToken: validate the OAuth token-refresh response
  before persisting. On a non-2xx response or a body without a string
  token, we used to overwrite the stored token with `undefined`, which
  then propagated as `Authorization: token undefined` to every API
  call — 401 even on public repos, with the config.GITHUB_TOKEN
  fallback unreachable because the field was no longer falsy.
- AnonymizedFile.send (streamer branch): forward Content-Type from the
  upstream streamer response. got.stream(...).pipe(res) carries body
  bytes only, so the parent response had no Content-Type and browsers
  guessed (text rendered as download, etc.). Also resolve on
  res.on("finish") in addition to "close" — keep-alive sockets stay
  open long after the response is delivered, delaying countView().
- Repository.updateIfNeeded: persist a renamed source.repositoryName
  even when the commit hasn't changed. Previously the new value lived
  in memory only and was overwritten on the next reload, so the
  rename detection ran every request.
- Repository.anonymize: stop materialising a dummy {path:"",name:""}
  FileModel for empty repos. That row collided with the special case
  in AnonymizedFile.getFileInfo and surfaced in unfiltered listings.
- streamer/route POST /: reject filePath segments containing ".." or
  empty parts. Defence in depth — the parent server validates against
  FileModel before calling, but the streamer joins filePath straight
  into the storage path, so any future caller forwarding an
  unvalidated path could traverse out of the repo root.
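
As an illustration of the GitHubUtils.getToken item above, here is a minimal
sketch of validating a refresh response before persisting it. It is not the
code changed by this commit: the helper names, the `RefreshResult` shape, and
the `access_token` field name are assumptions made for the example, and the
extra non-empty check goes slightly beyond "a string token".

```typescript
// Hypothetical sketch; names and the `access_token` field are assumptions,
// not the code changed by this commit.
interface RefreshResult {
  status: number; // HTTP status of the refresh call
  body: unknown; // parsed response body, shape not trusted
}

// Returns the refreshed token only when the response is 2xx and the body
// carries a non-empty string token; otherwise returns undefined.
function extractToken(result: RefreshResult): string | undefined {
  if (result.status < 200 || result.status >= 300) return undefined;
  if (typeof result.body !== "object" || result.body === null) return undefined;
  const token = (result.body as Record<string, unknown>)["access_token"];
  return typeof token === "string" && token.length > 0 ? token : undefined;
}

// Decides which value to persist: a validated new token, or the stored
// value left unchanged when the refresh did not yield one.
function nextStoredToken(
  stored: string | undefined,
  result: RefreshResult
): string | undefined {
  return extractToken(result) ?? stored;
}

// Builds the header value, falling back to a configured token when nothing
// usable is stored.
function authorizationHeader(
  stored: string | undefined,
  fallback: string
): string {
  return `token ${stored || fallback}`;
}
```

With this shape a failed refresh leaves the stored value untouched, so the
header is never built from an unvalidated response and the configured
fallback stays reachable.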

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tdurieux
2026-05-05 09:19:05 +03:00
parent f413a30313
commit 5b72b630c4
7 changed files with 192 additions and 52 deletions
+83 -26
@@ -3,7 +3,11 @@ import { Parse } from "unzip-stream";
 import archiver = require("archiver");
 import GitHubDownload from "./source/GitHubDownload";
-import { AnonymizeTransformer, anonymizePath } from "./anonymize-utils";
+import {
+  AnonymizeTransformer,
+  anonymizePathCompiled,
+  compileTerms,
+} from "./anonymize-utils";
 export interface StreamAnonymizedZipOptions {
   repoId: string;
@@ -12,6 +16,45 @@ export interface StreamAnonymizedZipOptions {
   commit: string;
   getToken: () => string | Promise<string>;
   anonymizerOptions: ConstructorParameters<typeof AnonymizeTransformer>[0];
+  /**
+   * Per-repo content gates. Matches Repository.options — `image: true`
+   * includes images, `pdf: true` includes PDFs. The single-file `/file/...`
+   * endpoint enforces these via AnonymizedFile.isFileSupported; without
+   * the same gate here, the ZIP shipped a superset of what the per-file
+   * API exposes, which is privacy-relevant when a maintainer toggles
+   * image=false to suppress identifying screenshots.
+   */
+  contentOptions?: {
+    image?: boolean;
+    pdf?: boolean;
+  };
 }
+const IMAGE_EXTENSIONS = new Set([
+  "png",
+  "jpg",
+  "jpeg",
+  "gif",
+  "svg",
+  "ico",
+  "bmp",
+  "tiff",
+  "tif",
+  "webp",
+  "avif",
+  "heif",
+  "heic",
+]);
+function isEntryAllowed(
+  filename: string,
+  contentOptions?: { image?: boolean; pdf?: boolean }
+): boolean {
+  if (!contentOptions) return true;
+  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
+  if (contentOptions.pdf === false && ext === "pdf") return false;
+  if (contentOptions.image === false && IMAGE_EXTENSIONS.has(ext)) return false;
+  return true;
+}
 /**
@@ -47,30 +90,44 @@ export async function streamAnonymizedZip(
   });
   const archive = archiver("zip", {});
+  const compiledTerms = compileTerms(opt.anonymizerOptions.terms || []);
+  // Track whether the upstream zipball finished cleanly. If it didn't,
+  // we must NOT finalize the archive — finalizing while bytes are still
+  // flowing to the response produces a valid-looking ZIP that's missing
+  // entries, which the client has no way to detect (status 200, archive
+  // opens). Destroy the response instead so the client sees a connection
+  // drop and knows the download failed. Same class of silent-truncation
+  // bug as #694.
+  let upstreamSucceeded = false;
+  const fail = (error: Error) => {
+    console.error(error);
+    archive.abort();
+    const destroyable = res as unknown as {
+      destroy?: (err?: Error) => void;
+      end?: () => void;
+    };
+    if (typeof destroyable.destroy === "function") {
+      destroyable.destroy(error);
+    } else if (typeof destroyable.end === "function") {
+      destroyable.end();
+    }
+  };
   downloadStream
-    .on("error", (error) => {
-      console.error(error);
-      try {
-        archive.finalize();
-      } catch {
-        /* ignored */
-      }
-    })
-    .on("close", () => {
-      try {
-        archive.finalize();
-      } catch {
-        /* ignored */
-      }
-    })
+    .on("error", fail)
     .pipe(Parse())
     .on("entry", (entry: NodeJS.ReadableStream & { type: string; path: string; autodrain: () => void }) => {
       if (entry.type === "File") {
         try {
-          const fileName = anonymizePath(
+          const fileName = anonymizePathCompiled(
             entry.path.substring(entry.path.indexOf("/") + 1),
-            opt.anonymizerOptions.terms || []
+            compiledTerms
           );
+          if (!isEntryAllowed(fileName, opt.contentOptions)) {
+            entry.autodrain();
+            return;
+          }
           // Pass filePath via the constructor — AnonymizeTransformer reads it
           // there to decide whether the entry is text (and therefore should be
           // anonymized) vs binary (passthrough). Assigning afterwards leaves
@@ -89,15 +146,9 @@ export async function streamAnonymizedZip(
         entry.autodrain();
       }
     })
-    .on("error", (error: Error) => {
-      console.error(error);
-      try {
-        archive.finalize();
-      } catch {
-        /* ignored */
-      }
-    })
+    .on("error", fail)
     .on("finish", () => {
+      upstreamSucceeded = true;
       try {
         archive.finalize();
       } catch {
@@ -107,6 +158,12 @@ export async function streamAnonymizedZip(
   archive.pipe(res).on("error", (error) => {
     console.error(error);
+    if (!upstreamSucceeded) {
+      // archive errored while we were still depending on upstream bytes:
+      // treat as failure rather than truncating.
+      fail(error);
+      return;
+    }
     (res as { end?: () => void }).end?.();
   });
 }
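
The extension gate added in this diff can be exercised on its own. The sketch below copies `IMAGE_EXTENSIONS` and `isEntryAllowed` from the hunk above into a standalone snippet; the `samples` array and its file names are made up for illustration and are not part of the commit.

```typescript
// Copied from the diff above so the snippet runs on its own.
const IMAGE_EXTENSIONS = new Set([
  "png", "jpg", "jpeg", "gif", "svg", "ico", "bmp",
  "tiff", "tif", "webp", "avif", "heif", "heic",
]);

function isEntryAllowed(
  filename: string,
  contentOptions?: { image?: boolean; pdf?: boolean }
): boolean {
  if (!contentOptions) return true;
  // Everything after the last "." (or the whole string if there is none).
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  if (contentOptions.pdf === false && ext === "pdf") return false;
  if (contentOptions.image === false && IMAGE_EXTENSIONS.has(ext)) return false;
  return true;
}

// Hypothetical sample entries: [path, options, whether the entry is kept].
const samples: Array<[string, { image?: boolean; pdf?: boolean } | undefined, boolean]> = [
  ["docs/figure.PNG", { image: false }, false], // extension match is case-insensitive
  ["paper.pdf", { pdf: false }, false],
  ["paper.pdf", { image: false }, true], // only the image gate is closed
  ["src/index.ts", { image: false, pdf: false }, true],
  ["Makefile", { image: false, pdf: false }, true], // no "." so ext is "makefile"
  ["logo.svg", undefined, true], // no options means no filtering
];

for (const [path, options, expected] of samples) {
  const actual = isEntryAllowed(path, options);
  console.log(`${path}: ${actual === expected ? "ok" : "MISMATCH"}`);
}
```

One behaviour worth noting in this sketch: because the extension is taken as whatever follows the last `.`, a path with no dot at all is compared whole, so an entry literally named `pdf` or `png` would be skipped when the matching gate is `false`.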