fix(cache): atomic file writes and size-validated cache reads

A failed/interrupted GitHub fetch could leave a 0-byte or truncated
file in the local cache. Subsequent reads happily streamed the empty
content as the file's body — visible to users as an "Empty file" with
HTTP 200. Reproduced on artifact-70B6/Lethe/configs.py (#694).

- FileSystem.write: stream into a sibling .tmp and rename into place
  only on finish. Stream errors discard the tmp and leave any prior
  cached file untouched. Drop the utf-8 encoding that was silently
  corrupting binary blobs.
- GitHubStream.getFileContentCache: accept an expected size and treat
  cached.size < expected as a poisoned cache (truncated fetch) → rm
  and re-fetch. cached.size >= expected is accepted, which keeps
  Git LFS-resolved files (whose FileModel.size is the pointer size)
  working.
- AnonymizedFile: expose size() and pass it through to the streamer
  alongside sha so the cache check has the upstream size.

Existing poisoned entries self-heal on next access.
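
The size rule described above can be sketched as a small pure function. This is only an illustration: `checkCachedSize` and `CacheDecision` are hypothetical names, not identifiers from the repository, and treating a missing expected size as "trust the cache" is an assumption.

```typescript
// Hypothetical helper illustrating the size check; not the repository's code.
type CacheDecision = "serve" | "refetch";

function checkCachedSize(
  cachedSize: number,
  expectedSize?: number
): CacheDecision {
  // Assumption: with no upstream size there is nothing to compare against,
  // so the cached entry is served as-is.
  if (expectedSize === undefined) return "serve";
  // Smaller than upstream means a truncated or 0-byte fetch: poisoned entry.
  if (cachedSize < expectedSize) return "refetch";
  // Equal or larger is accepted. Larger occurs for Git LFS files, where the
  // upstream size is the pointer size but the cache holds the resolved blob.
  return "serve";
}

console.log(checkCachedSize(0, 120)); // truncated entry
console.log(checkCachedSize(5_000_000, 134)); // LFS-resolved blob
```

A legitimately empty upstream file (expected size 0) is still served from cache under this rule, since 0 is not less than 0.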

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tdurieux
2026-05-05 08:47:41 +03:00
parent 53959f677c
commit 9adff11e74
4 changed files with 76 additions and 11 deletions
+32 -5
@@ -62,16 +62,43 @@ export default class FileSystem extends StorageBase {
     data: string | Readable
   ): Promise<void> {
     const fullPath = join(config.FOLDER, this.repoPath(repoId), p);
+    // Atomic write: stream into a sibling .tmp and only rename into place
+    // when the source stream finishes successfully. If the source errors
+    // mid-flight (transient GitHub 5xx, socket reset, etc.), we drop the
+    // tmp and leave any pre-existing cached file untouched. Without this,
+    // a partial fetch would commit a 0-byte or truncated cache entry that
+    // future reads would happily serve as the file's content.
+    await this.mk(repoId, dirname(p));
+    const tmpPath = `${fullPath}.tmp.${process.pid}.${Date.now()}.${Math.random()
+      .toString(36)
+      .slice(2, 8)}`;
     try {
-      await this.mk(repoId, dirname(p));
-      if (data instanceof Readable) {
-        data.on("error", (_err) => {
-          this.rm(repoId, p);
+      if (typeof data === "string") {
+        await fs.promises.writeFile(tmpPath, data);
+      } else {
+        await new Promise<void>((resolve, reject) => {
+          const ws = fs.createWriteStream(tmpPath);
+          let settled = false;
+          const finish = (err?: Error) => {
+            if (settled) return;
+            settled = true;
+            if (err) {
+              ws.destroy();
+              reject(err);
+            } else {
+              resolve();
+            }
+          };
+          data.on("error", finish);
+          ws.on("error", finish);
+          ws.on("finish", () => finish());
+          data.pipe(ws);
         });
       }
-      return await fs.promises.writeFile(fullPath, data, "utf-8");
+      await fs.promises.rename(tmpPath, fullPath);
     } catch (err) {
       console.error("[ERROR] FileSystem.write failed:", err);
+      await fs.promises.rm(tmpPath, { force: true }).catch(() => undefined);
       throw err;
     }
   }
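
The write-to-temp-then-rename pattern in this hunk can be tried in isolation with the sketch below. It is a simplified stand-in rather than the committed code: `atomicWrite` and `demo` are hypothetical names, and it swaps the hand-rolled `pipe` plus `settled` flag for `pipeline` from `node:stream/promises`, which rejects on either stream's error and destroys both streams.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical standalone helper; mirrors the idea of the hunk, not its code.
async function atomicWrite(
  target: string,
  data: string | Readable
): Promise<void> {
  const tmp = `${target}.tmp.${process.pid}.${Date.now()}`;
  try {
    if (typeof data === "string") {
      await fs.promises.writeFile(tmp, data);
    } else {
      // Rejects if the source or the write stream errors mid-flight.
      await pipeline(data, fs.createWriteStream(tmp));
    }
    // The target is only replaced once the temp file is complete.
    await fs.promises.rename(tmp, target);
  } catch (err) {
    // Discard the partial temp file; the previous target is left untouched.
    await fs.promises.rm(tmp, { force: true }).catch(() => undefined);
    throw err;
  }
}

// Usage: a source that fails after one chunk must not clobber the old file.
async function demo(): Promise<string> {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), "atomic-"));
  const target = path.join(dir, "configs.py");
  await atomicWrite(target, "old contents");
  const failing = Readable.from(
    (async function* () {
      yield "partial";
      throw new Error("socket reset");
    })()
  );
  await atomicWrite(target, failing).catch(() => undefined);
  return fs.promises.readFile(target, "utf8");
}
```

The temp file is created next to the target so that `rename` stays on one filesystem, which is what lets it replace the target in a single step on POSIX systems.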