gstack/make-pdf/src/orchestrator.ts
Garry Tan e23ff280a1 fix(v1.4.1.0): /make-pdf — page numbers, entity escape, Linux fonts (#1098)
* fix(make-pdf): single-source page numbers via CSS, honor --no-page-numbers end-to-end

Two page-number sources were stacking in every PDF: Chromium's native footer
and our @page @bottom-center CSS. The CLI flag --page-numbers/--no-page-numbers
also never reached the CSS layer, because RenderOptions didn't carry it.
Passing --footer-template likewise dropped the "custom footer replaces stock
footer" semantic.

- orchestrator.ts: browseClient.pdf() gets pageNumbers:false unconditionally.
  CSS is the single source of truth. Chromium native numbering always off.
- render.ts: RenderOptions gains pageNumbers + footerTemplate. render() computes
  showPageNumbers = pageNumbers !== false && !footerTemplate and passes to
  printCss(), preserving the prior footerTemplate-suppresses-stock semantic.
- print-css.ts: PrintCssOptions.pageNumbers wraps @bottom-center in a conditional
  matching the existing showConfidential pattern.
- types.ts: PreviewOptions.pageNumbers so preview path compiles and matches CLI.
- render.test.ts: 7 regression tests covering printCss({pageNumbers}) in
  isolation AND the full render() data flow incl. footerTemplate path.
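
The page-number decision described in the render.ts bullet reduces to a single boolean. A minimal sketch, assuming a trimmed-down options shape (`PageNumberOptions` and `resolveShowPageNumbers` are hypothetical names for illustration, not the actual render.ts code):

```typescript
// Hypothetical, trimmed-down options; the real RenderOptions carries more fields.
interface PageNumberOptions {
  pageNumbers?: boolean;   // from --page-numbers / --no-page-numbers
  footerTemplate?: string; // from --footer-template
}

// Stock CSS page numbers show unless they are explicitly disabled
// or a custom footer template replaces the stock footer.
function resolveShowPageNumbers(opts: PageNumberOptions): boolean {
  return opts.pageNumbers !== false && !opts.footerTemplate;
}

console.log(resolveShowPageNumbers({}));                                 // true
console.log(resolveShowPageNumbers({ pageNumbers: false }));             // false
console.log(resolveShowPageNumbers({ footerTemplate: "<div>x</div>" })); // false
```

Leaving `pageNumbers` undefined keeps numbering on, which matches the `!== false` comparison quoted in the commit message.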

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): decode HTML entities in titles and TOC to prevent double-escape

A markdown title like "# Herbert & Garry" rendered as "Herbert &amp;amp; Garry"
in <title>, cover block, and TOC entries. marked emits "&amp;" (correct HTML),
but extractFirstHeading and extractHeadings only stripTags — leaving the entity
intact. That string then flows through escapeHtml, producing the double-encode.

- render.ts: new decodeTextEntities helper, distinct from decodeTypographicEntities
  (which runs on in-pipeline HTML and intentionally preserves &amp;). Covers
  named entities (lt/gt/quot/apos/39/x27/amp) AND numeric (decimal + hex) so
  inputs like "&#169;" or "&#x2014;" don't create the same partial-fix bug.
  Amp-last ordering prevents double-decode on "&amp;lt;" et al.
- Apply in both extractFirstHeading and extractHeadings. extractHeadings feeds
  buildTocBlock → escapeHtml, so the TOC site had the same bug.
- render.test.ts: 8 tests covering the contract — parameterized across &, <, >,
  ©, — chars; single-escape in <title>/cover; TOC double-escape check; numeric
  entity decode; smartypants-interacts-with-quotes contract (no raw equality).
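
The amp-last ordering can be illustrated in isolation. This is a sketch under assumptions, not the real `decodeTextEntities` helper: the function name `decodeTextEntitiesSketch` is hypothetical and the actual implementation in render.ts may be structured differently.

```typescript
// Illustrative only: decodes a small set of entities in plain text.
function decodeTextEntitiesSketch(text: string): string {
  // Leave out-of-range numeric references untouched instead of throwing.
  const fromCodePoint = (raw: string, code: number): string =>
    code <= 0x10ffff ? String.fromCodePoint(code) : raw;

  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&(?:apos|#39|#x27);/g, "'")
    .replace(/&#x([0-9a-fA-F]+);/g, (raw, hex: string) => fromCodePoint(raw, parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (raw, dec: string) => fromCodePoint(raw, parseInt(dec, 10)))
    // "&amp;" goes last so "&amp;lt;" decodes to the literal text "&lt;", not "<".
    .replace(/&amp;/g, "&");
}

console.log(decodeTextEntitiesSketch("Faber &amp; Faber")); // Faber & Faber
console.log(decodeTextEntitiesSketch("&amp;lt;"));          // &lt;
console.log(decodeTextEntitiesSketch("&#169; &#x41;"));     // © A
```

Decoding before `escapeHtml` runs means the ampersand is escaped exactly once on the way back into HTML.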

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): Liberation Sans font fallback for Linux rendering

On Linux (Docker, CI, servers), neither Helvetica nor Arial exist. Our CSS
stacks were falling through to DejaVu Sans — wider letterforms that look like
Verdana, not the intended Helvetica/Faber look. Liberation Sans is the standard
metric-compatible Arial clone (SIL OFL 1.1, apt package fonts-liberation).

- print-css.ts: all four font stacks (body + @top-center + @bottom-center +
  @bottom-right CONFIDENTIAL) gain "Liberation Sans" between Helvetica and
  Arial. File-header docblock updated to reflect the new stack.
- .github/docker/Dockerfile.ci: explicit apt-get install fonts-liberation +
  fontconfig with retry, fc-cache -f, and a verify step that fails the build
  loud if the font disappears. Playwright's install-deps happens to pull this
  in today but the dep is implicit and could silently regress.
- SKILL.md.tmpl: one-sentence note pointing Linux users at fonts-liberation.
- SKILL.md: regenerated via bun run gen:skill-docs --host all (only make-pdf's
  generated file changed — verified clean diff scope).
- render.test.ts: 2 assertions — Liberation Sans in body stack AND in at least
  one @page margin-box rule (proves all four intended stacks got touched, not
  just one).
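
The ordering of the fallback stack can be sketched as a small helper. The family list below is an assumption built from the commit text (Liberation Sans between Helvetica and Arial); the real print-css.ts stacks may list additional families, and `toFontFamily` is a hypothetical name.

```typescript
// Hypothetical stack; the real print-css.ts value may include other families.
const SANS_STACK = ["Helvetica", "Liberation Sans", "Arial", "sans-serif"];

// Quote multi-word family names, which is the recommended CSS form.
function toFontFamily(families: string[]): string {
  return families.map((f) => (/\s/.test(f) ? `"${f}"` : f)).join(", ");
}

console.log(`font-family: ${toFontFamily(SANS_STACK)};`);
// font-family: Helvetica, "Liberation Sans", Arial, sans-serif;
```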

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.4.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: anonymize test fixtures, drop VC-partner framing

- CHANGELOG + render.test.ts fixtures use "Faber & Faber" instead of a
  personal name. Same regression coverage (ampersand in <title>, cover,
  TOC, body), neutral subject.
- make-pdf/SKILL.md.tmpl description drops the "send to a VC partner, a
  book agent, a judge, or Rick Rubin's team" line. "Not a draft artifact
  — a finished artifact" stands on its own without the audience posturing.
- SKILL.md regenerated.

No functional changes. All 58 make-pdf tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:32:58 +08:00


/**
 * Orchestrator — ties render, browseClient, and filesystem together.
 *
 *   generate(opts): markdown → PDF on disk. Returns output path.
 *   preview(opts):  markdown → HTML, opens it in a browser.
 *
 * Progress indication (per DX spec):
 *   - stdout: ONLY the output path, printed by cli.ts after this returns.
 *   - stderr: spinner + per-stage status lines, unless opts.quiet.
 *   - --verbose: stage timings.
 *
 * Tab lifecycle: every generate opens a dedicated tab via $B newtab --json,
 * runs load-html/js/pdf against --tab-id <N>, and closes the tab in a
 * try/finally. Parallel $P generate calls never race on the active tab.
 */
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import * as crypto from "node:crypto";
import { spawn } from "node:child_process";
import { render } from "./render";
import type { GenerateOptions, PreviewOptions } from "./types";
import { ExitCode } from "./types";
import * as browseClient from "./browseClient";
class ProgressReporter {
  private readonly quiet: boolean;
  private readonly verbose: boolean;
  private readonly stageStart = new Map<string, number>();
  private readonly totalStart: number;

  constructor(opts: { quiet?: boolean; verbose?: boolean }) {
    this.quiet = opts.quiet === true;
    this.verbose = opts.verbose === true;
    this.totalStart = Date.now();
  }

  begin(stage: string): void {
    this.stageStart.set(stage, Date.now());
    if (this.quiet) return;
    process.stderr.write(`\r\x1b[K${stage}...`);
  }

  end(stage: string, extra?: string): void {
    const start = this.stageStart.get(stage) ?? Date.now();
    const ms = Date.now() - start;
    if (this.quiet) return;
    if (this.verbose) {
      // Separate the timing from the extra detail so they don't run together.
      process.stderr.write(`\r\x1b[K${stage} (${ms}ms)${extra ? ` · ${extra}` : ""}\n`);
    }
  }

  done(extra: string): void {
    if (this.quiet) return;
    const total = ((Date.now() - this.totalStart) / 1000).toFixed(1);
    process.stderr.write(`\r\x1b[KDone in ${total}s. ${extra}\n`);
  }

  fail(stage: string, err: Error): void {
    if (!this.quiet) process.stderr.write("\r\x1b[K");
    // Always emit failure info, even in quiet mode — this is an error path.
    process.stderr.write(`${stage} failed: ${err.message}\n`);
  }
}

/**
 * generate — full pipeline. Returns the output PDF path on success.
 */
export async function generate(opts: GenerateOptions): Promise<string> {
  const progress = new ProgressReporter(opts);

  const input = path.resolve(opts.input);
  if (!fs.existsSync(input)) {
    throw new Error(`input file not found: ${input}`);
  }

  const outputPath = path.resolve(
    opts.output ?? path.join(os.tmpdir(), `${deriveSlug(input)}.pdf`),
  );

  // Stage 1: read markdown
  progress.begin("Reading markdown");
  const markdown = fs.readFileSync(input, "utf8");
  progress.end("Reading markdown");

  // Stage 2: render HTML
  progress.begin("Rendering HTML");
  const rendered = render({
    markdown,
    title: opts.title,
    author: opts.author,
    date: opts.date,
    cover: opts.cover,
    toc: opts.toc,
    watermark: opts.watermark,
    noChapterBreaks: opts.noChapterBreaks,
    confidential: opts.confidential,
    pageSize: opts.pageSize,
    margins: opts.margins,
    pageNumbers: opts.pageNumbers,
    footerTemplate: opts.footerTemplate,
  });
  progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);

  // Stage 3: write the HTML to a tmp file. browse does not read this file
  // (the HTML is passed inline via --from-file JSON); the tmp copy is kept
  // for debugging and removed in the finally block below.
  const htmlTmp = tmpFile("html");
  fs.writeFileSync(htmlTmp, rendered.html, "utf8");

  // Stage 4: spin up a dedicated tab, load HTML, (wait for Paged.js if TOC),
  // then emit PDF. Always close the tab.
  progress.begin("Opening tab");
  const tabId = browseClient.newtab();
  progress.end("Opening tab", `tabId=${tabId}`);

  try {
    progress.begin("Loading HTML into Chromium");
    browseClient.loadHtml({
      html: rendered.html,
      waitUntil: "domcontentloaded",
      tabId,
    });
    progress.end("Loading HTML into Chromium");

    if (opts.toc) {
      progress.begin("Paginating with Paged.js");
      // Browse's $B pdf already waits internally when --toc is passed.
      // We pass toc=true to browseClient.pdf() below.
      progress.end("Paginating with Paged.js", "Paged.js after");
    }

    progress.begin("Generating PDF");
    browseClient.pdf({
      output: outputPath,
      tabId,
      format: opts.pageSize ?? "letter",
      marginTop: opts.marginTop ?? opts.margins ?? "1in",
      marginRight: opts.marginRight ?? opts.margins ?? "1in",
      marginBottom: opts.marginBottom ?? opts.margins ?? "1in",
      marginLeft: opts.marginLeft ?? opts.margins ?? "1in",
      headerTemplate: opts.headerTemplate,
      footerTemplate: opts.footerTemplate,
      // CSS is the single source of truth for page numbers (see print-css.ts
      // @bottom-center). Chromium's native numbering always off to avoid double
      // footers. The CSS layer honors pageNumbers + footerTemplate via render().
      pageNumbers: false,
      tagged: opts.tagged !== false,
      outline: opts.outline !== false,
      printBackground: !!opts.watermark,
      toc: opts.toc,
    });
    progress.end("Generating PDF");

    const stat = fs.statSync(outputPath);
    const kb = Math.round(stat.size / 1024);
    progress.done(`${rendered.meta.wordCount} words · ${kb}KB · ${outputPath}`);
  } finally {
    // Always clean up the tab — even on crash, timeout, or Chromium hang.
    try {
      browseClient.closetab(tabId);
    } catch {
      // best-effort; we already exited the main path
    }
    // Cleanup tmp HTML
    try { fs.unlinkSync(htmlTmp); } catch { /* best-effort */ }
  }

  return outputPath;
}

/**
 * preview — render HTML and open it. No PDF round trip.
 */
export async function preview(opts: PreviewOptions): Promise<string> {
  const progress = new ProgressReporter(opts);

  const input = path.resolve(opts.input);
  if (!fs.existsSync(input)) {
    throw new Error(`input file not found: ${input}`);
  }

  progress.begin("Rendering HTML");
  const markdown = fs.readFileSync(input, "utf8");
  const rendered = render({
    markdown,
    title: opts.title,
    author: opts.author,
    date: opts.date,
    cover: opts.cover,
    toc: opts.toc,
    watermark: opts.watermark,
    noChapterBreaks: opts.noChapterBreaks,
    confidential: opts.confidential,
    pageNumbers: opts.pageNumbers,
  });
  progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);

  // Write to a stable path under the OS temp dir (os.tmpdir()) so the user
  // can reload in the same tab.
  const previewPath = path.join(os.tmpdir(), `make-pdf-preview-${deriveSlug(input)}.html`);
  fs.writeFileSync(previewPath, rendered.html, "utf8");

  progress.begin("Opening preview");
  tryOpen(previewPath);
  progress.end("Opening preview");

  progress.done(`Preview at ${previewPath}`);
  return previewPath;
}

// ─── helpers ──────────────────────────────────────────────

function deriveSlug(p: string): string {
  const base = path.basename(p).replace(/\.[^.]+$/, "");
  return base.replace(/[^a-zA-Z0-9-_]+/g, "-").slice(0, 64) || "document";
}

function tmpFile(ext: string): string {
  const hash = crypto.randomBytes(6).toString("hex");
  return path.join(os.tmpdir(), `make-pdf-${process.pid}-${hash}.${ext}`);
}

function tryOpen(pathOrUrl: string): void {
  const platform = process.platform;
  const cmd =
    platform === "darwin" ? "open" :
    platform === "win32" ? "cmd" :
    "xdg-open";
  const args = platform === "win32" ? ["/c", "start", "", pathOrUrl] : [pathOrUrl];
  try {
    const child = spawn(cmd, args, { detached: true, stdio: "ignore" });
    child.unref();
  } catch {
    // Non-fatal; the caller already has the path and will print it.
  }
}

/** Setup-only re-export so cli.ts can dynamic-import without another file. */
export { ExitCode };
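
The tab lifecycle described in the file header (open a dedicated tab, do the work against it, close it in a `finally`) can be sketched in isolation. Everything here is hypothetical and for illustration only: `TabClient`, `withTab`, and the in-memory fake stand in for browseClient and are not part of the actual module.

```typescript
// Hypothetical stand-in for browseClient; only the call order matters here.
interface TabClient {
  newtab(): number;
  closetab(tabId: number): void;
}

// Runs `work` against a dedicated tab and always closes that tab afterwards.
function withTab<T>(client: TabClient, work: (tabId: number) => T): T {
  const tabId = client.newtab();
  try {
    return work(tabId);
  } finally {
    // Closing is best-effort: a close failure must not mask the original error.
    try {
      client.closetab(tabId);
    } catch {
      /* ignore */
    }
  }
}

// In-memory fake that records which tabs are still open.
const openTabs = new Set<number>();
let nextTabId = 1;
const fakeClient: TabClient = {
  newtab: () => {
    const id = nextTabId++;
    openTabs.add(id);
    return id;
  },
  closetab: (id) => {
    openTabs.delete(id);
  },
};

try {
  withTab(fakeClient, () => {
    throw new Error("render failed");
  });
} catch (err) {
  console.log((err as Error).message); // render failed
}
console.log(openTabs.size); // 0
```

Because each call owns its tab ID, concurrent runs never depend on which tab happens to be active, and the `finally` keeps a failed run from leaking its tab.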