fix(v1.4.1.0): /make-pdf — page numbers, entity escape, Linux fonts (#1098)

* fix(make-pdf): single-source page numbers via CSS, honor --no-page-numbers end-to-end

Two page-number sources were stacking in every PDF: Chromium's native footer and our @page @bottom-center CSS. The CLI flag --page-numbers/--no-page-numbers also never reached the CSS layer, because RenderOptions didn't carry it. Passing --footer-template likewise dropped the "custom footer replaces stock footer" semantic.

- orchestrator.ts: browseClient.pdf() gets pageNumbers:false unconditionally. CSS is the single source of truth. Chromium native numbering is always off.
- render.ts: RenderOptions gains pageNumbers + footerTemplate. render() computes showPageNumbers = pageNumbers !== false && !footerTemplate and passes it to printCss(), preserving the prior footerTemplate-suppresses-stock semantic.
- print-css.ts: PrintCssOptions.pageNumbers wraps @bottom-center in a conditional matching the existing showConfidential pattern.
- types.ts: PreviewOptions.pageNumbers so the preview path compiles and matches the CLI.
- render.test.ts: 7 regression tests covering printCss({pageNumbers}) in isolation AND the full render() data flow, including the footerTemplate path.

* fix(make-pdf): decode HTML entities in titles and TOC to prevent double-escape

A markdown title like "# Herbert & Garry" rendered as the literal text "Herbert &amp; Garry" in <title>, cover block, and TOC entries. marked emits "&amp;" (correct HTML), but extractFirstHeading and extractHeadings only stripTags, leaving the entity intact. That string then flows through escapeHtml, producing the double-encode.

- render.ts: new decodeTextEntities helper, distinct from decodeTypographicEntities (which runs on in-pipeline HTML and intentionally preserves &amp;). Covers named entities (lt/gt/quot/apos/39/x27/amp) AND numeric (decimal + hex), so numeric-entity inputs for characters like "©" (e.g. "&#169;") or "—" don't create the same partial-fix bug. Amp-last ordering prevents double-decode on "&amp;lt;" et al.
- Apply in both extractFirstHeading and extractHeadings. extractHeadings feeds buildTocBlock → escapeHtml, so the TOC site had the same bug.
- render.test.ts: 8 tests covering the contract: parameterized across the &, <, >, ©, and — characters; single-escape in <title>/cover; TOC double-escape check; numeric entity decode; smartypants-interacts-with-quotes contract (no raw equality).

* fix(make-pdf): Liberation Sans font fallback for Linux rendering

On Linux (Docker, CI, servers), neither Helvetica nor Arial exists. Our CSS stacks were falling through to DejaVu Sans, whose wider letterforms look like Verdana, not the intended Helvetica/Faber look. Liberation Sans is the standard metric-compatible Arial clone (SIL OFL 1.1, apt package fonts-liberation).

- print-css.ts: all four font stacks (body + @top-center + @bottom-center + @bottom-right CONFIDENTIAL) gain "Liberation Sans" between Helvetica and Arial. File-header docblock updated to reflect the new stack.
- .github/docker/Dockerfile.ci: explicit apt-get install fonts-liberation + fontconfig with retry, fc-cache -f, and a verify step that fails the build loudly if the font disappears. Playwright's install-deps happens to pull this in today, but the dep is implicit and could silently regress.
- SKILL.md.tmpl: one-sentence note pointing Linux users at fonts-liberation.
- SKILL.md: regenerated via bun run gen:skill-docs --host all (only make-pdf's generated file changed; verified clean diff scope).
- render.test.ts: 2 assertions: Liberation Sans in the body stack AND in at least one @page margin-box rule (proves all four intended stacks got touched, not just one).

* chore: bump version and changelog (v1.4.1.0)

* chore: anonymize test fixtures, drop VC-partner framing

- CHANGELOG + render.test.ts fixtures use "Faber & Faber" instead of a personal name. Same regression coverage (ampersand in <title>, cover, TOC, body), neutral subject.
- make-pdf/SKILL.md.tmpl description drops the "send to a VC partner, a book agent, a judge, or Rick Rubin's team" line. "Not a draft artifact — a finished artifact" stands on its own without the audience posturing.
- SKILL.md regenerated.

No functional changes. All 58 make-pdf tests still pass.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-24 10:39:57 +02:00 · 2026-04-20 22:32:58 +08:00
parent 97584f9a59
commit e23ff280a1
11 changed files with 285 additions and 24 deletions
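The print-css.ts changes described in the commit message are not part of the hunks shown here, so the following is only a minimal sketch of the idea, not the project's code. `printCssSketch`, `SketchOptions`, the `sans-serif` tail of the stack, and the exact CSS text are all hypothetical; the only details taken from the message are that "Liberation Sans" sits between Helvetica and Arial and that the @bottom-center rule is emitted conditionally on `pageNumbers`.

```typescript
// Hypothetical stand-in for PrintCssOptions; only `pageNumbers` comes from the commit.
interface SketchOptions {
  pageNumbers?: boolean;
}

// Assumed stack: the commit only states that "Liberation Sans" goes between
// Helvetica and Arial. The generic sans-serif tail is a guess.
const FONT_STACK = 'Helvetica, "Liberation Sans", Arial, sans-serif';

function printCssSketch(opts: SketchOptions = {}): string {
  // Treat "not explicitly false" as on, mirroring the default described for RenderOptions.
  const showPageNumbers = opts.pageNumbers !== false;

  // Emit the margin box only when page numbers are wanted.
  const bottomCenter = showPageNumbers
    ? `  @bottom-center { content: counter(page); font-family: ${FONT_STACK}; }\n`
    : "";

  return `body { font-family: ${FONT_STACK}; }\n@page {\n${bottomCenter}}\n`;
}

console.log(printCssSketch({ pageNumbers: false }));
```

With `pageNumbers: false` the @page block comes out empty, while the body rule keeps the Linux-friendly font stack either way.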
@@ -34,6 +34,11 @@ export interface RenderOptions {
  // Page layout
  pageSize?: "letter" | "a4" | "legal" | "tabloid";
  margins?: string;
+
+  // Footer behavior. pageNumbers defaults to true. When footerTemplate is set,
+  // CSS page numbers are suppressed so the custom Chromium footer wins cleanly.
+  pageNumbers?: boolean;
+  footerTemplate?: string;
 }

 export interface RenderResult {
@@ -74,6 +79,10 @@ export function render(opts: RenderOptions): RenderResult {
  const derivedDate = opts.date ?? formatToday();

  // 5. Build CSS
+  // CSS is the single source of truth for page numbers (Chromium native
+  // numbering is always off in orchestrator). If the caller supplied a custom
+  // footerTemplate, suppress CSS page numbers too so their footer wins.
+  const showPageNumbers = opts.pageNumbers !== false && !opts.footerTemplate;
  const cssOptions: PrintCssOptions = {
    cover: opts.cover,
    toc: opts.toc,
@@ -83,6 +92,7 @@ export function render(opts: RenderOptions): RenderResult {
    runningHeader: derivedTitle,
    pageSize: opts.pageSize,
    margins: opts.margins,
+    pageNumbers: showPageNumbers,
  };
  const css = printCss(cssOptions);
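The flag resolution in this hunk can be tried in isolation. The sketch below copies the boolean expression from the diff into a standalone helper; the names `FooterOptions` and `resolveShowPageNumbers` are hypothetical and exist only for illustration.

```typescript
// Hypothetical subset of RenderOptions, limited to the two fields the expression reads.
interface FooterOptions {
  pageNumbers?: boolean;
  footerTemplate?: string;
}

// Same expression as in render(): numbers are on unless explicitly disabled
// or a non-empty custom footer template is supplied.
function resolveShowPageNumbers(opts: FooterOptions): boolean {
  return opts.pageNumbers !== false && !opts.footerTemplate;
}

console.log(resolveShowPageNumbers({}));                                   // true
console.log(resolveShowPageNumbers({ pageNumbers: false }));               // false
console.log(resolveShowPageNumbers({ footerTemplate: "<span>x</span>" })); // false
console.log(resolveShowPageNumbers({ footerTemplate: "" }));               // true
```

Because the check uses `!opts.footerTemplate`, an empty-string template counts as "no custom footer" and CSS page numbers stay on.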

@@ -278,7 +288,7 @@ function extractHeadings(html: string): Array<{ level: number; text: string }> {
  let match;
  while ((match = re.exec(html)) !== null) {
    const level = parseInt(match[1].slice(1), 10);
-    const text = stripTags(match[2]).trim();
+    const text = decodeTextEntities(stripTags(match[2]).trim());
    if (text) headings.push({ level, text });
  }
  return headings;
@@ -314,7 +324,32 @@ function wrapChaptersByH1(html: string): string {

 function extractFirstHeading(html: string): string | null {
  const m = html.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
-  return m ? stripTags(m[1]).trim() : null;
+  return m ? decodeTextEntities(stripTags(m[1]).trim()) : null;
+}
+
+/**
+ * Decode HTML entities in plain text extracted from rendered HTML. Distinct
+ * from decodeTypographicEntities (which runs on in-pipeline HTML and preserves
+ * &amp; because &amp;amp; can be legitimate there). This runs on text destined
+ * for <title>, cover, and TOC entries where &amp; MUST become & or escapeHtml
+ * produces &amp;amp;.
+ *
+ * Amp-last ordering: input "&amp;#169;" skips the numeric passes (no "&#" yet),
+ * then the final &amp; pass yields the literal text "&#169;". Decoding &amp;
+ * first would produce "&#169;" early and the numeric pass would turn it into
+ * "©", a double-decode; likewise "&amp;lt;" would wrongly become "<".
+ */
+function decodeTextEntities(s: string): string {
+  return s
+    .replace(/&lt;/g, "<")
+    .replace(/&gt;/g, ">")
+    .replace(/&quot;/g, '"')
+    .replace(/&#39;/g, "'")
+    .replace(/&apos;/g, "'")
+    .replace(/&#x27;/g, "'")
+    .replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(parseInt(n, 10)))
+    .replace(/&#x([0-9a-fA-F]+);/g, (_, n) => String.fromCodePoint(parseInt(n, 16)))
+    .replace(/&amp;/g, "&");
 }
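To see why the &amp; pass runs last, here is a self-contained sketch that compares both orderings. It is not the project's helper: `decodeAmpLast`, `decodeAmpFirst`, `decodeNamedAndNumeric`, and `decodeCodePoint` are hypothetical names. The range guard is also an addition, included because `String.fromCodePoint` throws a `RangeError` for values above 0x10FFFF.

```typescript
// Decode one numeric entity; leave the original text alone if the code point
// is out of range, since String.fromCodePoint would throw a RangeError.
function decodeCodePoint(match: string, digits: string, radix: number): string {
  const cp = parseInt(digits, radix);
  return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
}

// Everything except &amp;: a few named entities plus decimal and hex forms.
function decodeNamedAndNumeric(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&#(\d+);/g, (m: string, n: string) => decodeCodePoint(m, n, 10))
    .replace(/&#x([0-9a-fA-F]+);/g, (m: string, n: string) => decodeCodePoint(m, n, 16));
}

// Amp-last: the ordering used in the diff. Each entity is decoded exactly once.
function decodeAmpLast(s: string): string {
  return decodeNamedAndNumeric(s).replace(/&amp;/g, "&");
}

// Amp-first: shown only to illustrate the double-decode hazard.
function decodeAmpFirst(s: string): string {
  return decodeNamedAndNumeric(s.replace(/&amp;/g, "&"));
}

console.log(decodeAmpLast("Faber &amp; Faber")); // Faber & Faber
console.log(decodeAmpLast("&amp;lt;"));          // &lt;
console.log(decodeAmpFirst("&amp;lt;"));         // <
console.log(decodeAmpLast("&#169; &#x41;"));     // © A
```

The second and third calls show the difference: an author who typed the literal text `&lt;` in a heading gets it back with amp-last, while amp-first collapses it to `<`.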

 function stripTags(html: string): string {