fix(v1.4.1.0): /make-pdf — page numbers, entity escape, Linux fonts (#1098)

* fix(make-pdf): single-source page numbers via CSS, honor --no-page-numbers end-to-end

Two page-number sources were stacking in every PDF: Chromium's native footer and our @page @bottom-center CSS. The CLI flag --page-numbers/--no-page-numbers also never reached the CSS layer, because RenderOptions didn't carry it. Passing --footer-template likewise dropped the "custom footer replaces stock footer" semantic.

- orchestrator.ts: browseClient.pdf() gets pageNumbers:false unconditionally. CSS is the single source of truth. Chromium native numbering always off.
- render.ts: RenderOptions gains pageNumbers + footerTemplate. render() computes showPageNumbers = pageNumbers !== false && !footerTemplate and passes to printCss(), preserving the prior footerTemplate-suppresses-stock semantic.
- print-css.ts: PrintCssOptions.pageNumbers wraps @bottom-center in a conditional matching the existing showConfidential pattern.
- types.ts: PreviewOptions.pageNumbers so preview path compiles and matches CLI.
- render.test.ts: 7 regression tests covering printCss({pageNumbers}) in isolation AND the full render() data flow incl. footerTemplate path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): decode HTML entities in titles and TOC to prevent double-escape

A markdown title like "# Herbert & Garry" rendered as "Herbert &amp; Garry" in <title>, cover block, and TOC entries. marked emits "&amp;" (correct HTML), but extractFirstHeading and extractHeadings only stripTags, leaving the entity intact. That string then flows through escapeHtml, producing the double-encode.

- render.ts: new decodeTextEntities helper, distinct from decodeTypographicEntities (which runs on in-pipeline HTML and intentionally preserves &amp;). Covers named entities (lt/gt/quot/apos/39/x27/amp) AND numeric (decimal + hex) so inputs like "©" or "—" don't create the same partial-fix bug. Amp-last ordering prevents double-decode on "&amp;lt;" et al.
- Apply in both extractFirstHeading and extractHeadings. extractHeadings feeds buildTocBlock → escapeHtml, so the TOC site had the same bug.
- render.test.ts: 8 tests covering the contract: parameterized across &, <, >, ©, — chars; single-escape in <title>/cover; TOC double-escape check; numeric entity decode; smartypants-interacts-with-quotes contract (no raw equality).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): Liberation Sans font fallback for Linux rendering

On Linux (Docker, CI, servers), neither Helvetica nor Arial exist. Our CSS stacks were falling through to DejaVu Sans, whose wider letterforms look like Verdana, not the intended Helvetica/Faber look. Liberation Sans is the standard metric-compatible Arial clone (SIL OFL 1.1, apt package fonts-liberation).

- print-css.ts: all four font stacks (body + @top-center + @bottom-center + @bottom-right CONFIDENTIAL) gain "Liberation Sans" between Helvetica and Arial. File-header docblock updated to reflect the new stack.
- .github/docker/Dockerfile.ci: explicit apt-get install fonts-liberation + fontconfig with retry, fc-cache -f, and a verify step that fails the build loud if the font disappears. Playwright's install-deps happens to pull this in today but the dep is implicit and could silently regress.
- SKILL.md.tmpl: one-sentence note pointing Linux users at fonts-liberation.
- SKILL.md: regenerated via bun run gen:skill-docs --host all (only make-pdf's generated file changed; verified clean diff scope).
- render.test.ts: 2 assertions: Liberation Sans in body stack AND in at least one @page margin-box rule (proves all four intended stacks got touched, not just one).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.4.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: anonymize test fixtures, drop VC-partner framing

- CHANGELOG + render.test.ts fixtures use "Faber & Faber" instead of a personal name. Same regression coverage (ampersand in <title>, cover, TOC, body), neutral subject.
- make-pdf/SKILL.md.tmpl description drops the "send to a VC partner, a book agent, a judge, or Rick Rubin's team" line. "Not a draft artifact — a finished artifact" stands on its own without the audience posturing.
- SKILL.md regenerated. No functional changes. All 58 make-pdf tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
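
For readers skimming the first fix, the page-number gating it describes can be sketched roughly as follows. This is a minimal illustration, not the code in render.ts or print-css.ts: the names RenderOptionsSketch, shouldShowPageNumbers, and printCssSketch are hypothetical, and only the pageNumbers and footerTemplate fields and the `pageNumbers !== false && !footerTemplate` rule are taken from the commit message.

```typescript
// Hypothetical option shape; only these two fields are named in the commit message.
interface RenderOptionsSketch {
  pageNumbers?: boolean;
  footerTemplate?: string;
}

// Page numbers are on unless explicitly disabled or replaced by a custom footer.
function shouldShowPageNumbers(opts: RenderOptionsSketch): boolean {
  return opts.pageNumbers !== false && !opts.footerTemplate;
}

// Stand-in for printCss(): emits the @bottom-center rule only when enabled.
// The margin value is a placeholder, not the project's real stylesheet.
function printCssSketch(opts: { pageNumbers?: boolean } = {}): string {
  const rules: string[] = ["margin: 1in;"];
  if (opts.pageNumbers !== false) {
    rules.push("@bottom-center { content: counter(page); }");
  }
  return `@page { ${rules.join(" ")} }`;
}

// Assumed wiring: render() resolves the flag once and hands it to the CSS layer.
function renderCssSketch(opts: RenderOptionsSketch): string {
  return printCssSketch({ pageNumbers: shouldShowPageNumbers(opts) });
}
```

Because the flag is resolved in one place, a custom footer and an explicit --no-page-numbers take the same path, so the CSS layer only ever sees a single boolean.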
2026-06-27 12:10:00 +02:00 · 2026-04-20 22:32:58 +08:00
parent 97584f9a59
commit e23ff280a1
11 changed files with 285 additions and 24 deletions
@@ -311,4 +311,139 @@ describe("printCss", () => {
    // Confirm no p-indent slipped in
    expect(css).not.toMatch(/p\s*\+\s*p\s*\{[^}]*text-indent/);
  });
+
+  test("emits @bottom-center page-number rule by default", () => {
+    const css = printCss();
+    expect(css).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+
+  test("suppresses @bottom-center page-number rule when pageNumbers=false", () => {
+    const css = printCss({ pageNumbers: false });
+    expect(css).not.toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+
+  test("still emits @bottom-center when pageNumbers=true (explicit)", () => {
+    const css = printCss({ pageNumbers: true });
+    expect(css).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+
+  test("font stacks include Liberation Sans adjacent to Helvetica", () => {
+    const css = printCss({ confidential: true });
+    // Body stack
+    expect(css).toMatch(/font-family:\s*Helvetica,\s*"Liberation Sans",\s*Arial/);
+    // At least one @page margin box (running header / page number / CONFIDENTIAL)
+    // should also have the updated stack.
+    const marginBoxStacks = css.match(/@(top|bottom)-(center|right)\s*\{[^}]*Liberation Sans/g) ?? [];
+    expect(marginBoxStacks.length).toBeGreaterThanOrEqual(1);
+  });
+
+  test("all four original Helvetica stacks now include Liberation Sans", () => {
+    const css = printCss({ runningHeader: "Running Title", confidential: true });
+    // Count: body (1) + running header (1) + page numbers (1) + confidential (1) = 4
+    const occurrences = (css.match(/"Liberation Sans"/g) ?? []).length;
+    expect(occurrences).toBeGreaterThanOrEqual(4);
+  });
+});
+
+// ─── render() — pageNumbers / footerTemplate data flow ───────────────
+
+describe("render() — pageNumbers data flow", () => {
+  test("CSS footer renders by default", () => {
+    const result = render({ markdown: `# Doc\n\nBody.` });
+    expect(result.printCss).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+
+  test("--no-page-numbers reaches the CSS layer", () => {
+    const result = render({ markdown: `# Doc\n\nBody.`, pageNumbers: false });
+    expect(result.printCss).not.toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+
+  test("footerTemplate suppresses CSS page numbers (custom footer wins)", () => {
+    const result = render({
+      markdown: `# Doc\n\nBody.`,
+      footerTemplate: `<div class="foo">custom</div>`,
+    });
+    expect(result.printCss).not.toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+
+  test("pageNumbers=true + no footerTemplate keeps CSS footer", () => {
+    const result = render({ markdown: `# Doc`, pageNumbers: true });
+    expect(result.printCss).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
+  });
+});
+
+// ─── render() — HTML entity handling in titles, cover, TOC ───────────
+
+describe("render() — no double HTML entity escaping", () => {
+  type Case = { char: string; inTitle: string; expectedTitleMeta: string };
+
+  // Only characters that should flow through unchanged. `"` and `'` are
+  // omitted from this set because smartypants converts them to curly quotes
+  // before heading extraction — asserted separately below.
+  const cases: Case[] = [
+    { char: "&", inTitle: "A & B", expectedTitleMeta: "A & B" },
+    { char: "<", inTitle: "A < B", expectedTitleMeta: "A < B" },
+    { char: ">", inTitle: "A > B", expectedTitleMeta: "A > B" },
+    { char: "©", inTitle: "A © B", expectedTitleMeta: "A © B" },
+    { char: "—", inTitle: "A — B", expectedTitleMeta: "A — B" },
+  ];
+
+  for (const { char, inTitle, expectedTitleMeta } of cases) {
+    test(`"${char}" in H1 has no double-escape in <title> or cover`, () => {
+      const result = render({
+        markdown: `# ${inTitle}\n\nBody.`,
+        cover: true,
+        author: "A",
+      });
+      // Meta: decoded plain text.
+      expect(result.meta.title).toBe(expectedTitleMeta);
+      // HTML: <title>...</title> never contains double-escape patterns.
+      expect(result.html).not.toMatch(/<title>[^<]*&amp;amp;/);
+      expect(result.html).not.toMatch(/<title>[^<]*&amp;lt;/);
+      expect(result.html).not.toMatch(/<title>[^<]*&amp;gt;/);
+      expect(result.html).not.toMatch(/<title>[^<]*&amp;#\d+;/);
+      expect(result.html).not.toMatch(/<title>[^<]*&amp;#x[0-9a-fA-F]+;/);
+      // Cover block also single-escape.
+      expect(result.html).not.toMatch(/class="cover-title"[^>]*>[^<]*&amp;amp;/);
+    });
+  }
+
+  test('ampersand in <title> renders as exactly one "&amp;"', () => {
+    const result = render({ markdown: `# Faber & Faber\n\nBody.` });
+    expect(result.html).toContain("<title>Faber &amp; Faber</title>");
+    expect(result.html).not.toContain("&amp;amp;");
+  });
+
+  test("TOC entries have no double-escape when a heading contains '&'", () => {
+    const result = render({
+      markdown: `# Doc\n\n## Faber & Faber\n\nBody.\n\n## Other\n\nMore.`,
+      toc: true,
+    });
+    // TOC renders the heading text through escapeHtml; must be single-escaped.
+    expect(result.html).toContain("Faber &amp; Faber");
+    expect(result.html).not.toContain("&amp;amp;");
+  });
+
+  test('numeric entity in H1 (e.g. "&#169;") decodes cleanly to <title>', () => {
+    // Marked passes through numeric entities verbatim in the HTML output,
+    // so the decoder must handle them.
+    const result = render({ markdown: `# A &#169; B\n\nBody.` });
+    expect(result.meta.title).toBe("A © B");
+    expect(result.html).toContain("<title>A © B</title>");
+  });
+
+  test("smartypants converts raw quotes in title BEFORE extraction (contract)", () => {
+    // We do NOT assert raw `"` survives — smartypants is expected to convert it.
+    // The contract is: no double-escape of the encoded form.
+    const result = render({ markdown: `# Say "hi"\n\nBody.` });
+    expect(result.html).not.toContain("&amp;quot;");
+    expect(result.html).not.toContain("&amp;#39;");
+    // And <title> contains exactly one level of escaping.
+    const titleMatch = result.html.match(/<title>([^<]*)<\/title>/);
+    expect(titleMatch).toBeTruthy();
+    if (titleMatch) {
+      // Never contains a double-encoded entity.
+      expect(titleMatch[1]).not.toMatch(/&amp;(amp|lt|gt|quot|#\d+);/);
+    }
+  });
 });
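
The entity tests above pin down a decode-then-escape contract. The sketch below shows one way a decoder with that behavior could look. It is an assumption-laden illustration, not the decodeTextEntities helper from render.ts (which this diff does not show): decodeTextEntitiesSketch, escapeHtmlSketch, and codePointOr are hypothetical names, and the replacement order simply follows the "amp-last" rule stated in the commit message.

```typescript
// Convert a numeric code point to a character; leave out-of-range input untouched.
function codePointOr(original: string, code: number): string {
  return Number.isInteger(code) && code >= 0 && code <= 0x10ffff
    ? String.fromCodePoint(code)
    : original;
}

// Decode numeric entities first, then the named ones, with &amp; last so that
// literal text such as "&amp;lt;" becomes "&lt;" instead of "<".
function decodeTextEntitiesSketch(text: string): string {
  return text
    .replace(/&#x([0-9a-fA-F]+);/g, (m: string, hex: string) => codePointOr(m, parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (m: string, dec: string) => codePointOr(m, parseInt(dec, 10)))
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Stand-in for the escapeHtml step applied when building <title>, cover, and TOC.
function escapeHtmlSketch(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Heading text as it might arrive from the Markdown renderer, already entity-encoded.
const headingFromHtml = "Faber &amp; Faber";
const plainTitle = decodeTextEntitiesSketch(headingFromHtml);
const titleHtml = `<title>${escapeHtmlSketch(plainTitle)}</title>`;
```

Decoding before escaping means the escape step always starts from plain text, which is why the tests can assert a single "&amp;" in the output rather than checking each call site separately.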