gstack/make-pdf/test/render.test.ts
Garry Tan e23ff280a1 fix(v1.4.1.0): /make-pdf — page numbers, entity escape, Linux fonts (#1098)
* fix(make-pdf): single-source page numbers via CSS, honor --no-page-numbers end-to-end

Two page-number sources were stacking in every PDF: Chromium's native footer
and our @page @bottom-center CSS. The CLI flag --page-numbers/--no-page-numbers
also never reached the CSS layer, because RenderOptions didn't carry it.
Passing --footer-template likewise dropped the "custom footer replaces stock
footer" semantic.

- orchestrator.ts: browseClient.pdf() gets pageNumbers:false unconditionally.
  CSS is the single source of truth. Chromium native numbering always off.
- render.ts: RenderOptions gains pageNumbers + footerTemplate. render() computes
  showPageNumbers = pageNumbers !== false && !footerTemplate and passes to
  printCss(), preserving the prior footerTemplate-suppresses-stock semantic.
- print-css.ts: PrintCssOptions.pageNumbers wraps @bottom-center in a conditional
  matching the existing showConfidential pattern.
- types.ts: PreviewOptions.pageNumbers so preview path compiles and matches CLI.
- render.test.ts: 7 regression tests covering printCss({pageNumbers}) in
  isolation AND the full render() data flow incl. footerTemplate path.
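
The data flow in the list above can be sketched as follows. This is a minimal illustration, not the code in render.ts: `RenderOptionsSketch`, `resolveShowPageNumbers`, and `pageNumberRule` are hypothetical names, and only the boolean expression is taken from the commit message.

```typescript
// Hypothetical, trimmed-down option shape; the real RenderOptions has more fields.
interface RenderOptionsSketch {
  pageNumbers?: boolean;
  footerTemplate?: string;
}

// Mirrors the expression quoted above:
// showPageNumbers = pageNumbers !== false && !footerTemplate
function resolveShowPageNumbers(opts: RenderOptionsSketch): boolean {
  return opts.pageNumbers !== false && !opts.footerTemplate;
}

// Hypothetical stand-in for the conditional @bottom-center rule in printCss().
function pageNumberRule(show: boolean): string {
  return show ? "@bottom-center { content: counter(page); }" : "";
}

// Default: numbers on. Explicit false or a custom footer: numbers off.
const defaultRule = pageNumberRule(resolveShowPageNumbers({}));
const suppressedRule = pageNumberRule(resolveShowPageNumbers({ pageNumbers: false }));
```

Because `pageNumbers` is compared against `false` rather than tested for truthiness, an omitted flag behaves like `true`, which matches the "CSS footer renders by default" test.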

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): decode HTML entities in titles and TOC to prevent double-escape

A markdown title like "# Herbert & Garry" rendered as "Herbert &amp;amp; Garry"
in <title>, cover block, and TOC entries. marked emits "&amp;" (correct HTML),
but extractFirstHeading and extractHeadings only stripTags — leaving the entity
intact. That string then flows through escapeHtml, producing the double-encode.

- render.ts: new decodeTextEntities helper, distinct from decodeTypographicEntities
  (which runs on in-pipeline HTML and intentionally preserves &amp;). Covers
  named entities (lt/gt/quot/apos/39/x27/amp) AND numeric (decimal + hex) so
  inputs like "&#169;" or "&#x2014;" don't create the same partial-fix bug.
  Amp-last ordering prevents double-decode on "&amp;lt;" et al.
- Apply in both extractFirstHeading and extractHeadings. extractHeadings feeds
  buildTocBlock → escapeHtml, so the TOC site had the same bug.
- render.test.ts: 8 tests covering the contract — parameterized across &, <, >,
  ©, — chars; single-escape in <title>/cover; TOC double-escape check; numeric
  entity decode; smartypants-interacts-with-quotes contract (no raw equality).
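
A rough sketch of the decode-then-escape contract described above. The function names `decodeTextEntitiesSketch` and `escapeHtmlSketch` are hypothetical, and the real helper in render.ts may cover a different entity set; what the sketch tries to show is why decoding `&amp;` last avoids a double decode.

```typescript
// Guard against out-of-range code points, which String.fromCodePoint rejects.
function fromCodePointSafe(cp: number, original: string): string {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff
    ? String.fromCodePoint(cp)
    : original;
}

// Hypothetical decoder: numeric entities first, named entities next, "&amp;" last.
function decodeTextEntitiesSketch(s: string): string {
  return s
    .replace(/&#x([0-9a-fA-F]+);/g, (m: string, hex: string) => fromCodePointSafe(parseInt(hex, 16), m))
    .replace(/&#(\d+);/g, (m: string, dec: string) => fromCodePointSafe(parseInt(dec, 10), m))
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" and stops there
}

// Hypothetical escaper: "&" first, so the entities it emits are not re-escaped.
function escapeHtmlSketch(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Without decoding, already-escaped heading text is escaped a second time.
const doubled = escapeHtmlSketch("Faber &amp; Faber");
// With decoding first, the round trip yields a single level of escaping.
const single = escapeHtmlSketch(decodeTextEntitiesSketch("Faber &amp; Faber"));
```

If `&amp;` were decoded first, `&amp;lt;` would turn into `&lt;` and then be decoded again to `<` by the later replacement, which is the double decode the ordering is meant to prevent.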

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): Liberation Sans font fallback for Linux rendering

On Linux (Docker, CI, servers), neither Helvetica nor Arial exists. Our CSS
stacks were falling through to DejaVu Sans — wider letterforms that look like
Verdana, not the intended Helvetica/Faber look. Liberation Sans is the standard
metric-compatible Arial clone (SIL OFL 1.1, apt package fonts-liberation).

- print-css.ts: all four font stacks (body + @top-center + @bottom-center +
  @bottom-right CONFIDENTIAL) gain "Liberation Sans" between Helvetica and
  Arial. File-header docblock updated to reflect the new stack.
- .github/docker/Dockerfile.ci: explicit apt-get install fonts-liberation +
  fontconfig with retry, fc-cache -f, and a verify step that fails the build
  loudly if the font disappears. Playwright's install-deps happens to pull this
  in today but the dep is implicit and could silently regress.
- SKILL.md.tmpl: one-sentence note pointing Linux users at fonts-liberation.
- SKILL.md: regenerated via bun run gen:skill-docs --host all (only make-pdf's
  generated file changed — verified clean diff scope).
- render.test.ts: 2 assertions — Liberation Sans in body stack AND in at least
  one @page margin-box rule (proves all four intended stacks got touched, not
  just one).
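
One way to keep the four stacks from drifting apart is a single shared constant. The sketch below is an assumption about structure, not a copy of print-css.ts: `SANS_STACK`, `fontStackCssSketch`, and `countLiberationSans` are hypothetical, and the stack is shortened (the real one carries further fallbacks such as CJK fonts).

```typescript
// Hypothetical, shortened stack with Liberation Sans between Helvetica and Arial.
const SANS_STACK = ["Helvetica", '"Liberation Sans"', "Arial", "sans-serif"].join(", ");

// Hypothetical sketch of the four rules named above: body plus three margin boxes.
function fontStackCssSketch(): string {
  const family = `font-family: ${SANS_STACK};`;
  return [
    `body { ${family} }`,
    `@page {`,
    `  @top-center { ${family} }`,
    `  @bottom-center { content: counter(page); ${family} }`,
    `  @bottom-right { content: "CONFIDENTIAL"; ${family} }`,
    `}`,
  ].join("\n");
}

// Same counting approach as the regression test: one hit per stack.
function countLiberationSans(css: string): number {
  return (css.match(/"Liberation Sans"/g) ?? []).length;
}

const sketchCss = fontStackCssSketch();
const liberationCount = countLiberationSans(sketchCss); // 4 in this sketch
```

Counting occurrences, rather than checking a single margin box, is what lets a test notice when one of the four stacks is missed.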

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.4.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: anonymize test fixtures, drop VC-partner framing

- CHANGELOG + render.test.ts fixtures use "Faber & Faber" instead of a
  personal name. Same regression coverage (ampersand in <title>, cover,
  TOC, body), neutral subject.
- make-pdf/SKILL.md.tmpl description drops the "send to a VC partner, a
  book agent, a judge, or Rick Rubin's team" line. "Not a draft artifact
  — a finished artifact" stands on its own without the audience posturing.
- SKILL.md regenerated.

No functional changes. All 58 make-pdf tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:32:58 +08:00

450 lines
17 KiB
TypeScript

/**
* Renderer unit tests — pure-function assertions for render.ts, smartypants.ts,
* and print-css.ts. No Playwright, no PDF generation.
*/
import { describe, expect, test } from "bun:test";
import { render, sanitizeUntrustedHtml } from "../src/render";
import { smartypants } from "../src/smartypants";
import { printCss } from "../src/print-css";
// ─── smartypants ──────────────────────────────────────────────
describe("smartypants", () => {
test("converts straight double quotes to curly", () => {
const out = smartypants(`<p>She said "hello" to him.</p>`);
expect(out).toContain("\u201chello\u201d");
});
test("converts em dash (--)", () => {
const out = smartypants(`<p>This is it -- the answer.</p>`);
expect(out).toContain("\u2014");
});
test("converts ellipsis (...)", () => {
const out = smartypants(`<p>Wait...</p>`);
expect(out).toContain("\u2026");
});
test("converts apostrophes in contractions", () => {
const out = smartypants(`<p>don't you know?</p>`);
expect(out).toContain("don\u2019t");
});
test("does NOT touch content inside <code> blocks", () => {
const input = `<pre><code>const x = "hello"; // it's fine</code></pre>`;
const out = smartypants(input);
expect(out).toBe(input); // unchanged
});
test("does NOT touch content inside <pre> blocks", () => {
const input = `<pre>"quoted" -- don't</pre>`;
const out = smartypants(input);
expect(out).toBe(input);
});
test("does NOT touch inline code", () => {
const out = smartypants(`<p>Use <code>it's</code> like this: "hello".</p>`);
expect(out).toContain("<code>it's</code>");
expect(out).toContain("\u201chello\u201d");
});
test("does NOT touch URLs", () => {
const out = smartypants(`<p>Visit https://example.com/it's-page for "details".</p>`);
expect(out).toContain("https://example.com/it's-page");
expect(out).toContain("\u201cdetails\u201d");
});
test("does NOT touch HTML attribute values", () => {
const out = smartypants(`<a href="it's-a-test.html">link</a>`);
expect(out).toContain(`href="it's-a-test.html"`);
});
test("does NOT convert -- in CLI flags", () => {
// Prose like "try --verbose mode" should ideally not turn -- into an em dash.
// Here "--" is preceded by a space and followed directly by a word character,
// which is the shape of a CLI flag. The assertion is lenient: either outcome
// is accepted.
expect(out).toMatch(/--verbose|—verbose/);
});
});
// ─── sanitizer ──────────────────────────────────────────────
describe("sanitizeUntrustedHtml", () => {
test("strips <script> tags and content", () => {
const input = `<p>hello</p><script>alert(1)</script><p>world</p>`;
const out = sanitizeUntrustedHtml(input);
expect(out).not.toContain("<script");
expect(out).not.toContain("alert");
expect(out).toContain("<p>hello</p>");
expect(out).toContain("<p>world</p>");
});
test("strips <iframe>", () => {
const input = `<p>hi</p><iframe src="evil.com"></iframe>`;
expect(sanitizeUntrustedHtml(input)).not.toContain("<iframe");
});
test("strips onclick attribute", () => {
const input = `<a href="#" onclick="alert(1)">click</a>`;
const out = sanitizeUntrustedHtml(input);
expect(out).not.toContain("onclick");
expect(out).toContain("href=\"#\"");
});
test("strips event handlers with mixed case (onClick, ONCLICK)", () => {
const input1 = `<a href="#" onClick="x()">a</a>`;
const input2 = `<a href="#" ONCLICK="x()">b</a>`;
expect(sanitizeUntrustedHtml(input1)).not.toContain("onClick");
expect(sanitizeUntrustedHtml(input2)).not.toContain("ONCLICK");
});
test("rewrites javascript: URLs in href to #", () => {
const input = `<a href="javascript:alert(1)">bad</a>`;
const out = sanitizeUntrustedHtml(input);
expect(out).not.toContain("javascript:");
expect(out).toContain('href="#"');
});
test("strips inline SVG <script>", () => {
const input = `<svg><script>alert(1)</script><circle r="5"/></svg>`;
const out = sanitizeUntrustedHtml(input);
expect(out).not.toContain("<script");
expect(out).toContain("<circle");
});
test("strips <object>, <embed>, <link>, <meta>, <base>, <form>", () => {
const input = `
<object data="x.swf"></object>
<embed src="y.mov">
<link rel="stylesheet" href="evil.css">
<meta http-equiv="refresh" content="0;url=evil">
<base href="evil.com">
<form action="evil"><input/></form>
`;
const out = sanitizeUntrustedHtml(input);
expect(out).not.toContain("<object");
expect(out).not.toContain("<embed");
expect(out).not.toContain("<link");
expect(out).not.toContain("<meta");
expect(out).not.toContain("<base");
expect(out).not.toContain("<form");
});
test("strips srcdoc attribute (iframe escape vector)", () => {
const input = `<div srcdoc="<script>bad</script>">hi</div>`;
expect(sanitizeUntrustedHtml(input)).not.toContain("srcdoc");
});
});
// ─── end-to-end render ──────────────────────────────────────────────
describe("render (end-to-end)", () => {
test("produces a full HTML document with title, body, and CSS", () => {
const result = render({
markdown: `# Hello\n\nA paragraph with "quotes" and -- dashes.\n`,
});
expect(result.html).toContain("<!doctype html>");
expect(result.html).toContain("<title>Hello</title>");
expect(result.html).toContain("<h1");
expect(result.html).toContain("Hello");
// CSS should be inlined as <style>...
expect(result.html).toMatch(/<style>[\s\S]*font-family: Helvetica/);
// Smartypants ran
expect(result.html).toContain("\u201cquotes\u201d");
expect(result.html).toContain("\u2014");
});
test("derives title from first H1 when --title is not passed", () => {
const result = render({ markdown: `# My Title\n\nBody.` });
expect(result.meta.title).toBe("My Title");
});
test("uses --title override when provided", () => {
const result = render({
markdown: `# Auto-derived\n\nBody.`,
title: "Explicit Title",
});
expect(result.meta.title).toBe("Explicit Title");
});
test("includes cover block when cover=true", () => {
const result = render({
markdown: `# Doc\n\nBody.`,
cover: true,
subtitle: "A subtitle",
author: "Garry Tan",
});
expect(result.html).toContain(`class="cover"`);
expect(result.html).toContain(`class="cover-title"`);
expect(result.html).toContain("A subtitle");
expect(result.html).toContain("Garry Tan");
});
test("omits cover block when cover=false", () => {
const result = render({ markdown: `# Memo\n\nBody.` });
expect(result.html).not.toContain(`class="cover"`);
});
test("injects watermark element when --watermark is set", () => {
const result = render({ markdown: `# Doc`, watermark: "DRAFT" });
expect(result.html).toContain(`class="watermark"`);
expect(result.html).toContain("DRAFT");
// And the CSS rule for it must be present
expect(result.html).toContain("position: fixed");
expect(result.html).toContain("rotate(-30deg)");
});
test("wraps each H1 in its own .chapter section (default)", () => {
const result = render({
markdown: `# One\n\nbody 1\n\n# Two\n\nbody 2\n`,
});
const chapterMatches = result.html.match(/class="chapter"/g) ?? [];
expect(chapterMatches.length).toBe(2);
});
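test("does NOT create per-H1 chapter sections when noChapterBreaks=true", () => {
const result = render({
markdown: `# One\n\nbody\n\n# Two\n\nbody\n`,
noChapterBreaks: true,
});
// Two H1s in the input, but exactly one class="chapter" match is expected,
// i.e. the headings are not split into separate sections.
const chapterMatches = result.html.match(/class="chapter"/g) ?? [];
expect(chapterMatches.length).toBe(1);
});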
test("builds a TOC with H1/H2 entries when toc=true", () => {
const result = render({
markdown: `# One\n\n## Sub\n\nbody\n\n# Two\n\nbody\n`,
toc: true,
});
expect(result.html).toContain(`class="toc"`);
expect(result.html).toContain(`<h2>Contents</h2>`);
expect(result.html).toContain("One");
expect(result.html).toContain("Sub");
expect(result.html).toContain("Two");
});
test("strips dangerous HTML from untrusted markdown", () => {
const result = render({
markdown: `# Safe\n\n<script>alert('xss')</script>\n\nBody.`,
});
expect(result.html).not.toContain("<script");
expect(result.html).not.toContain("alert");
expect(result.html).toContain("Safe");
});
test("respects text-align: left — no justify in print CSS", () => {
const result = render({ markdown: `para1\n\npara2\n` });
// The rule from the design-review fix: no p + p indent, text-align: left.
expect(result.printCss).toContain("text-align: left");
expect(result.printCss).not.toContain("text-align: justify");
expect(result.printCss).not.toContain("text-indent");
});
test("includes CJK font fallback in body", () => {
const result = render({ markdown: `body` });
expect(result.printCss).toContain("Hiragino Kaku Gothic");
expect(result.printCss).toContain("Noto Sans CJK");
});
});
// ─── print-css ──────────────────────────────────────────────
describe("printCss", () => {
test("emits 1in margins by default", () => {
const css = printCss();
expect(css).toContain("margin: 1in");
});
test("respects custom margins flag", () => {
const css = printCss({ margins: "72pt" });
expect(css).toContain("margin: 72pt");
});
test("emits letter page size by default", () => {
const css = printCss();
expect(css).toContain("size: letter");
});
test("respects custom page size", () => {
const css = printCss({ pageSize: "a4" });
expect(css).toContain("size: a4");
});
test("suppresses running header and footer on cover page", () => {
const css = printCss();
expect(css).toMatch(/@page\s*:first\s*\{[\s\S]*?content:\s*none[\s\S]*?content:\s*none/);
});
test("omits CONFIDENTIAL when confidential=false", () => {
const css = printCss({ confidential: false });
expect(css).not.toContain("CONFIDENTIAL");
});
test("emits watermark CSS only when watermark is set", () => {
const withWatermark = printCss({ watermark: "DRAFT" });
expect(withWatermark).toContain(".watermark");
expect(withWatermark).toContain("rotate(-30deg)");
const withoutWatermark = printCss();
expect(withoutWatermark).not.toContain(".watermark");
});
test("drops chapter break rule when noChapterBreaks=true", () => {
const on = printCss({ noChapterBreaks: false });
expect(on).toContain("break-before: page");
const off = printCss({ noChapterBreaks: true });
expect(off).not.toContain(".chapter { break-before: page");
});
test("always sets p { text-align: left }", () => {
const css = printCss();
expect(css).toContain("text-align: left");
});
test("never sets text-indent on p", () => {
const css = printCss();
// Confirm no p-indent slipped in
expect(css).not.toMatch(/p\s*\+\s*p\s*\{[^}]*text-indent/);
});
test("emits @bottom-center page-number rule by default", () => {
const css = printCss();
expect(css).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
test("suppresses @bottom-center page-number rule when pageNumbers=false", () => {
const css = printCss({ pageNumbers: false });
expect(css).not.toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
test("still emits @bottom-center when pageNumbers=true (explicit)", () => {
const css = printCss({ pageNumbers: true });
expect(css).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
test("font stacks include Liberation Sans adjacent to Helvetica", () => {
const css = printCss({ confidential: true });
// Body stack
expect(css).toMatch(/font-family:\s*Helvetica,\s*"Liberation Sans",\s*Arial/);
// At least one @page margin box (running header / page number / CONFIDENTIAL)
// should also have the updated stack.
const marginBoxStacks = css.match(/@(top|bottom)-(center|right)\s*\{[^}]*Liberation Sans/g) ?? [];
expect(marginBoxStacks.length).toBeGreaterThanOrEqual(1);
});
test("all four original Helvetica stacks now include Liberation Sans", () => {
const css = printCss({ runningHeader: "Running Title", confidential: true });
// Count: body (1) + running header (1) + page numbers (1) + confidential (1) = 4
const occurrences = (css.match(/"Liberation Sans"/g) ?? []).length;
expect(occurrences).toBeGreaterThanOrEqual(4);
});
});
// ─── render() — pageNumbers / footerTemplate data flow ───────────────
describe("render() — pageNumbers data flow", () => {
test("CSS footer renders by default", () => {
const result = render({ markdown: `# Doc\n\nBody.` });
expect(result.printCss).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
test("--no-page-numbers reaches the CSS layer", () => {
const result = render({ markdown: `# Doc\n\nBody.`, pageNumbers: false });
expect(result.printCss).not.toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
test("footerTemplate suppresses CSS page numbers (custom footer wins)", () => {
const result = render({
markdown: `# Doc\n\nBody.`,
footerTemplate: `<div class="foo">custom</div>`,
});
expect(result.printCss).not.toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
test("pageNumbers=true + no footerTemplate keeps CSS footer", () => {
const result = render({ markdown: `# Doc`, pageNumbers: true });
expect(result.printCss).toMatch(/@bottom-center\s*\{\s*content:\s*counter\(page\)/);
});
});
// ─── render() — HTML entity handling in titles, cover, TOC ───────────
describe("render() — no double HTML entity escaping", () => {
type Case = { char: string; inTitle: string; expectedTitleMeta: string };
// Only characters that should flow through unchanged. `"` and `'` are
// omitted from this set because smartypants converts them to curly quotes
// before heading extraction — asserted separately below.
const cases: Case[] = [
{ char: "&", inTitle: "A & B", expectedTitleMeta: "A & B" },
{ char: "<", inTitle: "A < B", expectedTitleMeta: "A < B" },
{ char: ">", inTitle: "A > B", expectedTitleMeta: "A > B" },
{ char: "©", inTitle: "A © B", expectedTitleMeta: "A © B" },
{ char: "—", inTitle: "A — B", expectedTitleMeta: "A — B" },
];
for (const { char, inTitle, expectedTitleMeta } of cases) {
test(`"${char}" in H1 has no double-escape in <title> or cover`, () => {
const result = render({
markdown: `# ${inTitle}\n\nBody.`,
cover: true,
author: "A",
});
// Meta: decoded plain text.
expect(result.meta.title).toBe(expectedTitleMeta);
// HTML: <title>...</title> never contains double-escape patterns.
expect(result.html).not.toMatch(/<title>[^<]*&amp;amp;/);
expect(result.html).not.toMatch(/<title>[^<]*&amp;lt;/);
expect(result.html).not.toMatch(/<title>[^<]*&amp;gt;/);
expect(result.html).not.toMatch(/<title>[^<]*&amp;#\d+;/);
expect(result.html).not.toMatch(/<title>[^<]*&amp;#x[0-9a-fA-F]+;/);
// Cover block also single-escape.
expect(result.html).not.toMatch(/class="cover-title"[^>]*>[^<]*&amp;amp;/);
});
}
test('ampersand in <title> renders as exactly one "&amp;"', () => {
const result = render({ markdown: `# Faber & Faber\n\nBody.` });
expect(result.html).toContain("<title>Faber &amp; Faber</title>");
expect(result.html).not.toContain("&amp;amp;");
});
test("TOC entries have no double-escape when a heading contains '&'", () => {
const result = render({
markdown: `# Doc\n\n## Faber & Faber\n\nBody.\n\n## Other\n\nMore.`,
toc: true,
});
// TOC renders the heading text through escapeHtml; must be single-escaped.
expect(result.html).toContain("Faber &amp; Faber");
expect(result.html).not.toContain("&amp;amp;");
});
test('numeric entity in H1 (e.g. "&#169;") decodes cleanly to <title>', () => {
// Marked passes through numeric entities verbatim in the HTML output,
// so the decoder must handle them.
const result = render({ markdown: `# A &#169; B\n\nBody.` });
expect(result.meta.title).toBe("A © B");
expect(result.html).toContain("<title>A © B</title>");
});
test("smartypants converts raw quotes in title BEFORE extraction (contract)", () => {
// We do NOT assert raw `"` survives — smartypants is expected to convert it.
// The contract is: no double-escape of the encoded form.
const result = render({ markdown: `# Say "hi"\n\nBody.` });
expect(result.html).not.toContain("&amp;quot;");
expect(result.html).not.toContain("&amp;#39;");
// And <title> contains exactly one level of escaping.
const titleMatch = result.html.match(/<title>([^<]*)<\/title>/);
expect(titleMatch).toBeTruthy();
if (titleMatch) {
// Never contains a double-encoded entity.
expect(titleMatch[1]).not.toMatch(/&amp;(amp|lt|gt|quot|#\d+);/);
}
});
});