From d0782c4c4da2e71e3bc714317d5da5b3ce1072ad Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Mon, 20 Apr 2026 13:20:30 +0800
Subject: [PATCH] =?UTF-8?q?feat(v1.4.0.0):=20/make-pdf=20=E2=80=94=20markd?=
 =?UTF-8?q?own=20to=20publication-quality=20PDFs=20(#1086)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat(browse): full $B pdf flag contract + tab-scoped load-html/js/pdf

Grow $B pdf from a 2-line wrapper (hard-coded A4) into a real PDF engine
frontend so make-pdf can shell out to it without duplicating Playwright:

- pdf: --format, --width/--height, --margins, --margin-*,
  --header-template, --footer-template, --page-numbers, --tagged,
  --outline, --print-background, --prefer-css-page-size, --toc.
  Mutex rules enforced. --from-file dodges Windows argv limits
  (8191 char CreateProcess cap).
- load-html: add --from-file mode for large inline HTML. Size + magic
  byte checks still apply to the inline content, not the payload file
  path.
- newtab: add --json returning {"tabId":N,"url":...} for programmatic
  use.
- cli: extract --tab-id flag and route as body.tabId to the HTTP layer
  so parallel callers can target specific tabs without racing on the
  active tab (makes make-pdf's per-render tab isolation possible).
- --toc: non-fatal 3s wait for window.__pagedjsAfterFired. Paged.js
  ships later; v1 renders TOC statically via the markdown renderer.

Codex round 2 flagged these P0 issues during plan review. All resolved.

Co-Authored-By: Claude Opus 4.7 (1M context)

* feat(resolvers): add MAKE_PDF_SETUP + makePdfDir host paths

Skill templates can now embed {{MAKE_PDF_SETUP}} to resolve $P to the
make-pdf binary via the same discovery order as $B / $D: env override
(MAKE_PDF_BIN), local skill root, global install, or PATH. Mirrors the
pattern established by generateBrowseSetup() and generateDesignSetup()
in scripts/resolvers/design.ts.

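For readers unfamiliar with the $B / $D pattern, the discovery order named above (env override, local skill root, global install, PATH) might be sketched roughly as follows. This is an illustration only, not the code in scripts/resolvers/: the function name `resolveMakePdfBin`, the option names, the directory layout, and the choice to fall through when the override path is missing are all assumptions. Only the `MAKE_PDF_BIN` variable and the ordering come from the message above.

```typescript
// Hypothetical sketch of the discovery order described in the commit message.
// Names and directory layout are illustrative assumptions, not the real resolver.

interface ResolveOptions {
  env: Record<string, string | undefined>; // e.g. process.env
  localSkillRoot: string;                  // assumed project-local install dir
  globalInstallRoot: string;               // assumed global install dir
  pathDirs: string[];                      // PATH entries, in order
  exists: (path: string) => boolean;       // injected so the sketch is testable
}

const BIN_NAME = "make-pdf";

function resolveMakePdfBin(opts: ResolveOptions): string | null {
  // 1. An explicit override wins when it points at something that exists.
  //    (Falling through on a missing override is an assumption.)
  const override = opts.env["MAKE_PDF_BIN"];
  if (override && opts.exists(override)) return override;

  // 2-4. Local skill root, then global install, then each PATH directory.
  const candidates = [
    `${opts.localSkillRoot}/${BIN_NAME}`,
    `${opts.globalInstallRoot}/${BIN_NAME}`,
    ...opts.pathDirs.map((dir) => `${dir}/${BIN_NAME}`),
  ];
  for (const candidate of candidates) {
    if (opts.exists(candidate)) return candidate;
  }
  return null; // the caller decides how to report "not installed"
}
```

Injecting `exists` keeps the ordering logic separate from the filesystem, so the precedence rules can be checked without a real install.
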
Co-Authored-By: Claude Opus 4.7 (1M context)

* feat(make-pdf): new /make-pdf skill + orchestrator binary

Turn markdown into publication-quality PDFs. $P generate input.md out.pdf
produces a PDF with 1in margins, intelligent page breaks, page numbers,
running header, CONFIDENTIAL footer, and curly quotes/em dashes, all on
Helvetica so copy-paste extraction works ("S ai li ng" bug avoided).

Architecture (per Codex round 2):

  markdown → render.ts (marked + sanitize + smartypants)
           → orchestrator
           → $B newtab --json
           → $B load-html --tab-id
           → $B js (poll Paged.js)
           → $B pdf --tab-id
           → $B closetab

browseClient.ts shells out to the compiled browse CLI rather than
duplicating Playwright. --tab-id isolation per render means parallel
$P generate calls don't race on the active tab. try/finally tab cleanup
survives Paged.js timeouts, browser crashes, and output-path failures.

Features in v1:
  --cover              left-aligned cover page (eyebrow + title + hairline rule)
  --toc                clickable static TOC (Paged.js page numbers deferred)
  --watermark          diagonal DRAFT/CONFIDENTIAL layer
  --no-chapter-breaks  opt out of H1-starts-new-page
  --page-numbers       "N of M" footer (default on)
  --tagged --outline   accessible PDF + bookmark outline (default on)
  --allow-network      opt in to external image loading (default off for privacy)
  --quiet --verbose    stderr control

Design decisions locked from the /plan-design-review pass:
  - Helvetica everywhere (Chromium emits single-word Tj operators for
    system fonts; bundled webfonts emit per-glyph and break extraction).
  - Left-aligned body, flush-left paragraphs, no text-indent, 12pt gap.
  - Cover shares 1in margins with body pages; no flexbox-center, no
    inset padding.
  - The reference HTMLs at .context/designs/*.html are the
    implementation source of truth for print-css.ts.

Tests (56 unit + 1 E2E combined-features gate):
  - smartypants: code/URL-safe, verified against 10 fixtures
  - sanitizer: strips

world

`; + const out = sanitizeUntrustedHtml(input); + expect(out).not.toContain("hello

"); + expect(out).toContain("

world

"); + }); + + test("strips `; + expect(sanitizeUntrustedHtml(input)).not.toContain(" { + const input = `click`; + const out = sanitizeUntrustedHtml(input); + expect(out).not.toContain("onclick"); + expect(out).toContain("href=\"#\""); + }); + + test("strips event handlers with mixed case (onClick, ONCLICK)", () => { + const input1 = `a`; + const input2 = `b`; + expect(sanitizeUntrustedHtml(input1)).not.toContain("onClick"); + expect(sanitizeUntrustedHtml(input2)).not.toContain("ONCLICK"); + }); + + test("rewrites javascript: URLs in href to #", () => { + const input = `bad`; + const out = sanitizeUntrustedHtml(input); + expect(out).not.toContain("javascript:"); + expect(out).toContain('href="#"'); + }); + + test("strips inline SVG `; + const out = sanitizeUntrustedHtml(input); + expect(out).not.toContain(", , , , ,
", () => { + const input = ` + + + + + +
+ `; + const out = sanitizeUntrustedHtml(input); + expect(out).not.toContain(" { + const input = `
hi
`;
+    expect(sanitizeUntrustedHtml(input)).not.toContain("srcdoc");
+  });
+});
+
+// ─── end-to-end render ──────────────────────────────────────────────
+
+describe("render (end-to-end)", () => {
+  test("produces a full HTML document with title, body, and CSS", () => {
+    const result = render({
+      markdown: `# Hello\n\nA paragraph with "quotes" and -- dashes.\n`,
+    });
+    expect(result.html).toContain("");
+    expect(result.html).toContain("Hello");
+    expect(result.html).toContain("...
+    expect(result.html).toMatch(/