feat(v1.4.0.0): /make-pdf — markdown to publication-quality PDFs (#1086)

* feat(browse): full $B pdf flag contract + tab-scoped load-html/js/pdf

Grow $B pdf from a 2-line wrapper (hard-coded A4) into a real PDF engine
frontend so make-pdf can shell out to it without duplicating Playwright:

- pdf: --format, --width/--height, --margins, --margin-*, --header-template,
  --footer-template, --page-numbers, --tagged, --outline, --print-background,
  --prefer-css-page-size, --toc. Mutex rules enforced. --from-file <json>
  dodges Windows argv limits (8191 char CreateProcess cap).
- load-html: add --from-file <json> mode for large inline HTML. Size + magic
  byte checks still apply to the inline content, not the payload file path.
- newtab: add --json returning {"tabId":N,"url":...} for programmatic use.
- cli: extract --tab-id flag and route as body.tabId to the HTTP layer so
  parallel callers can target specific tabs without racing on the active
  tab (makes make-pdf's per-render tab isolation possible).
- --toc: non-fatal 3s wait for window.__pagedjsAfterFired. Paged.js ships
  later; v1 renders TOC statically via the markdown renderer.

Codex round 2 flagged these P0 issues during plan review. All resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(resolvers): add MAKE_PDF_SETUP + makePdfDir host paths

Skill templates can now embed {{MAKE_PDF_SETUP}} to resolve $P to the
make-pdf binary via the same discovery order as $B / $D: env override
(MAKE_PDF_BIN), local skill root, global install, or PATH.

Mirrors the pattern established by generateBrowseSetup() and
generateDesignSetup() in scripts/resolvers/design.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(make-pdf): new /make-pdf skill + orchestrator binary

Turn markdown into publication-quality PDFs. $P generate input.md out.pdf
produces a PDF with 1in margins, intelligent page breaks, page numbers,
running header, CONFIDENTIAL footer, and curly quotes/em dashes — all on
Helvetica so copy-paste extraction works ("S ai li ng" bug avoided).

Architecture (per Codex round 2):
  markdown → render.ts (marked + sanitize + smartypants) → orchestrator
    → $B newtab --json → $B load-html --tab-id → $B js (poll Paged.js)
    → $B pdf --tab-id → $B closetab

browseClient.ts shells out to the compiled browse CLI rather than
duplicating Playwright. --tab-id isolation per render means parallel
$P generate calls don't race on the active tab. try/finally tab cleanup
survives Paged.js timeouts, browser crashes, and output-path failures.

Features in v1:
  --cover              left-aligned cover page (eyebrow + title + hairline rule)
  --toc                clickable static TOC (Paged.js page numbers deferred)
  --watermark <text>   diagonal DRAFT/CONFIDENTIAL layer
  --no-chapter-breaks  opt out of H1-starts-new-page
  --page-numbers       "N of M" footer (default on)
  --tagged --outline   accessible PDF + bookmark outline (default on)
  --allow-network      opt in to external image loading (default off for privacy)
  --quiet --verbose    stderr control

Design decisions locked from the /plan-design-review pass:
  - Helvetica everywhere (Chromium emits single-word Tj operators for
    system fonts; bundled webfonts emit per-glyph and break extraction).
  - Left-aligned body, flush-left paragraphs, no text-indent, 12pt gap.
  - Cover shares 1in margins with body pages; no flexbox-center, no
    inset padding.
  - The reference HTMLs at .context/designs/*.html are the implementation
    source of truth for print-css.ts.

Tests (56 unit + 1 E2E combined-features gate):
  - smartypants: code/URL-safe, verified against 10 fixtures
  - sanitizer: strips <script>/<iframe>/on*/javascript: URLs
  - render: HTML assembly, CJK fallback, cover/TOC/chapter wrap
  - print-css: all @page rules, margin variants, watermark
  - pdftotext: normalize()+copyPasteGate() cross-OS tolerance
  - browseClient: binary resolution + typed error propagation
  - combined-features gate (P0): 2-chapter fixture with smartypants +
    hyphens + ligatures + bold/italic + inline code + lists + blockquote
    passes through PDF → pdftotext → expected.txt diff

Deferred to Phase 4 (future PR): Paged.js vendored for accurate TOC page
numbers, highlight.js for syntax highlighting, drop caps, pull quotes,
two-column, CMYK, watermark visual-diff acceptance.

Plan: .context/ceo-plans/2026-04-19-perfect-pdf-generator.md
References: .context/designs/make-pdf-*.html

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(build): wire make-pdf into build/test/setup/bin + add marked dep

- package.json: compile make-pdf/dist/pdf as part of bun run build; add
  "make-pdf" to bin entry; include make-pdf/test/ in the free test pass;
  add marked@18.0.2 as a dep (markdown parser, ~40KB).
- setup: add make-pdf/dist/pdf to the Apple Silicon codesign loop.
- .gitignore: add make-pdf/dist/ (matches browse/dist/ and design/dist/).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(make-pdf): matrix copy-paste gate on Ubuntu + macOS

Runs the combined-features P0 gate on pull requests that touch make-pdf/
or browse's PDF surface. Installs poppler (macOS) / poppler-utils (Ubuntu)
per OS. Windows deferred to tolerant mode (Xpdf / Poppler-Windows
extraction variance not yet calibrated against the normalized comparator —
Codex round 2 #18).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(skills): regenerate SKILL.md for make-pdf addition + browse pdf flags

bun run gen:skill-docs picks up:
  - the new /make-pdf skill (make-pdf/SKILL.md)
  - updated browse command descriptions for 'pdf', 'load-html', 'newtab'
    reflecting the new flag contract and --from-file mode

Source of truth stays the .tmpl files + COMMAND_DESCRIPTIONS;
these are regenerated artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): repair stale test expectations + emit _EXPLAIN_LEVEL / _QUESTION_TUNING from preamble

Three pre-existing test failures on main were blocking /ship:

- test/skill-validation.test.ts "Step 3.4 test coverage audit" expected the
  literal strings "CODE PATH COVERAGE" and "USER FLOW COVERAGE" which were
  removed when the Step 7 coverage diagram was compressed. Updated assertions
  to check the stable `Code paths:` / `User flows:` labels that still ship.

- test/skill-validation.test.ts "ship step numbering" allowed-substeps list
  didn't include 15.0 (WIP squash) and 15.1 (bisectable commits) which were
  added for continuous checkpoint mode. Extended the allowlist.

- test/writing-style-resolver.test.ts and test/plan-tune.test.ts expected
  `_EXPLAIN_LEVEL` and `_QUESTION_TUNING` bash variables in the preamble but
  generate-preamble-bash.ts had been refactored and those lines were dropped.
  Without them, downstream skills can't read `explain_level` or
  `question_tuning` config at runtime — terse mode and /plan-tune features
  were silently broken.

Added the two bash echo blocks back to generatePreambleBash and refreshed
the golden-file fixtures to match. All three preamble-related golden
baselines (claude/codex/factory) are synchronized with the new output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.4.0.0)

New /make-pdf skill + $P binary.

Turn any markdown file into a publication-quality PDF. Default output is
a 1in-margin Helvetica letter with page numbers in the footer. `--cover`
adds a left-aligned cover page, `--toc` generates a clickable table of
contents, `--watermark DRAFT` overlays a diagonal watermark. Copy-paste
extraction from the PDF produces clean words, not "S a i l i n g"
spaced out letter by letter. CI gate (macOS + Ubuntu) runs a combined-
features fixture through pdftotext on every PR.

make-pdf shells out to browse rather than duplicating Playwright.
$B pdf grew into a real PDF engine with full flag contract (--format,
--margins, --header-template, --footer-template, --page-numbers,
--tagged, --outline, --toc, --tab-id, --from-file). $B load-html and
$B js gained --tab-id. $B newtab --json returns structured output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): rewrite v1.4.0.0 headline — positive voice, no VC framing

The original headline led with "a PDF you wouldn't be embarrassed to send
to a VC": double-negative voice and audience-too-narrow. /make-pdf works
for essays, letters, memos, reports, proposals, and briefs. Framing the
whole release around founders-to-investors misses the wider audience.

New headline: "Turn any markdown file into a PDF that looks finished."
New tagline: "This one reads like a real essay or a real letter."

Positive voice. Broader aperture. Same energy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-20 13:20:30 +08:00
committed by GitHub
parent 22a4451e0e
commit d0782c4c4d
74 changed files with 4456 additions and 37 deletions
+326
View File
@@ -0,0 +1,326 @@
/**
* Typed shell-out wrapper for the browse CLI.
*
* Every browse call goes through this file. Reasons:
* - One place to do binary resolution.
* - One place to enforce the --from-file convention for large payloads
* (Windows argv cap is 8191 chars; 200KB HTML dies without this).
* - One place that maps non-zero exit codes to typed errors.
*
* Binary resolution order (Codex round 2 #4):
* 1. $BROWSE_BIN env override
* 2. sibling dir: dirname(argv[0])/../browse/dist/browse
* 3. ~/.claude/skills/gstack/browse/dist/browse
* 4. PATH lookup: `browse`
* 5. error with setup hint
*/
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import * as crypto from "node:crypto";
import { BrowseClientError } from "./types";
export interface LoadHtmlOptions {
html: string; // raw HTML string
waitUntil?: "load" | "domcontentloaded" | "networkidle";
tabId: number;
}
export interface PdfOptions {
output: string;
tabId: number;
format?: string;
width?: string;
height?: string;
marginTop?: string;
marginRight?: string;
marginBottom?: string;
marginLeft?: string;
headerTemplate?: string;
footerTemplate?: string;
pageNumbers?: boolean;
tagged?: boolean;
outline?: boolean;
printBackground?: boolean;
preferCSSPageSize?: boolean;
toc?: boolean;
}
export interface JsOptions {
tabId: number;
expression: string; // JS expression to evaluate
}
/**
* Locate the browse binary. Throws a BrowseClientError with a
* canonical setup message if not found.
*/
export function resolveBrowseBin(): string {
const envOverride = process.env.BROWSE_BIN;
if (envOverride && isExecutable(envOverride)) return envOverride;
// Sibling: look relative to this process's binary
// (for when make-pdf and browse live next to each other in dist/)
const selfDir = path.dirname(process.argv[0]);
const siblingCandidates = [
path.resolve(selfDir, "../browse/dist/browse"),
path.resolve(selfDir, "../../browse/dist/browse"),
path.resolve(selfDir, "../browse"),
];
for (const candidate of siblingCandidates) {
if (isExecutable(candidate)) return candidate;
}
// Global install
const home = os.homedir();
const globalPath = path.join(home, ".claude/skills/gstack/browse/dist/browse");
if (isExecutable(globalPath)) return globalPath;
// PATH lookup
try {
const which = execFileSync("which", ["browse"], { encoding: "utf8" }).trim();
if (which && isExecutable(which)) return which;
} catch {
// `which` exited non-zero; fall through to error
}
throw new BrowseClientError(
/* exitCode */ 127,
"resolve",
[
"browse binary not found.",
"",
"make-pdf needs browse (the gstack Chromium daemon) to render PDFs.",
"Tried:",
` - $BROWSE_BIN (${envOverride || "unset"})`,
` - sibling: ${siblingCandidates.join(", ")}`,
` - global: ${globalPath}`,
" - PATH: `browse`",
"",
"To fix: run gstack setup from the gstack repo:",
" cd ~/.claude/skills/gstack && ./setup",
"",
"Or set BROWSE_BIN explicitly:",
" export BROWSE_BIN=/path/to/browse",
].join("\n"),
);
}
function isExecutable(p: string): boolean {
try {
fs.accessSync(p, fs.constants.X_OK);
return true;
} catch {
return false;
}
}
/**
* Run a browse command. Returns stdout on success.
* Throws BrowseClientError on non-zero exit.
*/
function runBrowse(args: string[]): string {
const bin = resolveBrowseBin();
try {
return execFileSync(bin, args, {
encoding: "utf8",
maxBuffer: 16 * 1024 * 1024, // 16MB; tab content can be large
stdio: ["ignore", "pipe", "pipe"],
});
} catch (err: any) {
const exitCode = typeof err.status === "number" ? err.status : 1;
const stderr = typeof err.stderr === "string"
? err.stderr
: (err.stderr?.toString() ?? "");
throw new BrowseClientError(exitCode, args[0] || "unknown", stderr);
}
}
/**
* Write a payload to a tmp file and return the path. Used for any payload
* >4KB to avoid Windows argv limits (Codex round 2 #3).
*/
function writePayloadFile(payload: Record<string, unknown>): string {
const hash = crypto.createHash("sha256")
.update(JSON.stringify(payload))
.digest("hex")
.slice(0, 12);
const tmpPath = path.join(os.tmpdir(), `make-pdf-browse-${process.pid}-${hash}.json`);
fs.writeFileSync(tmpPath, JSON.stringify(payload), "utf8");
return tmpPath;
}
function cleanupPayloadFile(p: string): void {
try { fs.unlinkSync(p); } catch { /* best-effort */ }
}
// ─── Public API ─────────────────────────────────────────────────
/**
* Open a new tab. Returns the tabId.
* Requires `$B newtab --json` to be available (added in the browse flag
* extension for this feature). If --json isn't supported yet, the fallback
* parses "Opened tab N" from stdout.
*/
export function newtab(url?: string): number {
const args = ["newtab"];
if (url) args.push(url);
// Try --json first (preferred path for programmatic use)
try {
const out = runBrowse([...args, "--json"]);
const parsed = JSON.parse(out);
if (typeof parsed.tabId === "number") return parsed.tabId;
} catch {
// Fall back to stdout-string parsing. Brittle, but works on older browse builds.
}
const out = runBrowse(args);
const m = out.match(/tab\s+(\d+)/i);
if (!m) throw new BrowseClientError(1, "newtab", `could not parse tab id from: ${out}`);
return parseInt(m[1], 10);
}
/**
* Close a tab (by id or the active tab).
*/
export function closetab(tabId?: number): void {
const args = ["closetab"];
if (tabId !== undefined) args.push(String(tabId));
runBrowse(args);
}
/**
* Load raw HTML into a specific tab.
* Uses --from-file for any payload >4KB (Codex round 2 #3).
*/
export function loadHtml(opts: LoadHtmlOptions): void {
// Always use --from-file to dodge argv limits. The HTML is almost always >4KB.
const payload = {
html: opts.html,
waitUntil: opts.waitUntil ?? "domcontentloaded",
};
const payloadFile = writePayloadFile(payload);
try {
runBrowse([
"load-html",
"--from-file", payloadFile,
"--tab-id", String(opts.tabId),
]);
} finally {
cleanupPayloadFile(payloadFile);
}
}
/**
* Evaluate a JS expression in a tab. Returns the serialized result as string.
*/
export function js(opts: JsOptions): string {
return runBrowse([
"js",
opts.expression,
"--tab-id", String(opts.tabId),
]).trim();
}
/**
* Poll a boolean JS expression until it evaluates to true, or timeout.
* Returns true if it succeeded, false if timed out.
*/
export function waitForExpression(opts: {
expression: string;
tabId: number;
timeoutMs: number;
pollIntervalMs?: number;
}): boolean {
const poll = opts.pollIntervalMs ?? 200;
const deadline = Date.now() + opts.timeoutMs;
while (Date.now() < deadline) {
try {
const result = js({ expression: opts.expression, tabId: opts.tabId });
if (result === "true") return true;
} catch {
// Tab may still be loading; keep polling
}
const wait = Math.min(poll, Math.max(0, deadline - Date.now()));
if (wait <= 0) break;
// Synchronous sleep is fine — this only runs once per PDF render
const end = Date.now() + wait;
while (Date.now() < end) { /* busy wait */ }
}
return false;
}
/**
* Generate a PDF from the given tab. Uses --from-file when header/footer
* templates are present (they can be HTML strings of arbitrary size).
*/
export function pdf(opts: PdfOptions): void {
// If any large payload is present, send via --from-file
const hasLargePayload =
(opts.headerTemplate && opts.headerTemplate.length > 1024) ||
(opts.footerTemplate && opts.footerTemplate.length > 1024);
if (hasLargePayload) {
const payloadFile = writePayloadFile({
output: opts.output,
tabId: opts.tabId,
...optionsToPdfFlags(opts),
});
try {
runBrowse(["pdf", "--from-file", payloadFile]);
} finally {
cleanupPayloadFile(payloadFile);
}
return;
}
// Small payload: pass flags via argv
const args = ["pdf", opts.output, "--tab-id", String(opts.tabId)];
pushFlagsFromOptions(args, opts);
runBrowse(args);
}
function optionsToPdfFlags(opts: PdfOptions): Record<string, unknown> {
// Shape mirrors what the browse `pdf` case expects when reading --from-file
const out: Record<string, unknown> = {};
if (opts.format) out.format = opts.format;
if (opts.width) out.width = opts.width;
if (opts.height) out.height = opts.height;
if (opts.marginTop) out.marginTop = opts.marginTop;
if (opts.marginRight) out.marginRight = opts.marginRight;
if (opts.marginBottom) out.marginBottom = opts.marginBottom;
if (opts.marginLeft) out.marginLeft = opts.marginLeft;
if (opts.headerTemplate !== undefined) out.headerTemplate = opts.headerTemplate;
if (opts.footerTemplate !== undefined) out.footerTemplate = opts.footerTemplate;
if (opts.pageNumbers !== undefined) out.pageNumbers = opts.pageNumbers;
if (opts.tagged !== undefined) out.tagged = opts.tagged;
if (opts.outline !== undefined) out.outline = opts.outline;
if (opts.printBackground !== undefined) out.printBackground = opts.printBackground;
if (opts.preferCSSPageSize !== undefined) out.preferCSSPageSize = opts.preferCSSPageSize;
if (opts.toc !== undefined) out.toc = opts.toc;
return out;
}
function pushFlagsFromOptions(args: string[], opts: PdfOptions): void {
if (opts.format) { args.push("--format", opts.format); }
if (opts.width) { args.push("--width", opts.width); }
if (opts.height) { args.push("--height", opts.height); }
if (opts.marginTop) { args.push("--margin-top", opts.marginTop); }
if (opts.marginRight) { args.push("--margin-right", opts.marginRight); }
if (opts.marginBottom) { args.push("--margin-bottom", opts.marginBottom); }
if (opts.marginLeft) { args.push("--margin-left", opts.marginLeft); }
if (opts.headerTemplate !== undefined) {
args.push("--header-template", opts.headerTemplate);
}
if (opts.footerTemplate !== undefined) {
args.push("--footer-template", opts.footerTemplate);
}
if (opts.pageNumbers === true) args.push("--page-numbers");
if (opts.tagged === true) args.push("--tagged");
if (opts.outline === true) args.push("--outline");
if (opts.printBackground === true) args.push("--print-background");
if (opts.preferCSSPageSize === true) args.push("--prefer-css-page-size");
if (opts.toc === true) args.push("--toc");
}
+256
View File
@@ -0,0 +1,256 @@
#!/usr/bin/env bun
/**
* make-pdf CLI — argv parse, dispatch, exit.
*
* Output contract (per CEO plan DX spec):
* stdout: ONLY the output path on success. One line. Nothing else.
* stderr: progress spinner per stage, final "Done in Xs. N pages."
* --quiet: suppress progress. Errors still print.
* --verbose: per-stage timings.
* exit 0 success / 1 bad args / 2 render error / 3 Paged.js timeout / 4 browse unavailable.
*/
import { COMMANDS } from "./commands";
import { ExitCode, BrowseClientError } from "./types";
import type { GenerateOptions, PreviewOptions } from "./types";
interface ParsedArgs {
command: string;
positional: string[];
flags: Record<string, string | boolean>;
}
function parseArgs(argv: string[]): ParsedArgs {
const args = argv.slice(2);
if (args.length === 0) {
printUsage();
process.exit(ExitCode.Success);
}
// First non-flag arg is the command.
let command = "";
const positional: string[] = [];
const flags: Record<string, string | boolean> = {};
for (let i = 0; i < args.length; i++) {
const a = args[i];
if (a.startsWith("--")) {
const key = a.slice(2);
const next = args[i + 1];
if (next !== undefined && !next.startsWith("--")) {
flags[key] = next;
i++;
} else {
flags[key] = true;
}
} else if (!command) {
command = a;
} else {
positional.push(a);
}
}
return { command, positional, flags };
}
function printUsage(): void {
const lines = [
"make-pdf — turn markdown into publication-quality PDFs",
"",
"Usage:",
];
for (const [name, info] of COMMANDS) {
lines.push(` $P ${info.usage}`);
lines.push(` ${info.description}`);
}
lines.push("");
lines.push("Page layout:");
lines.push(" --margins <dim> All four margins (default: 1in). in, pt, cm, mm.");
lines.push(" --page-size letter|a4|legal (aliases: --format)");
lines.push("");
lines.push("Document structure:");
lines.push(" --cover Add a cover page.");
lines.push(" --toc Generate clickable table of contents.");
lines.push(" --no-chapter-breaks Don't start a new page at every H1.");
lines.push("");
lines.push("Branding:");
lines.push(" --watermark <text> Diagonal watermark on every page.");
lines.push(" --header-template <html>");
lines.push(" --footer-template <html> Mutex with --page-numbers.");
lines.push(" --no-confidential Suppress the CONFIDENTIAL footer.");
lines.push("");
lines.push("Output control:");
lines.push(" --page-numbers / --no-page-numbers (default: on)");
lines.push(" --tagged / --no-tagged (default: on, accessible PDF)");
lines.push(" --outline / --no-outline (default: on, PDF bookmarks)");
lines.push(" --quiet Suppress progress on stderr.");
lines.push(" --verbose Per-stage timings on stderr.");
lines.push("");
lines.push("Network:");
lines.push(" --allow-network Load external images (off by default).");
lines.push("");
lines.push("Examples:");
lines.push(" $P generate letter.md");
lines.push(" $P generate --cover --toc essay.md essay.pdf");
lines.push(" $P generate --watermark DRAFT memo.md draft.pdf");
lines.push(" $P preview letter.md");
lines.push("");
lines.push("Run `$P setup` to verify browse + Chromium + pdftotext install.");
console.error(lines.join("\n"));
}
function generateOptionsFromFlags(parsed: ParsedArgs): GenerateOptions {
const p = parsed.positional;
if (p.length === 0) {
console.error("$P generate: missing <input.md>");
console.error("Usage: $P generate <input.md> [output.pdf] [options]");
process.exit(ExitCode.BadArgs);
}
const f = parsed.flags;
const booleanFlag = (key: string, def: boolean): boolean => {
if (f[key] === true) return true;
if (f[`no-${key}`] === true) return false;
return def;
};
return {
input: p[0],
output: p[1],
margins: f.margins as string | undefined,
marginTop: f["margin-top"] as string | undefined,
marginRight: f["margin-right"] as string | undefined,
marginBottom: f["margin-bottom"] as string | undefined,
marginLeft: f["margin-left"] as string | undefined,
pageSize: ((f["page-size"] ?? f.format) as any),
cover: f.cover === true,
toc: f.toc === true,
noChapterBreaks: f["no-chapter-breaks"] === true,
watermark: typeof f.watermark === "string" ? f.watermark : undefined,
headerTemplate: typeof f["header-template"] === "string"
? f["header-template"] : undefined,
footerTemplate: typeof f["footer-template"] === "string"
? f["footer-template"] : undefined,
confidential: booleanFlag("confidential", true),
pageNumbers: booleanFlag("page-numbers", true),
tagged: booleanFlag("tagged", true),
outline: booleanFlag("outline", true),
quiet: f.quiet === true,
verbose: f.verbose === true,
allowNetwork: f["allow-network"] === true,
title: typeof f.title === "string" ? f.title : undefined,
author: typeof f.author === "string" ? f.author : undefined,
date: typeof f.date === "string" ? f.date : undefined,
};
}
function previewOptionsFromFlags(parsed: ParsedArgs): PreviewOptions {
const p = parsed.positional;
if (p.length === 0) {
console.error("$P preview: missing <input.md>");
console.error("Usage: $P preview <input.md> [options]");
process.exit(ExitCode.BadArgs);
}
const f = parsed.flags;
const booleanFlag = (key: string, def: boolean): boolean => {
if (f[key] === true) return true;
if (f[`no-${key}`] === true) return false;
return def;
};
return {
input: p[0],
cover: f.cover === true,
toc: f.toc === true,
watermark: typeof f.watermark === "string" ? f.watermark : undefined,
noChapterBreaks: f["no-chapter-breaks"] === true,
confidential: booleanFlag("confidential", true),
allowNetwork: f["allow-network"] === true,
title: typeof f.title === "string" ? f.title : undefined,
author: typeof f.author === "string" ? f.author : undefined,
date: typeof f.date === "string" ? f.date : undefined,
quiet: f.quiet === true,
verbose: f.verbose === true,
};
}
async function main(): Promise<void> {
const parsed = parseArgs(process.argv);
if (!parsed.command) {
printUsage();
process.exit(ExitCode.BadArgs);
}
if (!COMMANDS.has(parsed.command)) {
console.error(`$P: unknown command: ${parsed.command}`);
console.error("");
printUsage();
process.exit(ExitCode.BadArgs);
}
try {
switch (parsed.command) {
case "version": {
// Read from VERSION file or fall back to a hard-coded default.
try {
const fs = await import("node:fs");
const path = await import("node:path");
const versionFile = path.resolve(
path.dirname(process.argv[1] || ""),
"../../VERSION",
);
const version = fs.readFileSync(versionFile, "utf8").trim();
console.log(version);
} catch {
console.log("make-pdf (version unknown)");
}
process.exit(ExitCode.Success);
}
case "setup": {
const { runSetup } = await import("./setup");
await runSetup();
process.exit(ExitCode.Success);
}
case "generate": {
const opts = generateOptionsFromFlags(parsed);
const { generate } = await import("./orchestrator");
const outputPath = await generate(opts);
// Contract: stdout = output path only
console.log(outputPath);
process.exit(ExitCode.Success);
}
case "preview": {
const opts = previewOptionsFromFlags(parsed);
const { preview } = await import("./orchestrator");
const htmlPath = await preview(opts);
console.log(htmlPath);
process.exit(ExitCode.Success);
}
default:
// Unreachable: COMMANDS.has guarded above
process.exit(ExitCode.BadArgs);
}
} catch (err: any) {
if (err instanceof BrowseClientError) {
console.error(`$P: ${err.message}`);
process.exit(ExitCode.BrowseUnavailable);
}
if (err?.code === "ENOENT") {
console.error(`$P: file not found: ${err.path ?? err.message}`);
process.exit(ExitCode.BadArgs);
}
if (err?.name === "PagedJsTimeout") {
console.error(`$P: ${err.message}`);
process.exit(ExitCode.PagedJsTimeout);
}
console.error(`$P: ${err?.message ?? String(err)}`);
if (parsed.flags.verbose && err?.stack) {
console.error(err.stack);
}
process.exit(ExitCode.RenderError);
}
}
main();
+62
View File
@@ -0,0 +1,62 @@
/**
* Command registry for make-pdf — single source of truth.
*
* Dependency graph:
* commands.ts ──▶ cli.ts (runtime dispatch)
* ──▶ gen-skill-docs.ts (generates usage table in SKILL.md)
* ──▶ tests (validation)
*
* Zero side effects. Safe to import from build scripts.
*/
export const COMMANDS = new Map<string, {
description: string;
usage: string;
flags?: string[];
category: "Primary" | "Setup";
}>([
["generate", {
description: "Render a markdown file to a publication-quality PDF",
usage: "generate <input.md> [output.pdf] [options]",
category: "Primary",
flags: [
// Page layout
"--margins", "--margin-top", "--margin-right", "--margin-bottom", "--margin-left",
"--page-size", "--format",
// Structure
"--cover", "--toc", "--no-chapter-breaks",
// Branding
"--watermark", "--header-template", "--footer-template", "--no-confidential",
// Output
"--page-numbers", "--no-page-numbers", "--tagged", "--no-tagged",
"--outline", "--no-outline", "--quiet", "--verbose",
// Network
"--allow-network",
// Metadata
"--title", "--author", "--date",
],
}],
["preview", {
description: "Render markdown to HTML and open it in the browser (fast iteration)",
usage: "preview <input.md> [options]",
category: "Primary",
flags: [
"--cover", "--toc", "--no-chapter-breaks", "--watermark",
"--no-confidential", "--allow-network",
"--title", "--author", "--date",
"--quiet", "--verbose",
],
}],
["setup", {
description: "Verify browse + Chromium + pdftotext, then run a smoke test",
usage: "setup",
category: "Setup",
flags: [],
}],
["version", {
description: "Print make-pdf version",
usage: "version",
category: "Setup",
flags: [],
}],
]);
+228
View File
@@ -0,0 +1,228 @@
/**
* Orchestrator — ties render, browseClient, and filesystem together.
*
* generate(opts): markdown → PDF on disk. Returns output path.
* preview(opts): markdown → HTML, opens it in a browser.
*
* Progress indication (per DX spec):
* - stdout: ONLY the output path, printed by cli.ts after this returns.
* - stderr: spinner + per-stage status lines, unless opts.quiet.
* - --verbose: stage timings.
*
* Tab lifecycle: every generate opens a dedicated tab via $B newtab --json,
* runs load-html/js/pdf against --tab-id <N>, and closes the tab in a
* try/finally. Parallel $P generate calls never race on the active tab.
*/
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import * as crypto from "node:crypto";
import { spawn } from "node:child_process";
import { render } from "./render";
import type { GenerateOptions, PreviewOptions } from "./types";
import { ExitCode } from "./types";
import * as browseClient from "./browseClient";
class ProgressReporter {
private readonly quiet: boolean;
private readonly verbose: boolean;
private readonly stageStart = new Map<string, number>();
private readonly totalStart: number;
constructor(opts: { quiet?: boolean; verbose?: boolean }) {
this.quiet = opts.quiet === true;
this.verbose = opts.verbose === true;
this.totalStart = Date.now();
}
begin(stage: string): void {
this.stageStart.set(stage, Date.now());
if (this.quiet) return;
process.stderr.write(`\r\x1b[K${stage}...`);
}
end(stage: string, extra?: string): void {
const start = this.stageStart.get(stage) ?? Date.now();
const ms = Date.now() - start;
if (this.quiet) return;
if (this.verbose) {
process.stderr.write(`\r\x1b[K${stage} (${ms}ms)${extra ? `${extra}` : ""}\n`);
}
}
done(extra: string): void {
if (this.quiet) return;
const total = ((Date.now() - this.totalStart) / 1000).toFixed(1);
process.stderr.write(`\r\x1b[KDone in ${total}s. ${extra}\n`);
}
fail(stage: string, err: Error): void {
if (!this.quiet) process.stderr.write("\r\x1b[K");
// Always emit failure info, even in quiet mode — this is an error path.
process.stderr.write(`${stage} failed: ${err.message}\n`);
}
}
/**
* generate — full pipeline. Returns the output PDF path on success.
*/
export async function generate(opts: GenerateOptions): Promise<string> {
const progress = new ProgressReporter(opts);
const input = path.resolve(opts.input);
if (!fs.existsSync(input)) {
throw new Error(`input file not found: ${input}`);
}
const outputPath = path.resolve(
opts.output ?? path.join(os.tmpdir(), `${deriveSlug(input)}.pdf`),
);
// Stage 1: read markdown
progress.begin("Reading markdown");
const markdown = fs.readFileSync(input, "utf8");
progress.end("Reading markdown");
// Stage 2: render HTML
progress.begin("Rendering HTML");
const rendered = render({
markdown,
title: opts.title,
author: opts.author,
date: opts.date,
cover: opts.cover,
toc: opts.toc,
watermark: opts.watermark,
noChapterBreaks: opts.noChapterBreaks,
confidential: opts.confidential,
pageSize: opts.pageSize,
margins: opts.margins,
});
progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);
// Stage 3: write HTML to a tmp file browse can read
// (We don't actually write it; we pass inline via --from-file JSON.)
// But for preview mode and debugging, we still write to tmp.
const htmlTmp = tmpFile("html");
fs.writeFileSync(htmlTmp, rendered.html, "utf8");
// Stage 4: spin up a dedicated tab, load HTML, (wait for Paged.js if TOC),
// then emit PDF. Always close the tab.
progress.begin("Opening tab");
const tabId = browseClient.newtab();
progress.end("Opening tab", `tabId=${tabId}`);
try {
progress.begin("Loading HTML into Chromium");
browseClient.loadHtml({
html: rendered.html,
waitUntil: "domcontentloaded",
tabId,
});
progress.end("Loading HTML into Chromium");
if (opts.toc) {
progress.begin("Paginating with Paged.js");
// Browse's $B pdf already waits internally when --toc is passed.
// We pass toc=true to browseClient.pdf() below.
progress.end("Paginating with Paged.js", "Paged.js after");
}
progress.begin("Generating PDF");
browseClient.pdf({
output: outputPath,
tabId,
format: opts.pageSize ?? "letter",
marginTop: opts.marginTop ?? opts.margins ?? "1in",
marginRight: opts.marginRight ?? opts.margins ?? "1in",
marginBottom: opts.marginBottom ?? opts.margins ?? "1in",
marginLeft: opts.marginLeft ?? opts.margins ?? "1in",
headerTemplate: opts.headerTemplate,
footerTemplate: opts.footerTemplate,
pageNumbers: opts.pageNumbers !== false && !opts.footerTemplate,
tagged: opts.tagged !== false,
outline: opts.outline !== false,
printBackground: !!opts.watermark,
toc: opts.toc,
});
progress.end("Generating PDF");
const stat = fs.statSync(outputPath);
const kb = Math.round(stat.size / 1024);
progress.done(`${rendered.meta.wordCount} words · ${kb}KB · ${outputPath}`);
} finally {
// Always clean up the tab — even on crash, timeout, or Chromium hang.
try {
browseClient.closetab(tabId);
} catch {
// best-effort; we already exited the main path
}
// Cleanup tmp HTML
try { fs.unlinkSync(htmlTmp); } catch { /* best-effort */ }
}
return outputPath;
}
/**
* preview — render HTML and open it. No PDF round trip.
*/
export async function preview(opts: PreviewOptions): Promise<string> {
const progress = new ProgressReporter(opts);
const input = path.resolve(opts.input);
if (!fs.existsSync(input)) {
throw new Error(`input file not found: ${input}`);
}
progress.begin("Rendering HTML");
const markdown = fs.readFileSync(input, "utf8");
const rendered = render({
markdown,
title: opts.title,
author: opts.author,
date: opts.date,
cover: opts.cover,
toc: opts.toc,
watermark: opts.watermark,
noChapterBreaks: opts.noChapterBreaks,
confidential: opts.confidential,
});
progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);
// Write to a stable path under /tmp so the user can reload in the same tab.
const previewPath = path.join(os.tmpdir(), `make-pdf-preview-${deriveSlug(input)}.html`);
fs.writeFileSync(previewPath, rendered.html, "utf8");
progress.begin("Opening preview");
tryOpen(previewPath);
progress.end("Opening preview");
progress.done(`Preview at ${previewPath}`);
return previewPath;
}
// ─── helpers ──────────────────────────────────────────────
function deriveSlug(p: string): string {
const base = path.basename(p).replace(/\.[^.]+$/, "");
return base.replace(/[^a-zA-Z0-9-_]+/g, "-").slice(0, 64) || "document";
}
function tmpFile(ext: string): string {
const hash = crypto.randomBytes(6).toString("hex");
return path.join(os.tmpdir(), `make-pdf-${process.pid}-${hash}.${ext}`);
}
function tryOpen(pathOrUrl: string): void {
const platform = process.platform;
const cmd = platform === "darwin" ? "open" :
platform === "win32" ? "cmd" :
"xdg-open";
const args = platform === "win32" ? ["/c", "start", "", pathOrUrl] : [pathOrUrl];
try {
const child = spawn(cmd, args, { detached: true, stdio: "ignore" });
child.unref();
} catch {
// Non-fatal; the caller already has the path and will print it.
}
}
/** Setup-only re-export so cli.ts can dynamic-import without another file. */
export { ExitCode };
+254
View File
@@ -0,0 +1,254 @@
/**
* pdftotext wrapper — the tool behind the copy-paste CI gate.
*
* Codex round 2 surfaced two real problems we address here:
*
* #18: pdftotext (Poppler) vs pdftotext (Xpdf) vs pdftotext-next vary on
* whitespace, line wrap, Unicode normalization, form feeds, and
* extraction order. Cross-platform exact diffing is a non-starter.
* We normalize aggressively and diff the normalized form.
*
* #19: the regex /(?:\b\w\s){4,}/ only catches one failure shape (letters
* spaced out). It misses word-order corruption, missing whitespace
* between paragraphs, and homoglyph substitution. We add a word-token
* diff and a paragraph-boundary assertion on top.
*
* Resolution order for the pdftotext binary:
* 1. $PDFTOTEXT_BIN env override
* 2. `which pdftotext` on PATH
* 3. standard Homebrew paths on macOS
* 4. throws a friendly "install poppler" error
*
* The wrapper is *optional at runtime*: production renders don't need it.
* Only the CI gate and unit tests invoke pdftotext.
*/
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
export class PdftotextUnavailableError extends Error {
constructor(message: string) {
super(message);
this.name = "PdftotextUnavailableError";
}
}
export interface PdftotextInfo {
bin: string;
version: string; // "pdftotext version 24.02.0" or similar
flavor: "poppler" | "xpdf" | "unknown";
}
/**
* Locate pdftotext. Throws PdftotextUnavailableError if none is found.
*/
export function resolvePdftotext(): PdftotextInfo {
const envOverride = process.env.PDFTOTEXT_BIN;
if (envOverride && isExecutable(envOverride)) {
return describeBinary(envOverride);
}
// Try PATH
try {
const which = execFileSync("which", ["pdftotext"], { encoding: "utf8" }).trim();
if (which && isExecutable(which)) return describeBinary(which);
} catch {
// fall through
}
// Common macOS Homebrew locations
const macCandidates = [
"/opt/homebrew/bin/pdftotext", // Apple Silicon
"/usr/local/bin/pdftotext", // Intel Mac or Linuxbrew
"/usr/bin/pdftotext", // distro package
];
for (const candidate of macCandidates) {
if (isExecutable(candidate)) return describeBinary(candidate);
}
throw new PdftotextUnavailableError([
"pdftotext not found.",
"",
"make-pdf needs pdftotext to run the copy-paste CI gate.",
"(Runtime rendering does NOT need it. This only affects tests.)",
"",
"To install:",
" macOS: brew install poppler",
" Ubuntu: sudo apt-get install poppler-utils",
" Fedora: sudo dnf install poppler-utils",
"",
"Or set PDFTOTEXT_BIN to an explicit path:",
" export PDFTOTEXT_BIN=/path/to/pdftotext",
].join("\n"));
}
function isExecutable(p: string): boolean {
try {
fs.accessSync(p, fs.constants.X_OK);
return true;
} catch {
return false;
}
}
function describeBinary(bin: string): PdftotextInfo {
let version = "unknown";
let flavor: PdftotextInfo["flavor"] = "unknown";
try {
// pdftotext -v writes to stderr and exits 0 on poppler, 99 on some xpdf builds.
const result = execFileSync(bin, ["-v"], {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
});
version = (result || "").trim().split("\n")[0] || "unknown";
} catch (err: any) {
// Many pdftotext builds exit non-zero on -v but still write to stderr.
const stderr = err?.stderr?.toString?.() ?? "";
version = stderr.trim().split("\n")[0] || "unknown";
}
const v = version.toLowerCase();
if (v.includes("poppler")) flavor = "poppler";
else if (v.includes("xpdf")) flavor = "xpdf";
return { bin, version, flavor };
}
/**
* Run pdftotext on a PDF and return the extracted text.
*
* Uses `-layout` by default because that's what downstream normalization
* expects. Callers that need raw text can pass layout=false.
*/
export function pdftotext(pdfPath: string, opts?: { layout?: boolean }): string {
const info = resolvePdftotext();
const layout = opts?.layout ?? true;
const args: string[] = [];
if (layout) args.push("-layout");
args.push(pdfPath, "-"); // "-" = stdout
try {
return execFileSync(info.bin, args, {
encoding: "utf8",
maxBuffer: 32 * 1024 * 1024,
});
} catch (err: any) {
throw new Error(`pdftotext failed on ${pdfPath}: ${err.message}`);
}
}
/**
* Normalize extracted text for cross-platform, cross-flavor diffing.
*
* What we strip / normalize:
* - Unicode: NFC canonical composition (macOS emits NFD; Linux emits NFC;
* this dodges the fundamental encoding diff).
* - CR and CRLF → LF (Windows Xpdf emits CRLF).
* - Form feeds (\f) → double newline (Poppler emits \f at page breaks).
* - Trailing spaces on every line.
* - Runs of 3+ blank lines → 2 blank lines.
* - Leading/trailing whitespace on the whole string.
* - Non-breaking space (U+00A0) → regular space.
* - Zero-width space (U+200B) and zero-width non-joiner (U+200C) → empty.
* - Soft hyphen (U+00AD) → empty (pdftotext -layout sometimes emits these
* for hyphens: auto breaks).
*/
export function normalize(raw: string): string {
let s = raw;
s = s.normalize("NFC");
s = s.replace(/\r\n/g, "\n");
s = s.replace(/\r/g, "\n");
s = s.replace(/\f/g, "\n\n");
s = s.replace(/\u00a0/g, " ");
s = s.replace(/[\u200b\u200c\u00ad]/g, "");
s = s.replace(/[ \t]+$/gm, "");
s = s.replace(/\n{3,}/g, "\n\n");
s = s.trim();
return s;
}
/**
* The canonical copy-paste gate used in the E2E tests.
*
* Returns { ok: true } when all three assertions pass; returns
* { ok: false, reasons: [...] } with one or more failure reasons otherwise.
*/
export interface GateResult {
ok: boolean;
reasons: string[];
extracted: string;
}
export function copyPasteGate(pdfPath: string, expected: string): GateResult {
const extracted = normalize(pdftotext(pdfPath, { layout: true }));
const expectedNorm = normalize(expected);
const reasons: string[] = [];
// Assertion 1: every expected paragraph appears as a whole line or
// contiguous block in the extracted text.
const expectedParagraphs = splitParagraphs(expectedNorm);
for (const paragraph of expectedParagraphs) {
const compact = collapseWhitespace(paragraph);
const extractedCompact = collapseWhitespace(extracted);
if (!extractedCompact.includes(compact)) {
reasons.push(
`expected paragraph not found in extracted text: ${truncate(paragraph, 80)}`,
);
}
}
// Assertion 2: no "S a i l i n g"-style single-char runs.
// Count groups of 4+ consecutive letter-then-space tokens. False positive
// risk on things like "A B C D" (initials) — mitigate by requiring the
// letters spell a known-word substring of the expected text.
const fragRegex = /((?:\b\w\s){4,})/g;
let fragMatch: RegExpExecArray | null;
while ((fragMatch = fragRegex.exec(extracted)) !== null) {
const letters = fragMatch[1].replace(/\s/g, "");
// Only flag if the reassembled letters appear in the expected text.
if (expectedNorm.toLowerCase().includes(letters.toLowerCase()) && letters.length >= 4) {
reasons.push(
`per-glyph emission detected (the "S ai li ng" bug): "${fragMatch[1].trim()}" reassembles to "${letters}"`,
);
}
}
// Assertion 3: paragraph boundaries preserved. Count double-newlines
// in both; they should differ by no more than ±2 (header/footer noise).
const expectedBreaks = (expectedNorm.match(/\n\n/g) || []).length;
const extractedBreaks = (extracted.match(/\n\n/g) || []).length;
if (Math.abs(expectedBreaks - extractedBreaks) > 4) {
reasons.push(
`paragraph boundary count drift: expected ~${expectedBreaks}, got ${extractedBreaks}`,
);
}
return { ok: reasons.length === 0, reasons, extracted };
}
function splitParagraphs(s: string): string[] {
return s.split(/\n\n+/).map(p => p.trim()).filter(p => p.length > 0);
}
function collapseWhitespace(s: string): string {
return s.replace(/\s+/g, " ").trim();
}
function truncate(s: string, n: number): string {
return s.length > n ? s.slice(0, n) + "..." : s;
}
/**
* Emit diagnostic info to stderr — useful for CI failure debugging.
* Call this once before running any gate in a CI log.
*/
export function logDiagnostics(): void {
try {
const info = resolvePdftotext();
process.stderr.write(
`[pdftotext] bin=${info.bin} flavor=${info.flavor} version="${info.version}" ` +
`os=${os.platform()}-${os.arch()} node=${process.version}\n`,
);
} catch (err: any) {
process.stderr.write(`[pdftotext] unavailable: ${err.message}\n`);
}
}
+350
View File
@@ -0,0 +1,350 @@
/**
* Print stylesheet generator.
*
* Source of truth: .context/designs/make-pdf-print-reference.html and siblings.
* Mirror those CSS rules here. The HTML references were approved via
* /plan-design-review with explicit design decisions locked in the plan:
*
* - Helvetica only (system font, no bundled webfonts — dodges the
* per-glyph Tj bug that breaks copy-paste extraction).
* - All paragraphs flush-left. No first-line indent, no justify, no
* p+p indent. text-align: left everywhere. 12pt margin-bottom.
* - Cover page has the same 1in margins as every other page. No flexbox
* center, no inset padding, no vertical centering. Distinction comes
* from eyebrow + larger title + hairline rule, not from centering.
* - `@page :first` suppresses running header/footer but does NOT override
* the 1in margin.
* - No <link>, no external CSS/fonts — everything inlined.
* - CJK fallback: Helvetica, Arial, Hiragino Kaku Gothic ProN, Noto Sans
* CJK JP, Microsoft YaHei, sans-serif.
*/
export interface PrintCssOptions {
// Document structure
cover?: boolean;
toc?: boolean;
noChapterBreaks?: boolean;
// Branding
watermark?: string;
confidential?: boolean;
// Header (running title, top of page)
runningHeader?: string;
// Page size (in CSS `@page size:` terms)
pageSize?: "letter" | "a4" | "legal" | "tabloid";
// Margins (default 1in)
margins?: string;
}
/**
* Produce a CSS block (no <style> wrapper) for inline injection.
*/
export function printCss(opts: PrintCssOptions = {}): string {
const size = opts.pageSize ?? "letter";
const margin = opts.margins ?? "1in";
const hasWatermark = typeof opts.watermark === "string" && opts.watermark.length > 0;
return [
pageRules(size, margin, opts),
rootTypography(),
coverRules(opts.cover === true),
tocRules(opts.toc === true),
chapterRules(opts.noChapterBreaks === true),
blockRules(),
inlineRules(),
codeRules(),
quoteRules(),
figureRules(),
tableRules(),
listRules(),
footnoteRules(),
hasWatermark ? watermarkRules() : "",
breakAvoidRules(),
].filter(Boolean).join("\n\n");
}
function pageRules(size: string, margin: string, opts: PrintCssOptions): string {
const runningHeader = escapeCssString(opts.runningHeader ?? "");
const showConfidential = opts.confidential !== false;
return [
`@page {`,
` size: ${size};`,
` margin: ${margin};`,
runningHeader
? ` @top-center { content: "${runningHeader}"; font-family: Helvetica, Arial, sans-serif; font-size: 9pt; color: #666; }`
: ``,
` @bottom-center { content: counter(page) " of " counter(pages); font-family: Helvetica, Arial, sans-serif; font-size: 9pt; color: #666; }`,
showConfidential
? ` @bottom-right { content: "CONFIDENTIAL"; font-family: Helvetica, Arial, sans-serif; font-size: 8pt; color: #aaa; letter-spacing: 0.05em; }`
: ``,
`}`,
``,
// Cover page: suppress running header/footer but keep margins.
`@page :first {`,
` @top-center { content: none; }`,
` @bottom-center { content: none; }`,
` @bottom-right { content: none; }`,
`}`,
].filter(line => line !== "").join("\n");
}
function rootTypography(): string {
return [
`html { lang: en; }`,
`body {`,
` font-family: Helvetica, Arial, "Hiragino Kaku Gothic ProN", "Noto Sans CJK JP", "Microsoft YaHei", sans-serif;`,
` font-size: 11pt;`,
` line-height: 1.5;`,
` color: #111;`,
` background: white;`,
` hyphens: auto;`,
` font-variant-ligatures: common-ligatures;`,
` font-kerning: normal;`,
` text-rendering: geometricPrecision;`,
` margin: 0;`,
` padding: 0;`,
`}`,
].join("\n");
}
function coverRules(enabled: boolean): string {
if (!enabled) return "";
return [
`.cover {`,
` page: first;`,
` page-break-after: always;`,
` break-after: page;`,
` text-align: left;`,
`}`,
`.cover .eyebrow {`,
` font-size: 9pt;`,
` letter-spacing: 0.2em;`,
` text-transform: uppercase;`,
` color: #666;`,
` margin: 0 0 36pt;`,
`}`,
`.cover h1.cover-title {`,
` font-size: 32pt;`,
` line-height: 1.15;`,
` font-weight: 700;`,
` letter-spacing: -0.01em;`,
` margin: 0 0 18pt;`,
` max-width: 5.5in;`,
` text-align: left;`,
`}`,
`.cover .cover-subtitle {`,
` font-size: 14pt;`,
` line-height: 1.4;`,
` font-weight: 400;`,
` color: #333;`,
` margin: 0 0 36pt;`,
` max-width: 5in;`,
` text-align: left;`,
`}`,
`.cover hr.rule {`,
` width: 2.5in;`,
` height: 0;`,
` border: 0;`,
` border-top: 1px solid #111;`,
` margin: 0 0 18pt 0;`,
`}`,
`.cover .cover-meta { font-size: 10pt; line-height: 1.6; color: #333; text-align: left; }`,
`.cover .cover-meta strong { font-weight: 700; }`,
].join("\n");
}
function tocRules(enabled: boolean): string {
if (!enabled) return "";
return [
`.toc { page-break-after: always; break-after: page; }`,
`.toc h2 {`,
` font-size: 13pt;`,
` text-transform: uppercase;`,
` letter-spacing: 0.15em;`,
` color: #666;`,
` font-weight: 600;`,
` margin: 0 0 0.5in;`,
`}`,
`.toc ol {`,
` list-style: none;`,
` padding: 0;`,
` margin: 0;`,
`}`,
`.toc li {`,
` display: flex;`,
` align-items: baseline;`,
` gap: 0.25in;`,
` font-size: 11pt;`,
` line-height: 2;`,
` padding: 4pt 0;`,
`}`,
`.toc li .toc-title { flex: 0 0 auto; }`,
`.toc li .toc-dots { flex: 1 1 auto; border-bottom: 1px dotted #aaa; margin: 0 6pt; transform: translateY(-4pt); }`,
`.toc li .toc-page { flex: 0 0 auto; color: #666; font-variant-numeric: tabular-nums; }`,
`.toc li.level-2 { padding-left: 0.35in; font-size: 10pt; }`,
`.toc li a { color: inherit; text-decoration: none; }`,
].join("\n");
}
function chapterRules(noChapterBreaks: boolean): string {
const breakRule = noChapterBreaks
? `/* chapter breaks disabled */`
: [
`.chapter { break-before: page; page-break-before: always; }`,
`.chapter:first-of-type { break-before: auto; page-break-before: auto; }`,
].join("\n");
return [
breakRule,
`h1 {`,
` font-size: 22pt;`,
` line-height: 1.2;`,
` font-weight: 700;`,
` letter-spacing: -0.01em;`,
` margin: 0 0 0.25in;`,
` break-after: avoid;`,
` page-break-after: avoid;`,
`}`,
`h2 { font-size: 15pt; line-height: 1.3; font-weight: 700; margin: 24pt 0 6pt; break-after: avoid; page-break-after: avoid; }`,
`h3 { font-size: 12pt; line-height: 1.4; font-weight: 700; text-transform: uppercase; letter-spacing: 0.08em; color: #333; margin: 18pt 0 4pt; break-after: avoid; page-break-after: avoid; }`,
`h4 { font-size: 11pt; font-weight: 700; margin: 12pt 0 4pt; break-after: avoid; page-break-after: avoid; }`,
].join("\n");
}
function blockRules(): string {
// Flush-left paragraphs, no indent, 12pt gap. No justify.
// Rule from the plan's "Body paragraph rule (post-review fix)".
return [
`p {`,
` margin: 0 0 12pt;`,
` text-align: left;`,
` widows: 3;`,
` orphans: 3;`,
`}`,
`p:first-child { margin-top: 0; }`,
`p.lead { font-size: 13pt; line-height: 1.45; color: #222; margin: 0 0 18pt; }`,
].join("\n");
}
function inlineRules(): string {
return [
`a {`,
` color: #0055cc;`,
` text-decoration: underline;`,
` text-decoration-thickness: 0.5pt;`,
` text-underline-offset: 1.5pt;`,
`}`,
`strong { font-weight: 700; }`,
`em { font-style: italic; }`,
].join("\n");
}
function codeRules(): string {
return [
`code {`,
` font-family: "SF Mono", Menlo, Consolas, monospace;`,
` font-size: 9.5pt;`,
` background: #f4f4f4;`,
` padding: 1pt 3pt;`,
` border-radius: 2pt;`,
` border: 0.5pt solid #e4e4e4;`,
`}`,
`pre {`,
` font-family: "SF Mono", Menlo, Consolas, monospace;`,
` font-size: 9pt;`,
` line-height: 1.4;`,
` background: #f7f7f5;`,
` padding: 10pt 12pt;`,
` border: 0.5pt solid #e0e0e0;`,
` border-radius: 3pt;`,
` margin: 12pt 0;`,
` overflow: hidden;`,
` white-space: pre-wrap;`,
`}`,
`pre code { background: none; border: 0; padding: 0; font-size: inherit; }`,
// highlight.js minimal palette (kept neutral, prints well)
`.hljs-keyword { color: #8b0000; font-weight: 500; }`,
`.hljs-string { color: #0d6608; }`,
`.hljs-comment { color: #888; font-style: italic; }`,
`.hljs-function, .hljs-title { color: #0044aa; }`,
`.hljs-number { color: #a64d00; }`,
].join("\n");
}
function quoteRules(): string {
return [
`blockquote {`,
` margin: 12pt 0;`,
` padding: 0 0 0 18pt;`,
` border-left: 2pt solid #111;`,
` color: #333;`,
` font-size: 11pt;`,
` line-height: 1.5;`,
`}`,
`blockquote p { margin-bottom: 6pt; text-align: left; }`,
`blockquote cite { display: block; margin-top: 6pt; font-style: normal; font-size: 9.5pt; color: #666; letter-spacing: 0.02em; }`,
`blockquote cite::before { content: "— "; }`,
].join("\n");
}
function figureRules(): string {
return [
`figure { margin: 12pt 0; }`,
`figure img { display: block; max-width: 100%; height: auto; }`,
`figcaption { font-size: 9pt; color: #666; margin-top: 6pt; font-style: italic; }`,
].join("\n");
}
function tableRules(): string {
return [
`table { width: 100%; border-collapse: collapse; margin: 12pt 0; font-size: 10pt; }`,
`th, td { border-bottom: 0.5pt solid #ccc; padding: 5pt 8pt; text-align: left; vertical-align: top; }`,
`th { font-weight: 700; border-bottom: 1pt solid #111; background: transparent; }`,
].join("\n");
}
function listRules(): string {
return [
`ul, ol { margin: 0 0 12pt 0; padding-left: 20pt; }`,
`li { margin-bottom: 3pt; line-height: 1.45; }`,
`li > ul, li > ol { margin-top: 3pt; margin-bottom: 0; }`,
].join("\n");
}
function footnoteRules(): string {
return [
`.footnote-ref { font-size: 0.75em; vertical-align: super; line-height: 0; text-decoration: none; color: #0055cc; }`,
`.footnotes { margin-top: 24pt; padding-top: 12pt; border-top: 0.5pt solid #ccc; font-size: 9.5pt; line-height: 1.4; }`,
`.footnotes ol { padding-left: 18pt; }`,
].join("\n");
}
function watermarkRules(): string {
return [
`.watermark {`,
` position: fixed;`,
` top: 50%;`,
` left: 50%;`,
` transform: translate(-50%, -50%) rotate(-30deg);`,
` font-size: 140pt;`,
` font-weight: 700;`,
` color: rgba(200, 0, 0, 0.06);`,
` letter-spacing: 0.08em;`,
` pointer-events: none;`,
` z-index: 9999;`,
` user-select: none;`,
` white-space: nowrap;`,
`}`,
].join("\n");
}
function breakAvoidRules(): string {
return `blockquote, pre, code, table, figure, li, .keep-together { break-inside: avoid; page-break-inside: avoid; }`;
}
function escapeCssString(s: string): string {
return s.replace(/\\/g, "\\\\").replace(/"/g, "\\\"");
}
+340
View File
@@ -0,0 +1,340 @@
/**
* Markdown → HTML renderer. Pure function, no I/O, no Playwright.
*
* Pipeline:
* 1. marked parses markdown → HTML
* 2. Sanitize: strip <script>, <iframe>, <object>, <embed>, <link>,
* <meta>, <base>, <form>, and all on* event handlers + javascript:
* URLs. (Codex round 2 #9: untrusted markdown can embed raw HTML.)
* 3. Smartypants transform (code/URL-safe).
* 4. Assemble full HTML document with print CSS inlined and
* semantic structure (cover, TOC placeholder, body).
*/
import { marked } from "marked";
import { smartypants } from "./smartypants";
import { printCss, type PrintCssOptions } from "./print-css";
export interface RenderOptions {
markdown: string;
// Document-level metadata (used for cover, PDF metadata, running header).
title?: string;
author?: string;
date?: string; // ISO or human string
subtitle?: string;
// Features
cover?: boolean;
toc?: boolean;
watermark?: string;
noChapterBreaks?: boolean;
confidential?: boolean; // default: true
// Page layout
pageSize?: "letter" | "a4" | "legal" | "tabloid";
margins?: string;
}
export interface RenderResult {
html: string; // full HTML document, ready for $B load-html
printCss: string; // for debugging / preview
bodyHtml: string; // just the rendered body (tests, snapshots)
meta: {
title: string;
author: string;
date: string;
wordCount: number;
};
}
/**
* Pure renderer. No side effects.
*/
export function render(opts: RenderOptions): RenderResult {
// 1. Markdown → HTML
const rawHtml = marked.parse(opts.markdown, { async: false }) as string;
// 2. Sanitize
const cleanHtml = sanitizeUntrustedHtml(rawHtml);
// 3. Decode common entities so smartypants can match raw " and '.
// marked HTML-encodes quotes in text ("hello" → &quot;hello&quot;);
// without decoding, smartypants' regex never fires. These get re-encoded
// implicitly by the browser's HTML parser downstream, and for the ones
// that should stay as curly-quote Unicode, that IS the final form.
const decoded = decodeTypographicEntities(cleanHtml);
// 4. Smartypants (code-safe)
const typographicHtml = smartypants(decoded);
// 4. Derive metadata (title from first H1 if not provided)
const derivedTitle = opts.title ?? extractFirstHeading(typographicHtml) ?? "Document";
const derivedAuthor = opts.author ?? "";
const derivedDate = opts.date ?? formatToday();
// 5. Build CSS
const cssOptions: PrintCssOptions = {
cover: opts.cover,
toc: opts.toc,
noChapterBreaks: opts.noChapterBreaks,
watermark: opts.watermark,
confidential: opts.confidential !== false,
runningHeader: derivedTitle,
pageSize: opts.pageSize,
margins: opts.margins,
};
const css = printCss(cssOptions);
// 6. Assemble document
const coverBlock = opts.cover
? buildCoverBlock({
title: derivedTitle,
subtitle: opts.subtitle,
author: derivedAuthor,
date: derivedDate,
})
: "";
const tocBlock = opts.toc
? buildTocBlock(typographicHtml)
: "";
// Wrap body in .chapter sections at H1 boundaries if chapter breaks are on.
const chapterHtml = opts.noChapterBreaks
? `<section class="chapter">${typographicHtml}</section>`
: wrapChaptersByH1(typographicHtml);
const watermarkBlock = opts.watermark
? `<div class="watermark">${escapeHtml(opts.watermark)}</div>`
: "";
const fullHtml = [
`<!doctype html>`,
`<html lang="en">`,
`<head>`,
`<meta charset="utf-8">`,
`<title>${escapeHtml(derivedTitle)}</title>`,
derivedAuthor ? `<meta name="author" content="${escapeHtml(derivedAuthor)}">` : ``,
`<style>`,
css,
`</style>`,
`</head>`,
`<body>`,
watermarkBlock,
coverBlock,
tocBlock,
chapterHtml,
`</body>`,
`</html>`,
].filter(Boolean).join("\n");
return {
html: fullHtml,
printCss: css,
bodyHtml: typographicHtml,
meta: {
title: derivedTitle,
author: derivedAuthor,
date: derivedDate,
wordCount: countWords(stripTags(typographicHtml)),
},
};
}
/**
* Decode the HTML entities that marked emits for text-node quotes/apostrophes.
* Only the four that matter for smartypants — leaves &amp; alone because it
* can be legitimately doubled (&amp;amp;) and we don't want to double-decode.
*/
function decodeTypographicEntities(html: string): string {
return html
.replace(/&quot;/g, "\"")
.replace(/&#39;/g, "'")
.replace(/&apos;/g, "'")
.replace(/&#x27;/g, "'");
}
// ─── Sanitizer ────────────────────────────────────────────────────────
/**
* Strip dangerous HTML from markdown-produced output.
*
* We can't use DOMPurify (server-side; adds a jsdom dep). A conservative
* regex sanitizer is fine for this use case because:
* 1. marked produces structured HTML (never malformed)
* 2. we only need to strip a fixed blacklist of elements + attrs
* 3. the output goes through Chromium's parser again, which normalizes
*
* What's stripped:
* - <script>, <iframe>, <object>, <embed>, <link>, <meta>, <base>, <form>
* (and their content).
* - on* event handler attributes (onclick, ONCLICK, etc.).
* - href/src with javascript: scheme.
* - <svg> tags with <script> inside them.
*/
export function sanitizeUntrustedHtml(html: string): string {
let s = html;
// Elements to remove entirely (including content).
const DANGER_TAGS = [
"script", "iframe", "object", "embed", "link", "meta", "base", "form",
"applet", "frame", "frameset",
];
for (const tag of DANGER_TAGS) {
const re = new RegExp(`<${tag}\\b[\\s\\S]*?</${tag}>`, "gi");
s = s.replace(re, "");
// Self-closing / unclosed variants
const selfRe = new RegExp(`<${tag}\\b[^>]*/?>`, "gi");
s = s.replace(selfRe, "");
}
// SVG <script>
s = s.replace(/<svg([^>]*)>([\s\S]*?)<\/svg>/gi, (_, attrs, body) => {
return `<svg${attrs}>${body.replace(/<script\b[\s\S]*?<\/script>/gi, "")}</svg>`;
});
// Event handler attributes (on* in any case).
s = s.replace(/\s+on[a-zA-Z]+\s*=\s*"[^"]*"/gi, "");
s = s.replace(/\s+on[a-zA-Z]+\s*=\s*'[^']*'/gi, "");
s = s.replace(/\s+on[a-zA-Z]+\s*=\s*[^\s>]+/gi, "");
// javascript: URLs in href/src/action/formaction
s = s.replace(
/(\s(?:href|src|action|formaction|xlink:href)\s*=\s*)(?:"javascript:[^"]*"|'javascript:[^']*'|javascript:[^\s>]+)/gi,
'$1"#"',
);
// srcdoc attribute (iframe escape hatch — already stripped via iframe above,
// but defense-in-depth).
s = s.replace(/\s+srcdoc\s*=\s*"[^"]*"/gi, "");
s = s.replace(/\s+srcdoc\s*=\s*'[^']*'/gi, "");
// style="url(javascript:..)" — strip javascript: inside style attrs.
s = s.replace(/url\(\s*javascript:[^)]*\)/gi, "url(#)");
return s;
}
// ─── Cover / TOC / Chapter helpers ────────────────────────────────────
function buildCoverBlock(opts: {
title: string;
subtitle?: string;
author?: string;
date: string;
}): string {
const title = escapeHtml(opts.title);
const subtitle = opts.subtitle ? escapeHtml(opts.subtitle) : "";
const author = opts.author ? escapeHtml(opts.author) : "";
const date = escapeHtml(opts.date);
return [
`<section class="cover">`,
` <h1 class="cover-title">${title}</h1>`,
subtitle ? ` <p class="cover-subtitle">${subtitle}</p>` : ``,
` <hr class="rule">`,
` <div class="cover-meta">`,
author ? ` <div><strong>${author}</strong></div>` : ``,
` <div>${date}</div>`,
` </div>`,
`</section>`,
].filter(Boolean).join("\n");
}
/**
* Scan HTML for H1/H2/H3 headings and emit a TOC placeholder.
* Page numbers are filled in by Paged.js (when --toc is passed and Paged.js
* polyfill is injected).
*/
function buildTocBlock(html: string): string {
const headings = extractHeadings(html);
if (headings.length === 0) return "";
const items = headings.map((h, i) => {
const level = h.level >= 2 ? "level-2" : "level-1";
const id = `toc-${i}`;
return [
` <li class="${level}">`,
` <span class="toc-title"><a href="#${id}">${escapeHtml(h.text)}</a></span>`,
` <span class="toc-dots"></span>`,
` <span class="toc-page" data-toc-target="${id}"></span>`,
` </li>`,
].join("\n");
}).join("\n");
return [
`<section class="toc">`,
` <h2>Contents</h2>`,
` <ol>`,
items,
` </ol>`,
`</section>`,
].join("\n");
}
function extractHeadings(html: string): Array<{ level: number; text: string }> {
const re = /<(h[1-3])[^>]*>([\s\S]*?)<\/\1>/gi;
const headings: Array<{ level: number; text: string }> = [];
let match;
while ((match = re.exec(html)) !== null) {
const level = parseInt(match[1].slice(1), 10);
const text = stripTags(match[2]).trim();
if (text) headings.push({ level, text });
}
return headings;
}
/**
* Wrap H1-rooted sections in <section class="chapter">. When chapter breaks
* are on (default), CSS `.chapter { break-before: page }` fires between them.
*/
function wrapChaptersByH1(html: string): string {
// Split on H1 openings. Everything before the first H1 is a preamble.
const h1Re = /<h1\b[^>]*>/gi;
const matches: number[] = [];
let m;
while ((m = h1Re.exec(html)) !== null) {
matches.push(m.index);
}
if (matches.length === 0) {
return `<section class="chapter">${html}</section>`;
}
const chunks: string[] = [];
const preamble = html.slice(0, matches[0]);
if (preamble.trim().length > 0) {
chunks.push(`<section class="chapter">${preamble}</section>`);
}
for (let i = 0; i < matches.length; i++) {
const start = matches[i];
const end = i + 1 < matches.length ? matches[i + 1] : html.length;
chunks.push(`<section class="chapter">${html.slice(start, end)}</section>`);
}
return chunks.join("\n");
}
function extractFirstHeading(html: string): string | null {
const m = html.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
return m ? stripTags(m[1]).trim() : null;
}
function stripTags(html: string): string {
return html.replace(/<[^>]+>/g, "");
}
function escapeHtml(s: string): string {
return s
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;")
.replace(/'/g, "&#39;");
}
function countWords(text: string): number {
return text.split(/\s+/).filter(w => w.length > 0).length;
}
function formatToday(): string {
const now = new Date();
return now.toLocaleDateString("en-US", { year: "numeric", month: "long", day: "numeric" });
}
+110
View File
@@ -0,0 +1,110 @@
/**
* `$P setup` — guided smoke test.
*
* Flow (per the CEO plan CLI UX spec):
* 1. Verify browse binary exists and responds
* 2. Verify Chromium launches via $B goto about:blank
* 3. Verify pdftotext is installed (warn, don't fail)
* 4. Generate a smoke-test PDF from an inline 2-paragraph fixture
* 5. Open it
* 6. Print a 3-command cheatsheet
*/
import * as os from "node:os";
import * as path from "node:path";
import * as fs from "node:fs";
import * as browseClient from "./browseClient";
import { resolvePdftotext, PdftotextUnavailableError } from "./pdftotext";
import { generate } from "./orchestrator";
export async function runSetup(): Promise<void> {
process.stderr.write("make-pdf setup — verifying install\n\n");
// 1. Resolve browse binary
process.stderr.write(" [1/5] Checking browse binary...");
try {
const bin = browseClient.resolveBrowseBin();
process.stderr.write(` OK (${bin})\n`);
} catch (err: any) {
process.stderr.write(" FAIL\n");
process.stderr.write(`\n${err.message}\n`);
process.exit(4);
}
// 2. Chromium smoke (navigate a dedicated tab to about:blank)
process.stderr.write(" [2/5] Launching Chromium...");
let chromiumTab: number | null = null;
try {
chromiumTab = browseClient.newtab("about:blank");
process.stderr.write(` OK (tab ${chromiumTab})\n`);
} catch (err: any) {
process.stderr.write(" FAIL\n");
process.stderr.write(`\nChromium failed to launch: ${err.message}\n`);
process.stderr.write("\nTo fix: run gstack setup from the gstack repo:\n");
process.stderr.write(" cd ~/.claude/skills/gstack && ./setup\n");
process.exit(4);
} finally {
if (chromiumTab !== null) {
try { browseClient.closetab(chromiumTab); } catch { /* ignore */ }
}
}
// 3. pdftotext (optional — CI gate only)
process.stderr.write(" [3/5] Checking pdftotext (optional)...");
try {
const info = resolvePdftotext();
process.stderr.write(` OK (${info.flavor}, ${info.version.split(" ").slice(-1)[0] || "version unknown"})\n`);
} catch (err) {
process.stderr.write(" SKIP\n");
if (err instanceof PdftotextUnavailableError) {
process.stderr.write(
" pdftotext not installed. This is optional — only the CI\n" +
" copy-paste gate needs it. To enable:\n" +
" macOS: brew install poppler\n" +
" Ubuntu: sudo apt-get install poppler-utils\n",
);
}
}
// 4. Render smoke-test PDF
process.stderr.write(" [4/5] Generating smoke-test PDF...\n");
const fixture = [
"# Hello from make-pdf",
"",
"This is a two-paragraph smoke test. If you can read this sentence in the PDF that just opened, the pipeline works end-to-end.",
"",
"The second paragraph contains curly quotes (\"hello\"), an em dash -- like this, and an ellipsis... all of which should render correctly.",
"",
].join("\n");
const fixturePath = path.join(os.tmpdir(), `make-pdf-smoke-${process.pid}.md`);
const outPath = path.join(os.tmpdir(), `make-pdf-smoke-${process.pid}.pdf`);
fs.writeFileSync(fixturePath, fixture, "utf8");
try {
await generate({
input: fixturePath,
output: outPath,
quiet: true,
pageNumbers: true,
});
process.stderr.write(` PASSED. Smoke test saved to ${outPath}\n`);
} catch (err: any) {
process.stderr.write(` FAILED: ${err.message}\n`);
process.exit(2);
} finally {
try { fs.unlinkSync(fixturePath); } catch { /* ignore */ }
}
// 5. Cheatsheet
process.stderr.write(" [5/5] All checks passed.\n\n");
process.stderr.write([
"make-pdf is ready. Try:",
" $P generate letter.md # default memo mode",
" $P generate --cover --toc essay.md # full publication",
" $P generate --watermark DRAFT memo.md # diagonal watermark",
"",
`Smoke-test PDF: ${outPath}`,
"",
].join("\n"));
}
+100
View File
@@ -0,0 +1,100 @@
/**
* Inline typographic transform (smartypants).
*
* Converts ASCII typography to real Unicode:
* "quoted" → "quoted" (U+201C/U+201D)
* 'quoted' → 'quoted' (U+2018/U+2019)
* don't → don't (apostrophe: U+2019)
* -- → — (em dash U+2014)
* ... → … (ellipsis U+2026)
*
* Critical: must NOT touch code, URLs, or HTML attributes. The Codex round
* 2 review flagged this specifically — smartypants run over a fenced code
* block corrupts the code and tokens inside tag attributes can break
* parsing.
*
* This operates on HTML (marked already produced it) and walks text nodes
* only via a lightweight regex that recognizes code/pre/URL zones and
* skips them entirely.
*/
const CODE_ZONE_RE = /<(pre|code|script|style)\b[^>]*>[\s\S]*?<\/\1>/gi;
const TAG_RE = /<[^>]+>/g;
const URL_RE = /\bhttps?:\/\/\S+/g;
/**
* Apply smartypants to an HTML string. Zones that should not be touched:
* - <pre>, <code>, <script>, <style> blocks (content unchanged)
* - HTML tags themselves (attributes unchanged)
* - URLs (http:// and https:// spans unchanged)
*/
export function smartypants(html: string): string {
// Step 1: split into preserved + transformed zones.
// Preserved zones: code/pre/script/style, tags, URLs.
// We carve them out with placeholder tokens, transform the rest, and
// splice them back.
const preserved: string[] = [];
const PLACEHOLDER = (i: number) => `\u0000SMARTPANTS_PRESERVED_${i}\u0000`;
const carve = (source: string, pattern: RegExp): string => {
return source.replace(pattern, (match) => {
const idx = preserved.length;
preserved.push(match);
return PLACEHOLDER(idx);
});
};
let s = html;
s = carve(s, CODE_ZONE_RE);
s = carve(s, TAG_RE);
s = carve(s, URL_RE);
s = transformText(s);
// Step 2: restore preserved zones.
// Use a function to avoid $-substitution gotchas.
s = s.replace(/\u0000SMARTPANTS_PRESERVED_(\d+)\u0000/g, (_, idx) => {
return preserved[parseInt(idx, 10)] ?? "";
});
return s;
}
/**
* Transform plain text (no HTML, no code, no URLs).
*
* Order matters:
* 1. Triple dots first (so they don't collide with later apostrophes)
* 2. Em dashes (two hyphens → em dash)
* 3. Apostrophes (contractions + possessives)
* 4. Double quotes (open/close pairing)
* 5. Single quotes (open/close pairing — after apostrophes)
*/
function transformText(text: string): string {
let s = text;
// Ellipsis: three literal dots (with optional spaces) → …
s = s.replace(/\.\s?\.\s?\./g, "\u2026");
// Em dash: -- → —. Require space or word-char boundary on both sides so
// we don't mangle ARGV-style flags in prose like `--verbose`.
s = s.replace(/(\w|\s)--(\w|\s)/g, "$1\u2014$2");
// Standalone -- at start/end
s = s.replace(/^--\s/gm, "\u2014 ");
s = s.replace(/\s--$/gm, " \u2014");
// Apostrophes in contractions and possessives.
// "don't", "it's", "they're", "Garry's"
s = s.replace(/(\w)'(\w)/g, "$1\u2019$2");
// Double quotes: open if preceded by whitespace/bol, close if preceded
// by word char or punctuation.
s = s.replace(/(^|[\s\(\[\{\-])"/g, "$1\u201c"); // opening "
s = s.replace(/"/g, "\u201d"); // remaining " are closing
// Single quotes (after apostrophe pass):
s = s.replace(/(^|[\s\(\[\{\-])'/g, "$1\u2018"); // opening '
s = s.replace(/'/g, "\u2019"); // remaining ' are closing
return s;
}
+123
View File
@@ -0,0 +1,123 @@
/**
* make-pdf — shared types.
*
* No runtime code. Imports are safe from any module.
*/
export type PageSize = "letter" | "a4" | "legal" | "tabloid";
export type FontMode = "sans"; // v1: Helvetica only. Future: "serif" | "custom".
/**
* Options for `$P generate` — the public CLI contract.
* Matches the flag set documented in the CEO plan.
*/
export interface GenerateOptions {
input: string; // markdown input path
output?: string; // PDF output path (default: /tmp/<slug>.pdf)
// Page layout
margins?: string; // "1in" | "72pt" | "25mm" | "2.54cm"
marginTop?: string;
marginRight?: string;
marginBottom?: string;
marginLeft?: string;
pageSize?: PageSize; // default "letter"
// Document structure
cover?: boolean;
toc?: boolean;
noChapterBreaks?: boolean; // default: chapter breaks ON
// Branding
watermark?: string; // e.g. "DRAFT"
headerTemplate?: string; // raw HTML
footerTemplate?: string; // raw HTML, mutex with pageNumbers
confidential?: boolean; // default: true
// Output control
pageNumbers?: boolean; // default: true
tagged?: boolean; // default: true (accessible PDF)
outline?: boolean; // default: true (PDF bookmarks)
quiet?: boolean; // suppress progress on stderr
verbose?: boolean; // per-stage timings on stderr
// Network
allowNetwork?: boolean; // default: false
// Metadata
title?: string;
author?: string;
date?: string; // ISO-ish; default: today
}
/**
* Options for `$P preview`.
*/
export interface PreviewOptions {
input: string;
quiet?: boolean;
verbose?: boolean;
// Same render flags as generate so preview matches output
cover?: boolean;
toc?: boolean;
watermark?: string;
noChapterBreaks?: boolean;
confidential?: boolean;
allowNetwork?: boolean;
title?: string;
author?: string;
date?: string;
}
/**
* Parsed page.pdf() options passed to browse.
*/
export interface BrowsePdfOptions {
output: string;
tabId: number;
format?: PageSize;
width?: string;
height?: string;
margins?: {
top: string;
right: string;
bottom: string;
left: string;
};
headerTemplate?: string;
footerTemplate?: string;
pageNumbers?: boolean;
displayHeaderFooter?: boolean;
tagged?: boolean;
outline?: boolean;
printBackground?: boolean;
preferCSSPageSize?: boolean;
toc?: boolean; // signals browse to wait for Paged.js
}
/**
* Exit codes for $P generate.
* Mirror these in orchestrator error paths.
*/
export const ExitCode = {
Success: 0,
BadArgs: 1,
RenderError: 2,
PagedJsTimeout: 3,
BrowseUnavailable: 4,
} as const;
export type ExitCode = typeof ExitCode[keyof typeof ExitCode];
/**
* Structured error for browse CLI shell-out failures.
*/
export class BrowseClientError extends Error {
constructor(
public readonly exitCode: number,
public readonly command: string,
public readonly stderr: string,
) {
super(`browse ${command} exited ${exitCode}: ${stderr.trim()}`);
this.name = "BrowseClientError";
}
}