v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990)

* docs(todos): P3 content-hash diagram render cache for make-pdf

Deferred from the diagram-engine eng review (Codex outside-voice D7):
repeat make-pdf runs re-render every fence; cache keyed on fence source +
bundle version once multi-diagram docs make it worth building.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(diagram-render): offline mermaid+excalidraw render bundle for browse

Single self-contained page (dist/diagram-render.html, 9.2MB, committed per
eng-review D2) exposing __renderMermaid / __mermaidToExcalidraw /
__excalidrawToSvg / __rasterize / __probeImage through browse load-html +
js --out. Render contract per D3: securityLevel strict, per-fence ids,
print-css font lock, htmlLabels off (canvas-taint-safe). Deterministic
build (same sha twice); drift test pins dist == BUILD_INFO == package.json
pins and rebuild-reproducibility when toolchain matches. Spike-proven
offline: flowchart + sequence SVG, editable .excalidraw scene, 300dpi PNG.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(diagram-render): __downscaleRaster for print-resolution image normalization

Data-URI rasters re-encode in their own format (JPEG stays JPEG at q0.9 —
PNG-encoding photos bloats them) at an explicit target pixel width. Used by
make-pdf's pre-pass for the 300dpi content-box ceiling (eng-review D4).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): diagram pre-pass — mermaid/excalidraw fences render as vector SVG; local images inline as data URIs

```mermaid / ```excalidraw fences extract to placeholder tokens, render in
one diagram-render bundle tab per run (reset contract: bundle page reloads
after any render error), and substitute back as accessible <figure> blocks
with the raw source preserved in a comment. Render failures produce a loud
red diagnostic block, never silent raw code. render=false keeps a fence as
code; title="..." becomes the aria-label and caption.

Local images now actually render: page.setContent loads at about:blank
(tab-session.ts:194), so relative paths silently 404'd before. The pre-pass
resolves them against the markdown's directory, inlines as data URIs, probes
intrinsic dimensions from the bytes (pure-TS PNG/JPEG/GIF/WebP/SVG sniffing),
and downscales rasters wider than 2x the content box at 300dpi. Remote URLs
warn (offline posture, --allow-network exempts); missing files get a visible
placeholder; --strict hard-fails both for CI pipelines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): diagram pre-pass unit suite + e2e render gates

34 unit tests (fence extraction incl. nested/tilde/unclosed/render=false,
info-string parsing, slot substitution, diagnostic/figure escaping + SVG
script strip, byte-level dimension probing across 5 formats, content-box
math, image inlining incl. strict/remote/missing/data-URI paths). E2E gate
proves through the compiled binary: both fences render as vector text
(id-collision check), raw mermaid ships only via render=false, broken fence
yields the diagnostic block, and the relative fixture image rasterizes to
colored pixels (CRITICAL regression for the about:blank image fix).
--strict exits non-zero on a missing image.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): width directives + conservative auto-landscape via CSS named pages

`![a](x.png){width=full|<pct>|<dim>}` and `{page=landscape|portrait}`
suffixes translate to data-gstack-* attrs in render() (before the sanitizer,
which keeps data- attributes; unrecognized brace groups stay visible text).
Default width rule needs no code: intrinsic CSS-px capped at the content box,
never upscaled — figure img max-width owns it.

Auto-landscape promotes a block to `@page wide { size: <pagesize> landscape }`
only when aspect >= 1.8 AND intrinsic width > 2.5x the content box (~1600px on
letter) AND diagram provenance (rendered fences) or a whole-word alt token
(diagram|architecture|flowchart|chart|graph) for plain images. {page=...}
forces or vetoes; fence info strings accept page=... too. preferCSSPageSize
is passed to Chromium only when a promotion exists, so every other document
prints exactly as before. False negatives are cheap; false positives feel
broken (eng-review P4, Codex challenge accepted).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): width-policy unit suite + landscape e2e gate with negative fixtures

24 unit tests weighted toward the false-positive guards: wide screenshot
without an alt hint stays portrait, sub-threshold and tall images stay
portrait, deterministic 1560/1561px boundary, whole-word alt matching
('photographic' must not match 'graph'), page=portrait veto beats every
heuristic, diagnostic blocks never promote. E2E gate asserts pdfinfo
per-page boxes through the compiled binary: exactly 3 of 5 fixture blocks
get landscape pages (alt-hinted image, directive-forced image, wide sequence
diagram) while the unhinted screenshot and the veto'd diagram stay portrait —
plus the --toc combo proving TOC and named-page landscape coexist.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): --to html|docx output formats

--to html writes the assembled self-contained document directly (no print
round-trip): inline vector diagrams, data-URI images, zero network
references, plus an @media screen layer for browser reading. --to docx is
the content-fidelity export (eng-review P8): html-to-docx@1.8.0 (exact pin;
pure JS, bun-compile-verified) maps headings/tables/code/lists; diagrams and
SVG images rasterize at 300dpi of the content-box width via the render tab;
diagnostic figures convert to plain p/pre so the converter can't silently
drop an error. --format keeps its page-size-alias meaning; --to is the
output format, and the CLI says so when confused.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): format gate — html no-network-refs + docx zip content checks

HTML: zero src/href network refs, no script/link tags, inline SVG diagrams,
data-URI images, screen layer, diagnostic survives. DOCX: valid OOXML zip
(document.xml + Content_Types), >=2 PNG media (diagram raster + fixture
image), headings + render=false source + diagnostic text in document.xml,
no leaked mermaid source from rendered fences. Plus --to validation UX.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(diagram): /diagram skill — English in, editable diagram triplet out

New skill: agent authors mermaid from the user's description and renders the
triplet through the offline diagram-render bundle in the browse daemon —
.mmd source (the single source of truth), editable .excalidraw (opens at
excalidraw.com, round-trips back through re-render), and SVG + PNG. Flowcharts
convert to fully editable scenes; other mermaid types render with an explicit
upstream-converter limitation note. Never ships an unrendered source file;
offline is the contract (no CDN fallback). Inventory rows in AGENTS.md +
docs/skills.md; generated SKILL.md + llms.txt via gen:skill-docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(diagram): paid E2E pair — gate triplet contract + periodic authoring judge

diagram-triplet (gate, deterministic functional): a fresh claude -p agent
following the skill extract must emit a parseable triplet — graph LR/TD in
.mmd, excalidraw scene with >3 elements, SVG markup, PNG magic bytes.
Verified live: pass, $0.17, 58s. diagram-authoring-quality (periodic,
LLM-judged): faithfulness/labels/size rubric with a diagnostic-path cap,
floor 6/10. Verified live: pass at exactly 6 with substantive critique.
Touchfiles select both on diagram/** and lib/diagram-render/** changes;
tier split per E2E_TIERS rules (eng-review D5).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(diagram): register /diagram in the skill coverage matrix

Gate: triplet contract + structural floor; periodic: authoring-quality judge.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): typography scale-up, zero image truncation, landscape vertical centering

Dogfooding round on the repo README surfaced four output-quality bugs:

- Type was too small everywhere: body 11→12pt, h1 22→26pt, h2 15→18pt,
  cover title 32→56pt with poster spacing, cover meta 10→13pt, TOC 11→12pt
  with tighter leading, code 9.5→10.5pt, tables 10→11pt.
- Zero image truncation, ever: the max-width cap was figure-scoped, but
  markdown images render as <p><img> — a 1850px GitHub screenshot ran off
  the page edge. Global img { max-width: 100%; height: auto; } cap.
- hyphens: auto put real 'dif-\nferent' breaks into the PDF text layer the
  moment 12pt made lines wrap (combined-gate caught it). Clean copy-paste
  is the product contract; left-aligned rag doesn't need hyphenation →
  hyphens: manual.
- Promoted landscape blocks now vertically center. CSS flex/min-height
  centering fragments into phantom empty landscape pages in Chromium
  (bisected: min-height at ANY value; 3 promotions printed 5 pages), so
  image-policy computes an inline margin-top from each block's known
  aspect ratio against the landscape content box instead — fragmentation
  handles margins fine. .page-wide also drops its explicit break-before/
  after (the page-name change already breaks on both sides).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): pin zero-truncation invariant, typography floor, centering math

Global img cap pinned as a regex invariant (the figure-scoped-cap regression
class); typography floor (12pt body, 56pt cover, 12pt TOC); .page-wide must
NOT carry min-height/flex (the phantom-landscape-page regression class);
centering margin math verified both ways (2400×1000 image → 1.38in,
2050×600 viewBox diagram → 1.93in, page-filling directive block → no margin).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: diagram + multi-format documentation across README, make-pdf skill, and how-to guide

README gains /make-pdf (Publisher) and /diagram (Diagram Maker) rows in the
sprint table. make-pdf's skill doc — the agent-facing contract — gains Core
patterns for mermaid/excalidraw fences (title/render=false/page= options),
the image policy ({width=}/{page=} directives, zero-truncation, conservative
auto-landscape), --to html|docx, and --strict, plus the --to vs --format
disambiguation in Common flags. New docs/howto-diagrams-and-formats.md is
the user-facing walkthrough: fences, directives, formats, /diagram triplet,
the mermaid racetrack trick, troubleshooting.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): fill ship-audit coverage gaps — downscale, reset contract, excalidraw fence, WebP

Ship coverage audit found 9 gaps (85%); this fills the 2 HIGH + 3 MEDIUM and
most LOW. diagram-gate fixture gains a 4200px incompressible photo (the only
live coverage of __downscaleRaster AND the 64KB chunked jsViaBuffer eval
transport — asserted via the downscale stderr warning), an ```excalidraw
scene fence rendered through exportToSvg (vector labels + caption in
pdftotext, no leaked scene JSON), and the broken fence MOVED BETWEEN the two
mermaid fences so the second diagram rendering proves the D6.2 reset
contract end-to-end. New coverage-gaps.test.ts (16 tests): mock-tab reset
contract (exactly one reload, post-failure fence renders), excalidraw
fail-fast diagnostic without a bundle call, rasterize error fallbacks
(figure/tag kept, never silent), WebP VP8/VP8L/VP8X byte parsers,
landscapeContentBox a4/asymmetric margins, bare-token slot fallback,
resolveBundlePath env override + error shape, screenCss media scoping.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(make-pdf): pre-landing review wave — fence fidelity, injection hardening, Windows paths, transport rework

Review army (6 specialists + red team) findings, all fixed:

- Indented fences replay byte-for-byte and indented diagram fences are NOT
  extracted (red-team conf-9: the pre-pass reconstructed fences at column 0,
  splitting any list containing fenced code — every ordinary document).
- String.replace $-pattern injection killed at every seam: substituteSlots,
  mergeStyle, img/src rewrites all use function replacements (a diagram label
  containing $' duplicated the document tail).
- Big-expression transport reworked: browse `eval <file>` (one spawn, any
  size, Windows-safe) replaces the 64KB chunked window-buffer eval — fixes
  the per-chunk spawn cost, the char-vs-byte argv units, AND the Windows
  32,767-char command-line ceiling in one move.
- Staged-bundle trust: content verified by hash even when the file exists,
  and the rename-failure path re-hashes the survivor (sticky-bit /tmp EPERM
  would otherwise ride a pre-planted file past the check).
- Windows drive-letter img srcs (C:/x.png) reach the local-path branch
  instead of being swallowed as unknown URL schemes.
- DOCX rasterize-failure now embeds the decoded source as visible text —
  returning the figure made diagrams vanish silently (converter drops svg).
- Fence source preserved as base64 data-gstack-source attribute (the comment
  encoding corrupted every '-->' arrow); decodeFigureSource() round-trips.
- inlineLocalImages memoizes per path; file:// uses fileURLToPath; preview
  prints a divergence note for fences/local images; --to docx strips the
  watermark div and warns about print-only flags; TOC links resolve in
  html/docx (heading ids assigned); waitForExpression sleeps instead of
  busy-spinning; escapeHtml/svg-dims deduped to single definitions;
  typography stragglers (blockquote 12pt, footnotes 10pt, 42em screen
  measure); bundle BUILD_INFO gains srcSha256 for no-node_modules drift
  detection; MAX_TARGET_PX shared guard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: make-pdf gate covers the diagram-render bundle; bundle pinned to LF

make-pdf-gate.yml paths gain lib/diagram-render/** and the drift test (a
bundle-only PR previously skipped every render gate AND no CI lane ran the
drift check at all). .gitattributes pins dist html/json to LF so Windows
autocrlf can't break the hash-pinned bundle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf)+feat(diagram): review-wave test pins + skill transport hardening

Tests: indented-fence byte-for-byte replay + no-extraction-in-lists,
drive-letter local-path routing, $-pattern slot immunity, base64 source
round-trip ('A --> B' exact), existing-style merge preservation, DOCX
rasterize-failure surfaces source, srcSha256 + font-stack drift guards,
landscape veto asserted as some-portrait/no-landscape (layout-order-proof),
judge rubric cap lowered to 5 so it actually fails, vacuous error-shape test
removed honestly, tmpdir cleanup.

/diagram skill: base64 transport (template literals corrupted backticks/${
in sources), content-addressed staging with hash verification, and --tab-id
pinned on every browse call so a concurrent /qa session can't be clobbered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): out-of-tree image reads warn; --strict makes them fatal (D8.1)

Local CLI semantics stay (absolute paths and ../ still inline, like pandoc),
but never silently: an agent PDF-ing untrusted markdown can't quietly embed a
file from outside the input directory into a shareable document without a
visible warning, and --strict pipelines hard-fail. Two unit tests. Also:
TODOS.md gains the deferred e2e-harness dedup entry (D8.2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: pre-existing test failure in skill-e2e-bws operational-learning

Root cause was the fixture, not model behavior: gstack-learnings-log gained
an import of lib/jsonl-store.ts in the v1.57.5.0 injection-sanitization wave,
but the test copies only bin/ scripts into its sandbox — the inline bun
import failed and the script exited 1 before writing, on every run, on main
too (reproduced at a5833c41). Fixture now stages lib/jsonl-store.ts beside
bin/; verified deterministically (script exits 0, learning written) and via
the paid test (1 pass).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(make-pdf): adversarial-review wave — offline posture enforced, symlink-aware confinement, bounded reads

Codex adversarial + structured review findings:

- Remote images are now BLOCKED with a visible placeholder instead of
  warn-and-keep — leaving the tag meant Chromium fetched the URL at print
  time anyway, so the offline posture was a lie (tracking pixels and
  internal-URL probes ran without --allow-network).
- The out-of-tree read check compares REAL paths: a symlink inside the input
  dir pointing at ~/.ssh/... passed the string-prefix check, including under
  --strict. Ordered after the existence check (realpath of a missing file
  false-positives on macOS /var → /private/var).
- Image reads are bounded BEFORE reading: statSync first, non-regular files
  (fifo/device/dir) and >64MB files degrade to placeholders instead of
  hanging or exhausting memory; malformed percent-encoding (foo%zz.png)
  degrades to missing-image instead of crashing decodeURIComponent.
- browse shell-outs get a 120s timeout — a wedged daemon or hostile mermaid
  source fails the run instead of hanging it.
- TOC entries link to the heading's ACTUAL id (pre-id'd raw-HTML headings
  previously got dead #toc-N links); per-side margins compose into the CSS
  @page shorthand so a landscape promotion flipping preferCSSPageSize no
  longer silently reverts --margin-left/right to defaults (Codex P2).
- The image memo is a typed object — literal NUL-byte separators had made
  diagram-prepass.ts register as binary to text tooling.

Codex structured review GATE: PASS (no P1).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.0.0)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: sync make-pdf image-policy docs with final shipped behavior (v1.58.0.0)

The docs wave (87594420) predated the final review-wave commits, so two
docs drifted from shipped behavior:

- make-pdf/SKILL.md.tmpl + generated SKILL.md: remote images are BLOCKED
  with a visible placeholder (not warned-and-kept); out-of-tree reads
  (including via symlink) warn and --strict makes them fatal; --strict
  also covers oversized (>64MB) and non-regular files; troubleshooting
  entry now names the actual "[remote image blocked]" symptom.
- docs/howto-diagrams-and-formats.md: same corrections in the image
  section, CI section, and troubleshooting.
- README.md: docs/howto-diagrams-and-formats.md added to the Docs table
  (was unreachable from any entry-point doc).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: apply Codex doc-review findings for v1.58.0.0

Cross-model doc review (Codex, read-only) checked the v1.58.0.0 docs
against the shipped code. Fixes:

- howto + make-pdf SKILL: diagram source is preserved base64 in a
  data-gstack-source attribute, not an HTML comment (-- in mermaid
  arrows would corrupt a comment); fences must start at column 0;
  fence options example gains page=portrait; --to html "zero network
  refs" qualified (--allow-network deliberately keeps remote tags).
- /diagram description, README + docs/skills.md rows: the hand-drawn
  aesthetic belongs to the .excalidraw artifact; rendered SVG/PNG use
  mermaid's clean neutral theme (lib/diagram-render entry.ts pins
  theme: "neutral").
- CHANGELOG v1.58.0.0 wording: --strict coverage lists all five fatal
  classes (missing/remote/out-of-tree/oversized/non-regular); fences
  are vector SVG in pdf+html, 300dpi PNG in docx; hand-drawn claim
  scoped to the .excalidraw file.
- lib/diagram-render/README: Page API table gains __downscaleRaster.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-06-12 15:38:53 -07:00
committed by GitHub
parent a5833c413f
commit 14fc0866d9
53 changed files with 11129 additions and 69 deletions
+32 -3
View File
@@ -176,6 +176,9 @@ function runBrowse(args: string[]): string {
encoding: "utf8",
maxBuffer: 16 * 1024 * 1024, // 16MB; tab content can be large
stdio: ["ignore", "pipe", "pipe"],
// A wedged daemon (or a hostile mermaid source spinning the renderer)
// must fail the run, not hang it forever.
timeout: 120_000,
});
} catch (err: any) {
const exitCode = typeof err.status === "number" ? err.status : 1;
@@ -268,6 +271,17 @@ export function loadHtml(opts: LoadHtmlOptions): void {
}
}
/**
* Load an HTML file (already under browse's safe dirs, e.g. /tmp) into a tab
* by path. Cheaper than loadHtml for large pages — no JSON payload round-trip;
* browse reads the file directly (diagram-render bundle is ~9MB).
*/
export function loadHtmlFile(opts: { file: string; tabId: number; waitUntil?: "load" | "domcontentloaded" | "networkidle" }): void {
const args = ["load-html", opts.file, "--tab-id", String(opts.tabId)];
if (opts.waitUntil) args.push("--wait-until", opts.waitUntil);
runBrowse(args);
}
/**
* Evaluate a JS expression in a tab. Returns the serialized result as string.
*/
@@ -279,6 +293,19 @@ export function js(opts: JsOptions): string {
]).trim();
}
/**
* Evaluate a JS file in a tab (`browse eval <file>`): the argv-safe transport
* for expressions too large for a command-line element. The file must live
* under browse's safe dirs (/tmp or cwd).
*/
export function evalFile(opts: { file: string; tabId: number }): string {
return runBrowse([
"eval",
opts.file,
"--tab-id", String(opts.tabId),
]).trim();
}
/**
* Poll a boolean JS expression until it evaluates to true, or timeout.
* Returns true if it succeeded, false if timed out.
@@ -300,9 +327,11 @@ export function waitForExpression(opts: {
}
const wait = Math.min(poll, Math.max(0, deadline - Date.now()));
if (wait <= 0) break;
// Synchronous sleep is fine — this only runs once per PDF render
const end = Date.now() + wait;
while (Date.now() < end) { /* busy wait */ }
// Real sleep, not a busy-wait: this poll now runs on every diagram-render
// bundle load (and after every fence render error), exactly while Chromium
// is parsing a 9MB page on the same machine — spinning a core competes
// with the work being awaited.
Bun.sleepSync(wait);
}
return false;
}
+20 -1
View File
@@ -64,9 +64,14 @@ function printUsage(): void {
lines.push(` ${info.description}`);
}
lines.push("");
lines.push("Output format:");
lines.push(" --to pdf|html|docx What to produce (default: pdf).");
lines.push(" html = single self-contained file, no network refs.");
lines.push(" docx = content fidelity, diagrams as PNG.");
lines.push("");
lines.push("Page layout:");
lines.push(" --margins <dim> All four margins (default: 1in). in, pt, cm, mm.");
lines.push(" --page-size letter|a4|legal (aliases: --format)");
lines.push(" --page-size letter|a4|legal (aliases: --format — page SIZE, not output format)");
lines.push("");
lines.push("Document structure:");
lines.push(" --cover Add a cover page.");
@@ -86,6 +91,12 @@ function printUsage(): void {
lines.push(" --quiet Suppress progress on stderr.");
lines.push(" --verbose Per-stage timings on stderr.");
lines.push("");
lines.push("Diagrams & images:");
lines.push(" ```mermaid / ```excalidraw fences render as vector diagrams.");
lines.push(" Add render=false to a fence info string to keep it as a code block.");
lines.push(" Local images are inlined; oversized rasters downscale to print resolution.");
lines.push(" --strict Missing/remote images fail the run (CI mode).");
lines.push("");
lines.push("Network:");
lines.push(" --allow-network Load external images (off by default).");
lines.push("");
@@ -112,9 +123,16 @@ function generateOptionsFromFlags(parsed: ParsedArgs): GenerateOptions {
if (f[`no-${key}`] === true) return false;
return def;
};
const to = typeof f.to === "string" ? f.to.toLowerCase() : "pdf";
if (to !== "pdf" && to !== "html" && to !== "docx") {
console.error(`$P generate: invalid --to '${f.to}'. Expected pdf, html, or docx.`);
console.error("(--format is a --page-size alias, not the output format.)");
process.exit(ExitCode.BadArgs);
}
return {
input: p[0],
output: p[1],
to: to as GenerateOptions["to"],
margins: f.margins as string | undefined,
marginTop: f["margin-top"] as string | undefined,
marginRight: f["margin-right"] as string | undefined,
@@ -136,6 +154,7 @@ function generateOptionsFromFlags(parsed: ParsedArgs): GenerateOptions {
quiet: f.quiet === true,
verbose: f.verbose === true,
allowNetwork: f["allow-network"] === true,
strict: f.strict === true,
title: typeof f.title === "string" ? f.title : undefined,
author: typeof f.author === "string" ? f.author : undefined,
date: typeof f.date === "string" ? f.date : undefined,
+846
View File
@@ -0,0 +1,846 @@
/**
* Diagram + image pre-pass. Runs between "read markdown" and render() in the
* orchestrator, and owns everything that needs the diagram-render bundle.
*
* markdown ─▶ extractDiagramFences() ──▶ render() (marked+sanitize+smarty)
* │ fences → placeholder tokens │
* │ ▼
* └─▶ renderFenceSlots() ───────────▶ substituteSlots(html, slots)
* one browse render tab/run │
* error ⇒ diagnostic block + page reload ▼
* inlineLocalImages(html)
* data URIs, probe dims from bytes,
* downscale >2x content box @300dpi,
* remote warn / missing placeholder /
* --strict hard-fail
*
* Placeholders survive marked, the sanitizer, and smartypants because they are
* plain hyphenated lowercase tokens with no quotes or HTML. Slot HTML is run
* through the same sanitizer as user content before substitution (the bundle
* renders with securityLevel strict — the sanitizer is the second layer).
*
* Reset contract (eng-review D6.2): each fence renders with a fresh
* mermaid.render id; after ANY render error the bundle page is reloaded before
* the next fence so a poisoned global can't corrupt diagram N+1.
*/
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import * as crypto from "node:crypto";
import { fileURLToPath } from "node:url";
import * as browseClient from "./browseClient";
import { escapeHtml, sanitizeUntrustedHtml } from "./render";
import { imageDims } from "./image-size";
// ─── Types ────────────────────────────────────────────────────────────
export interface DiagramFence {
/** "mermaid" | "excalidraw" */
lang: string;
/** Fence body (the diagram source). */
source: string;
/** Optional title="..." from the fence info string (a11y label, D6.4). */
title?: string;
/** Optional page=landscape|portrait fence directive (image-policy override). */
page?: "landscape" | "portrait";
/** render=false → leave as a plain code block (escape hatch, D6.3). */
render: boolean;
/** Placeholder token substituted into the markdown. */
token: string;
/** 1-based ordinal among rendered fences (unique ids, aria fallback). */
ordinal: number;
}
export interface FenceExtraction {
markdown: string;
fences: DiagramFence[];
}
export interface PrepassWarnings {
warn: (msg: string) => void;
}
export interface PrepassImageOptions {
/** Directory of the source markdown — relative image paths resolve here. */
inputDir: string;
/** Hard-fail on missing/remote images instead of warn (D6.1). */
strict: boolean;
/** Remote images are left untouched when network is explicitly allowed. */
allowNetwork: boolean;
/** Physical content-box width in inches (page width minus margins). */
contentWidthIn: number;
warn: (msg: string) => void;
/** Lazily provides a ready bundle tab (only opened when needed). */
getTab: () => RenderTab | null;
}
/** Print-resolution policy (eng-review D4): downscale rasters wider than
* 2 × contentWidth × 300dpi down to contentWidth × 300dpi. */
const PRINT_DPI = 300;
const DOWNSCALE_FACTOR = 2;
/** Per-image read ceiling — bounds memory before any policy runs. */
const MAX_IMAGE_BYTES = 64 * 1024 * 1024;
export class StrictModeError extends Error {
constructor(msg: string) {
super(msg);
this.name = "StrictModeError";
}
}
// ─── Fence extraction (pure) ──────────────────────────────────────────
const DIAGRAM_LANGS = new Set(["mermaid", "excalidraw"]);
/**
* Extract column-0 ```mermaid / ```excalidraw fences, replacing each with a
* unique placeholder token paragraph. Backtick and tilde fences, any length
* >= 3; closers must be at least as long as the opener (CommonMark). Fences
* with `render=false` are left untouched.
*
* Two deliberate conservatisms (red-team finding — the original version
* reconstructed fences at column 0 and restructured lists):
* - Non-diagram fences replay as their ORIGINAL raw lines, byte-for-byte
* (only a render=false flag is removed, in place, preserving indent).
* - INDENTED diagram fences (inside lists/quotes) are NOT extracted — a
* column-0 placeholder would split the list. They replay verbatim as code.
*/
export function extractDiagramFences(markdown: string): FenceExtraction {
const lines = markdown.split("\n");
const out: string[] = [];
const fences: DiagramFence[] = [];
const runId = crypto.randomBytes(4).toString("hex");
let i = 0;
let openFence: {
char: string; len: number; indent: number; info: string;
rawOpener: string; body: string[];
} | null = null;
let ordinal = 0;
while (i < lines.length) {
const line = lines[i];
if (openFence) {
const close = matchFenceLine(line);
if (close && close.char === openFence.char && close.len >= openFence.len && close.info === "") {
const info = parseInfoString(openFence.info);
if (DIAGRAM_LANGS.has(info.lang) && info.render && openFence.indent === 0) {
ordinal++;
const token = `gstack-diagram-slot-${runId}-${ordinal}`;
fences.push({
lang: info.lang,
source: openFence.body.join("\n"),
title: info.title,
page: info.page,
render: true,
token,
ordinal,
});
out.push("", token, "");
} else {
// Not extracted (other language, render=false, or indented): replay
// the ORIGINAL lines verbatim; only strip a render=false flag.
out.push(stripRenderFalse(openFence.rawOpener));
out.push(...openFence.body);
out.push(line);
}
openFence = null;
i++;
continue;
}
openFence.body.push(line);
i++;
continue;
}
const open = matchFenceLine(line);
if (open && open.info !== "") {
openFence = { ...open, rawOpener: line, body: [] };
i++;
continue;
}
if (open) {
// Anonymous fence (plain code block) — copy through to its closer so a
// ```mermaid example INSIDE a plain fence is never extracted.
out.push(line);
i++;
while (i < lines.length) {
const l = lines[i];
const close = matchFenceLine(l);
out.push(l);
i++;
if (close && close.char === open.char && close.len >= open.len && close.info === "") break;
}
continue;
}
out.push(line);
i++;
}
// Unclosed fence at EOF: replay verbatim (CommonMark treats it as code to EOF).
if (openFence) {
out.push(openFence.rawOpener);
out.push(...openFence.body);
}
return { markdown: out.join("\n"), fences };
}
function matchFenceLine(line: string): { char: string; len: number; indent: number; info: string } | null {
const m = line.match(/^( {0,3})(`{3,}|~{3,})\s*(.*)$/);
if (!m) return null;
return { indent: m[1].length, char: m[2][0], len: m[2].length, info: m[3].trim() };
}
/** Remove a render=false flag from a raw opener line, preserving everything else. */
function stripRenderFalse(rawOpener: string): string {
return rawOpener.replace(/\s*\brender\s*=\s*false\b/i, "");
}
/** Parse a fence info string: `mermaid`, `mermaid render=false`,
* `mermaid title="Auth flow"`, `mermaid page=landscape`. */
export function parseInfoString(info: string): {
lang: string; render: boolean; title?: string; page?: "landscape" | "portrait";
} {
const lang = (info.match(/^\S+/)?.[0] ?? "").toLowerCase();
const render = !/\brender\s*=\s*false\b/i.test(info);
const title = info.match(/\btitle\s*=\s*"([^"]*)"/i)?.[1]
?? info.match(/\btitle\s*=\s*'([^']*)'/i)?.[1];
const pageRaw = info.match(/\bpage\s*=\s*(landscape|portrait)\b/i)?.[1]?.toLowerCase();
const page = pageRaw === "landscape" || pageRaw === "portrait" ? pageRaw : undefined;
return { lang, render, title, page };
}
// ─── Slot substitution (pure) ─────────────────────────────────────────
/**
* Replace placeholder tokens in rendered HTML with their final slot HTML.
* marked wraps the bare token line in <p>…</p>; replace the wrapper too so
* the figure isn't nested inside a paragraph.
*/
export function substituteSlots(html: string, slots: Map<string, string>): string {
let s = html;
for (const [token, slotHtml] of slots) {
// Function replacement is load-bearing: slot HTML carries user/LLM-authored
// diagram label text, and string-form replace() expands $&, $', $` patterns
// inside it — a label containing "$'" would duplicate the document tail.
const wrapped = new RegExp(`<p>\\s*${token}\\s*</p>`, "g");
const replaced = s.replace(wrapped, () => slotHtml);
s = replaced !== s ? replaced : s.split(token).join(slotHtml);
}
return s;
}
/**
* Visible diagnostic block for a failed fence render — never silent raw code
* (eng-review: explicit error blocks). Sanitizer-safe: all dynamic content is
* HTML-escaped.
*/
export function buildDiagnosticBlock(fence: DiagramFence, errorMessage: string): string {
const excerpt = fence.source.split("\n").slice(0, 8).join("\n");
const truncated = fence.source.split("\n").length > 8 ? "\n…" : "";
return [
`<figure class="diagram diagram-error" role="img" aria-label="${escapeHtml(diagramLabel(fence))} (failed to render)">`,
`<figcaption class="diagram-error-title">Diagram failed to render (${escapeHtml(fence.lang)})</figcaption>`,
`<pre class="diagram-error-detail">${escapeHtml(errorMessage.trim())}\n\n${escapeHtml(excerpt + truncated)}</pre>`,
`</figure>`,
].join("\n");
}
/**
* Wrap a rendered SVG in an accessible figure (D6.4). The raw fence source is
* preserved base64-encoded in a data attribute — an HTML comment would need
* `--` escaping, which corrupts every mermaid arrow (`-->`) and breaks
* round-trip recovery.
*/
export function buildDiagramFigure(fence: DiagramFence, svg: string): string {
const label = diagramLabel(fence);
const cleanSvg = sanitizeUntrustedHtml(svg);
const captioned = fence.title
? `\n<figcaption class="diagram-caption">${escapeHtml(fence.title)}</figcaption>`
: "";
const pageAttr = fence.page ? ` data-gstack-page="${fence.page}"` : "";
const sourceB64 = Buffer.from(fence.source, "utf8").toString("base64");
return [
`<figure class="diagram" role="img" aria-label="${escapeHtml(label)}"${pageAttr}` +
` data-gstack-lang="${escapeHtml(fence.lang)}" data-gstack-source="${sourceB64}">`,
cleanSvg,
captioned,
`</figure>`,
].join("\n");
}
/** Recover the original fence source from a rendered figure (round-trip). */
export function decodeFigureSource(figureHtml: string): string | null {
const m = figureHtml.match(/\bdata-gstack-source="([A-Za-z0-9+/=]*)"/);
if (!m) return null;
try {
return Buffer.from(m[1], "base64").toString("utf8");
} catch {
return null;
}
}
function diagramLabel(fence: DiagramFence): string {
return fence.title ?? `diagram ${fence.ordinal}`;
}
// ─── Render tab (bundle page lifecycle) ───────────────────────────────
const PAYLOAD_TMP_DIR = process.platform === "win32" ? os.tmpdir() : "/tmp";
const READY_TIMEOUT_MS = 20_000;
// Expressions bigger than this ship via `browse eval <file>` instead of argv.
// 8KB is safe on every platform (Windows CreateProcess caps the WHOLE command
// line at 32,767 chars; Linux MAX_ARG_STRLEN is ~128KiB) and the tmp-file
// round-trip costs microseconds — one spawn regardless of payload size.
const MAX_ARGV_EXPR_BYTES = 8_000;
export class RenderTab {
private constructor(
public readonly tabId: number,
private readonly stagedBundlePath: string,
) {}
/**
* Open a tab and load the diagram-render bundle. The bundle HTML is staged
* under /tmp (content-addressed, reused across runs — load-html only reads
* inside its safe dirs) and loaded by PATH, not --from-file: a 9MB JSON
* round-trip per run would be pure waste.
*/
static open(): RenderTab {
const bundleSrc = resolveBundlePath();
const html = fs.readFileSync(bundleSrc);
const sha = crypto.createHash("sha256").update(html).digest("hex").slice(0, 16);
const staged = path.join(PAYLOAD_TMP_DIR, `gstack-diagram-render-${sha}.html`);
// Never trust an existing file at the predictable shared-/tmp name: verify
// its content hash and re-stage on mismatch (a pre-planted file would
// otherwise be loaded into the render tab as the bundle).
let needsWrite = true;
if (fs.existsSync(staged)) {
try {
const existing = crypto.createHash("sha256").update(fs.readFileSync(staged)).digest("hex").slice(0, 16);
needsWrite = existing !== sha;
} catch {
needsWrite = true;
}
}
if (needsWrite) {
// Concurrent-safe: write to a unique temp name, then atomic rename.
const tmp = `${staged}.${process.pid}.${crypto.randomBytes(4).toString("hex")}`;
fs.writeFileSync(tmp, html);
try {
fs.renameSync(tmp, staged);
} catch (renameErr) {
try { fs.unlinkSync(tmp); } catch { /* best-effort tmp cleanup */ }
// Only swallow the rename failure when the surviving file HASHES to
// the expected bundle (a concurrent writer won an OS-level race).
// Sticky-bit /tmp makes rename-over-foreign-file fail EPERM — if the
// survivor were trusted on existence alone, a pre-planted file would
// ride through the exact check added to stop it.
let survivorOk = false;
try {
const survivor = crypto.createHash("sha256").update(fs.readFileSync(staged)).digest("hex").slice(0, 16);
survivorOk = survivor === sha;
} catch { /* unreadable survivor = not ok */ }
if (!survivorOk) throw renameErr;
}
}
const tabId = browseClient.newtab();
const tab = new RenderTab(tabId, staged);
tab.loadBundle();
return tab;
}
/** (Re)load the bundle page — also the reset path after a render error. */
loadBundle(): void {
browseClient.loadHtmlFile({ file: this.stagedBundlePath, tabId: this.tabId });
const ready = browseClient.waitForExpression({
expression: "document.getElementById('status') !== null && document.getElementById('status').textContent === 'ready'",
tabId: this.tabId,
timeoutMs: READY_TIMEOUT_MS,
});
if (!ready) {
throw new Error(
"diagram-render bundle did not become ready in the browse tab " +
`(${READY_TIMEOUT_MS}ms). Check \`browse js "window.__errors"\` on tab ${this.tabId}.`,
);
}
}
/**
* Call one of the bundle's async window functions with JSON-safe string
* args. Errors come back as a recognizable ERR: prefix so a render failure
* is data, not a thrown browse exit.
*/
call(fn: string, ...args: Array<string | number>): string {
const argList = args.map((a) => JSON.stringify(a)).join(",");
const expression =
`window.${fn}(${argList})` +
`.then(r => "OK:" + r)` +
`.catch(e => "ERR:" + String((e && e.message) || e))`;
const result = this.js(expression);
if (result.startsWith("OK:")) return result.slice(3);
if (result.startsWith("ERR:")) throw new RenderCallError(result.slice(4));
throw new RenderCallError(`unexpected bundle result: ${result.slice(0, 200)}`);
}
private js(expression: string): string {
// Large payloads (scene JSON, SVG text, data URIs) blow past argv limits —
// browseClient.js shells out with the expression as an argv element. The
// limit is BYTES, not chars (CJK content is 3x its char count in UTF-8),
// and Windows caps the whole command line at 32,767 chars — so anything
// big ships via `browse eval <file>` instead: one spawn, any size.
if (Buffer.byteLength(expression, "utf8") <= MAX_ARGV_EXPR_BYTES) {
return browseClient.js({ expression, tabId: this.tabId });
}
return this.jsViaFile(expression);
}
/** argv-safe path for big expressions: stage to a tmp file under browse's
* safe dirs and run `browse eval <file>` (one spawn regardless of size). */
private jsViaFile(expression: string): string {
const file = path.join(
PAYLOAD_TMP_DIR,
`gstack-diagram-expr-${process.pid}-${crypto.randomBytes(4).toString("hex")}.js`,
);
fs.writeFileSync(file, expression, "utf8");
try {
return browseClient.evalFile({ file, tabId: this.tabId });
} finally {
try { fs.unlinkSync(file); } catch { /* best-effort tmp cleanup */ }
}
}
close(): void {
try {
browseClient.closetab(this.tabId);
} catch {
// best-effort: orchestrator finally path
}
}
}
export class RenderCallError extends Error {
constructor(msg: string) {
super(msg);
this.name = "RenderCallError";
}
}
/** Resolve dist/diagram-render.html: env override → repo-relative (dev) → global install. */
export function resolveBundlePath(env: NodeJS.ProcessEnv = process.env): string {
const candidates = [
env.GSTACK_DIAGRAM_BUNDLE,
// dev: make-pdf/src/* → repo root lib/. (In a compiled binary this is the
// virtual /$bunfs/root and simply never exists — harmless.)
path.resolve(import.meta.dir, "../../lib/diagram-render/dist/diagram-render.html"),
// compiled binary at <root>/make-pdf/dist/pdf → <root>/lib/… — same shape
// in the repo and in the ~/.claude/skills/gstack global install. argv[0]
// is the literal string "bun" in compiled binaries; execPath is real.
path.resolve(path.dirname(process.execPath), "../../lib/diagram-render/dist/diagram-render.html"),
path.join(os.homedir(), ".claude/skills/gstack/lib/diagram-render/dist/diagram-render.html"),
].filter((p): p is string => !!p);
for (const p of candidates) {
if (fs.existsSync(p)) return p;
}
throw new Error(
"diagram-render bundle not found. Tried:\n" +
candidates.map((c) => ` - ${c}`).join("\n") +
"\nRun `bun run build:diagram-render` (repo) or re-run ./setup (install).",
);
}
// ─── Fence rendering ──────────────────────────────────────────────────
/**
* Render every extracted fence to its slot HTML. One bundle tab serves all
* fences; a failed fence yields a diagnostic block and a bundle reload
* (reset contract) before the next fence renders.
*/
export function renderFenceSlots(
fences: DiagramFence[],
tab: RenderTab,
warn: (msg: string) => void,
): Map<string, string> {
const slots = new Map<string, string>();
for (const fence of fences) {
try {
let svg: string;
if (fence.lang === "mermaid") {
svg = tab.call("__renderMermaid", `mermaid-fence-${fence.ordinal}`, fence.source);
} else {
JSON.parse(fence.source); // fail fast with a JSON diagnostic, not a bundle stack
svg = tab.call("__excalidrawToSvg", fence.source);
}
slots.set(fence.token, buildDiagramFigure(fence, svg));
} catch (err: any) {
const msg = err?.message ?? String(err);
warn(`diagram ${fence.ordinal} (${fence.lang}) failed to render: ${firstLine(msg)}`);
slots.set(fence.token, buildDiagnosticBlock(fence, msg));
// Reset contract: a poisoned page must not corrupt the next fence.
try {
tab.loadBundle();
} catch (reloadErr: any) {
warn(`bundle reload after render error failed: ${firstLine(reloadErr?.message ?? String(reloadErr))}`);
}
}
}
return slots;
}
// ─── DOCX rasterization (eng-review D6.5, P8) ─────────────────────────
/**
* Replace inline diagram SVGs (and svg data-URI images) with PNG <img> tags
* for the DOCX export — Word's SVG support is unreliable, so the content-
* fidelity contract embeds rasters at 300dpi of the placed width (the
* content box). Diagnostic blocks keep their text form.
*/
export function rasterizeDiagramFigures(
html: string,
tab: RenderTab,
contentWidthIn: number,
warn: (msg: string) => void,
): string {
const targetPx = Math.round(contentWidthIn * PRINT_DPI);
// 1. Rendered diagram figures → <img> with the figure's aria-label as alt.
let out = html.replace(
/<figure class="diagram"[^>]*>[\s\S]*?<\/figure>/gi,
(figure) => {
const svgMatch = figure.match(/<svg\b[\s\S]*<\/svg>/i);
if (!svgMatch) return figure;
const label = figure.match(/\baria-label\s*=\s*"([^"]*)"/i)?.[1] ?? "diagram";
try {
const png = tab.call("__rasterize", svgMatch[0], targetPx);
return `<p><img src="${png}" alt="${label}"></p>`;
} catch (err: any) {
const reason = firstLine(err?.message ?? String(err));
warn(`docx: diagram rasterization failed (${reason}); embedding source text instead`);
// The converter drops <figure>/<svg> entirely, so returning the figure
// would make the diagram vanish without a trace — the exact invisible
// failure the diagnostic contract forbids. Surface the source.
const source = decodeFigureSource(figure) ?? "(source unavailable)";
return [
`<p><strong>Diagram could not be rasterized for DOCX (${escapeHtml(reason)}) — source:</strong></p>`,
`<pre>${escapeHtml(source)}</pre>`,
].join("\n");
}
},
);
// 2. SVG data-URI images (inlined .svg files) → PNG.
out = out.replace(/<img\b[^>]*>/gi, (tag) => {
const m = tag.match(SRC_RE);
const src = m?.[2] ?? m?.[3] ?? "";
if (!src.startsWith("data:image/svg+xml")) return tag;
try {
const b64 = src.slice(src.indexOf(",") + 1);
const svgText = Buffer.from(b64, "base64").toString("utf8");
const png = tab.call("__rasterize", svgText, targetPx);
// Function replacement: data URIs can contain $-patterns.
return tag.replace(SRC_RE, () => `src="${png}"`);
} catch (err: any) {
warn(`docx: svg image rasterization failed (${firstLine(err?.message ?? String(err))})`);
return tag;
}
});
return out;
}
/**
* Diagnostic figures → plain <p>/<pre> for the DOCX converter, which drops
* <figure> elements it can't map. An invisible error is the one thing the
* diagnostic contract forbids. Pure — no render tab needed.
*/
export function convertDiagnosticsForDocx(html: string): string {
return html.replace(
/<figure class="diagram diagram-error"[^>]*>([\s\S]*?)<\/figure>/gi,
(_full, body: string) => {
const title = body.match(/<figcaption[^>]*>([\s\S]*?)<\/figcaption>/i)?.[1] ?? "Diagram failed to render";
const detail = body.match(/<pre[^>]*>([\s\S]*?)<\/pre>/i)?.[1] ?? "";
return `<p><strong>${title}</strong></p>\n<pre>${detail}</pre>`;
},
);
}
// ─── Image inlining (eng-review D1 + D4 + D6.1) ───────────────────────
const IMG_TAG_RE = /<img\b[^>]*>/gi;
const SRC_RE = /\bsrc\s*=\s*("([^"]*)"|'([^']*)')/i;
/**
* Inline every local <img> as a data URI, probe intrinsic dimensions from the
* bytes, and annotate the tag with data-gstack-px-width/-height for the width
* policy. Oversized rasters are downscaled to print resolution via the bundle
* tab. Missing files become visible placeholders (or throw under --strict);
* remote URLs warn (offline posture) unless --allow-network.
*/
export function inlineLocalImages(html: string, opts: PrepassImageOptions): string {
const maxPx = Math.round(opts.contentWidthIn * PRINT_DPI * DOWNSCALE_FACTOR);
const targetPx = Math.round(opts.contentWidthIn * PRINT_DPI);
// An image referenced N times is read/probed/downscaled once; the same data
// URI string is reused (also dedupes memory until the final join).
const memo = new Map<string, { dataUri: string; attrs: string }>();
return html.replace(IMG_TAG_RE, (tag) => {
const srcMatch = tag.match(SRC_RE);
if (!srcMatch) return tag;
const src = srcMatch[2] ?? srcMatch[3] ?? "";
if (src.startsWith("data:")) return annotateFromDataUri(tag, src);
// Windows drive-letter paths (C:/x.png, C:\x.png) look like single-letter
// URL schemes — they are local paths, not URLs.
const isDrivePath = /^[a-zA-Z]:[\\/]/.test(src);
if (!isDrivePath && /^[a-z][a-z0-9+.-]*:/i.test(src)) {
// Absolute URL with a scheme (http, https, file, …)
if (opts.allowNetwork && /^https?:/i.test(src)) return tag;
if (/^https?:/i.test(src)) {
const msg = `remote image blocked (offline posture): ${src}`;
if (opts.strict) throw new StrictModeError(msg + " — re-run without --strict or pass --allow-network");
opts.warn(msg);
// Leaving the tag would make Chromium fetch it at print time anyway —
// the warn would be a lie. Replace with a visible placeholder.
return buildBlockedRemotePlaceholder(src);
}
// file:// and friends fall through to the local path branch
if (!src.startsWith("file:")) return tag;
}
// decodeURIComponent throws on malformed escapes (foo%zz.png) — a broken
// URL must degrade to the missing-image path, not crash the run.
let decodedSrc = src;
try {
decodedSrc = decodeURIComponent(src);
} catch { /* keep raw src */ }
const filePath = src.startsWith("file:")
? fileURLToPath(src)
: isDrivePath
? path.resolve(src)
: path.resolve(opts.inputDir, decodedSrc);
const cached = memo.get(filePath);
if (cached !== undefined) return rewriteImgTag(tag, cached);
if (!fs.existsSync(filePath)) {
const msg = `image not found: ${src} (resolved to ${filePath})`;
if (opts.strict) throw new StrictModeError(msg);
opts.warn(msg);
return buildMissingImagePlaceholder(src);
}
// Out-of-tree reads are legal (local CLI semantics — like pandoc) but
// never silent: an agent PDF-ing untrusted markdown should not quietly
// embed ~/.ssh/config into a shareable document. --strict makes it fatal.
// Compare REAL paths — a symlink inside the input dir pointing outside
// would otherwise pass a string-prefix check (Codex adversarial finding).
// Runs after the existence check: realpath of a missing file can't
// resolve, and on macOS /var vs /private/var would false-positive.
const inputRoot = safeRealpath(path.resolve(opts.inputDir)) + path.sep;
const realFilePath = safeRealpath(filePath);
if (!realFilePath.startsWith(inputRoot)) {
const msg = `image resolves OUTSIDE the input directory: ${src}${realFilePath}`;
if (opts.strict) throw new StrictModeError(msg + " — move it under the markdown's directory or drop --strict");
opts.warn(msg);
}
// Bound the read BEFORE reading: a markdown image pointing at a special
// file (fifo, device) would hang readFileSync, and a multi-GB file would
// exhaust memory before any policy ran.
let stat: fs.Stats;
try {
stat = fs.statSync(filePath);
} catch {
opts.warn(`image unreadable: ${src}`);
return buildMissingImagePlaceholder(src);
}
if (!stat.isFile()) {
const msg = `image is not a regular file: ${src}`;
if (opts.strict) throw new StrictModeError(msg);
opts.warn(msg);
return buildMissingImagePlaceholder(src);
}
if (stat.size > MAX_IMAGE_BYTES) {
const msg = `image exceeds ${Math.round(MAX_IMAGE_BYTES / 1024 / 1024)}MB cap: ${src} (${Math.round(stat.size / 1024 / 1024)}MB)`;
if (opts.strict) throw new StrictModeError(msg);
opts.warn(msg);
return buildMissingImagePlaceholder(src);
}
let buf = fs.readFileSync(filePath);
let dims = imageDims(buf);
let mime = dims?.mime ?? mimeFromExtension(filePath);
// Print-resolution normalization (D4): rasters only — SVG scales free.
if (dims && mime !== "image/svg+xml" && dims.width > maxPx) {
const tab = opts.getTab();
if (tab) {
try {
const dataUri = `data:${mime};base64,${buf.toString("base64")}`;
const scaled = tab.call("__downscaleRaster", dataUri, targetPx, mime);
const scaledB64 = scaled.replace(/^data:[^,]*,/, "");
opts.warn(
`downscaled ${path.basename(filePath)} ${dims.width}px → ${targetPx}px ` +
`(print is ${PRINT_DPI}dpi; original exceeds ${maxPx}px content-box ceiling)`,
);
buf = Buffer.from(scaledB64, "base64");
mime = scaled.slice(5, scaled.indexOf(";"));
dims = { ...dims, height: Math.round((dims.height * targetPx) / dims.width), width: targetPx };
} catch (err: any) {
opts.warn(`downscale failed for ${src}, inlining at full size: ${firstLine(err?.message ?? String(err))}`);
}
}
}
const dataUri = `data:${mime};base64,${buf.toString("base64")}`;
const attrs = dims
? ` data-gstack-px-width="${Math.round(dims.width)}" data-gstack-px-height="${Math.round(dims.height)}"`
: "";
memo.set(filePath, { dataUri, attrs });
return rewriteImgTag(tag, memo.get(filePath)!);
});
}
/** Apply a memoized inline result to an img tag. */
function rewriteImgTag(tag: string, entry: { dataUri: string; attrs: string }): string {
// Function replacement: data URIs are user-content-derived; string-form
// replace() would expand $-patterns inside them.
let out = tag.replace(SRC_RE, () => `src="${entry.dataUri}"`);
if (entry.attrs) out = out.replace(/^<img\b/i, () => `<img${entry.attrs}`);
return out;
}
function annotateFromDataUri(tag: string, src: string): string {
try {
const b64 = src.slice(src.indexOf(",") + 1);
const head = Buffer.from(b64.slice(0, 8192), "base64");
const dims = imageDims(head);
if (!dims) return tag;
return tag.replace(
/^<img\b/i,
`<img data-gstack-px-width="${Math.round(dims.width)}" data-gstack-px-height="${Math.round(dims.height)}"`,
);
} catch {
return tag;
}
}
function buildMissingImagePlaceholder(src: string): string {
return (
`<span class="image-missing" role="img" aria-label="missing image">` +
`[missing image: ${escapeHtml(src)}]</span>`
);
}
function buildBlockedRemotePlaceholder(src: string): string {
return (
`<span class="image-missing" role="img" aria-label="remote image blocked">` +
`[remote image blocked (use --allow-network): ${escapeHtml(src)}]</span>`
);
}
/** realpath that degrades to the input path when resolution fails. */
function safeRealpath(p: string): string {
try {
return fs.realpathSync(p);
} catch {
return p;
}
}
function mimeFromExtension(p: string): string {
switch (path.extname(p).toLowerCase()) {
case ".png": return "image/png";
case ".jpg":
case ".jpeg": return "image/jpeg";
case ".gif": return "image/gif";
case ".webp": return "image/webp";
case ".svg": return "image/svg+xml";
default: return "application/octet-stream";
}
}
// ─── Content-box math ─────────────────────────────────────────────────
const PAGE_WIDTHS_IN: Record<string, number> = {
letter: 8.5,
a4: 8.27,
legal: 8.5,
tabloid: 11,
};
/** Parse a CSS dimension ("1in" | "72pt" | "25mm" | "2.54cm") to inches. */
export function dimToInches(dim: string | undefined, fallbackIn: number): number {
if (!dim) return fallbackIn;
const m = dim.trim().match(/^([0-9.]+)\s*(in|pt|cm|mm|px)?$/i);
if (!m) return fallbackIn;
const v = parseFloat(m[1]);
switch ((m[2] ?? "in").toLowerCase()) {
case "in": return v;
case "pt": return v / 72;
case "cm": return v / 2.54;
case "mm": return v / 25.4;
case "px": return v / 96;
default: return fallbackIn;
}
}
export function contentWidthInches(opts: {
pageSize?: string;
margins?: string;
marginLeft?: string;
marginRight?: string;
}): number {
const pageW = PAGE_WIDTHS_IN[opts.pageSize ?? "letter"] ?? 8.5;
const left = dimToInches(opts.marginLeft ?? opts.margins, 1);
const right = dimToInches(opts.marginRight ?? opts.margins, 1);
return Math.max(1, pageW - left - right);
}
const PAGE_HEIGHTS_IN: Record<string, number> = {
letter: 11,
a4: 11.69,
legal: 14,
tabloid: 17,
};
/**
* Content box of the rotated (landscape) named page: portrait page HEIGHT
* becomes the landscape width; portrait WIDTH becomes the landscape height.
* Used by image-policy to vertically center promoted blocks.
*/
export function landscapeContentBox(opts: {
pageSize?: string;
margins?: string;
marginLeft?: string;
marginRight?: string;
marginTop?: string;
marginBottom?: string;
}): { contentWIn: number; contentHIn: number } {
const size = opts.pageSize ?? "letter";
const pageH = PAGE_HEIGHTS_IN[size] ?? 11;
const pageW = PAGE_WIDTHS_IN[size] ?? 8.5;
const left = dimToInches(opts.marginLeft ?? opts.margins, 1);
const right = dimToInches(opts.marginRight ?? opts.margins, 1);
const top = dimToInches(opts.marginTop ?? opts.margins, 1);
const bottom = dimToInches(opts.marginBottom ?? opts.margins, 1);
return {
contentWIn: Math.max(1, pageH - left - right),
contentHIn: Math.max(1, pageW - top - bottom),
};
}
// ─── tiny helpers ─────────────────────────────────────────────────────
// escapeHtml is imported from ./render — single definition, no drift.
function firstLine(s: string): string {
return s.split("\n")[0].slice(0, 200);
}
+236
View File
@@ -0,0 +1,236 @@
/**
* Image width policy + conservative auto-landscape (eng-review P4, D4 spec).
*
* Two pure passes over rendered HTML:
*
* 1. applyImageDirectives — runs inside render() right after marked, before
* the sanitizer. Translates the markdown-adjacent directive suffix
* `![alt](x.png){width=50%}` / `{page=landscape}` into data-gstack-*
* attributes (the sanitizer keeps data- attributes; the brace text is
* consumed so it never reaches smartypants or the page).
*
* 2. applyImagePolicy — runs in the orchestrator after image inlining (which
* annotates data-gstack-px-width/-height from real bytes). Applies the
* width rule and decides landscape promotion:
*
* WIDTH RULE: render at intrinsic CSS-px width, capped at the content box,
* never upscaled — that is exactly `figure img { max-width: 100% }` doing
* its job, so the default needs no inline style. Directives opt into more:
* width=full stretches to the content box; <pct>/<dim> set explicit width.
*
* LANDSCAPE (conservative, false negatives are cheap):
* promote only when ALL hold —
* aspect ratio ≥ 1.8
* AND intrinsic CSS-px width > SHRINK_LIMIT × content box
* (content shrunk below ~40% of natural size = unreadable)
* AND diagram provenance (rendered fence) or an alt-text token from
* ALT_HINT_TOKENS (plain images)
* `{page=landscape}` forces, `{page=portrait}` vetoes — both skip the
* heuristics entirely.
*
* Promotion wraps the block in <div class="page-wide"> whose CSS named
* page (`@page wide { size: <size> landscape }`, print-css.ts) rotates
* just that page. Chromium only honors CSS page sizes when the print call
* passes preferCSSPageSize — the orchestrator sets it when hasLandscape.
*/
import { svgTagDims } from "./image-size";
export interface ImagePolicyOptions {
/** Physical content-box width in inches (page width minus margins). */
contentWidthIn: number;
/**
* Landscape named-page content box (inches). Used to vertically center a
* promoted block via a computed inline margin-top — CSS flex/min-height
* centering fragments into phantom landscape pages in Chromium, so the
* margin is computed here from the block's known aspect ratio instead.
*/
landscape: { contentWIn: number; contentHIn: number };
warn: (msg: string) => void;
}
export interface ImagePolicyResult {
html: string;
/** True when at least one block was promoted to the landscape named page. */
hasLandscape: boolean;
}
/** Aspect ratio floor for auto-promotion. */
const MIN_ASPECT = 1.8;
/**
* Auto-promote only when the intrinsic CSS-px width exceeds this multiple of
* the content box (in CSS px @96dpi). 2.5 ≈ the plan's ~1600px threshold on a
* 6.5in letter box; calibrated against fixtures (design doc Open Question 4).
*/
const SHRINK_LIMIT = 2.5;
/** Alt-text tokens that mark a plain image as diagram-like (case-insensitive). */
const ALT_HINT_TOKENS = ["diagram", "architecture", "flowchart", "chart", "graph"];
// ─── Pass 1: directive suffixes ───────────────────────────────────────
const IMG_WITH_SUFFIX_RE = /(<img\b[^>]*>)\s*\{([^{}<>\n]{1,120})\}/gi;
/**
* Consume `{...}` directive suffixes adjacent to <img> tags. Unrecognized
* brace groups are left untouched (someone's literal prose).
*/
export function applyImageDirectives(html: string): string {
return html.replace(IMG_WITH_SUFFIX_RE, (full, imgTag: string, body: string) => {
const parsed = parseDirectives(body);
if (!parsed) return full;
let tag = imgTag;
if (parsed.width) tag = addAttr(tag, "data-gstack-width", parsed.width);
if (parsed.page) tag = addAttr(tag, "data-gstack-page", parsed.page);
return tag;
});
}
export function parseDirectives(body: string): { width?: string; page?: string } | null {
let width: string | undefined;
let page: string | undefined;
let recognized = false;
for (const part of body.trim().split(/\s+/)) {
const m = part.match(/^(width|page)=(.+)$/i);
if (!m) return null; // any unknown token ⇒ not a directive group
const key = m[1].toLowerCase();
const value = m[2].toLowerCase();
if (key === "width" && /^(full|\d{1,3}%|[0-9.]+(in|cm|mm|pt|px))$/.test(value)) {
width = value;
recognized = true;
} else if (key === "page" && /^(landscape|portrait)$/.test(value)) {
page = value;
recognized = true;
} else {
return null; // recognized key, malformed value ⇒ leave visible, not silent
}
}
return recognized ? { width, page } : null;
}
function addAttr(imgTag: string, name: string, value: string): string {
return imgTag.replace(/^<img\b/i, `<img ${name}="${value}"`);
}
// ─── Pass 2: width styles + landscape promotion ───────────────────────
export function applyImagePolicy(html: string, opts: ImagePolicyOptions): ImagePolicyResult {
let hasLandscape = false;
const boxCssPx = opts.contentWidthIn * 96;
const widthThresholdPx = boxCssPx * SHRINK_LIMIT;
// 2a. width directives → inline styles on the img.
let out = html.replace(/<img\b[^>]*>/gi, (tag) => {
const width = attrValue(tag, "data-gstack-width");
if (!width) return tag;
const css = width === "full" ? "100%" : width;
return mergeStyle(tag, `width: ${css}; height: auto;`);
});
// 2b. landscape promotion — standalone images (markdown images render as
// <p><img …></p>; promote by swapping the paragraph for the wide wrapper).
out = out.replace(/<p>\s*(<img\b[^>]*>)\s*<\/p>/gi, (full, tag: string) => {
const decision = decideImagePromotion(tag, widthThresholdPx);
if (!decision.promote) return full;
hasLandscape = true;
opts.warn(`promoting image to a landscape page (${decision.reason})`);
const w = num(attrValue(tag, "data-gstack-px-width"));
const h = num(attrValue(tag, "data-gstack-px-height"));
return wrapPageWide(tag, w && h ? h / w : null, opts.landscape);
});
// 2c. landscape promotion — rendered diagram figures (provenance is
// automatic; dims come from the SVG's width/height or viewBox).
out = out.replace(
/<figure class="diagram[^"]*"[^>]*>[\s\S]*?<\/figure>/gi,
(figure) => {
if (figure.includes("diagram-error")) return figure;
const decision = decideDiagramPromotion(figure, widthThresholdPx);
if (!decision.promote) return figure;
hasLandscape = true;
opts.warn(`promoting diagram to a landscape page (${decision.reason})`);
const dims = svgCssDims(figure);
return wrapPageWide(figure, dims ? dims.height / dims.width : null, opts.landscape);
},
);
return { html: out, hasLandscape };
}
/**
* Wrap a promoted block in the wide-page div, vertically centered via a
* computed margin-top: placed height = landscape content width × aspect,
* centered in the landscape content height. Unknown aspect → no margin
* (top placement beats a wrong guess).
*/
function wrapPageWide(
inner: string,
aspectHoverW: number | null,
landscape: { contentWIn: number; contentHIn: number },
): string {
if (!aspectHoverW) return `<div class="page-wide">${inner}</div>`;
const placedHIn = landscape.contentWIn * aspectHoverW;
const marginIn = Math.max(0, (landscape.contentHIn - placedHIn) / 2);
if (marginIn < 0.1) return `<div class="page-wide">${inner}</div>`;
return `<div class="page-wide" style="margin-top: ${marginIn.toFixed(2)}in">${inner}</div>`;
}
interface PromotionDecision {
promote: boolean;
reason: string;
}
function decideImagePromotion(tag: string, widthThresholdPx: number): PromotionDecision {
const page = attrValue(tag, "data-gstack-page");
if (page === "portrait") return { promote: false, reason: "page=portrait veto" };
if (page === "landscape") return { promote: true, reason: "page=landscape directive" };
const w = num(attrValue(tag, "data-gstack-px-width"));
const h = num(attrValue(tag, "data-gstack-px-height"));
if (!w || !h) return { promote: false, reason: "no intrinsic dimensions" };
if (w / h < MIN_ASPECT) return { promote: false, reason: "aspect below floor" };
if (w <= widthThresholdPx) return { promote: false, reason: "fits portrait readably" };
const alt = (attrValue(tag, "alt") ?? "").toLowerCase();
const hinted = ALT_HINT_TOKENS.some((t) => new RegExp(`\\b${t}\\b`).test(alt));
if (!hinted) return { promote: false, reason: "no diagram hint in alt text" };
return { promote: true, reason: `wide diagram-like image (${Math.round(w)}px, alt hint)` };
}
function decideDiagramPromotion(figure: string, widthThresholdPx: number): PromotionDecision {
const page = attrValue(figure, "data-gstack-page");
if (page === "portrait") return { promote: false, reason: "page=portrait veto" };
if (page === "landscape") return { promote: true, reason: "page=landscape fence directive" };
const dims = svgCssDims(figure);
if (!dims) return { promote: false, reason: "no measurable SVG dimensions" };
if (dims.width / dims.height < MIN_ASPECT) return { promote: false, reason: "aspect below floor" };
if (dims.width <= widthThresholdPx) return { promote: false, reason: "fits portrait readably" };
return { promote: true, reason: `wide diagram (${Math.round(dims.width)}px)` };
}
/** SVG dimension probing is shared with the byte prober — see image-size.ts. */
const svgCssDims = svgTagDims;
function attrValue(tag: string, name: string): string | null {
const m = tag.match(new RegExp(`\\b${name}\\s*=\\s*"([^"]*)"`, "i"))
?? tag.match(new RegExp(`\\b${name}\\s*=\\s*'([^']*)'`, "i"));
return m ? m[1] : null;
}
function num(s: string | null): number | null {
if (s === null) return null;
const n = parseFloat(s);
return Number.isFinite(n) && n > 0 ? n : null;
}
function mergeStyle(tag: string, css: string): string {
const existing = attrValue(tag, "style");
if (existing !== null) {
// Function replacement (no $-pattern expansion from user-controlled style
// values) and the existing declarations are preserved verbatim — attrValue
// already returned the unquoted inner value.
return tag.replace(/\bstyle\s*=\s*(".*?"|'.*?')/i, () => `style="${existing}; ${css}"`);
}
return tag.replace(/^<img\b/i, () => `<img style="${css}"`);
}
+117
View File
@@ -0,0 +1,117 @@
/**
* Intrinsic image dimensions from raw bytes. Pure, no DOM, no deps.
*
* The diagram pre-pass probes every local image it inlines (eng-review D1:
* "dimensions are probed from the bytes") so the width policy and landscape
* detector never need a browser round-trip. Formats: PNG, JPEG, GIF, WebP
* (VP8/VP8L/VP8X), and SVG (attribute/viewBox best-effort).
*
* Returns null when the format is unrecognized or the header is truncated —
* callers treat unknown dimensions as "no policy applied", never an error.
*/
export interface ImageDims {
width: number;
height: number;
mime: string;
}
export function imageDims(buf: Buffer): ImageDims | null {
if (buf.length < 12) return null;
return pngDims(buf) ?? jpegDims(buf) ?? gifDims(buf) ?? webpDims(buf) ?? svgDims(buf);
}
function pngDims(b: Buffer): ImageDims | null {
// 8-byte signature, then IHDR chunk: length(4) "IHDR"(4) width(4) height(4)
if (b.length < 24) return null;
if (b.readUInt32BE(0) !== 0x89504e47 || b.readUInt32BE(4) !== 0x0d0a1a0a) return null;
if (b.toString("ascii", 12, 16) !== "IHDR") return null;
return { width: b.readUInt32BE(16), height: b.readUInt32BE(20), mime: "image/png" };
}
function jpegDims(b: Buffer): ImageDims | null {
if (b[0] !== 0xff || b[1] !== 0xd8) return null;
let i = 2;
while (i + 9 < b.length) {
if (b[i] !== 0xff) { i++; continue; }
const marker = b[i + 1];
// Standalone markers without length payload
if (marker === 0xd8 || (marker >= 0xd0 && marker <= 0xd9)) { i += 2; continue; }
const len = b.readUInt16BE(i + 2);
if (len < 2) return null;
// SOF0-SOF15 except DHT(C4)/JPGA(C8)/DAC(CC) carry dimensions
if (marker >= 0xc0 && marker <= 0xcf && marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc) {
if (i + 9 >= b.length) return null;
return { height: b.readUInt16BE(i + 5), width: b.readUInt16BE(i + 7), mime: "image/jpeg" };
}
i += 2 + len;
}
return null;
}
function gifDims(b: Buffer): ImageDims | null {
const sig = b.toString("ascii", 0, 6);
if (sig !== "GIF87a" && sig !== "GIF89a") return null;
return { width: b.readUInt16LE(6), height: b.readUInt16LE(8), mime: "image/gif" };
}
function webpDims(b: Buffer): ImageDims | null {
if (b.toString("ascii", 0, 4) !== "RIFF" || b.toString("ascii", 8, 12) !== "WEBP") return null;
const fmt = b.toString("ascii", 12, 16);
if (fmt === "VP8X" && b.length >= 30) {
// 24-bit little-endian width-1 / height-1 at offsets 24 / 27
const w = 1 + (b[24] | (b[25] << 8) | (b[26] << 16));
const h = 1 + (b[27] | (b[28] << 8) | (b[29] << 16));
return { width: w, height: h, mime: "image/webp" };
}
if (fmt === "VP8 " && b.length >= 30) {
// Lossy: dimensions at offset 26, 14 bits each, little-endian
return {
width: b.readUInt16LE(26) & 0x3fff,
height: b.readUInt16LE(28) & 0x3fff,
mime: "image/webp",
};
}
if (fmt === "VP8L" && b.length >= 25) {
if (b[20] !== 0x2f) return null;
const bits = b.readUInt32LE(21);
return {
width: (bits & 0x3fff) + 1,
height: ((bits >> 14) & 0x3fff) + 1,
mime: "image/webp",
};
}
return null;
}
/**
* SVG: parse width/height attributes (px or unitless) off the root element,
* falling back to viewBox. CSS-unit widths (em, %, pt) are ignored — the
* width policy treats them as "no intrinsic size".
*/
function svgDims(b: Buffer): ImageDims | null {
const head = b.toString("utf8", 0, Math.min(b.length, 4096));
const dims = svgTagDims(head);
return dims ? { ...dims, mime: "image/svg+xml" } : null;
}
/**
* CSS-px dimensions of the first <svg> element in a markup string: explicit
* width/height attributes (px or unitless) first, else viewBox. Shared by the
* byte prober above and image-policy's diagram-figure measurements — one
* regex, no drift.
*/
export function svgTagDims(markup: string): { width: number; height: number } | null {
const tag = markup.match(/<svg\b[^>]*>/i)?.[0];
if (!tag) return null;
const attr = (name: string): number | null => {
const m = tag.match(new RegExp(`\\b${name}\\s*=\\s*["']\\s*([0-9.]+)(px)?\\s*["']`, "i"));
return m ? parseFloat(m[1]) : null;
};
const w = attr("width");
const h = attr("height");
if (w && h) return { width: w, height: h };
const vb = tag.match(/\bviewBox\s*=\s*["']\s*[-0-9.]+[\s,]+[-0-9.]+[\s,]+([0-9.]+)[\s,]+([0-9.]+)\s*["']/i);
if (vb) return { width: parseFloat(vb[1]), height: parseFloat(vb[2]) };
return null;
}
+169 -4
View File
@@ -21,9 +21,22 @@ import * as crypto from "node:crypto";
import { spawn } from "node:child_process";
import { render } from "./render";
import { screenCss } from "./print-css";
import type { GenerateOptions, PreviewOptions } from "./types";
import { ExitCode } from "./types";
import * as browseClient from "./browseClient";
import {
RenderTab,
contentWidthInches,
convertDiagnosticsForDocx,
extractDiagramFences,
inlineLocalImages,
landscapeContentBox,
rasterizeDiagramFigures,
renderFenceSlots,
substituteSlots,
} from "./diagram-prepass";
import { applyImagePolicy } from "./image-policy";
class ProgressReporter {
private readonly quiet: boolean;
@@ -71,8 +84,9 @@ export async function generate(opts: GenerateOptions): Promise<string> {
throw new Error(`input file not found: ${input}`);
}
const to = opts.to ?? "pdf";
const outputPath = path.resolve(
opts.output ?? path.join(os.tmpdir(), `${deriveSlug(input)}.pdf`),
opts.output ?? path.join(os.tmpdir(), `${deriveSlug(input)}.${to}`),
);
// Stage 1: read markdown
@@ -80,10 +94,14 @@ export async function generate(opts: GenerateOptions): Promise<string> {
const markdown = fs.readFileSync(input, "utf8");
progress.end("Reading markdown");
// Stage 1.5: diagram pre-pass — extract ```mermaid/```excalidraw fences and
// swap in placeholder tokens. Rendering happens after the tab opens below.
const extraction = extractDiagramFences(markdown);
// Stage 2: render HTML
progress.begin("Rendering HTML");
const rendered = render({
markdown,
markdown: extraction.markdown,
title: opts.title,
author: opts.author,
date: opts.date,
@@ -94,16 +112,144 @@ export async function generate(opts: GenerateOptions): Promise<string> {
confidential: opts.confidential,
pageSize: opts.pageSize,
margins: opts.margins,
marginTop: opts.marginTop,
marginRight: opts.marginRight,
marginBottom: opts.marginBottom,
marginLeft: opts.marginLeft,
pageNumbers: opts.pageNumbers,
footerTemplate: opts.footerTemplate,
});
progress.end("Rendering HTML", `${rendered.meta.wordCount} words`);
// Stage 2.5: render diagram fences in a dedicated bundle tab, substitute
// slots, then inline + probe + (if oversized) downscale local images.
// The bundle tab is lazy: image-only documents open it only when a raster
// actually needs print-resolution downscaling (eng-review D4).
const warn = (msg: string) => {
if (!opts.quiet) process.stderr.write(`\r\x1b[K[make-pdf] warning: ${msg}\n`);
};
let renderTab: RenderTab | null = null;
let hasLandscape = false;
const getRenderTab = (): RenderTab | null => {
if (renderTab) return renderTab;
try {
renderTab = RenderTab.open();
} catch (err: any) {
warn(`diagram-render tab unavailable: ${String(err?.message ?? err).split("\n")[0]}`);
return null;
}
return renderTab;
};
let finalHtml = rendered.html;
try {
if (extraction.fences.length > 0) {
progress.begin(`Rendering ${extraction.fences.length} diagram(s)`);
const tab = getRenderTab();
if (tab) {
const slots = renderFenceSlots(extraction.fences, tab, warn);
finalHtml = substituteSlots(finalHtml, slots);
} else {
// No bundle/tab: visible diagnostic beats silent raw tokens.
const slots = new Map(
extraction.fences.map((f) => [
f.token,
`<figure class="diagram diagram-error" role="img" aria-label="diagram ${f.ordinal} (not rendered)">` +
`<figcaption class="diagram-error-title">Diagram not rendered (${f.lang}) — diagram-render bundle unavailable</figcaption></figure>`,
]),
);
finalHtml = substituteSlots(finalHtml, slots);
}
progress.end(`Rendering ${extraction.fences.length} diagram(s)`);
}
progress.begin("Inlining images");
const contentWidthIn = contentWidthInches(opts);
finalHtml = inlineLocalImages(finalHtml, {
inputDir: path.dirname(input),
strict: opts.strict === true,
allowNetwork: opts.allowNetwork === true,
contentWidthIn,
warn,
getTab: getRenderTab,
});
progress.end("Inlining images");
// Width directives + conservative auto-landscape (image-policy).
const policy = applyImagePolicy(finalHtml, {
contentWidthIn,
landscape: landscapeContentBox(opts),
warn,
});
finalHtml = policy.html;
hasLandscape = policy.hasLandscape;
// DOCX needs rasters, not inline SVG (Word's SVG support is unreliable) —
// do it while the render tab is still open.
if (to === "docx") {
const needsRaster = /<figure class="diagram"|data:image\/svg\+xml/.test(finalHtml);
if (needsRaster) {
progress.begin("Rasterizing diagrams for DOCX");
const tab = getRenderTab();
if (tab) {
finalHtml = rasterizeDiagramFigures(finalHtml, tab, contentWidthIn, warn);
} else {
warn("docx: no render tab — diagrams keep their source text form");
}
progress.end("Rasterizing diagrams for DOCX");
}
finalHtml = convertDiagnosticsForDocx(finalHtml);
}
} finally {
renderTab?.close();
}
// ─── --to html: write the self-contained document, no print round-trip ──
if (to === "html") {
const withScreenLayer = finalHtml.replace(
"</style>",
`</style>\n<style>\n${screenCss()}\n</style>`,
);
fs.writeFileSync(outputPath, withScreenLayer, "utf8");
const kb = Math.round(fs.statSync(outputPath).size / 1024);
progress.done(`${rendered.meta.wordCount} words · ${kb}KB · ${outputPath}`);
return outputPath;
}
// ─── --to docx: content-fidelity conversion (eng-review P8) ────────────
if (to === "docx") {
// Print-only surfaces don't survive the conversion. The watermark div
// would degrade to a literal body paragraph reading "DRAFT" (worse than
// absent) — strip it. Warn once about print-only flags that were set.
finalHtml = finalHtml.replace(/<div class="watermark">[\s\S]*?<\/div>/, "");
const printOnly: string[] = [];
if (opts.watermark) printOnly.push("--watermark");
if (opts.headerTemplate) printOnly.push("--header-template");
if (opts.footerTemplate) printOnly.push("--footer-template");
if (opts.pageSize) printOnly.push("--page-size");
if (opts.margins || opts.marginTop || opts.marginRight || opts.marginBottom || opts.marginLeft) printOnly.push("--margins");
if (printOnly.length > 0) {
warn(`docx is content-fidelity: ${printOnly.join(", ")} do not apply to Word output`);
}
progress.begin("Converting to DOCX");
const { default: HTMLtoDOCX } = await import("html-to-docx");
const buf = await HTMLtoDOCX(finalHtml, null, {
title: rendered.meta.title,
creator: rendered.meta.author || undefined,
});
const bytes: Uint8Array = buf instanceof Uint8Array ? buf : new Uint8Array(await (buf as Blob).arrayBuffer());
fs.writeFileSync(outputPath, bytes);
progress.end("Converting to DOCX");
const kb = Math.round(fs.statSync(outputPath).size / 1024);
progress.done(`${rendered.meta.wordCount} words · ${kb}KB · ${outputPath} (content fidelity — layout is Word's)`);
return outputPath;
}
// Stage 3: write HTML to a tmp file browse can read
// (We don't actually write it; we pass inline via --from-file JSON.)
// But for preview mode and debugging, we still write to tmp.
const htmlTmp = tmpFile("html");
fs.writeFileSync(htmlTmp, rendered.html, "utf8");
fs.writeFileSync(htmlTmp, finalHtml, "utf8");
// Stage 4: spin up a dedicated tab, load HTML, (wait for Paged.js if TOC),
// then emit PDF. Always close the tab.
@@ -114,7 +260,7 @@ export async function generate(opts: GenerateOptions): Promise<string> {
try {
progress.begin("Loading HTML into Chromium");
browseClient.loadHtml({
html: rendered.html,
html: finalHtml,
waitUntil: "domcontentloaded",
tabId,
});
@@ -145,6 +291,10 @@ export async function generate(opts: GenerateOptions): Promise<string> {
tagged: opts.tagged !== false,
outline: opts.outline !== false,
printBackground: !!opts.watermark,
// Named landscape pages only take effect when Chromium honors CSS page
// sizes. Flip it ONLY when a promotion exists — minimal behavior change
// for every other document.
preferCSSPageSize: hasLandscape ? true : undefined,
toc: opts.toc,
});
progress.end("Generating PDF");
@@ -178,6 +328,21 @@ export async function preview(opts: PreviewOptions): Promise<string> {
progress.begin("Rendering HTML");
const markdown = fs.readFileSync(input, "utf8");
// Preview deliberately skips the diagram/image pre-pass (no browse daemon
// round-trip — preview is the fast loop). Be loud about the divergence so
// nobody signs off on a preview that lacks what the PDF will have.
if (!opts.quiet) {
const fenceCount = extractDiagramFences(markdown).fences.length;
const hasLocalImages = /!\[[^\]]*\]\((?!https?:|data:)[^)]+\)/.test(markdown);
if (fenceCount > 0 || hasLocalImages) {
process.stderr.write(
`[make-pdf] preview note: ${fenceCount > 0 ? `${fenceCount} diagram fence(s) shown as code` : ""}` +
`${fenceCount > 0 && hasLocalImages ? "; " : ""}` +
`${hasLocalImages ? "local images may not resolve from the preview location" : ""}` +
`\`generate\` renders them fully.\n`,
);
}
}
const rendered = render({
markdown,
title: opts.title,
+110 -37
View File
@@ -12,9 +12,11 @@
* breaks copy-paste extraction.
* - All paragraphs flush-left. No first-line indent, no justify, no
* p+p indent. text-align: left everywhere. 12pt margin-bottom.
* - Cover page has the same 1in margins as every other page. No flexbox
* center, no inset padding, no vertical centering. Distinction comes
* from eyebrow + larger title + hairline rule, not from centering.
* - Cover page (v1.58.0.0 poster revision, user-directed): 56pt title,
* 13pt meta, padding-top 1.4in for poster placement. Still no flexbox
* and no vertical centering; the inset is a deliberate top-third drop.
* (Supersedes the original "no inset padding" lock from the first
* /plan-design-review — the 32pt cover read as too small in print.)
* - `@page :first` suppresses running header/footer but does NOT override
* the 1in margin.
* - No <link>, no external CSS/fonts — everything inlined.
@@ -118,19 +120,76 @@ function pageRules(size: string, margin: string, opts: PrintCssOptions): string
` @bottom-center { content: none; }`,
` @bottom-right { content: none; }`,
`}`,
``,
// Landscape named page for promoted wide diagrams/images (image-policy).
// Chromium-only — exactly the engine this pipeline always prints with.
// Honored only when the print call passes preferCSSPageSize (orchestrator
// sets it when a promotion exists). Vertical centering is NOT done here —
// image-policy emits a computed inline margin-top instead (see the
// .page-wide comment below for why).
`@page wide {`,
` size: ${size} landscape;`,
` margin: ${margin};`,
`}`,
// No explicit break-before/after (the page-name CHANGE already forces a
// break on both sides) and NO height/flex centering: a flex .page-wide
// with min-height fragments into a phantom empty landscape page in
// Chromium (landscape-gate counted 5 pages for 3 promotions; bisected to
// min-height at any value). Vertical centering is done by image-policy
// instead — it knows each promoted block's aspect ratio and emits an
// inline margin-top, which fragmentation handles fine.
`.page-wide {`,
` page: wide;`,
` text-align: center;`,
`}`,
// width: 100% stretch is intentional for promoted content: auto-promoted
// rasters are >=~1600px (≈190dpi at the 9in landscape box — prints fine),
// and a directive-forced small image is the user's explicit call.
`.page-wide img, .page-wide svg { width: 100%; height: auto; max-width: none; }`,
`.page-wide figure.diagram > svg { max-width: none; }`,
].filter(line => line !== "").join("\n");
}
/**
* Screen layer appended for `--to html` exports. The print CSS stays the
* source of truth; this only makes the same document readable in a browser
* (centered measure, padding, no print-only chapter breaks forcing scroll
* gaps). Print output is unaffected — media-scoped.
*/
export function screenCss(): string {
return [
`@media screen {`,
// ~42em at 12pt ≈ 70-75 characters per line — the readable ceiling.
` body { max-width: 42em; margin: 0 auto; padding: 2.5em 1.5em; }`,
` .chapter { break-before: auto; }`,
` .watermark { display: none; }`,
` figure.diagram { overflow-x: auto; }`,
// Page numbers only exist in print; hide the empty spans + dot leaders.
` .toc li .toc-page, .toc li .toc-dots { display: none; }`,
`}`,
].join("\n");
}
function rootTypography(): string {
return [
`html { lang: en; }`,
// Zero image truncation, ever: every image caps at the content box,
// whatever element it lives in. Markdown images render as <p><img> (no
// figure), so a figure-scoped cap alone lets a 1900px screenshot run off
// the page edge. .page-wide deliberately overrides to fill its landscape
// box — still bounded, never clipped.
`img { max-width: 100%; height: auto; }`,
`body {`,
` font-family: ${SANS_STACK}, ${CJK_STACK}, ${EMOJI_FAMILIES}, sans-serif;`,
` font-size: 11pt;`,
` font-size: 12pt;`,
` line-height: 1.5;`,
` color: #111;`,
` background: white;`,
` hyphens: auto;`,
// No auto-hyphenation: it puts real "dif-\nferent" breaks into the PDF
// text layer, and clean copy-paste is the product contract (the
// combined-gate caught this the moment 12pt body made lines wrap).
// Left-aligned rag doesn't need hyphenation.
` hyphens: manual;`,
` font-variant-ligatures: common-ligatures;`,
` font-kerning: normal;`,
` text-rendering: geometricPrecision;`,
@@ -143,45 +202,47 @@ function rootTypography(): string {
function coverRules(enabled: boolean): string {
if (!enabled) return "";
return [
// Poster scale: the cover is the one page where type should feel huge.
`.cover {`,
` page: first;`,
` page-break-after: always;`,
` break-after: page;`,
` text-align: left;`,
` padding-top: 1.4in;`,
`}`,
`.cover .eyebrow {`,
` font-size: 9pt;`,
` font-size: 11pt;`,
` letter-spacing: 0.2em;`,
` text-transform: uppercase;`,
` color: #666;`,
` margin: 0 0 36pt;`,
`}`,
`.cover h1.cover-title {`,
` font-size: 32pt;`,
` line-height: 1.15;`,
` font-size: 56pt;`,
` line-height: 1.08;`,
` font-weight: 700;`,
` letter-spacing: -0.01em;`,
` margin: 0 0 18pt;`,
` max-width: 5.5in;`,
` letter-spacing: -0.02em;`,
` margin: 0 0 24pt;`,
` max-width: 6in;`,
` text-align: left;`,
`}`,
`.cover .cover-subtitle {`,
` font-size: 14pt;`,
` line-height: 1.4;`,
` font-size: 18pt;`,
` line-height: 1.35;`,
` font-weight: 400;`,
` color: #333;`,
` margin: 0 0 36pt;`,
` max-width: 5in;`,
` max-width: 5.5in;`,
` text-align: left;`,
`}`,
`.cover hr.rule {`,
` width: 2.5in;`,
` height: 0;`,
` border: 0;`,
` border-top: 1px solid #111;`,
` margin: 0 0 18pt 0;`,
` border-top: 1.5px solid #111;`,
` margin: 0 0 24pt 0;`,
`}`,
`.cover .cover-meta { font-size: 10pt; line-height: 1.6; color: #333; text-align: left; }`,
`.cover .cover-meta { font-size: 13pt; line-height: 1.6; color: #333; text-align: left; }`,
`.cover .cover-meta strong { font-weight: 700; }`,
].join("\n");
}
@@ -191,12 +252,12 @@ function tocRules(enabled: boolean): string {
return [
`.toc { page-break-after: always; break-after: page; }`,
`.toc h2 {`,
` font-size: 13pt;`,
` font-size: 16pt;`,
` text-transform: uppercase;`,
` letter-spacing: 0.15em;`,
` color: #666;`,
` font-weight: 600;`,
` margin: 0 0 0.5in;`,
` color: #444;`,
` font-weight: 700;`,
` margin: 0 0 0.4in;`,
`}`,
`.toc ol {`,
` list-style: none;`,
@@ -207,14 +268,14 @@ function tocRules(enabled: boolean): string {
` display: flex;`,
` align-items: baseline;`,
` gap: 0.25in;`,
` font-size: 11pt;`,
` line-height: 2;`,
` padding: 4pt 0;`,
` font-size: 12pt;`,
` line-height: 1.7;`,
` padding: 3pt 0;`,
`}`,
`.toc li .toc-title { flex: 0 0 auto; }`,
`.toc li .toc-dots { flex: 1 1 auto; border-bottom: 1px dotted #aaa; margin: 0 6pt; transform: translateY(-4pt); }`,
`.toc li .toc-page { flex: 0 0 auto; color: #666; font-variant-numeric: tabular-nums; }`,
`.toc li.level-2 { padding-left: 0.35in; font-size: 10pt; }`,
`.toc li.level-2 { padding-left: 0.35in; font-size: 11pt; }`,
`.toc li a { color: inherit; text-decoration: none; }`,
].join("\n");
}
@@ -229,7 +290,7 @@ function chapterRules(noChapterBreaks: boolean): string {
return [
breakRule,
`h1 {`,
` font-size: 22pt;`,
` font-size: 26pt;`,
` line-height: 1.2;`,
` font-weight: 700;`,
` letter-spacing: -0.01em;`,
@@ -237,9 +298,9 @@ function chapterRules(noChapterBreaks: boolean): string {
` break-after: avoid;`,
` page-break-after: avoid;`,
`}`,
`h2 { font-size: 15pt; line-height: 1.3; font-weight: 700; margin: 24pt 0 6pt; break-after: avoid; page-break-after: avoid; }`,
`h3 { font-size: 12pt; line-height: 1.4; font-weight: 700; text-transform: uppercase; letter-spacing: 0.08em; color: #333; margin: 18pt 0 4pt; break-after: avoid; page-break-after: avoid; }`,
`h4 { font-size: 11pt; font-weight: 700; margin: 12pt 0 4pt; break-after: avoid; page-break-after: avoid; }`,
`h2 { font-size: 18pt; line-height: 1.3; font-weight: 700; margin: 26pt 0 8pt; break-after: avoid; page-break-after: avoid; }`,
`h3 { font-size: 13.5pt; line-height: 1.4; font-weight: 700; text-transform: uppercase; letter-spacing: 0.08em; color: #333; margin: 20pt 0 5pt; break-after: avoid; page-break-after: avoid; }`,
`h4 { font-size: 12pt; font-weight: 700; margin: 14pt 0 5pt; break-after: avoid; page-break-after: avoid; }`,
].join("\n");
}
@@ -254,7 +315,7 @@ function blockRules(): string {
` orphans: 3;`,
`}`,
`p:first-child { margin-top: 0; }`,
`p.lead { font-size: 13pt; line-height: 1.45; color: #222; margin: 0 0 18pt; }`,
`p.lead { font-size: 14pt; line-height: 1.45; color: #222; margin: 0 0 18pt; }`,
].join("\n");
}
@@ -275,7 +336,7 @@ function codeRules(): string {
return [
`code {`,
` font-family: "SF Mono", Menlo, Consolas, monospace;`,
` font-size: 9.5pt;`,
` font-size: 10.5pt;`,
` background: #f4f4f4;`,
` padding: 1pt 3pt;`,
` border-radius: 2pt;`,
@@ -283,7 +344,7 @@ function codeRules(): string {
`}`,
`pre {`,
` font-family: "SF Mono", Menlo, Consolas, monospace;`,
` font-size: 9pt;`,
` font-size: 10pt;`,
` line-height: 1.4;`,
` background: #f7f7f5;`,
` padding: 10pt 12pt;`,
@@ -310,11 +371,11 @@ function quoteRules(): string {
` padding: 0 0 0 18pt;`,
` border-left: 2pt solid #111;`,
` color: #333;`,
` font-size: 11pt;`,
` font-size: 12pt;`,
` line-height: 1.5;`,
`}`,
`blockquote p { margin-bottom: 6pt; text-align: left; }`,
`blockquote cite { display: block; margin-top: 6pt; font-style: normal; font-size: 9.5pt; color: #666; letter-spacing: 0.02em; }`,
`blockquote cite { display: block; margin-top: 6pt; font-style: normal; font-size: 10pt; color: #666; letter-spacing: 0.02em; }`,
`blockquote cite::before { content: "— "; }`,
].join("\n");
}
@@ -323,13 +384,25 @@ function figureRules(): string {
return [
`figure { margin: 12pt 0; }`,
`figure img { display: block; max-width: 100%; height: auto; }`,
`figcaption { font-size: 9pt; color: #666; margin-top: 6pt; font-style: italic; }`,
`figcaption { font-size: 10pt; color: #666; margin-top: 6pt; font-style: italic; }`,
// Diagram figures (diagram-prepass): rendered mermaid/excalidraw SVG.
// SVGs scale to the content box and never split across pages.
`figure.diagram { break-inside: avoid; text-align: center; }`,
`figure.diagram > svg { max-width: 100%; height: auto; }`,
`figure.diagram .diagram-caption { text-align: center; }`,
// Diagnostic block for a fence that failed to render — loud, boxed,
// unmistakably an error (never silent raw code).
`figure.diagram-error { border: 1.5pt solid #b00020; padding: 8pt 10pt; text-align: left; }`,
`figure.diagram-error .diagram-error-title { font-weight: 700; color: #b00020; font-style: normal; margin: 0 0 6pt; }`,
`figure.diagram-error .diagram-error-detail { font-size: 8.5pt; white-space: pre-wrap; margin: 0; }`,
// Missing local image placeholder (non-strict mode).
`.image-missing { display: inline-block; border: 1pt dashed #b00020; color: #b00020; padding: 4pt 8pt; font-size: 9pt; }`,
].join("\n");
}
function tableRules(): string {
return [
`table { width: 100%; border-collapse: collapse; margin: 12pt 0; font-size: 10pt; }`,
`table { width: 100%; border-collapse: collapse; margin: 12pt 0; font-size: 11pt; }`,
`th, td { border-bottom: 0.5pt solid #ccc; padding: 5pt 8pt; text-align: left; vertical-align: top; }`,
`th { font-weight: 700; border-bottom: 1pt solid #111; background: transparent; }`,
].join("\n");
@@ -346,7 +419,7 @@ function listRules(): string {
function footnoteRules(): string {
return [
`.footnote-ref { font-size: 0.75em; vertical-align: super; line-height: 0; text-decoration: none; color: #0055cc; }`,
`.footnotes { margin-top: 24pt; padding-top: 12pt; border-top: 0.5pt solid #ccc; font-size: 9.5pt; line-height: 1.4; }`,
`.footnotes { margin-top: 24pt; padding-top: 12pt; border-top: 0.5pt solid #ccc; font-size: 10pt; line-height: 1.4; }`,
`.footnotes ol { padding-left: 18pt; }`,
].join("\n");
}
+71 -8
View File
@@ -14,6 +14,7 @@
import { marked } from "marked";
import { smartypants } from "./smartypants";
import { printCss, type PrintCssOptions } from "./print-css";
import { applyImageDirectives } from "./image-policy";
export interface RenderOptions {
markdown: string;
@@ -34,6 +35,14 @@ export interface RenderOptions {
// Page layout
pageSize?: "letter" | "a4" | "legal" | "tabloid";
margins?: string;
// Per-side margins (override `margins`). Must reach the CSS @page rule:
// when a landscape promotion flips preferCSSPageSize on, the CSS margins
// are the ones Chromium honors — dropping per-side flags there would
// silently change the whole document's layout (Codex P2).
marginTop?: string;
marginRight?: string;
marginBottom?: string;
marginLeft?: string;
// Footer behavior. pageNumbers defaults to true. When footerTemplate is set,
// CSS page numbers are suppressed so the custom Chromium footer wins cleanly.
@@ -60,8 +69,13 @@ export function render(opts: RenderOptions): RenderResult {
// 1. Markdown → HTML
const rawHtml = marked.parse(opts.markdown, { async: false }) as string;
// 1.5. Image directive suffixes: `![a](x.png){width=50%}` → data-gstack-*
// attributes. Before the sanitizer (which keeps data- attrs) so the brace
// text never reaches smartypants or the final page.
const directedHtml = applyImageDirectives(rawHtml);
// 2. Sanitize
const cleanHtml = sanitizeUntrustedHtml(rawHtml);
const cleanHtml = sanitizeUntrustedHtml(directedHtml);
// 3. Decode common entities so smartypants can match raw " and '.
// marked HTML-encodes quotes in text ("hello" → &quot;hello&quot;);
@@ -91,7 +105,9 @@ export function render(opts: RenderOptions): RenderResult {
confidential: opts.confidential !== false,
runningHeader: derivedTitle,
pageSize: opts.pageSize,
margins: opts.margins,
// Compose per-side margins into the CSS shorthand so @page stays the
// single source of truth even under preferCSSPageSize.
margins: composeMargins(opts),
pageNumbers: showPageNumbers,
};
const css = printCss(cssOptions);
@@ -106,14 +122,22 @@ export function render(opts: RenderOptions): RenderResult {
})
: "";
// TOC anchors must resolve: assign id="toc-N" to each H1-H3 in the same
// order buildTocBlock scans them, or every TOC link is a dead href (masked
// in PDFs by Chromium outline bookmarks, glaring in --to html). Headings
// that already carry an id keep it — the ids array records the ACTUAL id
// per heading so TOC entries always link to something real.
const anchored = opts.toc ? addHeadingIds(typographicHtml) : { html: typographicHtml, ids: [] };
const anchoredHtml = anchored.html;
const tocBlock = opts.toc
? buildTocBlock(typographicHtml)
? buildTocBlock(anchoredHtml, anchored.ids)
: "";
// Wrap body in .chapter sections at H1 boundaries if chapter breaks are on.
const chapterHtml = opts.noChapterBreaks
? `<section class="chapter">${typographicHtml}</section>`
: wrapChaptersByH1(typographicHtml);
? `<section class="chapter">${anchoredHtml}</section>`
: wrapChaptersByH1(anchoredHtml);
const watermarkBlock = opts.watermark
? `<div class="watermark">${escapeHtml(opts.watermark)}</div>`
@@ -256,13 +280,13 @@ function buildCoverBlock(opts: {
* Page numbers are filled in by Paged.js (when --toc is passed and Paged.js
* polyfill is injected).
*/
function buildTocBlock(html: string): string {
function buildTocBlock(html: string, ids: string[] = []): string {
const headings = extractHeadings(html);
if (headings.length === 0) return "";
const items = headings.map((h, i) => {
const level = h.level >= 2 ? "level-2" : "level-1";
const id = `toc-${i}`;
const id = ids[i] ?? `toc-${i}`;
return [
` <li class="${level}">`,
` <span class="toc-title"><a href="#${id}">${escapeHtml(h.text)}</a></span>`,
@@ -282,6 +306,28 @@ function buildTocBlock(html: string): string {
].join("\n");
}
/**
* Assign id="toc-N" to every H1-H3 in document order — the same order
* extractHeadings/buildTocBlock use, so anchors and entries line up by index.
* A heading that already carries an id keeps it, and the returned ids array
* records the actual id for that slot so the TOC links to the real anchor
* instead of a nonexistent toc-N.
*/
function addHeadingIds(html: string): { html: string; ids: string[] } {
const ids: string[] = [];
const out = html.replace(/<(h[1-3])([^>]*)>/gi, (full, tag: string, attrs: string) => {
const existing = attrs.match(/\bid\s*=\s*["']([^"']*)["']/i)?.[1];
if (existing) {
ids.push(existing);
return full;
}
const id = `toc-${ids.length}`;
ids.push(id);
return `<${tag}${attrs} id="${id}">`;
});
return { html: out, ids };
}
function extractHeadings(html: string): Array<{ level: number; text: string }> {
const re = /<(h[1-3])[^>]*>([\s\S]*?)<\/\1>/gi;
const headings: Array<{ level: number; text: string }> = [];
@@ -352,11 +398,28 @@ function decodeTextEntities(s: string): string {
.replace(/&amp;/g, "&");
}
/** Compose `margin: top right bottom left` from per-side overrides + base. */
function composeMargins(opts: {
margins?: string; marginTop?: string; marginRight?: string;
marginBottom?: string; marginLeft?: string;
}): string | undefined {
const base = opts.margins ?? "1in";
if (!opts.marginTop && !opts.marginRight && !opts.marginBottom && !opts.marginLeft) {
return opts.margins;
}
return [
opts.marginTop ?? base,
opts.marginRight ?? base,
opts.marginBottom ?? base,
opts.marginLeft ?? base,
].join(" ");
}
function stripTags(html: string): string {
return html.replace(/<[^>]+>/g, "");
}
function escapeHtml(s: string): string {
export function escapeHtml(s: string): string {
return s
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
+13 -1
View File
@@ -11,9 +11,17 @@ export type FontMode = "sans"; // v1: Helvetica only. Future: "serif" | "custom"
* Options for `$P generate` — the public CLI contract.
* Matches the flag set documented in the CEO plan.
*/
export type OutputFormat = "pdf" | "html" | "docx";
export interface GenerateOptions {
input: string; // markdown input path
output?: string; // PDF output path (default: /tmp/<slug>.pdf)
output?: string; // output path (default: /tmp/<slug>.<ext>)
// Output format (NOT --format, which is a --page-size alias):
// pdf — print-quality PDF via Chromium (default)
// html — single self-contained file, zero network references
// docx — content-fidelity Word document (diagrams embedded as PNG)
to?: OutputFormat;
// Page layout
margins?: string; // "1in" | "72pt" | "25mm" | "2.54cm"
@@ -44,6 +52,10 @@ export interface GenerateOptions {
// Network
allowNetwork?: boolean; // default: false
// Strict mode (eng-review D6.1): missing/remote images hard-fail instead of
// warn + placeholder. For CI docs pipelines that need determinism.
strict?: boolean; // default: false
// Metadata
title?: string;
author?: string;