The Horizon
This is the combined-features fixture, with every feature turned on simultaneously. The gate asserts that all of these paragraphs extract cleanly from the PDF with pdftotext.
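As an illustration only, a check of this kind could be sketched as below. The helper names (extract_text, normalize, missing_paragraphs) are hypothetical and not taken from the actual gate; the only external call assumed is the pdftotext command line, where "-" as the output file sends the text to stdout. Whitespace is collapsed before comparing because extracted text wraps lines wherever the page layout did.

```python
import re
import subprocess
from typing import List


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return its text ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def normalize(text: str) -> str:
    """Collapse every run of whitespace, including line wraps, to one space."""
    return re.sub(r"\s+", " ", text).strip()


def missing_paragraphs(extracted: str, expected: List[str]) -> List[str]:
    """Return the expected paragraphs that do not appear in the extracted text."""
    haystack = normalize(extracted)
    return [p for p in expected if normalize(p) not in haystack]
```

A gate built on this sketch would call missing_paragraphs(extract_text(path), expected) and fail if the returned list is non-empty; the path and the expected list are whatever the real gate supplies.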

A paragraph with bold, italic, and inline code tokens — each of which gets a different HTML treatment. None should fragment text on copy-paste.

A paragraph with “curly quotes”, ‘single quotes’, an em dash — like this, and an ellipsis… All three get smartypants transforms.

A subsection heading

First list item with some words that keep it on one line.
Second list item with more words.
Third list item.

A blockquote from Van Dyke. Her diminished size is in me, not in her.

A second chapter

This content begins on a fresh page because the default chapter-breaks rule fires. Extraction must still find these paragraphs.

A final paragraph with enough words to trigger hyphenation across the line wrap boundary. Extraordinary words sometimes hyphenate. Interdisciplinary ones certainly do.
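If the extracted text keeps the hyphen at a line wrap, a comparison against the source paragraph needs to rejoin the split word first. The following is a minimal sketch of one way to do that; the function name dehyphenate is hypothetical, and whether the extractor emits a plain hyphen, a soft hyphen (U+00AD), or nothing at the break is an assumption that depends on how the PDF was produced.

```python
import re


def dehyphenate(text: str) -> str:
    """Rejoin words split by a hyphen at a line wrap and drop soft hyphens."""
    # Soft hyphens (U+00AD) are invisible break hints; remove them outright.
    text = text.replace("\u00ad", "")
    # A word character, a hyphen, a newline, then another word character
    # is treated as one word broken across two lines.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)
```

Note that this heuristic also merges a genuinely hyphenated compound that happens to break at its hyphen, so it suits a check against known text better than general-purpose cleanup.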
