diff --git a/CLAUDE.md b/CLAUDE.md
index d422ac7..cd3783a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -29,7 +29,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 
 - `noai/c2pa.py` — PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark`/`synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path).
 - `noai/constants.py` — PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, C2PA_ISSUERS, `SYNTHID_C2PA_ISSUERS` (issuers that pair SynthID with C2PA: Google, OpenAI), and `C2PA_SOFT_BINDINGS` (soft-binding `alg` prefix → forensic-watermark vendor: Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new issuer/binding here, not inline.
-- `metadata.py` — `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). `get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout. Both `get_ai_metadata` and `has_ai_metadata` guard the PIL open with `except Exception` (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. `xai_signature(path)` detects xAI/Grok's EXIF-only scheme (`ImageDescription` = `Signature: <base64>` + UUID `Artist`); it feeds `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify`. `iptc_ai_system(path)` detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (`IPTC_AI_FIELD_MARKERS` = `AISystemUsed`/`AISystemVersionUsed`/`AIPromptInformation`/`AIPromptWriterName`) and returns the `AISystemUsed` generator name (or `"fields present"`). `remove_ai_metadata` routes **ISOBMFF video** (`.mp4`/`.mov`/`.m4v`) through the same `isobmff.strip_c2pa_boxes` as AVIF/HEIF (MP4 is ISOBMFF), and `_scrub_ai_exif` removes the xAI signature + AI-generator EXIF tags on JPEG output.
+- `metadata.py` — `scan_head(path, size=1MB)` is the shared input for every C2PA/AIGC/IPTC byte scan: first `size` bytes plus, for ISOBMFF, the late provenance-box payloads from `isobmff.scan_c2pa_region` (catches a manifest after a large `mdat`); behavior-neutral (`f.read(size)`) for non-ISOBMFF. Use it instead of `open().read(1MB)` for any new marker scan. `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). `get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout. Both `get_ai_metadata` and `has_ai_metadata` guard the PIL open with `except Exception` (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. `xai_signature(path)` detects xAI/Grok's EXIF-only scheme (`ImageDescription` = `Signature: <base64>` + UUID `Artist`); it feeds `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify`. `iptc_ai_system(path)` detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (`IPTC_AI_FIELD_MARKERS` = `AISystemUsed`/`AISystemVersionUsed`/`AIPromptInformation`/`AIPromptWriterName`) and returns the `AISystemUsed` generator name (or `"fields present"`). `remove_ai_metadata` routes **ISOBMFF video** (`.mp4`/`.mov`/`.m4v`) through the same `isobmff.strip_c2pa_boxes` as AVIF/HEIF (MP4 is ISOBMFF), and `_scrub_ai_exif` removes the xAI signature + AI-generator EXIF tags on JPEG output.
 - `identify.py` — `identify(path)` aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 `AISystemUsed`, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via `metadata.xai_signature`, visible Gemini sparkle, open invisible watermark, Adobe TrustMark via `trustmark_detector`) into one `ProvenanceReport`. `is_ai_generated` is True or None (never asserted False — stripped metadata is not proof of clean origin). Visible-sparkle is promoted only at confidence ≥ `_SPARKLE_THRESHOLD` (0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49). The cv2 dependency lives in `gemini_engine.detect_sparkle_confidence`, not here. **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`). **Why, verified on real signed files 2026-05-26:** the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. **Token distinctiveness is load-bearing:** bare `b"Truepic"` mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI `chatgpt-1.png` fixture), so the token is the specific `b"Truepic_Lens"` from the Lens SDK claim generator; likewise `b"Pixel Camera"` (cert CN) not bare `b"Pixel"`. `_DEVICE_C2PA_PLATFORM` lists ONLY tokens **verified against a real C2PA file**: Leica (`lc_c2pa`/`Leica Camera`), Nikon (`NIKON`), Pixel (`Pixel Camera` -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (`sony.sig`/`sony.cert` -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (`Truepic_Lens`). Canon/Samsung/Bria have **no public direct-download C2PA sample** (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the `sony.*` namespace but are not separately verified. Camera C2PA marks capture authenticity, not AI (Pixel carries `computationalCapture`, not `trainedAlgorithmicMedia`), so these never set `is_ai` -- that stays driven by digital-source-type. `c2pa.cbor_text_after` (now public) is best-effort for the `generator` detail string only and can be None when the manifest keys it `claim_generator_info` (Pixel). **Issuer→generator mapping is `is_ai`-gated** (`_attribute_platform(issuers, is_ai=c2pa_is_ai)`): a specific AI-generator platform is named only when the digital-source-type is `trainedAlgorithmicMedia`; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an *unmapped* Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). `_attribute_platform` defaults `is_ai=True` so the mapping stays unit-testable in isolation. Add device tokens to `_DEVICE_C2PA_PLATFORM`, generator/issuer platforms to `_ISSUER_PLATFORM`, not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (`_issuers_in`) and generator (`_ai_tools_in`, reusing `C2PA_AI_TOOLS`) are recovered by binary-scanning the first MB. EXIF `Software` / `Make` / `Artist` / `ImageDescription` and XMP `CreatorTool` generator tags are read by `metadata.exif_generator` (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against `AI_GENERATOR_TOKENS` so ordinary editors (plain "Adobe Photoshop") and real-camera `Make` ("Apple"/"Canon") are not flagged. **Ideogram tags its output with EXIF `Make="Ideogram AI"`** (verified on a real download 2026-05-24) — that's why `Make` is read. **Integrity-clash detection** (`_integrity_clashes`, surfaced as `ProvenanceReport.integrity_clashes`, printed in red by `identify` and serialized to `--json`): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. C2PA OpenAI + EXIF `Make="Ideogram AI"`), and (2) a camera-capture C2PA device (`_DEVICE_C2PA_PLATFORM`) coexisting with any AI-generation marker. Vendor normalization is `_vendor_of` over `_AI_VENDOR_TOKENS` (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). **High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`).
 - `gemini_engine.py` — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). `detect_sparkle_confidence(path)` is the file-level entry point used by `identify.py`.
 - `doubao_engine.py` — visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH, fractions in module constants; no bundled template), `extract_mask` pulls the light low-saturation glyphs with a **polarity-aware white top-hat** (brighter-than-blurred-local-bg, so white-paper documents are left untouched instead of smeared), `detect` thresholds glyph coverage (`DETECT_MIN_COVERAGE` 0.16 separates real marks ≥0.20 from corner noise, which stays ≤0.06 on large images but can spike to ~0.15 on tiny ones), `remove_watermark` inpaints (cv2 Telea/NS) and **bails when coverage > `MAX_INPAINT_COVERAGE` 0.50** (dense-text background → would smear). Wired into `visible --mark` via `cli._run_doubao_if_selected`. **Logo is near-white (~253), not the gray some third-party tools assume.** Best on photo/illustration backgrounds; high-contrast edges leave faint residue (cv2-inpaint limit). Clean per-pixel reverse-alpha (Gemini-style) is the future upgrade but needs a captured/distilled alpha map — see below.
@@ -54,7 +54,7 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 - **No detectable signal on download (correctly reported `unknown`):** **Recraft** (PNG export is a re-encoded design export — strips everything), **Krea hosting FLUX 2** (no imwatermark despite FLUX — the host omits the encoder, same as Stability's hosted SDXL), and Midjourney (embeds nothing). Lesson: the imwatermark detector only fires on *pristine* output from a pipeline that runs the encoder (diffusers default, official BFL), not from re-hosts (Krea/Stability) or re-encoded exports (Recraft/Canva).
 - **Invisible but NOT locally detectable (proprietary, API/oracle only — same wall as SynthID):** Amazon Titan Image Generator + Nova Canvas (Bedrock `DetectGeneratedContent` API), Kakao (new SynthID image adopter, May 2026), NVIDIA Cosmos (SynthID video). No local detector possible; treat like SynthID.
 - **C2PA 2.4 "Durable Content Credentials" (April 2026; verified against the spec) raise the bar for metadata stripping.** 2.4 defines soft bindings (an invisible watermark or a content fingerprint) plus a server-side manifest repository and a new `c2pa.repository-receipt` assertion. Per the spec: "if a C2PA manifest is removed from an asset, but a copy of that manifest remains in a provenance store elsewhere, the manifest and asset may be matched using available soft bindings." So our local `metadata --remove` deletes the *embedded* manifest, but a fingerprint/watermark soft binding can still re-link the image to its manifest in a repository server-side. Stripping the file is becoming necessary-but-not-sufficient against durable provenance. (Our parsers target the stable embedded-manifest format documented in C2PA 2.1 §11; that format is unchanged in 2.4 -- the new pieces are repository/soft-binding infra, not the on-file box layout, so no parser change is implied.) Spec: https://spec.c2pa.org/specifications/specifications/2.4/specs/C2PA_Specification.html We now READ the soft-binding `alg` (`C2PA_SOFT_BINDINGS` / `soft_binding_vendors_in`) to name the forensic-watermark vendor, and locally DECODE the one open scheme, Adobe TrustMark (`trustmark_detector`); the rest (Digimarc/Imatag/Steg.AI/...) stay name-only (proprietary decoders).
-- **Built 2026-05-26 (this batch):** soft-binding `alg` vendor detection; IPTC Photo Metadata 2025.1 AI-disclosure fields (`AISystemUsed` etc.); **video C2PA metadata** detect + strip for MP4/MOV/M4V (free — `isobmff.py` is format-agnostic, MP4 is ISOBMFF); Adobe TrustMark open decoder. NOT done (out of cheap reach, per the feasibility review): visible video-logo removal (needs a video frame pipeline) and audio (SynthID/ElevenLabs/Resemble/Suno all oracle-only or unmarked). The soft-binding **box detection window**: non-PNG/video detection scans the first 1 MB, so a C2PA box placed after a large `mdat` in a streaming MP4 can be missed — front-placed manifests (the common case) are caught.
+- **Built 2026-05-26 (this batch):** soft-binding `alg` vendor detection; IPTC Photo Metadata 2025.1 AI-disclosure fields (`AISystemUsed` etc.); **video C2PA metadata** detect + strip for MP4/MOV/M4V (free — `isobmff.py` is format-agnostic, MP4 is ISOBMFF); Adobe TrustMark open decoder. NOT done (out of cheap reach, per the feasibility review): visible video-logo removal (needs a video frame pipeline) and audio (SynthID/ElevenLabs/Resemble/Suno all oracle-only or unmarked). **Box detection window — now handled (v0.6.8):** detection no longer relies on a fixed first-MB read. `metadata.scan_head(path, size)` reads the first `size` bytes and, for ISOBMFF, appends the payloads of late provenance boxes found by `isobmff.scan_c2pa_region` (a file-seeking top-level box walker that skips past `mdat` by size without reading it), so a C2PA/AIGC/IPTC manifest placed AFTER a large `mdat` in a streaming/non-faststart MP4 is now caught. Every C2PA/marker byte scan (`has_ai_metadata`, `aigc_label`, `iptc_ai_system`, `synthid_source`, `exif_generator` XMP, `get_ai_metadata` soft-binding, and `identify`) goes through `scan_head`; it is behavior-neutral for non-ISOBMFF inputs (exactly `f.read(size)`). The remaining gap is EXIF/XMP stored as items *inside the `meta` box* (still needs meta-box surgery / exiftool).
 - **Regulatory driver (context, not a code change):** AI-content labeling mandates are expanding, which pushes more generators toward exactly the C2PA + watermark signals we read. The full per-jurisdiction table lives in README "## Legal" -- keep it there, not duplicated here. Newly added + primary-source verified 2026-05-26: **EU AI Act Article 50** machine-readable marking applicable **2026-08-02** (verified against the article text); **South Korea AI Framework Act Art. 31(3)** in force since **22 January 2026** (verified via Kim & Chang + FPF/Korea Times; Enforcement Decree accepts an invisible-watermark label); **California AB 853** (amends the CA AI Transparency Act) latent-disclosure duty operative **2026-08-02**, requiring a disclosure "permanent or extraordinarily difficult to remove" (verified against the leginfo bill text -- this is the exact disclosure our tool strips); **India IT Amendment Rules 2026** in force **2026-02-20** (verified via Chambers), which prominently-label + permanent-provenance-id all synthetic media AND **expressly prohibit removing/suppressing the label or metadata** -- the first major all-content removal ban outside China. **Removal liability (README "## Legal" disclaimer):** the tool is lawful general-purpose software; liability sits with the remover and is intent-gated -- downstream acts (fraud/deception/IP), plus US DMCA 17 USC 1202 (removing copyright-management info to conceal infringement), plus the removal-as-such bans in China + India. When extending the README table, verify each date/article against the statute/bill text before committing, not against search summaries.
 
 ## Known limitations
diff --git a/README.md b/README.md
index 2575574..f489481 100644
--- a/README.md
+++ b/README.md
@@ -327,7 +327,7 @@ Tracked but not yet implemented:
 - **AVIF / HEIF EXIF/XMP inside the `meta` box**. Removal already strips top-level C2PA `uuid` / JUMBF `jumb` boxes and any AI-labelled top-level XMP `uuid` box, and non-ISOBMFF audio/video (WebM, MP3, WAV, FLAC, OGG) is stripped losslessly via ffmpeg. Still open: EXIF/XMP stored as *items inside the `meta` box* (typical for AVIF/HEIF stills) — needs `meta`-box surgery (iinf/iloc + mdat splice) or `exiftool` (a non-bundled binary dependency).
 - **Multi-signal contradiction reporting ("Integrity Clash")** — *shipped (v0.6.7)*. `identify` now surfaces contradictions between independent provenance signals (two different AI vendors named by separate stamps, or camera-capture C2PA credentials next to AI-generation markers) as `integrity_clashes` (shown in red in the table view and in `--json`), rather than collapsing to a single verdict. Inspired by [arXiv:2603.02378](https://arxiv.org/abs/2603.02378).
 - **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic are mapped (each verified against a real signed file). Canon and Samsung Galaxy (AI-edit) are deferred until a real signed sample surfaces — no public direct-download C2PA file exists for them today (upload-to-verify / news-agency-licensed only).
-- **C2PA detection window for streaming MP4**. Non-PNG detection scans the first 1 MB; a manifest placed after a large `mdat` in a streaming MP4 can be missed (front-placed manifests, the common case, are caught).
+- **C2PA detection window for streaming MP4** — *shipped (v0.6.8)*. Detection no longer relies on a fixed first-MB read: for ISOBMFF containers it walks the top-level boxes (seeking past `mdat` by size) to find a C2PA / AIGC / IPTC manifest placed after the media data, so a streaming / non-faststart MP4 is caught. The remaining gap is EXIF/XMP stored as items *inside the `meta` box* (needs meta-box surgery or `exiftool`).
 - **Resemble PerTh audio detection** — evaluated, not feasible with the public API: `get_watermark()` returns a raw bit array with no presence/confidence flag, so watermarked vs. clean audio can't be reliably separated without Resemble's fixed payload or a confidence service. Same wall as the SynthID pixel detector.
 - **Video pipeline (`noai-video`)**: per-frame inpainting and tracking for Sora 2 dynamic logo, Veo 3.1 badge, Kling, Runway. Separate package, not folded into this repo.
 
diff --git a/pyproject.toml b/pyproject.toml
index 83dae0a..b121918 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "remove-ai-watermarks"
-version = "0.6.7"
+version = "0.6.8"
 description = "Remove visible and invisible AI watermarks from images (Gemini / Nano Banana, ChatGPT, Stable Diffusion)"
 readme = "README.md"
 requires-python = ">=3.10"
diff --git a/src/remove_ai_watermarks/__init__.py b/src/remove_ai_watermarks/__init__.py
index 16eef5c..189296f 100644
--- a/src/remove_ai_watermarks/__init__.py
+++ b/src/remove_ai_watermarks/__init__.py
@@ -1,3 +1,3 @@
 """Remove-AI-Watermarks: Unified tool for removing visible and invisible AI watermarks."""
 
-__version__ = "0.6.7"
+__version__ = "0.6.8"
diff --git a/src/remove_ai_watermarks/identify.py b/src/remove_ai_watermarks/identify.py
index 560b9d2..2ec6291 100644
--- a/src/remove_ai_watermarks/identify.py
+++ b/src/remove_ai_watermarks/identify.py
@@ -32,6 +32,7 @@ from remove_ai_watermarks.metadata import (
     exif_generator,
     get_ai_metadata,
     iptc_ai_system,
+    scan_head,
     xai_signature,
 )
 from remove_ai_watermarks.noai.c2pa import cbor_text_after, extract_c2pa_info, soft_binding_vendors_in
@@ -332,8 +333,9 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
 
     # First MB covers C2PA (PNG caBX, JPEG APP11, AVIF/HEIF/JXL uuid box) and
     # IPTC markers for the non-PNG path where extract_c2pa_info returns {}.
-    with open(image_path, "rb") as f:
-        head = f.read(_SCAN_BYTES)
+    # scan_head also seeks out late ISOBMFF provenance boxes (manifest after a
+    # large mdat in a streaming MP4) that a fixed first-MB read would miss.
+    head = scan_head(image_path, _SCAN_BYTES)
 
     signals: list[Signal] = []
     watermarks: list[str] = []
diff --git a/src/remove_ai_watermarks/metadata.py b/src/remove_ai_watermarks/metadata.py
index 3456809..546a8bd 100644
--- a/src/remove_ai_watermarks/metadata.py
+++ b/src/remove_ai_watermarks/metadata.py
@@ -132,6 +132,28 @@ def _is_ai_key(key: str) -> bool:
     return any(kw in key_lower for kw in AI_KEYWORDS)
 
 
+def scan_head(image_path: Path, size: int = 1024 * 1024) -> bytes:
+    """First ``size`` bytes of the file, plus -- for ISOBMFF containers -- the
+    payloads of top-level provenance (``uuid`` / ``jumb``) boxes anywhere in
+    the file, found by seeking past large boxes like ``mdat``.
+
+    This is the shared input for every C2PA / AIGC / IPTC byte scan. The
+    ISOBMFF extension catches a manifest placed AFTER the media data in a
+    streaming / non-faststart MP4, which a fixed first-MB read would miss. For
+    non-ISOBMFF inputs it is exactly ``f.read(size)`` -- behavior-neutral.
+    """
+    with open(image_path, "rb") as f:
+        head = f.read(size)
+    # Lazy import: isobmff imports this module's constants at top level.
+    from remove_ai_watermarks.noai import isobmff
+
+    if isobmff.is_isobmff(head):
+        region = isobmff.scan_c2pa_region(image_path)
+        if region:
+            head += region
+    return head
+
+
 def has_ai_metadata(image_path: Path) -> bool:
     """Check if an image contains AI-generation metadata.
 
@@ -167,9 +189,8 @@ def has_ai_metadata(image_path: Path) -> bool:
         pass
 
     # Binary scan covers C2PA (PNG caBX, JPEG APP11, AVIF/HEIF/JXL uuid boxes)
-    # and IPTC AI markers in XMP. Read only the first 512KB to bound memory.
-    with open(image_path, "rb") as f:
-        data = f.read(512 * 1024)
+    # and IPTC AI markers in XMP. First 512KB (plus late ISOBMFF provenance boxes).
+    data = scan_head(image_path, 512 * 1024)
     if b"c2pa" in data.lower() or b"C2PA" in data:
         return True
     if C2PA_UUID in data:
@@ -196,8 +217,7 @@ def aigc_label(image_path: Path) -> dict[str, str] | None:
     import json
     import re
 
-    with open(image_path, "rb") as f:
-        data = f.read(1024 * 1024)
+    data = scan_head(image_path)
     match = re.search(rb"<TC260:AIGC>(.*?)</TC260:AIGC>", data, re.DOTALL)
     if not match:
         return None
@@ -219,8 +239,7 @@ def iptc_ai_system(image_path: Path) -> str | None:
     extractable, otherwise the literal ``"fields present"``. Container-agnostic
     raw-byte scan; handles both XMP element and attribute serializations.
     """
-    with open(image_path, "rb") as f:
-        data = f.read(1024 * 1024)
+    data = scan_head(image_path)
     if not any(marker in data for marker in IPTC_AI_FIELD_MARKERS):
         return None
     match = re.search(rb"AISystemUsed[=:\s]*[\"'>]\s*([^<\"']{1,120})", data)
@@ -259,8 +278,7 @@ def synthid_source(image_path: Path) -> str | None:
     # Non-PNG containers (JPEG APP11, WebP, AVIF/HEIF/JXL uuid box) keep the
     # C2PA manifest where the PNG parser can't reach it. Binary-scan for the
     # same signal: a C2PA manifest from a SynthID-using issuer on AI content.
-    with open(image_path, "rb") as f:
-        data = f.read(1024 * 1024)
+    data = scan_head(image_path)
     has_c2pa = b"c2pa" in data.lower() or C2PA_UUID in data
     # Matches both "trainedAlgorithmicMedia" and "compositeWithTrainedAlgorithmicMedia".
     ai_source = b"trainedAlgorithmicMedia" in data or b"TrainedAlgorithmicMedia" in data
@@ -311,8 +329,7 @@ def exif_generator(image_path: Path) -> str | None:
 
     # XMP CreatorTool: text, container-agnostic (covers HEIF/JXL via raw scan).
     try:
-        with open(image_path, "rb") as f:
-            head = f.read(1024 * 1024)
+        head = scan_head(image_path)
         for match in re.finditer(rb"CreatorTool[>\"'=\s]{1,4}([^<\"']{1,80})", head):
             candidates.append(match.group(1).decode("latin1", "replace"))
     except Exception as exc:
@@ -467,8 +484,7 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
     if "synthid_watermark" not in result and (vendor := synthid_source(image_path)):
         result.setdefault("synthid_watermark", synthid_verdict(vendor))
     if "soft_binding" not in result:
-        with open(image_path, "rb") as f:
-            head = f.read(1024 * 1024)
+        head = scan_head(image_path)
         if vendors := soft_binding_vendors_in(head):
             result["soft_binding"] = ", ".join(vendors)
 
@@ -507,10 +523,18 @@ def _strip_with_ffmpeg(source_path: Path, output_path: Path) -> Path:
         )
     output_path.parent.mkdir(parents=True, exist_ok=True)
     cmd = [
-        ffmpeg, "-y", "-loglevel", "error",
-        "-i", str(source_path),
-        "-map_metadata", "-1", "-map_chapters", "-1",
-        "-c", "copy",
+        ffmpeg,
+        "-y",
+        "-loglevel",
+        "error",
+        "-i",
+        str(source_path),
+        "-map_metadata",
+        "-1",
+        "-map_chapters",
+        "-1",
+        "-c",
+        "copy",
         str(output_path),
     ]
     result = subprocess.run(cmd, capture_output=True, text=True, check=False)  # noqa: S603
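Review note on `scan_head`: the contract described in this file's hunks can be sketched in memory as below. This is a minimal illustration, not the project's code. `scan_head_sketch`, `provenance_payloads`, and `box` are hypothetical names, the walker handles only 32-bit box sizes, and the `uuid`/`jumb` set is assumed to stand in for `C2PA_BOX_TYPES`. It also shows a side effect of walking from offset 0: a provenance box that already sits inside the head window shows up twice in the result, which should be harmless for substring scans.

```python
import struct


def is_isobmff(data: bytes) -> bool:
    # Same sniff as the diff: an `ftyp` box type at bytes 4..8.
    return len(data) >= 8 and data[4:8] == b"ftyp"


def provenance_payloads(blob: bytes) -> bytes:
    """Hypothetical in-memory stand-in for isobmff.scan_c2pa_region.

    Simplification: 32-bit box sizes only, no max_total cap.
    """
    out = bytearray()
    pos = 0
    while pos + 8 <= len(blob):
        size, box_type = struct.unpack(">I4s", blob[pos:pos + 8])
        if size < 8 or pos + size > len(blob):
            break  # malformed or truncated box: stop walking
        if box_type in (b"uuid", b"jumb"):  # assumed provenance box types
            out += blob[pos + 8:pos + size]
        pos += size
    return bytes(out)


def scan_head_sketch(blob: bytes, size: int = 1024 * 1024) -> bytes:
    """First `size` bytes, plus provenance payloads for ISOBMFF input."""
    head = blob[:size]
    if is_isobmff(head):
        head += provenance_payloads(blob)
    return head


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a 32-bit-size box: [size:4][type:4][payload]."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


png = b"\x89PNG\r\n\x1a\n" + b"x" * 100
mp4 = (
    box(b"ftyp", b"mp42\x00\x00\x00\x00mp42isom")
    + box(b"mdat", b"\x00" * 64)
    + box(b"uuid", b"c2pa-marker")
)

# With a tiny 32-byte window the marker lies beyond the head but is recovered.
small = scan_head_sketch(mp4, 32)
# With the default window the uuid box is inside the head AND appended again.
full = scan_head_sketch(mp4)
```

If the duplication ever matters (for example, for a scan that counts matches rather than testing membership), skipping boxes that end inside the head window would be one option; the diff as written does not do that.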
diff --git a/src/remove_ai_watermarks/noai/isobmff.py b/src/remove_ai_watermarks/noai/isobmff.py
index 0c27858..105847d 100644
--- a/src/remove_ai_watermarks/noai/isobmff.py
+++ b/src/remove_ai_watermarks/noai/isobmff.py
@@ -22,6 +22,7 @@ from typing import TYPE_CHECKING
 
 if TYPE_CHECKING:
     from collections.abc import Iterator
+    from pathlib import Path
 
 from remove_ai_watermarks.metadata import (
     AIGC_MARKERS,
@@ -78,6 +79,58 @@ def is_isobmff(data: bytes) -> bool:
     return len(data) >= 8 and data[4:8] == b"ftyp"
 
 
+def scan_c2pa_region(path: str | Path, *, max_total: int = 4 * 1024 * 1024) -> bytes:
+    """Concatenated payloads of top-level ``uuid`` / ``jumb`` boxes in an ISOBMFF
+    file, found by seeking past other boxes (``mdat`` etc.) by size.
+
+    C2PA manifests and XMP packets (incl. AI labels) live in top-level ``uuid``
+    boxes; JPEG-XL uses ``jumb``. In a streaming / non-faststart MP4 the manifest
+    sits AFTER a multi-megabyte ``mdat``, so a fixed first-MB read misses it. This
+    walks box headers (8-16 bytes each) and seeks past payloads it does not need,
+    so it never loads ``mdat`` into memory and works on multi-GB files. Returns
+    the relevant box payloads (capped at ``max_total``), or ``b""`` for a
+    non-ISOBMFF file or on any read error.
+    """
+    collected = bytearray()
+    try:
+        with open(path, "rb") as f:
+            sniff = f.read(8)
+            if len(sniff) < 8 or sniff[4:8] != b"ftyp":
+                return b""
+            f.seek(0, 2)
+            file_size = f.tell()
+            pos = 0
+            while pos + 8 <= file_size and len(collected) < max_total:
+                f.seek(pos)
+                header = f.read(8)
+                if len(header) < 8:
+                    break
+                size32 = struct.unpack(">I", header[:4])[0]
+                box_type = header[4:8]
+                payload_off = pos + 8
+                if size32 == 1:
+                    ext = f.read(8)
+                    if len(ext) < 8:
+                        break
+                    size = struct.unpack(">Q", ext)[0]
+                    payload_off = pos + 16
+                elif size32 == 0:
+                    size = file_size - pos
+                else:
+                    size = size32
+                if size < (payload_off - pos) or pos + size > file_size:
+                    break
+                if box_type in C2PA_BOX_TYPES:
+                    f.seek(payload_off)
+                    to_read = min(pos + size - payload_off, max_total - len(collected))
+                    if to_read > 0:
+                        collected += f.read(to_read)
+                pos += size
+    except OSError:
+        return b""
+    return bytes(collected)
+
+
 def strip_c2pa_boxes(data: bytes) -> tuple[bytes, int]:
     """Return ``(cleaned_bytes, stripped_count)`` with AI-provenance boxes removed.
 
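Review note on `scan_c2pa_region`: the three size encodings the walker handles (plain 32-bit size, `size == 1` with a 64-bit largesize, and `size == 0` meaning "to end of file") can be exercised with a standalone sketch. The names `top_level_boxes`, `collect_provenance`, `box`, and `box64` are hypothetical helpers written for this note, and `PROVENANCE_TYPES` is an assumed stand-in for `C2PA_BOX_TYPES`; the logic mirrors the hunk above but is not the project's code.

```python
import io
import struct

PROVENANCE_TYPES = {b"uuid", b"jumb"}  # assumption: stands in for C2PA_BOX_TYPES


def box(box_type: bytes, payload: bytes) -> bytes:
    """32-bit-size box: [size:4][type:4][payload]."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def box64(box_type: bytes, payload: bytes) -> bytes:
    """64-bit largesize box: [1:4][type:4][largesize:8][payload]."""
    return struct.pack(">I", 1) + box_type + struct.pack(">Q", 16 + len(payload)) + payload


def top_level_boxes(f, file_size: int):
    """Yield (type, payload_offset, payload_length), reading headers only."""
    pos = 0
    while pos + 8 <= file_size:
        f.seek(pos)
        header = f.read(8)
        if len(header) < 8:
            return
        size32, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size32 == 1:  # 64-bit largesize follows the type
            ext = f.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size32 == 0:  # box extends to end of file
            size = file_size - pos
        else:
            size = size32
        if size < header_len or pos + size > file_size:
            return  # malformed or truncated: stop instead of looping
        yield box_type, pos + header_len, size - header_len
        pos += size


def collect_provenance(f, file_size: int) -> bytes:
    """Concatenate payloads of provenance boxes; other payloads are never read."""
    out = bytearray()
    for box_type, offset, length in top_level_boxes(f, file_size):
        if box_type in PROVENANCE_TYPES:
            f.seek(offset)
            out += f.read(length)
    return bytes(out)


# Synthetic non-faststart layout: ftyp, a 2 MB largesize mdat, then the manifest.
data = (
    box(b"ftyp", b"mp42\x00\x00\x00\x00mp42isom")
    + box64(b"mdat", b"\x00" * 2_000_000)
    + box(b"uuid", b"late manifest")
)
stream = io.BytesIO(data)
found = [(t, n) for t, _, n in top_level_boxes(stream, len(data))]
late = collect_provenance(stream, len(data))
```

The new tests build `mdat` with the 32-bit `_box` helper only, so a largesize fixture along these lines might be worth adding if the 64-bit branch should be covered.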
diff --git a/tests/test_metadata.py b/tests/test_metadata.py
index f2cf570..842e603 100644
--- a/tests/test_metadata.py
+++ b/tests/test_metadata.py
@@ -631,6 +631,11 @@ _MP4_FTYP = b"\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00mp42isom"
 _MP4_MDAT = b"\x00\x00\x00\x10mdat" + b"videodat"
 
 
+def _box(box_type: bytes, payload: bytes) -> bytes:
+    """Build a 32-bit-size ISOBMFF box: [size:4][type:4][payload]."""
+    return (8 + len(payload)).to_bytes(4, "big") + box_type + payload
+
+
 class TestVideoC2pa:
     """C2PA in MP4 (ISOBMFF) -- detect + strip, reusing the image box walker."""
 
@@ -654,6 +659,59 @@ class TestVideoC2pa:
         assert has_ai_metadata(out) is False
 
 
+class TestLateProvenanceBox:
+    """A C2PA / provenance box placed AFTER a large mdat (streaming / non-faststart
+    MP4) must still be detected -- the fixed first-MB scan would miss it."""
+
+    def _mp4_late_c2pa(self, tmp_path: Path, gap: int = 1_500_000) -> Path:
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        big_mdat = _box(b"mdat", b"\x00" * gap)  # > 1 MB pushes the manifest past the scan window
+        manifest = C2PA_UUID + b"OpenAI jumbf c2pa ... trainedAlgorithmicMedia ..."
+        p = tmp_path / "stream.mp4"
+        p.write_bytes(_MP4_FTYP + big_mdat + _box(b"uuid", manifest))
+        return p
+
+    def test_scan_c2pa_region_finds_late_box(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import C2PA_UUID
+        from remove_ai_watermarks.noai.isobmff import scan_c2pa_region
+
+        region = scan_c2pa_region(self._mp4_late_c2pa(tmp_path))
+        assert C2PA_UUID in region
+        assert b"trainedAlgorithmicMedia" in region
+
+    def test_fixed_window_would_have_missed_it(self, tmp_path: Path):
+        # Documents the regression the box walk fixes: the manifest is beyond 1 MB.
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        p = self._mp4_late_c2pa(tmp_path)
+        assert C2PA_UUID not in p.read_bytes()[: 1024 * 1024]
+
+    def test_scan_head_includes_late_box(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import C2PA_UUID, scan_head
+
+        assert C2PA_UUID in scan_head(self._mp4_late_c2pa(tmp_path))
+
+    def test_has_ai_metadata_detects_late_manifest(self, tmp_path: Path):
+        assert has_ai_metadata(self._mp4_late_c2pa(tmp_path)) is True
+
+    def test_scan_c2pa_region_non_isobmff_is_empty(self, tmp_path: Path):
+        from remove_ai_watermarks.noai.isobmff import scan_c2pa_region
+
+        p = tmp_path / "not.bin"
+        p.write_bytes(b"\x89PNG\r\n\x1a\n not an isobmff file")
+        assert scan_c2pa_region(p) == b""
+
+    def test_front_placed_manifest_still_detected(self, tmp_path: Path):
+        # Regression: a faststart MP4 (manifest before mdat) is unaffected.
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        manifest = C2PA_UUID + b"OpenAI ... trainedAlgorithmicMedia ..."
+        p = tmp_path / "front.mp4"
+        p.write_bytes(_MP4_FTYP + _box(b"uuid", manifest) + _box(b"mdat", b"\x00" * 100))
+        assert has_ai_metadata(p) is True
+
+
 class TestIsobmffMetadataRemoval:
     """Container-level AI-provenance stripping across ISOBMFF image/video/audio."""
 
@@ -718,9 +776,17 @@ class TestFfmpegMetadataStrip:
     def _wav_with_tag(self, path: Path, tag: str = "Suno AI") -> None:
         subprocess.run(  # noqa: S603
             [
-                shutil.which("ffmpeg"), "-y", "-loglevel", "error",
-                "-f", "lavfi", "-i", "sine=frequency=440:duration=0.1",
-                "-metadata", f"title={tag}", str(path),
+                shutil.which("ffmpeg"),
+                "-y",
+                "-loglevel",
+                "error",
+                "-f",
+                "lavfi",
+                "-i",
+                "sine=frequency=440:duration=0.1",
+                "-metadata",
+                f"title={tag}",
+                str(path),
             ],
             check=True,
         )
diff --git a/uv.lock b/uv.lock
index fe8bd8d..0b13897 100644
--- a/uv.lock
+++ b/uv.lock
@@ -2865,7 +2865,7 @@ wheels = [
 
 [[package]]
 name = "remove-ai-watermarks"
-version = "0.6.7"
+version = "0.6.8"
 source = { editable = "." }
 dependencies = [
     { name = "click" },