diff --git a/docs/module-internals.md b/docs/module-internals.md
index 0fe3b2a..3818e05 100644
--- a/docs/module-internals.md
+++ b/docs/module-internals.md
@@ -11,7 +11,7 @@ module.
 
 ## `noai/c2pa.py`
 
-`noai/c2pa.py` — PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark`/`synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`).
+`noai/c2pa.py` — C2PA reading, **official c2pa-python `Reader` first, hand-rolled parser as fallback** (migrated 2026-06-18; the official lib is a core dep, MIT/Apache, spec-tracking). `read_manifest_store_json(path)` runs `Reader.try_create` with a default `Context` (NO trust enforcement — we report what is in the file, we do not gate on cert trust) and returns the **whole** manifest-store JSON (every manifest plus ingredient manifests); it is memoized per (path, mtime) (`lru_cache(maxsize=8)`) because one `identify`/`get_ai_metadata` call invokes the structured parser ~3x on the same file. `extract_c2pa_info(path)` builds its dict from that store JSON (`_info_from_store_json`: structured `claim_generator` from the active manifest's `claim_generator` / `claim_generator_info[].name`, `timestamp` from `signature_info.time`) and falls back to the legacy caBX parser (`_extract_c2pa_info_png`) when the reader is unavailable (broken/absent wheel, `reader_available()` False) or finds no parseable manifest (synthetic/partial test blobs, the inject round-trip's re-stitched chunk). **Both paths share `_populate_registry_fields(buf, info)`** — the issuer / AI-tool / action / source-type / SynthID / soft-binding registry byte-scan applied to the store JSON (reader path) or the raw caBX bytes (fallback) — so the return-dict shape is identical and the registry stays the single source of truth. Whole-store scanning is load-bearing: a ChatGPT *edit* of a Sora generation keeps `trainedAlgorithmicMedia` + issuer "OpenAI" on the **parent/ingredient** manifest, not the active "opened" one (the active manifest's `signature_info.issuer` is "OpenAI", `common_name` "Truepic Lens CLI in Sora", so the issuer field now reads "OpenAI, Truepic" — first-match-wins platform attribution still resolves OpenAI). `extract_c2pa_info` now also serves non-PNG containers (JPEG/AVIF/MP4) structurally via the reader; the consumers (`identify`, `synthid_source`, `get_ai_metadata`) already merge `info OR byte-scan`, so this strictly upgrades the non-PNG path with no double-counting. `synthid_watermark`/`synthid_vendors` is set when the manifest is signed by a SynthID-using vendor on AI content; `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both paths and the non-PNG binary path). `extract_c2pa_chunk` / `inject_c2pa_chunk` / `has_c2pa_metadata` stay the PNG caBX byte tools (raw-chunk extraction for `extractor.py`, test injection, fallback detection). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`). Regression-guarded by `tests/test_noai.py::TestC2PARealSamples::{test_extract_info_uses_reader_store,test_fallback_to_png_parser_when_reader_unavailable}`.
 
 ## `noai/constants.py`
 
@@ -29,7 +29,7 @@ module.
 
 **Visible-mark detection** (`check_visible`, signals `visible_sparkle` / `visible_doubao` / `visible_jimeng` / `visible_samsung`): the Gemini sparkle keeps its own file-level path (`_visible_sparkle` → `gemini_engine.detect_sparkle_confidence`, promoted only at confidence ≥ `_SPARKLE_THRESHOLD` 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng/Samsung reuse the registry detectors (`_visible_text_marks` → `watermark_registry`, iterating `_VISIBLE_MARK_PLATFORM`), each gated by its own engine NCC threshold via `MarkDetection.detected` (Doubao 0.4, Jimeng 0.45, Samsung 0.4). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label and Samsung by its C2PA + `genAIType` marker, so the visible path is their stripped-metadata fallback. Visible marks set `platform` only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here.
 
-**`import identify` is deliberately light** (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports only the pure `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
+**`import identify` is deliberately light** (~26 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports the `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. `noai.c2pa` does eagerly import the **c2pa-python** binary (Rust + cryptography, ~+5 MB RSS, no torch) for the primary `Reader` path — light enough to stay on the dependency-light host; a broken/absent wheel degrades to the byte-scan parser (`reader_available()` False). The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
 
 **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`).
 
@@ -103,6 +103,8 @@ The 11 survivors are near-white ill-conditioning (reverse-alpha divides by `1-a`
 
 `_text_mark_engine.py` — **shared base for the three reverse-alpha text-mark engines (Doubao/Jimeng/Samsung), extracted 2026-06-09** (they were ~90% byte-identical clones). `TextMarkEngine(config: TextMarkConfig)` owns the whole `locate → extract_mask → detect → _fixed/_aligned_alpha_map → _apply_reverse_alpha → remove_watermark_reverse_alpha` pipeline (+ the asset-keyed `load_alpha_template`/`glyph_silhouette`/`template_match_score` caches). Each engine module is now a thin subclass: it supplies only its `TextMarkConfig` (the tuned constants, the bundled asset, and the bounded structural deltas — `corner` br/bl, `margin_floor` 4/2, `morph_open_size` 5/3, `min_gw` 8/16) plus the test-facing module shims (`_alpha_template`/`_glyph_silhouette`/`_template_match_score` + the constants). Behavior is byte-exact vs the old per-engine code (the three engine test suites pass unchanged). Gemini stays a SEPARATE engine (its multi-size fixed-slot sparkle model is genuinely different). Add a new text mark = a new `TextMarkConfig` + a thin subclass + one registry `_text_mark(...)` row. The engine bullets below describe each mark's calibration history; the LOGIC lives here.
 
+**`_apply_reverse_alpha` runs on the glyph crop only:** `amap` is zero outside the glyph `region` (x, y, w, h), so the blend is a no-op there (`(wm - 0)/(1 - 0) == wm`, and a uint8→float32→uint8 round-trip is exact). It copies the frame through and computes the reverse-alpha math on the `region` crop only — byte-identical to the old full-frame pass (verified: Doubao 130 + Jimeng 22 placements, 0 mismatches) but O(glyph) not O(image). The full-frame pass cost ~275 ms on a 12 MP frame for a glyph that is <0.1% of it, once per candidate placement (fixed + aligned ≈ 2×/removal); the crop drops that to ~2 ms. Mirror of the Gemini `_core_and_bg` crop. `remove_watermark_reverse_alpha` passes the `region` each `_fixed/_aligned_alpha_map` returns.
+
 ## `doubao_engine.py`
 
 `doubao_engine.py` — **a thin `_text_mark_engine.TextMarkEngine` subclass (config only) since 2026-06-09.** visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy `sat = roi.max(axis=2) - roi.min(axis=2)` (no HSV conversion). `detect` is **shape-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243).
diff --git a/src/remove_ai_watermarks/_text_mark_engine.py b/src/remove_ai_watermarks/_text_mark_engine.py
index 8fae06d..708a80a 100644
--- a/src/remove_ai_watermarks/_text_mark_engine.py
+++ b/src/remove_ai_watermarks/_text_mark_engine.py
@@ -302,11 +302,28 @@ class TextMarkEngine:
         amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
         return amap, (ax, ay, gw, gh)
 
-    def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
-        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
-        a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
+    def _apply_reverse_alpha(
+        self, image: NDArray[Any], amap: NDArray[Any], region: tuple[int, int, int, int]
+    ) -> NDArray[Any]:
+        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``.
+
+        ``amap`` is zero everywhere except the glyph ``region`` (x, y, w, h), so the
+        blend is a no-op (``(wm - 0)/(1 - 0) == wm``) outside it. Compute the math on
+        the glyph crop only and copy the rest through unchanged -- byte-identical to a
+        full-frame pass (a uint8 round-trip through float32 is exact), but O(glyph)
+        instead of O(image): a full-frame pass costs ~275 ms on a 12 MP frame for a
+        glyph that is <0.1% of it, and it runs once per candidate placement.
+        """
+        out = image.copy()
+        x1, y1, gw, gh = region
+        x2, y2 = x1 + gw, y1 + gh
+        if y1 >= y2 or x1 >= x2:
+            return out
+        a3 = np.clip(amap[y1:y2, x1:x2], 0.0, 1.0)[:, :, None]
         logo = np.array(self.config.alpha_logo_bgr, np.float32)
-        return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        roi = out[y1:y2, x1:x2].astype(np.float32)
+        out[y1:y2, x1:x2] = np.clip((roi - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        return out
 
     def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
         """Recover the original pixels by inverting the alpha blend, then clear the
@@ -335,8 +352,8 @@ class TextMarkEngine:
         best_out: NDArray[Any] | None = None
         best_amap: NDArray[Any] | None = None
         best_residual = float("inf")
-        for amap, _region in maps:
-            out = self._apply_reverse_alpha(image, amap)
+        for amap, region in maps:
+            out = self._apply_reverse_alpha(image, amap, region)
             residual = self.detect(out).confidence
             if residual < best_residual:
                 best_residual, best_out, best_amap = residual, out, amap
diff --git a/src/remove_ai_watermarks/gemini_engine.py b/src/remove_ai_watermarks/gemini_engine.py
index f308709..97385f6 100644
--- a/src/remove_ai_watermarks/gemini_engine.py
+++ b/src/remove_ai_watermarks/gemini_engine.py
@@ -142,6 +142,19 @@ class GeminiEngine:
     # gate separates them with a wide margin.
     _OVERSUB_FOOTPRINT_FRAC = 0.05
 
+    # Mid-tone over-subtraction (2026-06-18 prod "the color just changed, not removed"
+    # report). The numerator fraction above only trips when reverse-alpha drives a
+    # footprint pixel fully NEGATIVE -- the dark-background black-pit case. On a MID-TONE
+    # background a sparkle fainter than the captured alpha is over-subtracted into a
+    # visibly DARKER-than-background diamond while no pixel ever crosses zero, so the
+    # numerator gate misses it and ships the dark mark. Predict the reverse-alpha output
+    # at the bright core, (core - a*logo)/(1-a); when it lands more than this many gray
+    # levels BELOW the local background ring, reverse-alpha would leave a dark diamond --
+    # inpaint instead. Calibrated wide: clean removals predict within ~12 of background
+    # (demo_banana ~-1, a bright-bg sparkle ~-12), the prod regression predicts ~-40 and
+    # the issue #30 dark case ~-82, so 25 separates keep-vs-inpaint with margin.
+    _OVERSUB_DARK_MARGIN = 25.0
+
     # Per-image alpha gain (under-subtraction fix). The captured alpha peaks ~0.51
     # (a ~51%-opaque sparkle). Some real Gemini sparkles are rendered MORE opaque,
     # so the fixed alpha under-subtracts and reverse-alpha leaves a bright residual
@@ -642,19 +655,24 @@ class GeminiEngine:
         a_cap = float(alpha_roi.max())
         if a_cap < 0.2:
             return None
-        gray = image.astype(np.float32).mean(axis=2)
         core = alpha_roi >= a_cap * self._ALPHA_GAIN_CORE_FRAC
         if not bool(core.any()):
             return None
-        core_obs = float(np.percentile(gray[y1:y2, x1:x2][core], 75))
-        # Local background = a ring just outside the footprint box.
+        # Convert only the footprint+ring crop to gray, not the whole image: every
+        # sample below lives inside the ring box, so a full-image mean is wasted work
+        # that scales with resolution (~70 ms on a 12 MP image, recomputed for both
+        # the alpha-gain estimate and the over-subtraction gate). The crop is sized by
+        # the footprint, so this is O(footprint^2) regardless of image size.
         ih, iw = image.shape[:2]
         pad = int((x2 - x1) * 0.7)
         ry1, ry2 = max(0, y1 - pad), min(ih, y2 + pad)
         rx1, rx2 = max(0, x1 - pad), min(iw, x2 + pad)
-        ring = gray[ry1:ry2, rx1:rx2]
+        ring = image[ry1:ry2, rx1:rx2].astype(np.float32).mean(axis=2)
+        # Footprint box expressed in ring-crop coordinates.
+        fy1, fy2, fx1, fx2 = y1 - ry1, y2 - ry1, x1 - rx1, x2 - rx1
+        core_obs = float(np.percentile(ring[fy1:fy2, fx1:fx2][core], 75))
         ring_mask = np.ones(ring.shape, dtype=bool)
-        ring_mask[y1 - ry1 : y2 - ry1, x1 - rx1 : x2 - rx1] = False
+        ring_mask[fy1:fy2, fx1:fx2] = False
         if int(ring_mask.sum()) < 10:
             return None
         return core_obs, float(np.median(ring[ring_mask])), a_cap
@@ -704,11 +722,19 @@ class GeminiEngine:
         alpha_map: NDArray[Any],
         position: tuple[int, int],
     ) -> bool:
-        """True when reverse-alpha would drive the footprint dark (issue #30).
+        """True when reverse-alpha would drive the footprint dark.
 
-        Tests the numerator ``watermarked - alpha*logo`` over the sparkle body: a
-        brightening overlay can never make it negative, so a large negative fraction
-        means the fixed alpha over-estimates this image's opacity.
+        Two signatures of the captured alpha over-estimating this image's sparkle
+        opacity, either of which means reverse-alpha would leave a dark mark:
+
+        1. Dark-background black pit (issue #30): the numerator
+           ``watermarked - alpha*logo`` over the sparkle body. A brightening overlay
+           can never make it negative, so a large negative fraction means the fixed
+           alpha over-subtracts past black.
+        2. Mid-tone dark diamond (see ``_OVERSUB_DARK_MARGIN``): on a mid-tone
+           background the over-subtraction darkens the core well below the background
+           without any pixel crossing zero, so case 1 misses it. Predict the
+           reverse-alpha core output and trip when it lands far below the local ring.
         """
         placed = self._footprint_indices(alpha_map, position, image.shape)
         if placed is None:
@@ -720,7 +746,18 @@ class GeminiEngine:
         roi = image[y1:y2, x1:x2].astype(np.float32)
         numerator = roi.mean(axis=2) - np.clip(alpha_roi, 0.0, 0.99) * self.logo_value
         frac = float((numerator[body] < 0).sum()) / float(body.sum())
-        return frac > self._OVERSUB_FOOTPRINT_FRAC
+        if frac > self._OVERSUB_FOOTPRINT_FRAC:
+            return True
+
+        # Mid-tone darkening: predict the reverse-alpha output at the bright core and
+        # compare to the local background ring (reuses the FP-gate / alpha-gain machinery).
+        cb = self._core_and_bg(image, alpha_map, position)
+        if cb is None:
+            return False
+        core_obs, bg, a_cap = cb
+        a = min(a_cap, 0.99)
+        predicted_core = (core_obs - a * self.logo_value) / (1.0 - a)
+        return predicted_core < bg - self._OVERSUB_DARK_MARGIN
 
     def _inpaint_footprint(
         self,
diff --git a/tests/test_gemini_engine.py b/tests/test_gemini_engine.py
index ce746eb..4aa64cb 100644
--- a/tests/test_gemini_engine.py
+++ b/tests/test_gemini_engine.py
@@ -298,6 +298,28 @@ class TestOverSubtractionGuard:
         dalpha = self.engine.get_interpolated_alpha(dpos[2])
         assert self.engine._reverse_alpha_oversubtracts(dark, dalpha, (dpos[0], dpos[1])) is True
 
+    def test_midtone_background_does_not_leave_dark_diamond(self):
+        """2026-06-18 prod report: a faint sparkle on a MID-TONE background was
+        over-subtracted into a darker-than-background diamond ("the color just
+        changed, not removed"). No footprint pixel crosses zero there, so the
+        numerator gate misses it -- the dark-margin gate must catch it and inpaint.
+        """
+        image, (x, y, w, h) = self._composite_sparkle(bg_value=160)
+        footprint = image[y : y + h, x : x + w]
+        # The numerator (black-pit) gate alone does NOT fire on a mid-tone background.
+        alpha = self.engine.get_interpolated_alpha(w)
+        roi = footprint.astype(np.float32).mean(axis=2)
+        body = alpha[:h, :w] >= self.engine._FOOTPRINT_ALPHA
+        numerator = roi - np.clip(alpha[:h, :w], 0.0, 0.99) * self.engine.logo_value
+        assert float((numerator[body] < 0).sum()) / float(body.sum()) <= self.engine._OVERSUB_FOOTPRINT_FRAC
+        # ...but the over-subtraction guard still trips (via the dark-margin path) and
+        # removal leaves the footprint reading like the mid-tone background, not darker.
+        assert self.engine._reverse_alpha_oversubtracts(image, alpha, (x, y)) is True
+        out = self.engine.remove_watermark(image)
+        cleaned = out[y : y + h, x : x + w]
+        assert abs(float(cleaned.mean()) - 160.0) < 15.0, f"dark diamond: mean={cleaned.mean()}"
+        assert int(cleaned.min()) > 160 - 30, f"dark pit: min={cleaned.min()}"
+
 
 class TestUnderSubtractionGain:
     """Under-subtraction fix: a sparkle MORE opaque than the captured alpha must not