docs(synthid): correct protect_text guidance -- it does NOT block removal (keep ON)

An A/B test at strength 0.3 on a real e-commerce infographic (updated GPU study) reverses the earlier claim: SynthID is a GLOBAL watermark, so 0.3 removes it whether protect_text is on or off, and protection SALVAGES text fidelity (with it on, medium headings/body stay readable; with it off, they garble). The earlier 'protect_text shields the watermark, use --no-protect-text' was wrong -- it mistook the 0.10 strength failure for a protection effect. Recommended SynthID config: ~0.3 + protect_text ON (the default).

Also document the oracle scope: the Gemini app 'Verify with SynthID' is the only valid SynthID oracle; openai.com/verify is provenance-scoped (C2PA) and does NOT measure SynthID.

Corrects CLAUDE.md + README + watermark_profiles comment shipped in cddbaf6.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-20 22:50:52 +02:00 · 2026-05-31 16:50:13 -07:00
parent cddbaf6413
commit b991b11a19
3 changed files with 10 additions and 7 deletions
@@ -116,7 +116,7 @@ image → encode to latent space (VAE) at native resolution

 > **Default strength is `0.30`, tuned to remove the current Google SynthID.** An oracle-verified study (fresh Gemini images, "Verify with SynthID") found the current SynthID survives `0.10`/`0.15`/`0.20` and clears only at `0.30`. SynthID is a moving target (the threshold has climbed `0.05` → `0.10` → `~0.30` as Google hardens it), and there is no local SynthID detector, so the tool cannot self-check and auto-tune. If the oracle still reads SynthID, raise `--strength` further; if you care more about preserving fine text, lower it. `0.30` softens dense typography somewhat, so use the lowest value that comes back clean on the oracle.
 >
-> **For SynthID in text, also pass `--no-protect-text`.** Text protection preserves text regions, but SynthID hides in them, so on text-heavy images the watermark can survive inside text at `0.30` unless protection is off. This trades text crispness for full removal — a genuine tradeoff, not a bug.
+> **Keep text protection on (the default) — it does not block SynthID removal.** SynthID is a global watermark, so strength `0.30` clears it whether or not text is protected, and text protection keeps headings and body text readable through the pass (only the very finest print still softens at `0.30`). You do not need to disable it for removal; `--no-protect-text` only trades text quality for a faster run.
 >
 > **OpenAI / ChatGPT images do not carry Google SynthID** (they use C2PA metadata, stripped by the metadata step), so `0.30` is overkill there; `--strength 0.10` preserves quality and the metadata strip is what matters.
 >
@@ -18,11 +18,14 @@ CTRLREGEN_MODEL_ID = "yepengliu/ctrlregen"
 # treat this as a moving target and re-test against fresh Gemini output periodically.
 # Cost of 0.3: SSIM ~0.97 vs original (modest), but fine/dense typography softens, and
 # it is OVERKILL for non-SynthID sources (OpenAI/ChatGPT carry C2PA, not Google SynthID
-# -- 0.10 is plenty there). Two known tensions, documented but not auto-handled here:
-# (1) higher strength deforms text more (why text protection runs by default), and
-# (2) `protect_text` SHIELDS the text regions where SynthID hides, so text-region
-# SynthID can survive at 0.3 unless `--no-protect-text` is passed. (Fixed LOW/MEDIUM/
-# HIGH presets were removed -- the one knob is this default + the per-call override.)
+# -- 0.10 is plenty there). protect_text is RECOMMENDED ON for SynthID removal (A/B
+# verified 2026-05-31): SynthID is GLOBAL, so 0.3 clears it whether protection is on or
+# off, and protection salvages medium-text fidelity (~3x runtime); only the very finest
+# text still softens at 0.3. (An earlier comment claimed protect_text shields the
+# watermark -- that was wrong, it mistook the 0.10 strength failure for a protection
+# effect.) The only true tension is the finest typography softening at this aggressive
+# strength. (Fixed LOW/MEDIUM/HIGH presets were removed -- the one knob is this default
+# plus the per-call override.)
 DEFAULT_STRENGTH = 0.30

 # CtrlRegen removes watermarks by regenerating from (near) clean Gaussian noise,