lower(strength): drop vendor-adaptive floor to OpenAI 0.10 / Google 0.15

A 2026-06-14 oracle re-test on the deployed Modal controlnet worker (v0.10.0) cleared SynthID at OpenAI 0.10 (2 photoreal) and Google 0.15 (2 native 2816x1536, retiring the "native >= 0.30" guess), while a pixel sweep showed the 2026-06-04 cert floors (0.20/0.30) over-regenerated for no efficacy gain (Google MAE -20% at 0.15).

Lowers OPENAI_STRENGTH 0.20->0.10, GEMINI_STRENGTH and UNKNOWN_STRENGTH 0.30->0.15.

Caveats documented in watermark_profiles.py + docs: removal near this floor is seed-non-deterministic (a service must pin a verified seed), and the n=2 re-test did not cover flat-graphic hard cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 07:57:50 +02:00 · 2026-06-14 13:17:11 -07:00
parent 41a2af2ecb
commit 4c6b56f888
4 changed files with 40 additions and 20 deletions
@@ -71,11 +71,11 @@ Generic HuggingFace detectors (`Organika/sdxl-detector` Swin Transformer, `umm-m

 ## Default strength is vendor-adaptive, one ladder for both pipelines

-**DEFAULT STRENGTH IS VENDOR-ADAPTIVE, ONE LADDER FOR BOTH PIPELINES (raised + unified 2026-06-09; vendor-adaptive since 2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next).**
+**DEFAULT STRENGTH IS VENDOR-ADAPTIVE, ONE LADDER FOR BOTH PIPELINES (LOWERED 2026-06-14; raised + unified 2026-06-09; vendor-adaptive since 2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next).**

-`resolve_strength(strength, vendor)` + `vendor_for_strength(path)` (`watermark_profiles.py`) read the C2PA issuer (`metadata.synthid_source`) on the ORIGINAL input and pick `OPENAI_STRENGTH` **0.20** / `GEMINI_STRENGTH` **0.30** / `UNKNOWN_STRENGTH` **0.30** when `--strength` is unset; explicit `--strength` always wins.
+`resolve_strength(strength, vendor)` + `vendor_for_strength(path)` (`watermark_profiles.py`) read the C2PA issuer (`metadata.synthid_source`) on the ORIGINAL input and pick `OPENAI_STRENGTH` **0.10** / `GEMINI_STRENGTH` **0.15** / `UNKNOWN_STRENGTH` **0.15** when `--strength` is unset; explicit `--strength` always wins.

-**The SAME ladder applies to BOTH pipelines** (`sdxl` and `controlnet`) -- these are the 2026-06-04 Modal-cert controlnet floors.
+**The SAME ladder applies to BOTH pipelines** (`sdxl` and `controlnet`). **2026-06-14: lowered from the 2026-06-04 cert floors (OpenAI 0.20 / Google 0.30) back toward the original 2026-06-01 study (OpenAI ~0.05-0.10 / Google 0.15).** A re-test on the deployed Modal controlnet worker cleared SynthID on the oracle at OpenAI 0.10 (2 photoreal, 1402/1448 px) and Google 0.15 (2 NATIVE 2816x1536 images -- retiring the "native ~2816 likely needs >=0.30" guess), while a pixel sweep showed 0.20/0.30 over-regenerated for no efficacy gain (Google MAE -20% at 0.15). See `watermark_profiles.py` "Data basis". **CAVEATS that stand:** (1) removal near this floor is SEED-NON-DETERMINISTIC (the 2026-06-09 finding below) -- a SERVICE on this ladder must pin a fixed, oracle-verified seed, not rely on a random one; (2) the re-test is n=2 per vendor on photoreal/landscape, NOT flat graphics (the `sdxl` weak spot), so raise `--strength` if an oracle reads SynthID on a flat output.

 **Why one ladder (NOT a per-pipeline split):** the cert was run on controlnet and does NOT transfer to `sdxl` by symmetry (opposite hard cases -- controlnet leaves SynthID on photoreal, `sdxl` on flat graphics), BUT on its OWN hard case (flat fills) `sdxl` is the WEAKER remover (plain img2img barely perturbs a flat region at low strength), so it needs AT LEAST controlnet's strength -- hence the certified floor is the right floor for `sdxl` too. It is a MARGIN argument for `sdxl`, not a fresh certification (no local SynthID detector to self-verify); raise `--strength` if an oracle still reads a flat `sdxl` output. The higher strength costs little quality because `controlnet` is now the default pipeline AND the only `--auto` pick, so `sdxl` is reached only via an explicit `--pipeline sdxl` (a deliberate opt-down for inputs without faces/text), where over-regeneration has nothing to damage. (A short-lived per-pipeline split ladder -- `sdxl` 0.15/0.20 vs controlnet 0.20/0.30 -- existed on 2026-06-09 before being unified the same day; the `resolve_strength` `pipeline` param and the `CONTROLNET_*_STRENGTH` constants were removed.) The CLI detects the vendor from the pristine source (before the visible pass / metadata-strip removes C2PA from the temp file) and passes it to display calls so display and execution agree; `cmd_invisible`/`cmd_all`/`batch` thread `vendor`.

@@ -453,9 +453,18 @@ study (section 2.2) gives empirical floors:

 The default is **vendor-adaptive** (`watermark_profiles.resolve_strength` +
 `vendor_for_strength`): the tool reads the C2PA issuer on the original input and picks
-`OPENAI_STRENGTH` 0.20 / `GEMINI_STRENGTH` 0.30 / `UNKNOWN_STRENGTH` 0.30. **The SAME
-ladder applies to both pipelines** (`sdxl` and `controlnet`) -- these are the
-oracle-certified controlnet floors (§5.5, the 2026-06-04 Modal cert). Why one ladder
+`OPENAI_STRENGTH` 0.10 / `GEMINI_STRENGTH` 0.15 / `UNKNOWN_STRENGTH` 0.15 **(LOWERED
+2026-06-14 from the 2026-06-04 cert floors 0.20/0.30/0.30)**. **The SAME ladder applies
+to both pipelines** (`sdxl` and `controlnet`). The 2026-06-14 re-test on the deployed
+Modal controlnet worker (v0.10.0) cleared SynthID on the oracle at OpenAI 0.10 (2
+photoreal) and Google 0.15 (2 NATIVE 2816x1536, contradicting the "native >= 0.30" guess
+on line above), and a pixel sweep showed 0.20/0.30 over-regenerated for no efficacy gain.
+**This re-opens a genuine tension with the 2026-06-04 pass, which found photoreal STILL
+detected after controlnet at 0.10/0.15 (lines above):** either the v0.10.0 controlnet
+default improved the floor, or n=2 landed on the lucky side of the seed-non-determinism
+(§5.5). So a SERVICE on this ladder MUST pin a fixed, oracle-verified seed (not random),
+and flat-graphic hard cases (NOT in the n=2 re-test) still need a per-content oracle
+recheck -- raise `--strength` there. The prior cert floors are the §5.5 record. Why one ladder
 covers plain `sdxl` too: the certification was run on controlnet and does NOT transfer
 by symmetry (the two pipelines have OPPOSITE hard cases -- controlnet leaves SynthID on
 photoreal, `sdxl` on flat graphics, the §5.1 content-x-pipeline table), BUT on its own
@@ -37,12 +37,22 @@ CONTROLNET_CANNY_MODEL = "xinsir/controlnet-canny-sdxl-1.0"
 # applies to BOTH pipelines (`sdxl` plain img2img and `controlnet`) -- see "why one
 # ladder" below.
 #
-# Data basis (see docs/synthid.md sections 2.2 / 5.5): the values are the ORACLE-
-# CERTIFIED controlnet floors (2026-06-04, isolated Modal cert app, each vendor on its
-# own verifier): OpenAI 0.20 (2 photoreal x 3 seeds = 6/6 clean, resolution-independent),
-# Google 0.30 (clean on 2/2 seeds, validated ONLY at <= 1536 -- Gemini is resolution-
-# sensitive, native ~2816 likely needs ~0.35+). Unknown vendor gets the Google (more
-# robust watermark) value: safe-by-default.
+# Data basis (see docs/synthid.md sections 2.2 / 5.5): ORACLE-CERTIFIED controlnet floors.
+# A 2026-06-14 re-test on the deployed Modal worker (the production controlnet pipeline)
+# LOWERED the ladder back to OpenAI 0.10 / Google 0.15: each output verified on its own
+# oracle (openai.com/verify for OpenAI, the Google Gemini app for Google), all clean ->
+#   - OpenAI 0.10: 2 photoreal images (1402 / 1448 px), SynthID not found on either.
+#   - Google 0.15: 2 NATIVE-resolution images (both 2816x1536), SynthID not found on
+#     either -- this directly retires the earlier "native ~2816 likely needs ~0.35+"
+#     guess, which was speculative and never oracle-checked at that resolution.
+# This supersedes the 2026-06-04 cert (OpenAI 0.20 / Google 0.30), whose higher floor a
+# pixel-fidelity sweep showed was ~2x the removal floor and over-regenerated for no
+# efficacy gain (Google MAE -20% at 0.15 vs 0.30, no SynthID returning). Unknown vendor
+# tracks the Google (more robust watermark) value -> 0.15, still safe-by-default and the
+# floor that real (no-vendor) photos hit, so it also minimizes damage when there is in
+# fact nothing to remove. CAVEAT: the re-test is n=2 per vendor on photoreal / landscape
+# content; FLAT-GRAPHIC hard cases (the historical `sdxl` weak spot) were NOT in the
+# sample, so if an oracle still reads SynthID on a flat output, raise `--strength`.
 #
 # Why ONE ladder for both pipelines (2026-06-09): the certification was run on
 # controlnet, and it does NOT transfer to `sdxl` by symmetry -- the two pipelines have
@@ -57,9 +67,9 @@ CONTROLNET_CANNY_MODEL = "xinsir/controlnet-canny-sdxl-1.0"
 # this is a MARGIN argument for `sdxl`, not a fresh certification -- there is no local
 # SynthID detector, so if an oracle still reads SynthID on a flat `sdxl` output, raise
 # `--strength`.
-OPENAI_STRENGTH = 0.20
-GEMINI_STRENGTH = 0.30
-UNKNOWN_STRENGTH = 0.30
+OPENAI_STRENGTH = 0.10
+GEMINI_STRENGTH = 0.15
+UNKNOWN_STRENGTH = 0.15
 # Backwards-compatible alias: the vendor-unknown value (what a caller gets without a
 # detected vendor). Kept as DEFAULT_STRENGTH for existing references.
 DEFAULT_STRENGTH = UNKNOWN_STRENGTH
@@ -140,11 +140,12 @@ class TestResolveStrength:
        assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH

    def test_ladder_is_the_certified_controlnet_floors(self):
-        # The unified ladder == the oracle-certified controlnet floors (OpenAI 0.20,
-        # Google/unknown 0.30); Google is the more-robust watermark, so it is higher.
-        assert OPENAI_STRENGTH == 0.20
-        assert GEMINI_STRENGTH == 0.30
-        assert UNKNOWN_STRENGTH == 0.30
+        # The unified ladder == the oracle-certified controlnet floors. Lowered on the
+        # 2026-06-14 Modal re-test (OpenAI 0.10, Google/unknown 0.15); Google is the
+        # more-robust watermark, so it is higher.
+        assert OPENAI_STRENGTH == 0.10
+        assert GEMINI_STRENGTH == 0.15
+        assert UNKNOWN_STRENGTH == 0.15
        assert OPENAI_STRENGTH < GEMINI_STRENGTH

    def test_default_strength_alias_is_unknown_vendor_value(self):