diff --git a/CLAUDE.md b/CLAUDE.md
index 3634ba9..a10bc63 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -10,7 +10,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
 - `uv run remove-ai-watermarks metadata --remove -o ` — strip all AI metadata
-- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, `--restore-faces/--no-restore-faces` for the PhotoMaker-V2 face-identity post-pass (**NON-COMMERCIAL**, `photomaker` extra), and `--auto` (+ `--adaptive-polish/--no-adaptive-polish`) for the content-adaptive quality mode (re-planned per image; one engine cached per resolved pipeline)
+- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`).
`--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, `--restore-faces/--no-restore-faces` + `--restore-faces-method [instantid|photomaker]` for the face REGENERATION post-pass (**NON-COMMERCIAL**, `instantid` default or `photomaker` extra; **the methods REGENERATE the face from an ArcFace embedding via SDXL diffusion, they do NOT recover original pixels — every output face pixel is diffusion-fresh, so the regenerated face inherently looks more AI-generated than the cleaned image; for production face preservation use the cleaned image as-is and leave restore OFF**), and `--auto` (+ `--adaptive-polish/--no-adaptive-polish`) for the content-adaptive quality mode (re-planned per image; one engine cached per resolved pipeline)
 ## Test and lint
@@ -46,7 +46,9 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `invisible_watermark.py` — `detect_invisible_watermark(path)` decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the `imwatermark` library. Known fixed patterns (verified against upstream source) live in `_BITS_48` (SDXL 48-bit, FLUX.2 48-bit) and `_SD1_STRING` ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra `detect`); returns None when absent. The `detect` extra pulls **torch** transitively (invisible-watermark declares torch a hard dep, and `WatermarkDecoder` eagerly imports `rivaGan` -> `torch` at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets — no GPU and no separate `gpu` extra required.
**Unlike SynthID this is locally detectable**, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs.
 - `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton guarded by a double-checked `threading.Lock` so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees.
The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.
 - `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). **`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.** No original pixels are copied or frozen, BUT **validation 2026-06-04 disproved the old "so SynthID does not survive" claim: SynthID CAN survive controlnet on photoreal/high-detail content.** At the shared low removal strength the canny edge-conditioning keeps the regeneration so close to the original that the pixel perturbation that destroys SynthID does not happen (oracle-confirmed: an OpenAI bracelet photo + a 9-face grid read **SynthID-detected** after controlnet at strength 0.10/0.15, but **SynthID-not-detected** after the `default` pipeline at the SAME strength + resolution -- only the pipeline differed). **But the reverse also holds: a flat-graphic logo/poster SURVIVED `default` while clearing controlnet** -- removal at the low strength is content×pipeline dependent and neither pipeline is universally safe; the real lever is a higher strength. See the controlnet Known-limitations bullet for the full table + root cause.
Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is regenerated by the optional `--restore-faces` PhotoMaker-V2 post-pass (EXPERIMENTAL, opt-in, OFF by default, **NON-COMMERCIAL** — needs the `photomaker` extra which pulls non-commercial InsightFace model packs) -- see `photomaker_restore.py`). `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once).
-- `photomaker_restore.py` — **NON-COMMERCIAL** PhotoMaker-V2 face-identity post-pass (cv2/torch/diffusers/photomaker boundary, top-of-file pyright pragma). **EXPERIMENTAL, opt-in via `--restore-faces`, OFF by default.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark` -> `_restore_faces_photomaker`). Carries identity in a CLIP+ArcFace embedding (PhotoMaker-V2's dual-encoder) and regenerates fresh face pixels conditioned on it; the pixels are diffusion-fresh, so SynthID is not re-introduced. Flow: YuNet detects faces in the CLEANED image; for each box, the SAME box from the ORIGINAL is square-cropped (`_face_crop_square`) and fed as `input_id_images` to `PhotoMakerStableDiffusionXLPipeline` (txt2img); the regenerated face is feather-composited back via `_composite_faces`. Lazy pipeline singleton (double-checked lock) downloads `photomaker-v2.bin` from `TencentARC/PhotoMaker-V2` on first use; PhotoMaker's `__init__.py` also instantiates a face-analyser class lazily, which downloads InsightFace antelopev2/buffalo_l packs on first inference (the non-commercial step).
Pure helpers (`_face_crop_square`, `_composite_faces`) are unit-tested without the model (`tests/test_photomaker_restore.py`); the model-running path is gated behind `is_available()` and exercised via the Modal cert sweep. fp16 on CUDA, fp32 on MPS/CPU. The previous commercial-safe `face_restore.py` (GFPGAN-on-cleaned) was removed 2026-06-04 because GFPGAN at this resolution only polished the already-drifted face without restoring identity (visually confirmed). PhotoMaker-V1 was also attempted as a commercial-safe path but blocked by a CFG batch-dim mismatch in the upstream pipeline (forked from diffusers 0.29; we ship 0.38) — see `docs/synthid-robust-identity-research.md` for the full chain.
+- **Face restore trade-off (load-bearing, 2026-06-08 empirical finding).** Every shipped face-restore method (`instantid_restore.py`, `photomaker_restore.py`) is **REGENERATION** of the face from an ArcFace embedding via SDXL diffusion, NOT recovery of the original pixels. Each output face pixel is diffusion-fresh, so the regenerated face inherently looks MORE AI-generated than the cleaned image it replaces — gloss, symmetric pores, generic SDXL "clean skin" aesthetic. The cleaned image from the main controlnet 0.20 removal pass is the LEAST-AI state we can get without re-introducing SynthID: it's a light denoise of the original, not a full regeneration. Every restore method (GFPGAN-on-cleaned: polish without identity recovery; PhotoMaker-V2 txt2img: different person; InstantID txt2img: studio portrait patchwork; InstantID img2img-on-cleaned: scene-integrated but still AI-look face) was empirically tested in the 2026-06-04 - 2026-06-08 cert sweeps; all share the same root: ArcFace encodes "general look", SDXL decode adds AI aesthetic. **For production face preservation, use the cleaned image AS-IS and leave restore OFF.** The extras are kept for research / personal use where users explicitly want identity regeneration even at the cost of AI-look.
+- `instantid_restore.py` — **NON-COMMERCIAL** InstantID face REGENERATION post-pass (cv2/torch/diffusers boundary, top-of-file pyright pragma). **EXPERIMENTAL, opt-in via `--restore-faces --restore-faces-method instantid`, OFF by default.** **Regenerates the face — does NOT preserve original pixels** (see the Face restore trade-off bullet above). Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark` -> `_restore_faces_instantid`). Carries identity in an ArcFace 512-d embedding + 5-keypoint landmark stick figure; pixels are diffusion-fresh so SynthID is not re-introduced. Flow: YuNet detects faces in the CLEANED image; for each box, the SAME box from BOTH original (ArcFace + kps) and cleaned (img2img source -- the cleaned image is oracle-clean, so unmasked-bbox pixels stay SynthID-free) are square-cropped + resized to 1024; landmark stick figure rendered from kps; `StableDiffusionXLInstantIDImg2ImgPipeline` (loaded via `custom_pipeline=` -- the upstream file is fetched from `raw.githubusercontent.com` on first use because it isn't on PyPI / HF Hub at a path diffusers auto-loads, requires `trust_remote_code=True`) runs img2img with the cleaned crop as source, landmark as ControlNet conditioning, ArcFace as IP-Adapter `image_embeds`. Elliptical-alpha + per-channel mean color-match composite. IP-Adapter scale set at LOAD via `load_ip_adapter_instantid(scale=1.0)` not at call. **Diffusers 0.38 compat patch:** the upstream pipeline calls `self.check_inputs(...)` positionally with the diffusers-~0.29 signature, but diffusers 0.38 added 2 new params before `controlnet_conditioning_scale` in the parent's check, shifting positional args by 2 -- the broadcasted `control_guidance_end=[1.0]` (list) ends up in the slot validated as `controlnet_conditioning_scale` and trips `TypeError("must be type float")`. We neutralise the check with `pipe.check_inputs = lambda *a, **k: None` (safe -- our inputs are programmatic). 
**antelopev2 download fix:** InsightFace's built-in URL has been broken since at least 2024 (upstream issue #2517, #2766; called out in InstantID README); `_ensure_antelopev2()` pulls the five `.onnx` files from `kidyu/antelopev2-for-InstantID-ComfyUI` on HF before `FaceAnalysis` init. Pure helpers (`_face_crop_square`, `_composite_faces_elliptical`, `_color_match`, `_draw_kps`) unit-tested without the model. **NON-COMMERCIAL because the runtime ArcFace embedder is InsightFace's antelopev2 pack which is research-only**, same chokepoint as PhotoMaker-V2 (`docs/synthid-robust-identity-research-2026-06-08.md`).
+- `photomaker_restore.py` — **NON-COMMERCIAL** PhotoMaker-V2 face-identity post-pass (cv2/torch/diffusers/photomaker boundary, top-of-file pyright pragma). **EXPERIMENTAL, opt-in via `--restore-faces --restore-faces-method photomaker`, OFF by default. Alternative to `instantid_restore.py` (the default restore method); both REGENERATE the face — see the Face restore trade-off bullet above for why neither is in prod use.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark` -> `_restore_faces_photomaker`). Carries identity in a CLIP+ArcFace embedding (PhotoMaker-V2's dual-encoder) and regenerates fresh face pixels conditioned on it; the pixels are diffusion-fresh, so SynthID is not re-introduced. Flow: YuNet detects faces in the CLEANED image; for each box, the SAME box from the ORIGINAL is square-cropped (`_face_crop_square`) and fed as `input_id_images` to `PhotoMakerStableDiffusionXLPipeline` (txt2img); the regenerated face is feather-composited back via `_composite_faces`. Lazy pipeline singleton (double-checked lock) downloads `photomaker-v2.bin` from `TencentARC/PhotoMaker-V2` on first use; PhotoMaker's `__init__.py` also instantiates a face-analyser class lazily, which downloads InsightFace antelopev2/buffalo_l packs on first inference (the non-commercial step).
Pure helpers (`_face_crop_square`, `_composite_faces`) are unit-tested without the model (`tests/test_photomaker_restore.py`); the model-running path is gated behind `is_available()` and exercised via the Modal cert sweep. fp16 on CUDA, fp32 on MPS/CPU. The previous commercial-safe `face_restore.py` (GFPGAN-on-cleaned) was removed 2026-06-04 because GFPGAN at this resolution only polished the already-drifted face without restoring identity (visually confirmed). PhotoMaker-V1 was also attempted as a commercial-safe path but blocked by a CFG batch-dim mismatch in the upstream pipeline (forked from diffusers 0.29; we ship 0.38) — see `docs/synthid-robust-identity-research.md` for the full chain.
 - `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). **CAVEAT (oracle-validated 2026-06-04, see the controlnet Known-limitations bullet): at the low vendor-adaptive strength NEITHER pipeline removes SynthID on all content -- it is content×pipeline dependent (photoreal SURVIVES controlnet / clears default; flat graphics SURVIVE default / clear controlnet; flat text clears both).
So `--auto` picking controlnet for faces/photos leaves SynthID on exactly those, and plain `default` would leave it on flat graphics -- pipeline choice alone does NOT guarantee removal. The real lever is a HIGHER strength, oracle-validated per content type. Removal-priority callers (raiw.cc) must oracle-validate strength across content types BEFORE adopting auto; the "must keep SynthID removed" gate in the adoption note below is the blocker this caught.** `restore_faces` is on when a face is present. When a smoothing pass (controlnet/restore) ran, the **adaptive polish** (`humanizer.adaptive_polish`) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while **sparing text** (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). **Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, **DBNet** (PP-OCRv3 differentiable-binarization via `cv2.dnn.TextDetectionModel_DB`, a 2.4 MB Apache-2.0 model bundled at `assets/text_detection_ppocrv3_2023may.onnx`) for text, with the old Canny+MSER region heuristic kept as a fallback if the DBNet model can't load (`_detect_text_dbnet` returns None → `_detect_text_mser`). The en/cn opencv_zoo PP-OCRv3 detection models are byte-identical, so it is bundled language-neutral. Text only ever ADDS controlnet, so a miss is backstopped by edge-density and a false positive only costs a controlnet run. Plus `edge_density`. `min_resolution` stays 1024. 
**Every auto decision is independently overridable** (interface principle): `_apply_auto` (cli.py) overrides only the three content-adaptive modes the user left at their click default (`ctx.get_parameter_source(...) == DEFAULT`) — `--pipeline`, `--restore-faces`/`--no-restore-faces`, and **`--adaptive-polish`/`--no-adaptive-polish`** always win; `--min-resolution`/`--strength`/`--unsharp`/`--humanize` are independent knobs. `--adaptive-polish` also works WITHOUT `--auto` (manual detail-targeted polish; the engine's `adaptive_polish` param uses the full-res original as the detail reference). Prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible`/`cmd_batch` — in `batch` the plan is recomputed per image and the invisible engine is cached **per resolved pipeline** (`ctx.obj["_inv_engines"]`, keyed `default`/`controlnet`) instead of a single shared instance, so a mixed directory builds at most one engine of each kind. **Adds ZERO new pip deps** (all cv2 core + the bundled MIT YuNet + Apache-2.0 DBNet models + the cv2-only adaptive polish). The auto plan does NOT select the `esrgan` upscaler (that needs the optional extra and would make auto's behavior install-dependent); `--upscaler esrgan` stays a separate manual knob. Unit-tested without a heavy download (`tests/test_auto_config.py`): flat/text synthetic images for routing (the bundled DBNet fires on a real text card), monkeypatched `detect_face`/`_detect_text_dbnet`/`_detect_text_mser` for the face/text/fallback branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`.
 - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma).
`is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps). **ESRGAN is a generic photo/texture GAN with no face/glyph prior**, so it best fits photo/texture content and can degrade faces (glassy/asymmetric eyes -- the diffusion pass regenerates faces so the full-pipeline final recovers; `--restore-faces` is the polish path on top of that) and thin/small text (the GAN invents wrong strokes, and low-strength diffusion will not fix it). Verified 2026-06-04: isolated upscale lap-var ~5x Lanczos on faces+textures but glassy eyes; end-to-end `invisible` final lap-var 1634 vs Lanczos 663 with natural faces (diffusion cleaned the artifact). Kept a **manual opt-in knob** (the auto plan never selects it) with `lanczos` the default; not content-gated by design (use Lanczos for text-heavy inputs). spandrel is MIT and pulls no basicsr. Unit-tested without the model: `tests/test_upscaler.py` (availability guard + the not-installed RuntimeError) and `tests/test_invisible_engine.py::TestEsrganUpscale` (the three `_esrgan_upscale` branches via a monkeypatched `upscaler`).
 - `image_io.py` — Unicode-safe cv2 IO (issue #17).
`imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). **Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** `imwrite` returns `False` on an unwritable path (`OSError` caught) instead of raising, matching `cv2.imwrite` semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.
@@ -92,4 +94,4 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 - **External AI-vs-real classifier models are out of scope (decided 2026-05-24).** Generic HuggingFace detectors (`Organika/sdxl-detector` Swin Transformer, `umm-maybe/AI-image-detector`, and fine-tunes) exist and report ~0.98 on their *own* SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID. Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency.
- **DEFAULT STRENGTH IS NOW VENDOR-ADAPTIVE (2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next).** `resolve_strength(strength, profile, vendor)` + `vendor_for_strength(path)` (`watermark_profiles.py`) read the C2PA issuer (`metadata.synthid_source`) on the ORIGINAL input and pick `OPENAI_STRENGTH` **0.10** / `GEMINI_STRENGTH` **0.15** / `UNKNOWN_STRENGTH` **0.15** when `--strength` is unset; explicit `--strength` always wins. The CLI detects the vendor from the pristine source (before the visible pass / metadata-strip removes C2PA from the temp file) and passes it to the engine, so display and execution agree; `cmd_invisible`/`cmd_all`/`batch` + the module-level `remove_watermark` all thread `vendor`. **This replaces the single 0.30 default AND the prior "do NOT build a vendor-adaptive default" policy** -- both came from the now-debunked region-rescrub-contaminated study (the per-region re-scrub that contaminated those numbers was removed in the controlnet refactor). Basis: the oracle-verified June 2026 controlled study (clean v0.8.6, protect OFF): OpenAI clears at 0.05 across 1024-1600 (n=4, resolution-independent); Google needs 0.15 on the capped-1536 path (n=4). `docs/synthid.md` §2.2 (data) + §5.2 (the adaptive default) are authoritative. **CAVEAT (oracle pass 2026-06-04): the OpenAI 0.10 default is content-dependent, NOT universal -- a flat-graphic OpenAI logo/poster still read SynthID-detected after `default` at 0.10, and photoreal images after controlnet at 0.10/0.15 (low-change regions under-perturbed). Removal at 0.10/0.15 is content×pipeline dependent (see the controlnet Known-limitations bullet); the lever is a higher strength, oracle-revalidated per content type. 
Do NOT assume the vendor-adaptive default clears every image.** CAVEAT: Google's 0.15 was validated only on `--max-resolution 1536`; native large Gemini (2816) was not locally measurable (OOM on M-series) and is pending GPU validation on raiw.cc -- if it survives 0.15 native, raise `--strength`. **Everything below in this bullet about a fixed 0.10/0.30 default is HISTORICAL; trust the vendor-adaptive constants + docs/synthid.md.**
 - **SynthID removal: strength + oracle scope.** Default strength is vendor-adaptive (see the bullet above); `docs/synthid.md` §2.2 is authoritative for the numbers. **Oracle scope (load-bearing):** the Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle (detects Google's mark on any image); `openai.com/verify` is scoped to OpenAI provenance (its own C2PA), NOT a SynthID oracle -- a negative there is meaningless for SynthID. There is no local SynthID detector, so the tool cannot self-check; if the oracle still reads SynthID, raise `--strength` to the lowest value that verifies clean. Only the `default` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet) profiles exist; the local `invisible` default is weight-for-weight identical to raiw.cc prod (`fal-ai/fast-sdxl` = `stabilityai/stable-diffusion-xl-base-1.0`, runtime-downloaded, not bundled). **Forensic-stealth caveat** (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility -- independent detectors flag *removal-processed* images vs genuinely-clean ones at >98% TPR@1%FPR, so do not over-claim "indistinguishable from a real photo".
-- **`controlnet` pipeline (text/face STRUCTURE preservation, EXPERIMENTAL, opt-in `--pipeline controlnet`).** SDXL + the canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` via `StableDiffusionXLControlNetImg2ImgPipeline` (`watermark_remover._run_controlnet` / `_load_controlnet_pipeline`).
**Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map** (`cv2.Canny(gray, 100, 200)`, 3-channel). Canny preserves edges, NOT face identity (a regenerated face drifts in likeness); face identity is regenerated by the optional `--restore-faces` PhotoMaker-V2 post-pass (EXPERIMENTAL, opt-in, OFF by default, **NON-COMMERCIAL** — see `photomaker_restore.py`, the `photomaker` extra). PhotoMaker-V2 carries identity in a CLIP+ArcFace embedding and regenerates fresh face pixels conditioned on it, so SynthID is not re-introduced; the non-commercial restriction is from InsightFace's research-only model packs. The CodeFormer alternative stays NON-COMMERCIAL and is not shipped. The earlier `--face-id` IP-Adapter FaceID layer was REMOVED (footgun: it needs high strength and corrupts faces at the low removal strength). No original pixels are copied or frozen, **BUT removal at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated against the OpenAI verifier 2026-06-04 (8 images, strength 0.10/0.15, `--max-resolution 1536`).** The survivors FLIP by content type: **photoreal** (a 9-face grid, a bracelet product photo) SURVIVES controlnet but CLEARS `default` (controlnet's dense edge map keeps the regen too close to the original, so the SynthID-destroying perturbation never happens; plain img2img perturbs photoreal texture enough); **flat graphic** (a logo/poster with large flat color fills) SURVIVES `default` but CLEARS controlnet (at low strength img2img barely changes flat fills so SynthID persists there, while controlnet repaints them more freely); a flat **text** card cleared under both. **Root cause is insufficient STRENGTH, not the pipeline: at 0.10 the low-change regions -- dense-edge photoreal under controlnet, large flat fills under `default` -- are not perturbed enough to destroy SynthID. 
The vendor-adaptive 0.10 from the June study is NOT universally sufficient (that study's content happened to clear at 0.10).** The robust fix is a HIGHER strength, oracle-revalidated per content type (controlnet can be cranked harder without losing structure; a lower `controlnet_conditioning_scale` also frees the regen on photoreal). So at today's default strength **both pipelines AND `--auto` can LEAVE SynthID on some content** -- a removal-priority caller (raiw.cc) MUST oracle-validate strength across content types before adopting, not pick a pipeline and assume removal. **Follow-up same day: re-running the two photoreal survivors through controlnet at an explicit `--strength 0.15` cleared BOTH on the oracle -- BUT one of them (the bracelet) had SURVIVED the SAME 0.15 controlnet config in the first pass (only the random, unset seed differed). So removal near the threshold is SEED-NON-DETERMINISTIC: the same image+pipeline+strength+resolution can pass or fail run-to-run (img2img uses `seed=None`/random unless `--seed` is passed, and there is no local SynthID detector to self-verify). 0.15 is the borderline, NOT a robust floor -- pick a strength with MARGIN (controlnet ~>= 0.20) rather than exactly on it; the content×pipeline table's 0.15 data point is near-threshold noise. A confirming run at `--strength 0.20` controlnet cleared BOTH photoreal survivors on the oracle (ladder: 0.10 grid detected → 0.15 borderline/non-deterministic → 0.20 both clean), so **0.20 is the recommended robust controlnet floor for OpenAI photoreal** (one margin run, not an N-run repeatability proof -- a service should add margin or verify repeatability since there is no local SynthID detector to self-check). 
**Engineering follow-up for raiw.cc: the controlnet pipeline should use a HIGHER vendor strength than `default` -- it currently shares `resolve_strength` (0.10/0.15, tuned for plain img2img), but controlnet's edge map preserves structure so it needs ~0.20+; calibrate per vendor/content on the GPU worker, do NOT just reuse the `default` ladder.** **CERTIFIED 2026-06-04 via the isolated `raiw-controlnet-cert` Modal app (`raiw-app/modal_cert.py`), restore OFF, ≤1536, each vendor on its own oracle: controlnet floors are OpenAI 0.20 (2 photoreal × 3 seeds = 6/6 clean; the 0.15-flipper is seed-robust at 0.20) and Gemini 0.30 (0.20 detected → 0.30 clean on 2/2 seeds). OpenAI 0.20 transfers to prod (resolution-independent); Gemini 0.30 holds only ≤1536 — Gemini is resolution-sensitive and raiw.cc runs NATIVE (`max_resolution=0`), so cap Gemini ≤1536 + use 0.30, or native-calibrate (~0.35+). Prod recipe: controlnet + per-vendor floor in `resolve_strength` (not the default ladder) + FIXED seed (kills the non-determinism). No `--restore-faces` in prod (the only shipped restore method is PhotoMaker-V2, which is NON-COMMERCIAL); faces drift in identity on the canny controlnet path -- this is the open issue for a paid service.** See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (certified floors table).** **Lesson: visual-quality + face-recovery validation does NOT prove watermark removal -- only the SynthID oracle does, across MULTIPLE content types; never infer removal from sharpness/identity, and never conclude from a partial result (the photoreal-only data first read as "controlnet shields, default removes" -- the flat-graphic result reversed it).** `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. 
The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. +- **`controlnet` pipeline (text/face STRUCTURE preservation, EXPERIMENTAL, opt-in `--pipeline controlnet`).** SDXL + the canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` via `StableDiffusionXLControlNetImg2ImgPipeline` (`watermark_remover._run_controlnet` / `_load_controlnet_pipeline`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map** (`cv2.Canny(gray, 100, 200)`, 3-channel). Canny preserves edges, NOT face identity (a regenerated face drifts in likeness). The drifted cleaned face is the LEAST-AI state we can reach without re-introducing SynthID; the optional `--restore-faces` post-pass (`instantid` default or `photomaker`, both NON-COMMERCIAL) further REGENERATES the face from an ArcFace embedding via SDXL diffusion, which makes the output face look MORE AI-generated, not less. **For production face preservation use the cleaned image AS-IS and leave restore OFF** — see the Face restore trade-off bullet above and `instantid_restore.py` / `photomaker_restore.py`. The CodeFormer alternative stays NON-COMMERCIAL and is not shipped. The earlier `--face-id` IP-Adapter FaceID layer was REMOVED (footgun: it needs high strength and corrupts faces at the low removal strength). 
No original pixels are copied or frozen, **BUT removal at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated against the OpenAI verifier 2026-06-04 (8 images, strength 0.10/0.15, `--max-resolution 1536`).** The survivors FLIP by content type: **photoreal** (a 9-face grid, a bracelet product photo) SURVIVES controlnet but CLEARS `default` (controlnet's dense edge map keeps the regen too close to the original, so the SynthID-destroying perturbation never happens; plain img2img perturbs photoreal texture enough); **flat graphic** (a logo/poster with large flat color fills) SURVIVES `default` but CLEARS controlnet (at low strength img2img barely changes flat fills so SynthID persists there, while controlnet repaints them more freely); a flat **text** card cleared under both. **Root cause is insufficient STRENGTH, not the pipeline: at 0.10 the low-change regions -- dense-edge photoreal under controlnet, large flat fills under `default` -- are not perturbed enough to destroy SynthID. The vendor-adaptive 0.10 from the June study is NOT universally sufficient (that study's content happened to clear at 0.10).** The robust fix is a HIGHER strength, oracle-revalidated per content type (controlnet can be cranked harder without losing structure; a lower `controlnet_conditioning_scale` also frees the regen on photoreal). So at today's default strength **both pipelines AND `--auto` can LEAVE SynthID on some content** -- a removal-priority caller (raiw.cc) MUST oracle-validate strength across content types before adopting, not pick a pipeline and assume removal. **Follow-up same day: re-running the two photoreal survivors through controlnet at an explicit `--strength 0.15` cleared BOTH on the oracle -- BUT one of them (the bracelet) had SURVIVED the SAME 0.15 controlnet config in the first pass (only the random, unset seed differed). 
So removal near the threshold is SEED-NON-DETERMINISTIC: the same image+pipeline+strength+resolution can pass or fail run-to-run (img2img uses `seed=None`/random unless `--seed` is passed, and there is no local SynthID detector to self-verify). 0.15 is the borderline, NOT a robust floor -- pick a strength with MARGIN (controlnet ~>= 0.20) rather than exactly on it; the content×pipeline table's 0.15 data point is near-threshold noise. A confirming run at `--strength 0.20` controlnet cleared BOTH photoreal survivors on the oracle (ladder: 0.10 grid detected → 0.15 borderline/non-deterministic → 0.20 both clean), so **0.20 is the recommended robust controlnet floor for OpenAI photoreal** (one margin run, not an N-run repeatability proof -- a service should add margin or verify repeatability since there is no local SynthID detector to self-check). **Engineering follow-up for raiw.cc: the controlnet pipeline should use a HIGHER vendor strength than `default` -- it currently shares `resolve_strength` (0.10/0.15, tuned for plain img2img), but controlnet's edge map preserves structure so it needs ~0.20+; calibrate per vendor/content on the GPU worker, do NOT just reuse the `default` ladder.** **CERTIFIED 2026-06-04 via the isolated `raiw-controlnet-cert` Modal app (`raiw-app/modal_cert.py`), restore OFF, ≤1536, each vendor on its own oracle: controlnet floors are OpenAI 0.20 (2 photoreal × 3 seeds = 6/6 clean; the 0.15-flipper is seed-robust at 0.20) and Gemini 0.30 (0.20 detected → 0.30 clean on 2/2 seeds). OpenAI 0.20 transfers to prod (resolution-independent); Gemini 0.30 holds only ≤1536 — Gemini is resolution-sensitive and raiw.cc runs NATIVE (`max_resolution=0`), so cap Gemini ≤1536 + use 0.30, or native-calibrate (~0.35+). Prod recipe: controlnet + per-vendor floor in `resolve_strength` (not the default ladder) + FIXED seed (kills the non-determinism). 
No `--restore-faces` in prod -- not because of the license (both shipped methods are NON-COMMERCIAL) but because **regeneration via ArcFace embedding makes the face look MORE AI-generated, not less**: every restore method tested (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned, 2026-06-04 - 2026-06-08 cert sweeps) yields a diffusion-fresh face that loses original identity precision and gains SDXL "clean skin" gloss. The drifted face from controlnet 0.20 is the least-AI state we can reach; for a paid service that's the prod output. See the Face restore trade-off bullet.** See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (certified floors table).** **Lesson: visual-quality + face-recovery validation does NOT prove watermark removal -- only the SynthID oracle does, across MULTIPLE content types; never infer removal from sharpness/identity, and never conclude from a partial result (the photoreal-only data first read as "controlnet shields, default removes" -- the flat-graphic result reversed it).** `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. 
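A minimal sketch of the prod recipe in the CLAUDE.md note above (controlnet, per-vendor floor, fixed seed). Only the numbers come from the certification note: OpenAI 0.20, Gemini 0.30 at ≤1536. The names `CONTROLNET_FLOORS`, `RESOLUTION_SENSITIVE`, `CERT_MAX_RESOLUTION`, and `resolve_controlnet_strength` are hypothetical; the real `resolve_strength` signature is not shown in this diff, so treat this as an illustration of the rule, not the repo's code.

```python
from typing import Optional

# Certified controlnet floors from the 2026-06-04 note (restore OFF, <=1536).
# The table name and layout are illustrative assumptions.
CONTROLNET_FLOORS = {"openai": 0.20, "gemini": 0.30}

# Vendors whose floor was certified only up to the capped resolution.
RESOLUTION_SENSITIVE = {"gemini"}
CERT_MAX_RESOLUTION = 1536


def resolve_controlnet_strength(
    vendor: str, requested: Optional[float], max_resolution: int
) -> float:
    """Return a strength that never drops below the certified floor.

    Raises ValueError when no certified floor applies: an unknown vendor, or a
    resolution-sensitive vendor running native (max_resolution=0) or above the cap.
    """
    key = vendor.lower()
    if key not in CONTROLNET_FLOORS:
        raise ValueError(f"no certified controlnet floor for vendor {vendor!r}")

    native_or_above_cap = max_resolution == 0 or max_resolution > CERT_MAX_RESOLUTION
    if key in RESOLUTION_SENSITIVE and native_or_above_cap:
        # The note says Gemini needs a cap at 1536 or a separate native calibration.
        raise ValueError(
            f"{vendor!r} floor is certified only at <= {CERT_MAX_RESOLUTION}px; "
            "cap the resolution or calibrate at native resolution first"
        )

    floor = CONTROLNET_FLOORS[key]
    # An explicit request may raise the strength but never lower it below the floor.
    return floor if requested is None else max(requested, floor)
```

Refusing the uncalibrated Gemini-native case, rather than guessing a value such as 0.35, keeps the sketch within what the certification actually covered. Per the note, the seed should also be passed explicitly (`--seed`) so a certified-clean run is reproducible.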
diff --git a/README.md b/README.md index a7d5555..911ab48 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType - **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph) - **Analog Humanizer** — optional film grain and chromatic aberration post-processing -- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness); identity is regenerated by the `--restore-faces` PhotoMaker-V2 post-pass (opt-in, **NON-COMMERCIAL** — pulls non-commercial InsightFace model packs). Both are experimental and off by default. +- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The optional `--restore-faces` post-pass (`instantid` default or `photomaker`, both **NON-COMMERCIAL**, off by default) does NOT recover the original face — it **regenerates** it from an ArcFace embedding via SDXL diffusion, which inherently makes the output look more AI-generated than the cleaned image. 
**For production face preservation, leave restore OFF and use the cleaned image as-is.** - **Batch processing** — process entire directories - **Detection** — three-stage NCC watermark detection with confidence scoring - **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output) @@ -128,7 +128,7 @@ image → encode to latent space (VAE) at native resolution > > **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically). > -> **`--restore-faces` regenerates faces from a CLIP+ArcFace embedding (PhotoMaker-V2, experimental, opt-in, NON-COMMERCIAL).** Canny preserves where a face is, but not who it is — the regenerated face drifts in likeness. 
The `--restore-faces` post-pass (experimental, off by default; needs the `photomaker` extra) crops each face from the original, feeds it to PhotoMaker-V2 as an identity reference, and regenerates a fresh face from a CLIP+ArcFace embedding which is then feather-composited into the cleaned image. The pixels are diffusion-fresh so SynthID is not re-introduced. **NON-COMMERCIAL:** PhotoMaker-V2's ID encoder pulls InsightFace antelopev2/buffalo_l model packs at runtime, which are released under a research-only license — a paid service must NOT use this flag. (A commercial-safe path was attempted via PhotoMaker-V1 + GFPGAN-on-cleaned but neither was a good fit: V1 hit upstream / diffusers-0.38 compatibility walls, and GFPGAN only polished the already-drifted face without restoring identity. See `docs/synthid-robust-identity-research.md`.) +> **`--restore-faces` REGENERATES faces; it does NOT recover original pixels.** Two methods, both **NON-COMMERCIAL**, both off by default: `instantid` (default, the `instantid` extra; InstantID img2img-on-cleaned + ArcFace embedding + landmark ControlNet) and `photomaker` (the `photomaker` extra; PhotoMaker-V2 txt2img + CLIP+ArcFace embedding). Both crop the face region from the cleaned image and run SDXL diffusion conditioned on an ArcFace embedding from the original — the output face pixels are diffusion-fresh so SynthID is not re-introduced, **but the output face inherently looks more AI-generated than the cleaned image**: every pixel is SDXL-decoded from a semantic embedding, gaining the typical "clean skin" gloss and losing the exact original identity. The cleaned image from the main controlnet 0.20 pass is the least-AI state we can reach without re-introducing SynthID; any restore on top of it trades original-look for embedding-driven regeneration. 
**For production face preservation, leave `--restore-faces` OFF.** Both extras are NON-COMMERCIAL because their ArcFace embedder is InsightFace's antelopev2 pack which is research-only; the empirical case for not shipping them in prod is the AI-look regardless of license (see `docs/synthid-robust-identity-research-2026-06-08.md`). SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 Pro outputs, where the older SD-1.5 pipeline at 768 px did not. The SD-1.5 path was removed once it was verified not to handle v2. Note the scope: this defeats the SynthID *verifier*, which is not the same as being forensically indistinguishable from a real photo. Recent work ([arXiv:2605.09203](https://arxiv.org/abs/2605.09203)) shows watermark-removal pipelines leave detectable traces, so a separate "this image was processed" classifier can still flag the output. @@ -136,7 +136,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P > **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203). -**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity* (identity is regenerated by the `--restore-faces` PhotoMaker-V2 post-pass, experimental and off by default, **non-commercial** — see the callout above). Both features are experimental. 
+**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The optional `--restore-faces` post-pass would regenerate the face from an ArcFace embedding, but every shipped method makes the face look more AI-generated (see the callout above) — for production face preservation leave restore OFF. **Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.) diff --git a/docs/controlnet-removal-pipeline-research.md b/docs/controlnet-removal-pipeline-research.md index 68beac0..53ca02d 100644 --- a/docs/controlnet-removal-pipeline-research.md +++ b/docs/controlnet-removal-pipeline-research.md @@ -124,14 +124,19 @@ Gemini app; the two payloads are vendor-specific and never cross-checked): - **Fix the seed in prod.** The non-determinism is purely `seed=None` (random); a fixed `--seed` makes every run reproduce the certified-clean result, so you ship a deterministic, re-certifiable config (and the seed sweep collapses to one config). -- **`--restore-faces` is PhotoMaker-V2 (NON-COMMERCIAL).** The GFPGAN-on-cleaned path - was tried and rejected: it polished but did not restore identity. PhotoMaker-V2 - regenerates faces from a CLIP+ArcFace embedding (so pixels are fresh, SynthID is not - re-introduced) but pulls InsightFace antelopev2/buffalo_l model packs at runtime, - which are research-only. 
Needs the `photomaker` extra; **a paid service MUST NOT - use this flag.** PhotoMaker-V1 was attempted as a commercial-safe alternative but - blocked by a CFG batch-dim mismatch in the upstream pipeline (forked from diffusers - 0.29; we ship 0.38) — see `docs/synthid-robust-identity-research.md`. +- **`--restore-faces` is OFF in prod and stays opt-in.** Two methods ship + (`instantid` default, `photomaker`), both NON-COMMERCIAL. They REGENERATE the face + from an ArcFace embedding via SDXL diffusion, making the output face look more + AI-generated than the cleaned image (gloss, symmetric pores, SDXL "clean skin" + aesthetic). For production face preservation the cleaned image from controlnet 0.20 + is the LEAST-AI state we can reach — any restore on top trades original-look for + embedding-driven regeneration. Empirical sweep summary: GFPGAN-on-cleaned polished + without identity recovery; PhotoMaker-V2 produced a different person; InstantID + txt2img produced studio-portrait patchwork on group photos; InstantID + img2img-on-cleaned with three parameter settings integrated scene context cleanly + but never recovered original identity precisely — every setting traded one problem + for another. See `docs/synthid-robust-identity-research-2026-06-08.md` + "Empirical follow-up" for the full sweep. - **No local SynthID detector exists** → the service can't self-verify; bake in strength margin and periodic oracle spot-checks. 
 - **Lesson:** visual-quality / face-identity recovery does NOT prove removal — only the
diff --git a/docs/synthid-robust-identity-research-2026-06-08.md b/docs/synthid-robust-identity-research-2026-06-08.md
index de0e55c..009a1a3 100644
--- a/docs/synthid-robust-identity-research-2026-06-08.md
+++ b/docs/synthid-robust-identity-research-2026-06-08.md
@@ -126,4 +126,60 @@ Six claims were refuted in adversarial verification, two of them load-bearing: A
 - [source](https://github.com/IrvingMeng/MagFace/blob/main/LICENSE)
 - [source](https://github.com/askerlee/AdaFace-dev)
 - [source](https://openreview.net/forum?id=Hc2ZwCYgmB)
-- [source](https://github.com/tencent-ailab/IP-Adapter/wiki/IP%E2%80%90Adapter%E2%80%90Face)
\ No newline at end of file
+- [source](https://github.com/tencent-ailab/IP-Adapter/wiki/IP%E2%80%90Adapter%E2%80%90Face)
+
+## Empirical follow-up (2026-06-08, end of session)
+
+After the research synthesis above, InstantID was integrated end-to-end and cert-swept
+on Modal A100 in two phases:
+
+1. **Phase 1: InstantID txt2img per-face crop + composite.** Per-face InstantID
+   txt2img with the upstream `pipeline_stable_diffusion_xl_instantid`, ArcFace
+   embedding from the original face, landmark stick figure. Three composite
+   iterations:
+   - v1 (rectangular Gaussian alpha on the 2x square_box around each face):
+     visible patchwork on group photos, generated 1024 backgrounds clashing.
+   - v2 (tight crop on YuNet-detected face in the generated 1024 + elliptical
+     alpha 0.45*bw x 0.55*bh + soft feather): ellipse axis exceeded bbox
+     vertically, clipped forehead/chin on single portrait, group still had
+     visible elliptical seams + cool-vs-warm tone clash with scene.
+   - v3 (tighter ellipse 0.32*bw x 0.42*bh + per-channel mean color match to
+     local cleaned canvas + softer feather): patchwork visually softened; faces
+     still read as studio portraits inserted into the scene, not as people
+     shot in the scene. Single portrait identity drifted (tatsunari -> "round
+     Asian male" vs original's thin face).
+2. **Phase 2: InstantID img2img on cleaned crop.** Switched to the upstream
+   `pipeline_stable_diffusion_xl_instantid_img2img` (downloaded at first use
+   from raw.githubusercontent.com; requires `trust_remote_code=True`). Same
+   ArcFace + landmark conditioning but the SDXL diffusion source is the
+   CLEANED face crop, so the diffusion sees scene lighting / shoulders /
+   shadow direction directly. Multi-face composition jumped substantially:
+   faces sit in the bar scene with matching warm tone, no more elliptical
+   seams. Single-portrait identity at the default (`strength=0.55`,
+   `ip_adapter_scale=0.8`, `controlnet_conditioning_scale=0.8`) was "similar
+   person, not exactly the original"; raising to `strength=0.7`,
+   `ip_adapter_scale=1.0`, `controlnet_scale=1.0` brought identity closer to
+   original but introduced more "SDXL gloss / clean skin" aesthetic.
+
+**Net finding for raiw.cc (load-bearing).** The fundamental issue is structural:
+ArcFace encodes "this person's general look" (ethnicity, gender, basic facial
+geometry) at 512 dimensions; SDXL decodes that embedding into pixels with the
+inherent SDXL aesthetic (smooth skin, symmetric pores, AI-photoreal look).
+Stronger identity push (higher strength / IP-Adapter scale) makes the face
+CLOSER to the embedded identity but MORE AI-looking; weaker push leaves
+identity to drift but face looks less AI-generated. There is no parameter
+setting that simultaneously recovers original identity AND looks less AI than
+the cleaned image, because the cleaned image is itself a controlnet-light
+denoise of the original (closer to original pixels) while a restore pass is a
+full SDXL regeneration (further from original pixels).
+
+**Operational conclusion.** Do not ship `--restore-faces` in any monetized
+deployment. The cleaned image from the main controlnet 0.20 pass is the
+LEAST-AI state we can reach without re-introducing SynthID; every restore
+method tested (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img,
+InstantID img2img-on-cleaned at three parameter sweeps) trades original-look
+for embedding-driven regeneration and makes the face read as "AI-generated"
+rather than "the original person". The `instantid` and `photomaker` extras
+stay in the library as opt-in for research / personal use where users
+explicitly want identity regeneration; the CLI flag and module docstrings
+state the trade-off at every entry point.
\ No newline at end of file
diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py
index 9613bc9..06465b0 100644
--- a/src/remove_ai_watermarks/cli.py
+++ b/src/remove_ai_watermarks/cli.py
@@ -238,31 +238,37 @@ def _warn_if_esrgan_unavailable(upscaler: str) -> None:
 def _restore_faces_options(f: Any) -> Any:
     """Attach the face-restoration flags to an invisible-pipeline command.
 
-    Two methods. ``instantid`` (default; the `instantid` extra) regenerates each
-    face from an ArcFace embedding + landmark ControlNet -- semantic identity
-    plus weak spatial control, no original pixels. ``photomaker`` (the
-    `photomaker` extra) uses PhotoMaker-V2's CLIP+ArcFace dual encoder.
-    **BOTH ARE NON-COMMERCIAL**: they pull InsightFace antelopev2 / buffalo_l
-    model packs at runtime, which are research-only. A paid service (raiw.cc,
-    any monetized SaaS) MUST NOT use this flag.
+    Both methods REGENERATE the face from an ArcFace embedding via SDXL diffusion
+    -- they do NOT recover original pixels. Every output face pixel is
+    diffusion-fresh, so the regenerated face inherently looks MORE AI-generated
+    than the cleaned image (gloss, symmetric pores, SDXL "clean skin"
+    aesthetic). For production face preservation, leave the flag OFF and use
+    the cleaned image as-is. The two methods are kept for research / personal
+    use where users explicitly want identity regeneration. **BOTH are
+    NON-COMMERCIAL**: they pull InsightFace antelopev2 / buffalo_l model packs
+    which are research-only. A paid service (raiw.cc, any monetized SaaS) MUST
+    NOT use this flag.
     """
     method = click.option(
         "--restore-faces-method",
         type=click.Choice(["instantid", "photomaker"]),
         default="instantid",
-        help="Face-restore mechanism. 'instantid' (default) uses InstantID's ArcFace + "
-        "landmark ControlNet for stronger identity fidelity on single portraits. "
-        "'photomaker' uses PhotoMaker-V2's CLIP+ArcFace dual encoder. **BOTH are "
-        "NON-COMMERCIAL** (InsightFace antelopev2 / buffalo_l model packs are "
-        "research-only). Pick whichever extra you've installed; for personal / research "
-        "use only. Do NOT use in a paid service.",
+        help="Face-regeneration mechanism (no method recovers original pixels; both "
+        "REGENERATE the face via SDXL). 'instantid' (default) uses InstantID img2img on "
+        "the cleaned crop with ArcFace + landmark ControlNet. 'photomaker' uses "
+        "PhotoMaker-V2 txt2img + CLIP+ArcFace dual encoder. **BOTH are NON-COMMERCIAL** "
+        "(InsightFace antelopev2 / buffalo_l packs are research-only). For personal / "
+        "research use only.",
     )(f)
     return click.option(
         "--restore-faces/--no-restore-faces",
         default=False,
-        help="EXPERIMENTAL, opt-in, **NON-COMMERCIAL**. Restore face identity via the "
-        "chosen --restore-faces-method (default: instantid); off by default, auto-skips "
-        "when no face is detected or the chosen extra is absent.",
+        help="EXPERIMENTAL, opt-in, **NON-COMMERCIAL**. **REGENERATES the face** (does "
+        "NOT recover original pixels) via the chosen --restore-faces-method; the "
+        "regenerated face looks more AI-generated than the cleaned image. Off by "
+        "default; auto-skips when no face is detected or the chosen extra is absent. "
+        "For production face preservation leave this OFF and use the cleaned image "
+        "as-is.",
     )(method)
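The flag semantics in the hunk above (off by default, a paired `--no-` form, a two-value method choice, auto-skip when no face is found or the extra is missing) can be sketched with the standard library. This uses `argparse` in place of `click` so it runs without third-party packages; `build_parser` and `should_restore` are hypothetical helpers, not functions from `cli.py`, and the skip rule is only a reading of the help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stdlib stand-in for the two click options attached by the helper above."""
    parser = argparse.ArgumentParser(prog="restore-flags-demo")
    parser.add_argument(
        "--restore-faces",
        action=argparse.BooleanOptionalAction,  # also registers --no-restore-faces
        default=False,
        help="Regenerates the face; does not recover original pixels. Off by default.",
    )
    parser.add_argument(
        "--restore-faces-method",
        choices=["instantid", "photomaker"],
        default="instantid",
        help="Face-regeneration mechanism (both are NON-COMMERCIAL).",
    )
    return parser


def should_restore(args: argparse.Namespace, face_count: int, extra_installed: bool) -> bool:
    """Run the post-pass only when it is requested, a face exists, and the extra is present."""
    return bool(args.restore_faces) and face_count > 0 and extra_installed
```

With no arguments the parser yields `restore_faces=False` and `restore_faces_method="instantid"`, which matches the production guidance in the help text: the cleaned image is used as-is unless the caller opts in.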