From dfa518130934a0c489a7b0d84b7abd094b7df56e Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Mon, 8 Jun 2026 16:05:58 -0700
Subject: [PATCH] fix(photomaker): switch to V1 -- V2 actually requires InsightFace (non-commercial)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A Modal cert sweep caught what the research doc missed: PhotoMaker-V2
fails at import without InsightFace ("No module named 'insightface'").

Reading the upstream source confirms it: `photomaker/__init__.py`
imports `FaceAnalysis2` (an InsightFace wrapper) at module load, V2's
encoder is named `PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken`, and
`model_v2.py`'s forward takes an `id_embeds` argument that the pipeline
computes via `insightface.app.FaceAnalysis(name='antelopev2', ...)`. So
V2 is a DUAL encoder (CLIP + ArcFace), not CLIP-only as the model card
line "id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse
layers" implied.

InsightFace's pretrained model packs (antelopev2, buffalo_l) are
research/non-commercial only per their own README: "The pretrained
models we provided with this library are available for non-commercial
research purposes only." So V2 is blocked for a paid service like
raiw.cc.

PhotoMaker-V1 is the commercial-safe alternative: its
`PhotoMakerIDEncoder` (model.py) forward takes only
`(id_pixel_values, prompt_embeds, class_tokens_mask)`, no ArcFace
branch. Identity is CLIP-only, license is Apache-2.0, no InsightFace.

Code change: swap the repo + filename constants in
`photomaker_restore.py` (TencentARC/PhotoMaker, photomaker-v1.bin).
Tests still pass (the 9 PhotoMaker tests use a fake pipeline, so the
model swap is transparent to them).

Doc correction: rewrote the verdict / license table / section 5 of
`docs/synthid-robust-identity-research.md` to lead with V1 and add a
correction notice explaining the V2 misread.
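
The constant swap described above amounts to pointing the weight download at the V1 repo and file. A minimal sketch, assuming hypothetical names (`PHOTOMAKER_REPO_ID`, `PHOTOMAKER_FILENAME`, `fetch_photomaker_weights`; the real identifiers in `photomaker_restore.py` may differ) and `huggingface_hub.hf_hub_download` as the default fetcher, made injectable so the path can be exercised without a network download:

```python
from typing import Callable, Optional

# Hypothetical constant names; the values are the V1 repo and weight file
# this commit switches to.
PHOTOMAKER_REPO_ID = "TencentARC/PhotoMaker"
PHOTOMAKER_FILENAME = "photomaker-v1.bin"


def fetch_photomaker_weights(downloader: Optional[Callable[..., str]] = None) -> str:
    """Return a local path to the PhotoMaker-V1 adapter weights.

    `downloader` defaults to huggingface_hub.hf_hub_download, imported lazily
    so this module stays importable when the optional extra is missing.
    """
    if downloader is None:
        from huggingface_hub import hf_hub_download  # optional dependency

        downloader = hf_hub_download
    # hf_hub_download fetches into the local HF cache (once) and returns the path.
    return downloader(repo_id=PHOTOMAKER_REPO_ID, filename=PHOTOMAKER_FILENAME)
```

Injecting the downloader keeps the sketch testable offline, in the same spirit as the module's own tests using a fake pipeline.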

Bulk-renamed `PhotoMaker-V2` to `PhotoMaker-V1` across CLAUDE.md,
README.md, docs/synthid.md, and
docs/controlnet-removal-pipeline-research.md (kept V2 only in the
correction notice, the license table, and the anchor reference).

ruff clean; 578 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CLAUDE.md                                    | 12 +++---
 README.md                                    | 10 ++---
 docs/controlnet-removal-pipeline-research.md |  2 +-
 docs/synthid-robust-identity-research.md     | 41 +++++++++++++------
 docs/synthid.md                              |  2 +-
 .../photomaker_restore.py                    | 31 ++++++++------
 6 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index c9061c5..1353292 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -10,7 +10,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
 - `uv run remove-ai-watermarks metadata --remove -o ` — strip all AI metadata
-- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, `--restore-faces/--no-restore-faces` for the PhotoMaker-V2 SynthID-safe face-identity post-pass (`photomaker` extra), and `--auto` (+ `--adaptive-polish/--no-adaptive-polish`) for the content-adaptive quality mode (re-planned per image; one engine cached per resolved pipeline)
+- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, `--restore-faces/--no-restore-faces` for the PhotoMaker-V1 SynthID-safe face-identity post-pass (`photomaker` extra), and `--auto` (+ `--adaptive-polish/--no-adaptive-polish`) for the content-adaptive quality mode (re-planned per image; one engine cached per resolved pipeline)
 
 ## Test and lint
 
@@ -27,7 +27,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - GPU/ML modules (invisible_engine, watermark_remover) are optional — guard imports with `is_available()` checks
 - Optional detection extras: `detect` (imwatermark — open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent.
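
The `is_available()` guard pattern these bullets rely on can be done without importing the heavy package at all, by asking the import system whether a module is installed. A minimal sketch, assuming a hypothetical shared helper and a hypothetical `require` wrapper (the real guards are per-module and may check different packages):

```python
import importlib.util


def is_available(*modules: str) -> bool:
    """True if every named module can be located on the import path.

    importlib.util.find_spec locates a module without executing it, so a
    missing optional extra costs nothing and raises nothing here.
    """
    for name in modules:
        try:
            if importlib.util.find_spec(name) is None:
                return False
        except (ImportError, ValueError):
            # A dotted name whose parent package is missing raises instead
            # of returning None; treat that as "not available" too.
            return False
    return True


def require(*modules: str, extra: str) -> None:
    """Hypothetical call-site guard: a clear error instead of a bare ImportError."""
    if not is_available(*modules):
        raise RuntimeError(f"requires {', '.join(modules)}; install the '{extra}' extra")
```

For example, `is_available("spandrel", "torch")` mirrors the upscaler gate. Note that `find_spec` only proves a package can be found: a package that fails at import time because its own `__init__` imports a missing dependency still passes this check.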
-- Optional `photomaker` extra (`photomaker` upstream package + huggingface-hub): the SynthID-safe PhotoMaker-V2 face-identity post-pass (`photomaker_restore.py`, CLI `--restore-faces`, **EXPERIMENTAL, opt-in, OFF by default**). Commercial-safe end-to-end (PhotoMaker-V2 Apache-2.0 + OpenCLIP-ViT-H/14 MIT; NO InsightFace -- the non-commercial blocker for IP-Adapter FaceID / InstantID / PuLID / Arc2Face). Carries identity in a SynthID-invariant OpenCLIP embedding (validated 2026-06-04: cosine drift 0.002 under SynthID-magnitude pixel noise, an order of magnitude less than JPEG90 drift which SynthID survives) and regenerates fresh face pixels conditioned on it. Heavy (~3 GB SDXL + ~1 GB PhotoMaker adapter, downloaded on first use). Kept OUT of `all`. The `photomaker` extra references the upstream git repo, which requires `[tool.hatch.metadata] allow-direct-references = true`. See `docs/synthid-robust-identity-research.md`. **Replaces the removed `restore` (GFPGAN) extra**, which was oracle-confirmed 2026-06-04 to re-introduce SynthID by blending watermarked original face pixels at fidelity weight 0.5; clean A/B (gemini_3 controlnet 0.20: detected WITH GFPGAN, clean WITHOUT). That extra and its `face_restore.py` module are gone.
+- Optional `photomaker` extra (`photomaker` upstream package + huggingface-hub): the SynthID-safe PhotoMaker-V1 face-identity post-pass (`photomaker_restore.py`, CLI `--restore-faces`, **EXPERIMENTAL, opt-in, OFF by default**). Commercial-safe end-to-end (PhotoMaker-V1 Apache-2.0 + OpenCLIP-ViT-H/14 MIT; NO InsightFace -- the non-commercial blocker for IP-Adapter FaceID / InstantID / PuLID / Arc2Face). Carries identity in a SynthID-invariant OpenCLIP embedding (validated 2026-06-04: cosine drift 0.002 under SynthID-magnitude pixel noise, an order of magnitude less than JPEG90 drift which SynthID survives) and regenerates fresh face pixels conditioned on it. Heavy (~3 GB SDXL + ~1 GB PhotoMaker adapter, downloaded on first use). Kept OUT of `all`. The `photomaker` extra references the upstream git repo, which requires `[tool.hatch.metadata] allow-direct-references = true`. See `docs/synthid-robust-identity-research.md`. **Replaces the removed `restore` (GFPGAN) extra**, which was oracle-confirmed 2026-06-04 to re-introduce SynthID by blending watermarked original face pixels at fidelity weight 0.5; clean A/B (gemini_3 controlnet 0.20: detected WITH GFPGAN, clean WITHOUT). That extra and its `face_restore.py` module are gone.
 - Optional `esrgan` extra (spandrel only): Real-ESRGAN pre-diffusion super-resolution for small inputs (`upscaler.py`, CLI `--upscaler esrgan` on `invisible`/`all`/`batch`). Guarded by `upscaler.is_available()`; the default upscaler stays Lanczos (cv2, no deps) and the engine falls back to Lanczos when the extra is absent or the model errors. spandrel is MIT and pulls NO basicsr (only torch/torchvision/safetensors/numpy/einops); Real-ESRGAN weights are BSD-3-Clause and download on first use via `torch.hub` (never bundled). Kept OUT of `all` (heavy + model download).
 - Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask`/`adaptive_polish` (`test_humanizer.py`), `auto_config.plan`/detectors (`test_auto_config.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do.
@@ -45,9 +45,9 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `region_eraser.py` — universal region eraser (`erase` CLI). `erase(image, boxes=|mask=, backend=)` accepts grayscale (2D) and RGBA (4-channel) inputs on **both** backends (`erase_cv2` and `erase_lama` each split off any alpha plane and re-attach it unchanged, and promote grayscale to BGR for processing — LaMa would otherwise crash on grayscale and drop alpha on BGRA): `boxes_to_mask` → `cv2.inpaint` (`cv2` backend, default, no deps) or big-LaMa via onnxruntime (`lama` backend, extra `lama`, `Carve/LaMa-ONNX` Apache-2.0 model downloaded on first use, never bundled). `erase_lama` crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy `_get_lama_session` singleton; `lama_available()` guards the optional import. **LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU** (FFC working set, not arena — `enable_cpu_mem_arena=False` does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal.
 - `invisible_watermark.py` — `detect_invisible_watermark(path)` decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the `imwatermark` library. Known fixed patterns (verified against upstream source) live in `_BITS_48` (SDXL 48-bit, FLUX.2 48-bit) and `_SD1_STRING` ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra `detect`); returns None when absent. The `detect` extra pulls **torch** transitively (invisible-watermark declares torch a hard dep, and `WatermarkDecoder` eagerly imports `rivaGan` -> `torch` at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets — no GPU and no separate `gpu` extra required. **Unlike SynthID this is locally detectable**, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs.
 - `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton guarded by a double-checked `threading.Lock` so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.
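
The TrustMark false-positive gate described above reduces to: decode once and, only on a hit, decode a re-encoded copy and keep the result only if the schema matches. A minimal sketch, assuming a hypothetical `decode` callable that returns the decoded schema (or `None` when nothing validates) and an injected `reencode` callable standing in for the JPEG q95 round-trip that `_survives_reencode` / `_REENCODE_QUALITY` perform in the real module:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")  # whatever image type the decoder accepts


def detect_durable(
    image: T,
    decode: Callable[[T], Optional[str]],
    reencode: Callable[[T], T],
) -> Optional[str]:
    """Return the decoded schema only if it survives a mild re-encode.

    A genuine durable watermark should decode to the same schema after
    re-encoding; a spurious validity-flag hit on content noise usually does not.
    """
    first = decode(image)
    if first is None:
        # Common case: no hit, so the second decode is never paid for.
        return None
    second = decode(reencode(image))
    return first if second == first else None
```

Injecting both callables lets the gate logic be unit-tested without downloading the TrustMark weights, matching the project's rule that pure helpers stay download-free.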
-- `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). **`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.** No original pixels are copied or frozen, BUT **validation 2026-06-04 disproved the old "so SynthID does not survive" claim: SynthID CAN survive controlnet on photoreal/high-detail content.** At the shared low removal strength the canny edge-conditioning keeps the regeneration so close to the original that the pixel perturbation that destroys SynthID does not happen (oracle-confirmed: an OpenAI bracelet photo + a 9-face grid read **SynthID-detected** after controlnet at strength 0.10/0.15, but **SynthID-not-detected** after the `default` pipeline at the SAME strength + resolution -- only the pipeline differed). **But the reverse also holds: a flat-graphic logo/poster SURVIVED `default` while clearing controlnet** -- removal at the low strength is content×pipeline dependent and neither pipeline is universally safe; the real lever is a higher strength. See the controlnet Known-limitations bullet for the full table + root cause. 
Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is preserved by the optional `--restore-faces` PhotoMaker-V2 post-pass (EXPERIMENTAL, opt-in, OFF by default; needs the `photomaker` extra) -- see `photomaker_restore.py`). `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once).
-- `face_restore.py` (REMOVED 2026-06-04, was GFPGAN-based). The GFPGAN restore pass ran on the watermarked ORIGINAL at fidelity weight 0.5 and was oracle-confirmed to re-introduce SynthID into the face regions by partial pixel blending. **Replaced by `photomaker_restore.py`** (PhotoMaker-V2, identity-as-embedding) -- see that bullet below.
-- `photomaker_restore.py` — SynthID-safe face-identity restoration via PhotoMaker-V2 (commercial-safe alternative to `face_restore.py`'s GFPGAN footgun). **EXPERIMENTAL, opt-in via `--restore-faces --restore-faces-method=photomaker`, needs the `photomaker` extra.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark` -> `_restore_faces_photomaker`). Flow: YuNet detects faces in the CLEANED image; for each box, the SAME box from the ORIGINAL is square-cropped (`_face_crop_square`) and fed as `input_id_images` to `PhotoMakerStableDiffusionXLPipeline` (txt2img); the regenerated face is feather-composited back via `_composite_faces`. Identity comes from the OpenCLIP-ViT-H/14 embedding of the original face (SynthID-invariant: cosine 0.9977 on SynthID-magnitude noise, an order of magnitude less drift than JPEG90 which SynthID survives), but the PIXELS that land in the output are diffusion-fresh -- so SynthID is not transported back, unlike GFPGAN-on-original.
**Commercial-safe end-to-end:** PhotoMaker-V2 Apache-2.0, OpenCLIP-ViT-H/14 MIT, SDXL shared with main pipeline, NO InsightFace. PhotoMaker is fundamentally txt2img in diffusers (`PhotoMakerStableDiffusionXLPipeline`); there is no `PhotoMakerControlNetImg2img` class, so this is a TWO-PASS pipeline: pass 1 (controlnet/default) cleans SynthID + drifts faces, pass 2 (this module) regenerates faces from the SynthID-invariant embedding. Pure helpers (`_face_crop_square`, `_composite_faces`) are unit-tested without the model (`tests/test_photomaker_restore.py`); the model-running path is gated behind `is_available()` and exercised manually via the Modal cert sweep. Lazy `PhotoMakerStableDiffusionXLPipeline` singleton (double-checked lock) downloads `photomaker-v2.bin` from `TencentARC/PhotoMaker-V2` on first use; never bundled. fp16 on CUDA, fp32 on MPS/CPU. See `docs/synthid-robust-identity-research.md` for the load-bearing embedding-invariance proof + license table.
+- `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). **`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).
**Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.** No original pixels are copied or frozen, BUT **validation 2026-06-04 disproved the old "so SynthID does not survive" claim: SynthID CAN survive controlnet on photoreal/high-detail content.** At the shared low removal strength the canny edge-conditioning keeps the regeneration so close to the original that the pixel perturbation that destroys SynthID does not happen (oracle-confirmed: an OpenAI bracelet photo + a 9-face grid read **SynthID-detected** after controlnet at strength 0.10/0.15, but **SynthID-not-detected** after the `default` pipeline at the SAME strength + resolution -- only the pipeline differed). **But the reverse also holds: a flat-graphic logo/poster SURVIVED `default` while clearing controlnet** -- removal at the low strength is content×pipeline dependent and neither pipeline is universally safe; the real lever is a higher strength. See the controlnet Known-limitations bullet for the full table + root cause. Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is preserved by the optional `--restore-faces` PhotoMaker-V1 post-pass (EXPERIMENTAL, opt-in, OFF by default; needs the `photomaker` extra) -- see `photomaker_restore.py`). `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once).
+- `face_restore.py` (REMOVED 2026-06-04, was GFPGAN-based). The GFPGAN restore pass ran on the watermarked ORIGINAL at fidelity weight 0.5 and was oracle-confirmed to re-introduce SynthID into the face regions by partial pixel blending.
**Replaced by `photomaker_restore.py`** (PhotoMaker-V1, identity-as-embedding) -- see that bullet below.
+- `photomaker_restore.py` — SynthID-safe face-identity restoration via PhotoMaker-V1 (commercial-safe alternative to `face_restore.py`'s GFPGAN footgun). **EXPERIMENTAL, opt-in via `--restore-faces --restore-faces-method=photomaker`, needs the `photomaker` extra.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark` -> `_restore_faces_photomaker`). Flow: YuNet detects faces in the CLEANED image; for each box, the SAME box from the ORIGINAL is square-cropped (`_face_crop_square`) and fed as `input_id_images` to `PhotoMakerStableDiffusionXLPipeline` (txt2img); the regenerated face is feather-composited back via `_composite_faces`. Identity comes from the OpenCLIP-ViT-H/14 embedding of the original face (SynthID-invariant: cosine 0.9977 on SynthID-magnitude noise, an order of magnitude less drift than JPEG90 which SynthID survives), but the PIXELS that land in the output are diffusion-fresh -- so SynthID is not transported back, unlike GFPGAN-on-original. **Commercial-safe end-to-end:** PhotoMaker-V1 Apache-2.0, OpenCLIP-ViT-H/14 MIT, SDXL shared with main pipeline, NO InsightFace. PhotoMaker is fundamentally txt2img in diffusers (`PhotoMakerStableDiffusionXLPipeline`); there is no `PhotoMakerControlNetImg2img` class, so this is a TWO-PASS pipeline: pass 1 (controlnet/default) cleans SynthID + drifts faces, pass 2 (this module) regenerates faces from the SynthID-invariant embedding. Pure helpers (`_face_crop_square`, `_composite_faces`) are unit-tested without the model (`tests/test_photomaker_restore.py`); the model-running path is gated behind `is_available()` and exercised manually via the Modal cert sweep. Lazy `PhotoMakerStableDiffusionXLPipeline` singleton (double-checked lock) downloads `photomaker-v1.bin` from `TencentARC/PhotoMaker` on first use; never bundled. fp16 on CUDA, fp32 on MPS/CPU.
See `docs/synthid-robust-identity-research.md` for the load-bearing embedding-invariance proof + license table.
 - `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). **CAVEAT (oracle-validated 2026-06-04, see the controlnet Known-limitations bullet): at the low vendor-adaptive strength NEITHER pipeline removes SynthID on all content -- it is content×pipeline dependent (photoreal SURVIVES controlnet / clears default; flat graphics SURVIVE default / clear controlnet; flat text clears both). So `--auto` picking controlnet for faces/photos leaves SynthID on exactly those, and plain `default` would leave it on flat graphics -- pipeline choice alone does NOT guarantee removal. The real lever is a HIGHER strength, oracle-validated per content type. Removal-priority callers (raiw.cc) must oracle-validate strength across content types BEFORE adopting auto; the "must keep SynthID removed" gate in the adoption note below is the blocker this caught.** `restore_faces` is on when a face is present.
When a smoothing pass (controlnet/restore) ran, the **adaptive polish** (`humanizer.adaptive_polish`) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while **sparing text** (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). **Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, **DBNet** (PP-OCRv3 differentiable-binarization via `cv2.dnn.TextDetectionModel_DB`, a 2.4 MB Apache-2.0 model bundled at `assets/text_detection_ppocrv3_2023may.onnx`) for text, with the old Canny+MSER region heuristic kept as a fallback if the DBNet model can't load (`_detect_text_dbnet` returns None → `_detect_text_mser`). The en/cn opencv_zoo PP-OCRv3 detection models are byte-identical, so it is bundled language-neutral. Text only ever ADDS controlnet, so a miss is backstopped by edge-density and a false positive only costs a controlnet run. Plus `edge_density`. `min_resolution` stays 1024. **Every auto decision is independently overridable** (interface principle): `_apply_auto` (cli.py) overrides only the three content-adaptive modes the user left at their click default (`ctx.get_parameter_source(...) == DEFAULT`) — `--pipeline`, `--restore-faces`/`--no-restore-faces`, and **`--adaptive-polish`/`--no-adaptive-polish`** always win; `--min-resolution`/`--strength`/`--unsharp`/`--humanize` are independent knobs. `--adaptive-polish` also works WITHOUT `--auto` (manual detail-targeted polish; the engine's `adaptive_polish` param uses the full-res original as the detail reference). Prints the chosen plan (`AutoConfig.reason`). 
Wired into `cmd_all`/`cmd_invisible`/`cmd_batch` — in `batch` the plan is recomputed per image and the invisible engine is cached **per resolved pipeline** (`ctx.obj["_inv_engines"]`, keyed `default`/`controlnet`) instead of a single shared instance, so a mixed directory builds at most one engine of each kind. **Adds ZERO new pip deps** (all cv2 core + the bundled MIT YuNet + Apache-2.0 DBNet models + the cv2-only adaptive polish). The auto plan does NOT select the `esrgan` upscaler (that needs the optional extra and would make auto's behavior install-dependent); `--upscaler esrgan` stays a separate manual knob. Unit-tested without a heavy download (`tests/test_auto_config.py`): flat/text synthetic images for routing (the bundled DBNet fires on a real text card), monkeypatched `detect_face`/`_detect_text_dbnet`/`_detect_text_mser` for the face/text/fallback branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`.
 - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos).
The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps). **ESRGAN is a generic photo/texture GAN with no face/glyph prior**, so it best fits photo/texture content and can degrade faces (glassy/asymmetric eyes -- the diffusion pass regenerates faces so the full-pipeline final recovers; PhotoMaker `--restore-faces` is the identity-recovery path) and thin/small text (the GAN invents wrong strokes, and low-strength diffusion will not fix it). Verified 2026-06-04: isolated upscale lap-var ~5x Lanczos on faces+textures but glassy eyes; end-to-end `invisible` final lap-var 1634 vs Lanczos 663 with natural faces (diffusion cleaned the artifact). Kept a **manual opt-in knob** (the auto plan never selects it) with `lanczos` the default; not content-gated by design (use Lanczos for text-heavy inputs). spandrel is MIT and pulls no basicsr. Unit-tested without the model: `tests/test_upscaler.py` (availability guard + the not-installed RuntimeError) and `tests/test_invisible_engine.py::TestEsrganUpscale` (the three `_esrgan_upscale` branches via a monkeypatched `upscaler`).
 - `image_io.py` — Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable).
**Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** `imwrite` returns `False` on an unwritable path (`OSError` caught) instead of raising, matching `cv2.imwrite` semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.
@@ -93,4 +93,4 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 - **External AI-vs-real classifier models are out of scope (decided 2026-05-24).** Generic HuggingFace detectors (`Organika/sdxl-detector` Swin Transformer, `umm-maybe/AI-image-detector`, and fine-tunes) exist and report ~0.98 on their *own* SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID. Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency.
 - **DEFAULT STRENGTH IS NOW VENDOR-ADAPTIVE (2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next).** `resolve_strength(strength, profile, vendor)` + `vendor_for_strength(path)` (`watermark_profiles.py`) read the C2PA issuer (`metadata.synthid_source`) on the ORIGINAL input and pick `OPENAI_STRENGTH` **0.10** / `GEMINI_STRENGTH` **0.15** / `UNKNOWN_STRENGTH` **0.15** when `--strength` is unset; explicit `--strength` always wins. The CLI detects the vendor from the pristine source (before the visible pass / metadata-strip removes C2PA from the temp file) and passes it to the engine, so display and execution agree; `cmd_invisible`/`cmd_all`/`batch` + the module-level `remove_watermark` all thread `vendor`.
**This replaces the single 0.30 default AND the prior "do NOT build a vendor-adaptive default" policy** -- both came from the now-debunked region-rescrub-contaminated study (the per-region re-scrub that contaminated those numbers was removed in the controlnet refactor). Basis: the oracle-verified June 2026 controlled study (clean v0.8.6, protect OFF): OpenAI clears at 0.05 across 1024-1600 (n=4, resolution-independent); Google needs 0.15 on the capped-1536 path (n=4). `docs/synthid.md` §2.2 (data) + §5.2 (the adaptive default) are authoritative. **CAVEAT (oracle pass 2026-06-04): the OpenAI 0.10 default is content-dependent, NOT universal -- a flat-graphic OpenAI logo/poster still read SynthID-detected after `default` at 0.10, and photoreal images after controlnet at 0.10/0.15 (low-change regions under-perturbed). Removal at 0.10/0.15 is content×pipeline dependent (see the controlnet Known-limitations bullet); the lever is a higher strength, oracle-revalidated per content type. Do NOT assume the vendor-adaptive default clears every image.** CAVEAT: Google's 0.15 was validated only on `--max-resolution 1536`; native large Gemini (2816) was not locally measurable (OOM on M-series) and is pending GPU validation on raiw.cc -- if it survives 0.15 native, raise `--strength`. **Everything below in this bullet about a fixed 0.10/0.30 default is HISTORICAL; trust the vendor-adaptive constants + docs/synthid.md.**
 - **SynthID removal: strength + oracle scope.** Default strength is vendor-adaptive (see the bullet above); `docs/synthid.md` §2.2 is authoritative for the numbers. **Oracle scope (load-bearing):** the Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle (detects Google's mark on any image); `openai.com/verify` is scoped to OpenAI provenance (its own C2PA), NOT a SynthID oracle -- a negative there is meaningless for SynthID.
There is no local SynthID detector, so the tool cannot self-check; if the oracle still reads SynthID, raise `--strength` to the lowest value that verifies clean. Only the `default` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet) profiles exist; the local `invisible` default is weight-for-weight identical to raiw.cc prod (`fal-ai/fast-sdxl` = `stabilityai/stable-diffusion-xl-base-1.0`, runtime-downloaded, not bundled). **Forensic-stealth caveat** (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility -- independent detectors flag *removal-processed* images vs genuinely-clean ones at >98% TPR@1%FPR, so do not over-claim "indistinguishable from a real photo". -- **`controlnet` pipeline (text/face STRUCTURE preservation, EXPERIMENTAL, opt-in `--pipeline controlnet`).** SDXL + the canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` via `StableDiffusionXLControlNetImg2ImgPipeline` (`watermark_remover._run_controlnet` / `_load_controlnet_pipeline`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map** (`cv2.Canny(gray, 100, 200)`, 3-channel). Canny preserves edges, NOT face identity (a regenerated face drifts in likeness); face identity is preserved by the optional `--restore-faces` PhotoMaker-V2 post-pass (EXPERIMENTAL, opt-in, OFF by default -- see `photomaker_restore.py`, the `photomaker` extra). PhotoMaker carries identity in a SynthID-invariant OpenCLIP embedding and regenerates fresh face pixels conditioned on it; the GFPGAN-based `face_restore.py` was REMOVED 2026-06-04 because it ran on the watermarked original and re-introduced SynthID. The CodeFormer alternative stays NON-COMMERCIAL and is not shipped. The earlier `--face-id` IP-Adapter FaceID layer was REMOVED (footgun: it needs high strength and corrupts faces at the low removal strength). 
No original pixels are copied or frozen, **BUT removal at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated against the OpenAI verifier 2026-06-04 (8 images, strength 0.10/0.15, `--max-resolution 1536`).** The survivors FLIP by content type: **photoreal** (a 9-face grid, a bracelet product photo) SURVIVES controlnet but CLEARS `default` (controlnet's dense edge map keeps the regen too close to the original, so the SynthID-destroying perturbation never happens; plain img2img perturbs photoreal texture enough); **flat graphic** (a logo/poster with large flat color fills) SURVIVES `default` but CLEARS controlnet (at low strength img2img barely changes flat fills so SynthID persists there, while controlnet repaints them more freely); a flat **text** card cleared under both. **Root cause is insufficient STRENGTH, not the pipeline: at 0.10 the low-change regions -- dense-edge photoreal under controlnet, large flat fills under `default` -- are not perturbed enough to destroy SynthID. The vendor-adaptive 0.10 from the June study is NOT universally sufficient (that study's content happened to clear at 0.10).** The robust fix is a HIGHER strength, oracle-revalidated per content type (controlnet can be cranked harder without losing structure; a lower `controlnet_conditioning_scale` also frees the regen on photoreal). So at today's default strength **both pipelines AND `--auto` can LEAVE SynthID on some content** -- a removal-priority caller (raiw.cc) MUST oracle-validate strength across content types before adopting, not pick a pipeline and assume removal. **Follow-up same day: re-running the two photoreal survivors through controlnet at an explicit `--strength 0.15` cleared BOTH on the oracle -- BUT one of them (the bracelet) had SURVIVED the SAME 0.15 controlnet config in the first pass (only the random, unset seed differed). 
So removal near the threshold is SEED-NON-DETERMINISTIC: the same image+pipeline+strength+resolution can pass or fail run-to-run (img2img uses `seed=None`/random unless `--seed` is passed, and there is no local SynthID detector to self-verify). 0.15 is the borderline, NOT a robust floor -- pick a strength with MARGIN (controlnet ~>= 0.20) rather than exactly on it; the content×pipeline table's 0.15 data point is near-threshold noise. A confirming run at `--strength 0.20` controlnet cleared BOTH photoreal survivors on the oracle (ladder: 0.10 grid detected → 0.15 borderline/non-deterministic → 0.20 both clean), so **0.20 is the recommended robust controlnet floor for OpenAI photoreal** (one margin run, not an N-run repeatability proof -- a service should add margin or verify repeatability since there is no local SynthID detector to self-check). **Engineering follow-up for raiw.cc: the controlnet pipeline should use a HIGHER vendor strength than `default` -- it currently shares `resolve_strength` (0.10/0.15, tuned for plain img2img), but controlnet's edge map preserves structure so it needs ~0.20+; calibrate per vendor/content on the GPU worker, do NOT just reuse the `default` ladder.** **CERTIFIED 2026-06-04 via the isolated `raiw-controlnet-cert` Modal app (`raiw-app/modal_cert.py`), restore OFF, ≤1536, each vendor on its own oracle: controlnet floors are OpenAI 0.20 (2 photoreal × 3 seeds = 6/6 clean; the 0.15-flipper is seed-robust at 0.20) and Gemini 0.30 (0.20 detected → 0.30 clean on 2/2 seeds). OpenAI 0.20 transfers to prod (resolution-independent); Gemini 0.30 holds only ≤1536 — Gemini is resolution-sensitive and raiw.cc runs NATIVE (`max_resolution=0`), so cap Gemini ≤1536 + use 0.30, or native-calibrate (~0.35+). 
Prod recipe: controlnet + per-vendor floor in `resolve_strength` (not the default ladder) + FIXED seed (kills the non-determinism) + PhotoMaker restore (the GFPGAN footgun is gone).** See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (certified floors table).** **Lesson: visual-quality + face-recovery validation does NOT prove watermark removal -- only the SynthID oracle does, across MULTIPLE content types; never infer removal from sharpness/identity, and never conclude from a partial result (the photoreal-only data first read as "controlnet shields, default removes" -- the flat-graphic result reversed it).** `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. +- **`controlnet` pipeline (text/face STRUCTURE preservation, EXPERIMENTAL, opt-in `--pipeline controlnet`).** SDXL + the canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` via `StableDiffusionXLControlNetImg2ImgPipeline` (`watermark_remover._run_controlnet` / `_load_controlnet_pipeline`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map** (`cv2.Canny(gray, 100, 200)`, 3-channel). 
Canny preserves edges, NOT face identity (a regenerated face drifts in likeness); face identity is preserved by the optional `--restore-faces` PhotoMaker-V1 post-pass (EXPERIMENTAL, opt-in, OFF by default -- see `photomaker_restore.py`, the `photomaker` extra). PhotoMaker carries identity in a SynthID-invariant OpenCLIP embedding and regenerates fresh face pixels conditioned on it; the GFPGAN-based `face_restore.py` was REMOVED 2026-06-04 because it ran on the watermarked original and re-introduced SynthID. The CodeFormer alternative stays NON-COMMERCIAL and is not shipped. The earlier `--face-id` IP-Adapter FaceID layer was REMOVED (footgun: it needs high strength and corrupts faces at the low removal strength). No original pixels are copied or frozen, **BUT removal at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated against the OpenAI verifier 2026-06-04 (8 images, strength 0.10/0.15, `--max-resolution 1536`).** The survivors FLIP by content type: **photoreal** (a 9-face grid, a bracelet product photo) SURVIVES controlnet but CLEARS `default` (controlnet's dense edge map keeps the regen too close to the original, so the SynthID-destroying perturbation never happens; plain img2img perturbs photoreal texture enough); **flat graphic** (a logo/poster with large flat color fills) SURVIVES `default` but CLEARS controlnet (at low strength img2img barely changes flat fills so SynthID persists there, while controlnet repaints them more freely); a flat **text** card cleared under both. **Root cause is insufficient STRENGTH, not the pipeline: at 0.10 the low-change regions -- dense-edge photoreal under controlnet, large flat fills under `default` -- are not perturbed enough to destroy SynthID. 
The vendor-adaptive 0.10 from the June study is NOT universally sufficient (that study's content happened to clear at 0.10).** The robust fix is a HIGHER strength, oracle-revalidated per content type (controlnet can be cranked harder without losing structure; a lower `controlnet_conditioning_scale` also frees the regen on photoreal). So at today's default strength **both pipelines AND `--auto` can LEAVE SynthID on some content** -- a removal-priority caller (raiw.cc) MUST oracle-validate strength across content types before adopting, not pick a pipeline and assume removal. **Follow-up same day: re-running the two photoreal survivors through controlnet at an explicit `--strength 0.15` cleared BOTH on the oracle -- BUT one of them (the bracelet) had SURVIVED the SAME 0.15 controlnet config in the first pass (only the random, unset seed differed). So removal near the threshold is SEED-NON-DETERMINISTIC: the same image+pipeline+strength+resolution can pass or fail run-to-run (img2img uses `seed=None`/random unless `--seed` is passed, and there is no local SynthID detector to self-verify). 0.15 is the borderline, NOT a robust floor -- pick a strength with MARGIN (controlnet ~>= 0.20) rather than exactly on it; the content×pipeline table's 0.15 data point is near-threshold noise. A confirming run at `--strength 0.20` controlnet cleared BOTH photoreal survivors on the oracle (ladder: 0.10 grid detected → 0.15 borderline/non-deterministic → 0.20 both clean), so **0.20 is the recommended robust controlnet floor for OpenAI photoreal** (one margin run, not an N-run repeatability proof -- a service should add margin or verify repeatability since there is no local SynthID detector to self-check). 
**Engineering follow-up for raiw.cc: the controlnet pipeline should use a HIGHER vendor strength than `default` -- it currently shares `resolve_strength` (0.10/0.15, tuned for plain img2img), but controlnet's edge map preserves structure so it needs ~0.20+; calibrate per vendor/content on the GPU worker, do NOT just reuse the `default` ladder.** **CERTIFIED 2026-06-04 via the isolated `raiw-controlnet-cert` Modal app (`raiw-app/modal_cert.py`), restore OFF, ≤1536, each vendor on its own oracle: controlnet floors are OpenAI 0.20 (2 photoreal × 3 seeds = 6/6 clean; the 0.15-flipper is seed-robust at 0.20) and Gemini 0.30 (0.20 detected → 0.30 clean on 2/2 seeds). OpenAI 0.20 transfers to prod (resolution-independent); Gemini 0.30 holds only ≤1536 — Gemini is resolution-sensitive and raiw.cc runs NATIVE (`max_resolution=0`), so cap Gemini ≤1536 + use 0.30, or native-calibrate (~0.35+). Prod recipe: controlnet + per-vendor floor in `resolve_strength` (not the default ladder) + FIXED seed (kills the non-determinism) + PhotoMaker restore (the GFPGAN footgun is gone).** See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (certified floors table). **Lesson: visual-quality + face-recovery validation does NOT prove watermark removal -- only the SynthID oracle does, across MULTIPLE content types; never infer removal from sharpness/identity, and never conclude from a partial result (the photoreal-only data first read as "controlnet shields, default removes" -- the flat-graphic result reversed it).** `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`.
This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. diff --git a/README.md b/README.md index 8d3486f..c147a16 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType - **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph) - **Analog Humanizer** — optional film grain and chromatic aberration post-processing -- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness); identity is preserved by the `--restore-faces` PhotoMaker-V2 post-pass (opt-in, SynthID-safe). Both are experimental and off by default. +- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness); identity is preserved by the `--restore-faces` PhotoMaker-V1 post-pass (opt-in, SynthID-safe). Both are experimental and off by default. 
- **Batch processing** — process entire directories - **Detection** — three-stage NCC watermark detection with confidence scoring - **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output) @@ -128,7 +128,7 @@ image → encode to latent space (VAE) at native resolution > > **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically). > -> **`--restore-faces` preserves face identity (PhotoMaker-V2, experimental, opt-in).** Canny preserves where a face is, but not who it is — the regenerated face drifts in likeness. 
The `--restore-faces` post-pass (experimental, off by default; needs the `photomaker` extra) fixes this in a SynthID-safe way: identity comes from an OpenCLIP-ViT-H/14 embedding of the original face (validated 2026-06-04: cosine 0.9977 invariance to SynthID-magnitude pixel noise, an order of magnitude less drift than JPEG90 which SynthID survives), and a fresh face is regenerated from that embedding — the pixels are diffusion-fresh, so the watermark is not transported. Commercial-safe end-to-end: PhotoMaker-V2 weights Apache-2.0, OpenCLIP-ViT-H/14 MIT, no InsightFace. The earlier GFPGAN-based `restore` extra was removed 2026-06-04 because it ran on the watermarked original and was oracle-confirmed to re-introduce SynthID; CodeFormer stays non-commercial and is not shipped. See `docs/synthid-robust-identity-research.md`. +> **`--restore-faces` preserves face identity (PhotoMaker-V1, experimental, opt-in).** Canny preserves where a face is, but not who it is — the regenerated face drifts in likeness. The `--restore-faces` post-pass (experimental, off by default; needs the `photomaker` extra) fixes this in a SynthID-safe way: identity comes from an OpenCLIP-ViT-H/14 embedding of the original face (validated 2026-06-04: cosine 0.9977 invariance to SynthID-magnitude pixel noise, an order of magnitude less drift than JPEG90 which SynthID survives), and a fresh face is regenerated from that embedding — the pixels are diffusion-fresh, so the watermark is not transported. Commercial-safe end-to-end: PhotoMaker-V1 weights Apache-2.0, OpenCLIP-ViT-H/14 MIT, no InsightFace. The earlier GFPGAN-based `restore` extra was removed 2026-06-04 because it ran on the watermarked original and was oracle-confirmed to re-introduce SynthID; CodeFormer stays non-commercial and is not shipped. See `docs/synthid-robust-identity-research.md`. SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 Pro outputs, where the older SD-1.5 pipeline at 768 px did not. 
The SD-1.5 path was removed once it was verified not to handle v2. Note the scope: this defeats the SynthID *verifier*, which is not the same as being forensically indistinguishable from a real photo. Recent work ([arXiv:2605.09203](https://arxiv.org/abs/2605.09203)) shows watermark-removal pipelines leave detectable traces, so a separate "this image was processed" classifier can still flag the output. @@ -136,7 +136,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P > **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203). -**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity* (identity is preserved by the `--restore-faces` PhotoMaker-V2 post-pass, experimental and off by default — see the callout above). Both features are experimental. +**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity* (identity is preserved by the `--restore-faces` PhotoMaker-V1 post-pass, experimental and off by default — see the callout above). Both features are experimental. 
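The `--restore-faces` callout quotes a cosine of 0.9977 between OpenCLIP embeddings of an original face and the same face under SynthID-magnitude pixel noise, i.e. a mean drift of about 0.002. The metric can be sketched with the standard library alone. The helper names `cosine` and `mean_drift` are hypothetical, and real inputs would be OpenCLIP-ViT-H/14 embeddings rather than hand-written vectors:

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_drift(
    originals: Sequence[Sequence[float]],
    perturbed: Sequence[Sequence[float]],
) -> float:
    """Mean (1 - cos) over paired embeddings; 0.0 means no change in direction."""
    if len(originals) != len(perturbed) or not originals:
        raise ValueError("need the same, non-zero number of original and perturbed embeddings")
    drifts = [1.0 - cosine(o, p) for o, p in zip(originals, perturbed)]
    return sum(drifts) / len(drifts)
```

A mean cosine of 0.9977 corresponds to a drift of 1 - 0.9977 = 0.0023. The argument compares that figure with the drift JPEG90 induces. SynthID survives JPEG90, so an embedding that moves less under watermark-sized noise than under JPEG90 is unlikely to be encoding the watermark.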
**Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.) @@ -215,8 +215,8 @@ After installation the `remove-ai-watermarks` command is available system-wide. > ``` > > To preserve face identity after invisible removal (the `--restore-faces` -> PhotoMaker-V2 post-pass, experimental and opt-in, SynthID-safe), install the -> `photomaker` extra. The PhotoMaker-V2 adapter and SDXL base weights download on +> PhotoMaker-V1 post-pass, experimental and opt-in, SynthID-safe), install the +> `photomaker` extra. The PhotoMaker-V1 adapter and SDXL base weights download on > first use (~4 GB total). Commercial-safe end-to-end (Apache-2.0 + MIT, no > InsightFace): > diff --git a/docs/controlnet-removal-pipeline-research.md b/docs/controlnet-removal-pipeline-research.md index 64cbd27..98bdb89 100644 --- a/docs/controlnet-removal-pipeline-research.md +++ b/docs/controlnet-removal-pipeline-research.md @@ -124,7 +124,7 @@ Gemini app; the two payloads are vendor-specific and never cross-checked): - **Fix the seed in prod.** The non-determinism is purely `seed=None` (random); a fixed `--seed` makes every run reproduce the certified-clean result, so you ship a deterministic, re-certifiable config (and the seed sweep collapses to one config). -- **`--restore-faces` is SynthID-safe by construction now (PhotoMaker-V2, 2026-06-04).** +- **`--restore-faces` is SynthID-safe by construction now (PhotoMaker-V1, 2026-06-04).** The GFPGAN-on-original path that re-added SynthID was removed; the shipped restore carries identity in a SynthID-invariant OpenCLIP embedding and regenerates fresh pixels conditioned on it. Needs the `photomaker` extra. 
See diff --git a/docs/synthid-robust-identity-research.md b/docs/synthid-robust-identity-research.md index 629b7c8..eeb0fa9 100644 --- a/docs/synthid-robust-identity-research.md +++ b/docs/synthid-robust-identity-research.md @@ -10,12 +10,25 @@ the face embedder it conditions on AND any base model) must be Apache-2.0 / MIT BSD or otherwise clearly commercial-permitted. Non-commercial is disqualifying. **One-line verdict.** Today there is **ONE** SDXL identity-conditioning stack that -is commercial-safe end-to-end: **PhotoMaker-V2** (Apache-2.0, identity encoded as a +is commercial-safe end-to-end: **PhotoMaker-V1** (Apache-2.0, identity encoded as a fine-tuned OpenCLIP-ViT-H/14 image embedding -- NO InsightFace). Every other -candidate (IP-Adapter FaceID family, InstantID, PuLID, Arc2Face) inherits -InsightFace's non-commercial model-pack license through its ArcFace-class embedder -and is therefore blocked for paid services, regardless of the adapter's own -license header. Below is the evidence per component and the integration plan. +candidate -- **including PhotoMaker-V2**, IP-Adapter FaceID, InstantID, PuLID, +Arc2Face -- inherits InsightFace's non-commercial model-pack license through an +ArcFace-class embedder and is therefore blocked for paid services, regardless of +the adapter's own license header. Below is the evidence per component and the +integration plan. + +**Correction notice (2026-06-04).** An earlier version of this doc claimed +PhotoMaker-V2 was commercial-safe end-to-end. That was WRONG -- the V2 model card +phrase *"id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers"* +described one of TWO ID branches; the V2 source (model_v2.py) defines +`PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken` whose forward takes an +ArcFace `id_embeds` from `insightface.app.FaceAnalysis`, and the upstream package +`__init__.py` imports InsightFace at module load. 
A Modal cert sweep caught this +empirically (`No module named 'insightface'` from `restore_faces_photomaker`). V1 +is the correct commercial-safe target: its `PhotoMakerIDEncoder` (model.py) +forward takes only `(id_pixel_values, prompt_embeds, class_tokens_mask)` -- no +ArcFace branch -- so identity is CLIP-only. ## 1. Why identity-by-embedding (not by pixel) is the only SynthID-robust path @@ -44,7 +57,8 @@ the watermark is not transported. Two embedding families exist in practice: | stack | adapter weights | identity encoder | end-to-end commercial-safe? | |---|---|---|---| -| **PhotoMaker-V2** | **Apache-2.0** ([HF model card][pm2hf]) | **OpenCLIP-ViT-H/14 (MIT)** finetuned, see card: *"id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers"* | **YES** | +| **PhotoMaker-V1** | **Apache-2.0** ([HF][pmhf]) | **OpenCLIP-ViT-H/14 (MIT)** finetuned, identity from `PhotoMakerIDEncoder` (`model.py`); forward takes only ``(id_pixel_values, prompt_embeds, class_tokens_mask)`` -- no ArcFace branch | **YES** | +| PhotoMaker-V2 | Apache-2.0 (adapter) ([HF][pm2hf]) | DUAL encoder: OpenCLIP-ViT-H/14 AND InsightFace antelopev2/buffalo_l -- `PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken` (`model_v2.py`) forward takes `id_embeds` from `insightface.app.FaceAnalysis`, and `photomaker/__init__.py` imports InsightFace at module load | NO -- InsightFace pack is non-commercial | | IP-Adapter FaceID | non-commercial per model card: *"AS InsightFace pretrained models are available for non-commercial research purposes, IP-Adapter-FaceID models are released exclusively for research purposes and is not intended for commercial use"* ([HF][ipafhf]) | InsightFace antelopev2 (non-commercial for the model pack) | NO -- both layers block | | InstantID | Apache-2.0 (adapter only) ([HF][insthf]) | requires InsightFace antelopev2 face-analysis at runtime (`FaceAnalysis(name='antelopev2', ...)` per the README usage snippet, [HF][insthf]) | NO -- embedder pack is non-commercial | | 
PuLID | apache-2.0 (HF model metadata, [HF][pulidhf]) | depends on InsightFace face-analysis for ArcFace embedding (per the upstream README; PuLID's own card is sparse and the GitHub README documents the InsightFace install step) | NO -- same embedder issue as IP-Adapter FaceID | @@ -66,6 +80,7 @@ weights but the upstream repo's quickstart requires the InsightFace package to extract the ID embedding. So PuLID's adapter license is permissive; the BLOCKER is the embedder it expects at runtime. This is the same trap as InstantID.) +[pmhf]: https://huggingface.co/TencentARC/PhotoMaker [pm2hf]: https://huggingface.co/TencentARC/PhotoMaker-V2 [ipafhf]: https://huggingface.co/h94/IP-Adapter-FaceID [insthf]: https://huggingface.co/InstantX/InstantID @@ -87,7 +102,7 @@ but you would need: For a removal service this is a multi-month side project that delivers what PhotoMaker already gives us with one pip install. So the practical answer is to -take the CLIP-embedding path (PhotoMaker-V2), accept the identity-fidelity +take the CLIP-embedding path (PhotoMaker-V1; V2 also requires InsightFace's non-commercial model packs), accept the identity-fidelity trade-off, and revisit ArcFace later if quality is insufficient. ## 4. Does an identity embedding leak SynthID? @@ -109,7 +124,7 @@ This is the load-bearing assumption of the whole approach.
The argument: **MEASURED 2026-06-04 — hypothesis confirmed.** Ran a low-amplitude perturbation sweep on 31 face crops (3 photoreal originals: gemini_3, gemini_4, openai_3 grid), comparing `cos(embedding(orig), embedding(perturbed))` for OpenCLIP- -ViT-H/14 (laion2B-s32B-b79K, the same encoder PhotoMaker-V2 finetunes): +ViT-H/14 (laion2B-s32B-b79K, the same OpenCLIP-ViT-H/14 encoder PhotoMaker V1 and V2 both finetune for CLIP-side identity): | perturbation | mean cos | min | max | |---|---|---|---| @@ -123,7 +138,7 @@ ViT-H/14 (laion2B-s32B-b79K, the same encoder PhotoMaker-V2 finetunes): The SynthID-magnitude perturbation moves the embedding by **0.002** (cosine 0.9977), an order of magnitude less than JPEG90 — which SynthID survives at >=99% TPR by design. So the embedding cannot carry the watermark pattern: its discriminative -signal is in dimensions the SynthID payload does not occupy. PhotoMaker-V2 +signal is in dimensions the SynthID payload does not occupy. PhotoMaker-V1 conditioned on a watermarked face will see ~the same identity vector as if conditioned on a clean face of the same person, so the freshly generated face inherits the identity, not the watermark. @@ -136,7 +151,7 @@ synthid_proxy result above is the one that actually answers the load-bearing question. Script: `/tmp/identity_smoke/test2_proxy.py` (not committed; reproducible from the test set + this doc). -## 5. PhotoMaker-V2 properties for our pipeline +## 5. PhotoMaker-V1 properties for our pipeline - **SDXL-native.** PhotoMaker v1 and v2 target Stable Diffusion XL; the pipeline is a stacked-ID embedding fused into SDXL's cross-attention via the fuse layers @@ -166,9 +181,9 @@ from the test set + this doc). - New deps: `diffusers` already in the gpu extra; PhotoMaker ships as a `.bin` loaded via `pipeline.load_photomaker_adapter(...)`. The OpenCLIP encoder is the same one diffusers already pulls. No new heavy pip dep. -- Weight download: PhotoMaker-V2 weights are ~3 GB. 
Add to the Modal HF volume +- Weight download: PhotoMaker-V1 weights are ~3 GB. Add to the Modal HF volume alongside SDXL. -- VRAM: SDXL + canny ControlNet + PhotoMaker-V2 fits comfortably in A100-40GB. +- VRAM: SDXL + canny ControlNet + PhotoMaker-V1 fits comfortably in A100-40GB. - Latency: a few extra seconds on cold start (load PhotoMaker), negligible per request after warm-up. - No InsightFace install: huge win for `restore` extra's basicsr/numpy hell -- @@ -184,7 +199,7 @@ from the test set + this doc). - If yes -> the embedding does not carry SynthID, proceed. - If no -> the assumption is wrong; PhotoMaker would re-introduce the watermark. Stop and reconsider. -2. **PhotoMaker-V2 prototype** in the existing `controlnet` pipeline: +2. **PhotoMaker-V1 prototype** in the existing `controlnet` pipeline: - Mirror the `_load_controlnet_pipeline` path: add a PhotoMaker variant that loads SDXL + canny ControlNet + PhotoMaker adapter on the same engine. - Extract the OpenCLIP face embedding from the watermarked face crops (use diff --git a/docs/synthid.md b/docs/synthid.md index c441ab7..89131f7 100644 --- a/docs/synthid.md +++ b/docs/synthid.md @@ -570,7 +570,7 @@ table. schedule to `resolve_strength`, do not reuse the default ladder; (2) the `--restore-faces` pass is now SynthID-safe by construction (the GFPGAN-on-original path that re-added SynthID was removed 2026-06-04; the shipped restore is -PhotoMaker-V2, identity-as-embedding, see `synthid-robust-identity-research.md`); (3) +PhotoMaker-V1, identity-as-embedding, see `synthid-robust-identity-research.md`); (3) removal near threshold is seed-non-deterministic -> FIX the prod seed (kills the coin-flip; ship a deterministic certified config). 
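Recommendation (1), giving controlnet its own schedule instead of reusing the default ladder, can be sketched as follows. This is a hypothetical illustration, not the package's `resolve_strength`. The function name, the `pipeline`/`max_resolution` parameters, the `"openai"`/`"gemini"` vendor labels, and the choice to raise on uncertified combinations are all assumptions. The numbers are the ones quoted in CLAUDE.md: the default ladder 0.10 / 0.15 / 0.15 and the certified controlnet floors of OpenAI 0.20 and Gemini 0.30, with the Gemini floor certified only at ≤1536.

```python
from __future__ import annotations

# Plain SDXL img2img ladder (vendor-adaptive default). Vendor labels are assumed.
DEFAULT_LADDER = {"openai": 0.10, "gemini": 0.15}
DEFAULT_UNKNOWN = 0.15

# Certified controlnet floors (restore OFF, max_resolution <= 1536).
CONTROLNET_FLOORS = {"openai": 0.20, "gemini": 0.30}
CERTIFIED_MAX_RESOLUTION = 1536


def resolve_strength_sketch(
    strength: float | None,
    pipeline: str,
    vendor: str | None,
    max_resolution: int,
) -> float:
    """Pick an img2img strength; an explicit strength always wins (hypothetical helper)."""
    if strength is not None:
        return strength
    if pipeline != "controlnet":
        return DEFAULT_LADDER.get(vendor or "", DEFAULT_UNKNOWN)
    if vendor not in CONTROLNET_FLOORS:
        # No certified controlnet floor exists for this vendor, so refuse to guess.
        raise ValueError(f"no certified controlnet floor for vendor {vendor!r}")
    # Gemini is resolution-sensitive. 0.30 was certified only at <= 1536, and
    # max_resolution=0 means native (uncapped), which still needs calibration.
    if vendor == "gemini" and (
        max_resolution == 0 or max_resolution > CERTIFIED_MAX_RESOLUTION
    ):
        raise ValueError("Gemini controlnet floor is only certified at max_resolution <= 1536")
    return CONTROLNET_FLOORS[vendor]
```

Raising on native Gemini, rather than silently returning 0.30, keeps an uncertified configuration from shipping. Recommendation (3), the fixed seed, is orthogonal to this sketch: it would be passed to the diffusion call separately.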
diff --git a/src/remove_ai_watermarks/photomaker_restore.py b/src/remove_ai_watermarks/photomaker_restore.py index 52cb654..dc54c5f 100644 --- a/src/remove_ai_watermarks/photomaker_restore.py +++ b/src/remove_ai_watermarks/photomaker_restore.py @@ -1,4 +1,4 @@ -"""SynthID-robust face identity restoration via PhotoMaker-V2. +"""SynthID-robust face identity restoration via PhotoMaker-V1. The diffusion removal pass scrubs the pixel watermark from the WHOLE image, including faces, but lets faces drift in identity. Unlike the GFPGAN restore pass in @@ -14,11 +14,16 @@ empirically confirmed 2026-06-04: on 31 face crops, the cosine similarity betwee SynthID magnitude) is 0.9977 -- an order of magnitude less drift than JPEG90, which SynthID survives at >=99% TPR by design. See ``docs/synthid-robust-identity-research.md``. -Architecture: PhotoMaker-V2 is a fine-tuned OpenCLIP-ViT-H/14 ID encoder plus LoRA on -the SDXL UNet attention layers. It ships as a single ``photomaker-v2.bin`` checkpoint -loaded into a ``PhotoMakerStableDiffusionXLPipeline`` (txt2img only -- there is no -PhotoMakerControlNetImg2img class in diffusers). We use it as a SECOND PASS after the -main controlnet/default removal: +Architecture: PhotoMaker-V1 is a fine-tuned OpenCLIP-ViT-H/14 ID encoder plus LoRA on +the SDXL UNet attention layers. It ships as a single ``photomaker-v1.bin`` checkpoint +loaded into a ``PhotoMakerStableDiffusionXLPipeline`` (txt2img). **V1, not V2:** V2 +adds an InsightFace/ArcFace face-recognition component at runtime, whose pretrained +model packs (antelopev2, buffalo_l) are non-commercial-research-only per the +InsightFace README, which would block a paid service like raiw.cc. V1's identity +encoder is CLIP-only (PhotoMakerIDEncoder, ``model.py``); confirmed by inspecting +the upstream source (model_v2.py forward takes ``id_embeds`` from InsightFace; V1 +forward does not). We use it as a SECOND PASS after the main controlnet/default +removal: 1. 
1. Main removal pass (`controlnet` at the certified strength) cleans SynthID everywhere but leaves faces drifted. @@ -31,11 +36,12 @@ The generated face pixels are diffusion-fresh and inherit identity from the embe (not the pixels), so SynthID is not re-introduced. Commercial-safe end-to-end: -- PhotoMaker-V2 weights: Apache-2.0 (TencentARC). +- PhotoMaker-V1 weights: Apache-2.0 (TencentARC). - ID encoder: OpenCLIP-ViT-H/14 (MIT) finetuned by PhotoMaker (still Apache-2.0). - SDXL base: shared with the main pipeline (already used in `default`/`controlnet`). -- NO InsightFace / antelopev2 (which is the non-commercial blocker for IP-Adapter - FaceID / InstantID / PuLID / Arc2Face). +- NO InsightFace / antelopev2 (the non-commercial dependency that blocks PhotoMaker-V2, + IP-Adapter FaceID, InstantID, PuLID, and Arc2Face). V1 is the only commercial-safe + identity stack among these candidates. Requires the optional ``photomaker`` extra: ``pip install 'remove-ai-watermarks[photomaker]'`` (pulls torch / diffusers / the upstream PhotoMaker @@ -57,9 +63,10 @@ if TYPE_CHECKING: logger = logging.getLogger(__name__) -# PhotoMaker-V2 weights (Apache-2.0, TencentARC). Downloaded on first use. -_PHOTOMAKER_REPO = "TencentARC/PhotoMaker-V2" -_PHOTOMAKER_FILE = "photomaker-v2.bin" +# PhotoMaker-V1 weights (Apache-2.0, TencentARC). Downloaded on first use. V2 is NOT +# used because it pulls InsightFace at runtime (non-commercial models). +_PHOTOMAKER_REPO = "TencentARC/PhotoMaker" +_PHOTOMAKER_FILE = "photomaker-v1.bin" # SDXL base shared with the main pipeline (same checkpoint as `default`/`controlnet`). _SDXL_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
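The V2 problem surfaced only as an import error during the Modal cert sweep. A cheap pre-adoption check is to scan a candidate package's source for InsightFace imports before wiring it in. The sketch below is hypothetical tooling, not part of this repo. It parses every `.py` file under a directory with the standard-library `ast` module and reports absolute imports whose top-level package is blocked. The whole package has to be scanned: per the commit message, V2's `__init__.py` reaches InsightFace through a wrapper (`FaceAnalysis2`) rather than importing it directly. A static scan misses dynamic imports (`importlib.import_module`) and anything inside third-party dependencies, so it complements, rather than replaces, checking the license of every runtime model pack.

```python
from __future__ import annotations

import ast
from pathlib import Path

# Top-level packages whose pretrained model packs are non-commercial (assumed list; extend as needed).
BLOCKED_PACKAGES = frozenset({"insightface"})


def blocked_imports(source: str, blocked: frozenset[str] = BLOCKED_PACKAGES) -> list[str]:
    """Return absolute import targets in `source` whose top-level package is blocked."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):  # also visits imports nested in functions
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            # Not an import, or a relative import (from .x import y). A relative target
            # lives inside the scanned package, so any blocked absolute import it makes
            # is reported when that sibling file is scanned.
            continue
        hits.extend(name for name in names if name.split(".")[0] in blocked)
    return hits


def scan_package(root: Path, blocked: frozenset[str] = BLOCKED_PACKAGES) -> dict[str, list[str]]:
    """Map each .py file under `root` (as a relative POSIX path) to its blocked imports."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        hits = blocked_imports(path.read_text(encoding="utf-8"), blocked)
        if hits:
            report[path.relative_to(root).as_posix()] = hits
    return report
```

Run against a local checkout of a candidate package, an empty report is a necessary but not sufficient signal: it rules out direct and wrapped imports inside that package only.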