fix(photomaker): switch to V1 — V2 actually requires InsightFace (non-commercial)

A Modal cert sweep caught what the research doc missed: PhotoMaker-V2 fails at import without InsightFace ("No module named 'insightface'"). Reading the upstream source confirms it: `photomaker/__init__.py` imports `FaceAnalysis2` (an InsightFace wrapper) at module load, V2's encoder is named `PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken`, and `model_v2.py`'s forward takes an `id_embeds` argument that the pipeline computes via `insightface.app.FaceAnalysis(name='antelopev2', ...)`. So V2 is a DUAL encoder (CLIP + ArcFace), not CLIP-only as the model card line "id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers" implied.

InsightFace's pretrained model packs (antelopev2, buffalo_l) are research/non-commercial only per their own README: "The pretrained models we provided with this library are available for non-commercial research purposes only." So V2 is blocked for a paid service like raiw.cc.

PhotoMaker-V1 is the commercial-safe alternative: its `PhotoMakerIDEncoder` (model.py) forward takes only `(id_pixel_values, prompt_embeds, class_tokens_mask)`, with no ArcFace branch. Identity is CLIP-only, license is Apache-2.0, no InsightFace.

Code change: swap the repo + filename constants in `photomaker_restore.py` (TencentARC/PhotoMaker, photomaker-v1.bin). Tests still pass (the 9 PhotoMaker tests use a fake pipeline, so the model swap is transparent to them).

Doc correction: rewrote the verdict / license table / section 5 of `docs/synthid-robust-identity-research.md` to lead with V1 and add a correction notice explaining the V2 misread. Bulk-renamed `PhotoMaker-V2` to `PhotoMaker-V1` across CLAUDE.md, README.md, docs/synthid.md, and docs/controlnet-removal-pipeline-research.md (kept V2 only in the correction notice, the license table, and the anchor reference).

ruff clean; 578 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-28 10:18:49 +02:00 · 2026-06-08 16:05:58 -07:00
parent 7e6fc8bfb9
commit dfa5181309
6 changed files with 60 additions and 38 deletions
@@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
 - **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
 - **Analog Humanizer** — optional film grain and chromatic aberration post-processing
- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness); identity is preserved by the `--restore-faces` PhotoMaker-V2 post-pass (opt-in, SynthID-safe). Both are experimental and off by default.
+- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness); identity is preserved by the `--restore-faces` PhotoMaker-V1 post-pass (opt-in, SynthID-safe). Both are experimental and off by default.
 - **Batch processing** — process entire directories
 - **Detection** — three-stage NCC watermark detection with confidence scoring
 - **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
@@ -128,7 +128,7 @@ image → encode to latent space (VAE) at native resolution
 >
 > **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically).
 >
-> **`--restore-faces` preserves face identity (PhotoMaker-V2, experimental, opt-in).** Canny preserves where a face is, but not who it is — the regenerated face drifts in likeness. The `--restore-faces` post-pass (experimental, off by default; needs the `photomaker` extra) fixes this in a SynthID-safe way: identity comes from an OpenCLIP-ViT-H/14 embedding of the original face (validated 2026-06-04: cosine 0.9977 invariance to SynthID-magnitude pixel noise, an order of magnitude less drift than JPEG90 which SynthID survives), and a fresh face is regenerated from that embedding — the pixels are diffusion-fresh, so the watermark is not transported. Commercial-safe end-to-end: PhotoMaker-V2 weights Apache-2.0, OpenCLIP-ViT-H/14 MIT, no InsightFace. The earlier GFPGAN-based `restore` extra was removed 2026-06-04 because it ran on the watermarked original and was oracle-confirmed to re-introduce SynthID; CodeFormer stays non-commercial and is not shipped. See `docs/synthid-robust-identity-research.md`.
+> **`--restore-faces` preserves face identity (PhotoMaker-V1, experimental, opt-in).** Canny preserves where a face is, but not who it is — the regenerated face drifts in likeness. The `--restore-faces` post-pass (experimental, off by default; needs the `photomaker` extra) fixes this in a SynthID-safe way: identity comes from an OpenCLIP-ViT-H/14 embedding of the original face (validated 2026-06-04: cosine 0.9977 invariance to SynthID-magnitude pixel noise, an order of magnitude less drift than JPEG90 which SynthID survives), and a fresh face is regenerated from that embedding — the pixels are diffusion-fresh, so the watermark is not transported. Commercial-safe end-to-end: PhotoMaker-V1 weights Apache-2.0, OpenCLIP-ViT-H/14 MIT, no InsightFace. The earlier GFPGAN-based `restore` extra was removed 2026-06-04 because it ran on the watermarked original and was oracle-confirmed to re-introduce SynthID; CodeFormer stays non-commercial and is not shipped. See `docs/synthid-robust-identity-research.md`.

 SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 Pro outputs, where the older SD-1.5 pipeline at 768 px did not. The SD-1.5 path was removed once it was verified not to handle v2. Note the scope: this defeats the SynthID *verifier*, which is not the same as being forensically indistinguishable from a real photo. Recent work ([arXiv:2605.09203](https://arxiv.org/abs/2605.09203)) shows watermark-removal pipelines leave detectable traces, so a separate "this image was processed" classifier can still flag the output.

@@ -136,7 +136,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P

 > **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203).

-**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity* (identity is preserved by the `--restore-faces` PhotoMaker-V2 post-pass, experimental and off by default — see the callout above). Both features are experimental.
+**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity* (identity is preserved by the `--restore-faces` PhotoMaker-V1 post-pass, experimental and off by default — see the callout above). Both features are experimental.

 **Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.)

@@ -215,8 +215,8 @@ After installation the `remove-ai-watermarks` command is available system-wide.
 > ```
 >
 > To preserve face identity after invisible removal (the `--restore-faces`
-> PhotoMaker-V2 post-pass, experimental and opt-in, SynthID-safe), install the
-> `photomaker` extra. The PhotoMaker-V2 adapter and SDXL base weights download on
+> PhotoMaker-V1 post-pass, experimental and opt-in, SynthID-safe), install the
+> `photomaker` extra. The PhotoMaker-V1 adapter and SDXL base weights download on
 > first use (~4 GB total). Commercial-safe end-to-end (Apache-2.0 + MIT, no
 > InsightFace):
 >
@@ -124,7 +124,7 @@ Gemini app; the two payloads are vendor-specific and never cross-checked):
 - **Fix the seed in prod.** The non-determinism is purely `seed=None` (random); a fixed
  `--seed` makes every run reproduce the certified-clean result, so you ship a
  deterministic, re-certifiable config (and the seed sweep collapses to one config).
- **`--restore-faces` is SynthID-safe by construction now (PhotoMaker-V2, 2026-06-04).**
+- **`--restore-faces` is SynthID-safe by construction now (PhotoMaker-V1, 2026-06-04).**
  The GFPGAN-on-original path that re-added SynthID was removed; the shipped restore
  carries identity in a SynthID-invariant OpenCLIP embedding and regenerates fresh
  pixels conditioned on it. Needs the `photomaker` extra. See
@@ -10,12 +10,25 @@ the face embedder it conditions on AND any base model) must be Apache-2.0 / MIT
 BSD or otherwise clearly commercial-permitted. Non-commercial is disqualifying.

 **One-line verdict.** Today there is **ONE** SDXL identity-conditioning stack that
-is commercial-safe end-to-end: **PhotoMaker-V2** (Apache-2.0, identity encoded as a
+is commercial-safe end-to-end: **PhotoMaker-V1** (Apache-2.0, identity encoded as a
 fine-tuned OpenCLIP-ViT-H/14 image embedding -- NO InsightFace). Every other
-candidate (IP-Adapter FaceID family, InstantID, PuLID, Arc2Face) inherits
-InsightFace's non-commercial model-pack license through its ArcFace-class embedder
-and is therefore blocked for paid services, regardless of the adapter's own
-license header. Below is the evidence per component and the integration plan.
+candidate -- **including PhotoMaker-V2**, IP-Adapter FaceID, InstantID, PuLID,
+Arc2Face -- inherits InsightFace's non-commercial model-pack license through an
+ArcFace-class embedder and is therefore blocked for paid services, regardless of
+the adapter's own license header. Below is the evidence per component and the
+integration plan.
+
+**Correction notice (2026-06-04).** An earlier version of this doc claimed
+PhotoMaker-V2 was commercial-safe end-to-end. That was WRONG -- the V2 model card
+phrase *"id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers"*
+described one of TWO ID branches; the V2 source (model_v2.py) defines
+`PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken` whose forward takes an
+ArcFace `id_embeds` from `insightface.app.FaceAnalysis`, and the upstream package
+`__init__.py` imports InsightFace at module load. A Modal cert sweep caught this
+empirically (`No module named 'insightface'` from `restore_faces_photomaker`). V1
+is the correct commercial-safe target: its `PhotoMakerIDEncoder` (model.py)
+forward takes only `(id_pixel_values, prompt_embeds, class_tokens_mask)` -- no
+ArcFace branch -- so identity is CLIP-only.

 ## 1. Why identity-by-embedding (not by pixel) is the only SynthID-robust path

@@ -44,7 +57,8 @@ the watermark is not transported. Two embedding families exist in practice:

 | stack | adapter weights | identity encoder | end-to-end commercial-safe? |
 |---|---|---|---|
-| **PhotoMaker-V2** | **Apache-2.0** ([HF model card][pm2hf]) | **OpenCLIP-ViT-H/14 (MIT)** finetuned, see card: *"id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers"* | **YES** |
+| **PhotoMaker-V1** | **Apache-2.0** ([HF][pmhf]) | **OpenCLIP-ViT-H/14 (MIT)** finetuned, identity from `PhotoMakerIDEncoder` (`model.py`); forward takes only ``(id_pixel_values, prompt_embeds, class_tokens_mask)`` -- no ArcFace branch | **YES** |
+| PhotoMaker-V2 | Apache-2.0 (adapter) ([HF][pm2hf]) | DUAL encoder: OpenCLIP-ViT-H/14 AND InsightFace antelopev2/buffalo_l -- `PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken` (`model_v2.py`) forward takes `id_embeds` from `insightface.app.FaceAnalysis`, and `photomaker/__init__.py` imports InsightFace at module load | NO -- InsightFace pack is non-commercial |
 | IP-Adapter FaceID | non-commercial per model card: *"AS InsightFace pretrained models are available for non-commercial research purposes, IP-Adapter-FaceID models are released exclusively for research purposes and is not intended for commercial use"* ([HF][ipafhf]) | InsightFace antelopev2 (non-commercial for the model pack) | NO -- both layers block |
 | InstantID | Apache-2.0 (adapter only) ([HF][insthf]) | requires InsightFace antelopev2 face-analysis at runtime (`FaceAnalysis(name='antelopev2', ...)` per the README usage snippet, [HF][insthf]) | NO -- embedder pack is non-commercial |
 | PuLID | apache-2.0 (HF model metadata, [HF][pulidhf]) | depends on InsightFace face-analysis for ArcFace embedding (per the upstream README; PuLID's own card is sparse and the GitHub README documents the InsightFace install step) | NO -- same embedder issue as IP-Adapter FaceID |
@@ -66,6 +80,7 @@ weights but the upstream repo's quickstart requires the InsightFace package to
 extract the ID embedding. So PuLID's adapter license is permissive; the BLOCKER
 is the embedder it expects at runtime. This is the same trap as InstantID.)

+[pmhf]: https://huggingface.co/TencentARC/PhotoMaker
 [pm2hf]: https://huggingface.co/TencentARC/PhotoMaker-V2
 [ipafhf]: https://huggingface.co/h94/IP-Adapter-FaceID
 [insthf]: https://huggingface.co/InstantX/InstantID
@@ -87,7 +102,7 @@ but you would need:

 For a removal service this is a multi-month side project that delivers what
 PhotoMaker already gives us with one pip install. So the practical answer is to
-take the CLIP-embedding path (PhotoMaker-V2), accept the identity-fidelity
+take the CLIP-embedding path (PhotoMaker-V1; V2 adds InsightFace and is non-commercial), accept the identity-fidelity
 trade-off, and revisit ArcFace later if quality is insufficient.

 ## 4. Does an identity embedding leak SynthID?
@@ -109,7 +124,7 @@ This is the load-bearing assumption of the whole approach. The argument:
 **MEASURED 2026-06-04 — hypothesis confirmed.** Ran a low-amplitude
 perturbation sweep on 31 face crops (3 photoreal originals: gemini_3, gemini_4,
 openai_3 grid), comparing `cos(embedding(orig), embedding(perturbed))` for OpenCLIP-
-ViT-H/14 (laion2B-s32B-b79K, the same encoder PhotoMaker-V2 finetunes):
+ViT-H/14 (laion2B-s32B-b79K, the same OpenCLIP-ViT-H/14 encoder PhotoMaker V1 and V2 both finetune for CLIP-side identity):

 | perturbation | mean cos | min | max |
 |---|---|---|---|
@@ -123,7 +138,7 @@ ViT-H/14 (laion2B-s32B-b79K, the same encoder PhotoMaker-V2 finetunes):
 The SynthID-magnitude perturbation moves the embedding by **0.002** (cosine 0.9977),
 an order of magnitude less than JPEG90 — which SynthID survives at >=99% TPR by
 design. So the embedding cannot carry the watermark pattern: its discriminative
-signal is in dimensions the SynthID payload does not occupy. PhotoMaker-V2
+signal is in dimensions the SynthID payload does not occupy. PhotoMaker-V1
 conditioned on a watermarked face will see ~the same identity vector as if
 conditioned on a clean face of the same person, so the freshly generated face
 inherits the identity, not the watermark.
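
The invariance figure quoted in this hunk is a cosine similarity between two embedding vectors. As a minimal sketch of that metric only, assuming the embeddings are already available as flat float sequences: `cosine_similarity` and `embedding_drift` are illustrative names, this is not the uncommitted sweep script, and no measured values are reproduced here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embedding_drift(original: Sequence[float], perturbed: Sequence[float]) -> float:
    """Drift expressed as 1 - cosine, so 0.0 means the direction is unchanged."""
    return 1.0 - cosine_similarity(original, perturbed)
```

Under this definition a cosine of 0.9977 corresponds to a drift of roughly 0.002, which is how the two numbers in the paragraph above relate.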
@@ -136,7 +151,7 @@ synthid_proxy result above is the one that actually answers the load-bearing
 question. Script: `/tmp/identity_smoke/test2_proxy.py` (not committed; reproducible
 from the test set + this doc).

-## 5. PhotoMaker-V2 properties for our pipeline
+## 5. PhotoMaker-V1 properties for our pipeline

 - **SDXL-native.** PhotoMaker v1 and v2 target Stable Diffusion XL; the pipeline
  is a stacked-ID embedding fused into SDXL's cross-attention via the fuse layers
@@ -166,9 +181,9 @@ from the test set + this doc).
 - New deps: `diffusers` already in the gpu extra; PhotoMaker ships as a `.bin`
  loaded via `pipeline.load_photomaker_adapter(...)`. The OpenCLIP encoder is the
  same one diffusers already pulls. No new heavy pip dep.
- Weight download: PhotoMaker-V2 weights are ~3 GB. Add to the Modal HF volume
+- Weight download: PhotoMaker-V1 weights are ~3 GB. Add to the Modal HF volume
  alongside SDXL.
- VRAM: SDXL + canny ControlNet + PhotoMaker-V2 fits comfortably in A100-40GB.
+- VRAM: SDXL + canny ControlNet + PhotoMaker-V1 fits comfortably in A100-40GB.
 - Latency: a few extra seconds on cold start (load PhotoMaker), negligible per
  request after warm-up.
 - No InsightFace install: huge win for `restore` extra's basicsr/numpy hell --
@@ -184,7 +199,7 @@ from the test set + this doc).
   - If yes -> the embedding does not carry SynthID, proceed.
   - If no -> the assumption is wrong; PhotoMaker would re-introduce the
     watermark. Stop and reconsider.
-2. **PhotoMaker-V2 prototype** in the existing `controlnet` pipeline:
+2. **PhotoMaker-V1 prototype** in the existing `controlnet` pipeline:
   - Mirror the `_load_controlnet_pipeline` path: add a PhotoMaker variant that
     loads SDXL + canny ControlNet + PhotoMaker adapter on the same engine.
   - Extract the OpenCLIP face embedding from the watermarked face crops (use
@@ -570,7 +570,7 @@ table.
 schedule to `resolve_strength`, do not reuse the default ladder; (2) the
 `--restore-faces` pass is now SynthID-safe by construction (the GFPGAN-on-original
 path that re-added SynthID was removed 2026-06-04; the shipped restore is
-PhotoMaker-V2, identity-as-embedding, see `synthid-robust-identity-research.md`); (3)
+PhotoMaker-V1, identity-as-embedding, see `synthid-robust-identity-research.md`); (3)
 removal near threshold is seed-non-deterministic -> FIX the prod seed (kills the
 coin-flip; ship a deterministic certified config).

@@ -1,4 +1,4 @@
-"""SynthID-robust face identity restoration via PhotoMaker-V2.
+"""SynthID-robust face identity restoration via PhotoMaker-V1.

 The diffusion removal pass scrubs the pixel watermark from the WHOLE image, including
 faces, but lets faces drift in identity. Unlike the GFPGAN restore pass in
@@ -14,11 +14,16 @@ empirically confirmed 2026-06-04: on 31 face crops, the cosine similarity betwee
 SynthID magnitude) is 0.9977 -- an order of magnitude less drift than JPEG90, which
 SynthID survives at >=99% TPR by design. See ``docs/synthid-robust-identity-research.md``.

-Architecture: PhotoMaker-V2 is a fine-tuned OpenCLIP-ViT-H/14 ID encoder plus LoRA on
-the SDXL UNet attention layers. It ships as a single ``photomaker-v2.bin`` checkpoint
-loaded into a ``PhotoMakerStableDiffusionXLPipeline`` (txt2img only -- there is no
-PhotoMakerControlNetImg2img class in diffusers). We use it as a SECOND PASS after the
-main controlnet/default removal:
+Architecture: PhotoMaker-V1 is a fine-tuned OpenCLIP-ViT-H/14 ID encoder plus LoRA on
+the SDXL UNet attention layers. It ships as a single ``photomaker-v1.bin`` checkpoint
+loaded into a ``PhotoMakerStableDiffusionXLPipeline`` (txt2img). **V1, not V2:** V2
+adds an InsightFace/ArcFace face-recognition component at runtime, whose pretrained
+model packs (antelopev2, buffalo_l) are non-commercial-research-only per the
+InsightFace README, which would block a paid service like raiw.cc. V1's identity
+encoder is CLIP-only (PhotoMakerIDEncoder, ``model.py``); confirmed by inspecting
+the upstream source (model_v2.py forward takes ``id_embeds`` from InsightFace; V1
+forward does not). We use it as a SECOND PASS after the main controlnet/default
+removal:

  1. Main removal pass (`controlnet` at the certified strength) cleans SynthID
     everywhere but leaves faces drifted.
@@ -31,11 +36,12 @@ The generated face pixels are diffusion-fresh and inherit identity from the embe
 (not the pixels), so SynthID is not re-introduced.

 Commercial-safe end-to-end:
- PhotoMaker-V2 weights: Apache-2.0 (TencentARC).
+- PhotoMaker-V1 weights: Apache-2.0 (TencentARC).
 - ID encoder: OpenCLIP-ViT-H/14 (MIT) finetuned by PhotoMaker (still Apache-2.0).
 - SDXL base: shared with the main pipeline (already used in `default`/`controlnet`).
- NO InsightFace / antelopev2 (which is the non-commercial blocker for IP-Adapter
-  FaceID / InstantID / PuLID / Arc2Face).
+- NO InsightFace / antelopev2 (the non-commercial blocker that BLOCKS PhotoMaker-V2,
+  IP-Adapter FaceID, InstantID, PuLID, and Arc2Face). V1 is the only commercial-safe
+  member of this family.

 Requires the optional ``photomaker`` extra: ``pip install
 'remove-ai-watermarks[photomaker]'`` (pulls torch / diffusers / the upstream PhotoMaker
@@ -57,9 +63,10 @@ if TYPE_CHECKING:

 logger = logging.getLogger(__name__)

-# PhotoMaker-V2 weights (Apache-2.0, TencentARC). Downloaded on first use.
-_PHOTOMAKER_REPO = "TencentARC/PhotoMaker-V2"
-_PHOTOMAKER_FILE = "photomaker-v2.bin"
+# PhotoMaker-V1 weights (Apache-2.0, TencentARC). Downloaded on first use. V2 is NOT
+# used because it pulls InsightFace at runtime (non-commercial models).
+_PHOTOMAKER_REPO = "TencentARC/PhotoMaker"
+_PHOTOMAKER_FILE = "photomaker-v1.bin"
 # SDXL base shared with the main pipeline (same checkpoint as `default`/`controlnet`).
 _SDXL_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
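
For illustration, the swapped constants could be resolved to a local checkpoint path roughly as follows. This is a hedged sketch rather than the module's actual loader: `resolve_photomaker_checkpoint` and its injectable `download` parameter are hypothetical, and the default path assumes the `huggingface_hub` package is installed (its `hf_hub_download` function accepts `repo_id`, `filename` and `repo_type`).

```python
from typing import Callable, Optional

# Constants as they read after this commit.
_PHOTOMAKER_REPO = "TencentARC/PhotoMaker"
_PHOTOMAKER_FILE = "photomaker-v1.bin"


def resolve_photomaker_checkpoint(download: Optional[Callable[..., str]] = None) -> str:
    """Return a local filesystem path to the PhotoMaker-V1 checkpoint.

    `download` is injectable so tests can pass a fake (hypothetical design,
    not taken from the repository). When omitted, huggingface_hub is imported
    lazily so that importing this module does not need the optional extra.
    """
    if download is None:
        from huggingface_hub import hf_hub_download  # assumption: installed

        download = hf_hub_download
    return download(
        repo_id=_PHOTOMAKER_REPO,
        filename=_PHOTOMAKER_FILE,
        repo_type="model",
    )
```

Keeping the repo and filename as module constants means a test with a fake downloader can assert on exactly which checkpoint is requested, without touching the network.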