docs(restore): document that restore methods REGENERATE, not preserve

Empirical conclusion from the 2026-06-04 to 2026-06-08 cert sweeps:
no face-restore method tested recovers the original face. The
diffusion-based methods (PhotoMaker-V2, InstantID txt2img, InstantID
img2img-on-cleaned at three parameter settings) regenerate the face
from an ArcFace embedding via SDXL diffusion; GFPGAN-on-cleaned only
polishes the already-drifted face. Regenerated face pixels are
diffusion-fresh, which makes the face look MORE AI-generated than the
cleaned image (gloss, symmetric pores, SDXL "clean skin" aesthetic)
regardless of license.

The cleaned image from the main controlnet 0.20 removal pass is the
LEAST-AI state we can reach without re-introducing SynthID; any restore
on top trades original-look for embedding-driven regeneration. The
fundamental issue is structural: ArcFace encodes a person's "general
look" in 512 dimensions, and SDXL decodes that into pixels with the
inherent SDXL aesthetic. A stronger identity push (higher strength +
IP-Adapter scale) makes the face closer to the embedding but more
AI-looking; a weaker push lets identity drift further. No parameter
setting both recovers the original identity AND looks less AI than the
cleaned image.

Production conclusion: do not ship `--restore-faces` in any monetized
deployment. The extras (`instantid`, `photomaker`) stay in the library
for research / personal use where users explicitly want regeneration.
Documented at every entry point:
- CLAUDE.md: new "Face restore trade-off" bullet + every restore mention
  rewritten to "REGENERATES, does NOT recover"; controlnet bullet updated
- README.md: feature bullet + callout + secondary mention all updated
- docs/synthid-robust-identity-research-2026-06-08.md: appended
  "Empirical follow-up" section documenting the InstantID sweep phases
  (Phase 1 txt2img v1/v2/v3, Phase 2 img2img defaults + stronger params)
- docs/controlnet-removal-pipeline-research.md: updated restore-faces
  bullet to reflect the empirical conclusion
- CLI help: `_restore_faces_options` docstring + `--restore-faces` /
  `--restore-faces-method` help text all updated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Victor Kuznetsov
2026-06-08 21:08:11 -07:00
parent 7d8af7882a
commit 567f3ae729
5 changed files with 100 additions and 31 deletions
+5 -3
File diff suppressed because one or more lines are too long
+3 -3
@@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
- **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
- **Analog Humanizer** — optional film grain and chromatic aberration post-processing
- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness); identity is regenerated by the `--restore-faces` PhotoMaker-V2 post-pass (opt-in, **NON-COMMERCIAL** — pulls non-commercial InsightFace model packs). Both are experimental and off by default.
- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The optional `--restore-faces` post-pass (`instantid` default or `photomaker`, both **NON-COMMERCIAL**, off by default) does NOT recover the original face — it **regenerates** it from an ArcFace embedding via SDXL diffusion, which inherently makes the output look more AI-generated than the cleaned image. **For production face preservation, leave restore OFF and use the cleaned image as-is.**
- **Batch processing** — process entire directories
- **Detection** — three-stage NCC watermark detection with confidence scoring
- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
@@ -128,7 +128,7 @@ image → encode to latent space (VAE) at native resolution
>
> **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically).
>
> **`--restore-faces` regenerates faces from a CLIP+ArcFace embedding (PhotoMaker-V2, experimental, opt-in, NON-COMMERCIAL).** Canny preserves where a face is, but not who it is — the regenerated face drifts in likeness. The `--restore-faces` post-pass (experimental, off by default; needs the `photomaker` extra) crops each face from the original, feeds it to PhotoMaker-V2 as an identity reference, and regenerates a fresh face from a CLIP+ArcFace embedding which is then feather-composited into the cleaned image. The pixels are diffusion-fresh so SynthID is not re-introduced. **NON-COMMERCIAL:** PhotoMaker-V2's ID encoder pulls InsightFace antelopev2/buffalo_l model packs at runtime, which are released under a research-only license — a paid service must NOT use this flag. (A commercial-safe path was attempted via PhotoMaker-V1 + GFPGAN-on-cleaned but neither was a good fit: V1 hit upstream / diffusers-0.38 compatibility walls, and GFPGAN only polished the already-drifted face without restoring identity. See `docs/synthid-robust-identity-research.md`.)
> **`--restore-faces` REGENERATES faces; it does NOT recover original pixels.** Two methods, both **NON-COMMERCIAL**, both off by default: `instantid` (default, the `instantid` extra; InstantID img2img-on-cleaned + ArcFace embedding + landmark ControlNet) and `photomaker` (the `photomaker` extra; PhotoMaker-V2 txt2img + CLIP+ArcFace embedding). Both crop the face region from the cleaned image and run SDXL diffusion conditioned on an ArcFace embedding from the original — the output face pixels are diffusion-fresh so SynthID is not re-introduced, **but the output face inherently looks more AI-generated than the cleaned image**: every pixel is SDXL-decoded from a semantic embedding, gaining the typical "clean skin" gloss and losing the exact original identity. The cleaned image from the main controlnet 0.20 pass is the least-AI state we can reach without re-introducing SynthID; any restore on top of it trades original-look for embedding-driven regeneration. **For production face preservation, leave `--restore-faces` OFF.** Both extras are NON-COMMERCIAL because their ArcFace embedder is InsightFace's antelopev2 pack which is research-only; the empirical case for not shipping them in prod is the AI-look regardless of license (see `docs/synthid-robust-identity-research-2026-06-08.md`).
SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 Pro outputs, where the older SD-1.5 pipeline at 768 px did not. The SD-1.5 path was removed once it was verified not to handle v2. Note the scope: this defeats the SynthID *verifier*, which is not the same as being forensically indistinguishable from a real photo. Recent work ([arXiv:2605.09203](https://arxiv.org/abs/2605.09203)) shows watermark-removal pipelines leave detectable traces, so a separate "this image was processed" classifier can still flag the output.
@@ -136,7 +136,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P
> **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203).
**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity* (identity is regenerated by the `--restore-faces` PhotoMaker-V2 post-pass, experimental and off by default, **non-commercial** — see the callout above). Both features are experimental.
**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The optional `--restore-faces` post-pass would regenerate the face from an ArcFace embedding, but every shipped method makes the face look more AI-generated (see the callout above) — for production face preservation leave restore OFF.
**Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.)
+13 -8
@@ -124,14 +124,19 @@ Gemini app; the two payloads are vendor-specific and never cross-checked):
- **Fix the seed in prod.** The non-determinism is purely `seed=None` (random); a fixed
`--seed` makes every run reproduce the certified-clean result, so you ship a
deterministic, re-certifiable config (and the seed sweep collapses to one config).
- **`--restore-faces` is PhotoMaker-V2 (NON-COMMERCIAL).** The GFPGAN-on-cleaned path
was tried and rejected: it polished but did not restore identity. PhotoMaker-V2
regenerates faces from a CLIP+ArcFace embedding (so pixels are fresh, SynthID is not
re-introduced) but pulls InsightFace antelopev2/buffalo_l model packs at runtime,
which are research-only. Needs the `photomaker` extra; **a paid service MUST NOT
use this flag.** PhotoMaker-V1 was attempted as a commercial-safe alternative but
blocked by a CFG batch-dim mismatch in the upstream pipeline (forked from diffusers
0.29; we ship 0.38) — see `docs/synthid-robust-identity-research.md`.
- **`--restore-faces` is OFF in prod and stays opt-in.** Two methods ship
(`instantid` default, `photomaker`), both NON-COMMERCIAL. They REGENERATE the face
from an ArcFace embedding via SDXL diffusion, making the output face look more
AI-generated than the cleaned image (gloss, symmetric pores, SDXL "clean skin"
aesthetic). For production face preservation the cleaned image from controlnet 0.20
is the LEAST-AI state we can reach — any restore on top trades original-look for
embedding-driven regeneration. Empirical sweep summary: GFPGAN-on-cleaned polished
without identity recovery; PhotoMaker-V2 produced a different person; InstantID
txt2img produced studio-portrait patchwork on group photos; InstantID
img2img-on-cleaned with three parameter settings integrated scene context cleanly
but never recovered original identity precisely — every setting traded one problem
for another. See `docs/synthid-robust-identity-research-2026-06-08.md`
"Empirical follow-up" for the full sweep.
- **No local SynthID detector exists** → the service can't self-verify; bake in strength
margin and periodic oracle spot-checks.
- **Lesson:** visual-quality / face-identity recovery does NOT prove removal — only the
@@ -126,4 +126,60 @@ Six claims were refuted in adversarial verification, two of them load-bearing: A
- [source](https://github.com/IrvingMeng/MagFace/blob/main/LICENSE)
- [source](https://github.com/askerlee/AdaFace-dev)
- [source](https://openreview.net/forum?id=Hc2ZwCYgmB)
- [source](https://github.com/tencent-ailab/IP-Adapter/wiki/IP%E2%80%90Adapter%E2%80%90Face)
- [source](https://github.com/tencent-ailab/IP-Adapter/wiki/IP%E2%80%90Adapter%E2%80%90Face)
## Empirical follow-up (2026-06-08, end of session)
After the research synthesis above, InstantID was integrated end-to-end and cert-swept
on Modal A100 in two phases:
1. **Phase 1: InstantID txt2img per-face crop + composite.** Per-face InstantID
txt2img with the upstream `pipeline_stable_diffusion_xl_instantid`, ArcFace
embedding from the original face, landmark stick figure. Three composite
iterations:
- v1 (rectangular Gaussian alpha on the 2x square_box around each face):
visible patchwork on group photos; the generated 1024 backgrounds clashed with the scene.
- v2 (tight crop on YuNet-detected face in the generated 1024 + elliptical
alpha 0.45*bw x 0.55*bh + soft feather): ellipse axis exceeded bbox
vertically, clipped forehead/chin on single portrait, group still had
visible elliptical seams + cool-vs-warm tone clash with scene.
- v3 (tighter ellipse 0.32*bw x 0.42*bh + per-channel mean color match to
local cleaned canvas + softer feather): patchwork visually softened; faces
still read as studio portraits inserted into the scene, not as people
shot in the scene. Single portrait identity drifted (tatsunari -> "round
Asian male" vs original's thin face).
2. **Phase 2: InstantID img2img on cleaned crop.** Switched to the upstream
`pipeline_stable_diffusion_xl_instantid_img2img` (downloaded at first use
from raw.githubusercontent.com; requires `trust_remote_code=True`). Same
ArcFace + landmark conditioning but the SDXL diffusion source is the
CLEANED face crop, so the diffusion sees scene lighting / shoulders /
shadow direction directly. Multi-face composition improved substantially:
faces sit in the bar scene with matching warm tone, no more elliptical
seams. Single-portrait identity at the default (`strength=0.55`,
`ip_adapter_scale=0.8`, `controlnet_conditioning_scale=0.8`) was "similar
person, not exactly the original"; raising to `strength=0.7`,
`ip_adapter_scale=1.0`, `controlnet_conditioning_scale=1.0` brought identity
closer to the original but introduced more "SDXL gloss / clean skin" aesthetic.
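
The Phase 1 composite geometry can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the project's compositing code: it assumes the `0.45*bw x 0.55*bh` style figures are ellipse semi-axes relative to the face bbox (an assumption, though it matches the v2 note that the ellipse exceeded the bbox vertically), and the function names, the linear feather ramp, and the pure-Python pixel lists are all hypothetical stand-ins.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def ellipse_overflow(bw: float, bh: float, ax_frac: float, ay_frac: float) -> Tuple[float, float]:
    """Return how far a centred ellipse overshoots the bbox in (x, y), in pixels.

    ASSUMPTION: the fractions are semi-axes, so the ellipse spans
    2*ax_frac*bw horizontally and 2*ay_frac*bh vertically.
    """
    return (max(0.0, ax_frac * bw - bw / 2.0), max(0.0, ay_frac * bh - bh / 2.0))


def feathered_ellipse_alpha(w: int, h: int, ax_frac: float, ay_frac: float,
                            feather: float = 0.25) -> List[List[float]]:
    """Alpha mask: 1.0 in the ellipse core, linear ramp down to 0.0 at its edge."""
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    ax, ay = ax_frac * w, ay_frac * h
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            # Normalised elliptical radius: 1.0 lies exactly on the ellipse.
            r = (((x - cx) / ax) ** 2 + ((y - cy) / ay) ** 2) ** 0.5
            row.append(min(1.0, max(0.0, (1.0 - r) / feather)))
        mask.append(row)
    return mask


def match_channel_means(face: Sequence[Pixel], canvas: Sequence[Pixel]) -> List[Pixel]:
    """Shift each RGB channel of `face` so its mean equals the local canvas mean."""
    shift = [
        sum(p[c] for p in canvas) / len(canvas) - sum(p[c] for p in face) / len(face)
        for c in range(3)
    ]
    # Clamping to [0, 255] can pull the mean slightly off target near the extremes.
    return [tuple(min(255.0, max(0.0, p[c] + shift[c])) for c in range(3)) for p in face]
```

Under the semi-axis assumption, a vertical fraction above 0.5 necessarily overshoots the bbox, which is consistent with the v2 forehead/chin clipping; the v3 fractions (0.32, 0.42) stay inside it. A mean shift only aligns overall tone, which fits the v3 observation that patchwork softened while the faces still read as inserted portraits.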
**Net finding for raiw.cc (load-bearing).** The fundamental issue is structural:
ArcFace encodes "this person's general look" (ethnicity, gender, basic facial
geometry) at 512 dimensions; SDXL decodes that embedding into pixels with the
inherent SDXL aesthetic (smooth skin, symmetric pores, AI-photoreal look).
Stronger identity push (higher strength / IP-Adapter scale) makes the face
CLOSER to the embedded identity but MORE AI-looking; a weaker push lets
identity drift but leaves the face looking less AI-generated. There is no parameter
setting that simultaneously recovers original identity AND looks less AI than
the cleaned image, because the cleaned image is itself a controlnet-light
denoise of the original (closer to original pixels) while a restore pass is a
full SDXL regeneration (further from original pixels).
**Operational conclusion.** Do not ship `--restore-faces` in any monetized
deployment. The cleaned image from the main controlnet 0.20 pass is the
LEAST-AI state we can reach without re-introducing SynthID; every restore
method tested (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img,
InstantID img2img-on-cleaned at three parameter settings) trades original-look
for embedding-driven regeneration and makes the face read as "AI-generated"
rather than "the original person". The `instantid` and `photomaker` extras
stay in the library as opt-in for research / personal use where users
explicitly want identity regeneration; the CLI flag and module docstrings
state the trade-off at every entry point.
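
A deployment that wants more than documentation could back the rule with a small guard. The sketch below is hypothetical: `resolve_restore_faces`, the `monetized` setting, and the exception class are illustrative names that do not exist in the library, which (per the text above) relies on help text and docstrings rather than enforcement.

```python
class RestoreFacesNotAllowed(RuntimeError):
    """Raised when face regeneration is requested in a monetized deployment."""


def resolve_restore_faces(requested: bool, monetized: bool) -> bool:
    """Return the effective restore-faces setting, or refuse outright.

    HYPOTHETICAL policy helper: `monetized` would come from the service's own
    configuration, not from the CLI described in this document.
    """
    if requested and monetized:
        raise RestoreFacesNotAllowed(
            "--restore-faces regenerates faces with non-commercial models; "
            "leave it OFF and ship the cleaned image as-is."
        )
    return requested
```

Failing loudly rather than silently dropping the flag keeps a misconfigured service from believing a restore pass ran.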
+22 -16
@@ -238,31 +238,37 @@ def _warn_if_esrgan_unavailable(upscaler: str) -> None:
 def _restore_faces_options(f: Any) -> Any:
     """Attach the face-restoration flags to an invisible-pipeline command.
-    Two methods. ``instantid`` (default; the `instantid` extra) regenerates each
-    face from an ArcFace embedding + landmark ControlNet -- semantic identity
-    plus weak spatial control, no original pixels. ``photomaker`` (the
-    `photomaker` extra) uses PhotoMaker-V2's CLIP+ArcFace dual encoder.
-    **BOTH ARE NON-COMMERCIAL**: they pull InsightFace antelopev2 / buffalo_l
-    model packs at runtime, which are research-only. A paid service (raiw.cc,
-    any monetized SaaS) MUST NOT use this flag.
+    Both methods REGENERATE the face from an ArcFace embedding via SDXL diffusion
+    -- they do NOT recover original pixels. Every output face pixel is
+    diffusion-fresh, so the regenerated face inherently looks MORE AI-generated
+    than the cleaned image (gloss, symmetric pores, SDXL "clean skin"
+    aesthetic). For production face preservation, leave the flag OFF and use
+    the cleaned image as-is. The two methods are kept for research / personal
+    use where users explicitly want identity regeneration. **BOTH are
+    NON-COMMERCIAL**: they pull InsightFace antelopev2 / buffalo_l model packs
+    which are research-only. A paid service (raiw.cc, any monetized SaaS) MUST
+    NOT use this flag.
     """
     method = click.option(
         "--restore-faces-method",
         type=click.Choice(["instantid", "photomaker"]),
         default="instantid",
-        help="Face-restore mechanism. 'instantid' (default) uses InstantID's ArcFace + "
-        "landmark ControlNet for stronger identity fidelity on single portraits. "
-        "'photomaker' uses PhotoMaker-V2's CLIP+ArcFace dual encoder. **BOTH are "
-        "NON-COMMERCIAL** (InsightFace antelopev2 / buffalo_l model packs are "
-        "research-only). Pick whichever extra you've installed; for personal / research "
-        "use only. Do NOT use in a paid service.",
+        help="Face-regeneration mechanism (no method recovers original pixels; both "
+        "REGENERATE the face via SDXL). 'instantid' (default) uses InstantID img2img on "
+        "the cleaned crop with ArcFace + landmark ControlNet. 'photomaker' uses "
+        "PhotoMaker-V2 txt2img + CLIP+ArcFace dual encoder. **BOTH are NON-COMMERCIAL** "
+        "(InsightFace antelopev2 / buffalo_l packs are research-only). For personal / "
+        "research use only.",
     )(f)
     return click.option(
         "--restore-faces/--no-restore-faces",
         default=False,
-        help="EXPERIMENTAL, opt-in, **NON-COMMERCIAL**. Restore face identity via the "
-        "chosen --restore-faces-method (default: instantid); off by default, auto-skips "
-        "when no face is detected or the chosen extra is absent.",
+        help="EXPERIMENTAL, opt-in, **NON-COMMERCIAL**. **REGENERATES the face** (does "
+        "NOT recover original pixels) via the chosen --restore-faces-method; the "
+        "regenerated face looks more AI-generated than the cleaned image. Off by "
+        "default; auto-skips when no face is detected or the chosen extra is absent. "
+        "For production face preservation leave this OFF and use the cleaned image "
+        "as-is.",
     )(method)
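
To see how the two flags behave from a caller's point of view, here is a small stand-in built with the standard library's `argparse` instead of click, so it runs without the project installed. It is a sketch, not the project's CLI: the `demo` program name and the shortened help strings are made up, and only the flag names, choices, and defaults are taken from the diff above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """argparse stand-in for the click options (requires Python 3.9+)."""
    parser = argparse.ArgumentParser(prog="demo")  # hypothetical program name
    parser.add_argument(
        "--restore-faces",
        # Generates the paired --restore-faces / --no-restore-faces switches.
        action=argparse.BooleanOptionalAction,
        default=False,
        help="Regenerate faces (does NOT recover original pixels). Off by default.",
    )
    parser.add_argument(
        "--restore-faces-method",
        choices=["instantid", "photomaker"],
        default="instantid",
        help="Face-regeneration mechanism; both are non-commercial.",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With no arguments the parser leaves regeneration off and selects `instantid`, so the production-safe behavior is the default and regeneration always has to be requested explicitly.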