refactor(face-restore): roll back PhotoMaker, restore GFPGAN on the CLEANED image

After 7 cascading upstream-compat fixes (insightface dep, peft dep, pm_version,
device, etc.), the PhotoMaker V1 cert sweep still hit a CFG batch-dim mismatch
inside the denoising loop. The upstream PhotoMaker `pipeline.py` is forked from
diffusers v0.29.1 and our env runs 0.38; SDXL prompt-encoder handling changed
significantly between those versions, so making PhotoMaker work end-to-end
needs a proper fork or a diffusers downgrade — both expensive. Not worth
shipping today.

Pivot: restore `face_restore.py` (GFPGAN) with a single-line fix that makes it
SynthID-safe by construction. The previous design ran GFPGAN.enhance on the
ORIGINAL watermarked image and was oracle-confirmed to re-add SynthID via the
weight-0.5 pixel blend. The fix is to run GFPGAN on the diffusion-CLEANED
image — whatever pixels GFPGAN derives from are already SynthID-free, so the
partial blend cannot transport the watermark. Identity fidelity is lower than
a true identity-as-embedding stack would deliver, but it ships and works.
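
A toy illustration of that argument, assuming the restore step behaves like a
plain linear blend `weight * source + (1 - weight) * prior`. That is a
simplification: GFPGAN's real fidelity weighting is not a per-pixel lerp. The
names `blend`, `correlate`, `residual` and the synthetic ±2 pattern are
hypothetical and not part of this repo:

```python
import random


def blend(source, prior, weight=0.5):
    """Toy stand-in for a fidelity-weighted restore (assumed linear blend)."""
    return [weight * s + (1.0 - weight) * p for s, p in zip(source, prior)]


def correlate(signal, pattern):
    """Mean product of a signal with the watermark pattern (a crude detector score)."""
    return sum(s * p for s, p in zip(signal, pattern)) / len(pattern)


rng = random.Random(0)
n = 4096
content = [rng.uniform(0, 255) for _ in range(n)]      # hypothetical clean face pixels
pattern = [rng.choice((-2.0, 2.0)) for _ in range(n)]  # hypothetical +/-2 watermark pattern
prior = [rng.uniform(0, 255) for _ in range(n)]        # hypothetical generative-prior output

original = [c + p for c, p in zip(content, pattern)]   # watermarked source
cleaned = list(content)                                # watermark already removed


def residual(source):
    """Detector score of the blended result, relative to a watermark-free blend."""
    baseline = correlate(blend(content, prior), pattern)
    return correlate(blend(source, prior), pattern) - baseline


score_from_original = residual(original)  # about weight * 2.0**2 = 2.0
score_from_cleaned = residual(cleaned)    # exactly 0.0: nothing left to transport
```

Under this simplified model the watermark term scales with `weight` when the
source is the original, and vanishes when the source is the cleaned image.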

Changes:
- `src/remove_ai_watermarks/face_restore.py` restored from pre-wipe state with
  one line changed: `restorer.enhance(cleaned_bgr, ...)` instead of
  `restorer.enhance(original_bgr, ...)`. `original_bgr` is kept as an unused
  positional argument for API stability.
- `src/remove_ai_watermarks/photomaker_restore.py` and its tests REMOVED. The
  research note (`docs/synthid-robust-identity-research.md`) keeps a "status
  notice" documenting why PhotoMaker is parked for now and what the path back
  in would look like.
- `pyproject.toml`: the `restore` extra is restored (gfpgan/facexlib/basicsr +
  scipy<1.18 + numba<0.60 pins + the basicsr setuptools<69 build pin). The
  `photomaker` extra (with its einops/insightface/peft pile) and the
  `[tool.hatch.metadata] allow-direct-references = true` block are REMOVED.
- `InvisibleEngine._restore_faces_photomaker` removed; `_restore_faces`
  restored. The `--restore-faces` CLI flag and its plumbing through cmd_*
  signatures are unchanged.
- CLAUDE.md, README.md, docs/synthid.md, and
  docs/controlnet-removal-pipeline-research.md updated to describe the shipped
  GFPGAN-on-cleaned design and to reference PhotoMaker only as the parked
  alternative.
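
The API-stability choice in the first bullet can be sketched as follows. This
is a minimal stand-in, not the module's code: `enhance` and `fake_enhance` are
hypothetical, injected in place of the GFPGAN restorer so the snippet runs
without the `restore` extra.

```python
from typing import Any, Callable


def restore_faces(original_bgr: Any, cleaned_bgr: Any, enhance: Callable[[Any], Any]) -> Any:
    """Keep the legacy (original, cleaned) positional order but only use the cleaned image."""
    del original_bgr  # accepted for legacy callers, deliberately never read
    return enhance(cleaned_bgr)


seen: list[Any] = []


def fake_enhance(image: Any) -> Any:
    """Hypothetical restorer stand-in: records its input and returns a marked copy."""
    seen.append(image)
    return f"restored({image})"


result = restore_faces("watermarked-original", "diffusion-cleaned", fake_enhance)
```

Only the cleaned image ever reaches the restorer, so legacy two-argument call
sites keep working while the watermarked source stays out of the data path.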

ruff + strict pyright (src/) clean; 578 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Victor Kuznetsov
2026-06-08 16:55:45 -07:00
parent d1b85ee6a8
commit 01fe98bf54
13 changed files with 1273 additions and 851 deletions
+6 -8
@@ -238,18 +238,16 @@ def _warn_if_esrgan_unavailable(upscaler: str) -> None:
 def _restore_faces_options(f: Any) -> Any:
     """Attach the face-restoration flag to an invisible-pipeline command.

-    PhotoMaker-V2 is the only restoration method shipped (the prior GFPGAN path was
-    oracle-confirmed to re-introduce SynthID by partial pixel blending and has been
-    removed). PhotoMaker carries identity in a SynthID-invariant OpenCLIP embedding
-    and regenerates fresh face pixels conditioned on it -- see
-    ``docs/synthid-robust-identity-research.md``.
+    The post-pass runs GFPGAN on the DIFFUSION-CLEANED image (not the original), so
+    SynthID is not re-introduced (the input pixels GFPGAN derives from are already
+    SynthID-free). See ``face_restore.py``.
     """
     return click.option(
         "--restore-faces/--no-restore-faces",
         default=False,
-        help="EXPERIMENTAL, opt-in. Restore face identity with the PhotoMaker-V2 post-pass "
-        "when faces are present (needs the 'photomaker' extra); off by default, auto-skips "
-        "when no face is detected or the extra is absent.",
+        help="EXPERIMENTAL, opt-in. Polish face detail with a GFPGAN post-pass on the "
+        "cleaned image when faces are present (needs the 'restore' extra); off by default, "
+        "auto-skips when no face is detected or the extra is absent.",
     )(f)
+210
@@ -0,0 +1,210 @@
"""Optional GFPGAN face-polish post-pass for the invisible removal pipeline.
The diffusion removal pass scrubs the watermark everywhere but lets faces drift in
likeness (canny holds face *structure*, not *identity*). This module sharpens and
re-synthesizes each face from GFPGAN's StyleGAN2 prior, running on the
DIFFUSION-CLEANED image -- not on the original.
**Why "cleaned, not original":** an earlier version of this module ran GFPGAN on the
ORIGINAL (watermarked) image and was oracle-confirmed (2026-06-04) to re-introduce
SynthID into the face regions, because GFPGAN at fidelity weight 0.5 blends ~half
the input pixels with the prior, and SynthID is robust to that partial blend. The
fix is to feed GFPGAN the already-clean image -- whatever pixels it preserves are
already SynthID-free, so the composited face stays clean. Identity is recovered from
the StyleGAN2 prior conditioned on the already-drifted cleaned face (not on the
original face), so identity fidelity is somewhat lower than the would-have-been
identity-as-embedding stack (PhotoMaker-V1), but the upstream PhotoMaker package has
significant compatibility issues with the diffusers version we ship, so this is the
shipping path.
Both GFPGAN (Apache-2.0) and its RetinaFace detector (MIT) are commercial-safe.
The GFPGANv1.4 weights and the RetinaFace detector download on first use and are
never bundled. Requires the optional ``restore`` extra (gfpgan/facexlib/basicsr).
"""
# cv2/torch/gfpgan boundary: gfpgan/basicsr/facexlib ship no usable type stubs and
# this module wraps cv2 (feather composite) and torch; relax the unknown-type rules
# for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportOptionalMemberAccess=false, reportOptionalCall=false, reportOptionalSubscript=false, reportOptionalOperand=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false, reportPrivateUsage=false, reportInvalidTypeForm=false, reportConstantRedefinition=false, reportUnnecessaryComparison=false
from __future__ import annotations
import logging
import sys
import threading
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
# GFPGANv1.4 weights (Apache-2.0). Downloaded on first use, never bundled.
_GFPGAN_MODEL_URL = "https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth"
_GFPGAN_ARCH = "clean"
_GFPGAN_CHANNEL_MULTIPLIER = 2
_restorer: Any | None = None
_restorer_lock = threading.Lock()
def is_available() -> bool:
"""True when the optional GFPGAN face-restoration deps are importable."""
import importlib.util
return importlib.util.find_spec("gfpgan") is not None and importlib.util.find_spec("facexlib") is not None
def _apply_basicsr_shim() -> None:
"""Install the ``torchvision.transforms.functional_tensor`` compatibility shim.
basicsr (a GFPGAN dependency) imports ``rgb_to_grayscale`` from the
``torchvision.transforms.functional_tensor`` module, which newer torchvision
removed. Recreate that module pointing at the public functional API. Idempotent:
only installed when the real module is missing.
"""
import importlib.util
if importlib.util.find_spec("torchvision.transforms.functional_tensor") is not None:
return
if "torchvision.transforms.functional_tensor" in sys.modules:
return
import types
import torchvision.transforms.functional as tv_functional
shim = types.ModuleType("torchvision.transforms.functional_tensor")
shim.rgb_to_grayscale = tv_functional.rgb_to_grayscale
sys.modules["torchvision.transforms.functional_tensor"] = shim
def _select_device() -> str:
"""Pick the GFPGAN device: CUDA when present, else CPU.
The pip GFPGANer has an MPS device-mismatch bug, and this is a cheap post-pass
on a few face crops, so MPS is deliberately avoided -- CPU is the safe default
on Apple silicon.
"""
try:
import torch
if torch.cuda.is_available():
return "cuda"
except Exception as e:
logger.debug("face_restore: CUDA probe failed (%s); using CPU", e)
return "cpu"
def _get_restorer() -> Any:
"""Return the lazily-built GFPGANer singleton (downloads weights on first use)."""
global _restorer
if _restorer is not None:
return _restorer
with _restorer_lock:
if _restorer is None:
_apply_basicsr_shim()
from gfpgan import GFPGANer
_restorer = GFPGANer(
model_path=_GFPGAN_MODEL_URL,
upscale=1,
arch=_GFPGAN_ARCH,
channel_multiplier=_GFPGAN_CHANNEL_MULTIPLIER,
device=_select_device(),
)
return _restorer
def _composite_faces(
base_bgr: NDArray[Any],
restored_bgr: NDArray[Any],
boxes: list[tuple[float, float, float, float]],
pad: int = 14,
feather_div: int = 6,
) -> NDArray[Any]:
"""Feather-composite restored face regions from ``restored_bgr`` into ``base_bgr``.
Pure cv2/numpy helper (no gfpgan), so it is unit-testable without the model.
For each ``(x1, y1, x2, y2)`` box: pad and clip to the image, build a Gaussian-
feathered rectangular alpha, and blend ``restored * a + base * (1 - a)``. Boxes
that fall fully outside the image (or an empty list) leave ``base_bgr`` unchanged.
"""
import cv2
import numpy as np
out = base_bgr.astype(np.float32)
h, w = base_bgr.shape[:2]
for box in boxes:
x1 = int(box[0]) - pad
y1 = int(box[1]) - pad
x2 = int(box[2]) + pad
y2 = int(box[3]) + pad
x1 = max(0, min(x1, w))
y1 = max(0, min(y1, h))
x2 = max(0, min(x2, w))
y2 = max(0, min(y2, h))
bw = x2 - x1
bh = y2 - y1
if bw <= 0 or bh <= 0:
continue
alpha = np.zeros((h, w), dtype=np.float32)
alpha[y1:y2, x1:x2] = 1.0
k = max(3, (min(bw, bh) // feather_div) | 1) # odd kernel >= 3
alpha = cv2.GaussianBlur(alpha, (k, k), 0)
alpha = alpha[:, :, None]
out = restored_bgr.astype(np.float32) * alpha + out * (1.0 - alpha)
return np.clip(out, 0, 255).astype(np.uint8)
def restore_faces(
original_bgr: NDArray[Any], # legacy positional kept for API stability; unused
cleaned_bgr: NDArray[Any],
weight: float = 0.5,
pad: int = 14,
feather_div: int = 6,
) -> NDArray[Any]:
"""Restore face identity in ``cleaned_bgr`` by running GFPGAN on the CLEANED image.
GFPGAN is a fidelity-restoration net: it sharpens and re-synthesizes face details
from its StyleGAN2 prior conditioned on the INPUT face. **Running it on the
diffusion-cleaned image (not the original)** is what makes this pass SynthID-safe:
the input pixels GFPGAN derives from are already SynthID-free, so the partial
pixel-blend at the default weight 0.5 cannot re-introduce the watermark.
The earlier version of this module ran GFPGAN on the ORIGINAL (watermarked) image
and was oracle-confirmed (2026-06-04) to re-introduce SynthID into the face
regions. The fix is the single-line source swap below.
The ``original_bgr`` argument is kept for positional API stability with the
earlier signature but is no longer used; pass it for legacy callers, ignore it
in new code.
Args:
original_bgr: UNUSED (legacy; kept for positional API stability).
cleaned_bgr: The diffusion-cleaned image as cv2 BGR (faces drifted from the
removal pass). GFPGAN runs on THIS, polishing each face without changing
the watermark state of the source pixels.
weight: GFPGAN fidelity weight (0-1); lower = more StyleGAN2 regeneration of
the face from the prior.
pad: Pixels to grow each face box before compositing.
feather_div: Larger = sharper composite edge (box-min // feather_div kernel).
"""
restorer = _get_restorer()
_, _, restored_img = restorer.enhance(
cleaned_bgr,
has_aligned=False,
only_center_face=False,
paste_back=True,
weight=weight,
)
det_faces = getattr(restorer.face_helper, "det_faces", None) or []
boxes = [(float(b[0]), float(b[1]), float(b[2]), float(b[3])) for b in det_faces]
if not boxes:
logger.debug("face_restore: no faces detected; returning cleaned image unchanged")
return cleaned_bgr
return _composite_faces(cleaned_bgr, restored_img, boxes, pad=pad, feather_div=feather_div)
+30 -40
@@ -180,13 +180,11 @@ class InvisibleEngine:
             guidance_scale: Classifier-free guidance scale.
             seed: Random seed for reproducibility.
             humanize: Intensity of Analog Humanizer film grain (0 = off).
-            restore_faces: EXPERIMENTAL, opt-in (default False). Run the PhotoMaker-V2
-                face-identity post-pass when faces are present (needs the
-                ``photomaker`` extra). Carries identity via a SynthID-invariant OpenCLIP
-                embedding and regenerates fresh face pixels conditioned on it, so the
-                pixel watermark is not transported. Auto-skips with a debug log when the
-                extra is absent or no face is detected. See
-                ``docs/synthid-robust-identity-research.md``.
+            restore_faces: EXPERIMENTAL, opt-in (default False). Run the GFPGAN
+                face-polish post-pass when faces are present (needs the ``restore``
+                extra). Runs on the diffusion-CLEANED image (not the original), so
+                SynthID is not re-introduced. Auto-skips with a debug log when the
+                extra is absent or no face is detected.
             unsharp: Final unsharp-mask sharpening strength (0 = off, default).
                 Applied last (after face restoration) to counter the soft,
                 over-smoothed look of the diffusion + restoration; ~0.5-0.8 is a
@@ -312,13 +310,13 @@ class InvisibleEngine:
                 out_cv = cv2.resize(out_cv, orig_size, interpolation=cv2.INTER_LANCZOS4)
             image_io.imwrite(out_path, out_cv)

-            # Optional PhotoMaker-V2 face-identity post-pass: restore face identity that
-            # the diffusion regeneration drifted, carrying identity in a SynthID-invariant
-            # OpenCLIP embedding so the regenerated face pixels are watermark-free. Runs
-            # on the cleaned output at its final resolution; auto-skips when faces are
-            # absent or the optional extra is not installed.
+            # Optional GFPGAN face-polish post-pass: sharpens and re-synthesizes each
+            # face from GFPGAN's StyleGAN2 prior, running on the DIFFUSION-CLEANED image
+            # (not the original) -- so SynthID is not re-introduced (the input pixels
+            # GFPGAN derives from are already SynthID-free). Auto-skips when faces are
+            # absent or the optional `restore` extra is not installed.
             if restore_faces:
-                self._restore_faces_photomaker(out_path, image, seed)
+                self._restore_faces(out_path)

             # Final sharpening, LAST so it crisps the face-restored result too (a
             # pre-restore sharpen would be smoothed back over by the face pass).
@@ -357,50 +355,42 @@ class InvisibleEngine:
             if _tmp_path.exists():
                 _tmp_path.unlink()

-    def _restore_faces_photomaker(
-        self,
-        out_path: Path,
-        original_image: Any,
-        seed: int | None,
-    ) -> None:
-        """Run the PhotoMaker-V2 SynthID-safe face-identity restoration post-pass.
+    def _restore_faces(self, out_path: Path) -> None:
+        """Run the GFPGAN face-polish post-pass on the cleaned ``out_path``.

-        Unlike the GFPGAN path (which blends watermarked original face pixels back into
-        the cleaned output and re-introduces SynthID), PhotoMaker carries identity in a
-        SynthID-invariant OpenCLIP embedding and regenerates fresh face pixels conditioned
-        on it. Best-effort: any failure (missing extra, model load, runtime error) logs a
-        warning and leaves the un-restored cleaned output in place. See
-        ``docs/synthid-robust-identity-research.md`` and ``photomaker_restore.py``.
+        SynthID-safe: GFPGAN is run on the diffusion-CLEANED image (not the original),
+        so the partial pixel-blend it does at fidelity weight 0.5 cannot re-introduce
+        the watermark -- the input pixels GFPGAN derives from are already SynthID-free.
+        Best-effort: any failure logs a warning and leaves the un-restored cleaned
+        output in place; a missing ``restore`` extra is logged at debug and skipped
+        (the flag must never error when the extra is absent or no face is present).
         """
-        from remove_ai_watermarks import photomaker_restore
+        from remove_ai_watermarks import face_restore

-        if not photomaker_restore.is_available():
-            logger.debug("restore_faces=photomaker requested but the 'photomaker' extra is not installed; skipping")
+        if not face_restore.is_available():
+            logger.debug("restore_faces requested but the 'restore' extra is not installed; skipping")
             return
         try:
             import cv2
-            import numpy as np
             from remove_ai_watermarks import image_io
             cleaned_bgr = image_io.imread(out_path, cv2.IMREAD_COLOR)
             if cleaned_bgr is None:
-                logger.warning("restore_faces_photomaker: could not read cleaned output %s; skipping", out_path)
+                logger.warning("restore_faces: could not read cleaned output %s; skipping", out_path)
                 return
-            original_rgb = original_image.convert("RGB")
-            original_bgr = cv2.cvtColor(np.array(original_rgb), cv2.COLOR_RGB2BGR)
-            cleaned_size = (cleaned_bgr.shape[1], cleaned_bgr.shape[0])
-            if (original_bgr.shape[1], original_bgr.shape[0]) != cleaned_size:
-                original_bgr = cv2.resize(original_bgr, cleaned_size, interpolation=cv2.INTER_LANCZOS4)
             if self._progress_callback:
-                self._progress_callback("Restoring face identity (PhotoMaker-V2 post-pass)...")
-            restored = photomaker_restore.restore_faces_photomaker(original_bgr, cleaned_bgr, seed=seed)
+                self._progress_callback("Polishing face identity (GFPGAN on cleaned image)...")
+            # original_bgr is unused (GFPGAN runs on cleaned_bgr); pass an empty array
+            # for positional API stability with the legacy signature.
+            import numpy as np
+            restored = face_restore.restore_faces(np.empty((0, 0, 3), dtype=np.uint8), cleaned_bgr)
             image_io.imwrite(out_path, restored)
         except Exception as e:
-            logger.warning("restore_faces_photomaker post-pass failed (%s); keeping un-restored output", e)
+            logger.warning("restore_faces post-pass failed (%s); keeping un-restored output", e)

     def remove_watermark_batch(
         self,
@@ -1,343 +0,0 @@
"""SynthID-robust face identity restoration via PhotoMaker-V1.
The diffusion removal pass scrubs the pixel watermark from the WHOLE image, including
faces, but lets faces drift in identity. Unlike the GFPGAN restore pass in
``face_restore.py`` (which runs on the watermarked ORIGINAL and re-introduces SynthID
via partial pixel blending), PhotoMaker carries identity in a SEMANTIC EMBEDDING
(OpenCLIP-ViT-H/14 image embedding, finetuned by PhotoMaker-V2) and uses it to
CONDITION a fresh txt2img generation -- the pixels are new, so the watermark cannot
be transported.
That the embedding cannot carry an invisible pixel watermark like SynthID was
empirically confirmed 2026-06-04: on 31 face crops, the cosine similarity between
``embed(orig)`` and ``embed(synthid_proxy(orig))`` (a ±2 LSB low-frequency noise of
SynthID magnitude) is 0.9977 -- an order of magnitude less drift than JPEG90, which
SynthID survives at >=99% TPR by design. See ``docs/synthid-robust-identity-research.md``.
Architecture: PhotoMaker-V1 is a fine-tuned OpenCLIP-ViT-H/14 ID encoder plus LoRA on
the SDXL UNet attention layers. It ships as a single ``photomaker-v1.bin`` checkpoint
loaded into a ``PhotoMakerStableDiffusionXLPipeline`` (txt2img). **V1, not V2:** V2
adds an InsightFace/ArcFace face-recognition component at runtime, whose pretrained
model packs (antelopev2, buffalo_l) are non-commercial-research-only per the
InsightFace README, which would block a paid service like raiw.cc. V1's identity
encoder is CLIP-only (PhotoMakerIDEncoder, ``model.py``); confirmed by inspecting
the upstream source (model_v2.py forward takes ``id_embeds`` from InsightFace; V1
forward does not). We use it as a SECOND PASS after the main controlnet/default
removal:
1. Main removal pass (`controlnet` at the certified strength) cleans SynthID
everywhere but leaves faces drifted.
2. For each face found in the CLEANED image (YuNet), this module takes the SAME
face region from the ORIGINAL, computes a PhotoMaker ID embedding from it, and
runs PhotoMaker txt2img to regenerate JUST that face crop from the embedding.
The freshly generated face is feather-composited back into the cleaned image.
The generated face pixels are diffusion-fresh and inherit identity from the embedding
(not the pixels), so SynthID is not re-introduced.
Commercial-safe end-to-end:
- PhotoMaker-V1 weights: Apache-2.0 (TencentARC).
- ID encoder: OpenCLIP-ViT-H/14 (MIT) finetuned by PhotoMaker (still Apache-2.0).
- SDXL base: shared with the main pipeline (already used in `default`/`controlnet`).
- NO InsightFace / antelopev2 (the non-commercial blocker that BLOCKS PhotoMaker-V2,
IP-Adapter FaceID, InstantID, PuLID, and Arc2Face). V1 is the only commercial-safe
member of this family.
Requires the optional ``photomaker`` extra: ``pip install
'remove-ai-watermarks[photomaker]'`` (pulls torch / diffusers / the upstream PhotoMaker
package, all commercial-safe). Weights download on first use; never bundled.
**Why the extra includes ``insightface`` even though we use V1.** The upstream
PhotoMaker package's ``__init__.py`` unconditionally imports its face-analyser
wrapper (an InsightFace subclass), so JUST importing the V1 pipeline class needs
``insightface`` to be importable -- otherwise the import errors with
``ModuleNotFoundError: No module named 'insightface'`` (caught empirically by the
Modal cert sweep 2026-06-04). The PyPI ``insightface`` package itself is MIT-licensed
CODE; the non-commercial restriction is on the pretrained MODEL packs (antelopev2,
buffalo_l), which only download when the face-analyser class is INSTANTIATED. **We
never instantiate it** -- our V1 path uses
``PhotoMakerStableDiffusionXLPipeline.load_photomaker_adapter`` which loads
photomaker-v1.bin (the OpenCLIP-only encoder) and never touches the InsightFace face
analyser. So the legal status of the InsightFace model packs does not bind us; this
module only depends on the MIT-licensed CODE for the import to resolve. A test
(``tests/test_photomaker_restore.py::TestV1OnlyCommercialSafetyGuard``) asserts that
this module never references the face-analyser class.
"""
# cv2/torch/diffusers boundary: relax unknown-type rules for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportOptionalMemberAccess=false, reportOptionalCall=false, reportOptionalSubscript=false, reportOptionalOperand=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false, reportPrivateUsage=false, reportInvalidTypeForm=false, reportConstantRedefinition=false, reportUnnecessaryComparison=false
from __future__ import annotations
import importlib.util
import logging
import threading
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
# PhotoMaker-V1 weights (Apache-2.0, TencentARC). Downloaded on first use. V2 is NOT
# used because it pulls InsightFace at runtime (non-commercial models).
_PHOTOMAKER_REPO = "TencentARC/PhotoMaker"
_PHOTOMAKER_FILE = "photomaker-v1.bin"
# SDXL base shared with the main pipeline (same checkpoint as `default`/`controlnet`).
_SDXL_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
# The neutral prompt PhotoMaker is designed around: a class noun + the trigger word
# `img`, which PhotoMaker replaces with the ID embedding at inference. Keeping it
# scene-neutral (no extra style words) maximises identity transfer from the embed and
# minimises hallucinated background/lighting that would not match the cleaned scene.
_PHOTOMAKER_PROMPT = "a portrait photo of a person img, natural lighting, sharp focus"
_PHOTOMAKER_NEGATIVE = "blurry, lowres, deformed, distorted, watermark"
# Square size used to feed PhotoMaker (must match a multiple of 64; 512 fits CPU/GPU
# comfortably and gives the encoder enough pixels for a stable embedding).
_PHOTOMAKER_FACE_SIZE = 512
_pipeline: Any | None = None
_pipeline_lock = threading.Lock()
def is_available() -> bool:
"""True when the optional PhotoMaker extra deps are importable."""
return (
importlib.util.find_spec("photomaker") is not None
and importlib.util.find_spec("diffusers") is not None
and importlib.util.find_spec("huggingface_hub") is not None
)
def _select_device() -> str:
"""Pick the PhotoMaker pipeline device: CUDA when present, MPS on Apple, else CPU."""
try:
import torch
if torch.cuda.is_available():
return "cuda"
if torch.backends.mps.is_available():
return "mps"
except Exception as e:
logger.debug("photomaker_restore: device probe failed (%s); using CPU", e)
return "cpu"
def _get_pipeline() -> Any:
"""Return the lazily-built PhotoMaker pipeline singleton (downloads weights on first use)."""
global _pipeline
if _pipeline is not None:
return _pipeline
with _pipeline_lock:
if _pipeline is None:
import torch
from huggingface_hub import hf_hub_download
from photomaker import PhotoMakerStableDiffusionXLPipeline
device = _select_device()
dtype = torch.float16 if device == "cuda" else torch.float32
logger.info("photomaker_restore: loading SDXL+PhotoMaker on %s (%s)", device, dtype)
# Belt-and-suspenders: V1 file name. If a future maintainer points
# _PHOTOMAKER_FILE at v2, this stops the build so we don't silently regress
# to the non-commercial InsightFace path.
if _PHOTOMAKER_FILE != "photomaker-v1.bin":
raise RuntimeError(
f"PhotoMaker V1 is the only commercial-safe variant; got "
f"{_PHOTOMAKER_FILE!r}. V2 requires the non-commercial InsightFace "
"antelopev2/buffalo_l face packs "
"(see docs/synthid-robust-identity-research.md)."
)
adapter_path = hf_hub_download(repo_id=_PHOTOMAKER_REPO, filename=_PHOTOMAKER_FILE)
pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(_SDXL_MODEL_ID, torch_dtype=dtype)
# Move SDXL submodules to the device BEFORE loading the PhotoMaker adapter:
# ``load_photomaker_adapter`` reads ``self.device`` / ``self.unet.dtype`` to
# place the new ID encoder. If we ``.to(device)`` after, the SDXL submodules
# move but the id_encoder stays where it was (custom attribute, not in the
# auto-managed module tree), and inference errors with
# "Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor)
# should be the same" (caught empirically 2026-06-04).
pipe.to(device)
# ``pm_version="v1"`` is REQUIRED: the upstream loader defaults to v2 and would
# build the V2 encoder (PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken), then
# error on load_state_dict because the v1 weights have a different shape.
# Passing v1 builds the CLIP-only PhotoMakerIDEncoder, which is the
# commercial-safe path we want.
pipe.load_photomaker_adapter(
str(Path(adapter_path).parent),
subfolder="",
weight_name=_PHOTOMAKER_FILE,
trigger_word="img",
pm_version="v1",
)
pipe.fuse_lora()
# Belt: also explicitly cast the loaded id_encoder, because some
# diffusers/torch combinations leave the encoder buffers untouched even
# though ``pipe.to(device)`` ran first.
if hasattr(pipe, "id_encoder") and pipe.id_encoder is not None:
pipe.id_encoder = pipe.id_encoder.to(device=device, dtype=dtype)
_pipeline = pipe
return _pipeline
def _face_crop_square(
image_bgr: NDArray[Any],
box: tuple[int, int, int, int],
pad: float = 0.30,
) -> tuple[NDArray[Any], tuple[int, int, int, int]]:
"""Square crop around a face box (with padding), clipped to the image.
Returns ``(crop_bgr, (x1, y1, x2, y2))``. The crop is the image content inside the
returned square box -- callers use the box for the composite step. Pure numpy slicing,
no model.
"""
h, w = image_bgr.shape[:2]
x, y, bw, bh = box
cx, cy = x + bw // 2, y + bh // 2
side = int(max(bw, bh) * (1.0 + 2.0 * pad))
half = side // 2
x1 = max(0, cx - half)
y1 = max(0, cy - half)
x2 = min(w, cx + half)
y2 = min(h, cy + half)
return image_bgr[y1:y2, x1:x2], (x1, y1, x2, y2)
def _composite_faces(
base_bgr: NDArray[Any],
restored_crops: list[tuple[NDArray[Any], tuple[int, int, int, int]]],
feather_div: int = 6,
) -> NDArray[Any]:
"""Feather-composite a list of ``(restored_crop, (x1, y1, x2, y2))`` into ``base_bgr``.
Pure cv2/numpy helper (no model), unit-testable. For each ``(crop, box)``: resize
the crop to the box size, build a Gaussian-feathered rectangular alpha, and blend
``crop * a + base * (1 - a)``. Boxes that fall fully outside the image (or an empty
list) leave ``base_bgr`` unchanged. Mirrors the alpha math in ``face_restore._composite_faces``.
"""
import cv2
import numpy as np
out = base_bgr.astype(np.float32)
h, w = base_bgr.shape[:2]
for crop, (x1, y1, x2, y2) in restored_crops:
x1, y1 = max(0, x1), max(0, y1)
x2, y2 = min(w, x2), min(h, y2)
bw, bh = x2 - x1, y2 - y1
if bw <= 0 or bh <= 0:
continue
resized = cv2.resize(crop, (bw, bh), interpolation=cv2.INTER_LANCZOS4)
alpha = np.zeros((h, w), dtype=np.float32)
alpha[y1:y2, x1:x2] = 1.0
k = max(3, (min(bw, bh) // feather_div) | 1)
alpha = cv2.GaussianBlur(alpha, (k, k), 0)[:, :, None]
full_restored = np.zeros_like(out)
full_restored[y1:y2, x1:x2] = resized
out = full_restored * alpha + out * (1.0 - alpha)
return np.clip(out, 0, 255).astype(np.uint8)
def restore_faces_photomaker(
original_bgr: NDArray[Any],
cleaned_bgr: NDArray[Any],
num_inference_steps: int = 30,
guidance_scale: float = 5.0,
style_strength: int = 20,
seed: int | None = None,
detect_faces_fn: Any | None = None,
) -> NDArray[Any]:
"""SynthID-robust face identity restoration via PhotoMaker txt2img.
Pipeline:
1. Detect faces in ``cleaned_bgr`` (YuNet via the package's ``auto_config`` by
default; override via ``detect_faces_fn`` for tests).
2. For each face: take the SAME box from ``original_bgr`` -> square crop -> PhotoMaker
txt2img with that crop as the ID image -> a fresh face generated from the
OpenCLIP embedding (the embedding is SynthID-invariant by ~3 orders of magnitude,
see docs/synthid-robust-identity-research.md).
3. Feather-composite each regenerated face into ``cleaned_bgr``.
Faces are taken from ``original_bgr`` (the embedding ignores the watermark) but the
PIXELS that land in the output are diffusion-fresh, so SynthID is not transported.
Args:
original_bgr: The original (watermarked) image as cv2 BGR. Source of identity.
cleaned_bgr: The main-pass output as cv2 BGR. Faces drifted in identity; this
module replaces those face regions.
num_inference_steps: Diffusion steps inside PhotoMaker (def 30).
guidance_scale: CFG scale inside PhotoMaker (def 5.0; the PhotoMaker recipe).
style_strength: PhotoMaker's ``start_merge_step`` knob ~ 20-30 (def 20).
seed: Optional seed for reproducibility.
detect_faces_fn: Optional callable ``(bgr) -> list[(x,y,w,h)]`` to override the
default YuNet detector (used by tests).
Returns:
``cleaned_bgr`` with regenerated face regions composited in (or unchanged when
no face is detected).
"""
import cv2
import numpy as np
import torch
from PIL import Image
if detect_faces_fn is None:
from remove_ai_watermarks import auto_config as _ac
def _default_detect(bgr: NDArray[Any]) -> list[tuple[int, int, int, int]]:
h, w = bgr.shape[:2]
model = Path(_ac.__file__).parent / "assets" / "face_detection_yunet_2023mar.onnx"
det = cv2.FaceDetectorYN.create(str(model), "", (w, h), _ac._FACE_SCORE, 0.3, 5000)
det.setInputSize((w, h))
_, faces = det.detect(bgr)
if faces is None:
return []
return [(int(f[0]), int(f[1]), int(f[2]), int(f[3])) for f in faces if int(f[2]) > 0 and int(f[3]) > 0]
detect_faces_fn = _default_detect
boxes = detect_faces_fn(cleaned_bgr)
if not boxes:
logger.debug("photomaker_restore: no faces detected; returning cleaned image unchanged")
return cleaned_bgr
pipeline = _get_pipeline()
generator = None
if seed is not None:
generator = torch.Generator(device=pipeline.device).manual_seed(seed)
restored: list[tuple[NDArray[Any], tuple[int, int, int, int]]] = []
for box in boxes:
id_crop_bgr, square_box = _face_crop_square(original_bgr, box)
if id_crop_bgr.size == 0:
continue
id_crop_rgb = cv2.cvtColor(id_crop_bgr, cv2.COLOR_BGR2RGB)
id_image_pil = Image.fromarray(id_crop_rgb)
# Don't pass negative_prompt: the PhotoMaker pipeline manages its own CFG by
# concatenating [negative_prompt_embeds, prompt_embeds]; if we pass a custom
# negative the upstream code splits text_only vs id-injected branches and
# the resulting embed batch dims can mismatch (we saw
# "Sizes of tensors must match except in dimension 1. Expected size 2 but got
# size 1" on a real run). The default empty negative is what the upstream
# gradio demo uses.
out = pipeline(
prompt=_PHOTOMAKER_PROMPT,
input_id_images=[id_image_pil],
num_inference_steps=num_inference_steps,
guidance_scale=guidance_scale,
start_merge_step=style_strength,
generator=generator,
height=_PHOTOMAKER_FACE_SIZE,
width=_PHOTOMAKER_FACE_SIZE,
num_images_per_prompt=1,
)
gen_rgb = out.images[0]
gen_bgr = cv2.cvtColor(np.array(gen_rgb), cv2.COLOR_RGB2BGR)
restored.append((gen_bgr, square_box))
return _composite_faces(cleaned_bgr, restored)