Commit Graph

212 Commits

Author SHA1 Message Date
Victor Kuznetsov 295e7ada2b chore: project review (dev tools in extras, dep upgrades, optional-deps guard, stale cleanup)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 17:03:17 -07:00
Victor Kuznetsov 826cfdb82a chore(release): v0.10.0
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.10.0
2026-06-09 13:24:37 -07:00
Victor Kuznetsov 2fcd00ced0 fix: address whole-project code review (visible all/batch, engine consolidation, I/O)
Nine findings from a high-effort project-wide review, fixed and verified
(571 passed, ruff/pyright clean):

Correctness:
- all/batch now remove Doubao/Jimeng/Samsung visible text marks: the visible
  step routes through the registry (new cli._remove_visible_auto) instead of a
  hardcoded GeminiEngine, so they no longer leave the wordmark intact.
- batch always reads the original source (dropped the out_path-reuse that
  re-processed already-cleaned outputs on a re-run).
- img2img_runner only retries the diffusion call on the deprecated-callback
  TypeError; any other TypeError now propagates instead of double-running.
- gemini detect/remove and the reverse-alpha engines normalize channels via a
  new image_io.to_bgr, fixing a grayscale/BGRA crash in the FP-gate path.
- _png_late_metadata advances its cursor by the clamped length, so a malformed
  chunk length no longer aborts the late AI-label scan.
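
The clamped-cursor fix in the last bullet can be sketched as a small chunk
walker. This is a minimal illustration, not the project's
_png_late_metadata: `iter_png_chunks` is a hypothetical name, and the only
thing assumed is the standard PNG chunk layout (4-byte big-endian length,
4-byte type, data, 4-byte CRC).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, payload) pairs, tolerating a bad length field.

    Hypothetical helper: the cursor advances by the clamped length, so one
    malformed chunk cannot abort the scan or push the cursor out of range.
    """
    pos = len(PNG_SIGNATURE) if data.startswith(PNG_SIGNATURE) else 0
    while pos + 8 <= len(data):
        (declared,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body_start = pos + 8
        # Clamp the declared length to what is actually left in the buffer.
        length = min(declared, max(0, len(data) - body_start))
        yield chunk_type, data[body_start:body_start + length]
        # Advance by the clamped length plus the 4-byte CRC.
        pos = body_start + length + 4
```

A chunk whose declared length overruns the buffer is still yielded (with
whatever bytes remain) and the loop then ends normally instead of raising.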

Cleanup / efficiency:
- Consolidate the ~90%-identical Doubao/Jimeng/Samsung engines into a shared
  config-driven _text_mark_engine.TextMarkEngine base; each engine is now a thin
  subclass (TextMarkConfig + test shims). Behavior is byte-exact (the three
  engine test suites pass unchanged). Registry adapters collapse to one
  _text_mark(...) row each. Gemini stays a separate engine.
- scan_head is memoized per (path, size, mtime), so identify() reads the file
  head once instead of ~8 times.
- invisible_engine post-processing decodes/encodes the output once (chained in
  memory) instead of 2-4 times across stages.
- Remove the orphaned get_model_id_for_profile (+ CONTROLNET_PROFILE); derive
  the --strength help from the strength constants (strength_default_help) so it
  cannot drift; share the --pipeline/--strength click options; simplify the
  retired --auto resolver.
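
The scan_head memoization above can be sketched as follows. This is an
assumption-laden illustration: `scan_head_cached`, the module-level dict and
the 4096-byte default are hypothetical, and only the (path, size, mtime) key
comes from the message.

```python
import os
from typing import Dict, Tuple

# Cache keyed by (absolute path, size, mtime); any change to the file
# produces a new key, so stale heads are never returned.
_HEAD_CACHE: Dict[Tuple[str, int, int], bytes] = {}


def scan_head_cached(path: str, nbytes: int = 4096) -> bytes:
    """Return the first nbytes of path, re-reading only if the file changed."""
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
    cached = _HEAD_CACHE.get(key)
    if cached is None:
        with open(path, "rb") as fh:
            cached = fh.read(nbytes)
        _HEAD_CACHE[key] = cached
    return cached
```

Old keys are never evicted in this sketch; a bounded cache would be the
natural next step for a long-running process.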

Net -835 lines. Tests added for the registry-routed visible pass, to_bgr,
the polish/model/guidance wiring, and strength_default_help. CLAUDE.md updated
for the new base module, the engine/registry changes, image_io.to_bgr, and the
scan_head cache.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:21:13 -07:00
Victor Kuznetsov b1189549b8 feat(invisible): controlnet default, unified strength, retire --auto, add --model/--guidance-scale
Overhaul the diffusion-removal surface around a single robust default and a
complete, consistent CLI.

Pipeline + strength:
- controlnet is now the DEFAULT pipeline (CLI --pipeline + both engine ctors).
  With the certified higher strength it clears both photoreal and flat-graphic
  content, whereas plain SDXL left SynthID on flat graphics.
- Rename the plain-SDXL profile default -> sdxl; "default" stays as a back-compat
  alias (normalize_profile + a click callback that warns).
- Unify the strength ladder: resolve_strength applies ONE vendor-adaptive ladder
  (the certified controlnet floors OpenAI 0.20 / Google 0.30 / unknown 0.30) to
  both pipelines. sdxl is the weaker remover on its own hard case (flat fills),
  so the certified floor is the right floor for it too.
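
The unified ladder and the help text derived from it can be sketched like
this. The three floor values are the ones quoted above; the function
signatures, the dict layout, and the rule that an explicit request overrides
the floor are assumptions made for illustration, not the library's API.

```python
from typing import Optional

# Floors quoted in the message; the data layout is hypothetical.
STRENGTH_FLOORS = {"openai": 0.20, "google": 0.30}
UNKNOWN_VENDOR_STRENGTH = 0.30


def resolve_strength(vendor: Optional[str],
                     requested: Optional[float] = None) -> float:
    """One ladder for every pipeline.

    Assumption: an explicit request wins, otherwise the vendor floor applies.
    """
    if requested is not None:
        return requested
    return STRENGTH_FLOORS.get((vendor or "").lower(), UNKNOWN_VENDOR_STRENGTH)


def strength_default_help() -> str:
    """Build the --strength help from the constants so it cannot drift."""
    parts = [f"{name} {value:.2f}" for name, value in STRENGTH_FLOORS.items()]
    parts.append(f"unknown {UNKNOWN_VENDOR_STRENGTH:.2f}")
    return "Default: vendor-adaptive (" + " / ".join(parts) + ")."
```

Because the help string is computed from the same constants the resolver
reads, changing a floor updates both in one place.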

CLI completeness:
- Add --model (HF model id) to invisible + batch (was only on all) and
  --guidance-scale (CFG) to all three diffusion commands; both were library
  knobs the CLI did not expose.
- Flip --adaptive-polish to ON by default (it self-gates to a no-op where there
  is no detail deficit, so default-on is safe).
- Share --pipeline / --strength / --model / --guidance-scale as single
  decorators so invisible/all/batch keep an identical surface; the --strength
  help is derived from the strength constants (strength_default_help) so it can
  never drift from the ladder.

Removals:
- Delete the auto_config content-detection planner + its YuNet/DBNet assets
  (~2.6 MB): with controlnet always the pipeline and the polish self-gating, the
  face/text/edge detection no longer changed behavior. --auto is now a deprecated
  no-op that only warns (the polish it enabled is the default).

Docs (README, CLAUDE.md, docs/synthid.md) updated throughout; added an
InvisibleEngine Python API example. Tests cover the alias warnings, the
polish default, and the --model/--guidance-scale wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:40:45 -07:00
Victor Kuznetsov efc5b4a9af docs(auto): drop stale face-restore mentions from --auto
The face-restore family was removed in 20d7eda, but the auto_config
module docstring still claimed "PhotoMaker face restoration is enabled
when a face is present" and the --auto help text (CLI + README example)
listed "face restore" as something --auto picks. A detected face now
only routes to the controlnet pipeline (canny preserves face STRUCTURE,
not identity); there is no identity restoration. Comments/docstrings/help
only, no code behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 11:12:53 -07:00
Victor Kuznetsov ea098cf1be chore(release): v0.9.0
BREAKING:
- Drop `--restore-faces` / `--restore-faces-method` CLI flags
- Drop `restore`, `photomaker`, `instantid` extras
- Drop `restore_faces` / `restore_faces_method` params from
  InvisibleEngine.remove_watermark and AutoConfig

Rationale (full empirical record in
docs/synthid-robust-identity-research-2026-06-08.md "Empirical follow-up"):
every face-restore approach evaluated 2026-06-04 - 2026-06-08 (GFPGAN-on-
cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned
at three parameter sweeps) regenerates the face via SDXL diffusion --
output face pixels are diffusion-fresh, so the regenerated face inherits
SDXL's "clean skin" aesthetic and loses original identity precision. The
result looks MORE AI-generated than the cleaned image, not less. The
cleaned controlnet 0.20 image is the least-AI face state we can reach
without re-introducing SynthID.

License:
- MIT -> Apache 2.0 (Apache adds an explicit patent grant + trademark
  clause; better fit with the upstream Apache projects this library
  mirrors / depends on -- diffusers, transformers, controlnet-aux,
  xinsir's controlnet weights)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v0.9.0
2026-06-08 21:28:09 -07:00
Victor Kuznetsov a4554bb5d3 chore(license): switch from MIT to Apache 2.0
Replace LICENSE with the canonical Apache License 2.0 text + a brief
copyright notice for "wiltodelta 2025-2026". Update pyproject.toml's
`license` field to "Apache-2.0" and the PyPI classifier to "Apache
Software License". Update README's License section to point at the
LICENSE file and name the copyright holder.

Why: Apache 2.0 gives downstream users an explicit patent grant and the
trademark-use clause, which MIT doesn't carry. It is also the more
common license among the upstream projects this library depends on /
mirrors (diffusers, transformers, controlnet-aux, xinsir's canny
controlnet weights), so contributions can flow either way without a
permission-shape mismatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:23:57 -07:00
Victor Kuznetsov 20d7eda96a remove: drop all face-restore code (regeneration, not preservation)
Empirical conclusion from the 2026-06-04 - 2026-06-08 Modal cert sweeps:
every face-restore approach we built (GFPGAN-on-cleaned, PhotoMaker-V2,
InstantID txt2img, InstantID img2img-on-cleaned at three parameter
settings) regenerates the face via SDXL diffusion rather than preserves
it. Output face pixels are diffusion-fresh, so the regenerated face
inherits SDXL "clean skin" aesthetic and loses original identity
precision -- it looks MORE AI-generated than the cleaned image, not
less. The cleaned image from the main controlnet 0.20 removal pass is
the least-AI face state we can reach without re-introducing SynthID.

Nothing in the restore family achieves the actual goal (preserve the
original person's face). Keeping them around as opt-in invites users to
ship something that defeats the point. Removing entirely.

Library changes:
- Deleted src/remove_ai_watermarks/instantid_restore.py
- Deleted src/remove_ai_watermarks/photomaker_restore.py
- Deleted tests/test_instantid_restore.py
- Deleted tests/test_photomaker_restore.py
- Removed `instantid` and `photomaker` extras from pyproject.toml
- Removed `[tool.hatch.metadata] allow-direct-references = true` (was
  only needed for the photomaker git+ URL)
- InvisibleEngine.remove_watermark: dropped `restore_faces` +
  `restore_faces_method` params, removed both `_restore_faces_instantid`
  and `_restore_faces_photomaker` private methods, removed dispatch
- CLI: dropped `_restore_faces_options` decorator, all four cmd_*
  signatures lose `restore_faces` + `restore_faces_method`, kwarg passes
  to remove_watermark dropped
- _apply_auto: dropped `restore_faces` from tuple shape (was unused after
  the engine no longer takes it)
- auto_config.AutoConfig: dropped `restore_faces` field; `plan()` no
  longer sets it; `reason` no longer mentions it
- Tests updated accordingly (test_auto_config.TestReason no longer asserts
  "face-restore on" in the reason string)

Docs updated:
- CLAUDE.md: removed the photomaker extras bullet, the Face restore
  trade-off bullet, the instantid_restore.py + photomaker_restore.py
  module bullets; replaced restore mentions in watermark_remover and
  controlnet bullets and prod recipe with the empirical conclusion
- README.md: removed both `--restore-faces` callouts and the install
  snippet; the feature bullet and auto-mode comment updated
- docs/synthid-robust-identity-research.md: added Status-retired notice
  at the top pointing at the 2026-06-08 follow-up

raiw-app:
- modal_cert.py: dropped `--restore-faces` flag entirely; sweep() no
  longer takes restore_faces; pinned _LIB_SPEC to `[gpu]` extras (no
  `photomaker` / `instantid` extras), points at main

ruff + strict pyright clean; 569 tests pass; 18 restore-specific tests
gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:21:58 -07:00
Victor Kuznetsov 567f3ae729 docs(restore): document that restore methods REGENERATE, not preserve
Empirical conclusion from the 2026-06-04 - 2026-06-08 cert sweeps:
every shipped face-restore method (GFPGAN-on-cleaned, PhotoMaker-V2,
InstantID txt2img, InstantID img2img-on-cleaned at three parameter
settings) regenerates the face from an ArcFace embedding via SDXL
diffusion. Output face pixels are diffusion-fresh, which makes the
regenerated face look MORE AI-generated than the cleaned image (gloss,
symmetric pores, SDXL "clean skin" aesthetic) regardless of license.

The cleaned image from the main controlnet 0.20 removal pass is the
LEAST-AI state we can reach without re-introducing SynthID; any restore
on top trades original-look for embedding-driven regeneration. The
fundamental issue is structural: ArcFace encodes "general look" at 512
dimensions, SDXL decodes that into pixels with the inherent SDXL
aesthetic. Stronger identity push (higher strength + IP-Adapter scale)
makes the face closer to the embedding but more AI-looking; weaker push
lets identity drift further. No parameter setting recovers original
identity AND looks less AI than cleaned.

Production conclusion: do not ship `--restore-faces` in any monetized
deployment. The extras (`instantid`, `photomaker`) stay in the library
for research / personal use where users explicitly want regeneration.
Documented at every entry point:
- CLAUDE.md: new "Face restore trade-off" bullet + every restore mention
  rewritten to "REGENERATES, does NOT recover"; controlnet bullet updated
- README.md: feature bullet + callout + secondary mention all updated
- docs/synthid-robust-identity-research-2026-06-08.md: appended
  "Empirical follow-up" section documenting the InstantID sweep phases
  (Phase 1 txt2img v1/v2/v3, Phase 2 img2img defaults + stronger params)
- docs/controlnet-removal-pipeline-research.md: updated restore-faces
  bullet to reflect the empirical conclusion
- CLI help: `_restore_faces_options` docstring + `--restore-faces` /
  `--restore-faces-method` help text all updated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:08:11 -07:00
Victor Kuznetsov 7d8af7882a tune(instantid): raise IP-Adapter + landmark scale + strength for stronger identity
First img2img cert sweep: scene/lighting integration was excellent on both
single (tatsunari) and group (gemini_3) photos, but the regenerated faces
were "recognizable similar people" rather than the original individuals.
The cleaned face crop (which has already drifted from original through the
main controlnet 0.20 removal pass) was competing as a structural prior;
at the previous parameter settings InstantID's ArcFace branch couldn't
dominate it.

Push the identity signal:
- `ip_adapter_scale`: 0.8 -> 1.0 at load time (full IP-Adapter strength)
- `controlnet_conditioning_scale`: 0.8 -> 1.0 default (landmark anchor)
- `img2img_strength`: 0.55 -> 0.7 default (more denoise, less cleaned
  structure survives, more room for the diffusion to render ArcFace)

The cleaned image already passed the SynthID oracle, so the absolute floor
on strength is "any positive value" -- raising it only increases the
freedom of the diffusion to inject identity (SynthID-safety isn't reduced
by higher strength, because the noise injection only destroys more of the
input pixels).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:54:41 -07:00
Victor Kuznetsov 8ed2d16a23 fix(instantid): pass trust_remote_code=True for local custom_pipeline
The img2img run silently produced a pass-through (unrestored) output because
DiffusionPipeline.from_pretrained refused to load the local custom_pipeline
.py without `trust_remote_code=True` (emits a single-line warning to stderr,
then falls back to a default class). load_ip_adapter_instantid then
AttributeError'd, our outer except logged + skipped, and the saved file
was the un-restored cleaned image (exact byte size match against the
no-restore baseline -- 250988 bytes).

We fetch the file from a pinned raw.githubusercontent URL we control, so
opting in to trust_remote_code is safe here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:47:26 -07:00
Victor Kuznetsov 2687604b24 feat(instantid): switch from txt2img to img2img on cleaned crop
The txt2img architecture (generate face from scratch in a fresh 1024 scene)
fundamentally couldn't fix multi-face patchwork: each face was a studio
portrait that didn't belong in the surrounding scene (wrong lighting,
frontal pose, neutral expression vs the original group photo's varied
angles and smiles). Tight crop + elliptical alpha + color match smoothed
the seams but didn't make the faces look like they were SHOT in the scene.

Replacing with img2img-on-cleaned: feed the CLEANED face crop as the img2img
source, so the diffusion sees the actual scene context (shoulders, hair
edges, lighting direction, shadows) and harmonises the regenerated face
with it. Identity still flows through the ArcFace embedding (from original)
+ landmark ControlNet (kps from original) -- both semantic / pure geometry,
neither carries pixels.

SynthID safety preserved by construction:
- img2img source pixels = cleaned crop = already oracle-verified clean
- ArcFace embedding = 512-d semantic vector from original, no pixel content
- Landmark stick figure = colour-coded geometry, no source pixels
- img2img noise injection at strength 0.55 destroys any residual high-freq
  pattern in the cleaned crop
- Pipeline is the upstream StableDiffusionXLInstantIDImg2ImgPipeline,
  inherits from StableDiffusionXLControlNetImg2ImgPipeline; we still patch
  check_inputs to neutralise the same diffusers-0.38 positional shift the
  txt2img variant had

Implementation:
- New _fetch_img2img_pipeline_file() caches the upstream pipeline file from
  GitHub raw on first use (not on PyPI / HF Hub, has to be downloaded
  separately)
- _get_pipeline() now loads StableDiffusionXLInstantIDImg2ImgPipeline via
  custom_pipeline=<cached path>
- restore_faces_instantid() crops the SAME bbox from both original and
  cleaned, runs InsightFace on original (sharper embedding), feeds cleaned
  crop as img2img source, ArcFace+landmark as conditioning
- New img2img_strength=0.55 parameter (txt2img mode had no strength knob)
- Composite path unchanged (elliptical alpha + color_match)
- 9 control-flow tests still pass (the mock pipe call shape change is
  absorbed by the kwargs-only fake)

Cert sweep will validate on tatsunari (single) first per user request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:43:27 -07:00
Victor Kuznetsov 7c0c16fd66 test(instantid): update composite assertion to survive color-match
Last commit added `_color_match` which shifts the face crop's mean to the
canvas mean -- the old test fed a uniform face (210) into a uniform cleaned
canvas (90), so after color-match the face was uniform 90 and the
composite was undetectable by value. Switched the fake pipeline to a
gradient face so the color-match preserves variance, and the assertion
now checks that the face region has non-zero std (composite injected
gradient pixels) instead of a value threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:26:56 -07:00
Victor Kuznetsov cdd6bd1fea feat(instantid): tighter face ellipse + color match for cleaner multi-face composite
Second multi-face iteration. v1-rect: full-1024 frame + Gaussian rectangle ->
patchwork. v2-ellipse: tight crop + ellipse 0.45*bw x 0.55*bh -> ellipse
exceeds bbox vertically and clips forehead/chin on single portrait, plus
group-photo faces visibly drift cooler than the warm bar background. v3:

1. **Smaller ellipse axes**: 0.32*bw x 0.42*bh. Both fit inside the bbox (since
   axes are radii from center, 0.32*bw extends 0.64*bw total width and
   0.42*bh extends 0.84*bh total height) so no chin/forehead clip even on
   non-square boxes. Face shape: vertically elongated (0.42 vs 0.32),
   matching real face geometry.

2. **Wider feather**: `min(bw, bh) // 5` instead of // 8. Edges fade over a
   wider band so the elliptical seam is less visible.

3. **Per-channel mean color match** (`_color_match`): before compositing,
   shift the regenerated face's mean BGR to match the cleaned canvas region
   where it lands. Each InstantID generation has independent SDXL noise so
   white balance drifts -- matching means equalises tone (warm bar / cool
   face -> warm face) without rescaling contrast.
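
The v3 numbers and the mean-only color match can be sketched in plain
Python. This is a sketch under assumptions: pixels are modeled as lists of
(B, G, R) tuples rather than image arrays, and the function names are
hypothetical stand-ins, not the module's `_color_match`.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # B, G, R


def ellipse_params(bw: int, bh: int) -> Tuple[float, float, int]:
    """Semi-axes and feather width using the v3 constants from the message."""
    return 0.32 * bw, 0.42 * bh, min(bw, bh) // 5


def channel_means(pixels: List[Pixel]) -> Pixel:
    """Per-channel mean of a list of pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_match(face: List[Pixel], canvas: List[Pixel]) -> List[Pixel]:
    """Shift each channel of face so its mean equals the canvas mean.

    Only an offset is added, so per-channel contrast is unchanged
    (apart from clipping to the 0..255 range).
    """
    fm, cm = channel_means(face), channel_means(canvas)
    shift = [cm[c] - fm[c] for c in range(3)]
    return [
        tuple(min(255.0, max(0.0, p[c] + shift[c])) for c in range(3))
        for p in face
    ]
```

For any box, twice each semi-axis is 0.64*bw and 0.84*bh, so the ellipse
stays inside the bbox as the message argues.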

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:25:34 -07:00
Victor Kuznetsov 92c7245e2d chore: drop unused _composite_faces import
Linter caught it after the elliptical composite swap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:18:37 -07:00
Victor Kuznetsov 1786f6de9f feat(instantid): multi-face anti-patchwork (tight-crop + elliptical composite)
Group-photo cert sweep last round produced the same "patchwork quilt" failure
mode as PhotoMaker-V2: each face is regenerated as a fresh 1024x1024 SCENE
(face + background + lighting), then composited as a Gaussian-feathered
RECTANGLE into the 2x square box around the original face. The rectangle's
corners carry regenerated background pixels with different colors / textures
per face, and the rectangular Gaussian feather lets them bleed into the
cleaned image -- 9 face renders with 9 different backgrounds -> patchwork.

Two changes, both surgical:

1. **Tight-crop the regenerated face before composite.** After generation,
   run YuNet again on the 1024 frame to find where the face actually landed,
   then crop tightly around it (matching the 2x padding our input crop uses
   so the face fills its natural slot). Drops the regenerated background's
   peripheral pixels.

2. **Elliptical composite alpha** (`_composite_faces_elliptical`). Instead of
   reusing photomaker_restore's rectangular Gaussian alpha, inscribe an
   ellipse in each face bbox (axes ~0.45*bw x 0.55*bh so the feather edge
   tapers cleanly inside the rectangle, head-silhouette shape), feather only
   the ellipse edge. Bbox corners (regenerated scene context) end up at
   alpha=0 and the cleaned-canvas pixels there stay intact. Only the head
   region is replaced.
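
The elliptical alpha idea in item 2 can be sketched as below. Assumptions
are loud here: single-channel images are modeled as nested lists, the
feather is a simple linear ramp rather than whatever blur the real
`_composite_faces_elliptical` uses, and both function names are
hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def elliptical_alpha(bw: int, bh: int, ax: float, ay: float,
                     feather: float) -> Grid:
    """Alpha for a bw x bh box: 1 well inside the ellipse, 0 outside.

    ax, ay are semi-axes in pixels; the ramp is roughly `feather` pixels
    wide and lies just inside the ellipse boundary.
    """
    cx, cy = (bw - 1) / 2.0, (bh - 1) / 2.0
    mask = []
    for y in range(bh):
        row = []
        for x in range(bw):
            # Normalised radius: 1.0 exactly on the ellipse boundary.
            r = math.hypot((x - cx) / ax, (y - cy) / ay)
            # Approximate pixel distance to the boundary (negative outside).
            dist_inside = (1.0 - r) * min(ax, ay)
            row.append(min(1.0, max(0.0, dist_inside / feather)))
        mask.append(row)
    return mask


def composite(canvas: Grid, face: Grid, alpha: Grid) -> Grid:
    """Blend face over canvas; where alpha is 0 the canvas is untouched."""
    return [
        [a * f + (1.0 - a) * c for c, f, a in zip(crow, frow, arow)]
        for crow, frow, arow in zip(canvas, face, alpha)
    ]
```

With this mask the bbox corners keep the cleaned-canvas value exactly, which
is the property the message relies on to avoid the patchwork.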

Net result: faces stay identity-restored (semantic ArcFace + landmark control
still drives generation) but the canvas around each face is the cleaned
image, not a regenerated frame. No more multi-face patchwork.

Single-portrait case unchanged: there's one face to composite and the cleaned
canvas around it is mostly the background that was already there.

All 9 InstantID control-flow tests still pass (the mock face analyser
responds to both .get() calls with the same fake bbox, so the new
generated-image YuNet step is exercised end-to-end).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:18:15 -07:00
Victor Kuznetsov 4ec8ffec6b fix(instantid): patch check_inputs for diffusers-0.38 + set scale at load time
Two compat bugs caught by the Modal cert sweep, both rooted in diffusers
0.38 vs InstantID's community pipeline expectations:

1. **Positional check_inputs misalignment.** InstantID's __call__ calls
   `self.check_inputs(...)` POSITIONALLY using the parent's ~v0.29 signature.
   Diffusers 0.38 added two new parameters BEFORE `controlnet_conditioning_scale`
   in the parent's signature (`ip_adapter_image`, `ip_adapter_image_embeds`),
   which shifts every positional arg by two slots. The argument that lands in
   the parent's `controlnet_conditioning_scale` slot is actually InstantID's
   `control_guidance_end` -- which a few lines earlier was converted to `[1.0]`
   (a list) by InstantID's auto-broadcasting for the single-controlnet case.
   The parent's check then trips on `not isinstance([1.0], float)` -> TypeError.

   Our inputs are programmatic and validated by our own callers, so neutralising
   `pipe.check_inputs = lambda *a, **k: None` after load is safe. This is the
   standard workaround community ComfyUI ports use for the same compat break.

2. **`ip_adapter_scale` was passed at call time and silently ignored.** It's not
   in `StableDiffusionXLInstantIDPipeline.__call__`'s signature -- the upstream
   API sets the IP-Adapter weight on the ArcFace cross-attention branch at LOAD
   time via `load_ip_adapter_instantid(scale=...)`. Moved the 0.8 default there,
   dropped the call-time kwarg.
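
The positional shift in item 1 can be reproduced with toy classes. This is
an illustrative sketch only: the classes below are stand-ins with shortened
parameter lists, not the diffusers or InstantID signatures.

```python
def _require_float(value):
    if not isinstance(value, float):
        raise TypeError("controlnet_conditioning_scale must be a float")


class OldParent:
    # Parameter order the subclass was written against.
    def check_inputs(self, prompt, image, controlnet_conditioning_scale,
                     control_guidance_start, control_guidance_end):
        _require_float(controlnet_conditioning_scale)


class NewParent:
    # Two parameters inserted BEFORE the scale shift every later slot by two.
    def check_inputs(self, prompt, image, ip_adapter_image,
                     ip_adapter_image_embeds, controlnet_conditioning_scale,
                     control_guidance_start=0.0, control_guidance_end=1.0):
        _require_float(controlnet_conditioning_scale)


def call_like_subclass(pipe):
    """Positional call using OldParent's order; guidance values are lists."""
    pipe.check_inputs("a prompt", object(), 0.8, [0.0], [1.0])


call_like_subclass(OldParent())          # fine: 0.8 lands in the scale slot

try:
    call_like_subclass(NewParent())      # [1.0] lands in the scale slot
except TypeError as exc:
    print("shifted call failed:", exc)

patched = NewParent()
patched.check_inputs = lambda *a, **k: None   # the workaround from item 1
call_like_subclass(patched)              # validation skipped, no error
```

Assigning the lambda on the instance shadows the method for that object
only, which matches the "after load" patch described above.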

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:07:31 -07:00
Victor Kuznetsov 53d753f2ad fix(instantid): pre-fetch antelopev2 from HF mirror (InsightFace auto-link is broken)
InsightFace's built-in auto-download for the antelopev2 model pack
(github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip)
has been broken since at least 2024 (upstream issues #2517, #2766, called
out in InstantID's README: "manually download via this URL to models/
antelopev2 as the default link is invalid").

When the .onnx files aren't in place, FaceAnalysis.prepare() raises
`assert 'detection' in self.models` -- which is exactly what our Modal
cert sweep hit on the first real run.

Fix: a tiny pre-flight `_ensure_antelopev2()` that pulls the five expected
.onnx files (1k3d68, 2d106det, genderage, glintr100, scrfd_10g_bnkps) from
the HuggingFace mirror `kidyu/antelopev2-for-InstantID-ComfyUI` into
./models/antelopev2/ before FaceAnalysis is instantiated. Idempotent
(skips files that already exist); uses huggingface_hub's cache for free
caching on the Modal volume.
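
The idempotent pre-flight can be sketched with the download step injected
as a callable. Assumptions: `ensure_files` and its `fetch` parameter are
hypothetical; in real use `fetch` would wrap whatever download call the
module uses (for example huggingface_hub), which is left out here so the
sketch runs offline.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# The five model files named in the message.
ANTELOPEV2_FILES = (
    "1k3d68.onnx", "2d106det.onnx", "genderage.onnx",
    "glintr100.onnx", "scrfd_10g_bnkps.onnx",
)


def ensure_files(target_dir: Path, names: Iterable[str],
                 fetch: Callable[[str, Path], None]) -> List[str]:
    """Fetch only the missing files; return the names that were fetched."""
    target_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in names:
        dest = target_dir / name
        if dest.exists():
            continue  # idempotent: already in place, skip the download
        fetch(name, dest)
        fetched.append(name)
    return fetched
```

A second call with the same directory returns an empty list, so it is safe
to run before every FaceAnalysis construction.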

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:58:40 -07:00
Victor Kuznetsov 00c559482f fix(invisible-engine): log exc_info + exception class on restore_faces failure
The InstantID cert sweep emitted `restore_faces post-pass failed ()` -- the
exception's str() was empty so the log line told us nothing about what
actually failed. Adding `exc_info=True` plus `type(e).__name__` so the
full traceback and exception class land in the log even when the message
is empty.
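
A minimal sketch of the logging change, using only the standard logging
module. The logger name and the `run_post_pass` wrapper are hypothetical;
the point is the `exc_info=True` plus `type(e).__name__` combination.

```python
import logging

logger = logging.getLogger("restore_demo")


def run_post_pass(step) -> bool:
    """Run an optional step; on failure log the class name and traceback."""
    try:
        step()
        return True
    except Exception as e:
        # str(e) can be empty (e.g. a bare AssertionError), so the exception
        # class is logged explicitly and exc_info attaches the traceback.
        logger.warning(
            "restore_faces post-pass failed (%s: %s); keeping un-restored output",
            type(e).__name__, e, exc_info=True,
        )
        return False
```

With a bare `raise AssertionError()` the old format would print empty
parentheses; this one prints the class name followed by the traceback.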

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:53:07 -07:00
Victor Kuznetsov a296d5fe46 fix(instantid): inline YuNet detection (the imagined _get_yunet doesn't exist)
The InstantID restore module imported `_get_yunet` from `auto_config`, but
auto_config doesn't export that function -- the YuNet singleton lives inline
inside `detect_face()`. Caught by the Modal cert sweep:

  restore_faces post-pass failed (cannot import name '_get_yunet' from
  'remove_ai_watermarks.auto_config'); keeping un-restored output

Inline the YuNet builder the same way `photomaker_restore` does (read
`auto_config._FACE_SCORE` and the bundled `face_detection_yunet_2023mar.onnx`
asset, build a fresh `FaceDetectorYN` per call). This is the proven pattern
from PhotoMaker and avoids a private-API drift between the modules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:48:21 -07:00
Victor Kuznetsov 70e8b3a517 feat(face-restore): add InstantID as the default non-commercial restore path
Per the 2026-06-08 deep-research synthesis
(docs/synthid-robust-identity-research-2026-06-08.md), the entire
ArcFace-class identity-adapter ecosystem for SDXL is blocked from
commercial use by InsightFace's non-commercial model packs (antelopev2 /
buffalo_l). No commercial-safe ArcFace-grade identity stack exists today.
The user explicitly opted into shipping a non-commercial restore path
(research / personal use; raiw.cc must NOT install the extra).

Architectural choice: InstantID over PhotoMaker-V2 as the default.
- PhotoMaker-V2 (CLIP+ArcFace dual encoder, txt2img only): documented upstream
  identity drift on Asian male faces, visually confirmed in our cert sweep
  (tatsunari rendered as a generic woman; group photo collapsed into a
  patchwork).
- InstantID (ArcFace cross-attention + landmark ControlNet): semantic
  identity branch + spatial weak landmark control, decoupled. Per InstantID
  paper (arXiv:2401.07519) and the research report, stronger identity fidelity
  on single portraits. Critically: NO original face pixels enter the diffusion
  (ArcFace embedding is semantic, landmark stick figure is pure geometry), so
  SynthID is not transported.

Implementation:
- New `src/remove_ai_watermarks/instantid_restore.py` mirrors the
  `photomaker_restore.py` shape (lazy singletons for pipeline + FaceAnalysis,
  per-face crop + _composite_faces from photomaker_restore). Loads the
  InstantID community pipeline via `DiffusionPipeline.from_pretrained(
  custom_pipeline="pipeline_stable_diffusion_xl_instantid")` -- no upstream
  Python package needed; diffusers fetches the file from its community
  examples.
- New `instantid` extra in pyproject (insightface + onnxruntime +
  huggingface-hub). NON-COMMERCIAL block in the comment explains why.
- CLI: `--restore-faces-method [instantid|photomaker]`, default `instantid`.
  Both methods explicitly labeled NON-COMMERCIAL in the help text.
- Engine: dispatch on `restore_faces_method` to either
  `_restore_faces_instantid` or `_restore_faces_photomaker`.
- 9 control-flow tests for InstantID without model download (mirror the
  photomaker_restore.py test pattern + draw_kps helper checks). 587/587 pass.

Diffusers-0.38 compat verified by upstream code inspection: the InstantID
pipeline inherits from `StableDiffusionXLControlNetPipeline`, uses only
public diffusers APIs (`encode_prompt`, `prepare_image`, `prepare_latents`,
`get_guidance_scale_embedding`), uses legacy attention processor API which
diffusers preserves for backward compat. No PhotoMaker-V1-style internal
text_encoder access. End-to-end execution will be validated by the Modal
cert sweep in the next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:44:17 -07:00
Victor Kuznetsov c486badaa8 fix(photomaker-v2): render at SDXL native 1024, use upstream prompt + neg_prompt
The 9-face grid + single-face cert outputs were still a mosaic of
training-time faces even after the id_embeds shape fix. WebFetch of the
upstream inference_pmv2.py revealed three mismatches:

1. SDXL at width=height=512 falls into its low-res failure mode (small-detail
   collage / mosaic) on the V2 LoRA. Render at native 1024 then downscale into
   the original face bbox at composite time.
2. Upstream prompt is descriptive ("instagram photo, portrait photo of a woman
   img, colorful, perfect face, natural skin, hard shadows, film grain, best
   quality"). Our generic prompt let SDXL drift away from the ID embedding.
   Adopted the upstream pattern.
3. Upstream V2 explicitly passes negative_prompt; the CFG batch-mismatch we hit
   on V1 isn't a V2 issue. Re-added negative_prompt with the upstream wording
   (asymmetry/worst quality/etc).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:11:48 -07:00
Victor Kuznetsov b1fed810fd fix(photomaker-v2): don't pre-unsqueeze id_embeds (the pipeline does it)
V2's pipeline forward at line 705 of upstream pipeline.py calls
`id_embeds.unsqueeze(0)` itself to add a batch dim, so callers pass a 2-D
(N_faces, 512) tensor and the pipeline turns it into 3-D. Upstream
inference_pmv2.py shows the canonical form: torch.stack([...]) of per-image
embeddings.

Our previous call .unsqueeze(0)'d on the way in, which the pipeline then
.unsqueeze(0)'d again, giving a (1, 1, 512) shape that the V2 id_encoder
consumed as garbage -- the resulting output was a training-time face collage
(verified visually 2026-06-04 against tatsunari + gemini_3 + the 9-face grid).

Fix: pass torch.stack([torch.from_numpy(embedding)]) -- shape (1, 512) -- so
the pipeline's internal unsqueeze gives the expected (1, 1, 512) inside the
forward. Don't pre-cast dtype either; the pipeline handles that internally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:03:04 -07:00
Victor Kuznetsov 37817a610f test(photomaker): stub face_analyser + analyze_faces in the control-flow test
The previous commit added a real call into FaceAnalysis2 / analyze_faces inside
restore_faces_photomaker, which broke the model-free control-flow test. Stub it:
- monkeypatch _get_face_analyser to return a sentinel
- install a fake `photomaker` module with analyze_faces returning a single
  512-d zero embedding
- add dtype=torch.float32 to the fake pipeline class so .to(device, dtype=...) works

11/11 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 18:51:26 -07:00
Victor Kuznetsov 3d00fed00c fix(photomaker-v2): compute id_embeds via FaceAnalysis2 before pipeline call
The Modal cert sweep against V2 hit the next layer of the API:

  PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken.forward() missing 1 required
  positional argument: 'id_embeds'

V2 forward takes BOTH the CLIP image embedding (computed inside the pipeline from
input_id_images) AND an ArcFace identity embedding (id_embeds) that the caller
must compute. The upstream pipeline does NOT auto-compute it -- inference_pmv2.py
shows the caller using FaceAnalysis2 + analyze_faces to extract the ArcFace
vector from each input ID image and passing id_embeds=torch.stack([...]) into
pipe(...).

Wired the same flow here:
- New _get_face_analyser() singleton (double-checked lock) builds
  FaceAnalysis2(['CUDAExecutionProvider' | 'CPUExecutionProvider']).prepare(...).
  This is the non-commercial step (antelopev2/buffalo_l auto-download on first
  use). Module docstring already calls it out.
- Per face: analyze_faces() -> torch.from_numpy(embedding) -> .unsqueeze(0) to
  match the pipeline's expected (B, D) shape, casting to pipeline.device/dtype.
  Faces InsightFace can't detect inside the crop get skipped (the most likely
  cause would be the diffusion-cleaned face being too small or stylised after
  the main pass; YuNet already gated us into having a face per crop, so this
  should be rare).
- id_embeds= keyword threaded into the pipeline call site alongside the existing
  input_id_images=.
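
A minimal sketch of the double-checked-lock singleton named in the first bullet. It uses only the standard library; `get_face_analyser`, `build`, and the module-level names are hypothetical stand-ins, and the real FaceAnalysis2 construction is deliberately left to the caller-supplied factory:

```python
import threading
from typing import Callable, Optional

# Hypothetical module-level state; the real module's names may differ.
_analyser: Optional[object] = None
_analyser_lock = threading.Lock()


def get_face_analyser(build: Callable[[], object]) -> object:
    """Return the shared analyser, calling `build` at most once."""
    global _analyser
    if _analyser is None:              # fast path: no lock once built
        with _analyser_lock:
            if _analyser is None:      # re-check: another thread may have won
                _analyser = build()    # expensive step (model download/load)
    return _analyser
```

The second check under the lock is what keeps two threads that both saw `None` from building the analyser twice.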

Tests untouched (the V1-only safety guard was already removed in the previous
commit when we swapped V1->V2; the existing 11 tests still pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 18:49:10 -07:00
Victor Kuznetsov 65de8df5c5 refactor(face-restore): drop GFPGAN, ship PhotoMaker-V2 as the sole restore (non-commercial)
Visual review of the GFPGAN-on-cleaned output (9-face grid, 1448x1086) showed it
only polished the already-drifted face without restoring identity — useless for the
"restore who is in the photo" intent. Dropping it.

The shipped restore path is now PhotoMaker-V2, which delivers true identity-from-
embedding face regeneration via a CLIP+ArcFace dual encoder. The ArcFace branch
pulls InsightFace antelopev2/buffalo_l model packs at runtime, which InsightFace
releases under a research-only license, so the whole extra is **NON-COMMERCIAL**.
raiw.cc and any monetized deployment must NOT install the `photomaker` extra.
This is called out at every entry point: CLI flag help, module docstring,
pyproject extra block, CLAUDE.md extras bullet, README install snippet.

Changes:
- Deleted `src/remove_ai_watermarks/face_restore.py` and its tests.
- Deleted the `restore` extra (gfpgan/facexlib/basicsr + scipy<1.18 / numba<0.60
  pins) and the basicsr setuptools<69 build pin from pyproject.toml.
- Restored `src/remove_ai_watermarks/photomaker_restore.py` (V2 this time:
  `TencentARC/PhotoMaker-V2`, `photomaker-v2.bin`, no `pm_version='v1'` override).
- Restored the `photomaker` extra in pyproject with all the upstream-compat
  pins (einops, peft, onnxruntime, insightface) and the `allow-direct-references`
  hatch metadata block.
- `InvisibleEngine` swapped `_restore_faces` -> `_restore_faces_photomaker`;
  `--restore-faces-method` removed (only one method, no choice).
- CLI flag help, CLAUDE.md, README, docs/synthid.md, and
  docs/controlnet-removal-pipeline-research.md all updated.
- docs/synthid-robust-identity-research.md status notice rewritten to list both
  abandoned commercial-safe attempts (V1 + GFPGAN-on-cleaned) and the
  non-commercial trade-off we accepted.

ruff + strict pyright(src/) clean; 578 tests pass (the 9 GFPGAN tests are gone,
the 11 PhotoMaker tests stay green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 18:41:01 -07:00
Victor Kuznetsov 01fe98bf54 refactor(face-restore): rollback PhotoMaker, restore GFPGAN on the CLEANED image
After 7 cascading upstream-compat fixes (insightface dep, peft dep, pm_version,
device, etc.), the PhotoMaker V1 cert sweep still hit a CFG batch-dim mismatch
inside the denoising loop. The upstream PhotoMaker `pipeline.py` is forked from
diffusers v0.29.1 and our env runs 0.38; SDXL prompt-encoder handling changed
significantly between those versions, so making PhotoMaker work end-to-end
needs a proper fork or a diffusers downgrade — both expensive. Not worth
shipping today.

Pivot: restore `face_restore.py` (GFPGAN) with a single-line fix that makes it
SynthID-safe by construction. The previous design ran GFPGAN.enhance on the
ORIGINAL watermarked image and was oracle-confirmed to re-add SynthID via the
weight-0.5 pixel blend. The fix is to run GFPGAN on the diffusion-CLEANED
image — whatever pixels GFPGAN derives from are already SynthID-free, so the
partial blend cannot transport the watermark. Identity fidelity is lower than
a true identity-as-embedding stack would deliver, but it ships and works.

Changes:
- `src/remove_ai_watermarks/face_restore.py` restored from pre-wipe state with
  one line changed: `restorer.enhance(cleaned_bgr, ...)` instead of
  `restorer.enhance(original_bgr, ...)`. `original_bgr` is kept as an unused
  positional argument for API stability.
- `src/remove_ai_watermarks/photomaker_restore.py` and its tests REMOVED. The
  research note (`docs/synthid-robust-identity-research.md`) keeps a "status
  notice" documenting why PhotoMaker is parked for now and what the path back
  in would look like.
- `pyproject.toml` `restore` extra restored (gfpgan/facexlib/basicsr +
  scipy<1.18 + numba<0.60 pins + the basicsr setuptools<69 build pin), plus
  `photomaker` extra (with its einops/insightface/peft pile) and the
  `[tool.hatch.metadata] allow-direct-references = true` block REMOVED.
- `InvisibleEngine._restore_faces_photomaker` removed; `_restore_faces`
  restored. The `--restore-faces` CLI flag and its plumbing through cmd_*
  signatures are unchanged.
- CLAUDE.md, README.md, docs/synthid.md, docs/controlnet-removal-pipeline-
  research.md updated to describe the shipped GFPGAN-on-cleaned design and to
  reference PhotoMaker only as the parked alternative.

ruff + strict pyright(src/) clean; 578 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:55:45 -07:00
Victor Kuznetsov d1b85ee6a8 fix(photomaker): drop explicit negative_prompt to fix CFG batch mismatch
Modal cert sweep #6 made it INTO the denoising loop and died with
"Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1
for tensor number 1 in the list."

In the PhotoMaker pipeline's denoising loop, the per-step embeddings are built
as torch.cat([negative_prompt_embeds, prompt_embeds(_text_only)], dim=0). The
text-encoder + ID-encoder flow can leave the negative branch at batch=2 and the
ID-injected branch at batch=1 when a custom negative_prompt is passed, so the
cat fails. The upstream gradio demo just passes no negative_prompt and relies
on the pipeline's empty default; do the same.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:35:40 -07:00
Victor Kuznetsov 031c38dc7f fix(photomaker): place id_encoder on the right device + dtype
Modal cert sweep #5 made it through component load (V1 id_encoder + lora_weights)
and died at inference with the classic
"Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be
the same" — id_encoder lived on CPU/fp32 while the rest of the pipeline ran on
CUDA/fp16. Two fixes:

1. Call `pipe.to(device)` BEFORE `load_photomaker_adapter` so the loader picks the
   right device/dtype from `self.device` / `self.unet.dtype` when it builds the
   encoder.
2. Belt and braces: after load, explicitly `pipe.id_encoder.to(device, dtype)` because some
   torch/diffusers combos leave custom attributes on the old device even when
   `pipe.to` ran first.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:29:00 -07:00
Victor Kuznetsov 9435e12ce6 fix(photomaker extra): add peft dep (required by pipe.fuse_lora)
Modal cert sweep #4 got further -- PhotoMaker V1 components actually loaded
("Loading PhotoMaker v1 components [1] id_encoder ... [2] lora_weights") -- and
died on the next step: "PEFT backend is required for this method." That's
diffusers' fuse_lora call gated on the peft library, which PhotoMaker doesn't
declare in its install_requires either.

Pin peft>=0.10.0 in the photomaker extra.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:23:32 -07:00
Victor Kuznetsov 1fb2a64b56 fix(photomaker): pass pm_version='v1' to load_photomaker_adapter
Modal cert sweep #3 ran past the `insightface` import error and into a real
state_dict mismatch:

  Error(s) in loading state_dict for PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken:
    Missing key(s) ... qformer_perceiver.token_proj.0.weight ...

The upstream `load_photomaker_adapter` defaults to `pm_version='v2'` regardless of
the .bin file passed -- the loader builds a V2 encoder
(PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken) and then tries to load V1 weights
into it. We must pass `pm_version='v1'` explicitly so the loader instantiates the
CLIP-only PhotoMakerIDEncoder. The pipeline-level `input_id_images` API is the
same across V1 and V2, so the call site does not change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:18:52 -07:00
Victor Kuznetsov 860bde4a26 fix(photomaker extra): pin insightface for import resolution (MIT code only)
The upstream PhotoMaker package's `__init__.py` unconditionally imports a
face-analyser class from its `insightface_package` submodule, so JUST importing
`PhotoMakerStableDiffusionXLPipeline` (the V1 pipeline class we use) raises
`ModuleNotFoundError: No module named 'insightface'` if insightface isn't
present in the env. The Modal cert sweep caught this on the V1 image.

Resolution: pin `insightface>=0.7.3` (and its `onnxruntime` runtime dep) in the
`photomaker` extra. The PyPI insightface package is MIT-licensed CODE; the
non-commercial restriction sits on the pretrained model packs (antelopev2,
buffalo_l) which download only when `FaceAnalysis()` is instantiated. Our V1 path
never instantiates the face-analyser -- it loads photomaker-v1.bin (CLIP-only
encoder) via `load_photomaker_adapter` -- so the model-pack license does not
bind us; we depend only on the MIT code for the import to resolve.

Safety guards:
- Runtime check in `_get_pipeline`: raises if `_PHOTOMAKER_FILE` is ever pointed
  at v2 (so a future maintainer can't silently regress to the InsightFace path).
- New test class `TestV1OnlyCommercialSafetyGuard`: asserts repo + filename
  pin to V1 AND asserts the module source never references the face-analyser
  class (a static check that our codepath stays out of the runtime that would
  pull the non-commercial model packs).

Docs: documented the import dance + legal split inline at the top of
`photomaker_restore.py`.

ruff clean; 581 tests pass (the 9 PhotoMaker tests plus 3 new V1-guard tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:13:20 -07:00
Victor Kuznetsov dfa5181309 fix(photomaker): switch to V1 — V2 actually requires InsightFace (non-commercial)
A Modal cert sweep caught what the research doc missed: PhotoMaker-V2 fails at
import without InsightFace ("No module named 'insightface'"). Reading the upstream
source confirms it: `photomaker/__init__.py` imports `FaceAnalysis2` (an InsightFace
wrapper) at module load, V2's encoder is named
`PhotoMakerIDEncoder_CLIPInsightfaceExtendtoken`, and `model_v2.py`'s forward
takes an `id_embeds` argument that the pipeline computes via
`insightface.app.FaceAnalysis(name='antelopev2', ...)`. So V2 is a DUAL encoder
(CLIP + ArcFace), not CLIP-only as the model card line "id_encoder includes
finetuned OpenCLIP-ViT-H-14 and a few fuse layers" implied.

InsightFace's pretrained model packs (antelopev2, buffalo_l) are research/
non-commercial only per their own README:
  "The pretrained models we provided with this library are available for
   non-commercial research purposes only."
So V2 is blocked for a paid service like raiw.cc.

PhotoMaker-V1 is the commercial-safe alternative — its `PhotoMakerIDEncoder`
(model.py) forward takes only `(id_pixel_values, prompt_embeds, class_tokens_mask)`,
no ArcFace branch. Identity is CLIP-only, license is Apache-2.0, no InsightFace.

Code change: swap the repo + filename constants in `photomaker_restore.py`
(TencentARC/PhotoMaker, photomaker-v1.bin). Tests still pass (the 9 PhotoMaker
tests use a fake pipeline, so the model swap is transparent to them).

Doc correction: rewrote the verdict / license table / section 5 of
`docs/synthid-robust-identity-research.md` to lead with V1 and add a correction
notice explaining the V2 misread. Bulk-renamed `PhotoMaker-V2` to `PhotoMaker-V1`
across CLAUDE.md, README.md, docs/synthid.md, and
docs/controlnet-removal-pipeline-research.md (kept V2 only in the correction
notice, the license table, and the anchor reference).

ruff clean; 578 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:05:58 -07:00
Victor Kuznetsov 7e6fc8bfb9 fix(photomaker extra): add einops explicitly (upstream missed it)
PhotoMaker imports einops in its forward path but its install_requires doesn't
declare it, so the photomaker extra resolved without einops on a clean install
and the Modal cert sweep died at the restore-faces step with
"No module named 'einops'" -- the post-pass failed gracefully and returned the
un-restored cleaned output, so the cert artifact had no face recovery.

Pin einops>=0.7.0 in the photomaker extra so the extra is self-contained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 15:46:28 -07:00
Victor Kuznetsov 439eeadc07 refactor(face-restore): wipe GFPGAN path, --restore-faces is PhotoMaker-only
The GFPGAN `restore` extra and its `face_restore.py` module are gone. They were
oracle-confirmed to re-introduce SynthID by blending watermarked original face
pixels at fidelity weight 0.5 (clean A/B: gemini_3 controlnet 0.20 detected WITH
GFPGAN, clean WITHOUT). Keeping them as the default restore method was a footgun
for the removal pipeline. PhotoMaker-V2 (added in the previous commit) is the
single shipped restore path now -- identity-as-embedding, SynthID-safe by
construction.

Removed:
- src/remove_ai_watermarks/face_restore.py + tests/test_face_restore.py
- pyproject.toml `restore` extra (gfpgan/facexlib/basicsr + scipy/numba pins)
- pyproject.toml `[tool.uv.extra-build-dependencies] basicsr = [...]` build pin
- CLI: `--restore-faces-method` and `--restore-faces-weight` (no method choice
  to make, no GFPGAN weight knob to expose)
- InvisibleEngine._restore_faces method (only _restore_faces_photomaker remains)
- All restore-faces-method / restore-faces-weight threading through cmd_*
  signatures and _process_batch_image

Kept:
- `--restore-faces / --no-restore-faces`: now binds to PhotoMaker-V2.
- All adopted oracle findings about GFPGAN re-introducing SynthID (kept in the
  research docs as historical context that explains why the path was removed).

Docs updated: CLAUDE.md (restore extras bullet collapsed to photomaker, removed
face_restore Key-modules bullet, several inline GFPGAN refs scrubbed), README.md
(face-identity callout + install section now point to the photomaker extra),
docs/synthid.md 5.5 (net recipe), docs/controlnet-removal-pipeline-research.md
(recommendations).

ruff + strict pyright (src/) clean; 578 tests pass (the 9 GFPGAN tests are gone,
the 9 PhotoMaker tests stay green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 15:35:37 -07:00
Victor Kuznetsov 1439eb0714 feat(photomaker): SynthID-safe face-identity restoration via PhotoMaker-V2
Adds the second face-restore mechanism, selectable via the new CLI option
`--restore-faces-method=photomaker`. Unlike the existing GFPGAN path (which runs on
the watermarked ORIGINAL and was oracle-confirmed to re-introduce SynthID by partial
pixel blending), PhotoMaker carries identity in a SynthID-invariant OpenCLIP
embedding and regenerates fresh face pixels conditioned on it — the pixels in the
output are diffusion-fresh, so the watermark cannot be transported.

The load-bearing assumption (embedding invariance to SynthID-magnitude pixel noise)
was empirically validated in the prior commit (smoke test): cosine drift 0.002
under a ±2 LSB low-freq carrier, an order of magnitude less than JPEG90 drift
which SynthID survives at >=99% TPR.

End-to-end commercial-safe:
- PhotoMaker-V2 weights: Apache-2.0 (TencentARC)
- ID encoder: OpenCLIP-ViT-H/14 (MIT)
- SDXL base: shared with the main pipeline
- NO InsightFace (the non-commercial blocker for IP-Adapter FaceID / InstantID /
  PuLID / Arc2Face)

Two-pass architecture (PhotoMaker has no ControlNetImg2img class in diffusers):
1) main controlnet/default removal pass cleans SynthID + drifts faces
2) PhotoMaker txt2img regenerates each face from its embedding, feather-composited
   back into the cleaned image

New module `photomaker_restore.py` mirrors `face_restore.py`: lazy pipeline
singleton (double-checked lock), `is_available()` gate, pure `_face_crop_square` and
`_composite_faces` helpers, all unit-tested without the model (9 new tests). New
`InvisibleEngine._restore_faces_photomaker` runs after the diffusion pass, mirroring
`_restore_faces`. CLI flag `--restore-faces-method=[gfpgan|photomaker]` threaded
through `cmd_invisible`/`cmd_all`/`cmd_batch` + `_process_batch_image`.

New optional `photomaker` extra (Apache-2.0 + Apache-2.0/MIT deps, no basicsr).
`[tool.hatch.metadata] allow-direct-references = true` is required because the
upstream PhotoMaker package lives only on GitHub.

The next step (separate work) is oracle validation: run a 6-image cert sweep
through the new pipeline (default/controlnet at the certified strength +
--restore-faces-method=photomaker) and confirm SynthID stays clean while face
identity is recovered. The required infrastructure (`raiw-app/modal_cert.py`) is
already in place.

ruff + strict pyright(src/) clean; 586 tests pass (+ 9 new in
tests/test_photomaker_restore.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 15:20:29 -07:00
Victor Kuznetsov f8f247308b docs(identity): smoke test confirms OpenCLIP embedding is invariant to SynthID-magnitude noise
Empirical confirmation of the load-bearing assumption in the PhotoMaker-V2 path: the
identity embedding cannot transport an invisible pixel watermark.

Tested OpenCLIP-ViT-H/14 (laion2B-s32B-b79K — the same encoder PhotoMaker-V2
fine-tunes) on 31 face crops from gemini_3/gemini_4/openai_3 grid. cosine
similarity between embed(orig) and embed(perturbed):

- synthid_proxy (±2 LSB low-frequency noise, the regime SynthID actually lives in):
  mean 0.9977, min 0.9937. Embedding drifts by 0.002 (1 - mean), more than an order
  of magnitude less than the JPEG90 drift of 0.072 (mean 0.928), which SynthID
  survives at >=99% TPR by design.
- noise3 / jpeg70 / blur1: 0.89-0.95, all clearly above the SynthID floor.
- self check: 1.0000 (pipeline sane).
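
For reference, the metric reported above is plain cosine similarity between two embedding vectors, and "drift" is one minus that value. A stdlib-only sketch (function names are illustrative; the actual smoke test presumably ran on tensors from the encoder):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|) for equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_drift(a: Sequence[float], b: Sequence[float]) -> float:
    """How far the embedding moved: 0.0 means identical direction."""
    return 1.0 - cosine_similarity(a, b)
```

The "self check" line corresponds to comparing an embedding with itself, which should give a similarity of 1.0 up to floating-point rounding.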

So the embedder discards exactly the dimensions SynthID hides in. PhotoMaker-V2
conditioned on a watermarked face will see the same identity vector as a clean
face of that person, so the generated face inherits identity, not the watermark.

This unblocks step 2 of the research plan: prototype PhotoMaker-V2 in the
controlnet pipeline. The previously logged ad-hoc "cos(orig, SDXL-cleaned)"
numbers (0.56-0.93) measured diffusion drift, not watermark invariance, and are
not relevant to the hypothesis.

Docs only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 15:05:15 -07:00
Victor Kuznetsov 310ce912ba docs: SynthID-robust identity research — PhotoMaker-V2 is the only commercial-safe SDXL stack
After GFPGAN restore was oracle-confirmed to RE-INTRODUCE SynthID (it is a fidelity-
restoration net conditioned on the watermarked input), the only identity path that
will not transport the watermark is identity-by-EMBEDDING: a semantic vector that
conditions a fresh generation. That requires a face-recognition / ArcFace-class or
CLIP-image embedder.

Verified the license stack of every credible 2025-2026 SDXL identity adapter by
fetching primary sources directly (HuggingFace model cards, insightface.ai):

- IP-Adapter FaceID family, InstantID, PuLID, Arc2Face -> all blocked. Each
  depends at runtime on InsightFace's antelopev2/buffalo_l ArcFace packs, and
  insightface.ai explicitly states "Code is MIT licensed; models require separate
  commercial licensing." IP-Adapter FaceID's own model card flags itself non-
  commercial for the same reason.
- PhotoMaker-V2 is the single commercial-safe end-to-end stack today: Apache-2.0
  adapter weights with identity encoded as a fine-tuned OpenCLIP-ViT-H/14 (the
  model card's exact phrase: "id_encoder includes finetuned OpenCLIP-ViT-H-14
  and a few fuse layers"). No InsightFace.

Mechanistic argument that an identity embedding cannot transport SynthID: the
embedder is trained to be invariant to low-amplitude pixel changes (JPEG, resize,
brightness, noise), which is exactly the regime SynthID hides in by design. So
the embedding extracted from a watermarked face should be ~identical to the
embedding from the cleaned face, and the embedding cannot carry the watermark
into a freshly generated face. Flagged explicitly as not-yet-measured -- the
first integration step is a cosine-similarity smoke test (no codegen) before
investing in a PhotoMaker prototype.

Process note: the deep-research harness was run but its verifier subagents failed
to call StructuredOutput (same harness bug as a prior session), so its synthesis
was unusable; the license claims here are direct quotes from the primary
sources, fetched and verified, not from the workflow synthesis.

Docs only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 14:58:11 -07:00
Victor Kuznetsov be14eca207 docs: certified controlnet strength floors from the Modal GPU oracle sweep
Ran the isolated raiw-controlnet-cert Modal app (raiw-app/modal_cert.py) over a
strength x seed grid, restore OFF, --max-resolution 1536, each vendor checked on its
OWN oracle (OpenAI -> openai.com/verify, Gemini -> the Gemini app). Certified
controlnet SynthID-removal floors:

- OpenAI 0.20: 2 photoreal images (9-face grid + bracelet) x seed {1,2,3} = 6/6 clean;
  the bracelet that flipped at 0.15 is seed-robust at 0.20. Transfers to prod (OpenAI
  removal is resolution-independent).
- Gemini 0.30: 0.20 detected -> 0.30 clean on 2/2 seeds (hardest face). Holds only at
  <= 1536; Gemini is resolution-sensitive and raiw.cc runs NATIVE, so cap Gemini
  <= 1536 + use 0.30, or native-calibrate (~0.35+).
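
One way to read "certified floor" from a strength x seed grid, as a sketch: the floor is the lowest tested strength such that it and every higher tested strength came back clean on all runs. The function name and the `results` layout (strength mapped to a list of per-run "clean" booleans) are assumptions, not the sweep's actual data format:

```python
from typing import Mapping, Optional, Sequence


def certified_floor(results: Mapping[float, Sequence[bool]]) -> Optional[float]:
    """Lowest strength where it and all higher tested strengths are fully clean.

    `results[strength]` holds one boolean per (image, seed) run, True = clean.
    Returns None when even the highest tested strength has a detected run.
    """
    floor: Optional[float] = None
    for strength in sorted(results, reverse=True):
        runs = results[strength]
        if not runs or not all(runs):
            break                      # a single detected run disqualifies
        floor = strength
    return floor
```

Requiring every run to be clean reflects the seed non-determinism noted near the threshold: one clean run does not certify a strength.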

Prod recipe recorded: controlnet + a controlnet-specific per-vendor schedule in
resolve_strength (OpenAI 0.20 / Gemini 0.30, NOT the default 0.10/0.15 ladder) +
FIXED prod seed (kills the near-threshold non-determinism) + restore reworked/off.
Added to docs/controlnet-removal-pipeline-research.md (certified floors table),
docs/synthid.md 5.5, and the CLAUDE.md controlnet bullet. Docs only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 12:44:56 -07:00
Victor Kuznetsov d38b9a6122 docs: correct controlnet/restore SynthID-removal claims from the 2026-06-04 oracle pass
Oracle validation (openai.com/verify + the Gemini app) overturned three claims that
were on main, and consolidates the controlnet findings into one authoritative place.

- controlnet does NOT reliably remove SynthID at the low vendor-adaptive strength:
  removal is content x pipeline dependent and the survivors FLIP by content type
  (photoreal survives controlnet / clears default; flat graphic survives default /
  clears controlnet; flat text clears both). Root cause is insufficient strength,
  not the pipeline; controlnet needs a higher, per-vendor floor than default.
- removal near the threshold is SEED-non-deterministic (same image+pipeline+strength
  can pass or fail run-to-run); a single clean run does not certify a strength.
- `--restore-faces` RE-INTRODUCES SynthID: GFPGAN runs on the ORIGINAL watermarked
  face at weight 0.5 and composites it back over the cleaned result (clean A/B:
  a Gemini face stayed detected through controlnet 0.15/0.20/0.25 WITH restore,
  cleared at 0.20 with --no-restore-faces). The old "GFPGAN scrubs SynthID" claim
  was wrong.

Corrected in CLAUDE.md (watermark_remover controlnet bullet, controlnet
Known-limitations bullet, face_restore bullet, vendor-adaptive strength bullet) and
docs/synthid.md (5.1 controlnet/face-identity, 5.2 strength floors, new 5.5 oracle
validation log). docs/controlnet-removal-pipeline-research.md gains an authoritative
"Oracle validation 2026-06-04" section that the others point to as the single source.

Docs only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 12:22:43 -07:00
Victor Kuznetsov 3aea21e632 feat(visible): Samsung Galaxy AI mark removal (bottom-left reverse-alpha, #37)
New samsung_engine.py mirrors the jimeng engine but anchors bottom-left; wired
into watermark_registry, the CLI (--mark samsung / auto), and identify
(visible_samsung, medium). visible_alpha_solve.py gains a corner=bl mode;
samsung_alpha.png solved from @f-liva's flat captures. Calibrated for the
Italian "Contenuti generati dall'AI" variant. Flat black/gray/white captures
committed, real photos gitignored. Tests + docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 10:27:44 -07:00
Victor Kuznetsov 6f4aa4c7b1 fix(invisible): retry in fp32 on a degenerate fp16 output (#41)
The fp16-fix VAE swap (#29) is gated to the default SDXL checkpoint, so a
custom model_id, a stale pre-fix install, or a fal/custom loader can still
decode to an all-black/NaN frame in fp16 (reporter: gpt-image 1448x1086,
the `image_processor.py invalid value encountered in cast` warning).

Add a model-agnostic backstop in remove_watermark: after generation, if the
run was fp16 and the output is degenerate (_is_degenerate_image: near-zero
mean and variance), rebuild the pipeline in fp32 on the same device and
re-run once. fp32 is the verified-clean path, so a black image is never
returned regardless of model_id or version. Mirrors the MPS->CPU fallback's
self-mutation pattern; batch inherits it. Verified e2e on MPS by forcing
fp16 with the swap disabled (first pass black, guard fired, retry clean).
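
The backstop's control flow, sketched without torch. Everything here is illustrative: images are flat lists of floats in [0, 1], `generate` is a hypothetical callable standing in for a pipeline run at a given precision, the threshold is a guess, and the explicit NaN check is an addition of this sketch on top of the "near-zero mean and variance" test described above:

```python
import math
from statistics import fmean, pvariance
from typing import Callable, Sequence


def is_degenerate(pixels: Sequence[float], eps: float = 1e-3) -> bool:
    """True for an all-black (near-zero mean and variance) or NaN frame."""
    if any(math.isnan(p) for p in pixels):
        return True
    return fmean(pixels) < eps and pvariance(pixels) < eps


def remove_with_backstop(generate: Callable[[str], Sequence[float]],
                         dtype: str = "fp16") -> Sequence[float]:
    """Run once; if an fp16 run decodes to a degenerate frame, retry in fp32."""
    output = generate(dtype)
    if dtype == "fp16" and is_degenerate(output):
        output = generate("fp32")      # single retry on the known-good path
    return output
```

Because the check looks only at the output, it does not need to know which model or VAE produced the frame.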

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:43:27 -07:00
Victor Kuznetsov ec549b5c55 chore(deps): bump aiohttp 3.13.5 -> 3.14.0 for GHSA-hg6j-4rv6-33pg + GHSA-jg22-mg44-37j8
Targeted `uv lock --upgrade-package aiohttp`; only the aiohttp pin changes (no
other package added/removed). Clears the two moderate Dependabot alerts on the
transitive aiohttp. The third alert (basicsr GHSA-86w8-vhw6-q9qq, command
injection, no patch) is accepted: basicsr is the optional, off-by-default
`restore` extra pinned to 1.4.2 as the only buildable version.

Imports + targeted suite (identify/metadata/gemini) green after the bump.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:57:08 -07:00
Victor Kuznetsov 2c0b174dfa fix(gemini): self-verify repair for under-removed sparkles
After reverse-alpha, re-detect the sparkle; when one survives at or above the
registry fail line (conf >= 0.5) -- an alpha mismatch the per-image gain estimate
could not fully correct -- inpaint the footprint and keep that only when it lowers
the re-detect confidence. The footprint inpaint reconstructs the slot from its
darker surroundings, so it physically removes the bright sparkle; purely additive,
the common clean removal re-detects below 0.5 and is returned untouched.
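
The keep-only-if-better logic above, as a small sketch. `detect` and `inpaint` are hypothetical callables standing in for the re-detection and the footprint inpaint; only the 0.5 fail line comes from the message:

```python
from typing import Callable, TypeVar

Image = TypeVar("Image")

FAIL_LINE = 0.5  # registry fail line quoted in the commit message


def self_verify_repair(removed: Image,
                       detect: Callable[[Image], float],
                       inpaint: Callable[[Image], Image]) -> Image:
    """Re-detect after removal; inpaint only when a sparkle survives."""
    before = detect(removed)
    if before < FAIL_LINE:
        return removed                 # common case: returned untouched
    candidate = inpaint(removed)
    # Keep the inpainted result only if it lowers the re-detect confidence.
    return candidate if detect(candidate) < before else removed
```

The repair can only replace a result that already failed re-detection, which is why clean removals are unaffected.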

Measured on the spaces visible-removal audit: gemini removal-audit failures drop
15 -> 11 (4 genuine rescues), doubao 65/65 and jimeng 11/11 unchanged, zero
regressions on the 468 already-clean removals.

An offset+scale alignment search was prototyped on the remaining 11 fails and
rejected: an audit "ceiling" suggested +4 more, but those were NCC-gaming -- the
lower-scoring placement left the sparkle as bright or brighter, just reshaping the
residual so the contrast-invariant shape-NCC scored lower (a5a9: first-pass slot
~76 at background level vs the "aligned win" ~164). A brightness sanity check
rejected every one, so it contributed nothing and was removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:45:18 -07:00
Victor Kuznetsov 6d11c11b52 feat(auto): DBNet text detector, Real-ESRGAN upscaler, batch --auto
Three content-quality features for the invisible/all/batch pipeline.

DBNet text detector (auto_config): replace the MSER text heuristic with
PP-OCRv3 differentiable-binarization via cv2.dnn.TextDetectionModel_DB,
using a bundled 2.4 MB Apache-2.0 model (en/cn detection nets are
byte-identical, so it ships language-neutral). cv2.dnn is core OpenCV, so
no new pip dep. MSER stays as the fallback when the model can't load.
Validated on real images: matches MSER everywhere and additionally catches
the Doubao CJK mark MSER missed; routing decisions unchanged otherwise.

Real-ESRGAN upscaler (new upscaler.py, esrgan extra): optional
pre-diffusion super-resolution for the min-resolution floor upscale, loaded
via spandrel (MIT, no basicsr) with BSD-3-Clause weights downloaded on
first use. New --upscaler {lanczos,esrgan} on invisible/all/batch; default
stays lanczos and the engine falls back to lanczos when the extra is absent
or the model errors (never breaks removal). It is a manual opt-in knob (the
auto plan never selects it) -- as a generic GAN it sharpens photo/texture
content strongly but can degrade faces (the diffusion pass regenerates
them) and thin text, documented accordingly.

batch --auto: wire the content-adaptive --auto (+ --adaptive-polish) into
cmd_batch. The plan is recomputed per image and the invisible engine is
cached per resolved pipeline (default/controlnet), so a mixed directory
builds at most one engine of each kind. Verified end-to-end: 3 mixed
images routed correctly with only 2 pipeline loads (controlnet reused).
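
A sketch of the per-pipeline engine cache described above, assuming each image's plan has already been resolved to a pipeline name. `run_batch` and `build_engine` are hypothetical names; the point is only that engines are keyed by pipeline kind rather than rebuilt per image:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_batch(plans: Iterable[Tuple[str, str]],
              build_engine: Callable[[str], object]) -> List[Tuple[str, object]]:
    """Pair each image with an engine, building one engine per pipeline kind.

    `plans` yields (image_path, pipeline) where pipeline is e.g. "default"
    or "controlnet", as resolved per image by the auto plan.
    """
    engines: Dict[str, object] = {}
    assigned: List[Tuple[str, object]] = []
    for image_path, pipeline in plans:
        if pipeline not in engines:            # first image needing this kind
            engines[pipeline] = build_engine(pipeline)
        assigned.append((image_path, engines[pipeline]))
    return assigned
```

With three images planned as controlnet, default, controlnet, `build_engine` runs twice and the third image reuses the first engine.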

ruff + strict pyright(src/) clean; 558 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:04:33 -07:00
Victor Kuznetsov 4a6cd71ab2 Merge branch 'claude/silly-northcutt-c2bf06': unify C2PA vendor registry + code-health + uv publish
Brings in commit 5cf68a6 (single C2PA_AI_VENDORS registry, erase_lama
grayscale/BGRA support, batch device-cache clearing + --controlnet-scale,
uv publish via OIDC, hatchling pin <1.31). Auto-merged with no conflicts;
ruff/pytest(544)/pyright all clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 22:10:25 -07:00
Victor Kuznetsov 32a0779e1d fix(gemini): demote sparkle false positives with a core-brightness gate
detect_watermark's shape-only NCC (spatial/gradient/var fusion) fires on ornate
or flat content (text strips, banners, hatching) that coincidentally matches the
diamond shape. The NCC is contrast-invariant, so it cannot see the defining
property of a real Gemini sparkle: a bright WHITE overlay whose core sits above
the local background.

The fusion now demotes (caps confidence to 0.30) a match that is BOTH
low-confidence (< _SPARKLE_FP_CONF 0.65) AND has a low core-ring brightness
margin (_core_ring_margin < _SPARKLE_FP_MARGIN 5). Real sparkles escape via
EITHER high confidence (white-bg sparkles score >=0.79 despite a low margin) OR
high margin (dark/mid backgrounds, incl. the #36 faint-corner case), so both
must fail to demote. The gate is monotonic -- it only removes detections, never
adds -- so it cannot regress the verified-negative corpus (already 0 FPs).
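
The gate reduces to a two-condition check. In this sketch the three numeric values are the ones quoted above, while the function name and signature are illustrative:

```python
SPARKLE_FP_CONF = 0.65     # below this, a match counts as low-confidence
SPARKLE_FP_MARGIN = 5.0    # below this, the core is not brighter than its ring
DEMOTED_CONF = 0.30        # cap applied to a demoted match


def gate_confidence(confidence: float, core_ring_margin: float) -> float:
    """Demote a match only when BOTH confidence and brightness margin are low."""
    low_conf = confidence < SPARKLE_FP_CONF
    low_margin = core_ring_margin < SPARKLE_FP_MARGIN
    if low_conf and low_margin:
        # min() keeps the gate monotonic: it can lower a score, never raise it.
        return min(confidence, DEMOTED_CONF)
    return confidence
```

A white-background sparkle at 0.79 with a low margin passes through unchanged, as does a 0.5 match with a high margin; only a match failing both tests is capped.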

On the spaces corpus it demoted 16/495 flagged sparkles (13 no AI metadata =
content FPs; the 3 AI-meta ones were visually FPs / a near-invisible
white-on-white sparkle whose AI verdict is held by metadata), and dropped the
removal-audit failures 20 -> 15.

- _core_and_bg shared helper (core 75th-pct brightness vs background-ring median);
  _estimate_alpha_gain refactored onto it, new _core_ring_margin wrapper.
- TestSparkleFalsePositiveGate: margin high/low, strong-sparkle kept (incl. on
  white via high conf), blurred no-core blob demoted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 22:02:28 -07:00
Victor Kuznetsov b686dbdd79 feat(auto): adaptive detail-targeting polish + --adaptive-polish flag
The fixed mild auto polish (unsharp 0.5 / grain 2.0) under-corrected soft
photo/face output (gemini_3 stayed at lap-var 84 vs its 592 original) and its
grain speckled small text. Replace it with humanizer.adaptive_polish: target the
input's Laplacian variance with a capped unsharp scaled to the deficit + edge-
masked grain (smooth regions only), calibrated by a short sigma search. Self-
limiting on text/graphics -- already high-frequency, so almost no polish lands
and text edges are masked out. Validated on the spaces corpus (gemini_3 84 -> 334
end-to-end; openai_1 text near-untouched).
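
As a rough illustration of "a capped unsharp scaled to the deficit", the sketch
below measures sharpness as Laplacian variance and maps the shortfall to a
sharpen strength. It is dependency-free for readability, whereas the commit
describes the real code as cv2-only. `unsharp_amount`, `gain`, and
`max_amount` are hypothetical, and the sigma search and edge-masked grain are
not shown.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    gray is a list of rows of brightness values, at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def unsharp_amount(target_var, current_var, gain=2.0, max_amount=1.5):
    """Sharpen strength proportional to the relative deficit, then capped.

    gain and max_amount are assumed values, not the project's calibration.
    """
    if target_var <= 0 or current_var >= target_var:
        # Output is already at least as detailed as the input: no polish.
        return 0.0
    deficit = 1.0 - current_var / target_var
    return min(max_amount, gain * deficit)
```

The early return is what makes a scheme like this self-limiting: output that
already matches the input's high-frequency content gets no sharpening at all.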

Interface: every --auto decision is now independently overridable -- add
--adaptive-polish/--no-adaptive-polish (matching --restore-faces; works without
--auto too) so the polish can be disabled or used manually. _apply_auto overrides
exactly the three content-adaptive modes (pipeline, restore-faces, adaptive-
polish); --unsharp/--humanize stay independent fixed filters.

cv2-only, no new deps. Threaded through invisible/all (not batch).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 21:49:08 -07:00
Victor Kuznetsov 5cf68a6a3d refactor: unify C2PA vendor registry + code-health fixes + uv publish
Three P2 cleanups from a library-wide review.

Detection -- single C2PA_AI_VENDORS registry (noai/constants.py):
- C2PA_ISSUERS, SYNTHID_C2PA_ISSUERS, and identify._ISSUER_PLATFORM now derive
  from one C2paAiVendor table, so adding a C2PA vendor is one entry instead of
  edits in three places across two files. Behavior-identical (262 detection
  tests pass; the kept `needle` field is load-bearing -- it differs from `org`
  for Google and ByteDance, with no mechanical derivation).

Code-health:
- region_eraser.erase_lama now accepts grayscale/BGRA like erase_cv2 (it
  crashed on grayscale and silently dropped alpha on BGRA). +2 regression tests.
- batch frees the device cache between images via a shared try_empty_device_cache
  helper (generalized from the MPS-only _try_clear_mps_cache, now reused by both
  the MPS->CPU fallback and the batch loop).
- batch gained --controlnet-scale (parity with invisible/all).

CI / packaging:
- publish.yml uploads via `uv publish` (PyPI trusted publishing over OIDC),
  replacing pypa/gh-action-pypi-publish so uploads no longer depend on that
  action's bundled twine accepting the Metadata-Version. Workflow filename +
  pypi environment unchanged, so PyPI's trusted-publisher entry still matches.
- hatchling pin relaxed <1.28 -> <1.31 (verified against hatch's changelog:
  1.30.0 made Metadata 2.5 the default, 1.30.1 reverted to 2.4; 1.27-1.29 were
  always 2.4). Kept as belt-and-suspenders so the first uv-publish release ships
  2.4, isolating the uploader swap from the metadata-version bump.

Docs (CLAUDE.md, pyproject) synced; corrected the inaccurate "hatchling 1.28+
emits 2.5" note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 21:01:07 -07:00
Victor Kuznetsov 9bd2c17cc4 feat(auto): content-adaptive --auto quality mode, Phase 1
Add `auto_config.plan(image_path) -> AutoConfig`, the first step of the
invisible/all pipeline: it inspects the input image (before the diffusion model
loads) and picks the quality modes so the run adapts to content. Quality-priority
routing -- ControlNet (text/face-structure preservation) is the default, skipped for
plain SDXL only on a clearly structure-less image; GFPGAN face restore when a face is
present; a mild sharpen + grain polish when a smoothing pass ran. Exposed as `--auto`
on `all`/`invisible` (`_apply_auto`; explicit flags override via click's parameter
source). Not wired into batch (its engine is cached per-mode).
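
The routing rules above can be sketched as a pure function over already
computed detection signals. The real entry point is
`auto_config.plan(image_path)`. Everything else here is hypothetical: the
`plan_from_signals` name, the `AutoConfig` fields, the `structure_score`
signal, and its threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoConfig:
    pipeline: str        # "controlnet" or "sdxl" (assumed labels)
    restore_faces: bool
    polish: bool


def plan_from_signals(
    has_face: bool,
    structure_score: float,
    smoothing_pass_ran: bool,
    structureless_below: float = 0.02,  # assumed threshold
) -> AutoConfig:
    """Quality-priority routing: ControlNet unless clearly structure-less."""
    structureless = (not has_face) and structure_score < structureless_below
    return AutoConfig(
        pipeline="sdxl" if structureless else "controlnet",
        restore_faces=has_face,
        polish=smoothing_pass_ran,
    )
```

Treating ControlNet as the default means an uncertain structure score errs
toward preserving text and faces instead of toward the cheaper plain pipeline.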

Detection is cv2-only and torch-free (~100 MB peak RSS, a few ms): OpenCV YuNet
(`cv2.FaceDetectorYN`, MIT, 232 KB model bundled in assets/) for faces, a Canny
edge-density + MSER heuristic for text/structure (a rough Phase-1 placeholder; DBNet
via cv2.dnn is the planned upgrade). ZERO new pip deps. Designed to run wherever the
pipeline runs -- the raiw.cc Modal GPU worker -- never on the 512 MB web host.

Real-ESRGAN-via-Spandrel upscaling (a new `esrgan` extra) and an adaptive
Laplacian-variance polish are deferred to later phases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 20:52:17 -07:00