flux-1.png / flux-1.jpg are real Black Forest Labs FLUX.2 [pro] Playground
outputs (signed C2PA, issuer "Black Forest Labs" + trainedAlgorithmicMedia,
manifests verified to contain no personal data). flux-1.jpg is the first
committed JPEG-with-C2PA fixture, exercising the c2pa-python non-PNG reader path
end to end. Regression tests assert both attribute to "Black Forest Labs (FLUX)".
Also documents the verified finding (n=2, 2026-06-19): BFL's hosted output carries
the signed C2PA manifest but NOT the open invisible-watermark DWT-DCT (decodes to
degenerate all-ones, chance-level vs the FLUX reference) -- the open pixel mark is
applied only optionally by the dev inference code. So a hosted FLUX.2 image is identified by C2PA
alone, with no open-pixel fallback once C2PA is stripped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mining the local production corpus (25,725 imgs) surfaced two AI vendors signing
C2PA that the registry missed:
- Canva (Magic Media) signed "Canva" + trainedAlgorithmicMedia -> detected AI but
no platform attributed (disproves the old "Canva exports strip C2PA" assumption).
- BytePlus (ByteDance international: Seedream/Seededit) signs "Byteplus Pte. Ltd.";
the bare volcengine needle missed it, so its output was mis-attributed to "Adobe
Firefly" via an incidental "Adobe XMP" string the fallback byte-scan picked up.
Adding both to C2PA_AI_VENDORS lets the clean manifest issuer attribute them
directly. Corpus re-run: 16 platform changes, all improvements (3 Adobe->ByteDance
fixes, 4 None/TC260->ByteDance, 9 None->Canva), 0 regressions. An attempted
signer-based attribution fallback was measured and dropped: it regressed 18 images
(friendly ByteDance label -> raw Chinese cert org; IPTC tool name pre-empted).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
extract_c2pa_info now uses the c2pa-python Reader first (any container, whole
manifest store incl. ingredient manifests), falling back to the hand-rolled caBX
parser for blobs the validator rejects (synthetic/partial, broken wheel). The
issuer/source-type/SynthID/soft-binding registry scan is shared by both paths
(_populate_registry_fields), so the return-dict contract is unchanged. Also
replaces the dead `from c2pa import has_c2pa_metadata` import in metadata.py with
a real Reader presence check. c2pa-python added as a core dep (MIT/Apache, ~+5MB
RSS, no torch; wheels cover the CI matrix).
Validated on the full local spaces corpus (25,725 imgs): 0 regressions; 384
manifests newly parsed (379 non-PNG JPEG/WebP + 2 PNGs the byte-scanner missed);
3 false Adobe/Microsoft->Google attributions fixed via real-manifest parsing.
The docs/module-internals.md section for this change already landed in 41f6797.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The free `visible` path over-subtracted a faint Gemini sparkle on a
mid-tone background into a darker-than-background brown diamond instead
of removing it (2026-06-18 prod NPS report, "the watermark was not
removed, just its color changed"). The existing over-subtraction guard
only tripped when reverse-alpha drove a footprint pixel fully negative
(the issue #30 dark-background black-pit case); on a mid-tone background
the over-subtraction darkens the core well below the background without
any pixel crossing zero, so the gate missed it and shipped the dark mark.
Add a second over-subtraction signal to `_reverse_alpha_oversubtracts`:
predict the reverse-alpha output at the bright core, (core - a*logo)/(1-a),
and route to the footprint inpaint when it lands more than
`_OVERSUB_DARK_MARGIN` (25) gray levels below the local background ring.
Calibrated wide: clean removals predict within ~12 of background
(demo_banana ~-1), the prod regression ~-40, the issue #30 dark case ~-82.
Corpus-validated on the 479 detected Gemini images: 10 switch reverse-alpha
to inpaint, all of them dark-diamond cases that improve or match; the
other 469 stay byte-identical. demo_banana stays on the reverse-alpha
path (byte-identical).
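A minimal sketch of the second signal, assuming scalar gray levels for the
bright core, the logo, and the background ring. The function names and the
scalar interface are hypothetical; only the formula and the 25-level margin
come from the description above.

```python
_OVERSUB_DARK_MARGIN = 25  # gray levels, value taken from the text above


def predict_reverse_alpha(core: float, alpha: float, logo: float) -> float:
    """Predicted gray level after reverse-alpha: (core - a*logo) / (1 - a)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (core - alpha * logo) / (1.0 - alpha)


def oversubtracts_dark(
    core: float,
    alpha: float,
    logo: float,
    background: float,
    margin: float = _OVERSUB_DARK_MARGIN,
) -> bool:
    """True when the predicted core lands more than `margin` below the ring."""
    predicted = predict_reverse_alpha(core, alpha, logo)
    return background - predicted > margin
```

A caller would route to the footprint inpaint when `oversubtracts_dark(...)`
is true and keep the reverse-alpha path otherwise.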
Also crop both reverse-alpha helpers to the region they actually touch,
a pure O(image) -> O(mark) win that is byte-identical to the full-frame
math (a uint8<->float32 round-trip is exact):
- `GeminiEngine._core_and_bg` converts only the footprint+ring crop to
gray, not the whole frame (~70 ms -> 0.1 ms on a 12 MP image; it runs
for both the alpha-gain estimate and the new gate). Verified identical
across 479 images; detector confidence unchanged.
- `TextMarkEngine._apply_reverse_alpha` computes the blend on the glyph
crop only (`amap` is zero outside it, so the math is a no-op there):
~275 ms -> ~2 ms per placement on a 12 MP frame, up to 2 placements per
removal. Verified identical across 142 Doubao/Jimeng placements.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
identify(check_visible=True) ran the Gemini-sparkle detector and the
Doubao/Jimeng text-mark detector each with its own image_io.imread, so the
same bitmap was fully decoded twice. On a memory-constrained host (the raiw.cc
512 MB web worker, which runs identify on every upload) that doubled the peak
decode allocation and contributed to OOM restarts.
Decode once in identify() and pass the BGR array to both detectors. The detect
methods already accept an NDArray, so this only threads the pre-decoded array
through: detect_sparkle_confidence and the two _visible_* helpers gain an
optional image= param that, when None, preserves the old self-read behavior
(so direct callers and the cv2-missing/unreadable paths are unchanged).
Only the visible path is deduplicated; the optional check_invisible decoders
are unaffected (and off on the web hot path). Adds a test asserting
identify(check_visible=True, check_invisible=False) decodes exactly once.
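The threading pattern can be sketched as follows. This is an illustration
only: `identify_visible`, the two detector functions, and the injected
`imread` callable are hypothetical stand-ins, not the library's signatures.
The one detail taken from the text is the optional `image=` parameter that
falls back to a self-read when it is None.

```python
from typing import Callable, List, Optional

Image = List[List[int]]  # stand-in for a decoded BGR array


def detect_sparkle(
    path: str, imread: Callable[[str], Image], image: Optional[Image] = None
) -> bool:
    # Self-read only when the caller did not pass a pre-decoded image.
    img = imread(path) if image is None else image
    return bool(img)


def detect_text_mark(
    path: str, imread: Callable[[str], Image], image: Optional[Image] = None
) -> bool:
    img = imread(path) if image is None else image
    return bool(img)


def identify_visible(path: str, imread: Callable[[str], Image]) -> dict:
    image = imread(path)  # the single decode
    return {
        "sparkle": detect_sparkle(path, imread, image=image),
        "text_mark": detect_text_mark(path, imread, image=image),
    }
```

Passing a counting fake as `imread` is one way to assert the
decode-exactly-once property in a test.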
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A 2026-06-14 oracle re-test on the deployed Modal controlnet worker (v0.10.0)
cleared SynthID at OpenAI 0.10 (2 photoreal) and Google 0.15 (2 native
2816x1536, retiring the "native >= 0.30" guess), while a pixel sweep showed the
2026-06-04 cert floors (0.20/0.30) over-regenerated for no efficacy gain
(Google MAE -20% at 0.15). Lowers OPENAI_STRENGTH 0.20->0.10, GEMINI_STRENGTH
and UNKNOWN_STRENGTH 0.30->0.15.
Caveats documented in watermark_profiles.py + docs: removal near this floor is
seed-non-deterministic (a service must pin a verified seed), and the n=2 re-test
did not cover flat-graphic hard cases.
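For reference, a sketch of how a vendor-adaptive ladder with these values
might resolve. The constant names and values are the ones above; the
`pick_strength` helper, its vendor keys, and the override argument are
hypothetical and do not reflect the real resolve_strength signature.

```python
from typing import Optional

OPENAI_STRENGTH = 0.10
GEMINI_STRENGTH = 0.15
UNKNOWN_STRENGTH = 0.15

# Hypothetical vendor keys; the real mapping may use different names.
_LADDER = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def pick_strength(vendor: Optional[str], override: Optional[float] = None) -> float:
    """Return the explicit override if given, else the vendor's floor."""
    if override is not None:
        if not 0.0 < override <= 1.0:
            raise ValueError("strength must be in (0, 1]")
        return override
    return _LADDER.get((vendor or "").lower(), UNKNOWN_STRENGTH)
```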
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'no signal' branch of the visible no-mark path claimed 'No AI provenance
signal found either', which reads as 'the image is clean'. A missing metadata
proxy is not proof that an invisible pixel watermark (SynthID) is absent: the
watermark cannot be detected once the metadata is gone, and the metadata may
have been stripped upstream. The
message now preserves that uncertainty and routes to both 'all' (regenerate
pixels) and 'erase'. Regression-guarded by the SynthID/all asserts in
test_cli.py. CLAUDE.md visible-command note updated to match.
Also adds a 'Scope and non-goals' section (CLAUDE.md + README): removing
AI-provenance marks on the user's own content is in scope; stripping
stock/paid-content watermarks (Shutterstock/Getty/iStock, classifieds) is out
of scope by principle, not by difficulty.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The convenience wrapper's docstring still quoted the pre-2026-06 ladder
(0.10 OpenAI / 0.15 Google / 0.15 unknown). The live constants in
watermark_profiles.py are 0.20 / 0.30 / 0.30, applied to both the controlnet
and sdxl pipelines. Docstring only; behaviour was already correct via
vendor_for_strength + resolve_strength.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When `visible --mark auto` (or an explicit `--mark` with detection on) found
no registered mark, it exited 0 without writing output -- which a wrapping
service reads as success and re-serves the unchanged input. ~74% of real
uploads carry no registered visible mark, so this was the dominant "it didn't
work" / NPS score-0 failure mode.
Now it runs a cheap metadata-only identify, prints actionable guidance (route
to `all` for an invisible/metadata mark, or `erase` for an arbitrary logo),
writes no output file, and exits EXIT_NO_VISIBLE_MARK (2) -- distinct from
success (0) and a hard error (1) so the caller can surface the message.
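A sketch of how a wrapping service might branch on the three codes. Only the
numeric values come from the text above; `handle_visible_result`, its return
shape, and the `guidance_text` argument (whatever the CLI printed) are
assumptions for illustration.

```python
EXIT_OK = 0
EXIT_ERROR = 1
EXIT_NO_VISIBLE_MARK = 2  # value from the text above


def handle_visible_result(returncode: int, guidance_text: str = "") -> dict:
    """Map a `visible` exit code to what the service should do next."""
    if returncode == EXIT_OK:
        return {"status": "cleaned", "serve_output": True, "message": ""}
    if returncode == EXIT_NO_VISIBLE_MARK:
        # No output file was written: surface the guidance instead of
        # re-serving the unchanged input.
        return {
            "status": "no_visible_mark",
            "serve_output": False,
            "message": guidance_text.strip(),
        }
    return {"status": "error", "serve_output": False, "message": guidance_text.strip()}
```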
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 256->512 detection-search widening (v0.8) let a large, low-gradient
shape match outrank a genuine mid-size corner sparkle whose raw NCC sits
below the 0.85 corner-promote gate, so `identify` read `unknown` on Gemini
images that v0.7.2 caught (reporter osachub: scale-48 sparkle on light
bedding -- true sparkle spatial 0.775 / grad 0.960 / fusion 0.676, but the
size-weighted argmax locked onto a decoy at spatial 0.628 / grad 0.036).
detect_watermark now keeps the top-K (_SELECT_TOPK=3) size-weighted
candidates (NMS-deduped) plus the corner-promote candidate, scores each by
full fusion (spatial+gradient+variance) via the extracted _grad_var_scores
helper, and selects the highest -- the gradient term lifts the true sparkle
over the decoy. Ranking by the SIZE-WEIGHTED score (not a raw-NCC argmax)
preserves tiny-patch suppression: a raw-NCC argmax re-admitted 16-18px
content false positives (14/65 doubao + 4/11 jimeng visible images). Top-K
adds zero flips on the doubao/jimeng corpora and leaves the 495-image Gemini
set unchanged (479 detected) while recovering the reporter's image at 0.676.
- _grad_var_scores: gradient/variance scoring factored out of detect_watermark
- confidence = best_fused (drop the duplicated fusion recompute)
- tests: rename test_promotion_is_what_rescues_it ->
test_size_weighted_search_alone_traps_on_the_decoy (corner-promote is no
longer the sole rescue path); add a deterministic regression test mirroring
the real spatial/grad signature
- docs: module-internals.md detector section + CLAUDE.md mechanism map
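The selection rule can be sketched as below, assuming each candidate already
carries its size-weighted score and its fused score. The `Candidate` type and
`select_best` are hypothetical, NMS de-duplication is omitted, and the example
scores other than the reporter's 0.676 are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

_SELECT_TOPK = 3  # value from the text above


@dataclass(frozen=True)
class Candidate:
    name: str
    size_weighted: float  # shortlist ranking score
    fused: float          # spatial+gradient+variance fusion, computed elsewhere


def select_best(
    candidates: Sequence[Candidate],
    corner_promote: Optional[Candidate] = None,
    k: int = _SELECT_TOPK,
) -> Optional[Candidate]:
    """Shortlist by size-weighted score, then pick the highest fused score."""
    shortlist = sorted(candidates, key=lambda c: c.size_weighted, reverse=True)[:k]
    if corner_promote is not None and corner_promote not in shortlist:
        shortlist.append(corner_promote)
    if not shortlist:
        return None
    return max(shortlist, key=lambda c: c.fused)
```

Because the shortlist is still ranked by the size-weighted score, a tiny patch
with a high fused score but a low size-weighted score never reaches the final
comparison.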
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_all_basic / test_all_visible_step_uses_registry asserted exit 0 but did
not patch is_available, so on CI (core+dev only, no gpu) they took the skip
branch and hit the new non-zero exit. They passed locally, where gpu is present.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Step 2 (invisible/SynthID) was skipped with a quiet inline warning and the
run still exited 0, so a missing [gpu] extra was mistaken for a clean result
(recurring #14/#47). Add a prominent end-of-run banner and a non-zero exit.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- AIGC: parse the bare ``AIGC{...}`` blob form (label glued to its JSON in a
JPEG APP segment near the JFIF header), and scan both raw-JSON forms in one
fall-through loop so a quoted ``"AIGC"`` later in an XMP packet no longer
shadows a real bare label earlier in the file (3 files read unknown before).
- Integrity clash rule 2: a camera device + an AI marker from the SAME C2PA
manifest (Google Pixel Magic Editor / Pixel Studio edit chain) is a legitimate
edit chain, not a contradiction. Fire only when the AI marker's source is
independent of the camera's manifest; pure cameras (Leica/Sony/Nikon) are
unaffected (2 Pixel files mis-flagged before).
- New c2pa_cloud_manifest detector: surface a C2PA 2.4 Durable Content
Credentials cloud-manifest reference (Adobe cai-manifests.adobe.com) as a
medium provenance signal when the embedded manifest is stripped. Provenance
only, never asserts is_ai (2 files read fully unknown before).
identify reuses its already-loaded scan head for the cloud check (no second
read). +7 tests; CLAUDE.md + README synced.
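The fall-through scan in the first bullet can be illustrated roughly as
follows. This is a sketch under assumptions: it works on already-decoded text
rather than raw bytes, `find_aigc_label` is a hypothetical name, and the
quoted form is assumed to look like `"AIGC": {...}`.

```python
import json
from typing import Optional

_DECODER = json.JSONDecoder()


def find_aigc_label(text: str) -> Optional[dict]:
    """Return the first AIGC JSON object, trying both forms at each hit."""
    pos = text.find("AIGC")
    while pos != -1:
        after = pos + len("AIGC")
        brace = None
        if text.startswith("{", after):
            brace = after  # bare form: label glued to its JSON
        elif pos > 0 and text[pos - 1] == '"' and text.startswith('"', after):
            rest = text[after + 1:].lstrip()
            if rest.startswith(":"):
                rest = rest[1:].lstrip()
                if rest.startswith("{"):
                    brace = len(text) - len(rest)  # quoted-key form
        if brace is not None:
            try:
                obj, _end = _DECODER.raw_decode(text, brace)
                if isinstance(obj, dict):
                    return obj
            except json.JSONDecodeError:
                pass  # malformed here: fall through to the next occurrence
        pos = text.find("AIGC", after)
    return None
```

Scanning occurrences in file order is what keeps a later quoted label from
shadowing an earlier bare one.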
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
actions/checkout@v4 ran on the deprecated Node 20; bump to v6 to match
test.yml/publish.yml. Document the dismissed Dependabot torch alert
(GHSA-rrmf-rvhw-rf47, not_used: no torch.jit usage, gpu-extra-only, no patch).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
distribute.yml fans a published GitHub Release out to the channels that
would otherwise be manual: it waits for the sdist on PyPI, bumps the
Homebrew formula (HOMEBREW_TAP_TOKEN) and factory-rebuilds the HF Space
(HF_TOKEN). PyPI stays on publish.yml; conda-forge on its autotick bot.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
From the HN front-page discussion (news.ycombinator.com/item?id=48200569):
- Threat model: drop the 'third-party classifiers' overclaim. State scope
honestly: it removes SynthID / visible marks / provenance metadata, does NOT
defeat trained AI-vs-real classifiers (Hive), and watermarks are a weak trust
signal to begin with.
- Replace the 'preserving art / historical record' use case (criticized as not
holding up) with the defensible one: clearing an overstated AI label from your
own lightly-AI-edited photo.
- Add a Limitations section: lossless visible/metadata vs lossy content-dependent
SynthID path, no local self-verify, large images not tiled yet, out-of-scope.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Research-informed metadata for organic dev discovery:
- pyproject: add a keywords field (was absent; biggest PyPI search gap) and
expand classifiers (audience, console, security, AI, utilities); rewrite the
summary noun-first, naming Nano Banana / SynthID / C2PA verbatim.
- README: add PyPI version, Python versions, downloads, and license badges.
GitHub topics (comfyui, watermark-remover) and the repo description were
updated out of band. PyPI metadata ships on the next release.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Nine findings from a high-effort project-wide review, fixed and verified
(571 passed, ruff/pyright clean):
Correctness:
- all/batch now remove Doubao/Jimeng/Samsung visible text marks: the visible
step routes through the registry (new cli._remove_visible_auto) instead of a
hardcoded GeminiEngine, so they no longer leave the wordmark intact.
- batch always reads the original source (dropped the out_path-reuse that
re-processed already-cleaned outputs on a re-run).
- img2img_runner only retries the diffusion call on the deprecated-callback
TypeError; any other TypeError now propagates instead of double-running.
- gemini detect/remove and the reverse-alpha engines normalize channels via a
new image_io.to_bgr, fixing a grayscale/BGRA crash in the FP-gate path.
- _png_late_metadata advances its cursor by the clamped length, so a malformed
chunk length no longer aborts the late AI-label scan.
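The clamped-cursor idea in the last bullet can be sketched as a generic PNG
chunk walk. This is not the body of _png_late_metadata (which is not shown
here); `iter_chunks` is a hypothetical helper that follows the standard
length/type/data/CRC chunk layout.

```python
import struct
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, payload), clamping any declared length to the bytes left."""
    if not data.startswith(PNG_SIGNATURE):
        return
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (declared,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        # Bytes available for the payload, leaving room for the 4-byte CRC.
        available = max(0, len(data) - (pos + 8) - 4)
        length = min(declared, available)
        yield ctype, data[pos + 8:pos + 8 + length]
        # Advance by the clamped length so a bad declared length cannot
        # jump past the end of the buffer or abort the walk.
        pos += 8 + length + 4
```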
Cleanup / efficiency:
- Consolidate the ~90%-identical Doubao/Jimeng/Samsung engines into a shared
config-driven _text_mark_engine.TextMarkEngine base; each engine is now a thin
subclass (TextMarkConfig + test shims). Behavior is byte-exact (the three
engine test suites pass unchanged). Registry adapters collapse to one
_text_mark(...) row each. Gemini stays a separate engine.
- scan_head is memoized per (path, size, mtime), so identify() reads the file
head once instead of ~8 times.
- invisible_engine post-processing decodes/encodes the output once (chained in
memory) instead of 2-4 times across stages.
- Remove the orphaned get_model_id_for_profile (+ CONTROLNET_PROFILE); derive
the --strength help from the strength constants (strength_default_help) so it
cannot drift; share the --pipeline/--strength click options; simplify the
retired --auto resolver.
Net -835 lines. Tests added for the registry-routed visible pass, to_bgr,
the polish/model/guidance wiring, and strength_default_help. CLAUDE.md updated
for the new base module, the engine/registry changes, image_io.to_bgr, and the
scan_head cache.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Overhaul the diffusion-removal surface around a single robust default and a
complete, consistent CLI.
Pipeline + strength:
- controlnet is now the DEFAULT pipeline (CLI --pipeline + both engine ctors).
With the certified higher strength it clears both photoreal and flat-graphic
content, whereas plain SDXL left SynthID on flat graphics.
- Rename the plain-SDXL profile `default` -> `sdxl`; "default" stays as a back-compat
alias (normalize_profile + a click callback that warns).
- Unify the strength ladder: resolve_strength applies ONE vendor-adaptive ladder
(the certified controlnet floors OpenAI 0.20 / Google 0.30 / unknown 0.30) to
both pipelines. sdxl is the weaker remover on its own hard case (flat fills),
so the certified floor is the right floor for it too.
CLI completeness:
- Add --model (HF model id) to invisible + batch (was only on all) and
--guidance-scale (CFG) to all three diffusion commands; both were library
knobs the CLI did not expose.
- Flip --adaptive-polish to ON by default (it self-gates to a no-op where there
is no detail deficit, so default-on is safe).
- Share --pipeline / --strength / --model / --guidance-scale as single
decorators so invisible/all/batch keep an identical surface; the --strength
help is derived from the strength constants (strength_default_help) so it can
never drift from the ladder.
Removals:
- Delete the auto_config content-detection planner + its YuNet/DBNet assets
(~2.6 MB): with controlnet always the pipeline and the polish self-gating, the
face/text/edge detection no longer changed behavior. --auto is now a deprecated
no-op that only warns (the polish it enabled is the default).
Docs (README, CLAUDE.md, docs/synthid.md) updated throughout; added an
InvisibleEngine Python API example. Tests cover the alias warnings, the
polish default, and the --model/--guidance-scale wiring.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The face-restore family was removed in 20d7eda, but the auto_config
module docstring still claimed "PhotoMaker face restoration is enabled
when a face is present" and the --auto help text (CLI + README example)
listed "face restore" as something --auto picks. A detected face now
only routes to the controlnet pipeline (canny preserves face STRUCTURE,
not identity); there is no identity restoration. Comments/docstrings/help
only, no code behavior change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BREAKING:
- Drop `--restore-faces` / `--restore-faces-method` CLI flags
- Drop `restore`, `photomaker`, `instantid` extras
- Drop `restore_faces` / `restore_faces_method` params from
InvisibleEngine.remove_watermark and AutoConfig
Rationale (full empirical record in
docs/synthid-robust-identity-research-2026-06-08.md "Empirical follow-up"):
every face-restore approach evaluated 2026-06-04 - 2026-06-08 (GFPGAN-on-
cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned
at three parameter sweeps) regenerates the face via SDXL diffusion --
output face pixels are diffusion-fresh, so the regenerated face inherits
SDXL's "clean skin" aesthetic and loses original identity precision. The
result looks MORE AI-generated than the cleaned image, not less. The
cleaned controlnet 0.20 image is the least-AI face state we can reach
without re-introducing SynthID.
License:
- MIT -> Apache 2.0 (Apache adds an explicit patent grant + trademark
clause; better fit with the upstream Apache projects this library
mirrors / depends on -- diffusers, transformers, controlnet-aux,
xinsir's controlnet weights)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace LICENSE with the canonical Apache License 2.0 text + a brief
copyright notice for "wiltodelta 2025-2026". Update pyproject.toml's
`license` field to "Apache-2.0" and the PyPI classifier to "Apache
Software License". Update README's License section to point at the
LICENSE file and name the copyright holder.
Why: Apache 2.0 gives downstream users an explicit patent grant and the
trademark-use clause, which MIT doesn't carry. It is also the more
common license among the upstream projects this library depends on /
mirrors (diffusers, transformers, controlnet-aux, xinsir's canny
controlnet weights), so contributions can flow either way without a
permission-shape mismatch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Empirical conclusion from the 2026-06-04 - 2026-06-08 Modal cert sweeps:
every face-restore approach we built (GFPGAN-on-cleaned, PhotoMaker-V2,
InstantID txt2img, InstantID img2img-on-cleaned at three parameter
settings) regenerates the face via SDXL diffusion rather than preserving
it. Output face pixels are diffusion-fresh, so the regenerated face
inherits SDXL's "clean skin" aesthetic and loses original identity
precision -- it looks MORE AI-generated than the cleaned image, not
less. The cleaned image from the main controlnet 0.20 removal pass is
the least-AI face state we can reach without re-introducing SynthID.
Nothing in the restore family achieves the actual goal (preserve the
original person's face). Keeping them around as opt-in invites users to
ship something that defeats the point. Removing entirely.
Library changes:
- Deleted src/remove_ai_watermarks/instantid_restore.py
- Deleted src/remove_ai_watermarks/photomaker_restore.py
- Deleted tests/test_instantid_restore.py
- Deleted tests/test_photomaker_restore.py
- Removed `instantid` and `photomaker` extras from pyproject.toml
- Removed `[tool.hatch.metadata] allow-direct-references = true` (was
only needed for the photomaker git+ URL)
- InvisibleEngine.remove_watermark: dropped `restore_faces` +
`restore_faces_method` params, removed both `_restore_faces_instantid`
and `_restore_faces_photomaker` private methods, removed dispatch
- CLI: dropped `_restore_faces_options` decorator, all four cmd_*
signatures lose `restore_faces` + `restore_faces_method`, kwarg passes
to remove_watermark dropped
- _apply_auto: dropped `restore_faces` from tuple shape (was unused after
the engine no longer takes it)
- auto_config.AutoConfig: dropped `restore_faces` field; `plan()` no
longer sets it; `reason` no longer mentions it
- Tests updated accordingly (test_auto_config.TestReason no longer asserts
"face-restore on" in the reason string)
Docs updated:
- CLAUDE.md: removed the photomaker extras bullet, the Face restore
trade-off bullet, the instantid_restore.py + photomaker_restore.py
module bullets; replaced restore mentions in watermark_remover and
controlnet bullets and prod recipe with the empirical conclusion
- README.md: removed both `--restore-faces` callouts and the install
snippet; the feature bullet and auto-mode comment updated
- docs/synthid-robust-identity-research.md: added Status-retired notice
at the top pointing at the 2026-06-08 followup
raiw-app:
- modal_cert.py: dropped `--restore-faces` flag entirely; sweep() no
longer takes restore_faces; pinned _LIB_SPEC to `[gpu]` extras (no
`photomaker` / `instantid` extras), points at main
ruff + strict pyright clean; 569 tests pass; 18 restore-specific tests
gone.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Empirical conclusion from the 2026-06-04 - 2026-06-08 cert sweeps:
every shipped face-restore method (GFPGAN-on-cleaned, PhotoMaker-V2,
InstantID txt2img, InstantID img2img-on-cleaned at three parameter
settings) regenerates the face from an ArcFace embedding via SDXL
diffusion. Output face pixels are diffusion-fresh, which makes the
regenerated face look MORE AI-generated than the cleaned image (gloss,
symmetric pores, SDXL "clean skin" aesthetic) regardless of license.
The cleaned image from the main controlnet 0.20 removal pass is the
LEAST-AI state we can reach without re-introducing SynthID; any restore
on top trades original-look for embedding-driven regeneration. The
fundamental issue is structural: ArcFace encodes "general look" at 512
dimensions, SDXL decodes that into pixels with the inherent SDXL
aesthetic. A stronger identity push (higher strength + IP-Adapter scale)
makes the face closer to the embedding but more AI-looking; a weaker push
lets identity drift further. No parameter setting both recovers original
identity AND looks less AI than the cleaned image.
Production conclusion: do not ship `--restore-faces` in any monetized
deployment. The extras (`instantid`, `photomaker`) stay in the library
for research / personal use where users explicitly want regeneration.
Documented at every entry point:
- CLAUDE.md: new "Face restore trade-off" bullet + every restore mention
rewritten to "REGENERATES, does NOT recover"; controlnet bullet updated
- README.md: feature bullet + callout + secondary mention all updated
- docs/synthid-robust-identity-research-2026-06-08.md: appended
"Empirical follow-up" section documenting the InstantID sweep phases
(Phase 1 txt2img v1/v2/v3, Phase 2 img2img defaults + stronger params)
- docs/controlnet-removal-pipeline-research.md: updated restore-faces
bullet to reflect the empirical conclusion
- CLI help: `_restore_faces_options` docstring + `--restore-faces` /
`--restore-faces-method` help text all updated
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First img2img cert sweep: scene/lighting integration was excellent on both
single (tatsunari) and group (gemini_3) photos, but the regenerated faces
were "recognizable similar people" rather than the original individuals.
The cleaned face crop (which has already drifted from original through the
main controlnet 0.20 removal pass) was competing as a structural prior;
at the previous parameter settings InstantID's ArcFace branch couldn't
dominate it.
Push the identity signal:
- `ip_adapter_scale`: 0.8 -> 1.0 at load time (full IP-Adapter strength)
- `controlnet_conditioning_scale`: 0.8 -> 1.0 default (landmark anchor)
- `img2img_strength`: 0.55 -> 0.7 default (more denoise, less cleaned
structure survives, more room for the diffusion to render ArcFace)
The cleaned image already passed the SynthID oracle, so the absolute floor
on strength is "any positive value" -- raising it only increases the
freedom of the diffusion to inject identity (SynthID-safety isn't reduced
by higher strength, because the noise injection only destroys more of the
input pixels).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The img2img run silently produced a pass-through output (the cleaned input,
unchanged) because DiffusionPipeline.from_pretrained refused to load the local
custom_pipeline .py without `trust_remote_code=True` (it emits a single-line
warning to stderr, then falls back to a default class).
load_ip_adapter_instantid then raised AttributeError, our outer except logged
it and skipped the restore, and the saved file was the un-restored cleaned
image (exact byte-size match against the no-restore baseline -- 250988 bytes).
We fetch the file from a pinned raw.githubusercontent.com URL we control, so
opting in to trust_remote_code is safe here.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The txt2img architecture (generate face from scratch in a fresh 1024 scene)
fundamentally couldn't fix multi-face patchwork: each face was a studio
portrait that didn't belong in the surrounding scene (wrong lighting,
frontal pose, neutral expression vs the original group photo's varied
angles and smiles). Tight crop + elliptical alpha + color match smoothed
the seams but didn't make the faces look like they were SHOT in the scene.
Replacing with img2img-on-cleaned: feed the CLEANED face crop as the img2img
source, so the diffusion sees the actual scene context (shoulders, hair
edges, lighting direction, shadows) and harmonises the regenerated face
with it. Identity still flows through the ArcFace embedding (from original)
+ landmark ControlNet (kps from original) -- both semantic / pure geometry,
neither carries pixels.
SynthID safety preserved by construction:
- img2img source pixels = cleaned crop = already oracle-verified clean
- ArcFace embedding = 512-d semantic vector from original, no pixel content
- Landmark stick figure = colour-coded geometry, no source pixels
- img2img noise injection at strength 0.55 destroys any residual high-freq
pattern in the cleaned crop
- Pipeline is the upstream StableDiffusionXLInstantIDImg2ImgPipeline,
inherits from StableDiffusionXLControlNetImg2ImgPipeline; we still patch
check_inputs to neutralise the same diffusers-0.38 positional shift the
txt2img variant had
Implementation:
- New _fetch_img2img_pipeline_file() caches the upstream pipeline file from
GitHub raw on first use (not on PyPI / HF Hub, has to be downloaded
separately)
- _get_pipeline() now loads StableDiffusionXLInstantIDImg2ImgPipeline via
custom_pipeline=<cached path>
- restore_faces_instantid() crops the SAME bbox from both original and
cleaned, runs InsightFace on original (sharper embedding), feeds cleaned
crop as img2img source, ArcFace+landmark as conditioning
- New img2img_strength=0.55 parameter (txt2img mode had no strength knob)
- Composite path unchanged (elliptical alpha + color_match)
- 9 control-flow tests still pass (the mock pipe call shape change is
absorbed by the kwargs-only fake)
Cert sweep will validate on tatsunari (single) first per user request.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit added `_color_match`, which shifts the face crop's mean to the
canvas mean -- the old test fed a uniform face (210) into a uniform cleaned
canvas (90), so after color-match the face was uniform 90 and the
composite was undetectable by value. Switched the fake pipeline to a
gradient face so the color-match preserves variance, and the assertion
now checks that the face region has non-zero std (composite injected
gradient pixels) instead of a value threshold.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second multi-face iteration. v1-rect: full-1024 frame + Gaussian rectangle ->
patchwork. v2-ellipse: tight crop + ellipse 0.45*bw x 0.55*bh -> ellipse
exceeds bbox vertically and clips forehead/chin on single portrait, plus
group-photo faces visibly drift cooler than the warm bar background. v3:
1. **Smaller ellipse axes**: 0.32*bw x 0.42*bh. Both fit inside the bbox (since
axes are radii from center, 0.32*bw extends 0.64*bw total width and
0.42*bh extends 0.84*bh total height) so no chin/forehead clip even on
non-square boxes. Face shape: vertically elongated (0.42 vs 0.32),
matching real face geometry.
2. **Wider feather**: `min(bw, bh) // 5` instead of // 8. Edges fade over a
wider band so the elliptical seam is less visible.
3. **Per-channel mean color match** (`_color_match`): before compositing,
shift the regenerated face's mean BGR to match the cleaned canvas region
where it lands. Each InstantID generation has independent SDXL noise so
white balance drifts -- matching means equalises tone (warm bar / cool
face -> warm face) without rescaling contrast.
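Two of the points above can be checked with a small pure-Python sketch: the
ellipse-fit arithmetic and a mean-only colour shift. The helper names are
hypothetical, pixels are modelled as plain (B, G, R) tuples rather than
arrays, and this is not the `_color_match` implementation itself.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (B, G, R)


def ellipse_fits_bbox(ax_frac: float, ay_frac: float) -> bool:
    """Semi-axes are radii, so the full extent is twice each fraction."""
    return 2 * ax_frac <= 1.0 and 2 * ay_frac <= 1.0


def channel_means(pixels: Sequence[Pixel]) -> Tuple[float, float, float]:
    if not pixels:
        raise ValueError("need at least one pixel")
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_match(face: Sequence[Pixel], canvas: Sequence[Pixel]) -> List[Pixel]:
    """Shift each channel of `face` so its mean matches `canvas`; no rescale."""
    fm, cm = channel_means(face), channel_means(canvas)
    shift = [c - f for c, f in zip(cm, fm)]
    return [
        tuple(min(255, max(0, round(p[c] + shift[c]))) for c in range(3))
        for p in face
    ]
```

Because only the mean moves, the spread between face pixels is preserved
unless clamping at 0 or 255 kicks in.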
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Group-photo cert sweep last round produced the same "patchwork quilt" failure
mode as PhotoMaker-V2: each face is regenerated as a fresh 1024x1024 SCENE
(face + background + lighting), then composited as a Gaussian-feathered
RECTANGLE into the 2x square box around the original face. The rectangle's
corners carry regenerated background pixels with different colors / textures
per face, and the rectangular Gaussian feather lets them bleed into the
cleaned image -- 9 face renders with 9 different backgrounds -> patchwork.
Two changes, both surgical:
1. **Tight-crop the regenerated face before composite.** After generation,
run YuNet again on the 1024 frame to find where the face actually landed,
then crop tightly around it (matching the 2x padding our input crop uses
so the face fills its natural slot). Drops the regenerated background's
peripheral pixels.
2. **Elliptical composite alpha** (`_composite_faces_elliptical`). Instead of
reusing photomaker_restore's rectangular Gaussian alpha, inscribe an
ellipse in each face bbox (axes ~0.45*bw x 0.55*bh so the feather edge
tapers cleanly inside the rectangle, head-silhouette shape), feather only
the ellipse edge. Bbox corners (regenerated scene context) end up at
alpha=0 and the cleaned-canvas pixels there stay intact. Only the head
region is replaced.
Net result: faces stay identity-restored (semantic ArcFace + landmark control
still drives generation) but the canvas around each face is the cleaned
image, not a regenerated frame. No more multi-face patchwork.
Single-portrait case unchanged: there's one face to composite and the cleaned
canvas around it is mostly the background that was already there.
All 9 InstantID control-flow tests still pass (the mock face analyser
responds to both .get() calls with the same fake bbox, so the new
generated-image YuNet step is exercised end-to-end).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two compat bugs caught by the Modal cert sweep, both rooted in diffusers
0.38 vs InstantID's community pipeline expectations:
1. **Positional check_inputs misalignment.** InstantID's __call__ calls
`self.check_inputs(...)` POSITIONALLY using the parent's ~v0.29 signature.
Diffusers 0.38 added two new parameters BEFORE `controlnet_conditioning_scale`
in the parent's signature (`ip_adapter_image`, `ip_adapter_image_embeds`),
which shifts every positional arg by two slots. The argument that lands in
the parent's `controlnet_conditioning_scale` slot is actually InstantID's
`control_guidance_end` -- which a few lines earlier was converted to `[1.0]`
(a list) by InstantID's auto-broadcasting for the single-controlnet case.
The parent's check then trips on `not isinstance([1.0], float)` -> TypeError.
Our inputs are programmatic and validated by our own callers, so neutralising
the check with `pipe.check_inputs = lambda *a, **k: None` after load is safe.
This is the standard workaround community ComfyUI ports use for the same
compat break.
2. **`ip_adapter_scale` was passed at call time and silently ignored.** It's not
in `StableDiffusionXLInstantIDPipeline.__call__`'s signature -- the upstream
API sets the IP-Adapter weight on the ArcFace cross-attention branch at LOAD
time via `load_ip_adapter_instantid(scale=...)`. Moved the 0.8 default there,
dropped the call-time kwarg.
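The positional shift in item 1 can be reproduced with a toy class. Everything
below is a stand-in: `ToyPipeline` and its five-argument call are hypothetical
and much shorter than the real diffusers / InstantID signatures; only the
mechanism (two parameters inserted ahead of `controlnet_conditioning_scale`,
caller still passing by position) follows the description above.

```python
class ToyPipeline:
    """Toy stand-in: a validator whose signature gained two parameters."""

    def check_inputs(self, prompt, image,
                     ip_adapter_image=None, ip_adapter_image_embeds=None,
                     controlnet_conditioning_scale=1.0,
                     control_guidance_start=0.0, control_guidance_end=1.0):
        if not isinstance(controlnet_conditioning_scale, float):
            raise TypeError("controlnet_conditioning_scale must be a float")

    def __call__(self):
        # Caller written against the OLD order:
        # (prompt, image, scale, guidance_start, guidance_end).
        # Under the new signature the fifth value, [1.0], lands in the
        # controlnet_conditioning_scale slot.
        return self.check_inputs("a prompt", "an image", 0.8, [0.0], [1.0])


if __name__ == "__main__":
    pipe = ToyPipeline()
    try:
        pipe()
    except TypeError as e:
        print("before workaround:", type(e).__name__, e)
    # Instance-level override: replaces the validator for this object only.
    pipe.check_inputs = lambda *a, **k: None
    print("after workaround:", pipe())
```

Assigning the lambda on the instance shadows the class method, so
`self.check_inputs(...)` inside `__call__` resolves to the no-op.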
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
InsightFace's built-in auto-download for the antelopev2 model pack
(github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip)
has been broken since at least 2024 (upstream issues #2517, #2766, called
out in InstantID's README: "manually download via this URL to models/
antelopev2 as the default link is invalid").
When the .onnx files aren't in place, FaceAnalysis.prepare() raises
`assert 'detection' in self.models` -- which is exactly what our Modal
cert sweep hit on the first real run.
Fix: a tiny pre-flight `_ensure_antelopev2()` that pulls the five expected
.onnx files (1k3d68, 2d106det, genderage, glintr100, scrfd_10g_bnkps) from
the HuggingFace mirror `kidyu/antelopev2-for-InstantID-ComfyUI` into
./models/antelopev2/ before FaceAnalysis is instantiated. Idempotent
(skips files that already exist); uses huggingface_hub's cache for free
caching on the Modal volume.
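A sketch of what such an idempotent pre-flight could look like. It is not the
repo's `_ensure_antelopev2()`: the injectable `download` callable is added
here so the skip logic can be exercised offline, and the default downloader
assumes the five files sit at the root of the mirror repo, which has not been
checked.

```python
from pathlib import Path
from typing import Callable, List, Optional

REPO_ID = "kidyu/antelopev2-for-InstantID-ComfyUI"
ANTELOPEV2_FILES = (
    "1k3d68.onnx",
    "2d106det.onnx",
    "genderage.onnx",
    "glintr100.onnx",
    "scrfd_10g_bnkps.onnx",
)


def _hf_download(filename: str, target_dir: Path) -> None:
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # Assumption: the file lives at the repo root under this exact name.
    hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=str(target_dir))


def ensure_antelopev2(root: str = "./models/antelopev2",
                      download: Optional[Callable[[str, Path], None]] = None) -> List[str]:
    """Fetch any missing antelopev2 .onnx files; return the names fetched."""
    target = Path(root)
    target.mkdir(parents=True, exist_ok=True)
    fetch = download or _hf_download
    fetched = []
    for name in ANTELOPEV2_FILES:
        if (target / name).exists():
            continue  # idempotent: never re-fetch a file already in place
        fetch(name, target)
        fetched.append(name)
    return fetched
```

Calling it before `FaceAnalysis` is constructed means a second run on a warm
volume does no network work at all.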
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The InstantID cert sweep emitted `restore_faces post-pass failed ()` -- the
exception's str() was empty so the log line told us nothing about what
actually failed. Added `exc_info=True` plus `type(e).__name__` to the log
call so the full traceback and exception class land in the log even when the
message is empty.
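For illustration, a small self-contained version of the logging pattern. The
logger name and the `run_post_pass` wrapper are made up for this sketch; the
message text mirrors the log line quoted in this history.

```python
import logging

logger = logging.getLogger("restore_faces_demo")


def run_post_pass(step) -> bool:
    """Run step(); on failure log the exception class and traceback, return False."""
    try:
        step()
        return True
    except Exception as e:
        # type(e).__name__ survives even when str(e) is empty (e.g. a bare
        # assert); exc_info=True appends the full traceback to the record.
        logger.warning(
            "restore_faces post-pass failed (%s: %s); keeping un-restored output",
            type(e).__name__, e, exc_info=True,
        )
        return False
```

A bare `AssertionError()` has an empty `str()`, which is the case that made
the original log line read `failed ()`.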
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The InstantID restore module imported `_get_yunet` from `auto_config`, but
auto_config doesn't export that function -- the YuNet singleton lives inline
inside `detect_face()`. Caught by the Modal cert sweep:
restore_faces post-pass failed (cannot import name '_get_yunet' from
'remove_ai_watermarks.auto_config'); keeping un-restored output
Inline the YuNet builder the same way `photomaker_restore` does (read
`auto_config._FACE_SCORE` and the bundled `face_detection_yunet_2023mar.onnx`
asset, build a fresh `FaceDetectorYN` per call). This is the proven pattern
from PhotoMaker and avoids a private-API drift between the modules.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the 2026-06-08 deep-research synthesis (docs/synthid-robust-identity-
research-2026-06-08.md), the entire ArcFace-class identity-adapter ecosystem
for SDXL is blocked from commercial use by InsightFace's non-commercial model
packs (antelopev2 / buffalo_l). No commercial-safe ArcFace-grade identity
stack exists today. The user explicitly opted into shipping a non-commercial
restore path (research / personal use; raiw.cc must NOT install the extra).
Architectural choice: InstantID over PhotoMaker-V2 as the default.
- PhotoMaker-V2 (CLIP+ArcFace dual encoder, txt2img only): documented upstream
identity drift on Asian male faces, visually confirmed in our cert sweep
(tatsunari rendered as a generic woman; group photo collapsed into a
patchwork).
- InstantID (ArcFace cross-attention + landmark ControlNet): a semantic
identity branch plus weak spatial landmark control, decoupled. Per the
InstantID paper (arXiv:2401.07519) and the research report, stronger identity fidelity
on single portraits. Critically: NO original face pixels enter the diffusion
(ArcFace embedding is semantic, landmark stick figure is pure geometry), so
SynthID is not transported.
Implementation:
- New `src/remove_ai_watermarks/instantid_restore.py` mirrors the
`photomaker_restore.py` shape (lazy singletons for pipeline + FaceAnalysis,
per-face crop + _composite_faces from photomaker_restore). Loads the
InstantID community pipeline via `DiffusionPipeline.from_pretrained(
custom_pipeline="pipeline_stable_diffusion_xl_instantid")` -- no upstream
Python package needed; diffusers fetches the file from its community
examples.
- New `instantid` extra in pyproject (insightface + onnxruntime +
huggingface-hub). NON-COMMERCIAL block in the comment explains why.
- CLI: `--restore-faces-method [instantid|photomaker]`, default `instantid`.
Both methods explicitly labeled NON-COMMERCIAL in the help text.
- Engine: dispatch on `restore_faces_method` to either
`_restore_faces_instantid` or `_restore_faces_photomaker`.
- 9 control-flow tests for InstantID without model download (mirror the
photomaker_restore.py test pattern + draw_kps helper checks). 587/587 pass.
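The engine dispatch can be pictured roughly as below. Only the method names
`instantid` / `photomaker`, the `restore_faces_method` parameter and the two
`_restore_faces_*` function names come from this commit; the function bodies
are placeholders and the dict-based lookup is an assumed shape, not
necessarily how the engine is written.

```python
from typing import Any, Callable, Dict


def _restore_faces_instantid(image: Any) -> Any:
    # Placeholder body: the real function would run the InstantID pipeline.
    return ("instantid", image)


def _restore_faces_photomaker(image: Any) -> Any:
    # Placeholder body: the real function would run PhotoMaker-V2.
    return ("photomaker", image)


_RESTORERS: Dict[str, Callable[[Any], Any]] = {
    "instantid": _restore_faces_instantid,
    "photomaker": _restore_faces_photomaker,
}


def restore_faces(image: Any, restore_faces_method: str = "instantid") -> Any:
    """Route to the selected restore backend; reject unknown method names."""
    try:
        restorer = _RESTORERS[restore_faces_method]
    except KeyError:
        valid = ", ".join(sorted(_RESTORERS))
        raise ValueError(
            f"unknown restore_faces_method {restore_faces_method!r}; expected one of: {valid}"
        ) from None
    return restorer(image)
```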
Diffusers-0.38 compat verified by upstream code inspection: the InstantID
pipeline inherits from `StableDiffusionXLControlNetPipeline`, uses only
public diffusers APIs (`encode_prompt`, `prepare_image`, `prepare_latents`,
`get_guidance_scale_embedding`), and uses the legacy attention processor API,
which diffusers preserves for backward compat. No PhotoMaker-V1-style internal
text_encoder access. End-to-end execution will be validated by the Modal
cert sweep in the next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 9-face grid + single-face cert outputs were still a mosaic of training-time
faces even after the id_embeds shape fix. A WebFetch of the upstream
inference_pmv2.py revealed three mismatches:
1. SDXL at width=height=512 falls into its low-res failure mode (small-detail
collage / mosaic) on the V2 LoRA. Render at native 1024 then downscale into
the original face bbox at composite time.
2. Upstream prompt is descriptive ("instagram photo, portrait photo of a woman
img, colorful, perfect face, natural skin, hard shadows, film grain, best
quality"). Our generic prompt let SDXL drift away from the ID embedding.
Adopted the upstream pattern.
3. Upstream V2 explicitly passes negative_prompt; the CFG batch-mismatch we hit
on V1 isn't a V2 issue. Re-added negative_prompt with the upstream wording
(asymmetry/worst quality/etc).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>