remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-07-30 11:18:49 +02:00

Author	SHA1	Message	Date
Victor Kuznetsov and Claude Opus 4.8	439eeadc07	refactor(face-restore): wipe GFPGAN path, --restore-faces is PhotoMaker-only The GFPGAN `restore` extra and its `face_restore.py` module are gone. They were oracle-confirmed to re-introduce SynthID by blending watermarked original face pixels at fidelity weight 0.5 (clean A/B: gemini_3 controlnet 0.20 detected WITH GFPGAN, clean WITHOUT). Keeping them as the default restore method was a footgun for the removal pipeline. PhotoMaker-V2 (added in the previous commit) is the single shipped restore path now -- identity-as-embedding, SynthID-safe by construction. Removed: - src/remove_ai_watermarks/face_restore.py + tests/test_face_restore.py - pyproject.toml `restore` extra (gfpgan/facexlib/basicsr + scipy/numba pins) - pyproject.toml `[tool.uv.extra-build-dependencies] basicsr = [...]` build pin - CLI: `--restore-faces-method` and `--restore-faces-weight` (no method choice to make, no GFPGAN weight knob to expose) - InvisibleEngine._restore_faces method (only _restore_faces_photomaker remains) - All restore-faces-method / restore-faces-weight threading through cmd_* signatures and _process_batch_image Kept: - `--restore-faces / --no-restore-faces`: now binds to PhotoMaker-V2. - All adopted oracle findings about GFPGAN re-introducing SynthID (kept in the research docs as historical context that explains why the path was removed). Docs updated: CLAUDE.md (restore extras bullet collapsed to photomaker, removed face_restore Key-modules bullet, several inline GFPGAN refs scrubbed), README.md (face-identity callout + install section now point to the photomaker extra), docs/synthid.md 5.5 (net recipe), docs/controlnet-removal-pipeline-research.md (recommendations). ruff + strict pyright (src/) clean; 578 tests pass (the 9 GFPGAN tests are gone, the 9 PhotoMaker tests stay green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 15:35:37 -07:00
Victor Kuznetsov and Claude Opus 4.8	f8f247308b	docs(identity): smoke test confirms OpenCLIP embedding is invariant to SynthID-magnitude noise Empirical confirmation of the load-bearing assumption in the PhotoMaker-V2 path: the identity embedding cannot transport an invisible pixel watermark. Tested OpenCLIP-ViT-H/14 (laion2B-s32B-b79K — the same encoder PhotoMaker-V2 fine-tunes) on 31 face crops from gemini_3/gemini_4/openai_3 grid. cosine similarity between embed(orig) and embed(perturbed): - synthid_proxy (±2 LSB low-frequency noise, the regime SynthID actually lives in): mean 0.9977, min 0.9937. Embedding moves by 0.002 — an order of magnitude less than JPEG90 (mean 0.928), which SynthID survives at >=99% TPR by design. - noise3 / jpeg70 / blur1: 0.89-0.95, all clearly above the SynthID floor. - self check: 1.0000 (pipeline sane). So the embedder discards exactly the dimensions SynthID hides in. PhotoMaker-V2 conditioned on a watermarked face will see the same identity vector as a clean face of that person, so the generated face inherits identity, not the watermark. This unblocks step 2 of the research plan: prototype PhotoMaker-V2 in the controlnet pipeline. The previously logged ad-hoc "cos(orig, SDXL-cleaned)" numbers (0.56-0.93) measured diffusion drift, not watermark invariance, and are not relevant to the hypothesis. Docs only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 15:05:15 -07:00
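The smoke test in f8f247308b compares embed(orig) with embed(perturbed) by cosine similarity and reports the mean and minimum. A minimal stdlib sketch of that measurement follows. The names `cosine_similarity` and `invariance_stats` are hypothetical, and `embed` and `perturb` are caller-supplied callables standing in for the OpenCLIP encoder and the noise/JPEG/blur perturbations; none of this is taken from the repository.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def invariance_stats(
    embed: Callable[[object], Vector],    # assumption: image crop -> embedding
    perturb: Callable[[object], object],  # assumption: crop -> perturbed crop
    crops: Sequence[object],
) -> Tuple[float, float]:
    """Return (mean, min) cosine similarity between original and perturbed crops."""
    sims = [cosine_similarity(embed(c), embed(perturb(c))) for c in crops]
    return sum(sims) / len(sims), min(sims)
```

With an identity perturbation this reproduces the "self check" row: every similarity is 1.0 up to floating-point error.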
Victor Kuznetsov and Claude Opus 4.8	310ce912ba	docs: SynthID-robust identity research — PhotoMaker-V2 is the only commercial-safe SDXL stack After GFPGAN restore was oracle-confirmed to RE-INTRODUCE SynthID (it is a fidelity-restoration net conditioned on the watermarked input), the only identity path that will not transport the watermark is identity-by-EMBEDDING: a semantic vector that conditions a fresh generation. That requires a face-recognition / ArcFace-class or CLIP-image embedder. Verified the license stack of every credible 2025-2026 SDXL identity adapter by fetching primary sources directly (HuggingFace model cards, insightface.ai): - IP-Adapter FaceID family, InstantID, PuLID, Arc2Face -> all blocked. Each depends at runtime on InsightFace's antelopev2/buffalo_l ArcFace packs, and insightface.ai explicitly states "Code is MIT licensed; models require separate commercial licensing." IP-Adapter FaceID's own model card flags itself non-commercial for the same reason. - PhotoMaker-V2 is the single commercial-safe end-to-end stack today: Apache-2.0 adapter weights with identity encoded as a fine-tuned OpenCLIP-ViT-H/14 (the model card's exact phrase: "id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers"). No InsightFace. Mechanistic argument that an identity embedding cannot transport SynthID: the embedder is trained to be invariant to low-amplitude pixel changes (JPEG, resize, brightness, noise), which is exactly the regime SynthID hides in by design. So the embedding extracted from a watermarked face should be ~identical to the embedding from the cleaned face, and the embedding cannot carry the watermark into a freshly generated face. Flagged explicitly as not-yet-measured -- the first integration step is a cosine-similarity smoke test (no codegen) before investing in a PhotoMaker prototype.
Process note: the deep-research harness was run but its verifier subagents failed to call StructuredOutput (same harness bug as a prior session), so its synthesis was unusable; the license claims here are direct quotes from the primary sources, fetched and verified, not from the workflow synthesis. Docs only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 14:58:11 -07:00
Victor Kuznetsov and Claude Opus 4.8	be14eca207	docs: certified controlnet strength floors from the Modal GPU oracle sweep Ran the isolated raiw-controlnet-cert Modal app (raiw-app/modal_cert.py) over a strength x seed grid, restore OFF, --max-resolution 1536, each vendor checked on its OWN oracle (OpenAI -> openai.com/verify, Gemini -> the Gemini app). Certified controlnet SynthID-removal floors: - OpenAI 0.20: 2 photoreal images (9-face grid + bracelet) x seed {1,2,3} = 6/6 clean; the bracelet that flipped at 0.15 is seed-robust at 0.20. Transfers to prod (OpenAI removal is resolution-independent). - Gemini 0.30: 0.20 detected -> 0.30 clean on 2/2 seeds (hardest face). Holds only at <= 1536; Gemini is resolution-sensitive and raiw.cc runs NATIVE, so cap Gemini <= 1536 + use 0.30, or native-calibrate (~0.35+). Prod recipe recorded: controlnet + a controlnet-specific per-vendor schedule in resolve_strength (OpenAI 0.20 / Gemini 0.30, NOT the default 0.10/0.15 ladder) + FIXED prod seed (kills the near-threshold non-determinism) + restore reworked/off. Added to docs/controlnet-removal-pipeline-research.md (certified floors table), docs/synthid.md 5.5, and the CLAUDE.md controlnet bullet. Docs only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 12:44:56 -07:00
Victor Kuznetsov and Claude Opus 4.8	d38b9a6122	docs: correct controlnet/restore SynthID-removal claims from the 2026-06-04 oracle pass Oracle validation (openai.com/verify + the Gemini app) overturned three claims that were on main, and consolidates the controlnet findings into one authoritative place. - controlnet does NOT reliably remove SynthID at the low vendor-adaptive strength: removal is content x pipeline dependent and the survivors FLIP by content type (photoreal survives controlnet / clears default; flat graphic survives default / clears controlnet; flat text clears both). Root cause is insufficient strength, not the pipeline; controlnet needs a higher, per-vendor floor than default. - removal near the threshold is SEED-non-deterministic (same image+pipeline+strength can pass or fail run-to-run); a single clean run does not certify a strength. - `--restore-faces` RE-INTRODUCES SynthID: GFPGAN runs on the ORIGINAL watermarked face at weight 0.5 and composites it back over the cleaned result (clean A/B: a Gemini face stayed detected through controlnet 0.15/0.20/0.25 WITH restore, cleared at 0.20 with --no-restore-faces). The old "GFPGAN scrubs SynthID" claim was wrong. Corrected in CLAUDE.md (watermark_remover controlnet bullet, controlnet Known-limitations bullet, face_restore bullet, vendor-adaptive strength bullet) and docs/synthid.md (5.1 controlnet/face-identity, 5.2 strength floors, new 5.5 oracle validation log). docs/controlnet-removal-pipeline-research.md gains an authoritative "Oracle validation 2026-06-04" section that the others point to as the single source. Docs only; no code change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 12:22:43 -07:00
Victor Kuznetsov and Claude Opus 4.8	8523f48fb6	data(corpus): archive June 2026 SynthID strength-study subjects Back docs/synthid.md section 2.2 with the actual test set: the per-image oracle-verified subjects were only in a local working dir, while the doc claimed they were recorded in data/synthid_corpus/. Ingest the key pos+cleaned pairs so the claim holds. - pos: openai_1/2/3 originals (gpt-image, openai-verify) + gemini_1/2/3/4 originals (Gemini app, gemini-app); all probe as C2PA-SynthID present. - cleaned: OpenAI at strength 0.05 (openai_2 only s010 captured) + Gemini at 0.15 --max-resolution 1536; oracle: SynthID NOT detected. Metadata stripped, so no C2PA on the cleaned rows. - Excluded the third-party issue #14 image (pic3): oracle-verified but not committed to the public corpus. - docs/synthid.md 2.2: state OpenAI n=4 = 3 archived + 1 external-only. - CLAUDE.md: drop the drift-prone "~65 MB" corpus size from the sdist note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 17:09:58 -07:00
Victor Kuznetsov and Claude Opus 4.8	5ec8269949	chore: mark controlnet pipeline + GFPGAN restore-faces as experimental Both content-preservation features are now flagged EXPERIMENTAL and opt-in. --pipeline controlnet was already opt-in (default=default); --restore-faces flips from on-by-default to OFF by default, matching the repo's prior pattern for experimental preservation passes (the removed protect_text/protect_faces). - cli.py: --restore-faces/--no-restore-faces default False; EXPERIMENTAL in the --restore-faces / --controlnet-scale / --pipeline help; batch default False. - invisible_engine.py: remove_watermark restore_faces default False + docstring. - CLAUDE.md / README.md / docs/synthid.md: label both experimental/opt-in. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 16:59:28 -07:00
Victor Kuznetsov and Claude Opus 4.8	411ef16ec3	feat: GFPGAN face-identity restoration post-pass Add an optional, commercial-safe face-restoration post-pass that recovers face identity the diffusion removal pass drifts (canny holds structure, not likeness) while still scrubbing the pixel watermark in the face regions. - face_restore.py: GFPGANer singleton (CPU unless CUDA), the basicsr torchvision.transforms.functional_tensor shim, and the pure feather _composite_faces helper (unit-tested without the model). GFPGAN re-synthesizes each face from a StyleGAN2 prior, so composited face pixels are GAN-generated (no watermark, no pixel-copy) -- oracle-clean at weight 0.5 with identity preserved. - InvisibleEngine.remove_watermark: restore_faces / restore_faces_weight, best-effort, auto-skips when the extra is absent or no face is detected. - CLI --restore-faces/--no-restore-faces + --restore-faces-weight on invisible/all/batch (on by default). - restore extra (gfpgan/facexlib/basicsr), numpy<2-pinned (scipy<1.18, numba<0.60) and kept out of `all`; basicsr needs Python <3.13 + setuptools<69 to build, so pin .python-version 3.12. Commercial-safe: GFPGAN Apache-2.0, RetinaFace MIT. The CodeFormer alternative is non-commercial and is not shipped. The earlier IP-Adapter FaceID layer was removed (footgun: needs high strength, corrupts faces at the low removal strength). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 16:59:28 -07:00
Victor Kuznetsov and Claude Opus 4.8	d90d5d886a	feat: controlnet pipeline for text/face-structure preservation Add `--pipeline controlnet` (SDXL base + xinsir canny ControlNet via StableDiffusionXLControlNetImg2ImgPipeline): the canny edge map conditions the img2img regeneration so text and face STRUCTURE stay sharp, while the watermark is still removed by the regeneration (`strength`) -- no original pixels are copied or frozen, so SynthID does not survive. Oracle-verified clean on OpenAI with better text/structure fidelity than plain img2img at equal strength. `--controlnet-scale` tunes structure preservation; fp32 on mps/cpu (fp16-fixed VAE on cuda/xpu). Shares the img2img runner (live progress + MPS->CPU fallback) and the fp16-VAE-fix / device-move helpers with the default pipeline. Remove the superseded subsystems -- ctrlregen (SD1.5 clean-noise), text-protection (differential / region-hires) and face-protection: they either destroyed real content or shielded the watermark by re-using original pixels. controlnet replaces them by regenerating everything under edge conditioning. Canny preserves face structure but not identity; face IDENTITY is a separate face-restoration post-pass (CodeFormer/GFPGAN), researched + prototyped but not yet shipped. An IP-Adapter FaceID attempt was built and removed (footgun: needs high strength, corrupts faces at removal strength). Docs: docs/controlnet-removal-pipeline-research.md, scripts/controlnet_sweep.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 16:59:28 -07:00
Victor Kuznetsov and Claude Opus 4.8	96038f960f	feat(invisible): vendor-adaptive default strength (OpenAI 0.10 / Google 0.15) The default img2img strength is now chosen from the detected SynthID vendor (C2PA issuer) instead of a single fixed 0.30: OpenAI gpt-image -> 0.10, Google Gemini -> 0.15, unknown source -> 0.15. Explicit --strength always wins. Basis: an oracle-verified June 2026 controlled study (clean v0.8.6, text/face protection OFF, per-image openai.com/verify or Gemini-app verdict). OpenAI's SynthID clears at 0.05 across 1024-1600 px (n=4, resolution-independent); Google's is ~3x more robust and needs 0.15 on the capped-1536 path (n=4). The dominant factor is the VENDOR, not resolution. The earlier single 0.30 default and the "resolution dependence" lore came from contaminated tests run with the protect-text bug ON (issue #14) -- re-running those same 1600x1600 images clean removes SynthID at 0.05. `vendor_for_strength(path)` reads metadata.synthid_source on the ORIGINAL input and is threaded through cli (invisible/all/batch) -> invisible_engine -> watermark_remover -> resolve_strength(strength, profile, vendor), so display and execution use the same vendor (the engine sees a temp path whose C2PA the visible pass already stripped, so detection must happen in the CLI on the pristine source). Caveat: Google's 0.15 was validated only on --max-resolution 1536; native 2816 Gemini was not locally measurable (OOM on Apple Silicon) and is pending GPU validation on raiw.cc. Docs: docs/synthid.md sections 2.2/4.4/5.2 corrected (the contaminated resolution-dependence findings replaced with the clean oracle-verified table); README and CLAUDE.md updated; CLI --strength help reflects the adaptive default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 19:29:47 -07:00
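The vendor-adaptive default in 96038f960f reduces to a small lookup in which an explicit strength always wins. The sketch below illustrates only that rule. `pick_strength` and the vendor keys `"openai"` / `"google"` are hypothetical; the repository's `resolve_strength(strength, profile, vendor)` also takes a profile, which is not modelled here.

```python
from typing import Optional

# Defaults quoted in the commit message; the dictionary keys are assumptions.
VENDOR_DEFAULT_STRENGTH = {"openai": 0.10, "google": 0.15}
UNKNOWN_VENDOR_STRENGTH = 0.15


def pick_strength(explicit: Optional[float], vendor: Optional[str]) -> float:
    """Choose the img2img strength: explicit value first, else per-vendor default."""
    if explicit is not None:
        return explicit  # an explicit --strength always wins
    if vendor is None:
        return UNKNOWN_VENDOR_STRENGTH  # no SynthID vendor detected
    return VENDOR_DEFAULT_STRENGTH.get(vendor.lower(), UNKNOWN_VENDOR_STRENGTH)
```

For example, `pick_strength(None, "openai")` yields the OpenAI default, while `pick_strength(0.30, "openai")` returns the caller's 0.30 unchanged.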
Victor Kuznetsov and Claude Opus 4.8	4b0b370ac0	fix(invisible): disable protect-text/protect-faces by default; add docs/synthid.md Both text and face protection were shielding SynthID from removal. The text-protection high-res re-scrub regenerates pixels at an upscaled resolution where the per-region pass may not be strong enough to re-destroy the SynthID payload, allowing it to survive in text areas. Face protection has an even more direct mechanism: it pastes back the original (pre-diffusion, watermarked) face pixels after the global pass, guaranteeing SynthID survives in face regions regardless of strength. Both --protect-text and --protect-faces are now off by default and opt-in. Rename from --no-protect-text / --no-protect-faces to --protect-text / --protect-faces. Extract shared click.option decorators to module-level constants (_protect_text_option, _protect_faces_option) to eliminate copy-paste between cmd_invisible and cmd_all. Add docs/synthid.md: primary-source-cited technical reference for SynthID-Image covering mechanism (post-hoc encoder/decoder, 136-bit payload, pixel-space, no model-weight modification), robustness numbers (arXiv:2510.09263: ~99.98% TPR at 0.1% FPR across 30 transforms), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable >98% TPR@1%FPR), detectability limits, oracle scope, adoption landscape, and practical implications including the protect-text/faces SynthID-preservation finding. Verified June 2026 on gpt-image 1600x1600 via openai.com/verify: with --protect-text SynthID detected; without, SynthID removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 10:28:34 -07:00
Victor Kuznetsov and Claude Opus 4.8	4b4049a6f1	docs(text-protection): update stale strength note (~0.05 -> ~0.30 SynthID threshold) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 17:53:48 -07:00
Victor Kuznetsov GitHub Claude Opus 4.8	58bdf51c59	Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28) * fix(trustmark): gate detection on re-encode durability to kill false positives TrustMark's wm_present flag is a BCH validity check that spuriously validates on a content-correlated fraction of un-watermarked images (AI textures trip it more than camera photos). On a 1343-image set all 20 raw detections were false, several on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with random-bytes secrets. A genuine TrustMark is a durable soft binding that survives re-encoding, so detect_trustmark now re-decodes after a mild JPEG round-trip and requires the same schema both times. Every observed false positive collapsed under this gate; the second decode runs only on the rare hit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP Detection extensions verified on real signed files (2026-05-29): - Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM (Samsung Galaxy / ASUS Gallery) kept separate from the capture-camera _DEVICE_C2PA_PLATFORM so a Galaxy AI edit (device cert + AI source type) does not trip the camera-vs-AI integrity clash. Plus metadata.samsung_genai: the proprietary genAIType marker in PhotoEditor_Re_Edit_Data, a medium-confidence AI-editing signal (samsung_only branch). - Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng) added as C2PA issuers + issuer->platform mappings. - fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged C2PA after its manifest was correctly stripped). New c2pa_marker_in() requires the JUMBF wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify + metadata. Verified: all 535 real C2PA files carry jumb.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(doubao): gate detection on text structure to cut ~95% of false positives (#23) Coverage alone over-fired: any textured bottom-right corner cleared the threshold, so the detector false-positived on ~28% of arbitrary images. The real '豆包AI生成' mark is six glyphs in one row, so detect now also requires the text-structure signature (_glyph_structure): many connected components, no single dominant blob, concentration in a thin horizontal band. False positives dropped 343 -> 17 across the corpus while keeping real-mark recall and the doubao-1.png sample. Also accept a no-op force kwarg for remover-interface symmetry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(samsung): add Samsung Galaxy AI visible-badge remover New samsung_engine.py removes the bottom-left sparkle + localized 'AI-generated content' badge that Galaxy AI tools stamp. Mirrors the Doubao locate->mask->inpaint pattern but bottom-left, with a dual-polarity top-hat mask (the badge is light-on-dark or dark-on-light). Detection gates on a band + left-anchor signature (the Doubao CJK-component gate does not transfer: Latin badge letters connect into few blobs). Explicit-only -- tuned on few real badges with a ~4% FP floor, so it is not used in auto. Synthetic byte-blob fixtures (real badges are user content, not shipped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(visible): unified known-watermark registry + LaMa inpaint backend watermark_registry.py is a single catalog of known visible marks, each tying {usual location, in_auto flag, recovery strategy, detect adapter, remove adapter}: gemini (reverse-alpha, exact), doubao, samsung. cmd_visible is now registry-driven (best_auto_mark for --mark auto; mark_keys() feeds the CLI choices) -- the per-mark _run_doubao/_run_samsung helper branches are gone. 
Cross-engine confidences are not comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is loose and weakly fired ~0.36 on Doubao text, hijacking auto). --backend auto|cv2|lama chooses background reconstruction for the mask-based marks; auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is the FILLED glyph bounding box (sparse glyph masks leave anti-aliased edges behind). cv2 stays the zero-dependency fallback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(doubao): exact reverse-alpha removal from captured alpha map The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so given its alpha map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a) -- no inpaint hallucination. The alpha map + logo colour were solved from real black+gray Doubao captures on a controlled background: on black captured = a*logo, and the black/gray pair solves a per-pixel without assuming the logo colour (a_max~0.65, logo near-white); the white capture cross-validates (mark vanishes to a flat fill). Bundled as assets/doubao_alpha.png + geometry constants. remove_watermark_reverse_alpha applies it scaled to image width; exact at the captured width, so the registry routes doubao through it only when reverse_alpha_available (width within the calibrated band) and the mark is detected, falling back to mask inpaint (cv2/LaMa) otherwise. A light residual inpaint cleans the sub-pixel rescaling error. Add captures at more resolutions to widen exact coverage. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection Per the principle that we only remove/detect what we can do exactly, the visible-mark path is now reverse-alpha only: - Doubao detect is reverse-alpha-consistent: match the bundled alpha glyph silhouette against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4) -- keys on the '豆包AI生成' SHAPE, not coverage/structure heuristics. FP 7/1243 (0.6%). Removes the cv2 inpaint path + the _glyph_structure gate. - Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove, _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao outside the alpha resolution band is skipped, never inpainted. - Removed samsung_engine.py + tests + --mark samsung (no alpha map captured; Samsung C2PA/genAIType metadata detection in identify is unaffected). - The universal erase --region (cv2/LaMa) is unchanged -- arbitrary-region inpainting stays a user-directed tool, separate from the known-mark registry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution A pure width-scale of the captured alpha map is only sub-pixel-accurate at the captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha now registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED scale+position search (_aligned_alpha_map) before inverting the blend, so the single 2048 capture works at any resolution -- verified clean on the 1773x2364 (3:4) corpus size, the biggest coverage gap (23 files). reverse_alpha_available is now just 'asset present' (no width band); the registry still gates removal on detect so a clean corner is never touched. Drops the _ALPHA_WIDTH_TOLERANCE gate. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(doubao): keep native recovery exact -- fixed geometry at captured width Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39). remove_watermark_reverse_alpha now uses exact width-relative geometry within _ALPHA_NATIVE_BAND of the captured width and the NCC search only off it -- best of both: native back to 0.94, other resolutions still aligned. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56) On a faint/busy-background mark the NCC alignment peak can wander a few px off the true mark and leave a residual (2/56 real corpus files). Off the captured width, remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the NCC-aligned alpha map, applies each, and keeps whichever leaves the least residual mark (re-detect confidence on the bare reverse-alpha) -- geometry wins on faint marks, alignment on clear ones, no magic threshold. Real-file round-trip now removes 56/56 detected Doubao clean across every corpus resolution (was 54). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * perf(doubao): skip residual inpaint at native width for exact recovery At the captured width the fixed-geometry reverse-alpha is pixel-exact, so inpainting over it only replaced exactly-recovered interior pixels with a cv2 hallucination -- measured worse on a textured background (native error vs true bg 1.6 reverse-alpha-only vs 2.6 with the old always-on full-footprint inpaint). Native now returns the bare recovery untouched; off-native, where NCC alignment is only sub-pixel-approximate, the footprint inpaint stays to clean the seam. Real round-trip still 56/56 across all corpus resolutions; negatives 0/60, Gemini unaffected. 
Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression guard. Sync CLAUDE.md + README (the table cell and prose described the pre-NCC "skipped off native / cv2-LaMa" behavior, now stale). Gitignore the session scheduled_tasks.lock, and add the text-protection research note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 19:49:09 -07:00
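The reverse-alpha recovery used throughout the Doubao commits inverts the overlay blend wm = a*logo + (1-a)*original, and the black/gray capture pair determines a per pixel without knowing the logo colour. A scalar, per-pixel sketch under stated assumptions: values are floats in [0, 1], and `blend_pixel`, `recover_pixel` and `solve_alpha` are hypothetical names, not the repository's `remove_watermark_reverse_alpha`. The same arithmetic applies elementwise to image arrays.

```python
def blend_pixel(original: float, alpha: float, logo: float = 1.0) -> float:
    """Forward model: semi-transparent logo composited over the original pixel."""
    return alpha * logo + (1.0 - alpha) * original


def recover_pixel(wm: float, alpha: float, logo: float = 1.0) -> float:
    """Invert the blend: original = (wm - alpha*logo) / (1 - alpha)."""
    if alpha >= 1.0:
        raise ValueError("a fully opaque pixel (alpha >= 1) cannot be recovered")
    value = (wm - alpha * logo) / (1.0 - alpha)
    return min(1.0, max(0.0, value))  # clamp rounding error back into [0, 1]


def solve_alpha(on_black: float, on_gray: float, gray_level: float) -> float:
    """Solve alpha from two captures of the mark on known flat backgrounds.

    on_black = alpha*logo, on_gray = alpha*logo + (1 - alpha)*gray_level,
    so the difference cancels the logo term: on_gray - on_black = (1 - alpha)*gray_level.
    """
    return 1.0 - (on_gray - on_black) / gray_level
```

Because the subtraction cancels the logo term, `solve_alpha` needs no assumption about the logo colour, matching the calibration described in the commit; `on_black / alpha` then gives the logo value wherever alpha is nonzero.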