diff --git a/CLAUDE.md b/CLAUDE.md
index e8e83d6..6e1f9ed 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -14,7 +14,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r

 ## Test and lint

-- **CI** (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`. `publish.yml` stays release-only and now verifies the release tag matches the `pyproject.toml` version (fails the build on a mismatch) before building. **Release flow:** bump the version in `pyproject.toml` + `src/remove_ai_watermarks/__init__.py` + `uv.lock` (the project's own `[[package]]` entry, ~line 2868), commit `chore(release): vX.Y.Z`, `git tag -a vX.Y.Z -m vX.Y.Z` (annotated — `git tag` without `-m` errors here), push `main` + the tag, then `gh release create vX.Y.Z` — **PyPI publish triggers on the GitHub Release `published` event, NOT on the tag push**, so the tag alone does not publish. **Sdist must exclude `data/`** (`[tool.hatch.build.targets.sdist] exclude = ["/data"]`): hatchling's default sdist bundles all VCS-tracked files, so the committed `data/` test corpora (synthid_corpus images ~65 MB + the visible-mark captures) pushed the **0.8.0** sdist past PyPI's per-project file-size limit (400 "File too large") — the wheel uploaded but the sdist was rejected, so 0.8.0 shipped wheel-only and 0.8.1 carried the fix. The wheel only ships `src/` (via `[tool.hatch.build.targets.wheel] packages`), so it was never affected. **A failed PyPI upload of one artifact still leaves the other live and you cannot re-upload the same version** — fix the build and cut the next patch. **Build backend is pinned `hatchling<1.28`** (`[build-system] requires`): hatchling 1.28+ emits **Metadata-Version 2.5** (PEP 639), which the twine bundled in `pypa/gh-action-pypi-publish@release/v1` rejects (`"'2.5' is not a valid Metadata-Version"`) — this **failed the v0.8.3 PyPI upload on 2026-06-01** (tag-match + build passed, the upload step failed; nothing was uploaded, so the version stayed empty on PyPI). 1.27.x emits 2.4, which uploads fine (0.8.2 shipped on it). The pin is unpinned `requires = ["hatchling"]` no longer safe because `uv build` pulls the latest hatchling. Lift the pin only once the publish action's twine is ≥ 6.1.0 (2.5-aware) or the workflow moves to `uv publish`.
+- **CI** (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`. `publish.yml` stays release-only and now verifies the release tag matches the `pyproject.toml` version (fails the build on a mismatch) before building. **Release flow:** bump the version in `pyproject.toml` + `src/remove_ai_watermarks/__init__.py` + `uv.lock` (the project's own `[[package]]` entry, ~line 2868), commit `chore(release): vX.Y.Z`, `git tag -a vX.Y.Z -m vX.Y.Z` (annotated — `git tag` without `-m` errors here), push `main` + the tag, then `gh release create vX.Y.Z` — **PyPI publish triggers on the GitHub Release `published` event, NOT on the tag push**, so the tag alone does not publish. **Sdist must exclude `data/`** (`[tool.hatch.build.targets.sdist] exclude = ["/data"]`): hatchling's default sdist bundles all VCS-tracked files, so the committed `data/` test corpora (the multi-hundred-MB synthid_corpus images + the visible-mark captures) pushed the **0.8.0** sdist past PyPI's per-project file-size limit (400 "File too large") — the wheel uploaded but the sdist was rejected, so 0.8.0 shipped wheel-only and 0.8.1 carried the fix. The wheel only ships `src/` (via `[tool.hatch.build.targets.wheel] packages`), so it was never affected. **A failed PyPI upload of one artifact still leaves the other live and you cannot re-upload the same version** — fix the build and cut the next patch. **Build backend is pinned `hatchling<1.28`** (`[build-system] requires`): hatchling 1.28+ emits **Metadata-Version 2.5** (PEP 639), which the twine bundled in `pypa/gh-action-pypi-publish@release/v1` rejects (`"'2.5' is not a valid Metadata-Version"`) — this **failed the v0.8.3 PyPI upload on 2026-06-01** (tag-match + build passed, the upload step failed; nothing was uploaded, so the version stayed empty on PyPI). 1.27.x emits 2.4, which uploads fine (0.8.2 shipped on it). The pin is unpinned `requires = ["hatchling"]` no longer safe because `uv build` pulls the latest hatchling. Lift the pin only once the publish action's twine is ≥ 6.1.0 (2.5-aware) or the workflow moves to `uv publish`.
 - `bash maintain.sh` — uv-outdated, uv-secure, ruff check/fix, ruff format, pyright, pytest -n auto
 - **Strict pyright is clean across `src/` (0 errors).** The cv2/torch/diffusers boundary files (`gemini_engine`, `region_eraser`, `doubao_engine`, `humanizer`, `invisible_engine`, `noai/watermark_remover`) carry a documented per-file `# pyright:` relax pragma that turns off only the unknown-type / untyped-third-party rules — those libs ship no usable types, so strict typing there fights the ecosystem. Pure-logic files stay fully strict; `typings/piexif/__init__.pyi` is a local stub so `metadata.py`/`extractor.py` resolve piexif. Public ndarray-returning signatures on the relaxed engines are still annotated `NDArray[Any]` so strict consumers (`cli.py`) stay clean. When touching a relaxed file, prefer fixing real issues over widening the pragma; keep the pragma scoped to genuinely-untyped boundaries. (`uv-secure` is clean since idna was bumped 3.11 -> 3.16, fixing GHSA-65pc-fj4g-8rjx.)
 - **Full-project `uv run pyright` (no path) OOMs/crashes node on this ML-heavy repo** (emits a `libnode` stack frame, no summary) — a known environment limit, not a code error. Gate with `uv run --extra dev --extra gpu pyright src/` (completes, authoritative) or scope to changed files; also run `uv run ruff check` and `uv run pytest` directly.
diff --git a/data/synthid_corpus/images/cleaned/2e4ce41c-openai_1_clean_s005.png b/data/synthid_corpus/images/cleaned/2e4ce41c-openai_1_clean_s005.png
new file mode 100644
index 0000000..e52c76b
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/2e4ce41c-openai_1_clean_s005.png differ
diff --git a/data/synthid_corpus/images/cleaned/356196dd-gemini_3_clean_s015_max1536.png b/data/synthid_corpus/images/cleaned/356196dd-gemini_3_clean_s015_max1536.png
new file mode 100644
index 0000000..c6f183a
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/356196dd-gemini_3_clean_s015_max1536.png differ
diff --git a/data/synthid_corpus/images/cleaned/37b34274-openai_2_clean_s010.png b/data/synthid_corpus/images/cleaned/37b34274-openai_2_clean_s010.png
new file mode 100644
index 0000000..4306c2b
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/37b34274-openai_2_clean_s010.png differ
diff --git a/data/synthid_corpus/images/cleaned/4aa5f61c-gemini_2_clean_s015_max1536.png b/data/synthid_corpus/images/cleaned/4aa5f61c-gemini_2_clean_s015_max1536.png
new file mode 100644
index 0000000..f95254a
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/4aa5f61c-gemini_2_clean_s015_max1536.png differ
diff --git a/data/synthid_corpus/images/cleaned/9e4160bb-gemini_4_clean_s015_max1536.png b/data/synthid_corpus/images/cleaned/9e4160bb-gemini_4_clean_s015_max1536.png
new file mode 100644
index 0000000..75a9030
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/9e4160bb-gemini_4_clean_s015_max1536.png differ
diff --git a/data/synthid_corpus/images/cleaned/f7c52cdf-openai_3_clean_s005.png b/data/synthid_corpus/images/cleaned/f7c52cdf-openai_3_clean_s005.png
new file mode 100644
index 0000000..a7c7d12
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/f7c52cdf-openai_3_clean_s005.png differ
diff --git a/data/synthid_corpus/images/cleaned/f99bd9a5-gemini_1_clean_s015_max1536.png b/data/synthid_corpus/images/cleaned/f99bd9a5-gemini_1_clean_s015_max1536.png
new file mode 100644
index 0000000..45f3136
Binary files /dev/null and b/data/synthid_corpus/images/cleaned/f99bd9a5-gemini_1_clean_s015_max1536.png differ
diff --git a/data/synthid_corpus/images/pos/05b836ec-openai_1_original.png b/data/synthid_corpus/images/pos/05b836ec-openai_1_original.png
new file mode 100644
index 0000000..c1325f7
Binary files /dev/null and b/data/synthid_corpus/images/pos/05b836ec-openai_1_original.png differ
diff --git a/data/synthid_corpus/images/pos/28ff8732-openai_3_original.png b/data/synthid_corpus/images/pos/28ff8732-openai_3_original.png
new file mode 100644
index 0000000..59365a1
Binary files /dev/null and b/data/synthid_corpus/images/pos/28ff8732-openai_3_original.png differ
diff --git a/data/synthid_corpus/images/pos/2c33e75a-gemini_4_original.png b/data/synthid_corpus/images/pos/2c33e75a-gemini_4_original.png
new file mode 100644
index 0000000..c79bc40
Binary files /dev/null and b/data/synthid_corpus/images/pos/2c33e75a-gemini_4_original.png differ
diff --git a/data/synthid_corpus/images/pos/45d79a68-gemini_3_original.png b/data/synthid_corpus/images/pos/45d79a68-gemini_3_original.png
new file mode 100644
index 0000000..7ffcbf4
Binary files /dev/null and b/data/synthid_corpus/images/pos/45d79a68-gemini_3_original.png differ
diff --git a/data/synthid_corpus/images/pos/4affd7f2-gemini_1_original.png b/data/synthid_corpus/images/pos/4affd7f2-gemini_1_original.png
new file mode 100644
index 0000000..229f943
Binary files /dev/null and b/data/synthid_corpus/images/pos/4affd7f2-gemini_1_original.png differ
diff --git a/data/synthid_corpus/images/pos/794e023e-openai_2_original.png b/data/synthid_corpus/images/pos/794e023e-openai_2_original.png
new file mode 100644
index 0000000..e3b01fb
Binary files /dev/null and b/data/synthid_corpus/images/pos/794e023e-openai_2_original.png differ
diff --git a/data/synthid_corpus/images/pos/8c1a6fb0-gemini_2_original.png b/data/synthid_corpus/images/pos/8c1a6fb0-gemini_2_original.png
new file mode 100644
index 0000000..3526c8c
Binary files /dev/null and b/data/synthid_corpus/images/pos/8c1a6fb0-gemini_2_original.png differ
diff --git a/data/synthid_corpus/manifest.csv b/data/synthid_corpus/manifest.csv
index be6ed08..14074c8 100644
--- a/data/synthid_corpus/manifest.csv
+++ b/data/synthid_corpus/manifest.csv
@@ -24,3 +24,17 @@ d20d4cc936dbdfe909c52502039a9e84ba93d97b42b24a0acee5b7d6c71930ae,d20d4cc9-Gemini
 c86973424817f62510e2a312b85c52e05adf47ace87a8e717fd442607596f501,c8697342-aistudio_lake.png,pos,Google AI Studio (Nano Banana),gemini-2.5-flash-image,1024,1024,png,,,gemini-app,2026-05-24T21:39:09Z,"API/playground: SynthID pixel CONFIRMED (Gemini-app oracle) + visible sparkle, but NO C2PA/IPTC -> synthid_source blind spot"
 1f81827c06d67cf6f6c7f5d53ec8f9738183942a6d1d2717b161fea0fdcc540a,1f81827c-Designer.png,pos,Microsoft Designer,dall-e (Designer),1024,1024,png,"OpenAI, Microsoft",yes,c2pa-metadata,2026-05-24T22:18:40Z,C2PA issuer OpenAI+Microsoft; synthid_source=OpenAI (DALL-E surface inherits OpenAI SynthID+C2PA)
 f6dd47a5ffd319aea21bf10dcf9877097666420b02c2620080bac12b03976e7e,f6dd47a5-4ef377bd-gpt-image-2-cleaned.png,cleaned,"our pipeline (invisible/SDXL, native-res default)",stabilityai/stable-diffusion-xl-base-1.0,1254,1254,png,,,openai-verify,2026-05-25T20:50:38Z,"cleaned from 4ef377bd via v0.5.3 'all' at native 1254x1254 (prod-equivalent); openai.com/verify: SynthID NOT detected. Re-confirms #10 native-res default defeats OpenAI SynthID (closes #15 root cause). Note: native res OOMs on 20GB MPS, auto-fell back to CPU."
+05b836ecfe40fd689177fda74384ae4fdcc446505bbc4281cd3cbb6523eb669e,05b836ec-openai_1_original.png,pos,ChatGPT,gpt-image,1122,1402,png,OpenAI,yes,openai-verify,2026-06-04T00:07:53Z,June 2026 strength-study subject; openai.com/verify: SynthID detected (docs/synthid.md 2.2)
+794e023ea7ae321267fe5af76f4080c98a84a9865669c0733ebfb9757b8638df,794e023e-openai_2_original.png,pos,ChatGPT,gpt-image,1024,1536,png,OpenAI,yes,openai-verify,2026-06-04T00:07:53Z,June 2026 strength-study subject; openai.com/verify: SynthID detected (docs/synthid.md 2.2)
+28ff8732b037f98a4ef5bc277bbcdaa32e5eb9ccbd00b6c8c616e46ef68ae8a0,28ff8732-openai_3_original.png,pos,ChatGPT,gpt-image,1448,1086,png,OpenAI,yes,openai-verify,2026-06-04T00:07:53Z,June 2026 strength-study subject; openai.com/verify: SynthID detected (docs/synthid.md 2.2)
+4affd7f27767a445db6abf741355743ba8d95108ad922c9fff045feed8492236,4affd7f2-gemini_1_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
+8c1a6fb03ef3d45a1f958fb3401e4264e409ff88c2a793061db7f29023454d0e,8c1a6fb0-gemini_2_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
+45d79a683134fcba1b147b2aedb669783d474e1fb8a4df329729a0904fd1b46b,45d79a68-gemini_3_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
+2c33e75a2db614ce74c83cc0a6ac6c3ac735aca83ab88c9c9345843b124f7856,2c33e75a-gemini_4_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
+2e4ce41cfab456c1d9ea0898e47a5a1d434266ba24e88d4cc807a4180a56925f,2e4ce41c-openai_1_clean_s005.png,cleaned,"our pipeline (SDXL img2img, native)",stabilityai/stable-diffusion-xl-base-1.0,1122,1402,png,,,openai-verify,2026-06-04T00:08:05Z,cleaned at strength 0.05 native; openai.com/verify: SynthID NOT detected (docs/synthid.md 2.2)
+f7c52cdfeb14a6be2fff449e89b0181e66e365f36635ee4fcb21567e4cb770ef,f7c52cdf-openai_3_clean_s005.png,cleaned,"our pipeline (SDXL img2img, native)",stabilityai/stable-diffusion-xl-base-1.0,1448,1086,png,,,openai-verify,2026-06-04T00:08:05Z,cleaned at strength 0.05 native; openai.com/verify: SynthID NOT detected (docs/synthid.md 2.2)
+37b34274210888702e56eb74f4aa36578f15bf57157a3ebf394f0b8eaa820e19,37b34274-openai_2_clean_s010.png,cleaned,"our pipeline (SDXL img2img, native)",stabilityai/stable-diffusion-xl-base-1.0,1024,1536,png,,,openai-verify,2026-06-04T00:08:05Z,cleaned at strength 0.10 native (min captured for this subject); openai.com/verify: SynthID NOT detected
+f99bd9a51814265a23de467d14792db903fef99678b7e7c960d0c6813ed9b0fc,f99bd9a5-gemini_1_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
+4aa5f61c55c1f3fa9bbc49dffff8a404527722637ae694a932245629635b3f2b,4aa5f61c-gemini_2_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
+356196dd63abf011b30b582a0408ccecb726d746065af20ad3611dde72a88725,356196dd-gemini_3_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
+9e4160bb8e3e915d2d4593e37c71495ee7cfcec183602541166f622ebfd84403,9e4160bb-gemini_4_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
diff --git a/docs/synthid.md b/docs/synthid.md
index 4742205..07ec056 100644
--- a/docs/synthid.md
+++ b/docs/synthid.md
@@ -175,12 +175,15 @@ A controlled study (June 2026, clean v0.8.6 with text/face protection OFF,
 native resolution on this repo's default SDXL pipeline) measured the minimum
 img2img strength that removes the SynthID pixel watermark, verified per image on
 the vendor's own oracle (openai.com/verify for OpenAI, the Gemini app "Verify
-with SynthID" for Google). The test set and per-image results are recorded in
-`data/synthid_corpus/` (manifest `verified_via` = `openai-verify` / `gemini-app`).
+with SynthID" for Google). Each subject is archived in `data/synthid_corpus/` as a
+pos original plus its minimum-clearing cleaned output (manifest `verified_via` =
+`openai-verify` / `gemini-app`), EXCEPT one third-party image from issue #14, which
+was oracle-verified but is not committed (third-party content stays out of the
+public corpus).

 | Vendor | Images | Resolution(s) | Pipeline | Removed at |
 |--------|--------|---------------|----------|------------|
-| OpenAI (gpt-image) | n=4 | 1024x1536 .. 1600x1600 | native | **0.05** |
+| OpenAI (gpt-image) | n=4 (3 archived + 1 external-only) | 1024x1536 .. 1600x1600 | native | **0.05** |
 | Google (Gemini) | n=4 | 2816x1536 -> capped 1536 | `--max-resolution 1536` | **0.15** (0.05 and 0.10 do NOT clear) |

 **Two findings, both oracle-verified:**