data(corpus): archive June 2026 SynthID strength-study subjects
Back docs/synthid.md section 2.2 with the actual test set: the per-image oracle-verified subjects were only in a local working dir, while the doc claimed they were recorded in data/synthid_corpus/. Ingest the key pos+cleaned pairs so the claim holds.

- pos: openai_1/2/3 originals (gpt-image, openai-verify) + gemini_1/2/3/4 originals (Gemini app, gemini-app); all probe as C2PA-SynthID present.
- cleaned: OpenAI at strength 0.05 (openai_2 only s010 captured) + Gemini at 0.15 --max-resolution 1536; oracle: SynthID NOT detected. Metadata stripped, so no C2PA on the cleaned rows.
- Excluded the third-party issue #14 image (pic3): oracle-verified but not committed to the public corpus.
- docs/synthid.md 2.2: state OpenAI n=4 = 3 archived + 1 external-only.
- CLAUDE.md: drop the drift-prone "~65 MB" corpus size from the sdist note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -14,7 +14,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
## Test and lint
- **CI** (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`. `publish.yml` stays release-only and now verifies the release tag matches the `pyproject.toml` version (fails the build on a mismatch) before building. **Release flow:** bump the version in `pyproject.toml` + `src/remove_ai_watermarks/__init__.py` + `uv.lock` (the project's own `[[package]]` entry, ~line 2868), commit `chore(release): vX.Y.Z`, `git tag -a vX.Y.Z -m vX.Y.Z` (annotated — `git tag` without `-m` errors here), push `main` + the tag, then `gh release create vX.Y.Z` — **PyPI publish triggers on the GitHub Release `published` event, NOT on the tag push**, so the tag alone does not publish. **Sdist must exclude `data/`** (`[tool.hatch.build.targets.sdist] exclude = ["/data"]`): hatchling's default sdist bundles all VCS-tracked files, so the committed `data/` test corpora (synthid_corpus images ~65 MB + the visible-mark captures) pushed the **0.8.0** sdist past PyPI's per-project file-size limit (400 "File too large") — the wheel uploaded but the sdist was rejected, so 0.8.0 shipped wheel-only and 0.8.1 carried the fix. The wheel only ships `src/` (via `[tool.hatch.build.targets.wheel] packages`), so it was never affected. **A failed PyPI upload of one artifact still leaves the other live and you cannot re-upload the same version** — fix the build and cut the next patch. 
**Build backend is pinned `hatchling<1.28`** (`[build-system] requires`): hatchling 1.28+ emits **Metadata-Version 2.5** (PEP 639), which the twine bundled in `pypa/gh-action-pypi-publish@release/v1` rejects (`"'2.5' is not a valid Metadata-Version"`); this **failed the v0.8.3 PyPI upload on 2026-06-01** (tag-match + build passed, the upload step failed; nothing was uploaded, so the version stayed empty on PyPI). 1.27.x emits 2.4, which uploads fine (0.8.2 shipped on it). An unpinned `requires = ["hatchling"]` is no longer safe because `uv build` pulls the latest hatchling. Lift the pin only once the publish action's twine is ≥ 6.1.0 (2.5-aware) or the workflow moves to `uv publish`.
- **CI** (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`. `publish.yml` stays release-only and now verifies the release tag matches the `pyproject.toml` version (fails the build on a mismatch) before building. **Release flow:** bump the version in `pyproject.toml` + `src/remove_ai_watermarks/__init__.py` + `uv.lock` (the project's own `[[package]]` entry, ~line 2868), commit `chore(release): vX.Y.Z`, `git tag -a vX.Y.Z -m vX.Y.Z` (annotated — `git tag` without `-m` errors here), push `main` + the tag, then `gh release create vX.Y.Z` — **PyPI publish triggers on the GitHub Release `published` event, NOT on the tag push**, so the tag alone does not publish. **Sdist must exclude `data/`** (`[tool.hatch.build.targets.sdist] exclude = ["/data"]`): hatchling's default sdist bundles all VCS-tracked files, so the committed `data/` test corpora (the multi-hundred-MB synthid_corpus images + the visible-mark captures) pushed the **0.8.0** sdist past PyPI's per-project file-size limit (400 "File too large") — the wheel uploaded but the sdist was rejected, so 0.8.0 shipped wheel-only and 0.8.1 carried the fix. The wheel only ships `src/` (via `[tool.hatch.build.targets.wheel] packages`), so it was never affected. **A failed PyPI upload of one artifact still leaves the other live and you cannot re-upload the same version** — fix the build and cut the next patch. 
**Build backend is pinned `hatchling<1.28`** (`[build-system] requires`): hatchling 1.28+ emits **Metadata-Version 2.5** (PEP 639), which the twine bundled in `pypa/gh-action-pypi-publish@release/v1` rejects (`"'2.5' is not a valid Metadata-Version"`); this **failed the v0.8.3 PyPI upload on 2026-06-01** (tag-match + build passed, the upload step failed; nothing was uploaded, so the version stayed empty on PyPI). 1.27.x emits 2.4, which uploads fine (0.8.2 shipped on it). An unpinned `requires = ["hatchling"]` is no longer safe because `uv build` pulls the latest hatchling. Lift the pin only once the publish action's twine is ≥ 6.1.0 (2.5-aware) or the workflow moves to `uv publish`.
- `bash maintain.sh` — uv-outdated, uv-secure, ruff check/fix, ruff format, pyright, pytest -n auto
- **Strict pyright is clean across `src/` (0 errors).** The cv2/torch/diffusers boundary files (`gemini_engine`, `region_eraser`, `doubao_engine`, `humanizer`, `invisible_engine`, `noai/watermark_remover`) carry a documented per-file `# pyright:` relax pragma that turns off only the unknown-type / untyped-third-party rules — those libs ship no usable types, so strict typing there fights the ecosystem. Pure-logic files stay fully strict; `typings/piexif/__init__.pyi` is a local stub so `metadata.py`/`extractor.py` resolve piexif. Public ndarray-returning signatures on the relaxed engines are still annotated `NDArray[Any]` so strict consumers (`cli.py`) stay clean. When touching a relaxed file, prefer fixing real issues over widening the pragma; keep the pragma scoped to genuinely-untyped boundaries. (`uv-secure` is clean since idna was bumped 3.11 -> 3.16, fixing GHSA-65pc-fj4g-8rjx.)
- **Full-project `uv run pyright` (no path) OOMs/crashes node on this ML-heavy repo** (emits a `libnode` stack frame, no summary) — a known environment limit, not a code error. Gate with `uv run --extra dev --extra gpu pyright src/` (completes, authoritative) or scope to changed files; also run `uv run ruff check` and `uv run pytest` directly.
[14 binary image files added; the diff viewer shows no width or height. Sizes (MiB): 2.4, 7.6, 1.7, 7.3, 8.5, 2.3, 7.1, 2.3, 2.1, 9.6, 9.3, 8.2, 1.4, 8.6]
@@ -24,3 +24,17 @@ d20d4cc936dbdfe909c52502039a9e84ba93d97b42b24a0acee5b7d6c71930ae,d20d4cc9-Gemini
c86973424817f62510e2a312b85c52e05adf47ace87a8e717fd442607596f501,c8697342-aistudio_lake.png,pos,Google AI Studio (Nano Banana),gemini-2.5-flash-image,1024,1024,png,,,gemini-app,2026-05-24T21:39:09Z,"API/playground: SynthID pixel CONFIRMED (Gemini-app oracle) + visible sparkle, but NO C2PA/IPTC -> synthid_source blind spot"
1f81827c06d67cf6f6c7f5d53ec8f9738183942a6d1d2717b161fea0fdcc540a,1f81827c-Designer.png,pos,Microsoft Designer,dall-e (Designer),1024,1024,png,"OpenAI, Microsoft",yes,c2pa-metadata,2026-05-24T22:18:40Z,C2PA issuer OpenAI+Microsoft; synthid_source=OpenAI (DALL-E surface inherits OpenAI SynthID+C2PA)
f6dd47a5ffd319aea21bf10dcf9877097666420b02c2620080bac12b03976e7e,f6dd47a5-4ef377bd-gpt-image-2-cleaned.png,cleaned,"our pipeline (invisible/SDXL, native-res default)",stabilityai/stable-diffusion-xl-base-1.0,1254,1254,png,,,openai-verify,2026-05-25T20:50:38Z,"cleaned from 4ef377bd via v0.5.3 'all' at native 1254x1254 (prod-equivalent); openai.com/verify: SynthID NOT detected. Re-confirms #10 native-res default defeats OpenAI SynthID (closes #15 root cause). Note: native res OOMs on 20GB MPS, auto-fell back to CPU."
05b836ecfe40fd689177fda74384ae4fdcc446505bbc4281cd3cbb6523eb669e,05b836ec-openai_1_original.png,pos,ChatGPT,gpt-image,1122,1402,png,OpenAI,yes,openai-verify,2026-06-04T00:07:53Z,June 2026 strength-study subject; openai.com/verify: SynthID detected (docs/synthid.md 2.2)
794e023ea7ae321267fe5af76f4080c98a84a9865669c0733ebfb9757b8638df,794e023e-openai_2_original.png,pos,ChatGPT,gpt-image,1024,1536,png,OpenAI,yes,openai-verify,2026-06-04T00:07:53Z,June 2026 strength-study subject; openai.com/verify: SynthID detected (docs/synthid.md 2.2)
28ff8732b037f98a4ef5bc277bbcdaa32e5eb9ccbd00b6c8c616e46ef68ae8a0,28ff8732-openai_3_original.png,pos,ChatGPT,gpt-image,1448,1086,png,OpenAI,yes,openai-verify,2026-06-04T00:07:53Z,June 2026 strength-study subject; openai.com/verify: SynthID detected (docs/synthid.md 2.2)
4affd7f27767a445db6abf741355743ba8d95108ad922c9fff045feed8492236,4affd7f2-gemini_1_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
8c1a6fb03ef3d45a1f958fb3401e4264e409ff88c2a793061db7f29023454d0e,8c1a6fb0-gemini_2_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
45d79a683134fcba1b147b2aedb669783d474e1fb8a4df329729a0904fd1b46b,45d79a68-gemini_3_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
2c33e75a2db614ce74c83cc0a6ac6c3ac735aca83ab88c9c9345843b124f7856,2c33e75a-gemini_4_original.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,gemini-app,2026-06-04T00:08:05Z,June 2026 strength-study subject; Gemini-app Verify with SynthID: detected (docs/synthid.md 2.2)
2e4ce41cfab456c1d9ea0898e47a5a1d434266ba24e88d4cc807a4180a56925f,2e4ce41c-openai_1_clean_s005.png,cleaned,"our pipeline (SDXL img2img, native)",stabilityai/stable-diffusion-xl-base-1.0,1122,1402,png,,,openai-verify,2026-06-04T00:08:05Z,cleaned at strength 0.05 native; openai.com/verify: SynthID NOT detected (docs/synthid.md 2.2)
f7c52cdfeb14a6be2fff449e89b0181e66e365f36635ee4fcb21567e4cb770ef,f7c52cdf-openai_3_clean_s005.png,cleaned,"our pipeline (SDXL img2img, native)",stabilityai/stable-diffusion-xl-base-1.0,1448,1086,png,,,openai-verify,2026-06-04T00:08:05Z,cleaned at strength 0.05 native; openai.com/verify: SynthID NOT detected (docs/synthid.md 2.2)
37b34274210888702e56eb74f4aa36578f15bf57157a3ebf394f0b8eaa820e19,37b34274-openai_2_clean_s010.png,cleaned,"our pipeline (SDXL img2img, native)",stabilityai/stable-diffusion-xl-base-1.0,1024,1536,png,,,openai-verify,2026-06-04T00:08:05Z,cleaned at strength 0.10 native (min captured for this subject); openai.com/verify: SynthID NOT detected
f99bd9a51814265a23de467d14792db903fef99678b7e7c960d0c6813ed9b0fc,f99bd9a5-gemini_1_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
4aa5f61c55c1f3fa9bbc49dffff8a404527722637ae694a932245629635b3f2b,4aa5f61c-gemini_2_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
356196dd63abf011b30b582a0408ccecb726d746065af20ad3611dde72a88725,356196dd-gemini_3_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
9e4160bb8e3e915d2d4593e37c71495ee7cfcec183602541166f622ebfd84403,9e4160bb-gemini_4_clean_s015_max1536.png,cleaned,"our pipeline (SDXL img2img, --max-resolution 1536)",stabilityai/stable-diffusion-xl-base-1.0,2816,1536,png,,,gemini-app,2026-06-04T00:08:05Z,cleaned at strength 0.15 --max-resolution 1536; Gemini-app: SynthID NOT detected (docs/synthid.md 2.2)
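Each manifest row above starts with a 64-character hex digest, followed by a filename whose prefix is the first 8 hex characters of that digest. A rough consistency check for newly ingested rows might look like the sketch below. It assumes the first column is the SHA-256 of the image file and the second is its filename (the diff does not show the header row), and `row_problems` is a hypothetical helper, not part of the repo. The demo uses a throwaway file instead of the real corpus.

```python
import csv
import hashlib
import io
import tempfile
from pathlib import Path


def row_problems(row: list[str], image_dir: Path) -> list[str]:
    """Return the problems found for one manifest row (empty list means OK)."""
    # Assumed column order: sha256, filename, ... (header not visible in the diff).
    digest, filename = row[0], row[1]
    problems = []
    if not filename.startswith(digest[:8] + "-"):
        problems.append("filename prefix != first 8 hex chars of digest")
    path = image_dir / filename
    if not path.is_file():
        problems.append("file missing")
    elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
        problems.append("sha256 mismatch")
    return problems


# Demo: one consistent row and one row whose file was renamed.
with tempfile.TemporaryDirectory() as tmp:
    payload = b"not a real image"
    digest = hashlib.sha256(payload).hexdigest()
    name = f"{digest[:8]}-demo_original.png"
    (Path(tmp) / name).write_bytes(payload)
    manifest = f"{digest},{name},pos\n{digest},deadbeef-renamed.png,pos\n"
    results = [row_problems(r, Path(tmp)) for r in csv.reader(io.StringIO(manifest))]

print(results)
```

Using `csv.reader` rather than splitting on commas matters here because several rows carry quoted fields with embedded commas, such as the pipeline description.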
@@ -175,12 +175,15 @@ A controlled study (June 2026, clean v0.8.6 with text/face protection OFF,
native resolution on this repo's default SDXL pipeline) measured the minimum
img2img strength that removes the SynthID pixel watermark, verified per image on
the vendor's own oracle (openai.com/verify for OpenAI, the Gemini app "Verify
with SynthID" for Google). The test set and per-image results are recorded in
`data/synthid_corpus/` (manifest `verified_via` = `openai-verify` / `gemini-app`).
with SynthID" for Google). Each subject is archived in `data/synthid_corpus/` as a
pos original plus its minimum-clearing cleaned output (manifest `verified_via` =
`openai-verify` / `gemini-app`), EXCEPT one third-party image from issue #14, which
was oracle-verified but is not committed (third-party content stays out of the
public corpus).

| Vendor | Images | Resolution(s) | Pipeline | Removed at |
|--------|--------|---------------|----------|------------|
| OpenAI (gpt-image) | n=4 | 1024x1536 .. 1600x1600 | native | **0.05** |
| OpenAI (gpt-image) | n=4 (3 archived + 1 external-only) | 1024x1536 .. 1600x1600 | native | **0.05** |
| Google (Gemini) | n=4 | 2816x1536 -> capped 1536 | `--max-resolution 1536` | **0.15** (0.05 and 0.10 do NOT clear) |

**Two findings, both oracle-verified:**