mirror of https://github.com/wiltodelta/remove-ai-watermarks.git (synced 2026-05-26 14:17:47 +02:00)
chore(corpus): grow SynthID reference set + document autonomous Chrome collection
Adds content positives (OpenAI gpt-image: forest, fisherman, tokyo; Google gemini: fisherman, mug) and SDXL/non-SynthID negatives to the local corpus manifest. Now spans 4 resolutions across 2 vendors (was solid-black only).

README: documents driving generation via Chrome MCP -- Gemini single-click download; ChatGPT via in-page fetch+blob (preserves original C2PA bytes, unlike the flaky UI download / a canvas re-encode).

Images stay gitignored; only the manifest (sha256 + labels + extracted metadata) and protocol are tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -73,3 +73,34 @@ uv run python scripts/synthid_corpus.py ingest path/to/*.png \

uv run python scripts/synthid_corpus.py status # counts by label / resolution / verification
```

## Autonomous collection via Chrome MCP

Generation can be driven through the browser (the account must be logged in):

- **Gemini** (`gemini.google.com`): type `Create an image: <prompt>`, wait, hover the
  result, click the download icon (top-right). Single, reliable click. Outputs
  carry Google C2PA + SynthID. Occasionally the composer stalls in a
  "generating" state -> start a New chat to reset.
- **ChatGPT** (`chatgpt.com`): the UI download is flaky (the fullscreen viewer
  races and can grab the previous image; the share-modal path works but is
  multi-step). Reliable path is an in-page fetch of the rendered image, which
  preserves the original bytes (C2PA intact, unlike a canvas re-encode):

```js
// run in the ChatGPT tab via the browser MCP javascript tool
(async () => {
  const imgs = [...document.querySelectorAll('img')].filter(i => i.naturalWidth >= 400);
  const img = imgs[imgs.length - 1]; // newest large image
  const b = await (await fetch(img.currentSrc || img.src)).blob();
  const a = document.createElement('a');
  a.href = URL.createObjectURL(b); a.download = 'dl.png';
  document.body.appendChild(a); a.click(); a.remove();
  return 'size=' + b.size; // do NOT return the src (privacy guard blocks query strings)
})()
```

Gotcha: confirm the returned `size` differs from the previous image before
ingesting -- if the new image has not finished rendering, the script grabs the
prior one (the corpus dedups by sha256, but the notes would mislabel it).
ChatGPT also shows an A/B "which is better?" picker; click Skip first.
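
The size comparison above is a quick heuristic. A stricter pre-ingest guard, sketched here in Python, hashes the downloaded file and looks for that digest in the first (`sha256`) column of the manifest. The function names and the manifest path argument are hypothetical, and this is not necessarily how `scripts/synthid_corpus.py` performs its own dedup.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex sha256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_in_manifest(image: Path, manifest: Path) -> bool:
    """True if the image's sha256 appears in the manifest's first column.

    Assumes the manifest is a CSV whose first column holds the sha256;
    the header row is harmless because it never matches a hex digest.
    """
    wanted = sha256_of(image)
    with manifest.open(newline="", encoding="utf-8") as fh:
        return any(row and row[0] == wanted for row in csv.reader(fh))
```

If `already_in_manifest` returns `True` for a fresh `dl.png`, the fetch most likely grabbed an image that was ingested earlier, so the download should be repeated before any notes are written.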
@@ -2,3 +2,16 @@ sha256,filename,label,source,model,width,height,format,c2pa_issuer,synthid_metad
4ef377bde1a1d4eff141972841938643b173f5052992a018b9a21b31ac31731e,"4ef377bd-ChatGPT Image May 23, 2026, 02_43_02 PM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,openai-verify,2026-05-23T21:48:12Z,fresh post-rollout 2026-05-23; openai.com/verify: SynthID+C2PA detected
d09f84c0e4c6d8b336bf4a9a7277314e940dcb5052ae7051e785cbb3bb42d656,d09f84c0-Gemini_Generated_Image_vq7wkwvq7wkwvq7w.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,c2pa-metadata,2026-05-23T21:52:40Z,"user: latest Gemini, SynthID v2"
47188e88f956291bd38ab6906e5f21eb273d4a697ddc8b4479deac9f48915e1a,47188e88-disco_synthid_removed.png,cleaned,our pipeline (invisible/SDXL),stabilityai/stable-diffusion-xl-base-1.0,1254,1254,png,,,openai-verify,2026-05-23T22:06:54Z,cleaned from 4ef377bd disco; openai.com/verify: SynthID NOT detected (defeated)
52bb6bd524a74bff1b74ed893784c1c0ee76f48a9b45a009bae95ad4a57b7759,"52bb6bd5-ChatGPT Image May 24, 2026, 11_01_02 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
ee75ade07914d6306eae3443b3028782e0ac8c125a31b2d5a141f75aebdafb18,"ee75ade0-ChatGPT Image May 24, 2026, 11_10_04 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
9398c74dfad0f030633bd3ac224ba53e56a7ff9711d7b3a4c464e0073ece51b5,"9398c74d-ChatGPT Image May 24, 2026, 11_10_09 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
79c3733895e82e3c9e506de0ddd6dfbf20ba09171263d88444520777151868c8,"79c37338-ChatGPT Image May 24, 2026, 11_10_13 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
38315a0a83aa0e094a50520ea44e01aa26115927f09a532ff2ab8636de743e0a,"38315a0a-ChatGPT Image May 24, 2026, 11_10_19 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
f3a1fbc3bc8f768265400724bb9800d322f8e0b1461b2c585540845ea8352c5d,f3a1fbc3-winter_scene_X.png,neg,local (SDXL/processed),sdxl-or-processed,2816,1536,png,,,none,2026-05-24T18:52:23Z,metadata-clean: no C2PA/SynthID source
f07bc0bcad09a5a5687ae312a1298c9ddd110a5e414265efc52ef4d524b36f86,f07bc0bc-api_enterprise_gate_ax.png,neg,local (SDXL/processed),sdxl-or-processed,1080,1350,png,,,none,2026-05-24T18:52:23Z,metadata-clean: no C2PA/SynthID source
89571987e368f1ce82f9dedfa9101584434def842e50d1f4c759de64db5c21d9,89571987-c87bd3c48a4443a68cb84a65604dacd3_clean.png,neg,local (SDXL/processed),sdxl-or-processed,2816,1536,png,,,none,2026-05-24T18:52:23Z,metadata-clean: no C2PA/SynthID source
7b650522d42db09568e249c04d683c469fb3e280a2c53fcd1031cb9df27c619a,"7b650522-ChatGPT Image May 24, 2026, 12_19_54 PM.png",pos,ChatGPT,gpt-image,1602,982,png,OpenAI,yes,c2pa-metadata,2026-05-24T19:20:25Z,content: misty pine forest at dawn
fb28dba2a82cc101a92fdee5714867b32610d0564f37737fe4bb70782b8ecf32,fb28dba2-Gemini_Generated_Image_dsjlnsdsjlnsdsjl.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,c2pa-metadata,2026-05-24T19:30:25Z,content: elderly fisherman portrait
d20d4cc936dbdfe909c52502039a9e84ba93d97b42b24a0acee5b7d6c71930ae,d20d4cc9-Gemini_Generated_Image_ug6kdpug6kdpug6k.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,c2pa-metadata,2026-05-24T19:33:15Z,content: red coffee mug product shot
28f323345f6496d936c3f1a72f671ddf59d0f81565c24a63bf3286860f633afe,28f32334-chatgpt_fisherman.png,pos,ChatGPT,gpt-image,1023,1537,png,OpenAI,yes,c2pa-metadata,2026-05-24T19:39:20Z,content: elderly fisherman portrait (fetch+blob dl)
88e61a384c2e0b12d97bc66046e4a10542b2987448ba89c4b49e66311e969c84,88e61a38-chatgpt_tokyo.png,pos,ChatGPT,gpt-image,1023,1537,png,OpenAI,yes,c2pa-metadata,2026-05-24T19:42:02Z,content: tokyo street night (fetch+blob dl)
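
The resolution spread claimed in the commit message can be checked with a small tally over the manifest rows above. This is a minimal sketch, assuming the manifest has a header row containing the `label`, `width` and `height` columns visible in the hunk header; the function name `tally` and the manifest path are hypothetical, and this is not the project's `status` implementation.

```python
import csv
from collections import Counter
from pathlib import Path


def tally(manifest: Path) -> tuple[Counter, Counter]:
    """Count manifest rows by label and by (label, 'WIDTHxHEIGHT')."""
    by_label: Counter = Counter()
    by_resolution: Counter = Counter()
    with manifest.open(newline="", encoding="utf-8") as fh:
        # csv.DictReader handles the quoted filenames that contain commas,
        # which a naive line.split(",") would break apart.
        for row in csv.DictReader(fh):
            by_label[row["label"]] += 1
            resolution = f'{row["width"]}x{row["height"]}'
            by_resolution[(row["label"], resolution)] += 1
    return by_label, by_resolution
```

Counting the distinct resolutions among keys whose label is `pos` gives the number of resolutions covered by positives, which is the figure the commit message refers to.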