chore(corpus): grow SynthID reference set + document autonomous Chrome collection

Adds content positives (OpenAI gpt-image: forest, fisherman, tokyo; Google
gemini: fisherman, mug) and SDXL/non-SynthID negatives to the local corpus
manifest. Positives now span 4 resolutions (1254x1254, 2816x1536, 1602x982,
1023x1537) across 2 vendors (was solid-black only).

README: documents driving generation via Chrome MCP -- Gemini single-click
download; ChatGPT via in-page fetch+blob, which avoids the flaky UI download
and, unlike a canvas re-encode, preserves the original bytes (C2PA intact).

Images stay gitignored; only the manifest (sha256 + labels + extracted
metadata) and protocol are tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test-user
2026-05-24 12:46:46 -07:00
parent c006f9b8b4
commit da0edcbddc
2 changed files with 44 additions and 0 deletions
+31
@@ -73,3 +73,34 @@ uv run python scripts/synthid_corpus.py ingest path/to/*.png \
uv run python scripts/synthid_corpus.py status # counts by label / resolution / verification
```
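For a quick check outside the CLI, the label and resolution counts that `status` reports can be approximated with a small JavaScript sketch. This is a hypothetical stand-in, not the code in `scripts/synthid_corpus.py`: it assumes the manifest is a CSV whose header names `label`, `width` and `height` columns, and that fields containing commas are double-quoted (as the ChatGPT filenames are). The helper names `splitCsvLine` and `tally` are illustrative.

```js
// Hypothetical sketch (not the synthid_corpus.py implementation): count manifest
// rows per label and per WxH resolution, similar to what `status` reports.
// Assumes one record per line (quoted newlines unsupported), a header row naming
// label/width/height, and RFC 4180-style quoting: commas may appear inside
// double-quoted fields, and "" inside quotes is a literal ".
function splitCsvLine(line) {
  const fields = [];
  let cur = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { cur += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                      // closing quote
      else cur += ch;
    } else if (ch === '"') {
      inQuotes = true;                                            // opening quote
    } else if (ch === ',') {
      fields.push(cur); cur = '';                                 // field boundary
    } else {
      cur += ch;
    }
  }
  fields.push(cur);
  return fields;
}

function tally(csvText) {
  const [header, ...rows] = csvText.split(/\r?\n/).filter(l => l.trim() !== '');
  const cols = splitCsvLine(header);
  for (const name of ['label', 'width', 'height']) {
    if (!cols.includes(name)) throw new Error(`manifest has no "${name}" column`);
  }
  const col = name => cols.indexOf(name);
  const byLabel = {};
  const byResolution = {};
  for (const row of rows) {
    const f = splitCsvLine(row);
    const label = f[col('label')];
    const res = `${f[col('width')]}x${f[col('height')]}`;
    byLabel[label] = (byLabel[label] || 0) + 1;
    byResolution[res] = (byResolution[res] || 0) + 1;
  }
  return { byLabel, byResolution };
}
```

Reading the manifest (e.g. with Node's `fs.readFileSync(path, 'utf8')`) and passing the text to `tally` returns two plain objects of counts. The hand-rolled splitter keeps the sketch dependency-free; a real CSV library is the safer choice for anything beyond this manifest.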
## Autonomous collection via Chrome MCP
Generation can be driven through the browser (the account must be logged in):
- **Gemini** (`gemini.google.com`): type `Create an image: <prompt>`, wait, hover the
result, click the download icon (top-right). Single, reliable click. Outputs
carry Google C2PA + SynthID. Occasionally the composer stalls in a
"generating" state -> start a New chat to reset.
- **ChatGPT** (`chatgpt.com`): the UI download is flaky (the fullscreen viewer
races and can grab the previous image; the share-modal path works but is
multi-step). Reliable path is an in-page fetch of the rendered image, which
preserves the original bytes (C2PA intact, unlike a canvas re-encode):
```js
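// run in the ChatGPT tab via the browser MCP javascript tool
(async () => {
  const imgs = [...document.querySelectorAll('img')].filter(i => i.naturalWidth >= 400);
  const img = imgs[imgs.length - 1]; // newest large image (last in DOM order)
  if (!img) return 'no large image found';
  const res = await fetch(img.currentSrc || img.src);
  if (!res.ok) return 'fetch failed: HTTP ' + res.status;
  const b = await res.blob(); // original bytes, no canvas re-encode
  const url = URL.createObjectURL(b);
  const a = document.createElement('a');
  a.href = url; a.download = 'dl.png';
  document.body.appendChild(a); a.click(); a.remove();
  setTimeout(() => URL.revokeObjectURL(url), 1000); // release the blob URL once the download has started
  return 'size=' + b.size; // do NOT return the src (privacy guard blocks query strings)
})()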
```
Gotcha: confirm the returned `size` differs from the previous image before
ingesting -- if the new image has not finished rendering, the script grabs the
prior one (the corpus dedups by sha256, but the notes would mislabel it).
ChatGPT also shows an A/B "which is better?" picker; click Skip first.
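Comparing `size` catches most stale grabs, but the corpus keys on sha256, so hashing the fetched bytes in the page gives an exact comparison against the previous download and the manifest. A minimal sketch using the Web Crypto API, assuming the manifest's `sha256` column is the hash of the raw file bytes; `sha256Hex` is a hypothetical helper name:

```js
// Hypothetical helper: hex-encoded SHA-256 of a Blob, for comparing a fresh
// download against the previous one (or the manifest's sha256 column) instead
// of comparing sizes.
async function sha256Hex(blob) {
  // Web Crypto is global in browsers (secure contexts) and in Node >= 19;
  // the require() fallback only exists so the helper also runs under Node 18.
  const subtle = globalThis.crypto?.subtle ?? require('node:crypto').webcrypto.subtle;
  const digest = await subtle.digest('SHA-256', await blob.arrayBuffer());
  return [...new Uint8Array(digest)]
    .map(byte => byte.toString(16).padStart(2, '0'))
    .join('');
}
```

Inside the download snippet, `return 'sha256=' + (await sha256Hex(b));` could replace the `size` return; a bare hex digest carries no query string, so it should not trip the privacy guard. The first 8 hex characters are what the manifest uses as the filename prefix.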
+13
@@ -2,3 +2,16 @@ sha256,filename,label,source,model,width,height,format,c2pa_issuer,synthid_metad
4ef377bde1a1d4eff141972841938643b173f5052992a018b9a21b31ac31731e,"4ef377bd-ChatGPT Image May 23, 2026, 02_43_02 PM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,openai-verify,2026-05-23T21:48:12Z,fresh post-rollout 2026-05-23; openai.com/verify: SynthID+C2PA detected
d09f84c0e4c6d8b336bf4a9a7277314e940dcb5052ae7051e785cbb3bb42d656,d09f84c0-Gemini_Generated_Image_vq7wkwvq7wkwvq7w.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,c2pa-metadata,2026-05-23T21:52:40Z,"user: latest Gemini, SynthID v2"
47188e88f956291bd38ab6906e5f21eb273d4a697ddc8b4479deac9f48915e1a,47188e88-disco_synthid_removed.png,cleaned,our pipeline (invisible/SDXL),stabilityai/stable-diffusion-xl-base-1.0,1254,1254,png,,,openai-verify,2026-05-23T22:06:54Z,cleaned from 4ef377bd disco; openai.com/verify: SynthID NOT detected (defeated)
52bb6bd524a74bff1b74ed893784c1c0ee76f48a9b45a009bae95ad4a57b7759,"52bb6bd5-ChatGPT Image May 24, 2026, 11_01_02 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
ee75ade07914d6306eae3443b3028782e0ac8c125a31b2d5a141f75aebdafb18,"ee75ade0-ChatGPT Image May 24, 2026, 11_10_04 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
9398c74dfad0f030633bd3ac224ba53e56a7ff9711d7b3a4c464e0073ece51b5,"9398c74d-ChatGPT Image May 24, 2026, 11_10_09 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
79c3733895e82e3c9e506de0ddd6dfbf20ba09171263d88444520777151868c8,"79c37338-ChatGPT Image May 24, 2026, 11_10_13 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
38315a0a83aa0e094a50520ea44e01aa26115927f09a532ff2ab8636de743e0a,"38315a0a-ChatGPT Image May 24, 2026, 11_10_19 AM.png",pos,ChatGPT,gpt-image,1254,1254,png,OpenAI,yes,c2pa-metadata,2026-05-24T18:49:35Z,solid black ref (spectral pilot 2026-05-24)
f3a1fbc3bc8f768265400724bb9800d322f8e0b1461b2c585540845ea8352c5d,f3a1fbc3-winter_scene_X.png,neg,local (SDXL/processed),sdxl-or-processed,2816,1536,png,,,none,2026-05-24T18:52:23Z,metadata-clean: no C2PA/SynthID source
f07bc0bcad09a5a5687ae312a1298c9ddd110a5e414265efc52ef4d524b36f86,f07bc0bc-api_enterprise_gate_ax.png,neg,local (SDXL/processed),sdxl-or-processed,1080,1350,png,,,none,2026-05-24T18:52:23Z,metadata-clean: no C2PA/SynthID source
89571987e368f1ce82f9dedfa9101584434def842e50d1f4c759de64db5c21d9,89571987-c87bd3c48a4443a68cb84a65604dacd3_clean.png,neg,local (SDXL/processed),sdxl-or-processed,2816,1536,png,,,none,2026-05-24T18:52:23Z,metadata-clean: no C2PA/SynthID source
7b650522d42db09568e249c04d683c469fb3e280a2c53fcd1031cb9df27c619a,"7b650522-ChatGPT Image May 24, 2026, 12_19_54 PM.png",pos,ChatGPT,gpt-image,1602,982,png,OpenAI,yes,c2pa-metadata,2026-05-24T19:20:25Z,content: misty pine forest at dawn
fb28dba2a82cc101a92fdee5714867b32610d0564f37737fe4bb70782b8ecf32,fb28dba2-Gemini_Generated_Image_dsjlnsdsjlnsdsjl.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,c2pa-metadata,2026-05-24T19:30:25Z,content: elderly fisherman portrait
d20d4cc936dbdfe909c52502039a9e84ba93d97b42b24a0acee5b7d6c71930ae,d20d4cc9-Gemini_Generated_Image_ug6kdpug6kdpug6k.png,pos,Gemini app,gemini,2816,1536,png,Google LLC,yes,c2pa-metadata,2026-05-24T19:33:15Z,content: red coffee mug product shot
28f323345f6496d936c3f1a72f671ddf59d0f81565c24a63bf3286860f633afe,28f32334-chatgpt_fisherman.png,pos,ChatGPT,gpt-image,1023,1537,png,OpenAI,yes,c2pa-metadata,2026-05-24T19:39:20Z,content: elderly fisherman portrait (fetch+blob dl)
88e61a384c2e0b12d97bc66046e4a10542b2987448ba89c4b49e66311e969c84,88e61a38-chatgpt_tokyo.png,pos,ChatGPT,gpt-image,1023,1537,png,OpenAI,yes,c2pa-metadata,2026-05-24T19:42:02Z,content: tokyo street night (fetch+blob dl)
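Every filename in the manifest rows above begins with the first 8 hex characters of the row's `sha256` followed by `-`. A hypothetical lint for that convention (not part of `synthid_corpus.py`), which reads only the first two columns and tolerates the double-quoted ChatGPT filenames:

```js
// Hypothetical lint: flags manifest rows whose filename does not start with
// "<first 8 hex chars of sha256>-". Only the first two columns are inspected;
// the optional leading quote covers names like "ChatGPT Image May 24, 2026, ...".
function prefixMismatches(csvText) {
  const bad = [];
  csvText.split(/\r?\n/).forEach((line, i) => {
    if (i === 0 || line.trim() === '') return; // skip the header and blank lines
    const m = /^([0-9a-f]{64}),"?([^-]*)-/.exec(line);
    // No match means a malformed hash or a filename without the "-" separator.
    if (!m || m[2] !== m[1].slice(0, 8)) bad.push(i + 1); // 1-based CSV line number
  });
  return bad;
}
```

It returns the 1-based line numbers of offending rows, so an empty array means names and hashes agree; it does not re-hash the image files themselves.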