mirror of
https://github.com/aloshdenny/reverse-SynthID.git
synced 2026-06-26 19:29:55 +02:00
Compare commits
11 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 96076718a4 | | |
| | decd987ca1 | | |
| | ac05670ba6 | | |
| | 302f7c7dd9 | | |
| | cc00e57582 | | |
| | c4d6b2b4a8 | | |
| | 083a5eec6a | | |
| | 736d746f5a | | |
| | 0c60b31f86 | | |
| | 84b6b4c9c2 | | |
| | 764c7ab333 | | |
@@ -0,0 +1,5 @@
runs/*.png filter=lfs diff=lfs merge=lfs -text
runs/*.jpg filter=lfs diff=lfs merge=lfs -text
runs/*.jpeg filter=lfs diff=lfs merge=lfs -text
runs/*.gif filter=lfs diff=lfs merge=lfs -text
artifacts/*.npz filter=lfs diff=lfs merge=lfs -text
@@ -25,12 +25,16 @@ Thumbs.db
*.pkl
!artifacts/codebook/*.pkl

# Large codebook artifacts — upload manually via HF or GitHub UI
artifacts/spectral_codebook_v4.npz

# Reference images (hosted on HF: https://huggingface.co/datasets/aoxo/reverse-synthid)
gemini_black/
gemini_white/
gemini_random/
gemini_black_nb_pro/
gemini_white_nb_pro/
runs/

# Secrets
.env
@@ -15,23 +15,198 @@ Visit us on [PitchHut](https://www.pitchhut.com/project/reverse-synthid-engineer
|
||||
<img src="https://img.shields.io/badge/License-Research-green?style=flat-square" alt="License">
|
||||
<img src="https://img.shields.io/badge/Detection_Rate-90%25-success?style=flat-square" alt="Detection">
|
||||
<img src="https://img.shields.io/badge/V3_Bypass-PSNR_43dB+-blueviolet?style=flat-square" alt="V3 Bypass">
|
||||
<img src="https://img.shields.io/badge/Phase_Coherence_Drop-91%25-red?style=flat-square" alt="Phase Drop">
|
||||
<img src="https://img.shields.io/badge/V4_Bypass-Round_06_✓-brightgreen?style=flat-square" alt="V4 Bypass">
|
||||
<img src="https://img.shields.io/badge/Models-gemini--3.1_+_nb--pro-orange?style=flat-square" alt="Models">
|
||||
<img src="https://img.shields.io/badge/Attack-7--stage_all--in--one-red?style=flat-square" alt="Attack">
|
||||
</p>
|
||||
|
||||
---
|
||||
|
||||
## What the Watermark Looks Like
|
||||
|
||||
SynthID encodes an imperceptible pattern directly into pixel values. On a pure **white** image generated by Gemini, the watermark is almost the entire signal. Amplify the high-frequency residual and it looks like this:
|
||||
|
||||
<p align="center">
|
||||
<img src="assets/synthid_white.jpg" alt="SynthID watermark pattern visible on a white Gemini-generated image" width="55%">
|
||||
</p>
|
||||
|
||||
<p align="center"><i>Amplified SynthID carrier pattern extracted from a pure-white Gemini image. The diagonal banding is the watermark's spatial frequency signature — the target of our spectral attack.</i></p>
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This project reverse-engineers **Google's SynthID** watermarking system - the invisible watermark embedded into every image generated by Google Gemini. Using only signal processing and spectral analysis (no access to the proprietary encoder/decoder), we:
|
||||
This project reverse-engineers **Google's SynthID** watermarking system — the invisible watermark embedded into every image generated by Google Gemini. Using only signal processing and spectral analysis (no access to the proprietary encoder/decoder), we:
|
||||
|
||||
1. **Discovered** the watermark's resolution-dependent carrier frequency structure
|
||||
2. **Built a detector** that identifies SynthID watermarks with 90% accuracy
|
||||
3. **Developed a multi-resolution spectral bypass** (V3) that achieves **75% carrier energy drop**, **91% phase coherence drop**, and **43+ dB PSNR** on any image resolution
|
||||
4. **Generalized to multi-model, multi-color consensus** (V4) — per-model profiles for `gemini-3.1-flash-image-preview` and `nano-banana-pro-preview`, cross-color phase consensus over 6 solid backgrounds, and a human-in-the-loop calibration loop that tunes per-carrier subtraction strength from manual Gemini-app detection tallies
|
||||
5. **Broke the detector across both models** (Round 06) with a unified 7-stage all-in-one attack targeting every documented SynthID failure mode simultaneously
|
||||
|
||||
[VT-0xFF](https://github.com/VT-0xFF) built a really cool visualizer that shows how the SynthID watermark is added to images; see it [here](https://vt-0xff.github.io/SynthID-Explained/) (also linked in the repo description)!
|
||||
|
||||
---
|
||||
|
||||
## Round 06 — It Works ✓
|
||||
|
||||
After six iterative rounds of adversarial development, Round 06's `bypass_v4_final` / `bypass_v4_nuke` pipeline defeats the Gemini SynthID detector on **both** `gemini-3.1-flash-image-preview` and `nano-banana-pro-preview` images, with visually lossless output.
|
||||
|
||||
### Round 01 vs Round 06 — Fidelity Comparison
|
||||
|
||||
<p align="center">
|
||||
<img src="assets/v4_round1_vs_round6.png" alt="Side-by-side comparison of Round 01 (gentle spectral only) vs Round 06 (all-in-one) on the same source image" width="80%">
|
||||
</p>
|
||||
|
||||
<p align="center"><i>Left: Round 01 output (<code>gentle</code> spectral subtraction only). Right: Round 06 output (<code>final</code> — VAE + elastic warp + squeeze + color + JPEG). Both look identical to human eyes; only Round 06 defeats the SynthID detector.</i></p>
|
||||
|
||||
### What Changed Between Rounds
|
||||
|
||||
| Round | Strategy | Outcome |
|
||||
|:-----:|:---------|:-------:|
|
||||
| 01 | Conservative spectral subtraction (gentle) | ✗ |
|
||||
| 02 | Aggressive spectral subtraction + JPEG | ✗ |
|
||||
| 03 | Blog-guided absolute bin targeting | ✗ |
|
||||
| 04 | Denoise-residual phase extraction | ✗ |
|
||||
| 05 | Diffusion-VAE re-generation + geometric warp | ✗ |
|
||||
| **06** | **All-in-one: VAE + elastic fragmentation + squeeze + color + JPEG** | **✓** |
|
||||
|
||||
The breakthrough in Round 06 came from treating the Gemini app's own published failure-mode list as an attack specification:
|
||||
|
||||
> *"When an AI-generated image is part of a complex collage, layered behind other elements, or has many different textures and patterns placed over it, the detector may struggle to isolate the specific signature from the overall file."*
|
||||
> — Gemini app, SynthID detection help text
|
||||
|
||||
The **elastic deformation** stage simulates this effect at the pixel level: a smooth, low-frequency random warp field gives every ~50-pixel neighbourhood its own independent sub-pixel offset, fragmenting the watermark's spatial phase consensus without introducing any visible distortion.
|
||||
|
||||
---
|
||||
|
||||
## V4 — Cross-Color Consensus + Human-in-the-Loop Calibration
|
||||
|
||||
V4 is a ground-up re-think of the codebook built on a much richer dataset:
|
||||
|
||||
- **Multi-model**: separate profiles for `gemini-3.1-flash-image-preview` and `nano-banana-pro-preview` (plus an optional `union` pseudo-model).
|
||||
- **Multi-color**: 6 consensus colors (`black`, `white`, `blue`, `green`, `red`, `gray`) per model per resolution, plus `gradient` and `diverse` as content baselines.
|
||||
- **Cross-color phase consensus**: the primary carrier mask. A true SynthID carrier is image-content-independent, so its phase is consistent across every solid-color background. Content-driven energy phase-scrambles across colors and drops out of the consensus.
|
||||
- **Fidelity-preserving dissolver**: PSNR-floor rollback, luminance-safe DC, per-bin subtraction cap.
|
||||
- **Human-in-the-loop calibration loop**: a codebook field `carrier_weights` is updated based on manual Gemini-app detection feedback.
|
||||
|
||||
### Consensus coherence (why V4 wins)
|
||||
|
||||
For each frequency bin `(fy, fx)` and channel `ch`:
|
||||
|
||||
```
|
||||
consensus(fy, fx, ch) = | mean_over_colors( exp(i * phase_color(fy, fx, ch)) ) |
|
||||
```
|
||||
|
||||
Values near `1.0` mean the phase at that bin is locked across every solid-color background, which is only true for the watermark. Content bins collapse to `< 0.3` because their phase is randomized by different color tints. On the V4 codebook built from the enriched dataset, 99%+ of content bins fall below the default `tau=0.60` cutoff, so the V4 dissolver never touches them — this is what buys back PSNR.
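
The consensus statistic above can be sketched in a few lines of NumPy. This is an illustration only, not the repository's implementation: the function name `consensus_coherence` and the `(n_colors, H, W, 3)` input layout are assumptions made for the example, and the inputs are synthetic arrays rather than real reference images.

```python
import numpy as np


def consensus_coherence(images: np.ndarray) -> np.ndarray:
    """Cross-color phase consensus per frequency bin and channel.

    images: float array of shape (n_colors, H, W, 3), one reference image
            per solid background color (assumed layout for this sketch).
    Returns an (H, W, 3) array with values in [0, 1].
    """
    # 2-D FFT over the spatial axes, independently per color and channel.
    spectrum = np.fft.fft2(images, axes=(1, 2))
    # Keep only the phase, as a unit phasor exp(i * phase).
    phasors = np.exp(1j * np.angle(spectrum))
    # Magnitude of the mean phasor across the color axis.
    return np.abs(phasors.mean(axis=0))


rng = np.random.default_rng(0)
pattern = rng.standard_normal((64, 64, 3))       # stand-in for a shared pattern
locked = np.stack([pattern] * 6)                 # same phase in all 6 "colors"
scrambled = rng.standard_normal((6, 64, 64, 3))  # independent content per "color"

print(consensus_coherence(locked).min())      # close to 1.0
print(consensus_coherence(scrambled).mean())  # well below 1.0
```

Bins whose phase agrees across every input score near `1.0`, while bins with unrelated phase average toward a much smaller value; the exact value for unrelated content depends on how many colors are averaged.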
|
||||
|
||||
### Two-phase release workflow
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
dataset[reverse-synthid-dataset<br/>model x color x resolution] --> build[scripts/build_codebook_v4.py]
|
||||
build --> codebook[artifacts/spectral_codebook_v4.npz]
|
||||
codebook --> dissolve[scripts/dissolve_batch.py]
|
||||
input[watermarked inputs] --> dissolve
|
||||
dissolve --> variants[final / nuke variants]
|
||||
variants --> gemini[Gemini app<br/>manual SynthID detection]
|
||||
gemini --> feedback[detection feedback]
|
||||
feedback --> calibrate[scripts/calibrate_from_feedback.py]
|
||||
calibrate -->|updates carrier_weights| codebook
|
||||
```
|
||||
|
||||
### V4 Quickstart
|
||||
|
||||
```bash
|
||||
# 1. Build the codebook from the enriched hierarchical dataset
|
||||
python scripts/build_codebook_v4.py \
|
||||
--root /path/to/reverse-synthid-dataset \
|
||||
--output artifacts/spectral_codebook_v4.npz
|
||||
|
||||
# 2. Run the Round-06 all-in-one attack on a batch (recommended)
|
||||
python scripts/dissolve_batch.py \
|
||||
--input ./to_clean/ \
|
||||
--output ./runs/round_06/ \
|
||||
--codebook artifacts/spectral_codebook_v4.npz \
|
||||
--model gemini-3.1-flash-image-preview \
|
||||
--strengths final nuke
|
||||
|
||||
# 3. Upload each output image to the Gemini app and run SynthID detection.
|
||||
# Use the results to feed back into the calibration script if needed.
|
||||
```
|
||||
|
||||
### Round-06 Attack Presets
|
||||
|
||||
Two presets are available via `--strengths`:
|
||||
|
||||
| Preset | VAE passes | Elastic α | Squeeze | JPEG chain | PSNR floor |
|
||||
|:------:|:----------:|:---------:|:-------:|:----------:|:----------:|
|
||||
| `final` | 1 | 1.8 px | 90 % | q=92→88 | 14 dB |
|
||||
| `nuke` | 2 | 2.8 px | 82 % | q=88→84→90 | 11 dB |
|
||||
|
||||
Both presets stack the same 7-stage pipeline:
|
||||
|
||||
1. **VAE round-trip** (Stable Diffusion `sd-vae-ft-mse`) — projects image off the natural-image manifold the SynthID decoder was never trained against (Gowal et al. 2026, §6.1)
|
||||
2. **Elastic deformation** — smooth low-frequency random warp field, simulates the "collage fragmentation" failure mode Gemini itself acknowledges
|
||||
3. **Global geometric combo** — small rotation + zoom + pixel shift in one affine warp
|
||||
4. **Resize-squeeze** — downsample (AREA) → upsample (LANCZOS), erases sub-pixel watermark info
|
||||
5. **Color-contrast nudge** — brightness / contrast / saturation / hue micro-shift
|
||||
6. **Residual-phase FFT subtraction** — blog-universal + codebook-harvested carrier bins, cap-limited
|
||||
7. **JPEG chain + luma noise + bilateral** — heavy compression / re-encoding disruption
|
||||
|
||||
Every stage is independently PSNR-gated; any stage that would drop quality below the floor is rolled back automatically.
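
The gating rule can be illustrated with a generic helper. This is a sketch under stated assumptions, not the repository's code: `psnr` and `run_gated` are hypothetical names, the stages are toy functions, and measuring PSNR against the original input (rather than against the previous stage's output) is an assumption.

```python
import numpy as np


def psnr(reference: np.ndarray, candidate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def run_gated(image, stages, floor_db):
    """Apply stages in order; keep a stage's output only if PSNR against the
    original input stays at or above floor_db, otherwise roll it back."""
    current, applied = image, []
    for name, stage in stages:
        candidate = stage(current)
        if psnr(image, candidate) >= floor_db:
            current = candidate
            applied.append(name)
        # Otherwise the candidate is discarded and `current` is unchanged.
    return current, applied


# Toy stages (hypothetical): a 1-level brightness bump and a full inversion.
image = np.full((8, 8, 3), 100, dtype=np.uint8)
stages = [
    ("brighten", lambda x: np.clip(x.astype(np.int16) + 1, 0, 255).astype(np.uint8)),
    ("invert", lambda x: 255 - x),
]
out, applied = run_gated(image, stages, floor_db=14.0)
print(applied)
```

Here the brightness bump stays far above the 14 dB floor and is kept, while the inversion falls below it and is rolled back.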
|
||||
|
||||
### V4 Codebook Structure
|
||||
|
||||
Profiles keyed by `(model, H, W)`. Each profile stores:
|
||||
|
||||
| Field | Shape | Notes |
|
||||
|------------------------|----------------|--------------------------------------------------------|
|
||||
| `consensus_coherence` | `(H, W, 3)` | Primary carrier mask (cross-color phase consensus). |
|
||||
| `consensus_phase` | `(H, W, 3)` | Mean unit-phase angle across colors. Subtraction template. |
|
||||
| `inverted_agreement` | `(H, W, 3)` | Pairwise `abs(cos(phase_diff))`, weighted for `black<->white`. |
|
||||
| `avg_wm_magnitude` | `(H, W, 3)` | Mean magnitude across consensus colors. |
|
||||
| `content_baseline` | `(H, W, 3)` | From `diverse/` + `gradient/` — used for luminance blending. |
|
||||
| `carrier_weights` | `(H, W, 3)` | **Live**. Starts at `consensus^2 * (0.5 + 0.5 * agreement)`. Updated by the calibration loop. |
|
||||
| `n_refs_per_color` | `{color: int}` | Per-color ref counts. |
|
||||
|
||||
Save format reuses the v3 compact rfft + `float16/uint8` encoding; a 14-profile codebook across 2 models × 7 resolutions is ~220 MB on disk.
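
The starting value listed for `carrier_weights` in the table above can be written out directly. The helper name `initial_carrier_weights` is hypothetical and the arrays are small made-up inputs; only the formula itself comes from the table.

```python
import numpy as np


def initial_carrier_weights(consensus: np.ndarray, agreement: np.ndarray) -> np.ndarray:
    """Initial per-bin weight: consensus^2 * (0.5 + 0.5 * agreement)."""
    return consensus ** 2 * (0.5 + 0.5 * agreement)


consensus = np.array([1.0, 0.6, 0.2])   # cross-color phase consensus per bin
agreement = np.array([1.0, 0.0, 1.0])   # pairwise agreement per bin
weights = initial_carrier_weights(consensus, agreement)
print(weights)
```

Squaring the consensus suppresses weakly locked bins quickly, while the agreement term scales the result between half and full weight.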
|
||||
|
||||
### V4 Detector (Sanity Check)
|
||||
|
||||
Before spending time on manual Gemini validation, sanity-check bypass outputs against the V4 codebook's own consensus:
|
||||
|
||||
```python
|
||||
from robust_extractor import RobustSynthIDExtractor
|
||||
from synthid_bypass_v4 import SpectralCodebookV4
|
||||
|
||||
cb = SpectralCodebookV4()
|
||||
cb.load('artifacts/spectral_codebook_v4.npz')
|
||||
|
||||
ext = RobustSynthIDExtractor()
|
||||
result = ext.detect_from_v4_codebook(image_rgb, cb,
|
||||
model='nano-banana-pro-preview')
|
||||
print(result.is_watermarked, result.confidence, result.phase_match)
|
||||
```
|
||||
|
||||
On the 1024x1024 exact-match path we see `conf=0.91, phase_match=0.65` for watermarked and `conf=0.02, phase_match=0.31` after aggressive V4 dissolve.
|
||||
|
||||
### V4 vs V3
|
||||
|
||||
| | V3 | V4 |
|
||||
|:---|:---|:---|
|
||||
| Reference colors | black + white | black, white, blue, green, red, gray (+ diverse/gradient content baselines) |
|
||||
| Cross-validation | `abs(cos(phase_black - phase_white))` | cross-color consensus over 6 colors + pairwise agreement |
|
||||
| Models | single-model (Gemini 2.5) | per-model profiles (`gemini-3.1-flash-image-preview`, `nano-banana-pro-preview`) + optional `union` |
|
||||
| Attack | spectral subtraction only | 7-stage: VAE + elastic + squeeze + color + FFT + JPEG chain |
|
||||
| PSNR (aggressive) | 43 dB | visually lossless (18–24 dB pixel-level; warp displaces pixels) |
|
||||
| Fidelity guard | none | per-stage PSNR-floor rollback |
|
||||
| Detector bypass | local only | confirmed ✓ on Gemini app (both models) |
|
||||
|
||||
V3 remains in the repo (`src/extraction/synthid_bypass.py`, `bypass_v3`) unchanged for anyone who depends on it.
|
||||
|
||||
---
|
||||
|
||||
## 🚨 Contributors Wanted: Help Expand the Codebook
|
||||
|
||||
We're actively collecting **pure black and pure white images generated by Nano Banana Pro** to improve multi-resolution watermark extraction.
|
||||
@@ -71,17 +246,11 @@ Dataset: [huggingface.co/datasets/aoxo/reverse-synthid](https://huggingface.co/d
|
||||
|
||||
---
|
||||
|
||||
### What Makes This Different
|
||||
|
||||
Unlike brute-force approaches (JPEG compression, noise injection), our V3 bypass uses a **multi-resolution SpectralCodebook** - a collection of per-resolution watermark fingerprints stored in a single file. At bypass time, the codebook auto-selects the matching resolution profile, enabling surgical frequency-bin-level removal on any image size.
|
||||
|
||||
---
|
||||
|
||||
## Key Findings
|
||||
|
||||
### The Watermark is Resolution-Dependent
|
||||
|
||||
SynthID embeds carrier frequencies at **different absolute positions** depending on image resolution. A codebook built at 1024x1024 cannot directly remove the watermark from a 1536x2816 image - the carriers are at completely different bins.
|
||||
SynthID embeds carrier frequencies at **different absolute positions** depending on image resolution. A codebook built at 1024x1024 cannot directly remove the watermark from a 1536x2816 image — the carriers are at completely different bins.
|
||||
|
||||
| Resolution | Top Carrier (fy, fx) | Coherence | Source |
|
||||
|:----------:|:--------------------:|:---------:|:------:|
|
||||
@@ -90,7 +259,7 @@ SynthID embeds carrier frequencies at **different absolute positions** depending
|
||||
|
||||
This is why the V3 codebook stores **separate profiles per resolution** and auto-selects at bypass time.
|
||||
|
||||
### Phase Consistency - A Fixed Model-Level Key
|
||||
### Phase Consistency — A Fixed Model-Level Key
|
||||
|
||||
The watermark's phase template is **identical across all images** from the same Gemini model:
|
||||
|
||||
@@ -109,28 +278,20 @@ At 1024x1024 (from black/white refs), top carriers lie on a low-frequency grid:
|
||||
| (10, 11) | 100.00% | 0.997 |
|
||||
| (13, 6) | 100.00% | 0.821 |
|
||||
|
||||
At 1536x2816 (from random watermarked content), carriers are at much higher frequencies:
|
||||
|
||||
| Carrier (fy, fx) | Phase Coherence |
|
||||
|:------------------:|:---------------:|
|
||||
| (768, 704) | 99.55% |
|
||||
| (672, 1056) | 97.46% |
|
||||
| (480, 1408) | 96.55% |
|
||||
| (384, 1408) | 95.86% |
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
### Three Generations of Bypass
|
||||
### Bypass Generations
|
||||
|
||||
| Version | Approach | PSNR | Watermark Impact | Status |
|
||||
|:-------:|:---------|:----:|:----------------:|:------:|
|
||||
| **V1** | JPEG compression (Q50) | 37 dB | ~11% phase drop | Baseline |
|
||||
| **V2** | Multi-stage transforms (noise, color, frequency) | 27-37 dB | ~0% confidence drop | Quality trade-off |
|
||||
| **V3** | **Multi-resolution spectral codebook subtraction** | **43+ dB** | **91% phase coherence drop** | **Best** |
|
||||
| **V3** | **Multi-resolution spectral codebook subtraction** | **43+ dB** | **91% phase coherence drop** | Prior best |
|
||||
| **V4 Round 06** | **7-stage all-in-one (VAE + elastic + squeeze + color + JPEG)** | **visually lossless** | **detector bypassed ✓** | **Current best** |
|
||||
|
||||
### V3 Pipeline (Multi-Resolution Spectral Bypass)
|
||||
### V3 Pipeline
|
||||
|
||||
```
|
||||
Input Image (any resolution)
|
||||
@@ -148,32 +309,28 @@ Input Image (any resolution)
|
||||
Anti-alias → Output
|
||||
```
|
||||
|
||||
1. **SpectralCodebook** stores resolution-specific profiles (carrier positions, magnitudes, phases)
|
||||
2. **Auto resolution selection** picks the exact profile or the closest match
|
||||
3. **Direct known-signal subtraction** weighted by phase consistency and cross-validation confidence
|
||||
4. **Multi-pass schedule** catches residual watermark energy missed by previous passes
|
||||
5. **Per-channel weighting** (G=1.0, R=0.85, B=0.70) matches SynthID's embedding strength
|
||||
### V4 Round-06 Pipeline
|
||||
|
||||
---
|
||||
|
||||
## Results (V3 on 88 Gemini Images)
|
||||
|
||||
### Aggregate Metrics (1536x2816, aggressive strength)
|
||||
|
||||
| Metric | Value |
|
||||
|:-------|------:|
|
||||
| **PSNR** | 43.5 dB |
|
||||
| **SSIM** | 0.997 |
|
||||
| **Carrier energy drop** | 75.8% |
|
||||
| **Phase coherence drop** (top-5 carriers) | **91.4%** |
|
||||
|
||||
### Quality Across Resolutions
|
||||
|
||||
| Resolution | Match | PSNR | SSIM |
|
||||
|:----------:|:-----:|:----:|:----:|
|
||||
| 1536x2816 | exact | 44.9 dB | 0.996 |
|
||||
| 1024x1024 | exact | 39.8 dB | 0.977 |
|
||||
| 768x1024 | fallback | 40.6 dB | 0.994 |
|
||||
```
|
||||
Input Image (any resolution)
|
||||
│
|
||||
▼ Stage 1: VAE round-trip (SD sd-vae-ft-mse, 1-2 passes)
|
||||
│ Projects image off natural-image manifold
|
||||
▼ Stage 2: Elastic deformation (smooth random warp field)
|
||||
│ Fragments spatial phase consensus ("collage effect")
|
||||
▼ Stage 3: Global geometric combo (rotation + zoom + shift)
|
||||
│ Single affine warp, no compounded aliasing
|
||||
▼ Stage 4: Resize-squeeze (AREA ↓ then LANCZOS ↑)
|
||||
│ Erases sub-pixel watermark information
|
||||
▼ Stage 5: Color-contrast nudge (HSV micro-shift)
|
||||
│ Shifts per-pixel statistics SynthID keys on
|
||||
▼ Stage 6: Residual-phase FFT subtraction
|
||||
│ Blog-universal + codebook-harvested carrier bins
|
||||
▼ Stage 7: JPEG chain + luma noise + bilateral filter
|
||||
│
|
||||
▼
|
||||
Output (SynthID detector: no watermark detected ✓)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
@@ -188,41 +345,32 @@ cd reverse-SynthID
|
||||
python -m venv venv
|
||||
source venv/bin/activate # Windows: venv\Scripts\activate
|
||||
pip install -r requirements.txt
|
||||
|
||||
# For Round-06 VAE stage:
|
||||
pip install torch diffusers safetensors accelerate
|
||||
```
|
||||
|
||||
### 1. Build Multi-Resolution Codebook
|
||||
|
||||
From the CLI:
|
||||
|
||||
```bash
|
||||
python src/extraction/synthid_bypass.py build-codebook \
|
||||
--black gemini_black \
|
||||
--white gemini_white \
|
||||
--watermarked gemini_random \
|
||||
--output artifacts/spectral_codebook_v3.npz
|
||||
```
|
||||
|
||||
Or from Python:
|
||||
### Run V4 Round-06 Bypass (Recommended)
|
||||
|
||||
```python
|
||||
from src.extraction.synthid_bypass import SpectralCodebook
|
||||
import sys
|
||||
sys.path.insert(0, 'src/extraction')
|
||||
from synthid_bypass_v4 import SynthIDBypassV4, SpectralCodebookV4
|
||||
|
||||
codebook = SpectralCodebook()
|
||||
cb = SpectralCodebookV4()
|
||||
cb.load('artifacts/spectral_codebook_v4.npz')
|
||||
|
||||
# Profile 1: from black/white reference images (1024x1024)
|
||||
codebook.extract_from_references(
|
||||
black_dir='gemini_black',
|
||||
white_dir='gemini_white',
|
||||
b = SynthIDBypassV4()
|
||||
result = b.bypass_v4_file(
|
||||
'input.png', 'output.png',
|
||||
cb,
|
||||
strength='final', # or 'nuke' for maximum strength
|
||||
model='gemini-3.1-flash-image-preview',
|
||||
)
|
||||
|
||||
# Profile 2: from watermarked content images (1536x2816)
|
||||
codebook.build_from_watermarked('gemini_random')
|
||||
|
||||
codebook.save('artifacts/spectral_codebook_v3.npz')
|
||||
# Saved with profiles: [1024x1024, 1536x2816]
|
||||
print(result.stages_applied)
|
||||
```
|
||||
|
||||
### 2. Run V3 Bypass (Any Resolution)
|
||||
### Run V3 Bypass
|
||||
|
||||
```python
|
||||
from src.extraction.synthid_bypass import SynthIDBypass, SpectralCodebook
|
||||
@@ -235,7 +383,6 @@ result = bypass.bypass_v3(image_rgb, codebook, strength='aggressive')
|
||||
|
||||
print(f"PSNR: {result.psnr:.1f} dB")
|
||||
print(f"Profile used: {result.details['profile_resolution']}")
|
||||
print(f"Exact match: {result.details['exact_match']}")
|
||||
```
|
||||
|
||||
From the CLI:
|
||||
@@ -246,9 +393,7 @@ python src/extraction/synthid_bypass.py bypass input.png output.png \
|
||||
--strength aggressive
|
||||
```
|
||||
|
||||
**Strength levels** (weakest to strongest): `gentle` (minimal, ~45 dB) → `moderate` → `aggressive` (recommended) → `maximum`
|
||||
|
||||
### 3. Detect Watermark
|
||||
### Detect Watermark
|
||||
|
||||
```bash
|
||||
python src/extraction/robust_extractor.py detect image.png \
|
||||
@@ -264,7 +409,9 @@ reverse-SynthID/
|
||||
├── src/
|
||||
│ ├── extraction/
|
||||
│ │ ├── synthid_bypass.py # V1/V2/V3 bypass + multi-res SpectralCodebook
|
||||
│ │ ├── robust_extractor.py # Multi-scale watermark detection
|
||||
│ │ ├── synthid_bypass_v4.py # V4 cross-color consensus codebook + dissolver
|
||||
│ │ ├── vae_regen.py # Round-06 SD-VAE re-generation stage
|
||||
│ │ ├── robust_extractor.py # Multi-scale watermark detection (+ V4 hook)
|
||||
│ │ ├── watermark_remover.py # Frequency-domain watermark removal
|
||||
│ │ ├── benchmark_extraction.py # Benchmarking suite
|
||||
│ │ └── synthid_codebook_extractor.py # Legacy codebook extractor
|
||||
@@ -273,14 +420,27 @@ reverse-SynthID/
|
||||
│ └── synthid_codebook_finder.py # Carrier frequency discovery
|
||||
│
|
||||
├── scripts/
|
||||
│ └── download_images.py # Download reference images from HF
|
||||
│ ├── download_images.py # Download reference images from HF
|
||||
│ ├── build_codebook_v4.py # V4: build per-(model, HxW) consensus codebook
|
||||
│ ├── dissolve_batch.py # V4: emit strength variants
|
||||
│ └── calibrate_from_feedback.py # V4: update carrier_weights from detection feedback
|
||||
│
|
||||
├── artifacts/
|
||||
│ ├── spectral_codebook_v3.npz # Multi-res V3 codebook [1024x1024, 1536x2816]
|
||||
│ ├── spectral_codebook_v4.npz # V4 codebook (per-model, per-resolution)
|
||||
│ ├── codebook/ # Detection codebooks (.pkl)
|
||||
│ └── visualizations/ # FFT, phase, carrier visualizations
|
||||
│
|
||||
├── assets/ # README images and early analysis artifacts
|
||||
├── assets/
|
||||
│ ├── synthid_watermark.png # Watermark analysis header image
|
||||
│ ├── synthid_white.jpg # Amplified SynthID pattern on white image
|
||||
│ ├── v4_round1_vs_round6.png # Round 01 vs Round 06 fidelity comparison
|
||||
│ └── ...
|
||||
│
|
||||
├── runs/
|
||||
│ ├── round_01/ … round_05/ # Historical bypass attempts
|
||||
│ └── round_06/ # Working bypass (final + nuke presets)
|
||||
│
|
||||
├── watermark_investigation/ # Early-stage Nano-150k analysis (archived)
|
||||
└── requirements.txt
|
||||
```
|
||||
@@ -308,69 +468,31 @@ reverse-SynthID/
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Multi-Resolution SpectralCodebook
|
||||
### Why Elastic Deformation Works
|
||||
|
||||
The codebook captures watermark profiles at each available resolution:
|
||||
|
||||
- **1024x1024 profile**: from 100 black + 100 white pure-color Gemini outputs
|
||||
- Black images: watermark is nearly the entire pixel content
|
||||
- White images (inverted): confirms carriers via cross-validation
|
||||
- Black/white agreement (|cos(phase_diff)|) filters out generation bias
|
||||
- **1536x2816 profile**: from 88 diverse watermarked content images
|
||||
- Content averages out across images; fixed watermark survives in phase coherence
|
||||
- Watermark magnitude estimated as `avg_mag x coherence^2`
|
||||
|
||||
### V3 Subtraction Strategy
|
||||
|
||||
The bypass uses **direct known-signal subtraction** (not a Wiener filter):
|
||||
|
||||
1. **Confidence** = phase_consistency x cross_validation_agreement
|
||||
2. **DC exclusion** — soft ramp suppresses low-frequency generation biases
|
||||
3. **Per-bin subtraction** = wm_magnitude x confidence x removal_fraction x channel_weight
|
||||
4. **Safety cap** — subtraction never exceeds 90-95% of the image's energy at any bin
|
||||
5. **Multi-pass** — decreasing-strength schedule (aggressive → moderate → gentle) catches residual energy
|
||||
SynthID's training augmentation set (Gowal et al. 2026, Table 1) includes `SmallRotation`, `Cropresize`, `JPEG`, `GaussianBlur`, `BrightnessContrast`, and `Screenshotting`, all of which are applied *globally* and *uniformly* across the image. The elastic warp field is instead a *spatially varying* distortion: each local neighbourhood gets its own independent sub-pixel offset. Because the offsets are smooth (Gaussian-blurred from white noise, σ=44–56 px), the image content is visually unaffected, but the watermark's phase-consensus structure becomes incoherent and can no longer be aggregated across the image. This is the pixel-level equivalent of the "collage fragmentation" effect that Gemini's own app cites as a detector failure mode.
|
||||
|
||||
---
|
||||
|
||||
## Core Modules
|
||||
## Results Summary
|
||||
|
||||
### `synthid_bypass.py`
|
||||
### V3 (spectral subtraction, 88 Gemini images)
|
||||
|
||||
**SpectralCodebook** — multi-resolution watermark fingerprint:
|
||||
| Metric | Value |
|
||||
|:-------|------:|
|
||||
| **PSNR** | 43.5 dB |
|
||||
| **SSIM** | 0.997 |
|
||||
| **Carrier energy drop** | 75.8% |
|
||||
| **Phase coherence drop** (top-5 carriers) | **91.4%** |
|
||||
|
||||
```python
|
||||
codebook = SpectralCodebook()
|
||||
codebook.extract_from_references('gemini_black', 'gemini_white') # adds 1024x1024 profile
|
||||
codebook.build_from_watermarked('gemini_random') # adds 1536x2816 profile
|
||||
codebook.save('codebook.npz')
|
||||
### V4 Round 06 (all-in-one attack, 20 images validated)
|
||||
|
||||
# Later:
|
||||
codebook.load('codebook.npz')
|
||||
profile, res, exact = codebook.get_profile(1536, 2816) # auto-select
|
||||
```
|
||||
|
||||
**SynthIDBypass** — three bypass generations:
|
||||
|
||||
```python
|
||||
bypass = SynthIDBypass()
|
||||
|
||||
result = bypass.bypass_simple(image, jpeg_quality=50) # V1
|
||||
result = bypass.bypass_v2(image, strength='aggressive') # V2
|
||||
result = bypass.bypass_v3(image, codebook, strength='aggressive') # V3 (best)
|
||||
```
|
||||
|
||||
### `robust_extractor.py`
|
||||
|
||||
Multi-scale watermark detector (90% accuracy):
|
||||
|
||||
```python
|
||||
from robust_extractor import RobustSynthIDExtractor
|
||||
|
||||
extractor = RobustSynthIDExtractor()
|
||||
extractor.load_codebook('artifacts/codebook/robust_codebook.pkl')
|
||||
result = extractor.detect_array(image)
|
||||
print(f"Watermarked: {result.is_watermarked}, Confidence: {result.confidence:.4f}")
|
||||
```
|
||||
| Model | Preset | Detector bypassed |
|
||||
|:------|:------:|:-----------------:|
|
||||
| gemini-3.1-flash-image-preview | `final` | ✓ |
|
||||
| gemini-3.1-flash-image-preview | `nuke` | ✓ |
|
||||
| nano-banana-pro-preview | `final` | ✓ |
|
||||
| nano-banana-pro-preview | `nuke` | ✓ |
|
||||
|
||||
---
|
||||
|
||||
@@ -378,6 +500,7 @@ print(f"Watermarked: {result.is_watermarked}, Confidence: {result.confidence:.4f
|
||||
|
||||
- [SynthID: Identifying AI-generated images](https://deepmind.google/technologies/synthid/)
|
||||
- [SynthID Paper (arXiv:2510.09263)](https://arxiv.org/abs/2510.09263)
|
||||
- [How to Reverse SynthID (legally😉) — Aloshdenny on Medium](https://medium.com/@aloshdenny)
|
||||
|
||||
---
|
||||
|
||||
|
||||
Binary file not shown.
Binary file not shown.
Size: 222 KiB
@@ -19,3 +19,13 @@ matplotlib>=3.4.0
|
||||
|
||||
# Utilities
|
||||
tqdm>=4.60.0
|
||||
|
||||
# Deep learning (VAE re-generation stage)
|
||||
torch>=2.0.0
|
||||
diffusers>=0.20.0
|
||||
safetensors>=0.3.0
|
||||
accelerate>=0.20.0
|
||||
|
||||
# Gemini API (reference image generation)
|
||||
google-genai>=1.0.0
|
||||
python-dotenv>=1.0.0
|
||||
|
||||
@@ -0,0 +1,161 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Build the reverse-SynthID V4 codebook from a hierarchical dataset.
|
||||
|
||||
Expected layout::
|
||||
|
||||
<root>/
|
||||
<model>/
|
||||
black/ HxW/*.png
|
||||
white/ HxW/*.png
|
||||
blue/ HxW/*.png
|
||||
green/ HxW/*.png
|
||||
red/ HxW/*.png
|
||||
gray/ HxW/*.png
|
||||
gradient/ HxW/*.png
|
||||
diverse/ HxW/*.png
|
||||
|
||||
The script produces one ``ProfileV4`` per ``(model, H, W)`` that has at least
|
||||
``--min-consensus-colors`` consensus colours (``black``, ``white``, ``blue``,
|
||||
``green``, ``red``, ``gray``) with enough reference images. ``gradient/`` and
|
||||
``diverse/`` are used as content-baseline only, never as carrier sources.
|
||||
|
||||
Usage::
|
||||
|
||||
python scripts/build_codebook_v4.py \\
|
||||
--root /Users/aoxo/vscode/reverse-synthid-data \\
|
||||
--output artifacts/spectral_codebook_v4.npz
|
||||
|
||||
# Restrict to a single model:
|
||||
python scripts/build_codebook_v4.py --root <root> --models nano-banana-pro-preview
|
||||
|
||||
# Also emit a 'union' pseudo-model that averages profiles across models:
|
||||
python scripts/build_codebook_v4.py --root <root> --add-union
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import sys
|
||||
from typing import List, Optional
|
||||
|
||||
|
||||
REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||
sys.path.insert(0, os.path.join(REPO_ROOT, "src", "extraction"))
|
||||
|
||||
from synthid_bypass_v4 import ( # noqa: E402
|
||||
ALL_COLORS,
|
||||
SpectralCodebookV4,
|
||||
)
|
||||
|
||||
|
||||
DEFAULT_DATASET_ROOT = "/Users/aoxo/vscode/reverse-synthid-data"
|
||||
DEFAULT_OUTPUT = os.path.join(REPO_ROOT, "artifacts", "spectral_codebook_v4.npz")
|
||||
|
||||
|
||||
def build(
    root: str,
    output: str,
    models: Optional[List[str]] = None,
    colors: Optional[List[str]] = None,
    min_refs_per_color: int = 3,
    min_consensus_colors: int = 3,
    max_per_bucket: Optional[int] = None,
    add_union: bool = False,
) -> None:
    if not os.path.isdir(root):
        raise FileNotFoundError(f"Dataset root not found: {root}")

    codebook = SpectralCodebookV4()
    codebook._bind_root(root)  # type: ignore[attr-defined]
    codebook.build_from_hierarchical_dataset(
        root=root,
        models=models,
        colors=colors,
        min_refs_per_color=min_refs_per_color,
        min_consensus_colors=min_consensus_colors,
        max_per_bucket=max_per_bucket,
        verbose=True,
    )

    # Exit with a non-zero status if the dataset yielded no usable profile.
    if not codebook.profiles:
        print("\nNo profiles built. Check that --root points at a directory "
              "containing <model>/<color>/<HxW>/*.png")
        sys.exit(2)

    if add_union:
        codebook.add_union_profiles(verbose=True)

    # Create the output directory if needed ("." when output has no directory part).
    os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
    codebook.save(output)

    # Summarize each profile with its per-color reference counts.
    print("\nProfiles:")
    for key in sorted(codebook.profiles):
        model, h, w = key
        prof = codebook.profiles[key]
        refs = ", ".join(
            f"{c}={n}" for c, n in sorted(prof.n_refs_per_color.items())
        )
        print(f" {model}/{h}x{w}: {refs} (content={prof.n_content_refs})")
|
||||
|
||||
|
||||
def main() -> None:
|
||||
p = argparse.ArgumentParser(
|
||||
description="Build the reverse-SynthID V4 codebook.",
|
||||
)
|
||||
p.add_argument(
|
||||
"--root", default=DEFAULT_DATASET_ROOT,
|
||||
help=(
|
||||
"Hierarchical dataset root (default: "
|
||||
f"{DEFAULT_DATASET_ROOT}). Should contain <model>/<color>/<HxW>/*."
|
||||
),
|
||||
)
|
||||
p.add_argument(
|
||||
"--output", default=DEFAULT_OUTPUT,
|
||||
help=f"Output .npz path (default: {DEFAULT_OUTPUT}).",
|
||||
)
|
||||
p.add_argument(
|
||||
"--models", nargs="*", default=None,
|
||||
help="Restrict to these model subdirectories (default: auto-detect).",
|
||||
)
|
||||
p.add_argument(
|
||||
"--colors", nargs="*", default=None, choices=list(ALL_COLORS),
|
||||
help="Colours to include (default: all known).",
|
||||
)
|
||||
p.add_argument(
|
||||
"--min-refs-per-color", type=int, default=3,
|
||||
help="Drop (color, resolution) buckets with fewer images than this.",
|
||||
)
|
||||
p.add_argument(
|
||||
"--min-consensus-colors", type=int, default=3,
|
||||
help=(
|
||||
"Require at least this many consensus colours per (model, HxW) "
|
||||
"or the profile is skipped."
|
||||
),
|
||||
)
|
||||
p.add_argument(
|
||||
"--max-per-bucket", type=int, default=None,
|
||||
help="Cap images per (color, resolution) bucket (default: unlimited).",
|
||||
)
|
||||
p.add_argument(
|
||||
"--add-union", action="store_true",
|
||||
help="Also emit a 'union' pseudo-model averaging across real models.",
|
||||
)
|
||||
args = p.parse_args()
|
||||
|
||||
build(
|
||||
root=args.root,
|
||||
output=args.output,
|
||||
models=args.models,
|
||||
colors=args.colors,
|
||||
min_refs_per_color=args.min_refs_per_color,
|
||||
min_consensus_colors=args.min_consensus_colors,
|
||||
max_per_bucket=args.max_per_bucket,
|
||||
add_union=args.add_union,
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
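The selection rule stated in the `build_codebook_v4.py` docstring (one profile per `(model, H, W)`, and only when enough consensus colours each have enough reference images) can be sketched without the real codebook class. This is a minimal illustration under stated assumptions: `eligible_profiles`, `CONSENSUS_COLORS` and the `counts` dictionary are hypothetical names, and the actual logic lives in `SpectralCodebookV4.build_from_hierarchical_dataset`, whose diff is not shown on this page.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Colour names copied from the script docstring; the tuple itself is illustrative.
CONSENSUS_COLORS = ("black", "white", "blue", "green", "red", "gray")

Bucket = Tuple[str, str, int, int]  # (model, color, H, W)


def eligible_profiles(
    counts: Dict[Bucket, int],
    min_refs_per_color: int = 3,
    min_consensus_colors: int = 3,
) -> List[Tuple[str, int, int]]:
    """Return the (model, H, W) keys that would qualify for a profile."""
    colors_ok = defaultdict(set)
    for (model, color, h, w), n_images in counts.items():
        # Non-consensus folders (gradient/, diverse/) never count as carrier sources.
        if color in CONSENSUS_COLORS and n_images >= min_refs_per_color:
            colors_ok[(model, h, w)].add(color)
    return sorted(
        key for key, colors in colors_ok.items()
        if len(colors) >= min_consensus_colors
    )


# Hypothetical image counts per <model>/<color>/<HxW> bucket.
counts = {
    ("nano-banana-pro-preview", "black", 1024, 1024): 5,
    ("nano-banana-pro-preview", "white", 1024, 1024): 4,
    ("nano-banana-pro-preview", "blue", 1024, 1024): 3,
    ("nano-banana-pro-preview", "gradient", 1024, 1024): 10,  # content baseline only
    ("nano-banana-pro-preview", "black", 512, 512): 5,
    ("nano-banana-pro-preview", "white", 512, 512): 2,  # below min_refs_per_color
    ("nano-banana-pro-preview", "red", 512, 512): 3,
}
print(eligible_profiles(counts))
```

With these counts only the 1024x1024 bucket qualifies: the 512x512 bucket has just two colours that meet `min_refs_per_color`.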
@@ -0,0 +1,308 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Close the manual-validation loop for reverse-SynthID V4.
|
||||
|
||||
Reads the ``manifest.csv`` from ``dissolve_batch.py`` plus a ``tally.csv``
|
||||
you filled by hand after checking each variant in the Gemini app. Updates
|
||||
``carrier_weights`` in the V4 codebook in place:
|
||||
|
||||
- Bins that the **failed** variants (``still_watermarked=y``) tried to subtract
|
||||
get their weights **bumped up**, so subsequent dissolves attack those bins
|
||||
harder.
|
||||
- Bins that the **succeeded** variants (``still_watermarked=n``) already
|
||||
subtracted get their weights **damped slightly**, to recover fidelity
|
||||
without giving up detector immunity.
|
||||
|
||||
The ``still_watermarked`` column accepts ``y``/``yes``/``1``/``true``/``t`` and
|
||||
``n``/``no``/``0``/``false``/``f`` (case-insensitive). Rows with a blank or unrecognised value are ignored.
|
||||
|
||||
Usage::
|
||||
|
||||
python scripts/calibrate_from_feedback.py \\
|
||||
--manifest runs/round_01/manifest.csv \\
|
||||
--tally runs/round_01/tally.csv \\
|
||||
--codebook artifacts/spectral_codebook_v4.npz \\
|
||||
--step 0.25
|
||||
|
||||
The codebook is rewritten in place; a timestamped backup is made next to it
|
||||
unless ``--no-backup`` is passed.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import csv
|
||||
import datetime
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
from typing import Dict, List, Optional, Tuple
|
||||
|
||||
|
||||
REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||
sys.path.insert(0, os.path.join(REPO_ROOT, "src", "extraction"))
|
||||
|
||||
import numpy as np # noqa: E402
|
||||
|
||||
from synthid_bypass_v4 import SpectralCodebookV4 # noqa: E402
|
||||
|
||||
|
||||
TRUE_TOKENS = {"y", "yes", "1", "true", "t"}
|
||||
FALSE_TOKENS = {"n", "no", "0", "false", "f"}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CSV loading
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _read_csv_dicts(path: str) -> List[Dict[str, str]]:
|
||||
with open(path, newline="") as f:
|
||||
return list(csv.DictReader(f))
|
||||
|
||||
|
||||
def _parse_still_watermarked(value: str) -> Optional[bool]:
|
||||
"""``y/n`` → ``True/False``; empty/unknown → ``None``."""
|
||||
if value is None:
|
||||
return None
|
||||
v = value.strip().lower()
|
||||
if v == "":
|
||||
return None
|
||||
if v in TRUE_TOKENS:
|
||||
return True
|
||||
if v in FALSE_TOKENS:
|
||||
return False
|
||||
return None
|
||||
|
||||
|
||||
def load_feedback(
|
||||
manifest_path: str, tally_path: str,
|
||||
) -> List[Dict]:
|
||||
"""Join manifest + tally on ``(source, variant)``; return labelled rows.
|
||||
|
||||
Only rows whose tally has a parseable ``still_watermarked`` are returned.
|
||||
"""
|
||||
manifest = _read_csv_dicts(manifest_path)
|
||||
|
||||
# Tally may be the same file as the manifest (user filled in place) or a
|
||||
# separate file with at least (source, variant, still_watermarked).
|
||||
tally_raw = _read_csv_dicts(tally_path)
|
||||
tally: Dict[Tuple[str, str], bool] = {}
|
||||
for row in tally_raw:
|
||||
still = _parse_still_watermarked(row.get("still_watermarked", ""))
|
||||
if still is None:
|
||||
continue
|
||||
key = (row["source"], row["variant"])
|
||||
tally[key] = still
|
||||
|
||||
joined: List[Dict] = []
|
||||
for row in manifest:
|
||||
key = (row["source"], row["variant"])
|
||||
if key not in tally:
|
||||
continue
|
||||
merged = dict(row)
|
||||
merged["still_watermarked"] = tally[key]
|
||||
joined.append(merged)
|
||||
return joined
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Calibration logic
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _parse_profile_key(profile_key: str) -> Optional[Tuple[str, int, int]]:
|
||||
"""Parse ``'model_name/HxW'`` → ``(model, H, W)``."""
|
||||
if not profile_key or "/" not in profile_key:
|
||||
return None
|
||||
model, res = profile_key.rsplit("/", 1)
|
||||
if "x" not in res:
|
||||
return None
|
||||
try:
|
||||
h, w = (int(p) for p in res.lower().split("x"))
|
||||
except ValueError:
|
||||
return None
|
||||
return (model, h, w)
|
||||
|
||||
|
||||
def calibrate(
|
||||
codebook: SpectralCodebookV4,
|
||||
feedback: List[Dict],
|
||||
step: float,
|
||||
damp_factor: float,
|
||||
consensus_floor: float,
|
||||
verbose: bool,
|
||||
) -> Dict[Tuple[str, int, int], Dict[str, float]]:
|
||||
"""Update ``carrier_weights`` in-place. Returns per-profile summary stats.
|
||||
|
||||
The update rule, per profile ``P``:
|
||||
|
||||
Let ``F`` = number of feedback rows against ``P`` with
|
||||
``still_watermarked=True`` (failed dissolves).
|
||||
Let ``S`` = number with ``still_watermarked=False`` (cleared dissolves).
|
||||
|
||||
If ``F > 0``: scale ``carrier_weights`` by ``1 + step * (F / (F + S))``
|
||||
but only on bins with ``consensus_coherence >= consensus_floor``. Non-
|
||||
carrier bins are never touched — we don't want to amplify noise.
|
||||
|
||||
If ``F == 0 and S > 0``: scale ``carrier_weights`` by
|
||||
``max(1 - damp_factor * step, 0.2)`` on carrier bins (gentle fidelity recovery
|
||||
once we're clearing the detector).
|
||||
"""
|
||||
groups: Dict[Tuple[str, int, int], Dict[str, List[Dict]]] = {}
|
||||
for row in feedback:
|
||||
pkey = _parse_profile_key(row.get("profile_key", ""))
|
||||
if pkey is None:
|
||||
continue
|
||||
bucket = groups.setdefault(pkey, {"fail": [], "pass": []})
|
||||
target = "fail" if row["still_watermarked"] else "pass"
|
||||
bucket[target].append(row)
|
||||
|
||||
summary: Dict[Tuple[str, int, int], Dict[str, float]] = {}
|
||||
|
||||
for pkey, bucket in groups.items():
|
||||
if pkey not in codebook.profiles:
|
||||
if verbose:
|
||||
print(f" skip {pkey}: no matching profile in codebook")
|
||||
continue
|
||||
prof = codebook.profiles[pkey]
|
||||
F = len(bucket["fail"])
|
||||
S = len(bucket["pass"])
|
||||
|
||||
carrier_mask = (prof.consensus_coherence >= consensus_floor).astype(np.float32)
|
||||
|
||||
if F > 0:
|
||||
fail_ratio = F / max(F + S, 1)
|
||||
scale = 1.0 + step * fail_ratio
|
||||
delta = 1.0 + (scale - 1.0) * carrier_mask
|
||||
action = f"bump ×{scale:.3f}"
|
||||
elif S > 0:
|
||||
scale = max(1.0 - damp_factor * step, 0.2)
|
||||
delta = 1.0 + (scale - 1.0) * carrier_mask
|
||||
action = f"damp ×{scale:.3f}"
|
||||
else:
|
||||
continue
|
||||
|
||||
before_mean = float(np.mean(prof.carrier_weights[..., 1]))
|
||||
codebook.update_carrier_weights(pkey, delta)
|
||||
after_mean = float(np.mean(prof.carrier_weights[..., 1]))
|
||||
|
||||
summary[pkey] = {
|
||||
"fail": F,
|
||||
"pass": S,
|
||||
"before_mean_g": before_mean,
|
||||
"after_mean_g": after_mean,
|
||||
"action": action,
|
||||
}
|
||||
if verbose:
|
||||
print(f" {pkey[0]}/{pkey[1]}x{pkey[2]}: {action} "
|
||||
f"fail={F} pass={S} "
|
||||
f"mean(G) {before_mean:.4f} → {after_mean:.4f}")
|
||||
|
||||
return summary
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def run(
|
||||
manifest_path: str,
|
||||
tally_path: str,
|
||||
codebook_path: str,
|
||||
step: float,
|
||||
damp_factor: float,
|
||||
consensus_floor: float,
|
||||
backup: bool,
|
||||
) -> None:
|
||||
if not os.path.isfile(manifest_path):
|
||||
raise FileNotFoundError(f"Manifest not found: {manifest_path}")
|
||||
if not os.path.isfile(tally_path):
|
||||
raise FileNotFoundError(f"Tally not found: {tally_path}")
|
||||
if not os.path.isfile(codebook_path):
|
||||
raise FileNotFoundError(f"Codebook not found: {codebook_path}")
|
||||
|
||||
feedback = load_feedback(manifest_path, tally_path)
|
||||
if not feedback:
|
||||
print("No usable feedback rows (empty still_watermarked?). Nothing "
|
||||
"to do.")
|
||||
return
|
||||
|
||||
print(f"Loaded {len(feedback)} labelled rows from tally.")
|
||||
|
||||
codebook = SpectralCodebookV4()
|
||||
codebook.load(codebook_path)
|
||||
|
||||
if backup:
|
||||
ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
|
||||
backup_path = codebook_path + f".bak-{ts}.npz"
|
||||
shutil.copyfile(codebook_path, backup_path)
|
||||
print(f"Backup → {backup_path}")
|
||||
|
||||
summary = calibrate(
|
||||
codebook=codebook,
|
||||
feedback=feedback,
|
||||
step=step,
|
||||
damp_factor=damp_factor,
|
||||
consensus_floor=consensus_floor,
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
if not summary:
|
||||
print("No profiles updated.")
|
||||
return
|
||||
|
||||
codebook.save(codebook_path)
|
||||
|
||||
n_fail = sum(s["fail"] for s in summary.values())
|
||||
n_pass = sum(s["pass"] for s in summary.values())
|
||||
print(f"\nCalibration complete. Profiles updated: {len(summary)}")
|
||||
print(f"Feedback: {n_pass} cleared / {n_fail} still watermarked "
|
||||
f"({n_pass * 100.0 / max(n_pass + n_fail, 1):.1f}% success).")
|
||||
if n_fail > 0:
|
||||
print("Next: re-run dissolve_batch.py on a fresh batch; weights "
|
||||
"are now stronger at persistent carriers.")
|
||||
else:
|
||||
print("100% cleared — consider lowering strength for better "
|
||||
"fidelity on the next batch.")
|
||||
|
||||
|
||||
def main() -> None:
|
||||
p = argparse.ArgumentParser(
|
||||
description=(
|
||||
"Update V4 carrier_weights from manual Gemini detection tallies."
|
||||
),
|
||||
)
|
||||
p.add_argument("--manifest", required=True,
|
||||
help="Path to manifest.csv produced by dissolve_batch.py.")
|
||||
p.add_argument("--tally", required=True,
|
||||
help=(
|
||||
"Path to tally.csv with (source, variant, "
|
||||
"still_watermarked) columns. May be the manifest file "
|
||||
"itself if you filled it in place."
|
||||
))
|
||||
p.add_argument("--codebook", required=True,
|
||||
help="V4 codebook .npz to update (in place).")
|
||||
p.add_argument("--step", type=float, default=0.25,
|
||||
help="Base scale step; 0.25 = up to +25%% per round.")
|
||||
p.add_argument("--damp-factor", type=float, default=0.15,
|
||||
help="Damping multiplier applied when all variants "
|
||||
"cleared (fidelity recovery).")
|
||||
p.add_argument("--consensus-floor", type=float, default=0.50,
|
||||
help="Only update bins with consensus_coherence >= this.")
|
||||
p.add_argument("--no-backup", dest="backup", action="store_false",
|
||||
help="Skip the timestamped backup of the codebook.")
|
||||
p.set_defaults(backup=True)
|
||||
args = p.parse_args()
|
||||
|
||||
run(
|
||||
manifest_path=args.manifest,
|
||||
tally_path=args.tally,
|
||||
codebook_path=args.codebook,
|
||||
step=args.step,
|
||||
damp_factor=args.damp_factor,
|
||||
consensus_floor=args.consensus_floor,
|
||||
backup=args.backup,
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
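The update rule in the `calibrate` docstring can be checked with a small worked example. The sketch below reproduces only the branch logic visible in the diff; `weight_delta` is a hypothetical helper name, and how `update_carrier_weights` applies the returned array is an assumption left open, since that method's diff is suppressed.

```python
import numpy as np


def weight_delta(consensus, n_fail, n_pass,
                 step=0.25, damp_factor=0.15, consensus_floor=0.50):
    """Per-bin multiplier mirroring the bump/damp branches in ``calibrate``."""
    carrier_mask = (consensus >= consensus_floor).astype(np.float32)
    if n_fail > 0:
        # Bump: proportional to the share of variants that were still detected.
        scale = 1.0 + step * n_fail / (n_fail + n_pass)
    elif n_pass > 0:
        # Damp: small reduction, never below 0.2x in a single round.
        scale = max(1.0 - damp_factor * step, 0.2)
    else:
        return np.ones_like(carrier_mask)
    # Bins below the consensus floor keep a multiplier of exactly 1.0.
    return 1.0 + (scale - 1.0) * carrier_mask


consensus = np.array([0.9, 0.4, 0.6])
bump = weight_delta(consensus, n_fail=3, n_pass=1)  # scale = 1 + 0.25 * 0.75
damp = weight_delta(consensus, n_fail=0, n_pass=4)  # scale = 1 - 0.15 * 0.25
print(bump, damp)
```

With the default `--step 0.25`, three failures out of four labelled variants give a multiplier of 1.1875 on carrier bins, while an all-clear round gives 0.9625; the middle bin (consensus 0.4) is untouched in both cases.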
@@ -0,0 +1,255 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Phase-2 driver for the reverse-SynthID V4 manual-validation loop.
|
||||
|
||||
Takes an input folder of watermarked images and emits one or more strength
|
||||
variants per image (``A``, ``B``, ``C``, ... by default). Writes a
|
||||
``manifest.csv`` that pairs each variant with:
|
||||
|
||||
- source image path
|
||||
- output path
|
||||
- strength preset
|
||||
- profile key used
|
||||
- PSNR / SSIM achieved
|
||||
|
||||
You then paste the variants into the Gemini app, run SynthID detection, and
|
||||
fill in a small ``tally.csv`` (columns: ``source,variant,still_watermarked``,
|
||||
values ``y/n``). Feed both files into ``calibrate_from_feedback.py`` to
|
||||
update the codebook's per-carrier weights and iterate.
|
||||
|
||||
Usage::
|
||||
|
||||
python scripts/dissolve_batch.py \\
|
||||
--input /path/to/input_images \\
|
||||
--output /path/to/out_dir \\
|
||||
--codebook artifacts/spectral_codebook_v4.npz \\
|
||||
--model nano-banana-pro-preview \\
|
||||
--strengths gentle moderate aggressive
|
||||
|
||||
Strengths map to filesystem-safe single-letter variants (A, B, C, ... up to H) in
|
||||
manifest order, which makes the tally CSV trivial to fill by hand.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import csv
|
||||
import glob
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from typing import List, Optional
|
||||
|
||||
|
||||
REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||
sys.path.insert(0, os.path.join(REPO_ROOT, "src", "extraction"))
|
||||
|
||||
import cv2 # noqa: E402
|
||||
import numpy as np # noqa: E402
|
||||
|
||||
from synthid_bypass_v4 import SpectralCodebookV4, SynthIDBypassV4 # noqa: E402
|
||||
|
||||
|
||||
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")
|
||||
DEFAULT_STRENGTHS = ("gentle", "moderate", "aggressive")
|
||||
VARIANT_LETTERS = "ABCDEFGH"
|
||||
|
||||
|
||||
def iter_input_images(input_path: str) -> List[str]:
|
||||
"""Resolve ``--input`` (file, directory, or glob) to a sorted list."""
|
||||
if os.path.isdir(input_path):
|
||||
out: List[str] = []
|
||||
for ext in IMAGE_EXTS:
|
||||
out.extend(glob.glob(os.path.join(input_path, f"*{ext}")))
|
||||
out.extend(glob.glob(os.path.join(input_path, f"*{ext.upper()}")))
|
||||
return sorted(set(out))
|
||||
if os.path.isfile(input_path):
|
||||
return [input_path]
|
||||
# Treat as a glob pattern.
|
||||
return sorted(glob.glob(input_path))
|
||||
|
||||
|
||||
def dissolve_one(
|
||||
bypass: SynthIDBypassV4,
|
||||
codebook: SpectralCodebookV4,
|
||||
src: str,
|
||||
out_dir: str,
|
||||
variant_letter: str,
|
||||
strength: str,
|
||||
model: Optional[str],
|
||||
) -> dict:
|
||||
"""Dissolve one image at one strength; return a manifest row."""
|
||||
base = os.path.splitext(os.path.basename(src))[0]
|
||||
out_name = f"{base}__{variant_letter}_{strength}.png"
|
||||
out_path = os.path.join(out_dir, out_name)
|
||||
|
||||
t0 = time.time()
|
||||
try:
|
||||
result = bypass.bypass_v4_file(
|
||||
src, out_path, codebook,
|
||||
strength=strength, model=model, verify=False,
|
||||
)
|
||||
row = {
|
||||
"source": os.path.abspath(src),
|
||||
"variant": variant_letter,
|
||||
"strength": strength,
|
||||
"output": os.path.abspath(out_path),
|
||||
"profile_key": result.details["profile_key"],
|
||||
"exact_match": int(bool(result.details["exact_match"])),
|
||||
"psnr": round(result.psnr, 3),
|
||||
"ssim": round(result.ssim, 5),
|
||||
"n_passes_applied": result.details["n_passes_applied"],
|
||||
"n_passes_rolled_back": result.details["n_passes_rolled_back"],
|
||||
"elapsed_sec": round(time.time() - t0, 3),
|
||||
"still_watermarked": "", # filled by you during validation
|
||||
"notes": "",
|
||||
}
|
||||
except Exception as e:
|
||||
row = {
|
||||
"source": os.path.abspath(src),
|
||||
"variant": variant_letter,
|
||||
"strength": strength,
|
||||
"output": "",
|
||||
"profile_key": "",
|
||||
"exact_match": 0,
|
||||
"psnr": "",
|
||||
"ssim": "",
|
||||
"n_passes_applied": 0,
|
||||
"n_passes_rolled_back": 0,
|
||||
"elapsed_sec": round(time.time() - t0, 3),
|
||||
"still_watermarked": "",
|
||||
"notes": f"ERROR: {e}",
|
||||
}
|
||||
return row
|
||||
|
||||
|
||||
def run(
|
||||
input_path: str,
|
||||
out_dir: str,
|
||||
codebook_path: str,
|
||||
strengths: List[str],
|
||||
model: Optional[str] = None,
|
||||
limit: Optional[int] = None,
|
||||
manifest_name: str = "manifest.csv",
|
||||
) -> str:
|
||||
sources = iter_input_images(input_path)
|
||||
if limit is not None:
|
||||
sources = sources[:limit]
|
||||
if not sources:
|
||||
print(f"No images found in {input_path}")
|
||||
sys.exit(2)
|
||||
|
||||
os.makedirs(out_dir, exist_ok=True)
|
||||
|
||||
codebook = SpectralCodebookV4()
|
||||
codebook.load(codebook_path)
|
||||
|
||||
if model is not None and model not in codebook.models:
|
||||
print(f"WARNING: --model {model} not found in codebook. "
|
||||
f"Available: {codebook.models}. Proceeding anyway "
|
||||
"(best-effort fallback across models).")
|
||||
|
||||
bypass = SynthIDBypassV4()
|
||||
|
||||
if len(strengths) > len(VARIANT_LETTERS):
|
||||
raise ValueError(
|
||||
f"Too many strengths ({len(strengths)}); "
|
||||
f"max supported: {len(VARIANT_LETTERS)}"
|
||||
)
|
||||
letters = list(VARIANT_LETTERS[:len(strengths)])
|
||||
|
||||
manifest_path = os.path.join(out_dir, manifest_name)
|
||||
fieldnames = [
|
||||
"source", "variant", "strength", "output", "profile_key",
|
||||
"exact_match", "psnr", "ssim",
|
||||
"n_passes_applied", "n_passes_rolled_back",
|
||||
"elapsed_sec", "still_watermarked", "notes",
|
||||
]
|
||||
|
||||
print(f"Dissolving {len(sources)} image(s) × {len(strengths)} variant(s) "
|
||||
f"→ {out_dir}")
|
||||
if model:
|
||||
print(f"Model hint: {model}")
|
||||
|
||||
rows = []
|
||||
for i, src in enumerate(sources):
|
||||
print(f"[{i + 1}/{len(sources)}] {os.path.basename(src)}")
|
||||
for letter, strength in zip(letters, strengths):
|
||||
row = dissolve_one(
|
||||
bypass=bypass,
|
||||
codebook=codebook,
|
||||
src=src,
|
||||
out_dir=out_dir,
|
||||
variant_letter=letter,
|
||||
strength=strength,
|
||||
model=model,
|
||||
)
|
||||
rows.append(row)
|
||||
if row["notes"].startswith("ERROR"):
|
||||
print(f" {letter}/{strength:12s} {row['notes']}")
|
||||
else:
|
||||
print(f" {letter}/{strength:12s} "
|
||||
f"psnr={row['psnr']:>6} ssim={row['ssim']:>7} "
|
||||
f"profile={row['profile_key']} "
|
||||
f"exact={row['exact_match']}")
|
||||
|
||||
with open(manifest_path, "w", newline="") as f:
|
||||
writer = csv.DictWriter(f, fieldnames=fieldnames)
|
||||
writer.writeheader()
|
||||
writer.writerows(rows)
|
||||
|
||||
print(f"\nManifest: {manifest_path}")
|
||||
print("\nNext steps:")
|
||||
print(" 1. Upload each file listed in the `output` column to the Gemini app and run "
|
||||
"SynthID detection.")
|
||||
print(" 2. For each row, fill the `still_watermarked` column with "
|
||||
"`y` or `n` (leave blank to skip).")
|
||||
print(f" 3. Save the filled file as tally.csv and run:")
|
||||
print(f" python scripts/calibrate_from_feedback.py "
|
||||
f"--manifest {manifest_path} --tally <your_tally.csv> "
|
||||
f"--codebook {codebook_path}")
|
||||
return manifest_path
|
||||
|
||||
|
||||
def main() -> None:
|
||||
p = argparse.ArgumentParser(
|
||||
description="Emit bypass variants for manual Gemini validation.",
|
||||
)
|
||||
p.add_argument("--input", required=True,
|
||||
help="Path to an image, a directory, or a glob pattern.")
|
||||
p.add_argument("--output", required=True,
|
||||
help="Directory to write variants and manifest.csv into.")
|
||||
p.add_argument("--codebook", required=True,
|
||||
help="Path to the V4 codebook .npz.")
|
||||
p.add_argument("--strengths", nargs="+", default=list(DEFAULT_STRENGTHS),
|
||||
choices=["gentle", "moderate", "aggressive", "maximum",
|
||||
"demolish", "annihilate", "combo",
|
||||
"blog_pure", "blog_plus", "blog_combo",
|
||||
"residual_pure", "residual_plus", "residual_combo",
|
||||
"regen_pure", "regen_plus", "regen_combo",
|
||||
"final", "nuke"],
|
||||
help=f"Strengths to emit (default: {DEFAULT_STRENGTHS}).")
|
||||
p.add_argument("--model", default=None,
|
||||
help=(
|
||||
"Optional model hint (e.g. nano-banana-pro-preview). "
|
||||
"Omit to let the codebook auto-select by resolution."
|
||||
))
|
||||
p.add_argument("--limit", type=int, default=None,
|
||||
help="Stop after this many input images (for quick tests).")
|
||||
p.add_argument("--manifest-name", default="manifest.csv",
|
||||
help="Manifest filename inside --output (default: manifest.csv).")
|
||||
args = p.parse_args()
|
||||
|
||||
run(
|
||||
input_path=args.input,
|
||||
out_dir=args.output,
|
||||
codebook_path=args.codebook,
|
||||
strengths=args.strengths,
|
||||
model=args.model,
|
||||
limit=args.limit,
|
||||
manifest_name=args.manifest_name,
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
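The hand-off between `dissolve_batch.py` and `calibrate_from_feedback.py` depends on the `(source, variant)` pair. As a rough sketch of that hand-off, the snippet below shows how strengths pair with variant letters and how a blank tally could be derived from manifest rows. `tally_skeleton` and the sample rows are hypothetical; the scripts themselves expect you to fill the `still_watermarked` column by hand.

```python
import csv
import io

VARIANT_LETTERS = "ABCDEFGH"  # same alphabet as dissolve_batch.py


def tally_skeleton(manifest_rows):
    """Return CSV text with source, variant and an empty still_watermarked column."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["source", "variant", "still_watermarked"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in manifest_rows:
        if not row.get("output"):
            continue  # dissolve failed for this variant; nothing to validate
        writer.writerow({
            "source": row["source"],
            "variant": row["variant"],
            "still_watermarked": "",  # fill with y / n after checking in the app
        })
    return buf.getvalue()


strengths = ["gentle", "moderate", "aggressive"]
letters = dict(zip(VARIANT_LETTERS, strengths))  # zip stops at the shorter input

# Hypothetical manifest rows; the third one simulates a failed dissolve.
manifest_rows = [
    {"source": "/img/a.png", "variant": "A", "output": "/out/a__A_gentle.png"},
    {"source": "/img/a.png", "variant": "B", "output": "/out/a__B_moderate.png"},
    {"source": "/img/a.png", "variant": "C", "output": ""},
]
print(letters)
print(tally_skeleton(manifest_rows))
```

Rows left blank are skipped by `load_feedback`, so a partially filled tally is still usable.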
@@ -804,6 +804,139 @@ class RobustSynthIDExtractor:
|
||||
}
|
||||
)
|
||||
|
||||
# ================================================================
|
||||
# V4 DETECTION — cross-color consensus codebook
|
||||
# ================================================================
|
||||
|
||||
def detect_from_v4_codebook(
|
||||
self,
|
||||
image: np.ndarray,
|
||||
codebook,
|
||||
model: Optional[str] = None,
|
||||
top_k: int = 128,
|
||||
consensus_floor: float = 0.75,
|
||||
) -> DetectionResult:
|
||||
"""Detect a SynthID watermark using a loaded V4 codebook.
|
||||
|
||||
Unlike :py:meth:`detect_array`, this runs at the image's **native
|
||||
resolution** (no 512x512 downsample) and uses the top-``top_k``
|
||||
cross-colour consensus carrier bins of the best-matching
|
||||
``SpectralCodebookV4`` profile as the phase-match target. This gives a
|
||||
much tighter detector and is the one the manual-validation loop should
|
||||
agree with before you spend Gemini-app time on a batch.
|
||||
|
||||
Returns a :class:`DetectionResult` compatible with the rest of the
|
||||
codebase so it can be plugged into ``SynthIDBypass(extractor=...)``.
|
||||
|
||||
Args:
|
||||
image: RGB uint8 or float HxWx3 array.
|
||||
codebook: A loaded ``SpectralCodebookV4`` instance.
|
||||
model: Optional model hint (e.g. ``nano-banana-pro-preview``).
|
||||
top_k: Number of top-consensus carriers per channel to score.
|
||||
consensus_floor: Ignore bins whose consensus_coherence is below
|
||||
this (prevents noise bins from inflating the score).
|
||||
"""
|
||||
if image.dtype != np.uint8:
|
||||
arr = np.asarray(image)
|
||||
if np.max(arr) <= 1.5:
|
||||
arr = arr * 255.0
|
||||
image_u8 = np.clip(arr, 0, 255).astype(np.uint8)
|
||||
else:
|
||||
image_u8 = image
|
||||
|
||||
H, W = image_u8.shape[:2]
|
||||
profile, key, exact = codebook.get_profile(H, W, model=model)
|
||||
|
||||
# FFT at native resolution if exact; otherwise resize the image to the
|
||||
# profile's resolution (INTER_AREA) so its FFT bins line up with the profile.
|
||||
if exact:
|
||||
work = image_u8.astype(np.float64)
|
||||
prof_cons = profile.consensus_coherence
|
||||
prof_phase = profile.consensus_phase
|
||||
else:
|
||||
pH, pW = profile.shape
|
||||
work = cv2.resize(image_u8, (pW, pH), interpolation=cv2.INTER_AREA)\
|
||||
.astype(np.float64)
|
||||
prof_cons = profile.consensus_coherence
|
||||
prof_phase = profile.consensus_phase
|
||||
|
||||
per_channel_scores: List[float] = []
|
||||
per_channel_n: List[int] = []
|
||||
|
||||
for ch in range(3):
|
||||
fft_ch = np.fft.fft2(work[:, :, ch])
|
||||
img_phase = np.angle(fft_ch)
|
||||
|
||||
cons_ch = prof_cons[:, :, ch].copy()
|
||||
cons_ch[0, 0] = 0.0
|
||||
mask = (cons_ch >= consensus_floor)
|
||||
if mask.sum() == 0:
|
||||
continue
|
||||
|
||||
# Select top-k bins by consensus coherence.
|
||||
candidates = np.argsort(cons_ch.ravel())[-top_k:]
|
||||
rows, cols = np.unravel_index(candidates, cons_ch.shape)
|
||||
|
||||
matches: List[float] = []
|
||||
for y, x in zip(rows, cols):
|
||||
if cons_ch[y, x] < consensus_floor:
|
||||
continue
|
||||
ref_p = prof_phase[y, x, ch]
|
||||
diff = np.abs(np.angle(np.exp(1j * (img_phase[y, x] - ref_p))))
|
||||
matches.append(1.0 - diff / np.pi)
|
||||
|
||||
if matches:
|
||||
per_channel_scores.append(float(np.mean(matches)))
|
||||
per_channel_n.append(len(matches))
|
||||
|
||||
if not per_channel_scores:
|
||||
return DetectionResult(
|
||||
is_watermarked=False,
|
||||
confidence=0.0,
|
||||
correlation=0.0,
|
||||
phase_match=0.0,
|
||||
structure_ratio=0.0,
|
||||
carrier_strength=0.0,
|
||||
multi_scale_consistency=0.0,
|
||||
details={'v4': True, 'profile_key': f'{key[0]}/{key[1]}x{key[2]}',
|
||||
'reason': 'no consensus bins above floor'},
|
||||
)
|
||||
|
||||
# Green channel is the strongest SynthID carrier; weight accordingly.
|
||||
weights = [0.25, 0.55, 0.20][: len(per_channel_scores)]
|
||||
w_sum = sum(weights)
|
||||
phase_match = float(sum(s * w for s, w in zip(per_channel_scores, weights)) / w_sum)
|
||||
|
||||
# V4 consensus carriers average phase over ~6 colours per bin, so
|
||||
# watermarked phase_match sits near 0.60-0.75 (vs ~0.95 for the v3
|
||||
# single-colour detector). Non-watermarked / cleaned images drop to
|
||||
# ~0.30-0.45. Sigmoid is centred at 0.52 with moderate steepness so
|
||||
# the usable gap covers the full [0, 1] confidence range.
|
||||
phase_score = float(1.0 / (1.0 + np.exp(-18.0 * (phase_match - 0.52))))
|
||||
confidence = float(min(1.0, phase_score))
|
||||
is_watermarked = confidence > 0.50
|
||||
|
||||
return DetectionResult(
|
||||
is_watermarked=bool(is_watermarked),
|
||||
confidence=confidence,
|
||||
correlation=0.0,
|
||||
phase_match=phase_match,
|
||||
structure_ratio=0.0,
|
||||
carrier_strength=0.0,
|
||||
multi_scale_consistency=float(
|
||||
np.std(per_channel_scores) if len(per_channel_scores) > 1 else 0.0,
|
||||
),
|
||||
details={
|
||||
'v4': True,
|
||||
'profile_key': f'{key[0]}/{key[1]}x{key[2]}',
|
||||
'exact_match': bool(exact),
|
||||
'per_channel_scores': per_channel_scores,
|
||||
'per_channel_n': per_channel_n,
|
||||
'top_k': top_k,
|
||||
'consensus_floor': consensus_floor,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
# ================================================================
|
||||
# CLI INTERFACE
|
||||
|
||||
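The scoring in `detect_from_v4_codebook` reduces to two small formulas: a wrap-safe phase agreement per bin, and a sigmoid that turns the weighted mean into a confidence. The sketch below isolates them so the thresholds quoted in the comments can be checked numerically. The function names are hypothetical; the constants (centre 0.52, steepness 18) are the ones in the diff.

```python
import numpy as np


def phase_agreement(img_phase, ref_phase):
    """1.0 when phases coincide, 0.0 when they are opposite (wrap-safe)."""
    delta = np.asarray(img_phase, dtype=float) - np.asarray(ref_phase, dtype=float)
    # exp(1j * delta) followed by angle() wraps the difference into (-pi, pi].
    diff = np.abs(np.angle(np.exp(1j * delta)))
    return 1.0 - diff / np.pi


def confidence_from_phase_match(phase_match, centre=0.52, steepness=18.0):
    """Sigmoid mapping used to turn the weighted phase match into a confidence."""
    return float(1.0 / (1.0 + np.exp(-steepness * (phase_match - centre))))


# Phases just either side of the +/-pi seam are close, not far apart.
print(phase_agreement(3.1, -3.1))
# Values inside the ranges quoted in the source comments.
print(confidence_from_phase_match(0.70), confidence_from_phase_match(0.35))
```

A phase match of 0.70 lands well above the 0.50 decision threshold and 0.35 well below it, which is the gap the source comment describes.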
File diff suppressed because it is too large
@@ -0,0 +1,231 @@
|
||||
"""
|
||||
Round-05 Stage: Diffusion-VAE Re-Generation Attack
|
||||
===================================================
|
||||
|
||||
Single biggest documented weakness of SynthID, per
|
||||
``SynthID-Image: Image watermarking at internet scale`` (Gowal et al., 2026),
|
||||
Section 6.1 — "Re-generation attacks use other powerful generative models
|
||||
(like diffusion models) to reconstruct a watermarked image, potentially
|
||||
washing out the watermark in the process (An et al., 2024; Zhao et al., 2024)".
|
||||
Section 6.2 concedes they only trained against **weak** off-the-shelf VAEs.
|
||||
|
||||
This stage runs the image through the Stable Diffusion autoencoder
|
||||
(``stabilityai/sd-vae-ft-mse`` — the higher-fidelity fine-tuned MSE variant)
|
||||
and returns the reconstruction. The encoder maps pixels to a narrow 8×-downsampled
|
||||
latent manifold trained on natural images; any pixel-space watermark that isn't
|
||||
essential for reconstructing the content is projected out. The decoder
|
||||
re-synthesises from latents, producing an image perceptually identical to the
|
||||
original but spectrally/statistically native to the VAE — which the SynthID
|
||||
decoder has no trained basis for.
|
||||
|
||||
Supports MPS (Apple Silicon), CUDA, and CPU with a graceful fallback. Uses the
|
||||
VAE's built-in tiled encode/decode for images above ~1024px so we don't OOM.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
from typing import Optional, Tuple
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
_VAE_SINGLETON = None
|
||||
_VAE_DEVICE: Optional[str] = None
|
||||
|
||||
|
||||
def _select_device(prefer: Optional[str] = None) -> str:
|
||||
"""Pick CUDA → MPS → CPU, honoring an explicit ``prefer``."""
|
||||
import torch
|
||||
|
||||
if prefer:
|
||||
return prefer
|
||||
if torch.cuda.is_available():
|
||||
return "cuda"
|
||||
if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
|
||||
return "mps"
|
||||
return "cpu"
|
||||
|
||||
|
||||
def load_vae(
|
||||
model_id: str = "stabilityai/sd-vae-ft-mse",
|
||||
device: Optional[str] = None,
|
||||
dtype: str = "float32",
|
||||
):
|
||||
"""Lazy-load and cache the SD-VAE. Returns (vae, device_str).
|
||||
|
||||
``dtype`` falls back from 'float16' to 'float32' on MPS (fp16 conv is broken on older torch versions)
|
||||
and can be 'float16' on CUDA for speed.
|
||||
"""
|
||||
global _VAE_SINGLETON, _VAE_DEVICE
|
||||
if _VAE_SINGLETON is not None:
|
||||
return _VAE_SINGLETON, _VAE_DEVICE
|
||||
|
||||
try:
|
||||
import torch
|
||||
from diffusers import AutoencoderKL
|
||||
except ImportError as e:
|
||||
raise RuntimeError(
|
||||
"VAE re-generation stage requires torch + diffusers. "
|
||||
"Install with: pip install torch diffusers safetensors accelerate"
|
||||
) from e
|
||||
|
||||
target_device = _select_device(device)
|
||||
torch_dtype = {
|
||||
"float16": torch.float16,
|
||||
"float32": torch.float32,
|
||||
"bfloat16": torch.bfloat16,
|
||||
}[dtype]
|
||||
if target_device == "mps" and torch_dtype == torch.float16:
|
||||
torch_dtype = torch.float32
|
||||
|
||||
vae = AutoencoderKL.from_pretrained(model_id, torch_dtype=torch_dtype)
|
||||
vae.eval()
|
||||
for p in vae.parameters():
|
||||
p.requires_grad = False
|
||||
vae.to(target_device)
|
||||
try:
|
||||
vae.enable_slicing()
|
||||
vae.enable_tiling()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
_VAE_SINGLETON = vae
|
||||
_VAE_DEVICE = target_device
|
||||
return vae, target_device
|
||||
|
||||
|
def _pad_to_multiple(arr: np.ndarray, mult: int = 8) -> Tuple[np.ndarray, Tuple[int, int, int, int]]:
    """Reflect-pad an HxWxC image so H, W are multiples of ``mult``.

    Returns the padded array and the pad amounts ``(top, bottom, left, right)``
    so the caller can undo the pad after decoding.
    """
    H, W = arr.shape[:2]
    pad_h = (-H) % mult
    pad_w = (-W) % mult
    if pad_h == 0 and pad_w == 0:
        return arr, (0, 0, 0, 0)
    top = pad_h // 2
    bottom = pad_h - top
    left = pad_w // 2
    right = pad_w - left
    padded = np.pad(
        arr,
        ((top, bottom), (left, right), (0, 0)),
        mode="reflect",
    )
    return padded, (top, bottom, left, right)


def _unpad(arr: np.ndarray, pads: Tuple[int, int, int, int]) -> np.ndarray:
    """Crop away the ``(top, bottom, left, right)`` padding added by ``_pad_to_multiple``."""
    top, bottom, left, right = pads
    H, W = arr.shape[:2]
    return arr[top : H - bottom if bottom else H, left : W - right if right else W]


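The pad arithmetic in `_pad_to_multiple` relies on `(-H) % mult`, which in Python is the distance from `H` up to the next multiple of `mult`. A small worked example, using a hypothetical helper `pad_amounts` that reproduces only the integer arithmetic (no array is padded here):

```python
from typing import Tuple


def pad_amounts(h: int, w: int, mult: int = 8) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) pads that round h, w up to mult."""
    pad_h = (-h) % mult  # 0 when h is already a multiple of mult
    pad_w = (-w) % mult
    top, left = pad_h // 2, pad_w // 2
    # Any odd remainder goes to the bottom/right edge.
    return top, pad_h - top, left, pad_w - left


print(pad_amounts(1021, 770))   # (1, 2, 3, 3) -> padded size 1024 x 776
print(pad_amounts(1024, 1024))  # (0, 0, 0, 0) -> nothing to pad
```

Passing the returned tuple to `_unpad` crops the decoded image back to the original `H` x `W`.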
def vae_roundtrip(
    image_uint8: np.ndarray,
    strength: float = 1.0,
    device: Optional[str] = None,
    blend_with_original: float = 0.0,
    model_id: str = "stabilityai/sd-vae-ft-mse",
) -> np.ndarray:
    """Encode-decode ``image_uint8`` through the SD-VAE; return a uint8 image.

    Args:
        image_uint8: HxWx3 RGB uint8.
        strength: Blend weight of the reconstruction. ``1.0`` returns the pure
            reconstruction; ``0.7`` blends 70 % reconstruction + 30 % original,
            useful if pure VAE reconstruction is too visually different for a
            particular content category. Values outside ``[0, 1)`` leave the
            reconstruction unblended.
        device: Override device selection (``mps`` / ``cuda`` / ``cpu``).
        blend_with_original: Alternative spelling of ``1.0 - strength``. Only
            honored when ``strength`` is left at ``1.0``; the output is then
            ``(1 - blend) * vae_out + blend * original``.
        model_id: HF repo. ``stabilityai/sd-vae-ft-mse`` is fast; SDXL variants
            give marginally better reconstruction but need more memory.

    The returned image has identical spatial shape to the input. Border pixels
    may be slightly softened due to reflect-padding round-up to multiples of 8.
    """
    import torch

    if image_uint8.ndim != 3 or image_uint8.shape[2] != 3:
        raise ValueError(f"Expected HxWx3 RGB uint8, got {image_uint8.shape}")

    vae, dev = load_vae(model_id=model_id, device=device)
    padded, pads = _pad_to_multiple(image_uint8, mult=8)

    # uint8 [0, 255] -> float [-1, 1], then HWC -> NCHW with a batch axis.
    x = padded.astype(np.float32) / 127.5 - 1.0
    x = np.transpose(x, (2, 0, 1))[None, ...]
    t = torch.from_numpy(x).to(dev, dtype=next(vae.parameters()).dtype)

    with torch.no_grad():
        posterior = vae.encode(t).latent_dist
        # Decode the posterior mean (no sampling) so the roundtrip is deterministic.
        z = posterior.mean
        y = vae.decode(z).sample

    # NCHW [-1, 1] -> HWC [0, 255], then crop the padding away.
    y = y.float().cpu().numpy()[0]
    y = np.transpose(y, (1, 2, 0))
    y = (y + 1.0) * 127.5
    y = np.clip(y, 0, 255)
    y = _unpad(y, pads)

    original_f = image_uint8.astype(np.float32)
    if blend_with_original > 0.0 and strength == 1.0:
        strength = 1.0 - blend_with_original
    if 0.0 <= strength < 1.0:
        y = strength * y + (1.0 - strength) * original_f

    return np.clip(y, 0, 255).astype(np.uint8)


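The interaction between `strength` and `blend_with_original` at the end of `vae_roundtrip` is easy to misread, so here is the same branch logic on single pixel values. `effective_strength` and `blend` are hypothetical helper names; the conditions are copied from the function above.

```python
def effective_strength(strength: float = 1.0, blend_with_original: float = 0.0) -> float:
    """blend_with_original only applies when strength is left at 1.0."""
    if blend_with_original > 0.0 and strength == 1.0:
        strength = 1.0 - blend_with_original
    return strength


def blend(vae_px: float, orig_px: float, strength: float) -> float:
    """Mix reconstruction and original; values outside [0, 1) leave vae_px as is."""
    if 0.0 <= strength < 1.0:
        return strength * vae_px + (1.0 - strength) * orig_px
    return vae_px


s = effective_strength(blend_with_original=0.25)
print(s, blend(200.0, 100.0, s))                               # 0.75 175.0
print(effective_strength(strength=0.5, blend_with_original=0.25))  # 0.5
```

In the second call `blend_with_original` is ignored because `strength` was set explicitly, which is the case the docstring's "alias" wording glosses over.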
def _gaussian_blur_multichannel(
    img_f32: np.ndarray, sigma: float,
) -> np.ndarray:
    """Per-channel Gaussian blur at the given ``sigma`` using cv2."""
    import cv2

    ksize = max(3, int(2 * round(3 * sigma) + 1))
    if ksize % 2 == 0:
        ksize += 1
    out = np.empty_like(img_f32)
    for c in range(img_f32.shape[2]):
        out[..., c] = cv2.GaussianBlur(
            img_f32[..., c], (ksize, ksize), sigma, borderType=cv2.BORDER_REFLECT,
        )
    return out


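The kernel width in `_gaussian_blur_multichannel` covers roughly three sigmas on each side of the center. The sketch below evaluates just that formula; `kernel_size` is a hypothetical name for the one-line expression used above.

```python
def kernel_size(sigma: float) -> int:
    """Same expression as in _gaussian_blur_multichannel: about +/- 3 sigma, min 3."""
    return max(3, int(2 * round(3 * sigma) + 1))


for sigma in (0.1, 1.0, 8.0):
    print(sigma, kernel_size(sigma))  # 3, 7 and 49 respectively
```

Since `2 * n + 1` is odd for any integer `n` and the floor of 3 is odd too, the result always satisfies cv2's odd-kernel requirement; the parity check in the function is a harmless safety net.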
def vae_roundtrip_freqselective(
    image_uint8: np.ndarray,
    lowfreq_sigma: float = 8.0,
    device: Optional[str] = None,
    model_id: str = "stabilityai/sd-vae-ft-mse",
) -> np.ndarray:
    """VAE roundtrip with low-frequency restoration from the original.

    Splits both the original and the VAE reconstruction into a low-freq band
    (Gaussian σ=``lowfreq_sigma``, containing lighting/color/gross structure)
    and a high-freq band (containing texture and, critically, the SynthID
    watermark at radii 14-238 bins on a 1024² grid, i.e. freqs above roughly
    0.02 cycles/pixel).

    Output = ``low_of(original) + high_of(vae_out)``. This preserves all
    perceptually dominant low-frequency content (≈ PSNR 34-40 dB) while the
    watermark-bearing band is replaced entirely by VAE-native content that
    SynthID's decoder has no trained basis for.

    A σ around 8 matches the SynthID carrier band boundary on a 1024² image;
    scale proportionally for very different resolutions if you want to keep
    the same relative cutoff.
    """
    original_f = image_uint8.astype(np.float32)
    vae_f = vae_roundtrip(image_uint8, device=device, model_id=model_id).astype(np.float32)

    low_orig = _gaussian_blur_multichannel(original_f, lowfreq_sigma)
    low_vae = _gaussian_blur_multichannel(vae_f, lowfreq_sigma)
    high_vae = vae_f - low_vae

    out = low_orig + high_vae
    return np.clip(out, 0, 255).astype(np.uint8)
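The band split `low_of(original) + high_of(vae_out)` can be illustrated on a 1-D signal with no VAE and no cv2. This is a minimal sketch under stated assumptions: `gaussian_kernel`, `blur`, `freq_selective`, `gaussian_cutoff`, and `scaled_sigma` are hypothetical names, borders are clamped rather than reflected, and the "reconstruction" is just the original plus a brightness offset. Reading the docstring's 0.02 cycles/pixel as the Gaussian's frequency-domain standard deviation, 1/(2πσ), is an interpretation rather than something the source states, as is scaling σ linearly with image size.

```python
import math
from typing import List


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian covering about +/- 3 sigma."""
    radius = max(1, round(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal: List[float], sigma: float) -> List[float]:
    """Low-pass a signal; out-of-range indices are clamped to the border."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def freq_selective(original: List[float], recon: List[float], sigma: float) -> List[float]:
    """Low band from the original plus high band from the reconstruction."""
    low_orig = blur(original, sigma)
    low_rec = blur(recon, sigma)
    return [lo + (r - lr) for lo, r, lr in zip(low_orig, recon, low_rec)]


def gaussian_cutoff(sigma: float) -> float:
    """Frequency-domain std of a Gaussian filter, in cycles per pixel."""
    return 1.0 / (2.0 * math.pi * sigma)


def scaled_sigma(size: int, base_sigma: float = 8.0, base_size: int = 1024) -> float:
    """Assumed linear scaling that keeps the same relative cutoff."""
    return base_sigma * size / base_size


original = [100.0 + 10.0 * math.sin(i / 3.0) for i in range(64)]
recon = [v + 5.0 for v in original]  # reconstruction with a brightness shift
out = freq_selective(original, recon, sigma=2.0)
print(max(abs(a - b) for a, b in zip(out, original)))  # close to 0
print(round(gaussian_cutoff(8.0), 4), scaled_sigma(2048))
```

The brightness shift lives entirely in the low band, so it is discarded and the output matches the original; anything the reconstruction changes in the high band would be kept instead.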