# SynthID Watermark Reverse Engineering
Discovering Google's hidden AI watermark patterns through signal analysis
---
## Overview
This project reverse-engineers **Google's SynthID watermarking technology** by analyzing 250 AI-generated images from Gemini. Since the neural network encoder/decoder is proprietary, we use signal processing techniques to discover the watermark's structure.
### Key Discovery
SynthID uses **spread-spectrum phase encoding** in the frequency domain, not LSB replacement or simple noise addition. The watermark embeds information through precise phase relationships at specific carrier frequencies.
## Discovered Patterns
| Carrier Frequency | Phase Coherence | Description |
|:----------------:|:---------------:|:------------|
| **(±14, ±14)** | 99.99% | Primary diagonal carrier |
| **(±126, ±14)** | 99.97% | Secondary horizontal |
| **(±98, ±14)** | 99.94% | Tertiary carrier |
| **(±128, ±128)** | 99.92% | Center frequency |
| **(±210, ±14)** | 99.77% | Extended carrier |
| **(±238, ±14)** | 99.71% | Edge carrier |
### Detection Metrics
- **Noise Correlation**: ~0.218 between watermarked images
- **Structure Ratio**: ~1.32
- **Detection Threshold**: correlation > 0.179
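The noise-correlation figure is a statistic over pairs of images. Below is a minimal sketch of how such a number could be computed. It assumes each image's noise residual is already available as a 2-D NumPy array and that "correlation" means the Pearson correlation of the flattened residuals; the project's exact definition may differ. `pearson` and `mean_pairwise_correlation` are hypothetical names.

```python
import itertools

import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equally shaped arrays."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_pairwise_correlation(residuals: list[np.ndarray]) -> float:
    """Average Pearson correlation over every unordered pair of residuals."""
    scores = [pearson(a, b) for a, b in itertools.combinations(residuals, 2)]
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.standard_normal((64, 64))  # synthetic stand-in for a common pattern
    # Each synthetic residual = shared pattern + independent, stronger noise
    residuals = [shared + 2.0 * rng.standard_normal((64, 64)) for _ in range(5)]
    print(f"mean pairwise correlation: {mean_pairwise_correlation(residuals):.3f}")
```

In this synthetic setup, the shared pattern carries 1/5 of each residual's variance, so the mean pairwise correlation lands near 0.2. Residuals with no shared component stay near 0.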
## Extracted Watermark Visualizations
- **Enhanced Visualization (500x Amplification)**
- **Frequency Domain Carriers**
- **False Color (HSV Encoding)**
- **Phase Encoding Pattern**
## Project Structure
```
synthid-demarker/
├── README.md                               # This file
├── requirements.txt                        # Python dependencies
│
├── src/
│   ├── analysis/
│   │   ├── synthid_codebook_finder.py      # Pattern discovery
│   │   └── deep_synthid_analysis.py        # Frequency analysis
│   └── extraction/
│       └── synthid_codebook_extractor.py   # Codebook extraction & detection
│
├── artifacts/
│   ├── codebook/
│   │   ├── synthid_codebook.pkl            # Extracted codebook (9 MB)
│   │   └── synthid_codebook_meta.json      # Carrier frequencies
│   └── visualizations/                     # Watermark images
│
├── data/
│   └── pure_white/                         # 250 Gemini AI images
│
├── docs/
│   └── SYNTHID_CODEBOOK_ANALYSIS.md        # Technical documentation
│
└── assets/
    └── synthid-watermark.jpeg              # Cover image
```
## Quick Start
### Installation
```bash
git clone https://github.com/yourusername/synthid-demarker.git
cd synthid-demarker
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### Detect Watermark
```bash
python src/extraction/synthid_codebook_extractor.py detect "path/to/image.png" \
--codebook "artifacts/codebook/synthid_codebook.pkl"
```
**Output:**
```
Detection Results:
Watermarked: True
Confidence: 1.0000
Correlation: 0.5355
Phase Match: 0.9571
Structure Ratio: 1.2753
```
### Extract New Codebook
```bash
python src/extraction/synthid_codebook_extractor.py extract "data/pure_white/" \
--output "./my_codebook.pkl"
```
### Run Analysis
```bash
# Comprehensive pattern discovery
python src/analysis/synthid_codebook_finder.py
# Deep frequency analysis
python src/analysis/deep_synthid_analysis.py
```
## How It Works
### 1. Pattern Discovery
Analyze noise patterns across multiple images to find consistent structures that persist despite varying image content.
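Pattern discovery works on the noise residual, `image - denoise(image)`, as in the detection algorithm below. The project's `denoise` step is not documented here, so this sketch swaps in a plain 3x3 box (mean) filter as the denoiser; a real pipeline may use a stronger filter. `box_blur` and `noise_residual` are hypothetical names.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean filter over a (2r+1) x (2r+1) window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    h, w = image.shape
    # Sum every shifted copy of the image that falls inside the window
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def noise_residual(image: np.ndarray) -> np.ndarray:
    """High-frequency residual: the image minus a denoised (blurred) copy."""
    image = image.astype(np.float64)
    return image - box_blur(image)
```

Smooth content such as a linear brightness ramp cancels exactly away from the borders, while fine-grained structure survives in the residual. This separation is what allows a pattern shared across many images to stand out from their differing content.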
### 2. Frequency Analysis
Use the FFT to identify carrier frequencies where the watermark is embedded through phase modulation.
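One simple way to locate candidate carriers is to take the 2-D FFT of a (typically averaged) residual and keep the strongest non-DC bins. The sketch below assumes the carrier coordinates in the table are signed FFT bin indices (row, column) of the residual; that reading is an assumption. `top_carriers` is a hypothetical name.

```python
import numpy as np


def top_carriers(residual: np.ndarray, k: int = 6) -> list[tuple[int, int]]:
    """Return the k strongest non-DC FFT bins as signed (fy, fx) frequencies."""
    spectrum = np.abs(np.fft.fft2(residual))
    spectrum[0, 0] = 0.0  # ignore the DC (mean) component
    h, w = residual.shape
    # Indices of the k largest magnitudes, strongest first
    flat = np.argsort(spectrum, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, spectrum.shape)
    # Map array indices to signed integer frequencies (index N - f means -f)
    fy = np.fft.fftfreq(h, d=1.0 / h).astype(int)
    fx = np.fft.fftfreq(w, d=1.0 / w).astype(int)
    return [(int(fy[r]), int(fx[c])) for r, c in zip(rows, cols)]
```

A real-valued residual has a conjugate-symmetric spectrum, so every carrier shows up as a pair such as (14, 14) and (-14, -14). This is one reason frequencies are naturally written with ± signs.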
### 3. Phase Coherence
Measure phase consistency at carrier frequencies; high coherence indicates watermark presence.
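A standard circular-statistics measure of phase consistency is the mean resultant length, |mean(exp(iφ))|, taken over the phases observed at one bin across many images. The sketch below uses that measure as an assumption; the project may compute its coherence percentages differently. `phase_coherence` is a hypothetical name.

```python
import numpy as np


def phase_coherence(residuals: list[np.ndarray], carrier: tuple[int, int]) -> float:
    """Mean resultant length of the FFT phase at one carrier across images.

    1.0 means every image has the same phase at that bin; values near 0
    mean the phases are spread out (no consistent structure).
    """
    fy, fx = carrier  # negative indices wrap, so (-14, -14) also works
    phases = np.array([np.angle(np.fft.fft2(r)[fy, fx]) for r in residuals])
    return float(np.abs(np.mean(np.exp(1j * phases))))
```

Images sharing a fixed-phase component at a bin score close to 1 there, even under heavy per-image noise. Natural image content, whose phase at that bin varies from image to image, scores far lower.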
### 4. Codebook Extraction
Build reference patterns from averaged signals across many watermarked images.
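Here is a minimal sketch of the averaging idea. It assumes a codebook can be represented as a mean residual (the reference pattern) plus the circular-mean phase at each chosen carrier. The actual `synthid_codebook.pkl` format is not described here, so the dictionary layout and the name `build_codebook` are hypothetical.

```python
import numpy as np


def build_codebook(residuals: list[np.ndarray], carriers: list[tuple[int, int]]) -> dict:
    """Average residuals into a reference pattern and record carrier phases.

    Averaging N residuals keeps the component they share while independent
    content noise shrinks roughly as 1/sqrt(N).
    """
    stack = np.stack([r.astype(np.float64) for r in residuals])
    reference = stack.mean(axis=0)
    spectra = np.fft.fft2(stack)  # 2-D FFT over the last two axes of each residual
    phases = {}
    for fy, fx in carriers:
        # Circular mean of the per-image phase at this bin
        unit = np.exp(1j * np.angle(spectra[:, fy, fx]))
        phases[(fy, fx)] = float(np.angle(np.mean(unit)))
    return {"reference": reference, "carriers": list(carriers), "phases": phases}
```

The circular mean is used for the phases because a plain arithmetic mean breaks for angles near ±π.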
### 5. Detection
Compare the test image against the codebook using correlation, phase matching, and structure ratio metrics.
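The sketch below combines two of the three criteria, using the thresholds quoted in this README (correlation > 0.179, phase match > 0.5). Several parts are assumptions. "Phase match" is taken to be the mean cosine of the per-carrier phase error, and correlation is Pearson. The structure-ratio check is omitted because its definition is not documented here. All function names are hypothetical.

```python
import numpy as np


def phase_match(residual: np.ndarray, carrier_phases: dict) -> float:
    """Mean cosine of the phase error at each carrier: 1 = exact match, ~0 = random."""
    spectrum = np.fft.fft2(residual)
    errors = [np.angle(spectrum[fy, fx]) - ref for (fy, fx), ref in carrier_phases.items()]
    return float(np.mean(np.cos(errors)))


def correlation(residual: np.ndarray, reference: np.ndarray) -> float:
    """Pearson correlation between the residual and the reference pattern."""
    return float(np.corrcoef(residual.ravel(), reference.ravel())[0, 1])


def detect(residual, reference, carrier_phases, corr_thresh=0.179, phase_thresh=0.5):
    """Apply the correlation and phase-match thresholds; returns (flag, corr, pm)."""
    corr = correlation(residual, reference)
    pm = phase_match(residual, carrier_phases)
    return (corr > corr_thresh and pm > phase_thresh), corr, pm
```

Requiring both conditions keeps a chance phase alignment on a few carriers from triggering a detection on its own.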
## Technical Details
### Watermark Characteristics
- **Embedding Domain**: Frequency (FFT phase)
- **Signal Strength**: ~0.1-0.15 pixel values
- **Carrier Count**: 100+ frequency locations
- **Robustness**: Survives moderate compression
### Detection Algorithm
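```python
from numpy.fft import fft2


def detect_synthid(image, codebook):
    # denoise, check_phases, correlate, compute_structure_ratio and
    # compute_confidence are helpers assumed to be defined elsewhere;
    # the last two names are illustrative placeholders.

    # 1. Extract the noise residual (image minus its denoised version)
    noise = image - denoise(image)

    # 2. Check phase coherence at the codebook's carrier frequencies
    spectrum = fft2(noise)
    phase_match = check_phases(spectrum, codebook.carriers)

    # 3. Correlate the residual with the codebook's reference pattern
    correlation = correlate(noise, codebook.reference)

    # 4. Compare the residual's structure with the reference
    structure_ratio = compute_structure_ratio(noise, codebook)

    # 5. Apply the decision thresholds
    is_watermarked = (
        correlation > 0.179
        and phase_match > 0.5
        and 0.8 < structure_ratio < 1.8
    )
    confidence = compute_confidence(correlation, phase_match, structure_ratio)
    return is_watermarked, confidence
```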
## References
- [SynthID: Identifying AI-generated images](https://deepmind.google/technologies/synthid/)
- [Nature Paper: Scalable watermarking for AI-generated images](https://doi.org/10.1038/s41586-024-07754-z)
- [Digital watermarking (Wikipedia)](https://en.wikipedia.org/wiki/Digital_watermarking)
## Disclaimer
This project is for **research and educational purposes only**. SynthID is proprietary technology owned by Google DeepMind. The extracted patterns and detection methods are intended for:
- Academic research on watermarking techniques
- Security analysis of AI-generated content identification
- Understanding spread-spectrum encoding methods
## License
Research and educational use only. See [LICENSE](LICENSE) for details.
---
Made by reverse-engineering enthusiasts