SynthID Watermark Analysis

🔍 AI Watermark Reverse Engineering

Discovering hidden AI watermark patterns through signal analysis


---

## 🎯 Overview

This project reverse-engineers **AI watermarking technologies** by analyzing AI-generated and AI-edited images. We use signal processing techniques to discover watermark structures without access to proprietary neural network encoders/decoders.

### Projects

| Analysis | Images | Detection Rate | Key Finding |
|:---------|:------:|:--------------:|:------------|
| **[Nano-150k Investigation](#-nano-150k-watermark-investigation)** | 123,268 | 99.9% | Multi-layer frequency + spatial watermarking |
| **[SynthID Analysis](#-synthid-google-gemini-analysis)** | 250 | 84% | Spread-spectrum phase encoding |

---

## 🔬 Nano-150k Watermark Investigation

Analysis of **123,268 AI-edited image pairs** from the Nano-150k dataset to detect and characterize embedded watermarks.

### Key Discovery

AI-edited images contain **multi-layer watermarks** using both frequency domain (DCT/DFT) and spatial domain (color shifts) embedding techniques. The watermarks are invisible to humans but detectable via statistical analysis.

### Detection Results

| Metric | Rate | Description |
|:-------|:----:|:------------|
| **Frequency Domain Modifications** | 100.0% | All images show spectral changes |
| **Significant Color Shifts** | 95.3% | Mean shift > 1.0 in RGB channels |
| **Perceptual Hash Changes** | 66.0% | Invisible modifications detected |
| **LSB Anomalies** | 10.2% | Least significant bit patterns |
| **2+ Watermark Indicators** | 99.9% | Multi-layer evidence |
| **3+ Watermark Indicators** | 69.2% | Strong multi-layer evidence |

### Watermark Confidence Distribution

```
0 indicators:       0 ( 0.0%)
1 indicator:      122 ( 0.1%)
2 indicators:  37,832 (30.7%)  ███████████████
3 indicators:  74,525 (60.5%)  ██████████████████████████████
4 indicators:  10,789 ( 8.8%)  ████
```

### Extracted Watermark Visualizations
**Extracted Watermark Pattern** **Comprehensive Analysis**
**Frequency Spectrum** **Enhanced Difference Pattern**
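
As a rough illustration of how three of the indicators in the Detection Results table might be computed, here is a minimal NumPy sketch. The function names (`mean_color_shift`, `spectrum_difference`, `lsb_deviation`), the grayscale conversion, and the use of a log-magnitude spectrum are assumptions made for this example; the project's own analysis scripts may define these quantities differently. The synthetic "edit" at the bottom is made up purely to exercise the functions.

```python
import numpy as np


def mean_color_shift(original, edited):
    """Per-channel mean of (edited - original) for H x W x 3 arrays."""
    diff = edited.astype(np.float64) - original.astype(np.float64)
    return diff.reshape(-1, diff.shape[-1]).mean(axis=0)


def spectrum_difference(original, edited):
    """Mean absolute difference between log-magnitude spectra (grayscale)."""
    def log_spectrum(img):
        gray = img.astype(np.float64).mean(axis=-1)
        return np.log1p(np.abs(np.fft.fft2(gray)))
    return float(np.mean(np.abs(log_spectrum(edited) - log_spectrum(original))))


def lsb_deviation(image):
    """Per-channel distance of the set-LSB fraction from the ideal 0.5."""
    lsb = image.astype(np.uint8) & 1
    return np.abs(lsb.reshape(-1, lsb.shape[-1]).mean(axis=0) - 0.5)


# Synthetic stand-in for an (original, edited) pair; not real dataset images.
rng = np.random.default_rng(0)
original = rng.integers(10, 246, size=(64, 64, 3), dtype=np.uint8)
pattern = rng.integers(-1, 2, size=original.shape)   # faint per-pixel change
offset = np.array([2, 1, 0])                         # per-channel color shift
edited = (original.astype(np.int16) + pattern + offset).astype(np.uint8)

shift = mean_color_shift(original, edited)
freq_diff = spectrum_difference(original, edited)
lsb_dev = lsb_deviation(edited & 0xFE)               # all LSBs forced to zero

print("color shift per channel:", shift)
print("spectrum difference:", freq_diff)
print("LSB deviation (LSBs cleared):", lsb_dev)
```

Each function returns a plain number or a per-channel array, so the values can be compared against thresholds such as the "mean shift > 1.0" criterion listed in the table.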
### Analysis by Edit Category

| Category | Image Pairs | Avg Freq Diff | Watermark Strength |
|:---------|:-----------:|:-------------:|:------------------:|
| hairstyle | 16,012 | 1.786 | High |
| sweet_headshot | 16,008 | 1.759 | High |
| black_headshot | 17,700 | 1.735 | High |
| background | 32,765 | 1.037 | Medium |
| time-change | 18,178 | 1.028 | Medium |
| action | 22,605 | 1.013 | Medium |

### Processing Statistics

- **Total Processing Time**: 170.2 minutes
- **Processing Rate**: 12.1 pairs/second
- **Success Rate**: 100% (0 failed loads)

---

## 🔬 SynthID (Google Gemini) Analysis

Analysis of **250 AI-generated images** from Google Gemini to reverse-engineer SynthID watermarking.

### Key Discovery

SynthID uses **spread-spectrum phase encoding** in the frequency domain, not LSB replacement or simple noise addition. The watermark embeds information through precise phase relationships at specific carrier frequencies.

## 🔬 Discovered Patterns

| Carrier Frequency | Phase Coherence | Description |
|:----------------:|:---------------:|:------------|
| **(±14, ±14)** | 99.99% | Primary diagonal carrier |
| **(±126, ±14)** | 99.97% | Secondary horizontal |
| **(±98, ±14)** | 99.94% | Tertiary carrier |
| **(±128, ±128)** | 99.92% | Center frequency |
| **(±210, ±14)** | 99.77% | Extended carrier |
| **(±238, ±14)** | 99.71% | Edge carrier |

### Detection Metrics

- **Noise Correlation**: ~0.218 between watermarked images
- **Structure Ratio**: ~1.32
- **Detection Threshold**: correlation > 0.179

## 🖼️ Extracted Watermark Visualizations
**Enhanced Visualization (500x Amplification)** **Frequency Domain Carriers**
**False Color (HSV Encoding)** **Phase Encoding Pattern**
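
The phase coherence column in the Discovered Patterns table can be read as a measure of how consistently the phase at a given frequency repeats from image to image. This README does not state the exact formula, so the sketch below uses one common definition as an assumption: the magnitude of the average unit phasor across images (1.0 means identical phase in every image, values near 0 mean random phase). The function name `phase_coherence`, the synthetic carrier, and the noise level are illustrative and are not taken from the repository.

```python
import numpy as np


def phase_coherence(images, fy, fx):
    """Magnitude of the mean unit phasor at FFT bin (fy, fx) across images."""
    phasors = []
    for img in images:
        spectrum = np.fft.fft2(img)
        value = spectrum[fy % img.shape[0], fx % img.shape[1]]
        phasors.append(value / (np.abs(value) + 1e-12))  # keep phase, drop magnitude
    return float(np.abs(np.mean(phasors)))


# Synthetic data: every image shares one faint carrier at bin (14, 14)
# and has its own independent Gaussian noise.
rng = np.random.default_rng(0)
size = 256
y, x = np.mgrid[0:size, 0:size]
carrier = 0.15 * np.cos(2 * np.pi * (14 * y + 14 * x) / size + 0.7)
images = [carrier + rng.normal(0.0, 1.0, (size, size)) for _ in range(50)]

on_carrier = phase_coherence(images, 14, 14)   # shared phase, close to 1
off_carrier = phase_coherence(images, 40, 7)   # noise only, much lower

print(f"coherence at (14, 14): {on_carrier:.4f}")
print(f"coherence at (40, 7):  {off_carrier:.4f}")
```

Normalizing each FFT value to unit magnitude before averaging keeps the score insensitive to image brightness or contrast, so only the consistency of the phase contributes.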
## 📁 Project Structure

```
reverse-SynthID/
├── 📄 README.md                          # This file
├── 📋 requirements.txt                   # Python dependencies
│
├── 🔍 watermark_investigation/           # Nano-150k Analysis (NEW)
│   ├── WATERMARK_EXTRACTED.png           # Final extracted watermark
│   ├── WATERMARK_FINAL_ANALYSIS.png      # Comprehensive visualization
│   ├── WATERMARK_enhanced_difference.png # Enhanced pattern
│   ├── WATERMARK_frequency_spectrum.png  # Frequency domain
│   ├── WATERMARK_signed_pattern.png      # Signed watermark
│   ├── watermark_FULL_123k_results.json  # Complete results
│   ├── watermark_evidence/               # Visual evidence
│   └── *.py                              # Analysis scripts
│
├── 💻 src/
│   ├── analysis/
│   │   ├── synthid_codebook_finder.py    # Pattern discovery
│   │   └── deep_synthid_analysis.py      # Frequency analysis
│   └── extraction/
│       └── synthid_codebook_extractor.py # Codebook extraction & detection
│
├── 🎯 artifacts/
│   ├── codebook/
│   │   ├── synthid_codebook.pkl          # Extracted codebook (9 MB)
│   │   └── synthid_codebook_meta.json    # Carrier frequencies
│   └── visualizations/                   # Watermark images
│
├── 📂 data/
│   └── pure_white/                       # 250 Gemini AI images
│
├── 📚 docs/
│   └── SYNTHID_CODEBOOK_ANALYSIS.md      # Technical documentation
│
└── 🖼️ assets/
    └── synthid-watermark.jpeg            # Cover image
```

## 🚀 Quick Start

### Installation

```bash
git clone https://github.com/yourusername/reverse-SynthID.git
cd reverse-SynthID

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Run Nano-150k Watermark Analysis

```bash
# Full analysis on all 123k pairs (takes ~3 hours)
python watermark_investigation/watermark_full_123k_analysis.py

# Extract final watermark visualization
python watermark_investigation/extract_final_watermark.py

# Quick sample analysis (1000 pairs)
python watermark_investigation/watermark_full_analysis.py
```

### Detect SynthID Watermark

```bash
python src/extraction/synthid_codebook_extractor.py detect "path/to/image.png" \
    --codebook "artifacts/codebook/synthid_codebook.pkl"
```

**Output:**

```
Detection Results:
  Watermarked: True
  Confidence: 1.0000
  Correlation: 0.5355
  Phase Match: 0.9571
  Structure Ratio: 1.2753
```

### Extract New Codebook

```bash
python src/extraction/synthid_codebook_extractor.py extract "data/pure_white/" \
    --output "./my_codebook.pkl"
```

### Run Analysis

```bash
# Comprehensive pattern discovery
python src/analysis/synthid_codebook_finder.py

# Deep frequency analysis
python src/analysis/deep_synthid_analysis.py
```

## 🧠 How It Works

### Nano-150k Watermark Detection

1. **Frequency Domain Analysis**: Compute FFT differences between original and edited images
2. **LSB Pattern Detection**: Analyze least significant bit distributions for anomalies
3. **Color Shift Measurement**: Detect systematic RGB channel modifications
4. **Perceptual Hashing**: Compare perceptual hashes to find invisible changes
5. **Multi-Indicator Scoring**: Combine multiple detection methods for confidence

### SynthID Detection

1. **Pattern Discovery**: Analyze noise patterns across multiple images to find consistent structures
2. **Frequency Analysis**: Use FFT to identify carrier frequencies with phase modulation
3. **Phase Coherence**: Measure phase consistency at carrier frequencies
4. **Codebook Extraction**: Build reference patterns from averaged signals
5. **Detection**: Compare test image against codebook using correlation metrics

## 📊 Technical Details

### Nano-150k Watermark Characteristics

- **Embedding Domains**: Frequency (DCT/DFT) + Spatial (color shifts)
- **Detection Methods**: FFT analysis, LSB statistics, perceptual hashing
- **Signal Strength**: Mean freq diff ~1.32, color shifts 32-35 pixel values
- **Robustness**: Survives JPEG compression, consistent across edit types
- **Categories Analyzed**: background, action, time-change, headshot, hairstyle

### SynthID Watermark Characteristics

- **Embedding Domain**: Frequency (FFT phase)
- **Signal Strength**: ~0.1-0.15 pixel values
- **Carrier Count**: 100+ frequency locations
- **Robustness**: Survives moderate compression

### Detection Algorithms

**Nano-150k Multi-Indicator Detection:**

```python
def detect_watermark(original, edited):
    """Count watermark indicators; the compute_* helpers are not shown here."""
    indicators = 0

    # 1. Frequency domain analysis
    freq_diff = compute_fft_difference(original, edited)
    if freq_diff > 0.5:
        indicators += 1

    # 2. Color shift detection (one value per RGB channel)
    color_shift = compute_color_shift(original, edited)
    if any(abs(shift) > 1.0 for shift in color_shift):
        indicators += 1

    # 3. LSB anomaly detection (one value per channel)
    lsb_deviation = compute_lsb_deviation(edited)
    if any(dev > 0.02 for dev in lsb_deviation):
        indicators += 1

    # 4. Perceptual hash comparison
    phash_dist = compute_phash_distance(original, edited)
    if 5 < phash_dist <= 30:
        indicators += 1

    # Flag as watermarked when at least two indicators fire
    return indicators >= 2, indicators
```

**SynthID Detection:**

```python
def detect_synthid(image, codebook):
    # NOTE: denoise, fft2, check_phases, correlate, compute_structure_ratio and
    # compute_confidence are placeholders; this README does not define them.

    # 1. Extract noise pattern
    noise = image - denoise(image)

    # 2. Check carrier phase coherence
    fft = fft2(noise)
    phase_match = check_phases(fft, codebook.carriers)

    # 3. Correlate with reference
    correlation = correlate(noise, codebook.reference)

    # 4. Apply decision thresholds
    # structure_ratio was used but never assigned in the original snippet;
    # compute_structure_ratio is a hypothetical helper name.
    structure_ratio = compute_structure_ratio(noise)
    is_watermarked = (
        correlation > 0.179
        and phase_match > 0.5
        and 0.8 < structure_ratio < 1.8
    )

    # confidence was also undefined; compute_confidence is hypothetical.
    confidence = compute_confidence(correlation, phase_match, structure_ratio)
    return is_watermarked, confidence
```

## 📚 References

- [SynthID: Identifying AI-generated images](https://deepmind.google/technologies/synthid/)
- [arXiv paper - SynthID-Image: Image watermarking at internet scale](https://arxiv.org/abs/2510.09263)

## ⚠️ Disclaimer

This project is for **research and educational purposes only**. SynthID is proprietary technology owned by Google DeepMind. The extracted patterns and detection methods are intended for:

- Academic research on watermarking techniques
- Security analysis of AI-generated content identification
- Understanding spread-spectrum encoding methods

## 📄 License

Research and educational use only. See [LICENSE](LICENSE) for details.

---

Made with 🔬 by reverse engineering enthusiasts