feat(Add more docs for bayesian optimizer):

2026-06-23 21:59:57 +02:00 · 2025-01-26 12:29:29 +02:00
parent 83e5362501
commit 5db9676837
3 changed files with 199 additions and 0 deletions
@@ -0,0 +1,119 @@
+# Image Generation System
+
+The image generation system creates visual probes for security testing by converting text prompts into images. This document explains its architecture and implementation.
+
+## Overview
+
+The system:
+
+1. Converts text datasets into image datasets
+1. Generates images using matplotlib
+1. Encodes images for transmission
+1. Integrates with the LLM probing system
+
+## Core Components
+
+### Image Generation
+
+```python
+@cache_to_disk()
+def generate_image(prompt: str) -> bytes:
+    """
+    Generates a JPEG image containing the provided text prompt
+    """
+    # Create figure with light blue background
+    fig, ax = plt.subplots(figsize=(6, 4))
+    ax.set_facecolor("lightblue")
+
+    # Add centered text
+    ax.text(
+        0.5, 0.5,
+        prompt,
+        fontsize=16,
+        ha="center",
+        va="center",
+        wrap=True,
+        color="darkblue"
+    )
+
+    # Save to buffer
+    buffer = io.BytesIO()
+    plt.savefig(buffer, format="jpeg", bbox_inches="tight")
+    return buffer.getvalue()
+```
+
+### Dataset Conversion
+
+```python
+def generate_image_dataset(text_dataset: list[ProbeDataset]) -> list[ImageProbeDataset]:
+    """
+    Converts text datasets into image datasets
+    """
+    image_datasets = []
+
+    for dataset in text_dataset:
+        image_prompts = [
+            generate_image(prompt)
+            for prompt in tqdm(dataset.prompts)
+        ]
+
+        image_datasets.append(ImageProbeDataset(
+            test_dataset=dataset,
+            image_prompts=image_prompts
+        ))
+
+    return image_datasets
+```
+
+### Image Encoding
+
+```python
+def encode(image: bytes) -> str:
+    """
+    Encodes image bytes into base64 data URL
+    """
+    encoded = base64.b64encode(image).decode("utf-8")
+    return "data:image/jpeg;base64," + encoded
+```
+
+## Integration
+
+### RequestAdapter
+
+The RequestAdapter class integrates image generation with LLM probing:
+
+```python
+class RequestAdapter:
+    def __init__(self, llm_spec):
+        if not llm_spec.has_image:
+            raise ValueError("LLMSpec must have an image")
+        self.llm_spec = llm_spec
+
+    async def probe(self, prompt: str, encoded_image: str = "",
+                   encoded_audio: str = "", files={}) -> httpx.Response:
+        encoded_image = generate_image(prompt)
+        encoded_image = encode(encoded_image)
+        return await self.llm_spec.probe(prompt, encoded_image, encoded_audio, files)
+```
+
+## Key Features
+
+- **Caching**: Generated images are cached to disk using @cache_to_disk
+- **Progress Tracking**: tqdm progress bars for dataset conversion
+- **Error Handling**: Validates LLM specifications before probing
+- **Standard Formats**: Uses JPEG format with base64 encoding
+
+## Configuration
+
+The system is configured through:
+
+1. Figure size (6x4 inches)
+1. Background color (light blue)
+1. Text styling (16pt dark blue centered text)
+1. Image format (JPEG)
+
+## Limitations
+
+- Currently only supports text-based image generation
+- Fixed visual style and formatting
+- Requires matplotlib and associated dependencies
@@ -0,0 +1,78 @@
+# Bayesian Optimization in Security Fuzzing
+
+The fuzzer implements an optimization system using scikit-optimize (skopt) to minimize failure rates during security scans. This document explains the optimizer's implementation and behavior.
+
+## Overview
+
+The optimizer is used in both single-shot and many-shot scanning modes when the `optimize` parameter is True. It dynamically adjusts scan parameters to minimize failure rates while staying within budget constraints.
+
+## Implementation Details
+
+### Initialization
+
+The optimizer is initialized with:
+
+```python
+Optimizer(
+    [Real(0, 1)],  # Single parameter space (0 to 1)
+    base_estimator="GP",  # Gaussian Process estimator
+    n_initial_points=25  # Initial exploration points
+)
+```
+
+### Optimization Process
+
+1. **Parameter Space**: A single real-valued parameter between 0 and 1
+1. **Objective**: Minimize the failure rate (negative failure rate is maximized)
+1. **Update Mechanism**:
+   ```python
+   next_point = optimizer.ask()
+   optimizer.tell(next_point, -failure_rate)
+   ```
+1. **Early Stopping**: If best failure rate exceeds 50%:
+   ```python
+   if best_failure_rate > 0.5:
+       yield ScanResult.status_msg(
+           f"High failure rate detected ({best_failure_rate:.2%}). Stopping this module..."
+       )
+       break
+   ```
+
+## Usage in Scanning
+
+The optimizer is integrated into both scan types:
+
+### Single-shot Scan
+
+- Used in `perform_single_shot_scan()`
+- Optimizes failure rates across prompt modules
+- Considers token budget constraints
+
+### Many-shot Scan
+
+- Used in `perform_many_shot_scan()`
+- Handles more complex multi-step attacks
+- Maintains separate failure rate tracking
+
+## Key Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| base_estimator | Gaussian Process (GP) used for optimization |
+| n_initial_points | 25 initial exploration points |
+| Real(0, 1) | Single parameter space being optimized |
+| failure_rate | Current failure rate being minimized |
+
+## Optimization Flow
+
+1. Initialize optimizer with GP estimator
+1. Collect initial 25 data points
+1. For each prompt:
+   - Calculate current failure rate
+   - Update optimizer with new point
+   - Check for early stopping conditions
+1. Continue until scan completes or budget exhausted
+
+## Error Handling
+
+The optimizer is wrapped in try/except blocks to ensure scan failures don't crash the entire process. Any optimization errors are logged and the scan continues with default parameters.
@@ -22,6 +22,8 @@ nav:
    - Dataset Extension: datasets.md
    - External Modules: external_module.md
    - CI/CD Integration: ci_cd.md
+    - Bayesian Optimization: optimizer.md
+    - Image Generation: image_generation.md
  - Reference:
    - API Reference: api_reference.md
  - Community: