UDora/udora/README.md

# UDora Architecture Documentation

This document describes the modular architecture of UDora after refactoring for better maintainability and extensibility.

## Overview

The UDora codebase has been decomposed into specialized modules, each handling a specific aspect of the attack algorithm. This modular design makes the code more:

- **Readable**: Each module has a clear, focused responsibility
- **Maintainable**: Changes to one component don't affect others
- **Extensible**: New features can be added by extending specific modules
- **Testable**: Individual components can be tested in isolation

## Module Structure

### 1. `attack.py` - Core Attack Algorithm

**Responsibilities**:

- Main `UDora` class implementation
- Attack orchestration and optimization loop
- Configuration and result data classes
- Attack buffer management
- High-level coordination between modules

**Key Classes**:

- `UDora`: Main attack class
- `UDoraConfig`: Configuration parameters
- `UDoraResult`: Attack results
- `AttackBuffer`: Candidate management

### 2. `scheduling.py` - Weighted Interval Scheduling

**Responsibilities**:

- Weighted interval scheduling algorithm (the core of UDora's positioning strategy)
- Interval filtering based on optimization modes
- Final token sequence construction
- Dynamic programming for optimal target placement

**Key Functions**:

- `weighted_interval_scheduling()`: Core DP algorithm
- `filter_intervals_by_sequential_mode()`: Mode-specific filtering
- `build_final_token_sequence()`: Token sequence construction

### 3. `datasets.py` - Dataset-Specific Logic

**Responsibilities**:

- Success condition evaluation for different datasets
- Dataset-specific formatting and validation
- Extensible framework for adding new datasets

**Key Functions**:

- `check_success_condition()`: Main success evaluation
- Dataset-specific checkers: `_check_webshop_success()`, `_check_injecagent_success()`, `_check_agentharm_success()`
- `validate_dataset_name()`: Dataset validation

### 4. `text_processing.py` - Target Positioning

**Responsibilities**:

- Target interval identification and scoring
- Text analysis for optimal insertion positions
- Probability-based scoring of potential targets
- Debug utilities for interval analysis

**Key Functions**:

- `build_target_intervals()`: Find all possible target positions
- `_compute_interval_score()`: Score target quality
- `count_matched_locations()`: Success threshold analysis
- `format_interval_debug_info()`: Debug output formatting

### 5. `loss.py` - Specialized Loss Functions

- UDora loss computation combining probability and reward components
- Cross-entropy loss with positional weighting
- Mellowmax loss application for smoother optimization
- Consecutive token matching rewards

**Key Functions**:

- `compute_udora_loss()`: Main UDora loss with exponential weighting
- `compute_cross_entropy_loss()`: Standard cross-entropy with position weighting
- `apply_mellowmax_loss()`: Mellowmax alternative to cross-entropy

### 6. `readable.py` - Readable Adversarial String Optimization

**Responsibilities**:

- Generate natural language adversarial strings instead of random tokens
- Apply semantic guidance to gradient-based optimization
- Evaluate readability and naturalness of adversarial strings
- Context-aware token selection for fluent adversarial prompts

**Key Classes**:

- `ReadableOptimizer`: Main class for readable optimization
  - Vocabulary categorization (functional words, content words, etc.)
  - Context-aware gradient modification
  - Fluency-based token bonuses
  - ASCII/special character penalties

**Key Functions**:

- `apply_readable_guidance()`: Modify gradients to encourage natural language
- `generate_readable_initialization()`: Create natural language starting points
- `evaluate_readability()`: Assess naturalness and coherence of text
- `create_readable_optimizer()`: Factory function for easy instantiation

**Beta Feature**: Enable with `config.readable = True` to generate adversarial strings that appear more natural and less suspicious to human reviewers.

### 6. `utils.py` - General Utilities

**Responsibilities**:

- General utility functions
- Text processing helpers
- Model interface utilities
- Common constants and helpers