UDora Architecture Documentation
This document describes the modular architecture of UDora after refactoring for better maintainability and extensibility.
Overview
The UDora codebase has been decomposed into specialized modules, each handling a specific aspect of the attack algorithm. This modular design makes the code more:
- Readable: Each module has a clear, focused responsibility
- Maintainable: Changes to one component don't affect others
- Extensible: New features can be added by extending specific modules
- Testable: Individual components can be tested in isolation
Module Structure
1. attack.py - Core Attack Algorithm
Responsibilities:
- Main UDora class implementation
- Attack orchestration and optimization loop
- Configuration and result data classes
- Attack buffer management
- High-level coordination between modules
Key Classes:
- UDora: Main attack class
- UDoraConfig: Configuration parameters
- UDoraResult: Attack results
- AttackBuffer: Candidate management
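As an illustration of what "candidate management" can mean in practice, here is a minimal sketch of a fixed-size buffer that keeps the lowest-loss candidates seen so far. The class name `CandidateBuffer`, its fields, and its methods are hypothetical and chosen for this example only; the real `AttackBuffer` in attack.py may store different data and expose a different interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CandidateBuffer:
    """Hypothetical stand-in for AttackBuffer: keeps the `size` best candidates."""

    size: int
    # Entries are (loss, candidate) pairs, kept sorted by ascending loss.
    entries: List[Tuple[float, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.size < 1:
            raise ValueError("size must be at least 1")

    def add(self, loss: float, candidate: str) -> None:
        """Insert a candidate, evicting the worst one if the buffer is full."""
        if len(self.entries) < self.size:
            self.entries.append((loss, candidate))
        elif loss < self.entries[-1][0]:
            # The buffer is full and the new candidate beats the current worst.
            self.entries[-1] = (loss, candidate)
        else:
            return
        self.entries.sort(key=lambda entry: entry[0])

    def best(self) -> Tuple[float, str]:
        """Return the (loss, candidate) pair with the lowest loss."""
        return self.entries[0]


buffer = CandidateBuffer(size=2)
for loss, suffix in [(3.0, "a"), (1.0, "b"), (2.0, "c"), (5.0, "d")]:
    buffer.add(loss, suffix)
print(buffer.entries)  # [(1.0, 'b'), (2.0, 'c')]
```

Keeping the entries sorted makes both "best candidate" and "worst candidate" lookups constant-time, which is convenient when the optimization loop queries the buffer at every step.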
2. scheduling.py - Weighted Interval Scheduling
Responsibilities:
- Weighted interval scheduling algorithm (the core of UDora's positioning strategy)
- Interval filtering based on optimization modes
- Final token sequence construction
- Dynamic programming for optimal target placement
Key Functions:
- weighted_interval_scheduling(): Core DP algorithm
- filter_intervals_by_sequential_mode(): Mode-specific filtering
- build_final_token_sequence(): Token sequence construction
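For readers unfamiliar with the technique, the following is a minimal sketch of the textbook weighted interval scheduling dynamic program: pick a set of non-overlapping intervals with maximum total score. The function name `schedule_intervals`, the `(start, end, score)` tuple layout, and the half-open interval convention are assumptions made for this example; the actual signature and data structures in scheduling.py may differ.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumed layout: (start, end, score), with `end` exclusive.
Interval = Tuple[int, int, float]


def schedule_intervals(intervals: List[Interval]) -> Tuple[float, List[Interval]]:
    """Return the best total score and the chosen non-overlapping intervals."""
    ordered = sorted(intervals, key=lambda iv: iv[1])  # sort by end position
    ends = [iv[1] for iv in ordered]
    n = len(ordered)

    # compatible[i]: how many earlier intervals end at or before interval i starts.
    compatible = [bisect_right(ends, ordered[i][0], 0, i) for i in range(n)]

    # best[k]: maximum total score using only the first k intervals.
    best = [0.0] * (n + 1)
    for k in range(1, n + 1):
        score = ordered[k - 1][2]
        best[k] = max(best[k - 1], score + best[compatible[k - 1]])

    # Walk backwards through the table to recover which intervals were taken.
    chosen: List[Interval] = []
    k = n
    while k > 0:
        score = ordered[k - 1][2]
        if score + best[compatible[k - 1]] > best[k - 1]:
            chosen.append(ordered[k - 1])
            k = compatible[k - 1]
        else:
            k -= 1
    chosen.reverse()
    return best[n], chosen


candidates = [(0, 3, 0.5), (2, 5, 1.0), (3, 6, 0.25), (5, 8, 0.5)]
total, picked = schedule_intervals(candidates)
print(total, picked)  # 1.5 [(2, 5, 1.0), (5, 8, 0.5)]
```

Sorting by end position and using binary search for the last compatible interval gives an O(n log n) algorithm, which is the usual reason to prefer this DP over enumerating subsets.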
3. datasets.py - Dataset-Specific Logic
Responsibilities:
- Success condition evaluation for different datasets
- Dataset-specific formatting and validation
- Extensible framework for adding new datasets
Key Functions:
- check_success_condition(): Main success evaluation
- Dataset-specific checkers: _check_webshop_success(), _check_injecagent_success(), _check_agentharm_success()
- validate_dataset_name(): Dataset validation
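One common way to structure this kind of per-dataset dispatch is a small registry keyed by dataset name. The sketch below reuses the function names listed above, but the signatures, the `register_checker` decorator, and the substring-matching rule inside the checker are all assumptions for illustration; they are not taken from datasets.py.

```python
from typing import Callable, Dict

# Assumed checker signature: (model_response, target_action) -> success flag.
Checker = Callable[[str, str], bool]
_CHECKERS: Dict[str, Checker] = {}


def register_checker(name: str) -> Callable[[Checker], Checker]:
    """Hypothetical decorator that registers a checker under a dataset name."""
    def decorator(fn: Checker) -> Checker:
        _CHECKERS[name.lower()] = fn
        return fn
    return decorator


def validate_dataset_name(name: str) -> str:
    """Normalize the dataset name and reject unknown datasets."""
    key = name.lower()
    if key not in _CHECKERS:
        known = ", ".join(sorted(_CHECKERS))
        raise ValueError(f"Unknown dataset {name!r}; expected one of: {known}")
    return key


def check_success_condition(dataset: str, response: str, target: str) -> bool:
    """Dispatch to the checker registered for `dataset`."""
    return _CHECKERS[validate_dataset_name(dataset)](response, target)


@register_checker("webshop")
def _check_webshop_success(response: str, target: str) -> bool:
    # Placeholder rule (an assumption): success if the target action text
    # appears in the response, ignoring case.
    return target.lower() in response.lower()


print(check_success_condition("WebShop", "Action: click[B0ABC]", "click[b0abc]"))  # True
```

With this layout, supporting a new dataset means adding one decorated function; the dispatch and validation code stays unchanged.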
4. text_processing.py - Target Positioning
Responsibilities:
- Target interval identification and scoring
- Text analysis for optimal insertion positions
- Probability-based scoring of potential targets
- Debug utilities for interval analysis
Key Functions:
- build_target_intervals(): Find all possible target positions
- _compute_interval_score(): Score target quality
- count_matched_locations(): Success threshold analysis
- format_interval_debug_info(): Debug output formatting
5. loss.py - Specialized Loss Functions
Responsibilities:
- UDora loss computation combining probability and reward components
- Cross-entropy loss with positional weighting
- Mellowmax loss application for smoother optimization
- Consecutive token matching rewards
Key Functions:
- compute_udora_loss(): Main UDora loss with exponential weighting
- compute_cross_entropy_loss(): Standard cross-entropy with position weighting
- apply_mellowmax_loss(): Mellowmax alternative to cross-entropy
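To make the mellowmax idea concrete, the sketch below implements the standard mellowmax operator, mm(x) = (1/alpha) * log((1/n) * sum(exp(alpha * x_i))), over a list of per-token losses using only the standard library. The function name `mellowmax`, the `alpha` default, and the use of plain Python lists instead of tensors are assumptions for this example; `apply_mellowmax_loss()` in loss.py may be parameterized differently.

```python
import math
from typing import Sequence


def mellowmax(values: Sequence[float], alpha: float = 1.0) -> float:
    """Smooth maximum of `values`; requires alpha > 0 and a non-empty input."""
    if not values:
        raise ValueError("values must be non-empty")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    scaled = [alpha * v for v in values]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return (log_sum_exp - math.log(len(values))) / alpha


per_token_losses = [0.1, 0.2, 3.0]
print(mellowmax(per_token_losses))        # lies between the mean and the max
print(mellowmax(per_token_losses, 10.0))  # larger alpha moves it toward the max
```

For positive `alpha` the result always lies between the arithmetic mean and the maximum of the inputs, so a few badly predicted tokens influence the loss more than they would under a plain average.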
6. readable.py - Readable Adversarial String Optimization
Responsibilities:
- Generate natural language adversarial strings instead of random tokens
- Apply semantic guidance to gradient-based optimization
- Evaluate readability and naturalness of adversarial strings
- Context-aware token selection for fluent adversarial prompts
Key Classes:
- ReadableOptimizer: Main class for readable optimization
  - Vocabulary categorization (functional words, content words, etc.)
  - Context-aware gradient modification
  - Fluency-based token bonuses
  - ASCII/special character penalties
Key Functions:
- apply_readable_guidance(): Modify gradients to encourage natural language
- generate_readable_initialization(): Create natural language starting points
- evaluate_readability(): Assess naturalness and coherence of text
- create_readable_optimizer(): Factory function for easy instantiation
Beta Feature: Enable with config.readable = True to generate adversarial strings that appear more natural and less suspicious to human reviewers.
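The bonus and penalty ideas listed above can be pictured with a small, self-contained sketch that nudges per-token candidate scores toward word-like tokens. Everything here is hypothetical: the function name `readability_adjusted_scores`, the convention that higher scores are preferred, and the bonus and penalty values are illustrative assumptions, not the behaviour of `apply_readable_guidance()`.

```python
from typing import List, Sequence


def readability_adjusted_scores(
    tokens: Sequence[str],
    scores: Sequence[float],
    word_bonus: float = 0.5,
    symbol_penalty: float = 1.0,
) -> List[float]:
    """Return scores adjusted to favor word-like tokens (higher is better)."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    adjusted = []
    for token, score in zip(tokens, scores):
        text = token.strip()
        if text.isalpha() and text.isascii():
            # Plain ASCII words get a small bonus.
            score += word_bonus
        elif not text or not text.isprintable() or not any(ch.isalnum() for ch in text):
            # Empty, unprintable, or purely symbolic tokens get penalized.
            score -= symbol_penalty
        adjusted.append(score)
    return adjusted


print(readability_adjusted_scores([" the", "}}", "x9"], [1.0, 1.0, 1.0]))
# [1.5, 0.0, 1.0]
```

Mixed tokens such as "x9" are left unchanged in this sketch; a real implementation would likely use richer signals, such as the vocabulary categories mentioned above.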
7. utils.py - General Utilities
Responsibilities:
- General utility functions
- Text processing helpers
- Model interface utilities
- Common constants and helpers