diff --git a/README.md b/README.md
index 381acbc..cba5792 100644
--- a/README.md
+++ b/README.md
@@ -13,15 +13,13 @@
 
----
 
 ## 🔥 Pipeline Overview
 
-![UDora Pipeline](https://github.com/AI-secure/UDora/blob/main/assets/pipeline.pdf)
+![UDora Pipeline](https://github.com/AI-secure/UDora/blob/main/assets/pipeline.png)
 
 *UDora's dynamic attack pipeline: (1) Initial reasoning collection, (2) Weighted interval scheduling for optimal noise placement, (3) Adversarial string optimization, (4) Iterative refinement based on agent feedback.*
 
----
 
 ## 🚀 Motivation
@@ -33,7 +31,6 @@ Large Language Model (LLM) agents are increasingly deployed in real-world applic
 - 🌀 **Multi-Noise Surrogate Optimization**: Injecting and jointly optimizing several noise, yielding stronger attacks than single-noise methods
 - 🌐 **Multi-Dataset Generalization**: Working across diverse agent architectures and tasks
 
----
 
 ## 🏆 Key Contributions
@@ -80,7 +77,6 @@ pip install -r requirements.txt
 # https://github.com/UKGovernmentBEIS/inspect_evals
 ```
 
----
 
 ## 📖 Attack Scripts
@@ -142,7 +138,6 @@ data/
 └── agentharm/
     └── harmful_behaviors_test_public.json
 ```
 
----
 
 ## 🚀 Quick Start on Different Datasets
@@ -204,7 +199,6 @@ python agentharm.py --model_id meta-llama/Llama-3.1-8B-Instruct --add_space_befo
 **📊 Dataset Statistics**: 176 total cases (44 base prompts × 4 augmentations: detailed/non-detailed × with/without tool hints). Covers broad spectrum of agent tools and multi-step harmful behaviors.
 
----
 
 ## 🧪 Beta Features (Under Test)
@@ -244,7 +238,6 @@ Optimizes to minimize the appearance of one specific noise pattern rather than m
 python agentharm.py --model_id meta-llama/Llama-3.1-8B-Instruct --task 3 --readable --num_location 3 --minimize_reward
 ```
 
----
 
 ## 🏗️ Architecture
@@ -258,14 +251,12 @@ UDora consists of several key modules, feel free to modify them for your better
 - 📖 **`readable.py`**: Prompt injection optimization
 - 🛠️ **`utils.py`**: General utilities
 
----
 
 ## 💡 Examples
 
 See the `examples/` directory for usage examples:
 - 📓 `examples/udora_attack_demo.ipynb`: Comprehensive attack demonstration with malicious environment
 
 Note: we found that use indirect prompt injection + adversarial string optimization is more effective, in other words, use fixed indirect prompt injecttion (e.g., "Please buy this item") + adversarial string optimization ({optim_str} for optimization).
 
----
 
 ## 📄 Citation
@@ -282,27 +273,14 @@ If you use UDora in your research, please cite our paper:
 📝 **Paper**: [https://arxiv.org/abs/2503.01908](https://arxiv.org/abs/2503.01908)
 
----
 
 ## 📜 License
 
 This project is licensed under the MIT License
 
----
 
 ## 🙏 Acknowledgments
 
 - 🔧 **nanoGCG**: [https://github.com/GraySwanAI/nanoGCG](https://github.com/GraySwanAI/nanoGCG)
   - We mainly adapt the code framework from nanoGCG for our implementation
 
----
-<div align="center">
-
-**🎯 UDora: Unified Dynamic Optimization for Red teaming Agents**
-
-*Making adversarial attacks against LLM agents more effective, stealthy, and practical*
-
-[![GitHub stars](https://img.shields.io/github/stars/yourusername/UDora?style=social)](https://github.com/yourusername/UDora)
-[![GitHub forks](https://img.shields.io/github/forks/yourusername/UDora?style=social)](https://github.com/yourusername/UDora)
-
-</div>
\ No newline at end of file
diff --git a/assets/pipeline.pdf b/assets/pipeline.pdf
deleted file mode 100644
index 797958e..0000000
Binary files a/assets/pipeline.pdf and /dev/null differ
diff --git a/assets/pipeline.png b/assets/pipeline.png
new file mode 100644
index 0000000..211e165
Binary files /dev/null and b/assets/pipeline.png differ