Mirror of https://github.com/romovpa/claudini.git, synced 2026-05-12 19:12:19 +02:00
Update arxiv link and citation (arXiv:2603.24511)
Assisted-by: Claude <noreply@anthropic.com>
+25
-7
@@ -22,10 +22,28 @@ authors:
     given-names: Maksym
     affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
 license: Apache-2.0
-# TODO: update after arxiv publication
-# preferred-citation:
-# type: article
-# authors: <same as above>
-# title: "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
-# year: 2026
-# url: "https://arxiv.org/abs/XXXX.XXXXX"
+preferred-citation:
+  type: article
+  authors:
+    - family-names: Panfilov
+      given-names: Alexander
+      affiliation: "MATS; ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+    - family-names: Romov
+      given-names: Peter
+      affiliation: "Imperial College London"
+    - family-names: Shilov
+      given-names: Igor
+      affiliation: "Imperial College London"
+    - family-names: de Montjoye
+      given-names: Yves-Alexandre
+      affiliation: "Imperial College London"
+    - family-names: Geiping
+      given-names: Jonas
+      affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+    - family-names: Andriushchenko
+      given-names: Maksym
+      affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+  title: "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
+  year: 2026
+  url: "https://arxiv.org/abs/2603.24511"
+  doi: "10.48550/arXiv.2603.24511"
@@ -2,7 +2,7 @@

 **Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs**

-[](https://arxiv.org/abs/XXXX.XXXXX)
+[](https://arxiv.org/abs/2603.24511)

 <p align="center">
   <img src="assets/teaser.png" width="90%" alt="Claude autoresearch vs Optuna hyperparameter search: best train and validation loss over trials">
@@ -10,7 +10,7 @@

 We show that an *[autoresearch](https://github.com/karpathy/autoresearch)*-style pipeline powered by Claude Code discovers novel white-box adversarial attack *algorithms* that **significantly outperform** all existing [methods](claudini/methods/original/README.md) in jailbreaking and prompt injection evaluations.

-This official code repository contains a demo autoresearch pipeline, the Claude-discovered methods from the paper, baseline implementations, and the evaluation benchmark. Read our [paper](https://arxiv.org/abs/XXXX.XXXXX) and consider [citing us](#citation) if you find this useful.
+This official code repository contains a demo autoresearch pipeline, the Claude-discovered methods from the paper, baseline implementations, and the evaluation benchmark. Read our [paper](https://arxiv.org/abs/2603.24511) and consider [citing us](#citation) if you find this useful.

 ## Setup

@@ -75,9 +75,12 @@ See [`CLAUDE.md`](CLAUDE.md) for how to implement a new method.

 ```bibtex
 @article{panfilov2026claudini,
-  title={Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs},
-  author={Panfilov, Alexander and Romov, Peter and Shilov, Igor and de Montjoye, Yves-Alexandre and Geiping, Jonas and Andriushchenko, Maksym},
-  journal={arXiv preprint arXiv:XXXX.XXXXX},
-  year={2026}
+  title = {Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs},
+  author = {Alexander Panfilov and Peter Romov and Igor Shilov and Yves-Alexandre de Montjoye and Jonas Geiping and Maksym Andriushchenko},
+  journal = {arXiv preprint},
+  eprint = {2603.24511},
+  archivePrefix = {arXiv},
+  year = {2026},
+  url = {https://arxiv.org/abs/2603.24511},
 }
 ```
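The `url`, `doi`, and `eprint` values introduced by this commit all derive from the single arXiv identifier `2603.24511`. The following is a minimal sketch, not part of the repository, of how such fields could be generated from one identifier so that they stay consistent and a leftover `XXXX.XXXXX` placeholder is rejected. The helper name `arxiv_links` and the regular expression are assumptions made for illustration; the `10.48550/arXiv.` prefix is the DOI form arXiv assigns to its preprints.

```python
import re

# Assumed format for modern arXiv identifiers: YYMM.NNNNN (4 or 5 digits after the dot).
ARXIV_ID_PATTERN = re.compile(r"^\d{4}\.\d{4,5}$")


def arxiv_links(arxiv_id: str) -> dict:
    """Hypothetical helper: derive citation fields from a bare arXiv identifier."""
    if not ARXIV_ID_PATTERN.match(arxiv_id):
        # Catches unfilled placeholders such as "XXXX.XXXXX".
        raise ValueError(f"not a valid arXiv identifier: {arxiv_id!r}")
    return {
        "eprint": arxiv_id,
        "url": f"https://arxiv.org/abs/{arxiv_id}",
        "doi": f"10.48550/arXiv.{arxiv_id}",
    }


links = arxiv_links("2603.24511")
```

Here `links["url"]` and `links["doi"]` match the values added to the citation metadata above, while `arxiv_links("XXXX.XXXXX")` raises `ValueError`.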