Update arxiv link and citation (arXiv:2603.24511)

Assisted-by: Claude <noreply@anthropic.com>
Peter Romov
2026-03-26 10:05:29 +00:00
parent 63974ddfee
commit 4c938fd325
2 changed files with 34 additions and 13 deletions
+25 -7
@@ -22,10 +22,28 @@ authors:
given-names: Maksym
affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
license: Apache-2.0
-# TODO: update after arxiv publication
-# preferred-citation:
-# type: article
-# authors: <same as above>
-# title: "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
-# year: 2026
-# url: "https://arxiv.org/abs/XXXX.XXXXX"
+preferred-citation:
+type: article
+authors:
+- family-names: Panfilov
+given-names: Alexander
+affiliation: "MATS; ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+- family-names: Romov
+given-names: Peter
+affiliation: "Imperial College London"
+- family-names: Shilov
+given-names: Igor
+affiliation: "Imperial College London"
+- family-names: de Montjoye
+given-names: Yves-Alexandre
+affiliation: "Imperial College London"
+- family-names: Geiping
+given-names: Jonas
+affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+- family-names: Andriushchenko
+given-names: Maksym
+affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+title: "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
+year: 2026
+url: "https://arxiv.org/abs/2603.24511"
+doi: "10.48550/arXiv.2603.24511"
+9 -6
@@ -2,7 +2,7 @@
**Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs**
-[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/XXXX.XXXXX)
+[![arXiv](https://img.shields.io/badge/arXiv-2603.24511-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2603.24511)
<p align="center">
<img src="assets/teaser.png" width="90%" alt="Claude autoresearch vs Optuna hyperparameter search: best train and validation loss over trials">
@@ -10,7 +10,7 @@
We show that an *[autoresearch](https://github.com/karpathy/autoresearch)*-style pipeline powered by Claude Code discovers novel white-box adversarial attack *algorithms* that **significantly outperform** all existing [methods](claudini/methods/original/README.md) in jailbreaking and prompt injection evaluations.
-This official code repository contains a demo autoresearch pipeline, the Claude-discovered methods from the paper, baseline implementations, and the evaluation benchmark. Read our [paper](https://arxiv.org/abs/XXXX.XXXXX) and consider [citing us](#citation) if you find this useful.
+This official code repository contains a demo autoresearch pipeline, the Claude-discovered methods from the paper, baseline implementations, and the evaluation benchmark. Read our [paper](https://arxiv.org/abs/2603.24511) and consider [citing us](#citation) if you find this useful.
## Setup
@@ -75,9 +75,12 @@ See [`CLAUDE.md`](CLAUDE.md) for how to implement a new method.
```bibtex
@article{panfilov2026claudini,
-title={Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs},
-author={Panfilov, Alexander and Romov, Peter and Shilov, Igor and de Montjoye, Yves-Alexandre and Geiping, Jonas and Andriushchenko, Maksym},
-journal={arXiv preprint arXiv:XXXX.XXXXX},
-year={2026}
+title = {Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs},
+author = {Alexander Panfilov and Peter Romov and Igor Shilov and Yves-Alexandre de Montjoye and Jonas Geiping and Maksym Andriushchenko},
+journal = {arXiv preprint},
+eprint = {2603.24511},
+archivePrefix = {arXiv},
+year = {2026},
+url = {https://arxiv.org/abs/2603.24511},
}
```