diff --git a/CITATION.cff b/CITATION.cff
index 7b9c620..ba661b5 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -22,10 +22,28 @@ authors:
     given-names: Maksym
     affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
 license: Apache-2.0
-# TODO: update after arxiv publication
-# preferred-citation:
-#   type: article
-#   authors:
-#   title: "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
-#   year: 2026
-#   url: "https://arxiv.org/abs/XXXX.XXXXX"
+preferred-citation:
+  type: article
+  authors:
+    - family-names: Panfilov
+      given-names: Alexander
+      affiliation: "MATS; ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+    - family-names: Romov
+      given-names: Peter
+      affiliation: "Imperial College London"
+    - family-names: Shilov
+      given-names: Igor
+      affiliation: "Imperial College London"
+    - family-names: de Montjoye
+      given-names: Yves-Alexandre
+      affiliation: "Imperial College London"
+    - family-names: Geiping
+      given-names: Jonas
+      affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+    - family-names: Andriushchenko
+      given-names: Maksym
+      affiliation: "ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems; Tübingen AI Center"
+  title: "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs"
+  year: 2026
+  url: "https://arxiv.org/abs/2603.24511"
+  doi: "10.48550/arXiv.2603.24511"
diff --git a/README.md b/README.md
index 714e023..6a61612 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 **Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs**
 
-[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/XXXX.XXXXX)
+[![arXiv](https://img.shields.io/badge/arXiv-2603.24511-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2603.24511)
 
 Claude autoresearch vs Optuna hyperparameter search: best train and validation loss over trials
@@ -10,7 +10,7 @@
 
 We show that an *[autoresearch](https://github.com/karpathy/autoresearch)*-style pipeline powered by Claude Code discovers novel white-box adversarial attack *algorithms* that **significantly outperform** all existing [methods](claudini/methods/original/README.md) in jailbreaking and prompt injection evaluations.
 
-This official code repository contains a demo autoresearch pipeline, the Claude-discovered methods from the paper, baseline implementations, and the evaluation benchmark. Read our [paper](https://arxiv.org/abs/XXXX.XXXXX) and consider [citing us](#citation) if you find this useful.
+This official code repository contains a demo autoresearch pipeline, the Claude-discovered methods from the paper, baseline implementations, and the evaluation benchmark. Read our [paper](https://arxiv.org/abs/2603.24511) and consider [citing us](#citation) if you find this useful.
 
 ## Setup
 
@@ -75,9 +75,12 @@ See [`CLAUDE.md`](CLAUDE.md) for how to implement a new method.
 
 ```bibtex
 @article{panfilov2026claudini,
-  title={Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs},
-  author={Panfilov, Alexander and Romov, Peter and Shilov, Igor and de Montjoye, Yves-Alexandre and Geiping, Jonas and Andriushchenko, Maksym},
-  journal={arXiv preprint arXiv:XXXX.XXXXX},
-  year={2026}
+  title = {Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs},
+  author = {Alexander Panfilov and Peter Romov and Igor Shilov and Yves-Alexandre de Montjoye and Jonas Geiping and Maksym Andriushchenko},
+  journal = {arXiv preprint},
+  eprint = {2603.24511},
+  archivePrefix = {arXiv},
+  year = {2026},
+  url = {https://arxiv.org/abs/2603.24511},
 }
 ```