Peter Romov 59106bdf3c SecAlign-70B support: configs, quantization, multi-GPU (#1)
- **PEFT adapter merging.** `model_loader.py` auto-detects PEFT adapters (e.g. `facebook/Meta-SecAlign-8B`), merges on CPU in bf16, and caches the merged model to disk. No config flags needed.

- **Configurable quantization.** `quantization:` field in YAML or `--quantization` on CLI, accepting `nf4`, `fp4`, or `int8`. Replaces the old `load_in_4bit` boolean.

- **Multi-GPU sharding.** `device_map:` in configs or `--device-map` on CLI. Config value is now correctly read from YAML presets (was previously ignored).

- **CLI overrides.** New `--model`, `--device-map`, `--quantization` flags to override preset values from the command line.

- **SecAlign injection presets.** Configs for prompt injection on Meta-SecAlign-8B and 70B (default + Optuna-tuned), using new `AlpacaInjectionSource` — generates 3-role prompts from AlpacaFarm data with trusted/untrusted separation.

- **Fixes.** `BenchmarkRunner.summarize()` no longer crashes when all runs are skipped. System prompt suppression now works correctly (`""` suppresses model defaults, `None` omits the turn).

Co-authored-by: Peter Romov <peter@romov.com>
Co-authored-by: Alexander Panfilov <39771221+kotekjedi@users.noreply.github.com>
2026-04-06 13:36:08 +00:00

Original Methods

Reimplementations of published token-optimization attacks, adapted to the TokenOptimizer interface.

Type: D = discrete token search, C = continuous relaxation, F = gradient-free.
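As a rough illustration of the gradient-free (F) category, the sketch below runs a toy random search over a short sequence of token ids. Everything in it is hypothetical and chosen for illustration only: `VOCAB_SIZE`, `TARGET`, `toy_loss`, and `random_search` are not part of this package or of the TokenOptimizer interface, and the loss is a stand-in for a real model loss.

```python
import random

VOCAB_SIZE = 50          # hypothetical toy vocabulary size
TARGET = [7, 3, 42, 19]  # hypothetical token ids that the toy loss rewards


def toy_loss(tokens):
    """Stand-in for a model loss: count of positions that differ from TARGET."""
    return sum(t != g for t, g in zip(tokens, TARGET))


def random_search(num_steps=2000, seed=0):
    """Gradient-free search: mutate one position per step and keep the
    candidate if its loss is not worse than the current best."""
    rng = random.Random(seed)
    best = [rng.randrange(VOCAB_SIZE) for _ in TARGET]
    best_loss = toy_loss(best)
    for _ in range(num_steps):
        candidate = list(best)
        position = rng.randrange(len(candidate))
        candidate[position] = rng.randrange(VOCAB_SIZE)
        loss = toy_loss(candidate)
        if loss <= best_loss:  # only loss evaluations are used, never gradients
            best, best_loss = candidate, loss
    return best, best_loss


if __name__ == "__main__":
    tokens, loss = random_search()
    print(tokens, loss)
```

Discrete (D) methods typically differ from this in how they propose candidates, for example by ranking token substitutions with gradient information, while continuous (C) methods optimize a relaxed representation and project back to tokens.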

Methods

| Package | Methods | Full Name | Type | Year | Paper | Official Implementation |
|---|---|---|---|---|---|---|
| uat | uat | Universal Adversarial Triggers | D | 2019 | wallace2019universal | Eric-Wallace/universal-triggers |
| autoprompt | autoprompt | AutoPrompt | D | 2020 | shin2020autoprompt | ucinlp/autoprompt |
| gbda | gbda | Gradient-Based Distributional Attack | C | 2021 | guo2021gradient | facebookresearch/text-adversarial-attack |
| arca | arca | Autoregressive Randomized Coordinate Ascent | D | 2023 | jones2023arca | ejones313/auditing-llms |
| gcg | gcg | Greedy Coordinate Gradient | D | 2023 | zou2023universal | llm-attacks/llm-attacks |
| lls | lls | Lapid-Langberg-Sipper genetic algorithm | F | 2023 | lapid2024open | |
| pez | pez | Prompts made EaZy (Hard Prompts Made Easy) | C | 2023 | wen2023hard | YuxinWenRick/hard-prompts-made-easy |
| acg | acg | Accelerated Coordinate Gradient | D | 2024 | liu2024making | |
| adc | adc | Adaptive Dense-to-sparse Constrained optimization | C | 2024 | hu2024efficient | hukkai/adc_llm_attack |
| attn_gcg | attngcg | Attention-enhanced GCG | D | 2024 | wang2024attngcg | UCSC-VLAA/AttnGCG-attack |
| beast | beast | Beam Search-based Adversarial Attack | D | 2024 | sadasivan2024beast | vinusankars/BEAST |
| bon | bon | Best-of-N | F | 2024 | hughes2024bon | jplhughes/bon-jailbreaking |
| cold_attack | cold_attack | COLD-Attack (Langevin Dynamics in Logit Space) | C | 2024 | guo2024cold | Yu-Fangxu/COLD-Attack |
| degcg | degcg | Iterative Decoupled GCG | D | 2024 | liu2024advancing | Waffle-Liu/DeGCG |
| faster_gcg | faster_gcg | Faster GCG | D | 2024 | li2024faster | |
| gcg_pp | gcg_pp | GCG++ | D | 2024 | sitawarin2024pal | chawins/pal |
| i_gcg | i_gcg_lsgm, i_gcg_lila, i_gcg | Improved GCG (LSGM / LILA / Combined) | D | 2024 | li2024improved | qizhangli/Gradient-based-Jailbreak-Attacks |
| mac | mac | Momentum-Accelerated GCG | D | 2024 | zhang2024boosting | weizeming/momentum-attack-llm |
| magic | magic | Model Attack Gradient Index GCG | D | 2024 | li2024exploiting | jiah-li/magic |
| mc_gcg | mc_gcg | MC-GCG (Progressive Multi-Coordinate Merging) | D | 2024 | jia2025improved | jiaxiaojunQAQ/I-GCG |
| pgd | pgd, pgd_vanilla | Projected Gradient Descent | C | 2024 | geisler2024pgd | sigeisler/reinforce-attacks-llms |
| probe_sampling | probe_sampling | Probe Sampling | D | 2024 | zhao2024accelerating | zhaoyiran924/Probe-Sampling |
| prs | prs | Random Search | F | 2024 | andriushchenko2024jailbreaking | tml-epfl/llm-adaptive-attacks |
| reg_relax | reg_relax | Regularized Relaxation | C | 2024 | chacko2024adversarial | sj21j/Regularized_Relaxation |
| egd | egd | Exponentiated Gradient Descent | C | 2025 | biswas2025adversarial | sbamit/Exponentiated-Gradient-Descent-LLM-Attack |
| mask_gcg | mask_gcg | Mask-GCG (Learnable Token Masks on GCG) | D | 2025 | mu2025maskgcg | Junjie-Mu/Mask-GCG |
| reinforce | reinforce_gcg, reinforce_pgd | REINFORCE Adversarial Attacks | D/C | 2025 | geisler2025reinforce | sigeisler/reinforce-attacks-llms |
| slot_gcg | slot_gcg | Slot GCG | D | 2025 | jeong2025slotgcg | youai058/SlotGCG |
| sm_gcg | sm_gcg | Spatial Momentum GCG | D | 2025 | gu2025smgcg | |
| tgcg | tgcg | Temperature-annealed GCG | D | 2025 | tan2025resurgence | |
| rails | rails | RAILS (Random Iterative Local Search) | F | 2026 | nurlanov2026jailbreaking | |
| tao | tao | TAO-Attack (Direction-Priority Token Optimization) | D | 2026 | xu2026tao | ZevineXu/TAO-Attack |