mirror of https://github.com/romovpa/claudini.git synced 2026-07-23 23:30:50 +02:00

Files

T

Peter Romov 59106bdf3c SecAlign-70B support: configs, quantization, multi-GPU (#1 )

- **PEFT adapter merging.** `model_loader.py` auto-detects PEFT adapters (e.g. `facebook/Meta-SecAlign-8B`), merges on CPU in bf16, and caches the merged model to disk. No config flags needed.

- **Configurable quantization.** `quantization:` field in YAML or `--quantization` on CLI, accepting `nf4`, `fp4`, or `int8`. Replaces the old `load_in_4bit` boolean.

- **Multi-GPU sharding.** `device_map:` in configs or `--device-map` on CLI. Config value is now correctly read from YAML presets (was previously ignored).

- **CLI overrides.** New `--model`, `--device-map`, `--quantization` flags to override preset values from the command line.

- **SecAlign injection presets.** Configs for prompt injection on Meta-SecAlign-8B and 70B (default + Optuna-tuned), using new `AlpacaInjectionSource` — generates 3-role prompts from AlpacaFarm data with trusted/untrusted separation.

- **Fixes.** `BenchmarkRunner.summarize()` crash when all runs are skipped. System prompt suppression now works correctly (`""` suppresses model defaults, `None` omits the turn).

Co-authored-by: Peter Romov <peter@romov.com>
Co-authored-by: Alexander Panfilov <39771221+kotekjedi@users.noreply.github.com>

2026-04-06 13:36:08 +00:00

acg

Initial commit

2026-03-25 02:09:26 +00:00

adc

Initial commit

2026-03-25 02:09:26 +00:00

arca

Initial commit

2026-03-25 02:09:26 +00:00

attn_gcg

Initial commit

2026-03-25 02:09:26 +00:00

autoprompt

Initial commit

2026-03-25 02:09:26 +00:00

beast

Initial commit

2026-03-25 02:09:26 +00:00

bon

Initial commit

2026-03-25 02:09:26 +00:00

cold_attack

Initial commit

2026-03-25 02:09:26 +00:00

degcg

Initial commit

2026-03-25 02:09:26 +00:00

egd

Initial commit

2026-03-25 02:09:26 +00:00

esa

Initial commit

2026-03-25 02:09:26 +00:00

faster_gcg

Initial commit

2026-03-25 02:09:26 +00:00

gbda

Initial commit

2026-03-25 02:09:26 +00:00

gcg

Initial commit

2026-03-25 02:09:26 +00:00

gcg_pp

Initial commit

2026-03-25 02:09:26 +00:00

i_gcg

Initial commit

2026-03-25 02:09:26 +00:00

lls

Initial commit

2026-03-25 02:09:26 +00:00

mac

Initial commit

2026-03-25 02:09:26 +00:00

magic

Initial commit

2026-03-25 02:09:26 +00:00

mask_gcg

Initial commit

2026-03-25 02:09:26 +00:00

mc_gcg

Initial commit

2026-03-25 02:09:26 +00:00

pez

Initial commit

2026-03-25 02:09:26 +00:00

pgd

Initial commit

2026-03-25 02:09:26 +00:00

probe_sampling

Initial commit

2026-03-25 02:09:26 +00:00

prs

Initial commit

2026-03-25 02:09:26 +00:00

rails

Initial commit

2026-03-25 02:09:26 +00:00

reg_relax

Initial commit

2026-03-25 02:09:26 +00:00

reinforce

Initial commit

2026-03-25 02:09:26 +00:00

slot_gcg

Initial commit

2026-03-25 02:09:26 +00:00

sm_gcg

Initial commit

2026-03-25 02:09:26 +00:00

tao

Initial commit

2026-03-25 02:09:26 +00:00

tgcg

Initial commit

2026-03-25 02:09:26 +00:00

uat

Initial commit

2026-03-25 02:09:26 +00:00

__init__.py

Initial commit

2026-03-25 02:09:26 +00:00

README.md

SecAlign-70B support: configs, quantization, multi-GPU (#1 )

2026-04-06 13:36:08 +00:00

README.md

Original Methods

Reimplementations of published token-optimization attacks, adapted to the TokenOptimizer interface.

Type: D = discrete token search, C = continuous relaxation, F = gradient-free.

Methods

Package	Methods	Full Name	Type	Year	Paper	Official Implementation
`uat`	`uat`	Universal Adversarial Triggers	D	2019	wallace2019universal	Eric-Wallace/universal-triggers
`autoprompt`	`autoprompt`	AutoPrompt	D	2020	shin2020autoprompt	ucinlp/autoprompt
`gbda`	`gbda`	Gradient-Based Distributional Attack	C	2021	guo2021gradient	facebookresearch/text-adversarial-attack
`arca`	`arca`	Autoregressive Randomized Coordinate Ascent	D	2023	jones2023arca	ejones313/auditing-llms
`gcg`	`gcg`	Greedy Coordinate Gradient	D	2023	zou2023universal	llm-attacks/llm-attacks
`lls`	`lls`	Lapid-Langberg-Sipper genetic algorithm	F	2023	lapid2024open
`pez`	`pez`	Prompts made EaZy (Hard Prompts Made Easy)	C	2023	wen2023hard	YuxinWenRick/hard-prompts-made-easy
`acg`	`acg`	Accelerated Coordinate Gradient	D	2024	liu2024making
`adc`	`adc`	Adaptive Dense-to-sparse Constrained optimization	C	2024	hu2024efficient	hukkai/adc_llm_attack
`attn_gcg`	`attngcg`	Attention-enhanced GCG	D	2024	wang2024attngcg	UCSC-VLAA/AttnGCG-attack
`beast`	`beast`	Beam Search-based Adversarial Attack	D	2024	sadasivan2024beast	vinusankars/BEAST
`bon`	`bon`	Best-of-N	F	2024	hughes2024bon	jplhughes/bon-jailbreaking
`cold_attack`	`cold_attack`	COLD-Attack (Langevin Dynamics in Logit Space)	C	2024	guo2024cold	Yu-Fangxu/COLD-Attack
`degcg`	`degcg`	Iterative Decoupled GCG	D	2024	liu2024advancing	Waffle-Liu/DeGCG
`faster_gcg`	`faster_gcg`	Faster GCG	D	2024	li2024faster
`gcg_pp`	`gcg_pp`	GCG++	D	2024	sitawarin2024pal	chawins/pal
`i_gcg`	`i_gcg_lsgm`, `i_gcg_lila`, `i_gcg`	Improved GCG (LSGM / LILA / Combined)	D	2024	li2024improved	qizhangli/Gradient-based-Jailbreak-Attacks
`mac`	`mac`	Momentum-Accelerated GCG	D	2024	zhang2024boosting	weizeming/momentum-attack-llm
`magic`	`magic`	Model Attack Gradient Index GCG	D	2024	li2024exploiting	jiah-li/magic
`mc_gcg`	`mc_gcg`	MC-GCG (Progressive Multi-Coordinate Merging)	D	2024	jia2025improved	jiaxiaojunQAQ/I-GCG
`pgd`	`pgd`, `pgd_vanilla`	Projected Gradient Descent	C	2024	geisler2024pgd	sigeisler/reinforce-attacks-llms
`probe_sampling`	`probe_sampling`	Probe Sampling	D	2024	zhao2024accelerating	zhaoyiran924/Probe-Sampling
`prs`	`prs`	Random Search	F	2024	andriushchenko2024jailbreaking	tml-epfl/llm-adaptive-attacks
`reg_relax`	`reg_relax`	Regularized Relaxation	C	2024	chacko2024adversarial	sj21j/Regularized_Relaxation
`egd`	`egd`	Exponentiated Gradient Descent	C	2025	biswas2025adversarial	sbamit/Exponentiated-Gradient-Descent-LLM-Attack
`mask_gcg`	`mask_gcg`	Mask-GCG (Learnable Token Masks on GCG)	D	2025	mu2025maskgcg	Junjie-Mu/Mask-GCG
`reinforce`	`reinforce_gcg`, `reinforce_pgd`	REINFORCE Adversarial Attacks	D/C	2025	geisler2025reinforce	sigeisler/reinforce-attacks-llms
`slot_gcg`	`slot_gcg`	Slot GCG	D	2025	jeong2025slotgcg	youai058/SlotGCG
`sm_gcg`	`sm_gcg`	Spatial Momentum GCG	D	2025	gu2025smgcg
`tgcg`	`tgcg`	Temperature-annealed GCG	D	2025	tan2025resurgence
`rails`	`rails`	RAILS (Random Iterative Local Search)	F	2026	nurlanov2026jailbreaking
`tao`	`tao`	TAO-Attack (Direction-Priority Token Optimization)	D	2026	xu2026tao	ZevineXu/TAO-Attack