Commit Graph

5 Commits

Author SHA1 Message Date
Alexander Myasoedov d5e2746567 docs: Update PRD and progress for US-004 completion 2026-01-28 18:35:07 +02:00
Alexander Myasoedov 29decc5c4e docs: Update PRD and progress for US-003 completion 2026-01-28 18:29:44 +02:00
Alexander Myasoedov d5ec249b6c docs: Update PRD and progress for US-002 completion 2026-01-28 18:23:30 +02:00
Alexander Myasoedov 93a85029cb docs: Update PRD and progress for US-001 completion 2026-01-28 18:18:32 +02:00
Alexander Myasoedov 32f103acbc feat: US-001 - Dual-LLM Evaluation for Attack Success Detection
Add LLM-based refusal classifier inspired by Promptmap's dual-LLM
architecture. The controller LLM evaluates whether an attack succeeded
by analyzing the target's response against pass/fail conditions.

- Create LLMRefusalClassifier plugin integrating with existing system
- Support OpenAI and Anthropic providers with lazy initialization
- Add configurable system prompts and pass/fail conditions
- Include 20 unit tests for comprehensive coverage
2026-01-28 18:18:09 +02:00