mirror of
https://github.com/msoedov/agentic_security.git
synced 2026-06-24 06:09:55 +02:00
32f103acbc
Add LLM-based refusal classifier inspired by Promptmap's dual-LLM architecture. The controller LLM evaluates whether an attack succeeded by analyzing the target's response against pass/fail conditions. - Create LLMRefusalClassifier plugin integrating with existing system - Support OpenAI and Anthropic providers with lazy initialization - Add configurable system prompts and pass/fail conditions - Include 20 unit tests for comprehensive coverage