agentic_security

mirror of https://github.com/msoedov/agentic_security.git synced 2026-06-24 06:09:55 +02:00

Files

T

Alexander Myasoedov 32f103acbc feat: US-001 - Dual-LLM Evaluation for Attack Success Detection

Add LLM-based refusal classifier inspired by Promptmap's dual-LLM
architecture. The controller LLM evaluates whether an attack succeeded
by analyzing the target's response against pass/fail conditions.

- Create LLMRefusalClassifier plugin integrating with existing system
- Support OpenAI and Anthropic providers with lazy initialization
- Add configurable system prompts and pass/fail conditions
- Include 20 unit tests for comprehensive coverage

2026-01-28 18:18:09 +02:00

executor

feat(restruct tests):

2025-12-26 22:58:21 +02:00

probe_actor

feat(restruct tests):

2025-12-26 22:58:21 +02:00

probe_data

feat(restruct tests):

2025-12-26 22:58:21 +02:00

refusal_classifier

feat: US-001 - Dual-LLM Evaluation for Attack Success Detection