Commit Graph

6 Commits

Author SHA1 Message Date
Edneam be7fb1f370 fix: keep PII detection separate from refusal metrics 2026-05-14 22:42:28 +05:30
Edneam d734067ef6 test: cover PII leak detector 2026-05-14 22:31:50 +05:30
Alexander Myasoedov bc7fdd7cfa fix(pc): 2026-01-28 21:04:29 +02:00
Alexander Myasoedov b38a27d78c feat: US-005 - Enhanced Refusal Detection with Hybrid Approach
Implement hybrid refusal classifier combining multiple detection methods:
- Add confidence scoring to refusal detection (HybridResult)
- Implement weighted voting with configurable thresholds
- Support require_unanimous mode for strict classification
- Add factory function create_hybrid_classifier for common setup
- Include 32 unit tests with table-driven test patterns
2026-01-28 18:52:20 +02:00
Alexander Myasoedov 32f103acbc feat: US-001 - Dual-LLM Evaluation for Attack Success Detection
Add LLM-based refusal classifier inspired by Promptmap's dual-LLM
architecture. The controller LLM evaluates whether an attack succeeded
by analyzing the target's response against pass/fail conditions.

- Create LLMRefusalClassifier plugin integrating with existing system
- Support OpenAI and Anthropic providers with lazy initialization
- Add configurable system prompts and pass/fail conditions
- Include 20 unit tests for comprehensive coverage
2026-01-28 18:18:09 +02:00
Alexander Myasoedov ce7636fe9e feat(restruct tests): 2025-12-26 22:58:21 +02:00