mirror of
https://github.com/msoedov/agentic_security.git
synced 2026-06-23 21:59:57 +02:00
docs: Update PRD and progress for US-001 completion
This commit is contained in:
@@ -14,7 +14,7 @@
|
||||
"Add unit tests for the new classifier"
|
||||
],
|
||||
"priority": 1,
|
||||
"passes": false
|
||||
"passes": true
|
||||
},
|
||||
{
|
||||
"id": "US-002",
|
||||
|
||||
@@ -21,3 +21,16 @@
|
||||
- Attack data modules are in agentic_security/probe_data/modules/
|
||||
- Security utilities are in agentic_security/core/security.py
|
||||
---
|
||||
|
||||
## 2026-01-28 - US-001
|
||||
- Implemented LLM-based refusal classifier (Dual-LLM evaluation)
|
||||
- Files created:
|
||||
- agentic_security/refusal_classifier/llm_classifier.py
|
||||
- tests/unit/refusal_classifier/test_llm_classifier.py
|
||||
- **Learnings for future iterations:**
|
||||
- RefusalClassifierPlugin requires is_refusal(response: str) -> bool method
|
||||
- LLMClient Protocol pattern works well for multiple provider support
|
||||
- Use lazy initialization for API clients to avoid requiring keys at import time
|
||||
- Anthropic response.content[0] can be TextBlock or ToolUseBlock, need hasattr check
|
||||
- Pre-existing test failure in test_sanitize_password (regex doesn't match dict syntax)
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user