agentic_security

mirror of https://github.com/msoedov/agentic_security.git synced 2026-06-24 06:09:55 +02:00

Author	SHA1	Message	Date
Devam Shah	d28c4b4b1e	feat: config-pluggable refusal classifiers and leak detectors PIIDetector and SandboxEscapeDetector were wired directly in probe_actor/refusal.py and the refusal classifier manager was populated from a hardcoded list, so the only way to toggle a bundled detector or add an organization-specific signature was to patch the module. Add a DetectorRegistry mapping plugin names to factories, assembled from an agentic_security.toml [detectors] section via build_from_config. Custom detectors load by import path ("pkg.module:ClassName"). refusal.py gains build_refusal_manager(config=None) reading the [detectors] table; all public symbols are preserved. Built-in leak detectors ship registered but disabled, so default refusal_heuristic behaviour is unchanged. Closes #82 Signed-off-by: Devam Shah <devamshah91@gmail.com>	2026-06-22 19:40:33 +05:30
Alexander Myasoedov	ead8f85836	feat(feat(refusal): detect Docker/K8s sandbox escape probes (#280)):	2026-06-04 18:28:12 +03:00
Edneam	be7fb1f370	fix: keep PII detection separate from refusal metrics	2026-05-14 22:42:28 +05:30
Edneam	d734067ef6	test: cover PII leak detector	2026-05-14 22:31:50 +05:30
Alexander Myasoedov	bc7fdd7cfa	fix(pc):	2026-01-28 21:04:29 +02:00
Alexander Myasoedov	b38a27d78c	feat: US-005 - Enhanced Refusal Detection with Hybrid Approach Implement hybrid refusal classifier combining multiple detection methods: - Add confidence scoring to refusal detection (HybridResult) - Implement weighted voting with configurable thresholds - Support require_unanimous mode for strict classification - Add factory function create_hybrid_classifier for common setup - Include 32 unit tests with table-driven test patterns	2026-01-28 18:52:20 +02:00
Alexander Myasoedov	32f103acbc	feat: US-001 - Dual-LLM Evaluation for Attack Success Detection Add LLM-based refusal classifier inspired by Promptmap's dual-LLM architecture. The controller LLM evaluates whether an attack succeeded by analyzing the target's response against pass/fail conditions. - Create LLMRefusalClassifier plugin integrating with existing system - Support OpenAI and Anthropic providers with lazy initialization - Add configurable system prompts and pass/fail conditions - Include 20 unit tests for comprehensive coverage	2026-01-28 18:18:09 +02:00
Alexander Myasoedov	ce7636fe9e	feat(restruct tests):	2025-12-26 22:58:21 +02:00
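
Note on d28c4b4b1e: the commit message describes a DetectorRegistry that maps plugin names to factories, a build_from_config step that reads a [detectors] table, and custom detectors loaded by import path ("pkg.module:ClassName"). The sketch below is one way that could look, not the code from the commit. The "enabled" and "custom" keys, the method names, and the stand-in detector classes are all assumptions made for illustration; the real table schema and signatures are not shown in this log.

```python
import importlib
from typing import Any, Callable, Dict, List


class DetectorRegistry:
    """Maps plugin names to zero-argument factories (hypothetical shape)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        self._factories[name] = factory

    def create(self, name: str) -> Any:
        if name not in self._factories:
            raise KeyError(f"unknown detector: {name!r}")
        return self._factories[name]()


def load_by_import_path(path: str) -> Callable[[], Any]:
    """Resolve a "pkg.module:ClassName" string to the named attribute."""
    module_name, sep, attr = path.partition(":")
    if not (module_name and sep and attr):
        raise ValueError(f"expected 'pkg.module:ClassName', got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


def build_from_config(registry: DetectorRegistry, config: Dict[str, Any]) -> List[Any]:
    """Instantiate detectors named in the (assumed) [detectors] table.

    Assumed layout, as it might appear in agentic_security.toml:

        [detectors]
        enabled = ["pii"]
        custom = ["collections:Counter"]
    """
    table = config.get("detectors", {})
    detectors = [registry.create(name) for name in table.get("enabled", [])]
    detectors += [load_by_import_path(p)() for p in table.get("custom", [])]
    return detectors


# Stand-ins for the bundled detectors; the real classes live in the repository.
class PIIDetector:
    pass


class SandboxEscapeDetector:
    pass


registry = DetectorRegistry()
registry.register("pii", PIIDetector)
registry.register("sandbox_escape", SandboxEscapeDetector)

# Parsed config as a dict; "collections:Counter" is only a stdlib stand-in
# for an organization-specific detector class.
config = {"detectors": {"enabled": ["pii"], "custom": ["collections:Counter"]}}
active = build_from_config(registry, config)

# With no [detectors] table nothing is enabled, mirroring the commit's
# "registered but disabled" default.
default = build_from_config(registry, {})
```

Registering a factory without enabling it keeps default behaviour unchanged, which matches the commit's stated goal; only names listed in the config are instantiated.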

8 Commits
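
Note on b38a27d78c: the commit message mentions confidence scoring (HybridResult), weighted voting with configurable thresholds, a require_unanimous mode, and a create_hybrid_classifier factory. The sketch below shows how those pieces could fit together. It reuses the names from the commit message, but the signature, the field names, and the scoring rule (weighted share of "refusal" votes) are assumptions, not the repository's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Classifier = Callable[[str], bool]  # returns True when it sees a refusal


@dataclass
class HybridResult:
    is_refusal: bool
    confidence: float  # weighted share of classifiers voting "refusal"


def create_hybrid_classifier(
    voters: List[Tuple[Classifier, float]],
    threshold: float = 0.5,
    require_unanimous: bool = False,
) -> Callable[[str], HybridResult]:
    """Hypothetical factory: combine (classifier, weight) pairs into one callable."""
    total = sum(weight for _, weight in voters)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    def classify(response: str) -> HybridResult:
        votes = [(clf(response), weight) for clf, weight in voters]
        score = sum(weight for voted, weight in votes if voted) / total
        if require_unanimous:
            # Strict mode: every classifier must agree it is a refusal.
            is_refusal = all(voted for voted, _ in votes)
        else:
            is_refusal = score >= threshold
        return HybridResult(is_refusal=is_refusal, confidence=score)

    return classify


# Two toy marker-based classifiers, for illustration only.
def phrase_marker(text: str) -> bool:
    return "i cannot" in text.lower()


def apology_marker(text: str) -> bool:
    return "sorry" in text.lower()


voters = [(phrase_marker, 3.0), (apology_marker, 1.0)]

weighted = create_hybrid_classifier(voters, threshold=0.5)
strict = create_hybrid_classifier(voters, require_unanimous=True)

weighted_result = weighted("I cannot help with that.")
strict_result = strict("I cannot help with that.")
```

Here only the heavier classifier fires, so the weighted score is 3.0 / 4.0 = 0.75: the weighted classifier reports a refusal, while the strict one does not because the vote is not unanimous.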