agentic_security

CalvinBackup/agentic_security

Fork 0

mirror of https://github.com/msoedov/agentic_security.git synced 2026-06-24 14:19:55 +02:00

Commit Graph

Author	SHA1	Message	Date
Alexander Myasoedov	ef35c1f82e	feat: US-002 - YAML-based Attack Rule System Implement a YAML-based rule system for defining attack patterns and success conditions, inspired by Promptmap's 50+ YAML rule definitions. Features: - AttackRule model with name, type, severity, prompt, pass/fail conditions - RuleLoader for parsing YAML files with validation - Support for recursive directory loading and filtering by type/severity - Template variable substitution in prompts - Dataset integration for converting rules to ProbeDataset format - YAMLRulesDatasetLoader for loading rules from multiple directories Tested with 47 unit tests covering models, loader, and dataset integration. Successfully loads 69 rules from promptmap research directory.	2026-01-28 18:23:04 +02:00
Alexander Myasoedov	32f103acbc	feat: US-001 - Dual-LLM Evaluation for Attack Success Detection Add LLM-based refusal classifier inspired by Promptmap's dual-LLM architecture. The controller LLM evaluates whether an attack succeeded by analyzing the target's response against pass/fail conditions. - Create LLMRefusalClassifier plugin integrating with existing system - Support OpenAI and Anthropic providers with lazy initialization - Add configurable system prompts and pass/fail conditions - Include 20 unit tests for comprehensive coverage	2026-01-28 18:18:09 +02:00
Alexander Myasoedov	ce7636fe9e	feat(restruct tests):	2025-12-26 22:58:21 +02:00

Author

SHA1

Message

Date

Alexander Myasoedov

ef35c1f82e

feat: US-002 - YAML-based Attack Rule System

Implement a YAML-based rule system for defining attack patterns and success
conditions, inspired by Promptmap's 50+ YAML rule definitions.

Features:
- AttackRule model with name, type, severity, prompt, pass/fail conditions
- RuleLoader for parsing YAML files with validation
- Support for recursive directory loading and filtering by type/severity
- Template variable substitution in prompts
- Dataset integration for converting rules to ProbeDataset format
- YAMLRulesDatasetLoader for loading rules from multiple directories

Tested with 47 unit tests covering models, loader, and dataset integration.
Successfully loads 69 rules from promptmap research directory.

2026-01-28 18:23:04 +02:00

Alexander Myasoedov

32f103acbc

feat: US-001 - Dual-LLM Evaluation for Attack Success Detection

Add LLM-based refusal classifier inspired by Promptmap's dual-LLM
architecture. The controller LLM evaluates whether an attack succeeded
by analyzing the target's response against pass/fail conditions.

- Create LLMRefusalClassifier plugin integrating with existing system
- Support OpenAI and Anthropic providers with lazy initialization
- Add configurable system prompts and pass/fail conditions
- Include 20 unit tests for comprehensive coverage

2026-01-28 18:18:09 +02:00

Alexander Myasoedov

ce7636fe9e

feat(restruct tests):

2025-12-26 22:58:21 +02:00

3 Commits