Commit Graph

20 Commits

Author SHA1 Message Date
Alexander Myasoedov ead8f85836 feat(feat(refusal): detect Docker/K8s sandbox escape probes (#280)): 2026-06-04 18:28:12 +03:00
Alexander Myasoedov 816c8c6bc7 fix(make litellm optional import): 2026-06-03 15:08:23 +03:00
Alexander Myasoedov a193ef9c2c fix(pc): 2026-06-03 15:05:59 +03:00
Alexander Myasoedov 67cedfb116 Merge pull request #299 from RheagalFire/feat/add-litellm-provider
feat: add LiteLLM as provider for 100+ LLM backends
2026-06-03 15:04:18 +03:00
Alexander Myasoedov 1fa66bd292 Merge pull request #300 from JackSpiece/fix/mcp-client-usage-examples
docs: add MCP client usage examples
2026-06-03 15:01:02 +03:00
Alexander Myasoedov 1bfb7dcc20 fix(use_agg_backend): 2026-06-03 14:59:43 +03:00
Carlos bad38aeb87 fix: correct test expectations to match _generate_identifiers behavior, set Agg backend for headless CI 2026-05-30 14:15:59 -04:00
Carlos 312a4cee53 feat: add MCP+Agno integration docs and report chart tests 2026-05-30 12:16:06 -04:00
JackSpiece 72f0f63a89 docs: add MCP client usage examples 2026-05-19 19:16:11 +08:00
RheagalFire a4833908ef test: add 29 unit tests and remove lazy imports 2026-05-19 01:50:40 +05:30
Edneam be7fb1f370 fix: keep PII detection separate from refusal metrics 2026-05-14 22:42:28 +05:30
Edneam d734067ef6 test: cover PII leak detector 2026-05-14 22:31:50 +05:30
Alexander Myasoedov a0b2b9ec70 feat(py upgrade): 2026-05-14 18:56:24 +03:00
Alexander Myasoedov bc7fdd7cfa fix(pc): 2026-01-28 21:04:29 +02:00
Alexander Myasoedov b38a27d78c feat: US-005 - Enhanced Refusal Detection with Hybrid Approach
Implement hybrid refusal classifier combining multiple detection methods:
- Add confidence scoring to refusal detection (HybridResult)
- Implement weighted voting with configurable thresholds
- Support require_unanimous mode for strict classification
- Add factory function create_hybrid_classifier for common setup
- Include 32 unit tests with table-driven test patterns
2026-01-28 18:52:20 +02:00
Alexander Myasoedov 41567925aa feat: US-004 - Unified LLM Provider Abstraction
Create unified provider abstraction layer for direct LLM integrations beyond
HTTP specs, inspired by FuzzyAI's comprehensive provider system.

- Add BaseLLMProvider abstract class with standard interface (generate, chat,
  sync_generate, sync_chat methods)
- Implement OpenAIProvider supporting chat completions API
- Implement AnthropicProvider supporting messages API
- Create provider factory for instantiation by name (create_provider,
  get_provider_class)
- Add 60 unit tests covering all provider implementations
2026-01-28 18:34:38 +02:00
Alexander Myasoedov f8e3f6f4a5 feat: US-003 - Composable Fuzzing Chain System
Implement FuzzNode and FuzzChain classes for multi-step attack chains
with pipe operator syntax, inspired by FuzzyAI architecture.

- FuzzNode: Single LLM call with {var} template substitution
- FuzzChain: Sequential execution passing output as input
- Pipe operator (|) for composing nodes into chains
- LLMProvider protocol for provider abstraction
- 22 unit tests covering composition and execution
2026-01-28 18:29:22 +02:00
Alexander Myasoedov ef35c1f82e feat: US-002 - YAML-based Attack Rule System
Implement a YAML-based rule system for defining attack patterns and success
conditions, inspired by Promptmap's 50+ YAML rule definitions.

Features:
- AttackRule model with name, type, severity, prompt, pass/fail conditions
- RuleLoader for parsing YAML files with validation
- Support for recursive directory loading and filtering by type/severity
- Template variable substitution in prompts
- Dataset integration for converting rules to ProbeDataset format
- YAMLRulesDatasetLoader for loading rules from multiple directories

Tested with 47 unit tests covering models, loader, and dataset integration.
Successfully loads 69 rules from promptmap research directory.
2026-01-28 18:23:04 +02:00
Alexander Myasoedov 32f103acbc feat: US-001 - Dual-LLM Evaluation for Attack Success Detection
Add LLM-based refusal classifier inspired by Promptmap's dual-LLM
architecture. The controller LLM evaluates whether an attack succeeded
by analyzing the target's response against pass/fail conditions.

- Create LLMRefusalClassifier plugin integrating with existing system
- Support OpenAI and Anthropic providers with lazy initialization
- Add configurable system prompts and pass/fail conditions
- Include 20 unit tests for comprehensive coverage
2026-01-28 18:18:09 +02:00
Alexander Myasoedov ce7636fe9e feat(restruct tests): 2025-12-26 22:58:21 +02:00