feat(add Reinforcement Learning Optimization doc):

2026-06-24 06:09:55 +02:00 · 2025-02-07 01:02:12 +02:00
parent e0eed6fd92
commit eb27f7bbaa
4 changed files with 255 additions and 2 deletions
@@ -50,6 +50,50 @@ The `probe_data` module is a core component of the Agentic Security project, res
  - `base64_encode(data)`: Encodes data in base64 format.
  - `mirror_words(text)`: Mirrors words in the text.

+### rl_model.py
+
+- **Classes:**
+  - `PromptSelectionInterface`: Abstract base class for prompt selection strategies.
+
+    - Methods:
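+      - `select_next_prompt(current_prompt: str, passed_guard: bool) -> str`: Returns the next prompt to try, given the current prompt and its guard result.
+      - `select_next_prompts(current_prompt: str, passed_guard: bool) -> list[str]`: Returns a list of candidate next prompts for the same inputs.
+      - `update_rewards(previous_prompt: str, current_prompt: str, reward: float, passed_guard: bool) -> None`: Records the reward observed for moving from `previous_prompt` to `current_prompt`.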
+
+  - `RandomPromptSelector`: Basic random selection with history tracking.
+
+    - Parameters:
+      - `prompts: list[str]`: List of available prompts
+      - `history_size: int = 3`: Number of entries kept in the history to prevent cycles
+
+  - `CloudRLPromptSelector`: Cloud-based RL implementation with fallback.
+
+    - Parameters:
+      - `prompts: list[str]`: List of available prompts
+      - `api_url: str`: URL of RL service
+      - `auth_token: str = AUTH_TOKEN`: Authentication token
+      - `history_size: int = 300`: Size of history
+      - `timeout: int = 5`: Request timeout
+      - `run_id: str = ""`: Unique run identifier
+
+  - `QLearningPromptSelector`: Local Q-learning implementation.
+
+    - Parameters:
+      - `prompts: list[str]`: List of available prompts
+      - `learning_rate: float = 0.1`: Learning rate
+      - `discount_factor: float = 0.9`: Discount factor
+      - `initial_exploration: float = 1.0`: Initial exploration rate
+      - `exploration_decay: float = 0.995`: Exploration decay rate
+      - `min_exploration: float = 0.01`: Minimum exploration rate
+      - `history_size: int = 300`: Size of history
+
+  - `Module`: Main class that uses `CloudRLPromptSelector`.
+
+    - Parameters:
+      - `prompt_groups: list[str]`: Groups of prompts
+      - `tools_inbox: asyncio.Queue`: Queue for tool communication
+      - `opts: dict = {}`: Configuration options
+
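The `QLearningPromptSelector` parameters listed above map onto the usual tabular Q-learning loop with epsilon-greedy exploration. The following is a minimal sketch of that technique, not the code in `rl_model.py`: the class name `TabularQSketch`, its `select`/`update` methods, the `seed` argument, and the choice to treat the current prompt as the state and the next prompt as the action are all assumptions made for illustration. Only the parameter names and default values are taken from the list above.

```python
import random
from collections import defaultdict


class TabularQSketch:
    """Illustrative tabular Q-learning over prompts (hypothetical, not the project's class)."""

    def __init__(self, prompts, learning_rate=0.1, discount_factor=0.9,
                 initial_exploration=1.0, exploration_decay=0.995,
                 min_exploration=0.01, seed=None):
        self.prompts = list(prompts)
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.exploration = initial_exploration
        self.exploration_decay = exploration_decay
        self.min_exploration = min_exploration
        self.rng = random.Random(seed)
        # q[state][action]: estimated value of moving from one prompt to another.
        self.q = defaultdict(lambda: defaultdict(float))

    def select(self, current_prompt):
        # Explore with probability `exploration`, otherwise pick the best-known action.
        if self.rng.random() < self.exploration:
            return self.rng.choice(self.prompts)
        row = self.q[current_prompt]
        return max(self.prompts, key=lambda p: row[p])

    def update(self, previous_prompt, current_prompt, reward):
        # Q-learning target: reward + discount_factor * max over next actions.
        best_next = max(self.q[current_prompt][p] for p in self.prompts)
        old = self.q[previous_prompt][current_prompt]
        target = reward + self.discount_factor * best_next
        self.q[previous_prompt][current_prompt] = old + self.learning_rate * (target - old)
        # Assumed multiplicative decay per update, never below the floor.
        self.exploration = max(self.min_exploration,
                               self.exploration * self.exploration_decay)
```

In this sketch a higher `learning_rate` makes each reward move the estimate further, while `discount_factor` controls how much the value of follow-up prompts counts toward the current choice.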
 ## Usage Examples

 ### Generating Audio
@@ -68,6 +112,19 @@ from agentic_security.probe_data.data import load_dataset_general
 dataset = load_dataset_general("example_dataset")
 ```

+### Using RL Model
+
+```python
+from agentic_security.probe_data.modules.rl_model import QLearningPromptSelector
+
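+# Candidate prompts the selector chooses from
+prompts = ["What is AI?", "Explain machine learning"]
+selector = QLearningPromptSelector(prompts)
+
+# Pick the next prompt, given the current prompt and its guard result
+current_prompt = "What is AI?"
+next_prompt = selector.select_next_prompt(current_prompt, passed_guard=True)
+# Feed back the reward observed for the current -> next step
+selector.update_rewards(current_prompt, next_prompt, reward=1.0, passed_guard=True)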
+```
+
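When the selector runs in a loop, the exploration defaults determine how long it keeps choosing prompts at random. The helper below is a hypothetical back-of-the-envelope calculation, assuming the exploration rate is multiplied by `exploration_decay` once per update (the source does not state when the decay is applied). It solves `initial * decay**n <= floor` for the smallest whole `n`.

```python
import math


def steps_to_floor(initial=1.0, decay=0.995, floor=0.01):
    """Smallest n such that initial * decay**n <= floor (assumes per-update decay)."""
    return math.ceil(math.log(floor / initial) / math.log(decay))


# With the documented defaults the floor is reached after 919 updates.
print(steps_to_floor())
```

Under that assumption, runs much shorter than roughly 900 updates stay mostly exploratory with the default settings, so a faster decay may suit short runs.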
 ## Conclusion

 The `probe_data` module provides essential functionality for handling and transforming datasets within the Agentic Security project. This documentation serves as a guide to understanding and utilizing the module's capabilities.