mirror of
https://github.com/FuzzingLabs/fuzzforge_ai.git
synced 2026-06-03 01:28:01 +02:00
Feature/litellm proxy (#27)
* feat: seed governance config and responses routing
* Add env-configurable timeout for proxy providers
* Integrate LiteLLM OTEL collector and update docs
* Make .env.litellm optional for LiteLLM proxy
* Add LiteLLM proxy integration with model-agnostic virtual keys

  Changes:
  - Bootstrap generates 3 virtual keys with individual budgets (CLI: $100, Task-Agent: $25, Cognee: $50)
  - Task-agent loads config at runtime via entrypoint script to wait for bootstrap completion
  - All keys are model-agnostic by default (no LITELLM_DEFAULT_MODELS restrictions)
  - Bootstrap handles database/env mismatch after docker prune by deleting stale aliases
  - CLI and Cognee configured to use LiteLLM proxy with virtual keys
  - Added comprehensive documentation in volumes/env/README.md

  Technical details:
  - task-agent entrypoint waits for keys in .env file before starting uvicorn
  - Bootstrap creates/updates TASK_AGENT_API_KEY, COGNEE_API_KEY, and OPENAI_API_KEY
  - Removed hardcoded API keys from docker-compose.yml
  - All services route through http://localhost:10999 proxy
* Fix CLI not loading virtual keys from global .env

  Project .env files with empty OPENAI_API_KEY values were overriding the global virtual keys. Updated _load_env_file_if_exists to only override with non-empty values.
* Fix agent executor not passing API key to LiteLLM

  The agent was initializing LiteLlm without api_key or api_base, causing authentication errors when using the LiteLLM proxy. Now reads from OPENAI_API_KEY/LLM_API_KEY and LLM_ENDPOINT environment variables and passes them to LiteLlm constructor.
* Auto-populate project .env with virtual key from global config

  When running 'ff init', the command now checks for a global volumes/env/.env file and automatically uses the OPENAI_API_KEY virtual key if found. This ensures projects work with LiteLLM proxy out of the box without manual key configuration.
* docs: Update README with LiteLLM configuration instructions

  Add note about LITELLM_GEMINI_API_KEY configuration and clarify that OPENAI_API_KEY default value should not be changed as it's used for the LLM proxy.
* Refactor workflow parameters to use JSON Schema defaults

  Consolidates parameter defaults into JSON Schema format, removing the separate default_parameters field. Adds extract_defaults_from_json_schema() helper to extract defaults from the standard schema structure. Updates LiteLLM proxy config to use LITELLM_OPENAI_API_KEY environment variable.
* Remove .env.example from task_agent
* Fix MDX syntax error in llm-proxy.md
* fix: apply default parameters from metadata.yaml automatically

  Fixed TemporalManager.run_workflow() to correctly apply default parameter values from workflow metadata.yaml files when parameters are not provided by the caller.

  Previous behavior:
  - When workflow_params was empty {}, the condition `if workflow_params and 'parameters' in metadata` would fail
  - Parameters would not be extracted from schema, resulting in workflows receiving only target_id with no other parameters

  New behavior:
  - Removed the `workflow_params and` requirement from the condition
  - Now explicitly checks for defaults in parameter spec
  - Applies defaults from metadata.yaml automatically when param not provided
  - Workflows receive all parameters with proper fallback: provided value > metadata default > None

  This makes metadata.yaml the single source of truth for parameter defaults, removing the need for workflows to implement defensive default handling.

  Affected workflows:
  - llm_secret_detection (was failing with KeyError)
  - All other workflows now benefit from automatic default application

Co-authored-by: tduhamel42 <tduhamel@fuzzinglabs.com>
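The fallback order named in the commit message (provided value > metadata default > None) can be illustrated with a short sketch. This is not the code from `TemporalManager.run_workflow()`: only the helper name `extract_defaults_from_json_schema` comes from the commit message, while its body, `resolve_parameters`, and the sample schema are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def extract_defaults_from_json_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Collect `default` values from a JSON Schema `properties` mapping."""
    properties = schema.get("properties", {})
    return {
        name: spec["default"]
        for name, spec in properties.items()
        if isinstance(spec, dict) and "default" in spec
    }


def resolve_parameters(
    schema: dict[str, Any], provided: dict[str, Any] | None
) -> dict[str, Any]:
    """Apply the precedence: provided value > schema default > None."""
    defaults = extract_defaults_from_json_schema(schema)
    # Start from every declared parameter, using its default or None.
    resolved = {name: defaults.get(name) for name in schema.get("properties", {})}
    # Caller-supplied values always win, even when the caller passes {}.
    resolved.update(provided or {})
    return resolved


# Hypothetical parameter schema, as it might appear in a metadata.yaml file.
example_schema = {
    "type": "object",
    "properties": {
        "max_files": {"type": "integer", "default": 20},
        "llm_model": {"type": "string", "default": "gpt-4o-mini"},
        "pattern": {"type": "string"},
    },
}

print(resolve_parameters(example_schema, {}))
print(resolve_parameters(example_schema, {"max_files": 5}))
```

With an empty `provided` mapping the sketch still returns every declared parameter, which is the case the commit message says used to fail.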
@@ -1,10 +0,0 @@
# Default LiteLLM configuration
LITELLM_MODEL=gemini/gemini-2.0-flash-001
# LITELLM_PROVIDER=gemini

# API keys (uncomment and fill as needed)
# GOOGLE_API_KEY=
# OPENAI_API_KEY=
# ANTHROPIC_API_KEY=
# OPENROUTER_API_KEY=
# MISTRAL_API_KEY=
@@ -16,4 +16,9 @@ COPY . /app/agent_with_adk_format
WORKDIR /app/agent_with_adk_format
ENV PYTHONPATH=/app

# Copy and set up entrypoint
COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh

ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
@@ -43,18 +43,34 @@ cd task_agent
# cp .env.example .env
```

Edit `.env` (or `.env.example`) and add your API keys. The agent must be restarted after changes so the values are picked up:
Edit `.env` (or `.env.example`) and add your proxy + API keys. The agent must be restarted after changes so the values are picked up:
```bash
# Set default model
LITELLM_MODEL=gemini/gemini-2.0-flash-001
# Route every request through the proxy container (use http://localhost:10999 from the host)
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000

# Add API keys for providers you want to use
GOOGLE_API_KEY=your_google_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENROUTER_API_KEY=your_openrouter_api_key
# Default model + provider the agent boots with
LITELLM_MODEL=openai/gpt-4o-mini
LITELLM_PROVIDER=openai

# Virtual key issued by the proxy to the task agent (bootstrap replaces the placeholder)
OPENAI_API_KEY=sk-proxy-default

# Upstream keys stay inside the proxy. Store real secrets under the LiteLLM
# aliases and the bootstrapper mirrors them into .env.litellm for the proxy container.
LITELLM_OPENAI_API_KEY=your_real_openai_api_key
LITELLM_ANTHROPIC_API_KEY=your_real_anthropic_key
LITELLM_GEMINI_API_KEY=your_real_gemini_key
LITELLM_MISTRAL_API_KEY=your_real_mistral_key
LITELLM_OPENROUTER_API_KEY=your_real_openrouter_key
```

> When running the agent outside of Docker, swap `FF_LLM_PROXY_BASE_URL` to the host port (default `http://localhost:10999`).

The bootstrap container provisions LiteLLM, copies provider secrets into
`volumes/env/.env.litellm`, and rewrites `volumes/env/.env` with the virtual key.
Populate the `LITELLM_*_API_KEY` values before the first launch so the proxy can
reach your upstream providers as soon as the bootstrap script runs.
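A quick pre-launch check can show which upstream keys are still empty or left at their sample values. The sketch below is not part of the repository: `parse_env`, `missing_upstream_keys`, and the `your_` placeholder prefix are assumptions based on the sample values shown in this README.

```python
from __future__ import annotations

# Assumption: unfilled entries keep the "your_..." sample values from the README.
PLACEHOLDER_PREFIX = "your_"


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def missing_upstream_keys(env: dict[str, str]) -> list[str]:
    """Return LITELLM_*_API_KEY names that are empty or still placeholders."""
    return sorted(
        name
        for name, value in env.items()
        if name.startswith("LITELLM_")
        and name.endswith("_API_KEY")
        and (not value or value.startswith(PLACEHOLDER_PREFIX))
    )


sample = """
OPENAI_API_KEY=sk-proxy-default
LITELLM_OPENAI_API_KEY=sk-real-value
LITELLM_ANTHROPIC_API_KEY=your_real_anthropic_key
LITELLM_GEMINI_API_KEY=
"""
print(missing_upstream_keys(parse_env(sample)))
```

Only the providers you intend to use need a real value, so a non-empty result is a reminder, not an error.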

### 2. Install Dependencies

```bash
@@ -0,0 +1,31 @@
#!/bin/bash
set -e

# Wait for .env file to have keys (max 30 seconds)
echo "[task-agent] Waiting for virtual keys to be provisioned..."
for i in $(seq 1 30); do
    if [ -f /app/config/.env ]; then
        # Check if TASK_AGENT_API_KEY has a value (not empty)
        KEY=$(grep -E '^TASK_AGENT_API_KEY=' /app/config/.env | cut -d'=' -f2-)
        if [ -n "$KEY" ] && [ "$KEY" != "" ]; then
            echo "[task-agent] Virtual keys found, loading environment..."
            # Export keys from .env file
            export TASK_AGENT_API_KEY="$KEY"
            export OPENAI_API_KEY=$(grep -E '^OPENAI_API_KEY=' /app/config/.env | cut -d'=' -f2-)
            export FF_LLM_PROXY_BASE_URL=$(grep -E '^FF_LLM_PROXY_BASE_URL=' /app/config/.env | cut -d'=' -f2-)
            echo "[task-agent] Loaded TASK_AGENT_API_KEY: ${TASK_AGENT_API_KEY:0:15}..."
            echo "[task-agent] Loaded FF_LLM_PROXY_BASE_URL: $FF_LLM_PROXY_BASE_URL"
            break
        fi
    fi
    echo "[task-agent] Keys not ready yet, waiting... ($i/30)"
    sleep 1
done

if [ -z "$TASK_AGENT_API_KEY" ]; then
    echo "[task-agent] ERROR: Virtual keys were not provisioned within 30 seconds!"
    exit 1
fi

echo "[task-agent] Starting uvicorn..."
exec "$@"
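For readers more at home in Python, the wait-and-load logic of the entrypoint can be sketched as follows. This is an illustrative equivalent, not code from the repository: `read_env_value` and `wait_for_key` are hypothetical names. The sketch splits each line on the first `=` only, so a value that itself contains `=` is kept whole.

```python
from __future__ import annotations

import time
from pathlib import Path


def read_env_value(path: Path, name: str) -> str | None:
    """Return the value of `name` from a KEY=value file, or None if absent or empty."""
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        key, sep, value = line.partition("=")  # split on the first "=" only
        if sep and key == name and value:
            return value
    return None


def wait_for_key(path: Path, name: str, attempts: int = 30, delay: float = 1.0) -> str:
    """Poll `path` until `name` has a non-empty value, or raise after `attempts` tries."""
    for _ in range(attempts):
        value = read_env_value(path, name)
        if value:
            return value
        time.sleep(delay)
    raise TimeoutError(f"{name} was not provisioned after {attempts} attempts")
```

Calling `wait_for_key(Path("/app/config/.env"), "TASK_AGENT_API_KEY")` would mirror the 30 one-second retries of the shell loop.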
@@ -4,13 +4,28 @@ from __future__ import annotations

import os


def _normalize_proxy_base_url(raw_value: str | None) -> str | None:
    if not raw_value:
        return None
    cleaned = raw_value.strip()
    if not cleaned:
        return None
    # Avoid double slashes in downstream requests
    return cleaned.rstrip("/")

AGENT_NAME = "litellm_agent"
AGENT_DESCRIPTION = (
    "A LiteLLM-backed shell that exposes hot-swappable model and prompt controls."
)

DEFAULT_MODEL = os.getenv("LITELLM_MODEL", "gemini-2.0-flash-001")
DEFAULT_PROVIDER = os.getenv("LITELLM_PROVIDER")
DEFAULT_MODEL = os.getenv("LITELLM_MODEL", "openai/gpt-4o-mini")
DEFAULT_PROVIDER = os.getenv("LITELLM_PROVIDER") or None
PROXY_BASE_URL = _normalize_proxy_base_url(
    os.getenv("FF_LLM_PROXY_BASE_URL")
    or os.getenv("LITELLM_API_BASE")
    or os.getenv("LITELLM_BASE_URL")
)

STATE_PREFIX = "app:litellm_agent/"
STATE_MODEL_KEY = f"{STATE_PREFIX}model"
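The lookup order and normalization in this hunk can be exercised in isolation. The sketch below takes the environment as a plain mapping so it is easy to test; `resolve_proxy_base_url` and `PROXY_ENV_NAMES` are hypothetical names, while the variable names and their order come from the hunk.

```python
from __future__ import annotations

from typing import Mapping

# Lookup order taken from the hunk above.
PROXY_ENV_NAMES = ("FF_LLM_PROXY_BASE_URL", "LITELLM_API_BASE", "LITELLM_BASE_URL")


def resolve_proxy_base_url(env: Mapping[str, str]) -> str | None:
    """Return the first non-empty proxy URL, stripped of whitespace and trailing slashes."""
    for name in PROXY_ENV_NAMES:
        raw = env.get(name)
        if raw:
            cleaned = raw.strip()
            # A whitespace-only value counts as "not configured".
            return cleaned.rstrip("/") if cleaned else None
    return None


print(resolve_proxy_base_url({"FF_LLM_PROXY_BASE_URL": " http://llm-proxy:4000/ "}))
print(resolve_proxy_base_url({"LITELLM_BASE_URL": "http://localhost:10999"}))
print(resolve_proxy_base_url({}))
```

An empty `FF_LLM_PROXY_BASE_URL` falls through to the next variable, which matches the `or` chain used in the module.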
@@ -3,11 +3,15 @@
from __future__ import annotations

from dataclasses import dataclass
import os
from typing import Any, Mapping, MutableMapping, Optional

import httpx

from .config import (
    DEFAULT_MODEL,
    DEFAULT_PROVIDER,
    PROXY_BASE_URL,
    STATE_MODEL_KEY,
    STATE_PROMPT_KEY,
    STATE_PROVIDER_KEY,
@@ -66,11 +70,109 @@ class HotSwapState:
        """Create a LiteLlm instance for the current state."""

        from google.adk.models.lite_llm import LiteLlm  # Lazy import to avoid cycle
        from google.adk.models.lite_llm import LiteLLMClient
        from litellm.types.utils import Choices, Message, ModelResponse, Usage

        kwargs = {"model": self.model}
        if self.provider:
            kwargs["custom_llm_provider"] = self.provider
        return LiteLlm(**kwargs)
        if PROXY_BASE_URL:
            provider = (self.provider or DEFAULT_PROVIDER or "").lower()
            if provider and provider != "openai":
                kwargs["api_base"] = f"{PROXY_BASE_URL.rstrip('/')}/{provider}"
            else:
                kwargs["api_base"] = PROXY_BASE_URL
            kwargs.setdefault("api_key", os.environ.get("TASK_AGENT_API_KEY") or os.environ.get("OPENAI_API_KEY"))

        provider = (self.provider or DEFAULT_PROVIDER or "").lower()
        model_suffix = self.model.split("/", 1)[-1]
        use_responses = provider == "openai" and (
            model_suffix.startswith("gpt-5") or model_suffix.startswith("o1")
        )
        if use_responses:
            kwargs.setdefault("use_responses_api", True)

        llm = LiteLlm(**kwargs)

        if use_responses and PROXY_BASE_URL:

            class _ResponsesAwareClient(LiteLLMClient):
                def __init__(self, base_client: LiteLLMClient, api_base: str, api_key: str):
                    self._base_client = base_client
                    self._api_base = api_base.rstrip("/")
                    self._api_key = api_key

                async def acompletion(self, model, messages, tools, **kwargs):  # type: ignore[override]
                    use_responses_api = kwargs.pop("use_responses_api", False)
                    if not use_responses_api:
                        return await self._base_client.acompletion(
                            model=model,
                            messages=messages,
                            tools=tools,
                            **kwargs,
                        )

                    resolved_model = model
                    if "/" not in resolved_model:
                        resolved_model = f"openai/{resolved_model}"

                    payload = {
                        "model": resolved_model,
                        "input": _messages_to_responses_input(messages),
                    }

                    timeout = kwargs.get("timeout", 60)
                    headers = {
                        "Authorization": f"Bearer {self._api_key}",
                        "Content-Type": "application/json",
                    }

                    async with httpx.AsyncClient(timeout=timeout) as client:
                        response = await client.post(
                            f"{self._api_base}/v1/responses",
                            json=payload,
                            headers=headers,
                        )
                        try:
                            response.raise_for_status()
                        except httpx.HTTPStatusError as exc:
                            text = exc.response.text
                            raise RuntimeError(
                                f"LiteLLM responses request failed: {text}"
                            ) from exc
                        data = response.json()

                    text_output = _extract_output_text(data)
                    usage = data.get("usage", {})

                    return ModelResponse(
                        id=data.get("id"),
                        model=model,
                        choices=[
                            Choices(
                                finish_reason="stop",
                                index=0,
                                message=Message(role="assistant", content=text_output),
                                provider_specific_fields={"bifrost_response": data},
                            )
                        ],
                        usage=Usage(
                            prompt_tokens=usage.get("input_tokens"),
                            completion_tokens=usage.get("output_tokens"),
                            reasoning_tokens=usage.get("output_tokens_details", {}).get(
                                "reasoning_tokens"
                            ),
                            total_tokens=usage.get("total_tokens"),
                        ),
                    )

            llm.llm_client = _ResponsesAwareClient(
                llm.llm_client,
                PROXY_BASE_URL,
                os.environ.get("TASK_AGENT_API_KEY") or os.environ.get("OPENAI_API_KEY", ""),
            )

        return llm

    @property
    def display_model(self) -> str:
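The two routing decisions made in `instantiate_llm()` (which `api_base` to use, and whether to switch to the Responses endpoint) can be pulled out as pure functions for illustration. The function names below are hypothetical and the sketch only restates the conditions visible in the hunk above.

```python
from __future__ import annotations


def resolve_api_base(proxy_base_url: str, provider: str | None) -> str:
    """OpenAI (or no provider) uses the proxy root; other providers get a path suffix."""
    name = (provider or "").lower()
    base = proxy_base_url.rstrip("/")
    if name and name != "openai":
        return f"{base}/{name}"
    return base


def should_use_responses_api(provider: str | None, model: str) -> bool:
    """True for OpenAI models whose name, after any 'provider/' prefix, starts with gpt-5 or o1."""
    name = (provider or "").lower()
    suffix = model.split("/", 1)[-1]
    return name == "openai" and suffix.startswith(("gpt-5", "o1"))


print(resolve_api_base("http://llm-proxy:4000", "anthropic"))
print(should_use_responses_api("openai", "openai/gpt-5-mini"))
```

Keeping these checks free of environment access makes the prefix rules easy to unit test before they are wired into the client.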
@@ -84,3 +186,69 @@ def apply_state_to_agent(invocation_context, state: HotSwapState) -> None:

    agent = invocation_context.agent
    agent.model = state.instantiate_llm()


def _messages_to_responses_input(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    inputs: list[dict[str, Any]] = []
    for message in messages:
        role = message.get("role", "user")
        content = message.get("content", "")
        text_segments: list[str] = []

        if isinstance(content, list):
            for item in content:
                if isinstance(item, dict):
                    text = item.get("text") or item.get("content")
                    if text:
                        text_segments.append(str(text))
                elif isinstance(item, str):
                    text_segments.append(item)
        elif isinstance(content, str):
            text_segments.append(content)

        text = "\n".join(segment.strip() for segment in text_segments if segment)
        if not text:
            continue

        entry_type = "input_text"
        if role == "assistant":
            entry_type = "output_text"

        inputs.append(
            {
                "role": role,
                "content": [
                    {
                        "type": entry_type,
                        "text": text,
                    }
                ],
            }
        )

    if not inputs:
        inputs.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": "",
                    }
                ],
            }
        )
    return inputs


def _extract_output_text(response_json: dict[str, Any]) -> str:
    outputs = response_json.get("output", [])
    collected: list[str] = []
    for item in outputs:
        if isinstance(item, dict) and item.get("type") == "message":
            for part in item.get("content", []):
                if isinstance(part, dict) and part.get("type") == "output_text":
                    text = part.get("text", "")
                    if text:
                        collected.append(str(text))
    return "\n\n".join(collected).strip()
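To make the expected response shape concrete, here is a small worked example of the extraction step. The sample body is hypothetical and only mirrors the fields that `_extract_output_text` reads; `extract_output_text` is a condensed restatement of the same logic, written so the example is self-contained.

```python
from typing import Any

# Hypothetical response body, shaped the way the helper above expects it.
sample_response: dict[str, Any] = {
    "id": "resp_123",
    "output": [
        {"type": "reasoning", "summary": []},
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "First part."},
                {"type": "refusal", "refusal": "ignored"},
                {"type": "output_text", "text": "Second part."},
            ],
        },
    ],
    "usage": {"input_tokens": 12, "output_tokens": 7, "total_tokens": 19},
}


def extract_output_text(response_json: dict[str, Any]) -> str:
    """Join every non-empty output_text part of message items with blank lines."""
    parts = [
        str(part["text"])
        for item in response_json.get("output", [])
        if isinstance(item, dict) and item.get("type") == "message"
        for part in item.get("content", [])
        if isinstance(part, dict)
        and part.get("type") == "output_text"
        and part.get("text")
    ]
    return "\n\n".join(parts).strip()


print(extract_output_text(sample_response))
```

Items that are not of type `message`, and parts that are not `output_text`, are skipped, so reasoning items and refusals never reach the assistant message.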
@@ -0,0 +1,5 @@
# LLM Proxy Integrations

This directory contains third-party source trees that were vendored only for reference when integrating LLM gateways. The actual FuzzForge deployment uses the official Docker images for each project.

See `docs/docs/how-to/llm-proxy.md` for up-to-date instructions on running the proxy services and issuing keys for the agents.
@@ -1049,10 +1049,19 @@ class FuzzForgeExecutor:
                FunctionTool(get_task_list)
            ])


        # Create the agent

        # Create the agent with LiteLLM configuration
        llm_kwargs = {}
        api_key = os.getenv('OPENAI_API_KEY') or os.getenv('LLM_API_KEY')
        api_base = os.getenv('LLM_ENDPOINT') or os.getenv('LLM_API_BASE') or os.getenv('OPENAI_API_BASE')

        if api_key:
            llm_kwargs['api_key'] = api_key
        if api_base:
            llm_kwargs['api_base'] = api_base

        self.agent = LlmAgent(
            model=LiteLlm(model=self.model),
            model=LiteLlm(model=self.model, **llm_kwargs),
            name="fuzzforge_executor",
            description="Intelligent A2A orchestrator with memory",
            instruction=self._build_instruction(),
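The environment fallback used for `api_key` and `api_base` can be written as a small helper that is testable without touching `os.environ`. The names `first_set` and `build_llm_kwargs` are hypothetical; the variable names and their order are the ones in the hunk above.

```python
from __future__ import annotations

from typing import Mapping


def first_set(env: Mapping[str, str], *names: str) -> str | None:
    """Return the first non-empty value among `names`, mirroring `a or b or c`."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def build_llm_kwargs(env: Mapping[str, str]) -> dict[str, str]:
    """Collect only the connection settings that are actually set."""
    kwargs: dict[str, str] = {}
    api_key = first_set(env, "OPENAI_API_KEY", "LLM_API_KEY")
    api_base = first_set(env, "LLM_ENDPOINT", "LLM_API_BASE", "OPENAI_API_BASE")
    if api_key:
        kwargs["api_key"] = api_key
    if api_base:
        kwargs["api_base"] = api_base
    return kwargs


print(build_llm_kwargs({"OPENAI_API_KEY": "sk-proxy-default", "LLM_ENDPOINT": "http://localhost:10999"}))
```

Leaving unset values out of the dictionary, instead of passing `None`, lets the constructor fall back to its own defaults.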
@@ -56,7 +56,7 @@ class CogneeService:
            # Configure LLM with API key BEFORE any other cognee operations
            provider = os.getenv("LLM_PROVIDER", "openai")
            model = os.getenv("LLM_MODEL") or os.getenv("LITELLM_MODEL", "gpt-4o-mini")
            api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
            api_key = os.getenv("COGNEE_API_KEY") or os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
            endpoint = os.getenv("LLM_ENDPOINT")
            api_version = os.getenv("LLM_API_VERSION")
            max_tokens = os.getenv("LLM_MAX_TOKENS")
@@ -78,48 +78,62 @@ class CogneeService:
                os.environ.setdefault("OPENAI_API_KEY", api_key)
            if endpoint:
                os.environ["LLM_ENDPOINT"] = endpoint
                os.environ.setdefault("LLM_API_BASE", endpoint)
                os.environ.setdefault("OPENAI_API_BASE", endpoint)
                os.environ.setdefault("LITELLM_PROXY_API_BASE", endpoint)
                if api_key:
                    os.environ.setdefault("LITELLM_PROXY_API_KEY", api_key)
            if api_version:
                os.environ["LLM_API_VERSION"] = api_version
            if max_tokens:
                os.environ["LLM_MAX_TOKENS"] = str(max_tokens)

            # Configure Cognee's runtime using its configuration helpers when available
            embedding_model = os.getenv("LLM_EMBEDDING_MODEL")
            embedding_endpoint = os.getenv("LLM_EMBEDDING_ENDPOINT")
            if embedding_endpoint:
                os.environ.setdefault("LLM_EMBEDDING_API_BASE", embedding_endpoint)

            if hasattr(cognee.config, "set_llm_provider"):
                cognee.config.set_llm_provider(provider)
            if hasattr(cognee.config, "set_llm_model"):
                cognee.config.set_llm_model(model)
            if api_key and hasattr(cognee.config, "set_llm_api_key"):
                cognee.config.set_llm_api_key(api_key)
            if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
                cognee.config.set_llm_endpoint(endpoint)
            if hasattr(cognee.config, "set_llm_model"):
                cognee.config.set_llm_model(model)
            if api_key and hasattr(cognee.config, "set_llm_api_key"):
                cognee.config.set_llm_api_key(api_key)
            if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
                cognee.config.set_llm_endpoint(endpoint)
            if embedding_model and hasattr(cognee.config, "set_llm_embedding_model"):
                cognee.config.set_llm_embedding_model(embedding_model)
            if embedding_endpoint and hasattr(cognee.config, "set_llm_embedding_endpoint"):
                cognee.config.set_llm_embedding_endpoint(embedding_endpoint)
            if api_version and hasattr(cognee.config, "set_llm_api_version"):
                cognee.config.set_llm_api_version(api_version)
            if max_tokens and hasattr(cognee.config, "set_llm_max_tokens"):
                cognee.config.set_llm_max_tokens(int(max_tokens))


            # Configure graph database
            cognee.config.set_graph_db_config({
                "graph_database_provider": self.cognee_config.get("graph_database_provider", "kuzu"),
            })


            # Set data directories
            data_dir = self.cognee_config.get("data_directory")
            system_dir = self.cognee_config.get("system_directory")


            if data_dir:
                logger.debug("Setting cognee data root", extra={"path": data_dir})
                cognee.config.data_root_directory(data_dir)
            if system_dir:
                logger.debug("Setting cognee system root", extra={"path": system_dir})
                cognee.config.system_root_directory(system_dir)


            # Setup multi-tenant user context
            await self._setup_user_context()


            self._initialized = True
            logger.info(f"Cognee initialized for project {self.project_context['project_name']} "
                        f"with Kuzu at {system_dir}")


        except ImportError:
            logger.error("Cognee not installed. Install with: pip install cognee")
            raise
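The repeated `hasattr(...)` guards above follow one pattern: call a setter only when the installed Cognee version provides it and there is a value to pass. A compact sketch of that pattern is shown below. `apply_if_supported` and `FakeConfig` are hypothetical and stand in for `cognee.config`; note that the hunk calls the provider and model setters without checking the value, whereas this sketch always requires a truthy value.

```python
from __future__ import annotations

from typing import Any


def apply_if_supported(config: Any, setter_name: str, value: Any) -> bool:
    """Call config.<setter_name>(value) when the setter exists and value is truthy."""
    setter = getattr(config, setter_name, None)
    if value and callable(setter):
        setter(value)
        return True
    return False


class FakeConfig:
    """Stand-in for a config object that only supports some setters."""

    def __init__(self) -> None:
        self.model: str | None = None

    def set_llm_model(self, model: str) -> None:
        self.model = model


config = FakeConfig()
print(apply_if_supported(config, "set_llm_model", "gpt-4o-mini"))
print(apply_if_supported(config, "set_llm_endpoint", "http://localhost:10999"))
```

The boolean return value makes it possible to log which settings were skipped because the setter is missing.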