Feature/litellm proxy (#27)

* feat: seed governance config and responses routing

* Add env-configurable timeout for proxy providers

* Integrate LiteLLM OTEL collector and update docs

* Make .env.litellm optional for LiteLLM proxy

* Add LiteLLM proxy integration with model-agnostic virtual keys

Changes:
- Bootstrap generates 3 virtual keys with individual budgets (CLI: $100, Task-Agent: $25, Cognee: $50)
- Task-agent loads config at runtime via entrypoint script to wait for bootstrap completion
- All keys are model-agnostic by default (no LITELLM_DEFAULT_MODELS restrictions)
- Bootstrap handles database/env mismatch after docker prune by deleting stale aliases
- CLI and Cognee configured to use LiteLLM proxy with virtual keys
- Added comprehensive documentation in volumes/env/README.md

Technical details:
- task-agent entrypoint waits for keys in .env file before starting uvicorn
- Bootstrap creates/updates TASK_AGENT_API_KEY, COGNEE_API_KEY, and OPENAI_API_KEY
- Removed hardcoded API keys from docker-compose.yml
- All services route through http://localhost:10999 proxy

* Fix CLI not loading virtual keys from global .env

Project .env files with empty OPENAI_API_KEY values were overriding
the global virtual keys. Updated _load_env_file_if_exists to only
override with non-empty values.

* Fix agent executor not passing API key to LiteLLM

The agent was initializing LiteLlm without api_key or api_base,
causing authentication errors when using the LiteLLM proxy. Now
reads from OPENAI_API_KEY/LLM_API_KEY and LLM_ENDPOINT environment
variables and passes them to LiteLlm constructor.

* Auto-populate project .env with virtual key from global config

When running 'ff init', the command now checks for a global
volumes/env/.env file and automatically uses the OPENAI_API_KEY
virtual key if found. This ensures projects work with LiteLLM
proxy out of the box without manual key configuration.

* docs: Update README with LiteLLM configuration instructions

Add note about LITELLM_GEMINI_API_KEY configuration and clarify that OPENAI_API_KEY default value should not be changed as it's used for the LLM proxy.

* Refactor workflow parameters to use JSON Schema defaults

Consolidates parameter defaults into JSON Schema format, removing the separate default_parameters field. Adds extract_defaults_from_json_schema() helper to extract defaults from the standard schema structure. Updates LiteLLM proxy config to use LITELLM_OPENAI_API_KEY environment variable.

* Remove .env.example from task_agent

* Fix MDX syntax error in llm-proxy.md

* fix: apply default parameters from metadata.yaml automatically

Fixed TemporalManager.run_workflow() to correctly apply default parameter
values from workflow metadata.yaml files when parameters are not provided
by the caller.

Previous behavior:
- When workflow_params was empty {}, the condition
  `if workflow_params and 'parameters' in metadata` would fail
- Parameters would not be extracted from schema, resulting in workflows
  receiving only target_id with no other parameters

New behavior:
- Removed the `workflow_params and` requirement from the condition
- Now explicitly checks for defaults in parameter spec
- Applies defaults from metadata.yaml automatically when param not provided
- Workflows receive all parameters with proper fallback:
  provided value > metadata default > None

This makes metadata.yaml the single source of truth for parameter defaults,
removing the need for workflows to implement defensive default handling.

Affected workflows:
- llm_secret_detection (was failing with KeyError)
- All other workflows now benefit from automatic default application

Co-authored-by: tduhamel42 <tduhamel@fuzzinglabs.com>
This commit is contained in:
Songbird99
2025-10-26 12:51:53 +01:00
committed by tduhamel42
parent bd94d19d34
commit f77c3ff1e9
29 changed files with 1869 additions and 106 deletions
-10
View File
@@ -1,10 +0,0 @@
# Default LiteLLM configuration
LITELLM_MODEL=gemini/gemini-2.0-flash-001
# LITELLM_PROVIDER=gemini
# API keys (uncomment and fill as needed)
# GOOGLE_API_KEY=
# OPENAI_API_KEY=
# ANTHROPIC_API_KEY=
# OPENROUTER_API_KEY=
# MISTRAL_API_KEY=
+5
View File
@@ -16,4 +16,9 @@ COPY . /app/agent_with_adk_format
WORKDIR /app/agent_with_adk_format
ENV PYTHONPATH=/app
# Copy and set up entrypoint
COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
+24 -8
View File
@@ -43,18 +43,34 @@ cd task_agent
# cp .env.example .env
```
Edit `.env` (or `.env.example`) and add your API keys. The agent must be restarted after changes so the values are picked up:
Edit `.env` (or `.env.example`) and add your proxy + API keys. The agent must be restarted after changes so the values are picked up:
```bash
# Set default model
LITELLM_MODEL=gemini/gemini-2.0-flash-001
# Route every request through the proxy container (use http://localhost:10999 from the host)
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000
# Add API keys for providers you want to use
GOOGLE_API_KEY=your_google_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENROUTER_API_KEY=your_openrouter_api_key
# Default model + provider the agent boots with
LITELLM_MODEL=openai/gpt-4o-mini
LITELLM_PROVIDER=openai
# Virtual key issued by the proxy to the task agent (bootstrap replaces the placeholder)
OPENAI_API_KEY=sk-proxy-default
# Upstream keys stay inside the proxy. Store real secrets under the LiteLLM
# aliases and the bootstrapper mirrors them into .env.litellm for the proxy container.
LITELLM_OPENAI_API_KEY=your_real_openai_api_key
LITELLM_ANTHROPIC_API_KEY=your_real_anthropic_key
LITELLM_GEMINI_API_KEY=your_real_gemini_key
LITELLM_MISTRAL_API_KEY=your_real_mistral_key
LITELLM_OPENROUTER_API_KEY=your_real_openrouter_key
```
> When running the agent outside of Docker, swap `FF_LLM_PROXY_BASE_URL` to the host port (default `http://localhost:10999`).
The bootstrap container provisions LiteLLM, copies provider secrets into
`volumes/env/.env.litellm`, and rewrites `volumes/env/.env` with the virtual key.
Populate the `LITELLM_*_API_KEY` values before the first launch so the proxy can
reach your upstream providers as soon as the bootstrap script runs.
### 2. Install Dependencies
```bash
+31
View File
@@ -0,0 +1,31 @@
#!/bin/bash
set -e
# Wait for .env file to have keys (max 30 seconds)
echo "[task-agent] Waiting for virtual keys to be provisioned..."
for i in $(seq 1 30); do
if [ -f /app/config/.env ]; then
# Check if TASK_AGENT_API_KEY has a value (not empty)
KEY=$(grep -E '^TASK_AGENT_API_KEY=' /app/config/.env | cut -d'=' -f2)
if [ -n "$KEY" ] && [ "$KEY" != "" ]; then
echo "[task-agent] Virtual keys found, loading environment..."
# Export keys from .env file
export TASK_AGENT_API_KEY="$KEY"
export OPENAI_API_KEY=$(grep -E '^OPENAI_API_KEY=' /app/config/.env | cut -d'=' -f2)
export FF_LLM_PROXY_BASE_URL=$(grep -E '^FF_LLM_PROXY_BASE_URL=' /app/config/.env | cut -d'=' -f2)
echo "[task-agent] Loaded TASK_AGENT_API_KEY: ${TASK_AGENT_API_KEY:0:15}..."
echo "[task-agent] Loaded FF_LLM_PROXY_BASE_URL: $FF_LLM_PROXY_BASE_URL"
break
fi
fi
echo "[task-agent] Keys not ready yet, waiting... ($i/30)"
sleep 1
done
if [ -z "$TASK_AGENT_API_KEY" ]; then
echo "[task-agent] ERROR: Virtual keys were not provisioned within 30 seconds!"
exit 1
fi
echo "[task-agent] Starting uvicorn..."
exec "$@"
+17 -2
View File
@@ -4,13 +4,28 @@ from __future__ import annotations
import os
def _normalize_proxy_base_url(raw_value: str | None) -> str | None:
if not raw_value:
return None
cleaned = raw_value.strip()
if not cleaned:
return None
# Avoid double slashes in downstream requests
return cleaned.rstrip("/")
AGENT_NAME = "litellm_agent"
AGENT_DESCRIPTION = (
"A LiteLLM-backed shell that exposes hot-swappable model and prompt controls."
)
DEFAULT_MODEL = os.getenv("LITELLM_MODEL", "gemini-2.0-flash-001")
DEFAULT_PROVIDER = os.getenv("LITELLM_PROVIDER")
DEFAULT_MODEL = os.getenv("LITELLM_MODEL", "openai/gpt-4o-mini")
DEFAULT_PROVIDER = os.getenv("LITELLM_PROVIDER") or None
PROXY_BASE_URL = _normalize_proxy_base_url(
os.getenv("FF_LLM_PROXY_BASE_URL")
or os.getenv("LITELLM_API_BASE")
or os.getenv("LITELLM_BASE_URL")
)
STATE_PREFIX = "app:litellm_agent/"
STATE_MODEL_KEY = f"{STATE_PREFIX}model"
+169 -1
View File
@@ -3,11 +3,15 @@
from __future__ import annotations
from dataclasses import dataclass
import os
from typing import Any, Mapping, MutableMapping, Optional
import httpx
from .config import (
DEFAULT_MODEL,
DEFAULT_PROVIDER,
PROXY_BASE_URL,
STATE_MODEL_KEY,
STATE_PROMPT_KEY,
STATE_PROVIDER_KEY,
@@ -66,11 +70,109 @@ class HotSwapState:
"""Create a LiteLlm instance for the current state."""
from google.adk.models.lite_llm import LiteLlm # Lazy import to avoid cycle
from google.adk.models.lite_llm import LiteLLMClient
from litellm.types.utils import Choices, Message, ModelResponse, Usage
kwargs = {"model": self.model}
if self.provider:
kwargs["custom_llm_provider"] = self.provider
return LiteLlm(**kwargs)
if PROXY_BASE_URL:
provider = (self.provider or DEFAULT_PROVIDER or "").lower()
if provider and provider != "openai":
kwargs["api_base"] = f"{PROXY_BASE_URL.rstrip('/')}/{provider}"
else:
kwargs["api_base"] = PROXY_BASE_URL
kwargs.setdefault("api_key", os.environ.get("TASK_AGENT_API_KEY") or os.environ.get("OPENAI_API_KEY"))
provider = (self.provider or DEFAULT_PROVIDER or "").lower()
model_suffix = self.model.split("/", 1)[-1]
use_responses = provider == "openai" and (
model_suffix.startswith("gpt-5") or model_suffix.startswith("o1")
)
if use_responses:
kwargs.setdefault("use_responses_api", True)
llm = LiteLlm(**kwargs)
if use_responses and PROXY_BASE_URL:
class _ResponsesAwareClient(LiteLLMClient):
def __init__(self, base_client: LiteLLMClient, api_base: str, api_key: str):
self._base_client = base_client
self._api_base = api_base.rstrip("/")
self._api_key = api_key
async def acompletion(self, model, messages, tools, **kwargs): # type: ignore[override]
use_responses_api = kwargs.pop("use_responses_api", False)
if not use_responses_api:
return await self._base_client.acompletion(
model=model,
messages=messages,
tools=tools,
**kwargs,
)
resolved_model = model
if "/" not in resolved_model:
resolved_model = f"openai/{resolved_model}"
payload = {
"model": resolved_model,
"input": _messages_to_responses_input(messages),
}
timeout = kwargs.get("timeout", 60)
headers = {
"Authorization": f"Bearer {self._api_key}",
"Content-Type": "application/json",
}
async with httpx.AsyncClient(timeout=timeout) as client:
response = await client.post(
f"{self._api_base}/v1/responses",
json=payload,
headers=headers,
)
try:
response.raise_for_status()
except httpx.HTTPStatusError as exc:
text = exc.response.text
raise RuntimeError(
f"LiteLLM responses request failed: {text}"
) from exc
data = response.json()
text_output = _extract_output_text(data)
usage = data.get("usage", {})
return ModelResponse(
id=data.get("id"),
model=model,
choices=[
Choices(
finish_reason="stop",
index=0,
message=Message(role="assistant", content=text_output),
provider_specific_fields={"bifrost_response": data},
)
],
usage=Usage(
prompt_tokens=usage.get("input_tokens"),
completion_tokens=usage.get("output_tokens"),
reasoning_tokens=usage.get("output_tokens_details", {}).get(
"reasoning_tokens"
),
total_tokens=usage.get("total_tokens"),
),
)
llm.llm_client = _ResponsesAwareClient(
llm.llm_client,
PROXY_BASE_URL,
os.environ.get("TASK_AGENT_API_KEY") or os.environ.get("OPENAI_API_KEY", ""),
)
return llm
@property
def display_model(self) -> str:
@@ -84,3 +186,69 @@ def apply_state_to_agent(invocation_context, state: HotSwapState) -> None:
agent = invocation_context.agent
agent.model = state.instantiate_llm()
def _messages_to_responses_input(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
inputs: list[dict[str, Any]] = []
for message in messages:
role = message.get("role", "user")
content = message.get("content", "")
text_segments: list[str] = []
if isinstance(content, list):
for item in content:
if isinstance(item, dict):
text = item.get("text") or item.get("content")
if text:
text_segments.append(str(text))
elif isinstance(item, str):
text_segments.append(item)
elif isinstance(content, str):
text_segments.append(content)
text = "\n".join(segment.strip() for segment in text_segments if segment)
if not text:
continue
entry_type = "input_text"
if role == "assistant":
entry_type = "output_text"
inputs.append(
{
"role": role,
"content": [
{
"type": entry_type,
"text": text,
}
],
}
)
if not inputs:
inputs.append(
{
"role": "user",
"content": [
{
"type": "input_text",
"text": "",
}
],
}
)
return inputs
def _extract_output_text(response_json: dict[str, Any]) -> str:
outputs = response_json.get("output", [])
collected: list[str] = []
for item in outputs:
if isinstance(item, dict) and item.get("type") == "message":
for part in item.get("content", []):
if isinstance(part, dict) and part.get("type") == "output_text":
text = part.get("text", "")
if text:
collected.append(str(text))
return "\n\n".join(collected).strip()
+5
View File
@@ -0,0 +1,5 @@
# LLM Proxy Integrations
This directory contains vendor source trees that were vendored only for reference when integrating LLM gateways. The actual FuzzForge deployment uses the official Docker images for each project.
See `docs/docs/how-to/llm-proxy.md` for up-to-date instructions on running the proxy services and issuing keys for the agents.
+12 -3
View File
@@ -1049,10 +1049,19 @@ class FuzzForgeExecutor:
FunctionTool(get_task_list)
])
# Create the agent
# Create the agent with LiteLLM configuration
llm_kwargs = {}
api_key = os.getenv('OPENAI_API_KEY') or os.getenv('LLM_API_KEY')
api_base = os.getenv('LLM_ENDPOINT') or os.getenv('LLM_API_BASE') or os.getenv('OPENAI_API_BASE')
if api_key:
llm_kwargs['api_key'] = api_key
if api_base:
llm_kwargs['api_base'] = api_base
self.agent = LlmAgent(
model=LiteLlm(model=self.model),
model=LiteLlm(model=self.model, **llm_kwargs),
name="fuzzforge_executor",
description="Intelligent A2A orchestrator with memory",
instruction=self._build_instruction(),
+27 -13
View File
@@ -56,7 +56,7 @@ class CogneeService:
# Configure LLM with API key BEFORE any other cognee operations
provider = os.getenv("LLM_PROVIDER", "openai")
model = os.getenv("LLM_MODEL") or os.getenv("LITELLM_MODEL", "gpt-4o-mini")
api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
api_key = os.getenv("COGNEE_API_KEY") or os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
endpoint = os.getenv("LLM_ENDPOINT")
api_version = os.getenv("LLM_API_VERSION")
max_tokens = os.getenv("LLM_MAX_TOKENS")
@@ -78,48 +78,62 @@ class CogneeService:
os.environ.setdefault("OPENAI_API_KEY", api_key)
if endpoint:
os.environ["LLM_ENDPOINT"] = endpoint
os.environ.setdefault("LLM_API_BASE", endpoint)
os.environ.setdefault("OPENAI_API_BASE", endpoint)
os.environ.setdefault("LITELLM_PROXY_API_BASE", endpoint)
if api_key:
os.environ.setdefault("LITELLM_PROXY_API_KEY", api_key)
if api_version:
os.environ["LLM_API_VERSION"] = api_version
if max_tokens:
os.environ["LLM_MAX_TOKENS"] = str(max_tokens)
# Configure Cognee's runtime using its configuration helpers when available
embedding_model = os.getenv("LLM_EMBEDDING_MODEL")
embedding_endpoint = os.getenv("LLM_EMBEDDING_ENDPOINT")
if embedding_endpoint:
os.environ.setdefault("LLM_EMBEDDING_API_BASE", embedding_endpoint)
if hasattr(cognee.config, "set_llm_provider"):
cognee.config.set_llm_provider(provider)
if hasattr(cognee.config, "set_llm_model"):
cognee.config.set_llm_model(model)
if api_key and hasattr(cognee.config, "set_llm_api_key"):
cognee.config.set_llm_api_key(api_key)
if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
cognee.config.set_llm_endpoint(endpoint)
if hasattr(cognee.config, "set_llm_model"):
cognee.config.set_llm_model(model)
if api_key and hasattr(cognee.config, "set_llm_api_key"):
cognee.config.set_llm_api_key(api_key)
if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
cognee.config.set_llm_endpoint(endpoint)
if embedding_model and hasattr(cognee.config, "set_llm_embedding_model"):
cognee.config.set_llm_embedding_model(embedding_model)
if embedding_endpoint and hasattr(cognee.config, "set_llm_embedding_endpoint"):
cognee.config.set_llm_embedding_endpoint(embedding_endpoint)
if api_version and hasattr(cognee.config, "set_llm_api_version"):
cognee.config.set_llm_api_version(api_version)
if max_tokens and hasattr(cognee.config, "set_llm_max_tokens"):
cognee.config.set_llm_max_tokens(int(max_tokens))
# Configure graph database
cognee.config.set_graph_db_config({
"graph_database_provider": self.cognee_config.get("graph_database_provider", "kuzu"),
})
# Set data directories
data_dir = self.cognee_config.get("data_directory")
system_dir = self.cognee_config.get("system_directory")
if data_dir:
logger.debug("Setting cognee data root", extra={"path": data_dir})
cognee.config.data_root_directory(data_dir)
if system_dir:
logger.debug("Setting cognee system root", extra={"path": system_dir})
cognee.config.system_root_directory(system_dir)
# Setup multi-tenant user context
await self._setup_user_context()
self._initialized = True
logger.info(f"Cognee initialized for project {self.project_context['project_name']} "
f"with Kuzu at {system_dir}")
except ImportError:
logger.error("Cognee not installed. Install with: pip install cognee")
raise