Feature/litellm proxy (#27)
* feat: seed governance config and responses routing
* Add env-configurable timeout for proxy providers
* Integrate LiteLLM OTEL collector and update docs
* Make .env.litellm optional for LiteLLM proxy
* Add LiteLLM proxy integration with model-agnostic virtual keys

  Changes:
  - Bootstrap generates 3 virtual keys with individual budgets (CLI: $100, Task-Agent: $25, Cognee: $50)
  - Task-agent loads config at runtime via entrypoint script to wait for bootstrap completion
  - All keys are model-agnostic by default (no LITELLM_DEFAULT_MODELS restrictions)
  - Bootstrap handles database/env mismatch after docker prune by deleting stale aliases
  - CLI and Cognee configured to use LiteLLM proxy with virtual keys
  - Added comprehensive documentation in volumes/env/README.md

  Technical details:
  - task-agent entrypoint waits for keys in the .env file before starting uvicorn
  - Bootstrap creates/updates TASK_AGENT_API_KEY, COGNEE_API_KEY, and OPENAI_API_KEY
  - Removed hardcoded API keys from docker-compose.yml
  - All services route through the http://localhost:10999 proxy

* Fix CLI not loading virtual keys from global .env

  Project .env files with empty OPENAI_API_KEY values were overriding the global virtual keys. Updated _load_env_file_if_exists to only override with non-empty values.

* Fix agent executor not passing API key to LiteLLM

  The agent was initializing LiteLlm without api_key or api_base, causing authentication errors when using the LiteLLM proxy. It now reads the OPENAI_API_KEY/LLM_API_KEY and LLM_ENDPOINT environment variables and passes them to the LiteLlm constructor.

* Auto-populate project .env with virtual key from global config

  When running 'ff init', the command now checks for a global volumes/env/.env file and automatically uses the OPENAI_API_KEY virtual key if found. This ensures projects work with the LiteLLM proxy out of the box without manual key configuration.

* docs: Update README with LiteLLM configuration instructions

  Add a note about LITELLM_GEMINI_API_KEY configuration and clarify that the OPENAI_API_KEY default value should not be changed, as it is used for the LLM proxy.

* Refactor workflow parameters to use JSON Schema defaults

  Consolidates parameter defaults into JSON Schema format, removing the separate default_parameters field. Adds an extract_defaults_from_json_schema() helper to extract defaults from the standard schema structure. Updates the LiteLLM proxy config to use the LITELLM_OPENAI_API_KEY environment variable.

* Remove .env.example from task_agent

* Fix MDX syntax error in llm-proxy.md

* fix: apply default parameters from metadata.yaml automatically

  Fixed TemporalManager.run_workflow() to correctly apply default parameter values from workflow metadata.yaml files when parameters are not provided by the caller.

  Previous behavior:
  - When workflow_params was empty ({}), the condition `if workflow_params and 'parameters' in metadata` would fail
  - Parameters would not be extracted from the schema, so workflows received only target_id and no other parameters

  New behavior:
  - Removed the `workflow_params and` requirement from the condition
  - Now explicitly checks for defaults in the parameter spec
  - Applies defaults from metadata.yaml automatically when a parameter is not provided
  - Workflows receive all parameters with the proper fallback: provided value > metadata default > None

  This makes metadata.yaml the single source of truth for parameter defaults, removing the need for workflows to implement defensive default handling.

  Affected workflows:
  - llm_secret_detection (was failing with KeyError)
  - All other workflows now benefit from automatic default application

Co-authored-by: tduhamel42 <tduhamel@fuzzinglabs.com>
@@ -0,0 +1,179 @@
---
title: "Hot-Swap LiteLLM Models"
description: "Register OpenAI and Anthropic models with the bundled LiteLLM proxy and switch them on the task agent without downtime."
---

LiteLLM sits between the task agent and upstream providers, so every model change
is just an API call. This guide walks through registering OpenAI and Anthropic
models, updating the virtual key, and exercising the A2A hot-swap flow.

## Prerequisites

- `docker compose up llm-proxy llm-proxy-db task-agent`
- Provider secrets in `volumes/env/.env`:
  - `LITELLM_OPENAI_API_KEY`
  - `LITELLM_ANTHROPIC_API_KEY`
- Master key (`LITELLM_MASTER_KEY`) and task-agent virtual key (auto-generated
  during bootstrap)

> UI access uses `UI_USERNAME` / `UI_PASSWORD` (defaults: `fuzzforge` /
> `fuzzforge123`). Change them by exporting new values before running compose.
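
A quick sanity check that the expected entries are present before you begin:

```bash
# All four variables should print with non-empty values
grep -E '^(LITELLM_MASTER_KEY|LITELLM_OPENAI_API_KEY|LITELLM_ANTHROPIC_API_KEY|OPENAI_API_KEY)=' volumes/env/.env
```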

## Register Provider Models

Use the admin API to register the models the proxy should expose. The snippet
below creates aliases for OpenAI `gpt-5`, `gpt-5-mini`, and Anthropic
`claude-sonnet-4-5`.

```bash
export MASTER_KEY=$(awk -F= '$1=="LITELLM_MASTER_KEY"{print $2}' volumes/env/.env)
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
python - <<'PY'
import os, requests

master = os.environ['MASTER_KEY'].strip()
base = 'http://localhost:10999'
models = [
    {
        "model_name": "openai/gpt-5",
        "litellm_params": {
            "model": "gpt-5",
            "custom_llm_provider": "openai",
            "api_key": "os.environ/LITELLM_OPENAI_API_KEY"
        },
        "model_info": {
            "provider": "openai",
            "description": "OpenAI GPT-5"
        }
    },
    {
        "model_name": "openai/gpt-5-mini",
        "litellm_params": {
            "model": "gpt-5-mini",
            "custom_llm_provider": "openai",
            "api_key": "os.environ/LITELLM_OPENAI_API_KEY"
        },
        "model_info": {
            "provider": "openai",
            "description": "OpenAI GPT-5 mini"
        }
    },
    {
        "model_name": "anthropic/claude-sonnet-4-5",
        "litellm_params": {
            "model": "claude-sonnet-4-5",
            "custom_llm_provider": "anthropic",
            "api_key": "os.environ/LITELLM_ANTHROPIC_API_KEY"
        },
        "model_info": {
            "provider": "anthropic",
            "description": "Anthropic Claude Sonnet 4.5"
        }
    }
]

# Register each alias; 409 (already registered) is tolerated so the script can be re-run
for payload in models:
    resp = requests.post(
        f"{base}/model/new",
        headers={"Authorization": f"Bearer {master}", "Content-Type": "application/json"},
        json=payload,
        timeout=60,
    )
    if resp.status_code not in (200, 201, 409):
        raise SystemExit(f"Failed to register {payload['model_name']}: {resp.status_code} {resp.text}")
    print(payload['model_name'], '=>', resp.status_code)
PY
```

Each entry stores the upstream secret by reference (`os.environ/...`) so the
raw API key never leaves the container environment.
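
To double-check the registrations, you can list what the proxy now knows about
(this reuses the `MASTER_KEY` exported above and LiteLLM's admin `/model/info`
route, which returns the registered models):

```bash
# List registered models and pretty-print the JSON response
curl -s http://localhost:10999/model/info \
  -H "Authorization: Bearer $MASTER_KEY" | python -m json.tool
```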

## Relax Virtual Key Model Restrictions

Let the agent key call every model on the proxy:

```bash
export MASTER_KEY=$(awk -F= '$1=="LITELLM_MASTER_KEY"{print $2}' volumes/env/.env)
export VK=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
python - <<'PY'
import os, requests, json

# An empty "models" list removes the per-key model restriction
resp = requests.post(
    'http://localhost:10999/key/update',
    headers={
        'Authorization': f"Bearer {os.environ['MASTER_KEY'].strip()}",
        'Content-Type': 'application/json'
    },
    json={'key': os.environ['VK'].strip(), 'models': []},
    timeout=60,
)
print(json.dumps(resp.json(), indent=2))
PY
```

Restart the task agent so it sees the refreshed key:

```bash
docker compose restart task-agent
```
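
One way to confirm the key can now see every registered alias is to list models
through the proxy's OpenAI-compatible `/v1/models` route with the virtual key:

```bash
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
curl -s http://localhost:10999/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | python -m json.tool
```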

## Hot-Swap With The A2A Helper

Switch models without restarting the service:

```bash
# Ensure the CLI reads the latest virtual key
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)

# OpenAI gpt-5 alias
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model openai gpt-5 \
  --context switch-demo

# Confirm the response comes from the new model
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --message "Which model am I using?" \
  --context switch-demo

# Swap to gpt-5-mini
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model openai gpt-5-mini \
  --context switch-demo

# Swap to Anthropic Claude Sonnet 4.5
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model anthropic claude-sonnet-4-5 \
  --context switch-demo
```

> Each invocation reuses the same conversation context (`switch-demo`) so you
> can confirm the active provider by asking follow-up questions.

## Resetting The Proxy (Optional)

To wipe the LiteLLM state and rerun bootstrap:

```bash
docker compose down llm-proxy llm-proxy-db llm-proxy-bootstrap

docker volume rm fuzzforge_litellm_proxy_data fuzzforge_litellm_proxy_db

docker compose up -d llm-proxy-db llm-proxy
```

After the proxy is healthy, rerun the registration script and key update. The
bootstrap container mirrors secrets into `.env.litellm` and reissues the task
agent key automatically.
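
A quick way to confirm the proxy is healthy again before rerunning them:

```bash
curl http://localhost:10999/health/liveliness
```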

## How The Pieces Fit Together

1. **LiteLLM Proxy** exposes OpenAI-compatible routes and stores provider
   metadata in Postgres.
2. **Bootstrap Container** waits for `/health/liveliness`, mirrors secrets into
   `.env.litellm`, registers any models you script, and keeps the virtual key in
   sync with the discovered model list.
3. **Task Agent** calls the proxy via `FF_LLM_PROXY_BASE_URL`. The hot-swap tool
   updates the agent's runtime configuration, so switching providers is just a
   control message.
4. **Virtual Keys** carry quotas and allowed models. Setting the `models` array
   to `[]` lets the key use anything registered on the proxy.

Keep the master key and generated virtual keys somewhere safe—they grant full
admin and agent access respectively. When you add a new provider (e.g., Ollama),
just register the model via `/model/new`, update the key if needed, and repeat
the hot-swap steps.

@@ -0,0 +1,194 @@
---
title: "Run the LLM Proxy"
description: "Run the LiteLLM gateway that ships with FuzzForge and connect it to the task agent."
---

## Overview

FuzzForge routes every LLM request through a LiteLLM proxy so that usage can be
metered, priced, and rate limited per user. Docker Compose starts the proxy in a
hardened container, while a bootstrap job seeds upstream provider secrets and
issues a virtual key for the task agent automatically.

LiteLLM exposes the OpenAI-compatible APIs (`/v1/*`) plus a rich admin UI. All
traffic stays on your network and upstream credentials never leave the proxy
container.

## Before You Start

1. Copy `volumes/env/.env.example` to `volumes/env/.env` and set the basics:
   - `LITELLM_MASTER_KEY` — admin token used to manage the proxy
   - `LITELLM_SALT_KEY` — random string used to encrypt provider credentials
   - Provider secrets under `LITELLM_<PROVIDER>_API_KEY` (for example
     `LITELLM_OPENAI_API_KEY`)
   - Leave `OPENAI_API_KEY=sk-proxy-default`; the bootstrap job replaces it with
     a LiteLLM-issued virtual key
2. When running tools outside Docker, change `FF_LLM_PROXY_BASE_URL` to the
   published host port (`http://localhost:10999`). Inside Docker the default
   value `http://llm-proxy:4000` already resolves to the container.
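
Putting it together, a minimal `volumes/env/.env` might look like this
(placeholder values shown; substitute your own secrets):

```bash
LITELLM_MASTER_KEY=sk-change-me-master
LITELLM_SALT_KEY=change-me-random-salt
LITELLM_OPENAI_API_KEY=sk-your-openai-key
OPENAI_API_KEY=sk-proxy-default              # replaced by bootstrap with a virtual key
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000  # use http://localhost:10999 outside Docker
```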

## Start the Proxy

```bash
docker compose up llm-proxy
```

The service publishes two things:

- HTTP API + admin UI on `http://localhost:10999`
- Persistent SQLite state inside the named volume
  `fuzzforge_litellm_proxy_data`

The UI login uses the `UI_USERNAME` / `UI_PASSWORD` pair (defaults to
`fuzzforge` / `fuzzforge123`). To change them, set the environment variables
before you run `docker compose up`:

```bash
export UI_USERNAME=myadmin
export UI_PASSWORD=super-secret
docker compose up llm-proxy
```

You can also edit the values directly in `docker-compose.yml` if you prefer to
check them into a different secrets manager.

Proxy-wide settings now live in `volumes/litellm/proxy_config.yaml`. By
default it enables `store_model_in_db` and `store_prompts_in_spend_logs`, which
lets the UI display request/response payloads for new calls. Update this file
if you need additional LiteLLM options and restart the `llm-proxy` container.

LiteLLM's health endpoint lives at `/health/liveliness`. You can verify it from
another terminal:

```bash
curl http://localhost:10999/health/liveliness
```

## What the Bootstrapper Does

During startup the `llm-proxy-bootstrap` container performs three actions:

1. **Wait for the proxy** — Blocks until `/health/liveliness` becomes healthy.
2. **Mirror provider secrets** — Reads `volumes/env/.env` and writes any
   `LITELLM_*_API_KEY` values into `volumes/env/.env.litellm`. The file is
   created automatically on first boot; if you delete it, bootstrap will
   recreate it and the proxy continues to read secrets from `.env`.
3. **Issue the default virtual key** — Calls `/key/generate` with the master key
   and persists the generated token back into `volumes/env/.env` (replacing the
   `sk-proxy-default` placeholder). The key is scoped to
   `LITELLM_DEFAULT_MODELS` when that variable is set; otherwise it uses the
   model from `LITELLM_MODEL`.

The sequence is idempotent. Existing provider secrets and virtual keys are
reused on subsequent runs, and the allowed-model list is refreshed via
`/key/update` if you change the defaults.
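
To watch the bootstrap run and confirm the placeholder key was replaced:

```bash
docker compose logs -f llm-proxy-bootstrap

# Once bootstrap finishes, the virtual key should have replaced sk-proxy-default
grep '^OPENAI_API_KEY=' volumes/env/.env
```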

## Managing Virtual Keys

LiteLLM keys act as per-user credentials. The default key, named
`task-agent default`, is created automatically for the task agent. You can issue
more keys for teammates or CI jobs with the same management API:

```bash
curl http://localhost:10999/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "key_alias": "demo-user",
        "user_id": "demo",
        "models": ["openai/gpt-4o-mini"],
        "duration": "30d",
        "max_budget": 50,
        "metadata": {"team": "sandbox"}
      }'
```

Use `/key/update` to adjust budgets or the allowed-model list on existing keys:

```bash
curl http://localhost:10999/key/update \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "key": "sk-...",
        "models": ["openai/*", "anthropic/*"],
        "max_budget": 100
      }'
```
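
To inspect an existing key's spend, budget, and allowed models, query
`/key/info` (a sketch; the key of interest is passed as a query parameter):

```bash
curl "http://localhost:10999/key/info?key=sk-..." \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"
```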

The admin UI (navigate to `http://localhost:10999/ui`) provides equivalent
controls for creating keys, routing models, auditing spend, and exporting logs.

## Wiring the Task Agent

The task agent already expects to talk to the proxy. Confirm these values in
`volumes/env/.env` before launching the stack:

```bash
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000   # or http://localhost:10999 when outside Docker
OPENAI_API_KEY=<virtual key created by bootstrap>
LITELLM_MODEL=openai/gpt-5
LITELLM_PROVIDER=openai
```

Restart the agent container after changing environment variables so the process
picks up the updates.
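
For example:

```bash
docker compose restart task-agent
```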

To validate the integration end to end, call the proxy directly:

```bash
curl -X POST http://localhost:10999/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Proxy health check"}]
      }'
```

A JSON response indicates the proxy can reach your upstream provider using the
mirrored secrets.

## Local Runtimes (Ollama, etc.)

LiteLLM supports non-hosted providers as well. To route requests to a local
runtime such as Ollama:

1. Set the appropriate provider key in the env file
   (for Ollama, point LiteLLM at `OLLAMA_API_BASE` inside the container).
2. Add the passthrough model either from the UI (**Models → Add Model**) or
   by calling `/model/new` with the master key.
3. Update `LITELLM_DEFAULT_MODELS` (and regenerate the virtual key if you want
   the default key to include it).
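
As an illustration of step 2, registering a local Ollama model via `/model/new`
might look like this (a sketch: the model name and `api_base` are assumptions
for a default Ollama install reachable from the proxy container):

```bash
curl http://localhost:10999/model/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model_name": "ollama/llama3",
        "litellm_params": {
          "model": "ollama/llama3",
          "api_base": "http://host.docker.internal:11434"
        },
        "model_info": {"provider": "ollama", "description": "Local Llama 3 via Ollama"}
      }'
```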

The task agent keeps using the same OpenAI-compatible surface while LiteLLM
handles the translation to your runtime.

## Next Steps

- Explore [LiteLLM's documentation](https://docs.litellm.ai/docs/simple_proxy)
  for advanced routing, cost controls, and observability hooks.
- Configure Slack/Prometheus integrations from the UI to monitor usage.
- Rotate the master key periodically and store it in your secrets manager, as it
  grants full admin access to the proxy.

## Observability

LiteLLM ships with OpenTelemetry hooks for traces and metrics. This repository
already includes an OTLP collector (`otel-collector` service) and mounts a
default configuration that forwards traces to standard output. To wire it up:

1. Edit `volumes/otel/collector-config.yaml` if you want to forward to Jaeger,
   Datadog, etc. The initial config uses the logging exporter so you can see
   spans immediately via `docker compose logs -f otel-collector`.
2. Customize `volumes/litellm/proxy_config.yaml` if you need additional
   callbacks; `general_settings.otel: true` and `litellm_settings.callbacks: ["otel"]`
   are already present, so no extra code changes are required.
3. (Optional) Override `OTEL_EXPORTER_OTLP_*` environment variables in
   `docker-compose.yml` or your shell to point at a remote collector.
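
For step 3, the standard OTLP variables apply; for example, to point at a remote
collector before bringing the proxy up (the endpoint shown is illustrative):

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://my-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
docker compose up -d llm-proxy
```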

After updating the configs, run `docker compose up -d otel-collector llm-proxy`
and generate a request (for example, trigger `ff workflow run llm_analysis`).
New traces will show up in the collector logs or whichever backend you
configured. See the official LiteLLM guide for advanced exporter options:
<https://docs.litellm.ai/docs/observability/opentelemetry_integration>.