Feature/litellm proxy (#27)

* feat: seed governance config and responses routing

* Add env-configurable timeout for proxy providers

* Integrate LiteLLM OTEL collector and update docs

* Make .env.litellm optional for LiteLLM proxy

* Add LiteLLM proxy integration with model-agnostic virtual keys

Changes:
- Bootstrap generates 3 virtual keys with individual budgets (CLI: $100, Task-Agent: $25, Cognee: $50)
- Task-agent loads config at runtime via entrypoint script to wait for bootstrap completion
- All keys are model-agnostic by default (no LITELLM_DEFAULT_MODELS restrictions)
- Bootstrap handles database/env mismatch after docker prune by deleting stale aliases
- CLI and Cognee configured to use LiteLLM proxy with virtual keys
- Added comprehensive documentation in volumes/env/README.md

Technical details:
- task-agent entrypoint waits for keys in the .env file before starting uvicorn (sketched below)
- Bootstrap creates/updates TASK_AGENT_API_KEY, COGNEE_API_KEY, and OPENAI_API_KEY
- Removed hardcoded API keys from docker-compose.yml
- All services route through http://localhost:10999 proxy
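
A sketch of that wait-and-exec behavior, shown in Python for illustration (the env-file path, key name, and uvicorn target are assumptions; the real entrypoint may be a shell script):

```python
import os
import time

ENV_FILE = "/app/volumes/env/.env"   # assumed mount point
REQUIRED_KEY = "TASK_AGENT_API_KEY"

def key_present() -> bool:
    """True once bootstrap has written a non-empty TASK_AGENT_API_KEY."""
    if not os.path.isfile(ENV_FILE):
        return False
    with open(ENV_FILE) as fh:
        for line in fh:
            name, _, value = line.strip().partition("=")
            if name == REQUIRED_KEY and value:
                return True
    return False

# Block until bootstrap finishes, then replace this process with uvicorn.
while not key_present():
    time.sleep(2)
os.execvp("uvicorn", ["uvicorn", "main:app", "--host", "0.0.0.0"])  # app target assumed
```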

* Fix CLI not loading virtual keys from global .env

Project .env files with empty OPENAI_API_KEY values were overriding
the global virtual keys. Updated _load_env_file_if_exists to only
override with non-empty values.
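
A minimal sketch of the corrected helper, assuming a simple KEY=VALUE parser (the real CLI function may differ in details):

```python
import os

def _load_env_file_if_exists(path: str) -> None:
    """Load KEY=VALUE pairs without clobbering variables with empty values."""
    if not os.path.isfile(path):
        return
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = (part.strip() for part in line.split("=", 1))
            # Only override when the project file provides a non-empty value,
            # so an empty OPENAI_API_KEY= no longer masks the global virtual key.
            if value:
                os.environ[key] = value
```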

* Fix agent executor not passing API key to LiteLLM

The agent was initializing LiteLlm without api_key or api_base,
causing authentication errors when using the LiteLLM proxy. It now
reads the OPENAI_API_KEY/LLM_API_KEY and LLM_ENDPOINT environment
variables and passes them to the LiteLlm constructor.
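
A sketch of the fix, assuming the agent uses ADK's `LiteLlm` wrapper (model name and defaults illustrative):

```python
import os

from google.adk.models.lite_llm import LiteLlm

# Prefer the proxy virtual key, then any provider-specific override.
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("LLM_API_KEY")
api_base = os.getenv("LLM_ENDPOINT")  # e.g. http://llm-proxy:4000

# Passing credentials explicitly is what fixes the auth errors: without
# api_key/api_base the underlying litellm call reached the proxy unauthenticated.
model = LiteLlm(
    model=os.getenv("LITELLM_MODEL", "openai/gpt-5"),
    api_key=api_key,
    api_base=api_base,
)
```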

* Auto-populate project .env with virtual key from global config

When running 'ff init', the command now checks for a global
volumes/env/.env file and automatically uses the OPENAI_API_KEY
virtual key if found. This ensures projects work with LiteLLM
proxy out of the box without manual key configuration.
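
A hedged sketch of the lookup `ff init` performs (helper name and parsing are assumptions):

```python
from pathlib import Path

def _virtual_key_from_global_env(repo_root: Path) -> str | None:
    """Return OPENAI_API_KEY from volumes/env/.env when a non-empty key is set."""
    global_env = repo_root / "volumes" / "env" / ".env"
    if not global_env.is_file():
        return None
    for line in global_env.read_text().splitlines():
        if line.startswith("OPENAI_API_KEY="):
            value = line.split("=", 1)[1].strip()
            if value:
                return value
    return None

# ff init writes the discovered key into the new project's .env so the
# project talks to the LiteLLM proxy without manual key configuration.
```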

* docs: Update README with LiteLLM configuration instructions

Add a note about LITELLM_GEMINI_API_KEY configuration and clarify that the OPENAI_API_KEY default value should not be changed, as it is used for the LLM proxy.

* Refactor workflow parameters to use JSON Schema defaults

Consolidates parameter defaults into JSON Schema format, removing the separate default_parameters field. Adds extract_defaults_from_json_schema() helper to extract defaults from the standard schema structure. Updates LiteLLM proxy config to use LITELLM_OPENAI_API_KEY environment variable.
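
A sketch of the helper named above, assuming defaults live under the standard `properties` keyword:

```python
def extract_defaults_from_json_schema(schema: dict) -> dict:
    """Collect {parameter: default} from a JSON Schema's properties."""
    defaults = {}
    for name, spec in (schema or {}).get("properties", {}).items():
        if isinstance(spec, dict) and "default" in spec:
            defaults[name] = spec["default"]
    return defaults

# extract_defaults_from_json_schema({"properties": {"depth": {"default": 3}}})
# -> {"depth": 3}
```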

* Remove .env.example from task_agent

* Fix MDX syntax error in llm-proxy.md

* fix: apply default parameters from metadata.yaml automatically

Fixed TemporalManager.run_workflow() to correctly apply default parameter
values from workflow metadata.yaml files when parameters are not provided
by the caller.

Previous behavior:
- When workflow_params was empty {}, the condition
  `if workflow_params and 'parameters' in metadata` would fail
- Parameters would not be extracted from schema, resulting in workflows
  receiving only target_id with no other parameters

New behavior:
- Removed the `workflow_params and` requirement from the condition
- Now explicitly checks for defaults in parameter spec
- Applies defaults from metadata.yaml automatically when param not provided
- Workflows receive all parameters with proper fallback:
  provided value > metadata default > None

This makes metadata.yaml the single source of truth for parameter defaults,
removing the need for workflows to implement defensive default handling.
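
A condensed sketch of the corrected merge (the shape of the metadata dict is assumed from the description above):

```python
def resolve_parameters(workflow_params: dict, metadata: dict) -> dict:
    """Fallback order: provided value > metadata default > None."""
    schema = metadata.get("parameters", {})
    resolved = {}
    # No longer gated on `workflow_params` being truthy, so an empty {}
    # still picks up every default declared in metadata.yaml.
    for name, spec in schema.get("properties", {}).items():
        if name in workflow_params:
            resolved[name] = workflow_params[name]
        elif isinstance(spec, dict) and "default" in spec:
            resolved[name] = spec["default"]
        else:
            resolved[name] = None
    return resolved
```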

Affected workflows:
- llm_secret_detection (was failing with KeyError)
- All other workflows now benefit from automatic default application

Co-authored-by: tduhamel42 <tduhamel@fuzzinglabs.com>
Songbird99 committed 2025-10-26 12:51:53 +01:00 (committed by GitHub)
parent 3b25edef19 · commit a2c760ea2b
29 changed files with 1869 additions and 106 deletions
@@ -0,0 +1,179 @@
---
title: "Hot-Swap LiteLLM Models"
description: "Register OpenAI and Anthropic models with the bundled LiteLLM proxy and switch them on the task agent without downtime."
---
LiteLLM sits between the task agent and upstream providers, so every model change
is just an API call. This guide walks through registering OpenAI and Anthropic
models, updating the virtual key, and exercising the A2A hot-swap flow.
## Prerequisites
- `docker compose up llm-proxy llm-proxy-db task-agent`
- Provider secrets in `volumes/env/.env`:
- `LITELLM_OPENAI_API_KEY`
- `LITELLM_ANTHROPIC_API_KEY`
- Master key (`LITELLM_MASTER_KEY`) and task-agent virtual key (auto-generated
during bootstrap)
> UI access uses `UI_USERNAME` / `UI_PASSWORD` (defaults: `fuzzforge` /
> `fuzzforge123`). Change them by exporting new values before running compose.
## Register Provider Models
Use the admin API to register the models the proxy should expose. The snippet
below creates aliases for OpenAI `gpt-5`, `gpt-5-mini`, and Anthropic
`claude-sonnet-4-5`.
```bash
export MASTER_KEY=$(awk -F= '$1=="LITELLM_MASTER_KEY"{print $2}' volumes/env/.env)
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
python - <<'PY'
import os

import requests

master = os.environ['MASTER_KEY'].strip()
base = 'http://localhost:10999'
models = [
    {
        "model_name": "openai/gpt-5",
        "litellm_params": {
            "model": "gpt-5",
            "custom_llm_provider": "openai",
            "api_key": "os.environ/LITELLM_OPENAI_API_KEY"
        },
        "model_info": {
            "provider": "openai",
            "description": "OpenAI GPT-5"
        }
    },
    {
        "model_name": "openai/gpt-5-mini",
        "litellm_params": {
            "model": "gpt-5-mini",
            "custom_llm_provider": "openai",
            "api_key": "os.environ/LITELLM_OPENAI_API_KEY"
        },
        "model_info": {
            "provider": "openai",
            "description": "OpenAI GPT-5 mini"
        }
    },
    {
        "model_name": "anthropic/claude-sonnet-4-5",
        "litellm_params": {
            "model": "claude-sonnet-4-5",
            "custom_llm_provider": "anthropic",
            "api_key": "os.environ/LITELLM_ANTHROPIC_API_KEY"
        },
        "model_info": {
            "provider": "anthropic",
            "description": "Anthropic Claude Sonnet 4.5"
        }
    }
]
for payload in models:
    resp = requests.post(
        f"{base}/model/new",
        headers={"Authorization": f"Bearer {master}", "Content-Type": "application/json"},
        json=payload,
        timeout=60,
    )
    # 409 means the alias already exists, which is fine on re-runs
    if resp.status_code not in (200, 201, 409):
        raise SystemExit(f"Failed to register {payload['model_name']}: {resp.status_code} {resp.text}")
    print(payload['model_name'], '=>', resp.status_code)
PY
```
Each entry stores the upstream secret by reference (`os.environ/...`) so the
raw API key never leaves the container environment.
## Relax Virtual Key Model Restrictions
Let the agent key call every model on the proxy:
```bash
export MASTER_KEY=$(awk -F= '$1=="LITELLM_MASTER_KEY"{print $2}' volumes/env/.env)
export VK=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
python - <<'PY'
import json
import os

import requests

resp = requests.post(
    'http://localhost:10999/key/update',
    headers={
        'Authorization': f"Bearer {os.environ['MASTER_KEY'].strip()}",
        'Content-Type': 'application/json'
    },
    # An empty models list removes every model restriction from the key
    json={'key': os.environ['VK'].strip(), 'models': []},
    timeout=60,
)
print(json.dumps(resp.json(), indent=2))
PY
```
Restart the task agent so it sees the refreshed key:
```bash
docker compose restart task-agent
```
## Hot-Swap With The A2A Helper
Switch models without restarting the service:
```bash
# Ensure the CLI reads the latest virtual key
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)

# OpenAI gpt-5 alias
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model openai gpt-5 \
  --context switch-demo

# Confirm the response comes from the new model
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --message "Which model am I using?" \
  --context switch-demo

# Swap to gpt-5-mini
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model openai gpt-5-mini \
  --context switch-demo

# Swap to Anthropic Claude Sonnet 4.5
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model anthropic claude-sonnet-4-5 \
  --context switch-demo
```
> Each invocation reuses the same conversation context (`switch-demo`) so you
> can confirm the active provider by asking follow-up questions.
## Resetting The Proxy (Optional)
To wipe the LiteLLM state and rerun bootstrap:
```bash
docker compose down llm-proxy llm-proxy-db llm-proxy-bootstrap
docker volume rm fuzzforge_litellm_proxy_data fuzzforge_litellm_proxy_db
docker compose up -d llm-proxy-db llm-proxy
```
After the proxy is healthy, rerun the registration script and key update. The
bootstrap container mirrors secrets into `.env.litellm` and reissues the task
agent key automatically.
## How The Pieces Fit Together
1. **LiteLLM Proxy** exposes OpenAI-compatible routes and stores provider
metadata in Postgres.
2. **Bootstrap Container** waits for `/health/liveliness`, mirrors secrets into
`.env.litellm`, registers any models you script, and keeps the virtual key in
sync with the discovered model list.
3. **Task Agent** calls the proxy via `FF_LLM_PROXY_BASE_URL`. The hot-swap tool
updates the agent's runtime configuration, so switching providers is just a
control message.
4. **Virtual Keys** carry quotas and allowed models. Setting the `models` array
to `[]` lets the key use anything registered on the proxy.
Keep the master key and generated virtual keys somewhere safe: they grant full
admin and agent access respectively. When you add a new provider (e.g., Ollama),
just register the model via `/model/new`, update the key if needed, and repeat
the hot-swap steps.
@@ -0,0 +1,194 @@
---
title: "Run the LLM Proxy"
description: "Run the LiteLLM gateway that ships with FuzzForge and connect it to the task agent."
---
## Overview
FuzzForge routes every LLM request through a LiteLLM proxy so that usage can be
metered, priced, and rate limited per user. Docker Compose starts the proxy in a
hardened container, while a bootstrap job seeds upstream provider secrets and
issues a virtual key for the task agent automatically.
LiteLLM exposes the OpenAI-compatible APIs (`/v1/*`) plus a rich admin UI. All
traffic stays on your network and upstream credentials never leave the proxy
container.
## Before You Start
1. Copy `volumes/env/.env.example` to `volumes/env/.env` and set the basics:
- `LITELLM_MASTER_KEY` — admin token used to manage the proxy
- `LITELLM_SALT_KEY` — random string used to encrypt provider credentials
- Provider secrets under `LITELLM_<PROVIDER>_API_KEY` (for example
`LITELLM_OPENAI_API_KEY`)
- Leave `OPENAI_API_KEY=sk-proxy-default`; the bootstrap job replaces it with a
LiteLLM-issued virtual key
2. When running tools outside Docker, change `FF_LLM_PROXY_BASE_URL` to the
published host port (`http://localhost:10999`). Inside Docker the default
value `http://llm-proxy:4000` already resolves to the container.
## Start the Proxy
```bash
docker compose up llm-proxy
```
The service publishes two things:
- HTTP API + admin UI on `http://localhost:10999`
- Persistent SQLite state inside the named volume
`fuzzforge_litellm_proxy_data`
The UI login uses the `UI_USERNAME` / `UI_PASSWORD` pair (defaults to
`fuzzforge` / `fuzzforge123`). To change them, set the environment variables
before you run `docker compose up`:
```bash
export UI_USERNAME=myadmin
export UI_PASSWORD=super-secret
docker compose up llm-proxy
```
You can also edit the values directly in `docker-compose.yml` if you prefer to
check them into a different secrets manager.
Proxy-wide settings now live in `volumes/litellm/proxy_config.yaml`. By
default it enables `store_model_in_db` and `store_prompts_in_spend_logs`, which
lets the UI display request/response payloads for new calls. Update this file
if you need additional LiteLLM options and restart the `llm-proxy` container.
LiteLLM's health endpoint lives at `/health/liveliness`. You can verify it from
another terminal:
```bash
curl http://localhost:10999/health/liveliness
```
## What the Bootstrapper Does
During startup the `llm-proxy-bootstrap` container performs three actions:
1. **Wait for the proxy** — Blocks until `/health/liveliness` becomes healthy.
2. **Mirror provider secrets** — Reads `volumes/env/.env` and writes any
`LITELLM_*_API_KEY` values into `volumes/env/.env.litellm`. The file is
created automatically on first boot; if you delete it, bootstrap recreates
it and the proxy continues to read secrets from `.env`.
3. **Issue the default virtual key** — Calls `/key/generate` with the master key
and persists the generated token back into `volumes/env/.env` (replacing the
`sk-proxy-default` placeholder). The key is scoped to
`LITELLM_DEFAULT_MODELS` when that variable is set; otherwise it uses the
model from `LITELLM_MODEL`.
The sequence is idempotent. Existing provider secrets and virtual keys are
reused on subsequent runs, and the allowed-model list is refreshed via
`/key/update` if you change the defaults.
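A sketch of that idempotent key step, using the management routes shown later in this guide (error handling trimmed, helper name hypothetical):
```python
import os

import requests

BASE = "http://localhost:10999"
HEADERS = {"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"}

def ensure_virtual_key(existing_key: str | None, models: list[str]) -> str:
    """Reuse a previously issued key; otherwise generate a fresh one."""
    if existing_key and existing_key != "sk-proxy-default":
        # Key already issued on an earlier boot: refresh the allowed models.
        requests.post(f"{BASE}/key/update", headers=HEADERS,
                      json={"key": existing_key, "models": models}, timeout=60)
        return existing_key
    resp = requests.post(f"{BASE}/key/generate", headers=HEADERS,
                         json={"key_alias": "task-agent default", "models": models},
                         timeout=60)
    resp.raise_for_status()
    return resp.json()["key"]  # persisted back into volumes/env/.env
```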
## Managing Virtual Keys
LiteLLM keys act as per-user credentials. The default key, named
`task-agent default`, is created automatically for the task agent. You can issue
more keys for teammates or CI jobs with the same management API:
```bash
curl http://localhost:10999/key/generate \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"key_alias": "demo-user",
"user_id": "demo",
"models": ["openai/gpt-4o-mini"],
"duration": "30d",
"max_budget": 50,
"metadata": {"team": "sandbox"}
}'
```
Use `/key/update` to adjust budgets or the allowed-model list on existing keys:
```bash
curl http://localhost:10999/key/update \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"key": "sk-...",
"models": ["openai/*", "anthropic/*"],
"max_budget": 100
}'
```
The admin UI (navigate to `http://localhost:10999/ui`) provides equivalent
controls for creating keys, routing models, auditing spend, and exporting logs.
## Wiring the Task Agent
The task agent already expects to talk to the proxy. Confirm these values in
`volumes/env/.env` before launching the stack:
```bash
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000 # or http://localhost:10999 when outside Docker
OPENAI_API_KEY=<virtual key created by bootstrap>
LITELLM_MODEL=openai/gpt-5
LITELLM_PROVIDER=openai
```
Restart the agent container after changing environment variables so the process
picks up the updates.
To validate the integration end to end, call the proxy directly:
```bash
curl -X POST http://localhost:10999/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Proxy health check"}]
}'
```
A JSON response indicates the proxy can reach your upstream provider using the
mirrored secrets.
## Local Runtimes (Ollama, etc.)
LiteLLM supports non-hosted providers as well. To route requests to a local
runtime such as Ollama:
1. Set the appropriate provider key in the env file
(for Ollama, point LiteLLM at `OLLAMA_API_BASE` inside the container).
2. Add the passthrough model either from the UI (**Models → Add Model**) or
by calling `/model/new` with the master key.
3. Update `LITELLM_DEFAULT_MODELS` (and regenerate the virtual key if you want
the default key to include it).
The task agent keeps using the same OpenAI-compatible surface while LiteLLM
handles the translation to your runtime.
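For example, a `/model/new` call for a local Ollama model might look like this (the alias and base URL are illustrative):
```python
import os

import requests

resp = requests.post(
    "http://localhost:10999/model/new",
    headers={"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"},
    json={
        "model_name": "ollama/llama3",  # alias clients will request
        "litellm_params": {
            "model": "ollama/llama3",   # LiteLLM's Ollama provider prefix
            "api_base": "http://host.docker.internal:11434",  # where Ollama listens
        },
        "model_info": {"provider": "ollama", "description": "Local Llama 3 via Ollama"},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```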
## Next Steps
- Explore [LiteLLM's documentation](https://docs.litellm.ai/docs/simple_proxy)
for advanced routing, cost controls, and observability hooks.
- Configure Slack/Prometheus integrations from the UI to monitor usage.
- Rotate the master key periodically and store it in your secrets manager, as it
grants full admin access to the proxy.
## Observability
LiteLLM ships with OpenTelemetry hooks for traces and metrics. This repository
already includes an OTLP collector (`otel-collector` service) and mounts a
default configuration that forwards traces to standard output. To wire it up:
1. Edit `volumes/otel/collector-config.yaml` if you want to forward to Jaeger,
Datadog, etc. The initial config uses the logging exporter so you can see
spans immediately via `docker compose logs -f otel-collector` (a minimal
sketch of this file follows the list).
2. Customize `volumes/litellm/proxy_config.yaml` if you need additional
callbacks; `general_settings.otel: true` and `litellm_settings.callbacks:
["otel"]` are already present, so no extra code changes are required.
3. (Optional) Override `OTEL_EXPORTER_OTLP_*` environment variables in
`docker-compose.yml` or your shell to point at a remote collector.
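For orientation, a minimal `collector-config.yaml` along these lines receives OTLP traffic and prints spans to stdout (the file shipped in the repo may differ):
```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  logging: {}   # stdout; swap in jaeger, datadog, otlp, ... to forward spans

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]
```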
After updating the configs, run `docker compose up -d otel-collector llm-proxy`
and generate a request (for example, trigger `ff workflow run llm_analysis`).
New traces will show up in the collector logs or whichever backend you
configured. See the official LiteLLM guide for advanced exporter options:
https://docs.litellm.ai/docs/observability/opentelemetry_integration.