Feature/litellm proxy (#27)
* feat: seed governance config and responses routing
* Add env-configurable timeout for proxy providers
* Integrate LiteLLM OTEL collector and update docs
* Make .env.litellm optional for LiteLLM proxy
* Add LiteLLM proxy integration with model-agnostic virtual keys

  Changes:
  - Bootstrap generates 3 virtual keys with individual budgets (CLI: $100, Task-Agent: $25, Cognee: $50)
  - Task-agent loads config at runtime via entrypoint script to wait for bootstrap completion
  - All keys are model-agnostic by default (no LITELLM_DEFAULT_MODELS restrictions)
  - Bootstrap handles database/env mismatch after docker prune by deleting stale aliases
  - CLI and Cognee configured to use LiteLLM proxy with virtual keys
  - Added comprehensive documentation in volumes/env/README.md

  Technical details:
  - task-agent entrypoint waits for keys in the .env file before starting uvicorn
  - Bootstrap creates/updates TASK_AGENT_API_KEY, COGNEE_API_KEY, and OPENAI_API_KEY
  - Removed hardcoded API keys from docker-compose.yml
  - All services route through the http://localhost:10999 proxy

* Fix CLI not loading virtual keys from global .env

  Project .env files with empty OPENAI_API_KEY values were overriding the global virtual keys. Updated _load_env_file_if_exists to only override with non-empty values.

* Fix agent executor not passing API key to LiteLLM

  The agent was initializing LiteLlm without api_key or api_base, causing authentication errors when using the LiteLLM proxy. It now reads the OPENAI_API_KEY/LLM_API_KEY and LLM_ENDPOINT environment variables and passes them to the LiteLlm constructor.

* Auto-populate project .env with virtual key from global config

  When running 'ff init', the command now checks for a global volumes/env/.env file and automatically uses the OPENAI_API_KEY virtual key if found. This ensures projects work with the LiteLLM proxy out of the box without manual key configuration.

* docs: Update README with LiteLLM configuration instructions

  Add a note about LITELLM_GEMINI_API_KEY configuration and clarify that the OPENAI_API_KEY default value should not be changed, as it is used for the LLM proxy.

* Refactor workflow parameters to use JSON Schema defaults

  Consolidates parameter defaults into JSON Schema format, removing the separate default_parameters field. Adds an extract_defaults_from_json_schema() helper to extract defaults from the standard schema structure. Updates the LiteLLM proxy config to use the LITELLM_OPENAI_API_KEY environment variable.

* Remove .env.example from task_agent

* Fix MDX syntax error in llm-proxy.md

* fix: apply default parameters from metadata.yaml automatically

  Fixed TemporalManager.run_workflow() to correctly apply default parameter values from workflow metadata.yaml files when parameters are not provided by the caller.

  Previous behavior:
  - When workflow_params was empty ({}), the condition `if workflow_params and 'parameters' in metadata` would fail
  - Parameters would not be extracted from the schema, so workflows received only target_id and no other parameters

  New behavior:
  - Removed the `workflow_params and` requirement from the condition
  - Now explicitly checks for defaults in the parameter spec
  - Applies defaults from metadata.yaml automatically when a parameter is not provided
  - Workflows receive all parameters with the proper fallback: provided value > metadata default > None

  This makes metadata.yaml the single source of truth for parameter defaults, removing the need for workflows to implement defensive default handling.

  Affected workflows:
  - llm_secret_detection (was failing with KeyError)
  - All other workflows now benefit from automatic default application

Co-authored-by: tduhamel42 <tduhamel@fuzzinglabs.com>
@@ -0,0 +1,179 @@
---
title: "Hot-Swap LiteLLM Models"
description: "Register OpenAI and Anthropic models with the bundled LiteLLM proxy and switch them on the task agent without downtime."
---

LiteLLM sits between the task agent and upstream providers, so every model change
is just an API call. This guide walks through registering OpenAI and Anthropic
models, updating the virtual key, and exercising the A2A hot-swap flow.

## Prerequisites

- `docker compose up llm-proxy llm-proxy-db task-agent`
- Provider secrets in `volumes/env/.env`:
  - `LITELLM_OPENAI_API_KEY`
  - `LITELLM_ANTHROPIC_API_KEY`
- Master key (`LITELLM_MASTER_KEY`) and task-agent virtual key (auto-generated
  during bootstrap)

> UI access uses `UI_USERNAME` / `UI_PASSWORD` (defaults: `fuzzforge` /
> `fuzzforge123`). Change them by exporting new values before running compose.
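
A quick sanity check that the expected entries are present before you begin:

```bash
# All four variables should print with non-empty values
grep -E '^(LITELLM_MASTER_KEY|LITELLM_OPENAI_API_KEY|LITELLM_ANTHROPIC_API_KEY|OPENAI_API_KEY)=' volumes/env/.env
```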

## Register Provider Models

Use the admin API to register the models the proxy should expose. The snippet
below creates aliases for OpenAI `gpt-5`, `gpt-5-mini`, and Anthropic
`claude-sonnet-4-5`.

```bash
export MASTER_KEY=$(awk -F= '$1=="LITELLM_MASTER_KEY"{print $2}' volumes/env/.env)
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
python - <<'PY'
import os, requests

master = os.environ['MASTER_KEY'].strip()
base = 'http://localhost:10999'
models = [
    {
        "model_name": "openai/gpt-5",
        "litellm_params": {
            "model": "gpt-5",
            "custom_llm_provider": "openai",
            "api_key": "os.environ/LITELLM_OPENAI_API_KEY"
        },
        "model_info": {
            "provider": "openai",
            "description": "OpenAI GPT-5"
        }
    },
    {
        "model_name": "openai/gpt-5-mini",
        "litellm_params": {
            "model": "gpt-5-mini",
            "custom_llm_provider": "openai",
            "api_key": "os.environ/LITELLM_OPENAI_API_KEY"
        },
        "model_info": {
            "provider": "openai",
            "description": "OpenAI GPT-5 mini"
        }
    },
    {
        "model_name": "anthropic/claude-sonnet-4-5",
        "litellm_params": {
            "model": "claude-sonnet-4-5",
            "custom_llm_provider": "anthropic",
            "api_key": "os.environ/LITELLM_ANTHROPIC_API_KEY"
        },
        "model_info": {
            "provider": "anthropic",
            "description": "Anthropic Claude Sonnet 4.5"
        }
    }
]

# Register each alias; 409 (already registered) is tolerated so the script can be re-run
for payload in models:
    resp = requests.post(
        f"{base}/model/new",
        headers={"Authorization": f"Bearer {master}", "Content-Type": "application/json"},
        json=payload,
        timeout=60,
    )
    if resp.status_code not in (200, 201, 409):
        raise SystemExit(f"Failed to register {payload['model_name']}: {resp.status_code} {resp.text}")
    print(payload['model_name'], '=>', resp.status_code)
PY
```

Each entry stores the upstream secret by reference (`os.environ/...`) so the
raw API key never leaves the container environment.
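
To double-check the registrations, you can list what the proxy now knows about
(this reuses the `MASTER_KEY` exported above and LiteLLM's admin `/model/info`
route, which returns the registered models):

```bash
# List registered models and pretty-print the JSON response
curl -s http://localhost:10999/model/info \
  -H "Authorization: Bearer $MASTER_KEY" | python -m json.tool
```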

## Relax Virtual Key Model Restrictions

Let the agent key call every model on the proxy:

```bash
export MASTER_KEY=$(awk -F= '$1=="LITELLM_MASTER_KEY"{print $2}' volumes/env/.env)
export VK=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
python - <<'PY'
import os, requests, json

# An empty "models" list removes the per-key model restriction
resp = requests.post(
    'http://localhost:10999/key/update',
    headers={
        'Authorization': f"Bearer {os.environ['MASTER_KEY'].strip()}",
        'Content-Type': 'application/json'
    },
    json={'key': os.environ['VK'].strip(), 'models': []},
    timeout=60,
)
print(json.dumps(resp.json(), indent=2))
PY
```

Restart the task agent so it sees the refreshed key:

```bash
docker compose restart task-agent
```
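
One way to confirm the key can now see every registered alias is to list models
through the proxy's OpenAI-compatible `/v1/models` route with the virtual key:

```bash
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)
curl -s http://localhost:10999/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | python -m json.tool
```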

## Hot-Swap With The A2A Helper

Switch models without restarting the service:

```bash
# Ensure the CLI reads the latest virtual key
export OPENAI_API_KEY=$(awk -F= '$1=="OPENAI_API_KEY"{print $2}' volumes/env/.env)

# OpenAI gpt-5 alias
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model openai gpt-5 \
  --context switch-demo

# Confirm the response comes from the new model
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --message "Which model am I using?" \
  --context switch-demo

# Swap to gpt-5-mini
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model openai gpt-5-mini \
  --context switch-demo

# Swap to Anthropic Claude Sonnet 4.5
python ai/agents/task_agent/a2a_hot_swap.py \
  --url http://localhost:10900/a2a/litellm_agent \
  --model anthropic claude-sonnet-4-5 \
  --context switch-demo
```

> Each invocation reuses the same conversation context (`switch-demo`) so you
> can confirm the active provider by asking follow-up questions.

## Resetting The Proxy (Optional)

To wipe the LiteLLM state and rerun bootstrap:

```bash
docker compose down llm-proxy llm-proxy-db llm-proxy-bootstrap

docker volume rm fuzzforge_litellm_proxy_data fuzzforge_litellm_proxy_db

docker compose up -d llm-proxy-db llm-proxy
```

After the proxy is healthy, rerun the registration script and key update. The
bootstrap container mirrors secrets into `.env.litellm` and reissues the task
agent key automatically.
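
A quick way to confirm the proxy is healthy again before rerunning them:

```bash
curl http://localhost:10999/health/liveliness
```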

## How The Pieces Fit Together

1. **LiteLLM Proxy** exposes OpenAI-compatible routes and stores provider
   metadata in Postgres.
2. **Bootstrap Container** waits for `/health/liveliness`, mirrors secrets into
   `.env.litellm`, registers any models you script, and keeps the virtual key in
   sync with the discovered model list.
3. **Task Agent** calls the proxy via `FF_LLM_PROXY_BASE_URL`. The hot-swap tool
   updates the agent's runtime configuration, so switching providers is just a
   control message.
4. **Virtual Keys** carry quotas and allowed models. Setting the `models` array
   to `[]` lets the key use anything registered on the proxy.

Keep the master key and generated virtual keys somewhere safe—they grant full
admin and agent access respectively. When you add a new provider (e.g., Ollama),
just register the model via `/model/new`, update the key if needed, and repeat
the hot-swap steps.

@@ -0,0 +1,194 @@
---
title: "Run the LLM Proxy"
description: "Run the LiteLLM gateway that ships with FuzzForge and connect it to the task agent."
---

## Overview

FuzzForge routes every LLM request through a LiteLLM proxy so that usage can be
metered, priced, and rate limited per user. Docker Compose starts the proxy in a
hardened container, while a bootstrap job seeds upstream provider secrets and
issues a virtual key for the task agent automatically.

LiteLLM exposes the OpenAI-compatible APIs (`/v1/*`) plus a rich admin UI. All
traffic stays on your network and upstream credentials never leave the proxy
container.

## Before You Start

1. Copy `volumes/env/.env.example` to `volumes/env/.env` and set the basics:
   - `LITELLM_MASTER_KEY` — admin token used to manage the proxy
   - `LITELLM_SALT_KEY` — random string used to encrypt provider credentials
   - Provider secrets under `LITELLM_<PROVIDER>_API_KEY` (for example
     `LITELLM_OPENAI_API_KEY`)
   - Leave `OPENAI_API_KEY=sk-proxy-default`; the bootstrap job replaces it with
     a LiteLLM-issued virtual key
2. When running tools outside Docker, change `FF_LLM_PROXY_BASE_URL` to the
   published host port (`http://localhost:10999`). Inside Docker the default
   value `http://llm-proxy:4000` already resolves to the container.
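
Putting it together, a minimal `volumes/env/.env` might look like this
(placeholder values shown; substitute your own secrets):

```bash
LITELLM_MASTER_KEY=sk-change-me-master
LITELLM_SALT_KEY=change-me-random-salt
LITELLM_OPENAI_API_KEY=sk-your-openai-key
OPENAI_API_KEY=sk-proxy-default              # replaced by bootstrap with a virtual key
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000  # use http://localhost:10999 outside Docker
```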

## Start the Proxy

```bash
docker compose up llm-proxy
```

The service publishes two things:

- HTTP API + admin UI on `http://localhost:10999`
- Persistent SQLite state inside the named volume
  `fuzzforge_litellm_proxy_data`

The UI login uses the `UI_USERNAME` / `UI_PASSWORD` pair (defaults to
`fuzzforge` / `fuzzforge123`). To change them, set the environment variables
before you run `docker compose up`:

```bash
export UI_USERNAME=myadmin
export UI_PASSWORD=super-secret
docker compose up llm-proxy
```

You can also edit the values directly in `docker-compose.yml` if you prefer to
check them into a different secrets manager.

Proxy-wide settings now live in `volumes/litellm/proxy_config.yaml`. By
default it enables `store_model_in_db` and `store_prompts_in_spend_logs`, which
lets the UI display request/response payloads for new calls. Update this file
if you need additional LiteLLM options and restart the `llm-proxy` container.

LiteLLM's health endpoint lives at `/health/liveliness`. You can verify it from
another terminal:

```bash
curl http://localhost:10999/health/liveliness
```

## What the Bootstrapper Does

During startup the `llm-proxy-bootstrap` container performs three actions:

1. **Wait for the proxy** — Blocks until `/health/liveliness` becomes healthy.
2. **Mirror provider secrets** — Reads `volumes/env/.env` and writes any
   `LITELLM_*_API_KEY` values into `volumes/env/.env.litellm`. The file is
   created automatically on first boot; if you delete it, bootstrap will
   recreate it and the proxy continues to read secrets from `.env`.
3. **Issue the default virtual key** — Calls `/key/generate` with the master key
   and persists the generated token back into `volumes/env/.env` (replacing the
   `sk-proxy-default` placeholder). The key is scoped to
   `LITELLM_DEFAULT_MODELS` when that variable is set; otherwise it uses the
   model from `LITELLM_MODEL`.

The sequence is idempotent. Existing provider secrets and virtual keys are
reused on subsequent runs, and the allowed-model list is refreshed via
`/key/update` if you change the defaults.
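
To watch the bootstrap run and confirm the placeholder key was replaced:

```bash
docker compose logs -f llm-proxy-bootstrap

# Once bootstrap finishes, the virtual key should have replaced sk-proxy-default
grep '^OPENAI_API_KEY=' volumes/env/.env
```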

## Managing Virtual Keys

LiteLLM keys act as per-user credentials. The default key, named
`task-agent default`, is created automatically for the task agent. You can issue
more keys for teammates or CI jobs with the same management API:

```bash
curl http://localhost:10999/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "key_alias": "demo-user",
        "user_id": "demo",
        "models": ["openai/gpt-4o-mini"],
        "duration": "30d",
        "max_budget": 50,
        "metadata": {"team": "sandbox"}
      }'
```

Use `/key/update` to adjust budgets or the allowed-model list on existing keys:

```bash
curl http://localhost:10999/key/update \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "key": "sk-...",
        "models": ["openai/*", "anthropic/*"],
        "max_budget": 100
      }'
```
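
To inspect an existing key's spend, budget, and allowed models, query
`/key/info` (a sketch; the key of interest is passed as a query parameter):

```bash
curl "http://localhost:10999/key/info?key=sk-..." \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"
```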

The admin UI (navigate to `http://localhost:10999/ui`) provides equivalent
controls for creating keys, routing models, auditing spend, and exporting logs.

## Wiring the Task Agent

The task agent already expects to talk to the proxy. Confirm these values in
`volumes/env/.env` before launching the stack:

```bash
FF_LLM_PROXY_BASE_URL=http://llm-proxy:4000   # or http://localhost:10999 when outside Docker
OPENAI_API_KEY=<virtual key created by bootstrap>
LITELLM_MODEL=openai/gpt-5
LITELLM_PROVIDER=openai
```

Restart the agent container after changing environment variables so the process
picks up the updates.
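
For example:

```bash
docker compose restart task-agent
```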

To validate the integration end to end, call the proxy directly:

```bash
curl -X POST http://localhost:10999/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Proxy health check"}]
      }'
```

A JSON response indicates the proxy can reach your upstream provider using the
mirrored secrets.

## Local Runtimes (Ollama, etc.)

LiteLLM supports non-hosted providers as well. To route requests to a local
runtime such as Ollama:

1. Set the appropriate provider key in the env file
   (for Ollama, point LiteLLM at `OLLAMA_API_BASE` inside the container).
2. Add the passthrough model either from the UI (**Models → Add Model**) or
   by calling `/model/new` with the master key.
3. Update `LITELLM_DEFAULT_MODELS` (and regenerate the virtual key if you want
   the default key to include it).
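
As an illustration of step 2, registering a local Ollama model via `/model/new`
might look like this (a sketch: the model name and `api_base` are assumptions
for a default Ollama install reachable from the proxy container):

```bash
curl http://localhost:10999/model/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model_name": "ollama/llama3",
        "litellm_params": {
          "model": "ollama/llama3",
          "api_base": "http://host.docker.internal:11434"
        },
        "model_info": {"provider": "ollama", "description": "Local Llama 3 via Ollama"}
      }'
```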

The task agent keeps using the same OpenAI-compatible surface while LiteLLM
handles the translation to your runtime.

## Next Steps

- Explore [LiteLLM's documentation](https://docs.litellm.ai/docs/simple_proxy)
  for advanced routing, cost controls, and observability hooks.
- Configure Slack/Prometheus integrations from the UI to monitor usage.
- Rotate the master key periodically and store it in your secrets manager, as it
  grants full admin access to the proxy.

## Observability

LiteLLM ships with OpenTelemetry hooks for traces and metrics. This repository
already includes an OTLP collector (`otel-collector` service) and mounts a
default configuration that forwards traces to standard output. To wire it up:

1. Edit `volumes/otel/collector-config.yaml` if you want to forward to Jaeger,
   Datadog, etc. The initial config uses the logging exporter so you can see
   spans immediately via `docker compose logs -f otel-collector`.
2. Customize `volumes/litellm/proxy_config.yaml` if you need additional
   callbacks; `general_settings.otel: true` and `litellm_settings.callbacks: ["otel"]`
   are already present, so no extra code changes are required.
3. (Optional) Override `OTEL_EXPORTER_OTLP_*` environment variables in
   `docker-compose.yml` or your shell to point at a remote collector.
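
For step 3, the standard OTLP variables apply; for example, to point at a remote
collector before bringing the proxy up (the endpoint shown is illustrative):

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://my-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
docker compose up -d llm-proxy
```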

After updating the configs, run `docker compose up -d otel-collector llm-proxy`
and generate a request (for example, trigger `ff workflow run llm_analysis`).
New traces will show up in the collector logs or whichever backend you
configured. See the official LiteLLM guide for advanced exporter options:
<https://docs.litellm.ai/docs/observability/opentelemetry_integration>.