Add event-driven Cognee ingestion pipeline

@@ -10,9 +10,9 @@ fuzzforge ai server

Run the command from a project directory that already contains `.fuzzforge/`. The server reads the project configuration and reuses the same environment variables as the CLI shell.

**Default directories**
**Default directories / services**
- Logs: `.fuzzforge/logs/cognee.log`
- Cognee datasets: `.fuzzforge/cognee/project_<id>/{data,system}`
- Cognee datasets: hosted by the shared Cognee service (`COGNEE_SERVICE_URL`) inside the configured MinIO/S3 bucket. Local mode falls back to `.fuzzforge/cognee/project_<id>/{data,system}`. Uploads dropped into `s3://cognee/projects/<project-id>/...` are ingested automatically via RabbitMQ + the dispatcher.
- Artifact cache: `.fuzzforge/artifacts`
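If ingestion misbehaves, the per-project log in the list above is the first place to look; a minimal check from the project root:

```bash
# Follow the Cognee log for the current project.
tail -f .fuzzforge/logs/cognee.log
```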
## HTTP Endpoints

@@ -73,7 +73,8 @@ sequenceDiagram
- **Remote agent registry** (`ai/src/fuzzforge_ai/remote_agent.py`) holds metadata for downstream agents and handles capability discovery over HTTP. Auto-registration is configured by `ConfigManager` so known agents attach on startup.
- **Memory services**:
- `FuzzForgeMemoryService` and `HybridMemoryManager` (`ai/src/fuzzforge_ai/memory_service.py`) provide conversation recall and bridge to Cognee datasets when configured.
- Cognee bootstrap (`ai/src/fuzzforge_ai/cognee_service.py`) ensures ingestion and knowledge queries stay scoped to the current project.
- Cognee bootstrap (`ai/src/fuzzforge_ai/cognee_service.py`) ensures ingestion and knowledge queries stay scoped to the current project and forwards them to the shared Cognee service (`COGNEE_SERVICE_URL`). Datasets live inside the configured MinIO/S3 bucket, with `.fuzzforge/cognee/` available only when `COGNEE_STORAGE_BACKEND=local`.
- MinIO bucket notifications push object-created events into RabbitMQ. The `ingestion-dispatcher` container listens on `cognee-ingest`, downloads the object, and invokes Cognee’s REST API on behalf of the project’s tenant so uploads become datasets without a manual CLI hop.

## Workflow Automation

@@ -91,7 +92,7 @@ The CLI surface mirrors these helpers as natural-language prompts (`You> run fuz

## Knowledge & Ingestion

- The `fuzzforge ingest` and `fuzzforge rag ingest` commands call into `ai/src/fuzzforge_ai/ingest_utils.py`, which filters file types, ignores caches, and populates Cognee datasets under `.fuzzforge/cognee/project_<id>/`.
- The `fuzzforge ingest` and `fuzzforge rag ingest` commands call into `ai/src/fuzzforge_ai/ingest_utils.py`, which filters file types, ignores caches, and streams files to the Cognee service where they are stored under `s3://<bucket>/<prefix>/project_<id>/`. When files land directly in `s3://cognee/projects/<project-id>/<category>/...`, the dispatcher performs the same workflow automatically via RabbitMQ events.
- Runtime queries hit `query_project_knowledge_api` on the executor, which defers to `cognee_service` for dataset lookup and semantic search. When Cognee credentials are absent the tools return a friendly "not configured" response.
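For example, either of the ingestion commands can be invoked from the project root (no flags shown; options vary by CLI version):

```bash
# Either command feeds the same ingestion pipeline (ingest_utils.py) for the current project.
fuzzforge ingest
fuzzforge rag ingest
```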
## Artifact Pipeline

@@ -140,7 +141,7 @@ graph LR
- **Session persistence** is controlled by `SESSION_PERSISTENCE`. When set to `sqlite`, ADK’s `DatabaseSessionService` writes transcripts to the path configured by `SESSION_DB_PATH` (defaults to `./fuzzforge_sessions.db`). With `inmemory`, the context is scoped to the current process (see the env sketch after this list).
- **Semantic recall** stores vector embeddings so `/recall` queries can surface earlier prompts, even after restarts when using SQLite.
- **Hybrid memory manager** (`HybridMemoryManager`) stitches Cognee results into the ADK session. When a knowledge query hits Cognee, the relevant nodes are appended back into the session context so follow-up prompts can reference them naturally.
- **Cognee datasets** are unique per project. Ingestion runs populate `<project>_codebase` while custom calls to `ingest_to_dataset` let you maintain dedicated buckets (e.g., `insights`). Data is persisted inside `.fuzzforge/cognee/project_<id>/` and shared across CLI and A2A modes.
- **Cognee datasets** are unique per project. Ingestion runs populate `<project>_codebase` while custom calls to `ingest_to_dataset` let you maintain dedicated buckets (e.g., `insights`). Data is persisted inside the Cognee service’s bucket/prefix and is shared across CLI, HTTP server, and MCP integrations.
- **Task metadata** (workflow runs, artifact descriptors) lives in the executor’s in-memory caches but is also mirrored through A2A task events so remote agents can resubscribe if the CLI restarts.
- **Operational check**: Run `/recall <keyword>` or `You> search project knowledge for "topic" using INSIGHTS` after ingestion to confirm both ADK session recall and Cognee graph access are active.
- **CLI quick check**: `/memory status` summarises the current memory type, session persistence, and Cognee dataset directories from inside the agent shell.
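A minimal `.env` sketch wiring up SQLite-backed sessions (the `SESSION_DB_PATH` value is the documented default; switch to `inmemory` to keep context process-local):

```env
SESSION_PERSISTENCE=sqlite
SESSION_DB_PATH=./fuzzforge_sessions.db
```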
@@ -81,6 +81,33 @@ LLM_COGNEE_API_KEY=sk-your-key

If the Cognee variables are omitted, graph-specific tools remain available but return a friendly "not configured" response.

### Cognee Storage Backend

Cognee defaults to local storage under `.fuzzforge/cognee/`, but you can mirror datasets to MinIO/S3 for multi-tenant or containerised deployments:

```env
COGNEE_STORAGE_BACKEND=s3
COGNEE_S3_BUCKET=cognee
COGNEE_S3_PREFIX=project_${PROJECT_ID}
COGNEE_S3_ENDPOINT=http://localhost:9000
COGNEE_S3_REGION=us-east-1
COGNEE_S3_ACCESS_KEY=fuzzforge
COGNEE_S3_SECRET_KEY=fuzzforge123
COGNEE_S3_ALLOW_HTTP=1
```

Set the values to match your MinIO/S3 endpoint; the docker compose stack seeds a `cognee` bucket automatically. When S3 mode is active, ingestion and search work exactly the same but Cognee writes metadata to `s3://<bucket>/<prefix>/project_<id>/{data,system}`.
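To confirm objects are landing in the bucket you can list it with any S3 client; a sketch using the AWS CLI with the MinIO defaults above (adjust credentials and endpoint for your deployment):

```bash
export AWS_ACCESS_KEY_ID=fuzzforge
export AWS_SECRET_ACCESS_KEY=fuzzforge123

# List everything the Cognee service has written to the bucket.
aws s3 ls s3://cognee/ --recursive --endpoint-url http://localhost:9000
```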
### Cognee Service URL

The CLI and workers talk to Cognee over HTTP. Point `COGNEE_SERVICE_URL` at the service (defaults to `http://localhost:18000` when you run `docker/docker-compose.cognee.yml`) and provide `COGNEE_API_KEY` if you protect the API behind LiteLLM.

Every project gets its own Cognee login so datasets stay isolated. The CLI auto-derives an email/password pair (e.g., `project_<id>@fuzzforge.dev`) and registers it the first time you run `fuzzforge ingest`. Override those defaults by setting `COGNEE_SERVICE_EMAIL` / `COGNEE_SERVICE_PASSWORD` in `.fuzzforge/.env` before running ingestion if you need to reuse an existing account.
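For example, to reuse an existing account, add something like the following to `.fuzzforge/.env` before the first ingest (the values here are placeholders):

```env
COGNEE_SERVICE_EMAIL=existing-user@example.com
COGNEE_SERVICE_PASSWORD=change-me
```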
### MinIO Event Mapping

The ingestion dispatcher converts S3 prefixes to datasets using `DATASET_CATEGORY_MAP` (default `files:codebase,findings:findings,docs:docs`). Adjust it in `docker-compose.yml` if you want to add more categories or rename datasets.
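For example, to route a hypothetical `reports/` prefix into its own dataset, extend the variable in the `ingestion-dispatcher` service environment:

```env
DATASET_CATEGORY_MAP=files:codebase,findings:findings,docs:docs,reports:reports
```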
## MCP / Backend Integration

```env

@@ -38,12 +38,13 @@ All runs automatically skip `.fuzzforge/**` and `.git/**` to avoid recursive ing
- Primary dataset: `<project>_codebase`
- Additional datasets: create ad-hoc buckets such as `insights` via the `ingest_to_dataset` tool
- Storage location: `.fuzzforge/cognee/project_<id>/`
- Storage location (service default): `s3://<bucket>/<prefix>/project_<id>/{data,system}` as defined by the Cognee service (the docker compose stack seeds a `cognee` bucket automatically).
- Local mode (opt-in): set `COGNEE_STORAGE_BACKEND=local` to fall back to `.fuzzforge/cognee/project_<id>/` when developing without MinIO.

### Persistence Details

- Every dataset lives under `.fuzzforge/cognee/project_<id>/{data,system}`. These directories are safe to commit to long-lived storage (they only contain embeddings and metadata).
- Cognee assigns deterministic IDs per project; if you move the repository, copy the entire `.fuzzforge/cognee/` tree to retain graph history.
- The Cognee service keeps datasets inside the configured bucket/prefix (`s3://<bucket>/<prefix>/project_<id>/{data,system}`) so every project has its own Ladybug + LanceDB pair. Local mode mirrors the same layout under `.fuzzforge/cognee/project_<id>/`.
- Cognee assigns deterministic IDs per project; copy the entire prefix (local or S3) if you migrate repositories to retain graph history.
- `HybridMemoryManager` ensures answers from Cognee are written back into the ADK session store so future prompts can refer to the same nodes without repeating the query.
- All Cognee processing runs locally against the files you ingest. No external service calls are made unless you configure a remote Cognee endpoint.

@@ -77,10 +78,40 @@ FUZZFORGE_MCP_URL=http://localhost:8010/mcp
LLM_COGNEE_PROVIDER=openai
LLM_COGNEE_MODEL=gpt-5-mini
LLM_COGNEE_API_KEY=sk-your-key
COGNEE_SERVICE_URL=http://localhost:18000
COGNEE_API_KEY=
```

The CLI auto-registers a dedicated Cognee account per project the first time you ingest (email pattern `project_<id>@cognee.local`). Set `COGNEE_SERVICE_EMAIL` / `COGNEE_SERVICE_PASSWORD` in `.fuzzforge/.env` if you prefer to reuse an existing login.

Switch the knowledge graph storage to S3/MinIO by adding:

```env
COGNEE_STORAGE_BACKEND=s3
COGNEE_S3_BUCKET=cognee
COGNEE_S3_PREFIX=project_${PROJECT_ID}
COGNEE_S3_ENDPOINT=http://localhost:9000
COGNEE_S3_ACCESS_KEY=fuzzforge
COGNEE_S3_SECRET_KEY=fuzzforge123
COGNEE_S3_ALLOW_HTTP=1
```
The default `docker-compose` stack already seeds a `cognee` bucket inside MinIO, so these values work out of the box. Point `COGNEE_SERVICE_URL` at the Cognee container (included in `docker/docker-compose.cognee.yml`) so `fuzzforge ingest` sends all requests to the shared service instead of importing Cognee locally.

Add comments or project-specific overrides as needed; the agent reads these variables on startup.
## Event-Driven Ingestion

Uploading files directly into MinIO triggers Cognee automatically. The dispatcher watches `s3://cognee/projects/<project-id>/...` and translates the top-level folder into a dataset:

| Prefix | Dataset name |
|-----------|---------------------------------------|
| `files/` | `<project-id>_codebase` |
| `findings/` | `<project-id>_findings` |
| `docs/` | `<project-id>_docs` |

Under the hood MinIO publishes a `PUT` event → RabbitMQ (`cognee-ingest` exchange) → the `ingestion-dispatcher` container downloads the object and calls `/api/v1/add` + `/api/v1/cognify` using the deterministic project credentials (`project_<id>@fuzzforge.dev`). That means rsync, `aws s3 cp`, GitHub Actions, or any other tool that writes to the bucket can seed Cognee without touching the CLI.
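A sketch of that manual-upload path with the AWS CLI and the compose MinIO defaults (replace `<project-id>` with your project ID; any S3-capable tool works the same way):

```bash
export AWS_ACCESS_KEY_ID=fuzzforge
export AWS_SECRET_ACCESS_KEY=fuzzforge123

# The object lands under findings/, so the dispatcher ingests it into <project-id>_findings.
aws s3 cp report.md \
  s3://cognee/projects/<project-id>/findings/report.md \
  --endpoint-url http://localhost:9000
```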
## Tips

- Re-run ingestion after significant code changes to keep the knowledge graph fresh.

@@ -231,6 +231,20 @@ nano volumes/env/.env

See [Getting Started](../tutorial/getting-started.md) for detailed environment setup.

### Cognee Service Stack

Cognee now runs as its own container so every project shares the same multi-tenant backend (Ladybug + LanceDB sitting on MinIO). After the core stack is running, bring the service online with:

```bash
docker compose -f docker/docker-compose.cognee.yml up -d
```

This spins up the Cognee API on `http://localhost:18000`, publishes it to the host, and stores knowledge graphs in the `cognee` bucket that the main compose file seeds. Point the CLI at it by setting `COGNEE_SERVICE_URL=http://localhost:18000` (already included in `.env.template`).
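A quick way to confirm the service is reachable once the container is up (the `curl` probe only checks that the published port answers; it does not assume a specific route):

```bash
docker compose -f docker/docker-compose.cognee.yml ps

# Expect an HTTP status code here rather than a connection error.
curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:18000
```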
### RabbitMQ + Dispatcher

`docker-compose.yml` also launches RabbitMQ (`http://localhost:15672`, ingest/ingest) and the `ingestion-dispatcher` container. MinIO publishes `PUT` events from `s3://cognee/projects/<project-id>/...` to the `cognee-ingest` exchange, and the dispatcher downloads the object and calls Cognee’s REST API. That means any rsync/upload into the projects bucket automatically becomes a dataset.
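To sanity-check the event path, confirm the dispatcher container is running and that the `cognee-ingest` exchange exists via RabbitMQ's management API (using the `ingest`/`ingest` credentials noted above):

```bash
docker compose ps ingestion-dispatcher

# The cognee-ingest exchange should appear in the list once the stack is up.
curl -s -u ingest:ingest http://localhost:15672/api/exchanges
```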
---

## Troubleshooting