Integrate Cognee service updates

2026-07-06 20:17:54 +02:00 · 2025-10-03 11:54:01 +02:00
parent 43d3eae1db
commit c7adfabe0a
17 changed files with 1649 additions and 99 deletions
@@ -12,7 +12,7 @@ Run the command from a project directory that already contains `.fuzzforge/`. Th

 **Default directories**
 - Logs: `.fuzzforge/logs/cognee.log`
- Cognee datasets: `.fuzzforge/cognee/project_<id>/{data,system}`
+- Cognee datasets: `.fuzzforge/cognee/project_<id>/{data,system}` in embedded mode, or `s3://<bucket>/cognee/projects/<project-id>/` when the service backend is active.
 - Artifact cache: `.fuzzforge/artifacts`

 ## HTTP Endpoints
@@ -140,7 +140,7 @@ graph LR
 - **Session persistence** is controlled by `SESSION_PERSISTENCE`. When set to `sqlite`, ADK’s `DatabaseSessionService` writes transcripts to the path configured by `SESSION_DB_PATH` (defaults to `./fuzzforge_sessions.db`). With `inmemory`, the context is scoped to the current process.
 - **Semantic recall** stores vector embeddings so `/recall` queries can surface earlier prompts, even after restarts when using SQLite.
 - **Hybrid memory manager** (`HybridMemoryManager`) stitches Cognee results into the ADK session. When a knowledge query hits Cognee, the relevant nodes are appended back into the session context so follow-up prompts can reference them naturally.
- **Cognee datasets** are unique per project. Ingestion runs populate `<project>_codebase` while custom calls to `ingest_to_dataset` let you maintain dedicated buckets (e.g., `insights`). Data is persisted inside `.fuzzforge/cognee/project_<id>/` and shared across CLI and A2A modes.
+- **Cognee datasets** are unique per project. Ingestion runs populate `<project>_codebase` while custom calls to `ingest_to_dataset` let you maintain dedicated buckets (e.g., `insights`). Data is persisted inside `.fuzzforge/cognee/project_<id>/` when running embedded, or under `s3://<bucket>/cognee/projects/<project-id>/` when the hosted Cognee service is enabled.
 - **Task metadata** (workflow runs, artifact descriptors) lives in the executor’s in-memory caches but is also mirrored through A2A task events so remote agents can resubscribe if the CLI restarts.
 - **Operational check**: Run `/recall <keyword>` or `You> search project knowledge for "topic" using INSIGHTS` after ingestion to confirm both ADK session recall and Cognee graph access are active.
 - **CLI quick check**: `/memory status` summarises the current memory type, session persistence, and Cognee dataset directories from inside the agent shell.
@@ -81,6 +81,23 @@ LLM_COGNEE_API_KEY=sk-your-key

 If the Cognee variables are omitted, graph-specific tools remain available but return a friendly "not configured" response.

+### Hosted Cognee Service
+
+See [Hosted Cognee Service](./cognee-service.md) for step-by-step instructions on starting the shared backend with Docker.
+
+When you want multiple projects to share a dedicated Cognee backend, point the CLI at the service and shared S3 bucket:
+
+```env
+COGNEE_STORAGE_MODE=service
+COGNEE_SERVICE_URL=http://localhost:8000
+COGNEE_S3_BUCKET=cognee-shared
+COGNEE_S3_PREFIX=cognee/projects
+COGNEE_SERVICE_USER_EMAIL=project_12345678@cognee.local
+COGNEE_SERVICE_USER_PASSWORD=super-secret
+```
+
+During initialisation the CLI writes these values to `.fuzzforge/cognee/service/project_<id>/.env`. Each project gets its own scoped dataset (default `<project>_codebase`) while the service persists metadata in `s3://<bucket>/<prefix>/` using the project and tenant identifiers.
+
 ## MCP / Backend Integration

 ```env
@@ -38,14 +38,14 @@ All runs automatically skip `.fuzzforge/**` and `.git/**` to avoid recursive ing

 - Primary dataset: `<project>_codebase`
 - Additional datasets: create ad-hoc buckets such as `insights` via the `ingest_to_dataset` tool
- Storage location: `.fuzzforge/cognee/project_<id>/`
+- Storage location: `.fuzzforge/cognee/project_<id>/` when running embedded, or `s3://<bucket>/cognee/projects/<project-id>/` when using the Cognee service mode.

 ### Persistence Details

- Every dataset lives under `.fuzzforge/cognee/project_<id>/{data,system}`. These directories are safe to commit to long-lived storage (they only contain embeddings and metadata).
+- Every dataset lives under `.fuzzforge/cognee/project_<id>/{data,system}` when running locally. In service mode the same layout is mirrored to a shared S3 bucket so multiple projects can reuse the hosted Cognee instance without colliding.
 - Cognee assigns deterministic IDs per project; if you move the repository, copy the entire `.fuzzforge/cognee/` tree to retain graph history.
 - `HybridMemoryManager` ensures answers from Cognee are written back into the ADK session store so future prompts can refer to the same nodes without repeating the query.
- All Cognee processing runs locally against the files you ingest. No external service calls are made unless you configure a remote Cognee endpoint.
+- In embedded mode all Cognee processing runs locally against the files you ingest. When `COGNEE_STORAGE_MODE=service`, the CLI streams files to the Cognee API, which stores them in the shared S3 prefix and runs the pipeline remotely before results flow back into the agent session.

 ## Prompt Examples

@@ -77,6 +77,12 @@ FUZZFORGE_MCP_URL=http://localhost:8010/mcp
 LLM_COGNEE_PROVIDER=openai
 LLM_COGNEE_MODEL=gpt-5-mini
 LLM_COGNEE_API_KEY=sk-your-key
+
+# Optional: hosted Cognee service
+COGNEE_STORAGE_MODE=service
+COGNEE_SERVICE_URL=http://localhost:8000
+COGNEE_S3_BUCKET=cognee-shared
+COGNEE_S3_PREFIX=cognee/projects
 ```

 Add comments or project-specific overrides as needed; the agent reads these variables on startup.
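
As a rough illustration of how the variables above could map to the storage locations described in this change, here is a minimal sketch. The helper name `resolve_cognee_storage`, the `embedded` default for `COGNEE_STORAGE_MODE`, and the fallback prefix are assumptions made for the example; the actual resolution logic in the CLI is not shown in this diff and may differ.

```python
import os


def resolve_cognee_storage(env, project_id):
    """Return the dataset location for a project (hypothetical helper).

    Assumes any mode other than "service" means embedded, on-disk storage.
    """
    mode = env.get("COGNEE_STORAGE_MODE", "embedded")  # default is an assumption
    if mode == "service":
        bucket = env.get("COGNEE_S3_BUCKET")
        if not bucket:
            raise ValueError("COGNEE_S3_BUCKET is required in service mode")
        # Fallback prefix mirrors the example value in the docs (assumption).
        prefix = env.get("COGNEE_S3_PREFIX", "cognee/projects").strip("/")
        return f"s3://{bucket}/{prefix}/{project_id}/"
    # Embedded mode: local directory layout documented above.
    return f".fuzzforge/cognee/project_{project_id}/"


if __name__ == "__main__":
    # Reads the real environment; "12345678" is a placeholder project id.
    print(resolve_cognee_storage(os.environ, "12345678"))
```

Passing the environment in as a mapping rather than reading `os.environ` inside the function keeps the sketch easy to exercise with different configurations.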