mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-02-12 22:32:45 +00:00)

Store Cognee datasets in the projects bucket
@@ -10,7 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### ✨ Enhancements
 
 - Added Ladybug-backed Cognee integration with optional MinIO/S3 storage. Projects can now set `COGNEE_STORAGE_BACKEND=s3` (plus the `COGNEE_S3_*` settings) to keep knowledge graphs in the shared MinIO bucket seeded by `docker-compose`, enabling multi-tenant ingestion across workers and containers.
 - Introduced a dedicated Cognee service (`docker/docker-compose.cognee.yml`) and HTTP client so `fuzzforge ingest` streams data to the shared backend (`COGNEE_SERVICE_URL`) instead of importing Cognee locally. Each project now auto-provisions its own Cognee account/tenant and authenticates via the REST API, keeping datasets isolated even though the service is shared.
-- Added an event-driven ingestion pipeline: MinIO publishes `PUT` events from `s3://cognee/projects/<project-id>/...` to RabbitMQ, and the new `ingestion-dispatcher` container downloads the file, logs into Cognee as that project’s tenant, and invokes `/api/v1/add` + `/api/v1/cognify`. Uploading files (rsync, CI, etc.) now keeps datasets fresh without touching the CLI.
+- Added an event-driven ingestion pipeline: MinIO publishes `PUT` events from `s3://projects/<project-id>/...` to RabbitMQ, and the new `ingestion-dispatcher` container downloads the file, logs into Cognee as that project’s tenant, and invokes `/api/v1/add` + `/api/v1/cognify`. Uploading files (rsync, CI, etc.) now keeps datasets fresh without touching the CLI.
 
 ### 📝 Documentation
 
 - Added comprehensive worker startup documentation across all guides
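The second enhancement describes a login-then-ingest flow. As a rough illustration, a per-project client might authenticate like the sketch below; the `project_<id>@fuzzforge.dev` address comes from the config hunk further down, but the `/api/v1/auth/login` route and token shape are assumptions about Cognee's REST API (only `/api/v1/add` and `/api/v1/cognify` actually appear in this commit):

```python
# Hedged sketch of the per-project tenant login described above.
# Assumption: a fastapi-users style /api/v1/auth/login form endpoint
# returning an access_token -- not shown anywhere in this diff.
import httpx

def project_client(project_id: str, password: str) -> httpx.Client:
    client = httpx.Client(base_url="http://localhost:18000")  # COGNEE_SERVICE_URL
    resp = client.post(
        "/api/v1/auth/login",
        data={
            "username": f"project_{project_id}@fuzzforge.dev",  # default_email from the config hunk below
            "password": password,
        },
    )
    resp.raise_for_status()
    client.headers["Authorization"] = f"Bearer {resp.json()['access_token']}"
    return client
```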
@@ -197,7 +197,7 @@ ff workflow run security_assessment . # Start workflow - CLI uploads files au
 Uploading files into MinIO automatically streams them into Cognee:
 
 ```
-s3://cognee/projects/<project-id>/
+s3://projects/<project-id>/
   files/...     # → <project-id>_codebase dataset
   findings/...  # → <project-id>_findings dataset
   docs/...      # → <project-id>_docs dataset
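Any S3 client can drop files into that layout. A minimal boto3 sketch, assuming the MinIO defaults from the `.env.template` hunk at the end of this diff (the secret key does not appear in this commit, so it is a placeholder):

```python
import boto3

# Endpoint, access key, and region mirror the .env.template defaults below;
# the secret key is a placeholder -- substitute your deployment's credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # COGNEE_S3_ENDPOINT
    aws_access_key_id="fuzzforge",         # COGNEE_S3_ACCESS_KEY
    aws_secret_access_key="<secret>",
    region_name="us-east-1",               # COGNEE_S3_REGION
)

# Lands in s3://projects/<project-id>/docs/... and therefore in the
# <project-id>_docs dataset once the dispatcher picks up the PUT event.
s3.upload_file("README.md", "projects", "my-project/docs/README.md")
```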
@@ -138,7 +138,7 @@ class CogneeConfig(BaseModel):
     service_password: Optional[str] = None
     storage_backend: Literal["local", "s3"] = "s3"
     s3_bucket: Optional[str] = None
-    s3_prefix: Optional[str] = "projects"
+    s3_prefix: Optional[str] = None
     s3_endpoint_url: Optional[str] = None
     s3_region: Optional[str] = None
     s3_access_key: Optional[str] = None
@@ -217,8 +217,12 @@ class FuzzForgeConfig(BaseModel):
             cognee.service_url = "http://localhost:18000"
             changed = True
 
-        if not cognee.s3_prefix:
-            cognee.s3_prefix = "projects"
+        if not cognee.s3_bucket:
+            cognee.s3_bucket = "projects"
             changed = True
 
+        if cognee.s3_prefix is None:
+            cognee.s3_prefix = ""
+            changed = True
+
         default_email = f"project_{self.project.id}@fuzzforge.dev"
@@ -234,9 +238,13 @@ class FuzzForgeConfig(BaseModel):
             changed = True
 
         if cognee.storage_backend.lower() == "s3":
-            bucket = cognee.s3_bucket or "cognee"
-            prefix = (cognee.s3_prefix or "projects").strip("/")
-            base_uri = f"s3://{bucket}/{prefix}/{self.project.id}"
+            bucket = cognee.s3_bucket or "projects"
+            prefix = (cognee.s3_prefix or "").strip("/")
+            path_parts = [f"s3://{bucket}"]
+            if prefix:
+                path_parts.append(prefix)
+            path_parts.append(self.project.id)
+            base_uri = "/".join(path_parts)
             data_dir = f"{base_uri}/files"
             system_dir = f"{base_uri}/graph"
         else:
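Extracted on their own, the added lines build the base URI like this (a standalone rendering for illustration, not the actual `FuzzForgeConfig` method):

```python
def cognee_base_uri(project_id: str, s3_bucket: str | None = None,
                    s3_prefix: str | None = None) -> str:
    # Mirrors the + lines above: bucket defaults to "projects",
    # prefix defaults to empty, and empty segments are skipped.
    bucket = s3_bucket or "projects"
    prefix = (s3_prefix or "").strip("/")
    path_parts = [f"s3://{bucket}"]
    if prefix:
        path_parts.append(prefix)
    path_parts.append(project_id)
    return "/".join(path_parts)

assert cognee_base_uri("abc123") == "s3://projects/abc123"
# The removed lines would have produced "s3://cognee/projects/abc123".
assert cognee_base_uri("abc123", s3_prefix="team-a") == "s3://projects/team-a/abc123"
```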
@@ -12,10 +12,7 @@ services:
       GRAPH_DATABASE_PROVIDER: ladybug
       ENABLE_BACKEND_ACCESS_CONTROL: "true"
       STORAGE_BACKEND: s3
-      STORAGE_BUCKET_NAME: ${COGNEE_S3_BUCKET:-cognee}
-      DATA_ROOT_DIRECTORY: s3://${COGNEE_S3_BUCKET:-cognee}/${COGNEE_S3_PREFIX:-projects}
-      SYSTEM_ROOT_DIRECTORY: s3://${COGNEE_S3_BUCKET:-cognee}/${COGNEE_S3_PREFIX:-projects}
-      CACHE_ROOT_DIRECTORY: s3://${COGNEE_S3_BUCKET:-cognee}/${COGNEE_S3_PREFIX:-projects}/cache
+      STORAGE_BUCKET_NAME: ${COGNEE_S3_BUCKET:-projects}
       DB_PROVIDER: sqlite
       DB_PATH: /data/relational
       DB_NAME: cognee.db
@@ -12,7 +12,7 @@ Run the command from a project directory that already contains `.fuzzforge/`. Th
 **Default directories / services**
 - Logs: `.fuzzforge/logs/cognee.log`
-- Cognee datasets: hosted by the shared Cognee service (`COGNEE_SERVICE_URL`) inside the configured MinIO/S3 bucket. Local mode falls back to `.fuzzforge/cognee/project_<id>/{data,system}`. Uploads dropped into `s3://cognee/projects/<project-id>/...` are ingested automatically via RabbitMQ + the dispatcher.
+- Cognee datasets: hosted by the shared Cognee service (`COGNEE_SERVICE_URL`) inside the configured MinIO/S3 bucket. Local mode falls back to `.fuzzforge/cognee/project_<id>/{data,system}`. Uploads dropped into `s3://projects/<project-id>/...` are ingested automatically via RabbitMQ + the dispatcher.
 - Artifact cache: `.fuzzforge/artifacts`
 
 ## HTTP Endpoints
@@ -92,7 +92,7 @@ The CLI surface mirrors these helpers as natural-language prompts (`You> run fuz
 ## Knowledge & Ingestion
 
-- The `fuzzforge ingest` and `fuzzforge rag ingest` commands call into `ai/src/fuzzforge_ai/ingest_utils.py`, which filters file types, ignores caches, and streams files to the Cognee service where they are stored under `s3://<bucket>/<prefix>/project_<id>/`. When files land directly in `s3://cognee/projects/<project-id>/<category>/...`, the dispatcher performs the same workflow automatically via RabbitMQ events.
+- The `fuzzforge ingest` and `fuzzforge rag ingest` commands call into `ai/src/fuzzforge_ai/ingest_utils.py`, which filters file types, ignores caches, and streams files to the Cognee service where they are stored under `s3://projects/<project-id>/`. When files land directly in `s3://projects/<project-id>/<category>/...`, the dispatcher performs the same workflow automatically via RabbitMQ events.
 - Runtime queries hit `query_project_knowledge_api` on the executor, which defers to `cognee_service` for dataset lookup and semantic search. When Cognee credentials are absent the tools return a friendly "not configured" response.
 
 ## Artifact Pipeline
@@ -102,7 +102,7 @@ Add comments or project-specific overrides as needed; the agent reads these vari
 ## Event-Driven Ingestion
 
-Uploading files directly into MinIO triggers Cognee automatically. The dispatcher watches `s3://cognee/projects/<project-id>/...` and translates the top-level folder into a dataset:
+Uploading files directly into MinIO triggers Cognee automatically. The dispatcher watches `s3://projects/<project-id>/...` and translates the top-level folder into a dataset:
 
 | Prefix | Dataset name |
 |-----------|---------------------------------------|
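The table rows are truncated in this hunk's context, but the mapping matches the folder names from the README hunk earlier (files/findings/docs). A sketch of the translation, with a hypothetical helper name:

```python
def dataset_for_key(key: str) -> str | None:
    """Map an object key inside the projects bucket to a dataset name,
    e.g. 'my-project/files/src/main.rs' -> 'my-project_codebase'."""
    project_id, _, rest = key.partition("/")
    category = rest.split("/", 1)[0] if rest else ""
    # Folder -> dataset suffix, per the README hunk above.
    suffixes = {"files": "codebase", "findings": "findings", "docs": "docs"}
    suffix = suffixes.get(category)
    return f"{project_id}_{suffix}" if suffix else None
```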
@@ -243,7 +243,7 @@ This spins up the Cognee API on `http://localhost:18000`, publishes it to the ho
 ### RabbitMQ + Dispatcher
 
-`docker-compose.yml` also launches RabbitMQ (`http://localhost:15672`, ingest/ingest) and the `ingestion-dispatcher` container. MinIO publishes `PUT` events from `s3://cognee/projects/<project-id>/...` to the `cognee-ingest` exchange, and the dispatcher downloads the object and calls Cognee’s REST API. That means any rsync/upload into the projects bucket automatically becomes a dataset.
+`docker-compose.yml` also launches RabbitMQ (`http://localhost:15672`, ingest/ingest) and the `ingestion-dispatcher` container. MinIO publishes `PUT` events from `s3://projects/<project-id>/...` to the `cognee-ingest` exchange, and the dispatcher downloads the object and calls Cognee’s REST API. That means any rsync/upload into the projects bucket automatically becomes a dataset.
 
 ---
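Putting the pieces together, the dispatcher's consume loop might look roughly like the sketch below. The `cognee-ingest` exchange, the ingest/ingest credentials, and the two REST endpoints come from this commit; the queue name, the AMQP port, the event payload shape (MinIO's S3-notification format), and the request field names are assumptions:

```python
import json
import httpx
import pika

def on_event(ch, method, properties, body):
    # MinIO publishes S3-notification JSON; Records[0] carries the object key.
    record = json.loads(body)["Records"][0]
    key = record["s3"]["object"]["key"]          # "<project-id>/files/..."
    project_id, _, rest = key.partition("/")
    category = rest.split("/", 1)[0] if rest else "files"
    suffix = {"files": "codebase", "findings": "findings", "docs": "docs"}.get(category, "codebase")
    dataset = f"{project_id}_{suffix}"
    # ... download the object from MinIO, log in as the project's tenant ...
    with httpx.Client(base_url="http://localhost:18000") as api:
        with open("/tmp/object", "rb") as fh:    # downloaded copy (placeholder path)
            api.post("/api/v1/add", files={"data": fh},
                     data={"datasetName": dataset})   # field names are assumptions
        api.post("/api/v1/cognify", json={"datasets": [dataset]})
    ch.basic_ack(delivery_tag=method.delivery_tag)

conn = pika.BlockingConnection(
    pika.URLParameters("amqp://ingest:ingest@localhost:5672/"))  # AMQP port assumed
channel = conn.channel()
channel.queue_declare(queue="cognee-ingest-dispatcher", durable=True)  # name assumed
channel.queue_bind(queue="cognee-ingest-dispatcher", exchange="cognee-ingest")
channel.basic_consume(queue="cognee-ingest-dispatcher", on_message_callback=on_event)
channel.start_consuming()
```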
volumes/env/.env.template (vendored, 4 lines changed)
@@ -69,8 +69,8 @@ COGNEE_MCP_URL=
 COGNEE_SERVICE_URL=http://localhost:18000
 COGNEE_API_KEY=
 COGNEE_STORAGE_BACKEND=s3
-COGNEE_S3_BUCKET=cognee
-COGNEE_S3_PREFIX=projects
+COGNEE_S3_BUCKET=projects
+COGNEE_S3_PREFIX=
 COGNEE_S3_ENDPOINT=http://localhost:9000
 COGNEE_S3_REGION=us-east-1
 COGNEE_S3_ACCESS_KEY=fuzzforge