# Blog RSS Scraper

This Lambda function ingests RSS feeds into a Directus CMS instance. It is configured to pull from the "Have I Been Flocked?" RSS feed and sync its posts with your Directus blog collection.

## Features

- **RSS Feed Parsing**: Extracts title, link, pubDate, and description from RSS entries
- **Directus Integration**: Creates, updates, and deletes blog posts via the Directus API
- **Idempotent Operation**: Safe to run multiple times - only makes necessary changes
- **Selective Sync**: Only manages RSS-ingested posts (identified by the `externalUrl` field)
- **Error Handling**: Comprehensive logging and error recovery

## Setup

### Environment Variables

Set the following environment variables:

```bash
# Required
DIRECTUS_API_TOKEN=your_directus_api_token_here

# Optional (defaults to https://cms.deflock.me)
DIRECTUS_BASE_URL=https://your-directus-instance.com
```

### Directus Collection Schema

Your Directus `blog` collection should have the following fields:

- `id` (integer, auto-increment)
- `title` (string, required)
- `description` (text)
- `content` (rich text, optional - RSS posts will have this as null)
- `externalUrl` (string, optional - identifies RSS-ingested posts)
- `published` (datetime)

### Dependencies

Install dependencies using uv:

```bash
uv sync
```

## Usage

### Local Testing

```bash
uv run main.py
```

### AWS Lambda

Deploy as a Python 3.14 Lambda function. The `lambda_handler` function serves as the entry point.

#### Sample Lambda Event

The function doesn't require any specific event data:

```json
{}
```

#### Sample Response

Success:

```json
{
  "statusCode": 200,
  "body": {
    "message": "RSS synchronization completed successfully",
    "stats": {
      "created": 2,
      "updated": 1,
      "deleted": 0,
      "errors": 0
    }
  }
}
```

Error:

```json
{
  "statusCode": 500,
  "body": {
    "message": "RSS synchronization failed",
    "error": "DIRECTUS_API_TOKEN environment variable is required"
  }
}
```

## How It Works
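Before the step-by-step details, here is a minimal sketch of the handler's overall shape, matching the response envelope shown above. The `sync_feed` helper is a hypothetical stand-in for the module's actual sync logic, not its real API:

```python
import os


def sync_feed():
    # Hypothetical stand-in for the real fetch/diff/apply logic;
    # the actual module would return counts from the Directus sync.
    return {"created": 0, "updated": 0, "deleted": 0, "errors": 0}


def lambda_handler(event, context):
    """Entry point: validates config, runs the sync, and wraps the
    result in the statusCode/body envelope documented above."""
    try:
        if not os.environ.get("DIRECTUS_API_TOKEN"):
            raise ValueError(
                "DIRECTUS_API_TOKEN environment variable is required"
            )
        stats = sync_feed()
        return {
            "statusCode": 200,
            "body": {
                "message": "RSS synchronization completed successfully",
                "stats": stats,
            },
        }
    except Exception as exc:
        # Any failure is reported in the 500 envelope rather than
        # letting the Lambda invocation crash.
        return {
            "statusCode": 500,
            "body": {
                "message": "RSS synchronization failed",
                "error": str(exc),
            },
        }
```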
1. **Fetch RSS Feed**: Downloads and parses the RSS feed from `https://haveibeenflocked.com/feed.xml`
2. **Get Existing Posts**: Queries Directus for all blog posts that have an `externalUrl` (these are RSS-managed posts)
3. **Synchronization**:
   - **Create**: New RSS entries that don't exist in Directus
   - **Update**: Existing posts where the title or description has changed
   - **Delete**: Directus posts with an `externalUrl` that no longer exist in the RSS feed
4. **Preserve Manual Posts**: Posts without an `externalUrl` are left untouched

## RSS Feed Structure

The scraper expects standard RSS 2.0 format with the following elements:

- `title`
- `link`
- `pubDate`
- `description`
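The diffing in steps 2 and 3 above can be sketched as a pure function over the two datasets. Names and dict shapes here are illustrative assumptions, not the module's actual API:

```python
def plan_sync(rss_entries, directus_posts):
    """Compute create/update/delete actions for RSS-managed posts.

    rss_entries:    list of dicts with "title", "link", "description"
    directus_posts: list of dicts with "id", "title", "description",
                    "externalUrl" (only RSS-managed posts, i.e. those
                    with an externalUrl, should be passed in)
    """
    existing = {p["externalUrl"]: p for p in directus_posts}
    seen = set()
    actions = {"create": [], "update": [], "delete": []}

    for entry in rss_entries:
        url = entry["link"]
        seen.add(url)
        post = existing.get(url)
        if post is None:
            # Step 3a: entry is new to Directus.
            actions["create"].append(entry)
        elif (post["title"], post["description"]) != (
            entry["title"],
            entry["description"],
        ):
            # Step 3b: entry exists but its content changed.
            actions["update"].append((post["id"], entry))

    # Step 3c: RSS-managed posts whose URL vanished from the feed.
    actions["delete"] = [p["id"] for url, p in existing.items() if url not in seen]
    return actions
```

Because the plan is derived purely from the current feed and the current Directus state, running it repeatedly after the actions are applied yields an empty plan, which is what makes the sync idempotent.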