== Per-install operator handle for every third-party API call ==
Before this PR, every Shadowbroker install identified itself to
Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
weather.gov, NUFORC, Sentinel/Planetary Computer, TinyGS / CelesTrak,
Shodan, Finnhub, and others with a single project-wide User-Agent
("Shadowbroker/1.0" or "ShadowBroker-OSINT/1.0"). From the upstream's
perspective every install in the world looked like one giant scraper.
If one install misbehaved, the upstream's only recourse was to block
"Shadowbroker" as a whole.
PR #284 inadvertently doubled down on this in the frontend by
introducing a shared `WIKIMEDIA_API_USER_AGENT` constant. This PR
retrofits both the backend and the frontend to per-operator attribution.
New setting: OPERATOR_HANDLE (env var / settings UI / auto-gen)
New helper: network_utils.outbound_user_agent("purpose")
The handle is auto-generated as "operator-XXXXXX" on first call. (The
"shadow-" prefix from earlier drafts was deliberately dropped: it looked
too suspicious to abuse-detection systems.) Operators can
override via OPERATOR_HANDLE; the value is sanitized to lowercase
alphanumeric+dash+underscore and capped at 48 chars. Persisted to
backend/data/operator_handle.json so it survives container restarts.
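
A minimal sketch of the handle lifecycle described above (override, then
persisted file, then auto-generation). Everything not named in this message
is an assumption made for illustration: the function names other than
outbound_user_agent, the {"handle": ...} JSON layout, the choice to drop
disallowed characters rather than replace them, the hex alphabet for
"XXXXXX", and the User-Agent string layout. The real helper takes only a
purpose argument; the handle is passed explicitly here to keep the sketch
self-contained.

```python
import json
import re
import secrets
from pathlib import Path
from typing import Optional

MAX_HANDLE_LEN = 48
_DISALLOWED = re.compile(r"[^a-z0-9_-]")


def sanitize_handle(raw: str) -> str:
    """Lowercase, drop anything outside [a-z0-9_-], cap at 48 chars."""
    return _DISALLOWED.sub("", raw.strip().lower())[:MAX_HANDLE_LEN]


def generate_handle() -> str:
    """Random handle in the "operator-XXXXXX" shape (6 hex chars assumed)."""
    return f"operator-{secrets.token_hex(3)}"


def load_or_create_handle(path: Path, override: Optional[str] = None) -> str:
    """Resolve the handle: explicit override, persisted file, then auto-gen."""
    if override:
        cleaned = sanitize_handle(override)
        if cleaned:
            return cleaned
    if path.exists():
        try:
            stored = sanitize_handle(json.loads(path.read_text()).get("handle", ""))
        except (OSError, ValueError, AttributeError):
            stored = ""  # unreadable or malformed file: fall through to auto-gen
        if stored:
            return stored
    handle = generate_handle()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"handle": handle}))  # survives restarts
    return handle


def outbound_user_agent(handle: str, purpose: str) -> str:
    """Hypothetical UA layout; the real format is not given in this message."""
    return f"Shadowbroker/1.0 ({handle}; {purpose})"
```

In this sketch an override is never written to disk, so removing
OPERATOR_HANDLE falls back to whatever handle was persisted earlier.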
Retrofitted call sites (every one that previously sent the shared project-wide User-Agent):
- services/region_dossier.py (Wikipedia + Wikidata + Nominatim)
- services/geocode.py (Nominatim)
- services/sentinel_search.py (Microsoft Planetary Computer)
- services/feed_ingester.py (operator-curated RSS feeds)
- services/fetchers/earth_observation.py (weather.gov, NUFORC)
- services/fetchers/infrastructure.py
- services/fetchers/aircraft_database.py
- services/fetchers/route_database.py
- services/fetchers/trains.py
- services/fetchers/meshtastic_map.py
- services/shodan_connector.py
- services/unusual_whales_connector.py (Finnhub)
- services/tinygs_fetcher.py (CelesTrak + TinyGS)
- services/sar/sar_products_client.py
- services/geopolitics.py (GDELT)
- services/radio_intercept.py (Broadcastify + OpenMHz)
- routers/cctv.py + main.py (CCTV proxy)
- routers/ai_intel.py
- scripts/convert_power_plants.py (release-time data refresh)
Spoofed browser UAs removed (issues #289 / #290 / #291 — tg12 audit):
- cloudscraper-based Chrome impersonation against api.openmhz.com
  -> replaced with honest requests + per-install UA
- Mozilla/5.0 spoofed UA on Broadcastify scrape
  -> replaced with honest UA
- Mozilla/5.0 + fake first-party Referer on OpenMHz audio relay
  -> replaced with honest UA
- cloudscraper dependency dropped from pyproject.toml + uv.lock
Frontend retrofit:
- new GET /api/settings/operator-handle endpoint (local-operator
  gated) returns the install's handle
- frontend/src/lib/wikimediaClient.ts fetches the handle once on
  first use, caches it for page lifetime, embeds it in the
  Api-User-Agent for every Wikipedia / Wikidata browser-direct call
== GDELT GCS-direct fix ==
GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
bucket. GCS responds with the wildcard *.storage.googleapis.com cert
which legitimately does NOT cover the GDELT custom domain, so Python's
TLS verification correctly refuses the connection. Some networks
happen to route through a path where this works; many (notably Docker
Desktop's outbound NAT on local installs) do not. Verified on the
maintainer's local install: GDELT was unreachable; 1610 geopolitical
events / 48 export files were dropping silently.
Fix: services/geopolitics._gcs_direct_gdelt_url() rewrites any
data.gdeltproject.org URL to its GCS-direct equivalent
(storage.googleapis.com/data.gdeltproject.org/...) where the standard
GCS cert is genuinely valid. api.gdeltproject.org and every other host
are left untouched.
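
The rewrite rule can be sketched as below. This is not the body of
services/geopolitics._gcs_direct_gdelt_url(), which is not shown here; the
function name without the underscore, the constants, and the decision to
force https on the rewritten URL are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

GDELT_DATA_HOST = "data.gdeltproject.org"
GCS_HOST = "storage.googleapis.com"


def gcs_direct_gdelt_url(url: str) -> str:
    """Rewrite data.gdeltproject.org URLs to path-style GCS URLs.

    Any other host (including api.gdeltproject.org) is returned unchanged.
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != GDELT_DATA_HOST:
        return url
    # The bucket is named after the custom domain, so it becomes the first
    # path segment under storage.googleapis.com.
    new_path = f"/{GDELT_DATA_HOST}{parts.path}"
    return urlunsplit(("https", GCS_HOST, new_path, parts.query, parts.fragment))
```

For example, http://data.gdeltproject.org/gdeltv2/lastupdate.txt becomes
https://storage.googleapis.com/data.gdeltproject.org/gdeltv2/lastupdate.txt.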
Confirmed live: backend log goes from
    GDELT lastupdate failed: 500
to
    Downloading 48 GDELT export files...
    Downloaded 48/48 GDELT exports
    GDELT parsed: 1610 conflict locations from 48 files
== Tests ==
backend/tests/test_per_operator_outbound_attribution.py (12 tests)
backend/tests/test_gdelt_gcs_direct_rewrite.py (6 tests)
backend/tests/test_region_dossier_wikimedia_ua.py (updated to
  pin the helper + per-operator handle, not the old constant)
frontend/src/__tests__/utils/wikimediaClient.test.ts (rewritten
  to mock /api/settings/operator-handle and assert per-operator UA)
Local: backend 114/114 security+audit+round7a suite green;
frontend 718/718 vitest suite green.
Credit: tg12 (external security audit, issues #289/#290/#291
relating to spoofed UAs); BigBodyCobain (operator-prefix call,
GDELT cloud-vs-local diagnosis).
Detected by Aeon + Semgrep (5x use-defused-xml ERROR).
Severity: medium
CWE-776 (billion laughs) / CWE-611 (XML external entity)
Five XML parse sites pass response bodies into the Python stdlib
xml.etree.ElementTree without protection against entity expansion
attacks. Python's ElementTree still permits internal entity references
by default (per the docs vulnerabilities table), so a malicious or
compromised upstream can ship a "billion laughs"-style payload that
expands to gigabytes in memory.
The user-controllable site is sb_monitor._parse_rss: the OpenClaw skill
exposes add_custom_feed(name, url, ...) to the agent, then
poll_custom_feeds fetches feed.url and passes the body to
xml.etree.ElementTree.fromstring with no host allowlist or
entity-bomb defence. The other four sites (psk_reporter_fetcher,
aircraft_database, cctv_pipeline x2) parse XML from hard-coded
upstreams (pskreporter.info, s3.opensky-network.org,
datos.madrid.es); hardening them is defence-in-depth against upstream
compromise or MITM.
Switch all five call sites to defusedxml.ElementTree. Same
fromstring/find/findall/iter/findtext API, but rejects entity
references by default (raises defusedxml.EntitiesForbidden).
Confirmed locally that a 4-deep billion-laughs payload that
expands to 3000 chars under stdlib ET is rejected by defusedxml.
Added defusedxml>=0.7.1 to backend/pyproject.toml dependencies.
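
The difference between the two parsers can be reproduced with a small
nested-entity document. The payload below is an illustrative one of the
same shape (4 levels, fan-out of 10, 3 base characters), not necessarily
the one used for the local check; the helper names are hypothetical, and
the defusedxml half only runs when that package is installed.

```python
import xml.etree.ElementTree as StdET

try:  # defusedxml is third-party; without it only the stdlib half is usable
    import defusedxml.ElementTree as SafeET
    from defusedxml import EntitiesForbidden
except ImportError:
    SafeET = None


def build_nested_entity_payload(depth: int = 4, fanout: int = 10) -> str:
    """Build an RSS-like document whose title is a chain of nested entities."""
    decls = ['<!ENTITY lol0 "lol">']
    for level in range(1, depth):
        refs = f"&lol{level - 1};" * fanout
        decls.append(f'<!ENTITY lol{level} "{refs}">')
    dtd = "\n  ".join(decls)
    return (
        f"<!DOCTYPE rss [\n  {dtd}\n]>\n"
        f"<rss><channel><title>&lol{depth - 1};</title></channel></rss>"
    )


def expanded_length_stdlib(payload: str) -> int:
    """Parse with stdlib ElementTree, which expands internal entities."""
    root = StdET.fromstring(payload)
    return len(root.findtext("channel/title") or "")


def rejected_by_defusedxml(payload: str) -> bool:
    """True if defusedxml refuses the document because it declares entities."""
    if SafeET is None:
        raise RuntimeError("defusedxml is not installed")
    try:
        SafeET.fromstring(payload)
    except EntitiesForbidden:
        return True
    return False
```

With the default arguments the stdlib parse yields a 3 * 10 * 10 * 10 = 3000
character title. Each extra level multiplies that by the fan-out, which is
how a small payload can grow to gigabytes.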
Co-authored-by: aeonframework <aeon-bot@aaronjmars.com>
Ship the v0.9.79 runtime refresh with transport lane isolation, Infonet secure-message address management, MeshChat MQTT controls, selected asset trail behavior, telemetry panel refinements, onboarding updates, and desktop/package metadata alignment.
Also ignore local graphify work products so analysis folders do not leak into future commits.
Add Tor/onion runtime wiring and faster Infonet node status refresh.
Keep node bootstrap state clearer across Docker and local runtimes.
Use selected aircraft trail history for cumulative tracked-aircraft emissions.
A full import audit found these packages used but missing from
pyproject.toml, all silently broken in Docker:
- meshtastic: MQTT protobuf decode (why US/LongFast chat was empty)
- PyNaCl: DM sealed-box encryption
- vaderSentiment: oracle sentiment analysis (unguarded, would crash)
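
A check along these lines can catch the same class of gap inside the built
image. It is a sketch, not the audit that was actually run: the function
name is hypothetical, and the import names (meshtastic, nacl for PyNaCl,
vaderSentiment) are the modules those distributions are normally imported
as.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or broken specs.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run inside the container; a non-empty list means an undeclared dependency.
    print(missing_modules(["meshtastic", "nacl", "vaderSentiment"]))
```

find_spec only locates a top-level module without executing it, so the check
is cheap and does not trigger import-time side effects.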
paho-mqtt v2 changed Client constructor and on_connect callback
signatures, breaking the Meshtastic MQTT bridge. Pin to <2.0.0
so the existing v1 code works correctly in Docker.
paho-mqtt was missing from pyproject.toml, causing the Meshtastic MQTT
bridge to silently disable itself in Docker — no live chat messages
could be received. Also improve Infonet node status labels: show
RETRYING when sync fails instead of the misleading SYNCING, and WAITING
when the node is enabled but no sync has run yet.
Docker image was crash-looping with `ModuleNotFoundError: No module named 'orjson'`
because these packages were imported but not declared as dependencies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI runs `uv sync --group dev` but only a `test` group existed.
Renamed to `dev` and added ruff + black so Docker Publish can pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>