== Per-install operator handle for every third-party API call ==
Before this PR, every Shadowbroker install identified itself to
Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
weather.gov, NUFORC, Sentinel/Planetary Computer, TinyGS / CelesTrak,
Shodan, Finnhub, and others with a single project-wide User-Agent
("Shadowbroker/1.0" or "ShadowBroker-OSINT/1.0"). From the upstream's
perspective every install in the world looked like one giant scraper.
If one install misbehaved, the upstream's only recourse was to block
"Shadowbroker" as a whole.
PR #284 inadvertently doubled down on this in the frontend by
introducing a shared `WIKIMEDIA_API_USER_AGENT` constant. This PR
retrofits both the backend and the frontend to per-operator
attribution.
New setting: OPERATOR_HANDLE (env var / settings UI / auto-gen)
New helper: network_utils.outbound_user_agent("purpose")
The handle is auto-generated as "operator-XXXXXX" on first call (the
"shadow-" prefix from earlier drafts was deliberately dropped — too
suspicious-looking for abuse-detection systems). Operators can
override via OPERATOR_HANDLE; the value is sanitized to lowercase
alphanumeric+dash+underscore and capped at 48 chars. Persisted to
backend/data/operator_handle.json so it survives container restarts.
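A minimal sketch of the handle lifecycle described above (sanitize,
auto-generate, persist). This is not the shipped code: the function
names, the {"handle": ...} JSON layout, the use of six hex characters
for "XXXXXX", and the User-Agent string layout are all assumptions made
for illustration.

```python
import json
import re
import secrets
from pathlib import Path
from typing import Optional

MAX_HANDLE_LEN = 48  # cap stated in this commit message


def sanitize_handle(raw: str) -> str:
    """Lowercase, keep only [a-z0-9_-], then cap the length."""
    cleaned = re.sub(r"[^a-z0-9_-]", "", raw.strip().lower())
    return cleaned[:MAX_HANDLE_LEN]


def load_or_create_handle(path: Path, override: Optional[str] = None) -> str:
    """Prefer a usable override, then the persisted handle, else generate one."""
    if override:
        handle = sanitize_handle(override)
        if handle:
            return handle
    if path.exists():
        try:
            stored = json.loads(path.read_text(encoding="utf-8"))
            handle = sanitize_handle(str(stored.get("handle", "")))
            if handle:
                return handle
        except (OSError, ValueError, AttributeError):
            pass  # unreadable or malformed file: fall through and regenerate
    # Assumption: "XXXXXX" is rendered here as 6 random hex characters.
    handle = f"operator-{secrets.token_hex(3)}"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"handle": handle}), encoding="utf-8")
    return handle


def build_user_agent(handle: str, purpose: str) -> str:
    """Hypothetical UA layout; the real format is not given in this message."""
    return f"Shadowbroker/1.0 ({handle}; {purpose})"
```

Persisting on first generation is what keeps the handle stable across
container restarts; an override is never written back in this sketch.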
Retrofitted call sites (every one that previously sent the shared
project-wide User-Agent):
- services/region_dossier.py (Wikipedia + Wikidata + Nominatim)
- services/geocode.py (Nominatim)
- services/sentinel_search.py (Microsoft Planetary Computer)
- services/feed_ingester.py (operator-curated RSS feeds)
- services/fetchers/earth_observation.py (weather.gov, NUFORC)
- services/fetchers/infrastructure.py
- services/fetchers/aircraft_database.py
- services/fetchers/route_database.py
- services/fetchers/trains.py
- services/fetchers/meshtastic_map.py
- services/shodan_connector.py
- services/unusual_whales_connector.py (Finnhub)
- services/tinygs_fetcher.py (CelesTrak + TinyGS)
- services/sar/sar_products_client.py
- services/geopolitics.py (GDELT)
- services/radio_intercept.py (Broadcastify + OpenMHz)
- routers/cctv.py + main.py (CCTV proxy)
- routers/ai_intel.py
- scripts/convert_power_plants.py (release-time data refresh)
Spoofed browser UAs removed (issues #289 / #290 / #291 — tg12 audit):
- cloudscraper-based Chrome impersonation against api.openmhz.com
-> replaced with honest requests + per-install UA
- Mozilla/5.0 spoofed UA on Broadcastify scrape
-> replaced with honest UA
- Mozilla/5.0 + fake first-party Referer on OpenMHz audio relay
-> replaced with honest UA
- cloudscraper dependency dropped from pyproject.toml + uv.lock
Frontend retrofit:
- new GET /api/settings/operator-handle endpoint (local-operator
gated) returns the install's handle
- frontend/src/lib/wikimediaClient.ts fetches the handle once on
first use, caches it for page lifetime, embeds it in the
Api-User-Agent for every Wikipedia / Wikidata browser-direct call
== GDELT GCS-direct fix ==
GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
bucket. GCS responds with the wildcard *.storage.googleapis.com cert
which legitimately does NOT cover the GDELT custom domain, so Python's
TLS verification correctly refuses the connection. Some networks
happen to route through a path where this works; many (notably Docker
Desktop's outbound NAT on local installs) do not. Verified on the
maintainer's local install: GDELT was unreachable, and 1610
geopolitical events across 48 export files were being dropped silently.
Fix: services/geopolitics._gcs_direct_gdelt_url() rewrites any
data.gdeltproject.org URL to its GCS-direct equivalent
(storage.googleapis.com/data.gdeltproject.org/...) where the standard
GCS cert is genuinely valid. api.gdeltproject.org and every other host
are left untouched.
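The rewrite can be sketched roughly as follows. This is an illustrative
stand-in, not the body of _gcs_direct_gdelt_url(): the function name
without the leading underscore, the constants, and the choice to force
the https scheme are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

GDELT_DATA_HOST = "data.gdeltproject.org"
GCS_HOST = "storage.googleapis.com"


def gcs_direct_gdelt_url(url: str) -> str:
    """Rewrite data.gdeltproject.org URLs to path-style GCS URLs."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != GDELT_DATA_HOST:
        return url  # api.gdeltproject.org and every other host pass through
    # Path-style addressing: the bucket name becomes the first path segment.
    new_path = f"/{GDELT_DATA_HOST}{parts.path}"
    # Assumption: always emit https, since the point is a valid GCS cert.
    return urlunsplit(("https", GCS_HOST, new_path, parts.query, parts.fragment))
```

For example, a lastupdate.txt URL on data.gdeltproject.org maps to
https://storage.googleapis.com/data.gdeltproject.org/gdeltv2/lastupdate.txt,
while any api.gdeltproject.org URL is returned unchanged.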
Confirmed live: backend log goes from
    GDELT lastupdate failed: 500
to
    Downloading 48 GDELT export files...
    Downloaded 48/48 GDELT exports
    GDELT parsed: 1610 conflict locations from 48 files
== Tests ==
backend/tests/test_per_operator_outbound_attribution.py (12 tests)
backend/tests/test_gdelt_gcs_direct_rewrite.py (6 tests)
backend/tests/test_region_dossier_wikimedia_ua.py (updated to
pin the helper + per-operator handle, not the old constant)
frontend/src/__tests__/utils/wikimediaClient.test.ts (rewritten
to mock /api/settings/operator-handle and assert per-operator UA)
Local: backend 114/114 security+audit+round7a suite green;
frontend 718/718 vitest suite green.
Credit: tg12 (external security audit, issues #289/#290/#291
relating to spoofed UAs); BigBodyCobain (operator-prefix call,
GDELT cloud-vs-local diagnosis).
An external security audit by @tg12 (May 17, 2026) filed 11 issues
against the backend. PR #227 (May 18, AI-generated) closed seven of
them by adding require_local_operator to control-plane endpoints. The
remaining four stayed live; this PR closes them.
#192 — CCTV proxy followed redirects without re-validating host
Issue: /api/cctv/media validated only the caller-supplied URL host
before passing it to requests.get(..., allow_redirects=True). A 302
to http://127.0.0.1 or any internal/disallowed host was silently
followed, turning the proxy into an open-redirect-to-SSRF chain.
Fix in routers/cctv.py: replace the single allow_redirects=True call
with a manual follow loop. Each hop's Location is parsed, the host is
rerun through _cctv_host_allowed(), and non-HTTP schemes (file://,
ftp://, etc.) are rejected. Cap chain length at 5 hops.
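A rough sketch of such a manual follow loop, with the HTTP call and the
allowlist check injected so the logic can be read in isolation. All
names here (follow_validated, RedirectRejected, fetch_once,
host_allowed) are hypothetical; in the router the fetch would be a
requests.get(..., allow_redirects=False) call and the check would be
_cctv_host_allowed().

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

MAX_REDIRECT_HOPS = 5  # cap stated in this commit message
REDIRECT_STATUSES = {301, 302, 303, 307, 308}

# fetch_once(url) -> (status_code, Location header or None)
FetchOnce = Callable[[str], Tuple[int, Optional[str]]]


class RedirectRejected(Exception):
    """Raised when a hop fails validation or the chain is too long."""


def follow_validated(
    url: str,
    fetch_once: FetchOnce,
    host_allowed: Callable[[str], bool],
    max_hops: int = MAX_REDIRECT_HOPS,
) -> str:
    """Follow redirects by hand, validating every hop before fetching it."""
    current = url
    for _ in range(max_hops + 1):  # initial request plus up to max_hops redirects
        parts = urlsplit(current)
        if parts.scheme not in ("http", "https"):
            raise RedirectRejected(f"scheme not allowed: {parts.scheme!r}")
        if not host_allowed(parts.hostname or ""):
            raise RedirectRejected(f"host not allowed: {parts.hostname!r}")
        status, location = fetch_once(current)
        if status not in REDIRECT_STATUSES or not location:
            return current  # final, fully validated URL
        current = urljoin(current, location)  # resolves relative Location values
    raise RedirectRejected("too many redirects")
```

Validating before each fetch, rather than after, means a disallowed
target is never contacted at all.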
Test: backend/tests/test_cctv_redirect_ssrf.py covers
- redirect to disallowed host -> 502
- redirect to localhost -> 502
- redirect to another allowed host -> 200
- redirect chain length cap
- non-HTTP scheme rejected
#198 — Gate introspection GETs were unauthenticated
Issue: /api/wormhole/gate/{gate_id}/{identity,personas,key} were
callable with no auth dependency. Any caller that could reach the
backend could dump the operator's active persona, persona inventory,
and key status for any gate_id they knew. The wiki's privacy threat
model explicitly markets gate personas as rotating, unlinkable
pseudonyms — this leak defeated that property.
Fix in routers/wormhole.py: add
dependencies=[Depends(require_local_operator)] to all three routes.
Test: backend/tests/test_control_surface_auth.py extended with
three new parameterized cases (lines 75-77).
#199 — GDELT military incident ingestion used plaintext HTTP
Issue: backend/services/geopolitics.py fetched
http://data.gdeltproject.org/gdeltv2/lastupdate.txt and ~48 export
archive URLs over plaintext HTTP. Passive observers could identify
Shadowbroker nodes from the fetch pattern. Active MITM could inject
doctored military incident records into the global map.
Fix in services/geopolitics.py: rewrite the lastupdate.txt fetch and
the export download URL constructor to use https://. GDELT's
data.gdeltproject.org serves the same content over HTTPS.
Test: backend/tests/test_gdelt_https.py asserts no plaintext HTTP
URLs to data.gdeltproject.org remain in code (comments excluded) and
that the HTTPS URLs we expect are present.
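One way such a comments-excluded check could be written is with the
standard tokenize module. This is a sketch of the idea only; the helper
names are hypothetical and the actual test may scan the file
differently.

```python
import io
import tokenize

PLAINTEXT_PREFIX = "http://data.gdeltproject.org"


def code_without_comments(source: str) -> str:
    """Return the source's tokens joined together, with comments dropped."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return " ".join(tok.string for tok in tokens if tok.type != tokenize.COMMENT)


def has_plaintext_gdelt_url(source: str) -> bool:
    """True if a plaintext GDELT data URL appears outside of comments."""
    return PLAINTEXT_PREFIX in code_without_comments(source)
```

A URL mentioned only in a comment does not trip the check, while the
same URL inside a string literal does.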
#200 — Sentinel token cache lookup used client_id only
Issue: routers/tools.py kept a process-global cache of Copernicus
bearer tokens. The lookup compared
_sh_token_cache["client_id"] == client_id. A caller who knew a valid
client_id but supplied any wrong client_secret hit the cache and
reused the legitimate caller's bearer token — burning their quota
and accessing imagery on their account.
Fix in routers/tools.py: replace the client_id field with
credential_fp, an HMAC-SHA256 over (client_id, client_secret) under
a per-process random key (_SH_TOKEN_CACHE_HMAC_KEY = os.urandom(32),
regenerated at startup). A caller who doesn't know the secret cannot
compute a matching fingerprint, so they miss the cache and hit the
real Copernicus token endpoint — which will reject their wrong
secret with a 401.
Test: backend/tests/test_sentinel_token_cache.py covers
- same client_id + different secrets => different fingerprints
- same credentials => same fingerprint (cache still works)
- different client_ids + same secret => different fingerprints
- cache no longer stores raw client_id (catches regression)
- attacker with wrong secret cannot reuse victim's token
Validation:
    pytest backend/tests/test_control_surface_auth.py \
           backend/tests/test_cctv_redirect_ssrf.py \
           backend/tests/test_gdelt_https.py \
           backend/tests/test_sentinel_token_cache.py
    -> 37 passed
Credit: @tg12 reported all four of these in their May 17 audit with
correct line-number citations and accurate remediation recommendations.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Gate messages now propagate via the Infonet hashchain as encrypted blobs — every node syncs them
through normal chain sync while only Gate members with MLS keys can decrypt. Added mesh reputation
system, peer push workers, voluntary Wormhole opt-in for node participation, fork recovery,
killwormhole scripts, obfuscated terminology, and hardened the self-updater to protect encryption
keys and chain state during updates.
New features: Shodan search, train tracking, Sentinel Hub imagery, 8 new intelligence layers,
CCTV expansion to 11,000+ cameras across 6 countries, Mesh Terminal CLI, prediction markets,
desktop-shell scaffold, and comprehensive mesh test suite (215 frontend + backend tests passing).
Community contributors: @wa1id, @AlborzNazari, @adust09, @Xpirix, @imqdcr, @csysp, @suranyami,
@chr0n1x, @johan-martensson, @singularfailure, @smithbh, @OrfeoTerkuci, @deuza, @tm-const,
@Elhard1, @ttulttul