Compare commits

..

9 Commits

Author SHA1 Message Date
BigBodyCobain ebbf42fb3c Round 7a: per-operator outbound attribution + GDELT GCS-direct fix
== Per-install operator handle for every third-party API call ==

Before this PR, every Shadowbroker install identified itself to
Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
weather.gov, NUFORC, Sentinel/Planetary Computer, TinyGS / CelesTrak,
Shodan, Finnhub, and others with a single project-wide User-Agent
("Shadowbroker/1.0" or "ShadowBroker-OSINT/1.0"). From the upstream's
perspective every install in the world looked like one giant scraper.
If one install misbehaved, the upstream's only recourse was to block
"Shadowbroker" as a whole.

PR #284 inadvertently doubled down on this in the frontend by
introducing a shared `WIKIMEDIA_API_USER_AGENT` constant. This PR
retrofits both backends to per-operator attribution.

  New setting: OPERATOR_HANDLE (env var / settings UI / auto-gen)
  New helper:  network_utils.outbound_user_agent("purpose")

The handle is auto-generated as "operator-XXXXXX" on first call (the
"shadow-" prefix from earlier drafts was deliberately dropped — too
suspicious-looking for abuse-detection systems). Operators can
override via OPERATOR_HANDLE; the value is sanitized to lowercase
alphanumeric+dash+underscore and capped at 48 chars. Persisted to
backend/data/operator_handle.json so it survives container restarts.

Retrofitted call sites (every previously-MONSTER User-Agent):
  - services/region_dossier.py (Wikipedia + Wikidata + Nominatim)
  - services/geocode.py         (Nominatim)
  - services/sentinel_search.py (Microsoft Planetary Computer)
  - services/feed_ingester.py   (operator-curated RSS feeds)
  - services/fetchers/earth_observation.py (weather.gov, NUFORC)
  - services/fetchers/infrastructure.py
  - services/fetchers/aircraft_database.py
  - services/fetchers/route_database.py
  - services/fetchers/trains.py
  - services/fetchers/meshtastic_map.py
  - services/shodan_connector.py
  - services/unusual_whales_connector.py (Finnhub)
  - services/tinygs_fetcher.py            (CelesTrak + TinyGS)
  - services/sar/sar_products_client.py
  - services/geopolitics.py               (GDELT)
  - services/radio_intercept.py           (Broadcastify + OpenMHz)
  - routers/cctv.py + main.py             (CCTV proxy)
  - routers/ai_intel.py
  - scripts/convert_power_plants.py       (release-time data refresh)

Spoofed browser UAs removed (issues #289 / #290 / #291 — tg12 audit):
  - cloudscraper-based Chrome impersonation against api.openmhz.com
    -> replaced with honest requests + per-install UA
  - Mozilla/5.0 spoofed UA on Broadcastify scrape
    -> replaced with honest UA
  - Mozilla/5.0 + fake first-party Referer on OpenMHz audio relay
    -> replaced with honest UA
  - cloudscraper dependency dropped from pyproject.toml + uv.lock

Frontend retrofit:
  - new GET /api/settings/operator-handle endpoint (local-operator
    gated) returns the install's handle
  - frontend/src/lib/wikimediaClient.ts fetches the handle once on
    first use, caches it for page lifetime, embeds it in the
    Api-User-Agent for every Wikipedia / Wikidata browser-direct call

== GDELT GCS-direct fix ==

GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
bucket. GCS responds with the wildcard *.storage.googleapis.com cert
which legitimately does NOT cover the GDELT custom domain, so Python's
TLS verification correctly refuses the connection. Some networks
happen to route through a path where this works; many (notably Docker
Desktop's outbound NAT on local installs) do not. Verified on the
maintainer's local install: GDELT was unreachable; 1610 geopolitical
events / 48 export files were dropping silently.

Fix: services/geopolitics._gcs_direct_gdelt_url() rewrites any
data.gdeltproject.org URL to its GCS-direct equivalent
(storage.googleapis.com/data.gdeltproject.org/...) where the standard
GCS cert is genuinely valid. api.gdeltproject.org and every other host
are left untouched.

Confirmed live: backend log goes from
  GDELT lastupdate failed: 500
to
  Downloading 48 GDELT export files...
  Downloaded 48/48 GDELT exports
  GDELT parsed: 1610 conflict locations from 48 files

== Tests ==

  backend/tests/test_per_operator_outbound_attribution.py (12 tests)
  backend/tests/test_gdelt_gcs_direct_rewrite.py          (6 tests)
  backend/tests/test_region_dossier_wikimedia_ua.py       (updated to
    pin the helper + per-operator handle, not the old constant)
  frontend/src/__tests__/utils/wikimediaClient.test.ts    (rewritten
    to mock /api/settings/operator-handle and assert per-operator UA)

Local: backend 114/114 security+audit+round7a suite green;
       frontend 718/718 vitest suite green.

Credit: tg12 (external security audit, issues #289/#290/#291
relating to spoofed UAs); BigBodyCobain (operator-prefix call,
GDELT cloud-vs-local diagnosis).
2026-05-21 15:03:27 -06:00
Shadowbroker c3ef9f4b9e Fix #239: CI guard against new duplicate route registrations (#286)
The audit's concern is that FastAPI behavior depends on the order
routes are registered, because backend/main.py and several router
modules register the same (method, path) pairs twice.

Empirical verification (done in this PR's investigation, see
test_router_handler_is_the_one_that_serves) shows:

- main.app.include_router(...) runs at line ~3316.
- All @app.get/post/... decorators in main.py run AFTER that.
- FastAPI matches in registration order -> the router handler always
  wins; the main.py copies are dead code at the route-resolution
  layer.

So behavior today is deterministic, but drift between the two copies
is a real future risk: someone editing only one copy of a pair
introduces silent inconsistency, exactly as we saw in round 5 with
_WORMHOLE_PUBLIC_SETTINGS_FIELDS (which existed in BOTH main.py and
routers/wormhole.py and had to be tightened in both).

This PR is the lowest-risk fix: a CI guard that captures today's 166
known duplicates as a baseline and fails the build if any NEW
duplicate appears later. Existing duplicates are tolerated. Removed
duplicates are allowed (the baseline is a ceiling, not a floor). No
production code is deleted or moved -- the dedup of the existing 166
duplicates can be staged separately in future PRs without rushing.

Files:

- backend/tests/data/duplicate_routes_baseline.json
  Snapshot of every currently-tolerated (METHOD path) duplicate with
  the modules that register each copy. Generated from a live import
  of main.app via the snippet in the test docstring.

- backend/tests/test_no_new_duplicate_routes.py
  Three tests:
    1. test_no_new_duplicate_route_registrations -- the actual guard,
       fails if (METHOD, path) not in baseline is found duplicated.
    2. test_baseline_only_lists_real_duplicates -- warns (does not
       fail) if the baseline has entries that no longer correspond to
       a real duplicate; informational housekeeping for the next
       baseline regeneration.
    3. test_router_handler_is_the_one_that_serves -- pins the
       empirical claim that for every duplicated path the router
       handler is the first-registered one. If someone ever reorders
       include_router() to come AFTER @app decorators, this test
       fails loudly and points at the most likely cause.

Verified locally:
- 3/3 new tests pass with current main (166 baselined dups).
- Synthetic duplicate injected into main.app at runtime IS caught by
  test 1.
- Full security+carrier suite (96 tests) still green.

Credit: tg12 (external security audit).
2026-05-21 13:27:16 -06:00
Shadowbroker 5e6bb8511a Fix #244/#245/#246: carrier tracker seed/cache/freshness model (#285)
Replace the dated editorial fallback positions baked into the registry
with a one-shot seed file + persistent observation cache. The user's
runtime cache now reflects what THIS install has actually observed,
not what USNI published on March 9, 2026. A year from now, the cache
holds a year of observations and the seed is irrelevant.

== #244: dated editorial coordinates out of the registry ==

CARRIER_REGISTRY no longer carries fallback_lat/lng/heading/desc.
Those fields are deleted. The registry is now identity + homeport
only.

New file: backend/data/carrier_seed.json
  - Read-only, shipped with every release.
  - Used ONCE on first-ever startup to bootstrap carrier_cache.json.
  - Each entry stamped with position_confidence="seed" and the actual
    as-of date (2026-03-09), NOT now().

== #245: approximate confidence for headline-derived positions ==

_parse_carrier_positions_from_news() now stamps every GDELT-derived
entry with position_confidence="approximate" so the UI knows the
coordinate is a region-centroid match, not a precise observation.
After the freshness window the label rolls over to
"stale_approximate" so old-and-imprecise is distinguishable from
recent-and-imprecise.

The article's actual seendate is used as position_source_at instead
of now(), so the "last reported X days ago" badge is honest.

== #246: freshness is labelling, not eviction ==

The cache always preserves the last position the system observed,
forever. What changes is the position_confidence label:
  - within configurable window (default 14d, env-overridable via
    SHADOWBROKER_CARRIER_FRESHNESS_DAYS) -> "recent"
  - older -> "stale"
  - seed-bootstrap entries that were never refreshed -> "seed"
  - homeport defaults (carrier added post-install) -> "homeport_default"
  - headline-derived (any age, fresh) -> "approximate"
  - headline-derived (older than window) -> "stale_approximate"

The position itself never reverts to the seed or the registry. The
user always sees the last position the system observed. Per the
user's explicit guidance: "from there have it be the last position
the user has logged the carriers that way a year from now it doesnt
revert to where the ships are today".

== Other improvements ==

- CACHE_FILE moved to backend/data/carrier_cache.json so it lives in
  the volume-mounted dir under Docker compose. Previously it was at
  /app/carrier_cache.json which got wiped on every container restart
  (pre-existing bug).
- Atomic cache write (temp + os.replace) so a crash mid-write does
  not leave a truncated cache file.

== Public API shape ==

Every carrier object the API emits now includes:
  - position_confidence: seed | recent | stale | approximate |
                         stale_approximate | homeport_default
  - position_source_at:  ISO timestamp of when the underlying source
                         was observed (NOT now())
  - is_fallback:         convenience boolean for the UI; true when the
                         confidence is seed/stale/stale_approximate/
                         homeport_default

Existing fields (estimated, source, source_url, last_osint_update,
name, type, lat, lng, country, desc, wiki) are preserved exactly so
the current ShipPopup frontend renders unchanged. last_osint_update
now reflects position_source_at instead of now(), which is what the
existing "last reported MM/DD" badge always meant to show.

Tests: backend/tests/test_carrier_tracker_quality.py — 17 tests
covering seed bootstrap, subsequent-startup ignoring seed, no-seed/
no-cache homeport fallback, registry no longer has fallback fields,
freshness window labelling + env override, "year-old cache entry keeps
its position, only the label flips" regression, approximate
confidence for headline matches, GDELT seendate ISO parser, public
response shape backward compat.

Credit: tg12 (external security audit, three P1/P2 issues).
2026-05-21 11:15:52 -06:00
Shadowbroker 0fee36e8f7 Fix #218/#219/#220: identify ShadowBroker on Wikipedia + Wikidata calls (#284)
Wikimedia's User-Agent policy asks API clients to identify themselves
with a stable, contactable identifier so their operators can rate-limit
or coordinate. Before this change, ShadowBroker was sending:

- Backend (region_dossier.py): generic project default UA only; no
  Api-User-Agent.
- Frontend (useRegionDossier.ts, WikiImage.tsx, NewsFeed.tsx): zero
  identifying header at all; three separate copy-pasted anonymous
  fetches with their own module-local caches.

Three separate components doing the same broken thing meant policy
fixes had to happen in three places, with no shared cache or kill
switch.

Fix (no UX change, zero hostility):

== Backend ==

`backend/services/region_dossier.py` now sets explicit `User-Agent` +
`Api-User-Agent` headers on every outbound Wikidata and Wikipedia
request via a new `_WIKIMEDIA_REQUEST_HEADERS` constant. The identifier
includes a contact path (issues page on the public GitHub repo).

== Frontend ==

New shared helper `frontend/src/lib/wikimediaClient.ts`:
- `fetchWikipediaSummary(title)` — single source of truth for Wikipedia
  REST summary lookups, with one shared LRU cache (in-flight requests
  deduplicated, 512-entry cap), `Api-User-Agent` on every fetch.
- `fetchWikidataSparql(query)` — same shape for Wikidata SPARQL.
- `WIKIMEDIA_API_USER_AGENT` — exported constant; one place to update
  if Wikimedia ever asks us to back off.

Refactored three components to use the shared client:
- `frontend/src/hooks/useRegionDossier.ts` — fetchLeader() and
  fetchLocalWikiSummary() now route through the shared helpers.
- `frontend/src/components/WikiImage.tsx` — uses fetchWikipediaSummary,
  proper React state instead of module-mutation + forceUpdate trick.
- `frontend/src/components/NewsFeed.tsx` — same shape.

UX: byte-for-byte identical. Same thumbnails, same dossier content,
same load behavior. The only observable difference is the outgoing
request header.

Note on #239 (route duplication): an audit-grade inventory shows 166
main.py routes are shadowed by router modules. That cleanup is too
large to land safely in this PR; it will be staged as a separate
ladder of small PRs grouped by router module.

Tests:
- `backend/tests/test_region_dossier_wikimedia_ua.py` — 3 tests
  asserting backend headers are present.
- `frontend/src/__tests__/utils/wikimediaClient.test.ts` — 9 tests
  covering Api-User-Agent presence, shared cache, concurrent
  deduplication, disambiguation/HTTP-error/network-error fallthroughs,
  empty-input safety.

Local: backend 76/76 security suite green, frontend 716/716 vitest
suite green.

Credit: tg12 (external security audit).
2026-05-21 10:48:05 -06:00
Shadowbroker e125467721 Fix #243/#252/#253: stop leaking settings posture to anonymous callers (#283)
Three settings endpoints were disclosing operational posture or
operator-curated configuration to any network caller. This change
either tightens the redacted-public view (#243) or adds a
local-operator auth gate (#252, #253) per the audit recommendations.

Zero hostility to legitimate users: in all three cases, the Tauri
shell (loopback), the Docker bridge frontend container (#250 + #278),
and any caller with an admin key continue to see the full data. Only
anonymous LAN/internet callers see the reduced surface.

== #243 (Wormhole transport posture, anonymous-mode, profile, node mode)

Tightened the public-redaction allowlists in BOTH the main.py and
routers/wormhole.py copies:
- _WORMHOLE_PUBLIC_SETTINGS_FIELDS: {enabled, transport, anonymous_mode}
                                 -> {enabled}
- _WORMHOLE_PUBLIC_PROFILE_FIELDS: {profile, wormhole_enabled}
                                 -> {wormhole_enabled}

`GET /api/settings/node` (both the routers/admin.py and main.py copies)
now returns an empty stub for unauthenticated callers and the full
node_mode + node_enabled fields only for authenticated callers via
_scoped_view_authenticated(request, "node").

== #252 (news feed inventory disclosure)

`GET /api/settings/news-feeds` now requires Depends(require_local_operator)
in both the canonical routers/admin.py handler and the duplicate main.py
handler. Anonymous callers can no longer enumerate operator-curated
feed names and URLs.

== #253 (Time Machine archival-capture posture disclosure)

`GET /api/settings/timemachine` now requires Depends(require_local_operator).
Anonymous callers can no longer fingerprint whether a deployment is
retaining replayable historical surveillance data.

Tests: backend/tests/test_round5_settings_info_disclosure.py (10 tests)
- Wormhole settings: anonymous sees only `enabled`; authenticated sees full state.
- Privacy profile: anonymous sees only `wormhole_enabled`; authenticated sees `profile` + `transport` + `anonymous_mode`.
- Node settings: anonymous sees `{}`; authenticated sees node_mode + node_enabled + persisted state.
- news-feeds: anonymous gets 403 (and get_feeds() is NOT called); authenticated gets full inventory.
- timemachine: anonymous gets 403; authenticated sees enabled + storage_warning.

Local: 73/73 security suite (round 5 + earlier rounds) green.

Credit: tg12 (external security audit, P1 + 2x Medium).
2026-05-21 10:32:23 -06:00
Shadowbroker 2b03b808ac Fix #279: add defusedxml to uv.lock so Docker image installs it (#282)
defusedxml is listed in backend/pyproject.toml line 18 but was missing
from uv.lock. The backend Dockerfile uses `uv sync --frozen --no-dev`,
which only installs packages pinned in the lockfile. As a result the
runtime image shipped without defusedxml even though pyproject declared
it, and any import path that touched it crashed at startup with:

    ModuleNotFoundError: No module named 'defusedxml'

Affected import sites:

- backend/services/psk_reporter_fetcher.py:10
- backend/services/fetchers/aircraft_database.py:21
- backend/services/cctv_pipeline.py:990
- backend/services/cctv_pipeline.py:1018

Fix: regenerate uv.lock so defusedxml v0.7.1 (matching the >=0.7.1
specifier in pyproject) is locked. No code changes -- only the lockfile.
Next image build picks it up via the existing `uv sync --frozen` step.

Reporter: external user. Thanks for catching the missing dep.
2026-05-21 10:18:40 -06:00
Shadowbroker 2e14e75a0e Fix #256: per-peer HMAC secrets defeat cross-peer impersonation (#281)
Before this change, every peer-push HMAC was derived from the single
fleet-shared MESH_PEER_PUSH_SECRET. The receiver could prove "this
request was signed by someone who knows the fleet secret" but it could
NOT prove which peer signed it. Any peer that knew the global secret
could compute the expected HMAC for any other peer URL and forge a
push pretending to be that peer.

Fix: introduce MESH_PEER_SECRETS, an optional comma-separated
url=secret map. When a peer URL appears in the map, only the listed
per-peer secret is accepted for it -- the global secret is ignored for
that specific URL. Peer A no longer knows peer B's secret, so peer A
cannot forge a push claiming to be peer B.

The new helper resolve_peer_key_for_url() in mesh_crypto.py wraps the
lookup and is called from every existing peer-push call site:

- backend/auth.py:_verify_peer_push_hmac (receiver)
- backend/main.py:_http_peer_push_loop (Infonet event push)
- backend/main.py:_http_gate_pull_loop (gate event pull)
- backend/main.py:_http_gate_push_loop (gate event push)
- backend/services/mesh/mesh_router.py (two transports, push)
- backend/services/mesh/mesh_hashchain.py (gate wire ref key)
- backend/services/mesh/mesh_wormhole_prekey.py (peer prekey lookup)

Zero hostility, by design:

- Single-peer installs leave MESH_PEER_SECRETS empty -> resolver falls
  back to MESH_PEER_PUSH_SECRET -> behavior is byte-for-byte unchanged.
- Multi-peer installs that haven't migrated yet behave exactly as
  before.
- Multi-peer installs that DO migrate set MESH_PEER_SECRETS on both
  ends of each peering and immediately close the impersonation surface
  for those URLs. Migration is incremental: unlisted peers keep using
  the global secret.

Tests in backend/tests/test_per_peer_secret_resolver.py:
- env parsing (default, override, whitespace, malformed entries, cache)
- precedence: per-peer beats global
- migration window: unlisted peer falls back to global
- IMPERSONATION REFUSAL: peer A with global-secret-only cannot forge
  HMAC for peer B that has a per-peer secret configured
- IMPERSONATION REFUSAL: peer A with its OWN per-peer secret cannot
  forge HMAC for peer B
- positive control: legitimate peer B request verifies
- zero-behavior-change: single-peer install produces the same key bytes
  as before the change

Credit: tg12 (external security audit, P1/High/High confidence)
2026-05-21 10:05:29 -06:00
Shadowbroker 084e563412 Fix #240/#241: require admin auth on oracle resolve endpoints (#280)
Both POST /api/mesh/oracle/resolve and POST /api/mesh/oracle/resolve-stakes
were previously gated only by a rate limit (5/min) and tagged with
`mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)`. The exemption
decorator is metadata only — it tells the mesh signed-write middleware
not to require a signature envelope, it does NOT enforce caller
authorization. Any network caller could:

- /resolve: settle any prediction market to any outcome (corrupts every
  downstream profile/win-loss count derived from that ledger).
- /resolve-stakes: trigger stake settlement for all expired contests at
  a time of their choosing (race against operator intent).

Fix: add `dependencies=[Depends(require_admin)]` to both routes. The
existing `mesh_write_exempt` tag stays in place because it accurately
describes the route's relationship to the signed-write envelope system;
adding `require_admin` is what closes the actual auth hole.

Tests in backend/tests/test_oracle_resolve_auth_gate.py:
- anonymous caller -> 403, ledger mutator NOT called
- wrong admin key -> 403, ledger mutator NOT called
- valid admin key -> 200, ledger mutator called
- admin key unconfigured + no debug/insecure-admin -> 403

Credit: tg12 (external security audit)
2026-05-21 09:45:08 -06:00
Shadowbroker 9ef6213284 Fix #250: bind Docker bridge local-operator trust to frontend hostname (#278)
Tightens the bridge-trust check so a connection on the Docker bridge
is only granted local-operator status when its source IP matches a
configured frontend container hostname (default: `frontend` + the
shipped `container_name` `shadowbroker-frontend`). Previously, when
`SHADOWBROKER_TRUST_DOCKER_BRIDGE_LOCAL_OPERATOR=1` was set, ANY IP
in the 172.16.0.0/12 range was granted local-operator privileges —
on a shared Docker host that included any unrelated container on the
same bridge.

Operators with renamed services can list new hostnames via the new
`SHADOWBROKER_TRUSTED_FRONTEND_HOSTS` env var (comma-separated). DNS
resolution is cached for 30s; if Docker DNS can't resolve any of the
configured names we fail closed and refuse the bridge entirely.

Single-user installs see no behavior change — the default-named
frontend container still resolves and is still trusted.

Credit: tg12 (external security audit)
2026-05-21 02:06:11 -06:00
51 changed files with 4346 additions and 408 deletions
+4
View File
@@ -105,6 +105,10 @@ backend/data/*
# the self-updater as a second-line integrity check when the release's
# SHA256SUMS.txt asset can't be fetched.
!backend/data/release_digests.json
# Issue #244/#245/#246: one-shot carrier-position seed shipped with each
# release. Used ONLY on first-ever startup to bootstrap carrier_cache.json;
# after that the cache reflects this install's own GDELT observations.
!backend/data/carrier_seed.json
# OS generated files
.DS_Store
+21 -7
View File
@@ -24,14 +24,28 @@ AIS_API_KEY= # https://aisstream.io/ — free tier WebSocket key
# Requires MESH_DEBUG_MODE=true; do not enable this for ordinary use.
# ALLOW_INSECURE_ADMIN=false
# Default outbound User-Agent for all third-party HTTP fetchers.
# Project-generic by default — does NOT include any personal contact info or
# operator-specific identifier. Override only if you run a public relay and
# want upstreams to be able to reach you (e.g. Nominatim/OSM usage policy).
# SHADOWBROKER_USER_AGENT=ShadowBroker-OSINT/0.9 (contact: ops@example.com)
# Per-install operator handle. Round 7a: every outbound third-party API
# call (Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
# weather.gov, NUFORC, etc.) includes this handle in the User-Agent so
# upstreams can rate-limit / contact the specific install instead of
# treating every Shadowbroker user as one entity.
#
# Default empty -> a stable pseudonymous handle (e.g. "operator-7f3a92") is
# auto-generated on first run and persisted to backend/data/operator_handle.json.
# Operators who want a meaningful handle (real name, org, GitHub login) can
# set it here. Special characters are sanitized to dashes.
# OPERATOR_HANDLE=
# User-Agent for Nominatim geocoding requests (per OSM usage policy).
# NOMINATIM_USER_AGENT=ShadowBroker/1.0
# Default outbound User-Agent for all third-party HTTP fetchers. Operators
# who run a public relay and want a completely custom UA can set this; it
# bypasses the per-operator helper entirely. Most installs should leave it
# unset and use OPERATOR_HANDLE instead.
# SHADOWBROKER_USER_AGENT=
# Nominatim-specific User-Agent override (OSM usage policy). Leave unset to
# use the per-install handle (default) — set only if you have a registered
# Nominatim relay identity.
# NOMINATIM_USER_AGENT=
# ── Third-party fetcher opt-ins ────────────────────────────────
# These data sources phone home to politically/commercially sensitive
+10 -5
View File
@@ -45,6 +45,7 @@ from services.mesh.mesh_compatibility import (
from services.mesh.mesh_crypto import (
_derive_peer_key,
normalize_peer_url,
resolve_peer_key_for_url,
verify_signature,
verify_node_binding,
parse_public_key_algo,
@@ -1403,11 +1404,15 @@ def _peer_hmac_url_from_request(request: Request) -> str:
def _verify_peer_push_hmac(request: Request, body_bytes: bytes) -> bool:
"""Verify HMAC-SHA256 peer authentication on push requests."""
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip()
if not secret:
return False
"""Verify HMAC-SHA256 peer authentication on push requests.
Issue #256: ``resolve_peer_key_for_url`` looks up a per-peer secret
in ``MESH_PEER_SECRETS`` first, then falls back to the global
``MESH_PEER_PUSH_SECRET``. When a peer URL is listed in the per-peer
map, only the listed secret is accepted for it — the global secret
is ignored, so any peer that knows only the global secret cannot
forge a request claiming to be that peer.
"""
provided = str(request.headers.get("x-peer-hmac", "") or "").strip()
if not provided:
return False
@@ -1416,7 +1421,7 @@ def _verify_peer_push_hmac(request: Request, body_bytes: bytes) -> bool:
allowed_peers = set(authenticated_push_peer_urls())
if not peer_url or peer_url not in allowed_peers:
return False
peer_key = _derive_peer_key(secret, peer_url)
peer_key = resolve_peer_key_for_url(peer_url)
if not peer_key:
return False
+120
View File
@@ -0,0 +1,120 @@
{
"_meta": {
"as_of": "2026-03-09",
"source": "USNI News Fleet & Marine Tracker",
"source_url": "https://news.usni.org/2026/03/09/usni-news-fleet-and-marine-tracker-march-9-2026",
"note": "One-shot bootstrap for first-run carrier positions. Once carrier_cache.json exists in the runtime data volume, this seed file is never read again. All subsequent updates come from GDELT (and any future sources) and are written to carrier_cache.json. A year from now, your runtime cache reflects whatever your install has observed since first launch — not these snapshot positions."
},
"carriers": {
"CVN-68": {
"lat": 47.5535,
"lng": -122.6400,
"heading": 90,
"desc": "Bremerton, WA (Maintenance)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-76": {
"lat": 47.5580,
"lng": -122.6360,
"heading": 90,
"desc": "Bremerton, WA (Decommissioning)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-69": {
"lat": 36.9465,
"lng": -76.3265,
"heading": 0,
"desc": "Norfolk, VA (Post-deployment maintenance)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-78": {
"lat": 18.0,
"lng": 39.5,
"heading": 0,
"desc": "Red Sea — Operation Epic Fury (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-74": {
"lat": 36.98,
"lng": -76.43,
"heading": 0,
"desc": "Newport News, VA (RCOH refueling overhaul)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-75": {
"lat": 36.0,
"lng": 15.0,
"heading": 0,
"desc": "Mediterranean Sea deployment (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-77": {
"lat": 36.5,
"lng": -74.0,
"heading": 0,
"desc": "Atlantic — Pre-deployment workups (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-70": {
"lat": 32.6840,
"lng": -117.1290,
"heading": 180,
"desc": "San Diego, CA (Homeport)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-71": {
"lat": 32.6885,
"lng": -117.1280,
"heading": 180,
"desc": "San Diego, CA (Maintenance)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-72": {
"lat": 20.0,
"lng": 64.0,
"heading": 0,
"desc": "Arabian Sea — Operation Epic Fury (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-73": {
"lat": 35.2830,
"lng": 139.6700,
"heading": 180,
"desc": "Yokosuka, Japan (Forward deployed)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
}
}
}
+48 -19
View File
@@ -220,6 +220,7 @@ from services.mesh.mesh_crypto import (
_derive_peer_key,
derive_node_id,
normalize_peer_url,
resolve_peer_key_for_url,
verify_node_binding,
parse_public_key_algo,
)
@@ -1079,8 +1080,18 @@ def _public_mesh_log_size(entries: list[dict[str, Any]]) -> int:
return sum(1 for item in entries if _public_mesh_log_entry(item) is not None)
_WORMHOLE_PUBLIC_SETTINGS_FIELDS = {"enabled", "transport", "anonymous_mode"}
_WORMHOLE_PUBLIC_PROFILE_FIELDS = {"profile", "wormhole_enabled"}
# Issue #243 (tg12): the public redaction now exposes only the bare
# "is Wormhole on?" boolean. Transport choice (tor/i2p/mixnet/direct),
# anonymous-mode state, and the named privacy profile are all
# operational posture and were leaking actionable recon to any
# unauthenticated caller. They are now gated behind authenticated reads
# (admin key or scoped-view token). Loopback Tauri shells and Docker
# bridge frontend containers continue to see full status because the
# Next.js catch-all proxy injects the configured ADMIN_KEY for
# same-origin/non-browser callers (see PR #263), so legitimate operator
# UX is unaffected.
_WORMHOLE_PUBLIC_SETTINGS_FIELDS = {"enabled"}
_WORMHOLE_PUBLIC_PROFILE_FIELDS = {"wormhole_enabled"}
_PRIVATE_LANE_CONTROL_FIELDS = {"private_lane_tier", "private_lane_policy"}
_PUBLIC_RNS_STATUS_FIELDS = {"enabled", "ready", "configured_peers", "active_peers"}
_NODE_PUBLIC_EVENT_HOOK_REGISTERED = False
@@ -1745,10 +1756,12 @@ def _http_peer_push_loop() -> None:
_NODE_SYNC_STOP.wait(_PEER_PUSH_INTERVAL_S)
continue
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip()
if not secret:
_NODE_SYNC_STOP.wait(_PEER_PUSH_INTERVAL_S)
continue
# Issue #256: resolve_peer_key_for_url() handles both the
# legacy global MESH_PEER_PUSH_SECRET path and the per-peer
# MESH_PEER_SECRETS map. The per-peer skip happens below
# ("if not peer_key: continue"), so we don't gate the whole
# loop on the global secret being set — an install that only
# configures per-peer secrets is now valid.
peers = authenticated_push_peer_urls()
if not peers:
@@ -1778,7 +1791,7 @@ def _http_peer_push_loop() -> None:
ensure_ascii=False,
).encode("utf-8")
peer_key = _derive_peer_key(secret, normalized)
peer_key = resolve_peer_key_for_url(normalized)
if not peer_key:
continue
import hmac as _hmac_mod2
@@ -1831,10 +1844,7 @@ def _http_gate_pull_loop() -> None:
_NODE_SYNC_STOP.wait(_GATE_PULL_INTERVAL_S)
continue
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip()
if not secret:
_NODE_SYNC_STOP.wait(_GATE_PULL_INTERVAL_S)
continue
# Issue #256: per-peer key resolution; see _http_peer_push_loop.
peers = authenticated_push_peer_urls()
if not peers:
@@ -1846,7 +1856,7 @@ def _http_gate_pull_loop() -> None:
if not normalized:
continue
peer_key = _derive_peer_key(secret, normalized)
peer_key = resolve_peer_key_for_url(normalized)
if not peer_key:
continue
@@ -1959,10 +1969,7 @@ def _http_gate_push_loop() -> None:
_NODE_SYNC_STOP.wait(_PEER_PUSH_INTERVAL_S)
continue
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip()
if not secret:
_NODE_SYNC_STOP.wait(_PEER_PUSH_INTERVAL_S)
continue
# Issue #256: per-peer key resolution; see _http_peer_push_loop.
peers = authenticated_push_peer_urls()
if not peers:
@@ -1977,7 +1984,7 @@ def _http_gate_push_loop() -> None:
if not normalized:
continue
peer_key = _derive_peer_key(secret, normalized)
peer_key = resolve_peer_key_for_url(normalized)
if not peer_key:
continue
@@ -8141,8 +8148,12 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:
def _cctv_upstream_headers(request: Request, profile: _CCTVProxyProfile) -> dict[str, str]:
# Round 7a: per-install operator handle. See routers/cctv.py for the
# canonical handler; this duplicate stays in lockstep until the #239
# dedup ladder removes it.
from services.network_utils import outbound_user_agent
headers = {
"User-Agent": "Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)",
"User-Agent": f"Mozilla/5.0 (compatible; {outbound_user_agent('cctv-proxy')})",
**profile.headers,
}
range_header = request.headers.get("range")
@@ -8813,9 +8824,14 @@ async def api_uw_flow(request: Request):
from services.news_feed_config import get_feeds, save_feeds, reset_feeds
@app.get("/api/settings/news-feeds")
@app.get(
"/api/settings/news-feeds",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("30/minute")
async def api_get_news_feeds(request: Request):
"""Issue #252 (tg12): gated on local-operator. See the canonical
handler in backend/routers/admin.py for the full rationale."""
return get_feeds()
@@ -9018,9 +9034,22 @@ class NodeSettingsUpdate(BaseModel):
@app.get("/api/settings/node")
@limiter.limit("30/minute")
async def api_get_node_settings(request: Request):
"""Issue #243 (tg12): node mode and participant state are
operational posture. Anonymous callers receive an empty stub
enough for the UI to know the endpoint exists but nothing
fingerprintable. Authenticated callers see the full state.
Authenticated == local-operator (loopback / Docker bridge) OR an
admin / scoped-view token. The Tauri shell and Docker frontend
container both qualify via their existing transport (PR #263 +
PR #278), so legitimate operator UX is unchanged.
"""
from services.node_settings import read_node_settings
data = await asyncio.to_thread(read_node_settings)
authenticated = _scoped_view_authenticated(request, "node")
if not authenticated:
return {}
return {
**data,
"node_mode": _current_node_mode(),
-1
View File
@@ -13,7 +13,6 @@ dependencies = [
"apscheduler==3.10.3",
"beautifulsoup4>=4.9.0",
"cachetools==5.5.2",
"cloudscraper==1.2.71",
"cryptography>=41.0.0",
"defusedxml>=0.7.1",
"fastapi==0.115.12",
+52 -2
View File
@@ -82,9 +82,40 @@ async def api_get_keys_meta(request: Request):
return get_env_path_info()
@router.get("/api/settings/news-feeds")
@router.get(
"/api/settings/operator-handle",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("60/minute")
async def api_get_operator_handle(request: Request):
"""Round 7a: return the per-install operator handle so the frontend
can include it in browser-direct third-party API calls (Wikipedia /
Wikidata via lib/wikimediaClient). The handle is auto-generated on
first use; operators can override it via the OPERATOR_HANDLE setting
or the env var of the same name.
Gated on local-operator: legitimate browser usage goes through the
Next.js proxy which auto-attaches the admin key; remote scanners get
403. The handle itself isn't a secret (it's sent to every third-party
API the operator touches), but admin-gating it matches the rest of
the settings endpoints and follows least-privilege.
"""
from services.network_utils import get_operator_handle
return {"handle": get_operator_handle()}
@router.get(
"/api/settings/news-feeds",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("30/minute")
async def api_get_news_feeds(request: Request):
"""Issue #252 (tg12): the curated feed inventory is configuration
state, not a public data feed. Gated on local-operator so the
Tauri shell, the Docker bridge frontend, and any caller with an
admin key all see the full list; anonymous LAN/internet callers
can no longer enumerate operator source URLs.
"""
from services.news_feed_config import get_feeds
return get_feeds()
@@ -118,9 +149,18 @@ async def api_reset_news_feeds(request: Request):
@router.get("/api/settings/node")
@limiter.limit("30/minute")
async def api_get_node_settings(request: Request):
"""Issue #243 (tg12): node_mode and node_enabled are operational
posture. Anonymous callers receive an empty stub; authenticated
callers (local-operator or admin/scoped token) see the full
state. See the canonical handler in backend/main.py for the full
rationale.
"""
import asyncio
from auth import _scoped_view_authenticated
from services.node_settings import read_node_settings
data = await asyncio.to_thread(read_node_settings)
if not _scoped_view_authenticated(request, "node"):
return {}
return {
**data,
"node_mode": _current_node_mode(),
@@ -210,9 +250,19 @@ async def api_set_meshtastic_mqtt_settings(request: Request, body: MeshtasticMqt
return _meshtastic_runtime_snapshot()
@router.get("/api/settings/timemachine")
@router.get(
"/api/settings/timemachine",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("30/minute")
async def api_get_timemachine_settings(request: Request):
"""Issue #253 (tg12): archival-capture posture is operationally
sensitive — it tells a remote caller whether this deployment is
retaining replayable historical surveillance data. Gated on
local-operator so the Tauri shell and Docker bridge frontend
still see the toggle state, but anonymous LAN/internet callers
can no longer fingerprint Time Machine state.
"""
import asyncio
from services.node_settings import read_node_settings
data = await asyncio.to_thread(read_node_settings)
+7 -1
View File
@@ -18,6 +18,12 @@ from auth import require_local_operator, require_openclaw_or_local
from limiter import limiter
from services.fetchers._store import latest_data as _latest_data
def _ai_intel_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("ai-intel")
logger = logging.getLogger(__name__)
router = APIRouter()
@@ -447,7 +453,7 @@ async def ai_satellite_images(
"https://planetarycomputer.microsoft.com/api/stac/v1/search",
json=search_payload,
timeout=10,
headers={"User-Agent": "ShadowBroker-OSINT/1.0 (ai-intel)"},
headers={"User-Agent": _ai_intel_user_agent()},
)
resp.raise_for_status()
features = resp.json().get("features", [])
+7 -1
View File
@@ -165,7 +165,13 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:
def _cctv_upstream_headers(request: Request, profile: _CCTVProxyProfile) -> dict:
headers = {"User-Agent": "Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)", **profile.headers}
# Round 7a: per-install operator handle. Mozilla/5.0 prefix retained
# because many CCTV endpoints sniff for a browser-like prefix.
from services.network_utils import outbound_user_agent
headers = {
"User-Agent": f"Mozilla/5.0 (compatible; {outbound_user_agent('cctv-proxy')})",
**profile.headers,
}
range_header = request.headers.get("range")
if range_header:
headers["Range"] = range_header
+21 -4
View File
@@ -223,11 +223,21 @@ async def oracle_markets_more(request: Request, category: str = "NEWS", offset:
"has_more": offset + limit < len(cat_markets), "total": len(cat_markets)}
@router.post("/api/mesh/oracle/resolve")
@router.post(
"/api/mesh/oracle/resolve",
dependencies=[Depends(require_admin)],
)
@limiter.limit("5/minute")
@mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)
async def oracle_resolve(request: Request):
"""Resolve a prediction market."""
"""Resolve a prediction market.
Issue #240 (tg12): requires admin authentication. The
``mesh_write_exempt`` decorator below is **metadata only** — it tags
the route as not requiring a mesh signed-write envelope, it does
NOT itself enforce caller authorization. The ``Depends(require_admin)``
on the route decorator is what actually gates access.
"""
from services.mesh.mesh_oracle import oracle_ledger
body = await request.json()
market_title = body.get("market_title", "")
@@ -327,11 +337,18 @@ async def oracle_predictions(request: Request, node_id: str = ""):
active_predictions, authenticated=_scoped_view_authenticated(request, "mesh.audit"))
@router.post("/api/mesh/oracle/resolve-stakes")
@router.post(
"/api/mesh/oracle/resolve-stakes",
dependencies=[Depends(require_admin)],
)
@limiter.limit("5/minute")
@mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)
async def oracle_resolve_stakes(request: Request):
"""Resolve all expired stake contests."""
"""Resolve all expired stake contests.
Issue #241 (tg12): requires admin authentication. See the note on
``oracle_resolve`` above — ``mesh_write_exempt`` is metadata only.
"""
from services.mesh.mesh_oracle import oracle_ledger
resolutions = oracle_ledger.resolve_expired_stakes()
return {"ok": True, "resolutions": resolutions, "count": len(resolutions)}
+7 -2
View File
@@ -160,8 +160,13 @@ router = APIRouter()
# --- Constants ---
_WORMHOLE_PUBLIC_SETTINGS_FIELDS = {"enabled", "transport", "anonymous_mode"}
_WORMHOLE_PUBLIC_PROFILE_FIELDS = {"profile", "wormhole_enabled"}
# Issue #243 (tg12): the public redaction now exposes only the bare
# "is this on?" boolean. Transport choice, anonymous-mode state, and
# the named privacy profile were all leaking actionable recon to
# unauthenticated callers and are now gated behind authenticated reads.
# See the matching block in backend/main.py for the full rationale.
_WORMHOLE_PUBLIC_SETTINGS_FIELDS = {"enabled"}
_WORMHOLE_PUBLIC_PROFILE_FIELDS = {"wormhole_enabled"}
_PRIVATE_LANE_CONTROL_FIELDS = {"private_lane_tier", "private_lane_policy"}
_PUBLIC_RNS_STATUS_FIELDS = {"enabled", "ready", "configured_peers", "active_peers"}
_NODE_PUBLIC_EVENT_HOOK_REGISTERED = False
+11 -1
View File
@@ -20,7 +20,17 @@ OUT_PATH = Path(__file__).parent.parent / "data" / "power_plants.json"
def main() -> None:
print(f"Downloading WRI Global Power Plant Database from GitHub...")
req = urllib.request.Request(CSV_URL, headers={"User-Agent": "ShadowBroker-OSINT/1.0"})
# Round 7a: release-time data refresher. Uses the per-operator UA if
# available, otherwise a release-script-specific identifier. This
# script is run by the maintainer at release time, NOT at runtime,
# so an aggregate UA is acceptable; we still use the helper so the
# behavior matches the rest of the project.
try:
from services.network_utils import outbound_user_agent
ua = outbound_user_agent("release-script-power-plants")
except Exception:
ua = "Shadowbroker/0.9 (release-script-power-plants; +https://github.com/BigBodyCobain/Shadowbroker/issues)"
req = urllib.request.Request(CSV_URL, headers={"User-Agent": ua})
with urllib.request.urlopen(req, timeout=60) as resp:
raw = resp.read().decode("utf-8")
+371 -173
View File
@@ -1,46 +1,90 @@
"""
Carrier Strike Group OSINT Tracker
===================================
Scrapes multiple OSINT sources to maintain current estimated positions
for US Navy Carrier Strike Groups. Updates on startup + 00:00 & 12:00 UTC.
Maintains estimated positions for US Navy Carrier Strike Groups with
honest provenance and freshness signals.
Sources:
1. GDELT News API — recent carrier movement headlines
2. WikiVoyage / public port-call databases
3. Fallback — last-known or static OSINT estimates
Issues #244 / #245 / #246 (tg12 external audit):
The previous implementation baked a snapshot of USNI News Fleet &
Marine Tracker positions (March 9, 2026) into the registry as
``fallback_lat``/``fallback_lng`` and stamped ``updated = now()``
every time the dossier was rendered. That presented stale editorial
data as live state. It also persisted GDELT-derived positions to the
on-disk cache with no freshness signal, so a single news mention from
months ago could keep overriding the (already-stale) registry default
indefinitely.
Architecture after this PR:
::
backend/data/carrier_seed.json read-only, shipped with image,
used ONCE on first-ever startup
to bootstrap carrier_cache.json.
backend/data/carrier_cache.json mutable, lives in the runtime data
volume, written by every GDELT
refresh + any future source.
Startup flow:
1. ``carrier_cache.json`` exists? → load it.
2. Otherwise, copy ``carrier_seed.json`` → ``carrier_cache.json``,
then load it. (This happens once, ever, per install.)
3. Background: GDELT fetch runs. Any carrier mentioned in fresh news
gets its entry replaced with the news-derived position.
``position_source_at`` is set to the news article timestamp.
Freshness is a *labelling* decision, not an eviction decision:
- ``position_source_at`` within the configurable freshness window
(default 14 days) → ``position_confidence = "recent"``.
- Older than that → ``position_confidence = "stale"``.
- Bootstrapped from the seed file (never updated) → ``"seed"``.
- No cache entry at all (e.g. a carrier added to the registry after
first install) → carrier renders at its homeport with
``"homeport_default"``.
Carriers are never hidden, never teleported, never disappeared. The
position the user sees is always the last position the system actually
observed, with an honest "as-of" timestamp the UI can render however
it likes. A year from now, the runtime cache reflects whatever this
install has observed via GDELT — not the seed snapshot.
"""
import re
import os
import json
import time
import logging
import threading
import random
from datetime import datetime, timezone
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Dict, List, Optional
from typing import Any, Dict, List, Optional, Tuple
from services.network_utils import fetch_with_curl
logger = logging.getLogger(__name__)
# -----------------------------------------------------------------
# Carrier registry: hull number → metadata + fallback position
# Carrier registry: hull number → identity only.
#
# Issue #244 (tg12): the previous registry carried hard-coded
# ``fallback_lat``/``fallback_lng`` that were dated editorial
# snapshots from a 2026-03-09 article. Those fields are DELETED. The
# registry is now identity + homeport only; positions are sourced
# exclusively from carrier_cache.json (and via that, from the
# bootstrap seed or live OSINT).
# -----------------------------------------------------------------
CARRIER_REGISTRY: Dict[str, dict] = {
# Fallback positions sourced from USNI News Fleet & Marine Tracker (Mar 9, 2026)
# https://news.usni.org/2026/03/09/usni-news-fleet-and-marine-tracker-march-9-2026
# --- Bremerton, WA (Naval Base Kitsap) ---
# Distinct pier positions along Sinclair Inlet so carriers don't stack
"CVN-68": {
"name": "USS Nimitz (CVN-68)",
"wiki": "https://en.wikipedia.org/wiki/USS_Nimitz",
"homeport": "Bremerton, WA",
"homeport_lat": 47.5535,
"homeport_lng": -122.6400,
"fallback_lat": 47.5535,
"fallback_lng": -122.6400,
"fallback_heading": 90,
"fallback_desc": "Bremerton, WA (Maintenance)",
},
"CVN-76": {
"name": "USS Ronald Reagan (CVN-76)",
@@ -48,23 +92,14 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Bremerton, WA",
"homeport_lat": 47.5580,
"homeport_lng": -122.6360,
"fallback_lat": 47.5580,
"fallback_lng": -122.6360,
"fallback_heading": 90,
"fallback_desc": "Bremerton, WA (Decommissioning)",
},
# --- Norfolk, VA (Naval Station Norfolk) ---
# Piers run N-S along Willoughby Bay; each carrier gets a distinct berth
"CVN-69": {
"name": "USS Dwight D. Eisenhower (CVN-69)",
"wiki": "https://en.wikipedia.org/wiki/USS_Dwight_D._Eisenhower",
"homeport": "Norfolk, VA",
"homeport_lat": 36.9465,
"homeport_lng": -76.3265,
"fallback_lat": 36.9465,
"fallback_lng": -76.3265,
"fallback_heading": 0,
"fallback_desc": "Norfolk, VA (Post-deployment maintenance)",
},
"CVN-78": {
"name": "USS Gerald R. Ford (CVN-78)",
@@ -72,10 +107,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA",
"homeport_lat": 36.9505,
"homeport_lng": -76.3250,
"fallback_lat": 18.0,
"fallback_lng": 39.5,
"fallback_heading": 0,
"fallback_desc": "Red Sea — Operation Epic Fury (USNI Mar 9)",
},
"CVN-74": {
"name": "USS John C. Stennis (CVN-74)",
@@ -83,10 +114,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA",
"homeport_lat": 36.9540,
"homeport_lng": -76.3235,
"fallback_lat": 36.98,
"fallback_lng": -76.43,
"fallback_heading": 0,
"fallback_desc": "Newport News, VA (RCOH refueling overhaul)",
},
"CVN-75": {
"name": "USS Harry S. Truman (CVN-75)",
@@ -94,10 +121,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA",
"homeport_lat": 36.9580,
"homeport_lng": -76.3220,
"fallback_lat": 36.0,
"fallback_lng": 15.0,
"fallback_heading": 0,
"fallback_desc": "Mediterranean Sea deployment (USNI Mar 9)",
},
"CVN-77": {
"name": "USS George H.W. Bush (CVN-77)",
@@ -105,23 +128,14 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA",
"homeport_lat": 36.9620,
"homeport_lng": -76.3210,
"fallback_lat": 36.5,
"fallback_lng": -74.0,
"fallback_heading": 0,
"fallback_desc": "Atlantic — Pre-deployment workups (USNI Mar 9)",
},
# --- San Diego, CA (Naval Base San Diego) ---
# Carrier piers along the east shore of San Diego Bay, spread N-S
"CVN-70": {
"name": "USS Carl Vinson (CVN-70)",
"wiki": "https://en.wikipedia.org/wiki/USS_Carl_Vinson",
"homeport": "San Diego, CA",
"homeport_lat": 32.6840,
"homeport_lng": -117.1290,
"fallback_lat": 32.6840,
"fallback_lng": -117.1290,
"fallback_heading": 180,
"fallback_desc": "San Diego, CA (Homeport)",
},
"CVN-71": {
"name": "USS Theodore Roosevelt (CVN-71)",
@@ -129,10 +143,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "San Diego, CA",
"homeport_lat": 32.6885,
"homeport_lng": -117.1280,
"fallback_lat": 32.6885,
"fallback_lng": -117.1280,
"fallback_heading": 180,
"fallback_desc": "San Diego, CA (Maintenance)",
},
"CVN-72": {
"name": "USS Abraham Lincoln (CVN-72)",
@@ -140,10 +150,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "San Diego, CA",
"homeport_lat": 32.6925,
"homeport_lng": -117.1275,
"fallback_lat": 20.0,
"fallback_lng": 64.0,
"fallback_heading": 0,
"fallback_desc": "Arabian Sea — Operation Epic Fury (USNI Mar 9)",
},
# --- Yokosuka, Japan (CFAY) ---
"CVN-73": {
@@ -152,16 +158,18 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Yokosuka, Japan",
"homeport_lat": 35.2830,
"homeport_lng": 139.6700,
"fallback_lat": 35.2830,
"fallback_lng": 139.6700,
"fallback_heading": 180,
"fallback_desc": "Yokosuka, Japan (Forward deployed)",
},
}
# -----------------------------------------------------------------
# Region → approximate center coordinates
# Used to map textual geographic descriptions to lat/lng
# Region → approximate center coordinates.
#
# Issue #245 (tg12): converting a region name straight into precise
# map coordinates is false precision. We still use this table to
# infer a coarse position from a headline mention, but the resulting
# carrier object is now stamped ``position_confidence = "approximate"``
# so the UI can render an uncertainty radius / dimmed icon. The
# centroid is a best-effort midpoint of the named body of water.
# -----------------------------------------------------------------
REGION_COORDS: Dict[str, tuple] = {
# Oceans & Seas
@@ -220,9 +228,39 @@ REGION_COORDS: Dict[str, tuple] = {
}
# -----------------------------------------------------------------
# Cache file for persisting positions between restarts
# Files
# -----------------------------------------------------------------
CACHE_FILE = Path(__file__).parent.parent / "carrier_cache.json"
#
# The seed lives in the read-only image data dir (it ships with each
# release). The cache lives in the same data dir but is written at
# runtime; under Docker compose this dir is volume-mounted so the
# cache persists across container restarts, which is the whole point
# of the seed-then-observe model — the user's runtime observations
# survive image upgrades.
SEED_FILE = Path(__file__).parent.parent / "data" / "carrier_seed.json"
CACHE_FILE = Path(__file__).parent.parent / "data" / "carrier_cache.json"
# -----------------------------------------------------------------
# Freshness window for position_confidence labeling. Issue #246 (tg12):
# previously persisted cache entries had no freshness signal at all.
# After this change, the position itself is preserved (we never lose
# what was last observed) but the confidence label flips from
# "recent" to "stale" once the underlying source is older than this
# window. Operator-overridable via env var.
# -----------------------------------------------------------------
_DEFAULT_FRESHNESS_WINDOW_DAYS = 14
def _freshness_window_days() -> int:
raw = str(os.environ.get("SHADOWBROKER_CARRIER_FRESHNESS_DAYS", "") or "").strip()
if not raw:
return _DEFAULT_FRESHNESS_WINDOW_DAYS
try:
n = int(raw)
return n if n > 0 else _DEFAULT_FRESHNESS_WINDOW_DAYS
except (TypeError, ValueError):
return _DEFAULT_FRESHNESS_WINDOW_DAYS
_carrier_positions: Dict[str, dict] = {}
_positions_lock = threading.Lock()
@@ -234,25 +272,159 @@ _GDELT_REQUEST_DELAY_SECONDS = 1.25
_GDELT_REQUEST_JITTER_SECONDS = 0.35
def _now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
def _parse_iso(ts: str) -> Optional[datetime]:
if not ts:
return None
try:
# Python's fromisoformat accepts +00:00 but not 'Z' until 3.11.
normalized = ts.replace("Z", "+00:00")
dt = datetime.fromisoformat(normalized)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except (TypeError, ValueError):
return None
def _compute_position_confidence(entry: dict, *, now: Optional[datetime] = None) -> str:
"""Return the public confidence label for a carrier cache entry.
Order of precedence:
- explicit "homeport_default" / "seed" labels are preserved.
- dated entries (with position_source_at) are "recent" if within
the configured freshness window, else "stale".
- missing position_source_at falls through to "stale".
"""
raw_label = str(entry.get("position_confidence", "") or "").strip()
# Explicit "kind of provenance" labels are preserved as-is. They
# describe HOW we got the position, not WHEN — a fresh headline-to-
# centroid match (#245) is still imprecise no matter how recently
# it was observed, and the seed (#244) is always the seed.
if raw_label in {"seed", "homeport_default", "approximate"}:
# Approximate entries can still age into "stale_approximate" if
# they fall out of the freshness window — that distinction lets
# the UI render a different badge for old-and-imprecise vs
# recent-and-imprecise. seed/homeport_default never age (they
# were never timestamped against real observations).
if raw_label == "approximate":
source_at = _parse_iso(str(entry.get("position_source_at", "") or ""))
if source_at is not None:
reference = now or datetime.now(timezone.utc)
if reference - source_at > timedelta(days=_freshness_window_days()):
return "stale_approximate"
return raw_label
source_at = _parse_iso(str(entry.get("position_source_at", "") or ""))
if not source_at:
return "stale"
reference = now or datetime.now(timezone.utc)
window = timedelta(days=_freshness_window_days())
if reference - source_at <= window:
return "recent"
return "stale"
def _load_seed() -> Dict[str, dict]:
"""Load the read-only seed file shipped with the image.
Returns a hull→entry dict (no _meta wrapper). Missing or malformed
seed files yield an empty dict — the caller falls back to homeport
defaults.
"""
try:
if not SEED_FILE.exists():
logger.info("Carrier seed file not present at %s; first-run will fall back to homeport defaults", SEED_FILE)
return {}
raw = json.loads(SEED_FILE.read_text(encoding="utf-8"))
carriers = raw.get("carriers", {}) if isinstance(raw, dict) else {}
if not isinstance(carriers, dict):
return {}
logger.info("Carrier seed loaded: %d entries from %s", len(carriers), SEED_FILE)
return carriers
except (IOError, OSError, json.JSONDecodeError, ValueError) as e:
logger.warning("Failed to load carrier seed file %s: %s", SEED_FILE, e)
return {}
def _load_cache() -> Dict[str, dict]:
"""Load cached carrier positions from disk."""
"""Load the mutable cache (last-known positions persisted between restarts)."""
try:
if CACHE_FILE.exists():
data = json.loads(CACHE_FILE.read_text())
logger.info(f"Carrier cache loaded: {len(data)} carriers from {CACHE_FILE}")
return data
data = json.loads(CACHE_FILE.read_text(encoding="utf-8"))
if isinstance(data, dict):
logger.info("Carrier cache loaded: %d carriers from %s", len(data), CACHE_FILE)
return data
except (IOError, OSError, json.JSONDecodeError, ValueError) as e:
logger.warning(f"Failed to load carrier cache: {e}")
logger.warning("Failed to load carrier cache: %s", e)
return {}
def _save_cache(positions: Dict[str, dict]):
"""Persist carrier positions to disk."""
def _save_cache(positions: Dict[str, dict]) -> None:
"""Persist the mutable cache. Atomic write (temp + rename) so a crash
mid-write can't leave the file truncated."""
try:
CACHE_FILE.write_text(json.dumps(positions, indent=2))
logger.info(f"Carrier cache saved: {len(positions)} carriers")
CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
tmp = CACHE_FILE.with_suffix(CACHE_FILE.suffix + ".tmp")
tmp.write_text(json.dumps(positions, indent=2), encoding="utf-8")
# On Windows os.replace is atomic and overwrites existing files.
os.replace(tmp, CACHE_FILE)
logger.info("Carrier cache saved: %d carriers", len(positions))
except (IOError, OSError) as e:
logger.warning(f"Failed to save carrier cache: {e}")
logger.warning("Failed to save carrier cache: %s", e)
def _homeport_entry_for(hull: str) -> Optional[dict]:
"""Return a homeport-default cache entry for a hull, or None if the
hull is not in the registry."""
info = CARRIER_REGISTRY.get(hull)
if not info:
return None
return {
"lat": info["homeport_lat"],
"lng": info["homeport_lng"],
"heading": 0,
"desc": f"{info['homeport']} (no observations yet)",
"source": f"Homeport default ({info['homeport']})",
"source_url": info.get("wiki", ""),
"position_source_at": _now_iso(),
"position_confidence": "homeport_default",
}
def _bootstrap_cache_if_missing() -> Dict[str, dict]:
"""One-shot: if no cache exists, materialize one from the seed file.
Returns the cache contents (hull→entry). On first-ever startup,
this writes ``carrier_cache.json`` so subsequent restarts skip the
seed entirely. Operator-deleted caches re-bootstrap the same way —
operators can use that to "reset" carrier positions, but it's an
explicit operator action.
"""
if CACHE_FILE.exists():
return _load_cache()
seed = _load_seed()
if not seed:
# No seed file either. Build a homeport-default cache so the
# first save_cache call still produces something honest.
homeports: Dict[str, dict] = {}
for hull in CARRIER_REGISTRY:
entry = _homeport_entry_for(hull)
if entry is not None:
homeports[hull] = entry
if homeports:
_save_cache(homeports)
return homeports
# Persist the seed as the first cache so subsequent runs skip this branch.
_save_cache(seed)
logger.info("Carrier cache bootstrapped from seed (first-ever startup)")
return dict(seed)
def _match_region(text: str) -> Optional[tuple]:
@@ -270,10 +442,8 @@ def _match_carrier(text: str) -> Optional[str]:
for hull, info in CARRIER_REGISTRY.items():
hull_check = hull.lower().replace("-", "")
name_parts = info["name"].lower()
# Match hull number (e.g., "CVN-78", "CVN78")
if hull.lower() in text_lower or hull_check in text_lower.replace("-", ""):
return hull
# Match ship name (e.g., "Ford", "Eisenhower", "Vinson")
ship_name = name_parts.split("(")[0].strip()
last_name = ship_name.split()[-1] if ship_name else ""
if last_name and len(last_name) > 3 and last_name in text_lower:
@@ -323,8 +493,9 @@ def _fetch_gdelt_carrier_news() -> List[dict]:
articles = data.get("articles", [])
for art in articles:
title = art.get("title", "")
url = art.get("url", "")
results.append({"title": title, "url": url})
article_url = art.get("url", "")
article_at = art.get("seendate") or art.get("date") or ""
results.append({"title": title, "url": article_url, "seendate": article_at})
except (ConnectionError, TimeoutError, ValueError, KeyError, OSError) as e:
logger.debug(f"GDELT search failed for '{term}': {e}")
continue
@@ -340,108 +511,139 @@ def _fetch_gdelt_carrier_news() -> List[dict]:
return results
def _gdelt_seendate_to_iso(seendate: str) -> Optional[str]:
"""GDELT returns YYYYMMDDhhmmss (UTC). Convert to ISO8601 for
position_source_at. Returns None if the input is unparseable."""
raw = (seendate or "").strip()
if len(raw) < 8 or not raw.isdigit():
return None
try:
dt = datetime.strptime(raw[:14] if len(raw) >= 14 else raw[:8] + "000000", "%Y%m%d%H%M%S")
return dt.replace(tzinfo=timezone.utc).isoformat()
except (TypeError, ValueError):
return None
def _parse_carrier_positions_from_news(articles: List[dict]) -> Dict[str, dict]:
"""Parse carrier positions from news article titles and descriptions."""
"""Parse carrier positions from news article titles.
Issue #245 (tg12): the position is a region centroid, which is
coarse — we now stamp ``position_confidence = "approximate"`` so
the UI can render that uncertainty. Issue #244: the
``position_source_at`` field is the news article's actual seen
date, NOT now(), so the freshness check correctly flips entries
to "stale" once they age past the configured window.
"""
updates: Dict[str, dict] = {}
for article in articles:
title = article.get("title", "")
# Try to match a carrier from the title
hull = _match_carrier(title)
if not hull:
continue
# Try to match a region from the title
coords = _match_region(title)
if not coords:
continue
# Only update if we haven't seen this carrier yet (first match wins — most recent)
# First match wins (most recent article, GDELT returns newest first
# per term).
if hull not in updates:
iso_at = _gdelt_seendate_to_iso(str(article.get("seendate", ""))) or _now_iso()
updates[hull] = {
"lat": coords[0],
"lng": coords[1],
"heading": 0,
"desc": title[:100],
"source": "GDELT News API",
"source": "GDELT News API (headline region match — approximate)",
"source_url": article.get("url", "https://api.gdeltproject.org"),
"updated": datetime.now(timezone.utc).isoformat(),
"position_source_at": iso_at,
# Headline-to-centroid match is explicitly approximate.
"position_confidence": "approximate",
}
logger.info(
f"Carrier update: {CARRIER_REGISTRY[hull]['name']}{coords} (from: {title[:80]})"
"Carrier update: %s%s (from: %s)",
CARRIER_REGISTRY[hull]["name"],
coords,
title[:80],
)
return updates
def _load_carrier_fallbacks() -> Dict[str, dict]:
"""Build carrier positions from static fallbacks + disk cache (instant, no network)."""
positions: Dict[str, dict] = {}
for hull, info in CARRIER_REGISTRY.items():
positions[hull] = {
"name": info["name"],
"lat": info["fallback_lat"],
"lng": info["fallback_lng"],
"heading": info["fallback_heading"],
"desc": info["fallback_desc"],
"wiki": info["wiki"],
"source": "USNI News Fleet & Marine Tracker",
"source_url": "https://news.usni.org/category/fleet-tracker",
"updated": datetime.now(timezone.utc).isoformat(),
}
# Overlay cached positions from previous runs (may have GDELT data)
cached = _load_cache()
for hull, cached_pos in cached.items():
if hull in positions:
if cached_pos.get("source", "").startswith("GDELT") or cached_pos.get(
"source", ""
).startswith("News"):
positions[hull].update(
{
"lat": cached_pos["lat"],
"lng": cached_pos["lng"],
"desc": cached_pos.get("desc", positions[hull]["desc"]),
"source": cached_pos.get("source", "Cached OSINT"),
"updated": cached_pos.get("updated", ""),
}
)
return positions
def _enrich_for_rendering(hull: str, entry: dict, *, now: Optional[datetime] = None) -> dict:
"""Add live computed fields (confidence label, last_osint_update)
on top of the persisted cache entry. The persisted entry is left
untouched; this function builds the public-facing object.
"""
info = CARRIER_REGISTRY.get(hull, {})
confidence = _compute_position_confidence(entry, now=now)
return {
"name": entry.get("name", info.get("name", hull)),
"lat": entry["lat"],
"lng": entry["lng"],
"heading": entry.get("heading", 0),
"desc": entry.get("desc", ""),
"wiki": entry.get("wiki", info.get("wiki", "")),
"source": entry.get("source", "OSINT estimated position"),
"source_url": entry.get("source_url", ""),
"position_source_at": entry.get("position_source_at", ""),
"position_confidence": confidence,
# Existing field preserved for backward compatibility with the
# current frontend ShipPopup; now reflects the SOURCE's observed
# time (not now()), so "last reported X days ago" is honest.
"last_osint_update": entry.get("position_source_at", ""),
# Convenience boolean for the UI: true when the position is
# NOT live OSINT (used to render dimmed icons / badges).
"is_fallback": confidence in {"seed", "stale", "stale_approximate", "homeport_default"},
}
def update_carrier_positions():
"""Main update function — called on startup and every 12h.
def update_carrier_positions() -> None:
"""Refresh carrier positions.
Phase 1 (instant): publish fallback + cached positions so the map has carriers immediately.
Phase 2 (slow): query GDELT for fresh OSINT positions and update in-place.
Phase 1 (instant): publish whatever's in carrier_cache.json (or
bootstrap from seed on first-ever run), so the map has carriers
immediately.
Phase 2 (slow): query GDELT and replace position entries for any
carrier mentioned in fresh news. Persist back to cache.
"""
global _last_update
# --- Phase 1: instant fallback + cache ---
positions = _load_carrier_fallbacks()
# --- Phase 1: instant cache (bootstrap from seed on first-ever run) ---
positions = _bootstrap_cache_if_missing()
# Ensure every registered hull has SOMETHING in the cache. A hull
# the seed didn't cover (e.g. added after install) renders at its
# homeport with "homeport_default" confidence.
for hull in CARRIER_REGISTRY:
if hull not in positions:
entry = _homeport_entry_for(hull)
if entry is not None:
positions[hull] = entry
with _positions_lock:
# Only overwrite if positions are currently empty (first startup).
# If we already have data from a previous cycle, keep it while GDELT runs.
if not _carrier_positions:
_carrier_positions.update(positions)
_last_update = datetime.now(timezone.utc)
logger.info(
f"Carrier tracker: {len(positions)} carriers loaded from fallback/cache (GDELT enrichment starting...)"
"Carrier tracker: %d carriers loaded from cache (GDELT enrichment starting...)",
len(positions),
)
# --- Phase 2: slow GDELT enrichment ---
# --- Phase 2: GDELT enrichment ---
try:
articles = _fetch_gdelt_carrier_news()
news_positions = _parse_carrier_positions_from_news(articles)
for hull, pos in news_positions.items():
if hull in positions:
positions[hull].update(pos)
logger.info(f"Carrier OSINT: updated {CARRIER_REGISTRY[hull]['name']} from news")
# Always overwrite — newest GDELT mention wins. The previous
# entry's position is preserved in git history and the next
# cycle either confirms or replaces it.
positions[hull] = pos
logger.info("Carrier OSINT: updated %s from news", CARRIER_REGISTRY[hull]["name"])
except (ValueError, KeyError, json.JSONDecodeError, OSError) as e:
logger.warning(f"GDELT carrier fetch failed: {e}")
logger.warning("GDELT carrier fetch failed: %s", e)
# Save and update the global state with enriched positions
with _positions_lock:
_carrier_positions.clear()
_carrier_positions.update(positions)
@@ -449,21 +651,15 @@ def update_carrier_positions():
_save_cache(positions)
sources = {}
for p in positions.values():
src = p.get("source", "unknown")
sources[src] = sources.get(src, 0) + 1
logger.info(f"Carrier tracker: {len(positions)} carriers updated. Sources: {sources}")
confidences: Dict[str, int] = {}
for entry in positions.values():
label = _compute_position_confidence(entry)
confidences[label] = confidences.get(label, 0) + 1
logger.info("Carrier tracker: %d carriers updated. Confidence: %s", len(positions), confidences)
def _deconflict_positions(result: List[dict]) -> List[dict]:
"""Offset carriers that share identical coordinates so they don't stack.
At port: offset along the pier axis (~500m / 0.004° apart).
At sea: offset perpendicular to each other (~0.08° / ~9km apart)
so they're visibly separate but clearly operating together.
"""
# Group by rounded lat/lng (within ~0.01° ≈ 1km = same spot)
"""Offset carriers that share identical coordinates so they don't stack."""
from collections import defaultdict
groups: dict[str, list[int]] = defaultdict(list)
@@ -475,7 +671,6 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
if len(indices) < 2:
continue
n = len(indices)
# Determine if this is a port (near a homeport) or at sea
sample = result[indices[0]]
at_port = any(
abs(sample["lat"] - info.get("homeport_lat", 0)) < 0.05
@@ -484,7 +679,6 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
)
if at_port:
# Use each carrier's distinct homeport pier coordinates
for idx in indices:
carrier = result[idx]
hull = None
@@ -497,8 +691,7 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
carrier["lat"] = info["homeport_lat"]
carrier["lng"] = info["homeport_lng"]
else:
# At sea: spread in a line perpendicular to travel (~0.08° apart)
spacing = 0.08 # ~9km — close enough to see they're together
spacing = 0.08
start_offset = -(n - 1) * spacing / 2
for j, idx in enumerate(indices):
result[idx]["lng"] += start_offset + j * spacing
@@ -507,36 +700,44 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
def get_carrier_positions() -> List[dict]:
"""Return current carrier positions for the data pipeline."""
"""Return current carrier positions for the data pipeline.
Each entry has the full provenance + freshness fields; the UI can
decide how to render them. Carriers are never hidden — only
labeled.
"""
now = datetime.now(timezone.utc)
with _positions_lock:
result = []
for hull, pos in _carrier_positions.items():
info = CARRIER_REGISTRY.get(hull, {})
result: List[dict] = []
for hull, entry in _carrier_positions.items():
enriched = _enrich_for_rendering(hull, entry, now=now)
result.append(
{
"name": pos.get("name", info.get("name", hull)),
"name": enriched["name"],
"type": "carrier",
"lat": pos["lat"],
"lng": pos["lng"],
"heading": None, # Heading unknown for carriers — OSINT cannot determine true heading
"lat": enriched["lat"],
"lng": enriched["lng"],
"heading": None, # OSINT cannot determine true heading.
"sog": 0,
"cog": 0,
"country": "United States",
"desc": pos.get("desc", ""),
"wiki": pos.get("wiki", info.get("wiki", "")),
"desc": enriched["desc"],
"wiki": enriched["wiki"],
"estimated": True,
"source": pos.get("source", "OSINT estimated position"),
"source_url": pos.get(
"source_url", "https://news.usni.org/category/fleet-tracker"
),
"last_osint_update": pos.get("updated", ""),
"source": enriched["source"],
"source_url": enriched["source_url"],
"last_osint_update": enriched["last_osint_update"],
# New fields (additive — existing UI continues to work):
"position_source_at": enriched["position_source_at"],
"position_confidence": enriched["position_confidence"],
"is_fallback": enriched["is_fallback"],
}
)
return _deconflict_positions(result)
# -----------------------------------------------------------------
# Scheduler: runs at startup, then at 00:00 and 12:00 UTC daily
# Scheduler: runs at startup, then at 00:00 and 12:00 UTC daily.
# -----------------------------------------------------------------
_scheduler_thread: Optional[threading.Thread] = None
_scheduler_stop = threading.Event()
@@ -544,7 +745,6 @@ _scheduler_stop = threading.Event()
def _scheduler_loop():
"""Background thread that triggers updates at 00:00 and 12:00 UTC."""
# Initial update on startup
try:
update_carrier_positions()
except Exception as e:
@@ -552,7 +752,6 @@ def _scheduler_loop():
while not _scheduler_stop.is_set():
now = datetime.now(timezone.utc)
# Next target: 00:00 or 12:00 UTC, whichever is sooner
hour = now.hour
if hour < 12:
next_hour = 12
@@ -561,18 +760,17 @@ def _scheduler_loop():
next_run = now.replace(hour=next_hour % 24, minute=0, second=0, microsecond=0)
if next_hour == 24:
from datetime import timedelta
next_run = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
wait_seconds = (next_run - now).total_seconds()
logger.info(
f"Carrier tracker: next update at {next_run.isoformat()} ({wait_seconds/3600:.1f}h)"
"Carrier tracker: next update at %s (%.1fh)",
next_run.isoformat(),
wait_seconds / 3600,
)
# Wait until next scheduled time, or until stop event
if _scheduler_stop.wait(timeout=wait_seconds):
break # Stop event was set
break
try:
update_carrier_positions()
+19
View File
@@ -53,6 +53,12 @@ class Settings(BaseSettings):
MESH_RELAY_FAILURE_COOLDOWN_S: int = 120
MESH_BOOTSTRAP_SEED_FAILURE_COOLDOWN_S: int = 15
MESH_PEER_PUSH_SECRET: str = ""
# Issue #256 (tg12): optional per-peer HMAC secret map. Comma-separated
# `url=secret` pairs. When a peer URL appears here, only that per-peer
# secret is accepted for it — the global MESH_PEER_PUSH_SECRET above is
# ignored for that specific URL. Single-peer installs and unmigrated
# multi-peer installs leave this empty and behavior is unchanged.
MESH_PEER_SECRETS: str = ""
MESH_RNS_APP_NAME: str = "shadowbroker"
MESH_RNS_ASPECT: str = "infonet"
MESH_RNS_IDENTITY_PATH: str = ""
@@ -289,6 +295,19 @@ class Settings(BaseSettings):
# service operator can identify per-install traffic instead of a generic
# "ShadowBroker" aggregate.
MESHTASTIC_OPERATOR_CALLSIGN: str = ""
# Per-install operator handle used in the User-Agent for EVERY third-party
# API the backend calls (Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz,
# Broadcastify, weather.gov, NUFORC, etc.). The default is empty, in which
# case backend/services/network_utils.py auto-generates a stable
# pseudonymous handle like "operator-7f3a92" on first use and caches it.
# Operators who want to identify themselves with a real handle can set
# this; operators who want to stay pseudonymous can leave it empty.
#
# The handle is sent ONLY to public third-party APIs. It is NEVER mixed
# into mesh / Wormhole / Infonet identity (those have their own crypto
# identity layer; conflating the two would leak public attribution into
# private mesh state).
OPERATOR_HANDLE: str = ""
# SAR (Synthetic Aperture Radar) data layer
# Mode A — free catalog metadata, no account, default-on
+8 -1
View File
@@ -16,8 +16,15 @@ from typing import Any
import requests
from services.network_utils import outbound_user_agent
logger = logging.getLogger(__name__)
def _feed_ingester_user_agent() -> str:
# Round 7a: per-install attribution for operator-curated feed URLs.
return outbound_user_agent("feed-ingester")
# ---------------------------------------------------------------------------
# State
# ---------------------------------------------------------------------------
@@ -157,7 +164,7 @@ def _fetch_layer_feed(layer: dict[str, Any]) -> None:
resp = requests.get(
feed_url,
timeout=_FETCH_TIMEOUT,
headers={"User-Agent": "ShadowBroker-FeedIngester/1.0"},
headers={"User-Agent": _feed_ingester_user_agent()},
)
resp.raise_for_status()
data = resp.json()
@@ -21,6 +21,13 @@ from typing import Any
import defusedxml.ElementTree as ET
import requests
def _aircraft_db_user_agent() -> str:
"""Round 7a: lazy import so the per-install operator handle is included."""
from services.network_utils import outbound_user_agent
return outbound_user_agent("aircraft-database")
logger = logging.getLogger(__name__)
_BUCKET_LIST_URL = (
@@ -44,7 +51,7 @@ def _latest_snapshot_key() -> str:
response = requests.get(
_BUCKET_LIST_URL,
timeout=_LIST_TIMEOUT_S,
headers={"User-Agent": _USER_AGENT},
headers={"User-Agent": _aircraft_db_user_agent()},
)
response.raise_for_status()
root = ET.fromstring(response.text)
@@ -71,7 +78,7 @@ def _stream_csv_index(url: str) -> dict[str, dict[str, str]]:
url,
timeout=_DOWNLOAD_TIMEOUT_S,
stream=True,
headers={"User-Agent": _USER_AGENT},
headers={"User-Agent": _aircraft_db_user_agent()},
) as response:
response.raise_for_status()
line_iter = (
+19 -10
View File
@@ -15,7 +15,11 @@ import time
import heapq
from datetime import datetime, timedelta
from pathlib import Path
from services.network_utils import external_curl_fallback_enabled, fetch_with_curl
from services.network_utils import (
external_curl_fallback_enabled,
fetch_with_curl,
outbound_user_agent,
)
from services.fetchers._store import latest_data, _data_lock, _mark_fresh
from services.fetchers.nuforc_enrichment import enrich_sighting
from services.fetchers.retry import with_retry
@@ -279,13 +283,13 @@ def fetch_weather_alerts():
return
alerts = []
try:
# weather.gov requires a User-Agent per their API policy, but it
# need not identify the operator. Use a project-generic string and
# let the user override via SHADOWBROKER_USER_AGENT if needed.
from services.network_utils import DEFAULT_USER_AGENT
# weather.gov requires a User-Agent per their API policy. Round 7a:
# send the per-install operator handle so they can rate-limit per
# operator instead of treating "Shadowbroker" as one entity.
from services.network_utils import outbound_user_agent
url = "https://api.weather.gov/alerts/active?status=actual"
headers = {
"User-Agent": DEFAULT_USER_AGENT,
"User-Agent": outbound_user_agent("weather-gov"),
"Accept": "application/geo+json",
}
response = fetch_with_curl(url, timeout=15, headers=headers)
@@ -713,7 +717,12 @@ _NUFORC_LIVE_NONCE_RE = re.compile(
r'id=["\']wdtNonceFrontendServerSide_1["\'][^>]*value=["\']([a-f0-9]+)["\']'
)
_NUFORC_LIVE_SIGHTING_ID_RE = re.compile(r"id=(\d+)")
_NUFORC_LIVE_USER_AGENT = "Mozilla/5.0 (ShadowBroker-OSINT NUFORC-fetcher)"
# Round 7a: NUFORC's site is sensitive to non-browser UAs but we send a
# per-install operator handle prefixed by Mozilla/5.0 so we're identifiable
# without being aggregately blocked. Operators who want stricter privacy
# can override the entire UA via SHADOWBROKER_USER_AGENT.
def _nuforc_live_user_agent() -> str:
return f"Mozilla/5.0 ({outbound_user_agent('nuforc-live')})"
_NUFORC_LIVE_SESSION_COOKIES = _NUFORC_DATA_DIR / "nuforc_session.cookies"
# Sample grid covering continental US, Alaska, Hawaii, Canada, UK, Australia
@@ -957,7 +966,7 @@ def _photon_lookup(query: str) -> list[float] | None:
res = fetch_with_curl(
url,
headers={
"User-Agent": "ShadowBroker-OSINT/1.0 (NUFORC-UAP-layer)",
"User-Agent": outbound_user_agent("nuforc-uap-geocode"),
"Accept-Language": "en",
},
timeout=10,
@@ -1053,7 +1062,7 @@ def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]:
index_res = subprocess.run(
[
curl_bin, "-sL",
"-A", _NUFORC_LIVE_USER_AGENT,
"-A", _nuforc_live_user_agent(),
"-c", str(cookie_jar),
"-b", str(cookie_jar),
index_url,
@@ -1089,7 +1098,7 @@ def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]:
ajax_res = subprocess.run(
[
curl_bin, "-sL",
"-A", _NUFORC_LIVE_USER_AGENT,
"-A", _nuforc_live_user_agent(),
"-c", str(cookie_jar),
"-b", str(cookie_jar),
"-X", "POST",
+2 -2
View File
@@ -6,7 +6,7 @@ import heapq
import logging
from pathlib import Path
from cachetools import TTLCache
from services.network_utils import fetch_with_curl
from services.network_utils import fetch_with_curl, outbound_user_agent
from services.fetchers._store import latest_data, _data_lock, _mark_fresh
from services.fetchers.retry import with_retry
@@ -29,7 +29,7 @@ def _geocode_region(region_name: str, country_name: str) -> tuple:
query = urllib.parse.quote(f"{region_name}, {country_name}")
url = f"https://nominatim.openstreetmap.org/search?q={query}&format=json&limit=1"
response = fetch_with_curl(url, timeout=8, headers={"User-Agent": "ShadowBroker-OSINT/1.0"})
response = fetch_with_curl(url, timeout=8, headers={"User-Agent": outbound_user_agent("infrastructure-data")})
if response.status_code == 200:
results = response.json()
if results:
+7 -2
View File
@@ -191,8 +191,13 @@ def fetch_meshtastic_nodes():
_os.environ.get("MESHTASTIC_SEND_CALLSIGN_HEADER", "true")
).strip().lower() not in {"0", "false", "no", "off", ""}
from services.network_utils import DEFAULT_USER_AGENT
ua_base = f"{DEFAULT_USER_AGENT}; 24h polling"
# Round 7a: outbound_user_agent already includes the per-install handle.
# The optional Meshtastic callsign is appended as additional context so
# meshtastic.liamcottle.net's operator can identify both the install AND
# the registered radio operator (when MESHTASTIC_OPERATOR_CALLSIGN is set
# and MESHTASTIC_SEND_CALLSIGN_HEADER is true; see issue #203).
from services.network_utils import outbound_user_agent
ua_base = f"{outbound_user_agent('meshtastic-map')}; 24h polling"
if callsign and send_callsign_header:
user_agent = f"{ua_base}; node={callsign}"
else:
+7 -1
View File
@@ -17,6 +17,12 @@ from typing import Any
import requests
def _route_db_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("route-database")
logger = logging.getLogger(__name__)
_ROUTES_URL = "https://vrs-standing-data.adsb.lol/routes.csv.gz"
@@ -37,7 +43,7 @@ def _fetch_csv_gz(url: str) -> list[dict[str, str]]:
response = requests.get(
url,
timeout=_HTTP_TIMEOUT_S,
headers={"User-Agent": _USER_AGENT, "Accept-Encoding": "gzip"},
headers={"User-Agent": _route_db_user_agent(), "Accept-Encoding": "gzip"},
)
response.raise_for_status()
text = gzip.decompress(response.content).decode("utf-8-sig")
+7 -1
View File
@@ -10,6 +10,12 @@ from datetime import datetime, timezone
from services.fetchers._store import _data_lock, _mark_fresh, latest_data
from services.network_utils import fetch_with_curl
def _trains_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("trains")
logger = logging.getLogger(__name__)
_EARTH_RADIUS_KM = 6371.0
@@ -379,7 +385,7 @@ def _fetch_digitraffic() -> list[dict]:
timeout=15,
headers={
"Accept-Encoding": "gzip",
"User-Agent": "ShadowBroker-OSINT/1.0",
"User-Agent": _trains_user_agent(),
},
)
if resp.status_code != 200:
+13 -5
View File
@@ -21,9 +21,17 @@ _cache_lock = threading.Lock()
_local_search_cache: List[Dict[str, Any]] | None = None
_local_search_lock = threading.Lock()
_USER_AGENT = os.environ.get(
"NOMINATIM_USER_AGENT", "ShadowBroker/1.0 (https://github.com/BigBodyCobain/Shadowbroker)"
)
# Round 7a: per-install operator handle threads through every Nominatim
# call. NOMINATIM_USER_AGENT env override is still honored for operators
# who run a custom relay / known good identity, but the default uses the
# per-install handle so OpenStreetMap can rate-limit per install instead
# of treating "Shadowbroker" as one big offender.
def _nominatim_user_agent() -> str:
override = os.environ.get("NOMINATIM_USER_AGENT", "").strip()
if override:
return override
from services.network_utils import outbound_user_agent
return outbound_user_agent("nominatim")
def _get_cache(key: str):
@@ -178,7 +186,7 @@ def search_geocode(query: str, limit: int = 5, local_only: bool = False) -> List
res = fetch_with_curl(
url,
headers={
"User-Agent": _USER_AGENT,
"User-Agent": _nominatim_user_agent(),
"Accept-Language": "en",
},
timeout=6,
@@ -241,7 +249,7 @@ def reverse_geocode(lat: float, lng: float, local_only: bool = False) -> Dict[st
res = fetch_with_curl(
url,
headers={
"User-Agent": _USER_AGENT,
"User-Agent": _nominatim_user_agent(),
"Accept-Language": "en",
},
timeout=6,
+33 -3
View File
@@ -8,6 +8,13 @@ from datetime import datetime
from urllib.parse import urljoin, urlparse
from services.network_utils import fetch_with_curl
def _geopolitics_user_agent() -> str:
"""Round 7a: GDELT geopolitics fetcher attribution."""
from services.network_utils import outbound_user_agent
return outbound_user_agent("geopolitics-gdelt")
logger = logging.getLogger(__name__)
# Cache Frontline data for 30 minutes, it doesn't move that fast
@@ -316,7 +323,7 @@ def _fetch_article_title(url):
resp = requests.get(
current_url,
timeout=4,
headers={"User-Agent": "Mozilla/5.0 (compatible; OSINT Dashboard/1.0)"},
headers={"User-Agent": _geopolitics_user_agent()},
stream=True,
allow_redirects=False,
)
@@ -521,10 +528,29 @@ def _parse_gdelt_export_zip(zip_bytes, conflict_codes, seen_locs, features, loc_
logger.warning(f"Failed to parse GDELT export zip: {e}")
# GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
# bucket of the same name. GCS returns the wildcard ``*.storage.googleapis.com``
# certificate, which legitimately does NOT cover the GDELT custom domain
# — Python's TLS verification correctly refuses it. Some networks/POPs
# happen to route through a path where this works; many do not (notably
# Docker Desktop's outbound NAT on local installs).
#
# Fix: rewrite the URL to hit GCS directly with a path-style bucket
# reference, where the standard GCS cert is genuinely valid. Same data,
# verified TLS, no operator-side workaround needed.
def _gcs_direct_gdelt_url(url: str) -> str:
"""If ``url`` points at data.gdeltproject.org, return the equivalent
GCS-direct URL. Otherwise return the URL unchanged."""
prefix = "://data.gdeltproject.org/"
if prefix in url:
return url.replace(prefix, "://storage.googleapis.com/data.gdeltproject.org/", 1)
return url
def _download_gdelt_export(url):
"""Download a single GDELT export file, return bytes or None."""
try:
res = fetch_with_curl(url, timeout=15)
res = fetch_with_curl(_gcs_direct_gdelt_url(url), timeout=15)
if res.status_code == 200:
return res.content
except (ConnectionError, TimeoutError, OSError): # non-critical
@@ -620,8 +646,12 @@ def fetch_global_military_incidents():
# HTTPS is used to prevent passive network observers from injecting
# poisoned export records into the global incident map via MITM.
# GDELT serves the same content over HTTPS as HTTP.
# Use the GCS-direct URL because data.gdeltproject.org's CNAME
# serves a wildcard *.storage.googleapis.com cert that legitimately
# doesn't cover the GDELT hostname. See _gcs_direct_gdelt_url above.
index_res = fetch_with_curl(
"https://data.gdeltproject.org/gdeltv2/lastupdate.txt", timeout=10
_gcs_direct_gdelt_url("https://data.gdeltproject.org/gdeltv2/lastupdate.txt"),
timeout=10,
)
if index_res.status_code != 200:
logger.error(f"GDELT lastupdate failed: {index_res.status_code}")
+109
View File
@@ -69,6 +69,115 @@ def _derive_peer_key(shared_secret: str, peer_url: str) -> bytes:
).digest()
# ---------------------------------------------------------------------------
# Issue #256 (tg12): per-peer HMAC secrets
# ---------------------------------------------------------------------------
#
# Before this change, ALL peer-push HMACs were derived from a single
# fleet-shared ``MESH_PEER_PUSH_SECRET``. The receiver could prove a
# request was signed by *someone who knows the fleet secret*, but it
# could NOT prove which peer signed it — any peer could compute the
# expected HMAC for any other peer's URL and impersonate that peer.
#
# Fix: an optional ``MESH_PEER_SECRETS`` env var maps specific peer URLs
# to per-peer secrets. When a peer URL is listed there, only that
# per-peer secret is accepted for that URL — the global secret is
# ignored for that peer. Peer A no longer learns peer B's secret, so
# peer A cannot forge a request claiming to be peer B.
#
# Backwards-compatible by design:
#
# - Single-peer installs (``MESH_PEER_SECRETS`` empty) keep using the
# global secret. Zero behavior change. Zero operator action required.
# - Multi-peer installs that haven't migrated yet keep using the global
# secret for every peer. Same behavior as before — same exposure.
# - Multi-peer installs that have migrated configure
# ``MESH_PEER_SECRETS=urlA=secretA,urlB=secretB`` and immediately get
# per-peer identity. Migration is incremental: peers not yet listed
# continue using the global secret until both sides of that peering
# add their entry.
_PEER_SECRETS_CACHE: dict[str, str] = {}
_PEER_SECRETS_CACHE_RAW: str = ""
def _lookup_per_peer_secret(normalized_url: str) -> str:
"""Return the per-peer secret for ``normalized_url`` from MESH_PEER_SECRETS.
Returns "" if no per-peer entry is configured for that URL. The parser
is forgiving:
- Whitespace around items, URLs, and secrets is stripped.
- Items without ``=`` or with empty URL/secret halves are skipped.
- The URL half is normalized via ``normalize_peer_url`` so config
authors don't have to match scheme/port/path quirks exactly.
The cache is invalidated whenever the env var's raw value changes,
which keeps tests' ``monkeypatch.setenv`` calls effective without
forcing a process restart.
"""
import os
raw = str(os.environ.get("MESH_PEER_SECRETS", "") or "").strip()
global _PEER_SECRETS_CACHE, _PEER_SECRETS_CACHE_RAW
if raw != _PEER_SECRETS_CACHE_RAW:
new_cache: dict[str, str] = {}
for chunk in raw.split(","):
chunk = chunk.strip()
if not chunk or "=" not in chunk:
continue
url_part, _, secret_part = chunk.partition("=")
normalized = normalize_peer_url(url_part.strip())
secret = secret_part.strip()
if normalized and secret:
new_cache[normalized] = secret
_PEER_SECRETS_CACHE = new_cache
_PEER_SECRETS_CACHE_RAW = raw
return _PEER_SECRETS_CACHE.get(normalized_url, "")
def resolve_peer_key_for_url(peer_url: str) -> bytes:
"""Return the HMAC key for ``peer_url``, preferring per-peer secret.
Issue #256: this is the function every peer-push call site should
use. It looks up the peer-specific secret first, falling back to the
fleet-shared ``MESH_PEER_PUSH_SECRET`` only when the URL is NOT
listed in ``MESH_PEER_SECRETS``.
Both sender (computing X-Peer-HMAC) and receiver (verifying it) call
this with the SENDER's URL — they must derive the same key, so
operators on both ends of a peering need matching MESH_PEER_SECRETS
entries for that URL to stay in sync.
Returns empty bytes when no usable secret exists. Callers must treat
that as fail-closed (skip the push, reject the verification).
"""
normalized_url = normalize_peer_url(peer_url)
if not normalized_url:
return b""
per_peer_secret = _lookup_per_peer_secret(normalized_url)
if per_peer_secret:
return _derive_peer_key(per_peer_secret, normalized_url)
# No per-peer entry for this URL — fall back to the legacy global
# secret. This is what preserves zero-hostility for single-peer
# installs and the migration window for multi-peer installs.
try:
from services.config import get_settings
global_secret = str(
getattr(get_settings(), "MESH_PEER_PUSH_SECRET", "") or ""
).strip()
except Exception:
return b""
if not global_secret:
return b""
return _derive_peer_key(global_secret, normalized_url)
def _node_digest(public_key_b64: str) -> str:
raw = base64.b64decode(public_key_b64)
return hashlib.sha256(raw).hexdigest()
+8 -7
View File
@@ -216,18 +216,19 @@ def _peer_pair_ref_key(peer_url: str) -> bytes:
Returns an empty key on misconfiguration so callers fail closed.
"""
try:
from services.config import get_settings
from services.mesh.mesh_crypto import _derive_peer_key, normalize_peer_url
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip()
from services.mesh.mesh_crypto import (
normalize_peer_url,
resolve_peer_key_for_url,
)
except Exception:
return b""
if not secret:
return b""
normalized = normalize_peer_url(peer_url or "")
if not normalized:
return b""
peer_key = _derive_peer_key(secret, normalized)
# Issue #256: resolve_peer_key_for_url() prefers per-peer secrets
# from MESH_PEER_SECRETS and falls back to the global
# MESH_PEER_PUSH_SECRET only when the URL has no per-peer entry.
peer_key = resolve_peer_key_for_url(normalized)
if not peer_key:
return b""
# Domain-separate from the transport HMAC key so the two
+16 -11
View File
@@ -26,7 +26,11 @@ from enum import Enum
from typing import Any, Callable, Optional
from collections import deque
from urllib.parse import urlparse
from services.mesh.mesh_crypto import _derive_peer_key, normalize_peer_url
from services.mesh.mesh_crypto import (
_derive_peer_key,
normalize_peer_url,
resolve_peer_key_for_url,
)
from services.mesh.mesh_metrics import increment as metrics_inc
from services.mesh.mesh_privacy_policy import (
TRANSPORT_TIER_ORDER as _TIER_RANK,
@@ -703,7 +707,6 @@ class InternetTransport(_PeerPushTransportMixin):
endpoint_path, padded = self._build_peer_push_request(envelope, self.NAME)
except ValueError as exc:
return TransportResult(False, self.NAME, str(exc))
secret = str(settings.MESH_PEER_PUSH_SECRET or "").strip()
delivered = 0
last_error = ""
@@ -713,10 +716,13 @@ class InternetTransport(_PeerPushTransportMixin):
try:
normalized_peer_url = normalize_peer_url(peer_url)
headers = {"Content-Type": "application/json"}
if secret:
peer_key = _derive_peer_key(secret, normalized_peer_url)
if not peer_key:
raise ValueError("invalid peer URL for HMAC derivation")
# Issue #256: per-peer secret takes precedence over the
# global MESH_PEER_PUSH_SECRET. When neither is set the
# key is empty and we skip the HMAC header entirely so a
# bare (unsigned) push still works on test deployments
# that have not yet configured any secret at all.
peer_key = resolve_peer_key_for_url(normalized_peer_url)
if peer_key:
headers["X-Peer-Url"] = normalized_peer_url
headers["X-Peer-HMAC"] = hmac.new(
peer_key,
@@ -798,7 +804,6 @@ class TorArtiTransport(_PeerPushTransportMixin):
endpoint_path, padded = self._build_peer_push_request(envelope, self.NAME)
except ValueError as exc:
return TransportResult(False, self.NAME, str(exc))
secret = str(settings.MESH_PEER_PUSH_SECRET or "").strip()
delivered = 0
last_error = ""
@@ -808,10 +813,10 @@ class TorArtiTransport(_PeerPushTransportMixin):
try:
normalized_peer_url = normalize_peer_url(peer_url)
headers = {"Content-Type": "application/json"}
if secret:
peer_key = _derive_peer_key(secret, normalized_peer_url)
if not peer_key:
raise ValueError("invalid peer URL for HMAC derivation")
# Issue #256: per-peer secret takes precedence; see the
# other transport above for the rationale.
peer_key = resolve_peer_key_for_url(normalized_peer_url)
if peer_key:
headers["X-Peer-Url"] = normalized_peer_url
headers["X-Peer-HMAC"] = hmac.new(
peer_key,
@@ -91,13 +91,15 @@ def _fetch_dm_prekey_bundle_from_peer_lookup(lookup_token: str) -> dict[str, Any
return {"ok": False, "detail": "lookup token required"}
try:
from services.config import get_settings
from services.mesh.mesh_crypto import _derive_peer_key, normalize_peer_url
from services.mesh.mesh_crypto import (
normalize_peer_url,
resolve_peer_key_for_url,
)
from services.mesh.mesh_router import configured_relay_peer_urls
settings = get_settings()
secret = str(getattr(settings, "MESH_PEER_PUSH_SECRET", "") or "").strip()
if not secret:
return {"ok": False, "detail": "peer prekey lookup unavailable"}
# Issue #256: secret check moved per-peer below. We still bail out
# cleanly when there are no peers configured at all.
peers = configured_relay_peer_urls()
if not peers:
return {"ok": False, "detail": "peer prekey lookup unavailable"}
@@ -121,7 +123,8 @@ def _fetch_dm_prekey_bundle_from_peer_lookup(lookup_token: str) -> dict[str, Any
or os.environ.get("SB_TEST_NODE_URL", "").strip()
or normalized_peer_url
)
peer_key = _derive_peer_key(secret, sender_peer_url)
# Issue #256: prefer per-peer secret keyed by the sender URL.
peer_key = resolve_peer_key_for_url(sender_peer_url)
if not peer_key:
continue
headers = {
+205 -6
View File
@@ -5,7 +5,9 @@ import subprocess
import shutil
import time
import threading
import uuid
import requests
from pathlib import Path
from urllib.parse import urlparse
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
@@ -20,14 +22,211 @@ _session.mount("https://", HTTPAdapter(max_retries=_retry, pool_maxsize=20))
_session.mount("http://", HTTPAdapter(max_retries=_retry, pool_maxsize=10))
# Default outbound User-Agent. Generic by design — does NOT include any
# personal contact info or a fork-specific repo URL. Operators who run a
# public-facing relay and want to identify themselves to upstreams (e.g.
# for Nominatim / weather.gov usage-policy compliance) can override this
# via the SHADOWBROKER_USER_AGENT env var.
# ---------------------------------------------------------------------------
# Per-operator outbound identification
# ---------------------------------------------------------------------------
#
# Issues #289 / #290 / #291 and the retrofit of PR #284 (#218 / #219 / #220):
# every third-party API the backend calls used to identify itself with a
# single "Shadowbroker" aggregate User-Agent. From the upstream's
# perspective, that meant every Shadowbroker install in the world looked
# like one giant entity hammering them. If one install misbehaved, the
# upstream's only recourse was to block "Shadowbroker" as a whole — which
# would take out every other install too.
#
# Fix: give each install a stable pseudonymous handle and include it in
# the User-Agent. Now an upstream can rate-limit or block the offending
# operator without affecting anyone else.
#
# The handle:
#
# - Is auto-generated on first call if no `OPERATOR_HANDLE` is configured
# (looks like "operator-7f3a92" — 6 hex chars from uuid4()).
# - Is persisted to ``backend/data/operator_handle.json`` so it survives
# restarts. Under Docker compose that file lives in the volume mount
# alongside `carrier_cache.json` and the other persistent state.
# - Can be overridden by the operator via the `OPERATOR_HANDLE` setting
# (env var or settings UI). Operators with their own GitHub handle,
# organization name, etc. can use that for traceability.
# - Is NEVER mixed into mesh / Wormhole / Infonet identity. This layer is
# strictly for public third-party API attribution.
_SHADOWBROKER_VERSION = "0.9"
_OPERATOR_HANDLE_FILE = (
Path(__file__).parent.parent / "data" / "operator_handle.json"
)
_OPERATOR_HANDLE_CACHE: str = ""
_OPERATOR_HANDLE_LOCK = threading.Lock()
def _generate_operator_handle() -> str:
"""Produce a stable pseudonymous handle for first-launch installs.
Format: ``operator-7f3a92`` (6 hex chars from a fresh uuid4()).
Distinct per install. Carries no real-world identity by default
operators who want one can override via ``OPERATOR_HANDLE``.
Note: the prefix is deliberately neutral. Earlier drafts used
``shadow-`` which, while accurate to the project name, looks
exactly like the kind of pattern a third-party abuse-detection
system would auto-block as suspicious. ``operator-`` describes
what the value actually is and doesn't pattern-match malware.
"""
return f"operator-{uuid.uuid4().hex[:6]}"
def _load_persisted_operator_handle() -> str:
"""Return the previously-saved handle from disk, or empty if none.
Reads ``backend/data/operator_handle.json`` if it exists. Any read
error returns empty so a fresh handle gets generated rather than
crashing the request.
"""
try:
if _OPERATOR_HANDLE_FILE.exists():
data = json.loads(_OPERATOR_HANDLE_FILE.read_text(encoding="utf-8"))
return str(data.get("handle", "") or "").strip()
except (OSError, json.JSONDecodeError, ValueError):
pass
return ""
def _persist_operator_handle(handle: str) -> None:
"""Atomically save the auto-generated handle so subsequent restarts
use the same one. Failure to persist is non-fatal the request still
succeeds with the in-memory handle, we just may generate a different
one on the next process restart."""
try:
_OPERATOR_HANDLE_FILE.parent.mkdir(parents=True, exist_ok=True)
tmp = _OPERATOR_HANDLE_FILE.with_suffix(_OPERATOR_HANDLE_FILE.suffix + ".tmp")
tmp.write_text(
json.dumps({"handle": handle, "_meta": {
"purpose": "Per-install operator handle for outbound third-party API attribution.",
"see": "backend/services/network_utils.py:outbound_user_agent",
}}, indent=2),
encoding="utf-8",
)
os.replace(tmp, _OPERATOR_HANDLE_FILE)
except OSError as exc:
logger.debug("Could not persist operator_handle (continuing in-memory): %s", exc)
def get_operator_handle() -> str:
"""Return the stable per-install operator handle.
Resolution order:
1. ``OPERATOR_HANDLE`` setting (env var / settings UI) if non-empty.
2. Process-cached value from previous call this run.
3. Value persisted to ``operator_handle.json`` (from a previous run).
4. Newly generated pseudonymous handle, persisted to disk.
The handle is normalized: stripped of whitespace, lowercased,
non-alphanumeric chars (except ``-`` and ``_``) replaced with ``-``.
This both sanitizes any HTTP-header-unsafe characters AND prevents
the operator from impersonating real third-party projects via
inventive whitespace.
"""
global _OPERATOR_HANDLE_CACHE
with _OPERATOR_HANDLE_LOCK:
# 1. Configured override always wins.
configured = ""
try:
from services.config import get_settings
configured = str(getattr(get_settings(), "OPERATOR_HANDLE", "") or "").strip()
except Exception:
configured = ""
if configured:
return _normalize_handle(configured)
# 2. In-memory cache (fast path for repeated calls).
if _OPERATOR_HANDLE_CACHE:
return _OPERATOR_HANDLE_CACHE
# 3. On-disk handle from a previous run.
persisted = _load_persisted_operator_handle()
if persisted:
_OPERATOR_HANDLE_CACHE = _normalize_handle(persisted)
return _OPERATOR_HANDLE_CACHE
# 4. Generate, persist, return.
fresh = _generate_operator_handle()
_persist_operator_handle(fresh)
_OPERATOR_HANDLE_CACHE = fresh
return fresh
def _normalize_handle(raw: str) -> str:
"""Strip whitespace, lowercase, replace unsafe characters with dashes."""
safe = "".join(
ch if (ch.isalnum() or ch in "-_") else "-"
for ch in raw.strip().lower()
)
# Collapse runs of dashes and trim to a reasonable length so an
# operator can't make our outbound logs unreadable.
while "--" in safe:
safe = safe.replace("--", "-")
safe = safe.strip("-")
return safe[:48] if safe else "anonymous"
_CONTACT_URL = "https://github.com/BigBodyCobain/Shadowbroker/issues"
def outbound_user_agent(purpose: str = "") -> str:
"""Build a User-Agent for an outbound third-party HTTP request.
Returns something like::
Shadowbroker/0.9 (operator: shadow-7f3a92; purpose: wikipedia;
+https://github.com/BigBodyCobain/Shadowbroker/issues)
The ``purpose`` is optional but recommended it tells the upstream
what feature of ours is making the call (``wikipedia``, ``openmhz``,
``nominatim``, etc.), which makes their logs and our complaints
actionable.
Every outbound call in the backend that previously sent a custom
User-Agent should call this helper instead. Centralizing here means:
- one place to change the contact URL,
- one place to bump the version on release,
- one place a Wikimedia / OpenMHz operator can reach to ask for
the project to back off, with a per-install handle so they can
target the specific install instead of the project as a whole.
"""
handle = get_operator_handle()
if purpose:
purpose_clean = _normalize_handle(purpose)
return (
f"Shadowbroker/{_SHADOWBROKER_VERSION} "
f"(operator: {handle}; purpose: {purpose_clean}; +{_CONTACT_URL})"
)
return (
f"Shadowbroker/{_SHADOWBROKER_VERSION} "
f"(operator: {handle}; +{_CONTACT_URL})"
)
def _reset_operator_handle_cache_for_tests() -> None:
"""Test-only: invalidate the in-memory cache so a test can set a
new ``OPERATOR_HANDLE`` env var and see it picked up immediately."""
global _OPERATOR_HANDLE_CACHE
with _OPERATOR_HANDLE_LOCK:
_OPERATOR_HANDLE_CACHE = ""
# Default outbound User-Agent. Retained for backwards compatibility with
# call sites that haven't been migrated to ``outbound_user_agent()`` yet.
# Operators who want full per-install attribution should set the
# ``OPERATOR_HANDLE`` setting and migrate call sites incrementally.
#
# Operators who run a public-facing relay can also override the whole UA
# string via the ``SHADOWBROKER_USER_AGENT`` env var. That override
# completely bypasses the per-operator helper; only use it if you know
# what you're doing.
DEFAULT_USER_AGENT = os.environ.get(
"SHADOWBROKER_USER_AGENT",
"ShadowBroker-OSINT/0.9",
f"Shadowbroker/{_SHADOWBROKER_VERSION}",
)
# Find bash for curl fallback — Git bash's curl has the TLS features
+61 -20
View File
@@ -2,14 +2,34 @@ import requests
from bs4 import BeautifulSoup
import logging
from cachetools import cached, TTLCache
import cloudscraper
import reverse_geocoder as rg
from urllib.parse import urlparse
from services.network_utils import outbound_user_agent
logger = logging.getLogger(__name__)
_OPENMHZ_AUDIO_HOSTS = {"media.openmhz.com", "media2.openmhz.com", "media3.openmhz.com"}
# Round 7a / Issues #289, #290, #291 (tg12 audit):
# We previously sent a spoofed Chrome User-Agent and (for OpenMHz) used
# cloudscraper to bypass anti-bot challenges. Both are dishonest and ToS-
# unfriendly. We now send the per-install Shadowbroker UA — the upstream
# can identify us, rate-limit us per install, and contact us if needed.
#
# If the upstream actively blocks our honest UA, the feature degrades
# gracefully (returns an empty list / cached results) rather than
# escalating to deception.
def _broadcastify_user_agent() -> str:
return outbound_user_agent("broadcastify")
def _openmhz_user_agent() -> str:
return outbound_user_agent("openmhz")
# Cache the top feeds for 5 minutes so we don't hammer Broadcastify
radio_cache = TTLCache(maxsize=1, ttl=300)
@@ -22,8 +42,12 @@ def get_top_broadcastify_feeds():
"""
logger.info("Scraping Broadcastify Top Feeds (Cache Miss)")
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
# Issue #289 (tg12) + Round 7a: identify ourselves honestly as a
# per-install Shadowbroker scraper. Broadcastify can rate-limit
# us per install or block us; either way we stop pretending to be
# a browser. If they block, the panel degrades gracefully.
"User-Agent": _broadcastify_user_agent(),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
}
@@ -89,21 +113,32 @@ openmhz_systems_cache = TTLCache(maxsize=1, ttl=3600)
@cached(openmhz_systems_cache)
def get_openmhz_systems():
"""Fetches the full directory of OpenMHZ systems."""
logger.info("Scraping OpenMHZ Systems (Cache Miss)")
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
"""Fetches the full directory of OpenMHZ systems.
Issue #290 (tg12) + Round 7a: replaced cloudscraper-based Chrome
impersonation with an honest per-install Shadowbroker User-Agent.
If OpenMHz's Cloudflare layer blocks honest traffic, we accept
that degradation (return empty list) rather than spoof a browser.
"""
logger.info("Fetching OpenMHZ Systems (Cache Miss)")
try:
res = scraper.get("https://api.openmhz.com/systems", timeout=15)
res = requests.get(
"https://api.openmhz.com/systems",
timeout=15,
headers={"User-Agent": _openmhz_user_agent(), "Accept": "application/json"},
)
if res.status_code == 200:
data = res.json()
# Return list of systems
return data.get("systems", []) if isinstance(data, dict) else []
if res.status_code in (403, 503):
logger.warning(
"OpenMHZ returned %s for systems directory — Cloudflare may "
"be blocking our honest UA. Feature degrades to empty result.",
res.status_code,
)
return []
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"OpenMHZ Systems Scrape Exception: {e}")
logger.error(f"OpenMHZ Systems Fetch Exception: {e}")
return []
@@ -113,21 +148,25 @@ openmhz_calls_cache = TTLCache(maxsize=100, ttl=20)
@cached(openmhz_calls_cache)
def get_recent_openmhz_calls(sys_name: str):
"""Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata')."""
logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
"""Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata').
Issue #290 (tg12) + Round 7a: same honest-UA model as
``get_openmhz_systems``.
"""
logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
try:
url = f"https://api.openmhz.com/{sys_name}/calls"
res = scraper.get(url, timeout=15)
res = requests.get(
url,
timeout=15,
headers={"User-Agent": _openmhz_user_agent(), "Accept": "application/json"},
)
if res.status_code == 200:
data = res.json()
return data.get("calls", []) if isinstance(data, dict) else []
return []
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"OpenMHZ Calls Scrape Exception ({sys_name}): {e}")
logger.error(f"OpenMHZ Calls Fetch Exception ({sys_name}): {e}")
return []
@@ -163,9 +202,11 @@ def openmhz_audio_response(target_url: str):
timeout=(5, 20),
allow_redirects=False,
headers={
"User-Agent": "Mozilla/5.0",
# Issue #291 (tg12) + Round 7a: drop spoofed Mozilla
# UA and the fake first-party Referer. Identify as
# the per-install Shadowbroker proxy honestly.
"User-Agent": _openmhz_user_agent(),
"Accept": "audio/mpeg,audio/*,*/*;q=0.8",
"Referer": "https://openmhz.com/",
},
)
if upstream.is_redirect or upstream.status_code in (301, 302, 303, 307, 308):
+37 -6
View File
@@ -4,7 +4,7 @@ import concurrent.futures
from urllib.parse import quote
import requests as _requests
from cachetools import TTLCache
from services.network_utils import fetch_with_curl
from services.network_utils import fetch_with_curl, outbound_user_agent
logger = logging.getLogger(__name__)
@@ -15,6 +15,31 @@ dossier_cache = TTLCache(maxsize=500, ttl=86400)
# Nominatim requires max 1 req/sec — track last call time
_nominatim_last_call = 0.0
# Issues #218 / #219 (tg12): Wikimedia's User-Agent policy requires API
# clients to identify themselves with a stable User-Agent that includes
# a contact path.
#
# Round 7a: the original fix in PR #284 used a single project-wide
# identifier, which from Wikimedia's perspective made every Shadowbroker
# install in the world look like one giant scraper. If one install
# misbehaved, their only recourse was to block "Shadowbroker" as a
# whole. We now build the headers from ``outbound_user_agent('wikimedia')``
# which embeds the per-install operator handle (auto-generated or
# operator-chosen), so Wikimedia can rate-limit / contact the specific
# install instead of the project.
def _wikimedia_request_headers() -> dict[str, str]:
ua = outbound_user_agent("wikimedia")
return {
"User-Agent": ua,
# Browser-JS-style header that Wikimedia's policy explicitly
# accepts on top of (or instead of) User-Agent. We send both so
# whichever the upstream prefers, the per-operator handle is
# always available.
"Api-User-Agent": ua,
}
def _reverse_geocode_offline(lat: float, lng: float) -> dict:
"""Offline fallback via reverse_geocoder when external reverse geocoding is blocked."""
@@ -45,9 +70,7 @@ def _reverse_geocode(lat: float, lng: float) -> dict:
f"https://nominatim.openstreetmap.org/reverse?"
f"lat={lat}&lon={lng}&format=json&zoom=10&addressdetails=1&accept-language=en"
)
headers = {
"User-Agent": "ShadowBroker-OSINT/1.0 (live-risk-dashboard; contact@shadowbroker.app)"
}
headers = {"User-Agent": outbound_user_agent("nominatim")}
for attempt in range(2):
# Enforce Nominatim's 1 req/sec policy
@@ -121,7 +144,13 @@ def _fetch_wikidata_leader(country_name: str) -> dict:
"""
url = f"https://query.wikidata.org/sparql?query={quote(sparql)}&format=json"
try:
res = fetch_with_curl(url, timeout=6)
# Issue #218 (tg12): Wikimedia's User-Agent policy requires
# outbound API traffic to be identifiable. fetch_with_curl()
# sends the project default, and we also add the Wikimedia-
# specific Api-User-Agent that the policy specifically asks
# for, since this request originates from a backend service
# that proxies on behalf of (potentially many) browser users.
res = fetch_with_curl(url, timeout=6, headers=_wikimedia_request_headers())
if res.status_code == 200:
results = res.json().get("results", {}).get("bindings", [])
if results:
@@ -147,7 +176,9 @@ def _fetch_local_wiki_summary(place_name: str, country_name: str = "") -> dict:
slug = quote(name.replace(" ", "_"))
url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{slug}"
try:
res = fetch_with_curl(url, timeout=5)
# Issue #219 (tg12): identify ourselves to Wikimedia per
# their UA policy; see _fetch_wikidata_leader above.
res = fetch_with_curl(url, timeout=5, headers=_wikimedia_request_headers())
if res.status_code == 200:
data = res.json()
if data.get("type") != "disambiguation":
+6 -1
View File
@@ -34,6 +34,11 @@ from services.sar.sar_config import (
copernicus_token,
earthdata_token,
)
def _sar_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("sar-products")
from services.sar.sar_normalize import (
SarAnomaly,
evidence_hash_for_payload,
@@ -442,7 +447,7 @@ def _fetch_unosat_packages() -> list[dict[str, Any]]:
# HDX CKAN returns 406 without explicit Accept + a browser-ish UA.
hdx_headers = {
"Accept": "application/json",
"User-Agent": "Mozilla/5.0 (compatible; ShadowBroker-SAR/1.0)",
"User-Agent": _sar_user_agent(),
}
try:
resp = fetch_with_curl(url, timeout=20, headers=hdx_headers)
+10 -1
View File
@@ -11,12 +11,21 @@ import requests
from datetime import datetime, timedelta
from cachetools import TTLCache
from services.network_utils import outbound_user_agent
logger = logging.getLogger(__name__)
# Cache by rounded lat/lon (0.02° grid ~= 2km), TTL 1 hour
_sentinel_cache = TTLCache(maxsize=200, ttl=3600)
def _planetary_user_agent() -> str:
# Round 7a: per-install handle so Microsoft Planetary Computer can
# attribute requests to the specific operator rather than treating
# the whole Shadowbroker user base as one entity.
return outbound_user_agent("sentinel2-planetary-computer")
def _esri_imagery_fallback(lat: float, lng: float) -> dict:
lat_span = 0.18
lng_span = 0.24
@@ -64,7 +73,7 @@ def search_sentinel2_scene(lat: float, lng: float) -> dict:
"https://planetarycomputer.microsoft.com/api/stac/v1/search",
json=search_payload,
timeout=8,
headers={"User-Agent": "ShadowBroker-OSINT/1.0 (live-risk-dashboard)"},
headers={"User-Agent": _planetary_user_agent()},
)
search_res.raise_for_status()
data = search_res.json()
+6 -2
View File
@@ -20,7 +20,11 @@ from cachetools import TTLCache
logger = logging.getLogger(__name__)
_SHODAN_BASE = "https://api.shodan.io"
_USER_AGENT = "ShadowBroker/0.9.79 local Shodan connector"
# Round 7a: per-install attribution. Shodan already has the operator API
# key for billing, but the UA still identifies the install.
def _shodan_user_agent():
from services.network_utils import outbound_user_agent
return outbound_user_agent("shodan")
_REQUEST_TIMEOUT = 15
_MIN_INTERVAL_SECONDS = 1.05 # Shodan docs say API plans are rate limited to ~1 req/sec.
_DEFAULT_SEARCH_PAGES = 1
@@ -179,7 +183,7 @@ def _request(path: str, *, params: dict[str, Any], cache: TTLCache[str, dict[str
f"{_SHODAN_BASE}{path}",
params=payload,
timeout=_REQUEST_TIMEOUT,
headers={"User-Agent": _USER_AGENT, "Accept": "application/json"},
headers={"User-Agent": _shodan_user_agent(), "Accept": "application/json"},
)
finally:
_last_request_at = time.monotonic()
+9 -2
View File
@@ -19,6 +19,13 @@ from pathlib import Path
import requests
from sgp4.api import Satrec, WGS72, jday
def _tinygs_user_agent(purpose: str) -> str:
"""Round 7a: per-install handle for CelesTrak / TinyGS attribution."""
from services.network_utils import outbound_user_agent
return outbound_user_agent(f"tinygs-{purpose}")
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
@@ -113,7 +120,7 @@ def _fetch_celestrak_tles() -> list[dict]:
params={"GROUP": group, "FORMAT": "json"},
timeout=20,
headers={
"User-Agent": "ShadowBroker-OSINT/1.0 (CelesTrak fair-use)",
"User-Agent": _tinygs_user_agent("celestrak"),
"Accept": "application/json",
},
)
@@ -259,7 +266,7 @@ def _fetch_tinygs_telemetry() -> None:
timeout=15,
headers={
"Accept": "application/json",
"User-Agent": "ShadowBroker-OSINT/1.0",
"User-Agent": _tinygs_user_agent("tinygs"),
},
)
resp.raise_for_status()
+4 -2
View File
@@ -24,7 +24,9 @@ from cachetools import TTLCache
logger = logging.getLogger(__name__)
_FINNHUB_BASE = "https://finnhub.io/api/v1"
_USER_AGENT = "ShadowBroker/0.9.79 Finnhub connector"
def _finnhub_user_agent():
from services.network_utils import outbound_user_agent
return outbound_user_agent("finnhub")
_REQUEST_TIMEOUT = 12
_MIN_INTERVAL_SECONDS = 0.35 # Stay well under 60 calls/min
@@ -89,7 +91,7 @@ def _request(path: str, params: dict[str, Any] | None = None) -> Any:
f"{_FINNHUB_BASE}{path}",
params=payload,
timeout=_REQUEST_TIMEOUT,
headers={"User-Agent": _USER_AGENT, "Accept": "application/json"},
headers={"User-Agent": _finnhub_user_agent(), "Accept": "application/json"},
)
finally:
_last_request_at = time.monotonic()
@@ -0,0 +1,677 @@
{
"_meta": {
"issue": "#239",
"note": "Snapshot of currently-tolerated duplicate route registrations. The test in test_no_new_duplicate_routes.py fails if any NEW (method, path) duplicate appears outside this list. Removing entries (by actually deduping) is fine and the test stays green. New entries here require explicit, reviewed updates.",
"generated_with": "python -c 'see tests/test_no_new_duplicate_routes.py'"
},
"duplicates": {
"DELETE /api/mesh/peers": [
"main",
"routers.mesh_operator",
"routers.mesh_public"
],
"DELETE /api/wormhole/dm/contact/{peer_id}": [
"main",
"routers.wormhole"
],
"DELETE /api/wormhole/dm/invite/handles/{handle}": [
"main",
"routers.wormhole"
],
"GET /api/cctv/media": [
"main",
"routers.cctv"
],
"GET /api/debug-latest": [
"main",
"routers.health"
],
"GET /api/geocode/reverse": [
"main",
"routers.tools"
],
"GET /api/geocode/search": [
"main",
"routers.tools"
],
"GET /api/health": [
"main",
"routers.health"
],
"GET /api/live-data": [
"main",
"routers.data"
],
"GET /api/live-data/fast": [
"main",
"routers.data"
],
"GET /api/live-data/slow": [
"main",
"routers.data"
],
"GET /api/mesh/channels": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/dm/count": [
"main",
"routers.mesh_dm"
],
"GET /api/mesh/dm/poll": [
"main",
"routers.mesh_dm"
],
"GET /api/mesh/dm/prekey-bundle": [
"main",
"routers.mesh_dm"
],
"GET /api/mesh/dm/pubkey": [
"main",
"routers.mesh_dm"
],
"GET /api/mesh/dm/witness": [
"main",
"routers.mesh_dm"
],
"GET /api/mesh/gate/list": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/gate/{gate_id}": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/gate/{gate_id}/messages": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/event/{event_id}": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/events": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/locator": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/merkle": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/messages": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/messages/wait": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/node/{node_id}": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/status": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/infonet/sync": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/log": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/messages": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/metrics": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/oracle/consensus": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/oracle/markets": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/oracle/markets/more": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/oracle/predictions": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/oracle/profile": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/oracle/search": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/oracle/stakes/{message_id}": [
"main",
"routers.mesh_oracle"
],
"GET /api/mesh/peers": [
"main",
"routers.mesh_operator",
"routers.mesh_public"
],
"GET /api/mesh/reputation": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/reputation/all": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/reputation/batch": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/rns/status": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/signals": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/status": [
"main",
"routers.mesh_public"
],
"GET /api/mesh/trust/vouches": [
"main",
"routers.mesh_dm"
],
"GET /api/oracle/region-intel": [
"main",
"routers.sigint"
],
"GET /api/radio/nearest": [
"main",
"routers.radio"
],
"GET /api/radio/nearest-list": [
"main",
"routers.radio"
],
"GET /api/radio/openmhz/audio": [
"main",
"routers.radio"
],
"GET /api/radio/openmhz/calls/{sys_name}": [
"main",
"routers.radio"
],
"GET /api/radio/openmhz/systems": [
"main",
"routers.radio"
],
"GET /api/radio/top": [
"main",
"routers.radio"
],
"GET /api/refresh": [
"main",
"routers.data"
],
"GET /api/region-dossier": [
"main",
"routers.tools"
],
"GET /api/route/{callsign}": [
"main",
"routers.radio"
],
"GET /api/sentinel2/search": [
"main",
"routers.tools"
],
"GET /api/settings/api-keys": [
"main",
"routers.admin"
],
"GET /api/settings/api-keys/meta": [
"main",
"routers.admin"
],
"GET /api/settings/news-feeds": [
"main",
"routers.admin"
],
"GET /api/settings/node": [
"main",
"routers.admin"
],
"GET /api/settings/privacy-profile": [
"main",
"routers.wormhole"
],
"GET /api/settings/wormhole": [
"main",
"routers.wormhole"
],
"GET /api/settings/wormhole-status": [
"main",
"routers.wormhole"
],
"GET /api/sigint/nearest-sdr": [
"main",
"routers.sigint"
],
"GET /api/thermal/verify": [
"main",
"routers.sigint"
],
"GET /api/tools/shodan/status": [
"main",
"routers.tools"
],
"GET /api/tools/uw/status": [
"main",
"routers.tools"
],
"GET /api/wormhole/dm/contacts": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/dm/identity": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/dm/invite": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/dm/invite/handles": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/gate/{gate_id}/identity": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/gate/{gate_id}/key": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/gate/{gate_id}/personas": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/health": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/identity": [
"main",
"routers.wormhole"
],
"GET /api/wormhole/status": [
"main",
"routers.wormhole"
],
"PATCH /api/mesh/peers": [
"main",
"routers.mesh_operator",
"routers.mesh_public"
],
"POST /api/ais/feed": [
"main",
"routers.data"
],
"POST /api/layers": [
"main",
"routers.data"
],
"POST /api/mesh/dm/block": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/dm/count": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/dm/poll": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/dm/register": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/dm/send": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/dm/witness": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/gate/create": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/gate/peer-pull": [
"main",
"routers.mesh_peer_sync"
],
"POST /api/mesh/gate/peer-push": [
"main",
"routers.mesh_peer_sync"
],
"POST /api/mesh/gate/{gate_id}/message": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/identity/revoke": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/identity/rotate": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/infonet/ingest": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/infonet/peer-push": [
"main",
"routers.mesh_peer_sync"
],
"POST /api/mesh/infonet/sync": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/oracle/predict": [
"main",
"routers.mesh_oracle"
],
"POST /api/mesh/oracle/resolve": [
"main",
"routers.mesh_oracle"
],
"POST /api/mesh/oracle/resolve-stakes": [
"main",
"routers.mesh_oracle"
],
"POST /api/mesh/oracle/stake": [
"main",
"routers.mesh_oracle"
],
"POST /api/mesh/peers": [
"main",
"routers.mesh_operator",
"routers.mesh_public"
],
"POST /api/mesh/report": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/send": [
"main",
"routers.mesh_public"
],
"POST /api/mesh/trust/vouch": [
"main",
"routers.mesh_dm"
],
"POST /api/mesh/vote": [
"main",
"routers.mesh_public"
],
"POST /api/sentinel/tile": [
"main",
"routers.tools"
],
"POST /api/sentinel/token": [
"main",
"routers.tools"
],
"POST /api/settings/news-feeds/reset": [
"main",
"routers.admin"
],
"POST /api/sigint/transmit": [
"main",
"routers.sigint"
],
"POST /api/system/update": [
"main",
"routers.admin"
],
"POST /api/tools/shodan/count": [
"main",
"routers.tools"
],
"POST /api/tools/shodan/host": [
"main",
"routers.tools"
],
"POST /api/tools/shodan/search": [
"main",
"routers.tools"
],
"POST /api/tools/uw/congress": [
"main",
"routers.tools"
],
"POST /api/tools/uw/darkpool": [
"main",
"routers.tools"
],
"POST /api/tools/uw/flow": [
"main",
"routers.tools"
],
"POST /api/viewport": [
"main",
"routers.data"
],
"POST /api/wormhole/connect": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/disconnect": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/bootstrap-decrypt": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/bootstrap-encrypt": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/build-seal": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/compose": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/dead-drop-token": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/dead-drop-tokens": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/decrypt": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/encrypt": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/invite/import": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/open-seal": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/pairwise-alias": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/pairwise-alias/rotate": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/prekey/register": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/register-key": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/reset": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/sas": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/dm/sender-token": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/enter": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/key/grant": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/key/rotate": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/leave": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/message/compose": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/message/decrypt": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/message/post": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/message/post-encrypted": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/message/sign-encrypted": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/messages/decrypt": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/persona/activate": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/persona/clear": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/persona/create": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/persona/retire": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/proof": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/gate/state/export": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/identity/bootstrap": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/join": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/leave": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/restart": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/sign": [
"main",
"routers.wormhole"
],
"POST /api/wormhole/sign-raw": [
"main",
"routers.wormhole"
],
"PUT /api/mesh/gate/{gate_id}/envelope_policy": [
"main",
"routers.mesh_public"
],
"PUT /api/mesh/gate/{gate_id}/legacy_envelope_fallback": [
"main",
"routers.mesh_public"
],
"PUT /api/settings/news-feeds": [
"main",
"routers.admin"
],
"PUT /api/settings/node": [
"main",
"routers.admin"
],
"PUT /api/settings/privacy-profile": [
"main",
"routers.wormhole"
],
"PUT /api/settings/wormhole": [
"main",
"routers.wormhole"
],
"PUT /api/wormhole/dm/contact": [
"main",
"routers.wormhole"
]
}
}
@@ -0,0 +1,389 @@
"""Issues #244, #245, #246 (tg12 external audit): carrier tracker
quality + provenance + freshness.
These tests pin the post-fix contract:
- **#244**: dated editorial snapshot positions no longer live in the
registry. They live in a one-shot seed file that is consumed once
on first-ever startup. After that, the runtime cache reflects only
what THIS install has actually observed.
- **#245**: headline-derived positions (centroid of a region keyword)
are stamped ``position_confidence = "approximate"`` so the UI can
render them with appropriate uncertainty.
- **#246**: freshness is a *labelling* decision, not an eviction
decision. Positions older than the configurable freshness window
flip from ``"recent"`` to ``"stale"`` but are NEVER replaced with
the registry default that would teleport the carrier. The user
always sees the last position the system actually observed.
"""
from __future__ import annotations
import json
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path
from unittest.mock import patch
import pytest
@pytest.fixture
def fresh_tracker(tmp_path, monkeypatch):
"""Isolated carrier_tracker with seed/cache paths redirected to tmp.
Yields the module so tests can call its functions; resets globals
between tests so position caches don't leak across cases.
"""
from services import carrier_tracker
seed_path = tmp_path / "data" / "carrier_seed.json"
cache_path = tmp_path / "carrier_cache.json"
seed_path.parent.mkdir(parents=True, exist_ok=True)
monkeypatch.setattr(carrier_tracker, "SEED_FILE", seed_path)
monkeypatch.setattr(carrier_tracker, "CACHE_FILE", cache_path)
monkeypatch.delenv("SHADOWBROKER_CARRIER_FRESHNESS_DAYS", raising=False)
# Reset module-level mutable state.
carrier_tracker._carrier_positions.clear()
carrier_tracker._cached_gdelt_articles.clear()
carrier_tracker._last_gdelt_fetch_at = 0.0
yield carrier_tracker
# Clean up so subsequent tests start fresh.
carrier_tracker._carrier_positions.clear()
carrier_tracker._cached_gdelt_articles.clear()
def _write_seed(path: Path, hull: str = "CVN-78", **overrides) -> None:
payload = {
"_meta": {
"as_of": "2026-03-09",
"source": "USNI News Fleet & Marine Tracker",
"source_url": "https://news.usni.org/...",
"note": "test",
},
"carriers": {
hull: {
"lat": 18.0,
"lng": 39.5,
"heading": 0,
"desc": "Red Sea — Operation Epic Fury (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed",
**overrides,
}
},
}
path.write_text(json.dumps(payload), encoding="utf-8")
# ---------------------------------------------------------------------------
# #244 — first-run seed bootstrap, never re-seeds after that
# ---------------------------------------------------------------------------
class TestSeedBootstrap:
def test_first_ever_startup_bootstraps_from_seed(self, fresh_tracker, tmp_path):
_write_seed(fresh_tracker.SEED_FILE)
# No cache exists yet.
assert not fresh_tracker.CACHE_FILE.exists()
positions = fresh_tracker._bootstrap_cache_if_missing()
# The seed entry made it into the cache.
assert "CVN-78" in positions
assert positions["CVN-78"]["lat"] == 18.0
assert positions["CVN-78"]["position_confidence"] == "seed"
# And the cache file is now on disk so subsequent runs skip the seed.
assert fresh_tracker.CACHE_FILE.exists()
def test_subsequent_startup_ignores_seed(self, fresh_tracker, tmp_path):
# Pre-seed a different position into the cache; the seed file says Red Sea.
cache_data = {
"CVN-78": {
"lat": 25.0,
"lng": 55.0,
"heading": 0,
"desc": "Persian Gulf — operator-observed",
"source": "Operator log",
"source_url": "",
"position_source_at": "2026-04-15T12:00:00Z",
"position_confidence": "recent",
}
}
fresh_tracker.CACHE_FILE.write_text(json.dumps(cache_data))
_write_seed(fresh_tracker.SEED_FILE) # seed is present but should NOT be used
positions = fresh_tracker._bootstrap_cache_if_missing()
assert positions["CVN-78"]["lat"] == 25.0
assert positions["CVN-78"]["desc"] == "Persian Gulf — operator-observed"
def test_no_seed_no_cache_falls_back_to_homeport(self, fresh_tracker):
# Neither seed nor cache. Must fall back to homeport defaults
# (carrier never disappears).
assert not fresh_tracker.SEED_FILE.exists()
assert not fresh_tracker.CACHE_FILE.exists()
positions = fresh_tracker._bootstrap_cache_if_missing()
# Every registered carrier has SOMETHING.
assert set(positions.keys()) == set(fresh_tracker.CARRIER_REGISTRY.keys())
# All entries are labelled as homeport defaults.
for hull, entry in positions.items():
assert entry["position_confidence"] == "homeport_default"
registry = fresh_tracker.CARRIER_REGISTRY[hull]
assert entry["lat"] == registry["homeport_lat"]
assert entry["lng"] == registry["homeport_lng"]
# ---------------------------------------------------------------------------
# #244 — no editorial fallbacks live in the registry
# ---------------------------------------------------------------------------
class TestRegistryShape:
def test_registry_has_no_dated_fallback_fields(self, fresh_tracker):
"""The Mar 9 editorial coordinates are gone from the registry.
They live only in the seed file."""
forbidden = {"fallback_lat", "fallback_lng", "fallback_heading", "fallback_desc"}
for hull, entry in fresh_tracker.CARRIER_REGISTRY.items():
offending = forbidden & set(entry.keys())
assert not offending, f"{hull} still has dated registry fields: {offending}"
def test_registry_keeps_homeport_for_every_hull(self, fresh_tracker):
for hull, entry in fresh_tracker.CARRIER_REGISTRY.items():
assert "homeport_lat" in entry, f"{hull} missing homeport_lat"
assert "homeport_lng" in entry, f"{hull} missing homeport_lng"
assert "name" in entry
assert "wiki" in entry
# ---------------------------------------------------------------------------
# #246 — freshness labelling, NOT eviction
# ---------------------------------------------------------------------------
class TestFreshnessLabelling:
def test_recent_observation_labels_recent(self, fresh_tracker):
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
entry = {
"lat": 25.0,
"lng": 55.0,
"position_source_at": (now - timedelta(days=3)).isoformat(),
}
assert fresh_tracker._compute_position_confidence(entry, now=now) == "recent"
def test_aged_observation_flips_to_stale(self, fresh_tracker):
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
entry = {
"lat": 25.0,
"lng": 55.0,
"position_source_at": (now - timedelta(days=30)).isoformat(),
}
assert fresh_tracker._compute_position_confidence(entry, now=now) == "stale"
def test_seed_label_is_preserved_explicitly(self, fresh_tracker):
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
entry = {
"lat": 18.0,
"lng": 39.5,
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed",
}
# Even though the source is months old, the explicit "seed" label wins
# so the UI can render the seed-specific badge instead of generic "stale".
assert fresh_tracker._compute_position_confidence(entry, now=now) == "seed"
def test_homeport_default_label_is_preserved(self, fresh_tracker):
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
entry = {
"lat": 36.95,
"lng": -76.32,
"position_source_at": now.isoformat(),
"position_confidence": "homeport_default",
}
assert fresh_tracker._compute_position_confidence(entry, now=now) == "homeport_default"
def test_freshness_window_is_env_configurable(self, fresh_tracker, monkeypatch):
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
entry = {
"lat": 25.0,
"lng": 55.0,
"position_source_at": (now - timedelta(days=20)).isoformat(),
}
# Default window = 14 days → 20-day-old entry is stale.
assert fresh_tracker._compute_position_confidence(entry, now=now) == "stale"
# Stretch to 30 days → same entry is now "recent".
monkeypatch.setenv("SHADOWBROKER_CARRIER_FRESHNESS_DAYS", "30")
assert fresh_tracker._compute_position_confidence(entry, now=now) == "recent"
def test_aged_cache_entry_keeps_its_position_never_reverts(self, fresh_tracker):
"""The core regression test for the user's intent: a year-old
cache entry must NOT be replaced with the seed or homeport.
The PHYSICAL position the user sees is the last one observed;
only the freshness LABEL changes."""
a_year_ago = (datetime.now(timezone.utc) - timedelta(days=365)).isoformat()
cache_data = {
"CVN-78": {
"lat": 25.0,
"lng": 55.0,
"heading": 0,
"desc": "Persian Gulf",
"source": "GDELT News API",
"source_url": "https://news.example/...",
"position_source_at": a_year_ago,
"position_confidence": "recent", # was recent when written
}
}
fresh_tracker.CACHE_FILE.write_text(json.dumps(cache_data))
positions = fresh_tracker._bootstrap_cache_if_missing()
enriched = fresh_tracker._enrich_for_rendering("CVN-78", positions["CVN-78"])
# The position is preserved exactly.
assert enriched["lat"] == 25.0
assert enriched["lng"] == 55.0
# But the live label has flipped to stale.
assert enriched["position_confidence"] == "stale"
assert enriched["is_fallback"] is True
# ---------------------------------------------------------------------------
# #245 — approximate confidence for region-centroid positions
# ---------------------------------------------------------------------------
class TestApproximateConfidenceForNewsDerivedPositions:
def test_news_parsing_stamps_approximate_confidence(self, fresh_tracker):
articles = [
{
"title": "USS Ford carrier deployed in Mediterranean for joint exercise",
"url": "https://news.example/ford-mediterranean",
"seendate": "20260415120000",
}
]
updates = fresh_tracker._parse_carrier_positions_from_news(articles)
assert "CVN-78" in updates
entry = updates["CVN-78"]
assert entry["position_confidence"] == "approximate"
# And the source_at is the article's seen date, not now().
assert entry["position_source_at"].startswith("2026-04-15")
def test_gdelt_seendate_parser_handles_well_formed_input(self, fresh_tracker):
iso = fresh_tracker._gdelt_seendate_to_iso("20260415120000")
assert iso is not None
assert iso.startswith("2026-04-15T12:00:00")
def test_gdelt_seendate_parser_returns_none_on_garbage(self, fresh_tracker):
assert fresh_tracker._gdelt_seendate_to_iso("") is None
assert fresh_tracker._gdelt_seendate_to_iso("not-a-date") is None
assert fresh_tracker._gdelt_seendate_to_iso("2026") is None
# ---------------------------------------------------------------------------
# Full enrichment → public API shape
# ---------------------------------------------------------------------------
class TestEnrichForRendering:
def test_seed_entry_produces_expected_public_fields(self, fresh_tracker):
seed_entry = {
"lat": 18.0,
"lng": 39.5,
"heading": 0,
"desc": "Red Sea (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed",
}
enriched = fresh_tracker._enrich_for_rendering("CVN-78", seed_entry)
# Existing UI fields preserved.
assert enriched["lat"] == 18.0
assert enriched["lng"] == 39.5
assert enriched["source"].startswith("USNI")
assert enriched["last_osint_update"] == "2026-03-09T00:00:00Z"
# New audit-required fields.
assert enriched["position_confidence"] == "seed"
assert enriched["position_source_at"] == "2026-03-09T00:00:00Z"
assert enriched["is_fallback"] is True
def test_recent_observation_is_not_fallback(self, fresh_tracker):
now = datetime.now(timezone.utc)
recent_entry = {
"lat": 25.0,
"lng": 55.0,
"heading": 0,
"desc": "Persian Gulf",
"source": "GDELT News API",
"source_url": "https://news.example/...",
"position_source_at": (now - timedelta(days=2)).isoformat(),
"position_confidence": "approximate",
}
enriched = fresh_tracker._enrich_for_rendering("CVN-78", recent_entry, now=now)
assert enriched["position_confidence"] == "approximate"
# Approximate (from a recent headline) is honest precision, but the UI
# treats it as live data — is_fallback only flips True for explicit
# fallback categories (seed / stale / homeport_default).
assert enriched["is_fallback"] is False
# ---------------------------------------------------------------------------
# Regression: existing frontend fields are preserved
# ---------------------------------------------------------------------------
class TestPublicResponseShapeBackwardCompat:
"""The frontend ShipPopup expects `estimated`, `source`, `source_url`,
`last_osint_update`. The new fields are additive and existing fields
keep their meaning so the UI does not need updating to keep working."""
def test_get_carrier_positions_preserves_existing_keys(self, fresh_tracker):
_write_seed(fresh_tracker.SEED_FILE)
fresh_tracker._bootstrap_cache_if_missing()
with fresh_tracker._positions_lock:
fresh_tracker._carrier_positions.update(
{
"CVN-78": {
"lat": 18.0,
"lng": 39.5,
"heading": 0,
"desc": "Red Sea (seed)",
"source": "Seed",
"source_url": "",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed",
}
}
)
out = fresh_tracker.get_carrier_positions()
assert len(out) == 1
c = out[0]
# Old fields the frontend uses.
for key in (
"name",
"type",
"lat",
"lng",
"country",
"desc",
"wiki",
"estimated",
"source",
"source_url",
"last_osint_update",
):
assert key in c, f"missing legacy field {key!r}"
# New fields.
for key in ("position_confidence", "position_source_at", "is_fallback"):
assert key in c, f"missing audit-required field {key!r}"
assert c["type"] == "carrier"
assert c["estimated"] is True
@@ -0,0 +1,83 @@
"""GDELT's ``data.gdeltproject.org`` is a CNAME to a Google Cloud Storage
bucket. GCS responds with the wildcard ``*.storage.googleapis.com``
certificate, which legitimately does NOT cover the GDELT custom
domain, so Python's TLS verification refuses the connection. Some
networks happen to route through a path where this works; many
(notably Docker Desktop's outbound NAT on local installs) do not.
The fix in ``services.geopolitics._gcs_direct_gdelt_url`` rewrites any
URL pointing at ``data.gdeltproject.org`` to its GCS-direct equivalent
(``storage.googleapis.com/data.gdeltproject.org/...``), where the
standard GCS certificate is genuinely valid. ``api.gdeltproject.org``
and every other host are left untouched.
These tests pin that behavior so a future refactor that drops the
helper or accidentally rewrites the wrong host gets a loud failure.
"""
from __future__ import annotations
import pytest
def test_rewrites_data_gdeltproject_https():
from services.geopolitics import _gcs_direct_gdelt_url
assert _gcs_direct_gdelt_url(
"https://data.gdeltproject.org/gdeltv2/lastupdate.txt"
) == "https://storage.googleapis.com/data.gdeltproject.org/gdeltv2/lastupdate.txt"
def test_rewrites_data_gdeltproject_http():
"""GDELT's lastupdate.txt sometimes lists URLs with http:// — we
rewrite those too (the downstream call upgrades them to https)."""
from services.geopolitics import _gcs_direct_gdelt_url
assert _gcs_direct_gdelt_url(
"http://data.gdeltproject.org/gdeltv2/20260301120000.export.CSV.zip"
) == "http://storage.googleapis.com/data.gdeltproject.org/gdeltv2/20260301120000.export.CSV.zip"
def test_rewrites_preserve_query_string_and_path():
from services.geopolitics import _gcs_direct_gdelt_url
url = "https://data.gdeltproject.org/some/deep/path?a=1&b=2&c=hello%20world"
rewritten = _gcs_direct_gdelt_url(url)
assert rewritten == (
"https://storage.googleapis.com/data.gdeltproject.org"
"/some/deep/path?a=1&b=2&c=hello%20world"
)
def test_does_not_touch_api_gdeltproject_org():
"""The API host is NOT a CNAME to GCS; rewriting it would break the
actual GDELT API endpoint."""
from services.geopolitics import _gcs_direct_gdelt_url
url = "https://api.gdeltproject.org/api/v2/doc/doc?query=carrier"
assert _gcs_direct_gdelt_url(url) == url
def test_does_not_touch_other_hosts():
from services.geopolitics import _gcs_direct_gdelt_url
for url in (
"https://en.wikipedia.org/wiki/Boeing_747",
"https://query.wikidata.org/sparql",
"https://storage.googleapis.com/already-correct/path",
"https://nominatim.openstreetmap.org/search",
):
assert _gcs_direct_gdelt_url(url) == url
def test_does_not_partially_match_strings():
"""``data.gdeltproject.org`` is matched exactly; URLs that merely
contain that substring elsewhere (in a query parameter, for example)
are left alone. Otherwise we'd rewrite something like
``https://example.com/?ref=data.gdeltproject.org/x`` which is wrong."""
from services.geopolitics import _gcs_direct_gdelt_url
# The match requires ``://`` immediately before the host, so a host
# like ``example-data.gdeltproject.org`` would also be left alone
# (treated as a different host, which is correct).
url = "https://example-data.gdeltproject.org/path"
assert _gcs_direct_gdelt_url(url) == url
@@ -0,0 +1,208 @@
"""Issue #239 (tg12): backend registers duplicate API routes in both
``main.py`` and router modules, so request behavior depends on the
order ``FastAPI`` happened to register them.
This test is the **CI guard** that locks in the invariant going forward.
It does NOT delete any existing duplicates those are tolerated via an
explicit baseline file. What it DOES block is *new* duplicates appearing
later, which is what the audit was actually asking for: a way to stop
the drift before it gets worse.
Findings (empirically verified, see PR #286 description):
- ``main.app`` calls ``include_router(...)`` for every router at module
import time around line 3316.
- Every ``@app.get/post/put/...`` decorator inside ``main.py`` runs
*after* those include_router calls, so the router handler is the one
that actually serves requests. The duplicates in ``main.py`` are
dead code at the route-resolution layer.
- Behavior today is deterministic (router wins), but if someone later
adds a NEW route only in ``main.py``, or edits one copy of an
existing pair without the other, drift starts.
How this test works:
- Walks ``main.app.routes`` and records every ``(method, path)`` that
appears more than once, along with which modules registered each
copy.
- Compares that set against the baseline in
``backend/tests/data/duplicate_routes_baseline.json``.
- **Fails** if any duplicate appears that is NOT in the baseline
(or if the registering modules for an existing duplicate change).
- **Stays green** when duplicates are *removed* by genuinely deduping
the code. (The baseline is a ceiling, not a floor.)
To extend in the future:
- If you actually dedupe a route, leave the baseline alone the test
still passes. Subsequent regenerations of the baseline (``python -m
scripts.regen_duplicate_routes_baseline`` or the snippet in this
test's docstring) will shrink it.
- If you legitimately need a new duplicate (you probably do not), add
it to the baseline AND explain why in the PR description so reviewers
can push back.
"""
from __future__ import annotations
import json
from collections import defaultdict
from pathlib import Path
import pytest
BASELINE_PATH = (
Path(__file__).parent / "data" / "duplicate_routes_baseline.json"
)
def _current_duplicates() -> dict[str, list[str]]:
"""Walk ``main.app.routes`` and return ``{'METHOD /path': [module, ...]}``
for every (method, path) registered more than once."""
import main
by_key: dict[str, list[str]] = defaultdict(list)
for route in main.app.routes:
path = getattr(route, "path", None)
methods = getattr(route, "methods", None)
endpoint = getattr(route, "endpoint", None)
if not path or not methods or endpoint is None:
continue
for method in methods:
if method in ("HEAD", "OPTIONS"):
continue
by_key[f"{method} {path}"].append(endpoint.__module__)
return {
key: sorted(modules) for key, modules in by_key.items() if len(modules) > 1
}
def _load_baseline() -> dict[str, list[str]]:
if not BASELINE_PATH.exists():
return {}
raw = json.loads(BASELINE_PATH.read_text(encoding="utf-8"))
dups = raw.get("duplicates", {})
if not isinstance(dups, dict):
return {}
return {k: sorted(v) for k, v in dups.items()}
def test_no_new_duplicate_route_registrations():
"""Block any (method, path) duplicate not already in the baseline.
This is the primary CI guard: PRs that add a NEW shadowed
``@app.get`` while a router module already serves the same route
fail here with an actionable message.
"""
current = _current_duplicates()
baseline = _load_baseline()
new_or_changed = []
for key, modules in sorted(current.items()):
if key not in baseline:
new_or_changed.append(
f" + {key} (NEW duplicate; registered in: {modules})"
)
continue
if modules != baseline[key]:
new_or_changed.append(
f" ~ {key} "
f"(modules changed: was {baseline[key]}, now {modules})"
)
if new_or_changed:
pytest.fail(
"Issue #239 CI guard: detected duplicate route registrations "
"that are NOT in the tolerated baseline.\n"
"\n"
"If you added a new @app.get/post/... in main.py for a path "
"that a router module already serves, please move the handler "
"into the router and delete the main.py copy — the router "
"version wins on request routing anyway, so the main.py copy "
"is dead code that just creates drift risk.\n"
"\n"
"Offending entries:\n"
+ "\n".join(new_or_changed)
+ "\n\n"
"Baseline lives at "
f"{BASELINE_PATH.relative_to(BASELINE_PATH.parent.parent.parent)}."
)
def test_baseline_only_lists_real_duplicates():
"""Catch baseline drift in the other direction: if an entry in the
baseline is no longer actually a duplicate (because someone deduped
it manually), the baseline is stale and should be shrunk so future
re-introductions of that duplicate get caught.
This test is informational it does NOT fail the build today (the
audit's main concern is *new* duplicates, not stale baseline
entries). It prints a warning so the next baseline regeneration
can clean things up.
"""
current = _current_duplicates()
baseline = _load_baseline()
stale = sorted(k for k in baseline if k not in current)
if stale:
# Use warnings instead of fail so this is friendly housekeeping,
# not a CI blocker. The other test catches the actual safety
# concern.
import warnings
warnings.warn(
f"duplicate_routes_baseline.json contains {len(stale)} entry/entries "
"no longer present in app.routes — consider regenerating the baseline. "
f"Stale: {stale[:5]}{'...' if len(stale) > 5 else ''}",
stacklevel=2,
)
def test_router_handler_is_the_one_that_serves():
"""Pin the empirical claim from PR #286: for every duplicated
(method, path), the FIRST-registered handler is in a router
module, not in main.py. If this ever flips e.g. someone moves
include_router calls to the bottom of main.py duplicate routes
start silently changing which handler runs. This catches that
rearrangement immediately.
"""
import main
first_seen: dict[str, str] = {}
for route in main.app.routes:
path = getattr(route, "path", None)
methods = getattr(route, "methods", None)
endpoint = getattr(route, "endpoint", None)
if not path or not methods or endpoint is None:
continue
for method in methods:
if method in ("HEAD", "OPTIONS"):
continue
key = f"{method} {path}"
if key not in first_seen:
first_seen[key] = endpoint.__module__
main_winning = sorted(
k for k, mod in first_seen.items() if mod == "main"
)
# The duplicates we tolerate are router-first. If main is the first
# registered for any duplicated path, the router copy gets shadowed
# instead, which would invalidate every assumption made in audit
# rounds 5 and 6 about "the router version is canonical."
baseline = _load_baseline()
main_first_in_baseline = [k for k in main_winning if k in baseline]
if main_first_in_baseline:
pytest.fail(
"Issue #239 invariant broken: for at least one duplicated "
"(method, path), main.py is now registered FIRST and is "
"serving requests instead of the router copy. Audit rounds "
"5 and 6 assumed the router handler wins.\n"
"\n"
"Affected entries:\n"
+ "\n".join(f" {k}" for k in main_first_in_baseline)
+ "\n\n"
"Most likely cause: someone moved app.include_router(...) "
"calls in main.py to after the @app.get decorators. Move "
"them back to before the @app routes (currently around "
"line 3316)."
)
@@ -0,0 +1,160 @@
"""Issues #240 & #241 (tg12): oracle market/stake resolution endpoints
must require admin authentication.
Before the fix, ``POST /api/mesh/oracle/resolve`` and
``POST /api/mesh/oracle/resolve-stakes`` were decorated with
``@mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)``. That decorator
only tags the route as not requiring a mesh signed-write envelope; it
does NOT enforce authorization. The rate limiter (5/minute) was the
only real gate, which is wrong for control-plane state mutations.
The fix adds ``dependencies=[Depends(require_admin)]`` to both routes.
These tests prove:
- Anonymous callers receive 403.
- A request bearing the configured admin key passes the auth gate.
- The underlying ledger mutator is not invoked on a 403.
"""
from __future__ import annotations
from unittest.mock import patch, MagicMock
import pytest
from fastapi.testclient import TestClient
_ADMIN_KEY = "test-admin-key-for-oracle-resolve-fixture-32+"
@pytest.fixture
def client():
"""TestClient with the private-lane transport middleware short-circuited.
The ``enforce_high_privacy_mesh`` middleware in ``main.py`` returns
HTTP 202 ("preparing private lane") for ``/api/mesh/*`` requests
when the Wormhole supervisor is not yet at the required transport
tier. In tests that's always — Wormhole is not running. Patching
``_minimum_transport_tier`` to return None disables the tier check
for the duration of the test, letting the request reach the route
(and therefore reach the ``Depends(require_admin)`` we are testing).
"""
import main
with patch("main._minimum_transport_tier", return_value=None):
yield TestClient(main.app, raise_server_exceptions=False)
@pytest.fixture
def mock_ledger():
"""Replace oracle_ledger methods so tests don't mutate persistent state.
The handler does ``from services.mesh.mesh_oracle import oracle_ledger``
at call time, so we patch the module attribute.
"""
fake = MagicMock()
fake.resolve_market.return_value = (0, 0)
fake.resolve_market_stakes.return_value = {"winners": 0, "losers": 0}
fake.resolve_expired_stakes.return_value = []
with patch("services.mesh.mesh_oracle.oracle_ledger", fake):
yield fake
# ---------------------------------------------------------------------------
# /api/mesh/oracle/resolve — issue #240
# ---------------------------------------------------------------------------
class TestOracleResolveAuthGate:
def test_anonymous_caller_is_rejected(self, client, mock_ledger):
with patch("auth._current_admin_key", return_value=_ADMIN_KEY):
r = client.post(
"/api/mesh/oracle/resolve",
json={"market_title": "test-market", "outcome": "Yes"},
)
assert r.status_code == 403
# Critically: the ledger mutator must NOT have been called on a 403.
assert mock_ledger.resolve_market.call_count == 0
assert mock_ledger.resolve_market_stakes.call_count == 0
def test_wrong_admin_key_rejected(self, client, mock_ledger):
with patch("auth._current_admin_key", return_value=_ADMIN_KEY):
r = client.post(
"/api/mesh/oracle/resolve",
headers={"X-Admin-Key": "this-key-is-wrong"},
json={"market_title": "test-market", "outcome": "Yes"},
)
assert r.status_code == 403
assert mock_ledger.resolve_market.call_count == 0
def test_valid_admin_key_passes_auth_gate(self, client, mock_ledger):
with patch("auth._current_admin_key", return_value=_ADMIN_KEY):
r = client.post(
"/api/mesh/oracle/resolve",
headers={"X-Admin-Key": _ADMIN_KEY},
json={"market_title": "test-market", "outcome": "Yes"},
)
# The auth gate let us through. The handler ran and called the
# (mocked) ledger.
assert r.status_code == 200
assert mock_ledger.resolve_market.call_count == 1
assert mock_ledger.resolve_market.call_args[0] == ("test-market", "Yes")
def test_admin_key_unset_blocks_in_production_posture(self, client, mock_ledger):
"""When ADMIN_KEY env is not configured at all and we're not in
debug, the endpoint must still refuse never silently accept."""
with (
patch("auth._current_admin_key", return_value=""),
patch("auth._allow_insecure_admin", return_value=False),
patch("auth._debug_mode_enabled", return_value=False),
patch("auth._scoped_admin_tokens", return_value={}),
):
r = client.post(
"/api/mesh/oracle/resolve",
json={"market_title": "test-market", "outcome": "Yes"},
)
assert r.status_code == 403
assert mock_ledger.resolve_market.call_count == 0
# ---------------------------------------------------------------------------
# /api/mesh/oracle/resolve-stakes — issue #241
# ---------------------------------------------------------------------------
class TestOracleResolveStakesAuthGate:
def test_anonymous_caller_is_rejected(self, client, mock_ledger):
with patch("auth._current_admin_key", return_value=_ADMIN_KEY):
r = client.post("/api/mesh/oracle/resolve-stakes")
assert r.status_code == 403
assert mock_ledger.resolve_expired_stakes.call_count == 0
def test_wrong_admin_key_rejected(self, client, mock_ledger):
with patch("auth._current_admin_key", return_value=_ADMIN_KEY):
r = client.post(
"/api/mesh/oracle/resolve-stakes",
headers={"X-Admin-Key": "nope"},
)
assert r.status_code == 403
assert mock_ledger.resolve_expired_stakes.call_count == 0
def test_valid_admin_key_passes_auth_gate(self, client, mock_ledger):
with patch("auth._current_admin_key", return_value=_ADMIN_KEY):
r = client.post(
"/api/mesh/oracle/resolve-stakes",
headers={"X-Admin-Key": _ADMIN_KEY},
)
assert r.status_code == 200
assert mock_ledger.resolve_expired_stakes.call_count == 1
body = r.json()
assert body["ok"] is True
assert body["count"] == 0
def test_admin_key_unset_blocks_in_production_posture(self, client, mock_ledger):
with (
patch("auth._current_admin_key", return_value=""),
patch("auth._allow_insecure_admin", return_value=False),
patch("auth._debug_mode_enabled", return_value=False),
patch("auth._scoped_admin_tokens", return_value={}),
):
r = client.post("/api/mesh/oracle/resolve-stakes")
assert r.status_code == 403
assert mock_ledger.resolve_expired_stakes.call_count == 0
@@ -0,0 +1,277 @@
"""Round 7a: per-install operator handle threads through every outbound
third-party API call.
Background: before this change every Shadowbroker install identified
itself to Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
weather.gov, NUFORC, etc. with a single project-wide ``Shadowbroker``
User-Agent. From the upstream's perspective, every install in the world
looked like one giant scraper. If one install misbehaved, the upstream's
only recourse was to block ``Shadowbroker`` as a whole, taking out every
other install.
Fix: each install gets a stable pseudonymous handle (auto-generated like
``shadow-7f3a92`` or operator-overridden via ``OPERATOR_HANDLE``) that
gets embedded in the User-Agent for every outbound call. Upstreams can
now rate-limit / contact the specific operator instead of the project.
These tests pin:
1. The handle is auto-generated on first call if no override exists.
2. The handle survives process restart (persisted to disk).
3. ``OPERATOR_HANDLE`` env var override wins over the auto-gen handle.
4. The handle is sanitized (whitespace, special chars, length).
5. Every previously-MONSTER-UA call site now sends the per-operator UA.
"""
from __future__ import annotations
import json
import os
from pathlib import Path
from unittest.mock import patch
import pytest
@pytest.fixture
def isolated_handle(tmp_path, monkeypatch):
"""Redirect the persistence path to tmp and reset caches between tests."""
from services import network_utils
handle_file = tmp_path / "operator_handle.json"
monkeypatch.setattr(network_utils, "_OPERATOR_HANDLE_FILE", handle_file)
network_utils._reset_operator_handle_cache_for_tests()
monkeypatch.delenv("OPERATOR_HANDLE", raising=False)
# Reset Settings cache so OPERATOR_HANDLE env changes are picked up.
from services.config import get_settings
get_settings.cache_clear()
yield network_utils
network_utils._reset_operator_handle_cache_for_tests()
get_settings.cache_clear()
# ---------------------------------------------------------------------------
# Core handle generation / persistence / override
# ---------------------------------------------------------------------------
class TestOperatorHandleGeneration:
def test_auto_generates_on_first_call(self, isolated_handle):
h = isolated_handle.get_operator_handle()
# Prefix is "operator-" (deliberately neutral; "shadow-" looked
# exactly like a pattern abuse-detection systems would auto-block).
assert h.startswith("operator-")
assert len(h) == len("operator-") + 6
# Hex suffix.
suffix = h.split("-", 1)[1]
int(suffix, 16) # raises if not hex
def test_persists_to_disk_so_handle_survives_restart(self, isolated_handle):
first = isolated_handle.get_operator_handle()
# Simulate process restart: clear in-memory cache, then ask again.
isolated_handle._reset_operator_handle_cache_for_tests()
second = isolated_handle.get_operator_handle()
assert second == first
# The file actually exists.
assert isolated_handle._OPERATOR_HANDLE_FILE.exists()
body = json.loads(isolated_handle._OPERATOR_HANDLE_FILE.read_text())
assert body["handle"] == first
def test_env_override_wins_over_auto_generated(self, isolated_handle, monkeypatch):
# First call without env var auto-generates.
auto = isolated_handle.get_operator_handle()
assert auto.startswith("operator-")
# Setting env var changes the resolved handle without touching the disk file.
monkeypatch.setenv("OPERATOR_HANDLE", "alice")
from services.config import get_settings
get_settings.cache_clear()
isolated_handle._reset_operator_handle_cache_for_tests()
assert isolated_handle.get_operator_handle() == "alice"
def test_handle_is_sanitized(self, isolated_handle, monkeypatch):
from services.config import get_settings
# Sanitization tests run against the normalizer directly so the
# empty-string case can be asserted independently of the env-var
# resolution path (where empty means "use auto-gen", not "use
# 'anonymous'").
from services.network_utils import _normalize_handle
cases = [
("Alice Smith", "alice-smith"),
("user@example.com", "user-example-com"),
(" whitespace ", "whitespace"),
("UPPER-CASE", "upper-case"),
("multiple---dashes", "multiple-dashes"),
("/leading/slash", "leading-slash"),
("trailing-", "trailing"),
("", "anonymous"),
]
for raw, expected in cases:
got = _normalize_handle(raw)
assert got == expected, f"{raw!r} -> {got!r}, expected {expected!r}"
assert got == got.lower()
for ch in got:
assert ch.isalnum() or ch in "-_", f"unsafe char {ch!r} in {got!r}"
assert "--" not in got
def test_handle_is_length_capped(self, isolated_handle, monkeypatch):
from services.config import get_settings
monkeypatch.setenv("OPERATOR_HANDLE", "x" * 1000)
get_settings.cache_clear()
isolated_handle._reset_operator_handle_cache_for_tests()
got = isolated_handle.get_operator_handle()
assert len(got) <= 48
# ---------------------------------------------------------------------------
# outbound_user_agent() builds the right header
# ---------------------------------------------------------------------------
class TestOutboundUserAgentString:
def test_includes_operator_handle(self, isolated_handle):
ua = isolated_handle.outbound_user_agent()
handle = isolated_handle.get_operator_handle()
assert f"operator: {handle}" in ua
def test_includes_purpose_when_provided(self, isolated_handle):
ua = isolated_handle.outbound_user_agent("wikipedia")
assert "purpose: wikipedia" in ua
def test_includes_contact_path(self, isolated_handle):
ua = isolated_handle.outbound_user_agent()
assert "github.com" in ua.lower()
assert "shadowbroker" in ua.lower()
def test_version_prefix(self, isolated_handle):
ua = isolated_handle.outbound_user_agent()
assert ua.startswith("Shadowbroker/")
# ---------------------------------------------------------------------------
# Wikipedia / Wikidata — retroactive fix for PR #284's MONSTER pattern
# ---------------------------------------------------------------------------
class TestWikimediaCallsAreNowPerOperator:
def test_wikidata_call_uses_per_operator_ua(self, isolated_handle, monkeypatch):
from services import region_dossier
captured = []
class _FakeResp:
status_code = 200
def json(self):
return {"results": {"bindings": []}}
def fake_fetch(url, **kwargs):
captured.append(kwargs.get("headers") or {})
return _FakeResp()
monkeypatch.setattr(region_dossier, "fetch_with_curl", fake_fetch)
region_dossier._fetch_wikidata_leader("Testlandia")
assert captured, "Wikidata fetcher was not called"
headers = captured[0]
assert "User-Agent" in headers
assert "Api-User-Agent" in headers
handle = isolated_handle.get_operator_handle()
for header_value in (headers["User-Agent"], headers["Api-User-Agent"]):
assert f"operator: {handle}" in header_value, (
f"Wikimedia UA must include the per-operator handle; got {header_value!r}"
)
def test_wikipedia_summary_uses_per_operator_ua(self, isolated_handle, monkeypatch):
from services import region_dossier
captured = []
class _FakeResp:
status_code = 200
def json(self):
return {
"type": "standard",
"description": "x",
"extract": "y",
"thumbnail": {"source": ""},
}
def fake_fetch(url, **kwargs):
captured.append((url, kwargs.get("headers") or {}))
return _FakeResp()
monkeypatch.setattr(region_dossier, "fetch_with_curl", fake_fetch)
region_dossier._fetch_local_wiki_summary("Paris", "France")
wikipedia_hits = [c for c in captured if "wikipedia.org" in c[0]]
assert wikipedia_hits, "Wikipedia summary fetch was not called"
for _url, headers in wikipedia_hits:
handle = isolated_handle.get_operator_handle()
assert f"operator: {handle}" in headers.get("User-Agent", "")
# ---------------------------------------------------------------------------
# Generic round-7a regression guard
# ---------------------------------------------------------------------------
class TestNoMonsterUserAgentRemains:
"""The audit's underlying concern was that every Shadowbroker install
looked like one entity. This test scans the codebase for the OLD
aggregate identifier patterns and fails if a new one sneaks back in.
We allow the strings to appear in:
- comments (audit prose, change-log notes)
- tests
- .env.example (documentation)
The test only fails if the string lives in actual outbound-request
HEADER values without going through the per-operator helper.
"""
BANNED_LITERALS = (
"ShadowBroker-OSINT/1.0",
"ShadowBroker-OSINT/0.9",
"ShadowBroker-FeedIngester/1.0",
"ShadowBroker/0.9.79 local Shodan connector",
"ShadowBroker/0.9.79 Finnhub connector",
"Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)",
)
def test_no_banned_aggregate_user_agent_strings(self):
from pathlib import Path
backend_root = Path(__file__).parent.parent
offenders = []
for py in backend_root.rglob("*.py"):
# Skip test files and any audit-context comments.
rel = py.relative_to(backend_root).as_posix()
if rel.startswith("tests/"):
continue
text = py.read_text(encoding="utf-8", errors="ignore")
# Look only for the literal as part of a string in a User-Agent
# context: cheap heuristic via "User-Agent" + literal coexisting
# in the same file. A literal in a comment block won't trigger
# because the same line won't have User-Agent surrounding it.
for banned in self.BANNED_LITERALS:
if banned in text:
# Walk lines to ensure it's a real header value.
for i, line in enumerate(text.splitlines(), 1):
if banned in line:
# Comments / docstrings are allowed — only fail
# if the line looks like a header assignment.
stripped = line.strip()
if stripped.startswith("#"):
continue
if '"User-Agent"' in line or "'User-Agent'" in line:
offenders.append(f"{rel}:{i}: {stripped[:120]}")
assert not offenders, (
"Round 7a regression: the following lines reintroduced an "
"aggregate Shadowbroker User-Agent. Use "
"outbound_user_agent('purpose') instead so the per-install "
"operator handle is embedded.\n"
+ "\n".join(offenders)
)
@@ -0,0 +1,366 @@
"""Issue #256 (tg12): per-peer HMAC secrets must defeat cross-peer
impersonation.
Before the fix, ALL peer-push HMACs were derived from the single
fleet-shared ``MESH_PEER_PUSH_SECRET``. The receiver could only prove
"this request was signed by someone who knows the fleet secret" not
which peer signed it. Any peer that knew the secret could compute the
expected HMAC for any other peer's URL and impersonate that peer.
The fix introduces ``MESH_PEER_SECRETS``, a per-peer URL-to-secret map.
When a peer URL appears there:
- Only the listed per-peer secret is accepted for that URL.
- The global ``MESH_PEER_PUSH_SECRET`` is ignored for that specific URL.
- A peer that knows only the global secret (or a different peer's
per-peer secret) cannot forge a request claiming to be that peer.
When a peer URL is NOT listed (the common case for single-peer installs
and for migration windows), the resolver falls back to the global
secret preserving existing behavior with zero operator action.
These tests exercise ``resolve_peer_key_for_url`` directly so we cover
the security contract without spinning up a full mesh node.
"""
from __future__ import annotations
import hashlib
import hmac
import pytest
# ---------------------------------------------------------------------------
# _lookup_per_peer_secret — env parsing
# ---------------------------------------------------------------------------
class TestLookupPerPeerSecret:
def setup_method(self):
# Invalidate the parser cache so each test sees its own env state.
from services.mesh import mesh_crypto
mesh_crypto._PEER_SECRETS_CACHE = {}
mesh_crypto._PEER_SECRETS_CACHE_RAW = ""
def test_returns_empty_when_env_unset(self, monkeypatch):
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.delenv("MESH_PEER_SECRETS", raising=False)
assert _lookup_per_peer_secret("https://peer.example") == ""
def test_returns_empty_when_env_blank(self, monkeypatch):
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.setenv("MESH_PEER_SECRETS", "")
assert _lookup_per_peer_secret("https://peer.example") == ""
def test_returns_per_peer_secret_for_listed_url(self, monkeypatch):
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-a.example=secretA,https://peer-b.example=secretB",
)
assert _lookup_per_peer_secret("https://peer-a.example") == "secretA"
assert _lookup_per_peer_secret("https://peer-b.example") == "secretB"
def test_returns_empty_for_url_not_listed(self, monkeypatch):
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-a.example=secretA",
)
assert _lookup_per_peer_secret("https://other.example") == ""
def test_url_is_normalized_before_lookup(self, monkeypatch):
from services.mesh.mesh_crypto import _lookup_per_peer_secret
# Configure with a trailing slash + uppercase host. Lookup with
# plain lowercase host. Both should normalize to the same key.
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://Peer-A.Example/=secretA",
)
assert _lookup_per_peer_secret("https://peer-a.example") == "secretA"
def test_whitespace_around_entries_is_stripped(self, monkeypatch):
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.setenv(
"MESH_PEER_SECRETS",
" https://peer-a.example = secretA , https://peer-b.example=secretB ",
)
assert _lookup_per_peer_secret("https://peer-a.example") == "secretA"
assert _lookup_per_peer_secret("https://peer-b.example") == "secretB"
def test_malformed_entries_are_skipped_not_raised(self, monkeypatch):
"""A garbled MESH_PEER_SECRETS value must NOT crash the resolver.
Bad entries are silently dropped; well-formed entries still work.
This is the "fail-forward, not loud" rule a typo in operator
config should not take the whole backend down."""
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"no_equals_sign,=missing_url,https://no.secret=,https://good.example=secretGood",
)
assert _lookup_per_peer_secret("https://good.example") == "secretGood"
# The malformed ones produce no entry (and don't poison the cache).
assert _lookup_per_peer_secret("https://no.secret") == ""
def test_cache_invalidates_on_env_change(self, monkeypatch):
"""A test (or operator) updating MESH_PEER_SECRETS must see the
new value immediately no process restart required."""
from services.mesh.mesh_crypto import _lookup_per_peer_secret
monkeypatch.setenv("MESH_PEER_SECRETS", "https://a.example=first")
assert _lookup_per_peer_secret("https://a.example") == "first"
monkeypatch.setenv("MESH_PEER_SECRETS", "https://a.example=second")
assert _lookup_per_peer_secret("https://a.example") == "second"
# ---------------------------------------------------------------------------
# resolve_peer_key_for_url — precedence + fallback
# ---------------------------------------------------------------------------
class TestResolvePeerKeyForUrl:
def setup_method(self):
from services.mesh import mesh_crypto
mesh_crypto._PEER_SECRETS_CACHE = {}
mesh_crypto._PEER_SECRETS_CACHE_RAW = ""
def _fake_settings(self, global_secret: str):
from unittest.mock import MagicMock
s = MagicMock()
s.MESH_PEER_PUSH_SECRET = global_secret
return s
def test_falls_back_to_global_when_no_per_peer_entry(self, monkeypatch):
"""Single-peer installs: MESH_PEER_SECRETS empty, MESH_PEER_PUSH_SECRET
set must keep working as before."""
from services.mesh.mesh_crypto import (
resolve_peer_key_for_url,
_derive_peer_key,
)
monkeypatch.delenv("MESH_PEER_SECRETS", raising=False)
with monkeypatch.context() as m:
m.setattr(
"services.config.get_settings",
lambda: self._fake_settings("global-secret"),
)
key = resolve_peer_key_for_url("https://peer.example")
expected = _derive_peer_key("global-secret", "https://peer.example")
assert key == expected
assert len(key) == 32 # SHA-256 output
def test_per_peer_secret_takes_precedence_over_global(self, monkeypatch):
from services.mesh.mesh_crypto import (
resolve_peer_key_for_url,
_derive_peer_key,
)
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-a.example=per-peer-a-secret",
)
with monkeypatch.context() as m:
m.setattr(
"services.config.get_settings",
lambda: self._fake_settings("global-secret"),
)
key = resolve_peer_key_for_url("https://peer-a.example")
expected_per_peer = _derive_peer_key(
"per-peer-a-secret", "https://peer-a.example"
)
expected_global = _derive_peer_key("global-secret", "https://peer-a.example")
assert key == expected_per_peer
assert key != expected_global
def test_unlisted_peer_uses_global_during_migration(self, monkeypatch):
"""Partial migration: peer A is in MESH_PEER_SECRETS, peer B is
not yet. Peer B must keep working under the global secret."""
from services.mesh.mesh_crypto import (
resolve_peer_key_for_url,
_derive_peer_key,
)
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-a.example=per-peer-a-secret",
)
with monkeypatch.context() as m:
m.setattr(
"services.config.get_settings",
lambda: self._fake_settings("global-secret"),
)
key_a = resolve_peer_key_for_url("https://peer-a.example")
key_b = resolve_peer_key_for_url("https://peer-b.example")
expected_b = _derive_peer_key("global-secret", "https://peer-b.example")
assert key_b == expected_b
# Peer A's per-peer key must differ from peer B's global key
# (they're keyed by different secrets and different URLs).
assert key_a != key_b
def test_returns_empty_when_no_secret_available(self, monkeypatch):
from services.mesh.mesh_crypto import resolve_peer_key_for_url
monkeypatch.delenv("MESH_PEER_SECRETS", raising=False)
with monkeypatch.context() as m:
m.setattr(
"services.config.get_settings",
lambda: self._fake_settings(""),
)
key = resolve_peer_key_for_url("https://peer.example")
assert key == b""
def test_returns_empty_when_url_is_unparseable(self, monkeypatch):
from services.mesh.mesh_crypto import resolve_peer_key_for_url
with monkeypatch.context() as m:
m.setattr(
"services.config.get_settings",
lambda: self._fake_settings("global-secret"),
)
assert resolve_peer_key_for_url("") == b""
assert resolve_peer_key_for_url("not-a-url") == b""
assert resolve_peer_key_for_url(None) == b""
# ---------------------------------------------------------------------------
# The actual #256 attack: peer A cannot impersonate peer B
# ---------------------------------------------------------------------------
class TestCrossPeerImpersonationRefused:
"""The core regression: when MESH_PEER_SECRETS is configured, a peer
that knows ONLY the global secret (or a different peer's per-peer
secret) cannot produce a valid HMAC for another peer's URL."""
def setup_method(self):
from services.mesh import mesh_crypto
mesh_crypto._PEER_SECRETS_CACHE = {}
mesh_crypto._PEER_SECRETS_CACHE_RAW = ""
def _hmac(self, key: bytes, body: bytes) -> str:
return hmac.new(key, body, hashlib.sha256).hexdigest()
def test_peer_a_global_secret_cannot_forge_peer_b_hmac(self, monkeypatch):
from services.mesh.mesh_crypto import (
resolve_peer_key_for_url,
_derive_peer_key,
)
from unittest.mock import MagicMock
# Receiver has BOTH the global secret AND a per-peer secret for B.
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-b.example=per-peer-b-secret",
)
settings = MagicMock()
settings.MESH_PEER_PUSH_SECRET = "global-secret"
monkeypatch.setattr(
"services.config.get_settings", lambda: settings
)
body = b'{"events": [{"id": 1}]}'
# Attacker (peer A) knows only the global secret. Tries to forge
# an HMAC claiming to be peer B.
attacker_key = _derive_peer_key("global-secret", "https://peer-b.example")
attacker_hmac = self._hmac(attacker_key, body)
# Receiver derives B's expected key from B's per-peer secret.
receiver_key = resolve_peer_key_for_url("https://peer-b.example")
expected_hmac = self._hmac(receiver_key, body)
# The forgery MUST NOT match.
assert attacker_hmac != expected_hmac
def test_peer_a_per_peer_secret_cannot_forge_peer_b_hmac(self, monkeypatch):
"""Even harder case: peer A has its OWN per-peer secret, but
still does not know peer B's per-peer secret, and so cannot
forge an HMAC for peer B."""
from services.mesh.mesh_crypto import (
resolve_peer_key_for_url,
_derive_peer_key,
)
from unittest.mock import MagicMock
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-a.example=secretA,https://peer-b.example=secretB",
)
settings = MagicMock()
settings.MESH_PEER_PUSH_SECRET = ""
monkeypatch.setattr(
"services.config.get_settings", lambda: settings
)
body = b'{"events": [{"id": 99}]}'
# Attacker A tries to forge for B using its own secret (secretA).
attacker_key = _derive_peer_key("secretA", "https://peer-b.example")
attacker_hmac = self._hmac(attacker_key, body)
receiver_key = resolve_peer_key_for_url("https://peer-b.example")
expected_hmac = self._hmac(receiver_key, body)
assert attacker_hmac != expected_hmac
def test_legitimate_peer_b_request_verifies(self, monkeypatch):
"""Positive control: when peer B uses ITS per-peer secret and
claims to be itself, the receiver accepts the HMAC."""
from services.mesh.mesh_crypto import resolve_peer_key_for_url
from unittest.mock import MagicMock
monkeypatch.setenv(
"MESH_PEER_SECRETS",
"https://peer-b.example=secretB",
)
settings = MagicMock()
settings.MESH_PEER_PUSH_SECRET = ""
monkeypatch.setattr(
"services.config.get_settings", lambda: settings
)
body = b'{"events": [{"id": 7}]}'
# Peer B and the receiver both call resolve_peer_key_for_url.
sender_key = resolve_peer_key_for_url("https://peer-b.example")
receiver_key = resolve_peer_key_for_url("https://peer-b.example")
sender_hmac = self._hmac(sender_key, body)
expected_hmac = self._hmac(receiver_key, body)
assert sender_hmac == expected_hmac
def test_single_peer_install_zero_behavior_change(self, monkeypatch):
"""The "no UX hostility" guarantee: an install with the global
secret set and NO MESH_PEER_SECRETS entries must derive exactly
the same key as before this change."""
from services.mesh.mesh_crypto import (
resolve_peer_key_for_url,
_derive_peer_key,
)
from unittest.mock import MagicMock
monkeypatch.delenv("MESH_PEER_SECRETS", raising=False)
settings = MagicMock()
settings.MESH_PEER_PUSH_SECRET = "legacy-global-secret"
monkeypatch.setattr(
"services.config.get_settings", lambda: settings
)
# The legacy derivation that every prior call site used.
legacy_key = _derive_peer_key("legacy-global-secret", "https://peer.example")
# The new resolver, with no per-peer entries configured.
new_key = resolve_peer_key_for_url("https://peer.example")
assert new_key == legacy_key
@@ -0,0 +1,101 @@
"""Issues #218 / #219 (tg12): outbound Wikipedia + Wikidata calls must
identify ShadowBroker via the Wikimedia-recommended User-Agent /
Api-User-Agent headers.
Before this fix, ``backend/services/region_dossier.py`` called
``fetch_with_curl(url)`` with no explicit headers, falling back to the
generic project default UA. That sent a too-anonymous identifier to
Wikimedia. Per Wikimedia's policy
(https://foundation.wikimedia.org/wiki/Policy:Wikimedia_Foundation_User-Agent_Policy)
the API caller should send a stable, contactable identifier so Wikimedia
operators can rate-limit or reach the project.
This test does NOT make network calls. It patches ``fetch_with_curl``
and asserts the headers that get passed through.
"""
from __future__ import annotations
from unittest.mock import MagicMock, patch
import pytest
def _fake_resp(payload: dict, status: int = 200) -> MagicMock:
r = MagicMock()
r.status_code = status
r.json.return_value = payload
return r
def test_wikidata_call_passes_wikimedia_request_headers():
from services import region_dossier
calls = []
def fake_fetch(url, **kwargs):
calls.append(kwargs.get("headers"))
return _fake_resp({"results": {"bindings": []}})
with patch.object(region_dossier, "fetch_with_curl", side_effect=fake_fetch):
region_dossier._fetch_wikidata_leader("Testlandia")
assert calls, "fetch_with_curl was not called"
headers = calls[0] or {}
assert "User-Agent" in headers
assert "Api-User-Agent" in headers
# Stable identifier should mention the project + a contact path.
assert "Shadowbroker" in headers["Api-User-Agent"] or "ShadowBroker" in headers["Api-User-Agent"]
assert "github.com" in headers["Api-User-Agent"].lower()
def test_wikipedia_summary_call_passes_wikimedia_request_headers():
from services import region_dossier
calls = []
def fake_fetch(url, **kwargs):
calls.append((url, kwargs.get("headers")))
return _fake_resp(
{
"type": "standard",
"description": "test desc",
"extract": "test extract",
"thumbnail": {"source": ""},
}
)
with patch.object(region_dossier, "fetch_with_curl", side_effect=fake_fetch):
region_dossier._fetch_local_wiki_summary("Paris", "France")
# At least one Wikipedia REST call was issued.
wikipedia_calls = [c for c in calls if "wikipedia.org" in c[0]]
assert wikipedia_calls, "no Wikipedia call was issued"
for url, headers in wikipedia_calls:
headers = headers or {}
assert "User-Agent" in headers, f"missing User-Agent on {url}"
assert "Api-User-Agent" in headers, f"missing Api-User-Agent on {url}"
assert "github.com" in headers["Api-User-Agent"].lower()
def test_wikimedia_headers_helper_is_stable():
"""Regression guard: if someone removes the contact path or the
per-operator handle from the Wikimedia headers, we want a loud
test failure, not a silent ToS drift.
Round 7a: the original ``_WIKIMEDIA_REQUEST_HEADERS`` constant was
replaced with the ``_wikimedia_request_headers()`` function so the
per-install operator handle is embedded at call time. This test
pins both the project identifier AND the contact path AND the
per-operator format.
"""
from services.region_dossier import _wikimedia_request_headers
headers = _wikimedia_request_headers()
aua = headers.get("Api-User-Agent", "")
ua = headers.get("User-Agent", "")
for h, label in ((ua, "User-Agent"), (aua, "Api-User-Agent")):
assert "Shadowbroker" in h or "ShadowBroker" in h, f"{label} missing project id"
assert "github.com" in h.lower(), f"{label} missing contact URL"
assert "issues" in h.lower(), f"{label} missing /issues contact path"
# Round 7a: must include the per-operator handle.
assert "operator:" in h, f"{label} missing per-operator handle: {h!r}"
@@ -0,0 +1,263 @@
"""Issues #243, #252, #253 (tg12): settings endpoints must not leak
operational posture to unauthenticated callers.
- **#243**: ``GET /api/settings/wormhole``, ``/api/settings/privacy-profile``,
and ``/api/settings/node`` were leaking transport choice, anonymous-mode
state, the named privacy profile, and node-participant state to any
unauthenticated caller. The fix tightens the redaction allowlists to
expose ONLY a bare "is this feature on?" boolean and gates node mode
behind authenticated reads.
- **#252**: ``GET /api/settings/news-feeds`` returned the operator's full
curated feed inventory (names + URLs) to anyone. Now gated on
local-operator.
- **#253**: ``GET /api/settings/timemachine`` returned whether archival
capture is enabled to anyone. Now gated on local-operator.
Auth model: ``require_local_operator`` allows loopback (Tauri shell),
the Docker bridge frontend container (via the hostname-bound trust from
PR #278), and any caller that presents the configured admin key.
Anonymous LAN or internet callers do NOT pass and either receive 403
(news-feeds, timemachine) or a redacted minimum (wormhole / node).
"""
from __future__ import annotations
from unittest.mock import patch, MagicMock
import pytest
from fastapi.testclient import TestClient
_ADMIN_KEY = "test-admin-key-for-round5-fixture-32+chars"
@pytest.fixture
def client():
"""TestClient with the private-lane transport middleware disabled.
Same shape as the oracle resolve fixture the mesh privacy
middleware returns 202 for ``/api/settings/*`` under TestClient
because Wormhole is not actually running. Patching out the tier
requirement lets requests reach the route's auth gate.
"""
import main
with patch("main._minimum_transport_tier", return_value=None):
yield TestClient(main.app, raise_server_exceptions=False)
# ---------------------------------------------------------------------------
# #243: Wormhole posture redaction
# ---------------------------------------------------------------------------
class TestWormholeSettingsRedaction:
"""``GET /api/settings/wormhole`` must NOT leak transport choice or
anonymous-mode state to unauthenticated callers."""
def _read_settings_payload(self):
return {
"enabled": True,
"transport": "tor_arti",
"anonymous_mode": True,
"privacy_profile": "high",
"socks_proxy": "socks5h://127.0.0.1:9050",
}
def test_anonymous_caller_sees_only_enabled_bool(self, client):
with (
patch("main.read_wormhole_settings", return_value=self._read_settings_payload()),
patch("routers.wormhole.read_wormhole_settings", return_value=self._read_settings_payload()),
patch("services.wormhole_settings.read_wormhole_settings", return_value=self._read_settings_payload()),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get("/api/settings/wormhole")
assert r.status_code == 200
body = r.json()
# Only the bare "is Wormhole on?" boolean is exposed publicly.
assert "enabled" in body
assert body["enabled"] is True
# Posture fields the audit flagged must be absent.
assert "transport" not in body
assert "anonymous_mode" not in body
assert "privacy_profile" not in body
assert "socks_proxy" not in body
def test_authenticated_caller_sees_full_state(self, client):
with (
patch("main.read_wormhole_settings", return_value=self._read_settings_payload()),
patch("routers.wormhole.read_wormhole_settings", return_value=self._read_settings_payload()),
patch("services.wormhole_settings.read_wormhole_settings", return_value=self._read_settings_payload()),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get(
"/api/settings/wormhole",
headers={"X-Admin-Key": _ADMIN_KEY},
)
assert r.status_code == 200
body = r.json()
# All fields visible when authenticated.
assert body["enabled"] is True
assert body["transport"] == "tor_arti"
assert body["anonymous_mode"] is True
assert body["privacy_profile"] == "high"
class TestPrivacyProfileRedaction:
"""``GET /api/settings/privacy-profile`` must NOT leak the named
profile to unauthenticated callers (the profile name itself
discloses operator intent)."""
def _payload(self):
return {
"enabled": True,
"transport": "tor_arti",
"anonymous_mode": True,
"privacy_profile": "high",
}
def test_anonymous_caller_sees_only_wormhole_enabled_bool(self, client):
with (
patch("main.read_wormhole_settings", return_value=self._payload()),
patch("routers.wormhole.read_wormhole_settings", return_value=self._payload()),
patch("services.wormhole_settings.read_wormhole_settings", return_value=self._payload()),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get("/api/settings/privacy-profile")
assert r.status_code == 200
body = r.json()
assert "wormhole_enabled" in body
assert body["wormhole_enabled"] is True
# The named profile, transport, and anonymous mode must NOT
# leak to anonymous callers.
assert "profile" not in body or body.get("profile") is None
assert "transport" not in body
assert "anonymous_mode" not in body
def test_authenticated_caller_sees_named_profile_and_transport(self, client):
with (
patch("main.read_wormhole_settings", return_value=self._payload()),
patch("routers.wormhole.read_wormhole_settings", return_value=self._payload()),
patch("services.wormhole_settings.read_wormhole_settings", return_value=self._payload()),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get(
"/api/settings/privacy-profile",
headers={"X-Admin-Key": _ADMIN_KEY},
)
assert r.status_code == 200
body = r.json()
assert body["profile"] == "high"
assert body["wormhole_enabled"] is True
assert body["transport"] == "tor_arti"
assert body["anonymous_mode"] is True
class TestNodeSettingsRedaction:
"""``GET /api/settings/node`` must NOT disclose node_mode or
node_enabled to anonymous callers."""
def _node_data(self):
return {"some_node_field": "value"}
def test_anonymous_caller_sees_empty_stub(self, client):
with (
patch("services.node_settings.read_node_settings", return_value=self._node_data()),
patch("routers.admin._current_node_mode", return_value="participant"),
patch("routers.admin._participant_node_enabled", return_value=True),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get("/api/settings/node")
assert r.status_code == 200
body = r.json()
# No posture fields.
assert "node_mode" not in body
assert "node_enabled" not in body
assert "some_node_field" not in body
def test_authenticated_caller_sees_full_node_state(self, client):
with (
patch("services.node_settings.read_node_settings", return_value=self._node_data()),
patch("routers.admin._current_node_mode", return_value="participant"),
patch("routers.admin._participant_node_enabled", return_value=True),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get(
"/api/settings/node",
headers={"X-Admin-Key": _ADMIN_KEY},
)
assert r.status_code == 200
body = r.json()
assert body["node_mode"] == "participant"
assert body["node_enabled"] is True
assert body["some_node_field"] == "value"
# ---------------------------------------------------------------------------
# #252: news-feeds auth gate
# ---------------------------------------------------------------------------
class TestNewsFeedsAuthGate:
def _fake_feeds(self):
return [
{"name": "Custom Internal", "url": "https://internal.example/rss", "weight": 5},
{"name": "Default News", "url": "https://news.example/rss", "weight": 3},
]
def test_anonymous_caller_rejected(self, client):
with (
patch("services.news_feed_config.get_feeds", return_value=self._fake_feeds()) as get_feeds,
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get("/api/settings/news-feeds")
assert r.status_code == 403
# Critically: the underlying config read must NOT have been performed
# (else the response body could leak the count via response timing).
assert get_feeds.call_count == 0
def test_authenticated_caller_sees_full_feed_inventory(self, client):
with (
patch("services.news_feed_config.get_feeds", return_value=self._fake_feeds()),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get(
"/api/settings/news-feeds",
headers={"X-Admin-Key": _ADMIN_KEY},
)
assert r.status_code == 200
body = r.json()
assert len(body) == 2
assert body[0]["name"] == "Custom Internal"
assert body[0]["url"] == "https://internal.example/rss"
# ---------------------------------------------------------------------------
# #253: timemachine auth gate
# ---------------------------------------------------------------------------
class TestTimemachineAuthGate:
def test_anonymous_caller_rejected(self, client):
node_data = {"timemachine_enabled": True}
with (
patch("services.node_settings.read_node_settings", return_value=node_data),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get("/api/settings/timemachine")
assert r.status_code == 403
def test_authenticated_caller_sees_enabled_state(self, client):
node_data = {"timemachine_enabled": True}
with (
patch("services.node_settings.read_node_settings", return_value=node_data),
patch("auth._current_admin_key", return_value=_ADMIN_KEY),
):
r = client.get(
"/api/settings/timemachine",
headers={"X-Admin-Key": _ADMIN_KEY},
)
assert r.status_code == 200
body = r.json()
assert body["enabled"] is True
assert "storage_warning" in body
+9
View File
@@ -28,6 +28,15 @@ services:
- MESH_RELAY_PEERS=${MESH_RELAY_PEERS:-}
# Shared transport auth for operator peer push. Must be set to a unique secret per deployment.
- MESH_PEER_PUSH_SECRET=${MESH_PEER_PUSH_SECRET:-}
# Issue #256: optional per-peer HMAC secrets. Comma-separated
# `url=secret` pairs (no spaces). When a peer URL appears here, only
# the listed per-peer secret is accepted for it — the global
# MESH_PEER_PUSH_SECRET above is ignored for that specific URL. This
# closes the cross-peer impersonation surface for multi-peer fleets.
# Single-peer installs leave this empty (default) for unchanged
# behavior. Both sides of a peering must agree on the per-peer
# secret for a given URL.
- MESH_PEER_SECRETS=${MESH_PEER_SECRETS:-}
# Meshtastic MQTT is opt-in to avoid passive load on the public broker.
# Set MESH_MQTT_ENABLED=true in .env only when this node should join live MQTT.
- MESH_MQTT_ENABLED=${MESH_MQTT_ENABLED:-false}
@@ -0,0 +1,238 @@
/**
* Issues #218 / #219 / #220 (tg12 external audit) + Round 7a:
*
* Every browser-direct call to Wikipedia or Wikidata must send the
* `Api-User-Agent` header that Wikimedia's UA policy asks for, AND must
* embed the per-install operator handle so Wikimedia can rate-limit /
* contact the specific operator instead of treating "Shadowbroker" as
* one giant entity.
*
* These tests pin both requirements on the shared `lib/wikimediaClient`
* helper that WikiImage, NewsFeed, and useRegionDossier all route
* through. A future refactor that drops either the header OR the
* per-operator handle gets a loud test failure rather than a silent
* ToS / privacy regression.
*/
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
buildWikimediaUserAgent,
fetchWikipediaSummary,
fetchWikidataSparql,
_resetWikimediaClientCacheForTests,
} from '@/lib/wikimediaClient';
const originalFetch = globalThis.fetch;
// Helper: stub fetch so calls to /api/settings/operator-handle return a
// known handle, and everything else proxies to whatever the test set up.
function withHandle(handle: string, otherFetch: typeof globalThis.fetch) {
return vi.fn(async (input: any, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/api/settings/operator-handle')) {
return new Response(JSON.stringify({ handle }), { status: 200 });
}
return otherFetch(input, init);
});
}
describe('lib/wikimediaClient', () => {
beforeEach(() => {
_resetWikimediaClientCacheForTests();
});
afterEach(() => {
globalThis.fetch = originalFetch;
vi.restoreAllMocks();
});
it('builds a stable per-operator Api-User-Agent with contact path', async () => {
globalThis.fetch = withHandle(
'operator-abc123',
vi.fn(async () => new Response('{}', { status: 200 })) as any,
) as any;
const ua = await buildWikimediaUserAgent('wikipedia-summary');
expect(ua).toContain('Shadowbroker');
expect(ua.toLowerCase()).toContain('github.com');
expect(ua.toLowerCase()).toContain('issues');
expect(ua).toContain('operator: operator-abc123');
expect(ua).toContain('purpose: wikipedia-summary');
});
it('falls back to "operator-offline" when handle endpoint is unreachable', async () => {
globalThis.fetch = vi.fn(async (input: any) => {
const url = String(input);
if (url.endsWith('/api/settings/operator-handle')) {
return new Response('forbidden', { status: 403 });
}
return new Response('{}', { status: 200 });
}) as any;
const ua = await buildWikimediaUserAgent('test');
expect(ua).toContain('operator: operator-offline');
});
it('sends per-operator Api-User-Agent on Wikipedia summary fetch', async () => {
const wikiCalls: Array<{ url: string; init?: RequestInit }> = [];
const baseFetch = vi.fn(async (url: any, init?: RequestInit) => {
wikiCalls.push({ url: String(url), init });
return new Response(
JSON.stringify({
type: 'standard',
title: 'Boeing 747',
description: 'aircraft',
extract: 'long extract',
thumbnail: { source: 'https://example.org/thumb.jpg' },
}),
{ status: 200 },
);
});
globalThis.fetch = withHandle('operator-test01', baseFetch as any) as any;
const summary = await fetchWikipediaSummary('Boeing 747');
expect(summary?.thumbnail).toBe('https://example.org/thumb.jpg');
// wikiCalls only captures calls to non-handle URLs.
expect(wikiCalls).toHaveLength(1);
const headers = (wikiCalls[0].init?.headers || {}) as Record<string, string>;
expect(headers['Api-User-Agent']).toContain('operator: operator-test01');
expect(headers['Api-User-Agent']).toContain('purpose: wikipedia-summary');
});
it('sends per-operator Api-User-Agent on Wikidata SPARQL fetch', async () => {
const calls: Array<{ url: string; init?: RequestInit }> = [];
const baseFetch = vi.fn(async (url: any, init?: RequestInit) => {
calls.push({ url: String(url), init });
return new Response(
JSON.stringify({
results: { bindings: [{ leaderLabel: { value: 'Test Leader' } }] },
}),
{ status: 200 },
);
});
globalThis.fetch = withHandle('operator-sparql', baseFetch as any) as any;
const bindings = await fetchWikidataSparql('SELECT * WHERE { ?s ?p ?o }');
expect(bindings).toHaveLength(1);
const headers = (calls[0].init?.headers || {}) as Record<string, string>;
expect(headers['Api-User-Agent']).toContain('operator: operator-sparql');
expect(headers['Api-User-Agent']).toContain('purpose: wikidata-sparql');
expect(headers['Accept']).toBe('application/sparql-results+json');
});
it('handle endpoint is queried only ONCE across many wiki fetches', async () => {
let handleCalls = 0;
let wikiCalls = 0;
globalThis.fetch = vi.fn(async (input: any) => {
const url = String(input);
if (url.endsWith('/api/settings/operator-handle')) {
handleCalls++;
return new Response(JSON.stringify({ handle: 'operator-cache' }), { status: 200 });
}
wikiCalls++;
return new Response(
JSON.stringify({
type: 'standard',
title: 'X',
description: '',
extract: '',
thumbnail: { source: 'https://example.org/x.jpg' },
}),
{ status: 200 },
);
}) as any;
await fetchWikipediaSummary('Eiffel Tower');
await fetchWikipediaSummary('Mount Fuji');
await fetchWikipediaSummary('Statue of Liberty');
expect(handleCalls).toBe(1);
expect(wikiCalls).toBe(3);
});
it('shares cache across consecutive callers for the same Wikipedia title', async () => {
let fetchCount = 0;
const baseFetch = vi.fn(async () => {
fetchCount++;
return new Response(
JSON.stringify({
type: 'standard',
title: 'Eiffel Tower',
description: 'iron lattice tower',
extract: '...',
thumbnail: { source: 'https://example.org/eiffel.jpg' },
}),
{ status: 200 },
);
});
globalThis.fetch = withHandle('operator-cache', baseFetch as any) as any;
const a = await fetchWikipediaSummary('Eiffel Tower');
const b = await fetchWikipediaSummary('Eiffel Tower');
expect(fetchCount).toBe(1);
expect(a?.thumbnail).toBe(b?.thumbnail);
});
it('deduplicates concurrent in-flight requests for the same title', async () => {
let fetchCount = 0;
const baseFetch = vi.fn(async () => {
fetchCount++;
await new Promise((r) => setTimeout(r, 5));
return new Response(
JSON.stringify({
type: 'standard',
title: 'Mount Fuji',
description: 'stratovolcano',
extract: '...',
thumbnail: { source: 'https://example.org/fuji.jpg' },
}),
{ status: 200 },
);
});
globalThis.fetch = withHandle('operator-cache', baseFetch as any) as any;
const [a, b, c] = await Promise.all([
fetchWikipediaSummary('Mount Fuji'),
fetchWikipediaSummary('Mount Fuji'),
fetchWikipediaSummary('Mount Fuji'),
]);
expect(fetchCount).toBe(1);
expect(a?.thumbnail).toBe('https://example.org/fuji.jpg');
expect(b).toEqual(a);
expect(c).toEqual(a);
});
it('returns null on disambiguation pages without throwing', async () => {
globalThis.fetch = withHandle(
'operator-cache',
vi.fn(async () =>
new Response(JSON.stringify({ type: 'disambiguation' }), { status: 200 }),
) as any,
) as any;
const summary = await fetchWikipediaSummary('Mercury');
expect(summary).toBeNull();
});
it('returns null on HTTP error without throwing', async () => {
globalThis.fetch = withHandle(
'operator-cache',
vi.fn(async () => new Response('not found', { status: 404 })) as any,
) as any;
const summary = await fetchWikipediaSummary('Nonexistent Article 12345');
expect(summary).toBeNull();
});
it('returns null on network error without throwing', async () => {
globalThis.fetch = withHandle(
'operator-cache',
vi.fn(async () => {
throw new Error('network down');
}) as any,
) as any;
const summary = await fetchWikipediaSummary('Anything');
expect(summary).toBeNull();
});
it('returns null on empty input without fetching anything', async () => {
globalThis.fetch = vi.fn(async () => new Response('{}', { status: 200 })) as any;
expect(await fetchWikipediaSummary('')).toBeNull();
expect(await fetchWikipediaSummary(' ')).toBeNull();
expect(globalThis.fetch).not.toHaveBeenCalled();
});
});
+24 -20
View File
@@ -5,6 +5,7 @@ import { motion, AnimatePresence } from 'framer-motion';
import { AlertTriangle, Clock, Minus, Plus, ExternalLink, Brain, Loader2 } from 'lucide-react';
import React, { useEffect, useRef, useCallback } from 'react';
import WikiImage from '@/components/WikiImage';
import { fetchWikipediaSummary } from '@/lib/wikimediaClient';
import type { SelectedEntity, RegionDossier, FimiData } from "@/types/dashboard";
import { useDataKeys } from '@/hooks/useDataStore';
import { API_BASE } from '@/lib/api';
@@ -203,34 +204,37 @@ function resolveAircraftWikiTitle(model: string | undefined): string | null {
return AIRCRAFT_WIKI[model] || resolveAcTypeWiki(model);
}
// Module-level cache for Wikipedia thumbnails (persists across re-renders)
const _wikiThumbCache: Record<string, { url: string | null; loading: boolean }> = {};
// Issue #220 (tg12): the previous implementation kept its own
// module-local Wikipedia thumbnail cache and issued anonymous fetches
// without `Api-User-Agent`. We now delegate to lib/wikimediaClient,
// which sends the policy-compliant header and shares one cache with
// WikiImage and useRegionDossier.
function useAircraftImage(model: string | undefined): { imgUrl: string | null; wikiUrl: string | null; loading: boolean } {
const [, forceUpdate] = useState(0);
const [imgUrl, setImgUrl] = useState<string | null>(null);
const [loading, setLoading] = useState(false);
const wikiTitle = resolveAircraftWikiTitle(model) || undefined;
const wikiUrl = wikiTitle ? `https://en.wikipedia.org/wiki/${wikiTitle.replace(/ /g, '_')}` : null;
useEffect(() => {
if (!wikiTitle) return;
const key = wikiTitle;
if (_wikiThumbCache[key]) return; // Already fetched or in-flight
_wikiThumbCache[key] = { url: null, loading: true };
fetch(`https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(wikiTitle)}`)
.then(r => r.json())
.then(d => {
_wikiThumbCache[key] = { url: d.thumbnail?.source || null, loading: false };
forceUpdate(n => n + 1);
})
.catch(() => {
_wikiThumbCache[key] = { url: null, loading: false };
forceUpdate(n => n + 1);
});
let cancelled = false;
if (!wikiTitle) {
setImgUrl(null);
setLoading(false);
return;
}
setLoading(true);
fetchWikipediaSummary(wikiTitle).then((summary) => {
if (cancelled) return;
setImgUrl(summary?.thumbnail || null);
setLoading(false);
});
return () => {
cancelled = true;
};
}, [wikiTitle]);
if (!wikiTitle) return { imgUrl: null, wikiUrl: null, loading: false };
const cached = _wikiThumbCache[wikiTitle];
return { imgUrl: cached?.url || null, wikiUrl, loading: cached?.loading || false };
return { imgUrl, wikiUrl, loading };
}
+25 -23
View File
@@ -1,13 +1,17 @@
'use client';
import React, { useState, useEffect } from 'react';
import ExternalImage from '@/components/ExternalImage';
// Module-level cache: Wikipedia article title → thumbnail URL
const _cache: Record<string, { url: string | null; done: boolean }> = {};
import { fetchWikipediaSummary } from '@/lib/wikimediaClient';
/**
* WikiImage displays a Wikipedia thumbnail for a given article URL.
* Uses the Wikipedia REST API with a module-level cache (only fetches once per article).
*
* Issue #220 (tg12): this component previously had its own
* module-local Wikipedia fetch + cache. It now delegates to
* `lib/wikimediaClient`, which sends the policy-compliant
* `Api-User-Agent` header and shares one cache across every UI
* component that asks Wikipedia for an article summary (WikiImage,
* NewsFeed, useRegionDossier).
*
* Props:
* wikiUrl: Full Wikipedia URL, e.g. "https://en.wikipedia.org/wiki/Boeing_787_Dreamliner"
@@ -26,32 +30,30 @@ export default function WikiImage({
maxH?: string;
accent?: string;
}) {
const [, forceUpdate] = useState(0);
const [imgUrl, setImgUrl] = useState<string | null>(null);
const [loading, setLoading] = useState(true);
// Extract article title from URL
const title = wikiUrl.replace(/^https?:\/\/[^/]+\/wiki\//, '');
useEffect(() => {
if (!title || _cache[title]?.done) return;
if (_cache[title]) return; // In-flight
_cache[title] = { url: null, done: false };
fetch(`https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title)}`)
.then((r) => r.json())
.then((d) => {
_cache[title] = { url: d.thumbnail?.source || d.originalimage?.source || null, done: true };
forceUpdate((n) => n + 1);
})
.catch(() => {
_cache[title] = { url: null, done: true };
forceUpdate((n) => n + 1);
});
let cancelled = false;
if (!title) {
setImgUrl(null);
setLoading(false);
return;
}
setLoading(true);
fetchWikipediaSummary(title).then((summary) => {
if (cancelled) return;
setImgUrl(summary?.thumbnail || null);
setLoading(false);
});
return () => {
cancelled = true;
};
}, [title]);
const cached = _cache[title];
const imgUrl = cached?.url;
const loading = cached && !cached.done;
return (
<div className="pb-2">
{loading && (
+23 -22
View File
@@ -1,5 +1,6 @@
import { useCallback, useState, useEffect } from 'react';
import type { RegionDossier, SelectedEntity } from '@/types/dashboard';
import { fetchWikipediaSummary, fetchWikidataSparql } from '@/lib/wikimediaClient';
// ─── CACHE ─────────────────────────────────────────────────────────────────
// Simple in-memory cache keyed by rounded lat/lng (0.1° ≈ 11km grid), 24h TTL.
@@ -114,7 +115,11 @@ async function fetchCountryData(countryCode: string) {
return Array.isArray(data) ? data[0] || {} : data || {};
}
/** Fetch head of state + government type from Wikidata SPARQL (direct browser call). */
/** Fetch head of state + government type from Wikidata SPARQL.
*
* Issue #218 (tg12): routes through lib/wikimediaClient so the
* Api-User-Agent header is set per Wikimedia's UA policy.
*/
async function fetchLeader(countryName: string) {
if (!countryName) return { leader: 'Unknown', government_type: 'Unknown' };
const safeName = countryName.replace(/"/g, '\\"').replace(/'/g, "\\'");
@@ -127,13 +132,11 @@ async function fetchLeader(countryName: string) {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 1
`;
const url = `https://query.wikidata.org/sparql?query=${encodeURIComponent(sparql)}&format=json`;
const res = await fetch(url, {
headers: { Accept: 'application/sparql-results+json' },
});
if (!res.ok) throw new Error(`Wikidata HTTP ${res.status}`);
const results = (await res.json()).results?.bindings || [];
if (results.length > 0) {
const results = await fetchWikidataSparql<{
leaderLabel?: { value: string };
govTypeLabel?: { value: string };
}>(sparql);
if (results && results.length > 0) {
return {
leader: results[0].leaderLabel?.value || 'Unknown',
government_type: results[0].govTypeLabel?.value || 'Unknown',
@@ -142,27 +145,25 @@ async function fetchLeader(countryName: string) {
return { leader: 'Unknown', government_type: 'Unknown' };
}
/** Fetch Wikipedia summary for a place (direct browser call). */
/** Fetch Wikipedia summary for a place.
*
* Issue #219 (tg12): routes through lib/wikimediaClient so the
* Api-User-Agent header is set per Wikimedia's UA policy, AND the
* shared cache means consecutive useRegionDossier + WikiImage +
* NewsFeed lookups for the same article all hit the same slot.
*/
async function fetchLocalWikiSummary(placeName: string, countryName = '') {
if (!placeName) return {};
const candidates = [placeName];
if (countryName) candidates.push(`${placeName}, ${countryName}`);
for (const name of candidates) {
try {
const slug = encodeURIComponent(name.replace(/ /g, '_'));
const url = `https://en.wikipedia.org/api/rest_v1/page/summary/${slug}`;
const res = await fetch(url);
if (!res.ok) continue;
const data = await res.json();
if (data.type === 'disambiguation') continue;
const summary = await fetchWikipediaSummary(name);
if (summary) {
return {
description: data.description || '',
extract: data.extract || '',
thumbnail: data.thumbnail?.source || '',
description: summary.description,
extract: summary.extract,
thumbnail: summary.thumbnail,
};
} catch {
continue;
}
}
return {};
+210
View File
@@ -0,0 +1,210 @@
/**
* wikimediaClient single fetch surface for Wikipedia / Wikidata.
*
* Issues #218, #219, #220 (tg12 external audit) + Round 7a:
*
* Wikimedia's User-Agent policy asks API clients to identify themselves
* via `Api-User-Agent` when calling from browser JavaScript (because the
* browser does not let JS set `User-Agent` directly). Three independent
* components used to issue anonymous browser fetches against Wikipedia /
* Wikidata:
*
* - useRegionDossier (Wikidata SPARQL + Wikipedia REST summary)
* - WikiImage (Wikipedia REST summary)
* - NewsFeed (Wikipedia REST summary)
*
* PR #284 collapsed them into this shared helper with one stable
* `Api-User-Agent`. That fixed compliance but introduced a new problem:
* the `Api-User-Agent` was project-wide, so from Wikimedia's perspective
* every Shadowbroker install looked like one giant scraper. If one
* install misbehaved, Wikimedia's only recourse was to block the project
* as a whole.
*
* Round 7a fixes that. The frontend fetches the per-install operator
* handle from `GET /api/settings/operator-handle` once on first use and
* embeds it in the `Api-User-Agent`. Wikimedia can now rate-limit /
* contact the specific install instead of the project. The handle is
* auto-generated on the backend (`shadow-XXXXXX`) or operator-chosen via
* the `OPERATOR_HANDLE` setting.
*
* UX impact: zero. Same thumbnails, same summaries, same load behavior.
* The only observable change is the value of the outgoing
* `Api-User-Agent` header.
*/
// Module-level cache shared by WikiImage, NewsFeed, and useRegionDossier.
// Keyed by Wikipedia article title (NOT slug — we keep the human-readable
// form so debugging the cache is easier). Values track in-flight state
// so concurrent callers for the same title share one network request.
export interface WikipediaSummary {
title: string;
description: string;
extract: string;
thumbnail: string;
type: string; // 'standard' | 'disambiguation' | etc.
}
interface CacheEntry {
summary: WikipediaSummary | null;
inflight: Promise<WikipediaSummary | null> | null;
loaded: boolean;
}
const _summaryCache: Map<string, CacheEntry> = new Map();
const SUMMARY_CACHE_MAX = 512;
function evictIfOverCap() {
if (_summaryCache.size <= SUMMARY_CACHE_MAX) return;
const oldest = _summaryCache.keys().next().value;
if (oldest) _summaryCache.delete(oldest);
}
// ─── Per-operator handle (Round 7a) ────────────────────────────────────────
// Fetched once from the backend on first need and cached for the page
// lifetime. The handle is NOT a secret — Wikimedia will see it on every
// Wikipedia / Wikidata request we make — but caching it locally avoids a
// round-trip on every Wikipedia fetch and lets the offline / no-backend
// case still produce a stable UA (the fallback handle).
let _handlePromise: Promise<string> | null = null;
let _cachedHandle: string | null = null;
const FALLBACK_HANDLE = 'operator-offline';
const HANDLE_ENDPOINT = '/api/settings/operator-handle';
async function fetchOperatorHandle(): Promise<string> {
try {
const res = await fetch(HANDLE_ENDPOINT, {
// Use the standard relative-path proxy so the Next.js admin-key
// injection (same-origin) flows naturally for legitimate browser
// sessions. A cross-origin scanner will be blocked by the proxy
// before this even leaves their browser.
credentials: 'same-origin',
});
if (!res.ok) return FALLBACK_HANDLE;
const data = await res.json();
const h = (data && typeof data.handle === 'string' && data.handle.trim()) || '';
return h || FALLBACK_HANDLE;
} catch {
return FALLBACK_HANDLE;
}
}
async function getOperatorHandle(): Promise<string> {
if (_cachedHandle) return _cachedHandle;
if (!_handlePromise) {
_handlePromise = fetchOperatorHandle().then((h) => {
_cachedHandle = h;
return h;
});
}
return _handlePromise;
}
/** Build the Wikimedia Api-User-Agent for this install.
*
* Includes the per-install operator handle so Wikimedia can rate-limit /
* contact the specific operator instead of the project as a whole.
* Exported for tests; production callers should let
* `fetchWikipediaSummary` / `fetchWikidataSparql` build it implicitly.
*/
export async function buildWikimediaUserAgent(purpose: string): Promise<string> {
const handle = await getOperatorHandle();
const safePurpose = (purpose || '').replace(/[^a-zA-Z0-9_-]/g, '-').toLowerCase();
return (
`Shadowbroker/1.0 (operator: ${handle}; purpose: ${safePurpose}; ` +
'+https://github.com/BigBodyCobain/Shadowbroker; report issues at /issues)'
);
}
// ─── Wikipedia summary fetch ───────────────────────────────────────────────
/** Fetch a Wikipedia article summary (titles, NOT URLs).
*
* Empty / invalid input resolves to `null`. Network errors and disambig
* pages also resolve to `null` so callers can render a fallback without
* a try/catch. Per the audit's "fail forward, not loud" rule.
*/
export async function fetchWikipediaSummary(
title: string,
): Promise<WikipediaSummary | null> {
const trimmed = (title || '').trim();
if (!trimmed) return null;
const cached = _summaryCache.get(trimmed);
if (cached?.loaded) return cached.summary;
if (cached?.inflight) return cached.inflight;
const slug = encodeURIComponent(trimmed.replace(/ /g, '_'));
const url = `https://en.wikipedia.org/api/rest_v1/page/summary/${slug}`;
const promise = (async (): Promise<WikipediaSummary | null> => {
try {
const ua = await buildWikimediaUserAgent('wikipedia-summary');
const r = await fetch(url, { headers: { 'Api-User-Agent': ua } });
if (!r.ok) return null;
const d = await r.json();
if (d?.type === 'disambiguation') return null;
return {
title: trimmed,
description: d?.description || '',
extract: d?.extract || '',
thumbnail: d?.thumbnail?.source || d?.originalimage?.source || '',
type: d?.type || 'standard',
};
} catch {
return null;
}
})().then((summary) => {
_summaryCache.set(trimmed, { summary, inflight: null, loaded: true });
evictIfOverCap();
return summary;
});
_summaryCache.set(trimmed, { summary: null, inflight: promise, loaded: false });
evictIfOverCap();
return promise;
}
// ─── Wikidata SPARQL ───────────────────────────────────────────────────────
/** Fetch a Wikidata SPARQL query result.
*
* Returns the parsed JSON `results.bindings` array on success; `null`
* (not throwing) on any failure so callers can render fallbacks
* silently. Per-install operator handle threaded through `Api-User-Agent`
* (Round 7a).
*/
export async function fetchWikidataSparql<T = Record<string, { value: string }>>(
sparql: string,
): Promise<T[] | null> {
const trimmed = (sparql || '').trim();
if (!trimmed) return null;
const url = `https://query.wikidata.org/sparql?query=${encodeURIComponent(
trimmed,
)}&format=json`;
try {
const ua = await buildWikimediaUserAgent('wikidata-sparql');
const res = await fetch(url, {
headers: {
'Api-User-Agent': ua,
Accept: 'application/sparql-results+json',
},
});
if (!res.ok) return null;
const json = await res.json();
const bindings = json?.results?.bindings;
return Array.isArray(bindings) ? (bindings as T[]) : null;
} catch {
return null;
}
}
// ─── Test helpers ──────────────────────────────────────────────────────────
/** Internal: clear the shared cache + the handle cache. Exposed for tests only. */
export function _resetWikimediaClientCacheForTests() {
_summaryCache.clear();
_handlePromise = null;
_cachedHandle = null;
}
Generated
+11 -37
View File
@@ -80,8 +80,8 @@ dependencies = [
{ name = "apscheduler" },
{ name = "beautifulsoup4" },
{ name = "cachetools" },
{ name = "cloudscraper" },
{ name = "cryptography" },
{ name = "defusedxml" },
{ name = "fastapi" },
{ name = "feedparser" },
{ name = "httpx" },
@@ -118,8 +118,8 @@ requires-dist = [
{ name = "apscheduler", specifier = "==3.10.3" },
{ name = "beautifulsoup4", specifier = ">=4.9.0" },
{ name = "cachetools", specifier = "==5.5.2" },
{ name = "cloudscraper", specifier = "==1.2.71" },
{ name = "cryptography", specifier = ">=41.0.0" },
{ name = "defusedxml", specifier = ">=0.7.1" },
{ name = "fastapi", specifier = "==0.115.12" },
{ name = "feedparser", specifier = "==6.0.10" },
{ name = "httpx", specifier = "==0.28.1" },
@@ -451,20 +451,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/98/78/01c019cdb5d6498122777c1a43056ebb3ebfeef2076d9d026bfe15583b2b/click-8.3.1-py3-none-any.whl", hash = "sha256:981153a64e25f12d547d3426c367a4857371575ee7ad18df2a6183ab0545b2a6", size = 108274, upload-time = "2025-11-15T20:45:41.139Z" },
]
[[package]]
name = "cloudscraper"
version = "1.2.71"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pyparsing" },
{ name = "requests" },
{ name = "requests-toolbelt" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ac/25/6d0481860583f44953bd791de0b7c4f6d7ead7223f8a17e776247b34a5b4/cloudscraper-1.2.71.tar.gz", hash = "sha256:429c6e8aa6916d5bad5c8a5eac50f3ea53c9ac22616f6cb21b18dcc71517d0d3", size = 93261, upload-time = "2023-04-25T23:20:19.467Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/81/97/fc88803a451029688dffd7eb446dc1b529657577aec13aceff1cc9628c5d/cloudscraper-1.2.71-py2.py3-none-any.whl", hash = "sha256:76f50ca529ed2279e220837befdec892626f9511708e200d48d5bb76ded679b0", size = 99652, upload-time = "2023-04-25T23:20:15.974Z" },
]
[[package]]
name = "colorama"
version = "0.4.6"
@@ -600,6 +586,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a4/87/d03a718e7bfdbbebaa4b6a66ba5bb069bc00a84e5ad176d8198cc785cd42/dbus_fast-4.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f6af190d8306f1bd506740c39701f5c211aa31ac660a3fcb401ebb97d33166c7", size = 1627620, upload-time = "2026-02-01T21:05:46.878Z" },
]
[[package]]
name = "defusedxml"
version = "0.7.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/0f/d5/c66da9b79e5bdb124974bfe172b4daf3c984ebd9c2a06e2b8a4dc7331c72/defusedxml-0.7.1.tar.gz", hash = "sha256:1bb3032db185915b62d7c6209c5a8792be6a32ab2fedacc84e01b52c51aa3e69", size = 75520, upload-time = "2021-03-08T10:59:26.269Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/07/6c/aa3f2f849e01cb6a001cd8554a88d4c77c5c1a31c95bdf1cf9301e6d9ef4/defusedxml-0.7.1-py2.py3-none-any.whl", hash = "sha256:a352e7e428770286cc899e2542b6cdaedb2b4953ff269a210103ec58f6198a61", size = 25604, upload-time = "2021-03-08T10:59:24.45Z" },
]
[[package]]
name = "deprecated"
version = "1.3.1"
@@ -1632,15 +1627,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/99/32/15e08a0c4bb536303e1568e2ba5cae1ce39a2e026a03aea46173af4c7a2d/pyobjc_framework_libdispatch-12.1-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:23fc9915cba328216b6a736c7a48438a16213f16dfb467f69506300b95938cc7", size = 15976, upload-time = "2025-11-14T09:53:07.936Z" },
]
[[package]]
name = "pyparsing"
version = "3.3.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f3/91/9c6ee907786a473bf81c5f53cf703ba0957b23ab84c264080fb5a450416f/pyparsing-3.3.2.tar.gz", hash = "sha256:c777f4d763f140633dcb6d8a3eda953bf7a214dc4eff598413c070bcdc117cbc", size = 6851574, upload-time = "2026-01-21T03:57:59.36Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/10/bd/c038d7cc38edc1aa5bf91ab8068b63d4308c66c4c8bb3cbba7dfbc049f9c/pyparsing-3.3.2-py3-none-any.whl", hash = "sha256:850ba148bd908d7e2411587e247a1e4f0327839c40e2e5e6d05a007ecc69911d", size = 122781, upload-time = "2026-01-21T03:57:55.912Z" },
]
[[package]]
name = "pypubsub"
version = "4.0.7"
@@ -1890,18 +1876,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl", hash = "sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f", size = 62574, upload-time = "2023-05-22T15:12:42.313Z" },
]
[[package]]
name = "requests-toolbelt"
version = "1.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f3/61/d7545dafb7ac2230c70d38d31cbfe4cc64f7144dc41f6e4e4b78ecd9f5bb/requests-toolbelt-1.0.0.tar.gz", hash = "sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6", size = 206888, upload-time = "2023-05-01T04:11:33.229Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" },
]
[[package]]
name = "reverse-geocoder"
version = "1.5.1"