Commit Graph

377 Commits

Author SHA1 Message Date
BigBodyCobain 03b8053617 feat(flights): cumulative fuel burned + CO2 emitted per flight
Pre-fix, the emissions tooltip only showed the per-hour *rate*, but what
most users actually want is the cumulative *amount* burned. This adds running
totals computed by multiplying the model-based rate by the elapsed
observation time since we first saw the airframe.

New module ``flight_observations.py``:
* Tracks first_seen_at + last_seen_at per icao24 hex.
* Re-opens a fresh session when an aircraft is unseen for > 15 min
  (treated as a new flight — landed and took off, or transited a dead
  zone). Prevents the cumulative counter from resetting mid-flight if
  the trail-rendering cache prunes the trail.
* Clamps elapsed time to 24h max so clock skew can't produce comically
  large numbers.
* Pruned every 5 min via a new scheduler job (mirrors ais_prune cadence).

flights.py + military.py emission enrichment now also attaches:
* observed_seconds — how long we've been tracking this airframe.
* fuel_gallons_burned — rate * elapsed_h.
* co2_kg_emitted — rate * elapsed_h.
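
A minimal sketch of the session tracking and the rate * elapsed_h
arithmetic described above. It is not the contents of
``flight_observations.py``: the names ``record_observation``,
``cumulative_totals``, ``Observation`` and the two constants are
hypothetical, and only the 15 min gap, the 24h clamp, and the three
output field names come from this message.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

SESSION_GAP_S = 15 * 60        # unseen longer than this => treat as a new flight
MAX_ELAPSED_S = 24 * 60 * 60   # clamp so clock skew cannot inflate totals


@dataclass
class Observation:
    first_seen_at: float
    last_seen_at: float


# Hypothetical in-memory store keyed by normalized icao24 hex.
_observations: Dict[str, Observation] = {}


def record_observation(icao24: str, now: Optional[float] = None) -> float:
    """Record a sighting and return observed seconds for the current session."""
    now = time.time() if now is None else now
    key = icao24.strip().lower()  # case-insensitive key normalization
    obs = _observations.get(key)
    if obs is None or now - obs.last_seen_at > SESSION_GAP_S:
        # First sighting, or a gap long enough to count as a new flight.
        obs = Observation(first_seen_at=now, last_seen_at=now)
        _observations[key] = obs
    else:
        obs.last_seen_at = max(obs.last_seen_at, now)
    elapsed = obs.last_seen_at - obs.first_seen_at
    return min(max(elapsed, 0.0), MAX_ELAPSED_S)


def cumulative_totals(fuel_gph: float, co2_kg_per_h: float,
                      observed_seconds: float) -> dict:
    """Turn per-hour rates into running totals (rate * elapsed hours)."""
    elapsed_h = observed_seconds / 3600.0
    return {
        "observed_seconds": int(observed_seconds),
        "fuel_gallons_burned": fuel_gph * elapsed_h,
        "co2_kg_emitted": co2_kg_per_h * elapsed_h,
    }
```

For example, an airframe observed for 30 minutes at an assumed 100 gal/h
would report 50 gallons burned.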

The existing per-hour rate fields stay in the dict for backward compat
and are shown as small secondary context in the tooltip.

Frontend EmissionsEstimateBlock (NewsFeed.tsx) now prominently shows
the cumulative totals with the rate as smaller context underneath plus
"Observed in flight for Xh Ym". When observed_seconds is 0 (first refresh)
it renders "Just observed · totals will appear on next refresh" instead
of a misleading "0 gal".

12 backend tests cover record/accumulate/reset, the 24h clamp, prune,
case-insensitive key normalization, and end-to-end emission integration
in _classify_and_publish.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 07:56:23 -06:00
Shadowbroker 20807a2d62 Merge pull request #316 from BigBodyCobain/feat/aishub-fallback
feat(ais): AISHub REST fallback when AISStream is offline (20-min polling)
2026-05-23 07:42:56 -06:00
Shadowbroker 79fbf9741b Merge pull request #314 from BigBodyCobain/feat/ais-upstream-health
feat(ais): surface AISStream upstream outage instead of failing silently
2026-05-23 07:12:37 -06:00
BigBodyCobain a2f5d62926 feat(ais): AISHub REST fallback when AISStream WebSocket is offline
When stream.aisstream.io is unreachable (cert outage, server down — see
2026-05-20 and 2026-05-23 events) the ships layer goes empty. This adds
a slow REST fallback to data.aishub.net so the layer stays populated in
degraded mode.

Behavior:

* Opt-in via AISHUB_USERNAME (free registration at aishub.net/api).
  Without the env var the fetcher is a no-op.
* Default poll cadence 20 min — well inside their free-tier limits, gives
  ships time to move enough to look "alive". Configurable via
  AISHUB_POLL_INTERVAL_MINUTES, clamped to [1, 360].
* Internal gate: skips the poll entirely when the WebSocket primary is
  currently connected. Stomping fresh live data with 20-min-old REST
  data would be worse than leaving it alone.
* Vessels merge into the shared _vessels dict with source="aishub" so
  the existing UI / health tooling can attribute the provider.
* Live data wins races: if a WebSocket update for the same MMSI lands in
  the last 1s, we don't overwrite with the slower REST record.
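
The gating rules above can be sketched roughly as follows. This is an
illustration under assumptions, not the shipped fetcher: the function
names, the ``updated_at`` field, and the "aisstream" source label are
hypothetical; the [1, 360] clamp, the 20 min default, the
primary-connected gate, and the 1s live-data-wins window come from this
message.

```python
import time
from typing import Dict, Optional

DEFAULT_POLL_MIN = 20
LIVE_WINS_WINDOW_S = 1.0


def clamp_poll_interval(raw: Optional[str]) -> int:
    """Parse an AISHUB_POLL_INTERVAL_MINUTES value, clamped to [1, 360]."""
    try:
        minutes = int(raw) if raw is not None else DEFAULT_POLL_MIN
    except ValueError:
        minutes = DEFAULT_POLL_MIN  # unparseable value: fall back to default
    return max(1, min(360, minutes))


def should_poll(username: Optional[str], primary_connected: bool) -> bool:
    """No-op without a username; skip while the WebSocket primary is live."""
    return bool(username) and not primary_connected


def merge_rest_record(vessels: Dict[int, dict], record: dict,
                      now: Optional[float] = None) -> bool:
    """Merge one REST record; return False when fresher live data wins."""
    now = time.time() if now is None else now
    existing = vessels.get(record["mmsi"])
    if (existing is not None
            and existing.get("source") != "aishub"
            and now - existing.get("updated_at", 0.0) <= LIVE_WINS_WINDOW_S):
        return False  # a live update landed within the last second
    vessels[record["mmsi"]] = {**record, "source": "aishub", "updated_at": now}
    return True
```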

Scheduler job runs every AISHUB_POLL_INTERVAL_MINUTES minutes alongside
the existing ais_prune job in data_fetcher.py.

24 tests cover gating (no-username, primary-connected), response parsing
(success / error / empty / malformed / unexpected shape), record
normalization (sentinels, missing fields, range checks, AIS @ padding),
poll interval clamping, and end-to-end merge with live-data-wins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 07:00:32 -06:00
BigBodyCobain 5e0b2c037e feat(ais): surface upstream outage instead of failing silently
On 2026-05-23, stream.aisstream.io went fully offline (TCP timeouts on port
443). The backend kept respawning the node WebSocket proxy every few
seconds with nothing arriving. From the operator's POV the ships layer
silently went empty — no banner, no log surfacing, no way to tell whether
it was their config / network / viewport filter / upstream.

Backend:
* ais_proxy_status() now also returns:
  - connected (bool): true when a vessel message arrived in last 60s
  - last_msg_age_seconds (int | None)
  - proxy_spawn_count (int): proxy respawns — sustained growth without
    connected means upstream is dead
* /api/health escalates top status to "degraded" when AIS_API_KEY is set
  but the proxy is currently disconnected. Existing degraded_tls signal
  preserved.
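
As a rough sketch, the status shape and the escalation rule could be
derived like this. The function names and the "ok" status string are
assumptions made for illustration; the three field names, the 60s
window, and the "degraded" value are taken from the list above.

```python
import time
from typing import Optional

CONNECTED_WINDOW_S = 60  # a vessel message within this window => connected


def ais_status(last_msg_at: Optional[float], proxy_spawn_count: int,
               now: Optional[float] = None) -> dict:
    """Build the status dict from the timestamp of the last vessel message."""
    now = time.time() if now is None else now
    age = None if last_msg_at is None else max(0, int(now - last_msg_at))
    return {
        "connected": age is not None and age <= CONNECTED_WINDOW_S,
        "last_msg_age_seconds": age,
        "proxy_spawn_count": proxy_spawn_count,
    }


def top_level_health(ais_api_key_set: bool, status: dict) -> str:
    """Escalate only for operators who opted in by setting AIS_API_KEY."""
    if ais_api_key_set and not status["connected"]:
        return "degraded"
    return "ok"  # hypothetical healthy value
```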

Frontend:
* useAisUpstreamHealth hook polls /api/health every 30s, derives the
  outage state. Defensively only reports outage once proxy_spawn_count > 0
  so operators who haven't opted in don't see the banner.
* AisUpstreamBanner component renders a dismissible amber notice
  "Ship data temporarily unavailable — AISStream upstream is offline"
  mounted on the main app shell.

7 backend tests pin the status-shape contract and the /api/health
escalation behavior in both with-key and without-key configurations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 06:38:05 -06:00
Shadowbroker 69ef231e5a Merge pull request #313 from BigBodyCobain/feat/flight-source-attribution
feat(flights): stamp source attribution on every flight record
2026-05-23 06:29:31 -06:00
Shadowbroker 7a5f47ca9e Merge pull request #312 from BigBodyCobain/fix/gps-jamming-thresholds
fix(gps-jamming): count nac_p=0 + lower thresholds so layer actually fires
2026-05-23 06:29:20 -06:00
Shadowbroker 5cd49542bf Merge pull request #311 from BigBodyCobain/fix/uap-fallback-cutoff
fix(uap): stop HF fallback from serving 3-year-old NUFORC sightings
2026-05-23 06:29:08 -06:00
BigBodyCobain f14d4feb6d feat(flights): stamp source attribution on every flight record
Pre-fix, adsb.lol records (the primary source for most flights) carried
no source marker. OpenSky records got is_opensky: True and supplementals
got supplemental_source, so any UI inspecting source labels saw
OpenSky/airplanes.live records as explicitly tagged and adsb.lol records
as "unlabeled" — making it look like adsb.lol wasn't being used at all
even though it's the primary source.

Changes:

* _fetch_adsb_lol_regions stamps source="adsb.lol" on each aircraft
  before returning, so the tag survives the OpenSky dedupe-by-hex merge.
* OpenSky records get source="OpenSky" (alongside is_opensky=True for
  back-compat).
* military fetcher tags source on both adsb.lol and airplanes.live
  records before they're merged, and propagates source into the
  military_flights and uavs output dicts.
* _classify_and_publish promotes the explicit source field into the
  published flight dict. Falls back to legacy supplemental_source if
  source is absent. Final fallback "adsb.lol" preserves prior behavior
  for any caller synthesizing records without going through a fetcher.
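
The precedence described in the last bullet can be illustrated with a
small helper. ``resolve_source`` is a hypothetical name, not a function
from flights.py; the field names and the "adsb.lol" default come from
this message.

```python
def resolve_source(record: dict) -> str:
    """Pick the published source label for one flight record."""
    source = record.get("source")
    if source:
        return source  # explicit tag wins
    legacy = record.get("supplemental_source")
    if legacy:
        return legacy  # legacy supplemental marker
    return "adsb.lol"  # final fallback preserves prior behavior
```

A record carrying both ``source`` and ``supplemental_source`` resolves
to the explicit ``source`` value.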

8 new tests cover the published-dict propagation, OpenSky tagging,
supplemental fallback, explicit-wins precedence, default behavior, the
adsb.lol regional fetcher tagging, and the military output dict.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 06:14:39 -06:00
BigBodyCobain 19a8560a80 fix(gps-jamming): count nac_p=0 + lower thresholds so the layer actually fires
Three stacked filters meant the gps_jamming layer almost never lit up:

1. nac_p == 0 aircraft were dropped on the theory that "0 = old transponder."
   That's only half right — modern Mode-S Enhanced Surveillance transponders
   also fall back to nac_p=0 when they lose GPS lock entirely, which IS the
   jamming signature we want to catch. Discarding them was discarding the
   strongest signal. None (no field at all — typical for OpenSky-sourced
   records) is still skipped because absence-of-data isn't evidence.
2. GPS_JAMMING_MIN_AIRCRAFT was 5 per 1°x1° cell. Jamming hotspots
   (eastern Med, Russia/Ukraine border, Iran/Iraq) tend to have sparser
   traffic because pilots avoid them. Lowered to 3.
3. GPS_JAMMING_MIN_RATIO was 0.30. Combined with the (preserved) -1 noise
   cushion, that made the effective bar high. Lowered to 0.20.

The 1-aircraft noise cushion is intact so a single quirky transponder
still can't flag a zone alone.

Also extracted the detector loop into a pure ``detect_gps_jamming_zones()``
function at module scope so it's testable in isolation (was previously
inlined inside ``_classify_and_publish``). The public signature accepts
threshold overrides for ad-hoc re-tuning without code edits.
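
A sketch of what a pure detector along these lines might look like. The
body is a guess, not the real function: it assumes only nac_p == 0
counts as degraded, that the cushion is applied as
(degraded - 1) / total, and that cells are floor(lat), floor(lon). The
function name, the two constants, and the lon/lng compatibility are
from this message; the output dict shape is hypothetical.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Optional

GPS_JAMMING_MIN_AIRCRAFT = 3
GPS_JAMMING_MIN_RATIO = 0.20


def detect_gps_jamming_zones(aircraft: Optional[Iterable[dict]],
                             min_aircraft: int = GPS_JAMMING_MIN_AIRCRAFT,
                             min_ratio: float = GPS_JAMMING_MIN_RATIO) -> List[dict]:
    """Group aircraft into 1x1 degree cells and flag cells that look jammed."""
    cells = defaultdict(lambda: [0, 0])  # cell -> [total, degraded]
    for ac in aircraft or []:
        nac_p = ac.get("nac_p")
        lat = ac.get("lat")
        lon = ac.get("lon", ac.get("lng"))  # accept either key
        if nac_p is None or lat is None or lon is None:
            continue  # absence of data is not evidence
        cell = (math.floor(lat), math.floor(lon))
        cells[cell][0] += 1
        if nac_p == 0:  # assumed definition of "degraded"
            cells[cell][1] += 1

    zones = []
    for (lat, lon), (total, degraded) in sorted(cells.items()):
        if total < min_aircraft:
            continue
        # Noise cushion: one quirky transponder alone never flags a zone.
        ratio = max(degraded - 1, 0) / total
        if degraded > 1 and ratio >= min_ratio:
            zones.append({"lat": lat, "lon": lon, "total": total,
                          "degraded": degraded, "ratio": ratio})
    return zones
```

Passing ``min_aircraft`` or ``min_ratio`` explicitly corresponds to the
threshold overrides mentioned above.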

16 new tests cover nac_p=0 inclusion, None-skip preservation, MIN_AIRCRAFT
lowering, MIN_RATIO lowering, noise cushion preservation, constant pinning,
override behavior, lon/lng key compatibility, and robustness to empty/None
inputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 23:40:18 -06:00
BigBodyCobain 0d0e009867 fix(uap): stop HF fallback from serving 3-year-old NUFORC sightings
The UAP sightings layer is sourced from a live scrape of nuforc.org with a
static Hugging Face CSV mirror (kcimc/NUFORC) as a fallback. The fallback
parsed every row, sorted by occurred-desc, and took the top 250 — with no
date cutoff. The HF mirror is a third-party snapshot that hasn't been
refreshed in years, so the "newest 250" rows it returns are from ~2022-23.
When the live path fails (Cloudflare 403, curl disabled on Windows, wdtNonce
regex stale, etc.) users see a map full of sightings from 3 years ago,
labeled as the "last 60 days" layer.

Changes:

* HF fallback now applies the same 60-day cutoff the live path uses. Rows
  outside the window are dropped before take-top-N. If the mirror has
  nothing inside the window the fallback returns [] (don't serve stale).
* When the HF mirror is fully stale a loud ERROR log fires with the count
  of dropped rows so the operator can tell the mirror's the problem, not
  a network issue.
* When BOTH live AND HF fallback produce 0 rows, fetch_uap_sightings now
  trips assert_canary("uap_sightings", 0) so the health registry shows
  the layer as broken instead of "fresh and empty for days."
* Scheduler moved from daily 12:00 UTC to weekly Mondays 12:00 UTC. The
  layer is a rolling 60-day digest; a weekly refresh is frequent enough
  for human-readable map exploration and keeps nuforc.org load
  light.
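
The cutoff-before-top-N ordering from the first bullet, sketched under
assumptions: ``filter_fallback_rows`` and the ``occurred`` key are
hypothetical names, and rows are assumed to carry timezone-aware
datetimes. The 60-day window and the 250-row limit are from this
message.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

WINDOW_DAYS = 60
MAX_ROWS = 250


def filter_fallback_rows(rows: List[dict],
                         now: Optional[datetime] = None) -> List[dict]:
    """Drop rows outside the window first, then take the newest MAX_ROWS."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=WINDOW_DAYS)
    fresh = [r for r in rows if r["occurred"] >= cutoff]
    fresh.sort(key=lambda r: r["occurred"], reverse=True)
    return fresh[:MAX_ROWS]  # empty list when the mirror is fully stale
```

Filtering before truncating is the point: a stale mirror then yields an
empty list instead of its 250 "newest" old rows.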

6 new tests cover the cutoff filter, the doomsday-log path, the mixed-age
path, the both-paths-empty health failure, the positive fallback path, and
the scheduler cadence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 23:27:12 -06:00
Shadowbroker febcce9125 Merge pull request #310 from BigBodyCobain/fix/infonet-sync-429-backoff
Infonet sync: honor HTTP 429 Retry-After + exponential backoff
2026-05-22 23:11:00 -06:00
BigBodyCobain 31ebcb5cd9 Infonet sync: honor HTTP 429 Retry-After + exponential backoff
Fixes the retry-storm that's been keeping the local node 429'd out of
the seed peer (the diagnosis we ran earlier in the session). Pre-fix:

  1. Sync hits the seed peer, gets HTTP 429 (Too Many Requests)
  2. _peer_sync_response stringifies the status into a ValueError
  3. _sync_from_peer catches it, error becomes the str() of the exc
  4. _run_public_sync_cycle calls finish_sync(error=..., failure_backoff_s=60)
  5. next_sync_due_at = now + 60s
  6. After 60s, sync runs again, hits same upstream that hasn't reset
     its rate-limit bucket, 429 again. Loop indefinitely.

Net effect: a node that hit one transient 429 would hammer the seed
every 60s forever, keeping the bucket full and never recovering. We
saw this in the live status dump: consecutive_failures=49,
last_sync_ok_at=0, retry storm sustained over the entire uptime.

What changed
------------
services/mesh/mesh_infonet_sync_support.py

  * New typed exception PeerSyncRateLimited carries the parsed
    Retry-After value out of the HTTP layer instead of stringifying
    everything into a generic ValueError.

  * New parse_retry_after_header() handles both RFC 7231 §7.1.3
    forms (delay-seconds and HTTP-date). Clamped at 1 hour so a
    hostile peer can't silence us for days.

  * New _failure_backoff_seconds() helper computes the next delay
    as max(exponential, retry_after_s). Schedule with default
    base=60s, cap=1800s:

      failure 1 -> 60s     (preserves pre-fix for transient blips)
      failure 2 -> 120s
      failure 3 -> 240s
      failure 4 -> 480s
      failure 5 -> 960s
      failure 6+ -> 1800s  (capped at 30 min)

    cap_s=0 explicitly disables exponential entirely — operators
    who want pure-Retry-After behavior have that option.

  * finish_sync now accepts retry_after_s and failure_backoff_cap_s
    kwargs. Backward-compatible: existing callers that don't pass
    retry_after_s get the same first-failure delay as before (the
    base value), only repeat failures grow.
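
To make the schedule concrete, here is a minimal reconstruction of the
two helpers. The bodies are a sketch written from the description
above, not copied from mesh_infonet_sync_support.py; the parameter
names ``base_s`` and ``now`` and the constant name are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

RETRY_AFTER_MAX_S = 3600  # hostile-peer cap: never wait more than 1 hour


def parse_retry_after_header(value: Optional[str],
                             now: Optional[datetime] = None) -> int:
    """Parse Retry-After (delay-seconds or HTTP-date) into seconds, 0 if unusable."""
    text = (value or "").strip()
    if not text:
        return 0
    if text.isdigit():  # delay-seconds form
        return min(int(text), RETRY_AFTER_MAX_S)
    try:
        when = parsedate_to_datetime(text)  # HTTP-date form
    except (TypeError, ValueError):
        return 0  # malformed or negative values
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    delta = int((when - now).total_seconds())
    return max(0, min(delta, RETRY_AFTER_MAX_S))


def failure_backoff_seconds(consecutive_failures: int, base_s: int = 60,
                            cap_s: int = 1800, retry_after_s: int = 0) -> int:
    """Next delay = max(capped exponential, Retry-After)."""
    if consecutive_failures <= 0 or base_s <= 0 or cap_s <= 0:
        exponential = 0  # cap_s=0 disables the exponential term
    else:
        exponent = min(consecutive_failures - 1, 30)  # bound the shift
        exponential = min(base_s * (2 ** exponent), cap_s)
    return max(exponential, retry_after_s)
```

With the defaults, failures 1 through 7 give 60, 120, 240, 480, 960,
1800, 1800 seconds, matching the table above.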

main.py

  * _peer_sync_response detects 429 specifically, parses the
    Retry-After header, raises PeerSyncRateLimited(retry_after_s=N).
    Includes the response body prefix in the message so the
    operator's last_error finally shows something useful.

  * _sync_from_peer extended to return (ok, error, forked,
    retry_after_s) — the 4th tuple element is non-zero only when
    the upstream sent a parseable Retry-After. Existing call shape
    preserved: the lone caller in _run_public_sync_cycle was
    updated in the same commit.

  * _run_public_sync_cycle forwards retry_after_s into finish_sync.

Tests
-----
backend/tests/mesh/test_infonet_sync_429_backoff.py — 17 new tests:

  TestParseRetryAfter (7):
    - integer seconds form
    - HTTP-date form (computed as seconds-from-now)
    - HTTP-date in the past returns 0
    - empty / whitespace returns 0
    - malformed returns 0
    - clamps to 1 hour (hostile-peer cap)
    - negative returns 0

  TestFailureBackoffSeconds (5):
    - exponential growth schedule pins each level
    - retry_after wins when larger than exponential
    - exponential wins when larger than retry_after
    - cap_s=0 disables exponential entirely
    - zero inputs return zero

  TestFinishSyncBackoff (5):
    - first failure uses base unchanged (pre-fix back-compat)
    - consecutive_failures actually grow the delay
    - retry_after honored at low failure count
    - success resets consecutive_failures
    - last_error carries the HTTP status / Retry-After detail

All 24 existing sync-support / status-gate tests still pass. Other
failures in tests/mesh/ are pre-existing on origin/main and unrelated
to this change (verified by running the same tests against the
user's main worktree without these edits).

What the operator sees after this lands + a docker rebuild
----------------------------------------------------------
With the live 429 storm we diagnosed:

  Pre-fix: consecutive_failures keeps climbing 1/min forever,
           last_error empty or generic
  Post-fix: consecutive_failures grows, next_sync_due_at backs off
           exponentially (max 30 min), last_error explicitly carries
           "HTTP 429 from <peer> (retry_after=Ns): <body>" so the
           operator can see what's actually wrong. Once the upstream
           bucket drains and a sync succeeds, consecutive_failures
           resets to 0 and the schedule returns to the normal 300s
           interval.
2026-05-22 22:55:05 -06:00
Shadowbroker b3fca3dc18 Merge pull request #309 from BigBodyCobain/feat/cross-node-dm-mailbox-replication
DM mailbox: per-(sender, recipient) anti-spam cap + replication primitives
2026-05-22 22:43:26 -06:00
BigBodyCobain 401f114e4f DM mailbox: outbound replication + receiving endpoint
Second commit on this branch (first added the per-sender cap + accept_replica
primitive). This commit wires the actual cross-node propagation:

Outbound (sender side)
----------------------
* New ``DMRelay._replicate_envelope_to_peers_async()`` — fire-and-forget
  thread that POSTs the envelope to every authenticated relay peer via
  the same per-peer HMAC pattern gate-message replication uses (#256
  ``X-Peer-Url`` + ``X-Peer-HMAC`` headers, ``resolve_peer_key_for_url``).
* ``deposit()`` now calls the replication helper after a successful
  local accept. Per-peer errors are swallowed — slow Tor peers must not
  block the sender's UX, and the recipient polling from a healthy peer
  works fine even if some peers are down.
* Metrics: dm_replication_push_ok / _rejected / _error.

Inbound (receiving side)
------------------------
* New endpoint ``POST /api/mesh/dm/replicate-envelope`` in
  routers/mesh_peer_sync.py.
* Same HMAC auth gate (``_verify_peer_push_hmac``) as the existing
  infonet/gate peer-push endpoints. Unauthenticated requests get 403.
* Body cap of 64 KB (DM envelope is bounded by MESH_DM_MAX_MSG_BYTES).
* Calls DMRelay.accept_replica which enforces the per-sender cap as a
  network rule — hostile sender's relay can hold extras locally but
  honest peers reject them on inbound replication.
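
For illustration only, a peer-push check in this style might look like
the sketch below. The header names and the 64 KB cap are from this
message; the signed payload layout (URL, newline, body), SHA-256, hex
encoding, and both function names are assumptions and may differ from
``_verify_peer_push_hmac``.

```python
import hashlib
import hmac

MAX_BODY_BYTES = 64 * 1024


def sign_peer_push(peer_key: bytes, peer_url: str, body: bytes) -> str:
    """Sender side: HMAC over an assumed 'url\\nbody' payload."""
    message = peer_url.encode("utf-8") + b"\n" + body
    return hmac.new(peer_key, message, hashlib.sha256).hexdigest()


def verify_peer_push(peer_key: bytes, headers: dict, body: bytes) -> int:
    """Receiver side: return 200 (accept), 403 (auth failure) or 413 (too large)."""
    peer_url = headers.get("X-Peer-Url", "")
    supplied = headers.get("X-Peer-HMAC", "")
    if not peer_url or not supplied:
        return 403  # unauthenticated request
    if len(body) > MAX_BODY_BYTES:
        return 413
    expected = sign_peer_push(peer_key, peer_url, body)
    # Constant-time comparison so the check does not leak timing.
    if not hmac.compare_digest(expected.encode(), supplied.encode()):
        return 403
    return 200
```

A request that merely carries an X-Peer-HMAC header with the wrong value
still gets 403, which is the property the second auth test pins.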

End-to-end flow now works
-------------------------
  1. Alice's node accepts a deposit to Bob's mailbox (local cap check).
  2. Alice's node spawns a background thread that POSTs the envelope
     to MESH_RELAY_PEERS with per-peer HMAC.
  3. Each peer's /api/mesh/dm/replicate-envelope verifies the HMAC and
     calls accept_replica, which re-enforces the per-sender cap.
  4. Bob (offline at the time of send) eventually logs into ANY node
     in MESH_RELAY_PEERS, his existing pollDmMailboxes pulls from
     the local mailbox there, finds Alice's envelope, decrypts.

Tests
-----
backend/tests/test_dm_replicate_envelope_endpoint.py — 4 tests:

  TestReplicateEndpointAuth:
    - rejects requests without peer HMAC (403)
    - rejects requests with WRONG peer HMAC (403) — confirms the
      HMAC is actually verified, not just present
    - rejects oversize bodies (>64 KB) with 400/413

  TestReplicateEndpointRegistered:
    - static check that POST /api/mesh/dm/replicate-envelope is
      registered on app.routes — catches future refactor that
      drops the router include

All 38 backend tests touching the new code paths still pass:
  test_dm_relay_per_sender_cap.py (14)
  test_dm_replicate_envelope_endpoint.py (4)
  test_no_new_duplicate_routes.py (1) — new route is unique
  test_per_peer_secret_resolver.py (19) — HMAC primitive unaffected

What's still ahead (PR-3+)
--------------------------
* ack propagation: when recipient pulls a message on node X, peers Y/Z
  should prune their copies to free the sender's quota network-wide.
  Without this, the sender's quota frees only on the node the recipient
  actually polled — other peers still see N pending until TTL expiry.
  Workable but suboptimal. PR-3 will add a /api/mesh/dm/ack endpoint
  with the same HMAC pattern.
* recipient pull-from-peers: today the recipient's poll only hits
  their own node's relay. If they log into a node that the sender's
  node never pushed to, they need a way to fetch envelopes from other
  peers in MESH_RELAY_PEERS. Today delivery works as long as the
  recipient's current node is one of the peers Alice's node pushed to,
  which is true in a fully-meshed deployment but not guaranteed for
  partial meshes. PR-4 if telemetry shows this matters.
2026-05-22 19:23:09 -06:00
BigBodyCobain 79b39e8985 DM mailbox: per-(sender, recipient) anti-spam cap + replication primitives
Foundation work for cross-node DM mailbox replication. Adds the network
rule that makes the replication safe to ship next, plus the primitives
the outbound replication PR will call.

The rule
--------
A single sender can have at most N UNACKED messages parked in a single
recipient's mailbox at any one time. Default N=2, tunable via
``MESH_DM_PENDING_PER_SENDER_LIMIT``. Once the recipient pulls (acks) a
message, the sender's quota for that (sender, recipient) pair frees up.

Network rule, not local rule
----------------------------
The cap is enforced TWICE:

  1. ``DMRelay.deposit(...)`` — local check on the sender's own node.
     Refuses to spool the (N+1)th message before it can be replicated.

  2. ``DMRelay.accept_replica(...)`` — replication-acceptance check on
     every receiving peer. Refuses to accept an inbound replica that
     would put the local mailbox over the cap.

The second half is what makes the rule a NETWORK rule. A hostile sender
could patch out the deposit check on their own relay and continue to
spool extras locally — but those extras can never propagate, because
every honest peer enforces the same cap on the way in. A recipient who
polls from honest peers therefore never sees more than N pending from
any one sender, regardless of how many spam attempts the hostile
sender's relay accepted.
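
A toy model of the double enforcement, assuming a much simpler data
shape than the real ``DMRelay``. ``MiniRelay``, the envelope keys
``sender`` and ``mailbox``, the ``duplicate`` flag, and ``pull`` are
hypothetical; the default of 2, the env var name, ``msg_id``
idempotency, and the ``cap_violation`` marker come from this PR.

```python
import os
from typing import Dict, List, Optional


class MiniRelay:
    """Toy relay: same cap on local deposit and on inbound replicas."""

    def __init__(self, limit: Optional[int] = None) -> None:
        env = os.environ.get("MESH_DM_PENDING_PER_SENDER_LIMIT", "2")
        self.limit = limit if limit is not None else int(env)
        self.mailboxes: Dict[str, List[dict]] = {}

    def _pending_count(self, mailbox: str, sender: str) -> int:
        return sum(1 for m in self.mailboxes.get(mailbox, [])
                   if m["sender"] == sender)

    def _store(self, envelope: dict) -> dict:
        pending = self.mailboxes.setdefault(envelope["mailbox"], [])
        if any(m["msg_id"] == envelope["msg_id"] for m in pending):
            return {"ok": True, "duplicate": True}  # idempotent on msg_id
        if self._pending_count(envelope["mailbox"], envelope["sender"]) >= self.limit:
            return {"ok": False, "cap_violation": True}
        pending.append(dict(envelope))
        return {"ok": True, "duplicate": False}

    def deposit(self, envelope: dict) -> dict:
        return self._store(envelope)  # check 1: sender's own node

    def accept_replica(self, envelope: dict) -> dict:
        return self._store(envelope)  # check 2: every receiving peer

    def pull(self, mailbox: str) -> List[dict]:
        """Recipient pulls (acks) everything, freeing each sender's quota."""
        return self.mailboxes.pop(mailbox, [])
```

Even if a hostile relay skipped ``deposit``'s check, an honest peer's
``accept_replica`` would still refuse the (N+1)th envelope.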

New API surface on ``DMRelay``
------------------------------
  _per_sender_pending_limit()       — reads MESH_DM_PENDING_PER_SENDER_LIMIT
  _per_sender_pending_count(...)    — counts unacked from a sender for a mailbox
  accept_replica(envelope=...)      — peer-push receive entry point
  envelope_for_replication(...)     — helper to extract a wire-form envelope

``accept_replica`` is idempotent on duplicate ``msg_id`` (replication
round-trips and multi-path delivery don't double-spool).

``envelope_for_replication`` exposes the exact shape ``accept_replica``
expects, so the follow-up PR (outbound replication wiring) just has to
fetch the envelope and POST it to authenticated peer URLs with the
existing per-peer HMAC pattern from #256.

Why this is PR-1 of two
-----------------------
The full cross-node mailbox replication needs three pieces:

  A. cap enforcement on deposit (in this PR)
  B. cap enforcement on replica acceptance (in this PR)
  C. outbound: push envelope to MESH_RELAY_PEERS after deposit (NEXT PR)

(A) + (B) shipped together close the cap-bypass attack surface BEFORE
(C) introduces the actual cross-node propagation. Shipping them in the
other order would briefly let extras propagate during the window between
"outbound push lands" and "accept_replica cap lands."

Tests
-----
backend/tests/test_dm_relay_per_sender_cap.py — 14 tests:

  TestDepositCap:
    - first 2 deposits succeed (UX baseline)
    - 3rd from same sender rejected with friendly message
    - different senders have independent quotas
    - different recipients have independent quotas
    - ack frees the quota (after recipient pulls, sender can deposit again)
    - cap is env-tunable

  TestAcceptReplicaCap:
    - replica accepted under cap
    - idempotent on duplicate msg_id (no double-spool, no rejection)
    - rejected at cap with structured ``cap_violation`` marker so
      sender's relay can stop retrying
    - per-sender, not per-mailbox: different sender_block_ref passes
      even when another sender at the same mailbox is capped
    - malformed envelope shapes rejected without crash

  TestEnvelopeForReplication:
    - returns the envelope for stored messages
    - returns None for unknown msg_id
    - round-trips through accept_replica end-to-end (proves the wire
      shape matches across the two sides)
2026-05-22 19:18:01 -06:00
Shadowbroker c3e38621fc Merge pull request #308 from BigBodyCobain/fix/296-windows-venv-uvicorn-detection
Fix #296: reject backend venvs missing uvicorn before launch (Windows)
2026-05-22 18:56:08 -06:00
BigBodyCobain 9ef02dd06f Fix #296: reject backend venvs missing uvicorn before launch
Reported by @f3n3k on Windows native install path. Symptom:

    C:\001\backend\venv\Scripts\python.exe: No module named uvicorn
    [backend] exited with 1
    ShadowBroker has stopped. Exit code: 1

Root cause
----------
The Windows Start.bat flow chains:

    Start.bat
      └─ scripts\run-windows-runtime.ps1
           └─ frontend\scripts\dev-all.cjs
                └─ start-backend.js
                     └─ backend\venv\Scripts\python.exe -m uvicorn main:app

`start-backend.js` decided whether an existing `backend\venv` was usable
by calling `canRun(candidate, ["-V"])`. That only checks whether Python
itself can run — it does NOT check whether the backend's actual runtime
dependencies are installed.

When the venv exists but `pip install` never finished (partial install,
failed network, interrupted bootstrap, etc.), the launcher happily
accepted that broken venv, then died with the exact error f3n3k
reported.

Fix
---
New `canRunBackendPython()` helper that requires BOTH:

    python -V                                # Python is runnable
    python -c "import fastapi, uvicorn"      # backend deps are installed

Used in two call sites:

  * `ensureBackendVenv()` — when iterating candidate venvs on first
    launch, reject any venv whose Python can't import the backend's
    real entry-point deps. The launcher then falls through to its
    existing rebuild path (`rebuildBackendVenv`) which reinstalls deps
    before declaring the venv healthy.
  * `rebuildBackendVenv()` — after a rebuild attempt, verify the deps
    are present before returning the new interpreter path. Catches
    silent partial rebuilds.

The check is the import that uvicorn itself would do at startup, so a
green return here genuinely means "uvicorn will start". Cost is one
extra `python -c` per venv candidate on launcher startup — milliseconds.
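
The real helper lives in `start-backend.js`; the following is only a
Python rendering of the same two-step probe, with hypothetical names
(`can_run`, `can_run_backend_python`) and an assumed 30s timeout.

```python
import subprocess
import sys
from typing import Sequence


def can_run(python: str, args: Sequence[str], timeout_s: float = 30.0) -> bool:
    """Return True when `python <args>` exits with status 0."""
    try:
        result = subprocess.run([python, *args],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False  # interpreter missing, not executable, or hung
    return result.returncode == 0


def can_run_backend_python(python: str,
                           modules: Sequence[str] = ("fastapi", "uvicorn")) -> bool:
    """Require both a runnable interpreter and importable backend deps."""
    if not can_run(python, ["-V"]):
        return False
    return can_run(python, ["-c", "import " + ", ".join(modules)])
```

Calling `can_run_backend_python(sys.executable)` on a half-installed
venv returns False, which is the case the launcher previously accepted.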

Verified locally with `node --check start-backend.js`.

Credit: @f3n3k for the original report.
2026-05-22 18:50:27 -06:00
Shadowbroker ba39d3b9aa Merge pull request #307 from BigBodyCobain/fix/302-openclaw-hmac-reveal-hardening
Fix #302: split OpenClaw HMAC reveal into dedicated POST with no-store headers
2026-05-22 18:47:09 -06:00
BigBodyCobain f91ddcf38b Fix #302: split OpenClaw HMAC reveal into dedicated POST with no-store
Reported by @tg12. Pre-fix, two problems lived on the GET endpoint:

  1. `GET /api/ai/connect-info?reveal=true` returned the full HMAC
     secret in the response body on every Connect modal open. Even
     gated to require_local_operator, that put the secret into
     browser history, dev-tools network panels, browser disk caches,
     HAR exports, and screen captures.

  2. The same GET endpoint auto-bootstrapped (generated + persisted)
     the secret on a mere read. Side effects on a GET are a footgun:
     browser prefetchers, mirror tools, and casual curl-from-history
     would all silently mint+persist a fresh secret.

Backend (backend/routers/ai_intel.py)
-------------------------------------
  GET  /api/ai/connect-info             — always returns the MASKED
                                          fingerprint (first6 + bullets
                                          + last4). No `?reveal` param.
                                          NO auto-bootstrap. When the
                                          secret is missing, returns
                                          `hmac_secret_set: false` and
                                          tells the caller to POST to
                                          /bootstrap.
  POST /api/ai/connect-info/bootstrap   — NEW. Mints+persists the secret
                                          if missing. Idempotent. Never
                                          returns the full secret in the
                                          response body.
  POST /api/ai/connect-info/reveal      — NEW. Returns the full secret
                                          with Cache-Control: no-store,
                                          no-cache, must-revalidate +
                                          Pragma: no-cache + Expires: 0.
                                          POST so the body never lands
                                          in URL history. 404 (with a
                                          pointer to /bootstrap) when
                                          the secret isn't set.
  POST /api/ai/connect-info/regenerate  — keeps existing one-time-reveal
                                          behavior (regen IS a deliberate
                                          destructive action triggered
                                          by the operator). Same
                                          no-store/no-cache headers added
                                          so even the regen response
                                          doesn't get cached.
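
A small sketch of the masked fingerprint and the response headers
listed above. ``mask_secret`` and ``NO_STORE_HEADERS`` are hypothetical
names; the number of bullets and the handling of very short secrets
are assumptions, since the message only specifies first6 + bullets +
last4.

```python
BULLET = "\u2022"

# Header values as listed for POST /reveal and POST /regenerate.
NO_STORE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}


def mask_secret(secret: str, bullets: int = 8) -> str:
    """Return first 6 + bullets + last 4 characters of the secret."""
    if len(secret) <= 10:
        # Too short to show both ends without revealing everything.
        return BULLET * bullets
    return secret[:6] + BULLET * bullets + secret[-4:]
```

The masked form is what GET /api/ai/connect-info would carry; the full
value only appears in the POST responses that set the no-store headers.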

Frontend (AIIntelPanel.tsx, OnboardingModal.tsx)
------------------------------------------------
  * On mount: GET (masked only). If hmac_secret_set: false, fire a
    transparent POST /bootstrap and refresh the masked fingerprint.
    Operator sees no behavior change from pre-#302.
  * Reveal (eye icon): lazy POST /reveal — secret only travels when
    the operator explicitly clicks the button.
  * Copy: lazy POST /reveal too — copying without a prior reveal
    works exactly like before, just routed through the new endpoint.
  * Regenerate: POST returns the new secret (same as before, but the
    response now has no-store headers).
  * The displayed snippet uses the masked fingerprint until the
    operator clicks Reveal or Copy.

Tests (backend/tests/test_openclaw_connect_info_reveal.py — 13 tests)
---------------------------------------------------------------------
  * GET returns masked + the full secret never appears in r.text
  * GET does NOT auto-bootstrap when missing
  * GET silently ignores any ?reveal=true query (back-compat noise)
  * POST /bootstrap mints when missing, idempotent when set
  * POST /bootstrap never returns the full secret
  * POST /reveal returns the full secret with Cache-Control: no-store,
    no-cache + Pragma: no-cache + Expires: 0
  * POST /reveal 404s with a pointer to /bootstrap when no secret
  * POST /regenerate returns the new secret with the same headers
  * Anonymous remote callers get 403 on ALL FOUR endpoints (parametric
    regression against the same allowlist used elsewhere).

Adjacent suites still green: test_openclaw_route_security,
test_no_new_duplicate_routes, test_control_surface_auth. 67/67 pass
locally.

Credit: @tg12 for the audit report.
2026-05-22 18:40:24 -06:00
Shadowbroker 49151d8b9f Merge pull request #304 from BigBodyCobain/fix/298-sentinel-creds-server-side
Fix #298: move Sentinel credentials from browser storage to backend .env
2026-05-22 18:29:11 -06:00
BigBodyCobain 767a2f6c00 Merge remote-tracking branch 'origin/main' into fix/298-sentinel-creds-server-side
2026-05-22 18:19:12 -06:00
Shadowbroker 2da739c9e8 Merge pull request #306 from BigBodyCobain/fix/messagesview-flake-alias-race
Deflake messagesViewFirstContact: alias-resolution race in toast text
2026-05-22 18:18:56 -06:00
BigBodyCobain eca7f24e2c Loosen messagesViewFirstContact toast assertion to fix alias-race flake
Follow-up to #305. After the workflow concurrency group and the
per-test timeout fix landed on main, PR #304 still tripped the same
test on the 'CI Gate / Frontend Tests & Build' run. Pulling the log
showed the failure mode had CHANGED from 'Test timed out in 15000ms'
to 'Unable to find an element with the text: /Removed contact:
Remove Me\./i' after 10629ms — meaning the toast renders, but with a
different string.

Tracing through MessagesView.tsx:3478-3494, the Remove handler computes
the toast text as:

    setComposeStatus(
      `Removed contact: ${displayNameForPeer(peerId, contacts)}.`,
    );

displayNameForPeer reads contacts[peerId].alias or falls through to
the raw peerId. The reference is captured from the closed-over React
state. Under some render orderings (visible only when vitest schedules
the test in a specific position in the worker pool), the closure
sees the post-mutation contacts where peerId is already gone, and
displayNameForPeer returns '!sb_remove' instead of 'Remove Me'. The
toast renders correctly — but as 'Removed contact: !sb_remove.' —
and the precise regex misses.

Fix: loosen the assertion to /Removed contact:/i. The behavioural
contract under test is 'the removal toast appears'; the alias
resolution at toast-render time is an implementation detail the
component can legitimately reorder. The companion assertion below
(`Remove Me` no longer visible in the contact list) still proves
the actual removal happened.

Verified locally: 26/26 tests pass in 5.15s.
2026-05-22 18:06:56 -06:00
BigBodyCobain 7bfaad17f0 Merge remote-tracking branch 'origin/main' into fix/298-sentinel-creds-server-side 2026-05-22 17:55:58 -06:00
Shadowbroker e3efcfd476 Merge pull request #305 from BigBodyCobain/fix/messagesview-flake-ci-concurrency
Deflake messagesViewFirstContact via CI concurrency group
2026-05-22 17:55:22 -06:00
BigBodyCobain 32b8421a1c Merge origin/main into fix/298: resolve tools.py conflict
PR #303 landed on main and added Depends(require_local_operator) to the
@router.post decorators for /api/sentinel/token and /api/sentinel/tile.
This branch (PR #304, the fix for issue #298) edited the same decorator
lines AND function bodies to add the env-credential fallback resolver.

Resolution keeps BOTH:
  * The require_local_operator dependency from #303 (the auth gate)
  * The _resolve_sentinel_credentials helper from #298
  * The env-fallback path inside the function bodies

Both layers are independent — the gate blocks anonymous callers, the env
fallback lets legitimate (gated) callers omit credentials from the body.

Verified: 46 tests pass against the merged code, including both
test_sentinel_credentials_server_side.py (#298 fallback) and
test_sentinel_routes_auth_gate.py (#303 gate).
2026-05-22 17:52:10 -06:00
BigBodyCobain bc70cc3527 fix(test): per-test timeout — 15s waitFor inside 15s testTimeout was zero headroom
Mistake in the prior commit on this branch (44e9b38). Bumped the
waitFor timeout to 15s without realising the suite-wide testTimeout
was ALSO 15s (raised in Round 7a deflake work). Net effect: the
test ran out of clock budget BEFORE waitFor could even finish
polling, producing "Test timed out in 15000ms" on the
"Frontend Tests & Build" run of PR #305 — same job that the
concurrency-group fix had just freed from the resource-contention
flake.

Fix:
  * Bump JUST this test's per-test timeout to 30s via the
    `{ timeout: 30_000 }` argument on the `it()` block.
  * Drop the inner waitFor back to 10s (was 15s) so it has a clear
    margin against the 30s test budget after setup/render/click.

26/26 tests in the file pass locally in 6.19s. The concurrency-group
fix in ci.yml stays as-is — that was correct and verifiably worked
(CI Gate / Frontend Tests & Build went green on the PR after 8 prior
failures). The flake-jump to the sibling workflow exposed this
second-order bug.
2026-05-22 17:49:00 -06:00
BigBodyCobain 44e9b38ac2 Deflake messagesViewFirstContact via CI concurrency group
Root cause
----------
ci.yml fires twice on every PR — once directly via `pull_request:
[main]` (producing the "Frontend Tests & Build" check) and once via
`workflow_call` from docker-publish.yml (producing the "CI Gate /
Frontend Tests & Build" check). Both jobs land on the same Actions
runner pool at the same time and fight for CPU/RAM. Under contention,
the React reconciliation in `messagesViewFirstContact.test.tsx >
removes an approved contact immediately from the visible contact list`
overruns its 5s waitFor timeout.

This is the single test that has flaked on PRs #226, #237, #261, #262,
#265, #294, #303, and the fd7d6fa push — always on the same job name
("CI Gate / Frontend Tests & Build"), never on the sibling job
("Frontend Tests & Build") on the same commit. PR #304 (which heavily
touched the frontend) passed both jobs on first try. PR #303 (zero
frontend changes) failed only the CI Gate job. That asymmetry is what
finally pinpointed the parallel-resource-contention cause rather than
anything in the test or the PRs.

Fix
---
.github/workflows/ci.yml — added a workflow-level concurrency group
keyed on the PR head SHA (or pushed commit SHA). Both invocations
against the same commit now share a group, so the second one queues
instead of running in parallel. cancel-in-progress is intentionally
`false` — cancelling would risk leaving a PR check stuck in "Expected"
if only one of the two ever finished. Total CI time grows by ~2 min
in exchange for deterministic outcomes.

frontend/src/__tests__/mesh/messagesViewFirstContact.test.tsx —
belt-and-suspenders bump of the waitFor timeout from 5s to 15s. The
structural fix above should make the original 5s margin sufficient,
but the bump removes the residual risk of brief runner load spikes
inside the (now serialised) single job. The failure mode this masks
would be "toast never renders", which still fails loudly at 15s.

The full mesh test file (26 tests) passes locally in ~8s with the
bumped timeout.
2026-05-22 17:36:33 -06:00
Shadowbroker b01a69c172 Merge pull request #303 from BigBodyCobain/fix/299-300-301-sentinel-auth-gate
Fix #299/#300/#301: gate Sentinel proxy routes with require_local_operator
2026-05-22 10:56:41 -06:00
BigBodyCobain b041b5e97c Fix #298: move Sentinel credentials from browser storage to backend .env
Reported by @tg12. Pre-fix, the Settings panel stored real third-party
Copernicus CDSE client_id + client_secret in browser localStorage /
sessionStorage via the privacy storage helper, and the proxy routes
required those values to come back in every tile/token request body.
Any same-origin script (XSS, malicious browser extension, dev-tools
HAR export) had read access to the credentials.

This change moves them server-side, behind the same .env-backed admin
flow every other third-party API key (OpenSky, AIS Stream, Finnhub,
Shodan, …) already uses.

Backend
-------
backend/services/api_settings.py
  * Added SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET entries to
    API_REGISTRY. The existing GET/PUT /api/settings/api-keys flow
    (already require_local_operator-gated, .env-backed) now manages
    them — no new route surface.

backend/routers/tools.py
  * /api/sentinel/token and /api/sentinel/tile resolve credentials via
    a new _resolve_sentinel_credentials() helper: body fields win for
    back-compat with any legacy callers, otherwise the helper reads
    SENTINEL_CLIENT_ID / SENTINEL_CLIENT_SECRET from os.environ.
  * When neither source has a value, the route returns 400 with a
    friendly pointer ("Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET
    in the API Keys panel") instead of the curt "required" message.
    The user's standing rule against hostile errors applies.
  * Function bodies only — decorator lines untouched, so this PR does
    not conflict with #303 (which adds Depends(require_local_operator)
    to the same routes).
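
A minimal sketch of the resolution order described above (body first, then environment, otherwise a friendly error). This is an illustration, not the code in tools.py: the function signature, the injectable `env` argument, and the `MissingSentinelCredentials` exception are assumptions; the real route returns an HTTP 400 instead of raising.

```python
import os
from typing import Mapping, Optional, Tuple

FRIENDLY_HINT = (
    "Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET in the API Keys panel"
)


class MissingSentinelCredentials(Exception):
    """Hypothetical stand-in for the HTTP 400 the real route returns."""


def resolve_sentinel_credentials(
    body: Mapping[str, Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    # env is injectable so the sketch can be tested without touching os.environ.
    env = os.environ if env is None else env
    # Body fields win for back-compat; the environment is the fallback.
    client_id = (body.get("client_id") or env.get("SENTINEL_CLIENT_ID") or "").strip()
    client_secret = (
        body.get("client_secret") or env.get("SENTINEL_CLIENT_SECRET") or ""
    ).strip()
    if not client_id or not client_secret:
        # Neither source is complete: fail with the pointer, make no upstream call.
        raise MissingSentinelCredentials(FRIENDLY_HINT)
    return client_id, client_secret
```

Raising before any network code runs is what lets a test assert that no upstream request happens when credentials are missing.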

Frontend
--------
frontend/src/lib/sentinelHub.ts — rewritten
  * Removed: getSentinelCredentials / setSentinelCredentials /
    clearSentinelCredentials / getSentinelCredentialStorageMode.
    These were the browser-storage read/write helpers; their existence
    was the bug.
  * Added: checkBackendSentinelStatus(), refreshSentinelStatus(),
    getCachedSentinelStatus(), and a kept-for-back-compat
    hasSentinelCredentials() shim. Status is sourced from
    /api/settings/api-keys (the same endpoint the API Keys panel
    already uses), so we don't add a new route just for this read.
  * Added: migrateLegacySentinelBrowserKeys() — one-shot, idempotent
    helper that clears sb_sentinel_client_id / _secret / _instance_id
    from BOTH localStorage and sessionStorage. We deliberately do NOT
    auto-POST those legacy browser values to the backend; doing so
    would silently migrate a secret across a trust boundary without
    operator consent. Operators re-enter once in the API Keys panel
    and the legacy keys get wiped here.
  * fetchSentinelTile and getSentinelToken no longer send client_id /
    client_secret in the request body. The backend uses .env.

frontend/src/components/SettingsPanel.tsx
  * Dropped sb_sentinel_client_id / _secret / _instance_id from
    PRIVACY_SENSITIVE_BROWSER_KEYS — they're no longer written.
  * SentinelTab rewritten: removed the inline Client ID / Client Secret
    inputs + Save / Clear / Test buttons. Replaced with a status panel
    that calls checkBackendSentinelStatus() on mount, a one-click
    "Open API Keys Panel" button, and a migration banner that appears
    only when migrateLegacySentinelBrowserKeys() actually cleared
    something.
  * Setup guide STEP 3 now points to the API Keys panel instead of
    the local form.

frontend/src/app/page.tsx
  * Added a one-time useEffect that fires checkBackendSentinelStatus()
    on mount so the cached value (which the synchronous
    hasSentinelCredentials() shim reads) is populated before
    MaplibreViewer's tile-URL memo runs.

Tests
-----
backend/tests/test_sentinel_credentials_server_side.py (new)
  * API_REGISTRY surface — sentinel_client_id / sentinel_client_secret
    are registered with the right env_keys, ALLOWED_ENV_KEYS lets
    /api/settings/api-keys PUT them.
  * Resolution order — body wins, env is fallback, neither → 400 with
    the friendly pointer message, and NO upstream HTTP call when
    neither source has credentials (asserted via
    MagicMock(side_effect=AssertionError)).
  * /api/sentinel/tile same shape.

frontend/src/__tests__/utils/sentinelHub.test.ts (new)
  * migrateLegacySentinelBrowserKeys clears localStorage AND
    sessionStorage, reports what it cleared, idempotent.
  * fetchSentinelTile + getSentinelToken POST WITHOUT client_id /
    client_secret in the body (plants leaked credentials in browser
    storage first to prove they are NOT picked up).
  * checkBackendSentinelStatus parses /api/settings/api-keys correctly:
    true only when both keys report is_set, false on partial config or
    network errors.

All 7 backend tests + 8 frontend tests pass locally. The
test_no_new_duplicate_routes guard and the api-settings test suite
still pass.

Credit: @tg12 for the audit report.
2026-05-22 10:44:50 -06:00
BigBodyCobain c54ea7fd9f Fix #299/#300/#301: gate Sentinel proxy routes with require_local_operator
Reported by @tg12 in three audit issues opened the same day:

  #299 — POST /api/sentinel/token is an unauthenticated Copernicus
         OAuth relay for caller-supplied client_id/secret.
  #300 — POST /api/sentinel/tile is an unauthenticated quota/bandwidth
         relay for Sentinel Hub Process API tile fetches.
  #301 — GET /api/sentinel2/search is an unauthenticated Planetary
         Computer STAC + Esri imagery search relay.

All three lived in backend/routers/tools.py decorated only with
@limiter.limit(...) — no Depends(require_local_operator). That made
the backend a free anonymous relay for any caller's Sentinel /
Planetary Computer queries, in the same shape we already closed for
#240/#241 (oracle resolve) and #211/#213/#214 (thermal verify, OpenMHZ
calls + audio relay).

Fix: add dependencies=[Depends(require_local_operator)] to each route.
Loopback / Docker-bridge / admin-key callers (the operator dashboard)
are unaffected — they still resolve through the same allowlist used by
every other operator-only helper in this file. Anonymous remote callers
now receive 403 BEFORE any outbound HTTP call to Copernicus or
Planetary Computer happens.

Tests
-----
test_sentinel_routes_auth_gate.py — 8 new tests:
  * anonymous-remote → 403 on all three routes
  * NO upstream HTTP call when the gate fires (asserted via
    MagicMock(side_effect=AssertionError) on requests.post and
    services.sentinel_search.search_sentinel2_scene). This is the
    property that makes the gate real — without it, a 403 returned
    after the upstream call still burns quota.
  * 127.0.0.1 loopback caller reaches the handler (no false-positive
    where the gate accidentally blocks the local operator too).
  * Uses raw ASGITransport(client=(peer_ip, ...)) rather than
    FastAPI's TestClient because TestClient reports client.host as
    "testclient" which is not on the loopback allowlist.

test_control_surface_auth.py — extended the existing parameterised
regression with the three new routes. That regression is the global
"no remote control surface ships without auth" guard for the whole
codebase; adding these to it means a future refactor that drops the
dependency from any of them will fail CI alongside the existing
~30 gated routes.

The egress-on-403 property and the parameterised regression together
give two independent proofs that the gate fires before the upstream
network call, even if FastAPI's internal dependant tree shape changes
across versions (an earlier draft of this PR included a static walker
of the route table; it was removed because behavioural evidence is
strictly stronger and version-independent).
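
The egress-on-403 property can be sketched with the standard library alone. The toy handler, the loopback set, and the placeholder URL below are assumptions for illustration; the real tests drive the FastAPI app through ASGITransport instead.

```python
from unittest.mock import MagicMock

# Assumed allowlist for the sketch; the real gate also admits
# Docker-bridge and admin-key callers.
LOOPBACK = {"127.0.0.1", "::1"}


def handle_token_request(peer_ip, upstream_post):
    """Toy handler: the gate runs before any outbound call."""
    if peer_ip not in LOOPBACK:
        return 403
    upstream_post("https://upstream.invalid/token")  # placeholder URL
    return 200


def check_gate():
    # A mock that raises when called: any egress on the 403 path fails the test.
    tripwire = MagicMock(side_effect=AssertionError("upstream must not be called"))
    assert handle_token_request("203.0.113.7", tripwire) == 403
    tripwire.assert_not_called()

    # The local operator must still reach the handler and the upstream.
    upstream = MagicMock(return_value=None)
    assert handle_token_request("127.0.0.1", upstream) == 200
    upstream.assert_called_once()
    return True
```

A 403 status alone would not prove the ordering; the raising mock is what shows the rejection happens before quota is spent.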
2026-05-22 09:58:25 -06:00
BigBodyCobain a3aa7b4dec Merge branch 'main' of https://github.com/bigbodycobain/Shadowbroker into fix/287-rate-limit-proxy-aware 2026-05-22 09:51:13 -06:00
Shadowbroker 19fb7f0b1e Fix #288: viewport-scoped live-data for heavy layers only (#294)
Reported by @tg12 in the external security/correctness audit.

Before this change, /api/live-data/{fast,slow} accepted s/w/n/e query
params but their Query() descriptions explicitly said "(ignored)". The
endpoints shipped the full in-memory world dataset on every poll:

    /api/live-data/fast → 16.88 MB
    /api/live-data/slow → 10.12 MB
                          ── 27 MB per poll cycle, regardless of zoom

For a node with N operators each polling at the steady 15s/120s cadence,
this is hundreds of MB/minute of outbound traffic that never gets used —
the GPU just culls everything outside the viewport client-side. On a
Tor-bridged or LTE-backed node, that bandwidth bill is the actual cost.

This change makes the existing s/w/n/e params honored — when all four
bounds are supplied, the backend bbox-filters a curated set of heavy,
density-driven, time-sensitive collections to that viewport (with the
existing 20% padding from _bbox_filter):

    /fast: commercial_flights, military_flights, private_flights,
           private_jets, tracked_flights, ships, cctv, uavs, liveuamap,
           gps_jamming, sigint, trains
    /slow: gdelt, firms_fires, kiwisdr, scanners, psk_reporter

Static reference layers (satellites, datacenters, military_bases,
power_plants, satnogs, weather, news, stocks, etc.) deliberately STAY
world-scale so panning never reveals an "empty world" of infrastructure.
That preserves the no-hostile-UX feel of the existing dashboard.

Behavior contract:

  * Without bbox params (or with a partial bbox), the response is
    byte-for-byte identical to the pre-#288 implementation. No
    behavior change for any existing caller that hasn't opted in.
  * World-scale bbox (lng_span >= 300 or lat_span >= 120) short-circuits
    filtering and shares the global ETag — zoomed-out operators all
    hit the same 304 cache exactly like before.
  * ETag now mixes a 1°-quantized bbox suffix when filtering engages,
    so two viewports never poison each other's 304 cache. Sub-degree
    pans land in the same ETag bucket (i.e. don't bust the cache on
    every mouse drag).
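
A rough sketch of the ETag rule in the contract above. The function name, the suffix format, the use of floor for quantization, and the antimeridian span handling are all assumptions; only the thresholds (300 / 120) and the 1-degree bucket come from this message.

```python
import math


def bbox_etag_suffix(s, w, n, e) -> str:
    """Return a suffix to mix into the ETag, or "" to share the global one."""
    # Partial bbox is treated as no bbox.
    if None in (s, w, n, e):
        return ""
    lat_span = n - s
    # Assumed handling: a west bound greater than the east bound means the
    # viewport crosses the antimeridian.
    lng_span = e - w if e >= w else e - w + 360.0
    # World-scale viewports skip filtering and keep the global ETag.
    if lng_span >= 300 or lat_span >= 120:
        return ""
    # Quantize each bound to a whole degree so small pans that stay inside
    # the same degree cell produce the same suffix.
    quantized = [math.floor(v) for v in (s, w, n, e)]
    return "|bbox:{},{},{},{}".format(*quantized)
```

With floor quantization, a pan only changes the suffix when one of the four bounds crosses a whole-degree line.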

Polling cadence, rate-limit windows, and the 304 short-circuit are all
unchanged. Only the SIZE of the responses changes, and only when the
caller opts in via bounds.

Frontend wiring: useViewportBounds reuses the same coarsened/
expanded bounds it already computes for the AIS /api/viewport POST and
pushes them into a new module-level liveDataViewport store.
useDataPolling reads from that store via appendLiveDataBoundsParams
when building each live-data URL.

Tests cover: no-bbox → world data; bbox → heavy layers filtered;
bbox → reference layers untouched; world-scale bbox → no filter;
partial bbox → treated as no bbox; ETag changes with bbox; sub-degree
pan → same ETag; 304 path works; antimeridian-crossing bbox handled.

Co-authored-by: BigBodyCobain <moatbc@gmail.com>
2026-05-22 00:56:29 -06:00
Shadowbroker 35cd4e4c71 Fix #287: proxy-aware rate-limit key (#295)
Reported by @tg12 in the external security/correctness audit.

Before this change, backend/limiter.py was:

    from slowapi.util import get_remote_address
    limiter = Limiter(key_func=get_remote_address)

get_remote_address only ever returns request.client.host — it does
not look at X-Forwarded-For. Behind the bundled Next.js proxy (or any
other reverse proxy), every connected operator's client.host is the
frontend container's bridge IP, so @limiter.limit("120/minute")
collapses into one shared bucket for everybody on the same backend.
One heavy tab can starve every other operator on that node.

This change swaps in shadowbroker_rate_limit_key, which:

  * Reads X-Forwarded-For ONLY when the immediate peer matches the
    SAME hostname-bound allowlist we use for Docker-bridge local-operator
    trust (auth._resolve_trusted_bridge_ips — fix #250). Default is
    `frontend,shadowbroker-frontend`, override via
    SHADOWBROKER_TRUSTED_FRONTEND_HOSTS.
  * Picks the FIRST entry in the XFF chain — that's the operator end,
    not the proxy end.
  * Falls back to request.client.host for any peer not on the
    allowlist. Direct hits, unrelated containers, and unknown hosts
    are bucketed exactly like before.
  * Falls back to request.client.host when the resolver itself raises
    (e.g. DNS down). XFF is never accepted on a peer we can't confirm
    is the trusted frontend — there is no way to spoof another
    operator's bucket from outside.
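
The rules above can be sketched as a plain function. This is not the shipped shadowbroker_rate_limit_key: the signature, the injected resolver callable, and the "unknown" key for a request with no client are assumptions made so the sketch runs without slowapi or a Request object.

```python
from typing import Callable, Mapping, Optional, Set


def rate_limit_key(
    peer_ip: Optional[str],
    headers: Mapping[str, str],
    resolve_trusted_ips: Callable[[], Set[str]],
) -> str:
    peer = peer_ip or "unknown"  # assumed fallback when there is no client
    try:
        trusted = resolve_trusted_ips()
    except Exception:
        # Fail closed: if the allowlist cannot be resolved, never trust XFF.
        return peer
    if peer not in trusted:
        # Direct hits and unknown hosts are bucketed on the peer address.
        return peer
    # Case-insensitive header lookup.
    xff = next(
        (v for k, v in headers.items() if k.lower() == "x-forwarded-for"), ""
    )
    # First non-empty entry is the operator end of the chain.
    for entry in xff.split(","):
        entry = entry.strip()
        if entry:
            return entry
    return peer
```

Because the header is read only after the peer check passes, an untrusted caller cannot choose its own bucket by sending X-Forwarded-For.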

No new env vars. No new operator config. Single-operator nodes are
unaffected — same behaviour as before. The 120/minute and 60/minute
windows on the existing endpoints are unchanged; only the KEY they
bucket on changes.

Tests cover:
  * Direct loopback → keys on peer (regression check vs.
    get_remote_address default).
  * Untrusted peer sending XFF → XFF ignored, keys on peer.
  * Trusted frontend peer with XFF → keys on first XFF entry.
  * First XFF entry picked from a multi-hop chain.
  * Trusted peer without XFF → falls back to peer IP.
  * Empty/whitespace XFF entries skipped.
  * Header lookup is case-insensitive.
  * Two operators behind same proxy → different keys (the whole
    point of the fix).
  * Spoof attempt from internet-facing untrusted IP can't steal the
    victim's bucket.
  * Resolver raising is treated as untrusted (fail-closed).
  * No-client request shape doesn't raise.

Co-authored-by: BigBodyCobain <moatbc@gmail.com>
2026-05-22 00:51:54 -06:00
BigBodyCobain 31f79fd8e2 Fix #287: proxy-aware rate-limit key
Reported by @tg12 in the external security/correctness audit.

Before this change, backend/limiter.py was:

    from slowapi.util import get_remote_address
    limiter = Limiter(key_func=get_remote_address)

get_remote_address only ever returns request.client.host — it does
not look at X-Forwarded-For. Behind the bundled Next.js proxy (or any
other reverse proxy), every connected operator's client.host is the
frontend container's bridge IP, so @limiter.limit("120/minute")
collapses into one shared bucket for everybody on the same backend.
One heavy tab can starve every other operator on that node.

This change swaps in shadowbroker_rate_limit_key, which:

  * Reads X-Forwarded-For ONLY when the immediate peer matches the
    SAME hostname-bound allowlist we use for Docker-bridge local-operator
    trust (auth._resolve_trusted_bridge_ips — fix #250). Default is
    `frontend,shadowbroker-frontend`, override via
    SHADOWBROKER_TRUSTED_FRONTEND_HOSTS.
  * Picks the FIRST entry in the XFF chain — that's the operator end,
    not the proxy end.
  * Falls back to request.client.host for any peer not on the
    allowlist. Direct hits, unrelated containers, and unknown hosts
    are bucketed exactly like before.
  * Falls back to request.client.host when the resolver itself raises
    (e.g. DNS down). XFF is never accepted on a peer we can't confirm
    is the trusted frontend — there is no way to spoof another
    operator's bucket from outside.

No new env vars. No new operator config. Single-operator nodes are
unaffected — same behaviour as before. The 120/minute and 60/minute
windows on the existing endpoints are unchanged; only the KEY they
bucket on changes.

Tests cover:
  * Direct loopback → keys on peer (regression check vs.
    get_remote_address default).
  * Untrusted peer sending XFF → XFF ignored, keys on peer.
  * Trusted frontend peer with XFF → keys on first XFF entry.
  * First XFF entry picked from a multi-hop chain.
  * Trusted peer without XFF → falls back to peer IP.
  * Empty/whitespace XFF entries skipped.
  * Header lookup is case-insensitive.
  * Two operators behind same proxy → different keys (the whole
    point of the fix).
  * Spoof attempt from internet-facing untrusted IP can't steal the
    victim's bucket.
  * Resolver raising is treated as untrusted (fail-closed).
  * No-client request shape doesn't raise.
2026-05-22 00:46:25 -06:00
BigBodyCobain fd7d6fa401 chore(.gitignore): exclude AI-agent scratch dirs and stray fixtures
The repo root has been accumulating AI-coding-agent dropouts that have
no project contract value:

  .codex/, .codex-app-schema/, .codex-app-ts/   — OpenAI Codex CLI
  AGENTS.md, GEMINI.md                          — per-agent instructions
  CLAUDE.md                                     — same shape
  .github/copilot-instructions.md               — GitHub Copilot hints

These are operator-side preferences. If something needs to be canonical
for the project, it goes in docs/ explicitly.

Also adding backend/tests/test_carrier_tracker_region_centers.py to
.gitignore. It is a stale fixture that referenced fields (region, source_detail,
position_label, position_source_type, position_confidence='low')
that don't exist in the current `_parse_carrier_positions_from_news`
implementation. The real coverage for that function lives in
tests/test_carrier_tracker_quality.py from PR #285.
2026-05-21 20:47:06 -06:00
Shadowbroker 49621824b1 Use USNI Fleet Tracker as the primary carrier source + small UI fixes (#293)
Background
==========
PR #285 set up the seed -> cache -> GDELT model for the carrier tracker
to address audit issues #244/#245/#246. The GDELT half of that pipeline
hits api.gdeltproject.org's doc API for headline-region keyword
matching -- low precision (false centroid positions per #245) AND
unreliable (the host times out from some networks, including Docker
Desktop on Windows).

USNI publishes a weekly Fleet & Marine Tracker with explicit prose like:

  "The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
  "Aircraft carrier USS George Washington (CVN-73) is in port in
   Yokosuka, Japan"

That is a strictly better source for U.S. Navy carrier positions:
authoritative, deterministically parseable, weekly cadence.

What this PR does
=================
New module: backend/services/fetchers/usni_fleet_tracker.py

  - Pulls USNI's WordPress RSS feeds (site-wide + category, unioned).
  - Picks the most recent fleet-tracker post by parsed pubDate.
  - For each carrier in the registry, scans the article body for
    "is operating in / is in port in / returned to / transiting" near
    the carrier's name, hull code, or "<name> Carrier Strike Group"
    variant. Captures the region/port phrase that follows.
  - Maps the region phrase to coordinates via the existing
    REGION_COORDS table, with a USNI-phrase alias table for the
    specific wording USNI uses ("Yokosuka, Japan", "Norfolk, Va.",
    "Naval Station San Diego", "5th Fleet AOR", etc.).
  - Returns {hull: position_entry} with position_confidence="recent"
    and position_source_at = the article's actual publication
    timestamp (not now()).
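
A simplified sketch of the phrase scan, using the two example sentences quoted above. The regex, the 60-character gap limit, and the function name are assumptions; the real parser also maps the captured phrase through REGION_COORDS and the USNI alias table, which is omitted here.

```python
import re
from typing import Iterable, Optional

# Status phrases listed in this message.
VERBS = r"(?:is operating in|is in port in|returned to|transiting)"


def find_region(text: str, aliases: Iterable[str]) -> Optional[str]:
    """Return the region/port phrase following the first alias that matches."""
    for alias in aliases:
        pattern = re.compile(
            re.escape(alias)
            + r"[^.\n]{0,60}?\b"        # short gap, e.g. " Carrier Strike Group "
            + VERBS
            + r"\s+(?:the\s+)?"         # drop a leading "the"
            + r"([^.;\n]+)"             # capture up to the end of the sentence
        )
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None
```

The capture stops at the first period, so an abbreviated phrase such as "Norfolk, Va." would be cut short; handling that wording is one reason a phrase alias table is needed.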

Politeness
----------
Uses outbound_user_agent("usni-fleet-tracker") so USNI sees a
per-install Shadowbroker identifier (Round 7a / PR #292). The
article body pages return 403 to non-browser UAs; the WordPress RSS
feed serves the full <content:encoded> body and is the supported
aggregator path. No browser UA spoofing.

carrier_tracker.update_carrier_positions() now runs three phases:
  1. Bootstrap from cache (or seed on first run).
  2. USNI fleet tracker -- PRIMARY high-confidence source.
  3. GDELT -- SECONDARY backfill; can NOT demote a "recent" USNI
     position to an "approximate" GDELT headline match.
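
A minimal sketch of the phase-3 rule, assuming positions are dicts keyed by hull code with a position_confidence field. The function name and the dict shapes are illustrative, not the code in carrier_tracker.py.

```python
def apply_gdelt_backfill(positions: dict, gdelt_hits: dict) -> dict:
    """Merge GDELT headline matches without demoting USNI positions."""
    merged = dict(positions)
    for hull, entry in gdelt_hits.items():
        current = merged.get(hull)
        # A "recent" USNI position always outranks a headline match.
        if current and current.get("position_confidence") == "recent":
            continue
        merged[hull] = {**entry, "position_confidence": "approximate"}
    return merged
```

The input dict is copied first, so the cache snapshot passed in stays unchanged if the caller needs to compare before and after.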

Verified live: 6 of 11 carriers picked up real May 18, 2026 positions
on first refresh (Eisenhower, Ford, Bush, Roosevelt, Lincoln,
Washington). The other 5 weren't mentioned in this week's article
(they're in port at homeports with no deployment changes) and kept
their cache entries -- which is the correct seed/cache contract from
PR #285.

Other small fixes bundled in
============================
docker-compose.yml: add the 6 third-party-fetcher opt-in env vars
(PREDICTION_MARKETS_ENABLED, FINANCIAL_ENABLED, FIMI_ENABLED,
NUFORC_ENABLED, NEWS_ENABLED, CROWDTHREAT_ENABLED). They were
documented in .env.example but never wired through compose, so setting
them in .env had no effect.

frontend/src/components/TopRightControls.tsx: fix 6 broken i18n keys
that were showing as raw "terminal.term1" / "terminal.cleanupDetail" /
"node.soloReady" placeholders in the INFONET TERMINAL modal. The
translation files have these strings under different key names; the
component now calls the right ones. Full-file sweep confirmed every
other t('...') key in the whole frontend resolves cleanly.
2026-05-21 20:39:23 -06:00
Shadowbroker 76750caa92 Round 7a: per-operator outbound attribution + GDELT GCS-direct fix (#292)
== Per-install operator handle for every third-party API call ==

Before this PR, every Shadowbroker install identified itself to
Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
weather.gov, NUFORC, Sentinel/Planetary Computer, TinyGS / CelesTrak,
Shodan, Finnhub, and others with a single project-wide User-Agent
("Shadowbroker/1.0" or "ShadowBroker-OSINT/1.0"). From the upstream's
perspective every install in the world looked like one giant scraper.
If one install misbehaved, the upstream's only recourse was to block
"Shadowbroker" as a whole.

PR #284 inadvertently doubled down on this in the frontend by
introducing a shared `WIKIMEDIA_API_USER_AGENT` constant. This PR
retrofits both the backend and the frontend to per-operator attribution.

  New setting: OPERATOR_HANDLE (env var / settings UI / auto-gen)
  New helper:  network_utils.outbound_user_agent("purpose")

The handle is auto-generated as "operator-XXXXXX" on first call (the
"shadow-" prefix from earlier drafts was deliberately dropped — too
suspicious-looking for abuse-detection systems). Operators can
override via OPERATOR_HANDLE; the value is sanitized to lowercase
alphanumeric+dash+underscore and capped at 48 chars. Persisted to
backend/data/operator_handle.json so it survives container restarts.
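
A sketch of the sanitize and auto-generate rules just described. Two details are assumptions: disallowed characters are dropped (rather than replaced), and the XXXXXX suffix is six hex characters. Persistence to operator_handle.json is left out.

```python
import re
import secrets

MAX_HANDLE_LEN = 48


def sanitize_handle(raw: str) -> str:
    """Lowercase, keep only a-z, 0-9, dash and underscore, cap at 48 chars."""
    return re.sub(r"[^a-z0-9_-]", "", raw.lower())[:MAX_HANDLE_LEN]


def generate_handle() -> str:
    """Auto-generated default, e.g. operator-3fa9c1 (suffix format assumed)."""
    return "operator-" + secrets.token_hex(3)
```

For example, sanitize_handle("My Node #1!") yields "mynode1" under these rules.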

Retrofitted call sites (every User-Agent that was previously project-wide):
  - services/region_dossier.py (Wikipedia + Wikidata + Nominatim)
  - services/geocode.py         (Nominatim)
  - services/sentinel_search.py (Microsoft Planetary Computer)
  - services/feed_ingester.py   (operator-curated RSS feeds)
  - services/fetchers/earth_observation.py (weather.gov, NUFORC)
  - services/fetchers/infrastructure.py
  - services/fetchers/aircraft_database.py
  - services/fetchers/route_database.py
  - services/fetchers/trains.py
  - services/fetchers/meshtastic_map.py
  - services/shodan_connector.py
  - services/unusual_whales_connector.py (Finnhub)
  - services/tinygs_fetcher.py            (CelesTrak + TinyGS)
  - services/sar/sar_products_client.py
  - services/geopolitics.py               (GDELT)
  - services/radio_intercept.py           (Broadcastify + OpenMHz)
  - routers/cctv.py + main.py             (CCTV proxy)
  - routers/ai_intel.py
  - scripts/convert_power_plants.py       (release-time data refresh)

Spoofed browser UAs removed (issues #289 / #290 / #291 — tg12 audit):
  - cloudscraper-based Chrome impersonation against api.openmhz.com
    -> replaced with honest requests + per-install UA
  - Mozilla/5.0 spoofed UA on Broadcastify scrape
    -> replaced with honest UA
  - Mozilla/5.0 + fake first-party Referer on OpenMHz audio relay
    -> replaced with honest UA
  - cloudscraper dependency dropped from pyproject.toml + uv.lock

Frontend retrofit:
  - new GET /api/settings/operator-handle endpoint (local-operator
    gated) returns the install's handle
  - frontend/src/lib/wikimediaClient.ts fetches the handle once on
    first use, caches it for page lifetime, embeds it in the
    Api-User-Agent for every Wikipedia / Wikidata browser-direct call

== GDELT GCS-direct fix ==

GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
bucket. GCS responds with the wildcard *.storage.googleapis.com cert
which legitimately does NOT cover the GDELT custom domain, so Python's
TLS verification correctly refuses the connection. Some networks
happen to route through a path where this works; many (notably Docker
Desktop's outbound NAT on local installs) do not. Verified on the
maintainer's local install: GDELT was unreachable; 1610 geopolitical
events / 48 export files were dropping silently.

Fix: services/geopolitics._gcs_direct_gdelt_url() rewrites any
data.gdeltproject.org URL to its GCS-direct equivalent
(storage.googleapis.com/data.gdeltproject.org/...) where the standard
GCS cert is genuinely valid. api.gdeltproject.org and every other host
are left untouched.
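
The rewrite can be sketched with urllib.parse. The function name below is illustrative and forcing the https scheme is an assumption; only the host mapping comes from this message.

```python
from urllib.parse import urlsplit, urlunsplit

GDELT_DATA_HOST = "data.gdeltproject.org"
GCS_HOST = "storage.googleapis.com"


def gcs_direct_gdelt_url(url: str) -> str:
    """Rewrite data.gdeltproject.org URLs to the GCS-direct form."""
    parts = urlsplit(url)
    if parts.hostname != GDELT_DATA_HOST:
        # api.gdeltproject.org and every other host pass through unchanged.
        return url
    # The original hostname becomes the first path segment (the bucket name).
    path = "/" + GDELT_DATA_HOST + parts.path
    return urlunsplit(("https", GCS_HOST, path, parts.query, parts.fragment))
```

Matching on the parsed hostname rather than a substring keeps a URL that merely mentions the GDELT host in its query string from being rewritten.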

Confirmed live: backend log goes from
  GDELT lastupdate failed: 500
to
  Downloading 48 GDELT export files...
  Downloaded 48/48 GDELT exports
  GDELT parsed: 1610 conflict locations from 48 files

== Tests ==

  backend/tests/test_per_operator_outbound_attribution.py (12 tests)
  backend/tests/test_gdelt_gcs_direct_rewrite.py          (6 tests)
  backend/tests/test_region_dossier_wikimedia_ua.py       (updated to
    pin the helper + per-operator handle, not the old constant)
  frontend/src/__tests__/utils/wikimediaClient.test.ts    (rewritten
    to mock /api/settings/operator-handle and assert per-operator UA)

Local: backend 114/114 security+audit+round7a suite green;
       frontend 718/718 vitest suite green.

Credit: tg12 (external security audit, issues #289/#290/#291
relating to spoofed UAs); BigBodyCobain (operator-prefix call,
GDELT cloud-vs-local diagnosis).
2026-05-21 15:11:28 -06:00
Shadowbroker c3ef9f4b9e Fix #239: CI guard against new duplicate route registrations (#286)
The audit's concern is that FastAPI behavior depends on the order
routes are registered, because backend/main.py and several router
modules register the same (method, path) pairs twice.

Empirical verification (done in this PR's investigation, see
test_router_handler_is_the_one_that_serves) shows:

- main.app.include_router(...) runs at line ~3316.
- All @app.get/post/... decorators in main.py run AFTER that.
- FastAPI matches in registration order -> the router handler always
  wins; the main.py copies are dead code at the route-resolution
  layer.

So behavior today is deterministic, but drift between the two copies
is a real future risk: someone editing only one copy of a pair
introduces silent inconsistency, exactly as we saw in round 5 with
_WORMHOLE_PUBLIC_SETTINGS_FIELDS (which existed in BOTH main.py and
routers/wormhole.py and had to be tightened in both).

This PR is the lowest-risk fix: a CI guard that captures today's 166
known duplicates as a baseline and fails the build if any NEW
duplicate appears later. Existing duplicates are tolerated. Removed
duplicates are allowed (the baseline is a ceiling, not a floor). No
production code is deleted or moved -- the dedup of the existing 166
duplicates can be staged separately in future PRs without rushing.

Files:

- backend/tests/data/duplicate_routes_baseline.json
  Snapshot of every currently-tolerated (METHOD path) duplicate with
  the modules that register each copy. Generated from a live import
  of main.app via the snippet in the test docstring.

- backend/tests/test_no_new_duplicate_routes.py
  Three tests:
    1. test_no_new_duplicate_route_registrations -- the actual guard;
       fails if any (METHOD, path) pair missing from the baseline is
       found duplicated.
    2. test_baseline_only_lists_real_duplicates -- warns (does not
       fail) if the baseline has entries that no longer correspond to
       a real duplicate; informational housekeeping for the next
       baseline regeneration.
    3. test_router_handler_is_the_one_that_serves -- pins the
       empirical claim that for every duplicated path the router
       handler is the first-registered one. If someone ever reorders
       include_router() to come AFTER @app decorators, this test
       fails loudly and points at the most likely cause.
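
A minimal sketch of the baseline comparison described above, assuming the
route table has already been flattened into (method, path, module) tuples
in registration order. The function names and the tuple shape are
illustrative, not the ones used in test_no_new_duplicate_routes.py:

```python
from collections import defaultdict


def find_duplicates(registrations):
    """Map each (METHOD, path) registered more than once to its modules.

    `registrations` is an iterable of (method, path, module) tuples in
    registration order (a hypothetical flattened view of the route table).
    """
    seen = defaultdict(list)
    for method, path, module in registrations:
        seen[(method.upper(), path)].append(module)
    return {key: mods for key, mods in seen.items() if len(mods) > 1}


def new_duplicates(registrations, baseline):
    """Duplicated pairs that the tolerated baseline does not list."""
    return sorted(set(find_duplicates(registrations)) - set(baseline))


def stale_baseline_entries(registrations, baseline):
    """Baseline pairs that are no longer duplicated (housekeeping only)."""
    return sorted(set(baseline) - set(find_duplicates(registrations)))
```

A guard built this way fails only when new_duplicates() is non-empty, so
removing an existing duplicate never breaks the build; it only shows up in
stale_baseline_entries() until the baseline is regenerated.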

Verified locally:
- 3/3 new tests pass with current main (166 baselined dups).
- Synthetic duplicate injected into main.app at runtime IS caught by
  test 1.
- Full security+carrier suite (96 tests) still green.

Credit: tg12 (external security audit).
2026-05-21 13:27:16 -06:00
Shadowbroker 5e6bb8511a Fix #244/#245/#246: carrier tracker seed/cache/freshness model (#285)
Replace the dated editorial fallback positions baked into the registry
with a one-shot seed file + persistent observation cache. The user's
runtime cache now reflects what THIS install has actually observed,
not what USNI published on March 9, 2026. A year from now, the cache
holds a year of observations and the seed is irrelevant.

== #244: dated editorial coordinates out of the registry ==

CARRIER_REGISTRY no longer carries fallback_lat/lng/heading/desc.
Those fields are deleted. The registry is now identity + homeport
only.

New file: backend/data/carrier_seed.json
  - Read-only, shipped with every release.
  - Used ONCE on first-ever startup to bootstrap carrier_cache.json.
  - Each entry stamped with position_confidence="seed" and the actual
    as-of date (2026-03-09), NOT now().

== #245: approximate confidence for headline-derived positions ==

_parse_carrier_positions_from_news() now stamps every GDELT-derived
entry with position_confidence="approximate" so the UI knows the
coordinate is a region-centroid match, not a precise observation.
After the freshness window the label rolls over to
"stale_approximate" so old-and-imprecise is distinguishable from
recent-and-imprecise.

The article's actual seendate is used as position_source_at instead
of now(), so the "last reported X days ago" badge is honest.

== #246: freshness is labelling, not eviction ==

The cache always preserves the last position the system observed,
forever. What changes is the position_confidence label:
  - within configurable window (default 14d, env-overridable via
    SHADOWBROKER_CARRIER_FRESHNESS_DAYS) -> "recent"
  - older -> "stale"
  - seed-bootstrap entries that were never refreshed -> "seed"
  - homeport defaults (carrier added post-install) -> "homeport_default"
  - headline-derived (within window) -> "approximate"
  - headline-derived (older than window) -> "stale_approximate"

The position itself never reverts to the seed or the registry. The
user always sees the last position the system observed. Per the
user's explicit guidance: "from there have it be the last position
the user has logged the carriers that way a year from now it doesnt
revert to where the ships are today".
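
The labelling rules above can be sketched as a pure function. This is an
illustration under assumptions: the `origin` tags ("observed", "headline",
"seed", "homeport_default") and the function names are hypothetical, and
only the environment variable name, the 14-day default, and the label
strings come from this commit message:

```python
import os
from datetime import datetime, timedelta, timezone

DEFAULT_FRESHNESS_DAYS = 14
FALLBACK_LABELS = {"seed", "stale", "stale_approximate", "homeport_default"}


def freshness_window(env=None):
    """Read the window from the environment, falling back to 14 days."""
    env = os.environ if env is None else env
    try:
        days = int(env.get("SHADOWBROKER_CARRIER_FRESHNESS_DAYS", ""))
    except ValueError:
        days = DEFAULT_FRESHNESS_DAYS
    if days <= 0:
        days = DEFAULT_FRESHNESS_DAYS
    return timedelta(days=days)


def label_confidence(origin, source_at, now, window):
    """Return the position_confidence label; the position is never changed."""
    if origin in ("seed", "homeport_default"):
        return origin
    fresh = (now - source_at) <= window
    if origin == "headline":
        return "approximate" if fresh else "stale_approximate"
    return "recent" if fresh else "stale"


def is_fallback(confidence):
    """Convenience flag for the UI, mirroring the list in this message."""
    return confidence in FALLBACK_LABELS
```

Because the function only returns a label, a year-old entry keeps its
coordinates and merely flips from "recent" to "stale".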

== Other improvements ==

- CACHE_FILE moved to backend/data/carrier_cache.json so it lives in
  the volume-mounted dir under Docker compose. Previously it was at
  /app/carrier_cache.json which got wiped on every container restart
  (pre-existing bug).
- Atomic cache write (temp + os.replace) so a crash mid-write does
  not leave a truncated cache file.
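
For reference, the temp + os.replace pattern looks roughly like this. It
is a generic sketch, not the code in the carrier tracker; the function name
and the temp-file prefix are assumptions:

```python
import json
import os
import tempfile


def write_cache_atomically(path, payload):
    """Write JSON to `path` so readers see the old or the new file, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target so that
    # os.replace() is a rename rather than a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the previous cache untouched and clean up the partial temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```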

== Public API shape ==

Every carrier object the API emits now includes:
  - position_confidence: seed | recent | stale | approximate |
                         stale_approximate | homeport_default
  - position_source_at:  ISO timestamp of when the underlying source
                         was observed (NOT now())
  - is_fallback:         convenience boolean for the UI; true when the
                         confidence is seed/stale/stale_approximate/
                         homeport_default

Existing fields (estimated, source, source_url, last_osint_update,
name, type, lat, lng, country, desc, wiki) are preserved exactly so
the current ShipPopup frontend renders unchanged. last_osint_update
now reflects position_source_at instead of now(), which is what the
existing "last reported MM/DD" badge always meant to show.

Tests: backend/tests/test_carrier_tracker_quality.py — 17 tests
covering seed bootstrap, subsequent-startup ignoring seed, no-seed/
no-cache homeport fallback, registry no longer has fallback fields,
freshness window labelling + env override, "year-old cache entry keeps
its position, only the label flips" regression, approximate
confidence for headline matches, GDELT seendate ISO parser, public
response shape backward compat.

Credit: tg12 (external security audit, three P1/P2 issues).
2026-05-21 11:15:52 -06:00
Shadowbroker 0fee36e8f7 Fix #218/#219/#220: identify ShadowBroker on Wikipedia + Wikidata calls (#284)
Wikimedia's User-Agent policy asks API clients to identify themselves
with a stable, contactable identifier so that Wikimedia's operators can
rate-limit a client or contact its maintainers. Before this change,
ShadowBroker was sending:

- Backend (region_dossier.py): generic project default UA only; no
  Api-User-Agent.
- Frontend (useRegionDossier.ts, WikiImage.tsx, NewsFeed.tsx): zero
  identifying header at all; three separate copy-pasted anonymous
  fetches with their own module-local caches.

Three separate components doing the same broken thing meant policy
fixes had to happen in three places, with no shared cache or kill
switch.

Fix (no UX change, zero hostility):

== Backend ==

`backend/services/region_dossier.py` now sets explicit `User-Agent` +
`Api-User-Agent` headers on every outbound Wikidata and Wikipedia
request via a new `_WIKIMEDIA_REQUEST_HEADERS` constant. The identifier
includes a contact path (issues page on the public GitHub repo).

== Frontend ==

New shared helper `frontend/src/lib/wikimediaClient.ts`:
- `fetchWikipediaSummary(title)` — single source of truth for Wikipedia
  REST summary lookups, with one shared LRU cache (in-flight requests
  deduplicated, 512-entry cap), `Api-User-Agent` on every fetch.
- `fetchWikidataSparql(query)` — same shape for Wikidata SPARQL.
- `WIKIMEDIA_API_USER_AGENT` — exported constant; one place to update
  if Wikimedia ever asks us to back off.

Refactored three components to use the shared client:
- `frontend/src/hooks/useRegionDossier.ts` — fetchLeader() and
  fetchLocalWikiSummary() now route through the shared helpers.
- `frontend/src/components/WikiImage.tsx` — uses fetchWikipediaSummary,
  proper React state instead of module-mutation + forceUpdate trick.
- `frontend/src/components/NewsFeed.tsx` — same shape.

UX: byte-for-byte identical. Same thumbnails, same dossier content,
same load behavior. The only observable difference is the outgoing
request header.

Note on #239 (route duplication): an audit-grade inventory shows 166
main.py routes are shadowed by router modules. That cleanup is too
large to land safely in this PR; it will be staged as a separate
ladder of small PRs grouped by router module.

Tests:
- `backend/tests/test_region_dossier_wikimedia_ua.py` — 3 tests
  asserting backend headers are present.
- `frontend/src/__tests__/utils/wikimediaClient.test.ts` — 9 tests
  covering Api-User-Agent presence, shared cache, concurrent
  deduplication, disambiguation/HTTP-error/network-error fallthroughs,
  empty-input safety.

Local: backend 76/76 security suite green, frontend 716/716 vitest
suite green.

Credit: tg12 (external security audit).
2026-05-21 10:48:05 -06:00
Shadowbroker e125467721 Fix #243/#252/#253: stop leaking settings posture to anonymous callers (#283)
Three settings endpoints were disclosing operational posture or
operator-curated configuration to any network caller. This change
either tightens the redacted-public view (#243) or adds a
local-operator auth gate (#252, #253) per the audit recommendations.

Zero hostility to legitimate users: in all three cases, the Tauri
shell (loopback), the Docker bridge frontend container (#250 + #278),
and any caller with an admin key continue to see the full data. Only
anonymous LAN/internet callers see the reduced surface.

== #243 (Wormhole transport posture, anonymous-mode, profile, node mode)

Tightened the public-redaction allowlists in BOTH the main.py and
routers/wormhole.py copies:
- _WORMHOLE_PUBLIC_SETTINGS_FIELDS: {enabled, transport, anonymous_mode}
                                 -> {enabled}
- _WORMHOLE_PUBLIC_PROFILE_FIELDS: {profile, wormhole_enabled}
                                 -> {wormhole_enabled}

`GET /api/settings/node` (both the routers/admin.py and main.py copies)
now returns an empty stub for unauthenticated callers and the full
node_mode + node_enabled fields only for authenticated callers via
_scoped_view_authenticated(request, "node").
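
The allowlist redaction amounts to a key filter. A minimal sketch, assuming
the settings are plain dicts; `redact` and the constant names below are
illustrative stand-ins, not the helpers in main.py or routers/wormhole.py:

```python
# Tightened allowlists as described above (field names from this message).
PUBLIC_SETTINGS_FIELDS = frozenset({"enabled"})
PUBLIC_PROFILE_FIELDS = frozenset({"wormhole_enabled"})
PUBLIC_NODE_FIELDS = frozenset()  # anonymous callers get an empty stub


def redact(full, allowed, authenticated):
    """Return the full dict for authenticated callers, else only allowed keys."""
    if authenticated:
        return dict(full)
    return {key: value for key, value in full.items() if key in allowed}
```

Using an allowlist rather than a denylist means a field added to the
settings later stays private until someone deliberately publishes it.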

== #252 (news feed inventory disclosure)

`GET /api/settings/news-feeds` now requires Depends(require_local_operator)
in both the canonical routers/admin.py handler and the duplicate main.py
handler. Anonymous callers can no longer enumerate operator-curated
feed names and URLs.

== #253 (Time Machine archival-capture posture disclosure)

`GET /api/settings/timemachine` now requires Depends(require_local_operator).
Anonymous callers can no longer fingerprint whether a deployment is
retaining replayable historical surveillance data.

Tests: backend/tests/test_round5_settings_info_disclosure.py (10 tests)
- Wormhole settings: anonymous sees only `enabled`; authenticated sees full state.
- Privacy profile: anonymous sees only `wormhole_enabled`; authenticated sees `profile` + `transport` + `anonymous_mode`.
- Node settings: anonymous sees `{}`; authenticated sees node_mode + node_enabled + persisted state.
- news-feeds: anonymous gets 403 (and get_feeds() is NOT called); authenticated gets full inventory.
- timemachine: anonymous gets 403; authenticated sees enabled + storage_warning.

Local: 73/73 security suite (round 5 + earlier rounds) green.

Credit: tg12 (external security audit, P1 + 2x Medium).
2026-05-21 10:32:23 -06:00
Shadowbroker 2b03b808ac Fix #279: add defusedxml to uv.lock so Docker image installs it (#282)
defusedxml is listed in backend/pyproject.toml line 18 but was missing
from uv.lock. The backend Dockerfile uses `uv sync --frozen --no-dev`,
which only installs packages pinned in the lockfile. As a result the
runtime image shipped without defusedxml even though pyproject declared
it, and any import path that touched it crashed at startup with:

    ModuleNotFoundError: No module named 'defusedxml'

Affected import sites:

- backend/services/psk_reporter_fetcher.py:10
- backend/services/fetchers/aircraft_database.py:21
- backend/services/cctv_pipeline.py:990
- backend/services/cctv_pipeline.py:1018

Fix: regenerate uv.lock so defusedxml v0.7.1 (matching the >=0.7.1
specifier in pyproject) is locked. No code changes -- only the lockfile.
Next image build picks it up via the existing `uv sync --frozen` step.

Reporter: external user. Thanks for catching the missing dep.
2026-05-21 10:18:40 -06:00
Shadowbroker 2e14e75a0e Fix #256: per-peer HMAC secrets defeat cross-peer impersonation (#281)
Before this change, every peer-push HMAC was derived from the single
fleet-shared MESH_PEER_PUSH_SECRET. The receiver could prove "this
request was signed by someone who knows the fleet secret" but it could
NOT prove which peer signed it. Any peer that knew the global secret
could compute the expected HMAC for any other peer URL and forge a
push pretending to be that peer.

Fix: introduce MESH_PEER_SECRETS, an optional comma-separated
url=secret map. When a peer URL appears in the map, only the listed
per-peer secret is accepted for it -- the global secret is ignored for
that specific URL. Peer A no longer knows peer B's secret, so peer A
cannot forge a push claiming to be peer B.

The new helper resolve_peer_key_for_url() in mesh_crypto.py wraps the
lookup and is called from every existing peer-push call site:

- backend/auth.py:_verify_peer_push_hmac (receiver)
- backend/main.py:_http_peer_push_loop (Infonet event push)
- backend/main.py:_http_gate_pull_loop (gate event pull)
- backend/main.py:_http_gate_push_loop (gate event push)
- backend/services/mesh/mesh_router.py (two transports, push)
- backend/services/mesh/mesh_hashchain.py (gate wire ref key)
- backend/services/mesh/mesh_wormhole_prekey.py (peer prekey lookup)

Zero hostility, by design:

- Single-peer installs leave MESH_PEER_SECRETS empty -> resolver falls
  back to MESH_PEER_PUSH_SECRET -> behavior is byte-for-byte unchanged.
- Multi-peer installs that haven't migrated yet behave exactly as
  before.
- Multi-peer installs that DO migrate set MESH_PEER_SECRETS on both
  ends of each peering and immediately close the impersonation surface
  for those URLs. Migration is incremental: unlisted peers keep using
  the global secret.
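
A sketch of the lookup and why it blocks impersonation. Only the
url=secret, comma-separated format and the fallback to the global secret
come from this message; the function names, the URL normalization, and the
signed message layout (URL, newline, body) are assumptions made for the
example:

```python
import hashlib
import hmac


def parse_peer_secrets(raw):
    """Parse 'url=secret,url=secret' into a dict, skipping malformed entries."""
    secrets = {}
    for entry in raw.split(","):
        url, sep, secret = entry.strip().partition("=")
        url, secret = url.strip().rstrip("/"), secret.strip()
        if sep and url and secret:
            secrets[url] = secret
    return secrets


def resolve_peer_key(peer_url, per_peer, global_secret):
    """A per-peer secret wins when listed; otherwise use the global secret."""
    return per_peer.get(peer_url.strip().rstrip("/"), global_secret)


def sign(peer_url, body, per_peer, global_secret):
    """HMAC-SHA256 over an assumed 'url + newline + body' message layout."""
    key = resolve_peer_key(peer_url, per_peer, global_secret).encode()
    return hmac.new(key, peer_url.encode() + b"\n" + body, hashlib.sha256).hexdigest()


def verify(peer_url, body, signature, per_peer, global_secret):
    """Constant-time comparison against the key resolved for the claimed peer."""
    expected = sign(peer_url, body, per_peer, global_secret)
    return hmac.compare_digest(expected, signature)
```

A peer that holds only the global secret signs with the wrong key for any
URL listed in the map, so the receiver's verify() rejects the push.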

Tests in backend/tests/test_per_peer_secret_resolver.py:
- env parsing (default, override, whitespace, malformed entries, cache)
- precedence: per-peer beats global
- migration window: unlisted peer falls back to global
- IMPERSONATION REFUSAL: peer A with global-secret-only cannot forge
  HMAC for peer B that has a per-peer secret configured
- IMPERSONATION REFUSAL: peer A with its OWN per-peer secret cannot
  forge HMAC for peer B
- positive control: legitimate peer B request verifies
- zero-behavior-change: single-peer install produces the same key bytes
  as before the change

Credit: tg12 (external security audit, P1/High/High confidence)
2026-05-21 10:05:29 -06:00
Shadowbroker 084e563412 Fix #240/#241: require admin auth on oracle resolve endpoints (#280)
Both POST /api/mesh/oracle/resolve and POST /api/mesh/oracle/resolve-stakes
were previously gated only by a rate limit (5/min) and tagged with
`mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)`. The exemption
decorator is metadata only — it tells the mesh signed-write middleware
not to require a signature envelope, it does NOT enforce caller
authorization. Any network caller could:

- /resolve: settle any prediction market to any outcome (corrupts every
  downstream profile/win-loss count derived from that ledger).
- /resolve-stakes: trigger stake settlement for all expired contests at
  a time of their choosing (race against operator intent).

Fix: add `dependencies=[Depends(require_admin)]` to both routes. The
existing `mesh_write_exempt` tag stays in place because it accurately
describes the route's relationship to the signed-write envelope system;
adding `require_admin` is what closes the actual auth hole.

Tests in backend/tests/test_oracle_resolve_auth_gate.py:
- anonymous caller -> 403, ledger mutator NOT called
- wrong admin key -> 403, ledger mutator NOT called
- valid admin key -> 200, ledger mutator called
- admin key unconfigured + no debug/insecure-admin -> 403

Credit: tg12 (external security audit)
2026-05-21 09:45:08 -06:00
Shadowbroker 9ef6213284 Fix #250: bind Docker bridge local-operator trust to frontend hostname (#278)
Tightens the bridge-trust check so a connection on the Docker bridge
is only granted local-operator status when its source IP matches a
configured frontend container hostname (default: `frontend` + the
shipped `container_name` `shadowbroker-frontend`). Previously, when
`SHADOWBROKER_TRUST_DOCKER_BRIDGE_LOCAL_OPERATOR=1` was set, ANY IP
in the 172.16.0.0/12 range was granted local-operator privileges —
on a shared Docker host that included any unrelated container on the
same bridge.

Operators with renamed services can list new hostnames via the new
`SHADOWBROKER_TRUSTED_FRONTEND_HOSTS` env var (comma-separated). DNS
resolution is cached for 30s; if Docker DNS can't resolve any of the
configured names we fail closed and refuse the bridge entirely.
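
The check can be pictured as follows. This is a sketch under assumptions,
not the shipped implementation: the class name is hypothetical, and the
resolver and clock are injectable purely so the behavior can be exercised
without Docker DNS. The 172.16.0.0/12 range, the default hostnames, and the
30s cache come from this message:

```python
import ipaddress
import socket
import time

DOCKER_BRIDGE_NET = ipaddress.ip_network("172.16.0.0/12")
DEFAULT_HOSTS = ("frontend", "shadowbroker-frontend")
CACHE_TTL_SECONDS = 30.0


def _resolve(hostname):
    """Return the IPv4 addresses for a hostname, or an empty set on failure."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


class BridgeTrust:
    def __init__(self, hosts=DEFAULT_HOSTS, resolver=_resolve, clock=time.monotonic):
        self._hosts = tuple(hosts)
        self._resolver = resolver
        self._clock = clock
        self._cached = frozenset()
        self._expires = None

    def _trusted_ips(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            ips = set()
            for host in self._hosts:
                ips |= self._resolver(host)
            self._cached = frozenset(ips)
            self._expires = now + CACHE_TTL_SECONDS
        return self._cached

    def is_trusted(self, source_ip):
        """Trust only bridge IPs that match a resolved frontend hostname."""
        try:
            addr = ipaddress.ip_address(source_ip)
        except ValueError:
            return False
        if addr not in DOCKER_BRIDGE_NET:
            return False
        # An empty resolution result means nothing matches: fail closed.
        return str(addr) in self._trusted_ips()
```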

Single-user installs see no behavior change — the default-named
frontend container still resolves and is still trusted.

Credit: tg12 (external security audit)
2026-05-21 02:06:11 -06:00
Shadowbroker fb11e0881f Fix #251: refuse symlink/hardlink members during Tor bundle extraction (#277)
External audit (@tg12) flagged that the Tor Expert Bundle extractor
checked tarinfo.name against path traversal but never inspected
tarinfo.linkname for symlink or hardlink members. Python 3.11's
tarfile.extractall() honors symlinks, so a malicious archive could
ship a member like::

    name     = "innocent.txt"          (passes the path-traversal check)
    type     = SYMTYPE
    linkname = "C:\Windows\System32\config\system"

After extraction, subsequent reads of innocent.txt dereference to that
arbitrary filesystem location; subsequent writes corrupt it. On
Windows (where Tor Expert Bundle extraction actually runs), this is
a host-compromise path of essentially the same severity as the
supply-chain RCE in #231 — gated only by the integrity check we just
hardened in PR #261/#265.

Python 3.12 added the filter='data' argument to tarfile.extract /
extractall as a built-in mitigation (PEP 706; it was also backported to
3.11.4, but it is not the default there). We're on Python 3.11 in
production, so we implement the same idea manually.

Fix in backend/services/tor_hidden_service.py:

  Extract the existing path-traversal-only check into a new
  _extract_tor_bundle_safely() helper that:

  1. Refuses any member with member.issym() or member.islnk() True.
     Tor bundles never legitimately contain symlinks or hardlinks
     so this is non-disruptive. Logs the linkname so an operator
     can see what the malicious archive was trying to alias.
  2. Refuses any member that isn't isfile() or isdir() — no FIFOs,
     no character or block devices, no contiguous-file-type entries.
     None of those belong in a Tor Expert Bundle and accepting them
     is a class of bug we don't need to debug later.
  3. Preserves the original path-traversal guard (member.name must
     resolve under install_dir).
  4. Catches tarfile.TarError so a corrupt archive returns False
     gracefully instead of bubbling out an exception.
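
  The four rules can be sketched in plain tarfile terms as below. This is
  an illustration, not the body of _extract_tor_bundle_safely(): the
  function name and the print-based logging are assumptions, and the
  optional use of the built-in 'data' filter where the interpreter offers
  it is an addition for the example.

```python
import os
import tarfile


def extract_bundle_safely(archive_path, install_dir):
    """Validate every member first; extract only if the whole archive is clean."""
    root = os.path.realpath(install_dir)
    # Use the stdlib 'data' filter as well when this interpreter provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    try:
        with tarfile.open(archive_path) as archive:
            members = archive.getmembers()
            for member in members:
                if member.issym() or member.islnk():
                    print(f"refusing link {member.name!r} -> {member.linkname!r}")
                    return False
                if not (member.isfile() or member.isdir()):
                    print(f"refusing special member {member.name!r}")
                    return False
                target = os.path.realpath(os.path.join(root, member.name))
                if os.path.commonpath([root, target]) != root:
                    print(f"refusing path traversal {member.name!r}")
                    return False
            # Nothing is written until every member has passed the checks.
            for member in members:
                archive.extract(member, root, **extra)
    except (tarfile.TarError, OSError, ValueError):
        return False
    return True
```

  Validating all members before extracting any of them is what gives the
  "no half-extract" property listed in the tests below.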

Tests: backend/tests/test_tor_bundle_symlink_filter.py (8 tests)
  - Clean archive with only regular files extracts successfully
  - Symlink member is rejected (the core regression)
  - Hardlink member is rejected
  - Symlink with relative target inside install_dir is still rejected
    (we don't allow symlinks at all, not just absolute-target ones)
  - FIFO/device-style member is rejected
  - Path-traversal guard still works under the new shape
  - Malformed/non-tar file is rejected gracefully (no crash)
  - Failure on one member rejects the whole bundle (no half-extract)

Validation:
  pytest backend/tests/test_tor_bundle_symlink_filter.py
         backend/tests/test_tor_bundle_verification.py
  -> 14 passed

UX impact: zero for legitimate Tor releases. Operators installing
a real Tor Expert Bundle continue to see "Tor installed at:" exactly
as before. Only malicious archives are refused, with a clear log
message identifying the rejected linkname.

Credit: @tg12 — the original report was specific enough that the
fix design was immediate.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 01:41:13 -06:00
Shadowbroker 7f96151e56 Fix #231: multi-source SHA-256 verification for the self-updater (#265)
External audit (@tg12, May 18) found that backend/services/updater.py
silently skipped all SHA-256 integrity verification whenever the
MESH_UPDATE_SHA256 env var was unset — which is the default. Nothing
in any install doc tells operators to set it, so practically every
deployment was running the auto-updater with zero integrity check.
That made GitHub release pipeline compromise a single-step path to
arbitrary code execution on every node that auto-updates.

Investigation surfaced a deeper bug too: the updater downloads
zipball_url (GitHub's auto-generated source archive) but the
maintainer's release process publishes SHA256SUMS.txt for a separate
named asset (ShadowBroker_v*.zip). So even if MESH_UPDATE_SHA256
WERE set, operators had no published digest to compare against — the
file they were downloading wasn't the file the maintainer had signed.

This PR fixes both issues with the same multi-source verification
chain we shipped for the Tor bundle in PR #261:

  backend/services/updater.py
    _download_release() now prefers a maintainer-signed release asset
    matching ShadowBroker_v*.zip over zipball_url. Captures the
    SHA256SUMS.txt asset URL when present.

    _validate_zip_hash() rewritten as a four-source chain:
      1. MESH_UPDATE_SHA256 env var (operator override, preserved)
      2. SHA256SUMS.txt asset published with the release (primary —
         the maintainer's release process already publishes this)
      3. Baked-in backend/data/release_digests.json (second line of
         defense for releases that lack the SHA256SUMS asset, or when
         the asset can't be fetched at update time)
      4. HTTPS-only fallback with a loud warning (preserves the auto-
         update flow during transient outages)

    Mismatch from any source that DID respond is fatal — the update
    is refused and the existing install keeps running. Only the
    "no source reachable at all" case falls back to HTTPS-only.

    _fetch_sha256sums() new — fetches and parses a standard
    SHA256SUMS.txt asset. Handles both "<digest>  <name>" and binary-
    marker "<digest> *<name>" formats. Tolerant to comments, blank
    lines, and malformed entries.

  backend/data/release_digests.json (new)
    Baked-in digest list keyed by release tag. Seeded with the v0.9.79
    entries copied from the published SHA256SUMS.txt:
      ShadowBroker_v0.9.79.zip      = f6877c1d6661...
      ShadowBroker_0.9.79_x64-setup.exe = f7b676ada45c...
      ShadowBroker_0.9.79_x64_en-US.msi = e0713c3cdda1...
    Whitelisted in .gitignore alongside the other static reference
    data files (kiwisdr_directory.json, tor_bundle_digests.json,
    aisstream_spki_pins.json).

  backend/tests/test_update_integrity_chain.py (new, 16 tests)
    - Each source matches → success, identifies which source verified
    - Each source mismatches → RuntimeError "mismatch"
    - No source reachable → https-only fallback with loud warning
    - Env override beats all other sources (preserved precedence)
    - SHA256SUMS.txt parser handles standard, binary-marker, comments,
      and network-failure cases
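
The parser and the source ordering can be sketched as follows. The names
and signatures are illustrative rather than those in updater.py, and the
sketch assumes the first source that has a digest for the asset decides the
outcome, which is one reading of the chain described above:

```python
import hashlib

_HEX = set("0123456789abcdef")


def parse_sha256sums(text):
    """Parse '<digest>  <name>' and '<digest> *<name>' lines into {name: digest}."""
    digests = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # malformed entry: tolerate and move on
        digest, name = parts[0].lower(), parts[1].strip().lstrip("*")
        if len(digest) == 64 and set(digest) <= _HEX and name:
            digests[name] = digest
    return digests


def verify_release(data, asset_name, env_digest=None, sums_text=None, baked=None):
    """Return the name of the source that verified `data`, or raise on mismatch."""
    actual = hashlib.sha256(data).hexdigest()
    candidates = []
    if env_digest:
        candidates.append(("env", env_digest.strip().lower()))
    if sums_text is not None:
        published = parse_sha256sums(sums_text).get(asset_name)
        if published:
            candidates.append(("sha256sums", published))
    if baked and asset_name in baked:
        candidates.append(("baked", baked[asset_name].lower()))
    if not candidates:
        # No digest source was reachable: caller should log a loud warning.
        return "https-only"
    source, expected = candidates[0]
    if expected != actual:
        raise RuntimeError(f"SHA-256 mismatch reported by {source}")
    return source
```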

Validation:
  pytest backend/tests/test_update_integrity_chain.py → 16 passed
  pytest (all 15 security test files together) → 105 passed

UX impact: zero. Normal auto-update flow is unchanged for legitimate
releases (path 2 catches everything because the release publishes
SHA256SUMS.txt). Transient network failures during update gracefully
fall through to path 3 then path 4 — no operator intervention needed.
The only user-visible behavior change is in the compromised-release
case, where the update is now refused instead of silently applied.

Credit: @tg12 for the original bug report and the specific call-out
that MESH_UPDATE_SHA256 was unreachable by default operators.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 01:31:20 -06:00
Shadowbroker d0299fc0a0 test(ci): raise vitest testTimeout to 15s to stop CI-load flakes (#266)
Vitest's default per-test timeout is 5s. That's plenty for tests that
exercise pure functions or even simple JSX, but the heavier React
component trees we render under jsdom — MessagesView, GateView,
Wormhole contact flows — consistently measure 6-10s on GitHub Actions'
shared Node workers under load.

Concrete flake history that drove this bump (none were real product
bugs — all were CI load racing the 5s ceiling on findByText /
waitFor against React reconciliation):

  PR #226 messagesViewFirstContact > removes approved contact
  PR #237 (same)
  PR #261 (same)
  PR #262 (same) ← worst: fired on post-merge Docker Publish run,
                   prevented the AIS SPKI security fix's image from
                   being published to GHCR until PR #263 cumulatively
                   re-published it. Real security-fix-shipping risk.
  PR #264 fixed messagesViewFirstContact specifically with waitFor
  PR #265 messagesViewFirstContact > legacy handle-only addresses
                  AND gateCompatDecryptUx > browser-local gate runtime
                  AND failed on the rerun too — confirming the flake
                  class is broader than the one test we deflaked.

The deflake in PR #264 was too surgical — it addressed one specific
test out of a class of similarly-flaky CI-load-sensitive sites. This
PR addresses the root cause at the config layer instead of playing
whack-a-mole.

Why 15s specifically: 3x the default. Headroom for routine CI slowness
without masking real "test never settles" bugs (those still time out,
just at 15s instead of 5s). Individual tests can still pin
their own tighter timeout via the third arg to `it()`.

Also bumps hookTimeout to 15s — beforeEach/afterEach setup for the
same heavier component tests has the same CI-load sensitivity.

User-facing impact: zero. This is dev pipeline infrastructure. End
users never see test timeouts. The cost is theoretical: a buggy test
that genuinely never resolves now takes 15s to declare failure
instead of 5s. In practice that's negligible because the suite runs
once per CI invocation and tests don't usually deadlock.

Validation:
  Local full vitest run → 707 passed, 72 files, 10.36s wall clock
  (same speed as before — we only changed how long we WAIT for slow
   tests, not how fast tests actually run)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 01:26:34 -06:00