Compare commits

..

89 Commits

Author SHA1 Message Date
BigBodyCobain 00f9e3f1fd Pin v0.9.82 release digests for updater integrity verification.
Carry SHA-256 hashes for the source zip, MSI, and setup EXE into release_digests.json while retaining prior release entries.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 23:13:34 -06:00
BigBodyCobain ffdfe0426b Prepare v0.9.82 release: bump versions and changelog UI.
Align backend, desktop, helm, and frontend package versions for the Telegram OSINT and OpenClaw recon release.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 23:05:26 -06:00
BigBodyCobain 1583fd5715 Expose new telemetry and recon toolkit to OpenClaw agents.
Wire telegram_osint, malware, cyber, and SCM into search/slow-tier helpers; add osint_lookup, entity_expand, and osint_sweep commands; update README and skill docs.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 22:44:16 -06:00
BigBodyCobain af9b3d08cc feat: Telegram OSINT map layer, Osiris intel ports, and maritime settings
Add Telegram OSINT with hourly incremental t.me scraping, metro geocoding
separate from news centroids, threat-intercept popup UI with inline media,
and HTML markers above alert boxes so pins stay clickable. Expose GFW_API_TOKEN
in onboarding and Settings Maritime; harden GFW/CCTV/geo fetchers. Port Osiris-
derived recon, SCM, entity graph, malware/cyber feeds, sanctions, and submarine
cable layers with tests and documentation.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 21:04:08 -06:00
BigBodyCobain b64b9e0962 Add Sentinel-2 road freight trends with Analyze Here UI.
Port DrishX truck-motion detection as an opt-in slow layer: on-demand map-center analysis, preset corridors, layer panel toggle, and Docker road-corridor extras.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-07 23:39:13 -06:00
BigBodyCobain 76f4deb3a7 test: remove dead _make_client helper from conftest (from PR #376 review).
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-06 20:40:29 -06:00
BigBodyCobain 49d90eaf69 Track production-hardening checklist in docs (gitignore exception).
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-06 20:23:11 -06:00
BigBodyCobain 079ff7b737 Harden production checklist: dedupe live-data routes and align serializers.
Pin Mathieu's data-path checklist in docs and PR template, remove dead main.py fast/slow handlers, unify orjson via _live_data_json_bytes, and bound LiveUAMap Playwright defaults.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-06 20:16:18 -06:00
BigBodyCobain bd81a940ff Follow up on #375 review: dedupe live-data route and harden serializers.
Align full /api/live-data with slow-tier orjson options, remove dead main.py duplicate, cap slow batches to pool size, cancel queued work on timeout, and stop retrying HTTP 4xx/5xx.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-06 20:10:59 -06:00
BigBodyCobain 9a0a9a116a Address #375 production-readiness: dev bind, live-data lock, heavy fetch pool.
Default python main.py to loopback, deep-copy dashboard snapshots outside the store lock with ETag on full live-data, and route GDELT/LiveUAMap/CCTV/slow-tier work through an isolated executor so Playwright jobs cannot starve fast-tier workers.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-04 17:29:04 -06:00
BigBodyCobain 80a01275ff Add MKT opt-in on threat intercept, jittered market fetches, and Sentinel multi-scene dossier.
Operators enable Polymarket/Kalshi correlation from Global Threat Intercept with a consent dialog; polls use a jittered schedule separate from the slow tier. Right-click Sentinel imagery returns up to three signed scenes again.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-04 09:01:21 -06:00
BigBodyCobain 3ac8442e4b fix(uap): weekly live NUFORC refresh with 7-day cache for operators
Each install pulls ~60-day sightings from nuforc.org every Monday; disk cache
matches weekly cadence so users keep current pins between restarts.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 18:41:28 -06:00
BigBodyCobain 5f322b0a79 fix(uap): enforce 60-day window, refresh daily, live NUFORC on Windows
Filter stale rows out of nuforc_recent_sightings.json on load; add requests-based
live scrape when curl is disabled; daily scheduler rebuild instead of weekly-only.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 18:27:30 -06:00
BigBodyCobain 363b5a49c8 Close tg12 outbound audit (#348-#366): operator UA, opt-ins, docs
- User-Agent is per-install handle only (no Shadowbroker product token)
- LiveUAMap: Windows UI consent when enabling Global Incidents; env override
- Meshtastic callsign upstream header off by default (opt-in true)
- Expanded docs/OUTBOUND_DATA.md and README link for CCTV, basemap, Broadcastify

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 15:01:32 -06:00
BigBodyCobain a3e5c98cd0 test(cctv): Madrid KML HTTPS-first fallback; clarify KiwiSDR #364 docs
Adds unit coverage for MadridCityIngestor catalog fetch order.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 14:33:01 -06:00
BigBodyCobain 6a098e1c5f Pin DeepState mirror, prefer HTTPS for Madrid/KiwiSDR, document outbound data (#362–#364).
Operators can set DEEPSTATE_MIRROR_COMMIT for immutable frontline ingest; Madrid KML tries HTTPS then HTTP without changing camera image URLs or proxy Referers.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 14:31:31 -06:00
BigBodyCobain f08781bdc9 Route dossier, geocode, and Wikimedia through the backend (#351, #352, #360)
Proxy region dossier, Sentinel search, Wikipedia, and Wikidata via self-hosted
APIs; remove LocateBar client-side Nominatim fallback; migrate legacy shadow-
operator handles to operator- prefix.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-02 15:20:44 -06:00
BigBodyCobain c3dd95f6a9 Address remaining safe security hardening 2026-06-02 13:34:11 -06:00
BigBodyCobain 10a8c7b5be Apply non-disruptive security hardening 2026-06-02 12:50:41 -06:00
BigBodyCobain f03ebbba11 Clarify OpenClaw HMAC agent credentials 2026-05-30 13:52:01 -06:00
BigBodyCobain a16f22ed34 Cover AI and SAR proxy auth routes 2026-05-29 08:15:06 -06:00
BigBodyCobain 41e35e4da2 Fail fast on short admin keys 2026-05-28 15:02:40 -06:00
BigBodyCobain be3ab5823a Fix self-host API key proxy auth 2026-05-28 01:54:23 -06:00
BigBodyCobain ef52bd03d2 Harden private Infonet host checks 2026-05-28 01:26:48 -06:00
BigBodyCobain 017f383096 Fix BadHost path handling 2026-05-28 01:24:33 -06:00
Shadowbroker 41799f9891 feat(ci): switch GitLab mirror-to-github job to per-repo SSH deploy key (#331)
* feat(ci): switch mirror-to-github job from PAT to per-repo SSH deploy key

GitHub fine-grained PATs are capped at 366 days, classic PATs would
need 'public_repo' (broader scope than needed). Per-repo SSH deploy
keys are tighter:
- Can ONLY push to BigBodyCobain/Shadowbroker (no access to anything
  else, not even other repos owned by the same account).
- Never expire.
- Rotating == one-click delete on github.com/.../settings/keys.

Changes:
- New CI/CD variable GITHUB_MIRROR_SSH_KEY (File, Protected) holding
  the ed25519 private half. Public half lives on the repo's deploy
  keys with write access enabled.
- mirror-to-github before_script writes the key to ~/.ssh/id_ed25519,
  pins github.com host fingerprints (ed25519 + ecdsa + rsa from the
  2023-03-24 rotation) into ~/.ssh/known_hosts so we never trust a
  MITM, then pushes via git@github.com:... instead of HTTPS.
- Job rule now gates on GITHUB_MIRROR_SSH_KEY (the new var) instead
  of GITHUB_MIRROR_TOKEN (which never existed).

After this lands, every commit pushed directly to GitLab main will
mirror back to GitHub main automatically — closing the loop on
bi-directional sync.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(secret-scan): exempt SSH known_hosts entries from leaked-key detection

PR #331 introduced github.com host fingerprints pinned in
.gitlab-ci.yml's mirror-to-github before_script. The scanner flagged
them as embedded secrets and blocked CI:

  BLOCKED: Embedded secrets/tokens found in:
    .gitlab-ci.yml
      133: github.com ssh-ed25519 AAAA...
      135: github.com ssh-rsa AAAA...

These are PUBLIC host keys — the whole point of pinning known_hosts is
to publish the fingerprint widely so a MITM is detectable. They are
documented at https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/githubs-ssh-key-fingerprints
and committing them is the correct, secure practice.

Fix: add a KNOWN_HOSTS_LINE regex to the content-scan block that
recognizes `<host-or-ip> [salt] <algo> AAAA...` shape lines (the
exact format used in ~/.ssh/known_hosts) and filters them out before
flagging the file. Bare `ssh-rsa AAAA...` lines without a host prefix
are still caught — only the host-key shape is exempt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:22:09 -06:00
Shadowbroker a1af9c3595 fix(ci): wrap GitLab dind TLS env in docker context so buildx accepts it (#330)
The build-backend and build-frontend jobs were failing immediately after
identity verification finally allocated runners:

    $ docker buildx create --use --name multiarch --driver docker-container
    ERROR: could not create a builder instance with TLS data loaded from
    environment. Please use `docker context create <context-name>` to create
    a context for current environment and then create a builder instance
    with context set to <context-name>

The dind service exports DOCKER_HOST=tcp://docker:2376 +
DOCKER_TLS_CERTDIR=/certs, but buildx --driver docker-container doesn't
read TLS from those env vars directly. Documented GitLab fix: create an
empty `docker context` (which inherits the current TLS env), then bind
buildx to that context name as a positional arg.

After this lands, the multi-arch buildx jobs should actually build and
push amd64 + arm64 images to
  registry.gitlab.com/bigbodycobain/shadowbroker/backend:latest
  registry.gitlab.com/bigbodycobain/shadowbroker/frontend:latest

Surfaced by the post-verification pipeline at
  https://gitlab.com/bigbodycobain/Shadowbroker/-/pipelines/2550501798

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 02:04:53 -06:00
Shadowbroker c8a8fc56f8 chore(ci): bump comment in .gitlab-ci.yml to verify post-verification runner allocation (#329)
Pipelines on the GitLab mirror have been instant-failing with 0 jobs and
no started_at since the project was created — classic "shared runners
not allocated to unverified free-tier accounts" pattern. The account is
now identity-verified; this trivial comment bump exists solely to fire a
fresh pipeline that confirms runners now pick up the build-backend and
build-frontend jobs.

If the resulting pipeline produces real jobs that build the multi-arch
images and push them to registry.gitlab.com/bigbodycobain/shadowbroker/{backend,frontend},
the GitLab install path is at full parity with the GitHub one.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 01:54:08 -06:00
Shadowbroker e6aba86ce1 chore(release): update v0.9.81 SHA256 digests after rebuild (#328)
Re-cut v0.9.81 binaries from current main (which now includes the
private gate + DM hashchain spool from #326 and the gate-directory
test from #327). All three artifacts were signed with the same
minisign updater key as the original v0.9.81 release, so existing
v0.9.81 installs on Tauri auto-update accept the new bundles.

Updated hashes (verified against released assets):
- ShadowBroker_v0.9.81.zip      f81f454bdc88e9a32c351df38212b8cfa624704d65764b971bb091eef62259c6
- ShadowBroker_0.9.81_x64-setup.exe   25e9a95d0d8ce959a7d08fe8e7406772ae24b596652793e81d1de5d02510a5a6
- ShadowBroker_0.9.81_x64_en-US.msi   34e655fc0c0f195ee4ac978f228a4b2b9d5565253b8771aca9ef4693409e9e70

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 01:16:12 -06:00
Shadowbroker d5609ac02f test(infonet): cover gate directory renderer (landing + command variants) (#327)
Adds the focused test Codex wrote alongside the gate-directory UI work
that already shipped in #326 (the `renderGateDirectory` helper used
both under the Infonet logo on the landing screen and as the output of
the `gates` command in the terminal).

The renderer itself is already on origin/main; this PR just ships the
test so CI catches regressions to the dual-variant render.

Verified locally:
- frontend npm run test:ci -- src/__tests__/mesh/infonetShellGateDirectory.test.tsx → 1/1 pass

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 21:55:54 -06:00
Shadowbroker 1d7fa5185a feat(infonet): private gate + DM hashchain spool with hardened propagation (#326)
Private gate messages and offline DMs now ride the Infonet hashchain
as ciphertext-only events, replicated across nodes via private
transports (Tor onion / RNS / loopback) and decrypted only by parties
holding the gate or recipient keys.

Hashchain core (mesh_hashchain.py)
----------------------------------

* New ``append_private_gate_message`` and ``append_private_dm_message``
  append paths with full signature verification, public-key binding,
  revocation check, and replay protection in a dedicated sequence
  domain (so a gate post does not consume the author's public broadcast
  sequence, and a DM cannot replay-block a public message at sequence=1).
* Fork validation and full-chain validation now accept the gate
  signature compatibility variants — older signatures that canonicalize
  with/without epoch or reply_to still verify, so a re-sync from an
  older peer doesn't reject still-valid history.
* DM hashchain spool: capped at 2 active sealed offline DMs per
  recipient mailbox, plus a per-(sender, recipient) cap so one prolific
  sender can't consume both slots. 1-hour TTL on the cap counter.
  Spool intentionally small — it's an offline bootstrap channel,
  not a persistent mailbox.
* Rebuild-state preserves the gate sequence domain across reloads so
  a chain reload doesn't accidentally let an old gate sequence
  replay-collide on next append.

Schema enforcement (mesh_schema.py)
-----------------------------------

* Private gate + DM payloads have closed allowlists of fields.
  Plaintext keys (``message``, ``plaintext``, ``_local_plaintext``,
  ``_local_reply_to``) are explicit rejection-bait — they raise before
  the event ever touches the chain.
* DM ciphertext + nonce must look like base64-ish sealed bytes;
  obvious base64-encoded plaintext shapes are rejected.
* ``transport_lock`` required: DM hashchain spool requires
  ``private_strong``; gate accepts ``private``/``private_strong``/
  ``rns``/``onion``.

Defense-in-depth at the network layer (main.py + mesh_public.py)
----------------------------------------------------------------

* ``_infonet_sync_response_events`` now silently redacts private events
  (gate_message + dm_message) unless the request looks like a loopback /
  onion / RNS / private transport caller. If an operator accidentally
  exposes :8000 to the public internet, an external puller gets
  public events only — never ciphertext.
* ``_sync_from_peer`` raises ``PeerSyncRateLimited`` for 429 (handled
  as 4-tuple return with retry_after_s) and ``PeerSyncHTTPError`` for
  other non-200 statuses (handled by ``_run_public_sync_cycle`` to
  honor server cooldown hints even outside the 429 path).

DM relay hydration (main.py)
-----------------------------

* New ``_hydrate_dm_relay_from_chain``: when accepted dm_message chain
  events arrive on a node, they get deposited into the local DM relay
  store with a deterministic sender_token_hash so re-sync of the same
  event is idempotent. Recipients see the ciphertext as a normal DM
  on their next poll and decrypt with their existing recipient key.

Other surfaces
--------------

* meshnode.bat / meshnode.sh now set ``MESH_INFONET_ALLOW_CLEARNET_SYNC=
  false`` and the participant runtime flags by default so a freshly
  spun-up node defaults to private-only sync.
* InfonetTerminal/InfonetShell.tsx adds a gate directory renderer for
  the new private-gate workflow.
* docker-compose.relay.yml binds the relay backend to 127.0.0.1:8000
  only; Tor's hidden service forwards onion traffic into 127.0.0.1.
  Public clearnet :8000 stays off the network edge.

Tests
-----

* 7 new tests in test_private_gate_hashchain.py + test_private_dm_
  hashchain.py covering: gate fork accepts ciphertext propagation,
  gate fork rejects plaintext, append rejects plaintext before
  normalize, append requires private_strong, append rejects
  non-sealed ciphertext shape, DM spool 2-per-recipient + 1-per-pair
  cap, DM hydration delivers to poll/claim.
* Updated test_mesh_node_bootstrap_runtime.py covers 429 backoff via
  PeerSyncRateLimited 4-tuple AND PeerSyncHTTPError exception.
* Updated test_s14b_public_sync_gate_filter.py + test_s9b_gate_store_
  hydration.py + test_gate_write_cutover.py cover the new private
  redaction on public sync responses.
* test_private_gate_hashchain.py + test_private_dm_hashchain.py:
  10 passed locally.
* Combined mesh-relevant suite (the 5 modified existing tests +
  2 new): 17 passed.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 21:25:18 -06:00
Shadowbroker fb97042c01 Update README.md
Elaborated on Tor and Reticulum usage.
2026-05-24 11:08:05 -06:00
Shadowbroker 2616a6c9e3 Update README.md 2026-05-24 11:06:40 -06:00
Shadowbroker a930497e14 fix(start-scripts): find bundled privacy_core.dll next to script (#319) (#324)
* fix(start-scripts): find bundled privacy_core.dll next to script

start.bat and start.sh only checked the source-tree DLL path
(``privacy-core/target/release/privacy_core.dll``), not the bundled
location where MSI/AppImage/DMG installers stage the library directly
next to the script in backend-runtime/.

Users running start.bat from inside an MSI install dir (a documented
workaround when the desktop shell crashes) saw a scary "install Rust"
warning even though the DLL was sitting right next to them. See issue
#319 for the user-reported confusion.

Fix: add a fallback check for the bundled location before falling
through to the "build privacy-core from source" warning. Source-tree
behavior unchanged — the source path is still preferred when present.

Also re-stamps the v0.9.81 source archive: ``release_digests.json``
v0.9.81 zip hash updated to point at the rebuilt source archive that
contains these script changes. MSI/EXE/sig hashes are unchanged (the
scripts live at the repo root, not inside the desktop bundle).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#319): bundle start.bat + start.sh into the MSI/EXE installers

Follow-up to the start-script DLL fallback fix in the prior commit.

ChrisMTheMan's report on #319 made it clear the workaround flow was:

  1. MSI install crashes on launch (different bug, fixed in v0.9.81)
  2. User goes looking for start.bat to launch the backend manually
  3. start.bat isn't in their install dir, so they go fetch it from GitHub
  4. They get a working script but it doesn't know about the bundled
     privacy_core.dll layout, so they see a scary "install Rust" warning

The prior commit fixed step 4. This commit fixes step 3 — start.bat and
start.sh now ship inside the MSI/EXE installers (staged into
backend-runtime/ next to the privacy_core.dll they expect to find).
After the rebuild lands, an MSI user looking for these scripts finds
them right inside their install dir, already pointing at the correct
bundled DLL location.

What changed
------------

* ``build-backend-runtime.cjs`` now has a ``stageStartScripts()`` step
  that copies start.bat and start.sh from the repo root into the
  staged backend-runtime/. Preserves the executable bit on .sh under
  POSIX.

* ``release_digests.json`` v0.9.81 block hashes refreshed for the
  rebuilt MSI / EXE / source-zip (the scripts being bundled changed
  the MSI/EXE contents; the source zip also includes the start-script
  fix from the prior commit).

  ShadowBroker_v0.9.81.zip                  6.06 MB
    af8c87ccdece8fbb9aadc6be63cce10d3fcba74e6d87ef83289dda6d555fd270
  ShadowBroker_0.9.81_x64_en-US.msi       122.4 MB
    8977c9a1c54e1f0d030436be9c4e3d81d766cc0080699eb747649095f360c7ff
  ShadowBroker_0.9.81_x64-setup.exe        76.5 MB
    4e866fa0423c0c2470ed32f4809167a7815dc23ee7762b69e95681c1f3a28250

Post-merge plan
---------------

Force-move the v0.9.81 tag to this commit and replace ALL release
assets on the GitHub release: zip, msi, exe, both .sig files,
latest.json, SHA256SUMS.txt, release-manifest.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 21:34:59 -06:00
Shadowbroker 2dc1fcc778 release: v0.9.81 — signed auto-update + admin_session race fix (#323)
What this release does
----------------------

1. Establishes a fresh Tauri updater signing keypair. The previous keypair
   (pubkey baked into v0.9.79 / v0.9.8) had no matching private key on
   any maintainer-controlled machine — every prior release shipped
   without signatures, so auto-update has never actually worked. v0.9.81
   rotates to a new pubkey and ships signed installers + latest.json so
   every release from here is a one-click upgrade.

2. Fixes the ``admin_session_required`` race in TopRightControls.tsx.
   The updateAction state used to default to ``auto_apply`` at React-init
   time. A click on the Update button before the async runtime probe
   completed went down the auto_apply path (POST /api/system/update),
   which throws ``admin_session_required`` on fresh sessions. Desktop
   installs now default to ``manual_download`` based on synchronous
   ``window.__TAURI__`` detection at useState init.

One-time cost for current installs
----------------------------------

Anyone on v0.9.79 or v0.9.8 will see the in-app Update button still
trigger the broken path on their existing install (the fix only takes
effect once they're ON v0.9.81). The MANUAL DOWNLOAD button in the
update dialog opens the GitHub release page, where they grab the .msi
and run it. After that one manual hop, all future updates are seamless.

Release artifacts
-----------------

  ShadowBroker_v0.9.81.zip                  6.06 MB
    42f8a51f9a5690d1e7349d90d8ecf2d163c9061d6cf90c69ee03647a785437ff
  ShadowBroker_0.9.81_x64_en-US.msi       122.4 MB
    a45b177c26c95d2b28d71592d7147e88ff4e104865f214fde11249d311ec9e25
  ShadowBroker_0.9.81_x64-setup.exe        76.5 MB
    eca884b9d37eeccd0f11c91dcc6f6ae1b3609d9dee72bd73c37c9a427babfef2

Plus .sig files for the .msi and .exe, plus a signed latest.json for
the Tauri updater endpoint.

Sizes match the v0.9.79 / v0.9.8 reference shape within drift for
the new TopRightControls patch.

release_digests.json keeps v0.9.79 + v0.9.8 blocks alongside v0.9.81
so operators still on those versions continue to validate cleanly
during the rollout transition.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 18:43:53 -06:00
Shadowbroker 896d1ae938 fix(#319,#296): v0.9.8 rebuild — bundle missing deps so backend launches (#322)
Issues #319 and #296 reported that the installed v0.9.79 Windows MSI/EXE
crashed on launch with:

    thread 'main' panicked ... failed to setup app: error encountered
    during setup hook: ShadowBroker cannot start: the bundled local
    backend failed to launch.
    technical detail: managed_backend_exited_early:exit code: 103

Root cause: ``backend/pyproject.toml`` declares ``defusedxml>=0.7.1`` and
``PySocks==1.7.1`` as runtime dependencies, but the venv used to build
v0.9.79 (and the initial v0.9.8 publish) had both missing. When
``services/fetchers/aircraft_database.py`` does
``import defusedxml.ElementTree`` at startup, Python raises
``ModuleNotFoundError`` and uvicorn exits, which Tauri reports as
``managed_backend_exited_early``.

Both packages now installed in the build venv. ``main.py`` imports
end-to-end with only the expected ``plane_alert_db.json not found``
warning (runtime-state file, populated on first launch).

Rebuilt artifacts on the maintainer's local machine:

    ShadowBroker_v0.9.8.zip                  6.06 MB
      183bb5cd62b9b9349d95df5ef7696cb6ca810ab4b991fa9dab6f898af4c7a175
    ShadowBroker_0.9.8_x64_en-US.msi       122.4 MB
      fe22f9d51e4360d74c18a7250c2fbb9ed4fa4c7a884b3ac0d04a21115466386b
    ShadowBroker_0.9.8_x64-setup.exe        76.5 MB
      94a0309862e9c81c92cdcbfea8eec9dbb97eef19ded82b26217b397defbc810c

After this merges, the v0.9.8 tag will be force-moved to this commit and
the GitHub release assets replaced so the integrity chain validates
against the working installers instead of the broken ones.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 16:48:45 -06:00
Shadowbroker 8dfa6a7199 release: v0.9.8 — Cumulative Fuel/CO2, AIS Resilience, Data-Layer Repair (#321)
Bumps every hardcoded 0.9.79 → 0.9.8 across backend, frontend,
desktop-shell, helm, lockfiles, test fixtures. Refreshes the in-app
ChangelogModal HEADLINE_FEATURES, NEW_FEATURES, and BUG_FIXES with the
v0.9.8 highlights.

Release artifacts built locally and hashed into release_digests.json:

  ShadowBroker_v0.9.8.zip                  6.06 MB
    d506f6b8462ccb12096f0cd9462233be58928094240416b65fb3127bdd1f3820
  ShadowBroker_0.9.8_x64_en-US.msi       122.4 MB
    d4be4cb68c3e6409fff54c225acdcdd08e27d5d6d2b31616d78d2a4f6812991d
  ShadowBroker_0.9.8_x64-setup.exe        76.5 MB
    1115d1f5cf37edd03ea2c21d821c7626e1bf3319c990402aaa0293bca46fea67

Sizes match the v0.9.79 reference shape (5.76 MB / 117 MB / 72.9 MB)
within expected drift for new code. The .zip is a `git archive` of the
v0.9.8 source tree (matching v0.9.79's approach).

Audit confirms no .env, .key, .venv-dir, or cache files leaked into the
backend-runtime bundle. Python 3.11.9 + 199 site-packages + privacy_core
all staged correctly.

Headline changes since v0.9.79:
* Cumulative fuel/CO2 per flight (#317) — running totals since first
  observation, not just per-hour rate.
* AIS maritime resilience (#314, #316) — outage banner + AISHub REST
  fallback when AISStream WebSocket primary is offline.
* Data-layer repair (#311, #312) — UAP fallback respects the 60-day
  cutoff; GPS jamming threshold tuning + nac_p=0 inclusion so the layer
  actually fires.
* Per-flight source attribution (#313) — source field on every record.
* Cross-node DM mailbox replication (#309).
* Infonet sync HTTP 429 honored (#310).

Test fixtures updated:
* test_per_operator_outbound_attribution.py — added v0.9.8 UA strings
  to the banned-aggregate-literals list (alongside v0.9.79).
* updateRuntime.test.ts — bumped asset filename fixtures to v0.9.8.

release_digests.json keeps the v0.9.79 block alongside v0.9.8 so
operators still on 0.9.79 validate cleanly during the rollout.

The accent narrowing fix in ChangelogModal (one feature uses 'purple',
two use 'cyan' so the renderer's `accent === 'purple'` comparison
still type-checks) is included.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 16:24:20 -06:00
Shadowbroker ef6b8ec181 fix(desktop-build): strip layout.tsx force-dynamic on CRLF checkouts too (#320)
build-frontend-export.cjs stages a desktop-only frontend export tree and
strips the ``force-dynamic`` + ``revalidate`` directives from
``frontend/src/app/layout.tsx`` so Next's ``output: "export"`` can
prerender every route.

The strip regexes only matched LF (``\n``). Any Windows checkout without
``core.autocrlf=input`` has CRLF line endings, the strip silently
no-op'd, and the desktop build failed at the static-export step:

    Error: Page with `dynamic = "force-dynamic"` couldn't be exported.
    `output: "export"` requires all pages be renderable statically
    because there is no runtime server to dynamically render routes
    in this output format.
    Export encountered an error on /_not-found/page: /_not-found

Reaches every Windows contributor who hasn't normalized line endings
locally. Replacing each ``\n`` in the strip regexes with ``\r?\n``
makes the strip CRLF-tolerant; LF behavior is unchanged.

Verified by running both regexes against the actual layout.tsx (302
bytes removed, force-dynamic + revalidate both gone) and against a
synthetic LF input (296 bytes removed, same outcome).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 16:07:11 -06:00
Shadowbroker dcea325fba Merge pull request #317 from BigBodyCobain/feat/cumulative-fuel-burn
feat(flights): cumulative fuel burned + CO2 emitted per flight
2026-05-23 08:09:34 -06:00
BigBodyCobain 03b8053617 feat(flights): cumulative fuel burned + CO2 emitted per flight
Pre-fix the emissions tooltip only showed the per-hour *rate* — what most
users actually want is the cumulative *amount* burned. This adds running
totals computed by multiplying the model-based rate by the elapsed
observation time since we first saw the airframe.

New module ``flight_observations.py``:
* Tracks first_seen_at + last_seen_at per icao24 hex.
* Re-opens a fresh session when an aircraft is unseen for > 15 min
  (treated as a new flight — landed and took off, or transited a dead
  zone). Prevents the cumulative counter from resetting mid-flight if
  the trail-rendering cache prunes the trail.
* Clamps elapsed time to 24h max so clock skew can't produce comically
  large numbers.
* Pruned every 5 min via a new scheduler job (mirrors ais_prune cadence).

flights.py + military.py emission enrichment now also attaches:
* observed_seconds — how long we've been tracking this airframe.
* fuel_gallons_burned — rate * elapsed_h.
* co2_kg_emitted — rate * elapsed_h.

The existing per-hour rate fields stay in the dict for backward compat
and are shown as small secondary context in the tooltip.

Frontend EmissionsEstimateBlock (NewsFeed.tsx) now prominently shows
the cumulative totals with the rate as smaller context underneath plus
"Observed in flight for Xh Ym". When observed_seconds is 0 (first refresh)
it renders "Just observed · totals will appear on next refresh" instead
of a misleading "0 gal".

12 backend tests cover record/accumulate/reset, the 24h clamp, prune,
case-insensitive key normalization, and end-to-end emission integration
in _classify_and_publish.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 07:56:23 -06:00
Shadowbroker 20807a2d62 Merge pull request #316 from BigBodyCobain/feat/aishub-fallback
feat(ais): AISHub REST fallback when AISStream is offline (20-min polling)
2026-05-23 07:42:56 -06:00
Shadowbroker 79fbf9741b Merge pull request #314 from BigBodyCobain/feat/ais-upstream-health
feat(ais): surface AISStream upstream outage instead of failing silently
2026-05-23 07:12:37 -06:00
BigBodyCobain a2f5d62926 feat(ais): AISHub REST fallback when AISStream WebSocket is offline
When stream.aisstream.io is unreachable (cert outage, server down — see
2026-05-20 and 2026-05-23 events) the ships layer goes empty. This adds
a slow REST fallback to data.aishub.net so the layer stays populated in
degraded mode.

Behavior:

* Opt-in via AISHUB_USERNAME (free registration at aishub.net/api).
  Without the env var the fetcher is a no-op.
* Default poll cadence 20 min — well inside their free-tier limits, gives
  ships time to move enough to look "alive". Configurable via
  AISHUB_POLL_INTERVAL_MINUTES, clamped to [1, 360].
* Internal gate: skips the poll entirely when the WebSocket primary is
  currently connected. Stomping fresh live data with 20-min-old REST
  data would be worse than leaving it alone.
* Vessels merge into the shared _vessels dict with source="aishub" so
  the existing UI / health tooling can attribute the provider.
* Live data wins races: if a WebSocket update for the same MMSI lands in
  the last 1s, we don't overwrite with the slower REST record.

Scheduler job runs every AISHUB_POLL_INTERVAL_MINUTES minutes alongside
the existing ais_prune job in data_fetcher.py.

24 tests cover gating (no-username, primary-connected), response parsing
(success / error / empty / malformed / unexpected shape), record
normalization (sentinels, missing fields, range checks, AIS @ padding),
poll interval clamping, and end-to-end merge with live-data-wins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 07:00:32 -06:00
BigBodyCobain 5e0b2c037e feat(ais): surface upstream outage instead of failing silently
On 2026-05-23, stream.aisstream.io went fully offline (TCP timeouts on port
443). The backend kept respawning the node WebSocket proxy every few
seconds with nothing arriving. From the operator's POV the ships layer
silently went empty — no banner, no log surfacing, no way to tell whether
it was their config / network / viewport filter / upstream.

Backend:
* ais_proxy_status() now also returns:
  - connected (bool): true when a vessel message arrived in last 60s
  - last_msg_age_seconds (int | None)
  - proxy_spawn_count (int): proxy respawns — sustained growth without
    connected means upstream is dead
* /api/health escalates top status to "degraded" when AIS_API_KEY is set
  but the proxy is currently disconnected. Existing degraded_tls signal
  preserved.

Frontend:
* useAisUpstreamHealth hook polls /api/health every 30s, derives the
  outage state. Defensively only reports outage once spawn_count > 0 so
  operators who haven't opted in don't see the banner.
* AisUpstreamBanner component renders a dismissible amber notice
  "Ship data temporarily unavailable — AISStream upstream is offline"
  mounted on the main app shell.

7 backend tests pin the status-shape contract and the /api/health
escalation behavior in both with-key and without-key configurations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 06:38:05 -06:00
Shadowbroker 69ef231e5a Merge pull request #313 from BigBodyCobain/feat/flight-source-attribution
feat(flights): stamp source attribution on every flight record
2026-05-23 06:29:31 -06:00
Shadowbroker 7a5f47ca9e Merge pull request #312 from BigBodyCobain/fix/gps-jamming-thresholds
fix(gps-jamming): count nac_p=0 + lower thresholds so layer actually fires
2026-05-23 06:29:20 -06:00
Shadowbroker 5cd49542bf Merge pull request #311 from BigBodyCobain/fix/uap-fallback-cutoff
fix(uap): stop HF fallback from serving 3-year-old NUFORC sightings
2026-05-23 06:29:08 -06:00
BigBodyCobain f14d4feb6d feat(flights): stamp source attribution on every flight record
Pre-fix, adsb.lol records (the primary source for most flights) carried
no source marker. OpenSky records got is_opensky: True and supplementals
got supplemental_source, so any UI inspecting source labels saw
OpenSky/airplanes.live records as explicitly tagged and adsb.lol records
as "unlabeled" — making it look like adsb.lol wasn't being used at all
even though it's the primary source.

Changes:

* _fetch_adsb_lol_regions stamps source="adsb.lol" on each aircraft
  before returning, so the tag survives the OpenSky dedupe-by-hex merge.
* OpenSky records get source="OpenSky" (alongside is_opensky=True for
  back-compat).
* military fetcher tags source on both adsb.lol and airplanes.live
  records before they're merged, and propagates source into the
  military_flights and uavs output dicts.
* _classify_and_publish promotes the explicit source field into the
  published flight dict. Falls back to legacy supplemental_source if
  source is absent. Final fallback "adsb.lol" preserves prior behavior
  for any caller synthesizing records without going through a fetcher.

8 new tests cover the published-dict propagation, OpenSky tagging,
supplemental fallback, explicit-wins precedence, default behavior, the
adsb.lol regional fetcher tagging, and the military output dict.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 06:14:39 -06:00
BigBodyCobain 19a8560a80 fix(gps-jamming): count nac_p=0 + lower thresholds so the layer actually fires
Three stacked filters meant the gps_jamming layer almost never lit up:

1. nac_p == 0 aircraft were dropped on the theory that "0 = old transponder."
   That's only half right — modern Mode-S Enhanced Surveillance transponders
   also fall back to nac_p=0 when they lose GPS lock entirely, which IS the
   jamming signature we want to catch. Discarding them was discarding the
   strongest signal. None (no field at all — typical for OpenSky-sourced
   records) is still skipped because absence-of-data isn't evidence.
2. GPS_JAMMING_MIN_AIRCRAFT was 5 per 1°x1° cell. Jamming hotspots
   (eastern Med, Russia/Ukraine border, Iran/Iraq) tend to have sparser
   traffic because pilots avoid them. Lowered to 3.
3. GPS_JAMMING_MIN_RATIO was 0.30. Combined with the (preserved) -1 noise
   cushion that made the effective bar high. Lowered to 0.20.

The 1-aircraft noise cushion is intact so a single quirky transponder
still can't flag a zone alone.

Also extracted the detector loop into a pure ``detect_gps_jamming_zones()``
function at module scope so it's testable in isolation (was previously
inlined inside ``_classify_and_publish``). The public signature accepts
threshold overrides for ad-hoc re-tuning without code edits.

16 new tests cover nac_p=0 inclusion, None-skip preservation, MIN_AIRCRAFT
lowering, MIN_RATIO lowering, noise cushion preservation, constant pinning,
override behavior, lon/lng key compatibility, and robustness to empty/None
inputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 23:40:18 -06:00
BigBodyCobain 0d0e009867 fix(uap): stop HF fallback from serving 3-year-old NUFORC sightings
The UAP sightings layer is sourced from a live scrape of nuforc.org with a
static Hugging Face CSV mirror (kcimc/NUFORC) as a fallback. The fallback
parsed every row, sorted by occurred-desc, and took the top 250 — with no
date cutoff. The HF mirror is a third-party snapshot that hasn't been
refreshed in years, so the "newest 250" rows it returns are from ~2022-23.
When the live path fails (Cloudflare 403, curl disabled on Windows, wdtNonce
regex stale, etc.) users see a map full of sightings from 3 years ago,
labeled as the "last 60 days" layer.

Changes:

* HF fallback now applies the same 60-day cutoff the live path uses. Rows
  outside the window are dropped before take-top-N. If the mirror has
  nothing inside the window the fallback returns [] (don't serve stale).
* When the HF mirror is fully stale a loud ERROR log fires with the count
  of dropped rows so the operator can tell the mirror's the problem, not
  a network issue.
* When BOTH live AND HF fallback produce 0 rows, fetch_uap_sightings now
  trips assert_canary("uap_sightings", 0) so the health registry shows
  the layer as broken instead of "fresh and empty for days."
* Scheduler moved from daily 12:00 UTC to weekly Mondays 12:00 UTC. The
  layer is a rolling 60-day digest; refreshing once a week is enough
  cadence for human-readable map exploration and keeps nuforc.org load
  light.

6 new tests cover the cutoff filter, the doomsday-log path, the mixed-age
path, the both-paths-empty health failure, the positive fallback path, and
the scheduler cadence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 23:27:12 -06:00
Shadowbroker febcce9125 Merge pull request #310 from BigBodyCobain/fix/infonet-sync-429-backoff
Infonet sync: honor HTTP 429 Retry-After + exponential backoff
2026-05-22 23:11:00 -06:00
BigBodyCobain 31ebcb5cd9 Infonet sync: honor HTTP 429 Retry-After + exponential backoff
Fixes the retry-storm that's been keeping the local node 429'd out of
the seed peer (the diagnosis we ran earlier in the session). Pre-fix:

  1. Sync hits the seed peer, gets HTTP 429 (Too Many Requests)
  2. _peer_sync_response stringifies the status into a ValueError
  3. _sync_from_peer catches it, error becomes the str() of the exc
  4. _run_public_sync_cycle calls finish_sync(error=..., failure_backoff_s=60)
  5. next_sync_due_at = now + 60s
  6. After 60s, sync runs again, hits same upstream that hasn't reset
     its rate-limit bucket, 429 again. Loop indefinitely.

Net effect: a node that hit one transient 429 would hammer the seed
every 60s forever, keeping the bucket full and never recovering. We
saw this in the live status dump: consecutive_failures=49,
last_sync_ok_at=0, retry storm sustained over the entire uptime.

What changed
------------
services/mesh/mesh_infonet_sync_support.py

  * New typed exception PeerSyncRateLimited carries the parsed
    Retry-After value out of the HTTP layer instead of stringifying
    everything into a generic ValueError.

  * New parse_retry_after_header() handles both RFC 7231 §7.1.3
    forms (delay-seconds and HTTP-date). Clamped at 1 hour so a
    hostile peer can't silence us for days.

  * New _failure_backoff_seconds() helper computes the next delay
    as max(exponential, retry_after_s). Schedule with default
    base=60s, cap=1800s:

      failure 1 -> 60s     (preserves pre-fix for transient blips)
      failure 2 -> 120s
      failure 3 -> 240s
      failure 4 -> 480s
      failure 5 -> 960s
      failure 6+ -> 1800s  (capped at 30 min)

    cap_s=0 explicitly disables exponential entirely — operators
    who want pure-Retry-After behavior have that option.

  * finish_sync now accepts retry_after_s and failure_backoff_cap_s
    kwargs. Backward-compatible: existing callers that don't pass
    retry_after_s get the same first-failure delay as before (the
    base value), only repeat failures grow.

main.py

  * _peer_sync_response detects 429 specifically, parses the
    Retry-After header, raises PeerSyncRateLimited(retry_after_s=N).
    Includes the response body prefix in the message so the
    operator's last_error finally shows something useful.

  * _sync_from_peer extended to return (ok, error, forked,
    retry_after_s) — the 4th tuple element is non-zero only when
    the upstream sent a parseable Retry-After. Existing call shape
    preserved: the lone caller in _run_public_sync_cycle was
    updated in the same commit.

  * _run_public_sync_cycle forwards retry_after_s into finish_sync.

Tests
-----
backend/tests/mesh/test_infonet_sync_429_backoff.py — 17 new tests:

  TestParseRetryAfter (7):
    - integer seconds form
    - HTTP-date form (computed as seconds-from-now)
    - HTTP-date in the past returns 0
    - empty / whitespace returns 0
    - malformed returns 0
    - clamps to 1 hour (hostile-peer cap)
    - negative returns 0

  TestFailureBackoffSeconds (5):
    - exponential growth schedule pins each level
    - retry_after wins when larger than exponential
    - exponential wins when larger than retry_after
    - cap_s=0 disables exponential entirely
    - zero inputs return zero

  TestFinishSyncBackoff (5):
    - first failure uses base unchanged (pre-fix back-compat)
    - consecutive_failures actually grow the delay
    - retry_after honored at low failure count
    - success resets consecutive_failures
    - last_error carries the HTTP status / Retry-After detail

All 24 existing sync-support / status-gate tests still pass. Other
failures in tests/mesh/ are pre-existing on origin/main and unrelated
to this change (verified by running the same tests against the
user's main worktree without these edits).

What the operator sees after this lands + a docker rebuild
----------------------------------------------------------
With the live 429 storm we diagnosed:

  Pre-fix: consecutive_failures keeps climbing 1/min forever,
           last_error empty or generic
  Post-fix: consecutive_failures grows, next_sync_due_at backs off
           exponentially (max 30 min), last_error explicitly carries
           "HTTP 429 from <peer> (retry_after=Ns): <body>" so the
           operator can see what's actually wrong. Once the upstream
           bucket drains and a sync succeeds, consecutive_failures
           resets to 0 and the schedule returns to the normal 300s
           interval.
2026-05-22 22:55:05 -06:00
Shadowbroker b3fca3dc18 Merge pull request #309 from BigBodyCobain/feat/cross-node-dm-mailbox-replication
DM mailbox: per-(sender, recipient) anti-spam cap + replication primitives
2026-05-22 22:43:26 -06:00
BigBodyCobain 401f114e4f DM mailbox: outbound replication + receiving endpoint
Second commit on this branch (first added the per-sender cap + accept_replica
primitive). This commit wires the actual cross-node propagation:

Outbound (sender side)
----------------------
* New ``DMRelay._replicate_envelope_to_peers_async()`` — fire-and-forget
  thread that POSTs the envelope to every authenticated relay peer via
  the same per-peer HMAC pattern gate-message replication uses (#256
  ``X-Peer-Url`` + ``X-Peer-HMAC`` headers, ``resolve_peer_key_for_url``).
* ``deposit()`` now calls the replication helper after a successful
  local accept. Per-peer errors are swallowed — slow Tor peers must not
  block the sender's UX, and the recipient polling from a healthy peer
  works fine even if some peers are down.
* Metrics: dm_replication_push_ok / _rejected / _error.

Inbound (receiving side)
------------------------
* New endpoint ``POST /api/mesh/dm/replicate-envelope`` in
  routers/mesh_peer_sync.py.
* Same HMAC auth gate (``_verify_peer_push_hmac``) as the existing
  infonet/gate peer-push endpoints. Unauthenticated requests get 403.
* Body cap of 64 KB (DM envelope is bounded by MESH_DM_MAX_MSG_BYTES).
* Calls DMRelay.accept_replica which enforces the per-sender cap as a
  network rule — hostile sender's relay can hold extras locally but
  honest peers reject them on inbound replication.

End-to-end flow now works
-------------------------
  1. Alice's node accepts a deposit to Bob's mailbox (local cap check).
  2. Alice's node spawns a background thread that POSTs the envelope
     to MESH_RELAY_PEERS with per-peer HMAC.
  3. Each peer's /api/mesh/dm/replicate-envelope verifies the HMAC and
     calls accept_replica, which re-enforces the per-sender cap.
  4. Bob (offline at the time of send) eventually logs into ANY node
     in MESH_RELAY_PEERS, his existing pollDmMailboxes pulls from
     the local mailbox there, finds Alice's envelope, decrypts.

Tests
-----
backend/tests/test_dm_replicate_envelope_endpoint.py — 4 tests:

  TestReplicateEndpointAuth:
    - rejects requests without peer HMAC (403)
    - rejects requests with WRONG peer HMAC (403) — confirms the
      HMAC is actually verified, not just present
    - rejects oversize bodies (>64 KB) with 400/413

  TestReplicateEndpointRegistered:
    - static check that POST /api/mesh/dm/replicate-envelope is
      registered on app.routes — catches future refactor that
      drops the router include

All 38 backend tests touching the new code paths still pass:
  test_dm_relay_per_sender_cap.py (14)
  test_dm_replicate_envelope_endpoint.py (4)
  test_no_new_duplicate_routes.py (1) — new route is unique
  test_per_peer_secret_resolver.py (19) — HMAC primitive unaffected

What's still ahead (PR-3+)
--------------------------
* ack propagation: when recipient pulls a message on node X, peers Y/Z
  should prune their copies to free the sender's quota network-wide.
  Without this, the sender's quota frees only on the node the recipient
  actually polled — other peers still see N pending until TTL expiry.
  Workable but suboptimal. PR-3 will add a /api/mesh/dm/ack endpoint
  with the same HMAC pattern.
* recipient pull-from-peers: today the recipient's poll only hits
  their own node's relay. If they log into a peer they didn't deposit
  with, they need a way to fetch envelopes from other peers in
  MESH_RELAY_PEERS. Today this works as long as the recipient's
  current node is one of the peers Alice's node pushed to — which is
  true in a fully-meshed deployment but not guaranteed for partial
  meshes. PR-4 if telemetry shows this matters.
2026-05-22 19:23:09 -06:00
BigBodyCobain 79b39e8985 DM mailbox: per-(sender, recipient) anti-spam cap + replication primitives
Foundation work for cross-node DM mailbox replication. Adds the network
rule that makes the replication safe to ship next, plus the primitives
the outbound replication PR will call.

The rule
--------
A single sender can have at most N UNACKED messages parked in a single
recipient's mailbox at any one time. Default N=2, tunable via
``MESH_DM_PENDING_PER_SENDER_LIMIT``. Once the recipient pulls (acks) a
message, the sender's quota for that (sender, recipient) pair frees up.

Network rule, not local rule
----------------------------
The cap is enforced TWICE:

  1. ``DMRelay.deposit(...)`` — local check on the sender's own node.
     Refuses to spool the (N+1)th message before it can be replicated.

  2. ``DMRelay.accept_replica(...)`` — replication-acceptance check on
     every receiving peer. Refuses to accept an inbound replica that
     would put the local mailbox over the cap.

The second half is what makes the rule a NETWORK rule. A hostile sender
could patch out the deposit check on their own relay and continue to
spool extras locally — but those extras can never propagate, because
every honest peer enforces the same cap on the way in. A recipient who
polls from honest peers therefore never sees more than N pending from
any one sender, regardless of how many spam attempts the hostile
sender's relay accepted.

New API surface on ``DMRelay``
------------------------------
  _per_sender_pending_limit()       — reads MESH_DM_PENDING_PER_SENDER_LIMIT
  _per_sender_pending_count(...)    — counts unacked from a sender for a mailbox
  accept_replica(envelope=...)      — peer-push receive entry point
  envelope_for_replication(...)     — helper to extract a wire-form envelope

``accept_replica`` is idempotent on duplicate ``msg_id`` (replication
round-trips and multi-path delivery don't double-spool).

``envelope_for_replication`` exposes the exact shape ``accept_replica``
expects, so the follow-up PR (outbound replication wiring) just has to
fetch the envelope and POST it to authenticated peer URLs with the
existing per-peer HMAC pattern from #256.

Why this is PR-1 of two
-----------------------
The full cross-node mailbox replication needs three pieces:

  A. cap enforcement on deposit (in this PR)
  B. cap enforcement on replica acceptance (in this PR)
  C. outbound: push envelope to MESH_RELAY_PEERS after deposit (NEXT PR)

(A) + (B) shipped together close the cap-bypass attack surface BEFORE
(C) introduces the actual cross-node propagation. Shipping them in the
other order would briefly let extras propagate during the window between
"outbound push lands" and "accept_replica cap lands."

Tests
-----
backend/tests/test_dm_relay_per_sender_cap.py — 14 tests:

  TestDepositCap:
    - first 2 deposits succeed (UX baseline)
    - 3rd from same sender rejected with friendly message
    - different senders have independent quotas
    - different recipients have independent quotas
    - ack frees the quota (after recipient pulls, sender can deposit again)
    - cap is env-tunable

  TestAcceptReplicaCap:
    - replica accepted under cap
    - idempotent on duplicate msg_id (no double-spool, no rejection)
    - rejected at cap with structured ``cap_violation`` marker so
      sender's relay can stop retrying
    - per-sender, not per-mailbox: different sender_block_ref passes
      even when another sender at the same mailbox is capped
    - malformed envelope shapes rejected without crash

  TestEnvelopeForReplication:
    - returns the envelope for stored messages
    - returns None for unknown msg_id
    - round-trips through accept_replica end-to-end (proves the wire
      shape matches across the two sides)
2026-05-22 19:18:01 -06:00
Shadowbroker c3e38621fc Merge pull request #308 from BigBodyCobain/fix/296-windows-venv-uvicorn-detection
Fix #296: reject backend venvs missing uvicorn before launch (Windows)
2026-05-22 18:56:08 -06:00
BigBodyCobain 9ef02dd06f Fix #296: reject backend venvs missing uvicorn before launch
Reported by @f3n3k on Windows native install path. Symptom:

    C:\001\backend\venv\Scripts\python.exe: No module named uvicorn
    [backend] exited with 1
    ShadowBroker has stopped. Exit code: 1

Root cause
----------
The Windows Start.bat flow chains:

    Start.bat
      └─ scripts\run-windows-runtime.ps1
           └─ frontend\scripts\dev-all.cjs
                └─ start-backend.js
                     └─ backend\venv\Scripts\python.exe -m uvicorn main:app

`start-backend.js` decided whether an existing `backend\venv` was usable
by calling `canRun(candidate, ["-V"])`. That only checks whether Python
itself can run — it does NOT check whether the backend's actual runtime
dependencies are installed.

When the venv exists but `pip install` never finished (partial install,
failed network, interrupted bootstrap, etc.), the launcher happily
accepted that broken venv, then died with the exact error f3n3k
reported.

Fix
---
New `canRunBackendPython()` helper that requires BOTH:

    python -V                                # Python is runnable
    python -c "import fastapi, uvicorn"      # backend deps are installed

Used in two call sites:

  * `ensureBackendVenv()` — when iterating candidate venvs on first
    launch, reject any venv whose Python can't import the backend's
    real entry-point deps. The launcher then falls through to its
    existing rebuild path (`rebuildBackendVenv`) which reinstalls deps
    before declaring the venv healthy.
  * `rebuildBackendVenv()` — after a rebuild attempt, verify the deps
    are present before returning the new interpreter path. Catches
    silent partial rebuilds.

The check is the import that uvicorn itself would do at startup, so a
green return here genuinely means "uvicorn will start". Cost is one
extra `python -c` per venv candidate on launcher startup — milliseconds.

Verified locally with `node --check start-backend.js`.

Credit: @f3n3k for the original report.
2026-05-22 18:50:27 -06:00
Shadowbroker ba39d3b9aa Merge pull request #307 from BigBodyCobain/fix/302-openclaw-hmac-reveal-hardening
Fix #302: split OpenClaw HMAC reveal into dedicated POST with no-store headers
2026-05-22 18:47:09 -06:00
BigBodyCobain f91ddcf38b Fix #302: split OpenClaw HMAC reveal into dedicated POST with no-store
Reported by @tg12. Pre-fix, two problems lived on the GET endpoint:

  1. `GET /api/ai/connect-info?reveal=true` returned the full HMAC
     secret in the response body on every Connect modal open. Even
     gated to require_local_operator, that put the secret into
     browser history, dev-tools network panels, browser disk caches,
     HAR exports, and screen captures.

  2. The same GET endpoint auto-bootstrapped (generated + persisted)
     the secret on a mere read. Side effects on a GET are a footgun:
     browser prefetchers, mirror tools, and casual curl-from-history
     would all silently mint+persist a fresh secret.

Backend (backend/routers/ai_intel.py)
-------------------------------------
  GET  /api/ai/connect-info             — always returns the MASKED
                                          fingerprint (first6 + bullets
                                          + last4). No `?reveal` param.
                                          NO auto-bootstrap. When the
                                          secret is missing, returns
                                          `hmac_secret_set: false` and
                                          tells the caller to POST to
                                          /bootstrap.
  POST /api/ai/connect-info/bootstrap   — NEW. Mints+persists the secret
                                          if missing. Idempotent. Never
                                          returns the full secret in the
                                          response body.
  POST /api/ai/connect-info/reveal      — NEW. Returns the full secret
                                          with Cache-Control: no-store,
                                          no-cache, must-revalidate +
                                          Pragma: no-cache + Expires: 0.
                                          POST so the body never lands
                                          in URL history. 404 (with a
                                          pointer to /bootstrap) when
                                          the secret isn't set.
  POST /api/ai/connect-info/regenerate  — keeps existing one-time-reveal
                                          behavior (regen IS a deliberate
                                          destructive action triggered
                                          by the operator). Same
                                          no-store/no-cache headers added
                                          so even the regen response
                                          doesn't get cached.

Frontend (AIIntelPanel.tsx, OnboardingModal.tsx)
------------------------------------------------
  * On mount: GET (masked only). If hmac_secret_set: false, fire a
    transparent POST /bootstrap and refresh the masked fingerprint.
    Operator sees no behavior change from pre-#302.
  * Reveal (eye icon): lazy POST /reveal — secret only travels when
    the operator explicitly clicks the button.
  * Copy: lazy POST /reveal too — copying without a prior reveal
    works exactly like before, just routed through the new endpoint.
  * Regenerate: POST returns the new secret (same as before, but the
    response now has no-store headers).
  * The displayed snippet uses the masked fingerprint until the
    operator clicks Reveal or Copy.

Tests (backend/tests/test_openclaw_connect_info_reveal.py — 13 tests)
---------------------------------------------------------------------
  * GET returns masked + the full secret never appears in r.text
  * GET does NOT auto-bootstrap when missing
  * GET silently ignores any ?reveal=true query (back-compat noise)
  * POST /bootstrap mints when missing, idempotent when set
  * POST /bootstrap never returns the full secret
  * POST /reveal returns the full secret with Cache-Control: no-store,
    no-cache + Pragma: no-cache + Expires: 0
  * POST /reveal 404s with a pointer to /bootstrap when no secret
  * POST /regenerate returns the new secret with the same headers
  * Anonymous remote callers get 403 on ALL FOUR endpoints (parametric
    regression against the same allowlist used elsewhere).

Adjacent suites still green: test_openclaw_route_security,
test_no_new_duplicate_routes, test_control_surface_auth. 67/67 pass
locally.

Credit: @tg12 for the audit report.
2026-05-22 18:40:24 -06:00
Shadowbroker 49151d8b9f Merge pull request #304 from BigBodyCobain/fix/298-sentinel-creds-server-side
Fix #298: move Sentinel credentials from browser storage to backend .env
2026-05-22 18:29:11 -06:00
BigBodyCobain 767a2f6c00 Merge remote-tracking branch 'origin/main' into fix/298-sentinel-creds-server-side 2026-05-22 18:19:12 -06:00
Shadowbroker 2da739c9e8 Merge pull request #306 from BigBodyCobain/fix/messagesview-flake-alias-race
Deflake messagesViewFirstContact: alias-resolution race in toast text
2026-05-22 18:18:56 -06:00
BigBodyCobain eca7f24e2c Loosen messagesViewFirstContact toast assertion to fix alias-race flake
Follow-up to #305. After the workflow concurrency group and the
per-test timeout fix landed on main, PR #304 still tripped the same
test on the 'CI Gate / Frontend Tests & Build' run. Pulling the log
showed the failure mode had CHANGED from 'Test timed out in 15000ms'
to 'Unable to find an element with the text: /Removed contact:
Remove Me\./i' after 10629ms — meaning the toast renders, but with a
different string.

Tracing through MessagesView.tsx:3478-3494, the Remove handler computes
the toast text as:

    setComposeStatus(
      `Removed contact: ${displayNameForPeer(peerId, contacts)}.`,
    );

displayNameForPeer reads contacts[peerId].alias or falls through to
the raw peerId. The reference is captured from the closed-over React
state. Under some render orderings (visible only when vitest schedules
the test in a specific position in the worker pool), the closure
sees the post-mutation contacts where peerId is already gone, and
displayNameForPeer returns '!sb_remove' instead of 'Remove Me'. The
toast renders correctly — but as 'Removed contact: !sb_remove.' —
and the precise regex misses.

Fix: loosen the assertion to /Removed contact:/i. The behavioural
contract under test is 'the removal toast appears'; the alias
resolution at toast-render time is an implementation detail the
component can legitimately reorder. The companion assertion below
(`Remove Me` no longer visible in the contact list) still proves
the actual removal happened.

Verified locally: 26/26 tests pass in 5.15s.
2026-05-22 18:06:56 -06:00
BigBodyCobain 7bfaad17f0 Merge remote-tracking branch 'origin/main' into fix/298-sentinel-creds-server-side 2026-05-22 17:55:58 -06:00
Shadowbroker e3efcfd476 Merge pull request #305 from BigBodyCobain/fix/messagesview-flake-ci-concurrency
Deflake messagesViewFirstContact via CI concurrency group
2026-05-22 17:55:22 -06:00
BigBodyCobain 32b8421a1c Merge origin/main into fix/298: resolve tools.py conflict
PR #303 landed on main and added Depends(require_local_operator) to the
@router.post decorators for /api/sentinel/token and /api/sentinel/tile.
PR #298 (this branch) edited the same decorator lines AND function bodies
to add the env-credential fallback resolver.

Resolution keeps BOTH:
  * The require_local_operator dependency from #303 (the auth gate)
  * The _resolve_sentinel_credentials helper from #298
  * The env-fallback path inside the function bodies

Both layers are independent — the gate blocks anonymous callers, the env
fallback lets legitimate (gated) callers omit credentials from the body.

Verified: 46 tests pass against the merged code, including both
test_sentinel_credentials_server_side.py (#298 fallback) and
test_sentinel_routes_auth_gate.py (#303 gate).
2026-05-22 17:52:10 -06:00
BigBodyCobain bc70cc3527 fix(test): per-test timeout — 15s waitFor inside 15s testTimeout was zero headroom
Mistake in the prior commit on this branch (44e9b38). Bumped the
waitFor timeout to 15s without realising the suite-wide testTimeout
was ALSO 15s (raised in Round 7a deflake work). Net effect: the
test ran out of clock budget BEFORE waitFor could even finish
polling, producing "Test timed out in 15000ms" on the
"Frontend Tests & Build" run of PR #305 — same job that the
concurrency-group fix had just freed from the resource-contention
flake.

Fix:
  * Bump JUST this test's per-test timeout to 30s via the
    `{ timeout: 30_000 }` argument on the `it()` block.
  * Drop the inner waitFor back to 10s (was 15s) so it has a clear
    margin against the 30s test budget after setup/render/click.

26/26 tests in the file pass locally in 6.19s. The concurrency-group
fix in ci.yml stays as-is — that was correct and verifiably worked
(CI Gate / Frontend Tests & Build went green on the PR after 8 prior
failures). The flake-jump to the sibling workflow exposed this
second-order bug.
2026-05-22 17:49:00 -06:00
BigBodyCobain 44e9b38ac2 Deflake messagesViewFirstContact via CI concurrency group
Root cause
----------
ci.yml fires twice on every PR — once directly via `pull_request:
[main]` (producing the "Frontend Tests & Build" check) and once via
`workflow_call` from docker-publish.yml (producing the "CI Gate /
Frontend Tests & Build" check). Both jobs land on the same Actions
runner pool at the same time and fight for CPU/RAM. Under contention,
the React reconciliation in `messagesViewFirstContact.test.tsx >
removes an approved contact immediately from the visible contact list`
overruns its 5s waitFor timeout.

This is the single test that has flaked on PRs #226, #237, #261, #262,
#265, #294, #303, and the fd7d6fa push — always on the same job name
("CI Gate / Frontend Tests & Build"), never on the sibling job
("Frontend Tests & Build") on the same commit. PR #304 (which heavily
touched the frontend) passed both jobs on first try. PR #303 (zero
frontend changes) failed only the CI Gate job. That asymmetry is what
finally pinpointed the parallel-resource-contention cause rather than
anything in the test or the PRs.

Fix
---
.github/workflows/ci.yml — added a workflow-level concurrency group
keyed on the PR head SHA (or pushed commit SHA). Both invocations
against the same commit now share a group, so the second one queues
instead of running in parallel. cancel-in-progress is intentionally
`false` — cancelling would risk leaving a PR check stuck in "Expected"
if only one of the two ever finished. Total CI time grows by ~2 min
in exchange for deterministic outcomes.

frontend/src/__tests__/mesh/messagesViewFirstContact.test.tsx —
belt-and-suspenders bump of the waitFor timeout from 5s to 15s. The
structural fix above should make the original 5s margin sufficient,
but the bump removes the residual risk of brief runner load spikes
inside the (now serialised) single job. The failure mode this masks
would be "toast never renders", which still fails loudly at 15s.

The full mesh test file (26 tests) passes locally in ~8s with the
bumped timeout.
2026-05-22 17:36:33 -06:00
Shadowbroker b01a69c172 Merge pull request #303 from BigBodyCobain/fix/299-300-301-sentinel-auth-gate
Fix #299/#300/#301: gate Sentinel proxy routes with require_local_operator
2026-05-22 10:56:41 -06:00
BigBodyCobain b041b5e97c Fix #298: move Sentinel credentials from browser storage to backend .env
Reported by @tg12. Pre-fix, the Settings panel stored real third-party
Copernicus CDSE client_id + client_secret in browser localStorage /
sessionStorage via the privacy storage helper, and the proxy routes
required those values to come back in every tile/token request body.
Any same-origin script (XSS, malicious browser extension, dev-tools
HAR export) had read access to the credentials.

This change moves them server-side, behind the same .env-backed admin
flow every other third-party API key (OpenSky, AIS Stream, Finnhub,
Shodan, …) already uses.

Backend
-------
backend/services/api_settings.py
  * Added SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET entries to
    API_REGISTRY. The existing GET/PUT /api/settings/api-keys flow
    (already require_local_operator-gated, .env-backed) now manages
    them — no new route surface.

backend/routers/tools.py
  * /api/sentinel/token and /api/sentinel/tile resolve credentials via
    a new _resolve_sentinel_credentials() helper: body fields win for
    back-compat with any legacy callers, otherwise the helper reads
    SENTINEL_CLIENT_ID / SENTINEL_CLIENT_SECRET from os.environ.
  * When neither source has a value, the route returns 400 with a
    friendly pointer ("Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET
    in the API Keys panel") instead of the curt "required" message.
    The user's standing rule against hostile errors applies.
  * Function bodies only — decorator lines untouched, so this PR does
    not conflict with #303 (which adds Depends(require_local_operator)
    to the same routes).

Frontend
--------
frontend/src/lib/sentinelHub.ts — rewritten
  * Removed: getSentinelCredentials / setSentinelCredentials /
    clearSentinelCredentials / getSentinelCredentialStorageMode.
    These were the browser-storage read/write helpers; their existence
    was the bug.
  * Added: checkBackendSentinelStatus(), refreshSentinelStatus(),
    getCachedSentinelStatus(), and a kept-for-back-compat
    hasSentinelCredentials() shim. Status is sourced from
    /api/settings/api-keys (the same endpoint the API Keys panel
    already uses), so we don't add a new route just for this read.
  * Added: migrateLegacySentinelBrowserKeys() — one-shot, idempotent
    helper that clears sb_sentinel_client_id / _secret / _instance_id
    from BOTH localStorage and sessionStorage. We deliberately do NOT
    auto-POST those legacy browser values to the backend; doing so
    would silently migrate a secret across a trust boundary without
    operator consent. Operators re-enter once in the API Keys panel
    and the legacy keys get wiped here.
  * fetchSentinelTile and getSentinelToken no longer send client_id /
    client_secret in the request body. The backend uses .env.

frontend/src/components/SettingsPanel.tsx
  * Dropped sb_sentinel_client_id / _secret / _instance_id from
    PRIVACY_SENSITIVE_BROWSER_KEYS — they're no longer written.
  * SentinelTab rewritten: removed the inline Client ID / Client Secret
    inputs + Save / Clear / Test buttons. Replaced with a status panel
    that calls checkBackendSentinelStatus() on mount, a one-click
    "Open API Keys Panel" button, and a migration banner that appears
    only when migrateLegacySentinelBrowserKeys() actually cleared
    something.
  * Setup guide STEP 3 now points to the API Keys panel instead of
    the local form.

frontend/src/app/page.tsx
  * Added a one-time useEffect that fires checkBackendSentinelStatus()
    on mount so the cached value (which the synchronous
    hasSentinelCredentials() shim reads) is populated before
    MaplibreViewer's tile-URL memo runs.

Tests
-----
backend/tests/test_sentinel_credentials_server_side.py (new)
  * API_REGISTRY surface — sentinel_client_id / sentinel_client_secret
    are registered with the right env_keys, ALLOWED_ENV_KEYS lets
    /api/settings/api-keys PUT them.
  * Resolution order — body wins, env is fallback, neither → 400 with
    the friendly pointer message, and NO upstream HTTP call when
    neither source has credentials (asserted via
    MagicMock(side_effect=AssertionError)).
  * /api/sentinel/tile same shape.

frontend/src/__tests__/utils/sentinelHub.test.ts (new)
  * migrateLegacySentinelBrowserKeys clears localStorage AND
    sessionStorage, reports what it cleared, idempotent.
  * fetchSentinelTile + getSentinelToken POST WITHOUT client_id /
    client_secret in the body (plants leaked credentials in browser
    storage first to prove they are NOT picked up).
  * checkBackendSentinelStatus parses /api/settings/api-keys correctly:
    true only when both keys is_set, false on partial config or
    network errors.

All 7 backend tests + 8 frontend tests pass locally. The
test_no_new_duplicate_routes guard and the api-settings test suite
still pass.

Credit: @tg12 for the audit report.
2026-05-22 10:44:50 -06:00
BigBodyCobain c54ea7fd9f Fix #299/#300/#301: gate Sentinel proxy routes with require_local_operator
Reported by @tg12 in three audit issues opened the same day:

  #299 — POST /api/sentinel/token is an unauthenticated Copernicus
         OAuth relay for caller-supplied client_id/secret.
  #300 — POST /api/sentinel/tile is an unauthenticated quota/bandwidth
         relay for Sentinel Hub Process API tile fetches.
  #301 — GET /api/sentinel2/search is an unauthenticated Planetary
         Computer STAC + Esri imagery search relay.

All three lived in backend/routers/tools.py decorated only with
@limiter.limit(...) — no Depends(require_local_operator). That made
the backend a free anonymous relay for any caller's Sentinel /
Planetary Computer queries, in the same shape we already closed for
#240/#241 (oracle resolve) and #211/#213/#214 (thermal verify, OpenMHZ
calls + audio relay).

Fix: add dependencies=[Depends(require_local_operator)] to each route.
Loopback / Docker-bridge / admin-key callers (the operator dashboard)
are unaffected — they still resolve through the same allowlist used by
every other operator-only helper in this file. Anonymous remote callers
now receive 403 BEFORE any outbound HTTP call to Copernicus or
Planetary Computer happens.

Tests
-----
test_sentinel_routes_auth_gate.py — 8 new tests:
  * anonymous-remote → 403 on all three routes
  * NO upstream HTTP call when the gate fires (asserted via
    MagicMock(side_effect=AssertionError) on requests.post and
    services.sentinel_search.search_sentinel2_scene). This is the
    property that makes the gate real — without it, a 403 returned
    after the upstream call still burns quota.
  * 127.0.0.1 loopback caller reaches the handler (no false-positive
    where the gate accidentally blocks the local operator too).
  * Uses raw ASGITransport(client=(peer_ip, ...)) rather than
    FastAPI's TestClient because TestClient reports client.host as
    "testclient" which is not on the loopback allowlist.

test_control_surface_auth.py — extended the existing parameterised
regression with the three new routes. That regression is the global
"no remote control surface ships without auth" guard for the whole
codebase; adding these to it means a future refactor that drops the
dependency from any of them will fail CI alongside the existing
~30 gated routes.

The egress-on-403 property and the parameterised regression together
give two independent proofs that the gate fires before the upstream
network call, even if FastAPI's internal dependant tree shape changes
across versions (an earlier draft of this PR included a static walker
of the route table; it was removed because behavioural evidence is
strictly stronger and version-independent).
2026-05-22 09:58:25 -06:00
BigBodyCobain a3aa7b4dec Merge branch 'main' of https://github.com/bigbodycobain/Shadowbroker into fix/287-rate-limit-proxy-aware 2026-05-22 09:51:13 -06:00
Shadowbroker 19fb7f0b1e Fix #288: viewport-scoped live-data for heavy layers only (#294)
Reported by @tg12 in the external security/correctness audit.

Before this change, /api/live-data/{fast,slow} accepted s/w/n/e query
params but their Query() descriptions explicitly said "(ignored)". The
endpoints shipped the full in-memory world dataset on every poll:

    /api/live-data/fast → 16.88 MB
    /api/live-data/slow → 10.12 MB
                          ── 27 MB per poll cycle, regardless of zoom

For a node with N operators each polling at the steady 15s/120s cadence,
this is hundreds of MB/minute of outbound traffic that never gets used —
the GPU just culls everything outside the viewport client-side. On a
Tor-bridged or LTE-backed node, that bandwidth bill is the actual cost.

This change makes the existing s/w/n/e params honored — when all four
bounds are supplied, the backend bbox-filters a curated set of heavy,
density-driven, time-sensitive collections to that viewport (with the
existing 20% padding from _bbox_filter):

    /fast: commercial_flights, military_flights, private_flights,
           private_jets, tracked_flights, ships, cctv, uavs, liveuamap,
           gps_jamming, sigint, trains
    /slow: gdelt, firms_fires, kiwisdr, scanners, psk_reporter

Static reference layers (satellites, datacenters, military_bases,
power_plants, satnogs, weather, news, stocks, etc.) deliberately STAY
world-scale so panning never reveals an "empty world" of infrastructure.
That preserves the no-hostile-UX feel of the existing dashboard.

Behavior contract:

  * Without bbox params (or with a partial bbox), the response is
    byte-for-byte identical to the pre-#288 implementation. No
    behavior change for any existing caller that hasn't opted in.
  * World-scale bbox (lng_span >= 300 or lat_span >= 120) short-circuits
    filtering and shares the global ETag — zoomed-out operators all
    hit the same 304 cache exactly like before.
  * ETag now mixes a 1°-quantized bbox suffix when filtering engages,
    so two viewports never poison each other's 304 cache. Sub-degree
    pans land in the same ETag bucket (i.e. don't bust the cache on
    every mouse drag).

Polling cadence, rate-limit windows, and the 304 short-circuit are all
unchanged. Only the SIZE of the responses changes, and only when the
caller opts in via bounds.

Frontend wiring: useViewportBounds reuses the same coarsened/
expanded bounds it already computes for the AIS /api/viewport POST and
pushes them into a new module-level liveDataViewport store.
useDataPolling reads from that store via appendLiveDataBoundsParams
when building each live-data URL.

Tests cover: no-bbox → world data; bbox → heavy layers filtered;
bbox → reference layers untouched; world-scale bbox → no filter;
partial bbox → treated as no bbox; ETag changes with bbox; sub-degree
pan → same ETag; 304 path works; antimeridian-crossing bbox handled.

Co-authored-by: BigBodyCobain <moatbc@gmail.com>
2026-05-22 00:56:29 -06:00
Shadowbroker 35cd4e4c71 Fix #287: proxy-aware rate-limit key (#295)
Reported by @tg12 in the external security/correctness audit.

Before this change, backend/limiter.py was:

    from slowapi.util import get_remote_address
    limiter = Limiter(key_func=get_remote_address)

get_remote_address only ever returns request.client.host — it does
not look at X-Forwarded-For. Behind the bundled Next.js proxy (or any
other reverse proxy), every connected operator's client.host is the
frontend container's bridge IP, so @limiter.limit("120/minute")
collapses into one shared bucket for everybody on the same backend.
One heavy tab can starve every other operator on that node.

This change swaps in shadowbroker_rate_limit_key, which:

  * Reads X-Forwarded-For ONLY when the immediate peer matches the
    SAME hostname-bound allowlist we use for Docker-bridge local-operator
    trust (auth._resolve_trusted_bridge_ips — fix #250). Default is
    `frontend,shadowbroker-frontend`, override via
    SHADOWBROKER_TRUSTED_FRONTEND_HOSTS.
  * Picks the FIRST entry in the XFF chain — that's the operator end,
    not the proxy end.
  * Falls back to request.client.host for any peer not on the
    allowlist. Direct hits, unrelated containers, and unknown hosts
    are bucketed exactly like before.
  * Falls back to request.client.host when the resolver itself raises
    (e.g. DNS down). XFF is never accepted on a peer we can't confirm
    is the trusted frontend — there is no way to spoof another
    operator's bucket from outside.

No new env vars. No new operator config. Single-operator nodes are
unaffected — same behaviour as before. The 120/minute and 60/minute
windows on the existing endpoints are unchanged; only the KEY they
bucket on changes.

Tests cover:
  * Direct loopback → keys on peer (regression check vs.
    get_remote_address default).
  * Untrusted peer sending XFF → XFF ignored, keys on peer.
  * Trusted frontend peer with XFF → keys on first XFF entry.
  * First XFF entry picked from a multi-hop chain.
  * Trusted peer without XFF → falls back to peer IP.
  * Empty/whitespace XFF entries skipped.
  * Header lookup is case-insensitive.
  * Two operators behind same proxy → different keys (the whole
    point of the fix).
  * Spoof attempt from internet-facing untrusted IP can't steal the
    victim's bucket.
  * Resolver raising is treated as untrusted (fail-closed).
  * No-client request shape doesn't raise.

Co-authored-by: BigBodyCobain <moatbc@gmail.com>
2026-05-22 00:51:54 -06:00
BigBodyCobain 31f79fd8e2 Fix #287: proxy-aware rate-limit key
Reported by @tg12 in the external security/correctness audit.

Before this change, backend/limiter.py was:

    from slowapi.util import get_remote_address
    limiter = Limiter(key_func=get_remote_address)

get_remote_address only ever returns request.client.host — it does
not look at X-Forwarded-For. Behind the bundled Next.js proxy (or any
other reverse proxy), every connected operator's client.host is the
frontend container's bridge IP, so @limiter.limit("120/minute")
collapses into one shared bucket for everybody on the same backend.
One heavy tab can starve every other operator on that node.

This change swaps in shadowbroker_rate_limit_key, which:

  * Reads X-Forwarded-For ONLY when the immediate peer matches the
    SAME hostname-bound allowlist we use for Docker-bridge local-operator
    trust (auth._resolve_trusted_bridge_ips — fix #250). Default is
    `frontend,shadowbroker-frontend`, override via
    SHADOWBROKER_TRUSTED_FRONTEND_HOSTS.
  * Picks the FIRST entry in the XFF chain — that's the operator end,
    not the proxy end.
  * Falls back to request.client.host for any peer not on the
    allowlist. Direct hits, unrelated containers, and unknown hosts
    are bucketed exactly like before.
  * Falls back to request.client.host when the resolver itself raises
    (e.g. DNS down). XFF is never accepted on a peer we can't confirm
    is the trusted frontend — there is no way to spoof another
    operator's bucket from outside.

No new env vars. No new operator config. Single-operator nodes are
unaffected — same behaviour as before. The 120/minute and 60/minute
windows on the existing endpoints are unchanged; only the KEY they
bucket on changes.

Tests cover:
  * Direct loopback → keys on peer (regression check vs.
    get_remote_address default).
  * Untrusted peer sending XFF → XFF ignored, keys on peer.
  * Trusted frontend peer with XFF → keys on first XFF entry.
  * First XFF entry picked from a multi-hop chain.
  * Trusted peer without XFF → falls back to peer IP.
  * Empty/whitespace XFF entries skipped.
  * Header lookup is case-insensitive.
  * Two operators behind same proxy → different keys (the whole
    point of the fix).
  * Spoof attempt from internet-facing untrusted IP can't steal the
    victim's bucket.
  * Resolver raising is treated as untrusted (fail-closed).
  * No-client request shape doesn't raise.
2026-05-22 00:46:25 -06:00
BigBodyCobain fd7d6fa401 chore(.gitignore): exclude AI-agent scratch dirs and stray fixtures
The repo root has been accumulating AI-coding-agent dropouts that have
no project contract value:

  .codex/, .codex-app-schema/, .codex-app-ts/   — OpenAI Codex CLI
  AGENTS.md, GEMINI.md                          — per-agent instructions
  CLAUDE.md                                     — same shape
  .github/copilot-instructions.md               — GitHub Copilot hints

These are operator-side preferences. If something needs to be canonical
for the project, it goes in docs/ explicitly.

Also adding backend/tests/test_carrier_tracker_region_centers.py —
a stale fixture that referenced fields (region, source_detail,
position_label, position_source_type, position_confidence='low')
that don't exist in the current `_parse_carrier_positions_from_news`
implementation. The real coverage for that function lives in
tests/test_carrier_tracker_quality.py from PR #285.
2026-05-21 20:47:06 -06:00
Shadowbroker 49621824b1 Use USNI Fleet Tracker as the primary carrier source + small UI fixes (#293)
Background
==========
PR #285 set up the seed -> cache -> GDELT model for the carrier tracker
to address audit issues #244/#245/#246. The GDELT half of that pipeline
hits api.gdeltproject.org's doc API for headline-region keyword
matching -- low precision (false centroid positions per #245) AND
unreliable (the host times out from some networks, including Docker
Desktop on Windows).

USNI publishes a weekly Fleet & Marine Tracker with explicit prose like:

  "The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
  "Aircraft carrier USS George Washington (CVN-73) is in port in
   Yokosuka, Japan"

That is a strictly better source for U.S. Navy carrier positions:
authoritative, deterministically parseable, weekly cadence.

What this PR does
=================
New module: backend/services/fetchers/usni_fleet_tracker.py

  - Pulls USNI's WordPress RSS feeds (site-wide + category, unioned).
  - Picks the most recent fleet-tracker post by parsed pubDate.
  - For each carrier in the registry, scans the article body for
    "is operating in / is in port in / returned to / transiting" near
    the carrier's name, hull code, or "<name> Carrier Strike Group"
    variant. Captures the region/port phrase that follows.
  - Maps the region phrase to coordinates via the existing
    REGION_COORDS table, with a USNI-phrase alias table for the
    specific wording USNI uses ("Yokosuka, Japan", "Norfolk, Va.",
    "Naval Station San Diego", "5th Fleet AOR", etc.).
  - Returns {hull: position_entry} with position_confidence="recent"
    and position_source_at = the article's actual publication
    timestamp (not now()).

Politeness
----------
Uses outbound_user_agent("usni-fleet-tracker") so USNI sees a
per-install Shadowbroker identifier (Round 7a / PR #292). The
article body pages return 403 to non-browser UAs; the WordPress RSS
feed serves the full <content:encoded> body and is the supported
aggregator path. No browser UA spoofing.

carrier_tracker.update_carrier_positions() now runs three phases:
  1. Bootstrap from cache (or seed on first run).
  2. USNI fleet tracker -- PRIMARY high-confidence source.
  3. GDELT -- SECONDARY backfill; can NOT demote a "recent" USNI
     position to an "approximate" GDELT headline match.

Verified live: 6 of 11 carriers picked up real May 18, 2026 positions
on first refresh (Eisenhower, Ford, Bush, Roosevelt, Lincoln,
Washington). The other 5 weren't mentioned in this week's article
(they're in port at homeports with no deployment changes) and kept
their cache entries -- which is the correct seed/cache contract from
PR #285.

Other small fixes bundled in
============================
docker-compose.yml: add the 6 third-party-fetcher opt-in env vars
(PREDICTION_MARKETS_ENABLED, FINANCIAL_ENABLED, FIMI_ENABLED,
NUFORC_ENABLED, NEWS_ENABLED, CROWDTHREAT_ENABLED). They were
documented in .env.example but never wired through compose, so setting
them in .env had no effect.

frontend/src/components/TopRightControls.tsx: fix 6 broken i18n keys
that were showing as raw "terminal.term1" / "terminal.cleanupDetail" /
"node.soloReady" placeholders in the INFONET TERMINAL modal. The
translation files have these strings under different key names; the
component now calls the right ones. Full-file sweep confirmed every
other t('...') key in the whole frontend resolves cleanly.
2026-05-21 20:39:23 -06:00
Shadowbroker 76750caa92 Round 7a: per-operator outbound attribution + GDELT GCS-direct fix (#292)
== Per-install operator handle for every third-party API call ==

Before this PR, every Shadowbroker install identified itself to
Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
weather.gov, NUFORC, Sentinel/Planetary Computer, TinyGS / CelesTrak,
Shodan, Finnhub, and others with a single project-wide User-Agent
("Shadowbroker/1.0" or "ShadowBroker-OSINT/1.0"). From the upstream's
perspective every install in the world looked like one giant scraper.
If one install misbehaved, the upstream's only recourse was to block
"Shadowbroker" as a whole.

PR #284 inadvertently doubled down on this in the frontend by
introducing a shared `WIKIMEDIA_API_USER_AGENT` constant. This PR
retrofits both backends to per-operator attribution.

  New setting: OPERATOR_HANDLE (env var / settings UI / auto-gen)
  New helper:  network_utils.outbound_user_agent("purpose")

The handle is auto-generated as "operator-XXXXXX" on first call (the
"shadow-" prefix from earlier drafts was deliberately dropped — too
suspicious-looking for abuse-detection systems). Operators can
override via OPERATOR_HANDLE; the value is sanitized to lowercase
alphanumeric+dash+underscore and capped at 48 chars. Persisted to
backend/data/operator_handle.json so it survives container restarts.

Retrofitted call sites (every previously-MONSTER User-Agent):
  - services/region_dossier.py (Wikipedia + Wikidata + Nominatim)
  - services/geocode.py         (Nominatim)
  - services/sentinel_search.py (Microsoft Planetary Computer)
  - services/feed_ingester.py   (operator-curated RSS feeds)
  - services/fetchers/earth_observation.py (weather.gov, NUFORC)
  - services/fetchers/infrastructure.py
  - services/fetchers/aircraft_database.py
  - services/fetchers/route_database.py
  - services/fetchers/trains.py
  - services/fetchers/meshtastic_map.py
  - services/shodan_connector.py
  - services/unusual_whales_connector.py (Finnhub)
  - services/tinygs_fetcher.py            (CelesTrak + TinyGS)
  - services/sar/sar_products_client.py
  - services/geopolitics.py               (GDELT)
  - services/radio_intercept.py           (Broadcastify + OpenMHz)
  - routers/cctv.py + main.py             (CCTV proxy)
  - routers/ai_intel.py
  - scripts/convert_power_plants.py       (release-time data refresh)

Spoofed browser UAs removed (issues #289 / #290 / #291 — tg12 audit):
  - cloudscraper-based Chrome impersonation against api.openmhz.com
    -> replaced with honest requests + per-install UA
  - Mozilla/5.0 spoofed UA on Broadcastify scrape
    -> replaced with honest UA
  - Mozilla/5.0 + fake first-party Referer on OpenMHz audio relay
    -> replaced with honest UA
  - cloudscraper dependency dropped from pyproject.toml + uv.lock

Frontend retrofit:
  - new GET /api/settings/operator-handle endpoint (local-operator
    gated) returns the install's handle
  - frontend/src/lib/wikimediaClient.ts fetches the handle once on
    first use, caches it for page lifetime, embeds it in the
    Api-User-Agent for every Wikipedia / Wikidata browser-direct call

== GDELT GCS-direct fix ==

GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
bucket. GCS responds with the wildcard *.storage.googleapis.com cert
which legitimately does NOT cover the GDELT custom domain, so Python's
TLS verification correctly refuses the connection. Some networks
happen to route through a path where this works; many (notably Docker
Desktop's outbound NAT on local installs) do not. Verified on the
maintainer's local install: GDELT was unreachable; 1610 geopolitical
events / 48 export files were dropping silently.

Fix: services/geopolitics._gcs_direct_gdelt_url() rewrites any
data.gdeltproject.org URL to its GCS-direct equivalent
(storage.googleapis.com/data.gdeltproject.org/...) where the standard
GCS cert is genuinely valid. api.gdeltproject.org and every other host
are left untouched.

Confirmed live: backend log goes from
  GDELT lastupdate failed: 500
to
  Downloading 48 GDELT export files...
  Downloaded 48/48 GDELT exports
  GDELT parsed: 1610 conflict locations from 48 files

== Tests ==

  backend/tests/test_per_operator_outbound_attribution.py (12 tests)
  backend/tests/test_gdelt_gcs_direct_rewrite.py          (6 tests)
  backend/tests/test_region_dossier_wikimedia_ua.py       (updated to
    pin the helper + per-operator handle, not the old constant)
  frontend/src/__tests__/utils/wikimediaClient.test.ts    (rewritten
    to mock /api/settings/operator-handle and assert per-operator UA)

Local: backend 114/114 security+audit+round7a suite green;
       frontend 718/718 vitest suite green.

Credit: tg12 (external security audit, issues #289/#290/#291
relating to spoofed UAs); BigBodyCobain (operator-prefix call,
GDELT cloud-vs-local diagnosis).
2026-05-21 15:11:28 -06:00
Shadowbroker c3ef9f4b9e Fix #239: CI guard against new duplicate route registrations (#286)
The audit's concern is that FastAPI behavior depends on the order
routes are registered, because backend/main.py and several router
modules register the same (method, path) pairs twice.

Empirical verification (done in this PR's investigation, see
test_router_handler_is_the_one_that_serves) shows:

- main.app.include_router(...) runs at line ~3316.
- All @app.get/post/... decorators in main.py run AFTER that.
- FastAPI matches in registration order -> the router handler always
  wins; the main.py copies are dead code at the route-resolution
  layer.

So behavior today is deterministic, but drift between the two copies
is a real future risk: someone editing only one copy of a pair
introduces silent inconsistency, exactly as we saw in round 5 with
_WORMHOLE_PUBLIC_SETTINGS_FIELDS (which existed in BOTH main.py and
routers/wormhole.py and had to be tightened in both).

This PR is the lowest-risk fix: a CI guard that captures today's 166
known duplicates as a baseline and fails the build if any NEW
duplicate appears later. Existing duplicates are tolerated. Removed
duplicates are allowed (the baseline is a ceiling, not a floor). No
production code is deleted or moved -- the dedup of the existing 166
duplicates can be staged separately in future PRs without rushing.

Files:

- backend/tests/data/duplicate_routes_baseline.json
  Snapshot of every currently-tolerated (METHOD path) duplicate with
  the modules that register each copy. Generated from a live import
  of main.app via the snippet in the test docstring.

- backend/tests/test_no_new_duplicate_routes.py
  Three tests:
    1. test_no_new_duplicate_route_registrations -- the actual guard,
       fails if (METHOD, path) not in baseline is found duplicated.
    2. test_baseline_only_lists_real_duplicates -- warns (does not
       fail) if the baseline has entries that no longer correspond to
       a real duplicate; informational housekeeping for the next
       baseline regeneration.
    3. test_router_handler_is_the_one_that_serves -- pins the
       empirical claim that for every duplicated path the router
       handler is the first-registered one. If someone ever reorders
       include_router() to come AFTER @app decorators, this test
       fails loudly and points at the most likely cause.

Verified locally:
- 3/3 new tests pass with current main (166 baselined dups).
- Synthetic duplicate injected into main.app at runtime IS caught by
  test 1.
- Full security+carrier suite (96 tests) still green.

Credit: tg12 (external security audit).
2026-05-21 13:27:16 -06:00
Shadowbroker 5e6bb8511a Fix #244/#245/#246: carrier tracker seed/cache/freshness model (#285)
Replace the dated editorial fallback positions baked into the registry
with a one-shot seed file + persistent observation cache. The user's
runtime cache now reflects what THIS install has actually observed,
not what USNI published on March 9, 2026. A year from now, the cache
holds a year of observations and the seed is irrelevant.

== #244: dated editorial coordinates out of the registry ==

CARRIER_REGISTRY no longer carries fallback_lat/lng/heading/desc.
Those fields are deleted. The registry is now identity + homeport
only.

New file: backend/data/carrier_seed.json
  - Read-only, shipped with every release.
  - Used ONCE on first-ever startup to bootstrap carrier_cache.json.
  - Each entry stamped with position_confidence="seed" and the actual
    as-of date (2026-03-09), NOT now().

== #245: approximate confidence for headline-derived positions ==

_parse_carrier_positions_from_news() now stamps every GDELT-derived
entry with position_confidence="approximate" so the UI knows the
coordinate is a region-centroid match, not a precise observation.
After the freshness window the label rolls over to
"stale_approximate" so old-and-imprecise is distinguishable from
recent-and-imprecise.

The article's actual seendate is used as position_source_at instead
of now(), so the "last reported X days ago" badge is honest.

== #246: freshness is labelling, not eviction ==

The cache always preserves the last position the system observed,
forever. What changes is the position_confidence label:
  - within configurable window (default 14d, env-overridable via
    SHADOWBROKER_CARRIER_FRESHNESS_DAYS) -> "recent"
  - older -> "stale"
  - seed-bootstrap entries that were never refreshed -> "seed"
  - homeport defaults (carrier added post-install) -> "homeport_default"
  - headline-derived (any age, fresh) -> "approximate"
  - headline-derived (older than window) -> "stale_approximate"

The position itself never reverts to the seed or the registry. The
user always sees the last position the system observed. Per the
user's explicit guidance: "from there have it be the last position
the user has logged the carriers that way a year from now it doesnt
revert to where the ships are today".

== Other improvements ==

- CACHE_FILE moved to backend/data/carrier_cache.json so it lives in
  the volume-mounted dir under Docker compose. Previously it was at
  /app/carrier_cache.json which got wiped on every container restart
  (pre-existing bug).
- Atomic cache write (temp + os.replace) so a crash mid-write does
  not leave a truncated cache file.

== Public API shape ==

Every carrier object the API emits now includes:
  - position_confidence: seed | recent | stale | approximate |
                         stale_approximate | homeport_default
  - position_source_at:  ISO timestamp of when the underlying source
                         was observed (NOT now())
  - is_fallback:         convenience boolean for the UI; true when the
                         confidence is seed/stale/stale_approximate/
                         homeport_default

Existing fields (estimated, source, source_url, last_osint_update,
name, type, lat, lng, country, desc, wiki) are preserved exactly so
the current ShipPopup frontend renders unchanged. last_osint_update
now reflects position_source_at instead of now(), which is what the
existing "last reported MM/DD" badge always meant to show.

Tests: backend/tests/test_carrier_tracker_quality.py — 17 tests
covering seed bootstrap, subsequent-startup ignoring seed, no-seed/
no-cache homeport fallback, registry no longer has fallback fields,
freshness window labelling + env override, "year-old cache entry keeps
its position, only the label flips" regression, approximate
confidence for headline matches, GDELT seendate ISO parser, public
response shape backward compat.

Credit: tg12 (external security audit, three P1/P2 issues).
2026-05-21 11:15:52 -06:00
Shadowbroker 0fee36e8f7 Fix #218/#219/#220: identify ShadowBroker on Wikipedia + Wikidata calls (#284)
Wikimedia's User-Agent policy asks API clients to identify themselves
with a stable, contactable identifier so their operators can rate-limit
or coordinate. Before this change, ShadowBroker was sending:

- Backend (region_dossier.py): generic project default UA only; no
  Api-User-Agent.
- Frontend (useRegionDossier.ts, WikiImage.tsx, NewsFeed.tsx): zero
  identifying header at all; three separate copy-pasted anonymous
  fetches with their own module-local caches.

Three separate components doing the same broken thing meant policy
fixes had to happen in three places, with no shared cache or kill
switch.

Fix (no UX change, zero hostility):

== Backend ==

`backend/services/region_dossier.py` now sets explicit `User-Agent` +
`Api-User-Agent` headers on every outbound Wikidata and Wikipedia
request via a new `_WIKIMEDIA_REQUEST_HEADERS` constant. The identifier
includes a contact path (issues page on the public GitHub repo).

== Frontend ==

New shared helper `frontend/src/lib/wikimediaClient.ts`:
- `fetchWikipediaSummary(title)` — single source of truth for Wikipedia
  REST summary lookups, with one shared LRU cache (in-flight requests
  deduplicated, 512-entry cap), `Api-User-Agent` on every fetch.
- `fetchWikidataSparql(query)` — same shape for Wikidata SPARQL.
- `WIKIMEDIA_API_USER_AGENT` — exported constant; one place to update
  if Wikimedia ever asks us to back off.

Refactored three components to use the shared client:
- `frontend/src/hooks/useRegionDossier.ts` — fetchLeader() and
  fetchLocalWikiSummary() now route through the shared helpers.
- `frontend/src/components/WikiImage.tsx` — uses fetchWikipediaSummary,
  proper React state instead of module-mutation + forceUpdate trick.
- `frontend/src/components/NewsFeed.tsx` — same shape.

UX: byte-for-byte identical. Same thumbnails, same dossier content,
same load behavior. The only observable difference is the outgoing
request header.

Note on #239 (route duplication): an audit-grade inventory shows 166
main.py routes are shadowed by router modules. That cleanup is too
large to land safely in this PR; it will be staged as a separate
ladder of small PRs grouped by router module.

Tests:
- `backend/tests/test_region_dossier_wikimedia_ua.py` — 3 tests
  asserting backend headers are present.
- `frontend/src/__tests__/utils/wikimediaClient.test.ts` — 9 tests
  covering Api-User-Agent presence, shared cache, concurrent
  deduplication, disambiguation/HTTP-error/network-error fallthroughs,
  empty-input safety.

Local: backend 76/76 security suite green, frontend 716/716 vitest
suite green.

Credit: tg12 (external security audit).
2026-05-21 10:48:05 -06:00
Shadowbroker e125467721 Fix #243/#252/#253: stop leaking settings posture to anonymous callers (#283)
Three settings endpoints were disclosing operational posture or
operator-curated configuration to any network caller. This change
either tightens the redacted-public view (#243) or adds a
local-operator auth gate (#252, #253) per the audit recommendations.

Zero hostility to legitimate users: in all three cases, the Tauri
shell (loopback), the Docker bridge frontend container (#250 + #278),
and any caller with an admin key continue to see the full data. Only
anonymous LAN/internet callers see the reduced surface.

== #243 (Wormhole transport posture, anonymous-mode, profile, node mode)

Tightened the public-redaction allowlists in BOTH the main.py and
routers/wormhole.py copies:
- _WORMHOLE_PUBLIC_SETTINGS_FIELDS: {enabled, transport, anonymous_mode}
                                 -> {enabled}
- _WORMHOLE_PUBLIC_PROFILE_FIELDS: {profile, wormhole_enabled}
                                 -> {wormhole_enabled}

`GET /api/settings/node` (both the routers/admin.py and main.py copies)
now returns an empty stub for unauthenticated callers and the full
node_mode + node_enabled fields only for authenticated callers via
_scoped_view_authenticated(request, "node").

== #252 (news feed inventory disclosure)

`GET /api/settings/news-feeds` now requires Depends(require_local_operator)
in both the canonical routers/admin.py handler and the duplicate main.py
handler. Anonymous callers can no longer enumerate operator-curated
feed names and URLs.

== #253 (Time Machine archival-capture posture disclosure)

`GET /api/settings/timemachine` now requires Depends(require_local_operator).
Anonymous callers can no longer fingerprint whether a deployment is
retaining replayable historical surveillance data.

Tests: backend/tests/test_round5_settings_info_disclosure.py (10 tests)
- Wormhole settings: anonymous sees only `enabled`; authenticated sees full state.
- Privacy profile: anonymous sees only `wormhole_enabled`; authenticated sees `profile` + `transport` + `anonymous_mode`.
- Node settings: anonymous sees `{}`; authenticated sees node_mode + node_enabled + persisted state.
- news-feeds: anonymous gets 403 (and get_feeds() is NOT called); authenticated gets full inventory.
- timemachine: anonymous gets 403; authenticated sees enabled + storage_warning.

Local: 73/73 security suite (round 5 + earlier rounds) green.

Credit: tg12 (external security audit, P1 + 2x Medium).
2026-05-21 10:32:23 -06:00
Shadowbroker 2b03b808ac Fix #279: add defusedxml to uv.lock so Docker image installs it (#282)
defusedxml is listed in backend/pyproject.toml line 18 but was missing
from uv.lock. The backend Dockerfile uses `uv sync --frozen --no-dev`,
which only installs packages pinned in the lockfile. As a result the
runtime image shipped without defusedxml even though pyproject declared
it, and any import path that touched it crashed at startup with:

    ModuleNotFoundError: No module named 'defusedxml'

Affected import sites:

- backend/services/psk_reporter_fetcher.py:10
- backend/services/fetchers/aircraft_database.py:21
- backend/services/cctv_pipeline.py:990
- backend/services/cctv_pipeline.py:1018

Fix: regenerate uv.lock so defusedxml v0.7.1 (matching the >=0.7.1
specifier in pyproject) is locked. No code changes -- only the lockfile.
Next image build picks it up via the existing `uv sync --frozen` step.

Reporter: external user. Thanks for catching the missing dep.
2026-05-21 10:18:40 -06:00
Shadowbroker 2e14e75a0e Fix #256: per-peer HMAC secrets defeat cross-peer impersonation (#281)
Before this change, every peer-push HMAC was derived from the single
fleet-shared MESH_PEER_PUSH_SECRET. The receiver could prove "this
request was signed by someone who knows the fleet secret" but it could
NOT prove which peer signed it. Any peer that knew the global secret
could compute the expected HMAC for any other peer URL and forge a
push pretending to be that peer.

Fix: introduce MESH_PEER_SECRETS, an optional comma-separated
url=secret map. When a peer URL appears in the map, only the listed
per-peer secret is accepted for it -- the global secret is ignored for
that specific URL. Peer A no longer knows peer B's secret, so peer A
cannot forge a push claiming to be peer B.

The new helper resolve_peer_key_for_url() in mesh_crypto.py wraps the
lookup and is called from every existing peer-push call site:

- backend/auth.py:_verify_peer_push_hmac (receiver)
- backend/main.py:_http_peer_push_loop (Infonet event push)
- backend/main.py:_http_gate_pull_loop (gate event pull)
- backend/main.py:_http_gate_push_loop (gate event push)
- backend/services/mesh/mesh_router.py (two transports, push)
- backend/services/mesh/mesh_hashchain.py (gate wire ref key)
- backend/services/mesh/mesh_wormhole_prekey.py (peer prekey lookup)

Zero hostility, by design:

- Single-peer installs leave MESH_PEER_SECRETS empty -> resolver falls
  back to MESH_PEER_PUSH_SECRET -> behavior is byte-for-byte unchanged.
- Multi-peer installs that haven't migrated yet behave exactly as
  before.
- Multi-peer installs that DO migrate set MESH_PEER_SECRETS on both
  ends of each peering and immediately close the impersonation surface
  for those URLs. Migration is incremental: unlisted peers keep using
  the global secret.

Tests in backend/tests/test_per_peer_secret_resolver.py:
- env parsing (default, override, whitespace, malformed entries, cache)
- precedence: per-peer beats global
- migration window: unlisted peer falls back to global
- IMPERSONATION REFUSAL: peer A with global-secret-only cannot forge
  HMAC for peer B that has a per-peer secret configured
- IMPERSONATION REFUSAL: peer A with its OWN per-peer secret cannot
  forge HMAC for peer B
- positive control: legitimate peer B request verifies
- zero-behavior-change: single-peer install produces the same key bytes
  as before the change

Credit: tg12 (external security audit, P1/High/High confidence)
2026-05-21 10:05:29 -06:00
Shadowbroker 084e563412 Fix #240/#241: require admin auth on oracle resolve endpoints (#280)
Both POST /api/mesh/oracle/resolve and POST /api/mesh/oracle/resolve-stakes
were previously gated only by a rate limit (5/min) and tagged with
`mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)`. The exemption
decorator is metadata only — it tells the mesh signed-write middleware
not to require a signature envelope, it does NOT enforce caller
authorization. Any network caller could:

- /resolve: settle any prediction market to any outcome (corrupts every
  downstream profile/win-loss count derived from that ledger).
- /resolve-stakes: trigger stake settlement for all expired contests at
  a time of their choosing (race against operator intent).

Fix: add `dependencies=[Depends(require_admin)]` to both routes. The
existing `mesh_write_exempt` tag stays in place because it accurately
describes the route's relationship to the signed-write envelope system;
adding `require_admin` is what closes the actual auth hole.

Tests in backend/tests/test_oracle_resolve_auth_gate.py:
- anonymous caller -> 403, ledger mutator NOT called
- wrong admin key -> 403, ledger mutator NOT called
- valid admin key -> 200, ledger mutator called
- admin key unconfigured + no debug/insecure-admin -> 403

Credit: tg12 (external security audit)
2026-05-21 09:45:08 -06:00
Shadowbroker 9ef6213284 Fix #250: bind Docker bridge local-operator trust to frontend hostname (#278)
Tightens the bridge-trust check so a connection on the Docker bridge
is only granted local-operator status when its source IP matches a
configured frontend container hostname (default: `frontend` + the
shipped `container_name` `shadowbroker-frontend`). Previously, when
`SHADOWBROKER_TRUST_DOCKER_BRIDGE_LOCAL_OPERATOR=1` was set, ANY IP
in the 172.16.0.0/12 range was granted local-operator privileges —
on a shared Docker host that included any unrelated container on the
same bridge.

Operators with renamed services can list new hostnames via the new
`SHADOWBROKER_TRUSTED_FRONTEND_HOSTS` env var (comma-separated). DNS
resolution is cached for 30s; if Docker DNS can't resolve any of the
configured names we fail closed and refuse the bridge entirely.

Single-user installs see no behavior change — the default-named
frontend container still resolves and is still trusted.

Credit: tg12 (external security audit)
2026-05-21 02:06:11 -06:00
Shadowbroker fb11e0881f Fix #251: refuse symlink/hardlink members during Tor bundle extraction (#277)
External audit (@tg12) flagged that the Tor Expert Bundle extractor
checked tarinfo.name against path traversal but never inspected
tarinfo.linkname for symlink or hardlink members. Python 3.11's
tarfile.extractall() honors symlinks, so a malicious archive could
ship a member like::

    name     = "innocent.txt"          (passes the path-traversal check)
    type     = SYMTYPE
    linkname = "C:\Windows\System32\config\system"

After extraction, subsequent reads of innocent.txt dereference to that
arbitrary filesystem location; subsequent writes corrupt it. On
Windows (where Tor Expert Bundle extraction actually runs), this is
a host-compromise path of essentially the same severity as the
supply-chain RCE in #231 — gated only by the integrity check we just
hardened in PR #261/#265.

Python 3.12+ added tarfile.extract / extractall filter='data' as a
built-in mitigation; we're on Python 3.11 in production, so we
implement the same idea manually.

Fix in backend/services/tor_hidden_service.py:

  Extract the existing path-traversal-only check into a new
  _extract_tor_bundle_safely() helper that:

  1. Refuses any member with member.issym() or member.islnk() True.
     Tor bundles never legitimately contain symlinks or hardlinks
     so this is non-disruptive. Logs the linkname so an operator
     can see what the malicious archive was trying to alias.
  2. Refuses any member that isn't isfile() or isdir() — no FIFOs,
     no character or block devices, no contiguous-file-type entries.
     None of those belong in a Tor Expert Bundle and accepting them
     is a class of bug we don't need to debug later.
  3. Preserves the original path-traversal guard (member.name must
     resolve under install_dir).
  4. Catches tarfile.TarError so a corrupt archive returns False
     gracefully instead of bubbling out an exception.

Tests: backend/tests/test_tor_bundle_symlink_filter.py (8 tests)
  - Clean archive with only regular files extracts successfully
  - Symlink member is rejected (the core regression)
  - Hardlink member is rejected
  - Symlink with relative target inside install_dir is still rejected
    (we don't allow symlinks at all, not just absolute-target ones)
  - FIFO/device-style member is rejected
  - Path-traversal guard still works under the new shape
  - Malformed/non-tar file is rejected gracefully (no crash)
  - Failure on one member rejects the whole bundle (no half-extract)

Validation:
  pytest backend/tests/test_tor_bundle_symlink_filter.py
         backend/tests/test_tor_bundle_verification.py
  -> 14 passed

UX impact: zero for legitimate Tor releases. Operators installing
a real Tor Expert Bundle continue to see "Tor installed at:" exactly
as before. Only malicious archives are refused, with a clear log
message identifying the rejected linkname.

Credit: @tg12 — the original report was specific enough that the
fix design was immediate.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 01:41:13 -06:00
Shadowbroker 7f96151e56 Fix #231: multi-source SHA-256 verification for the self-updater (#265)
External audit (@tg12, May 18) found that backend/services/updater.py
silently skipped all SHA-256 integrity verification whenever the
MESH_UPDATE_SHA256 env var was unset — which is the default. Nothing
in any install doc tells operators to set it, so practically every
deployment was running the auto-updater with zero integrity check.
That made GitHub release pipeline compromise a single-step path to
arbitrary code execution on every node that auto-updates.

Investigation surfaced a deeper bug too: the updater downloads
zipball_url (GitHub's auto-generated source archive) but the
maintainer's release process publishes SHA256SUMS.txt for a separate
named asset (ShadowBroker_v*.zip). So even if MESH_UPDATE_SHA256
WERE set, operators had no published digest to compare against — the
file they were downloading wasn't the file the maintainer had signed.

This PR fixes both issues with the same multi-source verification
chain we shipped for the Tor bundle in PR #261:

  backend/services/updater.py
    _download_release() now prefers a maintainer-signed release asset
    matching ShadowBroker_v*.zip over zipball_url. Captures the
    SHA256SUMS.txt asset URL when present.

    _validate_zip_hash() rewritten as a four-source chain:
      1. MESH_UPDATE_SHA256 env var (operator override, preserved)
      2. SHA256SUMS.txt asset published with the release (primary —
         the maintainer's release process already publishes this)
      3. Baked-in backend/data/release_digests.json (second line of
         defense for releases that lack the SHA256SUMS asset, or when
         the asset can't be fetched at update time)
      4. HTTPS-only fallback with a loud warning (preserves the auto-
         update flow during transient outages)

    Mismatch from any source that DID respond is fatal — the update
    is refused and the existing install keeps running. Only the
    "no source reachable at all" case falls back to HTTPS-only.

    _fetch_sha256sums() new — fetches and parses a standard
    SHA256SUMS.txt asset. Handles both "<digest>  <name>" and binary-
    marker "<digest> *<name>" formats. Tolerant to comments, blank
    lines, and malformed entries.

  backend/data/release_digests.json (new)
    Baked-in digest list keyed by release tag. Seeded with the v0.9.79
    entries copied from the published SHA256SUMS.txt:
      ShadowBroker_v0.9.79.zip      = f6877c1d6661...
      ShadowBroker_0.9.79_x64-setup.exe = f7b676ada45c...
      ShadowBroker_0.9.79_x64_en-US.msi = e0713c3cdda1...
    Whitelisted in .gitignore alongside the other static reference
    data files (kiwisdr_directory.json, tor_bundle_digests.json,
    aisstream_spki_pins.json).

  backend/tests/test_update_integrity_chain.py (new, 16 tests)
    - Each source matches → success, identifies which source verified
    - Each source mismatches → RuntimeError "mismatch"
    - No source reachable → https-only fallback with loud warning
    - Env override beats all other sources (preserved precedence)
    - SHA256SUMS.txt parser handles standard, binary-marker, comments,
      and network-failure cases

Validation:
  pytest backend/tests/test_update_integrity_chain.py → 16 passed
  pytest (all 15 security test files together) → 105 passed

UX impact: zero. Normal auto-update flow is unchanged for legitimate
releases (path 2 catches everything because the release publishes
SHA256SUMS.txt). Transient network failures during update gracefully
fall through to path 3 then path 4 — no operator intervention needed.
The only user-visible behavior change is in the compromised-release
case, where the update is now refused instead of silently applied.

Credit: @tg12 for the original bug report and the specific call-out
that MESH_UPDATE_SHA256 was unreachable by default operators.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 01:31:20 -06:00
Shadowbroker d0299fc0a0 test(ci): raise vitest testTimeout to 15s to stop CI-load flakes (#266)
Vitest's default per-test timeout is 5s. That's plenty for tests that
exercise pure functions or even simple JSX, but the heavier React
component trees we render under jsdom — MessagesView, GateView,
Wormhole contact flows — consistently measure 6-10s on GitHub Actions'
shared Node workers under load.

Concrete flake history that drove this bump (none were real product
bugs — all were CI load racing the 5s ceiling on findByText /
waitFor against React reconciliation):

  PR #226 messagesViewFirstContact > removes approved contact
  PR #237 (same)
  PR #261 (same)
  PR #262 (same) ← worst: fired on post-merge Docker Publish run,
                   prevented the AIS SPKI security fix's image from
                   being published to GHCR until PR #263 cumulatively
                   re-published it. Real security-fix-shipping risk.
  PR #264 fixed messagesViewFirstContact specifically with waitFor
  PR #265 messagesViewFirstContact > legacy handle-only addresses
                  AND gateCompatDecryptUx > browser-local gate runtime
                  AND failed on the rerun too — confirming the flake
                  class is broader than the one test we deflaked.

The deflake in PR #264 was too surgical — it addressed one specific
test out of a class of similarly-flaky CI-load-sensitive sites. This
PR addresses the root cause at the config layer instead of playing
whack-a-mole.

Why 15s specifically: 3x the default. Headroom for routine CI slowness
without masking real "test never settles" bugs (those would still
time out, just three rounds later). Individual tests can still pin
their own tighter timeout via the third arg to `it()`.

Also bumps hookTimeout to 15s — beforeEach/afterEach setup for the
same heavier component tests has the same CI-load sensitivity.

User-facing impact: zero. This is dev pipeline infrastructure. End
users never see test timeouts. The cost is theoretical: a buggy test
that genuinely never resolves now takes 15s to declare failure
instead of 5s. In practice that's negligible because the suite runs
once per CI invocation and tests don't usually deadlock.

Validation:
  Local full vitest run → 707 passed, 72 files, 10.36s wall clock
  (same speed as before — we only changed how long we WAIT for slow
   tests, not how fast tests actually run)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 01:26:34 -06:00
269 changed files with 27544 additions and 3140 deletions
+38 -2
View File
@@ -10,6 +10,23 @@ OPENSKY_CLIENT_ID=
OPENSKY_CLIENT_SECRET= OPENSKY_CLIENT_SECRET=
AIS_API_KEY= AIS_API_KEY=
# Global Fishing Watch — fishing vessel activity events (Fishing Activity map layer).
# Free API token from https://globalfishingwatch.org/our-apis/tokens
# Without this the fishing_activity layer stays empty.
# GFW_API_TOKEN=
# Optional tuning — GFW can return 40k+ global events; defaults cap fetch for map paint.
# GFW_EVENTS_PAGE_SIZE=500
# GFW_EVENTS_MAX_PAGES=10
# GFW_EVENTS_LOOKBACK_DAYS=7
# GFW_EVENTS_TIMEOUT_S=90
# Windy Webcams global CCTV layer — free key from https://api.windy.com/webcams/docs
# WINDY_API_KEY=
# Telegram OSINT map layer — scrapes public t.me/s channel previews (no bot token).
# TELEGRAM_OSINT_ENABLED=true
# TELEGRAM_OSINT_CHANNELS=osintdefender,insiderpaper,aljazeeraenglish,nexta_live,war_monitor
# Admin key to protect sensitive endpoints (settings, updates). # Admin key to protect sensitive endpoints (settings, updates).
# If blank, loopback/localhost requests still work for local single-host dev. # If blank, loopback/localhost requests still work for local single-host dev.
# Remote/non-loopback admin access requires ADMIN_KEY, or ALLOW_INSECURE_ADMIN=true in debug-only setups. # Remote/non-loopback admin access requires ADMIN_KEY, or ALLOW_INSECURE_ADMIN=true in debug-only setups.
@@ -39,8 +56,8 @@ ADMIN_KEY=
# NUFORC_MAPBOX_TOKEN= # NUFORC_MAPBOX_TOKEN=
# Optional startup-risk controls. # Optional startup-risk controls.
# On Windows, external curl fallback and the Playwright LiveUAMap scraper are # On Windows, external curl fallback is off by default. LiveUAMap uses UI consent
# disabled by default so blocked upstream feeds cannot interrupt start.bat. # when you enable Global Incidents (or set SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=true).
# SHADOWBROKER_ENABLE_WINDOWS_CURL_FALLBACK=false # SHADOWBROKER_ENABLE_WINDOWS_CURL_FALLBACK=false
# SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=false # SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=false
# AIS starts by default when AIS_API_KEY is set. Set to 0/false to force-disable. # AIS starts by default when AIS_API_KEY is set. Set to 0/false to force-disable.
@@ -77,6 +94,19 @@ ADMIN_KEY=
# pip install earthengine-api # pip install earthengine-api
# GEE_SERVICE_ACCOUNT_KEY= # GEE_SERVICE_ACCOUNT_KEY=
# Copernicus CDSE — Sentinel-2 imagery (Settings → Imagery, or backend .env).
# Free OAuth app at https://dataspace.copernicus.eu/
# SENTINEL_CLIENT_ID=
# SENTINEL_CLIENT_SECRET=
# Sentinel-2 road corridor freight trends (DrishX engine port — opt-in slow layer).
# pip install -e backend[road-corridor] (or uv sync --extra road-corridor)
# ROAD_CORRIDOR_SAT_ENABLED=false
# ROAD_CORRIDOR_SCHEDULED_PRESETS=laredo_i35
# ROAD_CORRIDOR_MONTHS=2
# ROAD_CORRIDOR_MAX_FRAMES=6
# ROAD_CORRIDOR_REFRESH_HOURS=24
# Override the backend URL the frontend uses (leave blank for auto-detect) # Override the backend URL the frontend uses (leave blank for auto-detect)
# NEXT_PUBLIC_API_URL=http://192.168.1.50:8000 # NEXT_PUBLIC_API_URL=http://192.168.1.50:8000
@@ -128,8 +158,14 @@ ADMIN_KEY=
# MESH_DM_ROOT_TRANSPARENCY_LEDGER_READBACK_URI=backend/../ops/root_transparency_ledger.json # MESH_DM_ROOT_TRANSPARENCY_LEDGER_READBACK_URI=backend/../ops/root_transparency_ledger.json
# ── Self Update ──────────────────────────────────────────────── # ── Self Update ────────────────────────────────────────────────
# Optional ZIP updater digest pin. The updater checks this first, then
# backend/data/release_digests.json, then the release SHA256SUMS.txt asset.
# MESH_UPDATE_SHA256= # MESH_UPDATE_SHA256=
# Optional strict nonce-only frontend CSP. Leave unset unless the exact build
# has been verified to hydrate cleanly in your deployment.
# SHADOWBROKER_STRICT_CSP=1
# ── Wormhole (Local Agent) ───────────────────────────────────── # ── Wormhole (Local Agent) ─────────────────────────────────────
# WORMHOLE_URL=http://127.0.0.1:8787 # WORMHOLE_URL=http://127.0.0.1:8787
# WORMHOLE_TRANSPORT=direct # WORMHOLE_TRANSPORT=direct
+13
View File
@@ -0,0 +1,13 @@
## Summary
<!-- What changed and why (13 bullets). -->
## Test plan
- [ ] <!-- How you verified the change -->
## Production hardening (data path / fetchers / unattended deploys only)
If this PR touches the data path, fetchers, or live-data APIs, walk through [docs/production-hardening.md](https://github.com/BigBodyCobain/Shadowbroker/blob/main/docs/production-hardening.md) and note any N/A items here.
- [ ] Checklist reviewed (or N/A — explain why)
+22
View File
@@ -7,6 +7,28 @@ on:
branches: [main] branches: [main]
workflow_call: workflow_call:
# CI flake mitigation:
# ci.yml is triggered TWICE per PR on the same commit — once directly via
# the `pull_request` trigger above ("Frontend Tests & Build" check) and once
# via `workflow_call` from docker-publish.yml ("CI Gate / Frontend Tests &
# Build" check). Both jobs land on the same Actions runner pool at the same
# time and fight for CPU/RAM. Under contention, React's reconciliation in
# `messagesViewFirstContact.test.tsx > removes an approved contact …`
# overruns its 5s waitFor timeout — that's the single failure mode we've
# seen flake on PRs #226, #237, #261, #262, #265, #294, #303, and the
# fd7d6fa push. Backend tests and every other frontend test pass under
# the same conditions, which is what made this look random.
#
# Pinning a concurrency group on the SHA (PR head, or the pushed commit
# for main) serializes the two invocations so neither starves the other.
# We use cancel-in-progress: false so the second one queues instead of
# cancelling — cancelling could leave the PR check stuck "Expected" if
# only one of the two ever finishes. Total CI time grows by ~2 min in
# exchange for deterministic outcomes.
concurrency:
group: ci-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
jobs: jobs:
frontend: frontend:
name: Frontend Tests & Build name: Frontend Tests & Build
+42
View File
@@ -101,6 +101,17 @@ backend/data/*
# Issue #258: SPKI pins for stream.aisstream.io so we can survive upstream # Issue #258: SPKI pins for stream.aisstream.io so we can survive upstream
# Let's Encrypt renewal failures without disabling TLS validation entirely. # Let's Encrypt renewal failures without disabling TLS validation entirely.
!backend/data/aisstream_spki_pins.json !backend/data/aisstream_spki_pins.json
# Issue #231: pinned SHA-256 digests for known release archives. Used by
# the self-updater as a second-line integrity check when the release's
# SHA256SUMS.txt asset can't be fetched.
!backend/data/release_digests.json
# Issue #244/#245/#246: one-shot carrier-position seed shipped with each
# release. Used ONLY on first-ever startup to bootstrap carrier_cache.json;
# after that the cache reflects this install's own GDELT observations.
!backend/data/carrier_seed.json
# DrishX RF model weights (MIT — see backend/third_party/drishx/NOTICE.md)
!backend/data/drishx/
!backend/data/drishx/rf_model.pickle
# OS generated files # OS generated files
.DS_Store .DS_Store
@@ -190,6 +201,8 @@ graphify-out/
# Internal docs & brainstorming (never commit) # Internal docs & brainstorming (never commit)
# ======================== # ========================
docs/* docs/*
!docs/OUTBOUND_DATA.md
!docs/production-hardening.md
!docs/mesh/ !docs/mesh/
docs/mesh/* docs/mesh/*
!docs/mesh/threat-model.md !docs/mesh/threat-model.md
@@ -253,3 +266,32 @@ backend/data/wormhole_stdout.log
# Compressed snapshot archives (can be 100 MB+) # Compressed snapshot archives (can be 100 MB+)
*.json.gz *.json.gz
# ──────────────────────────────────────────────────────────────────────
# AI assistant / coding-agent scratch
# ──────────────────────────────────────────────────────────────────────
# Per-tool config + scratch directories. These are private to whichever
# coding agent the operator happens to be using and have no business in
# the repo. If a tool's instructions need to be canonical for the project,
# we'll put them in docs/ explicitly — not let the agent dump them at the
# repo root.
# OpenAI Codex CLI
.codex/
.codex-app-schema/
.codex-app-ts/
# Per-agent instruction files dropped at repo root by various tools.
# These are operator-side preferences, not part of the project contract.
AGENTS.md
GEMINI.md
CLAUDE.md
.github/copilot-instructions.md
# Stale AI-generated test file that referenced fields that don't exist in
# the current `_parse_carrier_positions_from_news` implementation. Kept
# ignored so it doesn't accidentally get committed if it shows up again
# from a tool that's working off an out-of-date understanding of the
# module. If a real test for that function is needed, write it under a
# meaningful name in tests/test_carrier_tracker_quality.py.
backend/tests/test_carrier_tracker_region_centers.py
+42 -12
View File
@@ -13,13 +13,22 @@
# 2. Reverse-mirrors main back to GitHub (only if commits land directly # 2. Reverse-mirrors main back to GitHub (only if commits land directly
# on GitLab) so the two sources stay in sync. # on GitLab) so the two sources stay in sync.
# #
# Pipelines on this repo were instant-failing for free-tier accounts until
# identity verification was added — the May 2026 bump in this comment is
# the marker commit that confirms runner allocation after verification.
#
# Auth notes: # Auth notes:
# - The image build/push uses $CI_JOB_TOKEN, which GitLab provides # - The image build/push uses $CI_JOB_TOKEN, which GitLab provides
# automatically. No credentials need to be configured. # automatically. No credentials need to be configured.
# - The reverse mirror requires a GitHub personal access token stored # - The reverse mirror authenticates to GitHub via a per-repo SSH
# as the GitLab CI/CD variable GITHUB_MIRROR_TOKEN (Protected + Masked). # deploy key. The private half is stored as the File-type GitLab
# Scope: public_repo (or repo for private). If the variable isn't # CI/CD variable GITHUB_MIRROR_SSH_KEY (Protected). The matching
# set the mirror job is skipped — image builds still run. # public key is added to github.com/BigBodyCobain/Shadowbroker/
# settings/keys with write access. This is a tighter-scoped
# replacement for a personal access token: it can ONLY push to
# Shadowbroker, never expires, and rotating it is a one-click
# delete on GitHub's deploy-keys page. If the variable isn't set,
# the mirror job is skipped — image builds still run.
stages: stages:
- build - build
@@ -48,7 +57,11 @@ variables:
- docker info - docker info
- docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY" - docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY"
- docker run --privileged --rm tonistiigi/binfmt --install all - docker run --privileged --rm tonistiigi/binfmt --install all
- docker buildx create --use --name multiarch --driver docker-container # buildx --driver docker-container can't read TLS from the env vars
# the GitLab dind service exports. Wrap them in a docker context and
# bind buildx to it. See https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-buildx
- docker context create tls-env
- docker buildx create --use --name multiarch --driver docker-container tls-env
# ── Backend image ──────────────────────────────────────────────────────── # ── Backend image ────────────────────────────────────────────────────────
build-backend: build-backend:
@@ -93,18 +106,35 @@ build-frontend:
- .gitlab-ci.yml - .gitlab-ci.yml
# ── Reverse mirror to GitHub ───────────────────────────────────────────── # ── Reverse mirror to GitHub ─────────────────────────────────────────────
# Pushes refs/heads/main to github.com/BigBodyCobain/Shadowbroker. # Pushes refs/heads/main to github.com/BigBodyCobain/Shadowbroker via SSH
# Fast-forward-only — if GitLab main and GitHub main have diverged, this # using a per-repo deploy key. Fast-forward-only by default — if GitLab
# fails loudly rather than silently overwriting either side. # main and GitHub main have diverged, the push fails loudly rather than
# silently overwriting either side.
# #
# Only runs if GITHUB_MIRROR_TOKEN is set as a CI/CD variable. See the # Only runs if GITHUB_MIRROR_SSH_KEY is set as a File-type CI/CD variable.
# header comment of this file for setup instructions. # See the header comment of this file for setup instructions.
mirror-to-github: mirror-to-github:
stage: mirror stage: mirror
image: alpine:3.20 image: alpine:3.20
needs: [] needs: []
before_script: before_script:
- apk add --no-cache git openssh-client ca-certificates - apk add --no-cache git openssh-client ca-certificates
- mkdir -p ~/.ssh
- chmod 700 ~/.ssh
# Install the deploy key. File-type CI variable exposes the path; copy
# to ~/.ssh/id_ed25519 with restrictive perms so ssh accepts it.
- cp "$GITHUB_MIRROR_SSH_KEY" ~/.ssh/id_ed25519
- chmod 600 ~/.ssh/id_ed25519
# Pin github.com's current host keys so we never trust a man-in-the-
# middle. Sourced from https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/githubs-ssh-key-fingerprints
# (rotated 2023-03-24 after the previous RSA key leak).
- |
cat > ~/.ssh/known_hosts <<'EOF'
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
EOF
- chmod 644 ~/.ssh/known_hosts
script: script:
- git config --global user.email "ci-mirror@gitlab.com" - git config --global user.email "ci-mirror@gitlab.com"
- git config --global user.name "GitLab CI Mirror" - git config --global user.name "GitLab CI Mirror"
@@ -115,7 +145,7 @@ mirror-to-github:
- cd repo - cd repo
- > - >
git push git push
"https://x-access-token:${GITHUB_MIRROR_TOKEN}@github.com/BigBodyCobain/Shadowbroker.git" "git@github.com:BigBodyCobain/Shadowbroker.git"
"${CI_COMMIT_SHA}:refs/heads/main" "${CI_COMMIT_SHA}:refs/heads/main"
rules: rules:
- if: $CI_COMMIT_BRANCH == "main" && $GITHUB_MIRROR_TOKEN - if: $CI_COMMIT_BRANCH == "main" && $GITHUB_MIRROR_SSH_KEY
+2 -1
View File
@@ -44,7 +44,8 @@ These sources have their own terms; consult each link before redistributing.
| aisstream.io | https://aisstream.io | Free-tier API terms (attribution required) | AIS vessel positions | | aisstream.io | https://aisstream.io | Free-tier API terms (attribution required) | AIS vessel positions |
| Global Fishing Watch | https://globalfishingwatch.org | CC BY 4.0 (for public data) | Fishing activity events | | Global Fishing Watch | https://globalfishingwatch.org | CC BY 4.0 (for public data) | Fishing activity events |
| Microsoft Planetary Computer | https://planetarycomputer.microsoft.com | Sentinel-2 / ESA Copernicus terms | Sentinel-2 imagery | | Microsoft Planetary Computer | https://planetarycomputer.microsoft.com | Sentinel-2 / ESA Copernicus terms | Sentinel-2 imagery |
| Copernicus CDSE (Sentinel Hub) | https://dataspace.copernicus.eu | ESA Copernicus open data terms | SAR + optical imagery | | Copernicus CDSE (Sentinel Hub) | https://dataspace.copernicus.eu | ESA Copernicus open data terms | SAR + optical imagery, optional road-corridor truck trends |
| DrishX / Fisser et al. 2022 | https://github.com/sparkyniner/DRISH-X-Satellite-powered-freight-intelligence- | MIT (engine); research methodology attribution | Sentinel-2 motion-smear truck detection on major roads (opt-in) |
| Shodan | https://www.shodan.io | Operator-supplied API key, Shodan ToS | Internet device search | | Shodan | https://www.shodan.io | Operator-supplied API key, Shodan ToS | Internet device search |
| Smithsonian GVP | https://volcano.si.edu | Attribution required | Volcanoes | | Smithsonian GVP | https://volcano.si.edu | Attribution required | Volcanoes |
| OpenAQ | https://openaq.org | CC BY 4.0 | Air quality stations | | OpenAQ | https://openaq.org | CC BY 4.0 | Air quality stations |
+134 -21
View File
@@ -19,7 +19,7 @@
**ShadowBroker** is a decentralized intelligence platform that aggregates real-time, multi-domain OSINT telemetry from 60+ live intelligence feeds into a single dark-ops map interface. Aircraft, ships, satellites, conflict zones, CCTV networks, GPS jamming, internet-connected devices, police scanners, mesh radio nodes, and breaking geopolitical events — all updating in real time on one screen as well as an obfuscated communications protocol and information exchange infrastructure. **ShadowBroker** is a decentralized intelligence platform that aggregates real-time, multi-domain OSINT telemetry from 60+ live intelligence feeds into a single dark-ops map interface. Aircraft, ships, satellites, conflict zones, CCTV networks, GPS jamming, internet-connected devices, police scanners, mesh radio nodes, and breaking geopolitical events — all updating in real time on one screen as well as an obfuscated communications protocol and information exchange infrastructure.
Built with **Next.js**, **MapLibre GL**, **FastAPI**, and **Python**. 35+ toggleable data layers, including SAR ground-change detection. Multiple visual modes (DEFAULT / SATELLITE / FLIR / NVG / CRT). Right-click any point on Earth for a country dossier, head-of-state lookup, and the latest Sentinel-2 satellite photo. No user data is collected or transmitted — the dashboard runs entirely in your browser against a self-hosted backend. Built with **Next.js**, **MapLibre GL**, **FastAPI**, and **Python**. 40+ toggleable data layers, including SAR ground-change detection, **Telegram OSINT** (public channel previews geoparsed onto the map), a **server-side recon toolkit** (DNS, WHOIS, sanctions, BGP, IP sweep, and more), supply-chain risk overlays, and malware/C2 + CISA KEV cyber threat feeds. Multiple visual modes (DEFAULT / SATELLITE / FLIR / NVG / CRT). Right-click any point on Earth for a country dossier, head-of-state lookup, entity-graph expansion, and the latest Sentinel-2 satellite photo. ShadowBroker has no accounts, product telemetry, or analytics; the dashboard talks to your self-hosted backend. Sensitive recon and Shodan queries never hit third-party APIs from the browser — they are proxied through the backend with SSRF guards and local-operator auth. The **OpenClaw / agent command channel** exposes the same recon backends plus full telemetry search — no separate API integration required.
Designed for analysts, researchers, radio operators, and anyone who wants to see what the world looks like when every public signal is on the same map. Designed for analysts, researchers, radio operators, and anyone who wants to see what the world looks like when every public signal is on the same map.
@@ -28,18 +28,20 @@ Designed for analysts, researchers, radio operators, and anyone who wants to see
A surprising amount of global telemetry is already public — aircraft ADS-B broadcasts, maritime AIS signals, satellite orbital data, earthquake sensors, mesh radio networks, police scanner feeds, environmental monitoring stations, internet infrastructure telemetry, and more. This data is scattered across dozens of tools and APIs. ShadowBroker combines all of it into a single interface. A surprising amount of global telemetry is already public — aircraft ADS-B broadcasts, maritime AIS signals, satellite orbital data, earthquake sensors, mesh radio networks, police scanner feeds, environmental monitoring stations, internet infrastructure telemetry, and more. This data is scattered across dozens of tools and APIs. ShadowBroker combines all of it into a single interface.
The project does not introduce new surveillance capabilities — it aggregates and visualizes existing public datasets. It is fully open-source so anyone can audit exactly what data is accessed and how. No user data is collected or transmitted — everything runs locally against a self-hosted backend. No telemetry, no analytics, no accounts. The project does not introduce new surveillance capabilities — it aggregates and visualizes existing public datasets. It is fully open-source so anyone can audit exactly what data is accessed and how. ShadowBroker does not include product telemetry, analytics, or accounts. Operator-supplied keys stay in your local deployment, but live OSINT features necessarily make outbound requests to the public data providers you enable or query.
### Shodan Connector ### Shodan & Recon (security-first)
ShadowBroker includes an optional Shodan connector for operator-supplied API access. Shodan results are fetched with your own `SHODAN_API_KEY`, rendered as a local investigative overlay (not merged into core feeds), and remain subject to Shodans terms of service. ShadowBroker includes an optional **Shodan connector** for operator-supplied API access (`SHODAN_API_KEY`) and a **Recon Toolkit** panel for keyless OSINT lookups. Both run **server-side only**: the browser calls your self-hosted `/api/osint/*` and `/api/tools/shodan/*` routes; outbound requests are made by the backend after SSRF validation. Recon requires **local-operator** access (same trust model as layer toggles and admin routes). Shodan results render as a separate map overlay and remain subject to Shodans terms of service.
> **Not included:** embedded live-news YouTube grids or a built-in Gemini AI analyst panel — use the **OpenClaw / agent channel** for AI-assisted analysis instead.
--- ---
## Interesting Use Cases ## Interesting Use Cases
* **Track Air Force One**, the private jets of billionaires and dictators, and every military tanker, ISR, and fighter broadcasting ADS-B. Air Force One and all of the accompanying Presidential/Vice Presidential planes are highlighted and monitored from the moment they leave the ground. * **Track Air Force One**, the private jets of billionaires and dictators, and every military tanker, ISR, and fighter broadcasting ADS-B. Air Force One and all of the accompanying Presidential/Vice Presidential planes are highlighted and monitored from the moment they leave the ground.
* **Connect an AI agent as a co-analyst** through ShadowBroker's HMAC-signed agentic command channel — supports OpenClaw and any other agent that speaks the protocol (Claude, GPT, LangChain, custom). The agent gets full read/write access to all 35+ data layers, pin placement, map control, SAR ground-change, mesh networking, and alert delivery. It sees everything the operator sees and can take actions on the map in real time. * **Connect an AI agent as a co-analyst** through ShadowBroker's HMAC-signed agentic command channel — supports OpenClaw and any other agent that speaks the protocol (Claude, GPT, LangChain, custom). The agent gets full read/write access to all 40+ data layers, compact cross-layer search (`search_telemetry`, `search_news`), the full recon toolkit (`osint_lookup` for IP/DNS/WHOIS/sanctions/CVE/etc.), entity-graph expansion, pin placement, map control, SAR ground-change, mesh networking, and alert delivery. It sees everything the operator sees and can take actions on the map in real time.
* **Communicate on the InfoNet testnet** — The first decentralized intelligence mesh built into an OSINT tool. Obfuscated messaging with gate personas, Dead Drop peer-to-peer exchange, and a built-in terminal CLI. No accounts, no signup. Privacy is not guaranteed yet — this is an experimental testnet — but the protocol is live and being hardened. * **Communicate on the InfoNet testnet** — The first decentralized intelligence mesh built into an OSINT tool. Obfuscated messaging with gate personas, Dead Drop peer-to-peer exchange, and a built-in terminal CLI. No accounts, no signup. Privacy is not guaranteed yet — this is an experimental testnet — but the protocol is live and being hardened.
* **Right-click anywhere on Earth** for a country dossier (head of state, population, languages), Wikipedia summary, and the latest Sentinel-2 satellite photo at 10m resolution * **Right-click anywhere on Earth** for a country dossier (head of state, population, languages), Wikipedia summary, and the latest Sentinel-2 satellite photo at 10m resolution
* **Click a KiwiSDR node** and tune into live shortwave radio directly in the dashboard. Click a police scanner feed and eavesdrop in one click. * **Click a KiwiSDR node** and tune into live shortwave radio directly in the dashboard. Click a police scanner feed and eavesdrop in one click.
@@ -55,6 +57,12 @@ ShadowBroker includes an optional Shodan connector for operator-supplied API acc
* **Track trains** across the US (Amtrak) and Europe (DigiTraffic) in real time * **Track trains** across the US (Amtrak) and Europe (DigiTraffic) in real time
* **Estimate where US aircraft carriers are** using automated GDELT news scraping — no other open tool does this * **Estimate where US aircraft carriers are** using automated GDELT news scraping — no other open tool does this
* **Search internet-connected devices worldwide** via Shodan — cameras, SCADA systems, databases — plotted as a live overlay on the map * **Search internet-connected devices worldwide** via Shodan — cameras, SCADA systems, databases — plotted as a live overlay on the map
* **Run a full recon toolkit** from the left sidebar — IP geolocation, DNS, RDAP/WHOIS, certificate transparency, BGP/ASN, OFAC sanctions search, CVE lookup, Tor/OTX threat checks, and subnet sweeps (InternetDB proxied server-side)
* **Expand an entity graph** when you select an aircraft, vessel, company, or IP — Wikidata + OFAC + live store cross-links rendered in the Entity Graph panel
* **Monitor supply-chain risk** — Tier 1/2 semiconductor and battery fabs scored against nearby earthquakes, wildfires, and conflict events (SCM panel)
* **Toggle malware C2 hotspots** — abuse.ch Feodo Tracker + URLhaus feeds mapped by country (opt-in layer)
* **Monitor Telegram OSINT channels** — public `t.me/s` war/conflict feeds (OSINTdefender, NEXTA, etc.) scraped hourly, risk-scored, geoparsed to metro anchors, and plotted as clickable map pins with inline media
* **Overlay global submarine cables** — static TeleGeography-derived cable routes (opt-in layer)
--- ---
@@ -113,6 +121,20 @@ That's it. `pull` grabs the latest images, `up -d` restarts the containers.
> >
> Podman users should run the equivalent provider command, for example `podman-compose pull` and `podman-compose up -d`, or use `./compose.sh --engine podman pull` and `./compose.sh --engine podman up -d` from a bash-compatible shell. > Podman users should run the equivalent provider command, for example `podman-compose pull` and `podman-compose up -d`, or use `./compose.sh --engine podman pull` and `./compose.sh --engine podman up -d` from a bash-compatible shell.
### Update Integrity
Docker updates are delivered through signed container registries. The legacy ZIP self-updater verifies release archives through this chain, in order:
* `MESH_UPDATE_SHA256` when an operator pins a digest explicitly.
* `backend/data/release_digests.json` for bundled release pins.
* The release `SHA256SUMS.txt` asset on GitHub when a bundled pin is not present.
Release maintainers should run `python backend/scripts/release_helper.py hash <ShadowBroker_vX.Y.Z.zip>` before publishing, then publish `SHA256SUMS.txt` and update `backend/data/release_digests.json` when shipping a ZIP updater target. The updater keeps the operator override path intact instead of failing closed on missing bundled digests, so existing installs do not get stranded by a release-process mistake.
### CSP Hardening
The production frontend ships with a hydration-compatible CSP and a strict nonce-only CSP in `Content-Security-Policy-Report-Only`. Set `SHADOWBROKER_STRICT_CSP=1` only after verifying the exact build hydrates correctly in your deployment. Runtime Google Fonts are not required; the bundled Next font pipeline serves the dashboard font from the app build.
### ⚠️ **Stuck on the old version?** ### ⚠️ **Stuck on the old version?**
**If `git pull` fails or `docker compose up` keeps building from source instead of pulling images**, your clone predates a March 2026 repository migration that rewrote commit history. A normal `git pull` cannot fix this. Run: **If `git pull` fails or `docker compose up` keeps building from source instead of pulling images**, your clone predates a March 2026 repository migration that rewrote commit history. A normal `git pull` cannot fix this. Run:
@@ -174,7 +196,7 @@ ShadowBroker v0.9.7 ships **InfoNet** (decentralized intelligence mesh + Soverei
| Channel | Privacy Status | Details | | Channel | Privacy Status | Details |
|---|---|---| |---|---|---|
| **Meshtastic / APRS** | **PUBLIC** | RF radio transmissions are public and interceptable by design. | | **Meshtastic / APRS** | **PUBLIC** | RF radio transmissions are public and interceptable by design. |
| **InfoNet Gate Chat** | **OBFUSCATED** | Messages are obfuscated with gate personas and canonical payload signing, but NOT end-to-end encrypted. Metadata is not hidden. | | **InfoNet Gate Chat** | **OBFUSCATED** | Messages are obfuscated with gate personas and canonical payload signing, but NOT end-to-end encrypted. Metadata is not hidden despite being designed through Tor and Reticulum (Work in progress). |
| **Dead Drop DMs** | **STRONGEST CURRENT LANE** | Token-based epoch mailbox with SAS word verification. Strongest lane in this build, but not yet confidently private. | | **Dead Drop DMs** | **STRONGEST CURRENT LANE** | Token-based epoch mailbox with SAS word verification. Strongest lane in this build, but not yet confidently private. |
| **Sovereign Shell governance** | **PUBLIC LEDGER** | Petitions, votes, upgrade hashes, and dispute stakes are signed events on a public hashchain. Pseudonymous via gate persona, but governance actions are intentionally observable. | | **Sovereign Shell governance** | **PUBLIC LEDGER** | Petitions, votes, upgrade hashes, and dispute stakes are signed events on a public hashchain. Pseudonymous via gate persona, but governance actions are intentionally observable. |
| **Privacy primitives (RingCT / stealth / DEX)** | **NOT YET WIRED** | Locked Protocol contracts are in place, but the cryptographic scheme has not been chosen. The privacy-core Rust crate is the integration target for a future sprint. | | **Privacy primitives (RingCT / stealth / DEX)** | **NOT YET WIRED** | Locked Protocol contracts are in place, but the cryptographic scheme has not been chosen. The privacy-core Rust crate is the integration target for a future sprint. |
@@ -199,7 +221,7 @@ The first decentralized intelligence communication and governance layer built di
**Communication layer (since v0.9.6):** **Communication layer (since v0.9.6):**
* **InfoNet Experimental Testnet** — A global, obfuscated message relay. Anyone running ShadowBroker can transmit and receive on the InfoNet. Messages pass through a Wormhole relay layer with gate personas, Ed25519 canonical payload signing, and transport obfuscation. * **InfoNet Experimental Testnet** — A global, obfuscated message relay using Tor and Reticulum. Anyone running ShadowBroker can transmit and receive on the InfoNet. Messages pass through a Wormhole relay layer with gate personas, Ed25519 canonical payload signing, and transport obfuscation.
* **Mesh Chat Panel** — Three-tab interface: **INFONET** (gate chat with obfuscated transport), **MESH** (Meshtastic radio integration), **DEAD DROP** (peer-to-peer message exchange with token-based epoch mailboxes — strongest current lane). * **Mesh Chat Panel** — Three-tab interface: **INFONET** (gate chat with obfuscated transport), **MESH** (Meshtastic radio integration), **DEAD DROP** (peer-to-peer message exchange with token-based epoch mailboxes — strongest current lane).
* **Gate Persona System** — Pseudonymous identities with Ed25519 signing keys, prekey bundles, SAS word contact verification, and abuse reporting. * **Gate Persona System** — Pseudonymous identities with Ed25519 signing keys, prekey bundles, SAS word contact verification, and abuse reporting.
* **Mesh Terminal** — Built-in CLI: `send`, `dm`, market commands, gate state inspection. Draggable panel, minimizes to the top bar. Type `help` to see all commands. * **Mesh Terminal** — Built-in CLI: `send`, `dm`, market commands, gate state inspection. Draggable panel, minimizes to the top bar. Type `help` to see all commands.
@@ -219,17 +241,34 @@ The first decentralized intelligence communication and governance layer built di
**Privacy primitive runway (NEW in v0.9.7):** **Privacy primitive runway (NEW in v0.9.7):**
* **Function Keys — Anonymous Citizenship Proof** — A citizen proves "I am an Infonet citizen" without revealing their Infonet identity. 5 of 6 pieces shipped: nullifiers, challenge-response, two-phase commit receipts, enumerated denial codes, batched settlement. Issuance via blind signatures waits on a primitive decision (RSA blind sigs vs BBS+ vs U-Prove vs Idemix). * **Function Keys — Anonymous Credential Scaffolding** — The plumbing is in place for nullifiers, challenge-response, two-phase commit receipts, enumerated denial codes, and batched settlement. Today's challenge-response is an HMAC-based placeholder for integration testing, not a production anonymous or zero-knowledge citizenship proof. True unlinkable issuance still waits on a primitive decision (RSA blind sigs vs BBS+ vs U-Prove vs Idemix).
* **Locked Protocol Contracts** — Stable interfaces in `services/infonet/privacy/contracts.py` for ring signatures, stealth addresses, Pedersen commitments, range proofs, and DEX matching. The `privacy-core` Rust crate is the integration target — no caller of the privacy module needs to know which scheme is active. * **Locked Protocol Contracts** — Stable interfaces in `services/infonet/privacy/contracts.py` for ring signatures, stealth addresses, Pedersen commitments, range proofs, and DEX matching. The `privacy-core` Rust crate is the integration target — no caller of the privacy module needs to know which scheme is active.
* **Sprint 11+ Path** — When the cryptographic scheme is chosen, primitives wire into the locked Protocols without API churn. * **Sprint 11+ Path** — When the cryptographic scheme is chosen, primitives wire into the locked Protocols without API churn.
> **Experimental Testnet — No Privacy Guarantee:** InfoNet messages are obfuscated but NOT end-to-end encrypted. The Mesh network (Meshtastic/APRS) is NOT private — radio transmissions are inherently public. The privacy primitive contracts are scaffolded but not yet wired. Do not send anything sensitive on any channel. Treat all channels as open and public for now. > **Experimental Testnet — No Privacy Guarantee:** InfoNet messages are obfuscated but NOT end-to-end encrypted. The Mesh network (Meshtastic/APRS) is NOT private — radio transmissions are inherently public. The privacy primitive contracts are scaffolded but not yet wired. Do not send anything sensitive on any channel. Treat all channels as open and public for now.
### 🔍 Shodan Device Search (NEW in v0.9.6) ### 🔍 Recon Toolkit & Shodan (Osiris-derived, security-first)
* **Internet Device Search** — Query Shodan directly from ShadowBroker. Search by keyword, CVE, port, or service — results plotted as a live overlay on the map Adapted from the [OSIRIS](https://github.com/simplifaisoul/osiris) recon stack (MIT) with ShadowBrokers proxy model. Attribution: `backend/third_party/osiris/NOTICE.md`.
**Recon Toolkit** (left sidebar — local operator only):
* **IP / DNS / WHOIS** — ip-api.com geolocation, Google DNS-over-HTTPS, RDAP registrant data with optional HTTP security header scoring
* **Certificates & BGP** — crt.sh subdomain discovery, bgpview.io ASN/prefix lookups
* **Threat intel** — AlienVault OTX pulses, Tor exit-node checks, optional per-IP/domain reputation
* **Sanctions** — OpenSanctions `us_ofac_sdn` index (CC-BY); cross-checks on WHOIS entities and IP ISP/org strings
* **CVE / MAC / GitHub / leaks** — MITRE CVE API, MAC vendor lookup, GitHub profile recon, public breach checks
* **IP sweep** — `/api/osint/sweep/scan` geolocates a target /24/32 and proxies Shodan InternetDB host discovery server-side (browser never contacts InternetDB directly)
* **SSRF guard** — Private, loopback, link-local, and metadata hostnames are blocked before any user-supplied fetch
**Entity graph** — Select any map entity to open the Entity Graph panel (`GET /api/entity/expand`). Resolves aircraft, vessels, companies, persons, IPs, and countries into a node/link graph (Wikidata SPARQL + OFAC + in-memory flight/ship store).
**OpenClaw / agent access** — The same recon backends are available on the HMAC command channel (no browser local-operator gate): `osint_lookup` (passive IP/DNS/WHOIS/certs/BGP/sanctions/CVE/MAC/GitHub/leaks/threats), `entity_expand` (relationship graph), and `osint_sweep` (active subnet scan — **full** access tier only). Call `osint_tools` to list supported lookup types. Skill package: `openclaw-skills/shadowbroker/` (`SKILL.md` + `sb_query.py`).
**Shodan overlay** (unchanged):
* **Internet Device Search** — Query Shodan with your own API key; results plotted as a live overlay
* **Configurable Markers** — Shape, color, and size customization for Shodan results * **Configurable Markers** — Shape, color, and size customization for Shodan results
* **Operator-Supplied API** — Uses your own `SHODAN_API_KEY`; results rendered as a local investigative overlay
### 🛩️ Aviation Tracking ### 🛩️ Aviation Tracking
@@ -317,11 +356,12 @@ The first decentralized intelligence communication and governance layer built di
### 📷 Surveillance ### 📷 Surveillance
* **CCTV Mesh** — 11,000+ live traffic cameras from 13 sources across 6 countries: * **CCTV Mesh** — 22,000+ live traffic cameras from 21 ingestors across 10 countries (US, UK, Canada, Australia, Austria, Spain, Singapore, Netherlands when NDW feed is up, plus OSM):
* 🇬🇧 Transport for London JamCams * 🇬🇧 Transport for London JamCams
* 🇺🇸 NYC DOT, Austin TX (TxDOT) * 🇺🇸 NYC DOT, Austin TX (TxDOT)
* 🇺🇸 California (12 Caltrans districts), Washington State (WSDOT), Georgia DOT, Illinois DOT, Michigan DOT * 🇺🇸 California (12 Caltrans districts), Washington State (WSDOT), Georgia DOT, Illinois DOT, Michigan DOT
* 🇪🇸 Spain DGT National (20 cities), Madrid City (357 cameras via KML) * 🇪🇸 Spain DGT National (20 cities), Madrid City (357 cameras via KML)
* 🇦🇹 Austria ASFINAG motorway webcams
* 🇸🇬 Singapore LTA * 🇸🇬 Singapore LTA
* 🌍 Windy Webcams * 🌍 Windy Webcams
* **Feed Rendering** — Automatic detection & rendering of video, MJPEG, HLS, embed, satellite tile, and image feeds * **Feed Rendering** — Automatic detection & rendering of video, MJPEG, HLS, embed, satellite tile, and image feeds
@@ -342,6 +382,12 @@ The first decentralized intelligence communication and governance layer built di
* **Data Center Mapping** — 2,000+ global data centers plotted from a curated dataset. Clustered purple markers with server-rack icons. Click for operator, location, and automatic internet outage cross-referencing by country. * **Data Center Mapping** — 2,000+ global data centers plotted from a curated dataset. Clustered purple markers with server-rack icons. Click for operator, location, and automatic internet outage cross-referencing by country.
* **Military Bases** — Global military installation and missile facility database (NEW) * **Military Bases** — Global military installation and missile facility database (NEW)
* **Power Plants** — 35,000+ global power plants from the WRI database (NEW) * **Power Plants** — 35,000+ global power plants from the WRI database (NEW)
* **Submarine Cables** — Global undersea cable routes from static TeleGeography-derived GeoJSON (`frontend/public/data/submarine-cables.json`). Opt-in line overlay.
* **Malware C2 Layer** — Botnet C2 servers (Feodo Tracker) and recent malware URLs (URLhaus) from abuse.ch, refreshed on the slow tier when the layer is enabled.
* **SCM Supplier Risk** — Tier 1/2 fabs and battery plants (TSMC, Samsung, CATL, etc.) cross-referenced against earthquakes, FIRMS fires, and GDELT conflict proximity. Alerts in the SCM panel; optional map layer.
* **Cyber Threats Feed** — Recent CISA Known Exploited Vulnerabilities (KEV) entries exposed via `/api/cyber-threats` and the layer toggle.
* **Country Risk Index** — Static geopolitical risk scores with USGS earthquake enrichment via `/api/country-risk`.
* **Telegram OSINT** — Public channel web previews (`t.me/s/*`) from configurable war/OSINT feeds. Hourly incremental merge (no redundant re-scrape), keyword risk scoring, Cyrillic/Arabic place aliases, metro-anchor geocoding (separate from news centroids), inline photo/video via `/api/telegram/media` proxy. Layer key: `telegram_osint`.
### 🌐 Additional Layers & Tools ### 🌐 Additional Layers & Tools
@@ -367,7 +413,9 @@ v0.9.7 turns ShadowBroker from a dashboard a human watches into an intelligence
**Capabilities:** **Capabilities:**
* **Full Telemetry Access** — The agent queries all 35+ data layers: flights, ships, satellites, SIGINT, conflict events, earthquakes, fires, wastewater, prediction markets, and more. Fast and slow tier endpoints return enriched data with geographic coordinates, timestamps, and source attribution. * **Full Telemetry Access** — The agent queries all 40+ data layers: flights, ships, satellites, SIGINT, conflict events, earthquakes, fires, wastewater, **Telegram OSINT**, malware/C2, **CISA KEV cyber threats**, SCM overlays, fishing activity (GFW), prediction markets, and more. Fast and slow tier endpoints return enriched data with geographic coordinates, timestamps, and source attribution.
* **Compact Search (preferred over full dumps)** — `get_summary``get_layer_slice` with per-layer `since_layer_versions` (SSE `layer_changed` push tells the agent exactly which layers updated). `search_telemetry` is the Google-style cross-layer keyword index. `search_news` covers news, GDELT, CrowdThreat, LiveUAMap, frontlines, and Telegram posts. `entities_near`, `brief_area`, `find_flights`/`find_ships`/`find_entity`, and `correlate_entity` answer targeted questions without multi-megabyte pulls.
* **Recon Toolkit on the Channel** — `osint_lookup` runs the same SSRF-guarded backends as the Recon panel (`ip`, `dns`, `whois`, `certs`, `bgp`, `sanctions`, `cve`, `mac`, `github`, `leaks`, `threats`, `sweep_init`). `entity_expand` builds Wikidata + OFAC relationship graphs. `osint_sweep` runs Shodan InternetDB subnet discovery (**full** tier). Layer aliases: `telegram`, `malware`/`botnet`, `cyber`/`cisa`/`kev`, `scm`/`suppliers`, `gfw`/`fishing`.
* **AI Intel Pins** — Place color-coded investigation markers directly on the operator's map. 14 pin categories (threat, anomaly, military, maritime, aviation, SIGINT, infrastructure, etc.) with confidence scores, TTL expiry, source URLs, and batch placement up to 100 pins at once. * **AI Intel Pins** — Place color-coded investigation markers directly on the operator's map. 14 pin categories (threat, anomaly, military, maritime, aviation, SIGINT, infrastructure, etc.) with confidence scores, TTL expiry, source URLs, and batch placement up to 100 pins at once.
* **Map Control** — Fly the operator's map view to any coordinate, trigger satellite imagery lookups, and open region dossiers. The agent can direct the operator's attention to specific locations in real time. * **Map Control** — Fly the operator's map view to any coordinate, trigger satellite imagery lookups, and open region dossiers. The agent can direct the operator's attention to specific locations in real time.
* **SAR Ground-Change** — Query SAR anomaly feeds, inspect pin details, manage AOIs, and fly the map to watch areas. The agent can monitor for ground deformation, flood extent, or damage and promote anomalies to pins. * **SAR Ground-Change** — Query SAR anomaly feeds, inspect pin details, manage AOIs, and fly the map to watch areas. The agent can monitor for ground deformation, flood extent, or damage and promote anomalies to pins.
@@ -380,7 +428,7 @@ v0.9.7 turns ShadowBroker from a dashboard a human watches into an intelligence
* **Intelligence Reports** — Generate structured reports with summary stats, top military flights, correlations, earthquake activity, SIGINT counts, and pin inventories. * **Intelligence Reports** — Generate structured reports with summary stats, top military flights, correlations, earthquake activity, SIGINT counts, and pin inventories.
* **Auditable** — Every channel call is logged; the operator can introspect what the agent has done. * **Auditable** — Every channel call is logged; the operator can introspect what the agent has done.
**Connect an agent:** Open the AI Intel panel in the left sidebar, click **Connect Agent**, and copy the HMAC secret. From there, point any compatible agent at the channel — for OpenClaw, import `ShadowBrokerClient` from the OpenClaw skill package; for any other agent, use the same HMAC contract documented above (timestamp + nonce + body digest, tier-gated). The channel is the protocol, not the agent. **Connect an agent:** Open the AI Intel panel in the left sidebar, click **Connect Agent**, and copy the HMAC secret. From there, point any compatible agent at the channel — for OpenClaw, import `ShadowBrokerClient` from `openclaw-skills/shadowbroker/sb_query.py` (see `SKILL.md` for examples); for any other agent, use the same HMAC contract documented above (timestamp + nonce + body digest, tier-gated). Discovery: `GET /api/ai/tools` and `GET /api/ai/capabilities`. The channel is the protocol, not the agent.
### ⏱️ Time Machine — Snapshot Playback (NEW in v0.9.7) ### ⏱️ Time Machine — Snapshot Playback (NEW in v0.9.7)
@@ -529,9 +577,20 @@ ShadowBroker v0.9.7 is composed of three vertically-stacked planes — the **Ope
| [GDELT Project](https://www.gdeltproject.org) | Global conflict events | ~6h | No | | [GDELT Project](https://www.gdeltproject.org) | Global conflict events | ~6h | No |
| [DeepState Map](https://deepstatemap.live) | Ukraine frontline | ~30min | No | | [DeepState Map](https://deepstatemap.live) | Ukraine frontline | ~30min | No |
| [Shodan](https://www.shodan.io) | Internet-connected device search | On-demand | **Yes** | | [Shodan](https://www.shodan.io) | Internet-connected device search | On-demand | **Yes** |
| [OpenSanctions](https://www.opensanctions.org) | OFAC SDN sanctions index (recon + entity graph) | 24h cache | No |
| [abuse.ch Feodo + URLhaus](https://abuse.ch) | Malware C2 / distribution URLs | ~5min (opt-in layer) | No |
| [CISA KEV](https://www.cisa.gov/known-exploited-vulnerabilities-catalog) | Known exploited CVEs | ~5min (opt-in layer) | No |
| [ip-api.com](https://ip-api.com) | IP geolocation (recon, entity graph) | On-demand | No |
| [Google Public DNS](https://dns.google) | DNS-over-HTTPS lookups (recon) | On-demand | No |
| [RDAP.org](https://rdap.org) | Domain registration data (recon) | On-demand | No |
| [crt.sh](https://crt.sh) | Certificate transparency (recon) | On-demand | No |
| [bgpview.io](https://bgpview.io) | BGP/ASN routing (recon) | On-demand | No |
| TeleGeography (static) | Submarine cable routes | Static | No |
| [ASFINAG](https://www.asfinag.at) | Austria motorway webcams | ~10min | No |
| [Amtrak](https://www.amtrak.com) | US train positions | ~60s | No | | [Amtrak](https://www.amtrak.com) | US train positions | ~60s | No |
| [DigiTraffic](https://www.digitraffic.fi) | European rail positions | ~60s | No | | [DigiTraffic](https://www.digitraffic.fi) | European rail positions | ~60s | No |
| [Global Fishing Watch](https://globalfishingwatch.org) | Fishing vessel activity events | ~10min | No | | [Global Fishing Watch](https://globalfishingwatch.org) | Fishing vessel activity events | ~1hr | **Yes** (`GFW_API_TOKEN`) |
| [Telegram public previews](https://t.me/s) | War/OSINT channel posts (`telegram_osint`) | ~1hr | No (optional `TELEGRAM_OSINT_CHANNELS`) |
| Transport for London, NYC DOT, TxDOT | CCTV cameras (UK, US) | ~10min | No | | Transport for London, NYC DOT, TxDOT | CCTV cameras (UK, US) | ~10min | No |
| Caltrans, WSDOT, GDOT, IDOT, MDOT | CCTV cameras (5 US states) | ~10min | No | | Caltrans, WSDOT, GDOT, IDOT, MDOT | CCTV cameras (5 US states) | ~10min | No |
| Spain DGT, Madrid City | CCTV cameras (Spain) | ~10min | No | | Spain DGT, Madrid City | CCTV cameras (Spain) | ~10min | No |
@@ -563,6 +622,8 @@ ShadowBroker v0.9.7 is composed of three vertically-stacked planes — the **Ope
| [OSM Nominatim](https://nominatim.openstreetmap.org) | Place name geocoding (LOCATE bar) | On-demand | No | | [OSM Nominatim](https://nominatim.openstreetmap.org) | Place name geocoding (LOCATE bar) | On-demand | No |
| [CARTO Basemaps](https://carto.com) | Dark map tiles | Continuous | No | | [CARTO Basemaps](https://carto.com) | Dark map tiles | Continuous | No |
**Outbound privacy & audit (#348#366):** Each self-hosted install uses its own backend IP and per-install User-Agent handle. See [docs/OUTBOUND_DATA.md](docs/OUTBOUND_DATA.md) for what contacts third parties, opt-in/env controls, and accepted tradeoffs (CCTV Referer, basemap CDN, LiveUAMap, etc.).
--- ---
## 🚀 Getting Started ## 🚀 Getting Started
@@ -584,9 +645,16 @@ Open `http://localhost:3000` to view the dashboard.
> **Deploying publicly or on a LAN?** No configuration needed for most setups. > **Deploying publicly or on a LAN?** No configuration needed for most setups.
> The frontend proxies all API calls through the Next.js server to `BACKEND_URL`, > The frontend proxies all API calls through the Next.js server to `BACKEND_URL`,
> which defaults to `http://backend:8000` (Docker internal networking). > which defaults to `http://backend:8000` (Docker internal networking).
> Host port `8000` is only published for local API/debug access. If it conflicts > Host port `8000` is only published for local API/debug access (`127.0.0.1:8000`
> with another service, set `BACKEND_PORT=8001` in `.env`; leave `BACKEND_URL` > in `docker-compose.yml`). If it conflicts with another service, set
> as `http://backend:8000` because that is the Docker-internal port. > `BACKEND_PORT=8001` in `.env`; leave `BACKEND_URL` as `http://backend:8000`
> because that is the Docker-internal port.
>
> **Running the backend outside Docker** (`cd backend && python main.py`):
> the dev server binds **loopback only** (`127.0.0.1:8000`) so other machines on
> your LAN cannot hit admin/local-trust routes with an empty `ADMIN_KEY`. Set
> `SHADOWBROKER_DEV_BIND_ALL=true` in `.env` only when you deliberately need
> `0.0.0.0` and use a strong `ADMIN_KEY` for any non-local callers.
> The backend memory cap is controlled by `BACKEND_MEMORY_LIMIT` and defaults > The backend memory cap is controlled by `BACKEND_MEMORY_LIMIT` and defaults
> to `4G`. If Docker reports OOM events, the backend will restart and slow > to `4G`. If Docker reports OOM events, the backend will restart and slow
> layers can look empty until they repopulate. > layers can look empty until they repopulate.
@@ -798,7 +866,7 @@ AIS-catcher decodes VHF radio signals on 161.975 MHz and 162.025 MHz and POSTs d
## 🎛️ Data Layers ## 🎛️ Data Layers
All 37 layers are independently toggleable from the left panel: All 41 layers are independently toggleable from the left panel:
| Layer | Default | Description | | Layer | Default | Description |
|---|---|---| |---|---|---|
@@ -840,6 +908,24 @@ All 37 layers are independently toggleable from the left panel:
| VIIRS Nightlights | ❌ OFF | Night-time light change detection | | VIIRS Nightlights | ❌ OFF | Night-time light change detection |
| Power Plants | ❌ OFF | 35,000+ global power plants | | Power Plants | ❌ OFF | 35,000+ global power plants |
| Shodan Overlay | ❌ OFF | Internet device search results | | Shodan Overlay | ❌ OFF | Internet device search results |
| Road Freight Trends | ❌ OFF | Sentinel-2 truck-motion trends on major highways (Analyze Here) |
| Submarine Cables | ❌ OFF | Global undersea cable routes (static GeoJSON) |
| Malware C2 | ❌ OFF | abuse.ch Feodo + URLhaus threat points |
| SCM Suppliers | ❌ OFF | Tier 1/2 supply-chain risk markers + panel alerts |
| Cyber Threats | ❌ OFF | Recent CISA KEV entries (stats in slow-tier payload) |
| Telegram OSINT | ✅ ON | Public war/OSINT Telegram channels — hourly scrape, geoparsed pins |
| SAR | ✅ ON | Synthetic aperture radar catalog + anomaly alerts |
**Recon & entity tools** (not map layers — left sidebar / selection):
| Tool | Dashboard access | OpenClaw command | Description |
|---|---|---|---|
| Recon Toolkit | Local operator (`/api/osint/*`) | `osint_lookup`, `osint_sweep`† | IP, DNS, WHOIS, certs, BGP, sanctions, CVE, MAC, GitHub, leaks, threats, subnet sweep |
| Entity Graph | Local operator (`/api/entity/expand`) | `entity_expand` | Wikidata + OFAC + live-store relationship graph |
| SCM Risk panel | Local operator (`/api/scm-suppliers`) | `get_layer_slice(["scm_suppliers"])` | Supplier threat rollup + map markers |
| Tool discovery | — | `osint_tools` | Lists recon lookup types and entity-expand schemas |
† `osint_sweep` (active InternetDB scan) requires `OPENCLAW_ACCESS_TIER=full`.
--- ---
@@ -863,6 +949,7 @@ The platform is optimized for handling massive real-time datasets:
``` ```
Shadowbroker/ Shadowbroker/
├── openclaw-skills/shadowbroker/ # OpenClaw skill — SKILL.md, sb_query.py client, alerts/monitor helpers
├── backend/ ├── backend/
│ ├── main.py # FastAPI app, middleware, API routes (~4,000 lines) │ ├── main.py # FastAPI app, middleware, API routes (~4,000 lines)
│ ├── cctv.db # SQLite CCTV camera database (auto-generated) │ ├── cctv.db # SQLite CCTV camera database (auto-generated)
@@ -872,7 +959,18 @@ Shadowbroker/
│ │ ├── data_fetcher.py # Core scheduler — orchestrates all data sources │ │ ├── data_fetcher.py # Core scheduler — orchestrates all data sources
│ │ ├── ais_stream.py # AIS WebSocket client (25K+ vessels) │ │ ├── ais_stream.py # AIS WebSocket client (25K+ vessels)
│ │ ├── carrier_tracker.py # OSINT carrier position estimator (GDELT news scraping) │ │ ├── carrier_tracker.py # OSINT carrier position estimator (GDELT news scraping)
│ │ ├── cctv_pipeline.py # 13-source CCTV camera ingestion pipeline │ │ ├── cctv_pipeline.py # 14-source CCTV camera ingestion pipeline
│ │ ├── ssrf_guard.py # SSRF validation for operator recon fetches
│ │ ├── sanctions/ofac.py # OpenSanctions OFAC SDN index
│ │ ├── osint/lookups.py # Server-side recon lookups (Osiris port)
│ │ ├── osint/openclaw_recon.py # OpenClaw dispatch for recon + entity_expand
│ │ ├── osint_intel/resolve.py # Entity graph resolver (Wikidata + OFAC)
│ │ ├── scm/suppliers.py # Supply-chain risk overlay
│ │ ├── intel_feeds/ # Country risk index helpers
│ │ ├── fetchers/malware.py # abuse.ch Feodo + URLhaus
│ │ ├── fetchers/cyber_status.py # CISA KEV feed
│ │ ├── fetchers/telegram_osint.py # Public Telegram channel scrape + geoparse
│ │ ├── third_party/osiris/ # MIT attribution for Osiris-derived code
│ │ ├── geopolitics.py # GDELT + Ukraine frontline + air alerts │ │ ├── geopolitics.py # GDELT + Ukraine frontline + air alerts
│ │ ├── region_dossier.py # Right-click country/city intelligence │ │ ├── region_dossier.py # Right-click country/city intelligence
│ │ ├── radio_intercept.py # Police scanner feeds + OpenMHZ │ │ ├── radio_intercept.py # Police scanner feeds + OpenMHZ
@@ -910,7 +1008,14 @@ Shadowbroker/
│ │ ├── mesh_reputation.py # Node reputation scoring │ │ ├── mesh_reputation.py # Node reputation scoring
│ │ ├── mesh_oracle.py # Oracle consensus protocol │ │ ├── mesh_oracle.py # Oracle consensus protocol
│ │ └── mesh_secure_storage.py # Secure credential storage │ │ └── mesh_secure_storage.py # Secure credential storage
│ ├── routers/
│ │ ├── osint.py # /api/osint/* recon routes (local operator)
│ │ ├── entity_graph.py # /api/entity/expand
│ │ ├── scm.py # /api/scm-suppliers
│ │ └── intel_feeds.py # /api/malware, /api/cyber-threats, /api/telegram-feed, /api/country-risk
├── frontend/ ├── frontend/
│ ├── public/data/
│ │ └── submarine-cables.json # Static undersea cable GeoJSON
│ ├── src/ │ ├── src/
│ │ ├── app/ │ │ ├── app/
│ │ │ └── page.tsx # Main dashboard — state, polling, layout │ │ │ └── page.tsx # Main dashboard — state, polling, layout
@@ -919,7 +1024,12 @@ Shadowbroker/
│ │ ├── MeshChat.tsx # InfoNet / Mesh / Dead Drop chat panel │ │ ├── MeshChat.tsx # InfoNet / Mesh / Dead Drop chat panel
│ │ ├── MeshTerminal.tsx # Draggable CLI terminal │ │ ├── MeshTerminal.tsx # Draggable CLI terminal
│ │ ├── NewsFeed.tsx # SIGINT feed + entity detail panels │ │ ├── NewsFeed.tsx # SIGINT feed + entity detail panels
│ │ ├── WorldviewLeftPanel.tsx # Data layer toggles (35+ layers) │ │ ├── WorldviewLeftPanel.tsx # Data layer toggles (40+ layers)
│ │ ├── ShodanPanel.tsx # Shodan device search overlay
│ │ ├── ReconPanel.tsx # Server-side OSINT recon toolkit
│ │ ├── ScmPanel.tsx # Supply-chain risk command panel
│ │ ├── EntityGraphPanel.tsx # Entity graph on map selection
│ │ ├── MaplibreViewer/popups/TelegramOsintPopup.tsx # Threat-intercept styled Telegram pin popups
│ │ ├── WorldviewRightPanel.tsx # Search + filter sidebar │ │ ├── WorldviewRightPanel.tsx # Search + filter sidebar
│ │ ├── AdvancedFilterModal.tsx # Airport/country/owner filtering │ │ ├── AdvancedFilterModal.tsx # Airport/country/owner filtering
│ │ ├── MapLegend.tsx # Dynamic legend with all icons │ │ ├── MapLegend.tsx # Dynamic legend with all icons
@@ -956,6 +1066,9 @@ MESH_SAR_EARTHDATA_TOKEN= # NASA Earthdata token (paired wit
MESH_SAR_COPERNICUS_USER= # Copernicus Data Space user (SAR Mode B — EGMS / EMS) MESH_SAR_COPERNICUS_USER= # Copernicus Data Space user (SAR Mode B — EGMS / EMS)
MESH_SAR_COPERNICUS_TOKEN= # Copernicus token (paired with user above) MESH_SAR_COPERNICUS_TOKEN= # Copernicus token (paired with user above)
OPENCLAW_ACCESS_TIER=restricted # OpenClaw agent tier: "restricted" (read-only) or "full" OPENCLAW_ACCESS_TIER=restricted # OpenClaw agent tier: "restricted" (read-only) or "full"
GFW_API_TOKEN=your_gfw_token # Global Fishing Watch — fishing_activity layer (Settings → Maritime)
TELEGRAM_OSINT_ENABLED=true # Telegram OSINT layer (default on)
TELEGRAM_OSINT_CHANNELS=osintdefender,... # Comma-separated public channel slugs (see .env.example)
# Private-lane privacy-core pinning (required when Arti or RNS is enabled) # Private-lane privacy-core pinning (required when Arti or RNS is enabled)
PRIVACY_CORE_MIN_VERSION=0.1.0 PRIVACY_CORE_MIN_VERSION=0.1.0
+75 -11
View File
@@ -11,6 +11,22 @@ AIS_API_KEY= # https://aisstream.io/ — free tier WebSocket key
# ── Optional ─────────────────────────────────────────────────── # ── Optional ───────────────────────────────────────────────────
# AISHub REST fallback. Used when stream.aisstream.io is unreachable
# (e.g. their cert expires or server goes offline). Free tier requires
# registration at https://www.aishub.net/api. Poll cadence defaults to
# 20 min to stay courteous; tunable via AISHUB_POLL_INTERVAL_MINUTES.
# AISHUB_USERNAME=
# AISHUB_POLL_INTERVAL_MINUTES=20
# `python main.py` (uvicorn reload) binds 127.0.0.1:8000 by default so LAN clients
# cannot reach a dev server with empty ADMIN_KEY (#375). Set true only when you
# intentionally need 0.0.0.0 and understand the local-trust implications.
# SHADOWBROKER_DEV_BIND_ALL=false
#
# Thread pool for GDELT, LiveUAMap, CCTV ingest, and slow-tier refresh batches.
# Keeps heavy jobs from starving fast flight/ship workers (default 2).
# SHADOWBROKER_HEAVY_FETCH_WORKERS=2
# Override allowed CORS origins (comma-separated). Defaults to localhost + LAN auto-detect. # Override allowed CORS origins (comma-separated). Defaults to localhost + LAN auto-detect.
# CORS_ORIGINS=http://192.168.1.50:3000,https://my-domain.com # CORS_ORIGINS=http://192.168.1.50:3000,https://my-domain.com
@@ -24,14 +40,24 @@ AIS_API_KEY= # https://aisstream.io/ — free tier WebSocket key
# Requires MESH_DEBUG_MODE=true; do not enable this for ordinary use. # Requires MESH_DEBUG_MODE=true; do not enable this for ordinary use.
# ALLOW_INSECURE_ADMIN=false # ALLOW_INSECURE_ADMIN=false
# Default outbound User-Agent for all third-party HTTP fetchers. # Per-install operator handle. Round 7a: outbound third-party API calls send
# Project-generic by default — does NOT include any personal contact info or # this handle as the User-Agent (e.g. operator-7f3a92), not a shared app name,
# operator-specific identifier. Override only if you run a public relay and # so upstreams rate-limit one install instead of blocking every user.
# want upstreams to be able to reach you (e.g. Nominatim/OSM usage policy). #
# SHADOWBROKER_USER_AGENT=ShadowBroker-OSINT/0.9 (contact: ops@example.com) # Default empty -> a stable pseudonymous handle (e.g. "operator-7f3a92") is
# auto-generated on first run and persisted to backend/data/operator_handle.json.
# Operators who want a meaningful handle (real name, org, GitHub login) can
# set it here. Special characters are sanitized to dashes.
# OPERATOR_HANDLE=
# User-Agent for Nominatim geocoding requests (per OSM usage policy). # Full User-Agent override (replaces the operator handle entirely). Rare;
# NOMINATIM_USER_AGENT=ShadowBroker/1.0 # most installs should use OPERATOR_HANDLE only.
# SHADOWBROKER_USER_AGENT=
# Nominatim-specific User-Agent override (OSM usage policy). Leave unset to
# use the per-install handle (default) — set only if you have a registered
# Nominatim relay identity.
# NOMINATIM_USER_AGENT=
# ── Third-party fetcher opt-ins ──────────────────────────────── # ── Third-party fetcher opt-ins ────────────────────────────────
# These data sources phone home to politically/commercially sensitive # These data sources phone home to politically/commercially sensitive
@@ -45,20 +71,48 @@ AIS_API_KEY= # https://aisstream.io/ — free tier WebSocket key
# FIMI_ENABLED=false # FIMI_ENABLED=false
# #
# Polymarket + Kalshi — US political/election prediction markets. # Polymarket + Kalshi — US political/election prediction markets.
# Default off; enable from Global Threat Intercept (MKT toggle) or set true here.
# PREDICTION_MARKETS_ENABLED=false # PREDICTION_MARKETS_ENABLED=false
# When enabled, polls use a jittered schedule (not the fixed 5-minute slow tier):
# PREDICTION_MARKETS_INTERVAL_MINUTES=7
# PREDICTION_MARKETS_SCHEDULER_JITTER_S=240
# PREDICTION_MARKETS_INITIAL_DELAY_MAX_S=180
# PREDICTION_MARKETS_PRE_FETCH_JITTER_S=90
# PREDICTION_MARKETS_PROVIDER_GAP_JITTER_S=45
# MESH_POLYMARKET_PAGE_DELAY_JITTER_S=0.08
# MESH_KALSHI_PAGE_DELAY_JITTER_S=0.2
# #
# Finnhub fallback / yfinance — financial market data. # Finnhub fallback / yfinance — financial market data.
# Set FINNHUB_API_KEY to enable Finnhub, or set FINANCIAL_ENABLED=true to allow # Set FINNHUB_API_KEY to enable Finnhub, or set FINANCIAL_ENABLED=true to allow
# the unauthenticated yfinance fallback to call Yahoo Finance. # the unauthenticated yfinance fallback to call Yahoo Finance.
# FINANCIAL_ENABLED=false # FINANCIAL_ENABLED=false
# #
# NUFORC UAP sightings — huggingface.co dataset download. # NUFORC UAP map layer — live scrape from nuforc.org (rolling window, default 60 days).
# Refreshed weekly (Mon 12:00 UTC); cache reused for up to 7 days between runs.
# NUFORC_RECENT_DAYS=60
# NUFORC_CACHE_TTL_HOURS=168
# On Windows, live scrape uses Python requests by default; optional:
# SHADOWBROKER_ENABLE_WINDOWS_CURL_FALLBACK=true
# NUFORC enrichment index (HF dataset) is separate — opt-in only:
# NUFORC_ENABLED=false # NUFORC_ENABLED=false
# #
# News RSS aggregator — defaults ON. Set to "false" to disable all # News RSS aggregator — defaults ON. Set to "false" to disable all
# configured news feeds (kill switch for the news layer). # configured news feeds (kill switch for the news layer).
# NEWS_ENABLED=true # NEWS_ENABLED=true
# Global Fishing Watch — fishing vessel activity events (Fishing Activity map layer).
# Free API token from https://globalfishingwatch.org/our-apis/tokens
# Without this the fishing_activity layer stays empty.
# GFW_API_TOKEN=
# Optional tuning — GFW can return 40k+ global events; defaults cap fetch for map paint.
# GFW_EVENTS_PAGE_SIZE=500
# GFW_EVENTS_MAX_PAGES=10
# GFW_EVENTS_LOOKBACK_DAYS=7
# GFW_EVENTS_TIMEOUT_S=90
# Windy Webcams global CCTV layer — free key from https://api.windy.com/webcams/docs
# WINDY_API_KEY=
# LTA Singapore traffic cameras — leave blank to skip this data source. # LTA Singapore traffic cameras — leave blank to skip this data source.
# LTA_ACCOUNT_KEY= # LTA_ACCOUNT_KEY=
@@ -66,6 +120,12 @@ AIS_API_KEY= # https://aisstream.io/ — free tier WebSocket key
# Free MAP_KEY from https://firms.modaps.eosdis.nasa.gov/map/#d:24hrs;@0.0,0.0,3.0z # Free MAP_KEY from https://firms.modaps.eosdis.nasa.gov/map/#d:24hrs;@0.0,0.0,3.0z
# FIRMS_MAP_KEY= # FIRMS_MAP_KEY=
# Ukraine frontline mirror (GitHub). Default follows cyterat/deepstate-map-data@main.
# Pin an immutable commit SHA so ingest cannot silently change if main is force-pushed (#362).
# Example (verify on GitHub before use): main @ b479954e94696bc5622c7818fd20a64a699f4fe8
# DEEPSTATE_MIRROR_COMMIT=b479954e94696bc5622c7818fd20a64a699f4fe8
# DEEPSTATE_MIRROR_REPO=cyterat/deepstate-map-data
# Ukraine air raid alerts from alerts.in.ua — free token from https://alerts.in.ua/ # Ukraine air raid alerts from alerts.in.ua — free token from https://alerts.in.ua/
# ALERTS_IN_UA_TOKEN= # ALERTS_IN_UA_TOKEN=
@@ -95,12 +155,16 @@ AIS_API_KEY= # https://aisstream.io/ — free tier WebSocket key
# can identify per-install traffic instead of aggregated "ShadowBroker" hits. # can identify per-install traffic instead of aggregated "ShadowBroker" hits.
# Leave blank to send a generic UA. If you set MESHTASTIC_OPERATOR_CALLSIGN, # Leave blank to send a generic UA. If you set MESHTASTIC_OPERATOR_CALLSIGN,
# it is included in outbound headers to meshtastic.org by default so they # it is included in outbound headers to meshtastic.org by default so they
# can rate-limit per-operator. Set MESHTASTIC_SEND_CALLSIGN_HEADER=false to # can rate-limit per-operator. Callsign is NOT sent upstream unless you opt in.
# suppress the callsign while still using it locally (e.g. for APRS).
# MESHTASTIC_OPERATOR_CALLSIGN= # MESHTASTIC_OPERATOR_CALLSIGN=
# MESHTASTIC_SEND_CALLSIGN_HEADER=true # MESHTASTIC_SEND_CALLSIGN_HEADER=false
# MESH_MQTT_PSK= # hex-encoded, empty = default LongFast key # MESH_MQTT_PSK= # hex-encoded, empty = default LongFast key
# LiveUAMap Playwright scraper (#348). Linux/macOS: on by default when Global
# Incidents layer is active. Windows: off until the operator enables Global
# Incidents in the UI (consent dialog) or sets SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=true.
# SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=false forces off on all platforms.
# ── Mesh / Reticulum (RNS) ───────────────────────────────────── # ── Mesh / Reticulum (RNS) ─────────────────────────────────────
# Full-node / participant-node posture for public Infonet sync. # Full-node / participant-node posture for public Infonet sync.
# MESH_NODE_MODE=participant # participant | relay | perimeter # MESH_NODE_MODE=participant # participant | relay | perimeter
+1 -1
View File
@@ -45,7 +45,7 @@ COPY uv.lock /workspace/uv.lock
COPY backend/pyproject.toml /workspace/backend/pyproject.toml COPY backend/pyproject.toml /workspace/backend/pyproject.toml
# Install Python dependencies using the lockfile # Install Python dependencies using the lockfile
RUN cd /workspace/backend && uv sync --frozen --no-dev \ RUN cd /workspace/backend && uv sync --frozen --no-dev --extra road-corridor \
&& playwright install --with-deps chromium && playwright install --with-deps chromium
# Copy backend source code # Copy backend source code
+120 -39
View File
@@ -45,6 +45,7 @@ from services.mesh.mesh_compatibility import (
from services.mesh.mesh_crypto import ( from services.mesh.mesh_crypto import (
_derive_peer_key, _derive_peer_key,
normalize_peer_url, normalize_peer_url,
resolve_peer_key_for_url,
verify_signature, verify_signature,
verify_node_binding, verify_node_binding,
parse_public_key_algo, parse_public_key_algo,
@@ -112,8 +113,14 @@ def _scoped_admin_tokens() -> dict[str, list[str]]:
return normalized return normalized
def _request_scope_path(request: Request) -> str:
"""Return the ASGI request-line path, not the Host-derived URL path."""
scope = getattr(request, "scope", {}) or {}
return str(scope.get("path") or "")
def _required_scope_for_request(request: Request) -> str: def _required_scope_for_request(request: Request) -> str:
path = str(request.url.path or "") path = _request_scope_path(request)
if path.startswith("/api/wormhole/gate/"): if path.startswith("/api/wormhole/gate/"):
return "gate" return "gate"
if path.startswith("/api/wormhole/dm/"): if path.startswith("/api/wormhole/dm/"):
@@ -245,15 +252,90 @@ def _docker_bridge_local_operator_enabled() -> bool:
} }
# Issue #250 (tg12): the previous implementation returned True for any IP
# in the entire 172.16.0.0/12 range. Anyone with `docker run` access on
# the same daemon could spin up a container that automatically passed
# local-operator auth. The fix narrows trust to ONLY connections whose
# source IP matches the configured frontend container's hostname.
#
# Docker DNS resolves both the compose service name (``frontend``) and
# the explicit ``container_name`` (``shadowbroker-frontend``) to the
# frontend container's bridge IP. We forward-resolve both, cache the
# result for 30s, and only trust connections from those exact IPs.
#
# Operators on shared Docker hosts get the benefit of the narrower
# surface. Operators on single-user installs see no behavior change —
# their frontend container still resolves and is still trusted.
_DOCKER_BRIDGE_TRUST_CACHE: dict = {"ips": frozenset(), "expires": 0.0}
_DOCKER_BRIDGE_TRUST_TTL = 30.0
def _trusted_bridge_frontend_hostnames() -> list[str]:
"""Container hostnames whose IPs we treat as local-operator on the bridge.
Default covers both Docker Compose service name (``frontend``) and the
explicit ``container_name`` from the shipped docker-compose.yml
(``shadowbroker-frontend``). Operators with non-default names can
override via the ``SHADOWBROKER_TRUSTED_FRONTEND_HOSTS`` env var
(comma-separated, no spaces).
"""
raw = str(
os.environ.get(
"SHADOWBROKER_TRUSTED_FRONTEND_HOSTS",
"frontend,shadowbroker-frontend",
)
).strip()
return [h.strip() for h in raw.split(",") if h.strip()]
def _resolve_trusted_bridge_ips() -> frozenset[str]:
"""Resolve trusted frontend hostnames to a set of IPs, with caching.
Cached for 30s so we don't hit DNS on every request. The cache is
process-local — frontend container IP rotations during a backend's
lifetime will be picked up within 30s.
Returns frozenset() if Docker DNS can't resolve any of the configured
hostnames (fail-closed — when in doubt, refuse to trust the bridge).
"""
import socket
import time as _time
now = _time.time()
cache = _DOCKER_BRIDGE_TRUST_CACHE
if cache["expires"] > now:
return cache["ips"]
ips: set[str] = set()
for hostname in _trusted_bridge_frontend_hostnames():
try:
_, _, addrs = socket.gethostbyname_ex(hostname)
except (OSError, socket.gaierror):
continue
for addr in addrs:
ips.add(addr)
resolved = frozenset(ips)
cache["ips"] = resolved
cache["expires"] = now + _DOCKER_BRIDGE_TRUST_TTL
return resolved
def _is_docker_bridge_host(host: str) -> bool: def _is_docker_bridge_host(host: str) -> bool:
"""Return True only when the source IP matches our trusted frontend
container hostname(s).
Previously trusted any 172.16.0.0/12 IP unconditionally. See the
block comment above for the security rationale.
"""
try: try:
ip = ipaddress.ip_address(host) ip = ipaddress.ip_address(host)
except ValueError: except ValueError:
return False return False
# Docker Desktop and the default compose bridge normally sit inside # Public IPs are never our frontend container — skip DNS work for them.
# 172.16.0.0/12. Keep this narrower than "any private IP" so a user who if not ip.is_private:
# intentionally binds the backend to LAN does not silently trust LAN clients. return False
return ip in ipaddress.ip_network("172.16.0.0/12") return host in _resolve_trusted_bridge_ips()
def _is_trusted_local_runtime_host(host: str) -> bool: def _is_trusted_local_runtime_host(host: str) -> bool:
@@ -367,7 +449,7 @@ async def _verify_openclaw_hmac(request: Request) -> bool:
# Compute expected signature: HMAC-SHA256(secret, METHOD|path|ts|nonce|body_digest) # Compute expected signature: HMAC-SHA256(secret, METHOD|path|ts|nonce|body_digest)
method = str(request.method or "").upper() method = str(request.method or "").upper()
path = str(request.url.path or "") path = _request_scope_path(request)
message = f"{method}|{path}|{ts_str}|{nonce}|{body_digest}" message = f"{method}|{path}|{ts_str}|{nonce}|{body_digest}"
expected = hmac.new( expected = hmac.new(
secret.encode("utf-8"), secret.encode("utf-8"),
@@ -439,33 +521,32 @@ _KNOWN_COMPROMISED_PEER_PUSH_SECRET_SHA256 = (
def _validate_admin_startup() -> None: def _validate_admin_startup() -> None:
admin_key = _current_admin_key() admin_key = _current_admin_key()
if not admin_key or len(admin_key) < 32: if not admin_key:
import secrets logger.warning(
"ADMIN_KEY is not set. Local-operator/admin endpoints will reject "
"remote callers until ADMIN_KEY is configured."
)
return
reason = "not set" if not admin_key else f"too short ({len(admin_key)} chars, minimum 32)" if len(admin_key) < 32:
new_key = secrets.token_hex(32) # 64-char hex string reason = f"too short ({len(admin_key)} chars, minimum 32)"
try: try:
from routers.ai_intel import _write_env_value debug_mode = bool(getattr(get_settings(), "MESH_DEBUG_MODE", False))
except Exception:
_write_env_value("ADMIN_KEY", new_key) debug_mode = False
os.environ["ADMIN_KEY"] = new_key if debug_mode:
logger.info(
"ADMIN_KEY was %s — auto-generated a strong 64-character key and "
"saved it to .env. Admin/mesh endpoints are now secured.",
reason,
)
# Clear settings cache so the rest of startup picks up the new key
try:
get_settings.cache_clear()
except Exception:
pass
except Exception as exc:
logger.warning( logger.warning(
"ADMIN_KEY is %s and could not auto-generate: %s. " "ADMIN_KEY is %s. Debug mode is enabled, so startup will continue, "
"Admin/mesh endpoints may be unavailable.", "but production deployments must use a 32+ character key.",
reason, reason,
exc,
) )
return
logger.error(
"ADMIN_KEY is %s. Refusing to start because auto-generating a backend-only "
"replacement would desynchronize the frontend and backend containers.",
reason,
)
raise SystemExit(1)
def _validate_insecure_admin_startup() -> None: def _validate_insecure_admin_startup() -> None:
@@ -668,8 +749,7 @@ def _is_debug_test_request(request: Request) -> bool:
if not _debug_mode_enabled(): if not _debug_mode_enabled():
return False return False
client_host = (request.client.host or "").lower() if request.client else "" client_host = (request.client.host or "").lower() if request.client else ""
url_host = (request.url.hostname or "").lower() if request.url else "" return client_host == "test"
return client_host == "test" or url_host == "test"
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -1321,18 +1401,19 @@ def _peer_hmac_url_from_request(request: Request) -> str:
header_url = normalize_peer_url(str(request.headers.get("x-peer-url", "") or "")) header_url = normalize_peer_url(str(request.headers.get("x-peer-url", "") or ""))
if header_url: if header_url:
return header_url return header_url
if not request.url: return ""
return ""
base_url = f"{request.url.scheme}://{request.url.netloc}".rstrip("/")
return normalize_peer_url(base_url)
def _verify_peer_push_hmac(request: Request, body_bytes: bytes) -> bool: def _verify_peer_push_hmac(request: Request, body_bytes: bytes) -> bool:
"""Verify HMAC-SHA256 peer authentication on push requests.""" """Verify HMAC-SHA256 peer authentication on push requests.
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip()
if not secret:
return False
Issue #256: ``resolve_peer_key_for_url`` looks up a per-peer secret
in ``MESH_PEER_SECRETS`` first, then falls back to the global
``MESH_PEER_PUSH_SECRET``. When a peer URL is listed in the per-peer
map, only the listed secret is accepted for it — the global secret
is ignored, so any peer that knows only the global secret cannot
forge a request claiming to be that peer.
"""
provided = str(request.headers.get("x-peer-hmac", "") or "").strip() provided = str(request.headers.get("x-peer-hmac", "") or "").strip()
if not provided: if not provided:
return False return False
@@ -1341,7 +1422,7 @@ def _verify_peer_push_hmac(request: Request, body_bytes: bytes) -> bool:
allowed_peers = set(authenticated_push_peer_urls()) allowed_peers = set(authenticated_push_peer_urls())
if not peer_url or peer_url not in allowed_peers: if not peer_url or peer_url not in allowed_peers:
return False return False
peer_key = _derive_peer_key(secret, peer_url) peer_key = resolve_peer_key_for_url(peer_url)
if not peer_key: if not peer_key:
return False return False
+2 -2
View File
@@ -7,7 +7,7 @@
}, },
{ {
"name": "BBC", "name": "BBC",
"url": "http://feeds.bbci.co.uk/news/world/rss.xml", "url": "https://feeds.bbci.co.uk/news/world/rss.xml",
"weight": 3 "weight": 3
}, },
{ {
@@ -47,7 +47,7 @@
}, },
{ {
"name": "Xinhua", "name": "Xinhua",
"url": "http://www.news.cn/english/rss/worldrss.xml", "url": "https://www.news.cn/english/rss/worldrss.xml",
"weight": 2 "weight": 2
}, },
{ {
+120
View File
@@ -0,0 +1,120 @@
{
"_meta": {
"as_of": "2026-03-09",
"source": "USNI News Fleet & Marine Tracker",
"source_url": "https://news.usni.org/2026/03/09/usni-news-fleet-and-marine-tracker-march-9-2026",
"note": "One-shot bootstrap for first-run carrier positions. Once carrier_cache.json exists in the runtime data volume, this seed file is never read again. All subsequent updates come from GDELT (and any future sources) and are written to carrier_cache.json. A year from now, your runtime cache reflects whatever your install has observed since first launch — not these snapshot positions."
},
"carriers": {
"CVN-68": {
"lat": 47.5535,
"lng": -122.6400,
"heading": 90,
"desc": "Bremerton, WA (Maintenance)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-76": {
"lat": 47.5580,
"lng": -122.6360,
"heading": 90,
"desc": "Bremerton, WA (Decommissioning)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-69": {
"lat": 36.9465,
"lng": -76.3265,
"heading": 0,
"desc": "Norfolk, VA (Post-deployment maintenance)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-78": {
"lat": 18.0,
"lng": 39.5,
"heading": 0,
"desc": "Red Sea — Operation Epic Fury (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-74": {
"lat": 36.98,
"lng": -76.43,
"heading": 0,
"desc": "Newport News, VA (RCOH refueling overhaul)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-75": {
"lat": 36.0,
"lng": 15.0,
"heading": 0,
"desc": "Mediterranean Sea deployment (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-77": {
"lat": 36.5,
"lng": -74.0,
"heading": 0,
"desc": "Atlantic — Pre-deployment workups (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-70": {
"lat": 32.6840,
"lng": -117.1290,
"heading": 180,
"desc": "San Diego, CA (Homeport)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-71": {
"lat": 32.6885,
"lng": -117.1280,
"heading": 180,
"desc": "San Diego, CA (Maintenance)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-72": {
"lat": 20.0,
"lng": 64.0,
"heading": 0,
"desc": "Arabian Sea — Operation Epic Fury (USNI Mar 9)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
},
"CVN-73": {
"lat": 35.2830,
"lng": 139.6700,
"heading": 180,
"desc": "Yokosuka, Japan (Forward deployed)",
"source": "USNI News Fleet & Marine Tracker (seed, as of 2026-03-09)",
"source_url": "https://news.usni.org/category/fleet-tracker",
"position_source_at": "2026-03-09T00:00:00Z",
"position_confidence": "seed"
}
}
}
+3
View File
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:72b69418aa860a0d92ccae398a08722bc85e64a992b5515dd7bf9ae9f79f2fd1
size 107194128
+55
View File
@@ -0,0 +1,55 @@
{
"_comment": [
"Baked-in SHA-256 digests for known Shadowbroker release archives.",
"",
"Issue #231: the self-updater previously skipped integrity verification",
"entirely whenever the MESH_UPDATE_SHA256 env var was unset (which is the",
"default — nothing in the install docs tells operators to set it). That",
"made the auto-update a supply-chain RCE on any compromise of the GitHub",
"release pipeline.",
"",
"The fix uses a multi-source verification chain mirroring the Tor bundle",
"digest approach in #201:",
"",
" 1. MESH_UPDATE_SHA256 env var (operator override, preserved)",
" 2. SHA256SUMS.txt asset published alongside each release (primary —",
" the maintainer's release process already publishes this)",
" 3. This baked-in digest list (second line of defense for releases",
" missing a SHA256SUMS asset, or when the asset can't be fetched)",
" 4. HTTPS-only fallback with a loud warning (preserves auto-update",
" flow during transient outages so users don't get stuck)",
"",
"Mismatch from a source that DID respond is fatal — the update is",
"refused and the existing install keeps running. Only the 'no source",
"reachable at all' case falls back to HTTPS-only.",
"",
"Format: each entry is keyed by release tag and maps asset filenames",
"to their canonical SHA-256 digest (hex, lowercase). The updater",
"compares the locally-computed digest of the downloaded asset against",
"the value here.",
"",
"When the maintainer ships a new release, add its digests here BEFORE",
"removing the old ones so operators on the old code still validate",
"against the previous entries during the transition."
],
"v0.9.79": {
"ShadowBroker_v0.9.79.zip": "f6877c1d66614525315ea82636ce9f7b41178332c4dbf90d27431a1ea1d9cd47",
"ShadowBroker_0.9.79_x64-setup.exe": "f7b676ada45cac7da05868b0a353678c9ee700e3abcf456a7c0c038c36da446f",
"ShadowBroker_0.9.79_x64_en-US.msi": "e0713c3cdda184cfbea750bfac0d62a35678fec00847e6476f2cac8e7e42046e"
},
"v0.9.8": {
"ShadowBroker_v0.9.8.zip": "183bb5cd62b9b9349d95df5ef7696cb6ca810ab4b991fa9dab6f898af4c7a175",
"ShadowBroker_0.9.8_x64-setup.exe": "94a0309862e9c81c92cdcbfea8eec9dbb97eef19ded82b26217b397defbc810c",
"ShadowBroker_0.9.8_x64_en-US.msi": "fe22f9d51e4360d74c18a7250c2fbb9ed4fa4c7a884b3ac0d04a21115466386b"
},
"v0.9.81": {
"ShadowBroker_v0.9.81.zip": "f81f454bdc88e9a32c351df38212b8cfa624704d65764b971bb091eef62259c6",
"ShadowBroker_0.9.81_x64-setup.exe": "25e9a95d0d8ce959a7d08fe8e7406772ae24b596652793e81d1de5d02510a5a6",
"ShadowBroker_0.9.81_x64_en-US.msi": "34e655fc0c0f195ee4ac978f228a4b2b9d5565253b8771aca9ef4693409e9e70"
},
"v0.9.82": {
"ShadowBroker_v0.9.82.zip": "202ab043465741dcc06de57c19ec8314904332f8e818b891d7174655719d084c",
"ShadowBroker_0.9.82_x64-setup.exe": "0eb9f2bda02ab691b39687641abc97e6bfb507b42f48de21970ad7dfb4ea15fc",
"ShadowBroker_0.9.82_x64_en-US.msi": "ced08f930171c0c08009a958cc30b0171a09f982230fc217c6808c2ed7ab2e30"
}
}
+105 -1
View File
@@ -1,4 +1,108 @@
"""Rate-limit key function for slowapi.
Issue #287 (tg12): the previous implementation used
``slowapi.util.get_remote_address`` which only ever returns
``request.client.host``. Behind the bundled Next.js proxy (or any other
reverse proxy), every connected operator's ``client.host`` is the
frontend container's bridge IP. ``@limiter.limit("120/minute")`` then
collapses into one shared bucket for everybody on the same backend —
one heavy tab can starve every other operator on the node.
This module replaces that key function with one that:
* Reads ``X-Forwarded-For`` ONLY when the immediate peer is a trusted
frontend container (same allowlist used by the Docker bridge
local-operator trust path — see ``backend/auth.py`` ``#250``).
* Picks the FIRST entry in the XFF chain. That's the client end of
the proxy chain, which is the operator we want to bucket on.
* Falls back to ``request.client.host`` for any peer that isn't on
the trusted-frontend allowlist. Direct hits, unrelated containers,
and unknown hosts are bucketed exactly like before — there is no
way for an untrusted caller to spoof XFF and steal another
operator's rate-limit bucket.
Single-operator nodes are unaffected: the frontend resolves to one IP,
that IP is on the trust list, the XFF header is read, and you get one
bucket per operator (i.e. you).
"""
from __future__ import annotations
from typing import Any
from slowapi import Limiter from slowapi import Limiter
from slowapi.util import get_remote_address from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address)
def _client_host(request: Any) -> str:
"""Return the immediate peer's IP, normalised to a lowercase string."""
client = getattr(request, "client", None)
if client is None:
return ""
host = getattr(client, "host", "") or ""
return host.lower()
def _first_forwarded_for(value: str) -> str:
"""Return the first non-empty entry from an ``X-Forwarded-For`` header.
RFC 7239 / de-facto XFF format is ``client, proxy1, proxy2, …``. The
client end is what we want to bucket on. Empty parts (which appear
in some malformed headers) are skipped so we don't end up keying on
an empty string.
"""
for raw in value.split(","):
candidate = raw.strip()
if candidate:
return candidate.lower()
return ""
def _is_trusted_frontend_peer(host: str) -> bool:
"""True iff ``host`` is one of the resolved trusted-frontend IPs.
Imported lazily so this module stays usable in unit tests that
don't want to pull the whole auth module into scope.
"""
if not host:
return False
try:
from auth import _resolve_trusted_bridge_ips
except Exception: # pragma: no cover - defensive
return False
try:
trusted_ips = _resolve_trusted_bridge_ips()
except Exception: # pragma: no cover - defensive
return False
return host in trusted_ips
def shadowbroker_rate_limit_key(request: Any) -> str:
"""slowapi key_func that is proxy-aware on trusted frontend peers only.
Behaviour matrix:
* Direct loopback / unknown peer → ``request.client.host``
(identical to slowapi's default ``get_remote_address``).
* Peer is a trusted frontend container AND ``X-Forwarded-For`` is
present → first XFF entry (the actual operator).
* Peer is a trusted frontend container but no XFF → fall back to
``request.client.host`` (the bridge IP). One shared bucket for
everyone in that case, same as before — but you only get there
if the trusted frontend forgot to forward XFF, which it won't.
"""
peer = _client_host(request)
if _is_trusted_frontend_peer(peer):
headers = getattr(request, "headers", None)
if headers is not None:
xff = headers.get("x-forwarded-for") or headers.get("X-Forwarded-For")
if xff:
first = _first_forwarded_for(xff)
if first:
return first
# Untrusted peer (or trusted peer without XFF): match the original
# get_remote_address behaviour byte-for-byte.
return get_remote_address(request)
limiter = Limiter(key_func=shadowbroker_rate_limit_key)
+638 -555
View File
File diff suppressed because it is too large Load Diff
+18 -7
View File
@@ -7,16 +7,15 @@ py-modules = []
[project] [project]
name = "backend" name = "backend"
version = "0.9.79" version = "0.9.82"
requires-python = ">=3.10" requires-python = ">=3.10"
dependencies = [ dependencies = [
"apscheduler==3.10.3", "apscheduler==3.10.3",
"beautifulsoup4>=4.9.0", "beautifulsoup4>=4.9.0",
"cachetools==5.5.2", "cachetools==5.5.2",
"cloudscraper==1.2.71", "cryptography>=46.0.7",
"cryptography>=41.0.0",
"defusedxml>=0.7.1", "defusedxml>=0.7.1",
"fastapi==0.115.12", "fastapi==0.136.3",
"feedparser==6.0.10", "feedparser==6.0.10",
"httpx==0.28.1", "httpx==0.28.1",
"playwright==1.59.0", "playwright==1.59.0",
@@ -25,7 +24,7 @@ dependencies = [
"pydantic-settings==2.8.1", "pydantic-settings==2.8.1",
"pystac-client==0.8.6", "pystac-client==0.8.6",
"python-dotenv==1.2.2", "python-dotenv==1.2.2",
"requests==2.31.0", "requests==2.33.0",
"PySocks==1.7.1", "PySocks==1.7.1",
"reverse-geocoder==1.5.1", "reverse-geocoder==1.5.1",
"sgp4==2.25", "sgp4==2.25",
@@ -34,17 +33,29 @@ dependencies = [
"paho-mqtt>=1.6.0,<2.0.0", "paho-mqtt>=1.6.0,<2.0.0",
"PyNaCl>=1.5.0", "PyNaCl>=1.5.0",
"slowapi==0.1.9", "slowapi==0.1.9",
"starlette==1.0.1",
"vaderSentiment>=3.3.0", "vaderSentiment>=3.3.0",
"uvicorn==0.34.0", "uvicorn==0.34.0",
"yfinance==1.3.0", "yfinance==1.3.0",
] ]
[project.optional-dependencies]
road-corridor = [
"geopandas>=1.0.0",
"imageio>=2.34.0",
"osmnx>=2.0.0",
"rasterio>=1.4.0",
"scikit-learn>=1.5.0",
"sentinelhub>=3.10.0",
"shapely>=2.0.0",
]
[dependency-groups] [dependency-groups]
dev = ["pytest>=8.3.4", "pytest-asyncio==0.25.0", "ruff>=0.9.0", "black>=24.0.0"] dev = ["pytest>=9.0.3", "pytest-asyncio>=1.4.0", "ruff>=0.9.0", "black>=24.0.0"]
[tool.ruff.lint] [tool.ruff.lint]
# The current backend carries historical style debt in large legacy modules. # The current backend carries historical style debt in large legacy modules.
# Keep CI focused on actionable correctness checks for the v0.9.79 release. # Keep CI focused on actionable correctness checks for the v0.9.82 release.
ignore = ["E401", "E402", "E701", "E731", "E741", "F401", "F402", "F541", "F811", "F841"] ignore = ["E401", "E402", "E701", "E731", "E741", "F401", "F402", "F541", "F811", "F841"]
[tool.black] [tool.black]
+52 -2
View File
@@ -82,9 +82,40 @@ async def api_get_keys_meta(request: Request):
return get_env_path_info() return get_env_path_info()
@router.get("/api/settings/news-feeds") @router.get(
"/api/settings/operator-handle",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("60/minute")
async def api_get_operator_handle(request: Request):
"""Round 7a: return the per-install operator handle so the frontend
can include it in browser-direct third-party API calls (Wikipedia /
Wikidata via lib/wikimediaClient). The handle is auto-generated on
first use; operators can override it via the OPERATOR_HANDLE setting
or the env var of the same name.
Gated on local-operator: legitimate browser usage goes through the
Next.js proxy which auto-attaches the admin key; remote scanners get
403. The handle itself isn't a secret (it's sent to every third-party
API the operator touches), but admin-gating it matches the rest of
the settings endpoints and follows least-privilege.
"""
from services.network_utils import get_operator_handle
return {"handle": get_operator_handle()}
@router.get(
"/api/settings/news-feeds",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("30/minute") @limiter.limit("30/minute")
async def api_get_news_feeds(request: Request): async def api_get_news_feeds(request: Request):
"""Issue #252 (tg12): the curated feed inventory is configuration
state, not a public data feed. Gated on local-operator so the
Tauri shell, the Docker bridge frontend, and any caller with an
admin key all see the full list; anonymous LAN/internet callers
can no longer enumerate operator source URLs.
"""
from services.news_feed_config import get_feeds from services.news_feed_config import get_feeds
return get_feeds() return get_feeds()
@@ -118,9 +149,18 @@ async def api_reset_news_feeds(request: Request):
@router.get("/api/settings/node") @router.get("/api/settings/node")
@limiter.limit("30/minute") @limiter.limit("30/minute")
async def api_get_node_settings(request: Request): async def api_get_node_settings(request: Request):
"""Issue #243 (tg12): node_mode and node_enabled are operational
posture. Anonymous callers receive an empty stub; authenticated
callers (local-operator or admin/scoped token) see the full
state. See the canonical handler in backend/main.py for the full
rationale.
"""
import asyncio import asyncio
from auth import _scoped_view_authenticated
from services.node_settings import read_node_settings from services.node_settings import read_node_settings
data = await asyncio.to_thread(read_node_settings) data = await asyncio.to_thread(read_node_settings)
if not _scoped_view_authenticated(request, "node"):
return {}
return { return {
**data, **data,
"node_mode": _current_node_mode(), "node_mode": _current_node_mode(),
@@ -210,9 +250,19 @@ async def api_set_meshtastic_mqtt_settings(request: Request, body: MeshtasticMqt
return _meshtastic_runtime_snapshot() return _meshtastic_runtime_snapshot()
@router.get("/api/settings/timemachine") @router.get(
"/api/settings/timemachine",
dependencies=[Depends(require_local_operator)],
)
@limiter.limit("30/minute") @limiter.limit("30/minute")
async def api_get_timemachine_settings(request: Request): async def api_get_timemachine_settings(request: Request):
"""Issue #253 (tg12): archival-capture posture is operationally
sensitive — it tells a remote caller whether this deployment is
retaining replayable historical surveillance data. Gated on
local-operator so the Tauri shell and Docker bridge frontend
still see the toggle state, but anonymous LAN/internet callers
can no longer fingerprint Time Machine state.
"""
import asyncio import asyncio
from services.node_settings import read_node_settings from services.node_settings import read_node_settings
data = await asyncio.to_thread(read_node_settings) data = await asyncio.to_thread(read_node_settings)
+276 -45
View File
@@ -18,6 +18,12 @@ from auth import require_local_operator, require_openclaw_or_local
from limiter import limiter from limiter import limiter
from services.fetchers._store import latest_data as _latest_data from services.fetchers._store import latest_data as _latest_data
def _ai_intel_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("ai-intel")
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
router = APIRouter() router = APIRouter()
@@ -447,7 +453,7 @@ async def ai_satellite_images(
"https://planetarycomputer.microsoft.com/api/stac/v1/search", "https://planetarycomputer.microsoft.com/api/stac/v1/search",
json=search_payload, json=search_payload,
timeout=10, timeout=10,
headers={"User-Agent": "ShadowBroker-OSINT/1.0 (ai-intel)"}, headers={"User-Agent": _ai_intel_user_agent()},
) )
resp.raise_for_status() resp.raise_for_status()
features = resp.json().get("features", []) features = resp.json().get("features", [])
@@ -1584,7 +1590,7 @@ async def agent_tool_manifest(request: Request):
return { return {
"ok": True, "ok": True,
"version": "0.9.79", "version": "0.9.82",
"access_tier": access_tier, "access_tier": access_tier,
"available_commands": available_commands, "available_commands": available_commands,
"transport": { "transport": {
@@ -1699,11 +1705,12 @@ async def agent_tool_manifest(request: Request):
{ {
"name": "search_news", "name": "search_news",
"type": "read", "type": "read",
"description": "Search news and event layers server-side by keyword. Includes news, GDELT, CrowdThreat, and major incident/event feeds without pulling the full slow telemetry feed.", "description": "Search news and event layers server-side by keyword. Includes news, GDELT, CrowdThreat, Telegram OSINT, and major incident/event feeds without pulling the full slow telemetry feed.",
"parameters": { "parameters": {
"query": {"type": "string", "required": True, "description": "Keyword or phrase to search for"}, "query": {"type": "string", "required": True, "description": "Keyword or phrase to search for"},
"limit": {"type": "integer", "required": False, "description": "Max results (default 10, max 50)"}, "limit": {"type": "integer", "required": False, "description": "Max results (default 10, max 50)"},
"include_gdelt": {"type": "boolean", "required": False, "description": "Include GDELT matches (default true)"}, "include_gdelt": {"type": "boolean", "required": False, "description": "Include GDELT matches (default true)"},
"include_telegram": {"type": "boolean", "required": False, "description": "Include Telegram OSINT channel posts (default true)"},
"compact": {"type": "boolean", "required": False, "description": "If true, strips empty/None fields from each result and rounds lat/lng to 3 decimals. Response includes format: 'compressed_v1'."}, "compact": {"type": "boolean", "required": False, "description": "If true, strips empty/None fields from each result and rounds lat/lng to 3 decimals. Response includes format: 'compressed_v1'."},
}, },
"returns": "{results: [{source_layer, title, summary, source, link, lat, lng, risk_score}], version: int, truncated: bool}", "returns": "{results: [{source_layer, title, summary, source, link, lat, lng, risk_score}], version: int, truncated: bool}",
@@ -1737,6 +1744,55 @@ async def agent_tool_manifest(request: Request):
}, },
"returns": "{center, radius_km, nearby, topic_news, context_layers}", "returns": "{center, radius_km, nearby, topic_news, context_layers}",
}, },
{
"name": "osint_lookup",
"type": "read",
"description": "Run a passive OSINT recon lookup server-side (same backends as the Recon panel). SSRF-guarded outbound proxies for IP geolocation, DNS, WHOIS, certs, BGP/ASN, sanctions, CVE, MAC vendor, GitHub profile, breach checks, and threat feeds.",
"parameters": {
"tool": {"type": "string", "required": True, "description": "Lookup type: ip, dns, whois, certs, threats, bgp, sanctions, cve, mac, github, leaks, sweep_init"},
"ip": {"type": "string", "required": False, "description": "IPv4/IPv6 for ip or sweep_init"},
"domain": {"type": "string", "required": False, "description": "Domain for dns, whois, certs"},
"query": {"type": "string", "required": False, "description": "Generic query (BGP ASN, sanctions name, optional threats filter)"},
"cve": {"type": "string", "required": False, "description": "CVE id for cve lookup"},
"mac": {"type": "string", "required": False, "description": "MAC address for mac lookup"},
"username": {"type": "string", "required": False, "description": "GitHub username"},
"email": {"type": "string", "required": False, "description": "Email for breach/leak lookup"},
"schema": {"type": "string", "required": False, "description": "Sanctions schema filter: Person, Organization, Company, Vessel, Airplane, LegalEntity"},
"limit": {"type": "integer", "required": False, "description": "Sanctions result cap (default 25, max 100)"},
"cidr": {"type": "integer", "required": False, "description": "CIDR mask for sweep_init (24-32, default 24)"},
},
"returns": "Tool-specific JSON (geo, DNS records, WHOIS, sanctions hits, CVE details, etc.)",
},
{
"name": "osint_tools",
"type": "read",
"description": "List available OSINT recon tools, entity-expand types, and sanctions schemas.",
"parameters": {},
"returns": "{tools: [...], entity_types: [...], sanctions_schemas: [...], notes: {...}}",
},
{
"name": "entity_expand",
"type": "read",
"description": "Expand an entity relationship graph around an aircraft, vessel, IP, company, person, or country. Same backend as /api/entity/expand.",
"parameters": {
"type": {"type": "string", "required": True, "description": "Entity type: aircraft, vessel, company, person, ip, country"},
"id": {"type": "string", "required": True, "description": "Entity identifier (tail number, MMSI, IP, company name, etc.)"},
"registration": {"type": "string", "required": False, "description": "Aircraft registration hint"},
"model": {"type": "string", "required": False, "description": "Aircraft model hint"},
"icao24": {"type": "string", "required": False, "description": "ICAO24 hex for aircraft"},
},
"returns": "{nodes: [...], links: [...]}",
},
{
"name": "osint_sweep",
"type": "write",
"description": "Active subnet device discovery via Shodan InternetDB (ports, vulns, hostnames). Requires full OpenClaw access tier. Private/reserved IPs blocked.",
"parameters": {
"ip": {"type": "string", "required": True, "description": "Public IPv4 anchor for the sweep"},
"cidr": {"type": "integer", "required": False, "description": "Subnet size /24-/32 (default 24)"},
},
"returns": "{center, target_ip, cidr, subnet, devices, summary, sweep_time_ms}",
},
{ {
"name": "what_changed", "name": "what_changed",
"type": "read", "type": "read",
@@ -2188,6 +2244,11 @@ async def agent_tool_manifest(request: Request):
"Prefer compact lookups first: search_telemetry, find_flights, find_ships, search_news, entities_near, get_layer_slice. Use get_telemetry/get_slow_telemetry/get_report only when focused commands are insufficient.", "Prefer compact lookups first: search_telemetry, find_flights, find_ships, search_news, entities_near, get_layer_slice. Use get_telemetry/get_slow_telemetry/get_report only when focused commands are insufficient.",
"ShadowBroker does expose UAP sightings, wastewater, and tracked_flights/VIP aircraft when those layers are populated. Verify with get_summary or get_layer_slice before claiming a layer is absent.", "ShadowBroker does expose UAP sightings, wastewater, and tracked_flights/VIP aircraft when those layers are populated. Verify with get_summary or get_layer_slice before claiming a layer is absent.",
"ShadowBroker also exposes fishing_activity, which is the fishing-vessel activity layer backed by Global Fishing Watch data when GFW_API_TOKEN is configured. Do not confuse it with the AIS ships layer.", "ShadowBroker also exposes fishing_activity, which is the fishing-vessel activity layer backed by Global Fishing Watch data when GFW_API_TOKEN is configured. Do not confuse it with the AIS ships layer.",
"telegram_osint, malware_threats, cyber_threats, and scm_suppliers are live map layers. Use get_summary or get_layer_slice(['telegram_osint']) before claiming they are absent. Aliases: telegram, malware/botnet, cyber/cisa/kev, scm/suppliers.",
"search_telemetry and search_news both index Telegram OSINT posts. For malware C2, botnet IPs, CISA KEV CVEs, or semiconductor suppliers, use search_telemetry or get_layer_slice on the matching layer.",
"The Recon toolkit is available via osint_lookup: IP geolocation, DNS, WHOIS, certs, BGP, sanctions, CVE, MAC vendor, GitHub, breach checks, threat feeds. Call osint_tools first to list supported tools.",
"entity_expand builds relationship graphs for aircraft, vessels, IPs, companies, people, and countries — use after resolving an entity from telemetry or osint_lookup.",
"osint_sweep runs active subnet discovery (Shodan InternetDB) and requires full OpenClaw access tier. Use osint_lookup tool=sweep_init for passive geolocation context only.",
"Use search_telemetry as the Google-style entry point whenever the user gives you a person, place, company, topic, owner, nickname, or natural-language phrase and you do not already know the source layer.", "Use search_telemetry as the Google-style entry point whenever the user gives you a person, place, company, topic, owner, nickname, or natural-language phrase and you do not already know the source layer.",
"Example: for 'Where is Jerry Jones yacht?' search 'Jerry Jones' across all telemetry first, identify the ship match, then refine with find_ships or raw layer context only if needed.", "Example: for 'Where is Jerry Jones yacht?' search 'Jerry Jones' across all telemetry first, identify the ship match, then refine with find_ships or raw layer context only if needed.",
"For fuzzy natural-language lookups like 'Patriots jet' or 'Jerry Jones yacht', use search_telemetry first and inspect the ranked candidate list before making a hard claim.", "For fuzzy natural-language lookups like 'Patriots jet' or 'Jerry Jones yacht', use search_telemetry first and inspect the ranked candidate list before making a hard claim.",
@@ -2220,7 +2281,7 @@ async def api_capabilities(request: Request):
access_tier = str(get_settings().OPENCLAW_ACCESS_TIER or "restricted").strip().lower() access_tier = str(get_settings().OPENCLAW_ACCESS_TIER or "restricted").strip().lower()
return { return {
"ok": True, "ok": True,
"version": "0.9.79", "version": "0.9.82",
"auth": { "auth": {
"method": "HMAC-SHA256", "method": "HMAC-SHA256",
"headers": ["X-SB-Timestamp", "X-SB-Nonce", "X-SB-Signature"], "headers": ["X-SB-Timestamp", "X-SB-Nonce", "X-SB-Signature"],
@@ -2348,13 +2409,29 @@ async def api_capabilities(request: Request):
"description": "Universal compact search across telemetry when the entity type or source layer is not obvious.", "description": "Universal compact search across telemetry when the entity type or source layer is not obvious.",
}, },
"search_news": { "search_news": {
"args": {"query": "str", "limit": "int (default 10)", "include_gdelt": "bool (default true)"}, "args": {"query": "str", "limit": "int (default 10)", "include_gdelt": "bool (default true)", "include_telegram": "bool (default true)"},
"description": "Search news and event layers by keyword without pulling the whole slow feed.", "description": "Search news and event layers by keyword without pulling the whole slow feed. Includes Telegram OSINT when include_telegram is true.",
}, },
"entities_near": { "entities_near": {
"args": {"lat": "float", "lng": "float", "radius_km": "float (default 50)", "entity_types": "list[str] (optional)", "limit": "int (default 25)"}, "args": {"lat": "float", "lng": "float", "radius_km": "float (default 50)", "entity_types": "list[str] (optional)", "limit": "int (default 25)"},
"description": "Compact proximity search around a point across selected layers.", "description": "Compact proximity search around a point across selected layers.",
}, },
"osint_lookup": {
"args": {"tool": "str (ip|dns|whois|certs|threats|bgp|sanctions|cve|mac|github|leaks|sweep_init)", "...": "tool-specific params"},
"description": "Passive OSINT recon lookup — same backends as the Recon panel.",
},
"osint_tools": {
"args": {},
"description": "List available recon tools and entity-expand types.",
},
"entity_expand": {
"args": {"type": "str", "id": "str", "registration": "str (optional)", "icao24": "str (optional)"},
"description": "Entity relationship graph expansion.",
},
"osint_sweep": {
"args": {"ip": "str", "cidr": "int (default 24)"},
"description": "Active subnet scan — requires full access tier.",
},
"brief_area": { "brief_area": {
"args": {"lat": "float", "lng": "float", "radius_km": "float (default 50)", "entity_types": "list[str] (optional)", "query": "str (optional)", "limit": "int (default 25)", "context_limit": "int (default 10)"}, "args": {"lat": "float", "lng": "float", "radius_km": "float (default 50)", "entity_types": "list[str] (optional)", "query": "str (optional)", "limit": "int (default 25)", "context_limit": "int (default 10)"},
"description": "One compact area brief: nearby aircraft/ships/entities, optional topic news, and selected context layers.", "description": "One compact area brief: nearby aircraft/ships/entities, optional topic news, and selected context layers.",
@@ -2515,45 +2592,85 @@ async def api_capabilities(request: Request):
# OpenClaw Connection Management (local-operator only — NOT via HMAC) # OpenClaw Connection Management (local-operator only — NOT via HMAC)
# These endpoints manage the HMAC secret itself, so they MUST require # These endpoints manage the HMAC secret itself, so they MUST require
# local operator access to prevent privilege escalation. # local operator access to prevent privilege escalation.
#
# Issue #302 (tg12): pre-fix, GET /api/ai/connect-info had two problems:
#
# 1. ``?reveal=true`` made the full secret travel through every operator
# page-load that opened the Connect modal. Even gated to
# ``require_local_operator``, that put the secret into browser
# history, dev-tools network panels, browser disk caches, HAR
# exports, and screen captures. Every time the modal opened.
#
# 2. The same GET endpoint auto-bootstrapped (generated + persisted)
# the secret on first read. Side effects on a GET are a footgun:
# browser prefetchers, mirror tools, and casual curl-from-history
# would all silently mint+persist a fresh secret. (Gated, but
# still surprising — and noisy in the audit log.)
#
# Resolution:
#
# GET /api/ai/connect-info — always returns the MASKED
# secret. No ?reveal param.
# No auto-bootstrap; if the
# secret is missing,
# ``hmac_secret_set: false``
# tells the frontend to call
# /bootstrap.
#
# POST /api/ai/connect-info/bootstrap — NEW. Generates + persists the
# secret if missing. Idempotent.
# Returns metadata only, never
# the full secret.
#
# POST /api/ai/connect-info/reveal — NEW. Returns the full secret in
# the body with strict
# ``Cache-Control: no-store,
# no-cache, must-revalidate``
# + ``Pragma: no-cache`` so
# it does not land in browser
# caches. POST means it does
# not land in URL history.
#
# POST /api/ai/connect-info/regenerate — keeps existing one-time-reveal
# behavior (regenerate IS a
# deliberate destructive action
# the operator triggered, so
# displaying the new secret
# once is the only path that
# makes the operation useful).
# Same no-store headers added.
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@router.get("/api/ai/connect-info", dependencies=[Depends(require_local_operator)]) # Cache-Control headers that should accompany every response carrying the
@limiter.limit("30/minute") # full HMAC secret. Reused across the reveal + regenerate endpoints so a
async def get_connect_info(request: Request, reveal: bool = False): # future refactor that splits or renames them can't forget the headers.
"""Return connection details for the OpenClaw Connect modal. _NO_STORE_HEADERS = {
"Cache-Control": "no-store, no-cache, must-revalidate, private",
"Pragma": "no-cache",
"Expires": "0",
}
The HMAC secret is masked by default. Pass ?reveal=true to see the full key.
Private keys are NEVER returned. def _mask_hmac_secret(secret: str) -> str:
"""Return a fingerprint-style mask (first6 + bullets + last4) suitable
for display in the UI before the operator clicks Reveal."""
if not secret:
return ""
if len(secret) > 10:
return secret[:6] + "••••••••" + secret[-4:]
return "••••••••"
def _connect_info_metadata(settings) -> dict:
"""Return everything the Connect modal needs EXCEPT the secret itself.
Shared between GET /api/ai/connect-info (where the full secret is
masked) and POST /api/ai/connect-info/bootstrap (where the operator
just generated a secret but we don't return it inline — they have to
call /reveal to see it).
""" """
import os
import secrets
from services.config import get_settings
settings = get_settings()
hmac_secret = str(settings.OPENCLAW_HMAC_SECRET or "").strip()
access_tier = str(settings.OPENCLAW_ACCESS_TIER or "restricted").strip().lower() access_tier = str(settings.OPENCLAW_ACCESS_TIER or "restricted").strip().lower()
# Auto-generate if not set
if not hmac_secret:
hmac_secret = secrets.token_hex(24) # 48 chars
_write_env_value("OPENCLAW_HMAC_SECRET", hmac_secret)
# Clear settings cache so next read picks up the new value
get_settings.cache_clear()
masked = hmac_secret[:6] + "••••••••" + hmac_secret[-4:] if len(hmac_secret) > 10 else "••••••••"
return { return {
"ok": True,
"hmac_secret": hmac_secret if reveal else masked,
"hmac_secret_set": bool(hmac_secret),
"bootstrap_behavior": {
"auto_generates_when_missing": True,
"auto_generated_this_call": not bool(settings.OPENCLAW_HMAC_SECRET or ""),
"notes": [
"If no HMAC secret exists yet, this endpoint bootstraps one and persists it to .env.",
"Regenerating the HMAC secret revokes all existing direct-mode OpenClaw callers at once.",
],
},
"access_tier": access_tier, "access_tier": access_tier,
"trust_model": { "trust_model": {
"remote_http_principal": "holder_of_openclaw_hmac_secret", "remote_http_principal": "holder_of_openclaw_hmac_secret",
@@ -2607,24 +2724,138 @@ async def get_connect_info(request: Request, reveal: bool = False):
} }
@router.post("/api/ai/connect-info/regenerate", dependencies=[Depends(require_local_operator)]) @router.get("/api/ai/connect-info", dependencies=[Depends(require_local_operator)])
@limiter.limit("5/minute") @limiter.limit("30/minute")
async def regenerate_hmac_secret(request: Request): async def get_connect_info(request: Request):
"""Generate a new HMAC secret. Old secret immediately stops working.""" """Return connection details for the OpenClaw Connect modal.
The HMAC secret is always returned as a fingerprint mask
(``first6 + bullets + last4``); the full value is only ever served by
``POST /api/ai/connect-info/reveal`` (see #302). When the secret has
not been bootstrapped yet, ``hmac_secret_set`` is false and the
frontend should call ``POST /api/ai/connect-info/bootstrap``.
Private keys are NEVER returned.
"""
from services.config import get_settings
settings = get_settings()
hmac_secret = str(settings.OPENCLAW_HMAC_SECRET or "").strip()
return {
"ok": True,
"masked_hmac_secret": _mask_hmac_secret(hmac_secret),
"hmac_secret_set": bool(hmac_secret),
"bootstrap_behavior": {
"auto_generates_when_missing": False,
"notes": [
"Call POST /api/ai/connect-info/bootstrap to mint a secret on first use.",
"Call POST /api/ai/connect-info/reveal to see the full secret (no-store).",
"Regenerating the HMAC secret revokes all existing direct-mode OpenClaw callers at once.",
],
},
**_connect_info_metadata(settings),
}
@router.post("/api/ai/connect-info/bootstrap", dependencies=[Depends(require_local_operator)])
@limiter.limit("10/minute")
async def bootstrap_hmac_secret(request: Request):
"""Mint and persist the OpenClaw HMAC secret if it isn't already set.
Idempotent: if a secret already exists, returns ``generated: false``
and leaves the existing secret untouched. Never returns the secret
value in the response body — the operator calls
``POST /api/ai/connect-info/reveal`` to see it.
"""
import secrets import secrets
from services.config import get_settings from services.config import get_settings
settings = get_settings()
existing = str(settings.OPENCLAW_HMAC_SECRET or "").strip()
if existing:
return {
"ok": True,
"generated": False,
"hmac_secret_set": True,
"masked_hmac_secret": _mask_hmac_secret(existing),
"detail": "HMAC secret already configured. Use /reveal to see it.",
}
new_secret = secrets.token_hex(24) # 48 chars new_secret = secrets.token_hex(24) # 48 chars
_write_env_value("OPENCLAW_HMAC_SECRET", new_secret) _write_env_value("OPENCLAW_HMAC_SECRET", new_secret)
get_settings.cache_clear() get_settings.cache_clear()
return { return {
"ok": True, "ok": True,
"hmac_secret": new_secret, "generated": True,
"detail": "HMAC secret regenerated. Update your OpenClaw agent configuration.", "hmac_secret_set": True,
"masked_hmac_secret": _mask_hmac_secret(new_secret),
"detail": "HMAC secret generated. Call /reveal to copy it into your OpenClaw config.",
} }
@router.post("/api/ai/connect-info/reveal", dependencies=[Depends(require_local_operator)])
@limiter.limit("10/minute")
async def reveal_hmac_secret(request: Request):
"""Return the full HMAC secret in the response body.
POST (not GET) so the secret never lands in URL history, access logs,
or browser visit history. Strict ``Cache-Control: no-store`` headers
prevent intermediaries from persisting the response. Returns 404 if
no secret has been bootstrapped — the frontend should call
``POST /api/ai/connect-info/bootstrap`` first.
"""
from services.config import get_settings
settings = get_settings()
hmac_secret = str(settings.OPENCLAW_HMAC_SECRET or "").strip()
if not hmac_secret:
raise HTTPException(
404,
"No HMAC secret configured. Call POST /api/ai/connect-info/bootstrap first.",
)
return JSONResponse(
content={
"ok": True,
"hmac_secret": hmac_secret,
"masked_hmac_secret": _mask_hmac_secret(hmac_secret),
},
headers=_NO_STORE_HEADERS,
)
@router.post("/api/ai/connect-info/regenerate", dependencies=[Depends(require_local_operator)])
@limiter.limit("5/minute")
async def regenerate_hmac_secret(request: Request):
"""Generate a new HMAC secret. Old secret immediately stops working.
Returns the new secret in the response body — this is the only
operation where the full secret travels back through the response,
because regenerating IS a deliberate destructive action the operator
triggered and they need to see the new value once to update their
OpenClaw configuration. Strict ``Cache-Control: no-store`` headers
keep it from being persisted by browser caches, proxies, or HAR
capture tooling.
"""
import secrets
from services.config import get_settings
new_secret = secrets.token_hex(24) # 48 chars
_write_env_value("OPENCLAW_HMAC_SECRET", new_secret)
get_settings.cache_clear()
return JSONResponse(
content={
"ok": True,
"hmac_secret": new_secret,
"masked_hmac_secret": _mask_hmac_secret(new_secret),
"detail": "HMAC secret regenerated. Update your OpenClaw agent configuration.",
},
headers=_NO_STORE_HEADERS,
)
@router.put("/api/ai/connect-info/access-tier", dependencies=[Depends(require_local_operator)]) @router.put("/api/ai/connect-info/access-tier", dependencies=[Depends(require_local_operator)])
@limiter.limit("10/minute") @limiter.limit("10/minute")
async def set_access_tier(request: Request, body: dict): async def set_access_tier(request: Request, body: dict):
+43 -3
View File
@@ -47,6 +47,8 @@ _CCTV_PROXY_ALLOWED_HOSTS = {
"www.tripcheck.com", "www.tripcheck.com",
"infocar.dgt.es", "infocar.dgt.es",
"informo.madrid.es", "informo.madrid.es",
"webcams2.asfinag.at",
"odo.asfinag.at",
"www.windy.com", "www.windy.com",
"imgproxy.windy.com", "imgproxy.windy.com",
"www.lakecountypassage.com", "www.lakecountypassage.com",
@@ -55,6 +57,14 @@ _CCTV_PROXY_ALLOWED_HOSTS = {
"www.nps.gov", "www.nps.gov",
"home.lewiscounty.com", "home.lewiscounty.com",
"www.seattle.gov", "www.seattle.gov",
"511on.ca",
"511.alberta.ca",
"fl511.com",
"www.fl511.com",
"webcams.transport.nsw.gov.au",
"www.livetraffic.com",
"livetraffic.com",
"opendata.ndw.nu",
} }
@@ -120,7 +130,7 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:
read_timeout = 18.0 if "/snapshots/" in path else 12.0 read_timeout = 18.0 if "/snapshots/" in path else 12.0
return _CCTVProxyProfile(name="gdot-snapshot", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, read_timeout), cache_seconds=15, return _CCTVProxyProfile(name="gdot-snapshot", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, read_timeout), cache_seconds=15,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8", headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "http://navigator-c2c.dot.ga.gov/"}) "Referer": "https://navigator-c2c.dot.ga.gov/"})
if host == "511ga.org": if host == "511ga.org":
return _CCTVProxyProfile(name="gdot-511ga-image", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=15, return _CCTVProxyProfile(name="gdot-511ga-image", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=15,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8", headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
@@ -128,7 +138,7 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:
if host.startswith("vss") and host.endswith("dot.ga.gov"): if host.startswith("vss") and host.endswith("dot.ga.gov"):
return _CCTVProxyProfile(name="gdot-hls", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 20.0), cache_seconds=10, return _CCTVProxyProfile(name="gdot-hls", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 20.0), cache_seconds=10,
headers={"Accept": "application/vnd.apple.mpegurl,application/x-mpegURL,video/*,*/*;q=0.8", headers={"Accept": "application/vnd.apple.mpegurl,application/x-mpegURL,video/*,*/*;q=0.8",
"Referer": "http://navigator-c2c.dot.ga.gov/"}) "Referer": "https://navigator-c2c.dot.ga.gov/"})
if host in {"gettingaroundillinois.com", "cctv.travelmidwest.com"}: if host in {"gettingaroundillinois.com", "cctv.travelmidwest.com"}:
return _CCTVProxyProfile(name="illinois-dot", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=30, return _CCTVProxyProfile(name="illinois-dot", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=30,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8"}) headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8"})
@@ -156,16 +166,46 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:
return _CCTVProxyProfile(name="madrid-city", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=30, return _CCTVProxyProfile(name="madrid-city", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=30,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8", headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://informo.madrid.es/"}) "Referer": "https://informo.madrid.es/"})
if host in {"webcams2.asfinag.at", "odo.asfinag.at"}:
return _CCTVProxyProfile(name="asfinag-austria", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 15.0), cache_seconds=60,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://www.asfinag.at/"})
if host in {"www.windy.com", "imgproxy.windy.com"}: if host in {"www.windy.com", "imgproxy.windy.com"}:
return _CCTVProxyProfile(name="windy-webcams", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=60, return _CCTVProxyProfile(name="windy-webcams", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=60,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8", headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://www.windy.com/"}) "Referer": "https://www.windy.com/"})
if host == "511on.ca":
return _CCTVProxyProfile(name="ontario-511", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 15.0), cache_seconds=30,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://511on.ca/"})
if host == "511.alberta.ca":
return _CCTVProxyProfile(name="alberta-511", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 15.0), cache_seconds=30,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://511.alberta.ca/"})
if host in {"fl511.com", "www.fl511.com"}:
return _CCTVProxyProfile(name="florida-511", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 15.0), cache_seconds=30,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://fl511.com/"})
if host == "webcams.transport.nsw.gov.au":
return _CCTVProxyProfile(name="nsw-live-traffic", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=60,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://www.livetraffic.com/"})
if host in {"opendata.ndw.nu", "www.ndw.nu"}:
return _CCTVProxyProfile(name="ndw-netherlands", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 12.0), cache_seconds=120,
headers={"Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
"Referer": "https://www.ndw.nu/"})
return _CCTVProxyProfile(name="generic-cctv", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 8.0), cache_seconds=30, return _CCTVProxyProfile(name="generic-cctv", timeout=(_CCTV_PROXY_CONNECT_TIMEOUT_S, 8.0), cache_seconds=30,
headers={"Accept": "*/*"}) headers={"Accept": "*/*"})
def _cctv_upstream_headers(request: Request, profile: _CCTVProxyProfile) -> dict: def _cctv_upstream_headers(request: Request, profile: _CCTVProxyProfile) -> dict:
headers = {"User-Agent": "Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)", **profile.headers} # Round 7a: per-install operator handle. Mozilla/5.0 prefix retained
# because many CCTV endpoints sniff for a browser-like prefix.
from services.network_utils import outbound_user_agent
headers = {
"User-Agent": f"Mozilla/5.0 (compatible; {outbound_user_agent('cctv-proxy')})",
**profile.headers,
}
range_header = request.headers.get("range") range_header = request.headers.get("range")
if range_header: if range_header:
headers["Range"] = range_header headers["Range"] = range_header
+272 -17
View File
@@ -1,6 +1,7 @@
import asyncio import asyncio
import logging import logging
import math import math
import os
import threading import threading
from typing import Any from typing import Any
from fastapi import APIRouter, Request, Response, Query, Depends from fastapi import APIRouter, Request, Response, Query, Depends
@@ -8,7 +9,7 @@ from fastapi.responses import JSONResponse
from pydantic import BaseModel from pydantic import BaseModel
from limiter import limiter from limiter import limiter
from auth import require_admin, require_local_operator from auth import require_admin, require_local_operator
from services.data_fetcher import get_latest_data, update_all_data from services.data_fetcher import update_all_data
import orjson import orjson
import json as json_mod import json as json_mod
@@ -30,6 +31,14 @@ class LayerUpdate(BaseModel):
layers: dict[str, bool] layers: dict[str, bool]
class LiveUamapOptInUpdate(BaseModel):
opted_in: bool
class PredictionMarketsOptInUpdate(BaseModel):
opted_in: bool
_LAST_VIEWPORT_UPDATE: tuple | None = None _LAST_VIEWPORT_UPDATE: tuple | None = None
_LAST_VIEWPORT_UPDATE_TS = 0.0 _LAST_VIEWPORT_UPDATE_TS = 0.0
_VIEWPORT_UPDATE_LOCK = threading.Lock() _VIEWPORT_UPDATE_LOCK = threading.Lock()
@@ -98,6 +107,88 @@ def _current_etag(prefix: str = "") -> str:
return f"{prefix}v{get_data_version()}-l{get_active_layers_version()}" return f"{prefix}v{get_data_version()}-l{get_active_layers_version()}"
# ── Issue #288: viewport-aware payloads ─────────────────────────────────────
# Heavy, density-driven, time-sensitive layers that benefit from bbox
# filtering. Light reference layers (datacenters, military_bases,
# power_plants, satellites, weather, news, etc.) are intentionally NOT
# in these sets — they ship world-scale even when bounds are supplied so
# panning never reveals an "empty world" of static infrastructure.
#
# When the caller does NOT pass s/w/n/e, none of this runs and the response
# is byte-for-byte identical to the pre-#288 behavior.
_FAST_BBOX_HEAVY_KEYS: tuple[str, ...] = (
"commercial_flights",
"military_flights",
"private_flights",
"private_jets",
"tracked_flights",
"ships",
"cctv",
"uavs",
"liveuamap",
"gps_jamming",
"sigint",
"trains",
)
_SLOW_BBOX_HEAVY_KEYS: tuple[str, ...] = (
"gdelt",
"firms_fires",
"kiwisdr",
"scanners",
"psk_reporter",
)
def _has_full_bbox(s, w, n, e) -> bool:
return None not in (s, w, n, e)
def _bbox_etag_suffix(s, w, n, e) -> str:
"""Quantize bbox to 1° before mixing into the ETag.
The 20% padding inside _bbox_filter already absorbs sub-degree pans;
quantizing here means small mouse drags don't blow the ETag cache
on the client. Full-world bounds collapse to a single suffix.
"""
if not _has_full_bbox(s, w, n, e):
return ""
try:
ss = math.floor(float(s))
ww = math.floor(float(w))
nn = math.ceil(float(n))
ee = math.ceil(float(e))
except (TypeError, ValueError):
return ""
# If the requested window covers basically the whole world, treat it as
# "no bbox" for caching purposes so world-zoomed clients all hit the
# same ETag and benefit from the existing 304 path.
lat_span, lng_span = _bbox_spans(s, w, n, e)
if lng_span >= 300 or lat_span >= 120:
return ""
return f"|bbox={ss},{ww},{nn},{ee}"
def _apply_bbox_to_payload(payload: dict, heavy_keys: tuple[str, ...],
s: float, w: float, n: float, e: float) -> dict:
"""In-place filter the heavy-key collections in *payload* to a viewport.
Items without lat/lng are passed through (so e.g. summary blobs aren't
accidentally dropped). The existing _bbox_filter helper applies a 20%
pad and handles antimeridian crossings.
"""
lat_span, lng_span = _bbox_spans(s, w, n, e)
# World-scale request → skip filtering entirely. Spares the CPU and
# guarantees the response matches the no-params shape.
if lng_span >= 300 or lat_span >= 120:
return payload
for key in heavy_keys:
items = payload.get(key)
if not isinstance(items, list) or not items:
continue
payload[key] = _bbox_filter(items, s, w, n, e)
return payload
def _json_safe(value): def _json_safe(value):
if isinstance(value, float): if isinstance(value, float):
return value if math.isfinite(value) else None return value if math.isfinite(value) else None
@@ -120,6 +211,15 @@ def _sanitize_payload(value):
return value return value
def _live_data_json_bytes(payload: dict) -> bytes:
"""Serialize dashboard payloads with the same defensive orjson options everywhere."""
return orjson.dumps(
_sanitize_payload(payload),
default=str,
option=orjson.OPT_NON_STR_KEYS,
)
def _bbox_filter(items: list, s: float, w: float, n: float, e: float, def _bbox_filter(items: list, s: float, w: float, n: float, e: float,
lat_key: str = "lat", lng_key: str = "lng") -> list: lat_key: str = "lat", lng_key: str = "lng") -> list:
pad_lat = (n - s) * 0.2 pad_lat = (n - s) * 0.2
@@ -304,6 +404,95 @@ async def update_viewport(vp: ViewportUpdate, request: Request): # noqa: ARG001
return {"status": "ok"} return {"status": "ok"}
@router.get("/api/liveuamap/scraper-status", dependencies=[Depends(require_local_operator)])
async def api_liveuamap_scraper_status():
"""Whether LiveUAMap Playwright may run (Windows needs UI opt-in unless env forces)."""
from services.liveuamap_settings import liveuamap_scraper_status
return liveuamap_scraper_status()
@router.post("/api/liveuamap/scraper-opt-in", dependencies=[Depends(require_local_operator)])
@limiter.limit("10/minute")
async def api_liveuamap_scraper_opt_in(body: LiveUamapOptInUpdate, request: Request):
"""Persist operator consent for LiveUAMap scraper (#348)."""
from services.liveuamap_settings import liveuamap_scraper_status, set_liveuamap_ui_opt_in
set_liveuamap_ui_opt_in(body.opted_in)
if body.opted_in:
from services.fetchers._store import is_any_active
if is_any_active("global_incidents"):
threading.Thread(target=_run_liveuamap_refresh, daemon=True).start()
return liveuamap_scraper_status()
def _run_liveuamap_refresh() -> None:
try:
from services.fetchers.geo import update_liveuamap
update_liveuamap()
except Exception as e:
logger.warning("LiveUAMap refresh after opt-in failed: %s", e)
@router.get("/api/prediction-markets/status", dependencies=[Depends(require_local_operator)])
async def api_prediction_markets_status():
"""Whether Polymarket/Kalshi fetches and news market correlation are enabled."""
from services.prediction_markets_settings import prediction_markets_status
return prediction_markets_status()
@router.post("/api/prediction-markets/opt-in", dependencies=[Depends(require_local_operator)])
@limiter.limit("10/minute")
async def api_prediction_markets_opt_in(body: PredictionMarketsOptInUpdate, request: Request):
"""Enable or disable prediction market fetches + intercept story correlation."""
from services.config import get_settings
from services.prediction_markets_settings import (
prediction_markets_status,
set_prediction_markets_ui_opt_in,
)
from routers.ai_intel import _write_env_value
set_prediction_markets_ui_opt_in(body.opted_in)
_write_env_value("PREDICTION_MARKETS_ENABLED", "true" if body.opted_in else "false")
os.environ["PREDICTION_MARKETS_ENABLED"] = "true" if body.opted_in else "false"
get_settings.cache_clear()
if body.opted_in:
threading.Thread(target=_run_prediction_markets_refresh, daemon=True).start()
else:
threading.Thread(target=_run_prediction_markets_disable, daemon=True).start()
return prediction_markets_status()
def _run_prediction_markets_refresh() -> None:
try:
from services.fetchers.prediction_markets import fetch_prediction_markets
from services.fetchers.news import fetch_news
fetch_prediction_markets()
fetch_news()
except Exception as e:
logger.warning("Prediction markets refresh after opt-in failed: %s", e)
def _run_prediction_markets_disable() -> None:
try:
from services.fetchers._store import _data_lock, _mark_fresh, latest_data
from services.fetchers.news import fetch_news
with _data_lock:
latest_data["prediction_markets"] = []
latest_data["trending_markets"] = []
_mark_fresh("prediction_markets")
fetch_news()
except Exception as e:
logger.warning("Prediction markets disable cleanup failed: %s", e)
@router.post("/api/layers", dependencies=[Depends(require_local_operator)]) @router.post("/api/layers", dependencies=[Depends(require_local_operator)])
@limiter.limit("30/minute") @limiter.limit("30/minute")
async def update_layers(update: LayerUpdate, request: Request): async def update_layers(update: LayerUpdate, request: Request):
@@ -313,6 +502,8 @@ async def update_layers(update: LayerUpdate, request: Request):
old_mesh = is_any_active("sigint_meshtastic") old_mesh = is_any_active("sigint_meshtastic")
old_aprs = is_any_active("sigint_aprs") old_aprs = is_any_active("sigint_aprs")
old_viirs = is_any_active("viirs_nightlights") old_viirs = is_any_active("viirs_nightlights")
old_datacenters = is_any_active("datacenters")
old_fishing = is_any_active("fishing_activity")
changed = False changed = False
for key, value in update.layers.items(): for key, value in update.layers.items():
if key in active_layers: if key in active_layers:
@@ -325,6 +516,8 @@ async def update_layers(update: LayerUpdate, request: Request):
new_mesh = is_any_active("sigint_meshtastic") new_mesh = is_any_active("sigint_meshtastic")
new_aprs = is_any_active("sigint_aprs") new_aprs = is_any_active("sigint_aprs")
new_viirs = is_any_active("viirs_nightlights") new_viirs = is_any_active("viirs_nightlights")
new_datacenters = is_any_active("datacenters")
new_fishing = is_any_active("fishing_activity")
if old_ships and not new_ships: if old_ships and not new_ships:
from services.ais_stream import stop_ais_stream from services.ais_stream import stop_ais_stream
stop_ais_stream() stop_ais_stream()
@@ -368,13 +561,33 @@ async def update_layers(update: LayerUpdate, request: Request):
if not old_viirs and new_viirs: if not old_viirs and new_viirs:
_queue_viirs_change_refresh() _queue_viirs_change_refresh()
logger.info("VIIRS change refresh queued (layer enabled)") logger.info("VIIRS change refresh queued (layer enabled)")
if not old_datacenters and new_datacenters:
from services.fetchers.infrastructure import fetch_datacenters
fetch_datacenters()
logger.info("Datacenters loaded (layer enabled)")
if not old_fishing and new_fishing:
from services.fetchers.geo import fetch_fishing_activity
fetch_fishing_activity()
logger.info("Fishing activity refresh queued (layer enabled)")
return {"status": "ok"} return {"status": "ok"}
@router.get("/api/live-data") @router.get("/api/live-data")
@limiter.limit("120/minute") @limiter.limit("120/minute")
async def live_data(request: Request): async def live_data(request: Request):
return get_latest_data() etag = _current_etag(prefix="live|full|")
if request.headers.get("if-none-match") == etag:
return Response(status_code=304, headers={"ETag": etag, "Cache-Control": "no-cache"})
from services.fetchers._store import get_latest_data_deepcopy_snapshot
payload = get_latest_data_deepcopy_snapshot()
return Response(
content=_live_data_json_bytes(payload),
media_type="application/json",
headers={"ETag": etag, "Cache-Control": "no-cache"},
)
@router.get("/api/bootstrap/critical") @router.get("/api/bootstrap/critical")
@@ -469,7 +682,7 @@ async def bootstrap_critical(request: Request):
"bootstrap_payload": True, "bootstrap_payload": True,
} }
return Response( return Response(
content=orjson.dumps(_sanitize_payload(payload), default=str, option=orjson.OPT_NON_STR_KEYS), content=_live_data_json_bytes(payload),
media_type="application/json", media_type="application/json",
headers={"ETag": etag, "Cache-Control": "no-cache"}, headers={"ETag": etag, "Cache-Control": "no-cache"},
) )
@@ -479,13 +692,14 @@ async def bootstrap_critical(request: Request):
@limiter.limit("120/minute") @limiter.limit("120/minute")
async def live_data_fast( async def live_data_fast(
request: Request, request: Request,
s: float = Query(None, description="South bound (ignored)", ge=-90, le=90), s: float = Query(None, description="South bound — when all four bounds are supplied, heavy/dense layers (vessels, aircraft, sigint, CCTV, …) are filtered to this viewport with 20% padding. Static reference layers (satellites, etc.) always ship world-scale.", ge=-90, le=90),
w: float = Query(None, description="West bound (ignored)", ge=-180, le=180), w: float = Query(None, description="West bound (see s)", ge=-180, le=180),
n: float = Query(None, description="North bound (ignored)", ge=-90, le=90), n: float = Query(None, description="North bound (see s)", ge=-90, le=90),
e: float = Query(None, description="East bound (ignored)", ge=-180, le=180), e: float = Query(None, description="East bound (see s)", ge=-180, le=180),
initial: bool = Query(False, description="Return a capped startup payload for first paint"), initial: bool = Query(False, description="Return a capped startup payload for first paint"),
): ):
etag = _current_etag(prefix="fast|initial|" if initial else "fast|full|") bbox_suffix = _bbox_etag_suffix(s, w, n, e)
etag = _current_etag(prefix=("fast|initial|" if initial else "fast|full|") + bbox_suffix.lstrip("|") + ("|" if bbox_suffix else ""))
if request.headers.get("if-none-match") == etag: if request.headers.get("if-none-match") == etag:
return Response(status_code=304, headers={"ETag": etag, "Cache-Control": "no-cache"}) return Response(status_code=304, headers={"ETag": etag, "Cache-Control": "no-cache"})
from services.fetchers._store import (active_layers, get_latest_data_subset_refs, get_source_timestamps_snapshot) from services.fetchers._store import (active_layers, get_latest_data_subset_refs, get_source_timestamps_snapshot)
@@ -525,20 +739,29 @@ async def live_data_fast(
payload = _cap_fast_startup_payload(payload) payload = _cap_fast_startup_payload(payload)
else: else:
payload = _cap_fast_dashboard_payload(payload) payload = _cap_fast_dashboard_payload(payload)
return Response(content=orjson.dumps(_sanitize_payload(payload)), media_type="application/json", # Issue #288: bbox filter heavy/dense layers only when all four bounds
headers={"ETag": etag, "Cache-Control": "no-cache"}) # are supplied. Without bounds, behaviour is byte-for-byte identical
# to the pre-#288 implementation.
if _has_full_bbox(s, w, n, e):
payload = _apply_bbox_to_payload(payload, _FAST_BBOX_HEAVY_KEYS, s, w, n, e)
return Response(
content=_live_data_json_bytes(payload),
media_type="application/json",
headers={"ETag": etag, "Cache-Control": "no-cache"},
)
@router.get("/api/live-data/slow") @router.get("/api/live-data/slow")
@limiter.limit("60/minute") @limiter.limit("60/minute")
async def live_data_slow( async def live_data_slow(
request: Request, request: Request,
s: float = Query(None, description="South bound (ignored)", ge=-90, le=90), s: float = Query(None, description="South bound — when all four bounds are supplied, heavy/dense layers (gdelt, firms_fires, kiwisdr, scanners, psk_reporter) are filtered to this viewport with 20% padding. Static reference layers (datacenters, military bases, power plants, weather, news, …) always ship world-scale.", ge=-90, le=90),
w: float = Query(None, description="West bound (ignored)", ge=-180, le=180), w: float = Query(None, description="West bound (see s)", ge=-180, le=180),
n: float = Query(None, description="North bound (ignored)", ge=-90, le=90), n: float = Query(None, description="North bound (see s)", ge=-90, le=90),
e: float = Query(None, description="East bound (ignored)", ge=-180, le=180), e: float = Query(None, description="East bound (see s)", ge=-180, le=180),
): ):
etag = _current_etag(prefix="slow|full|") bbox_suffix = _bbox_etag_suffix(s, w, n, e)
etag = _current_etag(prefix="slow|full|" + bbox_suffix.lstrip("|") + ("|" if bbox_suffix else ""))
if request.headers.get("if-none-match") == etag: if request.headers.get("if-none-match") == etag:
return Response(status_code=304, headers={"ETag": etag, "Cache-Control": "no-cache"}) return Response(status_code=304, headers={"ETag": etag, "Cache-Control": "no-cache"})
from services.fetchers._store import (active_layers, get_latest_data_subset_refs, get_source_timestamps_snapshot) from services.fetchers._store import (active_layers, get_latest_data_subset_refs, get_source_timestamps_snapshot)
@@ -549,7 +772,8 @@ async def live_data_slow(
"firms_fires", "datacenters", "military_bases", "power_plants", "viirs_change_nodes", "firms_fires", "datacenters", "military_bases", "power_plants", "viirs_change_nodes",
"scanners", "weather_alerts", "ukraine_alerts", "air_quality", "volcanoes", "scanners", "weather_alerts", "ukraine_alerts", "air_quality", "volcanoes",
"fishing_activity", "psk_reporter", "correlations", "uap_sightings", "wastewater", "fishing_activity", "psk_reporter", "correlations", "uap_sightings", "wastewater",
"crowdthreat", "threat_level", "trending_markets", "crowdthreat", "threat_level", "trending_markets", "road_corridor_trends",
"malware_threats", "cyber_threats", "scm_suppliers", "telegram_osint",
) )
freshness = get_source_timestamps_snapshot() freshness = get_source_timestamps_snapshot()
payload = { payload = {
@@ -590,10 +814,41 @@ async def live_data_slow(
"uap_sightings": (d.get("uap_sightings") or []) if active_layers.get("uap_sightings", True) else [], "uap_sightings": (d.get("uap_sightings") or []) if active_layers.get("uap_sightings", True) else [],
"wastewater": (d.get("wastewater") or []) if active_layers.get("wastewater", True) else [], "wastewater": (d.get("wastewater") or []) if active_layers.get("wastewater", True) else [],
"crowdthreat": (d.get("crowdthreat") or []) if active_layers.get("crowdthreat", True) else [], "crowdthreat": (d.get("crowdthreat") or []) if active_layers.get("crowdthreat", True) else [],
"road_corridor_trends": (
d.get("road_corridor_trends") or {"updated_at": None, "corridors": []}
)
if active_layers.get("road_corridor_trends", False)
else {"updated_at": None, "corridors": []},
"malware_threats": (
d.get("malware_threats") or {"threats": [], "total": 0}
)
if active_layers.get("malware_c2", False)
else {"threats": [], "total": 0},
"cyber_threats": (
d.get("cyber_threats") or {"threats": [], "stats": {}}
)
if active_layers.get("cyber_threats", False)
else {"threats": [], "stats": {}},
"scm_suppliers": (
d.get("scm_suppliers") or {"suppliers": [], "total": 0, "critical_count": 0}
)
if active_layers.get("scm_suppliers", False)
else {"suppliers": [], "total": 0, "critical_count": 0},
"telegram_osint": (
d.get("telegram_osint") or {"posts": [], "total": 0, "geolocated": 0}
)
if active_layers.get("telegram_osint", True)
else {"posts": [], "total": 0, "geolocated": 0},
"freshness": freshness, "freshness": freshness,
} }
# Issue #288: bbox filter heavy/dense layers only when all four bounds
# are supplied. Static reference layers (datacenters, military bases,
# power_plants, etc.) deliberately stay world-scale so panning never
# hides the infrastructure overlay the operator already has on screen.
if _has_full_bbox(s, w, n, e):
payload = _apply_bbox_to_payload(payload, _SLOW_BBOX_HEAVY_KEYS, s, w, n, e)
return Response( return Response(
content=orjson.dumps(_sanitize_payload(payload), default=str, option=orjson.OPT_NON_STR_KEYS), content=_live_data_json_bytes(payload),
media_type="application/json", media_type="application/json",
headers={"ETag": etag, "Cache-Control": "no-cache"}, headers={"ETag": etag, "Cache-Control": "no-cache"},
) )
+30
View File
@@ -0,0 +1,30 @@
"""Entity graph expansion (intel layer)."""
from __future__ import annotations
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from auth import require_local_operator
from limiter import limiter
from services.osint_intel.resolve import resolve_entity
router = APIRouter()
@router.get("/api/entity/expand")
@limiter.limit("30/minute")
async def entity_expand(
request: Request,
_: None = Depends(require_local_operator),
type: str = Query(..., min_length=3, max_length=32),
id: str = Query(..., min_length=2, max_length=200),
registration: str | None = Query(default=None, max_length=32),
model: str | None = Query(default=None, max_length=64),
icao24: str | None = Query(default=None, max_length=16),
) -> dict:
props = {"label": id, "registration": registration, "model": model, "icao24": icao24}
try:
return resolve_entity(type, id, props)
except ValueError as exc:
raise HTTPException(status_code=400, detail=str(exc)) from exc
except Exception as exc:
raise HTTPException(status_code=502, detail="Intelligence layer unavailable") from exc
+16 -1
View File
@@ -8,7 +8,7 @@ from services.data_fetcher import get_latest_data
from services.schemas import HealthResponse from services.schemas import HealthResponse
import os import os
APP_VERSION = os.environ.get("_HEALTH_APP_VERSION", "0.9.79") APP_VERSION = os.environ.get("_HEALTH_APP_VERSION", "0.9.82")
router = APIRouter() router = APIRouter()
@@ -59,6 +59,12 @@ async def health_check(request: Request):
# when the SPKI-pinned fallback is in effect. The data plane keeps # when the SPKI-pinned fallback is in effect. The data plane keeps
# flowing (this is by design — see ais_proxy.js comments) but observers # flowing (this is by design — see ais_proxy.js comments) but observers
# who care about MITM-protection posture deserve a visible signal. # who care about MITM-protection posture deserve a visible signal.
#
# Plus connectivity health (added 2026-05-23 when stream.aisstream.io
# went fully offline): ``connected`` tells the frontend whether ship
# data is actually flowing. When false, a banner explains that ships
# are unavailable due to an upstream outage — better than the user
# silently seeing an empty ocean and assuming we broke something.
ais_status: dict = {} ais_status: dict = {}
try: try:
from services.ais_stream import ais_proxy_status from services.ais_stream import ais_proxy_status
@@ -69,6 +75,15 @@ async def health_check(request: Request):
# Don't override a worse top-level status if SLOs already failed, # Don't override a worse top-level status if SLOs already failed,
# but escalate ok -> degraded so the field surfaces in dashboards. # but escalate ok -> degraded so the field surfaces in dashboards.
top_status = "degraded" top_status = "degraded"
# AIS_API_KEY not configured is "feature off", not "system broken" —
# so we only escalate when the operator opted into AIS (key set) AND
# the stream is currently offline.
if (
os.environ.get("AIS_API_KEY")
and ais_status.get("connected") is False
and top_status == "ok"
):
top_status = "degraded"
return { return {
"status": top_status, "status": top_status,
+122
View File
@@ -0,0 +1,122 @@
"""Malware, cyber threats, and country risk feeds."""
from __future__ import annotations
import logging
from urllib.parse import urlparse
import requests
from fastapi import APIRouter, HTTPException, Query, Request
from fastapi.responses import StreamingResponse
from starlette.background import BackgroundTask
from limiter import limiter
from services.fetchers._store import get_latest_data_subset_refs
from services.fetchers.telegram_osint import telegram_media_host_allowed
from services.intel_feeds.country_risk import build_country_risk_payload
from services.network_utils import outbound_user_agent
logger = logging.getLogger(__name__)
router = APIRouter()
@router.get("/api/malware")
@limiter.limit("60/minute")
async def malware_feed(request: Request) -> dict:
snap = get_latest_data_subset_refs("malware_threats")
payload = snap.get("malware_threats")
if isinstance(payload, dict) and payload.get("threats") is not None:
return payload
return {"threats": [], "total": 0, "timestamp": None, "source": "abuse.ch"}
@router.get("/api/cyber-threats")
@limiter.limit("60/minute")
async def cyber_threats(request: Request) -> dict:
snap = get_latest_data_subset_refs("cyber_threats")
return snap.get("cyber_threats") or {"threats": [], "stats": {}}
@router.get("/api/country-risk")
@limiter.limit("30/minute")
async def country_risk(request: Request) -> dict:
return build_country_risk_payload()
@router.get("/api/telegram-feed")
@limiter.limit("30/minute")
async def telegram_feed(request: Request) -> dict:
snap = get_latest_data_subset_refs("telegram_osint")
payload = snap.get("telegram_osint")
if isinstance(payload, dict) and payload.get("posts") is not None:
return payload
return {"posts": [], "total": 0, "geolocated": 0, "timestamp": None}
def _infer_telegram_media_type(target_url: str, content_type: str) -> str:
clean_type = str(content_type or "").split(";", 1)[0].strip().lower()
if clean_type and clean_type not in {"application/octet-stream", "binary/octet-stream"}:
return content_type
path = str(urlparse(target_url).path or "").lower()
if path.endswith((".jpg", ".jpeg")):
return "image/jpeg"
if path.endswith(".png"):
return "image/png"
if path.endswith(".webp"):
return "image/webp"
if path.endswith(".gif"):
return "image/gif"
if path.endswith(".mp4"):
return "video/mp4"
if path.endswith(".webm"):
return "video/webm"
return content_type or "application/octet-stream"
@router.get("/api/telegram/media")
@limiter.limit("60/minute")
async def telegram_media_proxy(request: Request, url: str = Query(...)) -> StreamingResponse:
"""Stream Telegram CDN media for in-app playback (host allowlist only)."""
parsed = urlparse(url)
if parsed.scheme not in ("http", "https"):
raise HTTPException(status_code=400, detail="Invalid scheme")
if not telegram_media_host_allowed(parsed.hostname):
raise HTTPException(status_code=403, detail="Host not allowed")
headers = {
"User-Agent": (
f"Mozilla/5.0 (compatible; {outbound_user_agent('telegram-media')}) "
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
),
"Accept": "*/*",
}
if range_header := request.headers.get("range"):
headers["Range"] = range_header
try:
resp = requests.get(url, stream=True, timeout=(3, 45), headers=headers)
except requests.RequestException as exc:
logger.warning("Telegram media upstream failure %s: %s", url, exc)
raise HTTPException(status_code=502, detail="Upstream fetch failed") from exc
if resp.status_code >= 400:
resp.close()
raise HTTPException(status_code=int(resp.status_code), detail=f"Upstream returned {resp.status_code}")
media_type = _infer_telegram_media_type(url, resp.headers.get("Content-Type", "application/octet-stream"))
response_headers = {
"Cache-Control": "private, max-age=300",
"Accept-Ranges": resp.headers.get("Accept-Ranges", "bytes"),
}
if content_length := resp.headers.get("Content-Length"):
response_headers["Content-Length"] = content_length
if content_range := resp.headers.get("Content-Range"):
response_headers["Content-Range"] = content_range
return StreamingResponse(
resp.iter_content(chunk_size=65536),
status_code=resp.status_code,
media_type=media_type,
headers=response_headers,
background=BackgroundTask(resp.close),
)
+21 -4
View File
@@ -223,11 +223,21 @@ async def oracle_markets_more(request: Request, category: str = "NEWS", offset:
"has_more": offset + limit < len(cat_markets), "total": len(cat_markets)} "has_more": offset + limit < len(cat_markets), "total": len(cat_markets)}
@router.post("/api/mesh/oracle/resolve") @router.post(
"/api/mesh/oracle/resolve",
dependencies=[Depends(require_admin)],
)
@limiter.limit("5/minute") @limiter.limit("5/minute")
@mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL) @mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)
async def oracle_resolve(request: Request): async def oracle_resolve(request: Request):
"""Resolve a prediction market.""" """Resolve a prediction market.
Issue #240 (tg12): requires admin authentication. The
``mesh_write_exempt`` decorator below is **metadata only** — it tags
the route as not requiring a mesh signed-write envelope, it does
NOT itself enforce caller authorization. The ``Depends(require_admin)``
on the route decorator is what actually gates access.
"""
from services.mesh.mesh_oracle import oracle_ledger from services.mesh.mesh_oracle import oracle_ledger
body = await request.json() body = await request.json()
market_title = body.get("market_title", "") market_title = body.get("market_title", "")
@@ -327,11 +337,18 @@ async def oracle_predictions(request: Request, node_id: str = ""):
active_predictions, authenticated=_scoped_view_authenticated(request, "mesh.audit")) active_predictions, authenticated=_scoped_view_authenticated(request, "mesh.audit"))
@router.post("/api/mesh/oracle/resolve-stakes") @router.post(
"/api/mesh/oracle/resolve-stakes",
dependencies=[Depends(require_admin)],
)
@limiter.limit("5/minute") @limiter.limit("5/minute")
@mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL) @mesh_write_exempt(MeshWriteExemption.ADMIN_CONTROL)
async def oracle_resolve_stakes(request: Request): async def oracle_resolve_stakes(request: Request):
"""Resolve all expired stake contests.""" """Resolve all expired stake contests.
Issue #241 (tg12): requires admin authentication. See the note on
``oracle_resolve`` above — ``mesh_write_exempt`` is metadata only.
"""
from services.mesh.mesh_oracle import oracle_ledger from services.mesh.mesh_oracle import oracle_ledger
resolutions = oracle_ledger.resolve_expired_stakes() resolutions = oracle_ledger.resolve_expired_stakes()
return {"ok": True, "resolutions": resolutions, "count": len(resolutions)} return {"ok": True, "resolutions": resolutions, "count": len(resolutions)}
+65
View File
@@ -55,6 +55,12 @@ def _hydrate_gate_store_from_chain(events: list) -> int:
return count return count
def _hydrate_dm_relay_from_chain(events: list) -> int:
import main as _m
return int(_m._hydrate_dm_relay_from_chain(events))
@router.post("/api/mesh/infonet/peer-push") @router.post("/api/mesh/infonet/peer-push")
@limiter.limit("30/minute") @limiter.limit("30/minute")
async def infonet_peer_push(request: Request): async def infonet_peer_push(request: Request):
@@ -82,9 +88,68 @@ async def infonet_peer_push(request: Request):
return {"ok": True, "accepted": 0, "duplicates": 0, "rejected": []} return {"ok": True, "accepted": 0, "duplicates": 0, "rejected": []}
result = infonet.ingest_events(events) result = infonet.ingest_events(events)
_hydrate_gate_store_from_chain(events) _hydrate_gate_store_from_chain(events)
_hydrate_dm_relay_from_chain(events)
return {"ok": True, **result} return {"ok": True, **result}
@router.post("/api/mesh/dm/replicate-envelope")
@limiter.limit("60/minute")
async def dm_replicate_envelope(request: Request):
"""Accept a DM envelope replicated from a peer relay (cross-node mailbox).
Companion endpoint to ``DMRelay.replicate_to_peers`` (outbound, in
``mesh_dm_relay.py``). The sender's relay POSTs an encrypted DM
envelope here after a successful local ``deposit``; this endpoint
re-enforces the per-(sender, recipient) anti-spam cap and stores
the envelope in the local mailbox if accepted.
The cap is the network rule: a hostile sender's relay can spool
extras locally, but every honest peer enforces the cap on inbound
replication. Recipient polling from any honest peer therefore
never sees more than ``MESH_DM_PENDING_PER_SENDER_LIMIT`` pending
from any one sender, no matter how many spam attempts were tried.
Same HMAC auth pattern as ``infonet_peer_push`` and ``gate_peer_push``.
"""
content_length = request.headers.get("content-length")
if content_length:
try:
# DM envelopes are bounded by MESH_DM_MAX_MSG_BYTES + envelope
# overhead; 64 KB is a generous ceiling.
if int(content_length) > 65_536:
return Response(
content='{"ok":false,"detail":"Request body too large (max 64KB)"}',
status_code=413, media_type="application/json",
)
except (ValueError, TypeError):
pass
body_bytes = await request.body()
if not _verify_peer_push_hmac(request, body_bytes):
return Response(
content='{"ok":false,"detail":"Invalid or missing peer HMAC"}',
status_code=403, media_type="application/json",
)
try:
body = json_mod.loads(body_bytes or b"{}")
except (ValueError, TypeError):
return Response(
content='{"ok":false,"detail":"Invalid JSON body"}',
status_code=400, media_type="application/json",
)
envelope = body.get("envelope")
if not isinstance(envelope, dict):
return {"ok": False, "detail": "envelope must be an object"}
originating_peer = _peer_hmac_url_from_request(request) or ""
from services.mesh.mesh_dm_relay import dm_relay
result = dm_relay.accept_replica(
envelope=envelope,
originating_peer_url=originating_peer,
)
return result
@router.post("/api/mesh/gate/peer-push") @router.post("/api/mesh/gate/peer-push")
@limiter.limit("30/minute") @limiter.limit("30/minute")
async def gate_peer_push(request: Request): async def gate_peer_push(request: Request):
+33 -8
View File
@@ -65,6 +65,7 @@ from services.mesh.mesh_signed_events import (
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
router = APIRouter() router = APIRouter()
_INFONET_SYNC_RATE_LIMIT = "600/minute"
def _signed_body(request: Request) -> dict[str, Any]: def _signed_body(request: Request) -> dict[str, Any]:
@@ -263,6 +264,19 @@ def _redact_public_event(event: dict) -> dict:
return _redact_vote_gate(_redact_key_rotate_payload(_redact_gate_metadata(event))) return _redact_vote_gate(_redact_key_rotate_payload(_redact_gate_metadata(event)))
def _infonet_private_transport_required() -> bool:
import main as _m
return bool(_m._infonet_private_transport_required())
def _infonet_sync_response_events(events: list[dict], request=None) -> list[dict]:
"""Build the sync event surface for the current transport policy."""
import main as _m
return _m._infonet_sync_response_events(events, request=request)
def _trusted_gate_reply_to(event: dict) -> str: def _trusted_gate_reply_to(event: dict) -> str:
if not isinstance(event, dict): if not isinstance(event, dict):
return "" return ""
@@ -574,6 +588,12 @@ def _hydrate_gate_store_from_chain(events: list[dict]) -> int:
pass pass
return count return count
def _hydrate_dm_relay_from_chain(events: list[dict]) -> int:
import main as _m
return int(_m._hydrate_dm_relay_from_chain(events))
# --- Safe type helpers --- # --- Safe type helpers ---
def _safe_int(val, default=0): def _safe_int(val, default=0):
@@ -1531,7 +1551,7 @@ async def infonet_locator(request: Request, limit: int = Query(32, ge=4, le=128)
@router.post("/api/mesh/infonet/sync") @router.post("/api/mesh/infonet/sync")
@limiter.limit("30/minute") @limiter.limit(_INFONET_SYNC_RATE_LIMIT)
@mesh_write_exempt(MeshWriteExemption.PEER_GOSSIP) @mesh_write_exempt(MeshWriteExemption.PEER_GOSSIP)
async def infonet_sync_post( async def infonet_sync_post(
request: Request, request: Request,
@@ -1584,8 +1604,7 @@ async def infonet_sync_post(
elif matched_hash == GENESIS_HASH and len(locator) > 1: elif matched_hash == GENESIS_HASH and len(locator) > 1:
forked = True forked = True
# Filter out legacy gate_message events — not part of the public sync surface. events = _infonet_sync_response_events(events, request=request)
events = [_redact_public_event(e) for e in events if e.get("event_type") != "gate_message"]
response = { response = {
"events": events, "events": events,
@@ -1646,7 +1665,7 @@ async def mesh_rns_status(request: Request):
@router.get("/api/mesh/infonet/sync") @router.get("/api/mesh/infonet/sync")
@limiter.limit("30/minute") @limiter.limit(_INFONET_SYNC_RATE_LIMIT)
async def infonet_sync( async def infonet_sync(
request: Request, request: Request,
after_hash: str = "", after_hash: str = "",
@@ -1684,8 +1703,7 @@ async def infonet_sync(
) )
base = after_hash or GENESIS_HASH base = after_hash or GENESIS_HASH
events = infonet.get_events_after(base, limit=limit) events = infonet.get_events_after(base, limit=limit)
# Filter out legacy gate_message events — not part of the public sync surface. events = _infonet_sync_response_events(events, request=request)
events = [_redact_public_event(e) for e in events if e.get("event_type") != "gate_message"]
return { return {
"events": events, "events": events,
"after_hash": base, "after_hash": base,
@@ -1724,6 +1742,7 @@ async def infonet_ingest(request: Request):
result = infonet.ingest_events(events) result = infonet.ingest_events(events)
_hydrate_gate_store_from_chain(events) _hydrate_gate_store_from_chain(events)
_hydrate_dm_relay_from_chain(events)
return {"ok": True, **result} return {"ok": True, **result}
@@ -2279,6 +2298,12 @@ async def infonet_event(request: Request, event_id: str):
) )
return _strip_gate_for_access(evt, access) return _strip_gate_for_access(evt, access)
return {"ok": False, "detail": "Event not found"} return {"ok": False, "detail": "Event not found"}
if evt.get("event_type") == "dm_message":
return await _private_plane_refusal_response(
request,
status_code=403,
payload=_private_plane_access_denied_payload(),
)
if evt.get("event_type") == "gate_message": if evt.get("event_type") == "gate_message":
gate_id = str(evt.get("payload", {}).get("gate", "") or evt.get("gate", "") or "").strip() gate_id = str(evt.get("payload", {}).get("gate", "") or evt.get("gate", "") or "").strip()
access = _verify_gate_access(request, gate_id) if gate_id else "" access = _verify_gate_access(request, gate_id) if gate_id else ""
@@ -2303,7 +2328,7 @@ async def infonet_node_events(
from services.mesh.mesh_hashchain import infonet from services.mesh.mesh_hashchain import infonet
events = infonet.get_events_by_node(node_id, limit=limit) events = infonet.get_events_by_node(node_id, limit=limit)
events = [e for e in events if e.get("event_type") != "gate_message"] events = [e for e in events if e.get("event_type") not in {"gate_message", "dm_message"}]
events = [_redact_public_event(e) for e in infonet.decorate_events(events)] events = [_redact_public_event(e) for e in infonet.decorate_events(events)]
events = _redact_public_node_history( events = _redact_public_node_history(
events, events,
@@ -2328,7 +2353,7 @@ async def infonet_events_by_type(
else: else:
events = list(reversed(infonet.events)) events = list(reversed(infonet.events))
events = events[offset : offset + limit] events = events[offset : offset + limit]
events = [e for e in events if e.get("event_type") != "gate_message"] events = [e for e in events if e.get("event_type") not in {"gate_message", "dm_message"}]
events = [_redact_public_event(e) for e in infonet.decorate_events(events)] events = [_redact_public_event(e) for e in infonet.decorate_events(events)]
return { return {
"events": events, "events": events,
+151
View File
@@ -0,0 +1,151 @@
"""Operator OSINT recon routes (server-side proxies, SSRF guarded)."""
from __future__ import annotations
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from pydantic import BaseModel, Field
from auth import require_local_operator
from limiter import limiter
from services.osint import lookups
router = APIRouter(dependencies=[Depends(require_local_operator)])
_ALLOWED_SCHEMAS = {
"Person",
"Organization",
"Company",
"Vessel",
"Airplane",
"LegalEntity",
}
class SweepScanRequest(BaseModel):
ip: str = Field(min_length=7, max_length=45)
cidr: int = Field(default=24, ge=24, le=32)
def _bad_request(exc: ValueError) -> HTTPException:
return HTTPException(status_code=400, detail=str(exc))
@router.get("/api/osint/ip")
@limiter.limit("20/minute")
async def osint_ip(request: Request, ip: str = Query(..., min_length=7, max_length=45)) -> dict:
try:
return lookups.lookup_ip(ip)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.get("/api/osint/dns")
@limiter.limit("20/minute")
async def osint_dns(request: Request, domain: str = Query(..., min_length=4, max_length=253)) -> dict:
try:
return lookups.lookup_dns(domain)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.get("/api/osint/whois")
@limiter.limit("20/minute")
async def osint_whois(request: Request, domain: str = Query(..., min_length=4, max_length=253)) -> dict:
try:
return lookups.lookup_whois(domain)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.get("/api/osint/certs")
@limiter.limit("20/minute")
async def osint_certs(request: Request, domain: str = Query(..., min_length=4, max_length=253)) -> dict:
try:
return lookups.lookup_certs(domain)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.get("/api/osint/threats")
@limiter.limit("20/minute")
async def osint_threats(request: Request, query: str | None = Query(default=None, max_length=253)) -> dict:
return lookups.lookup_threats(query)
@router.get("/api/osint/bgp")
@limiter.limit("20/minute")
async def osint_bgp(request: Request, query: str = Query(..., min_length=2, max_length=64)) -> dict:
try:
return lookups.lookup_bgp(query)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.get("/api/osint/sanctions")
@limiter.limit("20/minute")
async def osint_sanctions(
request: Request,
query: str = Query(..., min_length=4, max_length=200),
schema: str | None = Query(default=None),
limit: int = Query(default=25, ge=1, le=100),
) -> dict:
if schema and schema not in _ALLOWED_SCHEMAS:
raise HTTPException(status_code=400, detail=f"Invalid schema. Allowed: {', '.join(sorted(_ALLOWED_SCHEMAS))}")
return lookups.lookup_sanctions(query, schema=schema, limit=limit)
@router.get("/api/osint/cve")
@limiter.limit("30/minute")
async def osint_cve(request: Request, cve: str = Query(..., min_length=10, max_length=32)) -> dict:
try:
return lookups.lookup_cve(cve)
except ValueError as exc:
raise HTTPException(status_code=404 if "not found" in str(exc).lower() else 400, detail=str(exc)) from exc
@router.get("/api/osint/mac")
@limiter.limit("20/minute")
async def osint_mac(request: Request, mac: str = Query(..., min_length=5, max_length=32)) -> dict:
return lookups.lookup_mac(mac)
@router.get("/api/osint/github")
@limiter.limit("20/minute")
async def osint_github(request: Request, username: str = Query(..., min_length=1, max_length=64)) -> dict:
try:
return lookups.lookup_github(username)
except ValueError as exc:
raise HTTPException(status_code=404, detail=str(exc)) from exc
@router.get("/api/osint/leaks")
@limiter.limit("10/minute")
async def osint_leaks(request: Request, email: str = Query(..., min_length=5, max_length=254)) -> dict:
try:
return lookups.lookup_leaks(email)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.get("/api/osint/sweep")
@limiter.limit("5/minute")
async def osint_sweep_init(
request: Request,
ip: str = Query(..., min_length=7, max_length=45),
cidr: int = Query(default=24, ge=24, le=32),
) -> dict:
try:
return lookups.sweep_init(ip, cidr)
except ValueError as exc:
raise _bad_request(exc) from exc
@router.post("/api/osint/sweep/scan")
@limiter.limit("3/minute")
async def osint_sweep_scan(request: Request, payload: SweepScanRequest) -> dict:
try:
subnet = lookups.subnet_start_for(payload.ip, payload.cidr)
scan = lookups.sweep_scan(subnet, payload.cidr)
init = lookups.sweep_init(payload.ip, payload.cidr)
return {**init, **scan, "subnet": f"{subnet}/{payload.cidr}"}
except ValueError as exc:
raise _bad_request(exc) from exc
+105
View File
@@ -0,0 +1,105 @@
"""Road corridor Sentinel-2 freight trend endpoints (opt-in slow layer)."""
from fastapi import APIRouter, HTTPException, Query, Request
from pydantic import BaseModel, Field
from limiter import limiter
from services.road_corridor_sat.config import optional_deps_available, road_corridor_sat_enabled
from services.road_corridor_sat.credentials import sentinel_credentials_configured
from services.road_corridor_sat.jobs import enqueue_analyze, get_job, get_latest_job, job_to_dict
from services.road_corridor_sat.presets import CORRIDOR_PRESETS, get_preset
from services.road_corridor_sat.storage import build_trends_payload, preset_metadata
router = APIRouter()
def _status_payload() -> dict:
latest = get_latest_job()
return {
"enabled": road_corridor_sat_enabled(),
"deps_installed": optional_deps_available(),
"credentials_configured": sentinel_credentials_configured(),
"preset_count": len(CORRIDOR_PRESETS),
"attribution": "backend/third_party/drishx/NOTICE.md",
"active_job": job_to_dict(latest) if latest and latest.status in {"queued", "running"} else None,
}
def _require_analyze_ready() -> None:
if not optional_deps_available():
raise HTTPException(
status_code=503,
detail="Install optional road-corridor dependencies (uv sync --extra road-corridor)",
)
if not sentinel_credentials_configured():
raise HTTPException(
status_code=503,
detail="Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET in Imagery settings",
)
class AnalyzeRequest(BaseModel):
lat: float = Field(ge=-90, le=90)
lon: float = Field(ge=-180, le=180)
label: str | None = Field(default=None, max_length=120)
@router.get("/api/road-corridors/status")
@limiter.limit("60/minute")
async def road_corridors_status(request: Request) -> dict:
return {"ok": True, **_status_payload()}
@router.get("/api/road-corridors")
@limiter.limit("60/minute")
async def list_road_corridors(request: Request) -> dict:
return {
"ok": True,
"status": _status_payload(),
"presets": CORRIDOR_PRESETS,
"trends": build_trends_payload(),
}
@router.post("/api/road-corridors/analyze")
@limiter.limit("6/minute")
async def analyze_road_corridor_here(request: Request, payload: AnalyzeRequest) -> dict:
"""Start an on-demand Sentinel-2 corridor analysis at map center."""
_require_analyze_ready()
try:
job = enqueue_analyze(payload.lat, payload.lon, payload.label)
except RuntimeError as exc:
if str(exc) == "analysis_already_running":
active = get_latest_job()
raise HTTPException(
status_code=409,
detail="Analysis already in progress",
headers={"X-Job-Id": active.job_id if active else ""},
) from exc
raise
return {"ok": True, **job_to_dict(job)}
@router.get("/api/road-corridors/analyze/status")
@limiter.limit("120/minute")
async def analyze_road_corridor_status(
request: Request,
job_id: str | None = Query(default=None),
) -> dict:
job = get_job(job_id) if job_id else get_latest_job()
if job is None:
return {"ok": True, "job": None}
return {"ok": True, "job": job_to_dict(job)}
@router.get("/api/road-corridors/{preset_id}")
@limiter.limit("60/minute")
async def get_road_corridor(preset_id: str, request: Request) -> dict:
meta = preset_metadata(preset_id)
if meta is None:
raise HTTPException(status_code=404, detail="Unknown corridor preset")
preset = get_preset(preset_id)
if preset is None:
# Ad-hoc viewport runs are stored on disk but not in CORRIDOR_PRESETS.
return {"ok": True, "preset": None, "result": meta, "status": _status_payload()}
return {"ok": True, "preset": preset, "result": meta, "status": _status_payload()}
+16
View File
@@ -0,0 +1,16 @@
"""Supply-chain risk overlay."""
from __future__ import annotations
from fastapi import APIRouter, Depends, Request
from auth import require_local_operator
from limiter import limiter
from services.scm.suppliers import build_scm_payload
router = APIRouter()
@router.get("/api/scm-suppliers")
@limiter.limit("30/minute")
async def scm_suppliers(request: Request, _: None = Depends(require_local_operator)) -> dict:
return build_scm_payload()
+120 -10
View File
@@ -85,7 +85,63 @@ async def api_geocode_reverse(
return await asyncio.to_thread(reverse_geocode, lat, lng, local_only) return await asyncio.to_thread(reverse_geocode, lat, lng, local_only)
@router.get("/api/sentinel2/search") # ── Wikimedia proxy (#360) — browser calls these instead of wikipedia.org ───
@router.get("/api/wikipedia/summary")
@limiter.limit("60/minute")
def api_wikipedia_summary(
request: Request,
title: str = Query(..., min_length=1, max_length=256),
):
"""Proxy Wikipedia REST summaries through the self-hosted backend."""
from services.region_dossier import fetch_wikipedia_page_summary
summary = fetch_wikipedia_page_summary(title)
if summary is None:
return JSONResponse(status_code=404, content={"detail": "not_found"})
return summary
class WikidataSparqlRequest(BaseModel):
query: str
@router.post("/api/wikidata/sparql")
@limiter.limit("30/minute")
def api_wikidata_sparql(request: Request, body: WikidataSparqlRequest):
"""Proxy Wikidata SPARQL so the browser never contacts query.wikidata.org."""
from services.region_dossier import fetch_wikidata_sparql_bindings
q = (body.query or "").strip()
if len(q) > 12_000:
raise HTTPException(400, "SPARQL query too large")
bindings = fetch_wikidata_sparql_bindings(q)
return {"bindings": bindings}
# ── Sentinel proxy routes (Issue #299/#300/#301, reported by tg12) ──────────
# These three endpoints relay external Sentinel / Planetary Computer
# requests through the backend to avoid browser CORS blocks. They are
# operator-only helpers — they MUST NOT be callable by anonymous remote
# users, because:
#
# * /api/sentinel/token — caller supplies their own Sentinel client_id +
# client_secret. Without operator gating, the backend becomes a free
# anonymous OAuth-mint relay for any Copernicus account.
# * /api/sentinel/tile — same shape as the token route but for tile
# imagery. Without gating, the backend acts as an anonymous quota and
# bandwidth relay for Sentinel Hub Process API calls.
# * /api/sentinel2/search — hits the Planetary Computer STAC search API
# and falls back to Esri imagery. No caller credentials are involved,
# but the route is still an anonymous external-search relay. We gate
# it the same way for consistency with the rest of the operator-only
# helper surface.
#
# Gating is via require_local_operator (loopback / bridge / admin key),
# matching the same allowlist already used by /api/region-dossier and
# the other operator helpers further up this file. Single-operator nodes
# see no behavior change — their dashboard already lives on loopback or
# the trusted Docker bridge, so it still resolves.
@router.get("/api/sentinel2/search", dependencies=[Depends(require_local_operator)])
@limiter.limit("30/minute") @limiter.limit("30/minute")
def api_sentinel2_search( def api_sentinel2_search(
request: Request, request: Request,
@@ -97,18 +153,60 @@ def api_sentinel2_search(
return search_sentinel2_scene(lat, lng) return search_sentinel2_scene(lat, lng)
@router.post("/api/sentinel/token") # Issue #298 (tg12): Sentinel credentials moved server-side
# ---------------------------------------------------------------------------
# Previously the frontend kept Copernicus CDSE client_id + client_secret in
# browser localStorage / sessionStorage and forwarded them on every tile
# request through this proxy. That exposed real third-party credentials to
# any same-origin script (XSS, malicious browser extension, dev-tools HAR
# export).
#
# Resolution order (first match wins):
# 1. Request body — kept for back-compat. A small number of legacy
# operator setups may still post credentials; we don't break them.
# 2. Backend .env — SENTINEL_CLIENT_ID / SENTINEL_CLIENT_SECRET, managed
# through the existing /api/settings/api-keys flow (admin-gated).
#
# The frontend in ``sentinelHub.ts`` no longer reads browser storage and no
# longer forwards credentials — every dashboard request now lands in (2).
# The require_local_operator gate (added in #303/PR #303) stays — both layers
# are independent: the gate blocks anonymous callers, the env fallback lets
# legitimate (gated) callers omit credentials from the body.
# ---------------------------------------------------------------------------
def _resolve_sentinel_credentials(body_id: str, body_secret: str) -> tuple[str, str]:
"""Return (client_id, client_secret) using body values when present,
otherwise falling back to backend .env. Empty strings if neither is set."""
import os as _os
cid = (body_id or "").strip() or (_os.environ.get("SENTINEL_CLIENT_ID", "") or "").strip()
csec = (body_secret or "").strip() or (_os.environ.get("SENTINEL_CLIENT_SECRET", "") or "").strip()
return cid, csec
@router.post("/api/sentinel/token", dependencies=[Depends(require_local_operator)])
@limiter.limit("60/minute") @limiter.limit("60/minute")
async def api_sentinel_token(request: Request): async def api_sentinel_token(request: Request):
"""Proxy Copernicus CDSE OAuth2 token request (avoids browser CORS block).""" """Proxy Copernicus CDSE OAuth2 token request (avoids browser CORS block).
Credentials are resolved by ``_resolve_sentinel_credentials`` — body
fields are honored for back-compat, otherwise the backend .env values
populated through ``/api/settings/api-keys`` are used.
"""
import requests as req import requests as req
body = await request.body() body = await request.body()
from urllib.parse import parse_qs from urllib.parse import parse_qs
params = parse_qs(body.decode("utf-8")) params = parse_qs(body.decode("utf-8"))
client_id = params.get("client_id", [""])[0] body_id = params.get("client_id", [""])[0]
client_secret = params.get("client_secret", [""])[0] body_secret = params.get("client_secret", [""])[0]
client_id, client_secret = _resolve_sentinel_credentials(body_id, body_secret)
if not client_id or not client_secret: if not client_id or not client_secret:
raise HTTPException(400, "client_id and client_secret required") # Friendly, non-hostile error — points the operator at the place
# they configure other API keys instead of just saying "required".
raise HTTPException(
400,
"Sentinel client_id/client_secret are not configured. "
"Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET in the "
"API Keys panel (Settings → API Keys) or your backend .env.",
)
token_url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token" token_url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
try: try:
resp = await asyncio.to_thread(req.post, token_url, resp = await asyncio.to_thread(req.post, token_url,
@@ -152,7 +250,7 @@ import os as _os
_SH_TOKEN_CACHE_HMAC_KEY = _os.urandom(32) _SH_TOKEN_CACHE_HMAC_KEY = _os.urandom(32)
@router.post("/api/sentinel/tile") @router.post("/api/sentinel/tile", dependencies=[Depends(require_local_operator)])
@limiter.limit("300/minute") @limiter.limit("300/minute")
async def api_sentinel_tile(request: Request): async def api_sentinel_tile(request: Request):
"""Proxy Sentinel Hub Process API tile request (avoids CORS block).""" """Proxy Sentinel Hub Process API tile request (avoids CORS block)."""
@@ -163,8 +261,11 @@ async def api_sentinel_tile(request: Request):
except Exception: except Exception:
return JSONResponse(status_code=422, content={"ok": False, "detail": "invalid JSON body"}) return JSONResponse(status_code=422, content={"ok": False, "detail": "invalid JSON body"})
client_id = body.get("client_id", "") # Issue #298: same resolution order as /api/sentinel/token — body
client_secret = body.get("client_secret", "") # values for back-compat, otherwise backend .env.
body_id = body.get("client_id", "")
body_secret = body.get("client_secret", "")
client_id, client_secret = _resolve_sentinel_credentials(body_id, body_secret)
preset = body.get("preset", "TRUE-COLOR") preset = body.get("preset", "TRUE-COLOR")
date_str = body.get("date", "") date_str = body.get("date", "")
z = body.get("z", 0) z = body.get("z", 0)
@@ -172,7 +273,16 @@ async def api_sentinel_tile(request: Request):
y = body.get("y", 0) y = body.get("y", 0)
if not client_id or not client_secret or not date_str: if not client_id or not client_secret or not date_str:
raise HTTPException(400, "client_id, client_secret, and date required") # Distinguish "no creds" from "no date" so the operator knows
# what to fix. Same friendly pointer as the /token route.
if not client_id or not client_secret:
raise HTTPException(
400,
"Sentinel client_id/client_secret are not configured. "
"Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET in the "
"API Keys panel (Settings → API Keys) or your backend .env.",
)
raise HTTPException(400, "date required")
now = _time.time() now = _time.time()
credential_fp = _credential_fingerprint(client_id, client_secret) credential_fp = _credential_fingerprint(client_id, client_secret)
+7 -2
View File
@@ -160,8 +160,13 @@ router = APIRouter()
# --- Constants --- # --- Constants ---
_WORMHOLE_PUBLIC_SETTINGS_FIELDS = {"enabled", "transport", "anonymous_mode"} # Issue #243 (tg12): the public redaction now exposes only the bare
_WORMHOLE_PUBLIC_PROFILE_FIELDS = {"profile", "wormhole_enabled"} # "is this on?" boolean. Transport choice, anonymous-mode state, and
# the named privacy profile were all leaking actionable recon to
# unauthenticated callers and are now gated behind authenticated reads.
# See the matching block in backend/main.py for the full rationale.
_WORMHOLE_PUBLIC_SETTINGS_FIELDS = {"enabled"}
_WORMHOLE_PUBLIC_PROFILE_FIELDS = {"wormhole_enabled"}
_PRIVATE_LANE_CONTROL_FIELDS = {"private_lane_tier", "private_lane_policy"} _PRIVATE_LANE_CONTROL_FIELDS = {"private_lane_tier", "private_lane_policy"}
_PUBLIC_RNS_STATUS_FIELDS = {"enabled", "ready", "configured_peers", "active_peers"} _PUBLIC_RNS_STATUS_FIELDS = {"enabled", "ready", "configured_peers", "active_peers"}
_NODE_PUBLIC_EVENT_HOOK_REGISTERED = False _NODE_PUBLIC_EVENT_HOOK_REGISTERED = False
+11 -1
View File
@@ -20,7 +20,17 @@ OUT_PATH = Path(__file__).parent.parent / "data" / "power_plants.json"
def main() -> None: def main() -> None:
print(f"Downloading WRI Global Power Plant Database from GitHub...") print(f"Downloading WRI Global Power Plant Database from GitHub...")
req = urllib.request.Request(CSV_URL, headers={"User-Agent": "ShadowBroker-OSINT/1.0"}) # Round 7a: release-time data refresher. Uses the per-operator UA if
# available, otherwise a release-script-specific identifier. This
# script is run by the maintainer at release time, NOT at runtime,
# so an aggregate UA is acceptable; we still use the helper so the
# behavior matches the rest of the project.
try:
from services.network_utils import outbound_user_agent
ua = outbound_user_agent("release-script-power-plants")
except Exception:
ua = "operator-release-script (purpose: power-plants)"
req = urllib.request.Request(CSV_URL, headers={"User-Agent": ua})
with urllib.request.urlopen(req, timeout=60) as resp: with urllib.request.urlopen(req, timeout=60) as resp:
raw = resp.read().decode("utf-8") raw = resp.read().decode("utf-8")
+5
View File
@@ -167,6 +167,11 @@ def cmd_hash(args: argparse.Namespace) -> int:
print("") print("")
print("Updater pin:") print("Updater pin:")
print(f"MESH_UPDATE_SHA256={digest}") print(f"MESH_UPDATE_SHA256={digest}")
print("")
print("Release checklist:")
print(" - add this digest to SHA256SUMS.txt for the GitHub release")
print(" - add/update backend/data/release_digests.json for bundled updater verification")
print(" - keep MESH_UPDATE_SHA256 available as the operator override path")
return 0 if asset_matches else 2 return 0 if asset_matches else 2
+28 -9
View File
@@ -92,18 +92,37 @@ SECRET_REGEX+='pypi-[0-9a-zA-Z-]{50,}' # PyPI token
TEXT_FILES=$(grep -ivE '\.(png|jpg|jpeg|gif|ico|svg|woff2?|ttf|eot|pbf|zip|tar|gz|db|sqlite|xlsx|pdf|mp[34]|wav|ogg|webm|webp|avif)$' "$FILELIST" | grep -v 'scan-secrets\.sh$' || true) TEXT_FILES=$(grep -ivE '\.(png|jpg|jpeg|gif|ico|svg|woff2?|ttf|eot|pbf|zip|tar|gz|db|sqlite|xlsx|pdf|mp[34]|wav|ogg|webm|webp|avif)$' "$FILELIST" | grep -v 'scan-secrets\.sh$' || true)
if [[ -n "$TEXT_FILES" ]]; then if [[ -n "$TEXT_FILES" ]]; then
# Known-public exclusions: lines matching `<host-or-ip> ssh-<algo> <key>`
# are SSH known_hosts entries — the host's PUBLIC fingerprint, which is
# by definition safe to commit (the whole point of pinning known_hosts
# is to publish the fingerprint widely so MITM is detectable). Filter
# these out before flagging the file.
KNOWN_HOSTS_LINE='^[[:space:]]*[a-zA-Z0-9._:,*-]+([[:space:]]+[a-zA-Z0-9._:,*-]+)?[[:space:]]+(ssh-rsa|ssh-ed25519|ssh-dss|ecdsa-sha2-nistp256|ecdsa-sha2-nistp384|ecdsa-sha2-nistp521)[[:space:]]+AAAA'
# Use grep with file list, skip missing/binary, limit output # Use grep with file list, skip missing/binary, limit output
CONTENT_HITS=$(echo "$TEXT_FILES" | xargs grep -lE "$SECRET_REGEX" 2>/dev/null || true) CONTENT_HITS=$(echo "$TEXT_FILES" | xargs grep -lE "$SECRET_REGEX" 2>/dev/null || true)
if [[ -n "$CONTENT_HITS" ]]; then if [[ -n "$CONTENT_HITS" ]]; then
echo -e "\n${RED}BLOCKED: Embedded secrets/tokens found in:${NC}" REAL_HITS=""
echo "$CONTENT_HITS" | while read -r f; do REAL_REPORT=""
echo -e " ${RED}$f${NC}" while IFS= read -r f; do
# Show first matching line for context [[ -z "$f" ]] && continue
grep -nE "$SECRET_REGEX" "$f" 2>/dev/null | head -2 | while read -r line; do # Re-grep this file, but filter out known_hosts-style lines.
echo -e " ${YELLOW}$line${NC}" FILE_HITS=$(grep -nE "$SECRET_REGEX" "$f" 2>/dev/null | grep -vE "$KNOWN_HOSTS_LINE" || true)
done if [[ -n "$FILE_HITS" ]]; then
done REAL_HITS+="$f"$'\n'
FOUND=1 REAL_REPORT+=" ${RED}$f${NC}"$'\n'
# Show first 2 matching lines for context
while IFS= read -r line; do
[[ -z "$line" ]] && continue
REAL_REPORT+=" ${YELLOW}$line${NC}"$'\n'
done < <(echo "$FILE_HITS" | head -2)
fi
done <<< "$CONTENT_HITS"
if [[ -n "$REAL_HITS" ]]; then
echo -e "\n${RED}BLOCKED: Embedded secrets/tokens found in:${NC}"
echo -en "$REAL_REPORT"
FOUND=1
fi
fi fi
fi fi
+54 -7
View File
@@ -350,19 +350,58 @@ _proxy_process = None
# path during an upstream cert outage. Surfaced via ais_proxy_status() for # path during an upstream cert outage. Surfaced via ais_proxy_status() for
# /api/health. # /api/health.
_proxy_status: dict = {} _proxy_status: dict = {}
# Upstream-connectivity telemetry (added when stream.aisstream.io went fully
# offline on 2026-05-23). ``_last_msg_at`` is the unix timestamp of the most
# recent vessel message received from the proxy. ``_proxy_spawn_count`` is
# how many times we've started the node proxy; combined with no recent
# messages it tells us the proxy is respawning in a tight loop because the
# upstream is unreachable. Surfaced via ais_proxy_status() so the operator
# can see "AIS is dead" instead of guessing whether it's their map filter,
# their api key, or upstream.
_last_msg_at: float = 0.0
_proxy_spawn_count: int = 0
_VESSEL_TRAIL_INTERVAL_S = 120 _VESSEL_TRAIL_INTERVAL_S = 120
_VESSEL_TRAIL_MAX_POINTS = 240 _VESSEL_TRAIL_MAX_POINTS = 240
def ais_proxy_status() -> dict: # How stale "last vessel message" can be before we consider the stream
"""Return a copy of the latest ais_proxy.js status (issue #258). # disconnected. AISStream typically pushes multiple messages/sec, so a 60s
# gap means something's wrong upstream or in transit.
_AIS_CONNECTED_FRESHNESS_S = 60
Currently surfaces ``degraded_tls`` (bool) which is true when the
proxy is using SPKI-pinned fallback because AISStream's cert expired. def ais_proxy_status() -> dict:
Returns an empty dict when no status has been received yet. """Return a copy of the latest ais_proxy.js status + connectivity health.
Fields:
* ``degraded_tls`` (bool, issue #258) — true when the proxy is using
SPKI-pinned fallback because AISStream's cert expired.
* ``connected`` (bool) — true when we received a vessel message in
the last ``_AIS_CONNECTED_FRESHNESS_S`` seconds.
* ``last_msg_age_seconds`` (int | None) — seconds since the last
vessel message; None if we've never received one.
* ``proxy_spawn_count`` (int) — how many times we've spawned the
node proxy. Sustained increases here without ``connected`` means
we're respawning in a tight loop because upstream is dead.
Returns an empty dict when called before the AIS subsystem starts
(e.g. during tests or when no API key is set).
""" """
with _vessels_lock: with _vessels_lock:
return dict(_proxy_status) status = dict(_proxy_status)
last = _last_msg_at
spawns = _proxy_spawn_count
now = time.time()
if last > 0:
last_age = int(now - last)
status["last_msg_age_seconds"] = last_age
status["connected"] = last_age <= _AIS_CONNECTED_FRESHNESS_S
else:
status["last_msg_age_seconds"] = None
status["connected"] = False
status["proxy_spawn_count"] = spawns
return status
import os import os
@@ -588,8 +627,10 @@ def _ais_stream_loop():
env=proxy_env, env=proxy_env,
**popen_kwargs, **popen_kwargs,
) )
global _proxy_spawn_count
with _vessels_lock: with _vessels_lock:
_proxy_process = process _proxy_process = process
_proxy_spawn_count += 1
# Drain stderr in a background thread to prevent deadlock # Drain stderr in a background thread to prevent deadlock
import threading import threading
@@ -645,9 +686,15 @@ def _ais_stream_loop():
if not mmsi: if not mmsi:
continue continue
# Telemetry: stamp the timestamp of the most recent real
# vessel message. ais_proxy_status() reads this to decide
# whether the stream is currently "connected" — i.e. has
# any data flowed in the last 60s.
global _last_msg_at
with _vessels_lock: with _vessels_lock:
_last_msg_at = time.time()
if mmsi not in _vessels: if mmsi not in _vessels:
_vessels[mmsi] = {"_updated": time.time()} _vessels[mmsi] = {"_updated": _last_msg_at}
vessel = _vessels[mmsi] vessel = _vessels[mmsi]
# Update position from PositionReport or StandardClassBPositionReport # Update position from PositionReport or StandardClassBPositionReport
+34
View File
@@ -51,6 +51,15 @@ API_REGISTRY = [
"url": "https://aisstream.io/", "url": "https://aisstream.io/",
"required": True, "required": True,
}, },
{
"id": "gfw_api_token",
"env_key": "GFW_API_TOKEN",
"name": "Global Fishing Watch",
"description": "Bearer token for Global Fishing Watch fishing-vessel activity events (Fishing Activity map layer). Free registration at globalfishingwatch.org.",
"category": "Maritime",
"url": "https://globalfishingwatch.org/our-apis/",
"required": False,
},
{ {
"id": "adsb_lol", "id": "adsb_lol",
"env_key": None, "env_key": None,
@@ -150,6 +159,31 @@ API_REGISTRY = [
"url": "https://finnhub.io/register", "url": "https://finnhub.io/register",
"required": False, "required": False,
}, },
# Issue #298 (tg12): Sentinel Hub / Copernicus Data Space Ecosystem
# credentials were previously held in browser localStorage / sessionStorage
# by the Settings panel. Moved server-side to the same .env-backed
# store every other third-party API key lives in. The Sentinel proxy
# routes (POST /api/sentinel/token, /tile) now fall back to these
# env values when the request body omits credentials — see
# backend/routers/tools.py for the resolution order.
{
"id": "sentinel_client_id",
"env_key": "SENTINEL_CLIENT_ID",
"name": "Sentinel Hub / Copernicus — Client ID",
"description": "OAuth2 client ID for Copernicus Data Space Ecosystem (CDSE). Required for the Sentinel-2 imagery overlay and the right-click Sentinel-2 Intel Card. Sign in at dataspace.copernicus.eu and create OAuth credentials.",
"category": "Imagery",
"url": "https://dataspace.copernicus.eu/",
"required": False,
},
{
"id": "sentinel_client_secret",
"env_key": "SENTINEL_CLIENT_SECRET",
"name": "Sentinel Hub / Copernicus — Client Secret",
"description": "OAuth2 client secret paired with the Client ID above. Used by the backend to mint short-lived access tokens against the CDSE identity provider. Stored in the backend .env; never sent to the browser.",
"category": "Imagery",
"url": "https://dataspace.copernicus.eu/",
"required": False,
},
] ]
ALLOWED_ENV_KEYS = { ALLOWED_ENV_KEYS = {
+407 -173
View File
@@ -1,46 +1,90 @@
""" """
Carrier Strike Group OSINT Tracker Carrier Strike Group OSINT Tracker
=================================== ===================================
Scrapes multiple OSINT sources to maintain current estimated positions Maintains estimated positions for US Navy Carrier Strike Groups with
for US Navy Carrier Strike Groups. Updates on startup + 00:00 & 12:00 UTC. honest provenance and freshness signals.
Sources: Issues #244 / #245 / #246 (tg12 external audit):
1. GDELT News API recent carrier movement headlines
2. WikiVoyage / public port-call databases The previous implementation baked a snapshot of USNI News Fleet &
3. Fallback last-known or static OSINT estimates Marine Tracker positions (March 9, 2026) into the registry as
``fallback_lat``/``fallback_lng`` and stamped ``updated = now()``
every time the dossier was rendered. That presented stale editorial
data as live state. It also persisted GDELT-derived positions to the
on-disk cache with no freshness signal, so a single news mention from
months ago could keep overriding the (already-stale) registry default
indefinitely.
Architecture after this PR:
::
backend/data/carrier_seed.json read-only, shipped with image,
used ONCE on first-ever startup
to bootstrap carrier_cache.json.
backend/data/carrier_cache.json mutable, lives in the runtime data
volume, written by every GDELT
refresh + any future source.
Startup flow:
1. ``carrier_cache.json`` exists? load it.
2. Otherwise, copy ``carrier_seed.json`` ``carrier_cache.json``,
then load it. (This happens once, ever, per install.)
3. Background: GDELT fetch runs. Any carrier mentioned in fresh news
gets its entry replaced with the news-derived position.
``position_source_at`` is set to the news article timestamp.
Freshness is a *labelling* decision, not an eviction decision:
- ``position_source_at`` within the configurable freshness window
(default 14 days) ``position_confidence = "recent"``.
- Older than that ``position_confidence = "stale"``.
- Bootstrapped from the seed file (never updated) ``"seed"``.
- No cache entry at all (e.g. a carrier added to the registry after
first install) carrier renders at its homeport with
``"homeport_default"``.
Carriers are never hidden, never teleported, never disappeared. The
position the user sees is always the last position the system actually
observed, with an honest "as-of" timestamp the UI can render however
it likes. A year from now, the runtime cache reflects whatever this
install has observed via GDELT not the seed snapshot.
""" """
import re import os
import json import json
import time import time
import logging import logging
import threading import threading
import random import random
from datetime import datetime, timezone import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path from pathlib import Path
from typing import Dict, List, Optional from typing import Any, Dict, List, Optional, Tuple
from services.network_utils import fetch_with_curl from services.network_utils import fetch_with_curl
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# ----------------------------------------------------------------- # -----------------------------------------------------------------
# Carrier registry: hull number → metadata + fallback position # Carrier registry: hull number → identity only.
#
# Issue #244 (tg12): the previous registry carried hard-coded
# ``fallback_lat``/``fallback_lng`` that were dated editorial
# snapshots from a 2026-03-09 article. Those fields are DELETED. The
# registry is now identity + homeport only; positions are sourced
# exclusively from carrier_cache.json (and via that, from the
# bootstrap seed or live OSINT).
# ----------------------------------------------------------------- # -----------------------------------------------------------------
CARRIER_REGISTRY: Dict[str, dict] = { CARRIER_REGISTRY: Dict[str, dict] = {
# Fallback positions sourced from USNI News Fleet & Marine Tracker (Mar 9, 2026)
# https://news.usni.org/2026/03/09/usni-news-fleet-and-marine-tracker-march-9-2026
# --- Bremerton, WA (Naval Base Kitsap) --- # --- Bremerton, WA (Naval Base Kitsap) ---
# Distinct pier positions along Sinclair Inlet so carriers don't stack
"CVN-68": { "CVN-68": {
"name": "USS Nimitz (CVN-68)", "name": "USS Nimitz (CVN-68)",
"wiki": "https://en.wikipedia.org/wiki/USS_Nimitz", "wiki": "https://en.wikipedia.org/wiki/USS_Nimitz",
"homeport": "Bremerton, WA", "homeport": "Bremerton, WA",
"homeport_lat": 47.5535, "homeport_lat": 47.5535,
"homeport_lng": -122.6400, "homeport_lng": -122.6400,
"fallback_lat": 47.5535,
"fallback_lng": -122.6400,
"fallback_heading": 90,
"fallback_desc": "Bremerton, WA (Maintenance)",
}, },
"CVN-76": { "CVN-76": {
"name": "USS Ronald Reagan (CVN-76)", "name": "USS Ronald Reagan (CVN-76)",
@@ -48,23 +92,14 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Bremerton, WA", "homeport": "Bremerton, WA",
"homeport_lat": 47.5580, "homeport_lat": 47.5580,
"homeport_lng": -122.6360, "homeport_lng": -122.6360,
"fallback_lat": 47.5580,
"fallback_lng": -122.6360,
"fallback_heading": 90,
"fallback_desc": "Bremerton, WA (Decommissioning)",
}, },
# --- Norfolk, VA (Naval Station Norfolk) --- # --- Norfolk, VA (Naval Station Norfolk) ---
# Piers run N-S along Willoughby Bay; each carrier gets a distinct berth
"CVN-69": { "CVN-69": {
"name": "USS Dwight D. Eisenhower (CVN-69)", "name": "USS Dwight D. Eisenhower (CVN-69)",
"wiki": "https://en.wikipedia.org/wiki/USS_Dwight_D._Eisenhower", "wiki": "https://en.wikipedia.org/wiki/USS_Dwight_D._Eisenhower",
"homeport": "Norfolk, VA", "homeport": "Norfolk, VA",
"homeport_lat": 36.9465, "homeport_lat": 36.9465,
"homeport_lng": -76.3265, "homeport_lng": -76.3265,
"fallback_lat": 36.9465,
"fallback_lng": -76.3265,
"fallback_heading": 0,
"fallback_desc": "Norfolk, VA (Post-deployment maintenance)",
}, },
"CVN-78": { "CVN-78": {
"name": "USS Gerald R. Ford (CVN-78)", "name": "USS Gerald R. Ford (CVN-78)",
@@ -72,10 +107,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA", "homeport": "Norfolk, VA",
"homeport_lat": 36.9505, "homeport_lat": 36.9505,
"homeport_lng": -76.3250, "homeport_lng": -76.3250,
"fallback_lat": 18.0,
"fallback_lng": 39.5,
"fallback_heading": 0,
"fallback_desc": "Red Sea — Operation Epic Fury (USNI Mar 9)",
}, },
"CVN-74": { "CVN-74": {
"name": "USS John C. Stennis (CVN-74)", "name": "USS John C. Stennis (CVN-74)",
@@ -83,10 +114,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA", "homeport": "Norfolk, VA",
"homeport_lat": 36.9540, "homeport_lat": 36.9540,
"homeport_lng": -76.3235, "homeport_lng": -76.3235,
"fallback_lat": 36.98,
"fallback_lng": -76.43,
"fallback_heading": 0,
"fallback_desc": "Newport News, VA (RCOH refueling overhaul)",
}, },
"CVN-75": { "CVN-75": {
"name": "USS Harry S. Truman (CVN-75)", "name": "USS Harry S. Truman (CVN-75)",
@@ -94,10 +121,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA", "homeport": "Norfolk, VA",
"homeport_lat": 36.9580, "homeport_lat": 36.9580,
"homeport_lng": -76.3220, "homeport_lng": -76.3220,
"fallback_lat": 36.0,
"fallback_lng": 15.0,
"fallback_heading": 0,
"fallback_desc": "Mediterranean Sea deployment (USNI Mar 9)",
}, },
"CVN-77": { "CVN-77": {
"name": "USS George H.W. Bush (CVN-77)", "name": "USS George H.W. Bush (CVN-77)",
@@ -105,23 +128,14 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Norfolk, VA", "homeport": "Norfolk, VA",
"homeport_lat": 36.9620, "homeport_lat": 36.9620,
"homeport_lng": -76.3210, "homeport_lng": -76.3210,
"fallback_lat": 36.5,
"fallback_lng": -74.0,
"fallback_heading": 0,
"fallback_desc": "Atlantic — Pre-deployment workups (USNI Mar 9)",
}, },
# --- San Diego, CA (Naval Base San Diego) --- # --- San Diego, CA (Naval Base San Diego) ---
# Carrier piers along the east shore of San Diego Bay, spread N-S
"CVN-70": { "CVN-70": {
"name": "USS Carl Vinson (CVN-70)", "name": "USS Carl Vinson (CVN-70)",
"wiki": "https://en.wikipedia.org/wiki/USS_Carl_Vinson", "wiki": "https://en.wikipedia.org/wiki/USS_Carl_Vinson",
"homeport": "San Diego, CA", "homeport": "San Diego, CA",
"homeport_lat": 32.6840, "homeport_lat": 32.6840,
"homeport_lng": -117.1290, "homeport_lng": -117.1290,
"fallback_lat": 32.6840,
"fallback_lng": -117.1290,
"fallback_heading": 180,
"fallback_desc": "San Diego, CA (Homeport)",
}, },
"CVN-71": { "CVN-71": {
"name": "USS Theodore Roosevelt (CVN-71)", "name": "USS Theodore Roosevelt (CVN-71)",
@@ -129,10 +143,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "San Diego, CA", "homeport": "San Diego, CA",
"homeport_lat": 32.6885, "homeport_lat": 32.6885,
"homeport_lng": -117.1280, "homeport_lng": -117.1280,
"fallback_lat": 32.6885,
"fallback_lng": -117.1280,
"fallback_heading": 180,
"fallback_desc": "San Diego, CA (Maintenance)",
}, },
"CVN-72": { "CVN-72": {
"name": "USS Abraham Lincoln (CVN-72)", "name": "USS Abraham Lincoln (CVN-72)",
@@ -140,10 +150,6 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "San Diego, CA", "homeport": "San Diego, CA",
"homeport_lat": 32.6925, "homeport_lat": 32.6925,
"homeport_lng": -117.1275, "homeport_lng": -117.1275,
"fallback_lat": 20.0,
"fallback_lng": 64.0,
"fallback_heading": 0,
"fallback_desc": "Arabian Sea — Operation Epic Fury (USNI Mar 9)",
}, },
# --- Yokosuka, Japan (CFAY) --- # --- Yokosuka, Japan (CFAY) ---
"CVN-73": { "CVN-73": {
@@ -152,16 +158,18 @@ CARRIER_REGISTRY: Dict[str, dict] = {
"homeport": "Yokosuka, Japan", "homeport": "Yokosuka, Japan",
"homeport_lat": 35.2830, "homeport_lat": 35.2830,
"homeport_lng": 139.6700, "homeport_lng": 139.6700,
"fallback_lat": 35.2830,
"fallback_lng": 139.6700,
"fallback_heading": 180,
"fallback_desc": "Yokosuka, Japan (Forward deployed)",
}, },
} }
# ----------------------------------------------------------------- # -----------------------------------------------------------------
# Region → approximate center coordinates # Region → approximate center coordinates.
# Used to map textual geographic descriptions to lat/lng #
# Issue #245 (tg12): converting a region name straight into precise
# map coordinates is false precision. We still use this table to
# infer a coarse position from a headline mention, but the resulting
# carrier object is now stamped ``position_confidence = "approximate"``
# so the UI can render an uncertainty radius / dimmed icon. The
# centroid is a best-effort midpoint of the named body of water.
# ----------------------------------------------------------------- # -----------------------------------------------------------------
REGION_COORDS: Dict[str, tuple] = { REGION_COORDS: Dict[str, tuple] = {
# Oceans & Seas # Oceans & Seas
@@ -220,9 +228,39 @@ REGION_COORDS: Dict[str, tuple] = {
} }
# ----------------------------------------------------------------- # -----------------------------------------------------------------
# Cache file for persisting positions between restarts # Files
# ----------------------------------------------------------------- # -----------------------------------------------------------------
CACHE_FILE = Path(__file__).parent.parent / "carrier_cache.json" #
# The seed lives in the read-only image data dir (it ships with each
# release). The cache lives in the same data dir but is written at
# runtime; under Docker compose this dir is volume-mounted so the
# cache persists across container restarts, which is the whole point
# of the seed-then-observe model — the user's runtime observations
# survive image upgrades.
SEED_FILE = Path(__file__).parent.parent / "data" / "carrier_seed.json"
CACHE_FILE = Path(__file__).parent.parent / "data" / "carrier_cache.json"
# -----------------------------------------------------------------
# Freshness window for position_confidence labeling. Issue #246 (tg12):
# previously persisted cache entries had no freshness signal at all.
# After this change, the position itself is preserved (we never lose
# what was last observed) but the confidence label flips from
# "recent" to "stale" once the underlying source is older than this
# window. Operator-overridable via env var.
# -----------------------------------------------------------------
_DEFAULT_FRESHNESS_WINDOW_DAYS = 14
def _freshness_window_days() -> int:
raw = str(os.environ.get("SHADOWBROKER_CARRIER_FRESHNESS_DAYS", "") or "").strip()
if not raw:
return _DEFAULT_FRESHNESS_WINDOW_DAYS
try:
n = int(raw)
return n if n > 0 else _DEFAULT_FRESHNESS_WINDOW_DAYS
except (TypeError, ValueError):
return _DEFAULT_FRESHNESS_WINDOW_DAYS
_carrier_positions: Dict[str, dict] = {} _carrier_positions: Dict[str, dict] = {}
_positions_lock = threading.Lock() _positions_lock = threading.Lock()
@@ -234,25 +272,159 @@ _GDELT_REQUEST_DELAY_SECONDS = 1.25
_GDELT_REQUEST_JITTER_SECONDS = 0.35 _GDELT_REQUEST_JITTER_SECONDS = 0.35
def _now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
def _parse_iso(ts: str) -> Optional[datetime]:
if not ts:
return None
try:
# Python's fromisoformat accepts +00:00 but not 'Z' until 3.11.
normalized = ts.replace("Z", "+00:00")
dt = datetime.fromisoformat(normalized)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except (TypeError, ValueError):
return None
def _compute_position_confidence(entry: dict, *, now: Optional[datetime] = None) -> str:
"""Return the public confidence label for a carrier cache entry.
Order of precedence:
- explicit "homeport_default" / "seed" labels are preserved.
- dated entries (with position_source_at) are "recent" if within
the configured freshness window, else "stale".
- missing position_source_at falls through to "stale".
"""
raw_label = str(entry.get("position_confidence", "") or "").strip()
# Explicit "kind of provenance" labels are preserved as-is. They
# describe HOW we got the position, not WHEN — a fresh headline-to-
# centroid match (#245) is still imprecise no matter how recently
# it was observed, and the seed (#244) is always the seed.
if raw_label in {"seed", "homeport_default", "approximate"}:
# Approximate entries can still age into "stale_approximate" if
# they fall out of the freshness window — that distinction lets
# the UI render a different badge for old-and-imprecise vs
# recent-and-imprecise. seed/homeport_default never age (they
# were never timestamped against real observations).
if raw_label == "approximate":
source_at = _parse_iso(str(entry.get("position_source_at", "") or ""))
if source_at is not None:
reference = now or datetime.now(timezone.utc)
if reference - source_at > timedelta(days=_freshness_window_days()):
return "stale_approximate"
return raw_label
source_at = _parse_iso(str(entry.get("position_source_at", "") or ""))
if not source_at:
return "stale"
reference = now or datetime.now(timezone.utc)
window = timedelta(days=_freshness_window_days())
if reference - source_at <= window:
return "recent"
return "stale"
def _load_seed() -> Dict[str, dict]:
"""Load the read-only seed file shipped with the image.
Returns a hullentry dict (no _meta wrapper). Missing or malformed
seed files yield an empty dict the caller falls back to homeport
defaults.
"""
try:
if not SEED_FILE.exists():
logger.info("Carrier seed file not present at %s; first-run will fall back to homeport defaults", SEED_FILE)
return {}
raw = json.loads(SEED_FILE.read_text(encoding="utf-8"))
carriers = raw.get("carriers", {}) if isinstance(raw, dict) else {}
if not isinstance(carriers, dict):
return {}
logger.info("Carrier seed loaded: %d entries from %s", len(carriers), SEED_FILE)
return carriers
except (IOError, OSError, json.JSONDecodeError, ValueError) as e:
logger.warning("Failed to load carrier seed file %s: %s", SEED_FILE, e)
return {}
def _load_cache() -> Dict[str, dict]: def _load_cache() -> Dict[str, dict]:
"""Load cached carrier positions from disk.""" """Load the mutable cache (last-known positions persisted between restarts)."""
try: try:
if CACHE_FILE.exists(): if CACHE_FILE.exists():
data = json.loads(CACHE_FILE.read_text()) data = json.loads(CACHE_FILE.read_text(encoding="utf-8"))
logger.info(f"Carrier cache loaded: {len(data)} carriers from {CACHE_FILE}") if isinstance(data, dict):
return data logger.info("Carrier cache loaded: %d carriers from %s", len(data), CACHE_FILE)
return data
except (IOError, OSError, json.JSONDecodeError, ValueError) as e: except (IOError, OSError, json.JSONDecodeError, ValueError) as e:
logger.warning(f"Failed to load carrier cache: {e}") logger.warning("Failed to load carrier cache: %s", e)
return {} return {}
def _save_cache(positions: Dict[str, dict]): def _save_cache(positions: Dict[str, dict]) -> None:
"""Persist carrier positions to disk.""" """Persist the mutable cache. Atomic write (temp + rename) so a crash
mid-write can't leave the file truncated."""
try: try:
CACHE_FILE.write_text(json.dumps(positions, indent=2)) CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
logger.info(f"Carrier cache saved: {len(positions)} carriers") tmp = CACHE_FILE.with_suffix(CACHE_FILE.suffix + ".tmp")
tmp.write_text(json.dumps(positions, indent=2), encoding="utf-8")
# On Windows os.replace is atomic and overwrites existing files.
os.replace(tmp, CACHE_FILE)
logger.info("Carrier cache saved: %d carriers", len(positions))
except (IOError, OSError) as e: except (IOError, OSError) as e:
logger.warning(f"Failed to save carrier cache: {e}") logger.warning("Failed to save carrier cache: %s", e)
def _homeport_entry_for(hull: str) -> Optional[dict]:
"""Return a homeport-default cache entry for a hull, or None if the
hull is not in the registry."""
info = CARRIER_REGISTRY.get(hull)
if not info:
return None
return {
"lat": info["homeport_lat"],
"lng": info["homeport_lng"],
"heading": 0,
"desc": f"{info['homeport']} (no observations yet)",
"source": f"Homeport default ({info['homeport']})",
"source_url": info.get("wiki", ""),
"position_source_at": _now_iso(),
"position_confidence": "homeport_default",
}
def _bootstrap_cache_if_missing() -> Dict[str, dict]:
"""One-shot: if no cache exists, materialize one from the seed file.
Returns the cache contents (hullentry). On first-ever startup,
this writes ``carrier_cache.json`` so subsequent restarts skip the
seed entirely. Operator-deleted caches re-bootstrap the same way
operators can use that to "reset" carrier positions, but it's an
explicit operator action.
"""
if CACHE_FILE.exists():
return _load_cache()
seed = _load_seed()
if not seed:
# No seed file either. Build a homeport-default cache so the
# first save_cache call still produces something honest.
homeports: Dict[str, dict] = {}
for hull in CARRIER_REGISTRY:
entry = _homeport_entry_for(hull)
if entry is not None:
homeports[hull] = entry
if homeports:
_save_cache(homeports)
return homeports
# Persist the seed as the first cache so subsequent runs skip this branch.
_save_cache(seed)
logger.info("Carrier cache bootstrapped from seed (first-ever startup)")
return dict(seed)
def _match_region(text: str) -> Optional[tuple]: def _match_region(text: str) -> Optional[tuple]:
@@ -270,10 +442,8 @@ def _match_carrier(text: str) -> Optional[str]:
for hull, info in CARRIER_REGISTRY.items(): for hull, info in CARRIER_REGISTRY.items():
hull_check = hull.lower().replace("-", "") hull_check = hull.lower().replace("-", "")
name_parts = info["name"].lower() name_parts = info["name"].lower()
# Match hull number (e.g., "CVN-78", "CVN78")
if hull.lower() in text_lower or hull_check in text_lower.replace("-", ""): if hull.lower() in text_lower or hull_check in text_lower.replace("-", ""):
return hull return hull
# Match ship name (e.g., "Ford", "Eisenhower", "Vinson")
ship_name = name_parts.split("(")[0].strip() ship_name = name_parts.split("(")[0].strip()
last_name = ship_name.split()[-1] if ship_name else "" last_name = ship_name.split()[-1] if ship_name else ""
if last_name and len(last_name) > 3 and last_name in text_lower: if last_name and len(last_name) > 3 and last_name in text_lower:
@@ -323,8 +493,9 @@ def _fetch_gdelt_carrier_news() -> List[dict]:
articles = data.get("articles", []) articles = data.get("articles", [])
for art in articles: for art in articles:
title = art.get("title", "") title = art.get("title", "")
url = art.get("url", "") article_url = art.get("url", "")
results.append({"title": title, "url": url}) article_at = art.get("seendate") or art.get("date") or ""
results.append({"title": title, "url": article_url, "seendate": article_at})
except (ConnectionError, TimeoutError, ValueError, KeyError, OSError) as e: except (ConnectionError, TimeoutError, ValueError, KeyError, OSError) as e:
logger.debug(f"GDELT search failed for '{term}': {e}") logger.debug(f"GDELT search failed for '{term}': {e}")
continue continue
@@ -340,108 +511,175 @@ def _fetch_gdelt_carrier_news() -> List[dict]:
return results return results
def _gdelt_seendate_to_iso(seendate: str) -> Optional[str]:
"""GDELT returns YYYYMMDDhhmmss (UTC). Convert to ISO8601 for
position_source_at. Returns None if the input is unparseable."""
raw = (seendate or "").strip()
if len(raw) < 8 or not raw.isdigit():
return None
try:
dt = datetime.strptime(raw[:14] if len(raw) >= 14 else raw[:8] + "000000", "%Y%m%d%H%M%S")
return dt.replace(tzinfo=timezone.utc).isoformat()
except (TypeError, ValueError):
return None
def _parse_carrier_positions_from_news(articles: List[dict]) -> Dict[str, dict]: def _parse_carrier_positions_from_news(articles: List[dict]) -> Dict[str, dict]:
"""Parse carrier positions from news article titles and descriptions.""" """Parse carrier positions from news article titles.
Issue #245 (tg12): the position is a region centroid, which is
coarse we now stamp ``position_confidence = "approximate"`` so
the UI can render that uncertainty. Issue #244: the
``position_source_at`` field is the news article's actual seen
date, NOT now(), so the freshness check correctly flips entries
to "stale" once they age past the configured window.
"""
updates: Dict[str, dict] = {} updates: Dict[str, dict] = {}
for article in articles: for article in articles:
title = article.get("title", "") title = article.get("title", "")
# Try to match a carrier from the title
hull = _match_carrier(title) hull = _match_carrier(title)
if not hull: if not hull:
continue continue
# Try to match a region from the title
coords = _match_region(title) coords = _match_region(title)
if not coords: if not coords:
continue continue
# Only update if we haven't seen this carrier yet (first match wins — most recent) # First match wins (most recent article, GDELT returns newest first
# per term).
if hull not in updates: if hull not in updates:
iso_at = _gdelt_seendate_to_iso(str(article.get("seendate", ""))) or _now_iso()
updates[hull] = { updates[hull] = {
"lat": coords[0], "lat": coords[0],
"lng": coords[1], "lng": coords[1],
"heading": 0,
"desc": title[:100], "desc": title[:100],
"source": "GDELT News API", "source": "GDELT News API (headline region match — approximate)",
"source_url": article.get("url", "https://api.gdeltproject.org"), "source_url": article.get("url", "https://api.gdeltproject.org"),
"updated": datetime.now(timezone.utc).isoformat(), "position_source_at": iso_at,
# Headline-to-centroid match is explicitly approximate.
"position_confidence": "approximate",
} }
logger.info( logger.info(
f"Carrier update: {CARRIER_REGISTRY[hull]['name']}{coords} (from: {title[:80]})" "Carrier update: %s%s (from: %s)",
CARRIER_REGISTRY[hull]["name"],
coords,
title[:80],
) )
return updates return updates
def _load_carrier_fallbacks() -> Dict[str, dict]: def _enrich_for_rendering(hull: str, entry: dict, *, now: Optional[datetime] = None) -> dict:
"""Build carrier positions from static fallbacks + disk cache (instant, no network).""" """Add live computed fields (confidence label, last_osint_update)
positions: Dict[str, dict] = {} on top of the persisted cache entry. The persisted entry is left
for hull, info in CARRIER_REGISTRY.items(): untouched; this function builds the public-facing object.
positions[hull] = { """
"name": info["name"], info = CARRIER_REGISTRY.get(hull, {})
"lat": info["fallback_lat"], confidence = _compute_position_confidence(entry, now=now)
"lng": info["fallback_lng"], return {
"heading": info["fallback_heading"], "name": entry.get("name", info.get("name", hull)),
"desc": info["fallback_desc"], "lat": entry["lat"],
"wiki": info["wiki"], "lng": entry["lng"],
"source": "USNI News Fleet & Marine Tracker", "heading": entry.get("heading", 0),
"source_url": "https://news.usni.org/category/fleet-tracker", "desc": entry.get("desc", ""),
"updated": datetime.now(timezone.utc).isoformat(), "wiki": entry.get("wiki", info.get("wiki", "")),
} "source": entry.get("source", "OSINT estimated position"),
"source_url": entry.get("source_url", ""),
# Overlay cached positions from previous runs (may have GDELT data) "position_source_at": entry.get("position_source_at", ""),
cached = _load_cache() "position_confidence": confidence,
for hull, cached_pos in cached.items(): # Existing field preserved for backward compatibility with the
if hull in positions: # current frontend ShipPopup; now reflects the SOURCE's observed
if cached_pos.get("source", "").startswith("GDELT") or cached_pos.get( # time (not now()), so "last reported X days ago" is honest.
"source", "" "last_osint_update": entry.get("position_source_at", ""),
).startswith("News"): # Convenience boolean for the UI: true when the position is
positions[hull].update( # NOT live OSINT (used to render dimmed icons / badges).
{ "is_fallback": confidence in {"seed", "stale", "stale_approximate", "homeport_default"},
"lat": cached_pos["lat"], }
"lng": cached_pos["lng"],
"desc": cached_pos.get("desc", positions[hull]["desc"]),
"source": cached_pos.get("source", "Cached OSINT"),
"updated": cached_pos.get("updated", ""),
}
)
return positions
def update_carrier_positions(): def update_carrier_positions() -> None:
"""Main update function — called on startup and every 12h. """Refresh carrier positions.
Phase 1 (instant): publish fallback + cached positions so the map has carriers immediately. Phase 1 (instant): publish whatever's in carrier_cache.json (or
Phase 2 (slow): query GDELT for fresh OSINT positions and update in-place. bootstrap from seed on first-ever run), so the map has carriers
immediately.
Phase 2 (slow): query GDELT and replace position entries for any
carrier mentioned in fresh news. Persist back to cache.
""" """
global _last_update global _last_update
# --- Phase 1: instant fallback + cache --- # --- Phase 1: instant cache (bootstrap from seed on first-ever run) ---
positions = _load_carrier_fallbacks() positions = _bootstrap_cache_if_missing()
# Ensure every registered hull has SOMETHING in the cache. A hull
# the seed didn't cover (e.g. added after install) renders at its
# homeport with "homeport_default" confidence.
for hull in CARRIER_REGISTRY:
if hull not in positions:
entry = _homeport_entry_for(hull)
if entry is not None:
positions[hull] = entry
with _positions_lock: with _positions_lock:
# Only overwrite if positions are currently empty (first startup).
# If we already have data from a previous cycle, keep it while GDELT runs.
if not _carrier_positions: if not _carrier_positions:
_carrier_positions.update(positions) _carrier_positions.update(positions)
_last_update = datetime.now(timezone.utc) _last_update = datetime.now(timezone.utc)
logger.info( logger.info(
f"Carrier tracker: {len(positions)} carriers loaded from fallback/cache (GDELT enrichment starting...)" "Carrier tracker: %d carriers loaded from cache (USNI + GDELT enrichment starting...)",
len(positions),
) )
# --- Phase 2: slow GDELT enrichment --- # --- Phase 2: USNI Fleet & Marine Tracker (PRIMARY source) ---
#
# USNI publishes a weekly editorial tracker with each carrier's
# actual operating area, parsed from explicit prose like
# "The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
# These positions are tagged ``position_confidence: "recent"`` because
# they reflect actual reporting, not headline-keyword centroids.
# USNI updates are preferred over GDELT — they're authoritative on
# US Navy positions where GDELT is just article-title text mining.
try:
from services.fetchers.usni_fleet_tracker import (
fetch_latest_fleet_tracker_positions,
)
usni_positions = fetch_latest_fleet_tracker_positions()
for hull, pos in usni_positions.items():
positions[hull] = pos
logger.info(
"Carrier USNI update: %s%s",
CARRIER_REGISTRY[hull]["name"],
pos.get("desc", ""),
)
except Exception as e:
logger.warning("USNI fleet-tracker fetch failed: %s", e)
# --- Phase 3: GDELT enrichment (SECONDARY — fills gaps) ---
#
# Used only to backfill carriers USNI didn't mention this week. The
# position is stamped ``approximate`` so the UI knows it's a
# headline-centroid match (Issue #245).
try: try:
articles = _fetch_gdelt_carrier_news() articles = _fetch_gdelt_carrier_news()
news_positions = _parse_carrier_positions_from_news(articles) news_positions = _parse_carrier_positions_from_news(articles)
for hull, pos in news_positions.items(): for hull, pos in news_positions.items():
if hull in positions: # Only overwrite if the existing entry is NOT a recent USNI
positions[hull].update(pos) # observation. A "recent" USNI position is higher-confidence
logger.info(f"Carrier OSINT: updated {CARRIER_REGISTRY[hull]['name']} from news") # than a GDELT headline-centroid match — don't let GDELT
# demote a real position to an approximate one.
existing = positions.get(hull, {})
existing_conf = _compute_position_confidence(existing)
if existing_conf == "recent":
continue
positions[hull] = pos
logger.info(
"Carrier OSINT: updated %s from GDELT news",
CARRIER_REGISTRY[hull]["name"],
)
except (ValueError, KeyError, json.JSONDecodeError, OSError) as e: except (ValueError, KeyError, json.JSONDecodeError, OSError) as e:
logger.warning(f"GDELT carrier fetch failed: {e}") logger.warning("GDELT carrier fetch failed: %s", e)
# Save and update the global state with enriched positions
with _positions_lock: with _positions_lock:
_carrier_positions.clear() _carrier_positions.clear()
_carrier_positions.update(positions) _carrier_positions.update(positions)
@@ -449,21 +687,15 @@ def update_carrier_positions():
_save_cache(positions) _save_cache(positions)
sources = {} confidences: Dict[str, int] = {}
for p in positions.values(): for entry in positions.values():
src = p.get("source", "unknown") label = _compute_position_confidence(entry)
sources[src] = sources.get(src, 0) + 1 confidences[label] = confidences.get(label, 0) + 1
logger.info(f"Carrier tracker: {len(positions)} carriers updated. Sources: {sources}") logger.info("Carrier tracker: %d carriers updated. Confidence: %s", len(positions), confidences)
def _deconflict_positions(result: List[dict]) -> List[dict]: def _deconflict_positions(result: List[dict]) -> List[dict]:
"""Offset carriers that share identical coordinates so they don't stack. """Offset carriers that share identical coordinates so they don't stack."""
At port: offset along the pier axis (~500m / 0.004° apart).
At sea: offset perpendicular to each other (~0.08° / ~9km apart)
so they're visibly separate but clearly operating together.
"""
# Group by rounded lat/lng (within ~0.01° ≈ 1km = same spot)
from collections import defaultdict from collections import defaultdict
groups: dict[str, list[int]] = defaultdict(list) groups: dict[str, list[int]] = defaultdict(list)
@@ -475,7 +707,6 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
if len(indices) < 2: if len(indices) < 2:
continue continue
n = len(indices) n = len(indices)
# Determine if this is a port (near a homeport) or at sea
sample = result[indices[0]] sample = result[indices[0]]
at_port = any( at_port = any(
abs(sample["lat"] - info.get("homeport_lat", 0)) < 0.05 abs(sample["lat"] - info.get("homeport_lat", 0)) < 0.05
@@ -484,7 +715,6 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
) )
if at_port: if at_port:
# Use each carrier's distinct homeport pier coordinates
for idx in indices: for idx in indices:
carrier = result[idx] carrier = result[idx]
hull = None hull = None
@@ -497,8 +727,7 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
carrier["lat"] = info["homeport_lat"] carrier["lat"] = info["homeport_lat"]
carrier["lng"] = info["homeport_lng"] carrier["lng"] = info["homeport_lng"]
else: else:
# At sea: spread in a line perpendicular to travel (~0.08° apart) spacing = 0.08
spacing = 0.08 # ~9km — close enough to see they're together
start_offset = -(n - 1) * spacing / 2 start_offset = -(n - 1) * spacing / 2
for j, idx in enumerate(indices): for j, idx in enumerate(indices):
result[idx]["lng"] += start_offset + j * spacing result[idx]["lng"] += start_offset + j * spacing
@@ -507,36 +736,44 @@ def _deconflict_positions(result: List[dict]) -> List[dict]:
def get_carrier_positions() -> List[dict]: def get_carrier_positions() -> List[dict]:
"""Return current carrier positions for the data pipeline.""" """Return current carrier positions for the data pipeline.
Each entry has the full provenance + freshness fields; the UI can
decide how to render them. Carriers are never hidden only
labeled.
"""
now = datetime.now(timezone.utc)
with _positions_lock: with _positions_lock:
result = [] result: List[dict] = []
for hull, pos in _carrier_positions.items(): for hull, entry in _carrier_positions.items():
info = CARRIER_REGISTRY.get(hull, {}) enriched = _enrich_for_rendering(hull, entry, now=now)
result.append( result.append(
{ {
"name": pos.get("name", info.get("name", hull)), "name": enriched["name"],
"type": "carrier", "type": "carrier",
"lat": pos["lat"], "lat": enriched["lat"],
"lng": pos["lng"], "lng": enriched["lng"],
"heading": None, # Heading unknown for carriers — OSINT cannot determine true heading "heading": None, # OSINT cannot determine true heading.
"sog": 0, "sog": 0,
"cog": 0, "cog": 0,
"country": "United States", "country": "United States",
"desc": pos.get("desc", ""), "desc": enriched["desc"],
"wiki": pos.get("wiki", info.get("wiki", "")), "wiki": enriched["wiki"],
"estimated": True, "estimated": True,
"source": pos.get("source", "OSINT estimated position"), "source": enriched["source"],
"source_url": pos.get( "source_url": enriched["source_url"],
"source_url", "https://news.usni.org/category/fleet-tracker" "last_osint_update": enriched["last_osint_update"],
), # New fields (additive — existing UI continues to work):
"last_osint_update": pos.get("updated", ""), "position_source_at": enriched["position_source_at"],
"position_confidence": enriched["position_confidence"],
"is_fallback": enriched["is_fallback"],
} }
) )
return _deconflict_positions(result) return _deconflict_positions(result)
# ----------------------------------------------------------------- # -----------------------------------------------------------------
# Scheduler: runs at startup, then at 00:00 and 12:00 UTC daily # Scheduler: runs at startup, then at 00:00 and 12:00 UTC daily.
# ----------------------------------------------------------------- # -----------------------------------------------------------------
_scheduler_thread: Optional[threading.Thread] = None _scheduler_thread: Optional[threading.Thread] = None
_scheduler_stop = threading.Event() _scheduler_stop = threading.Event()
@@ -544,7 +781,6 @@ _scheduler_stop = threading.Event()
def _scheduler_loop(): def _scheduler_loop():
"""Background thread that triggers updates at 00:00 and 12:00 UTC.""" """Background thread that triggers updates at 00:00 and 12:00 UTC."""
# Initial update on startup
try: try:
update_carrier_positions() update_carrier_positions()
except Exception as e: except Exception as e:
@@ -552,7 +788,6 @@ def _scheduler_loop():
while not _scheduler_stop.is_set(): while not _scheduler_stop.is_set():
now = datetime.now(timezone.utc) now = datetime.now(timezone.utc)
# Next target: 00:00 or 12:00 UTC, whichever is sooner
hour = now.hour hour = now.hour
if hour < 12: if hour < 12:
next_hour = 12 next_hour = 12
@@ -561,18 +796,17 @@ def _scheduler_loop():
next_run = now.replace(hour=next_hour % 24, minute=0, second=0, microsecond=0) next_run = now.replace(hour=next_hour % 24, minute=0, second=0, microsecond=0)
if next_hour == 24: if next_hour == 24:
from datetime import timedelta
next_run = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0) next_run = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
wait_seconds = (next_run - now).total_seconds() wait_seconds = (next_run - now).total_seconds()
logger.info( logger.info(
f"Carrier tracker: next update at {next_run.isoformat()} ({wait_seconds/3600:.1f}h)" "Carrier tracker: next update at %s (%.1fh)",
next_run.isoformat(),
wait_seconds / 3600,
) )
# Wait until next scheduled time, or until stop event
if _scheduler_stop.wait(timeout=wait_seconds): if _scheduler_stop.wait(timeout=wait_seconds):
break # Stop event was set break
try: try:
update_carrier_positions() update_carrier_positions()
+419 -93
View File
@@ -17,6 +17,9 @@ _KNOWN_CCTV_MEDIA_HOST_ALIASES = {
# Trusted upstream occasionally publishes a typo for this Georgia camera # Trusted upstream occasionally publishes a typo for this Georgia camera
# host. Normalize it at ingest so the proxy and client stay consistent. # host. Normalize it at ingest so the proxy and client stay consistent.
"navigatos-c2c.dot.ga.gov": "navigator-c2c.dot.ga.gov", "navigatos-c2c.dot.ga.gov": "navigator-c2c.dot.ga.gov",
# TravelIQ staging hosts occasionally appear in 511 catalog metadata.
"on.stage.traveliq.co": "511on.ca",
"ab.stage.traveliq.co": "511.alberta.ca",
} }
_POINT_WKT_RE = re.compile( _POINT_WKT_RE = re.compile(
@@ -40,6 +43,17 @@ def _normalize_cctv_media_url(raw_url: str) -> str:
return urlunparse(parsed._replace(netloc=netloc)) return urlunparse(parsed._replace(netloc=netloc))
def _ensure_https_url(raw_url: str) -> str:
"""Upgrade http:// media/catalog URLs to https:// at ingest time."""
candidate = _normalize_cctv_media_url(str(raw_url or "").strip())
if not candidate:
return ""
parsed = urlparse(candidate)
if parsed.scheme.lower() == "http":
return urlunparse(parsed._replace(scheme="https"))
return candidate
def _looks_like_direct_cctv_media_url(url: str) -> bool: def _looks_like_direct_cctv_media_url(url: str) -> bool:
candidate = str(url or "").strip().lower() candidate = str(url or "").strip().lower()
if not candidate.startswith(("http://", "https://")): if not candidate.startswith(("http://", "https://")):
@@ -93,6 +107,165 @@ def _parse_wkt_point(raw_point: str) -> tuple[float | None, float | None]:
return lat, lon return lat, lon
def _fetch_traveliq_v2_cameras(
*,
api_url: str,
base_url: str,
id_prefix: str,
source_agency: str,
) -> List[Dict[str, Any]]:
"""Parse TravelIQ-style GET /api/v2/get/cameras feeds (Ontario, Alberta)."""
resp = fetch_with_curl(
api_url,
timeout=30,
headers={"Accept": "application/json"},
)
if not resp or resp.status_code != 200:
logger.error(
"%s CCTV fetch failed: HTTP %s",
source_agency,
resp.status_code if resp else "no response",
)
return []
data = resp.json()
if not isinstance(data, list):
return []
cameras: List[Dict[str, Any]] = []
for cam in data:
if not isinstance(cam, dict):
continue
try:
lat = float(cam.get("Latitude"))
lon = float(cam.get("Longitude"))
except (TypeError, ValueError):
continue
site_id = cam.get("Id")
location = str(cam.get("Location") or cam.get("Roadway") or "Camera")[:120]
views = cam.get("Views") or []
if not views:
continue
for view in views:
if not isinstance(view, dict):
continue
status = str(view.get("Status") or "enabled").strip().lower()
if status and status not in {"enabled", "active"}:
continue
media_url = _ensure_https_url(
urljoin(base_url, str(view.get("Url") or "").strip())
)
if not media_url:
continue
view_id = view.get("Id") or site_id
if site_id is None or view_id is None:
continue
label = str(view.get("Description") or location or "Camera")[:120]
cameras.append(
{
"id": f"{id_prefix}-{site_id}-{view_id}",
"source_agency": source_agency,
"lat": lat,
"lon": lon,
"direction_facing": label,
"media_url": media_url,
"media_type": "image",
"refresh_rate_seconds": 60,
}
)
return cameras
def _fetch_511_datatables_cameras(
*,
list_url: str,
base_url: str,
id_prefix: str,
source_agency: str,
referer: str,
page_size: int = 500,
) -> List[Dict[str, Any]]:
"""Parse 511 DataTables POST /List/GetData/Cameras feeds (Georgia, Florida)."""
cameras: List[Dict[str, Any]] = []
start = 0
draw = 1
while True:
resp = fetch_with_curl(
list_url,
method="POST",
json_data={"draw": draw, "start": start, "length": page_size},
timeout=30,
headers={
"Accept": "application/json",
"Referer": referer,
"Origin": base_url.rstrip("/"),
},
)
if not resp or resp.status_code != 200:
logger.error(
"%s CCTV fetch failed: HTTP %s",
source_agency,
resp.status_code if resp else "no response",
)
break
data = resp.json()
rows = data.get("data") or []
if not rows:
break
for row in rows:
if not isinstance(row, dict):
continue
site_id = row.get("id") or row.get("DT_RowId")
location = row.get("location") or row.get("roadway") or source_agency
lat_lng = row.get("latLng") or {}
geography = lat_lng.get("geography") if isinstance(lat_lng, dict) else {}
lat, lon = _parse_wkt_point(
geography.get("wellKnownText") if isinstance(geography, dict) else ""
)
images = row.get("images") or []
image = next(
(
candidate
for candidate in images
if str(candidate.get("imageUrl") or "").strip()
and not bool(candidate.get("blocked"))
),
None,
)
if not (site_id and image and lat is not None and lon is not None):
continue
media_url = _ensure_https_url(
urljoin(base_url, str(image.get("imageUrl") or "").strip())
)
if not media_url:
continue
cameras.append(
{
"id": f"{id_prefix}-{site_id}",
"source_agency": source_agency,
"lat": lat,
"lon": lon,
"direction_facing": str(location)[:120],
"media_url": media_url,
"media_type": "image",
"refresh_rate_seconds": 60,
}
)
start += len(rows)
draw += 1
total = int(data.get("recordsTotal") or 0)
if total and start >= total:
break
if not total and len(rows) < page_size:
break
return cameras
def init_db(): def init_db():
DB_PATH.parent.mkdir(parents=True, exist_ok=True) DB_PATH.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(DB_PATH)) conn = sqlite3.connect(str(DB_PATH))
@@ -169,7 +342,7 @@ class BaseCCTVIngestor(ABC):
cam.get("lat"), cam.get("lat"),
cam.get("lon"), cam.get("lon"),
cam.get("direction_facing", "Unknown"), cam.get("direction_facing", "Unknown"),
cam.get("media_url"), _ensure_https_url(cam.get("media_url", "")),
cam.get("media_type", _detect_media_type(cam.get("media_url", ""))), cam.get("media_type", _detect_media_type(cam.get("media_url", ""))),
cam.get("refresh_rate_seconds", 60), cam.get("refresh_rate_seconds", 60),
), ),
@@ -454,77 +627,14 @@ class WSDOTIngestor(BaseCCTVIngestor):
class GeorgiaDOTIngestor(BaseCCTVIngestor): class GeorgiaDOTIngestor(BaseCCTVIngestor):
"""Georgia cameras via the public 511GA list feed.""" """Georgia cameras via the public 511GA list feed."""
URL = "https://511ga.org/List/GetData/Cameras"
BASE_URL = "https://511ga.org"
PAGE_SIZE = 500
def fetch_data(self) -> List[Dict[str, Any]]: def fetch_data(self) -> List[Dict[str, Any]]:
cameras = [] return _fetch_511_datatables_cameras(
start = 0 list_url="https://511ga.org/List/GetData/Cameras",
draw = 1 base_url="https://511ga.org",
while True: id_prefix="GDOT",
resp = fetch_with_curl( source_agency="Georgia DOT",
self.URL, referer="https://511ga.org/cctv",
method="POST", )
json_data={"draw": draw, "start": start, "length": self.PAGE_SIZE},
timeout=30,
headers={
"Accept": "application/json",
"Referer": "https://511ga.org/cctv",
"Origin": "https://511ga.org",
},
)
if not resp or resp.status_code != 200:
logger.error(
"Georgia CCTV fetch failed: HTTP %s",
resp.status_code if resp else "no response",
)
break
data = resp.json()
rows = data.get("data") or []
if not rows:
break
for row in rows:
site_id = row.get("id") or row.get("DT_RowId")
location = row.get("location") or row.get("roadway") or "GA Camera"
lat_lng = row.get("latLng") or {}
geography = lat_lng.get("geography") if isinstance(lat_lng, dict) else {}
lat, lon = _parse_wkt_point(geography.get("wellKnownText") if isinstance(geography, dict) else "")
images = row.get("images") or []
image = next(
(
candidate
for candidate in images
if str(candidate.get("imageUrl") or "").strip()
and not bool(candidate.get("blocked"))
),
None,
)
if not (site_id and image and lat is not None and lon is not None):
continue
media_url = _normalize_cctv_media_url(
urljoin(self.BASE_URL, str(image.get("imageUrl") or "").strip())
)
cameras.append(
{
"id": f"GDOT-{site_id}",
"source_agency": "Georgia DOT",
"lat": lat,
"lon": lon,
"direction_facing": str(location)[:120],
"media_url": media_url,
"media_type": "image",
"refresh_rate_seconds": 60,
}
)
start += len(rows)
draw += 1
total = int(data.get("recordsTotal") or 0)
if total and start >= total:
break
if not total and len(rows) < self.PAGE_SIZE:
break
return cameras
class IllinoisDOTIngestor(BaseCCTVIngestor): class IllinoisDOTIngestor(BaseCCTVIngestor):
@@ -1009,17 +1119,72 @@ def _extract_img_src(html_fragment: str):
return None return None
class AsfinagIngestor(BaseCCTVIngestor):
"""Austria ASFINAG motorway webcams (Osiris port)."""
API_URL = "https://odo.asfinag.at/odo/rest/sec/resource/001/json/webcams?language=atDE"
HEADERS = {
"User-Agent": "Shadowbroker-CCTV/1.0",
"Accept": "application/json",
"Referer": "https://www.asfinag.at/",
"Authorization": "Basic bWFwX3dpZGdldDp0ZWdkaXc=",
}
def fetch_data(self) -> List[Dict[str, Any]]:
try:
response = fetch_with_curl(self.API_URL, timeout=15, headers=self.HEADERS)
response.raise_for_status()
payload = response.json()
except Exception as exc:
logger.error("AsfinagIngestor: fetch failed: %s", exc)
return []
if not isinstance(payload, list):
return []
cameras: List[Dict[str, Any]] = []
for cam in payload:
cam_id = cam.get("wcs_id")
lat = cam.get("wgs84_lat")
lon = cam.get("wgs84_lon")
image_url = cam.get("url_campic")
if not cam_id or lat is None or lon is None or not image_url:
continue
if str(cam_id).startswith("Utinform"):
continue
label = cam.get("position_txt") or cam.get("direction_txt") or "ASFINAG Webcam"
secure_url = _ensure_https_url(image_url)
if not secure_url:
continue
cameras.append(
{
"id": f"ASFINAG-{cam_id}",
"source_agency": "ASFINAG Austria",
"lat": float(lat),
"lon": float(lon),
"direction_facing": label,
"media_url": secure_url,
"media_type": "image",
"refresh_rate_seconds": 300,
}
)
logger.info("AsfinagIngestor: parsed %s cameras", len(cameras))
return cameras
class MadridCityIngestor(BaseCCTVIngestor): class MadridCityIngestor(BaseCCTVIngestor):
"""Madrid City Hall traffic cameras from datos.madrid.es KML feed.""" """Madrid City Hall traffic cameras from datos.madrid.es KML feed."""
KML_URL = "http://datos.madrid.es/egob/catalogo/202088-0-trafico-camaras.kml" KML_URL = "https://datos.madrid.es/egob/catalogo/202088-0-trafico-camaras.kml"
def _fetch_kml(self):
response = fetch_with_curl(self.KML_URL, timeout=20)
response.raise_for_status()
return response
def fetch_data(self) -> List[Dict[str, Any]]: def fetch_data(self) -> List[Dict[str, Any]]:
import defusedxml.ElementTree as ET import defusedxml.ElementTree as ET
try: try:
response = fetch_with_curl(self.KML_URL, timeout=20) response = self._fetch_kml()
response.raise_for_status()
except Exception as e: except Exception as e:
logger.error(f"MadridCityIngestor: failed to fetch KML: {e}") logger.error(f"MadridCityIngestor: failed to fetch KML: {e}")
return [] return []
@@ -1055,6 +1220,9 @@ class MadridCityIngestor(BaseCCTVIngestor):
if desc_el is not None and desc_el.text: if desc_el is not None and desc_el.text:
image_url = _extract_img_src(desc_el.text) image_url = _extract_img_src(desc_el.text)
if not image_url:
continue
image_url = _ensure_https_url(image_url)
if not image_url: if not image_url:
continue continue
@@ -1076,6 +1244,153 @@ class MadridCityIngestor(BaseCCTVIngestor):
return cameras return cameras
class Ontario511Ingestor(BaseCCTVIngestor):
"""Ontario highway cameras via 511on.ca TravelIQ API."""
def fetch_data(self) -> List[Dict[str, Any]]:
return _fetch_traveliq_v2_cameras(
api_url="https://511on.ca/api/v2/get/cameras",
base_url="https://511on.ca",
id_prefix="ON511",
source_agency="511 Ontario",
)
class Alberta511Ingestor(BaseCCTVIngestor):
"""Alberta highway cameras via 511 Alberta TravelIQ API."""
def fetch_data(self) -> List[Dict[str, Any]]:
return _fetch_traveliq_v2_cameras(
api_url="https://511.alberta.ca/api/v2/get/cameras",
base_url="https://511.alberta.ca",
id_prefix="AB511",
source_agency="511 Alberta",
)
class Florida511Ingestor(BaseCCTVIngestor):
"""Florida cameras via FL511 DataTables feed (~4,800 sites)."""
def fetch_data(self) -> List[Dict[str, Any]]:
return _fetch_511_datatables_cameras(
list_url="https://fl511.com/List/GetData/Cameras",
base_url="https://fl511.com",
id_prefix="FL511",
source_agency="Florida 511",
referer="https://fl511.com/",
)
class AustraliaLiveTrafficIngestor(BaseCCTVIngestor):
"""NSW / Australia live traffic cameras via Transport for NSW JSON feed."""
URL = "https://www.livetraffic.com/datajson/all-feeds-web.json"
def fetch_data(self) -> List[Dict[str, Any]]:
resp = fetch_with_curl(self.URL, timeout=35, headers={"Accept": "application/json"})
if not resp or resp.status_code != 200:
logger.error(
"Australia Live Traffic CCTV fetch failed: HTTP %s",
resp.status_code if resp else "no response",
)
return []
data = resp.json()
if not isinstance(data, list):
return []
cameras: List[Dict[str, Any]] = []
for item in data:
if not isinstance(item, dict) or item.get("eventType") != "liveCams":
continue
geometry = item.get("geometry") if isinstance(item.get("geometry"), dict) else {}
coords = geometry.get("coordinates") if isinstance(geometry.get("coordinates"), list) else []
if len(coords) < 2:
continue
try:
lon = float(coords[0])
lat = float(coords[1])
except (TypeError, ValueError):
continue
props = item.get("properties") if isinstance(item.get("properties"), dict) else {}
media_url = _ensure_https_url(str(props.get("href") or "").strip())
if not media_url:
continue
cam_id = str(item.get("path") or props.get("id") or len(cameras)).strip("/")
label = str(props.get("title") or props.get("headline") or "Australia Camera")[:120]
cameras.append(
{
"id": f"AUS-{cam_id}",
"source_agency": "NSW Live Traffic",
"lat": lat,
"lon": lon,
"direction_facing": label,
"media_url": media_url,
"media_type": "image",
"refresh_rate_seconds": 120,
}
)
logger.info("AustraliaLiveTrafficIngestor: parsed %s cameras", len(cameras))
return cameras
class NetherlandsRWSIngestor(BaseCCTVIngestor):
"""Netherlands Rijkswaterstaat cameras from legacy NDW open-data JSON.
The opendata.ndw.nu/cameras.json feed Osiris used is often offline; when
unavailable this ingestor returns an empty set and logs a warning.
"""
URL = "https://opendata.ndw.nu/cameras.json"
MAX_CAMERAS = 1200
def fetch_data(self) -> List[Dict[str, Any]]:
resp = fetch_with_curl(self.URL, timeout=25, headers={"Accept": "application/json"})
if not resp or resp.status_code != 200:
logger.warning(
"Netherlands RWS cameras.json unavailable (HTTP %s) — "
"NDW retired this open-data endpoint; no cameras ingested",
resp.status_code if resp else "no response",
)
return []
data = resp.json()
if not isinstance(data, list):
return []
cameras: List[Dict[str, Any]] = []
for i, cam in enumerate(data[: self.MAX_CAMERAS]):
if not isinstance(cam, dict):
continue
lat = cam.get("lat") if cam.get("lat") is not None else cam.get("latitude")
lon = cam.get("lng") if cam.get("lng") is not None else cam.get("longitude")
media_url = _ensure_https_url(
str(cam.get("imageUrl") or cam.get("feed_url") or cam.get("url") or "").strip()
)
if lat is None or lon is None or not media_url:
continue
try:
lat_f, lon_f = float(lat), float(lon)
except (TypeError, ValueError):
continue
cameras.append(
{
"id": f"NLRWS-{cam.get('id') or i}",
"source_agency": "Rijkswaterstaat",
"lat": lat_f,
"lon": lon_f,
"direction_facing": str(cam.get("name") or "Netherlands Camera")[:120],
"media_url": media_url,
"media_type": "image",
"refresh_rate_seconds": 120,
}
)
logger.info("NetherlandsRWSIngestor: parsed %s cameras", len(cameras))
return cameras
def _detect_media_type(url: str) -> str: def _detect_media_type(url: str) -> str:
"""Detect the media type from a camera URL for proper frontend rendering.""" """Detect the media type from a camera URL for proper frontend rendering."""
if not url: if not url:
@@ -1094,29 +1409,40 @@ def _detect_media_type(url: str) -> str:
return "image" return "image"
def scheduled_cctv_ingestors() -> List[tuple["BaseCCTVIngestor", str]]:
"""Canonical list of CCTV ingestors for startup, scheduler, and DB seeding."""
return [
(TFLJamCamIngestor(), "cctv_tfl"),
(LTASingaporeIngestor(), "cctv_lta"),
(AustinTXIngestor(), "cctv_atx"),
(NYCDOTIngestor(), "cctv_nyc"),
(CaltransIngestor(), "cctv_caltrans"),
(ColoradoDOTIngestor(), "cctv_codot"),
(WSDOTIngestor(), "cctv_wsdot"),
(GeorgiaDOTIngestor(), "cctv_gdot"),
(IllinoisDOTIngestor(), "cctv_idot"),
(MichiganDOTIngestor(), "cctv_mdot"),
(WindyWebcamsIngestor(), "cctv_windy"),
(DGTNationalIngestor(), "cctv_dgt"),
(MadridCityIngestor(), "cctv_madrid"),
(OSMTrafficCameraIngestor(), "cctv_osm"),
(AsfinagIngestor(), "cctv_asfinag"),
(OSMALPRCameraIngestor(), "cctv_osm_alpr"),
(Ontario511Ingestor(), "cctv_on511"),
(Alberta511Ingestor(), "cctv_ab511"),
(Florida511Ingestor(), "cctv_fl511"),
(AustraliaLiveTrafficIngestor(), "cctv_australia"),
(NetherlandsRWSIngestor(), "cctv_nl_rws"),
]
def run_all_ingestors(): def run_all_ingestors():
"""Run all CCTV ingestors synchronously. Used for first-run DB seeding.""" """Run all CCTV ingestors synchronously. Used for first-run DB seeding."""
ingestors = [ for ingestor, _name in scheduled_cctv_ingestors():
TFLJamCamIngestor(),
LTASingaporeIngestor(),
AustinTXIngestor(),
NYCDOTIngestor(),
CaltransIngestor(),
ColoradoDOTIngestor(),
WSDOTIngestor(),
GeorgiaDOTIngestor(),
IllinoisDOTIngestor(),
MichiganDOTIngestor(),
WindyWebcamsIngestor(),
OSMTrafficCameraIngestor(),
DGTNationalIngestor(),
MadridCityIngestor(),
]
for ing in ingestors:
try: try:
ing.ingest() ingestor.ingest()
except Exception as e: except Exception as e:
logger.warning(f"Ingestor {ing.__class__.__name__} failed during seed: {e}") logger.warning(f"Ingestor {ingestor.__class__.__name__} failed during seed: {e}")
def get_all_cameras() -> List[Dict[str, Any]]: def get_all_cameras() -> List[Dict[str, Any]]:
+35
View File
@@ -32,6 +32,7 @@ class Settings(BaseSettings):
MESH_ARTI_ENABLED: bool = False MESH_ARTI_ENABLED: bool = False
MESH_ARTI_SOCKS_PORT: int = 9050 MESH_ARTI_SOCKS_PORT: int = 9050
MESH_RELAY_PEERS: str = "" MESH_RELAY_PEERS: str = ""
MESH_PUBLIC_PEER_URL: str = ""
# Bootstrap seeds are discovery hints, not authoritative network roots. # Bootstrap seeds are discovery hints, not authoritative network roots.
# Nodes promote healthy discovered peers from the store/manifest over time. # Nodes promote healthy discovered peers from the store/manifest over time.
MESH_BOOTSTRAP_SEED_PEERS: str = "http://gqpbunqbgtkcqilvclm3xrkt3zowjyl3s62kkktvojgvxzizamvbrqid.onion:8000" MESH_BOOTSTRAP_SEED_PEERS: str = "http://gqpbunqbgtkcqilvclm3xrkt3zowjyl3s62kkktvojgvxzizamvbrqid.onion:8000"
@@ -53,6 +54,12 @@ class Settings(BaseSettings):
MESH_RELAY_FAILURE_COOLDOWN_S: int = 120 MESH_RELAY_FAILURE_COOLDOWN_S: int = 120
MESH_BOOTSTRAP_SEED_FAILURE_COOLDOWN_S: int = 15 MESH_BOOTSTRAP_SEED_FAILURE_COOLDOWN_S: int = 15
MESH_PEER_PUSH_SECRET: str = "" MESH_PEER_PUSH_SECRET: str = ""
# Issue #256 (tg12): optional per-peer HMAC secret map. Comma-separated
# `url=secret` pairs. When a peer URL appears here, only that per-peer
# secret is accepted for it — the global MESH_PEER_PUSH_SECRET above is
# ignored for that specific URL. Single-peer installs and unmigrated
# multi-peer installs leave this empty and behavior is unchanged.
MESH_PEER_SECRETS: str = ""
MESH_RNS_APP_NAME: str = "shadowbroker" MESH_RNS_APP_NAME: str = "shadowbroker"
MESH_RNS_ASPECT: str = "infonet" MESH_RNS_ASPECT: str = "infonet"
MESH_RNS_IDENTITY_PATH: str = "" MESH_RNS_IDENTITY_PATH: str = ""
@@ -110,6 +117,21 @@ class Settings(BaseSettings):
MESH_DM_REQUEST_MAILBOX_LIMIT: int = 12 MESH_DM_REQUEST_MAILBOX_LIMIT: int = 12
MESH_DM_SHARED_MAILBOX_LIMIT: int = 48 MESH_DM_SHARED_MAILBOX_LIMIT: int = 48
MESH_DM_SELF_MAILBOX_LIMIT: int = 12 MESH_DM_SELF_MAILBOX_LIMIT: int = 12
# Anti-spam: cap on distinct UNACKED messages a single sender can have
# parked in a single recipient's mailbox at any one time. Once the
# recipient pulls (acks) a message, the sender's quota for that pair
# frees up. Default 2 — a sender who wants to deliver more must wait
# for the recipient to actually read the prior messages.
#
# This cap is enforced TWICE: once on the local deposit path (the
# sender's own node refuses to spool the 3rd message) AND once on
# the replication-acceptance path (honest peer relays refuse to
# accept inbound replicas that would put them over the cap). The
# double enforcement makes the rule a NETWORK rule — patching out
# the local check on a hostile sender's relay doesn't let extras
# propagate, because every honest peer enforces the same cap on
# inbound replication.
MESH_DM_PENDING_PER_SENDER_LIMIT: int = 2
MESH_BLOCK_LEGACY_AGENT_ID_LOOKUP: bool = True MESH_BLOCK_LEGACY_AGENT_ID_LOOKUP: bool = True
MESH_ALLOW_COMPAT_DM_INVITE_IMPORT: bool = False MESH_ALLOW_COMPAT_DM_INVITE_IMPORT: bool = False
MESH_ALLOW_COMPAT_DM_INVITE_IMPORT_UNTIL: str = "" MESH_ALLOW_COMPAT_DM_INVITE_IMPORT_UNTIL: str = ""
@@ -289,6 +311,19 @@ class Settings(BaseSettings):
# service operator can identify per-install traffic instead of a generic # service operator can identify per-install traffic instead of a generic
# "ShadowBroker" aggregate. # "ShadowBroker" aggregate.
MESHTASTIC_OPERATOR_CALLSIGN: str = "" MESHTASTIC_OPERATOR_CALLSIGN: str = ""
# Per-install operator handle used in the User-Agent for EVERY third-party
# API the backend calls (Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz,
# Broadcastify, weather.gov, NUFORC, etc.). The default is empty, in which
# case backend/services/network_utils.py auto-generates a stable
# pseudonymous handle like "operator-7f3a92" on first use and caches it.
# Operators who want to identify themselves with a real handle can set
# this; operators who want to stay pseudonymous can leave it empty.
#
# The handle is sent ONLY to public third-party APIs. It is NEVER mixed
# into mesh / Wormhole / Infonet identity (those have their own crypto
# identity layer; conflating the two would leak public attribution into
# private mesh state).
OPERATOR_HANDLE: str = ""
# SAR (Synthetic Aperture Radar) data layer # SAR (Synthetic Aperture Radar) data layer
# Mode A — free catalog metadata, no account, default-on # Mode A — free catalog metadata, no account, default-on
+7 -2
View File
@@ -11,8 +11,13 @@ DEFAULT_TRAIL_TTL_S = 300 # 5 min - trail TTL for non-tracked flights
HOLD_PATTERN_DEGREES = 300 # Total heading change to flag holding pattern HOLD_PATTERN_DEGREES = 300 # Total heading change to flag holding pattern
GPS_JAMMING_NACP_THRESHOLD = 8 # NACp below this = degraded GPS signal GPS_JAMMING_NACP_THRESHOLD = 8 # NACp below this = degraded GPS signal
GPS_JAMMING_GRID_SIZE = 1.0 # 1 degree grid for aggregation GPS_JAMMING_GRID_SIZE = 1.0 # 1 degree grid for aggregation
GPS_JAMMING_MIN_RATIO = 0.30 # 30% degraded aircraft to flag zone # Tuned 2026-05: previously 0.30 / 5 aircraft which — combined with the
GPS_JAMMING_MIN_AIRCRAFT = 5 # Min aircraft in grid cell for statistical significance # -1 noise cushion in the detector AND the pre-fix nac_p==0 filter that
# discarded jamming victims — meant the layer almost never lit up.
# Lowering the bar so genuine jamming zones with sparser ADS-B coverage
# clear (eastern Med, Russia/Ukraine border, Iran/Iraq).
GPS_JAMMING_MIN_RATIO = 0.20 # 20% degraded aircraft to flag zone
GPS_JAMMING_MIN_AIRCRAFT = 3 # Min aircraft in grid cell for statistical significance
# ─── Network & Circuit Breaker ────────────────────────────────────────────── # ─── Network & Circuit Breaker ──────────────────────────────────────────────
CIRCUIT_BREAKER_TTL_S = 120 # Skip domain for 2 min after total failure CIRCUIT_BREAKER_TTL_S = 120 # Skip domain for 2 min after total failure
+172 -58
View File
@@ -19,6 +19,7 @@ import concurrent.futures
import json import json
import math import math
import os import os
import random
import threading import threading
import time import time
from datetime import datetime, timedelta from datetime import datetime, timedelta
@@ -75,6 +76,7 @@ from services.fetchers.infrastructure import ( # noqa: F401
fetch_tinygs, fetch_tinygs,
fetch_psk_reporter, fetch_psk_reporter,
) )
from services.fetchers.road_corridor_sat import fetch_road_corridor_trends # noqa: F401
from services.fetchers.geo import ( # noqa: F401 from services.fetchers.geo import ( # noqa: F401
fetch_ships, fetch_ships,
fetch_airports, fetch_airports,
@@ -99,6 +101,10 @@ from services.fetchers.crowdthreat import fetch_crowdthreat # noqa: F401
from services.fetchers.wastewater import fetch_wastewater # noqa: F401 from services.fetchers.wastewater import fetch_wastewater # noqa: F401
from services.fetchers.sar_catalog import fetch_sar_catalog # noqa: F401 from services.fetchers.sar_catalog import fetch_sar_catalog # noqa: F401
from services.fetchers.sar_products import fetch_sar_products # noqa: F401 from services.fetchers.sar_products import fetch_sar_products # noqa: F401
from services.fetchers.malware import fetch_malware_threats # noqa: F401
from services.fetchers.telegram_osint import fetch_telegram_osint # noqa: F401
from services.fetchers.cyber_status import fetch_cyber_threats # noqa: F401
from services.scm.suppliers import fetch_scm_suppliers # noqa: F401
from services.ais_stream import prune_stale_vessels # noqa: F401 from services.ais_stream import prune_stale_vessels # noqa: F401
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -144,13 +150,18 @@ _STARTUP_HEAVY_REFRESH_DELAY_S = float(os.environ.get("SHADOWBROKER_STARTUP_HEAV
_STARTUP_HEAVY_REFRESH_STARTED = False _STARTUP_HEAVY_REFRESH_STARTED = False
_STARTUP_HEAVY_REFRESH_LOCK = threading.Lock() _STARTUP_HEAVY_REFRESH_LOCK = threading.Lock()
_FETCH_WORKERS = int(os.environ.get("SHADOWBROKER_FETCH_WORKERS", "8")) _FETCH_WORKERS = int(os.environ.get("SHADOWBROKER_FETCH_WORKERS", "8"))
_HEAVY_FETCH_WORKERS = int(os.environ.get("SHADOWBROKER_HEAVY_FETCH_WORKERS", "2"))
_SLOW_FETCH_CONCURRENCY = int(os.environ.get("SHADOWBROKER_SLOW_FETCH_CONCURRENCY", "4")) _SLOW_FETCH_CONCURRENCY = int(os.environ.get("SHADOWBROKER_SLOW_FETCH_CONCURRENCY", "4"))
_STARTUP_HEAVY_CONCURRENCY = int(os.environ.get("SHADOWBROKER_STARTUP_HEAVY_CONCURRENCY", "2")) _STARTUP_HEAVY_CONCURRENCY = int(os.environ.get("SHADOWBROKER_STARTUP_HEAVY_CONCURRENCY", "2"))
# Shared thread pool — reused across all fetch cycles instead of creating/destroying per tick # Fast-tier pool (flights, ships, sigint, …). Slow / heavy work uses a separate pool
# so Playwright, GDELT, CCTV ingest, etc. cannot starve the 60s refresh path (#375).
_SHARED_EXECUTOR = concurrent.futures.ThreadPoolExecutor( _SHARED_EXECUTOR = concurrent.futures.ThreadPoolExecutor(
max_workers=max(2, _FETCH_WORKERS), thread_name_prefix="fetch" max_workers=max(2, _FETCH_WORKERS), thread_name_prefix="fetch"
) )
_SLOW_EXECUTOR = concurrent.futures.ThreadPoolExecutor(
max_workers=max(1, _HEAVY_FETCH_WORKERS), thread_name_prefix="fetch-slow"
)
def _cache_json_safe(value): def _cache_json_safe(value):
@@ -319,10 +330,49 @@ def seed_startup_caches() -> None:
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Scheduler & Orchestration # Scheduler & Orchestration
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def _executor_for_task_label(label: str) -> concurrent.futures.ThreadPoolExecutor:
if label.startswith(("slow-tier", "startup-heavy")):
return _SLOW_EXECUTOR
return _SHARED_EXECUTOR
def _run_task_with_health_on_executor(
executor: concurrent.futures.ThreadPoolExecutor,
func,
name: str | None = None,
) -> None:
"""Run a scheduled job on the given pool so it cannot starve fast-tier workers."""
task_name = name or getattr(func, "__name__", "task")
future = executor.submit(func)
start = time.perf_counter()
try:
future.result(timeout=_TASK_HARD_TIMEOUT_S)
duration = time.perf_counter() - start
from services.fetch_health import record_success
record_success(task_name, duration_s=duration)
if duration > _SLOW_FETCH_S:
logger.warning("task slow: %s took %.2fs", task_name, duration)
except concurrent.futures.TimeoutError:
future.cancel()
duration = time.perf_counter() - start
from services.fetch_health import record_failure
record_failure(task_name, error=TimeoutError(f"{task_name} timed out"), duration_s=duration)
logger.error("task timed out: %s (%.2fs)", task_name, duration)
except Exception as e:
duration = time.perf_counter() - start
from services.fetch_health import record_failure
record_failure(task_name, error=e, duration_s=duration)
logger.exception("task failed: %s", task_name)
def _run_tasks(label: str, funcs: list, *, max_concurrency: int | None = None): def _run_tasks(label: str, funcs: list, *, max_concurrency: int | None = None):
"""Run tasks concurrently and log any exceptions (do not fail silently).""" """Run tasks concurrently and log any exceptions (do not fail silently)."""
if not funcs: if not funcs:
return return
executor = _executor_for_task_label(label)
if max_concurrency is None: if max_concurrency is None:
if label.startswith("slow-tier"): if label.startswith("slow-tier"):
max_concurrency = _SLOW_FETCH_CONCURRENCY max_concurrency = _SLOW_FETCH_CONCURRENCY
@@ -330,12 +380,13 @@ def _run_tasks(label: str, funcs: list, *, max_concurrency: int | None = None):
max_concurrency = _STARTUP_HEAVY_CONCURRENCY max_concurrency = _STARTUP_HEAVY_CONCURRENCY
else: else:
max_concurrency = len(funcs) max_concurrency = len(funcs)
max_concurrency = max(1, min(max_concurrency, len(funcs))) pool_workers = getattr(executor, "_max_workers", len(funcs))
max_concurrency = max(1, min(max_concurrency, len(funcs), pool_workers))
remaining_funcs = list(funcs) remaining_funcs = list(funcs)
while remaining_funcs: while remaining_funcs:
batch, remaining_funcs = remaining_funcs[:max_concurrency], remaining_funcs[max_concurrency:] batch, remaining_funcs = remaining_funcs[:max_concurrency], remaining_funcs[max_concurrency:]
futures = {_SHARED_EXECUTOR.submit(func): (func.__name__, time.perf_counter()) for func in batch} futures = {executor.submit(func): (func.__name__, time.perf_counter()) for func in batch}
_drain_task_futures(label, futures) _drain_task_futures(label, futures)
@@ -352,6 +403,13 @@ def _drain_task_futures(label: str, futures: dict):
record_success(name, duration_s=duration) record_success(name, duration_s=duration)
if duration > _SLOW_FETCH_S: if duration > _SLOW_FETCH_S:
logger.warning(f"{label} task slow: {name} took {duration:.2f}s") logger.warning(f"{label} task slow: {name} took {duration:.2f}s")
except concurrent.futures.TimeoutError:
future.cancel()
duration = time.perf_counter() - start
from services.fetch_health import record_failure
record_failure(name, error=TimeoutError(f"{name} timed out"), duration_s=duration)
logger.error("%s task timed out: %s (%.2fs)", label, name, duration)
except Exception as e: except Exception as e:
duration = time.perf_counter() - start duration = time.perf_counter() - start
from services.fetch_health import record_failure from services.fetch_health import record_failure
@@ -405,7 +463,6 @@ def update_slow_data():
logger.info("Slow-tier data update starting...") logger.info("Slow-tier data update starting...")
slow_funcs = [ slow_funcs = [
fetch_news, fetch_news,
fetch_prediction_markets,
fetch_earthquakes, fetch_earthquakes,
fetch_firms_fires, fetch_firms_fires,
fetch_firms_country_fires, fetch_firms_country_fires,
@@ -427,6 +484,9 @@ def update_slow_data():
fetch_fishing_activity, fetch_fishing_activity,
fetch_power_plants, fetch_power_plants,
fetch_ukraine_air_raid_alerts, fetch_ukraine_air_raid_alerts,
fetch_malware_threats,
fetch_cyber_threats,
fetch_scm_suppliers,
] ]
_run_tasks("slow-tier", slow_funcs) _run_tasks("slow-tier", slow_funcs)
# Run correlation engine after all data is fresh # Run correlation engine after all data is fresh
@@ -470,6 +530,15 @@ def _load_cctv_cache_for_startup() -> None:
logger.warning("Startup CCTV cache load failed (non-fatal): %s", e) logger.warning("Startup CCTV cache load failed (non-fatal): %s", e)
def _load_static_infrastructure_for_startup() -> None:
"""Disk-backed reference layers — instant, no network."""
for func in (fetch_datacenters, fetch_military_bases, fetch_power_plants):
try:
func()
except Exception as e:
logger.warning("Startup static infrastructure load failed for %s: %s", func.__name__, e)
def _run_delayed_startup_heavy_refresh() -> None: def _run_delayed_startup_heavy_refresh() -> None:
if _STARTUP_HEAVY_REFRESH_DELAY_S > 0: if _STARTUP_HEAVY_REFRESH_DELAY_S > 0:
logger.info( logger.info(
@@ -482,6 +551,7 @@ def _run_delayed_startup_heavy_refresh() -> None:
"startup-heavy", "startup-heavy",
[ [
update_slow_data, update_slow_data,
fetch_telegram_osint,
fetch_volcanoes, fetch_volcanoes,
fetch_viirs_change_nodes, fetch_viirs_change_nodes,
fetch_unusual_whales, fetch_unusual_whales,
@@ -520,6 +590,7 @@ def update_all_data(*, startup_mode: bool = False):
logger.info("Full data update starting (parallel)...") logger.info("Full data update starting (parallel)...")
# Preload Meshtastic map cache immediately (instant, from disk) # Preload Meshtastic map cache immediately (instant, from disk)
seed_startup_caches() seed_startup_caches()
_load_static_infrastructure_for_startup()
with _data_lock: with _data_lock:
meshtastic_seeded = bool(latest_data.get("meshtastic_map_nodes")) meshtastic_seeded = bool(latest_data.get("meshtastic_map_nodes"))
if startup_mode: if startup_mode:
@@ -596,22 +667,9 @@ def update_all_data(*, startup_mode: bool = False):
# (the scheduled job also runs every 10 min for ongoing refresh). # (the scheduled job also runs every 10 min for ongoing refresh).
if startup_mode: if startup_mode:
try: try:
from services.cctv_pipeline import ( from services.cctv_pipeline import get_all_cameras, scheduled_cctv_ingestors
TFLJamCamIngestor, LTASingaporeIngestor, AustinTXIngestor,
NYCDOTIngestor, CaltransIngestor, ColoradoDOTIngestor, _startup_ingestors = [ing for ing, _name in scheduled_cctv_ingestors()]
WSDOTIngestor, GeorgiaDOTIngestor, IllinoisDOTIngestor,
MichiganDOTIngestor, WindyWebcamsIngestor, DGTNationalIngestor,
MadridCityIngestor, OSMTrafficCameraIngestor, get_all_cameras,
)
from services.cctv_pipeline import OSMALPRCameraIngestor
_startup_ingestors = [
TFLJamCamIngestor(), LTASingaporeIngestor(), AustinTXIngestor(),
NYCDOTIngestor(), CaltransIngestor(), ColoradoDOTIngestor(),
WSDOTIngestor(), GeorgiaDOTIngestor(), IllinoisDOTIngestor(),
MichiganDOTIngestor(), WindyWebcamsIngestor(), DGTNationalIngestor(),
MadridCityIngestor(), OSMTrafficCameraIngestor(),
OSMALPRCameraIngestor(),
]
logger.info("Running CCTV ingest at startup (%d ingestors)...", len(_startup_ingestors)) logger.info("Running CCTV ingest at startup (%d ingestors)...", len(_startup_ingestors))
ingest_futures = { ingest_futures = {
_SHARED_EXECUTOR.submit(ing.ingest): ing.__class__.__name__ _SHARED_EXECUTOR.submit(ing.ingest): ing.__class__.__name__
@@ -747,6 +805,39 @@ def start_scheduler():
misfire_grace_time=120, misfire_grace_time=120,
) )
# Telegram OSINT — hourly t.me/s channel scrape (kept off the 5-minute slow tier).
_telegram_interval_m = max(15, int(os.environ.get("TELEGRAM_OSINT_INTERVAL_MINUTES", "60")))
_scheduler.add_job(
lambda: _run_task_with_health(fetch_telegram_osint, "fetch_telegram_osint"),
"interval",
minutes=_telegram_interval_m,
next_run_time=datetime.utcnow() + timedelta(seconds=45),
id="telegram_osint",
max_instances=1,
misfire_grace_time=600,
)
# Prediction markets — own jittered cadence (Polymarket/Kalshi clearnet egress).
# Kept off the fixed 5-minute slow tier so poll timing is less fingerprintable.
from services.fetchers.prediction_markets import fetch_prediction_markets
_pm_interval_m = max(5, int(os.environ.get("PREDICTION_MARKETS_INTERVAL_MINUTES", "7")))
_pm_jitter_s = max(0, int(os.environ.get("PREDICTION_MARKETS_SCHEDULER_JITTER_S", "240")))
_pm_initial_max_s = max(0, int(os.environ.get("PREDICTION_MARKETS_INITIAL_DELAY_MAX_S", "180")))
_pm_first_run = datetime.utcnow() + timedelta(
seconds=random.randint(30, max(30, _pm_initial_max_s))
)
_scheduler.add_job(
lambda: _run_task_with_health(fetch_prediction_markets, "fetch_prediction_markets"),
"interval",
minutes=_pm_interval_m,
jitter=_pm_jitter_s,
next_run_time=_pm_first_run,
id="prediction_markets",
max_instances=1,
misfire_grace_time=300,
)
# Weather alerts — every 5 minutes (time-critical, separate from slow tier) # Weather alerts — every 5 minutes (time-critical, separate from slow tier)
_scheduler.add_job( _scheduler.add_job(
lambda: _run_task_with_health(fetch_weather_alerts, "fetch_weather_alerts"), lambda: _run_task_with_health(fetch_weather_alerts, "fetch_weather_alerts"),
@@ -777,6 +868,39 @@ def start_scheduler():
misfire_grace_time=60, misfire_grace_time=60,
) )
# Flight observation pruning — drops icao24 → first_seen_at entries we
# haven't seen in an hour. Same cadence as AIS prune for symmetry; the
# per-tick scan is O(in-flight aircraft) so it's cheap.
from services.fetchers.flight_observations import prune as _prune_flight_observations
_scheduler.add_job(
lambda: _run_task_with_health(_prune_flight_observations, "prune_flight_observations"),
"interval",
minutes=5,
id="flight_observation_prune",
max_instances=1,
misfire_grace_time=60,
)
# AISHub REST fallback — slow polling when the AISStream WebSocket
# primary is offline. Configurable interval via
# AISHUB_POLL_INTERVAL_MINUTES env (default 20 min). Operator must
# set AISHUB_USERNAME to opt in. The fetcher is gated internally on
# the primary being disconnected, so this job is cheap when the
# WebSocket is healthy (early-returns after a status check).
from services.fetchers.aishub_fallback import (
aishub_poll_interval_minutes,
fetch_aishub_vessels,
)
_aishub_interval = aishub_poll_interval_minutes()
_scheduler.add_job(
lambda: _run_task_with_health(fetch_aishub_vessels, "fetch_aishub_vessels"),
"interval",
minutes=_aishub_interval,
id="aishub_fallback",
max_instances=1,
misfire_grace_time=120,
)
# Route database — bulk refresh from vrs-standing-data.adsb.lol every 5 # Route database — bulk refresh from vrs-standing-data.adsb.lol every 5
# days. Replaces the legacy /api/0/routeset POST (blocked under our UA, # days. Replaces the legacy /api/0/routeset POST (blocked under our UA,
# and broken upstream). Airline schedules change on a quarterly cycle, # and broken upstream). Airline schedules change on a quarterly cycle,
@@ -811,7 +935,7 @@ def start_scheduler():
# GDELT — every 30 minutes (downloads 32 ZIP files per call, avoid rate limits) # GDELT — every 30 minutes (downloads 32 ZIP files per call, avoid rate limits)
_scheduler.add_job( _scheduler.add_job(
lambda: _run_task_with_health(fetch_gdelt, "fetch_gdelt"), lambda: _run_task_with_health_on_executor(_SLOW_EXECUTOR, fetch_gdelt, "fetch_gdelt"),
"interval", "interval",
minutes=30, minutes=30,
id="gdelt", id="gdelt",
@@ -819,7 +943,9 @@ def start_scheduler():
misfire_grace_time=120, misfire_grace_time=120,
) )
_scheduler.add_job( _scheduler.add_job(
lambda: _run_task_with_health(update_liveuamap, "update_liveuamap"), lambda: _run_task_with_health_on_executor(
_SLOW_EXECUTOR, update_liveuamap, "update_liveuamap"
),
"interval", "interval",
minutes=30, minutes=30,
id="liveuamap", id="liveuamap",
@@ -829,39 +955,9 @@ def start_scheduler():
# CCTV pipeline refresh — runs all ingestors, then refreshes in-memory data. # CCTV pipeline refresh — runs all ingestors, then refreshes in-memory data.
# Delay the first run slightly so startup serves cached/DB-backed data first. # Delay the first run slightly so startup serves cached/DB-backed data first.
from services.cctv_pipeline import ( from services.cctv_pipeline import scheduled_cctv_ingestors
TFLJamCamIngestor,
LTASingaporeIngestor,
AustinTXIngestor,
NYCDOTIngestor,
CaltransIngestor,
ColoradoDOTIngestor,
WSDOTIngestor,
GeorgiaDOTIngestor,
IllinoisDOTIngestor,
MichiganDOTIngestor,
WindyWebcamsIngestor,
DGTNationalIngestor,
MadridCityIngestor,
OSMTrafficCameraIngestor,
)
_cctv_ingestors = [ _cctv_ingestors = scheduled_cctv_ingestors()
(TFLJamCamIngestor(), "cctv_tfl"),
(LTASingaporeIngestor(), "cctv_lta"),
(AustinTXIngestor(), "cctv_atx"),
(NYCDOTIngestor(), "cctv_nyc"),
(CaltransIngestor(), "cctv_caltrans"),
(ColoradoDOTIngestor(), "cctv_codot"),
(WSDOTIngestor(), "cctv_wsdot"),
(GeorgiaDOTIngestor(), "cctv_gdot"),
(IllinoisDOTIngestor(), "cctv_idot"),
(MichiganDOTIngestor(), "cctv_mdot"),
(WindyWebcamsIngestor(), "cctv_windy"),
(DGTNationalIngestor(), "cctv_dgt"),
(MadridCityIngestor(), "cctv_madrid"),
(OSMTrafficCameraIngestor(), "cctv_osm"),
]
def _run_cctv_ingest_cycle(): def _run_cctv_ingest_cycle():
from services.fetchers._store import is_any_active from services.fetchers._store import is_any_active
@@ -880,7 +976,9 @@ def start_scheduler():
logger.warning(f"CCTV post-ingest refresh failed: {e}") logger.warning(f"CCTV post-ingest refresh failed: {e}")
_scheduler.add_job( _scheduler.add_job(
_run_cctv_ingest_cycle, lambda: _run_task_with_health_on_executor(
_SLOW_EXECUTOR, _run_cctv_ingest_cycle, "cctv_ingest_cycle"
),
"interval", "interval",
minutes=10, minutes=10,
id="cctv_ingest", id="cctv_ingest",
@@ -950,6 +1048,16 @@ def start_scheduler():
misfire_grace_time=600, misfire_grace_time=600,
) )
# Sentinel-2 road corridor freight trends — daily (opt-in, heavy CDSE usage)
_scheduler.add_job(
lambda: _run_task_with_health(fetch_road_corridor_trends, "fetch_road_corridor_trends"),
"interval",
hours=24,
id="road_corridor_trends",
max_instances=1,
misfire_grace_time=3600,
)
# FIMI disinformation index — every 12 hours (weekly editorial feed) # FIMI disinformation index — every 12 hours (weekly editorial feed)
_scheduler.add_job( _scheduler.add_job(
lambda: _run_task_with_health(fetch_fimi, "fetch_fimi"), lambda: _run_task_with_health(fetch_fimi, "fetch_fimi"),
@@ -960,16 +1068,19 @@ def start_scheduler():
misfire_grace_time=600, misfire_grace_time=600,
) )
# UAP sightings (NUFORC) — daily at 12:00 UTC # UAP sightings (NUFORC) — weekly Mondays 12:00 UTC. Rolling ~60-day window;
# each self-hosted install pulls live nuforc.org so operators see current
# reports (typically ~400500 mappable pins). Disk cache TTL defaults to 7d.
_scheduler.add_job( _scheduler.add_job(
lambda: _run_task_with_health( lambda: _run_task_with_health(
lambda: fetch_uap_sightings(force_refresh=True), lambda: fetch_uap_sightings(force_refresh=True),
"fetch_uap_sightings", "fetch_uap_sightings",
), ),
"cron", "cron",
day_of_week="mon",
hour=12, hour=12,
minute=0, minute=0,
id="uap_sightings_daily", id="uap_sightings_weekly",
max_instances=1, max_instances=1,
misfire_grace_time=3600, misfire_grace_time=3600,
) )
@@ -1094,7 +1205,10 @@ def start_scheduler():
def stop_scheduler(): def stop_scheduler():
if _scheduler: if _scheduler:
_scheduler.shutdown(wait=False) _scheduler.shutdown(wait=False)
_SLOW_EXECUTOR.shutdown(wait=False, cancel_futures=True)
def get_latest_data(): def get_latest_data():
return get_latest_data_subset(*latest_data.keys()) from services.fetchers._store import get_latest_data_deepcopy_snapshot
return get_latest_data_deepcopy_snapshot()
+1
View File
@@ -46,6 +46,7 @@ _CRITICAL_WARN = {
_OPTIONAL = { _OPTIONAL = {
"AIS_API_KEY": "AIS vessel streaming (ships layer will be empty without it)", "AIS_API_KEY": "AIS vessel streaming (ships layer will be empty without it)",
"GFW_API_TOKEN": "Global Fishing Watch fishing-vessel activity (fishing_activity layer)",
"LTA_ACCOUNT_KEY": "Singapore LTA traffic cameras (CCTV layer)", "LTA_ACCOUNT_KEY": "Singapore LTA traffic cameras (CCTV layer)",
"PUBLIC_API_KEY": "Optional client auth for public endpoints (recommended for exposed deployments)", "PUBLIC_API_KEY": "Optional client auth for public endpoints (recommended for exposed deployments)",
} }
+8 -1
View File
@@ -16,8 +16,15 @@ from typing import Any
import requests import requests
from services.network_utils import outbound_user_agent
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def _feed_ingester_user_agent() -> str:
# Round 7a: per-install attribution for operator-curated feed URLs.
return outbound_user_agent("feed-ingester")
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# State # State
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -157,7 +164,7 @@ def _fetch_layer_feed(layer: dict[str, Any]) -> None:
resp = requests.get( resp = requests.get(
feed_url, feed_url,
timeout=_FETCH_TIMEOUT, timeout=_FETCH_TIMEOUT,
headers={"User-Agent": "ShadowBroker-FeedIngester/1.0"}, headers={"User-Agent": _feed_ingester_user_agent()},
) )
resp.raise_for_status() resp.raise_for_status()
data = resp.json() data = resp.json()
+34 -10
View File
@@ -69,6 +69,11 @@ class DashboardData(TypedDict, total=False):
sar_scenes: List[Dict[str, Any]] sar_scenes: List[Dict[str, Any]]
sar_anomalies: List[Dict[str, Any]] sar_anomalies: List[Dict[str, Any]]
sar_aoi_coverage: List[Dict[str, Any]] sar_aoi_coverage: List[Dict[str, Any]]
road_corridor_trends: Dict[str, Any]
malware_threats: Dict[str, Any]
cyber_threats: Dict[str, Any]
scm_suppliers: Dict[str, Any]
telegram_osint: Dict[str, Any]
# In-memory store # In-memory store
@@ -119,6 +124,11 @@ latest_data: DashboardData = {
"sar_scenes": [], "sar_scenes": [],
"sar_anomalies": [], "sar_anomalies": [],
"sar_aoi_coverage": [], "sar_aoi_coverage": [],
"road_corridor_trends": {"updated_at": None, "corridors": []},
"malware_threats": {"threats": [], "total": 0, "timestamp": None},
"cyber_threats": {"threats": [], "stats": {}},
"scm_suppliers": {"suppliers": [], "total": 0, "critical_count": 0},
"telegram_osint": {"posts": [], "total": 0, "geolocated": 0, "timestamp": None},
} }
# Per-source freshness timestamps # Per-source freshness timestamps
@@ -230,27 +240,35 @@ _active_layers_version: int = 0
def bump_active_layers_version() -> None: def bump_active_layers_version() -> None:
"""Increment the active-layer version when frontend toggles change response shape.""" """Increment the active-layer version when frontend toggles change response shape."""
global _active_layers_version global _active_layers_version
_active_layers_version += 1 with _data_lock:
_active_layers_version += 1
def get_active_layers_version() -> int: def get_active_layers_version() -> int:
"""Return the current active-layer version (for ETag generation).""" """Return the current active-layer version (for ETag generation)."""
return _active_layers_version with _data_lock:
return _active_layers_version
def get_latest_data_subset(*keys: str) -> DashboardData: def get_latest_data_subset(*keys: str) -> DashboardData:
"""Return a deep snapshot of only the requested top-level keys. """Return a deep snapshot of only the requested top-level keys.
This avoids cloning the entire dashboard store for endpoints that only need Grabs references under the lock, then deep-copies outside it so fetcher
a small tier-specific subset. Deep copy ensures callers cannot mutate writers are not blocked for the duration of a large clone (#375).
nested structures (e.g. individual flight dicts) and affect the live store.
""" """
with _data_lock: with _data_lock:
snap: DashboardData = {} items = [(key, latest_data.get(key)) for key in keys]
for key in keys: snap: DashboardData = {}
value = latest_data.get(key) for key, value in items:
snap[key] = copy.deepcopy(value) snap[key] = copy.deepcopy(value)
return snap return snap
def get_latest_data_deepcopy_snapshot() -> DashboardData:
"""Deep-copy the full dashboard for legacy /api/live-data consumers."""
with _data_lock:
items = list(latest_data.items())
return {key: copy.deepcopy(value) for key, value in items}
def get_latest_data_subset_refs(*keys: str) -> DashboardData: def get_latest_data_subset_refs(*keys: str) -> DashboardData:
@@ -320,6 +338,12 @@ active_layers: dict[str, bool] = {
"ai_intel": True, "ai_intel": True,
"crowdthreat": False, "crowdthreat": False,
"sar": True, "sar": True,
"road_corridor_trends": False,
"malware_c2": False,
"submarine_cables": False,
"scm_suppliers": False,
"cyber_threats": False,
"telegram_osint": True,
} }
@@ -21,6 +21,13 @@ from typing import Any
import defusedxml.ElementTree as ET import defusedxml.ElementTree as ET
import requests import requests
def _aircraft_db_user_agent() -> str:
"""Round 7a: lazy import so the per-install operator handle is included."""
from services.network_utils import outbound_user_agent
return outbound_user_agent("aircraft-database")
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
_BUCKET_LIST_URL = ( _BUCKET_LIST_URL = (
@@ -31,8 +38,6 @@ _S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"
_REFRESH_INTERVAL_S = 5 * 24 * 3600 _REFRESH_INTERVAL_S = 5 * 24 * 3600
_LIST_TIMEOUT_S = 30 _LIST_TIMEOUT_S = 30
_DOWNLOAD_TIMEOUT_S = 600 _DOWNLOAD_TIMEOUT_S = 600
from services.network_utils import DEFAULT_USER_AGENT as _USER_AGENT
_lock = threading.RLock() _lock = threading.RLock()
_aircraft_by_hex: dict[str, dict[str, str]] = {} _aircraft_by_hex: dict[str, dict[str, str]] = {}
_last_refresh = 0.0 _last_refresh = 0.0
@@ -44,7 +49,7 @@ def _latest_snapshot_key() -> str:
response = requests.get( response = requests.get(
_BUCKET_LIST_URL, _BUCKET_LIST_URL,
timeout=_LIST_TIMEOUT_S, timeout=_LIST_TIMEOUT_S,
headers={"User-Agent": _USER_AGENT}, headers={"User-Agent": _aircraft_db_user_agent()},
) )
response.raise_for_status() response.raise_for_status()
root = ET.fromstring(response.text) root = ET.fromstring(response.text)
@@ -71,7 +76,7 @@ def _stream_csv_index(url: str) -> dict[str, dict[str, str]]:
url, url,
timeout=_DOWNLOAD_TIMEOUT_S, timeout=_DOWNLOAD_TIMEOUT_S,
stream=True, stream=True,
headers={"User-Agent": _USER_AGENT}, headers={"User-Agent": _aircraft_db_user_agent()},
) as response: ) as response:
response.raise_for_status() response.raise_for_status()
line_iter = ( line_iter = (
@@ -0,0 +1,290 @@
"""AISHub REST fallback for ship tracking when AISStream is unreachable.
Background
----------
On 2026-05-23 ``stream.aisstream.io`` (the primary live AIS WebSocket feed)
went fully offline. Backend's only ship signal vanished. This module polls
``data.aishub.net``'s free REST API on a slow cadence (default 20 min) when
the WebSocket primary is disconnected, so the ships layer doesn't go fully
dark during upstream outages.
Why 20 minutes
--------------
AISHub's free tier is rate-limited and explicitly asks consumers to be
courteous. 20 minutes is well inside their limits, gives ships time to
move enough to look "alive" on the map, and won't drain their service.
Configurable via the ``AISHUB_POLL_INTERVAL_MINUTES`` env var (clamped to
[1, 360]).
Why slow vs primary
-------------------
This is degraded mode, not a replacement. A ship at 20 knots moves about
6 nautical miles in 20 minutes visible on the map but coarser than the
real-time WebSocket signal. When AISStream comes back online, the
WebSocket data will overwrite these records via the same ``_vessels``
dict and ``source`` will flip from ``"aishub"`` back to upstream-live.
Opt-in
------
Operator must set ``AISHUB_USERNAME`` (free registration at
https://www.aishub.net/api). If unset, this fetcher is a no-op.
"""
from __future__ import annotations
import json
import logging
import os
import time
from typing import Any
from services.network_utils import fetch_with_curl
logger = logging.getLogger(__name__)
AISHUB_URL = "https://data.aishub.net/ws.php"
def aishub_username() -> str:
return str(os.environ.get("AISHUB_USERNAME", "")).strip()
def aishub_fallback_enabled() -> bool:
"""Returns True only when the operator has registered with AISHub and
set ``AISHUB_USERNAME``. The presence of the username is the opt-in."""
return bool(aishub_username())
def aishub_poll_interval_minutes() -> int:
"""Default 20 minutes. Clamped to [1, 360] so a hostile or
misconfigured env var can't either hammer the upstream or silence the
fallback for a day."""
raw = os.environ.get("AISHUB_POLL_INTERVAL_MINUTES", "20")
try:
value = int(str(raw).strip())
except (TypeError, ValueError):
value = 20
return max(1, min(360, value))
def _should_run_fallback() -> bool:
"""Only run when the primary WebSocket is disconnected. Avoids stomping
over fresher live data when AISStream is healthy.
Returns False if:
* AISHub isn't configured (no username)
* AISStream primary is currently connected (recent vessel messages)
Returns True only when AIS is configured-but-down. The
``proxy_spawn_count > 0`` guard means "the primary has at least tried
to run" — if the user set AISHUB_USERNAME but not AIS_API_KEY at all,
AISHub will still serve as a primary on its own slow cadence.
"""
if not aishub_fallback_enabled():
return False
try:
from services.ais_stream import ais_proxy_status
status = ais_proxy_status() or {}
except Exception:
return True # ais_stream not importable? still try AISHub.
# If the WebSocket primary is connected, skip the fallback — fresher
# data is already flowing.
if status.get("connected") is True:
return False
return True
def _parse_aishub_response(payload: str) -> list[dict]:
"""Parse the AISHub JSON response into a list of vessel records.
Successful response shape::
[
{"ERROR": false, "USERNAME": "...", "FORMAT": "1", "RECORDS": N},
[{"MMSI": ..., "LATITUDE": ..., "LONGITUDE": ..., ...}, ...]
]
Error response shape::
[{"ERROR": true, "ERROR_MESSAGE": "..."}]
Empty payload (e.g. silent rate-limit drop) returns ``[]``.
"""
if not payload or not payload.strip():
return []
try:
data = json.loads(payload)
except json.JSONDecodeError as e:
logger.warning("AISHub: response is not JSON: %s", e)
return []
if not isinstance(data, list) or not data:
return []
header = data[0] if isinstance(data[0], dict) else {}
if header.get("ERROR") is True:
logger.warning(
"AISHub: upstream error: %s",
header.get("ERROR_MESSAGE", "<unspecified>"),
)
return []
if len(data) < 2 or not isinstance(data[1], list):
return []
return [row for row in data[1] if isinstance(row, dict)]
def _normalize_record(row: dict) -> dict | None:
"""Map an AISHub vessel record to our internal vessel schema.
Returns None when the record can't be used (no MMSI, bad position,
sentinel "not available" lat/lng).
"""
try:
mmsi = int(row.get("MMSI") or 0)
except (TypeError, ValueError):
return None
if not mmsi:
return None
try:
lat = float(row.get("LATITUDE"))
lng = float(row.get("LONGITUDE"))
except (TypeError, ValueError):
return None
# AIS uses 91/181 as "no position available" sentinels.
if abs(lat) > 90 or abs(lng) > 180:
return None
if lat == 91.0 or lng == 181.0:
return None
# SOG raw 102.3 is "speed not available"; sanitize to 0.
try:
sog_raw = float(row.get("SOG") or 0)
except (TypeError, ValueError):
sog_raw = 0.0
sog = 0.0 if sog_raw >= 102.2 else sog_raw
try:
cog = float(row.get("COG") or 0)
except (TypeError, ValueError):
cog = 0.0
try:
heading_raw = int(row.get("HEADING") or 511)
except (TypeError, ValueError):
heading_raw = 511
# AIS heading sentinel 511 = "not available" — fall back to COG.
heading = heading_raw if heading_raw != 511 else cog
try:
ais_type = int(row.get("TYPE") or 0)
except (TypeError, ValueError):
ais_type = 0
return {
"mmsi": mmsi,
"lat": lat,
"lng": lng,
"sog": sog,
"cog": cog,
"heading": heading,
"name": str(row.get("NAME") or "").strip() or "UNKNOWN",
"callsign": str(row.get("CALLSIGN") or "").strip(),
"destination": str(row.get("DEST") or "").strip().replace("@", "") or "",
"imo": int(row.get("IMO") or 0),
"ais_type_code": ais_type,
}
def fetch_aishub_vessels() -> int:
"""Poll AISHub and merge vessels into the shared ``_vessels`` store.
Returns the number of vessels updated (0 on skip, error, or no data).
Designed to be called by the APScheduler tier see
``data_fetcher.py`` for the 20-minute interval job that wraps this.
"""
if not _should_run_fallback():
logger.debug("AISHub fallback skipped: primary connected or not configured")
return 0
username = aishub_username()
url = (
f"{AISHUB_URL}?username={username}&format=1&output=json"
f"&compress=0"
)
try:
response = fetch_with_curl(url, timeout=30)
except Exception as e:
logger.warning("AISHub fetch failed: %s", e)
return 0
if not response or response.status_code != 200:
logger.warning(
"AISHub HTTP %s",
getattr(response, "status_code", "None"),
)
return 0
rows = _parse_aishub_response(getattr(response, "text", "") or "")
if not rows:
return 0
# Inline imports to avoid a circular dependency at module load time
# (ais_stream imports lots of things and is loaded by main.py).
from services.ais_stream import (
_vessels,
_vessels_lock,
_record_vessel_trail_locked,
classify_vessel,
get_country_from_mmsi,
)
now = time.time()
count = 0
with _vessels_lock:
for row in rows:
normalized = _normalize_record(row)
if normalized is None:
continue
mmsi = normalized["mmsi"]
vessel = _vessels.setdefault(mmsi, {"mmsi": mmsi})
# Don't overwrite fresher live data: if the WebSocket pushed an
# update for this MMSI more recently than now-1s (race during
# the brief reconnection window) keep the live one.
last = float(vessel.get("_updated") or 0)
if last > now - 1:
continue
vessel.update(
{
"lat": normalized["lat"],
"lng": normalized["lng"],
"sog": normalized["sog"],
"cog": normalized["cog"],
"heading": normalized["heading"],
"_updated": now,
"source": "aishub",
}
)
if normalized["name"] and normalized["name"] != "UNKNOWN":
vessel["name"] = normalized["name"]
if normalized["callsign"]:
vessel["callsign"] = normalized["callsign"]
if normalized["destination"]:
vessel["destination"] = normalized["destination"]
if normalized["imo"]:
vessel["imo"] = normalized["imo"]
if normalized["ais_type_code"]:
vessel["ais_type_code"] = normalized["ais_type_code"]
vessel["type"] = classify_vessel(normalized["ais_type_code"], mmsi)
if not vessel.get("country"):
vessel["country"] = get_country_from_mmsi(mmsi)
_record_vessel_trail_locked(
mmsi,
normalized["lat"],
normalized["lng"],
normalized["sog"],
now,
)
count += 1
if count:
logger.info(
"AISHub fallback: merged %d vessels (poll interval %d min)",
count,
aishub_poll_interval_minutes(),
)
return count
+62
View File
@@ -0,0 +1,62 @@
"""CISA KEV + cyber threat stats (Osiris port)."""
from __future__ import annotations
import logging
from datetime import datetime, timezone
from typing import Any
from services.fetchers._store import _data_lock, _mark_fresh, is_any_active, latest_data
from services.network_utils import fetch_with_curl
logger = logging.getLogger(__name__)
def fetch_cyber_threats() -> dict[str, Any]:
if not is_any_active("cyber_threats"):
return latest_data.get("cyber_threats") or {"threats": [], "stats": {}}
results: dict[str, Any] = {"threats": [], "stats": {}, "timestamp": datetime.now(timezone.utc).isoformat()}
try:
resp = fetch_with_curl(
"https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json",
timeout=15,
)
if resp.status_code == 200:
data = resp.json()
vulns = data.get("vulnerabilities") or []
results["stats"]["cisa_total"] = len(vulns)
now = datetime.now(timezone.utc)
recent = []
for v in vulns:
try:
added = datetime.fromisoformat(v.get("dateAdded", "").replace("Z", "+00:00"))
days = (now - added).total_seconds() / 86400
except Exception:
continue
if days <= 30:
recent.append(v)
recent = recent[:10]
results["threats"] = [
{
"id": v.get("cveID"),
"name": v.get("vulnerabilityName"),
"vendor": v.get("vendorProject"),
"product": v.get("product"),
"severity": "CRITICAL",
"date": v.get("dateAdded"),
"due": v.get("dueDate"),
"source": "CISA KEV",
}
for v in recent
]
except Exception as exc:
logger.warning("CISA KEV fetch failed: %s", exc)
count = len(results["threats"])
results["stats"]["active_cves"] = count
results["stats"]["threat_level"] = "CRITICAL" if count >= 8 else "HIGH" if count >= 4 else "ELEVATED"
with _data_lock:
latest_data["cyber_threats"] = results
_mark_fresh("cyber_threats")
return results
+289 -109
View File
@@ -15,7 +15,11 @@ import time
import heapq import heapq
from datetime import datetime, timedelta from datetime import datetime, timedelta
from pathlib import Path from pathlib import Path
from services.network_utils import external_curl_fallback_enabled, fetch_with_curl from services.network_utils import (
external_curl_fallback_enabled,
fetch_with_curl,
outbound_user_agent,
)
from services.fetchers._store import latest_data, _data_lock, _mark_fresh from services.fetchers._store import latest_data, _data_lock, _mark_fresh
from services.fetchers.nuforc_enrichment import enrich_sighting from services.fetchers.nuforc_enrichment import enrich_sighting
from services.fetchers.retry import with_retry from services.fetchers.retry import with_retry
@@ -279,13 +283,13 @@ def fetch_weather_alerts():
return return
alerts = [] alerts = []
try: try:
# weather.gov requires a User-Agent per their API policy, but it # weather.gov requires a User-Agent per their API policy. Round 7a:
# need not identify the operator. Use a project-generic string and # send the per-install operator handle so they can rate-limit per
# let the user override via SHADOWBROKER_USER_AGENT if needed. # operator instead of treating "Shadowbroker" as one entity.
from services.network_utils import DEFAULT_USER_AGENT from services.network_utils import outbound_user_agent
url = "https://api.weather.gov/alerts/active?status=actual" url = "https://api.weather.gov/alerts/active?status=actual"
headers = { headers = {
"User-Agent": DEFAULT_USER_AGENT, "User-Agent": outbound_user_agent("weather-gov"),
"Accept": "application/geo+json", "Accept": "application/geo+json",
} }
response = fetch_with_curl(url, timeout=15, headers=headers) response = fetch_with_curl(url, timeout=15, headers=headers)
@@ -688,7 +692,8 @@ _NUFORC_TILESET = "nuforc.cmm18aqea06bu1mmselhpnano-0ce5v"
_NUFORC_TOKEN = os.environ.get("NUFORC_MAPBOX_TOKEN", "").strip() _NUFORC_TOKEN = os.environ.get("NUFORC_MAPBOX_TOKEN", "").strip()
_NUFORC_RADIUS_M = 200_000 # 200 km query radius _NUFORC_RADIUS_M = 200_000 # 200 km query radius
_NUFORC_LIMIT = 50 # max features per tilequery call _NUFORC_LIMIT = 50 # max features per tilequery call
_NUFORC_RECENT_DAYS = int(os.environ.get("NUFORC_RECENT_DAYS", "60")) # Rolling window shown on the map (~2 calendar months). Override via NUFORC_RECENT_DAYS.
_NUFORC_RECENT_DAYS = max(1, int(os.environ.get("NUFORC_RECENT_DAYS", "60")))
_NUFORC_HF_FALLBACK_LIMIT = max(25, int(os.environ.get("NUFORC_HF_FALLBACK_LIMIT", "250"))) _NUFORC_HF_FALLBACK_LIMIT = max(25, int(os.environ.get("NUFORC_HF_FALLBACK_LIMIT", "250")))
_NUFORC_HF_GEOCODE_LIMIT = max(25, int(os.environ.get("NUFORC_HF_GEOCODE_LIMIT", "150"))) _NUFORC_HF_GEOCODE_LIMIT = max(25, int(os.environ.get("NUFORC_HF_GEOCODE_LIMIT", "150")))
_NUFORC_GEOCODE_WORKERS = max(1, int(os.environ.get("NUFORC_GEOCODE_WORKERS", "1"))) _NUFORC_GEOCODE_WORKERS = max(1, int(os.environ.get("NUFORC_GEOCODE_WORKERS", "1")))
@@ -696,6 +701,12 @@ _NUFORC_GEOCODE_WORKERS = max(1, int(os.environ.get("NUFORC_GEOCODE_WORKERS", "1
# practice, so a 0.3s spacing keeps us well under any soft throttle while # practice, so a 0.3s spacing keeps us well under any soft throttle while
# still rebuilding a full 12-month window in ~10 minutes. # still rebuilding a full 12-month window in ~10 minutes.
_NUFORC_GEOCODE_SPACING_S = float(os.environ.get("NUFORC_GEOCODE_SPACING_S", "0.3")) _NUFORC_GEOCODE_SPACING_S = float(os.environ.get("NUFORC_GEOCODE_SPACING_S", "0.3"))
# Disk cache TTL — match the weekly scheduler so restarts between fetches still
# serve the same rolling 60-day snapshot without hammering nuforc.org daily.
_NUFORC_CACHE_TTL_S = max(
3600,
int(os.environ.get("NUFORC_CACHE_TTL_HOURS", "168")) * 3600,
)
_NUFORC_DATA_DIR = Path(__file__).resolve().parent.parent.parent / "data" _NUFORC_DATA_DIR = Path(__file__).resolve().parent.parent.parent / "data"
_NUFORC_SIGHTINGS_CACHE_FILE = _NUFORC_DATA_DIR / "nuforc_recent_sightings.json" _NUFORC_SIGHTINGS_CACHE_FILE = _NUFORC_DATA_DIR / "nuforc_recent_sightings.json"
_NUFORC_LOCATION_CACHE_FILE = _NUFORC_DATA_DIR / "nuforc_location_cache.json" _NUFORC_LOCATION_CACHE_FILE = _NUFORC_DATA_DIR / "nuforc_location_cache.json"
@@ -713,7 +724,12 @@ _NUFORC_LIVE_NONCE_RE = re.compile(
r'id=["\']wdtNonceFrontendServerSide_1["\'][^>]*value=["\']([a-f0-9]+)["\']' r'id=["\']wdtNonceFrontendServerSide_1["\'][^>]*value=["\']([a-f0-9]+)["\']'
) )
_NUFORC_LIVE_SIGHTING_ID_RE = re.compile(r"id=(\d+)") _NUFORC_LIVE_SIGHTING_ID_RE = re.compile(r"id=(\d+)")
_NUFORC_LIVE_USER_AGENT = "Mozilla/5.0 (ShadowBroker-OSINT NUFORC-fetcher)" # Round 7a: NUFORC's site is sensitive to non-browser UAs but we send a
# per-install operator handle prefixed by Mozilla/5.0 so we're identifiable
# without being aggregately blocked. Operators who want stricter privacy
# can override the entire UA via SHADOWBROKER_USER_AGENT.
def _nuforc_live_user_agent() -> str:
return f"Mozilla/5.0 ({outbound_user_agent('nuforc-live')})"
_NUFORC_LIVE_SESSION_COOKIES = _NUFORC_DATA_DIR / "nuforc_session.cookies" _NUFORC_LIVE_SESSION_COOKIES = _NUFORC_DATA_DIR / "nuforc_session.cookies"
# Sample grid covering continental US, Alaska, Hawaii, Canada, UK, Australia # Sample grid covering continental US, Alaska, Hawaii, Canada, UK, Australia
@@ -757,6 +773,35 @@ def _fetch_nuforc_tilequery(lng: float, lat: float) -> list[dict]:
return [] return []
def _uap_cutoff_date_str() -> str:
return (datetime.utcnow() - timedelta(days=_NUFORC_RECENT_DAYS)).strftime("%Y-%m-%d")
def _uap_sighting_date_str(sighting: dict) -> str | None:
"""Normalize a sighting row to YYYY-MM-DD for window filtering."""
from services.fetchers.nuforc_enrichment import _parse_date
raw = str(sighting.get("date_time") or sighting.get("occurred") or "").strip()
if not raw:
return None
parsed = _parse_date(raw)
if parsed:
return parsed
if len(raw) >= 10 and raw[4] == "-" and raw[7] == "-":
return raw[:10]
return None
def _filter_uap_sightings_recent(sightings: list[dict]) -> list[dict]:
"""Drop anything outside the rolling NUFORC_RECENT_DAYS window."""
cutoff = _uap_cutoff_date_str()
return [
sighting
for sighting in sightings
if (_uap_sighting_date_str(sighting) or "") >= cutoff
]
def _parse_nuforc_tile_date(value: str) -> datetime | None: def _parse_nuforc_tile_date(value: str) -> datetime | None:
raw = str(value or "").strip() raw = str(value or "").strip()
if not raw: if not raw:
@@ -793,19 +838,41 @@ def _load_nuforc_sightings_cache(*, force_refresh: bool = False) -> list[dict] |
built_dt = datetime.fromisoformat(built) if built else None built_dt = datetime.fromisoformat(built) if built else None
if built_dt is None: if built_dt is None:
return None return None
if (datetime.utcnow() - built_dt).total_seconds() > 86400: if (datetime.utcnow() - built_dt).total_seconds() > _NUFORC_CACHE_TTL_S:
return None
if raw.get("cutoff_days") != _NUFORC_RECENT_DAYS:
logger.info(
"UAP sightings: cache cutoff_days mismatch (%s != %s); rebuilding",
raw.get("cutoff_days"),
_NUFORC_RECENT_DAYS,
)
return None return None
sightings = raw.get("sightings") sightings = raw.get("sightings")
if isinstance(sightings, list): if isinstance(sightings, list):
if len(sightings) <= 0: if len(sightings) <= 0:
logger.info("UAP sightings: cache is fresh but empty; rebuilding") logger.info("UAP sightings: cache is fresh but empty; rebuilding")
return None return None
filtered = _filter_uap_sightings_recent(sightings)
if not filtered:
logger.warning(
"UAP sightings: cache had %d rows but none within last %d days; rebuilding",
len(sightings),
_NUFORC_RECENT_DAYS,
)
return None
if len(filtered) < len(sightings):
logger.info(
"UAP sightings: dropped %d stale cached rows outside %d-day window",
len(sightings) - len(filtered),
_NUFORC_RECENT_DAYS,
)
logger.info( logger.info(
"UAP sightings: loaded %d cached reports from %s", "UAP sightings: loaded %d cached reports from %s (within %d-day window)",
len(sightings), len(filtered),
built, built,
_NUFORC_RECENT_DAYS,
) )
return sightings return filtered
except Exception as e: except Exception as e:
logger.warning("UAP sightings: cache load error: %s", e) logger.warning("UAP sightings: cache load error: %s", e)
return None return None
@@ -819,6 +886,7 @@ def _save_nuforc_sightings_cache(sightings: list[dict]) -> None:
_NUFORC_DATA_DIR.mkdir(parents=True, exist_ok=True) _NUFORC_DATA_DIR.mkdir(parents=True, exist_ok=True)
payload = { payload = {
"built": datetime.utcnow().isoformat(), "built": datetime.utcnow().isoformat(),
"cutoff_days": _NUFORC_RECENT_DAYS,
"count": len(sightings), "count": len(sightings),
"sightings": sightings, "sightings": sightings,
} }
@@ -957,7 +1025,7 @@ def _photon_lookup(query: str) -> list[float] | None:
res = fetch_with_curl( res = fetch_with_curl(
url, url,
headers={ headers={
"User-Agent": "ShadowBroker-OSINT/1.0 (NUFORC-UAP-layer)", "User-Agent": outbound_user_agent("nuforc-uap-geocode"),
"Accept-Language": "en", "Accept-Language": "en",
}, },
timeout=10, timeout=10,
@@ -1026,97 +1094,10 @@ def _nuforc_months_for_window(days: int) -> list[str]:
return months return months
def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]: def _parse_nuforc_live_datatables_rows(raw_rows: list) -> list[dict]:
"""Pull one month of NUFORC sightings via the live wpDataTables AJAX. """Parse wpDataTables ``data`` array into normalized row dicts."""
Returns a list of raw row dicts with the fields we care about:
id, occurred (YYYY-MM-DD), posted (YYYY-MM-DD), city, state, country,
shape_raw, summary, explanation. Empty list on any failure caller
decides whether a failure is fatal.
"""
from services.fetchers.nuforc_enrichment import _parse_date from services.fetchers.nuforc_enrichment import _parse_date
curl_bin = shutil.which("curl") or "curl"
index_url = _NUFORC_LIVE_INDEX_URL.format(yyyymm=yyyymm)
ajax_url = _NUFORC_LIVE_AJAX_URL.format(yyyymm=yyyymm)
if not external_curl_fallback_enabled():
logger.warning(
"NUFORC live: external curl disabled on Windows for %s; "
"set SHADOWBROKER_ENABLE_WINDOWS_CURL_FALLBACK=1 to opt in.",
yyyymm,
)
return []
# Step 1: GET the month index to capture session cookies + fresh nonce.
try:
index_res = subprocess.run(
[
curl_bin, "-sL",
"-A", _NUFORC_LIVE_USER_AGENT,
"-c", str(cookie_jar),
"-b", str(cookie_jar),
index_url,
],
capture_output=True, text=True, timeout=60,
encoding="utf-8", errors="replace",
)
except (subprocess.SubprocessError, OSError) as e:
logger.warning("NUFORC live: index fetch failed for %s: %s", yyyymm, e)
return []
if index_res.returncode != 0 or not index_res.stdout:
logger.warning(
"NUFORC live: index fetch exit=%s for %s", index_res.returncode, yyyymm,
)
return []
nonce_match = _NUFORC_LIVE_NONCE_RE.search(index_res.stdout)
if not nonce_match:
logger.warning("NUFORC live: wdtNonce not found on index page for %s", yyyymm)
return []
nonce = nonce_match.group(1)
# Step 2: POST to admin-ajax.php with length=-1 to pull the whole month.
post_data = (
"draw=1"
"&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false"
"&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true"
"&order%5B0%5D%5Bcolumn%5D=1&order%5B0%5D%5Bdir%5D=desc"
"&start=0&length=-1"
"&search%5Bvalue%5D=&search%5Bregex%5D=false"
f"&wdtNonce={nonce}"
)
try:
ajax_res = subprocess.run(
[
curl_bin, "-sL",
"-A", _NUFORC_LIVE_USER_AGENT,
"-c", str(cookie_jar),
"-b", str(cookie_jar),
"-X", "POST",
"-H", f"Referer: {index_url}",
"-H", "X-Requested-With: XMLHttpRequest",
"-H", "Content-Type: application/x-www-form-urlencoded",
"--data", post_data,
ajax_url,
],
capture_output=True, text=True, timeout=120,
encoding="utf-8", errors="replace",
)
except (subprocess.SubprocessError, OSError) as e:
logger.warning("NUFORC live: ajax fetch failed for %s: %s", yyyymm, e)
return []
if ajax_res.returncode != 0 or not ajax_res.stdout:
logger.warning(
"NUFORC live: ajax fetch exit=%s for %s", ajax_res.returncode, yyyymm,
)
return []
try:
payload = json.loads(ajax_res.stdout)
except json.JSONDecodeError as e:
logger.warning("NUFORC live: ajax JSON decode failed for %s: %s", yyyymm, e)
return []
raw_rows = payload.get("data") or []
out: list[dict] = [] out: list[dict] = []
for raw in raw_rows: for raw in raw_rows:
if not isinstance(raw, list) or len(raw) < 8: if not isinstance(raw, list) or len(raw) < 8:
@@ -1165,16 +1146,166 @@ def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]:
return out return out
def _nuforc_fetch_month_live_requests(yyyymm: str) -> list[dict]:
"""Live NUFORC month fetch via requests (Windows-safe when curl is disabled)."""
import requests
index_url = _NUFORC_LIVE_INDEX_URL.format(yyyymm=yyyymm)
ajax_url = _NUFORC_LIVE_AJAX_URL.format(yyyymm=yyyymm)
headers = {"User-Agent": _nuforc_live_user_agent()}
session = requests.Session()
session.headers.update(headers)
try:
index_res = session.get(index_url, timeout=60)
except requests.RequestException as e:
logger.warning("NUFORC live (requests): index fetch failed for %s: %s", yyyymm, e)
return []
if index_res.status_code != 200 or not index_res.text:
logger.warning(
"NUFORC live (requests): index HTTP %s for %s",
index_res.status_code,
yyyymm,
)
return []
nonce_match = _NUFORC_LIVE_NONCE_RE.search(index_res.text)
if not nonce_match:
logger.warning("NUFORC live (requests): wdtNonce not found for %s", yyyymm)
return []
nonce = nonce_match.group(1)
post_data = (
"draw=1"
"&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false"
"&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true"
"&order%5B0%5D%5Bcolumn%5D=1&order%5B0%5D%5Bdir%5D=desc"
"&start=0&length=-1"
"&search%5Bvalue%5D=&search%5Bregex%5D=false"
f"&wdtNonce={nonce}"
)
try:
ajax_res = session.post(
ajax_url,
data=post_data,
headers={
**headers,
"Referer": index_url,
"X-Requested-With": "XMLHttpRequest",
"Content-Type": "application/x-www-form-urlencoded",
},
timeout=120,
)
except requests.RequestException as e:
logger.warning("NUFORC live (requests): ajax failed for %s: %s", yyyymm, e)
return []
if ajax_res.status_code != 200 or not ajax_res.text:
logger.warning(
"NUFORC live (requests): ajax HTTP %s for %s",
ajax_res.status_code,
yyyymm,
)
return []
try:
payload = ajax_res.json()
except json.JSONDecodeError as e:
logger.warning("NUFORC live (requests): ajax JSON decode failed for %s: %s", yyyymm, e)
return []
return _parse_nuforc_live_datatables_rows(payload.get("data") or [])
def _nuforc_fetch_month_live_curl(yyyymm: str, cookie_jar: Path) -> list[dict]:
"""Pull one month of NUFORC sightings via curl + wpDataTables AJAX."""
curl_bin = shutil.which("curl") or "curl"
index_url = _NUFORC_LIVE_INDEX_URL.format(yyyymm=yyyymm)
ajax_url = _NUFORC_LIVE_AJAX_URL.format(yyyymm=yyyymm)
# Step 1: GET the month index to capture session cookies + fresh nonce.
try:
index_res = subprocess.run(
[
curl_bin, "-sL",
"-A", _nuforc_live_user_agent(),
"-c", str(cookie_jar),
"-b", str(cookie_jar),
index_url,
],
capture_output=True, text=True, timeout=60,
encoding="utf-8", errors="replace",
)
except (subprocess.SubprocessError, OSError) as e:
logger.warning("NUFORC live: index fetch failed for %s: %s", yyyymm, e)
return []
if index_res.returncode != 0 or not index_res.stdout:
logger.warning(
"NUFORC live: index fetch exit=%s for %s", index_res.returncode, yyyymm,
)
return []
nonce_match = _NUFORC_LIVE_NONCE_RE.search(index_res.stdout)
if not nonce_match:
logger.warning("NUFORC live: wdtNonce not found on index page for %s", yyyymm)
return []
nonce = nonce_match.group(1)
# Step 2: POST to admin-ajax.php with length=-1 to pull the whole month.
post_data = (
"draw=1"
"&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false"
"&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true"
"&order%5B0%5D%5Bcolumn%5D=1&order%5B0%5D%5Bdir%5D=desc"
"&start=0&length=-1"
"&search%5Bvalue%5D=&search%5Bregex%5D=false"
f"&wdtNonce={nonce}"
)
try:
ajax_res = subprocess.run(
[
curl_bin, "-sL",
"-A", _nuforc_live_user_agent(),
"-c", str(cookie_jar),
"-b", str(cookie_jar),
"-X", "POST",
"-H", f"Referer: {index_url}",
"-H", "X-Requested-With: XMLHttpRequest",
"-H", "Content-Type: application/x-www-form-urlencoded",
"--data", post_data,
ajax_url,
],
capture_output=True, text=True, timeout=120,
encoding="utf-8", errors="replace",
)
except (subprocess.SubprocessError, OSError) as e:
logger.warning("NUFORC live: ajax fetch failed for %s: %s", yyyymm, e)
return []
if ajax_res.returncode != 0 or not ajax_res.stdout:
logger.warning(
"NUFORC live: ajax fetch exit=%s for %s", ajax_res.returncode, yyyymm,
)
return []
try:
payload = json.loads(ajax_res.stdout)
except json.JSONDecodeError as e:
logger.warning("NUFORC live: ajax JSON decode failed for %s: %s", yyyymm, e)
return []
return _parse_nuforc_live_datatables_rows(payload.get("data") or [])
def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]:
"""Pull one month of NUFORC sightings via live wpDataTables AJAX."""
if external_curl_fallback_enabled():
rows = _nuforc_fetch_month_live_curl(yyyymm, cookie_jar)
if rows:
return rows
return _nuforc_fetch_month_live_requests(yyyymm)
def _build_recent_uap_sightings() -> list[dict]: def _build_recent_uap_sightings() -> list[dict]:
"""Build the rolling 1-year UAP sightings layer from live NUFORC data. """Build the rolling UAP sightings layer from live NUFORC data.
Hits nuforc.org's public sub-index once per month in the window, drops Hits nuforc.org's public sub-index once per month in the window, drops
anything outside the exact day-precision cutoff, dedupes by sighting id, anything outside the exact day-precision cutoff, dedupes by sighting id,
geocodes city+state via the existing location cache, and returns rows geocodes city+state via the existing location cache, and returns rows
keyed to the same schema the frontend already renders. keyed to the same schema the frontend already renders.
""" """
cutoff_dt = datetime.utcnow() - timedelta(days=_NUFORC_RECENT_DAYS) cutoff_str = _uap_cutoff_date_str()
cutoff_str = cutoff_dt.strftime("%Y-%m-%d")
months = _nuforc_months_for_window(_NUFORC_RECENT_DAYS) months = _nuforc_months_for_window(_NUFORC_RECENT_DAYS)
try: try:
@@ -1374,10 +1505,21 @@ def _build_uap_sightings_from_hf_mirror() -> list[dict]:
This is a resilience fallback for local/Windows runs where nuforc.org is This is a resilience fallback for local/Windows runs where nuforc.org is
Cloudflare-gated and the Mapbox token is not configured. It is not as fresh Cloudflare-gated and the Mapbox token is not configured. It is not as fresh
as the live NUFORC AJAX feed, but it keeps the layer visible and cached. as the live NUFORC AJAX feed, but it keeps the layer visible and cached.
Date-cutoff guard: the kcimc/NUFORC HF dataset is a static snapshot whose
maintainer refreshes it sporadically. Without a cutoff, sorting by
occurred-desc and taking the top N rows returns whatever the mirror's
newest rows happen to be which can be years old if the snapshot is
stale. We apply the same ``_NUFORC_RECENT_DAYS`` window the live path
uses (60 days). If the HF mirror has nothing inside the window we return
``[]`` rather than silently serving 3-year-old "newest" rows.
""" """
from services.fetchers.nuforc_enrichment import _HF_CSV_URL, _parse_date from services.fetchers.nuforc_enrichment import _HF_CSV_URL, _parse_date
from services.geocode_validate import coord_in_country from services.geocode_validate import coord_in_country
cutoff_dt = datetime.utcnow() - timedelta(days=_NUFORC_RECENT_DAYS)
cutoff_str = cutoff_dt.strftime("%Y-%m-%d")
try: try:
response = fetch_with_curl(_HF_CSV_URL, timeout=180, follow_redirects=True) response = fetch_with_curl(_HF_CSV_URL, timeout=180, follow_redirects=True)
if not response or response.status_code != 200: if not response or response.status_code != 200:
@@ -1391,6 +1533,7 @@ def _build_uap_sightings_from_hf_mirror() -> list[dict]:
return [] return []
candidates: list[dict] = [] candidates: list[dict] = []
stale_rows_dropped = 0
try: try:
reader = csv.DictReader(io.StringIO(response.text)) reader = csv.DictReader(io.StringIO(response.text))
for row in reader: for row in reader:
@@ -1401,6 +1544,9 @@ def _build_uap_sightings_from_hf_mirror() -> list[dict]:
) )
if not occurred: if not occurred:
continue continue
if occurred < cutoff_str:
stale_rows_dropped += 1
continue
raw_location = _normalize_uap_location( raw_location = _normalize_uap_location(
row.get("Location", "") row.get("Location", "")
or row.get("City", "") or row.get("City", "")
@@ -1435,6 +1581,19 @@ def _build_uap_sightings_from_hf_mirror() -> list[dict]:
logger.warning("UAP sightings: HF fallback parse failed: %s", e) logger.warning("UAP sightings: HF fallback parse failed: %s", e)
return [] return []
if not candidates:
# HF mirror returned rows, but none inside the rolling window. This is
# the smoking gun for "the public HF dataset hasn't been refreshed in
# years" — log loudly so the operator sees it instead of guessing.
logger.error(
"UAP sightings: HF fallback yielded 0 rows within last %d days "
"(dropped %d stale rows). HF mirror is likely stale; the layer "
"will be empty until the live NUFORC path recovers.",
_NUFORC_RECENT_DAYS,
stale_rows_dropped,
)
return []
candidates.sort(key=lambda row: (row["occurred"], row["posted"], row["id"]), reverse=True) candidates.sort(key=lambda row: (row["occurred"], row["posted"], row["id"]), reverse=True)
candidates = candidates[:_NUFORC_HF_FALLBACK_LIMIT] candidates = candidates[:_NUFORC_HF_FALLBACK_LIMIT]
@@ -1493,11 +1652,12 @@ def _build_uap_sightings_from_hf_mirror() -> list[dict]:
@with_retry(max_retries=1, base_delay=5) @with_retry(max_retries=1, base_delay=5)
def fetch_uap_sightings(*, force_refresh: bool = False): def fetch_uap_sightings(*, force_refresh: bool = False):
"""Fetch last-year UAP sightings from NUFORC. """Fetch rolling-window UAP sightings from live NUFORC.
Startup reads the cached daily snapshot when it is still fresh. The daily Startup reads the cached snapshot when still within NUFORC_CACHE_TTL_HOURS
scheduler forces a rebuild so this layer updates once per day instead of (default 168h / one week). The weekly scheduler forces a rebuild so every
churning continuously. install refreshes the same ~60-day layer without daily load on nuforc.org.
Operators can also POST /api/refresh (admin) to pull immediately.
""" """
from services.fetchers._store import is_any_active from services.fetchers._store import is_any_active
@@ -1506,13 +1666,32 @@ def fetch_uap_sightings(*, force_refresh: bool = False):
sightings = _load_nuforc_sightings_cache(force_refresh=force_refresh) sightings = _load_nuforc_sightings_cache(force_refresh=force_refresh)
if sightings is None: if sightings is None:
live_error: Exception | None = None
try: try:
sightings = _build_recent_uap_sightings() sightings = _build_recent_uap_sightings()
except Exception as e: except Exception as e:
live_error = e
logger.warning("UAP sightings: live NUFORC rebuild failed, using fallback: %s", e) logger.warning("UAP sightings: live NUFORC rebuild failed, using fallback: %s", e)
sightings = _build_uap_sightings_from_hf_mirror() sightings = _build_uap_sightings_from_hf_mirror()
if sightings: if sightings:
_save_nuforc_sightings_cache(sightings) _save_nuforc_sightings_cache(sightings)
elif live_error is not None:
# Both paths failed: live raised AND HF fallback returned empty
# (either the HF mirror is stale beyond the cutoff or the network
# is gone entirely). The previous code silently set the layer to
# ``[]`` and kept marking it fresh; that masked the failure for
# days. Surface it via assert_canary so the health registry shows
# the layer as broken instead of "fresh and empty".
from services.slo import assert_canary
assert_canary("uap_sightings", 0)
logger.error(
"UAP sightings: both live NUFORC and HF fallback produced 0 "
"rows; layer is unavailable. Live error: %s",
live_error,
)
if sightings:
sightings = _filter_uap_sightings_recent(sightings)
with _data_lock: with _data_lock:
latest_data["uap_sightings"] = sightings or [] latest_data["uap_sightings"] = sightings or []
@@ -1520,6 +1699,7 @@ def fetch_uap_sightings(*, force_refresh: bool = False):
_mark_fresh("uap_sightings") _mark_fresh("uap_sightings")
return return
# Unreachable legacy Mapbox tilequery path (kept for reference).
cutoff = datetime.utcnow() - timedelta(days=_NUFORC_RECENT_DAYS) cutoff = datetime.utcnow() - timedelta(days=_NUFORC_RECENT_DAYS)
# Query the grid concurrently (up to 8 threads) # Query the grid concurrently (up to 8 threads)
@@ -0,0 +1,148 @@
"""Per-aircraft observation tracking for cumulative fuel/CO2 estimates.
Background
----------
The pre-existing emissions enrichment attached a *rate* to each flight
(GPH and kg/hr) based on aircraft model. Users reasonably wanted the
running total: how much fuel HAS this plane burned since we started
seeing it? Multiplying the rate by elapsed observation time gets us
there, but it requires somewhere to remember "when did this icao24
first appear on our radar?"
Why this lives outside ``flight_trails``
----------------------------------------
``flight_trails`` is sized and pruned aggressively for map rendering
(5-minute TTL for untracked aircraft, 200 trail points max). That's
wrong for cumulative burn: if a plane has been airborne 2 hours but
its trail was pruned 30 min in, the "first trail point" timestamp is
30 min ago, not 2h ago. Worse, when the trail expires and re-creates,
the cumulative counter would reset mid-flight.
This module tracks observation lifecycle separately:
* When a hex is first observed: start a new flight session.
* While observed regularly (gap < ``REOPEN_GAP_S``): keep accumulating.
* When unseen for longer than ``REOPEN_GAP_S``: treat next sighting as
a new session (the plane landed and took off again, or it's a
different leg). Reset ``first_seen_at``.
* Stale sessions are pruned every ``PRUNE_INTERVAL_S`` so memory stays
bounded.
The user explicitly asked for this counting semantic: "as soon as a
plane appears there should be a counter that keeps a running count of
the fuel being burned... If there is no estimate take off time then it
can just be from the time the server starts to keep a log of whats in
the air."
"""
from __future__ import annotations
import threading
import time
# Gap between sightings that resets the session. ADS-B refreshes the
# whole aircraft list every minute or two, so anything over a few
# minutes means the plane left our coverage window (landed, transit
# through dead zone, etc). 15 minutes is conservative.
REOPEN_GAP_S = 15 * 60
# Don't accumulate runaway memory: drop entries unseen for an hour.
PRUNE_AFTER_S = 60 * 60
# Cap on accumulated airtime per session so a single bug elsewhere
# (e.g. ts clock skew) can't produce comically large numbers.
MAX_SESSION_SECONDS = 24 * 3600 # 24h — longest realistic civilian leg
_observations: dict[str, dict[str, float]] = {}
_lock = threading.Lock()
_last_prune_at = 0.0
def record_observation(icao_hex: str, *, now: float | None = None) -> int:
"""Record a sighting of ``icao_hex`` and return airtime so far (seconds).
Returns 0 for the first-ever sighting (no elapsed time yet) or when
``icao_hex`` is falsy. The caller can multiply the returned seconds
by ``rate_per_hour / 3600`` to get cumulative consumption.
"""
if not icao_hex:
return 0
key = str(icao_hex).strip().lower()
if not key:
return 0
current = float(now if now is not None else time.time())
with _lock:
entry = _observations.get(key)
if entry is None:
_observations[key] = {"first_seen_at": current, "last_seen_at": current}
return 0
# Use explicit ``is None`` checks instead of ``or`` short-circuit:
# ``0.0`` is a legitimate timestamp value (e.g. test fixtures
# seeding a far-past first_seen_at to exercise the clamp) but
# ``0.0 or fallback`` collapses to ``fallback`` because 0.0 is
# falsy. Bit me on my own test — leaving the safer form here.
last_raw = entry.get("last_seen_at")
last_seen = float(last_raw) if last_raw is not None else current
gap = current - last_seen
if gap > REOPEN_GAP_S:
# Treat as a new flight session — the plane landed/disappeared
# long enough that the prior cumulative count is no longer
# the same flight.
_observations[key] = {"first_seen_at": current, "last_seen_at": current}
return 0
first_raw = entry.get("first_seen_at")
first = float(first_raw) if first_raw is not None else current
# Clamp absurd values from clock skew or bad input.
elapsed = max(0, min(int(current - first), MAX_SESSION_SECONDS))
entry["last_seen_at"] = current
return elapsed
def prune(*, now: float | None = None) -> int:
"""Drop entries we haven't seen in ``PRUNE_AFTER_S`` seconds.
Returns number of entries dropped. Safe to call from a scheduler tick;
cheap (single dict scan) so cadence doesn't matter much.
"""
current = float(now if now is not None else time.time())
dropped = 0
with _lock:
stale_keys = []
for k, v in _observations.items():
last_raw = v.get("last_seen_at")
last = float(last_raw) if last_raw is not None else 0.0
if current - last > PRUNE_AFTER_S:
stale_keys.append(k)
for k in stale_keys:
del _observations[k]
dropped += 1
return dropped
def get_session_seconds(icao_hex: str, *, now: float | None = None) -> int:
"""Read-only accessor: airtime for a known icao without bumping last-seen.
Used by tests and external consumers (e.g. when rendering a snapshot
of all in-flight aircraft, you want the current value, not to update
last_seen_at as a side effect).
"""
if not icao_hex:
return 0
key = str(icao_hex).strip().lower()
with _lock:
entry = _observations.get(key)
if entry is None:
return 0
current = float(now if now is not None else time.time())
first_raw = entry.get("first_seen_at")
first = float(first_raw) if first_raw is not None else current
return max(0, min(int(current - first), MAX_SESSION_SECONDS))
def _reset_for_tests() -> None:
"""Drop all observations. Test helper only."""
with _lock:
_observations.clear()
+123 -50
View File
@@ -17,6 +17,7 @@ from services.network_utils import fetch_with_curl
from services.fetchers._store import latest_data, _data_lock, _mark_fresh from services.fetchers._store import latest_data, _data_lock, _mark_fresh
from services.fetchers.plane_alert import enrich_with_plane_alert, enrich_with_tracked_names from services.fetchers.plane_alert import enrich_with_plane_alert, enrich_with_tracked_names
from services.fetchers.emissions import get_emissions_info from services.fetchers.emissions import get_emissions_info
from services.fetchers.flight_observations import record_observation as _record_flight_observation
from services.fetchers.retry import with_retry from services.fetchers.retry import with_retry
from services.fetchers.route_database import lookup_route from services.fetchers.route_database import lookup_route
from services.fetchers.aircraft_database import lookup_aircraft_type from services.fetchers.aircraft_database import lookup_aircraft_type
@@ -29,6 +30,88 @@ _RE_AIRLINE_CODE_1 = re.compile(r"^([A-Z]{3})\d")
_RE_AIRLINE_CODE_2 = re.compile(r"^([A-Z]{3})[A-Z\d]") _RE_AIRLINE_CODE_2 = re.compile(r"^([A-Z]{3})[A-Z\d]")
def detect_gps_jamming_zones(
raw_flights: list[dict],
*,
min_aircraft: int | None = None,
min_ratio: float | None = None,
nacp_threshold: int | None = None,
) -> list[dict]:
"""Detect GPS interference zones from a snapshot of raw ADS-B aircraft.
Methodology mirrors GPSJam.org / Flightradar24: bin aircraft into 1°x1°
grid cells, flag cells where the fraction of aircraft reporting degraded
NACp clears a threshold.
Inputs
------
raw_flights:
Iterable of dicts. Each item is expected to carry ``lat``, ``lng``
(or ``lon``), and ``nac_p``. Records missing position OR missing
``nac_p`` entirely (typical for OpenSky-sourced flights) are
skipped absence-of-data isn't evidence of anything.
nac_p == 0 IS counted as degraded. Pre-fix code skipped it on the theory
that "0 = old transponder, never computed accuracy." That's only half
right: modern Mode-S Enhanced Surveillance transponders also fall back
to nac_p=0 when they lose GPS lock entirely which is exactly the
jamming signature we're trying to detect. Filtering 0 out was discarding
the strongest evidence.
Denoising:
1. Require ``min_aircraft`` per grid cell for statistical validity.
2. Subtract 1 from degraded count per cell (GPSJam's technique) so
a single quirky transponder can't flag an entire zone.
3. Require ratio ``adjusted_degraded / total > min_ratio``.
All thresholds default to the module-level constants but can be
overridden for testing.
"""
min_aircraft = GPS_JAMMING_MIN_AIRCRAFT if min_aircraft is None else int(min_aircraft)
min_ratio = GPS_JAMMING_MIN_RATIO if min_ratio is None else float(min_ratio)
nacp_threshold = (
GPS_JAMMING_NACP_THRESHOLD if nacp_threshold is None else int(nacp_threshold)
)
jamming_grid: dict[str, dict[str, int]] = {}
for rf in raw_flights or []:
rlat = rf.get("lat")
rlng = rf.get("lng") if rf.get("lng") is not None else rf.get("lon")
if rlat is None or rlng is None:
continue
nacp = rf.get("nac_p")
if nacp is None:
continue
grid_key = f"{int(rlat)},{int(rlng)}"
cell = jamming_grid.setdefault(grid_key, {"degraded": 0, "total": 0})
cell["total"] += 1
if nacp < nacp_threshold:
cell["degraded"] += 1
jamming_zones: list[dict] = []
for gk, counts in jamming_grid.items():
if counts["total"] < min_aircraft:
continue
adjusted_degraded = max(counts["degraded"] - 1, 0)
if adjusted_degraded == 0:
continue
ratio = adjusted_degraded / counts["total"]
if ratio > min_ratio:
lat_i, lng_i = gk.split(",")
severity = "low" if ratio < 0.5 else "medium" if ratio < 0.75 else "high"
jamming_zones.append(
{
"lat": int(lat_i) + 0.5,
"lng": int(lng_i) + 0.5,
"severity": severity,
"ratio": round(ratio, 2),
"degraded": counts["degraded"],
"total": counts["total"],
}
)
return jamming_zones
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# OpenSky Network API Client (OAuth2) # OpenSky Network API Client (OAuth2)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -459,6 +542,18 @@ def _classify_and_publish(all_adsb_flights):
ac_category = "heli" if model_upper in _HELI_TYPES_BACKEND else "plane" ac_category = "heli" if model_upper in _HELI_TYPES_BACKEND else "plane"
# Source attribution: prefer the explicit ``source`` tag stamped
# at fetch time (adsb.lol, OpenSky). If absent, fall back to the
# legacy ``supplemental_source`` (airplanes.live, adsb.fi) so
# supplementals are still attributed without changing their
# tagger. Final fallback "adsb.lol" preserves prior behavior for
# any caller that synthesizes records without going through one
# of our fetchers (e.g. tests).
source = (
f.get("source")
or f.get("supplemental_source")
or "adsb.lol"
)
flights.append( flights.append(
{ {
"callsign": flight_str, "callsign": flight_str,
@@ -480,6 +575,7 @@ def _classify_and_publish(all_adsb_flights):
"airline_code": airline_code, "airline_code": airline_code,
"aircraft_category": ac_category, "aircraft_category": ac_category,
"nac_p": f.get("nac_p"), "nac_p": f.get("nac_p"),
"source": source,
} }
) )
except (ValueError, TypeError, KeyError, AttributeError) as loop_e: except (ValueError, TypeError, KeyError, AttributeError) as loop_e:
@@ -506,6 +602,22 @@ def _classify_and_publish(all_adsb_flights):
if model: if model:
emi = get_emissions_info(model) emi = get_emissions_info(model)
if emi: if emi:
# Cumulative fuel/CO2: multiply the per-hour rate by how
# long we've been observing this airframe. Users want to
# see the *amount* burned, not just the rate. If we've
# never seen this hex before, observed_seconds is 0 and
# the cumulative values are 0 until the next refresh —
# the rate is still useful info on its own.
observed_seconds = _record_flight_observation(
f.get("icao24") or ""
)
elapsed_h = observed_seconds / 3600.0
emi = {
**emi,
"observed_seconds": observed_seconds,
"fuel_gallons_burned": round(emi["fuel_gph"] * elapsed_h, 1),
"co2_kg_emitted": round(emi["co2_kg_per_hour"] * elapsed_h, 1),
}
f["emissions"] = emi f["emissions"] = emi
callsign = f.get("callsign", "").strip().upper() callsign = f.get("callsign", "").strip().upper()
@@ -724,56 +836,8 @@ def _classify_and_publish(all_adsb_flights):
latest_data["military_flights"] = military_snapshot latest_data["military_flights"] = military_snapshot
# --- GPS Jamming Detection --- # --- GPS Jamming Detection ---
# Uses NACp (Navigation Accuracy Category Position) from ADS-B to infer
# GPS interference zones, similar to GPSJam.org / Flightradar24.
# NACp < 8 = position accuracy worse than the FAA-mandated 0.05 NM.
#
# Denoising (to suppress false positives from old GA transponders):
# 1. Skip nac_p == 0 ("unknown accuracy") — old transponders that never
# computed accuracy, NOT evidence of jamming. Real jamming shows 1-7.
# 2. Require minimum aircraft per grid cell for statistical validity.
# 3. Subtract 1 from degraded count per cell (GPSJam's technique) so a
# single quirky transponder can't flag an entire zone.
# 4. Require the adjusted ratio to exceed the threshold.
try: try:
jamming_grid = {} jamming_zones = detect_gps_jamming_zones(raw_flights_snapshot)
raw_flights = raw_flights_snapshot
for rf in raw_flights:
rlat = rf.get("lat")
rlng = rf.get("lng") or rf.get("lon")
if rlat is None or rlng is None:
continue
nacp = rf.get("nac_p")
if nacp is None or nacp == 0:
continue
grid_key = f"{int(rlat)},{int(rlng)}"
if grid_key not in jamming_grid:
jamming_grid[grid_key] = {"degraded": 0, "total": 0}
jamming_grid[grid_key]["total"] += 1
if nacp < GPS_JAMMING_NACP_THRESHOLD:
jamming_grid[grid_key]["degraded"] += 1
jamming_zones = []
for gk, counts in jamming_grid.items():
if counts["total"] < GPS_JAMMING_MIN_AIRCRAFT:
continue
adjusted_degraded = max(counts["degraded"] - 1, 0)
if adjusted_degraded == 0:
continue
ratio = adjusted_degraded / counts["total"]
if ratio > GPS_JAMMING_MIN_RATIO:
lat_i, lng_i = gk.split(",")
severity = "low" if ratio < 0.5 else "medium" if ratio < 0.75 else "high"
jamming_zones.append(
{
"lat": int(lat_i) + 0.5,
"lng": int(lng_i) + 0.5,
"severity": severity,
"ratio": round(ratio, 2),
"degraded": counts["degraded"],
"total": counts["total"],
}
)
with _data_lock: with _data_lock:
latest_data["gps_jamming"] = jamming_zones latest_data["gps_jamming"] = jamming_zones
if jamming_zones: if jamming_zones:
@@ -849,7 +913,15 @@ def _fetch_adsb_lol_regions():
res = fetch_with_curl(url, timeout=10) res = fetch_with_curl(url, timeout=10)
if res.status_code == 200: if res.status_code == 200:
data = res.json() data = res.json()
return data.get("ac", []) aircraft = data.get("ac", [])
# Stamp the source at the fetch site so attribution survives
# the OpenSky/supplemental dedupe-by-hex merge downstream.
# Previously adsb.lol records carried no marker while OpenSky
# records got ``is_opensky: True`` — which made flight tooltips
# look like everything came from OpenSky.
for a in aircraft:
a["source"] = "adsb.lol"
return aircraft
except ( except (
requests.RequestException, requests.RequestException,
ConnectionError, ConnectionError,
@@ -932,6 +1004,7 @@ def _enrich_with_opensky_and_supplemental(adsb_flights):
"gs": (s[9] * 1.94384) if s[9] else 0, "gs": (s[9] * 1.94384) if s[9] else 0,
"t": "Unknown", "t": "Unknown",
"is_opensky": True, "is_opensky": True,
"source": "OpenSky",
} }
) )
elif os_res.status_code == 429: elif os_res.status_code == 429:
+47 -17
View File
@@ -20,17 +20,9 @@ def _env_flag(name: str) -> str:
def liveuamap_scraper_enabled() -> bool: def liveuamap_scraper_enabled() -> bool:
"""Return whether the Playwright-based LiveUAMap scraper should run. from services.liveuamap_settings import liveuamap_scraper_enabled as _enabled
It is useful enrichment, but it starts a browser/Node driver and must not be return _enabled()
allowed to destabilize Windows local startup.
"""
setting = _env_flag("SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER")
if setting in {"1", "true", "yes", "on"}:
return True
if setting in {"0", "false", "no", "off"}:
return False
return os.name != "nt"
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -210,10 +202,17 @@ def update_liveuamap():
if not is_any_active("global_incidents"): if not is_any_active("global_incidents"):
return return
if not liveuamap_scraper_enabled(): if not liveuamap_scraper_enabled():
logger.info( from services.liveuamap_settings import liveuamap_requires_ui_opt_in
"Liveuamap scraper disabled for this runtime; set "
"SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=1 to opt in." if liveuamap_requires_ui_opt_in():
) logger.info(
"Liveuamap scraper disabled: enable Global Incidents in the UI to "
"consent, or set SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=1."
)
else:
logger.info(
"Liveuamap scraper disabled; set SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER=1 to opt in."
)
return return
logger.info("Running scheduled Liveuamap scraper...") logger.info("Running scheduled Liveuamap scraper...")
try: try:
@@ -279,6 +278,16 @@ _FISHING_FETCH_INTERVAL_S = 3600 # once per hour — GFW data has ~5 day lag
_last_fishing_fetch_ts: float = 0.0 _last_fishing_fetch_ts: float = 0.0
def _gfw_int_env(name: str, default: int, *, minimum: int = 1, maximum: int | None = None) -> int:
try:
value = int(os.environ.get(name, str(default)) or default)
except (TypeError, ValueError):
value = default
if maximum is not None:
value = min(maximum, value)
return max(minimum, value)
@with_retry(max_retries=1, base_delay=5) @with_retry(max_retries=1, base_delay=5)
def fetch_fishing_activity(): def fetch_fishing_activity():
"""Fetch recent fishing events from Global Fishing Watch (~5 day lag).""" """Fetch recent fishing events from Global Fishing Watch (~5 day lag)."""
@@ -301,10 +310,16 @@ def fetch_fishing_activity():
try: try:
import datetime as _dt import datetime as _dt
# GFW publishes with ~5 day lag; windows shorter than ~7 days often return 0 events.
lookback_days = _gfw_int_env("GFW_EVENTS_LOOKBACK_DAYS", 7, minimum=1, maximum=14)
max_pages = _gfw_int_env("GFW_EVENTS_MAX_PAGES", 10, minimum=1, maximum=100)
timeout_s = _gfw_int_env("GFW_EVENTS_TIMEOUT_S", 90, minimum=30, maximum=180)
_end = _dt.date.today().isoformat() _end = _dt.date.today().isoformat()
_start = (_dt.date.today() - _dt.timedelta(days=7)).isoformat() _start = (_dt.date.today() - _dt.timedelta(days=lookback_days)).isoformat()
page_size = max(1, int(os.environ.get("GFW_EVENTS_PAGE_SIZE", "500") or "500")) page_size = _gfw_int_env("GFW_EVENTS_PAGE_SIZE", 500, minimum=1, maximum=1000)
offset = 0 offset = 0
pages_fetched = 0
total_available: int | None = None
seen_offsets: set[int] = set() seen_offsets: set[int] = set()
seen_ids: set[str] = set() seen_ids: set[str] = set()
headers = {"Authorization": f"Bearer {token}"} headers = {"Authorization": f"Bearer {token}"}
@@ -325,7 +340,7 @@ def fetch_fishing_activity():
} }
) )
url = f"https://gateway.api.globalfishingwatch.org/v3/events?{query}" url = f"https://gateway.api.globalfishingwatch.org/v3/events?{query}"
response = fetch_with_curl(url, timeout=30, headers=headers) response = fetch_with_curl(url, timeout=timeout_s, headers=headers)
if response.status_code != 200: if response.status_code != 200:
logger.warning( logger.warning(
"Fishing activity fetch failed at offset=%s: HTTP %s", "Fishing activity fetch failed at offset=%s: HTTP %s",
@@ -335,10 +350,16 @@ def fetch_fishing_activity():
break break
payload = response.json() or {} payload = response.json() or {}
if total_available is None:
try:
total_available = int(payload.get("total")) if payload.get("total") is not None else None
except (TypeError, ValueError):
total_available = None
entries = payload.get("entries", []) entries = payload.get("entries", [])
if not entries: if not entries:
break break
pages_fetched += 1
added_this_page = 0 added_this_page = 0
for e in entries: for e in entries:
pos = e.get("position", {}) pos = e.get("position", {})
@@ -373,6 +394,15 @@ def fetch_fishing_activity():
if len(entries) < page_size: if len(entries) < page_size:
break break
if pages_fetched >= max_pages:
logger.info(
"Fishing activity: capped at %s pages (%s events fetched; GFW total=%s)",
max_pages,
len(events),
total_available if total_available is not None else "unknown",
)
break
next_offset = payload.get("nextOffset") next_offset = payload.get("nextOffset")
if next_offset is None: if next_offset is None:
next_offset = (payload.get("pagination") or {}).get("nextOffset") next_offset = (payload.get("pagination") or {}).get("nextOffset")
+6 -6
View File
@@ -6,7 +6,7 @@ import heapq
import logging import logging
from pathlib import Path from pathlib import Path
from cachetools import TTLCache from cachetools import TTLCache
from services.network_utils import fetch_with_curl from services.network_utils import fetch_with_curl, outbound_user_agent
from services.fetchers._store import latest_data, _data_lock, _mark_fresh from services.fetchers._store import latest_data, _data_lock, _mark_fresh
from services.fetchers.retry import with_retry from services.fetchers.retry import with_retry
@@ -29,7 +29,7 @@ def _geocode_region(region_name: str, country_name: str) -> tuple:
query = urllib.parse.quote(f"{region_name}, {country_name}") query = urllib.parse.quote(f"{region_name}, {country_name}")
url = f"https://nominatim.openstreetmap.org/search?q={query}&format=json&limit=1" url = f"https://nominatim.openstreetmap.org/search?q={query}&format=json&limit=1"
response = fetch_with_curl(url, timeout=8, headers={"User-Agent": "ShadowBroker-OSINT/1.0"}) response = fetch_with_curl(url, timeout=8, headers={"User-Agent": outbound_user_agent("infrastructure-data")})
if response.status_code == 200: if response.status_code == 200:
results = response.json() results = response.json()
if results: if results:
@@ -235,11 +235,11 @@ _DC_GEOCODED_PATH = Path(__file__).parent.parent.parent / "data" / "datacenters_
def fetch_datacenters(): def fetch_datacenters():
"""Load geocoded data centers (5K+ street-level precise locations).""" """Load geocoded data centers (5K+ street-level precise locations).
from services.fetchers._store import is_any_active
if not is_any_active("datacenters"): Always loads from disk; /api/live-data/slow gates the payload on the
return datacenters layer toggle so enabling the layer can render immediately.
"""
dcs = [] dcs = []
try: try:
if not _DC_GEOCODED_PATH.exists(): if not _DC_GEOCODED_PATH.exists():
+107
View File
@@ -0,0 +1,107 @@
"""Malware C2 / URLhaus feed (abuse.ch, Osiris port)."""
from __future__ import annotations
import logging
from datetime import datetime, timezone
from typing import Any
from services.fetchers._store import _data_lock, _mark_fresh, is_any_active, latest_data
from services.network_utils import fetch_with_curl
logger = logging.getLogger(__name__)
COUNTRY_CENTROIDS: dict[str, tuple[float, float]] = {
"AF": (65, 33), "AL": (20, 41), "DZ": (3, 28), "AR": (-64, -34), "AU": (134, -25),
"AT": (14, 47.5), "BE": (4, 50.8), "BR": (-51, -10), "CA": (-96, 62), "CN": (105, 35),
"DE": (10, 51), "FR": (2, 46), "GB": (-2, 54), "IN": (79, 22), "IR": (53, 32),
"IT": (12.5, 42.8), "JP": (138, 36), "KR": (128, 36), "MX": (-102, 23.5), "NL": (5.5, 52.5),
"PL": (19.5, 52), "RU": (100, 60), "SG": (103.8, 1.35), "TW": (121, 23.7), "UA": (32, 49),
"US": (-97, 38), "VN": (106, 16),
}
def fetch_malware_threats() -> list[dict[str, Any]]:
if not is_any_active("malware_c2"):
return latest_data.get("malware_threats") or []
threats: list[dict[str, Any]] = []
threat_id = 0
try:
resp = fetch_with_curl(
"https://feodotracker.abuse.ch/downloads/ipblocklist.json",
timeout=10,
headers={"User-Agent": "Shadowbroker/1.0", "Accept": "application/json"},
)
if resp.status_code == 200:
entries = resp.json()
if not isinstance(entries, list):
entries = []
for entry in entries[:200]:
cc = entry.get("country")
if not cc or cc not in COUNTRY_CENTROIDS:
continue
lng, lat = COUNTRY_CENTROIDS[cc]
j_lng = ((threat_id * 173.7) % 200 - 100) / 100 * 4
j_lat = ((threat_id * 293.1) % 200 - 100) / 100 * 4
threats.append(
{
"id": f"feodo-{threat_id}",
"lat": lat + j_lat,
"lng": lng + j_lng,
"ip": entry.get("ip_address") or "unknown",
"port": entry.get("dst_port") or 0,
"malware": entry.get("malware") or "unknown",
"status": entry.get("status") or "active",
"first_seen": entry.get("first_seen"),
"last_online": entry.get("last_online"),
"country": cc,
"threat_type": "botnet_c2",
}
)
threat_id += 1
except Exception as exc:
logger.warning("Feodo fetch failed: %s", exc)
try:
resp = fetch_with_curl(
"https://urlhaus-api.abuse.ch/v1/urls/recent/limit/100/",
timeout=8,
)
if resp.status_code == 200:
urls = (resp.json() or {}).get("urls") or []
for u in urls:
cc = u.get("country")
if not cc or cc not in COUNTRY_CENTROIDS:
cc = next(iter(COUNTRY_CENTROIDS))
lng, lat = COUNTRY_CENTROIDS[cc]
j_lng = ((threat_id * 137.3) % 200 - 100) / 100 * 5
j_lat = ((threat_id * 211.7) % 200 - 100) / 100 * 5
threats.append(
{
"id": f"urlhaus-{threat_id}",
"lat": lat + j_lat,
"lng": lng + j_lng,
"ip": u.get("host") or "unknown",
"port": 0,
"malware": ", ".join(u.get("tags") or []) or u.get("threat") or "malware",
"status": u.get("url_status") or "online",
"first_seen": u.get("dateadded"),
"country": cc,
"threat_type": "malware_url",
}
)
threat_id += 1
except Exception as exc:
logger.debug("URLhaus supplement failed: %s", exc)
payload = {
"threats": threats,
"total": len(threats),
"timestamp": datetime.now(timezone.utc).isoformat(),
"source": "abuse.ch Feodo Tracker + URLhaus",
}
with _data_lock:
latest_data["malware_threats"] = payload
_mark_fresh("malware_threats")
return threats
+9 -4
View File
@@ -188,11 +188,16 @@ def fetch_meshtastic_nodes():
callsign = "" callsign = ""
send_callsign_header = str( send_callsign_header = str(
_os.environ.get("MESHTASTIC_SEND_CALLSIGN_HEADER", "true") _os.environ.get("MESHTASTIC_SEND_CALLSIGN_HEADER", "false")
).strip().lower() not in {"0", "false", "no", "off", ""} ).strip().lower() in {"1", "true", "yes", "on"}
from services.network_utils import DEFAULT_USER_AGENT # Round 7a: outbound_user_agent already includes the per-install handle.
ua_base = f"{DEFAULT_USER_AGENT}; 24h polling" # The optional Meshtastic callsign is appended as additional context so
# meshtastic.liamcottle.net's operator can identify both the install AND
# the registered radio operator (when MESHTASTIC_OPERATOR_CALLSIGN is set
# and MESHTASTIC_SEND_CALLSIGN_HEADER is true; see issue #203).
from services.network_utils import outbound_user_agent
ua_base = f"{outbound_user_agent('meshtastic-map')}; 24h polling"
if callsign and send_callsign_header: if callsign and send_callsign_header:
user_agent = f"{ua_base}; node={callsign}" user_agent = f"{ua_base}; node={callsign}"
else: else:
+18 -1
View File
@@ -7,6 +7,7 @@ import requests
from services.network_utils import fetch_with_curl from services.network_utils import fetch_with_curl
from services.fetchers._store import latest_data, _data_lock, _mark_fresh from services.fetchers._store import latest_data, _data_lock, _mark_fresh
from services.fetchers.emissions import get_emissions_info from services.fetchers.emissions import get_emissions_info
from services.fetchers.flight_observations import record_observation as _record_flight_observation
from services.fetchers.plane_alert import enrich_with_plane_alert from services.fetchers.plane_alert import enrich_with_plane_alert
logger = logging.getLogger("services.data_fetcher") logger = logging.getLogger("services.data_fetcher")
@@ -171,6 +172,7 @@ def fetch_military_flights():
h = a.get("hex", "").lower() h = a.get("hex", "").lower()
if h and h not in seen_hex: if h and h not in seen_hex:
seen_hex.add(h) seen_hex.add(h)
a["source"] = "adsb.lol"
all_mil_ac.append(a) all_mil_ac.append(a)
except Exception as e: except Exception as e:
logger.warning(f"adsb.lol mil fetch failed: {e}") logger.warning(f"adsb.lol mil fetch failed: {e}")
@@ -182,6 +184,7 @@ def fetch_military_flights():
h = a.get("hex", "").lower() h = a.get("hex", "").lower()
if h and h not in seen_hex: if h and h not in seen_hex:
seen_hex.add(h) seen_hex.add(h)
a["source"] = "airplanes.live"
all_mil_ac.append(a) all_mil_ac.append(a)
logger.info(f"airplanes.live mil: +{len(resp2.json().get('ac', []))} raw, {len(all_mil_ac)} total unique") logger.info(f"airplanes.live mil: +{len(resp2.json().get('ac', []))} raw, {len(all_mil_ac)} total unique")
except Exception as e: except Exception as e:
@@ -234,6 +237,7 @@ def fetch_military_flights():
"registration": f.get("r", "N/A"), "registration": f.get("r", "N/A"),
"icao24": icao_hex, "icao24": icao_hex,
"squawk": f.get("squawk", ""), "squawk": f.get("squawk", ""),
"source": f.get("source") or "adsb.lol",
}) })
continue continue
@@ -258,7 +262,8 @@ def fetch_military_flights():
"model": f.get("t", "Unknown"), "model": f.get("t", "Unknown"),
"icao24": icao_hex, "icao24": icao_hex,
"speed_knots": speed_knots, "speed_knots": speed_knots,
"squawk": f.get("squawk", "") "squawk": f.get("squawk", ""),
"source": f.get("source") or "adsb.lol",
}) })
except Exception as loop_e: except Exception as loop_e:
logger.error(f"Mil flight interpolation error: {loop_e}") logger.error(f"Mil flight interpolation error: {loop_e}")
@@ -296,6 +301,18 @@ def fetch_military_flights():
if model: if model:
emissions = get_emissions_info(model) emissions = get_emissions_info(model)
if emissions: if emissions:
# Cumulative fuel/CO2 since first observation — mirrors
# the civilian path in flights._classify_and_publish.
observed_seconds = _record_flight_observation(
mf.get("icao24") or ""
)
elapsed_h = observed_seconds / 3600.0
emissions = {
**emissions,
"observed_seconds": observed_seconds,
"fuel_gallons_burned": round(emissions["fuel_gph"] * elapsed_h, 1),
"co2_kg_emitted": round(emissions["co2_kg_per_hour"] * elapsed_h, 1),
}
mf["emissions"] = emissions mf["emissions"] = emissions
if mf.get("alert_category"): if mf.get("alert_category"):
mf["type"] = "tracked_flight" mf["type"] = "tracked_flight"
+14 -9
View File
@@ -158,21 +158,26 @@ _KEYWORD_COORDS = {
_SORTED_KEYWORDS = sorted(_KEYWORD_COORDS.items(), key=lambda x: len(x[0]), reverse=True) _SORTED_KEYWORDS = sorted(_KEYWORD_COORDS.items(), key=lambda x: len(x[0]), reverse=True)
def resolve_coords_match(text: str) -> tuple[tuple[float, float], str] | None:
"""Return ((lat, lng), matched_keyword) for the most specific keyword hit."""
padded_text = f" {text} "
for kw, coords in _SORTED_KEYWORDS:
if kw.startswith(" ") or kw.endswith(" "):
if kw in padded_text:
return coords, kw
elif re.search(r"\b" + re.escape(kw) + r"\b", text):
return coords, kw
return None
def _resolve_coords(text: str) -> tuple[float, float] | None: def _resolve_coords(text: str) -> tuple[float, float] | None:
"""Return (lat, lng) for the most specific keyword match, or None. """Return (lat, lng) for the most specific keyword match, or None.
Longer keywords are tried first. Space-padded keywords (" us ", " uk ") Longer keywords are tried first. Space-padded keywords (" us ", " uk ")
use substring matching on padded text; all others use word-boundary regex. use substring matching on padded text; all others use word-boundary regex.
""" """
padded_text = f" {text} " match = resolve_coords_match(text)
for kw, coords in _SORTED_KEYWORDS: return match[0] if match else None
if kw.startswith(" ") or kw.endswith(" "):
if kw in padded_text:
return coords
else:
if re.search(r'\b' + re.escape(kw) + r'\b', text):
return coords
return None
@with_retry(max_retries=1, base_delay=2) @with_retry(max_retries=1, base_delay=2)
@@ -9,6 +9,7 @@ import json
import logging import logging
import math import math
import os import os
import random
import threading import threading
import time import time
from urllib.parse import urlencode from urllib.parse import urlencode
@@ -21,23 +22,34 @@ _prev_probabilities: dict[str, float] = {}
_market_cache = TTLCache(maxsize=1, ttl=300) _market_cache = TTLCache(maxsize=1, ttl=300)
_POLYMARKET_PAGE_DELAY_S = float(os.environ.get("MESH_POLYMARKET_PAGE_DELAY_S", "0.02")) _POLYMARKET_PAGE_DELAY_S = float(os.environ.get("MESH_POLYMARKET_PAGE_DELAY_S", "0.02"))
_KALSHI_PAGE_DELAY_S = float(os.environ.get("MESH_KALSHI_PAGE_DELAY_S", "0.08")) _KALSHI_PAGE_DELAY_S = float(os.environ.get("MESH_KALSHI_PAGE_DELAY_S", "0.08"))
_POLYMARKET_PAGE_DELAY_JITTER_S = float(os.environ.get("MESH_POLYMARKET_PAGE_DELAY_JITTER_S", "0.08"))
_KALSHI_PAGE_DELAY_JITTER_S = float(os.environ.get("MESH_KALSHI_PAGE_DELAY_JITTER_S", "0.2"))
# Random delay before each full Polymarket+Kalshi cycle (decorrelates from other slow-tier jobs).
_PRE_FETCH_JITTER_S = float(os.environ.get("PREDICTION_MARKETS_PRE_FETCH_JITTER_S", "90"))
# Random pause between finishing Polymarket pagination and starting Kalshi.
_PROVIDER_GAP_JITTER_S = float(os.environ.get("PREDICTION_MARKETS_PROVIDER_GAP_JITTER_S", "45"))
_provider_pace_lock = threading.Lock() _provider_pace_lock = threading.Lock()
_provider_last_request_at: dict[str, float] = {} _provider_last_request_at: dict[str, float] = {}
def prediction_markets_fetch_enabled() -> bool: def prediction_markets_fetch_enabled() -> bool:
"""Return True only when the operator explicitly opts into Polymarket/Kalshi pulls.""" """Return True when UI opt-in or PREDICTION_MARKETS_ENABLED enables pulls."""
return str(os.environ.get("PREDICTION_MARKETS_ENABLED", "")).strip().lower() in { from services.prediction_markets_settings import prediction_markets_fetch_enabled as _enabled
"1",
"true", return _enabled()
"yes",
"on",
}
def _pace_provider(provider: str, min_interval_s: float) -> None: def _pace_provider(provider: str, min_interval_s: float) -> None:
if min_interval_s <= 0: if min_interval_s <= 0:
return return
jitter_s = (
_POLYMARKET_PAGE_DELAY_JITTER_S
if provider == "polymarket"
else _KALSHI_PAGE_DELAY_JITTER_S
if provider == "kalshi"
else 0.0
)
min_interval_s += random.uniform(0.0, jitter_s) if jitter_s > 0 else 0.0
with _provider_pace_lock: with _provider_pace_lock:
now = time.monotonic() now = time.monotonic()
wait_s = min_interval_s - (now - _provider_last_request_at.get(provider, 0.0)) wait_s = min_interval_s - (now - _provider_last_request_at.get(provider, 0.0))
@@ -47,6 +59,24 @@ def _pace_provider(provider: str, min_interval_s: float) -> None:
_provider_last_request_at[provider] = now _provider_last_request_at[provider] = now
def _apply_pre_fetch_jitter() -> None:
if _PRE_FETCH_JITTER_S <= 0:
return
delay = random.uniform(0.0, _PRE_FETCH_JITTER_S)
if delay >= 1.0:
logger.debug("Prediction markets: pre-fetch jitter %.1fs", delay)
time.sleep(delay)
def _apply_provider_gap_jitter() -> None:
if _PROVIDER_GAP_JITTER_S <= 0:
return
delay = random.uniform(0.0, _PROVIDER_GAP_JITTER_S)
if delay >= 1.0:
logger.debug("Prediction markets: provider gap jitter %.1fs", delay)
time.sleep(delay)
def _finite_or_none(value): def _finite_or_none(value):
try: try:
n = float(value) n = float(value)
@@ -750,7 +780,9 @@ def _merge_markets(poly_events: list[dict], kalshi_events: list[dict]) -> list[d
@cached(_market_cache) @cached(_market_cache)
def fetch_prediction_markets_raw() -> list[dict]: def fetch_prediction_markets_raw() -> list[dict]:
"""Fetch and merge prediction markets from both sources. Cached 5 min.""" """Fetch and merge prediction markets from both sources. Cached 5 min."""
_apply_pre_fetch_jitter()
poly = _fetch_polymarket_events() poly = _fetch_polymarket_events()
_apply_provider_gap_jitter()
kalshi = _fetch_kalshi_events() kalshi = _fetch_kalshi_events()
merged = _merge_markets(poly, kalshi) merged = _merge_markets(poly, kalshi)
logger.info( logger.info(
+9 -2
View File
@@ -11,15 +11,20 @@ import random
import logging import logging
import functools import functools
import requests import requests
from requests.exceptions import ChunkedEncodingError, ConnectionError as RequestsConnectionError
from requests.exceptions import Timeout as RequestsTimeout
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Only retry on transient network/OS errors — not on parse errors, key errors, etc. # Only retry on transient network/OS errors — not parse/key errors or HTTP 4xx/5xx.
# requests.HTTPError (from raise_for_status) is intentionally excluded.
TRANSIENT_ERRORS = ( TRANSIENT_ERRORS = (
TimeoutError, TimeoutError,
ConnectionError, ConnectionError,
OSError, OSError,
requests.RequestException, RequestsConnectionError,
RequestsTimeout,
ChunkedEncodingError,
) )
@@ -43,6 +48,8 @@ def with_retry(max_retries: int = 3, base_delay: float = 2.0, max_delay: float =
for attempt in range(1 + max_retries): for attempt in range(1 + max_retries):
try: try:
return func(*args, **kwargs) return func(*args, **kwargs)
except requests.HTTPError:
raise
except TRANSIENT_ERRORS as exc: except TRANSIENT_ERRORS as exc:
last_exc = exc last_exc = exc
if attempt < max_retries: if attempt < max_retries:
@@ -0,0 +1,84 @@
"""Scheduled Sentinel-2 road corridor freight trend fetcher (opt-in, slow tier)."""
from __future__ import annotations
import logging
import os
from datetime import datetime, timezone
from services.fetchers._store import _data_lock, _mark_fresh, is_any_active, latest_data
logger = logging.getLogger(__name__)
_REFRESH_HOURS = float(os.environ.get("ROAD_CORRIDOR_REFRESH_HOURS", "24"))
def _hours_since(iso_ts: str) -> float | None:
try:
dt = datetime.fromisoformat(iso_ts.replace("Z", "+00:00"))
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return (datetime.now(timezone.utc) - dt).total_seconds() / 3600.0
except ValueError:
return None
def _feature_ready() -> bool:
from services.road_corridor_sat.config import optional_deps_available, road_corridor_sat_enabled
from services.road_corridor_sat.credentials import sentinel_credentials_configured
if not road_corridor_sat_enabled():
return False
if not optional_deps_available():
logger.debug("road_corridor_trends skipped — optional deps not installed")
return False
if not sentinel_credentials_configured():
logger.debug("road_corridor_trends skipped — Sentinel credentials missing")
return False
return True
def refresh_road_corridor_store() -> None:
from services.road_corridor_sat.storage import build_trends_payload
payload = build_trends_payload()
with _data_lock:
latest_data["road_corridor_trends"] = payload
_mark_fresh("road_corridor_trends")
def fetch_road_corridor_trends(force: bool = False) -> None:
"""Refresh scheduled corridor presets (default: laredo_i35 every 24h)."""
if not is_any_active("road_corridor_trends"):
return
if not _feature_ready():
return
from services.road_corridor_sat.config import SCHEDULED_PRESET_IDS
from services.road_corridor_sat.pipeline import analyze_preset
from services.road_corridor_sat.presets import get_preset
from services.road_corridor_sat.storage import load_refresh_state
state = load_refresh_state()
for preset_id in SCHEDULED_PRESET_IDS:
preset = get_preset(preset_id)
if preset is None:
logger.warning("Unknown scheduled road corridor preset: %s", preset_id)
continue
last = state.get(preset_id)
if last and not force:
age_h = _hours_since(last)
if age_h is not None and age_h < _REFRESH_HOURS:
logger.info(
"road_corridor %s fresh (%.1fh < %.1fh) — skipping",
preset_id,
age_h,
_REFRESH_HOURS,
)
continue
try:
logger.info("road_corridor analysis starting for %s", preset_id)
analyze_preset(preset_id)
except Exception as exc:
logger.exception("road_corridor analysis failed for %s: %s", preset_id, exc)
refresh_road_corridor_store()
+7 -3
View File
@@ -17,6 +17,12 @@ from typing import Any
import requests import requests
def _route_db_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("route-database")
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
_ROUTES_URL = "https://vrs-standing-data.adsb.lol/routes.csv.gz" _ROUTES_URL = "https://vrs-standing-data.adsb.lol/routes.csv.gz"
@@ -24,8 +30,6 @@ _AIRPORTS_URL = "https://vrs-standing-data.adsb.lol/airports.csv.gz"
_REFRESH_INTERVAL_S = 5 * 24 * 3600 _REFRESH_INTERVAL_S = 5 * 24 * 3600
_HTTP_TIMEOUT_S = 60 _HTTP_TIMEOUT_S = 60
from services.network_utils import DEFAULT_USER_AGENT as _USER_AGENT
_lock = threading.RLock() _lock = threading.RLock()
_routes_by_callsign: dict[str, dict[str, Any]] = {} _routes_by_callsign: dict[str, dict[str, Any]] = {}
_airports_by_icao: dict[str, dict[str, Any]] = {} _airports_by_icao: dict[str, dict[str, Any]] = {}
@@ -37,7 +41,7 @@ def _fetch_csv_gz(url: str) -> list[dict[str, str]]:
response = requests.get( response = requests.get(
url, url,
timeout=_HTTP_TIMEOUT_S, timeout=_HTTP_TIMEOUT_S,
headers={"User-Agent": _USER_AGENT, "Accept-Encoding": "gzip"}, headers={"User-Agent": _route_db_user_agent(), "Accept-Encoding": "gzip"},
) )
response.raise_for_status() response.raise_for_status()
text = gzip.decompress(response.content).decode("utf-8-sig") text = gzip.decompress(response.content).decode("utf-8-sig")
+381
View File
@@ -0,0 +1,381 @@
"""Telegram OSINT — public channel web previews (t.me/s) with keyword geoparsing."""
from __future__ import annotations
import hashlib
import logging
import os
import re
from datetime import datetime, timezone
from typing import Any
from services.fetchers._store import _data_lock, _mark_fresh, is_any_active, latest_data
from services.fetchers.news import resolve_coords_match
from services.network_utils import fetch_with_curl, outbound_user_agent
logger = logging.getLogger(__name__)
_DEFAULT_CHANNELS = (
"osintdefender",
"insiderpaper",
"aljazeeraenglish",
"nexta_live",
"war_monitor",
"OSINTtechnical",
"Liveuamap",
)
_MESSAGE_BLOCK_RE = re.compile(
r'<div class="tgme_widget_message_wrap js-widget_message_wrap"[\s\S]*?</div>\s*</div>\s*</div>',
re.IGNORECASE,
)
_TEXT_RE = re.compile(
r'<div class="tgme_widget_message_text[^>]*>([\s\S]*?)</div>',
re.IGNORECASE,
)
_DATE_RE = re.compile(
r'<a class="tgme_widget_message_date" href="(https://t\.me/[^"]+)".*?<time datetime="([^"]+)"',
re.IGNORECASE,
)
_HAS_VIDEO_RE = re.compile(
r'tgme_widget_message_video|js-message_video|<video\s',
re.IGNORECASE,
)
_HAS_PHOTO_RE = re.compile(r'tgme_widget_message_photo_wrap', re.IGNORECASE)
_VIDEO_SRC_RE = re.compile(r'<video[^>]+src="([^"]+)"', re.IGNORECASE)
_BG_IMAGE_RE = re.compile(r"background-image:url\('([^']+)'\)", re.IGNORECASE)
_TELEGRAM_MEDIA_HOST_SUFFIXES = (".telesco.pe", ".telegram-cdn.org")
# Cyrillic / Arabic aliases for war-reporting channels (merged after English resolver).
_EXTRA_PLACE_KEYWORDS: dict[str, tuple[float, float]] = {
"киев": (50.450, 30.523),
"київ": (50.450, 30.523),
"харьков": (49.993, 36.231),
"харків": (49.993, 36.231),
"одесса": (46.482, 30.724),
"одеса": (46.482, 30.724),
"донецк": (48.015, 37.803),
"донецьк": (48.015, 37.803),
"луганск": (48.574, 39.307),
"луганськ": (48.574, 39.307),
"москва": (55.755, 37.617),
"крым": (45.000, 34.000),
"крим": (45.000, 34.000),
"бахмут": (48.595, 38.000),
"запорожье": (47.838, 35.139),
"запоріжжя": (47.838, 35.139),
"غزة": (31.416, 34.333),
"دمشق": (33.513, 36.276),
"بيروت": (33.893, 35.501),
"tel aviv": (32.085, 34.781),
"תל אביב": (32.085, 34.781),
}
# Country-level news geocodes sit on national centroids that stack with threat alerts.
# Telegram uses major metro anchors so pins land on a different map cell than news.
_TELEGRAM_ANCHOR_OVERRIDES: dict[str, tuple[float, float]] = {
"israel": (32.085, 34.781), # Tel Aviv (news uses central Israel ~Jerusalem corridor)
"middle east": (32.085, 34.781),
"china": (39.904, 116.407), # Beijing (news uses country centroid)
"united states": (40.712, -74.006), # New York (news uses Washington DC)
"usa": (40.712, -74.006),
"us": (40.712, -74.006),
"america": (40.712, -74.006),
"uk": (51.507, -0.127), # London
"iran": (35.689, 51.389), # Tehran
"russia": (55.755, 37.617), # Moscow
"ukraine": (50.450, 30.523), # Kyiv
"france": (48.856, 2.352), # Paris
"germany": (52.520, 13.405), # Berlin
"lebanon": (34.433, 35.844), # Tripoli (news uses Beirut corridor)
}
_RISK_KEYWORDS = (
"war",
"missile",
"strike",
"attack",
"crisis",
"tension",
"military",
"conflict",
"defense",
"clash",
"nuclear",
"invasion",
"bomb",
"drone",
"weapon",
"sanctions",
"ceasefire",
"escalation",
"killed",
"destroyed",
"operation",
"casualty",
"frontline",
"threat",
"explosion",
"shelling",
)
def telegram_osint_enabled() -> bool:
return str(os.environ.get("TELEGRAM_OSINT_ENABLED", "true")).strip().lower() not in {
"0",
"false",
"no",
"off",
"",
}
def _configured_channels() -> list[str]:
raw = str(os.environ.get("TELEGRAM_OSINT_CHANNELS", "")).strip()
if raw:
return [part.strip().lstrip("@") for part in raw.split(",") if part.strip()]
return list(_DEFAULT_CHANNELS)
def telegram_media_host_allowed(hostname: str | None) -> bool:
host = str(hostname or "").strip().lower()
if not host:
return False
return any(host.endswith(suffix) for suffix in _TELEGRAM_MEDIA_HOST_SUFFIXES)
def _extract_media(block: str, link: str) -> dict[str, Any]:
has_video = bool(_HAS_VIDEO_RE.search(block))
has_photo = bool(_HAS_PHOTO_RE.search(block))
media_type: str | None = None
media_url: str | None = None
if has_video:
media_type = "video"
video_match = _VIDEO_SRC_RE.search(block)
if video_match:
media_url = video_match.group(1).strip()
elif has_photo:
media_type = "photo"
photo_match = _BG_IMAGE_RE.search(block)
if photo_match:
media_url = photo_match.group(1).strip()
embed_url: str | None = None
if media_type and link:
embed_url = f"{link}?embed=1"
return {
"media_type": media_type,
"media_url": media_url,
"embed_url": embed_url,
}
def _strip_html(text: str) -> str:
cleaned = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
cleaned = re.sub(r"<[^>]+>", "", cleaned)
return (
cleaned.replace("&quot;", '"')
.replace("&amp;", "&")
.replace("&lt;", "<")
.replace("&gt;", ">")
.strip()
)
def _score_risk(text: str) -> int:
lower = text.lower()
score = 1
for kw in _RISK_KEYWORDS:
if kw in lower:
score += 2
return min(10, score)
def _refresh_post_coords(post: dict[str, Any]) -> dict[str, Any]:
"""Re-apply geoparsing so stored posts pick up anchor updates."""
text = "\n".join(
str(part).strip()
for part in (post.get("title"), post.get("description"))
if part and str(part).strip()
)
if not text:
return post
coords = _resolve_telegram_coords(text)
if not coords:
return post
updated = dict(post)
updated["coords"] = [coords[0], coords[1]]
return updated
def _resolve_telegram_coords(text: str) -> tuple[float, float] | None:
lower = text.lower()
match = resolve_coords_match(lower)
if match:
_coords, keyword = match
anchor = _TELEGRAM_ANCHOR_OVERRIDES.get(keyword.strip().lower())
if anchor:
return anchor
return _coords
for keyword, coords in sorted(_EXTRA_PLACE_KEYWORDS.items(), key=lambda x: len(x[0]), reverse=True):
if keyword in lower:
return coords
return None
def _post_link(post: dict[str, Any]) -> str:
return str(post.get("link") or "").strip()
def _extract_new_channel_posts(
html: str,
channel: str,
known_links: set[str],
*,
bootstrap_limit: int = 12,
) -> list[dict[str, Any]]:
"""Return unseen posts from a channel page; stop once we hit a stored link."""
parsed = parse_telegram_channel_html(html, channel)
if not parsed:
return []
if not known_links:
return parsed[-bootstrap_limit:]
fresh: list[dict[str, Any]] = []
for post in reversed(parsed):
link = _post_link(post)
if not link:
continue
if link in known_links:
break
fresh.append(post)
fresh.reverse()
return fresh
def _merge_telegram_posts(
existing: list[dict[str, Any]],
incoming: list[dict[str, Any]],
*,
max_posts: int = 120,
) -> tuple[list[dict[str, Any]], int]:
known_links = {_post_link(post) for post in existing if _post_link(post)}
added = 0
for post in incoming:
link = _post_link(post)
if not link or link in known_links:
continue
known_links.add(link)
existing.append(post)
added += 1
existing.sort(key=lambda p: str(p.get("published") or ""), reverse=True)
return existing[:max_posts], added
def parse_telegram_channel_html(html: str, channel: str) -> list[dict[str, Any]]:
"""Parse public t.me/s channel preview HTML into post dicts."""
posts: list[dict[str, Any]] = []
for block in _MESSAGE_BLOCK_RE.findall(html or ""):
text_match = _TEXT_RE.search(block)
if not text_match:
continue
text = _strip_html(text_match.group(1))
if len(text) < 10:
continue
date_match = _DATE_RE.search(block)
link = date_match.group(1) if date_match else f"https://t.me/{channel}"
published = date_match.group(2) if date_match else datetime.now(timezone.utc).isoformat()
title = text.split("\n", 1)[0][:160]
risk_score = _score_risk(text)
coords = _resolve_telegram_coords(text)
post_id = hashlib.sha1(f"{link}|{published}".encode("utf-8")).hexdigest()[:16]
media = _extract_media(block, link)
posts.append(
{
"id": post_id,
"title": title,
"description": text[:1200],
"link": link,
"published": published,
"source": f"t.me/{channel}",
"channel": channel,
"risk_score": risk_score,
"coords": [coords[0], coords[1]] if coords else None,
**media,
}
)
return posts
def fetch_telegram_osint() -> dict[str, Any]:
if not is_any_active("telegram_osint"):
return latest_data.get("telegram_osint") or {"posts": [], "total": 0, "timestamp": None}
if not telegram_osint_enabled():
with _data_lock:
latest_data["telegram_osint"] = {"posts": [], "total": 0, "timestamp": None, "disabled": True}
_mark_fresh("telegram_osint")
return latest_data["telegram_osint"]
headers = {
"User-Agent": (
f"Mozilla/5.0 (compatible; {outbound_user_agent('telegram-osint')}) "
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml",
}
with _data_lock:
prior = latest_data.get("telegram_osint") or {}
existing_posts = list(prior.get("posts") or [])
known_links = {_post_link(post) for post in existing_posts if _post_link(post)}
incoming: list[dict[str, Any]] = []
for channel in _configured_channels():
url = f"https://t.me/s/{channel}"
try:
resp = fetch_with_curl(url, timeout=15, headers=headers)
if not resp or resp.status_code != 200:
logger.warning(
"Telegram channel %s fetch failed: HTTP %s",
channel,
resp.status_code if resp else "no response",
)
continue
channel_new = _extract_new_channel_posts(resp.text, channel, known_links)
for post in channel_new:
link = _post_link(post)
if not link or link in known_links:
continue
known_links.add(link)
incoming.append(post)
except Exception as exc:
logger.warning("Telegram channel %s parse failed: %s", channel, exc)
merged_posts, added = _merge_telegram_posts(existing_posts, incoming)
merged_posts = [_refresh_post_coords(post) for post in merged_posts]
geolocated = sum(1 for p in merged_posts if p.get("coords"))
payload = {
"posts": merged_posts,
"total": len(merged_posts),
"geolocated": geolocated,
"timestamp": datetime.now(timezone.utc).isoformat(),
"channels": _configured_channels(),
"last_fetch_new": added,
}
with _data_lock:
latest_data["telegram_osint"] = payload
_mark_fresh("telegram_osint")
logger.info(
"Telegram OSINT: +%s new, %s retained (%s geolocated)",
added,
len(merged_posts),
geolocated,
)
return payload
+7 -1
View File
@@ -10,6 +10,12 @@ from datetime import datetime, timezone
from services.fetchers._store import _data_lock, _mark_fresh, latest_data from services.fetchers._store import _data_lock, _mark_fresh, latest_data
from services.network_utils import fetch_with_curl from services.network_utils import fetch_with_curl
def _trains_user_agent() -> str:
from services.network_utils import outbound_user_agent
return outbound_user_agent("trains")
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
_EARTH_RADIUS_KM = 6371.0 _EARTH_RADIUS_KM = 6371.0
@@ -379,7 +385,7 @@ def _fetch_digitraffic() -> list[dict]:
timeout=15, timeout=15,
headers={ headers={
"Accept-Encoding": "gzip", "Accept-Encoding": "gzip",
"User-Agent": "ShadowBroker-OSINT/1.0", "User-Agent": _trains_user_agent(),
}, },
) )
if resp.status_code != 200: if resp.status_code != 200:
@@ -0,0 +1,457 @@
"""USNI News Fleet & Marine Tracker — authoritative weekly carrier
position publication.
Why this exists
---------------
The previous carrier_tracker pipeline relied on GDELT headline matching
(``api.gdeltproject.org``) to derive positions from text like "USS Ford
in the Mediterranean" → centroid of "Mediterranean Sea". That was
- low-precision (audit issue #245 — false precision from text mentions),
- unreliable (``api.gdeltproject.org`` is sometimes unreachable from
certain network paths, including Docker Desktop on some Windows hosts).
USNI publishes a weekly tracker that explicitly lists where every U.S.
carrier is operating. The article body uses extremely consistent phrasing:
"The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
"Aircraft carrier USS George Washington (CVN-73) is in port in
Yokosuka, Japan."
"USS Dwight D. Eisenhower (CVN-69) sails down the Elizabeth River"
Those are deterministic to parse. This module:
1. Pulls the WordPress RSS feeds (both site-wide and category) the
site-wide feed often has fresher posts before the category feed
catches up, so we union them.
2. Picks the most recent post by parsed ``pubDate``.
3. For each carrier in the registry, scans the article body for a
"is operating in / is in port in / departed from" pattern near
the carrier's name.
4. Maps the extracted region phrase to coordinates via the carrier
tracker's existing REGION_COORDS.
The result is a ``{hull: position_entry}`` dict that the carrier tracker
consumes as a high-confidence source ``position_confidence: "recent"``
with ``position_source_at`` set to the article's actual publication
timestamp (not ``now()``).
Politeness
----------
We send the per-install operator handle via ``outbound_user_agent``
(Round 7a) so USNI can rate-limit / contact the specific install if
needed. Article-body pages return 403 to non-browser UAs (Cloudflare),
but WordPress RSS feeds are open and serve the full article in
``<content:encoded>`` that's the supported path for aggregators and
the one we use. We do not spoof browser headers.
"""
from __future__ import annotations
import logging
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable
from services.network_utils import fetch_with_curl, outbound_user_agent
logger = logging.getLogger(__name__)
_RSS_URLS: tuple[str, ...] = (
# Site-wide feed often has the freshest posts before the category
# feed catches up. We try this first.
"https://news.usni.org/feed",
# Category feed has older fleet trackers for backfill.
"https://news.usni.org/category/fleet-tracker/feed",
)
_RSS_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}
_FLEET_TRACKER_TITLE_RE = re.compile(
r"fleet\s+and\s+marine\s+tracker", re.IGNORECASE
)
_TAG_STRIP_RE = re.compile(r"<[^>]+>")
_WHITESPACE_RE = re.compile(r"\s+")
def _strip_html(html: str) -> str:
text = _TAG_STRIP_RE.sub(" ", html or "")
return _WHITESPACE_RE.sub(" ", text).strip()
def _request_headers() -> dict[str, str]:
"""Headers USNI's WordPress feed accepts from a legitimate aggregator.
The ``Referer`` is the category index page that's where a real
feed reader navigates from. ``Accept`` declares RSS preference but
falls back to HTML. No browser UA spoofing.
"""
return {
"User-Agent": outbound_user_agent("usni-fleet-tracker"),
"Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.1",
"Accept-Language": "en-US,en;q=0.5",
"Referer": "https://news.usni.org/category/fleet-tracker",
}
def _parse_pubdate(raw: str) -> datetime | None:
if not raw:
return None
try:
dt = parsedate_to_datetime(raw)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except (TypeError, ValueError):
return None
def _iter_fleet_tracker_items(rss_urls: Iterable[str]) -> list[dict]:
"""Pull every fleet-tracker post visible across the given RSS feeds.
De-duplicates by article link. Returns a list of dicts:
{"title", "link", "pub_date" (datetime), "body" (plain text)}
"""
items_by_link: dict[str, dict] = {}
for url in rss_urls:
try:
r = fetch_with_curl(url, timeout=15, headers=_request_headers())
except Exception as exc:
logger.debug("USNI RSS %s exception: %s", url, exc)
continue
if not r or r.status_code != 200 or not r.text:
logger.debug(
"USNI RSS %s returned status=%s body=%d",
url,
getattr(r, "status_code", "?"),
len(getattr(r, "text", "") or ""),
)
continue
try:
root = ET.fromstring(r.text)
except ET.ParseError as exc:
logger.warning("USNI RSS parse error from %s: %s", url, exc)
continue
for item in root.findall(".//item"):
title = (item.findtext("title") or "").strip()
if not _FLEET_TRACKER_TITLE_RE.search(title):
continue
link = (item.findtext("link") or "").strip()
if not link or link in items_by_link:
continue
pub_dt = _parse_pubdate(item.findtext("pubDate") or "")
body_html = (
item.findtext("content:encoded", default="", namespaces=_RSS_NS)
or item.findtext("description", default="")
or ""
)
items_by_link[link] = {
"title": title,
"link": link,
"pub_date": pub_dt,
"body": _strip_html(body_html),
}
return list(items_by_link.values())
# Map USNI region phrases to keys in carrier_tracker.REGION_COORDS.
# The carrier_tracker table already covers most named bodies of water and
# major ports — we just need to teach this module to RECOGNIZE the
# specific phrases USNI's editorial style uses, which sometimes spell
# the same body of water differently.
_USNI_REGION_ALIASES: tuple[tuple[str, str], ...] = (
# USNI phrase (lowercase) -> REGION_COORDS key
("eastern mediterranean", "eastern mediterranean"),
("western mediterranean", "western mediterranean"),
("mediterranean sea", "mediterranean"),
("the mediterranean", "mediterranean"),
("red sea", "red sea"),
("arabian sea area of responsibility", "arabian sea"),
("north arabian sea", "north arabian sea"),
("arabian sea", "arabian sea"),
("persian gulf", "persian gulf"),
("gulf of oman", "gulf of oman"),
("strait of hormuz", "strait of hormuz"),
("south china sea", "south china sea"),
("east china sea", "east china sea"),
("philippine sea", "philippine sea"),
("sea of japan", "sea of japan"),
("taiwan strait", "taiwan strait"),
("western pacific", "western pacific"),
("pacific ocean", "pacific"),
("indian ocean", "indian ocean"),
("north atlantic", "north atlantic"),
("western atlantic", "atlantic"),
("eastern atlantic", "atlantic"),
("atlantic ocean", "atlantic"),
("gulf of aden", "gulf of aden"),
("horn of africa", "horn of africa"),
("bab el-mandeb", "bab el-mandeb"),
("suez canal", "suez canal"),
("baltic sea", "baltic sea"),
("north sea", "north sea"),
("black sea", "black sea"),
("south atlantic", "south atlantic"),
("coral sea", "coral sea"),
("gulf of mexico", "gulf of mexico"),
("caribbean sea", "caribbean"),
("caribbean", "caribbean"),
# Specific ports
("naval station norfolk", "norfolk"),
("norfolk naval shipyard", "newport news"),
("newport news shipbuilding", "newport news"),
("newport news", "newport news"),
# USNI tags Norfolk mentions with state suffix; match both.
("norfolk, va", "norfolk"),
("norfolk", "norfolk"),
("naval station everett", "puget sound"),
("naval base kitsap", "bremerton"),
("bremerton", "bremerton"),
("puget sound", "puget sound"),
("naval base san diego", "san diego"),
("san diego, calif", "san diego"),
("san diego", "san diego"),
("yokosuka, japan", "yokosuka"),
("yokosuka", "yokosuka"),
("pearl harbor", "pearl harbor"),
("apra harbor, guam", "guam"),
("guam", "guam"),
("bahrain", "bahrain"),
("naval station rota", "rota"),
("rota, spain", "rota"),
("naples, italy", "naples"),
# Fleets / AORs
("5th fleet", "5th fleet"),
("6th fleet", "6th fleet"),
("7th fleet", "7th fleet"),
("3rd fleet", "3rd fleet"),
("2nd fleet", "2nd fleet"),
("centcom", "centcom"),
("indo-pacific command", "indopacom"),
("eucom", "eucom"),
("southcom", "southcom"),
)
def _resolve_region_phrase(phrase: str) -> tuple[str, str] | None:
"""Map a USNI region phrase to a ``(canonical_key, display)`` tuple,
or ``None`` if we don't recognize it.
``canonical_key`` is what ``carrier_tracker.REGION_COORDS`` keys on.
``display`` is the phrase we'll show in the dossier description.
"""
p = (phrase or "").lower().strip()
if not p:
return None
for usni_phrase, canonical in _USNI_REGION_ALIASES:
if usni_phrase in p:
return canonical, usni_phrase
return None
# Operating-verb phrases USNI uses, with a capture group for the region
# phrase that immediately follows. Each pattern is designed to swallow
# the optional editorial filler that often appears between verb and
# location (e.g. "returned Friday to Norfolk" — "Friday" goes in the
# filler; "Norfolk" is the location).
#
# Order matters: most-specific patterns first, so e.g. "is in port in"
# wins over the generic "is".
_DAY_FILLER = r"(?:[A-Z][a-z]+(?:day)?,?\s+)?" # optional "Friday" / "Monday" / etc.
_LOC_CAPTURE = r"([A-Za-z][A-Za-z0-9\s,\.\-']{2,80})"
_OPERATING_PATTERNS: tuple[re.Pattern, ...] = (
# "is operating in [the] {REGION}" / "is also operating in [the] {REGION}"
re.compile(r"\bis\s+(?:also\s+|now\s+)?operating\s+in\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "is conducting <stuff> in [the] {REGION}"
re.compile(r"\bis\s+conducting\s+[A-Za-z0-9\-\s]{2,40}\s+in\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "is in port in {LOCATION}"
re.compile(r"\bis\s+in\s+port\s+in\s+" + _LOC_CAPTURE, re.IGNORECASE),
# "is in port" (no location — degenerate, use carrier's homeport via separate path)
# → not captured here; falls through to homeport
# "is underway in [the] {REGION}"
re.compile(r"\bis\s+underway\s+in\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "is deployed to [the] {REGION}" / "deployed in"
re.compile(r"\bis\s+deployed\s+(?:to|in)\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "returned [Day] to {LOCATION}" / "returned [Day] from {REGION}"
re.compile(r"\breturned\s+" + _DAY_FILLER + r"to\s+" + _LOC_CAPTURE, re.IGNORECASE),
re.compile(r"\breturned\s+" + _DAY_FILLER + r"from\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "arrived [Day] in/at {LOCATION}"
re.compile(r"\barrived\s+" + _DAY_FILLER + r"(?:in|at)\s+" + _LOC_CAPTURE, re.IGNORECASE),
# "departed [Day] from {LOCATION}"
re.compile(r"\bdeparted\s+" + _DAY_FILLER + r"(?:from\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "transiting [the] {REGION}" / "sailing through [the] {REGION}"
re.compile(r"\btransiting\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
re.compile(r"\bsailing\s+through\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
# "is homeported at {LOCATION}"
re.compile(r"\bis\s+homeported\s+at\s+" + _LOC_CAPTURE, re.IGNORECASE),
)
def _extract_region_for_carrier(
body: str,
carrier_names: list[str],
hull_code: str,
) -> str | None:
"""Return the best-guess region phrase for one carrier from the
article body, or None if no confident match.
Algorithm:
1. Find every mention of the carrier (any name variant or the hull
code) in the body.
2. For each mention, look in the ~300-char window AFTER it for any
of the operating-verb patterns.
3. Return the first hit. If a more-confident match later turns up
(e.g. "is operating in the X" beats "is homeported at Y"), the
first one in document order still wins USNI's structure puts
the position-update sentence near the top of each carrier's
section, and the homeport mention later.
"""
# Build a master mention regex covering every name variant + the hull.
candidates: list[str] = []
for name in carrier_names:
if name and len(name) >= 4:
candidates.append(re.escape(name))
if hull_code:
candidates.append(re.escape(hull_code))
if not candidates:
return None
mention_re = re.compile(r"\b(?:" + "|".join(candidates) + r")\b", re.IGNORECASE)
window_chars = 320
seen_phrases: list[str] = []
for mention in mention_re.finditer(body):
end = mention.end()
window = body[end : end + window_chars]
# Cut window at the next sentence break for tighter context.
# (We use the LAST period within the window so "Norfolk, Va." isn't
# confused for a sentence end — USNI uses ", Va." prolifically.)
# Sentence break candidates: ". " followed by uppercase OR newline.
sent_break = re.search(r"[\.!?]\s+[A-Z]", window)
if sent_break:
window = window[: sent_break.start() + 1]
# Try patterns in priority order.
for pat in _OPERATING_PATTERNS:
m = pat.search(window)
if not m:
continue
phrase = m.group(1).strip().rstrip(",.;: ")
if not phrase:
continue
# Strip trailing editorial filler — USNI often writes
# "Norfolk, Va., according to ship spotters" or
# "Yokosuka, Japan, according to..."
phrase = re.split(
r",\s+(?:according|as of|for|while|where|in support|in the)",
phrase,
maxsplit=1,
)[0].strip()
seen_phrases.append(phrase)
return phrase
return seen_phrases[0] if seen_phrases else None
def fetch_latest_fleet_tracker_positions(
carrier_registry: dict | None = None,
region_coords: dict | None = None,
) -> dict[str, dict]:
"""Return ``{hull: position_entry}`` for the latest USNI fleet tracker.
Entries look like::
{
"lat": 18.0, "lng": 39.5, "heading": 0,
"desc": "Red Sea (USNI May 18, 2026)",
"source": "USNI News Fleet & Marine Tracker (May 18, 2026)",
"source_url": "https://news.usni.org/2026/05/18/...",
"position_source_at": "2026-05-18T18:58:44+00:00",
"position_confidence": "recent",
}
Carriers whose section can't be parsed (e.g. an off-week with no
mention) are simply absent from the result the caller keeps
whatever position they had before.
``carrier_registry`` and ``region_coords`` default to the carrier_tracker
module's own tables; passed in here for testability.
"""
if carrier_registry is None or region_coords is None:
from services.carrier_tracker import CARRIER_REGISTRY, REGION_COORDS
carrier_registry = carrier_registry or CARRIER_REGISTRY
region_coords = region_coords or REGION_COORDS
items = _iter_fleet_tracker_items(_RSS_URLS)
if not items:
logger.warning("USNI fleet-tracker: no parseable RSS items")
return {}
# Pick the most recent by parsed pubDate. Items without a parseable
# date fall to the back of the list.
items.sort(
key=lambda it: it["pub_date"] or datetime(1970, 1, 1, tzinfo=timezone.utc),
reverse=True,
)
latest = items[0]
pub_dt: datetime | None = latest["pub_date"]
pub_iso = pub_dt.isoformat() if pub_dt else ""
pub_human = pub_dt.strftime("%b %d, %Y") if pub_dt else "unknown date"
body = latest["body"]
if not body:
logger.warning("USNI fleet-tracker: latest item has empty body")
return {}
positions: dict[str, dict] = {}
for hull, info in carrier_registry.items():
# Build name variants we'll try in the body.
full_name = info["name"] # "USS Gerald R. Ford (CVN-78)"
without_hull = full_name.split("(")[0].strip() # "USS Gerald R. Ford"
last_word = without_hull.split()[-1] # "Ford"
ship_only = without_hull[4:] # "Gerald R. Ford"
# Variants ordered most-specific first.
variants: list[str] = []
for v in (without_hull, f"USS {ship_only}", ship_only, last_word):
if v and v not in variants and len(v) >= 4:
variants.append(v)
phrase = _extract_region_for_carrier(body, variants, hull)
if not phrase:
continue
resolved = _resolve_region_phrase(phrase)
if not resolved:
logger.debug(
"USNI: %s region phrase %r did not match any known region",
hull, phrase,
)
continue
canonical_key, display_phrase = resolved
coords = region_coords.get(canonical_key)
if not coords:
continue
positions[hull] = {
"lat": coords[0],
"lng": coords[1],
"heading": 0,
"desc": f"{display_phrase.title()} (USNI {pub_human})",
"source": f"USNI News Fleet & Marine Tracker ({pub_human})",
"source_url": latest["link"],
"position_source_at": pub_iso,
"position_confidence": "recent",
}
if positions:
logger.info(
"USNI fleet-tracker: parsed %d/%d carrier positions from %s",
len(positions), len(carrier_registry), latest["link"],
)
else:
logger.warning(
"USNI fleet-tracker: latest article %s yielded zero parseable carriers",
latest["link"],
)
return positions
+13 -5
View File
@@ -21,9 +21,17 @@ _cache_lock = threading.Lock()
_local_search_cache: List[Dict[str, Any]] | None = None _local_search_cache: List[Dict[str, Any]] | None = None
_local_search_lock = threading.Lock() _local_search_lock = threading.Lock()
_USER_AGENT = os.environ.get( # Round 7a: per-install operator handle threads through every Nominatim
"NOMINATIM_USER_AGENT", "ShadowBroker/1.0 (https://github.com/BigBodyCobain/Shadowbroker)" # call. NOMINATIM_USER_AGENT env override is still honored for operators
) # who run a custom relay / known good identity, but the default uses the
# per-install handle so OpenStreetMap can rate-limit per install instead
# of treating "Shadowbroker" as one big offender.
def _nominatim_user_agent() -> str:
override = os.environ.get("NOMINATIM_USER_AGENT", "").strip()
if override:
return override
from services.network_utils import outbound_user_agent
return outbound_user_agent("nominatim")
def _get_cache(key: str): def _get_cache(key: str):
@@ -178,7 +186,7 @@ def search_geocode(query: str, limit: int = 5, local_only: bool = False) -> List
res = fetch_with_curl( res = fetch_with_curl(
url, url,
headers={ headers={
"User-Agent": _USER_AGENT, "User-Agent": _nominatim_user_agent(),
"Accept-Language": "en", "Accept-Language": "en",
}, },
timeout=6, timeout=6,
@@ -241,7 +249,7 @@ def reverse_geocode(lat: float, lng: float, local_only: bool = False) -> Dict[st
res = fetch_with_curl( res = fetch_with_curl(
url, url,
headers={ headers={
"User-Agent": _USER_AGENT, "User-Agent": _nominatim_user_agent(),
"Accept-Language": "en", "Accept-Language": "en",
}, },
timeout=6, timeout=6,
+97 -55
View File
@@ -1,3 +1,4 @@
import os
import requests import requests
import logging import logging
import zipfile import zipfile
@@ -8,11 +9,62 @@ from datetime import datetime
from urllib.parse import urljoin, urlparse from urllib.parse import urljoin, urlparse
from services.network_utils import fetch_with_curl from services.network_utils import fetch_with_curl
def _geopolitics_user_agent() -> str:
"""Round 7a: GDELT geopolitics fetcher attribution."""
from services.network_utils import outbound_user_agent
return outbound_user_agent("geopolitics-gdelt")
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Cache Frontline data for 30 minutes, it doesn't move that fast # Cache Frontline data for 30 minutes, it doesn't move that fast
frontline_cache = TTLCache(maxsize=1, ttl=1800) frontline_cache = TTLCache(maxsize=1, ttl=1800)
_DEFAULT_DEEPSTATE_MIRROR_REPO = "cyterat/deepstate-map-data"
def _deepstate_mirror_ref() -> tuple[str, str]:
"""Return (github_repo_slug, git_ref) for the DeepState mirror.
When ``DEEPSTATE_MIRROR_COMMIT`` is set, ingest is pinned to that immutable
SHA instead of following the mutable ``main`` branch (#362).
"""
repo = (os.environ.get("DEEPSTATE_MIRROR_REPO") or _DEFAULT_DEEPSTATE_MIRROR_REPO).strip()
if repo.count("/") != 1:
repo = _DEFAULT_DEEPSTATE_MIRROR_REPO
commit = (os.environ.get("DEEPSTATE_MIRROR_COMMIT") or "").strip()
ref = commit if commit else "main"
return repo, ref
def _latest_deepstate_geo_path(tree_items: list) -> str | None:
geo_files = [
item["path"]
for item in tree_items
if isinstance(item, dict)
and str(item.get("path", "")).startswith("data/deepstatemap_data_")
and str(item.get("path", "")).endswith(".geojson")
]
return sorted(geo_files)[-1] if geo_files else None
def _annotate_deepstate_geojson(data: dict) -> dict:
name_map = {
0: "Russian-occupied areas",
1: "Russian advance",
2: "Liberated area",
3: "Russian-occupied areas", # Crimea / LPR / DPR
4: "Directions of UA attacks",
}
if "features" in data:
for idx, feature in enumerate(data["features"]):
if "properties" not in feature or feature["properties"] is None:
feature["properties"] = {}
feature["properties"]["name"] = name_map.get(idx, "Russian-occupied areas")
feature["properties"]["zone_id"] = idx
return data
@cached(frontline_cache) @cached(frontline_cache)
def fetch_ukraine_frontlines(): def fetch_ukraine_frontlines():
@@ -20,67 +72,34 @@ def fetch_ukraine_frontlines():
Fetches the latest GeoJSON data representing the Ukraine frontline. Fetches the latest GeoJSON data representing the Ukraine frontline.
We use the cyterat/deepstate-map-data github mirror since the public API is locked. We use the cyterat/deepstate-map-data github mirror since the public API is locked.
""" """
repo, ref = _deepstate_mirror_ref()
try: try:
logger.info("Fetching DeepStateMap from GitHub mirror...") logger.info("Fetching DeepStateMap from GitHub mirror (%s @ %s)...", repo, ref)
# First, query the repo tree to find the latest file name tree_url = f"https://api.github.com/repos/{repo}/git/trees/{ref}?recursive=1"
tree_url = (
"https://api.github.com/repos/cyterat/deepstate-map-data/git/trees/main?recursive=1"
)
res_tree = requests.get(tree_url, timeout=10) res_tree = requests.get(tree_url, timeout=10)
if res_tree.status_code == 200: if res_tree.status_code == 200:
tree_data = res_tree.json().get("tree", []) latest_file = _latest_deepstate_geo_path(res_tree.json().get("tree", []))
# Filter for geojson files in data folder if latest_file:
geo_files = [ raw_url = f"https://raw.githubusercontent.com/{repo}/{ref}/{latest_file}"
item["path"] logger.info("Downloading DeepStateMap: %s", raw_url)
for item in tree_data
if item["path"].startswith("data/deepstatemap_data_")
and item["path"].endswith(".geojson")
]
if geo_files:
# Get the alphabetically latest file (since it's named with YYYYMMDD)
latest_file = sorted(geo_files)[-1]
raw_url = f"https://raw.githubusercontent.com/cyterat/deepstate-map-data/main/{latest_file}"
logger.info(f"Downloading latest DeepStateMap: {raw_url}")
res_geo = requests.get(raw_url, timeout=20) res_geo = requests.get(raw_url, timeout=20)
if res_geo.status_code == 200: if res_geo.status_code == 200:
data = res_geo.json() return _annotate_deepstate_geojson(res_geo.json())
logger.error(
# The Cyterat GitHub mirror strips all properties and just provides a raw array of Feature polygons. "Failed to fetch parsed Github Raw GeoJSON: %s", res_geo.status_code
# Based on DeepStateMap's frontend mapping, the array index corresponds to the zone type: )
# 0: Russian-occupied areas else:
# 1: Russian advance logger.error("No deepstatemap_data_*.geojson files in mirror tree at %s", ref)
# 2: Liberated area
# 3: Uncontested/Crimea (often folded into occupied)
name_map = {
0: "Russian-occupied areas",
1: "Russian advance",
2: "Liberated area",
3: "Russian-occupied areas", # Crimea / LPR / DPR
4: "Directions of UA attacks",
}
if "features" in data:
for idx, feature in enumerate(data["features"]):
if "properties" not in feature or feature["properties"] is None:
feature["properties"] = {}
feature["properties"]["name"] = name_map.get(
idx, "Russian-occupied areas"
)
feature["properties"]["zone_id"] = idx
return data
else:
logger.error(
f"Failed to fetch parsed Github Raw GeoJSON: {res_geo.status_code}"
)
else: else:
logger.error(f"Failed to fetch Github Tree for Deepstatemap: {res_tree.status_code}") logger.error(
"Failed to fetch Github tree for Deepstatemap (%s @ %s): %s",
repo,
ref,
res_tree.status_code,
)
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e: except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"Error fetching DeepStateMap: {e}") logger.error(f"Error fetching DeepStateMap: {e}")
return None return None
@@ -316,7 +335,7 @@ def _fetch_article_title(url):
resp = requests.get( resp = requests.get(
current_url, current_url,
timeout=4, timeout=4,
headers={"User-Agent": "Mozilla/5.0 (compatible; OSINT Dashboard/1.0)"}, headers={"User-Agent": _geopolitics_user_agent()},
stream=True, stream=True,
allow_redirects=False, allow_redirects=False,
) )
@@ -521,10 +540,29 @@ def _parse_gdelt_export_zip(zip_bytes, conflict_codes, seen_locs, features, loc_
logger.warning(f"Failed to parse GDELT export zip: {e}") logger.warning(f"Failed to parse GDELT export zip: {e}")
# GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
# bucket of the same name. GCS returns the wildcard ``*.storage.googleapis.com``
# certificate, which legitimately does NOT cover the GDELT custom domain
# — Python's TLS verification correctly refuses it. Some networks/POPs
# happen to route through a path where this works; many do not (notably
# Docker Desktop's outbound NAT on local installs).
#
# Fix: rewrite the URL to hit GCS directly with a path-style bucket
# reference, where the standard GCS cert is genuinely valid. Same data,
# verified TLS, no operator-side workaround needed.
def _gcs_direct_gdelt_url(url: str) -> str:
"""If ``url`` points at data.gdeltproject.org, return the equivalent
GCS-direct URL. Otherwise return the URL unchanged."""
prefix = "://data.gdeltproject.org/"
if prefix in url:
return url.replace(prefix, "://storage.googleapis.com/data.gdeltproject.org/", 1)
return url
def _download_gdelt_export(url): def _download_gdelt_export(url):
"""Download a single GDELT export file, return bytes or None.""" """Download a single GDELT export file, return bytes or None."""
try: try:
res = fetch_with_curl(url, timeout=15) res = fetch_with_curl(_gcs_direct_gdelt_url(url), timeout=15)
if res.status_code == 200: if res.status_code == 200:
return res.content return res.content
except (ConnectionError, TimeoutError, OSError): # non-critical except (ConnectionError, TimeoutError, OSError): # non-critical
@@ -620,8 +658,12 @@ def fetch_global_military_incidents():
# HTTPS is used to prevent passive network observers from injecting # HTTPS is used to prevent passive network observers from injecting
# poisoned export records into the global incident map via MITM. # poisoned export records into the global incident map via MITM.
# GDELT serves the same content over HTTPS as HTTP. # GDELT serves the same content over HTTPS as HTTP.
# Use the GCS-direct URL because data.gdeltproject.org's CNAME
# serves a wildcard *.storage.googleapis.com cert that legitimately
# doesn't cover the GDELT hostname. See _gcs_direct_gdelt_url above.
index_res = fetch_with_curl( index_res = fetch_with_curl(
"https://data.gdeltproject.org/gdeltv2/lastupdate.txt", timeout=10 _gcs_direct_gdelt_url("https://data.gdeltproject.org/gdeltv2/lastupdate.txt"),
timeout=10,
) )
if index_res.status_code != 200: if index_res.status_code != 200:
logger.error(f"GDELT lastupdate failed: {index_res.status_code}") logger.error(f"GDELT lastupdate failed: {index_res.status_code}")
@@ -1,14 +1,20 @@
"""Function Keys — anonymous citizenship proof. """Function Keys — anonymous credential scaffolding.
Source of truth: ``infonet-economy/IMPLEMENTATION_PLAN.md`` §4.4, Source of truth: ``infonet-economy/IMPLEMENTATION_PLAN.md`` §4.4,
``infonet-economy/BRAINDUMP.md`` §11 item 9. ``infonet-economy/BRAINDUMP.md`` §11 item 9.
A citizen should be able to prove "I am a UBI-eligible Infonet A citizen should eventually be able to prove "I am a UBI-eligible
citizen" to a real-world operator (food bank, community service) Infonet citizen" to a real-world operator (food bank, community
**without revealing their Infonet identity**. The naive approach service) **without revealing their Infonet identity**. The current
(scramble a public key, record each redemption on chain) leaks Python implementation wires the accounting, nullifier, receipt, and
identity through metadata correlation (time, location, operator, operator flows, but its HMAC challenge-response is a placeholder for
frequency). integration tests. It is not a production anonymous or zero-knowledge
citizenship proof until blind signatures or anonymous credentials are
selected and wired.
The naive approach (scramble a public key, record each redemption on
chain) leaks identity through metadata correlation (time, location,
operator, frequency).
The full design has six pieces; five are implemented in pure Python The full design has six pieces; five are implemented in pure Python
here. The remaining piece issuance via blind signatures or here. The remaining piece issuance via blind signatures or
@@ -27,7 +33,8 @@ Pieces:
operator: tracked via ``NullifierTracker``. operator: tracked via ``NullifierTracker``.
3. **Challenge-response** (`challenge_response.py`) operator 3. **Challenge-response** (`challenge_response.py`) operator
issues a fresh nonce, key-holder signs with the Function Key's issues a fresh nonce, key-holder signs with the Function Key's
secret. Prevents screenshot attacks, key sharing, replay. secret. This is HMAC placeholder plumbing for screenshot/replay
resistance, not the final anonymous credential proof.
4. **Two-phase commit receipts** (`receipt.py`) Phase 1 4. **Two-phase commit receipts** (`receipt.py`) Phase 1
verification receipt (operator-signed, day-level date NOT verification receipt (operator-signed, day-level date NOT
timestamp, no node_id). Phase 2 fulfillment receipt (citizen timestamp, no node_id). Phase 2 fulfillment receipt (citizen
@@ -0,0 +1,94 @@
"""Country risk index (static scores + USGS quake enrichment)."""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any
from zoneinfo import ZoneInfo
from services.network_utils import fetch_with_curl
RISK_FACTORS: dict[str, dict[str, Any]] = {
"UA": {"base": 85, "tags": ["active_conflict", "infrastructure_damage"]},
"RU": {"base": 72, "tags": ["sanctions", "military_mobilization"]},
"IL": {"base": 78, "tags": ["active_conflict", "regional_instability"]},
"PS": {"base": 90, "tags": ["active_conflict", "humanitarian_crisis"]},
"SY": {"base": 82, "tags": ["post_conflict", "infrastructure_damage"]},
"YE": {"base": 88, "tags": ["active_conflict", "humanitarian_crisis"]},
"MM": {"base": 76, "tags": ["civil_unrest", "military_junta"]},
"SD": {"base": 84, "tags": ["active_conflict", "humanitarian_crisis"]},
"AF": {"base": 80, "tags": ["post_conflict", "governance_collapse"]},
"KP": {"base": 70, "tags": ["nuclear_risk", "isolation"]},
"IR": {"base": 68, "tags": ["sanctions", "nuclear_program", "regional_proxy"]},
"CN": {"base": 35, "tags": ["strategic_competition", "taiwan_tensions"]},
"TW": {"base": 45, "tags": ["invasion_risk", "semiconductor_dependency"]},
"VE": {"base": 60, "tags": ["economic_collapse", "political_instability"]},
"HT": {"base": 85, "tags": ["gang_violence", "governance_collapse"]},
"LB": {"base": 65, "tags": ["economic_crisis", "political_deadlock"]},
"PK": {"base": 55, "tags": ["terrorism", "political_instability"]},
"SO": {"base": 82, "tags": ["terrorism", "state_fragility"]},
"LY": {"base": 72, "tags": ["divided_government", "militia_control"]},
"ET": {"base": 62, "tags": ["ethnic_tensions", "regional_conflicts"]},
}
EXCHANGES = [
{"name": "NYSE", "tz": "America/New_York", "open": 9.5, "close": 16, "country": "US"},
{"name": "NASDAQ", "tz": "America/New_York", "open": 9.5, "close": 16, "country": "US"},
{"name": "LSE", "tz": "Europe/London", "open": 8, "close": 16.5, "country": "GB"},
{"name": "TSE", "tz": "Asia/Tokyo", "open": 9, "close": 15, "country": "JP"},
{"name": "SSE", "tz": "Asia/Shanghai", "open": 9.5, "close": 15, "country": "CN"},
{"name": "HKEX", "tz": "Asia/Hong_Kong", "open": 9.5, "close": 16, "country": "HK"},
{"name": "FRA", "tz": "Europe/Berlin", "open": 8, "close": 20, "country": "DE"},
{"name": "TSX", "tz": "America/Toronto", "open": 9.5, "close": 16, "country": "CA"},
{"name": "MOEX", "tz": "Europe/Moscow", "open": 10, "close": 18.5, "country": "RU"},
]
def _exchange_open(ex: dict[str, Any]) -> bool:
try:
now = datetime.now(ZoneInfo(ex["tz"]))
if now.weekday() >= 5:
return False
decimal = now.hour + now.minute / 60
return ex["open"] <= decimal < ex["close"]
except Exception:
return False
def build_country_risk_payload() -> dict[str, Any]:
quake_risks: dict[str, float] = {}
try:
resp = fetch_with_curl(
"https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson",
timeout=5,
)
if resp.status_code == 200:
for f in resp.json().get("features") or []:
place = (f.get("properties") or {}).get("place") or ""
mag = (f.get("properties") or {}).get("mag") or 0
for code in RISK_FACTORS:
if code.lower() in place.lower():
quake_risks[code] = quake_risks.get(code, 0) + mag
except Exception:
pass
countries = []
for code, data in RISK_FACTORS.items():
base = data["base"]
score = min(100, base + quake_risks.get(code, 0))
countries.append(
{
"code": code,
"risk_score": score,
"risk_level": "CRITICAL" if base >= 80 else "HIGH" if base >= 60 else "ELEVATED" if base >= 40 else "LOW",
"tags": data["tags"],
}
)
countries.sort(key=lambda c: c["risk_score"], reverse=True)
exchanges = [{"name": e["name"], "country": e["country"], "open": _exchange_open(e)} for e in EXCHANGES]
return {
"countries": countries,
"exchanges": exchanges,
"open_exchanges": sum(1 for e in exchanges if e["open"]),
"total_exchanges": len(exchanges),
"timestamp": datetime.now(timezone.utc).isoformat(),
}
+34 -16
View File
@@ -32,14 +32,14 @@ logger = logging.getLogger(__name__)
_REFRESH_SECONDS = 24 * 3600 _REFRESH_SECONDS = 24 * 3600
kiwisdr_cache: TTLCache = TTLCache(maxsize=1, ttl=_REFRESH_SECONDS) kiwisdr_cache: TTLCache = TTLCache(maxsize=1, ttl=_REFRESH_SECONDS)
_SOURCE_URL = "http://rx.linkfanel.net/kiwisdr_com.js" _SOURCE_URL_HTTP = "http://rx.linkfanel.net/kiwisdr_com.js"
_SOURCE_URL_HTTPS = "https://rx.linkfanel.net/kiwisdr_com.js"
_CACHE_FILE = Path(__file__).resolve().parent.parent / "data" / "kiwisdr_cache.json" _CACHE_FILE = Path(__file__).resolve().parent.parent / "data" / "kiwisdr_cache.json"
# Bundled fallback — shipped with the codebase so the KiwiSDR layer always # Bundled fallback — shipped with the codebase so the KiwiSDR layer always
# has something to render even when the upstream is unreachable, returns # has something to render even when the upstream is unreachable, returns
# garbage, or appears to have been tampered with. Issue #206: the upstream # garbage, or appears to have been tampered with. Issue #206 / #364: try HTTPS
# only speaks HTTP, so we can't rely on TLS for integrity — instead we # first, then HTTP; we still validate shape and fall back to this bundle if the
# validate the response's shape and fall back to this bundle if it doesn't # payload does not look right.
# look right.
_BUNDLED_FALLBACK = Path(__file__).resolve().parent.parent / "data" / "kiwisdr_directory.json" _BUNDLED_FALLBACK = Path(__file__).resolve().parent.parent / "data" / "kiwisdr_directory.json"
# Minimum number of receivers we expect from a healthy upstream response. # Minimum number of receivers we expect from a healthy upstream response.
@@ -184,6 +184,29 @@ def _validate_fetched_nodes(nodes: list[dict]) -> bool:
return True return True
def _fetch_mirror_payload_text() -> str | None:
"""Try HTTPS first, then HTTP. Shape validation still applies (#364)."""
from services.network_utils import fetch_with_curl
last_error: Exception | None = None
for url in (_SOURCE_URL_HTTPS, _SOURCE_URL_HTTP):
try:
res = fetch_with_curl(url, timeout=20)
if res and res.status_code == 200:
if url == _SOURCE_URL_HTTP:
logger.info(
"KiwiSDR: HTTPS mirror unavailable; using HTTP with shape validation"
)
return res.text
last_error = RuntimeError(f"HTTP {getattr(res, 'status_code', 'unknown')}")
except Exception as e:
last_error = e
logger.debug("KiwiSDR mirror fetch failed for %s: %s", url, e)
if last_error is not None:
logger.warning("KiwiSDR mirror fetch failed: %s", last_error)
return None
def _load_bundled_fallback() -> list[dict]: def _load_bundled_fallback() -> list[dict]:
"""Last-resort directory shipped with the codebase. Always returns a """Last-resort directory shipped with the codebase. Always returns a
list (may be empty if the bundle is missing in older deployments).""" list (may be empty if the bundle is missing in older deployments)."""
@@ -202,9 +225,8 @@ def _load_bundled_fallback() -> list[dict]:
def fetch_kiwisdr_nodes() -> list[dict]: def fetch_kiwisdr_nodes() -> list[dict]:
"""Return the KiwiSDR receiver list, refreshed at most once per day. """Return the KiwiSDR receiver list, refreshed at most once per day.
Layered fallback (issue #206 — upstream is HTTP-only, so we defend with Layered fallback (issue #206 / #364 — HTTPS first, HTTP fallback, plus
content validation + bundled static directory rather than trying to content validation + bundled static directory):
upgrade the transport):
1. In-memory cache (handled by @cached on this function) 1. In-memory cache (handled by @cached on this function)
2. On-disk cache if <24h old 2. On-disk cache if <24h old
@@ -216,8 +238,6 @@ def fetch_kiwisdr_nodes() -> list[dict]:
tampered upstream returning garbage is caught by _validate_fetched_nodes() tampered upstream returning garbage is caught by _validate_fetched_nodes()
and falls through to whatever previously-trusted snapshot we have. and falls through to whatever previously-trusted snapshot we have.
""" """
from services.network_utils import fetch_with_curl
# 1. Trust on-disk cache if fresh. # 1. Trust on-disk cache if fresh.
cached_nodes = _load_disk_cache() cached_nodes = _load_disk_cache()
if cached_nodes is not None: if cached_nodes is not None:
@@ -230,14 +250,12 @@ def fetch_kiwisdr_nodes() -> list[dict]:
fresh_nodes: list[dict] = [] fresh_nodes: list[dict] = []
fetch_succeeded = False fetch_succeeded = False
try: try:
res = fetch_with_curl(_SOURCE_URL, timeout=20) body = _fetch_mirror_payload_text()
if res and res.status_code == 200: if body:
fresh_nodes = _parse_mirror_payload(res.text) fresh_nodes = _parse_mirror_payload(body)
fetch_succeeded = True fetch_succeeded = True
else: else:
logger.warning( logger.warning("KiwiSDR fetch returned no usable mirror payload")
f"KiwiSDR fetch returned HTTP {res.status_code if res else 'no response'}"
)
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e: except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.warning(f"KiwiSDR fetch exception: {e}") logger.warning(f"KiwiSDR fetch exception: {e}")
+11 -1
View File
@@ -27,11 +27,21 @@ def fetch_liveuamap():
browser = p.chromium.launch( browser = p.chromium.launch(
headless=True, args=["--disable-blink-features=AutomationControlled"] headless=True, args=["--disable-blink-features=AutomationControlled"]
) )
from services.network_utils import outbound_user_agent
# Per-install handle (no shared Shadowbroker product token). Stealth remains
# for Turnstile; see docs/OUTBOUND_DATA.md #348.
playwright_ua = (
f"Mozilla/5.0 (compatible; {outbound_user_agent('liveuamap')})"
)
context = browser.new_context( context = browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", user_agent=playwright_ua,
viewport={"width": 1920, "height": 1080}, viewport={"width": 1920, "height": 1080},
color_scheme="dark", color_scheme="dark",
) )
# Bound navigation and script evaluation so a stuck region cannot hang the slow pool.
context.set_default_navigation_timeout(60_000)
context.set_default_timeout(30_000)
page = context.new_page() page = context.new_page()
stealth_sync(page) stealth_sync(page)
+73
View File
@@ -0,0 +1,73 @@
"""LiveUAMap Playwright scraper opt-in (#348) — UI consent on Windows."""
from __future__ import annotations
import json
import logging
import os
import threading
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
_OPT_IN_FILE = Path(__file__).resolve().parent.parent / "data" / "liveuamap_scraper_opt_in.json"
_OPT_IN_LOCK = threading.Lock()
def _env_flag(name: str) -> str:
return str(os.getenv(name, "")).strip().lower()
def liveuamap_requires_ui_opt_in() -> bool:
"""Windows local installs need explicit consent before Playwright contacts LiveUAMap."""
return os.name == "nt"
def get_liveuamap_ui_opt_in() -> bool:
if not _OPT_IN_FILE.exists():
return False
try:
payload = json.loads(_OPT_IN_FILE.read_text(encoding="utf-8"))
return bool(payload.get("opted_in"))
except (OSError, json.JSONDecodeError, TypeError) as e:
logger.warning("LiveUAMap opt-in file unreadable: %s", e)
return False
def set_liveuamap_ui_opt_in(opted_in: bool) -> None:
_OPT_IN_FILE.parent.mkdir(parents=True, exist_ok=True)
with _OPT_IN_LOCK:
_OPT_IN_FILE.write_text(
json.dumps({"opted_in": bool(opted_in)}, indent=2),
encoding="utf-8",
)
def liveuamap_scraper_enabled() -> bool:
"""Whether the Playwright LiveUAMap scraper may run on this backend."""
setting = _env_flag("SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER")
if setting in {"1", "true", "yes", "on"}:
return True
if setting in {"0", "false", "no", "off"}:
return False
if not liveuamap_requires_ui_opt_in():
return True
return get_liveuamap_ui_opt_in()
def liveuamap_scraper_status() -> dict[str, Any]:
setting = _env_flag("SHADOWBROKER_ENABLE_LIVEUAMAP_SCRAPER")
env_override = None
if setting in {"1", "true", "yes", "on"}:
env_override = "on"
elif setting in {"0", "false", "no", "off"}:
env_override = "off"
ui_opted_in = get_liveuamap_ui_opt_in()
requires = liveuamap_requires_ui_opt_in()
return {
"platform_requires_opt_in": requires,
"ui_opted_in": ui_opted_in,
"scraper_enabled": liveuamap_scraper_enabled(),
"env_override": env_override,
}
+109
View File
@@ -69,6 +69,115 @@ def _derive_peer_key(shared_secret: str, peer_url: str) -> bytes:
).digest() ).digest()
# ---------------------------------------------------------------------------
# Issue #256 (tg12): per-peer HMAC secrets
# ---------------------------------------------------------------------------
#
# Before this change, ALL peer-push HMACs were derived from a single
# fleet-shared ``MESH_PEER_PUSH_SECRET``. The receiver could prove a
# request was signed by *someone who knows the fleet secret*, but it
# could NOT prove which peer signed it — any peer could compute the
# expected HMAC for any other peer's URL and impersonate that peer.
#
# Fix: an optional ``MESH_PEER_SECRETS`` env var maps specific peer URLs
# to per-peer secrets. When a peer URL is listed there, only that
# per-peer secret is accepted for that URL — the global secret is
# ignored for that peer. Peer A no longer learns peer B's secret, so
# peer A cannot forge a request claiming to be peer B.
#
# Backwards-compatible by design:
#
# - Single-peer installs (``MESH_PEER_SECRETS`` empty) keep using the
# global secret. Zero behavior change. Zero operator action required.
# - Multi-peer installs that haven't migrated yet keep using the global
# secret for every peer. Same behavior as before — same exposure.
# - Multi-peer installs that have migrated configure
# ``MESH_PEER_SECRETS=urlA=secretA,urlB=secretB`` and immediately get
# per-peer identity. Migration is incremental: peers not yet listed
# continue using the global secret until both sides of that peering
# add their entry.
_PEER_SECRETS_CACHE: dict[str, str] = {}
_PEER_SECRETS_CACHE_RAW: str = ""
def _lookup_per_peer_secret(normalized_url: str) -> str:
"""Return the per-peer secret for ``normalized_url`` from MESH_PEER_SECRETS.
Returns "" if no per-peer entry is configured for that URL. The parser
is forgiving:
- Whitespace around items, URLs, and secrets is stripped.
- Items without ``=`` or with empty URL/secret halves are skipped.
- The URL half is normalized via ``normalize_peer_url`` so config
authors don't have to match scheme/port/path quirks exactly.
The cache is invalidated whenever the env var's raw value changes,
which keeps tests' ``monkeypatch.setenv`` calls effective without
forcing a process restart.
"""
import os
raw = str(os.environ.get("MESH_PEER_SECRETS", "") or "").strip()
global _PEER_SECRETS_CACHE, _PEER_SECRETS_CACHE_RAW
if raw != _PEER_SECRETS_CACHE_RAW:
new_cache: dict[str, str] = {}
for chunk in raw.split(","):
chunk = chunk.strip()
if not chunk or "=" not in chunk:
continue
url_part, _, secret_part = chunk.partition("=")
normalized = normalize_peer_url(url_part.strip())
secret = secret_part.strip()
if normalized and secret:
new_cache[normalized] = secret
_PEER_SECRETS_CACHE = new_cache
_PEER_SECRETS_CACHE_RAW = raw
return _PEER_SECRETS_CACHE.get(normalized_url, "")
def resolve_peer_key_for_url(peer_url: str) -> bytes:
"""Return the HMAC key for ``peer_url``, preferring per-peer secret.
Issue #256: this is the function every peer-push call site should
use. It looks up the peer-specific secret first, falling back to the
fleet-shared ``MESH_PEER_PUSH_SECRET`` only when the URL is NOT
listed in ``MESH_PEER_SECRETS``.
Both sender (computing X-Peer-HMAC) and receiver (verifying it) call
this with the SENDER's URL — they must derive the same key, so
operators on both ends of a peering need matching MESH_PEER_SECRETS
entries for that URL to stay in sync.
Returns empty bytes when no usable secret exists. Callers must treat
that as fail-closed (skip the push, reject the verification).
"""
normalized_url = normalize_peer_url(peer_url)
if not normalized_url:
return b""
per_peer_secret = _lookup_per_peer_secret(normalized_url)
if per_peer_secret:
return _derive_peer_key(per_peer_secret, normalized_url)
# No per-peer entry for this URL — fall back to the legacy global
# secret. This is what preserves zero-hostility for single-peer
# installs and the migration window for multi-peer installs.
try:
from services.config import get_settings
global_secret = str(
getattr(get_settings(), "MESH_PEER_PUSH_SECRET", "") or ""
).strip()
except Exception:
return b""
if not global_secret:
return b""
return _derive_peer_key(global_secret, normalized_url)
def _node_digest(public_key_b64: str) -> str: def _node_digest(public_key_b64: str) -> str:
raw = base64.b64decode(public_key_b64) raw = base64.b64decode(public_key_b64)
return hashlib.sha256(raw).hexdigest() return hashlib.sha256(raw).hexdigest()
+293
View File
@@ -317,6 +317,39 @@ class DMRelay:
def _self_mailbox_limit(self) -> int: def _self_mailbox_limit(self) -> int:
return max(1, int(self._settings().MESH_DM_SELF_MAILBOX_LIMIT)) return max(1, int(self._settings().MESH_DM_SELF_MAILBOX_LIMIT))
def _per_sender_pending_limit(self) -> int:
"""Anti-spam cap on UNACKED messages a single sender can have parked
in a single recipient mailbox at any one time. See ``config.py``
``MESH_DM_PENDING_PER_SENDER_LIMIT`` for the threat model this
rule is enforced both at ``deposit`` (local) and at
``accept_replica`` (peer push acceptance), making it a network
rule rather than a client-side honor system."""
try:
limit = int(getattr(self._settings(), "MESH_DM_PENDING_PER_SENDER_LIMIT", 2) or 2)
except (TypeError, ValueError):
limit = 2
return max(1, limit)
def _per_sender_pending_count(
self,
*,
mailbox_key: str,
sender_block_ref: str,
) -> int:
"""Count UNACKED messages from ``sender_block_ref`` currently parked
in ``mailbox_key``. Caller already holds ``self._lock``.
Messages that have been claimed/acked are removed from the mailbox
list (see ``claim_message_ids``), so anything still here is by
definition unacked. We count by exact ``sender_block_ref`` match
that's the per-pair sender identity used for blocking too, so
the cap is naturally per-(sender, recipient).
"""
if not mailbox_key or not sender_block_ref:
return 0
messages = self._mailboxes.get(mailbox_key, [])
return sum(1 for m in messages if m.sender_block_ref == sender_block_ref)
def _nonce_ttl_seconds(self) -> int: def _nonce_ttl_seconds(self) -> int:
return max(30, int(self._settings().MESH_DM_NONCE_TTL_S)) return max(30, int(self._settings().MESH_DM_NONCE_TTL_S))
@@ -1515,6 +1548,29 @@ class DMRelay:
if len(self._mailboxes[mailbox_key]) >= self._mailbox_limit_for_class(delivery_class): if len(self._mailboxes[mailbox_key]) >= self._mailbox_limit_for_class(delivery_class):
metrics_inc("dm_drop_full") metrics_inc("dm_drop_full")
return {"ok": False, "detail": "Recipient mailbox full"} return {"ok": False, "detail": "Recipient mailbox full"}
# Anti-spam: per-(sender, recipient) cap on unacked messages.
# A sender who already has the configured number of messages
# parked in this mailbox can't deposit more until the recipient
# pulls (acks) at least one. The same cap is re-enforced on
# inbound replication in ``accept_replica`` so this rule isn't
# bypassable by patching out the local check on a hostile
# sender's relay — see config.py
# MESH_DM_PENDING_PER_SENDER_LIMIT for the threat model.
per_sender_limit = self._per_sender_pending_limit()
pending = self._per_sender_pending_count(
mailbox_key=mailbox_key,
sender_block_ref=sender_block_ref,
)
if pending >= per_sender_limit:
metrics_inc("dm_drop_per_sender_cap")
return {
"ok": False,
"detail": (
f"Recipient already has {pending} unread message"
f"{'s' if pending != 1 else ''} from you. Wait for "
"them to read your messages before sending more."
),
}
if not msg_id: if not msg_id:
msg_id = f"dm_{int(time.time() * 1000)}_{secrets.token_hex(6)}" msg_id = f"dm_{int(time.time() * 1000)}_{secrets.token_hex(6)}"
elif any(m.msg_id == msg_id for m in self._mailboxes[mailbox_key]): elif any(m.msg_id == msg_id for m in self._mailboxes[mailbox_key]):
@@ -1539,8 +1595,245 @@ class DMRelay:
) )
self._stats["messages_in_memory"] = sum(len(v) for v in self._mailboxes.values()) self._stats["messages_in_memory"] = sum(len(v) for v in self._mailboxes.values())
self._save() self._save()
# Cross-node mailbox replication: push the freshly-stored
# envelope to every authenticated relay peer so the recipient
# can log into ANY node and find their messages. The push is
# async (fire-and-forget thread) so deposit() returns
# immediately — slow Tor peers can't block the sender's UX.
# Each receiving peer re-enforces the per-sender cap on
# acceptance, so hostile relays can't widen the cap.
try:
envelope_for_push = self.envelope_for_replication(
mailbox_key=mailbox_key, msg_id=msg_id,
)
if envelope_for_push:
self._replicate_envelope_to_peers_async(
envelope=envelope_for_push,
)
except Exception:
metrics_inc("dm_replication_push_error")
return {"ok": True, "msg_id": msg_id} return {"ok": True, "msg_id": msg_id}
def accept_replica(
self,
*,
envelope: dict[str, Any],
originating_peer_url: str = "",
) -> dict[str, Any]:
"""Receive a DM envelope replicated from a peer relay.
Cross-node mailbox replication entry point. When a sender's local
relay accepts a ``deposit`` and pushes the envelope to
``MESH_RELAY_PEERS`` (so the recipient can log into any peer
node and find their messages), each receiving peer calls
``accept_replica`` to ingest it.
The per-(sender, recipient) cap is re-enforced HERE. That's what
makes the rule a NETWORK rule rather than a client-side honor
system: a hostile sender who patches out the local ``deposit``
check still can't get a 3rd unacked message to spread, because
every honest peer enforces the same cap on inbound replicas.
Result: hostile relays can hold extras locally, but those extras
never reach any node a legitimate recipient is polling from.
Returns the same shape as ``deposit`` so the calling endpoint can
forward the result back to the originating peer.
"""
if not isinstance(envelope, dict):
return {"ok": False, "detail": "envelope must be an object"}
msg_id = str(envelope.get("msg_id", "") or "").strip()
mailbox_key = str(envelope.get("mailbox_key", "") or "").strip()
sender_block_ref = str(envelope.get("sender_block_ref", "") or "").strip()
ciphertext = str(envelope.get("ciphertext", "") or "")
if not msg_id or not mailbox_key or not sender_block_ref or not ciphertext:
return {"ok": False, "detail": "envelope missing required fields"}
with self._lock:
self._refresh_from_shared_relay()
self._cleanup_expired()
# Idempotent — if we already hold this exact msg_id, the
# replication round-tripped or a peer pushed the same
# envelope through multiple paths. Accept silently.
if any(m.msg_id == msg_id for m in self._mailboxes.get(mailbox_key, [])):
metrics_inc("dm_replica_duplicate")
return {"ok": True, "msg_id": msg_id, "duplicate": True}
# Same per-class cap as the deposit path — defense in depth
# against a peer that wraps a "deposit" as a "replica" to
# bypass the class limit.
delivery_class = str(envelope.get("delivery_class", "") or "")
if delivery_class in ("request", "shared", "self"):
class_limit = self._mailbox_limit_for_class(delivery_class)
else:
class_limit = self._shared_mailbox_limit()
if len(self._mailboxes.get(mailbox_key, [])) >= class_limit:
metrics_inc("dm_replica_drop_full")
return {"ok": False, "detail": "Recipient mailbox full"}
# THE network rule: per-(sender, recipient) anti-spam cap.
per_sender_limit = self._per_sender_pending_limit()
pending = self._per_sender_pending_count(
mailbox_key=mailbox_key,
sender_block_ref=sender_block_ref,
)
if pending >= per_sender_limit:
metrics_inc("dm_replica_drop_per_sender_cap")
# Returning a structured rejection — the sender's relay
# learns its envelope was rejected by an honest peer and
# can stop trying to push it.
return {
"ok": False,
"detail": (
"Per-sender cap reached on this relay; refusing replica"
),
"cap_violation": True,
"pending": pending,
"limit": per_sender_limit,
}
# Accept the replica into the local mailbox.
self._mailboxes[mailbox_key].append(
DMMessage(
sender_id=str(envelope.get("sender_id", "") or ""),
ciphertext=ciphertext,
timestamp=float(envelope.get("timestamp", time.time()) or time.time()),
msg_id=msg_id,
delivery_class=str(envelope.get("delivery_class", "shared") or "shared"),
sender_seal=str(envelope.get("sender_seal", "") or ""),
relay_salt=str(envelope.get("relay_salt", "") or ""),
sender_block_ref=sender_block_ref,
payload_format=str(envelope.get("payload_format", "dm1") or "dm1"),
session_welcome=str(envelope.get("session_welcome", "") or ""),
)
)
self._stats["messages_in_memory"] = sum(len(v) for v in self._mailboxes.values())
self._save()
metrics_inc("dm_replica_accepted")
return {"ok": True, "msg_id": msg_id}
def _replicate_envelope_to_peers_async(
self,
*,
envelope: dict[str, Any],
) -> None:
"""Push an outbound DM envelope to every authenticated relay peer.
Fire-and-forget: spawned in a background thread so ``deposit``
returns to the caller immediately. Per-peer errors are logged
and swallowed the sender's UX must not block on slow Tor
peers, and a peer that's down today gets the next message
whenever it comes back. Inbound recipient polling from a healthy
peer keeps the system functional during peer failures.
Each peer is authed with the existing per-peer HMAC pattern
(#256) — same headers and key resolver gate-message replication
uses, so a hostile node that doesn't know any peer's HMAC key
can't impersonate a legitimate relay.
"""
import threading
def _do_push():
try:
import hashlib
import hmac
import requests as _requests
from services.mesh.mesh_crypto import (
normalize_peer_url,
resolve_peer_key_for_url,
)
from services.mesh.mesh_router import (
authenticated_push_peer_urls,
)
peers = authenticated_push_peer_urls()
if not peers:
return
payload = json.dumps(
{"envelope": envelope},
separators=(",", ":"),
ensure_ascii=False,
).encode("utf-8")
timeout = max(
1,
int(getattr(self._settings(), "MESH_RELAY_PUSH_TIMEOUT_S", 10) or 10),
)
for peer_url in peers:
try:
normalized = normalize_peer_url(peer_url)
headers = {"Content-Type": "application/json"}
peer_key = resolve_peer_key_for_url(normalized)
if peer_key:
headers["X-Peer-Url"] = normalized
headers["X-Peer-HMAC"] = hmac.new(
peer_key, payload, hashlib.sha256
).hexdigest()
url = f"{peer_url}/api/mesh/dm/replicate-envelope"
resp = _requests.post(
url, data=payload, timeout=timeout, headers=headers,
)
if resp.status_code == 200:
metrics_inc("dm_replication_push_ok")
else:
# 4xx including the structured cap_violation
# rejection from accept_replica — sender's
# relay learns and stops retrying this msg_id.
metrics_inc("dm_replication_push_rejected")
except Exception:
# Per-peer failure is non-fatal — log to metrics
# but don't break the loop. Other peers and a
# future retry can still propagate the envelope.
metrics_inc("dm_replication_push_error")
continue
except Exception:
# Outer guard — never let replication errors propagate
# back to the sender's deposit() caller.
metrics_inc("dm_replication_push_error")
thread = threading.Thread(
target=_do_push,
name="dm-replicate-push",
daemon=True,
)
thread.start()
def envelope_for_replication(
self,
*,
mailbox_key: str,
msg_id: str,
) -> dict[str, Any] | None:
"""Return the wire-form envelope for a stored message, suitable
for POSTing to a peer relay's replicate-envelope endpoint.
Returns ``None`` if the message isn't in the mailbox (already
acked, expired, never existed). The caller holds the
responsibility for transport security (Tor SOCKS for .onion
peers, per-peer HMAC) and for not leaking the envelope to
clearnet peers when private transport is required.
"""
with self._lock:
for m in self._mailboxes.get(mailbox_key, []):
if m.msg_id == msg_id:
return {
"msg_id": m.msg_id,
"mailbox_key": mailbox_key,
"sender_id": m.sender_id,
"sender_block_ref": m.sender_block_ref,
"sender_seal": m.sender_seal,
"ciphertext": m.ciphertext,
"timestamp": m.timestamp,
"delivery_class": m.delivery_class,
"relay_salt": m.relay_salt,
"payload_format": m.payload_format,
"session_welcome": m.session_welcome,
}
return None
def is_blocked(self, recipient_id: str, sender_id: str) -> bool: def is_blocked(self, recipient_id: str, sender_id: str) -> bool:
with self._lock: with self._lock:
self._refresh_from_shared_relay() self._refresh_from_shared_relay()
+530 -90
View File
@@ -33,8 +33,9 @@ Each event contains:
Persistence: JSON file at backend/data/infonet.json Persistence: JSON file at backend/data/infonet.json
Encrypted gate chat events are intentionally kept off the public chain and Encrypted gate chat events are private-chain ciphertext records. They are
persisted separately via GateMessageStore. excluded from public read surfaces and replicated only over private Infonet
transports.
""" """
import json import json
@@ -64,6 +65,8 @@ from services.mesh.mesh_schema import (
ACTIVE_PUBLIC_LEDGER_EVENT_TYPES, ACTIVE_PUBLIC_LEDGER_EVENT_TYPES,
PUBLIC_LEDGER_EVENT_TYPES, PUBLIC_LEDGER_EVENT_TYPES,
validate_event_payload, validate_event_payload,
validate_private_dm_ledger_payload,
validate_private_gate_ledger_payload,
validate_protocol_fields, validate_protocol_fields,
validate_public_ledger_payload, validate_public_ledger_payload,
) )
@@ -127,6 +130,12 @@ GATE_SEGMENT_MAX_COMPRESSED_BYTES = max(
int(os.environ.get("MESH_GATE_SEGMENT_MAX_COMPRESSED_BYTES", str(2 * 1024 * 1024)) or str(2 * 1024 * 1024)), int(os.environ.get("MESH_GATE_SEGMENT_MAX_COMPRESSED_BYTES", str(2 * 1024 * 1024)) or str(2 * 1024 * 1024)),
) )
GATE_SEGMENT_STORAGE_VERSION = 1 GATE_SEGMENT_STORAGE_VERSION = 1
DM_HASHCHAIN_SPOOL_LIMIT = max(1, int(os.environ.get("MESH_DM_HASHCHAIN_SPOOL_LIMIT", "2") or "2"))
DM_HASHCHAIN_SPOOL_SENDER_LIMIT = max(
1,
int(os.environ.get("MESH_DM_HASHCHAIN_SPOOL_SENDER_LIMIT", "1") or "1"),
)
DM_HASHCHAIN_SPOOL_TTL_S = max(60, int(os.environ.get("MESH_DM_HASHCHAIN_SPOOL_TTL_S", "3600") or "3600"))
_PUBLIC_EVENT_APPEND_HOOKS: list[Any] = [] _PUBLIC_EVENT_APPEND_HOOKS: list[Any] = []
_PUBLIC_EVENT_APPEND_HOOKS_LOCK = threading.Lock() _PUBLIC_EVENT_APPEND_HOOKS_LOCK = threading.Lock()
@@ -216,18 +225,19 @@ def _peer_pair_ref_key(peer_url: str) -> bytes:
Returns an empty key on misconfiguration so callers fail closed. Returns an empty key on misconfiguration so callers fail closed.
""" """
try: try:
from services.config import get_settings from services.mesh.mesh_crypto import (
from services.mesh.mesh_crypto import _derive_peer_key, normalize_peer_url normalize_peer_url,
resolve_peer_key_for_url,
secret = str(get_settings().MESH_PEER_PUSH_SECRET or "").strip() )
except Exception: except Exception:
return b"" return b""
if not secret:
return b""
normalized = normalize_peer_url(peer_url or "") normalized = normalize_peer_url(peer_url or "")
if not normalized: if not normalized:
return b"" return b""
peer_key = _derive_peer_key(secret, normalized) # Issue #256: resolve_peer_key_for_url() prefers per-peer secrets
# from MESH_PEER_SECRETS and falls back to the global
# MESH_PEER_PUSH_SECRET only when the URL has no per-peer entry.
peer_key = resolve_peer_key_for_url(normalized)
if not peer_key: if not peer_key:
return b"" return b""
# Domain-separate from the transport HMAC key so the two # Domain-separate from the transport HMAC key so the two
@@ -339,6 +349,32 @@ def _private_gate_event_id(
).hexdigest() ).hexdigest()
def _private_gate_signature_payload_variants(gate_id: str, event: dict[str, Any]) -> list[dict[str, Any]]:
payload = _private_gate_signature_payload(gate_id, event)
variants: list[dict[str, Any]] = [payload]
event_payload = event.get("payload") if isinstance(event.get("payload"), dict) else {}
reply_to = str(event_payload.get("reply_to", "") or "").strip()
if reply_to:
variants.append(_private_gate_signature_payload(gate_id, event, include_reply_to=False))
if "epoch" in payload:
no_epoch = dict(payload)
no_epoch.pop("epoch", None)
variants.append(no_epoch)
if reply_to:
no_epoch_no_reply = _private_gate_signature_payload(gate_id, event, include_reply_to=False)
no_epoch_no_reply.pop("epoch", None)
variants.append(no_epoch_no_reply)
deduped: list[dict[str, Any]] = []
seen: set[str] = set()
for variant in variants:
material = json.dumps(variant, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
if material in seen:
continue
seen.add(material)
deduped.append(variant)
return deduped
def _sanitize_private_gate_event(gate_id: str, event: dict[str, Any]) -> dict[str, Any]: def _sanitize_private_gate_event(gate_id: str, event: dict[str, Any]) -> dict[str, Any]:
payload = event.get("payload") if isinstance(event.get("payload"), dict) else {} payload = event.get("payload") if isinstance(event.get("payload"), dict) else {}
sanitized = { sanitized = {
@@ -1567,11 +1603,18 @@ class Infonet:
def _rebuild_state(self) -> None: def _rebuild_state(self) -> None:
self.event_index = {} self.event_index = {}
self.node_sequences = {} self.node_sequences = {}
# Keep private signed-write replay domains across public-chain # Keep private signed-write replay domains that are not represented
# rebuilds; these domains protect local side effects that are not # on-chain, but rebuild the gate_message sequence domain from chain
# represented as public Infonet events. # events so reloads/fork application do not mix it with public
if not isinstance(getattr(self, "sequence_domains", None), dict): # per-node message sequences.
self.sequence_domains = {} preserved_domains = {}
if isinstance(getattr(self, "sequence_domains", None), dict):
preserved_domains = {
key: value
for key, value in self.sequence_domains.items()
if not str(key or "").endswith("|gate_message")
}
self.sequence_domains = dict(preserved_domains)
self.public_key_bindings = {} self.public_key_bindings = {}
self.revocations = {} self.revocations = {}
self._replay_filter = ReplayFilter() self._replay_filter = ReplayFilter()
@@ -1583,9 +1626,12 @@ class Infonet:
node_id = evt.get("node_id", "") node_id = evt.get("node_id", "")
sequence = _safe_int(evt.get("sequence", 0) or 0, 0) sequence = _safe_int(evt.get("sequence", 0) or 0, 0)
if node_id and sequence: if node_id and sequence:
last = self.node_sequences.get(node_id, 0) sequence_table, sequence_key = self._sequence_table_for_event(
evt.get("event_type", ""), node_id
)
last = sequence_table.get(sequence_key, 0)
if sequence > last: if sequence > last:
self.node_sequences[node_id] = sequence sequence_table[sequence_key] = sequence
public_key = str(evt.get("public_key", "") or "") public_key = str(evt.get("public_key", "") or "")
if public_key and node_id: if public_key and node_id:
existing = self.public_key_bindings.get(public_key) existing = self.public_key_bindings.get(public_key)
@@ -1897,6 +1943,295 @@ class Infonet:
self._save() self._save()
return True, "ok" return True, "ok"
def _sequence_table_for_event(self, event_type: str, node_id: str) -> tuple[dict[str, int], str]:
normalized = str(event_type or "").strip().lower()
if normalized == "gate_message":
return self.sequence_domains, f"{node_id}|gate_message"
if normalized == "dm_message":
return self.sequence_domains, f"{node_id}|dm_message"
return self.node_sequences, node_id
def _dm_spool_target_key(self, payload: dict[str, Any]) -> tuple[str, str]:
delivery_class = str(payload.get("delivery_class", "") or "").strip().lower()
if delivery_class == "shared":
key = str(payload.get("recipient_token", "") or "").strip()
else:
key = str(payload.get("recipient_id", "") or "").strip()
return delivery_class, key
def _dm_spool_active_counts(
self,
payload: dict[str, Any],
*,
sender_id: str = "",
now: float | None = None,
) -> tuple[int, int]:
delivery_class, key = self._dm_spool_target_key(payload)
if not key:
return 0, 0
sender_id = str(sender_id or "").strip()
current = time.time() if now is None else float(now)
total_count = 0
sender_count = 0
for evt in reversed(self.events):
if evt.get("event_type") != "dm_message":
continue
evt_payload = evt.get("payload") if isinstance(evt.get("payload"), dict) else {}
evt_delivery_class, evt_key = self._dm_spool_target_key(evt_payload)
if evt_delivery_class != delivery_class:
continue
if evt_key != key:
continue
evt_ts = float(evt_payload.get("timestamp", evt.get("timestamp", 0)) or 0)
if evt_ts > 0 and current - evt_ts > DM_HASHCHAIN_SPOOL_TTL_S:
continue
total_count += 1
if sender_id and str(evt.get("node_id", "") or "").strip() == sender_id:
sender_count += 1
if total_count >= DM_HASHCHAIN_SPOOL_LIMIT and (
not sender_id or sender_count >= DM_HASHCHAIN_SPOOL_SENDER_LIMIT
):
break
return total_count, sender_count
def _dm_spool_active_count(self, payload: dict[str, Any], *, now: float | None = None) -> int:
total_count, _sender_count = self._dm_spool_active_counts(payload, now=now)
return total_count
def append_private_dm_message(
self,
*,
node_id: str,
payload: dict,
signature: str,
sequence: int,
public_key: str,
public_key_algo: str,
protocol_version: str = "",
timestamp: float = 0,
) -> dict:
"""Append an encrypted DM dead-drop message to the private Infonet ledger.
The event is a small offline spool, capped per mailbox target, so the
hashchain can carry a couple of sealed DMs without becoming an
unbounded global mailbox.
"""
event_type = "dm_message"
if sequence <= 0:
raise ValueError("sequence is required and must be > 0")
sequence_table, sequence_key = self._sequence_table_for_event(event_type, node_id)
last = sequence_table.get(sequence_key, 0)
if sequence <= last:
raise ValueError(f"Replay detected: sequence {sequence} <= last {last}")
raw_payload = dict(payload or {})
if "message" in raw_payload or "plaintext" in raw_payload or "_local_plaintext" in raw_payload:
raise ValueError("private DM ledger payload must not contain plaintext")
if str(raw_payload.get("transport_lock", "") or "").strip().lower() != "private_strong":
raise ValueError("DM hashchain spool requires private_strong transport_lock")
payload = normalize_payload(event_type, raw_payload)
ok, reason = validate_private_dm_ledger_payload(payload)
if not ok:
raise ValueError(reason)
total_count, sender_count = self._dm_spool_active_counts(payload, sender_id=node_id)
if sender_count >= DM_HASHCHAIN_SPOOL_SENDER_LIMIT:
raise ValueError("DM hashchain sender spool full for recipient")
if total_count >= DM_HASHCHAIN_SPOOL_LIMIT:
raise ValueError("DM hashchain spool full for recipient")
payload_json = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
if len(payload_json.encode("utf-8")) > MAX_PAYLOAD_BYTES:
raise ValueError("payload exceeds max size")
protocol_version = str(protocol_version or PROTOCOL_VERSION)
ok, reason = validate_protocol_fields(protocol_version, NETWORK_ID)
if not ok:
raise ValueError(reason)
if not (signature and public_key and public_key_algo):
raise ValueError("Missing signature fields")
algo = parse_public_key_algo(public_key_algo)
if not algo:
raise ValueError("Unsupported public_key_algo")
if not verify_node_binding(node_id, public_key):
raise ValueError("node_id mismatch")
bound, bind_reason = self._bind_public_key(public_key, node_id)
if not bound:
raise ValueError(bind_reason)
sig_payload = build_signature_payload(
event_type=event_type,
node_id=node_id,
sequence=sequence,
payload=payload,
)
if not verify_signature(
public_key_b64=public_key,
public_key_algo=public_key_algo,
signature_hex=signature,
payload=sig_payload,
):
raise ValueError("Invalid signature")
revoked, _info = self._revocation_status(public_key)
if revoked:
raise ValueError("public key is revoked")
event = ChainEvent(
prev_hash=self.head_hash,
event_type=event_type,
node_id=node_id,
payload=payload,
timestamp=float(timestamp or time.time()),
sequence=sequence,
signature=signature,
public_key=public_key,
public_key_algo=public_key_algo,
protocol_version=protocol_version,
)
event_dict = event.to_dict()
self._write_wal(event_dict)
self.events.append(event_dict)
self.event_index[event.event_id] = len(self.events) - 1
self.head_hash = event.event_id
sequence_table[sequence_key] = sequence
self._replay_filter.add(event.event_id)
self._invalidate_merkle_cache()
self._update_counters_for_event(event_dict)
self._save()
try:
from services.mesh.mesh_rns import rns_bridge
rns_bridge.publish_event(event_dict)
except Exception:
pass
_notify_public_event_append_hooks(event_dict)
logger.info(
f"Infonet append [dm_message] by {_redact_node(node_id)} seq={sequence} "
f"id={event.event_id[:16]}..."
)
return event_dict
def append_private_gate_message(
self,
*,
node_id: str,
payload: dict,
signature: str,
sequence: int,
public_key: str,
public_key_algo: str,
protocol_version: str = "",
timestamp: float = 0,
) -> dict:
"""Append an encrypted gate message to the private Infonet ledger.
Gate messages use their own sequence domain so a gate post cannot
consume or replay-block the author's public broadcast sequence.
"""
event_type = "gate_message"
if sequence <= 0:
raise ValueError("sequence is required and must be > 0")
sequence_table, sequence_key = self._sequence_table_for_event(event_type, node_id)
last = sequence_table.get(sequence_key, 0)
if sequence <= last:
raise ValueError(f"Replay detected: sequence {sequence} <= last {last}")
raw_payload = dict(payload or {})
if "message" in raw_payload or "_local_plaintext" in raw_payload or "_local_reply_to" in raw_payload:
raise ValueError("private gate ledger payload must not contain plaintext")
if str(raw_payload.get("transport_lock", "") or "").strip().lower() != "private_strong":
raise ValueError("gate messages require private_strong transport_lock")
payload = normalize_payload(event_type, raw_payload)
ok, reason = validate_private_gate_ledger_payload(payload)
if not ok:
raise ValueError(reason)
payload_json = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
if len(payload_json.encode("utf-8")) > MAX_PAYLOAD_BYTES:
raise ValueError("payload exceeds max size")
protocol_version = str(protocol_version or PROTOCOL_VERSION)
ok, reason = validate_protocol_fields(protocol_version, NETWORK_ID)
if not ok:
raise ValueError(reason)
if not (signature and public_key and public_key_algo):
raise ValueError("Missing signature fields")
algo = parse_public_key_algo(public_key_algo)
if not algo:
raise ValueError("Unsupported public_key_algo")
if not verify_node_binding(node_id, public_key):
raise ValueError("node_id mismatch")
bound, bind_reason = self._bind_public_key(public_key, node_id)
if not bound:
raise ValueError(bind_reason)
event_for_signature = {"payload": payload}
signature_ok = False
for signature_payload in _private_gate_signature_payload_variants(
str(payload.get("gate", "") or ""),
event_for_signature,
):
sig_payload = build_signature_payload(
event_type=event_type,
node_id=node_id,
sequence=sequence,
payload=signature_payload,
)
if verify_signature(
public_key_b64=public_key,
public_key_algo=public_key_algo,
signature_hex=signature,
payload=sig_payload,
):
signature_ok = True
break
if not signature_ok:
raise ValueError("Invalid signature")
revoked, _info = self._revocation_status(public_key)
if revoked:
raise ValueError("public key is revoked")
event = ChainEvent(
prev_hash=self.head_hash,
event_type=event_type,
node_id=node_id,
payload=payload,
timestamp=float(timestamp or time.time()),
sequence=sequence,
signature=signature,
public_key=public_key,
public_key_algo=public_key_algo,
protocol_version=protocol_version,
)
event_dict = event.to_dict()
self._write_wal(event_dict)
self.events.append(event_dict)
self.event_index[event.event_id] = len(self.events) - 1
self.head_hash = event.event_id
sequence_table[sequence_key] = sequence
self._replay_filter.add(event.event_id)
self._invalidate_merkle_cache()
self._update_counters_for_event(event_dict)
self._save()
try:
from services.mesh.mesh_rns import rns_bridge
rns_bridge.publish_event(event_dict)
except Exception:
pass
_notify_public_event_append_hooks(event_dict)
logger.info(
f"Infonet append [gate_message] by {_redact_node(node_id)} seq={sequence} "
f"id={event.event_id[:16]}..."
)
return event_dict
def append( def append(
self, self,
event_type: str, event_type: str,
@@ -2077,6 +2412,18 @@ class Infonet:
if not event_id or not prev_hash: if not event_id or not prev_hash:
rejected.append({"index": idx, "reason": "Missing event_id or prev_hash"}) rejected.append({"index": idx, "reason": "Missing event_id or prev_hash"})
continue continue
if event_id in self.event_index:
duplicates += 1
continue
if self._replay_filter.seen(event_id):
try:
from services.mesh.mesh_metrics import increment as metrics_inc
metrics_inc("ingest_replay_seen")
except Exception:
pass
duplicates += 1
continue
if prev_hash != expected_prev: if prev_hash != expected_prev:
try: try:
from services.mesh.mesh_metrics import increment as metrics_inc from services.mesh.mesh_metrics import increment as metrics_inc
@@ -2095,25 +2442,14 @@ class Infonet:
pass pass
rejected.append({"index": idx, "reason": "network_id mismatch"}) rejected.append({"index": idx, "reason": "network_id mismatch"})
continue continue
if event_id in self.event_index:
duplicates += 1
continue
if self._replay_filter.seen(event_id):
try:
from services.mesh.mesh_metrics import increment as metrics_inc
metrics_inc("ingest_replay_seen")
except Exception:
pass
duplicates += 1
continue
if prev_hash != self.head_hash: if prev_hash != self.head_hash:
rejected.append({"index": idx, "reason": "prev_hash does not match head"}) rejected.append({"index": idx, "reason": "prev_hash does not match head"})
continue continue
if sequence <= 0: if sequence <= 0:
rejected.append({"index": idx, "reason": "Invalid sequence"}) rejected.append({"index": idx, "reason": "Invalid sequence"})
continue continue
last = self.node_sequences.get(node_id, 0) sequence_table, sequence_key = self._sequence_table_for_event(event_type, node_id)
last = sequence_table.get(sequence_key, 0)
if sequence <= last: if sequence <= last:
rejected.append({"index": idx, "reason": "Replay detected"}) rejected.append({"index": idx, "reason": "Replay detected"})
continue continue
@@ -2148,7 +2484,18 @@ class Infonet:
if not ok: if not ok:
rejected.append({"index": idx, "reason": reason}) rejected.append({"index": idx, "reason": reason})
continue continue
ok, reason = validate_public_ledger_payload(event_type, payload) if event_type == "gate_message":
ok, reason = validate_private_gate_ledger_payload(payload)
elif event_type == "dm_message":
ok, reason = validate_private_dm_ledger_payload(payload)
if ok:
total_count, sender_count = self._dm_spool_active_counts(payload, sender_id=str(evt.get("node_id", "") or ""))
if sender_count >= DM_HASHCHAIN_SPOOL_SENDER_LIMIT:
ok, reason = False, "DM hashchain sender spool full for recipient"
elif total_count >= DM_HASHCHAIN_SPOOL_LIMIT:
ok, reason = False, "DM hashchain spool full for recipient"
else:
ok, reason = validate_public_ledger_payload(event_type, payload)
if not ok: if not ok:
rejected.append({"index": idx, "reason": reason}) rejected.append({"index": idx, "reason": reason})
continue continue
@@ -2224,7 +2571,7 @@ class Infonet:
pass pass
rejected.append({"index": idx, "reason": "public key is revoked"}) rejected.append({"index": idx, "reason": "public key is revoked"})
continue continue
last_seq = self.node_sequences.get(node_id, 0) last_seq = sequence_table.get(sequence_key, 0)
if sequence <= last_seq: if sequence <= last_seq:
try: try:
from services.mesh.mesh_metrics import increment as metrics_inc from services.mesh.mesh_metrics import increment as metrics_inc
@@ -2260,18 +2607,30 @@ class Infonet:
rejected.append({"index": idx, "reason": bind_reason}) rejected.append({"index": idx, "reason": bind_reason})
continue continue
sig_payload = build_signature_payload( if event_type == "gate_message":
event_type=event_type, signature_payloads = _private_gate_signature_payload_variants(
node_id=node_id, str(payload.get("gate", "") or ""),
sequence=sequence, evt,
payload=payload, )
) else:
if not verify_signature( signature_payloads = [payload]
public_key_b64=public_key, signature_ok = False
public_key_algo=public_key_algo, for signature_payload in signature_payloads:
signature_hex=signature, sig_payload = build_signature_payload(
payload=sig_payload, event_type=event_type,
): node_id=node_id,
sequence=sequence,
payload=signature_payload,
)
if verify_signature(
public_key_b64=public_key,
public_key_algo=public_key_algo,
signature_hex=signature,
payload=sig_payload,
):
signature_ok = True
break
if not signature_ok:
try: try:
from services.mesh.mesh_metrics import increment as metrics_inc from services.mesh.mesh_metrics import increment as metrics_inc
@@ -2301,7 +2660,7 @@ class Infonet:
self.events.append(evt) self.events.append(evt)
self.event_index[event_id] = len(self.events) - 1 self.event_index[event_id] = len(self.events) - 1
self.head_hash = event_id self.head_hash = event_id
self.node_sequences[node_id] = sequence sequence_table[sequence_key] = sequence
self._update_counters_for_event(evt) self._update_counters_for_event(evt)
accepted += 1 accepted += 1
expected_prev = event_id expected_prev = event_id
@@ -2364,6 +2723,7 @@ class Infonet:
verify_node_binding, verify_node_binding,
) )
event_type = evt_dict.get("event_type", "")
node_id = evt_dict.get("node_id", "") node_id = evt_dict.get("node_id", "")
if not parse_public_key_algo(public_key_algo): if not parse_public_key_algo(public_key_algo):
return False, f"Unsupported public_key_algo at index {i}" return False, f"Unsupported public_key_algo at index {i}"
@@ -2374,21 +2734,41 @@ class Infonet:
return False, f"public key binding conflict at index {i}" return False, f"public key binding conflict at index {i}"
seen_public_keys[public_key] = node_id seen_public_keys[public_key] = node_id
normalized = normalize_payload( payload = evt_dict.get("payload", {})
evt_dict.get("event_type", ""), evt_dict.get("payload", {}) if event_type == "gate_message":
) ok, reason = validate_private_gate_ledger_payload(payload)
sig_payload = build_signature_payload( if not ok:
event_type=evt_dict.get("event_type", ""), return False, f"Invalid gate_message payload at index {i}: {reason}"
node_id=node_id, signature_payloads = _private_gate_signature_payload_variants(
sequence=_safe_int(evt_dict.get("sequence", 0) or 0, 0), str(payload.get("gate", "") or ""),
payload=normalized, evt_dict,
) )
if not verify_signature( elif event_type == "dm_message":
public_key_b64=public_key, ok, reason = validate_private_dm_ledger_payload(payload)
public_key_algo=public_key_algo, if not ok:
signature_hex=signature, return False, f"Invalid dm_message payload at index {i}: {reason}"
payload=sig_payload, signature_payloads = [normalize_payload(event_type, payload)]
): else:
signature_payloads = [
normalize_payload(event_type, payload)
]
signature_ok = False
for signature_payload in signature_payloads:
sig_payload = build_signature_payload(
event_type=event_type,
node_id=node_id,
sequence=_safe_int(evt_dict.get("sequence", 0) or 0, 0),
payload=signature_payload,
)
if verify_signature(
public_key_b64=public_key,
public_key_algo=public_key_algo,
signature_hex=signature,
payload=sig_payload,
):
signature_ok = True
break
if not signature_ok:
return False, f"Invalid signature at index {i}" return False, f"Invalid signature at index {i}"
prev = evt_dict["event_id"] prev = evt_dict["event_id"]
@@ -2453,27 +2833,48 @@ class Infonet:
verify_node_binding, verify_node_binding,
) )
event_type = evt_dict.get("event_type", "")
node_id = evt_dict.get("node_id", "") node_id = evt_dict.get("node_id", "")
if not parse_public_key_algo(public_key_algo): if not parse_public_key_algo(public_key_algo):
return False, f"Unsupported public_key_algo at index {i}" return False, f"Unsupported public_key_algo at index {i}"
if not verify_node_binding(node_id, public_key): if not verify_node_binding(node_id, public_key):
return False, f"node_id mismatch at index {i}" return False, f"node_id mismatch at index {i}"
normalized = normalize_payload( payload = evt_dict.get("payload", {})
evt_dict.get("event_type", ""), evt_dict.get("payload", {}) if event_type == "gate_message":
) ok, reason = validate_private_gate_ledger_payload(payload)
sig_payload = build_signature_payload( if not ok:
event_type=evt_dict.get("event_type", ""), return False, f"Invalid gate_message payload at index {i}: {reason}"
node_id=node_id, signature_payloads = _private_gate_signature_payload_variants(
sequence=_safe_int(evt_dict.get("sequence", 0) or 0, 0), str(payload.get("gate", "") or ""),
payload=normalized, evt_dict,
) )
if not verify_signature( elif event_type == "dm_message":
public_key_b64=public_key, ok, reason = validate_private_dm_ledger_payload(payload)
public_key_algo=public_key_algo, if not ok:
signature_hex=signature, return False, f"Invalid dm_message payload at index {i}: {reason}"
payload=sig_payload, signature_payloads = [normalize_payload(event_type, payload)]
): else:
signature_payloads = [
normalize_payload(event_type, payload)
]
signature_ok = False
for signature_payload in signature_payloads:
sig_payload = build_signature_payload(
event_type=event_type,
node_id=node_id,
sequence=_safe_int(evt_dict.get("sequence", 0) or 0, 0),
payload=signature_payload,
)
if verify_signature(
public_key_b64=public_key,
public_key_algo=public_key_algo,
signature_hex=signature,
payload=sig_payload,
):
signature_ok = True
break
if not signature_ok:
return False, f"Invalid signature at index {i}" return False, f"Invalid signature at index {i}"
prev = evt_dict["event_id"] prev = evt_dict["event_id"]
@@ -2537,7 +2938,14 @@ class Infonet:
node_id = evt.get("node_id", "") node_id = evt.get("node_id", "")
sequence = _safe_int(evt.get("sequence", 0) or 0, 0) sequence = _safe_int(evt.get("sequence", 0) or 0, 0)
if node_id and sequence: if node_id and sequence:
last_seq[node_id] = max(last_seq.get(node_id, 0), sequence) sequence_key = (
f"{node_id}|gate_message"
if str(evt.get("event_type", "") or "").strip().lower() == "gate_message"
else f"{node_id}|dm_message"
if str(evt.get("event_type", "") or "").strip().lower() == "dm_message"
else node_id
)
last_seq[sequence_key] = max(last_seq.get(sequence_key, 0), sequence)
public_key = str(evt.get("public_key", "") or "") public_key = str(evt.get("public_key", "") or "")
if public_key and node_id: if public_key and node_id:
seen_public_keys.setdefault(public_key, node_id) seen_public_keys.setdefault(public_key, node_id)
@@ -2557,8 +2965,21 @@ class Infonet:
existing_idx = self.event_index.get(event_id) existing_idx = self.event_index.get(event_id)
if existing_idx is not None and existing_idx <= prev_index: if existing_idx is not None and existing_idx <= prev_index:
return False, "duplicate event_id" return False, "duplicate event_id"
payload = normalize_payload(event_type, dict(payload or {})) if event_type == "gate_message":
payload = dict(payload or {})
elif event_type == "dm_message":
payload = normalize_payload(event_type, dict(payload or {}))
else:
payload = normalize_payload(event_type, dict(payload or {}))
ok, reason = validate_event_payload(event_type, payload) ok, reason = validate_event_payload(event_type, payload)
if not ok:
return False, reason
if event_type == "gate_message":
ok, reason = validate_private_gate_ledger_payload(payload)
elif event_type == "dm_message":
ok, reason = validate_private_dm_ledger_payload(payload)
else:
ok, reason = validate_public_ledger_payload(event_type, payload)
if not ok: if not ok:
return False, reason return False, reason
proto = evt.get("protocol_version") or PROTOCOL_VERSION proto = evt.get("protocol_version") or PROTOCOL_VERSION
@@ -2572,7 +2993,14 @@ class Infonet:
revoked, _info = self._revocation_status(public_key) revoked, _info = self._revocation_status(public_key)
if revoked and event_type != "key_revoke": if revoked and event_type != "key_revoke":
return False, "public key revoked" return False, "public key revoked"
last = last_seq.get(node_id, 0) sequence_key = (
f"{node_id}|gate_message"
if event_type == "gate_message"
else f"{node_id}|dm_message"
if event_type == "dm_message"
else node_id
)
last = last_seq.get(sequence_key, 0)
if sequence <= last: if sequence <= last:
return False, "sequence replay" return False, "sequence replay"
from services.mesh.mesh_crypto import ( from services.mesh.mesh_crypto import (
@@ -2590,23 +3018,35 @@ class Infonet:
if existing and existing != node_id: if existing and existing != node_id:
return False, "public key binding conflict" return False, "public key binding conflict"
seen_public_keys[public_key] = node_id seen_public_keys[public_key] = node_id
sig_payload = build_signature_payload( if event_type == "gate_message":
event_type=event_type, signature_payloads = _private_gate_signature_payload_variants(
node_id=node_id, str(payload.get("gate", "") or ""),
sequence=sequence, evt,
payload=payload, )
) else:
if not verify_signature( signature_payloads = [payload]
public_key_b64=public_key, signature_ok = False
public_key_algo=public_key_algo, for signature_payload in signature_payloads:
signature_hex=signature, sig_payload = build_signature_payload(
payload=sig_payload, event_type=event_type,
): node_id=node_id,
sequence=sequence,
payload=signature_payload,
)
if verify_signature(
public_key_b64=public_key,
public_key_algo=public_key_algo,
signature_hex=signature,
payload=sig_payload,
):
signature_ok = True
break
if not signature_ok:
return False, "invalid signature" return False, "invalid signature"
computed = ChainEvent.from_dict(evt).event_id computed = ChainEvent.from_dict(evt).event_id
if computed != event_id: if computed != event_id:
return False, "event_id mismatch" return False, "event_id mismatch"
last_seq[node_id] = sequence last_seq[sequence_key] = sequence
# Apply fork # Apply fork
self.events = prefix + ordered self.events = prefix + ordered
@@ -2,10 +2,64 @@ from __future__ import annotations
import time import time
from dataclasses import asdict, dataclass from dataclasses import asdict, dataclass
from email.utils import parsedate_to_datetime
from datetime import timezone
from services.mesh.mesh_peer_store import PeerRecord from services.mesh.mesh_peer_store import PeerRecord
class PeerSyncRateLimited(Exception):
"""Upstream peer returned HTTP 429 — Too Many Requests.
Carries the ``Retry-After`` header value (parsed to seconds) so
the caller can pass it to ``finish_sync(retry_after_s=...)`` and
actually wait that long instead of hammering the upstream every
60s and keeping its rate-limit bucket full.
``retry_after_s`` is 0 when the upstream didn't provide a header.
Caller should still apply the exponential backoff in that case.
"""
def __init__(self, message: str, retry_after_s: int = 0, status: int = 429):
super().__init__(message)
self.retry_after_s = max(0, int(retry_after_s or 0))
self.status = int(status or 429)
def parse_retry_after_header(header_value: str, *, now: float | None = None) -> int:
"""Parse the ``Retry-After`` HTTP header.
Two valid forms per RFC 7231 §7.1.3:
* Delay-seconds: a non-negative integer (e.g. ``Retry-After: 120``)
* HTTP-date: an absolute time (e.g. ``Retry-After: Wed, 21 Oct 2026 07:28:00 GMT``)
Returns the wait in **seconds from now**. Unparseable / empty headers
return 0 (caller falls back to exponential backoff). Clamped at a
sane upper bound (1 hour) so a typo'd or hostile peer can't pin us
silent for days.
"""
value = str(header_value or "").strip()
if not value:
return 0
upper_bound = 3600 # never trust a peer to silence us > 1h
# Form 1: pure integer seconds.
if value.isdigit():
return min(max(0, int(value)), upper_bound)
# Form 2: HTTP-date.
try:
target = parsedate_to_datetime(value)
if target is None:
return 0
if target.tzinfo is None:
target = target.replace(tzinfo=timezone.utc)
current = float(now if now is not None else time.time())
delta = int(target.timestamp() - current)
return min(max(0, delta), upper_bound)
except (TypeError, ValueError):
return 0
@dataclass(frozen=True) @dataclass(frozen=True)
class SyncWorkerState: class SyncWorkerState:
last_sync_started_at: int = 0 last_sync_started_at: int = 0
@@ -72,6 +126,59 @@ def begin_sync(
) )
def _failure_backoff_seconds(
*,
base_backoff_s: int,
consecutive_failures: int,
retry_after_s: int,
cap_s: int = 1800,
) -> int:
"""Compute the next-attempt delay after a failed sync.
Two inputs combine:
* ``retry_after_s`` when an upstream peer answered HTTP 429
with a ``Retry-After`` header, we honor it exactly. Continuing
to hammer the upstream every 60s is the bug this fix exists to
close: it keeps the upstream's rate-limit bucket full
indefinitely and no sync ever lands.
* Exponential growth on ``consecutive_failures`` even without an
explicit Retry-After, repeated failures should slow us down. The
first failure waits ``base`` (preserves pre-fix behavior for
one-off blips). Each subsequent failure doubles the wait, capped
to ``cap_s`` (default 30 minutes). With base=60 and cap=1800,
the schedule is 60s 120s 240s 480s 960s 1800s
1800s .
The actual delay is the MAX of the two whichever asks for more
patience wins. ``retry_after_s == 0`` (no header) falls back to
pure exponential. An aggressive ``Retry-After`` (say 600s while
we're only at 1 failure) wins over the exponential ladder.
"""
base = max(0, int(base_backoff_s or 0))
failures = max(0, int(consecutive_failures or 0))
cap = max(0, int(cap_s or 0))
retry_after = max(0, int(retry_after_s or 0))
# ``cap_s=0`` explicitly disables the exponential ladder entirely
# — operators who want the pre-fix "honor Retry-After only" behavior
# can set this. The default cap of 1800s is what saturates the
# ladder at the 5th-6th failure for base=60.
if cap == 0:
return retry_after
# 2^(failures-1) — so failure #1 = base (preserves the pre-fix
# default for transient blips), failure #2 = 2*base, etc. Cap on
# the exponent (16) is defense against integer overflow on a
# hostile or very large failures counter.
if base > 0 and failures > 0:
exponent = min(max(0, failures - 1), 16)
grown = base * (2 ** exponent)
else:
grown = 0
exponential = min(max(0, grown), cap)
return max(exponential, retry_after)
def finish_sync( def finish_sync(
state: SyncWorkerState, state: SyncWorkerState,
*, *,
@@ -83,7 +190,26 @@ def finish_sync(
now: float | None = None, now: float | None = None,
interval_s: int = 300, interval_s: int = 300,
failure_backoff_s: int = 60, failure_backoff_s: int = 60,
retry_after_s: int = 0,
failure_backoff_cap_s: int = 1800,
) -> SyncWorkerState: ) -> SyncWorkerState:
"""Finalise a sync attempt and compute when the next one should run.
New args (added for the 429 retry storm fix):
* ``retry_after_s`` if the peer responded with HTTP 429 + a
``Retry-After`` header, pass that value here. ``finish_sync``
will use ``max(exponential, retry_after_s)`` for the delay so
we never hammer a peer that asked us to back off.
* ``failure_backoff_cap_s`` upper bound on the exponential
ladder. Default 1800 (30 min) keeps a sync queue from going
silent for hours while still cutting the request rate to
something the upstream can absorb.
The pre-fix behavior (constant 60s on every failure) is recoverable
by passing ``failure_backoff_cap_s=0`` and ``retry_after_s=0``, but
there's no reason to.
"""
timestamp = int(now if now is not None else time.time()) timestamp = int(now if now is not None else time.time())
if ok: if ok:
return SyncWorkerState( return SyncWorkerState(
@@ -99,17 +225,25 @@ def finish_sync(
consecutive_failures=0, consecutive_failures=0,
) )
next_failures = state.consecutive_failures + 1
delay_s = _failure_backoff_seconds(
base_backoff_s=failure_backoff_s,
consecutive_failures=next_failures,
retry_after_s=retry_after_s,
cap_s=failure_backoff_cap_s,
)
return SyncWorkerState( return SyncWorkerState(
last_sync_started_at=state.last_sync_started_at, last_sync_started_at=state.last_sync_started_at,
last_sync_finished_at=timestamp, last_sync_finished_at=timestamp,
last_sync_ok_at=state.last_sync_ok_at, last_sync_ok_at=state.last_sync_ok_at,
next_sync_due_at=timestamp + max(0, int(failure_backoff_s or 0)), next_sync_due_at=timestamp + delay_s,
last_peer_url=peer_url or state.last_peer_url, last_peer_url=peer_url or state.last_peer_url,
last_error=str(error or "").strip(), last_error=str(error or "").strip(),
last_outcome="fork" if fork_detected else "error", last_outcome="fork" if fork_detected else "error",
current_head=current_head or state.current_head, current_head=current_head or state.current_head,
fork_detected=bool(fork_detected), fork_detected=bool(fork_detected),
consecutive_failures=state.consecutive_failures + 1, consecutive_failures=next_failures,
) )
@@ -142,5 +276,6 @@ def should_run_sync(
) -> bool: ) -> bool:
current_time = int(now if now is not None else time.time()) current_time = int(now if now is not None else time.time())
if state.last_outcome == "running": if state.last_outcome == "running":
return False started_at = int(state.last_sync_started_at or 0)
return started_at <= 0 or current_time - started_at >= 300
return int(state.next_sync_due_at or 0) <= current_time return int(state.next_sync_due_at or 0) <= current_time
+16 -11
View File
@@ -26,7 +26,11 @@ from enum import Enum
from typing import Any, Callable, Optional from typing import Any, Callable, Optional
from collections import deque from collections import deque
from urllib.parse import urlparse from urllib.parse import urlparse
from services.mesh.mesh_crypto import _derive_peer_key, normalize_peer_url from services.mesh.mesh_crypto import (
_derive_peer_key,
normalize_peer_url,
resolve_peer_key_for_url,
)
from services.mesh.mesh_metrics import increment as metrics_inc from services.mesh.mesh_metrics import increment as metrics_inc
from services.mesh.mesh_privacy_policy import ( from services.mesh.mesh_privacy_policy import (
TRANSPORT_TIER_ORDER as _TIER_RANK, TRANSPORT_TIER_ORDER as _TIER_RANK,
@@ -703,7 +707,6 @@ class InternetTransport(_PeerPushTransportMixin):
endpoint_path, padded = self._build_peer_push_request(envelope, self.NAME) endpoint_path, padded = self._build_peer_push_request(envelope, self.NAME)
except ValueError as exc: except ValueError as exc:
return TransportResult(False, self.NAME, str(exc)) return TransportResult(False, self.NAME, str(exc))
secret = str(settings.MESH_PEER_PUSH_SECRET or "").strip()
delivered = 0 delivered = 0
last_error = "" last_error = ""
@@ -713,10 +716,13 @@ class InternetTransport(_PeerPushTransportMixin):
try: try:
normalized_peer_url = normalize_peer_url(peer_url) normalized_peer_url = normalize_peer_url(peer_url)
headers = {"Content-Type": "application/json"} headers = {"Content-Type": "application/json"}
if secret: # Issue #256: per-peer secret takes precedence over the
peer_key = _derive_peer_key(secret, normalized_peer_url) # global MESH_PEER_PUSH_SECRET. When neither is set the
if not peer_key: # key is empty and we skip the HMAC header entirely so a
raise ValueError("invalid peer URL for HMAC derivation") # bare (unsigned) push still works on test deployments
# that have not yet configured any secret at all.
peer_key = resolve_peer_key_for_url(normalized_peer_url)
if peer_key:
headers["X-Peer-Url"] = normalized_peer_url headers["X-Peer-Url"] = normalized_peer_url
headers["X-Peer-HMAC"] = hmac.new( headers["X-Peer-HMAC"] = hmac.new(
peer_key, peer_key,
@@ -798,7 +804,6 @@ class TorArtiTransport(_PeerPushTransportMixin):
endpoint_path, padded = self._build_peer_push_request(envelope, self.NAME) endpoint_path, padded = self._build_peer_push_request(envelope, self.NAME)
except ValueError as exc: except ValueError as exc:
return TransportResult(False, self.NAME, str(exc)) return TransportResult(False, self.NAME, str(exc))
secret = str(settings.MESH_PEER_PUSH_SECRET or "").strip()
delivered = 0 delivered = 0
last_error = "" last_error = ""
@@ -808,10 +813,10 @@ class TorArtiTransport(_PeerPushTransportMixin):
try: try:
normalized_peer_url = normalize_peer_url(peer_url) normalized_peer_url = normalize_peer_url(peer_url)
headers = {"Content-Type": "application/json"} headers = {"Content-Type": "application/json"}
if secret: # Issue #256: per-peer secret takes precedence; see the
peer_key = _derive_peer_key(secret, normalized_peer_url) # other transport above for the rationale.
if not peer_key: peer_key = resolve_peer_key_for_url(normalized_peer_url)
raise ValueError("invalid peer URL for HMAC derivation") if peer_key:
headers["X-Peer-Url"] = normalized_peer_url headers["X-Peer-Url"] = normalized_peer_url
headers["X-Peer-HMAC"] = hmac.new( headers["X-Peer-HMAC"] = hmac.new(
peer_key, peer_key,
+144
View File
@@ -2,6 +2,9 @@
from __future__ import annotations from __future__ import annotations
import base64
import binascii
import math
from dataclasses import dataclass from dataclasses import dataclass
from typing import Any, Callable from typing import Any, Callable
@@ -33,6 +36,58 @@ def _require_fields(payload: dict[str, Any], fields: tuple[str, ...]) -> tuple[b
return True, "ok" return True, "ok"
def _decode_base64ish(value: Any) -> bytes | None:
raw = str(value or "").strip()
if not raw or any(ch.isspace() for ch in raw):
return None
padded = raw + ("=" * (-len(raw) % 4))
for altchars in (None, b"-_"):
try:
return base64.b64decode(padded.encode("ascii"), altchars=altchars, validate=True)
except (binascii.Error, UnicodeEncodeError, ValueError):
continue
return None
def _byte_entropy(data: bytes) -> float:
if not data:
return 0.0
counts = [0] * 256
for byte in data:
counts[byte] += 1
total = float(len(data))
return -sum((count / total) * math.log2(count / total) for count in counts if count)
def _validate_sealed_bytes_field(
payload: dict[str, Any],
field: str,
*,
min_bytes: int = 8,
entropy_floor: float = 2.5,
) -> tuple[bool, str]:
data = _decode_base64ish(payload.get(field, ""))
if data is None:
return False, f"{field} must be base64-encoded sealed bytes"
if len(data) < min_bytes:
return False, f"{field} is too short"
# Short test vectors and compact envelopes can be low entropy; only apply
# heuristics once there is enough material to distinguish a sealed blob
# from accidental base64-encoded plaintext.
if len(data) >= 32:
printable = sum(1 for byte in data if 32 <= byte <= 126 or byte in (9, 10, 13))
if printable / len(data) > 0.9:
try:
data.decode("utf-8")
return False, f"{field} looks like encoded plaintext"
except UnicodeDecodeError:
pass
if _byte_entropy(data) < entropy_floor:
return False, f"{field} entropy is too low for sealed bytes"
return True, "ok"
def _validate_message(payload: dict[str, Any]) -> tuple[bool, str]: def _validate_message(payload: dict[str, Any]) -> tuple[bool, str]:
ok, reason = _require_fields( ok, reason = _require_fields(
payload, ("message", "destination", "channel", "priority", "ephemeral") payload, ("message", "destination", "channel", "priority", "ephemeral")
@@ -331,6 +386,7 @@ ACTIVE_PUBLIC_LEDGER_EVENT_TYPES: frozenset[str] = frozenset(
LEGACY_PUBLIC_LEDGER_EVENT_TYPES: frozenset[str] = frozenset( LEGACY_PUBLIC_LEDGER_EVENT_TYPES: frozenset[str] = frozenset(
{ {
"gate_message", "gate_message",
"dm_message",
} }
) )
"""Event types that exist historically on the public chain and must remain """Event types that exist historically on the public chain and must remain
@@ -425,6 +481,8 @@ def validate_event_payload(event_type: str, payload: dict[str, Any]) -> tuple[bo
def validate_public_ledger_payload(event_type: str, payload: dict[str, Any]) -> tuple[bool, str]: def validate_public_ledger_payload(event_type: str, payload: dict[str, Any]) -> tuple[bool, str]:
if event_type == "gate_message":
return validate_private_gate_ledger_payload(payload)
if event_type not in PUBLIC_LEDGER_EVENT_TYPES and event_type not in _EXTENSION_VALIDATORS: if event_type not in PUBLIC_LEDGER_EVENT_TYPES and event_type not in _EXTENSION_VALIDATORS:
return False, f"{event_type} is not allowed on the public ledger" return False, f"{event_type} is not allowed on the public ledger"
forbidden = sorted( forbidden = sorted(
@@ -441,6 +499,92 @@ def validate_public_ledger_payload(event_type: str, payload: dict[str, Any]) ->
return True, "ok" return True, "ok"
_PRIVATE_GATE_LEDGER_ALLOWED_FIELDS: frozenset[str] = frozenset(
{
"gate",
"ciphertext",
"nonce",
"sender_ref",
"format",
"epoch",
"gate_envelope",
"envelope_hash",
"reply_to",
"transport_lock",
"signed_context",
}
)
def validate_private_gate_ledger_payload(payload: dict[str, Any]) -> tuple[bool, str]:
"""Validate ciphertext-only gate events for private Infonet replication."""
ok, reason = validate_event_payload("gate_message", payload)
if not ok:
return ok, reason
unexpected = sorted(
key
for key in payload.keys()
if str(key or "").strip().lower() not in _PRIVATE_GATE_LEDGER_ALLOWED_FIELDS
)
if unexpected:
return False, f"private gate ledger payload contains unsupported fields: {', '.join(unexpected)}"
if "message" in payload or "_local_plaintext" in payload or "_local_reply_to" in payload:
return False, "private gate ledger payload must not contain plaintext"
transport_lock = str(payload.get("transport_lock", "") or "").strip().lower()
if transport_lock and transport_lock not in {"private", "private_strong", "rns", "onion"}:
return False, "gate messages require private transport_lock"
ok, reason = _validate_sealed_bytes_field(payload, "ciphertext")
if not ok:
return ok, reason
ok, reason = _validate_sealed_bytes_field(payload, "nonce")
if not ok:
return ok, reason
return True, "ok"
_PRIVATE_DM_LEDGER_ALLOWED_FIELDS: frozenset[str] = frozenset(
{
"recipient_id",
"delivery_class",
"recipient_token",
"ciphertext",
"msg_id",
"timestamp",
"format",
"session_welcome",
"sender_seal",
"relay_salt",
"transport_lock",
"signed_context",
}
)
def validate_private_dm_ledger_payload(payload: dict[str, Any]) -> tuple[bool, str]:
"""Validate ciphertext-only DM dead-drop events for private Infonet replication."""
ok, reason = validate_event_payload("dm_message", payload)
if not ok:
return ok, reason
unexpected = sorted(
key
for key in payload.keys()
if str(key or "").strip().lower() not in _PRIVATE_DM_LEDGER_ALLOWED_FIELDS
)
if unexpected:
return False, f"private DM ledger payload contains unsupported fields: {', '.join(unexpected)}"
if "message" in payload or "plaintext" in payload or "_local_plaintext" in payload:
return False, "private DM ledger payload must not contain plaintext"
transport_lock = str(payload.get("transport_lock", "") or "").strip().lower()
if transport_lock != "private_strong":
return False, "DM hashchain spool requires private_strong transport_lock"
if not str(payload.get("ciphertext", "") or "").strip():
return False, "ciphertext cannot be empty"
ok, reason = _validate_sealed_bytes_field(payload, "ciphertext")
if not ok:
return ok, reason
return True, "ok"
def validate_protocol_fields(protocol_version: str, network_id: str) -> tuple[bool, str]: def validate_protocol_fields(protocol_version: str, network_id: str) -> tuple[bool, str]:
if protocol_version != PROTOCOL_VERSION: if protocol_version != PROTOCOL_VERSION:
return False, "Unsupported protocol_version" return False, "Unsupported protocol_version"
+7 -2
View File
@@ -38,6 +38,11 @@ _REVOCATION_TTL_CACHE: dict[str, dict[str, Any]] = {}
_REVOCATION_TTL_LOCK = threading.Lock() _REVOCATION_TTL_LOCK = threading.Lock()
_REVOCATION_REFRESH_LOCK = threading.Lock() _REVOCATION_REFRESH_LOCK = threading.Lock()
_REVOCATION_REFRESH_FAIL_FAST_WINDOW_S = 5.0 _REVOCATION_REFRESH_FAIL_FAST_WINDOW_S = 5.0
def _request_scope_path(request: Request) -> str:
scope = getattr(request, "scope", {}) or {}
return str(scope.get("path") or "")
_REVOCATION_REFRESH_RETRY_AFTER_S = 5 _REVOCATION_REFRESH_RETRY_AFTER_S = 5
_REVOCATION_PRECHECK_UNAVAILABLE_DETAIL = "Signed event integrity preflight unavailable" _REVOCATION_PRECHECK_UNAVAILABLE_DETAIL = "Signed event integrity preflight unavailable"
@@ -166,7 +171,7 @@ def _canonical_signed_write_retry_payload(
signed_context = build_signed_context( signed_context = build_signed_context(
event_type=prepared.event_type, event_type=prepared.event_type,
kind=prepared.kind.value, kind=prepared.kind.value,
endpoint=str(request.url.path or ""), endpoint=_request_scope_path(request),
lane_floor=_content_private_required_transport_tier(prepared.kind), lane_floor=_content_private_required_transport_tier(prepared.kind),
sequence_domain=_signed_context_sequence_domain(prepared), sequence_domain=_signed_context_sequence_domain(prepared),
node_id=prepared.node_id, node_id=prepared.node_id,
@@ -540,7 +545,7 @@ def _apply_signed_context_policy(prepared: "PreparedSignedWrite", request: Reque
ok, reason = validate_signed_context( ok, reason = validate_signed_context(
event_type=prepared.event_type, event_type=prepared.event_type,
kind=prepared.kind.value, kind=prepared.kind.value,
endpoint=str(request.url.path or ""), endpoint=_request_scope_path(request),
lane_floor=_content_private_required_transport_tier(prepared.kind), lane_floor=_content_private_required_transport_tier(prepared.kind),
sequence_domain=_signed_context_sequence_domain(prepared), sequence_domain=_signed_context_sequence_domain(prepared),
node_id=prepared.node_id, node_id=prepared.node_id,
+10 -7
View File
@@ -91,13 +91,15 @@ def _fetch_dm_prekey_bundle_from_peer_lookup(lookup_token: str) -> dict[str, Any
return {"ok": False, "detail": "lookup token required"} return {"ok": False, "detail": "lookup token required"}
try: try:
from services.config import get_settings from services.config import get_settings
from services.mesh.mesh_crypto import _derive_peer_key, normalize_peer_url from services.mesh.mesh_crypto import (
normalize_peer_url,
resolve_peer_key_for_url,
)
from services.mesh.mesh_router import configured_relay_peer_urls from services.mesh.mesh_router import configured_relay_peer_urls
settings = get_settings() settings = get_settings()
secret = str(getattr(settings, "MESH_PEER_PUSH_SECRET", "") or "").strip() # Issue #256: secret check moved per-peer below. We still bail out
if not secret: # cleanly when there are no peers configured at all.
return {"ok": False, "detail": "peer prekey lookup unavailable"}
peers = configured_relay_peer_urls() peers = configured_relay_peer_urls()
if not peers: if not peers:
return {"ok": False, "detail": "peer prekey lookup unavailable"} return {"ok": False, "detail": "peer prekey lookup unavailable"}
@@ -121,7 +123,8 @@ def _fetch_dm_prekey_bundle_from_peer_lookup(lookup_token: str) -> dict[str, Any
or os.environ.get("SB_TEST_NODE_URL", "").strip() or os.environ.get("SB_TEST_NODE_URL", "").strip()
or normalized_peer_url or normalized_peer_url
) )
peer_key = _derive_peer_key(secret, sender_peer_url) # Issue #256: prefer per-peer secret keyed by the sender URL.
peer_key = resolve_peer_key_for_url(sender_peer_url)
if not peer_key: if not peer_key:
continue continue
headers = { headers = {
@@ -231,12 +234,12 @@ def _fetch_dm_prekey_bundle_from_public_lookup(lookup_token: str) -> dict[str, A
# Generic UA: any peer-facing crypto request should not carry a # Generic UA: any peer-facing crypto request should not carry a
# fork-specific identifier — that turns prekey lookups into a # fork-specific identifier — that turns prekey lookups into a
# software-fingerprinting beacon. # software-fingerprinting beacon.
from services.network_utils import DEFAULT_USER_AGENT from services.network_utils import default_user_agent
request = urllib.request.Request( request = urllib.request.Request(
f"{normalized_peer_url}/api/mesh/dm/prekey-bundle?{encoded}", f"{normalized_peer_url}/api/mesh/dm/prekey-bundle?{encoded}",
headers={ headers={
"Accept": "application/json", "Accept": "application/json",
"User-Agent": DEFAULT_USER_AGENT, "User-Agent": default_user_agent(),
}, },
method="GET", method="GET",
) )
+186 -9
View File
@@ -5,7 +5,9 @@ import subprocess
import shutil import shutil
import time import time
import threading import threading
import uuid
import requests import requests
from pathlib import Path
from urllib.parse import urlparse from urllib.parse import urlparse
from requests.adapters import HTTPAdapter from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry from urllib3.util.retry import Retry
@@ -20,15 +22,190 @@ _session.mount("https://", HTTPAdapter(max_retries=_retry, pool_maxsize=20))
_session.mount("http://", HTTPAdapter(max_retries=_retry, pool_maxsize=10)) _session.mount("http://", HTTPAdapter(max_retries=_retry, pool_maxsize=10))
# Default outbound User-Agent. Generic by design — does NOT include any # ---------------------------------------------------------------------------
# personal contact info or a fork-specific repo URL. Operators who run a # Per-operator outbound identification
# public-facing relay and want to identify themselves to upstreams (e.g. # ---------------------------------------------------------------------------
# for Nominatim / weather.gov usage-policy compliance) can override this #
# via the SHADOWBROKER_USER_AGENT env var. # Issues #289 / #290 / #291 and the retrofit of PR #284 (#218 / #219 / #220):
DEFAULT_USER_AGENT = os.environ.get( # every third-party API the backend calls used to identify itself with a
"SHADOWBROKER_USER_AGENT", # single "Shadowbroker" aggregate User-Agent. From the upstream's
"ShadowBroker-OSINT/0.9", # perspective, that meant every Shadowbroker install in the world looked
# like one giant entity hammering them. If one install misbehaved, the
# upstream's only recourse was to block "Shadowbroker" as a whole — which
# would take out every other install too.
#
# Fix: give each install a stable pseudonymous handle used as the entire
# User-Agent product token (no shared "Shadowbroker" label). Upstreams see
# ``operator-7f3a92`` (or ``OPERATOR_HANDLE``), not one monolithic app name.
#
# The handle:
#
# - Is auto-generated on first call if no `OPERATOR_HANDLE` is configured
# (looks like "operator-7f3a92" — 6 hex chars from uuid4()).
# - Is persisted to ``backend/data/operator_handle.json`` so it survives
# restarts. Under Docker compose that file lives in the volume mount
# alongside `carrier_cache.json` and the other persistent state.
# - Can be overridden by the operator via the `OPERATOR_HANDLE` setting
# (env var or settings UI). Operators with their own GitHub handle,
# organization name, etc. can use that for traceability.
# - Is NEVER mixed into mesh / Wormhole / Infonet identity. This layer is
# strictly for public third-party API attribution.
_OPERATOR_HANDLE_FILE = (
Path(__file__).parent.parent / "data" / "operator_handle.json"
) )
_OPERATOR_HANDLE_CACHE: str = ""
_OPERATOR_HANDLE_LOCK = threading.Lock()
def _generate_operator_handle() -> str:
"""Produce a stable pseudonymous handle for first-launch installs.
Format: ``operator-7f3a92`` (6 hex chars from a fresh uuid4()).
Distinct per install. Carries no real-world identity by default
operators who want one can override via ``OPERATOR_HANDLE``.
Note: the prefix is deliberately neutral. Earlier drafts used
``shadow-`` which, while accurate to the project name, looks
exactly like the kind of pattern a third-party abuse-detection
system would auto-block as suspicious. ``operator-`` describes
what the value actually is and doesn't pattern-match malware.
"""
return f"operator-{uuid.uuid4().hex[:6]}"
def _load_persisted_operator_handle() -> str:
"""Return the previously-saved handle from disk, or empty if none.
Reads ``backend/data/operator_handle.json`` if it exists. Any read
error returns empty so a fresh handle gets generated rather than
crashing the request.
"""
try:
if _OPERATOR_HANDLE_FILE.exists():
data = json.loads(_OPERATOR_HANDLE_FILE.read_text(encoding="utf-8"))
return str(data.get("handle", "") or "").strip()
except (OSError, json.JSONDecodeError, ValueError):
pass
return ""
def _persist_operator_handle(handle: str) -> None:
"""Atomically save the auto-generated handle so subsequent restarts
use the same one. Failure to persist is non-fatal the request still
succeeds with the in-memory handle, we just may generate a different
one on the next process restart."""
try:
_OPERATOR_HANDLE_FILE.parent.mkdir(parents=True, exist_ok=True)
tmp = _OPERATOR_HANDLE_FILE.with_suffix(_OPERATOR_HANDLE_FILE.suffix + ".tmp")
tmp.write_text(
json.dumps({"handle": handle, "_meta": {
"purpose": "Per-install operator handle for outbound third-party API attribution.",
"see": "backend/services/network_utils.py:outbound_user_agent",
}}, indent=2),
encoding="utf-8",
)
os.replace(tmp, _OPERATOR_HANDLE_FILE)
except OSError as exc:
logger.debug("Could not persist operator_handle (continuing in-memory): %s", exc)
def get_operator_handle() -> str:
"""Return the stable per-install operator handle.
Resolution order:
1. ``OPERATOR_HANDLE`` setting (env var / settings UI) if non-empty.
2. Process-cached value from previous call this run.
3. Value persisted to ``operator_handle.json`` (from a previous run).
4. Newly generated pseudonymous handle, persisted to disk.
The handle is normalized: stripped of whitespace, lowercased,
non-alphanumeric chars (except ``-`` and ``_``) replaced with ``-``.
This both sanitizes any HTTP-header-unsafe characters AND prevents
the operator from impersonating real third-party projects via
inventive whitespace.
"""
global _OPERATOR_HANDLE_CACHE
with _OPERATOR_HANDLE_LOCK:
# 1. Configured override always wins.
configured = ""
try:
from services.config import get_settings
configured = str(getattr(get_settings(), "OPERATOR_HANDLE", "") or "").strip()
except Exception:
configured = ""
if configured:
return _normalize_handle(configured)
# 2. In-memory cache (fast path for repeated calls).
if _OPERATOR_HANDLE_CACHE:
return _OPERATOR_HANDLE_CACHE
# 3. On-disk handle from a previous run.
persisted = _load_persisted_operator_handle()
if persisted:
normalized = _normalize_handle(persisted)
# Migrate legacy auto-generated handles (pre-Round-7a ``shadow-`` prefix).
if normalized.startswith("shadow-"):
normalized = f"operator-{normalized[len('shadow-'):]}"
_persist_operator_handle(normalized)
_OPERATOR_HANDLE_CACHE = normalized
return _OPERATOR_HANDLE_CACHE
# 4. Generate, persist, return.
fresh = _generate_operator_handle()
_persist_operator_handle(fresh)
_OPERATOR_HANDLE_CACHE = fresh
return fresh
def _normalize_handle(raw: str) -> str:
"""Strip whitespace, lowercase, replace unsafe characters with dashes."""
safe = "".join(
ch if (ch.isalnum() or ch in "-_") else "-"
for ch in raw.strip().lower()
)
# Collapse runs of dashes and trim to a reasonable length so an
# operator can't make our outbound logs unreadable.
while "--" in safe:
safe = safe.replace("--", "-")
safe = safe.strip("-")
return safe[:48] if safe else "anonymous"
def outbound_user_agent(purpose: str = "") -> str:
"""Build a User-Agent for an outbound third-party HTTP request.
Returns the per-install handle only, e.g. ``operator-7f3a92`` or
``operator-7f3a92 (purpose: wikipedia)``. No shared project name so
upstream abuse teams cannot block every install with one ``Shadowbroker``
rule.
Set ``SHADOWBROKER_USER_AGENT`` to override the entire string if needed.
"""
handle = get_operator_handle()
if purpose:
purpose_clean = _normalize_handle(purpose)
return f"{handle} (purpose: {purpose_clean})"
return handle
def _reset_operator_handle_cache_for_tests() -> None:
"""Test-only: invalidate the in-memory cache so a test can set a
new ``OPERATOR_HANDLE`` env var and see it picked up immediately."""
global _OPERATOR_HANDLE_CACHE
with _OPERATOR_HANDLE_LOCK:
_OPERATOR_HANDLE_CACHE = ""
def default_user_agent() -> str:
"""Default User-Agent for ``fetch_with_curl`` and legacy call sites."""
custom = (os.environ.get("SHADOWBROKER_USER_AGENT") or "").strip()
if custom:
return custom
return outbound_user_agent()
# Find bash for curl fallback — Git bash's curl has the TLS features # Find bash for curl fallback — Git bash's curl has the TLS features
# needed to pass CDN fingerprint checks (brotli, zstd, libpsl) # needed to pass CDN fingerprint checks (brotli, zstd, libpsl)
@@ -84,7 +261,7 @@ def fetch_with_curl(url, method="GET", json_data=None, timeout=15, headers=None,
both Python requests and the barebones Windows system curl. both Python requests and the barebones Windows system curl.
""" """
default_headers = { default_headers = {
"User-Agent": DEFAULT_USER_AGENT, "User-Agent": default_user_agent(),
} }
if headers: if headers:
default_headers.update(headers) default_headers.update(headers)
+4 -2
View File
@@ -12,6 +12,8 @@ logger = logging.getLogger(__name__)
CONFIG_PATH = Path(__file__).parent.parent / "config" / "news_feeds.json" CONFIG_PATH = Path(__file__).parent.parent / "config" / "news_feeds.json"
MAX_FEEDS = 50 MAX_FEEDS = 50
_FEED_URL_REPLACEMENTS = { _FEED_URL_REPLACEMENTS = {
"http://feeds.bbci.co.uk/news/world/rss.xml": "https://feeds.bbci.co.uk/news/world/rss.xml",
"http://www.news.cn/english/rss/worldrss.xml": "https://www.news.cn/english/rss/worldrss.xml",
"https://www.channelnewsasia.com/rssfeed/8395986": "https://www.channelnewsasia.com/api/v1/rss-outbound-feed?_format=xml", "https://www.channelnewsasia.com/rssfeed/8395986": "https://www.channelnewsasia.com/api/v1/rss-outbound-feed?_format=xml",
} }
_DEAD_FEED_URLS = { _DEAD_FEED_URLS = {
@@ -27,7 +29,7 @@ _DEAD_FEED_URLS = {
DEFAULT_FEEDS = [ DEFAULT_FEEDS = [
{"name": "NPR", "url": "https://feeds.npr.org/1004/rss.xml", "weight": 4}, {"name": "NPR", "url": "https://feeds.npr.org/1004/rss.xml", "weight": 4},
{"name": "BBC", "url": "http://feeds.bbci.co.uk/news/world/rss.xml", "weight": 3}, {"name": "BBC", "url": "https://feeds.bbci.co.uk/news/world/rss.xml", "weight": 3},
{"name": "AlJazeera", "url": "https://www.aljazeera.com/xml/rss/all.xml", "weight": 2}, {"name": "AlJazeera", "url": "https://www.aljazeera.com/xml/rss/all.xml", "weight": 2},
{"name": "NYT", "url": "https://rss.nytimes.com/services/xml/rss/nyt/World.xml", "weight": 1}, {"name": "NYT", "url": "https://rss.nytimes.com/services/xml/rss/nyt/World.xml", "weight": 1},
{"name": "GDACS", "url": "https://www.gdacs.org/xml/rss.xml", "weight": 5}, {"name": "GDACS", "url": "https://www.gdacs.org/xml/rss.xml", "weight": 5},
@@ -35,7 +37,7 @@ DEFAULT_FEEDS = [
{"name": "Bellingcat", "url": "https://www.bellingcat.com/feed/", "weight": 4}, {"name": "Bellingcat", "url": "https://www.bellingcat.com/feed/", "weight": 4},
{"name": "Guardian", "url": "https://www.theguardian.com/world/rss", "weight": 3}, {"name": "Guardian", "url": "https://www.theguardian.com/world/rss", "weight": 3},
{"name": "TASS", "url": "https://tass.com/rss/v2.xml", "weight": 2}, {"name": "TASS", "url": "https://tass.com/rss/v2.xml", "weight": 2},
{"name": "Xinhua", "url": "http://www.news.cn/english/rss/worldrss.xml", "weight": 2}, {"name": "Xinhua", "url": "https://www.news.cn/english/rss/worldrss.xml", "weight": 2},
{"name": "CNA", "url": "https://www.channelnewsasia.com/api/v1/rss-outbound-feed?_format=xml", "weight": 3}, {"name": "CNA", "url": "https://www.channelnewsasia.com/api/v1/rss-outbound-feed?_format=xml", "weight": 3},
{"name": "Mercopress", "url": "https://en.mercopress.com/rss/", "weight": 3}, {"name": "Mercopress", "url": "https://en.mercopress.com/rss/", "weight": 3},
{"name": "SCMP", "url": "https://www.scmp.com/rss/91/feed", "weight": 4}, {"name": "SCMP", "url": "https://www.scmp.com/rss/91/feed", "weight": 4},
+27
View File
@@ -83,6 +83,10 @@ READ_COMMANDS = frozenset({
"sar_pin_click", "sar_pin_click",
# Analysis zones (OpenClaw map overlays) # Analysis zones (OpenClaw map overlays)
"list_analysis_zones", "list_analysis_zones",
# Recon / OSINT toolkit (server-side proxies, SSRF guarded)
"osint_lookup",
"osint_tools",
"entity_expand",
}) })
WRITE_COMMANDS = frozenset({ WRITE_COMMANDS = frozenset({
@@ -112,6 +116,8 @@ WRITE_COMMANDS = frozenset({
"place_analysis_zone", "place_analysis_zone",
"delete_analysis_zone", "delete_analysis_zone",
"clear_analysis_zones", "clear_analysis_zones",
# Active recon (subnet device discovery)
"osint_sweep",
}) })
@@ -780,6 +786,7 @@ def _dispatch_command(cmd: str, args: dict[str, Any]) -> dict[str, Any]:
query=str(args.get("query", "") or ""), query=str(args.get("query", "") or ""),
limit=args.get("limit", 10), limit=args.get("limit", 10),
include_gdelt=bool(args.get("include_gdelt", True)), include_gdelt=bool(args.get("include_gdelt", True)),
include_telegram=bool(args.get("include_telegram", True)),
) )
if _wants_compact(args): if _wants_compact(args):
return {"ok": True, "data": _compact_query_result(result), "format": "compressed_v1"} return {"ok": True, "data": _compact_query_result(result), "format": "compressed_v1"}
@@ -846,6 +853,26 @@ def _dispatch_command(cmd: str, args: dict[str, Any]) -> dict[str, Any]:
return {"ok": True, "data": _compact_query_result(result), "format": "compressed_v1"} return {"ok": True, "data": _compact_query_result(result), "format": "compressed_v1"}
return {"ok": True, "data": result} return {"ok": True, "data": result}
if cmd == "osint_lookup":
from services.osint.openclaw_recon import run_osint_lookup
tool = str(args.get("tool", "") or args.get("lookup", "") or args.get("type", "") or "")
result = run_osint_lookup(tool, args)
return {"ok": True, "data": result, "tool": tool.strip().lower()}
if cmd == "osint_tools":
from services.osint.openclaw_recon import osint_tool_help
return {"ok": True, "data": osint_tool_help()}
if cmd == "osint_sweep":
from services.osint.openclaw_recon import run_osint_sweep
result = run_osint_sweep(args)
return {"ok": True, "data": result}
if cmd == "entity_expand":
from services.osint.openclaw_recon import run_entity_expand
result = run_entity_expand(args)
return {"ok": True, "data": result}
if cmd == "get_report": if cmd == "get_report":
from services.telemetry import get_cached_telemetry_refs, get_cached_slow_telemetry_refs from services.telemetry import get_cached_telemetry_refs, get_cached_slow_telemetry_refs
fast = get_cached_telemetry_refs() fast = get_cached_telemetry_refs()
+1
View File
@@ -0,0 +1 @@
"""Operator-initiated OSINT lookups (server-side proxies)."""
+492
View File
@@ -0,0 +1,492 @@
"""Server-side OSINT lookups (Osiris port, HTTPS outbound only)."""
from __future__ import annotations
import ipaddress
import json
import logging
import re
import socket
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timezone
from typing import Any
from urllib.parse import quote
from services.network_utils import fetch_with_curl
from services.sanctions.ofac import match_exact, search_sanctions
from services.ssrf_guard import safe_get, validate_domain, validate_host
logger = logging.getLogger(__name__)
_IPV4_RE = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")
_IPV6_RE = re.compile(r"^[0-9a-fA-F:]+$")
_CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$", re.I)
_ASN_RE = re.compile(r"^(AS)?\d+$", re.I)
def _now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
def _json_get(url: str, *, timeout: float = 8.0, headers: dict[str, str] | None = None) -> Any:
resp = fetch_with_curl(url, timeout=timeout, headers=headers or {"Accept": "application/json"})
if resp.status_code != 200:
return None
try:
return resp.json()
except Exception:
return None
def _sanctions_hits(*values: str) -> list[dict[str, Any]] | None:
hits: list[dict[str, Any]] = []
seen: set[str] = set()
for value in values:
if not value or value in seen:
continue
seen.add(value)
entries = match_exact(value)
if entries:
hits.append({"matched_value": value, "entries": entries})
return hits or None
def lookup_ip(ip: str) -> dict[str, Any]:
if not _IPV4_RE.match(ip) and not _IPV6_RE.match(ip):
raise ValueError("Invalid IP format")
check = validate_host(ip.strip("[]"))
if not check.get("ok"):
raise ValueError(check.get("reason", "blocked IP"))
results: dict[str, Any] = {"ip": ip, "timestamp": _now_iso()}
fields = (
"status,message,continent,country,countryCode,region,regionName,city,zip,"
"lat,lon,timezone,isp,org,as,asname,mobile,proxy,hosting,query"
)
geo = _json_get(f"https://ip-api.com/json/{quote(ip)}?fields={fields}", timeout=5)
if isinstance(geo, dict) and geo.get("status") == "success":
results["geo"] = {
"country": geo.get("country"),
"country_code": geo.get("countryCode"),
"region": geo.get("regionName"),
"city": geo.get("city"),
"lat": geo.get("lat"),
"lon": geo.get("lon"),
"timezone": geo.get("timezone"),
"isp": geo.get("isp"),
"org": geo.get("org"),
"as_number": geo.get("as"),
"as_name": geo.get("asname"),
"is_mobile": geo.get("mobile"),
"is_proxy": geo.get("proxy"),
"is_hosting": geo.get("hosting"),
}
results["reputation"] = {
"is_proxy": bool(geo.get("proxy")),
"is_hosting": bool(geo.get("hosting")),
"is_mobile": bool(geo.get("mobile")),
"risk_level": "HIGH" if geo.get("proxy") else "MEDIUM" if geo.get("hosting") else "LOW",
}
sm = _sanctions_hits(geo.get("org") or "", geo.get("isp") or "", geo.get("asname") or "")
if sm:
results["sanctions_match"] = {"source": "OFAC SDN", "hits": sm}
return results
def lookup_dns(domain: str) -> dict[str, Any]:
if not validate_domain(domain):
raise ValueError("Invalid domain format")
results: dict[str, Any] = {"domain": domain, "records": {}, "timestamp": _now_iso()}
for rtype in ("A", "AAAA", "MX", "NS", "TXT", "CNAME", "SOA"):
data = _json_get(
f"https://dns.google/resolve?name={quote(domain)}&type={rtype}",
timeout=5,
)
answers = []
if isinstance(data, dict):
for ans in data.get("Answer") or []:
answers.append(
{
"name": ans.get("name"),
"type": ans.get("type"),
"ttl": ans.get("TTL"),
"data": ans.get("data"),
}
)
results["records"][rtype] = answers
a_records = results["records"].get("A") or []
mx_records = results["records"].get("MX") or []
ns_records = results["records"].get("NS") or []
results["summary"] = {
"ip_addresses": [r["data"] for r in a_records if r.get("data")],
"mail_servers": [r["data"] for r in mx_records if r.get("data")],
"nameservers": [r["data"] for r in ns_records if r.get("data")],
"total_records": sum(len(v) for v in results["records"].values()),
}
return results
def lookup_whois(domain: str) -> dict[str, Any]:
if not validate_domain(domain):
raise ValueError("Invalid domain format")
results: dict[str, Any] = {"domain": domain, "timestamp": _now_iso()}
rdap = _json_get(f"https://rdap.org/domain/{quote(domain)}", timeout=8)
if isinstance(rdap, dict):
entities = []
for ent in rdap.get("entities") or []:
vcard = ent.get("vcardArray")
name = org = None
if isinstance(vcard, list) and len(vcard) > 1:
for row in vcard[1]:
if row[0] == "fn":
name = row[3]
if row[0] == "org":
org = row[3]
if name or org:
entities.append({"handle": ent.get("handle"), "roles": ent.get("roles"), "name": name, "org": org})
events = [
{"action": e.get("eventAction"), "date": e.get("eventDate")}
for e in (rdap.get("events") or [])
]
results["rdap"] = {
"handle": rdap.get("handle"),
"name": rdap.get("ldhName"),
"status": rdap.get("status"),
"events": events,
"nameservers": [ns.get("ldhName") for ns in (rdap.get("nameservers") or [])],
"entities": entities,
}
results["registration"] = next((e["date"] for e in events if e["action"] == "registration"), None)
results["expiration"] = next((e["date"] for e in events if e["action"] == "expiration"), None)
results["last_changed"] = next((e["date"] for e in events if e["action"] == "last changed"), None)
sm = _sanctions_hits(*(e.get("name") or "" for e in entities), *(e.get("org") or "" for e in entities))
if sm:
results["sanctions_match"] = {"source": "OFAC SDN", "hits": sm}
try:
res = safe_get(f"https://{domain}", timeout=5, headers={"User-Agent": "Shadowbroker-OSINT/1.0"})
headers = {}
for h in (
"server",
"x-powered-by",
"x-frame-options",
"strict-transport-security",
"content-security-policy",
"x-content-type-options",
"x-xss-protection",
"referrer-policy",
"permissions-policy",
):
val = res.headers.get(h)
if val:
headers[h] = val
score = sum(
1
for k in (
"strict-transport-security",
"content-security-policy",
"x-frame-options",
"x-content-type-options",
"referrer-policy",
)
if k in headers
) + (2 if "strict-transport-security" in headers else 0) + (2 if "content-security-policy" in headers else 0)
results["http"] = {"status": res.status_code, "headers": headers, "final_url": res.url}
results["security_score"] = {
"score": score,
"max": 7,
"grade": "A" if score >= 5 else "B" if score >= 3 else "C" if score >= 1 else "F",
}
except Exception as exc:
logger.debug("WHOIS header probe failed for %s: %s", domain, exc)
return results
def lookup_certs(domain: str) -> dict[str, Any]:
if not validate_domain(domain):
raise ValueError("Invalid domain format")
resp = fetch_with_curl(
f"https://crt.sh/?q=%25.{quote(domain)}&output=json",
timeout=10,
headers={"User-Agent": "Shadowbroker-OSINT/1.0"},
)
if resp.status_code != 200:
return {"domain": domain, "certificates": [], "error": "crt.sh unavailable"}
try:
certs = resp.json()
except Exception:
certs = []
seen: set[str] = set()
subdomains: set[str] = set()
unique: list[dict[str, Any]] = []
for cert in (certs or [])[:200]:
key = f"{cert.get('common_name')}-{cert.get('serial_number')}"
if key in seen:
continue
seen.add(key)
for name in (cert.get("name_value") or "").split("\n"):
clean = name.strip().replace("*.", "")
if clean.endswith(domain):
subdomains.add(clean)
unique.append(
{
"id": cert.get("id"),
"issuer": cert.get("issuer_name"),
"common_name": cert.get("common_name"),
"not_before": cert.get("not_before"),
"not_after": cert.get("not_after"),
}
)
return {
"domain": domain,
"certificates": unique[:50],
"subdomains": sorted(subdomains)[:100],
"total_found": len(certs or []),
"timestamp": _now_iso(),
}
def lookup_threats(query: str | None = None) -> dict[str, Any]:
results: dict[str, Any] = {"timestamp": _now_iso()}
pulses = _json_get("https://otx.alienvault.com/api/v1/pulses/activity?limit=10", timeout=8)
if isinstance(pulses, dict):
results["pulses"] = [
{
"name": p.get("name"),
"description": (p.get("description") or "")[:200],
"created": p.get("created"),
"tags": (p.get("tags") or [])[:5],
"adversary": p.get("adversary"),
"indicators_count": p.get("indicator_count"),
}
for p in (pulses.get("results") or [])[:10]
]
if query:
if _IPV4_RE.match(query):
try:
tor_resp = fetch_with_curl("https://check.torproject.org/torbulkexitlist", timeout=5)
results["tor_exit_node"] = query in (tor_resp.text or "").splitlines() if tor_resp.status_code == 200 else None
except Exception:
results["tor_exit_node"] = None
otx = _json_get(f"https://otx.alienvault.com/api/v1/indicators/IPv4/{quote(query)}/general", timeout=5)
if isinstance(otx, dict):
results["otx"] = {
"reputation": otx.get("reputation"),
"pulse_count": (otx.get("pulse_info") or {}).get("count", 0),
"country": otx.get("country_name"),
"asn": otx.get("asn"),
}
elif validate_domain(query):
otx = _json_get(f"https://otx.alienvault.com/api/v1/indicators/domain/{quote(query)}/general", timeout=5)
if isinstance(otx, dict):
results["otx"] = {"pulse_count": (otx.get("pulse_info") or {}).get("count", 0)}
pulse_count = (results.get("otx") or {}).get("pulse_count", 0)
results["threat_level"] = "HIGH" if pulse_count > 5 else "MEDIUM" if pulse_count > 0 else "LOW"
return results
def lookup_bgp(query: str) -> dict[str, Any]:
results: dict[str, Any] = {"query": query, "timestamp": _now_iso()}
if _IPV4_RE.match(query):
data = _json_get(f"https://api.bgpview.io/ip/{quote(query)}", timeout=8)
if isinstance(data, dict) and data.get("status") == "ok":
results["ip"] = data.get("data")
results["type"] = "ip"
return results
if _ASN_RE.match(query):
asn_num = re.sub(r"^AS", "", query, flags=re.I)
asn = _json_get(f"https://api.bgpview.io/asn/{asn_num}", timeout=8)
prefixes = _json_get(f"https://api.bgpview.io/asn/{asn_num}/prefixes", timeout=8)
peers = _json_get(f"https://api.bgpview.io/asn/{asn_num}/peers", timeout=8)
if isinstance(asn, dict) and asn.get("status") == "ok":
results["asn"] = asn.get("data")
if isinstance(prefixes, dict) and prefixes.get("status") == "ok":
pdata = prefixes.get("data") or {}
results["prefixes"] = {
"ipv4": (pdata.get("ipv4_prefixes") or [])[:20],
"ipv6": (pdata.get("ipv6_prefixes") or [])[:10],
"total_v4": len(pdata.get("ipv4_prefixes") or []),
"total_v6": len(pdata.get("ipv6_prefixes") or []),
}
if isinstance(peers, dict) and peers.get("status") == "ok":
pdata = peers.get("data") or {}
results["peers"] = {
"upstream": (pdata.get("ipv4_peers") or [])[:10],
"total": len(pdata.get("ipv4_peers") or []),
}
results["type"] = "asn"
return results
raise ValueError("Unrecognized query format. Use IP address or AS number.")
def lookup_sanctions(query: str, *, schema: str | None = None, limit: int = 25) -> dict[str, Any]:
matches = search_sanctions(query, schema=schema, limit=limit)
return {
"query": query,
"schema": schema,
"total": len(matches),
"matches": matches,
"source": "OpenSanctions / US OFAC SDN",
"timestamp": _now_iso(),
}
def lookup_cve(cve: str) -> dict[str, Any]:
if not _CVE_RE.match(cve):
raise ValueError("Invalid CVE format")
cve_id = cve.upper()
data = _json_get(f"https://cveawg.mitre.org/api/cve/{quote(cve_id)}", timeout=8)
if isinstance(data, dict) and data.get("cveMetadata"):
meta = data["cveMetadata"]
desc = ""
for block in (data.get("containers") or {}).get("cna", {}).get("descriptions") or []:
if block.get("lang") == "en":
desc = block.get("value") or desc
return {"id": meta.get("cveId", cve_id), "description": desc or "No description.", "timestamp": _now_iso()}
fallback = _json_get(f"https://cve.circl.lu/api/cve/{quote(cve_id)}", timeout=8)
if isinstance(fallback, dict):
return {
"id": fallback.get("id", cve_id),
"description": fallback.get("summary") or "No description.",
"cvss": fallback.get("cvss"),
"references": (fallback.get("references") or [])[:5],
"timestamp": _now_iso(),
}
raise ValueError("CVE not found")
def lookup_mac(mac: str) -> dict[str, Any]:
clean = mac.strip().upper()
clean = re.sub(r"[^A-F0-9:-]", "", clean)
data = _json_get(f"https://api.macvendors.com/{quote(clean)}", timeout=8)
if isinstance(data, dict):
return {"mac": clean, "vendor": data.get("company") or data.get("organization") or "Not Found"}
if isinstance(data, str) and data:
return {"mac": clean, "vendor": data}
return {"mac": clean, "vendor": "Not Found"}
def lookup_github(username: str) -> dict[str, Any]:
user = _json_get(f"https://api.github.com/users/{quote(username)}", timeout=8)
if not isinstance(user, dict) or user.get("message") == "Not Found":
raise ValueError("GitHub user not found")
repos = _json_get(f"https://api.github.com/users/{quote(username)}/repos?per_page=10&sort=updated", timeout=8)
return {
"username": username,
"profile": {
"name": user.get("name"),
"bio": user.get("bio"),
"company": user.get("company"),
"location": user.get("location"),
"public_repos": user.get("public_repos"),
"followers": user.get("followers"),
"created_at": user.get("created_at"),
"html_url": user.get("html_url"),
},
"repos": [
{"name": r.get("name"), "language": r.get("language"), "stars": r.get("stargazers_count")}
for r in (repos or [])[:10]
if isinstance(r, dict)
],
"timestamp": _now_iso(),
}
def lookup_leaks(email: str) -> dict[str, Any]:
if "@" not in email or len(email) < 5:
raise ValueError("Invalid email")
# HIBP requires API key for v3; use public breach directory style via leak-lookup (rate limited)
data = _json_get(f"https://leakcheck.io/api/public?check={quote(email)}", timeout=8)
if isinstance(data, dict):
return {
"email": email,
"found": bool(data.get("found")),
"sources": data.get("sources") or [],
"timestamp": _now_iso(),
}
return {"email": email, "found": False, "sources": [], "timestamp": _now_iso()}
def sweep_init(ip: str, cidr: int = 24) -> dict[str, Any]:
try:
addr = ipaddress.IPv4Address(ip)
except ValueError as exc:
raise ValueError("Invalid IPv4 address format") from exc
if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
raise ValueError("Private and reserved IP ranges are not allowed")
if cidr < 24 or cidr > 32:
raise ValueError("CIDR must be between 24 and 32")
fields = "status,message,country,countryCode,region,regionName,city,lat,lon,isp,org,as,proxy,hosting"
geo = _json_get(f"https://ip-api.com/json/{quote(ip)}?fields={fields}", timeout=5)
if not isinstance(geo, dict) or geo.get("status") != "success":
raise ValueError(f"Geolocation failed: {(geo or {}).get('message', 'unknown')}")
return {
"center": {
"lat": geo.get("lat"),
"lng": geo.get("lon"),
"city": geo.get("city"),
"region": geo.get("regionName"),
"country": geo.get("country"),
"countryCode": geo.get("countryCode"),
"isp": geo.get("isp"),
"asn": geo.get("as") or "",
"org": geo.get("org") or "",
},
"target_ip": ip,
"cidr": cidr,
}
def _internetdb_lookup(ip: str) -> dict[str, Any] | None:
try:
resp = fetch_with_curl(
f"https://internetdb.shodan.io/{quote(ip)}",
timeout=4,
headers={"Accept": "application/json"},
)
if resp.status_code == 404:
return None
if resp.status_code != 200:
return None
return resp.json()
except Exception:
return None
def sweep_scan(subnet_start: str, cidr: int, *, max_workers: int = 12) -> dict[str, Any]:
"""Scan a /24-/32 via Shodan InternetDB (server-side proxy)."""
base = int(ipaddress.IPv4Address(subnet_start))
host_count = 2 ** (32 - cidr)
if host_count > 256:
raise ValueError("Subnet too large")
ips = [str(ipaddress.IPv4Address(base + i)) for i in range(host_count)]
devices: list[dict[str, Any]] = []
t0 = time.time()
with ThreadPoolExecutor(max_workers=max_workers) as pool:
futures = {pool.submit(_internetdb_lookup, ip): ip for ip in ips}
for fut in as_completed(futures):
ip = futures[fut]
data = fut.result()
if not data:
continue
devices.append(
{
"ip": data.get("ip") or ip,
"ports": data.get("ports") or [],
"hostnames": data.get("hostnames") or [],
"cpes": data.get("cpes") or [],
"vulns": data.get("vulns") or [],
"tags": data.get("tags") or [],
}
)
return {
"devices": devices,
"summary": {"total_hosts": host_count, "total_responsive": len(devices)},
"sweep_time_ms": int((time.time() - t0) * 1000),
}
def subnet_start_for(ip: str, cidr: int) -> str:
net = ipaddress.IPv4Network(f"{ip}/{cidr}", strict=False)
return str(net.network_address)
+135
View File
@@ -0,0 +1,135 @@
"""OpenClaw dispatch for the operator recon / OSINT lookup toolkit."""
from __future__ import annotations
from typing import Any
from services.osint import lookups
from services.osint_intel.resolve import ALLOWED_TYPES, resolve_entity
_OSINT_TOOLS: dict[str, str] = {
"ip": "ip",
"dns": "domain",
"whois": "domain",
"certs": "domain",
"threats": "query",
"bgp": "query",
"sanctions": "query",
"cve": "cve",
"mac": "mac",
"github": "username",
"leaks": "email",
"sweep_init": "ip",
}
_ENTITY_SCHEMAS = frozenset({
"Person",
"Organization",
"Company",
"Vessel",
"Airplane",
"LegalEntity",
})
def _require_str(args: dict[str, Any], *keys: str) -> str:
for key in keys:
value = str(args.get(key, "") or "").strip()
if value:
return value
joined = "/".join(keys)
raise ValueError(f"Missing required argument: {joined}")
def run_osint_lookup(tool: str, args: dict[str, Any]) -> dict[str, Any]:
"""Run a passive OSINT lookup (same backends as /api/osint/*)."""
name = str(tool or "").strip().lower().replace("-", "_")
if name not in _OSINT_TOOLS:
allowed = ", ".join(sorted(_OSINT_TOOLS))
raise ValueError(f"Unknown OSINT tool '{tool}'. Allowed: {allowed}")
if name == "ip":
return lookups.lookup_ip(_require_str(args, "ip", "query", "value"))
if name == "dns":
return lookups.lookup_dns(_require_str(args, "domain", "query", "value"))
if name == "whois":
return lookups.lookup_whois(_require_str(args, "domain", "query", "value"))
if name == "certs":
return lookups.lookup_certs(_require_str(args, "domain", "query", "value"))
if name == "threats":
query = str(args.get("query", "") or args.get("value", "") or "").strip() or None
return lookups.lookup_threats(query)
if name == "bgp":
return lookups.lookup_bgp(_require_str(args, "query", "asn", "value"))
if name == "sanctions":
query = _require_str(args, "query", "name", "value")
schema = str(args.get("schema", "") or "").strip() or None
if schema and schema not in _ENTITY_SCHEMAS:
allowed = ", ".join(sorted(_ENTITY_SCHEMAS))
raise ValueError(f"Invalid schema. Allowed: {allowed}")
limit = args.get("limit", 25)
try:
limit = int(limit)
except (TypeError, ValueError):
limit = 25
limit = max(1, min(100, limit))
return lookups.lookup_sanctions(query, schema=schema, limit=limit)
if name == "cve":
return lookups.lookup_cve(_require_str(args, "cve", "query", "value"))
if name == "mac":
return lookups.lookup_mac(_require_str(args, "mac", "query", "value"))
if name == "github":
return lookups.lookup_github(_require_str(args, "username", "user", "query", "value"))
if name == "leaks":
return lookups.lookup_leaks(_require_str(args, "email", "query", "value"))
if name == "sweep_init":
ip = _require_str(args, "ip", "query", "value")
cidr = args.get("cidr", 24)
try:
cidr = int(cidr)
except (TypeError, ValueError):
cidr = 24
return lookups.sweep_init(ip, cidr)
raise ValueError(f"Unhandled OSINT tool: {name}")
def run_osint_sweep(args: dict[str, Any]) -> dict[str, Any]:
"""Run subnet device discovery (Shodan InternetDB proxy). Requires full access tier."""
ip = _require_str(args, "ip", "query", "value")
cidr = args.get("cidr", 24)
try:
cidr = int(cidr)
except (TypeError, ValueError):
cidr = 24
subnet = lookups.subnet_start_for(ip, cidr)
scan = lookups.sweep_scan(subnet, cidr)
init = lookups.sweep_init(ip, cidr)
return {**init, **scan, "subnet": f"{subnet}/{cidr}"}
def run_entity_expand(args: dict[str, Any]) -> dict[str, Any]:
"""Expand an entity graph node (aircraft, vessel, IP, company, person, country)."""
entity_type = _require_str(args, "type", "entity_type")
entity_id = _require_str(args, "id", "entity_id", "query", "value")
props = {
"label": entity_id,
"registration": str(args.get("registration", "") or "").strip() or None,
"model": str(args.get("model", "") or "").strip() or None,
"icao24": str(args.get("icao24", "") or "").strip() or None,
}
props = {key: value for key, value in props.items() if value is not None}
return resolve_entity(entity_type, entity_id, props)
def osint_tool_help() -> dict[str, Any]:
"""Discovery metadata for agents."""
return {
"tools": sorted(_OSINT_TOOLS),
"entity_types": sorted(ALLOWED_TYPES),
"sanctions_schemas": sorted(_ENTITY_SCHEMAS),
"notes": {
"osint_lookup": "Passive lookups — same data as the Recon panel /api/osint/* routes.",
"osint_sweep": "Active subnet scan via Shodan InternetDB — requires full OpenClaw access tier.",
"entity_expand": "Build a relationship graph around aircraft, vessels, IPs, companies, people, or countries.",
},
}
+1
View File
@@ -0,0 +1 @@
"""Entity graph resolution (Osiris intel layer port)."""
+268
View File
@@ -0,0 +1,268 @@
"""Entity graph resolver (Python port of Osiris intel/server.js)."""
from __future__ import annotations
import logging
import re
import threading
import time
from typing import Any
from urllib.parse import quote
from services.network_utils import fetch_with_curl
from services.sanctions.ofac import match_exact, search_sanctions
logger = logging.getLogger(__name__)
ALLOWED_TYPES = frozenset({"aircraft", "vessel", "company", "person", "ip", "country"})
_WD_CACHE: dict[str, tuple[float, dict[str, Any]]] = {}
_WD_LOCK = threading.Lock()
_WD_TTL = 24 * 60 * 60
_WD_UA = "Shadowbroker-Intel/1.0 (ontology engine)"
def _dedup(nodes: list[dict], links: list[dict]) -> dict[str, Any]:
node_map: dict[str, dict] = {}
for n in nodes:
node_map[n["id"]] = n
seen_links: set[str] = set()
out_links: list[dict] = []
for link in links:
key = f"{link['source']}{link['target']}{link['label']}"
if key in seen_links:
continue
seen_links.add(key)
out_links.append(link)
return {"nodes": list(node_map.values()), "links": out_links}
def _wd_cache_get(key: str) -> dict[str, Any] | None:
with _WD_LOCK:
entry = _WD_CACHE.get(key)
if not entry:
return None
ts, data = entry
if time.time() - ts > _WD_TTL:
_WD_CACHE.pop(key, None)
return None
return data
def _wd_cache_set(key: str, data: dict[str, Any]) -> None:
with _WD_LOCK:
if len(_WD_CACHE) > 5000:
oldest = next(iter(_WD_CACHE))
_WD_CACHE.pop(oldest, None)
_WD_CACHE[key] = (time.time(), data)
def _add_sanctions(id_label: str, root_id: str, nodes: list, links: list) -> None:
for hit in search_sanctions(id_label, limit=3):
sid = f"sanction:{hit['id']}"
nodes.append(
{
"id": sid,
"label": hit["name"],
"type": "sanction",
"properties": {"programs": hit.get("programs"), "source": "OFAC SDN"},
}
)
links.append({"source": root_id, "target": sid, "label": "SANCTIONS MATCH"})
def _sparql(query: str) -> list[dict[str, Any]]:
url = f"https://query.wikidata.org/sparql?query={quote(query)}&format=json"
resp = fetch_with_curl(url, timeout=10, headers={"User-Agent": _WD_UA, "Accept": "application/sparql-results+json"})
if resp.status_code != 200:
return []
try:
data = resp.json()
except Exception:
return []
return data.get("results", {}).get("bindings", [])
def _wd_search(label: str) -> str | None:
url = (
"https://www.wikidata.org/w/api.php?action=wbsearchentities"
f"&search={quote(label)}&language=en&limit=1&format=json"
)
resp = fetch_with_curl(url, timeout=5, headers={"User-Agent": _WD_UA})
if resp.status_code != 200:
return None
try:
hits = resp.json().get("search") or []
except Exception:
return None
return hits[0]["id"] if hits else None
def _resolve_ip(id_value: str) -> dict[str, Any]:
cache_key = f"ip:{id_value}"
cached = _wd_cache_get(cache_key)
if cached:
return cached
root_id = f"ip:{id_value}"
nodes: list[dict] = [{"id": root_id, "label": id_value, "type": "ip", "properties": {}}]
links: list[dict] = []
geo = fetch_with_curl(
f"https://ip-api.com/json/{quote(id_value)}"
"?fields=status,country,countryCode,city,lat,lon,isp,org,as,asname,proxy,hosting,mobile",
timeout=8,
)
if geo.status_code == 200:
try:
data = geo.json()
except Exception:
data = {}
if data.get("status") == "success":
nodes[0]["properties"] = {
"proxy": bool(data.get("proxy")),
"hosting": bool(data.get("hosting")),
"mobile": bool(data.get("mobile")),
"source": "ip-api.com",
}
if data.get("isp"):
iid = f"company:{data['isp']}"
nodes.append({"id": iid, "label": data["isp"], "type": "company", "properties": {"role": "ISP"}})
links.append({"source": root_id, "target": iid, "label": "HOSTED_BY"})
if data.get("country"):
cid = f"country:{data['country']}"
nodes.append(
{
"id": cid,
"label": data["country"],
"type": "country",
"properties": {"code": data.get("countryCode")},
}
)
links.append({"source": root_id, "target": cid, "label": "LOCATED_IN"})
for val in (data.get("isp"), data.get("org"), data.get("asname")):
if val:
for entry in match_exact(val):
sid = f"sanction:{entry['id']}"
nodes.append({"id": sid, "label": entry["name"], "type": "sanction", "properties": {}})
links.append({"source": root_id, "target": sid, "label": "SANCTIONS MATCH"})
whois = fetch_with_curl(
f"https://stat.ripe.net/data/whois/data.json?resource={quote(id_value)}",
timeout=8,
)
if whois.status_code == 200:
try:
records = whois.json().get("data", {}).get("records") or []
except Exception:
records = []
for record in records:
for field in record:
if field.get("key") in ("netname", "NetName"):
nid = f"company:{field['value']}"
nodes.append({"id": nid, "label": field["value"], "type": "company", "properties": {"role": "Network"}})
links.append({"source": root_id, "target": nid, "label": "HOSTED_BY"})
result = _dedup(nodes, links)
_wd_cache_set(cache_key, result)
return result
def _resolve_company(id_value: str) -> dict[str, Any]:
cache_key = f"company:{id_value}"
cached = _wd_cache_get(cache_key)
if cached:
return cached
root_id = f"company:{id_value}"
nodes = [{"id": root_id, "label": id_value, "type": "company", "properties": {}}]
links: list[dict] = []
safe = re.sub(r'[^a-zA-Z0-9 \-._]', '', id_value).strip()
qid = _wd_search(safe)
filt = f"VALUES ?item {{ wd:{qid} }}" if qid else f'?item rdfs:label "{safe}"@en . ?item wdt:P31/wdt:P279* wd:Q4830453 .'
rows = _sparql(
f"""
SELECT ?countryLabel ?parentLabel ?ceoLabel WHERE {{
{filt}
OPTIONAL {{ ?item wdt:P17 ?country . }}
OPTIONAL {{ ?item wdt:P749 ?parent . }}
OPTIONAL {{ ?item wdt:P169 ?ceo . }}
SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}} LIMIT 10
"""
)
for row in rows:
if row.get("countryLabel", {}).get("value"):
cid = f"country:{row['countryLabel']['value']}"
nodes.append({"id": cid, "label": row["countryLabel"]["value"], "type": "country", "properties": {}})
links.append({"source": root_id, "target": cid, "label": "HEADQUARTERED"})
if row.get("parentLabel", {}).get("value"):
pid = f"company:{row['parentLabel']['value']}"
nodes.append({"id": pid, "label": row["parentLabel"]["value"], "type": "company", "properties": {}})
links.append({"source": root_id, "target": pid, "label": "PARENT ORG"})
if row.get("ceoLabel", {}).get("value"):
pid = f"person:{row['ceoLabel']['value']}"
nodes.append({"id": pid, "label": row["ceoLabel"]["value"], "type": "person", "properties": {"role": "CEO"}})
links.append({"source": root_id, "target": pid, "label": "CEO"})
_add_sanctions(id_value, root_id, nodes, links)
result = _dedup(nodes, links)
_wd_cache_set(cache_key, result)
return result
def _resolve_from_store(entity_type: str, id_value: str, props: dict[str, Any]) -> dict[str, Any]:
from services.fetchers._store import get_latest_data_subset_refs
root_id = f"{entity_type}:{id_value}"
nodes = [{"id": root_id, "label": props.get("label") or id_value, "type": entity_type, "properties": props}]
links: list[dict] = []
data = get_latest_data_subset_refs("flights", "ships", "military_flights", "tracked_flights")
if entity_type == "aircraft":
icao = (props.get("icao24") or id_value).lower()
for bucket in ("military_flights", "tracked_flights", "flights"):
for f in data.get(bucket) or []:
if str(f.get("icao24", "")).lower() == icao:
if f.get("country"):
cid = f"country:{f['country']}"
nodes.append({"id": cid, "label": f["country"], "type": "country", "properties": {}})
links.append({"source": root_id, "target": cid, "label": "REGISTERED_IN"})
if f.get("registration"):
nodes[0]["properties"]["registration"] = f["registration"]
break
elif entity_type == "vessel":
mmsi = str(props.get("mmsi") or id_value)
for ship in data.get("ships") or []:
if str(ship.get("mmsi")) == mmsi:
if ship.get("country"):
cid = f"country:{ship['country']}"
nodes.append({"id": cid, "label": ship["country"], "type": "country", "properties": {}})
links.append({"source": root_id, "target": cid, "label": "FLAG"})
break
_add_sanctions(id_value, root_id, nodes, links)
return _dedup(nodes, links)
def resolve_entity(entity_type: str, id_value: str, properties: dict[str, Any] | None = None) -> dict[str, Any]:
etype = (entity_type or "").lower().strip()
eid = (id_value or "").strip()
if etype not in ALLOWED_TYPES:
raise ValueError(f"Invalid type. Allowed: {', '.join(sorted(ALLOWED_TYPES))}")
if len(eid) < 2 or len(eid) > 200:
raise ValueError("Invalid id (2-200 chars)")
props = properties or {}
if etype == "ip":
return _resolve_ip(eid)
if etype in ("company", "person", "country"):
if etype == "company":
return _resolve_company(eid)
if etype == "person":
root_id = f"person:{eid}"
nodes = [{"id": root_id, "label": eid, "type": "person", "properties": {}}]
links: list[dict] = []
_add_sanctions(eid, root_id, nodes, links)
return _dedup(nodes, links)
root_id = f"country:{eid}"
nodes = [{"id": root_id, "label": eid, "type": "country", "properties": {}}]
links = []
_add_sanctions(eid, root_id, nodes, links)
return _dedup(nodes, links)
return _resolve_from_store(etype, eid, props)
@@ -0,0 +1,81 @@
"""Operator opt-in for Polymarket/Kalshi outbound fetches (Global Threat Intercept)."""
from __future__ import annotations
import json
import logging
import os
import threading
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
_OPT_IN_FILE = Path(__file__).resolve().parent.parent / "data" / "prediction_markets_opt_in.json"
_OPT_IN_LOCK = threading.Lock()
def _env_flag(name: str) -> str:
return str(os.getenv(name, "")).strip().lower()
def get_prediction_markets_ui_opt_in() -> bool:
if not _OPT_IN_FILE.exists():
return False
try:
payload = json.loads(_OPT_IN_FILE.read_text(encoding="utf-8"))
return bool(payload.get("opted_in"))
except (OSError, json.JSONDecodeError, TypeError) as exc:
logger.warning("Prediction markets opt-in file unreadable: %s", exc)
return False
def set_prediction_markets_ui_opt_in(opted_in: bool) -> None:
_OPT_IN_FILE.parent.mkdir(parents=True, exist_ok=True)
with _OPT_IN_LOCK:
_OPT_IN_FILE.write_text(
json.dumps({"opted_in": bool(opted_in)}, indent=2),
encoding="utf-8",
)
def prediction_markets_env_forced_on() -> bool:
return _env_flag("PREDICTION_MARKETS_ENABLED") in {"1", "true", "yes", "on"}
def prediction_markets_env_forced_off() -> bool:
return _env_flag("PREDICTION_MARKETS_ENABLED") in {"0", "false", "no", "off"}
def prediction_markets_fetch_enabled() -> bool:
"""True when UI opt-in or env enables Polymarket/Kalshi pulls."""
if get_prediction_markets_ui_opt_in():
return True
return prediction_markets_env_forced_on()
def prediction_markets_status() -> dict[str, Any]:
ui_opted_in = get_prediction_markets_ui_opt_in()
env_on = prediction_markets_env_forced_on()
env_off = prediction_markets_env_forced_off()
env_override = None
if env_on:
env_override = "on"
elif env_off:
env_override = "off"
return {
"enabled": prediction_markets_fetch_enabled(),
"ui_opted_in": ui_opted_in,
"env_override": env_override,
"jitter": {
"scheduler_interval_minutes": int(
os.environ.get("PREDICTION_MARKETS_INTERVAL_MINUTES", "7")
),
"scheduler_jitter_seconds": int(
os.environ.get("PREDICTION_MARKETS_SCHEDULER_JITTER_S", "240")
),
"pre_fetch_jitter_seconds": float(
os.environ.get("PREDICTION_MARKETS_PRE_FETCH_JITTER_S", "90")
),
},
}
+61 -20
View File
@@ -2,14 +2,34 @@ import requests
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
import logging import logging
from cachetools import cached, TTLCache from cachetools import cached, TTLCache
import cloudscraper
import reverse_geocoder as rg import reverse_geocoder as rg
from urllib.parse import urlparse from urllib.parse import urlparse
from services.network_utils import outbound_user_agent
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
_OPENMHZ_AUDIO_HOSTS = {"media.openmhz.com", "media2.openmhz.com", "media3.openmhz.com"} _OPENMHZ_AUDIO_HOSTS = {"media.openmhz.com", "media2.openmhz.com", "media3.openmhz.com"}
# Round 7a / Issues #289, #290, #291 (tg12 audit):
# We previously sent a spoofed Chrome User-Agent and (for OpenMHz) used
# cloudscraper to bypass anti-bot challenges. Both are dishonest and ToS-
# unfriendly. We now send the per-install Shadowbroker UA — the upstream
# can identify us, rate-limit us per install, and contact us if needed.
#
# If the upstream actively blocks our honest UA, the feature degrades
# gracefully (returns an empty list / cached results) rather than
# escalating to deception.
def _broadcastify_user_agent() -> str:
return outbound_user_agent("broadcastify")
def _openmhz_user_agent() -> str:
return outbound_user_agent("openmhz")
# Cache the top feeds for 5 minutes so we don't hammer Broadcastify # Cache the top feeds for 5 minutes so we don't hammer Broadcastify
radio_cache = TTLCache(maxsize=1, ttl=300) radio_cache = TTLCache(maxsize=1, ttl=300)
@@ -22,8 +42,12 @@ def get_top_broadcastify_feeds():
""" """
logger.info("Scraping Broadcastify Top Feeds (Cache Miss)") logger.info("Scraping Broadcastify Top Feeds (Cache Miss)")
headers = { headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", # Issue #289 (tg12) + Round 7a: identify ourselves honestly as a
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8", # per-install Shadowbroker scraper. Broadcastify can rate-limit
# us per install or block us; either way we stop pretending to be
# a browser. If they block, the panel degrades gracefully.
"User-Agent": _broadcastify_user_agent(),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9", "Accept-Language": "en-US,en;q=0.9",
} }
@@ -89,21 +113,32 @@ openmhz_systems_cache = TTLCache(maxsize=1, ttl=3600)
@cached(openmhz_systems_cache) @cached(openmhz_systems_cache)
def get_openmhz_systems(): def get_openmhz_systems():
"""Fetches the full directory of OpenMHZ systems.""" """Fetches the full directory of OpenMHZ systems.
logger.info("Scraping OpenMHZ Systems (Cache Miss)")
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
Issue #290 (tg12) + Round 7a: replaced cloudscraper-based Chrome
impersonation with an honest per-install Shadowbroker User-Agent.
If OpenMHz's Cloudflare layer blocks honest traffic, we accept
that degradation (return empty list) rather than spoof a browser.
"""
logger.info("Fetching OpenMHZ Systems (Cache Miss)")
try: try:
res = scraper.get("https://api.openmhz.com/systems", timeout=15) res = requests.get(
"https://api.openmhz.com/systems",
timeout=15,
headers={"User-Agent": _openmhz_user_agent(), "Accept": "application/json"},
)
if res.status_code == 200: if res.status_code == 200:
data = res.json() data = res.json()
# Return list of systems
return data.get("systems", []) if isinstance(data, dict) else [] return data.get("systems", []) if isinstance(data, dict) else []
if res.status_code in (403, 503):
logger.warning(
"OpenMHZ returned %s for systems directory — Cloudflare may "
"be blocking our honest UA. Feature degrades to empty result.",
res.status_code,
)
return [] return []
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e: except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"OpenMHZ Systems Scrape Exception: {e}") logger.error(f"OpenMHZ Systems Fetch Exception: {e}")
return [] return []
@@ -113,21 +148,25 @@ openmhz_calls_cache = TTLCache(maxsize=100, ttl=20)
@cached(openmhz_calls_cache) @cached(openmhz_calls_cache)
def get_recent_openmhz_calls(sys_name: str): def get_recent_openmhz_calls(sys_name: str):
"""Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata').""" """Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata').
logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
Issue #290 (tg12) + Round 7a: same honest-UA model as
``get_openmhz_systems``.
"""
logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
try: try:
url = f"https://api.openmhz.com/{sys_name}/calls" url = f"https://api.openmhz.com/{sys_name}/calls"
res = scraper.get(url, timeout=15) res = requests.get(
url,
timeout=15,
headers={"User-Agent": _openmhz_user_agent(), "Accept": "application/json"},
)
if res.status_code == 200: if res.status_code == 200:
data = res.json() data = res.json()
return data.get("calls", []) if isinstance(data, dict) else [] return data.get("calls", []) if isinstance(data, dict) else []
return [] return []
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e: except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"OpenMHZ Calls Scrape Exception ({sys_name}): {e}") logger.error(f"OpenMHZ Calls Fetch Exception ({sys_name}): {e}")
return [] return []
@@ -163,9 +202,11 @@ def openmhz_audio_response(target_url: str):
timeout=(5, 20), timeout=(5, 20),
allow_redirects=False, allow_redirects=False,
headers={ headers={
"User-Agent": "Mozilla/5.0", # Issue #291 (tg12) + Round 7a: drop spoofed Mozilla
# UA and the fake first-party Referer. Identify as
# the per-install Shadowbroker proxy honestly.
"User-Agent": _openmhz_user_agent(),
"Accept": "audio/mpeg,audio/*,*/*;q=0.8", "Accept": "audio/mpeg,audio/*,*/*;q=0.8",
"Referer": "https://openmhz.com/",
}, },
) )
if upstream.is_redirect or upstream.status_code in (301, 302, 303, 307, 308): if upstream.is_redirect or upstream.status_code in (301, 302, 303, 307, 308):
+70 -6
View File
@@ -4,7 +4,7 @@ import concurrent.futures
from urllib.parse import quote from urllib.parse import quote
import requests as _requests import requests as _requests
from cachetools import TTLCache from cachetools import TTLCache
from services.network_utils import fetch_with_curl from services.network_utils import fetch_with_curl, outbound_user_agent
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -15,6 +15,31 @@ dossier_cache = TTLCache(maxsize=500, ttl=86400)
# Nominatim requires max 1 req/sec — track last call time # Nominatim requires max 1 req/sec — track last call time
_nominatim_last_call = 0.0 _nominatim_last_call = 0.0
# Issues #218 / #219 (tg12): Wikimedia's User-Agent policy requires API
# clients to identify themselves with a stable User-Agent that includes
# a contact path.
#
# Round 7a: the original fix in PR #284 used a single project-wide
# identifier, which from Wikimedia's perspective made every Shadowbroker
# install in the world look like one giant scraper. If one install
# misbehaved, their only recourse was to block "Shadowbroker" as a
# whole. We now build the headers from ``outbound_user_agent('wikimedia')``
# which embeds the per-install operator handle (auto-generated or
# operator-chosen), so Wikimedia can rate-limit / contact the specific
# install instead of the project.
def _wikimedia_request_headers() -> dict[str, str]:
ua = outbound_user_agent("wikimedia")
return {
"User-Agent": ua,
# Browser-JS-style header that Wikimedia's policy explicitly
# accepts on top of (or instead of) User-Agent. We send both so
# whichever the upstream prefers, the per-operator handle is
# always available.
"Api-User-Agent": ua,
}
def _reverse_geocode_offline(lat: float, lng: float) -> dict: def _reverse_geocode_offline(lat: float, lng: float) -> dict:
"""Offline fallback via reverse_geocoder when external reverse geocoding is blocked.""" """Offline fallback via reverse_geocoder when external reverse geocoding is blocked."""
@@ -45,9 +70,7 @@ def _reverse_geocode(lat: float, lng: float) -> dict:
f"https://nominatim.openstreetmap.org/reverse?" f"https://nominatim.openstreetmap.org/reverse?"
f"lat={lat}&lon={lng}&format=json&zoom=10&addressdetails=1&accept-language=en" f"lat={lat}&lon={lng}&format=json&zoom=10&addressdetails=1&accept-language=en"
) )
headers = { headers = {"User-Agent": outbound_user_agent("nominatim")}
"User-Agent": "ShadowBroker-OSINT/1.0 (live-risk-dashboard; contact@shadowbroker.app)"
}
for attempt in range(2): for attempt in range(2):
# Enforce Nominatim's 1 req/sec policy # Enforce Nominatim's 1 req/sec policy
@@ -121,7 +144,13 @@ def _fetch_wikidata_leader(country_name: str) -> dict:
""" """
url = f"https://query.wikidata.org/sparql?query={quote(sparql)}&format=json" url = f"https://query.wikidata.org/sparql?query={quote(sparql)}&format=json"
try: try:
res = fetch_with_curl(url, timeout=6) # Issue #218 (tg12): Wikimedia's User-Agent policy requires
# outbound API traffic to be identifiable. fetch_with_curl()
# sends the project default, and we also add the Wikimedia-
# specific Api-User-Agent that the policy specifically asks
# for, since this request originates from a backend service
# that proxies on behalf of (potentially many) browser users.
res = fetch_with_curl(url, timeout=6, headers=_wikimedia_request_headers())
if res.status_code == 200: if res.status_code == 200:
results = res.json().get("results", {}).get("bindings", []) results = res.json().get("results", {}).get("bindings", [])
if results: if results:
@@ -147,7 +176,9 @@ def _fetch_local_wiki_summary(place_name: str, country_name: str = "") -> dict:
slug = quote(name.replace(" ", "_")) slug = quote(name.replace(" ", "_"))
url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{slug}" url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{slug}"
try: try:
res = fetch_with_curl(url, timeout=5) # Issue #219 (tg12): identify ourselves to Wikimedia per
# their UA policy; see _fetch_wikidata_leader above.
res = fetch_with_curl(url, timeout=5, headers=_wikimedia_request_headers())
if res.status_code == 200: if res.status_code == 200:
data = res.json() data = res.json()
if data.get("type") != "disambiguation": if data.get("type") != "disambiguation":
@@ -270,3 +301,36 @@ def get_region_dossier(lat: float, lng: float) -> dict:
dossier_cache[cache_key] = result dossier_cache[cache_key] = result
return result return result
def fetch_wikipedia_page_summary(title: str) -> dict | None:
"""Wikipedia REST summary for a page title (backend-proxied for #360)."""
trimmed = (title or "").strip()
if not trimmed:
return None
data = _fetch_local_wiki_summary(trimmed, "")
if not data.get("extract") and not data.get("description"):
return None
return {
"title": trimmed,
"description": data.get("description", ""),
"extract": data.get("extract", ""),
"thumbnail": data.get("thumbnail", ""),
"type": "standard",
}
def fetch_wikidata_sparql_bindings(sparql: str) -> list:
"""Run a Wikidata SPARQL query; returns bindings list (empty on failure)."""
trimmed = (sparql or "").strip()
if not trimmed:
return []
url = f"https://query.wikidata.org/sparql?query={quote(trimmed)}&format=json"
try:
res = fetch_with_curl(url, timeout=8, headers=_wikimedia_request_headers())
if res.status_code == 200:
bindings = res.json().get("results", {}).get("bindings", [])
return bindings if isinstance(bindings, list) else []
except (ConnectionError, TimeoutError, ValueError, KeyError, OSError) as e:
logger.warning("Wikidata SPARQL failed: %s", e)
return []
@@ -0,0 +1,5 @@
"""Sentinel-2 road corridor freight trend analysis (DrishX engine port)."""
from .config import optional_deps_available, road_corridor_sat_enabled
__all__ = ["optional_deps_available", "road_corridor_sat_enabled"]
@@ -0,0 +1,4 @@
from .cli import main
if __name__ == "__main__":
raise SystemExit(main())
+53
View File
@@ -0,0 +1,53 @@
"""CLI for manual road corridor analysis runs."""
from __future__ import annotations
import argparse
import logging
import sys
from .config import optional_deps_available, road_corridor_sat_enabled
from .credentials import sentinel_credentials_configured
from .pipeline import analyze_preset
from .presets import CORRIDOR_PRESETS
def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser(description="Run Sentinel-2 road corridor truck trend analysis")
parser.add_argument("--preset", required=True, help="Preset id (e.g. laredo_i35)")
parser.add_argument("-v", "--verbose", action="store_true")
args = parser.parse_args(argv)
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
if not optional_deps_available():
print(
"Install optional deps: uv sync --extra road-corridor "
"(geopandas, osmnx, rasterio, sentinelhub, scikit-learn, imageio)",
file=sys.stderr,
)
return 2
if not road_corridor_sat_enabled() and not args.verbose:
print("Note: ROAD_CORRIDOR_SAT_ENABLED is off — CLI still runs for manual analysis.")
if not sentinel_credentials_configured():
print("Set SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET first.", file=sys.stderr)
return 2
valid = {p["id"] for p in CORRIDOR_PRESETS}
if args.preset not in valid:
print(f"Unknown preset {args.preset!r}. Choose from: {', '.join(sorted(valid))}", file=sys.stderr)
return 2
def progress(msg: str, pct: int | None = None) -> None:
suffix = f" ({pct}%)" if pct is not None else ""
print(f"{msg}{suffix}")
result = analyze_preset(args.preset, progress_cb=progress)
print(
f"Done: {result.get('total_detections', 0)} detections across "
f"{len(result.get('daily_counts') or [])} days — status={result.get('status')}"
)
return 0 if result.get("status") == "ok" else 1
if __name__ == "__main__":
raise SystemExit(main())
@@ -0,0 +1,41 @@
"""Configuration for Sentinel-2 road corridor trend analysis."""
from __future__ import annotations
import os
from pathlib import Path
_BACKEND_ROOT = Path(__file__).resolve().parents[2]
DATA_ROOT = Path(os.environ.get("ROAD_CORRIDOR_DATA_DIR", str(_BACKEND_ROOT / "data" / "road_corridors")))
CACHE_DIR = DATA_ROOT / "cache"
DETECTION_CROP_DIR = DATA_ROOT / "detection_crops"
STATE_PATH = DATA_ROOT / "_refresh_state.json"
DEFAULT_MONTHS = int(os.environ.get("ROAD_CORRIDOR_MONTHS", "2"))
DEFAULT_MAX_FRAMES = int(os.environ.get("ROAD_CORRIDOR_MAX_FRAMES", "6"))
SCHEDULED_PRESET_IDS = [
s.strip()
for s in os.environ.get("ROAD_CORRIDOR_SCHEDULED_PRESETS", "laredo_i35").split(",")
if s.strip()
]
def road_corridor_sat_enabled() -> bool:
return os.environ.get("ROAD_CORRIDOR_SAT_ENABLED", "").strip().lower() in {
"1",
"true",
"yes",
"on",
}
def optional_deps_available() -> bool:
try:
import geopandas # noqa: F401
import osmnx # noqa: F401
import rasterio # noqa: F401
import sentinelhub # noqa: F401
import sklearn # noqa: F401
return True
except ImportError:
return False
@@ -0,0 +1,37 @@
"""Reuse Shadowbroker Sentinel Hub / Copernicus CDSE credentials."""
from __future__ import annotations
import os
from .config import CACHE_DIR
def resolve_sentinel_credentials() -> tuple[str, str]:
client_id = (os.environ.get("SENTINEL_CLIENT_ID") or "").strip()
client_secret = (os.environ.get("SENTINEL_CLIENT_SECRET") or "").strip()
return client_id, client_secret
def sentinel_credentials_configured() -> bool:
client_id, client_secret = resolve_sentinel_credentials()
return bool(client_id and client_secret)
def build_sh_config():
from sentinelhub import SHConfig
client_id, client_secret = resolve_sentinel_credentials()
if not client_id or not client_secret:
raise RuntimeError(
"SENTINEL_CLIENT_ID and SENTINEL_CLIENT_SECRET are required for road corridor analysis"
)
config = SHConfig()
config.sh_client_id = client_id
config.sh_client_secret = client_secret
config.sh_base_url = "https://sh.dataspace.copernicus.eu"
config.sh_token_url = (
"https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
)
CACHE_DIR.mkdir(parents=True, exist_ok=True)
config.cache_dir = str(CACHE_DIR / "sentinelhub")
return config
+149
View File
@@ -0,0 +1,149 @@
"""In-memory job queue for on-demand Analyze Here runs."""
from __future__ import annotations
import logging
import threading
import uuid
from dataclasses import dataclass
from typing import Any
logger = logging.getLogger(__name__)
_lock = threading.Lock()
_jobs: dict[str, AnalyzeJob] = {}
@dataclass
class AnalyzeJob:
job_id: str
lat: float
lon: float
status: str = "queued"
message: str = "Queued"
progress: int = 0
result: dict[str, Any] | None = None
error: str | None = None
def get_job(job_id: str) -> AnalyzeJob | None:
with _lock:
return _jobs.get(job_id)
def get_latest_job() -> AnalyzeJob | None:
with _lock:
if not _jobs:
return None
return max(_jobs.values(), key=lambda j: j.job_id)
def _running_job() -> AnalyzeJob | None:
with _lock:
for job in _jobs.values():
if job.status in {"queued", "running"}:
return job
return None
def _prune_jobs(max_keep: int = 8) -> None:
with _lock:
if len(_jobs) <= max_keep:
return
ordered = sorted(_jobs.items(), key=lambda item: item[0], reverse=True)
for job_id, _ in ordered[max_keep:]:
_jobs.pop(job_id, None)
def _worker(job_id: str, lat: float, lon: float, label: str | None) -> None:
from services.fetchers.road_corridor_sat import refresh_road_corridor_store
from .pipeline import analyze_corridor
from .viewport import adhoc_preset_id, bbox_around_point, default_label_for_point
job = get_job(job_id)
if job is None:
return
def progress(msg: str, pct: int | None = None) -> None:
with _lock:
current = _jobs.get(job_id)
if current is None:
return
current.message = msg
if pct is not None:
current.progress = pct
with _lock:
job.status = "running"
job.message = "Starting road corridor analysis"
job.progress = 0
try:
bbox = bbox_around_point(lat, lon)
preset_id = adhoc_preset_id(lat, lon)
corridor_label = label or default_label_for_point(lat, lon)
result = analyze_corridor(
preset_id=preset_id,
label=corridor_label,
bbox=bbox,
country="adhoc",
category="viewport",
progress_cb=progress,
)
refresh_road_corridor_store()
with _lock:
current = _jobs.get(job_id)
if current is None:
return
current.status = "ok" if result.get("status") == "ok" else "error"
current.result = result
current.error = result.get("error")
current.message = (
f"{result.get('total_detections', 0)} signatures · "
f"{len(result.get('daily_counts') or [])} days"
)
current.progress = 100
except Exception as exc:
logger.exception("road corridor analyze job %s failed", job_id)
with _lock:
current = _jobs.get(job_id)
if current is None:
return
current.status = "error"
current.error = str(exc)
current.message = "Analysis failed"
current.progress = 100
def enqueue_analyze(lat: float, lon: float, label: str | None = None) -> AnalyzeJob:
running = _running_job()
if running is not None:
raise RuntimeError("analysis_already_running")
job_id = uuid.uuid4().hex[:12]
job = AnalyzeJob(job_id=job_id, lat=lat, lon=lon)
with _lock:
_jobs[job_id] = job
_prune_jobs()
thread = threading.Thread(
target=_worker,
args=(job_id, lat, lon, label),
name=f"road-corridor-analyze-{job_id}",
daemon=True,
)
thread.start()
return job
def job_to_dict(job: AnalyzeJob) -> dict[str, Any]:
return {
"job_id": job.job_id,
"lat": job.lat,
"lon": job.lon,
"status": job.status,
"message": job.message,
"progress": job.progress,
"result": job.result,
"error": job.error,
}
@@ -0,0 +1,216 @@
"""Run Sentinel-2 road-corridor truck trend analysis for a bbox preset."""
from __future__ import annotations
import logging
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta
from typing import Any
from .config import CACHE_DIR, DEFAULT_MAX_FRAMES, DEFAULT_MONTHS, DETECTION_CROP_DIR
from .storage import store_analysis_result
logger = logging.getLogger(__name__)
ProgressCb = Callable[[str, int | None], None]
_EVALSCRIPT = """//VERSION=3
function setup() {
return {
input: ["B02", "B03", "B04", "B08", "CLM"],
output: { id: "default", bands: 5, sampleType: "FLOAT32" }
};
}
function evaluatePixel(s) {
return [s.B04, s.B03, s.B02, s.B08, s.CLM];
}"""
def _noop_progress(_msg: str, _pct: int | None = None) -> None:
return None
def analyze_corridor(
*,
preset_id: str,
label: str,
bbox: list[float],
country: str = "",
category: str = "",
months: int = DEFAULT_MONTHS,
max_frames: int = DEFAULT_MAX_FRAMES,
progress_cb: ProgressCb | None = None,
) -> dict[str, Any]:
"""Synchronously analyze one corridor bbox and persist daily truck-count trends."""
from rasterio import features as rio_features
from rasterio import transform as rio_transform
from sentinelhub import BBox, CRS, DataCollection, MimeType, SentinelHubCatalog, SentinelHubRequest
from .credentials import build_sh_config
from .s2_truck_detect import S2TruckEngine
progress = progress_cb or _noop_progress
min_lat, min_lon, max_lat, max_lon = bbox
if abs(max_lat - min_lat) > 0.5 or abs(max_lon - min_lon) > 0.5:
raise ValueError("AOI too large. Max strategic sector is ~55 km x 55 km.")
CACHE_DIR.mkdir(parents=True, exist_ok=True)
engine = S2TruckEngine(
cache_dir=str(CACHE_DIR),
detection_dir=str(DETECTION_CROP_DIR),
)
config = build_sh_config()
progress(f"Road discovery for {label}", 10)
roads = engine.fetch_roads(bbox)
if roads.empty:
return store_analysis_result(
preset_id,
label=label,
bbox=bbox,
country=country,
category=category,
road_count=0,
frame_count=0,
detections=[],
status="error",
error="No major roads found in AOI.",
)
progress(f"Found {len(roads)} road segments — querying Copernicus catalog", 25)
sh_bbox = BBox(bbox=[min_lon, min_lat, max_lon, max_lat], crs=CRS.WGS84)
catalog = SentinelHubCatalog(config=config)
end_date = datetime.utcnow()
start_date = end_date - timedelta(days=max(1, months) * 30)
cdse_collection = DataCollection.SENTINEL2_L2A.define_from(
"s2l2a",
service_url=config.sh_base_url,
)
search_results = list(
catalog.search(
cdse_collection,
bbox=sh_bbox,
datetime=(
f"{start_date.strftime('%Y-%m-%dT00:00:00Z')}/"
f"{end_date.strftime('%Y-%m-%dT23:59:59Z')}"
),
filter="eo:cloud_cover < 60",
fields={"include": ["properties.datetime", "id"], "exclude": []},
)
)
unique_scenes: dict[str, Any] = {}
for res in search_results:
date_key = res["properties"]["datetime"][:10]
if date_key not in unique_scenes:
unique_scenes[date_key] = res
final_obs = [unique_scenes[d] for d in sorted(unique_scenes.keys(), reverse=True)]
final_obs = final_obs[: max(1, max_frames)]
if not final_obs:
return store_analysis_result(
preset_id,
label=label,
bbox=bbox,
country=country,
category=category,
road_count=len(roads),
frame_count=0,
detections=[],
status="error",
error=f"No clear imagery found in the last {months} months.",
)
def _fetch_frame(idx: int, res_obs: dict[str, Any]):
try:
date_str = res_obs["properties"]["datetime"]
req_sh = SentinelHubRequest(
evalscript=_EVALSCRIPT,
input_data=[
SentinelHubRequest.input_data(
data_collection=cdse_collection,
time_interval=(date_str, date_str),
)
],
responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
bbox=sh_bbox,
config=config,
)
data_list = req_sh.get_data()
if not data_list:
return idx, date_str, None
return idx, date_str, data_list[0]
except Exception as exc:
logger.error("Sentinel frame %s failed: %s", idx, exc)
return idx, None, None
progress(f"Seed frame 1/{len(final_obs)}", 35)
_, seed_ts, seed_data = _fetch_frame(0, final_obs[0])
if seed_data is None:
return store_analysis_result(
preset_id,
label=label,
bbox=bbox,
country=country,
category=category,
road_count=len(roads),
frame_count=0,
detections=[],
status="error",
error="Failed to acquire seed spectral data.",
)
roads_buf = roads.to_crs(epsg=3857).buffer(20).to_crs(epsg=4326)
h, w = seed_data.shape[:2]
trans = rio_transform.from_bounds(min_lon, min_lat, max_lon, max_lat, w, h)
road_mask = rio_features.rasterize(
[(geom.__geo_interface__, 1) for geom in roads_buf.geometry],
out_shape=(h, w),
transform=trans,
fill=0,
all_touched=True,
)
detections: list[dict[str, Any]] = []
detections.extend(engine.detect_trucks(seed_data, bbox, final_obs[0]["properties"]["datetime"], road_mask))
if len(final_obs) > 1:
progress(f"Parallel frames ({len(final_obs) - 1} remaining)", 45)
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="road-corridor") as executor:
futures = {
executor.submit(_fetch_frame, i, final_obs[i]): i for i in range(1, len(final_obs))
}
done = 1
for future in as_completed(futures):
idx, date_str, frame_data = future.result()
done += 1
if frame_data is not None and date_str:
detections.extend(engine.detect_trucks(frame_data, bbox, date_str, road_mask))
progress(f"Frame {done}/{len(final_obs)}", 45 + int((done / len(final_obs)) * 50))
progress(f"Complete — {len(detections)} truck signatures", 100)
return store_analysis_result(
preset_id,
label=label,
bbox=bbox,
country=country,
category=category,
road_count=len(roads),
frame_count=len(final_obs),
detections=detections,
status="ok",
)
def analyze_preset(preset_id: str, progress_cb: ProgressCb | None = None) -> dict[str, Any]:
from .presets import get_preset
preset = get_preset(preset_id)
if preset is None:
raise KeyError(f"Unknown preset: {preset_id}")
return analyze_corridor(
preset_id=preset["id"],
label=preset["label"],
bbox=preset["bbox"],
country=preset["country"],
category=preset["category"],
progress_cb=progress_cb,
)
@@ -0,0 +1,59 @@
"""Preset freight / chokepoint corridors for scheduled trend analysis."""
from __future__ import annotations
from typing import TypedDict
class CorridorPreset(TypedDict):
id: str
label: str
bbox: list[float] # [min_lat, min_lon, max_lat, max_lon]
country: str
category: str
# Bboxes are small (~510 km) highway segments suitable for 10 m Sentinel-2 analysis.
CORRIDOR_PRESETS: list[CorridorPreset] = [
{
"id": "laredo_i35",
"label": "Laredo I-35 (USMexico freight)",
"bbox": [27.48, -99.58, 27.54, -99.48],
"country": "USA / Mexico",
"category": "border_crossing",
},
{
"id": "bandar_abbas_feeder",
"label": "Bandar Abbas port feeder (Highway 71)",
"bbox": [27.12, 56.22, 27.22, 56.38],
"country": "Iran",
"category": "port_feeder",
},
{
"id": "rotterdam_a15",
"label": "Rotterdam A15 port feeder",
"bbox": [51.88, 4.42, 51.96, 4.58],
"country": "Netherlands",
"category": "port_feeder",
},
{
"id": "mombasa_nairobi_a109",
"label": "MombasaNairobi A109 corridor",
"bbox": [-4.10, 39.55, -1.20, 37.00],
"country": "Kenya",
"category": "trade_corridor",
},
{
"id": "braunschweig_a7",
"label": "Braunschweig A7 (validation)",
"bbox": [52.25, 10.45, 52.32, 10.55],
"country": "Germany",
"category": "validation",
},
]
def get_preset(preset_id: str) -> CorridorPreset | None:
for preset in CORRIDOR_PRESETS:
if preset["id"] == preset_id:
return preset
return None
@@ -0,0 +1,731 @@
"""S2 truck motion detection core (DrishX / Fisser et al. 2022 — see third_party/drishx/NOTICE.md)."""
from __future__ import annotations
import logging
import os
import pickle
import time
from pathlib import Path
import imageio.v3 as imageio
import numpy as np
import requests
from requests.adapters import HTTPAdapter
from shapely.geometry import LineString
from urllib3.util.retry import Retry
logger = logging.getLogger(__name__)
SECONDS_OFFSET_B02_B04 = 1.01
OVERPASS_MIRRORS = [
"https://lz4.overpass-api.de/api/interpreter",
"https://z.overpass-api.de/api/interpreter",
"https://overpass.osm.ch/api/interpreter",
"https://overpass-api.de/api/interpreter",
]
_session = requests.Session()
_retry = Retry(
total=2,
backoff_factor=1.0,
status_forcelist=[429, 500, 502, 503, 504],
allowed_methods=["GET", "POST"],
)
_adapter = HTTPAdapter(max_retries=_retry)
_session.mount("http://", _adapter)
_session.mount("https://", _adapter)
def _configure_osmnx(data_dir: str) -> None:
import osmnx as ox
ox.settings.requests_session = _session
ox.settings.requests_timeout = 30
ox.settings.overpass_rate_limit = False
ox.settings.max_query_area_size = 1_000_000_000_000
ox.settings.log_console = False
ox.settings.use_cache = True
ox.settings.cache_folder = os.path.join(data_dir, "osm_cache")
def _default_rf_model_path() -> str:
return str(Path(__file__).resolve().parents[2] / "data" / "drishx" / "rf_model.pickle")
# ─────────────────────────────────────────────────────────────────────────────
# Helper math — mirrors S2TD.array_utils.math
# ─────────────────────────────────────────────────────────────────────────────
def normalized_ratio(a, b):
"""(a - b) / (a + b), safe division."""
denom = a + b
with np.errstate(divide="ignore", invalid="ignore"):
result = np.where(denom != 0, (a - b) / denom, 0.0)
return result.astype(np.float32)
def rescale_s2(bands):
"""Rescale Sentinel-2 L2A reflectance values (typically 010000 int) to 01 float."""
bands = bands.astype(np.float32)
if np.nanmax(bands) > 10: # likely DN scale
bands /= 10000.0
return bands
# ─────────────────────────────────────────────────────────────────────────────
# Array subset — exact replica of S2TD.pick_arr_subset
# ─────────────────────────────────────────────────────────────────────────────
def pick_arr_subset(arr, y, x, size):
"""Pick a size×size window centred on (y, x) from a 2D or 3D array."""
size_low = size // 2
size_up = size // 2
if size_low + size_up < size:
size_up += 1
ymin = max(0, y - size_low)
ymax = max(0, y + size_up)
xmin = max(0, x - size_low)
xmax = max(0, x + size_up)
if arr.ndim == 2:
return arr[ymin:ymax, xmin:xmax]
elif arr.ndim == 3:
return arr[:, ymin:ymax, xmin:xmax]
return arr
# ─────────────────────────────────────────────────────────────────────────────
# Feature stack — exact 7 features as in S2TD._build_feature_stack (Table 1)
# ─────────────────────────────────────────────────────────────────────────────
def build_feature_stack(data):
"""
Build the 7-feature stack from Sentinel-2 bands.
Input `data` shape: (H, W, 5) with channels [B04(R), B03(G), B02(B), B08(NIR), CLM].
Feature order (Table 1, Fisser et al. 2022):
0: variance of (B04, B03, B02)
1: normalized_ratio(B04, B02) red / blue
2: normalized_ratio(B03, B02) green / blue
3: B04 - mean(B04)
4: B03 - mean(B03)
5: B02 - mean(B02)
6: B08 - mean(B08)
"""
R = data[:, :, 0].astype(np.float32) # B04
G = data[:, :, 1].astype(np.float32) # B03
B = data[:, :, 2].astype(np.float32) # B02
NIR = data[:, :, 3].astype(np.float32) # B08
CLM = data[:, :, 4]
# Rescale if needed
bands = np.stack([R, G, B, NIR], axis=0)
bands = rescale_s2(bands)
R, G, B, NIR = bands[0], bands[1], bands[2], bands[3]
# Cloud mask → NaN
cloud = CLM > 0
R[cloud] = np.nan
G[cloud] = np.nan
B[cloud] = np.nan
NIR[cloud] = np.nan
H, W = R.shape
fs = np.zeros((7, H, W), dtype=np.float32)
# Check for any valid data to avoid "Mean of empty slice" warnings
if np.any(~cloud):
import warnings
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=RuntimeWarning)
# Feature 0: variance of visible bands
fs[0] = np.nanvar(np.stack([R, G, B], axis=0), axis=0, ddof=0)
# Features 12: normalized ratios
fs[1] = normalized_ratio(R, B)
fs[2] = normalized_ratio(G, B)
# Features 36: mean-centered bands
fs[3] = R - np.nanmean(R)
fs[4] = G - np.nanmean(G)
fs[5] = B - np.nanmean(B)
fs[6] = NIR - np.nanmean(NIR)
else:
# All pixels are cloud-masked
fs.fill(np.nan)
# Ensure NaN consistency
nan_mask = np.isnan(fs[3])
fs[:, nan_mask] = np.nan
return {
"feature_stack": fs,
"bands": {"R": R, "G": G, "B": B, "NIR": NIR},
"cloud_mask": cloud,
}
# ─────────────────────────────────────────────────────────────────────────────
# RF Model loading
# ─────────────────────────────────────────────────────────────────────────────
# Path to the trained Random Forest model from S2TruckDetect
RF_MODEL_PATH = _default_rf_model_path()
_rf_model = None
def load_rf_model(path=None):
"""Load the trained RF model from pickle. Returns None if not found."""
global _rf_model
p = path or RF_MODEL_PATH
if _rf_model is not None:
return _rf_model
if os.path.isfile(p):
try:
_rf_model = pickle.load(open(p, "rb"))
logger.info(f"Loaded trained RF model from {p}")
return _rf_model
except Exception as e:
logger.error(f"Failed to load RF model from {p}: {e}")
else:
logger.warning(f"RF model not found at {p} — will use proxy classifier (lower accuracy)")
return None
# ─────────────────────────────────────────────────────────────────────────────
# Classification — real RF (preferred) or proxy fallback
# ─────────────────────────────────────────────────────────────────────────────
def rf_classify(feature_stack, road_mask, rf_model):
"""
Classify pixels using the trained Random Forest model.
Exact replica of S2TD._predict + _postprocess_prediction.
:param feature_stack: (7, H, W) feature array
:param road_mask: (H, W) binary road mask
:param rf_model: trained sklearn RandomForestClassifier
:return: (probabilities (4, H, W), prediction (H, W) int8)
"""
H, W = feature_stack.shape[1], feature_stack.shape[2]
# Reshape to (n_pixels, 7) for sklearn
vars_reshaped = []
for band_idx in range(feature_stack.shape[0]):
vars_reshaped.append(feature_stack[band_idx].flatten())
vars_reshaped = np.array(vars_reshaped).swapaxes(0, 1) # (n_pixels, 7)
# Build NaN mask — exclude NaN and Inf pixels
nan_mask_flat = np.zeros_like(vars_reshaped)
for var_idx in range(vars_reshaped.shape[1]):
nan_mask_flat[:, var_idx] = ~np.isnan(vars_reshaped[:, var_idx])
not_nan = (np.nanmin(nan_mask_flat, axis=1).astype(bool)
& np.min(np.isfinite(vars_reshaped), axis=1).astype(bool))
# Run RF predict_proba on valid pixels only
if not np.any(not_nan):
# Graceful return if no valid pixels found (e.g., all cloud masked)
probabilities_shaped = np.zeros((4, H, W), dtype=np.float32)
classification = np.zeros((H, W), dtype=np.int8)
return probabilities_shaped, classification
predictions_flat = rf_model.predict_proba(vars_reshaped[not_nan])
# Map probabilities back to spatial grid
n_classes = predictions_flat.shape[1]
probabilities_shaped = np.zeros((n_classes, H * W), dtype=np.float32)
for idx in range(n_classes):
probabilities_shaped[idx, not_nan] = predictions_flat[:, idx]
probabilities_shaped = probabilities_shaped.reshape((n_classes, H, W))
# Zero out NaN positions
nan_2d = np.isnan(feature_stack[0])
probabilities_shaped[:, nan_2d] = 0
# Post-process: suppress low-confidence background (exact S2TD logic)
probabilities_shaped[1][probabilities_shaped[1] < 0.75] = 0
classification = np.nanargmax(probabilities_shaped, axis=0).astype(np.int8) + 1
classification[np.max(probabilities_shaped, axis=0) == 0] = 0
classification[nan_2d] = 0
# Apply road mask
rm = road_mask.astype(bool)
classification[~rm] = 0
return probabilities_shaped, classification
def proxy_classify(feature_stack, road_mask):
"""
Heuristic proxy when RF model is unavailable. Lower accuracy.
Produces:
probabilities: (4, H, W) class probs for [background, blue, green, red]
prediction: (H, W) int8 labels {0=nan, 1=background, 2=blue, 3=green, 4=red}
"""
fs = feature_stack # (7, H, W)
H, W = fs.shape[1], fs.shape[2]
probs = np.zeros((4, H, W), dtype=np.float32)
centered_R = fs[3]
centered_G = fs[4]
centered_B = fs[5]
var_feat = fs[0]
nratio_rb = fs[1]
nratio_gb = fs[2]
rm = road_mask.astype(bool)
nan_mask = np.isnan(centered_R)
blue_score = np.clip(-nratio_rb * 2 + centered_B * 5 + var_feat * 10, 0, None)
blue_score[~rm | nan_mask] = 0
green_score = np.clip(nratio_gb * 2 + centered_G * 5 + var_feat * 10, 0, None)
green_score[~rm | nan_mask] = 0
red_score = np.clip(nratio_rb * 2 + centered_R * 5 + var_feat * 10, 0, None)
red_score[~rm | nan_mask] = 0
total = blue_score + green_score + red_score + 1e-8
probs[1] = blue_score / total
probs[2] = green_score / total
probs[3] = red_score / total
probs[0] = 1.0 - np.max(probs[1:], axis=0)
probs[0][probs[0] < 0.75] = 0
classification = np.nanargmax(probs, axis=0).astype(np.int8) + 1
classification[np.max(probs, axis=0) == 0] = 0
classification[nan_mask] = 0
classification[~rm] = 0
return probs, classification
def classify(feature_stack, road_mask, rf_model=None):
"""
Unified classifier entry point.
Uses trained RF if model is provided, otherwise falls back to proxy.
"""
if rf_model is not None:
logger.debug("Using trained RF model for classification")
return rf_classify(feature_stack, road_mask, rf_model)
else:
logger.debug("Using proxy classifier (no RF model loaded)")
return proxy_classify(feature_stack, road_mask)
# ─────────────────────────────────────────────────────────────────────────────
# Object extraction — faithful port of S2TD ObjectExtractor
# ─────────────────────────────────────────────────────────────────────────────
class ObjectExtractor:
"""
Extracts truck objects from the RF prediction raster using recursive
neighbourhood clustering, matching the S2TD reference implementation.
"""
def __init__(self, probabilities, lat_arr, lon_arr):
"""
:param probabilities: (4, H, W) class probabilities
:param lat_arr: 1-D array of latitude per row
:param lon_arr: 1-D array of longitude per column
"""
self.probabilities = probabilities
self.lat = lat_arr
self.lon = lon_arr
def extract(self, predictions_arr):
"""Main extraction loop over all blue (class 2) seed pixels."""
preds = predictions_arr.copy()
probs = self.probabilities.copy()
preds[preds == 1] = 0 # zero out background
blue_ys, blue_xs = np.where(preds == 2)
detections = []
sub_size = 9
for i in range(len(blue_ys)):
y_blue, x_blue = int(blue_ys[i]), int(blue_xs[i])
if preds[y_blue, x_blue] == 0:
continue
subset_9 = pick_arr_subset(preds, y_blue, x_blue, sub_size).copy()
subset_3 = pick_arr_subset(preds, y_blue, x_blue, 3).copy()
subset_9_probs = pick_arr_subset(probs, y_blue, x_blue, sub_size).copy()
half_idx_y = y_blue if subset_9.shape[0] < sub_size else subset_9.shape[0] // 2
half_idx_x = x_blue if subset_9.shape[1] < sub_size else subset_9.shape[1] // 2
try:
current_value = subset_9[half_idx_y, half_idx_x]
except IndexError:
half_idx_y, half_idx_x = sub_size // 2, sub_size // 2
current_value = subset_9[half_idx_y, half_idx_x]
new_value = 100
if not all(v in subset_9 for v in [2, 3, 4]):
continue
cluster, seen_idx, seen_vals, _ = self._cluster_array(
arr=subset_9, probs=subset_9_probs,
point=[half_idx_y, half_idx_x],
new_value=new_value, current_value=current_value,
yet_seen_indices=[], yet_seen_values=[],
skipped_one=False,
)
if np.count_nonzero(cluster == new_value) < 3:
continue
det = self._postprocess_cluster(
cluster, preds, probs, subset_3,
y_blue, x_blue,
half_idx_y, half_idx_x,
new_value,
)
if det is not None:
preds = det["updated_preds"]
detections.append(det["detection"])
return detections
def _cluster_array(self, arr, probs, point, new_value, current_value,
yet_seen_indices, yet_seen_values, skipped_one):
"""Recursive neighbourhood clustering — matches S2TD._cluster_array."""
if len(yet_seen_indices) == 0:
yet_seen_indices.append(point)
yet_seen_values.append(current_value)
arr_mod = arr.copy()
arr_mod[point[0], point[1]] = 0
window_3x3 = pick_arr_subset(arr_mod, point[0], point[1], 3).copy()
if window_3x3.shape[0] >= 2 and window_3x3.shape[1] >= 2:
cy = min(1, window_3x3.shape[0] - 1)
cx = min(1, window_3x3.shape[1] - 1)
if window_3x3[cy, cx] == 2:
window_3x3[window_3x3 == 4] = 1 # eliminate reds near blue
y, x = point[0], point[1]
window_3x3_probs = pick_arr_subset(probs, y, x, 3)
windows = [window_3x3]
windows_probs = [window_3x3_probs]
if current_value == 4 or skipped_one:
windows = windows[0:1]
ys, xs = np.array([], dtype=int), np.array([], dtype=int)
window_idx = 0
offset_y, offset_x = 0, 0
while len(ys) == 0 and window_idx < len(windows):
window = windows[window_idx]
window_p = windows_probs[window_idx]
offset_y = window.shape[0] // 2
offset_x = window.shape[1] // 2
go_next = (current_value + 1) in window or current_value == 2
target_value = current_value + 1 if go_next else current_value
match = window == target_value
if np.count_nonzero(match) == 0:
target_value = current_value
match = window == target_value
ys_found, xs_found = np.where(match)
# Probability-based tie-breaking
if len(ys_found) > 1 and window_p.ndim == 3 and window_p.shape[0] > (target_value - 1):
wp_target = window_p[target_value - 1] * match
max_prob_mask = (wp_target == np.max(wp_target))
ys_found, xs_found = np.where(max_prob_mask)
ys, xs = ys_found, xs_found
window_idx += 1
ymin_w = max(0, point[0] - offset_y)
xmin_w = max(0, point[1] - offset_x)
for y_local, x_local in zip(ys, xs):
ny, nx = ymin_w + int(y_local), xmin_w + int(x_local)
if [ny, nx] in yet_seen_indices:
continue
if ny < 0 or ny >= arr.shape[0] or nx < 0 or nx >= arr.shape[1]:
continue
try:
cv = arr[ny, nx]
except IndexError:
continue
# Red already seen but this is green or blue → skip
if 4 in yet_seen_values and cv <= 3:
continue
arr_mod[ny, nx] = new_value
yet_seen_indices.append([ny, nx])
yet_seen_values.append(cv)
# Guard: avoid picking many more reds than blues and greens
n_blue = sum(1 for v in yet_seen_values if v == 2)
n_green = sum(1 for v in yet_seen_values if v == 3)
n_red = sum(1 for v in yet_seen_values if v == 4)
if n_red > n_blue and n_red > n_green:
break
arr_mod, yet_seen_indices, yet_seen_values, skipped_one = self._cluster_array(
arr_mod, probs, [ny, nx], new_value, cv,
yet_seen_indices, yet_seen_values, skipped_one,
)
arr_mod[point[0], point[1]] = new_value
return arr_mod, yet_seen_indices, yet_seen_values, skipped_one
def _postprocess_cluster(self, cluster, preds_copy, probs, subset_3,
y_blue, x_blue, half_idx_y, half_idx_x,
new_value):
"""Validate cluster and produce a detection dict — mirrors S2TD._postprocess_cluster."""
# Add neighbouring blues from the 3×3 window
ys_ba, xs_ba = np.where(subset_3 == 2)
ys_ba = ys_ba + half_idx_y - 1
xs_ba = xs_ba + half_idx_x - 1
for yb, xb in zip(ys_ba, xs_ba):
yb_c = int(np.clip(yb, 0, cluster.shape[0] - 1))
xb_c = int(np.clip(xb, 0, cluster.shape[1] - 1))
cluster[yb_c, xb_c] = new_value
cluster[cluster != new_value] = 0
cys, cxs = np.where(cluster == new_value)
if len(cys) == 0:
return None
# Map subset coords back to full array
ymin_sub = int(np.clip(y_blue - half_idx_y, 0, np.inf))
xmin_sub = int(np.clip(x_blue - half_idx_x, 0, np.inf))
cys_full = cys + ymin_sub
cxs_full = cxs + xmin_sub
ymin = int(np.min(cys_full))
xmin = int(np.min(cxs_full))
ymax = int(np.max(cys_full)) + 1 # +1: box extends to upper bound of pixel
xmax = int(np.max(cxs_full)) + 1
H, W = preds_copy.shape
ymin, ymax = max(0, ymin), min(H, ymax)
xmin, xmax = max(0, xmin), min(W, xmax)
box_preds = preds_copy[ymin:ymax, xmin:xmax].copy()
box_probs = probs[1:, ymin:ymax, xmin:xmax].copy() # classes 2,3,4 → indices 0,1,2
# Spectral probability scores (exact S2TD logic)
max_probs = []
for cls_offset, cls_val in enumerate([2, 3, 4]):
mask = (box_preds == cls_val)
vals = box_probs[cls_offset] * mask
mp = float(np.nanmax(vals)) if np.any(mask) else 0.0
max_probs.append(mp)
mean_max_spectral_probability = float(np.nanmean(max_probs))
mean_spectral_probability = float(np.nanmean(np.nanmax(box_probs, axis=0)))
# Validation checks
all_given = all(v in box_preds for v in [2, 3, 4])
large_enough = box_preds.shape[0] > 2 or box_preds.shape[1] > 2
too_large = box_preds.shape[0] > 5 or box_preds.shape[1] > 5
if too_large or not all_given or not large_enough:
return None
# Score: TWO terms — matches reference
score = mean_max_spectral_probability + mean_spectral_probability
if score <= 1.2:
return None
# Direction (blue → red vector)
by, bx = np.where(box_preds == 2)
ry, rx = np.where(box_preds == 4)
blue_idx = np.array([by[0], bx[0]], dtype=np.int8)
red_idx = np.array([ry[0], rx[0]], dtype=np.int8)
vector = (blue_idx - red_idx) * np.array([1, -1], dtype=np.int8)
heading = float(np.degrees(np.arctan2(vector[1], vector[0])) % 360)
# Speed
diameter = max(box_preds.shape) * 10 - 10
speed_kmh = float(np.sqrt(diameter * 20) / SECONDS_OFFSET_B02_B04 * 3.6)
# Geo-coordinates (centre of detection box)
lat_centre = float((self.lat[ymin] + self.lat[min(ymax, len(self.lat) - 1)]) / 2)
lon_centre = float((self.lon[xmin] + self.lon[min(xmax, len(self.lon) - 1)]) / 2)
# Zero out detected pixels to prevent re-detection
preds_copy[ymin:ymax, xmin:xmax] *= np.zeros_like(box_preds)
# Also zero 3×3 around blue pixels
blue_in_box = np.where(box_preds == 2)
for yb, xb in zip(blue_in_box[0], blue_in_box[1]):
y0, y1 = max(0, ymin + yb - 1), min(H, ymin + yb + 2)
x0, x1 = max(0, xmin + xb - 1), min(W, xmin + xb + 2)
preds_copy[y0:y1, x0:x1] *= (preds_copy[y0:y1, x0:x1] != 2).astype(np.int8)
crop_id = f"truck_{int(time.time() * 1000)}_{ymin}_{xmin}.png"
return {
"updated_preds": preds_copy,
"detection": {
"lat": lat_centre,
"lon": lon_centre,
"confidence": float(min(score / 2.4, 1.0)),
"s_score": round(score, 3),
"speed_kmh": round(speed_kmh, 1),
"heading": round(heading, 1),
"heading_desc": self._direction_to_compass(heading),
"id": crop_id,
"image_url": f"/detections/{crop_id}",
"box_shape": list(box_preds.shape),
"max_probs": {"blue": max_probs[0], "green": max_probs[1], "red": max_probs[2]},
},
}
@staticmethod
def _direction_to_compass(deg):
bins = np.arange(0, 359, 45, dtype=np.float32)
labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
return labels[int(np.argmin(np.abs(bins - deg)))]
# ─────────────────────────────────────────────────────────────────────────────
# ARGUS Engine
# ─────────────────────────────────────────────────────────────────────────────
class S2TruckEngine:
def __init__(self, *, cache_dir: str, detection_dir: str, rf_model_path: str | None = None):
self.cache_dir = cache_dir
self.detection_dir = detection_dir
os.makedirs(self.detection_dir, exist_ok=True)
_configure_osmnx(cache_dir)
self.rf_model = load_rf_model(rf_model_path)
def fetch_roads(self, bbox_coords, progress_cb=None):
"""Fetch major roads with automatic mirror rotation and fallbacks."""
import geopandas as gpd
import osmnx as ox
def log(msg, level="info", pct=None):
if level == "info":
logger.info(msg)
elif level == "warn":
logger.warning(msg)
if progress_cb:
progress_cb(msg, pct)
min_lat, min_lon, max_lat, max_lon = bbox_coords
center_lat = (min_lat + max_lat) / 2
center_lon = (min_lon + max_lon) / 2
lat_span = (max_lat - min_lat) * 111000
lon_span = (max_lon - min_lon) * 111000 * np.cos(np.radians(center_lat))
dist_m = int(max(lat_span, lon_span) * 0.6) + 1000
log(f"Starting road discovery (ROI: {center_lat:.4f}, {center_lon:.4f})", pct=5)
for i, mirror in enumerate(OVERPASS_MIRRORS):
log(f"Trying mirror {i+1}/{len(OVERPASS_MIRRORS)}: {mirror}", pct=10 + i * 5)
ox.settings.overpass_url = mirror
try:
graph = ox.graph_from_point(
(center_lat, center_lon), dist=dist_m,
network_type="drive", simplify=True,
retain_all=False, truncate_by_edge=True,
)
roads = ox.graph_to_gdfs(graph, nodes=False)
major_types = [
"motorway", "trunk", "primary", "secondary",
"motorway_link", "trunk_link", "primary_link",
]
roads = roads[roads["highway"].isin(major_types)].copy()
if not roads.empty:
logger.info(f"Fetched {len(roads)} major roads from {mirror}")
return roads
except Exception as e:
logger.warning(f"Mirror {mirror} failed: {e}")
time.sleep(1)
# Raw Overpass fallback
logger.warning("All mirrors failed. Trying raw Overpass query.")
try:
query = f"""
[out:json][timeout:60];
(way["highway"~"motorway|trunk|primary"]({min_lat},{min_lon},{max_lat},{max_lon}););
out body; >; out skel qt;
"""
resp = requests.post(OVERPASS_MIRRORS[0], data={"data": query}, timeout=60)
if resp.status_code == 200:
data = resp.json()
nodes = {n["id"]: (n["lon"], n["lat"]) for n in data["elements"] if n["type"] == "node"}
ways = []
for w in data["elements"]:
if w["type"] == "way" and "nodes" in w:
coords = [nodes[nid] for nid in w["nodes"] if nid in nodes]
if len(coords) > 1:
ways.append({"geometry": LineString(coords), "highway": w["tags"].get("highway")})
if ways:
roads = gpd.GeoDataFrame(ways, crs="EPSG:4326")
logger.info(f"Raw fallback: {len(roads)} roads")
return roads
except Exception as e:
logger.error(f"Raw fallback failed: {e}")
return gpd.GeoDataFrame()
def detect_trucks(self, data, bbox_coords, timestamp, road_mask):
"""
Detect trucks using corrected Fisser et al. methodology.
:param data: (H, W, 5) array [B04, B03, B02, B08, CLM]
:param bbox_coords: [min_lat, min_lon, max_lat, max_lon]
:param timestamp: str ISO timestamp
:param road_mask: (H, W) binary mask of road pixels
:return: list of detection dicts
"""
min_lat, min_lon, max_lat, max_lon = bbox_coords
H, W = data.shape[:2]
# 1. Build feature stack (corrected order)
feat = build_feature_stack(data)
feature_stack = feat["feature_stack"]
# 2. Classify (real RF if loaded, proxy fallback otherwise)
probs, prediction = classify(feature_stack, road_mask, self.rf_model)
# 3. Lat/lon arrays for geo-referencing
lat_arr = np.linspace(max_lat, min_lat, H) # top to bottom
lon_arr = np.linspace(min_lon, max_lon, W) # left to right
# 4. Object extraction (corrected)
extractor = ObjectExtractor(probs, lat_arr, lon_arr)
detections = extractor.extract(prediction)
# 5. Add timestamp and save crops
for det in detections:
det["timestamp"] = timestamp
try:
self._save_crop(data, det, H, W, min_lat, min_lon, max_lat, max_lon)
except Exception as e:
logger.warning(f"Could not save crop for {det['id']}: {e}")
return detections
def _save_crop(self, data, det, H, W, min_lat, min_lon, max_lat, max_lon):
"""Save a 20×20 RGB crop centred on the detection."""
cy = int((max_lat - det["lat"]) / (max_lat - min_lat + 1e-9) * H)
cx = int((det["lon"] - min_lon) / (max_lon - min_lon + 1e-9) * W)
cy, cx = int(np.clip(cy, 0, H - 1)), int(np.clip(cx, 0, W - 1))
y0, y1 = max(0, cy - 10), min(H, cy + 10)
x0, x1 = max(0, cx - 10), min(W, cx + 10)
rgb = data[y0:y1, x0:x1, :3].astype(np.float32)
rgb = rescale_s2(rgb)
rgb = (np.clip(rgb, 0, 0.3) / 0.3 * 255).astype(np.uint8)
path = os.path.join(self.detection_dir, det["id"])
imageio.imwrite(path, rgb)

Some files were not shown because too many files have changed in this diff Show More