Files
Shadowbroker/backend/services/radio_intercept.py
T
Shadowbroker e36d1fc79c [security] Close tg12 audit issues #201–#214 seamlessly (#261)
External security audit by @tg12 (May 17, 2026) filed issues #201–#214
in addition to the #189–#200 batch already closed by PRs #227/#232/#260.
This PR closes all eight that are real security bugs (the other six in
the 201–214 range are either design discussions or upstream-abuse/TOS
concerns we're keeping intentional, see issue triage notes on each).

The user-facing principle for this PR: fix the security gap WITHOUT
introducing a single hostile error or behavior change for legitimate
users. Every fix follows the same template — fail forward, not loud.
When the secure path is harder than the insecure one, build a
fallback chain that ends in graceful degradation, not in a scary
modal or 422 response.

  #205 — OpenMHZ audio redirect SSRF (services/radio_intercept.py)

  Replaced requests.get(..., allow_redirects=True) with a manual
  redirect loop that re-validates each hop's host against
  _OPENMHZ_AUDIO_HOSTS. Same-host redirects (CDN edge selection)
  still work, so legitimate audio playback is unaffected. Cross-host
  redirects to disallowed hosts return a generic 502 which the
  browser audio element handles gracefully. Cap at 5 hops.

  #207 — infonet/status verify_signatures DoS (routers/mesh_public.py)

  Silently downgrade verify_signatures=true to False for
  unauthenticated callers. No error surfaced — the response shape is
  identical, just without the O(n_events) signature verification.
  Authenticated callers (scoped mesh.audit) still get the full path.
  The frontend never passes this param so legitimate UI is unaffected.

  #211 — thermal/verify expensive analysis (routers/sigint.py)

  Added Depends(require_local_operator). Frontend has no direct
  callers (verified by grep); Tauri/AI agents use scoped tokens that
  pass the auth check. Anonymous abusers blocked silently — the
  legitimate UI keeps working through the Next.js admin-key proxy.

  #213, #214 — OpenMHZ calls/audio upstream abuse (routers/radio.py)

  Added Depends(require_local_operator) to both. Browser users hit
  these through the Next.js proxy at src/app/api/[...path]/route.ts
  which injects X-Admin-Key, so the auth check passes transparently.
  Direct attackers can no longer rotate sys_names to hammer
  api.openmhz.com or relay arbitrary audio streams through the
  backend's bandwidth.

  #202 — overflights unbounded hours (routers/data.py)

  Silently clamp `hours` to OVERFLIGHTS_MAX_HOURS (default 72,
  configurable). NO 422 — clients asking for an absurd window get a
  shorter window back with `requested_hours` and `effective_hours`
  hint fields. Postel's law: liberal in what we accept, conservative
  in what we compute.

  #203 — Meshtastic callsign UA leak (services/fetchers/meshtastic_map.py)

  Added MESHTASTIC_SEND_CALLSIGN_HEADER opt-out env var. Default is
  TRUE — preserves existing operator behavior (callsign sent so
  meshtastic.org can rate-limit per-install). Privacy-conscious
  operators set it to false to suppress.

  #206 — KiwiSDR upstream is HTTP-only (services/kiwisdr_fetcher.py)

  Upstream rx.linkfanel.net doesn't speak HTTPS (verified — Apache
  2.4.10 only on port 80). We can't fix the transport. Instead added
  three layers:
    1. Content validation on fetched data — reject responses with
       <50 receivers or >5% malformed entries (likely MITM injection).
    2. Existing disk cache fallback (already present).
    3. NEW: bundled static directory at backend/data/kiwisdr_directory.json
       shipping 798 known-good receivers. Used as last resort so the
       KiwiSDR map layer always renders something useful.

  #208 — Merkle proof DoS via /api/mesh/infonet/sync (services/mesh/mesh_hashchain.py)

  The endpoint is part of the cross-node federation protocol — peers
  legitimately call it without local-operator auth, so we can't add
  Depends(). Instead made the underlying operation O(1) per proof
  via a cached Merkle level structure on the Infonet instance:
    - _merkle_levels_cache + _merkle_levels_for_event_count on each
      Infonet instance
    - _invalidate_merkle_cache() called from every chain mutation
      point (append, ingest_events, apply_fork, cleanup_expired)
    - _get_merkle_levels() does the lazy recompute on first read
      after invalidation, then serves from cache thereafter
  Effect: anonymous attackers hammering the proofs endpoint hit a
  cached structure; the rebuild happens at most once per real chain
  advance. Federation untouched.

  #201 — Tor bundle SHA-256 bypass (services/tor_hidden_service.py)

  Docker users were already covered — backend/Dockerfile installs
  Tor via apt-get at build time (signed by Debian's package system).
  No runtime download needed for the 80%-of-users case.

  For Tauri desktop, replaced the single .sha256sum check with a
  multi-source verification chain implemented in _verify_tor_bundle():
    1. Try upstream .sha256sum (current behavior — fast path)
    2. Try baked-in digest list at backend/data/tor_bundle_digests.json
       (pinned per-version, maintainer-updated)
    3. If neither source is REACHABLE: HTTPS-only fallback with a loud
       warning (avoids breaking first-run onboarding while the
       maintainer hasn't yet pinned a new Tor release)
  A mismatch from a source that DID respond is always fatal — only
  the "no source reachable" case falls back to HTTPS-only. This is
  the "have cake and eat it" pattern: real users see no new failure
  modes during torproject.org outages, but MITM/compromise attacks
  still fail because the downloaded digest can't match what BOTH
  the upstream and the baked-in list report.

  Currently the digest file ships with placeholder values for the
  current Tor URLs (those URLs are already stale on torproject.org
  too). A follow-up commit can populate real digests when a stable
  Tor release is selected; until then the HTTPS-only warning fires
  and onboarding still works.

Tests (82 total, all passing):
  test_openmhz_redirect_ssrf.py        (5 tests)  — #205
  test_infonet_status_verify_gate.py   (2 tests)  — #207
  test_overflights_clamp.py            (5 tests)  — #202
  test_meshtastic_callsign_optout.py   (3 tests)  — #203
  test_kiwisdr_fallback.py             (6 tests)  — #206
  test_merkle_cache.py                 (6 tests)  — #208
  test_tor_bundle_verification.py      (6 tests)  — #201
  test_control_surface_auth.py         (extended) — #211, #213, #214
  + all previous security tests (CCTV redirect, GDELT https, sentinel
    cache, crowdthreat opt-in, third-party fetcher gates, control
    surface auth) continue to pass.

Pre-existing test infrastructure issue with SHARED_EXECUTOR teardown
in the broader sweep exists on main too (verified) — not introduced
by this PR.

Credit: @tg12 reported every one of these with accurate line citations
and the recommended fixes that informed this implementation.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 19:57:06 -06:00

317 lines
11 KiB
Python

import requests
from bs4 import BeautifulSoup
import logging
from cachetools import cached, TTLCache
import cloudscraper
import reverse_geocoder as rg
from urllib.parse import urlparse
logger = logging.getLogger(__name__)
_OPENMHZ_AUDIO_HOSTS = {"media.openmhz.com", "media2.openmhz.com", "media3.openmhz.com"}
# Cache the top feeds for 5 minutes so we don't hammer Broadcastify
radio_cache = TTLCache(maxsize=1, ttl=300)
@cached(radio_cache)
def get_top_broadcastify_feeds():
"""
Scrapes the Broadcastify Top 50 live audio feeds public dashboard.
Returns a list of dictionaries containing feed metadata and direct stream URLs.
"""
logger.info("Scraping Broadcastify Top Feeds (Cache Miss)")
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
}
try:
res = requests.get("https://www.broadcastify.com/listen/top", headers=headers, timeout=10)
if res.status_code != 200:
logger.error(f"Broadcastify Scrape Failed: HTTP {res.status_code}")
return []
soup = BeautifulSoup(res.text, "html.parser")
table = soup.find("table", {"class": "btable"})
if not table:
logger.error("Could not find feeds table on Broadcastify.")
return []
feeds = []
rows = table.find_all("tr")[1:] # Skip header row
for row in rows:
cols = row.find_all("td")
if len(cols) >= 5:
# Top layout: [Listeners, Feed ID (hidden), Location, Feed Name, Category, Genre]
listeners_str = cols[0].text.strip().replace(",", "")
listeners = int(listeners_str) if listeners_str.isdigit() else 0
link_tag = cols[2].find("a")
if not link_tag:
continue
href = link_tag.get("href", "")
feed_id = href.split("/")[-1] if "/listen/feed/" in href else None
if not feed_id:
continue
location = cols[1].text.strip()
name = cols[2].text.strip()
category = cols[3].text.strip()
feeds.append(
{
"id": feed_id,
"listeners": listeners,
"location": location,
"name": name,
"category": category,
"stream_url": f"https://broadcastify.cdnstream1.com/{feed_id}",
}
)
logger.info(f"Successfully scraped {len(feeds)} top feeds from Broadcastify.")
return feeds
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"Broadcastify Scrape Exception: {e}")
return []
# Cache OpenMHZ systems mapping so we don't have to fetch all 450+ every time
openmhz_systems_cache = TTLCache(maxsize=1, ttl=3600)
@cached(openmhz_systems_cache)
def get_openmhz_systems():
"""Fetches the full directory of OpenMHZ systems."""
logger.info("Scraping OpenMHZ Systems (Cache Miss)")
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
try:
res = scraper.get("https://api.openmhz.com/systems", timeout=15)
if res.status_code == 200:
data = res.json()
# Return list of systems
return data.get("systems", []) if isinstance(data, dict) else []
return []
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"OpenMHZ Systems Scrape Exception: {e}")
return []
# Cache specific city calls briefly (15-30s) to limit our polling rate
openmhz_calls_cache = TTLCache(maxsize=100, ttl=20)
@cached(openmhz_calls_cache)
def get_recent_openmhz_calls(sys_name: str):
"""Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata')."""
logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
scraper = cloudscraper.create_scraper(
browser={"browser": "chrome", "platform": "windows", "desktop": True}
)
try:
url = f"https://api.openmhz.com/{sys_name}/calls"
res = scraper.get(url, timeout=15)
if res.status_code == 200:
data = res.json()
return data.get("calls", []) if isinstance(data, dict) else []
return []
except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
logger.error(f"OpenMHZ Calls Scrape Exception ({sys_name}): {e}")
return []
_OPENMHZ_MAX_REDIRECTS = 5
def openmhz_audio_response(target_url: str):
"""Fetch an OpenMHz audio object through the backend with browser-safe headers.
Redirects are followed manually so each hop's host can be re-validated
against ``_OPENMHZ_AUDIO_HOSTS``. Without this, the upstream could
302-redirect to an internal address (e.g. ``http://127.0.0.1:8000/...``
or an RFC1918 range), and the backend would dutifully fetch and stream
that response back to the browser — a classic open-redirect-to-SSRF
chain. Same-host redirects (CDN edge selection) still work normally.
"""
from fastapi import HTTPException
from fastapi.responses import StreamingResponse
from urllib.parse import urljoin
parsed = urlparse(str(target_url or ""))
host = (parsed.hostname or "").lower()
if parsed.scheme != "https" or host not in _OPENMHZ_AUDIO_HOSTS:
raise HTTPException(status_code=400, detail="Unsupported OpenMHz audio URL")
current_url = target_url
hops = 0
try:
while True:
upstream = requests.get(
current_url,
stream=True,
timeout=(5, 20),
allow_redirects=False,
headers={
"User-Agent": "Mozilla/5.0",
"Accept": "audio/mpeg,audio/*,*/*;q=0.8",
"Referer": "https://openmhz.com/",
},
)
if upstream.is_redirect or upstream.status_code in (301, 302, 303, 307, 308):
location = upstream.headers.get("Location", "")
upstream.close()
if hops >= _OPENMHZ_MAX_REDIRECTS or not location:
raise HTTPException(status_code=502, detail="OpenMHz redirect rejected")
next_url = urljoin(current_url, location)
next_parsed = urlparse(next_url)
next_host = (next_parsed.hostname or "").lower()
# Re-validate the next hop against the same allowlist used for
# the original URL. Cross-host redirects to disallowed hosts
# are rejected silently; the browser audio element handles
# the resulting 502 gracefully and moves on.
if next_parsed.scheme != "https" or next_host not in _OPENMHZ_AUDIO_HOSTS:
raise HTTPException(status_code=502, detail="OpenMHz redirect rejected")
current_url = next_url
hops += 1
continue
break
except requests.RequestException as exc:
raise HTTPException(status_code=502, detail="OpenMHz audio fetch failed") from exc
if upstream.status_code >= 400:
upstream.close()
raise HTTPException(status_code=upstream.status_code, detail="OpenMHz audio unavailable")
def chunks():
try:
for chunk in upstream.iter_content(chunk_size=64 * 1024):
if chunk:
yield chunk
finally:
upstream.close()
return StreamingResponse(
chunks(),
media_type="audio/mpeg",
headers={
"Cache-Control": "public, max-age=300",
"Accept-Ranges": "bytes",
},
)
US_STATES = {
"Alabama": "AL",
"Alaska": "AK",
"Arizona": "AZ",
"Arkansas": "AR",
"California": "CA",
"Colorado": "CO",
"Connecticut": "CT",
"Delaware": "DE",
"Florida": "FL",
"Georgia": "GA",
"Hawaii": "HI",
"Idaho": "ID",
"Illinois": "IL",
"Indiana": "IN",
"Iowa": "IA",
"Kansas": "KS",
"Kentucky": "KY",
"Louisiana": "LA",
"Maine": "ME",
"Maryland": "MD",
"Massachusetts": "MA",
"Michigan": "MI",
"Minnesota": "MN",
"Mississippi": "MS",
"Missouri": "MO",
"Montana": "MT",
"Nebraska": "NE",
"Nevada": "NV",
"New Hampshire": "NH",
"New Jersey": "NJ",
"New Mexico": "NM",
"New York": "NY",
"North Carolina": "NC",
"North Dakota": "ND",
"Ohio": "OH",
"Oklahoma": "OK",
"Oregon": "OR",
"Pennsylvania": "PA",
"Rhode Island": "RI",
"South Carolina": "SC",
"South Dakota": "SD",
"Tennessee": "TN",
"Texas": "TX",
"Utah": "UT",
"Vermont": "VT",
"Virginia": "VA",
"Washington": "WA",
"West Virginia": "WV",
"Wisconsin": "WI",
"Wyoming": "WY",
"Washington, D.C.": "DC",
"District of Columbia": "DC",
}
import math
def haversine_distance(lat1, lon1, lat2, lon2):
R = 3958.8 # Earth radius in miles
dLat = math.radians(lat2 - lat1)
dLon = math.radians(lon2 - lon1)
a = math.sin(dLat / 2) * math.sin(dLat / 2) + math.cos(math.radians(lat1)) * math.cos(
math.radians(lat2)
) * math.sin(dLon / 2) * math.sin(dLon / 2)
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
return R * c
def find_nearest_openmhz_systems_list(lat: float, lng: float, limit: int = 5):
"""
Finds the strictly nearest OpenMHZ systems by distance.
"""
systems = get_openmhz_systems()
if not systems:
return []
# Calculate distance for all systems that provide coordinates
valid_systems = []
for s in systems:
s_lat = s.get("lat")
s_lng = s.get("lng")
if s_lat is not None and s_lng is not None:
dist = haversine_distance(lat, lng, float(s_lat), float(s_lng))
s["distance_miles"] = dist
valid_systems.append(s)
if not valid_systems:
return []
# Sort strictly by distance
valid_systems.sort(key=lambda x: x["distance_miles"])
return valid_systems[:limit]
def find_nearest_openmhz_system(lat: float, lng: float):
"""
Returns the single closest OpenMHZ system by distance.
"""
nearest = find_nearest_openmhz_systems_list(lat, lng, limit=1)
if nearest:
return nearest[0]
return None