Use USNI Fleet Tracker as the primary carrier source + small UI fixes

Background
==========

PR #285 set up the seed -> cache -> GDELT model for the carrier tracker to address audit issues #244/#245/#246. The GDELT half of that pipeline hits api.gdeltproject.org's doc API for headline-region keyword matching. That approach is both low-precision (false centroid positions, per #245) and unreliable (the host times out from some networks, including Docker Desktop on Windows).

USNI publishes a weekly Fleet & Marine Tracker with explicit prose like:

  "The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
  "Aircraft carrier USS George Washington (CVN-73) is in port in Yokosuka, Japan"

That is a strictly better source for U.S. Navy carrier positions: authoritative, deterministically parseable, and published on a weekly cadence.

What this PR does
=================

New module: backend/services/fetchers/usni_fleet_tracker.py

- Pulls USNI's WordPress RSS feeds (site-wide + category, unioned).
- Picks the most recent fleet-tracker post by parsed pubDate.
- For each carrier in the registry, scans the article body for "is operating in / is in port in / returned to / transiting" near the carrier's name, hull code, or "<name> Carrier Strike Group" variant, and captures the region/port phrase that follows.
- Maps the region phrase to coordinates via the existing REGION_COORDS table, with a USNI-phrase alias table for the specific wording USNI uses ("Yokosuka, Japan", "Norfolk, Va.", "Naval Station San Diego", "5th Fleet AOR", etc.).
- Returns {hull: position_entry} with position_confidence="recent" and position_source_at set to the article's actual publication timestamp (not now()).

Politeness
----------

Uses outbound_user_agent("usni-fleet-tracker") so USNI sees a per-install Shadowbroker identifier (Round 7a / PR #292). The article body pages return 403 to non-browser UAs; the WordPress RSS feed serves the full <content:encoded> body and is the supported aggregator path. No browser UA spoofing.

carrier_tracker.update_carrier_positions() now runs three phases:

1. Bootstrap from cache (or seed on first run).
2. USNI fleet tracker -- PRIMARY high-confidence source.
3. GDELT -- SECONDARY backfill; it can NOT demote a "recent" USNI position to an "approximate" GDELT headline match.

Verified live: 6 of 11 carriers picked up real May 18, 2026 positions on first refresh (Eisenhower, Ford, Bush, Roosevelt, Lincoln, Washington). The other 5 weren't mentioned in this week's article (they're in port at homeports with no deployment changes) and kept their cache entries, which is the correct seed/cache contract from PR #285.

Other small fixes bundled in
============================

docker-compose.yml: add the 6 third-party-fetcher opt-in env vars (PREDICTION_MARKETS_ENABLED, FINANCIAL_ENABLED, FIMI_ENABLED, NUFORC_ENABLED, NEWS_ENABLED, CROWDTHREAT_ENABLED). They were documented in .env.example but never wired through compose, so setting them in .env had no effect.

frontend/src/components/TopRightControls.tsx: fix 6 broken i18n keys that were showing as raw "terminal.term1" / "terminal.cleanupDetail" / "node.soloReady" placeholders in the INFONET TERMINAL modal. The translation files have these strings under different key names; the component now calls the right ones. A sweep of the whole frontend confirmed that every other t('...') key resolves cleanly.
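The extraction and merge rules described above can be illustrated with a small, self-contained sketch. It is not the code in usni_fleet_tracker.py or carrier_tracker.py: the trimmed alias table, the 200-character window scanned after a carrier name, and the names resolve_region, extract_position, and merge_phases are all hypothetical. The merge also reads the stored position_confidence field directly, whereas the real tracker calls _compute_position_confidence().

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical, trimmed alias table: USNI phrase (lowercase) -> region key.
ALIASES = {
    "red sea": "red sea",
    "north arabian sea": "north arabian sea",
    "arabian sea": "arabian sea",
    "yokosuka, japan": "yokosuka",
    "norfolk, va": "norfolk",
    "norfolk": "norfolk",
}

# Operating verbs from the commit message, then an optional "the" and a
# greedy capture of the region/port phrase that follows.
VERB_RE = re.compile(
    r"\b(?:is\s+operating\s+in|is\s+in\s+port\s+in|returned\s+to|transiting)\s+"
    r"(?:the\s+)?([A-Za-z0-9 ,.'-]{2,80})",
    re.IGNORECASE,
)

# Assumed size of the text window scanned after a carrier's name.
WINDOW_CHARS = 200


def resolve_region(phrase: str) -> Optional[str]:
    """Return the key of the alias that occurs earliest in the phrase.

    Ties at the same offset go to the longer (more specific) alias.
    """
    p = phrase.lower()
    hits = [(p.find(alias), -len(alias), key) for alias, key in ALIASES.items() if alias in p]
    return min(hits)[2] if hits else None


def extract_position(body: str, name_variants: list[str], pub_date: datetime) -> Optional[dict]:
    """Return an entry for the first name variant followed by a known verb phrase."""
    for variant in name_variants:
        start = body.find(variant)
        if start < 0:
            continue
        window = body[start : start + len(variant) + WINDOW_CHARS]
        match = VERB_RE.search(window)
        if not match:
            continue
        key = resolve_region(match.group(1))
        if key:
            return {
                "region": key,
                "position_confidence": "recent",
                # Stamp with the article's publication time, not now().
                "position_source_at": pub_date.isoformat(),
            }
    return None


def merge_phases(cache: dict, usni: dict, gdelt: dict) -> dict:
    """Cache first, USNI overrides it, GDELT only fills non-"recent" slots."""
    merged = dict(cache)
    merged.update(usni)
    for hull, pos in gdelt.items():
        if merged.get(hull, {}).get("position_confidence") == "recent":
            continue  # never demote a USNI observation to a headline match
        merged[hull] = pos
    return merged
```

One design choice worth noting: this sketch resolves the alias that appears earliest in the captured phrase, whereas _resolve_region_phrase in the diff returns the first table entry found anywhere in the phrase. Because the capture allows periods (presumably for "Va." and "U.S."), it can run into the next sentence, and earliest-position resolution keeps a following carrier's region from winning. The trade-off is that a leading fleet name such as "5th Fleet" would beat a later, more precise sea name.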
Round 7a: per-operator outbound attribution + GDELT GCS-direct fix (#292)
2026-06-03 21:08:13 +02:00 · 2026-05-21 20:32:22 -06:00 · 2026-05-21 15:11:28 -06:00 · 2026-05-21 13:27:16 -06:00
35 changed files with 1638 additions and 250 deletions
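Every call site in this diff goes through outbound_user_agent(component) and get_operator_handle() from services.network_utils, whose bodies are not shown here. A minimal sketch of the behavior that the .env.example and Settings comments describe might look like the following. The bodies are hypothetical: the exact UA format (modeled on the release-script fallback string), the sanitizing regex, and the JSON layout of the handle file are assumptions, not the project's implementation.

```python
import json
import os
import re
import secrets
from pathlib import Path

# Default path taken from the .env.example comment; layout is an assumption.
HANDLE_PATH = Path("backend/data/operator_handle.json")


def _sanitize(handle: str) -> str:
    """Replace runs of characters outside [A-Za-z0-9._-] with a dash."""
    return re.sub(r"[^A-Za-z0-9._-]+", "-", handle).strip("-")


def get_operator_handle(path: Path = HANDLE_PATH) -> str:
    """OPERATOR_HANDLE wins; otherwise reuse or create a persisted pseudonym."""
    configured = _sanitize(os.environ.get("OPERATOR_HANDLE", "").strip())
    if configured:
        return configured
    if path.exists():
        stored = json.loads(path.read_text(encoding="utf-8")).get("handle", "")
        if stored:
            return stored
    handle = f"operator-{secrets.token_hex(3)}"  # e.g. "operator-7f3a92"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"handle": handle}), encoding="utf-8")
    return handle


def outbound_user_agent(component: str, path: Path = HANDLE_PATH) -> str:
    """Build a per-install UA; a full SHADOWBROKER_USER_AGENT bypasses it."""
    override = os.environ.get("SHADOWBROKER_USER_AGENT", "").strip()
    if override:
        return override
    # Assumed format, modeled on the release-script fallback string.
    return (
        f"Shadowbroker/0.9 ({component}; {get_operator_handle(path)}; "
        "+https://github.com/BigBodyCobain/Shadowbroker/issues)"
    )
```

Persisting the generated handle means every fetcher on one install reports the same identifier across restarts, which is what lets an upstream such as weather.gov rate-limit per install rather than per project.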
@@ -24,14 +24,28 @@ AIS_API_KEY=              # https://aisstream.io/ — free tier WebSocket key
 # Requires MESH_DEBUG_MODE=true; do not enable this for ordinary use.
 # ALLOW_INSECURE_ADMIN=false

-# Default outbound User-Agent for all third-party HTTP fetchers.
-# Project-generic by default — does NOT include any personal contact info or
-# operator-specific identifier. Override only if you run a public relay and
-# want upstreams to be able to reach you (e.g. Nominatim/OSM usage policy).
-# SHADOWBROKER_USER_AGENT=ShadowBroker-OSINT/0.9 (contact: ops@example.com)
+# Per-install operator handle. Round 7a: every outbound third-party API
+# call (Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
+# weather.gov, NUFORC, etc.) includes this handle in the User-Agent so
+# upstreams can rate-limit / contact the specific install instead of
+# treating every Shadowbroker user as one entity.
+#
+# Default empty -> a stable pseudonymous handle (e.g. "operator-7f3a92") is
+# auto-generated on first run and persisted to backend/data/operator_handle.json.
+# Operators who want a meaningful handle (real name, org, GitHub login) can
+# set it here. Special characters are sanitized to dashes.
+# OPERATOR_HANDLE=

-# User-Agent for Nominatim geocoding requests (per OSM usage policy).
-# NOMINATIM_USER_AGENT=ShadowBroker/1.0
+# Default outbound User-Agent for all third-party HTTP fetchers. Operators
+# who run a public relay and want a completely custom UA can set this; it
+# bypasses the per-operator helper entirely. Most installs should leave it
+# unset and use OPERATOR_HANDLE instead.
+# SHADOWBROKER_USER_AGENT=
+
+# Nominatim-specific User-Agent override (OSM usage policy). Leave unset to
+# use the per-install handle (default) — set only if you have a registered
+# Nominatim relay identity.
+# NOMINATIM_USER_AGENT=

 # ── Third-party fetcher opt-ins ────────────────────────────────
 # These data sources phone home to politically/commercially sensitive
@@ -8148,8 +8148,12 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:


 def _cctv_upstream_headers(request: Request, profile: _CCTVProxyProfile) -> dict[str, str]:
+    # Round 7a: per-install operator handle. See routers/cctv.py for the
+    # canonical handler; this duplicate stays in lockstep until the #239
+    # dedup ladder removes it.
+    from services.network_utils import outbound_user_agent
    headers = {
-        "User-Agent": "Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)",
+        "User-Agent": f"Mozilla/5.0 (compatible; {outbound_user_agent('cctv-proxy')})",
        **profile.headers,
    }
    range_header = request.headers.get("range")
@@ -13,7 +13,6 @@ dependencies = [
    "apscheduler==3.10.3",
    "beautifulsoup4>=4.9.0",
    "cachetools==5.5.2",
-    "cloudscraper==1.2.71",
    "cryptography>=41.0.0",
    "defusedxml>=0.7.1",
    "fastapi==0.115.12",
@@ -82,6 +82,28 @@ async def api_get_keys_meta(request: Request):
    return get_env_path_info()


+@router.get(
+    "/api/settings/operator-handle",
+    dependencies=[Depends(require_local_operator)],
+)
+@limiter.limit("60/minute")
+async def api_get_operator_handle(request: Request):
+    """Round 7a: return the per-install operator handle so the frontend
+    can include it in browser-direct third-party API calls (Wikipedia /
+    Wikidata via lib/wikimediaClient). The handle is auto-generated on
+    first use; operators can override it via the OPERATOR_HANDLE setting
+    or the env var of the same name.
+
+    Gated on local-operator: legitimate browser usage goes through the
+    Next.js proxy which auto-attaches the admin key; remote scanners get
+    403. The handle itself isn't a secret (it's sent to every third-party
+    API the operator touches), but admin-gating it matches the rest of
+    the settings endpoints and follows least-privilege.
+    """
+    from services.network_utils import get_operator_handle
+    return {"handle": get_operator_handle()}
+
+
@router.get(
    "/api/settings/news-feeds",
    dependencies=[Depends(require_local_operator)],
@@ -18,6 +18,12 @@ from auth import require_local_operator, require_openclaw_or_local
 from limiter import limiter
 from services.fetchers._store import latest_data as _latest_data

+
+
+def _ai_intel_user_agent() -> str:
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("ai-intel")
+
 logger = logging.getLogger(__name__)
 router = APIRouter()

@@ -447,7 +453,7 @@ async def ai_satellite_images(
            "https://planetarycomputer.microsoft.com/api/stac/v1/search",
            json=search_payload,
            timeout=10,
-            headers={"User-Agent": "ShadowBroker-OSINT/1.0 (ai-intel)"},
+            headers={"User-Agent": _ai_intel_user_agent()},
        )
        resp.raise_for_status()
        features = resp.json().get("features", [])
@@ -165,7 +165,13 @@ def _cctv_proxy_profile_for_url(target_url: str) -> _CCTVProxyProfile:


 def _cctv_upstream_headers(request: Request, profile: _CCTVProxyProfile) -> dict:
-    headers = {"User-Agent": "Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)", **profile.headers}
+    # Round 7a: per-install operator handle. Mozilla/5.0 prefix retained
+    # because many CCTV endpoints sniff for a browser-like prefix.
+    from services.network_utils import outbound_user_agent
+    headers = {
+        "User-Agent": f"Mozilla/5.0 (compatible; {outbound_user_agent('cctv-proxy')})",
+        **profile.headers,
+    }
    range_header = request.headers.get("range")
    if range_header:
        headers["Range"] = range_header
@@ -20,7 +20,17 @@ OUT_PATH = Path(__file__).parent.parent / "data" / "power_plants.json"

 def main() -> None:
    print(f"Downloading WRI Global Power Plant Database from GitHub...")
-    req = urllib.request.Request(CSV_URL, headers={"User-Agent": "ShadowBroker-OSINT/1.0"})
+    # Round 7a: release-time data refresher. Uses the per-operator UA if
+    # available, otherwise a release-script-specific identifier. This
+    # script is run by the maintainer at release time, NOT at runtime,
+    # so an aggregate UA is acceptable; we still use the helper so the
+    # behavior matches the rest of the project.
+    try:
+        from services.network_utils import outbound_user_agent
+        ua = outbound_user_agent("release-script-power-plants")
+    except Exception:
+        ua = "Shadowbroker/0.9 (release-script-power-plants; +https://github.com/BigBodyCobain/Shadowbroker/issues)"
+    req = urllib.request.Request(CSV_URL, headers={"User-Agent": ua})
    with urllib.request.urlopen(req, timeout=60) as resp:
        raw = resp.read().decode("utf-8")

@@ -627,20 +627,56 @@ def update_carrier_positions() -> None:
            _carrier_positions.update(positions)
            _last_update = datetime.now(timezone.utc)
    logger.info(
-        "Carrier tracker: %d carriers loaded from cache (GDELT enrichment starting...)",
+        "Carrier tracker: %d carriers loaded from cache (USNI + GDELT enrichment starting...)",
        len(positions),
    )

-    # --- Phase 2: GDELT enrichment ---
+    # --- Phase 2: USNI Fleet & Marine Tracker (PRIMARY source) ---
+    #
+    # USNI publishes a weekly editorial tracker with each carrier's
+    # actual operating area, parsed from explicit prose like
+    #   "The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
+    # These positions are tagged ``position_confidence: "recent"`` because
+    # they reflect actual reporting, not headline-keyword centroids.
+    # USNI updates are preferred over GDELT — they're authoritative on
+    # US Navy positions, whereas GDELT is just article-title text mining.
+    try:
+        from services.fetchers.usni_fleet_tracker import (
+            fetch_latest_fleet_tracker_positions,
+        )
+        usni_positions = fetch_latest_fleet_tracker_positions()
+        for hull, pos in usni_positions.items():
+            positions[hull] = pos
+            logger.info(
+                "Carrier USNI update: %s → %s",
+                CARRIER_REGISTRY[hull]["name"],
+                pos.get("desc", ""),
+            )
+    except Exception as e:
+        logger.warning("USNI fleet-tracker fetch failed: %s", e)
+
+    # --- Phase 3: GDELT enrichment (SECONDARY — fills gaps) ---
+    #
+    # Used only to backfill carriers USNI didn't mention this week. The
+    # position is stamped ``approximate`` so the UI knows it's a
+    # headline-centroid match (Issue #245).
    try:
        articles = _fetch_gdelt_carrier_news()
        news_positions = _parse_carrier_positions_from_news(articles)
        for hull, pos in news_positions.items():
-            # Always overwrite — newest GDELT mention wins. The previous
-            # entry's position is preserved in git history and the next
-            # cycle either confirms or replaces it.
+            # Only overwrite if the existing entry is NOT a recent USNI
+            # observation. A "recent" USNI position is higher-confidence
+            # than a GDELT headline-centroid match — don't let GDELT
+            # demote a real position to an approximate one.
+            existing = positions.get(hull, {})
+            existing_conf = _compute_position_confidence(existing)
+            if existing_conf == "recent":
+                continue
            positions[hull] = pos
-            logger.info("Carrier OSINT: updated %s from news", CARRIER_REGISTRY[hull]["name"])
+            logger.info(
+                "Carrier OSINT: updated %s from GDELT news",
+                CARRIER_REGISTRY[hull]["name"],
+            )
    except (ValueError, KeyError, json.JSONDecodeError, OSError) as e:
        logger.warning("GDELT carrier fetch failed: %s", e)

@@ -295,6 +295,19 @@ class Settings(BaseSettings):
    # service operator can identify per-install traffic instead of a generic
    # "ShadowBroker" aggregate.
    MESHTASTIC_OPERATOR_CALLSIGN: str = ""
+    # Per-install operator handle used in the User-Agent for EVERY third-party
+    # API the backend calls (Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz,
+    # Broadcastify, weather.gov, NUFORC, etc.). The default is empty, in which
+    # case backend/services/network_utils.py auto-generates a stable
+    # pseudonymous handle like "operator-7f3a92" on first use and caches it.
+    # Operators who want to identify themselves with a real handle can set
+    # this; operators who want to stay pseudonymous can leave it empty.
+    #
+    # The handle is sent ONLY to public third-party APIs. It is NEVER mixed
+    # into mesh / Wormhole / Infonet identity (those have their own crypto
+    # identity layer; conflating the two would leak public attribution into
+    # private mesh state).
+    OPERATOR_HANDLE: str = ""

    # SAR (Synthetic Aperture Radar) data layer
    # Mode A — free catalog metadata, no account, default-on
@@ -16,8 +16,15 @@ from typing import Any

 import requests

+from services.network_utils import outbound_user_agent
+
 logger = logging.getLogger(__name__)

+
+def _feed_ingester_user_agent() -> str:
+    # Round 7a: per-install attribution for operator-curated feed URLs.
+    return outbound_user_agent("feed-ingester")
+
 # ---------------------------------------------------------------------------
 # State
 # ---------------------------------------------------------------------------
@@ -157,7 +164,7 @@ def _fetch_layer_feed(layer: dict[str, Any]) -> None:
        resp = requests.get(
            feed_url,
            timeout=_FETCH_TIMEOUT,
-            headers={"User-Agent": "ShadowBroker-FeedIngester/1.0"},
+            headers={"User-Agent": _feed_ingester_user_agent()},
        )
        resp.raise_for_status()
        data = resp.json()
@@ -21,6 +21,13 @@ from typing import Any
 import defusedxml.ElementTree as ET
 import requests

+
+
+def _aircraft_db_user_agent() -> str:
+    """Round 7a: lazy import so the per-install operator handle is included."""
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("aircraft-database")
+
 logger = logging.getLogger(__name__)

 _BUCKET_LIST_URL = (
@@ -44,7 +51,7 @@ def _latest_snapshot_key() -> str:
    response = requests.get(
        _BUCKET_LIST_URL,
        timeout=_LIST_TIMEOUT_S,
-        headers={"User-Agent": _USER_AGENT},
+        headers={"User-Agent": _aircraft_db_user_agent()},
    )
    response.raise_for_status()
    root = ET.fromstring(response.text)
@@ -71,7 +78,7 @@ def _stream_csv_index(url: str) -> dict[str, dict[str, str]]:
        url,
        timeout=_DOWNLOAD_TIMEOUT_S,
        stream=True,
-        headers={"User-Agent": _USER_AGENT},
+        headers={"User-Agent": _aircraft_db_user_agent()},
    ) as response:
        response.raise_for_status()
        line_iter = (
@@ -15,7 +15,11 @@ import time
 import heapq
 from datetime import datetime, timedelta
 from pathlib import Path
-from services.network_utils import external_curl_fallback_enabled, fetch_with_curl
+from services.network_utils import (
+    external_curl_fallback_enabled,
+    fetch_with_curl,
+    outbound_user_agent,
+)
 from services.fetchers._store import latest_data, _data_lock, _mark_fresh
 from services.fetchers.nuforc_enrichment import enrich_sighting
 from services.fetchers.retry import with_retry
@@ -279,13 +283,13 @@ def fetch_weather_alerts():
        return
    alerts = []
    try:
-        # weather.gov requires a User-Agent per their API policy, but it
-        # need not identify the operator. Use a project-generic string and
-        # let the user override via SHADOWBROKER_USER_AGENT if needed.
-        from services.network_utils import DEFAULT_USER_AGENT
+        # weather.gov requires a User-Agent per their API policy. Round 7a:
+        # send the per-install operator handle so they can rate-limit per
+        # operator instead of treating "Shadowbroker" as one entity.
+        from services.network_utils import outbound_user_agent
        url = "https://api.weather.gov/alerts/active?status=actual"
        headers = {
-            "User-Agent": DEFAULT_USER_AGENT,
+            "User-Agent": outbound_user_agent("weather-gov"),
            "Accept": "application/geo+json",
        }
        response = fetch_with_curl(url, timeout=15, headers=headers)
@@ -713,7 +717,12 @@ _NUFORC_LIVE_NONCE_RE = re.compile(
    r'id=["\']wdtNonceFrontendServerSide_1["\'][^>]*value=["\']([a-f0-9]+)["\']'
 )
 _NUFORC_LIVE_SIGHTING_ID_RE = re.compile(r"id=(\d+)")
-_NUFORC_LIVE_USER_AGENT = "Mozilla/5.0 (ShadowBroker-OSINT NUFORC-fetcher)"
+# Round 7a: NUFORC's site is sensitive to non-browser UAs, so we send the
+# per-install operator handle behind a Mozilla/5.0 prefix: identifiable per
+# install, but not blocked as one aggregate client. SHADOWBROKER_USER_AGENT
+# overrides the text inside the parentheses; the Mozilla/5.0 prefix is kept.
+def _nuforc_live_user_agent() -> str:
+    return f"Mozilla/5.0 ({outbound_user_agent('nuforc-live')})"
 _NUFORC_LIVE_SESSION_COOKIES = _NUFORC_DATA_DIR / "nuforc_session.cookies"

 # Sample grid covering continental US, Alaska, Hawaii, Canada, UK, Australia
@@ -957,7 +966,7 @@ def _photon_lookup(query: str) -> list[float] | None:
        res = fetch_with_curl(
            url,
            headers={
-                "User-Agent": "ShadowBroker-OSINT/1.0 (NUFORC-UAP-layer)",
+                "User-Agent": outbound_user_agent("nuforc-uap-geocode"),
                "Accept-Language": "en",
            },
            timeout=10,
@@ -1053,7 +1062,7 @@ def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]:
        index_res = subprocess.run(
            [
                curl_bin, "-sL",
-                "-A", _NUFORC_LIVE_USER_AGENT,
+                "-A", _nuforc_live_user_agent(),
                "-c", str(cookie_jar),
                "-b", str(cookie_jar),
                index_url,
@@ -1089,7 +1098,7 @@ def _nuforc_fetch_month_live(yyyymm: str, cookie_jar: Path) -> list[dict]:
        ajax_res = subprocess.run(
            [
                curl_bin, "-sL",
-                "-A", _NUFORC_LIVE_USER_AGENT,
+                "-A", _nuforc_live_user_agent(),
                "-c", str(cookie_jar),
                "-b", str(cookie_jar),
                "-X", "POST",
@@ -6,7 +6,7 @@ import heapq
 import logging
 from pathlib import Path
 from cachetools import TTLCache
-from services.network_utils import fetch_with_curl
+from services.network_utils import fetch_with_curl, outbound_user_agent
 from services.fetchers._store import latest_data, _data_lock, _mark_fresh
 from services.fetchers.retry import with_retry

@@ -29,7 +29,7 @@ def _geocode_region(region_name: str, country_name: str) -> tuple:

        query = urllib.parse.quote(f"{region_name}, {country_name}")
        url = f"https://nominatim.openstreetmap.org/search?q={query}&format=json&limit=1"
-        response = fetch_with_curl(url, timeout=8, headers={"User-Agent": "ShadowBroker-OSINT/1.0"})
+        response = fetch_with_curl(url, timeout=8, headers={"User-Agent": outbound_user_agent("infrastructure-data")})
        if response.status_code == 200:
            results = response.json()
            if results:
@@ -191,8 +191,13 @@ def fetch_meshtastic_nodes():
        _os.environ.get("MESHTASTIC_SEND_CALLSIGN_HEADER", "true")
    ).strip().lower() not in {"0", "false", "no", "off", ""}

-    from services.network_utils import DEFAULT_USER_AGENT
-    ua_base = f"{DEFAULT_USER_AGENT}; 24h polling"
+    # Round 7a: outbound_user_agent already includes the per-install handle.
+    # The optional Meshtastic callsign is appended as additional context so
+    # meshtastic.liamcottle.net's operator can identify both the install AND
+    # the registered radio operator (when MESHTASTIC_OPERATOR_CALLSIGN is set
+    # and MESHTASTIC_SEND_CALLSIGN_HEADER is true; see issue #203).
+    from services.network_utils import outbound_user_agent
+    ua_base = f"{outbound_user_agent('meshtastic-map')}; 24h polling"
    if callsign and send_callsign_header:
        user_agent = f"{ua_base}; node={callsign}"
    else:
@@ -17,6 +17,12 @@ from typing import Any

 import requests

+
+
+def _route_db_user_agent() -> str:
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("route-database")
+
 logger = logging.getLogger(__name__)

 _ROUTES_URL = "https://vrs-standing-data.adsb.lol/routes.csv.gz"
@@ -37,7 +43,7 @@ def _fetch_csv_gz(url: str) -> list[dict[str, str]]:
    response = requests.get(
        url,
        timeout=_HTTP_TIMEOUT_S,
-        headers={"User-Agent": _USER_AGENT, "Accept-Encoding": "gzip"},
+        headers={"User-Agent": _route_db_user_agent(), "Accept-Encoding": "gzip"},
    )
    response.raise_for_status()
    text = gzip.decompress(response.content).decode("utf-8-sig")
@@ -10,6 +10,12 @@ from datetime import datetime, timezone
 from services.fetchers._store import _data_lock, _mark_fresh, latest_data
 from services.network_utils import fetch_with_curl

+
+
+def _trains_user_agent() -> str:
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("trains")
+
 logger = logging.getLogger(__name__)

 _EARTH_RADIUS_KM = 6371.0
@@ -379,7 +385,7 @@ def _fetch_digitraffic() -> list[dict]:
            timeout=15,
            headers={
                "Accept-Encoding": "gzip",
-                "User-Agent": "ShadowBroker-OSINT/1.0",
+                "User-Agent": _trains_user_agent(),
            },
        )
        if resp.status_code != 200:
@@ -0,0 +1,457 @@
+"""USNI News Fleet & Marine Tracker — authoritative weekly carrier
+position publication.
+
+Why this exists
+---------------
+The previous carrier_tracker pipeline relied on GDELT headline matching
+(``api.gdeltproject.org``) to derive positions from text like "USS Ford
+in the Mediterranean" → centroid of "Mediterranean Sea". That was
+- low-precision (audit issue #245 — false precision from text mentions),
+- unreliable (``api.gdeltproject.org`` is sometimes unreachable from
+  certain network paths, including Docker Desktop on some Windows hosts).
+
+USNI publishes a weekly tracker that states where U.S. carriers are operating
+(carriers with no change may go unmentioned). Its phrasing is very consistent:
+
+    "The Gerald R. Ford Carrier Strike Group is operating in the Red Sea"
+    "Aircraft carrier USS George Washington (CVN-73) is in port in
+     Yokosuka, Japan."
+    "USS Dwight D. Eisenhower (CVN-69) sails down the Elizabeth River"
+
+Those are deterministic to parse. This module:
+
+  1. Pulls the WordPress RSS feeds (both site-wide and category) — the
+     site-wide feed often has fresher posts before the category feed
+     catches up, so we union them.
+  2. Picks the most recent post by parsed ``pubDate``.
+  3. For each carrier in the registry, scans the article body for a
+     "is operating in / is in port in / departed from" pattern near
+     the carrier's name.
+  4. Maps the extracted region phrase to coordinates via the carrier
+     tracker's existing REGION_COORDS.
+
+The result is a ``{hull: position_entry}`` dict that the carrier tracker
+consumes as a high-confidence source — ``position_confidence: "recent"``
+with ``position_source_at`` set to the article's actual publication
+timestamp (not ``now()``).
+
+Politeness
+----------
+We send the per-install operator handle via ``outbound_user_agent``
+(Round 7a) so USNI can rate-limit / contact the specific install if
+needed. Article-body pages return 403 to non-browser UAs (Cloudflare),
+but WordPress RSS feeds are open and serve the full article in
+``<content:encoded>`` — that's the supported path for aggregators and
+the one we use. We do not spoof browser headers.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+import defusedxml.ElementTree as ET  # untrusted RSS; matches other fetchers
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
+from typing import Iterable
+
+from services.network_utils import fetch_with_curl, outbound_user_agent
+
+logger = logging.getLogger(__name__)
+
+_RSS_URLS: tuple[str, ...] = (
+    # Site-wide feed often has the freshest posts before the category
+    # feed catches up. We try this first.
+    "https://news.usni.org/feed",
+    # Category feed has older fleet trackers for backfill.
+    "https://news.usni.org/category/fleet-tracker/feed",
+)
+
+_RSS_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}
+
+_FLEET_TRACKER_TITLE_RE = re.compile(
+    r"fleet\s+and\s+marine\s+tracker", re.IGNORECASE
+)
+
+_TAG_STRIP_RE = re.compile(r"<[^>]+>")
+_WHITESPACE_RE = re.compile(r"\s+")
+
+
+def _strip_html(html: str) -> str:
+    text = _TAG_STRIP_RE.sub(" ", html or "")
+    return _WHITESPACE_RE.sub(" ", text).strip()
+
+
+def _request_headers() -> dict[str, str]:
+    """Headers USNI's WordPress feed accepts from a legitimate aggregator.
+
+    The ``Referer`` is the category index page — that's where a real
+    feed reader navigates from. ``Accept`` prefers RSS, then generic XML,
+    and takes any other type as a last resort. No browser UA spoofing.
+    """
+    return {
+        "User-Agent": outbound_user_agent("usni-fleet-tracker"),
+        "Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.1",
+        "Accept-Language": "en-US,en;q=0.5",
+        "Referer": "https://news.usni.org/category/fleet-tracker",
+    }
+
+
+def _parse_pubdate(raw: str) -> datetime | None:
+    if not raw:
+        return None
+    try:
+        dt = parsedate_to_datetime(raw)
+        if dt.tzinfo is None:
+            dt = dt.replace(tzinfo=timezone.utc)
+        return dt
+    except (TypeError, ValueError):
+        return None
+
+
+def _iter_fleet_tracker_items(rss_urls: Iterable[str]) -> list[dict]:
+    """Pull every fleet-tracker post visible across the given RSS feeds.
+
+    De-duplicates by article link. Returns a list of dicts:
+        {"title", "link", "pub_date" (datetime), "body" (plain text)}
+    """
+    items_by_link: dict[str, dict] = {}
+    for url in rss_urls:
+        try:
+            r = fetch_with_curl(url, timeout=15, headers=_request_headers())
+        except Exception as exc:
+            logger.debug("USNI RSS %s exception: %s", url, exc)
+            continue
+        if not r or r.status_code != 200 or not r.text:
+            logger.debug(
+                "USNI RSS %s returned status=%s body=%d",
+                url,
+                getattr(r, "status_code", "?"),
+                len(getattr(r, "text", "") or ""),
+            )
+            continue
+        try:
+            root = ET.fromstring(r.text)
+        except ET.ParseError as exc:
+            logger.warning("USNI RSS parse error from %s: %s", url, exc)
+            continue
+        for item in root.findall(".//item"):
+            title = (item.findtext("title") or "").strip()
+            if not _FLEET_TRACKER_TITLE_RE.search(title):
+                continue
+            link = (item.findtext("link") or "").strip()
+            if not link or link in items_by_link:
+                continue
+            pub_dt = _parse_pubdate(item.findtext("pubDate") or "")
+            body_html = (
+                item.findtext("content:encoded", default="", namespaces=_RSS_NS)
+                or item.findtext("description", default="")
+                or ""
+            )
+            items_by_link[link] = {
+                "title": title,
+                "link": link,
+                "pub_date": pub_dt,
+                "body": _strip_html(body_html),
+            }
+    return list(items_by_link.values())
+
+
+# Map USNI region phrases to keys in carrier_tracker.REGION_COORDS.
+# The carrier_tracker table already covers most named bodies of water and
+# major ports — we just need to teach this module to RECOGNIZE the
+# specific phrases USNI's editorial style uses, which sometimes spell
+# the same body of water differently.
+_USNI_REGION_ALIASES: tuple[tuple[str, str], ...] = (
+    # USNI phrase (lowercase) -> REGION_COORDS key. First substring hit wins: specific first.
+    ("eastern mediterranean", "eastern mediterranean"),
+    ("western mediterranean", "western mediterranean"),
+    ("mediterranean sea", "mediterranean"),
+    ("the mediterranean", "mediterranean"),
+    ("red sea", "red sea"),
+    ("arabian sea area of responsibility", "arabian sea"),
+    ("north arabian sea", "north arabian sea"),
+    ("arabian sea", "arabian sea"),
+    ("persian gulf", "persian gulf"),
+    ("gulf of oman", "gulf of oman"),
+    ("strait of hormuz", "strait of hormuz"),
+    ("south china sea", "south china sea"),
+    ("east china sea", "east china sea"),
+    ("philippine sea", "philippine sea"),
+    ("sea of japan", "sea of japan"),
+    ("taiwan strait", "taiwan strait"),
+    ("western pacific", "western pacific"),
+    ("pacific ocean", "pacific"),
+    ("indian ocean", "indian ocean"),
+    ("north atlantic", "north atlantic"),
+    ("south atlantic", "south atlantic"),
+    ("western atlantic", "atlantic"),
+    ("eastern atlantic", "atlantic"),
+    ("atlantic ocean", "atlantic"),
+    ("gulf of aden", "gulf of aden"),
+    ("horn of africa", "horn of africa"),
+    ("bab el-mandeb", "bab el-mandeb"),
+    ("suez canal", "suez canal"),
+    ("baltic sea", "baltic sea"),
+    ("north sea", "north sea"),
+    ("black sea", "black sea"),
+    ("coral sea", "coral sea"),
+    ("gulf of mexico", "gulf of mexico"),
+    ("caribbean sea", "caribbean"),
+    ("caribbean", "caribbean"),
+    # Specific ports
+    ("naval station norfolk", "norfolk"),
+    ("norfolk naval shipyard", "norfolk"),  # NNSY is in Portsmouth, Va., next to Norfolk
+    ("newport news shipbuilding", "newport news"),
+    ("newport news", "newport news"),
+    # USNI tags Norfolk mentions with state suffix; match both.
+    ("norfolk, va", "norfolk"),
+    ("norfolk", "norfolk"),
+    ("naval station everett", "puget sound"),
+    ("naval base kitsap", "bremerton"),
+    ("bremerton", "bremerton"),
+    ("puget sound", "puget sound"),
+    ("naval base san diego", "san diego"),
+    ("san diego, calif", "san diego"),
+    ("san diego", "san diego"),
+    ("yokosuka, japan", "yokosuka"),
+    ("yokosuka", "yokosuka"),
+    ("pearl harbor", "pearl harbor"),
+    ("apra harbor, guam", "guam"),
+    ("guam", "guam"),
+    ("bahrain", "bahrain"),
+    ("naval station rota", "rota"),
+    ("rota, spain", "rota"),
+    ("naples, italy", "naples"),
+    # Fleets / AORs
+    ("5th fleet", "5th fleet"),
+    ("6th fleet", "6th fleet"),
+    ("7th fleet", "7th fleet"),
+    ("3rd fleet", "3rd fleet"),
+    ("2nd fleet", "2nd fleet"),
+    ("centcom", "centcom"),
+    ("indo-pacific command", "indopacom"),
+    ("eucom", "eucom"),
+    ("southcom", "southcom"),
+)
+
+
+def _resolve_region_phrase(phrase: str) -> tuple[str, str] | None:
+    """Map a USNI region phrase to a ``(canonical_key, display)`` tuple,
+    or ``None`` if we don't recognize it.
+
+    ``canonical_key`` is what ``carrier_tracker.REGION_COORDS`` keys on.
+    ``display`` is the phrase we'll show in the dossier description.
+    """
+    p = (phrase or "").lower().strip()
+    if not p:
+        return None
+    for usni_phrase, canonical in _USNI_REGION_ALIASES:
+        if usni_phrase in p:
+            return canonical, usni_phrase
+    return None
+
+
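`_resolve_region_phrase` runs a substring test and returns on the first hit, so when aliases overlap, table order decides the result. The sketch below reproduces that lookup on a hypothetical five-entry table (not the module's real table). `"South Atlantic Ocean"` resolves to the broader `"atlantic"` key here because `"atlantic ocean"` is listed before `"south atlantic"`.

```python
from typing import Optional, Tuple

# Hypothetical miniature alias table: (substring, canonical REGION_COORDS key).
ALIASES: Tuple[Tuple[str, str], ...] = (
    ("naval station norfolk", "norfolk"),
    ("norfolk, va", "norfolk"),
    ("norfolk", "norfolk"),
    ("atlantic ocean", "atlantic"),
    ("south atlantic", "south atlantic"),
)


def resolve(phrase: str, aliases: Tuple[Tuple[str, str], ...] = ALIASES) -> Optional[Tuple[str, str]]:
    """Return (canonical_key, matched_alias) for the first alias found in phrase."""
    p = (phrase or "").lower().strip()
    if not p:
        return None
    for alias, canonical in aliases:
        if alias in p:  # substring test: the first hit in table order wins
            return canonical, alias
    return None
```

Listing longer, more specific aliases ahead of the shorter ones they contain avoids this kind of shadowing.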
+# Operating-verb phrases USNI uses, with a capture group for the region
+# phrase that immediately follows. Each pattern is designed to swallow
+# the optional editorial filler that often appears between verb and
+# location (e.g. "returned Friday to Norfolk" — "Friday" goes in the
+# filler; "Norfolk" is the location).
+#
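+# Order matters: for each mention window the patterns are tried in tuple
+# order and the first one that matches wins, so the position-bearing
+# phrasings ("is operating in", "is in port in") come before weaker hints
+# such as "is homeported at".
+# Optional filler word such as "Friday". The patterns compile with
+# re.IGNORECASE, so [A-Z][a-z]+ matches ANY word of two or more letters,
+# not just weekday names.
+_DAY_FILLER = r"(?:[A-Z][a-z]+(?:day)?,?\s+)?"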
+_LOC_CAPTURE = r"([A-Za-z][A-Za-z0-9\s,\.\-']{2,80})"
+
+_OPERATING_PATTERNS: tuple[re.Pattern, ...] = (
+    # "is operating in [the] {REGION}" / "is also operating in [the] {REGION}"
+    re.compile(r"\bis\s+(?:also\s+|now\s+)?operating\s+in\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    # "is conducting <stuff> in [the] {REGION}"
+    re.compile(r"\bis\s+conducting\s+[A-Za-z0-9\-\s]{2,40}\s+in\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    # "is in port in {LOCATION}"
+    re.compile(r"\bis\s+in\s+port\s+in\s+" + _LOC_CAPTURE, re.IGNORECASE),
+    # "is in port" (no location — degenerate, use carrier's homeport via separate path)
+    # → not captured here; falls through to homeport
+    # "is underway in [the] {REGION}"
+    re.compile(r"\bis\s+underway\s+in\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    # "is deployed to [the] {REGION}" / "deployed in"
+    re.compile(r"\bis\s+deployed\s+(?:to|in)\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    # "returned [Day] to {LOCATION}" / "returned [Day] from {REGION}"
+    re.compile(r"\breturned\s+" + _DAY_FILLER + r"to\s+" + _LOC_CAPTURE, re.IGNORECASE),
+    re.compile(r"\breturned\s+" + _DAY_FILLER + r"from\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    # "arrived [Day] in/at {LOCATION}"
+    re.compile(r"\barrived\s+" + _DAY_FILLER + r"(?:in|at)\s+" + _LOC_CAPTURE, re.IGNORECASE),
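+    # "departed [Day] from {LOCATION}" / "departed {LOCATION}". The filler is
+    # grouped with "from" so it cannot swallow the location itself
+    # ("departed Norfolk on Monday" must capture "Norfolk ...", not "on Monday").
+    re.compile(r"\bdeparted\s+(?:" + _DAY_FILLER + r"from\s+)?" + _LOC_CAPTURE, re.IGNORECASE),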
+    # "transiting [the] {REGION}" / "sailing through [the] {REGION}"
+    re.compile(r"\btransiting\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    re.compile(r"\bsailing\s+through\s+(?:the\s+)?" + _LOC_CAPTURE, re.IGNORECASE),
+    # "is homeported at {LOCATION}"
+    re.compile(r"\bis\s+homeported\s+at\s+" + _LOC_CAPTURE, re.IGNORECASE),
+)
+
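When an optional filler word is followed by another optional token, a filler that accepts any word can consume the location itself. The sketch below compiles two hypothetical variants of a "departed" pattern from the same filler and capture shapes used above: `LOOSE` makes the filler and `from` independently optional, while `GROUPED` only allows the filler when `from` follows it.

```python
import re

# Same filler and capture shapes as the module's _DAY_FILLER / _LOC_CAPTURE.
FILLER = r"(?:[A-Z][a-z]+(?:day)?,?\s+)?"
LOC = r"([A-Za-z][A-Za-z0-9\s,\.\-']{2,80})"

# Filler and "from" independently optional: under IGNORECASE the filler
# can eat "Norfolk " and leave only "on Monday" for the capture group.
LOOSE = re.compile(r"\bdeparted\s+" + FILLER + r"(?:from\s+)?" + LOC, re.IGNORECASE)
# Filler only allowed when "from" follows it.
GROUPED = re.compile(r"\bdeparted\s+(?:" + FILLER + r"from\s+)?" + LOC, re.IGNORECASE)


def capture(pattern: re.Pattern, sentence: str) -> str:
    """Return the location capture group, or "" if the pattern doesn't match."""
    m = pattern.search(sentence)
    return m.group(1) if m else ""
```

Both variants agree on the "departed Friday from X" form; they differ only when no `from` follows the verb.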
+
+def _extract_region_for_carrier(
+    body: str,
+    carrier_names: list[str],
+    hull_code: str,
+) -> str | None:
+    """Return the best-guess region phrase for one carrier from the
+    article body, or None if no confident match.
+
+    Algorithm:
+      1. Find every mention of the carrier (any name variant or the hull
+         code) in the body.
+      2. For each mention, look in the ~300-char window AFTER it for any
+         of the operating-verb patterns.
+      3. Return the first hit. If a more-confident match later turns up
+         (e.g. "is operating in the X" beats "is homeported at Y"), the
+         first one in document order still wins — USNI's structure puts
+         the position-update sentence near the top of each carrier's
+         section, and the homeport mention later.
+    """
+    # Build a master mention regex covering every name variant + the hull.
+    candidates: list[str] = []
+    for name in carrier_names:
+        if name and len(name) >= 4:
+            candidates.append(re.escape(name))
+    if hull_code:
+        candidates.append(re.escape(hull_code))
+    if not candidates:
+        return None
+    mention_re = re.compile(r"\b(?:" + "|".join(candidates) + r")\b", re.IGNORECASE)
+
+    window_chars = 320
+    for mention in mention_re.finditer(body):
+        end = mention.end()
+        window = body[end : end + window_chars]
+        # Cut the window at the FIRST sentence break for tighter context.
+        # A break is ".", "!" or "?" followed by whitespace (including a
+        # newline) and an uppercase letter. A mid-sentence "Norfolk, Va., ..."
+        # survives because that period is followed by a comma, and a sentence
+        # that ends in "Va." keeps its punctuation because the slice includes
+        # it. Abbreviations such as "U.S. Navy" do trigger an early cut.
+        sent_break = re.search(r"[\.!?]\s+[A-Z]", window)
+        if sent_break:
+            window = window[: sent_break.start() + 1]
+        # Try patterns in priority order; the first phrase found wins.
+        for pat in _OPERATING_PATTERNS:
+            m = pat.search(window)
+            if not m:
+                continue
+            phrase = m.group(1).strip().rstrip(",.;: ")
+            if not phrase:
+                continue
+            # Strip trailing editorial filler. USNI often writes
+            # "Norfolk, Va., according to ship spotters" or
+            # "Yokosuka, Japan, according to..."
+            phrase = re.split(
+                r",\s+(?:according|as of|for|while|where|in support|in the)",
+                phrase,
+                maxsplit=1,
+            )[0].strip()
+            return phrase
+    return None
+
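The window handling in `_extract_region_for_carrier` has three steps: trim to the first sentence break, match an operating-verb pattern, then strip trailing editorial filler. A self-contained sketch of those steps, using only the "is in port in" pattern for brevity (the helper names are hypothetical):

```python
import re
from typing import Optional

_LOC = r"([A-Za-z][A-Za-z0-9\s,\.\-']{2,80})"
_IN_PORT = re.compile(r"\bis\s+in\s+port\s+in\s+" + _LOC, re.IGNORECASE)
_FILLER = re.compile(r",\s+(?:according|as of|for|while|where|in support|in the)")


def trim_to_sentence(window: str) -> str:
    """Cut at the first '.', '!' or '?' followed by whitespace and an uppercase letter."""
    brk = re.search(r"[.!?]\s+[A-Z]", window)
    return window[: brk.start() + 1] if brk else window


def in_port_phrase(window: str) -> Optional[str]:
    """Return the cleaned location after 'is in port in', or None."""
    m = _IN_PORT.search(trim_to_sentence(window))
    if not m:
        return None
    phrase = m.group(1).strip().rstrip(",.;: ")
    # Drop trailing editorial filler such as ", according to ...".
    return _FILLER.split(phrase, maxsplit=1)[0].strip()
```

The greedy capture class allows commas and periods, which is why the sentence trim and the filler split are both needed to stop at the place name.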
+
+def fetch_latest_fleet_tracker_positions(
+    carrier_registry: dict | None = None,
+    region_coords: dict | None = None,
+) -> dict[str, dict]:
+    """Return ``{hull: position_entry}`` for the latest USNI fleet tracker.
+
+    Entries look like::
+
+        {
+          "lat": 18.0, "lng": 39.5, "heading": 0,
+          "desc": "Red Sea (USNI May 18, 2026)",
+          "source": "USNI News Fleet & Marine Tracker (May 18, 2026)",
+          "source_url": "https://news.usni.org/2026/05/18/...",
+          "position_source_at": "2026-05-18T18:58:44+00:00",
+          "position_confidence": "recent",
+        }
+
+    Carriers whose section can't be parsed (e.g. an off-week with no
+    mention) are simply absent from the result — the caller keeps
+    whatever position they had before.
+
+    ``carrier_registry`` and ``region_coords`` default to the carrier_tracker
+    module's own tables; passed in here for testability.
+    """
+    if carrier_registry is None or region_coords is None:
+        from services.carrier_tracker import CARRIER_REGISTRY, REGION_COORDS
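+        # Explicit None checks so an intentionally empty test table isn't
+        # silently swapped for the real one.
+        if carrier_registry is None:
+            carrier_registry = CARRIER_REGISTRY
+        if region_coords is None:
+            region_coords = REGION_COORDS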
+
+    items = _iter_fleet_tracker_items(_RSS_URLS)
+    if not items:
+        logger.warning("USNI fleet-tracker: no parseable RSS items")
+        return {}
+
+    # Pick the most recent by parsed pubDate. Items without a parseable
+    # date fall to the back of the list.
+    items.sort(
+        key=lambda it: it["pub_date"] or datetime(1970, 1, 1, tzinfo=timezone.utc),
+        reverse=True,
+    )
+    latest = items[0]
+
+    pub_dt: datetime | None = latest["pub_date"]
+    pub_iso = pub_dt.isoformat() if pub_dt else ""
+    pub_human = pub_dt.strftime("%b %d, %Y") if pub_dt else "unknown date"
+
+    body = latest["body"]
+    if not body:
+        logger.warning("USNI fleet-tracker: latest item has empty body")
+        return {}
+
+    positions: dict[str, dict] = {}
+    for hull, info in carrier_registry.items():
+        # Build name variants we'll try in the body.
+        full_name = info["name"]                       # "USS Gerald R. Ford (CVN-78)"
+        without_hull = full_name.split("(")[0].strip() # "USS Gerald R. Ford"
+        last_word = without_hull.split()[-1]            # "Ford"
+        ship_only = without_hull.removeprefix("USS ")   # "Gerald R. Ford"
+
+        # Variants ordered most-specific first.
+        variants: list[str] = []
+        for v in (without_hull, f"USS {ship_only}", ship_only, last_word):
+            if v and v not in variants and len(v) >= 4:
+                variants.append(v)
+
+        phrase = _extract_region_for_carrier(body, variants, hull)
+        if not phrase:
+            continue
+        resolved = _resolve_region_phrase(phrase)
+        if not resolved:
+            logger.debug(
+                "USNI: %s region phrase %r did not match any known region",
+                hull, phrase,
+            )
+            continue
+        canonical_key, display_phrase = resolved
+        coords = region_coords.get(canonical_key)
+        if not coords:
+            continue
+
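+        # str.title() would render "5th fleet" as "5Th Fleet"; only capitalize
+        # letters at the start or after a space or hyphen.
+        display = re.sub(
+            r"(^|[\s\-])([a-z])",
+            lambda mm: mm.group(1) + mm.group(2).upper(),
+            display_phrase,
+        )
+        positions[hull] = {
+            "lat": coords[0],
+            "lng": coords[1],
+            "heading": 0,
+            "desc": f"{display} (USNI {pub_human})",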
+            "source": f"USNI News Fleet & Marine Tracker ({pub_human})",
+            "source_url": latest["link"],
+            "position_source_at": pub_iso,
+            "position_confidence": "recent",
+        }
+
+    if positions:
+        logger.info(
+            "USNI fleet-tracker: parsed %d/%d carrier positions from %s",
+            len(positions), len(carrier_registry), latest["link"],
+        )
+    else:
+        logger.warning(
+            "USNI fleet-tracker: latest article %s yielded zero parseable carriers",
+            latest["link"],
+        )
+    return positions
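RSS 2.0 specifies RFC 822 dates for `pubDate`. `_iter_fleet_tracker_items` is not part of this excerpt, so how it parses them is not visible here. The sketch below assumes one standard-library approach. It matters for the sort above: `parsedate_to_datetime` returns a naive datetime for the `-0000` zone, and comparing a naive value with the aware epoch fallback raises `TypeError`. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_pub_date(raw: str) -> Optional[datetime]:
    """Parse an RFC 822 pubDate into an aware datetime, or None if unparseable."""
    try:
        dt = parsedate_to_datetime(raw)
    except (TypeError, ValueError):  # older Pythons raise TypeError, 3.10+ ValueError
        return None
    if dt.tzinfo is None:
        # "-0000" (zone unknown) yields a naive datetime; pin it to UTC so it
        # can be compared with aware values and the epoch fallback.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def pick_latest(items: list) -> dict:
    """Most recent item by pub_date; undated items sort as the epoch."""
    return max(items, key=lambda it: it["pub_date"] or _EPOCH)
```

Using `max` instead of sorting the whole list gives the same "latest" item without reordering it.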
@@ -21,9 +21,17 @@ _cache_lock = threading.Lock()
 _local_search_cache: List[Dict[str, Any]] | None = None
 _local_search_lock = threading.Lock()

-_USER_AGENT = os.environ.get(
-    "NOMINATIM_USER_AGENT", "ShadowBroker/1.0 (https://github.com/BigBodyCobain/Shadowbroker)"
-)
+# Round 7a: per-install operator handle threads through every Nominatim
+# call. NOMINATIM_USER_AGENT env override is still honored for operators
+# who run a custom relay / known good identity, but the default uses the
+# per-install handle so OpenStreetMap can rate-limit per install instead
+# of treating "Shadowbroker" as one big offender.
+def _nominatim_user_agent() -> str:
+    override = os.environ.get("NOMINATIM_USER_AGENT", "").strip()
+    if override:
+        return override
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("nominatim")


 def _get_cache(key: str):
@@ -178,7 +186,7 @@ def search_geocode(query: str, limit: int = 5, local_only: bool = False) -> List
        res = fetch_with_curl(
            url,
            headers={
-                "User-Agent": _USER_AGENT,
+                "User-Agent": _nominatim_user_agent(),
                "Accept-Language": "en",
            },
            timeout=6,
@@ -241,7 +249,7 @@ def reverse_geocode(lat: float, lng: float, local_only: bool = False) -> Dict[st
        res = fetch_with_curl(
            url,
            headers={
-                "User-Agent": _USER_AGENT,
+                "User-Agent": _nominatim_user_agent(),
                "Accept-Language": "en",
            },
            timeout=6,
@@ -8,6 +8,13 @@ from datetime import datetime
 from urllib.parse import urljoin, urlparse
 from services.network_utils import fetch_with_curl

+
+
+def _geopolitics_user_agent() -> str:
+    """Round 7a: GDELT geopolitics fetcher attribution."""
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("geopolitics-gdelt")
+
 logger = logging.getLogger(__name__)

 # Cache Frontline data for 30 minutes, it doesn't move that fast
@@ -316,7 +323,7 @@ def _fetch_article_title(url):
            resp = requests.get(
                current_url,
                timeout=4,
-                headers={"User-Agent": "Mozilla/5.0 (compatible; OSINT Dashboard/1.0)"},
+                headers={"User-Agent": _geopolitics_user_agent()},
                stream=True,
                allow_redirects=False,
            )
@@ -521,10 +528,29 @@ def _parse_gdelt_export_zip(zip_bytes, conflict_codes, seen_locs, features, loc_
        logger.warning(f"Failed to parse GDELT export zip: {e}")


+# GDELT's data.gdeltproject.org is a CNAME to a Google Cloud Storage
+# bucket of the same name. GCS returns the wildcard ``*.storage.googleapis.com``
+# certificate, which legitimately does NOT cover the GDELT custom domain
+# — Python's TLS verification correctly refuses it. Some networks/POPs
+# happen to route through a path where this works; many do not (notably
+# Docker Desktop's outbound NAT on local installs).
+#
+# Fix: rewrite the URL to hit GCS directly with a path-style bucket
+# reference, where the standard GCS cert is genuinely valid. Same data,
+# verified TLS, no operator-side workaround needed.
+def _gcs_direct_gdelt_url(url: str) -> str:
+    """If ``url`` points at data.gdeltproject.org, return the equivalent
+    GCS-direct URL. Otherwise return the URL unchanged."""
+    prefix = "://data.gdeltproject.org/"
+    if prefix in url:
+        return url.replace(prefix, "://storage.googleapis.com/data.gdeltproject.org/", 1)
+    return url
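`_gcs_direct_gdelt_url` matches on a substring, so it would also rewrite a URL where `://data.gdeltproject.org/` appears only inside a query string. The callers in this diff only pass GDELT URLs, so this is hardening rather than a bug fix. A sketch that compares the parsed hostname instead (the name `gcs_direct_url` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

GDELT_HOST = "data.gdeltproject.org"


def gcs_direct_url(url: str) -> str:
    """Rewrite data.gdeltproject.org URLs to path-style GCS URLs; leave others alone."""
    parts = urlsplit(url)
    if parts.hostname != GDELT_HOST:
        return url
    # Path-style GCS reference: https://storage.googleapis.com/<bucket>/<object>
    return urlunsplit(
        (parts.scheme, "storage.googleapis.com", f"/{GDELT_HOST}{parts.path}", parts.query, parts.fragment)
    )
```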
+
+
 def _download_gdelt_export(url):
    """Download a single GDELT export file, return bytes or None."""
    try:
-        res = fetch_with_curl(url, timeout=15)
+        res = fetch_with_curl(_gcs_direct_gdelt_url(url), timeout=15)
        if res.status_code == 200:
            return res.content
    except (ConnectionError, TimeoutError, OSError):  # non-critical
@@ -620,8 +646,12 @@ def fetch_global_military_incidents():
        # HTTPS is used to prevent passive network observers from injecting
        # poisoned export records into the global incident map via MITM.
        # GDELT serves the same content over HTTPS as HTTP.
+        # Use the GCS-direct URL because data.gdeltproject.org's CNAME
+        # serves a wildcard *.storage.googleapis.com cert that legitimately
+        # doesn't cover the GDELT hostname. See _gcs_direct_gdelt_url above.
        index_res = fetch_with_curl(
-            "https://data.gdeltproject.org/gdeltv2/lastupdate.txt", timeout=10
+            _gcs_direct_gdelt_url("https://data.gdeltproject.org/gdeltv2/lastupdate.txt"),
+            timeout=10,
        )
        if index_res.status_code != 200:
            logger.error(f"GDELT lastupdate failed: {index_res.status_code}")
@@ -5,7 +5,9 @@ import subprocess
 import shutil
 import time
 import threading
+import uuid
 import requests
+from pathlib import Path
 from urllib.parse import urlparse
 from requests.adapters import HTTPAdapter
 from urllib3.util.retry import Retry
@@ -20,14 +22,211 @@ _session.mount("https://", HTTPAdapter(max_retries=_retry, pool_maxsize=20))
 _session.mount("http://", HTTPAdapter(max_retries=_retry, pool_maxsize=10))


-# Default outbound User-Agent. Generic by design — does NOT include any
-# personal contact info or a fork-specific repo URL. Operators who run a
-# public-facing relay and want to identify themselves to upstreams (e.g.
-# for Nominatim / weather.gov usage-policy compliance) can override this
-# via the SHADOWBROKER_USER_AGENT env var.
+# ---------------------------------------------------------------------------
+# Per-operator outbound identification
+# ---------------------------------------------------------------------------
+#
+# Issues #289 / #290 / #291 and the retrofit of PR #284 (#218 / #219 / #220):
+# every third-party API the backend calls used to identify itself with a
+# single "Shadowbroker" aggregate User-Agent. From the upstream's
+# perspective, that meant every Shadowbroker install in the world looked
+# like one giant entity hammering them. If one install misbehaved, the
+# upstream's only recourse was to block "Shadowbroker" as a whole — which
+# would take out every other install too.
+#
+# Fix: give each install a stable pseudonymous handle and include it in
+# the User-Agent. Now an upstream can rate-limit or block the offending
+# operator without affecting anyone else.
+#
+# The handle:
+#
+# - Is auto-generated on first call if no `OPERATOR_HANDLE` is configured
+#   (looks like "operator-7f3a92" — 6 hex chars from uuid4()).
+# - Is persisted to ``backend/data/operator_handle.json`` so it survives
+#   restarts. Under Docker compose that file lives in the volume mount
+#   alongside `carrier_cache.json` and the other persistent state.
+# - Can be overridden by the operator via the `OPERATOR_HANDLE` setting
+#   (env var or settings UI). Operators with their own GitHub handle,
+#   organization name, etc. can use that for traceability.
+# - Is NEVER mixed into mesh / Wormhole / Infonet identity. This layer is
+#   strictly for public third-party API attribution.
+
+_SHADOWBROKER_VERSION = "0.9"
+_OPERATOR_HANDLE_FILE = (
+    Path(__file__).parent.parent / "data" / "operator_handle.json"
+)
+_OPERATOR_HANDLE_CACHE: str = ""
+_OPERATOR_HANDLE_LOCK = threading.Lock()
+
+
+def _generate_operator_handle() -> str:
+    """Produce a stable pseudonymous handle for first-launch installs.
+
+    Format: ``operator-7f3a92`` (6 hex chars from a fresh uuid4()).
+    Distinct per install. Carries no real-world identity by default —
+    operators who want one can override via ``OPERATOR_HANDLE``.
+
+    Note: the prefix is deliberately neutral. Earlier drafts used
+    ``shadow-`` which, while accurate to the project name, looks
+    exactly like the kind of pattern a third-party abuse-detection
+    system would auto-block as suspicious. ``operator-`` describes
+    what the value actually is and doesn't pattern-match malware.
+    """
+    return f"operator-{uuid.uuid4().hex[:6]}"
+
+
+def _load_persisted_operator_handle() -> str:
+    """Return the previously-saved handle from disk, or empty if none.
+
+    Reads ``backend/data/operator_handle.json`` if it exists. Any read
+    error returns empty so a fresh handle gets generated rather than
+    crashing the request.
+    """
+    try:
+        if _OPERATOR_HANDLE_FILE.exists():
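+            data = json.loads(_OPERATOR_HANDLE_FILE.read_text(encoding="utf-8"))
+            # Valid JSON that isn't an object (e.g. a bare list) counts as
+            # "no handle" instead of raising AttributeError on .get().
+            if isinstance(data, dict):
+                return str(data.get("handle", "") or "").strip()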
+    except (OSError, json.JSONDecodeError, ValueError):
+        pass
+    return ""
+
+
+def _persist_operator_handle(handle: str) -> None:
+    """Atomically save the auto-generated handle so subsequent restarts
+    use the same one. Failure to persist is non-fatal — the request still
+    succeeds with the in-memory handle, we just may generate a different
+    one on the next process restart."""
+    try:
+        _OPERATOR_HANDLE_FILE.parent.mkdir(parents=True, exist_ok=True)
+        tmp = _OPERATOR_HANDLE_FILE.with_suffix(_OPERATOR_HANDLE_FILE.suffix + ".tmp")
+        tmp.write_text(
+            json.dumps({"handle": handle, "_meta": {
+                "purpose": "Per-install operator handle for outbound third-party API attribution.",
+                "see": "backend/services/network_utils.py:outbound_user_agent",
+            }}, indent=2),
+            encoding="utf-8",
+        )
+        os.replace(tmp, _OPERATOR_HANDLE_FILE)
+    except OSError as exc:
+        logger.debug("Could not persist operator_handle (continuing in-memory): %s", exc)
+
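The persist/load pair above follows the write-to-temp-then-rename pattern. A minimal stand-alone sketch of the same pattern (function names hypothetical). It also treats valid JSON that is not an object, such as a bare list, as "no handle", since calling `.get()` on it would fail.

```python
import json
import os
from pathlib import Path


def save_handle_atomically(path: Path, handle: str) -> None:
    """Write {"handle": ...} to path via a sibling temp file plus os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps({"handle": handle}, indent=2), encoding="utf-8")
    # os.replace overwrites an existing destination on POSIX and Windows;
    # on POSIX the rename is atomic, so readers never see a half-written file.
    os.replace(tmp, path)


def load_handle(path: Path) -> str:
    """Return the saved handle, or "" if missing, unreadable or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return ""
    if not isinstance(data, dict):
        return ""
    return str(data.get("handle", "") or "").strip()
```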
+
+def get_operator_handle() -> str:
+    """Return the stable per-install operator handle.
+
+    Resolution order:
+      1. ``OPERATOR_HANDLE`` setting (env var / settings UI) if non-empty.
+      2. Process-cached value from previous call this run.
+      3. Value persisted to ``operator_handle.json`` (from a previous run).
+      4. Newly generated pseudonymous handle, persisted to disk.
+
+    The handle is normalized: stripped of whitespace, lowercased,
+    non-alphanumeric chars (except ``-`` and ``_``) replaced with ``-``.
+    This both sanitizes any HTTP-header-unsafe characters AND prevents
+    the operator from impersonating real third-party projects via
+    inventive whitespace.
+    """
+    global _OPERATOR_HANDLE_CACHE
+    with _OPERATOR_HANDLE_LOCK:
+        # 1. Configured override always wins.
+        configured = ""
+        try:
+            from services.config import get_settings
+
+            configured = str(getattr(get_settings(), "OPERATOR_HANDLE", "") or "").strip()
+        except Exception:
+            configured = ""
+        if configured:
+            return _normalize_handle(configured)
+
+        # 2. In-memory cache (fast path for repeated calls).
+        if _OPERATOR_HANDLE_CACHE:
+            return _OPERATOR_HANDLE_CACHE
+
+        # 3. On-disk handle from a previous run.
+        persisted = _load_persisted_operator_handle()
+        if persisted:
+            _OPERATOR_HANDLE_CACHE = _normalize_handle(persisted)
+            return _OPERATOR_HANDLE_CACHE
+
+        # 4. Generate, persist, return.
+        fresh = _generate_operator_handle()
+        _persist_operator_handle(fresh)
+        _OPERATOR_HANDLE_CACHE = fresh
+        return fresh
+
+
+def _normalize_handle(raw: str) -> str:
+    """Strip whitespace, lowercase, replace unsafe characters with dashes."""
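+    safe = "".join(
+        # ASCII only: str.isalnum() also accepts letters such as CJK, which
+        # http.client cannot latin-1 encode into a header value.
+        ch if ((ch.isascii() and ch.isalnum()) or ch in "-_") else "-"
+        for ch in raw.strip().lower()
+    )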
+    # Collapse runs of dashes and trim to a reasonable length so an
+    # operator can't make our outbound logs unreadable.
+    while "--" in safe:
+        safe = safe.replace("--", "-")
+    safe = safe.strip("-")
+    return safe[:48] if safe else "anonymous"
+
+
+_CONTACT_URL = "https://github.com/BigBodyCobain/Shadowbroker/issues"
+
+
+def outbound_user_agent(purpose: str = "") -> str:
+    """Build a User-Agent for an outbound third-party HTTP request.
+
+    Returns something like::
+
+        Shadowbroker/0.9 (operator: operator-7f3a92; purpose: wikimedia;
+         +https://github.com/BigBodyCobain/Shadowbroker/issues)
+
+    The ``purpose`` is optional but recommended: it tells the upstream
+    what feature of ours is making the call (``wikimedia``, ``openmhz``,
+    ``nominatim``, etc.), which makes their logs, and any complaints they
+    send us, actionable.
+
+    Every outbound call in the backend that previously sent a custom
+    User-Agent should call this helper instead. Centralizing here means:
+      - one place to change the contact URL,
+      - one place to bump the version on release,
+      - one place a Wikimedia / OpenMHz operator can reach to ask for
+        the project to back off, with a per-install handle so they can
+        target the specific install instead of the project as a whole.
+    """
+    handle = get_operator_handle()
+    if purpose:
+        purpose_clean = _normalize_handle(purpose)
+        return (
+            f"Shadowbroker/{_SHADOWBROKER_VERSION} "
+            f"(operator: {handle}; purpose: {purpose_clean}; +{_CONTACT_URL})"
+        )
+    return (
+        f"Shadowbroker/{_SHADOWBROKER_VERSION} "
+        f"(operator: {handle}; +{_CONTACT_URL})"
+    )
+
+
+def _reset_operator_handle_cache_for_tests() -> None:
+    """Test-only: invalidate the in-memory cache so a test can set a
+    new ``OPERATOR_HANDLE`` env var and see it picked up immediately."""
+    global _OPERATOR_HANDLE_CACHE
+    with _OPERATOR_HANDLE_LOCK:
+        _OPERATOR_HANDLE_CACHE = ""
+
+
+# Default outbound User-Agent. Retained for backwards compatibility with
+# call sites that haven't been migrated to ``outbound_user_agent()`` yet.
+# Operators who want full per-install attribution should set the
+# ``OPERATOR_HANDLE`` setting and migrate call sites incrementally.
+#
+# Operators who run a public-facing relay can also override the whole UA
+# string via the ``SHADOWBROKER_USER_AGENT`` env var. That override
+# completely bypasses the per-operator helper; only use it if you know
+# what you're doing.
 DEFAULT_USER_AGENT = os.environ.get(
    "SHADOWBROKER_USER_AGENT",
-    "ShadowBroker-OSINT/0.9",
+    f"Shadowbroker/{_SHADOWBROKER_VERSION}",
 )

 # Find bash for curl fallback — Git bash's curl has the TLS features
@@ -2,14 +2,34 @@ import requests
 from bs4 import BeautifulSoup
 import logging
 from cachetools import cached, TTLCache
-import cloudscraper
 import reverse_geocoder as rg
 from urllib.parse import urlparse

+from services.network_utils import outbound_user_agent
+
 logger = logging.getLogger(__name__)

 _OPENMHZ_AUDIO_HOSTS = {"media.openmhz.com", "media2.openmhz.com", "media3.openmhz.com"}

+
+# Round 7a / Issues #289, #290, #291 (tg12 audit):
+# We previously sent a spoofed Chrome User-Agent and (for OpenMHz) used
+# cloudscraper to bypass anti-bot challenges. Both are dishonest and ToS-
+# unfriendly. We now send the per-install Shadowbroker UA — the upstream
+# can identify us, rate-limit us per install, and contact us if needed.
+#
+# If the upstream actively blocks our honest UA, the feature degrades
+# gracefully (returns an empty list / cached results) rather than
+# escalating to deception.
+
+
+def _broadcastify_user_agent() -> str:
+    return outbound_user_agent("broadcastify")
+
+
+def _openmhz_user_agent() -> str:
+    return outbound_user_agent("openmhz")
+
 # Cache the top feeds for 5 minutes so we don't hammer Broadcastify
 radio_cache = TTLCache(maxsize=1, ttl=300)

@@ -22,8 +42,12 @@ def get_top_broadcastify_feeds():
    """
    logger.info("Scraping Broadcastify Top Feeds (Cache Miss)")
    headers = {
-        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
-        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
+        # Issue #289 (tg12) + Round 7a: identify ourselves honestly as a
+        # per-install Shadowbroker scraper. Broadcastify can rate-limit
+        # us per install or block us; either way we stop pretending to be
+        # a browser. If they block, the panel degrades gracefully.
+        "User-Agent": _broadcastify_user_agent(),
+        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }

@@ -89,21 +113,32 @@ openmhz_systems_cache = TTLCache(maxsize=1, ttl=3600)

@cached(openmhz_systems_cache)
 def get_openmhz_systems():
-    """Fetches the full directory of OpenMHZ systems."""
-    logger.info("Scraping OpenMHZ Systems (Cache Miss)")
-    scraper = cloudscraper.create_scraper(
-        browser={"browser": "chrome", "platform": "windows", "desktop": True}
-    )
+    """Fetches the full directory of OpenMHZ systems.

+    Issue #290 (tg12) + Round 7a: replaced cloudscraper-based Chrome
+    impersonation with an honest per-install Shadowbroker User-Agent.
+    If OpenMHz's Cloudflare layer blocks honest traffic, we accept
+    that degradation (return empty list) rather than spoof a browser.
+    """
+    logger.info("Fetching OpenMHZ Systems (Cache Miss)")
    try:
-        res = scraper.get("https://api.openmhz.com/systems", timeout=15)
+        res = requests.get(
+            "https://api.openmhz.com/systems",
+            timeout=15,
+            headers={"User-Agent": _openmhz_user_agent(), "Accept": "application/json"},
+        )
        if res.status_code == 200:
            data = res.json()
-            # Return list of systems
            return data.get("systems", []) if isinstance(data, dict) else []
+        if res.status_code in (403, 503):
+            logger.warning(
+                "OpenMHZ returned %s for systems directory — Cloudflare may "
+                "be blocking our honest UA. Feature degrades to empty result.",
+                res.status_code,
+            )
        return []
    except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
-        logger.error(f"OpenMHZ Systems Scrape Exception: {e}")
+        logger.error(f"OpenMHZ Systems Fetch Exception: {e}")
        return []


@@ -113,21 +148,25 @@ openmhz_calls_cache = TTLCache(maxsize=100, ttl=20)

@cached(openmhz_calls_cache)
 def get_recent_openmhz_calls(sys_name: str):
-    """Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata')."""
-    logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
-    scraper = cloudscraper.create_scraper(
-        browser={"browser": "chrome", "platform": "windows", "desktop": True}
-    )
+    """Fetches the actual audio burst .m4a URLs for a specific system (e.g., 'wmata').

+    Issue #290 (tg12) + Round 7a: same honest-UA model as
+    ``get_openmhz_systems``.
+    """
+    logger.info(f"Fetching OpenMHZ calls for {sys_name} (Cache Miss)")
    try:
        url = f"https://api.openmhz.com/{sys_name}/calls"
-        res = scraper.get(url, timeout=15)
+        res = requests.get(
+            url,
+            timeout=15,
+            headers={"User-Agent": _openmhz_user_agent(), "Accept": "application/json"},
+        )
        if res.status_code == 200:
            data = res.json()
            return data.get("calls", []) if isinstance(data, dict) else []
        return []
    except (requests.RequestException, ConnectionError, TimeoutError, ValueError, KeyError) as e:
-        logger.error(f"OpenMHZ Calls Scrape Exception ({sys_name}): {e}")
+        logger.error(f"OpenMHZ Calls Fetch Exception ({sys_name}): {e}")
        return []


@@ -163,9 +202,11 @@ def openmhz_audio_response(target_url: str):
                timeout=(5, 20),
                allow_redirects=False,
                headers={
-                    "User-Agent": "Mozilla/5.0",
+                    # Issue #291 (tg12) + Round 7a: drop spoofed Mozilla
+                    # UA and the fake first-party Referer. Identify as
+                    # the per-install Shadowbroker proxy honestly.
+                    "User-Agent": _openmhz_user_agent(),
                    "Accept": "audio/mpeg,audio/*,*/*;q=0.8",
-                    "Referer": "https://openmhz.com/",
                },
            )
            if upstream.is_redirect or upstream.status_code in (301, 302, 303, 307, 308):
@@ -4,7 +4,7 @@ import concurrent.futures
 from urllib.parse import quote
 import requests as _requests
 from cachetools import TTLCache
-from services.network_utils import fetch_with_curl, DEFAULT_USER_AGENT
+from services.network_utils import fetch_with_curl, outbound_user_agent

 logger = logging.getLogger(__name__)

@@ -15,24 +15,30 @@ dossier_cache = TTLCache(maxsize=500, ttl=86400)
 # Nominatim requires max 1 req/sec — track last call time
 _nominatim_last_call = 0.0
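The module-level `_nominatim_last_call` timestamp supports Nominatim's usage policy of at most one request per second. A minimal thread-safe sketch of the same idea; the class name `MinIntervalThrottle` is hypothetical, and the injectable clock and sleep exist only so the behavior can be checked without real waiting.

```python
import threading
import time
from typing import Callable


class MinIntervalThrottle:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = float("-inf")  # no previous call yet
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Holding the lock while sleeping serializes concurrent callers, so two
        # threads can't both decide they may proceed at the same instant.
        with self._lock:
            delay = self._last + self._min_interval - self._clock()
            if delay > 0:
                self._sleep(delay)
            self._last = self._clock()
```

A monotonic clock is used so wall-clock adjustments can't shorten or stretch the interval.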

-# Issue #218 / #219 (tg12): Wikimedia's User-Agent policy requires API
+# Issues #218 / #219 (tg12): Wikimedia's User-Agent policy requires API
 # clients to identify themselves with a stable User-Agent that includes
-# a contact path. Bare "python-requests/x.y" or generic strings violate
-# the policy and risk getting blocked. We send the project default UA
-# (operator-overridable via SHADOWBROKER_USER_AGENT) on EVERY outbound
-# Wikimedia request, plus the policy-recommended Api-User-Agent which
-# Wikimedia explicitly accepts on top of the regular UA.
+# a contact path.
 #
-# This is documented and stable so a Wikimedia operator who wants to
-# rate-limit or contact us has a fixed identifier to grep for.
-_WIKIMEDIA_REQUEST_HEADERS = {
-    "User-Agent": DEFAULT_USER_AGENT,
-    "Api-User-Agent": (
-        f"{DEFAULT_USER_AGENT} "
-        "(+https://github.com/BigBodyCobain/Shadowbroker; "
-        "report issues at /issues)"
-    ),
-}
+# Round 7a: the original fix in PR #284 used a single project-wide
+# identifier, which made every Shadowbroker install in the world look
+# like one giant scraper to Wikimedia. If one install misbehaved,
+# Wikimedia's only recourse was to block "Shadowbroker" as a whole.
+# We now build the headers from ``outbound_user_agent('wikimedia')``,
+# which embeds the per-install operator handle (auto-generated or
+# operator-chosen), so Wikimedia can rate-limit or contact the
+# specific install instead of the whole project.
+
+
+def _wikimedia_request_headers() -> dict[str, str]:
+    ua = outbound_user_agent("wikimedia")
+    return {
+        "User-Agent": ua,
+        # Browser-JS-style header that Wikimedia's policy explicitly
+        # accepts on top of (or instead of) User-Agent. We send both so
+        # whichever the upstream prefers, the per-operator handle is
+        # always available.
+        "Api-User-Agent": ua,
+    }


 def _reverse_geocode_offline(lat: float, lng: float) -> dict:
@@ -64,9 +70,7 @@ def _reverse_geocode(lat: float, lng: float) -> dict:
        f"https://nominatim.openstreetmap.org/reverse?"
        f"lat={lat}&lon={lng}&format=json&zoom=10&addressdetails=1&accept-language=en"
    )
-    headers = {
-        "User-Agent": "ShadowBroker-OSINT/1.0 (live-risk-dashboard; contact@shadowbroker.app)"
-    }
+    headers = {"User-Agent": outbound_user_agent("nominatim")}

    for attempt in range(2):
        # Enforce Nominatim's 1 req/sec policy
@@ -146,7 +150,7 @@ def _fetch_wikidata_leader(country_name: str) -> dict:
        # specific Api-User-Agent that the policy specifically asks
        # for, since this request originates from a backend service
        # that proxies on behalf of (potentially many) browser users.
-        res = fetch_with_curl(url, timeout=6, headers=_WIKIMEDIA_REQUEST_HEADERS)
+        res = fetch_with_curl(url, timeout=6, headers=_wikimedia_request_headers())
        if res.status_code == 200:
            results = res.json().get("results", {}).get("bindings", [])
            if results:
@@ -174,7 +178,7 @@ def _fetch_local_wiki_summary(place_name: str, country_name: str = "") -> dict:
        try:
            # Issue #219 (tg12): identify ourselves to Wikimedia per
            # their UA policy; see _fetch_wikidata_leader above.
-            res = fetch_with_curl(url, timeout=5, headers=_WIKIMEDIA_REQUEST_HEADERS)
+            res = fetch_with_curl(url, timeout=5, headers=_wikimedia_request_headers())
            if res.status_code == 200:
                data = res.json()
                if data.get("type") != "disambiguation":
@@ -34,6 +34,11 @@ from services.sar.sar_config import (
    copernicus_token,
    earthdata_token,
 )
+
+
+def _sar_user_agent() -> str:
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("sar-products")
 from services.sar.sar_normalize import (
    SarAnomaly,
    evidence_hash_for_payload,
@@ -442,7 +447,7 @@ def _fetch_unosat_packages() -> list[dict[str, Any]]:
    # HDX CKAN returns 406 without explicit Accept + a browser-ish UA.
    hdx_headers = {
        "Accept": "application/json",
-        "User-Agent": "Mozilla/5.0 (compatible; ShadowBroker-SAR/1.0)",
+        "User-Agent": _sar_user_agent(),
    }
    try:
        resp = fetch_with_curl(url, timeout=20, headers=hdx_headers)
@@ -11,12 +11,21 @@ import requests
 from datetime import datetime, timedelta
 from cachetools import TTLCache

+from services.network_utils import outbound_user_agent
+
 logger = logging.getLogger(__name__)

 # Cache by rounded lat/lon (0.02° grid ~= 2km), TTL 1 hour
 _sentinel_cache = TTLCache(maxsize=200, ttl=3600)


+def _planetary_user_agent() -> str:
+    # Round 7a: per-install handle so Microsoft Planetary Computer can
+    # attribute requests to the specific operator rather than treating
+    # the whole Shadowbroker user base as one entity.
+    return outbound_user_agent("sentinel2-planetary-computer")
+
+
 def _esri_imagery_fallback(lat: float, lng: float) -> dict:
    lat_span = 0.18
    lng_span = 0.24
@@ -64,7 +73,7 @@ def search_sentinel2_scene(lat: float, lng: float) -> dict:
            "https://planetarycomputer.microsoft.com/api/stac/v1/search",
            json=search_payload,
            timeout=8,
-            headers={"User-Agent": "ShadowBroker-OSINT/1.0 (live-risk-dashboard)"},
+            headers={"User-Agent": _planetary_user_agent()},
        )
        search_res.raise_for_status()
        data = search_res.json()
@@ -20,7 +20,11 @@ from cachetools import TTLCache
 logger = logging.getLogger(__name__)

 _SHODAN_BASE = "https://api.shodan.io"
-_USER_AGENT = "ShadowBroker/0.9.79 local Shodan connector"
+# Round 7a: per-install attribution. Shodan already has the operator API
+# key for billing, but the UA still identifies the install.
+def _shodan_user_agent() -> str:
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("shodan")
 _REQUEST_TIMEOUT = 15
 _MIN_INTERVAL_SECONDS = 1.05  # Shodan docs say API plans are rate limited to ~1 req/sec.
 _DEFAULT_SEARCH_PAGES = 1
@@ -179,7 +183,7 @@ def _request(path: str, *, params: dict[str, Any], cache: TTLCache[str, dict[str
                f"{_SHODAN_BASE}{path}",
                params=payload,
                timeout=_REQUEST_TIMEOUT,
-                headers={"User-Agent": _USER_AGENT, "Accept": "application/json"},
+                headers={"User-Agent": _shodan_user_agent(), "Accept": "application/json"},
            )
        finally:
            _last_request_at = time.monotonic()
@@ -19,6 +19,13 @@ from pathlib import Path
 import requests
 from sgp4.api import Satrec, WGS72, jday

+
+
+def _tinygs_user_agent(purpose: str) -> str:
+    """Round 7a: per-install handle for CelesTrak / TinyGS attribution."""
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent(f"tinygs-{purpose}")
+
 logger = logging.getLogger(__name__)

 # ---------------------------------------------------------------------------
@@ -113,7 +120,7 @@ def _fetch_celestrak_tles() -> list[dict]:
                params={"GROUP": group, "FORMAT": "json"},
                timeout=20,
                headers={
-                    "User-Agent": "ShadowBroker-OSINT/1.0 (CelesTrak fair-use)",
+                    "User-Agent": _tinygs_user_agent("celestrak"),
                    "Accept": "application/json",
                },
            )
@@ -259,7 +266,7 @@ def _fetch_tinygs_telemetry() -> None:
            timeout=15,
            headers={
                "Accept": "application/json",
-                "User-Agent": "ShadowBroker-OSINT/1.0",
+                "User-Agent": _tinygs_user_agent("tinygs"),
            },
        )
        resp.raise_for_status()
@@ -24,7 +24,9 @@ from cachetools import TTLCache
 logger = logging.getLogger(__name__)

 _FINNHUB_BASE = "https://finnhub.io/api/v1"
-_USER_AGENT = "ShadowBroker/0.9.79 Finnhub connector"
+def _finnhub_user_agent() -> str:
+    from services.network_utils import outbound_user_agent
+    return outbound_user_agent("finnhub")
 _REQUEST_TIMEOUT = 12
 _MIN_INTERVAL_SECONDS = 0.35  # Stay well under 60 calls/min

@@ -89,7 +91,7 @@ def _request(path: str, params: dict[str, Any] | None = None) -> Any:
                f"{_FINNHUB_BASE}{path}",
                params=payload,
                timeout=_REQUEST_TIMEOUT,
-                headers={"User-Agent": _USER_AGENT, "Accept": "application/json"},
+                headers={"User-Agent": _finnhub_user_agent(), "Accept": "application/json"},
            )
        finally:
            _last_request_at = time.monotonic()
@@ -0,0 +1,83 @@
+"""GDELT's ``data.gdeltproject.org`` is a CNAME to a Google Cloud Storage
+bucket. GCS responds with the wildcard ``*.storage.googleapis.com``
+certificate, which legitimately does NOT cover the GDELT custom
+domain, so Python's TLS verification refuses the connection. Some
+networks happen to route through a path where this works; many
+(notably Docker Desktop's outbound NAT on local installs) do not.
+
+The fix in ``services.geopolitics._gcs_direct_gdelt_url`` rewrites any
+URL pointing at ``data.gdeltproject.org`` to its GCS-direct equivalent
+(``storage.googleapis.com/data.gdeltproject.org/...``), where the
+standard GCS certificate is genuinely valid. ``api.gdeltproject.org``
+and every other host are left untouched.
+
+These tests pin that behavior so a future refactor that drops the
+helper or accidentally rewrites the wrong host gets a loud failure.
+"""
+from __future__ import annotations
+
+import pytest
+
+
+def test_rewrites_data_gdeltproject_https():
+    from services.geopolitics import _gcs_direct_gdelt_url
+
+    assert _gcs_direct_gdelt_url(
+        "https://data.gdeltproject.org/gdeltv2/lastupdate.txt"
+    ) == "https://storage.googleapis.com/data.gdeltproject.org/gdeltv2/lastupdate.txt"
+
+
+def test_rewrites_data_gdeltproject_http():
+    """GDELT's lastupdate.txt sometimes lists URLs with http:// — we
+    rewrite those too (the downstream call upgrades them to https)."""
+    from services.geopolitics import _gcs_direct_gdelt_url
+
+    assert _gcs_direct_gdelt_url(
+        "http://data.gdeltproject.org/gdeltv2/20260301120000.export.CSV.zip"
+    ) == "http://storage.googleapis.com/data.gdeltproject.org/gdeltv2/20260301120000.export.CSV.zip"
+
+
+def test_rewrites_preserve_query_string_and_path():
+    from services.geopolitics import _gcs_direct_gdelt_url
+
+    url = "https://data.gdeltproject.org/some/deep/path?a=1&b=2&c=hello%20world"
+    rewritten = _gcs_direct_gdelt_url(url)
+    assert rewritten == (
+        "https://storage.googleapis.com/data.gdeltproject.org"
+        "/some/deep/path?a=1&b=2&c=hello%20world"
+    )
+
+
+def test_does_not_touch_api_gdeltproject_org():
+    """The API host is NOT a CNAME to GCS; rewriting it would break the
+    actual GDELT API endpoint."""
+    from services.geopolitics import _gcs_direct_gdelt_url
+
+    url = "https://api.gdeltproject.org/api/v2/doc/doc?query=carrier"
+    assert _gcs_direct_gdelt_url(url) == url
+
+
+def test_does_not_touch_other_hosts():
+    from services.geopolitics import _gcs_direct_gdelt_url
+
+    for url in (
+        "https://en.wikipedia.org/wiki/Boeing_747",
+        "https://query.wikidata.org/sparql",
+        "https://storage.googleapis.com/already-correct/path",
+        "https://nominatim.openstreetmap.org/search",
+    ):
+        assert _gcs_direct_gdelt_url(url) == url
+
+
+def test_does_not_partially_match_strings():
+    """``data.gdeltproject.org`` is matched exactly; URLs that merely
+    contain that substring elsewhere (in a query parameter, for example)
+    are left alone. Otherwise we'd rewrite something like
+    ``https://example.com/?ref=data.gdeltproject.org/x`` which is wrong."""
+    from services.geopolitics import _gcs_direct_gdelt_url
+
+    # The match requires ``://`` immediately before the host, so neither a
+    # substring in a query parameter nor a different host such as
+    # ``example-data.gdeltproject.org`` is rewritten.
+    for url in ("https://example-data.gdeltproject.org/path", "https://example.com/?ref=data.gdeltproject.org/x"):
+        assert _gcs_direct_gdelt_url(url) == url
@@ -0,0 +1,277 @@
+"""Round 7a: per-install operator handle threads through every outbound
+third-party API call.
+
+Background: before this change every Shadowbroker install identified
+itself to Wikipedia, Wikidata, Nominatim, GDELT, OpenMHz, Broadcastify,
+weather.gov, NUFORC, etc. with a single project-wide ``Shadowbroker``
+User-Agent. From the upstream's perspective, every install in the world
+looked like one giant scraper. If one install misbehaved, the upstream's
+only recourse was to block ``Shadowbroker`` as a whole, taking out every
+other install.
+
+Fix: each install gets a stable pseudonymous handle (auto-generated like
+``operator-7f3a92`` or operator-overridden via ``OPERATOR_HANDLE``) that
+gets embedded in the User-Agent for every outbound call. Upstreams can
+now rate-limit / contact the specific operator instead of the project.
+
+These tests pin:
+
+  1. The handle is auto-generated on first call if no override exists.
+  2. The handle survives process restart (persisted to disk).
+  3. ``OPERATOR_HANDLE`` env var override wins over the auto-gen handle.
+  4. The handle is sanitized (whitespace, special chars, length).
+  5. Every call site that used an aggregate ("MONSTER") UA now sends the per-operator UA.
+"""
+from __future__ import annotations
+
+import json
+import os
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+
+
+@pytest.fixture
+def isolated_handle(tmp_path, monkeypatch):
+    """Redirect the persistence path to tmp and reset caches between tests."""
+    from services import network_utils
+
+    handle_file = tmp_path / "operator_handle.json"
+    monkeypatch.setattr(network_utils, "_OPERATOR_HANDLE_FILE", handle_file)
+    network_utils._reset_operator_handle_cache_for_tests()
+    monkeypatch.delenv("OPERATOR_HANDLE", raising=False)
+
+    # Reset Settings cache so OPERATOR_HANDLE env changes are picked up.
+    from services.config import get_settings
+    get_settings.cache_clear()
+
+    yield network_utils
+
+    network_utils._reset_operator_handle_cache_for_tests()
+    get_settings.cache_clear()
+
+
+# ---------------------------------------------------------------------------
+# Core handle generation / persistence / override
+# ---------------------------------------------------------------------------
+
+
+class TestOperatorHandleGeneration:
+    def test_auto_generates_on_first_call(self, isolated_handle):
+        h = isolated_handle.get_operator_handle()
+        # Prefix is "operator-" (deliberately neutral; "shadow-" looked
+        # exactly like a pattern abuse-detection systems would auto-block).
+        assert h.startswith("operator-")
+        assert len(h) == len("operator-") + 6
+        # Hex suffix.
+        suffix = h.split("-", 1)[1]
+        int(suffix, 16)  # raises if not hex
+
+    def test_persists_to_disk_so_handle_survives_restart(self, isolated_handle):
+        first = isolated_handle.get_operator_handle()
+        # Simulate process restart: clear in-memory cache, then ask again.
+        isolated_handle._reset_operator_handle_cache_for_tests()
+        second = isolated_handle.get_operator_handle()
+        assert second == first
+        # The file actually exists.
+        assert isolated_handle._OPERATOR_HANDLE_FILE.exists()
+        body = json.loads(isolated_handle._OPERATOR_HANDLE_FILE.read_text())
+        assert body["handle"] == first
+
+    def test_env_override_wins_over_auto_generated(self, isolated_handle, monkeypatch):
+        # First call without env var auto-generates.
+        auto = isolated_handle.get_operator_handle()
+        assert auto.startswith("operator-")
+        # Setting env var changes the resolved handle without touching the disk file.
+        monkeypatch.setenv("OPERATOR_HANDLE", "alice")
+        from services.config import get_settings
+        get_settings.cache_clear()
+        isolated_handle._reset_operator_handle_cache_for_tests()
+        assert isolated_handle.get_operator_handle() == "alice"
+
+    def test_handle_is_sanitized(self, isolated_handle, monkeypatch):
+        from services.config import get_settings
+
+        # Sanitization tests run against the normalizer directly so the
+        # empty-string case can be asserted independently of the env-var
+        # resolution path (where empty means "use auto-gen", not "use
+        # 'anonymous'").
+        from services.network_utils import _normalize_handle
+
+        cases = [
+            ("Alice Smith", "alice-smith"),
+            ("user@example.com", "user-example-com"),
+            ("  whitespace  ", "whitespace"),
+            ("UPPER-CASE", "upper-case"),
+            ("multiple---dashes", "multiple-dashes"),
+            ("/leading/slash", "leading-slash"),
+            ("trailing-", "trailing"),
+            ("", "anonymous"),
+        ]
+        for raw, expected in cases:
+            got = _normalize_handle(raw)
+            assert got == expected, f"{raw!r} -> {got!r}, expected {expected!r}"
+            assert got == got.lower()
+            for ch in got:
+                assert ch.isalnum() or ch in "-_", f"unsafe char {ch!r} in {got!r}"
+            assert "--" not in got
+
+    def test_handle_is_length_capped(self, isolated_handle, monkeypatch):
+        from services.config import get_settings
+
+        monkeypatch.setenv("OPERATOR_HANDLE", "x" * 1000)
+        get_settings.cache_clear()
+        isolated_handle._reset_operator_handle_cache_for_tests()
+        got = isolated_handle.get_operator_handle()
+        assert len(got) <= 48
+
+
+# ---------------------------------------------------------------------------
+# outbound_user_agent() builds the right header
+# ---------------------------------------------------------------------------
+
+
+class TestOutboundUserAgentString:
+    def test_includes_operator_handle(self, isolated_handle):
+        ua = isolated_handle.outbound_user_agent()
+        handle = isolated_handle.get_operator_handle()
+        assert f"operator: {handle}" in ua
+
+    def test_includes_purpose_when_provided(self, isolated_handle):
+        ua = isolated_handle.outbound_user_agent("wikipedia")
+        assert "purpose: wikipedia" in ua
+
+    def test_includes_contact_path(self, isolated_handle):
+        ua = isolated_handle.outbound_user_agent()
+        assert "github.com" in ua.lower()
+        assert "shadowbroker" in ua.lower()
+
+    def test_version_prefix(self, isolated_handle):
+        ua = isolated_handle.outbound_user_agent()
+        assert ua.startswith("Shadowbroker/")
+
+
+# ---------------------------------------------------------------------------
+# Wikipedia / Wikidata — retroactive fix for PR #284's MONSTER pattern
+# ---------------------------------------------------------------------------
+
+
+class TestWikimediaCallsAreNowPerOperator:
+    def test_wikidata_call_uses_per_operator_ua(self, isolated_handle, monkeypatch):
+        from services import region_dossier
+
+        captured = []
+
+        class _FakeResp:
+            status_code = 200
+            def json(self):
+                return {"results": {"bindings": []}}
+
+        def fake_fetch(url, **kwargs):
+            captured.append(kwargs.get("headers") or {})
+            return _FakeResp()
+
+        monkeypatch.setattr(region_dossier, "fetch_with_curl", fake_fetch)
+        region_dossier._fetch_wikidata_leader("Testlandia")
+
+        assert captured, "Wikidata fetcher was not called"
+        headers = captured[0]
+        assert "User-Agent" in headers
+        assert "Api-User-Agent" in headers
+        handle = isolated_handle.get_operator_handle()
+        for header_value in (headers["User-Agent"], headers["Api-User-Agent"]):
+            assert f"operator: {handle}" in header_value, (
+                f"Wikimedia UA must include the per-operator handle; got {header_value!r}"
+            )
+
+    def test_wikipedia_summary_uses_per_operator_ua(self, isolated_handle, monkeypatch):
+        from services import region_dossier
+
+        captured = []
+
+        class _FakeResp:
+            status_code = 200
+            def json(self):
+                return {
+                    "type": "standard",
+                    "description": "x",
+                    "extract": "y",
+                    "thumbnail": {"source": ""},
+                }
+
+        def fake_fetch(url, **kwargs):
+            captured.append((url, kwargs.get("headers") or {}))
+            return _FakeResp()
+
+        monkeypatch.setattr(region_dossier, "fetch_with_curl", fake_fetch)
+        region_dossier._fetch_local_wiki_summary("Paris", "France")
+
+        wikipedia_hits = [c for c in captured if "wikipedia.org" in c[0]]
+        assert wikipedia_hits, "Wikipedia summary fetch was not called"
+        for _url, headers in wikipedia_hits:
+            handle = isolated_handle.get_operator_handle()
+            assert f"operator: {handle}" in headers.get("User-Agent", "")
+
+
+# ---------------------------------------------------------------------------
+# Generic round-7a regression guard
+# ---------------------------------------------------------------------------
+
+
+class TestNoMonsterUserAgentRemains:
+    """The audit's underlying concern was that every Shadowbroker install
+    looked like one entity. This test scans the codebase for the OLD
+    aggregate identifier patterns and fails if a new one sneaks back in.
+
+    We allow the strings to appear in:
+      - comments (audit prose, change-log notes)
+      - tests
+      - non-Python files such as .env.example (only *.py is scanned)
+    The test fails only when a line holds both a banned literal and a
+    quoted "User-Agent" key, i.e. a literal header value.
+    """
+
+    BANNED_LITERALS = (
+        "ShadowBroker-OSINT/1.0",
+        "ShadowBroker-OSINT/0.9",
+        "ShadowBroker-FeedIngester/1.0",
+        "ShadowBroker/0.9.79 local Shodan connector",
+        "ShadowBroker/0.9.79 Finnhub connector",
+        "Mozilla/5.0 (compatible; ShadowBroker CCTV proxy)",
+    )
+
+    def test_no_banned_aggregate_user_agent_strings(self):
+        from pathlib import Path
+
+        backend_root = Path(__file__).parent.parent
+        offenders = []
+        for py in backend_root.rglob("*.py"):
+            # Skip test files; comment lines are filtered further down.
+            rel = py.relative_to(backend_root).as_posix()
+            if rel.startswith("tests/"):
+                continue
+            text = py.read_text(encoding="utf-8", errors="ignore")
+            # Flag a literal only when it shares a line with a quoted
+            # "User-Agent" key: a cheap heuristic for "this is a header
+            # value". A literal in a comment block won't trigger because
+            # that line won't also carry the User-Agent key.
+            for banned in self.BANNED_LITERALS:
+                if banned in text:
+                    # Walk lines to ensure it's a real header value.
+                    for i, line in enumerate(text.splitlines(), 1):
+                        if banned in line:
+                            # Comments / docstrings are allowed — only fail
+                            # if the line looks like a header assignment.
+                            stripped = line.strip()
+                            if stripped.startswith("#"):
+                                continue
+                            if '"User-Agent"' in line or "'User-Agent'" in line:
+                                offenders.append(f"{rel}:{i}: {stripped[:120]}")
+        assert not offenders, (
+            "Round 7a regression: the following lines reintroduced an "
+            "aggregate Shadowbroker User-Agent. Use "
+            "outbound_user_agent('purpose') instead so the per-install "
+            "operator handle is embedded.\n"
+            + "\n".join(offenders)
+        )
@@ -77,15 +77,25 @@ def test_wikipedia_summary_call_passes_wikimedia_request_headers():
        assert "github.com" in headers["Api-User-Agent"].lower()


-def test_wikimedia_headers_constant_is_stable():
-    """Regression guard: if someone removes the contact path from the
-    Api-User-Agent we want a loud test failure, not a silent ToS drift.
-    """
-    from services.region_dossier import _WIKIMEDIA_REQUEST_HEADERS
+def test_wikimedia_headers_helper_is_stable():
+    """Regression guard: if someone removes the contact path or the
+    per-operator handle from the Wikimedia headers, we want a loud
+    test failure, not a silent ToS drift.

-    aua = _WIKIMEDIA_REQUEST_HEADERS.get("Api-User-Agent", "")
-    assert "Shadowbroker" in aua or "ShadowBroker" in aua
-    assert "github.com" in aua.lower()
-    # Must include a path Wikimedia operators can use to contact us
-    # (we use /issues against the public repo).
-    assert "issues" in aua.lower()
+    Round 7a: the original ``_WIKIMEDIA_REQUEST_HEADERS`` constant was
+    replaced with the ``_wikimedia_request_headers()`` function so the
+    per-install operator handle is embedded at call time. This test
+    pins both the project identifier AND the contact path AND the
+    per-operator format.
+    """
+    from services.region_dossier import _wikimedia_request_headers
+
+    headers = _wikimedia_request_headers()
+    aua = headers.get("Api-User-Agent", "")
+    ua = headers.get("User-Agent", "")
+    for h, label in ((ua, "User-Agent"), (aua, "Api-User-Agent")):
+        assert "Shadowbroker" in h or "ShadowBroker" in h, f"{label} missing project id"
+        assert "github.com" in h.lower(), f"{label} missing contact URL"
+        assert "issues" in h.lower(), f"{label} missing /issues contact path"
+        # Round 7a: must include the per-operator handle.
+        assert "operator:" in h, f"{label} missing per-operator handle: {h!r}"
@@ -57,6 +57,18 @@ services:
      # name). If you rename the frontend service or run with a different
      # container_name, list the hostnames here (comma-separated, no spaces).
      - SHADOWBROKER_TRUSTED_FRONTEND_HOSTS=${SHADOWBROKER_TRUSTED_FRONTEND_HOSTS:-frontend,shadowbroker-frontend}
+      # Third-party fetcher opt-ins. All default OFF except NEWS_ENABLED:
+      # these phone home to politically/commercially sensitive upstreams
+      # (Polymarket, Kalshi, Yahoo Finance, EU disinfo trackers, NUFORC
+      # dataset host, etc.). Set a flag to "true" in your .env only if you
+      # want the node's IP to contact that service. Each feature's dashboard
+      # panel reads as "no data" until its flag is on.
+      - PREDICTION_MARKETS_ENABLED=${PREDICTION_MARKETS_ENABLED:-false}
+      - FINANCIAL_ENABLED=${FINANCIAL_ENABLED:-false}
+      - CROWDTHREAT_ENABLED=${CROWDTHREAT_ENABLED:-false}
+      - FIMI_ENABLED=${FIMI_ENABLED:-false}
+      - NUFORC_ENABLED=${NUFORC_ENABLED:-false}
+      - NEWS_ENABLED=${NEWS_ENABLED:-true}
    volumes:
      - backend_data:/app/data
    restart: unless-stopped
@@ -1,16 +1,21 @@
 /**
- * Issues #218 / #219 / #220 (tg12 external audit):
+ * Issues #218 / #219 / #220 (tg12 external audit) + Round 7a:
 *
 * Every browser-direct call to Wikipedia or Wikidata must send the
- * `Api-User-Agent` header that Wikimedia's UA policy asks for. These
- * tests pin that requirement on the shared `lib/wikimediaClient`
+ * `Api-User-Agent` header that Wikimedia's UA policy asks for, AND must
+ * embed the per-install operator handle so Wikimedia can rate-limit /
+ * contact the specific operator instead of treating "Shadowbroker" as
+ * one giant entity.
+ *
+ * These tests pin both requirements on the shared `lib/wikimediaClient`
 * helper that WikiImage, NewsFeed, and useRegionDossier all route
- * through, so a future refactor that drops the header gets a loud
- * test failure rather than a silent ToS regression.
+ * through. A future refactor that drops either the header OR the
+ * per-operator handle gets a loud test failure rather than a silent
+ * ToS / privacy regression.
 */
 import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
 import {
-  WIKIMEDIA_API_USER_AGENT,
+  buildWikimediaUserAgent,
  fetchWikipediaSummary,
  fetchWikidataSparql,
  _resetWikimediaClientCacheForTests,
@@ -18,6 +23,18 @@ import {

 const originalFetch = globalThis.fetch;

+// Helper: stub fetch so calls to /api/settings/operator-handle return a
+// known handle, and everything else proxies to whatever the test set up.
+function withHandle(handle: string, otherFetch: typeof globalThis.fetch) {
+  return vi.fn(async (input: any, init?: RequestInit) => {
+    const url = String(input);
+    if (url.endsWith('/api/settings/operator-handle')) {
+      return new Response(JSON.stringify({ handle }), { status: 200 });
+    }
+    return otherFetch(input, init);
+  });
+}
+
 describe('lib/wikimediaClient', () => {
  beforeEach(() => {
    _resetWikimediaClientCacheForTests();
@@ -28,16 +45,35 @@ describe('lib/wikimediaClient', () => {
    vi.restoreAllMocks();
  });

-  it('exposes a stable Api-User-Agent identifier with a contact path', () => {
-    expect(WIKIMEDIA_API_USER_AGENT).toContain('Shadowbroker');
-    expect(WIKIMEDIA_API_USER_AGENT.toLowerCase()).toContain('github.com');
-    expect(WIKIMEDIA_API_USER_AGENT.toLowerCase()).toContain('issues');
+  it('builds a stable per-operator Api-User-Agent with contact path', async () => {
+    globalThis.fetch = withHandle(
+      'operator-abc123',
+      vi.fn(async () => new Response('{}', { status: 200 })) as any,
+    ) as any;
+    const ua = await buildWikimediaUserAgent('wikipedia-summary');
+    expect(ua).toContain('Shadowbroker');
+    expect(ua.toLowerCase()).toContain('github.com');
+    expect(ua.toLowerCase()).toContain('issues');
+    expect(ua).toContain('operator: operator-abc123');
+    expect(ua).toContain('purpose: wikipedia-summary');
  });

-  it('sends Api-User-Agent on Wikipedia summary fetch', async () => {
-    const calls: Array<{ url: string; init?: RequestInit }> = [];
-    globalThis.fetch = vi.fn(async (url: any, init?: RequestInit) => {
-      calls.push({ url: String(url), init });
+  it('falls back to "operator-offline" when handle endpoint is unreachable', async () => {
+    globalThis.fetch = vi.fn(async (input: any) => {
+      const url = String(input);
+      if (url.endsWith('/api/settings/operator-handle')) {
+        return new Response('forbidden', { status: 403 });
+      }
+      return new Response('{}', { status: 200 });
+    }) as any;
+    const ua = await buildWikimediaUserAgent('test');
+    expect(ua).toContain('operator: operator-offline');
+  });
+
+  it('sends per-operator Api-User-Agent on Wikipedia summary fetch', async () => {
+    const wikiCalls: Array<{ url: string; init?: RequestInit }> = [];
+    const baseFetch = vi.fn(async (url: any, init?: RequestInit) => {
+      wikiCalls.push({ url: String(url), init });
      return new Response(
        JSON.stringify({
          type: 'standard',
@@ -48,44 +84,71 @@ describe('lib/wikimediaClient', () => {
        }),
        { status: 200 },
      );
-    }) as any;
+    });
+    globalThis.fetch = withHandle('operator-test01', baseFetch as any) as any;

    const summary = await fetchWikipediaSummary('Boeing 747');
    expect(summary?.thumbnail).toBe('https://example.org/thumb.jpg');
-    expect(calls).toHaveLength(1);
-    const headers = (calls[0].init?.headers || {}) as Record<string, string>;
-    expect(headers['Api-User-Agent']).toBe(WIKIMEDIA_API_USER_AGENT);
+    // wikiCalls only captures calls to non-handle URLs.
+    expect(wikiCalls).toHaveLength(1);
+    const headers = (wikiCalls[0].init?.headers || {}) as Record<string, string>;
+    expect(headers['Api-User-Agent']).toContain('operator: operator-test01');
+    expect(headers['Api-User-Agent']).toContain('purpose: wikipedia-summary');
  });

-  it('sends Api-User-Agent on Wikidata SPARQL fetch', async () => {
+  it('sends per-operator Api-User-Agent on Wikidata SPARQL fetch', async () => {
    const calls: Array<{ url: string; init?: RequestInit }> = [];
-    globalThis.fetch = vi.fn(async (url: any, init?: RequestInit) => {
+    const baseFetch = vi.fn(async (url: any, init?: RequestInit) => {
      calls.push({ url: String(url), init });
      return new Response(
        JSON.stringify({
-          results: {
-            bindings: [
-              {
-                leaderLabel: { value: 'Test Leader' },
-                govTypeLabel: { value: 'Test Government' },
-              },
-            ],
-          },
+          results: { bindings: [{ leaderLabel: { value: 'Test Leader' } }] },
+        }),
+        { status: 200 },
+      );
+    });
+    globalThis.fetch = withHandle('operator-sparql', baseFetch as any) as any;
+
+    const bindings = await fetchWikidataSparql('SELECT * WHERE { ?s ?p ?o }');
+    expect(bindings).toHaveLength(1);
+    const headers = (calls[0].init?.headers || {}) as Record<string, string>;
+    expect(headers['Api-User-Agent']).toContain('operator: operator-sparql');
+    expect(headers['Api-User-Agent']).toContain('purpose: wikidata-sparql');
+    expect(headers['Accept']).toBe('application/sparql-results+json');
+  });
+
+  it('handle endpoint is queried only ONCE across many wiki fetches', async () => {
+    let handleCalls = 0;
+    let wikiCalls = 0;
+    globalThis.fetch = vi.fn(async (input: any) => {
+      const url = String(input);
+      if (url.endsWith('/api/settings/operator-handle')) {
+        handleCalls++;
+        return new Response(JSON.stringify({ handle: 'operator-cache' }), { status: 200 });
+      }
+      wikiCalls++;
+      return new Response(
+        JSON.stringify({
+          type: 'standard',
+          title: 'X',
+          description: '',
+          extract: '',
+          thumbnail: { source: 'https://example.org/x.jpg' },
        }),
        { status: 200 },
      );
    }) as any;

-    const bindings = await fetchWikidataSparql('SELECT * WHERE { ?s ?p ?o }');
-    expect(bindings).toHaveLength(1);
-    const headers = (calls[0].init?.headers || {}) as Record<string, string>;
-    expect(headers['Api-User-Agent']).toBe(WIKIMEDIA_API_USER_AGENT);
-    expect(headers['Accept']).toBe('application/sparql-results+json');
+    await fetchWikipediaSummary('Eiffel Tower');
+    await fetchWikipediaSummary('Mount Fuji');
+    await fetchWikipediaSummary('Statue of Liberty');
+    expect(handleCalls).toBe(1);
+    expect(wikiCalls).toBe(3);
  });

  it('shares cache across consecutive callers for the same Wikipedia title', async () => {
    let fetchCount = 0;
-    globalThis.fetch = vi.fn(async () => {
+    const baseFetch = vi.fn(async () => {
      fetchCount++;
      return new Response(
        JSON.stringify({
@@ -97,7 +160,8 @@ describe('lib/wikimediaClient', () => {
        }),
        { status: 200 },
      );
-    }) as any;
+    });
+    globalThis.fetch = withHandle('operator-cache', baseFetch as any) as any;

    const a = await fetchWikipediaSummary('Eiffel Tower');
    const b = await fetchWikipediaSummary('Eiffel Tower');
@@ -107,7 +171,7 @@ describe('lib/wikimediaClient', () => {

  it('deduplicates concurrent in-flight requests for the same title', async () => {
    let fetchCount = 0;
-    globalThis.fetch = vi.fn(async () => {
+    const baseFetch = vi.fn(async () => {
      fetchCount++;
      await new Promise((r) => setTimeout(r, 5));
      return new Response(
@@ -120,7 +184,8 @@ describe('lib/wikimediaClient', () => {
        }),
        { status: 200 },
      );
-    }) as any;
+    });
+    globalThis.fetch = withHandle('operator-cache', baseFetch as any) as any;

    const [a, b, c] = await Promise.all([
      fetchWikipediaSummary('Mount Fuji'),
@@ -134,28 +199,37 @@ describe('lib/wikimediaClient', () => {
  });

  it('returns null on disambiguation pages without throwing', async () => {
-    globalThis.fetch = vi.fn(async () =>
-      new Response(JSON.stringify({ type: 'disambiguation' }), { status: 200 }),
+    globalThis.fetch = withHandle(
+      'operator-cache',
+      vi.fn(async () =>
+        new Response(JSON.stringify({ type: 'disambiguation' }), { status: 200 }),
+      ) as any,
    ) as any;
    const summary = await fetchWikipediaSummary('Mercury');
    expect(summary).toBeNull();
  });

  it('returns null on HTTP error without throwing', async () => {
-    globalThis.fetch = vi.fn(async () => new Response('not found', { status: 404 })) as any;
+    globalThis.fetch = withHandle(
+      'operator-cache',
+      vi.fn(async () => new Response('not found', { status: 404 })) as any,
+    ) as any;
    const summary = await fetchWikipediaSummary('Nonexistent Article 12345');
    expect(summary).toBeNull();
  });

  it('returns null on network error without throwing', async () => {
-    globalThis.fetch = vi.fn(async () => {
-      throw new Error('network down');
-    }) as any;
+    globalThis.fetch = withHandle(
+      'operator-cache',
+      vi.fn(async () => {
+        throw new Error('network down');
+      }) as any,
+    ) as any;
    const summary = await fetchWikipediaSummary('Anything');
    expect(summary).toBeNull();
  });

-  it('returns null on empty input', async () => {
+  it('returns null on empty input without fetching anything', async () => {
    globalThis.fetch = vi.fn(async () => new Response('{}', { status: 200 })) as any;
    expect(await fetchWikipediaSummary('')).toBeNull();
    expect(await fetchWikipediaSummary('   ')).toBeNull();
@@ -859,7 +859,7 @@ export default function TopRightControls({
                        }>
                          {activatingPhase === 'done'
                            ? (syncOutcomeRaw === 'solo'
-                              ? `${t('node.soloReady')} — ${nodeStatus?.total_events ?? 0} ${t('node.events')}`
+                              ? `${t('node.soloNodeReady')} — ${nodeStatus?.total_events ?? 0} ${t('node.events')}`
                              : `${t('node.synced')} — ${nodeStatus?.total_events ?? 0} ${t('node.events')}`)
                            : activatingPhase === 'sync'
                              ? `${t('node.syncingChain')}${(nodeStatus?.total_events ?? 0) > 0 ? ` ${nodeStatus?.total_events} ${t('node.events')}` : ''}`
@@ -1013,8 +1013,8 @@ export default function TopRightControls({
                    : t('terminal.terminalDetail')}
                  <div className="mt-2 text-[12px] text-cyan-200/70 normal-case tracking-normal">
                    {terminalPrivateReady
-                      ? t('terminal.enterTerminalDetail')
-                      : t('terminal.terminalDetailMore')}
+                      ? t('terminal.identityReady')
+                      : t('terminal.identityNotReady')}
                  </div>
                </div>
                {terminalLaunchError && (
@@ -1025,15 +1025,15 @@ export default function TopRightControls({
                <div className="border border-cyan-500/20 bg-black/30 px-4 py-4 text-[12px] font-mono text-slate-200 leading-[1.85]">
                  <div className="text-cyan-300 tracking-[0.18em]">{t('terminal.beforeYouEnter')}</div>
                  <ul className="mt-3 space-y-2 list-disc pl-5">
-                    <li>{t('terminal.term1')}</li>
-                    <li>{t('terminal.term2')}</li>
-                    <li>{t('terminal.term3')}</li>
+                    <li>{t('terminal.termTerminal1')}</li>
+                    <li>{t('terminal.termTerminal2')}</li>
+                    <li>{t('terminal.termTerminal3')}</li>
                  </ul>
                </div>
                <div className="border border-amber-500/20 bg-amber-950/10 px-4 py-3 text-[12px] font-mono text-amber-200/80 leading-[1.85]">
                  <div className="text-amber-300 tracking-[0.18em]">{t('terminal.wormholeCleanup')}</div>
                  <div className="mt-2">
-                    {t('terminal.wormholeCleanupDetail')}
+                    {t('terminal.cleanupDetail')}
                  </div>
                </div>
                <div className="grid grid-cols-1 gap-3 sm:grid-cols-3">
@@ -1,51 +1,37 @@
 /**
 * wikimediaClient — single fetch surface for Wikipedia / Wikidata.
 *
- * Issues #218, #219, #220 (tg12 external audit):
+ * Issues #218, #219, #220 (tg12 external audit) + Round 7a:
 *
 * Wikimedia's User-Agent policy asks API clients to identify themselves
 * via `Api-User-Agent` when calling from browser JavaScript (because the
- * browser does not let JS set `User-Agent` directly). Before this
- * module existed, three independent components issued anonymous browser
- * fetches against Wikipedia / Wikidata:
+ * browser does not let JS set `User-Agent` directly). Three independent
+ * components used to issue anonymous browser fetches against Wikipedia /
+ * Wikidata:
 *
 *   - useRegionDossier  (Wikidata SPARQL + Wikipedia REST summary)
 *   - WikiImage          (Wikipedia REST summary)
 *   - NewsFeed           (Wikipedia REST summary)
 *
- * Each component shipped its own copy-pasted fetch + module-local cache.
- * Provider-policy compliance was missing in all three places.
+ * PR #284 collapsed them into this shared helper with one stable
+ * `Api-User-Agent`. That fixed compliance but introduced a new problem:
+ * the `Api-User-Agent` was project-wide, so from Wikimedia's perspective
+ * every Shadowbroker install looked like one giant scraper. If one
+ * install misbehaved, Wikimedia's only recourse was to block the project
+ * as a whole.
 *
- * This module centralizes:
+ * Round 7a fixes that. The frontend fetches the per-install operator
+ * handle from `GET /api/settings/operator-handle` once on first use and
+ * embeds it in the `Api-User-Agent`. Wikimedia can now rate-limit /
+ * contact the specific install instead of the project. The handle is
+ * auto-generated on the backend (`shadow-XXXXXX`) or operator-chosen via
+ * the `OPERATOR_HANDLE` setting.
 *
- *   1. The `Api-User-Agent` header on every request.
- *   2. A single LRU cache for Wikipedia summary lookups (keyed by article
- *      title).  Multiple components asking for the same article share
- *      one in-flight request and one cache slot.
- *   3. One predictable kill switch — if Wikimedia ever asks us to back
- *      off, we change `WIKIMEDIA_API_USER_AGENT` here and the whole
- *      frontend updates.
- *
- * This does NOT change end-user UX:
- *
- *   - WikiImage still shows the same thumbnails.
- *   - NewsFeed still shows aircraft thumbnails.
- *   - useRegionDossier still returns the same place summary + leader.
- *
- * What changes:
- *
- *   - Wikimedia can identify our traffic from any other anonymous
- *     browser visitor pool.
- *   - Provider-policy fixes happen here once, not in three places.
+ * UX impact: zero. Same thumbnails, same summaries, same load behavior.
+ * The only observable change is the value of the outgoing
+ * `Api-User-Agent` header.
 */

-// Stable identifier per Wikimedia UA policy. Includes a contact path so
-// Wikimedia's operators can reach the project if they need to rate-limit
-// or coordinate. Bump the version when the contact path changes.
-export const WIKIMEDIA_API_USER_AGENT =
-  'Shadowbroker/1.0 (+https://github.com/BigBodyCobain/Shadowbroker; ' +
-  'report issues at /issues)';
-
 // Module-level cache shared by WikiImage, NewsFeed, and useRegionDossier.
 // Keyed by Wikipedia article title (NOT slug — we keep the human-readable
 // form so debugging the cache is easier). Values track in-flight state
@@ -73,6 +59,66 @@ function evictIfOverCap() {
  if (oldest) _summaryCache.delete(oldest);
 }

+// ─── Per-operator handle (Round 7a) ────────────────────────────────────────
+
+// Fetched once from the backend on first need and cached for the page
+// lifetime. The handle is NOT a secret — Wikimedia will see it on every
+// Wikipedia / Wikidata request we make — but caching it locally avoids a
+// round-trip on every Wikipedia fetch and lets the offline / no-backend
+// case still produce a stable UA (the fallback handle).
+let _handlePromise: Promise<string> | null = null;
+let _cachedHandle: string | null = null;
+
+const FALLBACK_HANDLE = 'operator-offline';
+const HANDLE_ENDPOINT = '/api/settings/operator-handle';
+
+async function fetchOperatorHandle(): Promise<string> {
+  try {
+    const res = await fetch(HANDLE_ENDPOINT, {
+      // Use the standard relative-path proxy so the Next.js admin-key
+      // injection (same-origin) flows naturally for legitimate browser
+      // sessions. A cross-origin scanner will be blocked by the proxy
+      // before this even leaves their browser.
+      credentials: 'same-origin',
+    });
+    if (!res.ok) return FALLBACK_HANDLE;
+    const data = await res.json();
+    const h = (data && typeof data.handle === 'string' && data.handle.trim()) || '';
+    return h || FALLBACK_HANDLE;
+  } catch {
+    return FALLBACK_HANDLE;
+  }
+}
+
+async function getOperatorHandle(): Promise<string> {
+  if (_cachedHandle) return _cachedHandle;
+  if (!_handlePromise) {
+    _handlePromise = fetchOperatorHandle().then((h) => {
+      _cachedHandle = h;
+      return h;
+    });
+  }
+  return _handlePromise;
+}
+
+/** Build the Wikimedia Api-User-Agent for this install.
+ *
+ * Includes the per-install operator handle so Wikimedia can rate-limit /
+ * contact the specific operator instead of the project as a whole.
+ * Exported for tests; production callers should let
+ * `fetchWikipediaSummary` / `fetchWikidataSparql` build it implicitly.
+ */
+export async function buildWikimediaUserAgent(purpose: string): Promise<string> {
+  const handle = await getOperatorHandle();
+  const safePurpose = (purpose || '').replace(/[^a-zA-Z0-9_-]/g, '-').toLowerCase();
+  return (
+    `Shadowbroker/1.0 (operator: ${handle}; purpose: ${safePurpose}; ` +
+    '+https://github.com/BigBodyCobain/Shadowbroker; report issues at /issues)'
+  );
+}
+
+// ─── Wikipedia summary fetch ───────────────────────────────────────────────
+
 /** Fetch a Wikipedia article summary (titles, NOT URLs).
 *
 * Empty / invalid input resolves to `null`. Network errors and disambig
@@ -92,40 +138,42 @@ export async function fetchWikipediaSummary(
  const slug = encodeURIComponent(trimmed.replace(/ /g, '_'));
  const url = `https://en.wikipedia.org/api/rest_v1/page/summary/${slug}`;

-  const promise = fetch(url, {
-    headers: { 'Api-User-Agent': WIKIMEDIA_API_USER_AGENT },
-  })
-    .then(async (r) => {
+  const promise = (async (): Promise<WikipediaSummary | null> => {
+    try {
+      const ua = await buildWikimediaUserAgent('wikipedia-summary');
+      const r = await fetch(url, { headers: { 'Api-User-Agent': ua } });
      if (!r.ok) return null;
      const d = await r.json();
      if (d?.type === 'disambiguation') return null;
-      const summary: WikipediaSummary = {
+      return {
        title: trimmed,
        description: d?.description || '',
        extract: d?.extract || '',
        thumbnail: d?.thumbnail?.source || d?.originalimage?.source || '',
        type: d?.type || 'standard',
      };
-      return summary;
-    })
-    .catch(() => null)
-    .then((summary) => {
-      _summaryCache.set(trimmed, { summary, inflight: null, loaded: true });
-      evictIfOverCap();
-      return summary;
-    });
+    } catch {
+      return null;
+    }
+  })().then((summary) => {
+    _summaryCache.set(trimmed, { summary, inflight: null, loaded: true });
+    evictIfOverCap();
+    return summary;
+  });

  _summaryCache.set(trimmed, { summary: null, inflight: promise, loaded: false });
  evictIfOverCap();
  return promise;
 }

+// ─── Wikidata SPARQL ───────────────────────────────────────────────────────
+
 /** Fetch a Wikidata SPARQL query result.
 *
 * Returns the parsed JSON `results.bindings` array on success; `null`
 * (not throwing) on any failure so callers can render fallbacks
- * silently. Kept as a thin wrapper so the audit-required UA header is
- * applied in exactly one place.
+ * silently. Per-install operator handle threaded through `Api-User-Agent`
+ * (Round 7a).
 */
 export async function fetchWikidataSparql<T = Record<string, { value: string }>>(
  sparql: string,
@@ -136,9 +184,10 @@ export async function fetchWikidataSparql<T = Record<string, { value: string }>>
    trimmed,
  )}&format=json`;
  try {
+    const ua = await buildWikimediaUserAgent('wikidata-sparql');
    const res = await fetch(url, {
      headers: {
-        'Api-User-Agent': WIKIMEDIA_API_USER_AGENT,
+        'Api-User-Agent': ua,
        Accept: 'application/sparql-results+json',
      },
    });
@@ -151,7 +200,11 @@ export async function fetchWikidataSparql<T = Record<string, { value: string }>>
  }
 }

-/** Internal: clear the shared cache. Exposed for tests only. */
+// ─── Test helpers ──────────────────────────────────────────────────────────
+
+/** Internal: clear the shared cache + the handle cache. Exposed for tests only. */
 export function _resetWikimediaClientCacheForTests() {
  _summaryCache.clear();
+  _handlePromise = null;
+  _cachedHandle = null;
 }
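`getOperatorHandle` caches the promise rather than only the resolved string. Callers that arrive before the first lookup settles share one request, and later callers reuse the settled value. The test above checks the visible result: one handle request across several summary fetches. Below is a minimal standalone sketch of the same pattern. It assumes a hypothetical injectable `lookup` function in place of the real `fetch(HANDLE_ENDPOINT)` call. `makeHandleGetter` and `buildUserAgent` are illustrative names, not exports of wikimediaClient.

```typescript
// Hypothetical sketch of the "fetch once, share the promise" pattern used by
// getOperatorHandle(), plus the purpose-sanitizing UA builder.
// `lookup` is an assumed stand-in for the backend handle request.
type HandleLookup = () => Promise<string>;

const FALLBACK = 'operator-offline';

function makeHandleGetter(lookup: HandleLookup): () => Promise<string> {
  let pending: Promise<string> | null = null;
  return () => {
    if (!pending) {
      // Store the promise itself so concurrent first callers await the same
      // request; failures and blank handles settle to the fallback.
      pending = lookup()
        .then((h) => h.trim() || FALLBACK)
        .catch(() => FALLBACK);
    }
    return pending;
  };
}

function buildUserAgent(handle: string, purpose: string): string {
  // Same character policy as buildWikimediaUserAgent: anything outside
  // [a-zA-Z0-9_-] becomes '-', then the result is lowercased.
  const safePurpose = purpose.replace(/[^a-zA-Z0-9_-]/g, '-').toLowerCase();
  return (
    `Shadowbroker/1.0 (operator: ${handle}; purpose: ${safePurpose}; ` +
    '+https://github.com/BigBodyCobain/Shadowbroker; report issues at /issues)'
  );
}
```

A failed lookup settles to the fallback and stays cached. This sketch therefore never retries within a page lifetime, and neither does `getOperatorHandle`. A retry policy would need to clear `pending` on the fallback path.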
@@ -80,7 +80,6 @@ dependencies = [
    { name = "apscheduler" },
    { name = "beautifulsoup4" },
    { name = "cachetools" },
-    { name = "cloudscraper" },
    { name = "cryptography" },
    { name = "defusedxml" },
    { name = "fastapi" },
@@ -119,7 +118,6 @@ requires-dist = [
    { name = "apscheduler", specifier = "==3.10.3" },
    { name = "beautifulsoup4", specifier = ">=4.9.0" },
    { name = "cachetools", specifier = "==5.5.2" },
-    { name = "cloudscraper", specifier = "==1.2.71" },
    { name = "cryptography", specifier = ">=41.0.0" },
    { name = "defusedxml", specifier = ">=0.7.1" },
    { name = "fastapi", specifier = "==0.115.12" },
@@ -453,20 +451,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/98/78/01c019cdb5d6498122777c1a43056ebb3ebfeef2076d9d026bfe15583b2b/click-8.3.1-py3-none-any.whl", hash = "sha256:981153a64e25f12d547d3426c367a4857371575ee7ad18df2a6183ab0545b2a6", size = 108274, upload-time = "2025-11-15T20:45:41.139Z" },
 ]

-[[package]]
-name = "cloudscraper"
-version = "1.2.71"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "pyparsing" },
-    { name = "requests" },
-    { name = "requests-toolbelt" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/ac/25/6d0481860583f44953bd791de0b7c4f6d7ead7223f8a17e776247b34a5b4/cloudscraper-1.2.71.tar.gz", hash = "sha256:429c6e8aa6916d5bad5c8a5eac50f3ea53c9ac22616f6cb21b18dcc71517d0d3", size = 93261, upload-time = "2023-04-25T23:20:19.467Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/81/97/fc88803a451029688dffd7eb446dc1b529657577aec13aceff1cc9628c5d/cloudscraper-1.2.71-py2.py3-none-any.whl", hash = "sha256:76f50ca529ed2279e220837befdec892626f9511708e200d48d5bb76ded679b0", size = 99652, upload-time = "2023-04-25T23:20:15.974Z" },
-]
-
 [[package]]
 name = "colorama"
 version = "0.4.6"
@@ -1643,15 +1627,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/99/32/15e08a0c4bb536303e1568e2ba5cae1ce39a2e026a03aea46173af4c7a2d/pyobjc_framework_libdispatch-12.1-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:23fc9915cba328216b6a736c7a48438a16213f16dfb467f69506300b95938cc7", size = 15976, upload-time = "2025-11-14T09:53:07.936Z" },
 ]

-[[package]]
-name = "pyparsing"
-version = "3.3.2"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/f3/91/9c6ee907786a473bf81c5f53cf703ba0957b23ab84c264080fb5a450416f/pyparsing-3.3.2.tar.gz", hash = "sha256:c777f4d763f140633dcb6d8a3eda953bf7a214dc4eff598413c070bcdc117cbc", size = 6851574, upload-time = "2026-01-21T03:57:59.36Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/10/bd/c038d7cc38edc1aa5bf91ab8068b63d4308c66c4c8bb3cbba7dfbc049f9c/pyparsing-3.3.2-py3-none-any.whl", hash = "sha256:850ba148bd908d7e2411587e247a1e4f0327839c40e2e5e6d05a007ecc69911d", size = 122781, upload-time = "2026-01-21T03:57:55.912Z" },
-]
-
 [[package]]
 name = "pypubsub"
 version = "4.0.7"
@@ -1901,18 +1876,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl", hash = "sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f", size = 62574, upload-time = "2023-05-22T15:12:42.313Z" },
 ]

-[[package]]
-name = "requests-toolbelt"
-version = "1.0.0"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "requests" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/f3/61/d7545dafb7ac2230c70d38d31cbfe4cc64f7144dc41f6e4e4b78ecd9f5bb/requests-toolbelt-1.0.0.tar.gz", hash = "sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6", size = 206888, upload-time = "2023-05-01T04:11:33.229Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" },
-]
-
 [[package]]
 name = "reverse-geocoder"
 version = "1.5.1"