Files
Shadowbroker/backend/services/carrier_tracker.py
T
Shadowbroker 5e6bb8511a Fix #244/#245/#246: carrier tracker seed/cache/freshness model (#285)
Replace the dated editorial fallback positions baked into the registry
with a one-shot seed file + persistent observation cache. The user's
runtime cache now reflects what THIS install has actually observed,
not what USNI published on March 9, 2026. A year from now, the cache
holds a year of observations and the seed is irrelevant.

== #244: dated editorial coordinates out of the registry ==

CARRIER_REGISTRY no longer carries fallback_lat/lng/heading/desc.
Those fields are deleted. The registry is now identity + homeport
only.

New file: backend/data/carrier_seed.json
  - Read-only, shipped with every release.
  - Used ONCE on first-ever startup to bootstrap carrier_cache.json.
  - Each entry stamped with position_confidence="seed" and the actual
    as-of date (2026-03-09), NOT now().

== #245: approximate confidence for headline-derived positions ==

_parse_carrier_positions_from_news() now stamps every GDELT-derived
entry with position_confidence="approximate" so the UI knows the
coordinate is a region-centroid match, not a precise observation.
After the freshness window the label rolls over to
"stale_approximate" so old-and-imprecise is distinguishable from
recent-and-imprecise.

The article's actual seendate is used as position_source_at instead
of now(), so the "last reported X days ago" badge is honest.

== #246: freshness is labelling, not eviction ==

The cache always preserves the last position the system observed,
forever. What changes is the position_confidence label:
  - within configurable window (default 14d, env-overridable via
    SHADOWBROKER_CARRIER_FRESHNESS_DAYS) -> "recent"
  - older -> "stale"
  - seed-bootstrap entries that were never refreshed -> "seed"
  - homeport defaults (carrier added post-install) -> "homeport_default"
  - headline-derived (any age, fresh) -> "approximate"
  - headline-derived (older than window) -> "stale_approximate"

The position itself never reverts to the seed or the registry. The
user always sees the last position the system observed. Per the
user's explicit guidance: "from there have it be the last position
the user has logged the carriers that way a year from now it doesnt
revert to where the ships are today".

== Other improvements ==

- CACHE_FILE moved to backend/data/carrier_cache.json so it lives in
  the volume-mounted dir under Docker compose. Previously it was at
  /app/carrier_cache.json which got wiped on every container restart
  (pre-existing bug).
- Atomic cache write (temp + os.replace) so a crash mid-write does
  not leave a truncated cache file.

== Public API shape ==

Every carrier object the API emits now includes:
  - position_confidence: seed | recent | stale | approximate |
                         stale_approximate | homeport_default
  - position_source_at:  ISO timestamp of when the underlying source
                         was observed (NOT now())
  - is_fallback:         convenience boolean for the UI; true when the
                         confidence is seed/stale/stale_approximate/
                         homeport_default

Existing fields (estimated, source, source_url, last_osint_update,
name, type, lat, lng, country, desc, wiki) are preserved exactly so
the current ShipPopup frontend renders unchanged. last_osint_update
now reflects position_source_at instead of now(), which is what the
existing "last reported MM/DD" badge always meant to show.

Tests: backend/tests/test_carrier_tracker_quality.py — 17 tests
covering seed bootstrap, subsequent-startup ignoring seed, no-seed/
no-cache homeport fallback, registry no longer has fallback fields,
freshness window labelling + env override, "year-old cache entry keeps
its position, only the label flips" regression, approximate
confidence for headline matches, GDELT seendate ISO parser, public
response shape backward compat.

Credit: tg12 (external security audit, three P1/P2 issues).
2026-05-21 11:15:52 -06:00

800 lines
30 KiB
Python

"""
Carrier Strike Group OSINT Tracker
===================================
Maintains estimated positions for US Navy Carrier Strike Groups with
honest provenance and freshness signals.
Issues #244 / #245 / #246 (tg12 external audit):
The previous implementation baked a snapshot of USNI News Fleet &
Marine Tracker positions (March 9, 2026) into the registry as
``fallback_lat``/``fallback_lng`` and stamped ``updated = now()``
every time the dossier was rendered. That presented stale editorial
data as live state. It also persisted GDELT-derived positions to the
on-disk cache with no freshness signal, so a single news mention from
months ago could keep overriding the (already-stale) registry default
indefinitely.
Architecture after this PR:
::
backend/data/carrier_seed.json read-only, shipped with image,
used ONCE on first-ever startup
to bootstrap carrier_cache.json.
backend/data/carrier_cache.json mutable, lives in the runtime data
volume, written by every GDELT
refresh + any future source.
Startup flow:
1. ``carrier_cache.json`` exists? → load it.
2. Otherwise, copy ``carrier_seed.json`` → ``carrier_cache.json``,
then load it. (This happens once, ever, per install.)
3. Background: GDELT fetch runs. Any carrier mentioned in fresh news
gets its entry replaced with the news-derived position.
``position_source_at`` is set to the news article timestamp.
Freshness is a *labelling* decision, not an eviction decision:
- ``position_source_at`` within the configurable freshness window
(default 14 days) → ``position_confidence = "recent"``.
- Older than that → ``position_confidence = "stale"``.
- Bootstrapped from the seed file (never updated) → ``"seed"``.
- No cache entry at all (e.g. a carrier added to the registry after
first install) → carrier renders at its homeport with
``"homeport_default"``.
Carriers are never hidden, never teleported, never disappeared. The
position the user sees is always the last position the system actually
observed, with an honest "as-of" timestamp the UI can render however
it likes. A year from now, the runtime cache reflects whatever this
install has observed via GDELT — not the seed snapshot.
"""
import os
import json
import time
import logging
import threading
import random
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from services.network_utils import fetch_with_curl
logger = logging.getLogger(__name__)
# -----------------------------------------------------------------
# Carrier registry: hull number → identity only.
#
# Issue #244 (tg12): the previous registry carried hard-coded
# ``fallback_lat``/``fallback_lng`` that were dated editorial
# snapshots from a 2026-03-09 article. Those fields are DELETED. The
# registry is now identity + homeport only; positions are sourced
# exclusively from carrier_cache.json (and via that, from the
# bootstrap seed or live OSINT).
# -----------------------------------------------------------------
CARRIER_REGISTRY: Dict[str, dict] = {
# --- Bremerton, WA (Naval Base Kitsap) ---
"CVN-68": {
"name": "USS Nimitz (CVN-68)",
"wiki": "https://en.wikipedia.org/wiki/USS_Nimitz",
"homeport": "Bremerton, WA",
"homeport_lat": 47.5535,
"homeport_lng": -122.6400,
},
"CVN-76": {
"name": "USS Ronald Reagan (CVN-76)",
"wiki": "https://en.wikipedia.org/wiki/USS_Ronald_Reagan",
"homeport": "Bremerton, WA",
"homeport_lat": 47.5580,
"homeport_lng": -122.6360,
},
# --- Norfolk, VA (Naval Station Norfolk) ---
"CVN-69": {
"name": "USS Dwight D. Eisenhower (CVN-69)",
"wiki": "https://en.wikipedia.org/wiki/USS_Dwight_D._Eisenhower",
"homeport": "Norfolk, VA",
"homeport_lat": 36.9465,
"homeport_lng": -76.3265,
},
"CVN-78": {
"name": "USS Gerald R. Ford (CVN-78)",
"wiki": "https://en.wikipedia.org/wiki/USS_Gerald_R._Ford",
"homeport": "Norfolk, VA",
"homeport_lat": 36.9505,
"homeport_lng": -76.3250,
},
"CVN-74": {
"name": "USS John C. Stennis (CVN-74)",
"wiki": "https://en.wikipedia.org/wiki/USS_John_C._Stennis",
"homeport": "Norfolk, VA",
"homeport_lat": 36.9540,
"homeport_lng": -76.3235,
},
"CVN-75": {
"name": "USS Harry S. Truman (CVN-75)",
"wiki": "https://en.wikipedia.org/wiki/USS_Harry_S._Truman",
"homeport": "Norfolk, VA",
"homeport_lat": 36.9580,
"homeport_lng": -76.3220,
},
"CVN-77": {
"name": "USS George H.W. Bush (CVN-77)",
"wiki": "https://en.wikipedia.org/wiki/USS_George_H.W._Bush",
"homeport": "Norfolk, VA",
"homeport_lat": 36.9620,
"homeport_lng": -76.3210,
},
# --- San Diego, CA (Naval Base San Diego) ---
"CVN-70": {
"name": "USS Carl Vinson (CVN-70)",
"wiki": "https://en.wikipedia.org/wiki/USS_Carl_Vinson",
"homeport": "San Diego, CA",
"homeport_lat": 32.6840,
"homeport_lng": -117.1290,
},
"CVN-71": {
"name": "USS Theodore Roosevelt (CVN-71)",
"wiki": "https://en.wikipedia.org/wiki/USS_Theodore_Roosevelt_(CVN-71)",
"homeport": "San Diego, CA",
"homeport_lat": 32.6885,
"homeport_lng": -117.1280,
},
"CVN-72": {
"name": "USS Abraham Lincoln (CVN-72)",
"wiki": "https://en.wikipedia.org/wiki/USS_Abraham_Lincoln_(CVN-72)",
"homeport": "San Diego, CA",
"homeport_lat": 32.6925,
"homeport_lng": -117.1275,
},
# --- Yokosuka, Japan (CFAY) ---
"CVN-73": {
"name": "USS George Washington (CVN-73)",
"wiki": "https://en.wikipedia.org/wiki/USS_George_Washington_(CVN-73)",
"homeport": "Yokosuka, Japan",
"homeport_lat": 35.2830,
"homeport_lng": 139.6700,
},
}
# -----------------------------------------------------------------
# Region → approximate center coordinates.
#
# Issue #245 (tg12): converting a region name straight into precise
# map coordinates is false precision. We still use this table to
# infer a coarse position from a headline mention, but the resulting
# carrier object is now stamped ``position_confidence = "approximate"``
# so the UI can render an uncertainty radius / dimmed icon. The
# centroid is a best-effort midpoint of the named body of water.
# -----------------------------------------------------------------
REGION_COORDS: Dict[str, tuple] = {
# Oceans & Seas
"eastern mediterranean": (34.0, 25.0),
"mediterranean": (36.0, 15.0),
"western mediterranean": (37.0, 2.0),
"red sea": (18.0, 39.5),
"arabian sea": (16.0, 64.0),
"persian gulf": (26.5, 51.5),
"gulf of oman": (24.5, 58.5),
"north arabian sea": (20.0, 64.0),
"south china sea": (15.0, 115.0),
"east china sea": (28.0, 125.0),
"philippine sea": (20.0, 130.0),
"sea of japan": (40.0, 135.0),
"taiwan strait": (24.0, 119.5),
"western pacific": (20.0, 140.0),
"pacific": (20.0, -150.0),
"indian ocean": (-5.0, 70.0),
"north atlantic": (40.0, -40.0),
"atlantic": (30.0, -50.0),
"gulf of aden": (12.5, 45.0),
"horn of africa": (10.0, 50.0),
"strait of hormuz": (26.5, 56.3),
"bab el-mandeb": (12.6, 43.3),
"suez canal": (30.5, 32.3),
"baltic sea": (57.0, 18.0),
"north sea": (56.0, 3.0),
"black sea": (43.0, 34.0),
"south atlantic": (-20.0, -20.0),
"coral sea": (-18.0, 155.0),
"gulf of mexico": (25.0, -90.0),
"caribbean": (15.0, -75.0),
# Specific bases / ports
"norfolk": (36.95, -76.33),
"san diego": (32.68, -117.15),
"yokosuka": (35.28, 139.67),
"pearl harbor": (21.35, -157.95),
"guam": (13.45, 144.79),
"bahrain": (26.23, 50.55),
"rota": (36.62, -6.35),
"naples": (40.85, 14.27),
"bremerton": (47.56, -122.63),
"puget sound": (47.56, -122.63),
"newport news": (36.98, -76.43),
# Areas of operation
"centcom": (25.0, 55.0),
"indopacom": (20.0, 130.0),
"eucom": (48.0, 15.0),
"southcom": (10.0, -80.0),
"5th fleet": (25.0, 55.0),
"6th fleet": (36.0, 15.0),
"7th fleet": (25.0, 130.0),
"3rd fleet": (30.0, -130.0),
"2nd fleet": (35.0, -60.0),
}
# -----------------------------------------------------------------
# Files
# -----------------------------------------------------------------
#
# The seed lives in the read-only image data dir (it ships with each
# release). The cache lives in the same data dir but is written at
# runtime; under Docker compose this dir is volume-mounted so the
# cache persists across container restarts, which is the whole point
# of the seed-then-observe model — the user's runtime observations
# survive image upgrades.
SEED_FILE = Path(__file__).parent.parent / "data" / "carrier_seed.json"
CACHE_FILE = Path(__file__).parent.parent / "data" / "carrier_cache.json"
# -----------------------------------------------------------------
# Freshness window for position_confidence labeling. Issue #246 (tg12):
# previously persisted cache entries had no freshness signal at all.
# After this change, the position itself is preserved (we never lose
# what was last observed) but the confidence label flips from
# "recent" to "stale" once the underlying source is older than this
# window. Operator-overridable via env var.
# -----------------------------------------------------------------
_DEFAULT_FRESHNESS_WINDOW_DAYS = 14
def _freshness_window_days() -> int:
raw = str(os.environ.get("SHADOWBROKER_CARRIER_FRESHNESS_DAYS", "") or "").strip()
if not raw:
return _DEFAULT_FRESHNESS_WINDOW_DAYS
try:
n = int(raw)
return n if n > 0 else _DEFAULT_FRESHNESS_WINDOW_DAYS
except (TypeError, ValueError):
return _DEFAULT_FRESHNESS_WINDOW_DAYS
_carrier_positions: Dict[str, dict] = {}
_positions_lock = threading.Lock()
_last_update: Optional[datetime] = None
_last_gdelt_fetch_at = 0.0
_cached_gdelt_articles: List[dict] = []
_GDELT_FETCH_INTERVAL_SECONDS = 1800
_GDELT_REQUEST_DELAY_SECONDS = 1.25
_GDELT_REQUEST_JITTER_SECONDS = 0.35
def _now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
def _parse_iso(ts: str) -> Optional[datetime]:
if not ts:
return None
try:
# Python's fromisoformat accepts +00:00 but not 'Z' until 3.11.
normalized = ts.replace("Z", "+00:00")
dt = datetime.fromisoformat(normalized)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except (TypeError, ValueError):
return None
def _compute_position_confidence(entry: dict, *, now: Optional[datetime] = None) -> str:
"""Return the public confidence label for a carrier cache entry.
Order of precedence:
- explicit "homeport_default" / "seed" labels are preserved.
- dated entries (with position_source_at) are "recent" if within
the configured freshness window, else "stale".
- missing position_source_at falls through to "stale".
"""
raw_label = str(entry.get("position_confidence", "") or "").strip()
# Explicit "kind of provenance" labels are preserved as-is. They
# describe HOW we got the position, not WHEN — a fresh headline-to-
# centroid match (#245) is still imprecise no matter how recently
# it was observed, and the seed (#244) is always the seed.
if raw_label in {"seed", "homeport_default", "approximate"}:
# Approximate entries can still age into "stale_approximate" if
# they fall out of the freshness window — that distinction lets
# the UI render a different badge for old-and-imprecise vs
# recent-and-imprecise. seed/homeport_default never age (they
# were never timestamped against real observations).
if raw_label == "approximate":
source_at = _parse_iso(str(entry.get("position_source_at", "") or ""))
if source_at is not None:
reference = now or datetime.now(timezone.utc)
if reference - source_at > timedelta(days=_freshness_window_days()):
return "stale_approximate"
return raw_label
source_at = _parse_iso(str(entry.get("position_source_at", "") or ""))
if not source_at:
return "stale"
reference = now or datetime.now(timezone.utc)
window = timedelta(days=_freshness_window_days())
if reference - source_at <= window:
return "recent"
return "stale"
def _load_seed() -> Dict[str, dict]:
"""Load the read-only seed file shipped with the image.
Returns a hull→entry dict (no _meta wrapper). Missing or malformed
seed files yield an empty dict — the caller falls back to homeport
defaults.
"""
try:
if not SEED_FILE.exists():
logger.info("Carrier seed file not present at %s; first-run will fall back to homeport defaults", SEED_FILE)
return {}
raw = json.loads(SEED_FILE.read_text(encoding="utf-8"))
carriers = raw.get("carriers", {}) if isinstance(raw, dict) else {}
if not isinstance(carriers, dict):
return {}
logger.info("Carrier seed loaded: %d entries from %s", len(carriers), SEED_FILE)
return carriers
except (IOError, OSError, json.JSONDecodeError, ValueError) as e:
logger.warning("Failed to load carrier seed file %s: %s", SEED_FILE, e)
return {}
def _load_cache() -> Dict[str, dict]:
"""Load the mutable cache (last-known positions persisted between restarts)."""
try:
if CACHE_FILE.exists():
data = json.loads(CACHE_FILE.read_text(encoding="utf-8"))
if isinstance(data, dict):
logger.info("Carrier cache loaded: %d carriers from %s", len(data), CACHE_FILE)
return data
except (IOError, OSError, json.JSONDecodeError, ValueError) as e:
logger.warning("Failed to load carrier cache: %s", e)
return {}
def _save_cache(positions: Dict[str, dict]) -> None:
"""Persist the mutable cache. Atomic write (temp + rename) so a crash
mid-write can't leave the file truncated."""
try:
CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
tmp = CACHE_FILE.with_suffix(CACHE_FILE.suffix + ".tmp")
tmp.write_text(json.dumps(positions, indent=2), encoding="utf-8")
# On Windows os.replace is atomic and overwrites existing files.
os.replace(tmp, CACHE_FILE)
logger.info("Carrier cache saved: %d carriers", len(positions))
except (IOError, OSError) as e:
logger.warning("Failed to save carrier cache: %s", e)
def _homeport_entry_for(hull: str) -> Optional[dict]:
"""Return a homeport-default cache entry for a hull, or None if the
hull is not in the registry."""
info = CARRIER_REGISTRY.get(hull)
if not info:
return None
return {
"lat": info["homeport_lat"],
"lng": info["homeport_lng"],
"heading": 0,
"desc": f"{info['homeport']} (no observations yet)",
"source": f"Homeport default ({info['homeport']})",
"source_url": info.get("wiki", ""),
"position_source_at": _now_iso(),
"position_confidence": "homeport_default",
}
def _bootstrap_cache_if_missing() -> Dict[str, dict]:
"""One-shot: if no cache exists, materialize one from the seed file.
Returns the cache contents (hull→entry). On first-ever startup,
this writes ``carrier_cache.json`` so subsequent restarts skip the
seed entirely. Operator-deleted caches re-bootstrap the same way —
operators can use that to "reset" carrier positions, but it's an
explicit operator action.
"""
if CACHE_FILE.exists():
return _load_cache()
seed = _load_seed()
if not seed:
# No seed file either. Build a homeport-default cache so the
# first save_cache call still produces something honest.
homeports: Dict[str, dict] = {}
for hull in CARRIER_REGISTRY:
entry = _homeport_entry_for(hull)
if entry is not None:
homeports[hull] = entry
if homeports:
_save_cache(homeports)
return homeports
# Persist the seed as the first cache so subsequent runs skip this branch.
_save_cache(seed)
logger.info("Carrier cache bootstrapped from seed (first-ever startup)")
return dict(seed)
def _match_region(text: str) -> Optional[tuple]:
"""Match a text string against known regions, return (lat, lng) or None."""
text_lower = text.lower()
for region, coords in sorted(REGION_COORDS.items(), key=lambda x: -len(x[0])):
if region in text_lower:
return coords
return None
def _match_carrier(text: str) -> Optional[str]:
"""Match a text string against known carrier names/hull numbers."""
text_lower = text.lower()
for hull, info in CARRIER_REGISTRY.items():
hull_check = hull.lower().replace("-", "")
name_parts = info["name"].lower()
if hull.lower() in text_lower or hull_check in text_lower.replace("-", ""):
return hull
ship_name = name_parts.split("(")[0].strip()
last_name = ship_name.split()[-1] if ship_name else ""
if last_name and len(last_name) > 3 and last_name in text_lower:
return hull
return None
def _fetch_gdelt_carrier_news() -> List[dict]:
"""Search GDELT for recent carrier movement news."""
global _last_gdelt_fetch_at, _cached_gdelt_articles
now = time.time()
if _cached_gdelt_articles and (now - _last_gdelt_fetch_at) < _GDELT_FETCH_INTERVAL_SECONDS:
logger.info("Carrier OSINT: using cached GDELT article set to avoid startup bursts")
return list(_cached_gdelt_articles)
results = []
search_terms = [
"aircraft+carrier+deployed",
"carrier+strike+group+navy",
"USS+Nimitz+carrier",
"USS+Ford+carrier",
"USS+Eisenhower+carrier",
"USS+Vinson+carrier",
"USS+Roosevelt+carrier+navy",
"USS+Lincoln+carrier",
"USS+Truman+carrier",
"USS+Reagan+carrier",
"USS+Washington+carrier+navy",
"USS+Bush+carrier",
"USS+Stennis+carrier",
]
for idx, term in enumerate(search_terms):
try:
url = f"https://api.gdeltproject.org/api/v2/doc/doc?query={term}&mode=artlist&maxrecords=5&format=json&timespan=14d"
raw = fetch_with_curl(url, timeout=8)
if getattr(raw, "status_code", 500) == 429:
logger.warning(
"GDELT returned 429 for '%s'; preserving cached carrier OSINT results",
term,
)
continue
if not raw or not hasattr(raw, "text"):
continue
data = raw.json()
articles = data.get("articles", [])
for art in articles:
title = art.get("title", "")
article_url = art.get("url", "")
article_at = art.get("seendate") or art.get("date") or ""
results.append({"title": title, "url": article_url, "seendate": article_at})
except (ConnectionError, TimeoutError, ValueError, KeyError, OSError) as e:
logger.debug(f"GDELT search failed for '{term}': {e}")
continue
if idx < len(search_terms) - 1:
time.sleep(
_GDELT_REQUEST_DELAY_SECONDS
+ random.uniform(0.0, _GDELT_REQUEST_JITTER_SECONDS)
)
_cached_gdelt_articles = list(results)
_last_gdelt_fetch_at = time.time()
logger.info(f"Carrier OSINT: found {len(results)} GDELT articles")
return results
def _gdelt_seendate_to_iso(seendate: str) -> Optional[str]:
"""GDELT returns YYYYMMDDhhmmss (UTC). Convert to ISO8601 for
position_source_at. Returns None if the input is unparseable."""
raw = (seendate or "").strip()
if len(raw) < 8 or not raw.isdigit():
return None
try:
dt = datetime.strptime(raw[:14] if len(raw) >= 14 else raw[:8] + "000000", "%Y%m%d%H%M%S")
return dt.replace(tzinfo=timezone.utc).isoformat()
except (TypeError, ValueError):
return None
def _parse_carrier_positions_from_news(articles: List[dict]) -> Dict[str, dict]:
"""Parse carrier positions from news article titles.
Issue #245 (tg12): the position is a region centroid, which is
coarse — we now stamp ``position_confidence = "approximate"`` so
the UI can render that uncertainty. Issue #244: the
``position_source_at`` field is the news article's actual seen
date, NOT now(), so the freshness check correctly flips entries
to "stale" once they age past the configured window.
"""
updates: Dict[str, dict] = {}
for article in articles:
title = article.get("title", "")
hull = _match_carrier(title)
if not hull:
continue
coords = _match_region(title)
if not coords:
continue
# First match wins (most recent article, GDELT returns newest first
# per term).
if hull not in updates:
iso_at = _gdelt_seendate_to_iso(str(article.get("seendate", ""))) or _now_iso()
updates[hull] = {
"lat": coords[0],
"lng": coords[1],
"heading": 0,
"desc": title[:100],
"source": "GDELT News API (headline region match — approximate)",
"source_url": article.get("url", "https://api.gdeltproject.org"),
"position_source_at": iso_at,
# Headline-to-centroid match is explicitly approximate.
"position_confidence": "approximate",
}
logger.info(
"Carrier update: %s%s (from: %s)",
CARRIER_REGISTRY[hull]["name"],
coords,
title[:80],
)
return updates
def _enrich_for_rendering(hull: str, entry: dict, *, now: Optional[datetime] = None) -> dict:
"""Add live computed fields (confidence label, last_osint_update)
on top of the persisted cache entry. The persisted entry is left
untouched; this function builds the public-facing object.
"""
info = CARRIER_REGISTRY.get(hull, {})
confidence = _compute_position_confidence(entry, now=now)
return {
"name": entry.get("name", info.get("name", hull)),
"lat": entry["lat"],
"lng": entry["lng"],
"heading": entry.get("heading", 0),
"desc": entry.get("desc", ""),
"wiki": entry.get("wiki", info.get("wiki", "")),
"source": entry.get("source", "OSINT estimated position"),
"source_url": entry.get("source_url", ""),
"position_source_at": entry.get("position_source_at", ""),
"position_confidence": confidence,
# Existing field preserved for backward compatibility with the
# current frontend ShipPopup; now reflects the SOURCE's observed
# time (not now()), so "last reported X days ago" is honest.
"last_osint_update": entry.get("position_source_at", ""),
# Convenience boolean for the UI: true when the position is
# NOT live OSINT (used to render dimmed icons / badges).
"is_fallback": confidence in {"seed", "stale", "stale_approximate", "homeport_default"},
}
def update_carrier_positions() -> None:
"""Refresh carrier positions.
Phase 1 (instant): publish whatever's in carrier_cache.json (or
bootstrap from seed on first-ever run), so the map has carriers
immediately.
Phase 2 (slow): query GDELT and replace position entries for any
carrier mentioned in fresh news. Persist back to cache.
"""
global _last_update
# --- Phase 1: instant cache (bootstrap from seed on first-ever run) ---
positions = _bootstrap_cache_if_missing()
# Ensure every registered hull has SOMETHING in the cache. A hull
# the seed didn't cover (e.g. added after install) renders at its
# homeport with "homeport_default" confidence.
for hull in CARRIER_REGISTRY:
if hull not in positions:
entry = _homeport_entry_for(hull)
if entry is not None:
positions[hull] = entry
with _positions_lock:
if not _carrier_positions:
_carrier_positions.update(positions)
_last_update = datetime.now(timezone.utc)
logger.info(
"Carrier tracker: %d carriers loaded from cache (GDELT enrichment starting...)",
len(positions),
)
# --- Phase 2: GDELT enrichment ---
try:
articles = _fetch_gdelt_carrier_news()
news_positions = _parse_carrier_positions_from_news(articles)
for hull, pos in news_positions.items():
# Always overwrite — newest GDELT mention wins. The previous
# entry's position is preserved in git history and the next
# cycle either confirms or replaces it.
positions[hull] = pos
logger.info("Carrier OSINT: updated %s from news", CARRIER_REGISTRY[hull]["name"])
except (ValueError, KeyError, json.JSONDecodeError, OSError) as e:
logger.warning("GDELT carrier fetch failed: %s", e)
with _positions_lock:
_carrier_positions.clear()
_carrier_positions.update(positions)
_last_update = datetime.now(timezone.utc)
_save_cache(positions)
confidences: Dict[str, int] = {}
for entry in positions.values():
label = _compute_position_confidence(entry)
confidences[label] = confidences.get(label, 0) + 1
logger.info("Carrier tracker: %d carriers updated. Confidence: %s", len(positions), confidences)
def _deconflict_positions(result: List[dict]) -> List[dict]:
"""Offset carriers that share identical coordinates so they don't stack."""
from collections import defaultdict
groups: dict[str, list[int]] = defaultdict(list)
for i, c in enumerate(result):
key = f"{round(c['lat'], 2)},{round(c['lng'], 2)}"
groups[key].append(i)
for indices in groups.values():
if len(indices) < 2:
continue
n = len(indices)
sample = result[indices[0]]
at_port = any(
abs(sample["lat"] - info.get("homeport_lat", 0)) < 0.05
and abs(sample["lng"] - info.get("homeport_lng", 0)) < 0.05
for info in CARRIER_REGISTRY.values()
)
if at_port:
for idx in indices:
carrier = result[idx]
hull = None
for h, info in CARRIER_REGISTRY.items():
if info["name"] == carrier["name"]:
hull = h
break
if hull:
info = CARRIER_REGISTRY[hull]
carrier["lat"] = info["homeport_lat"]
carrier["lng"] = info["homeport_lng"]
else:
spacing = 0.08
start_offset = -(n - 1) * spacing / 2
for j, idx in enumerate(indices):
result[idx]["lng"] += start_offset + j * spacing
return result
def get_carrier_positions() -> List[dict]:
"""Return current carrier positions for the data pipeline.
Each entry has the full provenance + freshness fields; the UI can
decide how to render them. Carriers are never hidden — only
labeled.
"""
now = datetime.now(timezone.utc)
with _positions_lock:
result: List[dict] = []
for hull, entry in _carrier_positions.items():
enriched = _enrich_for_rendering(hull, entry, now=now)
result.append(
{
"name": enriched["name"],
"type": "carrier",
"lat": enriched["lat"],
"lng": enriched["lng"],
"heading": None, # OSINT cannot determine true heading.
"sog": 0,
"cog": 0,
"country": "United States",
"desc": enriched["desc"],
"wiki": enriched["wiki"],
"estimated": True,
"source": enriched["source"],
"source_url": enriched["source_url"],
"last_osint_update": enriched["last_osint_update"],
# New fields (additive — existing UI continues to work):
"position_source_at": enriched["position_source_at"],
"position_confidence": enriched["position_confidence"],
"is_fallback": enriched["is_fallback"],
}
)
return _deconflict_positions(result)
# -----------------------------------------------------------------
# Scheduler: runs at startup, then at 00:00 and 12:00 UTC daily.
# -----------------------------------------------------------------
_scheduler_thread: Optional[threading.Thread] = None
_scheduler_stop = threading.Event()
def _scheduler_loop():
"""Background thread that triggers updates at 00:00 and 12:00 UTC."""
try:
update_carrier_positions()
except Exception as e:
logger.error(f"Carrier tracker initial update failed: {e}")
while not _scheduler_stop.is_set():
now = datetime.now(timezone.utc)
hour = now.hour
if hour < 12:
next_hour = 12
else:
next_hour = 24 # midnight = next day 00:00
next_run = now.replace(hour=next_hour % 24, minute=0, second=0, microsecond=0)
if next_hour == 24:
next_run = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
wait_seconds = (next_run - now).total_seconds()
logger.info(
"Carrier tracker: next update at %s (%.1fh)",
next_run.isoformat(),
wait_seconds / 3600,
)
if _scheduler_stop.wait(timeout=wait_seconds):
break
try:
update_carrier_positions()
except Exception as e:
logger.error(f"Carrier tracker scheduled update failed: {e}")
def start_carrier_tracker():
"""Start the carrier tracker background thread."""
global _scheduler_thread
if _scheduler_thread and _scheduler_thread.is_alive():
return
_scheduler_stop.clear()
_scheduler_thread = threading.Thread(
target=_scheduler_loop, daemon=True, name="carrier-tracker"
)
_scheduler_thread.start()
logger.info("Carrier tracker started")
def stop_carrier_tracker():
"""Stop the carrier tracker background thread."""
_scheduler_stop.set()
if _scheduler_thread:
_scheduler_thread.join(timeout=5)
logger.info("Carrier tracker stopped")