Private gate messages and offline DMs now ride the Infonet hashchain
as ciphertext-only events, replicated across nodes via private
transports (Tor onion / RNS / loopback) and decrypted only by parties
holding the gate or recipient keys.
Hashchain core (mesh_hashchain.py)
----------------------------------
* New ``append_private_gate_message`` and ``append_private_dm_message``
append paths with full signature verification, public-key binding,
revocation check, and replay protection in a dedicated sequence
domain (so a gate post does not consume the author's public broadcast
sequence, and a DM cannot replay-block a public message at sequence=1).
* Fork validation and full-chain validation now accept the gate
signature compatibility variants — older signatures that canonicalize
with/without epoch or reply_to still verify, so a re-sync from an
older peer doesn't reject still-valid history.
* DM hashchain spool: capped at 2 active sealed offline DMs per
recipient mailbox, plus a per-(sender, recipient) cap of 1 so one
prolific sender can't consume both slots. The cap counter has a
1-hour TTL. The spool is intentionally small: it's an offline
bootstrap channel, not a persistent mailbox.
* Rebuild-state preserves the gate sequence domain across reloads so
a chain reload doesn't accidentally let an old gate sequence
replay-collide on next append.
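The two spool caps and the 1-hour TTL can be pictured as a small admission check in front of the DM spool. A minimal sketch, assuming an in-memory list of (sender, admitted_at) entries per recipient; ``DmSpool``, ``try_admit`` and the injectable clock are hypothetical names for illustration, not the code in mesh_hashchain.py.

```python
import time
from collections import defaultdict

RECIPIENT_CAP = 2   # active sealed offline DMs per recipient mailbox
PAIR_CAP = 1        # per-(sender, recipient) cap, as exercised by the tests
CAP_TTL_S = 3600.0  # an admitted DM stops counting against the caps after 1 hour


class DmSpool:
    """Hypothetical in-memory admission check for sealed offline DMs."""

    def __init__(self, now=time.time):
        self._now = now  # injectable clock so the TTL can be tested
        self._entries = defaultdict(list)  # recipient -> [(sender, admitted_at)]

    def _live(self, recipient):
        # Drop entries older than the TTL and return the survivors.
        cutoff = self._now() - CAP_TTL_S
        live = [(s, t) for s, t in self._entries[recipient] if t > cutoff]
        self._entries[recipient] = live
        return live

    def try_admit(self, sender, recipient):
        live = self._live(recipient)
        if len(live) >= RECIPIENT_CAP:
            return False  # mailbox already holds its 2 active DMs
        if sum(1 for s, _ in live if s == sender) >= PAIR_CAP:
            return False  # this sender already occupies a slot
        live.append((sender, self._now()))
        return True
```

A second DM from the same sender is refused while the first is still live, so a single sender can never hold both recipient slots.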
Schema enforcement (mesh_schema.py)
-----------------------------------
* Private gate + DM payloads have closed allowlists of fields.
Plaintext keys (``message``, ``plaintext``, ``_local_plaintext``,
``_local_reply_to``) are explicit rejection-bait — they raise before
the event ever touches the chain.
* DM ciphertext + nonce must look like base64-ish sealed bytes;
obvious base64-encoded plaintext shapes are rejected.
* ``transport_lock`` required: DM hashchain spool requires
``private_strong``; gate accepts ``private``/``private_strong``/
``rns``/``onion``.
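Taken together, the schema rules amount to a reject-first validator: plaintext bait keys fail before anything else is inspected, then the closed allowlist, then ``transport_lock``, then the sealed-bytes shape check. A minimal sketch, assuming hypothetical allowlisted field names (the real sets live in mesh_schema.py) and a printable-ASCII ratio as the "obvious plaintext" heuristic; the actual shape check may differ.

```python
import base64

PLAINTEXT_KEYS = {"message", "plaintext", "_local_plaintext", "_local_reply_to"}

# Hypothetical allowlists; the real field sets live in mesh_schema.py.
ALLOWED_FIELDS = {
    "dm_message": {"recipient", "ciphertext", "nonce", "transport_lock"},
    "gate_message": {"gate", "epoch", "ciphertext", "nonce", "transport_lock"},
}

TRANSPORT_LOCKS = {
    "dm_message": {"private_strong"},
    "gate_message": {"private", "private_strong", "rns", "onion"},
}


class SchemaError(ValueError):
    """Raised before a payload can reach the chain."""


def _looks_sealed(value: str) -> bool:
    # Must be strict base64 that decodes to something non-empty...
    try:
        raw = base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError
        return False
    if not raw:
        return False
    # ...and must not decode to (almost) all printable ASCII, the "obvious
    # plaintext" shape. The 95% threshold is an assumption.
    printable = sum(1 for b in raw if 32 <= b < 127)
    return printable / len(raw) < 0.95


def validate_private_payload(kind: str, payload: dict) -> None:
    if kind not in ALLOWED_FIELDS:
        raise SchemaError(f"not a private event kind: {kind}")
    leaked = payload.keys() & PLAINTEXT_KEYS
    if leaked:
        raise SchemaError(f"plaintext field(s) rejected: {sorted(leaked)}")
    unknown = payload.keys() - ALLOWED_FIELDS[kind]
    if unknown:
        raise SchemaError(f"field(s) outside allowlist: {sorted(unknown)}")
    if payload.get("transport_lock") not in TRANSPORT_LOCKS[kind]:
        raise SchemaError(f"{kind} transport_lock not allowed")
    if kind == "dm_message":
        for name in ("ciphertext", "nonce"):
            if not _looks_sealed(str(payload.get(name, ""))):
                raise SchemaError(f"{name} does not look like sealed bytes")
```

Checking the bait keys first means a payload carrying plaintext is refused even if it also has other problems, so the error never hinges on allowlist ordering.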
Defense-in-depth at the network layer (main.py + mesh_public.py)
----------------------------------------------------------------
* ``_infonet_sync_response_events`` now silently redacts private events
(gate_message + dm_message) unless the request looks like a loopback /
onion / RNS / private transport caller. If an operator accidentally
exposes :8000 to the public internet, an external puller gets
public events only — never ciphertext.
* On 429, the peer-sync path raises ``PeerSyncRateLimited``, which
``_sync_from_peer`` turns into a 4-tuple return carrying retry_after_s;
other non-200 statuses raise ``PeerSyncHTTPError``, which
``_run_public_sync_cycle`` handles so it can honor server cooldown
hints even outside the 429 path.
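The redaction rule reduces to a filter on the outgoing event list. A sketch under stated assumptions: events are dicts with a hypothetical ``event_type`` key, and ``transport_hint`` is set by the node's own transport layer (RNS bridge, private lane), never copied from a client-supplied header, since a public caller could otherwise simply claim to be private. Onion traffic needs no hint here because, with the relay compose setup, Tor forwards it in from 127.0.0.1.

```python
import ipaddress
from typing import Iterable, List, Optional

PRIVATE_EVENT_TYPES = {"gate_message", "dm_message"}
PRIVATE_TRANSPORTS = {"onion", "rns", "private"}


def _is_private_caller(client_host: str, transport_hint: Optional[str]) -> bool:
    # transport_hint must come from the node's own transport layer; a value
    # read from a request header would let any public caller opt in.
    if transport_hint in PRIVATE_TRANSPORTS:
        return True
    try:
        return ipaddress.ip_address(client_host).is_loopback
    except ValueError:  # unparseable host: treat as public
        return False


def sync_response_events(events: Iterable[dict], client_host: str,
                         transport_hint: Optional[str] = None) -> List[dict]:
    if _is_private_caller(client_host, transport_hint):
        return list(events)
    # Public callers get public events only; private events are dropped
    # silently rather than reported as an error.
    return [e for e in events if e.get("event_type") not in PRIVATE_EVENT_TYPES]
```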
DM relay hydration (main.py)
-----------------------------
* New ``_hydrate_dm_relay_from_chain``: when accepted dm_message chain
events arrive on a node, they get deposited into the local DM relay
store with a deterministic sender_token_hash so re-sync of the same
event is idempotent. Recipients see the ciphertext as a normal DM
on their next poll and decrypt with their existing recipient key.
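Idempotent hydration only needs the relay key to be a pure function of the chain event. A minimal sketch, assuming the key is derived by hashing the event id and recipient under a fixed domain tag; ``sender_token_hash``, ``DmRelayStore`` and the event layout are hypothetical, and the real derivation in main.py may use different inputs.

```python
import hashlib


def sender_token_hash(event: dict) -> str:
    # Hypothetical derivation: a domain-separated hash of the event's stable
    # identity, so the same chain event always yields the same relay key.
    material = f"{event['event_id']}|{event['payload']['recipient']}".encode()
    return hashlib.sha256(b"infonet-dm-hydrate-v1|" + material).hexdigest()


class DmRelayStore:
    """Hypothetical stand-in for the local DM relay store."""

    def __init__(self):
        self._boxes = {}  # recipient -> {token_hash: sealed message}

    def deposit(self, recipient, token_hash, ciphertext, nonce) -> bool:
        box = self._boxes.setdefault(recipient, {})
        if token_hash in box:
            return False  # already hydrated: re-sync is a no-op
        box[token_hash] = {"ciphertext": ciphertext, "nonce": nonce}
        return True

    def poll(self, recipient):
        return list(self._boxes.get(recipient, {}).values())


def hydrate_dm_relay_from_chain(store: DmRelayStore, accepted_events) -> int:
    """Deposit accepted dm_message events; return how many were new."""
    added = 0
    for event in accepted_events:
        if event.get("event_type") != "dm_message":
            continue
        payload = event["payload"]
        token = sender_token_hash(event)
        if store.deposit(payload["recipient"], token,
                         payload["ciphertext"], payload["nonce"]):
            added += 1
    return added
```

The store only ever holds ciphertext and nonce; decryption stays with the recipient's client.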
Other surfaces
--------------
* meshnode.bat / meshnode.sh now set
``MESH_INFONET_ALLOW_CLEARNET_SYNC=false`` and the participant runtime
flags by default, so a freshly spun-up node defaults to private-only sync.
* InfonetTerminal/InfonetShell.tsx adds a gate directory renderer for
the new private-gate workflow.
* docker-compose.relay.yml binds the relay backend to 127.0.0.1:8000
only; Tor's hidden service forwards onion traffic into 127.0.0.1.
Public clearnet :8000 stays off the network edge.
Tests
-----
* 7 new tests in test_private_gate_hashchain.py +
test_private_dm_hashchain.py covering: gate fork accepts ciphertext
propagation, gate fork rejects plaintext, append rejects plaintext
before normalize, append requires private_strong, append rejects
non-sealed ciphertext shape, DM spool 2-per-recipient + 1-per-pair
cap, and DM hydration delivers to poll/claim.
* Updated test_mesh_node_bootstrap_runtime.py covers 429 backoff via
PeerSyncRateLimited 4-tuple AND PeerSyncHTTPError exception.
* Updated test_s14b_public_sync_gate_filter.py +
test_s9b_gate_store_hydration.py + test_gate_write_cutover.py cover
the new private redaction on public sync responses.
* test_private_gate_hashchain.py + test_private_dm_hashchain.py:
10 passed locally.
* Combined mesh-relevant suite (the 5 modified existing test files +
the 2 new ones): 17 passed.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the retry storm that has kept the local node rate-limited (HTTP 429)
by the seed peer, per the diagnosis run earlier in the session. Pre-fix:
1. Sync hits the seed peer, gets HTTP 429 (Too Many Requests)
2. _peer_sync_response stringifies the status into a ValueError
3. _sync_from_peer catches it; the error becomes str(exc)
4. _run_public_sync_cycle calls finish_sync(error=..., failure_backoff_s=60)
5. next_sync_due_at = now + 60s
6. After 60s, sync runs again, hits the same upstream before it has
reset its rate-limit bucket, and gets 429 again. The loop repeats indefinitely.
Net effect: a node that hit one transient 429 would hammer the seed
every 60s forever, keeping the bucket full and never recovering. We
saw this in the live status dump: consecutive_failures=49,
last_sync_ok_at=0, retry storm sustained over the entire uptime.
What changed
------------
services/mesh/mesh_infonet_sync_support.py
* New typed exception PeerSyncRateLimited carries the parsed
Retry-After value out of the HTTP layer instead of stringifying
everything into a generic ValueError.
* New parse_retry_after_header() handles both RFC 7231 §7.1.3
forms (delay-seconds and HTTP-date). Clamped at 1 hour so a
hostile peer can't silence us for days.
* New _failure_backoff_seconds() helper computes the next delay
as max(exponential, retry_after_s). Schedule with default
base=60s, cap=1800s:
failure 1 -> 60s (preserves pre-fix for transient blips)
failure 2 -> 120s
failure 3 -> 240s
failure 4 -> 480s
failure 5 -> 960s
failure 6+ -> 1800s (capped at 30 min)
cap_s=0 explicitly disables exponential entirely — operators
who want pure-Retry-After behavior have that option.
* finish_sync now accepts retry_after_s and failure_backoff_cap_s
kwargs. Backward-compatible: existing callers that don't pass
retry_after_s get the same first-failure delay as before (the
base value); only repeated failures grow.
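A sketch of the two helpers, assuming the signatures shown; the names follow this commit message, but the bodies are illustrative, not the code in mesh_infonet_sync_support.py. Delay-seconds must be a non-negative integer per RFC 7231, and the HTTP-date form is parsed with the standard library's email.utils.parsedate_to_datetime. The exponent is bounded before multiplying so a very long failure streak cannot overflow the float.

```python
import email.utils
import time
from datetime import timezone
from typing import Optional

RETRY_AFTER_MAX_S = 3600.0  # hostile-peer clamp: honor at most 1 hour


def parse_retry_after_header(value: Optional[str], now: Optional[float] = None) -> float:
    """Seconds to wait per Retry-After (RFC 7231 7.1.3), or 0.0 if unusable."""
    value = (value or "").strip()
    if not value:
        return 0.0
    if value.isascii() and value.isdigit():
        seconds = float(int(value))  # delay-seconds form
    else:
        try:  # HTTP-date form, e.g. "Tue, 14 Nov 2023 22:15:20 GMT"
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):  # older Pythons raise TypeError
            return 0.0
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are GMT
        seconds = when.timestamp() - (time.time() if now is None else now)
    return max(0.0, min(seconds, RETRY_AFTER_MAX_S))


def failure_backoff_seconds(consecutive_failures: int, retry_after_s: float = 0.0,
                            base_s: float = 60.0, cap_s: float = 1800.0) -> float:
    """max(exponential, retry_after_s); cap_s <= 0 disables the exponential term."""
    if consecutive_failures <= 0 or cap_s <= 0:
        exponential = 0.0
    else:
        # Bound the exponent first so a long streak can't overflow the float.
        exponent = min(consecutive_failures - 1, 32)
        exponential = min(base_s * 2 ** exponent, cap_s)
    return max(exponential, max(0.0, retry_after_s))
```

Taking the max of the two terms means a peer can only lengthen the wait, never shorten it below our own schedule, and the 1-hour clamp bounds how far it can lengthen it.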
main.py
* _peer_sync_response detects 429 specifically, parses the
Retry-After header, raises PeerSyncRateLimited(retry_after_s=N).
Includes the response body prefix in the message so the
operator's last_error finally shows something useful.
* _sync_from_peer extended to return (ok, error, forked,
retry_after_s). The 4th tuple element is non-zero only when the
upstream sent a parseable Retry-After. The call signature is
unchanged; the only caller, in _run_public_sync_cycle, was updated
in the same commit to unpack the new element.
* _run_public_sync_cycle forwards retry_after_s into finish_sync.
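The main.py flow can be sketched as below. FakeResponse is a stand-in for whatever HTTP response object main.py actually uses (real clients expose case-insensitive headers), check_peer_response and sync_from_peer are hypothetical simplified names, the 200-byte body prefix is an assumed length, and the Retry-After parser handles only the delay-seconds form to keep the example short. PeerSyncHTTPError is deliberately not caught in sync_from_peer, matching the description that _run_public_sync_cycle handles it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class PeerSyncRateLimited(Exception):
    """HTTP 429 from a peer, carrying the parsed Retry-After delay."""

    def __init__(self, message: str, retry_after_s: float = 0.0):
        super().__init__(message)
        self.retry_after_s = retry_after_s


class PeerSyncHTTPError(Exception):
    """Any other non-200 status from a peer."""

    def __init__(self, message: str, status: int):
        super().__init__(message)
        self.status = status


@dataclass
class FakeResponse:  # stand-in for the real HTTP client's response object
    status_code: int
    text: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


def _retry_after_seconds(value: Optional[str]) -> float:
    # Delay-seconds form only, clamped to 1 hour, to keep the sketch short.
    value = (value or "").strip()
    if value.isascii() and value.isdigit():
        return float(min(int(value), 3600))
    return 0.0


def check_peer_response(peer: str, resp: FakeResponse) -> None:
    body = resp.text[:200]  # body prefix for last_error; length is assumed
    if resp.status_code == 429:
        retry_after_s = _retry_after_seconds(resp.headers.get("Retry-After"))
        raise PeerSyncRateLimited(
            f"HTTP 429 from {peer} (retry_after={retry_after_s:.0f}s): {body}",
            retry_after_s=retry_after_s,
        )
    if resp.status_code != 200:
        raise PeerSyncHTTPError(f"HTTP {resp.status_code} from {peer}: {body}",
                                resp.status_code)


def sync_from_peer(peer: str, resp: FakeResponse) -> Tuple[bool, str, bool, float]:
    """Return (ok, error, forked, retry_after_s)."""
    try:
        check_peer_response(peer, resp)
    except PeerSyncRateLimited as exc:
        return False, str(exc), False, exc.retry_after_s
    # PeerSyncHTTPError propagates to the sync-cycle caller on purpose.
    # ... pull and apply events, detect forks ...
    return True, "", False, 0.0
```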
Tests
-----
backend/tests/mesh/test_infonet_sync_429_backoff.py — 17 new tests:
TestParseRetryAfter (7):
- integer seconds form
- HTTP-date form (computed as seconds-from-now)
- HTTP-date in the past returns 0
- empty / whitespace returns 0
- malformed returns 0
- clamps to 1 hour (hostile-peer cap)
- negative returns 0
TestFailureBackoffSeconds (5):
- exponential growth schedule pins each level
- retry_after wins when larger than exponential
- exponential wins when larger than retry_after
- cap_s=0 disables exponential entirely
- zero inputs return zero
TestFinishSyncBackoff (5):
- first failure uses base unchanged (pre-fix back-compat)
- consecutive_failures actually grow the delay
- retry_after honored at low failure count
- success resets consecutive_failures
- last_error carries the HTTP status / Retry-After detail
All 24 existing sync-support / status-gate tests still pass. Other
failures in tests/mesh/ are pre-existing on origin/main and unrelated
to this change (verified by running the same tests against the
user's main worktree without these edits).
What the operator sees after this lands + a Docker rebuild
----------------------------------------------------------
With the live 429 storm we diagnosed:
Pre-fix: consecutive_failures keeps climbing 1/min forever,
last_error empty or generic
Post-fix: consecutive_failures grows, next_sync_due_at backs off
exponentially (max 30 min), last_error explicitly carries
"HTTP 429 from <peer> (retry_after=Ns): <body>" so the
operator can see what's actually wrong. Once the upstream
bucket drains and a sync succeeds, consecutive_failures
resets to 0 and the schedule returns to the normal 300s
interval.
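The before/after behavior comes down to how finish_sync updates the status record. A minimal sketch, assuming a dataclass with the field names quoted above (consecutive_failures, last_error, last_sync_ok_at, next_sync_due_at) and the defaults base=60s, cap=1800s and a 300s normal interval; the real method's signature may differ. Leaving last_error in place after a success is an assumption, chosen so the last failure stays visible for diagnosis.

```python
from dataclasses import dataclass
from typing import Optional

SYNC_INTERVAL_S = 300.0  # normal cadence after a successful sync


@dataclass
class SyncStatus:
    consecutive_failures: int = 0
    last_error: str = ""
    last_sync_ok_at: float = 0.0
    next_sync_due_at: float = 0.0

    def finish_sync(self, now: float, error: Optional[str] = None,
                    failure_backoff_s: float = 60.0, retry_after_s: float = 0.0,
                    failure_backoff_cap_s: float = 1800.0) -> None:
        if not error:
            # Success: clear the streak and return to the normal interval.
            # last_error is left as-is so the previous failure stays visible.
            self.consecutive_failures = 0
            self.last_sync_ok_at = now
            self.next_sync_due_at = now + SYNC_INTERVAL_S
            return
        self.consecutive_failures += 1
        self.last_error = error
        if failure_backoff_cap_s > 0:
            exponent = min(self.consecutive_failures - 1, 32)
            delay = min(failure_backoff_s * 2 ** exponent, failure_backoff_cap_s)
        else:
            delay = 0.0  # a cap of 0 disables the exponential term
        self.next_sync_due_at = now + max(delay, retry_after_s)
```

Six consecutive 429s reported at the same instant schedule the next attempt 60, 120, 240, 480, 960 and then 1800 s out; the first success drops the schedule back to the 300 s interval.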
Allow local-operator DM invite import without requiring a full admin session.
Prioritize bundled/bootstrap seed peers and shorten stale seed cooldowns for faster Infonet recovery.
Replace raw DM invite dumps with copyable signed-address controls, contact request handling, and safer sealed-send behavior while the private delivery route connects.
Ship the v0.9.79 runtime refresh with transport lane isolation, Infonet secure-message address management, MeshChat MQTT controls, selected asset trail behavior, telemetry panel refinements, onboarding updates, and desktop/package metadata alignment.
Also ignore local graphify work products so analysis folders do not leak into future commits.
Add Tor/onion runtime wiring and faster Infonet node status refresh.
Keep node bootstrap state clearer across Docker and local runtimes.
Use selected aircraft trail history for cumulative tracked-aircraft emissions.
Gate messages now propagate via the Infonet hashchain as encrypted blobs: every node syncs them
through normal chain sync, while only Gate members holding the MLS keys can decrypt them. Added a
mesh reputation system, peer push workers, voluntary Wormhole opt-in for node participation, fork
recovery, killwormhole scripts, and obfuscated terminology, and hardened the self-updater to protect
encryption keys and chain state during updates.
New features: Shodan search, train tracking, Sentinel Hub imagery, 8 new intelligence layers,
CCTV expansion to 11,000+ cameras across 6 countries, Mesh Terminal CLI, prediction markets,
desktop-shell scaffold, and comprehensive mesh test suite (215 frontend + backend tests passing).
Community contributors: @wa1id, @AlborzNazari, @adust09, @Xpirix, @imqdcr, @csysp, @suranyami,
@chr0n1x, @johan-martensson, @singularfailure, @smithbh, @OrfeoTerkuci, @deuza, @tm-const,
@Elhard1, @ttulttul