Deflake messagesViewFirstContact via CI concurrency group

Root cause
----------

ci.yml fires twice on every PR: once directly via `pull_request: [main]`
(producing the "Frontend Tests & Build" check) and once via
`workflow_call` from docker-publish.yml (producing the "CI Gate / Frontend
Tests & Build" check). Both jobs land on the same Actions runner pool at
the same time and compete for CPU and RAM. Under that contention, React
reconciliation in `messagesViewFirstContact.test.tsx > removes an approved
contact immediately from the visible contact list` overruns the test's 5s
waitFor timeout.

This is the only test that has flaked on PRs #226, #237, #261, #262, #265,
#294, #303, and the fd7d6fa push. It has always failed on the same job
("CI Gate / Frontend Tests & Build") and never on the sibling job
("Frontend Tests & Build") for the same commit. PR #304, which heavily
touched the frontend, passed both jobs on the first try. PR #303, with zero
frontend changes, failed only the CI Gate job. That asymmetry is what
finally pinpointed parallel resource contention as the cause, rather than
anything in the test or the PRs.

Fix
---

.github/workflows/ci.yml: added a workflow-level concurrency group keyed
on the PR head SHA (or the pushed commit SHA). Both invocations against
the same commit now share a group, so the second one queues instead of
running in parallel. cancel-in-progress is intentionally `false`, because
cancelling could leave a PR check stuck in "Expected" if only one of the
two runs ever finished. Total CI time grows by ~2 min in exchange for
deterministic outcomes.

frontend/src/__tests__/mesh/messagesViewFirstContact.test.tsx:
belt-and-suspenders bump of the waitFor timeout from 5s to 15s. The
structural fix above should make the original 5s margin sufficient, but
the bump removes the residual risk from brief runner load spikes inside
the (now serialized) single job. The longer timeout does not hide a
genuine regression. If the toast never renders, the test still fails
loudly, only at 15s instead of 5s. The full mesh test file (26 tests)
passes locally in ~8s with the bumped timeout.
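The concurrency-group behavior described in the Fix section can be modeled with a minimal TypeScript sketch. It shows why the two invocations resolve to one group and then run one after the other. This is a simplified model, not GitHub's implementation. `CiContext`, `concurrencyGroup`, `GroupQueue`, and `demo` are hypothetical names, and the SHAs are made up. The sketch also assumes both invocations see the same `pull_request` payload. A called workflow inherits the caller's `github` context, so that holds when docker-publish.yml is itself triggered by the PR. Unlike this unbounded queue, GitHub keeps at most one running and one pending run per group.

```typescript
// Hypothetical, simplified model of a GitHub Actions concurrency group.
// None of these names exist in GitHub's tooling; they are illustrative only.

interface CiContext {
  sha: string; // for pull_request events this is the merge commit, not the PR head
  event: { pull_request?: { head: { sha: string } } };
}

// Mirrors `ci-${{ github.event.pull_request.head.sha || github.sha }}`:
// prefer the PR head SHA, fall back to the pushed commit SHA.
function concurrencyGroup(github: CiContext): string {
  return `ci-${github.event.pull_request?.head.sha || github.sha}`;
}

// Runs sharing a group key execute one after another (cancel-in-progress: false).
// Simplification: GitHub keeps at most one pending run per group; this keeps all.
class GroupQueue {
  private tails = new Map<string, Promise<void>>();

  run(group: string, job: () => Promise<void>): Promise<void> {
    const prev = this.tails.get(group) ?? Promise.resolve();
    const next = prev.then(job); // start only after the previous run in this group settles
    this.tails.set(group, next.catch(() => {})); // a failed run must not block the queue
    return next;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The direct `pull_request` run and the `workflow_call` run share one PR payload.
const direct: CiContext = { sha: "merge123", event: { pull_request: { head: { sha: "abc123" } } } };
const viaGate: CiContext = { sha: "merge123", event: { pull_request: { head: { sha: "abc123" } } } };
// A push to main carries no pull_request payload, so it falls back to github.sha.
const pushToMain: CiContext = { sha: "def456", event: {} };

// Submits both PR runs at once and records when each starts and ends.
async function demo(): Promise<string[]> {
  const queue = new GroupQueue();
  const log: string[] = [];
  const job = (name: string) => async () => {
    log.push(`${name} start`);
    await sleep(10); // stand-in for the frontend test suite
    log.push(`${name} end`);
  };
  await Promise.all([
    queue.run(concurrencyGroup(direct), job("Frontend Tests & Build")),
    queue.run(concurrencyGroup(viaGate), job("CI Gate / Frontend Tests & Build")),
  ]);
  return log;
}
```

Both PR contexts map to `ci-abc123`, so in `demo()` the CI Gate run starts only after the direct run has ended. The push context maps to its own group, `ci-def456`. The `||` fallback matches the workflow expression, which returns its first truthy operand.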
2026-07-12 23:16:38 +02:00 · 2026-05-22 17:36:33 -06:00
parent b01a69c172
commit 44e9b38ac2
2 changed files with 37 additions and 6 deletions
@@ -7,6 +7,28 @@ on:
    branches: [main]
  workflow_call:

+# CI flake mitigation:
+# ci.yml is triggered TWICE per PR on the same commit — once directly via
+# the `pull_request` trigger above ("Frontend Tests & Build" check) and once
+# via `workflow_call` from docker-publish.yml ("CI Gate / Frontend Tests &
+# Build" check). Both jobs land on the same Actions runner pool at the same
+# time and fight for CPU/RAM. Under contention, React's reconciliation in
+# `messagesViewFirstContact.test.tsx > removes an approved contact …`
+# overruns its 5s waitFor timeout — that's the single failure mode we've
+# seen flake on PRs #226, #237, #261, #262, #265, #294, #303, and the
+# fd7d6fa push. Backend tests and every other frontend test pass under
+# the same conditions, which is what made this look random.
+#
+# Pinning a concurrency group on the SHA (PR head, or the pushed commit
+# for main) serializes the two invocations so neither starves the other.
+# We use cancel-in-progress: false so the second one queues instead of
+# cancelling — cancelling could leave the PR check stuck "Expected" if
+# only one of the two ever finishes. Total CI time grows by ~2 min in
+# exchange for deterministic outcomes.
+concurrency:
+  group: ci-${{ github.event.pull_request.head.sha || github.sha }}
+  cancel-in-progress: false
+
 jobs:
  frontend:
    name: Frontend Tests & Build