mirror of
https://github.com/BigBodyCobain/Shadowbroker.git
synced 2026-05-28 10:01:31 +02:00
Deflake messagesViewFirstContact via CI concurrency group
Root cause
----------
ci.yml fires twice on every PR — once directly via `pull_request:
[main]` (producing the "Frontend Tests & Build" check) and once via
`workflow_call` from docker-publish.yml (producing the "CI Gate /
Frontend Tests & Build" check). Both jobs land on the same Actions
runner pool at the same time and fight for CPU/RAM. Under contention,
the React reconciliation in `messagesViewFirstContact.test.tsx >
removes an approved contact immediately from the visible contact list`
overruns its 5s waitFor timeout.
This is the single test that has flaked on PRs #226, #237, #261, #262,
#265, #294, #303, and the fd7d6fa push — always on the same job name
("CI Gate / Frontend Tests & Build"), never on the sibling job
("Frontend Tests & Build") on the same commit. PR #304 (which heavily
touched the frontend) passed both jobs on first try. PR #303 (zero
frontend changes) failed only the CI Gate job. That asymmetry is what
finally pinpointed the parallel-resource-contention cause rather than
anything in the test or the PRs.
Fix
---
.github/workflows/ci.yml — added a workflow-level concurrency group
keyed on the PR head SHA (or pushed commit SHA). Both invocations
against the same commit now share a group, so the second one queues
instead of running in parallel. cancel-in-progress is intentionally
`false` — cancelling would risk leaving a PR check stuck in "Expected"
if only one of the two ever finished. Total CI time grows by ~2 min
in exchange for deterministic outcomes.
frontend/src/__tests__/mesh/messagesViewFirstContact.test.tsx —
belt-and-suspenders bump of the waitFor timeout from 5s to 15s. The
structural fix above should make the original 5s margin sufficient,
but the bump removes the residual risk of brief runner load spikes
inside the (now serialised) single job. The failure mode this masks
would be "toast never renders", which still fails loudly at 15s.
The full mesh test file (26 tests) passes locally in ~8s with the
bumped timeout.
This commit is contained in:
@@ -7,6 +7,28 @@ on:
|
||||
branches: [main]
|
||||
workflow_call:
|
||||
|
||||
# CI flake mitigation:
|
||||
# ci.yml is triggered TWICE per PR on the same commit — once directly via
|
||||
# the `pull_request` trigger above ("Frontend Tests & Build" check) and once
|
||||
# via `workflow_call` from docker-publish.yml ("CI Gate / Frontend Tests &
|
||||
# Build" check). Both jobs land on the same Actions runner pool at the same
|
||||
# time and fight for CPU/RAM. Under contention, React's reconciliation in
|
||||
# `messagesViewFirstContact.test.tsx > removes an approved contact …`
|
||||
# overruns its 5s waitFor timeout — that's the single failure mode we've
|
||||
# seen flake on PRs #226, #237, #261, #262, #265, #294, #303, and the
|
||||
# fd7d6fa push. Backend tests and every other frontend test pass under
|
||||
# the same conditions, which is what made this look random.
|
||||
#
|
||||
# Pinning a concurrency group on the SHA (PR head, or the pushed commit
|
||||
# for main) serializes the two invocations so neither starves the other.
|
||||
# We use cancel-in-progress: false so the second one queues instead of
|
||||
# cancelling — cancelling could leave the PR check stuck "Expected" if
|
||||
# only one of the two ever finishes. Total CI time grows by ~2 min in
|
||||
# exchange for deterministic outcomes.
|
||||
concurrency:
|
||||
group: ci-${{ github.event.pull_request.head.sha || github.sha }}
|
||||
cancel-in-progress: false
|
||||
|
||||
jobs:
|
||||
frontend:
|
||||
name: Frontend Tests & Build
|
||||
|
||||
Reference in New Issue
Block a user