fix: resolve merge conflicts with origin/main (v0.6.0 + v0.6.0.1 + v0.6.1)

Merge main's test bootstrap, boil-the-lake completeness principle,
selective expansion, ship gate overrides, and gstack-upgrade vendor sync.

Conflicts resolved:
- CHANGELOG: keep main's 0.6.1/0.6.0.1/0.6.0/0.5.4/0.5.3 entries
- VERSION: take main's 0.6.1
- design-consultation: office-hours naming + main's "what's out there" phrasing
- ship: keep both verification rules (fresh evidence + coverage tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Garry Tan
2026-03-17 14:37:22 -07:00
42 changed files with 3926 additions and 1054 deletions
@@ -263,6 +263,30 @@
**Effort:** S
**Priority:** P3
### CI/CD generation for non-GitHub providers
**What:** Extend CI/CD bootstrap to generate GitLab CI (`.gitlab-ci.yml`), CircleCI (`.circleci/config.yml`), and Bitrise pipelines.
**Why:** Not all projects use GitHub Actions. Universal CI/CD bootstrap would make test bootstrap work for everyone.
**Context:** v1 ships with GitHub Actions only. Detection logic already checks for `.gitlab-ci.yml`, `.circleci/`, `bitrise.yml` and skips with an informational note. Each provider needs ~20 lines of template text in `generateTestBootstrap()`.
**Effort:** M
**Priority:** P3
**Depends on:** Test bootstrap (shipped)
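A minimal sketch of how the existing detection could dispatch to per-provider templates. It assumes a flat list of repo-relative paths as input. The names `planNonGithubCi`, `CI_TEMPLATES`, and the GitLab job text are hypothetical and are not taken from `generateTestBootstrap()`. Only GitLab is filled in. The other providers keep v1's skip-with-note behavior.

```typescript
// Sketch only: function names, plan shape, and the GitLab job text are
// illustrative assumptions, not gstack's actual generateTestBootstrap() code.

type CiProvider = "gitlab" | "circleci" | "bitrise";

// Marker paths named in this TODO. A trailing "/" means "any file under this dir".
const CI_MARKERS: Array<[CiProvider, string]> = [
  ["gitlab", ".gitlab-ci.yml"],
  ["circleci", ".circleci/"],
  ["bitrise", "bitrise.yml"],
];

// Hypothetical per-provider templates (~20 lines each in a real version).
// Each returns a test job to merge into the provider's config.
const CI_TEMPLATES: Partial<Record<CiProvider, (testCmd: string) => string>> = {
  gitlab: (testCmd) => ["test:", "  script:", `    - ${testCmd}`, ""].join("\n"),
};

type CiPlan =
  | { kind: "generate"; provider: CiProvider; content: string }
  | { kind: "skip"; provider: CiProvider; note: string };

// Return the first non-GitHub provider whose marker is present, if any.
function detectCiProvider(repoPaths: string[]): CiProvider | null {
  for (const [provider, marker] of CI_MARKERS) {
    const present = marker.endsWith("/")
      ? repoPaths.some((p) => p.startsWith(marker))
      : repoPaths.includes(marker);
    if (present) return provider;
  }
  return null;
}

// null means no non-GitHub provider was found, so the GitHub Actions path applies.
function planNonGithubCi(repoPaths: string[], testCmd: string): CiPlan | null {
  const provider = detectCiProvider(repoPaths);
  if (provider === null) return null;
  const template = CI_TEMPLATES[provider];
  if (template === undefined) {
    // v1 behavior: informational note, no CI file written.
    return { kind: "skip", provider, note: `${provider} CI detected; skipping CI generation.` };
  }
  return { kind: "generate", provider, content: template(testCmd) };
}
```

Keeping templates in a lookup table means adding CircleCI or Bitrise becomes one new entry, and unsupported providers degrade to the current note instead of a half-written config.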
### Auto-upgrade weak tests (★) to strong tests (★★★)
**What:** When Step 3.4 coverage audit identifies existing ★-rated tests (smoke/trivial assertions), generate improved versions testing edge cases and error paths.
**Why:** Many codebases have tests that technically exist but don't catch real bugs — `expect(component).toBeDefined()` isn't testing behavior. Upgrading these closes the gap between "has tests" and "has good tests."
**Context:** Requires the quality scoring rubric from the test coverage audit. Modifying existing test files is riskier than creating new ones — needs careful diffing to ensure the upgraded test still passes. Consider creating a companion test file rather than modifying the original.
**Effort:** M
**Priority:** P3
**Depends on:** Test quality scoring (shipped)
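A rough sketch of the two pieces this item needs, under stated assumptions. The first piece spots ★ tests whose only assertions are existence or truthiness checks. The second picks a companion file path so the original test is never edited. The matcher list and the `.strong` filename suffix are hypothetical. The real rubric lives in the coverage audit.

```typescript
// Hypothetical heuristic for the "★" tier: every expect() ends in a trivial
// matcher. The matcher list is an illustrative assumption, not the audit rubric.
const TRIVIAL_MATCHERS: RegExp[] = [
  /\.toBeDefined\(\)/,
  /\.toBeTruthy\(\)/,
  /\.not\.toBeNull\(\)/,
  /\.not\.toBeUndefined\(\)/,
];

// True when the test body has no assertions, or only trivial ones.
function isWeakTest(testBody: string): boolean {
  const assertions = testBody.split("\n").filter((line) => line.includes("expect("));
  if (assertions.length === 0) return true;
  return assertions.every((line) => TRIVIAL_MATCHERS.some((re) => re.test(line)));
}

// Companion-file approach: write upgraded tests next to the original rather
// than modifying it. The ".strong" suffix is a made-up convention.
function companionTestPath(original: string): string {
  const match = original.match(/^(.*)\.(test|spec)\.(ts|tsx|js|jsx)$/);
  if (!match) throw new Error(`not a recognized test file: ${original}`);
  const [, stem, kind, ext] = match;
  return `${stem}.strong.${kind}.${ext}`;
}
```

A line-based check is crude, because multi-line `expect` chains would slip through. It is enough to show where a proper AST-based scorer would plug in.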
## Retro
### Deployment health tracking (retro + browse)
@@ -362,6 +386,16 @@
**Priority:** P2
**Depends on:** None
### Cross-platform URL open helper
**What:** `gstack-open-url` helper script — detect platform, use `open` (macOS) or `xdg-open` (Linux).
**Why:** The first-time Completeness Principle intro uses macOS `open` to launch the essay. If gstack ever supports Linux, that call silently fails there.
**Effort:** S (human: ~30 min / CC: ~2 min)
**Priority:** P4
**Depends on:** Nothing
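The platform switch is small enough to sketch directly. The TypeScript below is illustrative only, since `gstack-open-url` may well end up as a shell script, and the function names are hypothetical. It maps macOS to `open` and Linux to `xdg-open`. When no opener exists, it prints the URL instead, which addresses the silent-failure concern.

```typescript
import { spawn } from "node:child_process";

// Hypothetical sketch of gstack-open-url; names are illustrative.
// Pick the platform's URL opener, or null when there is no known one.
function openUrlCommand(platform: string, url: string): string[] | null {
  switch (platform) {
    case "darwin":
      return ["open", url]; // macOS
    case "linux":
      return ["xdg-open", url]; // freedesktop.org opener on most Linux desktops
    default:
      return null;
  }
}

// Launch without blocking; if there is no opener (or it fails to spawn),
// print the URL so the user still sees it.
function openUrl(url: string): void {
  const cmd = openUrlCommand(process.platform, url);
  if (cmd === null) {
    console.log(`Open this link in your browser: ${url}`);
    return;
  }
  const child = spawn(cmd[0], cmd.slice(1), { stdio: "ignore", detached: true });
  child.on("error", () => console.log(`Open this link in your browser: ${url}`));
  child.unref();
}
```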
### CDP-based DOM mutation detection for ref staleness
**What:** Use Chrome DevTools Protocol `DOM.documentUpdated` / MutationObserver events to proactively invalidate stale refs when the DOM changes, without requiring an explicit `snapshot` call.
@@ -446,6 +480,20 @@ Shipped as v0.5.0 on main. Includes `/plan-design-review` (report-only design au
**Priority:** P2
**Depends on:** Ship Confidence Dashboard (shipped)
## Completeness
### Completeness metrics dashboard
**What:** Track how often Claude chooses the complete option vs shortcut across gstack sessions. Aggregate into a dashboard showing completeness trend over time.
**Why:** Without measurement, we can't know if the Completeness Principle is working. Could surface patterns (e.g., certain skills still bias toward shortcuts).
**Context:** Would require logging choices (e.g., append to a JSONL file when AskUserQuestion resolves), parsing them, and displaying trends. Similar pattern to eval persistence.
**Effort:** M (human) / S (CC)
**Priority:** P3
**Depends on:** Boil the Lake shipped (v0.6.1)
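The logging-and-aggregation loop described under **Context** could look roughly like this. It assumes one JSONL line per resolved AskUserQuestion. The event fields (`ts`, `skill`, `choice`) and the helper names are hypothetical. A single grouping function covers both the trend view (key by day) and the per-skill view (key by skill).

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Hypothetical event shape, one JSON object per line.
interface CompletenessEvent {
  ts: string; // ISO-8601 timestamp
  skill: string; // gstack skill that asked the question
  choice: "complete" | "shortcut";
}

// Append one event when an AskUserQuestion resolves.
function logChoice(path: string, event: CompletenessEvent): void {
  appendFileSync(path, JSON.stringify(event) + "\n");
}

// Read all events back, ignoring blank lines.
function readEvents(path: string): CompletenessEvent[] {
  if (!existsSync(path)) return [];
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as CompletenessEvent);
}

// Fraction of "complete" choices per group, with groups sorted by key.
function completenessRate(
  events: CompletenessEvent[],
  keyOf: (e: CompletenessEvent) => string,
): Map<string, number> {
  const counts = new Map<string, { complete: number; total: number }>();
  for (const e of events) {
    const key = keyOf(e);
    const c = counts.get(key) ?? { complete: 0, total: 0 };
    c.total += 1;
    if (e.choice === "complete") c.complete += 1;
    counts.set(key, c);
  }
  const rates = new Map<string, number>();
  for (const [key, c] of [...counts].sort(([a], [b]) => a.localeCompare(b))) {
    rates.set(key, c.complete / c.total);
  }
  return rates;
}
```

Usage: `completenessRate(readEvents(path), (e) => e.ts.slice(0, 10))` gives a daily trend. Passing `(e) => e.skill` instead surfaces skills that still lean toward shortcuts.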
## Completed
### Phase 1: Foundations (v0.2.0)