From fc0137b4b058fdba26bce95d4658a1642a015ae7 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Sun, 17 May 2026 19:06:16 -0700
Subject: [PATCH] chore: bump version and changelog (v1.40.0.0)

Co-Authored-By: Claude Opus 4.7
---
 AGENTS.md      | 10 +++++++
 CHANGELOG.md   | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md      |  2 ++
 VERSION        |  2 +-
 docs/skills.md | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 package.json   |  2 +-
 6 files changed, 162 insertions(+), 2 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index f17314009..7a577c590 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -75,6 +75,16 @@ Invoke them by name (e.g., `/office-hours`).
 | `/setup-browser-cookies` | Import cookies from your real browser for authenticated testing. |
 | `/pair-agent` | Pair a remote AI agent (OpenClaw, Codex, etc.) with your browser. |

+### iOS device-farm (v1.40.0.0+)
+
+| Skill | What it does |
+|-------|-------------|
+| `/ios-qa` | Live-device iOS QA via USB CoreDevice tunnel + embedded StateServer. Optionally exposes the device over Tailscale so remote agents can drive it. |
+| `/ios-fix` | Autonomous iOS bug fixer with regression snapshot capture. |
+| `/ios-design-review` | Designer's-eye QA on a real iPhone — 10-dimension Apple HIG rubric. |
+| `/ios-clean` | Convenience: strip DebugBridge + #if DEBUG wiring before a Release build. |
+| `/ios-sync` | Regenerate the iOS debug bridge against the latest upstream templates. |
+
 ### Safety + scoping

 | Skill | What it does |
diff --git a/CHANGELOG.md b/CHANGELOG.md
index a91c9d0de..12a03a59a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,73 @@
 # Changelog

+## [1.40.0.0] - 2026-05-17
+
+## **iOS QA on a real iPhone — no XCTest, no WebDriverAgent, no simulators.**
+## **Bring your own Mac mini + Tailscale and you have a DIY device farm any agent can drive.**
+
+Five new skills (`/ios-qa`, `/ios-fix`, `/ios-design-review`, `/ios-clean`, `/ios-sync`) bring the fork from `time-attack/gstack` into upstream with the hardening it needed to actually ship. The architecture's load-bearing insight: drop XCTest, drop the simulator, drop WebDriverAgent. Embed an HTTP server in the iOS app under test, drive it from a Mac-side bun daemon over the USB CoreDevice IPv6 tunnel. The agent reads your Swift source, codegens typed `@Observable` accessors via a SwiftPM swift-syntax tool (with a TS fallback for fast first-runs), deploys a debug bridge, and runs a closed find→fix→verify loop. With the optional `--tailnet` flag, the Mac daemon also binds Tailscale and accepts authenticated remote calls — your $500 Mac mini + an iPhone you already have replaces the BrowserStack line item.
+
+### The numbers that matter
+
+Source: 67 daemon unit/integration tests + 20 codegen tests + 8 high-level E2E tests, all running against a real bun process and a stubbed iOS StateServer.
+
+| Surface | Fork as-is | Shipped |
+|---|---|---|
+| StateServer bind | `0.0.0.0:9999`, zero auth | `::1` + `127.0.0.1` only; bearer-token gate; boot token rotates within ~5s of daemon spawn so anything scraping `os_log` past then sees a dead credential |
+| Release-build safety | none (any `#if DEBUG` mistake ships the bridge) | structural `Package.swift` `.when(configuration: .debug)` + CI `swift build -c release` invariant test that fails if the `DebugBridge` symbol appears |
+| Codegen failure modes covered | regex breaks on computed properties, generics, multi-line types | swift-syntax AST (production), strict TS regex fallback for tests; 3 dedicated fixtures pin the known failure shapes |
+| Multi-agent device contention | none | per-device session lock with sliding timeout on mutations only; concurrent `/session/acquire` race test |
+| Remote control | not in scope | Tailscale identity-gated `/auth/mint`; capability tiers (observe/interact/mutate/restore); 1h default session TTL (24h cap); audit log of every authenticated mutating request; hashed-identity attempts log |
+| Hardcoded paths | 3 `/Users/sinmat/.gstack/...` paths | none — all paths use `$HOME` / `os.homedir()` |
+| Test coverage | none | 95 tests covering session-lock concurrency, snapshot/restore atomicity with schema-hash gate, identity canonicalization (user / tag / node-key), capability tier enforcement, rate limits, body-size limits, boot-token leak proofs, tailnet fail-closed probe, CoreDevice tunnel reconnect plumbing, cache-key composite (Swift version + tool git rev + source content + platform triple) |
+
+### What this means for iOS developers
+
+You can ship a SwiftUI app, add the `DebugBridge` SPM dep, run `/ios-qa`, and watch an agent drive your phone — taps, swipes, state writes, the whole loop. The "Driven by Claude Code" overlay confirms the device is agent-controlled in real time. Hand the box to a colleague over Tailscale and they can run QA from their laptop without touching the device. The Mac-side daemon enforces capability tiers, so the contractor who only needs to take screenshots can't write state; the CI runner that needs to set up a test scenario can do so without being able to call `/state/restore`. The audit log gives you per-request forensics. The structural Release-build guard means the bridge cannot ship to TestFlight even if a developer forgets `/ios-clean`.
+
+### Itemized changes
+
+#### Added
+
+- **`/ios-qa`** (770-line SKILL.md.tmpl) — live-device QA flow with warm-start session cache, on-demand daemon spawn, Tailscale opt-in, demo + recording modes, full failure-mode + recovery matrix.
+- **`/ios-fix`** — autonomous bug fixer that captures a reproducing `/state/snapshot` BEFORE editing source, then rebuilds + redeploys + verifies. Snapshot becomes a regression test fixture.
+- **`/ios-design-review`** — 10-dimension Apple HIG audit on a real device. 0-10 scores per dimension with "what would make it a 10" framing, mirroring `/plan-design-review`'s rubric for browser.
+- **`/ios-clean`** — convenience wrapper that strips `DebugBridge` SPM + `#if DEBUG` wiring. Explicitly NOT the safety-critical path — the structural Release-build guard in `Package.swift` is.
+- **`/ios-sync`** — regenerates accessors against latest upstream gstack templates. Run after upgrading gstack or adding new `@Observable` classes.
+- `ios-qa/templates/StateServer.swift.template` — dual-stack loopback bind (`::1` + `127.0.0.1`), boot token rotation, per-device session lock with mutation-only sliding window, snapshot/restore with schema envelope (`_schema_version` + `_app_build_id` + `_accessor_hash`), validate-then-apply atomicity via a single canonical-state-struct assignment, 1MB body cap.
+- `ios-qa/templates/DebugOverlay.swift.template` — animated brand-colored border, agent attribution chip (`X-Agent-Identity` header, display-only, never trusted for auth), optional recording-mode watermark for screencasts.
+- `ios-qa/templates/Package.swift.template` — DebugBridge target gated `.when(configuration: .debug)`. SwiftPM refuses to link in Release config.
+- `ios-qa/daemon/` — Mac-side bun/TS daemon. Single-instance flock + readiness protocol, fail-closed tailscaled LocalAPI probe, dual-track `/auth/mint` (self-service for allowlisted identities, owner-granted via CLI), capability-tier allowlist on the tailnet listener, hashed-identity attempts log, every authenticated mutating tailnet request audited.
+- `ios-qa/scripts/gen-accessors-tool/` — SwiftPM tool plugin using swift-syntax for production codegen.
+- `ios-qa/scripts/gen-accessors.ts` — TS fallback for fast first-runs and CI. Same composite cache key (`sha256(source || swift_version || tool_git_rev || platform_triple)`) — codex flagged that source-only hash misses generator-logic changes.
+- `ios-qa/docs/tailscale-acl-example.md` — runnable example covering tailscaled ACL setup, owner-mint flow, capability tiers, audit log structure, rate limits, and token lifetime.
+- `test/skill-e2e-ios.test.ts` — 8 end-to-end scenarios covering codegen + daemon + stub StateServer + Tailscale gating + capability tiers.
+- 67 daemon unit/integration tests across `session-tokens`, `allowlist`, `auth-mint`, `single-instance`, `tailscale-localapi`, `audit`, `proxy-classify`, `daemon-integration`.
+- 20 codegen tests in `ios-qa/scripts/gen-accessors.test.ts` covering parse, cache key composition, cache hit/miss, 30d prune, and the 3 fork-regex-failure-mode fixtures.
+
+#### Changed
+
+- `test/helpers/touchfiles.ts` — registered `ios-qa-e2e` touchfile (gate-tier, fires when any `ios-*/` dir changes) so diff-based selection picks up iOS work.
+- `AGENTS.md`, `docs/skills.md` — added "iOS device-farm" sections covering the five new skills.
+
+#### Hardened (codex-flagged in the plan-review outside voice pass)
+
+- iOS StateServer is loopback-only ALWAYS. Tailnet ingress is exclusively the Mac daemon's responsibility — the iPhone has no way to validate Tailscale identities, so identity validation MUST be Mac-side. The plan caught and removed an earlier contradiction that would have had the iOS app binding tailnet directly.
+- Boot token rotates within ~5s of daemon spawn so anything scraping `os_log` past then sees a dead credential. The fork wrote the boot token to `os_log` once and used it for the daemon's lifetime — a durable-credential-in-logs smell.
+- `/auth/mint` trust model split into two distinct mechanisms: self-service (caller must already be in allowlist) and owner-granted (CLI on the Mac writes to the allowlist file). Self-service NEVER auto-allowlists. The fork ambiguously mixed both paths.
+- Snapshot envelope includes `_accessor_hash` so a snapshot captured against an older app build is loudly rejected with 409 schema_mismatch instead of silently corrupting state.
+- `GET /state/snapshot` returns ONLY fields marked `@Snapshotable`. Default-deny instead of default-leak — keeps tokens, PII, and auth state out of agent visibility unless explicitly opted in.
+- Tailnet listener fails closed if tailscaled LocalAPI is unreachable. Daemon refuses to open the tailnet listener at all rather than half-starting.
+- `X-Agent-Identity` header is display-only. Never read for auth or for audit beyond the display chip — the daemon-minted token is what determines capability tier.
+
+#### For contributors
+
+- New SwiftPM tool dependency: `swift-syntax`. First run builds the dependency tree (2-5 min on a cold machine, ~50ms thereafter via content-hash cache). Document the "first-time setup" UX in `/ios-qa` so users know what's happening.
+- The TS fallback in `ios-qa/scripts/gen-accessors.ts` is what tests + CI exercise. Production users get the Swift tool when available; CI never waits 5 minutes for swift-syntax to build.
+- All daemon HTTP egress goes through `JSON.stringify(payload, sanitizeReplacer)` to strip lone UTF-16 surrogates before they reach the Anthropic API — mirrors `browse/src/sanitize-replacer.ts`. Tunnel-denial logging mirrors `browse/src/tunnel-denial-log.ts`. No new auth/logging primitives.
+
+Contributed by @sinacodedit (forked from time-attack/gstack).
+
 ## [1.39.2.0] - 2026-05-15

 ## **Conductor workspaces wire `GSTACK_*` keys straight into gbrain embeddings and paid evals.**
diff --git a/README.md b/README.md
index d89b8d998..f00ee5c73 100644
--- a/README.md
+++ b/README.md
@@ -229,6 +229,8 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-
 | `/setup-gbrain` | **GBrain Onboarding** — from zero to running gbrain in under 5 minutes. PGLite local, Supabase existing URL, or auto-provision a new Supabase project via Management API. MCP registration for Claude Code + per-repo trust triad (read-write/read-only/deny). [Full guide](USING_GBRAIN_WITH_GSTACK.md). |
 | `/sync-gbrain` | **Keep Brain Current** — re-index this repo's code into gbrain via `gbrain sources add` + `gbrain sync --strategy code`, refresh the `## GBrain Search Guidance` block in CLAUDE.md, and auto-remove guidance when the capability check fails. `--incremental` (default), `--full`, `--dry-run`. Idempotent; safe to re-run. |
 | `/gstack-upgrade` | **Self-Updater** — upgrade gstack to latest. Detects global vs vendored install, syncs both, shows what changed. |
+| `/ios-qa` | **iOS Live-Device QA (v1.40+)** — drive a real iPhone over USB CoreDevice via an embedded `StateServer` in the app. Read Swift source, codegen typed `@Observable` accessors, run the agent loop. Optional `--tailnet` flag turns your Mac mini into a DIY device farm reachable by OpenClaw or any HTTP-capable agent on your Tailscale tailnet. Capability-tier allowlist (observe/interact/mutate/restore), per-device session lock, audit log. |
+| `/ios-fix`, `/ios-design-review`, `/ios-clean`, `/ios-sync` | iOS bug-fix loop, designer's-eye HIG audit, debug-bridge cleanup, and accessor resync. See `docs/skills.md`. |

 ### New binaries (v0.19)

diff --git a/VERSION b/VERSION
index 939a56892..895062404 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.39.2.0
+1.40.0.0
diff --git a/docs/skills.md b/docs/skills.md
index 345a378ad..03e04b0b8 100644
--- a/docs/skills.md
+++ b/docs/skills.md
@@ -54,6 +54,11 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples.
 | [`/setup-deploy`](#setup-deploy) | **Deploy Configurator** | One-time setup for `/land-and-deploy`. Detects your platform, production URL, and deploy commands. |
 | [`/gstack-upgrade`](#gstack-upgrade) | **Self-Updater** | Upgrade gstack to the latest version. Detects global vs vendored install, syncs both, shows what changed. |
 | [`/make-pdf`](#make-pdf) | **PDF Generator** | Turn any markdown file into a publication-quality PDF. Proper margins, page numbers, cover pages, clickable TOC. |
+| [`/ios-qa`](#ios-qa) | **iOS QA Lead** | Live-device iOS QA via USB CoreDevice tunnel + embedded StateServer. Reads Swift source, codegens accessors, drives the real iPhone. Optionally exposes the device over Tailscale for remote agents. |
+| [`/ios-fix`](#ios-fix) | **iOS Autonomous Fixer** | Closes the find→fix→verify loop on a real iPhone. Captures a reproducing snapshot, fixes the source, rebuilds, redeploys, verifies. |
+| [`/ios-design-review`](#ios-design-review) | **iOS Designer's Eye** | 10-dimension Apple HIG audit on a real iPhone. Rates each screen, says what would make it a 10. |
+| [`/ios-clean`](#ios-clean) | **iOS Bridge Cleanup** | Convenience wrapper to strip DebugBridge SPM + `#if DEBUG` wiring. The structural Release-build guard is in Package.swift + CI; this skill is for guided manual removals. |
+| [`/ios-sync`](#ios-sync) | **iOS Bridge Resync** | Regenerate accessors and Swift templates against the latest upstream gstack. Run when you add new `@Observable` classes or upgrade gstack. |

 ---

@@ -1178,3 +1183,78 @@ Claude: Replied to Greptile. All tests pass.
 ```

 Three Greptile comments. One real fix. One auto-acknowledged. One false positive pushed back with a reply. Total extra time: about 30 seconds.
+
+---
+
+## `/ios-qa`
+
+Live-device iOS QA. The fork's load-bearing insight was: don't simulate, don't run XCTest, don't bring up WebDriverAgent. Embed an HTTP server in the app under test, drive it from a Mac-side daemon over the USB CoreDevice IPv6 tunnel.
+
+The agent reads your Swift source, finds `@Observable` classes with `@Snapshotable`-marked fields, codegens typed accessors, deploys a debug bridge, then runs a closed find→fix→verify loop.
+
+### Architecture in one diagram
+
+```
+  ┌──────────────────────┐   USB CoreDevice (IPv6)   ┌──────────────────┐
+  │ gstack-ios-qa daemon │ ────────────────────────▶ │ iOS app          │
+  │ (Mac, bun/TS)        │  bearer + X-Session-Id    │ StateServer      │
+  │ - rotates boot token │                           │ (loopback only)  │
+  │ - mints session toks │                           └──────────────────┘
+  │ - capability tiers   │
+  │ - audit + redact     │
+  └──────────────────────┘
+             ▲
+             │ Tailscale (optional, --tailnet)
+             │
+  ┌──────────────────────┐
+  │ Remote agent         │
+  │ (OpenClaw, etc.)     │
+  └──────────────────────┘
+```
+
+The iOS app's `StateServer` binds loopback only (`::1` + `127.0.0.1`). The Mac daemon owns tailnet identity validation, capability tiers, and the audit trail. Remote agents NEVER see the boot token — only short-lived session tokens (1h default, 24h hard cap) minted via Tailscale identity gating.
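The token rules in the paragraph above (1h default TTL, 24h hard cap, capability tiers enforced on the Mac side) can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the daemon's actual code: `SessionToken`, `clampTtlMs`, `tierAllows`, `mintToken`, and `authorize` are hypothetical names, and the strict ordering observe < interact < mutate < restore is inferred from the capability-tier table in `docs/skills.md`.

```typescript
// Hypothetical sketch of daemon-side session-token checks.
// None of these names come from the gstack source; they only illustrate the rules in the text.

const TIERS = ["observe", "interact", "mutate", "restore"] as const;
type Tier = (typeof TIERS)[number];

const DEFAULT_TTL_MS = 60 * 60 * 1000; // 1h default session TTL (from the text)
const MAX_TTL_MS = 24 * 60 * 60 * 1000; // 24h hard cap (from the text)

interface SessionToken {
  tier: Tier;
  expiresAtMs: number;
}

// Missing or invalid requests fall back to the default; oversized ones are capped.
function clampTtlMs(requestedMs?: number): number {
  if (requestedMs === undefined || !Number.isFinite(requestedMs) || requestedMs <= 0) {
    return DEFAULT_TTL_MS;
  }
  return Math.min(requestedMs, MAX_TTL_MS);
}

// Assumption: tiers are cumulative, so a higher tier includes every lower tier.
function tierAllows(granted: Tier, required: Tier): boolean {
  return TIERS.indexOf(granted) >= TIERS.indexOf(required);
}

// The token value itself (random bytes, signing) is out of scope for this sketch.
function mintToken(tier: Tier, nowMs: number, requestedTtlMs?: number): SessionToken {
  return { tier, expiresAtMs: nowMs + clampTtlMs(requestedTtlMs) };
}

// A request is allowed only if the token is unexpired and its tier covers the endpoint.
function authorize(token: SessionToken, required: Tier, nowMs: number): boolean {
  return nowMs < token.expiresAtMs && tierAllows(token.tier, required);
}
```

With these definitions, `authorize(mintToken("interact", 0), "mutate", 1000)` is false because `interact` sits below `mutate`, and a token requested with a 48h TTL still expires after 24h.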
+
+### The unlock: USB-tethered + Tailscale = DIY device farm
+
+A $500 Mac mini + an old iPhone + Tailscale free tier replaces what most teams pay BrowserStack/Sauce Labs for. Tailscale ACLs scope which identities can reach which devices at which capability tier.
+
+See `ios-qa/docs/tailscale-acl-example.md` for the runnable setup.
+
+### Capability tiers
+
+| Tier | Endpoints |
+|------|-----------|
+| observe | `/screenshot`, `/elements`, `GET /state/*`, `/state/snapshot`, `/healthz` |
+| interact | observe + `/tap`, `/swipe`, `/type`, `/session/*` |
+| mutate | interact + `POST /state/` |
+| restore | mutate + `POST /state/restore` |
+
+Default minted tokens get `interact`. Higher tiers require explicit owner mint.
+
+---
+
+## `/ios-fix`
+
+Iron Law: no fix without a reproducing snapshot. The agent captures pre-bug state via `GET /state/snapshot`, writes the fix, rebuilds, redeploys, restores the snapshot, and verifies the bug is gone. The snapshot becomes a regression test fixture so the bug can't recur silently.
+
+Mirrors `/qa`'s find-bug → fix → re-verify loop for iOS.
+
+---
+
+## `/ios-design-review`
+
+Designer's-eye QA on a real iPhone. Connects to the same `/ios-qa` daemon in observe-tier mode and screenshots every screen. Scores 10 dimensions 0-10: typography hierarchy, spacing rhythm, color hierarchy, touch targets, loading/empty/error states, accessibility, animation discipline, iOS idiom alignment, information density, AI-slop check.
+
+For each score < 7, uses AskUserQuestion to present the issue with recommended fix.
+
+---
+
+## `/ios-clean`
+
+Convenience wrapper. The structural Release-build guard against shipping DebugBridge is in `Package.swift` (`.when(configuration: .debug)`) plus a CI invariant test. `/ios-clean` is for developers who want a guided removal flow or who manually added the SPM dependency without going through `/ios-qa`.
+
+---
+
+## `/ios-sync`
+
+Run after upgrading gstack or adding new `@Observable` classes. Detects what's installed, runs gen-accessors against the latest upstream templates, refreshes any changed Swift files, verifies the app rebuilds. Cache-key invalidation handles Swift version changes, generator git rev changes, and source changes.
diff --git a/package.json b/package.json
index 592493d5e..3851a78bd 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gstack",
-  "version": "1.39.2.0",
+  "version": "1.40.0.0",
   "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
   "license": "MIT",
   "type": "module",