mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0
- ARCHITECTURE: add ref staleness detection section, update RefEntry type
- BROWSER: add ref staleness paragraph to snapshot system docs
- CONTRIBUTING: update eval tool descriptions with commentary feature
- README: fix missing qa-only in project-local uninstall command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+14 -1
@@ -120,7 +120,7 @@ Refs (`@e1`, `@e2`, `@c1`) are how the agent addresses page elements without wri
 2. Server calls Playwright's page.accessibility.snapshot()
 3. Parser walks the ARIA tree, assigns sequential refs: @e1, @e2, @e3...
 4. For each ref, builds a Playwright Locator: getByRole(role, { name }).nth(index)
-5. Stores Map<string, Locator> on the BrowserManager instance
+5. Stores Map<string, RefEntry> on the BrowserManager instance (role + name + Locator)
 6. Returns the annotated tree as plain text
 
 Later:
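The numbered steps in this hunk (walk the tree, assign sequential refs, keep one entry per ref) can be sketched roughly as follows. This is an illustration under stated assumptions, not gstack's code: `AriaNode`, `RefEntrySketch`, `assignRefs`, and the annotated-line format are hypothetical, and a plain `index` stands in for the Playwright Locator that the real entry holds.

```typescript
// Hypothetical shape of one accessibility-tree node; real snapshots carry more fields.
interface AriaNode {
  role: string;
  name: string;
}

// Hypothetical per-ref record. Per the diff, the real RefEntry holds role + name + Locator;
// here `index` stands in for the `.nth(index)` part of that locator.
interface RefEntrySketch {
  role: string;
  name: string;
  index: number;
}

function assignRefs(nodes: AriaNode[]): { refs: Map<string, RefEntrySketch>; text: string } {
  const refs = new Map<string, RefEntrySketch>();
  const seen = new Map<string, number>(); // occurrences of each role+name pair so far
  const lines: string[] = [];

  nodes.forEach((node, i) => {
    const key = `${node.role}\u0000${node.name}`;
    const index = seen.get(key) ?? 0; // nth match among nodes with the same role and name
    seen.set(key, index + 1);

    const ref = `e${i + 1}`; // sequential refs: e1, e2, e3...
    refs.set(ref, { role: node.role, name: node.name, index });
    lines.push(`@${ref} [${node.role}] "${node.name}"`); // assumed output format
  });

  return { refs, text: lines.join("\n") };
}

const demo = assignRefs([
  { role: "button", name: "Save" },
  { role: "link", name: "Docs" },
  { role: "button", name: "Save" },
]);
const demoRefs = demo.refs;
const demoText = demo.text;
console.log(demoText);
```

Keeping `role` and `name` next to the locator means a later failure can name the element it was looking for, which is what the `RefEntry` change in this commit is for.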
@@ -142,6 +142,19 @@ Playwright Locators are external to the DOM. They use the accessibility tree (wh
 
 Refs are cleared on navigation (the `framenavigated` event on the main frame). This is correct — after navigation, all locators are stale. The agent must run `snapshot` again to get fresh refs. This is by design: stale refs should fail loudly, not click the wrong element.
 
+### Ref staleness detection
+
+SPAs can mutate the DOM without triggering `framenavigated` (e.g. React router transitions, tab switches, modal opens). This makes refs stale even though the page URL didn't change. To catch this, `resolveRef()` performs an async `count()` check before using any ref:
+
+```
+resolveRef(@e3) → entry = refMap.get("e3")
+                → count = await entry.locator.count()
+                → if count === 0: throw "Ref @e3 is stale — element no longer exists. Run 'snapshot' to get fresh refs."
+                → if count > 0: return { locator }
+```
+
+This fails fast (~5ms overhead) instead of letting Playwright's 30-second action timeout expire on a missing element. The `RefEntry` stores `role` and `name` metadata alongside the Locator so the error message can tell the agent what the element was.
+
 ### Cursor-interactive refs (@c)
 
 The `-C` flag finds elements that are clickable but not in the ARIA tree — things styled with `cursor: pointer`, elements with `onclick` attributes, or custom `tabindex`. These get `@c1`, `@c2` refs in a separate namespace. This catches custom components that frameworks render as `<div>` but are actually buttons.
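The `resolveRef()` flow added in this hunk could look roughly like the sketch below. It is a minimal illustration, not the code from this commit: `CountableLocator` is a hypothetical stand-in for the one Playwright `Locator` method the check needs (`count()`), the `RefEntry` shape is inferred from the "role + name + Locator" description, and the error wording is illustrative.

```typescript
// Hypothetical stand-in for a Playwright Locator; only count() is needed here.
interface CountableLocator {
  count(): Promise<number>;
}

// Assumed shape, following the description in the diff: role + name + Locator.
interface RefEntry {
  role: string;
  name: string;
  locator: CountableLocator;
}

async function resolveRef(refMap: Map<string, RefEntry>, ref: string): Promise<CountableLocator> {
  const key = ref.replace(/^@/, ""); // "@e3" -> "e3"
  const entry = refMap.get(key);
  if (!entry) {
    throw new Error(`Unknown ref @${key}. Run 'snapshot' to get fresh refs.`);
  }
  // count() reports the current number of matches without waiting for one to appear,
  // so a missing element is detected right away instead of after an action timeout.
  const count = await entry.locator.count();
  if (count === 0) {
    throw new Error(
      `Ref @${key} (${entry.role} "${entry.name}") is stale: element no longer exists. ` +
        `Run 'snapshot' to get fresh refs.`,
    );
  }
  return entry.locator;
}

// Usage with fake locators: e1 still matches one element, e3 matches none.
const staleDemoMap = new Map<string, RefEntry>([
  ["e1", { role: "button", name: "Save", locator: { count: async () => 1 } }],
  ["e3", { role: "link", name: "Docs", locator: { count: async () => 0 } }],
]);
resolveRef(staleDemoMap, "@e3").catch((err: Error) => console.log(err.message));
```

Including `role` and `name` in the message is the reason the map stores a `RefEntry` rather than a bare Locator.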
@@ -87,6 +87,8 @@ The browser's key innovation is ref-based element selection, built on Playwright
 
 No DOM mutation. No injected scripts. Just Playwright's native accessibility API.
 
+**Ref staleness detection:** SPAs can mutate the DOM without navigation (React router, tab switches, modals). When this happens, refs collected from a previous `snapshot` may point to elements that no longer exist. To handle this, `resolveRef()` runs an async `count()` check before using any ref — if the element count is 0, it throws immediately with a message telling the agent to re-run `snapshot`. This fails fast (~5ms) instead of waiting for Playwright's 30-second action timeout.
+
 **Extended snapshot features:**
 - `--diff` (`-D`): Stores each snapshot as a baseline. On the next `-D` call, returns a unified diff showing what changed. Use this to verify that an action (click, fill, etc.) actually worked.
 - `--annotate` (`-a`): Injects temporary overlay divs at each ref's bounding box, takes a screenshot with ref labels visible, then removes the overlays. Use `-o <path>` to control the output path.
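The `--diff` behaviour described in the context lines above (keep the last snapshot as a baseline, report what changed on the next call) can be illustrated with a simplified line diff. This is a sketch under assumptions, not gstack's implementation: `lineDiff` and `snapshotDiff` are hypothetical names, the output is plain `+`/`-` lines rather than a full unified diff with hunk headers, and returning `null` on the first call is a guess at the no-baseline case.

```typescript
// Returns lines prefixed with "-", "+" or " " describing how `after` differs from `before`.
function lineDiff(before: string, after: string): string[] {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk both inputs, emitting shared lines as context and the rest as removals/additions.
  const out: string[] = [];
  let x = 0;
  let y = 0;
  while (x < a.length && y < b.length) {
    if (a[x] === b[y]) {
      out.push(` ${a[x]}`);
      x++;
      y++;
    } else if (lcs[x + 1][y] >= lcs[x][y + 1]) {
      out.push(`-${a[x]}`);
      x++;
    } else {
      out.push(`+${b[y]}`);
      y++;
    }
  }
  while (x < a.length) out.push(`-${a[x++]}`);
  while (y < b.length) out.push(`+${b[y++]}`);
  return out;
}

let baselineSnapshot: string | null = null;

// Stores the current snapshot as the new baseline and diffs it against the previous one.
function snapshotDiff(current: string): string[] | null {
  const previous = baselineSnapshot;
  baselineSnapshot = current;
  return previous === null ? null : lineDiff(previous, current);
}

const firstCall = snapshotDiff('@e1 [button] "Save"\n@e2 [link] "Docs"');
const secondCall = snapshotDiff('@e1 [button] "Saved"\n@e2 [link] "Docs"');
console.log(firstCall, secondCall);
```

Diffing the annotated text before and after a click is a cheap way to confirm the action had a visible effect on the accessibility tree.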
+5 -3
@@ -131,11 +131,13 @@ When E2E tests run, they produce machine-readable artifacts in `~/.gstack-dev/`:
 **Eval history tools:**
 
 ```bash
-bun run eval:list      # list all eval runs
-bun run eval:compare   # compare two runs (auto-picks most recent)
-bun run eval:summary   # aggregate stats across all runs
+bun run eval:list      # list all eval runs (turns, duration, cost per run)
+bun run eval:compare   # compare two runs — shows per-test deltas + Takeaway commentary
+bun run eval:summary   # aggregate stats + per-test efficiency averages across runs
 ```
 
+**Eval comparison commentary:** `eval:compare` generates natural-language Takeaway sections interpreting what changed between runs — flagging regressions, noting improvements, calling out efficiency gains (fewer turns, faster, cheaper), and producing an overall summary. This is driven by `generateCommentary()` in `eval-store.ts`.
+
 Artifacts are never cleaned up — they accumulate in `~/.gstack-dev/` for post-mortem debugging and trend analysis.
 
 ### Tier 3: LLM-as-judge (~$0.15/run)
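To make the kind of commentary described in this hunk concrete, here is a rough sketch of turning two runs into takeaway lines. It is not `generateCommentary()` from `eval-store.ts`: the `TestResult` fields, the `sketchCommentary` name, the test names, and the output wording are all hypothetical, chosen only to mirror the categories the paragraph lists (regressions, improvements, efficiency gains).

```typescript
// Hypothetical per-test record; the real eval artifacts may use different fields.
interface TestResult {
  name: string;
  passed: boolean;
  turns: number;
  durationMs: number;
  costUsd: number;
}

function sketchCommentary(before: TestResult[], after: TestResult[]): string[] {
  const previous = new Map(before.map((t) => [t.name, t] as const));
  const notes: string[] = [];

  for (const cur of after) {
    const prev = previous.get(cur.name);
    if (!prev) continue; // test is new in this run, nothing to compare against

    if (prev.passed && !cur.passed) {
      notes.push(`REGRESSION: ${cur.name} passed before and fails now`);
    } else if (!prev.passed && cur.passed) {
      notes.push(`IMPROVED: ${cur.name} now passes`);
    }

    // Efficiency gains are only reported for passing tests.
    const gains: string[] = [];
    if (cur.turns < prev.turns) gains.push(`${prev.turns - cur.turns} fewer turns`);
    if (cur.durationMs < prev.durationMs) gains.push("faster");
    if (cur.costUsd < prev.costUsd) gains.push("cheaper");
    if (cur.passed && gains.length > 0) {
      notes.push(`EFFICIENCY: ${cur.name}: ${gains.join(", ")}`);
    }
  }
  return notes;
}

const previousRun: TestResult[] = [
  { name: "browse-basic", passed: true, turns: 8, durationMs: 40000, costUsd: 0.12 },
  { name: "qa-flow", passed: true, turns: 10, durationMs: 60000, costUsd: 0.2 },
];
const currentRun: TestResult[] = [
  { name: "browse-basic", passed: true, turns: 6, durationMs: 35000, costUsd: 0.1 },
  { name: "qa-flow", passed: false, turns: 12, durationMs: 70000, costUsd: 0.25 },
];
const commentary = sketchCommentary(previousRun, currentRun);
console.log(commentary.join("\n"));
```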
@@ -614,7 +614,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically
 
 Paste this into Claude Code:
 
-> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
+> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
 
 ## Development
 