mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
5f316e0eb4bdf4914f6751f03736df44d55cd80d
160 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ef840edc8e |
merge: origin/main v1.1.1.0 into garrytan/fix-checkpoints
Main shipped the /ship VERSION/package.json drift-detection fix as v1.1.1.0 — exact collision with our branch's existing version. Bumped ours to 1.1.2.0. Resolved conflicts: - VERSION: 1.1.1.0 → 1.1.2.0 - package.json: 1.1.1.0 → 1.1.2.0 - CHANGELOG.md: moved our /checkpoint → /context-save entry up one header level to [1.1.2.0] and kept main's /ship drift-fix entry at [1.1.1.0]. Sequence now: 1.1.2.0 → 1.1.1.0 → 1.1.0.0 → 1.0.0.0. - Migration renamed v1.1.1.0.sh → v1.1.2.0.sh (version string inside and test path reference both updated). Also bumped the /context-save + /context-restore CHANGELOG entry to credit the adversarial-review hardening wave (HOME guard, realpath fallback, title sanitize, collision-safe filenames, context-restore head cap, autoplan test regex tightening) as contributor-facing notes — previous entry didn't reflect the security work that landed after the initial ship. No overlap between main's /ship Step 12 logic and this branch's work. SKILL.md files regenerated via bun run gen:skill-docs --host all. Golden fixtures updated. bun test: 0 failures across 80+ targeted tests and the full suite. Migration ownership guard: 7/7 pass (~85ms). |
||
|
|
e3c961d00f |
fix(ship): detect + repair VERSION/package.json drift in Step 12 (v1.1.1.0) (#1063)
* fix(ship): detect + repair VERSION/package.json drift in Step 12
/ship Step 12's idempotency check read only VERSION and its bump
action wrote only VERSION. package.json's version field was never
updated, so the first bump silently drifted and re-runs couldn't
see it (they keyed on VERSION alone). Any consumer reading
package.json (bun pm, npm publish, registry UIs) saw a stale semver.
Rewrites Step 12 as a four-state dispatch:
FRESH → normal bump, writes VERSION + package.json in sync
ALREADY_BUMPED → skip, reuse current VERSION
DRIFT_STALE_PKG → sync-only repair path, no re-bump (prevents
double-bump on re-run)
DRIFT_UNEXPECTED → halt and ask user (pkg edited manually,
ambiguous which value is authoritative)
Hardening: NEW_VERSION validated against MAJOR.MINOR.PATCH.MICRO
pattern before any write; node-or-bun required for JSON parsing
(no sed fallback — unsafe on nested "version" fields); invalid
JSON fails hard instead of silently corrupting.
Adds test/ship-version-sync.test.ts with 12 cases covering every
state transition, including the critical drift-repair regression
that verifies sync does not double-bump (the bug Codex caught in
the plan review of my own original fix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(ship): regenerate SKILL.md + refresh golden fixtures
Mechanical follow-on from the Step 12 template edit. `bun run
gen:skill-docs --host all` regenerates ship/SKILL.md; host-config
golden-file regression tests then need fresh baselines copied
from the regenerated claude/codex/factory host variants.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ship): harden Step 12 against whitespace + invalid REPAIR_VERSION
Claude adversarial subagent surfaced three correctness risks in the
Step 12 state machine:
- CURRENT_VERSION and BASE_VERSION were not stripped of CR/whitespace
on read. A CRLF VERSION file would mismatch the clean package.json
version, falsely classify as DRIFT_STALE_PKG, then propagate the
carriage return into package.json via the repair path.
- REPAIR_VERSION was unvalidated. The bump path validates NEW_VERSION
against the 4-digit semver pattern, but the drift-repair path wrote
whatever cat VERSION returned directly into package.json. A
manually-corrupted VERSION file would silently poison the repair.
- Empty-string CURRENT_VERSION (0-byte VERSION, directory-at-VERSION)
fell through to "not equal to base" and misclassified as
ALREADY_BUMPED.
Template fix strips \r/newlines/whitespace on every VERSION read,
guards against empty-string results, and applies the same semver
regex gate in the repair path that already protects the bump path.
Adds two regression tests (trailing-CR idempotency + invalid-semver
repair rejection). Total Step 12 coverage: 14 tests, 14/14 pass.
Opens two follow-up TODOs flagged but not fixed in this branch:
test/template drift risk (the tests still reimplement template bash)
and BASE_VERSION silent fallback on git-show failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(ship): regenerate SKILL.md + refresh goldens after hardening
Mechanical follow-on from the whitespace + REPAIR_VERSION validation
edits to ship/SKILL.md.tmpl. bun run gen:skill-docs --host all
regenerates ship/SKILL.md; host-config golden-file regression tests
need fresh baselines copied from the regenerated claude/codex/factory
host variants.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.0.1.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5bc9767a3b |
merge: origin/main v1.1.0.0 into garrytan/fix-checkpoints
Main shipped browse Puppeteer parity (v1.1.0.0 — goto file://, load-html, screenshot --selector, viewport --scale). Resolved conflicts: - VERSION / package.json: bumped 1.0.1.0 → 1.1.1.0 (main is 1.1.0.0, this branch lands next). - CHANGELOG: moved the /context-save + /context-restore entry to the top as v1.1.1.0, above main's v1.1.0.0. - Migration: renamed gstack-upgrade/migrations/v1.0.1.0.sh → v1.1.1.0.sh (matches new version). Test path updated. Migration version string inside the script also updated. No overlap between browse changes and context-save/context-restore code. SKILL.md files regenerated via bun run gen:skill-docs --host all per CLAUDE.md's "never resolve generated files by accepting either side" rule. Golden fixtures (claude/codex/factory ship) regenerated. bun test: 0 failures. Migration ownership guard: 7/7 pass (~85ms). |
||
|
|
c15b805cd8 |
feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062)
* feat(browse): TabSession loadedHtml + command aliases + DX polish primitives
Adds the foundation layer for Puppeteer-parity features:
- TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml —
enables load-html content to survive context recreation (viewport --scale)
via in-memory replay. ASCII lifecycle diagram in the source explains the
clear-before-navigation contract.
- COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth
for name aliases (setcontent / set-content / setContent → load-html),
consumed by server dispatch and chain prevalidation.
- buildUnknownCommandError() pure function — rich error messages with
Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input
length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints.
- load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write
tokens can use it.
- screenshot and viewport descriptions updated for upcoming flags.
- New browse/test/dx-polish.test.ts (15 tests): alias canonicalization,
Levenshtein threshold + alphabetical tiebreak, short-input guard,
NEW_IN_VERSION upgrade hint, alias + scope integration invariants.
No consumers yet — pure additive foundation. Safe to bisect on its own.
* feat(browse): accept file:// in goto with smart cwd/home-relative parsing
Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs
(cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a
new normalizeFileUrl() helper that handles non-standard relative forms BEFORE
the WHATWG URL parser sees them:
file:///abs/path.html → unchanged
file://./docs/page.html → file://<cwd>/docs/page.html
file://~/Documents/page.html → file://<HOME>/Documents/page.html
file://docs/page.html → file://<cwd>/docs/page.html
file://localhost/abs/path → unchanged
file://host.example.com/... → rejected (UNC/network)
file:// and file:/// → rejected (would list a directory)
Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or
Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x,
file://[::1]/x, and file://C:/Users/x are explicit errors.
Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so
URL escapes like %20 decode correctly and Node rejects encoded-slash traversal
(%2F..%2F) outright.
Signature change: validateNavigationUrl now returns Promise<string> (the
normalized URL) instead of Promise<void>. Existing callers that ignore the
return value still compile — they just don't benefit from smart-parsing until
updated in follow-up commits. Callers will be migrated in the next few commits
(goto, diff, newTab, restoreState).
Rewrites the url-validation test file: updates existing tests for the new
return type, adds 20+ new tests covering every normalizeFileUrl shape variant,
URL-encoding edge cases, and path-traversal rejection.
References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath.
* feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing
Three tightly-coupled changes to BrowserManager, all in service of the
Puppeteer-parity workflow:
1. deviceScaleFactor + currentViewport tracking. New private fields (default
scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method.
deviceScaleFactor is a context-level Playwright option — changing it
requires recreateContext(). The method validates (finite number, 1-3 cap,
headed-mode rejected), stores new values, calls recreateContext(), and
rolls back the fields on failure so a bad call doesn't leave inconsistent
state. Context options at all three sites (launch, recreate happy path,
recreate fallback) now honor the stored values instead of hardcoding
1280x720.
2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab
loadedHtml from the session; restoreState replays it via newSession.
setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml
is rehydrated and survives *subsequent* scale changes. In-memory only,
never persisted to disk (HTML may contain secrets or customer data).
3. newTab + restoreState now consume validateNavigationUrl's normalized
return value. file://./x, file://~/x, and bare-segment forms now take
effect at every navigation site, not just the top-level goto command.
Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5
→ screenshot, with content surviving both context recreations. Codex v2 P0
flagged that bare page.setContent in restoreState would lose content on the
second scale change — this commit implements the rehydration path.
References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller
return value), plan Feature 3 + Feature 4.
* feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch
Wires the new handlers and dispatch logic that the previous commits made
possible:
write-commands.ts
- New 'load-html' case: validateReadPath for safe-dir scoping, stat-based
actionable errors (not found, directory, oversize), extension allowlist
(.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting
any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like
<div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES
override, frame-context rejection. Calls session.setTabContent() so replay
metadata is rehydrated.
- viewport command extended: optional [<WxH>], optional [--scale <n>],
scale-only variant reads current size via page.viewportSize(). Invalid
scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed
mode rejected explicitly.
- clearLoadedHtml() called BEFORE goto/back/forward/reload navigation
(not after) so a timed-out goto post-commit doesn't leave stale metadata
that could resurrect on a later context recreation. Codex v2 P1 catch.
- goto uses validateNavigationUrl's normalized return value.
meta-commands.ts
- screenshot --selector <css> flag: explicit element-screenshot form.
Rejects alongside positional selector (both = error), preserves --clip
conflict at line 161, composes with --base64 at lines 168-174.
- chain canonicalizes each step with canonicalizeCommand — step shape is
now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has,
watch blocking, and result labels all use canonical names while audit
labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape
only canonicalized at prevalidation and diverged everywhere else.
- diff command consumes validateNavigationUrl return value for both URLs.
server.ts
- Command canonicalization inserted immediately after parse, before scope /
watch / tab-ownership / content-wrapping checks. rawCommand preserved for
future audit (not wired into audit log in this commit — follow-up).
- Unknown-command handler replaced with buildUnknownCommandError() from
commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional
upgrade hint for NEW_IN_VERSION entries.
security-audit-r2.test.ts
- Updated chain-loop marker from 'for (const cmd of commands)' to
'for (const c of commands)' to match the new chain step shape. Same
isWatching + BLOCKED invariants still asserted.
* chore: bump version and changelog (v1.1.0.0)
- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector,
viewport --scale, file:// support, setContent replay, and DX polish in user
voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13
"Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-
side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs`
to reflect the new command descriptions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pre-landing review fixes (9 findings from specialist + adversarial review)
Adversarial review (Claude subagent + Codex) surfaced 9 bugs across
CRITICAL/HIGH severity. All fixed:
1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent
await. Prior order left phantom HTML in replay metadata if setContent
threw (timeout, browser crash), which a later viewport --scale would
silently replay. Now loadedHtml is only recorded on successful load.
2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second
recreateContext after restoring the old fields. The fallback path in
the original recreateContext builds a blank context using whatever
this.deviceScaleFactor/currentViewport hold at that moment (which were
the NEW values we were trying to apply). Rolling back the fields without
a second recreate left the live context at new-scale while state tracked
old-scale. Now: restore fields, force re-recreate with old values, only
if that ALSO fails do we return a combined error.
3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified
to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted
alphabetically, so first equal-distance wins by default. The prior
'(d === bestDist && best !== undefined && cand < best)' clause was dead
code.
4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just
refs + frame. Without this, a user who load-html'd then clicked a link
(or had a form submit / JS redirect / OAuth flow) would retain the stale
replay metadata. The next viewport --scale would silently revert the
tab to the ORIGINAL loaded HTML, losing whatever the post-navigation
content was. Silent data corruption. Browser-emitted navigations trigger
this path via wirePageEvents.
5. browser-manager.ts:saveState + restoreState — tab ownership now flows
through BrowserState.owner. Without this, a scoped agent's viewport
--scale would strand them: tab IDs change during recreate, ownership
map held stale IDs, owner lookup failed. New IDs had no owner, so
writes without tabId were denied (DoS). Worse, if the agent sent a
stale tabId the server's swallowed-tab-switch-error path would let the
command hit whatever tab was currently active (cross-tab authz bypass).
Now: clear ownership before restore, re-add per-tab with new IDs.
6. meta-commands.ts:state load — disk-loaded state.pages is now explicit
allowlist (url, isActive, storage:null) instead of object spread.
Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a
user-writable state file, letting a tampered state.json smuggle HTML
past load-html's safe-dirs / extension / magic-byte / 50MB-cap
validators, or forge tab ownership. Now stripped at the boundary.
7. url-validation.ts:normalizeFileUrl — preserves query string + fragment
across normalization. file://./app.html?route=home#login previously
resolved to a filesystem path that URL-encoded '?' as %3F and '#' as
%23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs
and fixture URLs with query params 404'd or loaded the wrong route.
Now: split on ?/# before path resolution, reattach after.
8. url-validation.ts:validateNavigationUrl — reattaches parsed.search +
parsed.hash to the normalized file:// URL. Same fix at the main
validator for absolute paths that go through fileURLToPath round-trip.
9. server.ts:writeAuditEntry — audit entries now include aliasOf when the
user typed an alias ('setcontent' → cmd: 'load-html', aliasOf:
'setcontent'). Previously the isAliased variable was computed but
dropped, losing the raw input from the forensic trail. Completes the
plan's codex v3 P2 requirement.
Also added bm.getCurrentViewport() and switched 'viewport --scale'-
without-size to read from it (more reliable than page.viewportSize() on
headed/transition contexts).
Tests pass: exit 0, no failures. Build clean.
* test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases
Adds 28 Playwright-integration tests that close the coverage gap flagged
by the ship-workflow coverage audit (50% → expected ~80%+).
**load-html (12 tests):**
- happy path loads HTML file, page text matches
- bare HTML fragments (<div>...</div>) accepted, not just full documents
- missing file arg throws usage
- non-.html extension rejected by allowlist
- /etc/passwd.html rejected by safe-dirs policy
- ENOENT path rejected with actionable "not found" error
- directory target rejected
- binary file (PNG magic bytes) disguised as .html rejected by magic-byte check
- UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted
- --wait-until networkidle exercises non-default branch
- invalid --wait-until value rejected
- unknown flag rejected
**screenshot --selector (5 tests):**
- --selector flag captures element, validates Screenshot saved (element)
- conflicts with positional selector (both = error)
- conflicts with --clip (mutually exclusive)
- composes with --base64 (returns data:image/png;base64,...)
- missing value throws usage
**viewport --scale (5 tests):**
- WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23)
- --scale without WxH keeps current size + applies scale
- non-finite value (abc) throws "not a finite number"
- out-of-range (4, 0.5) throws "between 1 and 3"
- missing value throws
**setContent replay across context recreation (3 tests):**
- load-html → viewport --scale 2: content survives (hits setTabContent replay path)
- double cycle 2x → 1.5x: content still survives (proves TabSession rehydration)
- goto after load-html clears replay: subsequent viewport --scale does NOT
resurrect the stale HTML (validates the onMainFrameNavigated fix)
**Command aliases (2 tests):**
- setcontent routes to load-html via chain canonicalization
- set-content (hyphenated) also routes — both end-to-end through chain dispatch
Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is
/var/folders/... on macOS and outside the safe-dirs boundary. Chain result
labels use rawName→name format when an alias is resolved (matches the
meta-commands.ts chain refactor).
Full suite: exit 0, 223/223 pass.
* docs: update BROWSER.md + CHANGELOG for v1.1.0.0
BROWSER.md:
- Command reference table updated: goto now lists file:// support,
load-html added to Navigate row, viewport flagged with --scale
option, screenshot row shows --selector + --base64 flags
- Screenshot modes table adds the fifth mode (element crop via
--selector flag) and notes the tag-selector-not-caught-positionally
gotcha
- New "Retina screenshots — viewport --scale" subsection explains
deviceScaleFactor mechanics, context recreation side effects, and
headed-mode rejection
- New "Loading local HTML — goto file:// vs load-html" subsection
explains the two paths, their tradeoffs (URL state, relative asset
resolution), the safe-dirs policy, extension allowlist + magic-byte
sniff, 50MB cap, setContent replay across recreateContext, and the
alias routing (setcontent → load-html before scope check)
CHANGELOG.md (v1.1.0.0 security section expanded, no existing content
removed):
- State files cannot smuggle HTML or forge tab ownership (allowlist
on disk-loaded page fields)
- Audit log records aliasOf when a canonical command was reached via
an alias (setcontent → load-html)
- load-html content clears on real navigations (clicks, form submits,
JS redirects) — not just explicit goto. Also notes SPA query/fragment
preservation for goto file://
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
229d9a6505 |
merge: origin/main v1.0.0.0 into garrytan/fix-checkpoints
Main shipped the v1 prompts rewrite (simpler writing style + real LOC receipts + /plan-tune observational substrate). Resolved conflicts: - VERSION / package.json: bumped 0.18.5.0 → 1.0.1.0 (main is 1.0.0.0, this branch lands next). - CHANGELOG: moved the /context-save + /context-restore entry to the top as v1.0.1.0, above main's v1.0.0.0. Also removed the em-dash variants in the new entry (ship voice rule). - TODOS: kept both sections — Context skills (lane feature TODO) first, main's PACING_UPDATES_V0 + Plan Tune v2 deferrals below. - Migration: renamed gstack-upgrade/migrations/v0.18.5.0.sh → v1.0.1.0.sh (matches new version). Test path updated. preamble.ts auto-merged cleanly: main's question-tuning, explain_level, and writing-style sections composed with my context-save/context-restore routing rule. All SKILL.md files regenerated via `bun run gen:skill-docs --host all` per CLAUDE.md's "never resolve generated files by accepting either side" rule. Golden fixtures (claude/codex/factory ship) also regenerated. bun test: 0 failures. |
||
|
|
fe6d764bf7 |
docs: bump VERSION to 0.18.5.0, CHANGELOG + TODOS entry
User-facing changelog leads with the problem: /checkpoint silently stopped saving because Claude Code shipped a native /checkpoint alias for /rewind. The fix is a clean rename to /context-save + /context-restore, with the second bug (restore was filtering by current branch and hiding most recent saves) called out separately under Fixed. TODOS entry for the deferred lane feature points at the existing lane data model in plan-eng-review/SKILL.md.tmpl:240-249 so a future session can pick it up without re-discovering the source. |
||
|
|
0a803f9e81 |
feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)
* docs: add design doc for /plan-tune v1 (observational substrate) Canonical record of the /plan-tune v1 design: typed question registry, per-question explicit preferences, inline tune: feedback with user-origin gate, dual-track profile (declared + inferred separately), and plain-English inspection skill. Captures every decision with pros/cons, what's deferred to v2 with explicit acceptance criteria, and what was rejected entirely. Codex review drove a substantial scope rollback from the initial CEO EXPANSION plan. 15+ legitimate findings (substrate claim was false without a typed registry; E4/E6/clamp logical contradiction; profile poisoning attack surface; LANDED preamble side effect; implementation order) shaped the final shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: typed question registry for /plan-tune v1 foundation scripts/question-registry.ts declares 53 recurring AskUserQuestion categories across 15 skills (ship, review, office-hours, plan-ceo-review, plan-eng-review, plan-design-review, plan-devex-review, qa, investigate, land-and-deploy, cso, gstack-upgrade, preamble, plan-tune, autoplan). Each entry has: stable kebab-case id, skill owner, category (approval | clarification | routing | cherry-pick | feedback-loop), door_type (one-way | two-way), optional stable option keys, optional psychographic signal_key, and a one-line description. 12 of 53 are one-way doors (destructive ops, architecture/data forks, security/compliance). These are ALWAYS asked regardless of user preference. Helpers: getQuestion(id), getOneWayDoorIds(), getAllRegisteredIds(), getRegistryStats(). No binary or resolver wiring yet — this is the schema substrate the rest of /plan-tune builds on. Ad-hoc question_ids (not registered) still log but skip psychographic signal attribution. Future /plan-tune skill surfaces frequently-firing ad-hoc ids as candidates for registry promotion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: registry schema + safety + coverage tests (gate tier) 20 tests validating the question registry: Schema (7 tests): - Every entry has required fields - All ids are kebab-case and start with their skill name - No duplicate ids - Categories are from the allowed set - door_type is one-way | two-way - Options arrays are well-formed - Descriptions are short and single-line Helpers (5 tests): - getQuestion returns entry for known id, undefined for unknown - getOneWayDoorIds includes destructive questions, excludes two-way - getAllRegisteredIds count matches QUESTIONS keys - getRegistryStats totals are internally consistent One-way door safety (2 tests): - Every critical question (test failure, SQL safety, LLM trust boundary, security scan, merge confirm, rollback, fix apply, premise revise, arch finding, privacy gate, user challenge) is declared one-way - At least 10 one-way doors exist (catches regression if declarations are accidentally dropped) Registry breadth (3 tests): - 11 high-volume skills each have >= 1 registered question - Preamble one-time prompts are registered - /plan-tune's own questions are registered Signal map references (1 test): - signal_key values are typed kebab-case strings Template coverage (2 tests, informational): - AskUserQuestion usage across templates is non-trivial (>20) - Registry spans >= 10 skills 20 pass, 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: one-way door classifier (belt-and-suspenders safety fallback) scripts/one-way-doors.ts — secondary keyword-pattern classifier that catches destructive questions even when the registry doesn't have an entry for them. The registry's door_type field (from scripts/question-registry.ts) is the PRIMARY safety gate. This classifier is the fallback for ad-hoc question_ids that agents generate at runtime. Classification priority: 1. Registry lookup by question_id → use declared door_type 2. Skill:category fallback (cso:approval, land-and-deploy:approval) 3. Keyword pattern match against question_summary 4. Default: treat as two-way (safer to log the miss than auto-decide unsafely) Covers 21 destructive patterns across: - File system (rm -rf, delete, wipe, purge, truncate) - Database (drop table/database/schema, delete from) - Git/VCS (force-push, reset --hard, checkout --, branch -D) - Deploy/infra (kubectl delete, terraform destroy, rollback) - Credentials (revoke/reset/rotate API key|token|secret|password) - Architecture (breaking change, schema migration, data model change) 7 new tests in test/plan-tune.test.ts covering: registry-first lookup, unknown-id fallthrough, keyword matching on destructive phrasings including embedded filler words ("rotate the API key"), skill-category fallback, benign questions defaulting to two-way, pattern-list non-empty. 27 pass, 0 fail. 1270 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: psychographic signal map + builder archetypes scripts/psychographic-signals.ts — hand-crafted {signal_key, user_choice} → {dimension, delta} map. Version 0.1.0. Conservative deltas (±0.03 to ±0.06 per event). Covers 9 signal keys: scope-appetite, architecture-care, code-quality-care, test-discipline, detail-preference, design-care, devex-care, distribution-care, session-mode. Helpers: applySignal() mutates running totals, newDimensionTotals() creates empty starting state, normalizeToDimensionValue() sigmoid-clamps accumulated delta to [0,1] (0 → 0.5 neutral), validateRegistrySignalKeys() checks that every signal_key in the registry has a SIGNAL_MAP entry. In v1 the signal map is used ONLY to compute inferred dimension values for /plan-tune inspection output. No skill behavior adapts to these signals until v2. scripts/archetypes.ts — 8 named archetypes + Polymath fallback: - Cathedral Builder (boil-the-ocean + architecture-first) - Ship-It Pragmatist (small scope + fast) - Deep Craft (detail-verbose + principled) - Taste Maker (intuitive, overrides recommendations) - Solo Operator (high-autonomy, delegates) - Consultant (hands-on, consulted on everything) - Wedge Hunter (narrow scope aggressively) - Builder-Coach (balanced steering) - Polymath (fallback when no archetype matches) matchArchetype() uses L2 distance scaled by tightness, with a 0.55 threshold below which we return Polymath. v1 ships the model stable; v2 narrative/vibe commands wire it into user-facing output. 14 new tests: signal map consistency vs registry, applySignal behavior for known/unknown keys, normalization bounds, archetype schema validity, name uniqueness, matchArchetype correctness for each reference profile, Polymath fallback for outliers. 41 pass, 0 fail total in test/plan-tune.test.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: bin/gstack-question-log — append validated AskUserQuestion events Append-only JSONL log at ~/.gstack/projects/{SLUG}/question-log.jsonl. Schema: {skill, question_id, question_summary, category?, door_type?, options_count?, user_choice, recommended?, followed_recommendation?, session_id?, ts} Validates: - skill is kebab-case - question_id is kebab-case, <= 64 chars - question_summary non-empty, <= 200 chars, newlines flattened - category is one of approval/clarification/routing/cherry-pick/feedback-loop - door_type is one-way or two-way - options_count is integer in [1, 26] - user_choice non-empty string, <= 64 chars Injection defense on question_summary rejects the same patterns as gstack-learnings-log (ignore previous instructions, system:, override:, do not report, etc). followed_recommendation is auto-computed when both user_choice and recommended are present. ts auto-injected as ISO 8601 if missing. 21 tests covering: valid payloads, full field preservation, auto-followed computation, appending, long-summary truncation, newline flattening, invalid JSON, missing fields, bad case, oversized ids, invalid enum values, out-of-range options_count, and 6 injection attack patterns. 21 pass, 0 fail, 43 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: bin/gstack-developer-profile — unified profile with migration bin/gstack-developer-profile supersedes bin/gstack-builder-profile. The old binary becomes a one-line legacy shim delegating to --read for /office-hours backward compat. Subcommands: --read legacy KEY:VALUE output (tier, session_count, etc) --migrate folds ~/.gstack/builder-profile.jsonl into ~/.gstack/developer-profile.json. Atomic (temp + rename), idempotent (no-op when target exists or source absent), archives source as .migrated-YYYY-MM-DD-HHMMSS --derive recomputes inferred dimensions from question-log.jsonl using the signal map in scripts/psychographic-signals.ts --profile full profile JSON --gap declared vs inferred diff JSON --trace <dim> event-level trace of what contributed to a dimension --check-mismatch flags dimensions where declared and inferred disagree by > 0.3 (requires >= 10 events first) --vibe archetype name + description from scripts/archetypes.ts --narrative (v2 stub) Auto-migration on first read: if legacy file exists and new file doesn't, migrate before reading. Creates a neutral (all-0.5) stub if nothing exists. Unified schema (see docs/designs/PLAN_TUNING_V0.md §Architecture): {identity, declared, inferred: {values, sample_size, diversity}, gap, overrides, sessions, signals_accumulated, schema_version} 25 new tests across subcommand behaviors: - --read defaults + stub creation - --migrate: 3 sessions preserved with signal tallies, idempotency, archival - Tier calculation: welcome_back / regular / inner_circle boundaries - --derive: neutral-when-empty, upward nudge on 'expand', downward on 'reduce', recomputable (same input → same output), ad-hoc unregistered ids ignored - --trace: contributing events, empty for untouched dims, error without arg - --gap: empty when no declared, correctly computed otherwise - --vibe: returns archetype name + description - --check-mismatch: threshold behavior, 10+ sample requirement - Unknown subcommand errors 25 pass, 0 fail, 60 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: bin/gstack-question-preference — explicit preferences + user-origin gate Subcommands: --check <id> → ASK_NORMALLY | AUTO_DECIDE (decides if a registered question should be auto-decided by the agent) --write '{…}' → set a preference (requires user-origin source) --read → dump preferences JSON --clear [id] → clear one or all --stats → short counts summary Preference values: always-ask | never-ask | ask-only-for-one-way. Stored at ~/.gstack/projects/{SLUG}/question-preferences.json. Safety contract (the core of Codex finding #16, profile-poisoning defense from docs/designs/PLAN_TUNING_V0.md §Security model): 1. One-way doors ALWAYS return ASK_NORMALLY from --check, regardless of user preference. User's never-ask is overridden with a visible safety note so the user knows why their preference didn't suppress the prompt. 2. --write requires an explicit `source` field: - Allowed: "plan-tune", "inline-user" - REJECTED with exit code 2: "inline-tool-output", "inline-file", "inline-file-content", "inline-unknown" Rejection is explicit ("profile poisoning defense") so the caller can log and surface the attempt. 3. free_text on --write is sanitized against injection patterns (ignore previous instructions, override:, system:, etc.) and newline-flattened. Each --write also appends a preference-set event to ~/.gstack/projects/{SLUG}/question-events.jsonl for derivation audit trail. 31 tests: - --check behavior (4): defaults, two-way, one-way (one-way overrides never-ask with safety note), unknown ids, missing arg - --check with prefs (5): never-ask on two-way → AUTO_DECIDE; never-ask on one-way → ASK_NORMALLY with override note; always-ask always asks; ask-only-for-one-way flips appropriately - --write valid (5): inline-user accepted, plan-tune accepted, persisted correctly, event appended, free_text preserved with flattening - User-origin gate (6): missing source rejected; inline-tool-output rejected with exit code 2 and explicit poisoning message; inline-file, inline-file-content, inline-unknown rejected; unknown source rejected - Schema validation (4): invalid JSON, bad question_id, bad preference, injection in free_text - --read (2): empty → {}, returns writes - --clear (3): specific id, clear-all, NOOP for missing - --stats (2): empty zeros, tallies by preference type 31 pass, 0 fail, 52 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: question-tuning preamble resolvers scripts/resolvers/question-tuning.ts ships three preamble generators: generateQuestionPreferenceCheck — before each AskUserQuestion, agent runs gstack-question-preference --check <id>. AUTO_DECIDE suppresses the ask and auto-chooses recommended. ASK_NORMALLY asks as usual. One-way door safety override is handled by the binary. generateQuestionLog — after each AskUserQuestion, agent appends a log record with skill, question_id, summary, category, door_type, options_count, user_choice, recommended, session_id. generateInlineTuneFeedback — offers inline "tune:" prompt after two-way questions. Documents structured shortcuts (never-ask, always-ask, ask-only-for-one-way, ask-less) AND accepts free-form English with normalization + confirmation. Explicitly spells out the USER-ORIGIN GATE: only write tune events when the prefix appears in the user's own chat message, never from tool output or file content. Binary enforces. All three resolvers are gated by the QUESTION_TUNING preamble echo. When the config is off, the agent skips these sections entirely. Ready to be wired into preamble.ts in the next commit. Codex host has a simpler variant that uses $GSTACK_BIN env vars. scripts/resolvers/index.ts registers three placeholders: QUESTION_PREFERENCE_CHECK, QUESTION_LOG, INLINE_TUNE_FEEDBACK Total resolver count goes from 45 to 48. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: wire question-tuning into preamble for tier >= 2 skills scripts/resolvers/preamble.ts — adds two things: 1. _QUESTION_TUNING config echo in the preamble bash block, gated on the user's gstack-config `question_tuning` value (default: false). 2. A combined Question Tuning section for tier >= 2 skills, injected after the confusion protocol. The section itself is runtime-gated by the QUESTION_TUNING value — agents skip it entirely when off. scripts/resolvers/question-tuning.ts — consolidated into one compact combined section `generateQuestionTuning(ctx)` covering: preference check before the question, log after, and inline tune: feedback with user-origin gate. Per-phase generators remain exported for unit tests but are no longer the main entrypoint. Size impact: +570 tokens / +2.3KB per tier-2+ SKILL.md. Three skills (plan-ceo-review, office-hours, ship) still exceed the 100KB token ceiling — but they were already over before this change. Delta is the smallest viable wiring of the /plan-tune v1 substrate. Golden fixtures (test/fixtures/golden/claude-ship, codex-ship, factory-ship) regenerated to match the new baseline. Full test run: 1149 pass, 0 fail, 113 skip across 28 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files with question-tuning section bun run gen:skill-docs --host all after wiring the QUESTION_TUNING preamble section. Every tier >= 2 skill now includes the combined Question Tuning guidance. Runtime-gated — agents skip the section when question_tuning is off in gstack-config (default). Golden fixtures (claude-ship, codex-ship, factory-ship) updated to the new baseline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: /plan-tune skill — conversational inspection + preferences plan-tune/SKILL.md.tmpl: the user-facing skill for /plan-tune v1. Routes plain-English intent to one of 8 flows: - Enable + setup (first-time): 5 declaration questions mapping to the 5 psychographic dimensions (scope_appetite, risk_tolerance, detail_preference, autonomy, architecture_care). Writes to developer-profile.json declared.*. - Inspect profile: plain-English rendering of declared + inferred + gap. Uses word bands (low/balanced/high) not raw floats. Shows vibe archetype when calibration gate is met. - Review question log: top-20 question frequencies with follow/override counts. Highlights override-heavy questions as candidates for never-ask. - Set a preference: normalizes "stop asking me about X" → never-ask, etc. Confirms ambiguous phrasings before writing via gstack-question-preference. - Edit declared profile: interprets free-form ("more boil-the-ocean") and CONFIRMS before mutating declared.* (trust boundary per Codex #15). - Show gap: declared vs inferred diff with plain-English severity bands (close / drift / mismatch). Never auto-updates declared from the gap. - Stats: preference counts + diversity/calibration status. - Enable / disable: gstack-config set question_tuning true|false. Design constraints enforced: - Plain English everywhere. No CLI subcommand syntax required. Shortcuts (`profile`, `vibe`, `stats`, `setup`) exist but optional. - user-origin gate on tune: writes. source: "plan-tune" for user-invoked /plan-tune; source: "inline-user" for inline tune: from other skills. - One-way doors override never-ask (safety, surfaced to user). - No behavior adaptation in v1 — this skill inspects and configures only. Generates plan-tune/SKILL.md at ~11.6k tokens, well under the 100KB ceiling. Generated for all hosts via `bun run gen:skill-docs --host all`. Full free test suite: 1149 pass, 0 fail, 113 skip across 28 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: end-to-end pipeline + preamble injection coverage Added 6 tests to test/plan-tune.test.ts: Preamble injection (3 tests): - tier 2+ includes Question Tuning section with preference check, log, and user-origin gate language ('profile-poisoning defense', 'inline-user') - tier 1 does NOT include the prose section (QUESTION_TUNING bash echo still fires since it's in the bash block all tiers share) - codex host swaps binDir references to $GSTACK_BIN End-to-end pipeline (3 tests) — real binaries working together, not mocks: - Log 5 expand choices → --derive → profile shows scope_appetite > 0.5 (full log → registry lookup → signal map → normalization round-trip) - --write source: inline-tool-output rejected; --read confirms no pref was persisted (the profile-poisoning defense actually works end-to-end) - Migrate a 3-session legacy file; confirm legacy gstack-builder-profile shim still returns SESSION_COUNT: 3, TIER: welcome_back, CROSS_PROJECT: true test/plan-tune.test.ts now has 47 tests total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: E2E test for /plan-tune plain-English inspection flow (gate tier) test/skill-e2e-plan-tune.test.ts — verifies /plan-tune correctly routes plain-English intent ("review the questions I've been asked") to the Review question log section without requiring CLI subcommand syntax. Seeds a synthetic question-log.jsonl with 3 entries exercising: - override behavior (user chose expand over recommended selective) - one-way door respect (user followed ship-test-failure-triage recommendation) - two-way override (user skipped recommended changelog polish) Invokes the skill via `claude -p` and asserts: - Agent surfaces >= 2 of 3 logged question_ids in output - Agent notices override/skip behavior from the log - Exit reason is success or error_max_turns (not agent-crash) Gate-tier because the core v1 DX promise is plain-English intent routing. If it requires memorized subcommands or breaks on natural language, that's a regression of the defining feature. Registered in test/helpers/touchfiles.ts with dependencies: - plan-tune/** (skill template + generated md) - scripts/question-registry.ts (required for log lookup) - scripts/psychographic-signals.ts, scripts/one-way-doors.ts (derive path) - bin/gstack-question-log, gstack-question-preference, gstack-developer-profile Skipped when EVALS_ENABLED is not set; runs on `bun run test:evals`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.19.0.0) — /plan-tune v1 Ships /plan-tune as observational substrate: typed question registry, dual-track developer profile (declared + inferred), explicit per-question preferences with user-origin gate, inline tune: feedback across every tier >= 2 skill, unified developer-profile.json with migration from builder-profile.jsonl. Scope rolled back from initial CEO EXPANSION plan after outside-voice review (Codex). 6 deferrals tracked as P0 TODOs with explicit acceptance criteria: E1 substrate wiring, E3 narrative/vibe, E4 blind-spot coach, E5 LANDED celebration, E6 auto-adjustment, E7 psychographic auto-decide. See docs/designs/PLAN_TUNING_V0.md for the full design record. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): harden Dockerfile.ci against transient Ubuntu mirror failures The CI image build failed with: E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/... Connection failed [IP: 91.189.92.22 80] ERROR: process "/bin/sh -c apt-get update && apt-get install ..." did not complete successfully: exit code: 100 archive.ubuntu.com periodically returns "connection refused" on individual regional mirrors. Without retry logic a single failed fetch nukes the whole Docker build. Three defenses, layered: 1. /etc/apt/apt.conf.d/80-retries — apt fetches each package up to 5 times with a 30s timeout. Handles per-package flakes. 2. Shell-loop retry around the whole apt-get step (x3, 10s sleep) — handles the case where apt-get update itself can't reach any mirror. 3. --retry 5 --retry-delay 5 --retry-connrefused on all curl fetches (bun install script, GitHub CLI keyring, NodeSource setup script). Applied to every apt-get and curl call in the Dockerfile. No behavior change on happy path — only kicks in when mirrors blip. Fixes the build-image job that was blocking CI on the /plan-tune PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add PLAN_TUNING_V1 + PACING_UPDATES_V0 design docs Captures the V1 design (ELI10 writing + LOC reframe) in docs/designs/PLAN_TUNING_V1.md and the extracted V1.1 pacing-overhaul plan in docs/designs/PACING_UPDATES_V0.md. V1 scope was reduced from the original bundled pacing + writing-style plan after three engineering-review passes revealed structural gaps in the pacing workstream that couldn't be closed via plan-text editing. TODOS.md P0 entry links to V1.1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: curated jargon list for V1 writing-style glossing Repo-owned list of ~50 high-frequency technical terms (idempotent, race condition, N+1, backpressure, etc.) that gstack glosses on first use in tier-≥2 skill output. Baked into generated SKILL.md prose at gen-skill-docs time. Terms not on this list are assumed plain-English enough. Contributions via PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(preamble): V1 Writing Style section + EXPLAIN_LEVEL echo + migration prompt Adds a new Writing Style section to tier-≥2 preamble output composing with the existing AskUserQuestion Format section. Six rules: jargon glossed on first use per skill invocation (from scripts/jargon-list.json), outcome- framed questions, short sentences, decisions close with user impact, gloss-on-first-use even if user pasted term, user-turn override for "be terse" requests. Baked conditionally (skip if EXPLAIN_LEVEL: terse). Adds EXPLAIN_LEVEL preamble echo using \${binDir} (host-portable matching V0 QUESTION_TUNING pattern). Adds WRITING_STYLE_PENDING echo reading a flag file written by the V0→V1 upgrade migration; on first post-upgrade skill run, the agent fires a one-time AskUserQuestion offering terse mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gstack-config): validate explain_level + document in header Adds explain_level: default|terse to the annotated config header with a one-line description. Whitelists valid values; on set of an unknown value, prints a specific warning ("explain_level '\$VALUE' not recognized. Valid values: default, terse. Using default.") and writes the default value. Matches V1 preamble's EXPLAIN_LEVEL echo expectation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: V1 upgrade migration — writing-style opt-out prompt New migration script following existing v0.15.2.0.sh / v0.16.2.0.sh pattern. Writes a .writing-style-prompt-pending flag file on first run post-upgrade. The preamble's migration-prompt block reads the flag and fires a one-time AskUserQuestion offering the user a choice between the new default writing style and restoring V0 prose via \`gstack-config set explain_level terse\`. Idempotent via flag files; if the user has already set explain_level explicitly, counts as answered and skips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: LOC reframe tooling — throughput comparison + README updater + scc installer Three new scripts: - scripts/garry-output-comparison.ts — enumerates Garry-authored commits in 2013 + 2026 on public repos, extracts ADDED lines from git diff, classifies as logical SLOC via scc --stdin (regex fallback if scc missing). Writes docs/throughput-2013-vs-2026.json with per-language breakdown + explicit caveats (public repos only, commit-style drift, private-work exclusion). - scripts/update-readme-throughput.ts — reads the JSON if present, replaces the README's <!-- GSTACK-THROUGHPUT-PLACEHOLDER --> anchor with the computed multiple (preserving the anchor for future runs). If JSON missing, writes GSTACK-THROUGHPUT-PENDING marker that CI rejects — forcing the build to run before commit. - scripts/setup-scc.sh — standalone OS-detecting installer for scc. Not a package.json dependency (95% of users never run throughput). Brew on macOS, apt on Linux, GitHub releases link on Windows. Two-string anchor pattern (PLACEHOLDER vs PENDING) prevents the pipeline from destroying its own update path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(retro): surface logical SLOC + weighted commits above raw LOC V1 reorders the /retro summary table to lead with features shipped, then commits + weighted commits (commits × files-touched capped at 20), then PRs merged, then logical SLOC added as the primary code-volume metric. Raw LOC stays present but is demoted to context. Rationale inline in the template: ten lines of a good fix is not less shipping than ten thousand lines of scaffold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(v1): README hero reframe + writing-style + CHANGELOG + version bump to 1.0.0.0 README.md: - Hero removes "600,000+ lines of production code" framing; replaces with the computed 2013-vs-2026 pro-rata multiple (via <!-- GSTACK-THROUGHPUT-PLACEHOLDER --> anchor, filled by the update-readme-throughput build step). - Hiring callout: "ship real products at AI-coding speed" instead of "10K+ LOC/day." - New Writing Style section (~80 words) between Quick start and Install: "v1 prompts = simpler" framing, outcome-language example, terse-mode opt-out, pointer to /plan-tune. CLAUDE.md: one-paragraph Writing style (V1) note under project conventions, linking to preamble resolver + V1 design docs. CHANGELOG.md: V1 entry on top of v0.19.0.0 with user-facing narrative (what changes, how to opt out, for-contributors notes). Mentions scope reduction — pacing overhaul ships in V1.1. CONTRIBUTING.md: one-paragraph note on jargon-list.json maintenance (PR to add/remove terms; regenerate via gen:skill-docs). VERSION + package.json: bump to 1.0.0.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + golden fixtures for V1 Mechanical regeneration from the updated templates in prior commits: - Writing Style section now appears in tier-≥2 skill output. - EXPLAIN_LEVEL + WRITING_STYLE_PENDING echoes in preamble bash. - V1 migration-prompt block fires conditionally on first upgrade. - Jargon list inlined into preamble prose at gen time. - Retro template's logical SLOC + weighted commits order applied. Regenerated for all 8 hosts via bun run gen:skill-docs --host all. Golden ship-skill fixtures refreshed from regenerated outputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: V1 gate coverage — writing-style resolver + config + jargon + migration + dormancy Six new gate-tier test files: - test/writing-style-resolver.test.ts — asserts Writing Style section is injected into tier-≥2 preamble, all 6 rules present, jargon list inlined, terse-mode gate condition present, Codex output uses \$GSTACK_BIN (not ~/.claude/), tier-1 does NOT get the section, migration-prompt block present. - test/explain-level-config.test.ts — gstack-config set/get round-trip for default + terse, unknown-value warns + defaults to default, header documents the key, round-trip across set→set→get. - test/jargon-list.test.ts — shape + ~50 terms + no duplicates (case-insensitive) + includes canonical high-signal terms. - test/v0-dormancy.test.ts — 5D dimension names + archetype names forbidden in default-mode tier-≥2 SKILL.md output, except for plan-tune and office-hours where they're load-bearing. - test/readme-throughput.test.ts — script replaces anchor with number on happy path, writes PENDING marker when JSON missing, CI gate asserts committed README contains no PENDING string. - test/upgrade-migration-v1.test.ts — fresh run writes pending flag, idempotent after user-answered, pre-existing explain_level counts as answered. All 95 V1 test-expect() calls pass. Full suite: 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: compute real 2013-vs-2026 throughput multiple (130.2×) Ran scripts/garry-output-comparison.ts across all 15 public garrytan/* repos. Aggregated results into docs/throughput-2013-vs-2026.json and ran scripts/update-readme-throughput.ts to replace the README placeholder. 2013 public activity: 2 commits, 2,384 logical lines added across 1 week, in 1 repo (zurb-foundation-wysihtml5 upstream contribution). 2026 public activity: 279 commits, 310,484 logical lines added across 17 active weeks, in 3 repos (gbrain, gstack, resend_robot). Multiples (public repos only, apples-to-apples): - Logical SLOC: 130.2× - Commits per active week: 8.2× - Raw lines added: 134.4× Private work at both eras (2013 Bookface at YC, Posterous-era code, 2026 internal tools) is excluded from this comparison. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: 207× throughput multiple (with private repos + Bookface) Re-ran scripts/garry-output-comparison.ts across all 41 repos under garrytan/* (15 public + 26 private), including Bookface (YC's internal social network, 2013-era work). 2013 activity: 71 commits, 5,143 logical lines, 4 active repos (bookface, delicounter, tandong, zurb-foundation-wysihtml5) 2026 activity: 350 commits, 1,064,818 logical lines, 15 active repos (gbrain, gstack, gbrowser, tax-app, kumo, tenjin, autoemail, kitsune, easy-chromium-compiles, conductor-playground, garryslist-agent, baku, gstack-website, resend_robot, garryslist-brain) Multiples: - Logical SLOC: 207× (up from 130.2× when including private work) - Raw lines: 223× - Commits/active-week: 3.4× Stopped committing docs/throughput-2013-vs-2026.json — analysis is a local artifact, not repo state. Added docs/throughput-*.json to .gitignore. Full markdown analysis at ~/throughput-analysis-2026-04-18.md (local-only). README multiple is now hardcoded; re-run the script and edit manually when you want to refresh it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: run rate vs year-to-date throughput comparison Two separate numbers in the README hero: - Run rate: ~700× (9,859 logical lines/day in 2026 vs 14/day in 2013) - Year-to-date: 207× (2026 through April 18 already exceeds 2013 full year by 207×) Previous "207× pro-rata" framing mixed full-year 2013 vs partial-year 2026. Run rate is the apples-to-apples normalization; YTD is the "already produced" total. Both are honest; both are compelling; they measure different things. Analysis at ~/throughput-analysis-2026-04-18.md (local-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(throughput): script natively computes to-date + run-rate multiples Enhanced scripts/garry-output-comparison.ts so both calculations come out of a single run instead of being reassembled ad-hoc in bash: PerYearResult now includes: - days_elapsed — 365 for past years, day-of-year for current - is_partial — flags the current (in-progress) year - per_day_rate — logical/raw/commits normalized by calendar day - annualized_projection — per_day_rate × 365 Output JSON's `multiples` now has two sibling blocks: - multiples.to_date — raw volume ratios (2026-YTD / 2013-full-year) - multiples.run_rate — per-day pace ratios (apples-to-apples) Back-compat: multiples.logical_lines_added still aliases to_date for older consumers reading the JSON. Updated README hero to cite both (picking up brain/* repo that was missed in the earlier aggregation pass): 2026 run rate: ~880× my 2013 pace (12,382 vs 14 logical lines/day) 2026 YTD: 260× the entire 2013 year Stderr summary now prints both multiples at the end of each run. Full analysis at ~/throughput-analysis-2026-04-18.md (local-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: ON_THE_LOC_CONTROVERSY methodology post + README link Long-form response to the "LOC is a meaningless vanity metric" critique. Covers: - The three branches of the LOC critique and which are right - Why logical SLOC (NCLOC) beats raw LOC as the honest measurement - Full method: author-scoped git diff, regex-classified added lines, aggregated across 41 public + private garrytan/* repos - Both calculations: to-date (260x) and run-rate (879x) - Steelman of the critics (greenfield-vs-maintenance, survivorship bias, quality-adjusted productivity, time-to-first-user) - Reproduction instructions Linked from README hero via a blockquote directly below the number. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * exclude: tax-app from throughput analysis (import-dominated history) tax-app's history is one commit of 104K logical lines — an initial import of a codebase, not authored work. Removing it to keep the comparison honest. Changes: - scripts/garry-output-comparison.ts: added EXCLUDED_REPOS constant with tax-app + a one-line rationale. The script now skips excluded repos with a stderr note and deletes any stale output JSON so aggregation loops don't pick up pre-exclusion numbers. - README hero: updated to 810× run rate + 240× YTD (were 880×/260×). Wording updated to "40 public + private repos ... after excluding repos dominated by imported code." - docs/ON_THE_LOC_CONTROVERSY.md: updated all numbers, added an "Exclusions" paragraph explaining tax-app, removed tax-app from the "shipped not WIP" example list. New numbers (2026 through day 108, without tax-app): - To-date: 240× logical SLOC (1,233,062 vs 5,143) - Run rate: 810× per-day pace (11,417 vs 14 logical/day) - Annualized: ~4.2M logical lines projected Future re-runs automatically skip tax-app. Add more exclusions to EXCLUDED_REPOS at the top of the script with a one-line rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: correct tax-app exclusion rationale tax-app is a demo app I built for an upcoming YC channel video, not an "import-dominated history" as the previous commit claimed. Excluded because it's not production shipping work, not because of an import commit. Updated rationale in scripts/garry-output-comparison.ts's EXCLUDED_REPOS constant, in docs/ON_THE_LOC_CONTROVERSY.md's method section + conclusion, and in the README hero wording ("one demo repo" vs the earlier "repos dominated by imported code"). Numbers unchanged — the exclusion itself is the same, just the reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: harden ON_THE_LOC_CONTROVERSY against Cramer + neckbeard critiques Reframes the thesis as "engineers can fly now" (amplification, not replacement) and fortifies the soft spots critics will attack. Added: - Flight-thesis opener: pilot vs walker, leverage not replacement. - Second deflation layer for AI verbosity (on top of NCLOC). Headline moves from 810x to 408x after generous 2x AI-boilerplate cut, with explicit sensitivity analysis showing the number is still large under pessimistic priors (5x → 162x, 10x → 81x, 100x impossible). - Weekly distribution check (kills "you had one burst week" attack). - Revert rate (2.0%) and post-merge fix rate (6.3%) with OSS comparables (K8s/Rails/Django band). Addresses "where are your error rates" directly. - Named production adoption signals (gstack 1000+ installs, gbrain beta, resend_robot paying API) with explicit concession that "shipped != used at scale" for most of the corpus. - Harder steelman: 5 specific concessions with quantified pivot points (e.g., "if 2013 baseline was 3.5x higher, 810x → 228x, still high"). Removed factual error: Posterous acquisition paragraph (Garry had already left Posterous by 2011, so the "Twitter bought our private repos" excuse for the 2013 corpus gap doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update gstack/gbrain adoption numbers in LOC controversy post gstack: "1,000+ distinct project installations" → "tens of thousands of daily active users" (telemetry-reported, community tier, opt-in). gbrain: "small set of beta testers" → "hundreds of beta testers running it live." Both are the accurate current numbers. The concession paragraph below (about shipped != adopted at scale for the long-tail repos) still reads correctly since it's about the corpus as a whole, not gstack/gbrain specifically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: reframe reproducibility note as OSS breakout flex "You'd need access to my private repos" → "Bookface and Posthaven are private, but gstack and gbrain are open-sourced with tens of thousands of GitHub stars and tens of thousands of confirmed regular users, among the most-used OSS projects in the world that didn't exist three months ago." Keeps the `gh repo list` command at the end for the actual reproducibility instruction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Rewrite LOC controversy post - Lead with concession (LOC is garbage, do the math anyway) - Preempt 14 lines/day meme with historical baselines (Brooks, Jones, McConnell) - Remove 'neckbeard' language throughout - Add slop-scan story (Ben Vinegar, 5.24 → 1.96, 62% cut) - David Cramer GUnit joke - Add testing philosophy section (the real unlock) - ASCII weekly distribution chart - gstack telemetry section with real numbers (15K installs, 305K invocations, 95.2% success) - Top skills usage chart - Pick-your-priors paragraph moved earlier (the killer) - Sharper close: run the script, show me your numbers * docs: four precision fixes on LOC controversy post 1. Citation fix. Kernighan didn't say anything about LOC-as-metric (that's the famous "aircraft building by weight" quote, commonly misattributed but actually Bill Gates). Replaced "Kernighan implied it before that" with the real Dijkstra quote ("lines produced" vs "lines spent" from EWD1036, with direct link) + the Gates quote. Verified via web search. 2. Slop-scan direction clarified. "(highest on his benchmark)" was ambiguous — could read as a brag. Now: "Higher score = more slop. He ran it on gstack and we scored 5.24, the worst he'd measured at the time." Then the 62% cut lands as an actual win. 3. Prose/chart skill-usage ordering now matches. Added /plan-eng-review (28,014) to the prose list so it doesn't conflict with the chart below it. 4. Cut the "David — I owe you one / GUnit" insider joke. Most readers won't connect Cramer → Sentry → GUnit naming. Ends the slop-scan paragraph on the stronger line: "Run `bun test` and watch 2,000+ tests pass." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: tighten four LOC post citations to match primary sources 1. Bill Gates quote: flagged as folklore-grade. Was "Bill Gates put it more memorably" (firm attribution). Now "The old line (widely attributed to Bill Gates, sourcing murky) puts it more memorably." The quote stands; honesty about attribution avoids the same misattribution trap we just fixed for Kernighan. 2. Capers Jones: "15-50 across thousands of projects" → "roughly 16-38 LOC/day across thousands of projects" — matches his actual published measurements (which also report as 325-750 LOC/month). 3. Steve McConnell: "10-50 for finished, tested, delivered code" was folklore. Replaced with his actual project-size-dependent range from Code Complete: "20-125 LOC/day for small projects (10K LOC) down to 1.5-25 for large projects (10M LOC) — it's size-dependent, not a single number." 4. Revert rate comparison: "Kubernetes, Rails, and Django historically run 1.5-3%" was unsourced. Replaced with "mature OSS codebases typically run 1-3%" + "run the same command on whatever you consider the bar and compare." No false specificity about which repos. Net: every quantitative citation in the post now matches primary-source figures or is explicitly flagged as folklore. Neckbeards can verify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: drop Writing style section from README Was sitting in prime real estate between Quick start and Install — internal implementation detail, not something users need up-front. Existing coverage is enough: - Upgrade migration prompt notifies users on first post-upgrade run - CLAUDE.md has the contributor note - docs/designs/PLAN_TUNING_V1.md has the full design Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: collapse team-mode setup into one paste-and-go command Step 2 was three separate code blocks: setup --team, then team-init, then git add/commit. Mirrors Step 1's style now — one shell one-liner that does all three. Subshell (cd && ./setup --team) keeps the user in their repo pwd so team-init + git commit land in the right place. "Swap required for optional" moved to a one-liner below. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: move full-clone footnote from README to CONTRIBUTING The "Contributing or need full history?" note is for contributors, not for someone following the README install flow. Moved into CONTRIBUTING's Quick start section where it fits next to the existing clone command, with a tip to upgrade an existing shallow clone via \`git fetch --unshallow\`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: root <root@localhost> |
||
|
|
9ec4ab7eb9 |
codex + Apple Silicon hardening wave (v0.18.4.0) (#1056)
* fix: ad-hoc codesign compiled binaries on Apple Silicon after build On some Apple Silicon machines, Bun's --compile produces a corrupt or linker-only code signature. macOS kills these binaries with SIGKILL (exit 137, zsh: killed) before they execute a single instruction. Add a post-build codesign step to setup that runs only on Darwin arm64: 1. Remove the corrupt/linker-only signature (required — a direct re-sign fails with 'invalid or unsupported format for signature') 2. Apply a fresh ad-hoc signature The step is idempotent, costs <1s, and is what Bun's own docs recommend for distributed standalone executables. All four compiled binaries are covered: browse, find-browse, design, and gstack-global-discover. Failure is a non-fatal warning so Intel/CI builds are unaffected. Fixes #997 * fix: prevent codex exec stdin deadlock with </dev/null redirect codex CLI 0.120.0+ blocks indefinitely when stdin is a non-TTY pipe (Claude Code Bash tool, background bash, CI). The CLI sees a non-TTY stdin and waits for EOF to append it as a <stdin> block, even when the prompt is passed as a positional argument. Fix: add < /dev/null to every codex exec and codex review invocation in the source-of-truth files (scripts/resolvers/*.ts and *.md.tmpl). Generated SKILL.md files will be produced by bun run gen:skill-docs in a subsequent commit (Tension D: template+resolver only, generator is authoritative, not cherry-picked artifacts). Affected source files (16 total invocations): - scripts/resolvers/review.ts (4) - scripts/resolvers/design.ts (3) - codex/SKILL.md.tmpl (5) - autoplan/SKILL.md.tmpl (4) Fixes #971 Co-Authored-By: loning <loning@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: codex/autoplan hardening + Apple Silicon coreutils auto-install Hardens /codex and /autoplan against silent failures surfaced by the #972 stdin fix and #1003 Apple Silicon codesign. Six-layer defense: 1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the old file-only check produced for CI / platform-engineer users. 2. **Timeout wrapper** around every codex exec / codex review invocation: gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces common causes + actionable next step. Guards against model-API stalls not covered by the #972 stdin fix. 3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208): 2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized surfaces errors that were previously dropped silently. 4. **Completeness check** in the Python JSON parser: tracks turn.completed events and warns on zero (possible mid-stream disconnect). 5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range that introduced the stdin deadlock #972 fixes). Anchored regex `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20 false positives. 6. **Failure telemetry + operational learnings**: codex_timeout, codex_auth_failed, codex_cli_missing, codex_version_warning events land in ~/.gstack/analytics/skill-usage.jsonl behind the existing telemetry opt-in. On timeout (exit 124), auto-logs an operational learning via gstack-learnings-log so future /investigate sessions surface prior hang patterns automatically. **Shared helper** (bin/gstack-codex-probe): consolidates all four pieces (auth probe, version check, timeout wrapper, telemetry logger) into one bash file that /codex and /autoplan source. Namespace-prefixed (_gstack_codex_*) with a unit test that verifies sourcing does not leak shell options into the caller. pathRewrites in host configs rewrite ~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for Factory/Cursor/etc. **Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU timeout by default; Homebrew's coreutils installs it as gtimeout to avoid shadowing BSD utilities. ./setup now auto-installs coreutils on Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1 for CI, managed machines, or offline envs. **25 deterministic unit tests** (test/codex-hardening.test.ts): - 8 auth probe combinations (env precedence, whitespace, alternate $CODEX_HOME, corrupt file paths) - 10 version regex cases including 0.120.10 false-positive guards and v-prefixed / multiline output - 4 timeout wrapper + namespace hygiene (bash -n, gtimeout preference, set-option leak check) - 3 telemetry payload schema checks (confirms env values + auth tokens never leak into emitted events) **1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts): gates the /autoplan dual-voice path — asserts both Claude subagent and Codex voices produce output in Phase 1, OR that [codex-unavailable] is logged when Codex is absent. ~\$1/run, not a CI gate. Golden baseline + gen-skill-docs exclusion list updated for the new codex path references and the 16 < /dev/null redirects from #972. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: plan-review right-sized diff counterbalance (not minimal-diff default) /plan-ceo-review and /plan-eng-review listed "minimal diff" as an engineering preference without counterbalancing language. Reviewers picked up on that and rejected rewrites that should have been approved. The preference is now framed as "right-sized diff" with explicit permission to recommend a rewrite when the existing foundation is broken. Implementation alternatives section in CEO review gets an equal-weight clarification: don't default to minimal viable just because it is smaller. Recommend whichever best serves the user's goal; if the right answer is a rewrite, say so. Three-line tone edit per template, no voice / ETHOS / YC / promotional content change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * release: v0.18.4.0 — codex + Apple Silicon hardening wave - Apple Silicon codesign fix (#1003 @voidborne-d) - Codex stdin deadlock fix (#972 @loning) - Codex timeout wrapper (gtimeout → timeout → unwrapped fallback) - Multi-signal auth gate for /codex + /autoplan - Codex version warning for known-bad CLI (0.120.0-0.120.2) - Challenge mode stderr capture + completeness check - Plan-review right-sized diff counterbalance - Failure telemetry + auto-log timeout as operational learning - 25 deterministic unit tests + dual-voice periodic E2E Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> Co-authored-by: loning <loning@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1211b6b40b |
community wave: 6 PRs + hardening (v0.18.1.0) (#1028)
* fix: extend tilde-in-assignment fix to design resolver + 4 skill templates PR #993 fixed the Claude Code permission prompt for `scripts/resolvers/browse.ts` and `gstack-upgrade/SKILL.md.tmpl`. Same bug lives in three more places that weren't on the contributor's branch: - `scripts/resolvers/design.ts` (3 spots: D=, B=, and _DESIGN_DIR=) - `design-shotgun/SKILL.md.tmpl` (_DESIGN_DIR=) - `plan-design-review/SKILL.md.tmpl` (_DESIGN_DIR=) - `design-consultation/SKILL.md.tmpl` (_DESIGN_DIR=) - `design-review/SKILL.md.tmpl` (REPORT_DIR=) Replaces bare `~/` with quoted `"$HOME/..."` in the source-of-truth files, then regenerates. `grep -rEn '^[A-Za-z_]+=~/' --include="SKILL.md" .` now returns zero hits across all hosts (claude, codex, cursor, gbrain, hermes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(openclaw): make native skills codex-friendly (#864) Normalizes YAML frontmatter on the 4 hand-authored OpenClaw skills so stricter parsers like Codex can load them. Codex CLI was rejecting these files with "mapping values are not allowed in this context" on colons inside unquoted description scalars. - Drops non-standard `version` and `metadata` fields - Rewrites descriptions into simple "Use when..." form (no inline colons) - Adds a regression test enforcing strict frontmatter (name + description only) Verified live: Codex CLI now loads the skills without errors. Observed during /codex outside-voice run on the eval-community-prs plan review — Codex stderr tripped on these exact files, which was real-world confirmation the fix is needed. Dropped the connect-chrome changes from the original PR (the symlink removal is out of scope for this fix; keeping connect-chrome -> open-gstack-browser). Co-Authored-By: Cathryn Lavery <cathrynlavery@users.noreply.github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): server persists across Claude Code Bash calls The browse server was dying between Bash tool invocations in Claude Code because: 1. SIGTERM: The Claude Code sandbox sends SIGTERM to all child processes when a Bash command completes. The server received this and called shutdown(), deleting the state file and exiting. 2. Parent watchdog: The server polls BROWSE_PARENT_PID every 15s. When the parent Bash shell exits (killed by sandbox), the watchdog detected it and called shutdown(). Both mechanisms made it impossible to use the browse tool across multiple Bash calls — every new `$B` invocation started a fresh server with no cookies, no page state, and no tabs. Fix: - SIGTERM handler: log and ignore instead of shutdown. Explicit shutdown is still available via the /stop command or SIGINT (Ctrl+C). - Parent watchdog: log once and continue instead of shutdown. The existing idle timeout (30 min) handles eventual cleanup. The /stop command and SIGINT still work for intentional shutdown. Windows behavior is unchanged (uses taskkill /F which bypasses signal handlers). Tested: browse server survives across 5+ separate Bash tool calls in Claude Code, maintaining cookies, page state, and navigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): gate #994 SIGTERM-ignore to normal mode only PR #994 made browse persist across Claude Code Bash calls by ignoring SIGTERM and parent-PID death, relying on the 30-min idle timeout for eventual cleanup. Codex outside-voice review caught that the idle timeout doesn't apply in two modes: headed mode (/open-gstack-browser) and tunnel mode (/pair-agent). Both early-return from idleCheckInterval. Combined with #994's ignore-SIGTERM, those sessions would leak forever after the user disconnects — a real resource leak on shared machines where multiple /pair-agent sessions come and go. Fix: gate SIGTERM-ignore and parent-PID-watchdog-ignore to normal (headless) mode only. Headed + tunnel modes respect both signals and shutdown cleanly. Idle timeout behavior unchanged. Also documents the deliberate contract change for future contributors — don't re-add global SIGTERM shutdown thinking it's missing; it's intentionally scoped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: keep cookie picker alive after cli exits Fixes garrytan/gstack#985 * fix: add opencode setup support * feat(browse): add Windows browser path detection and DPAPI cookie decryption - Extend BrowserPlatform to include win32 - Add windowsDataDir to BrowserInfo; populate for Chrome, Edge, Brave, Chromium - getBaseDir('win32') → ~/AppData/Local - findBrowserMatch checks Network/Cookies first on Windows (Chrome 80+) - Add getWindowsAesKey() reading os_crypt.encrypted_key from Local State JSON - Add dpapiDecrypt() via PowerShell ProtectedData.Unprotect (stdin/stdout) - decryptCookieValue branches on platform: AES-256-GCM (Windows) vs AES-128-CBC (mac/linux) - Fix hardcoded /tmp → TEMP_DIR from platform.ts in openDbFromCopy Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(browse): Windows cookie import — profile discovery, v20 detection, CDP fallback Three bugs fixed in cookie-import-browser.ts: - listProfiles() and findInstalledBrowsers() now check Network/Cookies on Windows (Chrome 80+ moved cookies from profile/Cookies to profile/Network/Cookies) - openDb() always uses copy-then-read on Windows (Chrome holds exclusive locks) - decryptCookieValue() detects v20 App-Bound Encryption with specific error code Added CDP-based extraction fallback (importCookiesViaCdp) for v20 cookies: - Launches Chrome headless with --remote-debugging-port on the real profile - Extracts cookies via Network.getAllCookies over CDP WebSocket - Requires Chrome to be closed (v20 keys are path-bound to user-data-dir) - Both cookie picker UI and CLI direct-import paths auto-fall back to CDP Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): document CDP debug port security + log Chrome version on v20 fallback Follow-up to #892 per Codex outside-voice review. Two small additions to the Windows v20 App-Bound Encryption CDP fallback: 1. Inline comment documenting the deliberate security posture of the --remote-debugging-port. Chrome binds it to 127.0.0.1 by default, so the threat model is local-user-only (which is no worse than baseline — local attackers can already read the cookie DB). Random port 9222-9321 is for collision avoidance, not security. Chrome is always killed in finally. 2. One-time Chrome version log on CDP entry via /json/version. When Chrome inevitably changes v20 key format or /json/list shape in a future major version, logs will show exactly which version users are hitting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: v0.18.1.0 — community wave (6 PRs + hardening) VERSION bump + users-first CHANGELOG entry for the wave: - #993 tilde-in-assignment fix (byliu-labs) - #994 browse server persists across Bash calls (joelgreen) - #996 cookie picker alive after cli exits (voidborne-d) - #864 OpenClaw skills codex-friendly (cathrynlavery) - #982 OpenCode native setup (breakneo) - #892 Windows cookie import + DPAPI + v20 CDP fallback (msr-hickory) Plus 3 follow-up hardening commits we own: - Extended tilde fix to design resolver + 4 more skill templates - Gated #994 SIGTERM-ignore to normal mode only (headed/tunnel preserve shutdown) - Documented CDP debug port security + log Chrome version on v20 fallback Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: review pass — package.json version, import dedup, error context, stale help Findings from /review on the wave PR: - [P1] package.json version was 0.18.0.1 but VERSION is 0.18.1.0, failing test/gen-skill-docs.test.ts:177 "package.json version matches VERSION file". Bumped package.json to 0.18.1.0. - [P2] Duplicate import of cookie-picker-routes in browse/src/server.ts (handleCookiePickerRoute at line 20 + hasActivePicker at line 792). Merged into single import at top. - [P2] cookie-import-browser.ts:494 generic rethrow loses underlying error. Now preserves the message so "ENOENT" vs "JSON parse error" vs "permission denied" are distinguishable in user output. - [P3] setup:46 "Missing value for --host" error message listed an incomplete set of hosts (missing factory, openclaw, hermes, gbrain). Aligned with the "Unknown value" error on line 94. Kept as-is (not real issues): - cookie-import-browser.ts:869 empty catch on Chrome version fetch is the correct pattern for best-effort diagnostics (per slop-scan philosophy in CLAUDE.md — fire-and-forget failures shouldn't throw). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(watchdog): invert test 3 to match merged #994 behavior main #1025 added browse/test/watchdog.test.ts with test 3 expecting the old "watchdog kills server when parent dies" behavior. The merge with this branch's #994 inverted that semantic — the server now STAYS ALIVE on parent death in normal headless mode (multi-step QA across Claude Code Bash calls depends on this). Changes: - Renamed test 3 from "watchdog fires when parent dies" to "server STAYS ALIVE when parent dies (#994)". - Replaced 25s shutdown poll with 20s observation window asserting the server remains alive after the watchdog tick. - Updated docstring to document all 3 watchdog invariants (env-var disable, headed-mode disable, headless persists) and note tunnel-mode coverage gap. Verification: bun test browse/test/watchdog.test.ts → 3 pass, 0 fail (22.7s). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): switch apt mirror to Hetzner to bypass Ubicloud → archive.ubuntu.com timeouts Both build attempts of `.github/docker/Dockerfile.ci` failed at `apt-get update` with persistent connection timeouts to archive.ubuntu.com:80 and security.ubuntu.com:80 — 90+ seconds of "connection timed out" against every Ubuntu IP. Not a transient blip; this PR doesn't touch the Dockerfile, and a re-run reproduced the same failure across all 9 mirror IPs. Root cause: Ubicloud runners (Hetzner FSN1-DC21 per runner output) have unreliable HTTP-port-80 routing to Ubuntu's official archive endpoints. Fix: - Rewrite /etc/apt/sources.list.d/ubuntu.sources (deb822 format in 24.04) to use https://mirror.hetzner.com/ubuntu/packages instead. Hetzner's mirror is publicly accessible from any cloud (not Hetzner-only despite the name) and route-local for Ubicloud's actual host. Solves both reliability and latency. - Add a 3-attempt retry loop around both `apt-get update` calls as belt-and-suspenders. Even Hetzner's mirror can have brief blips, and the retry costs nothing when the first attempt succeeds. Verification: the workflow will rebuild on push. Local `docker build` not practical for a 12-step image with bun + claude + playwright deps + a 10-min cold install. Trusting CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): use HTTP for Hetzner apt mirror (base image lacks ca-certificates) Previous commit switched to https://mirror.hetzner.com/... which proved the mirror is reachable and routes correctly (no more 90s timeouts), but exposed a chicken-and-egg: ubuntu:24.04 ships without ca-certificates, and that's exactly the package we're installing. Result: "No system certificates available. Try installing ca-certificates." Fix: use http:// for the Hetzner mirror. Apt's security model verifies package integrity via GPG-signed Release files, not TLS, so HTTP here is no weaker than the upstream defaults (Ubuntu's official sources also default to HTTP for the same reason). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cathryn Lavery <cathrynlavery@users.noreply.github.com> Co-authored-by: Joel Green <thejoelgreen@gmail.com> Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com> Co-authored-by: Break <breakneo@gmail.com> Co-authored-by: Michael Spitzer-Rubenstein <msr.ext@hickory.ai> |
||
|
|
b3eaffce07 |
feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030)
* refactor: renumber /ship steps to clean integers (1-20)
Replaces fractional step numbers (1.5, 2.5, 3.25, 3.4, 3.45, 3.47, 3.48,
3.5, 3.55, 3.56, 3.57, 3.75, 3.8, 5.5, 6.5, 8.5, 8.75) with clean
integers 1 through 20, plus allowed resolver sub-steps 8.1, 8.2,
9.1, 9.2, 9.3. Fractional numbering signaled "optional appendix" and
contributed to /ship's habit of skipping late-stage steps.
Affects:
- ship/SKILL.md.tmpl (all headings + ~30 cross-references)
- scripts/resolvers/review.ts (ship-side 3.47/3.48/3.57/3.8 conditionals)
- scripts/resolvers/review-army.ts (ship-side 3.55/3.56 conditionals)
- scripts/resolvers/testing.ts (ship-side 2.5/3.4 references, 5 sites)
- scripts/resolvers/utility.ts (CHANGELOG heading gets Step 13 prefix)
- test/gen-skill-docs.test.ts (5 step-number assertions updated)
- test/skill-validation.test.ts (3 step-number assertions updated)
/review step numbering (1.5, 2.5, 4.5, 5.5-5.8) intentionally unchanged —
only the ship-side of each isShip conditional was updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: subagent isolation for /ship's 4 context-heaviest sub-workflows
Fights context rot. By late /ship, the parent context is bloated with
500-1,750 lines of intermediate tool output from tests, coverage audits,
reviews, adversarial checks, and PR body construction. The model is
at its least intelligent when it reaches doc-sync — which is why
/document-release was being skipped ~80% of the time.
Applies subagent dispatch (proven pattern from Review Army at Step 9.1
and Adversarial at Step 11) to four sub-workflows where the parent
only needs the conclusion, not the intermediate output:
- Step 7 (Test Coverage Audit) — subagent returns coverage_pct, gaps,
diagram, tests_added
- Step 8 (Plan Completion Audit) — subagent returns total_items, done,
changed, deferred, summary
- Step 10 (Greptile Triage) — subagent fetches + classifies, parent
handles user interaction and commits fixes (AskUserQuestion + Edit
can't run in subagents)
- Step 18 (Documentation Sync) — subagent invokes full /document-release
skill in fresh context; parent embeds documentation_section in PR body
Sequencing fix for Step 18: runs AFTER Step 17 (Push) and BEFORE Step 19
(Create PR). The PR is created once from final HEAD with the
## Documentation section baked into the initial body — no create-then-
re-edit dance, no race conditions with document-release's own PR body
editor.
Adds "You are NOT done" guardrail after Step 17 (Push) to break the
natural stopping point that currently causes doc-release skips.
Each subagent falls back to inline execution if it fails or returns
invalid JSON. /ship never blocks on subagent failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: regression guard for /ship step numbering
Three regression guards in skill-validation.test.ts to prevent future
drift back to fractional step numbering:
1. ship/SKILL.md.tmpl contains no fractional step numbers except the
allowed resolver sub-steps (8.1, 8.2, 9.1, 9.2, 9.3). A contributor
adding "Step 3.75" next month will fail this test with a clear error.
2. ship/SKILL.md main headings use clean integer step numbers. If a
renumber accidentally leaves a decimal heading, this catches it.
3. review/SKILL.md step numbers unchanged — regression guard for the
resolver conditionals in review.ts/review-army.ts. If a future edit
accidentally touches the review-side of an isShip ternary, /review's
fractional numbering (1.5, 4.5, 5.7) would vanish. This test catches
that cross-contamination.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: sync ship step references after renumber
CLAUDE.md: "At /ship time (Step 5)" → "(Step 13)" — CHANGELOG is now
explicitly Step 13 after the renumber (was implicit between old
Step 4 and Step 5.5).
TODOS.md: "Step 3.4 coverage audit" → "Step 7" — references the open
TODO for auto-upgrading ★-rated tests, which hooks into the coverage
audit step.
Both are historical references to ship's step numbering that became
stale when clean integer renumbering landed in
|
||
|
|
822e843a60 |
fix: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0) (#1025)
* fix: headed browser no longer auto-shuts down after 15 seconds The parent-process watchdog in server.ts polls the spawning CLI's PID every 15s and self-terminates if it is gone. The connect command in cli.ts exits with process.exit(0) immediately after launching the server, so the watchdog would reliably kill the headed browser within ~15s. This contradicted the idle timer's own design: server.ts:745 explicitly skips headed mode because "the user is looking at the browser. Never auto-die." The watchdog had no such exemption. Two-layer fix: 1. CLI layer: connect handler always sets BROWSE_PARENT_PID=0 (was only pass-through for pair-agent subprocesses). The user owns the headed browser lifecycle; cleanup happens via browser disconnect event or $B disconnect. 2. CLI layer: startServer() honors caller's BROWSE_PARENT_PID=0 in the headless spawn path too. Lets CI, non-interactive shells, and Claude Code Bash calls opt into persistent servers across short-lived CLI invocations. 3. Server layer: defense-in-depth. Watchdog now also skips when BROWSE_HEADED=1, so even if a future launcher forgets PID=0, headed browsers won't die. Adds log lines when the watchdog is disabled so lifecycle debugging is easier. Four community contributors diagnosed variants of this bug independently. Thanks for the clear analyses and reproductions. Closes #1020 (rocke2020) Closes #1018 (sanghyuk-seo-nexcube) Closes #1012 (rodbland2021) Closes #986 (jbetala7) Closes #1006 Closes #943 Co-Authored-By: rocke2020 <noreply@github.com> Co-Authored-By: sanghyuk-seo-nexcube <noreply@github.com> Co-Authored-By: rodbland2021 <noreply@github.com> Co-Authored-By: jbetala7 <noreply@github.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: disconnect handler runs full cleanup before exiting When the user closed the headed browser window, the disconnect handler in browser-manager.ts called process.exit(2) directly, bypassing the server's shutdown() function entirely. That meant: - sidebar-agent daemon kept polling a dead server - session state wasn't saved - Chromium profile locks (SingletonLock, SingletonSocket, SingletonCookie) weren't cleaned — causing "profile in use" errors on next $B connect - state file at .gstack/browse.json was left stale Now the disconnect handler calls onDisconnect(), which server.ts wires up to shutdown(2). Full cleanup runs first, then the process exits with code 2 — preserving the existing semantic that distinguishes user-close (exit 2) from crashes (exit 1). shutdown() now accepts an optional exitCode parameter (default 0) so the SIGTERM/SIGINT paths and the disconnect path can share cleanup code while preserving their distinct exit codes. Surfaced by Codex during /plan-eng-review of the watchdog fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pre-existing test flakiness in relink.test.ts The 23 tests in this file all shell out to gstack-config + gstack-relink (bash scripts doing subprocess work). Under parallel bun test load, those subprocess spawns contend with other test suites and each test can drift ~200ms past Bun's 5s default timeout, causing 5+ flaky timeouts per run in the gate-tier ship gate. Wrap the `test` import to default the per-test timeout to 15s. Explicit per-test timeouts (third arg) still win, so individual tests can lower it if needed. No behavior change — only gives subprocess-heavy tests more headroom under parallel load. Noticed by /ship pre-flight test run. Unrelated to the main PR fix but blocking the gate, so fixing as a separate commit per the test ownership protocol. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: SIGTERM/SIGINT shutdown exit code regression Node's signal listeners receive the signal name ('SIGTERM' / 'SIGINT') as the first argument. When shutdown() started accepting an optional exitCode parameter in the prior disconnect-cleanup commit, the bare `process.on('SIGTERM', shutdown)` registration started silently calling shutdown('SIGTERM'). The string passed through to process.exit(), Node coerced it to NaN, and the process exited with code 1 instead of 0. Wrap both listeners so they call shutdown() with no args — signal name never leaks into the exitCode slot. Surfaced by /ship's adversarial subagent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: onDisconnect async rejection leaves process running The disconnect handler calls this.onDisconnect() without awaiting it, but server.ts wires the callback to shutdown(2) — which is async. If that promise rejects, the rejection drops on the floor as an unhandled rejection, the browser is already disconnected, and the server keeps running indefinitely with no browser attached. Add a sync try/catch for throws and a .catch() chain for promise rejections. Both fall back to process.exit(2) so a dead browser never leaves a live server. Also widen the callback type from `() => void` to `() => void | Promise<void>` to match the actual runtime shape of the wired shutdown(2) call. Surfaced by /ship's adversarial subagent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: honor BROWSE_PARENT_PID=0 with trailing whitespace The strict string compare `process.env.BROWSE_PARENT_PID === '0'` meant any stray newline or whitespace (common from shell `export` in a pipe or heredoc) would fail the check and re-enable the watchdog against the caller's intent. Switch to parseInt + === 0, matching the server's own parseInt at server.ts:760. Handles '0', '0\n', ' 0 ', and unset correctly; non-numeric values (parseInt returns NaN, NaN === 0 is false) fail safe — watchdog stays active, which is the safe default for unexpected input. Surfaced by /ship's adversarial subagent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: preserve bun:test sub-APIs in relink test wrapper The previous commit wrapped bun:test's `test` to bump the per-test timeout default to 15s but cast the wrapper `as typeof _bunTest` without copying the sub-properties (`.only`, `.skip`, `.each`, `.todo`, `.failing`, `.if`) from the original. The cast was a lie: the wrapper was a plain function, not the full callable with those chained properties attached. The file doesn't use any of them today, but a future test.only or test.skip would fail with a cryptic "undefined is not a function." Object.assign the original _bunTest's properties onto the wrapper so sub-APIs chain correctly forever. Surfaced by /ship's adversarial subagent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.18.1.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: regression tests for parent-process watchdog End-to-end tests in browse/test/watchdog.test.ts that prove the three invariants v0.18.1.0 depends on. Each test spawns the real server.ts (not a mock), so any future change that breaks the watchdog logic fails here — the thing /ship's adversarial review flagged as missing. 1. BROWSE_PARENT_PID=0 disables the watchdog Spawns server with PID=0, reads stdout, confirms the "watchdog disabled (BROWSE_PARENT_PID=0)" log line appears and "Parent process ... exited" does NOT. ~2s. 2. BROWSE_HEADED=1 disables the watchdog (server-side guard) Spawns server with BROWSE_HEADED=1 and a bogus parent PID (999999). Proves BROWSE_HEADED takes precedence over a present PID — if the server-side defense-in-depth regresses, the watchdog would try to poll 999999 and fire on the "dead parent." ~2s. 3. Default headless mode: watchdog fires when parent dies The regression guard for the original orphan-prevention behavior. Spawns a real `sleep 60` parent and a server watching its PID, then kills the parent and waits up to 25s for the server to exit. The watchdog polls every 15s so first tick is 0-15s after death, plus shutdown() cleanup. ~18s. Total runtime: ~21s for all 3 tests. They catch the class of bug this branch exists to fix: "does the process live or die when it should?" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: rocke2020 <noreply@github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6a785c5729 |
fix: ngrok Windows build + close CI error-swallowing gap (v0.18.0.1) (#1024)
* fix(browse): externalize @ngrok/ngrok so Node server bundle builds on Windows @ngrok/ngrok has a native .node addon that causes `bun build --outfile` to fail with "cannot write multiple output files without an output directory". Externalize it alongside the existing runtime deps (playwright, diff, bun:sqlite), matching the exact pattern used for every other dynamic import in server.ts. Adds a policy comment explaining when to extend the externals list so the next native dep doesn't repeat this failure. Two community contributors independently converged on this fix: - @tomasmontbrun-hash (#1019) - @scarson (#1013) Also fixes issues #1010 and #960. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(package.json): subshell cleanup so || true stops masking build/test failures Shell operator precedence trap in both the build and test scripts: cmd1 && cmd2 && ... && rm -f .*.bun-build || true bun test ... && bun run slop:diff 2>/dev/null || true The trailing `|| true` was intended to suppress cleanup errors, but it applies to the entire `&&` chain — so ANY failure (including the build-node-server.sh failure that broke Windows installs since v0.15.12) silently exits 0. CI ran the build, the build failed, and CI reported green. Wrap the cleanup/slop-diff commands in subshells so `|| true` only scopes to the intended step: ... && (rm -f .*.bun-build || true) bun test ... && (bun run slop:diff 2>/dev/null || true) Verified: `bash -c 'false && echo A && rm -f X || true'` exits 0 (old, broken), `bash -c 'false && echo A && (rm -f X || true)'` exits 1 (new, correct). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(browse): add build validation test for server-node.mjs Two assertions: 1. `node --check` passes on the built `server-node.mjs` (valid ES module syntax). This catches regressions where the post-processing steps (perl regex replacements) corrupt the bundle. 2. No inlined `@ngrok/ngrok` module identifiers (ngrok_napi, platform- specific binding packages). Verifies the --external flag actually kept it external. Skips gracefully when `browse/dist/server-node.mjs` is missing — the dist dir is gitignored, so a fresh clone + `bun test` without a prior build is a valid state, not a failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(setup): verify @ngrok/ngrok can load on Windows Mirror the existing Playwright verification step. Since @ngrok/ngrok is now externalized in server-node.mjs (resolved at runtime from node_modules), confirm the platform-specific native binary (@ngrok/ngrok-win32-x64-msvc et al.) is installed at setup time rather than surfacing the failure later when the user runs /pair-agent. Same fallback pattern: if `node -e "require('@ngrok/ngrok')"` fails, fall back to `npm install --no-save @ngrok/ngrok` to pull the missing binary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump to v0.18.0.1 for ngrok Windows fix + CI error-propagation Fixes shipped in this version: - Externalize @ngrok/ngrok so the Node server bundle builds on Windows (PRs #1019, #1013; issues #1010, #960) - Shell precedence fix so build/test failures no longer exit 0 in CI - Build validation test for server-node.mjs - Windows setup verifies @ngrok/ngrok native binary is loadable Credit: @tomasmontbrun-hash (#1019), @scarson (#1013). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b805aa0113 |
feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)
* feat: add Confusion Protocol to preamble resolver Injects a high-stakes ambiguity gate at preamble tier >= 2 so all workflow skills get it. Fires when Claude encounters architectural decisions, data model changes, destructive operations, or contradictory requirements. Does NOT fire on routine coding. Addresses Karpathy failure mode #1 (wrong assumptions) with an inline STOP gate instead of relying on workflow skill invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Hermes and GBrain host configs Hermes: tool rewrites for terminal/read_file/patch/delegate_task, paths to ~/.hermes/skills/gstack, AGENTS.md config file. GBrain: coding skills become brain-aware when GBrain mod is installed. Same tool rewrites as OpenClaw (agents spawn Claude Code via ACP). GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS NOT suppressed on gbrain host, enabling brain-first lookup and save-to-brain behavior. Both registered in hosts/index.ts with setup script redirect messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GBrain resolver — brain-first lookup and save-to-brain New scripts/resolvers/gbrain.ts with two resolver functions: - GBRAIN_CONTEXT_LOAD: search brain for context before skill starts - GBRAIN_SAVE_RESULTS: save skill output to brain after completion Placeholders added to 4 thinking skill templates (office-hours, investigate, plan-ceo-review, retro). Resolves to empty string on all hosts except gbrain via suppressedResolvers. GBRAIN suppression added to all 9 non-gbrain host configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: wire slop:diff into /review as advisory diagnostic Adds Step 3.5 to the review template: runs bun run slop:diff against the base branch to catch AI code quality issues (empty catches, redundant return await, overcomplicated abstractions). Advisory only, never blocking. Skips silently if slop-scan is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Karpathy compatibility note to README Positions gstack as the workflow enforcement layer for Karpathy-style CLAUDE.md rules (17K stars). Links to forrestchang/andrej-karpathy-skills. Maps each Karpathy failure mode to the gstack skill that addresses it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: improve native OpenClaw thinking skills office-hours: add design doc path visibility message after writing ceo-review: add HARD GATE reminder at review section transitions retro: add non-git context support (check memory for meeting notes) Mirrors template improvements to hand-crafted native skills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update tests and golden fixtures for new hosts - Host count: 8 → 10 (hermes, gbrain) - OpenClaw adapter test: expects undefined (dead code removed) - Golden ship fixtures: updated with Confusion Protocol + vendoring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files Regenerated from templates after Confusion Protocol, GBrain resolver placeholders, slop:diff in review, HARD GATE reminders, investigation learnings, design doc visibility, and retro non-git context changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.18.0.0 - CHANGELOG: add v0.18.0.0 entry (Confusion Protocol, Hermes, GBrain, slop in review, Karpathy note, skill improvements) - CLAUDE.md: add hermes.ts and gbrain.ts to hosts listing - README.md: update agent count 8→10, add Hermes + GBrain to table - VERSION: bump to 0.18.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: sync package.json version to 0.18.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extract Step 0 from review SKILL.md in E2E test The review-base-branch E2E test was copying the full 1493-line review/SKILL.md into the test fixture. The agent spent 8+ turns reading it in chunks, leaving only 7 turns for actual work, causing error_max_turns on every attempt. Now extracts only Step 0 (base branch detection, ~50 lines) which is all the test actually needs. Follows the CLAUDE.md rule: "NEVER copy a full SKILL.md file into an E2E test fixture." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: update GBrain and Hermes host configs for v0.10.0 integration GBrain: add 'triggers' to keepFields so generated skills pass checkResolvable() validation. Add version compat comment. Hermes: un-suppress GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS. The resolvers handle GBrain-not-installed gracefully, so Hermes agents with GBrain as a mod get brain features automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GBrain resolver DX improvements and preamble health check Resolver changes: - gbrain query → gbrain search (fast keyword search, not expensive hybrid) - Add keyword extraction guidance for agents - Show explicit gbrain put_page syntax with --title, --tags, heredoc - Add entity enrichment with false-positive filter - Name throttle error patterns (exit code 1, stderr keywords) - Add data-research routing for investigate skill - Expand skillSaveMap from 4 to 8 entries - Add brain operation telemetry summary Preamble changes: - Add gbrain doctor --fast --json health check for gbrain/hermes hosts - Parse check failures/warnings count - Show failing check details when score < 50 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: preserve keepFields in allowlist frontmatter mode The allowlist mode hard-coded name + description reconstruction but never iterated keepFields for additional fields. Adding 'triggers' to keepFields was a no-op because the field was silently stripped. Now iterates keepFields and preserves any field beyond name/description from the source template frontmatter, including YAML arrays. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add triggers to all 38 skill templates Multi-word, skill-specific trigger keywords for GBrain's RESOLVER.md router. Each skill gets 3-6 triggers derived from its "Use when asked to..." description text. Avoids single generic words that would collide across skills (e.g., "debug this" not "debug"). These are distinct from voice-triggers (speech-to-text aliases) and serve GBrain's checkResolvable() validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files and update golden fixtures Regenerated from updated templates (triggers, brain placeholders, resolver DX improvements, preamble health check). Golden fixtures updated to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: settings-hook remove exits 1 when nothing to remove gstack-settings-hook remove was exiting 0 when settings.json didn't exist, causing gstack-uninstall to report "SessionStart hook" as removed on clean systems where nothing was installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for GBrain v0.10.0 integration ARCHITECTURE.md: added GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS to resolver table. CHANGELOG.md: expanded v0.18.0.0 entry with GBrain v0.10.0 integration details (triggers, expanded brain-awareness, DX improvements, Hermes brain support), updated date. CLAUDE.md: added gbrain to resolvers/ directory comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: routing E2E stops writing to user's ~/.claude/skills/ installSkills() was copying SKILL.md files to both project-level (.claude/skills/ in tmpDir) and user-level (~/.claude/skills/). Writing to the user's real install fails when symlinks point to different worktrees or dangling targets (ENOENT on copyFileSync). Now installs to project-level only. The test already sets cwd to the tmpDir, so project-level discovery works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: scale Gemini E2E back to smoke test Gemini CLI gets lost in worktrees on complex tasks (review times out at 600s, discover-skill hits exit 124). Nobody uses Gemini for gstack skill execution. Replace the two failing tests (gemini-discover-skill and gemini-review-findings) with a single smoke test that verifies Gemini can start and read the README. 90s timeout, no skill invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
2300067267 |
feat: UX behavioral foundations + ux-audit command (v0.17.0.0) (#1000)
* feat: UX behavioral foundations — Krug's usability principles as shared design infrastructure Add UX_PRINCIPLES resolver distilling Steve Krug's "Don't Make Me Think" into actionable guidance for AI agents. Injected into all 4 design skills as a shared behavioral foundation complementing the existing visual checklist (WHAT to check) and cognitive patterns (HOW designers see) with HOW USERS ACTUALLY BEHAVE. Methodology rewire: 6 Krug usability tests woven into existing design-review phases — Trunk Test, 3-Second Scan, Page Area Test, Happy Talk Detection with word count metric, Mindless Choice Audit, Goodwill Reservoir tracking with visual dashboard. First-person narration mode for design-review output with anti-slop guardrail. Hard rules: 4 Krug always/never rules in DESIGN_HARD_RULES (placeholder-as-label, floating headings, visited link distinction, minimum type size). Krug, Redish, Jarrett added to plan-design-review references. Token ceiling: gen-skill-docs.ts warns if any SKILL.md exceeds 100KB (~25K tokens). Documented in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B ux-audit command + snapshot --heatmap flag New browse meta-command: ux-audit extracts page structure (site ID, navigation, headings, interactive elements, text blocks) as structured JSON for agent-side UX behavioral analysis. Pure data extraction — the agent applies the 6 usability tests and makes judgment calls. Element caps: 50 headings, 100 links, 200 interactive, 50 text blocks. New snapshot flag: -H/--heatmap accepts a JSON color map mapping ref IDs to colors (green/yellow/red/blue/orange/gray). Extends existing snapshot -a annotation system with per-ref colors instead of hardcoded red. Color whitelist validation prevents CSS injection. Composable — any skill can use it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.17.0.0 ARCHITECTURE.md: added {{UX_PRINCIPLES}} resolver to placeholder table. VERSION: bumped to 0.17.0.0 for UX behavioral foundations release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.17.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adversarial review fixes for ux-audit and heatmap Security: - Remove live form value extraction from ux-audit (leaked input field values) - Add ux-audit to PAGE_CONTENT_COMMANDS (untrusted content wrapping) Correctness: - Scope youAreHere selector to nav containers (was matching animation classes) - Validate heatmap JSON is a plain object (string/array/null produced garbage) - Use textContent instead of innerText for word count (avoids layout computation) - Remove dead url variable and unused LINK_CAP constant Found by Codex + Claude adversarial review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7e96fe299b |
fix: security wave 3 — 12 fixes, 7 contributors (v0.16.4.0) (#988)
* fix(security): validateOutputPath symlink bypass — check file-level symlinks validateOutputPath() previously only resolved symlinks on the parent directory. A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is /tmp, which is safe) but the write followed the symlink outside safe dirs. Add lstatSync() check: if the target file exists and is a symlink, resolve through it and verify the real target is within SAFE_DIRECTORIES. ENOENT (file doesn't exist yet) falls through to the existing parent-dir check. Closes #921 Co-Authored-By: Yunsu <Hybirdss@users.noreply.github.com> * fix(security): shell injection in bin/ scripts — use env vars instead of interpolation gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e double-quoted blocks. A path containing quotes or backticks breaks the JS string context, enabling arbitrary code execution. Replace direct interpolation with environment variables (process.env). Same fix applied to gstack-team-init which had the same pattern. Systematic audit confirmed only these two scripts were vulnerable — all other bin/ scripts already use stdin piping or env vars. Closes #858 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): cookie-import path validation bypass + hardcoded /tmp Two fixes: 1. cookie-import relative path bypass (#707): path.isAbsolute() gated the entire validation, so relative paths like "sensitive-file.json" bypassed the safe-directory check entirely. Now always resolves to absolute path with realpathSync for symlink resolution, matching validateOutputPath(). 2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used /tmp directly instead of os.tmpdir(), breaking Windows support. Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in write-commands.ts (previously resolved implicitly through bundler). Closes #852 Co-Authored-By: Toby Morning <urbantech@users.noreply.github.com> * fix(security): redact form fields with sensitive names, not just type=password Form redaction only applied to type="password" fields. Hidden and text fields named csrf_token, api_key, session_id, etc. were exposed unredacted in LLM context, leaking secrets. Extend redaction to check field name and id against sensitive patterns: token, secret, key, password, credential, auth, jwt, session, csrf, sid, api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME. Closes #860 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): restrict session file permissions to owner-only Design session files written to /tmp with default umask (0644) were world-readable on shared systems. Sessions contain design prompts and feedback history. Set mode 0o600 (owner read/write only) on both create and update paths. Closes #859 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): enforce frozen lockfile during setup bun install without --frozen-lockfile resolves ^semver ranges from npm on every run. If an attacker publishes a compromised compatible version of any dependency, the next ./setup pulls it silently. Add --frozen-lockfile with fallback to plain install (for fresh clones where bun.lock may not exist yet). Matches the pattern already used in the .agents/ generation block (line 237). Closes #614 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci chmod -R 1777 /tmp recursively sets sticky bit on files (no defined behavior), not just the directory. Deduplicate to single chmod 1777 /tmp. Closes #747 Co-Authored-By: Maksim Soltan <Gonzih@users.noreply.github.com> * fix(security): learnings input validation + cross-project trust gate Three fixes to the learnings system: 1. Input validation in gstack-learnings-log: type must be from allowed list, key must be alphanumeric, confidence must be 1-10 integer, source must be from allowed list. Prevents injection via malformed fields. 2. Prompt injection defense: insight field checked against 10 instruction-like patterns (ignore previous, system:, override, etc.). Rejected with clear error message. 3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings from other projects are filtered out. Only user-stated learnings cross project boundaries. Prevents silent prompt injection across codebases. Also adds trusted field (true for user-stated source, false for AI-generated) to enable the trust gate at read time. Closes #841 Co-Authored-By: Ziad Al Sharif <Ziadstr@users.noreply.github.com> * feat(security): track cookie-imported domains and scope cookie imports Foundation for origin-pinned JS execution (#616). Tracks which domains cookies were imported from so the JS/eval commands can verify execution stays within imported origins. Changes: - BrowserManager: new cookieImportedDomains Set with track/get/has methods - cookie-import: tracks imported cookie domains after addCookies - cookie-import-browser: tracks domains on --domain direct import - cookie-import-browser --all: new explicit opt-in for all-domain import (previously implicit behavior, now requires deliberate flag) Closes #615 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): pin JS/eval execution to cookie-imported origins When cookies have been imported for specific domains, block JS execution on pages whose origin doesn't match. Prevents the attack chain: 1. Agent imports cookies for github.com 2. Prompt injection navigates to attacker.com 3. Agent runs js document.cookie → exfiltrates github cookies assertJsOriginAllowed() checks the current page hostname against imported cookie domains with subdomain matching (.github.com allows api.github.com). When no cookies are imported, all origins allowed (nothing to protect). about:blank and data: URIs are allowed (no cookies at risk). Depends on #615 (cookie domain tracking). Closes #616 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): add persistent command audit log Append-only JSONL audit trail for all browse server commands. Unlike in-memory ring buffers, the audit log persists across restarts and is never truncated. Each entry records: timestamp, command, args (truncated to 200 chars), page origin, duration, status, error (truncated to 300 chars), hasCookies flag, connection mode. All writes are best-effort — audit failures never block command execution. Log stored at ~/.gstack/.browse/browse-audit.jsonl. Closes #617 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe (hex form), which was not in the blocklist. Similarly, ::169.254.169.254 normalizes to ::a9fe:a9fe. Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught by the direct hostname check in validateNavigationUrl. Closes #739 Co-Authored-By: Osman Mehmood <mehmoodosman@users.noreply.github.com> * chore: bump version and changelog (v0.16.4.0) Security wave 3: 12 fixes, 7 contributors. Cookie origin pinning, command audit log, domain tracking. Symlink bypass, path validation, shell injection, form redaction, learnings injection, IPv6 SSRF, session permissions, frozen lockfile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Yunsu <Hybirdss@users.noreply.github.com> Co-authored-by: Gus <garagon@users.noreply.github.com> Co-authored-by: Toby Morning <urbantech@users.noreply.github.com> Co-authored-by: Alberto Martinez <halbert04@users.noreply.github.com> Co-authored-by: Maksim Soltan <Gonzih@users.noreply.github.com> Co-authored-by: Ziad Al Sharif <Ziadstr@users.noreply.github.com> Co-authored-by: Osman Mehmood <mehmoodosman@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
c6e6a21d1a |
refactor: AI slop reduction with cross-model quality review (v0.16.3.0) (#941)
* refactor: add error-handling utility module with selective catches safeUnlink (ignores ENOENT), safeKill (ignores ESRCH), isProcessAlive (extracted from cli.ts with Windows support), and json() Response helper. All catches check err.code and rethrow unexpected errors instead of swallowing silently. Unit tests cover happy path + error code paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace defensive try/catches in server.ts with utilities Replace ~12 try/catch sites with safeUnlink/safeKill calls in shutdown, emergencyCleanup, killAgent, and log cleanup. Convert empty catches to selective catches with error code checks. Remove needless welcome page try/catches (fs.existsSync doesn't need wrapping). Reduces slop-scan empty-catch locations from 11 to 8 and error-swallowing from 24 to 18. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: extract isProcessAlive and replace try/catches in cli.ts Move isProcessAlive to shared error-handling module. Replace ~20 try/catch sites with safeUnlink/safeKill in killServer, connect, disconnect, and cleanup flows. Convert empty catches to selective catches. Reduces slop-scan empty-catch from 22 to 2 locations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: remove unnecessary return await in content-security and read-commands Remove 6 redundant return-await patterns where there's no enclosing try block. Eliminates all defensive.async-noise findings from these files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add slop-scan config to exclude vendor files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace empty catches with selective error handling in sidebar-agent Convert 8 empty catch blocks to selective catches that check err.code (ESRCH for process kills, ENOENT for file ops). Import safeUnlink for cancel file cleanup. Unexpected errors now propagate instead of being silently swallowed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace empty catches and mark pass-through wrappers in browser-manager Convert 12 empty catch blocks to selective catches: filesystem ops check ENOENT/EACCES, browser ops check for closed/Target messages, URL parsing checks TypeError. Add 'alias for active session' comments above 6 pass-through wrapper methods to document their purpose (and exempt from slop-scan pass-through-wrappers rule). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: selective catches in gstack-global-discover Convert 8 defensive catch blocks to selective error handling. Filesystem ops check ENOENT/EACCES, process ops check exit status. Unexpected errors now propagate instead of returning silent defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: selective catches in write-commands, cdp-inspector, meta-commands, snapshot Convert ~27 empty/obscuring catches to selective error handling across 4 browse source files. CDP ops check for closed/Target/detached messages, DOM ops check TypeError/DOMException, filesystem ops check ENOENT/EACCES, JSON parsing checks SyntaxError. Remove dead code in cdp-inspector where try/catch wrapped synchronous no-ops. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: selective catches in Chrome extension files Convert empty catches and error-swallowing patterns across inspector.js, content.js, background.js, and sidepanel.js. DOM catches filter TypeError/DOMException, chrome API catches filter Extension context invalidated, network catches filter Failed to fetch. Unexpected errors now propagate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restore isProcessAlive boolean semantics, add safeUnlinkQuiet, remove unused json() isProcessAlive now catches ALL errors and returns false (pure boolean probe). Callers use it in if/while conditions without try/catch, so throwing on EPERM was a behavior change that could crash the CLI. Windows path gets its safety catch restored. safeUnlinkQuiet added for best-effort cleanup paths where throwing on non-ENOENT errors (like EPERM during shutdown) would abort cleanup. json() removed — dead code, never imported anywhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use safeUnlinkQuiet in shutdown and cleanup paths Shutdown, emergency cleanup, and disconnect paths should never throw on file deletion failures. Switched from safeUnlink (throws on EPERM) to safeUnlinkQuiet (swallows all errors) in these best-effort paths. Normal operation paths (startup, lock release) keep safeUnlink. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove brittle string-matching catches and alias comments in browser-manager Revert 6 catches that matched error messages via includes('closed'), includes('Target'), etc. back to empty catches. These fire-and-forget operations (page.close, bringToFront, dialog dismiss) genuinely don't care about any error type. String matching on error messages is brittle and will break on Playwright version bumps. Remove 6 'alias for active session' comments that existed solely to game slop-scan's pass-through-wrapper exemption rule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove brittle string-matching catches in extension files Revert error-swallowing fixes in background.js and sidepanel.js that matched error messages via includes('Failed to fetch'), includes( 'Extension context invalidated'), etc. In Chrome extensions, uncaught errors crash the entire extension. The original catch-and-log pattern is the correct choice for extension code where any error is non-fatal. content.js and inspector.js changes kept — their TypeError/DOMException catches are typed, not string-based. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add slop-scan usage guidelines to CLAUDE.md Instructions for using slop-scan to improve genuine code quality, not to game metrics or hide that we're AI-coded. Documents what to fix (empty catches on file/process ops, typed exception narrows, return await) and what NOT to fix (string-matching on error messages, linter gaming comments, tightening extension/cleanup catches). Includes utility function reference and baseline score tracking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add slop-scan as diagnostic in test suite Runs slop-scan after bun test as a non-blocking diagnostic. Prints the summary (top files, hotspots) so you see the number without it gating anything. Available standalone via bun run slop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: slop-diff shows only NEW findings introduced on this branch Runs slop-scan on HEAD and the merge-base, diffs results with line-number-insensitive fingerprinting so shifted code doesn't create false positives. Uses git worktree for clean base comparison. Shows net new vs removed findings. Runs automatically after bun test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: design doc for slop-scan integration in /review and /ship Deferred plan for surfacing slop-diff findings automatically during code review and shipping. Documents integration points, auto-fix vs skip heuristics, and implementation notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.3.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
dbd7aee5b6 |
feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937)
* fix: sync package.json version with VERSION file package.json was 0.15.15.0 while VERSION was 0.15.16.0, causing gen-skill-docs freshness check test failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add builder profile helper for office-hours relationship closing New bin/gstack-builder-profile reads ~/.gstack/builder-profile.jsonl and outputs structured summary (tier, signals, resources, topics). Single source of truth for all closing state — no separate config keys or logs. Uses bun-based JSONL parsing pattern from gstack-learnings-search. Graceful fallback to introduction tier if bun unavailable or file missing. 26 unit tests covering tier computation, signal accumulation, cross-project detection, nudge eligibility, resource dedup, and malformed JSONL handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: relationship closing — office-hours adapts to repeat users The office-hours closing now deepens over time instead of repeating the same YC plea every session. Four tiers based on session count: - Introduction (session 1): full YC plea + founder resources - Welcome Back (sessions 2-3): lead with recognition, skip plea - Regular (sessions 4-7): arc-level callbacks, signal visibility, builder-to-founder nudge, auto-generated journey summary - Inner Circle (sessions 8+): the data speaks Key design decisions (from CEO + Eng + Codex + DX reviews): - Single source of truth: one builder-profile.jsonl, no split-brain state - Lead with recognition on repeat visits (DX: magical moment hits immediately) - Narrative arc journey summary, not data tables - Tone examples per tier to prevent generic AI voice - Global resource dedup (low-sensitivity video watch history) - Migration merges per-project resource logs into builder profile Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.2.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a7593d70ef |
fix: cookie picker auth token leak (v0.15.17.0) (#904)
* fix: cookie picker auth token leak (CVE — CVSS 7.8) GET /cookie-picker served HTML that inlined the master bearer token without authentication. Any local process could extract it and use it to call /command, executing arbitrary JS in the browser context. Fix: Jupyter-style one-time code exchange. The picker URL now includes a one-time code that is consumed via 302 redirect, setting an HttpOnly session cookie. The master AUTH_TOKEN never appears in HTML. The session cookie is isolated from the scoped token system (not valid for /command). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.15.17.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: browse-snapshot E2E turn budget too tight (7 → 9) The agent consistently uses 8 turns for 5 snapshot commands because it reads the saved annotated PNG to verify it was created. All 3 CI attempts hit error_max_turns at exactly 8. Bumping to 9 gives headroom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
b73f364411 |
feat: browser data platform for AI agents (v0.16.0.0) (#907)
* refactor: extract path-security.ts shared module validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract to a single shared module with re-exports for backward compatibility. Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR only, not cwd, to prevent remote agents from reading project files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: default paired agents to full access, split SCOPE_CONTROL The trust boundary for paired agents is the pairing ceremony itself, not the scope. An agent with write scope can already click anything and navigate anywhere. Gating js/cookies behind --admin was security theater. Changes: - Default pair scopes: read+write+admin+meta (was read+write) - New SCOPE_CONTROL for browser-wide destructive ops (stop, restart, disconnect, state, handoff, resume, connect) - --admin flag now grants control scope (backward compat) - New --restrict flag for limited access (e.g., --restrict read) - Updated hint text: "re-pair with --control" instead of "--admin" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add media and data commands for page content extraction media command: discovers all img/video/audio/background-image elements on the page. Returns JSON with URLs, dimensions, srcset, loading state, HLS/DASH detection. Supports --images/--videos/--audio filters and optional CSS selector scoping. data command: extracts structured data embedded in pages (JSON-LD, Open Graph, Twitter Cards, meta tags). One command returns product prices, article metadata, social share info without DOM scraping. Both are READ scope with untrusted content wrapping. Shared media-extract.ts helper for reuse by the upcoming scrape command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add download, scrape, and archive commands download: fetch any URL or @ref element to disk using browser session cookies via page.request.fetch(). Supports blob: URLs via in-page base64 conversion. --base64 flag returns inline data URI (cap 10MB). Detects HLS/DASH and rejects with yt-dlp hint. scrape: bulk media download composing media discovery + download loop. Sequential with 100ms delay, URL deduplication, configurable --limit. Writes manifest.json with per-file metadata for machine consumption. archive: saves complete page as MHTML via CDP Page.captureSnapshot. No silent fallback -- errors clearly if CDP unavailable. All three are WRITE scope (write to disk, blocked in watch mode). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add GET /file endpoint for remote agent file retrieval Remote paired agents can now retrieve downloaded files over HTTP. TEMP_DIR only (not cwd) to prevent project file exfiltration. - Bearer token auth (root or scoped with read scope) - Path validation via validateTempPath() (symlink-aware) - 200MB size cap - Extension-based MIME detection - Zero-copy streaming via Bun.file() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scroll --times N for automated repeated scrolling Extends the scroll command with --times N flag for infinite feed scraping. Scrolls N times with configurable --wait delay (default 1000ms) between each scroll for content loading. Usage: scroll --times 10 scroll --times 5 --wait 2000 scroll --times 3 .feed-container Composable with scrape: scroll to load content, then scrape images. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add network response body capture (--capture/--export/--bodies) The killer feature for social media scraping. Extends the existing network command to intercept API response bodies: network --capture [--filter graphql] # start capturing network --capture stop # stop network --export /tmp/api.jsonl # export as JSONL network --bodies # show summary Uses page.on('response') listener with URL pattern filtering. SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts oldest entries when full. Binary responses stored as base64, text as-is. This lets agents tap Instagram's GraphQL API, TikTok's hydration data, and any SPA's internal API responses instead of fragile DOM scraping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add screenshot --base64 for inline image return Returns data:image/png;base64,... instead of writing to disk. Cap at 10MB. Works with all screenshot modes (element, clip, viewport). Eliminates the two-step screenshot+file-serve dance for remote agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add data platform tests and media fixture Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath (TEMP_DIR only, rejects cwd), command registration (all new commands in correct scope sets), and MIME mapping source checks. Rich HTML fixture with: standard images, lazy-loaded images, srcset, video with sources + HLS, audio, CSS background-images, JSON-LD, Open Graph, Twitter Cards, and meta tags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: regenerate SKILL.md with Extraction category Add Extraction category to browse command table ordering. Regenerate SKILL.md files to include media, data, download, scrape, archive commands in the generated documentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.0.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
1868636f49 |
refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873)
* plan: batch command endpoint + multi-tab parallel execution for GStack Browser * refactor: extract TabSession from BrowserManager for per-tab state Move per-tab state (refMap, lastSnapshot, frame) into a new TabSession class. BrowserManager delegates to the active TabSession via getActiveSession(). Zero behavior change — all existing tests pass. This is the foundation for the /batch endpoint: both /command and /batch will use the same handler functions with TabSession, eliminating shared state races during parallel tab execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: update handler signatures to use TabSession Change handleReadCommand and handleSnapshot to take TabSession instead of BrowserManager. Change handleWriteCommand to take both TabSession (per-tab ops) and BrowserManager (global ops like viewport, headers, dialog). handleMetaCommand keeps BrowserManager for tab management. Tests use thin wrapper functions that bridge the old 3-arg call pattern to the new signatures via bm.getActiveSession(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for parallel multi-tab execution Execute multiple commands across tabs in a single HTTP request. Commands targeting different tabs run concurrently via Promise.allSettled. Commands targeting the same tab run sequentially within that group. Features: - Batch-safe command subset (text, goto, click, snapshot, screenshot, etc.) - newtab/closetab as special commands within batch - SSE streaming mode (stream: true) for partial results - Per-command error isolation (one tab failing doesn't abort the batch) - Max 50 commands per batch, soft batch-level timeout A 143-page crawl drops from ~45 min (serial HTTP) to ~5 min (20 tabs in parallel, batched commands). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add batch endpoint integration tests 10 tests covering: - Multi-tab parallel execution (goto + text on different tabs) - Same-tab sequential ordering - Per-command error isolation (one tab fails, others succeed) - Page-scoped refs (snapshot refs are per-session, not global) - Per-tab lastSnapshot (snapshot -D with independent baselines) - getSession/getActiveSession API - Batch-safe command subset validation - closeTab via page.close preserves at-least-one-page invariant - Parallel goto on 3 tabs simultaneously Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden codex-review E2E — extract SKILL.md section, bump maxTurns to 25 The test was copying the full 55KB/1075-line codex SKILL.md into the fixture, requiring 8 Read calls just to consume it and exhausting the 15-turn budget before reaching the actual codex review command. Now extracts only the review-relevant section (~6KB/148 lines), reducing Read calls from 8 to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move batch endpoint plan into BROWSER.md as feature documentation The batch endpoint is implemented — document it as an actual feature in BROWSER.md (architecture, API shape, design decisions, usage pattern) and remove the standalone plan file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.16.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
6cc094cd41 |
fix: pair-agent tunnel drops after 15s (v0.15.15.1) (#868)
* fix: remove stray `domains` reference crashing connect command The connect command's status fetch had an undefined `domains` variable in the JSON body, causing "Connect failed: domains is not defined" and preventing headed mode from initializing properly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent server dies 15s after CLI exits The server monitors BROWSE_PARENT_PID and self-terminates when the parent exits. For pair-agent, the connect subprocess is the parent, so the server dies 15s after connect finishes. Disable parent-PID monitoring for pair-agent sessions so the server outlives the CLI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.15.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: newtab blocked by tab ownership check for scoped tokens The tab ownership check ran before the newtab handler, checking the active tab (owned by root) against the scoped token. Since the scoped token doesn't own the root tab, newtab returned 403. Skip the ownership check for newtab since it creates a new tab. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: regression tests for pair-agent tunnel fixes Three source-level tests covering the bugs fixed on this branch: - connect status fetch has no undefined variable references (domains) - pair-agent disables parent PID monitoring (BROWSE_PARENT_PID=0) - newtab excluded from tab ownership check for scoped tokens Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
03973c2fab |
fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847)
* fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance. Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). - IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com> |
||
|
|
b3d064aabb |
fix: gstack-team-init detects and removes vendored copies (#848)
* fix: gstack-team-init detects and removes vendored copies in team mode When running gstack-team-init inside a repo with a vendored .claude/skills/gstack/, the script now auto-detects and removes it: git rm --cached, add to .gitignore, rm -rf. Also adds team_mode config key to setup --team/--no-team, and makes gstack-upgrade Step 4.5 team-mode aware (remove instead of sync). Includes 5 new integration tests for the vendored copy migration. * chore: bump version and changelog (v0.15.14.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
dae251e066 |
feat: team-friendly gstack install mode (v0.15.7.0) (#809)
* feat: add gstack-settings-hook for atomic Claude Code hook management DRY helper for adding/removing SessionStart hooks in ~/.claude/settings.json. Handles missing files, deduplication, malformed JSON, and atomic writes (.tmp + rename) to prevent corruption on crash or disk-full. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gstack-session-update for automatic team updates SessionStart hook target that auto-updates gstack at session start. Background fork (zero latency), throttled to once/hour, with lockfile (mkdir + PID), stale lock recovery, GIT_TERMINAL_PROMPT=0, and debug logging to ~/.gstack/analytics/session-update.log. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --team, --no-team, -q flags to setup --team enables auto_upgrade and registers SessionStart hook via gstack-settings-hook. --no-team reverses it. -q/--quiet suppresses all informational output (for hook-triggered setup runs). --local now prints a deprecation warning. Replaces ~20 echo calls with log() helper for quiet mode support. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gstack-team-init for repo-level team bootstrapping Two modes: 'optional' (gentle CLAUDE.md suggestion) and 'required' (CLAUDE.md enforcement + .claude/hooks/check-gstack.sh PreToolUse hook that blocks work without gstack installed). Atomic JSON writes, idempotent, prints git add instructions. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: deprecate vendoring, document team mode, clean up uninstall - README: replace "Step 2: Add to your repo" vendoring instructions with team mode (./setup --team + gstack-team-init) - CLAUDE.md: rename "Vendored symlink awareness" to "Dev symlink awareness", add deprecation note - CONTRIBUTING.md: remove vendoring language from prefix section - bin/gstack-uninstall: clean up SessionStart hook on uninstall Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add vendoring deprecation detection to skill preamble Detects vendored gstack in CWD (.claude/skills/gstack/ that's not a symlink and has VERSION or .git). Outputs VENDORED_GSTACK: yes/no. Adds generateVendoringDeprecation() section that offers one-time migration to team mode via AskUserQuestion. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files with vendoring deprecation preamble Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: team mode (v0.15.7.0) — credit Jared Friedman Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add integration tests for team mode (20 tests) Covers gstack-settings-hook (add, remove, dedup, preserve existing, atomic write), gstack-session-update (guards, throttle, non-fatal), gstack-team-init (optional, required, enforcement hook, idempotent), and setup flags (-q, --local deprecation). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a94a64f821 |
fix: snapshot -i auto-detects dropdown/popover interactive elements (#845)
* fix: snapshot -i auto-detects dropdown/popover interactive elements - Auto-enable cursor-interactive scan (-C) when -i flag is used - Add floating container detection (portals, popovers, dropdowns) - Detects position:fixed/absolute with high z-index - Recognizes data-floating-ui-portal, data-radix-* attributes - Recognizes role=listbox, role=menu containers - Elements inside floating containers bypass the hasRole skip - Catches dropdown items missed by the accessibility tree - Role=option/menuitem elements in floating containers captured even without cursor:pointer/onclick - Tag floating container items with 'popover-child' reason - Include role name in @c ref reasons when present - Add dropdown.html test fixture - Add dropdown/popover detection test suite (6 tests) - Add test: -i alone includes cursor-interactive elements Fixes: Bookface autocomplete, Radix UI combobox, React portals, and similar dynamic dropdown patterns where ariaSnapshot() misses the floating content. * chore: bump version and changelog (v0.15.12.0) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: update snapshot -i/-C flag descriptions to mention auto-enable behavior * test: strengthen clickability test guard assertions The @c ref clickability test previously used if-guards that would silently pass when no Alice line was found in the snapshot output. Both Claude and Codex adversarial review flagged this as a test that could regress without CI noticing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: regenerate top-level SKILL.md with updated flag descriptions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: root <root@localhost> Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
422f172fbb |
feat: ship re-run executes all verification checks (v0.15.10.0) (#833)
* feat: review army idempotency + cross-review dedup resolver Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ship re-run executes all checks, adds review army + dedup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: regression guards for ship specialist dispatch + dedup + idempotency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.10.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
b3cd3fd68b |
feat: native OpenClaw skills + ClaHub publishing (v0.15.10.0) (#832)
* feat: add 4 native OpenClaw skills for ClaHub publishing Hand-crafted methodology skills for the OpenClaw wintermute workspace: - gstack-openclaw-office-hours (375 lines) — 6 forcing questions, startup + builder modes - gstack-openclaw-ceo-review (193 lines) — 4 scope modes, 18 cognitive patterns - gstack-openclaw-investigate (136 lines) — Iron Law, 4-phase debugging - gstack-openclaw-retro (301 lines) — git analytics, per-person praise/growth Pure methodology, no gstack infrastructure. All frontmatter uses single-line inline JSON for OpenClaw parser compatibility. * feat: add AGENTS.md dispatch section with behavioral rules Ready-to-paste section for OpenClaw AGENTS.md with 3 iron-clad rules: 1. Always spawn sessions, never redirect user to Claude Code 2. Resolve repo path or ask, don't punt 3. Autoplan runs end-to-end, reports back in chat Includes full dispatch routing (Simple/Medium/Heavy/Full/Plan tiers). * chore: clear OpenClaw includeSkills — native skills replace generated Native ClaHub skills replace the gen-skill-docs pipeline output for these 4 skills. Updated test to validate empty includeSkills array. * docs: ClaHub install instructions + dispatch routing rules - README: add Native OpenClaw Skills section with clawhub install command - OPENCLAW.md: update dispatch routing with behavioral rules, update native skills section to reference ClaHub * chore: bump version and changelog (v0.15.10.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add gstack-upgrade to OpenClaw dispatch routing Ensures "upgrade gstack" routes to a Claude Code session with /gstack-upgrade instead of Wintermute trying to handle it conversationally. * fix: stop tracking 58MB compiled binary bin/gstack-global-discover Already in .gitignore but was tracked due to historical mistake. Same issue as browse/dist/ and design/dist/. The .ts source is right next to it and ./setup builds from source for every platform. * test: detect compiled binaries and large files tracked by git Two new tests in skill-validation: - No Mach-O or ELF binaries tracked (catches accidental git add of compiled output) - No files over 2MB tracked (catches bloated binaries sneaking in) Both print the exact git rm --cached command to fix the issue. * fix: ClaHub → ClawHub (correct spelling) * docs: add ClawHub publishing instructions to CLAUDE.md Documents the clawhub publish command (not clawhub skill publish), auth flow, version bumping, and verification. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
e2d005c7f4 |
feat: OpenClaw integration v2 — prompt is the bridge (v0.15.9.0) (#816)
* feat: add includeSkills to HostConfig + update OpenClaw config Add includeSkills allowlist field with union logic (include minus skip). Update OpenClaw to generate only 4 native methodology skills (office-hours, plan-ceo-review, investigate, retro). Remove staticFiles.SOUL.md reference (pointed to non-existent file). * feat: OpenClaw integration — gstack-lite/full generation + spawned session detection Add includeSkills filter to gen-skill-docs pipeline. Generate gstack-lite (planning discipline for spawned coding sessions) and gstack-full (complete feature pipeline) for OpenClaw host. Add OPENCLAW_SESSION env var detection in preamble for spawned session auto-detect. Update setup --host openclaw to print redirect message. * docs: OpenClaw architecture doc + regenerate all SKILL.md with spawned session detection Add docs/OPENCLAW.md with 4-tier dispatch routing and integration architecture. Generate gstack-lite and gstack-full prompt templates. Regenerate all SKILL.md files with OPENCLAW_SESSION env var check in preamble. * test: update golden baselines + OpenClaw includeSkills tests Update golden SKILL.md baselines for preamble SPAWNED_SESSION change. Replace staticFiles SOUL.md test with includeSkills validation. * chore: bump version and changelog (v0.15.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove all Wintermute references from source files Replace with generic "orchestrator" or "OpenClaw" as appropriate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Plan dispatch tier — full review gauntlet for Claude Code project planning New gstack-plan template chains /office-hours → /autoplan (CEO + eng + design + DX + codex adversarial), saves the reviewed plan, and reports back to the orchestrator. The orchestrator persists the plan link to its own memory store. 5 tiers now: Simple, Medium, Heavy, Full, Plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
2b08cfe71e |
fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817)
* fix: friendly OpenAI org error on all design commands Previously only generate.ts showed a user-friendly message when the OpenAI org wasn't verified. Now evolve, iterate, variants, and check all detect the 403 + "organization must be verified" pattern and show a clear message with the correct verification URL. * test: regression test for >128KB Codex session_meta Documents the current 128KB buffer limitation. When Codex embeds session_meta beyond 128KB, this test will fail, signaling the need for a streaming parse or larger buffer. * chore: bump version and changelog (v0.15.8.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
9ca8f1d7a9 |
feat: adaptive gating + cross-review dedup for review army (v0.15.2.0) (#760)
* feat: add test_stub optional field to specialist finding schema All specialist prompts now document test_stub as an optional output field, enabling specialists to suggest test code alongside findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: adaptive gating + test framework detection for review army Adds gstack-specialist-stats binary for tracking specialist hit rates. Resolver now detects test framework for test_stub generation, applies adaptive gating to skip silent specialists, and compiles per-specialist stats for the review-log entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cross-review finding dedup + test stub override + enriched review-log Step 5.0 suppresses findings previously skipped by the user when the relevant code hasn't changed. Test stub findings force ASK classification so users approve test creation. Review-log now includes quality_score, per-specialist stats, and per-finding action records. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.2.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: bash operator precedence in test framework detection [ -f a ] || [ -f b ] && X="y" evaluates as A || (B && C), so the assignment only runs when the second test passes. Wrap the OR group in braces: { [ -f a ] || [ -f b ]; } && X="y". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
115d81d792 |
fix: security wave 1 — 14 fixes for audit #783 (v0.15.7.0) (#810)
* fix: DNS rebinding protection checks AAAA (IPv6) records too Cherry-pick PR #744 by @Gonzih. Closes the IPv6-only DNS rebinding gap by checking both A and AAAA records independently. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validateOutputPath symlink bypass — resolve real path before safe-dir check Cherry-pick PR #745 by @Gonzih. Adds a second pass using fs.realpathSync() to resolve symlinks after lexical path validation. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validate saved URLs before navigation in restoreState Cherry-pick PR #751 by @Gonzih. Prevents navigation to cloud metadata endpoints or file:// URIs embedded in user-writable state files. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: telemetry-ingest uses anon key instead of service role key Cherry-pick PR #750 by @Gonzih. The service role key bypasses RLS and grants unrestricted database access — anon key + RLS is the right model for a public telemetry endpoint. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: killAgent() actually kills the sidebar claude subprocess Cherry-pick PR #743 by @Gonzih. Implements cross-process kill signaling via kill-file + polling pattern, tracks active processes per-tab. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(design): bind server to localhost and validate reload paths Cherry-pick PR #803 by @garagon. Adds hostname: '127.0.0.1' to Bun.serve() and validates /api/reload paths are within cwd() or tmpdir(). Closes C1+C2 from security audit #783. Co-Authored-By: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add auth gate to /inspector/events SSE endpoint (C3) The /inspector/events endpoint had no authentication, unlike /activity/stream which validates tokens. Now requires the same Bearer header or ?token= query param check. Closes C3 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sanitize design feedback with trust boundary markers (C4+H5) Wrap user feedback in <user-feedback> XML markers with tag escaping to prevent prompt injection via malicious feedback text. Cap accumulated feedback to last 5 iterations to limit incremental poisoning. Closes C4 and H5 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden file/directory permissions to owner-only (C5+H9+M9+M10) Add mode 0o700 to all mkdirSync calls for state/session directories. Add mode 0o600 to all writeFileSync calls for session.json, chat.jsonl, and log files. Add umask 077 to setup script. Prevents auth tokens, chat history, and browser logs from being world-readable on multi-user systems. Closes C5, H9, M9, M10 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TOCTOU race in setup symlink creation (C6) Remove the existence check before mkdir -p (it's idempotent) and validate the target isn't already a symlink before creating the link. Prevents a local attacker from racing between the check and mkdir to redirect SKILL.md writes. Closes C6 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove CORS wildcard, restrict to localhost (H1) Replace Access-Control-Allow-Origin: * with http://127.0.0.1 on sidebar tab/chat endpoints. The Chrome extension uses manifest host_permissions to bypass CORS entirely, so this only blocks malicious websites from making cross-origin requests. Closes H1 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: make cookie picker auth mandatory (H2) Remove the conditional if(authToken) guard that skipped auth when authToken was undefined. Now all cookie picker data/action routes reject unauthenticated requests. Closes H2 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gate /health token on chrome-extension Origin header Only return the auth token in /health response when the request Origin starts with chrome-extension://. The Chrome extension always sends this origin via manifest host_permissions. Regular HTTP requests (including tunneled ones from ngrok/SSH) won't get the token. The extension also has a fallback path through background.js that reads the token from the state file directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: update server-auth test for chrome-extension Origin gating The test previously checked for 'localhost-only' comment. Now checks for 'chrome-extension://' since the token is gated on Origin header. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.7.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Gonzih <gonzih@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: garagon <garagon@users.noreply.github.com> |
||
|
|
31943b2f02 |
feat: anti-skip rule for all review skills (v0.15.6.1) (#804)
* feat: anti-skip rule for all review skills Review skills sometimes skip sections when reviewing strategy or spec plans. This adds an explicit anti-skip rule to CEO (1-11), eng (1-4), design (1-7), and DX (1-8) review skills. Also fixes CEO header from "10 sections" to "11 sections" to match actual count. * chore: bump version and changelog (v0.15.6.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
8038cad4a7 |
fix: self-healing skill prefix consistency in setup (#805)
* fix: self-healing gstack-relink after setup to prevent skill prefix drift Setup now runs gstack-relink as a final consistency check after linking Claude skills. This independently reads skill_prefix from config and ensures name: fields and directory names match, catching cases where interrupted setups or stale state left skills incorrectly prefixed. * chore: bump version and changelog (v0.15.6.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
04b709d91a |
feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793)
* test: add golden-file baselines for host config refactor Snapshot generated SKILL.md output for ship skill across all 3 existing hosts (Claude, Codex, Factory). These baselines verify the config-driven refactor produces identical output to the current hardcoded system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add HostConfig interface and validator for declarative host system New scripts/host-config.ts defines the typed HostConfig interface that captures all per-host variation: paths, frontmatter rules, path/tool rewrites, suppressed resolvers, runtime root symlinks, install strategy, and behavioral config (co-author trailer, learnings mode, boundary instruction). Includes validateHostConfig() and validateAllConfigs() with regex-based security validation and cross-config uniqueness checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add typed host configs for Claude, Codex, Factory, and Kiro Extract all hardcoded host-specific values from gen-skill-docs.ts, types.ts, preamble.ts, review.ts, and setup into typed HostConfig objects. Each host is a single file in hosts/ with its paths, frontmatter rules, path/tool rewrites, runtime root manifest, and install behavior. hosts/index.ts exports all configs, derives the Host type, and provides resolveHostArg() for CLI alias handling (e.g., 'agents' -> 'codex', 'droid' -> 'factory'). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: derive Host type and HOST_PATHS from host configs types.ts no longer hardcodes host names or paths. The Host type is derived from ALL_HOST_CONFIGS in hosts/index.ts, and HOST_PATHS is built dynamically from each config's globalRoot/localSkillRoot/usesEnvVars. Adding a new host to hosts/index.ts automatically extends the type system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: gen-skill-docs.ts consumes typed host configs Replace hardcoded EXTERNAL_HOST_CONFIG, transformFrontmatter host branches, path/tool rewrite if-chains, and ALL_HOSTS array with config-driven lookups from hosts/*.ts. - Host detection uses resolveHostArg() (handles aliases like agents/droid) - transformFrontmatter uses config's allowlist/denylist mode, extraFields, conditionalFields, renameFields, and descriptionLimitBehavior - Path rewrites use config's pathRewrites array (replaceAll, order matters) - Tool rewrites use config's toolRewrites object - Skill skipping uses config's generation.skipSkills - ALL_HOSTS derived from ALL_HOST_NAMES - Token budget display regex derived from host configs Golden-file comparison: all 3 hosts produce IDENTICAL output to baselines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: preamble, co-author trailer, and resolver suppression use host configs - preamble.ts: hostConfigDir derived from config.globalRoot instead of hardcoded Record - utility.ts: generateCoAuthorTrailer reads from config.coAuthorTrailer instead of host switch statement - gen-skill-docs.ts: suppressedResolvers from config skip resolver execution at placeholder replacement time (belt+suspenders with existing ctx.host checks in individual resolvers) Golden-file comparison: all 3 hosts produce IDENTICAL output to baselines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: setup tooling uses config-driven host detection - host-config-export.ts: new CLI that exposes host configs to bash (list, get, detect, validate, symlinks commands) - bin/gstack-platform-detect: reads host configs instead of hardcoded binary/path mapping - scripts/skill-check.ts: iterates host configs for skill validation and freshness checks instead of separate Codex/Factory blocks - lib/worktree.ts: iterates host configs for directory copy instead of hardcoded .agents Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add OpenCode, Slate, and Cursor host configs Three new hosts added to the declarative config system. Each is a typed HostConfig object with paths, frontmatter rules, and path rewrites. All generate valid SKILL.md output with zero .claude/skills path leakage. - hosts/opencode.ts: OpenCode (opencode.ai), skills at ~/.config/opencode/ - hosts/slate.ts: Slate (Random Labs), skills at ~/.slate/ - hosts/cursor.ts: Cursor, skills at ~/.cursor/ - .gitignore: add .kiro/, .opencode/, .slate/, .cursor/, .openclaw/ Zero code changes needed — just config files + re-export in index.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add OpenClaw host config with adapter for tool mapping OpenClaw gets a hybrid approach: typed config for paths/frontmatter/ detection + a post-processing adapter for semantic tool rewrites. Config handles: path rewrites, frontmatter (name+description+version), CLAUDE.md→AGENTS.md, tool name rewrites (Bash→exec, Read→read, etc.), suppressed resolvers, SOUL.md via staticFiles. Adapter handles: AskUserQuestion→prose, Agent→sessions_spawn, $B→exec $B. Zero .claude/skills path leakage. Zero hardcoded tool references remaining. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: contributor add-host skill + fix version sync - contrib/add-host/SKILL.md.tmpl: contributor-only skill that guides new host config creation. Lives in contrib/, excluded from user installs. - package.json: sync version with VERSION file (0.15.2.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add parameterized host smoke tests for all hosts 35 new tests covering all 7 external hosts (Codex, Factory, Kiro, OpenCode, Slate, Cursor, OpenClaw). Each host gets 4-5 tests: - output exists on disk with SKILL.md files - no .claude/skills path leakage in non-root skills - frontmatter has name + description fields - --dry-run freshness check passes - /codex skill excluded (for hosts with skipSkills: ['codex']) Tests are parameterized over ALL_HOST_CONFIGS so adding a new host automatically gets smoke-tested with zero new test code. Also updates --host all test to verify all registered hosts generate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: 100% coverage for host config system 71 new tests in test/host-config.test.ts covering: - hosts/index.ts: ALL_HOST_CONFIGS, getHostConfig, resolveHostArg (aliases), getExternalHosts, uniqueness checks - host-config.ts validateHostConfig: name regex, displayName, cliCommand, cliAliases, globalRoot, localSkillRoot, hostSubdir, frontmatter.mode, linkingStrategy, shell injection attempts, paths with $ and ~ - host-config.ts validateAllConfigs: duplicate name/hostSubdir/globalRoot detection, error prefix format, real configs pass - HOST_PATHS derivation: env vars for external hosts, literal paths for Claude, localSkillRoot matches config, every host has entry - host-config-export.ts CLI: list, get (string/boolean/array), detect, validate, symlinks, error cases (missing args, unknown field/host) - Golden-file regression: claude/codex/factory ship SKILL.md vs baselines - Individual host config correctness: prefixable, linkingStrategy, usesEnvVars, description limits, metadata, sidecar, tool rewrites, conditional fields, suppressed resolvers, boundary instruction, co-author trailers, skip rules, path rewrites, runtime root assets Combined with the 35 parameterized smoke tests from gen-skill-docs.test.ts, total new test coverage for multi-host: 106 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update golden baselines and sync version after merge from main Golden files refreshed to match post-merge generated output. package.json version synced to VERSION file (0.15.4.0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.15.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: sidebar E2E tests now self-contained and passing - sidebar-url-accuracy: fix stale assertion that expected extensionUrl in prompt text (prompt format changed, URL is now in pageUrl field) - sidebar-css-interaction: simplify task from multi-step HN comment navigation to single-page example.com style injection (faster, more reliable, still exercises goto + style + completion flow) - Update golden baselines after merge from main All 3 sidebar tests now pass: 3/3, 0 fail, ~36s total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add ADDING_A_HOST.md guide + update docs for multi-host system - docs/ADDING_A_HOST.md: step-by-step guide for adding a new host (create config, register, gitignore, generate, test). Covers the full HostConfig interface, adapter pattern, and validation. - CONTRIBUTING.md: replace stale "Dual-host development" section with "Multi-host development" covering all 8 hosts and linking to the guide. - README.md: consolidate Codex/Factory install sections into one "Other AI Agents" section listing all supported hosts with auto-detect. - CLAUDE.md: add hosts/, host-config.ts, host-adapters/, contrib/ to project structure tree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: README per-host install instructions for all 8 agents Each supported agent now has its own copy-paste install block with the exact command and where skills end up on disk. Includes: auto-detect, Codex, OpenCode, Cursor, Factory, OpenClaw, Slate, and Kiro. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
447851452a |
feat: interactive /plan-devex-review + plan mode skill fix (v0.15.5.0) (#796)
* fix: skill invocation during plan mode takes precedence over generic plan mode Adds a "Skill Invocation During Plan Mode" section to the preamble resolver so all generated SKILL.md files include it. Fixes a bug where Claude treats loaded skill content as reference material instead of executable instructions, and keeps trying to ExitPlanMode instead of following the skill workflow step by step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: interactive /plan-devex-review with persona, benchmarks, and forcing questions Complete rewrite of the DX review skill to match CEO/eng review depth. New flow: investigate (persona, empathy, competitors, magical moment, journey tracing) then force decisions, then score with evidence. Three modes: DX EXPANSION, DX POLISH, DX TRIAGE. 20-45 interactive STOP points vs 10-12 before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: autoplan DX POLISH mode + review log schema for new devex fields Adds mode selection, persona, competitive, and magical moment override rules to autoplan Phase 3.5. Documents new review log fields (mode, persona, competitive_tier) in the plan-file-review-report schema. Syncs package.json version to VERSION. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.15.5.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
cf73db5f19 |
feat: autoplan DX integration + README docs (v0.15.4.0) (#791)
* docs: document /plan-devex-review and /devex-review in README Add both skills to the install instructions, skills table, and a new "Which review?" comparison table showing when to use each review type. * feat: add /plan-devex-review to /autoplan as conditional Phase 3.5 Auto-detects developer-facing scope (API, CLI, SDK, shell, agent, MCP, OpenClaw) and runs DX review with dual adversarial voices after Eng review. * chore: bump version and changelog (v0.15.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
be96ff5ce7 |
feat: /plan-devex-review + /devex-review — DX review skills (v0.15.3.0) (#784)
* feat: add DX framework resolver for shared principles and scoring rubric
New {{DX_FRAMEWORK}} resolver provides compact (~150 lines) shared content
for /plan-devex-review and /devex-review: Addy Osmani's 8 DX principles,
7 characteristics table, 10 cognitive patterns, scoring rubric, and TTHW
benchmarks. Hall of Fame examples loaded on-demand per pass to avoid bloat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add DX Review row to review dashboard
Adds plan-devex-review and devex-review schema entries to the review
dashboard resolver and placeholder table in the preamble. All existing
SKILL.md files regenerated to include the new DX Review row.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /plan-devex-review skill — DX plan review with Osmani framework
Plan-stage developer experience review. Rates 8 DX dimensions 0-10:
getting started, API/CLI/SDK design, error messages, docs, upgrade path,
dev environment, community, and DX measurement. Includes developer empathy
simulation, auto-detect product type with applicability gate, DX scorecard
with trend tracking, and a conditional Claude Code Skill DX checklist.
Hall of Fame examples loaded on-demand per pass from dx-hall-of-fame.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /devex-review skill — live DX audit with browse
Live-system developer experience audit using browse tool. Tests all 8
dimensions aligned with /plan-devex-review for boomerang comparison
(plan said 3 min TTHW, reality says 8). Each dimension marked TESTED,
INFERRED, or N/A with evidence. Scope-aware: declares what browse can
and cannot test, falls back to file artifacts for untestable dimensions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
c620de38e1 |
fix: setup runs pending migrations so git pull + ./setup works (#774)
* fix: setup runs pending migrations so git pull + ./setup works Previously, version migrations only ran during /gstack-upgrade (Step 4.75). Users who updated via git pull + ./setup never got migrations applied, leaving stale skill directory structures in place. Now setup tracks the last-run version in ~/.gstack/.last-setup-version and runs any pending migrations automatically. Idempotent — safe to run multiple times. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: setup runs pending migrations so git pull + ./setup works Previously, version migrations only ran during /gstack-upgrade (Step 4.75). Users who updated via git pull + ./setup never got migrations applied, leaving stale skill directory structures in place. Now setup tracks the last-run version in ~/.gstack/.last-setup-version and runs any pending migrations automatically. Addresses adversarial review findings: space-safe while-read loop, fresh install guard, upper-bound version check, missing VERSION guard. * chore: bump version and changelog (v0.15.2.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
846269e3b1 |
feat: voice-friendly skill triggers for AquaVoice (v0.14.6.0) (#732)
* feat: voice-friendly skill triggers for speech-to-text input Add voice-triggers YAML field to 10 SKILL.md.tmpl files with natural-language aliases (e.g. "see-so" for /cso, "tech review" for /plan-eng-review). gen-skill-docs preprocesses voice triggers before transformFrontmatter, folding them into the description and stripping the field from output. Includes unit tests, README voice input section, and CONTRIBUTING.md update. * chore: bump version and changelog (v0.14.6.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6169273d16 |
feat: /design-html works from any starting point (v0.15.1.0) (#734)
* feat: /design-html works from any starting point — not just design-shotgun Three routing modes: approved mockup (Case A), CEO plan or design variants without formal approval (Case B), or clean slate with just a description (Case C). Each mode asks the right questions via AskUserQuestion instead of blocking with "no approved design found." * chore: bump version and changelog (v0.15.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
562a67503a |
feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733)
* feat: session timeline binaries (gstack-timeline-log + gstack-timeline-read) New binaries for the Session Intelligence Layer. gstack-timeline-log appends JSONL events to ~/.gstack/projects/$SLUG/timeline.jsonl. gstack-timeline-read reads, filters, and formats timeline data for /retro consumption. Timeline is local-only project intelligence, never sent anywhere. Always-on regardless of telemetry setting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: preamble context recovery + timeline events + predictive suggestions Layers 1-3 of the Session Intelligence Layer: - Timeline start/complete events injected into every skill via preamble - Context recovery (tier 2+): lists recent CEO plans, checkpoints, reviews - Cross-session injection: LAST_SESSION and LATEST_CHECKPOINT for branch - Predictive skill suggestion from recent timeline patterns - Welcome back message synthesis - Routing rules for /checkpoint and /health Timeline writes are NOT gated by telemetry (local project intelligence). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /checkpoint + /health skills (Layers 4-5) /checkpoint: save/resume/list working state snapshots. Supports cross-branch listing for Conductor workspace handoff. Session duration tracking. /health: code quality scorekeeper. Wraps project tools (tsc, biome, knip, shellcheck, tests), computes composite 0-10 score, tracks trends over time. Auto-detects tools or reads from CLAUDE.md ## Health Stack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + add timeline tests 9 timeline tests (all passing) mirroring learnings.test.ts pattern. All 34 SKILL.md files regenerated with new preamble (context recovery, timeline events, routing rules for /checkpoint and /health). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update self-learning roadmap post-Session Intelligence R1-R3 marked shipped with actual versions. R4 becomes Adaptive Ceremony (trust as separate policy engine, scope-aware, gradual degradation). R5 becomes /autoship (resumable state machine, not linear chain). R6-R7 unbundled from old R5. Added State Systems reference, Risk Register (Codex-reviewed), and validation metrics for R4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: E2E tests for Session Intelligence (timeline, recovery, checkpoint) 3 gate-tier E2E tests: - timeline-event-flow: binary data flow round-trip (no LLM) - context-recovery-artifacts: seeded artifacts appear in preamble - checkpoint-save-resume: checkpoint file created with YAML frontmatter Also fixes package.json version sync (0.14.6.0 → 0.15.0.0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
8115951284 |
feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647)
* refactor: remove dead contributor mode, replace with operational self-improvement slot Contributor mode never fired in 18 days of heavy use (required manual opt-in via gstack-config, gated behind _CONTRIB=true, wrote disconnected markdown). Removes: generateContributorMode(), _CONTRIB bash var, 2 E2E tests, touchfile entry, doc references. Cleans up skip-lists in plan-ceo-review, autoplan, review resolver, and document-release templates. The operational self-improvement system (next commit) replaces this slot with automatic learning capture that requires no opt-in. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: operational self-improvement — every skill learns from failures Adds universal operational learning capture to the preamble completion protocol. At the end of every skill session, the agent reflects on CLI failures, wrong approaches, and project quirks, logging them as type "operational" to the learnings JSONL. Future sessions surface these automatically. - generateCompletionStatus(ctx) now includes operational capture section - Preamble bash shows top 3 learnings inline when count > 5 - New "operational" type in generateLearningsLog alongside pattern/pitfall/etc - Updated unit tests + operational seed entry in learnings E2E Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: wire learnings into all insight-producing skills Adds LEARNINGS_SEARCH and/or LEARNINGS_LOG to 10 skill templates that produce reusable insights but were previously disconnected from the learning system: - office-hours, plan-ceo-review, plan-eng-review: add LOG (had SEARCH) - plan-design-review: add both SEARCH + LOG (had neither) - design-review, design-consultation, cso, qa, qa-only: add both - retro: add SEARCH (had LOG) 13 skills now fully participate in the learning loop (read + write). Every review, QA, investigation, and design session both consults prior learnings and contributes new ones. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add operational-learning E2E test (gate-tier) Validates the write path: agent encounters a CLI failure, logs an operational learning to JSONL via gstack-learnings-log. Replaces the removed contributor-mode E2E test. Setup: temp git repo, copy bin scripts, set GSTACK_HOME. Prompt: simulated npm test failure needing --experimental-vm-modules. Assert: learnings.jsonl exists with type=operational entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: learnings-show E2E slug mismatch — seed at computed slug, not hardcoded The test seeded learnings at projects/test-project/ but gstack-slug computes the slug from basename(workDir) when no git remote exists. The agent's search looked at the wrong path and found nothing. Fix: compute slug the same way gstack-slug does (basename + sanitize) and seed the learnings there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.8.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7ea6ead9fa |
fix: ship idempotency + skill prefix name patching (v0.14.3.0) (#693)
* fix: add idempotency guards to /ship Steps 4, 7, 8 (#649) If git push succeeds but gh pr create fails, re-running /ship would double-bump VERSION and duplicate CHANGELOG entries. Now: - Step 4: check if VERSION already differs from base branch - Step 7: fetch only the specific branch, skip push if already up to date - Step 8: if PR exists, update body via gh pr edit instead of creating duplicate No CHANGELOG guard needed — Step 5 is already idempotent by design ("replace existing entries with one unified entry"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: patch name: in SKILL.md frontmatter for prefix mode (#620, #578) ./setup --prefix creates gstack-* symlinks but SKILL.md still says name: qa, so Claude Code ignores the prefix. Now: - New bin/gstack-patch-names shared helper patches name: field via sed - setup calls it after link_claude_skill_dirs - gstack-relink calls it after symlink loop - gen-skill-docs.ts prints warning when skill_prefix is true Edge cases: gstack-upgrade not double-prefixed, root gstack skill never prefixed, prefix removal restores original names, SKILL.md without frontmatter is a safe no-op. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add name patching + ship idempotency tests (#620, #649) - 4 unit tests for name: patching in relink.test.ts (prefix on/off, gstack-upgrade not double-prefixed, no-frontmatter no-op) - 2 tests for gen-skill-docs prefix warning - 1 E2E test for ship idempotency (periodic tier) - Updated setupMockInstall to write SKILL.md with proper frontmatter - Added ship-idempotency touchfiles + tier classification Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.14.3.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PR idempotency checks open state, dedupe touchfiles, sync package.json - Step 8 PR guard now checks state==OPEN so closed PRs don't prevent new PR creation (adversarial review finding) - Remove duplicate ship-idempotency entry in E2E_TOUCHFILES - Sync package.json version to 0.14.3.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: patch name: before creating symlinks to fix --no-prefix ordering bug gstack-patch-names must run BEFORE link_claude_skill_dirs so symlink names reflect the correct (patched) name: values. Previously, switching from --prefix to --no-prefix would read stale gstack-* names from SKILL.md and create wrong symlinks. (Codex adversarial finding) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a4a181ca92 |
feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)
* feat: extend gstack-diff-scope with SCOPE_MIGRATIONS, SCOPE_API, SCOPE_AUTH
Three new scope signals for Review Army specialist activation:
- SCOPE_MIGRATIONS: db/migrate/, prisma/migrations/, alembic/, *.sql
- SCOPE_API: *controller*, *route*, *endpoint*, *.graphql, openapi.*
- SCOPE_AUTH: *auth*, *session*, *jwt*, *oauth*, *permission*, *role*
* feat: add 7 specialist checklist files for Review Army
- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns
All use standard header with JSON output schema and NO FINDINGS fallback.
* feat: Review Army resolver — parallel specialist dispatch + merge
New resolver in review-army.ts generates template prose for:
- Stack detection and specialist selection
- Parallel Agent tool dispatch with learning-informed prompts
- JSON finding collection, fingerprint dedup, consensus highlighting
- PR quality score computation
- Red Team conditional dispatch
Registered as REVIEW_ARMY in resolvers/index.ts.
* refactor: restructure /review template for Review Army
- Replace Steps 4-4.75 with CRITICAL pass + {{REVIEW_ARMY}}
- Remove {{DESIGN_REVIEW_LITE}} and {{TEST_COVERAGE_AUDIT_REVIEW}}
(subsumed into Design and Testing specialists respectively)
- Extract specialist-covered categories from checklist.md
- Keep CRITICAL + uncovered INFORMATIONAL in main agent pass
* test: Review Army — 14 diff-scope tests + 7 E2E tests
- test/diff-scope.test.ts: 14 tests for all 9 scope signals
- test/skill-e2e-review-army.test.ts: 7 E2E tests
Gate: migration safety, N+1 detection, delivery audit,
quality score, JSON findings
Periodic: red team, consensus
- Updated gen-skill-docs tests for new review structure
- Added touchfile entries and tier classifications
* docs: update SELF_LEARNING_V0.md with Release 2 status + Release 2.5
Mark Release 2 (Review Army) as in-progress. Add Release 2.5 for
deferred expansions (E1 adaptive gating, E3 test stubs, E5 cross-review
dedup, E7 specialist tracking).
* chore: bump version and changelog (v0.14.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
a0328be04c |
feat: always-on adversarial review + scope drift + plan mode design tools (v0.14.3.0) (#694)
* feat: always-on adversarial review + scope drift resolver + cross-model tension format
Rewrite generateAdversarialStep() to remove LOC-based tier skipping. Every review
now runs both Claude adversarial subagent and Codex adversarial challenge. OLD_CFG
only gates Codex passes, not Claude. Add generateScopeDrift() shared resolver.
Fix cross-model tension AskUserQuestion to include RECOMMENDATION + Completeness.
* feat: add scope drift to /ship, extract from /review template
/ship gets {{SCOPE_DRIFT}} at Step 3.48 + PR body slot. /review replaces
hardcoded scope drift with {{SCOPE_DRIFT}} + {{PLAN_COMPLETION_AUDIT_REVIEW}}.
* feat: plan mode safe operations — browse, design, codex allowed in plan mode
Add preamble section declaring $B, $D, codex, and ~/.gstack/ writes as
plan-mode-safe. Unblocks design skills during planning.
* test: update adversarial + add scope drift assertions
Rename adversarial tests to reflect always-on behavior. Remove tier
threshold assertions. Add scope drift content assertions for both
/review and /ship generated SKILL.md files.
* chore: bump version and changelog (v0.14.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
a1a933614c |
feat: sidebar CSS inspector + per-tab agents (v0.13.9.0) (#650)
* feat: CDP inspector module — persistent sessions, CSS cascade, style modification New browse/src/cdp-inspector.ts with full CDP inspection engine: - inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel - modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback - Persistent CDP session lifecycle (create, reuse, detach on nav, re-create) - Specificity sorting, overridden property detection, UA rule filtering - Modification history with undo support - formatInspectorResult() for CLI output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply, POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE). CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter removal), prettyscreenshot (clean screenshot pipeline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit Extension changes for the visual CSS inspector: - inspector.js: element picker with hover highlight, CSS selector generation, basic mode fallback (getComputedStyle + CSSOM), page alteration handlers - inspector.css: picker overlay styles (blue highlight + tooltip) - background.js: inspector message routing (picker <-> server <-> sidepanel) - sidepanel: Inspector tab with box model viz (gstack palette), matched rules with specificity badges, computed styles, click-to-edit quick edit, Send to Agent/Code button, empty/loading/error states Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document inspect, style, cleanup, prettyscreenshot browse commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-track user-created tabs and handle tab close browser-manager.ts changes: - context.on('page') listener: automatically tracks tabs opened by the user (Cmd+T, right-click open in new tab, window.open). Previously only programmatic newTab() was tracked, so user tabs were invisible. - page.on('close') handler in wirePageEvents: removes closed tabs from the pages map and switches activeTabId to the last remaining tab. - syncActiveTabByUrl: match Chrome extension's active tab URL to the correct Playwright page for accurate tab identity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: per-tab agent isolation via BROWSE_TAB environment variable Prevents parallel sidebar agents from interfering with each other's tab context. Three-layer fix: - sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process, per-tab processing set allows concurrent agents across tabs - cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body - server.ts: handleCommand() temporarily switches activeTabId when tabId is present, restores after command completes (safe: Bun event loop is single-threaded) Also: per-tab agent state (TabAgentState map), per-tab message queuing, per-tab chat buffers, verbose streaming narration, stop button endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish Extension changes: - sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab() swaps entire chat view, browserTabActivated handler for instant tab sync, stop button wired to /sidebar-agent/stop, pollTabs renders tab bar - sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup, input placeholder "Ask about this page..." - sidepanel.css: tab bar styles, stop button styles, loading state fixes - background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel with tab URL for instant tab switch detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX sidebar-agent.test.ts (new tests): - BROWSE_TAB env var passed to claude process - CLI reads BROWSE_TAB and sends tabId in body - handleCommand accepts tabId, saves/restores activeTabId - Tab pinning only activates when tabId provided - Per-tab agent state, queue, concurrency - processingTabs set for parallel agents sidebar-ux.test.ts (new tests): - context.on('page') tracks user-created tabs - page.on('close') removes tabs from pages map - Tab isolation uses BROWSE_TAB not system prompt hack - Per-tab chat context in sidepanel - Tab bar rendering, stop button, banner text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflicts — keep security defenses + per-tab isolation Merged main's security improvements (XML escaping, prompt injection defense, allowed commands whitelist, --model opus, Write tool, stderr capture) with our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set, no --resume). Updated test expectations for expanded system prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add inspector message types to background.js allowlist Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing all inspector message types (startInspector, stopInspector, elementPicked, pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult). Messages were silently rejected, making the inspector broken on ALL pages. Also: separate executeScript and insertCSS into individual try blocks in injectInspector(), store inspectorMode for routing, and add content.js fallback when script injection fails (CSP, chrome:// pages). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: basic element picker in content.js for CSP-restricted pages When inspector.js can't be injected (CSP, chrome:// pages), content.js provides a basic picker using getComputedStyle + CSSOM: - startBasicPicker/stopBasicPicker message handlers - captureBasicData() with ~30 key CSS properties, box model, matched rules - Hover highlight with outline save/restore (never leaves artifacts) - Click uses e.target directly (no re-querying by selector) - Sends inspectResult with mode:'basic' for sidebar rendering - Escape key cancels picker and restores outlines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in sidebar inspector toolbar Two action buttons in the inspector toolbar: - Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat notification on success, resets inspector state (element may be removed) - Screenshot (📸): POSTs screenshot to server, shows spinner, chat notification with saved file path Shared infrastructure: - .inspector-action-btn CSS with loading spinner via ::after pseudo-element - chat-notification type in addChatEntry() for system messages - package.json version bump to 0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: inspector allowlist, CSP fallback, cleanup/screenshot buttons 16 new tests in sidebar-ux.test.ts: - Inspector message allowlist includes all inspector types - content.js basic picker (startBasicPicker, captureBasicData, CSSOM, outline save/restore, inspectResult with mode basic, Escape cleanup) - background.js CSP fallback (separate try blocks, inspectorMode, fallback) - Cleanup button (POST /command, inspector reset after success) - Screenshot button (POST /command, notification rendering) - Chat notification type and CSS styles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in chat toolbar (not just inspector) Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat input, always visible. Both inspector and chat buttons share runCleanup() and runScreenshot() helper functions. Clicking either set shows loading state on both simultaneously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: chat toolbar buttons, shared helpers, quick-action-btn styles Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn, quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading), shared runCleanup/runScreenshot helper functions, and cleanup inspector reset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and Readability.js research: - ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.) - cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns - overlays (NEW): paywalls, newsletter popups, interstitials, push prompts, app download banners, survey modals - social: follow prompts, share tools - Cleanup now defaults to --all when no args (sidebar button fix) - Uses !important on all display:none (overrides inline styles) - Unlocks body/html scroll (overflow:hidden from modal lockout) - Removes blur/filter effects (paywall content blur) - Removes max-height truncation (article teaser truncation) - Collapses empty ad placeholder whitespace (empty divs after ad removal) - Skips gstack-ctrl indicator in sticky removal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable action buttons when disconnected, no error spam - setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot buttons (both chat toolbar and inspector toolbar) - Called with false in updateConnection when server URL is null - Called with true when connection established - runCleanup/runScreenshot silently return when disconnected instead of showing 'Not connected' error notifications - CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: cleanup heuristics, button disabled state, overlay selectors 17 new tests: - cleanup defaults to --all on empty args - CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial) - Major ad networks in selectors (doubleclick, taboola, criteo, etc.) - Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast) - !important override for inline styles - Scroll unlock (body overflow:hidden) - Blur removal (paywall content blur) - Article truncation removal (max-height) - Empty placeholder collapse - gstack-ctrl indicator skip in sticky cleanup - setActionButtonsEnabled function - Buttons disabled when disconnected - No error spam from cleanup/screenshot when disconnected - CSS disabled styles for action buttons Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: LLM-based page cleanup — agent analyzes page semantically Instead of brittle CSS selectors, the cleanup button now sends a prompt to the sidebar agent (which IS an LLM). The agent: 1. Runs deterministic $B cleanup --all as a quick first pass 2. Takes a snapshot to see what's left 3. Analyzes the page semantically to identify remaining clutter 4. Removes elements intelligently, preserving site branding This means cleanup works correctly on any site without site-specific selectors. The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is junk, but the SF Chronicle masthead should stay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics + preserve top nav bar Deterministic cleanup improvements (used as first pass before LLM analysis): - New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games, recirculation widgets (taboola, outbrain, nativo), cross-promotion banners - Text-content detection: removes "ADVERTISEMENT", "Article continues below", "Sponsored", "Paid content" labels and their parent wrappers - Sticky fix: preserves the topmost full-width element near viewport top (site nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical position, preserves the first one that spans >80% viewport width. Tests: clutter category, ad label removal, nav bar preservation logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: LLM-based cleanup architecture, deterministic heuristics, sticky nav 22 new tests covering: - Cleanup button uses /sidebar-command (agent) not /command (deterministic) - Cleanup prompt includes deterministic first pass + agent snapshot analysis - Cleanup prompt lists specific clutter categories for agent guidance - Cleanup prompt preserves site identity (masthead, headline, body, byline) - Cleanup prompt instructs scroll unlock and $B eval removal - Loading state management (async agent, setTimeout) - Deterministic clutter: audio/podcast, games/puzzles, recirculation - Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues) - Ad label parent wrapper hiding for small containers - Sticky nav preservation (sort by position, first full-width near top) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent repeat chat message rendering on reconnect/replay Root cause: server persists chat to disk (chat.jsonl) and replays on restart. Client had no dedup, so every reconnect re-rendered the entire history. Messages from an old HN session would repeat endlessly on the SF Chronicle tab. Fix: renderedEntryIds Set tracks which entry IDs have been rendered. addChatEntry skips entries already in the set. Entries without an id (local notifications) bypass the check. Clear chat resets the set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: agent stops when done, no focus stealing, opus for prompt injection safety Three fixes for sidebar agent UX: - System prompt: "Be CONCISE. STOP as soon as the task is done. Do NOT keep exploring or doing bonus work." Prevents agent from endlessly taking screenshots and highlighting elements after answering the question. - switchTab(id, opts): new bringToFront option. Internal tab pinning (BROWSE_TAB) uses bringToFront: false so agent commands never steal window focus from the user's active app. - Keep opus model (not sonnet) for prompt injection resistance on untrusted web pages. Remove Write from allowedTools (agent only needs Bash for $B). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: agent conciseness, focus stealing, opus model, switchTab opts Tests for the three UX fixes: - System prompt contains STOP/CONCISE/Do NOT keep exploring - sidebar agent uses opus (not sonnet) for prompt injection resistance - switchTab has bringToFront option, defaults to true (opt-out) - handleCommand tab pinning uses bringToFront: false (no focus steal) - Updated stale tests: switchTab signature, allowedTools excludes Write, narration -> conciseness, tab pinning restore calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: sidebar CSS interaction E2E — HN comment highlight round-trip New E2E test (periodic tier, ~$2/run) that exercises the full sidebar agent pipeline with CSS interaction: 1. Agent navigates to Hacker News 2. Clicks into the top story's comments 3. Reads comments and identifies the most insightful one 4. Highlights it with a 4px solid orange outline via style injection Tests: navigation, snapshot, text reading, LLM judgment, CSS modification. Requires real browser + real Claude (ANTHROPIC_API_KEY). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar CSS E2E test — correct idle timeout (ms not s), pipe stdio Root cause of test failure: BROWSE_IDLE_TIMEOUT is in milliseconds, not seconds. '600' = 0.6 seconds, server died immediately after health check. Fixed to '600000' (10 minutes). Also: use 'pipe' stdio instead of file descriptors (closing fds kills child on macOS/bun), catch ConnectionRefused on poll retry, 4 min poll timeout for the multi-step opus task. Test passes: agent navigates to HN, reads comments, identifies most insightful one, highlights it with orange CSS, stops. 114s, $0.00. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7911b7b974 |
fix: force comparison board as default variant chooser (v0.14.1.0) (#658)
* fix: force comparison board as default variant chooser The comparison board ($D compare --serve) was being skipped in favor of showing variants inline + AskUserQuestion "which do you prefer?" — a degraded experience missing rating controls, comments, and remix buttons. Changes: - Replace "show inline" instruction with "do NOT show inline, proceed to comparison board" in plan-design-review/SKILL.md.tmpl - Add CRITICAL RULE: never use AskUserQuestion as the variant chooser - Change DESIGN_SHOTGUN_LOOP resolver to AskUserQuestion-first wait with polling fallback (affects all 3 consumer skills) - Fix board URL from /design-board.html (404) to / (correct) - Improve serve-failure fallback to show variants inline via Read tool * chore: bump version and changelog (v0.14.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
8151fcd589 |
feat: /design-html skill — Pretext-native HTML from approved mockups (v0.14.0.0) (#653)
* feat: /design-html skill — Pretext-native HTML from approved mockups New skill that takes approved design-shotgun mockups and generates production-quality HTML with Pretext for computed text layout. Text reflows on resize, heights adjust to content, zero hardcoded CSS. Includes vendored Pretext bundle (30KB), smart API routing per design type, AskUserQuestion refinement loop, framework detection, and 3-viewport verification screenshots. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate /design-html into design skill pipeline - design-shotgun: Step 6 option B now chains to /design-html - design-consultation: suggests /design-html after shipping DESIGN.md (conditional on screen-level output, not tokens-only) - plan-design-review: expanded chaining to include /design-shotgun and /design-html alongside review skills Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: update plan-design-review chaining test for design skills plan-design-review now chains to /design-shotgun and /design-html in addition to review skills. Update the assertion to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add gstack keyword to design-html description for validation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.14.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
403637f0c8 |
feat: rotating founder resources in /office-hours closing (v0.13.10.0) (#652)
* feat: rotating founder resources in /office-hours closing Add Beat 3.5 with 34 curated resources (5 Garry Tan videos, 2 YC Backstory, 9 Lightcone Podcast, 8 Startup School, 10 PG essays) that rotate contextually each session. Includes dedup log to avoid repeats, analytics logging, and browser-open offers. Also adds chmod +x safety net to build script. * chore: bump version and changelog (v0.13.10.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
66c09644a7 |
feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)
* feat: add parameterized resolver support to gen-skill-docs
Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}},
enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}.
- Widen ResolverFn type to accept optional args?: string[]
- Update RESOLVERS record to use ResolverFn type
- Both replacement and unresolved-check regexes updated
- Fully backward compatible: existing {{WORD}} patterns unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add INVOKE_SKILL resolver for composable skill loading
New composition.ts resolver module that emits prose instructing Claude
to read another skill's SKILL.md and follow it, skipping preamble
sections. Supports optional skip= parameter for additional sections.
Usage: {{INVOKE_SKILL:plan-ceo-review}} or
{{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: use frontmatter name: for skill symlinks and Codex paths
Patch all 3 name-derivation paths to read name: from SKILL.md
frontmatter instead of relying solely on directory basenames.
This enables directory names that differ from invocation names
(e.g., run-tests/ directory with name: test).
- setup: link_claude_skill_dirs reads name: via grep, falls back to basename
- gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths
- gen-skill-docs.ts: moved frontmatter extraction before Codex path logic
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: extract CHANGELOG_WORKFLOW resolver from /ship
Move changelog generation logic into a reusable resolver. The resolver
is changelog-only (no version bump per Codex review recommendation).
Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback
Replace inline skill loading prose (read file, skip sections) with
{{INVOKE_SKILL:office-hours}} in the mid-session detection path.
The BENEFITS_FROM prerequisite offer is unchanged (separate use case).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL
Eliminate duplicated skip-list logic by having generateBenefitsFrom
call generateInvokeSkill internally. The wrapper (AskUserQuestion,
design doc re-check) stays in BENEFITS_FROM. The loading instructions
(read file, skip sections, error handling) come from INVOKE_SKILL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args
12 new tests covering:
- INVOKE_SKILL: template placeholder, default skip list, error handling,
BENEFITS_FROM delegation
- CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format
- Parameterized resolver infra: colon-separated args processing,
no unresolved placeholders across all generated SKILL.md files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions
Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.
Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
ship descriptions — reinforces the routing signal
10/10 journey tests now pass (was 7/10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: one-time CLAUDE.md routing injection prompt
Add a preamble section that checks if the project's CLAUDE.md has
skill routing rules. If not (and user hasn't declined), asks once
via AskUserQuestion to inject a "## Skill routing" section.
Root cause: skill descriptions in system-reminder metadata are too
weak to reliably trigger proactive Skill tool invocation. CLAUDE.md
project instructions carry higher weight in Claude's decision making.
- Preamble bash checks for "## Skill routing" in CLAUDE.md
- Stores decline in gstack-config (routing_declined=true)
- Only asks once per project (HAS_ROUTING check + config check)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: annotated config file + routing injection tests
gstack-config now writes a documented header on first config creation
with every supported key explained (proactive, telemetry, auto_upgrade,
skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.).
Users can edit ~/.gstack/config.yaml directly, anytime.
Also fixes grep to use ^KEY: anchoring so commented header lines don't
shadow real config values.
Tests added:
- 7 new gstack-config tests (annotated header, no duplication, comment
safety, routing_declined get/set/reset)
- 6 new gen-skill-docs tests (preamble routing injection: bash checks,
config reads, AskUserQuestion, decline persistence, routing rules)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump to v0.13.9.0, separate CHANGELOG from main's releases
Split our branch's changes into a new 0.13.9.0 entry instead of
jamming them into 0.13.7.0 which already landed on main as
"Community Wave."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: clarify branch-scoped VERSION/CHANGELOG after merging main
Add explicit rules: merging main doesn't mean adopting main's version.
Branch always gets its own entry on top with a higher version number.
Three-point checklist after every merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: put our 0.13.9.0 entry on top of CHANGELOG
Newest version goes on top. Our branch lands next, so our entry
must be above main's 0.13.8.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: restore missing 0.13.7.0 Community Wave entry
Accidentally dropped the 0.13.7.0 entry when reordering.
All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add CHANGELOG integrity check rule
After any edit that moves/adds/removes entries, grep for version
headers and verify no gaps or duplicates before committing.
Prevents accidentally dropping entries during reordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|