mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-06 21:46:40 +02:00
feat: adversarial spec review loop + skill chaining (v0.9.1.0) (#249)
* feat: add {{SPEC_REVIEW_LOOP}}, {{DESIGN_SKETCH}}, benefits-from resolvers
Three new resolvers in gen-skill-docs.ts:
- {{SPEC_REVIEW_LOOP}}: adversarial subagent reviews documents on 5
dimensions (completeness, consistency, clarity, scope, feasibility)
with convergence guard, quality score, and JSONL metrics
- {{DESIGN_SKETCH}}: generates rough HTML wireframes for UI ideas using
DESIGN.md constraints and design principles, renders via $B
- {{BENEFITS_FROM}}: parses benefits-from frontmatter and generates
skill chaining offer prose (one-hop-max, never blocks)
Also extends TemplateContext with benefitsFrom field and adds inline
YAML frontmatter parsing for the new field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /office-hours spec review loop + visual sketch phases
- Phase 4.5 ({{DESIGN_SKETCH}}): for UI ideas, generates rough HTML
wireframe using design principles from {{DESIGN_METHODOLOGY}} and
DESIGN.md, renders via $B, presents screenshot for iteration
- Phase 5.5 ({{SPEC_REVIEW_LOOP}}): adversarial subagent reviews the
design doc before user sees it — catches gaps in completeness,
consistency, clarity, scope, and feasibility
- Adds {{BROWSE_SETUP}} for $B availability in sketch phase
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: skill chaining — plan reviews offer /office-hours
- plan-ceo-review: benefits-from office-hours, offers /office-hours when
no design doc found, mid-session detection when user seems lost,
spec review loop on CEO plan documents
- plan-eng-review: benefits-from office-hours, offers /office-hours when
no design doc found
- One-hop-max chaining: never blocks, max one offer per session
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add validation + E2E tests for spec review, sketch, benefits-from
Unit tests (32 new assertions):
- SPEC_REVIEW_LOOP: 5 dimensions, Agent dispatch, 3 iterations, quality
score, metrics path, convergence guard, graceful failure
- DESIGN_SKETCH: DESIGN.md awareness, wireframe, $B goto/screenshot,
rough aesthetic, skip conditions
- BENEFITS_FROM: prerequisite offer in CEO + eng review, graceful
decline, skills without benefits-from don't get offer
- office-hours structure: spec review loop, adversarial dimensions,
visual sketch section
E2E tests (2 new):
- office-hours-spec-review: verifies agent understands the spec review
loop from SKILL.md
- plan-ceo-review-benefits: verifies agent understands the skill
chaining offer
Touchfiles updated for diff-based test selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.9.1.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -218,6 +218,25 @@ success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.

## SETUP (run this check BEFORE any browse command)

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

# YC Office Hours

You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ -482,6 +501,66 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac

---

## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
   system constraints (colors, typography, spacing, component patterns). Use these
   constraints in the wireframe.
2. Apply core design principles:
   - **Information hierarchy** — what does the user see first, second, third?
   - **Interaction states** — loading, empty, error, success, partial
   - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
   - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
   - **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
  hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
  matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
```bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
```
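
As one concrete illustration, the wireframe can be written with a shell heredoc. The page content below (an invoice-search panel) is purely hypothetical and only demonstrates the rough-aesthetic and self-containment constraints:

```bash
# Hypothetical wireframe write-out; the invoice-search content is illustrative,
# not part of the skill. Note the rough look: system fonts, thin gray borders.
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
cat > "$SKETCH_FILE" <<'HTML'
<!DOCTYPE html>
<html>
<head><style>
  body   { font-family: system-ui, sans-serif; margin: 2rem; }
  .panel { border: 1px solid #999; padding: 1rem; max-width: 480px; }
  .empty { border: 1px dashed #bbb; padding: 0.5rem; color: #555; }
</style></head>
<body>
  <!-- Design decision: search dominates; results render below it -->
  <div class="panel">
    <input style="width:100%" placeholder="Search invoices by customer">
    <!-- Empty state designed up front so zero results isn't an afterthought -->
    <p class="empty">No invoices match "Hendrickson &amp; Daughters Ltd"</p>
  </div>
</body>
</html>
HTML
echo "WROTE: $SKETCH_FILE"
```

Because the heredoc delimiter is quoted (`<<'HTML'`), nothing inside the markup is expanded by the shell, so the HTML can be written verbatim.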

**Step 3: Render and capture**

```bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
```

If `$B` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
(`/plan-design-review`, `/design-review`) to see what was originally envisioned.

---

## Phase 4.5: Founder Signal Synthesis

Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).

@@ -618,7 +697,73 @@ Supersedes: {prior filename — omit this line if first design on this branch}

{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```

Present the design doc to the user via AskUserQuestion:
---

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.
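
The loop's control flow (3-iteration cap plus convergence guard) can be sketched in shell. `review_doc` below is a hypothetical stub standing in for the subagent dispatch; in the skill itself the review comes from the Agent tool, not a shell function:

```bash
# Sketch of the review-loop control flow. review_doc is a hypothetical stub:
# it prints one issue per line, or nothing when the document passes.
review_doc() { cat "${ISSUES_FILE:-/dev/null}"; }

prev_issues=""
for i in 1 2 3; do                         # maximum 3 iterations total
  issues=$(review_doc)
  if [ -z "$issues" ]; then
    echo "PASS after $i iteration(s)"
    break
  fi
  if [ "$issues" = "$prev_issues" ]; then  # same issues twice in a row
    echo "CONVERGENCE GUARD: persist as Reviewer Concerns and stop"
    break
  fi
  prev_issues=$issues
  # ...apply fixes to the document here (Edit tool in the real skill)...
done
```

With no issues file the loop passes immediately; pointing `ISSUES_FILE` at a static list of issues trips the convergence guard on the second iteration, since the "fix" never changes the reviewer's output.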

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
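
The JSONL file can later be summarized with standard tools. Assuming `jq` is installed, a sketch of the kind of trend check the metrics enable (the field names match the append command; the summary shape itself is illustrative):

```bash
# Summarize the adversarial-review history (assumes jq is installed).
METRICS="$HOME/.gstack/analytics/spec-review.jsonl"
if [ -f "$METRICS" ]; then
  # -s slurps the JSONL records into one array before building the summary
  jq -s '{
    reviews: length,
    avg_quality: (map(.quality_score) | add / length),
    issues_fixed: (map(.issues_fixed) | add)
  }' "$METRICS"
fi
```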

---

Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2

@@ -324,6 +324,37 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
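
For reference, the check partially visible in the hunk header above presumably reduces to something like this; `SLUG` and `BRANCH` are set earlier in the skill, and the exact success-message wording is an assumption:

```bash
# Sketch of the design-doc lookup. SLUG and BRANCH come from earlier steps;
# only the "No design doc found" branch is confirmed by the surrounding text.
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
if [ -n "$DESIGN" ]; then
  echo "DESIGN DOC: $DESIGN"
else
  echo "No design doc found"
fi
```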

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
articulate the problem, keeps changing the problem statement, answers with "I'm not
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:

> "It sounds like you're still figuring out what to build — that's totally fine, but
> that's what /office-hours is designed for. Want to pause this review and run
> /office-hours first? It'll help you nail down the problem and approach, then come
> back here for the strategic review."

Options: A) Yes, run /office-hours first. B) No, keep going.
If they keep going, proceed normally — no guilt, no re-asking.

When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan

@@ -467,6 +498,70 @@ Repo: {owner/repo}

Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.

After writing the CEO plan, run the spec review loop on it:

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.

### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```

@@ -269,6 +269,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?

@@ -1,5 +1,14 @@
# Changelog

## [0.9.1.0] - 2026-03-20 — Adversarial Spec Review + Skill Chaining

### Added

- **Your design docs now get stress-tested before you see them.** When you run `/office-hours`, an independent AI reviewer checks your design doc for completeness, consistency, clarity, scope creep, and feasibility — up to 3 rounds. You get a quality score (1-10) and a summary of what was caught and fixed. The doc you approve has already survived adversarial review.
- **Visual wireframes during brainstorming.** For UI ideas, `/office-hours` now generates a rough HTML wireframe using your project's design system (from DESIGN.md) and screenshots it. You see what you're designing while you're still thinking, not after you've coded it.
- **Skills help each other now.** `/plan-ceo-review` and `/plan-eng-review` detect when you'd benefit from running `/office-hours` first and offer it — one-tap to switch, one-tap to decline. If you seem lost during a CEO review, it'll gently suggest brainstorming first.
- **Spec review metrics.** Every adversarial review logs iterations, issues found/fixed, and quality score to `~/.gstack/analytics/spec-review.jsonl`. Over time, you can see if your design docs are getting better.

## [0.9.0.1] - 2026-03-19

### Changed

@@ -227,6 +227,25 @@ success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.

## SETUP (run this check BEFORE any browse command)

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

# YC Office Hours

You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.

@@ -491,6 +510,66 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac

---

## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
   system constraints (colors, typography, spacing, component patterns). Use these
   constraints in the wireframe.
2. Apply core design principles:
   - **Information hierarchy** — what does the user see first, second, third?
   - **Interaction states** — loading, empty, error, success, partial
   - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
   - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
   - **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
  hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
  matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
```bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
```

**Step 3: Render and capture**

```bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
```

If `$B` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
(`/plan-design-review`, `/design-review`) to see what was originally envisioned.

---

## Phase 4.5: Founder Signal Synthesis

Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).

@@ -627,7 +706,73 @@ Supersedes: {prior filename — omit this line if first design on this branch}

{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```

Present the design doc to the user via AskUserQuestion:
---

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.

---

Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2

@@ -23,6 +23,8 @@ allowed-tools:

{{PREAMBLE}}

{{BROWSE_SETUP}}

# YC Office Hours

You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.

@@ -287,6 +289,10 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac

---

{{DESIGN_SKETCH}}

---

## Phase 4.5: Founder Signal Synthesis

Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).

@@ -423,7 +429,13 @@ Supersedes: {prior filename — omit this line if first design on this branch}

{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```

Present the design doc to the user via AskUserQuestion:
---

{{SPEC_REVIEW_LOOP}}

---

Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2

@@ -10,6 +10,7 @@ description: |
  or "is this ambitious enough".
  Proactively suggest when the user is questioning scope or ambition of a plan,
  or when the plan feels like it could be thinking bigger.
benefits-from: [office-hours]
allowed-tools:
  - Read
  - Grep
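
The actual `benefits-from` parsing lives in gen-skill-docs.ts. Purely as an illustration of the frontmatter shape, a shell one-liner can pull the list out of a skill file (the filename SKILL.md is an assumption here):

```bash
# Illustrative only: extract the benefits-from list from a frontmatter
# line like "benefits-from: [office-hours]".
if [ -f SKILL.md ]; then
  sed -n 's/^benefits-from: *\[\(.*\)\].*/\1/p' SKILL.md
fi
```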

@@ -331,6 +332,37 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
articulate the problem, keeps changing the problem statement, answers with "I'm not
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:

> "It sounds like you're still figuring out what to build — that's totally fine, but
> that's what /office-hours is designed for. Want to pause this review and run
> /office-hours first? It'll help you nail down the problem and approach, then come
> back here for the strategic review."

Options: A) Yes, run /office-hours first. B) No, keep going.
If they keep going, proceed normally — no guilt, no re-asking.

When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ -474,6 +506,70 @@ Repo: {owner/repo}

Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.

After writing the CEO plan, run the spec review loop on it:

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
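The control flow described above (maximum 3 iterations, PASS exit, convergence guard on repeated issues) can be sketched in TypeScript. This is a hedged illustration only: `dispatchReviewer` and `applyFixes` are hypothetical stand-ins for the Agent-tool dispatch and the Edit-tool fixes, not APIs from this repo.

```typescript
// Sketch of the spec-review loop control flow. Assumptions:
// dispatchReviewer() represents one subagent run, applyFixes()
// represents editing the document on disk.
type Review = { score: number; issues: string[] };

function runReviewLoop(
  dispatchReviewer: () => Review,
  applyFixes: (issues: string[]) => void,
): Review {
  let prevIssues: string[] = [];
  let review: Review = { score: 0, issues: [] };
  for (let i = 0; i < 3; i++) {           // maximum 3 iterations total
    review = dispatchReviewer();
    if (review.issues.length === 0) return review; // PASS
    const same =
      review.issues.length === prevIssues.length &&
      review.issues.every((iss, idx) => iss === prevIssues[idx]);
    if (same) return review; // convergence guard: persist as Reviewer Concerns
    applyFixes(review.issues);
    prevIssues = review.issues;
  }
  return review; // max iterations reached; remaining issues persist
}
```

Note that the guard compares consecutive reviews only, so a reviewer that alternates between two issue sets still terminates via the iteration cap.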

### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```

@@ -10,6 +10,7 @@ description: |
  or "is this ambitious enough".
  Proactively suggest when the user is questioning scope or ambition of a plan,
  or when the plan feels like it could be thinking bigger.
benefits-from: [office-hours]
allowed-tools:
- Read
- Grep

@@ -110,6 +111,20 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.

{{BENEFITS_FROM}}

**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
articulate the problem, keeps changing the problem statement, answers with "I'm not
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:

> "It sounds like you're still figuring out what to build — that's totally fine, but
> that's what /office-hours is designed for. Want to pause this review and run
> /office-hours first? It'll help you nail down the problem and approach, then come
> back here for the strategic review."

Options: A) Yes, run /office-hours first. B) No, keep going.
If they keep going, proceed normally — no guilt, no re-asking.

When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan

@@ -253,6 +268,10 @@ Repo: {owner/repo}

Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.

After writing the CEO plan, run the spec review loop on it:

{{SPEC_REVIEW_LOOP}}

### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
@@ -8,6 +8,7 @@ description: |
  "review the architecture", "engineering review", or "lock in the plan".
  Proactively suggest when the user has a plan or design doc and is about to
  start coding — to catch architecture issues before implementation.
benefits-from: [office-hours]
allowed-tools:
- Read
- Write

@@ -277,6 +278,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?

@@ -8,6 +8,7 @@ description: |
  "review the architecture", "engineering review", or "lock in the plan".
  Proactively suggest when the user has a plan or design doc and is about to
  start coding — to catch architecture issues before implementation.
benefits-from: [office-hours]
allowed-tools:
- Read
- Write

@@ -73,6 +74,8 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.

{{BENEFITS_FROM}}

### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
@@ -55,6 +55,7 @@ const HOST_PATHS: Record<Host, HostPaths> = {
interface TemplateContext {
  skillName: string;
  tmplPath: string;
  benefitsFrom?: string[];
  host: Host;
  paths: HostPaths;
}

@@ -1261,6 +1262,156 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct
---`;
}
function generateSpecReviewLoop(ctx: TemplateContext): string {
  return `## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
\`\`\`bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"${ctx.skillName}","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
\`\`\`
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.`;
}
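Each review run appends one self-describing JSON object per line to `spec-review.jsonl`. A hedged sketch of a consumer for that file follows; the interface fields mirror the keys written by the generated `echo` line, but the reader itself is illustrative and not part of this PR.

```typescript
import * as fs from 'node:fs';

// Shape of one line in ~/.gstack/analytics/spec-review.jsonl,
// mirroring the keys in the generated echo command.
interface SpecReviewMetric {
  skill: string;
  ts: string;
  iterations: number;
  issues_found: number;
  issues_fixed: number;
  remaining: number;
  quality_score: number;
}

// JSONL: parse each non-empty line independently, skipping lines
// that fail to parse (e.g. a half-written line from a crashed run).
function readSpecReviewMetrics(file: string): SpecReviewMetric[] {
  if (!fs.existsSync(file)) return [];
  const out: SpecReviewMetric[] = [];
  for (const line of fs.readFileSync(file, 'utf-8').split('\n')) {
    if (!line.trim()) continue;
    try {
      out.push(JSON.parse(line) as SpecReviewMetric);
    } catch {
      // skip malformed line rather than failing the whole read
    }
  }
  return out;
}
```

Parsing line by line, rather than the whole file at once, is what makes the append-only `>>` write in the generated snippet safe: a torn final line costs one record, not the file.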

function generateBenefitsFrom(ctx: TemplateContext): string {
  if (!ctx.benefitsFrom || ctx.benefitsFrom.length === 0) return '';

  const skillList = ctx.benefitsFrom.map(s => `\`/${s}\``).join(' or ');
  const first = ctx.benefitsFrom[0];

  return `## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. ${skillList} produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /${first} first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/${first} first next time." Then proceed normally. Do not re-offer later in the session.`;
}

function generateDesignSketch(_ctx: TemplateContext): string {
  return `## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if \`DESIGN.md\` exists in the repo root. If it does, read it for design
   system constraints (colors, typography, spacing, component patterns). Use these
   constraints in the wireframe.
2. Apply core design principles:
   - **Information hierarchy** — what does the user see first, second, third?
   - **Interaction states** — loading, empty, error, success, partial
   - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
   - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
   - **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
  hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
  matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
\`\`\`bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
\`\`\`

**Step 3: Render and capture**

\`\`\`bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
\`\`\`

If \`$B\` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstream skills
(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.`;
}

const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  COMMAND_REFERENCE: generateCommandReference,
  SNAPSHOT_FLAGS: generateSnapshotFlags,

@@ -1272,6 +1423,9 @@ const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  DESIGN_REVIEW_LITE: generateDesignReviewLite,
  REVIEW_DASHBOARD: generateReviewDashboard,
  TEST_BOOTSTRAP: generateTestBootstrap,
  SPEC_REVIEW_LOOP: generateSpecReviewLoop,
  DESIGN_SKETCH: generateDesignSketch,
  BENEFITS_FROM: generateBenefitsFrom,
};

// ─── Codex Helpers ───────────────────────────────────────────

@@ -1394,7 +1548,14 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
  // Extract skill name from frontmatter for TemplateContext
  const nameMatch = tmplContent.match(/^name:\s*(.+)$/m);
  const skillName = nameMatch ? nameMatch[1].trim() : path.basename(path.dirname(tmplPath));
  const ctx: TemplateContext = { skillName, tmplPath, host, paths: HOST_PATHS[host] };

  // Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
  const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
  const benefitsFrom = benefitsMatch
    ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
    : undefined;

  const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host] };

  // Replace placeholders
  let content = tmplContent.replace(/\{\{(\w+)\}\}/g, (match, name) => {
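The frontmatter parsing in this hunk recognizes only the inline flow form `benefits-from: [a, b]`, not a block-style YAML list. A standalone sketch of the regex's behavior, using an illustrative frontmatter sample:

```typescript
// Illustrative frontmatter; only the inline list form matches —
// a block-style list (one "- item" per line) would yield undefined.
const tmplContent = [
  '---',
  'name: plan-ceo-review',
  'benefits-from: [office-hours, design-review]',
  '---',
].join('\n');

// Same regex and post-processing as in processTemplate above.
const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
const benefitsFrom = benefitsMatch
  ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
  : undefined;

console.log(benefitsFrom); // → [ 'office-hours', 'design-review' ]
```

The `.filter(Boolean)` step means an empty list (`benefits-from: []`) parses to `[]`, which `generateBenefitsFrom` treats the same as the field being absent.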

@@ -416,6 +416,98 @@ describe('REVIEW_DASHBOARD resolver', () => {
  });
});

// --- {{SPEC_REVIEW_LOOP}} resolver tests ---

describe('SPEC_REVIEW_LOOP resolver', () => {
  const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');

  test('contains all 5 review dimensions', () => {
    for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
      expect(content).toContain(dim);
    }
  });

  test('references Agent tool for subagent dispatch', () => {
    expect(content).toMatch(/Agent.*tool/i);
  });

  test('specifies max 3 iterations', () => {
    expect(content).toMatch(/3.*iteration|maximum.*3/i);
  });

  test('includes quality score', () => {
    expect(content).toContain('quality score');
  });

  test('includes metrics path', () => {
    expect(content).toContain('spec-review.jsonl');
  });

  test('includes convergence guard', () => {
    expect(content).toMatch(/[Cc]onvergence/);
  });

  test('includes graceful failure handling', () => {
    expect(content).toMatch(/skip.*review|unavailable/i);
  });
});

// --- {{DESIGN_SKETCH}} resolver tests ---

describe('DESIGN_SKETCH resolver', () => {
  const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');

  test('references DESIGN.md for design system constraints', () => {
    expect(content).toContain('DESIGN.md');
  });

  test('contains wireframe or sketch terminology', () => {
    expect(content).toMatch(/wireframe|sketch/i);
  });

  test('references browse binary for rendering', () => {
    expect(content).toContain('$B goto');
  });

  test('references screenshot capture', () => {
    expect(content).toContain('$B screenshot');
  });

  test('specifies rough aesthetic', () => {
    expect(content).toMatch(/[Rr]ough|hand-drawn/);
  });

  test('includes skip conditions', () => {
    expect(content).toMatch(/no UI component|skip/i);
  });
});

// --- {{BENEFITS_FROM}} resolver tests ---

describe('BENEFITS_FROM resolver', () => {
  const ceoContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
  const engContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');

  test('plan-ceo-review contains prerequisite skill offer', () => {
    expect(ceoContent).toContain('Prerequisite Skill Offer');
    expect(ceoContent).toContain('/office-hours');
  });

  test('plan-eng-review contains prerequisite skill offer', () => {
    expect(engContent).toContain('Prerequisite Skill Offer');
    expect(engContent).toContain('/office-hours');
  });

  test('offer includes graceful decline', () => {
    expect(ceoContent).toContain('No worries');
  });

  test('skills without benefits-from do NOT have prerequisite offer', () => {
    const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    expect(qaContent).not.toContain('Prerequisite Skill Offer');
  });
});
// ─── Codex Generation Tests ─────────────────────────────────

describe('Codex generation (--host codex)', () => {

@@ -57,9 +57,13 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'review-base-branch': ['review/**'],
  'review-design-lite': ['review/**', 'test/fixtures/review-eval-design-slop.*'],

  // Office Hours
  'office-hours-spec-review': ['office-hours/**', 'scripts/gen-skill-docs.ts'],

  // Plan reviews
  'plan-ceo-review': ['plan-ceo-review/**'],
  'plan-ceo-review-selective': ['plan-ceo-review/**'],
  'plan-ceo-review-benefits': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'],
  'plan-eng-review': ['plan-eng-review/**'],
  'plan-eng-review-artifact': ['plan-eng-review/**'],

@@ -140,6 +144,10 @@ export const LLM_JUDGE_TOUCHFILES: Record<string, string[]> = {
  'design-review/SKILL.md fix loop': ['design-review/SKILL.md', 'design-review/SKILL.md.tmpl'],
  'design-consultation/SKILL.md research': ['design-consultation/SKILL.md', 'design-consultation/SKILL.md.tmpl'],

  // Office Hours
  'office-hours/SKILL.md spec review': ['office-hours/SKILL.md', 'office-hours/SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'office-hours/SKILL.md design sketch': ['office-hours/SKILL.md', 'office-hours/SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],

  // Other skills
  'retro/SKILL.md instructions': ['retro/SKILL.md', 'retro/SKILL.md.tmpl'],
  'qa-only/SKILL.md workflow': ['qa-only/SKILL.md', 'qa-only/SKILL.md.tmpl'],

@@ -2911,6 +2911,128 @@ Write the full output (including the GATE verdict) to ${codexDir}/codex-output.m
  }, 360_000);
});

// --- Office Hours Spec Review E2E ---

describeIfSelected('Office Hours Spec Review E2E', ['office-hours-spec-review'], () => {
  let ohDir: string;

  beforeAll(() => {
    ohDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-oh-spec-'));
    const run = (cmd: string, args: string[]) =>
      spawnSync(cmd, args, { cwd: ohDir, stdio: 'pipe', timeout: 5000 });

    run('git', ['init', '-b', 'main']);
    run('git', ['config', 'user.email', 'test@test.com']);
    run('git', ['config', 'user.name', 'Test']);
    fs.writeFileSync(path.join(ohDir, 'README.md'), '# Test Project\n');
    run('git', ['add', '.']);
    run('git', ['commit', '-m', 'init']);

    // Copy office-hours skill
    fs.mkdirSync(path.join(ohDir, 'office-hours'), { recursive: true });
    fs.copyFileSync(
      path.join(ROOT, 'office-hours', 'SKILL.md'),
      path.join(ohDir, 'office-hours', 'SKILL.md'),
    );
  });

  afterAll(() => {
    try { fs.rmSync(ohDir, { recursive: true, force: true }); } catch {}
  });

  test('/office-hours SKILL.md contains spec review loop', async () => {
    const result = await runSkillTest({
      prompt: `Read office-hours/SKILL.md. I want to understand the spec review loop.

Summarize what the "Spec Review Loop" section does — specifically:
1. How many dimensions does the reviewer check?
2. What tool is used to dispatch the reviewer?
3. What's the maximum number of iterations?
4. What metrics are tracked?

Write your summary to ${ohDir}/spec-review-summary.md`,
      workingDirectory: ohDir,
      maxTurns: 8,
      timeout: 120_000,
      testName: 'office-hours-spec-review',
      runId,
    });

    logCost('/office-hours spec review', result);
    recordE2E('/office-hours-spec-review', 'Office Hours Spec Review E2E', result);
    expect(result.exitReason).toBe('success');

    const summaryPath = path.join(ohDir, 'spec-review-summary.md');
    if (fs.existsSync(summaryPath)) {
      const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
      // Verify the agent understood the key concepts
      expect(summary).toMatch(/5.*dimension|dimension.*5|completeness|consistency|clarity|scope|feasibility/);
      expect(summary).toMatch(/agent|subagent/);
      expect(summary).toMatch(/3.*iteration|iteration.*3|maximum.*3/);
    }
  }, 180_000);
});

// --- Plan CEO Review Benefits-From E2E ---

describeIfSelected('Plan CEO Review Benefits-From E2E', ['plan-ceo-review-benefits'], () => {
  let benefitsDir: string;

  beforeAll(() => {
    benefitsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-benefits-'));
    const run = (cmd: string, args: string[]) =>
      spawnSync(cmd, args, { cwd: benefitsDir, stdio: 'pipe', timeout: 5000 });

    run('git', ['init', '-b', 'main']);
    run('git', ['config', 'user.email', 'test@test.com']);
    run('git', ['config', 'user.name', 'Test']);
    fs.writeFileSync(path.join(benefitsDir, 'README.md'), '# Test Project\n');
    run('git', ['add', '.']);
    run('git', ['commit', '-m', 'init']);

    // Copy plan-ceo-review skill
    fs.mkdirSync(path.join(benefitsDir, 'plan-ceo-review'), { recursive: true });
    fs.copyFileSync(
      path.join(ROOT, 'plan-ceo-review', 'SKILL.md'),
      path.join(benefitsDir, 'plan-ceo-review', 'SKILL.md'),
    );
  });

  afterAll(() => {
    try { fs.rmSync(benefitsDir, { recursive: true, force: true }); } catch {}
  });

  test('/plan-ceo-review SKILL.md contains prerequisite skill offer', async () => {
    const result = await runSkillTest({
      prompt: `Read plan-ceo-review/SKILL.md. Search for sections about "Prerequisite" or "office-hours" or "design doc found".

Summarize what happens when no design doc is found — specifically:
1. Is /office-hours offered as a prerequisite?
2. What options does the user get?
3. Is there a mid-session detection for when the user seems lost?

Write your summary to ${benefitsDir}/benefits-summary.md`,
      workingDirectory: benefitsDir,
      maxTurns: 8,
      timeout: 120_000,
      testName: 'plan-ceo-review-benefits',
      runId,
    });

    logCost('/plan-ceo-review benefits-from', result);
    recordE2E('/plan-ceo-review-benefits', 'Plan CEO Review Benefits-From E2E', result);
    expect(result.exitReason).toBe('success');

    const summaryPath = path.join(benefitsDir, 'benefits-summary.md');
    if (fs.existsSync(summaryPath)) {
      const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
      // Verify the agent understood the skill chaining
      expect(summary).toMatch(/office.hours/);
      expect(summary).toMatch(/design doc|no design/i);
    }
  }, 180_000);
});

// Module-level afterAll — finalize eval collector after all tests complete
afterAll(async () => {
  if (evalCollector) {

@@ -644,6 +644,59 @@ describe('office-hours skill structure', () => {
  test('contains builder operating principles', () => {
    expect(content).toContain('Delight is the currency');
  });

  // Spec Review Loop (Phase 5.5)
  test('contains spec review loop', () => {
    expect(content).toContain('Spec Review Loop');
  });

  test('contains adversarial review dimensions', () => {
    for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
      expect(content).toContain(dim);
    }
  });

  test('contains subagent dispatch instruction', () => {
    expect(content).toMatch(/Agent.*tool|subagent/i);
  });

  test('contains max 3 iterations', () => {
    expect(content).toMatch(/3.*iteration|maximum.*3/i);
  });

  test('contains quality score', () => {
    expect(content).toContain('quality score');
  });

  test('contains spec review metrics path', () => {
    expect(content).toContain('spec-review.jsonl');
  });

  test('contains convergence guard', () => {
    expect(content).toMatch(/convergence/i);
  });

  // Visual Sketch (Phase 4.5)
  test('contains visual sketch section', () => {
    expect(content).toContain('Visual Sketch');
  });

  test('contains wireframe generation', () => {
    expect(content).toMatch(/wireframe|sketch/i);
  });

  test('contains DESIGN.md awareness', () => {
    expect(content).toContain('DESIGN.md');
  });

  test('contains browse rendering', () => {
    expect(content).toContain('$B goto');
    expect(content).toContain('$B screenshot');
  });

  test('contains rough aesthetic instruction', () => {
    expect(content).toMatch(/rough|hand-drawn/i);
  });
});

describe('investigate skill structure', () => {

@@ -856,6 +909,22 @@ describe('CEO review mode validation', () => {
    expect(content).toContain('HOLD SCOPE');
    expect(content).toContain('REDUCTION');
  });

  // Skill chaining (benefits-from)
  test('contains prerequisite skill offer for office-hours', () => {
    expect(content).toContain('Prerequisite Skill Offer');
    expect(content).toContain('/office-hours');
  });

  test('contains mid-session detection', () => {
    expect(content).toContain('Mid-session detection');
    expect(content).toMatch(/still figuring out|seems lost/i);
  });

  // Spec review on CEO plans
  test('contains spec review loop for CEO plan documents', () => {
    expect(content).toContain('Spec Review Loop');
  });
});

// --- gstack-slug helper ---