mirror of https://github.com/garrytan/gstack.git
synced 2026-05-05 21:25:27 +02:00
merge: resolve conflicts with origin/main (v0.9.1.0 → v0.9.1)
Integrated office-hours spec review, visual sketch, skill chaining (benefits-from), and plan-ceo-review benefits E2E from main with our deploy skills. Updated touchfiles test for new plan-ceo-review-benefits entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -218,6 +218,25 @@ success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.

## SETUP (run this check BEFORE any browse command)

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

# YC Office Hours

You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ -482,6 +501,66 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac

---

## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
   system constraints (colors, typography, spacing, component patterns). Use these
   constraints in the wireframe.
2. Apply core design principles:
   - **Information hierarchy** — what does the user see first, second, third?
   - **Interaction states** — loading, empty, error, success, partial
   - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
   - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
   - **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
  hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
  matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
```bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
```

**Step 3: Render and capture**

```bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
```

If `$B` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."
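The availability check and the render commands above can be folded into one guarded step. A minimal sketch, assuming `$B` and `$SKETCH_FILE` were set by the earlier setup and temp-file steps (`RENDERED` is an illustrative variable name, not part of the skill):

```shell
# Render the sketch only when the browse binary is actually available.
if [ -n "$B" ] && [ -x "$B" ]; then
  "$B" goto "file://$SKETCH_FILE"
  "$B" screenshot /tmp/gstack-sketch.png
  RENDERED=yes
else
  # Fall back gracefully; the sketch HTML still exists on disk for later.
  RENDERED=no
fi
echo "rendered: $RENDERED"
```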

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
(`/plan-design-review`, `/design-review`) to see what was originally envisioned.

---

## Phase 4.5: Founder Signal Synthesis

Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ -618,7 +697,73 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```

Present the design doc to the user via AskUserQuestion:
---

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
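As a sanity check on the format, here is what one appended line might look like and how a value can be pulled back out with POSIX tools. The values are made up for illustration:

```shell
# Hypothetical example of one logged metrics line (all values invented):
LINE='{"skill":"office-hours","ts":"2026-03-20T12:00:00Z","iterations":2,"issues_found":3,"issues_fixed":3,"remaining":0,"quality_score":9}'
# Extract quality_score with sed, avoiding a jq dependency:
SCORE=$(printf '%s\n' "$LINE" | sed -n 's/.*"quality_score":\([0-9][0-9]*\).*/\1/p')
echo "$SCORE"
```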

---

Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2

@@ -324,6 +324,37 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
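The branch on the lookup result can be sketched as follows, assuming `$DESIGN` was set by the `ls -t` lookup above and is empty when nothing matched (`FOUND` is an illustrative variable name):

```shell
# Decide between the design-doc path and the "no doc" path.
if [ -n "$DESIGN" ] && [ -f "$DESIGN" ]; then
  FOUND=yes
  # A Supersedes: field marks this as a revised design.
  grep -q '^Supersedes:' "$DESIGN" && echo "revised design"
else
  FOUND=no
  echo "No design doc found"
fi
```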

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
articulate the problem, keeps changing the problem statement, answers with "I'm not
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:

> "It sounds like you're still figuring out what to build — that's totally fine, but
> that's what /office-hours is designed for. Want to pause this review and run
> /office-hours first? It'll help you nail down the problem and approach, then come
> back here for the strategic review."

Options: A) Yes, run /office-hours first. B) No, keep going.
If they keep going, proceed normally — no guilt, no re-asking.

When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ -467,6 +498,70 @@ Repo: {owner/repo}

Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.
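One way to derive the slug and date mechanically, shown here as a sketch with a made-up plan title ("User Dashboard" is hypothetical, not from the skill):

```shell
# Lowercase the title, collapse non-alphanumeric runs into single hyphens.
TITLE="User Dashboard"
SLUG=$(printf '%s' "$TITLE" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
DATE=$(date +%Y-%m-%d)
echo "$SLUG"
```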

After writing the CEO plan, run the spec review loop on it:

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.

### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```

@@ -269,6 +269,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?

+9 -2
@@ -16,6 +16,15 @@
- Incorporated canary monitoring and benchmark patterns from community PR #151 (HMAKT99).
- 3 new skills registered across gen-skill-docs.ts, skill-check.ts, skill-validation, and gen-skill-docs tests.

## [0.9.1.0] - 2026-03-20 — Adversarial Spec Review + Skill Chaining

### Added

- **Your design docs now get stress-tested before you see them.** When you run `/office-hours`, an independent AI reviewer checks your design doc for completeness, consistency, clarity, scope creep, and feasibility — up to 3 rounds. You get a quality score (1-10) and a summary of what was caught and fixed. The doc you approve has already survived adversarial review.
- **Visual wireframes during brainstorming.** For UI ideas, `/office-hours` now generates a rough HTML wireframe using your project's design system (from DESIGN.md) and screenshots it. You see what you're designing while you're still thinking, not after you've coded it.
- **Skills help each other now.** `/plan-ceo-review` and `/plan-eng-review` detect when you'd benefit from running `/office-hours` first and offer it — one-tap to switch, one-tap to decline. If you seem lost during a CEO review, it'll gently suggest brainstorming first.
- **Spec review metrics.** Every adversarial review logs iterations, issues found/fixed, and quality score to `~/.gstack/analytics/spec-review.jsonl`. Over time, you can see if your design docs are getting better.

## [0.9.0.1] - 2026-03-19

### Changed
@@ -87,7 +96,6 @@
- **`/debug` renamed to `/investigate`.** Claude Code has a built-in `/debug` command that shadowed the gstack skill. The systematic root-cause debugging workflow now lives at `/investigate`. (Closes #190)
- **Shell injection surface removed.** All skill templates now use `source <(gstack-slug)` instead of `eval $(gstack-slug)`. Same behavior, no `eval`. (Closes #133)
- **25 new security tests.** URL validation (16 tests) and path traversal validation (14 tests) now have dedicated unit test suites covering scheme blocking, metadata IP blocking, directory escapes, and prefix collision edge cases.
>>>>>>> origin/main

## [0.8.2] - 2026-03-19

@@ -169,7 +177,6 @@ Both modes write a design doc that feeds directly into `/plan-ceo-review` and `/
**`/debug` — find the root cause, not the symptom.**

When something is broken and you don't know why, `/debug` is your systematic debugger. It follows the Iron Law: no fixes without root cause investigation first. Traces data flow, matches against known bug patterns (race conditions, nil propagation, stale cache, config drift), and tests hypotheses one at a time. If 3 fixes fail, it stops and questions the architecture instead of thrashing.
>>>>>>> origin/main

## [0.6.4.1] - 2026-03-18

+146 -1
@@ -227,6 +227,25 @@ success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.

## SETUP (run this check BEFORE any browse command)

```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

# YC Office Hours

You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ -491,6 +510,66 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac

---

## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
   system constraints (colors, typography, spacing, component patterns). Use these
   constraints in the wireframe.
2. Apply core design principles:
   - **Information hierarchy** — what does the user see first, second, third?
   - **Interaction states** — loading, empty, error, success, partial
   - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
   - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
   - **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
  hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
  matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
```bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
```

**Step 3: Render and capture**

```bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
```

If `$B` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
(`/plan-design-review`, `/design-review`) to see what was originally envisioned.

---

## Phase 4.5: Founder Signal Synthesis

Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ -627,7 +706,73 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```

Present the design doc to the user via AskUserQuestion:
---

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.

---

Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2

@@ -23,6 +23,8 @@ allowed-tools:

{{PREAMBLE}}

{{BROWSE_SETUP}}

# YC Office Hours

You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ -287,6 +289,10 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac

---

{{DESIGN_SKETCH}}

---

## Phase 4.5: Founder Signal Synthesis

Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ -423,7 +429,13 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```

Present the design doc to the user via AskUserQuestion:
---

{{SPEC_REVIEW_LOOP}}

---

Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2

@@ -10,6 +10,7 @@ description: |
  or "is this ambitious enough".
  Proactively suggest when the user is questioning scope or ambition of a plan,
  or when the plan feels like it could be thinking bigger.
benefits-from: [office-hours]
allowed-tools:
- Read
- Grep
@@ -331,6 +332,37 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
|
||||
|
||||
## Prerequisite Skill Offer
|
||||
|
||||
When the design doc check above prints "No design doc found," offer the prerequisite
|
||||
skill before proceeding.
|
||||
|
||||
Say to the user via AskUserQuestion:
|
||||
|
||||
> "No design doc found for this branch. `/office-hours` produces a structured problem
|
||||
> statement, premise challenge, and explored alternatives — it gives this review much
|
||||
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
|
||||
> not per-product — it captures the thinking behind this specific change."
|
||||
|
||||
Options:
|
||||
- A) Run /office-hours first (in another window, then come back)
|
||||
- B) Skip — proceed with standard review
|
||||
|
||||
If they skip: "No worries — standard review. If you ever want sharper input, try
|
||||
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
|
||||
|
||||
**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
|
||||
articulate the problem, keeps changing the problem statement, answers with "I'm not
|
||||
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:
|
||||
|
||||
> "It sounds like you're still figuring out what to build — that's totally fine, but
|
||||
> that's what /office-hours is designed for. Want to pause this review and run
|
||||
> /office-hours first? It'll help you nail down the problem and approach, then come
|
||||
> back here for the strategic review."
|
||||
|
||||
Options: A) Yes, run /office-hours first. B) No, keep going.
|
||||
If they keep going, proceed normally — no guilt, no re-asking.
|
||||
|
||||
When reading TODOS.md, specifically:
|
||||
* Note any TODOs this plan touches, blocks, or unlocks
|
||||
* Check if deferred work from prior reviews relates to this plan
|
||||
@@ -474,6 +506,70 @@ Repo: {owner/repo}

Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.

After writing the CEO plan, run the spec review loop on it:

## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
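The metrics line is plain JSONL: one object appended per review run. As a rough sketch of the substitution (hypothetical values, and a temp directory standing in for `~/.gstack/analytics`):

```shell
# Hypothetical values standing in for the real review results
ITERATIONS=2; FOUND=4; FIXED=4; REMAINING=0; SCORE=8
# A temp dir stands in for ~/.gstack/analytics in this sketch
DIR=$(mktemp -d)
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":'$ITERATIONS',"issues_found":'$FOUND',"issues_fixed":'$FIXED',"remaining":'$REMAINING',"quality_score":'$SCORE'}' >> "$DIR/spec-review.jsonl"
cat "$DIR/spec-review.jsonl"
```

Each run appends one line, so the file accumulates a history that later tooling can aggregate.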
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
@@ -10,6 +10,7 @@ description: |
or "is this ambitious enough".
Proactively suggest when the user is questioning scope or ambition of a plan,
or when the plan feels like it could be thinking bigger.
benefits-from: [office-hours]
allowed-tools:
- Read
- Grep
@@ -110,6 +111,20 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.

{{BENEFITS_FROM}}

**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
articulate the problem, keeps changing the problem statement, answers with "I'm not
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:

> "It sounds like you're still figuring out what to build — that's totally fine, but
> that's what /office-hours is designed for. Want to pause this review and run
> /office-hours first? It'll help you nail down the problem and approach, then come
> back here for the strategic review."

Options: A) Yes, run /office-hours first. B) No, keep going.
If they keep going, proceed normally — no guilt, no re-asking.

When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ -253,6 +268,10 @@ Repo: {owner/repo}

Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.

After writing the CEO plan, run the spec review loop on it:

{{SPEC_REVIEW_LOOP}}

### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
@@ -8,6 +8,7 @@ description: |
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
benefits-from: [office-hours]
allowed-tools:
- Read
- Write
@@ -277,6 +278,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.

## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.

### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
@@ -8,6 +8,7 @@ description: |
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
benefits-from: [office-hours]
allowed-tools:
- Read
- Write
@@ -73,6 +74,8 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.

{{BENEFITS_FROM}}

### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
+162 -1
@@ -55,6 +55,7 @@ const HOST_PATHS: Record<Host, HostPaths> = {
interface TemplateContext {
  skillName: string;
  tmplPath: string;
  benefitsFrom?: string[];
  host: Host;
  paths: HostPaths;
}
@@ -1405,6 +1406,156 @@ Tell the user: "Deploy configuration saved to CLAUDE.md. Future /land-and-deploy
---`;
}
function generateSpecReviewLoop(_ctx: TemplateContext): string {
  return `## Spec Review Loop

Before presenting the document to the user for approval, run an adversarial review.

**Step 1: Dispatch reviewer subagent**

Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.

Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
  list specific issues with suggested fixes. At the end, output a quality score (1-10)
  across all dimensions."

**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?

The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix

**Step 2: Fix and re-dispatch**

If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total

**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.

If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.

**Step 3: Report and persist metrics**

After the loop completes (PASS, max iterations, or convergence guard):

1. Tell the user the result — summary by default:
   "Your doc survived N rounds of adversarial review. M issues caught and fixed.
   Quality score: X/10."
   If they ask "what did the reviewer find?", show the full reviewer output.

2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
   section to the document listing each unresolved issue. Downstream skills will see this.

3. Append metrics:
\`\`\`bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"${_ctx.skillName}","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
\`\`\`
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.`;
}
function generateBenefitsFrom(ctx: TemplateContext): string {
  if (!ctx.benefitsFrom || ctx.benefitsFrom.length === 0) return '';

  const skillList = ctx.benefitsFrom.map(s => `\`/${s}\``).join(' or ');
  const first = ctx.benefitsFrom[0];

  return `## Prerequisite Skill Offer

When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.

Say to the user via AskUserQuestion:

> "No design doc found for this branch. ${skillList} produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."

Options:
- A) Run /${first} first (in another window, then come back)
- B) Skip — proceed with standard review

If they skip: "No worries — standard review. If you ever want sharper input, try
/${first} first next time." Then proceed normally. Do not re-offer later in the session.`;
}
function generateDesignSketch(_ctx: TemplateContext): string {
  return `## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if \`DESIGN.md\` exists in the repo root. If it does, read it for design
   system constraints (colors, typography, spacing, component patterns). Use these
   constraints in the wireframe.
2. Apply core design principles:
   - **Information hierarchy** — what does the user see first, second, third?
   - **Interaction states** — loading, empty, error, success, partial
   - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
   - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
   - **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
  hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
  matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
\`\`\`bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
\`\`\`

**Step 3: Render and capture**

\`\`\`bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
\`\`\`

If \`$B\` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstream skills
(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.`;
}
const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  COMMAND_REFERENCE: generateCommandReference,
  SNAPSHOT_FLAGS: generateSnapshotFlags,
@@ -1417,6 +1568,9 @@ const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  REVIEW_DASHBOARD: generateReviewDashboard,
  TEST_BOOTSTRAP: generateTestBootstrap,
  DEPLOY_BOOTSTRAP: generateDeployBootstrap,
  SPEC_REVIEW_LOOP: generateSpecReviewLoop,
  DESIGN_SKETCH: generateDesignSketch,
  BENEFITS_FROM: generateBenefitsFrom,
};

// ─── Codex Helpers ───────────────────────────────────────────
@@ -1539,7 +1693,14 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
  // Extract skill name from frontmatter for TemplateContext
  const nameMatch = tmplContent.match(/^name:\s*(.+)$/m);
  const skillName = nameMatch ? nameMatch[1].trim() : path.basename(path.dirname(tmplPath));
  const ctx: TemplateContext = { skillName, tmplPath, host, paths: HOST_PATHS[host] };

  // Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
  const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
  const benefitsFrom = benefitsMatch
    ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
    : undefined;

  const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host] };

  // Replace placeholders
  let content = tmplContent.replace(/\{\{(\w+)\}\}/g, (match, name) => {
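The inline-YAML frontmatter parse above can be mimicked in shell to see what the regex extracts. This is a sketch with a hypothetical frontmatter line, not part of the script itself:

```shell
# Hypothetical frontmatter line, shaped like: benefits-from: [a, b]
line='benefits-from: [office-hours, design-consultation]'
# Capture the bracketed list, mirroring /^benefits-from:\s*\[([^\]]*)\]/m
inner=$(printf '%s\n' "$line" | sed -n 's/^benefits-from:[[:space:]]*\[\([^]]*\)\].*/\1/p')
# Split on commas and trim, mirroring split(',').map(trim).filter(Boolean)
printf '%s\n' "$inner" | tr ',' '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' | grep -v '^$'
```

This prints `office-hours` and `design-consultation` on separate lines, matching the `benefitsFrom` array the script builds.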
@@ -416,6 +416,98 @@ describe('REVIEW_DASHBOARD resolver', () => {
  });
});

// --- {{SPEC_REVIEW_LOOP}} resolver tests ---

describe('SPEC_REVIEW_LOOP resolver', () => {
  const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');

  test('contains all 5 review dimensions', () => {
    for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
      expect(content).toContain(dim);
    }
  });

  test('references Agent tool for subagent dispatch', () => {
    expect(content).toMatch(/Agent.*tool/i);
  });

  test('specifies max 3 iterations', () => {
    expect(content).toMatch(/3.*iteration|maximum.*3/i);
  });

  test('includes quality score', () => {
    expect(content).toContain('quality score');
  });

  test('includes metrics path', () => {
    expect(content).toContain('spec-review.jsonl');
  });

  test('includes convergence guard', () => {
    expect(content).toMatch(/[Cc]onvergence/);
  });

  test('includes graceful failure handling', () => {
    expect(content).toMatch(/skip.*review|unavailable/i);
  });
});

// --- {{DESIGN_SKETCH}} resolver tests ---

describe('DESIGN_SKETCH resolver', () => {
  const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');

  test('references DESIGN.md for design system constraints', () => {
    expect(content).toContain('DESIGN.md');
  });

  test('contains wireframe or sketch terminology', () => {
    expect(content).toMatch(/wireframe|sketch/i);
  });

  test('references browse binary for rendering', () => {
    expect(content).toContain('$B goto');
  });

  test('references screenshot capture', () => {
    expect(content).toContain('$B screenshot');
  });

  test('specifies rough aesthetic', () => {
    expect(content).toMatch(/[Rr]ough|hand-drawn/);
  });

  test('includes skip conditions', () => {
    expect(content).toMatch(/no UI component|skip/i);
  });
});

// --- {{BENEFITS_FROM}} resolver tests ---

describe('BENEFITS_FROM resolver', () => {
  const ceoContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
  const engContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');

  test('plan-ceo-review contains prerequisite skill offer', () => {
    expect(ceoContent).toContain('Prerequisite Skill Offer');
    expect(ceoContent).toContain('/office-hours');
  });

  test('plan-eng-review contains prerequisite skill offer', () => {
    expect(engContent).toContain('Prerequisite Skill Offer');
    expect(engContent).toContain('/office-hours');
  });

  test('offer includes graceful decline', () => {
    expect(ceoContent).toContain('No worries');
  });

  test('skills without benefits-from do NOT have prerequisite offer', () => {
    const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    expect(qaContent).not.toContain('Prerequisite Skill Offer');
  });
});

// ─── Codex Generation Tests ─────────────────────────────────

describe('Codex generation (--host codex)', () => {
@@ -57,9 +57,13 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'review-base-branch': ['review/**'],
  'review-design-lite': ['review/**', 'test/fixtures/review-eval-design-slop.*'],

  // Office Hours
  'office-hours-spec-review': ['office-hours/**', 'scripts/gen-skill-docs.ts'],

  // Plan reviews
  'plan-ceo-review': ['plan-ceo-review/**'],
  'plan-ceo-review-selective': ['plan-ceo-review/**'],
  'plan-ceo-review-benefits': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'],
  'plan-eng-review': ['plan-eng-review/**'],
  'plan-eng-review-artifact': ['plan-eng-review/**'],

@@ -146,6 +150,10 @@ export const LLM_JUDGE_TOUCHFILES: Record<string, string[]> = {
  'design-review/SKILL.md fix loop': ['design-review/SKILL.md', 'design-review/SKILL.md.tmpl'],
  'design-consultation/SKILL.md research': ['design-consultation/SKILL.md', 'design-consultation/SKILL.md.tmpl'],

  // Office Hours
  'office-hours/SKILL.md spec review': ['office-hours/SKILL.md', 'office-hours/SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'office-hours/SKILL.md design sketch': ['office-hours/SKILL.md', 'office-hours/SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],

  // Deploy skills
  'land-and-deploy/SKILL.md workflow': ['land-and-deploy/SKILL.md', 'land-and-deploy/SKILL.md.tmpl'],
  'canary/SKILL.md monitoring loop': ['canary/SKILL.md', 'canary/SKILL.md.tmpl'],
+119 -23
@@ -2911,6 +2911,125 @@ Write the full output (including the GATE verdict) to ${codexDir}/codex-output.m
}, 360_000);
});

// --- Office Hours Spec Review E2E ---

describeIfSelected('Office Hours Spec Review E2E', ['office-hours-spec-review'], () => {
  let ohDir: string;

  beforeAll(() => {
    ohDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-oh-spec-'));
    const run = (cmd: string, args: string[]) =>
      spawnSync(cmd, args, { cwd: ohDir, stdio: 'pipe', timeout: 5000 });

    run('git', ['init', '-b', 'main']);
    run('git', ['config', 'user.email', 'test@test.com']);
    run('git', ['config', 'user.name', 'Test']);
    fs.writeFileSync(path.join(ohDir, 'README.md'), '# Test Project\n');
    run('git', ['add', '.']);
    run('git', ['commit', '-m', 'init']);

    // Copy office-hours skill
    fs.mkdirSync(path.join(ohDir, 'office-hours'), { recursive: true });
    fs.copyFileSync(
      path.join(ROOT, 'office-hours', 'SKILL.md'),
      path.join(ohDir, 'office-hours', 'SKILL.md'),
    );
  });

  afterAll(() => {
    try { fs.rmSync(ohDir, { recursive: true, force: true }); } catch {}
  });

  test('/office-hours SKILL.md contains spec review loop', async () => {
    const result = await runSkillTest({
      prompt: `Read office-hours/SKILL.md. I want to understand the spec review loop.

Summarize what the "Spec Review Loop" section does — specifically:
1. How many dimensions does the reviewer check?
2. What tool is used to dispatch the reviewer?
3. What's the maximum number of iterations?
4. What metrics are tracked?

Write your summary to ${ohDir}/spec-review-summary.md`,
      workingDirectory: ohDir,
      maxTurns: 8,
      timeout: 120_000,
      testName: 'office-hours-spec-review',
      runId,
    });

    logCost('/office-hours spec review', result);
    recordE2E('/office-hours-spec-review', 'Office Hours Spec Review E2E', result);
    expect(result.exitReason).toBe('success');

    const summaryPath = path.join(ohDir, 'spec-review-summary.md');
    if (fs.existsSync(summaryPath)) {
      const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
      expect(summary).toMatch(/5.*dimension|dimension.*5|completeness|consistency|clarity|scope|feasibility/);
      expect(summary).toMatch(/agent|subagent/);
      expect(summary).toMatch(/3.*iteration|iteration.*3|maximum.*3/);
    }
  }, 180_000);
});
// --- Plan CEO Review Benefits-From E2E ---

describeIfSelected('Plan CEO Review Benefits-From E2E', ['plan-ceo-review-benefits'], () => {
  let benefitsDir: string;

  beforeAll(() => {
    benefitsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-benefits-'));
    const run = (cmd: string, args: string[]) =>
      spawnSync(cmd, args, { cwd: benefitsDir, stdio: 'pipe', timeout: 5000 });

    run('git', ['init', '-b', 'main']);
    run('git', ['config', 'user.email', 'test@test.com']);
    run('git', ['config', 'user.name', 'Test']);
    fs.writeFileSync(path.join(benefitsDir, 'README.md'), '# Test Project\n');
    run('git', ['add', '.']);
    run('git', ['commit', '-m', 'init']);

    fs.mkdirSync(path.join(benefitsDir, 'plan-ceo-review'), { recursive: true });
    fs.copyFileSync(
      path.join(ROOT, 'plan-ceo-review', 'SKILL.md'),
      path.join(benefitsDir, 'plan-ceo-review', 'SKILL.md'),
    );
  });

  afterAll(() => {
    try { fs.rmSync(benefitsDir, { recursive: true, force: true }); } catch {}
  });

  test('/plan-ceo-review SKILL.md contains prerequisite skill offer', async () => {
    const result = await runSkillTest({
      prompt: `Read plan-ceo-review/SKILL.md. Search for sections about "Prerequisite" or "office-hours" or "design doc found".

Summarize what happens when no design doc is found — specifically:
1. Is /office-hours offered as a prerequisite?
2. What options does the user get?
3. Is there a mid-session detection for when the user seems lost?

Write your summary to ${benefitsDir}/benefits-summary.md`,
      workingDirectory: benefitsDir,
      maxTurns: 8,
      timeout: 120_000,
      testName: 'plan-ceo-review-benefits',
      runId,
    });

    logCost('/plan-ceo-review benefits-from', result);
    recordE2E('/plan-ceo-review-benefits', 'Plan CEO Review Benefits-From E2E', result);
    expect(result.exitReason).toBe('success');

    const summaryPath = path.join(benefitsDir, 'benefits-summary.md');
    if (fs.existsSync(summaryPath)) {
      const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
      expect(summary).toMatch(/office.hours/);
      expect(summary).toMatch(/design doc|no design/i);
    }
  }, 180_000);
});
// --- Land-and-Deploy / Canary / Benchmark / Setup-Deploy E2E ---
|
||||
|
||||
describeIfSelected('Land-and-Deploy skill E2E', ['land-and-deploy-workflow'], () => {
|
||||
@@ -2918,7 +3037,6 @@ describeIfSelected('Land-and-Deploy skill E2E', ['land-and-deploy-workflow'], ()
|
||||
|
||||
beforeAll(() => {
|
||||
landDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-land-deploy-'));
|
||||
|
||||
const run = (cmd: string, args: string[]) =>
|
||||
spawnSync(cmd, args, { cwd: landDir, stdio: 'pipe', timeout: 5000 });
|
||||
|
||||
@@ -2926,19 +3044,16 @@ describeIfSelected('Land-and-Deploy skill E2E', ['land-and-deploy-workflow'], ()
|
||||
run('git', ['config', 'user.email', 'test@test.com']);
|
||||
run('git', ['config', 'user.name', 'Test']);
|
||||
|
||||
// Create initial app
|
||||
fs.writeFileSync(path.join(landDir, 'app.ts'), 'export function hello() { return "world"; }\n');
|
||||
fs.writeFileSync(path.join(landDir, 'fly.toml'), 'app = "test-app"\n\n[http_service]\n internal_port = 3000\n');
|
||||
run('git', ['add', '.']);
|
||||
run('git', ['commit', '-m', 'initial']);
|
||||
|
||||
// Create feature branch with changes
|
||||
run('git', ['checkout', '-b', 'feat/add-deploy']);
|
||||
fs.writeFileSync(path.join(landDir, 'app.ts'), 'export function hello() { return "deployed"; }\n');
|
||||
run('git', ['add', '.']);
|
||||
run('git', ['commit', '-m', 'feat: update hello']);
|
||||
|
||||
// Copy skill
|
||||
copyDirSync(path.join(ROOT, 'land-and-deploy'), path.join(landDir, 'land-and-deploy'));
|
||||
});
|
||||
|
||||
@@ -2975,7 +3090,6 @@ Do NOT use AskUserQuestion. Do NOT run gh or fly commands.`,
|
||||
recordE2E('/land-and-deploy workflow', 'Land-and-Deploy skill E2E', result);
|
||||
expect(result.exitReason).toBe('success');
|
||||
|
||||
// Verify deploy config was written to CLAUDE.md
|
||||
const claudeMd = path.join(landDir, 'CLAUDE.md');
|
||||
if (fs.existsSync(claudeMd)) {
|
||||
const content = fs.readFileSync(claudeMd, 'utf-8');
|
||||
@@ -2983,7 +3097,6 @@ Do NOT use AskUserQuestion. Do NOT run gh or fly commands.`,
|
||||
expect(hasFly).toBe(true);
|
||||
}
|
||||
|
||||
// Verify deploy report directory was created
|
||||
const reportDir = path.join(landDir, '.gstack', 'deploy-reports');
|
||||
expect(fs.existsSync(reportDir)).toBe(true);
|
||||
}, 180_000);
|
||||
@@ -2994,7 +3107,6 @@ describeIfSelected('Canary skill E2E', ['canary-workflow'], () => {
|
||||
|
||||
beforeAll(() => {
|
||||
canaryDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-canary-'));
|
||||
|
||||
const run = (cmd: string, args: string[]) =>
|
||||
spawnSync(cmd, args, { cwd: canaryDir, stdio: 'pipe', timeout: 5000 });
|
||||
|
||||
@@ -3006,7 +3118,6 @@ describeIfSelected('Canary skill E2E', ['canary-workflow'], () => {
|
||||
run('git', ['add', '.']);
|
||||
run('git', ['commit', '-m', 'initial']);
|
||||
|
||||
// Copy skill
|
||||
copyDirSync(path.join(ROOT, 'canary'), path.join(canaryDir, 'canary'));
|
||||
});

@@ -3043,10 +3154,7 @@ Just create the directory structure and report files showing the correct schema.
recordE2E('/canary workflow', 'Canary skill E2E', result);
expect(result.exitReason).toBe('success');

// Verify directory structure
expect(fs.existsSync(path.join(canaryDir, '.gstack', 'canary-reports'))).toBe(true);

// Verify baseline or report was created
const reportDir = path.join(canaryDir, '.gstack', 'canary-reports');
const files = fs.readdirSync(reportDir, { recursive: true }) as string[];
expect(files.length).toBeGreaterThan(0);
@@ -3058,7 +3166,6 @@ describeIfSelected('Benchmark skill E2E', ['benchmark-workflow'], () => {

beforeAll(() => {
benchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-benchmark-'));

const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: benchDir, stdio: 'pipe', timeout: 5000 });

@@ -3070,7 +3177,6 @@ describeIfSelected('Benchmark skill E2E', ['benchmark-workflow'], () => {
run('git', ['add', '.']);
run('git', ['commit', '-m', 'initial']);

// Copy skill
copyDirSync(path.join(ROOT, 'benchmark'), path.join(benchDir, 'benchmark'));
});

@@ -3109,10 +3215,7 @@ Just create the files showing the correct schema and report format.`,
recordE2E('/benchmark workflow', 'Benchmark skill E2E', result);
expect(result.exitReason).toBe('success');

// Verify directory structure
expect(fs.existsSync(path.join(benchDir, '.gstack', 'benchmark-reports'))).toBe(true);

// Verify baseline was created
const baselineDir = path.join(benchDir, '.gstack', 'benchmark-reports', 'baselines');
if (fs.existsSync(baselineDir)) {
const files = fs.readdirSync(baselineDir);
@@ -3126,7 +3229,6 @@ describeIfSelected('Setup-Deploy skill E2E', ['setup-deploy-workflow'], () => {

beforeAll(() => {
setupDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-setup-deploy-'));

const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: setupDir, stdio: 'pipe', timeout: 5000 });

@@ -3134,13 +3236,11 @@ describeIfSelected('Setup-Deploy skill E2E', ['setup-deploy-workflow'], () => {
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);

// Create a project with fly.toml
fs.writeFileSync(path.join(setupDir, 'app.ts'), 'export default { port: 3000 };\n');
fs.writeFileSync(path.join(setupDir, 'fly.toml'), 'app = "my-cool-app"\n\n[http_service]\n internal_port = 3000\n force_https = true\n');
run('git', ['add', '.']);
run('git', ['commit', '-m', 'initial']);

// Copy skill
copyDirSync(path.join(ROOT, 'setup-deploy'), path.join(setupDir, 'setup-deploy'));
});

@@ -3174,16 +3274,12 @@ Just detect the platform and write the config.`,
recordE2E('/setup-deploy workflow', 'Setup-Deploy skill E2E', result);
expect(result.exitReason).toBe('success');

// Verify CLAUDE.md was created with deploy config
const claudeMd = path.join(setupDir, 'CLAUDE.md');
expect(fs.existsSync(claudeMd)).toBe(true);

const content = fs.readFileSync(claudeMd, 'utf-8');
// Should mention Fly.io or fly
expect(content.toLowerCase()).toContain('fly');
// Should mention the app name
expect(content).toContain('my-cool-app');
// Should have the deploy configuration header
expect(content).toContain('Deploy Configuration');
}, 180_000);
});

@@ -652,6 +652,59 @@ describe('office-hours skill structure', () => {
test('contains builder operating principles', () => {
expect(content).toContain('Delight is the currency');
});

// Spec Review Loop (Phase 5.5)
test('contains spec review loop', () => {
expect(content).toContain('Spec Review Loop');
});

test('contains adversarial review dimensions', () => {
for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
expect(content).toContain(dim);
}
});

test('contains subagent dispatch instruction', () => {
expect(content).toMatch(/Agent.*tool|subagent/i);
});

test('contains max 3 iterations', () => {
expect(content).toMatch(/3.*iteration|maximum.*3/i);
});

test('contains quality score', () => {
expect(content).toContain('quality score');
});

test('contains spec review metrics path', () => {
expect(content).toContain('spec-review.jsonl');
});

test('contains convergence guard', () => {
expect(content).toMatch(/convergence/i);
});

// Visual Sketch (Phase 4.5)
test('contains visual sketch section', () => {
expect(content).toContain('Visual Sketch');
});

test('contains wireframe generation', () => {
expect(content).toMatch(/wireframe|sketch/i);
});

test('contains DESIGN.md awareness', () => {
expect(content).toContain('DESIGN.md');
});

test('contains browse rendering', () => {
expect(content).toContain('$B goto');
expect(content).toContain('$B screenshot');
});

test('contains rough aesthetic instruction', () => {
expect(content).toMatch(/rough|hand-drawn/i);
});
});

describe('investigate skill structure', () => {
@@ -868,6 +921,22 @@ describe('CEO review mode validation', () => {
expect(content).toContain('HOLD SCOPE');
expect(content).toContain('REDUCTION');
});

// Skill chaining (benefits-from)
test('contains prerequisite skill offer for office-hours', () => {
expect(content).toContain('Prerequisite Skill Offer');
expect(content).toContain('/office-hours');
});

test('contains mid-session detection', () => {
expect(content).toContain('Mid-session detection');
expect(content).toMatch(/still figuring out|seems lost/i);
});

// Spec review on CEO plans
test('contains spec review loop for CEO plan documents', () => {
expect(content).toContain('Spec Review Loop');
});
});

// --- gstack-slug helper ---

@@ -78,8 +78,9 @@ describe('selectTests', () => {
const result = selectTests(['plan-ceo-review/SKILL.md'], E2E_TOUCHFILES);
expect(result.selected).toContain('plan-ceo-review');
expect(result.selected).toContain('plan-ceo-review-selective');
expect(result.selected.length).toBe(2);
expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 2);
expect(result.selected).toContain('plan-ceo-review-benefits');
expect(result.selected.length).toBe(3);
expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 3);
});
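The touchfile selection asserted above can be sketched as a minimal standalone implementation. This is a hypothetical reconstruction, not the repo's actual `selectTests`; it assumes `E2E_TOUCHFILES` maps each E2E test name to path prefixes that should trigger it:

```typescript
type Touchfiles = Record<string, string[]>;

interface Selection {
  selected: string[];
  skipped: string[];
}

// Hypothetical sketch of touchfile-based test selection: a test is
// selected when any changed path falls under one of its touchfile
// prefixes; every other test is skipped.
function selectTests(changed: string[], touchfiles: Touchfiles): Selection {
  const selected: string[] = [];
  const skipped: string[] = [];
  for (const [testName, prefixes] of Object.entries(touchfiles)) {
    const hit = prefixes.some((prefix) =>
      changed.some((file) => file.startsWith(prefix)),
    );
    (hit ? selected : skipped).push(testName);
  }
  return { selected, skipped };
}

// Illustrative map only; the real E2E_TOUCHFILES lives in the repo.
const E2E_TOUCHFILES: Touchfiles = {
  'plan-ceo-review': ['plan-ceo-review/'],
  'plan-ceo-review-selective': ['plan-ceo-review/'],
  'plan-ceo-review-benefits': ['plan-ceo-review/'],
  'canary-workflow': ['canary/'],
};

const result = selectTests(['plan-ceo-review/SKILL.md'], E2E_TOUCHFILES);
console.log(result.selected); // the three plan-ceo-review entries
console.log(result.skipped);  // ['canary-workflow']
```

Under this sketch, touching `plan-ceo-review/SKILL.md` selects all three tests keyed to that skill directory, matching the updated `toBe(3)` assertion.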
test('global touchfile triggers ALL tests', () => {