Merge remote-tracking branch 'origin/garrytan/community-mode' into garrytan/persistent-docs

# Conflicts:
#	.agents/skills/gstack-browse/SKILL.md
#	.agents/skills/gstack-design-consultation/SKILL.md
#	.agents/skills/gstack-design-review/SKILL.md
#	.agents/skills/gstack-document-release/SKILL.md
#	.agents/skills/gstack-investigate/SKILL.md
#	.agents/skills/gstack-office-hours/SKILL.md
#	.agents/skills/gstack-plan-ceo-review/SKILL.md
#	.agents/skills/gstack-plan-design-review/SKILL.md
#	.agents/skills/gstack-plan-eng-review/SKILL.md
#	.agents/skills/gstack-qa-only/SKILL.md
#	.agents/skills/gstack-qa/SKILL.md
#	.agents/skills/gstack-retro/SKILL.md
#	.agents/skills/gstack-review/SKILL.md
#	.agents/skills/gstack-setup-browser-cookies/SKILL.md
#	.agents/skills/gstack-ship/SKILL.md
#	.agents/skills/gstack/SKILL.md
#	SKILL.md
#	browse/SKILL.md
#	codex/SKILL.md
#	design-consultation/SKILL.md
#	design-review/SKILL.md
#	document-release/SKILL.md
#	investigate/SKILL.md
#	office-hours/SKILL.md
#	plan-ceo-review/SKILL.md
#	plan-design-review/SKILL.md
#	plan-eng-review/SKILL.md
#	qa-only/SKILL.md
#	qa/SKILL.md
#	retro/SKILL.md
#	review/SKILL.md
#	scripts/gen-skill-docs.ts
#	setup-browser-cookies/SKILL.md
#	ship/SKILL.md
Garry Tan
2026-03-21 19:32:18 -07:00
48 changed files with 3646 additions and 3240 deletions
+69 -44
@@ -33,6 +33,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"browse","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -58,28 +64,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
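The three-way choice above can be sketched as a small shell function. This is only an illustration under stated assumptions: `telemetry_tier_for_choice` is a hypothetical helper that is not part of gstack, and treating an unrecognized answer as an opt-out is an assumption rather than documented behavior. The value it prints is what would be passed to `gstack-config set telemetry`.

```bash
# Hypothetical helper (not part of gstack): map the user's answer and the
# auth result to the telemetry tier described in the prompt above.
telemetry_tier_for_choice() {
  choice="${1:-C}"     # A, B, or C from the prompt
  auth_ok="${2:-no}"   # "yes" if email verification succeeded
  case "$choice" in
    A)
      if [ "$auth_ok" = "yes" ]; then
        echo "community"
      else
        echo "anonymous"   # failed verification falls back to anonymous
      fi
      ;;
    B) echo "anonymous" ;;
    *) echo "off" ;;       # C; unrecognized input is assumed to mean opt-out
  esac
}

telemetry_tier_for_choice A yes   # prints: community
telemetry_tier_for_choice A no    # prints: anonymous
telemetry_tier_for_choice C       # prints: off
```

Keeping the fallback inside the function means a failed verification can never leave the user on the community tier.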
Always run:
```bash
@@ -88,6 +97,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
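The gating condition for the upgrade offer can be sketched as follows. `should_offer_community_upgrade` is a hypothetical name introduced for illustration; the marker path defaults to the `~/.gstack/.community-prompted` file used above, and the second argument exists only so the check can be exercised against a scratch directory.

```bash
# Hypothetical helper (not part of gstack): succeed only when the one-time
# community upgrade prompt is due.
should_offer_community_upgrade() {
  telemetry="${1:-off}"                              # current telemetry setting
  marker="${2:-$HOME/.gstack/.community-prompted}"   # one-time marker file
  # Due only for anonymous users who have not been prompted yet.
  [ "$telemetry" = "anonymous" ] && [ ! -f "$marker" ]
}

if should_offer_community_upgrade "${_TEL:-off}"; then
  echo "OFFER_COMMUNITY: yes"
else
  echo "OFFER_COMMUNITY: no"
fi
```

Because the marker is touched whether the user answers A or B, the function returns failure on every later run.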
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -125,26 +161,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -214,10 +230,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
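The length and format limits on the error fields above could be enforced before logging with something like the sketch below. Both function names are hypothetical and not part of gstack, and the sketch only trims and reshapes text: it does not detect file paths, secrets, or PII, which still have to be left out by whoever writes the message.

```bash
# Hypothetical helpers (not part of gstack) for the limits described above.

normalize_error_message() {
  # Collapse newlines to spaces, then keep at most 200 characters.
  # Note: some cut implementations count bytes rather than characters.
  printf '%s' "$1" | tr '\n' ' ' | cut -c1-200
}

normalize_failed_step() {
  # Lowercase, turn each run of non-alphanumerics into "_", keep 30 characters.
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '_' | cut -c1-30
}

normalize_failed_step "Run Tests"   # prints: run_tests
normalize_failed_step "create-PR"   # prints: create_pr
```

The results could then be passed as `--error-message` and `--failed-step` in the logging command.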
Run this bash:
@@ -227,12 +248,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# browse: QA Testing & Dogfooding
@@ -378,7 +403,7 @@ The snapshot is your primary tool for understanding and interacting with pages.
-s <sel> --selector Scope to CSS selector
-D --diff Unified diff against previous snapshot (first call stores baseline)
-a --annotate Annotated screenshot with red overlay boxes and ref labels
-o <path> --output Output path for annotated screenshot (default: <temp>/browse-annotated.png)
-o <path> --output Output path for annotated screenshot (default: /tmp/browse-annotated.png)
-C --cursor-interactive Cursor-interactive elements (@c refs — divs with pointer, onclick)
```
@@ -34,6 +34,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"design-consultation","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -59,28 +65,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -89,6 +98,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -126,26 +162,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -215,10 +231,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -228,12 +249,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# /design-consultation: Your Design System, Built Together
@@ -343,12 +368,7 @@ If browse is not available, rely on WebSearch results and your built-in design k
**Step 3: Synthesize findings**
**Three-layer synthesis:**
- **Layer 1 (tried and true):** What design patterns does every product in this category share? These are table stakes — users expect them.
- **Layer 2 (new and popular):** What are the search results and current design discourse saying? What's trending? What new patterns are emerging?
- **Layer 3 (first principles):** Given what we know about THIS product's users and positioning — is there a reason the conventional design approach is wrong? Where should we deliberately break from the category norms?
**Eureka check:** If Layer 3 reasoning reveals a genuine design insight — a reason the category's visual language fails THIS product — name it: "EUREKA: Every [category] product does X because they assume [assumption]. But this product's users [evidence] — so we should do Y instead." Log the eureka moment (see preamble).
The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable).
Summarize conversationally:
> "I looked at what's out there. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..."
+68 -43
@@ -34,6 +34,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"design-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -59,28 +65,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -89,6 +98,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -126,26 +162,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -215,10 +231,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -228,12 +249,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# /design-review: Design Audit → Fix → Verify
+68 -43
@@ -32,6 +32,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"document-release","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -57,28 +63,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script sends a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to the anonymous tier by running `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
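The three-way mapping from answer to telemetry tier can be sketched as a small shell helper. This is only an illustration: the function name `tier_for_choice` is hypothetical and not part of gstack, and treating any unrecognized answer as `off` is an assumption. The tier names `community`, `anonymous`, and `off` come from the commands above.

```bash
# Hypothetical helper (not part of gstack): map the user's answer to a tier.
# Any answer other than A or B is treated as "off", the most conservative
# choice (an assumption made for this sketch).
tier_for_choice() {
  case "$1" in
    A) echo "community" ;;
    B) echo "anonymous" ;;
    *) echo "off" ;;
  esac
}

# Usage: show the config command that would be run for answer B.
echo "gstack-config set telemetry $(tier_for_choice B)"
```

Note that for answer A the tier should only stick once email verification succeeds, so a real implementation would probably run the auth step before, or roll back after, setting `community`.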
Always run:
```bash
@@ -87,6 +96,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the user to enter the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
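The "ask once, then never again" behaviour relies on a marker file. A minimal sketch of that pattern follows, assuming a hypothetical helper `needs_prompt` and a throwaway directory in place of `~/.gstack` so that running it has no side effects on a real install.

```bash
# Hypothetical helper (not part of gstack): succeeds only while the marker
# file does not exist yet, i.e. while the user has not been prompted.
needs_prompt() {
  [ ! -f "$1" ]
}

# Use a temporary directory so the sketch does not touch ~/.gstack.
_DEMO_DIR=$(mktemp -d)
_MARKER="$_DEMO_DIR/.community-prompted"

if needs_prompt "$_MARKER"; then
  echo "COMM_PROMPTED: no"   # first run: the offer would be shown here
  touch "$_MARKER"           # always mark as seen, whatever the answer was
fi

# Any later run finds the marker and skips the offer.
needs_prompt "$_MARKER" || echo "COMM_PROMPTED: yes"
```

Touching the marker regardless of the answer is what keeps a "Not now" from being asked again in the next session.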
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -124,26 +160,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -213,10 +229,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
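The constraints on `ERROR_CLASS` and `ERROR_MESSAGE` listed above could be enforced with two small helpers before the values are passed on. This is a sketch under assumptions: the function names `normalize_error_class` and `clamp_error_message` are hypothetical, and whether `gstack-telemetry-log` performs its own validation is not stated in the source.

```bash
# Hypothetical helper: keep the class if it is one of the allowed values,
# otherwise fall back to "unknown_error".
normalize_error_class() {
  case "$1" in
    timeout|test_failure|build_failure|git_error|auth_error|network_error) echo "$1" ;;
    browse_error|lint_error|merge_conflict|permission_error|unknown_error) echo "$1" ;;
    *) echo "unknown_error" ;;
  esac
}

# Hypothetical helper: flatten the message to one line and cap it at 200
# characters (cut -c may count bytes in some implementations).
clamp_error_message() {
  printf '%s' "$1" | tr '\n' ' ' | cut -c1-200
}

# Usage: a class outside the list falls back to unknown_error.
ERROR_CLASS=$(normalize_error_class "flaky_network")
ERROR_MESSAGE=$(clamp_error_message "bun test: 3 tests failed")
echo "class=$ERROR_CLASS message=$ERROR_MESSAGE"
```

These helpers only cover the format rules. Keeping file paths, secrets, and PII out of the message still has to be done when the summary is written.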
Run this bash:
@@ -226,12 +247,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
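The defaulting rules for the three error fields can be written down as a short shell function. This is a sketch, not gstack code: the helper name `fill_error_fields` and its argument order are assumptions made for illustration, while the variable names and default values follow the text above.

```bash
# Hypothetical helper: set ERROR_CLASS, ERROR_MESSAGE and FAILED_STEP
# according to the rules above.
# Arguments: outcome, error class, error message, failed step.
fill_error_fields() {
  if [ "$1" != "error" ]; then
    # Outcome is not an error: all three fields are empty strings.
    ERROR_CLASS=""; ERROR_MESSAGE=""; FAILED_STEP=""
  else
    # Outcome is an error: missing details default to
    # "unknown_error", "" and "" respectively.
    ERROR_CLASS="${2:-unknown_error}"
    ERROR_MESSAGE="${3:-}"
    FAILED_STEP="${4:-}"
  fi
}

# Usage: an error with no known details.
fill_error_fields "error"
echo "class=$ERROR_CLASS message=$ERROR_MESSAGE step=$FAILED_STEP"
```

The resulting variables would then be substituted for the `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP` placeholders in the telemetry command.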
## Step 0: Detect base branch
+69 -50
@@ -35,6 +35,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"investigate","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -60,28 +66,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script sends a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to the anonymous tier by running `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -90,6 +99,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the user to enter the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -127,26 +163,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -216,10 +232,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -229,12 +250,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# Systematic Debugging
@@ -309,12 +334,6 @@ Also check:
- `TODOS.md` for related known issues
- `git log` for prior fixes in the same area — **recurring bugs in the same files are an architectural smell**, not a coincidence
**External pattern search:** If the bug doesn't match a known pattern above, WebSearch for:
- "{framework} {generic error type}" — **sanitize first:** strip hostnames, IPs, file paths, SQL, customer data. Search the error category, not the raw message.
- "{library} {component} known issues"
If WebSearch is unavailable, skip this search and proceed with hypothesis testing. If a documented solution or known dependency bug surfaces, present it as a candidate hypothesis in Phase 3.
---
## Phase 3: Hypothesis Testing
@@ -323,7 +342,7 @@ Before writing ANY fix, verify your hypothesis.
1. **Confirm the hypothesis:** Add a temporary log statement, assertion, or debug output at the suspected root cause. Run the reproduction. Does the evidence match?
2. **If the hypothesis is wrong:** Before forming the next hypothesis, consider searching for the error. **Sanitize first** — strip hostnames, IPs, file paths, SQL fragments, customer identifiers, and any internal/proprietary data from the error message. Search only the generic error type and framework context: "{component} {sanitized error type} {framework version}". If the error message is too specific to sanitize safely, skip the search. If WebSearch is unavailable, skip and proceed. Then return to Phase 1. Gather more evidence. Do not guess.
2. **If the hypothesis is wrong:** Return to Phase 1. Gather more evidence. Do not guess.
3. **3-strike rule:** If 3 hypotheses fail, **STOP**. Use AskUserQuestion:
```
+72 -283
@@ -36,6 +36,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -61,28 +67,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script sends a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to the anonymous tier by running `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -91,6 +100,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the user to enter the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -128,26 +164,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -217,10 +233,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -230,33 +251,18 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
# YC Office Hours
You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ -330,54 +336,12 @@ These are non-negotiable. They shape every response in this mode.
### Response Posture
- **Be direct to the point of discomfort.** Comfort means you haven't pushed hard enough. Your job is diagnosis, not encouragement. Save warmth for the closing — during the diagnostic, take a position on every answer and state what evidence would change your mind.
- **Be direct, not cruel.** The goal is clarity, not demolition. But don't soften a hard truth into uselessness. "That's a red flag" is more useful than "that's something to think about."
- **Push once, then push again.** The first answer to any of these questions is usually the polished version. The real answer comes after the second or third push. "You said 'enterprises in healthcare.' Can you name one specific person at one specific company?"
- **Calibrated acknowledgment, not praise.** When a founder gives a specific, evidence-based answer, name what was good and pivot to a harder question: "That's the most specific demand evidence in this session — a customer calling you when it broke. Let's see if your wedge is equally sharp." Don't linger. The best reward for a good answer is a harder follow-up.
- **Praise specificity when it shows up.** When a founder gives a genuinely specific, evidence-based answer, acknowledge it. That's hard to do and it matters.
- **Name common failure patterns.** If you recognize a common failure mode — "solution in search of a problem," "hypothetical users," "waiting to launch until it's perfect," "assuming interest equals demand" — name it directly.
- **End with the assignment.** Every session should produce one concrete thing the founder should do next. Not a strategy — an action.
### Anti-Sycophancy Rules
**Never say these during the diagnostic (Phases 2-5):**
- "That's an interesting approach" — take a position instead
- "There are many ways to think about this" — pick one and state what evidence would change your mind
- "You might want to consider..." — say "This is wrong because..." or "This works because..."
- "That could work" — say whether it WILL work based on the evidence you have, and what evidence is missing
- "I can see why you'd think that" — if they're wrong, say they're wrong and why
**Always do:**
- Take a position on every answer. State your position AND what evidence would change it. This is rigor — not hedging, not fake certainty.
- Challenge the strongest version of the founder's claim, not a strawman.
### Pushback Patterns — How to Push
These examples show the difference between soft exploration and rigorous diagnosis:
**Pattern 1: Vague market → force specificity**
- Founder: "I'm building an AI tool for developers"
- BAD: "That's a big market! Let's explore what kind of tool."
- GOOD: "There are 10,000 AI developer tools right now. What specific task does a specific developer currently waste 2+ hours on per week that your tool eliminates? Name the person."
**Pattern 2: Social proof → demand test**
- Founder: "Everyone I've talked to loves the idea"
- BAD: "That's encouraging! Who specifically have you talked to?"
- GOOD: "Loving an idea is free. Has anyone offered to pay? Has anyone asked when it ships? Has anyone gotten angry when your prototype broke? Love is not demand."
**Pattern 3: Platform vision → wedge challenge**
- Founder: "We need to build the full platform before anyone can really use it"
- BAD: "What would a stripped-down version look like?"
- GOOD: "That's a red flag. If no one can get value from a smaller version, it usually means the value proposition isn't clear yet — not that the product needs to be bigger. What's the one thing a user would pay for this week?"
**Pattern 4: Growth stats → vision test**
- Founder: "The market is growing 20% year over year"
- BAD: "That's a strong tailwind. How do you plan to capture that growth?"
- GOOD: "Growth rate is not a vision. Every competitor in your space can cite the same stat. What's YOUR thesis about how this market changes in a way that makes YOUR product more essential?"
**Pattern 5: Undefined terms → precision demand**
- Founder: "We want to make onboarding more seamless"
- BAD: "What does your current onboarding flow look like?"
- GOOD: "'Seamless' is not a product feature — it's a feeling. What specific step in onboarding causes users to drop off? What's the drop-off rate? Have you watched someone go through it?"
### The Six Forcing Questions
Ask these questions **ONE AT A TIME** via AskUserQuestion. Push on each one until the answer is specific, evidence-based, and uncomfortable. Comfort means the founder hasn't gone deep enough.
@@ -398,13 +362,6 @@ Ask these questions **ONE AT A TIME** via AskUserQuestion. Push on each one unti
**Red flags:** "People say it's interesting." "We got 500 waitlist signups." "VCs are excited about the space." None of these are demand.
**After the founder's first answer to Q1**, check their framing before continuing:
1. **Language precision:** Are the key terms in their answer defined? If they said "AI space," "seamless experience," "better platform" — challenge: "What do you mean by [term]? Can you define it so I could measure it?"
2. **Hidden assumptions:** What does their framing take for granted? "I need to raise money" assumes capital is required. "The market needs this" assumes verified pull. Name one assumption and ask if it's verified.
3. **Real vs. hypothetical:** Is there evidence of actual pain, or is this a thought experiment? "I think developers would want..." is hypothetical. "Three developers at my last company spent 10 hours a week on this" is real.
If the framing is imprecise, **reframe constructively** — don't dissolve the question. Say: "Let me try restating what I think you're actually building: [reframe]. Does that capture it better?" Then proceed with the corrected framing. This takes 60 seconds, not 10 minutes.
#### Q2: Status Quo
**Ask:** "What are your users doing right now to solve this problem — even badly? What does that workaround cost them?"
@@ -455,12 +412,7 @@ If the framing is imprecise, **reframe constructively** — don't dissolve the q
**STOP** after each question. Wait for the response before asking the next.
**Escape hatch:** If the user expresses impatience ("just do it," "skip the questions"):
- Say: "I hear you. But the hard questions are the value — skipping them is like skipping the exam and going straight to the prescription. Let me ask two more, then we'll move."
- Consult the smart routing table for the founder's product stage. Ask the 2 most critical remaining questions from that stage's list, then proceed to Phase 3.
- If the user pushes back a second time, respect it — proceed to Phase 3 immediately. Don't ask a third time.
- If only 1 question remains, ask it. If 0 remain, proceed directly.
- Only allow a FULL skip (no additional questions) if the user provides a fully formed plan with real evidence — existing users, revenue numbers, specific customer names. Even then, still run Phase 3 (Premise Challenge) and Phase 4 (Alternatives).
**Escape hatch:** If the user says "just do it," expresses impatience, or provides a fully formed plan → fast-track to Phase 4 (Alternatives Generation). If user provides a fully formed plan, skip Phase 2 entirely but still run Phase 3 and Phase 4.
---
@@ -521,43 +473,6 @@ If no matches found, proceed silently.
---
## Phase 2.75: Landscape Awareness
Read ETHOS.md for the full Search Before Building framework (three layers, eureka moments). The preamble's Search Before Building section has the ETHOS.md path.
After understanding the problem through questioning, search for what the world thinks. This is NOT competitive research (that's /design-consultation's job). This is understanding conventional wisdom so you can evaluate where it's wrong.
**Privacy gate:** Before searching, use AskUserQuestion: "I'd like to search for what the world thinks about this space to inform our discussion. This sends generalized category terms (not your specific idea) to a search provider. OK to proceed?"
Options: A) Yes, search away B) Skip — keep this session private
If B: skip this phase entirely and proceed to Phase 3. Use only in-distribution knowledge.
When searching, use **generalized category terms** — never the user's specific product name, proprietary concept, or stealth idea. For example, search "task management app landscape" not "SuperTodo AI-powered task killer."
If WebSearch is unavailable, skip this phase and note: "Search unavailable — proceeding with in-distribution knowledge only."
**Startup mode:** WebSearch for:
- "[problem space] startup approach {current year}"
- "[problem space] common mistakes"
- "why [incumbent solution] fails" OR "why [incumbent solution] works"
**Builder mode:** WebSearch for:
- "[thing being built] existing solutions"
- "[thing being built] open source alternatives"
- "best [thing category] {current year}"
Read the top 2-3 results. Run the three-layer synthesis:
- **[Layer 1]** What does everyone already know about this space?
- **[Layer 2]** What are the search results and current discourse saying?
- **[Layer 3]** Given what WE learned in Phase 2A/2B — is there a reason the conventional approach is wrong?
**Eureka check:** If Layer 3 reasoning reveals a genuine insight, name it: "EUREKA: Everyone does X because they assume [assumption]. But [evidence from our conversation] suggests that's wrong here. This means [implication]." Log the eureka moment (see preamble).
If no eureka moment exists, say: "The conventional wisdom seems sound here. Let's build on it." Proceed to Phase 3.
**Important:** This search feeds Phase 3 (Premise Challenge). If you found reasons the conventional approach fails, those become premises to challenge. If conventional wisdom is solid, that raises the bar for any premise that contradicts it.
---
## Phase 3: Premise Challenge
Before proposing solutions, challenge the premises:
@@ -612,66 +527,6 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac
---
## Visual Sketch (UI ideas only)
If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.
**Step 1: Gather design context**
1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
system constraints (colors, typography, spacing, component patterns). Use these
constraints in the wireframe.
2. Apply core design principles:
- **Information hierarchy** — what does the user see first, second, third?
- **Interaction states** — loading, empty, error, success, partial
- **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
- **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
- **Design for trust** — every interface element builds or erodes user trust.
**Step 2: Generate wireframe HTML**
Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
matches the actual use case)
- Add HTML comments explaining design decisions
Write to a temp file:
```bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
```
**Step 3: Render and capture**
```bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
```
If `$B` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."
**Step 4: Present and iterate**
Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"
If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.
**Step 5: Include in design doc**
Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
(`/plan-design-review`, `/design-review`) to see what was originally envisioned.
---
## Phase 4.5: Founder Signal Synthesis
Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ -808,73 +663,7 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```
---
## Spec Review Loop
Before presenting the document to the user for approval, run an adversarial review.
**Step 1: Dispatch reviewer subagent**
Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.
Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
list specific issues with suggested fixes. At the end, output a quality score (1-10)
across all dimensions."
**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix
**Step 2: Fix and re-dispatch**
If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total
**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.
If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.
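The loop control above (stop on PASS, stop when the same issues repeat, cap at 3 rounds) can be sketched in shell. This is only an illustration under stated assumptions: `run_reviewer`, `apply_fixes`, `spec_review_loop`, and the `REVIEW_ISSUES` variable are hypothetical names, not part of gstack, and the real reviewer is a subagent rather than a shell function. Comparing issues by exact string equality and skipping fixes on the final round are simplifications of this sketch.

```bash
# Sketch only: run_reviewer and apply_fixes are hypothetical stand-ins for
# dispatching the reviewer subagent and editing the document on disk.
run_reviewer() { printf '%s' "${REVIEW_ISSUES:-}"; }  # prints open issues; empty means PASS
apply_fixes()  { :; }                                 # placeholder: a real run edits the doc

# Prints the loop result: PASS, CONVERGED, or MAX_ITERATIONS.
spec_review_loop() {
  doc="$1"; prev=""; i=1
  while [ "$i" -le 3 ]; do
    issues=$(run_reviewer "$doc")
    if [ -z "$issues" ]; then echo "PASS"; return 0; fi
    # Convergence guard: identical issues on consecutive rounds end the loop.
    if [ "$i" -gt 1 ] && [ "$issues" = "$prev" ]; then echo "CONVERGED"; return 0; fi
    # No point fixing on the last round, since nothing would re-review it.
    if [ "$i" -lt 3 ]; then apply_fixes "$doc" "$issues"; fi
    prev="$issues"; i=$((i + 1))
  done
  echo "MAX_ITERATIONS"
}
```

With the placeholder `apply_fixes`, any non-empty `REVIEW_ISSUES` ends in `CONVERGED` on the second round, which is the case where the issues would be persisted as "Reviewer Concerns".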
**Step 3: Report and persist metrics**
After the loop completes (PASS, max iterations, or convergence guard):
1. Tell the user the result — summary by default:
"Your doc survived N rounds of adversarial review. M issues caught and fixed.
Quality score: X/10."
If they ask "what did the reviewer find?", show the full reviewer output.
2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
section to the document listing each unresolved issue. Downstream skills will see this.
3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
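As an alternative to hand-editing the placeholders, the record can be assembled from arguments. A minimal sketch, assuming a hypothetical helper named `spec_review_metrics_line` (not part of gstack) and assuming every count is a plain integer:

```bash
# Hypothetical helper, not part of gstack: prints one JSONL metrics record.
# $1 = skill, $2 = iterations, $3 = found, $4 = fixed, $5 = remaining, $6 = score.
spec_review_metrics_line() {
  printf '{"skill":"%s","ts":"%s","iterations":%d,"issues_found":%d,"issues_fixed":%d,"remaining":%d,"quality_score":%d}\n' \
    "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" "$4" "$5" "$6"
}
```

A call such as `spec_review_metrics_line office-hours 2 3 3 0 8 >> ~/.gstack/analytics/spec-review.jsonl` would append one line in the same shape as the `echo` above.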
---
Present the reviewed design doc to the user via AskUserQuestion:
Present the design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2
+73 -302
@@ -35,6 +35,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -60,28 +66,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
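The three-option flow, including the fallback to the anonymous tier when email verification fails, can be read as a small decision table. A minimal sketch, assuming a hypothetical helper `resolve_telemetry_tier` (not part of gstack) that takes the chosen option and whether auth succeeded:

```bash
# Hypothetical helper, not part of gstack: maps the user's choice to a tier.
# $1 = option letter (A, B, or C); $2 = "yes" if email verification succeeded.
resolve_telemetry_tier() {
  case "$1" in
    A) if [ "$2" = "yes" ]; then echo "community"; else echo "anonymous"; fi ;;
    B) echo "anonymous" ;;
    C) echo "off" ;;
    *) echo "off" ;;   # assumption of this sketch: unrecognized answers opt out
  esac
}
```

The printed value is what would be passed to `gstack-config set telemetry`.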
Always run:
```bash
@@ -90,6 +99,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -127,26 +163,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -216,10 +232,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -229,12 +250,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
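The defaulting rules for the three error fields can be sketched as one function. This is an illustration, not gstack code: `normalize_error_fields` is a hypothetical name, and truncating with `cut` is an assumption about how the 200 and 30 character limits might be enforced.

```bash
# Hypothetical helper, not part of gstack: fills in the three error fields.
# $1 = outcome, $2 = error class, $3 = error message, $4 = failed step.
# Sets ERROR_CLASS, ERROR_MESSAGE, FAILED_STEP in the calling shell.
normalize_error_fields() {
  if [ "$1" != "error" ]; then
    # Non-error outcomes send empty strings for all three fields.
    ERROR_CLASS=""; ERROR_MESSAGE=""; FAILED_STEP=""
    return 0
  fi
  case "$2" in
    timeout|test_failure|build_failure|git_error|auth_error|network_error)
      ERROR_CLASS="$2" ;;
    browse_error|lint_error|merge_conflict|permission_error|unknown_error)
      ERROR_CLASS="$2" ;;
    *) ERROR_CLASS="unknown_error" ;;   # missing or unrecognized class
  esac
  ERROR_MESSAGE=$(printf '%s' "$3" | cut -c1-200)   # keep the first 200 characters
  FAILED_STEP=$(printf '%s' "$4" | cut -c1-30)      # keep the first 30 characters
}
```

After calling it, the three variables can be passed to `--error-class`, `--error-message`, and `--failed-step` as in the command above.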
## Step 0: Detect base branch
@@ -344,94 +369,6 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
**Handoff note check** (reuses $SLUG and $BRANCH from the design doc check above):
```bash
HANDOFF=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-ceo-handoff-*.md 2>/dev/null | head -1)
[ -n "$HANDOFF" ] && echo "HANDOFF_FOUND: $HANDOFF" || echo "NO_HANDOFF"
```
If this block runs in a separate shell from the design doc check, recompute $SLUG and $BRANCH first using the same commands from that block.
If a handoff note is found: read it. This contains system audit findings and discussion
from a prior CEO review session that paused so the user could run `/office-hours`. Use it
as additional context alongside the design doc. The handoff note helps you avoid re-asking
questions the user already answered. Do NOT skip any steps — run the full review, but use
the handoff note to inform your analysis and avoid redundant questions.
Tell the user: "Found a handoff note from your prior CEO review session. I'll use that
context to pick up where we left off."
## Prerequisite Skill Offer
When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.
Say to the user via AskUserQuestion:
> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."
Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review
If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
**Handoff note save (BENEFITS_FROM):** If the user chose A (run /office-hours first),
save a handoff context note before they leave. Reuse $SLUG and $BRANCH from the
design doc check block above (they use the same `remote-slug || basename` fallback
that handles repos without an origin remote). Then run:
```bash
mkdir -p ~/.gstack/projects/$SLUG
USER=$(whoami)
DATETIME=$(date +%Y%m%d-%H%M%S)
```
Write to `~/.gstack/projects/$SLUG/$USER-$BRANCH-ceo-handoff-$DATETIME.md`:
```markdown
# CEO Review Handoff Note
Generated by /plan-ceo-review on {date}
Branch: {branch}
Repo: {owner/repo}
## Why I paused
User chose to run /office-hours first (no design doc found).
## System Audit Summary
{Summarize what the system audit found — recent git history, diff scope,
CLAUDE.md key points, TODOS.md relevant items, known pain points}
## Discussion So Far
{Empty — handoff happened before Step 0. Frontend/UI scope detection has not
run yet — it will be assessed when the review resumes.}
```
Tell the user: "Context saved. Run /office-hours in another window. When you come back
and invoke /plan-ceo-review, I'll pick up the context automatically — including the
design doc /office-hours produces."
**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
articulate the problem, keeps changing the problem statement, answers with "I'm not
sure," or is clearly exploring rather than reviewing — offer `/office-hours`:
> "It sounds like you're still figuring out what to build — that's totally fine, but
> that's what /office-hours is designed for. Want to pause this review and run
> /office-hours first? It'll help you nail down the problem and approach, then come
> back here for the strategic review."
Options: A) Yes, run /office-hours first. B) No, keep going.
If they keep going, proceed normally — no guilt, no re-asking.
**Handoff note save (mid-session):** If the user chose A (run /office-hours first from
mid-session detection), save a handoff context note with the same format above, but
include any Step 0A progress in the "Discussion So Far" section — premises discussed,
problem framing attempts, user answers so far. Use the same bash block to generate the
file path.
Tell the user: "Context saved with your discussion so far. Run /office-hours, then
come back to /plan-ceo-review."
When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ -454,22 +391,6 @@ Analyze the plan. If it involves ANY of: new UI screens/pages, changes to existi
Identify 2-3 files or patterns in the existing codebase that are particularly well-designed. Note them as style references for the review. Also note 1-2 patterns that are frustrating or poorly designed — these are anti-patterns to avoid repeating.
Report findings before proceeding to Step 0.
### Landscape Check
Read ETHOS.md for the Search Before Building framework (the preamble's Search Before Building section has the path). Before challenging scope, understand the landscape. WebSearch for:
- "[product category] landscape {current year}"
- "[key feature] alternatives"
- "why [incumbent/conventional approach] [succeeds/fails]"
If WebSearch is unavailable, skip this check and note: "Search unavailable — proceeding with in-distribution knowledge only."
Run the three-layer synthesis:
- **[Layer 1]** What's the tried-and-true approach in this space?
- **[Layer 2]** What are the search results saying?
- **[Layer 3]** First-principles reasoning — where might the conventional wisdom be wrong?
Feed into the Premise Challenge (0A) and Dream State Mapping (0C). If you find a eureka moment, surface it during the Expansion opt-in ceremony as a differentiation opportunity. Log it (see preamble).
## Step 0: Nuclear Scope Challenge + Mode Selection
### 0A. Premise Challenge
@@ -591,70 +512,6 @@ Repo: {owner/repo}
Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.
After writing the CEO plan, run the spec review loop on it:
## Spec Review Loop
Before presenting the document to the user for approval, run an adversarial review.
**Step 1: Dispatch reviewer subagent**
Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
and cannot see the brainstorming conversation — only the document. This ensures genuine
adversarial independence.
Prompt the subagent with:
- The file path of the document just written
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
list specific issues with suggested fixes. At the end, output a quality score (1-10)
across all dimensions."
**Dimensions:**
1. **Completeness** — Are all requirements addressed? Missing edge cases?
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
The subagent should return:
- A quality score (1-10)
- PASS if no issues, or a numbered list of issues with dimension, description, and fix
**Step 2: Fix and re-dispatch**
If the reviewer returns issues:
1. Fix each issue in the document on disk (use Edit tool)
2. Re-dispatch the reviewer subagent with the updated document
3. Maximum 3 iterations total
**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
and persist those issues as "Reviewer Concerns" in the document rather than looping
further.
If the subagent fails, times out, or is unavailable — skip the review loop entirely.
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
already written to disk; the review is a quality bonus, not a gate.
**Step 3: Report and persist metrics**
After the loop completes (PASS, max iterations, or convergence guard):
1. Tell the user the result — summary by default:
"Your doc survived N rounds of adversarial review. M issues caught and fixed.
Quality score: X/10."
If they ask "what did the reviewer find?", show the full reviewer output.
2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
section to the document listing each unresolved issue. Downstream skills will see this.
3. Append metrics:
```bash
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
```
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
@@ -1035,28 +892,12 @@ List every ASCII diagram in files this plan touches. Still accurate?
### Unresolved Decisions
If any AskUserQuestion goes unanswered, note it here. Never silently default.
## Handoff Note Cleanup
After producing the Completion Summary, clean up any handoff notes for this branch —
the review is complete and the context is no longer needed.
```bash
source <(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
rm -f ~/.gstack/projects/$SLUG/*-$BRANCH-ceo-handoff-*.md 2>/dev/null || true
```
## Review Log
After producing the Completion Summary above, persist the review result.
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
`~/.gstack/` (user config directory, not project files). The skill preamble
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
the same pattern. The review dashboard depends on this data. Skipping this
command breaks the review readiness dashboard in /ship.
After producing the Completion Summary above, persist the review result:
```bash
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE","scope_proposed":N,"scope_accepted":N,"scope_deferred":N,"commit":"COMMIT"}'
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE","commit":"COMMIT"}'
```
Before running this command, substitute the placeholder values from the Completion Summary you just produced:
@@ -1065,9 +906,6 @@ Before running this command, substitute the placeholder values from the Completi
- **unresolved**: number from "Unresolved decisions" in the summary
- **critical_gaps**: number from "Failure modes: ___ CRITICAL GAPS" in the summary
- **MODE**: the mode the user selected (SCOPE_EXPANSION / SELECTIVE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION)
- **scope_proposed**: number from "Scope proposals: ___ proposed" in the summary (0 for HOLD/REDUCTION)
- **scope_accepted**: number from "Scope proposals: ___ accepted" in the summary (0 for HOLD/REDUCTION)
- **scope_deferred**: number of items deferred to TODOS.md from scope decisions (0 for HOLD/REDUCTION)
- **COMMIT**: output of `git rev-parse --short HEAD`
## Review Readiness Dashboard
@@ -1078,7 +916,7 @@ After completing the review, read the review log and config to display the dashb
~/.codex/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
```
+====================================================================+
@@ -1089,7 +927,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Codex Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
@@ -1099,7 +937,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
@@ -1113,73 +951,6 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- For entries without a \`commit\` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
- If all reviews match the current HEAD, do not display any staleness notes
## Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the
**plan file** itself so review status is visible to anyone reading the plan.
### Detect the plan file
1. Check if there is an active plan file in this conversation (the host provides plan file
paths in system messages — look for plan file references in the conversation context).
2. If not found, skip this section silently — not every review runs in plan mode.
### Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above.
Parse each JSONL entry. Each skill logs different fields:
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries.
For the review you just completed, you may use richer details from your own Completion
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
\`\`\`markdown
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
\`\`\`
Below the table, add these lines (omit any that are empty/not applicable):
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
### Write to the plan file
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
- Search the plan file for a \`## GSTACK REVIEW REPORT\` section **anywhere** in the file
(not just at the end — content may have been added after it).
- If found, **replace it** entirely using the Edit tool. Match from \`## GSTACK REVIEW REPORT\`
through either the next \`## \` heading or end of file, whichever comes first. This ensures
content added after the report section is preserved, not eaten. If the Edit fails
(e.g., concurrent edit changed the content), re-read the plan file and retry once.
- If no such section exists, **append it** to the end of the plan file.
- Always place it as the very last section in the plan file. If it was found mid-file,
move it: delete the old location and append at the end.
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, recommend the next review(s) based on what this CEO review discovered. Read the dashboard output to see which reviews have already been run and whether they are stale.
+74 -123
@@ -34,6 +34,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-design-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -59,28 +65,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -89,6 +98,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
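The option handling in the telemetry prompt can be sketched as a small shell helper. This is only an illustration: `telemetry_mode_for_choice` is a hypothetical name, not part of gstack, and it assumes the second argument carries the auth result as `yes` or `no`. It prints the value that would be passed to `gstack-config set telemetry` and does not call any gstack binary itself.

```bash
# Hypothetical helper (not part of gstack): map the prompt answer to a
# telemetry setting. $1 is the chosen option (A, B, or C); $2 is "yes"
# when email verification succeeded (only relevant for option A).
telemetry_mode_for_choice() {
  case "$1" in
    A)
      # Community needs a verified email; fall back to anonymous otherwise.
      if [ "$2" = "yes" ]; then
        echo "community"
      else
        echo "anonymous"
      fi
      ;;
    B) echo "anonymous" ;;
    *) echo "off" ;;
  esac
}
```

For example, `telemetry_mode_for_choice A no` prints `anonymous`, which matches the fallback described above when auth fails.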
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -126,26 +162,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -215,10 +231,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
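The constraints on the error fields above can be sketched as two small helpers. Both function names are hypothetical and are not part of gstack; the sketch only enforces the allowed `ERROR_CLASS` values and the one-line, 200-character limit on `ERROR_MESSAGE`. It does not attempt to strip file paths, secrets, or PII, which still has to be done by whoever writes the message.

```bash
# Hypothetical helper (not part of gstack): keep ERROR_CLASS within the
# allowed categories, defaulting to unknown_error for anything else.
normalize_error_class() {
  case "$1" in
    timeout|test_failure|build_failure|git_error|auth_error|network_error|browse_error|lint_error|merge_conflict|permission_error|unknown_error)
      printf '%s\n' "$1" ;;
    *)
      printf '%s\n' "unknown_error" ;;
  esac
}

# Hypothetical helper (not part of gstack): collapse ERROR_MESSAGE to a
# single line and cap it at 200 characters.
trim_error_message() {
  printf '%s' "$1" | tr '\n' ' ' | cut -c1-200
}
```

The results could then be passed as the `--error-class` and `--error-message` values in the telemetry command.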
Run this bash:
@@ -228,12 +249,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
## Step 0: Detect base branch
@@ -499,23 +524,16 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default to
## Review Log
After producing the Completion Summary above, persist the review result.
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
`~/.gstack/` (user config directory, not project files). The skill preamble
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
the same pattern. The review dashboard depends on this data. Skipping this
command breaks the review readiness dashboard in /ship.
After producing the Completion Summary above, persist the review result:
```bash
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"unresolved":N,"decisions_made":N,"commit":"COMMIT"}'
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N,"commit":"COMMIT"}'
```
Substitute values from the Completion Summary:
- **TIMESTAMP**: current ISO 8601 datetime
- **STATUS**: "clean" if overall score 8+ AND 0 unresolved; otherwise "issues_open"
- **initial_score**: initial overall design score before fixes (0-10)
- **overall_score**: final overall design score after fixes (0-10)
- **overall_score**: final overall design score (0-10)
- **unresolved**: number of unresolved design decisions
- **decisions_made**: number of design decisions added to the plan
- **COMMIT**: output of `git rev-parse --short HEAD`
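The STATUS rule above can be sketched as a shell function. `review_status` is a hypothetical name, not part of gstack, and the sketch assumes both the overall score and the unresolved count are whole numbers.

```bash
# Hypothetical helper (not part of gstack): derive STATUS for the review log.
# $1 is the overall design score (integer 0-10), $2 is the unresolved count.
review_status() {
  if [ "$1" -ge 8 ] && [ "$2" -eq 0 ]; then
    echo "clean"
  else
    echo "issues_open"
  fi
}
```

A score of 8 with no unresolved decisions yields `clean`; a lower score or any unresolved decision yields `issues_open`.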
@@ -528,7 +546,7 @@ After completing the review, read the review log and config to display the dashb
~/.codex/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
```
+====================================================================+
@@ -539,7 +557,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Codex Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
@@ -549,7 +567,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
@@ -563,73 +581,6 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- For entries without a \`commit\` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
- If all reviews match the current HEAD, do not display any staleness notes
## Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the
**plan file** itself so review status is visible to anyone reading the plan.
### Detect the plan file
1. Check if there is an active plan file in this conversation (the host provides plan file
paths in system messages — look for plan file references in the conversation context).
2. If not found, skip this section silently — not every review runs in plan mode.
### Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above.
Parse each JSONL entry. Each skill logs different fields:
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries.
For the review you just completed, you may use richer details from your own Completion
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
\`\`\`markdown
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
\`\`\`
Below the table, add these lines (omit any that are empty/not applicable):
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
### Write to the plan file
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
- Search the plan file for a \`## GSTACK REVIEW REPORT\` section **anywhere** in the file
(not just at the end — content may have been added after it).
- If found, **replace it** entirely using the Edit tool. Match from \`## GSTACK REVIEW REPORT\`
through either the next \`## \` heading or end of file, whichever comes first. This ensures
content added after the report section is preserved, not eaten. If the Edit fails
(e.g., concurrent edit changed the content), re-read the plan file and retry once.
- If no such section exists, **append it** to the end of the plan file.
- Always place it as the very last section in the plan file. If it was found mid-file,
move it: delete the old location and append at the end.
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, recommend the next review(s) based on what this design review discovered. Read the dashboard output to see which reviews have already been run and whether they are stale.
+74 -150
@@ -33,6 +33,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-eng-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -58,28 +64,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -88,6 +97,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -125,26 +161,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -214,10 +230,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -227,12 +248,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# Plan Review Mode
@@ -289,39 +314,12 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
## Prerequisite Skill Offer
When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.
Say to the user via AskUserQuestion:
> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."
Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review
If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
4. **Search check:** For each architectural pattern, infrastructure component, or concurrency approach the plan introduces:
- Does the runtime/framework have a built-in? Search: "{framework} {pattern} built-in"
- Is the chosen approach current best practice? Search: "{pattern} best practice {current year}"
- Are there known footguns? Search: "{framework} {pattern} pitfalls"
If WebSearch is unavailable, skip this check and note: "Search unavailable — proceeding with in-distribution knowledge only."
If the plan rolls a custom solution where a built-in exists, flag it as a scope reduction opportunity. Annotate recommendations with **[Layer 1]**, **[Layer 2]**, **[Layer 3]**, or **[EUREKA]** (see preamble's Search Before Building section). If you find a eureka moment — a reason the standard approach is wrong for this case — present it as an architectural insight.
5. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?
4. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?
5. **Completeness check:** Is the plan doing the complete version or a shortcut? With AI-assisted coding, the cost of completeness (100% test coverage, full edge case handling, complete error paths) is 10-100x cheaper than with a human team. If the plan proposes a shortcut that saves human-hours but only saves minutes with CC+gstack, recommend the complete version. Boil the lake.
@@ -496,16 +494,10 @@ Check the git log for this branch. If there are prior commits suggesting a previ
## Review Log
After producing the Completion Summary above, persist the review result.
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
`~/.gstack/` (user config directory, not project files). The skill preamble
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
the same pattern. The review dashboard depends on this data. Skipping this
command breaks the review readiness dashboard in /ship.
After producing the Completion Summary above, persist the review result:
```bash
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"MODE","commit":"COMMIT"}'
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE","commit":"COMMIT"}'
```
Substitute values from the Completion Summary:
@@ -513,7 +505,6 @@ Substitute values from the Completion Summary:
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
- **unresolved**: number from "Unresolved decisions" count
- **critical_gaps**: number from "Failure modes: ___ critical gaps flagged"
- **issues_found**: total issues found across all review sections (Architecture + Code Quality + Performance + Test gaps)
- **MODE**: FULL_REVIEW / SCOPE_REDUCED
- **COMMIT**: output of `git rev-parse --short HEAD`
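The STATUS rule in the list above can be written as a small check. This is a sketch only: `review_status` is a hypothetical helper name, not a gstack command, and it assumes both counts are non-negative integers taken from the Completion Summary.

```bash
# Hypothetical helper (not part of gstack): derive STATUS from the
# "unresolved" and "critical_gaps" counts.
review_status() {
  unresolved="$1"
  critical_gaps="$2"
  # "clean" only when both counts are zero; anything else is "issues_open".
  if [ "$unresolved" -eq 0 ] && [ "$critical_gaps" -eq 0 ]; then
    echo "clean"
  else
    echo "issues_open"
  fi
}
```

For example, `review_status 0 0` prints `clean`, while `review_status 2 0` prints `issues_open`.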
@@ -525,7 +516,7 @@ After completing the review, read the review log and config to display the dashb
~/.codex/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
```
+====================================================================+
@@ -536,7 +527,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Codex Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
@@ -546,7 +537,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50-199 lines) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
@@ -560,73 +551,6 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- For entries without a \`commit\` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
- If all reviews match the current HEAD, do not display any staleness notes
## Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the
**plan file** itself so review status is visible to anyone reading the plan.
### Detect the plan file
1. Check if there is an active plan file in this conversation (the host provides plan file
paths in system messages — look for plan file references in the conversation context).
2. If not found, skip this section silently — not every review runs in plan mode.
### Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above.
Parse each JSONL entry. Each skill logs different fields:
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries.
For the review you just completed, you may use richer details from your own Completion
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
\`\`\`markdown
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
\`\`\`
Below the table, add these lines (omit any that are empty/not applicable):
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
### Write to the plan file
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
- Search the plan file for a \`## GSTACK REVIEW REPORT\` section **anywhere** in the file
(not just at the end — content may have been added after it).
- If found, **replace it** entirely using the Edit tool. Match from \`## GSTACK REVIEW REPORT\`
through either the next \`## \` heading or end of file, whichever comes first. This ensures
content added after the report section is preserved, not eaten. If the Edit fails
(e.g., concurrent edit changed the content), re-read the plan file and retry once.
- If no such section exists, **append it** to the end of the plan file.
- Always place it as the very last section in the plan file. If it was found mid-file,
move it: delete the old location and append at the end.
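The matching rule above (from `## GSTACK REVIEW REPORT` through the next `## ` heading or end of file) can be illustrated outside the Edit tool. The sketch below uses `awk` as a stand-in technique, not the skill's actual mechanism, and `strip_review_report` is a hypothetical helper name. It prints the plan file without its report section, so a fresh report could then be appended at the end.

```bash
# Hypothetical helper (not part of gstack): print a plan file with the
# "## GSTACK REVIEW REPORT" section removed. The section runs from its
# heading up to, but not including, the next "## " heading or end of file.
strip_review_report() {
  awk '
    /^## GSTACK REVIEW REPORT[[:space:]]*$/ { skip = 1; next }  # start of report
    skip && /^## / { skip = 0 }                                # next level-2 heading
    !skip { print }                                            # keep everything else
  ' "$1"
}
```

Deeper headings such as `### ` inside the report are dropped with it, because only a line starting with exactly `## ` ends the skipped range.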
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, check if additional reviews would be valuable. Read the dashboard output to see which reviews have already been run and whether they are stale.
+68 -43
@@ -32,6 +32,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"qa-only","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -57,28 +63,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -87,6 +96,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
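The condition for the one-time upgrade offer described above can be expressed as a single test on the two preamble values. This is a minimal sketch: `should_offer_community` is a hypothetical helper, not a gstack command, and it assumes the telemetry setting and the `COMM_PROMPTED` flag are passed as plain strings.

```bash
# Hypothetical helper (not part of gstack): succeed only when telemetry is
# "anonymous" and the community prompt has not been shown yet.
should_offer_community() {
  telemetry="$1"
  comm_prompted="$2"
  [ "$telemetry" = "anonymous" ] && [ "$comm_prompted" = "no" ]
}
```

With the preamble variables, a call might look like `should_offer_community "${_TEL:-off}" "$_COMM_PROMPTED" && echo "offer community tier"`.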
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -124,26 +160,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -213,10 +229,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -226,12 +247,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# /qa-only: Report-Only QA Testing
+68 -43
@@ -35,6 +35,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"qa","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -60,28 +66,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -90,6 +99,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -127,26 +163,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -216,10 +232,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -229,12 +250,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
## Step 0: Detect base branch
+72 -60
@@ -32,6 +32,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"retro","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -57,28 +63,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -87,6 +96,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -124,26 +160,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -213,10 +229,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -226,12 +247,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
## Detect default branch
@@ -389,20 +414,6 @@ If TODOS.md doesn't exist, skip the Backlog Health row.
If the JSONL file doesn't exist or has no entries in the window, skip the Skill Usage row.
**Eureka Moments (if logged):** Read `~/.gstack/analytics/eureka.jsonl` if it exists. Filter entries within the retro time window by `ts` field. For each eureka moment, show the skill that flagged it, the branch, and a one-line summary of the insight. Present as:
```
| Eureka Moments | 2 this period |
```
If moments exist, list them:
```
EUREKA /office-hours (branch: garrytan/auth-rethink): "Session tokens don't need server storage — browser crypto API makes client-side JWT validation viable"
EUREKA /plan-eng-review (branch: garrytan/cache-layer): "Redis isn't needed here — Bun's built-in LRU cache handles this workload"
```
If the JSONL file doesn't exist or has no entries in the window, skip the Eureka Moments row.
### Step 3: Commit Time Distribution
Show hourly histogram in local time using bar chart:
@@ -462,7 +473,7 @@ From commit diffs, estimate PR sizes and bucket them:
- **Small** (<100 LOC)
- **Medium** (100-500 LOC)
- **Large** (500-1500 LOC)
- **XL** (1500+ LOC)
- **XL** (1500+ LOC) — flag these with file counts
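The buckets above can be sketched as a shell function. The listed ranges overlap at 500 and 1500, so this sketch assumes a boundary value goes to the larger bucket; `pr_size_bucket` is a hypothetical name, not something gstack ships.

```bash
# Hypothetical helper: map a changed-LOC count to a PR size bucket.
# Assumption: boundary values (500, 1500) fall into the larger bucket.
pr_size_bucket() {
  _loc="$1"
  if [ "$_loc" -lt 100 ]; then
    echo "Small"
  elif [ "$_loc" -lt 500 ]; then
    echo "Medium"
  elif [ "$_loc" -lt 1500 ]; then
    echo "Large"
  else
    echo "XL"
  fi
}
```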
### Step 8: Focus Score + Ship of the Week
@@ -654,13 +665,14 @@ Narrative interpreting what the team-wide patterns mean:
Narrative covering:
- Commit type mix and what it reveals
- PR size distribution and what it reveals about shipping cadence
- PR size discipline (are PRs staying small?)
- Fix-chain detection (sequences of fix commits on the same subsystem)
- Version bump discipline
### Code Quality Signals
- Test LOC ratio trend
- Hotspot analysis (are the same files churning?)
- Any XL PRs that should have been split
- Greptile signal ratio and trend (if history exists): "Greptile: X% signal (Y valid catches, Z false positives)"
### Test Health
@@ -699,7 +711,7 @@ For each teammate (sorted by commits descending), write a section:
- "Fixed the N+1 query that was causing 2s load times on the dashboard"
- **Opportunity for growth**: 1 specific, constructive suggestion. Frame as investment, not criticism. Examples:
- "Test coverage on the payment module is at 8% — worth investing in before the next feature lands on top of it"
- "Most commits land in a single burst — spacing work across the day could reduce context-switching fatigue"
- "3 of the 5 PRs were 800+ LOC — breaking these up would catch issues earlier and make review easier"
- "All commits land between 1-4am — sustainable pace matters for code quality long-term"
**AI collaboration note:** If many commits have `Co-Authored-By` AI trailers (e.g., Claude, Copilot), note the AI-assisted commit percentage as a team metric. Frame it neutrally — "N% of commits were AI-assisted" — without judgment.
+116 -51
@@ -31,6 +31,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -56,28 +62,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -86,6 +95,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
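The once-only condition for the upgrade prompt can be expressed as a single check. This is a sketch under stated assumptions: `should_offer_community` is a hypothetical helper, and the optional second argument exists only so the marker path can be overridden.

```bash
# Hypothetical helper. Returns success (0) only when the telemetry tier is
# "anonymous" and the community-prompted marker file does not exist yet.
should_offer_community() {
  _tel="$1"
  _marker="${2:-$HOME/.gstack/.community-prompted}"
  [ "$_tel" = "anonymous" ] && [ ! -f "$_marker" ]
}
```

It could be called as `should_offer_community "${_TEL:-off}"`. The `touch` step still runs afterwards whichever option the user picks.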
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -123,26 +159,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -212,10 +228,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -225,12 +246,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
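The constraints on `ERROR_CLASS` and `ERROR_MESSAGE` described above could be enforced before the values are passed to the logger. This is a minimal sketch: both function names are hypothetical, and nothing here is confirmed to be what `gstack-telemetry-log` does internally.

```bash
# Hypothetical helpers for preparing error fields; not part of gstack.

# Echo the class if it is one of the documented categories, else "unknown_error".
normalize_error_class() {
  case "$1" in
    timeout|test_failure|build_failure|git_error|auth_error|network_error|browse_error|lint_error|merge_conflict|permission_error|unknown_error)
      printf '%s\n' "$1" ;;
    *)
      printf 'unknown_error\n' ;;
  esac
}

# Collapse the message to one line and keep at most 200 characters.
# Note: some cut implementations count bytes rather than characters.
clamp_error_message() {
  printf '%s' "$1" | tr '\n' ' ' | cut -c1-200
}
```

These helpers do not strip file paths, secrets, or PII; that judgment still has to be made when the message is written.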
## Step 0: Detect base branch
@@ -335,17 +360,10 @@ Run `git diff origin/<base>` to get the full diff. This includes both committed
Apply the checklist against the diff in two passes:
1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness
2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend, Performance & Bundle Impact
2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend
**Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient.
**Search-before-recommending:** When recommending a fix pattern (especially for concurrency, caching, auth, or framework-specific behavior):
- Verify the pattern is current best practice for the framework version in use
- Check if a built-in solution exists in newer versions before recommending a workaround
- Verify API signatures against current docs (APIs change between versions)
Takes seconds, prevents recommending outdated patterns. If WebSearch is unavailable, note it and proceed with in-distribution knowledge.
Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section.
---
@@ -501,7 +519,54 @@ If no documentation files exist, skip this step silently.
---
## Step 5.7: Codex second opinion (optional)
After completing the review, check if the Codex CLI is available:
```bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
```
If Codex is available, use AskUserQuestion:
```
Review complete. Want an independent second opinion from Codex (OpenAI)?
A) Run Codex code review — independent diff review with pass/fail gate
B) Run Codex adversarial challenge — try to find ways this code will fail in production
C) Both — review first, then adversarial challenge
D) Skip — no Codex review needed
```
If the user chooses A, B, or C:
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
Present the full output verbatim under a `CODEX SAYS (code review):` header.
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
After presenting, compare Codex's findings with your own review findings from Steps 4-5
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
and what only Claude found.
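The gate rule above (any `[P1]` marker means fail) can be sketched as a fixed-string search over the captured output. `codex_gate` is a hypothetical helper name; it takes the Codex output as its first argument and makes no assumption about the rest of the output format.

```bash
# Hypothetical helper: derive the gate verdict from captured Codex output.
# grep -F treats "[P1]" as a literal string rather than a bracket expression.
codex_gate() {
  if printf '%s\n' "$1" | grep -qF '[P1]'; then
    echo "GATE: FAIL"
  else
    echo "GATE: PASS"
  fi
}
```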
**For adversarial challenge (B or C):** Run:
```bash
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
```
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
```bash
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
```
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran.**
There is no gate verdict to record, and a false entry would make the Review Readiness
Dashboard believe a code review happened when it didn't.
If Codex is not available, skip this step silently.
---
## Important Rules
@@ -31,6 +31,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"setup-browser-cookies","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -56,28 +62,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -86,6 +95,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -123,26 +159,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -212,10 +228,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -225,12 +246,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
# Setup Browser Cookies
+108 -47
@@ -29,6 +29,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"ship","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -54,28 +60,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -84,6 +93,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
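
The two prompt conditions above can be sketched as a single decision. This is only an illustration: `next_prompt` is a hypothetical helper, not part of gstack, and checking the telemetry prompt before the community upgrade is an assumption, since the text does not state a precedence.

```shell
# Hypothetical helper (not a gstack command): decide which one-time prompt applies.
# Arguments mirror the preamble output: TEL_PROMPTED, LAKE_INTRO, TELEMETRY, COMM_PROMPTED.
next_prompt() {
  _tel_prompted="$1"; _lake_intro="$2"; _telemetry="$3"; _comm_prompted="$4"
  if [ "$_tel_prompted" = "no" ] && [ "$_lake_intro" = "yes" ]; then
    # First-run telemetry question (Community / Anonymous / No thanks).
    echo "telemetry"
  elif [ "$_telemetry" = "anonymous" ] && [ "$_comm_prompted" = "no" ]; then
    # One-time upgrade offer for users already on the anonymous tier.
    echo "community_upgrade"
  else
    echo "none"
  fi
}
```

For example, `next_prompt yes yes anonymous no` selects the community upgrade offer, while any call with `COMM_PROMPTED` set to `yes` and `TEL_PROMPTED` set to `yes` selects nothing.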
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -121,26 +157,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -210,10 +226,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -223,12 +244,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
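
The rules for the three error fields can be sketched as follows. `tel_error_fields` is a hypothetical helper introduced only for illustration, and the truncation with `cut` is an assumption about how the 200 and 30 character limits might be enforced (some `cut` implementations count bytes rather than characters).

```shell
# Hypothetical helper (not a gstack command): fill ERROR_CLASS, ERROR_MESSAGE and
# FAILED_STEP from the outcome plus whatever details are known.
# Usage: tel_error_fields OUTCOME [CLASS] [MESSAGE] [STEP]
tel_error_fields() {
  _outcome="$1"; _class="$2"; _msg="$3"; _step="$4"
  if [ "$_outcome" != "error" ]; then
    # Non-error outcomes send empty strings for all three fields.
    ERROR_CLASS=""; ERROR_MESSAGE=""; FAILED_STEP=""
    return 0
  fi
  # Error with unknown details: class falls back to unknown_error, the rest stay empty.
  ERROR_CLASS="${_class:-unknown_error}"
  ERROR_MESSAGE=$(printf '%s' "$_msg" | cut -c1-200)
  FAILED_STEP=$(printf '%s' "$_step" | cut -c1-30)
}
```

After calling it, the variables can be passed to the `--error-class`, `--error-message`, and `--failed-step` flags shown in the command above.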
## Step 0: Detect base branch
@@ -294,7 +319,7 @@ After completing the review, read the review log and config to display the dashb
~/.codex/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
```
+====================================================================+
@@ -305,7 +330,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Codex Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
@@ -315,7 +340,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
@@ -857,7 +882,43 @@ For each classified comment:
---
## Step 3.8: Codex second opinion (optional)
Check if the Codex CLI is available:
```bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
```
If Codex is available, use AskUserQuestion:
```
Pre-landing review complete. Want an independent Codex (OpenAI) review before shipping?
A) Run Codex code review — independent diff review with pass/fail gate
B) Run Codex adversarial challenge — try to break this code
C) Skip — ship without Codex review
```
If the user chooses A or B:
**For code review (A):** Run `codex review --base <base>` with a 5-minute timeout.
Present the full output verbatim under a `CODEX SAYS:` header. Check for `[P1]` markers
to determine pass/fail gate. Persist the result:
```bash
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE"}'
```
If GATE is FAIL, use AskUserQuestion: "Codex found critical issues. Ship anyway?"
If the user says no, stop. If yes, continue to Step 4.
**For adversarial (B):** Run codex exec with the adversarial prompt (see /codex skill).
Present findings. This is informational — does not block shipping.
If Codex is not available, skip silently. Continue to Step 4.
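
As a rough sketch of the `[P1]` check described for option A: `codex_gate` is a hypothetical helper, and the `PASS` value is an assumption (the step only names `FAIL` explicitly). It treats any literal `[P1]` in the review output as a critical finding.

```shell
# Hypothetical helper (not a gstack or Codex command): derive GATE from review text.
# Assumes a critical finding is marked with the literal string "[P1]".
codex_gate() {
  # -F matches "[P1]" as a fixed string, so the brackets are not read as a regex class.
  if printf '%s\n' "$1" | grep -qF '[P1]'; then
    echo "FAIL"
  else
    echo "PASS"
  fi
}
```

The result could then stand in for `GATE` in the `gstack-review-log` entry above.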
---
## Step 4: Version bump (auto-decide)
@@ -1098,7 +1159,7 @@ doc updates — the user runs `/ship` and documentation stays current without a
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for trivial confirmations** (e.g., "ready to push?", "create PR?"). DO stop for: version bumps (MINOR/MAJOR), pre-landing review findings (ASK items), and Codex structured review [P1] findings (large diffs only).
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
+69 -44
@@ -64,6 +64,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.codex/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.codex/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"gstack","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -89,28 +95,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.codex/skills/gstack/bin/gstack-config set telemetry community
~/.codex/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -119,6 +128,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.codex/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -156,26 +192,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -245,10 +261,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -258,12 +279,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
If `PROACTIVE` is `false`: do NOT proactively suggest other gstack skills during this session.
@@ -506,7 +531,7 @@ The snapshot is your primary tool for understanding and interacting with pages.
-s <sel> --selector Scope to CSS selector
-D --diff Unified diff against previous snapshot (first call stores baseline)
-a --annotate Annotated screenshot with red overlay boxes and ref labels
-o <path> --output Output path for annotated screenshot (default: <temp>/browse-annotated.png)
-o <path> --output Output path for annotated screenshot (default: /tmp/browse-annotated.png)
-C --cursor-interactive Cursor-interactive elements (@c refs — divs with pointer, onclick)
```