Merge remote-tracking branch 'origin/garrytan/community-mode' into garrytan/persistent-docs

# Conflicts:
#	.agents/skills/gstack-browse/SKILL.md
#	.agents/skills/gstack-design-consultation/SKILL.md
#	.agents/skills/gstack-design-review/SKILL.md
#	.agents/skills/gstack-document-release/SKILL.md
#	.agents/skills/gstack-investigate/SKILL.md
#	.agents/skills/gstack-office-hours/SKILL.md
#	.agents/skills/gstack-plan-ceo-review/SKILL.md
#	.agents/skills/gstack-plan-design-review/SKILL.md
#	.agents/skills/gstack-plan-eng-review/SKILL.md
#	.agents/skills/gstack-qa-only/SKILL.md
#	.agents/skills/gstack-qa/SKILL.md
#	.agents/skills/gstack-retro/SKILL.md
#	.agents/skills/gstack-review/SKILL.md
#	.agents/skills/gstack-setup-browser-cookies/SKILL.md
#	.agents/skills/gstack-ship/SKILL.md
#	.agents/skills/gstack/SKILL.md
#	SKILL.md
#	browse/SKILL.md
#	codex/SKILL.md
#	design-consultation/SKILL.md
#	design-review/SKILL.md
#	document-release/SKILL.md
#	investigate/SKILL.md
#	office-hours/SKILL.md
#	plan-ceo-review/SKILL.md
#	plan-design-review/SKILL.md
#	plan-eng-review/SKILL.md
#	qa-only/SKILL.md
#	qa/SKILL.md
#	retro/SKILL.md
#	review/SKILL.md
#	scripts/gen-skill-docs.ts
#	setup-browser-cookies/SKILL.md
#	ship/SKILL.md
This commit is contained in:
Garry Tan
2026-03-21 19:32:18 -07:00
48 changed files with 3646 additions and 3240 deletions
+78 -118
View File
@@ -40,6 +40,12 @@ _TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EMAIL=$(~/.claude/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
_AUTH_OK=$(~/.claude/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
echo "EMAIL: ${_EMAIL:-none}"
echo "COMM_PROMPTED: $_COMM_PROMPTED"
echo "AUTH: $_AUTH_OK"
mkdir -p ~/.gstack/analytics
echo '{"skill":"codex","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -65,28 +71,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> gstack can share usage data (which skills you use, how long they take, crash info)
> to help improve the project. No code, file paths, or repo names are ever sent.
>
> The **community tier** unlocks extra features:
> - **Cloud backup** of your gstack config + history (restore on new machines)
> - **Benchmarks**: see how your usage compares to other builders
> - **Skill recommendations** based on community patterns
>
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
- B) Anonymous — share data only, no account
- C) No thanks
If A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`
If A: ask for their email via a follow-up AskUserQuestion, then run:
```bash
~/.claude/skills/gstack/bin/gstack-config set telemetry community
~/.claude/skills/gstack/bin/gstack-auth <user-provided-email>
```
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
If B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
If C: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
@@ -95,6 +104,33 @@ touch ~/.gstack/.telemetry-prompted
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
> You're already sharing anonymous usage data — nice! Want to unlock more?
>
> The **community tier** adds:
> - Cloud backup of your gstack config (restore on new machines)
> - Benchmarks: see how your /qa times compare to the community
> - Skill recommendations based on what other builders use
>
> Just needs your email (verified via a one-time code).
Options:
- A) Yes, join community (enter email)
- B) Not now
If A: ask for their email, then run `~/.claude/skills/gstack/bin/gstack-auth <email>`.
Wait for the verification code. On success, run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`.
If B: do nothing.
Always run:
```bash
touch ~/.gstack/.community-prompted
```
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
@@ -132,26 +168,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.claude/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -221,10 +237,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
**For errors:** Also determine:
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
command that failed and the key error text. Example: `"bun test: 3 tests failed in
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
Run this bash:
@@ -234,12 +255,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.claude/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
--failed-step "FAILED_STEP" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
outcome is not error. If the outcome is error but you cannot determine the details,
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
never blocks the user.
## Step 0: Detect base branch
@@ -320,13 +345,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
2. Run the review (5-minute timeout):
```bash
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
(e.g., `/codex review focus on security`), pass them as the prompt argument:
```bash
codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```
3. Capture the output. Then parse cost from stderr:
@@ -367,85 +392,17 @@ CROSS-MODEL ANALYSIS:
7. Persist the review result:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}'
```
Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),
GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers),
findings_fixed (count of findings that were addressed/fixed before shipping).
GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers).
8. Clean up temp files:
```bash
rm -f "$TMPERR"
```
## Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the
**plan file** itself so review status is visible to anyone reading the plan.
### Detect the plan file
1. Check if there is an active plan file in this conversation (the host provides plan file
paths in system messages — look for plan file references in the conversation context).
2. If not found, skip this section silently — not every review runs in plan mode.
### Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above.
Parse each JSONL entry. Each skill logs different fields:
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries.
For the review you just completed, you may use richer details from your own Completion
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
\`\`\`markdown
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
\`\`\`
Below the table, add these lines (omit any that are empty/not applicable):
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
### Write to the plan file
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
- Search the plan file for a \`## GSTACK REVIEW REPORT\` section **anywhere** in the file
(not just at the end — content may have been added after it).
- If found, **replace it** entirely using the Edit tool. Match from \`## GSTACK REVIEW REPORT\`
through either the next \`## \` heading or end of file, whichever comes first. This ensures
content added after the report section is preserved, not eaten. If the Edit fails
(e.g., concurrent edit changed the content), re-read the plan file and retry once.
- If no such section exists, **append it** to the end of the plan file.
- Always place it as the very last section in the plan file. If it was found mid-file,
move it: delete the old location and append at the end.
---
## Step 2B: Challenge (Adversarial) Mode
@@ -549,7 +506,7 @@ THE PLAN:
For a **new session:**
```bash
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
import sys, json
for line in sys.stdin:
line = line.strip()
@@ -582,7 +539,7 @@ for line in sys.stdin:
For a **resumed session** (user chose "Continue"):
```bash
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
<same python streaming parser as above>
"
```
@@ -618,7 +575,10 @@ Session saved — run /codex again to continue this conversation.
agentic coding model). This means as OpenAI ships newer models, /codex automatically
uses them. If the user wants a specific model, pass `-m` through to codex.
**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
**Reasoning effort** varies by mode — use the right level for each task:
- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
- **Consult mode:** `high` — good balance of depth and speed for conversations.
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.