Merge remote-tracking branch 'origin/garrytan/community-mode' into garrytan/persistent-docs

# Conflicts: # .agents/skills/gstack-browse/SKILL.md # .agents/skills/gstack-design-consultation/SKILL.md # .agents/skills/gstack-design-review/SKILL.md # .agents/skills/gstack-document-release/SKILL.md # .agents/skills/gstack-investigate/SKILL.md # .agents/skills/gstack-office-hours/SKILL.md # .agents/skills/gstack-plan-ceo-review/SKILL.md # .agents/skills/gstack-plan-design-review/SKILL.md # .agents/skills/gstack-plan-eng-review/SKILL.md # .agents/skills/gstack-qa-only/SKILL.md # .agents/skills/gstack-qa/SKILL.md # .agents/skills/gstack-retro/SKILL.md # .agents/skills/gstack-review/SKILL.md # .agents/skills/gstack-setup-browser-cookies/SKILL.md # .agents/skills/gstack-ship/SKILL.md # .agents/skills/gstack/SKILL.md # SKILL.md # browse/SKILL.md # codex/SKILL.md # design-consultation/SKILL.md # design-review/SKILL.md # document-release/SKILL.md # investigate/SKILL.md # office-hours/SKILL.md # plan-ceo-review/SKILL.md # plan-design-review/SKILL.md # plan-eng-review/SKILL.md # qa-only/SKILL.md # qa/SKILL.md # retro/SKILL.md # review/SKILL.md # scripts/gen-skill-docs.ts # setup-browser-cookies/SKILL.md # ship/SKILL.md
2026-05-05 05:05:08 +02:00 · 2026-03-21 19:32:18 -07:00
parent 022de19ab3 b349769e2e
commit a14401f3db
48 changed files with 3646 additions and 3240 deletions
@@ -40,6 +40,12 @@ _TEL_START=$(date +%s)
 _SESSION_ID="$$-$(date +%s)"
 echo "TELEMETRY: ${_TEL:-off}"
 echo "TEL_PROMPTED: $_TEL_PROMPTED"
+_EMAIL=$(~/.claude/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
+_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
+_AUTH_OK=$(~/.claude/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
+echo "EMAIL: ${_EMAIL:-none}"
+echo "COMM_PROMPTED: $_COMM_PROMPTED"
+echo "AUTH: $_AUTH_OK"
 mkdir -p ~/.gstack/analytics
 echo '{"skill":"codex","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
 for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
@@ -65,28 +71,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
 If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
 ask the user about telemetry. Use AskUserQuestion:

-> Help gstack get better! Community mode shares usage data (which skills you use, how long
-> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
-> No code, file paths, or repo names are ever sent.
+> gstack can share usage data (which skills you use, how long they take, crash info)
+> to help improve the project. No code, file paths, or repo names are ever sent.
+>
+> The **community tier** unlocks extra features:
+> - **Cloud backup** of your gstack config + history (restore on new machines)
+> - **Benchmarks**: see how your usage compares to other builders
+> - **Skill recommendations** based on community patterns
+>
 > Change anytime with `gstack-config set telemetry off`.

 Options:
- A) Help gstack get better! (recommended)
- B) No thanks
+- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
+- B) Anonymous — share data only, no account
+- C) No thanks

-If A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`
+If A: ask for their email via a follow-up AskUserQuestion, then run:
+```bash
+~/.claude/skills/gstack/bin/gstack-config set telemetry community
+~/.claude/skills/gstack/bin/gstack-auth <user-provided-email>
+```
+The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
+If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.

-If B: ask a follow-up AskUserQuestion:
-
-> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
-> no way to connect sessions. Just a counter that helps us know if anyone's out there.
-
-Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
-
-If B→A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
-If B→B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
+If B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
+If C: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`

 Always run:
 ```bash
@@ -95,6 +104,33 @@ touch ~/.gstack/.telemetry-prompted

 This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.

+If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
+begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
+
+> You're already sharing anonymous usage data — nice! Want to unlock more?
+>
+> The **community tier** adds:
+> - Cloud backup of your gstack config (restore on new machines)
+> - Benchmarks: see how your /qa times compare to the community
+> - Skill recommendations based on what other builders use
+>
+> Just needs your email (verified via a one-time code).
+
+Options:
+- A) Yes, join community (enter email)
+- B) Not now
+
+If A: ask for their email, then run `~/.claude/skills/gstack/bin/gstack-auth <email>`.
+Wait for the verification code. On success, run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`.
+If B: do nothing.
+
+Always run:
+```bash
+touch ~/.gstack/.community-prompted
+```
+
+This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
+
 ## AskUserQuestion Format

 **ALWAYS follow this structure for every AskUserQuestion call:**
@@ -132,26 +168,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
 - BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
 - BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")

-## Search Before Building
-
-Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.claude/skills/gstack/ETHOS.md` for the full philosophy.
-
-**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
-
-**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
-"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
-
-Log eureka moments:
-```bash
-jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
-```
-Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
-
-**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
-
 ## Contributor Mode

 If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
@@ -221,10 +237,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
 Determine the outcome from the workflow result (success if completed normally, error
 if it failed, abort if the user interrupted).

-**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
-`~/.gstack/analytics/` (user config directory, not project files). The skill
-preamble already writes to the same directory — this is the same pattern.
-Skipping this command loses session duration and outcome data.
+**For errors:** Also determine:
+- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
+  `git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
+  `merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
+- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
+  command that failed and the key error text. Example: `"bun test: 3 tests failed in
+  auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
+- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
+  `"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.

 Run this bash:

@@ -234,12 +255,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
 rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
 ~/.claude/skills/gstack/bin/gstack-telemetry-log \
  --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
-  --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
+  --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
+  --error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
+  --failed-step "FAILED_STEP" 2>/dev/null &
 ```

 Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
 success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
-If you cannot determine the outcome, use "unknown". This runs in the background and
+For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
+outcome is not error. If the outcome is error but you cannot determine the details,
+use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
 never blocks the user.

 ## Step 0: Detect base branch
@@ -320,13 +345,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)

 2. Run the review (5-minute timeout):
 ```bash
-codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
+codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```

 Use `timeout: 300000` on the Bash call. If the user provided custom instructions
 (e.g., `/codex review focus on security`), pass them as the prompt argument:
 ```bash
-codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
+codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```

 3. Capture the output. Then parse cost from stderr:
@@ -367,85 +392,17 @@ CROSS-MODEL ANALYSIS:

 7. Persist the review result:
 ```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
+~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}'
 ```

 Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),
-GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers),
-findings_fixed (count of findings that were addressed/fixed before shipping).
+GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers).

 8. Clean up temp files:
 ```bash
 rm -f "$TMPERR"
 ```

-## Plan File Review Report
-
-After displaying the Review Readiness Dashboard in conversation output, also update the
-**plan file** itself so review status is visible to anyone reading the plan.
-
-### Detect the plan file
-
-1. Check if there is an active plan file in this conversation (the host provides plan file
-   paths in system messages — look for plan file references in the conversation context).
-2. If not found, skip this section silently — not every review runs in plan mode.
-
-### Generate the report
-
-Read the review log output you already have from the Review Readiness Dashboard step above.
-Parse each JSONL entry. Each skill logs different fields:
-
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
-  → Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
-  → If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
-  → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
-  → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
-  → Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
-
-All fields needed for the Findings column are now present in the JSONL entries.
-For the review you just completed, you may use richer details from your own Completion
-Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
-
-Produce this markdown table:
-
-\`\`\`markdown
-## GSTACK REVIEW REPORT
-
-| Review | Trigger | Why | Runs | Status | Findings |
-|--------|---------|-----|------|--------|----------|
-| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
-| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
-| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
-| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
-\`\`\`
-
-Below the table, add these lines (omit any that are empty/not applicable):
-
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
-  If Eng Review is not CLEAR and not skipped globally, append "eng review required".
-
-### Write to the plan file
-
-**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
-file you are allowed to edit in plan mode. The plan file review report is part of the
-plan's living status.
-
- Search the plan file for a \`## GSTACK REVIEW REPORT\` section **anywhere** in the file
-  (not just at the end — content may have been added after it).
- If found, **replace it** entirely using the Edit tool. Match from \`## GSTACK REVIEW REPORT\`
-  through either the next \`## \` heading or end of file, whichever comes first. This ensures
-  content added after the report section is preserved, not eaten. If the Edit fails
-  (e.g., concurrent edit changed the content), re-read the plan file and retry once.
- If no such section exists, **append it** to the end of the plan file.
- Always place it as the very last section in the plan file. If it was found mid-file,
-  move it: delete the old location and append at the end.
-
 ---

 ## Step 2B: Challenge (Adversarial) Mode
@@ -549,7 +506,7 @@ THE PLAN:

 For a **new session:**
 ```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
 import sys, json
 for line in sys.stdin:
    line = line.strip()
@@ -582,7 +539,7 @@ for line in sys.stdin:

 For a **resumed session** (user chose "Continue"):
 ```bash
-codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
 <same python streaming parser as above>
 "
 ```
@@ -618,7 +575,10 @@ Session saved — run /codex again to continue this conversation.
 agentic coding model). This means as OpenAI ships newer models, /codex automatically
 uses them. If the user wants a specific model, pass `-m` through to codex.

-**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
+**Reasoning effort** varies by mode — use the right level for each task:
+- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
+- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
+- **Consult mode:** `high` — good balance of depth and speed for conversations.

 **Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
 docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.