mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
Merge remote-tracking branch 'origin/garrytan/community-mode' into garrytan/persistent-docs
# Conflicts: # .agents/skills/gstack-browse/SKILL.md # .agents/skills/gstack-design-consultation/SKILL.md # .agents/skills/gstack-design-review/SKILL.md # .agents/skills/gstack-document-release/SKILL.md # .agents/skills/gstack-investigate/SKILL.md # .agents/skills/gstack-office-hours/SKILL.md # .agents/skills/gstack-plan-ceo-review/SKILL.md # .agents/skills/gstack-plan-design-review/SKILL.md # .agents/skills/gstack-plan-eng-review/SKILL.md # .agents/skills/gstack-qa-only/SKILL.md # .agents/skills/gstack-qa/SKILL.md # .agents/skills/gstack-retro/SKILL.md # .agents/skills/gstack-review/SKILL.md # .agents/skills/gstack-setup-browser-cookies/SKILL.md # .agents/skills/gstack-ship/SKILL.md # .agents/skills/gstack/SKILL.md # SKILL.md # browse/SKILL.md # codex/SKILL.md # design-consultation/SKILL.md # design-review/SKILL.md # document-release/SKILL.md # investigate/SKILL.md # office-hours/SKILL.md # plan-ceo-review/SKILL.md # plan-design-review/SKILL.md # plan-eng-review/SKILL.md # qa-only/SKILL.md # qa/SKILL.md # retro/SKILL.md # review/SKILL.md # scripts/gen-skill-docs.ts # setup-browser-cookies/SKILL.md # ship/SKILL.md
This commit is contained in:
+78
-118
@@ -40,6 +40,12 @@ _TEL_START=$(date +%s)
|
||||
_SESSION_ID="$$-$(date +%s)"
|
||||
echo "TELEMETRY: ${_TEL:-off}"
|
||||
echo "TEL_PROMPTED: $_TEL_PROMPTED"
|
||||
_EMAIL=$(~/.claude/skills/gstack/bin/gstack-config get email 2>/dev/null || true)
|
||||
_COMM_PROMPTED=$([ -f ~/.gstack/.community-prompted ] && echo "yes" || echo "no")
|
||||
_AUTH_OK=$(~/.claude/skills/gstack/bin/gstack-auth-refresh --check 2>/dev/null && echo "yes" || echo "no")
|
||||
echo "EMAIL: ${_EMAIL:-none}"
|
||||
echo "COMM_PROMPTED: $_COMM_PROMPTED"
|
||||
echo "AUTH: $_AUTH_OK"
|
||||
mkdir -p ~/.gstack/analytics
|
||||
echo '{"skill":"codex","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
|
||||
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
|
||||
@@ -65,28 +71,31 @@ Only run `open` if the user says yes. Always run `touch` to mark as seen. This o
|
||||
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
|
||||
ask the user about telemetry. Use AskUserQuestion:
|
||||
|
||||
> Help gstack get better! Community mode shares usage data (which skills you use, how long
|
||||
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
|
||||
> No code, file paths, or repo names are ever sent.
|
||||
> gstack can share usage data (which skills you use, how long they take, crash info)
|
||||
> to help improve the project. No code, file paths, or repo names are ever sent.
|
||||
>
|
||||
> The **community tier** unlocks extra features:
|
||||
> - **Cloud backup** of your gstack config + history (restore on new machines)
|
||||
> - **Benchmarks**: see how your usage compares to other builders
|
||||
> - **Skill recommendations** based on community patterns
|
||||
>
|
||||
> Change anytime with `gstack-config set telemetry off`.
|
||||
|
||||
Options:
|
||||
- A) Help gstack get better! (recommended)
|
||||
- B) No thanks
|
||||
- A) Community — share data + email for backup, benchmarks & recommendations (recommended)
|
||||
- B) Anonymous — share data only, no account
|
||||
- C) No thanks
|
||||
|
||||
If A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`
|
||||
If A: ask for their email via a follow-up AskUserQuestion, then run:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-config set telemetry community
|
||||
~/.claude/skills/gstack/bin/gstack-auth <user-provided-email>
|
||||
```
|
||||
The auth script will send a verification code to their email. Wait for them to enter the 6-digit code.
|
||||
If auth succeeds, continue with the skill. If it fails, fall back to anonymous tier.
|
||||
|
||||
If B: ask a follow-up AskUserQuestion:
|
||||
|
||||
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
|
||||
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
|
||||
|
||||
Options:
|
||||
- A) Sure, anonymous is fine
|
||||
- B) No thanks, fully off
|
||||
|
||||
If B→A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
|
||||
If B→B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
|
||||
If B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
|
||||
If C: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
|
||||
|
||||
Always run:
|
||||
```bash
|
||||
@@ -95,6 +104,33 @@ touch ~/.gstack/.telemetry-prompted
|
||||
|
||||
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
|
||||
|
||||
If `TELEMETRY` is `anonymous` AND `COMM_PROMPTED` is `no`: After the main skill workflow
|
||||
begins (not during preamble), offer the community tier upgrade once. Use AskUserQuestion:
|
||||
|
||||
> You're already sharing anonymous usage data — nice! Want to unlock more?
|
||||
>
|
||||
> The **community tier** adds:
|
||||
> - Cloud backup of your gstack config (restore on new machines)
|
||||
> - Benchmarks: see how your /qa times compare to the community
|
||||
> - Skill recommendations based on what other builders use
|
||||
>
|
||||
> Just needs your email (verified via a one-time code).
|
||||
|
||||
Options:
|
||||
- A) Yes, join community (enter email)
|
||||
- B) Not now
|
||||
|
||||
If A: ask for their email, then run `~/.claude/skills/gstack/bin/gstack-auth <email>`.
|
||||
Wait for the verification code. On success, run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`.
|
||||
If B: do nothing.
|
||||
|
||||
Always run:
|
||||
```bash
|
||||
touch ~/.gstack/.community-prompted
|
||||
```
|
||||
|
||||
This only happens once. If `COMM_PROMPTED` is `yes`, skip this entirely.
|
||||
|
||||
## AskUserQuestion Format
|
||||
|
||||
**ALWAYS follow this structure for every AskUserQuestion call:**
|
||||
@@ -132,26 +168,6 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
|
||||
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
|
||||
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
|
||||
|
||||
## Search Before Building
|
||||
|
||||
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.claude/skills/gstack/ETHOS.md` for the full philosophy.
|
||||
|
||||
**Three layers of knowledge:**
|
||||
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
|
||||
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
|
||||
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
|
||||
|
||||
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
|
||||
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
|
||||
|
||||
Log eureka moments:
|
||||
```bash
|
||||
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
|
||||
```
|
||||
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
|
||||
|
||||
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
|
||||
|
||||
## Contributor Mode
|
||||
|
||||
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
|
||||
@@ -221,10 +237,15 @@ Determine the skill name from the `name:` field in this file's YAML frontmatter.
|
||||
Determine the outcome from the workflow result (success if completed normally, error
|
||||
if it failed, abort if the user interrupted).
|
||||
|
||||
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
|
||||
`~/.gstack/analytics/` (user config directory, not project files). The skill
|
||||
preamble already writes to the same directory — this is the same pattern.
|
||||
Skipping this command loses session duration and outcome data.
|
||||
**For errors:** Also determine:
|
||||
- `ERROR_CLASS`: a short category — one of: `timeout`, `test_failure`, `build_failure`,
|
||||
`git_error`, `auth_error`, `network_error`, `browse_error`, `lint_error`,
|
||||
`merge_conflict`, `permission_error`, `unknown_error`. Pick the most specific match.
|
||||
- `ERROR_MESSAGE`: a one-line summary of what went wrong (max 200 chars). Include the
|
||||
command that failed and the key error text. Example: `"bun test: 3 tests failed in
|
||||
auth.test.ts — expected 200 got 401"`. Never include file paths, secrets, or PII.
|
||||
- `FAILED_STEP`: which step in the skill workflow failed. Example: `"run_tests"`,
|
||||
`"create_pr"`, `"merge_base"`, `"build"`, `"qa_browse"`. Use snake_case, max 30 chars.
|
||||
|
||||
Run this bash:
|
||||
|
||||
@@ -234,12 +255,16 @@ _TEL_DUR=$(( _TEL_END - _TEL_START ))
|
||||
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
|
||||
~/.claude/skills/gstack/bin/gstack-telemetry-log \
|
||||
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
|
||||
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
|
||||
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" \
|
||||
--error-class "ERROR_CLASS" --error-message "ERROR_MESSAGE" \
|
||||
--failed-step "FAILED_STEP" 2>/dev/null &
|
||||
```
|
||||
|
||||
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
|
||||
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
|
||||
If you cannot determine the outcome, use "unknown". This runs in the background and
|
||||
For `ERROR_CLASS`, `ERROR_MESSAGE`, and `FAILED_STEP`: use empty string `""` if the
|
||||
outcome is not error. If the outcome is error but you cannot determine the details,
|
||||
use `"unknown_error"`, `""`, and `""` respectively. This runs in the background and
|
||||
never blocks the user.
|
||||
|
||||
## Step 0: Detect base branch
|
||||
@@ -320,13 +345,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
|
||||
|
||||
2. Run the review (5-minute timeout):
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
|
||||
(e.g., `/codex review focus on security`), pass them as the prompt argument:
|
||||
```bash
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
3. Capture the output. Then parse cost from stderr:
|
||||
@@ -367,85 +392,17 @@ CROSS-MODEL ANALYSIS:
|
||||
|
||||
7. Persist the review result:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}'
|
||||
```
|
||||
|
||||
Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),
|
||||
GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers),
|
||||
findings_fixed (count of findings that were addressed/fixed before shipping).
|
||||
GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers).
|
||||
|
||||
8. Clean up temp files:
|
||||
```bash
|
||||
rm -f "$TMPERR"
|
||||
```
|
||||
|
||||
## Plan File Review Report
|
||||
|
||||
After displaying the Review Readiness Dashboard in conversation output, also update the
|
||||
**plan file** itself so review status is visible to anyone reading the plan.
|
||||
|
||||
### Detect the plan file
|
||||
|
||||
1. Check if there is an active plan file in this conversation (the host provides plan file
|
||||
paths in system messages — look for plan file references in the conversation context).
|
||||
2. If not found, skip this section silently — not every review runs in plan mode.
|
||||
|
||||
### Generate the report
|
||||
|
||||
Read the review log output you already have from the Review Readiness Dashboard step above.
|
||||
Parse each JSONL entry. Each skill logs different fields:
|
||||
|
||||
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
|
||||
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
|
||||
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
|
||||
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
|
||||
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
|
||||
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
|
||||
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
|
||||
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
|
||||
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
|
||||
|
||||
All fields needed for the Findings column are now present in the JSONL entries.
|
||||
For the review you just completed, you may use richer details from your own Completion
|
||||
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
|
||||
|
||||
Produce this markdown table:
|
||||
|
||||
\`\`\`markdown
|
||||
## GSTACK REVIEW REPORT
|
||||
|
||||
| Review | Trigger | Why | Runs | Status | Findings |
|
||||
|--------|---------|-----|------|--------|----------|
|
||||
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
|
||||
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
|
||||
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
|
||||
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
|
||||
\`\`\`
|
||||
|
||||
Below the table, add these lines (omit any that are empty/not applicable):
|
||||
|
||||
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
|
||||
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
|
||||
- **UNRESOLVED:** total unresolved decisions across all reviews
|
||||
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
|
||||
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
|
||||
|
||||
### Write to the plan file
|
||||
|
||||
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
|
||||
file you are allowed to edit in plan mode. The plan file review report is part of the
|
||||
plan's living status.
|
||||
|
||||
- Search the plan file for a \`## GSTACK REVIEW REPORT\` section **anywhere** in the file
|
||||
(not just at the end — content may have been added after it).
|
||||
- If found, **replace it** entirely using the Edit tool. Match from \`## GSTACK REVIEW REPORT\`
|
||||
through either the next \`## \` heading or end of file, whichever comes first. This ensures
|
||||
content added after the report section is preserved, not eaten. If the Edit fails
|
||||
(e.g., concurrent edit changed the content), re-read the plan file and retry once.
|
||||
- If no such section exists, **append it** to the end of the plan file.
|
||||
- Always place it as the very last section in the plan file. If it was found mid-file,
|
||||
move it: delete the old location and append at the end.
|
||||
|
||||
---
|
||||
|
||||
## Step 2B: Challenge (Adversarial) Mode
|
||||
@@ -549,7 +506,7 @@ THE PLAN:
|
||||
|
||||
For a **new session:**
|
||||
```bash
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -582,7 +539,7 @@ for line in sys.stdin:
|
||||
|
||||
For a **resumed session** (user chose "Continue"):
|
||||
```bash
|
||||
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
<same python streaming parser as above>
|
||||
"
|
||||
```
|
||||
@@ -618,7 +575,10 @@ Session saved — run /codex again to continue this conversation.
|
||||
agentic coding model). This means as OpenAI ships newer models, /codex automatically
|
||||
uses them. If the user wants a specific model, pass `-m` through to codex.
|
||||
|
||||
**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
|
||||
**Reasoning effort** varies by mode — use the right level for each task:
|
||||
- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
|
||||
- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
|
||||
- **Consult mode:** `high` — good balance of depth and speed for conversations.
|
||||
|
||||
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
|
||||
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
|
||||
|
||||
Reference in New Issue
Block a user