mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
b343ba2797
* fix(security): skip hidden directories in skill template discovery
discoverTemplates() scans subdirectories for SKILL.md.tmpl files but
only skips node_modules, .git, and dist. Hidden directories like
.claude/, .agents/, and .codex/ (which contain symlinked skill
installs) were being scanned, allowing a malicious .tmpl in a
symlinked skill to inject into the generation pipeline.
Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips
all dot-prefixed directories, matching the standard convention that
hidden dirs are not source code.
* fix(security): sanitize telemetry JSONL inputs against injection
SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly
into printf %s for JSONL output. If any of them contains double quotes,
backslashes, or newlines, the JSON breaks; worse, a crafted value can
inject arbitrary fields.
Fix: strip quotes, backslashes, and control characters from all
string fields before JSONL construction via json_safe() helper.
* fix(security): validate JSON input in gstack-review-log
gstack-review-log appends its argument directly to a JSONL file with
no validation. Malformed or crafted input could corrupt the review log
or inject arbitrary content.
Fix: validate input is parseable JSON via python3 before appending.
Reject with exit 1 and stderr message if invalid.
* fix: treat relative dot-paths as file paths in screenshot command
Closes #495
* fix: use host-specific co-author trailer in /ship and /document-release
Codex-generated skills hardcoded a Claude co-author trailer in commit
messages. Users running gstack under Codex pushed commits attributed
to the wrong AI assistant.
Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer
based on ctx.host:
- claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- codex: Co-Authored-By: OpenAI Codex <noreply@openai.com>
Replace hardcoded trailers in ship/SKILL.md.tmpl and
document-release/SKILL.md.tmpl with the resolver placeholder.
Fixes #282. Fixes #383.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: auto-upgrade marker no longer masks newer remote versions
When a just-upgraded-from marker persists across sessions, the update
check would write UP_TO_DATE to cache and exit immediately — never
fetching the remote VERSION. Users silently miss updates that landed
after their last upgrade.
Remove the early exit and premature cache write so the script falls
through to the remote check after consuming the marker. This ensures
JUST_UPGRADED is still emitted for the preamble, while also detecting
any newer versions available upstream.
Fixes #515
* fix: decouple doc generation from binary compilation in build script
The build script chains gen:skill-docs and bun build --compile with &&,
so a doc generation failure (e.g. missing Codex host config, template
error) prevents the browse binary from being compiled. Users end up
with a broken install where setup reports the binary is missing.
Replace && with ; for the two gen:skill-docs steps so they run
independently of the compilation chain. Doc generation errors are still
visible in stderr, but no longer block binary compilation.
Fixes #482
* fix: extend security sanitization + add 10 tests for merged community PRs
- Extend json_safe() to ERROR_CLASS and FAILED_STEP fields
- Improve ERROR_MESSAGE escaping to handle backslashes and newlines
- Replace python3 with bun for JSON validation in gstack-review-log
- Add 7 telemetry injection prevention tests
- Add 2 review-log JSON validation tests
- Add 1 discover-skills hidden directory filtering test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize flaky E2E tests (browse-basic, ship-base-branch, dashboard-via)
browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction)
ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md
dashboard-via: extract dashboard section only + increase timeout 90s→180s
Root cause: copying full SKILL.md files into test fixtures caused context bloat,
leading to timeouts and flaky turn limits. Extracting only the relevant section
cut dashboard-via from timing out at 240s to finishing in 38s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add E2E fixture extraction rule to CLAUDE.md
Never copy full SKILL.md files into E2E test fixtures. Extract only
the section the test needs. Also: run targeted evals in foreground,
never pkill and restart mid-run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize journey-think-bigger routing test
Use exact trigger phrases from plan-ceo-review skill description
("think bigger", "expand scope", "ambitious enough") instead of
the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut
cost per attempt ($0.12 vs $0.25). Test remains periodic tier
since LLM routing is inherently non-deterministic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* remove: delete journey-think-bigger routing test
Never passed reliably. Tests ambiguous routing ("think bigger" →
plan-ceo-review) but Claude legitimately answers directly instead
of invoking a skill. The other 10 journey tests cover routing
with clear, actionable signals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.12.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: bluzername <bluzer@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg Jackson <gregario@users.noreply.github.com>
375 lines
15 KiB
TypeScript
import type { TemplateContext } from './types';

export function generateSlugEval(ctx: TemplateContext): string {
  return `eval "$(${ctx.paths.binDir}/gstack-slug 2>/dev/null)"`;
}

export function generateSlugSetup(ctx: TemplateContext): string {
  return `eval "$(${ctx.paths.binDir}/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG`;
}
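// Example of the emitted shell, assuming a hypothetical binDir of
// ~/.gstack/bin (the real path comes from ctx.paths at generation time):
//
//   eval "$(~/.gstack/bin/gstack-slug 2>/dev/null)"
//
// $SLUG in generateSlugSetup is deliberately literal: template literals only
// interpolate ${...}, so the variable is expanded by the shell that runs the
// generated snippet, not by this generator.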
export function generateBaseBranchDetect(_ctx: TemplateContext): string {
  return `## Step 0: Detect platform and base branch

First, detect the git hosting platform from the remote URL:

\`\`\`bash
git remote get-url origin 2>/dev/null
\`\`\`

- If the URL contains "github.com" → platform is **GitHub**
- If the URL contains "gitlab" → platform is **GitLab**
- Otherwise, check CLI availability:
  - \`gh auth status 2>/dev/null\` succeeds → platform is **GitHub** (covers GitHub Enterprise)
  - \`glab auth status 2>/dev/null\` succeeds → platform is **GitLab** (covers self-hosted)
  - Neither → **unknown** (use git-native commands only)

Determine which branch this PR/MR targets, or the repo's default branch if no
PR/MR exists. Use the result as "the base branch" in all subsequent steps.

**If GitHub:**
1. \`gh pr view --json baseRefName -q .baseRefName\` — if it succeeds, use it
2. \`gh repo view --json defaultBranchRef -q .defaultBranchRef.name\` — if it succeeds, use it

**If GitLab:**
1. \`glab mr view -F json 2>/dev/null\` and extract the \`target_branch\` field — if it succeeds, use it
2. \`glab repo view -F json 2>/dev/null\` and extract the \`default_branch\` field — if it succeeds, use it

**Git-native fallback (if unknown platform, or CLI commands fail):**
1. \`git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'\`
2. If that fails: \`git rev-parse --verify origin/main 2>/dev/null\` → use \`main\`
3. If that fails: \`git rev-parse --verify origin/master 2>/dev/null\` → use \`master\`

If all fail, fall back to \`main\`.

Print the detected base branch name. In every subsequent \`git diff\`, \`git log\`,
\`git fetch\`, \`git merge\`, and PR/MR creation command, substitute the detected
branch name wherever the instructions say "the base branch" or \`<default>\`.

---`;
}
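// Illustration only: the Step 0 fallback chain above, expressed as a
// hypothetical TypeScript helper. The generated skill has the agent run
// these commands at runtime; this module never executes anything itself,
// and detectBaseBranch/run are invented names, not part of gstack.
import { execSync } from 'node:child_process';

function run(cmd: string): string | null {
  try {
    const out = execSync(cmd, { stdio: ['ignore', 'pipe', 'ignore'] });
    return out.toString().trim() || null;
  } catch {
    return null;
  }
}

function detectBaseBranch(): string {
  // PR target first (GitHub shown; glab is analogous), then the default branch.
  const pr = run('gh pr view --json baseRefName -q .baseRefName');
  if (pr) return pr;
  const head = run('git symbolic-ref refs/remotes/origin/HEAD');
  if (head) return head.replace('refs/remotes/origin/', '');
  if (run('git rev-parse --verify origin/main')) return 'main';
  if (run('git rev-parse --verify origin/master')) return 'master';
  return 'main'; // final fallback, as the template instructs
}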
export function generateDeployBootstrap(_ctx: TemplateContext): string {
  return `\`\`\`bash
# Check for persisted deploy config in CLAUDE.md
DEPLOY_CONFIG=$(grep -A 20 "## Deploy Configuration" CLAUDE.md 2>/dev/null || echo "NO_CONFIG")
echo "$DEPLOY_CONFIG"

# If config exists, parse it
if [ "$DEPLOY_CONFIG" != "NO_CONFIG" ]; then
  PROD_URL=$(echo "$DEPLOY_CONFIG" | grep -i "production.*url" | head -1 | sed 's/.*: *//')
  PLATFORM=$(echo "$DEPLOY_CONFIG" | grep -i "platform" | head -1 | sed 's/.*: *//')
  echo "PERSISTED_PLATFORM:$PLATFORM"
  echo "PERSISTED_URL:$PROD_URL"
fi

# Auto-detect platform from config files
[ -f fly.toml ] && echo "PLATFORM:fly"
[ -f render.yaml ] && echo "PLATFORM:render"
([ -f vercel.json ] || [ -d .vercel ]) && echo "PLATFORM:vercel"
[ -f netlify.toml ] && echo "PLATFORM:netlify"
[ -f Procfile ] && echo "PLATFORM:heroku"
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"

# Detect deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
  [ -f "$f" ] && grep -qiE "deploy|release|production|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
  [ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
\`\`\`

If \`PERSISTED_PLATFORM\` and \`PERSISTED_URL\` were found in CLAUDE.md, use them directly
and skip manual detection. If no persisted config exists, use the auto-detected platform
to guide deploy verification. If nothing is detected, ask the user via AskUserQuestion
in the decision tree below.

If you want to persist deploy settings for future runs, suggest the user run \`/setup-deploy\`.`;
}
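// Hypothetical output of the bootstrap block in a repo that has a fly.toml,
// a persisted config section, and one deploy workflow (all values invented
// for illustration):
//
//   PERSISTED_PLATFORM:fly
//   PERSISTED_URL:https://myapp.fly.dev
//   PLATFORM:fly
//   DEPLOY_WORKFLOW:.github/workflows/deploy.yml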
export function generateQAMethodology(_ctx: TemplateContext): string {
  return `## Modes

### Diff-aware (automatic when on a feature branch with no URL)

This is the **primary mode** for developers verifying their work. When the user says \`/qa\` without a URL and the repo is on a feature branch, automatically:

1. **Analyze the branch diff** to understand what changed:
   \`\`\`bash
   git diff main...HEAD --name-only
   git log main..HEAD --oneline
   \`\`\`

2. **Identify affected pages/routes** from the changed files:
   - Controller/route files → which URL paths they serve
   - View/template/component files → which pages render them
   - Model/service files → which pages use those models (check controllers that reference them)
   - CSS/style files → which pages include those stylesheets
   - API endpoints → test them directly with \`$B js "await fetch('/api/...')"\`
   - Static pages (markdown, HTML) → navigate to them directly

   **If no obvious pages/routes are identified from the diff:** Do not skip browser testing. The user invoked /qa because they want browser-based verification. Fall back to Quick mode — navigate to the homepage, follow the top 5 navigation targets, check console for errors, and test any interactive elements found. Backend, config, and infrastructure changes affect app behavior — always verify the app still works.

3. **Detect the running app** — check common local dev ports:
   \`\`\`bash
   $B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \\
   $B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \\
   $B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
   \`\`\`
   If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.

4. **Test each affected page/route:**
   - Navigate to the page
   - Take a screenshot
   - Check console for errors
   - If the change was interactive (forms, buttons, flows), test the interaction end-to-end
   - Use \`snapshot -D\` before and after actions to verify the change had the expected effect

5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.

6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.

7. **Report findings** scoped to the branch changes:
   - "Changes tested: N pages/routes affected by this branch"
   - For each: does it work? Screenshot evidence.
   - Any regressions on adjacent pages?

**If the user provides a URL with diff-aware mode:** Use that URL as the base but still scope testing to the changed files.

### Full (default when URL is provided)
Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.

### Quick (\`--quick\`)
30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.

### Regression (\`--regression <baseline>\`)
Run full mode, then load \`baseline.json\` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.

---

## Workflow

### Phase 1: Initialize

1. Find browse binary (see Setup above)
2. Create output directories
3. Copy report template from \`qa/templates/qa-report-template.md\` to output dir
4. Start timer for duration tracking

### Phase 2: Authenticate (if needed)

**If the user specified auth credentials:**

\`\`\`bash
$B goto <login-url>
$B snapshot -i # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]" # NEVER include real passwords in report
$B click @e5 # submit
$B snapshot -D # verify login succeeded
\`\`\`

**If the user provided a cookie file:**

\`\`\`bash
$B cookie-import cookies.json
$B goto <target-url>
\`\`\`

**If 2FA/OTP is required:** Ask the user for the code and wait.

**If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

### Phase 3: Orient

Get a map of the application:

\`\`\`bash
$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links # map navigation structure
$B console --errors # any errors on landing?
\`\`\`

**Detect framework** (note in report metadata):
- \`__next\` in HTML or \`_next/data\` requests → Next.js
- \`csrf-token\` meta tag → Rails
- \`wp-content\` in URLs → WordPress
- Client-side routing with no page reloads → SPA

**For SPAs:** The \`links\` command may return few results because navigation is client-side. Use \`snapshot -i\` to find nav elements (buttons, menu items) instead.

### Phase 4: Explore

Visit pages systematically. At each page:

\`\`\`bash
$B goto <page-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
$B console --errors
\`\`\`

Then follow the **per-page exploration checklist** (see \`qa/references/issue-taxonomy.md\`):

1. **Visual scan** — Look at the annotated screenshot for layout issues
2. **Interactive elements** — Click buttons, links, controls. Do they work?
3. **Forms** — Fill and submit. Test empty, invalid, edge cases
4. **Navigation** — Check all paths in and out
5. **States** — Empty state, loading, error, overflow
6. **Console** — Any new JS errors after interactions?
7. **Responsiveness** — Check mobile viewport if relevant:
   \`\`\`bash
   $B viewport 375x812
   $B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
   $B viewport 1280x720
   \`\`\`

**Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).

**Quick mode:** Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?

### Phase 5: Document

Document each issue **immediately when found** — don't batch them.

**Two evidence tiers:**

**Interactive bugs** (broken flows, dead buttons, form failures):
1. Take a screenshot before the action
2. Perform the action
3. Take a screenshot showing the result
4. Use \`snapshot -D\` to show what changed
5. Write repro steps referencing screenshots

\`\`\`bash
$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D
\`\`\`

**Static bugs** (typos, layout issues, missing images):
1. Take a single annotated screenshot showing the problem
2. Describe what's wrong

\`\`\`bash
$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
\`\`\`

**Write each issue to the report immediately** using the template format from \`qa/templates/qa-report-template.md\`.

### Phase 6: Wrap Up

1. **Compute health score** using the rubric below
2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
3. **Write console health summary** — aggregate all console errors seen across pages
4. **Update severity counts** in the summary table
5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework
6. **Save baseline** — write \`baseline.json\` with:
   \`\`\`json
   {
     "date": "YYYY-MM-DD",
     "url": "<target>",
     "healthScore": N,
     "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
     "categoryScores": { "console": N, "links": N, ... }
   }
   \`\`\`

**Regression mode:** After writing the report, load the baseline file. Compare:
- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)
- Append the regression section to the report

---

## Health Score Rubric

Compute each category score (0-100), then take the weighted average.

### Console (weight: 15%)
- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- 11+ errors → 10

### Links (weight: 10%)
- 0 broken → 100
- Each broken link → -15 (minimum 0)

### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
Each category starts at 100. Deduct per finding:
- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3
Minimum 0 per category.

### Weights
| Category | Weight |
|----------|--------|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |

### Final Score
\`score = Σ (category_score × weight)\`

---

## Framework-Specific Guidance

### Next.js
- Check console for hydration errors (\`Hydration failed\`, \`Text content did not match\`)
- Monitor \`_next/data\` requests in network — 404s indicate broken data fetching
- Test client-side navigation (click links, don't just \`goto\`) — catches routing issues
- Check for CLS (Cumulative Layout Shift) on pages with dynamic content

### Rails
- Check for N+1 query warnings in console (if development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check for flash messages appearing and dismissing correctly

### WordPress
- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (\`/wp-json/\`)
- Check for mixed content warnings (common with WP)

### General SPA (React, Vue, Angular)
- Use \`snapshot -i\` for navigation — \`links\` command misses client-side routes
- Check for stale state (navigate away and back — does data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor console after extended use)

---

## Important Rules

1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
3. **Never include credentials.** Write \`[REDACTED]\` for passwords in repro steps.
4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
5. **Never read source code.** Test as a user, not a developer.
6. **Check console after every interaction.** JS errors that don't surface visually are still bugs.
7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions.
9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
10. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
}
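// Sketch of the weighted-average rubric above as executable TypeScript.
// Illustrative only: the agent computes the score by following the generated
// instructions, and the names/types here are invented for the example, not
// part of gstack's API. Weights match the table and sum to 1.0.
type CategoryScores = Record<
  'console' | 'links' | 'visual' | 'functional' | 'ux' | 'performance' | 'content' | 'accessibility',
  number // each category score is 0-100
>;

const WEIGHTS: CategoryScores = {
  console: 0.15, links: 0.10, visual: 0.10, functional: 0.20,
  ux: 0.15, performance: 0.10, content: 0.05, accessibility: 0.15,
};

function computeHealthScore(scores: CategoryScores): number {
  // score = Σ (category_score × weight)
  return Object.entries(WEIGHTS).reduce(
    (sum, [category, weight]) => sum + scores[category as keyof CategoryScores] * weight,
    0,
  );
}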
export function generateCoAuthorTrailer(ctx: TemplateContext): string {
  if (ctx.host === 'codex') {
    return 'Co-Authored-By: OpenAI Codex <noreply@openai.com>';
  }
  return 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
}
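// How these generators are wired in is not shown in this file; the commit
// message above describes a {{CO_AUTHOR_TRAILER}} resolver. A minimal sketch
// of such a resolver map, assuming a simple string-substitution pipeline
// (resolvers and renderTemplate are hypothetical names):
const resolvers: Record<string, (ctx: TemplateContext) => string> = {
  CO_AUTHOR_TRAILER: generateCoAuthorTrailer,
  SLUG_EVAL: generateSlugEval,
  SLUG_SETUP: generateSlugSetup,
  BASE_BRANCH_DETECT: generateBaseBranchDetect,
  DEPLOY_BOOTSTRAP: generateDeployBootstrap,
  QA_METHODOLOGY: generateQAMethodology,
};

function renderTemplate(tmpl: string, ctx: TemplateContext): string {
  // Replace each {{NAME}} placeholder with its resolver's output;
  // leave unknown placeholders untouched.
  return tmpl.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in resolvers ? resolvers[name](ctx) : match,
  );
}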