mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-06 21:46:40 +02:00
feat: contributor mode, session awareness, recommendation format (#90)
* feat: contributor mode, session awareness, universal RECOMMENDATION format
- Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates
- Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions)
- ELI16 mode when 3+ concurrent sessions detected (re-ground user on context)
- Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/
- Universal AskUserQuestion format: context → question → RECOMMENDATION → options
- Update plan-ceo-review and plan-eng-review to reference preamble baseline
- Add vendored symlink awareness section to CLAUDE.md
- Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing
- Add tests for contributor mode and session awareness in generated output
- Add E2E eval for contributor mode report filing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add Enum & Value Completeness to /review critical checklist
New CRITICAL review category that traces new enum values, status strings,
and type constants through every consumer outside the diff. Catches the
class of bugs where a new value is added but not handled in all switch/case
chains, allowlists, or frontend-backend contracts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump v0.4.1, user-facing changelog, update qa-only template and architecture docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add CHANGELOG style guide — user-facing, sell the feature
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: rewrite v0.4.1 changelog to be user-facing and sell the features
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add evals for RECOMMENDATION format, session awareness, and enum completeness
Free tests (Tier 1): RECOMMENDATION format + session awareness in all
preamble SKILL.md files, enum completeness checklist structure and CRITICAL
classification.
E2E eval: /review catches missed enum handlers when a new status value
is added but not handled in case/switch and notify methods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add E2E eval for session awareness ELI16 mode
Stubs _SESSIONS=4, gives agent a decision point on feature/add-payments
branch, verifies the output re-grounds the user with project, branch,
context, and RECOMMENDATION — the ELI16 mode behavior for 3+ sessions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: contributor mode eval marked FAIL due to expected browse error
The test intentionally runs a nonexistent binary to trigger contributor
mode. The session runner's browse error detection catches "no such file
or directory...browse" and sets browseErrors, causing recordE2E to mark
passed=false. Override passed to check only exitReason since the browse
error is the expected scenario.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
+30
@@ -0,0 +1,30 @@
|
||||
# Feature branch version: adds "returned" status but misses consumers
|
||||
class Order < ApplicationRecord
|
||||
STATUSES = %w[pending processing shipped delivered returned].freeze
|
||||
|
||||
validates :status, inclusion: { in: STATUSES }
|
||||
|
||||
def display_status
|
||||
case status
|
||||
when 'pending' then 'Awaiting processing'
|
||||
when 'processing' then 'Being prepared'
|
||||
when 'shipped' then 'On the way'
|
||||
when 'delivered' then 'Delivered'
|
||||
# BUG: 'returned' not handled — falls through to nil
|
||||
end
|
||||
end
|
||||
|
||||
def can_cancel?
|
||||
# BUG: should 'returned' be cancellable? Not considered.
|
||||
%w[pending processing].include?(status)
|
||||
end
|
||||
|
||||
def notify_customer
|
||||
case status
|
||||
when 'pending' then OrderMailer.confirmation(self).deliver_later
|
||||
when 'shipped' then OrderMailer.shipped(self).deliver_later
|
||||
when 'delivered' then OrderMailer.delivered(self).deliver_later
|
||||
# BUG: 'returned' has no notification — customer won't know return was received
|
||||
end
|
||||
end
|
||||
end
|
||||
Vendored
+27
@@ -0,0 +1,27 @@
|
||||
# Existing file on main: order model with status handling
|
||||
class Order < ApplicationRecord
|
||||
STATUSES = %w[pending processing shipped delivered].freeze
|
||||
|
||||
validates :status, inclusion: { in: STATUSES }
|
||||
|
||||
def display_status
|
||||
case status
|
||||
when 'pending' then 'Awaiting processing'
|
||||
when 'processing' then 'Being prepared'
|
||||
when 'shipped' then 'On the way'
|
||||
when 'delivered' then 'Delivered'
|
||||
end
|
||||
end
|
||||
|
||||
def can_cancel?
|
||||
%w[pending processing].include?(status)
|
||||
end
|
||||
|
||||
def notify_customer
|
||||
case status
|
||||
when 'pending' then OrderMailer.confirmation(self).deliver_later
|
||||
when 'shipped' then OrderMailer.shipped(self).deliver_later
|
||||
when 'delivered' then OrderMailer.delivered(self).deliver_later
|
||||
end
|
||||
end
|
||||
end
|
||||
@@ -125,10 +125,26 @@ describe('gen-skill-docs', () => {
|
||||
const rootTmpl = fs.readFileSync(path.join(ROOT, 'SKILL.md.tmpl'), 'utf-8');
|
||||
expect(rootTmpl).toContain('{{COMMAND_REFERENCE}}');
|
||||
expect(rootTmpl).toContain('{{SNAPSHOT_FLAGS}}');
|
||||
expect(rootTmpl).toContain('{{PREAMBLE}}');
|
||||
|
||||
const browseTmpl = fs.readFileSync(path.join(ROOT, 'browse', 'SKILL.md.tmpl'), 'utf-8');
|
||||
expect(browseTmpl).toContain('{{COMMAND_REFERENCE}}');
|
||||
expect(browseTmpl).toContain('{{SNAPSHOT_FLAGS}}');
|
||||
expect(browseTmpl).toContain('{{PREAMBLE}}');
|
||||
});
|
||||
|
||||
test('generated SKILL.md contains contributor mode check', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Contributor Mode');
|
||||
expect(content).toContain('gstack_contributor');
|
||||
expect(content).toContain('contributor-logs');
|
||||
});
|
||||
|
||||
test('generated SKILL.md contains session awareness', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('_SESSIONS');
|
||||
expect(content).toContain('RECOMMENDATION');
|
||||
expect(content).toContain('ELI16');
|
||||
});
|
||||
|
||||
test('qa and qa-only templates use QA_METHODOLOGY placeholder', () => {
|
||||
|
||||
@@ -280,6 +280,124 @@ Report the exact output — either "READY: <path>" or "NEEDS_SETUP".`,
|
||||
// Clean up
|
||||
try { fs.rmSync(nonGitDir, { recursive: true, force: true }); } catch {}
|
||||
}, 60_000);
|
||||
|
||||
test('contributor mode files a report on gstack error', async () => {
|
||||
const contribDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-contrib-'));
|
||||
const logsDir = path.join(contribDir, 'contributor-logs');
|
||||
fs.mkdirSync(logsDir, { recursive: true });
|
||||
|
||||
// Extract contributor mode instructions from generated SKILL.md
|
||||
const skillMd = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||||
const contribStart = skillMd.indexOf('## Contributor Mode');
|
||||
const contribEnd = skillMd.indexOf('\n## ', contribStart + 1);
|
||||
const contribBlock = skillMd.slice(contribStart, contribEnd > 0 ? contribEnd : undefined);
|
||||
|
||||
const result = await runSkillTest({
|
||||
prompt: `You are in contributor mode (_CONTRIB=true).
|
||||
|
||||
${contribBlock}
|
||||
|
||||
OVERRIDE: Write contributor logs to ${logsDir}/ instead of ~/.gstack/contributor-logs/
|
||||
|
||||
Now try this browse command (it will fail — there is no binary at this path):
|
||||
/nonexistent/path/browse goto https://example.com
|
||||
|
||||
This is a gstack issue (the browse binary is missing/misconfigured).
|
||||
File a contributor report about this issue. Then tell me what you filed.`,
|
||||
workingDirectory: contribDir,
|
||||
maxTurns: 8,
|
||||
timeout: 60_000,
|
||||
testName: 'contributor-mode',
|
||||
runId,
|
||||
});
|
||||
|
||||
logCost('contributor mode', result);
|
||||
// Override passed: this test intentionally triggers a browse error (nonexistent binary)
|
||||
// so browseErrors will be non-empty — that's expected, not a failure
|
||||
recordE2E('contributor mode report', 'Skill E2E tests', result, {
|
||||
passed: result.exitReason === 'success',
|
||||
});
|
||||
|
||||
// Verify a contributor log was created with expected format
|
||||
const logFiles = fs.readdirSync(logsDir).filter(f => f.endsWith('.md'));
|
||||
expect(logFiles.length).toBeGreaterThan(0);
|
||||
|
||||
const logContent = fs.readFileSync(path.join(logsDir, logFiles[0]), 'utf-8');
|
||||
expect(logContent).toContain('Hey gstack team');
|
||||
expect(logContent).toContain('What I was trying to do');
|
||||
expect(logContent).toContain('What happened instead');
|
||||
|
||||
// Clean up
|
||||
try { fs.rmSync(contribDir, { recursive: true, force: true }); } catch {}
|
||||
}, 90_000);
|
||||
|
||||
test('session awareness adds ELI16 context when _SESSIONS >= 3', async () => {
|
||||
const sessionDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-session-'));
|
||||
|
||||
// Set up a git repo so there's project/branch context to reference
|
||||
const run = (cmd: string, args: string[]) =>
|
||||
spawnSync(cmd, args, { cwd: sessionDir, stdio: 'pipe', timeout: 5000 });
|
||||
run('git', ['init']);
|
||||
run('git', ['config', 'user.email', 'test@test.com']);
|
||||
run('git', ['config', 'user.name', 'Test']);
|
||||
fs.writeFileSync(path.join(sessionDir, 'app.rb'), '# my app\n');
|
||||
run('git', ['add', '.']);
|
||||
run('git', ['commit', '-m', 'init']);
|
||||
run('git', ['checkout', '-b', 'feature/add-payments']);
|
||||
// Add a remote so the agent can derive a project name
|
||||
run('git', ['remote', 'add', 'origin', 'https://github.com/acme/billing-app.git']);
|
||||
|
||||
// Extract AskUserQuestion format instructions from generated SKILL.md
|
||||
const skillMd = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||||
const aqStart = skillMd.indexOf('## AskUserQuestion Format');
|
||||
const aqEnd = skillMd.indexOf('\n## ', aqStart + 1);
|
||||
const aqBlock = skillMd.slice(aqStart, aqEnd > 0 ? aqEnd : undefined);
|
||||
|
||||
const outputPath = path.join(sessionDir, 'question-output.md');
|
||||
|
||||
const result = await runSkillTest({
|
||||
prompt: `You are running a gstack skill. The session preamble detected _SESSIONS=4 (the user has 4 gstack windows open).
|
||||
|
||||
${aqBlock}
|
||||
|
||||
You are on branch feature/add-payments in the billing-app project. You were reviewing a plan to add Stripe integration.
|
||||
|
||||
You've hit a decision point: the plan doesn't specify whether to use Stripe Checkout (hosted) or Stripe Elements (embedded). You need to ask the user which approach to use.
|
||||
|
||||
Since this is non-interactive, DO NOT actually call AskUserQuestion. Instead, write the EXACT text you would display to the user (the full AskUserQuestion content) to the file: ${outputPath}
|
||||
|
||||
Remember: _SESSIONS=4, so ELI16 mode is active. The user is juggling multiple windows and may not remember what this conversation is about. Re-ground them.`,
|
||||
workingDirectory: sessionDir,
|
||||
maxTurns: 8,
|
||||
timeout: 60_000,
|
||||
testName: 'session-awareness',
|
||||
runId,
|
||||
});
|
||||
|
||||
logCost('session awareness', result);
|
||||
recordE2E('session awareness ELI16', 'Skill E2E tests', result);
|
||||
|
||||
// Verify the output contains ELI16 re-grounding context
|
||||
if (fs.existsSync(outputPath)) {
|
||||
const output = fs.readFileSync(outputPath, 'utf-8');
|
||||
const lower = output.toLowerCase();
|
||||
// Must mention project name
|
||||
expect(lower.includes('billing') || lower.includes('acme')).toBe(true);
|
||||
// Must mention branch
|
||||
expect(lower.includes('payment') || lower.includes('feature')).toBe(true);
|
||||
// Must mention what we're working on
|
||||
expect(lower.includes('stripe') || lower.includes('checkout') || lower.includes('payment')).toBe(true);
|
||||
// Must have a RECOMMENDATION
|
||||
expect(output).toContain('RECOMMENDATION');
|
||||
} else {
|
||||
// Check agent output as fallback
|
||||
const output = result.output || '';
|
||||
expect(output).toContain('RECOMMENDATION');
|
||||
}
|
||||
|
||||
// Clean up
|
||||
try { fs.rmSync(sessionDir, { recursive: true, force: true }); } catch {}
|
||||
}, 90_000);
|
||||
});
|
||||
|
||||
// --- B4: QA skill E2E ---
|
||||
@@ -392,6 +510,78 @@ Write your review findings to ${reviewDir}/review-output.md`,
|
||||
}, 120_000);
|
||||
});
|
||||
|
||||
// --- Review: Enum completeness E2E ---
|
||||
|
||||
describeE2E('Review enum completeness E2E', () => {
|
||||
let enumDir: string;
|
||||
|
||||
beforeAll(() => {
|
||||
enumDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-enum-'));
|
||||
|
||||
const run = (cmd: string, args: string[]) =>
|
||||
spawnSync(cmd, args, { cwd: enumDir, stdio: 'pipe', timeout: 5000 });
|
||||
|
||||
run('git', ['init']);
|
||||
run('git', ['config', 'user.email', 'test@test.com']);
|
||||
run('git', ['config', 'user.name', 'Test']);
|
||||
|
||||
// Commit baseline on main — order model with 4 statuses
|
||||
const baseContent = fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'review-eval-enum.rb'), 'utf-8');
|
||||
fs.writeFileSync(path.join(enumDir, 'order.rb'), baseContent);
|
||||
run('git', ['add', 'order.rb']);
|
||||
run('git', ['commit', '-m', 'initial order model']);
|
||||
|
||||
// Feature branch adds "returned" status but misses handlers
|
||||
run('git', ['checkout', '-b', 'feature/add-returned-status']);
|
||||
const diffContent = fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'review-eval-enum-diff.rb'), 'utf-8');
|
||||
fs.writeFileSync(path.join(enumDir, 'order.rb'), diffContent);
|
||||
run('git', ['add', 'order.rb']);
|
||||
run('git', ['commit', '-m', 'add returned status']);
|
||||
|
||||
// Copy review skill files
|
||||
fs.copyFileSync(path.join(ROOT, 'review', 'SKILL.md'), path.join(enumDir, 'review-SKILL.md'));
|
||||
fs.copyFileSync(path.join(ROOT, 'review', 'checklist.md'), path.join(enumDir, 'review-checklist.md'));
|
||||
fs.copyFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), path.join(enumDir, 'review-greptile-triage.md'));
|
||||
});
|
||||
|
||||
afterAll(() => {
|
||||
try { fs.rmSync(enumDir, { recursive: true, force: true }); } catch {}
|
||||
});
|
||||
|
||||
test('/review catches missing enum handlers for new status value', async () => {
|
||||
const result = await runSkillTest({
|
||||
prompt: `You are in a git repo on branch feature/add-returned-status with changes against main.
|
||||
Read review-SKILL.md for the review workflow instructions.
|
||||
Also read review-checklist.md and apply it — pay special attention to the Enum & Value Completeness section.
|
||||
Run /review on the current diff (git diff main...HEAD).
|
||||
Write your review findings to ${enumDir}/review-output.md
|
||||
|
||||
The diff adds a new "returned" status to the Order model. Your job is to check if all consumers handle it.`,
|
||||
workingDirectory: enumDir,
|
||||
maxTurns: 15,
|
||||
timeout: 90_000,
|
||||
testName: 'review-enum-completeness',
|
||||
runId,
|
||||
});
|
||||
|
||||
logCost('/review enum', result);
|
||||
recordE2E('/review enum completeness', 'Review enum completeness E2E', result);
|
||||
expect(result.exitReason).toBe('success');
|
||||
|
||||
// Verify the review caught the missing enum handlers
|
||||
const reviewPath = path.join(enumDir, 'review-output.md');
|
||||
if (fs.existsSync(reviewPath)) {
|
||||
const review = fs.readFileSync(reviewPath, 'utf-8');
|
||||
// Should mention the missing "returned" handling in at least one of the methods
|
||||
const mentionsReturned = review.toLowerCase().includes('returned');
|
||||
const mentionsEnum = review.toLowerCase().includes('enum') || review.toLowerCase().includes('status');
|
||||
const mentionsCritical = review.toLowerCase().includes('critical');
|
||||
expect(mentionsReturned).toBe(true);
|
||||
expect(mentionsEnum || mentionsCritical).toBe(true);
|
||||
}
|
||||
}, 120_000);
|
||||
});
|
||||
|
||||
// --- B6/B7/B8: Planted-bug outcome evals ---
|
||||
|
||||
// Outcome evals also need ANTHROPIC_API_KEY for the LLM judge
|
||||
|
||||
@@ -411,6 +411,66 @@ describe('TODOS-format.md reference consistency', () => {
|
||||
});
|
||||
});
|
||||
|
||||
// --- v0.4.1 feature coverage: RECOMMENDATION format, session awareness, enum completeness ---
|
||||
|
||||
describe('v0.4.1 preamble features', () => {
|
||||
const skillsWithPreamble = [
|
||||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||||
'qa-only/SKILL.md',
|
||||
'setup-browser-cookies/SKILL.md',
|
||||
'ship/SKILL.md', 'review/SKILL.md',
|
||||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||||
'retro/SKILL.md',
|
||||
];
|
||||
|
||||
for (const skill of skillsWithPreamble) {
|
||||
test(`${skill} contains RECOMMENDATION format`, () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||||
expect(content).toContain('RECOMMENDATION: Choose');
|
||||
expect(content).toContain('AskUserQuestion');
|
||||
});
|
||||
|
||||
test(`${skill} contains session awareness`, () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||||
expect(content).toContain('_SESSIONS');
|
||||
expect(content).toContain('ELI16');
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
describe('Enum & Value Completeness in review checklist', () => {
|
||||
const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8');
|
||||
|
||||
test('checklist has Enum & Value Completeness section', () => {
|
||||
expect(checklist).toContain('Enum & Value Completeness');
|
||||
});
|
||||
|
||||
test('Enum & Value Completeness is classified as CRITICAL', () => {
|
||||
// It should appear under Pass 1 — CRITICAL, not Pass 2
|
||||
const pass1Start = checklist.indexOf('### Pass 1');
|
||||
const pass2Start = checklist.indexOf('### Pass 2');
|
||||
const enumStart = checklist.indexOf('Enum & Value Completeness');
|
||||
expect(enumStart).toBeGreaterThan(pass1Start);
|
||||
expect(enumStart).toBeLessThan(pass2Start);
|
||||
});
|
||||
|
||||
test('Enum & Value Completeness mentions tracing through consumers', () => {
|
||||
expect(checklist).toContain('Trace it through every consumer');
|
||||
expect(checklist).toContain('case');
|
||||
expect(checklist).toContain('allowlist');
|
||||
});
|
||||
|
||||
test('Enum & Value Completeness is in the gate classification as CRITICAL', () => {
|
||||
const gateSection = checklist.slice(checklist.indexOf('## Gate Classification'));
|
||||
// The ASCII art has CRITICAL on the left and INFORMATIONAL on the right
|
||||
// Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─)
|
||||
const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness'));
|
||||
expect(enumLine).toBeDefined();
|
||||
// It's on the left (CRITICAL) side — starts with ├─ or └─
|
||||
expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
// --- Part 7: Planted-bug fixture validation (A4) ---
|
||||
|
||||
describe('Planted-bug fixture validation', () => {
|
||||
|
||||
Reference in New Issue
Block a user