gstack/browse/test/fixtures/qa-eval.html
Garry Tan 76803d789a feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1)
Adds comprehensive eval infrastructure:
- Tier 1 (free): 13 new static tests — cross-skill path consistency, QA
  structure validation, greptile format, planted-bug fixture validation
- Tier 2 (Agent SDK E2E): /qa quick, /review with pre-built git repo,
  3 planted-bug outcome evals (static, SPA, checkout — each with 5 bugs)
- Tier 3 (LLM judge): QA workflow quality, health rubric clarity,
  cross-skill consistency, baseline score pinning

New fixtures: 3 HTML pages with 15 total planted bugs, ground truth JSON,
review-eval-vuln.rb, eval-baselines.json. Shared llm-judge.ts helper (DRY).

Unified EVALS=1 flag replaces SKILL_E2E + ANTHROPIC_API_KEY checks.
`bun run test:evals` runs everything that costs money (~$4/run).
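
A minimal sketch of how a single flag could gate the paid tiers. The helper name `evalsEnabled` and the strict comparison against `"1"` are assumptions for illustration, not the code in this commit:

```typescript
// Hypothetical helper (not from the repository): returns true only when
// the EVALS flag is set to exactly "1" in the supplied environment map.
function evalsEnabled(env: Record<string, string | undefined>): boolean {
  return env.EVALS === "1";
}

// Hypothetical usage: decide which tiers to run. Tier 1 is free and always runs;
// tiers 2 and 3 cost money and run only when the flag is on.
function tiersToRun(env: Record<string, string | undefined>): number[] {
  return evalsEnabled(env) ? [1, 2, 3] : [1];
}
```

Taking the environment as a parameter keeps the check testable without touching the real process environment.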

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 01:17:36 -05:00


<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>QA Eval — Widget Dashboard</title>
<style>
body { font-family: sans-serif; padding: 20px; }
nav { margin-bottom: 20px; }
nav a { margin-right: 15px; color: #0066cc; }
form { margin: 20px 0; padding: 15px; border: 1px solid #ccc; border-radius: 4px; }
input { display: block; margin: 8px 0; padding: 6px; }
button { padding: 8px 16px; margin-top: 8px; }
.stats { margin: 20px 0; }
img { display: block; margin: 20px 0; }
</style>
</head>
<body>
<nav>
<a href="/">Home</a>
<a href="/about">About</a>
<a href="/nonexistent-404-page">Resources</a> <!-- BUG 1: broken link (404) -->
</nav>
<h1>Widget Dashboard</h1>
<form id="contact">
<h2>Contact Us</h2>
<input type="text" name="name" placeholder="Name" required>
<input type="email" name="email" placeholder="Email" required>
<button type="submit" disabled>Submit</button> <!-- BUG 2: submit button permanently disabled -->
</form>
<div class="stats" style="width: 400px; overflow: hidden;">
<h2>Statistics</h2>
<p style="white-space: nowrap; width: 600px;">
Revenue: $1,234,567.89 | Users: 45,678 | Conversion: 3.2% | Growth: +12.5% MoM | Retention: 87.3%
</p> <!-- BUG 3: content overflow/clipping — text wider than container with overflow:hidden -->
</div>
<img src="/logo.png"> <!-- BUG 4: missing alt text on image -->
<footer>
<p>&copy; 2026 Widget Co. All rights reserved.</p>
</footer>
<script>
console.error("TypeError: Cannot read properties of undefined (reading 'map')");
// BUG 5: console error on page load
</script>
</body>
</html>
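
The five planted bugs above could be scored against a QA report roughly as follows. This is a sketch under stated assumptions: the `PlantedBug` shape, the bug ids, the keyword lists, and `scoreReport` are all hypothetical, and the actual ground truth JSON format in the repository may differ.

```typescript
// Hypothetical ground-truth entry: an id plus keywords that indicate detection.
interface PlantedBug {
  id: string;
  keywords: string[];
}

interface Score {
  detected: string[];
  missed: string[];
  recall: number; // fraction of planted bugs mentioned in the report
}

// A bug counts as detected when any of its keywords appears in the report
// (case-insensitive substring match; a deliberately simple heuristic).
function scoreReport(report: string, bugs: PlantedBug[]): Score {
  const text = report.toLowerCase();
  const detected: string[] = [];
  const missed: string[] = [];
  for (const bug of bugs) {
    const hit = bug.keywords.some((k) => text.includes(k.toLowerCase()));
    (hit ? detected : missed).push(bug.id);
  }
  const recall = bugs.length === 0 ? 0 : detected.length / bugs.length;
  return { detected, missed, recall };
}

// Illustrative entries mirroring BUG 1 through BUG 5 in this fixture.
const groundTruth: PlantedBug[] = [
  { id: "broken-link", keywords: ["404", "broken link"] },
  { id: "disabled-submit", keywords: ["disabled"] },
  { id: "overflow", keywords: ["overflow", "clipped"] },
  { id: "missing-alt", keywords: ["alt text", "alt attribute"] },
  { id: "console-error", keywords: ["console error", "typeerror"] },
];
```

Keyword matching is cheap enough for a free tier but can miss paraphrased findings, which is presumably where an LLM judge would be the better fit.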