From 5393739862342e68a3f2336cf4b9382cf5877bca Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Fri, 13 Mar 2026 15:43:47 -0700
Subject: [PATCH] ci: SKILL.md freshness check on push/PR + TODO updates

- .github/workflows/skill-docs.yml: fails if generated SKILL.md files are stale
- TODO.md: add E2E cost tracking and model pinning to future ideas
---
 .github/workflows/skill-docs.yml | 11 +++++++++++
 TODO.md                          |  2 ++
 2 files changed, 13 insertions(+)
 create mode 100644 .github/workflows/skill-docs.yml

diff --git a/.github/workflows/skill-docs.yml b/.github/workflows/skill-docs.yml
new file mode 100644
index 00000000..6f8f1744
--- /dev/null
+++ b/.github/workflows/skill-docs.yml
@@ -0,0 +1,11 @@
+name: Skill Docs Freshness
+on: [push, pull_request]
+jobs:
+  check-freshness:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+      - run: bun install
+      - run: bun run gen:skill-docs
+      - run: git diff --exit-code || (echo "Generated SKILL.md files are stale. Run: bun run gen:skill-docs" && exit 1)
diff --git a/TODO.md b/TODO.md
index dc09311f..8eaff4b2 100644
--- a/TODO.md
+++ b/TODO.md
@@ -103,6 +103,8 @@
   - [ ] Trend tracking across QA runs — compare baseline.json over time, detect regressions (P2, S)
   - [ ] CI/CD integration — `/qa` as GitHub Action step, fail PR if health score drops (P2, M)
   - [ ] Accessibility audit mode — `--a11y` flag for focused accessibility testing (P3, S)
+  - [ ] E2E test cost tracking — track cumulative API spend, warn if over threshold (P3, S)
+  - [ ] E2E model pinning — pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM (P2, XS)
 
 ## Ideas & Notes
 - Browser is the nervous system — every skill should be able to see, interact with, and verify the web