From 2c446cb7096cc43100a0d0e08625186d396f5c1b Mon Sep 17 00:00:00 2001
From: Garry Tan <garrytan@gmail.com>
Date: Fri, 17 Apr 2026 15:51:01 +0800
Subject: [PATCH] chore: raise skill token ceiling warning from 25K to 40K
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The 25K ceiling predated flagship models with 200K-1M windows and assumed
every skill prompt dominates context cost. Modern reality: prompt caching
amortizes the skill load across invocations, and three carefully-tuned
skills (ship, plan-ceo-review, office-hours) legitimately pack 25-35K
tokens of behavior that can't be cut without degrading quality or removing
protected content (Garry's voice, YC pitch, specialist review instructions).

We made the safe prose cuts earlier (coverage diagram, plan status footer,
plan mode operations). The remaining gap is structural — real compression
would require splitting /ship into ship-quick vs ship-full, externalizing
large resolvers to reference docs, or removing detailed skill behavior.
Each is 1-2 days of work. The cost of the warning firing is zero (it's
a warning, not an error). The cost of hitting it is ~15¢ per invocation
at worst, amortized further by prompt caching.

Raising to 40K catches what it's supposed to catch — a runaway 10K+ token
growth in a single release — without crying wolf on legitimately big
skills. Reference doc in CLAUDE.md updated to reflect the new philosophy:
when you hit 40K, ask WHAT grew, don't blindly compress tuned prose.

scripts/gen-skill-docs.ts: TOKEN_CEILING_BYTES 100_000 → 160_000.
CLAUDE.md: document the "watch for feature bloat, not force compression"
intent of the ceiling.

Verification: `bun run gen:skill-docs --host all` shows zero TOKEN
CEILING warnings under the new 40K threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                 | 14 ++++++++++----
 scripts/gen-skill-docs.ts | 12 +++++++++---
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 074b6122..ae68d806 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -139,10 +139,16 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
 To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
 To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
 
-**Token ceiling:** Generated SKILL.md files must stay under 100KB (~25K tokens).
-`gen-skill-docs` warns if any file exceeds this. If a skill template grows past the
-ceiling, consider extracting optional sections into separate resolvers that only
-inject when relevant, or making verbose evaluation rubrics more concise.
+**Token ceiling:** Generated SKILL.md files trip a warning above 160KB (~40K tokens).
+This is a "watch for feature bloat" guardrail, not a hard gate. Modern flagship
+models have 200K-1M context windows, so 40K is 4-20% of window, and prompt caching
+makes the marginal cost of larger skills small. The ceiling exists to catch runaway
+preamble/resolver growth, not to force compression on carefully-tuned big skills
+(`ship`, `plan-ceo-review`, `office-hours` legitimately pack 25-35K tokens of
+behavior). If you blow past 40K, the right fix is usually: (1) look at WHAT grew,
+(2) if one resolver added 10K+ in a single PR, question whether it belongs inline
+or as a reference doc, (3) only compress carefully-tuned prose as a last resort —
+cuts to the coverage audit, review army, or voice directive have real quality cost.
 
 **Merge conflicts on SKILL.md files:** NEVER resolve conflicts on generated SKILL.md
 files by accepting either side. Instead: (1) resolve conflicts on the `.tmpl` templates
diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts
index dd709ee9..40f08369 100644
--- a/scripts/gen-skill-docs.ts
+++ b/scripts/gen-skill-docs.ts
@@ -534,10 +534,16 @@ for (const currentHost of hostsToRun) {
       const tokens = Math.round(content.length / 4); // ~4 chars per token
       tokenBudget.push({ skill: relOutput, lines, tokens });
 
-      // Token ceiling check: warn if any generated SKILL.md exceeds ~25K tokens (100KB)
-      const TOKEN_CEILING_BYTES = 100_000;
+      // Token ceiling check: warn if any generated SKILL.md exceeds ~40K tokens (160KB).
+      // The ceiling is a "watch for feature bloat" guardrail, not a hard gate. Modern
+      // flagship models have 200K-1M context windows, so 40K (4-20% of window) is fine.
+      // Prompt caching further reduces the marginal cost of larger skills. This ceiling
+      // exists to catch a runaway preamble or resolver that's grown by 10K+ tokens in
+      // a release, not to force compression on carefully-tuned big skills (ship,
+      // plan-ceo-review, office-hours all legitimately pack 25-35K tokens of behavior).
+      const TOKEN_CEILING_BYTES = 160_000;
       if (content.length > TOKEN_CEILING_BYTES) {
-        console.warn(`⚠️  TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~25K tokens)`);
+        console.warn(`⚠️  TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`);
       }
     }