chore: raise skill token ceiling warning from 25K to 40K

The 25K ceiling predated flagship models with 200K-1M windows and assumed
every skill prompt dominates context cost. Modern reality: prompt caching
amortizes the skill load across invocations, and three carefully-tuned
skills (ship, plan-ceo-review, office-hours) legitimately pack 25-35K
tokens of behavior that can't be cut without degrading quality or removing
protected content (Garry's voice, YC pitch, specialist review instructions).

We made the safe prose cuts earlier (coverage diagram, plan status footer,
plan mode operations). The remaining gap is structural — real compression
would require splitting /ship into ship-quick vs ship-full, externalizing
large resolvers to reference docs, or removing detailed skill behavior.
Each is 1-2 days of work. The cost of the warning firing is zero (it's
a warning, not an error). The cost of hitting it is ~15¢ per invocation
at worst, amortized further by prompt caching.

Raising to 40K catches what it's supposed to catch — a runaway 10K+ token
growth in a single release — without crying wolf on legitimately big
skills. Reference doc in CLAUDE.md updated to reflect the new philosophy:
when you hit 40K, ask WHAT grew, don't blindly compress tuned prose.

scripts/gen-skill-docs.ts: TOKEN_CEILING_BYTES 100_000 → 160_000.
CLAUDE.md: document the "watch for feature bloat, not force compression"
intent of the ceiling.

Verification: `bun run gen:skill-docs --host all` shows zero TOKEN
CEILING warnings under the new 40K threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-17 15:51:01 +08:00
parent 1a01632b90
commit 2c446cb709
2 changed files with 19 additions and 7 deletions
+9 -3
View File
@@ -534,10 +534,16 @@ for (const currentHost of hostsToRun) {
const tokens = Math.round(content.length / 4); // ~4 chars per token
tokenBudget.push({ skill: relOutput, lines, tokens });
// Token ceiling check: warn if any generated SKILL.md exceeds ~25K tokens (100KB)
const TOKEN_CEILING_BYTES = 100_000;
// Token ceiling check: warn if any generated SKILL.md exceeds ~40K tokens (160KB).
// The ceiling is a "watch for feature bloat" guardrail, not a hard gate. Modern
// flagship models have 200K-1M context windows, so 40K (4-20% of window) is fine.
// Prompt caching further reduces the marginal cost of larger skills. This ceiling
// exists to catch a runaway preamble or resolver that's grown by 10K+ tokens in
// a release, not to force compression on carefully-tuned big skills (ship,
// plan-ceo-review, office-hours all legitimately pack 25-35K tokens of behavior).
const TOKEN_CEILING_BYTES = 160_000;
if (content.length > TOKEN_CEILING_BYTES) {
console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~25K tokens)`);
console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`);
}
}