mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
chore: raise skill token ceiling warning from 25K to 40K
The 25K ceiling predated flagship models with 200K-1M windows and assumed every skill prompt dominates context cost. Modern reality: prompt caching amortizes the skill load across invocations, and three carefully-tuned skills (ship, plan-ceo-review, office-hours) legitimately pack 25-35K tokens of behavior that can't be cut without degrading quality or removing protected content (Garry's voice, YC pitch, specialist review instructions). We made the safe prose cuts earlier (coverage diagram, plan status footer, plan mode operations). The remaining gap is structural — real compression would require splitting /ship into ship-quick vs ship-full, externalizing large resolvers to reference docs, or removing detailed skill behavior. Each is 1-2 days of work. The cost of the warning firing is zero (it's a warning, not an error). The cost of hitting it is ~15¢ per invocation at worst, amortized further by prompt caching. Raising to 40K catches what it's supposed to catch — a runaway 10K+ token growth in a single release — without crying wolf on legitimately big skills. Reference doc in CLAUDE.md updated to reflect the new philosophy: when you hit 40K, ask WHAT grew, don't blindly compress tuned prose. scripts/gen-skill-docs.ts: TOKEN_CEILING_BYTES 100_000 → 160_000. CLAUDE.md: document the "watch for feature bloat, not force compression" intent of the ceiling. Verification: `bun run gen:skill-docs --host all` shows zero TOKEN CEILING warnings under the new 40K threshold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -534,10 +534,16 @@ for (const currentHost of hostsToRun) {
|
||||
const tokens = Math.round(content.length / 4); // ~4 chars per token
|
||||
tokenBudget.push({ skill: relOutput, lines, tokens });
|
||||
|
||||
// Token ceiling check: warn if any generated SKILL.md exceeds ~25K tokens (100KB)
|
||||
const TOKEN_CEILING_BYTES = 100_000;
|
||||
// Token ceiling check: warn if any generated SKILL.md exceeds ~40K tokens (160KB).
|
||||
// The ceiling is a "watch for feature bloat" guardrail, not a hard gate. Modern
|
||||
// flagship models have 200K-1M context windows, so 40K (4-20% of window) is fine.
|
||||
// Prompt caching further reduces the marginal cost of larger skills. This ceiling
|
||||
// exists to catch a runaway preamble or resolver that's grown by 10K+ tokens in
|
||||
// a release, not to force compression on carefully-tuned big skills (ship,
|
||||
// plan-ceo-review, office-hours all legitimately pack 25-35K tokens of behavior).
|
||||
const TOKEN_CEILING_BYTES = 160_000;
|
||||
if (content.length > TOKEN_CEILING_BYTES) {
|
||||
console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~25K tokens)`);
|
||||
console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user