mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-06 21:46:40 +02:00
chore: raise skill token ceiling warning from 25K to 40K
The 25K ceiling predated flagship models with 200K-1M windows and assumed every skill prompt dominates context cost. Modern reality: prompt caching amortizes the skill load across invocations, and three carefully-tuned skills (ship, plan-ceo-review, office-hours) legitimately pack 25-35K tokens of behavior that can't be cut without degrading quality or removing protected content (Garry's voice, YC pitch, specialist review instructions).

We made the safe prose cuts earlier (coverage diagram, plan status footer, plan mode operations). The remaining gap is structural: real compression would require splitting /ship into ship-quick vs ship-full, externalizing large resolvers to reference docs, or removing detailed skill behavior. Each is 1-2 days of work.

The cost of the warning firing is zero (it's a warning, not an error). The cost of hitting the ceiling is ~15¢ per invocation at worst, amortized further by prompt caching. Raising the ceiling to 40K catches what it's supposed to catch, a runaway 10K+ token growth in a single release, without crying wolf on legitimately big skills.

The reference doc in CLAUDE.md is updated to reflect the new philosophy: when you hit 40K, ask WHAT grew; don't blindly compress tuned prose.

- scripts/gen-skill-docs.ts: TOKEN_CEILING_BYTES 100_000 → 160_000.
- CLAUDE.md: document the "watch for feature bloat, not force compression" intent of the ceiling.

Verification: `bun run gen:skill-docs --host all` shows zero TOKEN CEILING warnings under the new 40K threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
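The byte-to-token heuristic behind the ceiling can be sketched in isolation as follows. This is a minimal standalone sketch of the check described above, assuming the ~4 chars/token estimate from the script; `checkCeiling` is a hypothetical helper name for illustration, not a function in the repo.

```typescript
// Sketch of the token-ceiling guardrail: warn (but don't fail) when a
// generated SKILL.md exceeds ~40K tokens, estimated at ~4 chars/token.
// `checkCeiling` is an illustrative name, not part of gen-skill-docs.ts.
const TOKEN_CEILING_BYTES = 160_000; // ~40K tokens at ~4 chars per token

function checkCeiling(relOutput: string, content: string): boolean {
  const tokens = Math.round(content.length / 4); // ~4 chars per token
  if (content.length > TOKEN_CEILING_BYTES) {
    console.warn(
      `⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`
    );
    return false; // warning only; callers don't treat this as an error
  }
  return true;
}

// A 140KB skill (~35K tokens) passes under the new ceiling; it would
// have tripped the old 100KB (~25K token) warning.
checkCeiling("skills/ship/SKILL.md", "x".repeat(140_000)); // → true
```

Under these assumptions a 25-35K-token skill stays silent, while a single release that pushes a file past 160KB still surfaces a warning in the `gen:skill-docs` output.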
@@ -139,10 +139,16 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
 
 To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
 
 To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
 
-**Token ceiling:** Generated SKILL.md files must stay under 100KB (~25K tokens).
-`gen-skill-docs` warns if any file exceeds this. If a skill template grows past the
-ceiling, consider extracting optional sections into separate resolvers that only
-inject when relevant, or making verbose evaluation rubrics more concise.
+**Token ceiling:** Generated SKILL.md files trip a warning above 160KB (~40K tokens).
+This is a "watch for feature bloat" guardrail, not a hard gate. Modern flagship
+models have 200K-1M context windows, so 40K is 4-20% of window, and prompt caching
+makes the marginal cost of larger skills small. The ceiling exists to catch runaway
+preamble/resolver growth, not to force compression on carefully-tuned big skills
+(`ship`, `plan-ceo-review`, `office-hours` legitimately pack 25-35K tokens of
+behavior). If you blow past 40K, the right fix is usually: (1) look at WHAT grew,
+(2) if one resolver added 10K+ in a single PR, question whether it belongs inline
+or as a reference doc, (3) only compress carefully-tuned prose as a last resort —
+cuts to the coverage audit, review army, or voice directive have real quality cost.
 
 **Merge conflicts on SKILL.md files:** NEVER resolve conflicts on generated SKILL.md
 files by accepting either side. Instead: (1) resolve conflicts on the `.tmpl` templates
||||
@@ -534,10 +534,16 @@ for (const currentHost of hostsToRun) {
   const tokens = Math.round(content.length / 4); // ~4 chars per token
   tokenBudget.push({ skill: relOutput, lines, tokens });
 
-  // Token ceiling check: warn if any generated SKILL.md exceeds ~25K tokens (100KB)
-  const TOKEN_CEILING_BYTES = 100_000;
+  // Token ceiling check: warn if any generated SKILL.md exceeds ~40K tokens (160KB).
+  // The ceiling is a "watch for feature bloat" guardrail, not a hard gate. Modern
+  // flagship models have 200K-1M context windows, so 40K (4-20% of window) is fine.
+  // Prompt caching further reduces the marginal cost of larger skills. This ceiling
+  // exists to catch a runaway preamble or resolver that's grown by 10K+ tokens in
+  // a release, not to force compression on carefully-tuned big skills (ship,
+  // plan-ceo-review, office-hours all legitimately pack 25-35K tokens of behavior).
+  const TOKEN_CEILING_BYTES = 160_000;
   if (content.length > TOKEN_CEILING_BYTES) {
-    console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~25K tokens)`);
+    console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`);
   }
 }