gstack/skillify/SKILL.md.tmpl
Garry Tan e0b454fe58 feat(skillify): /skillify codifies last /scrape into permanent skill
The productivity multiplier. /scrape discovers the flow; /skillify writes
it as deterministic Playwright-via-browse-client code so the next /scrape
on the same intent runs in ~200ms.

11-step flow with three locked contracts from the v1.19.0.0 plan review:

D1 — Provenance guard. Walk back ≤10 agent turns for a clearly-bounded
/scrape result. Refuse with one specific message if cold. No silent
synthesis from chat fragments.

D2 — Synthesis input slice. Extract ONLY the final-attempt $B calls that
produced the JSON the user accepted, plus the user's intent string. Drop
failed selectors, drop unrelated chat, drop earlier-session content.
Closes Codex finding #6 by picking option (b) from the design doc:
re-prompt from agent's own context, not a structured recorder.

D3 — Atomic write. Stage to ~/.gstack/.tmp/skillify-<spawnId>/, run
$B skill test against the temp dir, only rename into the final tier path
on test pass + user approval. Test fail or approval reject = rm -rf the
temp dir entirely.

Default tier: global (~/.gstack/browser-skills/<name>/). --project flag
overrides to per-project. Generated test must include at least one ★★
assertion (parsed JSON has expected shape + non-empty key fields), not a
smoke ★ assertion.

Bun runtime distribution (Codex finding #7) carries over to Phase 4.
Documented in the skill's Limits section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 18:33:52 -07:00

---
name: skillify
version: 1.0.0
description: |
  Codify the most recent successful /scrape flow into a permanent
  browser-skill on disk. Future /scrape calls with the same intent run
  the codified script in ~200ms instead of re-driving the page. Walks
  back through the conversation, synthesizes script.ts + script.test.ts
  + fixture, runs the test in a temp dir, and asks before committing.
  Use when asked to "skillify", "codify", "save this scrape", or
  "make this permanent". (gstack)
allowed-tools:
- Bash
- Read
- Write
- AskUserQuestion
triggers:
- skillify
- codify this scrape
- save this scrape
- make this permanent
---
{{PREAMBLE}}
# /skillify — codify the last scrape into a permanent skill
The productivity multiplier. `/scrape` discovered how to pull the data;
`/skillify` writes it as deterministic Playwright-via-`browse-client`
code so the next `/scrape` call on the same intent runs in ~200ms.
Without this command, `/scrape` is a slow wrapper around `$B`. With it,
every successful scrape is a one-time cost.
## Iron contract — never write a half-broken skill to disk
Skills are user-trust artifacts. A broken skill in `$B skill list` makes
agents reach for the wrong tool and erodes confidence. This skill writes
to a temp dir, runs the auto-generated test there, and only renames into
the final tier path on (a) test pass + (b) explicit user approval. On
either failure, the temp dir is removed entirely. There is no "almost
shipped" state.
---
## Step 1 — Provenance guard (D1)
Walk back through the conversation, **at most 10 agent turns**, looking
for the most recent `/scrape` invocation that:
- Was bounded (you can identify the user's intent line and the trailing
JSON the prototype produced)
- Produced a JSON result the user did not subsequently invalidate
(e.g., did not say "that's wrong", did not ask you to retry)
If you cannot find one, refuse with exactly this message:
> "No recent /scrape result found in this conversation. Run /scrape
> <intent> first, then say /skillify."
Stop. Do not synthesize from chat fragments. Do not synthesize from a
match-path /scrape result (matched skills are already codified — there's
nothing to skillify).
If you find a candidate but the user is currently three turns past it
discussing something unrelated, ask once before proceeding:
> "The last successful /scrape was '<intent line>' a few turns back.
> Skillify that one?"
A "yes" lets you continue. Anything else: refuse with the message above.
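The walk-back rule can be read as a bounded reverse search. The sketch below is only an illustration: the `Turn` record and its fields are hypothetical (the agent's real conversation context has no such structure), and it assumes that a non-qualifying /scrape is skipped rather than ending the search.

```typescript
// Hypothetical turn record; the field names are assumptions for illustration.
interface Turn {
  role: 'user' | 'agent';
  scrapeIntent?: string;  // set on the agent turn that completed a /scrape
  producedJson?: boolean; // the trailing JSON result is identifiable
  matchPath?: boolean;    // result came from an already-codified skill
  invalidated?: boolean;  // user later said "that's wrong" or asked for a retry
}

const MAX_AGENT_TURNS = 10;

// Returns the intent of the most recent qualifying /scrape, or null when cold.
function findScrapeCandidate(history: Turn[]): string | null {
  let agentTurnsSeen = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const turn = history[i];
    if (!turn || turn.role !== 'agent') continue;
    agentTurnsSeen++;
    if (agentTurnsSeen > MAX_AGENT_TURNS) break; // outside the 10-turn window
    const usable =
      turn.scrapeIntent !== undefined &&
      turn.producedJson === true &&
      !turn.matchPath &&
      !turn.invalidated;
    if (usable) return turn.scrapeIntent ?? null;
  }
  return null; // caller refuses with the fixed message
}
```

A `null` result corresponds to the refusal path above; a non-null result still goes through the "ask once" check when the conversation has moved on.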
## Step 2 — Propose name + triggers
From the prototype intent, extract:
- A short skill name: lowercase letters/digits/dashes, ≤32 chars,
starts with a letter, no consecutive dashes. E.g.,
`lobsters-frontpage`, `gh-issue-list`, `pypi-package-stats`.
- 3-5 trigger phrases the agent should match against in future `/scrape`
calls. Mix the canonical phrase ("scrape lobsters frontpage") with
paraphrases ("top posts on lobste.rs", "lobsters front page").
- The host (just the hostname, e.g. `lobste.rs`).
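The naming rule in the first bullet is mechanical enough to check before asking the user. A minimal sketch, assuming the rule is exactly as stated (the function name `isValidSkillName` is hypothetical, and a trailing dash is allowed because the rule does not forbid it):

```typescript
// Starts with a lowercase letter; then lowercase letters, digits, or single dashes.
// The negative lookahead rejects a dash that is immediately followed by another dash.
const SKILL_NAME_RE = /^[a-z](?:[a-z0-9]|-(?!-))*$/;

function isValidSkillName(name: string): boolean {
  return name.length <= 32 && SKILL_NAME_RE.test(name);
}
```

For example, `lobsters-frontpage` passes, while `gh--issue-list`, `1password-items`, and `Lobsters` are rejected.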
Then **AskUserQuestion** to confirm:
```
D<N> — Skill name + tier
Project/branch/task: codifying /scrape "<intent>" as a browser-skill.
ELI10: Pick a short name we'll use to find this skill next time you say
something similar. Pick a tier — global means every project on this
machine sees it, project means just this repo.
Stakes if we pick wrong: bad name buries the skill in $B skill list;
wrong tier means future projects can't find it (or can find it when you
didn't want them to).
Recommendation: A — <proposed-name> at global tier — most scrape skills
generalize across projects.
Note: options differ in kind, not coverage — no completeness score.
A) Keep "<proposed-name>" at global tier — ~/.gstack/browser-skills/<proposed-name>/ (recommended)
B) Keep "<proposed-name>" but at project tier — <project>/.gstack/browser-skills/<proposed-name>/
C) Rename it (free-form — say the new name)
```
**Tier-shadowing check.** Before showing the question, run `$B skill list`
and check for an existing skill at the same name. If found, add to the
question:
> "Note: a <tier> skill named '<name>' already exists. Picking the same
> name at a higher tier (project > global > bundled) shadows it; picking
> the same tier collides and will be refused at write time. Pick a
> different name to coexist."
## Step 3 — Synthesize `script.ts` (D2)
**Use only the final-attempt `$B` calls** that produced the JSON the
user accepted, plus the user's intent string. Drop:
- Failed selector attempts (the four selectors you tried before the
working one)
- Unrelated `$B` commands from earlier turns
- All conversation prose, summaries, your own reasoning
The script imports the SDK from `./_lib/browse-client` (a sibling copy,
written in step 6) and exports a parser function so `script.test.ts` can
exercise it against the bundled fixture without spinning up the daemon.
Mirror the bundled reference at `browser-skills/hackernews-frontpage/script.ts`:
```ts
import { browse } from './_lib/browse-client';

export interface Item { /* one row of the JSON output */ }
export interface Output { items: Item[]; count: number; }

const TARGET_URL = '<the URL the prototype used>';

export function parseFromHtml(html: string): Item[] {
  // Pure function: HTML in, parsed Item[] out. No $B calls.
  // Future fixture-replay tests call this directly.
}

if (import.meta.main) { await main(); }

async function main(): Promise<void> {
  await browse.goto(TARGET_URL);
  const html = await browse.html();
  const items = parseFromHtml(html);
  const output: Output = { items, count: items.length };
  process.stdout.write(JSON.stringify(output) + '\n');
}
```
The parser MUST be a pure function. If your prototype used multiple `$B`
calls (e.g., goto + click "Next" + html), keep all of them in `main()`
but extract the parsing into pure helpers. The fixture-replay tests in
step 5 only exercise the pure parts.
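To make "pure parser" concrete, here is a small sketch of what a filled-in `parseFromHtml` could look like. The markup (`<a class="story" href="...">`) and the `title`/`url` fields are invented for illustration; a real skill would use whatever selectors the prototype settled on, and a regex is only one possible extraction technique.

```typescript
// Hypothetical row shape; real skills define fields that match their own output.
interface Item { title: string; url: string; }

// Pure function: no network, no daemon, no clock. Same HTML in, same items out.
function parseFromHtml(html: string): Item[] {
  const items: Item[] = [];
  // Assumed markup: <a class="story" href="URL">TITLE</a>
  const storyLink = /<a class="story" href="([^"]+)">([^<]+)<\/a>/g;
  let match: RegExpExecArray | null;
  while ((match = storyLink.exec(html)) !== null) {
    const url = match[1];
    const title = match[2];
    if (url && title) items.push({ url, title: title.trim() });
  }
  return items;
}
```

Because the function touches nothing but its argument, the fixture-replay test can call it directly on the saved HTML.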
## Step 4 — Capture the fixture
```bash
$B goto "<TARGET_URL>"
$B html > /tmp/skillify-fixture-$$.html
```
The fixture filename inside the staged dir is
`fixtures/<host-with-dashes>-<YYYY-MM-DD>.html`, where the date is today.
E.g. `fixtures/lobste-rs-2026-04-27.html`.
Read the file you wrote, store its contents in a variable, and use it
when staging in step 7.
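The fixture naming convention can be computed rather than typed by hand. A minimal sketch, assuming "host-with-dashes" means dots replaced by dashes and that a UTC date is acceptable for "today" (the helper name `fixtureRelPath` is hypothetical):

```typescript
// Builds fixtures/<host-with-dashes>-<YYYY-MM-DD>.html for the staged dir.
function fixtureRelPath(host: string, date: Date = new Date()): string {
  const hostWithDashes = host.toLowerCase().replace(/\./g, '-');
  // toISOString() is always UTC, so near midnight this can differ from local "today".
  const yyyyMmDd = date.toISOString().slice(0, 10);
  return `fixtures/${hostWithDashes}-${yyyyMmDd}.html`;
}
```

With host `lobste.rs` and a date of 2026-04-27 (UTC), this yields `fixtures/lobste-rs-2026-04-27.html`, matching the example above.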
## Step 5 — Write `script.test.ts`
Mirror `browser-skills/hackernews-frontpage/script.test.ts`. The test
must include at least one ★★ assertion — parsed output has the expected
shape AND non-empty key fields — not a smoke ★ assertion. Smoke tests
that only check `parseFromHtml` doesn't throw are insufficient.
```ts
import { describe, it, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { parseFromHtml } from './script';

describe('<name> parser', () => {
  const fixturePath = path.join(import.meta.dir, 'fixtures', '<host>-<date>.html');
  const html = fs.readFileSync(fixturePath, 'utf-8');
  const items = parseFromHtml(html);

  it('returns at least one item from the bundled fixture', () => {
    expect(items.length).toBeGreaterThan(0);
  });

  it('every item has the required shape', () => {
    for (const item of items) {
      expect(typeof item.<keyfield>).toBe('<keytype>');
      // ... assert on every required field
    }
  });
});
```
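One way to keep the ★★ requirement honest is to centralize the "expected shape + non-empty key fields" check in a helper that reports every violation. This is a sketch, not part of the bundled reference; `shapeProblems` and the field names passed to it are hypothetical, and it assumes the key fields are strings.

```typescript
// Returns a list of human-readable problems; an empty list means the ★★ check passes.
function shapeProblems(items: unknown[], requiredFields: string[]): string[] {
  const problems: string[] = [];
  if (items.length === 0) problems.push('no items parsed');
  items.forEach((item, index) => {
    if (typeof item !== 'object' || item === null) {
      problems.push(`item ${index} is not an object`);
      return;
    }
    const record = item as Record<string, unknown>;
    for (const field of requiredFields) {
      const value = record[field];
      // Shape AND content: the field must be a string and must not be blank.
      if (typeof value !== 'string' || value.trim() === '') {
        problems.push(`item ${index}: "${field}" is missing or empty`);
      }
    }
  });
  return problems;
}
```

Inside a `bun:test` case this would be asserted as `expect(shapeProblems(items, ['title', 'url'])).toEqual([])`, so a failure lists every offending row instead of stopping at the first.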
## Step 6 — Resolve the canonical SDK path + read it
The canonical SDK lives at `<gstack-install>/browse/src/browse-client.ts`.
The bundled-skill loader walks the install tree to find it; mirror that.
Resolve the gstack install dir. Two reliable signals (in order):
1. The bundled `hackernews-frontpage` skill: look at its tier path from
`$B skill list` (the `bundled` row). The skill dir is
`<gstack-install>/browser-skills/hackernews-frontpage/`, so the install
dir is two `dirname` calls above the skill dir (four above its
`_lib/browse-client.ts`).
2. The active gstack skills install at `~/.claude/skills/gstack/`. Read
the symlink target if it's a symlink, otherwise use the path directly.
Example (run as Bun, not bash, to avoid shell-redirect parsing issues):
```ts
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

function resolveSdkPath(): string {
  const candidates = [
    path.join(os.homedir(), '.claude', 'skills', 'gstack', 'browse', 'src', 'browse-client.ts'),
    // Add other install-dir candidates if your environment differs.
  ];
  for (const c of candidates) {
    try {
      // realpathSync follows symlinks and throws if the path does not exist,
      // so a successful return is already an existence check.
      return fs.realpathSync(c);
    } catch {
      // Candidate missing; try the next one.
    }
  }
  throw new Error('Could not resolve canonical browse-client.ts');
}

const sdkContents = fs.readFileSync(resolveSdkPath(), 'utf-8');
```
Read the SDK contents into a variable. The staging step writes it as
`_lib/browse-client.ts` byte-identical to the canonical. Phase 1 decision
#4 — each skill is fully self-contained, no version drift possible.
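The first signal (the bundled skill's tier path) can be turned into an install dir with plain path arithmetic. A minimal sketch with hypothetical helper names; the `/opt/gstack` path in the usage note is only an example, not a known install location.

```typescript
import * as path from 'node:path';

// <gstack-install>/browser-skills/<skill>/ sits two levels below the install
// dir, so two dirname calls on the skill dir recover it.
function installDirFromSkillDir(bundledSkillDir: string): string {
  return path.dirname(path.dirname(bundledSkillDir));
}

// Canonical SDK location inside an install dir, as stated at the top of this step.
function canonicalSdkPath(installDir: string): string {
  return path.join(installDir, 'browse', 'src', 'browse-client.ts');
}
```

Given a bundled row of `/opt/gstack/browser-skills/hackernews-frontpage`, `installDirFromSkillDir` returns `/opt/gstack`, and `canonicalSdkPath` then points at `browse/src/browse-client.ts` beneath it.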
## Step 7 — Stage the skill (D3 atomic write)
Use the helper at `browse/src/browser-skill-write.ts`. Construct an inline
TypeScript snippet (or shell out to a small Bun one-liner) that calls:
```ts
import { stageSkill } from '<gstack-install>/browse/src/browser-skill-write';

const stagedDir = stageSkill({
  name: '<name>',
  files: new Map([
    ['SKILL.md', skillMd],
    ['script.ts', scriptTs],
    ['script.test.ts', scriptTestTs],
    ['_lib/browse-client.ts', sdkContents],
    ['fixtures/<host>-<date>.html', fixtureHtml],
  ]),
});
console.log(stagedDir);
```
The SKILL.md content for `<name>` follows the Phase 1 frontmatter
contract:
```yaml
---
name: <name>
description: <one-line, what data this returns>
host: <hostname>
trusted: false # agent-authored skills are untrusted by default
source: agent
version: 1.0.0
args: [] # extend if your script accepts --arg key=value
triggers:
- <phrase 1>
- <phrase 2>
- <phrase 3>
---
# <Name> scraper
<2-3 sentences on what the script does, what URL it hits, and what
shape of JSON it returns. NO conversation context. NO chat fragments.
This is a durable on-disk artifact — keep it tight.>
## Usage
\`\`\`
$ $B skill run <name>
{ "items": [...], "count": N }
\`\`\`
```
Capture `stagedDir` (the path returned by `stageSkill`). You'll pass it
to `$B skill test` next, then to `commitSkill` or `discardStaged`.
## Step 8 — Run `$B skill test` against the staged dir
```bash
$B skill test "<name>" --dir "<stagedDir>"
```
If `$B skill test` does not yet accept `--dir`, fall back to invoking the
test runner directly against the staged path:
```bash
( cd "<stagedDir>" && bun test script.test.ts )
```
If the test fails:
1. Read the test output. If the failure is a fixable parser bug,
rewrite `script.ts` and `script.test.ts` (still inside the staged
dir) and retry — at most twice. Show the diff to the user before
each retry.
2. If still failing after two retries, OR the failure is an
environmental issue (SDK import, daemon connection):
```ts
import { discardStaged } from '<gstack-install>/browse/src/browser-skill-write';
discardStaged('<stagedDir>');
```
Report the failure to the user, show them the staged `script.ts` for
reference, and stop. No on-disk artifact.
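The retry policy above (one initial run, at most two fix-and-retry rounds, immediate stop on environmental failures) can be sketched as a small loop. Everything here is hypothetical scaffolding: `TestResult`, `runTest`, and `fixAndShowDiff` stand in for running the staged test and rewriting the staged files.

```typescript
type TestResult =
  | { ok: true }
  | { ok: false; kind: 'parser' | 'environment'; output: string };

function runWithRetries(
  runTest: () => TestResult,
  fixAndShowDiff: (failureOutput: string) => void,
  maxRetries = 2,
): { passed: boolean; attempts: number } {
  let attempts = 0;
  while (true) {
    const result = runTest();
    attempts++;
    if (result.ok) return { passed: true, attempts };
    // Environmental failures are not fixable by rewriting the parser.
    if (result.kind === 'environment') return { passed: false, attempts };
    // attempts counts the initial run, so more than maxRetries means retries are used up.
    if (attempts > maxRetries) return { passed: false, attempts };
    fixAndShowDiff(result.output);
  }
}
```

A `passed: false` result maps to the discard path: call `discardStaged`, report, and stop.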
## Step 9 — Approval gate
Tests passed. Now ask the user before committing:
```
D<N> — Commit skill "<name>" at <resolved-tier-path>?
Project/branch/task: codified /scrape "<intent>" — tests pass against fixture.
ELI10: The script ran clean against the snapshot we captured. Saying yes
moves the staged folder into <resolved-tier-path>, where /scrape
will find it next time. Saying no removes the staged folder and nothing
lands on disk.
Stakes if we pick wrong: yes commits an artifact you have to manually rm
later if you regret it ($B skill rm <name> --global). No throws away
~30s of synthesis work.
Recommendation: A — tests passed, the script is self-contained, this is
the productivity payoff for the prototype.
Note: options differ in kind, not coverage — no completeness score.
A) Commit it (recommended)
B) Look at the script first (I'll print SKILL.md + script.ts and re-ask)
C) Discard — don't commit
```
If the user picks B, print the staged `SKILL.md` and `script.ts` (NOT
the fixture or `_lib/`), then re-ask the question with only options A
and C, since they have already seen the script.
## Step 10 — Commit (atomic) or discard
If the user approved:
```ts
import { commitSkill } from '<gstack-install>/browse/src/browser-skill-write';

const dest = commitSkill({
  name: '<name>',
  tier: '<global|project>', // from step 2 answer
  stagedDir: '<stagedDir>',
});
console.log(`Committed: ${dest}`);
```
If `commitSkill` throws "already exists" (tier-shadowing collision the
user dismissed in step 2), report and ask whether to:
- Pick a different name (back to step 2)
- `$B skill rm <name>` then retry
- Discard
If the user rejected in step 9:
```ts
import { discardStaged } from '<gstack-install>/browse/src/browser-skill-write';
discardStaged('<stagedDir>');
```
Report: "Discarded. No skill was written to disk."
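For intuition about why the stage/commit/discard split leaves no "almost shipped" state, the pattern can be sketched with plain `fs` calls. This is not the real `browser-skill-write.ts` helper: `stage`, `commit`, and `discard` are hypothetical stand-ins, and the default temp root is an assumption chosen so the sketch is safe to run.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Write every file under a fresh temp dir; the final path is untouched.
function stage(
  files: Map<string, string>,
  tmpRoot: string = path.join(os.tmpdir(), 'skillify-sketch'),
): string {
  fs.mkdirSync(tmpRoot, { recursive: true });
  const stagedDir = fs.mkdtempSync(path.join(tmpRoot, 'skillify-'));
  files.forEach((contents, relPath) => {
    const target = path.join(stagedDir, relPath);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, contents);
  });
  return stagedDir;
}

// One rename moves the whole directory into place; refuse to overwrite.
function commit(stagedDir: string, finalDir: string): string {
  if (fs.existsSync(finalDir)) throw new Error(`already exists: ${finalDir}`);
  fs.mkdirSync(path.dirname(finalDir), { recursive: true });
  fs.renameSync(stagedDir, finalDir);
  return finalDir;
}

// Equivalent of rm -rf on the staged dir.
function discard(stagedDir: string): void {
  fs.rmSync(stagedDir, { recursive: true, force: true });
}
```

Note that `fs.renameSync` is only a single-step move when source and destination are on the same filesystem; a cross-device rename fails, so a project-tier destination on another volume would need a copy-then-remove fallback.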
## Step 11 — Confirm + verify
After a successful commit, run one verification:
```bash
$B skill list | grep <name>
$B skill run <name> # should match the JSON the prototype produced
```
If the post-commit run does not match the prototype output, something
in synthesis drifted. Surface this to the user — they may want to
`$B skill rm <name>` and retry. Do NOT silently roll back; the user
deserves to see the discrepancy.
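What counts as a "match" is not pinned down here, and live data can legitimately change between the prototype run and the verification run. One plausible reading is a structural comparison: same field names, a consistent `count`, and a non-empty result. The sketch below takes that reading as an assumption; `describeDrift` is a hypothetical helper.

```typescript
interface Output { items: Array<Record<string, unknown>>; count: number; }

// Collects every field name that appears on any item.
function fieldNames(output: Output): Set<string> {
  const names = new Set<string>();
  for (const item of output.items) {
    for (const key of Object.keys(item)) names.add(key);
  }
  return names;
}

// Returns human-readable discrepancies; an empty list means no structural drift.
function describeDrift(prototype: Output, committed: Output): string[] {
  const drift: string[] = [];
  if (committed.count !== committed.items.length) {
    drift.push(`count ${committed.count} does not match ${committed.items.length} items`);
  }
  if (prototype.items.length > 0 && committed.items.length === 0) {
    drift.push('committed run returned no items');
  }
  const before = fieldNames(prototype);
  const after = fieldNames(committed);
  before.forEach((name) => { if (!after.has(name)) drift.push(`field "${name}" is missing`); });
  after.forEach((name) => { if (!before.has(name)) drift.push(`field "${name}" is new`); });
  return drift;
}
```

Any non-empty list is what gets surfaced to the user, verbatim, rather than triggering a rollback.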
End the skill with one line: "Skill '<name>' committed at <tier>. Future
/scrape calls matching '<canonical-trigger>' will run in ~200ms."
---
## Limits (be honest)
- **Bun runtime required.** The codified skill runs as a Bun process
(`bun run script.ts`). Phase 1 design carry-over (Codex finding #7).
Real fix lands in Phase 4 (self-contained binary or Node fallback).
For now: the skill works on any machine that has gstack installed,
which means it has Bun.
- **Fixture-replay tests are point-in-time.** When the target site
changes its HTML, the fixture goes stale: the test keeps passing against
the outdated snapshot while live runs may break. Phase 4 will add
fixture-staleness detection.
- **Synthesis is best-effort.** You're writing a script from your own
conversation memory. If the prototype was complex (multi-page, JS
hydration, lazy load) the codified script may need a hand-edit before
it's reliable. The post-commit verify step catches obvious drift.
- **Single-target only.** One `$B goto` URL per skill. Multi-page
crawls are out of scope — write a separate skill per target, or
parameterize via `args:` if the URL pattern is regular.
## What this skill does NOT do
- Codify match-path /scrape results (matched skills are already codified)
- Codify mutating flows (those are /automate's job — Phase 2 P0)
- Run skills (that's `$B skill run` — codified skills are run via /scrape's
match path or directly)
- Edit existing skills ($EDITOR + the skill dir is the surface — `$B skill
show <name>` finds the path)
- Tombstone or remove ($B skill rm)
{{LEARNINGS_LOG}}