mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
feat(skillify): /skillify codifies last /scrape into permanent skill
The productivity multiplier. /scrape discovers the flow; /skillify writes it as deterministic Playwright-via-browse-client code so the next /scrape on the same intent runs in ~200ms.

11-step flow with three locked contracts from the v1.19.0.0 plan review:

D1 — Provenance guard. Walk back ≤10 agent turns for a clearly-bounded /scrape result. Refuse with one specific message if cold. No silent synthesis from chat fragments.

D2 — Synthesis input slice. Extract ONLY the final-attempt $B calls that produced the JSON the user accepted, plus the user's intent string. Drop failed selectors, drop unrelated chat, drop earlier-session content. Closes Codex finding #6 by picking option (b) from the design doc: re-prompt from agent's own context, not a structured recorder.

D3 — Atomic write. Stage to ~/.gstack/.tmp/skillify-<spawnId>/, run $B skill test against the temp dir, only rename into the final tier path on test pass + user approval. Test fail or approval reject = rm -rf the temp dir entirely.

Default tier: global (~/.gstack/browser-skills/<name>/). --project flag overrides to per-project.

Generated test must include at least one ★★ assertion (parsed JSON has expected shape + non-empty key fields), not a smoke ★ assertion.

Bun runtime distribution (Codex finding #7) carries over to Phase 4. Documented in the skill's Limits section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,434 @@
---
name: skillify
version: 1.0.0
description: |
  Codify the most recent successful /scrape flow into a permanent
  browser-skill on disk. Future /scrape calls with the same intent run
  the codified script in ~200ms instead of re-driving the page. Walks
  back through the conversation, synthesizes script.ts + script.test.ts
  + fixture, runs the test in a temp dir, and asks before committing.
  Use when asked to "skillify", "codify", "save this scrape", or
  "make this permanent". (gstack)
allowed-tools:
  - Bash
  - Read
  - Write
  - AskUserQuestion
triggers:
  - skillify
  - codify this scrape
  - save this scrape
  - make this permanent
---

{{PREAMBLE}}

# /skillify — codify the last scrape into a permanent skill

The productivity multiplier. `/scrape` discovered how to pull the data;
`/skillify` writes it as deterministic Playwright-via-`browse-client`
code so the next `/scrape` call on the same intent runs in ~200ms.

Without this command, `/scrape` is a slow wrapper around `$B`. With it,
every successful scrape is a one-time cost.

## Iron contract — never write a half-broken skill to disk

Skills are user-trust artifacts. A broken skill in `$B skill list` makes
agents reach for the wrong tool and erodes confidence. This skill writes
to a temp dir, runs the auto-generated test there, and only renames into
the final tier path on (a) test pass + (b) explicit user approval. On
either failure, the temp dir is removed entirely. There is no "almost
shipped" state.

---

## Step 1 — Provenance guard (D1)

Walk back through the conversation, **at most 10 agent turns**, looking
for the most recent `/scrape` invocation that:

- Was bounded (you can identify the user's intent line and the trailing
  JSON the prototype produced)
- Produced a JSON result the user did not subsequently invalidate
  (e.g., did not say "that's wrong", did not ask you to retry)

If you cannot find one, refuse with exactly this message:

> "No recent /scrape result found in this conversation. Run /scrape
> <intent> first, then say /skillify."

Stop. Do not synthesize from chat fragments. Do not synthesize from a
match-path /scrape result (matched skills are already codified — there's
nothing to skillify).

If you find a candidate but the user has since moved on to something
unrelated (e.g., they are three turns past it), ask once before proceeding:

> "The last successful /scrape was '<intent line>' a few turns back.
> Skillify that one?"

A "yes" lets you continue. Anything else: refuse with the message above.
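
The walk-back can be sketched as a pure function. This is a minimal illustration, not gstack code. The `Turn` record and its fields (`command`, `intent`, `json`, `matchPath`, `invalidatedLater`) are hypothetical stand-ins for what the agent reconstructs from its own context, and `findProvenance` is an illustrative name.

```ts
// Hypothetical shape of one conversation turn, as the agent would reconstruct
// it from its own context. None of these field names are a gstack API.
interface Turn {
  role: 'user' | 'agent';
  command?: string;           // '/scrape' when this agent turn ran one
  intent?: string;            // the user's intent line for that /scrape
  json?: string;              // trailing JSON the prototype produced
  matchPath?: boolean;        // true if /scrape reused an existing skill
  invalidatedLater?: boolean; // user later said "that's wrong" or asked for a retry
}

type Guard =
  | { ok: true; intent: string; json: string; agentTurnsBack: number }
  | { ok: false; message: string };

const MAX_AGENT_TURNS = 10;
const REFUSAL =
  'No recent /scrape result found in this conversation. Run /scrape <intent> first, then say /skillify.';

function findProvenance(turns: Turn[]): Guard {
  let agentTurnsSeen = 0;
  // Walk backwards from the newest turn; only agent turns count toward the limit.
  for (let i = turns.length - 1; i >= 0; i--) {
    const t = turns[i];
    if (t.role !== 'agent') continue;
    agentTurnsSeen++;
    if (agentTurnsSeen > MAX_AGENT_TURNS) break;
    if (t.command !== '/scrape') continue;
    // Must be bounded (intent + JSON), prototype-path, and not invalidated.
    if (t.intent && t.json && !t.matchPath && !t.invalidatedLater) {
      return { ok: true, intent: t.intent, json: t.json, agentTurnsBack: agentTurnsSeen };
    }
  }
  return { ok: false, message: REFUSAL };
}
```

`agentTurnsBack` gives the caller a signal for the "ask once" case above. Whether the user has moved on to something unrelated is still a judgment call that the sketch does not model.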

## Step 2 — Propose name + triggers

From the prototype intent, extract:

- A short skill name: lowercase letters/digits/dashes, ≤32 chars,
  starts with a letter, no consecutive dashes. E.g.,
  `lobsters-frontpage`, `gh-issue-list`, `pypi-package-stats`.
- 3–5 trigger phrases the agent should match against in future `/scrape`
  calls. Mix the canonical phrase ("scrape lobsters frontpage") with
  paraphrases ("top posts on lobste.rs", "lobsters front page").
- The host (just the hostname, e.g. `lobste.rs`).

Then **AskUserQuestion** to confirm:

```
D<N> — Skill name + tier
Project/branch/task: codifying /scrape "<intent>" as a browser-skill.
ELI10: Pick a short name we'll use to find this skill next time you say
something similar. Pick a tier — global means every project on this
machine sees it, project means just this repo.
Stakes if we pick wrong: bad name buries the skill in $B skill list;
wrong tier means future projects can't find it (or can find it when you
didn't want them to).
Recommendation: A — <proposed-name> at global tier — most scrape skills
generalize across projects.
Note: options differ in kind, not coverage — no completeness score.
A) Keep "<proposed-name>" at global tier — ~/.gstack/browser-skills/<proposed-name>/ (recommended)
B) Keep "<proposed-name>" but at project tier — <project>/.gstack/browser-skills/<proposed-name>/
C) Rename it (free-form — say the new name)
```

**Tier-shadowing check.** Before showing the question, run `$B skill list`
and check for an existing skill at the same name. If found, add to the
question:

> "Note: a <tier> skill named '<name>' already exists. Picking the same
> name at a higher tier (project > global > bundled) shadows it; picking
> the same tier collides and will be refused at write time. Pick a
> different name to coexist."
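
Here is a minimal sketch of the naming rules and the tier-shadowing check. The regex encodes the rules listed above. `ExistingSkill` is a hypothetical parse of `$B skill list` output. The `'shadowed-by-existing'` case is inferred from the stated precedence; the text does not spell it out.

```ts
// Tier names and precedence (project > global > bundled) come from the text
// above; ExistingSkill is a hypothetical parse of `$B skill list` output.
type Tier = 'bundled' | 'global' | 'project';
const TIER_RANK: Record<Tier, number> = { bundled: 0, global: 1, project: 2 };

interface ExistingSkill {
  name: string;
  tier: Tier;
}

// Step 2 naming rules: lowercase letters/digits/dashes, <= 32 chars,
// starts with a letter, no consecutive dashes.
function isValidSkillName(candidate: string): boolean {
  return candidate.length <= 32 && /^[a-z][a-z0-9-]*$/.test(candidate) && candidate.indexOf('--') === -1;
}

// Just the hostname, e.g. "lobste.rs" for "https://lobste.rs/".
function hostOf(url: string): string {
  return new URL(url).hostname;
}

type TierConflict = 'none' | 'collides' | 'shadows-existing' | 'shadowed-by-existing';

function tierConflict(skill: string, target: Tier, existing: ExistingSkill[]): TierConflict {
  const same = existing.filter((s) => s.name === skill);
  // Same tier: refused at write time.
  if (same.some((s) => s.tier === target)) return 'collides';
  // Inferred from the precedence order: an existing higher-tier skill wins.
  if (same.some((s) => TIER_RANK[s.tier] > TIER_RANK[target])) return 'shadowed-by-existing';
  // Only lower-tier skills remain: the new skill shadows them.
  if (same.length > 0) return 'shadows-existing';
  return 'none';
}
```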

## Step 3 — Synthesize `script.ts` (D2)

**Use only the final-attempt `$B` calls** that produced the JSON the
user accepted, plus the user's intent string. Drop:

- Failed selector attempts (the four selectors you tried before the
  working one)
- Unrelated `$B` commands from earlier turns
- All conversation prose, summaries, your own reasoning

The script imports the SDK from `./_lib/browse-client` (a sibling copy,
read in step 6 and staged in step 7) and exports a parser function so
`script.test.ts` can exercise it against the bundled fixture without
spinning up the daemon.

Mirror the bundled reference at `browser-skills/hackernews-frontpage/script.ts`:

```ts
import { browse } from './_lib/browse-client';

export interface Item { /* one row of the JSON output */ }
export interface Output { items: Item[]; count: number; }

const TARGET_URL = '<the URL the prototype used>';

export function parseFromHtml(html: string): Item[] {
  // Pure function: HTML in, parsed Item[] out. No $B calls.
  // Future fixture-replay tests call this directly.
  // Fill in from the extraction logic of the prototype's final attempt.
}

// Runs only when executed directly (`bun run script.ts`), not when
// script.test.ts imports this module. `main` is hoisted, so this is safe.
if (import.meta.main) { await main(); }

async function main(): Promise<void> {
  await browse.goto(TARGET_URL);
  const html = await browse.html();
  const items = parseFromHtml(html);
  const output: Output = { items, count: items.length };
  process.stdout.write(JSON.stringify(output) + '\n');
}
```

The parser MUST be a pure function. If your prototype used multiple `$B`
calls (e.g., goto + click "Next" + html), keep all of them in `main()`
but extract the parsing into pure helpers. The fixture-replay tests in
step 5 only exercise the pure parts.
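
The D2 slice can be pictured as a filter over the session's `$B` calls. This skill works from the agent's own context rather than from a structured recorder. The `BCall` record below is therefore a hypothetical mental model, not a log that gstack keeps, and `sliceForSynthesis` is an illustrative name.

```ts
// Mental model only: /skillify re-derives these calls from the agent's own
// context; there is no structured recorder, and BCall is hypothetical.
interface BCall {
  turn: number;       // conversation turn the call happened in
  attempt: number;    // 1 for the first try, 2 after the first selector failed, ...
  command: string;    // the $B command line, e.g. 'goto https://lobste.rs'
  succeeded: boolean;
}

interface SynthesisInput {
  intent: string;
  calls: string[];
}

function sliceForSynthesis(intent: string, calls: BCall[], acceptedTurn: number): SynthesisInput {
  // Drop unrelated $B commands from other turns.
  const fromTurn = calls.filter((c) => c.turn === acceptedTurn);
  // Keep only the final attempt; earlier attempts hold the failed selectors.
  let finalAttempt = 0;
  for (const c of fromTurn) finalAttempt = Math.max(finalAttempt, c.attempt);
  const kept = fromTurn.filter((c) => c.attempt === finalAttempt && c.succeeded);
  // Conversation prose never enters the input: only the intent and commands.
  return { intent, calls: kept.map((c) => c.command) };
}
```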

## Step 4 — Capture the fixture

```bash
FIXTURE="/tmp/skillify-fixture-$$.html"
$B goto "<TARGET_URL>"
$B html > "$FIXTURE"
echo "$FIXTURE"   # note the exact path; you read it back below
```

The fixture filename inside the staged dir is
`fixtures/<host-with-dashes>-<YYYY-MM-DD>.html`, where the date is today.
E.g. `fixtures/lobste-rs-2026-04-27.html`.

Read the file at the echoed path, store its contents in a variable
(`fixtureHtml`), and use it when staging in step 7.
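
Here is a small sketch of the fixture naming rule. It assumes that every run of non-alphanumeric characters in the host (normally the dots) becomes a single dash, and that "today" means the local calendar date. `fixtureFileName` is a hypothetical helper.

```ts
// Builds fixtures/<host-with-dashes>-<YYYY-MM-DD>.html. Assumes every run of
// non-alphanumeric characters in the host (normally the dots) becomes one dash.
function fixtureFileName(host: string, today: Date = new Date()): string {
  const hostWithDashes = host.toLowerCase().replace(/[^a-z0-9]+/g, '-');
  const pad = (n: number): string => (n < 10 ? '0' : '') + n;
  // Local calendar date, matching "the date is today" for whoever runs it.
  const day = `${today.getFullYear()}-${pad(today.getMonth() + 1)}-${pad(today.getDate())}`;
  return `fixtures/${hostWithDashes}-${day}.html`;
}
```

For example, `fixtureFileName('lobste.rs', new Date(2026, 3, 27))` reproduces the example filename above. Months in `Date` are zero-based, so `3` is April.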

## Step 5 — Write `script.test.ts`

Mirror `browser-skills/hackernews-frontpage/script.test.ts`. The test
must include at least one ★★ assertion — parsed output has the expected
shape AND non-empty key fields — not a smoke ★ assertion. Smoke tests
that only check `parseFromHtml` doesn't throw are insufficient.

```ts
import { describe, it, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { parseFromHtml } from './script';

describe('<name> parser', () => {
  const fixturePath = path.join(import.meta.dir, 'fixtures', '<host-with-dashes>-<YYYY-MM-DD>.html');
  const html = fs.readFileSync(fixturePath, 'utf-8');
  const items = parseFromHtml(html);

  it('returns at least one item from the bundled fixture', () => {
    expect(items.length).toBeGreaterThan(0);
  });

  it('every item has the required shape', () => {
    for (const item of items) {
      expect(typeof item.<keyfield>).toBe('<keytype>');
      // ★★: key fields must be non-empty, not merely the right type.
      expect(String(item.<keyfield>).trim().length).toBeGreaterThan(0);
      // ... assert on every required field
    }
  });
});
```
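
To make the ★/★★ distinction concrete, here is a framework-free sketch of a ★★ check. `starStarProblems` and the `FieldType` union are hypothetical. The function returns a list of problems, so a failing test can report all of them at once instead of stopping at the first.

```ts
// Framework-free ★★ check: every item has each required field with the
// expected type, and string fields are non-empty. A ★ smoke test would only
// check that the parser ran without throwing.
type FieldType = 'string' | 'number' | 'boolean';

function starStarProblems(items: unknown[], required: { [field: string]: FieldType }): string[] {
  const problems: string[] = [];
  if (items.length === 0) problems.push('no items parsed');
  items.forEach((item, i) => {
    if (typeof item !== 'object' || item === null) {
      problems.push(`item ${i} is not an object`);
      return;
    }
    const row = item as { [field: string]: unknown };
    for (const field of Object.keys(required)) {
      const value = row[field];
      const expected = required[field];
      if (typeof value !== expected) {
        problems.push(`item ${i}.${field}: expected ${expected}, got ${typeof value}`);
      } else if (typeof value === 'string' && value.trim() === '') {
        problems.push(`item ${i}.${field}: empty string`);
      }
    }
  });
  return problems;
}
```

Inside `script.test.ts` this could become `expect(starStarProblems(items, { title: 'string', url: 'string' })).toEqual([])`. Here `title` and `url` stand in for the skill's real key fields.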

## Step 6 — Resolve the canonical SDK path + read it

The canonical SDK lives at `<gstack-install>/browse/src/browse-client.ts`.
The bundled-skill loader walks the install tree to find it; mirror that.

Resolve the gstack install dir. Two reliable signals (in order):

1. The bundled `hackernews-frontpage` skill — look at its tier path from
   `$B skill list` (the `bundled` row). The skill dir is
   `<gstack-install>/browser-skills/hackernews-frontpage/`, so the install
   dir is two `dirname` calls above the skill dir (four above its
   `_lib/browse-client.ts`).
2. The active gstack skills install at `~/.claude/skills/gstack/`. Read
   the symlink target if it's a symlink, otherwise use the path directly.

Example (run as Bun, not bash, to avoid shell-redirect parsing issues):

```ts
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

function resolveSdkPath(): string {
  const candidates = [
    path.join(os.homedir(), '.claude', 'skills', 'gstack', 'browse', 'src', 'browse-client.ts'),
    // Add other install-dir candidates if your environment differs.
  ];
  for (const c of candidates) {
    try {
      // realpathSync follows the symlink if ~/.claude/skills/gstack is one.
      const real = fs.realpathSync(c);
      if (fs.existsSync(real)) return real;
    } catch {
      // Missing or unreadable candidate: fall through to the next one.
    }
  }
  throw new Error('Could not resolve canonical browse-client.ts');
}

const sdkContents = fs.readFileSync(resolveSdkPath(), 'utf-8');
```

Read the SDK contents into a variable. The staging step writes it as
`_lib/browse-client.ts` byte-identical to the canonical. Phase 1 decision
#4 — each skill is fully self-contained, no version drift possible.
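
The `dirname` arithmetic from the first signal can be sketched without imports. The helper names are hypothetical, and the code assumes absolute POSIX paths. In a real Bun snippet, `path.dirname` does the same job.

```ts
// POSIX-only helper mirroring path.dirname for absolute paths, written out so
// the sketch has no imports. In a real Bun snippet, use path.dirname.
function dirname(p: string): string {
  const trimmed = p.replace(/\/+$/, '');
  const cut = trimmed.lastIndexOf('/');
  return cut <= 0 ? '/' : trimmed.slice(0, cut);
}

// <install>/browser-skills/hackernews-frontpage -> <install>: two dirname calls.
function installDirFromSkillDir(skillDir: string): string {
  return dirname(dirname(skillDir));
}

// <install>/browser-skills/hackernews-frontpage/_lib/browse-client.ts -> <install>: four calls.
function installDirFromBundledSdk(sdkCopy: string): string {
  return dirname(dirname(dirname(dirname(sdkCopy))));
}

function canonicalSdkPath(installDir: string): string {
  return `${installDir.replace(/\/+$/, '')}/browse/src/browse-client.ts`;
}
```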

## Step 7 — Stage the skill (D3 atomic write)

Use the helper at `browse/src/browser-skill-write.ts`. Construct an inline
TypeScript snippet (or shell out to a small Bun one-liner) that calls:

```ts
import { stageSkill } from '<gstack-install>/browse/src/browser-skill-write';

// skillMd: the SKILL.md below; scriptTs / scriptTestTs: steps 3 and 5;
// sdkContents: step 6; fixtureHtml: step 4.
const stagedDir = stageSkill({
  name: '<name>',
  files: new Map([
    ['SKILL.md', skillMd],
    ['script.ts', scriptTs],
    ['script.test.ts', scriptTestTs],
    ['_lib/browse-client.ts', sdkContents],
    ['fixtures/<host-with-dashes>-<YYYY-MM-DD>.html', fixtureHtml],
  ]),
});
console.log(stagedDir);
```

The SKILL.md content for `<name>` follows the Phase 1 frontmatter
contract:

```yaml
---
name: <name>
description: <one-line, what data this returns>
host: <hostname>
trusted: false # agent-authored skills are untrusted by default
source: agent
version: 1.0.0
args: [] # extend if your script accepts --arg key=value
triggers:
  - <phrase 1>
  - <phrase 2>
  - <phrase 3>
---

# <Name> scraper

<2-3 sentences on what the script does, what URL it hits, and what
shape of JSON it returns. NO conversation context. NO chat fragments.
This is a durable on-disk artifact — keep it tight.>

## Usage

\`\`\`
$ $B skill run <name>
{ "items": [...], "count": N }
\`\`\`
```

Capture `stagedDir` (the path returned by `stageSkill`; per the D3
contract it lives under `~/.gstack/.tmp/skillify-<spawnId>/`). You'll pass
it to `$B skill test` next, then to `commitSkill` or `discardStaged`.
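
If you build `skillMd` programmatically, the sketch below shows one way to render the frontmatter. `SkillMeta` and `renderFrontmatter` are hypothetical. The free-text fields are quoted with `JSON.stringify` because JSON string literals are also valid YAML double-quoted scalars. That quoting keeps a trigger such as `lobsters: front page` from being misparsed as a mapping.

```ts
// Hypothetical input shape; the keys mirror the frontmatter contract above.
interface SkillMeta {
  name: string;        // already validated in step 2: [a-z0-9-] only
  description: string;
  host: string;
  triggers: string[];
}

function renderFrontmatter(meta: SkillMeta): string {
  // JSON string literals are also valid YAML double-quoted scalars, so a
  // description or trigger containing ':' or '#' is not misparsed.
  const q = (s: string): string => JSON.stringify(s);
  const lines = [
    '---',
    `name: ${meta.name}`,
    `description: ${q(meta.description)}`,
    `host: ${meta.host}`,
    'trusted: false # agent-authored skills are untrusted by default',
    'source: agent',
    'version: 1.0.0',
    'args: []',
    'triggers:',
  ];
  for (const t of meta.triggers) lines.push(`  - ${q(t)}`);
  lines.push('---');
  return lines.join('\n');
}
```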

## Step 8 — Run `$B skill test` against the staged dir

```bash
$B skill test "<name>" --dir "<stagedDir>"
```

If `$B skill test` does not yet accept `--dir`, fall back to invoking the
test runner directly against the staged path:

```bash
( cd "<stagedDir>" && bun test script.test.ts )
```

If the test fails:

1. Read the test output. If the failure is a fixable parser bug,
   rewrite `script.ts` and `script.test.ts` (still inside the staged
   dir) and retry, at most twice. Show the diff to the user before
   each retry.
2. If it still fails after two retries, OR the failure is an
   environmental issue (SDK import, daemon connection), first show the
   user the staged `script.ts` for reference (discarding deletes it),
   then remove the staged dir:

   ```ts
   import { discardStaged } from '<gstack-install>/browse/src/browser-skill-write';
   discardStaged('<stagedDir>');
   ```

   Report the failure to the user and stop. No on-disk artifact.
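
The failure-handling rules reduce to a small decision function. This is a sketch. Deciding whether a failure is a parser bug or an environmental issue remains the agent's judgment, and `FailureKind` and `onTestFailure` are illustrative names.

```ts
// Step 8 failure handling as a pure decision. The FailureKind labels are
// illustrative; classifying a failure is the agent's judgment call.
type FailureKind = 'parser-bug' | 'environmental';
type NextStep = { action: 'retry'; retry: number } | { action: 'discard' };

const MAX_RETRIES = 2;

function onTestFailure(kind: FailureKind, retriesUsed: number): NextStep {
  // SDK import or daemon connection problems: rewriting the parser won't help.
  if (kind === 'environmental') return { action: 'discard' };
  // Parser bugs get at most two rewrite-and-retest cycles (show the diff first).
  if (retriesUsed >= MAX_RETRIES) return { action: 'discard' };
  return { action: 'retry', retry: retriesUsed + 1 };
}
```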

## Step 9 — Approval gate

Tests passed. Now ask the user before committing:

```
D<N> — Commit skill "<name>" at <resolved-tier-path>?
Project/branch/task: codified /scrape "<intent>" — tests pass against fixture.
ELI10: The script ran clean against the snapshot we captured. Saying yes
moves the staged folder into <resolved-tier-path>, where /scrape
will find it next time. Saying no removes the staged folder and nothing
lands on disk.
Stakes if we pick wrong: yes commits an artifact you have to manually rm
later if you regret it ($B skill rm <name> --global). No throws away
~30s of synthesis work.
Recommendation: A — tests passed, the script is self-contained, this is
the productivity payoff for the prototype.
Note: options differ in kind, not coverage — no completeness score.
A) Commit it (recommended)
B) Look at the script first (I'll print SKILL.md + script.ts and re-ask)
C) Discard — don't commit
```

If the user picks B, print the staged `SKILL.md` and `script.ts` (NOT
the fixture or `_lib/`), then re-ask the question with only options A
and C, since they have already seen the script.

## Step 10 — Commit (atomic) or discard

If the user approved:

```ts
import { commitSkill } from '<gstack-install>/browse/src/browser-skill-write';
const dest = commitSkill({
  name: '<name>',
  tier: '<global|project>', // from step 2 answer
  stagedDir: '<stagedDir>',
});
console.log(`Committed: ${dest}`);
```

If `commitSkill` throws "already exists" (a same-tier name collision the
user dismissed in step 2), report it and ask whether to:

- Pick a different name (back to step 2)
- `$B skill rm <name>` then retry
- Discard

If the user rejected in step 9:

```ts
import { discardStaged } from '<gstack-install>/browse/src/browser-skill-write';
discardStaged('<stagedDir>');
```

Report: "Discarded. No skill was written to disk."
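
Here is a sketch of the D3 decision and paths. The staging location `~/.gstack/.tmp/skillify-<spawnId>/` comes from the D3 contract, and the tier paths come from step 2. `decideCommit`, `tierDir` and the `Outcome` type are hypothetical. The real `commitSkill` reports a collision by throwing "already exists", whereas this sketch checks for it up front.

```ts
// Paths follow the D3 contract and the step 2 tier options; everything else
// here (Outcome, decideCommit) is a hypothetical model of the flow.
type CommitTier = 'global' | 'project';

function stagingDir(home: string, spawnId: string): string {
  return `${home}/.gstack/.tmp/skillify-${spawnId}/`;
}

function tierDir(tier: CommitTier, skill: string, home: string, projectRoot: string): string {
  const base = tier === 'global' ? home : projectRoot;
  return `${base}/.gstack/browser-skills/${skill}/`;
}

type Outcome =
  | { kind: 'commit'; from: string; to: string }
  | { kind: 'discard'; remove: string }
  | { kind: 'ask'; options: string[] };

function decideCommit(s: {
  staged: string;
  dest: string;
  testPassed: boolean;
  approved: boolean;
  destExists: boolean;
}): Outcome {
  // No "almost shipped" state: any failure removes the whole staged dir.
  if (!s.testPassed || !s.approved) return { kind: 'discard', remove: s.staged };
  if (s.destExists) {
    return { kind: 'ask', options: ['pick a different name', '$B skill rm <name> then retry', 'discard'] };
  }
  // A single rename moves the staged dir into the tier path.
  return { kind: 'commit', from: s.staged, to: s.dest };
}
```

One design caveat applies. `rename` is only atomic within a single filesystem, so a project tier on a different volume from `~/.gstack` would need a copy-then-delete step. This sketch does not model that step.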

## Step 11 — Confirm + verify

After a successful commit, verify it:

```bash
$B skill list | grep "<name>"
$B skill run <name> # should match the JSON the prototype produced
```
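
The text asks that the committed run match the prototype's JSON. Exact equality is the strictest reading. However, if the live page changed after the prototype ran, values can differ without any synthesis drift. The looser sketch below therefore compares shape instead. It assumes both outputs were captured and passed through `JSON.parse`. `keySignature` and `shapeDrift` are hypothetical names.

```ts
// Output shape from step 3: { items: Item[]; count: number }.
interface ScrapeOutput {
  items: { [field: string]: unknown }[];
  count: number;
}

// "score:number,title:string" style signature of one row's keys and value types.
function keySignature(row: { [field: string]: unknown }): string {
  return Object.keys(row)
    .sort()
    .map((k) => `${k}:${typeof row[k]}`)
    .join(',');
}

function shapeDrift(prototypeOut: ScrapeOutput, committed: ScrapeOutput): string[] {
  const drift: string[] = [];
  if (committed.count !== committed.items.length) drift.push('count does not equal items.length');
  if (committed.items.length === 0) drift.push('committed run returned no items');
  // Compare every committed row against the first prototype row's shape.
  const expected = prototypeOut.items.length > 0 ? keySignature(prototypeOut.items[0]) : '';
  committed.items.forEach((row, i) => {
    const sig = keySignature(row);
    if (expected !== '' && sig !== expected) drift.push(`item ${i}: ${sig} vs prototype ${expected}`);
  });
  return drift;
}
```

An empty result means the shapes agree. Any entry is the kind of discrepancy to surface to the user rather than roll back silently.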

If the post-commit run does not match the prototype output (beyond values
that changed on the live site since the prototype ran), something in
synthesis drifted. Surface this to the user — they may want to
`$B skill rm <name>` and retry. Do NOT silently roll back; the user
deserves to see the discrepancy.

End the skill with one line: "Skill '<name>' committed at <tier>. Future
/scrape calls matching '<canonical-trigger>' will run in ~200ms."

---

## Limits (be honest)

- **Bun runtime required.** The codified skill runs as a Bun process
  (`bun run script.ts`). Phase 1 design carry-over (Codex finding #7).
  Real fix lands in Phase 4 (self-contained binary or Node fallback).
  For now: the skill works on any machine that has gstack installed,
  which means it has Bun.
- **Fixture-replay tests are point-in-time.** When the target site
  changes its HTML, the fixture goes stale: the test keeps passing against
  the outdated snapshot even if the live scrape breaks. Phase 4 will add
  fixture-staleness detection.
- **Synthesis is best-effort.** You're writing a script from your own
  conversation memory. If the prototype was complex (multi-page, JS
  hydration, lazy load) the codified script may need a hand-edit before
  it's reliable. The post-commit verify step catches obvious drift.
- **Single-target only.** One `$B goto` URL per skill. Multi-page
  crawls are out of scope — write a separate skill per target, or
  parameterize via `args:` if the URL pattern is regular.

## What this skill does NOT do

- Codify match-path /scrape results (matched skills are already codified)
- Codify mutating flows (those are /automate's job — Phase 2 P0)
- Run skills (that's `$B skill run` — codified skills are run via /scrape's
  match path or directly)
- Edit existing skills ($EDITOR + the skill dir is the surface —
  `$B skill show <name>` finds the path)
- Tombstone or remove ($B skill rm)

{{LEARNINGS_LOG}}