diff --git a/docs/ON_THE_LOC_CONTROVERSY.md b/docs/ON_THE_LOC_CONTROVERSY.md index 262f2f18..4a106715 100644 --- a/docs/ON_THE_LOC_CONTROVERSY.md +++ b/docs/ON_THE_LOC_CONTROVERSY.md @@ -1,218 +1,169 @@ -# On the LOC Controversy +# On the LOC controversy -*Or: what happened when I mentioned how many lines of code I've been shipping, and what the numbers actually say.* +Or: what happened when I mentioned how many lines of code I've been shipping, and what the numbers actually say. ---- +## The critique is right. And it doesn't matter. -## The point, before you scroll +LOC is a garbage metric. Every senior engineer knows it. Dijkstra said it in 1988. Kernighan implied it before that. If you measure programmer productivity in lines of code, you're measuring the wrong thing. This has been true for 40 years and it's still true. -I am not saying engineers are going away. Nobody serious thinks that. - -I am saying **engineers can fly now**. One engineer in 2026 has the output of a small team in 2013, working the same hours, at the same day job, with the same brain. The code-generation cost curve collapsed by two orders of magnitude. The metrics that measure engineering output all moved accordingly. LOC is one of them. It's a crude one. Every metric on earth is crude. That doesn't mean the shift isn't real. - -This post is me doing the math honestly against my own 2013 baseline, naming what the critics are right about, putting the numbers I'd be least proud to put next to the numbers I'm most proud of, and showing the reproducibility script so you can audit me. - -If you read one sentence: **the same person shipping 14 logical lines per day in 2013 is shipping 11,417 logical lines per day in 2026, and the number that matters isn't the line count, it's what those lines become.** - ---- - -## The tidal wave - -I posted that in the last 60 days I'd shipped 600,000 lines of production code. - -The replies came in fast. Most of them some variation of: +I posted that in the last 60 days I'd shipped 600,000 lines of production code. The replies came in fast: - "That's just AI slop." -- "LOC is a meaningless metric. Dijkstra said so. Kernighan said so. Every senior engineer in the last 40 years said so." +- "LOC is a meaningless metric. Every senior engineer in the last 40 years said so." - "Of course you produced 600K lines. You had an AI writing boilerplate." - "More lines is bad, not good." - "You're confusing volume with productivity. Classic PM brain." - "Where are your error rates? Your DAUs? Your revert counts?" - "This is embarrassing." -Some of those replies are right. Some miss the point. This post is me doing the math honestly so I can tell the difference, and so I can respond to the steelman instead of the strawman. +Some of those are right. Here's what happens when you take the smart version of the critique seriously and do the math anyway. -## Why LOC became the lightning rod +## Three branches of the AI coding critique -The AI coding critique has three branches. They get collapsed into one. +They get collapsed into one, but they're different arguments. -**Branch 1: LOC doesn't measure quality.** True. A 50-line well-factored library beats a 5,000-line bloated one. Dijkstra wrote that in 1988 ("programmers are a cost, not an asset") and it was right then and right now. +**Branch 1: LOC doesn't measure quality.** True. Always has been. A 50-line well-factored library beats a 5,000-line bloated one. This was true before AI and it's true now. It was never a killer argument. 
It was a reminder to think about what you're measuring.

**Branch 2: AI inflates LOC.** True. LLMs generate verbose code by default. More boilerplate. More defensive checks. More comments. More tests. Raw line counts go up even when "real work done" didn't.

**Branch 3: Therefore bragging about LOC is embarrassing.** This is where the argument jumps the track.

-Branch 1 has always been true, including before AI. It was never a killer argument. It was a reminder to think about what you're measuring.
+Branch 2 is the interesting one. If raw LOC is inflated by some factor, the honest thing is to compute the deflation and report the deflated number. That's what this post does.

-Branch 2 is the interesting one. If raw LOC is inflated by some factor, the honest thing is to compute the deflation and report the deflated number. That's what this post does. Then it takes the next step and applies a second deflation for AI verbosity, because a neckbeard is correct that `cloc` was written to strip blanks, not to strip defensive null-checks.

+## The math

-## Logical SLOC, not raw LOC

+### Raw numbers

-The standard response to "raw LOC is garbage" is **logical SLOC**, sometimes called **source lines of code** (SLOC) or **non-comment non-blank** (NCLOC). Tools like `cloc` and `scc` have computed this for 20 years. Same code, fluff stripped:
+I wrote a script ([`scripts/garry-output-comparison.ts`](../scripts/garry-output-comparison.ts)) that enumerates every commit I authored across all 41 repos owned by `garrytan/*` on GitHub — 15 public, 26 private — in 2013 and 2026. For each commit, it counts logical lines added (non-blank, non-comment). The 2013 corpus includes Bookface, the YC-internal social network I built that year.

-- No blank lines
-- No single-line comments
-- No comment block bodies (best effort)
-- No trailing whitespace

+One repo excluded from 2026: `tax-app` (demo for a YC video, not production work). Baked into the script's `EXCLUDED_REPOS` constant. Run it yourself.

-Logical SLOC doesn't eliminate AI inflation entirely. AI writes 2-3 defensive null checks where a senior engineer would write zero and rely on type guarantees. AI inlines try/catch around things that don't throw. AI spells out `const result = foo(); return result` instead of `return foo()`. NCLOC counts all of that.

2013 was a full year. 2026 is day 108 as of this writing (April 18).

-**So let's apply a second deflation.** Assume AI-generated code is 2x more verbose than senior hand-crafted code at the logical level, on top of the blank-and-comment stripping. That's aggressive — most careful measurements I've seen put the multiplier at 1.3x to 1.8x — but it's the upper bound a skeptic would demand.

| | 2013 (full year) | 2026 (108 days) | Multiple |
|------------------|----------------:|----------------:|---------:|
| Logical SLOC | 5,143 | 1,233,062 | 240x |
| Logical SLOC/day | 14 | 11,417 | 810x |
| Commits | 71 | 351 | 4.9x |
| Files touched | 290 | 13,629 | 47x |
| Active repos | 4 | 15 | 3.75x |

-- My 2013 per-day rate: 14 logical lines
-- My 2026 per-day rate, NCLOC: 11,417
-- My 2026 per-day rate, NCLOC with 2x AI-verbosity deflation: **5,708 logical lines**

### "14 lines per day? That's pathetic."

-Multiple on daily pace, with both deflations: **408x**. Cut it in half again and call me a pathological liar: **204x**. At which point you should ask yourself whether 200x is a more defensible number than 810x, or whether the entire argument about "maybe the multiple is smaller" matters when the smaller number is still 200x.

+It was. That's the point.

-## The method

+In 2013 I was a cofounder at Posterous turned YC partner, shipping code nights and weekends. 14 logical lines per day was my actual part-time output while holding down a real job. Historical research suggests that professional full-time programmers averaged 10-50 logical lines per day depending on the study — Fred Brooks cited 10 lines/day for systems programming in *The Mythical Man-Month*, Capers Jones measured 15-50 across thousands of projects, and Steve McConnell's *Code Complete* puts the range at 10-50 for "finished, tested, delivered" code.

-To compare 2013 me vs 2026 me honestly, I wrote a script: `scripts/garry-output-comparison.ts` in this repo. It:

+My 2013 baseline isn't cherry-picked. It's normal for a part-time coder with a day job. If you think the right baseline is 50 (3.5x higher), the 2026 multiple drops from 810x to 228x. Still high.

-1. Enumerates every commit authored by me in 2013 and 2026, filtered by email (`garry@ycombinator.com` plus historical aliases).
-2. For each commit, runs `git show --format= --unified=0 ` and counts the added lines from the diff.
-3. Classifies each added line as logical or not using regex filters per language (blank, single-line comment, docstring markers).
-4. Aggregates per year. Normalizes by day.
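Step 3 is the part people push back on, so here is the shape of it. A minimal sketch of steps 2 and 3, assuming Node or Bun with `git` on the PATH; the helper names and the comment-marker table are mine for illustration, not the actual exports of `scripts/garry-output-comparison.ts`:

```ts
// Illustrative sketch of steps 2-3, not the real script's code.
import { execFileSync } from "node:child_process";

// Single-line comment markers per language (tiny subset, for illustration).
const LINE_COMMENT: Record<string, RegExp> = {
  ts: /^\s*\/\//,
  js: /^\s*\/\//,
  py: /^\s*#/,
  rb: /^\s*#/,
};

// A line counts as "logical" if it is non-blank and not a pure comment line.
function isLogicalLine(line: string, lang: string): boolean {
  if (line.trim() === "") return false; // blank
  const marker = LINE_COMMENT[lang];
  if (marker && marker.test(line)) return false; // single-line comment
  return true;
}

// Logical lines added by one commit. `--unified=0` strips context lines,
// so everything prefixed "+" (except the "+++" header) is an added line.
function logicalLinesAdded(sha: string, lang: string, repoRoot: string): number {
  const diff = execFileSync("git", ["show", "--format=", "--unified=0", sha], {
    cwd: repoRoot,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
  });
  return diff
    .split("\n")
    .filter((l) => l.startsWith("+") && !l.startsWith("+++"))
    .map((l) => l.slice(1)) // drop the diff marker
    .filter((l) => isLogicalLine(l, lang)).length;
}
```

The real script layers the author-email filter, per-language docstring handling, and the `EXCLUDED_REPOS` check described above on top of this counting loop.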
- -I cloned all 41 repos owned by `garrytan/*` on GitHub — 15 public, 26 private — and ran the script against each. Bookface, the YC-internal social network I built in 2013 and 2014, is in the corpus. So are the three 2013-era projects (delicounter, tandong) and the upstream OSS contribution that year (zurb-foundation-wysihtml5). - -One repo excluded from the 2026 numbers: **tax-app**. It's a demo app I built for an upcoming YC channel video, not production shipping work, so it shouldn't count. Baked into the script's `EXCLUDED_REPOS` constant. - -## The numbers - -### Year-to-date (raw volume comparison) +One repo excluded from 2026: `tax-app` (demo for a YC video, not production work). Baked into the script's `EXCLUDED_REPOS` constant. Run it yourself. 2013 was a full year. 2026 is day 108 as of this writing (April 18). -- **2013 full year:** 5,143 logical lines added -- **2026 through April 18:** 1,233,062 logical lines added -- **Multiple: 240x** +| | 2013 (full year) | 2026 (108 days) | Multiple | +|------------------|----------------:|----------------:|---------:| +| Logical SLOC | 5,143 | 1,233,062 | 240x | +| Logical SLOC/day | 14 | 11,417 | 810x | +| Commits | 71 | 351 | 4.9x | +| Files touched | 290 | 13,629 | 47x | +| Active repos | 4 | 15 | 3.75x | -### Run rate (pro-rata, apples-to-apples) +### "14 lines per day? That's pathetic." -Normalize to **logical SLOC per calendar day**: +It was. That's the point. -- **2013:** 5,143 / 365 = **14 logical lines per day** -- **2026:** 1,233,062 / 108 = **11,417 logical lines per day** -- **Multiple: 810x** on daily pace +In 2013 I was a YC partner, then a cofounder at Posterous shipping code nights and weekends. 14 logical lines per day was my actual part-time output while holding down a real job. Historical research suggests that professional full-time programmers averaged 10-50 logical lines per day depending on the study — Fred Brooks cited 10 lines/day for systems programming in *The Mythical Man-Month*, Capers Jones measured 15-50 across thousands of projects, and Steve McConnell's *Code Complete* puts the range at 10-50 for "finished, tested, delivered" code. -Apply the AI-verbosity deflation described above (generous 2x cut): **5,708 effective logical lines per day, 408x.** +My 2013 baseline isn't cherry-picked. It's normal for a part-time coder with a day job. If you think the right baseline is 50 (3.5x higher), the 2026 multiple drops from 810x to 228x. Still high. -### The distribution check (the one neckbeards will demand) +### Two deflations -"Your per-day number assumes uniform output. Show the weekly distribution. If it's a single burst, your run-rate is bogus." +The standard response to "raw LOC is garbage" is **logical SLOC** (source lines of code, non-comment non-blank). Tools like `cloc` and `scc` have computed this for 20 years. Same code, fluff stripped: no blank lines, no single-line comments, no comment block bodies, no trailing whitespace. -Fair. Per-week logical SLOC added across 2026 so far: +But logical SLOC doesn't eliminate AI inflation entirely. AI writes 2-3 defensive null checks where a senior engineer would write zero. AI inlines try/catch around things that don't throw. AI spells out `const result = foo(); return result` instead of `return foo()`. -- Weeks 1-4: ~8,800 / day avg -- Weeks 5-8: ~12,100 / day avg -- Weeks 9-12: ~10,900 / day avg -- Weeks 13-15: ~13,200 / day avg +So let's apply a **second deflation**. 
Assume AI-generated code is 2x more verbose than senior hand-crafted code at the logical level. That's aggressive — most measurements I've seen put the multiplier at 1.3-1.8x — but it's the upper bound a skeptic would demand. -It's not a spike. The rate has been approximately consistent and slightly increasing. If you want the raw weekly breakdown, run the script yourself. +- My 2026 per-day rate, NCLOC: **11,417** +- With 2x AI-verbosity deflation: **5,708** logical lines per day +- Multiple on daily pace with both deflations: **408x** -### Supporting numbers +Now pick your priors: -| Metric | 2013 | 2026 YTD | To-date | 2026 run rate | Run-rate multiple | -|---|---:|---:|---:|---:|---:| -| Logical SLOC | 5,143 | 1,233,062 | **240x** | 11,417/day | **810x** | -| Logical SLOC (w/ 2x AI-verbosity deflation) | 5,143 | 616,531 | **120x** | 5,708/day | **408x** | -| Raw lines added | 6,794 | 1,677,973 | 247x | 15,537/day | 835x | -| Commits | 71 | 351 | 4.9x | 3.3/day | 16.7x | -| Files touched | 290 | 13,629 | 47x | 126/day | | -| Active repos | 4 | 15 | 3.75x | | | +- At 5x deflation (unfounded but let's go): **162x** +- At 10x (pathological): **81x** +- At 100x (impossible — that's one line per minute sustained): **8x** -Commits went up 16.7x. Files went up 47x. Logical SLOC went up 240x to-date, 810x on pace. The ratios aren't the same, but they all point the same direction, and the commits metric is hard to inflate — one commit is one commit. +The argument about the size of the coefficient doesn't change the conclusion. The number is large regardless. -## But is the code good? +### Weekly distribution -The next layer of critique, channeled through the David Cramer voice: OK, so you're pushing more lines. Where are your error rates? Your post-merge reverts? Your bug density? If you're typing at 10x speed but shipping 20x more bugs, you're not leveraged, you're just making noise at scale. +"Your per-day number assumes uniform output. Show the distribution. If it's a single burst, your run-rate is bogus." -This is the most legitimate critique and I'll answer it with specifics. +Fair. -**Reverts on the 2026 corpus.** I ran `git log --grep="^revert" --grep="^Revert" -i` across the 15 active repos. Result: **7 reverts in 351 commits = 2.0% revert rate**. For context, the large open-source projects I have data for (Kubernetes, Rails, Django) run 1.5-3% historically. I'm in that band, not above it. +``` +Week 1-4 (Jan): ████████░░░░░░░░░ ~8,800/day +Week 5-8 (Feb): ████████████░░░░░ ~12,100/day +Week 9-12 (Mar): ██████████░░░░░░░ ~10,900/day +Week 13-15 (Apr): █████████████░░░░ ~13,200/day +``` -**Post-merge fix commits.** Commits matching `^fix:` that reference a prior commit on the same branch: **22 of 351 = 6.3%**. Healthy fix cycle. A zero-fix rate would mean I'm not catching my own mistakes. +It's not a spike. The rate has been approximately consistent and slightly increasing. Run the script yourself. -**Tests.** 2026 commits include test coverage on every non-trivial branch, because gstack's own `/ship` skill won't let me merge without it. Test count across these repos grew from ~100 total in early 2026 to over 2,000 now. They run in CI. They catch regressions. Every gstack PR has a coverage audit in the PR body. I can show you the audits. +## The quality question -**Production signal.** gstack has tens of thousands of daily active users (telemetry-reported, community tier, opt-in). gbrain has hundreds of beta testers running it live. 
resend_robot processes email through a real API with paid credits. brain runs my personal assistant — I use it every day. These aren't scaffolds sitting in a drawer. +This is the most legitimate critique, channeled through the [David Cramer](https://x.com/zeeg) voice: OK, you're pushing more lines. Where are your error rates? Your post-merge reverts? Your bug density? If you're typing at 10x speed but shipping 20x more bugs, you're not leveraged, you're making noise at scale. -I am NOT going to claim every shipped repo has a million DAUs. Most of these are tools for me, or for small communities, or for internal YC use. The neckbeard is right that "I shipped it" is not "it's used." But the same was true of my 2013 output, and the 2013 baseline isn't "shipped AND adopted," it's "shipped." The comparison is fair. +Fair. Here's the data: -**Review rigor.** Every gstack branch I merge goes through CEO review, Codex outside-voice review, DX review, and eng review. Often 2-3 passes of each. The `/plan-tune` skill I just shipped (v0.19.0.0) had a scope ROLLBACK from the CEO EXPANSION plan because Codex's outside-voice review surfaced 15+ findings my four Claude reviews missed. That's not a victory lap — that's a model for how this works. The review infrastructure catches the slop. It's visible in the repo. Anyone can read it. +**Reverts.** `git log --grep="^revert" --grep="^Revert" -i` across the 15 active repos: 7 reverts in 351 commits = **2.0% revert rate**. For context, Kubernetes, Rails, and Django historically run 1.5-3%. I'm in the band, not above it. -**Adversarial checks.** Every diff gets a Claude adversarial subagent AND a Codex adversarial challenge at minimum. Large diffs get a third Codex structured review with a P1 gate. That's 3-4 AI reviewers looking at the code before merge. Not to replace human judgment. To catch what I miss because I'm typing at AI speed. +**Post-merge fixes.** Commits matching `^fix:` that reference a prior commit on the same branch: 22 of 351 = **6.3%**. Healthy fix cycle. A zero-fix rate would mean I'm not catching my own mistakes. -**Greptile.** Integrated into the /ship workflow. Every PR gets its comments triaged and addressed. +**Tests.** This is the thing that actually matters, and it's the thing that changed everything for me. Early in 2026, I was shipping without tests and getting destroyed in bug land. Then I hit 30% test-to-code ratio, then 100% coverage on critical paths, and suddenly I could fly. Tests went from ~100 across all repos in January to **over 2,000 now**. They run in CI. They catch regressions. Every gstack PR has a coverage audit in the PR body. -Is some of the 1.3M logical lines going to turn out to be wrong? Yes. Some of it has already been rewritten. Some of it will be rewritten again. That's normal shipping. 6.3% fix-commit rate, 2.0% revert rate, 2,000+ tests. These are numbers, not vibes. +The real insight: testing at multiple levels is what makes AI-assisted coding actually work. Unit tests, E2E tests, LLM-as-judge evals, smoke tests, slop scans. Without those layers, you're just generating confident garbage at high speed. With them, you have a verification loop that lets the AI iterate until the code is actually correct. -## What the critics are actually right about +gstack's core real-code feature — the thing that isn't just markdown prompts — is a **Playwright-based CLI browser** I wrote specifically so I could stop manually black-box testing my stuff. 
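The core loop is nothing exotic; here is a minimal sketch of the same idea in plain Playwright, assuming only a reachable staging URL (an illustration, not gstack's actual implementation):

```ts
// Illustration of the idea behind an automated black-box pass.
// Not gstack's code; STAGING_URL is a stand-in for your own environment.
import { chromium } from "playwright";

async function smokeCheck(url: string): Promise<void> {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  try {
    const response = await page.goto(url, { waitUntil: "load" });
    if (!response || response.status() >= 400) {
      throw new Error(`GET ${url} -> ${response?.status() ?? "no response"}`);
    }
    // One cheap invariant per page beats a manual click-through.
    const title = await page.title();
    if (title.trim() === "") throw new Error("page has no <title>");
    await page.screenshot({ path: "qa-snapshot.png", fullPage: true });
  } finally {
    await browser.close();
  }
}

await smokeCheck(process.env.STAGING_URL ?? "http://localhost:3000");
```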
`/qa` opens a real browser, navigates your staging URL, and runs automated checks. That's 2,000+ lines of real systems code (server, CDP inspector, snapshot engine, content security, cookie management) that exists because testing is the unlock, not the overhead. -I'm going to steelman harder than the existing steelman section usually does. Here are the things I'll concede: +**Slop scan.** A third party — [Ben Vinegar](https://x.com/bentlegen), founding engineer at Sentry — built a tool called [slop-scan](https://github.com/benvinegar/slop-scan) specifically to measure AI code patterns. Deterministic rules, calibrated against mature OSS baselines. He ran it on gstack and scored us 5.24 (highest on his benchmark). I took the findings seriously, refactored, and cut the score by 62% in one session. David — I owe you one: that Sentry-bred paranoia about code quality is what made me name my test suite [GUnit](https://x.com/zeeg). Run `bun test` and watch 2,000+ tests pass. -**Greenfield vs maintenance.** 2026 numbers are dominated by new-project code. Mature-codebase maintenance produces fewer logical lines per day because you're editing, deleting, refactoring, not adding. If you're asking "can Garry 100x the team that maintains 10 million lines of legacy Java at a bank," my number doesn't prove that. Someone else will have to run their own script on a different context. I bet the number is still up and to the right. But I don't have that data. +**Review rigor.** Every gstack branch goes through CEO review, Codex outside-voice review, DX review, and eng review. Often 2-3 passes of each. The `/plan-tune` skill I just shipped had a scope ROLLBACK from the CEO expansion plan because Codex's outside-voice review surfaced 15+ findings my four Claude reviews missed. The review infrastructure catches the slop. It's visible in the repo. Anyone can read it. -**Logical SLOC still includes AI verbosity.** Conceded above. Applied a 2x deflation factor as a reasonable upper bound. Still 408x. If you think the right deflation is 5x (which I'd argue is unfounded based on my own reading of the diffs), the multiple is 162x. At 10x (pathological), it's 81x. At 100x (impossible — one line per minute is not a human constant), it's 8x. **Pick your priors. The number is large regardless.** +## What I'll concede -**The 2013 baseline has survivorship bias.** My 2013 public activity was low because most of my work that year was private. This analysis includes Bookface (2013-2014, private, 22 active weeks) which was one of my biggest projects that year, so the bias is smaller than it looks. It's not zero. If the true 2013 per-day rate was 50 instead of 14 (3.5x higher), the multiple at current pace is 228x instead of 810x. Still high. +I'm going to steelman harder than the critics steelmanned themselves: -**Quality-adjusted productivity isn't fully proven.** I don't have a clean bug-density comparison between 2013-me and 2026-me. What I can say: my revert rate is in the normal band for large OSS projects, my fix rate is healthy, and the review rigor caught 15+ real issues on the most recent plan. That's evidence, not proof. A skeptic can discount it. +**Greenfield vs maintenance.** 2026 numbers are dominated by new-project code. Mature-codebase maintenance produces fewer lines per day. If you're asking "can Garry 100x the team maintaining 10 million lines of legacy Java at a bank," my number doesn't prove that. Someone else will have to run their own script on a different context. 
-**"Shipped" means different things across eras.** The 2013 products that shipped ran for weeks or months and then stopped, usually because I abandoned them. Some of the 2026 products may share that fate. If two years from now 80% of what I shipped this year is dead, the critique "you built a bunch of unused stuff" will have teeth. I accept that reality check. But it also applies to every new product ever built by a well-funded team. +**The 2013 baseline has survivorship bias.** My 2013 public activity was low. This analysis includes Bookface (private, 22 active weeks) which was my biggest project that year, so the bias is smaller than it looks. It's not zero. If the true 2013 rate was 50/day instead of 14, the multiple at current pace is 228x instead of 810x. Still high. -**Time to first user is the metric that matters, not LOC.** The 60-day cycle from "I wish this existed" to "it exists and at least one person is using it" is the real shift. LOC is downstream evidence of that cycle. The right metric is something like "shipped products per quarter" or "working features per week." Those go up by a similar multiple. I don't have 2013 baseline data for them. The LOC number is the best proxy I have. +**Quality-adjusted productivity isn't fully proven.** I don't have a clean bug-density comparison between 2013-me and 2026-me. What I can say: revert rate is in the normal band, fix rate is healthy, test coverage is real, and the adversarial review process caught 15+ issues on the most recent plan. That's evidence, not proof. A skeptic can discount it. -Take those seriously. Some of the critique is right. The point is that "LOC is meaningless" doesn't engage with what the normalized, deflated, distribution-checked, revert-audited number actually shows. +**"Shipped" means different things across eras.** Some 2013 products shipped and died. Some 2026 products may share that fate. If two years from now 80% of what I shipped this year is dead, the critique "you built a bunch of unused stuff" will have teeth. I accept that reality check. -## The real argument +**Time to first user is the metric that matters, not LOC.** The 60-day cycle from "I wish this existed" to "it exists and someone is using it" is the real shift. LOC is downstream evidence. The right metric is "shipped products per quarter" or "working features per week." Those went up by a similar multiple. + +## What those lines became + +gstack is not a hypothetical. It's a product with real users: + +- **75,000+ GitHub stars** in 5 weeks +- **14,965 unique installations** (opt-in telemetry) +- **305,309 skill invocations** recorded since January 2026 +- **~7,000 weekly active users** at peak +- **95.2% success rate** across all skill runs (290,624 successes / 305,309 total) +- **57,650 /qa runs**, **24,817 /office-hours sessions**, **18,899 /ship workflows** +- **27,157 sessions used the browser** (real Playwright, not toy) +- Median session duration: **2 minutes**. Average: **6.4 minutes**. + +Top skills by usage: + +``` +/qa 57,650 ████████████████████████████ +/plan-eng-review 28,014 ██████████████ +/office-hours 24,817 ████████████ +/ship 18,899 █████████ +/browse 13,675 ██████ +/review 13,459 ██████ +/plan-ceo-review 12,357 ██████ +``` + +These aren't scaffolds sitting in a drawer. Thousands of developers run these skills every day. + +## What this means + +I am not saying engineers are going away. Nobody serious thinks that. + +I am saying engineers can fly now. 
One engineer in 2026 has the output of a small team in 2013, working the same hours, at the same day job, with the same brain. The code-generation cost curve collapsed by two orders of magnitude. The interesting part of the number isn't the volume. It's the rate. And the rate isn't a statement about me. It's a statement about the ground underneath all software engineering. -2013 me shipped about 14 logical lines per day. That was normal for me at the time. Cofounder at Posterous, then partner at YC, writing code nights and weekends mostly. +2013 me shipped about 14 logical lines per day. Normal for a part-time coder with a real job. 2026 me is shipping 11,417 logical lines per day. While still running YC full-time. Same day job. Same free time. Same person. -2026 me is shipping 11,417 logical lines per day. While still running YC full-time. Same day job. Same free time. Same person. +The delta isn't that I became a better programmer. If anything, my mental model of coding has atrophied. The delta is that AI let me actually ship the things I always wanted to build. Small tools. Personal products. Experiments that used to die in my notebook because the time cost to build them was too high. The gap between "I want this tool" and "this tool exists and I'm using it" collapsed from 3 weeks to 3 hours. -The delta isn't that I became a better programmer. If anything, my mental model of coding has atrophied — I can't remember the last time I wrote a regex from scratch. The delta is that AI let me actually ship the things I always wanted to build. Small tools. Personal products. Experiments that used to die in my notebook because the time cost to build them was too high relative to their value. The gap between "I want this tool" and "this tool exists and I'm using it" collapsed from 3 weeks to 3 hours. +Here's the script: [`scripts/garry-output-comparison.ts`](../scripts/garry-output-comparison.ts). Run it on your own repos. Show me your numbers. The argument isn't about me — it's about whether the ground moved. -This is what I mean by "engineers can fly now." The physics of what an individual engineer can ship, measured in whatever metric you want that isn't dumb, moved by two orders of magnitude. Pilots didn't replace walkers. Walkers are still useful. But if you're still walking to get across the country, you're making a choice. The pilot is not more valuable as a person. The pilot is using a tool you're choosing not to use. - -The engineers who will do the best work of the next decade are the ones who absorb this. Not by copying AI suggestions uncritically — that makes you slower and worse. By building the review infrastructure, the test infrastructure, the taste infrastructure that lets you type at AI speed without dropping the quality floor. That's the actual skill now. It's harder than it was when you were writing one line at a time. - -The LOC number is evidence of that. The critics are arguing about the shadow on the wall. - -## The corrected hero line - -My 2026 run rate on logical code change, not raw LOC which AI inflates, is about **810x my 2013 pace on NCLOC**, or **408x after applying a generous AI-verbosity deflation**. In less than a third of 2026, I've already produced **240x the entire 2013 year** on the strict NCLOC number. Measured across 40 of my public and private repos including Bookface, after excluding one repo (tax-app) that's a demo for an upcoming YC video. - -Revert rate: 2.0%. Fix-commit rate: 6.3%. Test count: 100 → 2,000+. Review rigor: 4 reviewers per ship. 
- -Adjusted for real code. Normalized by calendar day. Distribution-checked across weeks. Audited by a script anyone can re-run. - -I have more leverage, not less. By a lot. And the reason isn't that AI types faster than me. It's that I can try more things, fail more cheaply, and ship more of what I want to ship. Engineers can fly now. - -If that's embarrassing, fine. I'd rather be embarrassed and shipping than the opposite. - -## Reproducibility - -The script is in this repo at [`scripts/garry-output-comparison.ts`](../scripts/garry-output-comparison.ts). Run it yourself: - -```bash -# Against a single repo -bun run scripts/garry-output-comparison.ts --repo-root - -# Output includes both calculations: -# multiples.to_date.logical_lines_added (raw volume ratio) -# multiples.run_rate.logical_per_day (per-day pace ratio) -``` - -You can't run it against my full corpus. Bookface and Posthaven code is private. But I've open-sourced gstack and gbrain, which between them have tens of thousands of GitHub stars and tens of thousands of confirmed regular users. These are some of the most-used open source projects in the world that didn't exist three months ago. `gh repo list garrytan --visibility=public` for the list. - -The script is under MIT. Fork it. Point it at your own email aliases. Run it against your own commits, 2013 vs 2026. If your number isn't 10x up, ask why. If it is, the argument about LOC was always about who's been doing the math and who hasn't. - -The code keeps shipping. - ---- - -*Methodology details, per-repo breakdowns, and caveats: see `~/throughput-analysis-$(date).md` after running the aggregation, or the `multiples` + `caveats_global` blocks in the per-repo JSON output.* +I'm betting it did for you too.
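If you want to sanity-check the arithmetic before cloning anything, the whole normalization fits in a few lines. A back-of-envelope sketch with the constants from the table above (swap in your own numbers):

```ts
// Back-of-envelope version of the post's normalization. Plug in your own numbers.
const baseline = { logicalLines: 5_143, days: 365 };    // 2013, full year
const current = { logicalLines: 1_233_062, days: 108 }; // 2026, through April 18
const aiVerbosityDeflation = 2; // the post's deliberately aggressive upper bound

const perDayThen = baseline.logicalLines / baseline.days; // ~14
const perDayNow = current.logicalLines / current.days;    // ~11,417

// ~810x raw; ~405x deflated (the post's 408x rounds the per-day rates first).
console.log(`run-rate multiple: ${(perDayNow / perDayThen).toFixed(0)}x`);
console.log(`deflated multiple: ${(perDayNow / aiVerbosityDeflation / perDayThen).toFixed(0)}x`);
```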