Run Data Run
SubstackBuilding in public. AI experiments, lessons learned, and the journey of treating AI as a multiplier.

The Number That Predicts When Your Agent Will Break
A new benchmark gives a name to the failure practitioners keep calling "complex reasoning."
Last 30 Days: Google I/O 2026 (Audio Overview)
Audio companion to the May 20 Last 30 Days post. Two AI hosts debate why Google's hundred-announcement keynote didn't move the betting markets.

Last 30 Days: Google I/O 2026
Google announced roughly a hundred things in two hours. The betting markets didn't move a point.
Last 30 Days: The Enterprise Battle for Claude Code
Audio companion to the May 18 Last 30 Days post.

Last 30 Days: Claude Code
Pro plan flap, a candid postmortem, a CVE, Uber's whole AI budget, and a quiet crackdown on harness substitution

The AND we have to hold
One week of AI-in-medicine headlines, two opposite directions, and three questions to keep asking in this week's Sunday Deep Dive
Earlier posts (13 more)

Eight Agents, One Fight
Seven specialists, one personal assistant, and the quiet work of keeping a heterogeneous squad pointed at intent.

Neural Networks Don't Think in Straight Lines
Goodfire's new work suggests most of our tools for understanding AI are pointed at the wrong shape.

Three Harnesses, Three Characters, One Working Week
Choose your harness every six months. Don't let it choose you.

From the Vault, Literally
Every Sunday I pick one paper or release that’s worth your time, break it apart, and tell you why it matters.

Evals Are the New Bottleneck
Twenty-one thousand seven hundred and thirty agent runs. Nine models. One leaderboard sweep...

Sunday Deep Dive: Reckoning Is Not Judgment
Every Sunday I pick one paper or release that’s worth your time, break it apart, and tell you why it matters.

Two Gaps, Not One
Introducing: Builder-Leader: The AI Exoskeleton That Crosses the Gap (My Book)

Sunday Deep Dive: The Specialists Are Coming for the Generalists
Every Sunday, I pick one paper or release that’s genuinely worth your time, break it apart, and tell you why it matters.

The Specialist Is Now You
What AI enabled yesterday was a benchmark number. What AI enabled today is the individual, working alone, owning the whole pipeline from raw sequence to mechanistic call.

Start With Claude Code
A year in, the harness is the moat.

Sunday Deep Dive: Anthropic's Mythos Preview
Every Sunday, I pick one paper or release that’s genuinely worth your time, break it apart, and tell you why it matters.

Train Once, Inference Forever
I reproduced Cursor's GPU optimization on a model released two days ago, let two AI agents run 52 experiments, and mapped what nobody else has published.

Sunday Deep Dive: The Math Trick That Cuts LLM Memory by 6x
Every Sunday, I pick one paper or release that’s genuinely worth your time, break it apart, and tell you why it matters. No hype. No summaries of summaries. Just the idea, explained.
Want more? Subscribe to get new posts in your inbox.
Subscribe on SubstackTechnical Deep Dives
In-depth technical articles from my AI research garden. Reference material on architectures, implementations, and emerging patterns.
Pruning Your AI Agent Skills Library
AI Development
Ambient Intelligence Intent Over Instructions
AI Development
Anthropic Dreaming Continual Learning
AI Development
Anthropic Multi Agent Research System
AI Development
Autonomous Ai Agent Squad 10 Dollars Month
AI Development
Building Effective Ai Agents Openai Guide
AI Development
