Four boundaries for an agent with access to your books
An agent that reconciles invoices and posts to your ledger holds the keys to your bank. The four boundaries that decide whether a prompt injection can move money.
Blog
Notes from real engineering work — Forward Deployed Engineering, AI-native vertical SaaS, domain immersion (accounting, finance), and the trade-offs we don't usually write down.
Aggregate pass@1 falls from 76% to 52% as tasks get longer. In an agent that posts to your ledger, that gap shows up as misstated entries. Capability isn't reliability.
In 22 days, Phil Schmid published three deep dives (skills, MCP, subagents) that, read together, form the unwritten manual for building a serious agent in 2026.
Firefox found 13x more bugs, Zenith won 5 of 8 tasks at 43% cost, Eugene Yan shared his workflow. The signal: invest in your harness, not the model.