AI Coding Tools Assume a Team You Don't Have

Every AI vendor ships a playbook for using its tools the right way: keep your configuration files lean, maintain an audit trail, invest in the meta-engineering that makes the agent reliable. The operators actually shipping product on these tools read that advice, nod, and ignore most of it. This week's Executive AI Roundtable narrowed to one founder building an entire software platform on Claude Code — running it hard enough to burn through a weekly usage ceiling with days to spare — and the conversation kept circling the same gap: the best practices are written for a company with people to spare, and a bootstrapped team doesn't have them.

What follows is shared under the Chatham House Rule — the insights freely, the names not.

This essay is one week's notes from the Executive AI Roundtable — a closed-door peer conversation for CEOs and founders, held weekly under the Chatham House Rule.

Talk faster than you can type

The AI tool I got the most from this week was a microphone. On a founder's recommendation from the last Roundtable, I finally gave voice dictation a real try and installed FluidVoice on my Mac — an app that runs an open speech model on-device. The on-device part is what sold me: nothing gets shipped to a vendor, which settles my privacy concerns about sending my audio off to whoever. Even better, there is less lag than the server-based solutions I've tried.

A Stream Deck foot pedal keeps my hands off the keyboard: one switch starts and stops the capture, another sends the prompt with a carriage return, and a small heads-up display shows the transcription in real time so I can preview the text before it lands in the field.

I expected dictation to make the thinking worse — sloppier, more rambling. The opposite happened. Speaking surfaces more of what's in your head than typing does. I can talk faster than I can type, and thoughts fall out of my head quickly when I'm working through an idea; capturing them in voice lets me get more detail down. A founder on the call put it well: "When I'm typing something, there's way more going on in my head than I type." There's an enormous amount of context in a person's head that never gets written down, never gets recorded, and that an LLM therefore never sees. Talking to the model closes part of that gap.

The open dictation model running locally on Apple Silicon is NVIDIA's Parakeet — on a Mac with no NVIDIA GPU in it.

Point the agent at the database nobody can search

A founder needed to confirm that a new company name didn't collide with an existing trademark. The official search interface is the kind of tool that punishes you for using it — multi-word phrase search fails, the backfile is partitioned into more than ninety annual files stretching back to the 1880s, and you download it a year at a time. So they stopped fighting the interface. "I just downloaded the entire trademark database" — every annual file — and pointed Claude Code at the whole thing.

When a public dataset has a hostile front end, the move is to pull it locally and let the agent search it. The collision check that the official tool had made tedious became a single question for an agent that now had all 140 years of records on local disk. Due diligence that used to be uneconomical, too slow and too manual to bother with, becomes a five-minute task once the data sits where an agent can read it.
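A minimal sketch of the kind of search an agent might write once the files are local. It assumes the annual files have already been downloaded and unpacked into one folder as plain text; the folder layout, the `*.txt` pattern, and the function names are illustrative assumptions, not the founder's actual setup, and the real files may need unzipping or XML parsing first.

```python
import re
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so case and spacing don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_phrase(data_dir: Path, phrase: str, pattern: str = "*.txt"):
    """Yield (file name, line number, line) for each line containing the phrase."""
    needle = normalize(phrase)
    # Walk every annual file in name order so hits come back oldest first
    # (assuming the files are named by year).
    for path in sorted(data_dir.glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in normalize(line):
                    yield path.name, number, line.strip()
```

Calling `list(find_phrase(Path("trademarks"), "blue heron labs"))` would scan every file in a hypothetical `trademarks` folder for the multi-word phrase, which is exactly the query the official interface could not handle.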

Let it run the night shift

A founder is building a new product at full tilt, and the usage bill tells the story. On a Claude Max 20× subscription, they hit their weekly limit before the week was out: "I ran out of it just before the end of the week." Part of the spend was an overnight job. "I told Claude Code: I want you to create five hundred issues in GitHub for things you think you should investigate, and investigate as many as you can overnight." They went to bed; the agent worked the backlog it had written for itself.

Running agents while you sleep is its own discipline, one worth getting deliberate about; more on that in Run AI on the Graveyard Shift. The new skill is knowing which jobs are worth a night of compute — and being able to see where the tokens went.

Two dials, not one: trusting AI and using AI

Two things most teams tune with a single dial are actually separate. "Figuring out how to get AI to be more productive, and figuring out what to trust and what to review yourself — they're two completely separate things." Productivity is teaching the agent to be proactive, to remember your standing instructions, and to stop asking what to do next. Trust is deciding which of its outputs you'll ship without reading them. A model can be highly productive and barely trusted, or trusted in a narrow domain and frustratingly passive everywhere else.

What sets the dials is the stakes. "The mindset for an existing business and a new business is very different." On a greenfield product that is itself an experiment, you can crank productivity and keep trust low because nothing is at risk yet — break it, and you learn something. On an established product, the order flips: the cost of a wrong call the AI made unsupervised is measured in customers and revenue, so trust has to be earned slowly, even when productivity is sitting right there. Conflating the two dials is why so many enterprise AI rollouts stall — they're trying to solve a speed problem and a risk problem with one setting.

Keep the record, or build blind

Agents forget. Each session starts fresh, so the same mistakes recur unless you build memory on top. One founder had wired up a hook suggested by a friend: after each back-and-forth, the agent evaluates whether something went wrong and, if it did, records the problem as a lesson. "In any chat, I just say 'review lessons,' and it shows me the lessons, and I get to approve, reject, or edit." It's a persistent feedback channel bolted onto a stateless tool, the practical answer to telling an agent the same thing for the eighth time.
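The founder's hook wasn't shown on the call, so what follows is only a sketch of the record-then-review loop it describes. It assumes lessons live in a local JSON Lines file; the file format, the `Lesson` class, and every function name are hypothetical, and none of this is Claude Code's own hook API.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Lesson:
    text: str
    status: str = "pending"  # pending until a human approves or rejects it


def load_lessons(path: Path) -> list:
    """Read every lesson from the log; a missing file means no lessons yet."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [Lesson(**json.loads(line)) for line in lines if line.strip()]


def save_lessons(path: Path, lessons: list) -> None:
    """Rewrite the log with one JSON object per line."""
    body = "".join(json.dumps(asdict(lesson)) + "\n" for lesson in lessons)
    path.write_text(body, encoding="utf-8")


def record_lesson(path: Path, text: str) -> None:
    """What the hook would call after a turn that went wrong."""
    lessons = load_lessons(path)
    lessons.append(Lesson(text))
    save_lessons(path, lessons)


def review_lesson(path: Path, index: int, decision: str,
                  edited_text: Optional[str] = None) -> None:
    """Apply a human decision, optionally editing the lesson text first."""
    if decision not in {"approved", "rejected"}:
        raise ValueError(f"unknown decision: {decision}")
    lessons = load_lessons(path)
    if edited_text is not None:
        lessons[index].text = edited_text
    lessons[index].status = decision
    save_lessons(path, lessons)


def approved_lessons(path: Path) -> list:
    """Only approved lessons are replayed into the next session."""
    return [lesson.text for lesson in load_lessons(path) if lesson.status == "approved"]
```

The design choice worth noting is that nothing the agent writes takes effect until a human approves it, which keeps the trust dial separate from the productivity dial.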

The session-end counterpart is a free download: Anthropic's claude-md-improver skill reviews the session and folds the new gotchas into your project's instructions. I run it before quitting, and the known bugs stop coming back. This is the same loop-closing discipline covered in Close the AI Feedback Loop.

The cautionary half came from the same build. While running an agent harness that routed work between Claude Code and Codex, with Codex acting as a separate code reviewer, the founder realized the review comments were going straight from one tool to the other and never through GitHub. "We just have no idea what feedback Codex gave." The record went dark. On a throwaway experiment, fine; they called it "building this product blind" and meant it as a choice. But these tools optimize for speed, not for an audit trail, and it's very easy for the trail to vanish while you feel productive. They record so much by default that you get complacent about the one path that isn't recorded.
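One low-ceremony way to keep that path from going dark is to pass every inter-agent message through a function that writes it down before forwarding it. This is a sketch under assumptions: the harness is taken to be something you can wrap, and the log file, field names, and function names are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_handoff(log_path: Path, sender: str, receiver: str, message: str) -> str:
    """Append one inter-agent message to a readable log, then return it unchanged."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "from": sender,
        "to": receiver,
        "message": message,
    }
    # Append mode keeps earlier entries intact, so the trail only grows.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return message


def read_trail(log_path: Path) -> list:
    """Load every recorded handoff, oldest first."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because `log_handoff` returns the message it was given, it can sit inline wherever the reviewer's feedback is handed to the coding agent without changing what either tool sees.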

The meta-work tax

Late in the call, a founder read Anthropic's own guidance on managing usage and hit the recommendation to keep your project instructions under 200 lines. Their reaction was unprintable, but the substance was right: that advice is reasonable for a company with thousands of engineers and people whose job is to maintain instruction files across repos. For a small bootstrapped SaaS team, it isn't. They can't read everything in a single session, let alone spend a morning pruning a config file. It's not happening.
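Pruning takes a morning, but noticing that the file has outgrown the guideline does not. A small sketch, assuming the instructions live in a single text file and that blank lines don't count toward the limit; the counting rule, the default of 200, and the function names are assumptions for illustration, not Anthropic's definition.

```python
from pathlib import Path


def instruction_lines(path: Path) -> int:
    """Count the non-blank lines in an instructions file."""
    text = path.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())


def within_budget(path: Path, budget: int = 200) -> bool:
    """True while the file stays under the line budget."""
    return instruction_lines(path) < budget
```

Run from a pre-commit step or at session end, `within_budget(Path("CLAUDE.md"))` at least tells a small team when the file has drifted past the number, and leaves the decision about whether to care to them.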

The vendor playbooks assume infrastructure — spare people, spare time, a meta-engineering function — that the teams moving fastest on AI don't have. Anthropic and all these AI vendors have very high expectations about how much time you'll put into the meta-work, the meta-engineering.

The meta-work is insurance, and the premium scales with your stakes. When you help an established business write an AI policy, the questions are exactly the ones a solo founder can defer: every slice of the org uses AI differently, so what's an appropriate use, what needs human review, and what doesn't? What are the IP implications for trade secrets, patents, and copyright? Do you need to track which commits had AI input? For a company with customers and a moat, that ceremony is how you avoid losing IP or shipping a disaster, and it's worth real time; the groundwork in Map Your Operating System Before You Apply AI and the IP questions in generative-AI work are where to start. For a greenfield experiment, the same ceremony is overhead that buys nothing. The skill is knowing which one you're running.

Where to start

Separate the two dials. Decide what you'll let AI ship unreviewed, and — separately — how proactive you want it to be. Don't tune trust and productivity with one dial.
Match the meta-work to the stakes. On a throwaway experiment, skip the ceremony and move. On a business with customers, build the review thresholds and the audit trail first.
Capture lessons automatically. Add a hook that records what went wrong, replay the approved lessons next session, and run a config-improver before you quit.
Point the agent at the data, not the UI. When a public database has a hostile interface, pull it local and let the agent search the whole thing.
Watch where the record goes dark. Instrument any multi-agent chain so its feedback lands somewhere a human can still read it.
Right-size the vendor advice. Treat best practices written for thousands of engineers as aspirational, not mandatory — then take the corrective step of writing down the few that actually pay off for a team your size.

Resources From the Roundtable

FluidVoice — free, open-source on-device dictation app for macOS that runs an open speech model (NVIDIA Parakeet) locally; raised as the privacy-preserving alternative to cloud dictation.
Stream Deck Pedal — Elgato's three-switch USB foot pedal, mapped to start and stop dictation hands-free.
Claude Code — Anthropic's agentic coding tool used to build the founder's entire product, run the trademark search, and work an overnight issue backlog.
Claude Code best practices — Anthropic's guidance recommending a CLAUDE.md stay near a 150–200 instruction budget; the source that sparked the meta-work debate.
claude-md-improver skill — Anthropic's official skill that audits a session and folds new gotchas into your CLAUDE.md; run at session end to keep known bugs from recurring.

John M. P. Knox