The AI Harness Beats the Model: What Operators Build

A small engine wrapped in a vast tangle of cables and reins, evoking the harness operators build around AI models.

One CEO spends two hours every Friday letting Claude review the week's conversations and propose new skills to add to his automation library. Another built a custom app that lets him share his screen on a Zoom call, rant at an AI in real time — "move that bar three pixels left, fix the color, expand the FAQ" — and capture his punch list complete with screenshots. A founder running a one-person shop has effectively replaced a full-time contractor with the scaffolding he's built around Claude Code.

The frontier models barely came up in this week's roundtable; I asked about Fable, and there wasn't much interest. What kept surfacing instead was the harness operators are building around AI — the audit loops, the rating systems, the voice interfaces, the patterns for running agents in parallel and having them check each other's work. The model is commoditized. The harness isn't.

The takeaway: Stop comparing models and start building the harness. The operators pulling ahead run audit loops — a standing Friday session where the AI reviews the week's work and proposes new skills — plus overnight client health scoring and parallel agents that check each other's output. That scaffolding compounds while the models keep leapfrogging each other.

What follows are notes from this week's Executive AI Roundtable discussion, shared under the Chatham House Rule.

Stop Comparing Models. Start Building the Harness.

As one early-adopter CEO has put it in multiple meetings, "the harness and the processes are actually the sticking points for me now. Getting it the right context, the right time, and letting it actually grind through a task versus its raw intelligence."

They went further: "the statement that today is the worst it's going to ever be is largely true." If models are roughly equivalent today and only going to get better, the differentiation lives entirely in what you build around them — not in which lab you pay.

the model is the commodity · the harness around it is the moat

That's a different game than the one most leaders are still playing. The vendor-selection conversations, the "should we move from ChatGPT to Claude" debates, the procurement reviews — those discussions are less important in 2026. The current question is: what does the workflow around the model look like? The retrieval. The chaining. The parallelization. The quality review passes. The handoff to a human. The audit loop that gets the system better over time.

The operators in this room have stopped treating AI as a chat box and started treating it as infrastructure. They've also stopped expecting any one model release to change their lives. The next leap won't come from a model upgrade. It'll come from the orchestration layer.

The next session went deeper on the async side of that orchestration layer — see Run AI on the Graveyard Shift for the GitHub-Issues-as-source-of-truth pattern, the brief that compresses overnight work into five decisions, and the author-vs-reviewer split between models.

The Friday Skill Audit

A CEO running client work out of Basecamp has Claude on a calendar block every Friday. The session reviews everything he did the previous week — every conversation, every chat, every change — and hands back a prioritized list of new skills to add to his library or edits to the ones that already exist. "It's doing a weekly audit of all of our conversations, and suggesting new skills to create, or edits to existing ones to, like, make them run better."

It's a meta-loop: the same AI that handles the day-to-day work also reviews how it happened and proposes what to formalize. A new repeated task becomes a skill. An old skill that's drifting gets a refinement. Friction shows up as a candidate for automation. He spends about two hours on Fridays running through the queue.
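The meta-loop above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the CEO's actual setup: it assumes the week's AI conversations have already been reduced to labeled tasks (the hypothetical `LoggedTask` record), and it encodes the "showed up more than twice" rule as `threshold=3`. In practice Claude itself would do the labeling and write the proposals; the function names here are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LoggedTask:
    """One AI-assisted task from the past week (hypothetical log format)."""
    task_type: str      # short label, e.g. "draft client update"; labeling assumed done upstream
    needed_fixup: bool  # True if the operator had to correct the output by hand


def weekly_audit(log, skill_library, threshold=3):
    """Return (new_skill_proposals, skill_edit_proposals), most frequent first.

    threshold=3 encodes "showed up more than twice this week".
    """
    counts = Counter(t.task_type for t in log)
    fixups = Counter(t.task_type for t in log if t.needed_fixup)

    # Repeated work that has no skill yet: candidates to formalize.
    new_skills = [task for task, n in counts.most_common()
                  if n >= threshold and task not in skill_library]
    # Existing skills whose output still needed hand-correction: candidates to refine.
    edits = [task for task, _ in fixups.most_common() if task in skill_library]
    return new_skills, edits


if __name__ == "__main__":
    week = ([LoggedTask("draft client update", False)] * 4
            + [LoggedTask("summarize call notes", True)] * 2
            + [LoggedTask("write SOW", False)] * 3
            + [LoggedTask("one-off research", False)])
    new, edits = weekly_audit(week, skill_library={"summarize call notes"})
    print(new)    # ['draft client update', 'write SOW']
    print(edits)  # ['summarize call notes']
```

The one-off research task never crosses the threshold, which is the point: the audit only formalizes work that keeps coming back.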

He's clear about why the scheduled cadence matters: "I try to spend maybe about two hours on Friday just working on AI tooling and kind of forcing myself to do it, because otherwise I'm not gonna do it." Without the calendar block, the audit doesn't happen. The natural pull of urgent client work always wins. The block turns leverage from optional into recurring.

He also called out the cost of not having this discipline: "I think your instinct is, like, I'm just going to watch Twitter, and there's all this cool stuff, but, like, 95% of it's hype and people sort of lying." The audit replaces hype and doom-scrolling with a proactive review loop that mines last week's day-to-day work for new use cases and refinements.

Most operators bolt on AI tooling once and never revisit it. This one revisits it every week, with help from the AI itself, and the system gets sharper with every pass. That's the difference between AI as a tool and AI as a colleague that gets better with seniority.

Client Health Scoring on Autopilot

An agency CEO uses AI to monitor each of their engagements. Each night, Claude walks through their client workspaces in Basecamp and produces a letter grade for every client: how often they've interacted, what the sentiment of their messages looks like, and whether the rhythm feels healthy or off. Anything sliding gets flagged: "For that client, how many times have they interacted? Are they happy? What's the sentiment on what they're posting? And give me a letter grade. You better spend more attention on these guys, type of thing."

It's the kind of system that used to require a dedicated customer success role a small agency couldn't justify — too small to hire, too important to ignore. Now it runs overnight. The morning queue is already prioritized.

The power is in what it doesn't ask the operator to do. He doesn't have to remember to check on Client X. The system surfaces Client X because the sentiment in their last three messages slipped. Letter grades are crude on purpose — they don't pretend to be diagnostic, they only force a triage decision. Crude is enough to act on.
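Here is a minimal sketch of the grading step, assuming per-message sentiment scores in [-1, 1] have already been produced upstream (in the described setup, by Claude reading Basecamp). The 50/50 weighting, the expected-interaction baseline, and the letter cutoffs are all hypothetical choices; the "sliding" check mirrors the idea of sentiment slipping across the last three messages.

```python
from dataclasses import dataclass


@dataclass
class ClientActivity:
    name: str
    interactions: int        # messages/comments in the lookback window
    sentiments: list[float]  # per-message sentiment in [-1, 1], oldest first (assumed scored upstream)


def letter_grade(c: ClientActivity, expected_interactions: int = 10) -> str:
    """Crude A-F grade from activity and average sentiment (weights and cutoffs are assumptions)."""
    activity = min(c.interactions / expected_interactions, 1.0)
    avg_sent = sum(c.sentiments) / len(c.sentiments) if c.sentiments else 0.0
    mood = (avg_sent + 1) / 2  # rescale [-1, 1] to [0, 1]
    score = 0.5 * activity + 0.5 * mood
    for cutoff, grade in ((0.85, "A"), (0.7, "B"), (0.55, "C"), (0.4, "D")):
        if score >= cutoff:
            return grade
    return "F"


def sliding(c: ClientActivity) -> bool:
    """True if sentiment dropped across each of the last three messages."""
    last = c.sentiments[-3:]
    return len(last) == 3 and last[0] > last[1] > last[2]


def morning_queue(clients):
    """Worst grades first (F before A); each row is (grade, name, sliding_flag)."""
    rows = [(letter_grade(c), c.name, sliding(c)) for c in clients]
    return sorted(rows, key=lambda r: r[0], reverse=True)
```

Deliberately coarse scoring fits the point above: the grade only has to be good enough to decide who gets attention first.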

This is work that used to be uneconomical and is now nearly free. A junior account manager spending an hour every morning reading every client's recent traffic and assigning a gut grade would be a luxury. Claude does the same work overnight for the cost of a few cents in tokens. The operator wakes up to a list, not a research task.

Talk to the Screen, Get a Punch List

Another founder hopped on the call and shared his screen. He'd been refining a Claude Code application that lets him share a product screen on Zoom and just talk to an AI agent as if it were a designer or product manager on the other end of the line. The agent captures intent in real time, structures it into a punch list with screenshots, and feeds it back to Claude Code to implement.

Watching the demo, I jokingly narrated what the user experience felt like: "You just rant at your AI and tell it the fix. I said I wanted a three-pixel border. That is four pixels! Fix it immediately."

Two design choices make it work. First, it uses OpenAI's real-time speech-to-speech API rather than after-the-fact transcription, so the AI responds conversationally while the user is still on the call — there's no awkward typing or pauses. Second, the agent doesn't try to be a developer. Its job during the call is to listen, understand, and confirm. The implementation happens after the call ends, off the existing Claude Code agent loop that already handles "give me a list, make the changes."
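The handoff between the two halves can be pictured as a small data structure. This is a hypothetical sketch, not the founder's code: the real-time voice capture (the OpenAI Realtime API side) is left out entirely, and `PunchItem`, `PunchList`, the screenshot paths, and the checklist prompt format are all illustrative assumptions about what gets passed to the coding agent after the call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PunchItem:
    request: str                      # what the user asked for, as confirmed on the call
    screenshot: Optional[str] = None  # path to the frame captured at that moment (hypothetical)
    done: bool = False


@dataclass
class PunchList:
    items: list[PunchItem] = field(default_factory=list)

    def add(self, request: str, screenshot: Optional[str] = None) -> None:
        self.items.append(PunchItem(request, screenshot))

    def to_prompt(self) -> str:
        """Render open items as a Markdown checklist to hand to a coding agent after the call."""
        lines = ["Implement the following changes:"]
        for i, item in enumerate(self.items, start=1):
            if item.done:
                continue  # skip anything already resolved during the call
            line = f"- [ ] {i}. {item.request}"
            if item.screenshot:
                line += f" (see {item.screenshot})"
            lines.append(line)
        return "\n".join(lines)


if __name__ == "__main__":
    pl = PunchList()
    pl.add("Move the nav bar three pixels left", "shots/nav.png")
    pl.add("Fix the button color")
    pl.add("Expand the FAQ", "shots/faq.png")
    print(pl.to_prompt())
```

Keeping the voice agent's output as plain structured data is what lets it stay a listener: it never touches code, and the existing "give me a list, make the changes" loop consumes the checklist unchanged.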

He framed it as a deliberate bet on where things are going: "In the next six to twelve months, Replit, Claude Code, like, these things will have an interface where you can zoom and screen share." The product itself is half-built — there's already an iPhone version — but the bet underneath it is that the interface to a coding agent will be a conversation on a live screen, not a chat window with pasted-in references.

The current product-feedback loop — meeting, notes, ticket, sprint, build, review, ship — takes days at best. A real-time voice agent at the front end of that loop, capturing intent and routing the work to the agent that builds it, can cut days to hours.

The practical takeaway for a CEO reading this: the next round of internal tools your team builds should treat voice as the primary input and the screen as the primary context. The keyboard is a bottleneck. For a low-cost way in, I run an on-device dictation setup — an open speech model plus a foot pedal so voice capture never leaves the laptop.

Three Thousand Emails in Ten Minutes

A CEO described a workflow that a friend at a larger SaaS company put together, using Retool and Claude. The team needed to reach out to roughly 3,000 contacts across different industries, with different prior engagement, and requiring different pitches. The friend built a small Retool app that pulled the contact list, generated about a dozen email templates modeled on examples of copy that had landed before, and walked through the list assigning the right template, personalizing per recipient, and producing a draft for each. "Retool, and it goes through and like drafts emails to like 3,000 contacts, and I ended up creating like 12 templates."
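The routing step in that workflow (pick the right template for each contact, then fill it) might look something like this minimal sketch. The template keys, the contact fields, and the warm/cold rule are hypothetical; the original app reportedly had about twelve templates and used Claude for the per-recipient personalization pass, which is omitted here.

```python
from string import Template

# Hypothetical templates keyed by (industry, prior engagement).
TEMPLATES = {
    ("saas", "warm"): Template("Hi $first, following up on our chat about $topic..."),
    ("saas", "cold"): Template("Hi $first, teams like $company often struggle with $topic..."),
    ("retail", "cold"): Template("Hi $first, a quick idea for $company's $topic..."),
}
FALLBACK = Template("Hi $first, I'd love to share how we help with $topic.")


def draft_for(contact: dict) -> str:
    """Pick a template by industry and engagement, then fill in the contact's details.

    In the described workflow a model would then personalize this draft further;
    that call is deliberately left out of the sketch.
    """
    engagement = "warm" if contact["prior_meetings"] > 0 else "cold"
    template = TEMPLATES.get((contact["industry"], engagement), FALLBACK)
    return template.substitute(
        first=contact["first"], company=contact["company"], topic=contact["topic"]
    )


def draft_all(contacts: list[dict]) -> list[str]:
    """One draft per contact, in list order."""
    return [draft_for(c) for c in contacts]
```

The design choice worth copying is the split: a cheap, deterministic routing pass decides which pitch a contact gets, and the model spends its effort only on the personalization that actually needs judgment.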

The whole run took about ten minutes for work that would have eaten hours even with prior-generation AI assist. Not because the model is dramatically better — the harness is. A purpose-built front end on top of Claude makes the workflow viable. Without it, an operator would still be copy-pasting into the chat box one prospect at a time.

The general principle holds at every scale, including a one-person outreach push. AI's biggest gain in sales and marketing is lowering the activation energy of personalized outreach, not perfecting the copy. The marginal cost of dropping a draft into Claude before sending is tiny. The marginal lift on quality is real. The team that adopts the discipline ends up sending more outreach because the friction is lower, and the outreach lands better because each version gets one more pass before it goes out.

Parallel Agents and the Guardian Pattern

A few smaller patterns came up in passing that are worth lifting out of the conversation. The common thread: operators are starting to treat Claude Code less as a chatbot and more as a small fleet of agents they orchestrate.

The first is parallelization. Claude Code can hand independent pieces of a task to sub-agents (managed through the /agents command) that work on them simultaneously. A punch list of five items — change a button color, move a nav bar, expand a FAQ, update a footer, fix a typo — runs in the wall-clock time of the slowest one instead of the sum of all five. A roomful of agents costs almost nothing in tokens compared to the engineer-minutes it saves.
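The "slowest one, not the sum" arithmetic is easy to see with a toy model. This sketch does not use Claude Code at all: `fake_agent` is a hypothetical stand-in that sleeps to simulate work, and `asyncio.gather` runs the five punch-list items concurrently. With the assumed durations, sequential work would take 0.37 s while the concurrent run takes roughly the 0.15 s of the longest item.

```python
import asyncio
import time


async def fake_agent(task: str, seconds: float) -> str:
    """Stand-in for a sub-agent; sleeping simulates the time the work takes."""
    await asyncio.sleep(seconds)
    return f"done: {task}"


async def run_punch_list(items):
    # gather() runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_agent(t, s) for t, s in items))


# Hypothetical durations in seconds for the five punch-list items.
PUNCH_LIST = [
    ("change button color", 0.10),
    ("move nav bar", 0.05),
    ("expand FAQ", 0.15),
    ("update footer", 0.05),
    ("fix typo", 0.02),
]

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_punch_list(PUNCH_LIST))
    elapsed = time.perf_counter() - start
    sequential = sum(s for _, s in PUNCH_LIST)
    print(results)
    print(f"parallel: {elapsed:.2f}s, sequential would be {sequential:.2f}s")
```

The caveat that carries over to real agents: this only works when the items are genuinely independent. Two agents editing the same file are not a parallel task.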

The second is the guardian pattern (Anthropic calls this the evaluator-optimizer pattern): spawn a second agent with a cleared context to review the first agent's work. The reviewer doesn't know how the work was done; it only knows what the work was supposed to accomplish. That separation catches the failures a single-pass agent confidently misses. It's the same principle as a good engineering code review, where the author is the worst person to catch their own mistakes, only at a tiny fraction of the marginal cost.
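A minimal sketch of the author/reviewer loop, assuming both roles would be separate model calls in practice. Here they are deterministic stubs so the control flow is visible: `author` makes the classic off-by-one-pixel mistake on its first pass, and `reviewer` receives only the spec and the finished output, never the author's reasoning. `SPEC`, the function names, and the round limit are hypothetical.

```python
from typing import Optional

SPEC = {"border_px": 3, "button_color": "#0055ff"}  # what the work was supposed to accomplish


def author(spec: dict, feedback: Optional[list[str]]) -> dict:
    """Hypothetical generator: first pass is off by a pixel; it fixes whatever gets flagged."""
    draft = {"border_px": 4, "button_color": spec["button_color"]}
    for issue in feedback or []:
        key = issue.split(":")[0]  # each issue looks like "field: expected X, got Y"
        draft[key] = spec[key]
    return draft


def reviewer(spec: dict, output: dict) -> list[str]:
    """Fresh-context check: sees only the spec and the output, not how it was produced."""
    return [f"{k}: expected {v}, got {output.get(k)}"
            for k, v in spec.items() if output.get(k) != v]


def guarded_run(spec: dict, max_rounds: int = 3):
    """Alternate author and reviewer until the review comes back clean."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        output = author(spec, feedback)
        feedback = reviewer(spec, output)
        if not feedback:
            return output, round_no
    raise RuntimeError(f"still failing review after {max_rounds} rounds: {feedback}")
```

Capping the rounds matters: if the reviewer and author keep disagreeing, the loop escalates to a human instead of burning tokens forever.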

The third is /clear. When you're switching from one project to a fundamentally different one — from features to SEO, from code to writing — clearing the context window before the switch keeps old vocabulary, constraints, and framing from bleeding into the new task. Claude doesn't get confused; it forgets the things it should forget.

The fourth is auto mode. Letting Claude Code run actions within pre-approved scopes without prompting at each step turns out to be the single biggest unlock for sustained agent workflows. The flow-state cost of permission prompts is enormous. Auto mode, with sensible scopes, removes it. The one thing better than AI is delegating AI — and delegating AI only works if you stop interrupting it.
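The idea of "pre-approved scopes" can be illustrated with a tiny policy check. To be clear, this is not how Claude Code implements its permissions; it is a hypothetical sketch of the general shape: deny rules win, anything matching an allow rule proceeds without a prompt, and everything else falls back to asking the human. The `kind:detail` action strings and the patterns themselves are assumptions.

```python
from fnmatch import fnmatch

# Hypothetical scope rules using shell-style wildcards ("*" also matches "/").
ALLOW = ["edit:src/*", "run:npm test*", "run:git status", "read:*"]
DENY = ["read:.env*", "run:git push*"]


def decide(action: str) -> str:
    """Return 'deny', 'allow', or 'ask' for an action string like 'run:npm test'.

    Deny rules are checked first so a broad allow (e.g. 'read:*') can't
    override a specific prohibition (e.g. reading secrets).
    """
    if any(fnmatch(action, pattern) for pattern in DENY):
        return "deny"
    if any(fnmatch(action, pattern) for pattern in ALLOW):
        return "allow"
    return "ask"  # outside every pre-approved scope: interrupt the human
```

The sensible-scopes part is the whole game: routine edits and test runs flow through uninterrupted, while irreversible actions like pushing stay behind a human.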

None of these is radical. Each one shaves minutes or hours per week. Stacked, they're the difference between an operator who can run a one-person company and one who can't.

The Reason Big Companies Aren't Getting These Gains

Founders in the room described themselves as a couple of times more effective than a year ago. They've replaced contractors with their own AI scaffolding, run broader data stacks with thinner teams, and automated weekly audit loops atop their own skill libraries. The leverage at the individual-founder level is real and measurable.

But: "as soon as you extrapolate this to a large organization, it's like things are molasses. This is my experience. It just moves so much slower."

That observation is structural, not anecdotal. The same pattern dominates the AI economic data right now. Companies are spending more on AI than ever — Uber burned through its entire 2026 AI budget in four months and has now capped employee spending on Claude Code. Meanwhile, productivity gains at the org level are difficult to find. Most of the companies I've talked to have accelerated their work, but efficiency — the kind that drops to the bottom line — is a rare outcome of most AI deployments.

Why? Because AI is a skill, and most organizations aren't training for it. A salesperson at a mid-size B2B company doesn't have a playbook for using AI to research a prospect, draft an outreach sequence, or run a post-call coaching review. There isn't one. Best practices outside the technology bubble effectively don't exist, and the ones that do are out of date within a quarter.

The cultural mismatch compounds the skill gap. AI gives the biggest gains to people who think like founders — scrappy, curious, willing to build, break, and iterate. Most employees are not founders, and shouldn't be expected to act like them. They were hired to be reliable cogs in a larger machine. Dropping a frontier AI tool into that environment without retraining the cogs doesn't change the machine; it just lets the cogs do their old jobs slightly differently.

A founder in the room put it plainly: AI "hasn't made our sales go twice as fast. The returns are just fundamentally different. We are using it, and I think it is beneficial, and it is helping at the margins, but it's not a game changer like it has been in software and the actual product building."

The gap between individual leverage and organizational leverage is the real story of 2026's AI economy. It's also the place where boards and CEOs are getting the news late. The dashboard says spend is up. The cultural reality is that most of the team doesn't know how to convert that spend into output. More on this in the companies-stalled-at-stage-one essay and the AI people problem.

What This Is Costing Our Brains

The optimism in the room came with a real concession.

I've noticed my attention span getting worse if I don't fight it. Long pull requests feel almost impossible to read. One operator at the table reported the same thing — feeling lighter, like he can't focus. Another said reading was already a struggle for him and that AI's summarization has been freeing, but the broader pattern is unmistakable. The tools that make us faster are also making us less patient with anything they can't accelerate.

The fix is a deliberate choice about which loops to keep human. Reading the actual code that ships your product. Reading the actual contract. Sitting with the actual customer feedback. AI summaries are a productivity multiplier on the easy two-thirds. On the hard one-third — the part where judgment lives — they're a tax disguised as a savings, because they sand off the friction that was doing useful work in your head.

The harness, again, decides whether AI compounds for you or works against you. Loops you let AI flatten: the routine ones. Loops you protect: the judgment ones. AI is the most useful tool in a generation for closing hidden gaps, and it has real failure modes. Treat the cognitive cost the way a serious athlete treats the cost of caffeine: real, manageable, worth being honest about.

Where to Start

If you're a CEO reading this and wondering what to copy first, the operator-level moves in this essay collapse into a short list. Pick one this week:

Schedule a weekly AI audit. Two hours, calendar-locked, where you (or your most AI-fluent operator) review the week's AI use and formalize whatever showed up more than twice into a repeatable skill. The audit beats the doom-scroll.
Score what matters daily. Pick the thing whose health you most need to know — client engagement, pipeline freshness, support sentiment, code-review backlog — and have AI grade it overnight on a crude letter scale. Wake up to triage, not research.
Run a parallel-agent experiment. Pick a multi-item task (a punch list, a content batch, a refactor across files) and have agents work it in parallel. Learn what splits cleanly and what doesn't. This is a cheap dry run for the orchestration layer you'll eventually need.
Treat AI as a skill in your training plan. Add it to the same place where you train sales methodology, product ergonomics, or technical onboarding. If a salesperson is expected to use AI on every call, they need a playbook for using AI on every call. (How to Deploy AI in Your Company walks through a 12-week deployment playbook built around exactly this kind of training.)
Reserve a human-only loop. Pick the loop that's most load-bearing for your business — code review, customer conversations, contract review — and refuse to let AI summaries replace it. Protect the place where judgment lives.

You're not buying a model. You're building the harness around the model — the audit cadence, the scoring system, the orchestration pattern, the skill base of your team, the loops you decided to keep human. The model is commoditized. The harness is the moat.

Resources From the Roundtable

Claude Code — Anthropic's coding agent, central to most of the workflows in this essay. The /agents and /clear commands and the auto mode referenced above all live here.
Basecamp — Project management and client communication. Connects to Claude Code via MCP for the weekly audit and the nightly client-health scoring.
Retool — Low-code platform used to wrap Claude into a bulk email-drafting workflow.
PostHog — Product analytics. One operator now talks to PostHog through Claude Code, generating insights and setting alerts conversationally.
Wispr Flow — Voice transcription, paired with a single Stream Deck Mini button so dictation activates without a keyboard chord.
HubSpot — CRM, used as the baseline for AI-driven prospecting and gap analysis.
OpenAI Realtime API — Speech-to-speech API behind the voice-Zoom feedback agent.

If you want to go deeper on the patterns underneath these systems, the AI skills that replace workflows essay walks through what makes a workflow worth automating, and Close the AI Feedback Loop covers the audit-and-improve cadence that makes the Friday review work. For where the harness sits in the larger progression, The Four Levels of AI Adoption maps the climb from chat to fully triggered autonomy, and AI Knowledge Capture covers what happens to the harness when its builder resigns — and the policy that keeps it.

John M. P. Knox
