Moving Average Inc.

The AI Harness Beats the Model: What Operators Build

Notes from this week's Executive AI Roundtable

One CEO spends two hours every Friday letting Claude review the week's conversations and propose new skills to add to his automation library. Another built a custom app that lets him share his screen on a Zoom call, rant at an AI in real time — "move that bar three pixels left, fix the color, expand the FAQ" — and have the punch list captured, ticketed, and shipped before the next standup. A founder running a one-person shop has effectively replaced a full-time contractor with the scaffolding he's built around Claude Code.

The frontier models barely came up in this week's roundtable. What kept surfacing instead was the harness operators are building around AI — the audit loops, the rating systems, the voice interfaces, the patterns for running agents in parallel and having them check each other's work. The model is commoditized. The harness isn't.

What follows are notes from this week's Executive AI Roundtable discussion, shared under the Chatham House Rule.

Stop Comparing Models. Start Building the Harness.

Frontier models — Claude 4.x Opus, Sonnet, the new ChatGPT releases — each got a sentence at most in the opening of the conversation. A CEO put the new shape of the problem directly: "the harness and the processes that are actually the sticking points for me now. Getting it the right context, the right time, and letting it actually grind through a task versus its raw intelligences."

He went further: "the statement that today is the worst it's going to ever be is largely true." If models are roughly equivalent today and only going to get better, the differentiation lives entirely in what you build around them — not in which lab you pay.

That's a different game than the one most leaders are still playing. The vendor-selection conversations, the "should we move from ChatGPT to Claude" debates, the procurement reviews: those answer last year's question. The current question is: what does the workflow around the model look like? The retrieval. The chaining. The parallelization. The review pass. The handoff to a human. The audit loop that makes the system better over time.

The operators in this room have stopped treating AI as a chat box and started treating it as infrastructure. They've also stopped expecting any one model release to change their lives. The next leap won't come from a model upgrade. It'll come from the orchestration layer.

The Friday Skill Audit

A CEO running client work out of Basecamp has Claude on a calendar block every Friday. The session reviews everything he did the previous week — every conversation, every chat, every change — and hands back a prioritized list of new skills to add to his library or edits to the ones that already exist. "It's doing a weekly audit of all of our conversations, and suggesting new skills to create, or edits to existing ones to, like, make them run better."

It's a meta-loop: the same AI that handles the day-to-day work also reviews how it happened and proposes what to formalize. A new repeated task becomes a skill. An old skill that's drifting gets a refinement. Friction shows up as a candidate for automation. He spends about two hours on Fridays running through the queue.
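The formalization step can be sketched in a few lines of Python. This is a rough illustration, not the CEO's actual setup: it assumes the week's work has already been boiled down to a list of task labels, and the `skill_candidates` helper, the label names, and the "three or more repeats" threshold are all hypothetical.

```python
from collections import Counter


def skill_candidates(task_labels, existing_skills, min_repeats=3):
    """Split one week's task labels into two review queues.

    new:    repeated tasks with no skill yet, most frequent first
    refine: existing skills that were exercised this week and may have drifted
    """
    counts = Counter(task_labels)
    new = [
        task
        for task, n in counts.most_common()
        if n >= min_repeats and task not in existing_skills
    ]
    refine = sorted(task for task in counts if task in existing_skills)
    return new, refine


# Hypothetical labels for one week of conversations.
week = [
    "draft status update", "draft status update", "summarize thread",
    "draft status update", "invoice reminder", "summarize thread",
    "summarize thread", "summarize thread",
]
new, refine = skill_candidates(week, existing_skills={"invoice reminder"})
print(new)     # candidates for brand-new skills
print(refine)  # existing skills worth a second look
```

In practice the labeling itself would be the model's job; the point of the sketch is only that the Friday queue is a simple, repeatable computation once the week's work is summarized.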

He's clear about why the scheduled cadence matters: "I try to spend maybe about two hours on Friday just working on AI tooling and kind of forcing myself to do it, because otherwise I'm not gonna do it." Without the calendar block, the audit doesn't happen. The natural pull of urgent client work always wins. The block turns leverage from optional into recurring.

He also called out the cost of not having this discipline: "I think your instinct is, like, I'm just going to watch Twitter, and there's all this cool stuff, but, like, 95% of it's hype and people sort of lying." The audit replaces a doom-scroll for tactics with a deterministic loop that surfaces tactics already proven in his own work.

Most operators bolt on AI tooling once and never revisit it. This one revisits it every week, with help from the AI itself, and the system gets sharper with every pass. That's the difference between AI as a tool and AI as a colleague that gets better with seniority.

Client Health Scoring on Autopilot

An agency CEO uses AI to monitor each of his engagements. Each night, Claude walks his client workspaces in Basecamp and produces a letter grade for every client: how often they've interacted, what the sentiment of their messages looks like, whether the rhythm feels healthy or off. Anything sliding gets flagged: "For that client, how many times have they interacted? Like, are they happy? Like, what's the sentiment on what they're posting? And kind of give me a letter grade. You better spend more attention on these guys, type of thing."

It's the kind of monitoring that used to require a dedicated customer-success role. Below a hundred clients, the business was too small to justify the hire and the work was too important to ignore. Now it runs overnight. The morning queue is already prioritized.

The power is in what it doesn't ask the operator to do. He doesn't have to remember to check on Client X. The system surfaces Client X because the sentiment in their last three messages slipped. Letter grades are crude on purpose — they don't pretend to be diagnostic, they only force a triage decision. Crude is enough to act on.
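To make "crude on purpose" concrete, here is a minimal Python sketch of a triage grader. It assumes sentiment has already been scored per message on a scale from -1.0 to 1.0 (in the real system that would be the model's job). The `ClientActivity` fields, the thresholds, and the client names are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClientActivity:
    name: str
    interactions_this_week: int
    recent_sentiment: list[float]  # -1.0 (unhappy) to 1.0 (happy), newest last


def letter_grade(activity: ClientActivity) -> str:
    """Crude triage grade. Every threshold here is a placeholder."""
    if not activity.recent_sentiment:
        return "F"  # total silence is itself a flag
    mood = mean(activity.recent_sentiment[-3:])  # last three messages only
    n = activity.interactions_this_week
    score = (2 if n >= 5 else 1 if n >= 2 else 0)
    score += (2 if mood >= 0.3 else 1 if mood >= 0.0 else 0)
    return "FDCBA"[score]  # score 0 maps to F, score 4 maps to A


def morning_queue(clients: list[ClientActivity]) -> list[tuple[str, str]]:
    """Worst grades first, so the top of the list is where attention goes."""
    graded = [(c.name, letter_grade(c)) for c in clients]
    return sorted(graded, key=lambda pair: pair[1], reverse=True)


clients = [
    ClientActivity("Acme", 6, [0.6, 0.5, 0.7]),
    ClientActivity("Globex", 1, [0.2, -0.4, -0.6]),
    ClientActivity("Initech", 3, [0.1, 0.0, 0.2]),
]
queue = morning_queue(clients)
print(queue)
```

The grader deliberately throws away detail: two coarse signals collapse into five letters, which is enough to decide whom to call first.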

This is work that used to be uneconomical and is now nearly free. A junior account manager spending an hour every morning reading every client's recent traffic and assigning a gut grade would be a luxury. Claude does the same work overnight for a few cents in tokens. The operator wakes up to a list, not a research task.

Talk to the Screen, Get a Punch List

Another founder hopped on the call and shared his screen. He'd built a Claude Code application that lets him share a product screen on Zoom and just talk to an AI agent as if it were a designer or product manager on the other end of the line. The agent captures intent in real time, structures it into a punch list with screenshots, and feeds it back to Claude Code to implement. Watching the demo, I narrated what the user experience felt like: "You just rant at your AI and tell the fix. I said I wanted three pixels border. That is four pixels. Fix it immediately."

Two design choices make it work. First, it uses OpenAI's speech-to-speech real-time API rather than after-the-fact transcription, so the AI talks back conversationally while the user is still on the call — there's no awkward typing or pause. Second, the agent doesn't try to be a developer in the moment. Its job during the call is to listen, understand, and confirm. The implementation happens after the call ends, off the existing Claude Code agent loop that already handles "give me a list, make the changes."

He framed it as a deliberate bet on where things are going: "in the next six to twelve months Replit, Claude Code, like, these things will have an interface where you can zoom and screen share." The product itself is half-built — there's already an iPhone version. What it implies matters more: the interface to a coding agent is going to be a conversation about a live screen, not a chat window with paste-in references.

The current product-feedback loop — meeting, notes, ticket, sprint, build, review, ship — takes days at best. A real-time voice agent on the front end of that loop, capturing intent and routing the work to the agent that builds it, cuts days to hours.

The practical takeaway for a CEO reading this: the next round of internal tools your team builds should treat voice as the primary input and the screen as the primary context. The keyboard is a bottleneck.

Three Thousand Emails in Ten Minutes

A CEO described a workflow a friend at a larger SaaS company put together, using Retool and Claude. The team needed to reach out to roughly three thousand contacts — different industries, different prior engagement, different pitches required. The friend built a small Retool app that pulled the contact list, generated about a dozen email templates trained on examples of copy that had landed before, and walked through the list assigning the right template, personalizing per recipient, and producing a draft for each. "Retool, and it goes through and like drafts emails to like 3,000 contacts and I ended up creating like 12 templates."

The whole run took about ten minutes for work that would have eaten hours even with prior-generation AI assist. Not because the model is dramatically better — the harness is. A purpose-built front end on top of Claude makes the workflow viable. Without it, an operator would still be copy-pasting into the chat box one prospect at a time.
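The routing half of that harness is simple enough to sketch. The Python below is an assumption-heavy stand-in, not the Retool app itself: it picks a template by industry and prior engagement and fills in the recipient's details with plain string substitution, where the real workflow would hand the personalization to Claude. The template text, field names, and contacts are all hypothetical.

```python
from string import Template

# Hypothetical templates keyed by (industry, prior engagement).
TEMPLATES = {
    ("healthcare", "past_customer"): Template(
        "Hi $first_name, it has been a while since $company last worked with us."
    ),
    ("retail", "cold"): Template(
        "Hi $first_name, teams like $company often ask us about holiday staffing."
    ),
}
FALLBACK = Template("Hi $first_name, I wanted to introduce myself to $company.")


def draft_email(contact: dict) -> str:
    """Pick the matching template, or the fallback, and fill in the recipient."""
    key = (contact["industry"], contact["engagement"])
    template = TEMPLATES.get(key, FALLBACK)
    return template.substitute(
        first_name=contact["first_name"], company=contact["company"]
    )


contacts = [
    {"first_name": "Dana", "company": "Northwind",
     "industry": "retail", "engagement": "cold"},
    {"first_name": "Lee", "company": "Contoso",
     "industry": "logistics", "engagement": "cold"},
]
drafts = [draft_email(c) for c in contacts]
for draft in drafts:
    print(draft)
```

Every draft still lands in front of a human before it goes out; the harness only removes the copy-paste step.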

The general principle holds at every scale, including a one-person outreach push. AI's biggest gain in sales and marketing is lowering the activation energy of personalized outreach, not perfecting the copy. The marginal cost of dropping a draft into Claude before sending is tiny. The marginal lift on quality is real. The team that adopts the discipline ends up sending more outreach, because the friction is lower — and the outreach lands better, because each version got one more pass before it went out.

Parallel Agents and the Guardian Pattern

A few smaller patterns came up in passing that are worth lifting out of the conversation. The common thread: operators are starting to treat Claude Code less like a chat box and more like a small fleet of agents they orchestrate.

The first is parallelization. Claude Code's sub-agents, which are configured through the /agents command, work on independent pieces of a task simultaneously. A punch list of five items (change a button color, move a nav bar, expand a FAQ, update a footer, fix a typo) runs in roughly the wall-clock time of the slowest one instead of the sum of all five. A roomful of agents costs almost nothing in tokens compared to the engineer-minutes it saves.
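The wall-clock arithmetic is easy to see in a toy Python example. Nothing here talks to Claude Code: `run_item` is a hypothetical stand-in that just sleeps for the time a sub-agent might take, and the durations are made up.

```python
import asyncio
import time


async def run_item(name: str, seconds: float) -> str:
    """Stand-in for one sub-agent; the sleep simulates its working time."""
    await asyncio.sleep(seconds)
    return f"done: {name}"


async def run_punch_list(items: dict[str, float]) -> tuple[list[str], float]:
    """Run every item concurrently and report the total elapsed time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(run_item(n, s) for n, s in items.items()))
    return list(results), time.perf_counter() - start


# Hypothetical punch list with simulated durations in seconds.
PUNCH_LIST = {
    "button color": 0.05,
    "nav bar": 0.04,
    "FAQ": 0.06,
    "footer": 0.03,
    "typo": 0.02,
}

if __name__ == "__main__":
    results, elapsed = asyncio.run(run_punch_list(PUNCH_LIST))
    print(results)
    print(f"elapsed {elapsed:.2f}s vs {sum(PUNCH_LIST.values()):.2f}s sequential")
```

The speedup only holds when the items really are independent. Two agents editing the same file would need to be serialized or merged, which is the part worth learning in a dry run.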

The second is the guardian pattern (Anthropic calls this the evaluator-optimizer pattern): spawn a second agent with a cleared context to review the first agent's work. The reviewer doesn't know how the work was done; it only knows what the work was supposed to accomplish. That separation catches the failures a single-pass agent confidently misses. It's the same principle as a good engineering code review, where the author is the worst person to catch their own mistakes, but at a tiny fraction of the marginal cost.
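A minimal Python sketch of that loop, under stated assumptions: the two agents are represented as plain callables, and the stub author and reviewer below are hypothetical placeholders for real model calls. The key detail is what the reviewer receives, which is the spec and the finished work and nothing else.

```python
from typing import Callable

Author = Callable[[str, str], str]                 # (spec, feedback) -> work
Reviewer = Callable[[str, str], tuple[bool, str]]  # (spec, work) -> (approved, feedback)


def guarded_run(
    spec: str, author: Author, reviewer: Reviewer, max_rounds: int = 3
) -> tuple[str, bool]:
    """Alternate author and reviewer until approval or the round limit."""
    feedback = ""
    work = ""
    for _ in range(max_rounds):
        work = author(spec, feedback)
        # The reviewer sees only the spec and the output, never the
        # author's reasoning or conversation history.
        approved, feedback = reviewer(spec, work)
        if approved:
            return work, True
    return work, False  # hand off to a human if it never passes


# Hypothetical stubs standing in for two separate agent sessions.
def stub_author(spec: str, feedback: str) -> str:
    return "border: 3px" if feedback else "border: 4px"


def stub_reviewer(spec: str, work: str) -> tuple[bool, str]:
    if "3px" in work:
        return True, ""
    return False, "spec asks for a 3px border"


final_work, passed = guarded_run("Set the border to 3px", stub_author, stub_reviewer)
print(final_work, passed)
```

The round limit matters as much as the review: a pair of agents that never converges should escalate to a person rather than loop.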

The third is /clear. When you're switching from one project to a fundamentally different one — features to SEO, code to writing — clearing the context window before the switch keeps old vocabulary, old constraints, and old framing from bleeding into the new task. Claude doesn't get confused; it forgets the things it should forget.

The fourth is auto mode. Letting Claude Code run actions inside pre-approved scopes without prompting per step turns out to be the single biggest unlock for sustained agent workflows. The flow-state cost of permission prompts is enormous. Auto mode, with sensible scopes, removes it. "The one thing better than AI is delegating AI" — and delegating AI works only if you stop interrupting it.
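The idea of a pre-approved scope can be illustrated with a small allowlist check in Python. To be clear, this is not how Claude Code implements its permissions; the `APPROVED_SCOPES` list, the action names, and the `needs_prompt` helper are hypothetical, and they exist only to show the shape of the decision.

```python
from fnmatch import fnmatchcase

# Hypothetical pre-approved scopes: (action, glob pattern for the target).
APPROVED_SCOPES = [
    ("read", "*"),          # reading anything is fine
    ("edit", "src/*"),      # edits are limited to the source tree
    ("run", "npm test*"),   # only the test command may run unprompted
]


def needs_prompt(action: str, target: str) -> bool:
    """Return True when an action falls outside every approved scope."""
    return not any(
        action == allowed and fnmatchcase(target, pattern)
        for allowed, pattern in APPROVED_SCOPES
    )


print(needs_prompt("edit", "src/components/Nav.tsx"))  # inside a scope
print(needs_prompt("edit", ".env"))                    # outside every scope
print(needs_prompt("run", "rm -rf build"))             # outside every scope
```

The interruptions that remain are the ones worth having: anything outside the list still stops and asks.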

None of these are radical. Each one shaves minutes or hours per week. Stacked, they're the difference between an operator who can run a one-person company and one who can't.

The Reason Big Companies Aren't Getting These Gains

Here's the part that should worry a board.

Founders in the room described themselves as a couple of times more effective than a year ago. They've replaced contractors with their own AI scaffolding, run wider data stacks with thinner teams, automated weekly audit loops on top of their own skill libraries. The leverage at the individual-founder level is real and measurable.

But: "as soon as you extrapolate this to a large organization, it's like things are molasses. This is my experience. It just moves so much slower."

That observation is structural, not anecdotal. The same pattern dominates the AI economic data right now. Companies are spending more on AI than ever: Uber burned through its entire 2026 AI budget in four months, and has now capped employee Claude Code spend. Meanwhile, productivity gains at the org level are difficult to find. Most of the companies I've talked to have accelerated their work, but efficiency, the kind that drops to the bottom line, is a rare outcome even of successful AI deployment.

Why? Because AI is a skill, and most organizations aren't training for it. A salesperson at a mid-size B2B company doesn't have a playbook for using AI to research a prospect, draft an outreach sequence, or run a post-call coaching review. There isn't one. Best practices outside the technology bubble effectively don't exist, and the ones that do are out of date within a quarter.

The cultural mismatch compounds the skill gap. AI gives the biggest gains to people who think like founders — scrappy, curious, willing to build and break and iterate. Most employees are not founders, and shouldn't be expected to act like them. They were hired to be reliable cogs in a larger machine. Dropping a frontier AI tool into that environment without retraining the cogs doesn't change the machine; it just lets the cogs do their old jobs slightly differently.

A founder in the room put it as bluntly as I've heard it: AI "hasn't made our sales go twice as fast. The returns are just fundamentally different. We are using it, and I think it is beneficial, and it is helping at the margins, but it's not a game changer like it has been in software and the actual product building."

The gap between individual leverage and organizational leverage is the real story of 2026's AI economy. It's also the place where boards and CEOs are getting the news late. The dashboard says spend is up. The cultural reality is that most of the team doesn't know how to convert that spend into output. More on this in the companies-stalled-at-stage-one essay and the AI people problem.

What This Is Costing Our Brains

The optimism in the room came with a real concession.

I've noticed my attention span getting worse. Long pull requests feel impossible to read without the AI summarizing them first. One operator at the table reported the same thing — feeling lighter, like he can't focus. Another said reading was already a struggle for him and that AI's summarization has been freeing, but the broader pattern is unmistakable. The tools that make us faster are also making us less patient with anything they can't accelerate.

The fix is deliberate choice about which loops to keep human. Reading the actual code that ships your product. Reading the actual contract. Sitting with the actual customer feedback. AI summaries are a productivity multiplier on the easy two-thirds. On the hard one-third — the part where judgment lives — they're a tax disguised as a savings, because they sand off the friction that was doing useful work in your head.

The harness, again, decides whether AI compounds for you or works against you. Loops you let AI flatten: the routine ones. Loops you protect: the judgment ones. AI is the most useful tool in a generation for closing hidden gaps, and it has real failure modes. Treat the cognitive cost the way a serious athlete treats the cost of caffeine: real, manageable, worth being honest about.

Where to Start

If you're a CEO reading this and wondering what to copy first, the operator-level moves in this essay collapse into a short list. Pick one this week:

  • Schedule a weekly AI audit. Two hours, calendar-locked, where you (or your most AI-fluent operator) review the week's AI use and formalize whatever showed up more than twice into a repeatable skill. The audit beats the doom-scroll.
  • Score what matters daily. Pick the thing whose health you most need to know — client engagement, pipeline freshness, support sentiment, code-review backlog — and have AI grade it overnight on a crude letter scale. Wake up to triage, not research.
  • Run a parallel-agent experiment. Pick a multi-item task (a punch list, a content batch, a refactor across files) and have agents work it in parallel. Learn what splits cleanly and what doesn't. This is a cheap dry run for the orchestration layer you'll eventually need.
  • Treat AI as a skill in your training plan. Add it to the same place where you train sales methodology, product ergonomics, or technical onboarding. If a salesperson is expected to use AI on every call, they need a playbook for using AI on every call.
  • Reserve a human-only loop. Pick the loop that's most load-bearing for your business — code review, customer conversations, contract review — and refuse to let AI summaries replace it. Protect the place where judgment lives.

The pattern across all five is the same. You're not buying a model. You're building the harness around the model — the audit cadence, the scoring system, the orchestration pattern, the skill base of your team, the loops you decided to keep human. The model is commoditized. The harness is the moat.

Resources From the Roundtable

  • Claude Code: Anthropic's coding agent, central to most of the workflows in this essay. The /agents and /clear commands and the auto mode referenced above all live here.
  • Basecamp: Project management and client communication. Connects to Claude Code via MCP for the weekly audit and the nightly client-health scoring.
  • Retool: Low-code platform used to wrap Claude into a bulk email-drafting workflow.
  • PostHog: Product analytics. One operator now talks to PostHog through Claude Code, generating insights and setting alerts conversationally.
  • Wispr Flow: Voice dictation, paired with a single-button Stream Deck Mini so dictation activates without a keyboard chord.
  • HubSpot: CRM, used as the baseline for AI-driven prospecting and gap analysis.
  • OpenAI Realtime API: Speech-to-speech model behind the voice-Zoom feedback agent.

If you want to go deeper on the patterns underneath these systems, the AI skills that replace workflows essay walks through what makes a workflow worth automating, and Close the AI Feedback Loop covers the audit-and-improve cadence that makes the Friday review work.

John M. P. Knox

Founder of Moving Average Inc. 25 years across MedTech, enterprise platforms, and semiconductors — from writing 64-bit code at AMD to guiding 15+ products to market. TinySeed LP and mentor. Hosts the Executive AI Roundtable.
