Most leaders are still asking which AI tool to adopt. The operators in this week's roundtable are well past that. They've adopted the tools. The conversation in the room was about the layer underneath — the scaffolding that turns "we use Claude" into something that compounds rather than piles up as slop.
What follows are notes from this week's Executive AI Roundtable discussion, shared under the Chatham House Rule.
Executive AI Roundtable
Conversations like the one behind this essay happen every week. I host a closed-door roundtable for founders and C-level leaders navigating AI strategy — no vendors, no pitches, just operators comparing notes. Attendance has tripled since early April; the next session is at capacity.
Join the Waitlist →

Fight Fire With Fire
The most concrete examples in the room were AI tools built to manage AI's own output.
A software-agency lead described a Ruby gem her team built called Wally — named after WALL-E, Pixar's garbage-collecting robot. It runs a weekly audit on each codebase, pairing deterministic linters like RuboCop with an LLM triage step. The pipeline is fully agentic: one agent scans the code for refactoring or removal opportunities; a second decides whether each finding is worth fixing; a third takes the approved task and opens the PR.
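The shape of that three-agent pipeline can be sketched with stubs. None of this is Wally's actual code — the gem is Ruby, and the scan, triage, and data structures here are invented for illustration:

```python
# Hypothetical sketch of a Wally-style weekly audit: a deterministic pass
# finds candidates, and a triage step (an LLM call in the real system)
# decides which are worth a PR. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    rule: str      # e.g. a RuboCop cop name
    detail: str

def deterministic_scan(codebase: dict[str, str]) -> list[Finding]:
    """Stand-in for the linter pass: flag files containing a stale TODO."""
    return [Finding(path, "Lint/Todo", "stale TODO")
            for path, src in codebase.items() if "TODO" in src]

def llm_triage(finding: Finding) -> bool:
    """Stand-in for the LLM judgment step; a real version calls a model."""
    return finding.rule.startswith("Lint/")

def weekly_audit(codebase: dict[str, str]) -> list[Finding]:
    # Only findings that survive both passes become PR candidates.
    return [f for f in deterministic_scan(codebase) if llm_triage(f)]

repo = {"app/user.rb": "# TODO: remove", "app/order.rb": "class Order; end"}
print([f.file for f in weekly_audit(repo)])  # → ['app/user.rb']
```

The point of the structure is the hand-off: the deterministic pass is cheap and exhaustive, the judgment pass is expensive and selective, and only their intersection generates work.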
Another founder runs a code-review bot named Auto. Auto is a Git user, and he's a required reviewer on every pull request. A virtual machine pings GitHub every fifteen minutes, picks up any reviews assigned to him, runs the code-reviewer skill in a Claude Code session, and posts the result back. By the time a human sees the PR, the bot has already been through it: the developer has either fixed the flagged issues or modified the reviewer skill itself, with a justification, to teach it. The founder's read on the result: their code quality has gotten markedly better, because they invested serious time in that defensive AI layer.
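The polling loop behind a bot like Auto can be sketched with stubs — the fake client below stands in for the GitHub API, and the reviewer step for a headless Claude Code session; neither reflects the founder's actual implementation:

```python
# Hedged sketch of an Auto-style loop: poll for PRs assigned to the bot,
# run a review step on each, post the result back. Everything is a stub.
def fetch_assigned_prs(client):
    # Stand-in for "ping GitHub every fifteen minutes": list open review requests.
    return client.get("review_requests")

def run_reviewer_skill(pr):
    # Stand-in for running the code-reviewer skill in a Claude Code session.
    return f"reviewed #{pr['number']}: no duplicate functions found"

def poll_once(client, post):
    for pr in fetch_assigned_prs(client):
        post(pr["number"], run_reviewer_skill(pr))

class FakeClient:
    def get(self, _):
        return [{"number": 42}]

posted = {}
poll_once(FakeClient(), lambda n, body: posted.update({n: body}))
print(posted)  # → {42: 'reviewed #42: no duplicate functions found'}
```

The design choice worth copying is the injection points: because the client and the poster are passed in, the same loop runs against a fake in tests and against real infrastructure in the cron job.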
He calls it the fight-fire-with-fire approach, and the principle is sharp: AI generates code at volume, and one of its known failure modes is duplicate code — writing brand-new functions for things that already exist in the codebase. Claude Code can search for prior implementations, but there's a limit, and the context window can't hold an entire repo. The answer isn't a smarter generator. It's an equally serious investment in AI code review. The deterministic linter handles patterns it knows; the LLM handles judgment; they argue, then roll up to a final report.
Two-AI consensus catches more than either alone. The pattern generalizes: AI generates dashboards, skills, emails. Each output stream needs a counterpart that audits or dedupes it. If you only build the generation side, the slop accumulates faster than you can clean it — which is exactly why quality engineering is having a renaissance.
Skills All the Way Down
Several attendees were wiring AI into their workflows as composable skills — small, scoped capabilities the AI loads on demand.
A real-estate-software founder described his team's pipeline: ingest sales-call transcripts, extract voice-of-customer phrases tied to specific pain points, generate landing pages tuned to those phrases, then push the pages and matching long-tail keywords into Google and Meta ads. Every step in that pipeline is its own skill. "Skills all the way down." Standardized on Claude. Meta's MCP made the ad-publishing step easy; Google has more friction.
Skills change the cost structure of AI work. Prompts are one-shot; skills persist. You build them once, refine them as you use them, version them in Git, share them across the team. Another founder's infrastructure was just-enough: a virtual machine logged into Claude Code, a cron job that invokes a skill on a schedule, Slack hooks that trigger skills on demand. No "agent platform" required. The skill is the agent. (More on the skill-as-workflow shift.)
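That just-enough infrastructure is small enough to sketch. The `claude -p` headless invocation and the skill-naming convention below are assumptions about the CLI, not details confirmed in the room:

```python
# Minimal sketch of "the skill is the agent": a scheduled job that shells out
# to a headless Claude Code run of one skill. The prompt shape is invented.
import shlex
import subprocess

def skill_command(skill: str, task: str) -> list[str]:
    # One skill, one instruction, printed output — the whole body of a cron job
    # or a Slack-hook handler.
    prompt = f"Use the {skill} skill. {task}"
    return ["claude", "-p", prompt]

def run_skill(skill: str, task: str, runner=subprocess.run):
    # The runner is injectable so the command can be tested without the CLI.
    return runner(skill_command(skill, task), capture_output=True, text=True)

print(shlex.join(skill_command("weekly-dashboard", "Refresh the metrics page.")))
```

Pointed at by cron on a schedule or by a Slack hook on demand, a function this size is the entire "agent platform."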
But skill management has its own failure mode. The same founder's developers were building duplicate skills — both proud of their version, neither willing to use the other's. He called it the IKEA effect: built it myself, love it so much, won't switch. He had to enforce one canonical version per process and tell the engineers to "duke it out" before either skill went into the shared repo. Git submodules — which many engineers have learned to avoid — turned out to handle the canonical-skill problem cleanly. Claude can manage submodule reconciliation just fine.
The thing to notice: you can't manage AI skill drift the way you manage code drift. A skill is just a markdown file. Anyone can write one. Nobody owns the question of which skill is the skill. If you don't appoint a curator, you'll wake up with sixteen versions of the same dashboard generator and team members emailing each other their personal favorites.
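A curator doesn't need heavy tooling to catch that drift. Here's a hedged sketch of a dedupe check — the file names and similarity threshold are made up — that flags skill files whose text is suspiciously similar:

```python
# Illustrative curator check: flag pairs of skill files (plain markdown text)
# that look like duplicates of the same process. Threshold is arbitrary.
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(skills: dict[str, str], threshold: float = 0.8):
    pairs = []
    for (a, text_a), (b, text_b) in combinations(skills.items(), 2):
        if SequenceMatcher(None, text_a, text_b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs

repo = {
    "skills/dashboard.md": "Generate the weekly KPI dashboard from Postgres.",
    "skills/dashboard-v2.md": "Generate the weekly KPI dashboard from Postgres!",
    "skills/email.md": "Draft the investor update email.",
}
print(near_duplicates(repo))  # → [('skills/dashboard.md', 'skills/dashboard-v2.md')]
```

A check like this doesn't decide which version wins — that's the "duke it out" conversation — but it makes the sixteen-versions problem visible before it compounds.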
Don't Bet the Farm — But the Switching Cost Is Already Low
When a founder asked the room where to host his agents — the cloud, a local Mac, somewhere else — the conversation turned to vendor lock-in. The veteran in the room offered a sharp warning: the winners now won't necessarily be the winners in six months or a year. He's old enough to remember when "build for Netscape" was the answer, before Amazon IPO'd. The risk in any fast-moving market is getting on the wrong train early.
Another participant pushed back with something equally true. The actual artifacts of AI work — skills, prompts, context files — are text. Text is portable. "We've switched between OpenAI and Claude and back, and run it on local LLMs. The switching cost is just you saying, alright, I'm giving up on Claude, I'm gonna go do Codex this week, and it'll just move all the text files in 5 minutes and reformat them."
Both are right. The refinement: don't bet the farm on a vendor's UI, infrastructure, or proprietary format. Bet the farm on your text files. Keep the IP in Git, where it belongs. The model can change. The skills shouldn't have to.
There's one more nuance worth knowing: the model and the harness are not the same thing. Claude in Claude Code is configured with tools — calculators, code execution, file access — that aren't there if you call the same model raw through Azure or another cloud. Same engine, different tires. A skill that works beautifully in Claude Code can produce noticeably worse results when you swap the harness, even though the model name didn't change. Test the harness, not just the model — and be careful when porting a working pipeline to new infrastructure. (This is also why LLMs can quietly fail at math when the harness doesn't include the calculator tools.)
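One way to act on that advice is a harness-parity probe: run the same prompts through both call paths and diff the answers. The two callables below are stubs standing in for a tool-equipped harness and a raw API call — the probe prompts and the failure they illustrate are invented:

```python
# Hedged sketch of "test the harness, not just the model": same probes,
# two call paths, report where the answers diverge.
PROBES = {
    "math": "What is 127 * 419?",
    "dup-check": "Does this repo already define slugify()?",
}

def parity_report(harness_a, harness_b) -> dict[str, bool]:
    # True means the two harnesses agree on that probe.
    return {name: harness_a(p) == harness_b(p) for name, p in PROBES.items()}

# Stubs: the tool-equipped harness computes correctly; the raw call drifts
# on arithmetic because no calculator tool is attached.
with_tools = lambda p: "53213" if "127" in p else "yes, in lib/strings.rb"
raw_model = lambda p: "53113" if "127" in p else "yes, in lib/strings.rb"

print(parity_report(with_tools, raw_model))  # → {'math': False, 'dup-check': True}
```

Run a suite like this before porting a pipeline, and the "same engine, different tires" surprises show up as a report instead of a production incident.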
What Could You Not Use AI For This Week?
Adoption inside a team isn't a tooling problem. Several attendees described the same split: the developers face an identity crisis, the QA and support people face overwhelm. Both groups push back, in different ways.
One founder shared a one-on-one tactic that's been quietly transforming his team. In every check-in, he asks the same question: What could you not use AI for this week?
The answers are revealing, and he picks one apart on the spot. The team member's example: he'd had to pull out Postman and hand-write some API requests. The reframe: Claude could write a Python script that ran 5 million API requests and then told you what it found.
The exercise works because it surfaces mental blocks rather than technical limits. Engineers who already use AI for code completion will quietly skip it for the API testing they've done a hundred times. Habits live in the comfortable corners. The question pulls the AI lens across the entire week.
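The Postman-to-script reframe is easy to sketch. Everything below is illustrative — the endpoint is a local stub so the script runs offline; a real sweep would point an HTTP client at real URLs:

```python
# Sketch of the reframe: instead of hand-testing a few requests, script the
# whole sweep and summarize. fake_endpoint stands in for an API call.
from concurrent.futures import ThreadPoolExecutor

def fake_endpoint(user_id: int) -> int:
    # Returns an HTTP-style status code; one id in every thousand "fails".
    return 500 if user_id % 1000 == 7 else 200

def sweep(ids, call=fake_endpoint, workers=32):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(call, ids))
    return {"requests": len(statuses),
            "failures": sum(1 for s in statuses if s >= 500)}

print(sweep(range(10_000)))  # → {'requests': 10000, 'failures': 10}
```

The point isn't the concurrency trick; it's that "tell me what you found" across ten thousand requests is a different class of testing than hand-checking three of them.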
The QA and support team had a different problem from the engineers. "For the developers, it's an identity crisis. For the non-developer roles, it's very overwhelming that their day is gonna be more in VS Code." He showed his QA lead an AI browser-automation tool and reframed her job: manage a team of 10 QA people. They're all Claude. It took a couple of months before the light-bulb moment arrived. The friction wasn't the tool; it was asking someone whose work had been in browser tabs to live in an IDE all day. (More on the cultural side of AI rollouts.)
The non-engineer adoption ramp that came up next: GitHub Codespaces. A virtual development environment in the browser, around fifty cents an hour, no installation, no Linux configuration, no shell setup. The container has Python or whatever; it just works. For someone whose entire workday is already in browser tabs, "open a Codespace" is a much shorter walk than "install VS Code, install Claude Code, install Node, set up your shell." The cheat code that worked when nothing else did.
Token-Maxing Is the New Lines of Code
A side thread worth pulling out: how teams measure AI adoption. Someone heard at MicroConf that a CEO had threatened to fire anyone who didn't burn through their full token allocation. The room laughed, but the deeper point landed. "Token-maxing is the new lines of code." You'll get what you measure. Tokens-as-KPI produces noise — long-winded prompts, context dumps, agents arguing in circles. Lines-of-code-as-KPI produced the same junk a generation ago. (The room remembered the Dilbert comic about coding yourself a minivan.)
A green GitHub contribution graph isn't proof either. AI agents will happily double the size of a codebase to ship a feature; that doesn't mean the codebase is better. (See again: Wally, the garbage-collector bot.)
The harder question: what should you measure? The honest answer is workflow-level, not metric-level. Did the marketer ship the campaign faster, with sharper positioning? Did the engineer fix the bug in half the time, at the same quality? Did the salesperson catch a missed objection in the call review? None of those show up in token consumption. Capability per output is the right axis — and tracking it requires actually looking at the work, not the dashboard.
What to Actually Build
A short list extracted from the conversation:
- Stand up an AI code-review layer. Pair a deterministic linter with an LLM reviewer. Make the LLM reviewer modifiable (skill-as-code) so engineers can correct its rules during a PR. Require its approval before merge.
- Pick one skill repository per team and curate it. Personal skills are fine on local machines; canonical skills go in a shared repo with a curator. Submodules work — Claude can manage them — even if you'd previously sworn off submodules.
- Test the harness, not just the model. A workflow that works in Claude Code may not work raw via API. Move tested skills to new infrastructure deliberately, not assumptively.
- Ask "what could you not use AI for this week?" in every one-on-one. Treat the answer as the next conversation, not a complaint to log.
- Stop measuring tokens. Measure intent-to-output time, capability per output, or the disappearance of work that used to take a week.
The room agreed on something at the close: there isn't a job title yet for the person who owns this layer. "There's no AI in IT, mate," came the reply when someone suggested IT might get the job — and the consensus was that the AI architect role is going to live somewhere else. Whoever picks up the work — engineering lead, ops lead, founder — they'll be the one who decides whether your AI is producing useful output or accumulating expensive slop.
For more on building the operational structure that AI work depends on, see Map Your Operating System Before You Apply AI.
Resources From the Roundtable
- Claude Code — the central tool nearly every operator referenced; the foundation for skill repos, code-review bots, and agent loops.
- GitHub Codespaces — browser-based development environments; mentioned as the friction-free entry point for non-engineer team members.
- RuboCop — the deterministic Ruby linter that the agency's "Wally" cleanup bot pairs with an LLM reviewer.
- Ruby on Rails — the agency's stack of choice; opinionated framework that makes the Wally workflow possible by giving the AI a strong pattern to enforce.
- Meta Marketing API MCP — referenced as the reason Meta's ad-publishing step was easier than Google's in the sales-pipeline build.
- HubSpot App Marketplace — referenced as a moat: AI can't yet shortcut a multi-month vendor approval process.
- MicroConf — the SaaS founder community where two attendees met; the "fire anyone who doesn't use their token allocation" anecdote came from a speaker there.
