Moving Average Inc.

The AI Layer Most Teams Aren't Building

Code review bots, skill repos, and the operational scaffolding behind teams getting real AI value.

AI Roundtable

Most leaders are still asking which AI tool to adopt. The operators in this week's roundtable are well past that. They've adopted the tools. The conversation in the room was about the layer underneath — the scaffolding that turns "we use Claude" into something that compounds rather than slops.

What follows are notes from this week's Executive AI Roundtable, a closed-door peer conversation for CEOs and founders held weekly under the Chatham House Rule.

Fight Fire With Fire

The most concrete examples in the room were AI tools built to manage AI's own output.

A software agency lead described a Ruby gem her team built, called Wall-E — named after the Pixar garbage collector. It runs a weekly audit on each codebase, pairs deterministic linters like RuboCop with an LLM triage step, and opens a PR when a refactor or removal is worthwhile. It's all agentic — one agent scans the code and finds opportunities to refactor or remove; another decides whether the finding is worth fixing; a third takes the task and opens the PR.
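The scan, triage, and PR hand-off described above can be sketched as three pluggable stages. This is a minimal illustration, not the agency's gem: the `Finding` type, the function names, and the stand-in stages are all hypothetical, and a real version would call a linter, a model, and a Git host where the stand-ins sit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    path: str
    description: str
    source: str  # "linter" or "llm"; both labels are illustrative


def run_cleanup_audit(
    scan: Callable[[], Iterable[Finding]],
    is_worth_fixing: Callable[[Finding], bool],
    open_pull_request: Callable[[Finding], str],
) -> list[str]:
    """Scan, triage, then open one PR per finding that survives triage."""
    pr_urls = []
    for finding in scan():
        if is_worth_fixing(finding):
            pr_urls.append(open_pull_request(finding))
    return pr_urls


# Stand-in stages so the sketch runs without a linter, a model, or GitHub.
def fake_scan() -> list[Finding]:
    return [
        Finding("app/models/user.rb", "unused method legacy_name", "linter"),
        Finding("app/helpers/date.rb", "cosmetic whitespace", "linter"),
    ]


def fake_triage(finding: Finding) -> bool:
    return "unused" in finding.description


def fake_open_pr(finding: Finding) -> str:
    return f"PR opened for {finding.path}"


opened = run_cleanup_audit(fake_scan, fake_triage, fake_open_pr)
```

Keeping triage as its own stage is the point of the design: the scanner can be noisy, because only findings that pass the second check ever become a PR.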

Another founder runs a code-review bot named Auto. Auto is a GitHub user and a required reviewer on every pull request. A virtual machine polls GitHub every fifteen minutes, picks up any reviews assigned to Auto, runs the code-reviewer skill in a Claude Code session, and posts the result back. By the time a human sees the PR, the bot has already been through it: either the developer fixed the flagged issues or modified the reviewer skill itself, with a justification, to teach it. The founder credits the time invested in this defensive use of AI for a significant improvement in the team's code quality.
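A polling reviewer of this kind reduces to a small loop. The sketch below assumes three hypothetical callables (`fetch_assigned`, `run_review_skill`, `post_review`) that would wrap the Git host's API and the Claude Code session; none of them is a real library function, and the stand-ins exist only so one pass can run locally.

```python
import time
from typing import Callable, Iterable

POLL_INTERVAL_SECONDS = 15 * 60  # the fifteen-minute cadence described above


def review_pending(
    fetch_assigned: Callable[[], Iterable[int]],
    run_review_skill: Callable[[int], str],
    post_review: Callable[[int, str], None],
) -> int:
    """Run one polling pass and return how many PRs were reviewed."""
    reviewed = 0
    for pr_number in fetch_assigned():
        post_review(pr_number, run_review_skill(pr_number))
        reviewed += 1
    return reviewed


def poll_forever(fetch_assigned, run_review_skill, post_review) -> None:
    """Repeat the pass on a fixed interval; a cron job could replace this loop."""
    while True:
        review_pending(fetch_assigned, run_review_skill, post_review)
        time.sleep(POLL_INTERVAL_SECONDS)


# Stand-ins (hypothetical) so a single pass runs without GitHub or a model.
posted: dict[int, str] = {}
count = review_pending(
    fetch_assigned=lambda: [101, 102],
    run_review_skill=lambda pr: f"reviewed #{pr}",
    post_review=lambda pr, body: posted.update({pr: body}),
)
```

Separating the single pass from the scheduling makes the pass easy to test and lets the schedule live wherever is convenient.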

He calls it the fight-fire-with-fire approach: AI generates code at volume, and that much code can hide many kinds of mistakes. One known failure mode of AI-generated code is duplication: writing brand-new functions for things that already exist in the codebase. Claude Code can search for prior implementations, but only up to a point, because the context window can't hold an entire repo. The solution is an equally serious investment in AI code review. The deterministic linter handles the patterns it knows; the LLM and the human handle judgment.
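To make the deterministic half concrete, here is one way a duplicate-function check could work, sketched in Python rather than the agency's Ruby. It is an assumption for illustration, not the tooling anyone in the room described: it only catches functions whose parameters and bodies are structurally identical, and anything fuzzier is left to the LLM and the human.

```python
import ast
from collections import defaultdict


def find_duplicate_functions(sources: dict[str, str]) -> list[list[str]]:
    """Group functions whose parameters and bodies are structurally identical.

    `sources` maps a file path to its Python source. Function names are
    ignored, so a copy under a new name still matches.
    """
    groups = defaultdict(list)
    for path, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.FunctionDef):
                # ast.dump omits line numbers by default, so position is ignored.
                signature = ast.dump(node.args) + "".join(
                    ast.dump(stmt) for stmt in node.body
                )
                groups[signature].append(f"{path}:{node.name}")
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical three-file codebase: two files contain the same function.
sources = {
    "billing.py": "def total(items):\n    return sum(i.price for i in items)\n",
    "cart.py": "def cart_sum(items):\n    return sum(i.price for i in items)\n",
    "util.py": "def first(items):\n    return items[0]\n",
}
duplicates = find_duplicate_functions(sources)
```

Because the check reads every file rather than whatever fits in a context window, it covers the whole repo cheaply, which is the gap the paragraph above describes.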

Two-AI consensus catches more than either alone. The pattern generalizes: AI generates dashboards, skills, emails. Each output stream needs a counterpart to audit or deduplicate it. If you only build the generation side, the slop accumulates faster than you can clean it — which is exactly why quality engineering is having a renaissance.

The pattern scales down to individuals, too. I run AI on my own sales-call transcripts as a coach. Yesterday, it told me I'd done a rug pull on a buying signal — which I didn't enjoy hearing, but it was correct. Brutal, useful, and the next call gets better because of it.

Skills All the Way Down

Several attendees were wiring AI into their workflows as composable skills — small, scoped capabilities that the AI loads on demand.

A real-estate software founder described his team's pipeline: ingest years of sales call transcripts, extract voice-of-customer phrases tied to specific pain points, generate landing pages tuned to those phrases, then push the pages and matching long-tail keywords into Google and Meta ads. Every step in that pipeline is its own Claude skill. "Skills all the way down."

He noted that Meta's MCP made the ad-publishing step easy for the agent to handle, whereas Google's ad-creation process introduces significantly more friction for AI.

Skills change the cost structure of AI work. Prompts are one-shot; skills persist. You build them once, refine them as you use them, version them in Git, and share them across the team. Another founder's infrastructure was just enough: a virtual machine logged into Claude Code, a cron job that invokes a skill on a schedule, and Slack hooks that trigger skills on demand. No "agent platform" required. The skill is the agent. (More on the skill-as-workflow shift.)

But skill management has its own failure mode. Two of the same founder's developers had built duplicate skills, each proud of their own version and neither willing to use the other's. He called it the IKEA effect: built it myself, love it so much, won't switch. He had to enforce a single canonical version per process and tell the engineers to "duke it out" before either skill was added to the shared repo. Git submodules, which some engineers have learned to avoid over their careers, turned out to handle the canonical-skill problem cleanly. Claude can manage submodule reconciliation just fine.

The thing to notice: you can't manage AI skill drift the way you manage code drift. A skill is just a markdown file. Anyone can write one. Nobody owns the question of which skill is the skill. If you don't appoint a curator, you'll wake up with sixteen versions of the same dashboard generator and team members emailing each other their personal favorites.
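A curator could catch competing skills with a simple check. The sketch below assumes each skill file declares a `name:` line near the top; the file paths and skill names are invented, and the function only flags exact name collisions, not skills that do the same job under different names.

```python
import re
from collections import defaultdict

NAME_LINE = re.compile(r"^name:\s*(.+?)\s*$", re.MULTILINE)


def find_competing_skills(skill_files: dict[str, str]) -> dict[str, list[str]]:
    """Return skill names that are declared by more than one file."""
    by_name = defaultdict(list)
    for path, text in skill_files.items():
        match = NAME_LINE.search(text)
        if match:
            # Lowercase so "Dashboard-Generator" and "dashboard-generator" collide.
            by_name[match.group(1).lower()].append(path)
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


# Invented example: two people wrote their own dashboard skill.
skill_files = {
    "alice/dashboard/SKILL.md": "---\nname: dashboard-generator\n---\nBuild a dashboard.",
    "bob/dash/SKILL.md": "---\nname: Dashboard-Generator\n---\nMake charts.",
    "shared/review/SKILL.md": "---\nname: code-reviewer\n---\nReview the diff.",
}
competing = find_competing_skills(skill_files)
```

A report like this doesn't decide which version wins. It only tells the curator where a decision is needed.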

Don't Bet the Farm — But the Switching Cost Is Already Low

When a founder asked the room where to host his agents (the cloud, a local Mac, somewhere else), the conversation turned to vendor lock-in. A veteran in the room offered a sharp warning: today's winners won't necessarily be the winners in six months or a year. He's old enough to remember when "build for Netscape" was the answer, before Amazon IPO'd. The risk in any fast-moving market is getting on the wrong train early and going off in the wrong direction.

Another participant pushed back with something equally true. The actual artifacts of AI work — skills, prompts, context files — are text. Text is portable. "We've switched between OpenAI and Claude and back, and run it on local LLMs. The switching cost is just you saying, alright, I'm giving up on Claude, I'm gonna go do Codex this week, and it'll just move all the text files in 5 minutes and reformat them."

Both are right. The refinement: don't bet the farm on a vendor's UI, infrastructure, or proprietary format. Bet the farm on your text files. Keep the IP in Git, where it belongs. The model can change. The skills shouldn't have to.

There's one more nuance: the model and the harness are not the same thing. Claude in Claude Code is configured with specific tools — calculators, code execution, file access — that aren't available when you call the same model directly via Azure or another cloud. Same engine, different tires. A skill that works beautifully in Claude Code can produce noticeably worse results when you swap the harness, even though the model name didn't change. Test the harness, not just the model — and be careful when porting a working pipeline to new infrastructure. (This is also why LLMs can quietly fail at math when the harness doesn't include the calculator tools.)
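One way to test the harness rather than the model is to run the same cases through each harness and compare pass rates. In this sketch a harness is assumed to be any function that takes a prompt and returns the final answer text; the two stand-ins below are fakes that mimic a tool-equipped harness and a bare one, not real API calls.

```python
from typing import Callable

Harness = Callable[[str], str]  # takes a prompt, returns the final answer text


def compare_harnesses(
    harnesses: dict[str, Harness],
    cases: list[tuple[str, str]],
) -> dict[str, float]:
    """Return the fraction of cases each harness answers exactly right."""
    scores = {}
    for name, run in harnesses.items():
        passed = sum(1 for prompt, expected in cases if run(prompt).strip() == expected)
        scores[name] = passed / len(cases)
    return scores


# Stand-ins: one harness can execute arithmetic, the other can only guess.
def with_tools(prompt: str) -> str:
    left, right = prompt.split("*")
    return str(int(left) * int(right))


def without_tools(prompt: str) -> str:
    return "56" if prompt == "7*8" else "unknown"


cases = [("7*8", "56"), ("1234*5678", "7006652")]
scores = compare_harnesses({"with_tools": with_tools, "raw_api": without_tools}, cases)
```

Running a comparison like this before a migration turns "it worked in Claude Code" into a number you can check on the new infrastructure.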

What Could You Not Use AI For This Week?

Adoption inside a team isn't a tooling problem. Several attendees described the same split: the developers face an identity crisis, and the QA and support people face overwhelm. Both groups push back in different ways.

One founder shared a one-on-one tactic that's been quietly transforming his team. In every check-in, he asks the same question: What could you not use AI for this week?

In other words, he's asking each employee which of their tasks they did without help from Claude.

One employee answered that they'd had to pull out Postman and write some API requests by hand. The founder's response: Claude could write a Python script that did 5 million API requests and then tell you what it found.
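The kind of script the founder had in mind might look like the sketch below: request many URLs concurrently and summarize the status codes. The URLs are placeholders and the demonstration uses a fake fetcher so it runs offline; a real run at the scale he mentioned would also need rate limiting and retries, which are left out here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable
from urllib.error import HTTPError
from urllib.request import urlopen


def http_status(url: str) -> int:
    """Return the HTTP status for a GET request, or 0 on a network failure."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def summarize_statuses(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
    workers: int = 8,
) -> Counter:
    """Request every URL concurrently and count the status codes seen."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, urls))


# Offline demonstration with a fake fetcher; the URLs are placeholders.
def fake_fetch(url: str) -> int:
    return 404 if url.endswith("/missing") else 200


summary = summarize_statuses(
    [f"https://api.example.test/items/{i}" for i in range(5)]
    + ["https://api.example.test/missing"],
    fetch=fake_fetch,
)
```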

The exercise works because it surfaces mental blocks and habits rather than technical limits. Engineers who already use AI for code completion will quietly skip it for the API testing they've done a hundred times. Habits live in the comfortable corners. The question pulls the AI lens across the entire week.

For the wider version of this problem — what to do when the founder ships at AI speed but the team doesn't — see Closing the 4x AI Speed Gap on Your Engineering Team.

The same founder's QA and support teams have a different problem from his engineers. "For the developers, it's an identity crisis. For the non-developer roles, it's very overwhelming that their day is gonna be more in VS Code." He showed his QA lead an AI browser-automation tool and reframed her job: manage a team of 10 QA people. They're all Claude. It took a couple of months before the light-bulb moment arrived. The friction wasn't the tool; it was asking someone whose work had been in browser tabs to spend all day in an IDE. (More on the cultural side of AI rollouts.)

Another participant pointed to a useful taxonomy that's been circulating in the room, what they called the Shapiro framework for AI use in engineering, organized into five levels. At the bottom is spicy autocomplete: IntelliSense, but more confident. Above that, the engineer pair-programs with a super-intelligent copilot. Higher up, the engineer manages agentic work: scope a task, walk away, and review the output line by line. Higher still, the engineer writes harnesses (test scaffolds and evaluation suites) so that human review becomes the exception, not the default. Naming the levels lets a team see where each member actually operates, rather than arguing about whether they "use AI."

The non-engineer adoption ramp that came up next: GitHub Codespaces. A virtual development environment in the browser, around fifty cents an hour, no installation, no Linux configuration, no shell setup. The container has Python and whatever other tools are needed; it just works. For someone whose entire workday is already in browser tabs, "open a Codespace" is a much shorter walk than "install VS Code, install Claude Code, install Node, set up your shell." The cheat code that worked when nothing else did.

Token-Maxing Is the New Lines of Code

A side thread worth pulling out: how teams measure AI adoption. Someone heard at MicroConf that a CEO had threatened to fire anyone who didn't burn through their full token allocation. "Token-maxing is the new lines of code." You'll get what you measure. Tokens-as-KPI produces noise — long-winded prompts, context dumps, agents arguing in circles. Lines of code as a KPI produced the same junk a generation ago. (The room remembered the Dilbert comic about coding yourself a minivan.)

A green GitHub contribution graph isn't proof either. AI agents will happily double the size of a codebase to ship a feature; that doesn't mean the codebase is better. (See again: Wall-E, the garbage-collector bot.)

The harder question: what should you measure? Workflow-level outcomes, not metric counts. Did the marketer ship the campaign faster, with sharper positioning? Did the engineer fix the bug in half the time while maintaining the same quality? Did the salesperson catch a missed objection in the call review? None of those show up in token consumption. Capability per output is the right axis — and tracking it requires actually looking at the work, not the dashboard.

What to Actually Build

A short list extracted from the conversation:

  1. Stand up an AI code-review layer. Pair a deterministic linter with an LLM reviewer. Make the LLM reviewer modifiable (skill-as-code) so engineers can correct its rules during a PR. Require its approval before merging.
  2. Pick one skill repository per team and curate it. Personal skills are fine on local machines; canonical skills go in a shared repo with a curator. Submodules work — Claude can manage them — even if you'd previously sworn off submodules.
  3. Test the harness, not just the model. A workflow that works in Claude Code may not work raw via API. Move tested skills to new infrastructure deliberately, not assumptively.
  4. Ask "what could you not use AI for this week?" in every one-on-one. Treat the answer as the next conversation, not a complaint to log.
  5. Stop measuring tokens. Measure intent-to-output time, capability per output, or the disappearance of work that used to take a week.
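As one possible reading of "intent-to-output time" from the last item, the sketch below takes the median elapsed time between a request and its shipped result. The definition and the sample timestamps are assumptions invented for illustration, not figures from the roundtable.

```python
from datetime import datetime
from statistics import median


def median_hours(tasks: list[tuple[str, str]]) -> float:
    """Median elapsed hours between a request and its shipped result.

    Each task is a (requested_at, shipped_at) pair of ISO 8601 timestamps.
    """
    durations = [
        (datetime.fromisoformat(done) - datetime.fromisoformat(asked)).total_seconds() / 3600
        for asked, done in tasks
    ]
    return median(durations)


# Invented sample data for two periods, purely for illustration.
before = [("2025-01-06T09:00", "2025-01-08T09:00"), ("2025-01-07T09:00", "2025-01-07T21:00")]
after = [("2025-03-03T09:00", "2025-03-03T15:00"), ("2025-03-04T09:00", "2025-03-04T11:00")]

before_hours = median_hours(before)
after_hours = median_hours(after)
```

A number like this still says nothing about quality, so it belongs next to a review of the work itself rather than in place of one.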

The room agreed on something at the close: there isn't a job title yet for the person who owns this layer. "There's no AI in IT, mate," came the reply when someone suggested IT might get the job, and the consensus was that the AI architect role is going to live somewhere else. Whoever picks up the work, whether engineering lead, ops lead, or founder, will be the one who decides whether your AI is producing useful output or accumulating expensive slop.

One last thread, carrying forward from my essay about the dangers of carelessly replacing humans with AI: Claude is a commodity. People know things AI can't replace. The team you keep is the team that maintains leverage over AI output — and the cleanup layer is how that leverage compounds.

For more on building the operational structure that AI work depends on, see Map Your Operating System Before You Apply AI.

Resources From the Roundtable

  • Claude Code — the central tool nearly every operator referenced; the foundation for skill repos, code-review bots, and agent loops.
  • GitHub Codespaces — browser-based development environments; mentioned as the friction-free entry point for non-engineer team members.
  • RuboCop — the deterministic Ruby linter that the agency's "Wall-E" cleanup bot pairs with an LLM reviewer.
  • Ruby on Rails — the agency's stack of choice; opinionated framework that makes the Wall-E workflow possible by giving the AI a strong pattern to enforce.
  • Meta Marketing API MCP — referenced as the reason Meta's ad-publishing step was easier than Google's in the sales-pipeline build.
  • HubSpot App Marketplace — referenced as a moat: AI can't yet shortcut a multi-month vendor approval process.
  • MicroConf — the SaaS founder community where two attendees met.
John M. P. Knox

Founder of Moving Average Inc. 25 years across MedTech, enterprise platforms, and semiconductors — from writing 64-bit code at AMD to guiding 15+ products to market. TinySeed LP and mentor. Hosts the Executive AI Roundtable.
