A SaaS founder opened this week's roundtable with this question: "It takes me one hour to develop a feature. If I gave it to one of my developers, it takes them four hours. Without Claude, it would take them a week. So how do we drive that four hours down to one?" This AI skill gap inspired a lively discussion.
What follows are notes from this week's Executive AI Roundtable discussion, shared under the Chatham House Rule.
Executive AI Roundtable
Conversations like the one behind this essay happen every week. I host a closed-door roundtable for founders and C-level leaders navigating AI strategy — no vendors, no pitches, just operators comparing notes.
Join the Waitlist →
The 4x Gap Is Real, and It Isn't About Talent
The 4x gap surprised few; most CEOs in the room have observed the same AI performance gap on their own teams. You, or another person in your organization, make productive use of AI. Others move faster than before, but not that fast. The straight-line explanation — they just need more time with the tools — gets less convincing as the months tick by.
A non-technical founder pushed back from the other side. "Now the top engineers are people who have the mindset of how can I do this faster, how can I do this faster." Then the kicker: "I would question whether you have, like, top engineers." It got a laugh from some, and we all pondered whether AI had changed the definition of a top engineer in late 2025. Engineers who built reputations on careful craft are not always the same engineers who thrive when the right move is to delegate the first draft and audit the output.
Many brilliant engineers haven't caught up to this new reality, and nothing in their day-to-day forces them to. They're satisfied with their habits, even as the world transforms around them.
A Shared Harness, Not a Box of Solo Habits
The closest thing to a unifying fix in the room was a word that kept surfacing between Alicia Rojas, software engineer at Telos Labs, and a founder building production AI systems: the harness. Not the model, not the IDE — the configuration around the model that shapes how people use it.
Alicia described the situation at Telos before: every engineer had their own setup, prompts, skills, and scattered context. The good patterns lived in individuals' accounts. Knowledge transfer required a manual handoff. Half the team had no idea what the other half had already figured out.
One piece of her team's fix was a shared-config GitHub repository, symlinked into every engineer's configuration. One CLAUDE.md with the team's standards. A curated set of plugins. Custom commands the whole team can call. Hooks that programmatically prevent the model from reading files known to contain secrets — .env, key files, the usual suspects. When someone discovers a better prompt or a better skill, it goes into the repo, and everyone pulls it.
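Hooks like the secret-file guard are small scripts. Here is a minimal sketch in Python, assuming a hook contract in which the tool call arrives as JSON on stdin and a nonzero exit code blocks the call; the payload shape and exit convention are illustrative, so check your tool's actual hook interface before adopting it:

```python
import fnmatch
import json
import sys

# Patterns the team never wants the model to read (an assumed list).
BLOCKED_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "id_rsa*"]

def is_blocked(path: str) -> bool:
    """True if the file's name matches a known-secret pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatch(name, pat) for pat in BLOCKED_PATTERNS)

def hook_main(stdin=sys.stdin) -> int:
    """Read the hook payload and decide whether to allow the tool call."""
    event = json.load(stdin)
    path = event.get("tool_input", {}).get("file_path", "")
    if is_blocked(path):
        print(f"Blocked read of {path}: matches a secret pattern", file=sys.stderr)
        return 2  # assumed convention: nonzero exit blocks the call
    return 0
```

Registered as a pre-tool-use command in the shared config repo, a guard like this travels with the harness: every engineer who symlinks the repo gets the same non-negotiables for free.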
The other founder pushed on the idea. Can the harness make the prompt matter less? If two engineers on the same team get wildly different output from Claude, the harness hasn't done its job. That's the bar — a team harness so good that what you type into the prompt doesn't determine the quality of the result. Nobody in the room is there yet. But that's the direction.
The companion move is observability. One founder is wiring up LangFuse and OpenTelemetry hooks so every prompt the team sends is traced, reviewable, and improvable in retrospect. Not for surveillance — for coaching. You can't level up prompts you can't see.
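Vendor aside, the mechanic is simple: wrap every model call so the prompt and response land somewhere reviewable. A toy sketch of the idea, where `TRACE_LOG` and `call_model` are stand-ins; LangFuse and OpenTelemetry do the same thing against real tracing backends with spans, sampling, and a UI:

```python
import functools
import time

TRACE_LOG = []  # stand-in for a real tracing backend such as LangFuse

def traced(fn):
    """Record every prompt/response pair so the team can review them later."""
    @functools.wraps(fn)
    def wrapper(prompt: str, **kwargs):
        start = time.time()
        response = fn(prompt, **kwargs)
        TRACE_LOG.append({
            "prompt": prompt,
            "response": response,
            "latency_s": round(time.time() - start, 3),
        })
        return response
    return wrapper

@traced
def call_model(prompt: str) -> str:
    # Placeholder for a real model call.
    return f"(model output for: {prompt})"
```

The point of the decorator shape is that engineers don't opt in per prompt; once the team's model client is wrapped, every call is reviewable by default.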
More on the team scaffolding pattern in The AI Layer Most Teams Aren't Building.
Inside a 12-Seat Rollout at Telos
Note: I asked Alicia to share her experiences rolling out AI to her team to bring some hands-on color to today's discussion. She generously allowed me to share her deck and identity for this post.
Alicia walked the room through her team's actual Claude Teams rollout — the chat-product side of the harness, separate from the Claude Code GitHub repo. She shared her deck with us here. They currently pay for 12 seats, with a mix of standard and premium accounts. They're three weeks into their rollout, having previously used a mix of individual AI plans and providers.
What she wanted Claude Teams to deliver, in her own words from the deck: shared Projects with team-wide instructions and knowledge, visibility into who's getting value and how, governance ("one place to update policy, not N inboxes"), and lower friction for spreading good prompts internally. The before-state was the same one most companies are stuck in — "the good practices stayed in people's heads."
What it actually delivered by mid-May: 12 weekly active users, 92.3% utilization, and daily activity ramping from 2–3 users in late April to 10–12 today. The dashboard breaks usage out by Activity, Lines of Code, and individual member — and the per-member Lines of Code view (anonymized in her slides) is what set up the next part of the conversation.
The Projects view is where the cultural shift becomes visible. A shared project holds the artifacts of an engagement: instructions, files dropped in from Google Drive, generated outputs, and meeting notes. When a teammate has a chat worth showing, they share it from the Activity tab. It's an internal "show your work" feed that surfaces good prompts the rest of the team would otherwise never see — the same compounding mechanic as the shared GitHub config, just running in the chat product instead of the IDE.
The questions that came back from the room landed on practical edges. Several attendees noted that the pricing mix is awkward. You're forced into a minimum of five standard seats before you can add a premium one. There's no team-tier equivalent of the $200 Max plan an individual power user can buy. One participant did the math out loud: a developer who actually maxes a $200 premium seat can run up roughly "a $1,000 bill per developer" in real usage — a 5x jump from the listed seat price. The dashboards and shared Projects give you visibility and governance; the seat economics are still catching up to how heavy users actually run.
Others were more taken with the upside: rolling out a shared set of Claude Skills, instructions, and project memory to a whole team in one move. Alicia's deck plus the rollout numbers make it a useful reference point for anyone weighing the same trade.
Tokens Don't Tell You Who's Winning
Some participants moved their teams from individual Claude plans to Claude Teams, partly for the shared visibility into who's using what. But the dashboards only get you so far.
It's the old Dilbert strip about paying engineers per line of code — the punchline being some employee saying he's going to code himself a new minivan. Token usage has the same problem.
A Twitter thread yesterday illustrated the problem. Amazon, apparently, has set up some kind of internal token-usage leaderboard, and employees have responded by writing agents that just burn tokens to climb it. The moment a metric becomes a target, the metric stops measuring what you wanted it to measure. It measures whoever is gaming it, making token usage a terrible proxy for business value.
What actually works is messier and more human. One participant, fresh from a conference, talked his cofounder into spending a few days just token-maxing in Cursor together — no goal beyond stop rationing and see what happens. Several hundred dollars per dev per day, bigger and bigger tasks. "Holy shit, look what we did" days followed. The cofounder didn't need a dashboard. He needed to see what was possible alongside someone he trusted.
A move that surfaced in an earlier session and keeps proving its worth: in your one-on-ones, ask each person, what were you not able to achieve with AI this past week? The question reframes the burden of proof. Instead of asking people to justify their token usage, you're asking them to notice the moments they fell back on manual work, and to bring those moments to you.
The per-member Lines of Code view in Alicia's Telos dashboard makes two points. First, the heaviest "code" producer on her team isn't a developer at all: she's a designer who prototypes in HTML now instead of Figma, hands the result off to engineers, and opens her own pull requests. Second, Claude Code isn't a tool only for engineers. The designer "writes" more code than anyone else, even though code was never a designer's traditional deliverable.
The choices you make here — 1-on-1 dialog, pair programming, empathy, metrics, standards — are the culture. None of them measures as cleanly as token counts. All of them outperform token counts.
Empathy Is the Underrated Tool
The cultural side is where the conversation got unexpectedly tender. "I mean, a lot of this is cultural and social," one participant said. "Which some devs have a real issue with, because they don't believe they are."
The empathy reframing came in a few flavors:
- Some engineers are quietly afraid to exceed their $20 plan because they're trying to be cost-conscious. They're rationing themselves to save your budget, but also strangling their performance. Upgrade them, and tell them you want them to spend more.
- Some engineers think prompting is beneath them — that real engineers should be able to think through the problem and write the code, not delegate to a stochastic tool. The leader's job is to make it culturally safe to delegate and to motivate the desired behavior.
- Some engineers genuinely don't see how the game changed. Have a dialogue. Pair with them. Show them. The conversion happens in a one-on-one, not in a memo.
You can't dashboard your way past a cultural problem. You have to talk to people. "I thought the whole point was that machines were going to take over the world," one participant deadpanned, "and we wouldn't have to worry about all the people stuff."
The people stuff is where the speed gap actually closes.
See also: The AI People Problem and Close the AI Feedback Loop.
Orchestrating Around Hallucination, Not Pretending It's Gone
A separate strand of the conversation came from operators running AI-driven workflows in production, where "trust the model" isn't an option. The shared insight: the answer isn't a bigger model. It's building better orchestration around the model.
One founder showed the pattern. A QA engineer files a task in Linear. The system spins up a sandboxed container. Claude steps through a fixed checklist — simplify the code, run the tests, check performance, check for risk, follow team PR standards, write a clean commit. Each step's output is auditable. If a check fails, the orchestrator can route back to an earlier step. At the end, a Slack message lands with the result.
His preferred orchestration tool is Temporal — a hosted state machine that, unlike Lambda-based alternatives, can run workers on his own infrastructure for as long as the work needs. AI feature work can take twelve hours of compute. CI/CD pipelines weren't built for that. GitHub Actions feels too linear. Temporal lets him write the orchestration as code, with retries and rollbacks supported as first-class features, and run the actual workers on AWS containers (or a GPU, if he needs one) under his control.
In the AI era, the orchestration layer is where the team's institutional knowledge gets codified. Every "thing a senior dev would catch" becomes a step in the chain. The model doesn't have to be infallible; the chain catches the misses. Another participant is building a similar pattern around pull request review: every PR auto-runs a sequence of custom review skills, with the model's findings posted to Slack so a human can quickly decide whether to merge or push back.
This is what AI-augmented engineering at a team scale requires — not a single brilliant prompt, but a chain of small, inspectable steps, documented once and run a thousand times. Without this automation, the AI contribution to quality is far more limited.
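Stripped of Temporal specifics, the chain described above is a fixed, ordered list of inspectable steps, with failures routed back to an earlier step rather than aborting the run. A minimal sketch under those assumptions — the step names and retry policy here are illustrative, not the founder's actual workflow:

```python
from typing import Callable

# Each step takes the task context and returns (ok, note). The orchestrator
# records every output so the whole run is auditable afterwards.
Step = Callable[[dict], tuple[bool, str]]

def run_checklist(task: dict, steps: list[tuple[str, Step]],
                  max_retries: int = 2) -> list[dict]:
    """Run named steps in order; on failure, route back one step and retry."""
    audit: list[dict] = []
    i = 0
    retries = 0
    while i < len(steps):
        name, step = steps[i]
        ok, note = step(task)
        audit.append({"step": name, "ok": ok, "note": note})
        if ok:
            i += 1
            continue
        if retries >= max_retries:
            audit.append({"step": name, "ok": False, "note": "giving up"})
            break
        retries += 1
        i = max(i - 1, 0)  # route back to the previous step and try again
    return audit
```

In a real Temporal workflow, each step would be an activity with durable history and its own retry policy; the shape of the loop — small inspectable steps, documented once and run a thousand times — is the same.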
AI as Opportunity Engine, Not Oracle
A workshop facilitator described a recent one-on-one onboarding with a founder who had a 1,200-line to-do list. The founder was embarrassed by the list — embarrassed to the point that he felt he had to tidy it up before letting a machine even look at it.
The facilitator told him to hand it over messy. Within three minutes, the model came back like a calm friend who'd actually read it: there are some great ideas for blogs here, this is for product, here's the to-do list, you've got way too many, these are the ones that are most like you. The founder didn't need to be told what to do. He needed someone to look at the chaos and say, this is workable.
The takeaway isn't that the AI knew better. It didn't. The AI plowed through the list without reluctance, fear, or apprehension — which is what the founder couldn't do for himself. That posture, AI as an unflinching second pair of eyes, is the version most CEOs haven't fully internalized yet.
I think of AI in these contexts as an opportunity engine. It helps you take advantage of opportunities. If it gets one wrong, well — you could have gotten it wrong, too. You're just rolling the dice faster. The quality assurance is still on you. The judgment is still on you. The willingness to take more shots, on more bets, with shorter feedback loops between them — that's what AI buys you. If the founder didn't like what Claude offered, he could ask for another take and roll the dice again.
One participant added a useful Turing test for the modern era: you can't tell whether it was the human or the AI that made the screwup. Take that seriously, and a lot of the angst about AI errors deflates. Humans make those same errors. The difference is that AI makes them faster, which means you find out about them faster, which means you fix them faster.
Of course, using AI isn't risk-free. CEOs can get carried away, as the facilitator pointed out. "They are run by their agents and their AI," he warned — entrepreneurs who let the agent's preferences quietly become their backlog. "AI addiction," was the reply. "People are just, like, focused on what's easy to do with AI instead of what needs to be done." The opportunity engine cuts both ways. Roll the dice faster on the bets that matter; don't let the dice decide what the bets are.
Related: Map Your Operating System Before You Apply AI for how to find the leverage points in your business before pointing a model at them.
Where to Start This Quarter
If you're a leader who's pulling ahead while your team lags:
- Build a shared team harness. One CLAUDE.md. A curated plugin set. Hooks that enforce the non-negotiables (no secret exfiltration, no destructive commands). Symlink it into everyone's setup. Commit improvements as a team.
- Add observability before you add dashboards. Trace prompts with LangFuse or OpenTelemetry. Or try pair prompting. You can't coach what you can't see. Use your insights for one-on-ones, not for performance reviews.
- Replace token-count metrics with culture moves. Ask every direct report, in their next one-on-one, what couldn't you do with AI this week? Listen for the manual workarounds. Make the wins visible.
- Upgrade the people who are rationing themselves. If your engineers are silently capping their AI spend to look cost-conscious, you have a culture problem disguised as a budget problem. Tell them out loud you want them to spend more.
- Codify your review checklist as an orchestrated workflow. Whatever a senior engineer would check on a pull request — performance, risk, simplification, standards — turn it into a chain of skills that runs automatically. Temporal or any orchestration tool will do. The point is to stop relying on the senior engineer being awake.
- Treat AI as an opportunity engine, not an oracle. Roll the dice faster. Keep humans in the loop where judgment lives. Pair every aggressive bet with a check.
The 4x gap isn't a permanent state. It's the natural lag between one person discovering the new game and a whole team learning to play it. The teams that close the gap fastest aren't the ones with the best tools. They're the ones whose leaders stop measuring tokens and start coaching humans.
Resources From the Roundtable
- Telos Labs — Alicia Rojas's consulting firm; Foundry, AI workflow, and tech-strategy advisory work. Her Claude Teams rollout deck is the source for the dashboard numbers in this essay.
- Claude Code — primary AI coding tool discussed throughout; everyone in the room was using it.
- Claude Teams — shared accounts, activity dashboards, team-level visibility. Several teams have moved off individual plans.
- Temporal — hosted state-machine orchestration; favored for long-running AI workflows that don't fit CI/CD or Lambda's constraints.
- Linear — issue tracker used as the entry point for automated PR-review and code-orchestration workflows.
- LangFuse — open-source LLM observability and tracing.
- OpenTelemetry — the standard underneath LangFuse and most tracing setups; used here as the team's hook into every Claude session.
- Fly.io — container hosting, mentioned with mixed feelings around long-running task handling.
- AWS Step Functions — the orchestration alternative the same founder moved away from.
- Business of Software AI Workshops