A CEO on this week's AI roundtable described something that stuck with me. On a six-hour flight, he'd built a "chief people officer" simulation in Microsoft Copilot — chatting with it about his team, his org design, the challenges his people were facing. When the chat ran out of space (i.e., the context window was too full), he printed it to PDF and re-uploaded the document to keep the memory alive.
Then he said something that resonated: "The AI getting it wrong was almost the most useful thing."
The model didn't know his company. It got things wrong constantly. And every time it did, he had to pause and articulate, in writing, exactly why it was wrong. That process of correction, repeated dozens of times, was the real value. The fact that he was forced to put his own thinking into words gave him a new perspective on his organization.
That same week, another attendee hunted an integer overflow bug. He asked AI to find it. As sometimes happens, the model was confident, fast, and completely wrong about the cause.
These two stories share a lesson: AI is most useful when you stay engaged with it — correcting it, checking it, arguing with it. The magic happens in the back-and-forth.
What follows is shared under the Chatham House Rule.
Executive AI Roundtable
Conversations like the one behind this essay happen every week. I host a closed-door roundtable for founders and C-level leaders navigating AI strategy — no vendors, no pitches, just operators comparing notes.
Request an Invitation →
AI Can't Do Math, and That's More Dangerous Than It Sounds
The integer overflow bug was a spectacular failure. The smaller one — the one that kept coming up in this conversation — was a time zone table.
One of the CEOs had asked his AI to generate a simple artifact: a conversion table for Central Time, UTC, and Eastern Time. The first time he used it, something seemed off, so he checked. It was wrong.
When he pushed back, the model admitted: "I just made an incorrect assumption about the time offset." That fragment says everything. The AI didn't refuse to do math. It didn't reach for a calculator. It just guessed an offset, plugged it into the table, and handed the result back with the same confidence it would have used for anything else.
Despite being software, LLMs are language engines: they predict words, they don't compute numbers. Under the hood, they're doing something closer to "what word usually comes next?" than "what does 1 plus 5 equal?", a fundamentally different kind of computation. They can write software that does math. They can read formulas. They can use calculation tools. They can usually explain their reasoning. But ask them to do arithmetic without the right tools, and they can confidently produce numbers that look right and aren't.
There is nuance here. Think of an exotic sports car: put the cheapest all-season tires from the discount warehouse on it and it performs completely differently than it does on a set of expensive racing tires. Products like Claude Pro ship tools alongside the AI model and configure the system to use them when the situation calls for it. Use the raw API, though, and you don't get that extra tooling and configuration by default.
This is why using Claude Opus 4.6 in Microsoft Copilot and in Claude Pro can feel completely different. Anthropic and Microsoft equip the same AI model with different tires.
For most chat use cases, this is harmless. But a CEO running financial analysis with AI (one of this week's attendees described doing exactly that, in "a zillion Excel sheets") is courting disaster if the tooling isn't right for the job. The model can absolutely tell you whether one number is bigger than another. It can read a P&L and describe a trend. But you don't want the LLM performing the calculations itself, because the answers will be suspect.
The fix is simple but boring: make the AI use tools. Have it write Excel formulas or Python code. Have it produce a markdown file explaining how it arrived at every number, so you can verify the reasoning. For really important questions, I run the same artifact through two different models and look for the points where they disagree. It's peer review for AI.
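To make the time zone example concrete, here is a minimal sketch of what "have it write Python code" looks like in practice. Instead of asking the model to guess offsets, you ask it for a script that derives them from the IANA time zone database. The function name and the three zones are my own choices for illustration, mirroring the CEO's table; the script assumes Python 3.9+ with tz data available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The three columns from the CEO's table: Central, UTC, Eastern.
ZONES = {
    "Central": "America/Chicago",
    "UTC": "UTC",
    "Eastern": "America/New_York",
}

def offset_table(when: datetime) -> dict[str, str]:
    """Return each zone's local time and UTC offset at a given instant.

    Deriving offsets from the tz database, rather than letting a language
    model guess them, also gets daylight saving time right for free.
    """
    table = {}
    for label, tz in ZONES.items():
        local = when.astimezone(ZoneInfo(tz))
        table[label] = local.strftime("%Y-%m-%d %H:%M %z")
    return table

if __name__ == "__main__":
    # A fixed instant, so the table is reproducible.
    now = datetime(2025, 7, 1, 15, 0, tzinfo=timezone.utc)
    for label, stamp in offset_table(now).items():
        print(f"{label:8} {stamp}")
```

The point isn't this particular script; it's that the arithmetic now happens in deterministic code you can inspect, not inside the model's next-word machinery.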
A founder running a thirty-person company without a CFO can get genuinely useful financial commentary from AI. But only if the model relies on tooling to do the math.
Same instinct as the previous roundtable post on closing the AI feedback loop: the value lives in the loop you build around the answer.
The IP Question Nobody Wants to Answer
The next thread is the one most leaders quietly avoid. Where, exactly, is your team's data going? Who has access to it?
One founder described running his company entirely inside Microsoft. The reasoning makes sense: everything's already there, the contract is in place, and the data has nowhere new to go. Copilot feels like an extension of the existing trust boundary. But the configuration is only legible to a Microsoft expert.
For instance, who on his team can actually see what's inside Copilot? He'd built an agent in Microsoft's agent studio and started worrying he might accidentally share sensitive information across the team. He didn't think it would actually happen. But the doubt adds friction, especially when it comes to compensation data or detailed financials.
This is an area where an AI policy and training can help. If the CEO is uncertain, employees definitely are. Researching and documenting how access actually works keeps employees from guessing, or from avoiding AI out of uncertainty.
The legal version of the IP question is a separate thing, and it has a crisper answer. Trade secrets only exist as long as you don't disclose them to a third party without an agreement. Paste a process into a free chat tool and you've published it. That's the principle behind the old advice to avoid free tools: the legal protection evaporates the moment you hit submit, regardless of the vendor's intentions.
Most small companies aren't going to enforce their IP in court. But most have one or two things — a manufacturing process, a customer list, a model of how a product actually works — that they'd rather not risk giving a competitor access to through a general-purpose AI. The simple rule: if you wouldn't paste it into an email to a stranger, don't paste it into a free AI tool.
This is where local models get interesting, and it's the only part of this section that most IP advice gets wrong. A model running entirely on your own machine, with no network calls, is treated by trade secret law the same way an Excel spreadsheet is. You haven't disclosed anything. You haven't published anything. The trade secret survives. Local models are slower, weaker, and a pain to run, so they're not the right answer for most workloads today. But for the handful where you really need secrecy, they're the best answer.
How protected is your company?
Take the 2-minute AI IP Risk Assessment to score your organization across four dimensions — IP protection, policy coverage, documentation readiness, and vendor risk.
Take the Assessment →
Shadow AI Lives in Tools You Already Pay For
Every CAD program is adding AI now. Adobe is adding AI. Google Workspace is adding AI. Your accounting software is probably adding AI. The vendor pushes an update, you install it, and suddenly a feature inside an existing license is doing something with your data. The only way to know what it's doing is to read the EULA every time it changes.
One attendee mentioned that he never knows what's active inside Google Workspace, AI-wise. On top of that, the privacy settings drift with little warning, and the defaults always seem to favor sending more data. As he put it, "if you're not paying for the product, you're the product."
That's the version of shadow AI nobody has a policy for. Not the tools your team is using deliberately, but the AI features popping up inside every software product you already pay for.
There's no clean fix and no setting you can flip to prevent this. You can't realistically audit every license your company holds. What you can do is establish a principle: for every new vendor and every new terms of service, someone reviews how your data is used and how to opt out of sharing it or training on it.
I wrote more about this in Shadow AI: The Tools Your Team Uses Without You. The roundtable version of the conversation just kept adding examples.
Principles, Not Rules
The last theme came up at the end of the conversation, almost as an aside, and it might be the most actionable one.
One founder admitted he hadn't published any AI policy yet — and the reason was very specific. He has a team member who reflexively rejects any published rule. Doesn't matter what the rule is. Writing it down triggers a fight.
"And so then, not having any rule…" — he trailed off, but the implication was clear. No rule means no friction, but also no protection.
This is the part where many advisors would insist you simply must publish written rules. But if this employee is valuable to your organization, start with a one-on-one conversation instead. To head off a reflexive rejection, show the person respect and give them a real opportunity to weigh in. Hopefully, you can agree on a set of principles around AI usage.
Principles look like:
- We protect our IP. That means we don't paste sensitive material into free tools.
- We use the tools the company pays for. If you need a different one, ask, and we'll buy a license.
- We assume AI can be wrong. Important outputs get checked.
- We don't put data into tools we haven't reviewed.
If your rule-hater can't agree to principles, you have a harder decision to make: which matters more, an AI policy or this employee? If they do agree, that agreement can, hopefully, grow into a suitable written policy.
This is the same observation I made in The AI People Problem — most AI rollouts fail because of culture, not technology. A shared understanding of the why behind policy is how you ship culture. The handbook comes later — if it comes at all.
What I Took From This Session
A founder running a small company is not going to build the AI infrastructure of a Fortune 500. There's no CISO, no AI ethics board, no full-time policy team. What there is, instead, is one or two people who care, who notice, and who keep adjusting course as the tools beneath them change.
The companies that get this right pay attention and respond to change. The policy follows.
The CEO who built the CPO simulation didn't get a great org chart out of it. He got something more valuable: a clearer articulation of what he actually believed about his team. The one who hit the time zone bug now treats every AI calculation as suspect. The one who's nervous about publishing AI rules has discovered something worth being nervous about — his company's AI usage outgrew its governance, and he noticed before it cost him.
Every company working with AI right now is writing the playbook as they go. The ones in this room are at least writing it with their eyes open.
Resources From the Roundtable
- Microsoft Copilot — the central AI tool one of the founders is using inside his Microsoft tenant; came up as the home for his "chief people officer" experiment and his agent-studio worries
- Claude — both attendees said they'd switched to Claude from ChatGPT, primarily because of Anthropic's positioning and brand decisions
- Claude Code — Anthropic's command-line agent; came up as the natural next step beyond chat for one founder building a dashboard overlay on his ERP
- Adobe Creative Cloud and Google Workspace — both mentioned as examples of "shadow AI" — products where AI features have appeared inside existing licenses without an obvious opt-in
