Moving Average Inc.

A B2B SaaS Summary of the AI Engineer World's Fair 2024

A practical perspective on how AI is applied in a business context

John M. P. Knox

Founder

Disclaimer: This essay offers a practical perspective on AI for B2B SaaS businesses. The AIEWF targets engineers and tech business leadership, a very different audience. Most folks attend AIEWF to get up to speed on the latest AI technologies, learn how to improve system performance, and so on, not for business applications. I don’t want to leave a poor impression of the AIEWF.

I witnessed many impressive demonstrations at the 2024 AI Engineer World’s Fair, but the practical business uses are few and far between. I saw no hints that we were close to the El Dorado of AGI or business agents running without human review. Most vendors I saw at AIEWF sold AI tools and services for developers, not end-user applications.

A presentation at the 2024 AI Engineer World's Fair.

Want a hosted LLM like Llama or Mistral? Do you need dedicated GPUs for training? Would you like a solution for building sophisticated LLM workflows? How about vector databases? You could talk to these kinds of vendors on the expo floor. They provide all the components needed to make anything from a custom foundation model to a state-of-the-art RAG workflow.

But you’d need to build your application from parts like you might design a shell script. I didn’t talk to many attendees or expo-floor vendors who sold off-the-shelf applications that use AI technology to increase revenue or operational efficiency. This is great news for application developers! There are many opportunities for AI B2B applications.

What is AI?

AI means less than it once did, in a way. Non-LLM models were conspicuously absent, and only a few attendees at AIEWF seemed to notice. Many attendees seemed blissfully unaware that AI research dates back to at least the 1950s. Computer vision, Bayesian networks, classifiers, and classical neural networks were not among the topics of discussion.

Large language models (LLMs) were overwhelmingly the focus. As a result, most of the applications focused on language as a product. Unfortunately, language as a machine interface has the same problems as language as a human interface. Ambiguity, confusion, and bullshitting are now computer problems, not just human problems.

This may be an advantage for application vendors. While the startup market focuses on language applications, there may be opportunities for integrating computer vision or business data analysis.

Fancy Words

Like any new technology trend, AI has developed an insular vocabulary, much of it hazily defined. Here are some words that the cool AI kids throw around these days.

Evals are a tool of quality engineering. In general, evals (evaluations) assess the capabilities of an AI software system by examining its output across a suite of test inputs. For instance, a prompt requiring JSON output should produce valid JSON, or the eval should fail. They're the unit tests of the AI world, usually with a less black-and-white interpretation.
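The JSON example above can be sketched in a few lines. This is a minimal illustration, not a real eval framework: the `run_model` function is a stand-in for an actual LLM call, and the prompts are hypothetical.

```python
import json

def run_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned response here."""
    return '{"sentiment": "positive"}'

def json_eval(prompts: list) -> float:
    """Score a suite of prompts: a case passes if the output parses as JSON."""
    passed = 0
    for prompt in prompts:
        output = run_model(prompt)
        try:
            json.loads(output)
            passed += 1
        except json.JSONDecodeError:
            pass  # eval failure: model emitted invalid JSON
    return passed / len(prompts)

score = json_eval(["Classify this review as JSON: 'Great product!'"])
print(f"pass rate: {score:.0%}")
```

Real eval suites add fuzzier scoring (semantic similarity, LLM-as-judge), which is why results are rarely as binary as a unit test.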

An AI agent is a system that runs automatically. The notion of agents seems incredibly fuzzy. At one end of the spectrum, Zapier could qualify as an agent (although the audience here might object). At the other end of the agency spectrum, an AI agent can replace a role or an entire department within a business. The human-replacement end of the spectrum isn’t there yet. Even at this conference at the bleeding edge, I didn’t speak with anyone who used an agent in business without human supervision.

AGI, or Artificial General Intelligence, seems nearly undefined. This term seemed to be used interchangeably with “superintelligence.” Everyone seems to agree that AGI might become more intelligent than we are, but the primary “Job to be Done” discussed appeared to involve the extinction of humanity. I hope someone will teach AGIs about maintenance before OpenAI or Tesla builds Terminator bodies. I didn’t speak with anyone who claimed we were close to AGI technology.

Hallucination is the fancy term for falsehoods invented by an AI system. You can think of hallucinations as AI bullshit. LLMs may produce confident-sounding text, but it isn't always correct.

RAG (Retrieval-Augmented Generation) is a technique in which an AI system retrieves (hopefully) factual information to use as the basis of a response to a query. This approach reduces hallucinations and allows an AI system to provide information beyond the training data of its components.
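The core RAG pattern is just "retrieve, then stuff into the prompt." Here is a toy sketch of that pattern; the keyword-overlap retriever and the sample documents are placeholders (real systems use a vector store and an actual LLM call after the prompt is built):

```python
def retrieve(query: str, docs: list) -> str:
    """Toy retriever: pick the document with the most words in common
    with the query. Real RAG systems use embedding similarity instead."""
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return max(docs, key=overlap)

def build_prompt(query: str, docs: list) -> str:
    """Assemble the augmented prompt that gets sent to the LLM."""
    context = retrieve(query, docs)
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

docs = [
    "refunds are issued within 5 business days.",
    "support hours are 9am to 5pm Pacific.",
]
print(build_prompt("how long do refunds take?", docs))
```

Because the model is instructed to answer from retrieved context, its response is grounded in documents you control rather than whatever its training data happened to contain.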

Vector stores or vector databases are a technology that stores and retrieves text as numeric embedding vectors. In general, a vector store allows you to search a database by similarity in meaning rather than by exact keywords. RAG applications frequently use this technology to find information relevant to a query.
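Under the hood, "similarity in meaning" usually means cosine similarity between embedding vectors. This sketch uses made-up three-dimensional vectors as stand-ins; a real system would get high-dimensional embeddings from a model:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for two support articles (hypothetical values).
store = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "What payment methods do you accept?": [0.1, 0.9, 0.2],
}

def search(query_vec: list, k: int = 1) -> list:
    """Return the k stored texts closest in meaning to the query vector."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query vector near the "password" article retrieves it, even though
# the query shares no keywords with the stored text.
print(search([0.85, 0.15, 0.05]))
```

Production vector databases add indexing structures (such as approximate nearest-neighbor search) so this lookup stays fast across millions of documents.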

LangChain, LangGraph, Langflow, and similar-sounding products allow multi-step AI systems to be built as a series of steps with minimal coding. These tools are like shell scripts for AI, though don't say that in front of one of the vendors.
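The "series of steps" idea can be shown in plain Python. To be clear, this is not the API of LangChain or any of these products; it's a generic sketch of the pipeline concept they package, with the LLM calls replaced by stubs:

```python
def chain(*steps):
    """Compose steps left to right, like piping commands in a shell script."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Stand-ins for LLM-backed steps (a real pipeline would call a model).
summarize = lambda text: text[:40]         # pretend summarization
translate = lambda text: "[ES] " + text    # pretend English-to-Spanish step

pipeline = chain(summarize, translate)
print(pipeline("A long customer email about a billing issue ..."))
```

The commercial tools add the parts this sketch omits: branching, retries, observability, and visual editors for non-programmers.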

Business Applications of AI

I asked anyone who would listen how businesses use AI today to generate more revenue or increase efficiency. Out of the hundreds of attendees, I heard three practical applications of AI in B2B SaaS businesses.

The first application is RAG-based customer service tools. I spoke with a few businesses using RAG to automatically produce responses to customer support inquiries using an LLM referencing their support documents in a vector store. However, I don’t believe anyone I spoke to allowed these systems to reply to customers without human review. Folks have also built similar systems for internal users to find and synthesize information within internal business documents.

The second application is an automated LLM that provides code-review feedback to pull requests, summarizes pull requests, and suggests ways to resolve CI/CD system errors. This is a simple extension of AI into existing software best practices.
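The essence of these code-review bots is prompt assembly around a diff. Here is a minimal, hypothetical sketch of that step; the diff is invented, and the actual LLM call and CI integration are omitted:

```python
def review_prompt(diff: str) -> str:
    """Build a code-review prompt from a pull-request diff.
    A real tool would send this to an LLM and post the reply as a PR comment."""
    return (
        "You are a code reviewer. Summarize this diff and flag risky changes.\n"
        "Diff:\n" + diff
    )

diff = "-    timeout = 30\n+    timeout = 3\n"
print(review_prompt(diff))
```

The value here is workflow fit: the model's output lands where reviewers already work, as a comment on the pull request, rather than in a separate chat window.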

The final application has been commercialized at Kudu, which monitors press releases, blog posts, job postings, and other sources to discover new potential sales targets for B2B SaaS companies. This is the most exciting example of a practical AI agent I saw at the conference because it lies precisely at the sweet spot of interpreting textual information and finding revenue opportunities. I hope to see more valuable business applications of LLMs in the future.

The folks selling shovels to the AI gold miners are a different story. I suspect many expo floor vendors will ultimately get acqui-hired, pivot, or shut down. Optimizing LLM pipelines, hosting AI, and competing with LangChain will require massive (and wise) technical, marketing, and sales investments. Even Amazon, Microsoft, and Google may struggle to differentiate themselves in AI. We’ve seen this story before — some VC-funded businesses make it to orbit, and most fall back to earth. Let the big dogs and VC-funded monsters fight over the expensive-to-develop foundational models.

If you want to sell AI products instead of using AI technology to improve your business processes, find a niche where customer budgets are large and hallucinations aren’t a problem. There are many business operations, like sales and marketing, where surfacing possibilities (instead of definite data) is acceptable. If data is represented by text, or you can transform the data into text, there may be an opportunity to use an LLM.

While employees have many benefits over LLMs, AI will likely always have an advantage over humans when searching for subtle signals in tedious mountains of data. You can see this even today in medical applications like radiology, where AI models can be more sensitive to signs of disease (or perhaps more resistant to tedium) than humans. Imagine applying this sensitivity to search results, blog posts, podcasts, video transcripts, product announcements, Amazon listings, analytics — any application where human attention hasn’t scaled with the tsunami of data.

Alternatively, build a product with large margins to cover the cost of human supervision. We are trying to build revenue, not excitement. I wouldn’t try to create a better vector store or LLM! Those businesses are likely subsidized by venture money.

Conclusion

While the AIEWF left me feeling excited about the future of AI, it also left me with several questions about the current hype. Autonomous agents and AGI aren’t yet practical. The marketing narratives (i.e., the hype) around them feel massive, but they’re still science fiction. It reminds me of the (now deflated) hype around DAOs (Decentralized Autonomous Organizations) in crypto. They sound cool, but without a concrete example, it is hard to weigh their pros and cons.

It also isn’t clear to me (or many folks I spoke with) that LLMs are on the road to AGI. LLMs are an incredible technology, but ultimately, they transform and combine existing information. Arguably, this is a kind of cognition, but it doesn’t seem like a universal form of cognition. My fellow attendees suspect this, too — several pointed out that human consciousness stems from a combination of different elements. For example, the language parts of our brain can’t make decisions without the emotional bits.

AGI might not follow our natural brain architecture, so take my skepticism with a grain of salt. Still, I suspect the pieces needed for fully autonomous agents don’t yet exist and that a better LLM alone won’t result in AGI.

In addition, language as the AI interface has issues. Knox’s first law* is that people don’t read. TikTok, YouTube, and Netflix are massive. Netflix has more than ten times the number of subscribers as the New York Times. Text isn’t that popular! And yet, current trends rely so much on textual output and input.

As demonstrated by OpenAI in the closing keynote, conversational audio approaches would seem to remove the need to read and write, but they aren’t yet natural. An audio ChatGPT sounds pretty similar to the responses from its text interface. That is to say, ChatGPT talks like a university professor after voice-acting lessons. It’s more often a lecture than a conversation, which was apparent from how readily the OpenAI employee interrupted ChatGPT every few seconds. This felt rude and didn’t jibe with typical human discussions. When humans talk, we follow cues from each other and seek clarification as a feedback loop. LLMs give the impression that they’re impatiently trying to wrap up the conversation at every step, and it’s exhausting.

What about video and image generation? While exciting, the technology is difficult to control. If a picture is worth ten thousand words, then the prompt size for visual generative AI isn't large enough. When tools like DALL-E can reliably incorporate text or render human interactions, we'll have a breakthrough in AI interaction. Until then, prompting image and video systems remains a dark art requiring trial and error.

There is good news. AI in the form of LLMs shines today in observing new language-borne information and helping synthesize it into valuable signals or retrieving and repackaging information from a database. LLMs also excel in translating language from one form to another, such as translating English to Spanish or Jira tickets to Python programs. They’re also helpful in interacting with some kinds of APIs and other tools, but generally only under human supervision.

Founders should consider where monitoring or synthesizing information embedded in language can help automate a process or alert the business to new opportunities (or threats). I suggest avoiding most “AI” businesses unless you can differentiate yourself from the competition with a single, obvious sentence. The best AI opportunities may be business tools, not tools to build AI products.

* What, you don’t name funny observations after yourself?

Want to Talk?

Schedule a fixed-fee micro-consultation with John.