Curbing AI hallucinations before they start

If you want to know how far off the rails your generative AI tool might go before you ask it to do any work, ask it to describe what its limitations are, the head of an AI company says. That gives you a roadmap for tweaking your data sources and the tool’s programming before you deploy it.

“These tools are very clear about what they do and don’t do,” Don White, CEO of Satisfi Labs, told Legal Dive. “If you ask any of these large language models what their limitations are, they will give you a very comprehensive list. I don’t think enough people are asking them.”

The possibility that your generative AI tool will produce hallucinations or otherwise provide incorrect content has become a concern among companies as they look to leverage the technology to improve work quality and efficiency.

A widely reported mishap earlier this year, in which an attorney’s court filing cited cases fabricated by an AI tool, has companies hitting the pause button. But a good generative AI tool will tell you which sources it’s pulling data from and what it’s programmed to do if it can’t fulfill your query from those sources.

“It’s like a hammer and nail,” said White, who co-founded Satisfi seven years ago to help companies use AI to extend the reach of staff in interacting with customers. “You can use it wrong and put the nail in your hand and smack it.”

Data sources

If a query indicates your data sources are too limited, or contain outdated information, or don’t have sufficient constraints – that is, if it’s pulling data from the internet without guardrails – you can rethink the sources you want the tool to pull from, White said.

Likewise, if a query indicates the tool is making up content when the data sources are insufficient, you can rethink how you want it to respond when it doesn’t have the data it needs.

“The question becomes, what do you want it to do if it doesn’t know?” White said.

It’s crucial to have a firm idea before you deploy the tool about what is, and isn’t, a proper answer, and then test for that, he said.

If you don’t want the tool pulling incorrect information from its data sources, for example, test the tool on matters you know about so you can see if it’s giving you the right kind of response.

“If you saw a case in [your test] that you know wasn’t part of your source data, that’s your check that it’s gone outside the lines,” White said. “What the LLM will do at times, when you ask it to draw some conclusions, is actually insert what it needs to support its argument, whether or not it exists.”
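The check White describes can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method: the function name, the case names and the idea that citations have already been extracted from the response into a list are all assumptions made for the example.

```python
def find_out_of_source_citations(cited, source_cases):
    """Return the cited items that do not appear in the approved source set.

    Comparison ignores case and surrounding whitespace (an assumption;
    real citation matching would need to be more careful).
    """
    normalized_sources = {case.strip().lower() for case in source_cases}
    return [c for c in cited if c.strip().lower() not in normalized_sources]


# Hypothetical source data and a hypothetical test response.
source_cases = {"Smith v. Jones", "Doe v. Acme Corp."}
cited_in_response = ["Smith v. Jones", "Brown v. Nowhere Airlines"]

outside = find_out_of_source_citations(cited_in_response, source_cases)
print(outside)  # any entry here means the tool went outside the lines
```

An empty list means every citation in the test response traces back to the source data. Anything else is a prompt to review the sources or the tool's instructions.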

If it is going outside the lines, that’s your cue to see if your data sources are sufficient and if it has adequate programming to know what it doesn’t know and stop itself.

Response style

Along with testing the content, you want to test the manner in which the tool responds, because that can be critical on the liability side.

If it doesn’t know something, or the query goes outside the parameters of what it’s programmed to answer, or it’s asked something inappropriate, what do you want the tool’s response to be?

Do you want the tool to repeat the question before declining to answer or just decline to answer? Or is there another response you want it to give?

“It could be something simple like, repeat back the question first,” said White, or “it could simply ignore the query: ‘Hey, I know you … said something nefarious and inappropriate. I only know [XYZ] policy.’ Or it could ignore what the person said and say, ‘I only know about [XYZ] policy.’”
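One way to make that choice explicit is to treat the decline style as a setting that is decided, and tested, before deployment. The sketch below is only an illustration under that assumption. The names `DeclineStyle` and `build_decline_response` are hypothetical, and the wording of the responses is a placeholder for whatever the legal team approves.

```python
from enum import Enum


class DeclineStyle(Enum):
    REPEAT_THEN_DECLINE = "repeat_then_decline"  # echo the question, then state scope
    DECLINE_ONLY = "decline_only"                # state scope without echoing


def build_decline_response(question, topic, style):
    """Build the reply used when a query is out of scope or inappropriate."""
    scope = f"I only know about {topic}."
    if style is DeclineStyle.REPEAT_THEN_DECLINE:
        return f'You asked: "{question}" {scope}'
    return scope


# Hypothetical usage with a placeholder topic.
reply = build_decline_response(
    "What is the CEO's home address?",
    "stadium policy",
    DeclineStyle.DECLINE_ONLY,
)
print(reply)
```

Keeping the style in one place makes it easier to confirm during testing that the tool never echoes an inappropriate query if the company has decided it should not.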

Conversational AI

Satisfi launched with a focus on sports and entertainment companies. For these companies, White said, there tends to be a big gap between staff size and the number of customers attending an event. Think of tens of thousands of fans at a concert or at a football game compared to just a few staff available to answer their queries.

To help companies augment staff, Satisfi offers an app that fans can download to their phones to get their queries answered by a chatbot.

“It’s about how you, as a fan, can find out everything around you without moving from your seat,” White said. “A human could not even do that. AI is the only way that could ever happen.”

When a fan queries the tool, the accuracy of the response depends on how well the tool was set up at deployment – what data sources it was given, whether it was corrected during testing if it gave an incorrect answer or went outside its data sources, and so on – and how well the data sources are kept updated.

Considerations about set-up and ongoing data quality apply to any generative AI tool, not just Satisfi’s.

In Satisfi’s case, White said, the company maps out data quality upfront.

“We set up [updating] as part of the onboarding,” White said. “We have an understanding of what data changes, how often it changes, and then give the client the opportunity to update it themselves in real time or reach out to us.”

Most clients manage the task themselves. “Usually it’s marketing or guest services departments” who keep the data updated, he said.
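The idea of knowing what data changes and how often can be sketched as a simple staleness check. This is an assumption-laden illustration, not a description of Satisfi's onboarding: the source names, dates and maximum ages below are invented for the example.

```python
from datetime import date, timedelta


def stale_sources(sources, today):
    """Return the names of sources whose last update is older than allowed.

    `sources` maps a source name to (last_updated, max_age_days).
    """
    return sorted(
        name
        for name, (last_updated, max_age_days) in sources.items()
        if today - last_updated > timedelta(days=max_age_days)
    )


# Hypothetical sources: a menu that changes weekly, a policy that rarely changes.
sources = {
    "concession_menu": (date(2023, 9, 1), 7),
    "venue_policy": (date(2023, 6, 1), 180),
}

overdue = stale_sources(sources, today=date(2023, 9, 15))
print(overdue)
```

A report like this could be sent to whichever department owns the data, so that outdated answers are caught before a customer sees them.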

Unique to his company’s software, White said, is a response moderator that blocks any response falling outside the criteria set for the tool at deployment. That helps keep the tool within the parameters of its data sources and stops it from going off the rails.

“When it goes outside of [the expected outcomes], it violates the criteria,” White said. “Then we can go back and the moderator will rescind it and say, ‘Try again; you didn’t get it right,’ until it passes the test.”

The company has a patent pending on the moderator. “It literally constrains the generative AI to exactly what the client says is available information to answer the question,” he said.
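Satisfi has not published how its moderator works, so the following is only a generic sketch of the check-and-retry pattern White describes. The generator and the criteria check are stand-in functions, and the cap on attempts with a fixed fallback reply is an assumption added here so the loop always ends.

```python
def moderated_answer(generate, passes_criteria, question,
                     max_attempts=3,
                     fallback="I don't have that information."):
    """Ask for a draft, reject it if it fails the criteria, and try again.

    `generate` and `passes_criteria` are caller-supplied functions
    (hypothetical stand-ins for a model call and a deployment-time check).
    """
    for _ in range(max_attempts):
        draft = generate(question)
        if passes_criteria(draft):
            return draft
    # Assumption: give a safe reply rather than retrying forever.
    return fallback


# Stand-in generator: the first draft is invented, the second is grounded.
drafts = iter(["Gates open at noon on Mars.", "Gates open at 5 p.m."])
approved_facts = {"Gates open at 5 p.m."}  # hypothetical client-approved data

answer = moderated_answer(
    generate=lambda q: next(drafts),
    passes_criteria=lambda d: d in approved_facts,
    question="When do gates open?",
)
print(answer)
```

In this toy run the first draft is rejected and the second is returned. A real criteria check would be far more involved than an exact match against a set of approved sentences.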

In-house role

Whatever tool a company ends up using, there’s a role for in-house counsel, White said.

It’s incumbent on the legal team to limit liability by writing a set of terms and conditions governing use of the AI tool. That will help ensure users’ expectations are aligned with the risk the company is prepared to take.

“I’ve spent more time with lawyers on this topic in the last two months than in the last five years,” he said. “If in-house lawyers aren’t focusing on this space, they’re missing a lot of opportunity” to add value as their company deploys the technology.