Thinks Out Loud: E-commerce and Digital Strategy podcast

The AI Coin Flip: Why AI Gives Every Customer a Different Answer (Digital Reset Episode 488)


Rand Fishkin’s team ran 2,961 prompts across ChatGPT, Claude, and Google AI. 600 volunteers, 12 different prompts, two months of runs. They wanted to answer one question: how often do you see the same list of brand recommendations twice, even with the exact same prompt?

The answer? Less than 1% of the time. The odds of seeing the same list in the same order are closer to one in a thousand.

Most conversations about AI inconsistency treat it as a measurement problem: how do I know if my brand is showing up? That’s a legitimate question. But it’s not the only question. And it might not even be the most important one.

If AI systems give different recommendations essentially every time, the same inconsistency is already baked into every AI chatbot you’ve deployed — your hotel chat widget, your B2B sales assistant, your customer service tool. Most teams have never measured it. And some of those inconsistent answers are already driving negative reviews for your brand and business.

This episode connects three stories Tim has covered over the last three weeks — the AI value gap, the uncertain timeline of agentic commerce, and now AI inconsistency — showing that they all stem from the same underlying condition. It also explains what City of Hope, appearing in 69 of 71 AI responses for "West Coast cancer care hospitals," tells us about how you can fix this problem for your business.

Key Insights for Strategic Leaders to Close the Gap
In this episode, Tim Peter breaks down:

  • The full SparkToro/Gumshoe.ai research — and what it actually means. Rand Fishkin and Patrick O’Donnell ran nearly 3,000 prompts with 600 volunteers. The list of brands recommended changed more than 99% of the time. Here’s why that reframes everything about how you should be tracking AI visibility.
  • The operational problem most people are missing. AI inconsistency isn’t only a marketing measurement challenge — it’s a liability inside your own deployed tools. Your AI chatbot may be giving materially different answers to different customers right now. And, it’s almost certain that no one on your team is measuring that.
  • City of Hope: what 97% consistency looks like. Why City of Hope appeared in 69 of 71 AI responses for "West Coast cancer care hospitals" and what that reveals about how AI decides which brands it’s willing to commit to — and which ones it isn’t.
  • Why "post more content" is the wrong strategy. How AI actually works: triangulation across independent sources, why your own website is a low-weight signal, and what "digital witnesses" means for building prompt brand equity that holds up.
  • The King, Queen, and Crown Jewels operating model. Content is king, customer experience is queen, and data is the crown jewels, not just as a branding concept, but as the mechanism that drives the AI’s confidence in your brand.
  • Four moves to make this week.
    • Shift from rank to frequency measurement.
    • Audit your deployed AI tools for consistency before worrying about external AI visibility.
    • Build credible witnesses, not content volume.
    • And treat review velocity as a strategic input, not just a reputation metric.

Whether you’re in hospitality, retail, or B2B, this episode is for anyone who’s deploying AI in a customer-facing role… or who’s being asked to report on AI visibility and wants a better sense of what they’re actually measuring.

The AI Coin Flip: Why AI Gives Every Customer a Different Answer (Digital Reset Episode 488) — Headlines and Show Notes

Show Notes and Links

Buy the Book — Digital Reset: Driving Marketing and Customer Acquisition Beyond Big Tech

Tim Peter has written a new book called Digital Reset: Driving Marketing Beyond Big Tech. You can learn more about it here on the site. Or buy your copy on Amazon.com today.

Past Appearances

Rutgers Business School MSDM Speaker Series: A Conversation with Tim Peter, Author of "Digital Reset"

Free Downloads

We have some free downloads for you to help you navigate the current situation, which you can find right here:

Subscribe to Thinks Out Loud

Contact information for the podcast: [email protected]

Past Insights from Tim Peter Thinks

Technical Details for Thinks Out Loud

Recorded using a Shure SM7B Vocal Dynamic Microphone and a Focusrite Scarlett 4i4 (3rd Gen) USB Audio Interface.

Running time: 22m 01s

You can subscribe to Thinks Out Loud in iTunes, the Google Play Store, via our dedicated podcast RSS feed (or sign up for our free newsletter). You can also download/listen to the podcast here on Thinks using the player at the top of this page.

Transcript: The AI Coin Flip: Why AI Gives Every Customer a Different Answer

Welcome back to the show. I’m Tim Peter.

I’ve talked about a concept called "Prompt Brand Equity" for a while now: the idea that what matters in AI search isn’t where your brand ranks. It’s whether you show up at all, whether your brand shows up at all. And I mentioned some early research from Rand Fishkin at SparkToro showing that AI recommendation lists were unpredictable. You could be number one in one chat and number three in the next, even with the exact same prompt from the exact same person.

Well, the full research is out now, and the numbers are much more striking than I expected. Rand’s team ran 2,961 prompts through ChatGPT, Claude, and Google AI with 600 volunteers over two months. The question they were trying to answer: how often do you see the same list of brand recommendations twice, even if you run the exact same prompt over and over?

The answer? Less than 1% of the time. I want to say that again. Less than 1% of the time do you see the same list twice. In other words, practically never. That has real consequences for how you measure your business’s AI visibility and for how you think about the AI tools that you’ve already deployed in your own business.

And for the ROI gap that I covered in episode 486, which, if you missed it, was about why 88% of companies are using AI but only 6% are seeing significant value from it. It turns out these trends, these facts, are connected. Today I want to get into how.

This is episode 488 of Digital Reset with Tim Peter. I’m Tim Peter. Let’s dive in.

Okay. Let me start with what the research actually found, because the headline number undersells it a little. Rand Fishkin partnered with Patrick O’Donnell at a company called Gumshoe.ai. They recruited 600 volunteers to run 12 different prompts, things like "recommend headphones under $300," or "what are the best project management tools."

They ran these through ChatGPT, Claude, and Google Gemini, Google AI, over and over for two months, nearly 3,000 runs in total. And what they found is this: the list of brands recommended changes more than 99% of the time. The odds of seeing the same list in the same order twice are closer to one in a thousand.

That’s nuts, right? So I wanna be fair about what this means and what it doesn’t mean. It doesn’t mean that AI is useless. It doesn’t mean that brand mentions in AI are random and it definitely doesn’t mean you should give up on showing up in AI answers. Far from it.

What it means is that where you appear in any given AI response, whether you’re number one or number three, tells you essentially nothing. That position is random. It’s not predictive of anything to you or to your business.

The useful metric isn’t rank. It isn’t where you show up. It’s frequency. How often does your brand appear at all, across a large sample of runs on the questions that matter to your customers. That number tells you something real.

That number is, of course, prompt brand equity, not position, frequency.

I mentioned Rand’s early work on this in episode 485, when I talked about how we’ve moved from a world of card catalogs to a world of concierges. The new data just puts specific numbers on what we already expected. The picture is much clearer now and, if I’m being really honest, a little more dramatic than I expected.

Now, here’s what I think is the most under-reported part of the story, and it matters a lot if you’re in hospitality or, honestly, if you’re in any business that has deployed AI in a customer-facing role.

When people talk about AI inconsistency, they almost always frame it as a marketing measurement problem. You know, how do I know if my brand is showing up? And it’s a legitimate question. I’ll get back to that in just a moment, but it’s not the only problem here.

The second problem is operational, and it’s happening right now in your business.

If AI systems give different recommendations essentially every time to customers asking the same question, what do you think is happening when your AI chatbot answers the same question from two different customers? Think about that for a moment.

Let me give you a hospitality example. Let’s say you’ve got a guest who opens your hotel’s chat widget and asks, what’s the cancellation policy for a reservation that’s made between this date and this date? And your AI gives them an answer. An hour later, a different customer asks the exact same question because they’re staying during a high demand period.

Your AI might give a slightly different answer. Maybe the window is 48 hours in one response and 24 in another. Maybe it says there’s a charge for one and not for the other. Maybe the amenities that are included get described differently. Maybe the rate quote shifts. Is that happening? Almost certainly.

Has anyone on your team actually measured it with any consistency? Almost certainly not.

By the way, this is not just a hospitality problem, it’s a B2B problem too. It’s a retail problem too. If you are using an AI tool to help your sales team respond to prospect questions or using AI to handle initial customer service inquiries, the same inconsistency is baked into those responses. Your AI isn’t giving a consistent answer. It’s giving a distribution of answers, and your customers are sampling from that distribution. That’s an operational liability, not just a measurement inconvenience.

In hospitality especially, where the expectations a guest forms before arrival are a huge driver of satisfaction, or dissatisfaction, and where dissatisfied guests write reviews that the AI then reads. And I don’t mean the AI on your website; I mean ChatGPT or Claude or Google Gemini or Google AI Overviews or AI Mode. This is a problem you need to think carefully about.

So if AI recommendations are this inconsistent, why is it that some brands do really well and others don’t? Why do some brands appear in almost every other response while others don’t appear in very many at all? And there’s a phenomenal example, just an amazing example from the research that I want to share.

Rand’s team asked Google specifically to recommend "West Coast cancer care hospitals" 71 times, and City of Hope appeared in 69 of those responses. That’s 97% of the responses.

That’s not luck, that’s not some glitch in the algorithm. City of Hope appears in 97% of those responses because the AI’s confidence weight for City of Hope on the topic of "West Coast cancer care" is so high that it essentially has to include them. There’s too much validating information from too many credible independent sources pointing in the same direction for the AI to leave them out with any reasonable probability.

I want to be fair here for a moment. There is a category in Google search known as "your money or your life" (YMYL) searches: queries where a bad answer could cost somebody significant money or interfere with their health outcomes.

Google is very careful in YMYL searches not to include any business that could reasonably harm your money or your life.

So it is possible, I want to be very clear, that this specific query counts as a YMYL query, and because it’s a YMYL query, Google is going to be particularly careful not to recommend a brand it’s not highly confident about.

It still underscores the point that the concierge wants to give the best possible answer, not just what gets the most links, not just what ranks number one, but what it knows is the correct answer. And what City of Hope is seeing is what high prompt brand equity looks like when it’s working.

The inverse is also true. Brands that show up 5% or 10% of the time aren’t being penalized. They’re simply not validated enough. There’s simply not enough corroborating information about them for the AI to be willing to commit to the brand.

Now, when I say validated or corroborated, I want to be specific about what that means. Because "post more content" isn’t the answer here. That’s not how you’re going to win.

What AI systems are doing is triangulating. They’re looking at a variety of sources and they’re asking "Do multiple independent sources, sources that have no obvious reason to be aligned, point to this brand and only this brand—or mostly this brand—as a credible answer to this question?"

Sure, your own website says you’re doing great. That’s a data point, but it’s not a very high-weight one. You are a biased source. What your customers say in reviews? That’s a higher-weight signal. What industry publications and third parties say about you? Also a higher-weight signal. What other expert sources cite you for? Again, higher weight.

They’re witnesses to your competence and your capabilities and your credibility. Digital witnesses backing up and verifying what the concierge, the large language model, knows about you.
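To see why independent corroboration moves the needle so much more than your own site, here is a deliberately toy scoring model. The source categories and every weight in it are invented for illustration; no real AI system exposes or uses numbers like these. The point is only the shape of the math: a biased source contributes little, and missing independent witnesses collapse the score.

```python
# Toy confidence model. Weights are invented for illustration only;
# this is NOT how any real LLM actually scores brands.
SIGNAL_WEIGHTS = {
    "own_website": 0.10,       # biased source, low weight
    "customer_reviews": 0.35,  # independent, high weight
    "industry_press": 0.30,
    "expert_citations": 0.25,
}

def brand_confidence(signals):
    """signals maps a source type to its strength in [0, 1];
    return the weighted sum across all provided sources."""
    return sum(SIGNAL_WEIGHTS[src] * strength
               for src, strength in signals.items())

# A well-corroborated brand vs. one that only talks about itself:
validated = brand_confidence({"own_website": 1.0, "customer_reviews": 0.9,
                              "industry_press": 0.8, "expert_citations": 0.7})
thin = brand_confidence({"own_website": 1.0, "customer_reviews": 0.1,
                         "industry_press": 0.0, "expert_citations": 0.0})
# validated ≈ 0.83, thin ≈ 0.135
```

Notice that the "thin" brand maxes out its own website and still scores a fraction of the validated one: that is the triangulation effect in miniature.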

This. This is the reason that I say content is king, customer experience is queen, and data is the crown jewels. It’s not just a branding concept, it’s an operating model that drives prompt brand equity.

The content you publish establishes what you are about. It says, "this is who we believe we are." The experience you deliver generates the reviews and the word of mouth that validate that, that corroborate the other witnesses’ story. And the data, the first-party signals you own and the external signals that others generate about you, is what the AI draws from to decide whether or not you are the brand it should include.

When those three areas, the content, your customer experience, and your data, are aligned and consistent, the AI’s confidence weight for your brand goes up. The concierge can recommend you confidently.

When those data points, when those three elements are contradictory or thin or absent, you’re the brand that shows up one time in 10.

Your brand is the prompt. But for your brand to actually be the prompt, for the AI to reach for you by name the way it reaches for City of Hope, it needs enough signal that it has no choice but to select you.

So practically, what do you do about this? I want to give you four things to think about this time around.

First, shift your measurement from rank to frequency. Stop worrying about whether you’re number one or number two in an AI response. That number will change constantly and tells you nothing reliable.

What you want to track is this: if you run the 10 questions your customers most commonly ask across ChatGPT, Google, and Perplexity, say 10 runs each, how often do you appear? That’s your baseline frequency. You can use tools like Peec.AI or seoClarity or others to measure that for you. But run it again 30, 60, and 90 days from now. If your frequency is going up, your signals are strengthening. If it’s declining, something needs your attention.

I talked about many of these tools in episode 485, and I’ll link to them again in the show notes. But you can start measuring today for free with nothing but an hour and a few open browser tabs. If you’ve got people on your team, set them to the same task: take a few people, give them 30 minutes each, and record the results in a Google Sheet or an Excel spreadsheet so you’ve got something to refer back to.
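If you want something a bit more repeatable than eyeballing browser tabs, the tallying step is easy to script. Here is a minimal sketch of just the counting logic; actually collecting the responses (through each platform’s API or by hand) is left out, and the brand names and response snippets below are made up for illustration.

```python
from collections import defaultdict

def appearance_frequency(responses, brands):
    """Given the raw AI response texts collected for one prompt,
    return the fraction of responses that mention each brand."""
    counts = defaultdict(int)
    for text in responses:
        lowered = text.lower()
        for brand in brands:
            if brand.lower() in lowered:
                counts[brand] += 1
    return {brand: counts[brand] / len(responses) for brand in brands}

# Hypothetical snippets from repeated runs of the same prompt:
runs = [
    "Top picks: City of Hope, UCSF, and Stanford.",
    "Consider UCSF Medical Center or City of Hope.",
    "Stanford Health Care is a strong option.",
]
freq = appearance_frequency(runs, ["City of Hope", "UCSF", "Stanford"])
# freq["City of Hope"] == 2/3  (appears in 2 of 3 runs)
```

Run this over your real sample each month and the trend in those fractions is exactly the frequency metric described above.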

Second, audit your internal AI tools for consistency. While you’re worrying about showing up in external AI answers, go check what your own deployed AI is telling your customers.

In fact, if you already have an AI chatbot, I would do this first.

Take your most frequently asked questions, could be five, could be 10, the ones that get asked regularly and that your team already knows the answer to cold. Ask your chatbot each one 10 times. If the answers differ materially on anything that would affect a customer’s expectations, you’ve got an operational problem that’s creating review risk right now. And that’s actually more urgent than any GEO tactic, because you’re creating a negative impression of your brand.
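The comparison step of that audit can also be scripted. This is a minimal sketch that flags repeated runs of the same question whose numeric claims (hours, fees, percentages) don’t match; the regex and the sample answers are my own illustration, not a production-grade fact extractor.

```python
import re

def extract_facts(answer):
    """Pull the numeric claims (hours, percentages, dollar amounts)
    out of a chatbot answer -- the details that set expectations."""
    return set(re.findall(r"\$?\d+(?:\.\d+)?\s*(?:hours?|%|USD)?", answer))

def is_consistent(answers):
    """True only if every run of the same question makes
    the same set of numeric claims."""
    fact_sets = [extract_facts(a) for a in answers]
    return all(s == fact_sets[0] for s in fact_sets)

# Hypothetical chatbot answers to one cancellation-policy question:
runs = [
    "Free cancellation up to 48 hours before arrival.",
    "You can cancel free of charge 48 hours in advance.",
    "Cancellations within 24 hours incur a fee.",
]
# First two runs agree (both claim "48 hours"); the third
# contradicts them, so is_consistent(runs) flags the set.
```

Anything this flags is worth a human look: different phrasing is fine, but a 48-hour window in one answer and a 24-hour window in another is the kind of material drift that generates negative reviews.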

Third, focus your content investment on increasing witnesses, credible witnesses, not just content volume. One piece of original expert authored content that gets cited and shared and referenced by independent sources does more for your prompt brand equity than 50 AI generated blog posts.

The AI isn’t counting the number of pieces of content on your site. It’s weighting the content by the degree to which independent verifiable sources, credible sources, confirm what your content says. Write less. Make it more citable. Make it more original. Make it more authentic. Make it genuinely worth citing. That’s how you get cited more often.

Fourth, treat review velocity as a strategic input, not just a reputation metric. Review velocity was always an important metric. AI just makes it matter even more.

Reviews are among the highest-weight external signals that AI systems draw upon, particularly if you’re in hospitality, a local business, or a customer-service-oriented business.

Recent, detailed, specific reviews on Google, TripAdvisor, Yelp, your OTA listings are among the most direct inputs to prompt brand equity available to small, independent businesses. Larger businesses, the same is gonna be true for something like G2 because that’s where your customers go to talk about you. Getting 10 substantive high quality reviews this quarter does more for your AI visibility than almost any technical GEO tactic. And responding to those reviews where the systems allow you, all of them, not just the good ones, tells the AI that you’re a business that takes the feedback seriously. That’s a signal too.

Okay, so those four tactics out of the way, here’s the bigger picture I want to leave you with.

Over the last three weeks on this show, I’ve talked about three different stories that all come from the same root cause. I talked about the AI value gap, why 88% of companies use AI, but only 6% see significant results. I talked about agentic commerce and why even OpenAI wasn’t reasonably able to predict how soon AI driven transactions would happen. And now I’m talking about AI inconsistency, why the same prompt produces a different answer almost every single time.

I’m going to say this as clearly as I can. Those are not three separate problems. They’re three symptoms of the same underlying condition. AI systems work by constructing a best-guess answer from the available signals; the concierge wants to give the best answer it can. When the signals are weak or inconsistent or contradictory, its output is going to be weak and inconsistent and contradictory.

When the signals are strong, deep, and consistent, the AI finds you. It commits to you. It recommends you 69 times out of 71 as opposed to one out of a thousand. The companies in the 6% who are closing the value gap, the businesses that appear consistently in AI answers, they’re not doing something exotic. They’re doing the fundamentals very, very, very well.

They’ve got clear objectives. They’ve got strong content. They’ve got data they own and trust. And they have customer experiences that earn reviews that confirm what their content says.

That’s the digital reset in practice. It’s not fighting the AI. It’s not chasing every new platform or tool or protocol. It’s making your signal clear enough that the machine finds you, no matter what form that machine takes, whether it’s Big Tech or AI or something else that comes down the road. It all works together. Do that well, and you’re gonna find yourself in a very good position for a very long time.

Now, if this episode gave you a clearer picture of what prompt brand equity actually means and how you can build it, do me a favor. Send it to one colleague who is currently staring at your 2026 budget and wondering where to put their money and what to do next. It might save them from spending time or money on the wrong thing. Even better, send this to the person on your team who is currently reporting on AI rankings. Ask them, "If this list changes 99% of the time, what are we actually measuring?"

I want to remind you that you can find the show notes for this episode, including links to the SparkToro research, the brand visibility tools that I mentioned, and the full archive of past episodes at timpeter.com/podcasts.

And if you’re ready to go deeper on making your brand the answer that AI reaches for, my book, "Digital Reset: Driving Marketing and Customer Acquisition Beyond Big Tech," is the roadmap you need. You’ll find the link in the show notes.

Thank you again so much for listening. I genuinely appreciate you. Until next time, please be well, be safe and be excellent to each other. I’ll see you soon.

Take Your Next Step Toward a Digital Reset

“Digital Reset with Tim Peter” helps you look beyond the "shiny objects" to build a business that lasts. How can we help you today?

  • The Brief: Get the weekly email that turns these strategic ideas into actionable demand. Subscribe to The Digital Reset Brief
  • The Book: Master the framework with Digital Reset: Driving Marketing and Customer Acquisition Beyond Big Tech. Buy the Book
  • The Experience: Need a bespoke digital strategy for your hotel, resort, SaaS firm, or financial services firm? Tim Peter & Associates can help you. Work with Tim

The post The AI Coin Flip: Why AI Gives Every Customer a Different Answer (Digital Reset Episode 488) appeared first on Tim Peter & Associates.
