AI search engines give incorrect answers at an alarming 60% rate, study says

angrynb

Smack-Fu Master, in training
9
roughly 1 in 4 Americans now uses AI models as alternatives to traditional search engines
AI models incorrectly answered more than 60 percent of queries about news content.

That's about the standard American rate of correctness. If most of us actually cared about factual correctness, we wouldn't be in the situation we're in.
 
Upvote
188 (190 / -2)

markgo

Ars Praefectus
3,169
Subscriptor++
Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools.

That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.
 
Upvote
147 (148 / -1)

fargofallout

Wise, Aged Ars Veteran
113
Subscriptor
I know there's a lot in here to point out and discuss and whatnot, but this little bit here:

...and Grok 3's premium service ($40/month)...

I don't know that I would pay even $5/month for any of these services, so I want to know who in the world would actually pay $40/month for that. I don't know if the number would be incredibly amusing or incredibly depressing.
 
Upvote
90 (90 / 0)

markgo

Ars Praefectus
3,169
Subscriptor++
Is this the same Columbia University that was more than willing to expel its students because they're not "American Enough" to pass Trump's American "purity" test?
C’mon, you’re posting this under an article about accuracy? Please post any proof that Columbia expelled anyone for their viewpoint. They have only expelled a handful of people, all of whom were involved in the Hamilton Hall takeover.

You can say that was unjustified but there is NO evidence there was any sort of Trump related purity test.
 
Upvote
40 (61 / -21)

umichans

Smack-Fu Master, in training
28
Is this the same Columbia University that was more than willing to expel its students because they're not "American Enough" to pass Trump's American "purity" test?
What does a study about AI error rates have to do with university administration being cowed into submission by a fascist president trying to recreate 1932 Germany?
 
Upvote
82 (89 / -7)

WXW

Ars Scholae Palatinae
1,075
Yeah, shocking... The other day I asked o3-mini a question it had no idea about, but it still answered with bullshit again and again. I saved the "reasoning" text from when I called out its bullshit, because it surprised me how revealing it was:

It sounds like the user expects a sincere answer, so I should make sure not to guess any details this time!
 
Upvote
93 (93 / 0)

DrewW

Ars Scholae Palatinae
1,444
Subscriptor++
All kidding aside, it doesn't take more than five minutes to figure out that these AI engines are often wrong...in the worst possible way: subtly incorrect with an air of authority. Rarely entirely incorrect...so people give them the benefit of the doubt. Sigh.
I think of dumb AI like malicious compliance from dumb people. The cheese will stick to a pizza if you add 1/8 cup of glue; the request was fulfilled and the solution will work. I expect the same uselessness from a chatbot or from a stoned undergrad.
 
Upvote
17 (17 / 0)

betam4x

Ars Praefectus
3,284
Subscriptor++
Not really surprising. If you use Google more than once a day, you would know that. 30 minutes prior to this being posted, I searched for something on my desktop (which is the only device I haven’t moved to DDG), and the “A.I.” changed 3 simple words that could have killed me if I hadn’t known better.

I was looking up the max safe dose of an OTC sleeping pill in 24 hours. I needed to take another, but the bottle mentioned nothing about a max dose. I didn’t want to take too much, and I definitely didn’t want to OD. I just wanted to go to sleep.

Had I followed Google’s advice, I would be hospitalized or worse right now. Thankfully I know enough to have caught it…this time.

Just 3 words in the AI summary could kill someone. Let that sink in. And they were small words, unimportant words to many.

Click “web” or don’t use Google, folks. Google launched my career decades ago when it became public. I was hugely successful because I knew how to use it. Now I am telling you: walk away.

EDIT: before someone asks, I may provide details later, but I'm baffled and honestly considering reaching out to my lawyer to see if maybe something can be done (probably not, but he likes challenges). Due to this, I won't give details (yet), but the tl;dr is: the LLM behind Google's AI features basically suggested that the max dose is the minimum to take if I had a huge issue falling asleep, and suggested another random value, 5X as much, as the max. None of the "sources" suggested anything like this, so it is unclear where Google got this information.

When I finally found a reputable page on the subject, the page noted that such a high dose can cause "respiratory depression, cardiac arrest, and death".

I don't rely on AI results in general because I know how they work, but had my spouse Googled that...or my kids, or anyone else...
 
Last edited:
Upvote
125 (130 / -5)

Fatesrider

Ars Legatus Legionis
22,830
Subscriptor
Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools.
"Room for improvement" will be its epitaph.

And that whole "today is the worst that the product will ever be" ignores precedent and reality. No matter how bad something is today, it very much can be worse tomorrow.

Citation: the world today.
 
Upvote
55 (55 / 0)

jocedeg

Seniorius Lurkius
22
This doesn’t seem remotely informative? Traditional search is much better than GenAI for finding the origin of an exact piece of text. This seems like a study designed to find what it wants, that’s not even close to a real world use case.
...but whatever the use case, it WAS wrong 60% of the time in this one.

Reason enough to worry, no?
 
Upvote
35 (35 / 0)

vvax56nM

Smack-Fu Master, in training
99
"Room for improvement" will be its epitaph.

And that whole "today is the worst that the product will ever be" ignores precedent and reality. No matter how bad something is today, it very much can be worse tomorrow.

Citation: the world today.
Considering the enshittification epidemic going on, I find it more likely that products are at their best at launch and will just get worse over time.
 
Upvote
45 (46 / -1)

sigmasirrus

Ars Scholae Palatinae
1,137
Not really surprising. If you use Google more than once a day, you would know that. 30 minutes prior to this being posted, I searched for something on my desktop (which is the only device I haven’t moved to DDG), and the “A.I.” changed 3 simple words that could have killed me if I hadn’t known better.

I was looking up the max safe dose of an OTC sleeping pill in 24 hours. I needed to take another, but the bottle mentioned nothing about a max dose. I didn’t want to take too much, and I definitely didn’t want to OD. I just wanted to go to sleep.

Had I followed Google’s advice, I would be hospitalized or worse right now. Thankfully I know enough to have caught it…this time.

Just 3 words in the AI summary could kill someone. Let that sink in. And they were small words, unimportant words to many.

Click “web” or don’t use Google, folks. Google launched my career decades ago when it became public. I was hugely successful because I knew how to use it. Now I am telling you: walk away.

EDIT: before someone asks, I may provide details later, but I'm baffled and honestly considering reaching out to my lawyer to see if maybe something can be done (probably not, but he likes challenges). Due to this, I won't give details (yet), but the tl;dr is: the LLM behind Google's AI features basically suggested that the max dose is the minimum to take if I had a huge issue falling asleep, and suggested another random value, 5X as much, as the max. None of the "sources" suggested anything like this, so it is unclear where Google got this information.

When I finally found a reputable page on the subject, the page noted that such a high dose can cause "respiratory depression, cardiac arrest, and death".

I don't rely on AI results in general because I know how they work, but had my spouse Googled that...or my kids, or anyone else...
I’d say you could sue the manufacturer for not including the safe maximum dose on the bottle! Usually I see stuff like “do not exceed 3 doses in 24 hours” on OTC stuff.
 
Upvote
25 (25 / 0)
The average user is probably not using LLMs for this kind of thing, and we already have tools that do exact text matching well, and LLMs aren't one of them.

Arguably the bigger problem with AI search is the opposite: that it steals and scrapes data directly from websites, often verbatim, which both depreciates website traffic and runs into the problem of sharing information without appropriate context.

Ultimately I'm just not sure how much value this study actually has. It's like the strawberry or logic puzzle things. Yeah it's funny that LLMs are bad at these and we can make fun of how overhyped the products are, but it's also clearly outside the normal scope of use.
Did you miss the news that Microsoft, Google, Perplexity, etc. are offering LLM search and average users are in fact using it?

How is this study of live products being used by millions of users not relevant or valuable?

Also, the problem of the search ignoring website rules against scraping is in the study and is mentioned in this article.

Did you just read an AI summary of this article? ;)
 
Upvote
87 (87 / 0)

ninjonxb

Smack-Fu Master, in training
74
And AGI is right around the corner, and these people love to claim that hallucination isn’t as big of a problem anymore >.>

Waiting for this article to hit Hacker News so everyone can come out and defend and downplay this.

There is a huge amount of money and advertising effort in convincing the average user that these tools are reliable.
 
Upvote
27 (27 / 0)

sigmasirrus

Ars Scholae Palatinae
1,137
I said this in the other AI thread but I'm not a huge fan of this study.

Yes AI sucks, and yes AI is being way overmarketed, but this particular study seems both beyond the scope of how it's normally used and intentionally not really a good fit for an LLM in the first place.

The average user is probably not using LLMs for this kind of thing, and we already have tools that do exact text matching well, and LLMs aren't one of them.

Arguably the bigger problem with AI search is the opposite: that it steals and scrapes data directly from websites, often verbatim, which both depreciates website traffic and runs into the problem of sharing information without appropriate context.

Ultimately I'm just not sure how much value this study actually has. It's like the strawberry or logic puzzle things. Yeah it's funny that LLMs are bad at these and we can make fun of how overhyped the products are, but it's also clearly outside the normal scope of use.

In a way it almost feels like bait, and a potential distraction from the more serious issues surrounding LLMs.
It’s true that the test is a little outside typical use. But not by much. An ordinary user might ask “what was that BBC article about the polar bears recently?” and expect to find an answer. In fact, I’ve had ChatGPT answer vague questions like that successfully sometimes. You would think that if the tool is any good, being more specific would make it do a really good job of finding the article. In the study, though, rather than saying “I don’t know,” it sometimes makes URLs up.

Maybe if the hallucination rate were 1 in a million requests it might be more worthwhile, since then you’re approaching the reliability of the underlying dataset (i.e., all of the top 10 results on Google, back when Google was still decent, could still be wrong). But this is a much higher error rate, so you never know when to trust it, making it kinda useless.

Actually, less than useless, because it’ll lull you into a false sense of security.
 
Upvote
28 (28 / 0)

Edgar Allan Esquire

Ars Tribunus Militum
2,999
Subscriptor
I didn't see it in the article, but the study had a narrower focus than just "AI is inaccurate":
We deliberately chose excerpts that, if pasted into a traditional Google search, returned the original source within the first three results.
It argued that AI as a search engine is worse than plain modern Googling. Given that more people and companies are using AI instead of vanilla search algorithms, I can see that as a cromulent warning.

That's setting aside things like citing a real article that is unrelated to the one being searched for.

[Screenshot attached]
 
Upvote
40 (40 / 0)

Schpyder

Ars Tribunus Angusticlavius
9,765
Subscriptor++
That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.

It's not always true, though; look at Google Search. It was significantly better, both at finding information and as a user experience, some four years ago, before Prabhakar Raghavan sank his bean-counter talons into it. In our current era of late-stage capitalism, it's entirely possible - nay, likely - for technologies to in fact get objectively worse in the name of The Almighty Shareholder Value (pbui).
 
Upvote
54 (54 / 0)