AI search engines give incorrect answers at an alarming 60% rate, study says

angrynb

Smack-Fu Master, in training
9
roughly 1 in 4 Americans now uses AI models as alternatives to traditional search engines
AI models incorrectly answered more than 60 percent of queries about news content.

That's about the standard American rate of correctness. If most of us actually cared about factual correctness, we wouldn't be in the situation we're in.
 
Upvote
188 (190 / -2)

markgo

Ars Praefectus
3,169
Subscriptor++
Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools.

That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.
 
Upvote
147 (148 / -1)

fargofallout

Wise, Aged Ars Veteran
113
Subscriptor
I know there's a lot in here to point out and discuss and whatnot, but this little bit here:

...and Grok 3's premium service ($40/month)...

I don't know that I would pay even $5/month for any of these services, so I want to know who in the world would actually pay $40/month for that. I don't know if the number would be incredibly amusing or incredibly depressing.
 
Upvote
90 (90 / 0)

markgo

Ars Praefectus
3,169
Subscriptor++
Is this the same Columbia University that was more than willing to expel its students because they're not "American Enough" to pass Trump's American "purity" test?
C’mon, you’re posting this under an article about accuracy? Please post any proof that Columbia expelled anyone for their viewpoint. They have only expelled a handful of people, all of whom were involved in the Hamilton Hall takeover.

You can say that was unjustified but there is NO evidence there was any sort of Trump related purity test.
 
Upvote
40 (61 / -21)

umichans

Smack-Fu Master, in training
28
Is this the same Columbia University that was more than willing to expel its students because they're not "American Enough" to pass Trump's American "purity" test?
What does a study about AI error rates have to do with university administration being cowed into submission by a fascist president trying to recreate 1932 Germany?
 
Upvote
82 (89 / -7)

WXW

Ars Scholae Palatinae
1,075
Yeah, shocking... The other day I asked o3-mini a question it had no idea about, but it still answered with bullshit again and again. I saved the "reasoning" text from when I called out its bullshit, because it surprised me how revealing it was:

It sounds like the user expects a sincere answer, so I should make sure not to guess any details this time!
 
Upvote
93 (93 / 0)

DrewW

Ars Scholae Palatinae
1,444
Subscriptor++
All kidding aside, it doesn't take more than five minutes to figure out that these AI engines are often wrong...in the worst possible way: subtly incorrect with an air of authority. Rarely entirely incorrect...so people give them the benefit of the doubt. Sigh.
I think of dumb AI like malicious compliance from dumb people. The cheese will stick to a pizza if you add 1/8 cup of glue; the request was fulfilled and the solution will work. I expect the same uselessness from a chatbot or from a stoned undergrad.
 
Upvote
17 (17 / 0)

betam4x

Ars Praefectus
3,284
Subscriptor++
Not really surprising. If you use Google more than once a day, you would know that. 30 minutes prior to this being posted, I searched for something on my desktop (which is the only device I haven’t moved to DDG), and the “A.I.” changed 3 simple words that could have killed me if I hadn’t known better.

I was looking up the max safe dose of an OTC sleeping pill in 24 hours. I needed to take another, but the bottle mentioned nothing about a max dose. I didn’t want to take too much, and I definitely didn’t want to OD. I just wanted to go to sleep.

Had I followed Google’s advice, I would be hospitalized or worse right now. Thankfully I know enough to have caught it…this time.

Just 3 words in the AI summary could kill someone. Let that sink in. And they were small words, unimportant words to many.

Click “web” or don’t use Google, folks. Google launched my career decades ago when it became public. I was hugely successful because I knew how to use it. Now I am telling you: walk away.

EDIT: before someone asks, I may provide details later, but I'm baffled and honestly considering reaching out to my lawyer to see if maybe something can be done (probably not, but he likes challenges). Due to this, I won't give details (yet), but the tl;dr is: the LLM behind Google's AI features basically suggested that the max dose is the minimum to take if I had a huge issue falling asleep, and suggested another random value, 5X as much, as the max. None of the "sources" suggested anything like this, so it is unclear where Google got this information.

When I finally found a reputable page on the subject, the page noted that such a high dose can cause "respiratory depression, cardiac arrest, and death".

I don't rely on AI results in general because I know how they work, but had my spouse Googled that...or my kids, or anyone else...
 
Last edited:
Upvote
125 (130 / -5)

Fatesrider

Ars Legatus Legionis
22,830
Subscriptor
Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools.
"Room for improvement" will be its epitaph.

And that whole "today is the worst that the product will ever be" ignores precedent and reality. No matter how bad something is today, it very much can be worse tomorrow.

Citation: the world today.
 
Upvote
55 (55 / 0)

jocedeg

Seniorius Lurkius
22
This doesn’t seem remotely informative? Traditional search is much better than GenAI for finding the origin of an exact piece of text. This seems like a study designed to find what it wants, that’s not even close to a real world use case.
...but whatever the use case, it WAS wrong 60% of the time in this one.

Reason enough to worry, no?
 
Upvote
35 (35 / 0)

vvax56nM

Smack-Fu Master, in training
99
"Room for improvement" will be its epitaph.

And that whole "today is the worst that the product will ever be" ignores precedent and reality. No matter how bad something is today, it very much can be worse tomorrow.

Citation: the world today.
Considering the enshittification epidemic going on, I find it more likely that products are at their best at launch and will just get worse over time.
 
Upvote
45 (46 / -1)

sigmasirrus

Ars Scholae Palatinae
1,137
Not really surprising. If you use Google more than once a day, you would know that. 30 minutes prior to this being posted, I searched for something on my desktop (which is the only device I haven’t moved to DDG), and the “A.I.” changed 3 simple words that could have killed me if I hadn’t known better.

I was looking up the max safe dose of an OTC sleeping pill in 24 hours. I needed to take another, but the bottle mentioned nothing about a max dose. I didn’t want to take too much, and I definitely didn’t want to OD. I just wanted to go to sleep.

Had I followed Google’s advice, I would be hospitalized or worse right now. Thankfully I know enough to have caught it…this time.

Just 3 words in the AI summary could kill someone. Let that sink in. And they were small words, unimportant words to many.

Click “web” or don’t use Google, folks. Google launched my career decades ago when it became public. I was hugely successful because I knew how to use it. Now I am telling you: walk away.

EDIT: before someone asks, I may provide details later, but I'm baffled and honestly considering reaching out to my lawyer to see if maybe something can be done (probably not, but he likes challenges). Due to this, I won't give details (yet), but the tl;dr is: the LLM behind Google's AI features basically suggested that the max dose is the minimum to take if I had a huge issue falling asleep, and suggested another random value, 5X as much, as the max. None of the "sources" suggested anything like this, so it is unclear where Google got this information.

When I finally found a reputable page on the subject, the page noted that such a high dose can cause "respiratory depression, cardiac arrest, and death".

I don't rely on AI results in general because I know how they work, but had my spouse Googled that...or my kids, or anyone else...
I’d say you could sue the manufacturer for not including the safe maximum dose on the bottle! Usually I see stuff like “do not exceed 3 doses in 24 hours” on OTC stuff.
 
Upvote
25 (25 / 0)
The average user is probably not using LLMs for this kind of thing, and we already have tools that do exact text matching well, and LLMs aren't one of them.

Arguably the bigger problem with AI search is the opposite: that it steals and scrapes data directly from websites, often verbatim, which both depreciates website traffic and runs into the problem of sharing information without appropriate context.

Ultimately I'm just not sure how much value this study actually has. It's like the strawberry or logic puzzle things. Yeah it's funny that LLMs are bad at these and we can make fun of how overhyped the products are, but it's also clearly outside the normal scope of use.
Did you miss the news that Microsoft, Google, Perplexity, etc. are offering LLM search and average users are in fact using it?

How is this study of live products being used by millions of users not relevant or valuable?

Also, the problem of the search ignoring website rules against scraping is in the study and is mentioned in this article.

Did you just read an AI summary of this article? ;)
 
Upvote
87 (87 / 0)

ninjonxb

Smack-Fu Master, in training
74
And AGI is right around the corner, and these people love to claim that hallucination isn’t as big of a problem anymore >.>

Waiting for this article to hit Hacker News so everyone can come out and defend and downplay this.

There is a huge amount of money and advertising effort in convincing the average user that these tools are reliable.
 
Upvote
27 (27 / 0)

sigmasirrus

Ars Scholae Palatinae
1,137
I said this in the other AI thread but I'm not a huge fan of this study.

Yes AI sucks, and yes AI is being way overmarketed, but this particular study seems both beyond the scope of how it's normally used and intentionally not really a good fit for an LLM in the first place.

The average user is probably not using LLMs for this kind of thing, and we already have tools that do exact text matching well, and LLMs aren't one of them.

Arguably the bigger problem with AI search is the opposite: that it steals and scrapes data directly from websites, often verbatim, which both depreciates website traffic and runs into the problem of sharing information without appropriate context.

Ultimately I'm just not sure how much value this study actually has. It's like the strawberry or logic puzzle things. Yeah it's funny that LLMs are bad at these and we can make fun of how overhyped the products are, but it's also clearly outside the normal scope of use.

In a way it almost feels like bait, and a potential distraction from the more serious issues surrounding LLMs.
It’s true that the test is a little outside typical use. But not by much. An ordinary user might ask “what was that BBC article about the polar bears recently?” and expect to find an answer. In fact, I’ve had ChatGPT answer vague questions like that successfully sometimes. You would think that if the tool is any good, being more specific would make it do a really good job of finding the article. In the study, though, rather than saying “I don’t know,” it sometimes makes URLs up.

Maybe if the hallucination rate were 1 in a million requests it might be more worthwhile, since then you’re approaching the reliability of the underlying dataset (i.e., all of the top 10 results on Google, back when Google was still decent, could still be wrong). But this is a much higher error rate, so you never know when to trust it, making it kinda useless.

Actually, less than useless, because it’ll lull you into a false sense of security.
 
Upvote
28 (28 / 0)

Edgar Allan Esquire

Ars Tribunus Militum
2,999
Subscriptor
I didn't see it in the article, but the study had a narrower focus than just "AI is inaccurate":
We deliberately chose excerpts that, if pasted into a traditional Google search, returned the original source within the first three results.
It argued that AI as a search engine is worse than plain modern Googling. Given that more people and companies are using AI instead of vanilla search algorithms, I can see that as a cromulent warning.

That's setting aside things like citing a real article that is unrelated to the one being searched for.

[Screenshot attached]
 
Upvote
40 (40 / 0)

Schpyder

Ars Tribunus Angusticlavius
9,765
Subscriptor++
That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.

It's not always true, though; look at Google Search. It was significantly better, both at finding information and as a user experience, some four years ago, before Prabhakar Raghavan sank his bean-counter talons into it. In our current era of late-stage capitalism, it's entirely possible - nay, likely - for technologies to in fact get objectively worse in the name of The Almighty Shareholder Value (pbui).
 
Upvote
54 (54 / 0)