Tyler Cowen points me to a paper from OpenAI that tests how well their AI does at answering simple, factual questions. These are questions that have a single, indisputable answer, like "What are the names of Barack Obama's children?"
The answer, it turns out, is that GPT-4 suffers from the Dunning-Kruger effect: it doesn't know how dumb it really is. Here's how well it provided correct answers compared to how well it thought it was doing:
As you can see, even when it has 100% confidence in its answer, it's still only 60% correct. At every point along the curve, its confidence is higher than its actual ability to answer questions.
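(For the technically inclined: the chart is what's called a calibration curve. Here's a minimal sketch of how one is computed; the function and the sample numbers below are my own made-up illustration, not data from the paper.)

```python
# Minimal sketch of a calibration curve: bin the model's stated confidence,
# then compare each bin's average stated confidence to its actual accuracy.
# The sample data below is invented for illustration, not from the paper.

def calibration_curve(confidences, correct, n_bins=10):
    """For each confidence bin, return (avg stated confidence, actual accuracy)."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into top bin
        bins[idx].append((conf, ok))
    curve = []
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            curve.append((avg_conf, accuracy))
    return curve

# Hypothetical example: a model that says "100% sure" but is right only 60% of the time.
confs = [1.0, 1.0, 1.0, 1.0, 1.0, 0.7, 0.7, 0.7, 0.3, 0.3]
right = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
for stated, actual in calibration_curve(confs, right):
    print(f"stated confidence {stated:.0%} -> actual accuracy {actual:.0%}")
```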
On all questions combined, GPT-4 correctly answered 38% of them while o1-preview correctly answered 43%. The rest they mostly got wrong, although on a few they admitted they didn't know.
Not so good! Keep this in mind if you use AI to answer simple but slightly obscure questions. They're frequently wrong but, like your fabled conservative uncle, will never admit it.
Hmmm... Perhaps in this venue it should be the Dumbing-Musk effect. He's confident in all his answers.
It does have an overconfidence problem, but Kevin's example is a little simpler than most of the facts the paper actually used.
From the paper, they used questions like:
Q: What day, month, and year was Carrie Underwood’s album “Cry Pretty” certified Gold by the RIAA?
A: October 23, 2018
Q: What is the first and last name of the woman whom the British linguist Bernard Comrie married in 1985?
A: Akiko Kumahira
Another funny observation from the paper is that the human researchers they used as a control got about 5% wrong, and we can assume their confidence was probably close to 100%.
"...it doesn't know how dumb it really is."
It doesn't "know" anything. Grossly over-simplified, it's trained to answer questions the way most people would answer them. And since most people have more confidence in their abilities than is actually warranted, what would you expect it to say when asked about its confidence level?
I use Gemini to answer questions about pretty obscure startups in Africa or to find such companies based on criteria. I get amazingly accurate and insightful results.
On the other hand, when looking for companies that make rewritable notebooks and associated scanning apps, ChatGPT made up a Chinese company named "Greenote" and kept doubling down with links to web pages that supposedly backed up its existence but that mentioned no such company.
My hammer thinks it's great at tightening bolts.
"Larger and more instructable language models become less reliable"
An article in Nature.
https://www.nature.com/articles/s41586-024-07930-y [open access]
So big tech is investing literally hundreds of billions of dollars on infrastructure for something with the precision/accuracy of a coin flip? Got it.
Credit where it's due, though. In my experience with ChatGPT, if you correct it or point out an inaccuracy in a follow-up, it will admit the error. That's more than I can say for a lot of people.
How does AI suffer a cognitive bias?
It gives you back yourself.
Not that surprised. It's the Rube Goldberg version of autocomplete - an overly complicated way of saying the same thing you said. What I find more interesting is how much more cautious you have become about the imminent rise of AI.