According to the Stanford AI Index Report, AI models have gotten really good at passing medical boards. GPT-4 Medprompt got 90.2% of its answers correct in 2023.
I would have guessed that it would be considerably higher by now, but according to the leaderboard maintained by Papers With Code, not so much:
The best performance now belongs to Google's Med-Gemini, but it's only 0.9 percentage points better. It seems as though progress has stalled a bit.
Still, I wouldn't be surprised if this is already better than most real doctors can manage.
How intelligent does it have to be to say "Lose a few kilos, cut down on your drinking, and get these tests done at pathology where some more AI will prescribe any recommended medications"?
I suspect it will get very good at diagnosis fairly soon as well. Feed it complete test results, vitals, and patient response to a lengthy list of questions and it will eventually come up with the right answer more often than humans.
I’d think it will soon be very good at reading X-rays and CT scans, too. That will be a boon for small towns and rural hospitals that aren’t big enough to have a full-time radiologist on staff.
The technology for reading X-rays and CT scans has nothing in common with large language models. And people have been working on the former for decades now, so don't expect a surge in progress.
I would think the main contribution of AI should be pre-diagnosis questionnaires (not just interpreting test results). That work is repetitive and time-consuming for doctors, and it could be done before the doctor sees the patient (and could suggest what tests are indicated - and point out possibilities the doctor might need to exclude). What it shouldn't do is make final decisions (which need to be made in consultation with the patient).
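To make that concrete: a pre-visit questionnaire could be as simple as a rule table mapping reported symptoms to tests worth discussing. Everything below (the symptom names and the tests they trigger) is an invented illustration, not clinical guidance:

```python
# Toy rule-based pre-visit questionnaire: map each reported symptom
# to tests a doctor might consider. All rules here are invented
# examples, not medical advice.

RULES = {
    "fatigue": ["CBC", "thyroid panel"],
    "frequent thirst": ["fasting glucose", "HbA1c"],
    "chest pain on exertion": ["ECG", "lipid panel"],
}

def suggest_tests(reported_symptoms):
    """Collect every test suggested by a triggered rule, de-duplicated."""
    suggested = []
    for symptom in reported_symptoms:
        for test in RULES.get(symptom, []):
            if test not in suggested:
                suggested.append(test)
    return suggested

print(suggest_tests(["fatigue", "frequent thirst"]))
# ['CBC', 'thyroid panel', 'fasting glucose', 'HbA1c']
```

A real system would be far larger, but note that nothing here makes a final decision; it only queues up questions and tests for the doctor to consider.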
Completely off topic but may be interesting to Kevin if he hasn't already seen it: https://www.theguardian.com/science/2025/jan/06/roman-empires-use-of-lead-lowered-iq-levels-across-europe-study-finds
Although maybe the change in IQ is not the most important - the change in impulse control (leading to more violence) might be more important. Interesting also that the effect may have been WORSE for the upper classes.
Beethoven probably died of lead poisoning, so they say.
This seems to have been debunked.
We are back to "we don't know what Beethoven died of"--or suffered from before that.
and before his death lead was responsible for all his great compositions?
Yeah - hard to say. To clarify, it might not have been the cause of death, but the cause of deafness. So they say. No one really knows. I guess somehow his hair was available for testing after all these years and came up hot for lead.
Some medical boards questions are bad. Maybe in the future AI can write the questions.
There was a study done years ago showing that you get better outcomes if doctors follow a checklist. Other studies showed that doctors, like everybody else, tend to fight their last battle. That is, if they see a patient that reminds them of a previous case, they'll jump to the conclusion that they have the same condition.
AI will be an important tool in medical care, depending on how it is set up. Computer programs in general can help, and don't have to be AI--e.g., cross-checks to make sure medications don't interact with each other, that dosages are correct, etc. (a toy sketch below).
Of course, if AI is set up for creative coding of procedures to maximize reimbursements, well....
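As a purely illustrative aside, here's roughly what a non-AI medication cross-check of the kind described above might look like. The interaction pairs and dose limits are invented placeholders, not real pharmacology:

```python
# Minimal sketch of a medication cross-check: a pairwise interaction
# lookup plus a maximum-daily-dose sanity check. The interaction
# table and dose limits are invented placeholders, not medical fact.

INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
}

MAX_DAILY_MG = {"warfarin": 10, "aspirin": 4000}  # made-up limits

def check_prescriptions(meds):
    """meds maps drug name -> daily dose in mg; returns warning strings."""
    warnings = []
    names = list(meds)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            reason = INTERACTIONS.get(frozenset({a, b}))
            if reason:
                warnings.append(f"{a} + {b}: {reason}")
    for drug, dose in meds.items():
        limit = MAX_DAILY_MG.get(drug)
        if limit is not None and dose > limit:
            warnings.append(f"{drug}: {dose} mg/day exceeds {limit} mg/day")
    return warnings

print(check_prescriptions({"warfarin": 5, "aspirin": 5000}))
```

The point is that this kind of safety net is a lookup problem, not an intelligence problem.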
"That is, if they see a patient that reminds them of a previous case, they'll jump to the conclusion that they have the same condition." So exactly like continuously learning AI.
> I suspect it will get very good at diagnosis fairly soon as well. Feed it complete
> test results, vitals, and patient response to a lengthy list of questions and it will
> eventually come up with the right answer more often than humans.
There were expert systems that did this at a very good level in the 1970s and '80s. They were pretty good as doctors' aids, but docs didn't want to use them.
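For readers who never ran into them: those systems (MYCIN is the usual example) were essentially hand-written if-then rule bases driven by a chaining inference engine. A toy version of the idea, with entirely made-up rules and findings:

```python
# Toy forward-chaining inference in the spirit of 1970s-era
# diagnostic expert systems. The rules and findings are invented.

RULES = [
    ({"fever", "cough"}, "possible respiratory infection"),
    ({"possible respiratory infection", "abnormal chest x-ray"},
     "consider pneumonia workup"),
]

def forward_chain(findings):
    """Fire every rule whose premises hold until nothing new is derived."""
    known = set(findings)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return sorted(known - set(findings))

print(forward_chain({"fever", "cough", "abnormal chest x-ray"}))
# ['consider pneumonia workup', 'possible respiratory infection']
```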
Last year my doctor told me that he uses AI daily, and it's like having the most brilliant doctor to discuss a case with. It can also completely hallucinate, but since he went to medical school and has almost 40 years of experience, that helps to determine when the AI goes off the rails.
Pretty much every time I go to the doctor, I have correctly self-diagnosed or know what tests I need. Most medical decisions are not that difficult.
Most of them, yes. It's the ones that aren't. And knowing the difference.
What is this AI? Artificial insemination?
You mean all there is to being a good doctor is passing the board? Who knew? ...
Yeah, my thought, too. I recall seeing a study showing a strong (maybe the strongest) predictor of success as a surgeon being ‘manual dexterity’. Ask your surgeon if s/he does needlepoint, or builds ships in bottles ….
Surgeons are a special case. Usually, some other doctor has done any diagnosis required before the patient is referred to a surgeon, so surgeons don't need to be skilled diagnosticians, which is where expert systems and AI come in.
This has the potential to dramatically decrease the cost of medical care, which means profits can skyrocket!! Whoo-hoo!!
Uhm ... last time I saw a doctor, the doctor asked *me* questions, not the other way around.
Show me an LLM that can actually perform a diagnosis, rather than pass a written exam (which was always just a proxy for "has memorized enough stuff" - we know LLMs are exceedingly good at that, as long as the stuff is text).
This is stupid. I don't want to go to a doctor and have a 1-in-10 likelihood of them being completely wrong. That seems like an unacceptably low accuracy rate.
The term "AI" is getting thrown around so much these days I think its heading to the same spot as "literally." Anyway, humans have been constructing tools for easily getting answers to questions ever since the invention of the book, and have been constructing tools for better ways of performing tasks since the invention of the wheel.
Each new tool or each new way of categorizing information is hailed as a breakthrough which will eliminate jobs, and each new tool or way of categorizing information sometimes does, in fact, eliminate jobs (and often creates other ones).
The fact that a computer could pass a test as well as a human is old news. Perhaps, now, a computer could create a test, but that is the question.
It's not a better way of coming up with answers; it's deciding on the question that is the "I" in artificial intelligence. I haven't seen any progress on that point.
Large language models are actually plagiarism on an industrial scale. Not that plagiarism on an industrial scale is not useful, but it's still plagiarism.
Then there's this. Curse you, Gen AI!
https://apnews.com/article/tesla-cybertruck-explosion-trump-hotel-las-vegas-248b41d87287170aa7b68d27581fdb4d
I'm sure most real doctors would do that well if they were also allowed to haul a massive database in with them.