OpenAI’s GPT-4 outperforms doctors in another new study

It may "know" more about treating eye problems than your own GP.

OpenAI’s most powerful AI model outperformed junior doctors in deciding how to treat patients with eye problems and came close to scoring as high as expert ophthalmologists — at least on this test.

The challenge: When doctors are in med school, they rotate across clinical areas, spending time with specialists in surgery, psychiatry, ophthalmology, and more to ensure they’ll have a basic knowledge of all the subjects by the time they get their medical license.

If they become a general practitioner (GP), though, they may rarely use what they learned in some of those specialties, including how to treat less common conditions.

The idea: Researchers at the University of Cambridge were curious to see whether large language models (LLMs) — AIs that can understand and generate conversational text — could help GPs treat patients with eye problems, something they might not be handling on a day-to-day basis.

For a study published in PLOS Digital Health, they presented GPT-4 — the LLM powering OpenAI’s ChatGPT Plus — with 87 scenarios of patients with a range of eye problems and asked it to choose the best diagnosis or treatment from four options.

They also gave the test to expert ophthalmologists, trainees working to become ophthalmologists, and unspecialized junior doctors, who have about as much knowledge of eye problems as general practitioners.

GPT-4 significantly outperformed the junior doctors, scoring 69% on the test compared to their median score of 43%. It also scored higher than the trainees, who had a median score of 59%, and came close to the expert ophthalmologists' median score of 76%.

“What this work shows is that the knowledge and reasoning ability of these large language models in an eye health context is now almost indistinguishable from experts,” lead author Arun Thirunavukarasu told the Financial Times.

Looking ahead: The Cambridge team doesn’t think LLMs will replace doctors, but they do envision the systems being integrated into clinical workflows — a GP who is having trouble getting in touch with a specialist for advice on how to treat something they haven’t seen in a while (or ever) could query an AI, for example.

“The most important thing is to empower patients to decide whether they want computer systems to be involved or not,” said Thirunavukarasu. “That will be an individual decision for each patient to make.”