How GPT models perform on the United States medical licensing examination: a systematic review

Dana Brin, Vera Sorin, Eli Konen, Girish Nadkarni, Benjamin S. Glicksberg, Eyal Klang

Research output: Contribution to journal › Review article › Peer-review

Abstract

Objective: The United States Medical Licensing Examination (USMLE) assesses physicians' competency, and passing it is required to practice medicine in the United States. With the emergence of large language models (LLMs) such as ChatGPT and GPT-4, understanding their performance on this exam illuminates their potential in medical education and healthcare.

Materials and methods: A PubMed literature search was conducted following the PRISMA 2020 guidelines, focusing on studies that used official USMLE questions and GPT models.

Results: Of 19 screened studies, six were relevant. GPT-4 achieved the highest accuracy, 80–100% on the USMLE. Open-ended prompts typically outperformed multiple-choice ones, and 5-shot prompting slightly edged out zero-shot prompting.

Conclusion: LLMs, especially GPT-4, display proficiency in answering USMLE questions. As AI integrates further into healthcare, ongoing assessment against trusted benchmarks is essential.
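To make the compared prompting strategies concrete, the following is a minimal sketch of zero-shot versus 5-shot prompting on a USMLE-style multiple-choice item, assuming the OpenAI Python client. The sample question, the worked examples, and the model name are hypothetical placeholders for illustration only, not items or protocols drawn from the reviewed studies.

```python
# Sketch: zero-shot vs. few-shot prompting on a USMLE-style question.
# Assumes the OpenAI Python client (openai>=1.0) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "A 55-year-old man presents with crushing chest pain radiating to the "
    "left arm. Which serum marker is most specific for myocardial injury?\n"
    "A) AST  B) Troponin I  C) LDH  D) CK-MB"
)

SYSTEM = "Answer the USMLE question with a single letter."

# Zero-shot: the model sees only the question.
zero_shot = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": QUESTION},
]

# Few-shot: the same request preceded by worked examples
# (two shown here for brevity; a 5-shot setup would use five).
few_shot = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Which vitamin deficiency causes scurvy?\n"
                                "A) Vitamin A  B) Vitamin C  C) Vitamin D  D) Vitamin K"},
    {"role": "assistant", "content": "B"},
    {"role": "user", "content": "Which nerve innervates the diaphragm?\n"
                                "A) Vagus  B) Phrenic  C) Hypoglossal  D) Accessory"},
    {"role": "assistant", "content": "B"},
    {"role": "user", "content": QUESTION},
]

for label, messages in [("zero-shot", zero_shot), ("few-shot", few_shot)]:
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    print(label, "->", reply.choices[0].message.content)
```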

Original language: English
Article number: 500
Journal: Discover Applied Sciences
Volume: 6
Issue number: 10
State: Published - Oct 2024
