TY - JOUR
T1 - How GPT models perform on the United States medical licensing examination
T2 - a systematic review
AU - Brin, Dana
AU - Sorin, Vera
AU - Konen, Eli
AU - Nadkarni, Girish
AU - Glicksberg, Benjamin S.
AU - Klang, Eyal
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/10
Y1 - 2024/10
AB - Objective: The United States Medical Licensing Examination (USMLE) assesses physicians' competency. Passing this exam is required to practice medicine in the U.S. With the emergence of large language models (LLMs) like ChatGPT and GPT-4, understanding their performance on these exams illuminates their potential in medical education and healthcare. Materials and methods: A PubMed literature search following the 2020 PRISMA guidelines was conducted, focusing on studies using official USMLE questions and GPT models. Results: Six relevant studies were found out of 19 screened, with GPT-4 showcasing the highest accuracy rates of 80–100% on the USMLE. Open-ended prompts typically outperformed multiple-choice ones, with 5-shot prompting slightly edging out zero-shot. Conclusion: LLMs, especially GPT-4, display proficiency in tackling USMLE questions. As AI integrates further into healthcare, ongoing assessments against trusted benchmarks are essential.
UR - http://www.scopus.com/inward/record.url?scp=85204307278&partnerID=8YFLogxK
U2 - 10.1007/s42452-024-06194-5
DO - 10.1007/s42452-024-06194-5
M3 - Review article
AN - SCOPUS:85204307278
SN - 2523-3971
VL - 6
JO - Discover Applied Sciences
JF - Discover Applied Sciences
IS - 10
M1 - 500
ER -