TY - JOUR
T1 - Exploring the Role of Large Language Models in Melanoma
T2 - A Systematic Review
AU - Zarfati, Mor
AU - Nadkarni, Girish N.
AU - Glicksberg, Benjamin S.
AU - Harats, Moti
AU - Greenberger, Shoshana
AU - Klang, Eyal
AU - Soffer, Shelly
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/12
Y1 - 2024/12
AB - Objective: This systematic review evaluates the current applications, advantages, and challenges of large language models (LLMs) in melanoma care. Methods: A systematic search was conducted in PubMed and Scopus databases for studies published up to 23 July 2024, focusing on the application of LLMs in melanoma. The review adhered to PRISMA guidelines, and the risk of bias was assessed using the modified QUADAS-2 tool. Results: Nine studies were included, categorized into subgroups: patient education, diagnosis, and clinical management. In patient education, LLMs demonstrated high accuracy, though readability often exceeded recommended levels. For diagnosis, multimodal LLMs like GPT-4V showed capabilities in distinguishing melanoma from benign lesions, but accuracy varied, influenced by factors such as image quality and integration of clinical context. Regarding management advice, ChatGPT provided more reliable recommendations compared to other LLMs, but all models lacked depth for individualized decision-making. Conclusions: LLMs, particularly multimodal models, show potential in improving melanoma care. However, current applications require further refinement and validation. Future studies should explore fine-tuning these models on large, diverse dermatological databases and incorporate expert knowledge to address limitations such as generalizability across different populations and skin types.
KW - artificial intelligence
KW - large language models
KW - melanoma
UR - http://www.scopus.com/inward/record.url?scp=85211796249&partnerID=8YFLogxK
U2 - 10.3390/jcm13237480
DO - 10.3390/jcm13237480
M3 - Review article
AN - SCOPUS:85211796249
SN - 2077-0383
VL - 13
JO - Journal of Clinical Medicine
JF - Journal of Clinical Medicine
IS - 23
M1 - 7480
ER -