Trends in Accuracy and Appropriateness of Alopecia Areata Information Obtained from a Popular Online Large Language Model, ChatGPT

Ross O'Hagan, Randie H. Kim, Brian J. Abittan, Stella Caldas, Jonathan Ungar, Benjamin Ungar

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Background: Patients with alopecia areata (AA) may access a wide range of sources for information about AA, including the recently developed ChatGPT. Assessing the quality of health information provided by these sources is crucial, as patients are using them in increasing numbers. Objectives: The aim of the study was to evaluate the appropriateness and accuracy of responses generated by ChatGPT to common patient questions about AA. Methods: Responses generated by ChatGPT 3.5 and ChatGPT 4.0 to 25 questions addressing common patient concerns were assessed for appropriateness and accuracy by multiple attending dermatologists at an academic center. Appropriateness of responses by both models was assessed for use in two hypothetical contexts: (1) patient-facing general information websites and (2) electronic health record (EHR) message drafts. Results: The accuracy score across all responses was 4.41 out of 5. Responses generated by ChatGPT 3.5 had a mean accuracy score of 4.29, whereas those generated by ChatGPT 4.0 had a mean accuracy score of 4.53. Appropriateness assessments ranged from 100% of responses rated as appropriate for the general question category to 79% of responses to questions about management rated as appropriate for an EHR message draft. Raters largely preferred responses generated by ChatGPT 4.0 over those generated by ChatGPT 3.5. Reviewer agreement was moderate across all questions, with 53.7% agreement and a Fleiss' κ coefficient of 0.522 (p < 0.001). Conclusions: The large language model ChatGPT produced mostly appropriate information for common patient concerns. While not all responses were accurate, the trend toward improvement with newer iterations suggests potential future utility for patients and dermatologists.

Original language: English
Pages (from-to): 952-957
Number of pages: 6
Journal: Dermatology
Volume: 239
Issue number: 6
State: Published - 1 Dec 2023

Keywords

  • Alopecia areata
  • Artificial intelligence
  • ChatGPT
