Abstract
Recent advancements in generative artificial intelligence have enabled the joint analysis of text and visual data, which could have important implications in healthcare. Diagnosis in ophthalmology is often based on a combination of ocular examination and clinical context. The aim of this study was to evaluate the performance of multimodal GPT-4 (GPT-4 V) in an integrated analysis of ocular images and clinical text. This retrospective study included 40 patients seen at our institution with images of their ocular examinations. Cases were selected by a board-certified ophthalmologist to represent various pathologies. We provided the model with each patient image, first without and then with the clinical context. We also asked two non-ophthalmology physicians to write diagnoses for each image, first without and then with the clinical context. Answers from both GPT-4 V and the non-ophthalmologists were evaluated by two board-certified ophthalmologists. Diagnostic accuracies were calculated and compared. GPT-4 V provided the correct diagnosis in 19/40 (47.5%) cases based on images without clinical context, and in 27/40 (67.5%) cases when clinical context was provided. The two non-ophthalmologist physicians provided the correct diagnosis in 24/40 (60.0%) and 23/40 (57.5%) of cases without clinical context, and in 29/40 (72.5%) and 27/40 (67.5%) of cases with clinical context. For all study participants, adding context improved accuracy (p = 0.033). GPT-4 V is currently able to simultaneously analyze and integrate visual and textual data, and arrive at accurate clinical diagnoses in the majority of cases. Multimodal large language models like GPT-4 V have significant potential to advance both patient care and research in ophthalmology.
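The accuracy figures reported above can be reproduced from the raw counts. The following is a minimal sketch in Python, not the authors' analysis code. The labels `Physician 1` and `Physician 2`, the pairing of each physician's without-context and with-context counts (assumed to follow the order in which they are listed), and the function names are assumptions made for illustration. The abstract does not name the statistical test behind p = 0.033, so no test is attempted here.

```python
# Counts of correct diagnoses out of 40 cases, taken from the abstract.
# Rater labels and the pairing of the two physicians' counts are assumptions.
TOTAL_CASES = 40

correct = {
    "GPT-4 V": {"image_only": 19, "with_context": 27},
    "Physician 1": {"image_only": 24, "with_context": 29},
    "Physician 2": {"image_only": 23, "with_context": 27},
}


def accuracy_pct(n_correct, n_total=TOTAL_CASES):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * n_correct / n_total, 1)


def summarize(counts):
    """Map each rater to (accuracy without context, with context, gain)."""
    rows = {}
    for rater, c in counts.items():
        before = accuracy_pct(c["image_only"])
        after = accuracy_pct(c["with_context"])
        rows[rater] = (before, after, round(after - before, 1))
    return rows


if __name__ == "__main__":
    for rater, (before, after, gain) in summarize(correct).items():
        print(f"{rater}: {before}% -> {after}% (+{gain} points)")
```

Under these assumptions, the gain from adding clinical context is largest for GPT-4 V (20.0 percentage points), compared with 12.5 and 10.0 points for the two physicians.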
Original language | English |
---|---|
Article number | 4999 |
Journal | Scientific Reports |
Volume | 15 |
Issue number | 1 |
DOIs | |
State | Published - Dec 2025 |
Keywords
- AI
- GPT
- LLMs
- Large language models
- Multimodal algorithms
- Ophthalmology