Integrated visual and text-based analysis of ophthalmology clinical cases using a large language model

Vera Sorin, Noa Kapelushnik, Idan Hecht, Ofira Zloto, Benjamin S. Glicksberg, Hila Bufman, Adva Livne, Yiftach Barash, Girish N. Nadkarni, Eyal Klang

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advancements in generative artificial intelligence have enabled the analysis of text together with visual data, which could have important implications in healthcare. Diagnosis in ophthalmology is often based on a combination of ocular examination and clinical context. The aim of this study was to evaluate the performance of multimodal GPT-4 (GPT-4 V) in an integrated analysis of ocular images and clinical text. This retrospective study included 40 patients seen at our institution with images of their ocular examinations. Cases were selected by a board-certified ophthalmologist to represent various pathologies. We provided the model with each patient image, first without and then with the clinical context. We also asked two non-ophthalmology physicians to write diagnoses for each image, first without and then with the clinical context. Answers from both GPT-4 V and the non-ophthalmologists were evaluated by two board-certified ophthalmologists. Diagnostic accuracies were calculated and compared. GPT-4 V provided the correct diagnosis in 19/40 (47.5%) cases based on images without clinical context, and in 27/40 (67.5%) cases when clinical context was provided. The two non-ophthalmologist physicians provided the correct diagnosis in 24/40 (60.0%) and 23/40 (57.5%) of cases without clinical context, and in 29/40 (72.5%) and 27/40 (67.5%) of cases with clinical context. For all study participants, adding context improved accuracy (p = 0.033). GPT-4 V is currently able to simultaneously analyze and integrate visual and textual data, and to arrive at an accurate clinical diagnosis in the majority of cases when clinical context is provided. Multimodal large language models like GPT-4 V have significant potential to advance both patient care and research in ophthalmology.
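The accuracies quoted in the abstract follow directly from the reported counts of correct diagnoses out of 40 cases. The short Python sketch below recomputes them and the gain from adding clinical context. It is an illustration only, not the authors' analysis code: the labels "Physician 1" and "Physician 2" and the function names are hypothetical, the pairing of the two physicians' counts across conditions assumes the abstract lists them in the same order, and the sketch does not attempt to reproduce the reported p-value, since the abstract does not name the statistical test.

```python
# Counts of correct diagnoses as reported in the abstract (out of 40 cases).
# The reader labels "Physician 1" / "Physician 2" are hypothetical, and the
# pairing of their counts across conditions is an assumption.
TOTAL_CASES = 40

CORRECT_COUNTS = {
    "GPT-4 V": {"image_only": 19, "with_context": 27},
    "Physician 1": {"image_only": 24, "with_context": 29},
    "Physician 2": {"image_only": 23, "with_context": 27},
}


def accuracy_percent(n_correct: int, total: int = TOTAL_CASES) -> float:
    """Return accuracy as a percentage of the total number of cases."""
    return 100.0 * n_correct / total


def summarize(counts: dict) -> dict:
    """Compute accuracy per condition and the gain in percentage points."""
    summary = {}
    for reader, by_condition in counts.items():
        image_only = accuracy_percent(by_condition["image_only"])
        with_context = accuracy_percent(by_condition["with_context"])
        summary[reader] = {
            "image_only": image_only,
            "with_context": with_context,
            "gain_points": with_context - image_only,
        }
    return summary


if __name__ == "__main__":
    for reader, row in summarize(CORRECT_COUNTS).items():
        print(
            f"{reader}: {row['image_only']:.1f}% without context, "
            f"{row['with_context']:.1f}% with context "
            f"(+{row['gain_points']:.1f} points)"
        )
```

Under these assumptions, the gain from adding clinical context is 20.0 percentage points for GPT-4 V and 12.5 and 10.0 percentage points for the two physicians.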

Original language: English
Article number: 4999
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025

Keywords

  • AI
  • GPT
  • LLMs
  • Large language models
  • Multimodal algorithms
  • Ophthalmology
