Validation of a Zero-shot Learning Natural Language Processing Tool to Facilitate Data Abstraction for Urologic Research

Basil Kaufmann, Dallin Busby, Chandan Krushna Das, Neeraja Tillu, Mani Menon, Ashutosh K. Tewari, Michael A. Gorin

Research output: Contribution to journal, Article, peer-review


Background: Urologic research often requires data abstraction from unstructured text contained within the electronic health record. A number of natural language processing (NLP) tools have been developed to aid with this time-consuming task; however, the generalizability of these tools is typically limited by the need for task-specific training.

Objective: To describe the development and validation of a zero-shot learning NLP tool to facilitate data abstraction from unstructured text for use in downstream urologic research.

Design, setting, and participants: An NLP tool based on the GPT-3.5 model from OpenAI was developed and compared with three physicians for time to task completion and accuracy in abstracting 14 unique variables from a set of 199 deidentified radical prostatectomy pathology reports. The reports were processed in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction.

Intervention: A zero-shot learning NLP tool for data abstraction.

Outcome measurements and statistical analysis: The tool was compared with the human abstractors in terms of superiority for data abstraction speed and noninferiority for accuracy.

Results and limitations: The human abstractors required a median (interquartile range) of 93 s (72–122 s) per report for data abstraction, whereas the software required a median of 12 s (10–15 s) for the vectorized reports and 15 s (13–17 s) for the scanned reports (p < 0.001 for all paired comparisons). The accuracies of the three human abstractors were 94.7% (95% confidence interval [CI], 93.8–95.5%), 97.8% (95% CI, 97.2–98.3%), and 96.4% (95% CI, 95.6–97%) for the combined set of 2786 data points. The tool had an accuracy of 94.2% (95% CI, 93.3–94.9%) for the vectorized reports and was noninferior to the human abstractors at a margin of –10% (α = 0.025). The tool had a slightly lower accuracy of 88.7% (95% CI, 87.5–89.9%) for the scanned reports, making it noninferior to two of the three human abstractors.

Conclusions: The developed zero-shot learning NLP tool offers urologic researchers a highly generalizable and accurate method for data abstraction from unstructured text. An open access version of the tool is available for immediate use by the urologic community.

Patient summary: In this report, we describe the design and validation of an artificial intelligence tool for abstracting discrete data from unstructured notes contained within the electronic medical record. This freely available tool, which is based on the GPT-3.5 technology from OpenAI, is intended to facilitate research and scientific discovery by the urologic community.
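The noninferiority comparison reported above can be illustrated with a minimal sketch. The abstract does not state which test the authors used, so the code below is an assumption: it treats the tool and each human abstractor as independent samples of 2786 data points and applies a simple Wald interval to the difference in accuracy. The function name `noninferiority_lower_bound` and the use of rounded accuracies from the abstract are illustrative choices, not the authors' method.

```python
import math

# Accuracies as rounded in the abstract; n is the combined number of data points.
N_POINTS = 2786
HUMAN_ACCURACIES = [0.947, 0.978, 0.964]
TOOL_VECTORIZED = 0.942
TOOL_SCANNED = 0.887


def noninferiority_lower_bound(p_tool, p_human, n, z=1.96):
    """Lower bound of the two-sided 95% Wald interval for (p_tool - p_human).

    Assumption: both accuracies come from independent samples of size n.
    A one-sided test at alpha = 0.025 corresponds to z = 1.96.
    """
    diff = p_tool - p_human
    se = math.sqrt(p_tool * (1 - p_tool) / n + p_human * (1 - p_human) / n)
    return diff - z * se


def is_noninferior(p_tool, p_human, n, margin=-0.10):
    """The tool is noninferior if the lower bound stays above the margin."""
    return noninferiority_lower_bound(p_tool, p_human, n) > margin


vectorized_results = [
    is_noninferior(TOOL_VECTORIZED, h, N_POINTS) for h in HUMAN_ACCURACIES
]
scanned_results = [
    is_noninferior(TOOL_SCANNED, h, N_POINTS) for h in HUMAN_ACCURACIES
]

print("vectorized:", vectorized_results)
print("scanned:", scanned_results)
```

Under these simplifying assumptions, the sketch flags the vectorized reports as noninferior to all three abstractors and the scanned reports as noninferior to two of three, which agrees with the pattern reported in the abstract. A paired analysis on the per-data-point results would be the more natural choice if those data were available.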

Original language: English
Journal: European Urology Focus
State: Accepted/In press - 2024


Keywords:
  • Data abstraction
  • Large language models
  • Natural language processing

