TY - GEN
T1 - Utilizing Open-Source Large Language Models to Extract Genitourinary Symptoms from Clinical Notes
AU - Bai, Yunbing
AU - Cui, Wanting
AU - Finkelstein, Joseph
N1 - Publisher Copyright:
© 2025 The Authors.
PY - 2025/6/26
Y1 - 2025/6/26
N2 - Accurately identifying patient signs and symptoms from clinical notes is essential for effective diagnosis, treatment planning, and medical research. In this study, we evaluated the performance of the Meta Llama model in extracting signs and symptoms related to the genitourinary system, along with their corresponding ICD-10 codes, from urological clinical notes in the MTSamples dataset. The dataset was manually annotated to enable comparison with the large language model (LLM) output. We utilized Llama 3.3-70B and performed prompt engineering. The findings suggest that the best performance was achieved when the prompt included a predefined list of definitions of the corresponding ICD-10 codes and restricted the model from making assumptions. Under these conditions, Llama 3.3-70B achieved an average recall of 0.96, precision of 0.89, and F1-score of 0.92 for S&S extraction, as well as an average recall of 0.93, precision of 0.85, and F1-score of 0.89 for ICD-10 code generation.
AB - Accurately identifying patient signs and symptoms from clinical notes is essential for effective diagnosis, treatment planning, and medical research. In this study, we evaluated the performance of the Meta Llama model in extracting signs and symptoms related to the genitourinary system, along with their corresponding ICD-10 codes, from urological clinical notes in the MTSamples dataset. The dataset was manually annotated to enable comparison with the large language model (LLM) output. We utilized Llama 3.3-70B and performed prompt engineering. The findings suggest that the best performance was achieved when the prompt included a predefined list of definitions of the corresponding ICD-10 codes and restricted the model from making assumptions. Under these conditions, Llama 3.3-70B achieved an average recall of 0.96, precision of 0.89, and F1-score of 0.92 for S&S extraction, as well as an average recall of 0.93, precision of 0.85, and F1-score of 0.89 for ICD-10 code generation.
KW - Large Language Models
KW - Llama Models
KW - Natural Language Processing
KW - Symptom Extraction
UR - https://www.scopus.com/pages/publications/105010176976
U2 - 10.3233/SHTI250664
DO - 10.3233/SHTI250664
M3 - Conference contribution
C2 - 40588872
AN - SCOPUS:105010176976
T3 - Studies in Health Technology and Informatics
SP - 16
EP - 20
BT - Global Healthcare Transformation in the Era of Artificial Intelligence and Informatics
A2 - Mantas, John
A2 - Hasman, Arie
A2 - Gallos, Parisis
A2 - Zoulias, Emmanouil
A2 - Karitis, Konstantinos
PB - IOS Press BV
T2 - 23rd Annual International Conference on Informatics, Management, and Technology in Healthcare, ICIMTH 2025
Y2 - 4 July 2025 through 6 July 2025
ER -