TY - JOUR
T1 - Large language models versus classical machine learning performance in COVID-19 mortality prediction using high-dimensional tabular data
AU - Ghaffarzadeh-Esfahani, Mohammadreza
AU - Ghaffarzadeh-Esfahani, Mahdi
AU - Salahi-Niri, Aryan
AU - Toreyhi, Hossein
AU - Atf, Zahra
AU - Mohsenzadeh-Kermani, Amirali
AU - Sarikhani, Mahshad
AU - Tajabadi, Zohreh
AU - Shojaeian, Fatemeh
AU - Bagheri, Mohammad Hassan
AU - Feyzi, Aydin
AU - Tarighat-Payma, Mohamadamin
AU - Gazmeh, Narges
AU - Heydari, Fateme
AU - Afshar, Hossein
AU - Allahgholipour, Amirreza
AU - Alimardani, Farid
AU - Salehi, Ameneh
AU - Asadimanesh, Naghmeh
AU - Khalafi, Mohammad Amin
AU - Shabanipour, Hadis
AU - Moradi, Ali
AU - Zadeh, Sajjad Hossein
AU - Yazdani, Omid
AU - Esbati, Romina
AU - Maleki, Moozhan
AU - Nasr, Danial Samiei
AU - Soheili, Amirali
AU - Majlesi, Hossein
AU - Shahsavan, Saba
AU - Soheilipour, Alireza
AU - Goudarzi, Nooshin
AU - Taherifard, Erfan
AU - Hatamabadi, Hamidreza
AU - Samaan, Jamil S.
AU - Savage, Thomas
AU - Sakhuja, Ankit
AU - Soroush, Ali
AU - Nadkarni, Girish
AU - Darazam, Ilad Alavi
AU - Pourhoseingholi, Mohamad Amin
AU - Safavi-Naini, Seyed Amir Ahmad
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - This study compared the performance of classical feature-based machine learning models (CMLs) and large language models (LLMs) in predicting COVID-19 mortality using high-dimensional tabular data from 9,134 patients across four hospitals. Seven CML models, including XGBoost and random forest (RF), were evaluated alongside eight LLMs, such as GPT-4 and Mistral-7b, which performed zero-shot classification on text-converted structured data. Additionally, Mistral-7b was fine-tuned using the QLoRA approach. XGBoost and RF demonstrated superior performance among CMLs, achieving F1 scores of 0.87 and 0.83 for internal and external validation, respectively. GPT-4 led the LLM category with an F1 score of 0.43, while fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, yielding a stable F1 score of 0.74 during external validation. Although LLMs showed moderate performance in zero-shot classification, fine-tuning substantially enhanced their effectiveness, potentially bridging the gap with CML models. However, CMLs still outperformed LLMs in handling high-dimensional tabular data tasks. This study highlights the potential of both CMLs and fine-tuned LLMs in medical predictive modeling, while emphasizing the current superiority of CMLs for structured data analysis.
KW - COVID-19 mortality
KW - Fine-tuning
KW - Large language models
KW - Machine learning
KW - Structured data
KW - Zero-shot classification
UR - https://www.scopus.com/pages/publications/105023334374
U2 - 10.1038/s41598-025-26705-7
DO - 10.1038/s41598-025-26705-7
M3 - Article
C2 - 41315569
AN - SCOPUS:105023334374
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 42712
ER -