TY - JOUR
T1 - A clinical benchmark of public self-supervised pathology foundation models
AU - Campanella, Gabriele
AU - Chen, Shengjia
AU - Singh, Manbir
AU - Verma, Ruchika
AU - Muehlstedt, Silke
AU - Zeng, Jennifer
AU - Stock, Aryeh
AU - Croken, Matt
AU - Veremis, Brandon
AU - Elmas, Abdulkadir
AU - Shujski, Ivan
AU - Neittaanmäki, Noora
AU - Huang, Kuan Lin
AU - Kwan, Ricky
AU - Houldsworth, Jane
AU - Schoenfeld, Adam J.
AU - Vanderbilt, Chad
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - The use of self-supervised learning to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This will significantly enhance scientific research in computational pathology and help bridge the gap between research and clinical deployment. With the increase in availability of public foundation models of different sizes, trained using different algorithms on different datasets, it becomes important to establish a benchmark to compare the performance of such models on a variety of clinically relevant tasks spanning multiple organs and diseases. In this work, we present a collection of pathology datasets comprising clinical slides associated with clinically relevant endpoints including cancer diagnoses and a variety of biomarkers generated during standard hospital operation from three medical centers. We leverage these datasets to systematically assess the performance of public pathology foundation models and provide insights into best practices for training foundation models and selecting appropriate pretrained models. To enable the community to evaluate their models on our clinical datasets, we make available an automated benchmarking pipeline for external use.
UR - https://www.scopus.com/pages/publications/105002974175
U2 - 10.1038/s41467-025-58796-1
DO - 10.1038/s41467-025-58796-1
M3 - Article
AN - SCOPUS:105002974175
SN - 2041-1723
VL - 16
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 3640
ER -