Appropriate Evaluation of Diagnostic Utility of Machine Learning Algorithm Generated Images

Young Joon Kwon, Danielle Toussie, Lea Azour, Jose Concepcion, Corey Eber, G. Anthony Reina, Ping Tak Peter Tang, Amish H. Doshi, Eric K. Oermann, Anthony B. Costa

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

Generative machine learning (ML) methods can reduce time, cost, and radiation associated with medical image acquisition, compression, or generation techniques. While quantitative metrics are commonly used in the evaluation of ML-generated images, it is unknown how well these quantitative metrics relate to the diagnostic utility of images. Here, fellowship-trained radiologists provided diagnoses and qualitative evaluations on chest radiographs reconstructed from the current standard JPEG2000 or variational autoencoder (VAE) techniques. Cohen’s kappa coefficient measured the agreement of diagnoses based on different reconstructions. Methods that produced similar Fréchet inception distance (FID) showed similar diagnostic performance. Thus, in place of time-intensive expert radiologist verification, an appropriate target FID (an objective quantitative metric) can evaluate the clinical utility of ML-generated medical images.
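As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of Cohen's kappa for two sets of diagnoses of the same cases. It is not the authors' code: the function name `cohen_kappa`, the label strings, and the example diagnoses are hypothetical and chosen only to show the calculation, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of cases where both readings match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # marginal frequencies, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses of the same six radiographs, read once from a
# JPEG2000 reconstruction and once from a VAE reconstruction.
jpeg2000_reads = ["normal", "pneumonia", "normal", "effusion", "normal", "pneumonia"]
vae_reads = ["normal", "pneumonia", "normal", "normal", "normal", "pneumonia"]

kappa = cohen_kappa(jpeg2000_reads, vae_reads)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so values are easier to compare across reconstruction methods than raw percent agreement.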

Original language: English
Pages (from-to): 179-193
Number of pages: 15
Journal: Proceedings of Machine Learning Research
Volume: 136
State: Published - 2020
Event: 6th Workshop on Machine Learning for Health: Advancing Healthcare for All, ML4H 2020, in conjunction with the 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 11 Dec 2020 → …

Keywords

  • Clinical Validation
  • Data Compression
  • Generative Models
