Multiple-instance learning of somatic mutations for the classification of tumour type and the prediction of microsatellite status

Jordan Anaya, John William Sidhom, Faisal Mahmood, Alexander S. Baras

Research output: Contribution to journal > Article > peer-review

2 Scopus citations

Abstract

Large-scale genomic data are well suited to analysis by deep learning algorithms. However, for many genomic datasets, labels are at the level of the sample rather than for individual genomic measures. Machine learning models leveraging these datasets generate predictions by using statically encoded measures that are then aggregated at the sample level. Here we show that a single weakly supervised end-to-end multiple-instance-learning model with multi-headed attention can be trained to encode and aggregate the local sequence context or genomic position of somatic mutations, hence allowing for the modelling of the importance of individual measures for sample-level classification and thus providing enhanced explainability. The model solves synthetic tasks that conventional models fail at, and achieves best-in-class performance for the classification of tumour type and for predicting microsatellite status. By improving the performance of tasks that require aggregate information from genomic datasets, multiple-instance deep learning may generate biological insight.
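The aggregation step described in the abstract can be illustrated with a generic attention-based multiple-instance pooling layer. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' architecture: the function name `attention_mil_pool`, the softmax normalisation over instances, the random weights, and all dimensions are hypothetical choices made for the example. Each sample is treated as a bag of already-encoded mutation vectors, each attention head assigns one weight per mutation, and the weighted sums are concatenated to give a sample-level prediction.

```python
import numpy as np

def attention_mil_pool(instances, w_attn, w_out):
    """Pool a bag of instance vectors into sample-level logits.

    instances: (n_instances, d) encoded mutations for one sample (assumed given)
    w_attn:    (d, n_heads) one attention scoring vector per head (hypothetical)
    w_out:     (n_heads * d, n_classes) linear classifier weights (hypothetical)
    """
    # One raw attention score per instance and per head.
    scores = instances @ w_attn                      # (n_instances, n_heads)
    # Softmax over the instance axis, shifted for numerical stability.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Each head produces a weighted sum of the instance vectors.
    pooled = weights.T @ instances                   # (n_heads, d)
    # Concatenate the heads and classify at the sample level.
    logits = pooled.reshape(-1) @ w_out              # (n_classes,)
    return logits, weights

# Toy bag: 5 mutations, 8-dimensional encodings, 2 heads, 3 classes.
rng = np.random.default_rng(0)
n_mutations, d, n_heads, n_classes = 5, 8, 2, 3
bag = rng.normal(size=(n_mutations, d))
w_attn = rng.normal(size=(d, n_heads))
w_out = rng.normal(size=(n_heads * d, n_classes))

logits, weights = attention_mil_pool(bag, w_attn, w_out)
```

The returned `weights` array holds one value per mutation and per head, which is the kind of per-instance importance the abstract links to explainability. Because the pooling is a weighted sum, the sample-level output does not depend on the order of the mutations in the bag.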

Original language: English
Pages (from-to): 57-67
Number of pages: 11
Journal: Nature Biomedical Engineering
Volume: 8
Issue number: 1
State: Published - Jan 2024
Externally published: Yes
