NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data

Michael D. Linderman, Jacob Wallace, Alderik van der Heyde, Eliza Wieman, Daniel Brey, Yiran Shi, Peter Hansen, Zahra Shamsi, Jeremiah Liu, Bruce D. Gelb, Ali Bashir

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation.

Results: NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state of the art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance by a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs.

Availability and implementation: Python/C++ source code and pre-trained models are freely available at https://github.com/mlinderm/npsv2.
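The image-similarity framing in the Results can be illustrated with a minimal sketch. It is not the NPSV-deep implementation: it assumes each pileup image has already been reduced to a numeric embedding by some encoder, and it uses a plain Euclidean distance where the paper uses a learned similarity. The function names, the embedding values and the toy data are all hypothetical.

```python
import math
from statistics import mean

# Hypothetical setup: each pileup image is assumed to have been reduced to a
# small numeric embedding. NPSV-deep learns its similarity with a neural
# network; plain Euclidean distance is swapped in here for illustration only.


def euclidean(a, b):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_genotype(real_embedding, simulated_embeddings):
    """Pick the genotype whose simulated replicates are closest to the real data.

    simulated_embeddings maps a genotype label ("0/0", "0/1", "1/1") to a list
    of embeddings, one per simulated replicate of that genotype.
    Returns the best label and the mean distance for every genotype.
    """
    distances = {
        genotype: mean(euclidean(real_embedding, sim) for sim in replicates)
        for genotype, replicates in simulated_embeddings.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Toy data (made up): two simulated replicates per candidate genotype.
simulated = {
    "0/0": [[0.0, 1.0], [0.05, 0.95]],  # homozygous reference
    "0/1": [[0.5, 0.5], [0.45, 0.55]],  # heterozygous
    "1/1": [[1.0, 0.0], [0.95, 0.05]],  # homozygous alternate
}
real = [0.48, 0.52]  # embedding of the image built from the actual SRS data

genotype, distances = predict_genotype(real, simulated)
print(genotype)  # 0/1
```

Averaging over several simulated replicates per genotype is a design choice of this sketch, meant to make the comparison less sensitive to noise in any single simulation.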

Original language: English
Article number: btae129
Journal: Bioinformatics
Volume: 40
Issue number: 3
State: Published - 1 Mar 2024
