TY - JOUR
T1 - Holistic optimization of an RNA-seq workflow for multi-threaded environments
AU - Hung, Ling Hong
AU - Lloyd, Wes
AU - Sridhar, Radhika Agumbe
AU - Ravishankar, Saranya Devi Athmalingam
AU - Xiong, Yuguang
AU - Sobie, Eric
AU - Yeung, Ka Yee
N1 - Publisher Copyright:
© The Author(s) 2019. Published by Oxford University Press. All rights reserved.
PY - 2019/10/15
Y1 - 2019/10/15
AB - For many next-generation sequencing pipelines, the most computationally intensive step is the alignment of reads to a reference sequence. As a result, alignment software such as the Burrows-Wheeler Aligner is optimized for speed and is often executed in parallel on the cloud. However, other, less demanding steps can also be optimized to significantly increase speed, especially when using many threads. We demonstrate this using a unique molecular identifier RNA-sequencing pipeline consisting of three steps: split, align, and merge. Optimization of all three steps yields a 40% increase in speed when executed using a single thread. When executed using 16 threads, however, we observe a 4-fold improvement over the original parallel implementation and more than an 8-fold improvement over the original single-threaded implementation. In contrast, optimizing only the alignment step results in just a 13% improvement over the original parallel workflow using 16 threads.
UR - http://www.scopus.com/inward/record.url?scp=85073183300&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btz169
DO - 10.1093/bioinformatics/btz169
M3 - Article
C2 - 30859176
AN - SCOPUS:85073183300
SN - 1367-4803
VL - 35
SP - 4173
EP - 4175
JO - Bioinformatics
JF - Bioinformatics
IS - 20
ER -