TY - JOUR
T1 - Optimizing a whole-genome sequencing data processing pipeline for precision surveillance of health care-associated infections
AU - Huang, Weihua
AU - Wang, Guiqing
AU - Yin, Changhong
AU - Chen, Donald
AU - Dhand, Abhay
AU - Chanza, Melissa
AU - Dimitrova, Nevenka
AU - Fallon, John T.
N1 - Publisher Copyright: © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2019/10
Y1 - 2019/10
AB - The surveillance of health care-associated infection (HAI) is an essential element of the infection control program. While whole-genome sequencing (WGS) has been widely adopted for genomic surveillance, its data processing remains to be improved. Here, we propose a three-level data processing pipeline for the precision genomic surveillance of microorganisms without prior knowledge: species identification, multi-locus sequence typing (MLST), and sub-MLST clustering. The first two are closely connected to methods widely used in current clinical microbiology laboratories, whereas the last provides significantly improved resolution and accuracy in genomic surveillance. In comparison with a broadly used reference-dependent alignment/mapping method and an annotation-dependent pan-/core-genome analysis, we applied our reference- and annotation-independent, k-mer-based, simplified workflow to a collection of Acinetobacter and Enterococcus clinical isolates for testing. By taking both single nucleotide variants and genomic structural changes into account, the optimized k-mer-based pipeline provided a global view of bacterial population structure rapidly and discriminated the relatedness between bacterial isolates in more detail and with greater precision. The newly developed WGS data processing pipeline would facilitate the application of WGS to the precision genomic surveillance of HAI. In addition, the results from such a WGS-based analysis would be useful for the precision laboratory diagnosis of infectious microorganisms.
KW - Data processing pipeline
KW - Genomic surveillance
KW - Health care-associated infection (HAI)
KW - K-mer
KW - Whole-genome sequencing (WGS)
UR - http://www.scopus.com/inward/record.url?scp=85074280627&partnerID=8YFLogxK
U2 - 10.3390/microorganisms7100388
DO - 10.3390/microorganisms7100388
M3 - Article
AN - SCOPUS:85074280627
SN - 2076-2607
VL - 7
JO - Microorganisms
JF - Microorganisms
IS - 10
M1 - 388
ER -