Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components

Min Wang, Steven M. Kornblau, Kevin R. Coombes

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
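To make the "number of significant PCs" question concrete, the following is a minimal sketch of the broken-stick rule, one of the simpler criteria listed in the keywords. It is not the PCDimension implementation and not the automated Bayesian procedure the abstract describes. The function names and the example eigenvalues are hypothetical and chosen only for illustration. Under the broken-stick model, the expected share of variance for the k-th of p components is (1/p) Σ_{i=k..p} 1/i, and components are retained while their observed share exceeds that expectation.

```python
def broken_stick_expectations(p):
    """Expected variance share of each of p components under the broken-stick model."""
    # b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p; the shares sum to 1.
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_dimension(eigenvalues):
    """Count leading components whose observed variance share exceeds the expectation."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    # Sort in decreasing order so component 1 is the largest.
    observed = [ev / total for ev in sorted(eigenvalues, reverse=True)]
    expected = broken_stick_expectations(len(observed))
    count = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break  # stop at the first component that is no better than chance
        count += 1
    return count


# Hypothetical eigenvalues (illustration only, not data from the paper).
example = [5.0, 3.0, 0.5, 0.5, 0.4, 0.3, 0.2, 0.1]
print(broken_stick_dimension(example))
```

With these made-up eigenvalues, the first two components exceed their broken-stick expectations and the third does not, so the rule retains two components. Real analyses would take the eigenvalues from the PCA of the data set in question.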

Original language: English
Journal: Cancer Informatics
Volume: 17
State: Published - 7 May 2018
Externally published: Yes

Keywords

  • Auer-Gervini
  • Bayes rule
  • Dimension reduction
  • Broken stick
  • Randomization-based procedure
