TY - JOUR
T1 - Integrating shotgun proteomics and mRNA expression data to improve protein identification
AU - Ramakrishnan, Smriti R.
AU - Vogel, Christine
AU - Prince, John T.
AU - Li, Zhihua
AU - Penalva, Luiz O.
AU - Myers, Margaret
AU - Marcotte, Edward M.
AU - Miranker, Daniel P.
AU - Wang, Rong
N1 - Funding: National Science Foundation (DBI-0640923, IIS-0325116); Welch Foundation (F-1515); Packard Foundation; National Institutes of Health (GM06779-01, GM076536-01); International Human Frontier Science Program (to C.V.).
PY - 2009/6
Y1 - 2009/6
N2 - Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other information available, e.g. the probability of a protein's presence is likely to correlate with its mRNA concentration. Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate, e.g. in yeast, MSpresso increases the number of proteins identified by ∼40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions and organisms (Escherichia coli, human), and predict 19-63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun proteomics experiments can substantially improve protein identification scores.
UR - http://www.scopus.com/inward/record.url?scp=65649152557&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btp168
DO - 10.1093/bioinformatics/btp168
M3 - Article
C2 - 19318424
AN - SCOPUS:65649152557
SN - 1367-4803
VL - 25
SP - 1397
EP - 1403
JO - Bioinformatics
JF - Bioinformatics
IS - 11
ER -