A major manifestation of cancer is the alteration of protein levels. However, proteins are harder and more expensive to measure than genes and transcripts. To address this problem, we crowdsourced the task through the NCI-CPTAC DREAM proteogenomics challenge. We provided participants with data to build models that predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. We then asked participants to use these models to predict unseen (phospho)protein data from the genomic and transcriptomic data of other patients. This experiment allowed us to assess the predictive performance of the proposed methods in an unbiased and “double-blinded” manner. We found that ensemble methods performed better, and we identified which proteins and biological processes are easier or harder to predict. In general, performance was limited, suggesting that (phospho)proteomic profiling cannot yet be replaced by genomic and transcriptomic profiling.
- machine learning
- protein regulation