Oral Presentation 29th Annual Lorne Proteomics Symposium 2024

From peptides to proteins: missingness-informed protein quantification in bottom-up proteomics (#17)

Mengbo Li 1, Gordon Smyth 1
  1. WEHI, Parkville, VIC, Australia

Mass spectrometry (MS)-based proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values. We argue that missing values in MS-based proteomics data should always be viewed as missing not at random (MNAR), because the probability of detection is related to the underlying intensity. We propose a statistical model for non-ignorable missing values in proteomics data, termed the detection probability curve (DPC). Importantly, the DPC provides a probabilistic model for missing values and can be used to inform downstream differential expression analysis. To this end, we introduce the DPC-quantification model, in which missing values are taken into account when summarizing peptides into proteins. For each protein group, we use the DPC to represent missing values. An additive linear model is fitted to estimate the protein-level intensity in each sample by maximizing the posterior distribution with empirical priors. Uncertainty in the protein-level estimates is incorporated into differential expression testing via a customized limma analysis. The proposed methods are evaluated on real data, where we show that the DPC-quantification model eliminates missing values in protein-level data and improves statistical power for differential expression in proteome-wide experiments while maintaining correct control of the false discovery rate.
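
The MNAR argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not give the functional form of the DPC, so a logistic curve in log-intensity is assumed here, and the function names, the coefficients `beta0` and `beta1`, and the simulated intensity distribution are all hypothetical choices made for illustration. The point is that when detection probability rises with intensity, the values that are observed overstate the true mean, so the missing values cannot be ignored.

```python
import math
import random


def detection_probability(log_intensity, beta0=-8.0, beta1=0.5):
    """Assumed logistic detection curve: P(detected | log-intensity).

    beta0 and beta1 are illustrative values, not estimates from real data.
    With these defaults the detection probability is 0.5 at log-intensity 16.
    """
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * log_intensity)))


def simulate(n=20000, mean=16.0, sd=2.0, seed=1):
    """Draw true log-intensities, then keep each with its detection probability."""
    rng = random.Random(seed)
    true_values = [rng.gauss(mean, sd) for _ in range(n)]
    observed = [y for y in true_values if rng.random() < detection_probability(y)]
    return true_values, observed


true_values, observed = simulate()
true_mean = sum(true_values) / len(true_values)
observed_mean = sum(observed) / len(observed)
missing_fraction = 1.0 - len(observed) / len(true_values)

print(f"true mean log-intensity:     {true_mean:.2f}")
print(f"observed mean log-intensity: {observed_mean:.2f}")
print(f"fraction missing:            {missing_fraction:.2f}")
```

Under these assumptions the observed mean exceeds the true mean, because low-intensity values are preferentially lost. A summarization method that averages only the observed peptide intensities would inherit this upward bias.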