The immunopeptidome comprises the suite of human leukocyte antigen (HLA)-bound peptides that are presented at the cell surface for recognition by patrolling T cells. Peptides with post-translational modifications (PTMs) can be presented by HLA molecules, contributing to the diversity of possible T-cell targets. Glycosylation is one such PTM and involves the attachment of sugar chains to the side chains of specific amino acids. Although glycosylation is a ubiquitous and important PTM for many proteins within the cell, its prevalence and diversity across the immunopeptidome remain underexplored. We have recently shown that the HLA class II repertoire is populated extensively by peptides bearing trimmed N-glycans; however, the situation for HLA class I is still unclear, in part due to the lower proportion of glycopeptides presented by these molecules.
While peptide search engines such as PEAKS, Byonic, and MSFragger can identify glycosylated immunopeptides, optimised data generation and analysis workflows are still needed. This is largely due to (i) the non-tryptic nature of immunopeptides, which creates a vast search space that is further expanded by the presence of glycans, and (ii) the futility of searching the large proportion of spectra that bear none of the hallmark oxonium ion signatures of glycosylation. To remedy this, we sought to streamline the analysis of glycosylated immunopeptides through in silico enrichment of putative glycopeptide spectra. This is achieved by filtering out all spectra that do not contain the glycopeptide-specific oxonium ions liberated by collision-induced dissociation during mass spectrometry. We also developed an in-house algorithm that generates dataset-specific glycan lists to serve as pseudo-glycomics databases, further reducing the search space and the search time required. We then assessed and benchmarked this approach with multiple peptide search engines, primarily Byonic, the leading glycoproteomics search engine. The in silico enrichment approach reduced the search time required for immunopeptide data from days to hours, depending on the dataset. Our results show that although HLA class I immunopeptidome datasets contain only a relatively small fraction (~0.5-10%) of glycopeptide spectra, the identified glycopeptides conform to the expected HLA class I binding motifs and, depending on the presenting allele, are decorated by a variety of N- and O-linked glycans previously overlooked by the field.
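The oxonium-ion filtering step can be illustrated with a minimal sketch. This is not the in-house implementation; the ion list (a handful of commonly reported HexNAc, HexHexNAc, and NeuAc oxonium ions), the 20 ppm tolerance, the 5% relative-intensity cutoff, and the function names `matched_oxonium_ions` and `enrich_glyco_spectra` are all assumptions chosen for illustration.

```python
from bisect import bisect_left

# Commonly reported glycan oxonium ions (singly charged, monoisotopic m/z).
# ASSUMPTION: this short list is illustrative, not the list used by the authors.
OXONIUM_IONS = {
    "HexNAc fragment": 138.0550,
    "HexNAc": 204.0867,
    "NeuAc - H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def matched_oxonium_ions(peaks, ppm_tol=20.0, min_rel_intensity=0.05):
    """Return the names of oxonium ions detected in one MS/MS spectrum.

    peaks: list of (mz, intensity) tuples sorted by ascending m/z.
    ppm_tol and min_rel_intensity are hypothetical defaults.
    """
    if not peaks:
        return []
    base_peak = max(intensity for _, intensity in peaks)
    if base_peak <= 0:
        return []
    mzs = [mz for mz, _ in peaks]
    found = []
    for name, target in OXONIUM_IONS.items():
        tol = target * ppm_tol * 1e-6
        # Jump to the first peak inside the tolerance window, then scan it.
        k = bisect_left(mzs, target - tol)
        while k < len(peaks) and mzs[k] <= target + tol:
            if peaks[k][1] >= min_rel_intensity * base_peak:
                found.append(name)
                break
            k += 1
    return found


def enrich_glyco_spectra(spectra, min_ions=1, **kwargs):
    """Keep only spectra with at least `min_ions` matched oxonium ions.

    spectra: dict mapping a spectrum identifier to its peak list.
    """
    return {
        spec_id: peaks
        for spec_id, peaks in spectra.items()
        if len(matched_oxonium_ions(peaks, **kwargs)) >= min_ions
    }


# Toy spectra (made-up peaks, for demonstration only).
example_spectra = {
    "glyco": [(138.0551, 300.0), (204.0868, 1000.0), (366.1390, 200.0), (500.25, 800.0)],
    "plain": [(175.119, 500.0), (204.1340, 900.0), (450.2, 1000.0)],
    "weak": [(204.0867, 10.0), (600.3, 1000.0)],
}
kept = enrich_glyco_spectra(example_spectra)
print(sorted(kept))
```

In this toy example only the `glyco` spectrum is retained: `plain` has a peak near m/z 204 that falls outside the ppm window, and `weak` has a HexNAc peak below the relative-intensity cutoff. Only the retained subset would then be passed to the search engine, which is what shrinks the search.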
In summary, our workflow provides a streamlined computational approach for the analysis of glycosylated HLA class I immunopeptides, opening exciting opportunities to explore their structural diversity and functional roles in immune surveillance.