Zitian Zhen1, Alexis A. Howard2, Derin B. Keskin1,2, Vladimir Brusic1,3, Lou Chitkushev1* and Guang Lan Zhang1
1 Metropolitan College, Boston University, Boston MA, USA
2 Dana-Farber Cancer Institute, Boston MA, USA
3 University of Nottingham Ningbo, Ningbo, China
* ltc [at] bu.edu
Abstract
C57 Black 6 (C57BL/6) mice, recognized for their genetic uniformity, are among the most widely used inbred laboratory animals in immunology research and vaccine development. We propose a bioinformatics system for the in silico prediction of MHC class I-restricted T-cell epitopes in C57BL/6 mice. Among the multiple steps of the MHC class I antigen processing and recognition pathway, MHC binding is considered the most selective step in T-cell recognition. However, many bioinformatics systems model MHC binding alone when predicting epitopes, which leads to higher false-positive rates. Recent advances in mass spectrometry (MS) have provided abundant MHC class I ligand data, allowing antigen processing steps to be incorporated into prediction models. We collected >5,000 H2-Dᵇ and >5,000 H2-Kᵇ binding peptides, along with >4,000 H2-Dᵇ and >5,000 H2-Kᵇ eluted ligands, from public databases. Additionally, assessing the thermostability of peptide-MHC binding is crucial for accurate immunogenicity prediction, as stability affects antigen presentation efficiency and T-cell activation. We used data from Dana-Farber Cancer Institute’s temperature gradient experiments, which yielded >3,000 H2-Dᵇ and >5,000 H2-Kᵇ binding peptides. Our work integrates these factors into a computational system for epitope identification in C57BL/6 mice. Using deep learning methods, we trained and validated epitope prediction models on natural ligands and thermostability models on the proprietary data generated by our collaborators. We compared the performance of our models with that of existing prediction tools that have been validated in many benchmark studies. Our integrated model exhibited superior overall predictive performance.
Keywords: bioinformatics system, deep learning, T-cell epitope, MHC binding, C57BL/6
Acknowledgement: DBK acknowledges support from R01-HL157174 and NIH/NCI 3UG1CA189955-09S1.