
Scientific and Technical Journal of Information Technologies, Mechanics and Optics


Incorporating negative examples into Hidden Markov Model-based classification of peptide sequences

https://doi.org/10.17586/2226-1494-2025-25-5-888-901

Abstract

Hidden Markov Models (HMMs) trained to identify binding regions in peptide sequences have been shown to uncover shared amino acid patterns in peptides bound to major histocompatibility complex (MHC) molecules. In this work, we present an enhanced approach for predicting peptide binding using an ensemble of HMMs. Building on a previously proposed method, we extend it to a classification setting by incorporating both binding (positive) and non-binding (negative) peptide sequences. Our strategy involves training two sets of models on these distinct datasets and selecting ensemble members based on conditional probability estimates. The method was evaluated on six MHC alleles using two model architectures: a simplified architecture with 9 states representing the peptide binding core region and two cycle states for the amino acids outside this region, and an extended architecture in which each cycle state is replaced by 9 additional states. The models were compared against the state-of-the-art MixMHC2pred predictor. The results show a statistically significant improvement in prediction accuracy. Notably, incorporating non-binding peptides during training improved performance in several cases, highlighting the importance of background sequence information in distinguishing binding-specific patterns.
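The scoring idea summarized above (separate models for binders and non-binders, combined through conditional probabilities) can be illustrated with a small Python sketch. This is not the authors' implementation and does not reproduce their training or ensemble-selection procedure. It assumes a left-to-right topology for the simplified architecture: an N-terminal cycle state, nine linear core states and a C-terminal cycle state, with uniform background emissions in the cycle states and hand-picked transition probabilities. The helper names `SimplifiedHMM`, `motif_emissions` and `binding_posterior`, the motif `FVKLAGNTL` and all numeric parameters are hypothetical. A peptide is scored with the forward algorithm under a positive and a negative model, and the two likelihoods are combined with Bayes' rule into P(binder | peptide).

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE_LEN = 9                        # binding-core states in the simplified architecture
N_FLANK, C_FLANK = 0, CORE_LEN + 1  # indices of the two cycle (flank) states (assumed layout)
NUM_STATES = CORE_LEN + 2
NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), mapping p == 0 to -inf so forbidden transitions drop out of the sums."""
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def motif_emissions(motif: str, weight: float = 0.6) -> List[Dict[str, float]]:
    """Hypothetical core emissions: `weight` on the motif residue, the rest spread evenly."""
    rest = (1.0 - weight) / (len(AMINO_ACIDS) - 1)
    return [{aa: (weight if aa == m else rest) for aa in AMINO_ACIDS} for m in motif]


class SimplifiedHMM:
    """N-flank cycle state -> 9 linear core states -> C-flank cycle state (assumed topology)."""

    def __init__(self, core_emissions: List[Dict[str, float]], flank_stay: float = 0.8,
                 start_in_flank: float = 0.5, core_to_c_flank: float = 0.5):
        if len(core_emissions) != CORE_LEN:
            raise ValueError("expected one emission table per core state")
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
        self.emit = [background] + list(core_emissions) + [background]
        self.start = [0.0] * NUM_STATES
        self.trans = [[0.0] * NUM_STATES for _ in range(NUM_STATES)]
        self.end = [0.0] * NUM_STATES           # probability of terminating from each state
        self.start[N_FLANK] = start_in_flank    # a peptide may begin in the N-terminal flank...
        self.start[1] = 1.0 - start_in_flank    # ...or directly at the first core position
        self.trans[N_FLANK][N_FLANK] = flank_stay
        self.trans[N_FLANK][1] = 1.0 - flank_stay
        for i in range(1, CORE_LEN):            # the core is traversed strictly left to right
            self.trans[i][i + 1] = 1.0
        self.trans[CORE_LEN][C_FLANK] = core_to_c_flank
        self.end[CORE_LEN] = 1.0 - core_to_c_flank
        self.trans[C_FLANK][C_FLANK] = flank_stay
        self.end[C_FLANK] = 1.0 - flank_stay

    def log_likelihood(self, peptide: str) -> float:
        """Forward algorithm in log space: log P(peptide | model), summed over all paths."""
        if len(peptide) < CORE_LEN:
            raise ValueError("peptide is shorter than the binding core")
        alpha = [safe_log(self.start[k]) + safe_log(self.emit[k][peptide[0]])
                 for k in range(NUM_STATES)]
        for aa in peptide[1:]:
            alpha = [logsumexp([alpha[j] + safe_log(self.trans[j][k]) for j in range(NUM_STATES)])
                     + safe_log(self.emit[k][aa])
                     for k in range(NUM_STATES)]
        return logsumexp([alpha[k] + safe_log(self.end[k]) for k in range(NUM_STATES)])


def binding_posterior(peptide: str, positive: SimplifiedHMM, negative: SimplifiedHMM,
                      prior_binder: float = 0.5) -> float:
    """P(binder | peptide) from Bayes' rule over the two class-conditional models."""
    log_odds = (positive.log_likelihood(peptide) - negative.log_likelihood(peptide)
                + math.log(prior_binder) - math.log(1.0 - prior_binder))
    if log_odds >= 0:                           # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    uniform_core = [{aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS} for _ in range(CORE_LEN)]
    positive = SimplifiedHMM(motif_emissions("FVKLAGNTL"))  # hypothetical binder motif
    negative = SimplifiedHMM(uniform_core)                   # background-like non-binder model
    for pep in ["GSDFVKLAGNTLEKA", "GSDPEDQRSKEEPKA"]:
        print(pep, f"{binding_posterior(pep, positive, negative):.4f}")
```

Because every path must pass through all nine core states, peptides shorter than nine residues have zero likelihood and are rejected. Working in log space keeps the forward recursion stable for long peptides, and with `prior_binder=0.5` the posterior reduces to a logistic function of the log-likelihood ratio between the two models. The extended architecture, in which each cycle state is replaced by 9 additional states, is not shown.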

About the Authors

V. A. Polezhaeva
ITMO University
Russian Federation

Valeriia A. Polezhaeva — Student

Saint Petersburg, 197101



D. A. Kleverov
Washington University in St. Louis. School of Medicine. Department of Pathology and Immunology
United States

Denis A. Kleverov — Visiting Researcher

Scopus Author ID: 58741254400

Saint Louis, 63110



A. A. Shalyto
ITMO University
Russian Federation

Anatoly A. Shalyto, D.Sc., Full Professor

Scopus Author ID: 56131789500

Saint Petersburg, 197101



M. Artyomov
ITMO University; Washington University in St. Louis. School of Medicine. Department of Pathology and Immunology
Russian Federation

Maxim Artyomov — PhD (Chemistry), Full Professor; Professor

Scopus Author ID: 9242717500

Saint Petersburg, 197101

Saint Louis, 63110



References

1. Corradin G. Antigen processing and presentation. Immunology Letters, 1990, vol. 25, no. 1–3, pp. 11–13. https://doi.org/10.1016/0165-2478(90)90082-2

2. Abualrous E.T., Sticht J., Freund C. Major histocompatibility complex (MHC) class I and class II proteins: impact of polymorphism on antigen presentation. Current Opinion in Immunology, 2021, vol. 70, pp. 95–104. https://doi.org/10.1016/j.coi.2021.04.009

3. Waldman A.D., Fritz J.M., Lenardo M.J. A guide to cancer immunotherapy: from T cell basic science to clinical practice. Nature Reviews Immunology, 2020, vol. 20, no. 11, pp. 651–668. https://doi.org/10.1038/s41577-020-0306-5

4. Wieczorek M., Abualrous E.T., Sticht J., Alvaro-Benito M., Stolzenberg S., Noé F., Freund C. Major histocompatibility complex (MHC) class I and MHC class II proteins: conformational plasticity in antigen presentation. Frontiers in Immunology, 2017, vol. 8, pp. 292. https://doi.org/10.3389/fimmu.2017.00292

5. Kleverov D.A., Shalyto A.A., Artyomov M.N. A method for constructing interpretable hidden Markov models for the task of identifying binding cores in sequences. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 5, pp. 989–1000. (in Russian). https://doi.org/10.17586/2226-1494-2023-23-5-989-1000

6. Gutiérrez S.E., Esteban E.N., Lützelschwab C.M., Juliarena M.A. Major histocompatibility complex-associated resistance to infectious diseases: the case of bovine leukemia virus infection. Trends and Advances in Veterinary Genetics, 2017, pp. 101–126. https://doi.org/10.5772/intechopen.68416

7. Eddy S.R. Profile hidden Markov models. Bioinformatics, 1998, vol. 14, no. 9, pp. 755–763. https://doi.org/10.1093/bioinformatics/14.9.755

8. Alspach E., Lussier D.M., Miceli A.P., Kizhvatov I., DuPage M., Luoma A.M., et al. MHC-II neoantigens shape tumour immunity and response to immunotherapy. Nature, 2019, vol. 574, no. 7780, pp. 696–701. https://doi.org/10.1038/s41586-019-1671-8

9. Kim M.W., Gao W., Lichti C.F., Gu X., Dykstra T., Cao J., et al. Endogenous self-peptides guard immune privilege of the central nervous system. Nature, 2025, vol. 637, no. 8044, pp. 176–183. https://doi.org/10.1038/s41586-024-08279-y

10. Vita R., Blazeska N., Marrama D., Duesing S., Bennett J., Greenbaum J., et al. The Immune Epitope Database (IEDB): 2024 update. Nucleic Acids Research, 2025, vol. 53, no. D1, pp. D436–D443. https://doi.org/10.1093/nar/gkae1092

11. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009, 767 p. https://doi.org/10.1007/978-0-387-84858-7

12. Capietto A.H., Jhunjhunwala S., Pollock S.B., Lupardus P., Wong J., Hänsch L., et al. Mutation position is an important determinant for predicting cancer neoantigens. Journal of Experimental Medicine, 2020, vol. 217, no. 4, pp. e20190179. https://doi.org/10.1084/jem.20190179

13. Rahman K.S., Chowdhury E.U., Sachse K., Kaltenboeck B. Inadequate reference datasets biased toward short non-epitopes confound B-cell epitope prediction. The Journal of Biological Chemistry, 2016, vol. 291, no. 28, pp. 14585–14599. https://doi.org/10.1074/jbc.M116.729020

14. Mudge J.M., Carbonell-Sala S., Diekhans M., Martinez J.G., Hunt T., Jungreis I., et al. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Research, 2025, vol. 53, no. D1, pp. D966–D975. https://doi.org/10.1093/nar/gkae1078

15. Forney G.D. The Viterbi algorithm. Proceedings of the IEEE, 1973, vol. 61, no. 3, pp. 268–278. https://doi.org/10.1109/proc.1973.9030

16. Rabiner L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, vol. 77, no. 2, pp. 257–286. https://doi.org/10.1109/5.18626

17. Nielsen M., Lundegaard C., Lund O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics, 2007, vol. 8, pp. 238. https://doi.org/10.1186/1471-2105-8-238

18. DeLong E.R., DeLong D.M., Clarke-Pearson D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 1988, vol. 44, no. 3, pp. 837–845. https://doi.org/10.2307/2531595

19. Sun X., Xu W. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Processing Letters, 2014, vol. 21, no. 11, pp. 1389–1393. https://doi.org/10.1109/LSP.2014.2337313

20. Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 2020, vol. 17, no. 3, pp. 261–272. https://doi.org/10.1038/s41592-019-0686-2

21. Racle J., Michaux J., Rockinger G.A., Arnaud M., Bobisse S., Chong C., et al. Robust prediction of HLA class II epitopes by deep motif deconvolution of immunopeptidomes. Nature Biotechnology, 2019, vol. 37, no. 11, pp. 1283–1286. https://doi.org/10.1038/s41587-019-0289-6

22. Koşaloğlu-Yalçın Z., Sidney J., Chronister W., Peters B., Sette A. Comparison of HLA ligand elution data and binding predictions reveals varying prediction performance for the multiple motifs recognized by HLA-DQ2.5. Immunology, 2021, vol. 162, no. 2, pp. 235–247. https://doi.org/10.1111/imm.13279



For citations:


Polezhaeva V.A., Kleverov D.A., Shalyto A.A., Artyomov M. Incorporating negative examples into Hidden Markov Model-based classification of peptide sequences. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(5):888-901. https://doi.org/10.17586/2226-1494-2025-25-5-888-901



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)