A method for constructing interpretable hidden Markov models for the task of identifying binding cores in sequences

D. A. Kleverov; A. A. Shalyto; M. N. Artyomov

doi:10.17586/2226-1494-2023-23-5-989-1000

Abstract

Solving the problem of predicting the immune response against fragments of foreign protein sequences processed by cells is one of the major milestones on the road to developing personalized cancer vaccines. The selection of peptides that participate in the immune response is a complex multi-stage process that filters the initial sequences before their fragments are presented on the cell surface. The most studied step of this filtering is currently the prediction of the binding probability of peptides to major histocompatibility complex (MHC) molecules. Modern methods for predicting this step are usually based on artificial neural networks, which makes the predictions of such models impossible to interpret. One way to overcome this limitation is to use interpretable hidden Markov models. In this work, the binding prediction task is analyzed and, as a result, a method for constructing interpretable models that take domain-specific constraints and requirements into account is proposed. For each class of molecules, a procedure for the construction, training and interpretation of hidden Markov models is given. Construction and training preserve a model architecture capable of extracting and visualizing the binding core of the peptide, and the model is interpreted by analyzing its graph. The proposed method is tested by training a model that not only predicts binding but also determines the position of the peptide binding core and the distribution of amino acids within the core. Prediction models were trained on binding data for two types of molecules. The distributions of amino acids in the binding core match the state distributions of the model. Sequence patterns of binding cores extracted with the trained models from two peptide datasets correspond to patterns from public databases, which confirms the validity of the method.
Interpretable models provide a better description of the problem domain and help to draw conclusions about peptide characteristics from the information extracted from the model. This information will allow researchers to better understand other steps of peptide processing involved in the immune response: for example, one can study relationships between these steps or transfer knowledge from models trained for one step to models for others. Such knowledge makes it possible to train models when training data are limited.
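The abstract does not specify the model topology (the full text is in Russian). As a rough illustration of how a hidden Markov model can both score a peptide and locate its binding core, the following minimal Python sketch assumes a simple linear topology: a self-looping left-flank state `L`, a fixed-length chain of core states `M1..Mk`, and a self-looping right-flank state `R`. Viterbi decoding gives the most probable state path, and the residues emitted by the core states form the predicted binding core. Per-position amino acid frequencies of the extracted cores can then be compared with the emission distributions of the core states. The topology, the uniform background, the toy emission profile and all function names (`column`, `build_core_hmm`, `viterbi`, `extract_core`, `position_frequencies`) are hypothetical and are not the authors' implementation; training (for example with the EM / Baum-Welch algorithm) is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column(preferred=None, weight=0.81):
    """Toy emission table for one state: `weight` on one residue, rest uniform."""
    if preferred is None:
        return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    rest = (1.0 - weight) / (len(AMINO_ACIDS) - 1)
    return {aa: weight if aa == preferred else rest for aa in AMINO_ACIDS}


def build_core_hmm(core_profile, p_stay=0.9):
    """Hypothetical linear HMM: left flank L -> core M1..Mk -> right flank R."""
    core = [f"M{i + 1}" for i in range(len(core_profile))]
    states = ["L"] + core + ["R"]
    # Flank states emit residues uniformly; core states use the given profile.
    emit = {"L": column(), "R": column(), **dict(zip(core, core_profile))}
    trans = {"L": {"L": p_stay, core[0]: 1.0 - p_stay}, "R": {"R": 1.0}}
    for cur, nxt in zip(core, core[1:]):
        trans[cur] = {nxt: 1.0}  # core positions are visited strictly in order
    trans[core[-1]] = {"R": 1.0}
    start = {"L": p_stay, core[0]: 1.0 - p_stay}
    final = (core[-1], "R")  # a valid path must pass through the whole core
    return states, start, trans, emit, final


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(seq, states, start, trans, emit, final):
    """Most probable state path for `seq` (log-space Viterbi)."""
    score = {s: _log(start.get(s, 0.0)) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for aa in seq[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev, best = None, -math.inf
            for prev in states:
                cand = score[prev] + _log(trans[prev].get(s, 0.0))
                if cand > best:
                    best_prev, best = prev, cand
            new_score[s] = best + _log(emit[s].get(aa, 0.0))
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    last = max(final, key=lambda s: score[s])
    if score[last] == -math.inf:
        raise ValueError("no valid path: sequence shorter than the core?")
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_core(seq, path):
    """Return (core residues, start offset) emitted by the core states."""
    idx = [i for i, s in enumerate(path) if s.startswith("M")]
    return seq[idx[0]:idx[-1] + 1], idx[0]


def position_frequencies(cores):
    """Per-position amino acid frequencies of aligned, equal-length cores."""
    return [
        {aa: sum(c[i] == aa for c in cores) / len(cores) for aa in AMINO_ACIDS}
        for i in range(len(cores[0]))
    ]


if __name__ == "__main__":
    profile = [column("Y"), column(), column("L")]  # toy 3-residue motif
    hmm = build_core_hmm(profile)
    print(extract_core("GGGYALGG", viterbi("GGGYALGG", *hmm)))  # ('YAL', 3)
```

With a toy motif that prefers Y at the first and L at the third core position, decoding `GGGYALGG` places the core `YAL` at offset 3. In a trained model, the emission tables of the core states would be estimated from binding data rather than set by hand, and the frequency tables of extracted cores could be drawn as sequence logos.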

Keywords

binding prediction, hidden Markov models, Viterbi algorithm, data analysis, motif identification, sequence alignment, interpretable models

About the Authors

D. A. Kleverov

ITMO University
Russian Federation

Denis A. Kleverov — PhD Student

Saint Petersburg, 197101

A. A. Shalyto

ITMO University
Russian Federation

Anatoly A. Shalyto — D.Sc., Full Professor

Scopus Author ID: 56131789500

Saint Petersburg, 197101

M. N. Artyomov

ITMO University; Washington University in St. Louis, School of Medicine, Department of Pathology and Immunology
Russian Federation

Maxim N. Artyomov — PhD (Chemistry), Professor (Researcher)

Scopus Author ID: 9242717500

Saint Petersburg, 197101

Saint Louis, 63110, USA

For citations:

Kleverov D.A., Shalyto A.A., Artyomov M.N. A method for constructing interpretable hidden Markov models for the task of identifying binding cores in sequences. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2023;23(5):989-1000. (In Russ.) https://doi.org/10.17586/2226-1494-2023-23-5-989-1000

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)
