Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features
https://doi.org/10.17586/2226-1494-2024-24-1-101-111
Abstract
The loss of regulatory function in tumor suppressor genes and mutations in proto-oncogenes are common mechanisms underlying the uncontrolled tumor growth seen in cancer, a diverse group of diseases. Oncogene-driven cancer can potentially be cured if a proto-oncogene's capacity to turn into an oncogene is diagnosed and addressed at an early stage. Machine learning approaches have recently been used to draw attention to, and provide information about, proto-oncogenes that may turn into oncogenes, or be altered, at early stages of various cancer types. An efficient and novel proto-oncogene predictor is proposed, based on a Bi-Directional Long Short-Term Memory (BiLSTM) neural network augmented with an attention mechanism. The approach also estimates the likelihood that a proto-oncogene will transition to an oncogene, using statistical moments, a position-based representation of amino acid composition, and deep features extracted from the sequence. A K-Nearest Neighbor (KNN) classifier is applied to determine the probability of transition from a proto-oncogene to a cancer-causing oncogene.
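The abstract names statistical moments and a position-based amino acid composition as sequence features but does not define them. The sketch below shows one common way such features are computed in sequence-based predictors; the integer residue encoding, the k × k matrix layout, the moment order (i + j ≤ 3) and the "mean relative position" descriptor are all assumptions for illustration, not the authors' exact feature set. It assumes a non-empty sequence of the 20 standard residues.

```python
import math

# Assumed encoding: each standard residue mapped to an integer 1..20 (0 = padding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def sequence_matrix(seq):
    """Fill a k x k matrix row-wise with residue codes, zero-padded (k = ceil(sqrt(len))).

    Non-standard residues (e.g. X, U) raise KeyError in this sketch.
    """
    k = math.ceil(math.sqrt(len(seq)))
    codes = [CODE[a] for a in seq] + [0] * (k * k - len(seq))
    return [codes[r * k:(r + 1) * k] for r in range(k)]


def moments(mat, max_order=3):
    """Raw moments M_ij and central moments mu_ij for all i + j <= max_order (0-based indices)."""
    cells = [(p, q, v) for p, row in enumerate(mat) for q, v in enumerate(row)]
    m00 = sum(v for _, _, v in cells)
    xbar = sum(p * v for p, _, v in cells) / m00  # centroid row
    ybar = sum(q * v for _, q, v in cells) / m00  # centroid column
    raw, central = [], []
    for i in range(max_order + 1):
        for j in range(max_order + 1 - i):
            raw.append(sum(p**i * q**j * v for p, q, v in cells))
            central.append(sum((p - xbar)**i * (q - ybar)**j * v for p, q, v in cells))
    return raw, central


def position_composition(seq):
    """Per residue type: relative frequency, then mean 1-based position / length (0 if absent)."""
    n = len(seq)
    freq, mean_pos = [], []
    for aa in AMINO_ACIDS:
        pos = [i + 1 for i, a in enumerate(seq) if a == aa]
        freq.append(len(pos) / n)
        mean_pos.append(sum(pos) / (len(pos) * n) if pos else 0.0)
    return freq + mean_pos


def feature_vector(seq):
    """10 raw + 10 central moments + 40 composition values = 60 features."""
    raw, central = moments(sequence_matrix(seq))
    return raw + central + position_composition(seq)
```

For example, `feature_vector("MKTAYIAKQR")` returns a 60-element list that could be passed to a classifier. Moments of the 2D layout capture where residues of high or low code value sit in the sequence, while the composition part is independent of that arbitrary numeric encoding.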
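A BiLSTM produces one hidden state per residue, and an attention layer reduces that sequence to a single vector by weighting each state. The abstract does not specify the attention variant, so the sketch below assumes simple dot-product attention pooling with a hypothetical learned scoring vector `w`, applied to hidden states that have already been computed (for example, concatenated forward and backward LSTM outputs).

```python
import math


def attention_pool(hidden_states, w):
    """Dot-product attention pooling over per-residue hidden states.

    hidden_states: list of T vectors of equal length (assumed BiLSTM outputs).
    w: hypothetical learned scoring vector of the same length.
    Returns (context_vector, attention_weights); the weights sum to 1.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over positions
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights
```

The weights show which residue positions contributed most to the pooled representation, which is one reason attention is often paired with recurrent encoders in sequence classification.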
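The abstract states that a K-Nearest Neighbor classifier yields the probability of a proto-oncogene becoming an oncogene. A common way for KNN to produce such a probability is the fraction of the k nearest labelled neighbours that belong to the positive class. The sketch below assumes that rule, Euclidean distance on precomputed feature vectors, and a hypothetical labelling (1 = oncogene, 0 = proto-oncogene); it is not the authors' implementation.

```python
import math


def knn_transition_probability(query, train_X, train_y, k=5):
    """Share of the k nearest training proteins labelled as oncogenes.

    train_y uses 1 for oncogene and 0 for proto-oncogene (hypothetical labels).
    Distance is Euclidean; ties are broken by training order.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    dists = sorted((math.dist(query, x), i) for i, x in enumerate(train_X))
    nearest = [i for _, i in dists[:k]]
    return sum(train_y[i] for i in nearest) / k
```

For example, with training points `[[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]` labelled `[0, 0, 0, 1, 1, 1]`, the query `[4.5, 4.5]` gives 1.0 with `k=3` and 0.75 with `k=4`. With small k the estimate takes only a few discrete values, so larger k gives a smoother, though more biased, probability.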
About the authors
M. Vijayalakshmi
India
Vijayalakshmi Manikam: Researcher; Associate Professor
Abishekapatti, Tirunelveli 627012
Tirunelveli, 627011
M. Vallinayagi
India
Vallinayagi Mahesh: PhD, Head, Associate Professor
Tirunelveli, 627011
For citation:
Vijayalakshmi M., Vallinayagi M. Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024;24(1):101-111. https://doi.org/10.17586/2226-1494-2024-24-1-101-111