Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features

M. Vijayalakshmi; M. Vallinayagi

doi:10.17586/2226-1494-2024-24-1-101-111

Abstract

The loss of regulatory function in tumor suppressor genes and mutations in proto-oncogenes are common mechanisms underlying uncontrolled tumor growth in the diverse group of diseases known as cancer. Oncogene-driven disease may be treatable when a proto-oncogene's potential to transform is diagnosed and addressed at an early stage. Recently, machine learning approaches have helped to identify proto-oncogenes that may turn into oncogenes in various types of cancer, and to detect such changes at an early stage. An efficient and unique proto-oncogene predictor is proposed, built on a Bi-Directional Long Short Term Memory (BiLSTM) neural network augmented with an attention mechanism. The approach also estimates the probability of transition from proto-oncogene to oncogene using statistical moments, a position-based representation of amino acid composition, and deep features extracted from the sequence. A K-Nearest Neighbor classifier is applied to determine the probability of transition from proto-oncogene to cancer oncogene.
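The abstract names statistical moments, position-based amino acid composition, and a K-Nearest Neighbor classifier, but gives no formulas. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation: the 1..20 integer encoding of residues, the choice of moment orders, the position-weighting rule, and all function names are hypothetical, and the sequences and labels in the demo are invented toy data. Sequences are assumed to contain only the 20 standard amino acid letters.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: residues are encoded as integers 1..20 in alphabetical order.
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def raw_moment(values, order):
    """Raw moment of the given order: the mean of v**order."""
    return sum(v ** order for v in values) / len(values)


def central_moment(values, order):
    """Central moment of the given order: the mean of (v - mean)**order."""
    mean = raw_moment(values, 1)
    return sum((v - mean) ** order for v in values) / len(values)


def position_weighted_composition(seq):
    """For each amino acid, the sum of its 1-based positions divided by len(seq).

    Assumption: this is one simple way to make composition position-aware.
    """
    totals = dict.fromkeys(AMINO_ACIDS, 0.0)
    for pos, aa in enumerate(seq, start=1):
        totals[aa] += pos
    return [totals[aa] / len(seq) for aa in AMINO_ACIDS]


def features(seq):
    """Concatenate three moments with the 20 position-weighted components."""
    encoded = [INDEX[aa] for aa in seq]
    moments = [
        raw_moment(encoded, 1),
        central_moment(encoded, 2),
        central_moment(encoded, 3),
    ]
    return moments + position_weighted_composition(seq)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Invented toy sequences and labels, for illustration only.
    toy = [("ACDACD", "class_a"), ("ACDAAC", "class_a"), ("WYVWYV", "class_b")]
    train = [(features(seq), label) for seq, label in toy]
    print(knn_predict(train, features("ACDACC"), k=1))
```

In this sketch the deep BiLSTM features mentioned in the abstract are omitted; in a fuller pipeline they would presumably be appended to the same feature vector before classification.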

Keywords

proto-oncogenes, PseAAC, prediction, tumor suppressor genes, TSG, machine learning, bidirectional long short-term memory, BiLSTM

About the authors

M. Vijayalakshmi

Manonmaniam Sundaranar University; Sri Sarada College for Women
India

Vijayalakshmi Manickam, Researcher; Associate Professor

Abishekapatti, Tirunelveli-627012

Tirunelveli, 627011

M. Vallinayagi

Sri Sarada College for Women
India

Vallinayagi Mahesh, PhD, Head, Associate Professor

Tirunelveli, 627011

For citation:

Vijayalakshmi M., Vallinayagi M. Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024;24(1):101-111. https://doi.org/10.17586/2226-1494-2024-24-1-101-111

Content is available under the Creative Commons Attribution 4.0 License.

ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)
