Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features

https://doi.org/10.17586/2226-1494-2024-24-1-101-111

Abstract

Loss of the regulatory function of tumor suppressor genes and mutations in proto-oncogenes are the common underlying mechanisms of uncontrolled tumor growth in the varied group of disorders known as cancer. Oncogene-driven disease may be treatable if proto-oncogenes at risk of conversion are diagnosed and addressed at an early stage. Recently, machine learning approaches have helped to provide information about which proto-oncogenes may change into oncogenes in different cancer types. This study aims to identify, at an early stage, proto-oncogenes that are likely to change into oncogenes. To that end, it proposes an efficient predictor of proto-oncogenes based on a Bi-Directional Long Short-Term Memory network with an attention mechanism. The approach also estimates the probability of a proto-oncogene changing into an oncogene using statistical moments, a position-based amino acid composition representation, and deep features extracted from the sequence. The study suggests that a K-Nearest Neighbor classifier can be used to estimate the probability of a proto-oncogene changing into a cancerous oncogene.
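
The feature and classification ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature definitions (`composition`, `position_weighted_composition`, `raw_moments`), the helper `knn_probability`, and the sequences in `TOY_TRAIN` are all hypothetical and chosen only for illustration, and the deep BiLSTM-attention features are omitted. The sketch uses the Python standard library only.

```python
from math import dist

# The 20 standard amino acids; sequences with other letters are not handled here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid in the sequence."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def position_weighted_composition(seq):
    """Mean relative position (in (0, 1]) of each amino acid; 0.0 if absent.

    Assumption: one simple way to make composition position-aware.
    """
    n = len(seq)
    feats = []
    for a in AMINO_ACIDS:
        positions = [(i + 1) / n for i, c in enumerate(seq) if c == a]
        feats.append(sum(positions) / len(positions) if positions else 0.0)
    return feats


def raw_moments(seq, orders=(1, 2, 3)):
    """Raw statistical moments of the integer-encoded sequence, scaled to (0, 1]."""
    codes = [(AMINO_ACIDS.index(c) + 1) / len(AMINO_ACIDS) for c in seq]
    return [sum(x ** k for x in codes) / len(codes) for k in orders]


def features(seq):
    """Concatenate all hand-crafted features into one vector (20 + 20 + 3 values)."""
    return composition(seq) + position_weighted_composition(seq) + raw_moments(seq)


def knn_probability(query, train, k=3):
    """Fraction of the k nearest training sequences whose label is 1."""
    q = features(query)
    ranked = sorted(train, key=lambda item: dist(q, features(item[0])))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Made-up toy data: label 1 stands for "likely to become an oncogene", 0 otherwise.
TOY_TRAIN = [
    ("MKTAYIAKQR", 1),
    ("MKTAYIAKQK", 1),
    ("MKSAYIAKQR", 1),
    ("GGSGGSGGSG", 0),
    ("GGSGGAGGSG", 0),
]

if __name__ == "__main__":
    print(knn_probability("MKTAYLAKQR", TOY_TRAIN, k=3))
```

Here the "probability" is simply the share of positive labels among the nearest neighbors, which is one common way to read a score out of a K-Nearest Neighbor classifier; the paper may compute it differently.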

About the Authors

M. Vijayalakshmi
Affiliated to Manonmaniam Sundaranar University; Sri Sarada College for Women
India

Manickam Vijayalakshmi — Research Scholar; Assistant Professor

 Abishekapatti, Tirunelveli-627012

 Tirunelveli, 627011



M. Vallinayagi
Sri Sarada College for Women
India

Mahesh Vallinayagi — PhD, Head, Associate Professor

 Tirunelveli, 627011




For citations:


Vijayalakshmi M., Vallinayagi M. Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024;24(1):101-111. https://doi.org/10.17586/2226-1494-2024-24-1-101-111


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)