Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Predicting gene-disease associations using a heterogeneous graph neural network

https://doi.org/10.17586/2226-1494-2024-24-4-594-601

Abstract

This research presents a heterogeneous graph neural network model for predicting gene-disease associations from existing genomic and medical data. The novelty of the approach lies in integrating the principles of graph neural networks and heterogeneous information networks to process structured data efficiently and to account for complex gene-pathology interactions. The proposed model uses a heterogeneous graph structure to represent genes, diseases, and their relationships. Its performance was evaluated on the DisGeNET, LASTFM, and YELP datasets and compared with current state-of-the-art (SOTA) models. The comparison showed that the proposed model outperforms the other models in terms of Average Precision (AP), F1-measure (F1@S), Hit@k, and Area Under the Receiver Operating Characteristic curve (AUROC) in predicting “gene-disease” associations. The model serves as a tool for bioinformatics analysis and can aid researchers and doctors in studying genetic diseases. This could expedite the discovery of new drug targets and the advancement of personalized medicine.
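As an illustration of one of the ranking metrics named in the abstract, the following is a minimal Python sketch of Hit@k under a commonly used definition: a query (here, a disease) counts as a hit if at least one known associated gene appears among its top k ranked candidates. The abstract does not state the exact definition used in the paper, so this definition is an assumption; the function names and the toy gene and disease identifiers are hypothetical and are not taken from the article or from DisGeNET.

```python
from typing import Dict, List, Set


def hit_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(
    rankings: Dict[str, List[str]],
    truth: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average Hit@k over all queries that have at least one known positive."""
    scores = [
        hit_at_k(ranked, truth[query], k)
        for query, ranked in rankings.items()
        if truth.get(query)  # skip queries with no ground-truth associations
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: candidate genes ranked per disease by model score.
rankings = {
    "disease_A": ["gene_3", "gene_1", "gene_7"],
    "disease_B": ["gene_2", "gene_9", "gene_4"],
}
# Hypothetical ground-truth gene-disease associations.
truth = {
    "disease_A": {"gene_1"},
    "disease_B": {"gene_4"},
}

for k in (1, 2, 3):
    print(f"Hit@{k} = {mean_hit_at_k(rankings, truth, k):.2f}")
```

In this toy example the score rises as k grows, because a larger cutoff gives each disease more chances to place a known gene among its top candidates.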

About the Authors

D. A. Sidorenko
ITMO University
Russian Federation

Denis A. Sidorenko — PhD Student

Saint Petersburg, 197101



A. A. Shalyto
ITMO University
Russian Federation

Anatoly A. Shalyto — D.Sc., Full Professor, Chief Scientific Researcher

Saint Petersburg, 197101



For citations:


Sidorenko D.A., Shalyto A.A. Predicting gene-disease associations using a heterogeneous graph neural network. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024;24(4):594-601. (In Russ.) https://doi.org/10.17586/2226-1494-2024-24-4-594-601


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)