Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Method for identifying the active module in biological graphs with multi-component vertex weights

https://doi.org/10.17586/2226-1494-2025-25-3-487-497

Abstract

An active module in a biological graph is a connected subgraph whose vertices share a common biological function. Identifying an active module first requires constructing a weighted biological graph, in which the weight of each vertex is calculated from biological experiments that investigate the target function. However, the results of a single experiment may not fully describe the desired active module: they may cover only part of it and may introduce uncertainty into the vertex weights. This work demonstrates that integrating data from multiple experiments with Fisher’s method, and then applying an approach based on Markov chain Monte Carlo (MCMC) and machine learning to the combined results, enables more effective identification of active modules in biological graphs. The study uses the InWebIM protein–protein interaction graph, a human brain reconstruction graph from the BigBrain project, and a gene graph for the organism Caenorhabditis elegans. Fisher’s method is applied to combine the results of several experiments into a single outcome within one graph, after which the search for active modules is conducted using the MCMC and machine learning-based method. To validate the proposed method on real data, results from genome-wide association studies (GWAS) of schizophrenia and smoking are used, along with the gene expression matrix of patients with skin melanoma from the TCGA project. Applying Fisher’s method makes it possible to consider the results of multiple biological experiments simultaneously. Subsequent use of the MCMC and machine learning-based method improves the accuracy of identifying active modules compared with ranking graph vertices solely by Fisher’s method. Taking the results of multiple biological experiments into account increases the accuracy with which the vertices of the active module are identified. This, in turn, promotes a deeper understanding of the biological mechanisms of diseases, which can be of great significance for the development of new diagnostic and therapeutic methods.
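
The combination step named in the abstract can be illustrated with a minimal sketch of the textbook form of Fisher’s method, which assumes independent p-values; it is not the authors’ implementation. The function name `fisher_combine`, the vertex labels, and the p-values below are hypothetical, and the subsequent MCMC and machine learning stage is not shown. For k p-values, the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for an even number of degrees of freedom the tail probability has a closed form, so only the standard library is needed.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with the textbook Fisher's method.

    Returns the upper-tail probability of X = -2 * sum(ln p_i) under a
    chi-square distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Closed-form chi-square survival function for 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical vertex labels and p-values from three experiments (illustration only).
vertex_p_values = {
    "GENE_A": [0.01, 0.03, 0.20],
    "GENE_B": [0.50, 0.60, 0.40],
    "GENE_C": [0.04, 0.04, 0.04],
}

# One combined p-value per vertex; smaller values indicate stronger joint evidence.
combined = {v: fisher_combine(ps) for v, ps in vertex_p_values.items()}
ranking = sorted(combined, key=combined.get)
```

Sorting vertices by the combined p-value corresponds to the baseline that the abstract describes as ranking graph vertices solely by Fisher’s method. If the experiments are correlated, the independence assumption of this sketch no longer holds and the combined p-values should be interpreted with caution.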

About the Authors

D. A. Usoltsev
Institute for Genomic Medicine, Nationwide Children’s Hospital; ITMO University
United States

Dmitrii A. Usoltsev — Senior Researcher

Columbus, 43205;

PhD Student

Saint Petersburg, 197101

Scopus Author ID: 57279360300



I. I. Molotkov
Institute for Genomic Medicine, Nationwide Children’s Hospital; The Ohio State University College of Medicine
United States

Ivan I. Molotkov — Senior Researcher

Columbus, 43205;

PhD Student

Columbus, 43210

Scopus Author ID: 58651494600



M. N. Artomov
Institute for Genomic Medicine, Nationwide Children’s Hospital; The Ohio State University College of Medicine
United States

Mykyta N. Artomov — PhD (Chemistry), Associate Professor, Chief Researcher

Columbus, 43205;

Professor of Pediatrics

Columbus, 43210

Scopus Author ID: 36542095500



A. A. Sergushichev
Washington University School of Medicine in St. Louis
United States

Alexey A. Sergushichev — PhD, Associate Professor

St. Louis, 63110, USA

Scopus Author ID: 55772694000



A. A. Shalyto
ITMO University
Russia

Anatoly A. Shalyto — D.Sc., Chief Researcher, Full Professor

Saint Petersburg, 197101

Scopus Author ID: 56131789500



For citations:


Usoltsev D.A., Molotkov I.I., Artomov M.N., Sergushichev A.A., Shalyto A.A. Method for identifying the active module in biological graphs with multi-component vertex weights. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(3):487-497. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-3-487-497


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)