Preview

Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Advanced search

Multiple context-free path querying by matrix multiplication

https://doi.org/10.17586/2226-1494-2023-23-2-271-278

Abstract

Many graph analysis problems can be formulated as formal language-constrained path querying problems where the formal languages are used as constraints for navigational path queries. Recently, the context-free language (CFL) reachability formulation has become very popular and can be used in many areas, for example, querying graph databases, Resource Description Framework (RDF) analysis. However, the generative capacity of context-free grammars (CFGs) is too weak to generate some complex queries, for example, from natural languages, and the various extensions of CFGs have been proposed. Multiple context-free grammar (MCFG) is one of such extensions of CFGs. Despite the fact that, to the best of our knowledge, there is no algorithm for MCFL-reachability, this problem is known to be decidable. This paper is devoted to developing the first such algorithm for the MCFL-reachability problem. The essence of the proposed algorithm is to use a set of Boolean matrices and operations on them to find paths in a graph that satisfy the given constraints. The main operation here is Boolean matrix multiplication. As a result, the algorithm returns a set of matrices containing all information needed to solve the MCFL-reachability problem. The presented algorithm is implemented in Python using GraphBLAS API. An analysis of real RDF data and synthetic graphs for some MCFLs is performed. The study showed that using a sparse format for matrix storage and parallel computing for graphs with tens of thousands of edges the analysis time can be 10–20 minutes. The result of the analysis provides tens of millions of reachable vertex pairs. The proposed algorithm can be applied in problems of static code analysis, bioinformatics, network analysis, as well as in graph databases when a path query cannot be expressed using context-free grammars. The provided algorithm is linear algebra-based, hence, it allows one to use high-performance libraries and utilize modern parallel hardware.

About the Authors

I. V. Epelbaum
Querify Labs Ltd
Russian Federation

lya V. Epelbaum — Student

Saint Petersburg, 193313

sc 57218700891



R. Sh. Azimov
St. Petersburg State University (SPbSU)
Russian Federation

Rustam Sh. Azimov — Senior Lecturer

Saint Petersburg, 199034

sc 57203022098



S. V. Grigorev
St. Petersburg State University (SPbSU)
Russian Federation

Semyon V. Grigorev — PhD (Physics & Mathematics), Associate Professor

Saint Petersburg, 199034

sc 56047575300



References

1. Barrett C., Jacob R., Marathe M. Formal-language-constrained path problems. SIAM Journal on Computing, 2000, vol. 30, no. 3, pp. 809– 837. https://doi.org/10.1137/S0097539798337716

2. Terekhov A., Khoroshev A., Azimov R., Grigorev S. Context-free path querying with single-path semantics by matrix multiplication. Proc. of the 3 rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics ( NDA ) , 2020, pp. 1 – 12 . https://doi.org/10.1145/3398682.3399163

3. Zhang X., Feng Z., Wang X., Rao G., Wu W. Context-free path queries on RDF graphs. Lecture Notes in Computer Science, 2016, vol. 9981, pp. 632–648. https://doi.org/10.1007/978-3-319-46523-4_38

4. Zheng X., Rugina R. Demand-driven alias analysis for C. Proc. of the 35 th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2008, pp. 197–208. https://doi.org/10.1145/1328438.1328464

5. Sevon P., Eronen L. Subgraph queries by context-free grammars. Journal of Integrative Bioinformatics, 2008, vol. 5, no. 2, pp. 157– 172. https://doi.org/10.1515/jib-2008-100

6. Zhang Q., Lyu M.R., Yuan H., Su Z. Fast algorithms for dyck-cflreachability with applications to alias analysis. Proc. of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013, pp. 435 – 446. https://doi.org/10.1145/2491956.2462159

7. Reps T. Undecidability of context-sensitive data-dependence analysis. ACM Transactions on Programming Languages and Systems, 2000, vol. 22, no. 1, pp. 162–186. https://doi.org/10.1145/345099.345137

8. Hopcroft J.E., Motwani R., Ullman J.D. Introduction to automata theory, languages, and computation. ACM Sigact News, 2001, vol. 32, no. 1, pp. 60–65. https://doi.org/10.1145/568438.568455

9. Zhang Q., Su Z. Context-sensitive data-dependence analysis via linear conjunctive language reachability. Proc. of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, 2017, pp. 344– 358. https://doi.org/10.1145/3009837.3009848

10. Kepner J., Aaltonen P., Bader D., Buluç A., Franchetti F., Gilbert J., Hutchison D., Kumar M., Lumsdaine A., Meyerhenke H., McMillan S., Yang C., Owens J.D., Zalewski M., Mattson T., Moreira J. Mathematical foundations of the graphblas. Proc. of the 2016 IEEE High Performance Extreme Computing Conference (HPEC), 2016, pp. 1–9. https://doi.org/10.1109/hpec.2016.7761646

11. Azimov R., Epelbaum I., Grigorev S. Context-free path querying with all-path semantics by matrix multiplication. Proc. of the 4 th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2021, pp. 1–7. https://doi.org/10.1145/3461837.3464513

12. Azimov R., Grigorev S. Context-free path querying by matrix multiplication. Proc. of the 1 st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2018, pp. 1–10. https://doi.org/10.1145/3210259.3210264

13. Nakanish R., Takada K., Seki H. An efficient recognition algorithm for multiple context-free languages. Proc. of the 5th Meeting on the Mathematics of Language (MOL5), 1997, pp. 119–123.

14. Cohen S.B., Gildea D. Parsing linear context-free rewriting systems with fast matrix multiplication. Computational Linguistics, 2016, vol. 42, no. 3, pp. 421–455. https://doi.org/10.1162/coli_a_00254

15. Seki H., Matsumura T., Fujii M., Kasami T. On multiple context-free grammars. Theoretical Computer Science, 1991, vol. 88, no. 2, pp. 191–229. https://doi.org/10.1016/0304-3975(91)90374-B

16. Hagberg A., Schult D., Swart P. Exploring network structure, dynamics, and function using network, Proc. of the 7 th Python in Science Conference (SciPy 2008), 2008, pp. 11–15.


Review

For citations:


Epelbaum I.V., Azimov R.Sh., Grigorev S.V. Multiple context-free path querying by matrix multiplication. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2023;23(2):271-278. https://doi.org/10.17586/2226-1494-2023-23-2-271-278

Views: 7


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)