Accelerating and analyzing performance of shortest path algorithms on GPU  using CUDA platform: Bellman-Ford, Dijkstra, and Floyd-Warshall algorithms

D. Bodra; S. Khairnar

doi:10.17586/2226-1494-2025-25-5-866-875

Accelerating and analyzing performance of shortest path algorithms on GPU using CUDA platform: Bellman-Ford, Dijkstra, and Floyd-Warshall algorithms

D. Bodra, S. Khairnar

https://doi.org/10.17586/2226-1494-2025-25-5-866-875

Full Text:

PDF (Eng)

Generate QR code

Abstract

The computational demands of the shortest path algorithms on large-scale graphs with millions of vertices and edges pose significant challenges for serial implementations, often requiring hours of execution time even on powerful CPUs. This paper evaluates Graphic Processing Units implementations of three fundamental shortest path algorithms — BellmanFord, Dijkstra, and Floyd-Warshall using NVIDIA CUDA platform. We implemented and compared multiple variants of each algorithm, starting with basic parallel approaches and applying various optimization techniques, including gridstride loops, shared memory utilization, memory coalescing, and algorithm-specific enhancements such as flag-based early termination for Bellman-Ford and tiled computation for Floyd-Warshall. Our study provides performance analysis comparing different optimization strategies and their effectiveness across various graph datasets.

Keywords

GPU computing, CUDA platform, shortest path algorithms, parallel algorithms, graph algorithms, Bellman-Ford, Dijkstra, Floyd-Warshall, performance optimization

About the Authors

D. Bodra

Harrisburg University of Science and Technology
United States

Deep Bodra — Magister, Student

sc 57216618940

Harrisburg,17101

S. Khairnar

Virginia Tech
United States

Sushil Khairnar — Magister, Student

sc 57204777066

Virginia, 24061

References

1. Harish P., Narayanan P.J. Accelerating large graph algorithms on the GPU using CUDA. Lecture Notes in Computer Science, 2007, vol. 4873, pp. 197–208. https://doi.org/10.1007/978-3-540-77220-0_21

2. Cormen T.H., Leiserson C.E., Rivest R.L., Stein C. Introduction to Algorithms. MIT press, 2009. 1292 p.

3. Katz G.J., Kider J.T. All-pairs shortest-paths for large graphs on the GPU. Proc. of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2008, pp. 47–55.

4. Lebedev S.S., Novikov F.A. The Necessary and sufficient condition for Dijkstra’s algorithm applicability. Computer Tools in Education, 2017, no. 4, pp. 5–13. (in Russian)

5. Lund B., Smith J.W. A multi-stage cuda kernel for floyd-warshall. arXiv, 2010, arXiv:1001.4108. https://doi.org/10.48550/arXiv.1001.4108

6. Winkler D., Meister M., Rezavand M., Rauch W. gpuSPHASE—A shared memory caching implementation for 2D SPH using CUDA. Computer Physics Communications, 2017, vol. 213, pp. 165–180. https://doi.org/10.1016/j.cpc.2016.11.011

7. Harris M. CUDA Pro Tip: write flexible kernels with grid-stride loops. Available at: https://developer.nvidia.com/blog/cuda-pro-tipwrite-flexible-kernels-grid-stride-loops. (accessed: 30.05.2025)

8. Yang S., Liu X., Wang Y., He X., Tan G. Fast All-Pairs Shortest Paths algorithm in large sparse graph. Proc. of the 37th International Conference on Supercomputing, 2023, pp. 277–288. https://doi.org/10.1145/3577193.3593728

9. Tang W., Chen T., Armstrong M.P. GPU-accelerated parallel all-pair shortest path routing within stochastic road networks. International Journal of Geographical Information Science, 2025, vol. 39, no. 1, pp. 53–85. https://doi.org/10.1080/13658816.2024.2394651

10. Spridon D.E., Deaconu A.M., Tayyebi J. Novel GPU-based method for the generalized maximum flow problem. Computation, 2025, vol. 13, no. 2, pp. 40. https://doi.org/10.3390/computation13020040

11. Agarwal P., Dutta M. New approach of Bellman Ford algorithm on GPU using compute unified design architecture (CUDA). International Journal of Computer Applications, 2015, vol. 110, no. 13, pp. 1–5. https://doi.org/10.5120/19375-1027

12. Song B. High-performance parallelization of Dijkstra’s algorithm using MPI and CUDA. arXiv, 2025, arXiv:2504.03667. https://doi.org/10.48550/arXiv.2504.03667

13. Bengtsson M., Wittsten J., Waidringer J. Warehouse storage and retrieval optimization via clustering, dynamic systems modeling, and GPU-accelerated routing. arXiv, 2025, arXiv:2504.20655. https://doi.org/10.48550/arXiv.2504.20655

14. Kumar H.S., Singh A., Ojha M.K. Artificial intelligence based navigation in quasi structured environment. arXiv, 2024, arXiv:2407.17508. https://doi.org/10.48550/arXiv.2407.17508

15. Morgan N., Yenusah C., Diaz A., Dunning D., Moore J., Heilman E., et al. On a simplified approach to achieve parallel performance and portability across CPU and GPU architectures. Information, 2024, vol. 15, no. 11, pp. 673. https://doi.org/10.3390/info15110673

16. Leskovec J., Krevl A. SNAP Datasets: Stanford large network dataset collection. 2014. Available at: http://snap.stanford.edu/data

17. Buluç A., Gilbert J.R., Budak C. Solving path problems on the GPU. Parallel Computing, 2010, vol. 36, no. 5-6, pp. 241–253. https://doi.org/10.1016/j.parco.2009.12.002

18. Merrill D., Garland M., Grimshaw A. Scalable GPU graph traversal. ACM SIGPLAN Notices, 2012, vol. 47, no. 8, pp. 117–128. https://doi.org/10.1145/2370036.2145832

19. Kirk D.B., Hwu W.M.W. Programming Massively Parallel Processors: a Hands-on Approach. Morgan Kaufmann, 2016, 576 p.

20. Nickolls J., Buck I., Garland M., Skadron K. Scalable parallel programming with CUDA. Queue, 2008, vol. 6, no. 2, pp. 40–53. https://doi.org/10.1145/1365490.1365500

21. Bodra D., Khairnar S. Comparative performance analysis of modern NoSQL data technologies: Redis, Aerospike, and Dragonfly. Journal of Research, Innovation and Technologies, 2025, vol. 4, no. 2, pp. 193–200. https://doi.org/10.57017/jorit.v4.2(8).05

22. Khairnar S., Bodra D. Recommendation engine for Amazon magazine subscriptions. International Journal of Advanced Computer Science and Applications, 2025, vol. 16, no. 7, pp. 1–8. https://doi.org/10.14569/ijacsa.2025.0160796

Review

For citations:

Bodra D., Khairnar S. Accelerating and analyzing performance of shortest path algorithms on GPU using CUDA platform: Bellman-Ford, Dijkstra, and Floyd-Warshall algorithms. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(5):866-875. https://doi.org/10.17586/2226-1494-2025-25-5-866-875

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)

Username
Password
	Remember me
Not a user? Register with this site Forgot your password?

User

Scientific and Technical Journal of Information Technologies, Mechanics and Optics