Assessment of the reliability of a recoverable container virtualization cluster
https://doi.org/10.17586/2226-1494-2025-25-5-988-995
Abstract
Container virtualization technology is increasingly used to build fault-tolerant clusters with high availability and low request processing latency. In designing highly reliable clusters, a key task is model-oriented structural-parametric synthesis that takes into account the impact of the number of deployed containers on performance, request processing latency, and system reliability. Justifying design choices that ensure high cluster reliability and availability currently requires reliability models of recoverable container virtualization clusters that describe reconfiguration with the migration of virtual containers. The novelty of the proposed Markov model of a cluster lies in considering a two-stage recovery of its operability and in determining how the number of containers migrated during reconfiguration, both before and after the physical recovery of failed servers, affects cluster reliability. Two options for container migration during cluster recovery are considered. In the first, no containers are migrated to an operational server while the failed server is being physically recovered; in the second, such migration takes place. At the second stage of reconfiguration, after the physical recovery of the failed server, container migration takes place, which allows the number of containers deployed on the servers to be either increased or decreased. Based on the proposed Markov models of the reliability of a cluster with container virtualization, the cluster availability factor is evaluated, and the influence of the number of containers loaded during migration at the two reconfiguration stages on system reliability is determined.
The proposed Markov models of cluster reliability with container virtualization are aimed at justifying design decisions for organizing and restoring cluster operability after server failures, considering the impact of container migration implementation options on system availability. Future research will analyze the impact of container migration options on both cluster availability and request processing latency at the two considered reconfiguration stages.
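As an illustration of how an availability factor can be obtained from a Markov model with two-stage recovery, the following minimal sketch assumes a duplicated (two-server) cluster with four states. The state set, the transition rates, the assumption that the stage-two migration time grows linearly with the number of containers k, and all names (steady_state, availability, lam, mu_phys, mu_container) are hypothetical and chosen for illustration only; this is not the authors' model.

```python
def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(Q)
    # Transpose the generator so that each row is one balance equation,
    # then replace the last equation with the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def availability(lam, mu_phys, mu_container, k):
    """Stationary availability of a hypothetical two-server cluster.

    Assumed states (illustrative, not taken from the paper):
      0: both servers operational
      1: one server failed, physical recovery in progress (stage 1)
      2: server physically recovered, k containers being migrated (stage 2)
      3: both servers failed, cluster unavailable
    """
    # Assumption: migrating k containers takes k times longer than one.
    mu_mig = mu_container / k
    Q = [
        [-2 * lam, 2 * lam, 0.0, 0.0],
        [0.0, -(mu_phys + lam), mu_phys, lam],
        [mu_mig, 0.0, -(mu_mig + lam), lam],
        [0.0, mu_phys, 0.0, -mu_phys],
    ]
    pi = steady_state(Q)
    return 1.0 - pi[3]


if __name__ == "__main__":
    # Illustrative rates in 1/hour; the values are arbitrary assumptions.
    for k in (1, 5, 10, 20):
        print(k, availability(lam=1e-3, mu_phys=0.5, mu_container=2.0, k=k))
```

Under these assumptions, a larger k lengthens the second recovery stage, during which a single additional failure brings the cluster down, so the computed availability decreases as k grows. A model that also permits migration during stage 1 would add transitions out of state 1 and is not covered by this sketch.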
About the Authors
V. A. Bogatyrev
Russian Federation
Vladimir A. Bogatyrev — D.Sc., Professor; Russian Federation; Professor
Scopus Author ID: 7006571069
Saint Petersburg, 190000
Saint Petersburg, 197101
V. Q. Phung
Russian Federation
Van Quy Phung — PhD Student
Saint Petersburg, 197101
For citations:
Bogatyrev V.A., Phung V.Q. Assessment of the reliability of a recoverable container virtualization cluster. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(5):988-995. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-5-988-995