Fault-tolerant Hamiltonian cycle strategy for fast node fault diagnosis based on PMC in data center networks

Zhipeng Zhao; Zhenyu Hu; Zhiyu Zhao; Xiaoyu Du; Tianfei Chen; Lijun Sun; Zhipeng Zhao; Zhenyu Hu; Zhiyu Zhao; Xiaoyu Du; Tianfei Chen; Lijun Sun

doi:10.3934/mbe.2024093

Mathematical Biosciences and Engineering

2024, Volume 21, Issue 2: 2121-2136. doi: 10.3934/mbe.2024093

Previous Article Next Article

Communication Special Issues

Fault-tolerant Hamiltonian cycle strategy for fast node fault diagnosis based on PMC in data center networks

1.
Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Minister of Education, Zhengzhou, 450001, P.R. China
2.
Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, P. R. China
3.
College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, P. R. China
4.
College of Engineering Economics, Henan Finance University, Zhengzhou, Henan, 450001, P. R. China
5.
Luoyang Railway Station of China Railway Zhengzhou Bureau Group Co. LTD, Luoyang, Henan, 450001, P. R. China
6.
College of Computer and Information Engineering, Henan University, Kaifeng, Henan, 475000, P. R. China

Received: 31 October 2023 Revised: 19 December 2023 Accepted: 22 December 2023 Published: 10 January 2024

System-level fault diagnosis model, namely, the PMC model, detects fault nodes only through the mutual testing of nodes in the system without physical equipment. In order to achieve server nodes fault diagnosis in large-scale data center networks (DCNs), the traditional algorithm based on the PMC model cannot meet the characteristics of high diagnosability, high accuracy and high efficiency due to its inability to ensure that the test nodes are fault-free. This paper first proposed a fault-tolerant Hamiltonian cycle fault diagnosis (FHFD) algorithm, which tests nodes in the order of the Hamiltonian cycle to ensure that the test nodes are faultless. In order to improve testing efficiency, a hierarchical diagnosis mechanism was further proposed, which recursively divides high scale structures into a large number of low scale structures based on the recursive structure characteristics of DCNs. Additionally, we proved that $ 2(n-2){n^{k-1}} $ and $ (n-2){t_{n, k}}/{t_{n, 1}} $ faulty nodes could be detected for $ BCub{e_{n, k}} $ and $ DCel{l_{n, k}} $ within a limited time for the proposed diagnosis strategy. Simulation experiments have also shown that our proposed strategy has improved the diagnosability and test efficiency dramatically.
- Data center,
- Fault-tolerant Hamiltonian cycle,
- server node fault diagnosis
Citation: Zhipeng Zhao, Zhenyu Hu, Zhiyu Zhao, Xiaoyu Du, Tianfei Chen, Lijun Sun. Fault-tolerant Hamiltonian cycle strategy for fast node fault diagnosis based on PMC in data center networks[J]. Mathematical Biosciences and Engineering, 2024, 21(2): 2121-2136. doi: 10.3934/mbe.2024093

Related Papers:

Abstract

System-level fault diagnosis model, namely, the PMC model, detects fault nodes only through the mutual testing of nodes in the system without physical equipment. In order to achieve server nodes fault diagnosis in large-scale data center networks (DCNs), the traditional algorithm based on the PMC model cannot meet the characteristics of high diagnosability, high accuracy and high efficiency due to its inability to ensure that the test nodes are fault-free. This paper first proposed a fault-tolerant Hamiltonian cycle fault diagnosis (FHFD) algorithm, which tests nodes in the order of the Hamiltonian cycle to ensure that the test nodes are faultless. In order to improve testing efficiency, a hierarchical diagnosis mechanism was further proposed, which recursively divides high scale structures into a large number of low scale structures based on the recursive structure characteristics of DCNs. Additionally, we proved that $ 2(n-2){n^{k-1}} $ and $ (n-2){t_{n, k}}/{t_{n, 1}} $ faulty nodes could be detected for $ BCub{e_{n, k}} $ and $ DCel{l_{n, k}} $ within a limited time for the proposed diagnosis strategy. Simulation experiments have also shown that our proposed strategy has improved the diagnosability and test efficiency dramatically.

References

[1]	Z. P. Zhao, W. D. Yang, B. Wu, Flow aggregation through dynamic routing overlaps in software defined networks, Elsevier Comput. Networks, 176 (2020), 107293. https://doi.org/10.1016/j.comnet.2020.107293 doi: 10.1016/j.comnet.2020.107293
[2]	N. W. Chang, S. Y. Hsieh, Conditional diagnosability of augmented cubes under the PMC model, IEEE Transact. Dependable Secure Comput., 9 (2011), 46–60. https://doi.org/10.1109/TDSC.2010.59 doi: 10.1109/TDSC.2010.59
[3]	F. P. Preparata, G. Metze, F. P. Chien, On the connection assignment problem of diagnosable systems, IEEE Transact. Electron. Comput., 16 (2006), 848–854.
[4]	S. Karunanithi, A. D. Friedman, Analysis of digital systems using a new measure of system diagnosis, IEEE Transact. Comput., 28 (2006), 121-133. https://doi.org/10.1109/TC.1979.1675301 doi: 10.1109/TC.1979.1675301
[5]	A. K. Somani, O. Peleg, On diagnosability of large fault sets in regular topology-based computer systems, IEEE Transact. Computers, 45 (1996), 892–903. https://doi.org/10.1109/12.536232 doi: 10.1109/12.536232
[6]	H. Zhuang, S. Zheng, X. Liu, C. K. Lin, X. Li, Reliability evaluation of generalized exchanged hypercubes based on imprecise diagnosis strategies, Parallel Process. Letters, 31 (2021), 2150005.
[7]	C. H. Tsai, A quick pessimistic diagnosis algorithm for hypercube-Like multiprocessor systems under the PMC model, IEEE Transact. Computers, 62 (2013), 259–267. https://doi.org/10.1109/TC.2011.228 doi: 10.1109/TC.2011.228
[8]	S. Zhou, L. Lin, L. Xu, D. Wang, The t/k -diagnosability of star graph networks, IEEE Transact. Comput., 64 (2015), 547–555. https://doi.org/10.1109/TC.2013.228 doi: 10.1109/TC.2013.228
[9]	D. Cheng, The pessimistic diagnosability of graphs and its applications to four kinds of interconnection networks, Int. J. Computer Math. Computer Syst. Theory, 4 (2019), 37–47. https://doi.org/10.1080/23799927.2019.1567593 doi: 10.1080/23799927.2019.1567593
[10]	L. Lin, Y. Huang, L. Xu, S. Y. Hsieh, A pessimistic fault diagnosability of large-scale connected networks via extra connectivity, IEEE Transact. Parallel Distributed Syst., 33 (2021), 415–428. https://doi.org/10.1109/TPDS.2021.3093243 doi: 10.1109/TPDS.2021.3093243
[11]	X. Li, J. Fan, C. K. Lin, X. Jia, Diagnosability evaluation of the data center network DCell, Computer J., 61 (2018), 129–143. https://doi.org/10.1093/comjnl/bxx057 doi: 10.1093/comjnl/bxx057
[12]	X. Li, J. Fan, C. K. Lin, B. Cheng, X. Jia, The extra connectivity, extra conditional diagnosability and t/k-diagnosability of the data center network DCell, Theor. Computer Sci., 766 (2019), 16–29.
[13]	H. Huang, Y. Chen, X. Liu, Z. Han, F. Xiao, t-diagnosability and conditional diagnosability of BCube networks, in IEEE 21st International Conference on High Performance Computing and Communications, (2019). https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00328
[14]	N. X. Heng, C. Z. Run, Z. Miao, A hierarchical fault diagnosis algorithm for data center networks, J. Electron., 42 (2014), 2536–2542. https://doi.org/10.3969/j.issn.0372-2112.2014.12.029 doi: 10.3969/j.issn.0372-2112.2014.12.029
[15]	T. K. Li, C. H. Tsai, A fast fault-identification algorithm for bijective connection graphs using the PMC model, Inform. Sci., 187 (2012), 291–297. https://doi.org/10.1016/j.ins.2011.10.022 doi: 10.1016/j.ins.2011.10.022
[16]	L. C. Ye, J. R. Ye, Five-round adaptive diagnosis in hamiltonian networks, IEEE Transact. Parallel Distributed Syst., 26 (2015), 2459–2464. https://doi.org/10.1109/TPDS.2014.2350480 doi: 10.1109/TPDS.2014.2350480
[17]	X. L. Sun, J X Fan, B L Cheng, Y. Wang, L. Zhang, Probabilistic fault diagnosis of clustered faults for multiprocessor systems, J. Comput. Sci. Technol., 38 (2023), 821–833.
[18]	C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, Dcell: A scalable and Fault-tolerant network structure for data centers, SIGCOMM, (2008), 75–86.
[19]	X. Wang, A. Erickson, J. Fan, X. Jia, Hamiltonian properties of DCell networks, Comput. J., 11 (2015), 1–11. https://doi.org/10.1093/comjnl/bxv019 doi: 10.1093/comjnl/bxv019
[20]	C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, et al., BCube: A high performance, server-centric network architecture for modular data centers, SIGCOMM, (2009), 63–74. https://doi.org/10.1145/1594977.1592577
[21]	C. H. Huang, J. F. Fang, The pancyclicity and the hamiltonian-connectivity of the generalized base-b hypercube, J. Supercomput., 34 (2008), 263–269.
[22]	G. J. Wang, C. K. Lin, J. X. Lin, J. Y. Lin, B. L. Lin, Fault-tolerant hamiltonicity and hamiltonian connectivity of BCube with various faulty elements, J. Comput. Sci. Technol., 35 (2020), 1064–1083.

Reader Comments

Your name:*

Email:*
© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)