The advantages of k-visibility: A comparative analysis of several time series clustering algorithms

Sergio Iglesias-Perez; Alberto Partida; Regino Criado; Sergio Iglesias-Perez; Alberto Partida; Regino Criado

doi:10.3934/math.20241687

AIMS Mathematics

2024, Volume 9, Issue 12: 35551-35569. doi: 10.3934/math.20241687

Previous Article Next Article

Research article

The advantages of k-visibility: A comparative analysis of several time series clustering algorithms

1.
Data, Complex Networks and Cybersecurity Sciences Technological Institute, Univ. Rey Juan Carlos, 28028 Madrid, Spain
2.
Science, Computing, and Technology Department, School of Architecture, Engineering and Design, Universidad Europea de Madrid, Calle Tajo, S/N, Villaviciosa de Odón, 28670 Madrid, Spain
3.
Department of Computer Science and Technology, Universidad Internacional de La Rioja, Logroño, Spain
4.
Departamento de Matematica Aplicada Ciencia e Ingenieria de los Materiales y Tecnologia Electronica ESCET Universidad Rey Juan Carlos C Tulipan, 28933 Mostoles (Madrid), Spain

Received: 23 July 2024 Revised: 20 November 2024 Accepted: 27 November 2024 Published: 20 December 2024
MSC : 05C85, 68R10, 68W05

This paper outlined the advantages of the k-visibility algorithm proposed in ^[1,2] compared to traditional time series clustering algorithms, highlighting enhanced computational efficiency and comparable clustering quality. This method leveraged visibility graphs, transforming time series into graph structures where data points were represented as nodes, and edges are established based on visibility criteria. It employed the traditional k-means clustering method to cluster the time series. This approach was particularly efficient for long time series and demonstrated superior performance compared to existing clustering methods. The structural properties of visibility graphs provided a robust foundation for clustering, effectively capturing both local and global patterns within the data. In this paper, we have compared the k-visibility algorithm with 4 algorithms frequently used in time series clustering and compared the results in terms of accuracy and computational time. To validate the results, we have selected 15 datasets from the prestigious UCR (University of California, Riverside) archive in order to make a homogeneous validation. The result of this comparison concluded that k-visibility was always the fastest algorithm and that it was one of the most accurate in matching the clustering proposed by the UCR archive.
- time series,
- visibility graph,
- clustering,
- computational efficiency
Citation: Sergio Iglesias-Perez, Alberto Partida, Regino Criado. The advantages of k-visibility: A comparative analysis of several time series clustering algorithms[J]. AIMS Mathematics, 2024, 9(12): 35551-35569. doi: 10.3934/math.20241687

Related Papers:

Abstract

This paper outlined the advantages of the k-visibility algorithm proposed in ^[1,2] compared to traditional time series clustering algorithms, highlighting enhanced computational efficiency and comparable clustering quality. This method leveraged visibility graphs, transforming time series into graph structures where data points were represented as nodes, and edges are established based on visibility criteria. It employed the traditional k-means clustering method to cluster the time series. This approach was particularly efficient for long time series and demonstrated superior performance compared to existing clustering methods. The structural properties of visibility graphs provided a robust foundation for clustering, effectively capturing both local and global patterns within the data. In this paper, we have compared the k-visibility algorithm with 4 algorithms frequently used in time series clustering and compared the results in terms of accuracy and computational time. To validate the results, we have selected 15 datasets from the prestigious UCR (University of California, Riverside) archive in order to make a homogeneous validation. The result of this comparison concluded that k-visibility was always the fastest algorithm and that it was one of the most accurate in matching the clustering proposed by the UCR archive.

References

[1]	S. Iglesias-Perez, R. Criado, Temporal metagraph: A new mathematical approach to capture temporal dependencies and interactions between different entities over time, Chaos Soliton Fract., 175 (2023), 113940. http://dx.doi.org/10.1016/j.chaos.2023.113940 doi: 10.1016/j.chaos.2023.113940
[2]	S. Iglesias-Perez, R. Criado, Increasing the effectiveness of network intrusion detection systems (NIDSs) by using multiplex networks and visibility graphs, Mathematics, 11 (2023), 107. http://dx.doi.org/10.3390/math11010107 doi: 10.3390/math11010107
[3]	L. Lacasa, B. Luque, F. Ballesteros, J. Luque, J. C. Nuno, From time series to complex networks: The visibility graph, Proc. Natl. Acad. Sci. USA, 105 (2008), 4972–4975. http://dx.doi.org/10.1073/pnas.0709247105 doi: 10.1073/pnas.0709247105
[4]	A. Partida, R. Criado, M. Romance, Visibility graph analysis of IOTA and IoTeX price series: An intentional risk-based strategy to use 5G for IoT, Electronics, 10 (2021), 2282. https://doi.org/10.3390/electronics10182282 doi: 10.3390/electronics10182282
[5]	J. Lopes, P. Pinto, A. Partida, A. Pinto, Use of visibility graphs for the early detection of DoS attacks, In: 2024 IEEE international conference on cyber security and resilience (CSR), 2024,101–106. https://doi.org/10.1109/CSR61664.2024.10679430
[6]	B. Luque, L. Lacasa, F. Ballesteros, J. Luque, Horizontal visibility graphs: Exact results for random time series, Phys. Rev. E, 80 (2009), 046103. http://dx.doi.org/10.1103/PhysRevE.80.046103 doi: 10.1103/PhysRevE.80.046103
[7]	G. Liu, L. Li, L. Zhang, Q. Li, S. S. Law, Sensor faults classification for SHM systems using deep learning-based method with Tsfresh features, Smart Mater. Struct., 29 (2020), 075005. https://doi.org/10.1088/1361-665X/ab85a6 doi: 10.1088/1361-665X/ab85a6
[8]	S. Aghabozorgi, A. S. Shirkhorshidi, T. Y. Wah, Time-series clustering–a decade review, Inform. Syst., 53 (2015), 16–38. http://dx.doi.org/10.1016/j.is.2015.04.007 doi: 10.1016/j.is.2015.04.007
[9]	T. W. Liao, Clustering of time series data—a survey, Pattern Recognit., 38 (2005), 1857–1874. http://dx.doi.org/10.1016/j.patcog.2005.01.025 doi: 10.1016/j.patcog.2005.01.025
[10]	S. Fröhwirth-Schnatter, S. Kaufmann, Model-based clustering of multiple time series, J. Bus. Econ. Stat., 26 (2004), 78–89.
[11]	C. Bouveyron, J. Jacques, Model-based clustering of time series in group-specific functional subspaces, Adv. Data Anal. Classif., 5 (2011), 281–300. https://doi.org/10.1007/s11634-011-0095-6 doi: 10.1007/s11634-011-0095-6
[12]	C. Pamminger, S. Frühwirth-Schnatter, Model-based clustering of categorical time series, Bayesian Anal., 5 (2010), 345–368. https://doi.org/10.1214/10-BA606 doi: 10.1214/10-BA606
[13]	M. Christ, N. Braun, J. Neuffer, A. W. Kempa-Liehr, Time series feature extraction on basis of scalable hypothesis tests (tsfresh–a python package), Neurocomputing, 307 (2018), 72–77. http://dx.doi.org/10.1016/j.neucom.2018.03.067 doi: 10.1016/j.neucom.2018.03.067
[14]	D. J. Berndt, J. Clifford, Using dynamic time warping to find patterns in time series, In: Proceedings of the 3rd international conference on knowledge discovery and data mining, 1994,359–370.
[15]	A. Partida, R. Criado, M. Romance, Identity and access management resilience against intentional risk for blockchain-based IOT platforms, Electronics, 10 (2021), 378. https://doi.org/10.3390/electronics10040378 doi: 10.3390/electronics10040378
[16]	R. Tavenard, J. Faouzi, G. Vandewiele, F. Divo, G. Androz, C. Holtz, et al., Tslearn, a machine learning toolkit for time series data, J. Mach. Learn. Res., 21 (2020), 1–6.
[17]	I. S. Dhillon, Y. Guan, B. Kulis, Kernel k-means: Spectral clustering and normalized cuts, In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, 2004,551–556. http://dx.doi.org/10.1145/1014052.1014118
[18]	J. Paparrizos, L. Gravano, k-shape: Efficient and accurate clustering of time series, In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, 2015, 1855–1870. http://dx.doi.org/10.1145/2723372.2737793
[19]	S. Iglesias-Pérez, S. Moral-Rubio, R. Criado, A new approach to combine multiplex networks and time series attributes: Building intrusion detection systems (IDS) in cybersecurity, Chaos Soliton Fract., 150 (2021), 111143. https://doi.org/10.1016/j.chaos.2021.111143 doi: 10.1016/j.chaos.2021.111143
[20]	S. Iglesias-Pérez, S. Moral-Rubio, R. Criado, Combining multiplex networks and time series: A new way to optimize real estate forecasting in New York using cab rides, Physica A, 609 (2023), 128306. https://doi.org/10.1016/j.physa.2022.128306 doi: 10.1016/j.physa.2022.128306
[21]	H. A. Dau, A. Bagnall, K. Kamgar, C. C. Michael Yeh, Y. Zhu, S. Gharghabi, et al., The UCR time series archive, IEEE/CAA J. Autom. Sin., 6 (2019), 1293–1305. http://dx.doi.org/10.1109/JAS.2019.1911747 doi: 10.1109/JAS.2019.1911747
[22]	M. J. Warrens, H. van der Hoef, Understanding the adjusted rand index and other partition comparison indices based on counting object pairs, J. Classif., 39 (2022), 487–509. https://doi.org/10.1007/s00357-022-09413-z doi: 10.1007/s00357-022-09413-z
[23]	P. J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20 (1987), 53–65. https://doi.org/10.1016/0377-0427(87)90125-7

Reader Comments

Your name:*

Email:*
© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)