Citation: Guojun Gan, Qiujun Lan, Shiyang Sima. Scalable Clustering by Truncated Fuzzy c-means[J]. Big Data and Information Analytics, 2016, 1(2): 247-259. doi: 10.3934/bdia.2016007
Data clustering refers to the process of dividing a set of items into homogeneous groups or clusters such that items in the same cluster are similar to each other and items from different clusters are distinct [10,1]. As one of the most popular tools for data exploration, data clustering has found applications in many scientific areas such as bioinformatics [21,26], actuarial science and insurance [11,13], and image segmentation [20,25], to name just a few.
During the past six decades, many clustering algorithms have been developed by researchers from different areas. These clustering algorithms can be divided into two groups: hard clustering algorithms and fuzzy clustering algorithms. In hard clustering algorithms, each item is assigned to one and only one cluster; in fuzzy clustering algorithms, each item can be assigned to one or more clusters with some degree of membership. Examples of hard clustering algorithms include the k-means algorithm [27]; a well-known example of fuzzy clustering algorithms is the fuzzy c-means (FCM) algorithm [9,4,3].
The FCM algorithm is formulated to minimize an objective function. Let $X=\{x_1,x_2,\ldots,x_n\}$ be a dataset containing $n$ points and let $k$ be the desired number of clusters. The objective function of the FCM algorithm is defined as

$$Q(U,Z)=\sum_{l=1}^{k}\sum_{i=1}^{n}u_{il}^{\alpha}\|x_i-z_l\|^2, \qquad (1)$$

where $\alpha>1$ is the fuzzifier, $Z=\{z_1,z_2,\ldots,z_k\}$ is the set of cluster centers, $\|\cdot\|$ is the Euclidean norm, and $U=(u_{il})$ is the $n\times k$ fuzzy partition matrix, whose entries satisfy

$$u_{il}\in[0,1],\quad i=1,2,\ldots,n,\; l=1,2,\ldots,k, \qquad (2a)$$

$$\sum_{l=1}^{k}u_{il}=1,\quad i=1,2,\ldots,n, \qquad (2b)$$

$$\sum_{i=1}^{n}u_{il}>0,\quad l=1,2,\ldots,k. \qquad (2c)$$
Similar to the k-means algorithm, the FCM algorithm minimizes the objective function by alternately updating the fuzzy partition matrix $U$ and the cluster centers $Z$ until the changes become negligible.
The FCM algorithm has some advantages over the k-means algorithm; for example, it produces soft memberships that convey how strongly each item belongs to each cluster. However, the FCM algorithm needs to store the full $n\times k$ fuzzy partition matrix and to compute all $nk$ distances between points and centers at every iteration, which makes it inefficient when both the number of points $n$ and the number of clusters $k$ are large.
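To make the alternating scheme concrete, here is a minimal Java sketch of one FCM iteration; the method name, the small guard constant, and the array-based data layout are our own illustrative choices, not details of the paper's implementation.

```java
// One iteration of standard FCM: update all n*k memberships, then the centers.
// x: n points of dimension d; u: n-by-k memberships; z: k centers; alpha > 1.
static void fcmIteration(double[][] x, double[][] u, double[][] z, double alpha) {
    int n = x.length, k = z.length, d = x[0].length;
    // Membership update: u_il is proportional to (||x_i - z_l||^2)^(-1/(alpha-1)).
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int l = 0; l < k; l++) {
            double dist2 = 0.0;
            for (int j = 0; j < d; j++) {
                double diff = x[i][j] - z[l][j];
                dist2 += diff * diff;
            }
            u[i][l] = Math.pow(dist2 + 1e-12, -1.0 / (alpha - 1.0)); // guard against zero distance
            sum += u[i][l];
        }
        for (int l = 0; l < k; l++) {
            u[i][l] /= sum; // enforce constraint (2b): memberships of x_i sum to one
        }
    }
    // Center update: each z_l is the mean of all points weighted by u_il^alpha.
    for (int l = 0; l < k; l++) {
        double wsum = 0.0;
        double[] num = new double[d];
        for (int i = 0; i < n; i++) {
            double w = Math.pow(u[i][l], alpha);
            wsum += w;
            for (int j = 0; j < d; j++) num[j] += w * x[i][j];
        }
        for (int j = 0; j < d; j++) z[l][j] = num[j] / wsum;
    }
}
```

The sketch makes the cost visible: every iteration touches all $nk$ point-center pairs, which is exactly the bottleneck that the TFCM algorithm targets.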
In this paper, we propose a modified version of the FCM algorithm, called the TFCM (Truncated FCM) algorithm, to address the aforementioned drawback of the FCM algorithm. In the TFCM algorithm, only a subset of the full fuzzy partition matrix is stored, and the number of distance calculations at each iteration is reduced. The idea of the TFCM algorithm stems from the insight that when the number of clusters is large, each data point has significant membership in only a few nearby clusters, so the remaining memberships can be truncated to zero.
The remaining part of this paper is organized as follows. In Section 2, we give a brief review of relevant work. In Section 3, we introduce the TFCM algorithm in detail. In Section 4, we demonstrate the performance of the TFCM algorithm using numerical experiments. Finally, we conclude the paper with some remarks in Section 5.
As one of the most popular fuzzy clustering algorithms, the FCM algorithm was originally proposed by [9] and later modified by [4]. Many improvements of the FCM algorithm have been proposed since its introduction. In this section, we give a brief review of research work related to the efficiency of the FCM algorithm.
[6] proposed the AFCM (Approximate FCM) algorithm by replacing some variates in the FCM equations with integer-valued or real-valued estimates. The AFCM algorithm was developed to process digital images interactively. In the implementation of the AFCM algorithm, the fuzzy memberships are stored and manipulated in an approximate form so that the exact but expensive floating-point computations of the standard algorithm are avoided.
[7] proposed a multistage random sampling FCM algorithm, called the mrFCM algorithm, to reduce the runtime of the FCM algorithm. The mrFCM algorithm consists of two phases. In the first phase, the FCM algorithm is applied to a series of subsamples selected randomly from the whole dataset in order to find good initial cluster centers. In the second phase, the standard FCM algorithm with the initial cluster centers obtained from the first phase is applied to partition the whole dataset.
[19] proposed the psFCM (partition simplification FCM) algorithm to speed up the FCM algorithm by simplifying the computation at each iteration and reducing the number of iterations. Similar to the mrFCM algorithm [7], the psFCM algorithm also consists of two phases. In the first phase, the kd-tree method is first used to partition the whole dataset into small blocks. All points in a block are represented by the centroid of the block. In this way, a large dataset is reduced to a simplified dataset that is much smaller than the original dataset. Then the FCM algorithm is applied to the simplified dataset to obtain the actual cluster centers. In the second phase, the FCM algorithm with the cluster centers obtained from the first phase is applied to partition the original dataset.
[23] proposed a modified version of the FCM algorithm by eliminating the need to store the fuzzy partition matrix $U$. In this algorithm, the membership update and the center update are combined into a single step, which reduces the time complexity of one iteration of the FCM algorithm from $O(nk^2d)$ to $O(nkd)$, where $d$ is the dimension of the data.
[24] proposed the PFCM (Parallel FCM) algorithm for clustering large datasets by using the Message Passing Interface (MPI). In the PFCM algorithm with multiple processors, the dataset is divided into blocks, each processor works on one block of the data, and the partial results are combined to update the cluster centers at each iteration.
[17] proposed the geFFCM (generalized extensible fast fuzzy c-means) algorithm to cluster very large datasets. The geFFCM algorithm is similar to the mrFCM algorithm [7] and the psFCM algorithm [19] in the sense that a divide-and-conquer strategy is used by all three algorithms. In the geFFCM algorithm, a subsample that is statistically similar to the full dataset is selected and clustered, and the resulting cluster centers are then extended to the full dataset.
[18] compared three different implementations of the FCM algorithm for clustering very large datasets. In particular, [18] compared the random sample and extension FCM, the single-pass FCM, and the online FCM. In addition, kernelized versions of the three algorithms were also compared. [29] proposed the FCM++ algorithm, which improves the speed of the FCM algorithm by using the seeding mechanism of k-means++ [2].
Almost all of the aforementioned algorithms aim at speeding up the FCM algorithm for large datasets. These algorithms do not scale well when the desired number of clusters is large. The algorithm proposed by [23] reduces the time complexity of the FCM algorithm from $O(nk^2d)$ to $O(nkd)$, but its runtime still grows quickly when both $n$ and $k$ are large. The TFCM algorithm proposed in this paper addresses this case by truncating the fuzzy partition matrix so that each point interacts with only a few clusters at each iteration.
In this section, we introduce the TFCM (Truncated Fuzzy $c$-means) algorithm, which is designed to divide a large dataset into a large number of clusters efficiently.

Let $X=\{x_1,x_2,\ldots,x_n\}$ be a dataset containing $n$ points and let $k$ be the desired number of clusters. Like the FCM algorithm, the TFCM algorithm seeks a fuzzy partition matrix $U=(u_{il})$ and a set of cluster centers $Z=\{z_1,z_2,\ldots,z_k\}$, where the fuzzy memberships satisfy

$$u_{il}\in[0,1],\quad i=1,2,\ldots,n,\; l=1,2,\ldots,k,$$

and

$$\sum_{l=1}^{k}u_{il}=1,\quad i=1,2,\ldots,n.$$

Let $T$ be an integer such that $1\le T\le k$. Unlike the FCM algorithm, the TFCM algorithm also requires that each point have at most $T$ positive memberships, i.e.,

$$|\{l:u_{il}>0\}|\le T,\quad i=1,2,\ldots,n, \qquad (3)$$

where $|\cdot|$ denotes the number of elements of a set.
Then the objective function of the TFCM algorithm is defined as

$$P(U,Z)=\sum_{i=1}^{n}\sum_{l=1}^{k}u_{il}^{\alpha}\left(\|x_i-z_l\|^2+\epsilon\right), \qquad (4)$$

where $\alpha>1$ is the fuzzifier and $\epsilon$ is a small positive number that prevents zero distances from causing numerical problems. Let $I_i=\{l:u_{il}>0\}$ denote the set of indices of the clusters to which $x_i$ has positive membership. Since $u_{il}=0$ for $l\notin I_i$, the objective function can be rewritten as

$$P(U,Z)=\sum_{i=1}^{n}\sum_{l\in I_i}u_{il}^{\alpha}\left(\|x_i-z_l\|^2+\epsilon\right). \qquad (5)$$
From Equation (5) we see that the main difference between the TFCM algorithm and the fuzzy $c$-means algorithm is that in the TFCM algorithm each point contributes to at most $T$ terms of the objective function. As a result, only the memberships indexed by $I_i$ need to be stored and updated for each point, rather than a full row of $k$ memberships.
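One practical consequence of Equation (5) is that the fuzzy partition can be stored as $n$ index sets of size at most $T$ plus the matching memberships, an $n\times T$ footprint instead of $n\times k$. The following Java fragment, with illustrative names of our choosing, evaluates the truncated objective from that representation.

```java
// Evaluate the truncated objective (5). idx[i] holds the index set I_i
// (at most T cluster indices) and u[i] the matching memberships of point i.
static double objective(double[][] x, int[][] idx, double[][] u,
                        double[][] z, double alpha, double eps) {
    double total = 0.0;
    for (int i = 0; i < x.length; i++) {
        for (int t = 0; t < idx[i].length; t++) {
            double[] zl = z[idx[i][t]];
            double dist2 = 0.0;
            for (int j = 0; j < x[i].length; j++) {
                double diff = x[i][j] - zl[j];
                dist2 += diff * diff;
            }
            total += Math.pow(u[i][t], alpha) * (dist2 + eps); // one of at most T terms per point
        }
    }
    return total;
}
```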
Theorem 3.1. Given a set of cluster centers $Z=\{z_1,z_2,\ldots,z_k\}$, the fuzzy partition matrix $U$ that minimizes the objective function given in Equation (5) is

$$u_{il}=\frac{\left(\|x_i-z_l\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}}{\sum_{s\in I_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}},\quad 1\le i\le n,\; l\in I_i, \qquad (6)$$

where the index set $I_i$ is given by

$$I_i=\{l_1,l_2,\ldots,l_T\} \qquad (7)$$

with $l_1,l_2,\ldots,l_k$ being a permutation of $1,2,\ldots,k$ such that

$$\|x_i-z_{l_1}\|\le\|x_i-z_{l_2}\|\le\cdots\le\|x_i-z_{l_k}\|.$$
Proof. For each $i=1,2,\ldots,n$, we need to find the weights $u_i=(u_{il})_{l\in I_i}$ such that

$$P_i(u_i,I_i)=\sum_{l\in I_i}u_{il}^{\alpha}\left(\|x_i-z_l\|^2+\epsilon\right) \qquad (8)$$

is minimized subject to the constraint $\sum_{l\in I_i}u_{il}=1$. The corresponding Lagrangian function is

$$P_i(u_i,\lambda,I_i)=\sum_{l\in I_i}u_{il}^{\alpha}\left(\|x_i-z_l\|^2+\epsilon\right)+\lambda\left(\sum_{l\in I_i}u_{il}-1\right).$$

We can obtain the optimal weights given in Equation (6) by taking the derivatives of $P_i(u_i,\lambda,I_i)$ with respect to $u_{il}$ for $l\in I_i$ and with respect to $\lambda$, setting them to zero, and solving the resulting equations.
Now we show that the index set $I_i$ defined in Equation (7) is optimal. Let $J_i$ be an arbitrary subset of $\{1,2,\ldots,k\}$ with $|J_i|=T$. It suffices to show that

$$P_i(u_i^*,I_i)\le P_i(v_i^*,J_i), \qquad (9)$$

where $u_i^*$ and $v_i^*$ are the optimal weights obtained from Equation (6) using the index sets $I_i$ and $J_i$, respectively.
Since $u_i^*$ is given by Equation (6), we have

$$\begin{aligned}
P_i(u_i^*,I_i) &= \sum_{l\in I_i}(u_{il}^*)^{\alpha}\left(\|x_i-z_l\|^2+\epsilon\right)\\
&= \sum_{l\in I_i}\frac{\left(\|x_i-z_l\|^2+\epsilon\right)^{-\frac{\alpha}{\alpha-1}}}{\left(\sum_{s\in I_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}\right)^{\alpha}}\left(\|x_i-z_l\|^2+\epsilon\right)\\
&= \frac{\sum_{l\in I_i}\left(\|x_i-z_l\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}}{\left(\sum_{s\in I_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}\right)^{\alpha}}\\
&= \frac{1}{\left(\sum_{s\in I_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}\right)^{\alpha-1}}.
\end{aligned}$$
Similarly, we have
$$P_i(v_i^*,J_i)=\frac{1}{\left(\sum_{s\in J_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}\right)^{\alpha-1}}.$$
Since $I_i$ contains the indices of the $T$ centers nearest to $x_i$ and the function $t\mapsto(t+\epsilon)^{-\frac{1}{\alpha-1}}$ is decreasing for $\alpha>1$, we have

$$\sum_{s\in I_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}}\ge\sum_{s\in J_i}\left(\|x_i-z_s\|^2+\epsilon\right)^{-\frac{1}{\alpha-1}},$$
which shows that the inequality given in Equation (9) is true. This completes the proof.
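The update in Theorem 3.1 translates directly into code: find the $T$ nearest centers of each point and apply Equation (6) over that index set. Below is a minimal Java sketch with illustrative names; for clarity it sorts all $k$ distances, although a partial selection of the $T$ smallest would be cheaper.

```java
// TFCM membership update for one point (Theorem 3.1): returns the index
// set I_i of Eq. (7) and writes the memberships of Eq. (6) into uOut.
static int[] updatePoint(double[] x, double[][] z, int T, double alpha,
                         double eps, double[] uOut) {
    int k = z.length;
    final double[] dist2 = new double[k];
    Integer[] order = new Integer[k];
    for (int l = 0; l < k; l++) {
        double s = 0.0;
        for (int j = 0; j < x.length; j++) {
            double diff = x[j] - z[l][j];
            s += diff * diff;
        }
        dist2[l] = s;
        order[l] = l;
    }
    // Sort center indices by distance; the first T of them form the index set I_i.
    java.util.Arrays.sort(order, (a, b) -> Double.compare(dist2[a], dist2[b]));
    int[] idx = new int[T];
    double denom = 0.0;
    for (int t = 0; t < T; t++) {
        idx[t] = order[t];
        denom += Math.pow(dist2[idx[t]] + eps, -1.0 / (alpha - 1.0));
    }
    for (int t = 0; t < T; t++) { // Eq. (6), normalized over I_i only
        uOut[t] = Math.pow(dist2[idx[t]] + eps, -1.0 / (alpha - 1.0)) / denom;
    }
    return idx;
}
```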
Theorem 3.2. Given a fuzzy partition matrix $U$, the cluster centers $Z$ that minimize the objective function given in Equation (5) are

$$z_{lj}=\frac{\sum_{i=1}^{n}u_{il}^{\alpha}x_{ij}}{\sum_{i=1}^{n}u_{il}^{\alpha}}=\frac{\sum_{i\in C_l}u_{il}^{\alpha}x_{ij}}{\sum_{i\in C_l}u_{il}^{\alpha}}, \qquad (10)$$

for $l=1,2,\ldots,k$ and $j=1,2,\ldots,d$, where $d$ is the dimension of the data and $C_l=\{i:u_{il}>0\}$ is the set of indices of the points that have positive membership in the $l$-th cluster.
The proof of Theorem 3.2 is omitted as it is similar to the result of the FCM algorithm.
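Theorem 3.2 is the familiar weighted-mean update, with the twist that each point now contributes to at most $T$ centers. Here is a Java sketch under the same illustrative representation as above.

```java
// TFCM center update (Theorem 3.2, Eq. (10)): accumulate each point's
// contribution only into the clusters listed in its index set.
static void updateCenters(double[][] x, int[][] idx, double[][] u,
                          double[][] z, double alpha) {
    int k = z.length, d = x[0].length;
    double[] wsum = new double[k];
    double[][] num = new double[k][d];
    for (int i = 0; i < x.length; i++) {
        for (int t = 0; t < idx[i].length; t++) {
            int l = idx[i][t];
            double w = Math.pow(u[i][t], alpha);
            wsum[l] += w;
            for (int j = 0; j < d; j++) num[l][j] += w * x[i][j];
        }
    }
    for (int l = 0; l < k; l++) {
        if (wsum[l] > 0) { // a cluster that no point belongs to keeps its old center
            for (int j = 0; j < d; j++) z[l][j] = num[l][j] / wsum[l];
        }
    }
}
```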
[Algorithm 1. Pseudo-code of the TFCM algorithm.]
The pseudo-code of the TFCM algorithm is given in Algorithm 1. The TFCM algorithm consists of two phases: the initialization phase and the iteration phase. In the initialization phase, we initialize the cluster centers to be $k$ points selected randomly from the dataset. In the iteration phase, we repeat the following steps until the cluster centers converge or the maximum number of iterations is reached: for each point, determine the index set $I_i$ of its $T$ nearest centers and update its fuzzy memberships according to Equation (6); then update the cluster centers according to Equation (10).
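Putting the two updates together gives a driver of the kind Algorithm 1 describes. This sketch assumes the `updatePoint` and `updateCenters` helpers above; using the maximum center movement against the tolerance $\delta$ as the stopping test is our reading of the termination criterion, not a detail confirmed by the paper.

```java
// Illustrative TFCM driver: alternate membership and center updates until
// the centers move less than delta or maxIter iterations are reached.
// z should be initialized with k points selected randomly from x.
static void tfcm(double[][] x, double[][] z, int T, double alpha,
                 double eps, double delta, int maxIter) {
    int n = x.length;
    int[][] idx = new int[n][];
    double[][] u = new double[n][T];
    for (int iter = 0; iter < maxIter; iter++) {
        for (int i = 0; i < n; i++) {
            idx[i] = updatePoint(x[i], z, T, alpha, eps, u[i]); // Eq. (6)-(7)
        }
        double[][] zOld = new double[z.length][];
        for (int l = 0; l < z.length; l++) zOld[l] = z[l].clone();
        updateCenters(x, idx, u, z, alpha); // Eq. (10)
        double maxMove = 0.0;
        for (int l = 0; l < z.length; l++) {
            double s = 0.0;
            for (int j = 0; j < z[l].length; j++) {
                double diff = z[l][j] - zOld[l][j];
                s += diff * diff;
            }
            maxMove = Math.max(maxMove, Math.sqrt(s));
        }
        if (maxMove < delta) break; // centers have converged
    }
}
```

Note that each iteration computes $nk$ distances only to rank the centers; the membership and center updates themselves touch just $nT$ point-cluster pairs, which is where the savings over standard FCM come from when $T\ll k$.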
Regarding how to choose a value for the parameter $T$, there is a trade-off: a larger $T$ makes the TFCM algorithm behave more like the standard FCM algorithm but increases the runtime and the memory required to store the truncated partition matrix, while a smaller $T$ reduces the cost of each iteration. The numerical experiments in Section 4 suggest that small values of $T$ already produce clustering results comparable to those of the FCM algorithm.
A list of default values for the parameters required by the TFCM algorithm is given in Table 1. The parameters include the truncation level $T$, the constant $\epsilon$, the tolerance $\delta$ used in the convergence test, the maximum number of iterations $N_{max}$, and the fuzzifier $\alpha$.

| Parameter | Default Value |
| $T$ | |
| $\epsilon$ | |
| $\delta$ | |
| $N_{max}$ | 1000 |
| $\alpha$ | 2 |
In this section, we present some numerical results to demonstrate the performance of the TFCM algorithm in terms of speed and accuracy. We also compare the performance of the TFCM algorithm to that of the FCM algorithm. We implemented both the TFCM algorithm and the FCM algorithm in Java. In order to make a relatively fair comparison between the TFCM algorithm and the FCM algorithm, we used the same sets of initial cluster centers and the same criteria to terminate the algorithms.
To show that the TFCM algorithm works, we created two synthetic datasets, which are summarized in Table 2. Both synthetic datasets are two-dimensional datasets. One dataset contains four clusters and the other dataset contains 100 clusters. Figure 1 shows the two datasets.
Dataset | Size | Dimension | Clusters |
S1 | 400 | 2 | 4 clusters, each has 100 points |
S2 | 5000 | 2 | 100 clusters, each has 50 points |
Since we know the labels of the data points of the two synthetic datasets, we use the corrected Rand index [8,14,15,16] to measure the accuracy of the clustering algorithms. The corrected Rand index, denoted by $R$, measures the agreement between two partitions of a dataset. Its value is at most 1: a value of 1 indicates that the two partitions agree perfectly, while a value close to 0 indicates that the agreement is no better than chance.
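For completeness, the corrected Rand index can be computed from the contingency table of the two partitions using the standard adjusted-Rand formula; the Java sketch below is self-contained, with method names of our own choosing.

```java
// Corrected (adjusted) Rand index between two hard partitions given as
// label vectors in {0,...,kA-1} and {0,...,kB-1}.
static double correctedRand(int[] labelsA, int[] labelsB, int kA, int kB) {
    int n = labelsA.length;
    long[][] m = new long[kA][kB]; // contingency table
    long[] a = new long[kA];
    long[] b = new long[kB];
    for (int i = 0; i < n; i++) {
        m[labelsA[i]][labelsB[i]]++;
        a[labelsA[i]]++;
        b[labelsB[i]]++;
    }
    double sumIJ = 0.0, sumA = 0.0, sumB = 0.0;
    for (int i = 0; i < kA; i++)
        for (int j = 0; j < kB; j++) sumIJ += choose2(m[i][j]);
    for (int i = 0; i < kA; i++) sumA += choose2(a[i]);
    for (int j = 0; j < kB; j++) sumB += choose2(b[j]);
    double expected = sumA * sumB / choose2(n); // expected index under chance
    double maxIndex = 0.5 * (sumA + sumB);
    return (sumIJ - expected) / (maxIndex - expected);
}

static double choose2(long c) { return c * (c - 1) / 2.0; } // "c choose 2"
```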
Table 3 shows the speed and accuracy of the TFCM algorithm and the FCM algorithm when applied to the first synthetic dataset. Since both algorithms use random initial cluster centers, we ran the two algorithms 10 times to alleviate the impact of the initial cluster centers on the performance measures. The numbers reported are averages over the 10 runs, with standard deviations in parentheses.
| $k$ | Runtime (seconds) | $R$ |
| 2 | 0.103(0.139) | 0.433(0) |
| 4 | 0.058(0.114) | 1(0) |
| 8 | 0.106(0.154) | 0.682(0.023) |
(A) TFCM

| $k$ | Runtime (seconds) | $R$ |
| 2 | 0.044(0.061) | 0.498(0) |
| 4 | 0.05(0.058) | 1(0) |
| 8 | 0.176(0.143) | 0.726(0.038) |
(B) FCM
The first synthetic dataset contains 400 data points and four clusters of equal size. When the number of clusters $k$ was set to 4, which matches the true structure of the dataset, both algorithms recovered the four clusters exactly, with a corrected Rand index of 1. When $k$ was set to 2 or 8, the accuracy of both algorithms dropped because the number of clusters no longer matched the data.
The test results on the first synthetic dataset show that when both $n$ and $k$ are small, the TFCM algorithm has no clear speed advantage over the FCM algorithm; the runtimes of the two algorithms are comparable, and the accuracy of the TFCM algorithm is close to that of the FCM algorithm.
Table 4 shows the speed and accuracy of the two algorithms when applied to the second synthetic dataset 10 times. The second synthetic dataset contains 5000 data points, which are contained in 100 clusters. Each cluster contains 50 points. Tables 4(a) and 4(b) show the speed and accuracy of the TFCM algorithm under two different values of $T$, and Table 4(c) shows those of the FCM algorithm.
| $k$ | Runtime (seconds) | $R$ |
| 50 | 6.869(6.65) | 0.502(0.007) |
| 100 | 5.084(1.97) | 0.797(0.029) |
| 200 | 20.639(7.879) | 0.776(0.008) |
(A) TFCM with

| $k$ | Runtime (seconds) | $R$ |
| 50 | 5.269(1.574) | 0.483(0.007) |
| 100 | 4.348(1.887) | 0.848(0.03) |
| 200 | 20.184(9.307) | 0.777(0.008) |
(B) TFCM with

| $k$ | Runtime (seconds) | $R$ |
| 50 | 71.877(16.729) | 0.526(0.006) |
| 100 | 26.341(18.1) | 0.819(0.025) |
| 200 | 53.683(26.543) | 0.799(0.015) |
(C) FCM
If we increased the number of clusters, the runtimes of both algorithms increased; for example, the runtime of the TFCM algorithm at $k=200$ is about four times its runtime at $k=100$. The corrected Rand indices of the two algorithms remain close to each other at every value of $k$.
If we look at the average runtime for different values of $k$, we see that the TFCM algorithm is several times faster than the FCM algorithm on this dataset; for example, at $k=50$ the TFCM algorithm finished in about 5 to 7 seconds on average, while the FCM algorithm took about 72 seconds. The accuracy of the TFCM algorithm, as measured by the corrected Rand index, is slightly lower than that of the FCM algorithm in most cases, but the difference is small.
As we mentioned in the introduction section of this article, data clustering was used to divide a large portfolio of variable annuity contracts into hundreds of clusters in order to find representative contracts for metamodeling [11,13]. Existing clustering algorithms are slow for dividing a large dataset into hundreds of clusters. In this subsection, we apply the TFCM algorithm to divide a large portfolio of variable annuity contracts into hundreds of clusters.
The variable annuity dataset was simulated by a Java program [12]. The dataset contains 10,000 variable annuity contracts. The original dataset contains categorical variables. We converted the categorical variables into binary dummy variables and normalized all numerical variables to the interval [0, 1]. The resulting dataset has 22 numerical features. Since the dataset has no labels, we cannot use the corrected Rand index to measure the accuracy of the clustering results. To compare the clustering results of this dataset, we use the within-cluster sum of squares, defined as

$$WSS=\sum_{l=1}^{k}\sum_{x\in C_l}\sum_{j=1}^{d}(x_j-z_{lj})^2, \qquad (11)$$

where $C_l$ is the set of points contained in the $l$-th cluster, $z_l$ is the center of the $l$-th cluster, and $d$ is the dimension of the data. A lower WSS indicates a more compact clustering result.
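Here is a direct Java sketch of Equation (11); since the TFCM and FCM results are fuzzy, we assume that each point is assigned to the cluster in which it has the largest membership before the sums are computed.

```java
// Within-cluster sum of squares (Eq. (11)); assign[i] is the index of the
// cluster containing point i, e.g., the cluster with its largest membership.
static double wss(double[][] x, int[] assign, double[][] z) {
    double total = 0.0;
    for (int i = 0; i < x.length; i++) {
        double[] zl = z[assign[i]];
        for (int j = 0; j < x[i].length; j++) {
            double diff = x[i][j] - zl[j];
            total += diff * diff;
        }
    }
    return total;
}
```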
We applied the TFCM algorithm to this dataset with different values of $T$, and the FCM algorithm with the same initial cluster centers. Table 5 summarizes the runtimes and WSS values of the two algorithms for $k=100$ and $k=200$.
| $k$ | Runtime (seconds) | WSS |
| 100 | 6.417(1.9) | 944.636(14.574) |
| 200 | 16.167(5.565) | 735.001(6.37) |
(A) TFCM with

| $k$ | Runtime (seconds) | WSS |
| 100 | 16.734(5.133) | 930.14(15.614) |
| 200 | 31.871(19.216) | 721.291(5.3) |
(B) TFCM with

| $k$ | Runtime (seconds) | WSS |
| 100 | 71.185(22.023) | 958.137(15.234) |
| 200 | 87.918(22.641) | 740.548(6.688) |
(C) TFCM with

| $k$ | Runtime (seconds) | WSS |
| 100 | 164.02(57.612) | 994.111(18.829) |
| 200 | 219.695(51.104) | 783.113(7.156) |
(D) TFCM with

| $k$ | Runtime (seconds) | WSS |
| 100 | 280.137(70.577) | 1049.864(24.202) |
| 200 | 339.216(80.694) | 822.988(8.866) |
(E) TFCM with

| $k$ | Runtime (seconds) | WSS |
| 100 | 597.828(193.2) | 895.205(16.264) |
| 200 | 756.378(382.952) | 697.841(6.736) |
(F) FCM
Table 5(f) shows the result of the FCM algorithm when applied to the variable annuity dataset. From this table we see that it took the FCM algorithm about 756.378 seconds on average to divide the dataset into $k=200$ clusters, and about 597.828 seconds on average to divide it into $k=100$ clusters.
Tables 5(a)-5(e) give the results of the TFCM algorithm when applied to the variable annuity dataset with different values of $T$. From these tables we see that the runtime of the TFCM algorithm increased substantially when $T$ was increased, while the WSS values remained of the same order as those of the FCM algorithm. In all cases the TFCM algorithm produced somewhat higher WSS values than the FCM algorithm, indicating a modest loss of accuracy in exchange for the gain in speed.
If we compare Table 5(b) and Table 5(f), we see that the TFCM algorithm is more than 20 times faster than the FCM algorithm. For example, it took the FCM algorithm about 756 seconds on average to divide the dataset into 200 clusters, but it only took the TFCM algorithm about 32 seconds on average to divide the dataset into 200 clusters.
In summary, the numerical experiments show that the TFCM algorithm outperformed the FCM algorithm in terms of speed when the desired number of clusters is large. The accuracy of the TFCM algorithm is close to that of the FCM algorithm.
In some situations, we need to divide a large dataset into a large number of clusters. For example, we need to divide millions of web pages into thousands of categories [5] and divide a large portfolio of insurance policies into hundreds of clusters in order to select representative policies [11,13]. Most existing algorithms are not efficient when used to divide a large dataset into a large number of clusters.
In this paper, we proposed a truncated fuzzy $c$-means (TFCM) algorithm that is able to divide a large dataset into a large number of clusters efficiently. In the TFCM algorithm, each data point keeps positive memberships in at most $T$ clusters, so only a truncated fuzzy partition matrix needs to be stored and far fewer distance calculations are needed at each iteration. Our numerical experiments on two synthetic datasets and a variable annuity dataset show that the TFCM algorithm is significantly faster than the FCM algorithm when the desired number of clusters is large, while producing clustering results of comparable quality.
We implemented the TFCM algorithm in a straightforward way according to the pseudo-code given in Algorithm 1. The speed of the TFCM algorithm can be further improved by using the technique introduced by [23], which allows us to combine the step of updating the cluster centers and the step of updating the fuzzy memberships into a single step. In the future, we would like to incorporate this technique into the TFCM algorithm and compare the TFCM algorithm with the other algorithms mentioned in Section 2.
This research was partially supported by the National Natural Science Foundation of China (Grant No. 71171076).
[1] C. C. Aggarwal and C. K. Reddy (eds.), Data Clustering: Algorithms and Applications, CRC Press, Boca Raton, FL, USA, 2014.
[2] D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'07, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2007, 1027-1035.
[3] J. C. Bezdek, R. Ehrlich and W. Full, FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, 10 (1984), 191-203.
[4] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic Publishers, Norwell, MA, USA, 1981.
[5] A. Broder, L. Garcia-Pueyo, V. Josifovski, S. Vassilvitskii and S. Venkatesan, Scalable k-means by ranked retrieval, in Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM'14, ACM, 2014, 233-242.
[6] R. L. Cannon, J. V. Dave and J. Bezdek, Efficient implementation of the fuzzy c-means clustering algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8 (1986), 248-255.
[7] T. W. Cheng, D. B. Goldgof and L. O. Hall, Fast fuzzy clustering, Fuzzy Sets and Systems, 93 (1998), 49-56.
[8] M. de Souto, I. Costa, D. de Araujo, T. Ludermir and A. Schliep, Clustering cancer gene expression data: A comparative study, BMC Bioinformatics, 9 (2008), p. 497.
[9] J. C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics, 3 (1973), 32-57.
[10] G. Gan, Data Clustering in C++: An Object-Oriented Approach, Data Mining and Knowledge Discovery Series, Chapman & Hall/CRC Press, Boca Raton, FL, USA, 2011.
[11] G. Gan, Application of data clustering and machine learning in variable annuity valuation, Insurance: Mathematics and Economics, 53 (2013), 795-801.
[12] G. Gan, A multi-asset Monte Carlo simulation model for the valuation of variable annuities, in Proceedings of the Winter Simulation Conference, 2015, 3162-3163.
[13] G. Gan and S. Lin, Valuation of large variable annuity portfolios under nested simulation: A functional data approach, Insurance: Mathematics and Economics, 62 (2015), 138-150.
[14] G. Gan and M. K.-P. Ng, Subspace clustering using affinity propagation, Pattern Recognition, 48 (2015), 1455-1464.
[15] G. Gan and M. K.-P. Ng, Subspace clustering with automatic feature grouping, Pattern Recognition, 48 (2015), 3703-3713.
[16] G. Gan, Y. Zhang and D. K. Dey, Clustering by propagating probabilities between data points, Applied Soft Computing, 41 (2016), 390-399.
[17] R. J. Hathaway and J. C. Bezdek, Extending fuzzy and probabilistic clustering to very large data sets, Computational Statistics & Data Analysis, 51 (2006), 215-234.
[18] T. Havens, J. Bezdek, C. Leckie, L. Hall and M. Palaniswami, Fuzzy c-means algorithms for very large data, IEEE Transactions on Fuzzy Systems, 20 (2012), 1130-1146.
[19] M.-C. Hung and D.-L. Yang, An efficient fuzzy c-means clustering algorithm, in Proceedings of the IEEE International Conference on Data Mining, 2001, 225-232.
[20] Z.-X. Ji, Q.-S. Sun and D.-S. Xia, A modified possibilistic fuzzy c-means clustering algorithm for bias field estimation and segmentation of brain MR image, Computerized Medical Imaging and Graphics, 35 (2011), 383-397.
[21] D. Jiang, C. Tang and A. Zhang, Cluster analysis for gene expression data: A survey, IEEE Transactions on Knowledge and Data Engineering, 16 (2004), 1370-1386.
[22] F. Klawonn, Fuzzy clustering: Insights and a new approach, Mathware & Soft Computing, 11 (2004), 125-142.
[23] J. F. Kolen and T. Hutcheson, Reducing the time complexity of the fuzzy c-means algorithm, IEEE Transactions on Fuzzy Systems, 10 (2002), 263-267.
[24] T. Kwok, K. Smith, S. Lozano and D. Taniar, Parallel fuzzy c-means clustering for large data sets, in Euro-Par 2002 Parallel Processing (eds. B. Monien and R. Feldmann), vol. 2400 of Lecture Notes in Computer Science, Springer, 2002, 365-374.
[25] H. Liu, F. Zhao and L. Jiao, Fuzzy spectral clustering with robust spatial information for image segmentation, Applied Soft Computing, 12 (2012), 3636-3647.
[26] J. D. MacCuish and N. E. MacCuish, Clustering in Bioinformatics and Drug Discovery, CRC Press, Boca Raton, FL, 2010.
[27] J. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (eds. L. LeCam and J. Neyman), University of California Press, Berkeley, CA, USA, 1 (1967), 281-297.
[28] S. A. A. Shalom, M. Dash and M. Tue, Graphics hardware based efficient and scalable fuzzy c-means clustering, in Proceedings of the 7th Australasian Data Mining Conference, 87 (2008), 179-186.
[29] A. Stetco, X.-J. Zeng and J. Keane, Fuzzy c-means++: Fuzzy c-means with effective seeding initialization, Expert Systems with Applications, 42 (2015), 7541-7548.