Research article

Modified Mahalanobis-Taguchi System based on proper orthogonal decomposition for high-dimensional-small-sample-size data classification

  • The Mahalanobis-Taguchi System (MTS) is an effective algorithm for dimensionality reduction, feature extraction, and classification in multidimensional systems. However, when applied to high-dimensional-small-sample-size data, MTS cannot readily calculate the Mahalanobis distance because the covariance matrix is singular. To this end, we construct a modified Mahalanobis-Taguchi System (MMTS) by introducing the idea of proper orthogonal decomposition (POD). The constructed MMTS expands the application scope of MTS, taking into account correlations between variables and the influence of dimensionality. It not only retains most of the information features of the original samples but also achieves a substantial reduction in dimensionality, showing excellent classification performance. The results show that, compared with expert classification, individual classifiers such as NB, RF, k-NN, and SVM, and superimposed classifiers such as Wrapper + RF, MRMR + SVM, Chi-square + BP, SMOTE + Wrapper + RF, and SMOTE + MRMR + SVM, MMTS achieves better classification performance when extracting orthogonal decomposition vectors with eigenvalues greater than 0.001.

    Citation: Ting Mao, Lanting Yu, Yueyi Zhang, Li Zhou. Modified Mahalanobis-Taguchi System based on proper orthogonal decomposition for high-dimensional-small-sample-size data classification[J]. Mathematical Biosciences and Engineering, 2021, 18(1): 426-444. doi: 10.3934/mbe.2021023



    Mahalanobis-Taguchi System (MTS) is a pattern recognition method proposed by the Japanese quality engineer Genichi Taguchi [1,2]. The method uses the Mahalanobis distance and Taguchi robust design to perform dimensionality reduction, feature extraction, classification, and diagnosis in a multi-dimensional system [2]. MTS adopts the Mahalanobis distance as a covariance-based distance measure, which can effectively gauge the importance of each variable [4], and the method has unique robustness. Therefore, MTS is widely used for dimensionality reduction and classification of large-sample-size and unbalanced data, for example in dimensionality reduction and classification of agricultural crops [3], pre-indicators of delirium [4], and health monitoring [5].

    In recent years, more and more high-dimensional-small-sample-size data require dimensionality reduction and classification, for example in pilot randomized controlled designs in clinical research [6], energy prediction and optimization in the petrochemical industry [7], and rapid-cycle quality improvement projects [8]. The sample sizes of datasets from these fields are often much smaller than their dimensionality [9]. This makes it difficult to apply statistical methods for parameter estimation and classification, and the resulting predictive models are prone to poor learning performance, large uncertainty, and low accuracy, which interfere with the final decision [10]. Some scholars proposed performing principal component analysis (PCA) based on variance measurement to achieve dimensionality reduction and classification of high-dimensional-small-sample-size data [11]. However, PCA only reduces the dimensionality of a single dataset and does not consider correlations between datasets [12]. To address this issue, Ma et al. improved PCA and proposed sparse principal component analysis (SPCA), which sparsifies the loading factors so that most of them are 0, thereby giving the principal components stronger interpretability [13]. However, SPCA requires the calculation of a large number of eigenvalues, which is computationally costly [14]. In addition, some scholars used penalization techniques to reduce the dimensionality of high-dimensional-small-sample-size data, such as least absolute shrinkage and selection operator (LASSO) regression, the smoothly clipped absolute deviation (SCAD) penalty, and the minimax concave penalty. However, such techniques suffer from issues related to oracle inequalities, multiple local minima, and computational difficulties [15]. Minimum-redundancy-maximum-relevance (MRMR) is another effective dimensionality reduction method owing to its high accuracy, but it has a high computational cost and is significantly influenced by the number of features [16]. Wrapper methods are also effective for dimensionality reduction but often suffer from overfitting [17].

    In view of the unique effectiveness of MTS in dimensionality reduction and classification, Xiao et al. [18] optimized MTS to classify high-dimensional-small-sample-size data. MTS uses definite thresholds to identify valid features and build models, thus exhibiting good dimensionality reduction, feature extraction, and classification performance. However, when MTS is used to process high-dimensional-small-sample-size data, the solution of the associated inverse problem is not unique; there are infinitely many solutions compatible with the data [19]. Therefore, the singularity of the covariance matrix must be considered.

    To overcome the covariance matrix singularity problem, and to inherit and improve upon the traditional variance-based dimensionality reduction and classification methods, in this study we introduce the idea of proper orthogonal decomposition (POD) to construct a modified MTS (MMTS) for dimensionality reduction and classification of high-dimensional-small-sample-size data. POD was originally proposed by K. Pearson [20] and H. Hotelling [21]. It can reduce a large number of interdependent variables to a much smaller number of variables [22] and can therefore effectively overcome the covariance matrix singularity problem. Compared with the adjoint matrix method or the Schmidt orthogonalization method used in the traditional MTS, the POD method builds fewer matrix elements, greatly reduces computation, and improves computing speed. When applied to high-dimensional-small-sample-size data, the POD-based MMTS not only solves the covariance matrix singularity problem of MTS, improves the architecture and algorithm of the traditional MTS, and expands its application scope, but also inherits and improves upon the traditional dimensionality reduction methods by overcoming the poor robustness and weak interpretability of PCA and reducing the computational cost of SPCA. Therefore, MMTS is a method that can quickly and efficiently extract valid features and achieve the goal of dimensionality reduction and classification.

    The structure of this article is as follows. In Section 2, we introduce the theory and method of the traditional MTS. In Section 3, we first introduce POD and then systematically explain the MMTS constructed based on the idea of POD. In Section 4, we introduce a dataset and the evaluation metrics, and demonstrate through experiments that the MMTS exhibits good classification accuracy. Section 5 summarizes this research.

    MTS is one of the most cutting-edge methods in the field of quality engineering. It uses the Mahalanobis distance and Taguchi robust design to perform diagnosis and forecasting in a multi-dimensional system [1]. It applies the full set of ideas of Taguchi signal-to-noise ratio (SNR) experimental design to characteristic variable selection in pattern recognition [23]. The method uses orthogonal tables and SNR gains to test the validity of characteristic variables and select influential ones, thereby optimizing and simplifying the set of characteristic variables [1] and effectively reducing dimensionality. It minimizes the number of original features in the system without losing recognition quality [24].

    Since MTS uses the Mahalanobis distance as a covariance-based distance measure, it can effectively gauge the importance of attribute sets [25]. The Mahalanobis distance, which constitutes a general indication of the degree of divergence in the means of the sample characteristics in multivariate space [26], is used to eliminate the influence of dimensions [27]. It facilitates the construction of multi-dimensional measurement scales for the purpose of classification [28]. Therefore, MTS has great advantages in dimensionality reduction. In addition, MTS does not require any assumption about the data distribution, requires only small sample sizes, and is easy to operate [2]. In view of these characteristics, MTS is widely used in the medical, engineering, and financial fields [29].

    The basic operation steps of MTS are as follows:

    Step 1: Construction of the Mahalanobis space

    Collect a sample dataset with sample size $n$ and $p$ characteristic variables from the normal group. Let $x_{ij}$ be the value of the $j$th characteristic variable of the $i$th sample in the normal group, where $i=1,2,\ldots,n$ and $j=1,2,\ldots,p$.

    Calculate the mean $\bar{x}_j$ and the standard deviation $s_j$, and standardize the sample dataset:

    $$\bar{x}_j=\frac{1}{n}\sum_{i=1}^{n}x_{ij},\qquad s_j=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2},\qquad Z_{ij}=\frac{x_{ij}-\bar{x}_j}{s_j}\tag{1}$$
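    For illustration, Eq (1) can be implemented in a few lines of NumPy. This is our sketch rather than the authors' code, and the array name `X_normal` is illustrative:

```python
import numpy as np

def standardize(X):
    """Column-wise z-score standardization of Eq (1); X is an n-by-p sample matrix."""
    x_bar = X.mean(axis=0)            # per-variable means (x̄_j)
    s = X.std(axis=0, ddof=1)         # per-variable sample standard deviations (s_j)
    return (X - x_bar) / s, x_bar, s  # keep x_bar and s to standardize later samples

# Usage (illustrative): Z, x_bar, s = standardize(X_normal)
```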

    Step 2: Calculation of the Mahalanobis distance and SNR

    Calculate the Mahalanobis distances of the normal and abnormal groups. The Mahalanobis distance is a squared distance, denoted $D^2$:

    $$MD_i=D^2=Z_iR^{-1}(Z_i)^T\tag{2}$$

    where $R$ is the correlation coefficient matrix of the data and $Z_i=[Z_{i1},Z_{i2},\ldots,Z_{ip}]$ is the vector of standardized characteristic values of the $i$th sample.
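    A minimal NumPy sketch of Eq (2), assuming the correlation matrix $R$ is non-singular (as in the traditional setting with more samples than variables); names are illustrative:

```python
import numpy as np

def mahalanobis_distances(Z, R):
    """Squared Mahalanobis distance of Eq (2), MD_i = Z_i R^{-1} Z_i^T, per row Z_i."""
    R_inv = np.linalg.inv(R)  # assumes R is non-singular
    return np.einsum('ij,jk,ik->i', Z, R_inv, Z)

# R is the correlation matrix of the standardized normal group:
# R = np.corrcoef(Z, rowvar=False)
# md_normal = mahalanobis_distances(Z, R)
```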

    If the Mahalanobis distance of the abnormal group is greater than that of the normal group, the Mahalanobis space is valid, and an appropriate orthogonal table is selected for orthogonal experiments and SNR calculation. This study takes larger-the-better characteristics as an example:

    $$SNR=-10\lg\left(\frac{1}{m}\sum_{i=1}^{m}\frac{1}{MD_i^2}\right)\tag{3}$$

    where $m$ is the sample number of the abnormal group.

    Step 3: Selection of characteristic variables

    For the $j$th characteristic variable, $\overline{SNR}_{level1}$ denotes the mean SNR over the experiments that use this characteristic variable, while $\overline{SNR}_{level2}$ denotes the mean SNR over the experiments that do not. If $\text{SNR gain}=\overline{SNR}_{level1}-\overline{SNR}_{level2}>0$, the $j$th characteristic variable is selected; otherwise it is removed.
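    The SNR of Eq (3) and the level-mean gains of Step 3 can be sketched as follows; the design-array encoding (1 = variable used, 2 = not used) mirrors the orthogonal-table convention stated later, and all names are illustrative:

```python
import numpy as np

def snr_larger_the_better(md_abnormal):
    """Larger-the-better SNR of Eq (3) from abnormal-group Mahalanobis distances."""
    md = np.asarray(md_abnormal, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / md**2))

def snr_gains(design, snrs):
    """SNR gain per characteristic variable from a two-level orthogonal design.

    design : (runs, p) array; entry 1 = variable used in the run, 2 = not used.
    snrs   : (runs,) SNR of each experimental run.
    """
    snrs = np.asarray(snrs, dtype=float)
    return np.array([snrs[design[:, j] == 1].mean() - snrs[design[:, j] == 2].mean()
                     for j in range(design.shape[1])])

# Variables with positive gain are kept:
# keep = np.where(snr_gains(design, snrs) > 0)[0]
```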

    This section introduces the classification of high-dimensional-small-sample-size data using MMTS. First, we introduce the POD proposed by K. Pearson [20] and H. Hotelling [21]. Then we describe an improved POD proposed by Zhu et al. [30], which is suitable for matrices with a large number of variables and a relatively small number of samples. Finally, we introduce the algorithm and operation steps of the constructed MMTS in detail.

    POD can obtain uncorrelated principal components [31] and is therefore widely used in multi-dimensional and multi-variable scenarios, such as comparison of stock return volatility patterns [32] and environmental research [33].

    Let $X$ be the data matrix, $t_i$ the orthogonal decomposition vector of matrix $XX^T$, and $r_i$ the orthogonal decomposition vector of matrix $X^TX$. The principle of POD is to perform the orthogonal transformation $t_i=Xr_i\ (i=1,2,\ldots,n)$ on the data matrix $X$ so that the variance of vector $t_i$ is maximized, that is, $\max(t_i^Tt_i)$. When the variance corresponding to $t_i$ is small, the vector is considered to contain mostly noise components and is removed. The model is:

    $$\max(t_i^Tt_i)\quad \text{s.t.}\ \begin{cases}t_i=Xr_i & (i=1,2,\ldots,n)\\ r_i^Tr_i=1 & (i=1,2,\ldots,n)\\ t_i^Tt_j=0 & (i\neq j)\end{cases}\tag{4}$$

    To enhance the computational efficiency of the iterative algorithm, Zhu et al. [30] improved it. Instead of using the $p\times p$ matrix $(X^TX)_i$, the improved algorithm uses the $n\times n$ matrix $(XX^T)_i$, which is more suitable for matrices with a large number of variables and a relatively small number of samples. Left-multiplying $r_i=\frac{1}{\lambda_i}(X^TX)_ir_i$ by $X$, we get:

    $$r_i=\frac{1}{\lambda_i}(X^TX)_ir_i \;\Rightarrow\; Xr_i=\frac{1}{\lambda_i}X(X^TX)_ir_i \;\Rightarrow\; t_i=\frac{1}{\lambda_i}(XX^T)_iXr_i \;\Rightarrow\; \lambda_it_i=(XX^T)_it_i\tag{5}$$

    where $\lambda_i$ is the characteristic value corresponding to $r_i$.

    Perform the following transformations on $(X^TX)_{i+1}=(X^TX)_i-\lambda_ir_ir_i^T$:

    $$\begin{aligned}(X^TX)_{i+1}&=(X^TX)_i-\lambda_ir_ir_i^T\\ X(X^TX)_{i+1}&=X(X^TX)_i-\lambda_iXr_ir_i^T\\ (XX^T)_{i+1}X&=(XX^T)_iX-\lambda_it_ir_i^T\\ (XX^T)_{i+1}XX^T&=(XX^T)_iXX^T-\lambda_it_ir_i^TX^T\\ (XX^T)_{i+1}(XX^T)_i&=(XX^T)_i(XX^T)_i-\lambda_it_it_i^T\\ (XX^T)_{i+1}(XX^T)_i&=(XX^T)_i(XX^T)_i-(XX^T)_it_it_i^T\\ (XX^T)_{i+1}&=(XX^T)_i-t_it_i^T\end{aligned}\tag{6}$$

    Perform the following transformation on $\lambda_ir_i$:

    $$\lambda_ir_i=(X^TX)r_i=X^Tt_i\tag{7}$$

    It can be demonstrated that the results obtained by this algorithm (including $r_i$, $t_i$, and $\lambda_i$) are exactly the same as those obtained by the traditional algorithm.
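    As an illustration of Eqs (5)-(7), the improved algorithm amounts to power iteration with deflation on the $n\times n$ matrix $XX^T$. The sketch below is ours, not the authors' code, and uses a slightly different but equivalent normalization (unit $t_i$, deflation by $\lambda_i t_it_i^T$):

```python
import numpy as np

def pod_small_sample(X, n_components, max_iter=1000, tol=1e-8):
    """Power iteration with deflation on the n-by-n matrix X X^T (cf. Eqs (5)-(7))."""
    rng = np.random.default_rng(0)
    G = X @ X.T                       # (X X^T)_1: n-by-n instead of p-by-p
    lams, ts, rs = [], [], []
    for _ in range(n_components):
        t = rng.standard_normal(G.shape[0])
        lam_prev = 0.0
        for _ in range(max_iter):     # iterate lambda_i t_i = (X X^T)_i t_i
            t = G @ t
            lam = np.linalg.norm(t)
            t /= lam                  # keep t_i unit-length
            if abs(lam - lam_prev) <= tol * max(lam, 1e-30):
                break                 # relative eigenvalue change below tolerance
            lam_prev = lam
        r = X.T @ t / np.sqrt(lam)    # Eq (7): lambda_i r_i = X^T t_i, then normalize
        G -= lam * np.outer(t, t)     # deflation, cf. Eq (6)
        lams.append(lam); ts.append(t); rs.append(r)
    return np.array(lams), np.array(ts), np.array(rs)
```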

    MTS has unique advantages in dimensionality reduction and classification. It uses orthogonal tables to design experimental plans, takes the SNR gain of the Mahalanobis distance as the evaluation metric, and uses orthogonal experimental data analysis to optimize the selection of characteristic variables. The method is therefore novel, easy to operate, and effective [23]. However, when the dimensionality of the data exceeds the number of samples, the Mahalanobis distance cannot be calculated [34], making MTS unusable. This research introduces the idea of POD into MTS and proposes an MMTS, thereby eliminating the structural and algorithmic defects of MTS.

    The algorithm and operation steps of the POD-based MMTS are as follows:

    Step 1: Data standardization

    Let the input data matrix $X$ be an $n\times p$ matrix with $n<p$. $x_{ij}$ is the element in the $i$th row and $j$th column of $X$; its meaning is the value of the $j$th characteristic variable of the $i$th sample, where $i=1,2,\ldots,n$ and $j=1,2,\ldots,p$.

    Standardize the data as follows:

    $$\bar{x}_j=\frac{1}{n}\sum_{i=1}^{n}x_{ij},\qquad s_j=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2},\qquad Z_{ij}=\frac{x_{ij}-\bar{x}_j}{s_j}\tag{8}$$

    Step 2: Iteratively decompose the variance matrix $ZZ^T$ of the normal samples until $\frac{|\lambda^{(k+1)}-\lambda^{(k)}|}{\lambda^{(k)}}<10^{-8}$ is satisfied, where $\lambda^{(k)}$ is the characteristic value at the $k$th iteration. At this point, all useful information is considered to have been extracted.

    Calculate the variance matrix $ZZ^T$ of the normal samples:

    $$(ZZ^T)_1=ZZ^T\tag{9}$$

    where $Z_i=[Z_{i1},Z_{i2},\ldots,Z_{ip}]$ and $Z=[Z_1^T,Z_2^T,\ldots,Z_n^T]^T$.

    Select $\lambda_j$ to make vector $t_j$ a unit vector:

    $$t_j=\frac{1}{\lambda_j}(ZZ^T)_kt_j\tag{10}$$

    where $\lambda_j$ is the characteristic value and $t_j$ is the orthogonal decomposition vector of matrix $(ZZ^T)_k$.

    Select $\lambda_j$ to make vector $r_j$ a unit vector, $t_j=Zr_j$, and calculate vector $r_j$:

    $$r_j=\frac{1}{\lambda_j}Z^Tt_j\tag{11}$$

    Update the variance matrix:

    $$(ZZ^T)_{k+1}=(ZZ^T)_k-t_jt_j^T\tag{12}$$

    The components of the abnormal samples are then extracted according to the orthogonal decomposition vectors obtained from the normal samples.
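    A brief sketch of Step 2, reusing the `standardize` and `pod_small_sample` routines above; the component count of 15 follows Table 2, and all names are illustrative:

```python
# Step 2 sketch (illustrative names): standardize the normal samples (Eq 8), then
# extract orthogonal decomposition vectors from Z Z^T; the 10^-8 relative-change
# stopping rule is built into pod_small_sample.
Z_normal, x_bar, s = standardize(X_normal)
lams, ts, rs = pod_small_sample(Z_normal, n_components=15, tol=1e-8)
```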

    Step 3: Calculate the correlation matrix and Mahalanobis distance and verify the validity of the Mahalanobis space.

    Let the number of dimensions obtained by POD be $q$. Let $\lambda_j^{new}$ be the characteristic value and $\xi_j$ the orthogonal decomposition vector of the data matrix, $j=1,2,\ldots,q$. Note that $\xi_j$ must be normalized to unit length.

    $$R=\sum_{j=1}^{q}\lambda_j^{new}\xi_j\xi_j^T\tag{13}$$

    Calculate the Mahalanobis distance:

    $$MD_i=Z_iR^{-1}Z_i^T=\sum_{j=1}^{q}\frac{(\xi_j^TZ_i^T)^2}{\lambda_j^{new}}\tag{14}$$

    where $Z_i=[Z_{i1},Z_{i2},\ldots,Z_{ip}]$.

    If the Mahalanobis distance of the abnormal samples is greater than that of the normal samples, the Mahalanobis space is valid.
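    Step 3 can be sketched as follows; computing MD through the decomposition of Eq (14) avoids forming $R^{-1}$ explicitly. The names are illustrative:

```python
import numpy as np

def mahalanobis_from_pod(Z, lams, xis):
    """MD via the decomposition of Eq (14), avoiding an explicit matrix inverse.

    Z    : (m, p) standardized samples (rows Z_i)
    lams : (q,) eigenvalues lambda_j^new
    xis  : (q, p) unit orthogonal decomposition vectors xi_j
    """
    proj = Z @ xis.T                  # proj[i, j] = xi_j^T Z_i^T
    return np.sum(proj**2 / lams, axis=1)

# Validity check of the Mahalanobis space (illustrative): the abnormal-group
# distances should exceed the normal-group distances.
# assert mahalanobis_from_pod(Z_abnormal, lams, xis).mean() > \
#        mahalanobis_from_pod(Z_normal, lams, xis).mean()
```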

    Step 4: Calculate the SNR gain, extract valid features, and complete dimensionality reduction.

    Use an appropriate orthogonal table to carry out orthogonal experiments, and select valid orthogonal decomposition variables by calculating the SNR. The SNR used in this article has larger-the-better characteristics:

    $$SNR=-10\lg\left(\frac{1}{m}\sum_{i=1}^{m}\frac{1}{MD_i^2}\right)\tag{15}$$

    where $m$ is the number of samples in the abnormal group.

    When designing the orthogonal table, "1" indicates that the characteristic variable is used, while "2" indicates that the characteristic variable is not used. The calculation formula of SNR gain is as follows:

    $$gain=\overline{SNR}_{level1}-\overline{SNR}_{level2}\tag{16}$$

    If $gain=\overline{SNR}_{level1}-\overline{SNR}_{level2}>0$, the characteristic variable is considered worth keeping; if $gain=\overline{SNR}_{level1}-\overline{SNR}_{level2}<0$, the characteristic variable should be removed.

    Step 5: Use the dimensionality-reduced characteristic variables to verify the validity of the Mahalanobis space again. If the Mahalanobis space is valid, the dimensionality-reduced characteristic variables can be used for other analyses (such as prediction and classification) of unknown samples.

    Take classification as an example. Before classifying the test set $X$, its data matrix must be standardized. Let $X_{ij}$ be an element of the unknown dataset, $\bar{x}_j$ and $s_j$ the mean and standard deviation of the "normal" sample matrix, respectively, and $XD_{ij}$ the processed matrix element. Then:

    $$XD_{ij}=\frac{X_{ij}-\bar{x}_j}{s_j}\tag{17}$$

    Extract the $q$ valid variable dimensions from the processed data matrix, calculate the Mahalanobis distance $MD_i$, and classify the test set according to the threshold.

    $$MD_i=XD_iR^{-1}(XD_i)^T\tag{18}$$

    where $R^{-1}$ is the inverse of the correlation matrix.
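    A sketch of Step 5 follows. Since the paper classifies "according to the threshold" without fixing a formula for it, the threshold below is a user-supplied assumption, as is the 1 = abnormal label coding:

```python
import numpy as np

def classify_test_set(X_test, x_bar, s, lams, xis, threshold):
    """Standardize the test set with the normal-group statistics (Eq 17),
    compute MD with the q retained components (Eq 18 in decomposed form),
    and label samples by comparing MD with a user-chosen threshold."""
    XD = (X_test - x_bar) / s                 # Eq (17)
    md = np.sum((XD @ xis.T)**2 / lams, axis=1)
    return md, (md > threshold).astype(int)   # 1 = abnormal, 0 = normal (assumed)
```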

    The main contribution of this paper is to integrate the idea of POD into MTS, thereby perfecting MTS's own theoretical system, solving the problems of covariance matrix singularity and the eigenvalue estimation, and expanding the application scope of MTS. In addition, our research establishes a reliable and effective method for dimensionality reduction of high-dimensional-small-sample-size data, thereby providing new ideas for academic research on high-dimensional-small-sample-size data and feature extraction and dimensionality reduction in real life scenarios.

    In recent years, radiomics and texture analysis have received extensive attention from clinical and academic circles [35,36,37]. Therefore, in this section we select a dataset from the UCI database containing colonoscopy video information on gastrointestinal lesions [38]. The dataset includes 76 cases, 698 characteristic variables, and three types of lesions: hyperplastic lesions, adenomas, and serrated adenomas. After deleting missing values, the dimensionality of the dataset is 415.

    From the perspective of a binary classification problem, hyperplastic lesions belong to the "benign" category with a sample size of 21, while adenomas and serrated adenomas together belong to the "malignant" category with a sample size of 55. During binary classification of this dataset, the average classification accuracy of seven clinicians (including four experts and three beginners) is 79.4956%. The average classification accuracy of the expert group is 79.6052%, and the average classification accuracy of the beginner group is 79.3860%. The data features used in this study include narrow band imaging (NBI), color, 3D shape, and textural features. The software used in this study includes MATLAB 2018b, SPSS25, and Minitab17.

    Our experiments compare the results of MMTS with those of individual classifiers such as naive Bayes (NB), random forests (RF), k-nearest neighbor (k-NN), and support vector machine (SVM), which have been commonly used in the gastroenterology field for high-dimensional-small-sample-size data classification. Since Wrapper [39], MRMR [40], and Chi-square feature selection [41] are common methods for feature selection and dimensionality reduction, and the synthetic minority oversampling technique (SMOTE) [42,43] is widely used with small samples to increase the number of observations and thus improve classification accuracy, our experiments also compare the results of MMTS with those of superimposed classifiers such as Wrapper + RF, MRMR + SVM, Chi-square + BP, SMOTE + Wrapper + RF, and SMOTE + MRMR + SVM.

    To verify if MMTS can effectively classify high-dimensional-small-sample-size data, this study uses precision, specificity, recall, accuracy, and F-value as evaluation metrics [44].

    Table 1 is the confusion matrix for the binary classification problem, where TP (True Positives) is the number of positive cases that are correctly predicted, TN (True Negatives) is the number of negative cases that are correctly predicted, FP (False Positives) is the number of negative cases that are incorrectly predicted as positive, and FN (False Negatives) is the number of positive cases that are incorrectly predicted as negative.

    $$precision=\frac{TP}{TP+FP}\tag{19}$$
    $$recall=\frac{TP}{TP+FN}\tag{20}$$
    $$specificity=\frac{TN}{TN+FP}\tag{21}$$
    $$accuracy=\frac{TP+TN}{N_n+N_p}\tag{22}$$
    $$F=\frac{(1+\beta^2)\times Recall\times Precision}{\beta^2\times Recall+Precision}\tag{23}$$
    Table 1.  Confusion matrix for the binary classification problem.

    |                   |          | True class: Negative | True class: Positive |
    |-------------------|----------|----------------------|----------------------|
    | Hypothesis output | Negative | TN                   | FN                   |
    |                   | Positive | FP                   | TP                   |
    | Sum               |          | $N_n$                | $N_p$                |

    T: True; F: False; N: Negative; P: Positive


    The F-value is the weighted harmonic mean of "precision" and "recall". It is usually assumed that both metrics are equally important, that is, $\beta=1$, giving $F=\frac{2\times Recall\times Precision}{Recall+Precision}$. The larger the F-value (the closer it is to 1), the better the classification effect.
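    For concreteness, Eqs (19)-(23) in code, with an illustrative call whose counts are back-calculated from the MMTS-2 row of Table 3:

```python
def binary_metrics(tp, tn, fp, fn, beta=1.0):
    """Evaluation metrics of Eqs (19)-(23) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # N_n + N_p = all test samples
    f = (1 + beta**2) * recall * precision / (beta**2 * recall + precision)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "accuracy": accuracy, "F": f}

# Illustrative: counts implied by the MMTS-2 row of Table 3 (26 test samples)
# binary_metrics(tp=5, tn=19, fp=1, fn=1)
# -> precision 83.33%, specificity 95.00%, recall 83.33%, accuracy 92.31%, F 0.83
```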

    Iterative calculation is performed on the matrix $ZZ^T$ of a training set randomly selected from the normal samples, yielding the orthogonal decomposition results shown in Table 2.

    Table 2.  Orthogonal decomposition vectors.

    | Component | Initial eigenvalue: Total | Variance (%) | Cumulative (%) | Extraction sums of squared loadings: Total | Variance (%) | Cumulative (%) |
    |-----------|---------------------------|--------------|----------------|--------------------------------------------|--------------|----------------|
    | 1  | 14.40442 | 96.02944 | 96.029  | 14.40442 | 96.029 | 96.029  |
    | 2  | 0.40432  | 2.69545  | 98.725  | 0.40432  | 2.695  | 98.725  |
    | 3  | 0.12969  | 0.86459  | 99.589  | 0.12969  | 0.865  | 99.589  |
    | 4  | 0.03494  | 0.23296  | 99.822  | 0.03494  | 0.233  | 99.822  |
    | 5  | 0.01601  | 0.10670  | 99.929  | 0.01601  | 0.107  | 99.929  |
    | 6  | 0.00350  | 0.02331  | 99.952  | 0.00350  | 0.023  | 99.952  |
    | 7  | 0.00315  | 0.02098  | 99.973  | 0.00315  | 0.021  | 99.973  |
    | 8  | 0.00181  | 0.01206  | 99.985  | 0.00181  | 0.012  | 99.985  |
    | 9  | 0.00110  | 0.00736  | 99.993  | 0.00110  | 0.007  | 99.993  |
    | 10 | 0.00059  | 0.00396  | 99.997  | 0.00059  | 0.004  | 99.997  |
    | 11 | 0.00020  | 0.00136  | 99.998  | 0.00020  | 0.001  | 99.998  |
    | 12 | 0.00013  | 0.00089  | 99.999  | 0.00013  | 0.001  | 99.999  |
    | 13 | 0.00008  | 0.00054  | 100.000 | 0.00008  | 0.001  | 100.000 |
    | 14 | 0.00004  | 0.00026  | 100.000 |          |        |         |
    | 15 | 0.00002  | 0.00014  | 100.000 |          |        |         |


    For high-dimensional-small-sample-size data, the optimal number of orthogonal decomposition vectors to extract remains an open question. Therefore, to determine the optimal number, this study extracts orthogonal decomposition vectors following four schemes: eigenvalues greater than 1, eigenvalues greater than 0.001, eigenvalues greater than 0.0005, and cumulative eigenvalue variance reaching 100% (hereinafter abbreviated as MMTS-1, MMTS-2, MMTS-3, and MMTS-4, respectively). Figure 1 shows the Mahalanobis distances of the normal samples (MDX) and abnormal samples (MDY) calculated under the different schemes.

    Figure 1.  MDXs and MDYs under the four schemes of (a) MMTS-1, (b) MMTS-2, (c) MMTS-3, and (d) MMTS-4.
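    Assuming the eigenvalues and vectors come from the Step 2 sketch above, scheme selection reduces to a threshold on the eigenvalues; variable names are illustrative:

```python
# MMTS-2 rule: keep orthogonal decomposition vectors with eigenvalues > 0.001
# (1 for MMTS-1, 0.0005 for MMTS-3; MMTS-4 keeps all components up to 100%
# cumulative variance).
keep = lams > 1e-3
lams_q, xis_q = lams[keep], rs[keep]   # rs from pod_small_sample play the role of xi_j
```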

    Since MMTS-1 has only one orthogonal decomposition vector and cannot be subjected to dimensionality reduction, we calculate the SNR gains of the other three schemes. Orthogonal tables $L_{12}(2^9)$, $L_{12}(2^{10})$, and $L_{16}(2^{13})$ are chosen for the corresponding schemes. The calculated SNR gains are shown in Figure 2.

    Figure 2.  SNR gains of MMTS-2, MMTS-3, and MMTS-4.

    According to the results, except for the gain of the third eigenvector of MMTS-3, which is less than zero, the SNR gains calculated by the other methods are all greater than zero. Therefore, the third eigenvector of MMTS-3 is removed, while all the remaining eigenvectors are retained. For MMTS-3, after removing the third eigenvector, it is necessary to reconstruct the Mahalanobis space, verify its validity, and recalculate the Mahalanobis distance. Figure 3 shows the test-set Mahalanobis distances calculated by the four schemes.

    Figure 3.  Mahalanobis distances of the test set.

    Then we classify the samples in the test set according to the Mahalanobis distances, and calculate the precision, specificity, recall, accuracy, and F-value of the four schemes. The results are summarized in Table 3.

    Table 3.  Summary of classification performance of the four schemes.

    | Number | Method | Number of variables | Precision | Specificity | Recall | Accuracy | F |
    |--------|--------|---------------------|-----------|-------------|--------|----------|---|
    | (a) | MMTS-1 | 1  | 28.57% | 54.55% | 47.62%  | 52.63% | 0.36 |
    | (b) | MMTS-2 | 9  | 83.33% | 95.00% | 83.33%  | 92.31% | 0.83 |
    | (c) | MMTS-3 | 9  | 58.06% | 76.36% | 85.71%  | 78.95% | 0.69 |
    | (d) | MMTS-4 | 13 | 44.68% | 52.73% | 100.00% | 65.79% | 0.62 |


    Among the four schemes, the optimal orthogonal decomposition vector extraction scheme is MMTS-2, that is, selecting orthogonal decomposition vectors with eigenvalues greater than 0.001. The evaluation metric values of MMTS-2 are precision = 83.33%, specificity = 95.00%, recall = 83.33%, accuracy = 92.31%, and F = 0.83. In MMTS-1, since only one orthogonal decomposition vector is selected, the component contains relatively little feature information of the original dataset. Therefore, this scheme cannot effectively classify the samples in the test set, and the values of its evaluation metrics are relatively low. In MMTS-3 and MMTS-4, orthogonal decomposition vectors with eigenvalues > 0.0005 and with cumulative eigenvalue variance reaching 100% are extracted, respectively, retaining most of the feature information of the original dataset. However, "overfitting" is likely to occur when using these two schemes on high-dimensional-small-sample-size data, which lowers the evaluation metric values and degrades the final classification results. MMTS-2 selects orthogonal decomposition vectors with eigenvalues greater than 0.001 and extracts enough feature information from the original dataset without "overfitting". Therefore, this scheme is highly effective for classification.

    To verify the effectiveness of the proposed method, we compare it with expert classification, individual classifiers such as NB, RF, k-NN, SVM and superimposed classifiers such as Wrapper + RF, MRMR + SVM, Chi-square + BP, SMOTE + Wrapper + RF and SMOTE + MRMR + SVM.

    When using the Wrapper + RF and MRMR + SVM methods, we select 30 features for classification. When using the Chi-square + BP method, we select 332 features based on Chi-square values. Figure 4 shows the distribution of the Chi-square values.

    Figure 4.  Chi-square value distribution of the Chi-square + BP method.

    When using the SMOTE + Wrapper + RF and SMOTE + MRMR + SVM methods, in order to explore the best classification performance, we tune the K value in SMOTE (the number of nearest neighbors used when synthesizing samples), with K = 2, 3, 4, 5, 6. Finally, we set K = 2 to add 20 normal samples. Figure 5 shows the classification performance under different K values.

    Figure 5.  Classification performance under different K values.
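    The SMOTE step can be sketched with the imbalanced-learn package, where `k_neighbors` corresponds to the K tuned here (K = 2 performed best); `X_train` and `y_train` are illustrative names:

```python
# SMOTE sketch, assuming the imbalanced-learn package is available.
from imblearn.over_sampling import SMOTE

smote = SMOTE(k_neighbors=2, random_state=0)   # K = 2 nearest neighbors
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
```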

    The classification performance results of NB, RF, k-NN, SVM, Wrapper + RF, MRMR + SVM, Chi-square + BP, SMOTE + Wrapper + RF and SMOTE + MRMR + SVM are summarized in Table 4.

    Table 4.  Summary of classification performance of MMTS-2 and other methods.

    | Method | Number of variables | Precision | Specificity | Recall | Accuracy | F |
    |--------|---------------------|-----------|-------------|--------|----------|---|
    | MMTS-2 | 9 | 83.33% | 95.00% | 83.33% | 92.31% | 0.83 |
    | NB | 415 | 33.33% | 70.00% | 50.00% | 65.38% | 0.40 |
    | RF | 415 | 50.00% | 75.00% | 83.33% | 76.92% | 0.63 |
    | k-NN | 415 | 33.33% | 70.00% | 50.00% | 65.38% | 0.40 |
    | SVM | 415 | 54.55% | 75.00% | 100.00% | 80.77% | 0.71 |
    | Wrapper + RF | 30 | 83.33% | 95.00% | 83.33% | 92.31% | 0.83 |
    | MRMR + SVM | 30 | 40.00% | 70.00% | 66.67% | 69.23% | 0.50 |
    | Chi-square + BP | 332 | 60.00% | 90.00% | 50.00% | 80.77% | 0.55 |
    | SMOTE + Wrapper + RF | 30 | 50.00% | 80.00% | 66.67% | 76.92% | 0.57 |
    | SMOTE + MRMR + SVM | 30 | 57.14% | 85.00% | 66.67% | 80.77% | 0.62 |


    Compared with the methods mentioned above, MMTS-2 shows better dimensionality reduction and classification performance. The F-value of the MMTS-2 method (F = 0.83) is higher than the average of the other methods, and the number of selected features (N = 9) is much lower. Overall, the classification performance of MMTS-2 is identical to that of the Wrapper + RF method, and both are superior to the other methods.

    In general, MMTS not only greatly reduces the number of characteristic variables and effectively achieves the goal of dimensionality reduction, but also obtains better results on all the evaluation metrics except the recall of SVM. Our experiments show that MMTS can achieve dimensionality reduction without sacrificing classification accuracy; it still retains a large amount of the most important original feature information.

    When used for dimensionality reduction and classification of high-dimensional-small-sample-size data, MMTS reduces the computational cost while retaining valid variables, effectively achieving the goal of dimensionality reduction. In addition, since the method enlarges small differences between the two types of samples, the differences in Mahalanobis space and Mahalanobis distance between the two types of samples are amplified. Therefore, the method performs well when classifying the test set. Taken collectively, our results indicate that the MMTS proposed in this study is suitable for dimensionality reduction and classification of high-dimensional-small-sample-size data.

    PCA and SPCA are not suitable for processing high-dimensional-small-sample-size data because they do not consider the impact of data correlation on dimensionality. As one of the promising dimensionality reduction and classification methods of the 21st century, the traditional MTS must contend with the covariance matrix singularity problem when dealing with high-dimensional-small-sample-size data. To solve the problems of the traditional dimensionality reduction methods and of MTS in dimensionality reduction and classification of high-dimensional-small-sample-size data, this study proposes an MMTS based on the idea of POD.

    Our experiments show that MMTS achieves the optimal results when extracting decomposition vectors with eigenvalues greater than 0.001. Extracting too much information leads to "overfitting", while extracting too little information makes it impossible to classify the samples accurately.

    In addition, the dimensionality reduction efficiency and classification performance of MMTS are better than those of expert classification, individual classifiers such as NB, RF, k-NN, and SVM, and superimposed classifiers such as Wrapper + RF, MRMR + SVM, Chi-square + BP, SMOTE + Wrapper + RF, and SMOTE + MRMR + SVM.

    The classification accuracy of the Wrapper + RF, MRMR + SVM, and Chi-square + BP methods is much higher than that of the individual classifiers (NB, RF, k-NN, SVM). Our results demonstrate the importance of dimensionality reduction when dealing with high-dimensional-small-sample-size data classification. Dimensionality reduction and feature selection not only reduce computation but also improve classification accuracy.

    When using SMOTE + Wrapper + RF and SMOTE + MRMR + SVM, in order to obtain better classification performance, we tuned K over the range 2 to 6. The classification performance is best when K = 2, which indicates that when the sample size is small, K = 2 performs better in terms of nearest-neighbor selection. Furthermore, we notice that the classification performance of the SMOTE + Wrapper + RF method is lower than that of the Wrapper + RF method, whereas the classification performance of the SMOTE + MRMR + SVM method is higher than that of the MRMR + SVM method. Thus, using SMOTE to expand the sample size is not beneficial for all classifiers.

    In general, compared with NB, RF, k-NN, SVM, Wrapper + RF, MRMR + SVM, Chi-square + BP, SMOTE + Wrapper + RF, and SMOTE + MRMR + SVM, MMTS is more suitable for classification of high-dimensional-small-sample-size data.

    The main contributions of MMTS are as follows: (a) It solves the problems of covariance matrix singularity and eigenvalue estimation difficulty in dimensionality reduction and classification of high-dimensional-small-sample-size data, thereby improving the architecture and algorithm of the traditional MTS. (b) It takes into account the correlation of the data, reduces the computational cost, has high robustness, and overcomes the problems of PCA and SPCA in dimensionality reduction. (c) It provides a new idea for processing high-dimensional-small-sample-size data, and achieves quick and effective dimensionality reduction and classification without sacrificing valid sample feature information. In summary, MMTS is of clear reference value; it provides new ideas for academic research on high-dimensional-small-sample-size data and for dimensionality reduction and classification in real-life scenarios.

    This work was supported by the China National Social Science Fund Project "Research on the Implementation Path of Leading Quality Improvement with Standards in the Context of Building a Powerful Manufacturing Country" [grant number 18BJY033]. We would like to thank TopEdit (www.topeditsci.com) for English language editing of this manuscript.

    The authors declare there is no conflict of interest.



    [1] G. Taguchi, S. Chowdhury, Y. Wu, The Mahalanobis-Taguchi System, 2001.
    [2] Z. P. Chang, Y. W. Li, N. Fatima, A theoretical survey on Mahalanobis-Taguchi System, Measurement, 136 (2019), 501–510. doi: 10.1016/j.measurement.2018.12.090
    [3] N. Deepa, K. Ganesan, Mahalanobis Taguchi System based criteria selection tool for agriculture crops, 41 (2016), 1407–1414.
    [4] B. Buenviaje, J. E. Bischoff, R. A. Roncace, C. J. Willy, Mahalanobis-Taguchi System to identify preindicators of delirium in the ICU, IEEE J. Biomed. Heal. Informatics, 20 (2016), 1205–1212. doi: 10.1109/JBHI.2015.2434949
    [5] J. Wang, C. Duan, Structural health monitoring using Mahalanobis-Taguchi System, Proc. 2009 Int. Conf. Inf. Eng. Comput. Sci. ICIECS 2009, (2009).
    [6] A. K. Dwivedi, I. Mallawaarachchi, L. A. Alvarado, Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method, Stat. Med., 36 (2017), 2187–2205.
    [7] H. F. Gong, Z. S. Chen, Q. X. Zhu, Y. L. He, A monte carlo and PSO based virtual sample generation method for enhancing the energy prediction and energy optimization on small data problem: An empirical study of petrochemical industries, Appl. Energy, 197 (2017), 405–415. doi: 10.1016/j.apenergy.2017.04.007
    [8] E. Etchells, M. Ho, K. G. Shojania, Value of small sample sizes in rapid-cycle quality improvement projects, BMJ Qual. Saf., 25 (2016), 202–206. doi: 10.1136/bmjqs-2015-005094
    [9] W. Jia, D. Zhao, L. Ding, An optimized RBF neural network algorithm based on partial least squares and genetic algorithm for classification of small sample, Appl. Soft Comput. J., 48 (2016), 373–384. doi: 10.1016/j.asoc.2016.07.037
    [10] Abdul Lateh, A. K. Muda, Z. I. M. Yusof, N. A. Muda, M. S. Azmi, Handling a small dataset problem in prediction model by employ artificial data generation approach: A review, J. Phys. Conf. Ser., 892 (2017).
    [11] F. Song, Z. Guo, D. Mei, Feature selection using principal component analysis, Int. Conf. Syst. Sci., 2 (2010).
    [12] C. Lameiro, P. J. Schreier, A sparse CCA algorithm with application to model-order selection for small sample support, ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. Proc., (2017), 4721–4725.
    [13] Z. Ma, Sparse principal component analysis and iterative thresholding, Ann. Stat., 41 (2013), 772–801. doi: 10.1214/13-AOS1097
    [14] C. M. Feng, Y. L. Gao, J. X. Liu, C. H. Zheng, S. J. Li, D. Wang, A simple review of sparse principal components analysis, Intell. Comput. Theor. Appl., 9772 (2016), 374–383.
    [15] L. R. Eun, C. Jinwoo, Y. Kyusang, A systematic review on model selection in high-dimensional regression, J. Korean Stat. Soc., 48 (2019), 12.
    [16] S. Ramírez‐Gallego, I. Lastra, D. Martínez‐Rego, V. Bolón‐Canedo, J. M. Benítez, F. Herrera, et al., Fast‐mRMR: Fast minimum redundancy maximum relevance algorithm for high‐dimensional big data, Int. J. Intell. Syst., 32 (2017), 134–152. doi: 10.1002/int.21833
    [17] González, J. Ortega, M. Damas, P. Martín-Smith, J.Q. Gan, A new multi-objective wrapper method for feature selection-accuracy and stability analysis for BCI, Neurocomputing, 333 (2019), 407–418. doi: 10.1016/j.neucom.2019.01.017
    [18] X. Xiao, D. Fu, Y. Shi, J. Wen, Optimized Mahalanobis-Taguchi System for high-dimensional small sample data classification, Comput. Intell. Neurosci., 2020 (2020).
    [19] B. Bayar, N. Bouaynaya, R. Shterenberg, SMURC: High-dimension small-sample multivariate regression with covariance estimation, IEEE J. Biomed. Heal. Informatics, 21 (2017), 573–581. doi: 10.1109/JBHI.2016.2515993
    [20] K. Pearson, On lines and planes of closest fit to systems of points in space, Phil. Mag, 2 (1901).
    [21] H. Hotelling, Analysis of a complex of statistical variables into principal components, Educ. Psychol, 24 (1933), 417–441. doi: 10.1037/h0071325
    [22] G. Kerschen, J. C. Golinval, A. F. Vakakis, L. A. Bergman, The method of proper orthogonal decomposition for dynamical characterization and order reduction of mechanical systems: an overview, Nonlinear Dyn., 41 (2005), 147–169. doi: 10.1007/s11071-005-2803-2
    [23] X. Chen, Research on Several Issues of mahalanobis taguchi System, 2008.
    [24] L. Cheng, V. Yaghoubi, W. Van Paepegem, M. Kersemans, Mahalanobis classification system (MCS) integrated with binary particle swarm optimization for robust quality classification of complex metallic turbine blades, Mech. Syst. Signal Process., 146 (2021), 107060. doi: 10.1016/j.ymssp.2020.107060
    [25] Z. Chang, L. Cheng, L. Cui, Interval choquet fuzzy integral multi-attribute decision-making method based on mahalanobis taguchi system, Control Decis., 31 (2016), 180–186.
    [26] Y. Kikuchi, T. Ishihara, Anomaly detection and prediction of high-tension bolts by using strain of tower shell, Wind Energy, (2020), 1–16.
    [27] Z. Sheng, L. Cheng, Y. Gu, Research on the generation mechanism of Mahalanobis space in Mahalanobis Taguchi system based on control chart, Math. Stat. Manag., 26 (2017), 1059–1068.
    [28] Z. Chang, W. Chen, Y. Gu, H. Xu, Mahalanobis-Taguchi System for symbolic interval data based on kernel mahalanobis distance, IEEE Access, 8 (2020), 20428–20438. doi: 10.1109/ACCESS.2020.2967411
    [29] W. Z. A. W. Muhamad, K. R. Jamaludin, S. A. Saad, Z. R. Yahya, S. A. Zakaria, Random binary search algorithm based feature selection in Mahalanobis Taguchi system for breast cancer diagnosis, AIP Conf. Proc., 2018.
    [30] E. Zhu, X. Wang, A principal component orthogonal decomposition algorithm suitable for processing fingerprint data of traditional Chinese medicine, J. Xiamen Univ. Natural Sci. Ed., 6 (2005), 150–151.
    [31] K. Lu, Y. Jin, Y. Chen, Y. Yang, L. Hou, Z. Zhang, et al., Review for order reduction based on proper orthogonal decomposition and outlooks of applications in mechanical systems, Mech. Syst. Signal Process., 123 (2019), 264–297. doi: 10.1016/j.ymssp.2019.01.018
    [32] D. Wang, L. He, J. Zhu, Comparison of stock return volatility patterns based on functional adaptive clustering, Stat. Res., 35 (2018), 79–91.
    [33] V. Penenko, E. Tsvetova, Orthogonal decomposition methods for inclusion of climatic data into environmental studies, Ecol. Modell., 217 (2008), 279–291. doi: 10.1016/j.ecolmodel.2008.06.004
    [34] M. Ohkubo, Y. Nagata, Anomaly detection in high-dimensional data with the Mahalanobis‑Taguchi system, Total Qual. Manag. Bus. Excell., 29 (2018), 1213–1227. doi: 10.1080/14783363.2018.1487615
    [35] R. Valeria, C. Renato, R. Carlo, U. Lorenzo, C. Sirio, V. Francesco, et al., Prediction of tumor grade and nodal status in oropharyngeal and oral cavity squamous-cell carcinoma using a radiomic approach, Anticancer Res., 40 (2020), 271–280. doi: 10.21873/anticanres.13949
    [36] A. Stanzione, C. Ricciardi, R. Cuocolo, V. Romeo, J. Petrone, M. Sarnataro, et al., MRI radiomics for the prediction of fuhrman grade in clear cell renal cell carcinoma: A machine learning exploratory study, J. Digit. Imaging, (2020), 1–9.
    [37] V. Romeo, C. Ricciardi, R. Cuocolo, A. Stanzione, F. Verde, L. Sarno, et al., Machine learning analysis of MRI-derived texture features to predict placenta accreta spectrum in patients with placenta previa Magnetic resonance imaging, 64 (2019), 71–76.
    [38] P. Mesejo, D. Pizarro, A. Abergel, Computer-aided classification of gastrointestinal lesions in regular colonoscopy, IEEE Trans. Med. Imaging, 35 (2016), 2051–2063. doi: 10.1109/TMI.2016.2547947
    [39] R. L. Babu, S. Vijayan, Wrapper based feature selection in semantic medical information retrieval, J. Med. Imaging Heal. Informatics, 6 (2016), 802–805. doi: 10.1166/jmihi.2016.1758
    [40] G. L. Irene, R. V. Esther, Characterization of artifact signals in neck photoplethysmography, IEEE Trans. Biomed. Eng., 67 (2020), 1–1. doi: 10.1109/TBME.2019.2950357
    [41] J. Gardezi, I. Faye, F. Adjed, N. Kamel, M. Hussain, Mammogram classification using chi-square distribution on local binary pattern features, J. Med. Imaging Heal. Informatics, 7 (2017), 30–35.
    [42] H. A. Khan, W. Jue, M. Mushtaq, M. U. Mushtaq, Brain tumor classification in MRI image using convolutional neural network, Math. Biosci. Eng., 17 (2020), 6203. doi: 10.3934/mbe.2020328
    [43] G. D'Addio, C. Ricciardi, G. Improta, P. Bifulco, M. Cesarelli, Feasibility of Machine Learning in Predicting Features Related to Congenital Nystagmus, In: Henriques J., Neves N., de Carvalho P. (eds XV Mediterranean Conference on Medical and Biological Engineering and Computing MEDICON 2019), IFMBE Proceedings, (2020).
    [44] M. El-Banna, Modified Mahalanobis Taguchi System for imbalance data classification, Comput. Intell. Neurosci., 2017 (2017).
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)