Research article

Feature screening for ultrahigh-dimensional binary classification via linear projection

  • Linear discriminant analysis (LDA) is one of the most widely used methods in discriminant classification and pattern recognition. However, with the rapid development of information science and technology, the dimensionality of collected data is often high or ultrahigh, which causes classical LDA to fail. To address this issue, a feature screening procedure based on Fisher's linear projection and the marginal score test is proposed for the ultrahigh-dimensional binary classification problem. The sure screening property is established to ensure that the important features are retained and the irrelevant predictors are eliminated. The finite sample properties of the proposed procedure are assessed by Monte Carlo simulation studies and a real-life data example.

    Citation: Peng Lai, Mingyue Wang, Fengli Song, Yanqiu Zhou. Feature screening for ultrahigh-dimensional binary classification via linear projection[J]. AIMS Mathematics, 2023, 8(6): 14270-14287. doi: 10.3934/math.2023730




    More and more attention has been paid to the analysis of ultrahigh-dimensional data with the rapid development of information science and technology. Such data appear frequently in biomedical imaging, gene expression and proteomics studies, among other areas, and efficient methods for handling them are in demand. The dimensionality $p$ of the collected data is allowed to diverge at a nonpolynomial rate with the sample size $n$, that is, $\log p = O(n^{\xi})$ for some $\xi>0$. Hence, dimension reduction is imperative for the efficient manipulation and analysis of ultrahigh-dimensional data.

    Feature reduction techniques such as principal component analysis and linear discriminant analysis (LDA) have been proposed and applied successfully in practice to reduce the dimensionality of the original features without losing too much information. LDA is one of the most popular approaches in discriminant classification and pattern recognition; it aims to find a proper linear transformation so that each high-dimensional sample vector is projected into a low-dimensional vector while preserving the original cluster structure as much as possible. However, when the dimension is ultrahigh, classical classification methods such as LDA [1] are no longer applicable because of the presence of too many redundant variables. For example, the ovarian cancer data studied by Sorace and Zhan [2] consist of serum samples from 162 ovarian cancer patients and 91 control subjects. For each sample, 15,154 distinct mass-to-charge ratios (M/Z) are available for analysis. It is of interest to identify proteomic patterns (corresponding to the M/Z values) that can distinguish ovarian cancer subjects from control subjects. The small number of samples and the many redundant variables make discriminant classification unable to work effectively.

    To deal with the difficulty of ultrahigh-dimensional discriminant classification, many marginal feature screening procedures have been proposed by statisticians to reduce the dimension rapidly, after which classical discriminant analysis methods can be applied. Mai and Zou [3] proposed a feature screening procedure named the Kolmogorov filter (KF) for binary classification based on the Kolmogorov-Smirnov statistic, which enjoys the sure screening property under much weaker model assumptions. Mai and Zou [4] proposed the fused Kolmogorov filter, which generalized the KF procedure to the multi-classification case. Lai et al. [5] proposed a feature screening procedure based on the expected conditional Kolmogorov filter for the ultrahigh-dimensional binary classification problem with a dependent variable. Cui et al. [6] proposed a model-free feature screening index named MV for ultrahigh-dimensional discriminant analysis based on the difference between conditional and unconditional distribution functions. Pan et al. [7] developed a pairwise sure independence screening procedure (PSIS) for LDA, but this procedure depends on parametric modeling assumptions and may perform poorly for heavy-tailed data. Cheng et al. [8] proposed a robust ranking screening procedure based on the conditional expectation of the rank of predictor samples for ultrahigh-dimensional discriminant analysis, which is robust against heavy-tailed distributions, potential outliers and sample shortage in some categories. He et al. [9] generalized the MV procedure by modifying MV with a weight function; the resulting Anderson-Darling sure independence screening procedure (AD-SIS) is more robust against heavy-tailed distributions. Song et al. [10] proposed a robust composite weighted quantile screening procedure based on the difference between the conditional and unconditional quantiles of the feature. Different from the existing methods, which use the differences of means or the differences of conditional cumulative distribution functions between classes as the screening indexes, Sheng and Wang [11] proposed a new feature screening method that ranks the importance of predictors based on the classification accuracy of marginal classifiers.

    Although many feature screening procedures for ultrahigh-dimensional discriminant analysis problems have been proposed, some of them even model-free, the study of the LDA problem, one of the most popular approaches in discriminant classification and pattern recognition, remains attractive. To solve the linear discrimination problem with ultrahigh-dimensional features, a dimension reduction method based on the linear projection may perform better than model-free methods. In this paper, we propose a feature screening procedure based on Fisher's linear discriminant framework. By minimizing the linear projection of the within-group sum of squares of the original cluster structure and maximizing the linear projection of the between-group sum of squares, a marginal score test is constructed and combined with the linear projection optimization problem. First, the proposed method can screen out the irrelevant predictors in the linear discriminant function through the use of estimating equations. Second, the proposed procedure possesses the sure screening property. Third, the simple structure of the screening index makes the calculation fast.

    The rest of this paper is organized as follows. In Section 2, we construct the feature screening estimating equations, propose the feature screening procedure and further study its theoretical properties. In Section 3, we present Monte Carlo simulation studies to examine the finite sample performance of the proposed procedure. We also use the proposed procedure in a real data example. All technical details are presented in the Appendix.

    Consider the two-class data samples $(X_i, Y_i)$, $i=1,\ldots,n$, where $Y_i$ is the binary class index variable taking values in $\{1,2\}$, and $X_i\in\mathbb{R}^p$. Suppose that $G=\{G_1,G_2\}\in\mathbb{R}^{p\times n}$, where each $G_j\in\mathbb{R}^{p\times n_j}$ for $j=1,2$ represents an independent class data set $\{X_{ij}\}=\{X_i \text{ when } Y_i=j\}$, $n_j$ denotes the number of samples in class $G_j$, and $n_1+n_2=n$. When the dimensionality $p$ is large, discriminant analysis based on these data would be rather complex and inefficient. LDA aims to find a linear projection $\beta\in\mathbb{R}^p$ that maps each sample vector $X_i$ to a new low-dimensional sample $\beta^\top X_i$, $i=1,\ldots,n$. It seeks to project the observations into a lower-dimensional space such that the intergroup variance of the projected samples is large and the intragroup variance is small. A classification rule is obtained by assigning a sample to its nearest centroid in the transformed space. To find the projection direction and delete the irrelevant predictors simultaneously, we construct the linear projection feature screening procedure as follows.

    Based on the linear projection, a random sample $X_i$ is projected to $\beta^\top X_i$. Define

    $$
    \begin{aligned}
    SSE &= \sum_{j=1}^{2}\sum_{i=1}^{n_j}\big(\beta^\top X_{ij}-\beta^\top\bar{X}_j\big)^2 = \beta^\top\sum_{j=1}^{2}\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)(X_{ij}-\bar{X}_j)^\top\beta := \beta^\top E\beta,\\
    SS(TR) &= \sum_{j=1}^{2}n_j\big(\beta^\top\bar{X}_j-\beta^\top\bar{X}\big)^2 = \beta^\top\sum_{j=1}^{2}n_j(\bar{X}_j-\bar{X})(\bar{X}_j-\bar{X})^\top\beta := \beta^\top B\beta,
    \end{aligned}\tag{2.1}
    $$

    where $\bar{X}_j=\frac{1}{n_j}\sum_{i=1}^{n_j}X_{ij}$ and $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$. Thus, the linear projection procedure aims to obtain $\beta$ by

    $$
    \max_{\beta}\frac{\beta^\top B\beta}{\beta^\top E\beta},\quad \text{s.t. } \beta\neq 0 \text{ and } \|\beta\|=1.\tag{2.2}
    $$

    When the dimensionality $p$ is ultrahigh, the traditional solutions for (2.2) fail. For example, the eigenvectors associated with the eigenvalues solved from $|B-\lambda E|=0$ are hard to obtain. For ultrahigh-dimensional problems, sparsity is often present, meaning that only a small number of predictors contribute significantly to the LDA process. We denote the active set and the inactive set as $\mathcal{A}=\{k:\beta_k\neq 0, 1\leq k\leq p\}$ and $\mathcal{A}^c=\{k:\beta_k=0, 1\leq k\leq p\}$, respectively. Note that (2.2) is a constrained optimization problem. To avoid the constraint, without loss of generality, we assume $\beta=(\beta_1,\beta_2,\ldots,\beta_p)^\top=(\beta_1,\beta_{(1)}^\top)^\top$, where $\beta_{(1)}=(\beta_2,\ldots,\beta_p)^\top$, $\beta_1>0$ and $\beta_1=\sqrt{1-\|\beta_{(1)}\|^2}$. Here $\beta_1>0$ means that $X_1$ is important for the linear discriminant classification. The active predictor $X_1$ can be identified by comparing the marginal correlations of $X_k$ and $Y$ for $k=1,\ldots,p$. We propose a feature screening procedure to identify potential active predictors as follows. Assume $E(X_k)=0$, $\mathrm{Var}(X_k)=1$, and redefine $X_k=X_k-E(X_k|X_1)$ for $k=2,\ldots,p$. This transformation of $X_k$ clears out the linear correlation between $X_k$ and the active predictor $X_1$.
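    For intuition, the following sketch (our own illustration, not the authors' code) solves the classical Fisher problem (2.2) on a small simulated data set via the generalized eigenproblem $|B-\lambda E|=0$ mentioned above; the names and simulation settings are hypothetical. When $p$ is comparable to or larger than $n$, the within-group scatter $E$ becomes singular and this route breaks down, which is why screening is needed first.

```python
# A minimal sketch, assuming small p, of the classical Fisher solution of (2.2)
# via the generalized eigenproblem |B - lambda*E| = 0 (not the paper's procedure).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n1, n2, p = 60, 40, 5                          # small p: the classical route works
X1 = rng.normal(0.0, 1.0, size=(n1, p))        # class 1 samples
X2 = rng.normal(0.5, 1.0, size=(n2, p))        # class 2 samples, shifted mean
X = np.vstack([X1, X2])
xbar = X.mean(axis=0)

E = np.zeros((p, p))                           # within-group scatter: beta'E beta = SSE
B = np.zeros((p, p))                           # between-group scatter: beta'B beta = SS(TR)
for Xj in (X1, X2):
    xbar_j = Xj.mean(axis=0)
    E += (Xj - xbar_j).T @ (Xj - xbar_j)
    B += Xj.shape[0] * np.outer(xbar_j - xbar, xbar_j - xbar)

# Solve B v = lambda E v; the eigenvector of the largest eigenvalue is the Fisher direction.
eigvals, eigvecs = eigh(B, E)
beta = eigvecs[:, -1] / np.linalg.norm(eigvecs[:, -1])   # enforce ||beta|| = 1 as in (2.2)
print("Fisher direction:", np.round(beta, 3))
# When p >> n, E is rank-deficient and eigh(B, E) fails, motivating the screening step.
```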

    By introducing $\beta_{(1)}$, we estimate $\beta(\beta_{(1)})$ by maximizing

    $$
    \max_{\beta_{(1)}}\hat{L}(\beta_{(1)})=\max_{\beta_{(1)}}\frac{\beta^\top B\beta}{\beta^\top E\beta},\tag{2.3}
    $$

    or solving

    $$
    \frac{\partial\hat{L}(\beta_{(1)})}{\partial\beta_{(1)}}=\frac{1}{(\beta^\top E\beta)^2}\Big[2J_{\beta_{(1)}}^\top B\beta\,\beta^\top E\beta-2\beta^\top B\beta\,J_{\beta_{(1)}}^\top E\beta\Big]=0,\tag{2.4}
    $$

    where $J_{\beta_{(1)}}=\frac{\partial\beta}{\partial\beta_{(1)}}=(b_1,\ldots,b_p)^\top$ is a $p\times(p-1)$ matrix, $b_1=-(1-\|\beta_{(1)}\|^2)^{-1/2}\beta_{(1)}$ and $b_s=(0,\ldots,0,1,0,\ldots,0)^\top$ with the $s$th element equal to 1, $s=2,\ldots,p$. Let $\hat{L}_k(\beta)$ be the $k$th component of $\frac{\partial\hat{L}(\beta_{(1)})}{\partial\beta_{(1)}}$. Therefore, $(\hat{L}_2(\beta),\ldots,\hat{L}_p(\beta))^\top=0$ are the estimating equations of $\beta_{(1)}$. Since the sparsity property is satisfied, motivated by the score test screening procedure proposed by Zhao and Li [12], for each $k$ ($k\neq 1$) we consider a marginal estimating equation of $\beta_k$ ($k=2,\ldots,p$) and assume that all the other covariates except $X_1$ are unrelated to the linear discriminant classification. Denote this marginal estimating equation by $\hat{L}_k(\beta)$, and let $\hat{\omega}_k(\beta_k)=\hat{L}_k(\beta_1,0,\ldots,0,\beta_k,0,\ldots,0)=0$. From this marginal estimating equation, if $|\hat{\omega}_k(0)|$ is larger than 0, then $\beta_k=0$ is not the solution of the estimating equation, and thus $X_k$ is a possible active predictor. Otherwise, the coefficient $\beta_k=0$ indicates that $X_k$ is not important in the linear discriminant analysis. Therefore, similar to Zhao and Li [12] and Ma et al. [13], each $|\hat{\omega}_k(0)|=|\hat{L}_k(1,0,\ldots,0)|$ is the numerator of the score statistic for the hypothesis $\beta_k=0$ ($k\geq 2$) under the $k$th marginal model and therefore can be a sensible screening statistic. Here $\beta_1=1$ follows from $\|\beta\|=1$. Let $\hat{\omega}_k=\hat{\omega}_k(0)$. It follows that

    $$
    \hat{\omega}_k=\hat{\omega}_k(0)=\frac{2}{E_{11}^2}\big[E_{11}B_{k1}-B_{11}E_{k1}\big],\quad k=2,\ldots,p,\tag{2.5}
    $$

    where $E_{11}=\sum_{j=1}^{2}\sum_{i=1}^{n}[X_{i1}-\bar{X}_{j1}]^2I(Y_i=j)$, $E_{k1}=\sum_{j=1}^{2}\sum_{i=1}^{n}[X_{ik}-\bar{X}_{jk}][X_{i1}-\bar{X}_{j1}]I(Y_i=j)$, $B_{11}=\sum_{j=1}^{2}\sum_{i=1}^{n}[\bar{X}_{j1}-\bar{X}_1]^2I(Y_i=j)$ and $B_{k1}=\sum_{j=1}^{2}\sum_{i=1}^{n}[\bar{X}_{jk}-\bar{X}_k][\bar{X}_{j1}-\bar{X}_1]I(Y_i=j)$. To simplify the calculation and theoretical derivation, define

    $$
    \hat{\omega}_k^*=\hat{\omega}_k^*(0)=\frac{1}{n^2}\big[E_{11}B_{k1}-B_{11}E_{k1}\big],\quad k=2,\ldots,p.\tag{2.6}
    $$

    Note that $\hat{\omega}_k^*$ is a scaled version of $\hat{\omega}_k$. They lead to the same result of feature ranking and screening.
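    A minimal computational sketch of the screening statistic in (2.6) is given below. This is our own illustration rather than the authors' code; the function name is hypothetical, and the conditional mean $E(X_k|X_1)$ used in the transformation described earlier is approximated by a simple least-squares fit, which is an additional assumption.

```python
# A minimal sketch (assumptions noted above) of the LDA-SIS statistic in (2.6).
# X is an (n, p) array, y holds class labels in {1, 2}, and column 0 plays the
# role of the anchor predictor X_1.
import numpy as np

def lda_sis_statistic(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    # Standardize each predictor and remove its linear association with X_1
    # (E(X_k | X_1) approximated here by an ordinary least-squares fit - an assumption).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    x1 = Xs[:, 0]
    for k in range(1, p):
        Xs[:, k] -= ((x1 @ Xs[:, k]) / (x1 @ x1)) * x1
    xbar = Xs.mean(axis=0)
    E11 = B11 = 0.0
    Ek1 = np.zeros(p)
    Bk1 = np.zeros(p)
    for j in np.unique(y):
        idx = (y == j)
        nj = idx.sum()
        xbar_j = Xs[idx].mean(axis=0)
        d1 = Xs[idx, 0] - xbar_j[0]
        E11 += np.sum(d1 ** 2)                               # E_{11}
        Ek1 += (Xs[idx] - xbar_j).T @ d1                     # E_{k1}, all k at once
        B11 += nj * (xbar_j[0] - xbar[0]) ** 2               # B_{11}
        Bk1 += nj * (xbar_j - xbar) * (xbar_j[0] - xbar[0])  # B_{k1}
    omega = (E11 * Bk1 - B11 * Ek1) / n ** 2                 # hat{omega}_k^* in (2.6)
    omega[0] = np.inf                                        # the anchor X_1 is always kept
    return np.abs(omega)
```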

    Define $\omega_k^*=\omega_k^*(0)=T_{11}T_{12}-T_{21}T_{22}$, where

    $$
    \begin{aligned}
    T_{11}&=\sum_{j=1}^{2}E\Big\{\Big[X_1-\tfrac{E(X_1I_j)}{E(I_j)}\Big]^2I_j\Big\}, &
    T_{21}&=\sum_{j=1}^{2}E\Big\{\Big[\tfrac{E(X_1I_j)}{E(I_j)}-E(X_1)\Big]^2I_j\Big\},\\
    T_{12}&=\sum_{j=1}^{2}E\Big\{\Big[\tfrac{E(X_kI_j)}{E(I_j)}-E(X_k)\Big]\Big[\tfrac{E(X_1I_j)}{E(I_j)}-E(X_1)\Big]I_j\Big\}, &
    T_{22}&=\sum_{j=1}^{2}E\Big\{\Big[X_k-\tfrac{E(X_kI_j)}{E(I_j)}\Big]\Big[X_1-\tfrac{E(X_1I_j)}{E(I_j)}\Big]I_j\Big\},
    \end{aligned}\tag{2.7}
    $$

    where $I_j=I(Y=j)$ and $I(\cdot)$ is the indicator function. From (2.6), if $X_k$ and $Y$ are independent, then it follows that

    $$
    \hat{\omega}_k^*\stackrel{P}{\longrightarrow}\omega_k^*=-\sum_{j=1}^{2}E\Big\{I_j\frac{E^2(X_1I_j)}{E^2(I_j)}\Big\}\sum_{j=1}^{2}E\{X_1X_kI_j\}=0,\quad n\rightarrow\infty.
    $$

    Therefore, $\hat{\omega}_k^*$ can be used as the feature screening index.

    For a given threshold value $c_n$, the active set is estimated as

    $$
    \hat{\mathcal{A}}_{c_n}=\{2\leq k\leq p:|\hat{\omega}_k^*|\geq c_n\}.\tag{2.8}
    $$

    In practice, the predefined threshold $c_n$ is not easy to determine. An alternative is to select the top $d_n$ predictors and estimate the active set as

    $$
    \hat{\mathcal{A}}_{d_n}=\{2\leq k\leq p:|\hat{\omega}_k^*| \text{ ranks among the top } d_n\}.\tag{2.9}
    $$

    The submodel size $d_n$ is a predefined threshold value, e.g., $d_n=v[n/\log(n)]$, where $v$ is some positive integer; see Fan and Lv [14]. In practice, $v$ is chosen to be larger than 1 to enhance the probability of selecting all the relevant predictors.
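    Continuing the sketch above, the top-$d_n$ rule in (2.9) can be implemented as follows (again an illustration with hypothetical names; $v$ is the user-chosen multiplier mentioned in the text).

```python
# A minimal sketch of the top-d_n screening rule in (2.9), with d_n = v*[n/log(n)].
import numpy as np

def screen_top_dn(omega_abs, n, v=2):
    d_n = v * int(n / np.log(n))
    order = np.argsort(omega_abs[1:])[::-1] + 1     # rank columns 2..p by |omega_k^*|
    selected = np.sort(order[:d_n])
    return np.concatenate(([0], selected))          # always keep the anchor X_1 (column 0)
```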

    Next, we establish the theoretical property of the proposed feature screening method. To study the sure screening property, the following regularity conditions are assumed.

    ● C1. $X$ satisfies the sub-exponential tail probability uniformly in $p$. That is, there exists a positive constant $s_0$ such that for all $0<s\leq 2s_0$,

    $$\sup_{p}\max_{1\leq k\leq p}E\{\exp(sX_k^2)\}<\infty.$$

    ● C2. There exist some constants $c>0$ and $0\leq\kappa<\frac{1}{2}$ such that $\min_{k\in\mathcal{A}}\omega_k^*\geq 2cn^{-\kappa}$.

    Theorem 1. (Sure Screening Property) Under Condition C1, for any $0<\gamma<\frac{1}{2}-\kappa$, there exist positive constants $c_1>0$ and $c_2>0$ such that

    $$
    P\Big(\max_{1\leq k\leq p}|\hat{\omega}_k^*-\omega_k^*|\geq cn^{-\kappa}\Big)\leq O\Big\{2p\exp\big(-c_1n^{1-2\gamma-2\kappa}\big)+2np\exp\big(-c_2n^{\gamma}\big)\Big\}.\tag{2.10}
    $$

    Further, if both conditions C1 and C2 hold, by taking $c_n=cn^{-\kappa}$ in (2.8), we have

    $$
    P\big(\mathcal{A}\subseteq\hat{\mathcal{A}}_{c_n}\big)\geq 1-O\Big\{2s_n\exp\big(-c_1n^{1-2\gamma-2\kappa}\big)+2ns_n\exp\big(-c_2n^{\gamma}\big)\Big\},\tag{2.11}
    $$

    where $s_n$ is the cardinality of $\mathcal{A}$, which is sparse and may vary with $n$.

    Theorem 2. (Minimum Model Size) Under the conditions of Theorem 1, for any $c_n=c_3n^{-\kappa}$, $c_3>0$, there exist positive constants $c_4$ and $c_5$ such that

    $$
    P\Big(\|\hat{\mathcal{A}}_{c_n}\|_0\leq O\big(n^{\kappa}\sum_{k=1}^{p}|\omega_k^*|\big)\Big)\geq 1-O\Big\{2p\exp\big(-c_4n^{1-2\gamma-2\kappa}\big)+2np\exp\big(-c_5n^{\gamma}\big)\Big\}.\tag{2.12}
    $$

    Here $\|\cdot\|_0$ denotes the cardinality of a set.

    Remark 1. Theorem 1 shows that the sure screening property holds for the proposed linear projection feature screening procedure. The dimensionality $p$ is allowed to increase at an exponential rate of the sample size $n$, i.e., $p=o(\exp(n^{\alpha}))$. From (2.10) of Theorem 1, the upper bound in (2.10) tends to 0 if $0<\alpha<1-2\gamma-2\kappa$. Furthermore, it shows that the feature screening procedure retains all the important classification predictors with probability tending to 1, which means $P(\mathcal{A}\subseteq\hat{\mathcal{A}}_{c_n})\rightarrow 1$. The screened features can then be utilized in the linear discriminant analysis; if the dimensionality is still high, some penalized methods can be applied. Theorem 2 shows that as long as $\sum_{k=1}^{p}|\omega_k^*|$ is of a polynomial order of the sample size, the number of selected variables is also of polynomial order of the sample size.

    In this section, we present two simulation studies of the popular discriminant analysis models, the logistic model and the probit model, and one real data analysis to assess the finite sample performances of the proposed method (LDA-SIS). Furthermore, we compare the effectiveness of our proposed method with other existing competitive screening methods, including the T-test (Fan and Fan [1]), DC (Li et al. [15]), KF (Mai and Zou [3]), MV (Cui et al. [6]), PSIS (Pan et al. [7]) and RRS (Cheng et al. [8]).

    For each simulation, we set the dimensionality $p$ to 1000 and 2000 and the sample size $n$ to 100 and 200. All the simulation results are based on 1000 replications. Similar to Fan and Lv [14] and Li et al. [15], the screening threshold is set to $d_n=[n/\log(n)]$. The following criteria are used to evaluate the performance of all screening methods.

    ● MMS: The minimum model size of a submodel that contains all active predictors. The five quantiles (5%, 25%, 50%, 75% and 95%) of MMS over the 1000 replications are reported.

    ● $P_k$: The proportion of replications in which the $k$th active predictor is selected into the submodel of size $d_n$.

    ● $P_a$: The proportion of replications in which all active predictors are selected into the submodel of size $d_n$. A short helper that computes these three criteria from the screening ranks is sketched after this list.
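    The helper below is our own illustration of one way to compute these criteria; `ranks` is assumed to be a (replications × number-of-active-predictors) array holding the rank of each active predictor in each replication.

```python
# A minimal sketch of the evaluation criteria. `ranks[r, k]` is the rank (1 = best)
# of the k-th active predictor among all candidates in replication r.
import numpy as np

def evaluate_screening(ranks, d_n):
    ranks = np.asarray(ranks)
    mms = ranks.max(axis=1)                              # smallest submodel containing all actives
    mms_quantiles = np.quantile(mms, [0.05, 0.25, 0.5, 0.75, 0.95])
    P_k = (ranks <= d_n).mean(axis=0)                    # selection rate of each active predictor
    P_a = (ranks <= d_n).all(axis=1).mean()              # rate of selecting all active predictors
    return mms_quantiles, P_k, P_a
```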

    Example 1 (Logistic Model): Consider the logistic regression model

    $$
    \mathrm{logit}(p_y)=X^\top\beta,\quad p_y=P(Y=1|X).
    $$

    The covariate $X=(X_1,X_{(1)}^\top)^\top$, with $X_{(1)}=(X_2,\ldots,X_p)^\top$, is generated from $X_1\sim N(0,1)$ and $X_{(1)}\sim N_{p-1}(0,\Sigma)$, where $\Sigma$ is a $(p-1)\times(p-1)$ covariance matrix with elements $\sigma_{ij}=\rho^{|i-j|}$, $i,j=1,\ldots,p-1$. We consider $\rho=0.2$, $0.5$ and $0.8$. Set $\beta=(1.4,1.2,1.0,0.8,0.6,\mathbf{0}_{p-5}^\top)^\top$, and the random error $\varepsilon$ added to $X^\top\beta$ follows $N(0,1)$. The simulation results of Example 1 are shown in Tables 1 and 2.
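    The data-generating mechanism of Example 1 can be sketched as follows (our reading of the description; the authors' exact implementation is not shown, and the function name and seed are hypothetical).

```python
# A minimal sketch of the Example 1 (logistic model) data generation.
import numpy as np

def generate_example1(n=100, p=1000, rho=0.5, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.concatenate([[1.4, 1.2, 1.0, 0.8, 0.6], np.zeros(p - 5)])
    X1 = rng.normal(size=(n, 1))                                   # X_1 ~ N(0, 1)
    idx = np.arange(p - 1)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))             # sigma_ij = rho^|i-j|
    X_rest = rng.multivariate_normal(np.zeros(p - 1), Sigma, size=n)
    X = np.hstack([X1, X_rest])
    eta = X @ beta + rng.normal(size=n)                            # random error added to X'beta
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))) + 1            # logit link, classes {1, 2}
    return X, Y
```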

    Table 1.  The selection rates $P_a$ and $P_k$'s in Example 1.
    n=100 n=200
    Method P1 P2 P3 P4 P5 Pa P1 P2 P3 P4 P5 Pa
    ρ=0.2 LDA-SIS 1 0.94 0.944 0.789 0.372 0.276 1 1 1 0.997 0.83 0.827
    p=1000 DC 1 0.887 0.887 0.678 0.293 0.17 1 0.996 1 0.98 0.737 0.721
    T-test 1 0.914 0.904 0.725 0.316 0.202 1 0.999 1 0.988 0.779 0.771
    RRS 1 0.896 0.899 0.688 0.318 0.187 1 0.998 1 0.985 0.751 0.74
    KF 1 0.805 0.803 0.555 0.231 0.087 1 0.99 0.994 0.934 0.631 0.577
    MV 1 0.881 0.879 0.662 0.286 0.159 1 0.996 0.999 0.976 0.722 0.702
    PSIS 1 0.914 0.904 0.725 0.316 0.202 1 0.999 1 0.988 0.779 0.771
    p=2000 LDA-SIS 1 0.898 0.907 0.676 0.283 0.158 1 0.998 0.999 0.989 0.75 0.74
    DC 1 0.809 0.819 0.556 0.213 0.083 1 0.997 0.999 0.96 0.643 0.621
    T-test 1 0.836 0.841 0.6 0.235 0.1 1 0.998 0.998 0.966 0.678 0.657
    RRS 1 0.812 0.82 0.572 0.223 0.092 1 0.998 0.999 0.961 0.657 0.633
    KF 1 0.679 0.699 0.465 0.164 0.035 1 0.988 0.985 0.877 0.532 0.463
    MV 1 0.787 0.796 0.546 0.203 0.075 1 0.995 0.998 0.946 0.631 0.599
    PSIS 1 0.836 0.841 0.6 0.235 0.1 1 0.998 0.998 0.966 0.678 0.657
    ρ=0.5 LDA-SIS 1 0.996 0.998 0.993 0.828 0.821 1 1 1 1 0.998 0.998
    p=1000 DC 1 0.99 0.997 0.979 0.759 0.739 1 1 1 1 0.994 0.994
    T-test 1 0.993 0.999 0.989 0.787 0.774 1 1 1 1 0.995 0.995
    RRS 1 0.991 0.995 0.982 0.764 0.743 1 1 1 1 0.994 0.994
    KF 1 0.963 0.992 0.944 0.644 0.593 1 1 1 1 0.977 0.977
    MV 1 0.986 0.997 0.972 0.737 0.715 1 1 1 1 0.992 0.992
    PSIS 1 0.993 0.999 0.989 0.787 0.774 1 1 1 1 0.995 0.995
    p=2000 LDA-SIS 1 0.996 0.998 0.985 0.771 0.761 1 1 1 1 0.993 0.993
    DC 1 0.982 0.99 0.971 0.681 0.65 1 1 1 1 0.972 0.972
    T-test 1 0.985 0.993 0.978 0.724 0.702 1 1 1 1 0.984 0.984
    RRS 1 0.978 0.991 0.974 0.684 0.652 1 1 1 1 0.976 0.976
    KF 1 0.927 0.97 0.913 0.55 0.464 1 1 1 0.999 0.941 0.941
    MV 1 0.969 0.988 0.963 0.653 0.612 1 1 1 1 0.964 0.964
    PSIS 1 0.985 0.993 0.978 0.724 0.702 1 1 1 1 0.984 0.984
    ρ=0.8 LDA-SIS 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    p=1000 DC 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    T-test 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    RRS 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    KF 1 1 1 0.999 0.994 0.994 1 1 1 1 1 1
    MV 1 1 1 1 0.998 0.998 1 1 1 1 1 1
    PSIS 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    p=2000 LDA-SIS 1 1 1 1 1 1 1 1 1 1 1 1
    DC 1 1 1 1 1 1 1 1 1 1 1 1
    T-test 1 1 1 1 1 1 1 1 1 1 1 1
    RRS 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    KF 1 1 0.999 0.999 0.981 0.979 1 1 1 1 1 1
    MV 1 1 1 1 0.999 0.999 1 1 1 1 1 1
    PSIS 1 1 1 1 1 1 1 1 1 1 1 1

    Table 2.  The different quantiles of MMS in Example 1.
    n=100 n=200
    Method 5% 25% 50% 75% 95% 5% 25% 50% 75% 95%
    ρ=0.2 LDA-SIS 7 19 56 173.25 575 5 5 8 22 153.05
    p=1000 DC 9 34 106 279.25 736.55 5 6 12 43 275.1
    T-test 8 27 80 233.25 651.2 5 6 11 35 212.1
    RRS 9 31 95 251 726.25 5 6 12 41 236
    KF 15 56 167.5 343 770.3 5 9 25 90.25 361.4
    MV 10 38.75 115.5 288.25 764.3 5 6 14 49 293.05
    PSIS 8 27 80 233.25 651.2 5 6 11 35 212.1
    p=2000 LDA-SIS 10 38 124 394.5 1345.05 5 5 11 42.25 320.55
    DC 14 73 220.5 605.75 1461.05 5 7 19 95.25 689
    T-test 13 59 185.5 488.75 1404 5 6 16 72 482.3
    RRS 13 66.75 200.5 567.5 1406.1 5 7 18 84.25 540.2
    KF 29.95 113.5 338.5 741.75 1536.35 5 12 45 175 810.05
    MV 15 74 246 640.5 1455.3 5 7 22 107 695.05
    PSIS 13 59 185.5 488.75 1404 5 6 16 72 482.3
    ρ=0.5 LDA-SIS 5 5 6 14 89 5 5 5 5 7
    p=1000 DC 5 5 8 23 155 5 5 5 5 9
    T-test 5 5 7 18 123.05 5 5 5 5 7
    RRS 5 5 8 22 136.1 5 5 5 5 8.05
    KF 5 7 15 51 238.1 5 5 5 7 20
    MV 5 5 9 27.25 174.05 5 5 5 5 10
    PSIS 5 5 7 18 123.05 5 5 5 5 7
    p=2000 LDA-SIS 5 5 7 20 175.15 5 5 5 5 8
    DC 5 5 11 39 303 5 5 5 5 19
    T-test 5 5 10 31 241.45 5 5 5 5 16
    RRS 5 6 11 38 270.05 5 5 5 5 18.05
    KF 5 9 24 91 486 5 5 5 8.25 46
    MV 5 6 13 49 298.5 5 5 5 6 22
    PSIS 5 5 10 31 241.45 5 5 5 5 16
    ρ=0.8 LDA-SIS 5 5 5 5 6 5 5 5 5 5
    p=1000 DC 5 5 5 5 6 5 5 5 5 5
    T-test 5 5 5 5 6 5 5 5 5 5
    RRS 5 5 5 5 6 5 5 5 5 5
    KF 5 5 5 6 8 5 5 5 5 6
    MV 5 5 5 5 6 5 5 5 5 5
    PSIS 5 5 5 5 6 5 5 5 5 5
    p=2000 LDA-SIS 5 5 5 5 6 5 5 5 5 5
    DC 5 5 5 5 6 5 5 5 5 5
    T-test 5 5 5 5 6 5 5 5 5 5
    RRS 5 5 5 5 6 5 5 5 5 5
    KF 5 5 5 6 11 5 5 5 5 6
    MV 5 5 5 5 7 5 5 5 5 5
    PSIS 5 5 5 5 6 5 5 5 5 5


    Example 2 (Probit Model): Consider the probit regression model

    $$
    p_y=\Phi(X^\top\beta),\quad p_y=P(Y=1|X),
    $$

    where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Assume that the true active set is $\mathcal{A}=\{1,5,20,21,100\}$, and $\beta$ is the $p$-dimensional parameter vector with $\beta_{\mathcal{A}}=(1,1,1,1,1)^\top$ and 0 otherwise. The other settings are the same as in Example 1. The simulation results of Example 2 are shown in Tables 3 and 4.
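    Example 2 changes only the link function and the coefficient pattern; a brief sketch of the probit response step is given below, assuming the same covariate generation as in Example 1 (our reading of the description).

```python
# A minimal sketch of the Example 2 (probit model) response generation.
import numpy as np
from scipy.stats import norm

def probit_response(X, rng):
    beta = np.zeros(X.shape[1])
    beta[[0, 4, 19, 20, 99]] = 1.0                  # active set A = {1, 5, 20, 21, 100}
    eta = X @ beta + rng.normal(size=X.shape[0])    # random error added to X'beta, as in Example 1
    return rng.binomial(1, norm.cdf(eta)) + 1       # Phi link, classes coded as {1, 2}
```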

    Table 3.  The selection rates $P_a$ and $P_k$'s in Example 2.
    n=100 n=200
    Method P1 P2 P3 P4 P5 Pa P1 P2 P3 P4 P5 Pa
    ρ=0.2 LDA-SIS 1 0.779 0.933 0.934 0.76 0.499 1 0.991 0.999 1 0.992 0.982
    p=1000 DC 1 0.703 0.882 0.893 0.689 0.362 1 0.976 0.998 0.998 0.984 0.956
    T-test 1 0.727 0.9 0.907 0.713 0.401 1 0.983 1 0.999 0.987 0.969
    RRS 1 0.713 0.889 0.895 0.697 0.376 1 0.976 0.998 0.997 0.986 0.957
    KF 1 0.595 0.808 0.785 0.581 0.2 1 0.946 0.993 0.991 0.945 0.88
    MV 1 0.684 0.873 0.871 0.661 0.324 1 0.971 0.998 0.997 0.977 0.943
    PSIS 1 0.727 0.9 0.907 0.713 0.401 1 0.983 1 0.999 0.987 0.969
    p=2000 LDA-SIS 1 0.697 0.88 0.884 0.721 0.359 1 0.985 1 0.998 0.975 0.958
    DC 1 0.616 0.822 0.818 0.644 0.226 1 0.962 0.998 0.996 0.96 0.916
    T-test 1 0.649 0.852 0.848 0.673 0.275 1 0.976 1 0.998 0.967 0.941
    RRS 1 0.621 0.825 0.82 0.648 0.229 1 0.968 0.998 0.996 0.96 0.922
    KF 1 0.466 0.703 0.696 0.497 0.084 1 0.903 0.983 0.99 0.896 0.783
    MV 1 0.583 0.803 0.793 0.62 0.184 1 0.958 0.998 0.995 0.949 0.9
    PSIS 1 0.649 0.852 0.848 0.673 0.275 1 0.976 1 0.998 0.967 0.941
    ρ=0.5 LDA-SIS 1 0.725 0.99 0.995 0.705 0.486 1 0.985 1 1 0.982 0.968
    p=1000 DC 1 0.655 0.975 0.982 0.644 0.373 1 0.965 1 1 0.958 0.926
    T-test 1 0.691 0.983 0.986 0.681 0.429 1 0.972 1 1 0.97 0.944
    RRS 1 0.665 0.976 0.984 0.656 0.397 1 0.967 1 1 0.968 0.937
    KF 1 0.535 0.926 0.938 0.526 0.219 1 0.922 0.999 1 0.897 0.824
    MV 1 0.63 0.969 0.977 0.624 0.349 1 0.958 1 1 0.948 0.908
    PSIS 1 0.691 0.983 0.986 0.681 0.429 1 0.972 1 1 0.97 0.944
    p=2000 LDA-SIS 1 0.636 0.979 0.977 0.645 0.369 1 0.959 1 1 0.96 0.919
    DC 1 0.556 0.962 0.957 0.566 0.274 1 0.945 1 1 0.939 0.884
    T-test 1 0.589 0.975 0.969 0.617 0.326 1 0.951 1 1 0.952 0.903
    RRS 1 0.567 0.962 0.956 0.57 0.274 1 0.948 1 1 0.942 0.891
    KF 1 0.428 0.896 0.892 0.443 0.125 1 0.877 0.998 1 0.867 0.76
    MV 1 0.534 0.941 0.95 0.539 0.245 1 0.935 1 1 0.929 0.868
    PSIS 1 0.589 0.975 0.969 0.617 0.326 1 0.951 1 1 0.952 0.903
    ρ=0.8 LDA-SIS 1 0.641 0.999 0.999 0.57 0.343 1 0.966 1 1 0.953 0.922
    p=1000 DC 1 0.592 0.997 0.999 0.519 0.283 1 0.95 1 1 0.91 0.863
    T-test 1 0.625 0.999 1 0.532 0.301 1 0.962 1 1 0.932 0.895
    RRS 1 0.598 0.997 0.998 0.525 0.287 1 0.954 1 1 0.926 0.881
    KF 1 0.47 0.983 0.982 0.42 0.178 1 0.898 1 1 0.857 0.769
    MV 1 0.561 0.996 0.995 0.506 0.259 1 0.95 1 1 0.907 0.86
    PSIS 1 0.625 0.999 1 0.532 0.301 1 0.962 1 1 0.932 0.895
    p=2000 LDA-SIS 1 0.588 1 0.999 0.496 0.265 1 0.96 1 1 0.922 0.883
    DC 1 0.519 0.998 0.996 0.434 0.199 1 0.931 1 1 0.877 0.812
    T-test 1 0.562 0.998 0.997 0.459 0.229 1 0.951 1 1 0.895 0.847
    RRS 1 0.522 0.995 0.997 0.432 0.202 1 0.933 1 1 0.882 0.819
    KF 1 0.414 0.982 0.974 0.34 0.127 1 0.834 1 1 0.784 0.641
    MV 1 0.496 0.995 0.994 0.411 0.176 1 0.913 1 1 0.86 0.782
    PSIS 1 0.562 0.998 0.997 0.459 0.229 1 0.951 1 1 0.895 0.847

    Table 4.  The different quantiles of MMS in Example 2.
    n=100 n=200
    Method 5% 25% 50% 75% 95% 5% 25% 50% 75% 95%
    ρ=0.2 LDA-SIS 5 10 22 55 252 5 5 5 6 17.05
    p=1000 DC 6 15 38 92 394.25 5 5 6 8 36
    T-test 6 12 30 77 288.05 5 5 5 7 23.05
    RRS 6 14 34 89 316.2 5 5 5 8 31.05
    KF 10 26.75 64.5 151 453.05 5 6 8 18 99
    MV 6 16 40 105 410.05 5 5 6 10 42.05
    PSIS 6 12 30 77 288.05 5 5 5 7 23.05
    p=2000 LDA-SIS 6 15 36 101.5 490.1 5 5 5 7 29
    DC 8 24 63 181 720.4 5 5 6 11 55.05
    T-test 7 19.75 49 135 561.55 5 5 6 9 42
    RRS 8 23 56 164.25 634 5 5 6 10 58
    KF 14 52 124 298.25 957.2 5 6 11 30 157.05
    MV 8 28 69 197.25 784.55 5 5 7 12 76.05
    PSIS 7 19.75 49 135 561.55 5 5 6 9 42
    ρ=0.5 LDA-SIS 5 9 22 71 295.05 5 5 6 9 28
    p=1000 DC 6 13 35 113 444.1 5 5 7 12 49
    T-test 6 11 29 86 351.05 5 5 6 10 41.05
    RRS 6 13 33.5 95.25 399.3 5 5 7 12 46
    KF 8 24 65 165 516.7 5 6 10 24 106.1
    MV 6 14 41 118 490.75 5 5 7 13.25 55
    PSIS 6 11 29 86 351.05 5 5 6 10 41.05
    p=2000 LDA-SIS 6 13 37 114 556.95 5 5 6 10 58.05
    DC 7 20 56 191.25 813.25 5 5 7 15 98.1
    T-test 6 16 45 155 616.7 5 5 7 12.25 74
    RRS 7 19 52.5 180.75 679 5 5 7 14.25 89.05
    KF 10.95 40 118 320.25 978.35 5 7 13 36 228.15
    MV 7 22.75 64 219.25 801.65 5 6 8 17 121.05
    PSIS 6 16 45 155 616.7 5 5 7 12.25 74
    ρ=0.8 LDA-SIS 8 16 31 86.25 334.25 7.95 11 14 19 51.05
    p=1000 DC 9 20 45 125.25 508.15 8 11 15 23 78.05
    T-test 9 18 37.5 105 399 8 11 15 21 65.05
    RRS 9 19 41 114 444.1 8 11 15 22 68
    KF 10 31 74.5 188.25 564.25 8 12 19 34 136.05
    MV 10 21 48 135.25 523.3 8 12 16 24 81.05
    PSIS 9 18 37.5 105 399 8 11 15 21 65.05
    p=2000 LDA-SIS 9 21 48 151 619.1 8 11 14 22 66.05
    DC 10 28 73 228.5 897.3 8 11 17 29 106.05
    T-test 9.95 23 60.5 172 676.45 8 11 15 26 81
    RRS 10 28 70 201.25 746.25 8 11 17 29 101
    KF 13 49 136 334.25 1113.7 8 13 24 61 211.15
    MV 10 33 82 241.25 957 8 11 18 33 121.2
    PSIS 9.95 23 60.5 172 676.45 8 11 15 26 81


    From Tables 1–4, we can see that the proposed LDA-SIS procedure has better feature screening performance than the other procedures. The proportion of replications in which all active predictors are selected into the screened submodel ($P_a$) is larger for the LDA-SIS procedure, and the minimum model size of the submodel that contains all active predictors (MMS) is smaller. From Tables 1 and 2, as the correlation parameter $\rho$ increases, the performance of the feature screening procedures improves. This phenomenon shows that when the active predictors have strong relationships with each other, the proposed feature screening procedure selects the important predictors more accurately. On the other hand, from Tables 3 and 4, as the correlation parameter $\rho$ increases, the performance of the feature screening procedures deteriorates. This shows that when the active predictors have strong relationships with the inactive predictors, the feature screening accuracy is compromised. Furthermore, better results are obtained as the sample size increases.

    We applied the LDA-SIS feature screening procedure to the ovarian cancer data previously studied by Sorace and Zhan [2], Fushiki et al. [16], Zhang et al. [17], and Zhang et al. [18]. This dataset was generated using surface-enhanced laser desorption time-of-flight mass spectrometry and comprises serum samples from 162 ovarian cancer patients and 91 control subjects. The data are available on the Clinical Proteomics Program Databank website (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp). For each sample, we analyzed 15,154 distinct mass-to-charge ratios (M/Z). As reported by Sorace and Zhan [2], the region with M/Z values below 500 is often discarded as noise, which reduces the dimensionality of the biomarker features from 15,154 to 12,757. Our goal in this study was to identify proteomic patterns corresponding to specific M/Z values that can distinguish ovarian cancer subjects from control subjects.

    We randomly split the 253 samples into the training data set and the testing data set. In particular, we sampled approximately 100γ% of the ovarian cancer patients and 100γ% of the control subjects as the training data set, and the rest as the testing data set. We standardized the data to zero mean and unit variance before the discriminant classification.

    Different feature screening procedures are utilized to identify the important potential biomarkers in the standardized training data. In our LDA-SIS procedure, we select the variable with the largest value of the Kolmogorov-Smirnov statistic (Mai and Zou [3]) as $X_1$. Let $d_n=[c_0n_{tr}/\log(n_{tr})]$ with $c_0=0.25, 0.5, 1$, where $n_{tr}$ is the sample size of the training data; this gives $d_n$ = 8, 17 and 35, respectively. After the feature screening step, the kernel support vector machine (KSVM) with a Gaussian kernel and the penalized logistic model (PLM) with the LASSO (Tibshirani [19]) are applied in the modeling step based on the screened $d_n$ potential biomarkers, and their performances are evaluated on the testing data. The R packages e1071 and glmnet are used here.
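    A rough Python stand-in for this screen-then-classify workflow is sketched below. The paper uses the R packages e1071 and glmnet; here scikit-learn's SVC and L1-penalized logistic regression are substituted, so this illustrates the workflow only and is not the original code, and the function and argument names are hypothetical.

```python
# A minimal sketch of the screen-then-classify workflow (scikit-learn substituted
# for the R packages e1071 and glmnet used in the paper).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegressionCV

def screen_then_classify(X_train, y_train, X_test, d_n, screen_stat):
    scaler = StandardScaler().fit(X_train)                 # zero mean, unit variance
    Xtr, Xte = scaler.transform(X_train), scaler.transform(X_test)
    stats = screen_stat(Xtr, y_train)                      # e.g., the LDA-SIS statistic sketched earlier
    keep = np.argsort(stats)[::-1][:d_n]                   # top d_n potential biomarkers
    ksvm = SVC(kernel="rbf").fit(Xtr[:, keep], y_train)    # Gaussian-kernel SVM
    plm = LogisticRegressionCV(penalty="l1", solver="liblinear").fit(Xtr[:, keep], y_train)
    return ksvm.predict(Xte[:, keep]), plm.predict(Xte[:, keep])
```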

    The procedure is repeated 200 times with γ = 0.7 and 0.8, respectively. Three assessment criteria are introduced to investigate the classification performance of the different methods.

    ● Testing error: The number of misclassified samples in the testing set.

    ● TPR (sensitivity or true positive rate): The proportion of ovarian cancer patients diagnosed correctly.

    ● PPV (positive predictive value): The proportion of samples diagnosed with ovarian cancer who did have the disease.

    Table 5 summarizes the median and robust standard deviation (RSD, in parentheses) of the testing error, TPR and PPV. The results show that our proposed LDA-SIS method outperforms the other methods under these evaluation criteria. Furthermore, we observe that increasing the proportion of training data ($\gamma=0.8$) leads to better performance across all model sizes $d_n$ (8, 17, 35).

    Table 5.  Classification performance of ovarian cancer data (upper block: $\gamma=0.8$; lower block: $\gamma=0.7$).
    dn Assessment Criteria LDA-SIS DC T-test RRS KF MV PSIS
    8 KSVM-Testing error 3(1.49) 8(2.43) 8(2.24) 8(2.43) 8(2.24) 8(2.24) 8(2.24)
    KSVM-TPR 0.97(0.06) 0.83(0.07) 0.83(0.07) 0.85(0.08) 0.84(0.07) 0.85(0.07) 0.83(0.07)
    KSVM-PPV 0.97(0.04) 0.96(0.04) 0.96(0.04) 0.95(0.04) 0.96(0.04) 0.95(0.04) 0.96(0.04)
    PLM-Testing error 3(2.24) 5(2.24) 5(2.24) 5(2.24) 6(2.99) 5(2.24) 5(2.24)
    PLM-TPR 0.97(0.05) 0.94(0.05) 0.93(0.05) 0.94(0.04) 0.93(0.06) 0.94(0.05) 0.93(0.05)
    PLM-PPV 0.97(0.04) 0.92(0.05) 0.92(0.05) 0.93(0.07) 0.91(0.07) 0.93(0.06) 0.92(0.06)
    0.8 17 KSVM-Testing error 3(2.24) 8(2.99) 8(2.99) 7(3.73) 8(2.99) 7(2.99) 8(2.99)
    KSVM-TPR 0.94(0.04) 0.83(0.06) 0.83(0.06) 0.87(0.08) 0.85(0.07) 0.87(0.08) 0.83(0.06)
    KSVM-PPV 0.97(0.04) 0.96(0.04) 0.95(0.04) 0.94(0.04) 0.94(0.04) 0.94(0.04) 0.95(0.04)
    PLM-Testing error 2(1.49) 4(2.24) 5(2.24) 3(2.24) 4(2.24) 3(2.24) 5(2.24)
    PLM-TPR 0.97(0.04) 0.94(0.05) 0.94(0.07) 0.95(0.04) 0.93(0.05) 0.95(0.04) 0.94(0.06)
    PLM-PPV 0.97(0.04) 0.95(0.04) 0.94(0.05) 0.96(0.06) 0.95(0.07) 0.96(0.06) 0.94(0.05)
    35 KSVM-Testing error 3(2.24) 8(3.73) 8(3.73) 6(2.99) 7(3.73) 6(2.99) 8(3.73)
    KSVM-TPR 0.97(0.07) 0.84(0.07) 0.85(0.07) 0.88(0.08) 0.85(0.08) 0.88(0.08) 0.85(0.07)
    KSVM-PPV 0.97(0.04) 0.95(0.04) 0.95(0.04) 0.94(0.04) 0.94(0.04) 0.94(0.04) 0.95(0.04)
    PLM-Testing error 2(1.49) 3(2.24) 3(2.24) 3(2.24) 3(2.24) 3(2.24) 3(2.24)
    PLM-TPR 0.97(0.04) 0.96(0.03) 0.96(0.04) 0.96(0.05) 0.96(0.04) 0.96(0.04) 0.96(0.04)
    PLM-PPV 0.97(0.04) 0.95(0.05) 0.96(0.06) 0.97(0.06) 0.95(0.04) 0.96(0.04) 0.96(0.06)
    8 KSVM-Testing error 4(2.24) 11(3.17) 12(3.17) 11(3.92) 11(2.99) 11(3.73) 12(3.17)
    KSVM-TPR 0.96(0.04) 0.83(0.06) 0.83(0.06) 0.85(0.07) 0.84(0.07) 0.85(0.07) 0.83(0.06)
    KSVM-PPV 0.97(0.03) 0.96(0.04) 0.96(0.04) 0.95(0.04) 0.96(0.04) 0.95(0.04) 0.96(0.04)
    PLM-Testing error 4(2.24) 9(2.99) 9(2.99) 8(2.99) 9(2.99) 8(2.99) 6(2.99)
    PLM-TPR 0.96(0.03) 0.94(0.05) 0.94(0.05) 0.95(0.05) 0.94(0.06) 0.95(0.05) 0.94(0.04)
    PLM-PPV 0.96(0.04) 0.89(0.05) 0.90(0.05) 0.91(0.05) 0.89(0.05) 0.90(0.04) 0.90(0.05)
    0.7 17 KSVM-Testing error 5(2.24) 12(2.99) 12(3.73) 10(4.48) 11(4.48) 11(4.48) 12(3.73)
    KSVM-TPR 0.94(0.06) 0.84(0.07) 0.84(0.06) 0.87(0.08) 0.85(0.07) 0.87(0.07) 0.84(0.06)
    KSVM-PPV 0.98(0.04) 0.96(0.04) 0.95(0.04) 0.95(0.04) 0.95(0.04) 0.95(0.04) 0.95(0.04)
    PLM-Testing error 4(2.24) 7(2.99) 7(3.73) 6(2.99) 7(2.99) 6(2.99) 7(3.73)
    PLM-TPR 0.97(0.02) 0.95(0.04) 0.95(0.04) 0.96(0.04) 0.96(0.04) 0.96(0.04) 0.95(0.04)
    PLM-PPV 0.96(0.04) 0.92(0.05) 0.92(0.05) 0.93(0.04) 0.92(0.05) 0.93(0.04) 0.92(0.06)
    35 KSVM-Testing error 5(2.99) 12(3.73) 11(4.48) 9(4.66) 11(4.48) 10(4.48) 11(4.48)
    KSVM-TPR 0.94(0.07) 0.84(0.07) 0.85(0.07) 0.89(0.07) 0.87(0.08) 0.88(0.08) 0.85(0.07)
    KSVM-PPV 0.98(0.04) 0.95(0.04) 0.95(0.04) 0.95(0.04) 0.94(0.04) 0.94(0.04) 0.95(0.04)
    PLM-Testing error 3(1.49) 6(2.99) 6(2.24) 5(2.99) 6(2.99) 5(2.99) 5(2.24)
    PLM-TPR 0.98(0.02) 0.96(0.04) 0.96(0.03) 0.96(0.03) 0.96(0.04) 0.96(0.03) 0.96(0.03)
    PLM-PPV 0.98(0.02) 0.94(0.04) 0.94(0.04) 0.95(0.04) 0.94(0.04) 0.94(0.04) 0.94(0.04)


    In this article, we employed Fisher's linear projection and the marginal score test to construct a feature screening procedure for the ultrahigh-dimensional binary classification problem. Although many feature screening procedures for ultrahigh-dimensional discriminant analysis have been proposed, some of them even model-free, the study of the LDA problem, one of the most popular approaches in discriminant classification and pattern recognition, remains attractive. By minimizing the linear projection of the within-group sum of squares of the original cluster structure and maximizing the linear projection of the between-group sum of squares, we constructed the marginal score test and combined it with the linear projection optimization problem to build the feature screening index. The sure screening property and the minimum model size of the procedure were studied. The sure screening property ensures that the feature screening procedure retains all the important classification predictors with probability tending to 1, and the minimum model size result in Theorem 2 shows that as long as $\sum_{k=1}^{p}|\omega_k^*|$ is of a polynomial order of the sample size, the number of selected variables is also of polynomial order of the sample size. The finite sample performance of the proposed procedure was illustrated by Monte Carlo studies and a real-data example. The simulation studies demonstrate that the proposed feature screening method performs well, and the simple structure of the screening index makes the calculation fast.

    Peng Lai's research is supported by the National Natural Science Foundation of China (11771215). Yanqiu Zhou's research is supported by the Guangxi Science and Technology Base and Talent Project (2020ACI9151) and the Guangxi University Young and Middle-aged Teachers Basic Research Ability Improvement Project (2021KY0343).

    The authors declare no conflict of interest.

    The proofs of Theorem 1 and Theorem 2 in this paper are similar to the proofs of Theorem 1 in Li et al. [15] and Theorems 1 and 2 in Liu et al. [20]. Similar lemmas are used here to facilitate the proofs of the proposed theorems; these lemmas are listed below, and their proofs can be found in the Appendices of Li et al. [15] and Liu et al. [20].

    Lemma 1. Let $\mu=E(X)$. If $P(a_1\leq X\leq b_1)=1$, then

    $$
    E[\exp\{s(X-\mu)\}]\leq\exp\{s^2(b_1-a_1)^2/8\},\quad \text{for any } s>0.
    $$

    Lemma 2. Let $h(X_1,\ldots,X_m)$ be the kernel of the U-statistic $U_n$, and $\theta=E\{h(X_1,\ldots,X_m)\}$. If $a\leq h(X_1,\ldots,X_m)\leq b$, then, for any $\epsilon>0$ and $n>m$, we have

    $$P(U_n-\theta\geq\epsilon)\leq\exp\Big(-\frac{2[n/m]\epsilon^2}{(b-a)^2}\Big),$$

    where $[n/m]$ denotes the integer part of $n/m$.

    Furthermore, due to the symmetry of the U-statistic, we also have

    $$P(|U_n-\theta|\geq\epsilon)\leq 2\exp\Big(-\frac{2[n/m]\epsilon^2}{(b-a)^2}\Big).$$

    In the following, we give the proofs of Theorem 1 and Theorem 2. For convenience, we denote $M$, $M_i$ and $c_i$, $i=1,2,\ldots$, as generic constants depending on the context. Define $I_j=I(Y=j)$ and $I(Y_i=j)=I_{ij}$.

    Proof of Theorem 1. For $\hat{\omega}_k^*-\omega_k^*$, we have

    $$
    \hat{\omega}_k^*-\omega_k^*=[\hat{T}_{11}\hat{T}_{12}-\hat{T}_{21}\hat{T}_{22}]-[T_{11}T_{12}-T_{21}T_{22}],\tag{1}
    $$

    where $\hat{T}_{11}=\frac{1}{n}E_{11}$, $\hat{T}_{12}=\frac{1}{n}B_{k1}$, $\hat{T}_{21}=\frac{1}{n}B_{11}$ and $\hat{T}_{22}=\frac{1}{n}E_{k1}$. We first consider $\hat{T}_{11}-T_{11}$.

    Define $\tilde{T}_{11}=\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{n}\big[X_{i1}-\frac{E(X_1I_j)}{E(I_j)}\big]^2I_{ij}$. We have

    $$
    P(|\hat{T}_{11}-T_{11}|\geq\varepsilon)\leq P(|\hat{T}_{11}-\tilde{T}_{11}|+|\tilde{T}_{11}-T_{11}|\geq\varepsilon)\leq P(|\tilde{T}_{11}-T_{11}|\geq\varepsilon/2),\tag{2}
    $$

    with $n$ sufficiently large, i.e., $n\geq M_1$. It follows that

    $$
    P(|\tilde{T}_{11}-T_{11}|\geq\varepsilon/2)\leq\sum_{j=1}^{2}P\Big(|\hat{T}_{11}-T_{11}|\geq\frac{\varepsilon}{4}\Big),
    $$

    where

    $$
    \hat{T}_{11}=\frac{1}{n}\sum_{i=1}^{n}\Big[X_{i1}-\frac{E(X_1I_j)}{E(I_j)}\Big]^2I_{ij}\quad\text{and}\quad T_{11}=E\Big\{\Big[X_1-\frac{E(X_1I_j)}{E(I_j)}\Big]^2I_j\Big\}.
    $$

    Obviously, $\hat{T}_{11}$ is a U-statistic, and $T_{11}$ is the expectation of its kernel $h_1(X_{i1},Y_i)=\big[X_{i1}-\frac{E(X_1I_j)}{E(I_j)}\big]^2I_{ij}$. Thus, we have

    $$
    \begin{aligned}
    \hat{T}_{11}&=\frac{1}{n}\sum_{i=1}^{n}\Big[X_{i1}-\frac{E(X_1I_j)}{E(I_j)}\Big]^2I_{ij}I(h_1(X_{i1},Y_i)\leq M)+\frac{1}{n}\sum_{i=1}^{n}\Big[X_{i1}-\frac{E(X_1I_j)}{E(I_j)}\Big]^2I_{ij}I(h_1(X_{i1},Y_i)>M)\\
    &:=\hat{T}_{111}+\hat{T}_{112}.
    \end{aligned}\tag{3}
    $$

    Accordingly, we decompose T11 into two parts

    $$
    \begin{aligned}
    T_{11}&=E\Big\{\Big[X_1-\frac{E(X_1I_j)}{E(I_j)}\Big]^2I_jI(h_1(X_1,Y)\leq M)\Big\}+E\Big\{\Big[X_1-\frac{E(X_1I_j)}{E(I_j)}\Big]^2I_jI(h_1(X_1,Y)>M)\Big\}\\
    &:=T_{111}+T_{112}.
    \end{aligned}\tag{4}
    $$

    Clearly, $\hat{T}_{111}$ and $\hat{T}_{112}$ are unbiased estimators of $T_{111}$ and $T_{112}$, respectively.

    Similar to the proof of Theorem 2 in Zhu et al. [21], using Markov's inequality and the properties of U-statistics, for any $t>0$ we can obtain

    $$
    \begin{aligned}
    P(\hat{T}_{111}-T_{111}\geq\varepsilon)&\leq\exp(-t\varepsilon)\exp(-tT_{111})E\{\exp(t\hat{T}_{111})\}\\
    &\leq\exp(-t\varepsilon)E^n\Big\{\exp\Big(\frac{t}{n}\big[h_1(X_{i1},Y_i)I(h_1(X_{i1},Y_i)\leq M)-T_{111}\big]\Big)\Big\}\\
    &\leq\exp\Big(-t\varepsilon+\frac{M^2t^2}{8n}\Big),
    \end{aligned}
    $$

    where the last inequality follows from Lemma 1. By choosing $t=4\varepsilon n/M^2$, we have

    $$
    P(\hat{T}_{111}-T_{111}\geq\varepsilon)\leq\exp\Big(-\frac{2\varepsilon^2n}{M^2}\Big).
    $$

    Therefore, by the symmetry of the U-statistic, we get

    $$
    P(|\hat{T}_{111}-T_{111}|\geq\varepsilon)\leq 2\exp\Big(-\frac{2\varepsilon^2n}{M^2}\Big).\tag{5}
    $$

    Next, we show the consistency of $\hat{T}_{112}$.

    With the Cauchy-Schwarz and Markov inequalities, for any $s>0$,

    $$
    (T_{112})^2\leq E\{h_1^2(X_1,Y)\}E\{\exp(sh_1(X_1,Y))\}/\exp(sM).
    $$

    Note that

    $$
    h_1(X_1,Y)\leq 2X_1^2+2\frac{E^2(X_1I_j)}{E^2(I_j)},
    $$

    which yields

    $$
    E\{\exp(sh_1(X_1,Y))\}\leq\exp\Big(2s\frac{E^2(X_1I_j)}{E^2(I_j)}\Big)E\{\exp(2sX_1^2)\}.
    $$

    By condition C1, if we choose $M=cn^{\gamma}$ for $0<\gamma<\frac{1}{2}-\kappa$, then $T_{112}\leq\frac{\varepsilon}{2}$ when $n$ is sufficiently large. Consequently, similar to the proof of (B.4) in Li et al. [15], there exist some constant $c_1$ and some $s>0$ such that

    $$
    P(|\hat{T}_{112}-T_{112}|\geq\varepsilon)\leq P\Big(|\hat{T}_{112}|\geq\frac{\varepsilon}{2}\Big)\leq c_1n\exp\Big(-\frac{sM}{4}\Big).\tag{6}
    $$

    Recall that $M=cn^{\gamma}$. Combining (3)–(6), we have

    $$
    P(|\hat{T}_{11}-T_{11}|\geq\varepsilon)\leq P\Big(|\hat{T}_{111}-T_{111}|\geq\frac{\varepsilon}{2}\Big)+P\Big(|\hat{T}_{112}-T_{112}|\geq\frac{\varepsilon}{2}\Big)\leq 2\exp\big(-2c_2\varepsilon^2n^{1-2\gamma}\big)+c_1n\exp\big(-c_3n^{\gamma}\big),
    $$

    where $c_2$ and $c_3$ are some positive constants.

    Similarly, for $u,v=1,2$, we can prove that

    $$
    P(|\hat{T}_{uv}-T_{uv}|\geq\varepsilon)\leq 2\exp\big(-2c_{2uv}\varepsilon^2n^{1-2\gamma}\big)+c_{1uv}n\exp\big(-c_{3uv}n^{\gamma}\big),
    $$

    where $c_{1uv}$, $c_{2uv}$ and $c_{3uv}$ are some positive constants. Therefore, it follows that

    $$
    P(|\hat{T}_{uv}-T_{uv}|\geq\varepsilon)\leq\sum_{j=1}^{2}P\Big(|\hat{T}_{uv}-T_{uv}|\geq\frac{\varepsilon}{2m}\Big)\leq 4\exp\Big(-\frac{c_{2uv}\varepsilon^2}{8}n^{1-2\gamma}\Big)+2c_{1uv}n\exp\big(-c_{3uv}n^{\gamma}\big),\tag{7}
    $$

    for $u,v=1,2$.

    Therefore, similar to the proofs of Lemma 4 and Lemma 5 in Liu et al. [20], by (1), (2) and (7), we get

    $$
    P(|\hat{\omega}_k^*-\omega_k^*|\geq\varepsilon)\leq P\Big(|\hat{T}_{11}\hat{T}_{12}-T_{11}T_{12}|\geq\frac{\varepsilon}{2}\Big)+P\Big(|\hat{T}_{21}\hat{T}_{22}-T_{21}T_{22}|\geq\frac{\varepsilon}{2}\Big)\leq 8\exp\Big(-\frac{c_4\varepsilon^2}{4}n^{1-2\gamma}\Big)+2c_5n\exp\big(-c_6n^{\gamma}\big),
    $$

    where $c_4$–$c_6$ are some positive constants. Thus,

    $$
    P\Big(\max_{1\leq k\leq p}|\hat{\omega}_k^*-\omega_k^*|\geq cn^{-\kappa}\Big)\leq O\Big\{2p\exp\Big(-\frac{c_7}{4}n^{1-2\gamma-2\kappa}\Big)+2np\exp\big(-c_8n^{\gamma}\big)\Big\}.\tag{8}
    $$

    Next, we prove the second part of Theorem 1 using a method similar to the proof of Theorem 1 in Li et al. [15]. If $\mathcal{A}\nsubseteq\hat{\mathcal{A}}_{c_n}$, then there must exist some $k\in\mathcal{A}$ such that $\hat{\omega}_k^{*}\leq cn^{-\kappa}$. Since $\min_{k\in\mathcal{A}}\omega_k^{*}\geq 2cn^{-\kappa}$, it indicates that

    $$
    \big\{\mathcal{A}\nsubseteq\hat{\mathcal{A}}_{c_n}\big\}\subseteq\big\{\big|\hat{\omega}_k^{*}-\omega_k^{*}\big|>cn^{-\kappa},\ \text{for some}\ k\in\mathcal{A}\big\}.
    $$

    Therefore,

    $$
    \begin{aligned}
    P(\mathcal{A}\subseteq\hat{\mathcal{A}}_{c_n})&\geq 1-P\Big(\max_{k\in\mathcal{A}}\big|\hat{\omega}_k^{*}-\omega_k^{*}\big|\geq cn^{-\kappa}\Big)\geq 1-s_nP\big(\big|\hat{\omega}_k^{*}-\omega_k^{*}\big|\geq cn^{-\kappa}\big)\\
    &\geq 1-O\Big\{2s_n\exp\Big(-\frac{c_{7}}{4}n^{1-2\gamma-2\kappa}\Big)+2ns_n\exp\big(-c_{8}n^{\gamma}\big)\Big\}.
    \end{aligned}
    $$

    This completes the proof of the second part.

    Proof of Theorem 2. Note that for any $c_9>0$, the number of elements in $\{k:|\omega_k^*|>\frac{c_9}{2}n^{-\kappa}\}$ is bounded by $O(n^{\kappa}\sum_{k=1}^p|\omega_k^*|)$. Then on the set

    $$
    \mathcal{B}=\Big\{\max_{1\leq k\leq p}|\hat{\omega}_k^*-\omega_k^*|\leq\frac{c_9}{2}n^{-\kappa}\Big\},
    $$

    the number of elements in $\{k:|\hat{\omega}_k^*|>c_9n^{-\kappa}\}$ cannot exceed the number of elements in $\{k:|\omega_k^*|>\frac{c_9}{2}n^{-\kappa}\}$. Therefore, we have

    $$
    P\Big(\big\|\hat{\mathcal{A}}_{c_n}\big\|_{0}\leq O\big(n^{\kappa}\sum_{k=1}^p|\omega_k^*|\big)\Big)\geq P(\mathcal{B}).
    $$

    Then, by (8), the proof is completed.



    [1] J. Fan, Y. Fan, High dimensional classification using features annealed independence rules, Ann. Stat., 36 (2008), 2605–2637. http://dx.doi.org/10.1214/07-AOS504 doi: 10.1214/07-AOS504
    [2] J. Sorace, M. Zhan, A data review and re-assessment of ovarian cancer serum proteomic profiling, BMC Bioinformatics, 4 (2003), 1–13. http://dx.doi.org/10.1186/1471-2105-4-24 doi: 10.1186/1471-2105-4-24
    [3] Q. Mai, H. Zou, The Kolmogorov filter for variable screening in high-dimensional binary classification, Biometrika, 100 (2013), 229–234. http://dx.doi.org/10.1093/biomet/ass062 doi: 10.1093/biomet/ass062
    [4] Q. Mai, H. Zou, The fused Kolmogorov filter: A nonparametric model-free screening method, Ann. Stat., 43 (2015), 1471–1497. http://dx.doi.org/10.1214/14-AOS1303 doi: 10.1214/14-AOS1303
    [5] P. Lai, F. Song, K. Chen, Z. Liu, Model free feature screening with dependent variable in ultrahigh dimensional binary classification, Statist. Probab. Lett., 125 (2017), 141–148. https://doi.org/10.1016/j.spl.2017.02.011 doi: 10.1016/j.spl.2017.02.011
    [6] H. Cui, R. Li, W. Zhong, Model-free feature screening for ultrahigh dimensional discriminant analysis, J. Am. Stat. Assoc., 110 (2015), 630–641. http://dx.doi.org/10.1080/01621459.2014.920256 doi: 10.1080/01621459.2014.920256
    [7] R. Pan, H. Wang, R. Li, Ultrahigh dimensional multi-class linear discriminant analysis by pairwise sure independence screening, J. Am. Stat. Assoc., 111 (2016), 169–179. http://dx.doi.org/10.1080/01621459.2014.998760 doi: 10.1080/01621459.2014.998760
    [8] G. Cheng, X. Li, P. Lai, F. Song, J. Yu, Robust rank screening for ultrahigh dimensional discriminant analysis, Stat. Comput., 27 (2017), 535–545. http://dx.doi.org/10.1007/s11222-016-9637-2 doi: 10.1007/s11222-016-9637-2
    [9] S. He, S. Ma, W. Xu, A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis, Comput. Stat. Data Anal., 137 (2019), 155–169. http://dx.doi.org/10.1016/j.csda.2019.02.003 doi: 10.1016/j.csda.2019.02.003
    [10] F. Song, P. Lai, B. Shen, Robust composite weighted quantile screening for ultrahigh dimensional discriminant analysis, Metrika, 83 (2020), 799–820. https://doi.org/10.1007/s00184-019-00758-x doi: 10.1007/s00184-019-00758-x
    [11] Y. Sheng, Q. Wang, Model-free feature screening for ultrahigh dimensional classification, J. Multivar. Anal., 178 (2020), 104618. http://dx.doi.org/10.1016/j.jmva.2020.104618 doi: 10.1016/j.jmva.2020.104618
    [12] S. Zhao, Y. Li, Score test variable screening, Biometrics, 70 (2014), 862–871. http://dx.doi.org/10.1111/biom.12209 doi: 10.1111/biom.12209
    [13] Y. Ma, Y. Li, H. Lin, Concordance measure-based feature screening and variable selection, Stat. Sinica, 27 (2017), 1967–1985. http://dx.doi.org/10.5705/ss.202016.0024 doi: 10.5705/ss.202016.0024
    [14] J. Fan, J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Series B. Stat. Methodol., 70 (2008), 849–911. http://dx.doi.org/10.1111/j.1467-9868.2008.00674.x doi: 10.1111/j.1467-9868.2008.00674.x
    [15] R. Li, W. Zhong, L. Zhu, Feature screening via distance correlation learning, J. Am. Stat. Assoc., 107 (2012), 1129–1139. http://dx.doi.org/10.1080/01621459.2012.695654 doi: 10.1080/01621459.2012.695654
    [16] T. Fushiki, H. Fujisawa, S. Eguchi, Identification of biomarkers from mass spectrometry data using a "common" peak approach, BMC Bioinformatics, 7 (2006), 358–366. http://dx.doi.org/10.1186/1471-2105-7-358 doi: 10.1186/1471-2105-7-358
    [17] M. Zhang, W. Wang, Y. Du, ULDA-based heuristic feature selection method for proteomic profile analysis and biomarker discovery, Chemometr. Intell. Lab. Syst., 102 (2010), 84–90. http://dx.doi.org/10.1016/j.chemolab.2010.04.005 doi: 10.1016/j.chemolab.2010.04.005
    [18] M. Zhang, P. Tong, W. Wang, J. Geng, Y. Du, Proteomic profile analysis and biomarker discovery from mass spectra using independent component analysis combined with uncorrelated linear discriminant analysis, Chemometr. Intell. Lab. Syst., 105 (2011), 207–214. http://dx.doi.org/10.1016/j.chemolab.2011.01.007 doi: 10.1016/j.chemolab.2011.01.007
    [19] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Series B. Methodol., 58 (1996), 267–288. http://dx.doi.org/10.1111/j.2517-6161.1996.tb02080.x doi: 10.1111/j.2517-6161.1996.tb02080.x
    [20] J. Liu, R. Li, R. Wu, Feature selection for varying coefficient models with ultrahigh-dimensional covariates, J. Am. Stat. Assoc., 109 (2014), 266–274. http://dx.doi.org/10.1080/01621459.2013.850086 doi: 10.1080/01621459.2013.850086
    [21] L. Zhu, L. Li, R. Li, L. Zhu, Model-free feature screening for ultrahigh-dimensional data, J. Am. Stat. Assoc., 106 (2011), 1464–1475. http://dx.doi.org/10.1198/jasa.2011.tm10563 doi: 10.1198/jasa.2011.tm10563
© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)