Research article

Variable selection and estimation for accelerated failure time model via seamless-L0 penalty

  • Received: 19 June 2022 Revised: 30 September 2022 Accepted: 10 October 2022 Published: 18 October 2022
  • MSC : 14Q15, 62N02, 62E20

  • Survival data with high-dimensional covariates are routinely collected in medical studies and other fields. In this work, we propose a seamless-L0 (SELO) penalized method for the accelerated failure time (AFT) model in the high-dimensional setting. Specifically, we apply SELO to perform variable selection and estimation under this model. Under appropriate conditions, we show that SELO selects a model whose dimension is comparable to that of the underlying model, and prove that the proposed estimator is asymptotically normal. Simulation results demonstrate that the SELO procedure outperforms several existing procedures, and a real data analysis shows that SELO selects variables more accurately.

    Citation: Yin Xu, Ning Wang. Variable selection and estimation for accelerated failure time model via seamless-L0 penalty[J]. AIMS Mathematics, 2023, 8(1): 1195-1207. doi: 10.3934/math.2023060




    Analyzing high-dimensional survival data has become an important topic in statistics, and identifying covariates with good predictive power for survival is a fundamental step. For variable selection, penalized least squares is an attractive approach: it performs selection and estimation simultaneously and yields estimators with good predictive performance.

    As a useful alternative to the Cox model [2], the AFT model [10] is based on linear regression and has a more intuitive form. The AFT model with an unspecified error distribution has been widely studied for right-censored data, and two approaches have attracted particular attention. One uses the Kaplan-Meier estimator to construct a weighted least squares estimator. The other is the rank-based estimator, which is motivated by the score function of the partial likelihood; see, for example, [1,13,17].

    To identify significant factors with predictive power, many techniques for linear regression models have been extended to the Cox model and the AFT model. Penalized methods, which impose penalties on the regression coefficients, have drawn extensive attention. By balancing goodness of fit against model complexity, penalization reduces a complex model to a parsimonious one. Many such methods have been used in gene expression analysis with survival data; see, for example, [16,19]. Moreover, various penalization methods with consistent selection have been proposed, including the adaptive Lasso [19], the smoothly clipped absolute deviation (SCAD) penalty [6], the minimax concave penalty (MCP) [20] and the bridge penalty. The bridge penalty has been shown to enjoy the oracle property in linear regression models with a divergent number of covariates. For AFT models there is also a sizable literature (e.g., [7,9,22]). To name but a few, Huang et al. [7] considered regularized estimation in the AFT model with high-dimensional covariates based on Stute's weighted least squares method. Huang and Ma [8] considered variable selection for the AFT model via the bridge method. Wang and Song [18] applied the adaptive Lasso to AFT models. In recent years the AFT model has continued to attract attention. For example, Chai et al. [3] considered a set of low-dimensional covariates of main interest together with a set of high-dimensional covariates that may also affect survival under the AFT model. Choi and Choi [4] proposed a logistic-kernel smoothing procedure for the semiparametric AFT model with high-dimensional right-censored data. Li et al. [12] proposed a unified Expectation-Maximization approach combined with an L1-norm penalty to perform variable selection and parameter estimation simultaneously in the AFT model with right-censored survival data of moderate size.

    This article is motivated by the seamless-L0 (SELO) penalty [5], a smooth function that closely approximates the L0 penalty, which we study under the AFT model. Under appropriate conditions, we show that SELO selects a model whose dimension is comparable to that of the underlying model and prove that the proposed estimator is asymptotically normal. Monte Carlo simulations are conducted to evaluate the finite-sample performance of the proposed procedure, and the method is further demonstrated through an empirical analysis.

    The rest of this paper is organized as follows. Section 2 introduces the SELO-penalized AFT model and the computational algorithm. In Section 3, we establish accurate variable selection for the high-dimensional sparse AFT model based on the seamless L0 penalty, together with the root-n consistency and asymptotic normality of the resulting estimator. A Monte Carlo simulation study examining the finite-sample performance of the proposed estimator is reported in Section 4, and a real data example illustrating the methodology is given in Section 5.

    Let $T_i$ be the logarithm of the failure time and $X_i$ be the $p$-dimensional covariate vector. The AFT model assumes

    $$T_i = \alpha + X_i^T\beta + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (2.1)$$

    where $\alpha$ is the intercept, $\beta \in \mathbb{R}^p$ is an unknown vector of interest, and $\varepsilon_i$ is the random error. When $T_i$ is subject to right censoring, we can only observe $(Y_i, \delta_i, X_i)$, where $Y_i = \min(T_i, C_i)$, $X_i$ is the $p$-dimensional covariate vector forming the $i$th row of the $n \times p$ covariate matrix $X$, $C_i$ is the logarithm of the censoring time, and $\delta_i = I\{T_i \le C_i\}$ is the censoring indicator. We assume that $(Y_i, \delta_i, X_i)$, $i = 1, \ldots, n$, come from the same distribution.

    Let $\hat F_n$ be the Kaplan-Meier estimator of the distribution function $F$ of $T$. $\hat F_n$ can be written as

    $$\hat F_n(y) = \sum_{i=1}^n w_i I\{Y_{(i)} \le y\},$$

    where the $w_i$'s are the jumps of the Kaplan-Meier estimator, given by $w_1 = \frac{\delta_{(1)}}{n}$ and $w_i = \frac{\delta_{(i)}}{n-i+1}\prod_{j=1}^{i-1}\left(\frac{n-j}{n-j+1}\right)^{\delta_{(j)}}$, $i = 2, \ldots, n$. The $w_i$'s are also called the Kaplan-Meier weights; see, for example, Stute [14]. Here $Y_{(1)} \le \cdots \le Y_{(n)}$ are the order statistics of the $Y_i$'s and $\delta_{(1)}, \ldots, \delta_{(n)}$ are the associated censoring indicators. Similarly, let $X_{(1)}, \ldots, X_{(n)}$ be the covariates associated with the ordered $Y_i$'s. The weighted least squares (WLS) loss function is

    $$\frac{1}{2}\sum_{i=1}^n w_i\left(Y_{(i)} - \alpha - X_{(i)}^T\beta\right)^2. \qquad (2.2)$$

    Let $\bar X_w = \sum_{i=1}^n w_i X_{(i)} / \sum_{i=1}^n w_i$ and $\bar Y_w = \sum_{i=1}^n w_i Y_{(i)} / \sum_{i=1}^n w_i$, and define $X^*_{(i)} = (n w_i)^{1/2}(X_{(i)} - \bar X_w)$ and $Y^*_{(i)} = (n w_i)^{1/2}(Y_{(i)} - \bar Y_w)$. The WLS objective function (2.2) can then be written as

    $$\ell_n(\beta) = \frac{1}{2}\sum_{i=1}^n\left(Y^*_{(i)} - X^{*T}_{(i)}\beta\right)^2.$$
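The quantities above translate directly into code. The following is a minimal sketch (the helper names `km_weights` and `wls_transform` are ours, not the paper's) that computes the Kaplan-Meier weights $w_i$ and the transformed data $Y^*_{(i)}$, $X^*_{(i)}$, assuming NumPy:

```python
import numpy as np

def km_weights(y, delta):
    """Kaplan-Meier weights for the ordered data: w_1 = delta_(1)/n and
    w_i = delta_(i)/(n-i+1) * prod_{j<i} ((n-j)/(n-j+1))^delta_(j)."""
    y, delta = np.asarray(y, float), np.asarray(delta, int)
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = delta[order]
    w = np.empty(n)
    prod = 1.0                                  # running product over j < i
    for i in range(n):                          # 0-based; the formulas use i+1
        w[i] = d[i] / (n - i) * prod
        prod *= ((n - i - 1) / (n - i)) ** d[i]
    return order, w

def wls_transform(y_ord, X_ord, w):
    """Centered, weight-scaled data so that the WLS loss (2.2), profiled over
    the intercept, becomes the plain least squares criterion l_n(beta)."""
    n = len(w)
    sw = w.sum()
    xbar = (w[:, None] * X_ord).sum(axis=0) / sw      # weighted mean of X
    ybar = (w * y_ord).sum() / sw                     # weighted mean of Y
    scale = np.sqrt(n * w)
    return scale * (y_ord - ybar), scale[:, None] * (X_ord - xbar)
```

With no censoring, every $\delta_i = 1$ and each weight reduces to $1/n$, so the criterion reduces to ordinary least squares on centered data.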

    The penalized regression problem has been studied extensively. LASSO, the most popular and most widely studied L1 penalty, can be inconsistent for model selection. The smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP) are two other popular penalties: SCAD is continuous and its estimator has the oracle property, and MCP also performs well in variable selection, with a consistent estimator. The L0 penalty directly penalizes the number of non-zero parameters, but its discontinuity makes it difficult to compute. The seamless-L0 (SELO) penalty, proposed in Dicker et al. [5], was explicitly designed to mimic the L0 penalty and has been shown to possess good theoretical properties.

    We now describe variable selection for the AFT model via SELO. Coordinate descent is used to solve the optimization problem, and the tuning parameter $\lambda$ is chosen by cross-validation. The SELO penalized objective function is

    $$Q(\beta) = \ell_n(\beta) + \sum_{j=1}^p p_{SELO}(\beta_j), \qquad (2.3)$$

    where $p_{SELO}(\beta_j)$ is defined as

    $$p_{SELO}(\beta_j) = p_{SELO,\lambda,\tau}(\beta_j) = \frac{\lambda}{\log(2)}\log\left(\frac{|\beta_j|}{|\beta_j| + \tau} + 1\right),$$

    where $\lambda$ is the tuning parameter. When $\lambda$ is large, SELO shrinks small estimates toward zero. Throughout, $\lambda$ is determined by cross-validation. It is easy to see that when $\tau$ is sufficiently small, $p_{SELO}(\beta_j) \approx \lambda I\{\beta_j \ne 0\}$, which mimics the L0 penalty.
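To make the shape of the penalty concrete, here is a direct transcription of $p_{SELO}$ (a sketch; `p_selo` is our illustrative name):

```python
import numpy as np

def p_selo(beta, lam, tau):
    """SELO penalty: lam/log(2) * log(|beta|/(|beta| + tau) + 1)."""
    b = np.abs(beta)
    return lam / np.log(2.0) * np.log(b / (b + tau) + 1.0)

# With small tau the penalty is exactly 0 at beta = 0 and nearly flat at lam
# away from zero -- mimicking lam * I{beta != 0}.
```

For example, with $\lambda = 1$ and $\tau = 10^{-4}$, `p_selo` returns exactly 0 at $\beta = 0$ and approximately 1 at $\beta = 2$.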

    To minimize (2.3), we use the coordinate descent algorithm. Coordinate descent [21] has been widely used in penalized regression; it optimizes an objective function over a single parameter at a time until convergence is reached. Dicker et al. [5] described this algorithm for obtaining SELO estimators. The algorithm is formulated in terms of the tuning parameter $\lambda$; for a fixed value of $\lambda$, it proceeds in the following steps.

    Algorithm of AFT model with SELO penalty
    Step 1. Initialize $\beta^{(0)}_j = 0$, $j = 1, \ldots, p$.
    Step 2. For the $k$-th iteration, update the coordinates in turn for $i = 1, \ldots, p$: $\tilde\beta^{(k)}_i = \arg\min_{\beta_i} Q(\tilde\beta^{(k)}_1, \ldots, \tilde\beta^{(k)}_{i-1}, \beta_i, \beta^{(k-1)}_{i+1}, \ldots, \beta^{(k-1)}_p)$.
    Step 3. If $\|\tilde\beta^{(k)} - \tilde\beta^{(k-1)}\|$ is sufficiently small or $k$ exceeds a maximum number of iterations, return $\beta^{(k)}$; otherwise set $k$ to $k+1$ and go to Step 2.
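The steps above can be sketched as follows, assuming the transformed data $(Y^*, X^*)$ are available as arrays. Since the single-coordinate SELO update has no simple closed form, this sketch uses a bounded 1-D numerical search per coordinate; `selo_cd` and the search interval are our choices, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def selo_cd(ys, Xs, lam, tau, max_iter=100, tol=1e-6):
    """Coordinate descent for Q(beta) = 0.5*||ys - Xs @ beta||^2 + sum_j p_SELO(beta_j).
    Starts from beta = 0 (Step 1), cycles through coordinates (Step 2),
    and stops when successive iterates are close (Step 3)."""
    n, p = Xs.shape
    beta = np.zeros(p)
    r = ys.copy()                          # residual for beta = 0
    colsq = (Xs ** 2).sum(axis=0)          # ||x_j||^2
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(p):
            if colsq[j] == 0.0:
                continue
            r_j = r + Xs[:, j] * beta[j]   # residual with coordinate j removed
            z = Xs[:, j] @ r_j             # unpenalized optimum is z / colsq[j]

            def q_j(b):                    # 1-D profile of Q in coordinate j
                pen = lam / np.log(2.0) * np.log(abs(b) / (abs(b) + tau) + 1.0)
                return 0.5 * colsq[j] * b * b - z * b + pen

            hw = abs(z) / colsq[j] + 1.0   # search interval around zero
            res = minimize_scalar(q_j, bounds=(-hw, hw), method="bounded")
            b_new = res.x if q_j(res.x) < q_j(0.0) else 0.0
            if abs(b_new) < 1e-8:          # snap negligible values to zero
                b_new = 0.0
            beta[j] = b_new
            r = r_j - Xs[:, j] * b_new
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```

Comparing each candidate against $q_j(0)$ reflects the non-convexity of SELO: zero is always a competitive solution for a coordinate, which is what produces exact sparsity.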

    In this section, we prove the consistency and asymptotic normality of the WLS estimator via SELO under some conditions. Following the notation of Stute [14,15], let $H$ denote the distribution function of $Y$. Under the assumption of independence between $T$ and $C$, $1 - H(y) = (1 - F(y))(1 - G(y))$, where $F$ and $G$ are the distribution functions of $T$ and $C$. Let $\tau_Y$, $\tau_T$ and $\tau_C$ be the endpoints of the supports of $Y$, $T$ and $C$. We put

    $$\tilde F_0(x, y) = \begin{cases} F_0(x, y), & y < \tau_H, \\ F_0(x, \tau_H-) + 1\{\tau_H \in A\} F_0(x, \{\tau_H\}), & y \ge \tau_H. \end{cases}$$

    Now, introduce the following sub-distribution functions:

    $$\tilde H_1(x, y) = P(X \le x, Y \le y, \delta = 1) \quad \text{and} \quad \tilde H_0(y) = P(Y \le y, \delta = 0).$$

    Under random censoring the limit variance becomes much more complicated. Let

    $$\gamma_0(y) = \exp\left\{\int_0^{y-} \frac{\tilde H_0(dz)}{1 - H(z)}\right\}, \qquad \gamma_1(y) = \frac{1}{1 - H(y)}\iint 1_{\{y < w\}}\,(w - x^T\beta)\,x_j\,\gamma_0(w)\,\tilde H_1(dx, dw)$$

    and

    $$\gamma_2(y) = \iiint \frac{1_{\{v < y,\, v < w\}}\,(w - x^T\beta)\,x_j\,\gamma_0(w)}{[1 - H(v)]^2}\,\tilde H_0(dv)\,\tilde H_1(dx, dw).$$

    We assume that

    (A1) (a) $E[(Y - X^T\beta)^2 XX^T\delta] < \infty$; (b) $\int |(w - x^T\beta)x_j|\, D^{1/2}(w)\,\tilde F_0(dx, dw) < \infty$ for $j = 1, \ldots, p$, where $D(y) = \int_0^{y-}[(1 - H(w))(1 - G(w))]^{-1} G(dw)$.

    (A2) $\lambda = O(1)$, $\tau = O(p^{1/2}/n^{3/2})$, $\lambda\sqrt{n/p} \to \infty$ and $p\sigma^2/n \to 0$ as $n \to \infty$.

    (A3) $r \le \lambda_{\min}(E(XX^T)) \le \lambda_{\max}(E(XX^T)) \le R$, where $r$ and $R$ are positive constants.

    (A4) $\lim_{n\to\infty} n^{-1}\max_{1\le i\le n}\sum_{j=1}^p w_i^2 x_{ij}^2 = 0$.

    (A5) $E\left(\left|\epsilon/\sigma\right|^{2 + \frac{2\delta}{1+\delta}}\right) < M$ for some $M < \infty$ and $\delta > 0$.

    Condition (A1) is standard for the consistency proof in Stute [14]. Condition (A2) restricts the sizes of $\lambda$ and $\tau$. Condition (A3) bounds the eigenvalues of $E(XX^T)$ and is used in Theorem 3.1. Conditions (A4) and (A5) are used in the proof of the asymptotic normality of the SELO estimator and relate to the Lindeberg condition of the Lindeberg-Feller CLT.

    Theorem 3.1. Suppose that conditions (A1)–(A5) hold. Then

    (i) $\lim_{n\to\infty} P\left(\{j : \hat\beta_j \ne 0\} = A\right) = 1$, where $A = \{j : \beta^*_j \ne 0\}$ and $\beta^*$ denotes the true parameter.

    (ii) $\sqrt{n}\left(n^{-1}X_A^T W X_A/\sigma^2\right)^{1/2}(\hat\beta_A - \beta^*_A) \xrightarrow{D} \left(\sigma^2 X_A^T W X_A\right)^{-1/2} G_A$, where $G \sim N(0, \Sigma)$ with $\Sigma = \mathrm{Var}\left\{\delta\gamma_0(Y)(Y - X^T\beta)X + (1-\delta)\gamma_1(Y;\beta) - \gamma_2(Y;\beta)\right\}$, and $G_A$ is the sub-vector of $G$ corresponding to $\beta_A$.

    The proof is given in the Appendix.

    Let $T$ be generated from $T = X^T\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$; we set $\sigma = 0.1$. The covariates $X = (X_1, \ldots, X_p)$ are standard normal. The censoring variables are generated as uniform $U(0, C_0)$, independent of the event times, where $C_0$ is chosen to obtain censoring rates of 25% and 40%. We consider two sample sizes, $n = 200$ and $n = 400$. The tuning parameter $\lambda$ is chosen by cross-validation. For each value of $n$, we simulated 1000 independent datasets $\{(y_1, x_1^T), \ldots, (y_n, x_n^T)\}$ and calculated estimates of $\beta$ on each. For each estimator $\hat\beta$, we recorded: the model size $|\hat A|$, where $\hat A = \{j : \hat\beta_j \ne 0\}$; an indicator of whether the true model was selected, $I\{\hat A = A\}$; the false positive rate $|\hat A \setminus A|/|\hat A|$; the false negative rate $|A \setminus \hat A|/(p - |\hat A|)$; and the model error $(\hat\beta - \beta)^T(\hat\beta - \beta)$. The columns labeled "size", "rate", "F+", "F-" and "MSE" report these quantities. Results for SELO, LASSO, SCAD and MCP are summarized in the tables. The tuning parameter is determined by $V$-fold cross-validation with CV score $\sum_{v=1}^V\left[\ell_n(\hat\beta^{(-v)}) - \ell_n^{(-v)}(\hat\beta^{(-v)})\right]$, where $\hat\beta^{(-v)}$ denotes the estimate computed without fold $v$; we set $V = 5$.
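The data-generating mechanism and the reported metrics can be sketched as follows (the function names are illustrative, and `c0` stands in for the $C_0$ that the study calibrates to hit the target censoring rate):

```python
import numpy as np

def simulate_aft(n, beta, sigma=0.1, c0=5.0, rng=None):
    """One dataset from the Section 4 design: T = X beta + eps with
    eps ~ N(0, sigma^2), standard normal covariates, and censoring
    times C ~ U(0, c0) independent of the event times."""
    rng = np.random.default_rng(rng)
    p = len(beta)
    X = rng.standard_normal((n, p))
    T = X @ beta + sigma * rng.standard_normal(n)
    C = rng.uniform(0.0, c0, size=n)
    y = np.minimum(T, C)                  # observed (log-)time
    delta = (T <= C).astype(int)          # censoring indicator
    return y, delta, X

def selection_metrics(beta_hat, beta_true, p):
    """Model size, correct-model indicator, false positive/negative
    rates, and model error, as defined in Section 4."""
    A_hat = set(np.flatnonzero(beta_hat))
    A = set(np.flatnonzero(beta_true))
    size = len(A_hat)
    correct = int(A_hat == A)
    fpos = len(A_hat - A) / max(len(A_hat), 1)
    fneg = len(A - A_hat) / max(p - len(A_hat), 1)
    me = float((beta_hat - beta_true) @ (beta_hat - beta_true))
    return size, correct, fpos, fneg, me
```

Averaging `selection_metrics` over the 1000 replications yields the "size", "rate", "F+", "F-" and "MSE" columns of the tables.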

    This example was conducted with $p = 8$ and $\beta = (3, 1.5, 0, 0, 1, 0, 0, 0) \in \mathbb{R}^8$. Table 1 summarizes the variable selection results for SELO, LASSO, SCAD and MCP at censoring rates of 25% and 40%. Overall, SELO performs better than the other three methods and selects the correct model more frequently. For instance, when the censoring rate is 25% and $n = 200$, the true model size is 3 and the average size from SELO is 3.43, whereas LASSO, SCAD and MCP select models of average size 4.12, 4.05 and 4.04 and select the correct model in 42.9%, 46.7% and 47% of the replications; LASSO performs worst in both model size and correct rate. Similar patterns hold when the censoring rate is 40%, although the results at 25% censoring are clearly better than those at 40%. The results also improve as $n$ increases: SELO selects 3.07 variables on average when $n = 400$, compared with 3.43 when $n = 200$, and the other indicators improve accordingly.

    Table 1.  Simulation results for p=8.
    25% censoring 40% censoring
    n Method size rate F+ F- MSE size rate F+ F- MSE
    200 SELO 3.43 0.568 0.137 0.156 2.61 3.62 0.38 0.267 0.287 2.55
    LASSO 4.12 0.429 0.212 0.428 3.62 4.43 0.338 0.276 0.567 2.72
    SCAD 4.05 0.467 0.198 0.417 2.41 4.36 0.346 0.271 0.537 2.49
    MCP 4.04 0.47 0.196 0.412 2.46 4.33 0.359 0.267 0.531 2.45
    400 SELO 3.07 0.894 0.035 0.029 2.27 3.27 0.589 0.147 0.139 2.43
    LASSO 3.80 0.620 0.136 0.277 3.44 3.96 0.519 0.178 0.373 2.66
    SCAD 3.74 0.641 0.131 0.275 2.20 3.93 0.534 0.171 0.363 2.42
    MCP 3.74 0.641 0.131 0.275 2.24 3.92 0.533 0.170 0.361 2.40


    This example was conducted with $p = 50$ and $\beta = (3, 1.5, 0, 0, 2, 0, 3, 0, 0, 2, 0, \ldots, 0) \in \mathbb{R}^{50}$, with the remaining entries of $\beta$ equal to zero. The other settings were the same as in Simulation I, and the results are listed in Table 2. SELO again outperforms the other three methods. At 25% censoring with $n = 200$, the model size from SELO is 5.68, closest to the true size, compared with 9.64, 9.27 and 8.86 for LASSO, SCAD and MCP, and SELO selects the correct model 29% of the time versus 3%, 5% and 7% for LASSO, SCAD and MCP. The "F+" and "F-" values of SELO are also the smallest among the four methods. The patterns for $p = 50$ resemble those for $p = 8$: the results are worse when the censoring rate is 40%, but when $n$ increases from 200 to 400, SELO selects the correct model more often.

    Table 2.  Simulation results for p=50.
    25% censoring 40% censoring
    n Method size rate F+ F- MSE size rate F+ F- MSE
    200 SELO 5.68 0.29 0.194 0.028 3.05 6.27 0.27 0.267 0.287 2.55
    LASSO 9.64 0.03 0.416 0.127 5.32 9.27 0.08 0.276 0.567 2.72
    SCAD 9.27 0.05 0.396 0.116 5.19 8.91 0.12 0.271 0.537 2.49
    MCP 8.86 0.07 0.374 0.104 5.88 8.42 0.10 0.267 0.531 2.45
    400 SELO 5.09 0.52 0.138 0.015 5.00 5.48 0.29 0.245 0.031 2.805
    LASSO 9.2 0.03 0.396 0.124 5.18 9.42 0.07 0.405 0.116 3.356
    SCAD 8.66 0.10 0.357 0.101 5.20 8.99 0.09 0.375 0.103 2.830
    MCP 8.56 0.13 0.347 0.099 5.53 8.70 0.11 0.357 0.095 2.662


    This example is run under 25% and 40% censoring, and we also report the mean estimated variance across the 1000 simulated datasets for $n = 200$ and $n = 400$. From Tables 3 and 4, SELO with $\tau$ tuned over $\{0.001, 0.01, 0.1, 0.5\}$ gives smaller variances than SELO with $\tau = 0.001$ fixed. The estimates remain stable at both censoring rates, and the results improve when $n$ increases to 400.

    Table 3.  Variance of SELO estimator under 25% censoring.
    n τ β1 β2 β3 β4 β5 β6 β7 β8
    200 0.001 0.322 0.254 0.103 0.104 0.270 0.103 0.09 0.11
    {0.001,0.01,0.1,0.5} 0.319 0.204 0.011 0.004 0.940 0.01 0.02 0.001
    400 0.001 0.152 0.125 0.045 0.043 0.135 0.043 0.03 0.04
    {0.001,0.01,0.1,0.5} 0.145 0.120 0.009 0.026 0.136 0.005 0.02 0.02

    Table 4.  Variance of SELO estimator under 40% censoring.
    n τ β1 β2 β3 β4 β5 β6 β7 β8
    200 0.001 0.319 0.376 0.041 0.032 0.326 0.037 0.03 0.32
    {0.001,0.01,0.1,0.5} 0.316 0 0.010 0.007 0.007 0.002 0.003 0.012
    400 0.001 0.157 0.322 0.003 0.003 0.162 0.003 0.015 0.021
    {0.001,0.01,0.1,0.5} 0.154 0.077 0.001 0 0.098 0 0 0.002


    The PBC data were collected in the Mayo Clinic trial of primary biliary cirrhosis of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to the Mayo Clinic during that ten-year interval, met the eligibility criteria for the randomized placebo-controlled trial of the drug D-penicillamine. The first 312 cases in the dataset participated in the randomized trial and have complete data; the additional 112 cases did not participate in the clinical trial. After deleting observations with missing data, the remaining 276 cases are used for the analysis. We consider 17 covariates: age, albumin, alk.phos, ascites, ast, bili, chol, copper, platelet, edema, hepato, protime, sex, spiders, stage, trt and trig.

    Using the 276 complete cases, we obtained Table 5. The optimal value of $\lambda$ for SELO, chosen by CV, is small (0.008), while the optimal values of $\lambda$ chosen by CV for LASSO, SCAD and MCP are 0.011. LASSO selects 6 variables (sex, hepato, bili, albumin, protime and stage), SCAD and MCP select the same 5 variables, and SELO selects 3 variables (sex, albumin and protime). The variables selected by SELO are contained in the selections of the other three methods. We also compute the AIC (Akaike Information Criterion), $AIC = n\log(\hat\sigma^2) + 2(d+1)$, where $d$ is the number of non-zero parameters and $\hat\sigma^2$ is the estimated error variance; SELO attains the smallest AIC among the four methods. In Table 6, we also report $p$-values for the coefficients of the variables selected by SELO. The $p$-values were computed in several steps. First, we calculated the $t$-statistic $t_k = \hat\beta_k/s$, where $s$ is the standard error of $\hat\beta_k$. Second, we found the critical value $t_{\alpha/2}(n-K)$ with $\alpha = 0.05$ and $n-K$ degrees of freedom. Finally, we computed the two-sided $p$-value $P(|T| > |t_k|)$.
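The model-comparison quantities used above can be sketched as follows, assuming SciPy (`aic` and `two_sided_p` are our illustrative names, and `s` must be supplied as the standard error of the coefficient):

```python
import numpy as np
from scipy import stats

def aic(n, sigma2_hat, d):
    """AIC = n*log(sigma_hat^2) + 2*(d+1), with d the number of
    non-zero coefficients."""
    return n * np.log(sigma2_hat) + 2 * (d + 1)

def two_sided_p(beta_hat_k, s, n, K):
    """Two-sided p-value for H0: beta_k = 0, based on t_k = beta_hat_k / s
    with n - K degrees of freedom."""
    t_k = beta_hat_k / s
    return 2.0 * stats.t.sf(abs(t_k), df=n - K)
```

In this analysis one would take $n = 276$ and $K = 17$, so the reference distribution is a $t$ with 259 degrees of freedom.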

    Table 5.  PBC data: Estimated coefficients and selected variables.
    Method Model size R2 Covariate AIC
    SELO 3 0.349 sex, albumin, protime 68.72
    LASSO 6 0.249 sex, hepato, bili, albumin, protime, stage 85.76
    SCAD 5 0.299 sex, hepato, bili, albumin, protime 75.48
    MCP 5 0.299 sex, hepato, bili, albumin, protime 75.48

    Table 6.  PBC data: Significance test and p value.
    variable coefficient p-value
    sex 0.624 $1.53\times10^{-11}$
    albumin 0.341 $1.2\times10^{-4}$
    protime 0.210 $1.004\times10^{-4}$
    hepato -0.045 0.217
    bili -0.068 0.116
    stage -0.036 0.255


    Table 6 reports the significance test results. The $p$-values of sex, albumin and protime are all less than 0.05, so every variable selected by SELO is significant. Overall, SELO selects a simpler model than the other three methods.

    Statistical analysis of failure times with high-dimensional covariates is an important topic. In this article, we investigate a new method (SELO) for the AFT model with high-dimensional covariates that performs simultaneous variable selection and estimation. A real dataset (PBC) is analyzed and SELO selects several important covariates. Our numerical results indicate that SELO performs better than the other three methods. We address the situation where $p < n$ and prove the oracle property under the condition $p/n \to 0$: both $n$ and $p$ may diverge, but $p$ grows more slowly than $n$. The situation where $p$ is much larger than $n$ is left for future research.

    The authors would like to thank the editors and the three reviewers for their valuable and helpful comments. The authors also thank Guangren Yang and Yiming Liu for their advice on revising the paper.

    The authors declare no conflict of interest.

    In order to complete the proof of Theorem 3.1 (i), we first establish two lemmas.

    Lemma 1. Recall that

    $$Q(\beta) = \frac{1}{2}\sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta\right)^2 + \sum_{j=1}^p P_{SELO}(\beta_j). \qquad (A.1)$$

    Then for every r(0,1), there exists a constant C0>0 such that

    $$\liminf_{n\to\infty} P\left[\operatorname*{arg\,min}_{\|\beta - \beta^*\| \le C\sqrt{p\sigma^2/n}} Q_n(\beta) \subseteq \left\{\beta \in \mathbb{R}^p : \|\beta - \beta^*\| < C\sqrt{\frac{p\sigma^2}{n}}\right\}\right] > 1 - r$$

    whenever $C \ge C_0$.

    Proof. Let $\alpha_n = \sqrt{p\sigma^2/n}$ and fix $r \in (0, 1)$. To prove Lemma 1, it suffices to show that if $C > 0$ is large enough, then

    $$P\left\{\sup_{\|u\|=1} Q(\beta^* + C\alpha_n u) > Q(\beta^*)\right\} \ge 1 - r.$$

    Furthermore, define $Q_n(u) = Q(\beta^* + C\alpha_n u) - Q(\beta^*)$. Then,

    $$\begin{aligned} Q_n(u) &= \frac{1}{2}C^2\alpha_n^2 u^T\left[\sum_{i=1}^n w_i X_{(i)}X_{(i)}^T\right]u - \sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T C\alpha_n u + \sum_{j=1}^p\left[P_{SELO}(\beta^*_j + C\alpha_n u_j) - P_{SELO}(\beta^*_j)\right] \\ &\ge \frac{1}{2}C^2\alpha_n^2 u^T\left[\sum_{i=1}^n w_i X_{(i)}X_{(i)}^T\right]u - \sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T C\alpha_n u + \sum_{j\in K(u)}\left[P_{SELO}(\beta^*_j + C\alpha_n u_j) - P_{SELO}(\beta^*_j)\right]. \end{aligned}$$

    By the results of Stute [14,15], we have

    $$\sum_{i=1}^n w_i X_{(i)}X_{(i)}^T \xrightarrow{P} E(XX^T), \quad \text{and} \quad \sqrt{n}\sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T \xrightarrow{D} W,$$

    where $W \sim N(0, \Sigma)$, with $\Sigma$ defined in the theorem. In the last term of $Q_n(u)$, $K(u) = \{j : P_{SELO}(\beta^*_j + C\alpha_n u_j) - P_{SELO}(\beta^*_j) < 0\}$. The fact that $P_{SELO}$ is concave on $[0, \infty)$ implies that, for each $\beta^*$, $P_{SELO}(\beta^*_j + C\alpha_n u_j) - P_{SELO}(\beta^*_j) \ge -C\alpha_n|u_j|\, P'_{SELO}(|\beta^*_j + C\alpha_n u_j|)$.

    $$\begin{aligned} Q_n(u) &\ge \frac{1}{2}C^2\alpha_n^2 u^T\left[\sum_{i=1}^n w_i X_{(i)}X_{(i)}^T\right]u - \sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T C\alpha_n u - C\alpha_n\sum_{j\in K(u)}|u_j|\,P'_{SELO}(|\beta^*_j + C\alpha_n u_j|) \\ &= \frac{1}{2}C^2\alpha_n^2 u^T\left[\sum_{i=1}^n w_i X_{(i)}X_{(i)}^T\right]u - \sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T C\alpha_n u - C\alpha_n\sum_{j\in K(u)}|u_j|\,\frac{\lambda}{\log(2)}\,\frac{\tau}{(2|\beta^*_j| + \tau)(|\beta^*_j| + \tau)} \\ &\triangleq I_1 + I_2 + I_3. \end{aligned}$$

    Under conditions (A1) and (A3), for I1,

    $$\sum_{i=1}^n w_i X_{(i)}X_{(i)}^T \xrightarrow{P} E(XX^T), \quad \text{and} \quad I_1 \ge \frac{1}{2}C^2\alpha_n^2\, r.$$

    For I2,

    $$\sqrt{n}\sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T \xrightarrow{D} N(0, \Sigma), \quad \text{and} \quad I_2 = O_p\left(\frac{C\alpha_n}{\sqrt{n}}\right).$$

    For $I_3$, by condition (A2), we have $I_3 = o_p(C\alpha_n)$. We conclude that if $C > 0$ is large enough, then $\inf_{\|u\|=1} Q_n(u) > 0$ holds for all $n$ sufficiently large, with probability at least $1 - r$. This finishes the proof of Lemma 1.

    Lemma 2. Let $C > 0$ and let $Q(\beta)$ be as in Lemma 1. Under conditions (A1)–(A3),

    $$\lim_{n\to\infty} P\left[\operatorname*{arg\,min}_{\|\beta - \beta^*\| \le C\sqrt{p\sigma^2/n}} Q_n(\beta) \subseteq \left\{\beta \in \mathbb{R}^p : \beta_{A^c} = 0\right\}\right] = 1,$$

    where $A^c = \{1, \ldots, p\} \setminus A$ is the complement of $A$ in $\{1, \ldots, p\}$.

    Proof. Suppose that $\beta \in \mathbb{R}^p$ and that $\|\beta - \beta^*\| < C\alpha_n$. Define $\tilde\beta \in \mathbb{R}^p$ by $\tilde\beta_{A^c} = 0$ and $\tilde\beta_A = \beta_A$. Similarly to the proof of Lemma 1, if $D_n(\beta, \tilde\beta) = Q_n(\beta) - Q_n(\tilde\beta)$, then

    $$\begin{aligned} D_n(\beta, \tilde\beta) &= \frac{1}{2}\sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta\right)^2 - \frac{1}{2}\sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\tilde\beta\right)^2 + \sum_{j\in A^c} P_{SELO}(\beta_j) \\ &= \frac{1}{2}\sum_{i=1}^n w_i(\beta - \tilde\beta)^T X_{(i)}X_{(i)}^T(\beta - \tilde\beta) - \sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\tilde\beta\right)X_{(i)}^T(\beta - \tilde\beta) + \sum_{j\in A^c} P_{SELO}(\beta_j) \\ &\triangleq I_1 + I_2 + I_3. \end{aligned}$$

    For I1 and I2, under the conditions (A1) and (A3),

    $$I_1 + I_2 = O_p\left(\|\beta - \tilde\beta\|\sqrt{\frac{p\sigma^2}{n}}\right).$$

    For $I_3$, since $P_{SELO}$ is concave and $\|\beta - \beta^*\| < C\sqrt{p\sigma^2/n}$ implies $|\beta_j| < C\sqrt{p\sigma^2/n}$ for $j \in A^c$, we have

    $$\sum_{j\in A^c} P_{SELO}(\beta_j) > \frac{\lambda}{\log(2)}\,\frac{\tau}{\left(2C\sqrt{p\sigma^2/n} + \tau\right)\left(C\sqrt{p\sigma^2/n} + \tau\right)}\,\|\beta - \tilde\beta\| > 0.$$

    Under condition (A2), combining the bounds on $I_1$, $I_2$ and $I_3$ gives $D_n(\beta, \tilde\beta) > 0$ with probability tending to one.

    Combining Lemmas 1 and 2 yields the conclusion of Theorem 3.1 (i).

    Proof of Theorem 3.1 (ii). The proof verifies the Lindeberg condition of the Lindeberg-Feller CLT. Assume conditions (A1)–(A5) hold, and let $\hat\beta_A$ be the estimator restricted to $A = \{j : \hat\beta_j \ne 0\}$.

    It is straightforward to obtain

    $$\hat\beta_A = \beta^*_A + \left(X_A^T W X_A\right)^{-1}X_A^T W\epsilon - \left(X_A^T W X_A\right)^{-1} p'_A(\hat\beta)$$

    and

    $$\sqrt{n}\left(\frac{n^{-1}X_A^T W X_A}{\sigma^2}\right)^{1/2}(\hat\beta_A - \beta^*_A) = \left(\sigma^2 X_A^T W X_A\right)^{-1/2}X_A^T W\epsilon - \left(\sigma^2 X_A^T W X_A\right)^{-1/2} p'_A(\hat\beta).$$

    To prove that $\left(\sigma^2 X_A^T W X_A\right)^{-1/2}X_A^T W\epsilon \xrightarrow{D} N(0, G)$, decompose

    $$\left(\sigma^2 X_A^T W X_A\right)^{-1/2}X_A^T W\epsilon = \sum_{i=1}^n w_{i,n},$$

    where $w_{i,n} = \left(\sigma^2 X_A^T W X_A\right)^{-1/2} w_i x_{(i),A}\,\epsilon_i$. Let $\eta_{i,n} = w_i x_{(i),A}^T\left(X_A^T W X_A\right)^{-1/2}\left(X_A^T W X_A\right)^{-1/2} w_i x_{(i),A} = w_i^2\, x_{(i),A}^T\left(X_A^T W X_A\right)^{-1} x_{(i),A}$.

    To verify the Lindeberg condition of the Lindeberg-Feller CLT, we compute

    $$E\left[\|w_{i,n}\|^2;\, \|w_{i,n}\|^2 > \delta_0\right] = \eta_{i,n}\, E\left[\frac{\epsilon_i^2}{\sigma^2};\, \eta_{i,n}\frac{\epsilon_i^2}{\sigma^2} > \delta_0\right] = \eta_{i,n}\int_{\{\eta_{i,n}\epsilon_i^2/\sigma^2 > \delta_0\}} \frac{\epsilon_i^2}{\sigma^2}\, dF.$$

    By the Hölder inequality, with $\frac{1}{p} = \frac{2}{2+\delta}$ and $\frac{1}{q} = \frac{\delta}{\delta+2}$,

    $$\begin{aligned} E\left[\|w_{i,n}\|^2;\, \|w_{i,n}\|^2 > \delta_0\right] &\le \eta_{i,n}\left(\int_{\{\eta_{i,n}\epsilon_i^2/\sigma^2 > \delta_0\}} \left|\frac{\epsilon_i}{\sigma}\right|^{2p} dF\right)^{1/p}\left(\int_{\{\eta_{i,n}\epsilon_i^2/\sigma^2 > \delta_0\}} 1^q\, dF\right)^{1/q} \\ &= \eta_{i,n}\left(\int_{\{\eta_{i,n}\epsilon_i^2/\sigma^2 > \delta_0\}} \left|\frac{\epsilon_i}{\sigma}\right|^{2+\delta} dF\right)^{\frac{2}{2+\delta}} P\left\{\eta_{i,n}\frac{\epsilon_i^2}{\sigma^2} > \delta_0\right\}^{\frac{\delta}{2+\delta}} \\ &\le \eta_{i,n}\, E\left(\left|\frac{\epsilon_i}{\sigma}\right|^{2+\delta}\right)^{\frac{2}{2+\delta}} P\left\{\eta_{i,n}\frac{\epsilon_i^2}{\sigma^2} > \delta_0\right\}^{\frac{\delta}{2+\delta}}. \end{aligned}$$

    By Markov inequality,

    $$P\left\{\eta_{i,n}\frac{\epsilon_i^2}{\sigma^2} > \delta_0\right\}^{\frac{\delta}{2+\delta}} \le \left(\frac{\eta_{i,n}}{\delta_0}\, E\left\{\frac{\epsilon_i^2}{\sigma^2}\right\}\right)^{\frac{\delta}{2+\delta}} = \delta_0^{-\frac{\delta}{2+\delta}}\, \eta_{i,n}^{\frac{\delta}{2+\delta}}\, E\left(\left|\frac{\epsilon_i}{\sigma}\right|^2\right)^{\frac{\delta}{2+\delta}}$$

    and

    $$E\left[\|w_{i,n}\|^2;\, \|w_{i,n}\|^2 > \delta_0\right] \le \eta_{i,n}^{1+\frac{\delta}{2+\delta}}\, \delta_0^{-\frac{\delta}{\delta+2}}\, E\left(\left|\frac{\epsilon}{\sigma}\right|^{\frac{4\delta+4}{2+\delta}}\right).$$

    We have shown that $\sum_{i=1}^n \eta_{i,n} = \sum_{i=1}^n w_i$ and $\eta_{i,n} \le \left\|\left(n^{-1}X_A^T W X_A\right)^{-1}\right\|_2 \max_{1\le i\le n}\sum_{j=1}^q \frac{1}{n}\, w_i^2 x_{ij}^2$, so that

    $$\sum_{i=1}^n E\left[\|w_{i,n}\|^2;\, \|w_{i,n}\|^2 > \delta_0\right] \le \delta_0^{-\frac{\delta}{2+\delta}}\, E\left(\left|\frac{\epsilon}{\sigma}\right|^{2+\frac{2\delta}{1+\delta}}\right)\sum_{i=1}^n w_i^2 x_{ij}^2\, \max_{1\le i\le n}\eta_{i,n}^{\frac{\delta}{2+\delta}}.$$

    By conditions (A4) and (A5), $\sum_{i=1}^n E\left[\|w_{i,n}\|^2;\, \|w_{i,n}\|^2 > \delta_0\right] \to 0$.

    By conditions (A1)–(A3), $\left(\sigma^2 X_A^T W X_A\right)^{-1/2} p'_A(\hat\beta) = o_p(1)$,

    and

    $$\sqrt{n}\sum_{i=1}^n w_i\left(Y_{(i)} - X_{(i)}^T\beta^*\right)X_{(i)}^T \xrightarrow{D} N(0, \Sigma),$$

    where $\Sigma = \mathrm{Var}\left\{\delta\gamma_0(Y)(Y - X^T\beta)X + (1-\delta)\gamma_1(Y;\beta) - \gamma_2(Y;\beta)\right\}$.

    Therefore, $\left(\sigma^2 X_A^T W X_A\right)^{-1/2}X_A^T W\epsilon \xrightarrow{D} \left(\sigma^2 X_A^T W X_A\right)^{-1/2} G_A$, where $G \sim N(0, \Sigma)$. This completes the proof.



    [1] J. Buckley, I. James, Linear regression with censored data, Biometrika, 66 (1979), 429–436. https://doi.org/10.1093/biomet/66.3.429
    [2] D. R. Cox, Regression models and life-tables (with discussion), J. Roy. Stat. Soc. Ser. B, 34 (1972), 187–220. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x
    [3] H. Chai, Q. Z. Zhang, J. Huang, S. G. Ma, Inference for low-dimensional covariates in a high-dimensional accelerated failure time model, Stat. Sinica, 29 (2019), 877–894. https://doi.org/10.5705/ss.202016.0449
    [4] T. Choi, S. Choi, A fast algorithm for the accelerated failure time model with high-dimensional time-to-event data, J. Stat. Comput. Simul., 91 (2021), 3385–3403. https://doi.org/10.1080/00949655.2021.1927034
    [5] L. Dicker, B. S. Huang, X. H. Lin, Variable selection and estimation with the seamless-L0 penalty, Stat. Sinica, 23 (2013), 929–962. https://doi.org/10.5705/ss.2011.074
    [6] J. Q. Fan, R. Z. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96 (2001), 1348–1360. https://doi.org/10.1198/016214501753382273
    [7] J. Huang, S. G. Ma, H. L. Xie, Regularized estimation in the accelerated failure time model with high-dimensional covariates, Biometrics, 62 (2006), 813–820. https://doi.org/10.1111/j.1541-0420.2006.00562.x
    [8] J. Huang, S. G. Ma, Variable selection in the accelerated failure time model via the bridge method, Lifetime Data Anal., 16 (2010), 176–195. https://doi.org/10.1007/s10985-009-9144-2
    [9] S. M. Hu, J. S. Rao, Sparse penalization with censoring constraints for estimating high dimensional AFT models with applications to microarray data analysis, Technical report, University of Miami, 2010.
    [10] J. D. Kalbfleisch, R. L. Prentice, The statistical analysis of failure time data, 2nd ed., John Wiley & Sons, New Jersey, 2011.
    [11] Y. D. Kim, H. Choi, H. S. Oh, Smoothly clipped absolute deviation on high dimensions, J. Am. Stat. Assoc., 103 (2008), 1665–1673. https://doi.org/10.1198/016214508000001066
    [12] Y. Li, M. X. Liang, L. Mao, S. J. Wang, Robust estimation and variable selection for the accelerated failure time model, Stat. Med., 40 (2021), 4473–4491. https://doi.org/10.1002/sim.9042
    [13] Y. Ritov, Estimation in a linear regression model with censored data, Ann. Stat., 18 (1990), 303–328. https://doi.org/10.1214/aos/1176347502
    [14] W. Stute, Consistent estimation under random censorship when covariables are present, J. Multivariate Anal., 45 (1993), 89–103. https://doi.org/10.1006/jmva.1993.1028
    [15] W. Stute, Distributional convergence under random censorship when covariables are present, Scand. J. Stat., 23 (1996), 461–471.
    [16] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Stat. Soc. B, 58 (1996), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
    [17] A. A. Tsiatis, Estimating regression parameters using linear rank tests for censored data, Ann. Stat., 18 (1990), 354–372. https://doi.org/10.1214/aos/1176347504
    [18] X. G. Wang, L. X. Song, Adaptive Lasso variable selection for the accelerated failure models, Commun. Stat.-Theor. M., 40 (2011), 4372–4386. https://doi.org/10.1080/03610926.2010.513785
    [19] H. Zou, The adaptive lasso and its oracle properties, J. Am. Stat. Assoc., 101 (2006), 1418–1429. https://doi.org/10.1198/016214506000000735
    [20] C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Ann. Stat., 38 (2010), 894–942. https://doi.org/10.1214/09-AOS729
    [21] W. J. Fu, Penalized regressions: The bridge versus the lasso, J. Comput. Graph. Stat., 7 (1998), 397–416.
    [22] M. H. R. Khan, J. E. H. Shaw, Variable selection for survival data with a class of adaptive elastic net techniques, Stat. Comput., 26 (2016), 725–741. https://doi.org/10.1007/s11222-015-9555-8
  • This article has been cited by:

    1. Gabriela Ciuperca, Right-censored models by the expectile method, Lifetime Data Anal., 2025. https://doi.org/10.1007/s10985-024-09643-w
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)