Research article

Cluster validity indices for mixture hazards regression models

In the analysis of survival data, the problem of competing risks arises frequently in medical applications where individuals fail from multiple causes. Semiparametric mixture regression models have become a prominent approach in competing risks analysis due to their flexibility and the easy interpretation of the resultant estimates. The literature presents several semiparametric methods for estimating mixture Cox proportional hazards models, but fewer works address determining the number of model components or estimating baseline hazard functions with kernel approaches. These two issues are important because both an incorrect number of components and an inappropriate baseline function can lead to inadequate estimates of mixture Cox hazard models. This research thus proposes four validity indices to select the optimal number of model components based on the posterior probabilities and residuals resulting from the application of an EM-based algorithm to a mixture Cox regression model. We also introduce a kernel approach to produce a smooth estimate of the baseline hazard function in a mixture model. The effectiveness and relative performance of the proposed cluster indices are demonstrated through a simulation study. An analysis of a prostate cancer dataset illustrates the practical use of the proposed method.

Citation: Yi-Wen Chang, Kang-Ping Lu, Shao-Tung Chang. Cluster validity indices for mixture hazards regression models. Mathematical Biosciences and Engineering, 2020, 17(2): 1616-1636. doi: 10.3934/mbe.2020085



Survival analysis is a branch of statistics for analyzing time-to-event data. When looking into survival data, one frequently encounters the problem of competing risks, in which samples are subject to multiple kinds of failure. The Cox proportional hazards model, introduced by Cox [1], is popular in survival analysis for describing the relationship between the distributions of survival times and covariates, and it is commonly employed to analyze cause-specific survival data. The traditional approach is to fit a Cox proportional hazards model separately to the data for each failure type, treating the data with other kinds of failure as censored. However, this conventional method has drawbacks: the estimates are hard to interpret and the confidence bands of the estimated hazards are wide, because the method does not model all failure types jointly [2,3].

An alternative approach is to fit competing risks data with a mixture model that incorporates the distinct failure types to partition the population into groups; it assumes an individual fails from each risk with probability equal to the corresponding group proportion. Moreover, the mixture approach is helpful for estimating the effects of covariates in each group through proportional hazards regressions such as Cox's model. McLachlan and Peel [4] noted that a mixture model allows for both dependent and independent competing risks and can fit the data better than the traditional approach, in which the causes of failure are assumed to be independent. Mixture models are popular in competing risks analysis because their resultant estimates are easy to interpret [2], even though the models themselves are complex.

Semiparametric mixture models generalize parametric mixture models and have become a prominent approach for modelling data with competing risks. Semiparametric mixture models are preferable for their ability to adjust for associated variables and to assess the effects of these variables both on the probabilities of the eventual causes of failure, through a logistic model, and on the relevant conditional hazard functions, by applying the Cox proportional hazards model (cf. [2]). Below, we review existing semiparametric mixture-model methods for competing risks data.

Ng and McLachlan [5] proposed an ECM-based semiparametric mixture method that analyzes competing risks data without specifying the baseline hazard function. They noted that when the component-baseline hazard is not monotonically increasing, their semiparametric approach consistently produces less biased estimates than fully parametric approaches; when it is monotonically increasing, the two approaches show comparable efficiency in parameter estimation under mild and moderate censoring. Chang et al. [6] studied nonparametric maximum-likelihood estimators through a semiparametric mixture model for competing risks data. Their model accommodates right-censored data and can provide estimates of quantities such as a covariate-specific fatality rate or a covariate-specific expected time length. Moreover, Lu and Peng [7] set up a semiparametric mixture regression model to analyze competing risks data under the ordinary mechanism of conditionally independent censoring. Choi and Huang [8] offered a maximum likelihood scheme for semiparametric mixture models that yields efficient and reliable estimates of the cumulative hazard function. One advantage of their approach is that the joint estimation of the model parameters connects all considered competing risks under the constraint that, given any covariates, the probabilities of failing from the respective causes sum to 1.

Although the mixture hazard model is preferable to the traditional approach, two important but challenging issues frequently encountered in applications are determining the number of risk types and identifying the failure type of each individual.

It is understandable that the results of a mixture model analysis depend heavily on the number of components. It is also conceivably hard to cover all types of competing risks in a mixture model. Validity indices are a vital technique in model selection. A cluster validity index is a criterion function for determining the optimal number of clusters. Some cluster validity indices presented in [9,10,11] are designed to find an optimal cluster number for fuzzy clustering algorithms; some depend only on the memberships, while others also take into account the distances between data points and cluster centers. Wu et al. [12] proposed median-type validity indices, which are robust to noise and outliers. Zhou et al. [13] introduced a weighted-summation type of validity index for fuzzy clustering, but these indices are infeasible for mixture regression models. Conversely, Henson et al. [14] evaluated the ability of several statistical criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), to produce a proper number of components for latent variable mixture modeling. However, AIC and BIC may over- and under-estimate the number of components, respectively [15].

As to the identification of failure types, many studies on competing risks problems, such as [5,6,7,8], assume that the failure type of an individual is known if the subject's failure time is observed; but if an individual is censored and only the censoring time is known, then the failure type is unknown. In fact, even if one observes the failure time, the true cause of failure might not be clear and may need further investigation. Thus, deciding the number of competing risks and recognizing the failure type of each individual are critical in competing risks analysis, but scant work has been done on them.

Besides the above problems, another critical issue specific to mixture Cox hazard models is the estimation of the baseline hazard function. The Cox proportional hazards model consists of two parts: the baseline hazard function and the proportional regression model. Bender et al. [16] assumed that the baseline hazard function follows a specific lifetime distribution, but this assumption is obviously restrictive; a single lifetime distribution may not adequately explain all data, for example when the failure rate is not monotonically increasing or decreasing. Alternatively, some scholars adopted more flexible nonparametric approaches to estimate the baseline hazard function. Ng and McLachlan [5] assumed the baseline hazard function to be piecewise constant, treating each observed survival time as a cut-off point, but the piecewise-constant assumption has the disadvantage that the estimated curve is not smooth, while smoothness is required in several applications [17]. In fact, our simulation results also show that the estimates derived under a piecewise-constant hazard function are insufficient in some cases (e.g. model M4 in Figure 4). Understandably, an inadequate estimate of the baseline function affects the selection of the number of model components and hence leads to insufficient estimates of the model parameters.

    Figure 1.  Plot of different types of kernel function.
Figure 2.  The scatter plot of the observed data and the expected survival time E(T) from a non-mixture model (a), a mixture model with two types of risk (b), and one with three types of risk (c). Note that ER denotes the risk type used for estimation.
    Figure 3.  The scatter plot of a sample data set for models M1~M4 respectively.
Figure 4.  Plots of a single simulated series under models M1~M4; E(T)Ri: expected survival time for the ith estimated risk type; CBH: cumulative baseline hazard rate; RRi: true cumulative baseline hazard function for the ith risk type; (Mi-j): model Mi is fitted; j = 1, 3: scatter plots of the observed data and the estimated expectation curves; j = 2, 4: plots of the estimated cumulative baseline hazard functions; j = 1, 2: estimated under the piecewise-constant assumption; j = 3, 4: estimated by the kernel method.

In order to solve the above-mentioned problems with mixture Cox hazard modelling for competing risks data, we propose four validity indices and a kernel estimator for the baseline function in this paper. Validity indices are a vital technique in model selection, but they have been little used for deciding the number of components of a mixture regression model. By using posterior probabilities and residual functions, we propose four validity indices that are applicable to regression models. Under the EM-based mixture model, the posterior probabilities play an important role in classifying data, taking the role that data memberships play in fuzzy clustering. Unlike the traditional regression model, the survival model does not meet the assumption that the variation of the survival time is constant across covariates. Therefore, we combine functions of the posterior probabilities with sums of standardized residuals to construct the new validity indices and verify their effectiveness through extensive simulations. Moreover, we extend the kernel method of Guilloux et al. [18] to estimate the baseline hazard function smoothly and hence more accurately.

    The remainder of this paper is organized as follows. Section 2 introduces the mixture Cox proportional hazards regression model, develops an EM-based algorithm to estimate the model parameters, and also discusses kernel estimations for the baseline hazard function. Section 3 constructs four validity indices for selecting the number of model components in a mixture Cox proportional hazards regression model. Section 4 carries out several simulations and assesses the effectiveness of our validity indices. Section 5 analyzes a practical data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Finally, Section 6 states conclusions and a discussion.

For mixture model analysis, suppose each member of a population can be categorized into $g$ mutually exclusive clusters according to its failure type. Let $D = \{(t_j, \mathbf{X}_j^T, \delta_j) : j = 1, \dots, n\}$ be a sample drawn from this population, where $T$ denotes the transpose of a vector, $t_j$ is the failure or right-censoring time, $\mathbf{X}_j = (x_{j1}, x_{j2}, \dots, x_{jd})^T$ is a $d$-dimensional vector of covariates, and:

$\delta_j = \begin{cases} 1, & \text{if the } j\text{th individual is uncensored}, \\ 0, & \text{if the } j\text{th individual is censored}. \end{cases}$

    The mixture probability density function (pdf) of t is defined by:

$f(t) = \sum_{i=1}^{g} p_i f_i(t)$, subject to $\sum_{i=1}^{g} p_i = 1$, (1)

where $p_i$ is the mixing probability of failure due to the ith type of risk and $g$ is the number of model components.

In the ith component, the hazard function $h_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i)$ given covariate $\mathbf{X}_j$ follows a Cox proportional hazards model defined by

$h_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = h_{0i}(t)\exp(\mathbf{X}_j^T \boldsymbol{\beta}_i)$, (2)

where $\boldsymbol{\beta}_i = (\beta_{i1}, \beta_{i2}, \dots, \beta_{id})^T$ is the vector of regression coefficients and $h_{0i}(t)$ is the baseline hazard function of the ith component. We define the ith component survival function and pdf by:

$S_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = \exp[-H_{0i}(t)\exp(\mathbf{X}_j^T \boldsymbol{\beta}_i)]$

    and

$f_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = h_{0i}(t)\exp[\mathbf{X}_j^T \boldsymbol{\beta}_i - H_{0i}(t)\exp(\mathbf{X}_j^T \boldsymbol{\beta}_i)]$,

where $H_{0i}(t) = \int_0^t h_{0i}(s)\,ds$ is the ith component cumulative baseline hazard function.
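These relations translate directly into code. Below is a minimal numerical sketch (NumPy) of one component's survival function and density, using a hypothetical Weibull baseline $h_0(t) = \lambda\rho t^{\rho-1}$ with illustrative parameter values; the function names are ours, not from the paper.

```python
import numpy as np

def component_survival(t, x, beta, H0):
    """S_i(t|x) = exp(-H0(t) * exp(x^T beta)) for one mixture component."""
    return np.exp(-H0(t) * np.exp(x @ beta))

def component_pdf(t, x, beta, h0, H0):
    """f_i(t|x) = h0(t) * exp(x^T beta) * S_i(t|x), i.e. hazard times survival."""
    return h0(t) * np.exp(x @ beta) * component_survival(t, x, beta, H0)

# hypothetical Weibull baseline: h0(t) = lam*rho*t^(rho-1), H0(t) = lam*t^rho
lam, rho = 0.005, 3.0
h0 = lambda t: lam * rho * t ** (rho - 1)
H0 = lambda t: lam * t ** rho

x = np.array([1.0])
beta = np.array([0.3])
s0 = component_survival(0.0, x, beta, H0)  # survival at t = 0 is 1
```

Note that the density is minus the derivative of the survival function, which gives a quick numerical sanity check on any coded baseline.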

The unknown parameters are the mixing probabilities $\mathbf{p} = (p_1, p_2, \dots, p_{g-1})^T$ and the regression coefficients $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T, \dots, \boldsymbol{\beta}_g^T)^T$, where $\boldsymbol{\beta}_i = (\beta_{i1}, \beta_{i2}, \dots, \beta_{id})^T$. Based on (1) and Zhou [19], the log-likelihood function under the mixture hazards model with right-censored data is given by

$l(\mathbf{p}, \boldsymbol{\beta}) = \sum_{j=1}^{n} \{\delta_j \log[f(t_j|\mathbf{X}_j, \boldsymbol{\beta})] + (1-\delta_j)\log[S(t_j|\mathbf{X}_j, \boldsymbol{\beta})]\},$

where $f(t_j|\mathbf{X}_j, \boldsymbol{\beta}) = \sum_{i=1}^{g} p_i f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)$ and $S(t_j|\mathbf{X}_j, \boldsymbol{\beta}) = \sum_{i=1}^{g} p_i S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)$.

    Assume that the true causes of failure for an individual are unobserved, and hence the data are incomplete. We introduce the latent variable zij as:

$z_{ij} = \begin{cases} 1, & \text{if the } j\text{th individual fails due to the } i\text{th type of risk}; \\ 0, & \text{otherwise}. \end{cases}$

    The complete-data log-likelihood function is given by:

$l_c(\mathbf{p}, \boldsymbol{\beta}) = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}\{\delta_j \log[p_i f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)] + (1-\delta_j)\log[p_i S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)]\}.$ (3)

Subsequently, the parameters are estimated through the expectation-maximization (EM) algorithm.

    E-step: On the (k+1)th iteration of E-step, we calculate the conditional expectation of the complete-data log-likelihood (3) given the current estimates of the parameters, i.e.:

$E[l_c(\mathbf{p}, \boldsymbol{\beta})|\mathbf{p}^{(k)}, \boldsymbol{\beta}^{(k)}] = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\{\delta_j \log[p_i f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)] + (1-\delta_j)\log[p_i S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)]\}$
$= \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\log p_i + \sum_{j=1}^{n} z_{1j}^{(k)}[\delta_j \log f_1(t_j|\mathbf{X}_j, \boldsymbol{\beta}_1) + (1-\delta_j)\log S_1(t_j|\mathbf{X}_j, \boldsymbol{\beta}_1)] + \cdots + \sum_{j=1}^{n} z_{gj}^{(k)}[\delta_j \log f_g(t_j|\mathbf{X}_j, \boldsymbol{\beta}_g) + (1-\delta_j)\log S_g(t_j|\mathbf{X}_j, \boldsymbol{\beta}_g)]$
$= Q_0 + Q_1 + \cdots + Q_g.$ (4)

Here, $\mathbf{p}^{(k)}$ and $\boldsymbol{\beta}^{(k)}$ are the estimates of $\mathbf{p}$ and $\boldsymbol{\beta}$, respectively, in the kth iteration. By Bayes' theorem, we have:

$z_{ij}^{(k)} = E(z_{ij}|\mathbf{p}^{(k)}, \boldsymbol{\beta}^{(k)}) = \dfrac{p_i^{(k)} f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i^{(k)})^{\delta_j} S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i^{(k)})^{1-\delta_j}}{\sum_{l=1}^{g} p_l^{(k)} f_l(t_j|\mathbf{X}_j, \boldsymbol{\beta}_l^{(k)})^{\delta_j} S_l(t_j|\mathbf{X}_j, \boldsymbol{\beta}_l^{(k)})^{1-\delta_j}}$ (5)

    which is the posterior probability that the jth individual with survival time tj fails due to the ith type of risk.
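The E-step of (5) is a simple weighted normalization over components. A minimal NumPy sketch, where the component densities and survivals are supplied as arrays (the toy values below are purely illustrative):

```python
import numpy as np

def posterior_z(p, f_vals, S_vals, delta):
    """E-step posteriors of Eq. (5): column j is proportional to
    p_i * f_i(t_j)^delta_j * S_i(t_j)^(1 - delta_j), normalized over i.
    p: (g,) mixing probabilities; f_vals, S_vals: (g, n); delta: (n,) 0/1."""
    num = p[:, None] * np.where(delta[None, :] == 1, f_vals, S_vals)
    return num / num.sum(axis=0, keepdims=True)

# hypothetical component densities/survivals at three observed times
p = np.array([0.6, 0.4])
f_vals = np.array([[0.20, 0.10, 0.05],
                   [0.05, 0.30, 0.02]])
S_vals = np.array([[0.9, 0.5, 0.4],
                   [0.7, 0.6, 0.1]])
delta = np.array([1, 1, 0])   # third subject censored: survival used, not density
z = posterior_z(p, f_vals, S_vals, delta)
```

Each column of `z` sums to 1, as required of posterior probabilities over the g risk types.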

M-step: The (k+1)th iteration of the M-step provides the updated estimates $\mathbf{p}^{(k+1)}$ and $\boldsymbol{\beta}^{(k+1)}$ that maximize (4) with respect to $\mathbf{p}$ and $\boldsymbol{\beta}$.

Under the constraint $\sum_{i=1}^{g} p_i = 1$, maximizing $Q_0 = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\log p_i$ from (4) gives the mixing probability estimate

$p_i^{(k+1)} = \dfrac{\sum_{j=1}^{n} z_{ij}^{(k)}}{n}.$ (6)

The term $Q_i$ from (4), for $i = 1, \dots, g$, can be written as:

$Q_i = \sum_{j=1}^{n} z_{ij}^{(k)}\{\delta_j[\log h_{0i}(t_j) + \mathbf{X}_j^T\boldsymbol{\beta}_i] - \exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)H_{0i}(t_j)\}.$ (7)

Define the score vector $U(\boldsymbol{\beta}_i)$, for $i = 1, \dots, g$, as the first derivative of (7) with respect to $\boldsymbol{\beta}_i$ with $H_{0i}(t)$ fixed at $H_{0i}^{(k+1)}(t)$; the estimate $\boldsymbol{\beta}_i^{(k+1)}$ satisfies the equation:

$U(\boldsymbol{\beta}_i) = \left.\dfrac{\partial Q_i}{\partial \boldsymbol{\beta}_i}\right|_{H_{0i}(t_j) = H_{0i}^{(k+1)}(t_j)} = \sum_{j=1}^{n} z_{ij}^{(k)}[\delta_j - \exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)H_{0i}^{(k+1)}(t_j)]\mathbf{X}_j = \mathbf{0}.$ (8)

To estimate the baseline hazard function under the mixture hazards model, we propose a kernel estimator. Define $N_j(t) = I(t_j \le t, \delta_j = 1)$ as the event counting process and $Y_j(t) = I(t_j \ge t)$ as the at-risk process. The updated kernel estimator of the ith component baseline hazard function $h_{0i}(t)$ on the (k+1)th iteration is defined by:

$h_{0i}^{(k+1)}(t|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}, b^{(k)}) = \dfrac{1}{b^{(k)}}\int_0^{\tau} K\!\left(\dfrac{t-u}{b^{(k)}}\right) dH_{0i}^{(k+1)}(u|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}), \quad \tau \ge 0,$ (9)

where $K: \mathbb{R} \to \mathbb{R}$ is a kernel function and $b^{(k)}$ is a positive parameter called the bandwidth. Several kernel functions are commonly used; they appear in Table 1 and Figure 1. We tried these kernel functions in the simulated examples and found no significant differences. In this paper, we choose the biweight kernel to estimate the baseline hazard function.

    Table 1.  Different types of kernel function.
Kernel function: $K(u)$
Gaussian: $K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2}, \; -\infty < u < \infty$
Epanechnikov: $K(u) = \frac{3}{4}(1-u^2), \; |u| \le 1$
Biweight: $K(u) = \frac{15}{16}(1-u^2)^2, \; |u| \le 1$
Triweight: $K(u) = \frac{35}{32}(1-u^2)^3, \; |u| \le 1$
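The biweight kernel and the smoothing step of (9) are straightforward to code. A sketch under our own naming, in which the cumulative-hazard increments are passed in as an array of jump sizes at the event times:

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2)**2, 0.0)

def smooth_hazard(t_grid, jump_times, jumps, b, K=biweight):
    """Kernel smoothing of cumulative-hazard increments, in the spirit of
    Eq. (9): h(t) = (1/b) * sum_j K((t - t_j)/b) * dH(t_j)."""
    u = (np.asarray(t_grid)[:, None] - np.asarray(jump_times)[None, :]) / b
    return (K(u) * np.asarray(jumps)[None, :]).sum(axis=1) / b
```

A kernel must integrate to 1 so that the smoothed increments keep the scale of a hazard rate; the biweight above satisfies this on $[-1, 1]$.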


    Derived by smoothing the increments of the Breslow estimator, the kernel estimator (9) can be written as:

$h_{0i}^{(k+1)}(t|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}, b^{(k)}) = \dfrac{1}{b^{(k)}}\sum_{j=1}^{n}\int_0^{\tau} K\!\left(\dfrac{t-u}{b^{(k)}}\right)\dfrac{z_{ij}^{(k)} I(\bar{Y}(u) > 0)}{S_{ni}(u|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)})}\, dN_j(u),$ (10)

where $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$ and $S_{ni}(u|\mathbf{X}, \mathbf{Z}_i, \boldsymbol{\beta}_i) = \sum_{j=1}^{n} z_{ij}\exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)Y_j(u)$.

Horova et al. [20] and Patil [21] introduced cross-validation methods for selecting the bandwidth of a kernel estimator. We define the cross-validation function $CV(b)$ for bandwidth $b$ under our model as:

$CV(b) = \sum_{i=1}^{g}\sum_{j=1}^{n} z_{ij}^{(k)}\left[h_{0i}^{(k+1)}(t_j|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}, b)\right]^2 - 2\sum_{i=1}^{g}\sum_{j=1}^{n}\sum_{l \ne j}\dfrac{1}{b}K\!\left(\dfrac{t_l - t_j}{b}\right)\dfrac{z_{il}^{(k)}\delta_l}{\bar{Y}(t_l)}\dfrac{z_{ij}^{(k)}\delta_j}{\bar{Y}(t_j)}.$

    The selection of bandwidth on the (k+1)th iteration is given by:

$b^{(k+1)} = \arg\min_{b \in B_n} CV(b),$ (11)

where $B_n$ is the set of acceptable bandwidths.
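In practice (11) is a grid search over $B_n$; a minimal sketch, with a purely illustrative toy CV surface in place of the model-based score:

```python
import numpy as np

def select_bandwidth(cv, candidates):
    """Eq. (11): choose the b in B_n that minimizes the cross-validation score."""
    scores = [cv(b) for b in candidates]
    return candidates[int(np.argmin(scores))]

# hypothetical CV surface with a known minimum at b = 0.5
B_n = [0.1, 0.3, 0.5, 0.7, 0.9]
best = select_bandwidth(lambda b: (b - 0.5)**2, B_n)
```

In the algorithm below, the real $CV(b)$ of the preceding display would be plugged in as the `cv` callable.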

The algorithm is as follows. We fix $n$ and $g$ and set initial values: the mixing probabilities $\mathbf{p}^{(0)}$ (usually $1/g$ for each component), the regression coefficients $\boldsymbol{\beta}^{(0)}$, the baseline hazard rates $\mathbf{h}_0^{(0)}$, the bandwidth $b^{(0)} = 0.5$, and a termination value $\varepsilon > 0$.

    Set the initial counter l=1 .

Step 1: Compute $\mathbf{Z}^{(l-1)}$ from $\mathbf{p}^{(l-1)}$, $\mathbf{h}_0^{(l-1)}$ and $\boldsymbol{\beta}^{(l-1)}$ by (5);

Step 2: Update $\mathbf{p}^{(l)}$ with $\mathbf{Z}^{(l-1)}$ by (6);

Step 3: Update $\mathbf{h}_0^{(l)}$ with $\mathbf{Z}^{(l-1)}$, $\boldsymbol{\beta}^{(l-1)}$ and $b^{(l-1)}$ by (10);

Step 4: Update the bandwidth $b^{(l)}$ with $\mathbf{Z}^{(l-1)}$, $\mathbf{h}_0^{(l)}$ and $\boldsymbol{\beta}^{(l-1)}$ by (11);

Step 5: Update $\boldsymbol{\beta}^{(l)}$ with $\mathbf{Z}^{(l-1)}$, $\mathbf{h}_0^{(l)}$ and $\boldsymbol{\beta}^{(l-1)}$ by (8);

Step 6: IF $\|\mathbf{p}^{(l)} - \mathbf{p}^{(l-1)}\|^2 + \|\mathbf{h}_0^{(l)} - \mathbf{h}_0^{(l-1)}\|^2 + \|\boldsymbol{\beta}^{(l)} - \boldsymbol{\beta}^{(l-1)}\|^2 < \varepsilon$, THEN stop;

ELSE let $l = l + 1$ and GOTO Step 1.

Note that the superscript $(\cdot)$ represents the iteration number, $\mathbf{h}_0^{(0)} = (\mathbf{h}_{01}^{(0)}, \mathbf{h}_{02}^{(0)}, \dots, \mathbf{h}_{0g}^{(0)})^T$ is a $g \times n$ matrix, where $\mathbf{h}_{0i}^{(0)} = (h_{0i}^{(0)}(t_1), h_{0i}^{(0)}(t_2), \dots, h_{0i}^{(0)}(t_n))^T$, and each row is initialized with a constant vector.
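The control flow of Steps 1-6 can be sketched generically: alternate the E-step with the M-step updates until the summed squared change of all parameter blocks falls below $\varepsilon$. The toy demonstration below recovers only the mixing probabilities of a well-separated two-component Gaussian mixture with known component densities; in the full method the `m_steps` would be the Cox-specific updates (6), (8), (10) and (11). All names are ours.

```python
import numpy as np

def em_loop(init, e_step, m_steps, eps=1e-8, max_iter=500):
    """Steps 1-6: alternate E-step (posteriors Z) with M-step updates until
    the summed squared parameter change drops below eps (Step 6)."""
    params = {k: np.asarray(v, dtype=float) for k, v in init.items()}
    for _ in range(max_iter):
        Z = e_step(params)                                   # Step 1
        new = {k: np.asarray(f(params, Z), dtype=float)
               for k, f in m_steps.items()}                  # Steps 2-5
        change = sum(np.sum((new[k] - params[k])**2) for k in new)
        params.update(new)
        if change < eps:                                     # Step 6
            break
    return params

# toy data: two well-separated Gaussian components (illustrative only)
rng = np.random.default_rng(0)
t = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
phi = lambda x, mu: np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)

def e_step(params):
    num = params["p"][:, None] * np.stack([phi(t, 0.0), phi(t, 5.0)])
    return num / num.sum(axis=0, keepdims=True)

fit = em_loop({"p": np.array([0.3, 0.7])}, e_step,
              {"p": lambda params, Z: Z.mean(axis=1)})       # Eq. (6)
```

Because the components are generated in equal proportions, the fitted mixing probabilities converge near (0.5, 0.5) regardless of the starting values.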

In traditional regression analysis, we select the model that minimizes the sum of squared residuals. Unlike the traditional regression model, however, the survival model does not meet the assumption that the standard deviation of the survival time is constant at each covariate value. From Figure 2(a), we see that survival times with higher expectation also have higher standard deviation. Therefore, to select the best model we need to adjust for the standard deviation, to avoid being dominated by data with large standard deviations. Moreover, if the model fits the data well, then each observed survival time will be close to the expectation of the component model that has the largest posterior probability for that subject's risk type.

Figure 2(b) illustrates that observation A is closer to the mean line (red line) of the component model corresponding to risk type 1, say model 1, than to the mean line (blue line) of model 2. From (5), we see that the posterior probability of observation A for the first type of risk (red line) will be much larger than that for the second type of risk (blue line). Hence, to build validity indices for mixture models, we use the posterior probabilities as weights and define the mixture sum of standard absolute residuals (MsSAE) and the mixture sum of standard squared residuals (MsSSE) as follows:

$\mathrm{MsSAE} = \sum_{i=1}^{g}\sum_{j=1}^{n} \hat{z}_{ij}\dfrac{|t_j - \hat{E}_i(t_j)|}{\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}$; $\quad \mathrm{MsSSE} = \sum_{i=1}^{g}\sum_{j=1}^{n} \hat{z}_{ij}\dfrac{[t_j - \hat{E}_i(t_j)]^2}{\widehat{\mathrm{Var}}_i(t_j)}$,

where $\hat{E}_i(t_j) = \int_0^{\infty}\exp[-\hat{H}_{0i}(t)\exp(\mathbf{x}_j^T\hat{\boldsymbol{\beta}}_i)]\,dt$ and $\widehat{\mathrm{Var}}_i(t_j) = 2\int_0^{\infty} t\exp[-\hat{H}_{0i}(t)\exp(\mathbf{x}_j^T\hat{\boldsymbol{\beta}}_i)]\,dt - \hat{E}_i(t_j)^2$. The squared distance is considered because it more easily catches an abnormal model.
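Given the posteriors, component means and variances, the two residual sums are one-line array reductions. A sketch with hypothetical toy values (function names are ours; the square-root scaling of the absolute version reflects standardization by the component standard deviation):

```python
import numpy as np

def ms_sae(t, z_hat, E_hat, var_hat):
    """MsSAE: posterior-weighted sum of standardized absolute residuals."""
    return float(np.sum(z_hat * np.abs(t[None, :] - E_hat) / np.sqrt(var_hat)))

def ms_sse(t, z_hat, E_hat, var_hat):
    """MsSSE: posterior-weighted sum of standardized squared residuals."""
    return float(np.sum(z_hat * (t[None, :] - E_hat)**2 / var_hat))

# toy example: two components, two subjects, unit variances (illustrative)
t = np.array([1.0, 2.0])
z_hat = np.array([[1.0, 0.0], [0.0, 1.0]])
E_hat = np.zeros((2, 2))
var_hat = np.ones((2, 2))
sae = ms_sae(t, z_hat, E_hat, var_hat)
sse = ms_sse(t, z_hat, E_hat, var_hat)
```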

From Figure 2(c) we can see that the expectation (green line) of the survival time for the third type of risk (ER3) is close to that (blue line) for the second type of risk (ER2). In order to penalize an overfitting model, i.e. one with too many components, we take the distance between the expectations of each survival time under any two types of risk as the penalty. Define the average absolute separation $\overline{\mathrm{ASep}}$, the average squared separation $\overline{\mathrm{SSep}}$, the minimum absolute separation $\min\mathrm{ASep}$, and the minimum squared separation $\min\mathrm{SSep}$ as:

$\overline{\mathrm{ASep}} = \dfrac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|$; $\quad \overline{\mathrm{SSep}} = \dfrac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2$;
$\min\mathrm{ASep} = \min_{i \ne l}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|$; $\quad \min\mathrm{SSep} = \min_{i \ne l}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2$.

    A good model will possess small mixture standard residuals and large separation of expectations. Hence, based on the above-mentioned functions of residuals and separation of means, we propose four validity indices V1 ~ V4 for selecting the number of model components under the mixture hazards regression model.

    (V1). Absolute standard residuals and average separation VaRaS

$V_{aRaS} = \dfrac{\mathrm{MsSAE}}{\overline{\mathrm{ASep}}} = \dfrac{\sum_{i=1}^{g}\sum_{j=1}^{n} \hat{z}_{ij}|t_j - \hat{E}_i(t_j)|/\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}{\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|}$

We find the optimal number $g$ of risk types by solving $\min_{2 \le g \le n-1} V_{aRaS}$.

    (V2). Squared standard residuals and average separation VsRaS

$V_{sRaS} = \dfrac{\mathrm{MsSSE}}{\overline{\mathrm{SSep}}} = \dfrac{\sum_{i=1}^{g}\sum_{j=1}^{n} \hat{z}_{ij}[t_j - \hat{E}_i(t_j)]^2/\widehat{\mathrm{Var}}_i(t_j)}{\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2}$

We find the optimal number $g$ of risk types by solving $\min_{2 \le g \le n-1} V_{sRaS}$.

    (V3). Absolute standard residuals and minimum separation VaRmS

$V_{aRmS} = \dfrac{\mathrm{MsSAE}}{\min\mathrm{ASep}} = \dfrac{\sum_{i=1}^{g}\sum_{j=1}^{n} \hat{z}_{ij}|t_j - \hat{E}_i(t_j)|/\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}{\min_{i \ne l}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|}$

We find the optimal number $g$ of risk types by solving $\min_{2 \le g \le n-1} V_{aRmS}$.

    (V4). Squared standard residuals and minimum separation VsRmS

$V_{sRmS} = \dfrac{\mathrm{MsSSE}}{\min\mathrm{SSep}} = \dfrac{\sum_{i=1}^{g}\sum_{j=1}^{n} \hat{z}_{ij}[t_j - \hat{E}_i(t_j)]^2/\widehat{\mathrm{Var}}_i(t_j)}{\min_{i \ne l}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2}$

We find the optimal number $g$ of risk types by solving $\min_{2 \le g \le n-1} V_{sRmS}$.
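The four indices share the same ingredients, so they can be computed together for each candidate $g$ and the minimizer chosen. A sketch with a hypothetical toy fit (names are ours):

```python
import numpy as np
from itertools import combinations

def validity_indices(t, z_hat, E_hat, var_hat):
    """V1-V4: standardized residual sums (MsSAE, MsSSE) divided by the average
    (V1, V2) or minimum (V3, V4) separation between component mean curves.
    t: (n,); z_hat, E_hat, var_hat: (g, n). Smaller values indicate a better g."""
    g = E_hat.shape[0]
    sae = np.sum(z_hat * np.abs(t - E_hat) / np.sqrt(var_hat))
    sse = np.sum(z_hat * (t - E_hat)**2 / var_hat)
    a_sep = [np.sum(np.abs(E_hat[i] - E_hat[l]))
             for i, l in combinations(range(g), 2)]
    s_sep = [np.sum((E_hat[i] - E_hat[l])**2)
             for i, l in combinations(range(g), 2)]
    return {"VaRaS": sae / np.mean(a_sep), "VsRaS": sse / np.mean(s_sep),
            "VaRmS": sae / np.min(a_sep), "VsRmS": sse / np.min(s_sep)}

# toy example: g = 2 components, n = 3 subjects (illustrative values)
t = np.array([1.0, 2.0, 3.0])
z_hat = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])
E_hat = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
var_hat = np.ones((2, 3))
idx = validity_indices(t, z_hat, E_hat, var_hat)
```

With $g = 2$ there is a single component pair, so the average and minimum separations coincide and V1 equals V3 (and V2 equals V4); the four indices differ only for $g \ge 3$.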

For the simulated data we consider four different models, M1~M4. Under the mixture Cox proportional hazards model (2), the ith component hazard function is:

$h_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = h_{0i}(t)\exp(x_{j1}\beta_{i,1,1} + \dots + x_{j1}^k\beta_{i,1,k} + \dots + x_{jd}\beta_{i,d,1} + \dots + x_{jd}^k\beta_{i,d,k}),$

where $d$ is the number of covariates, $k$ is the degree of the model, $\boldsymbol{\beta}_i = (\beta_{i,1,1}, \dots, \beta_{i,1,k}, \dots, \beta_{i,d,1}, \dots, \beta_{i,d,k})^T$ is the vector of regression coefficients, and $h_{0i}(t)$ is the ith component baseline hazard function.

Consider two common distributions for the baseline hazard functions, Weibull and Gompertz; the ith component Weibull and Gompertz baselines are defined by $h_{0i}(t) = \lambda_i\rho_i t^{\rho_i - 1}$ and $h_{0i}(t) = \lambda_i\exp(\rho_i t)$, respectively, where $\lambda_i$ and $\rho_i$ are the scale and shape parameters. Let $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_g)^T$, $\boldsymbol{\rho} = (\rho_1, \dots, \rho_g)^T$, and $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T, \dots, \boldsymbol{\beta}_g^T)^T$. The covariates $\mathbf{X} = (x_1, x_2, \dots, x_n)^T$ in all cases are generated independently from a uniform distribution $U(-4, 4)$. The information for the four models is shown in Table 2, and the scatter plots of a sample dataset are presented in Figure 3.

    Table 2.  The information for models M1~M4 respectively.
M1: $n = 200$, $g = 2$, $d = 1$, $k = 1$; $\mathbf{p} = (0.7, 0.3)^T$; Weibull baseline, $\boldsymbol{\lambda} = (0.005, 1.5)^T$, $\boldsymbol{\rho} = (3.0, 2.0)^T$; $\boldsymbol{\beta} = (0.3, 0.5)^T$; censoring times $U_1(5, 9)$, $U_2(2, 6)$.
M2: $n = 200$, $g = 2$, $d = 1$, $k = 2$; $\mathbf{p} = (0.5, 0.5)^T$; Gompertz baseline, $\boldsymbol{\lambda} = (0.2, 0.7)^T$, $\boldsymbol{\rho} = (1.5, 2.0)^T$; $\boldsymbol{\beta}_1 = (0.8, 0.1)$, $\boldsymbol{\beta}_2 = (-0.6, 0.1)$; censoring times $U_1(4, 9)$, $U_2(4, 9)$.
M3: $n = 400$, $g = 2$, $d = 2$, $k = 1$; $\mathbf{p} = (0.5, 0.5)^T$; Weibull baseline, $\boldsymbol{\lambda} = (0.003, 0.002)^T$, $\boldsymbol{\rho} = (0.5, 0.7)^T$; $\boldsymbol{\beta}_1 = (0.8, -0.5)$, $\boldsymbol{\beta}_2 = (-0.6, 0.5)$; censoring times $U_1(12, 15)$, $U_2(10, 13)$.
M4: $n = 400$, $g = 3$, $d = 1$, $k = 1$; $\mathbf{p} = (0.35, 0.30, 0.35)^T$; Gompertz baseline, $\boldsymbol{\lambda} = (0.0002, 0.002, 0.0003)^T$, $\boldsymbol{\rho} = (0.7, 2.0, 0.8)^T$; $\boldsymbol{\beta} = (-0.8, 0.2, 1.0)^T$; censoring times $U_1(10, 15)$, $U_2(4, 6)$, $U_3(15, 20)$.
Here $n$ is the sample size, $g$ the number of risk types, $d$ the number of covariates, $k$ the degree of the model; censoring times are generated from uniform distributions $U_i(a, b)$ for $i = 1, \dots, g$.


We use the EM-based semiparametric mixture hazards model to analyze the simulated data and compare two methods of estimating the baseline hazard function. The first method, proposed by Ng and McLachlan [5], assumes the baseline hazard function is piecewise constant and estimates it by maximum likelihood (MLE). The second method, introduced in this paper, uses a kernel estimator with the biweight kernel to estimate the baseline hazard rates.

To demonstrate the results graphically, we first show the results for a single simulation run in Table 3 and Figure 4. The correct rate (CR) in Table 3 is the percentage of individuals matched to their true attributable risk type. According to the estimation results, we assign each individual to the risk type with the largest posterior probability. The correct rate is thus defined as:

$CR = \dfrac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{g} I\left\{j \in \mathrm{risk}(i) \cap \hat{z}_{ij} = \max_i(\hat{\mathbf{Z}}_j)\right\}$, where $\hat{\mathbf{Z}}_j = (\hat{z}_{1j}, \hat{z}_{2j}, \dots, \hat{z}_{gj})^T$.
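Computationally, CR is the fraction of subjects whose maximum-posterior component matches their true risk type. A minimal sketch (the posterior values below are hypothetical):

```python
import numpy as np

def correct_rate(z_hat, true_risk):
    """CR: fraction of subjects whose largest posterior picks the true risk type.
    z_hat: (g, n) posteriors; true_risk: (n,) integer labels in 0..g-1."""
    return float(np.mean(np.argmax(z_hat, axis=0) == true_risk))

# hypothetical posteriors for three subjects with two risk types
z_hat = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.8, 0.4]])
cr = correct_rate(z_hat, np.array([0, 1, 1]))  # third subject is misassigned
```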
Table 3.  Estimates for a single simulated series under models M1~M4.
M1 true: $\mathbf{p} = (0.7, 0.3)^T$, $\boldsymbol{\beta} = (0.3, 0.5)^T$.
M1 piecewise: $\hat{\mathbf{p}} = (0.561, 0.439)^T$, $\hat{\boldsymbol{\beta}} = (0.528, 0.851)^T$; CR = 0.860; MsSSE/n = 0.810.
M1 kernel (bw = 1.0): $\hat{\mathbf{p}} = (0.672, 0.328)^T$, $\hat{\boldsymbol{\beta}} = (0.336, 0.586)^T$; CR = 0.945; MsSSE/n = 0.659.
M2 true: $\mathbf{p} = (0.5, 0.5)^T$, $\boldsymbol{\beta}_1 = (0.8, 0.1)$, $\boldsymbol{\beta}_2 = (-0.6, 0.1)$.
M2 piecewise: $\hat{\mathbf{p}} = (0.641, 0.958)^T$, $\hat{\boldsymbol{\beta}}_1 = (0.674, 0.136)$, $\hat{\boldsymbol{\beta}}_2 = (-1.136, 0.298)$; CR = 0.705; MsSSE/n = 0.963.
M2 kernel (bw = 0.5): $\hat{\mathbf{p}} = (0.523, 0.476)^T$, $\hat{\boldsymbol{\beta}}_1 = (0.738, 0.078)$, $\hat{\boldsymbol{\beta}}_2 = (-0.762, 0.146)$; CR = 0.855; MsSSE/n = 0.910.
M3 true: $\mathbf{p} = (0.5, 0.5)^T$, $\boldsymbol{\beta}_1 = (0.8, -0.5)$, $\boldsymbol{\beta}_2 = (-0.6, 0.5)$.
M3 piecewise: $\hat{\mathbf{p}} = (0.508, 0.491)^T$, $\hat{\boldsymbol{\beta}}_1 = (0.993, -0.562)$, $\hat{\boldsymbol{\beta}}_2 = (-0.562, 0.608)$; CR = 0.838; MsSSE/n = 1.240.
M3 kernel (bw = 0.4): $\hat{\mathbf{p}} = (0.478, 0.522)^T$, $\hat{\boldsymbol{\beta}}_1 = (0.885, -0.534)$, $\hat{\boldsymbol{\beta}}_2 = (-0.628, 0.521)$; CR = 0.843; MsSSE/n = 1.142.
Here "true" gives the true parameter values, "piecewise" the estimates under the piecewise-constant baseline assumption, "kernel" the estimates by the kernel method, and bw the bandwidth.
    M4 True \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.35}}} \\ {{\rm{0}}{\rm{.30}}} \\ {{\rm{0}}{\rm{.35}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.8} \\ {{\rm{0}}{\rm{.2}}} \\ {{\rm{1}}{\rm{.0}}} \end{array}} \right]
    Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.399}}} \\ {{\rm{0}}{\rm{.265}}} \\ {{\rm{0}}{\rm{.335}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.{\rm{938}}} \\ {{\rm{0}}{\rm{.920}}} \\ {{\rm{1}}{\rm{.137}}} \end{array}} \right] 0.693 1.211
    Kernel, bw=0.9 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.368}}} \\ {{\rm{0}}{\rm{.306}}} \\ {{\rm{0}}{\rm{.325}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.8{\rm{06}}} \\ {{\rm{0}}{\rm{.192}}} \\ {{\rm{0}}{\rm{.927}}} \end{array}} \right] 0.873 0.828
    1: true parameters; 2: piecewise constant estimates; 3: kernel estimates; 4: bandwidth.


    Under M1, the estimated mixing probabilities and CR in Table 3 and the expectation line in Figure 4(M1-1) show that the piecewise constant estimator misclassifies some data into the 2nd type of risk when their true risk type is the 1st. As a result, the estimated regression coefficients in Table 3 and the cumulative baseline hazard rate in Figure 4(M1-2) deviate from the true model. Similarly, under M4, the expectation lines for the 1st and 2nd types of risk in Figure 4(M4-1) show that the piecewise constant estimator misclassifies some data between these two risk types, so the estimated regression coefficients in Table 3 and the cumulative baseline hazard rate in Figure 4(M4-2) do not match the real model. Clearly, estimating the baseline hazard by the kernel procedure increases CR relative to the piecewise constant procedure.

    We next show the results over 1000 simulation runs in Table 4. The absolute relative bias (ARB) for a parameter θ is defined by:

    ARB(\theta ) = \left| {\frac{{E(\hat \theta ) - \theta }}{{E(\hat \theta )}}} \right|.
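As defined above, ARB divides the bias by the mean of the estimates rather than by the true value. A small sketch of ARB and of its average over a model's k parameters, the \overline {ARB} reported in Table 4 (function names are our own):

```python
import numpy as np

def arb(theta_hat_mean, theta):
    """Absolute relative bias as defined in the text:
    |E(theta_hat) - theta| / |E(theta_hat)|."""
    return abs((theta_hat_mean - theta) / theta_hat_mean)

def mean_arb(theta_hat_means, thetas):
    """Mean ARB over the k parameters of a model."""
    return float(np.mean([arb(m, t) for m, t in zip(theta_hat_means, thetas)]))
```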
    Table 4.  The estimation of 1000 simulated series by models M1~M4 respectively. Columns: bias_ {\bf{p}} 3, MSE_ {\bf{p}} 4, bias_ {\bf{ \pmb{\mathit{ β}} }} 5, MSE_ {\bf{ \pmb{\mathit{ β}} }} 6, \overline {ARB} , \overline {CR} , \overline {MsSSE} . Vectors are written row-wise; matrix rows are separated by semicolons.
    M1  Piecewise1:  (−0.160, 0.160)^T;  (0.026, 0.026)^T;  (0.088, 0.275)^T;  (0.020, 0.076)^T;  0.401;  0.699;  0.796
        Kernel2:     (−0.035, 0.035)^T;  (0.002, 0.002)^T;  (−0.073, −0.007)^T;  (0.007, 0.000)^T;  0.107;  0.856;  0.653
    M2  Piecewise:   (0.132, −0.132)^T;  (0.017, 0.017)^T;  (−0.097, 0.041; −0.652, 0.172);  (0.010, 0.001; 0.429, 0.029);  0.646;  0.680;  1.329
        Kernel:      (0.089, −0.089)^T;  (0.008, 0.008)^T;  (−0.123, 0.054; −0.311, 0.017);  (0.018, 0.006; 0.124, 0.000);  0.292;  0.774;  1.009
    M3  Piecewise:   (0.028, −0.028)^T;  (0.000, 0.000)^T;  (0.167, −0.091; −0.079, 0.046);  (0.028, 0.008; 0.006, 0.002);  0.122;  0.847;  1.271
        Kernel:      (−0.006, 0.006)^T;  (0.000, 0.000)^T;  (0.033, −0.020; 0.069, −0.051);  (0.001, 0.000; 0.004, 0.002);  0.054;  0.849;  1.097
    M4  Piecewise:   (0.043, −0.055, 0.012)^T;  (0.001, 0.003, 0.000)^T;  (−0.003, 0.791, 0.251)^T;  (0.002, 0.627, 0.063)^T;  0.766;  0.646;  0.737
        Kernel:      (0.018, −0.042, 0.023)^T;  (0.000, 0.001, 0.000)^T;  (0.032, 0.071, −0.014)^T;  (0.002, 0.009, 0.000)^T;  0.112;  0.799;  0.565
    1: piecewise constant estimates; 2: kernel estimates; 3: bias of {\bf{p}} ; 4: mean square error (MSE) of {\bf{p}} ; 5: bias of {\bf{ \pmb{\mathit{ β}} }} ; 6: mean square error (MSE) of {\bf{ \pmb{\mathit{ β}} }} .


    In Table 4, the mean absolute relative bias ( \overline {ARB} ) of a model with k parameters is defined by \overline {ARB} = \frac{1}{k}\sum\nolimits_{i = 1}^k {ARB({\theta _i})} . Moreover, \overline {CR} and \overline {MsSSE} are the means of CR and MsSSE/n over the simulation runs. Table 4 shows that the \overline {ARB} and \overline {MsSSE} of the kernel estimate are smaller than those of the piecewise constant estimate, and that the \overline {CR} of the kernel estimate is larger in all cases considered. Thus, the model with baseline hazard functions estimated by the kernel method fits the data better than the one with a piecewise constant baseline.

    In section 4.2, we fit the EM-based semi-parametric mixture hazards model to data simulated under models M1~M4 with several possible numbers of risk types (that is, model components), again estimating the baseline hazard rates by the kernel estimator with the biweight kernel. We then use validity indices to select the optimal number of model components. The following six validity indices are compared with the proposed indices ( {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} ).

    1. Partition coefficient {V_{PC}} proposed by Bezdek [22].

    2. Normalized partition coefficient {V_{NPC}} proposed by Dave [23].

    3. Partition entropy {V_{PE}} proposed by Bezdek [24].

    4. Normalized partition entropy {V_{NPE}} proposed by Dunn [25].

    5. Akaike information criterion AIC.

    6. Bayesian information criterion BIC.

    It is well known that memberships play an important role in fuzzy clustering. Under the EM-based mixture model, the posterior probabilities play a closely analogous role, so we replace the memberships in the validity indices {V_{PC}} , {V_{NPC}} , {V_{PE}} , and {V_{NPE}} with the posterior probabilities. Moreover, the formulas for AIC and BIC are computed by

    AIC = - 2 \cdot {l_c}(\mathit{\boldsymbol{\hat p}}, \mathit{\boldsymbol{\hat \beta }}) + 2k ; \quad BIC = - 2 \cdot {l_c}(\mathit{\boldsymbol{\hat p}}, \mathit{\boldsymbol{\hat \beta }}) + k\log (n),

    where \; {l_c}(\mathit{\boldsymbol{\hat p}}, \mathit{\boldsymbol{\hat \beta }}) is the complete-data log-likelihood (3) given the estimated parameters, and k is the number of parameters for estimation.
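The information criteria follow directly from the formulas above; for the partition indices, the sketch below uses the standard Bezdek definitions of {V_{PC}} and {V_{PE}} with posterior probabilities substituted for memberships, as described in the text (the normalized variants {V_{NPC}} and {V_{NPE}} rescale these and are omitted here; function names are our own):

```python
import numpy as np

def aic_bic(loglik, k, n):
    """AIC and BIC from the complete-data log-likelihood l_c at the
    estimates, with k free parameters and sample size n."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * np.log(n)

def partition_indices(post):
    """V_PC and V_PE computed from an n x g matrix of posterior
    probabilities in place of fuzzy memberships. A larger V_PC and a
    smaller V_PE indicate a crisper partition."""
    post = np.asarray(post, float)
    n = post.shape[0]
    v_pc = np.sum(post**2) / n
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(post > 0, post * np.log(post), 0.0)
    v_pe = -np.sum(plogp) / n
    return v_pc, v_pe
```

A perfectly crisp posterior matrix gives V_PC = 1 and V_PE = 0; a uniform one gives V_PC = 1/g and V_PE = log g.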

    In total we consider ten indices, {V_{PC}} , {V_{NPC}} , {V_{PE}} , {V_{NPE}} , AIC, BIC, {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} , to select the optimal number of model components. Table 5 shows, for each index, the proportion of 1000 simulation runs in which the correct number of model components is chosen. In each run, each of the models M1~M4 is fitted with 2, 3, and 4 components separately. Note that we assume the number of model components is greater than one, as required by the proposed validity indices. The proportion of choosing the correct number of risk types by each index in Table 5 is defined as:

    \frac{{\# ({\rm{choose}}\;{\rm{correct}}\;g\;{\rm{by}}\;{\rm{index}})}}{{\# ({\rm{simulations}})}}.
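The selection step itself can be sketched as a small voting loop: fit the mixture for each candidate g, score every fit with each index, and let each index pick its own optimum. Here fit_model and the per-index scoring functions are hypothetical stand-ins for the EM fit and the ten indices:

```python
def select_g(fit_model, indices, candidates=(2, 3, 4)):
    """For each validity index, return the candidate number of components g
    it selects. `fit_model(g)` returns a fitted model; `indices` maps an
    index name to (score_fn, larger_is_better)."""
    fits = {g: fit_model(g) for g in candidates}
    choices = {}
    for name, (score_fn, larger_is_better) in indices.items():
        scores = {g: score_fn(fits[g]) for g in candidates}
        key = max if larger_is_better else min  # e.g. V_PC vs AIC/BIC
        choices[name] = key(scores, key=scores.get)
    return choices
```

In the simulation, the proportion in Table 5 is then the fraction of runs in which an index's choice equals the true g.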
    Table 5.  The proportion of choosing the correct g by each index for 1000 simulation runs under models M1~M4 respectively.
    {V_{PC}} {V_{NPC}} {V_{PE}} {V_{NPE}} AIC BIC {V_{aRaS}} {V_{sRaS}} {V_{aRmS}} {V_{sRmS}}
    M1 0.962 0.880 0.976 0.894 0.964 0.950 0.896 0.896 0.984 0.992
    M2 0.954 0.564 0.963 0.485 0.524 0.631 0.863 0.851 0.981 0.990
    M3 1.000 0.798 1.000 0.868 0.998 0.998 0.994 0.998 1.000 1.000
    M4 0.486 0.780 0.413 0.810 0.646 0.660 0.923 0.916 0.813 0.703


    Table 5 shows that the proportions of choosing the correct g by the traditional indices {V_{PC}} , {V_{NPC}} , {V_{PE}} , {V_{NPE}} , AIC, and BIC are not consistent across models M1~M4: for each of these indices, at least one model is handled poorly (highlighted in the table). On the other hand, the proposed indices ( {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} ) are consistent and attain high proportions for every model; the only exception is the proportion of {V_{sRmS}} under M4, 0.703, which is slightly low but still higher than that of most traditional indices. Hence, the proposed validity indices are superior to the others in selecting the correct number of components.

    As a practical illustration of the proposed EM-based semi-parametric mixture hazard model, we consider the survival times of 506 patients with prostate cancer who entered a clinical trial during 1967–1969. The patients were randomly allocated to different levels of treatment with the drug diethylstilbestrol (DES); the data were considered by Byar and Green [26] and published by Andrews and Herzberg [27]. Kay [28] analyzed a subset of the data by considering eight types of risk, defined by eight categorical variables: drug treatment (RX: 0, 0.0 or 0.2 mg estrogen; 1, 1.0 or 5.0 mg estrogen); age group (AG: 0, < 75 years; 1, 75 to 79 years; 2, > 79 years); weight index (WT: 0, > 99 kg; 1, 80 to 99 kg; 2, < 80 kg); performance rating (PF: 0, normal; 1, limitation of activity); history of cardiovascular disease (HX: 0, no; 1, yes); serum haemoglobin (HG: 0, > 12 g/100 ml; 1, 9 to 12 g/100 ml; 2, < 9 g/100 ml); size of primary lesion (SZ: 0, < 30 cm2; 1, ≥ 30 cm2), and Gleason stage/grade category (SG: 0, ≤ 10; 1, > 10). Cheng et al. [3] classified this dataset with three types of risk: (1) death due to prostate cancer; (2) death due to cardiovascular disease (CVD); and (3) other causes.

    We analyze the same dataset with the eight categorical variables (RX, AG, WT, PF, HX, HG, SZ, SG). There are 483 patients with complete information on these covariates, and the proportion of censored observations is 28.8%. We ignore the information about the risk factors and use the indices {V_{PC}} , {V_{NPC}} , {V_{PE}} , {V_{NPE}} , AIC, BIC, {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} to select the optimal number of risk types. From Table 6, the number of risk types selected by {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} is three, whereas the other indices select two. The number of model components selected by the proposed indices agrees with the previous study by Cheng et al. [3].

    Table 6.  The value of each index with different number of risk types under prostate cancer data.
    {V_{PC}} {V_{NPC}} {V_{PE}} {V_{NPE}} AIC BIC {V_{aRaS}} {V_{sRaS}} {V_{aRmS}} {V_{sRmS}}
    g = 2 0.7813 0.5626 0.3369 0.5720 4.1518 4.2989 0.5894 0.4437 0.5894 0.4437
    g = 3 0.6684 0.5027 0.5260 0.6135 4.5012 4.7262 0.3783 0.1974 0.5016 0.2943
    g = 4 0.5581 0.4109 0.7564 0.7075 4.7967 5.0996 0.4746 57.572 0.6123 98.534
    Note: (1) g represents the number of risk types when estimating. (2) The optimal values of g according to each index are highlighted in bold.


    From existing medical experience and a previous study, we presume that these model components may agree with particular types of risk, and we can then decide whether there are significant relationships between the covariates and the survival times by the Wald test. Based on the cause-specific hazard approach, Cheng et al. [3] found that treatment with a higher DES dosage (RX = 1) significantly reduces the risk of death due to prostate cancer. Table 7 shows that the DES dosage has a significant effect on time to death due to the 1st type of risk, and that the estimated regression coefficient of RX is negative. Byar and Green [26] found that patients with a large primary lesion (SZ = 1) and high-grade tumors (SG = 1) are at greater risk of prostate cancer death. Table 7 shows that SZ and SG have significant effects on time to death due to the 1st type of risk, and that their estimated regression coefficients are both positive. Accordingly, we presume the 1st type of risk relates to prostate cancer. Furthermore, based on the cause-specific hazard approach, Cheng et al. [3] found that treatment with a higher DES dosage (RX = 1) significantly increases the risk of death due to CVD. From Table 7, we see that the DES dosage has a significant effect on time to death due to the 2nd and 3rd types of risk, and that the estimated regression coefficients of RX are positive.

    Table 7.  The model estimates (with standard errors) of prostate cancer data given the number of risk types equal to 3.
    1st type of risk 2nd type of risk 3rd type of risk
    \mathit{\boldsymbol{p}} 0.2132 0.3930 0.3936
    \mathit{\boldsymbol{\beta }} RX −0.0296*(0.1267) 0.3546*(0.1414) 0.7589*(0.1425)
    AG 0.3144*(0.1143) 1.7445*(0.1041) 1.8104*(0.1396)
    WT −0.0817*(0.0916) 1.7915*(0.0967) −0.5555*(0.1290)
    PF 1.4742*(0.2233) 0.1244*(0.2527) 1.6468*(0.3325)
    HX 3.0027*(0.1176) 1.2829*(0.1377) −0.6092*(0.1486)
    HG 0.8489*(0.1536) 1.6074*(0.1669) −5.2153*(0.7267)
    SZ 0.8567*(0.2119) 3.0334*(0.1998) −3.2661*(0.4074)
    SG 4.3184*(0.1010) −0.3907*(0.1419) −0.9933*(0.1560)
    Note: * denotes P-value < 0.05.


    Patients with a history of cardiovascular disease (HX = 1) are known to have a higher probability of death due to CVD than patients without such a history, and Table 7 shows that the estimated regression coefficient of HX for the 2nd type of risk is positive. Hence, we presume the 2nd type of risk may relate to CVD. No explicit relationship between the covariates and the survival times attaches to the 3rd type of risk, so we only presume that it may relate to other causes of death without specification. According to these significant relationships between the covariates and the survival times, we conclude that the 1st, 2nd and 3rd types of risk estimated from the EM-based semi-parametric mixture hazard model correspond to prostate cancer, CVD, and other unspecified causes, respectively.
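The significance stars in Table 7 come from Wald tests on the estimated coefficients. Under the usual normal approximation, the statistic and two-sided p-value can be computed as follows (a sketch; the paper does not spell out this computation):

```python
import math

def wald_test(beta_hat, se):
    """Wald z-statistic and two-sided p-value for H0: beta = 0,
    using the standard normal approximation."""
    z = beta_hat / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

For the RX coefficient under the 2nd type of risk (0.3546 with standard error 0.1414), this gives z ≈ 2.51 and p ≈ 0.012 < 0.05, consistent with the star in Table 7.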

    This study introduces four new validity indices, {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} , for deciding the number of model components when applying an EM-based Cox proportional hazards mixture model to competing risks data. The new indices combine the posterior probabilities with the sum of standard residuals. Our study also sets up an extended kernel approach that estimates the baseline functions more smoothly and accurately. Extensive simulations show that the kernel procedure for baseline hazard estimation increases the rate of correctly classifying individuals into their true attributable type of risk. Furthermore, the simulation results demonstrate that the proposed validity indices are consistent and select the optimal number of model components more often than the traditional competitors, including the indices most commonly used in statistics, AIC and BIC. We also apply the proposed method to a prostate cancer dataset to illustrate its practicality.

    Applying the four new validity indices together clearly gives the best chance of selecting the optimal number of model components, but one may still wish to pick a single best index. The average separation versions ( {V_{aRaS}} , {V_{sRaS}} ) tend to neutralize the effects of small and large distances among the expectations of the component models. On the other hand, the minimum separation versions ( {V_{aRmS}} , {V_{sRmS}} ) catch the information about an overfitting model as soon as any two component-model expectations are close. In the analysis of the prostate cancer data, {V_{aRmS}} and {V_{sRmS}} behave more sensitively than {V_{aRaS}} and {V_{sRaS}} in detecting overfitting models (i.e., the distances of the indices between the overfitting and optimal models are much larger than those between the underfitting and optimal models). Since the simulation results show that {V_{sRmS}} performs slightly poorly on one model, we recommend {V_{aRmS}} if just one of the proposed validity indices is to be used.

    In the future we may test the effectiveness of the proposed validity indices on statistical models other than mixture Cox proportional hazards regression models. We could also improve the efficiency of the proposed indices in determining the number of components of mixture models. Another issue is reducing the computational cost: for instance, the bandwidth of the kernel procedure for the baseline hazard estimates is recalculated at each iteration, which consumes computation time. All these factors need further investigation and will be covered in our future research.

    The authors thank the anonymous reviewers for their insightful comments and suggestions which have greatly improved this article. This work was partially supported by the Ministry of Science and Technology, Taiwan [Grant numbers MOST 108-2118-M-025-001-and MOST 108-2118-M-003-003-].



    [1] D. R. Cox, Regression models and life-tables with discussion, J. R. Stat. Soc. Ser. B-Stat. Methodol., 34 (1972), 187-220.
    [2] G. Escarela and R. Bowater, Fitting a semi-parametric mixture model for competing risks in survival data, Commun. Stat.-Theory Methods, 37 (2008), 277-293.
    [3] S. C. Cheng, J. P. Fine and L. J. Wei, Prediction of cumulative incidence function under the proportional hazards model, Biometrics, 54 (1998), 219-228.
    [4] G. J. McLachlan and D. Peel, Finite mixture models, Wiley Series in Probability and Statistics: Applied Probability and Statistics, Wiley-Interscience, New York, 2000.
    [5] S. K. Ng and G. J. McLachlan, An EM-based semi-parametric mixture model approach to the regression analysis of competing risks data, Stat. Med., 22 (2003), 1097-1111.
    [6] I. S. Chang, C. A. Hsiung, C. C. Wen and W. C. Yang, Non-parametric maximum likelihood estimation in a semiparametric mixture model for competing risks data, Scand. J. Stat., 34 (2007), 870-895.
    [7] W. Lu and L. Peng, Semiparametric analysis of mixture regression models with competing risks data, Lifetime Data Anal., 14 (2008), 231-252.
    [8] S. Choi and X. Huang, Maximum likelihood estimation of semiparametric mixture component models for competing risks data, Biometrics, 70 (2014), 588-598.
    [9] A. K. Jain and R. C. Dubes, Algorithms for clustering data, Prentice-Hall, Englewood Cliffs, NJ, 1988.
    [10] Y. G. Tang, F. C. Sun and Z. Q. Sun, Improved validation index for fuzzy clustering, in American Control Conference, June 8-10, 2005. Portland, OR, USA, (2005), 1120-1125.
    [11] W. Wang and Y. Zhang, On fuzzy cluster validity indices, Fuzzy Sets Syst., 158 (2007), 2095-2117.
    [12] K. L. Wu, M. S. Yang and J. N. Hsieh, Robust cluster validity indexes, Pattern Recognit., 42 (2009), 2541-2550.
    [13] K. L. Zhou, S. Ding, C. Fu and S. L. Yang, Comparison and weighted summation type of fuzzy cluster validity indices, Int. J. Comput. Commun. Control, 9 (2014), 370-378.
    [14] J. M. Henson, S. P. Reise and K. H. Kim, Detecting mixtures from structural model differences using latent variable mixture modeling: A comparison of relative model fit statistics, Struct. Equ. Modeling, 14 (2007), 202-226.
    [15] J. R. Busemeyer, J. Wang, J. T. Townsend and A. Eidels, The Oxford Handbook of Computational and Mathematical Psychology, Oxford University Press, 2015.
    [16] R. Bender, T. Augustin and M. Blettner, Generating survival times to simulate Cox proportional hazards models, Stat. Med., 24 (2005), 1713-1723.
    [17] P. Royston, Estimating a smooth baseline hazard function for the Cox model. Technical Report No. 314. University College London, London 2011. Available from: https://www.semanticscholar.org/paper/Estimating-a-smooth-baseline-hazard-function-for-Royston/2f329b48f674a74253eb428b71ff237365fd4051.
    [18] A. Guilloux, S. Lemler and M. L. Taupin, Adaptive kernel estimation of the baseline function in the Cox model with high-dimensional covariates, J. Multivar. Anal., 148 (2016), 141-159.
    [19] M. Zhou, Empirical likelihood method in survival analysis, CRC Press, 2016.
    [20] I. Horova, J. Kolacek, and J. Zelinka, Kernel smoothing in Matlab theory and practice of kernel smoothing, World Scientific Publishing, 2012.
    [21] P. N. Patil, Bandwidth choice for nonparametric hazard rate estimation, J. Stat. Plan. Infer., 35 (1993), 15-30.
    [22] J. C. Bezdek, Numerical taxonomy with fuzzy sets, J. Math. Biol., 7 (1974), 57-71.
    [23] R. N. Dave, Validating fuzzy partition obtained through c-shells clustering, Pattern Recognit. Lett., 17 (1996), 613-623.
    [24] J. C. Bezdek, Cluster validity with fuzzy sets, Journal of Cybernetics, 3 (1974), 58-73. DOI: 10.1080/01969727308546047.
    [25] J. C. Dunn, Indices of partition fuzziness and the detection of clusters in large data sets, in Fuzzy Automata and Decision Processes (ed. M. M. Gupta), Elsevier, NY, 1977.
    [26] D. P. Byar and S. B. Green, The choice of treatment for cancer patients based on covariate information: application to prostate cancer, Bull. Cancer, 67 (1980), 477-490.
    [27] D. F. Andrews and A. M. Herzberg, Data: a collection of problems from many fields for the student and research worker, Springer-Verlag, New York, (1985), 261-274.
    [28] R. Kay, Treatment effects in competing-risks analysis of prostate cancer data, Biometrics, 42 (1986), 203-211.
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)