1.
Introduction
Generalized linear models are extensions of the linear regression model to avoid the selection of the normality response and linearity imposed by the linear regression model, which is impossible for binary or count responses. Regression for count data is widely performed by models, such as Poisson [1], negative binomial [2] and zero-inflated regressions [3,4,5]. The well-known property of a Poisson distribution shows its mean that is equal to the variance. This situation is often unrealistic as the distribution of counts tends to have a variance not equal to its mean. When data handles over-dispersion, a negative binomial distribution is utilized to model count variables. Zero-inflated models are used to model count data that have many zero counts. Moreover, the Conway-Maxwell Poisson distribution is used to deal with under-dispersion and over-dispersion [2]. Also, the discrete Weibull distribution is examined to handle under-dispersion and over-dispersion discrete data. This model was first introduced by Nakagawa and Osaki [6]. The motivation for considering the discrete Weibull distribution stems from the vital role the that continuous Weibull distribution plays in the survival analysis and failure time study. Similarly, the continuous Weibull distribution is widely used in probabilistic modeling and a fatigue life prediction model [7,8]. However, there are many challenging things about this distribution that has yet to be proposed.
The inference for parameters of the discrete Weibull regression model has been investigated in a few studies based on a parameter affecting zero observation related to the explanatory variables through the log-log and logit link functions. Kalktawi [9] and Englehardt and Li [10] showed how a discrete Weibull regression model can be adapted to address over-dispersion and under-dispersion via the log-log link function. Moreover, Klakattawi et al. [11] proposed an ability to adapt in a simple way to different types of dispersions: Over-dispersion, under-dispersion and covariate-specific dispersion. Peluso and Vinciotti [12] conducted a simulation study linking two parameters to inspect a discrete Weibull regression model's level of flexibility.
The maximum likelihood estimation of parameters is valid for an asymptotically large sample size of data [13]. One of the most common problems occurring in the count regression model is the maximum likelihood estimates that become unstable with larger standard errors of the estimates that affect statistical inference when insufficiently large sample sizes manifest. To overcome the problem, various alternatives to the maximum likelihood estimation have been proposed, and the Bayesian estimation is one of them. However, the Bayes estimators depend on the prior distributions of the parameters in the model. Many researchers at different periods of time worked in this area of research proposed different prior distributions in the count regression model (see, for example, Haselimashhadi et al. [14], Gelman et al. [15], Fu [16] and Chanialidis et al. [17]). Recently, Chaiprasithikul and Duangsaphon [18,19] proposed the Bayesian estimation for censored data and the zero-inflated and hurdle discrete Weibull regression models via the log-log link function for a parameter that affects zero observation. Uniform non-informative and normal prior distributions were used to account for the regression coefficients. It was demonstrated that this suggestion performs well when applied to real datasets and in simulation studies. Alternatively, a regression structure for the discrete Weibull model through the median link function was proposed by Kalktawi [9]. In addition, there are many works that have developed the Bayesian estimation procedure for the reliability and life testing experiments (see, for example, Ahmadini et al. [20] and Okasha et al. [21]).
In the present article, the Bayesian estimation is examined, based on the random walk Metropolis algorithm for the median discrete Weibull regression model under the three different prior distributions: Uniform non-informative, normal and Laplace prior distributions. A simulation study is conducted to compare the performance of three different prior distributions and the maximum likelihood estimation in both under-dispersion and over-dispersion cases. Moreover, a real dataset is analyzed to see how the model works in practice.
The rest of paper is organized as follows. An overview of the median discrete Weibull regression model is presented in section two, along with the maximum likelihood estimation, Bayesian estimation, simulation study and real data analysis. In section three, results and discussion are explained. Finally, this paper is concluded in section four.
2.
Materials and methods
2.1. The median discrete Weibull regression model
A discrete Weibull distribution (type one) and some properties were introduced by Nakagawa and Osaki [6]. The cumulative distribution function and the probability mass function of a random variable are given by
and
respectively, where 0<q<1 and β>0 are the shape parameters. When y=0, the parameter q=1−pY(0;q,β), which is the probability of Y more than zero. In other words, when q is small, an excessive zero case occurs. Parameter β indicates the skewness and controls the range of values of Y. Moreover, Kalktawi [9] proposed the parameter β reflects the dispersion of data through the numerical analyses; if 0<β≤1, the data is over-dispersion, if β≥2, the data is under-dispersion and if 1<β<2, the data can be either over-dispersed or under-dispersed depending on the value of q. Thus, the discrete Weibull distribution is suitable for both over-dispersion and under-dispersion. Meanwhile, there are the special cases of a discrete Weibull distribution: Geometric distribution, the discrete Exponential distribution (see Sato et al. [22]), the discrete Rayleigh distribution (see Roy [23]) and the Bernoulli distribution.
The mean and variance of a discrete Weibull distribution are no closed-form expressions, but the numerical approximations can be obtained (see Barbiero [24]). Another property of a discrete Weibull distribution is the quantile function Q(τ). Let's say the τ−th (0<τ<1) quantile that is the smallest value of y for which F(y)≥τ. The quantile Q(τ) has a closed-form expression as given by
The quantile formula provided in Eq (3) can be applied. The median for discrete distributions can be defined as any value of y that F(y)≥0.5, then the median can be easily obtained from the closed form as given by
The presence of regression analysis for count data is a statistical process to measure the relationship between a count variable and one or more explanatory variables. Klakattawi et al. [11] showed how a discrete Weibull regression model can be adapted to address over-dispersion and under-dispersion via the log-log link function to the parameter q. Moreover, Haselimashhadi et al. [14] applied the logit link function to the parameter q, which is commonly used in classification problems for probabilities that are bounded between zero and one. Furthermore, they performed β that depends on explanatory variables through the log link function. The proposed discrete Weibull regression model, unlike generalized linear models in which the conditional mean is central to the interpretation, has the advantage that the conditional quantiles can be easily extracted from the fitted model. Moreover, the regression coefficients can be easily interpreted in terms of changes in the conditional median. According to the median, it has a closed-form expression and is more appropriate than the mean because of the skewness and outliers commonly found for counting data. Kalktawi [9] performed the median link function to a discrete Weibull regression model and used the maximum likelihood method for parameters estimation.
This study determines Yi,i=1,2,...,n as a count response variable, which takes only the non-negative integer values with the k explanatory variables, xi=(1,xi1,...,xik) and a vector composed of regression coefficients as α=(α0,α1,...,αk)′. It is assumed that the parameter qi=q(xi) is related to k explanatory variables xi via the median link function as follows
In the context, it is useful to assume that
Thus,
Equation (7) is substituted to Eq (4); hence, the parameter qi can be obtained as
The conditional probability mass function of Yi given xi can be written as
2.2. Maximum likelihood estimation
In this section, the maximum likelihood estimation for the discrete Weibull regression model is performed by linking only parameter q. The likelihood function of the median discrete Weibull regression is given by
The log-likelihood function of the discrete Weibull regression model via median is given by
The maximum likelihood estimation of the parameters is obtained by setting the first partial derivatives of the log-likelihood function with respect to each unknown parameter equal to zero;
where wi(α,β)=e−(ln2)(yiexiα)β−e−(ln2)(yi+1exiα)β.
The maximum likelihood estimators do not have a closed form solution because of the complex form of the likelihood equations. It is very difficult to prove that the solution to the normal equations gives a global maximum. Therefore, the maximum likelihood estimators are estimated by using the numerical method applied in the function optim() from package stats in R language, which minimizes the negative log-likelihood function of the median discrete Weibull regression model.
Let I(α,β) be the observed Fisher's information matrix for the (k+2)×(k+2) unknown parameters that contain negative of the second derivative of the log-likelihood function; hence, the variance-covariance matrix is the inverse of the observed Fisher's information matrix,
The maximum likelihood estimators are substituted, thus resulting in an estimator of ∑ denoted by ˆ∑,
This matrix can be obtained by inverting the Hessian matrix from the function hessian() in R language. The Hessian matrix contains the second derivative of the negative log-likelihood, i.e., moreover, the Hessian matrix is the observed Fisher's information matrix.
According to the parameter inferences performed using the maximum likelihood method, under some regularity conditions [25], these estimators enjoy standard asymptotic properties. Thus, by the asymptotic normality of maximum likelihood estimators, the 100(1−α)% confidence intervals for parameters αj, j=0,1,2,..., k and β, respectively are
where zα/2 is the upper α/2−th quantile of the standard normal distribution.
2.3. Bayesian estimation
2.3.1. Random walk Metropolis algorithm
The Metropolis-Hastings (MH) algorithm is the most popular example of a Markov chain Monte Carlo (MCMC) method for simulating a sample from a probability distribution that is the target distribution from which direct sampling is difficult. This algorithm is similar to the acceptance-rejection method; the proposal (candidate) value can be generated from the proposal distribution, then, the proposal value is accepted with an acceptance probability. Moreover, the MH algorithm is converging to the target distribution itself. For more details on the MH algorithm, see Hastings [26] and Gilks et al. [27].
Given y=(y1,y2,...,yn) is the vector of the observed values of a random sample Y1,Y2,...Yn, let p(θ|y) be the target distribution, while θ is the vector of current state values (parameters) and θ∗ is the proposal value generated from the proposal distribution q(θ∗|θ). Then, the proposal value θ∗ is accepted with the probability p=min(1,Rθ), where
The iterative steps of the MH algorithm can be described as follows
Step 1: Initialize the parameter θ(0) for the algorithm.
Step 2: For l=1,2,...,L repeat the following steps:
a. Generate θ∗∼q(θ∗|θ(t−1)).
b. Calculate p=min(1,Rθ).
c. Generate u from a uniform distribution, u∼U(0,1).
If u≤p, accept θ∗ and set θ(l)=θ∗ with probability p.
If u>p, reject θ∗ and set θ(l)=θ(l−1) with probability 1-p.
A random walk Metropolis algorithm is a special case of the MH algorithm. In the random walk Metropolis algorithm, the proposal distribution is symmetrical, depending only on the distance between the current state value and the proposal value, then the proposal value θ∗ is accepted with probability p=min(1,Rθ), where
The algorithm of random walk Metropolis can be summarized followed by the above steps with adjusting Step 2. Generate random error ϵ from a multivariate normal distribution with a zero-mean vector and variance-covariance ∑.
2.3.2. Bayesian estimation for median discrete Weibull regression model
This section performs the Bayes estimators for the median discrete Weibull regression model based on three schemes of prior distributions as follows:
ⅰ) Uniform non-informative prior distribution: If no prior information is available, a default flat prior can be resorted to, then it is easy to focus on the uniform non-informative prior distribution. The following the prior distributions are
ⅱ) Normal prior distribution: As stated earlier, the possible values of αj are real numbers, which corresponds to the possible values of a normal distribution; this study selects the prior distribution of αj; that is, a normal distribution with the hyperparameters as (μαj,σ2αj), j=0,1,...,k. For parameter β, this study selects the prior distribution that is Gamma distribution with the hyperparameters as (a,b). The following prior distributions are
and
ⅲ) Laplace prior distribution: If prior information is available, this study can perform the informative prior distribution that should include all possible values of parameter. The possible values of αj are real numbers which corresponds to the possible values of a Laplace distribution; it selects the prior distribution of αj; that is, a Laplace distribution with the hyperparameters as (0,1/λ). Similarly, it selects the prior distribution, which is Gamma distribution with the hyperparameters as (a,b). The following prior distributions are
and
The joint prior distributions of the parameters α and β under the independence assumption is
where θ=(α0,α1,...,αk,β).
The choice of the hyperparameters' values is generally modified by available information of dataset to improve the Bayes estimators. For example, it fixes the hyperparameters' values of αj,j=0,1,2,...,k of normal prior distribution with mean zero and high variance. For Laplace prior distribution, it fixes the hyperparameters' values of αj,j=0,1,2,...,k with some λ>0. In addition, the hyperparameters' values of β are considered by the maximum likelihood estimator of β with the mean of Gamma distribution. The joint posterior density function of the parameters α and β can be written as:
where L(θ|y,x) is the likelihood function of the median discrete Weibull regression model in Eq (10).
The Bayes estimator of each parameter under the squared error loss function is the expected value of each parameter under the joint posterior density function. Therefore, the Bayes estimators are given by
and
where j=0,1,2,...,k.
A difficulty to the implementation of Bayesian procedure is that of obtaining the posterior distribution. The process often requires the integration which is very difficult to calculate especially when dealing with complex and high-dimensional models. In such a situation, MH algorithms are highly helpful in this case to model deviations from the posterior density and generate accurate approximations [26,27].
Since the integral in Eqs (19) and (20) does not have a closed form, this study chose the random walk MH algorithm to estimate the Bayes estimators. It also determines the joint posterior density function of the parameters α and β in Eq (18) as the target distribution, while θ is the current state value and θ∗ is the proposal value generated from the proposal distribution q(θ∗|θ). Then, the proposal value θ∗ is accepted with the probability p=min(1,Rθ), where
For the random walk Metropolis algorithm, the proposal distribution is symmetrical, depending only on the distance between the current state value and the proposal value. Then, the proposal value θ∗ is accepted with probability p=min(1,Rθ), where
The iterative steps of the random walk Metropolis algorithm can be described as follows:
Step 1: Initialize the parameters θ(0)=(α(0),β(0)) for the algorithm using the maximum likelihood estimation (MLE) of the parameters θ=(α,β).
Step 2: For l=1,2,...,L, repeat the following steps:
a. Generate random error vector ϵ from a multivariate normal distribution with a zero-mean vector and variance-covariance matrix as a diagonal matrix in which the diagonal elements are the diagonal of the inverse of the observed Fisher's information matrix; ϵ∼N(μμ=0,∑=diag(I−1(θ))).
Then, set θ∗=θ(l−1)+ϵ.
b. Calculate p=min(1,Rθ) where Rθ=L(θ∗|y,x)π(θ∗)L(θ|y,x)π(θ).
c. Generate μ from a uniform distribution; u∼U(0,1).
If u≤p, accept θ∗ and set θ(l)=θ∗ with probability p.
If u>p, reject θ∗ and set θ(l)=θ(l−1)= with probability1−p.
Step 3: Remove B of the chain for burn-in.
Step 4: Calculate the estimated values of the Bayes estimators of the parameters α, and β from the average of the generated values as given by
where θ is a parameter in vector θ=(α,β).
The construction of the highest posterior density (HPD) credible intervals of the parameters αj,j=0,1,2,..,k, and β follows the Monte Carlo procedure. Given an MCMC sample θ(l),l=B+1,B+2,...,L, the HPD interval for θ can be shown as follows:
Step 1: Sort θ(l),l=B+1,B+2,...,L to obtain the ordered value
Step 2: Compute the 100(1−α)% HPD credible intervals
where [(1−α)(L−B)] is the integer part of (1−α)(L−B) and θ is a parameter in vector θ=(α,β).
2.4. Simulation study
In this section, the Monte Carlo simulation is conducted to assess and compare the performance of the Bayesian estimation via the random walk Metropolis algorithm for the median discrete Weibull regression model under difference three prior distributions for the regression parameters: Uniform non-informative prior (Bayes(U)), normal prior (Bayes(N)) and Laplace prior (Bayes(L)). Moreover, the MLE is considered. The various selected sample sizes (n) are 50,100 and 200. The three explanatory variables are considered: A standard normal distribution (x1∼N(0,1)), a uniform distribution that lies between -0.3 and 0.3 (x2∼U(−0.3,0.3)) and a Bernoulli distribution with probability of success 0.4 (x3∼Ber(0.4)). In particular, this study selects the regression parameters to take values (α0,α1,α2,α3)=(1.5,0.4,−0.2,0.8) and β=0.9 for over-dispersion, β=2.5 for under-dispersion and β=1.6 for either over-dispersed or under-dispersed, depending on the value of q.
The parameters are estimated by using the numerical method. In this paper, the Nelder-Mead method in the function optim() from package stats in R is applied to estimate parameters of the median discrete Weibull regression model. For Bayesian estimation, it fixes the hyperparameters' values of αj and j=0,1,2,3 of normal prior distribution with mean zero and variance 1002 and the hyperparameters' values of β as one and the maximum likelihood estimator. In addition, it fixes the hyperparameters' values of αj and j=0,1,2,3 of Laplace prior distribution as 0.5 and the hyperparameters' values of β as one and the maximum likelihood estimator. Additionally, this study considers 10,000 iterations of the sampler and uses the first 10% of the data as burn-in. This simulation study is repeated 1,000 times. The measures of accuracy for the estimators are
(ⅰ) the estimates of the parameters (Est.)
(ⅱ) the mean square error (MSE)
(ⅲ) the coverage probability (CP)
(ⅳ) the average length (AL)
where ^αj is the j-th estimator LCL(l)αj, UCL(l)αj are the j-th lower bound and upper bound for the 95% confidence interval of the l -th time and LCLαj<αj<UCLαj is the total of the number of times that αj is inside the confidence interval. The same measure of accuracy has been applied for the estimators of parameter β. Tables 1–3 report the estimates of the parameters (Est.) together with the MSE. In addition, Tables 4–6 report the 95% CP and the AL.
2.5. Real data analysis
In this section, the median discrete Weibull regression is applied to a real data set that shows the ability for over-dispersion data (see Kalktawi [9]). This data is available under the "COUNT" package in R from Hosmer and Lemeshow [28] and represents the number of visits to a doctor by pregnant women in the first three months of their pregnancies with 189 observations. The response variable is the number of physician visits in first trimester, and the three explanatory variables are history of mother smoking (1 = history of mother smoking; 0 = mother nonsmoker) (x1), weight (lbs) at last menstrual period (x2) and age of mother (x3). For fitting the discrete Weibull distribution of the response variable, the Kolmogorov-Smirnov statistic is 0.0985 less than the critical value of 0.0989. Thus, this data can be modeled by the discrete Weibull distribution. Moreover, this data is modeled by the Poisson, negative binomial and discrete Weibull distributions. The results show that the Akaike information criterion (AIC) from the Poisson, negative binomial and discrete Weibull distributions are 476.59,466.85 and 466.84, respectively. Additionally, the mean and variance of the data are 0.7937 and 1.1221, respectively, which indicates an over-dispersion case.
We estimate parameters and construct the 95% confidence intervals via the maximum likelihood estimation method. To demonstrate how the proposed Bayesian method under the three prior distributions can be used in practice, this study calculates parameter estimates and the 95% HPD interval of the parameters with L=10,000 replicates and 10% of the chain for burn-in; B=1,000. In addition, the three information criteria, namely, the AIC, the Bayesian information criterion (BIC) and the deviance information criterion (DIC) (see in Kalktawi [9] and Haselimashhadi et al. [14]) are applied to compare models with different estimates of parameters, which are models for the three explanatory variables and a subset of parameters of that significance. All results are reported in Table 7. Along with, the traceplot, autocorrelation for sampled values and posterior densities of significant independent variables are performed in Figure 1.
3.
Results and discussion
3.1. Results and discussion of simulation study
An inspection of Tables 1–3 show that the estimates of the parameters (Est.) in the simulation study obtained by all methods seem to be close to the true parameter values. Moreover, all of the estimators have monotonic behaviors according to the MSE, namely, when n increases, the estimated MSE values decrease. The Bayes estimators have a smaller MSE than the estimators of MLE. The MSE of the Bayes(L) outperforms other methods in almost all situations. Additionally, note that the MSE for all estimators of the three Bayesian methods behave very similarly when n=200. Conversely, the MLE presents the highest MSE but has a satisfactory performance when sample size increases. In addition, , note that the MSE of the estimators of α2 are very high and may cause what we define as a strong effect on x2 or the high variance of the estimators of α2. Furthermore, the explanatory variable affects the MSE of estimators. However, when n is large enough and β increases, the MSE of estimators will decrease and will no longer be very high anymore.
Tables 4–6 show that when sample sizes were increased, the CP of all methods was generally close to the nominal confidence level. The CP obtained when using the three Bayesian methods are closer to the nominal level than using MLE method. Additionally, the CP for the three prior distributions of the Bayesian method behave remarkably similar. Regarding the AL, as sample sizes were increased, the AL of the 95% confidence intervals decreased for all methods. For cases β=0.9 and β=1.6, the AL based on the Bayes(L) were the shortest for almost all situations after the CP was considered, while the Bayes(U) were the shortest in the case of β=2.5. Although the AL for the MLE method performs the shortest in some situations, but the CP is farthest from the nominal confidence level. Additionally, the results of the AL for the estimator α2 are quite wide, which is the explanation given for the extremely high MSE values of the estimator α2.
3.2. Results and discussion of real data analysis
Table 7 shows the results of significant explanatory variables that are selected from the three explanatory variables. We only report that an explanatory variable x3 (age of mother) shows significant in all methods. Results from comparing models with different estimates of parameters suggest that a model for only x3 provided better fitting than a model for the three explanatory variables, according to the AIC, BIC and DIC. Regarding the DIC of the three Bayesian methods for each of the two models, they exhibited remarkably similar behavior. Additionally, the AIC and BIC are included as well. This finding corresponds to the results of the simulation study in case n=200. Figure 1 shows the traceplot, autocorrelation for sampled values and posterior densities for regression coefficient α3 based on the Bayes(L). It can be seen that the trace plot showed adequate convergence. Moreover, it is clear that the sampled values are well mixed and exhibit adequate stability for autocorrelation.
4.
Conclusions
This paper introduced the Bayesian estimation for the discrete Weibull regression model via the median link function. The Bayesian approach was considered on the three different prior distributions: Uniform non-informative prior, normal prior and Laplace prior. The augmented random walk Metropolis procedure was also proposed to compute the Bayes estimates of the unknown parameters. Moreover, the maximum likelihood estimation was compared. The performance of these methods was compared by using the Monte Carlo simulation based on the MSE and CP criteria. These criteria were calculated for different sample sizes based on both under-dispersion and over-dispersion data, along with the application of the methods illustrated by using a real dataset available on the literature to compare models with different estimates of parameters via the Akaike information criterion, the Bayesian information criterion and the deviance information criterion. Based on MSE criterion, the Bayesian using Laplace prior distribution for estimating the parameters performs better than other approaches. Additionally, the three Bayesian methods behaved very similarly with the large sample size. Estimated coverage probabilities of the three Bayesian approaches were considered as the criteria of a good confidence interval. In additioon, the results of real data analysis were coincided with those in the simulation study. Overall, the Bayesian estimation using Laplace prior distribution outperforms other methods for parameters estimation. However, the Bayesian estimation using all three prior distributions can be an effective alternative for this model.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The authors are grateful to the Editor in Chief and anonymous referee for the insightful and constructive comments that improve this paper.
Conflict of interest
All authors declare no conflicts of interest in this paper.