1. Introduction
Over the last few decades, almost all research on the construction of G-families has adopted approaches based on differential equations, compounding, weighting, etc., and thousands of statistical models have been added to the literature. No doubt, some very useful models have been introduced using these techniques. Still, a careful analysis reveals that most of these models are closely related and may replace one another under specific parametric conditions. Moreover, they are similar in mathematical form, with only mild differences. The critical point is that almost all of these models are algebraic and non-trigonometric. For a brief study, we refer the reader to [1,2,3,4,5]. For more information about machine learning, see [6,7,8].
Recently, the attention of statisticians has turned towards directional data and towards embedding trigonometric functions in existing classical models in order to construct generalized trigonometric families of distributions, which open new joint research horizons for mathematicians and statisticians. It has been observed that trigonometric functions enhance flexibility considerably, keep the models relatively balanced and simple, show wide applicability in modeling different types of practical data sets, and capture skewness, kurtosis, and tail characteristics while improving the goodness-of-fit (GoF). Table 1 presents a chronological literature review of sine-function-based families and distributions.
The basic motivation is to develop a new generalized Lomax family of probability distributions that mixes trigonometric and algebraic functions. The remaining motivations are fivefold:
● To develop a new sine generator of distributions in the spirit of the odd generator, combining algebraic and trigonometric functions;
● To introduce a new G-family called "The Odd Lomax Trigonometric Generalized Family of Distributions" (Locsc-G for short) in a trigonometric setting;
● To propose a family that is simple and free from non-identifiability and over-parameterization issues;
● To investigate the injection of sine-cosecant functions into the odd generator method for classical distributions, leading to novel, more versatile, and effective models;
● To obtain a density that adopts unimodal shapes (for almost all baseline models) and a hazard function that adopts monotone and non-monotone shapes.
The current study is conducted following the spirit of the odd generator presented by [9], the Weibull-G family developed by [10], the sine-G family introduced by [11] and the generalized odd Gamma-G family introduced by [12] collectively. In modern distribution theory, in our view, the trigonometric functions based on generalized families and distributions will prove a breakthrough for modeling the data of physical phenomena.
Section 1 introduces trigonometric families of distributions and the motivations for this work. The new generator and family, with special members, are presented in Section 2, and the characteristics of the new family are derived in Section 3. Section 4 describes the estimation methods. In Section 5, the graphical behavior of the new family is examined using well-known statistical models. The special member with the Weibull baseline is investigated in Section 6, along with its sub-models. Section 7 reports a numerical simulation. Two data applications demonstrate the usefulness of the new family and model in Section 8, and final remarks and conclusions end the study in Section 9.
2. The new family
2.1. Genesis of odd sine/reciprocal cosecant generator
The odd generator $G(x)/[1-G(x)]$ and the sine function $\sin\left(\frac{\pi}{2}G(x)\right)$ are used together to develop the new generator.
This generator $W[G(x)]:[0,1]\longrightarrow\mathbb{R}$ (a link function) satisfies all the required conditions of the T-X family of distributions.
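For reference, the generator can be written out explicitly. The display below is a sketch reconstructed from the verbal description (the odd ratio applied to the sine transform of $G$); it agrees with the cosecant form of $W[G(x)]$ quoted in Section 2.2.

```latex
W[G(x)] \;=\; \frac{\sin\!\left(\frac{\pi}{2}G(x)\right)}{1-\sin\!\left(\frac{\pi}{2}G(x)\right)}
\;=\; \left[\csc\!\left(\frac{\pi}{2}G(x)\right)-1\right]^{-1},
\qquad W[0]=0, \qquad W[G]\to\infty \ \text{ as } \ G\to 1^{-}.
```

Since the sine function is increasing on $[0,\pi/2]$, $W$ is increasing in $G$ and takes values in $[0,\infty)$, which is the support of the Lomax distribution.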
2.2. Origination of the basic functions
Let $r(t)=\frac{\alpha}{\lambda}\left[1+\frac{t}{\lambda}\right]^{-(\alpha+1)}$ be the Lomax density, where $0<t<\infty$. Replacing $t$ by the new generator $W[G(x)]=\left[\csc\left(\frac{\pi}{2}G(x)\right)-1\right]^{-1}$ in the Lomax distribution, we arrive at the new family "Locsc-G", whose distribution function (Eq (2.2)), probability density function (Eq (2.3)) and hazard rate function (Eq (2.4)) are, respectively, given below.
In Table 2, for example, eight new members are obtained by employing well-known statistical distributions on all feasible intervals.
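The construction can be sketched numerically. The code below assumes that Eq (2.2) is the Lomax cdf evaluated at $W[G(x)]$, that is $F(x)=1-\left(1+W[G(x)]/\lambda\right)^{-\alpha}$, and that Eq (2.3) is its derivative; the function names and the Weibull baseline parameters are illustrative choices, not taken from the paper.

```python
import math

def locsc_W(G):
    """Generator W[G] = [csc(pi*G/2) - 1]^(-1), written as s / (1 - s) with s = sin(pi*G/2)."""
    s = math.sin(0.5 * math.pi * G)
    return s / (1.0 - s)

def locsc_cdf(x, alpha, lam, base_cdf):
    """Lomax cdf evaluated at W[G(x)] (assumed form of Eq (2.2))."""
    return 1.0 - (1.0 + locsc_W(base_cdf(x)) / lam) ** (-alpha)

def locsc_pdf(x, alpha, lam, base_cdf, base_pdf):
    """Lomax density at W[G(x)] times dW/dx (assumed form of Eq (2.3))."""
    u = 0.5 * math.pi * base_cdf(x)
    s = math.sin(u)
    W = s / (1.0 - s)
    # Chain rule: dW/dx = (pi/2) * g(x) * cos(u) / (1 - sin(u))^2
    dW_dx = 0.5 * math.pi * base_pdf(x) * math.cos(u) / (1.0 - s) ** 2
    return (alpha / lam) * (1.0 + W / lam) ** (-(alpha + 1.0)) * dW_dx

def locsc_hazard(x, alpha, lam, base_cdf, base_pdf):
    """Hazard rate f(x) / [1 - F(x)]."""
    return locsc_pdf(x, alpha, lam, base_cdf, base_pdf) / (1.0 - locsc_cdf(x, alpha, lam, base_cdf))

# Illustrative Weibull baseline G(x) = 1 - exp(-sigma * x^beta); values are arbitrary.
sigma, beta = 1.0, 1.5
weibull_cdf = lambda x: 1.0 - math.exp(-sigma * x ** beta)
weibull_pdf = lambda x: sigma * beta * x ** (beta - 1.0) * math.exp(-sigma * x ** beta)

for x in (0.2, 0.7, 1.5):
    print(x, locsc_cdf(x, 2.0, 1.5, weibull_cdf), locsc_pdf(x, 2.0, 1.5, weibull_cdf, weibull_pdf))
```

Any baseline with a closed-form cdf and pdf can be passed in the same way; for $G(x)$ numerically equal to 1 the ratio $s/(1-s)$ overflows, so very large $x$ would need separate handling.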
3. Characteristics of the new family
3.1. Quantile function and quantile density function
We now discuss the quantile function (qf) of $X$, which may be determined directly by inverting (2.2), as below
Equation (3.1) has many applications, including the following: 1) finding the median, quartiles, deciles and percentiles; 2) for any chosen baseline model, simulating random samples, from which densities, histograms, and empirical cdfs can be obtained; 3) analyzing skewness and kurtosis on the basis of quantile measures such as the Bowley skewness (see [13]) and the Moors kurtosis (see [14]), respectively. A related function of statistical significance, discussed in [15], is the quantile density function, denoted by $Q'(U)$:
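The quantile function and quantile density can be sketched as follows. The closed form used here is obtained by inverting the assumed cdf $F(x)=1-\left(1+W[G(x)]/\lambda\right)^{-\alpha}$, which gives $W=\lambda\left[(1-u)^{-1/\alpha}-1\right]$ and $G=\frac{2}{\pi}\arcsin\left(\frac{W}{1+W}\right)$; the function names and the Weibull baseline values are hypothetical.

```python
import math

def locsc_quantile(u, alpha, lam, base_quantile):
    """Q(u) = G^{-1}( (2/pi) * asin( W / (1 + W) ) ), with W the Lomax quantile."""
    W = lam * ((1.0 - u) ** (-1.0 / alpha) - 1.0)
    return base_quantile((2.0 / math.pi) * math.asin(W / (1.0 + W)))

def locsc_qdf(u, alpha, lam, base_quantile, base_pdf):
    """Quantile density Q'(u) = (dG/du) / g(Q(u)) by the chain rule."""
    W = lam * ((1.0 - u) ** (-1.0 / alpha) - 1.0)
    dW_du = (lam / alpha) * (1.0 - u) ** (-1.0 / alpha - 1.0)
    # d/dW of (2/pi) * asin(W / (1 + W)) simplifies to (2/pi) / ((1 + W) * sqrt(1 + 2W))
    dG_du = (2.0 / math.pi) * dW_du / ((1.0 + W) * math.sqrt(1.0 + 2.0 * W))
    return dG_du / base_pdf(locsc_quantile(u, alpha, lam, base_quantile))

def bowley_skewness(Q):
    """Quartile-based skewness."""
    return (Q(0.75) - 2.0 * Q(0.5) + Q(0.25)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """Octile-based kurtosis."""
    return (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(6 / 8) - Q(2 / 8))

# Illustrative Weibull baseline; parameter values are arbitrary.
sigma, beta = 1.0, 1.5
weibull_quantile = lambda v: (-math.log(1.0 - v) / sigma) ** (1.0 / beta)
weibull_pdf = lambda x: sigma * beta * x ** (beta - 1.0) * math.exp(-sigma * x ** beta)

Q = lambda p: locsc_quantile(p, 2.0, 1.5, weibull_quantile)
print("median:", Q(0.5), "Bowley:", bowley_skewness(Q), "Moors:", moors_kurtosis(Q))
```

Feeding uniform random numbers on $(0,1)$ into `locsc_quantile` gives a simple inverse-transform sampler for simulation studies.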
3.2. Basic reliability functions
The hazard rate function plays a vital role in risk and survival analysis. Other important reliability functions are the survival function $S(x)$, the reversed hazard rate $r(x)$, the cumulative hazard rate $H(x)$, Mills' ratio $m(x)$, the elasticity $e(x)$ and the conditional reliability function $\bar{G}(G(x),\alpha,\beta|t)$, which are, respectively, presented below.
3.3. Useful series expansions
The cdf and pdf of the new family admit the following linear representations.
Proposition 3.1. The new family's cdf and pdf have the following linear representations:
where
Proof. If the cdf and pdf of a random variable $Y$ can be written as $H_c(x)=G(x)^{c}$ and $h_c(x)=cG(x)^{c-1}g(x)$, then $Y$ is said to have the exp-G distribution with power parameter $c>0$. The cdf of the new family, which is to be linearized, is
using the expansion $(1+x)^{-n}=\sum_{i=0}^{\infty}\binom{-n}{i}x^{i}$ together with the binomial expansion, $F(x)$ becomes
using MATHEMATICA 11.1, $\left[\csc\left(\frac{\pi}{2}G(x)\right)\right]^{j}=\sum_{k=0}^{\infty}a_k(j)\left(\frac{\pi}{2}G(x)\right)^{2k}$
where $a_0(j)=1$, $a_1(j)=j/6$, $a_2(j)=(j/180)+(j^{2}/72)$, etc.
For F(x), the required linear representation is obtained. Moreover, just by simple differentiation, the linear representation of f(x) can be obtained.
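The quoted coefficients can be checked numerically. The sketch below compares the three-term truncation with the exact value of $\csc^{j}(z)$ for a small argument. One observation, which concerns the coefficients rather than the authors' intended notation: $a_0$, $a_1$, $a_2$ are the series coefficients of $(z\csc z)^{j}$, so the factor $z^{-j}$ is kept explicit in the code. The function names are illustrative.

```python
import math

def a_coeffs(j):
    """First three coefficients a_0(j), a_1(j), a_2(j) quoted in the proof."""
    return [1.0, j / 6.0, j / 180.0 + j * j / 72.0]

def csc_power_truncated(z, j):
    """Three-term truncation: z^(-j) * sum_k a_k(j) * z^(2k), k = 0, 1, 2."""
    return z ** (-j) * sum(a_k * z ** (2 * k) for k, a_k in enumerate(a_coeffs(j)))

z, j = 0.2, 3
exact = (1.0 / math.sin(z)) ** j
approx = csc_power_truncated(z, j)
print(exact, approx, abs(exact - approx) / exact)
```

The truncation error is of order $z^{6}$ relative to the leading term, so the agreement degrades as $z=\frac{\pi}{2}G(x)$ approaches $\pi/2$.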
3.4. Moments and derivations
For the new family, the $r$th moment is (where $r$ is an integer and all sums and integrals are assumed to exist)
$\mu'_r$ can also be expressed through the quantile function (by the change of variable $x=Q(p)$) given by Eq (3.1), as follows
The resulting integral can be computed with any modern mathematical software such as Mathematica, R, Matlab, or Maple for given $G(x)$, $\alpha$, and $\lambda$.
3.5. Probability weighted moments
For $r\geq 1$, $s\geq 0$, the $(r,s)$th probability weighted moment (PWM) is defined as
Then, we have
After some manipulation using trigonometric relations, (3.7) can be written as:
and
Similarly,
Hence, in Eq (3.6), $\rho_{r,s}$ can be written as:
where
3.6. Moment generating function
The moment generating function (mgf) of $X$ is defined as follows:
Alternatively, without using the moments but using the linear representation in expression (3.2), $M(t)$ can be expressed as
3.7. Critical points of the density and hazard rate function
The critical points of the density and hazard rate function are obtained by solving $\frac{\partial \log[f(x)]}{\partial x}=0$ and $\frac{\partial \log[h(x)]}{\partial x}=0$, respectively. For the Locsc-G density function, the corresponding nonlinear equation is:
The critical points of the hrf are obtained from the equation $\frac{\partial \log[h(x)]}{\partial x}=0$:
3.8. Stochastic ordering
A detailed description of stochastic ordering is available in [16]. Here, a result on the stochastic ordering of the family with respect to the parameter $\alpha$ is proved.
Proposition 3.2. Let $X$ be a random variable with density $f_1(x)$ as defined in (2.3) with parameters $\alpha_1$ and $\lambda$, and let $Y$ be a random variable with density $f_2(x)$ as defined in (2.3) with parameters $\alpha_2$ and $\lambda$. If $\alpha_1\geqslant\alpha_2$, then $f_1(x)/f_2(x)$ is decreasing in $x$, i.e., $X\leqslant_{lr}Y$ under the usual convention for the likelihood ratio order.
Proof. The density is
Then
Since $\alpha_1\geqslant\alpha_2$, then after differentiation with respect to $x$, we get
This completes the proof of Proposition 3.2, namely $X\leqslant_{lr}Y$.
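The monotonicity used in this proof can be written out in a few lines. The sketch below assumes that the density (2.3) has the form $f(x)=\frac{\alpha}{\lambda}\left(1+W/\lambda\right)^{-(\alpha+1)}W'$, where $W=W[G(x)]$ and $W'=\mathrm{d}W[G(x)]/\mathrm{d}x\geq 0$.

```latex
\frac{f_1(x)}{f_2(x)} \;=\; \frac{\alpha_1}{\alpha_2}\left(1+\frac{W}{\lambda}\right)^{-(\alpha_1-\alpha_2)},
\qquad
\frac{\mathrm{d}}{\mathrm{d}x}\log\frac{f_1(x)}{f_2(x)} \;=\; -\,(\alpha_1-\alpha_2)\,\frac{W'}{\lambda+W}\;\leq\;0
\quad\text{for } \alpha_1\geqslant\alpha_2 .
```

The baseline-dependent factor $W'$ cancels in the ratio, so the monotone likelihood ratio property holds for every baseline $G$.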
3.9. Stress-strength reliability parameter
The stress-strength reliability parameter measures the probability that one random variable exceeds another. See [17] for a detailed study on stress-strength reliability.
Let $X$ be a random variable with density function $f_1(x)$ given by (2.3) with parameters $\alpha_1$ and $\lambda_1$, and let $Y$ be another random variable with distribution function $F_2(x)$ given by (2.2) with parameters $\alpha_2$ and $\lambda_2$. Then, the reliability parameter is defined by:
With f1(x) and F2(x) functions,
After simplification, we get
Where
where d0(i+j)=1, d1(i+j)=l/6, d2(i+j)=(l/180)+((l2)/72), etc. and similar for e∗n(m). If α1=α2 and λ1=λ2 (corresponds to the case being distributed identically), at end, we obtained R=12(k+l+n).
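A quick Monte Carlo check of the reliability parameter is sketched below. It assumes that $R=P(Y<X)=\int f_1(x)F_2(x)\,\mathrm{d}x$ and that $X$ and $Y$ share the same baseline $G$; under that assumption both variables are the same increasing transform of Lomax variates, so $R$ can be estimated from Lomax draws alone. For a common scale $\lambda_1=\lambda_2$, direct integration under these assumptions gives $R=\alpha_2/(\alpha_1+\alpha_2)$, which reduces to $1/2$ in the identically distributed case. All function names are hypothetical.

```python
import random

def lomax_sample(alpha, lam, rng):
    """Inverse-transform draw from the Lomax(alpha, lam) distribution."""
    u = rng.random()
    return lam * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def stress_strength_mc(a1, l1, a2, l2, n=200_000, seed=1):
    """Monte Carlo estimate of R = P(Y < X) via the latent Lomax variates."""
    rng = random.Random(seed)
    hits = sum(lomax_sample(a2, l2, rng) < lomax_sample(a1, l1, rng) for _ in range(n))
    return hits / n

def stress_strength_equal_scale(a1, a2):
    """Closed form derived under the assumption lambda_1 = lambda_2."""
    return a2 / (a1 + a2)

print(stress_strength_mc(2.0, 1.5, 3.0, 1.5), stress_strength_equal_scale(2.0, 3.0))
print(stress_strength_mc(2.0, 1.0, 2.0, 1.0), stress_strength_equal_scale(2.0, 2.0))
```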
3.10. Order statistics
Order statistics appear in a variety of applications involving survival testing data. A comprehensive treatment is given in [18].
Let $X_1,\ldots,X_n$ be a random sample from the new family, and consider the density of the $i$th order statistic $X_{i:n}$, given by
Substituting Eqs (2.2) and (2.3) into Eq (3.13), we get
Notably, f1:n(x) and fn:n(x) are the densities of X1:n=inf(X1,…,Xn) and Xn:n=sup(X1,…,Xn) respectively.
Proposition 3.3. The pdf of $X_{i:n}$ may be represented as a linear combination of pdfs from the exp-G distribution family.
Proof. Firstly, consider the Eq (3.13) which displays the expression of fi:n(x). Applying the binomial series expansion and substituting the Eq (3.2) in Eq (3.13), we get
By virtue of generalized binomial expansion and relevant series on sine and cosine trigonometric functions, we have
where
Moreover, since $h^{*}_{(2(p+q)+1)}(x)$ is a pdf of the exp-G family of distributions with power parameter $2(p+q)+1$, the proof of Proposition 3.3 is complete.
4. Estimation
In this section, we introduce several classical methods for estimating the parameters $\alpha$, $\lambda$, and $\xi$ of the new family, where $\xi$ denotes the baseline parameters. Each estimator is obtained by maximizing or minimizing an objective function, as shown below. For more information about these estimation methods, see [19,20,21].
The maximum likelihood estimates (MLEs) of the parameters of the proposed family are obtained by maximizing the log-likelihood function of (2.3), which is defined in the following equation.
The Anderson-Darling estimates (ADEs) are obtained by minimizing the following function, where $x_{(1)}\leq x_{(2)}\leq\ldots\leq x_{(n)}$ denote the ordered observations
The right-tail Anderson-Darling estimates (RADEs) are obtained by minimizing the following function ($x_{(1)}\leq x_{(2)}\leq\ldots\leq x_{(n)}$)
The Cramér-von Mises estimates (CVMEs) are obtained by minimizing the following function ($x_{(1)}\leq x_{(2)}\leq\ldots\leq x_{(n)}$)
The least-squares estimates (LSEs) are obtained by minimizing the following function ($x_{(1)}\leq x_{(2)}\leq\ldots\leq x_{(n)}$)
The weighted least-squares estimates (WLSEs) are obtained by minimizing the following function ($x_{(1)}\leq x_{(2)}\leq\ldots\leq x_{(n)}$)
The maximum product of spacing estimates (MPSEs) are obtained by maximizing the following function ($x_{(1)}\leq x_{(2)}\leq\ldots\leq x_{(n)}$)
where
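The objective functions behind these seven estimators can be sketched in code. The forms below are the standard textbook definitions, which are assumed (not confirmed by the text above) to match the paper's equations. Each function takes `F`, the list of model cdf values at the ordered observations, except the likelihood, which takes the model pdf values; all names are illustrative.

```python
import math

def neg_loglik(pdf_values):
    """Negative log-likelihood; minimizing it maximizes the likelihood."""
    return -sum(math.log(f) for f in pdf_values)

def ade(F):
    """Anderson-Darling objective."""
    n = len(F)
    return -n - sum((2 * i - 1) * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
                    for i in range(1, n + 1)) / n

def rade(F):
    """Right-tail Anderson-Darling objective."""
    n = len(F)
    return (n / 2.0 - 2.0 * sum(F)
            - sum((2 * i - 1) * math.log(1.0 - F[n - i]) for i in range(1, n + 1)) / n)

def cvme(F):
    """Cramer-von Mises objective."""
    n = len(F)
    return 1.0 / (12.0 * n) + sum((F[i - 1] - (2 * i - 1) / (2.0 * n)) ** 2 for i in range(1, n + 1))

def lse(F):
    """Least-squares objective."""
    n = len(F)
    return sum((F[i - 1] - i / (n + 1.0)) ** 2 for i in range(1, n + 1))

def wlse(F):
    """Weighted least-squares objective."""
    n = len(F)
    return sum((n + 1.0) ** 2 * (n + 2.0) / (i * (n - i + 1.0)) * (F[i - 1] - i / (n + 1.0)) ** 2
               for i in range(1, n + 1))

def mps(F):
    """Mean log-spacing (to be maximized); spacings use F_(0) = 0 and F_(n+1) = 1."""
    n = len(F)
    padded = [0.0] + list(F) + [1.0]
    return sum(math.log(padded[i] - padded[i - 1]) for i in range(1, n + 2)) / (n + 1.0)

# Toy input: cdf values at ordered observations (arbitrary numbers for illustration).
F_demo = [0.08, 0.21, 0.35, 0.52, 0.66, 0.79, 0.93]
print(ade(F_demo), rade(F_demo), cvme(F_demo), lse(F_demo), wlse(F_demo), mps(F_demo))
```

In practice `F` would be recomputed from the model cdf for each trial parameter vector inside a numerical optimizer, with the sign flipped for `mps`.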
5. Special Locsc-G distributions with graphical analysis
We present a few special models of the new family using well-known statistical distributions as baselines, derive their main functions, and analyze their graphical flexibility.
5.1. The Lomax cosecant exponentiated exponential (LocscEE) distribution
Let the baseline be the exponentiated exponential distribution with cdf $G(x)=(1-e^{-\delta x})^{\beta}$ and density $g(x)=\delta\beta e^{-\delta x}(1-e^{-\delta x})^{\beta-1}$. Then the cdf, pdf, and hazard rate function of the LocscEE distribution are, respectively (for $x>0$),
Figure 1 displays plots of the density and hazard rate function of the LocscEE distribution for some parameter values. Figure 1(a) shows that the LocscEE density exhibits reversed-J, approximately symmetrical, left-skewed and right-skewed shapes. Figure 1(b) reveals that the LocscEE hazard rate function has increasing, decreasing, increasing-decreasing-increasing, and upside-down bathtub shapes.
5.2. The Lomax cosecant Weibull (LocscW) distribution
Taking $G(x)$ to be the Weibull cdf with scale parameter $\sigma>0$ and shape parameter $\beta>0$, say $G(x)=1-e^{-\sigma x^{\beta}}$, with Weibull density $g(x)=\sigma\beta x^{\beta-1}e^{-\sigma x^{\beta}}$, we obtain the four-parameter LocscW distribution with the following cdf, pdf and hazard rate function (for $x>0$)
Figure 2 displays plots of the density and hazard rate function of the LocscW distribution for some parameter values. Figure 2(a) shows that the LocscW density has symmetrical, right-skewed, left-skewed, reversed-J and J shapes. Figure 2(b) reveals that the LocscW hazard rate function has decreasing, increasing, bathtub and upside-down bathtub shapes.
5.3. The Lomax cosecant Burr (LocscB) distribution
Let the baseline be the Burr distribution with pdf $g(x)=\delta\beta x^{\delta-1}(1+x^{\delta})^{-(\beta+1)}$ and cdf $G(x)=1-(1+x^{\delta})^{-\beta}$, $x>0$, $\delta,\beta>0$. We then obtain the four-parameter LocscB distribution with the following cdf, pdf and hazard rate function (for $x>0$)
Figure 3 displays plots of the density and hazard rate function of the LocscB distribution for some parameter values. Figure 3(a) shows that the LocscB density has symmetrical, right-skewed, left-skewed and reversed-J shapes. Figure 3(b) reveals that the LocscB hazard rate function has decreasing, increasing and upside-down bathtub shapes.
6. The new distribution
6.1. Main properties
In this part, we study the special member of the Locsc-G family that uses the Weibull distribution as its baseline, together with its key features. Substituting the cdf $G(x)=1-e^{-\sigma x^{\beta}}$, $x>0$, into Eq (2.2), the cdf of the new distribution can be written as below
The corresponding pdf is
The corresponding hazard rate function is
Figure 4 illustrates the density shapes of the proposed model.
Figure 5 illustrates the hazard rate function shapes of the proposed model.
6.1.1. Reliability measures
The hazard rate function is a key notion and plays a central role in risk and survival analysis. Other important reliability functions are the survival function $S(x)$, the reversed hazard rate $r(x)$, the cumulative hazard rate $H(x)$, Mills' ratio $m(x)$, the elasticity $e(x)$ and the conditional reliability function $\bar{G}(G(x),\alpha,\beta|t)$, which are, respectively, presented below.
6.1.2. Residual and reverse residual life
The residual life has several uses in probability, statistics and risk assessment. The residual life of the LocscW random variable $X$, denoted by $R_t(x)$, is
Additionally, the reversed residual life $\bar{R}_t(x)$ is
6.1.3. Quantile function, quantile density function and median
The quantile function of LocscW is given by
The quartiles and octiles, and hence quantile-based skewness and kurtosis, can be calculated from this expression. The following result is also useful: for a random variable $U$ with a uniform distribution on $(0,1)$, $Q_{LocscW}(U)$ has the LocscW distribution.
Furthermore, the quantile density function (qdf) of LocscW is obtained by differentiating $Q_{LocscW}(p)$ with respect to $p$. The median, for instance, is given by
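As a worked instance, the median can be written in closed form. The expression below is a sketch obtained by setting $p=1/2$ in the inverse of the assumed cdf $F(x)=1-\left(1+W[G(x)]/\lambda\right)^{-\alpha}$ with the Weibull baseline $G(x)=1-e^{-\sigma x^{\beta}}$; the symbol $W_{0.5}$ is introduced here only for readability.

```latex
W_{0.5} \;=\; \lambda\left(2^{1/\alpha}-1\right),
\qquad
x_{0.5} \;=\; \left\{-\frac{1}{\sigma}\,
\ln\!\left[\,1-\frac{2}{\pi}\arcsin\!\left(\frac{W_{0.5}}{1+W_{0.5}}\right)\right]\right\}^{1/\beta}.
```

Replacing $2^{1/\alpha}$ by $(1-p)^{-1/\alpha}$ gives the general quantile of order $p$ under the same assumptions.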
6.2. MacGillivray's skewness
MacGillivray (1986) proposed a measure of skewness based on the quantile function, given below
where $p\in(0,1)$ and $Q(\cdot)$ is the qf stated in Eq (6.4).
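A short numerical sketch of this measure follows. It uses the standard form $\delta(p)=\frac{Q(1-p)+Q(p)-2Q(1/2)}{Q(1-p)-Q(p)}$, which is assumed to be the expression intended here, together with a LocscW quantile function derived by inverting the assumed cdf; the function names are hypothetical and the parameter values mirror one of the settings described for Figure 6.

```python
import math

def macgillivray_skewness(Q, p):
    """delta(p) = [Q(1-p) + Q(p) - 2 Q(1/2)] / [Q(1-p) - Q(p)]; undefined at p = 0.5."""
    return (Q(1.0 - p) + Q(p) - 2.0 * Q(0.5)) / (Q(1.0 - p) - Q(p))

def locscw_quantile(u, alpha, lam, sigma, beta):
    """LocscW quantile from the Lomax quantile, the inverse generator and the Weibull inverse cdf."""
    W = lam * ((1.0 - u) ** (-1.0 / alpha) - 1.0)
    G = (2.0 / math.pi) * math.asin(W / (1.0 + W))
    return (-math.log(1.0 - G) / sigma) ** (1.0 / beta)

# alpha = 1, lambda = 0.1, sigma = 0.5 held fixed while beta varies.
for beta in (0.1, 0.5, 0.9):
    Q = lambda u, b=beta: locscw_quantile(u, 1.0, 0.1, 0.5, b)
    print(beta, [round(macgillivray_skewness(Q, p), 4) for p in (0.1, 0.2, 0.3, 0.4)])
```

Values near zero indicate approximate symmetry, positive values right skewness and negative values left skewness.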
Because the MacGillivray skewness measure $\delta(p)$ depends only on the qf, it can efficiently characterize the influence of the parameters $(\alpha,\beta,\sigma,\lambda)$ on the skewness of $X$. In Figure 6(left), the parameters $(\alpha=1,\lambda=0.1,\sigma=0.5)$ are held constant while $\beta$ increases from 0.1 to 0.9; then $\delta(p)\to 0$, i.e., the skewness approaches zero (the distribution approaches symmetry).
In Figure 6(middle), the parameters $(\alpha=1.5,\lambda=0.5,\sigma=1.0)$ are held constant (relative to Figure 6(left), the values of $(\alpha,\lambda,\sigma)$ are increased) while $\beta$ increases from 0.1 to 1.0 in regular steps; then $\delta(p)\to 0.5$, i.e., mild skewness is observed.
In Figure 6(right), the parameters $(\alpha=1.5,\lambda=1.0,\sigma=1.0)$ are held constant (relative to Figure 6(middle), $(\alpha,\sigma)$ are unchanged and only $\lambda$ is increased, by 0.5) while $\beta$ increases from 0.05 to 0.5 in irregular steps; then $\delta(p)\to 1.0$, i.e., pronounced skewness is produced.
In Figure 7(left), the parameters $(\alpha=2.0,\beta=1.5,\sigma=0.1)$ are held constant while $\lambda$ increases from 0.1 to 0.95; symmetry is then lost towards the left (negative skewness is observed).
In Figure 7(middle), the parameters $(\beta=1.0,\lambda=0.1,\sigma=0.5)$ are held constant while $\alpha$ increases from 0.1 to 1.05 in irregular steps; then $\delta(p)$ increases sharply, i.e., strong right skewness is produced.
In Figure 7(right), the parameters $(\alpha=1.5,\lambda=0.5,\sigma=0.5)$ are held constant while $\beta$ increases from 1.05 to 3.5 in irregular steps; then $\delta(p)$ increases, i.e., right skewness is produced.
6.3. Skewness and kurtosis via 3D graphs
Graphical displays are increasingly preferred to numerical and tabular summaries. The 3D plots shown below demonstrate how skewness and kurtosis change as the baseline model parameters are varied. In Figure 8, panels 8(a) and 8(c) show skewness, while panels 8(b) and 8(d) show kurtosis. Both skewness and kurtosis of the proposed model depend strongly on the fixed values of $\alpha$ and $\lambda$.
In Figure 9, panels 9(a) and 9(c) show skewness, while panels 9(b) and 9(d) show kurtosis.
With the baseline parameters fixed at $\sigma=2.5$ and $\beta=3.1$ in Figure 9, the skewness decreases (symmetry increases) in Figure 9(a) and the kurtosis decreases (the shape moves closer to normality) in Figure 9(b).
6.4. Reduced models of LocscW
Table 3 lists three reduced models (sub-models) of the LocscW distribution, obtained by restricting the parameter values.
7. Numerical simulation
In this section, we apply all the estimation methods presented in Section 4, taking the Weibull distribution as the baseline model, $G\left(x, \xi \right) = 1-{\rm e}^{-\sigma\, x^{\beta}}$. We study the performance of the parameter estimates of the LocscW distribution under these estimation methods and compare the methods using the average absolute bias (BIAS), $|Bias(\widehat{\pmb \Delta})| = \frac{1}{M}\sum_{i = 1}^{M}|\widehat{\pmb \Delta}_i-\pmb \Delta|$, the mean squared error (MSE), $MSE = \frac{1}{M}\sum_{i = 1}^{M}(\widehat{\pmb \Delta}_i-\pmb \Delta)^2$, and the mean relative error (MRE), $MRE = \frac{1}{M}\sum_{i = 1}^{M}|\widehat{\pmb \Delta}_i-\pmb \Delta|/\pmb \Delta$, where $\pmb \Delta = (\alpha, \lambda, \xi)$ and $\widehat{\pmb \Delta}_i$ is the estimate from the $i$th replicate. The simulation results may be used as a guideline for choosing the best estimation method for the model parameters. The R software (version 4.0.3) is used to generate $M$ = 10,000 random samples from the proposed distribution for $n$ = 50, 100, 200, 300 and 500.
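The three accuracy measures can be computed per parameter as in the sketch below, which follows the formulas stated above (average absolute deviation for BIAS). The function name and the toy estimates are illustrative only and are not simulation output from the paper.

```python
def simulation_metrics(estimates, true_value):
    """Return (BIAS, MSE, MRE) for one parameter across M replicates."""
    M = len(estimates)
    abs_err = [abs(e - true_value) for e in estimates]
    bias = sum(abs_err) / M                                   # average absolute bias
    mse = sum((e - true_value) ** 2 for e in estimates) / M   # mean squared error
    mre = sum(a / true_value for a in abs_err) / M            # mean relative error
    return bias, mse, mre

# Toy replicate estimates of a parameter whose true value is 2.0.
print(simulation_metrics([1.9, 2.1, 2.2], 2.0))
```

In a full study this function would be applied to the estimates of each component of $\pmb\Delta$ for every method and sample size.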
7.1. Concluding remarks on the simulation
The numerical results of the simulations are reported in Tables 4–8, where the superscript of each value indicates its rank among all estimation methods in the same row. The partial and overall rankings of the estimators are displayed in Table 9, from which we conclude that the best method for estimating the proposed model parameters from random samples of our model is MPSE, followed by MLE. We also found that, as the sample size increases, the absolute bias, MSE and MRE decrease.
8. Applications
We present two applications of the LocscW model in this section, one to hydrological data and the other to survival data. The log-likelihood function is maximized using a limited-memory quasi-Newton method for bound-constrained optimization (L-BFGS-B), and its value at the MLEs is denoted by $\hat{\ell}$. For model comparison we consider several goodness-of-fit statistics, including the maximized log-likelihood ($\hat{\ell}$), Akaike information criterion (AIC), corrected Akaike information criterion (CAIC), Bayesian information criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling ($A^{*}$), Cramér–von Mises ($W^{*}$) and Kolmogorov-Smirnov (K-S) measures, where lower values of the information criteria and of $A^{*}$, $W^{*}$ and K-S, and higher p-values of K-S, indicate better fits.
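The information criteria can be computed from the maximized log-likelihood as sketched below. The formulas are the standard definitions, with CAIC taken to be the small-sample corrected AIC; this reading of CAIC, the function name and the input values are assumptions for illustration.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, corrected AIC, BIC and HQIC for k parameters and n observations."""
    aic = 2.0 * k - 2.0 * loglik
    caic = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)   # small-sample correction
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

# Hypothetical maximized log-likelihood of -100 for a 4-parameter model and n = 72.
print(information_criteria(-100.0, 4, 72))
```

Because all four criteria penalize the number of parameters, a four-parameter model can rank ahead of a competitor with more parameters even when their log-likelihoods are close.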
8.1. First application: Wheaton river data
The first data set consists of the exceedances of flood peaks (in m$^3$/s) of the Wheaton River near Carcross in Yukon Territory, Canada. They were analysed by Choulakian and Stephens (2001) and are listed below.
1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13.0, 12.0, 9.3, 1.4, 18.7, 8.5, 25.5, 11.6, 14.1, 22.1, 1.1, 2.5, 14.4, 1.7, 37.6, 0.6, 2.2, 39.0, 0.3, 15.0, 11.0, 7.3, 22.9, 1.7, 0.1, 1.1, 0.6, 9.0, 1.7, 7.0, 20.1, 0.4, 2.8, 14.1, 9.9, 10.4, 10.7, 30.0, 3.6, 5.6, 30.8, 13.3, 4.2, 25.5, 3.4, 11.9, 21.5, 27.6, 36.4, 2.7, 64.0, 1.5, 2.5, 27.4, 1.0, 27.1, 20.2, 16.8, 5.3, 9.7, 27.5, 2.5, 27.0.
The summary statistics for these data are: n = 60, \bar{x} = 2.19297, s = 1.920062, skewness = 1.2614 and kurtosis = 2.23207.
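Descriptive figures of this kind are straightforward to recompute from the listed observations, which is a useful consistency check: the list above contains 72 values with a mean of about 12.20. The sketch below uses moment-based skewness and kurtosis, $m_3/m_2^{3/2}$ and $m_4/m_2^{2}$; these definitions are an assumption, since the paper does not state which estimators were used.

```python
import math

wheaton = [
    1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13.0, 12.0, 9.3,
    1.4, 18.7, 8.5, 25.5, 11.6, 14.1, 22.1, 1.1, 2.5, 14.4, 1.7, 37.6,
    0.6, 2.2, 39.0, 0.3, 15.0, 11.0, 7.3, 22.9, 1.7, 0.1, 1.1, 0.6,
    9.0, 1.7, 7.0, 20.1, 0.4, 2.8, 14.1, 9.9, 10.4, 10.7, 30.0, 3.6,
    5.6, 30.8, 13.3, 4.2, 25.5, 3.4, 11.9, 21.5, 27.6, 36.4, 2.7, 64.0,
    1.5, 2.5, 27.4, 1.0, 27.1, 20.2, 16.8, 5.3, 9.7, 27.5, 2.5, 27.0,
]

def summary(data):
    """Sample size, mean, sample standard deviation, moment skewness and kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "n": n,
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

print(summary(wheaton))
```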
The histogram, box plot, and kernel density plots of the aforementioned data are shown in Figure 10, which demonstrates that the distribution is right-skewed, while the TTT plot is first convex and subsequently concave, indicating a bathtub failure rate. As a result, the LocscW distribution may theoretically be used to represent the existing data.
Table 10 provides the MLEs of the LocscW parameters with standard errors (in parentheses), along with those of the competing Weibull-type models. The standard errors of the four-parameter LocscW model are small relative to the estimates, which indicates that its parameters are estimated precisely.
Table 11 provides the values of AIC, CAIC, BIC, HQIC, $A^{\ast}$, $W^{\ast}$, K-S, and p-values for each model. On the basis of all these goodness-of-fit criteria, the four-parameter LocscW model fits better than the competing Weibull-type models (which have more parameters) and has the potential to fit right-skewed data with a bathtub failure rate.
The plots of the estimated densities are shown in Figure 11, while the plots of the estimated distribution functions for the Wheaton river data are shown in Figure 12.
Figure 13(a) presents the plot of the estimated density, while Figure 13(b) shows the plot of the estimated cdf, for the LocscW model using the Wheaton river data.
Figure 14(a) presents the plot of the estimated density, Figure 14(b) shows the plot of the estimated cdf, and Figure 14(c) shows the P-P plot for the LocscW model using the Wheaton river data.
8.2. Second application: Survival data
The second data set consists of the survival times (in years) of a group of patients given chemotherapy treatment, reported by Bekker et al. (2000). The 45 values listed for this data set are as follows:
0.047, 0.115, 0.121, 0.132, 0.164, 0.197, 0.203, 0.260, 0.282, 0.296, 0.334, 0.395, 0.458, 0.466, 0.501, 0.507, 0.529, 0.534, 0.540, 0.641, 0.644, 0.696, 0.841, 0.863, 1.099, 1.219, 1.271, 1.326, 1.447, 1.485, 1.553, 1.581, 1.589, 2.178, 2.343, 2.416, 2.444, 2.825, 2.830, 3.578, 3.658, 3.743, 3.978, 4.003, 4.033.
The histogram, box plot, and kernel density of the above data, displayed in Figure 15, indicate that the distribution is right-skewed and unimodal, while the TTT plot of the data is first convex and then concave (increasing-decreasing-increasing (confused type)), which suggests that a model with a heavy right tail is required, motivating the use of the LocscW model on these data.
Table 12 provides the MLEs of the LocscW parameters with standard errors (in parentheses), along with those of the competing Weibull-type models. The standard errors of the four-parameter LocscW model are small relative to the estimates, which indicates that its parameters are estimated precisely.
Table 13 provides the values of AIC, CAIC, BIC, HQIC, $A^{\ast}$, $W^{\ast}$, K-S, and p-values for each model. On the basis of all these goodness-of-fit criteria, the four-parameter LocscW model fits better than the competing Weibull-type models (which have more parameters) and has the potential to fit right-skewed data.
The plots of the estimated densities are shown in Figure 16, while the plots of the estimated cdfs for the LocscW model and its competing models using the survival data are shown in Figure 17.
Figure 18(a) presents the plot of the estimated density, while Figure 18(b) shows the plot of the estimated cdf, for the LocscW model using the survival data.
Figure 19(a) presents the plot of the estimated density, Figure 19(b) shows the plot of the estimated cdf, and Figure 19(c) shows the P-P plot for the LocscW model using the survival data.
9. Concluding remarks
In this paper, we presented a new lomax-G family of distributions using odd sine/cosecant function (Locsc-G) and obtained prominent mathematical properties such as reliability functions, linear representation for cdf and pdf in terms of exp-G distributions, ordinary and weighted moments, quantile and moment generating function, stress-strength reliability, stochastic ordering, and order statistics. Using well-known distributions, the graphical analysis is performed to observe the flexibility in the proposed family with almost all unimodal shapes of densities and hazard rate functions. Moreover, a new Lomax cosecant Weibull distribution (LocscW), a four-parameter model, is also discussed in detail. The model parameters are estimated by the method of maximum likelihood. We used almost all goodness-of-fit criteria to prove the usefulness of the proposed family and model (LocscW) by means of applications of two data sets. We forecast the wider utility of the new family and model in statistical fields, chiefly in hydrological studies, survival analysis, and reliability engineering.
Acknowledgments
This research was supported by Researchers Supporting Project number (RSP-2021/156), King Saud University, Riyadh, Saudi Arabia.
Conflict of interest
The authors declare there is no conflict of interest.