1. Introduction
Wind speed is a fundamental quantity in atmospheric science. Wind results from the movement of air from high-pressure areas to low-pressure areas, driven primarily by temperature differences. Wind speed affects daily life and the economy in many ways, for example in renewable energy production, aviation operations, and crop production. Monitoring and predicting wind speed also contributes to disaster preparedness and prevention. Mohammadi, Alavi, and McGowan [1] studied the estimation of wind speed distributions and demonstrated that the Birnbaum-Saunders distribution is the most suitable. However, the Birnbaum-Saunders distribution is positively skewed and defined only for positive values, so it cannot be used directly on days or times with no wind, when the wind speed is zero. Since real-world data often include zeros, the delta-Birnbaum-Saunders distribution, which combines zero and positive values, is a more appropriate choice. The number of zero observations follows the binomial distribution with binomial proportion δ, whereas the positive observations, which occur with probability 1−δ, follow the Birnbaum-Saunders (BS) distribution. The BS distribution is widely applied across fields such as environmental research, agriculture, business, industry, and medical sciences [2,3,4,5]. The concept of the delta-Birnbaum-Saunders distribution originates from Aitchison's research [6]. Subsequently, several researchers have applied the idea of incorporating zero values into various positive distributions, such as the delta-lognormal distribution, providing more diverse and accurate approaches for statistical analysis.
Hasan and Krishnamoorthy [7] used the delta-lognormal distribution to construct confidence intervals for the mean, employing both the fiducial approach and the method of variance estimate recovery (MOVER). Maneerat, Niwitpong, and Niwitpong [8] constructed confidence intervals for the difference between variances using the delta-lognormal distribution. They compared the highest posterior density (HPD) method with the normal approximation (NA), parametric bootstrap (PB), and fiducial generalized confidence interval (FGCI) methods. Singhasomboon, Panichkitkosolkul, and Volodin [9] proposed methods for constructing confidence intervals for the ratio of medians in lognormal distributions, namely the NA, the MOVER, and the generalized confidence interval (GCI). Their findings indicate that the GCI performs well in terms of coverage probabilities, and they recommend the NA method for moderate to large sample sizes when the mean and variance are small. For the delta-Birnbaum-Saunders distribution, Ratasukharom, Niwitpong, and Niwitpong [10] constructed confidence intervals for the variance using the GCI, the bootstrap confidence interval, the generalized fiducial confidence interval (GFCI), and the NA, with the proportion of zeros estimated by the variance-stabilized transformation (VST), Wilson, and Hannig methods. They found that the GFCI based on the Wilson method is most suitable for small sample sizes, the GFCI based on the Hannig method is optimal for medium sample sizes, and the GFCI based on the VST method performs best for large sample sizes. For the delta-gamma distribution, Guo et al. [11] proposed GCIs based on fiducial inference, Box-Cox transformation, PB, and MOVER to construct confidence intervals for the difference between coefficients of variation. They found that all four methods provided satisfactory results in terms of coverage probabilities. For the delta-two-parameter exponential distribution, Khooriphan, Niwitpong, and Niwitpong [12] proposed confidence intervals for the mean using PB, standard bootstrapping, the GCI, and the MOVER. They recommended the GCI for small to moderate sample sizes and PB for large sample sizes.
The coefficient of variation is a statistical measure of relative dispersion used to compare the variability of distinct datasets. It is defined as the ratio of the standard deviation to the mean and is typically expressed as a percentage. A higher coefficient of variation indicates greater relative variability, while a lower one suggests less relative variability. The coefficient of variation is applied in many real-world settings, for example, investment analysis, healthcare, education, and economics. In particular, environmental scientists use coefficients of variation to study the variability of environmental data, such as rainfall patterns, temperature fluctuations, or pollutant levels [13,14,15]. Numerous researchers have studied confidence intervals for the coefficient of variation in various distributions. Vangel [16] constructed confidence intervals for the coefficient of variation of the normal distribution. Buntao and Niwitpong [17] constructed confidence intervals for the coefficient of variation of the delta-lognormal and lognormal distributions. D'Cunha and Rao [14] described a Bayesian method for the coefficient of variation of the lognormal distribution. Sangnawakij and Niwitpong [18] examined the ratio of coefficients of variation of gamma distributions. Yosboonruang, Niwitpong, and Niwitpong [19] suggested confidence intervals for the difference between the coefficients of variation of two independent delta-lognormal distributions. Puggard, Niwitpong, and Niwitpong [20] proposed confidence intervals for the coefficient of variation of the Birnbaum-Saunders distribution. La-ongkaew, Niwitpong, and Niwitpong [21] presented confidence intervals for the ratio of the coefficients of variation of two Weibull distributions.
Many researchers have studied and developed confidence intervals for parameters of various probability distributions. Studies of positive distributions that include zero values have found the generalized confidence interval and normal approximation methods to be effective, and many researchers recommend them after comparison with other methods. The bootstrap confidence interval is also recognized as a fundamental technique for constructing confidence intervals. However, to date, no research has addressed confidence intervals for the coefficient of variation of the delta-Birnbaum-Saunders distribution. The purpose of this study is therefore to construct confidence intervals for the coefficient of variation of the delta-Birnbaum-Saunders distribution. We propose three methods: the normal approximation, the generalized confidence interval that estimates the proportion of zeros using the variance-stabilizing transformation proposed by Wu and Hsieh [22], and the generalized confidence interval that estimates the proportion of zeros using the Wilson score method proposed by Li, Zhou, and Tian [23]. These three methods are compared with the bootstrap confidence interval. Furthermore, to illustrate the methods in practice, all four are applied to real-world wind speed data collected in Ubon Ratchathani and Si Sa Ket, Thailand.
2. Preliminary
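Let $Y=(Y_1,Y_2,\ldots,Y_n)$ be a random sample from the delta-Birnbaum-Saunders (DBS) distribution with proportion of zeros $\delta$, shape parameter $\alpha$, and scale parameter $\beta$, denoted by $Y\sim DBS(\delta,\alpha,\beta)$. The probability density function of the DBS population is expressed as
$$f(y;\delta,\alpha,\beta)=\delta I_{0}[y]+(1-\delta)\frac{1}{2\alpha\beta\sqrt{2\pi}}\left[\left(\frac{\beta}{y}\right)^{1/2}+\left(\frac{\beta}{y}\right)^{3/2}\right]\exp\left[-\frac{1}{2\alpha^{2}}\left(\frac{y}{\beta}+\frac{\beta}{y}-2\right)\right]I_{(0,\infty)}[y],$$
where $I$ is an indicator function, with $I_{0}[y]=1$ if $y=0$ and $0$ otherwise, and $I_{(0,\infty)}[y]=0$ if $y=0$ and $1$ if $y>0$. The distribution function of $Y$ is given by
$$G(y;\delta,\alpha,\beta)=\begin{cases}\delta, & y=0,\\ \delta+(1-\delta)F(y;\alpha,\beta), & y>0,\end{cases}$$
where $F(y;\alpha,\beta)$ is the Birnbaum-Saunders distribution function. The number of zero observations follows the binomial distribution, $n_{(0)}\sim Binomial(n,\delta)$. Given $n=n_{(1)}+n_{(0)}$, where $n_{(1)}$ and $n_{(0)}$ are the numbers of positive and zero values, respectively, the maximum likelihood estimate of $\delta$ is $\hat{\delta}=n_{(0)}/n$. Following Aitchison [6], the population mean, variance, and coefficient of variation are
$$E(Y)=(1-\delta)\beta\left(1+\frac{\alpha^{2}}{2}\right),$$
$$Var(Y)=(1-\delta)(\alpha\beta)^{2}\left(1+\frac{5\alpha^{2}}{4}\right)+\delta(1-\delta)\beta^{2}\left(1+\frac{\alpha^{2}}{2}\right)^{2},$$
and
$$\theta=\frac{\sqrt{Var(Y)}}{E(Y)}=\frac{1}{2+\alpha^{2}}\sqrt{\frac{\alpha^{2}(4+5\alpha^{2})+\delta(2+\alpha^{2})^{2}}{1-\delta}}.$$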
The method for constructing confidence intervals for the coefficient of variation of the delta-Birnbaum-Saunders distribution will be presented in the next section.
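As a quick numerical illustration of the quantities in this section, the following Python sketch computes the mean, variance, and coefficient of variation of a DBS population and draws a random sample. It assumes the standard Birnbaum-Saunders moments and the usual normal-transformation generator for BS variates; the function names `dbs_moments`, `cv_closed_form`, and `rdbs` are hypothetical and chosen only for this example.

```python
import math
import random


def dbs_moments(delta, alpha, beta):
    """Mean, variance and CV of DBS(delta, alpha, beta), assuming standard BS moments."""
    mean = (1 - delta) * beta * (1 + alpha**2 / 2)
    var = ((1 - delta) * (alpha * beta) ** 2 * (1 + 5 * alpha**2 / 4)
           + delta * (1 - delta) * beta**2 * (1 + alpha**2 / 2) ** 2)
    return mean, var, math.sqrt(var) / mean


def cv_closed_form(alpha, delta):
    """Closed form g(alpha, delta) of the CV; note that beta cancels out."""
    num = alpha**2 * (4 + 5 * alpha**2) + delta * (2 + alpha**2) ** 2
    return math.sqrt(num / (1 - delta)) / (2 + alpha**2)


def rdbs(n, delta, alpha, beta, rng):
    """Draw n values: zero with probability delta, otherwise a BS(alpha, beta) variate."""
    sample = []
    for _ in range(n):
        if rng.random() < delta:
            sample.append(0.0)
        else:
            w = alpha * rng.gauss(0, 1) / 2
            # BS variate obtained from a standard normal draw
            sample.append(beta * (w + math.sqrt(w * w + 1)) ** 2)
    return sample


if __name__ == "__main__":
    mean, var, cv = dbs_moments(0.3, 0.75, 1.0)
    print(mean, var, cv, cv_closed_form(0.75, 0.3))
```

Because the scale parameter cancels in the ratio, the coefficient of variation depends only on α and δ, which is why the interval methods below need no separate treatment of β in the final formula.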
3. Proposed methods
3.1. Normal approximation
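The normal approximation (NA) method depends on the sample size and becomes more accurate as the sample size increases. The delta method is a statistical approach for deriving an estimator with an asymptotically normal distribution. Let $\hat{\theta}=g(\hat{\alpha},\hat{\delta})$ be the estimator of $\theta=g(\alpha,\delta)$, where $g$ is the coefficient-of-variation function given in the appendix.
Using the delta method, the asymptotic distribution of the estimator is based on the Taylor series of $g(\hat{\alpha},\hat{\delta})$ at $\alpha$ and $\delta$:
$$g(\hat{\alpha},\hat{\delta})=g(\alpha,\delta)+\frac{\partial g(\alpha,\delta)}{\partial\alpha}(\hat{\alpha}-\alpha)+\frac{\partial g(\alpha,\delta)}{\partial\delta}(\hat{\delta}-\delta)+\text{Remainder}.$$
It can be shown that $\sqrt{n}\,\text{Remainder}$ converges in probability to 0 as the sample size $n$ approaches infinity. Since $\hat{\alpha}\sim N\left(\alpha,\frac{\alpha^{2}}{2n_{(1)}}\right)$ and $\hat{\delta}\sim N\left(\delta,\frac{\delta(1-\delta)}{n}\right)$, we obtain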
Subsequently, we can calculate the asymptotic mean and variance of the estimator as follows:
and
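where $O=(2+\alpha^{2})^{2}$. Detailed procedures for deriving the asymptotic mean and variance are provided in the appendix. Assuming that $\hat{\alpha}$ and $\hat{\delta}$ are independent, the maximum likelihood estimator of $\theta$ can be determined as
$$\hat{\theta}=\frac{1}{2+\hat{\alpha}^{2}}\sqrt{\frac{\hat{\alpha}^{2}(4+5\hat{\alpha}^{2})+\hat{\delta}(2+\hat{\alpha}^{2})^{2}}{1-\hat{\delta}}},$$
where $\hat{\alpha}=\left\{2\left[\left(\frac{\sum_{i=1}^{n_{(1)}}y_{i}}{n_{(1)}}\cdot\frac{\sum_{i=1}^{n_{(1)}}y_{i}^{-1}}{n_{(1)}}\right)^{1/2}-1\right]\right\}^{1/2}$ is the modified moment estimator of $\alpha$ proposed by Ng, Kundu, and Balakrishnan [24]. Then, the estimated variance of $\hat{\theta}$ can be written as
where $H=(2+\hat{\alpha}^{2})^{2}$. By the central limit theorem, the random variable $Z=\frac{\hat{\theta}-\theta}{\sqrt{\hat{V}(\hat{\theta})}}\sim N(0,1)$. Therefore, the $(1-\upsilon)100\%$ CI for $\theta$ based on NA is given by
where $z_{\upsilon/2}$ is the $(\upsilon/2)$th quantile value from the standard normal distribution.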
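The NA interval can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the partial derivatives in `grad_g` were obtained here by differentiating $g(\alpha,\delta)$ directly, so their algebraic form may differ from the expressions in the paper, and the variances of $\hat{\alpha}$ and $\hat{\delta}$ are the asymptotic ones quoted in the text. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def g(alpha, delta):
    """Coefficient of variation of the DBS distribution as a function of (alpha, delta)."""
    num = alpha**2 * (4 + 5 * alpha**2) + delta * (2 + alpha**2) ** 2
    return math.sqrt(num / (1 - delta)) / (2 + alpha**2)


def grad_g(alpha, delta):
    """Partial derivatives of g (derived for this sketch; undefined when g = 0)."""
    theta = g(alpha, delta)
    o = (2 + alpha**2) ** 2
    dg_da = 8 * alpha * (1 + 2 * alpha**2) / ((2 + alpha**2) ** 3 * (1 - delta) * theta)
    dg_dd = (2 + 4 * alpha**2 + 3 * alpha**4) / (o * (1 - delta) ** 2 * theta)
    return dg_da, dg_dd


def mme_alpha(pos):
    """Modified moment estimator of alpha from the positive observations."""
    n1 = len(pos)
    s = sum(pos) / n1
    r_inv = sum(1 / v for v in pos) / n1
    return math.sqrt(2 * max(math.sqrt(s * r_inv) - 1, 0.0))


def na_interval(y, level=0.95):
    """Two-sided normal-approximation interval for the CV via the delta method."""
    n = len(y)
    pos = [v for v in y if v > 0]
    n1 = len(pos)
    d_hat = (n - n1) / n
    a_hat = mme_alpha(pos)
    theta_hat = g(a_hat, d_hat)
    dg_da, dg_dd = grad_g(a_hat, d_hat)
    # Var(alpha_hat) ~ alpha^2 / (2 n1), Var(delta_hat) ~ delta (1 - delta) / n
    var = dg_da**2 * a_hat**2 / (2 * n1) + dg_dd**2 * d_hat * (1 - d_hat) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # upper quantile, so the half-width is positive
    half = z * math.sqrt(var)
    return theta_hat - half, theta_hat + half
```

The sketch uses the upper standard normal quantile so that the lower endpoint is always below the upper one; with a lower-tail quantile the sign of the half-width would flip.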
3.2. Generalized confidence interval
The generalized confidence interval (GCI) method proposed by Weerahandi [25] provides a general framework for constructing confidence intervals based on the generalized pivotal quantity (GPQ). In constructing confidence intervals for $\theta$ based on the GCI, the GPQs of $\beta$ and $\alpha$ are taken into consideration. Let $T\sim t(n_{(1)}-1)$. Sun [26] recommended that the GPQ of $\beta$ be given by
where $\beta_{1}$ and $\beta_{2}$ are the two solutions of the quadratic equation,
where $A=\frac{1}{n_{(1)}}\sum_{i=1}^{n_{(1)}}\frac{1}{\sqrt{Y_{i}}}$, $B=\sum_{i=1}^{n_{(1)}}\left(\frac{1}{\sqrt{Y_{i}}}-A\right)^{2}$, $C=\frac{1}{n_{(1)}}\sum_{i=1}^{n_{(1)}}\sqrt{Y_{i}}$, and $D=\sum_{i=1}^{n_{(1)}}\left(\sqrt{Y_{i}}-C\right)^{2}$, while the GPQ of $\alpha$ is given by
where $E_{1}=\sum_{i=1}^{n_{(1)}}Y_{i}$, $E_{2}=\sum_{i=1}^{n_{(1)}}\frac{1}{Y_{i}}$, and $U\sim\chi^{2}_{n_{(1)}}$.
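The two GPQs can be illustrated with the Python sketch below. It is derived here from the pivots implied by the definitions above, under the assumption that $\sqrt{Y/\beta}-\sqrt{\beta/Y}\sim N(0,\alpha^{2})$ for positive observations. A $t$ pivot on these transformed values yields a quadratic in $\beta$ with coefficients built from $A$, $B$, $C$, $D$, and a chi-square pivot yields $\alpha$. The exact expressions in Sun [26] may be arranged differently. Redrawing $T$ when the quadratic has no admissible positive root is an implementation choice of this sketch, and the name `gpq_beta_alpha` is hypothetical.

```python
import math
import random


def gpq_beta_alpha(pos, rng):
    """One joint draw (R_beta, R_alpha) from the positive observations `pos` (len >= 2)."""
    n1 = len(pos)
    root = [math.sqrt(v) for v in pos]
    A = sum(1 / r for r in root) / n1
    B = sum((1 / r - A) ** 2 for r in root)
    C = sum(root) / n1
    D = sum((r - C) ** 2 for r in root)
    E1 = sum(pos)
    E2 = sum(1 / v for v in pos)

    while True:
        # T ~ t(n1 - 1), built from a normal and an independent chi-square draw
        chi = rng.gammavariate((n1 - 1) / 2, 2)
        T = rng.gauss(0, 1) / math.sqrt(chi / (n1 - 1))
        qa = (n1 - 1) * A * A - B * T * T / n1
        qb = -2 * ((n1 - 1) * A * C - (1 - A * C) * T * T)
        qc = (n1 - 1) * C * C - D * T * T / n1
        if qa > 0 and qc > 0:  # both roots are then real and positive
            break

    disc = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    b1 = (-qb - disc) / (2 * qa)
    b2 = (-qb + disc) / (2 * qa)
    # The pivot decreases in beta: positive T selects the smaller root
    r_beta = min(b1, b2) if T > 0 else max(b1, b2)

    U = rng.gammavariate(n1 / 2, 2)  # chi-square with n1 degrees of freedom
    r_alpha = math.sqrt((E2 * r_beta**2 - 2 * n1 * r_beta + E1) / (r_beta * U))
    return r_beta, r_alpha
```

Repeating the call many times gives a sample of $(R_{\beta},R_{\alpha})$ pairs. Only the $R_{\alpha}$ draws enter the GPQ of $\theta$, since the coefficient of variation does not depend on $\beta$.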
For the GPQ of δ, we use two concepts: the variance-stabilized transformation (VST) and the Wilson score method (WS). The details are explained in the following subsections.
3.2.1. GCI based on the VST: G.VST
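According to Wu and Hsieh [22], the GPQ of $\delta$ is defined as
where $V=2\sqrt{n}\left(\arcsin\sqrt{\hat{\delta}}-\arcsin\sqrt{\delta}\right)\sim N(0,1)$. Hence, the GPQ for $\theta$ is
Consequently, the $(1-\upsilon)100\%$ CI for $\theta$ based on G.VST is given by
where $R^{VST}_{\theta}(\upsilon)$ is the $(\upsilon/2)$th percentile of $R^{VST}_{\theta}$.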
3.2.2. GCI based on the WS: G.WS
In accordance with Li, Zhou, and Tian [23], the GPQ of $\delta$ is described as
where $W=\frac{n_{(0)}-n\delta}{\sqrt{n\delta(1-\delta)}}$. Thus, the GPQ for $\theta$ is
Therefore, the $(1-\upsilon)100\%$ CI for $\theta$ based on G.WS is given by
where $R^{WS}_{\theta}(\upsilon)$ is the $(\upsilon/2)$th percentile of $R^{WS}_{\theta}$.
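The two GPQs of $\delta$ and the resulting interval for $\theta$ can be sketched in Python as below. The formulas are obtained here by solving the stated pivots $V$ and $W$ for $\delta$, treating $W$ as approximately standard normal (an assumption of this sketch). The draws of $R_{\alpha}$ are taken as an input, and the CV function $g$ is the closed form from the appendix. Function names are hypothetical.

```python
import math
import random


def g(alpha, delta):
    """Coefficient of variation of the DBS distribution."""
    num = alpha**2 * (4 + 5 * alpha**2) + delta * (2 + alpha**2) ** 2
    return math.sqrt(num / (1 - delta)) / (2 + alpha**2)


def gpq_delta_vst(n0, n, rng):
    """Solve V = 2 sqrt(n) (arcsin sqrt(d_hat) - arcsin sqrt(delta)) for delta."""
    v = rng.gauss(0, 1)
    return math.sin(math.asin(math.sqrt(n0 / n)) - v / (2 * math.sqrt(n))) ** 2


def gpq_delta_ws(n0, n, rng):
    """Solve W = (n0 - n delta) / sqrt(n delta (1 - delta)) for delta (Wilson score form)."""
    w = rng.gauss(0, 1)
    centre = (n0 + w * w / 2) / (n + w * w)
    spread = w / (n + w * w) * math.sqrt(n0 * (1 - n0 / n) + w * w / 4)
    return centre - spread


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list, 0 <= p <= 1."""
    k = (len(sorted_vals) - 1) * p
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def gci_theta(alpha_draws, delta_draws, level=0.95):
    """Percentile interval of R_theta = g(R_alpha, R_delta) over paired draws."""
    thetas = sorted(g(a, d) for a, d in zip(alpha_draws, delta_draws))
    tail = (1 - level) / 2
    return percentile(thetas, tail), percentile(thetas, 1 - tail)
```

In use, `alpha_draws` would hold GPQ draws of $\alpha$ and `delta_draws` the output of either `gpq_delta_vst` or `gpq_delta_ws`, giving the G.VST and G.WS intervals, respectively.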
3.3. Bootstrap confidence interval
The bootstrap method, proposed by Efron [27], is a resampling technique used to estimate the sampling distribution of a statistic by repeatedly resampling from the observed data with replacement. Let $\hat{\alpha}'$ and $\hat{\delta}'$ be the observed values of $\hat{\alpha}$ and $\hat{\delta}$ based on bootstrap samples. Suppose that $K$ bootstrap samples are available. The bootstrap expectation $E(\hat{\alpha})$ can be approximated by the mean $\hat{\alpha}'_{(\cdot)}=\frac{1}{K}\sum_{j=1}^{K}\hat{\alpha}'_{j}$, where $\hat{\alpha}'_{j}$, $j=1,2,\ldots,K$, is the sequence of bootstrap MLEs of $\alpha$. The bootstrap bias estimate based on $K$ replications of $\hat{\alpha}$ is given by
Then, the constant-bias-correcting estimates, as defined by MacKinnon and Smith [28], are used to create the bias-corrected estimator, which is
Brown, Cai, and DasGupta [29] proposed the Jeffreys interval for the binomial proportion, which employs the Jeffreys prior, $Beta(0.5,0.5)$. Therefore, it results in
where $n^{*}_{(0)}=n\hat{\delta}'$ and $n^{*}_{(1)}=n(1-\hat{\delta}')$. The bootstrap estimator of $\theta$ can be written as
Consequently, the $(1-\upsilon)100\%$ CI for $\theta$ based on BCI is given by
where $\hat{\theta}^{(Boot)}(\upsilon)$ is the $(\upsilon/2)$th percentile of $\hat{\theta}^{(Boot)}$.
4. Results and discussion
In this simulation study, we compared the performance of the proposed methods. A method is preferred if its coverage probability is greater than or equal to the nominal confidence level of 0.95 and, among such methods, its expected length is the shortest. The comparison was conducted using Monte Carlo simulations in the statistical software R, with 5,000 simulation replications in total, 1,000 replications for the GCI, and 500 replications for the BCI. The sample sizes were n = 30, 50, 100, 150, and 200, and the parameters were δ = 0.1, 0.5, and 0.7; α = 0.25, 0.50, 0.75, 1.00, and 1.50; and β = 1. The algorithm presents the steps for estimating the coverage probability and expected length to compare the efficiency of the proposed methods.
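The Monte Carlo estimation of coverage probability and expected length can be sketched in Python as follows. This is an illustrative harness, not the authors' R code. The interval method plugged in here is a plain percentile bootstrap, a simplified stand-in that omits the bias correction and the Jeffreys step of the BCI described above. All function names and the small replication counts are assumptions made for the example.

```python
import math
import random


def g(alpha, delta):
    """Coefficient of variation of DBS(delta, alpha, beta); it does not depend on beta."""
    num = alpha**2 * (4 + 5 * alpha**2) + delta * (2 + alpha**2) ** 2
    return math.sqrt(num / (1 - delta)) / (2 + alpha**2)


def rdbs(n, delta, alpha, beta, rng):
    """Zero with probability delta, otherwise a BS(alpha, beta) variate."""
    out = []
    for _ in range(n):
        if rng.random() < delta:
            out.append(0.0)
        else:
            w = alpha * rng.gauss(0, 1) / 2
            out.append(beta * (w + math.sqrt(w * w + 1)) ** 2)
    return out


def theta_hat(y):
    """Plug-in CV estimate; None if fewer than two positive values are available."""
    pos = [v for v in y if v > 0]
    if len(pos) < 2:
        return None
    s = sum(pos) / len(pos)
    r_inv = sum(1 / v for v in pos) / len(pos)
    alpha = math.sqrt(2 * max(math.sqrt(s * r_inv) - 1, 0.0))
    return g(alpha, 1 - len(pos) / len(y))


def percentile_boot_interval(y, rng, K=200, level=0.95):
    """Simplified percentile bootstrap interval (stand-in for the paper's BCI)."""
    stats = []
    while len(stats) < K:
        t = theta_hat(rng.choices(y, k=len(y)))
        if t is not None:
            stats.append(t)
    stats.sort()
    tail = (1 - level) / 2
    return stats[int(tail * (K - 1))], stats[int(math.ceil((1 - tail) * (K - 1)))]


def simulate(interval_fn, n, delta, alpha, beta, M, rng):
    """Estimate coverage probability and expected length over M simulated samples."""
    theta = g(alpha, delta)
    hits, total_len, done = 0, 0.0, 0
    while done < M:
        y = rdbs(n, delta, alpha, beta, rng)
        if theta_hat(y) is None:  # skip degenerate samples
            continue
        lo, hi = interval_fn(y, rng)
        hits += lo <= theta <= hi
        total_len += hi - lo
        done += 1
    return hits / M, total_len / M


if __name__ == "__main__":
    cp, el = simulate(percentile_boot_interval, 50, 0.3, 0.75, 1.0, 200, random.Random(1))
    print(f"coverage={cp:.3f} expected length={el:.3f}")
```

Any other interval function with the signature `(y, rng) -> (lower, upper)` can be passed to `simulate`, so the same loop serves for comparing several methods on identical settings.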
Table 1 shows that the NA and BCI methods give similar values of both coverage probability and expected length. The G.VST and G.WS methods are likewise close to each other in almost all cases studied, with coverage probabilities that remain stable and close to 0.95 and with the shortest expected lengths. The G.VST and G.WS methods are therefore more efficient than the NA and BCI methods. Figure 1 compares the methods across shape parameters in terms of coverage probability and expected length. The coverage probabilities of the G.VST and G.WS methods are greater than or close to the nominal confidence level of 0.95 in almost all cases. The BCI method achieves a coverage probability close to the nominal level when the shape parameter is 0.25 or 1.00. The NA method meets the criterion when the shape parameter is small, but its coverage probability tends to decrease as the shape parameter increases. For all methods, the expected length increases progressively with the shape parameter. Figure 2 compares the methods across sample sizes in terms of coverage probability and expected length. The coverage probabilities of the G.VST and G.WS methods meet the specified criterion. For the NA method, the coverage probability increases as the sample size increases. The BCI method provides a coverage probability close to the nominal level only when the sample size is 50. As the sample size increases, the expected length decreases for all methods, resulting in improved efficiency.
Figure 3 shows a comparison of various methods in terms of the proportion of zero relative to coverage probability and expected length. It demonstrates that the coverage probabilities for the G.VST and G.WS methods consistently align closely with the specified confidence level. However, the NA and BCI methods provide coverage probabilities that fall below the specified confidence level. When examining expected length, a similar trend is observed across all methods: as the proportion of zero increases, the expected length also increases. Nonetheless, the G.VST and G.WS methods yield a shorter expected length compared to the NA and BCI methods.
5. Application
Wind speed plays multiple important roles and has various impacts, particularly in agriculture. It affects plant growth rates and, in turn, crop yields. Because Thailand is an agricultural country, a large portion of its population has always been engaged in farming or related occupations. Wind speed is therefore an important factor for agriculture in Thailand. In this research, we analyzed hourly wind speed data from Ubon Ratchathani province for March 9–10, 2023, and hourly wind speed data from Si Sa Ket province for April 3–7, 2023, as presented in Tables 2 and 3. The wind speed data for both provinces were obtained from the Automatic Weather System in Thailand (http://www.aws-observation.tmd.go.th/main/main). Histograms of the wind speed data for Ubon Ratchathani and Si Sa Ket provinces are shown in Figures 4 and 5. Since the wind speed data include both zero values (no wind) and positive values, we examined which distribution best fits the positive values by comparing the normal, exponential, Cauchy, logistic, and Birnbaum-Saunders distributions. To assess the suitability of these distributions, we used the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), calculated as
$$AIC=-2\ln L+2p$$
and
$$BIC=-2\ln L+p\ln(o),$$
respectively, where p represents the number of parameters estimated, o represents the number of observations, and L represents the likelihood function. Table 4 shows that the AIC and BIC values for the Birnbaum-Saunders distribution are the lowest among the candidate distributions. This suggests that the Birnbaum-Saunders distribution is the most suitable for the positive values of the wind speed data. As a result, the wind speed data, which contain both positive and zero values, are modeled by the delta-Birnbaum-Saunders distribution, and we used this distribution to calculate confidence intervals for the coefficients of variation of the wind speed data. Summary statistics for the wind speed data are presented in Table 5. In the wind speed data, the parameter α represents the shape or skewness of the distribution, reflecting the tendency toward lower or higher-than-normal wind speeds. The parameter β indicates the scale of the wind speed distribution in the area; if β changes, the distribution of the data will also shift. The parameter δ represents the proportion of zero values in the dataset. The point estimates of the coefficients of variation for Ubon Ratchathani and Si Sa Ket provinces were 1.2183 and 1.3085, respectively. Table 6 presents the calculated 95% confidence intervals for the coefficient of variation of the wind speed data from Ubon Ratchathani and Si Sa Ket provinces. We compared the wind speed data from Ubon Ratchathani with the simulation setting of n = 50, α = 0.75, and δ = 0.3 from Table 1. The simulation results indicate that the NA, G.VST, G.WS, and BCI methods achieve coverage probabilities greater than the specified confidence level of 0.95, and that the G.WS method provides the shortest confidence interval among them.
The confidence interval for the wind speed data from Ubon Ratchathani using the G.WS method is (1.0797, 1.4925), with the confidence interval length of 0.4128, the shortest among the methods. This indicates that the study results are consistent. Subsequently, we compared the wind speed data from Si Sa Ket with the parameters from the data simulation using the sample size of n = 100, parameter α = 1.00, and parameter δ = 0.3, from Table 1. The simulation results show that the G.VST, G.WS, and BCI methods achieve coverage probabilities greater than the specified confidence level, while the NA method has a coverage probability lower than the specified confidence level. Therefore, we considered only the G.VST, G.WS, and BCI methods. It was found that the G.VST method provides the shortest confidence interval. The confidence interval for the wind speed data from Si Sa Ket using the G.VST method is (1.2002, 1.4655), with the confidence interval length of 0.2653, the shortest among all methods. This indicates that the study results are consistent. Consequently, to construct confidence intervals for the coefficient of variation of wind speed data in Thailand, we recommend using the G.WS method for Ubon Ratchathani province and the G.VST method for Si Sa Ket province.
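The model comparison by AIC and BIC used earlier in this section can be sketched in Python as follows. The sketch assumes the usual definitions $AIC=-2\ln L+2p$ and $BIC=-2\ln L+p\ln(o)$. The Birnbaum-Saunders log-likelihood is evaluated at modified moment estimates rather than at the maximum likelihood estimates, so its AIC and BIC are slightly conservative (never smaller than the values at the true maximum). The function names and the toy data are hypothetical and are not the wind speed data of Tables 2 and 3.

```python
import math


def aic(loglik, p):
    """Akaike information criterion for log-likelihood `loglik` and p parameters."""
    return -2 * loglik + 2 * p


def bic(loglik, p, o):
    """Bayesian information criterion with o observations."""
    return -2 * loglik + p * math.log(o)


def loglik_exponential(y):
    """Maximized exponential log-likelihood (rate = 1 / mean)."""
    rate = len(y) / sum(y)
    return sum(math.log(rate) - rate * v for v in y)


def loglik_normal(y):
    """Maximized normal log-likelihood (MLE mean and variance)."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def loglik_bs_plugin(y):
    """BS log-likelihood at modified moment estimates (plug-in, not the exact MLE)."""
    n = len(y)
    s = sum(y) / n
    r_inv = sum(1 / v for v in y) / n
    beta = math.sqrt(s / r_inv)
    alpha = math.sqrt(2 * (math.sqrt(s * r_inv) - 1))
    total = 0.0
    for v in y:
        total += (-math.log(2 * alpha * beta * math.sqrt(2 * math.pi))
                  + math.log((beta / v) ** 0.5 + (beta / v) ** 1.5)
                  - (v / beta + beta / v - 2) / (2 * alpha**2))
    return total


if __name__ == "__main__":
    toy = [0.4, 0.9, 1.3, 0.7, 2.1, 0.5, 1.1, 0.8, 1.6, 3.0]  # hypothetical positive values
    for name, ll, p in [("exponential", loglik_exponential(toy), 1),
                        ("normal", loglik_normal(toy), 2),
                        ("BS (plug-in)", loglik_bs_plugin(toy), 2)]:
        print(name, round(aic(ll, p), 3), round(bic(ll, p, len(toy)), 3))
```

The distribution with the smallest AIC and BIC is preferred, which is the criterion applied to the positive wind speed values in Table 4.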
6. Conclusions
In this study, we constructed confidence intervals for the coefficient of variation of the delta-Birnbaum-Saunders distribution. We proposed three methods, NA, G.VST, and G.WS, and compared them with BCI. We evaluated the methods by requiring coverage probabilities greater than or equal to the nominal confidence level of 0.95 and then preferring the shortest expected length. The simulation results indicate that the coverage probabilities of the G.VST and G.WS methods are greater than or close to the nominal confidence level. The NA method shows a coverage probability greater than the nominal confidence level when the shape parameter is small. The coverage probability of the BCI method becomes closer to the nominal confidence level as the sample size increases. In terms of expected length, the BCI method provides shorter confidence intervals than the NA method, except when the shape parameter is large. The G.VST and G.WS methods yield the shortest confidence intervals, with lengths similar to each other, making these two methods the most efficient overall. Moreover, all the methods were applied to wind speed data in Thailand and yielded results consistent with the simulation outcomes. Therefore, the G.VST and G.WS methods are recommended for constructing confidence intervals for the coefficient of variation of the delta-Birnbaum-Saunders distribution. In future research, we will investigate new methods and extend the parameters of interest in the delta-Birnbaum-Saunders distribution to improve confidence interval construction.
Author contributions
Usanee Janthasuwan analyzed the data, drafted, and wrote the manuscript. Suparat Niwitpong conceptualized and designed the experiment and revised the manuscript. Sa-Aat Niwitpong proposed analytical tools, approved the final draft, and secured funding.
Acknowledgments
The authors would like to thank the editor and reviewers, whose valuable comments and suggestions enhanced the quality of the paper. This research was funded by King Mongkut's University of Technology North Bangkok. Contract no: KMUTNB-68-KNOW-17.
Conflict of interest
The authors declare no conflict of interest.
Appendix
The asymptotic mean and variance
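We have used the delta method to obtain an estimator with an asymptotically normal distribution based on the Taylor series, as follows:
$$g(\hat{\alpha},\hat{\delta})=g(\alpha,\delta)+\frac{\partial g(\alpha,\delta)}{\partial\alpha}(\hat{\alpha}-\alpha)+\frac{\partial g(\alpha,\delta)}{\partial\delta}(\hat{\delta}-\delta)+\text{Remainder},\tag{A.1}$$
where $g(\alpha,\delta)=\frac{1}{2+\alpha^{2}}\sqrt{\frac{\alpha^{2}(4+5\alpha^{2})+\delta(2+\alpha^{2})^{2}}{1-\delta}}$. Now, we calculate the partial derivative of $g(\alpha,\delta)$ with respect to $\alpha$ as follows:
Next, we calculate the partial derivative of $g(\alpha,\delta)$ with respect to $\delta$, and we obtain
After that, substituting the equations above into Eq (A.1), we get
as $n\to\infty$. It is well known that the asymptotic distributions of $\hat{\alpha}$ and $\hat{\delta}$ are given by
respectively. We have calculated the asymptotic mean of the coefficient of variation of the delta-Birnbaum-Saunders distribution as follows:
In addition, the asymptotic variance of the coefficient of variation of the delta-Birnbaum-Saunders distribution is given by
Note that $\hat{\alpha}\sim N\left(\alpha,\frac{\alpha^{2}}{2n_{(1)}}\right)$ and $\hat{\delta}\sim N\left(\delta,\frac{\delta(1-\delta)}{n}\right)$.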
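For readers who wish to retrace the derivation, the following worked steps differentiate $g(\alpha,\delta)$ directly. They are a reconstruction from the definition of $g$ and the stated asymptotic variances of $\hat{\alpha}$ and $\hat{\delta}$, assuming the two estimators are independent. The algebraic arrangement may differ from the displayed equations of this appendix. The auxiliary symbol $h(\alpha)$ is introduced only for this sketch.

```latex
% Write g^2 = (h(\alpha) + \delta) / (1 - \delta), with h(\alpha) = \alpha^2 (4 + 5\alpha^2) / (2 + \alpha^2)^2.
\begin{align*}
h'(\alpha) &= \frac{16\alpha(1+2\alpha^{2})}{(2+\alpha^{2})^{3}},
\qquad
1+h(\alpha) = \frac{2\,(2+4\alpha^{2}+3\alpha^{4})}{(2+\alpha^{2})^{2}},\\[4pt]
\frac{\partial g}{\partial\alpha}
 &= \frac{h'(\alpha)}{2\,g\,(1-\delta)}
  = \frac{8\alpha(1+2\alpha^{2})}{(2+\alpha^{2})^{3}(1-\delta)\,g(\alpha,\delta)},\\[4pt]
\frac{\partial g}{\partial\delta}
 &= \frac{1+h(\alpha)}{2\,g\,(1-\delta)^{2}}
  = \frac{2+4\alpha^{2}+3\alpha^{4}}{(2+\alpha^{2})^{2}(1-\delta)^{2}\,g(\alpha,\delta)},\\[4pt]
\operatorname{Var}\!\bigl(g(\hat{\alpha},\hat{\delta})\bigr)
 &\approx \left(\frac{\partial g}{\partial\alpha}\right)^{2}\frac{\alpha^{2}}{2n_{(1)}}
  + \left(\frac{\partial g}{\partial\delta}\right)^{2}\frac{\delta(1-\delta)}{n}\\
 &= \frac{32\,\alpha^{4}(1+2\alpha^{2})^{2}}{n_{(1)}\,O^{3}(1-\delta)^{2}\,\theta^{2}}
  + \frac{\delta\,(2+4\alpha^{2}+3\alpha^{4})^{2}}{n\,O^{2}(1-\delta)^{3}\,\theta^{2}},
\qquad O=(2+\alpha^{2})^{2},\ \theta=g(\alpha,\delta).
\end{align*}
```

To first order, the asymptotic mean of $g(\hat{\alpha},\hat{\delta})$ is $g(\alpha,\delta)=\theta$, since the linear terms of the Taylor expansion have zero expectation.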