1. Introduction
The choice between biased and unbiased estimators has drawn considerable attention from researchers in the field of statistical estimation. Researchers frequently employ unbiased estimators, whose estimates match the underlying population parameter on average. These approaches typically carry greater variance, which reduces their usefulness in many cases. As more information becomes available, the estimation scenario changes in favor of biased estimators with a lower mean square error (MSE), notwithstanding their bias, because the lower MSE translates into higher precision. Using supplementary data with a strong relationship to the variable under study is a standard procedure in the field of survey sampling. This methodology frequently enhances the accuracy and dependability of the estimators during both the design and estimation phases. Selecting pertinent additional data with care can significantly reduce the MSE of the estimators used to estimate the population parameters. Because ratio estimators exploit the relationship between the study and auxiliary variables, they have become popular for this purpose. Ratio estimators are useful tools for increasing the accuracy of estimates of the population total or mean. Numerous researchers have created a range of ratio and regression-based estimators, each based on a different transformation [1], significantly increasing the amount of knowledge in this field. Within the stratified simple random sampling (SSRS) framework, some studies have presented estimators based on mixed ratio-type techniques. Koyuncu et al. [2] examined the estimators developed by [1] within the context of SSRS. Moreover, Koyuncu et al. [3] provided a combined version of the SSRS estimator that had been put forth by [5]. Singh et al. [6] proposed an extensive set of estimators that utilize supplemental data in SSRS. Singh et al. [7] generated an efficient set of estimators using the same SSRS framework. Together with the references included in these publications, [8] provides a thorough assessment.
Stratified sampling with auxiliary variables has diverse applications in physics, engineering, and environmental sciences. In physics, it enhances the particle density estimates in high-energy collisions, cosmological parameter estimates, and material property predictions. Engineering applications include reliability analysis, signal processing, and network-traffic estimation. Specific examples have revealed their utility in estimating ocean currents, predicting structural failures, and optimizing energy systems. These applications underscore the flexibility and potential of stratified sampling in improving the estimation accuracy and efficiency [9].
The primary goal of this study, in the context of stratified random sampling, is to develop and evaluate efficient estimators that utilize only one additional variable. To this end, we describe and assess two novel families of mean estimators for a finite population. Our investigation includes a thorough examination of their bias and MSE up to the first order of approximation, which yields useful insights into their performance.
Researchers constantly strive for progress in their respective domains. The proposed estimator is a significant improvement in the field of sampling methodology. The adoption of this strategy enhances the development of statistical methods, leading to a constant enhancement in the precision and reliability of estimating population parameters. The suggested estimator is specifically developed to offer improved accuracy in calculating the mean of a finite population. By strategically including a single auxiliary variable in each stratum, it optimizes the use of available information, leading to more precise estimations in comparison to current approaches.
In stratified random sampling, precise estimation of the finite population mean is crucial. Existing estimators, such as the traditional stratified sampling estimator, the ratio estimator, and the regression estimator, often rely on simplistic assumptions or fail to exploit the correlation between the study and auxiliary variables effectively. Recently developed estimators, such as the generalized regression estimator and the exponential ratio estimator, offer enhancements but still have limitations. To address these gaps, we introduce new estimators that incorporate a single auxiliary variable into stratified random sampling. These estimators aim to enhance estimation accuracy and efficiency by better capturing the relationship between the variables. They are intended to provide more reliable and precise estimates, especially in scenarios with non-linear relationships or non-normal distributions, thereby filling a methodological gap in the survey sampling literature. Simulation studies and empirical evaluations indicate that the proposed estimators outperform the existing ones considered here, making them valuable tools for practitioners and researchers seeking improved estimation strategies.
2. Methodology
Consider a population of size $N$ that comprises $L$ strata (groups of homogeneous units) such that $\sum_{h=1}^{L} N_h = N$, where $N_h$ denotes the size of the $h$th stratum ($h = 1, 2, \ldots, L$). From each stratum, $n_h$ units are sampled through a simple random sampling without replacement (SRSWOR) scheme, such that $\sum_{h=1}^{L} n_h = n$. Let the pair $(y_{hi}, x_{hi})$ represent the values of $y$ (study variable) and $x$ (auxiliary variable) on the $i$th unit of the $h$th stratum, where $i = 1, 2, 3, \ldots, N_h$.
To obtain the expressions for the bias and MSE of the estimators, we assume that the properties listed below hold.
Suppose $\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h = \bar{Y}(1 + \varepsilon_0)$ and $\bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h = \bar{X}(1 + \varepsilon_1)$ are the overall means of the study and auxiliary variables obtained through a stratified random sample, respectively. The relative error terms $\varepsilon_0$ and $\varepsilon_1$ then satisfy the following properties.
where,
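$C_{yh}^2 = S_{yh}^2/\bar{Y}^2$ and $C_{xh}^2 = S_{xh}^2/\bar{X}^2$ are the squared population coefficients of variation of the study and auxiliary variables, respectively; $\lambda_h = \frac{1}{n_h} - \frac{1}{N_h}$ is the finite population correction (fpc); $W_h = N_h/N$ (also written $w_h$) is the stratum weight; and $R = \bar{X}/\bar{Y}$.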
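For reference, the standard first-order moment properties of the relative errors under SRSWOR within strata are sketched below. This is an assumed reading of the properties referred to above, written so that it agrees with the $V_{20}$, $V_{02}$, and $V_{11}$ terms used in later sections; the symbol $S_{yxh}$ (the covariance of $y$ and $x$ in stratum $h$) is notation introduced here for illustration.

```latex
\begin{aligned}
E(\varepsilon_0) &= E(\varepsilon_1) = 0, \\
E(\varepsilon_0^2) &= \sum_{h=1}^{L} W_h^2 \lambda_h C_{yh}^2 = V_{20}, \\
E(\varepsilon_1^2) &= \sum_{h=1}^{L} W_h^2 \lambda_h C_{xh}^2 = V_{02}, \\
E(\varepsilon_0 \varepsilon_1) &= \sum_{h=1}^{L} W_h^2 \lambda_h \frac{S_{yxh}}{\bar{Y}\bar{X}} = V_{11}.
\end{aligned}
```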
3. Summary of some estimators
Several estimators have been devised to estimate the finite population mean in the context of stratified random sampling with a single auxiliary variable. Researchers and statisticians have studied several approaches, each designed to utilize the information contained in the auxiliary variable to improve the accuracy of the population parameter estimates. The estimators in this context are designed to address the complexities of finite population sampling, considering the stratified structure and the utilization of a single auxiliary variable as a valuable tool for more robust and reliable mean estimation [4]. The conventional estimator for the population mean in the context of stratified random sampling is an unbiased estimator and is defined as follows:
The formula for the variance of the conventional unbiased estimator is provided as:
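For reference, the textbook forms of this estimator and its variance, written in the notation of Section 2, are sketched below. The label $T_{st}$ follows its use in Section 5, and the identification with $V_{20}$ assumes the usual definition $V_{20} = \sum_{h} W_h^2 \lambda_h S_{yh}^2 / \bar{Y}^2$; the exact displays of the original are assumed rather than quoted.

```latex
\begin{aligned}
T_{st} &= \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h, \\
\mathrm{Var}(T_{st}) &= \sum_{h=1}^{L} W_h^2 \lambda_h S_{yh}^2 = \bar{Y}^2 V_{20}.
\end{aligned}
```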
Although the usual estimator is unbiased, its variance is large. Therefore, when auxiliary information $X_i$ related to the study variable $Y_i$ is available, the traditional ratio estimator suggested in [10] can be used:
The bias of Cochran's ratio estimator along with its MSE is given as:
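To make the comparison concrete, the following sketch computes the $V$ terms from per-stratum summary statistics and evaluates the first-order MSE of the usual and ratio estimators. It assumes the textbook first-order result $MSE(T_r) \approx \bar{Y}^2 (V_{20} + V_{02} - 2V_{11})$ for the combined ratio estimator $\bar{y}_{st}\bar{X}/\bar{x}_{st}$, with $V_{20}$, $V_{02}$, $V_{11}$ defined as weighted sums of $W_h^2 \lambda_h$ times the stratum variances and covariance, scaled by the overall means. The function names and argument layout are hypothetical, not taken from the paper.

```python
import numpy as np

def v_terms(N_h, n_h, Sy2_h, Sx2_h, Syx_h, Y_bar, X_bar):
    """Return (V20, V02, V11) from per-stratum summaries (assumed definitions)."""
    N_h = np.asarray(N_h, dtype=float)
    n_h = np.asarray(n_h, dtype=float)
    W = N_h / N_h.sum()             # stratum weights W_h = N_h / N
    lam = 1.0 / n_h - 1.0 / N_h     # lambda_h = 1/n_h - 1/N_h
    k = W**2 * lam
    V20 = np.sum(k * np.asarray(Sy2_h, dtype=float)) / Y_bar**2
    V02 = np.sum(k * np.asarray(Sx2_h, dtype=float)) / X_bar**2
    V11 = np.sum(k * np.asarray(Syx_h, dtype=float)) / (Y_bar * X_bar)
    return V20, V02, V11

def first_order_mse(V20, V02, V11, Y_bar):
    """First-order variance of the usual estimator and MSE of the ratio estimator."""
    return {
        "usual": Y_bar**2 * V20,
        "ratio": Y_bar**2 * (V20 + V02 - 2.0 * V11),
    }
```

With made-up inputs such as `v_terms([600, 400], [60, 40], [4.0, 9.0], [1.0, 2.25], [1.8, 4.05], 10.0, 5.0)`, the ratio estimator has the smaller first-order MSE because the within-stratum correlation is high; with weak correlation the ordering can reverse.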
Bahl et al. [11] suggested an exponential ratio-type estimator. The functional form, Bias, and MSE of Bahl and Tuteja's estimators are as follows:
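As a reference sketch, the standard first-order results for the exponential ratio-type estimator, adapted to the combined stratified setting, are given below. The label $T_{BT}$ follows its use in Section 5; the expressions come from expanding the exponential in $\varepsilon_1$ up to second-order terms and are assumed to correspond to the displays referred to above.

```latex
\begin{aligned}
T_{BT} &= \bar{y}_{st} \exp\!\left( \frac{\bar{X} - \bar{x}_{st}}{\bar{X} + \bar{x}_{st}} \right), \\
\mathrm{Bias}(T_{BT}) &\approx \bar{Y} \left( \tfrac{3}{8} V_{02} - \tfrac{1}{2} V_{11} \right), \\
\mathrm{MSE}(T_{BT}) &\approx \bar{Y}^2 \left( V_{20} + \tfrac{1}{4} V_{02} - V_{11} \right).
\end{aligned}
```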
Based on the work of [12,13] a ratio estimator is introduced where the population coefficient of variation is known.
The estimator's first-order bias and MSE are discussed as follows:
where $\phi = \sum_{h=1}^{L} w_h \bar{X}_h / (\bar{X} + C_x)$.
Upadhyaya et al. [14] suggested a modified version of [15] by multiplying the coefficient of kurtosis by the mean of the auxiliary variable:
The MSE and Bias of this estimator are:
Here, $\theta = \frac{\bar{X}_h \beta_{2h}(x)}{\bar{X}_h \beta_{2h}(x) + C_{xh}}$.
A general family of estimators of the population mean was proposed by [16] in response to the work of [17].
Substituting different values of the constants a, b, τ ( = 0, 1, -1), and α, we obtain several estimators. The bias and MSE of the estimator are:
$\pi = \frac{a\bar{X}}{a\bar{X} + b}$, $\alpha = \frac{V_{11}}{\pi \tau V_{02}}$ and
Based on [18,19], a class of exponential estimators for the population mean under the SRSWOR scheme was introduced.
The estimator's MSE is obtained as,
where,
Motivated by [20,21], a ratio-cum-exponential type estimator was proposed, as follows:
Here, $\bar{x}_{st}' = \sum_{h=1}^{L} w_h (a_h \bar{x}_h + b_h)$ and $\bar{X}_{st}' = \sum_{h=1}^{L} w_h (a_h \bar{X}_h + b_h)$, and $a_h$ and $b_h$ are functions of known parameters of the auxiliary variable, such as the coefficient of kurtosis and the coefficient of variation.
The Bias and MSE of the above estimator are:
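where $\theta = \frac{a\bar{X}}{2(a\bar{X} + b)}$ and $\alpha_2$ is the minimizing constant.
For $\alpha_{2(opt)} = \frac{\sum_{h=1}^{L} w_h^2 \lambda_h (\theta_h S_{xh} - R S_{yh})}{\sum_{h=1}^{L} S_{xh}}$, the optimum MSE converges to that of the regression estimator: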
The factor ρc represents the aggregate correlation coefficient over all strata and is defined as,
Motivated by [22,23], the following difference exponential ratio estimator was proposed:
Here, Ast, Bst, and γ are the generalizing constants, and k1 and k2 are the minimizing constants. The Koyuncu estimator's first order of approximated Bias and MSE are given as:
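Here, $\delta = \frac{A_{st}\bar{X}}{A_{st}\bar{X} + B_{st}}$. For the optimum values $k_1 = \frac{DE - 2BC}{4AB - E^2}$ and $k_2 = \frac{\bar{Y}(CE - 2AD)}{4AB - E^2}$, the MSE is given as:
Here, $A = 1 + V_{20} + \delta^2 V_{02} - 2\delta V_{11}$, $B = 1 + (2\gamma^2 + \delta^2 - \gamma - 2\delta\gamma)V_{02}$, $C = \delta V_{11} - 2 - \frac{3}{4}\delta^2 V_{02}$, $D = \{\delta\gamma - \frac{3}{4}\delta^2 - \gamma(\gamma - 1)\}V_{02} - 2$ and $E = \{2\delta^2 + \gamma^2 - \gamma(2\delta + 1)\}V_{02} + 2 + 2(\gamma - \delta)V_{11}$.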
Tiwari et al. [24] proposed the following difference cum ratio exponential estimator as
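Here, $a_{st}$, $b_{st}$, $c_{st}$, and $d_{st}$ are either known parameters or functions of the parameters of $X$; $\alpha_3$ and $\beta$ are generalizing constants that can take values such as 1, 0, or $-1$; and $k_3$ and $k_4$ are the minimizing constants. The estimator's bias and MSE are:
Here, $\mu_{st} = \left[\frac{a_{st}\bar{X} + b_{st}}{c_{st}\bar{X} + d_{st}}\right]^{\alpha_3}$ and $\upsilon_{st} = \frac{c_{st}\bar{X}}{c_{st}\bar{X} + d_{st}}$.
For $k_3 = \frac{B_1 C_1 - 2 D_1 E_1}{2(A_1 B_1 - E_1^2)}$ and $k_4 = \frac{2 A_1 D_1 - C_1 E_1}{2(A_1 B_1 - E_1^2)}$, the minimum MSE is:
Here, $A_1 = \mu_{st}^2\left[1 + V_{20} + (V_{02} - 4V_{11})\left(\frac{\beta}{2} + \alpha_3 \upsilon_{st}\right) + 2V_{02}\left(\frac{\beta}{2} + \alpha_3 \upsilon_{st}\right)^2\right]$, $B_1 = R^2 \mu_{st}^2 V_{02}$, $C_1 = \mu_{st}\left[2 + (V_{02} - 2V_{11})\left(\frac{\beta}{2} + \alpha_3 \upsilon_{st}\right) + V_{02}\left(\frac{\beta}{2} + \alpha_3 \upsilon_{st}\right)^2\right]$, $D_1 = R \mu_{st} V_{02}\left(\frac{\beta}{2} + \alpha_3 \upsilon_{st}\right)$ and $E_1 = R \mu_{st}^2 V_{11}$.
Javed et al. [25] proposed the following family of estimators,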
Here, the constants a, b are generalizing elements. The bias of the proposed estimator is given as,
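For $k_5 = -\frac{B_2 C_2 - 2 D_2 E_2}{2(A_2 B_2 - E_2^2)}$ and $k_6 = -\frac{R(2 A_2 D_2 - C_2 E_2)}{2(A_2 B_2 - E_2^2)}$, the minimal value of the MSE is expressed as,
Here, $A_2 = 1 + V_{20} + 4\eta^2 V_{02}$, $B_2 = R^2 V_{02}$, $C_2 = V_{20} + (4\eta^2 + \frac{1}{8})V_{02} - 3\eta V_{11}$, $D_2 = V_{11} - \eta V_{02}$ and $E_2 = V_{11} - 2\eta V_{02}$.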
4. Proposed estimator
The estimators suggested in this study are intended as improvements in finite population estimation. In contrast to conventional unbiased estimators, which are appropriate when only the primary study variable is accessible, these estimators are designed to exploit supplementary information. By including only one auxiliary variable in the estimation process, we seek a favorable balance between bias and precision.
Two distinct estimator families were designed and assessed under stratified random sampling. These estimators were specifically designed to address the inherent difficulties associated with estimating the mean of a finite population, and they provide a new approach for enhancing the accuracy of the estimation.
4.1. First proposed estimator
Muneer et al. [26] proposed the following regression-exponential-Ratio type estimator
Here, w1 and w2 are minimizing constants and α takes values either 1 or 0 to have ratio exponential or product exponential estimators, respectively. Similarly, Shabbir et al. [28] proposed the below estimator
Here, w3 and w4 are the generalizing constants and u, v are some known suitably chosen parameters of the auxiliary variable or some real valued constants.
In light of the work of [26,28], we propose the following estimator:
Here, S1 and S2 are optimizing constants, whose values are obtained so that the MSE is minimum, ℓ can take values from 0 to 1 and the generalizing constants u and v are to be replaced by the values of the population parameters or some function of the parameters of the supplementary variable.
After simplification and application of different series, the proposed estimator is converted to the following form:
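Here, $\vartheta_1 = 1 - \frac{\eta}{2} - 2\ell$, $\vartheta_2 = \ell + \left(\ell - \frac{1}{2}\right)\eta + \frac{1}{8}(3 - 2\ell)\eta^2$ and $\eta = \frac{u\bar{X}_{st}}{u\bar{X}_{st} + v}$.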
Now, subtracting ˉYst from both sides, we have:
When we apply expectation to both sides of the previous equation, we get the following bias expression:
To obtain the MSE expression, we take the square of both sides of the equation
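After taking expectations, the MSE expression is obtained as,
Here, $A_{pr} = 1 + V_{20} + (\vartheta_1^2 - 2\vartheta_2)V_{02} + 4\vartheta_1 V_{11}$, $B_{pr} = 1 + (\vartheta_1^2 - 2\vartheta_2)V_{02}$, $C_{pr} = 1 - \vartheta_2 V_{02} + \vartheta_1 V_{11}$, $D_{pr} = 1 - \vartheta_2 V_{02}$ and $E_{pr} = 1 + (\vartheta_1^2 - 2\vartheta_2)V_{02} + 2\vartheta_1 V_{11}$.
Now, we differentiate the MSE equation with respect to $S_1$ and $S_2$ to obtain the values that minimize the MSE.
$\frac{\partial MSE(T_{pro1})}{\partial S_1} = 0$ and $\frac{\partial MSE(T_{pro1})}{\partial S_2} = 0$. So, we obtain:
Solving Eqs (4.1.10) and (4.1.11), we obtain the optimal values $S_1 = \frac{B_{pr}C_{pr} - D_{pr}E_{pr}}{A_{pr}B_{pr} - E_{pr}^2}$ and $S_2 = \frac{\bar{Y}_{st}(A_{pr}D_{pr} - C_{pr}E_{pr})}{A_{pr}B_{pr} - E_{pr}^2}$. With these values, the minimum MSE takes the following form: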
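The closed forms for $S_1$ and $S_2$ are the solution of a 2×2 linear system, which can be checked numerically. The sketch below assumes the MSE has the quadratic form $\bar{Y}^2\,[1 + S_1^2 A + s_2^2 B - 2 S_1 C - 2 s_2 D + 2 S_1 s_2 E]$ with $s_2 = S_2/\bar{Y}$ and $A, \ldots, E$ standing for $A_{pr}, \ldots, E_{pr}$. This form is an assumption, chosen because its stationary point reproduces the optimal values stated above and its minimum equals $\bar{Y}^2(1 - \Re_1)$ with $\Re_1$ as defined in Section 5.1. The function names are hypothetical.

```python
import numpy as np

def optimal_constants(A, B, C, D, E, Y_bar):
    """Closed-form optimal S1, S2 and the quantity R1 for the assumed quadratic MSE."""
    det = A * B - E**2
    S1 = (B * C - D * E) / det
    S2 = Y_bar * (A * D - C * E) / det
    R1 = (A * D**2 + B * C**2 - 2.0 * C * D * E) / det
    return S1, S2, R1

def quad_form(S1, S2, A, B, C, D, E, Y_bar):
    """Assumed first-order MSE as a quadratic function of S1 and S2."""
    s2 = S2 / Y_bar
    return Y_bar**2 * (1.0 + S1**2 * A + s2**2 * B
                       - 2.0 * S1 * C - 2.0 * s2 * D + 2.0 * S1 * s2 * E)
```

Setting the two partial derivatives of `quad_form` to zero gives $A S_1 + E s_2 = C$ and $E S_1 + B s_2 = D$, so `np.linalg.solve([[A, E], [E, B]], [C, D])` should agree with the closed forms whenever $AB - E^2 \neq 0$; the stationary point is a minimum when $A > 0$ and $AB - E^2 > 0$.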
4.2. Second proposed estimator
Taking some insights from the work of [26,27,28], we propose the following class of estimators.
The values of the optimizing constants $T_1$ and $T_2$ are chosen so that the MSE is minimized. The difference equation of the proposed estimator, in terms of the errors and up to the first order of approximation, is expressed as
After taking the expectation the bias of the suggested estimator is given as,
Here, $\delta_1 = \frac{1}{2}\eta + 2\ell - 1$ and $\delta_2 = \ell + \frac{1}{2}\eta(2\ell - 1) + \frac{3}{8}\eta^2$.
Squaring both sides of the difference equation (4.2.2) and retaining terms up to the first order of approximation, we have,
or
or
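Here, $A_p = V_{20} + (\delta_1^2 + 2\delta_2)V_{02} - 4\delta_1 V_{11}$, $B_p = 1 + (\delta_1^2 + 2\delta_2)V_{02}$, $C_p = \delta_2 V_{02} - \delta_1 V_{11}$, $D_p = 1 + \delta_2 V_{02}$ and $E_p = 1 + (\delta_1^2 + 2\delta_2)V_{02} - 2\delta_1 V_{11}$.
For the optimum values $T_1 = -\left[\frac{B_p C_p - D_p E_p + B_p}{A_p B_p - E_p^2 + B_p}\right]$ and $T_2 = \left[\frac{\bar{Y}(A_p D_p - C_p E_p + D_p - E_p)}{A_p B_p - E_p^2 + B_p}\right]$, the least possible value of the MSE up to the first order of approximation is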
5. Efficiency comparison
In this section, we define the conditions that must be met for the suggested estimators to outperform the currently used estimating methods in terms of efficiency.
5.1. Conditions for the first proposed estimator
Condition (ⅰ)
By comparing (3.2) and (4.1.12), MSE(Tpro1)⩽MSE(Tst) if
Here, $\Re_1 = \frac{A_{pr}D_{pr}^2 + B_{pr}C_{pr}^2 - 2C_{pr}D_{pr}E_{pr}}{A_{pr}B_{pr} - E_{pr}^2}$
Condition (ⅱ)
By comparing (3.5) and (4.1.12), MSE(Tpro1)⩽MSE(Tr) if
Condition (ⅲ)
By comparing (3.8) and (4.1.12), MSE(Tpro1)⩽MSE(TSD) if
Condition (ⅳ)
By comparing (3.11) and (4.1.12), MSE(Tpro1)⩽MSE(TBT) if
Condition (ⅴ)
By comparing (3.14) and (4.1.12), MSE(Tpro1)⩽MSE(TUS) if
Condition (ⅵ)
By comparing (3.17) and (4.1.12), MSE(Tpro1)⩽MSE(TCh) if
Condition (ⅶ)
By comparing (3.20) and (4.1.12), MSE(Tpro1)⩽MSE(TO) if
Condition (ⅷ)
By comparing (3.24) and (4.1.12), MSE(Tpro1)⩽MSE(TG) if
Condition (ⅸ)
By comparing (3.28) and (4.1.12), MSE(Tpro1)⩽MSE(TNK) if
Condition (ⅹ)
By comparing (3.32) and (4.1.12), MSE(Tpro1)⩽MSE(TTSS) if
Condition (xi)
By comparing (3.35) and (4.1.12), MSE(Tpro1)⩽MSE(TMJ) if
5.2. Conditions for the second proposed estimator
Condition (ⅰ)
By comparing (3.2) and (4.2.7), MSE(Tpro2)⩽MSE(Tst) if
where $\Re_2 = A_p B_p^2 + B_p C_p^2 - 2C_p D_p E_p + B_p + 2B_p C_p - D_p^2 - 2D_p E_p$ and $\Re_3 = A_p B_p - E_p^2 + B_p$
Condition (ⅱ)
By comparing (3.5) and (4.2.7), MSE(Tpro2)⩽MSE(Tr) if
Condition (ⅲ)
By comparing (3.8) and (4.2.7), MSE(Tpro2)⩽MSE(TSD) if
Condition (ⅳ)
By comparing (3.11) and (4.2.7), MSE(Tpro2)⩽MSE(TBT) if
Condition (ⅴ)
By comparing (3.14) and (4.2.7), MSE(Tpro2)⩽MSE(TUS) if
Condition (ⅵ)
By comparing (3.17) and (4.2.7), MSE(Tpro2)⩽MSE(TCh) if
Condition (ⅶ)
By comparing (3.20) and (4.2.7), MSE(Tpro2)⩽MSE(TO) if
Condition (ⅷ)
By comparing (3.24) and (4.2.7), MSE(Tpro2)⩽MSE(TG) if
Condition (ⅸ)
By comparing (3.28) and (4.2.7), MSE(Tpro2)⩽MSE(TNK) if
Condition (ⅹ)
By comparing (3.32) and (4.2.7), MSE(Tpro2)⩽MSE(TTSS) if
Condition (xi)
By comparing (3.35) and (4.2.7), MSE(Tpro2)⩽MSE(TMJ) if
The above conditions specify when the proposed estimators outperform the existing estimators. Whenever a condition holds, the corresponding proposed estimator has a first-order MSE no larger than that of the competing estimator, so these conditions characterize the efficiency of the proposed estimators.
6. Numerical comparison
To check the performance of the proposed estimator relative to the classical estimator, the following data sets were considered (see Table 1).
Data Ⅰ: (source: [29])
(The two strata are Stratum 1: Rawalpindi, Lahore, Sargodha and Gujranwala. Stratum 2: Sahiwal, Faisalabad, D.G Khan, Multan and Bahawalpur)
Y: Division-wise employment level in 2012.
X: Division-wise number of registered factories in 2012.
Data Ⅱ: (source: [29])
Y: Division-wise enrollment of students in 2012.
X: Division-wise number of government schools in 2012.
Data Ⅲ: (source: [17]). The dataset has information on the apple production amount (Y) and the number of apple trees (X) in 854 villages in Turkey in the year 1999. The data is categorized into strata based on the region of Turkey.
Data Ⅳ: (source: [2]) The study contains the number of instructors as study variable and the number of students as supplementary variable in schools for 923 districts in six regions in Turkey in 2007. (1: Aegean 2: Black Sea 3: Central Anatolia 4: East and Southeast Anatolia 5: Marmara 6: Mediterranean)
Data Ⅴ: (source: [30]). The main variable pertains to the number of wet days, whereas the auxiliary variable refers to the total number of sunshine hours.
Table 2 shows the MSEs of all the estimators selected from [25], along with those of the proposed estimators, under stratified random sampling with a single supplementary variable. The first, second, and third populations consisted of two strata each, and the fourth and fifth populations consisted of six strata each; summary information is given in Table 1. MSE results for the proposed estimators were obtained for three choices of the generalizing constants u and v. In the first, u = 1 and v = 0, so no transformation is applied. In the second, u = 1 and v = Cx, and in the third, u = ρyx and v = Cx. Furthermore, the value of the generalizing constant α was 0.5 in the first three populations, 0.65 in the fourth population, and 0.40 in the fifth population. The MSEs of the proposed estimators (Tpro1 and Tpro2) were smaller than those of all competing estimators in this study. In addition, the use of a transformation further decreased the MSE values of the suggested estimators.
The entries in Table 3 and Figure 1 are the percentage relative efficiencies (PREs) of the estimators of the population mean under stratified random sampling without replacement, in the presence of an auxiliary variable. PREs were obtained relative to the classical estimator of the mean. In all five populations, the efficiencies of the proposed estimators were higher than those of all the listed estimators. In addition, the use of a transformation (by applying different parameter values for u and v) further enhanced the efficiency of the estimators. Here, Tproi(1) involves no transformation; in Tproi(2), u = 1 and v = Cx; and in Tproi(3), u = ρyx and v = Cx (i = 1, 2). A visual display of the PREs for each dataset is shown in Figure 1, where each of the five lines traces the PREs of the estimators for one dataset. On all five lines, the graph is highest at the last six entries (Tpro1(1) to Tpro2(3), the proposed estimators). Hence, the graphical display of PREs supports the claim that the proposed estimators are more efficient than the existing estimators of the finite population mean in stratified random sampling with single auxiliary information.
7. Simulation study
In this section, we conduct a simulation study of both the established and the newly introduced estimators to assess their stability across random samples. We began with a stratified population of N = 1000 units, from which a sample of n = 100 pairs of values (y, x) was selected. This population comprised two strata with sizes N1 = 600 and N2 = 400. Under proportional allocation, the stratum samples make up 60% and 40% of the total sample size n, that is, n1 = 0.6n and n2 = 0.4n. The mean vectors and covariance matrices are expressed as follows (see Table 4):
The MSE and PRE values of the estimators were computed in R using the following steps.
Step-1: Simple random samples without replacement (SRSWOR) of sizes n = 10, 20, 50, 100, and 200 were drawn from the target population. For each sample size, a loop of 10,000 iterations was carried out in RStudio, and the estimator values were computed at each iteration.
Step-2: For each sample size, the values of the existing and suggested estimators were calculated separately and averaged over all iterations.
Step-3: Using the values obtained in Step-2, the MSE of each estimator was obtained.
Step-4: PRE of the estimators is obtained using the following formula:
$PRE(T_i) = \frac{Var(T_0)}{MSE(T_i)} \times 100$, where $T_i$ stands for each estimator in turn.
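The steps above can be sketched in Python as follows (the study itself used R). Everything numeric here is hypothetical: the mean vectors and covariance matrices are made-up stand-ins for the Table 4 values, only the usual and combined ratio estimators are included, and the empirical MSE of the usual estimator is used in place of $Var(T_0)$. The sketch illustrates the simulation loop, not the authors' exact code.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical population parameters (NOT the values from Table 4).
N_h = [600, 400]
means = [np.array([50.0, 20.0]), np.array([80.0, 35.0])]   # (y, x) means per stratum
covs = [np.array([[25.0, 9.0], [9.0, 4.0]]),
        np.array([[36.0, 13.5], [13.5, 6.25]])]

# One fixed finite population per stratum; column 0 is y, column 1 is x.
population = [rng.multivariate_normal(m, c, size=size)
              for m, c, size in zip(means, covs, N_h)]
N = sum(N_h)
W = np.array(N_h) / N
Y_bar = sum(w * s[:, 0].mean() for w, s in zip(W, population))
X_bar = sum(w * s[:, 1].mean() for w, s in zip(W, population))

def simulate(n, reps=10_000):
    """Monte Carlo MSE and PRE of the usual and ratio estimators for sample size n."""
    n_h = [round(n * Nh / N) for Nh in N_h]        # proportional allocation
    est = {"usual": np.empty(reps), "ratio": np.empty(reps)}
    for r in range(reps):
        y_st = x_st = 0.0
        for w, stratum, k in zip(W, population, n_h):
            idx = rng.choice(len(stratum), size=k, replace=False)  # SRSWOR
            y_st += w * stratum[idx, 0].mean()
            x_st += w * stratum[idx, 1].mean()
        est["usual"][r] = y_st
        est["ratio"][r] = y_st * X_bar / x_st
    # Empirical MSE around the true finite-population mean.
    mse = {name: float(np.mean((vals - Y_bar) ** 2)) for name, vals in est.items()}
    # PRE relative to the usual estimator.
    pre = {name: 100.0 * mse["usual"] / m for name, m in mse.items()}
    return mse, pre
```

Calling `simulate(n)` for each `n` in `(10, 20, 50, 100, 200)` yields one row of MSEs and PREs per sample size; further estimators can be added by storing another entry in `est` inside the loop.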
Table 5 presents the simulated MSEs of the estimators for various sample sizes. The MSEs of the suggested estimators are smaller than those of the other estimators. Furthermore, the suggested estimators are stable with respect to sample size: as the sample size increases, their MSEs decrease. Hence, the suggested estimators perform best among all competing estimators under study.
Table 6 shows the simulated PREs of the estimators, relative to the usual estimator, for the various sample sizes. The PREs of the suggested estimators are higher than those of all rival estimators. Furthermore, the suggested estimators are stable with respect to sample size, and as the sample size increases, their efficiency also increases. Hence, the proposed estimators outperform all the competing estimators under study. The visual display of the PREs is shown in Figure 2, where each line shows the PREs for a different sample size. For every sample size, the line is highest at the last six values (Tpro1(1) to Tpro2(3), the proposed estimators). Hence, the graphical display of the simulation results supports the superiority of the proposed estimators.
8. Discussion
We provide two new families of estimators in the context of stratified random sampling that are intended to enhance population mean estimation by utilizing a single auxiliary variable. The first family, formalized in Eq (4.1.1), builds on the work in [26,28]. Equation (4.2.1) provides the mathematical representation of the second family of estimators, which is developed based on the findings published in [28,31]. Expressions for the bias and mean squared error (MSE) of both estimator families were derived up to the first order of approximation. These expressions are given in Eqs (4.1.6), (4.1.12), (4.2.3), and (4.2.7).
To determine the relative efficiency of our proposed estimators compared to existing methods, we established performance criteria based on MSE minimization and precision improvement. Specifically, Eqs (5.1.1)–(5.1.11) delineate the conditions under which the first estimator family achieves superior performance relative to conventional estimators. Likewise, a corresponding set of conditions was identified for the second estimator family. These are the conditions under which the proposed estimators are more efficient than the estimators considered in this study.
Both real and simulated datasets were used to assess the suggested estimators' effectiveness. The MSE and percentage relative efficiency (PRE) values calculated for real-world data are shown in Tables 2 and 3, which also show how the estimators perform for different values of the generalizing constants u and v. In these tables, the MSE values of the last six estimators, which are the proposed ones, are smaller than those of all the other estimators. Similarly, the PRE values of the six proposed estimators are larger than those of all the competing estimators of the population mean given in the tables. The findings show a consistent pattern of lower MSEs and higher PREs than traditional estimators across the five data sets. The observed patterns indicate that the suggested methodologies provide more accurate estimates of the population mean, thereby reducing estimation errors and enhancing efficiency.
Furthermore, these findings were checked through simulation studies, with Tables 5 and 6 summarizing the outcomes. The performance of the estimators was tested under five sample sizes: 10, 20, 50, 100, and 200. For all sample sizes used in the simulation studies, both families of estimators exhibited smaller MSEs and larger PREs than all competing estimators of the population mean. Furthermore, the tables indicate that the proposed estimators behave stably across sample sizes in terms of MSEs and PREs. These simulated results align closely with the empirical observations, further supporting the performance advantages of the proposed estimator families. Notably, the trends in the simulated data mirror those observed in the real-world datasets, suggesting that the approach generalizes across population structures.
Visual representations of the PRE values are provided in Figures 1 and 2, which illustrate the efficiency comparisons between our estimators and existing alternatives. Each line in Figure 1 represents a distinct population, while each line in Figure 2 corresponds to a different sample size. In both figures, the graph lines for the proposed families reach the highest points, indicating superior percent relative efficiencies (PREs). A consistent upward trend is evident, demonstrating that the proposed families consistently achieve higher PRE values across scenarios. This graphical evidence strongly supports our conclusion that the new estimator families outperform traditional approaches in terms of precision and reliability.
By analyzing the summary statistics in Table 1 and the percent relative efficiencies (PREs) in Table 3, we observe the following patterns: In the first three datasets, the correlation coefficients for all strata are comparatively smaller than those in the last two datasets. Additionally, the PRE values for the first estimator are higher in the first three datasets compared to the second estimator. Conversely, the PREs for the second estimator are higher than those of the first estimator in the last two datasets.
Based on these findings, we conclude that the first family of estimators performs more efficiently when the correlation coefficients for all or some of the strata are relatively small. On the other hand, the second family of estimators performs more efficiently when all or most strata have larger correlation coefficients.
Overall, our research contributes to the field of sampling methodology by introducing efficient estimators that optimize the use of an auxiliary variable in stratified random sampling. The proposed approaches not only enhance estimation accuracy but also provide a more reliable alternative to existing techniques. Researchers can extend this work by exploring the application of these estimators in more complex sampling frameworks or by integrating additional auxiliary variables to further refine precision levels. The methodological advancements presented in this study pave the way for improved sampling strategies in statistical analysis, benefiting empirical research and practical data collection applications.
9. Conclusions
We introduced two new exponential estimators that can be used to estimate the population mean when stratified random sampling is applied with a single auxiliary variable. We also derived formulas for the first-order bias and the MSE of the new estimators. Furthermore, conditions were established to identify the circumstances in which the proposed estimators outperform traditional and existing alternatives. We compared the MSEs and PREs of the newly designed estimators with those of other approaches. The evaluation included both simulated experiments and real-world datasets, to improve the robustness and usefulness of our findings. The empirical findings from this investigation consistently confirm the superiority and effectiveness of the proposed estimator families when compared to all the other estimators examined in this study.
We emphasize the significant advancements made by our cutting-edge exponential-type estimators and highlight their improved performance and efficacy in the difficult field of stratified random sampling using a single auxiliary variable.
The proposed estimators account for the nonlinear relationships between the study and auxiliary variables, in contrast to traditional estimators. Additionally, the proposed estimators are a hybrid of regression, ratio, product, and exponential functions to obtain more accurate results. Furthermore, the proposed estimators can adapt to multiple distributions, making them more versatile.
The first limitation of the suggested estimators is that, although they can handle nonlinear relations, they assume specific functional forms between the variables. Violation of this assumption may affect the performance of the estimators. A second limitation is that adding more parameters to the suggested estimators increases complexity and computational requirements.
In conclusion, researchers can build on the suggested estimators to develop more effective estimators that cope with nonresponse problems. Researchers can also extend the proposed estimators to scenarios with multiple auxiliary variables and examine them in other sampling designs and population scenarios.
Author contributions
Khazan Sher: Conceptualization, project administration, writing original draft, writing–review and editing; Muhammad Ameeq, Basem A. Alkhaleel, Sidra Naz: Investigation, writing original draft, writing–review and editing; Muhammad Muneeb Hassan, Olyan Albalawi: Project administration, investigation, writing original draft, writing–review and editing. All authors have read and agreed to the published version of the manuscript.
Use of Generative-AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgment
Researchers Supporting Project number (RSPD2024R630), King Saud University, Riyadh, Saudi Arabia.
Conflict of interest
The authors declare no conflict of interest.
Appendix A
a. Development of the Bias and MSE of the first family of estimators
Rewriting the first family of estimators
In terms of error, the estimator could be written as
Expanding the above expression in a Taylor series up to the first order of approximation, we have
After applying the exponential series, we obtain the below expression
By subtracting ˉYst from both sides, we have
When we apply expectation to both sides of the previous equation, we get the following bias expression:
To obtain the MSE expression, we take the square of both sides of the equation,
After taking expectation, the MSE expression is obtained as:
Or
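Here, $A_{pr} = 1 + V_{20} + (\vartheta_1^2 - 2\vartheta_2)V_{02} + 4\vartheta_1 V_{11}$, $B_{pr} = 1 + (\vartheta_1^2 - 2\vartheta_2)V_{02}$, $C_{pr} = 1 - \vartheta_2 V_{02} + \vartheta_1 V_{11}$, $D_{pr} = 1 - \vartheta_2 V_{02}$ and $E_{pr} = 1 + (\vartheta_1^2 - 2\vartheta_2)V_{02} + 2\vartheta_1 V_{11}$.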
Now, let us differentiate the MSE equation to obtain the values of S1 and S2 to have minimum MSE.
Solving Eq (4.1.10) for S1, we have
Solving Eq (4.1.11) for S2 , we have
By putting Eq (4.1.2a) in Eq (4.1.1a) we have
Now, to obtain a value for S2, we put the value from Eq (4.1.3a) in (4.1.2a)
With these values from Eqs (4.1.3a) and (4.1.4a), the minimum MSE adopts the below form,
b. Development of the Bias and MSE of the second family of estimators
Rewriting the Eq (4.2.1), we have,
In terms of errors, the above equation could be written as:
Expanding the above expression in a Taylor series up to the first order of approximation, we have:
After applying exponential series, we obtain the below expression:
The difference equation up-to first order of approximation of the proposed estimator in terms of errors is expressed as
After taking the expectation, the bias of the suggested estimator is given as:
where $\delta_1 = \frac{1}{2}\eta + 2\ell - 1$ and $\delta_2 = \ell + \frac{1}{2}\eta(2\ell - 1) + \frac{3}{8}\eta^2$.
Squaring both sides of the above difference equation and retaining terms up to the first order of approximation, we have,
or
or
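where $A_p = V_{20} + (\delta_1^2 + 2\delta_2)V_{02} - 4\delta_1 V_{11}$, $B_p = 1 + (\delta_1^2 + 2\delta_2)V_{02}$, $C_p = \delta_2 V_{02} - \delta_1 V_{11}$, $D_p = 1 + \delta_2 V_{02}$ and $E_p = 1 + (\delta_1^2 + 2\delta_2)V_{02} - 2\delta_1 V_{11}$.
To obtain the values of $T_1$ and $T_2$, we differentiate Eq (4.2.6) with respect to $T_1$ and $T_2$, as below: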
Putting Eq (4.2.6b) in (4.2.5b), we have,
With this value of T1, Eq (4.2.6b) adopts the following form
The least possible value of the MSE is obtained by utilizing Eqs (4.2.7b) and (4.2.8b)
or
Appendix B
Section 5.1
Section 5.2
where $\Re_2 = A_p B_p^2 + B_p C_p^2 - 2C_p D_p E_p + B_p + 2B_p C_p - D_p^2 - 2D_p E_p$ and $\Re_3 = A_p B_p - E_p^2 + B_p$