1. Introduction
Estimation of population parameters under a specified sampling design has long been an active area of research. In simple random sampling, the parameters of primary interest are the population mean and variance, whose basic estimators are the sample mean, ˉy, and the sample variance, s2. In certain situations, information on some auxiliary variables is also available and can be used to obtain more efficient estimators of the population parameters. Several authors have proposed improved estimators of the population mean and variance that use such auxiliary information. The popular estimators of the population mean that use auxiliary information are the ratio and regression estimators given by [1]. These estimators have attracted several authors, and various modifications have been proposed over time. A class of estimators of the population mean using information on several auxiliary variables was proposed by [2]. Another class of regression and ratio-product type estimators, which performs better than the classical ratio estimator, was proposed by [3]. Several estimators of the population mean in single- and two-phase sampling were proposed by [4], and a general class of such estimators was proposed by [5]. More details on estimators of the population mean can be found in [6,7], among others.
In recent years, the estimation of the population variance has also attracted considerable attention. Classical ratio and regression estimators of the population variance in single-phase sampling were proposed by [8,9]. An improved ratio-type estimator of the population variance was proposed by [10], and some ratio- and regression-type estimators of the population variance in two-phase sampling were proposed by [11]. Exponential-type estimators have also attracted attention in recent times: [12] proposed an exponential estimator of the population variance, and some general classes of exponential estimators were proposed by [13,14]. An estimator of the coefficient of variation in single-phase sampling was proposed by [15]. Other notable works on variance estimation include [16–21], among others.
Recently, [22] proposed an estimator of general population parameters in single-phase sampling, using information on a single auxiliary variable. The estimator provides a unified way to estimate the population mean, variance and coefficient of variation for specific values of the constants involved. In this paper, we propose some estimators of general population parameters in single- and two-phase sampling, using information on a single auxiliary variable and on multiple auxiliary variables. The plan of the paper is as follows.
The methodology and notations are given in Section 2. The new estimators of general population parameters for single-phase sampling are proposed in Section 3, using information on single and multiple auxiliary variables, and the expressions for the bias and the mean square error (MSE) of the proposed estimators are obtained. In Section 4, estimators of the general population parameters are proposed for two-phase sampling, alongside the expressions for their bias and MSE. In Section 5, a comparison of the estimators of specific parameters is given. A numerical study of the proposed estimators, comprising a simulation study and applications to some real populations, is given in Section 6, and the conclusions and recommendations are given in Section 7.
2. Methodology and notations
In this section, we give the methodology and notations that will be used in this paper. Suppose that the units of a population are labeled U1,U2,…,UN, while the values of some variable of interest are Y1,Y2,…,YN. Suppose, further, that the estimation of some general population parameter
is required, where
and
are, respectively, the population mean and variance of Y. It is to be noted that the general parameter t(a,b) reduces to the population mean for a = 1 and b = 0, and it reduces to the population variance for a = 0 and b = 2 and to the coefficient of variation for a=−1 and b = 1. When information of some auxiliary variable is known, then the conventional regression estimator, using a sample of size n, is
where β=Sxy/S2x is the population regression coefficient between X and Y, and
and
are the population and the sample mean of the auxiliary variable X. The mean square error of (1) is
where
and
is the population correlation coefficient between X and Y.
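As a quick numerical sanity check, the general parameter t(a,b) can be sketched in code. The functional form Ȳ^a S_y^b used below is an assumption inferred from the stated special cases (a=1, b=0 gives the mean; a=0, b=2 the variance; a=−1, b=1 the coefficient of variation), not the paper's displayed equation:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(60.0, 5.0, size=1000)  # a synthetic study variable

Ybar = y.mean()
Sy = y.std(ddof=1)

def t(a, b):
    # Assumed form t(a, b) = Ybar**a * Sy**b, inferred from the special
    # cases listed in the text (an illustration, not the displayed formula).
    return Ybar**a * Sy**b

assert np.isclose(t(1, 0), Ybar)        # a=1, b=0: population mean
assert np.isclose(t(0, 2), Sy**2)       # a=0, b=2: population variance
assert np.isclose(t(-1, 1), Sy / Ybar)  # a=-1, b=1: coefficient of variation
```

The same single function thus covers all three target parameters, which is what makes a unified estimator of t(a,b) attractive.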
In some situations, the population information on the auxiliary variable is not available, and the regression estimator (1) cannot be used. The problem can be solved by using the two-phase sampling technique. In two-phase sampling, a first-phase sample of size n1 is drawn from a population of size N, and information on an auxiliary variable is recorded. A sub-sample of size n2 < n1 is then drawn from the first-phase sample, and information on both the auxiliary variable and the study variable is recorded. The conventional regression estimator, in two-phase sampling, is given as
where
is the second-phase sample mean of the study variable Y,
is the second-phase sample mean of auxiliary variable X, and
is the first-phase sample mean of auxiliary variable X. The MSE of two-phase sampling regression estimator is
where
and
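The two-phase scheme just described can be sketched as follows. This is a minimal illustration on synthetic data: the regression coefficient is estimated from the second-phase sample (whereas the β in the text is the population coefficient), and the sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n1, n2 = 5000, 500, 100  # population size and the two phase sample sizes

# Synthetic population with a correlated auxiliary variable X
x = rng.normal(45.0, 4.0, size=N)
y = 10.0 + 1.1 * x + rng.normal(0.0, 2.0, size=N)

# First phase: draw n1 units and record X only
idx1 = rng.choice(N, size=n1, replace=False)
x1bar = x[idx1].mean()

# Second phase: sub-sample n2 < n1 units, record both X and Y
idx2 = rng.choice(idx1, size=n2, replace=False)
y2bar, x2bar = y[idx2].mean(), x[idx2].mean()

# Regression coefficient estimated from the second-phase sample
beta_hat = np.cov(x[idx2], y[idx2])[0, 1] / x[idx2].var(ddof=1)

# Conventional two-phase (double) sampling regression estimator
y_lr2 = y2bar + beta_hat * (x1bar - x2bar)
```

The adjustment term uses x̄1 in place of the unknown X̄, which is exactly why two-phase sampling helps when population information on X is unavailable.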
The regression estimator of population variance is given by [9] as
where γ is a constant, S2x and s2x are, respectively, the population and the sample variances of the auxiliary variable, and s2y is the sample variance of Y. The estimator for two-phase sampling can be easily written. Several modifications of the two-phase sampling regression estimator of mean are given in [6].
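The regression estimator of the variance described above can be sketched as below on synthetic data; the value γ = 1 is a placeholder (the optimum γ is derived in [9]):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 5000, 200
x = rng.normal(45.0, 4.0, size=N)
y = 10.0 + 1.1 * x + rng.normal(0.0, 2.0, size=N)

S2x = x.var(ddof=1)  # population variance of X, assumed known
idx = rng.choice(N, size=n, replace=False)
s2y = y[idx].var(ddof=1)  # sample variance of Y
s2x = x[idx].var(ddof=1)  # sample variance of X

gamma = 1.0  # placeholder constant; its optimum value is given in [9]
S2y_reg = s2y + gamma * (S2x - s2x)  # regression estimator of the variance
```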
The derivation of the bias and MSE of the estimators of the mean and the variance requires certain notations. In this paper, we will assume that the sample mean and the sample variance of the study and auxiliary variables are connected with the population mean and the population variance as
and
The relation between sample estimates and the population parameters in case of two-phase sampling is
and
The expected values of the error terms ε's and e's are all zero. Some additional expectations, for single- and two-phase sampling with a single auxiliary variable, are
In the case of multiple auxiliary variables, we will use the following results for single- and two-phase sampling:
where εx=[εx1⋯εxq]′, ex=[ex1⋯exq]′, R=diag(ρyxj), Φ21=diag(φ21j), C∗x=diag(Cxj) and
and
Also,
and
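The zero-expectation property of the error terms can be checked by simulation. The relative-error definitions used below, ε_y = (ˉy − ˉY)/ˉY and e_y = (s2y − S2y)/S2y, are the standard ones in this literature and are assumptions here, since the displayed definitions are not reproduced above:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n, reps = 5000, 100, 5000
y = rng.normal(60.0, 5.0, size=N)
Ybar, S2y = y.mean(), y.var(ddof=1)

eps_y = np.empty(reps)
e_y = np.empty(reps)
for r in range(reps):
    s = y[rng.choice(N, size=n, replace=False)]
    eps_y[r] = (s.mean() - Ybar) / Ybar   # assumed relative error of the mean
    e_y[r] = (s.var(ddof=1) - S2y) / S2y  # assumed relative error of the variance

# E(eps_y) = E(e_y) = 0, so the simulated means should be near zero
assert abs(eps_y.mean()) < 0.01
assert abs(e_y.mean()) < 0.05
```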
We will now propose some new estimators for single-phase sampling.
3. Estimators for single-phase sampling
In this section, we propose some new estimators of the general population parameter for single-phase sampling, using information on a single auxiliary variable and on several auxiliary variables.
3.1. Estimator with single auxiliary variable
In the following, we have proposed a new estimator of general population parameter using information of a single auxiliary variable. The proposed estimator is
It is easy to see that the proposed estimator reduces to the classical estimator of mean for
Also, for
the estimator (9) reduces to the classical estimator of variance. For
the estimator (9) becomes a regression type estimator of the population mean, and for
we have a regression type estimator of the population variance. Further, for
the estimator (9) becomes a regression type estimator of coefficient of variation. Now, to obtain the bias and MSE of (9), we write the estimator using the error notations as
Expanding, and retaining only the linear terms, we have
or
Applying expectation and simplifying, the bias of the proposed estimator (9) is
Again, squaring (10) and retaining only the quadratic terms, we have
Applying expectation and using (6), the MSE of (9) is
We will now obtain the optimum values of α and β which minimize (12). For this, we differentiate (12) with respect to α and β, equate the derivatives to zero and solve the resulting equations simultaneously. The derivatives of (12) with respect to α and β are
and
Equating the above derivatives to zero and simultaneously solving the resulting equations, the optimum values of α and β which minimize (12) are
and
Using these values in (12), the minimum MSE of estimator given in (9) is
where
and
The MSEs for specific cases of (9) are readily obtained. For example, if
then the MSE of a regression type estimator of population mean is obtained as
Further, if
the expression for MSE of a regression type estimator of variance is obtained as
Again, if
the MSE of a regression type estimator of coefficient of variation is obtained as
It is interesting to note that for
the optimum MSE of (9) reduces to the MSE of classical regression estimator given in (2). Also, for
the optimum MSE of (9) reduces to the classical regression type estimator of variance as given by [9].
3.2. Estimator with several auxiliary variables
In this section, we will give an estimator of general population parameter in single-phase sampling using the information of several auxiliary variables. The proposed estimator is
Again, it is easy to see that the proposed estimator (19) provides certain estimators as a special case for different values of (a,b,αj,βj). Using error notations, the estimator (19) can be written as
where
Expanding, and retaining only the linear terms, we have
or
Taking expectation on both sides, the bias of the proposed estimator (19) is
Again, squaring (20) and retaining only the quadratic terms, we have
Taking expectation of the above equation and using (8), the MSE of (19) is
We will now obtain the optimum values of α and β which minimize (22). For this, we first differentiate (22) with respect to α and β. The derivatives are
and
Equating the derivatives to zero, the normal equations are
and
Writing the above equations in matrix form, we have
Solving the above matrix equation, the optimum values of α and β are given as the solution of
Now, we invert the above partitioned matrix as below. Let
and then
where
and
Using the values of the inverted matrix in (22), the optimum values of α and β are
and
Using these optimum values of α and β in (22), the minimum MSE of (19) is
It is interesting to note that, for
the minimum MSE, given in (26), reduces to the minimum mean square error of the classical regression estimator of mean with several auxiliary variables; see [6]. Also, for
the minimum MSE, given in (26), reduces to the minimum MSE of a general estimator of variance given by [19].
4. Estimators for two-phase sampling
In this section, we propose some new estimators of the general population parameter for two-phase sampling, using information on a single auxiliary variable and on several auxiliary variables.
4.1. Two-phase sampling estimator with single auxiliary variable
In the following, we have proposed a new estimator of general population parameter for two-phase sampling using information of a single auxiliary variable. The proposed estimator is
It is easy to see that the proposed estimator (27) reduces to the regression type estimator of mean in two-phase sampling for
The estimator (27) reduces to the regression type estimator of variance in two-phase sampling for
Now, to derive the bias and MSE of (27), we write the estimator (27), using error notations, as
Now, expanding the power series and retaining only the linear terms, we have
or
Applying expectation on (28) and using (7), the bias of (27) is
Again, squaring (29) and retaining only the terms whose powers add up to 2, we have
Applying expectation, and using (7), the mean square error of (29) is
The optimum values of α and β which minimize (30) are the same as given in (13) and (14). The minimum mean square error is obtained by using the optimum values of α and β in (28) and is
where
and
It is to be noted that the minimum MSE, given in (31), reduces to (15) for θ1=0. Further, for
the minimum MSE, given in (31), reduces to the MSE of the two-phase sampling regression estimator of the population mean. Also, for
the minimum MSE, given in (31), reduces to the MSE of the two-phase sampling regression estimator of the population variance; see, for example, [19]. Further, for
the minimum MSE, given in (31), reduces to the MSE of the two-phase sampling estimator of coefficient of variation and is given as
We now propose a new estimator of the general population parameter in two-phase sampling using information on several auxiliary variables.
4.2. Two-phase sampling estimator with several auxiliary variables
The proposed estimator of general population parameter in two-phase sampling with multiple auxiliary variables is
The estimator (33) provides various estimators as special cases for specific choices of the parameters involved. For example, if
then we have a regression type estimator of the population mean for two-phase sampling with multiple auxiliary variables. Again, if
then we have a regression type estimator of the population variance in two-phase sampling with multiple auxiliary variables. Further, if
then we have a two-phase sampling estimator of the coefficient of variation with multiple auxiliary variables. Now, to derive the bias and MSE of the proposed two-phase sampling estimator, we write it as
Expanding the powers and retaining only the linear terms, we have
or
Applying expectations, and using (8), the bias of the proposed estimator is
Again, squaring (34), applying expectation and using (8), the MSE of (35) is
The optimum values of α(2) and β(2) which minimize (36) are the same as those given in (24) and (25). Using these optimum values in (36), the minimum MSE is
The mean square error of specific cases of (33) can be easily obtained from (37) by using the specific values of the parameters.
5. Comparison of the proposed estimators
In this section, we compare the proposed estimators with some existing estimators. The comparison is given for the case of a single auxiliary variable; the case of multiple auxiliary variables is analogous.
We will first give a comparison of the proposed estimators with the general estimator of population parameter suggested by [22]. The estimator is
with mean square error
where
and
A close comparison of (39) with (15) indicates that the MSEs of the two estimators are equal. It is interesting to note that our proposed estimator (9) is much simpler to apply than (38). We now compare the estimators of specific population parameters.
5.1. Comparison with estimators of the population mean
In the following, we will give a comparison of estimators for estimation of the mean. We know that the proposed estimator reduces to the estimator of mean for (a,b,α,β)=(1,0,αopt,βopt) and is given as
The MSE of the above estimator is given in (16) and can also be written as
or
where Var(ˉy)=θˉY2C2y is the variance of the mean-per-unit estimator. From the above, we can see that the proposed estimator of the mean is always more efficient than the mean-per-unit estimator. Again, the MSE of the proposed estimator of the mean can be written as
where
is the variance of the classical regression estimator of the mean. It is clear that the proposed estimator will be more efficient than the classical regression estimator of the mean if ρyx ⩾ φ12/φ03. Since the MSE of the estimators of the mean proposed by [23] is the same as the MSE of the classical regression estimator, the proposed estimator of the mean, (40), is also more efficient than the estimator proposed by [23] if ρyx ⩾ φ12/φ03.
Further, the estimators proposed by [24,25] are less efficient than the classical regression estimator; therefore, they are less efficient than the proposed estimator of the mean, given in (40).
5.2. Comparison with estimators of the population variance
It is easy to see that the proposed estimator reduces to the regression type estimator of variance for
and is given as
The MSE of the above estimator is given in (17) and can also be written as
or
where
is the MSE of the classical estimator of the variance. The expression of the MSE, (44), is the same as the expression of the MSE of the variance estimator proposed by [14], but the construction of our proposed estimator of the variance, (42), is much simpler than that of the variance estimator given by [14]. Further, it is easy to show that our proposed estimator, (42), is more efficient than the classical estimator of variance, s2y, and than the estimator proposed by [13].
We now compare our proposed estimator of the variance with the estimators proposed by [18,19]. For this, we first note that the MSE of the estimators proposed by [18,19] is the same and is given as
Now, our proposed estimator of variance will be more efficient than the estimators proposed by [18,19], if
6. Numerical study
In this section, we present a numerical study of specific cases of the proposed estimator of the general population parameter. The study has been conducted in two ways: through simulation and through applications to real populations. These are given in the following sub-sections.
6.1. Simulation
In this section, the proposed estimator is compared with some existing estimators through simulation. The simulation uses some popular single- and two-phase sampling estimators of the mean and the variance; for estimators whose two-phase versions are not available in the literature, we have constructed them. The estimators used in the simulation, in addition to the classical ratio and regression estimators of the mean, are given in Tables 1 and 2 below. The simulation algorithm for single-phase sampling is as follows:
1) Generate an artificial population of size 5000 from a bivariate normal distribution N2(60, 45, 5², 4², ρ) by using different values of the correlation coefficient.
2) Generate random samples of sizes 50, 100, 200 and 500 from the generated population.
3) Compute different estimators by using the generated samples.
4) Repeat steps 2 and 3 20,000 times for each sample size.
5) Compute the mean square error of each estimator of the mean and the variance at each sample size by using
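The single-phase algorithm above can be sketched as follows. This is a minimal Monte Carlo illustration with one value of ρ, one sample size and fewer replications than the study; the classical ratio and regression estimators stand in for the full estimator set, and the distribution parameters are read as means 60 and 45 with variances 5² and 4²:

```python
import numpy as np

rng = np.random.default_rng(5)
rho, N, n, reps = 0.8, 5000, 100, 2000

# Step 1: population from a bivariate normal with sds 5 and 4
cov = [[25.0, rho * 5.0 * 4.0], [rho * 5.0 * 4.0, 16.0]]
pop = rng.multivariate_normal([60.0, 45.0], cov, size=N)
Y, X = pop[:, 0], pop[:, 1]
Ybar, Xbar = Y.mean(), X.mean()

est_ratio, est_reg = np.empty(reps), np.empty(reps)
for r in range(reps):  # steps 2-4
    idx = rng.choice(N, size=n, replace=False)
    ybar, xbar = Y[idx].mean(), X[idx].mean()
    b = np.cov(X[idx], Y[idx])[0, 1] / X[idx].var(ddof=1)
    est_ratio[r] = ybar * Xbar / xbar      # classical ratio estimator
    est_reg[r] = ybar + b * (Xbar - xbar)  # classical regression estimator

# Step 5: simulated MSE of each estimator of the mean
mse_ratio = np.mean((est_ratio - Ybar) ** 2)
mse_reg = np.mean((est_reg - Ybar) ** 2)
assert mse_reg <= mse_ratio  # regression should gain under high correlation
```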
In the above tables, ˉy(2) and s2y(2) are the second-phase sample mean and variance of the study variable. Similar notations hold for the auxiliary variable.
The simulation algorithm for two-phase sampling is as below:
1) Generate an artificial population of size 5000 from a bivariate normal distribution N2(60, 45, 5², 4², ρ) by using different values of the correlation coefficient.
2) Generate first-phase random samples of sizes 500 and 1000 from the generated population.
3) Generate second phase random samples of sizes 5%, 10% and 20% of the first phase sample.
4) Compute different estimators by using the second phase sample mean of Y, first and second phase sample means of auxiliary variable X and some population measures of auxiliary variable X.
5) Repeat steps 2–4 20,000 times for each combination of first- and second-phase sample sizes.
6) Compute the bias and mean square error of each estimator at the different sample sizes, as given in step 5 of the single-phase case above.
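Similarly, the two-phase algorithm can be sketched as below, again as a minimal illustration with one size combination and fewer replications; the two-phase regression estimator stands in for the full estimator set:

```python
import numpy as np

rng = np.random.default_rng(6)
rho, N, n1, reps = 0.8, 5000, 500, 2000
n2 = n1 // 10  # second-phase sample: 10% of the first-phase sample

cov = [[25.0, rho * 20.0], [rho * 20.0, 16.0]]
pop = rng.multivariate_normal([60.0, 45.0], cov, size=N)
Y, X = pop[:, 0], pop[:, 1]
Ybar = Y.mean()

est = np.empty(reps)
for r in range(reps):  # steps 2-4
    idx1 = rng.choice(N, size=n1, replace=False)     # first phase: X only
    idx2 = rng.choice(idx1, size=n2, replace=False)  # second phase: X and Y
    b = np.cov(X[idx2], Y[idx2])[0, 1] / X[idx2].var(ddof=1)
    est[r] = Y[idx2].mean() + b * (X[idx1].mean() - X[idx2].mean())

mse = np.mean((est - Ybar) ** 2)  # step 6 (the bias is computed analogously)

# The two-phase regression estimator should beat the mean-per-unit estimator
per_unit = np.array([Y[rng.choice(N, size=n2, replace=False)].mean()
                     for _ in range(reps)])
assert mse < np.mean((per_unit - Ybar) ** 2)
```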
The results of the simulation study are given in Tables 3–6 below.
We can see from the above tables that our proposed estimators of the mean and the variance outperform the other competing estimators. The results also indicate that the mean square error of every estimator decreases as the sample size increases.
The graphs of the relative efficiency of various estimators of the mean and the variance, relative to the ratio estimators of the mean and the variance, are given in Figures 1 and 2 below. The graphs show that our proposed estimators of the mean and the variance have the best efficiency among the competing estimators. We can also see from the figures that the estimator proposed by [25] is the worst estimator of the population mean; it is even worse than the ratio estimator. The estimator of the mean derived by [22] is better than some of the estimators used in the study, but it still performs worse than the classical regression estimator of the mean and the estimator proposed by [24]. Similar conclusions can be drawn for the estimators of the variance: our derived estimator of the variance outperforms all other estimators used in the study, and the relative efficiencies show that all of the estimators considered perform better than the classical ratio estimator of variance proposed by [9].
6.2. Empirical study using real populations
In this section, we conduct an empirical study of some popular estimators of the mean and the variance by using some real populations. Five populations are used: the first three are taken from [27], and the last two from [28]. Summary measures of the populations are given in Table 7 below.
The empirical study has been conducted by using a 25% sample from each population. We have used six estimators of the mean and five estimators of the variance. The estimators of the mean are those given in Table 1 above, excluding the estimator by [22], since its mean square error equals that of our proposed estimator. The estimators of the variance are the classical ratio and regression estimators by [9], the estimators by [12] and [13], and our derived estimator of the variance given in Table 2 above. The mean square error of each estimator is computed for each population, and the results are given in Tables 8 and 9 below.
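For the mean, the analytical MSE comparison underlying such tables can be sketched with the textbook formulas; the summary measures below are hypothetical placeholders, not the values from Table 7:

```python
# Hypothetical summary measures for one population (placeholders only)
N, Ybar, Cy, Cx, rho = 400, 52.0, 0.35, 0.40, 0.75
n = int(0.25 * N)          # a 25% sample, as in the empirical study
theta = 1.0 / n - 1.0 / N  # finite-population factor

# Textbook analytical MSEs of the ratio and regression estimators of the mean
mse_ratio = theta * Ybar**2 * (Cy**2 + Cx**2 - 2 * rho * Cy * Cx)
mse_reg = theta * Ybar**2 * Cy**2 * (1 - rho**2)

# The regression estimator never loses here: the difference equals
# theta * Ybar**2 * (Cx - rho * Cy)**2 >= 0
assert mse_reg <= mse_ratio
```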
From the above tables, we can see that our proposed estimators of the mean and the variance perform better than all of the competing estimators. We can also see that the estimator of the mean proposed by [25] and the estimator of the variance proposed by [13] are the worst estimators; their performance improves only where the population variance of the study variable is much smaller than that of the auxiliary variable.
7. Conclusions
In this paper, we have proposed some estimators of general population parameters for single- and two-phase sampling, using information on a single auxiliary variable and on several auxiliary variables. The proposed estimators can be used to obtain estimators of the population mean, the population variance and the population coefficient of variation, and expressions for their mean square errors have been obtained for both single- and two-phase sampling. The proposed estimators have smaller mean square error than several existing estimators. An extensive simulation study, covering single- and two-phase sampling and several available estimators, shows that our proposed estimators of the mean and the variance perform better than the competing estimators and that the simulated mean square errors of all estimators decrease as the sample size increases. An empirical study using some real populations, conducted by computing the analytical mean square error of each estimator, confirms these findings. We therefore recommend the proposed estimators as better choices for estimating the population mean and the population variance than the existing estimators.
Conflict of interest
The authors declare no conflicts of interest.