1. Introduction
The efficient estimation of population parameters has long been a challenging task in statistics. Sampling methods have played an important role in developing estimators for various situations. Simple random sampling is a useful method for drawing a sample from a population that is homogeneous with respect to the characteristics under study. The estimation of the population mean or total is of interest in many situations. For example, we may be interested in estimating the average or total yearly household income in a locality. The estimate can be improved by using one or more auxiliary variables that are highly correlated with the variable of interest. A classical method of estimation in such a situation is the ratio method of estimation, proposed in [1]. The method has been modified from time to time to improve efficiency by reducing the mean square error (MSE). A modification of the ratio estimator was proposed in [2], whereas an extensive numerical study of the ratio estimator was performed by the authors of [3]. Further modifications of the ratio estimator were proposed in [4,5,6]. The product method of estimation has also been used when the variable of interest and the auxiliary variable are negatively correlated. Ratio and product estimators have also been combined to obtain more efficient estimators. The estimator proposed in [7] combines ratio and product estimators and has a smaller MSE than the classical ratio estimator. Another ratio-type estimator was proposed by the authors of [8], who used the ideas presented in [4,5,6].
In recent years, some work has been done on families of estimators. Families of estimators were proposed in [9,10,11]; these families include several other estimators as special cases. Ratio and ratio-type estimators have also been developed using measures of the auxiliary variable other than the mean; notable references in this regard are [12,13,14]. The use of a ratio estimator in ranked set sampling was proposed in [15].
In some situations, the population information on the auxiliary variable is unknown, so ratio- or product-type estimators cannot be used. For such situations, a useful sampling technique known as two-phase sampling has been proposed. The method is described in [16] and [17], and some ratio- and regression-type estimators have been introduced by the authors of [18,19,20]. More details on various estimators in single- and two-phase sampling can be found in [21] and [22]. There is always room for more efficient estimators, and in this paper we propose new estimators of the population mean in single- and two-phase sampling. The plan of the paper is as follows.
A brief description of the ratio estimator in single- and two-phase sampling, together with some notation, is given in Section 2. Some existing single- and two-phase sampling estimators are given in Section 3. A new class of estimators for single-phase sampling is proposed in Section 4, together with expressions for its bias and MSE. In Section 5, a two-phase sampling version of the proposed class is presented, along with expressions for its bias and MSE. In Section 6, a numerical study of the proposed estimators is detailed; it comprises a simulation study and applications to some real populations. The conclusions and recommendations are given in Section 7.
2. Methodology and notations
In this section, the methodology and notations are given. Suppose the units of a population of size N are ${U_1}, {U_2}, \ldots , {U_N}$ and the values of the variable of interest are ${Y_1}, {Y_2}, \ldots , {Y_N}$. Suppose further that the population mean $\bar Y = {N^{ - 1}}\sum\nolimits_{i = 1}^N {{Y_i}} $ is to be estimated from a random sample of size n. The conventional single-phase sampling estimator of $\bar Y$ is the sample mean $\bar y = {n^{ - 1}}\sum\nolimits_{i = 1}^n {{y_i}} $. When information on an auxiliary variable X is available, the ratio estimator of the population mean is given as
$$\bar y_R = \bar y\,\frac{\bar X}{\bar x} \qquad (1)$$
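where $\bar X = {N^{ - 1}}\sum\nolimits_{i = 1}^N {{X_i}} $ and $\bar x = {n^{ - 1}}\sum\nolimits_{i = 1}^n {{x_i}} $ are the population and sample means of the auxiliary variable, respectively. The MSE of the ratio estimator, to the first order of approximation, is
$$MSE(\bar y_R) \approx \theta \bar Y^2\left(C_y^2 + C_x^2 - 2\rho C_y C_x\right) \qquad (2)$$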
where $\theta = {n^{ - 1}} - {N^{ - 1}}$, ${C_y}$ and ${C_x}$ are the population coefficients of variation of Y and X, respectively, and $\rho $ is the population correlation coefficient between X and Y. The classical ratio estimator has long been popular.
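As a minimal sketch (not code from the paper), the ratio estimator $\bar y_R = \bar y\bar X/\bar x$ and its standard first-order MSE $\theta\bar Y^2(C_y^2 + C_x^2 - 2\rho C_yC_x)$ can be written as two small Python functions. The function names are hypothetical, and the population quantities are passed in as known inputs rather than estimated.

```python
import statistics


def ratio_estimate(y_sample, x_sample, X_bar):
    """Ratio estimator y_bar * X_bar / x_bar, with X_bar the known population mean of X."""
    return statistics.fmean(y_sample) * X_bar / statistics.fmean(x_sample)


def ratio_mse(n, N, Y_bar, C_y, C_x, rho):
    """First-order MSE: theta * Y_bar^2 * (C_y^2 + C_x^2 - 2 * rho * C_y * C_x)."""
    theta = 1 / n - 1 / N  # finite-population factor n^{-1} - N^{-1}
    return theta * Y_bar**2 * (C_y**2 + C_x**2 - 2 * rho * C_y * C_x)


# Usage with made-up numbers (illustrative only).
print(ratio_estimate([2.0, 4.0], [1.0, 3.0], X_bar=4.0))  # 3 * 4 / 2
print(ratio_mse(n=10, N=100, Y_bar=50.0, C_y=0.2, C_x=0.1, rho=0.9))
```

Comparing the second function with $\theta\bar Y^2 C_y^2$, the variance of $\bar y$, shows that the ratio estimator is more precise than the sample mean (to first order) only when $\rho > C_x/(2C_y)$.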
In some situations, population information on the auxiliary variable is not available, and the ratio estimator (1) cannot be used. The problem can be solved by using two-phase sampling. In two-phase sampling, a first-phase sample of size $n_1$ is drawn from a population of size N, and information on the auxiliary variable is recorded. A sub-sample of size $n_2 < n_1$ is then drawn from the first-phase sample, and information on both the auxiliary variable and the study variable is recorded. The conventional ratio estimator in two-phase sampling is given as
$$\bar y_{CR} = \bar y_2\,\frac{\bar x_1}{\bar x_2} \qquad (3)$$
where ${\bar y_2} = n_2^{ - 1}\sum\nolimits_{i = 1}^{{n_2}} {{y_i}} $ is the second-phase sample mean of the study variable Y, ${\bar x_2} = n_2^{ - 1}\sum\nolimits_{i = 1}^{{n_2}} {{x_i}} $ is the second-phase sample mean of the auxiliary variable X and ${\bar x_1} = n_1^{ - 1}\sum\nolimits_{i = 1}^{{n_1}} {{x_i}} $ is the first-phase sample mean of the auxiliary variable X. The MSE of the two-phase sampling ratio estimator, to the first order of approximation, is
$$MSE(\bar y_{CR}) \approx \bar Y^2\left[\theta_2 C_y^2 + (\theta_2 - \theta_1)\left(C_x^2 - 2\rho C_y C_x\right)\right] \qquad (4)$$
where ${\theta _2} = n_2^{ - 1} - {N^{ - 1}}$ and ${\theta _1} = n_1^{ - 1} - {N^{ - 1}}$. Several modifications of the two-phase sampling ratio estimator have been proposed from time to time; see for example [21].
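A minimal Python sketch of the two-phase ratio estimator $\bar y_2\bar x_1/\bar x_2$ and of its standard first-order MSE is given below. The function names are hypothetical; the first-phase auxiliary values replace the unknown $\bar X$.

```python
import statistics


def two_phase_ratio_estimate(y2, x2, x1):
    """Two-phase ratio estimator y2_bar * x1_bar / x2_bar.

    x1     -- auxiliary values for the whole first-phase sample (size n1)
    y2, x2 -- study and auxiliary values for the second-phase sub-sample (size n2)
    """
    return statistics.fmean(y2) * statistics.fmean(x1) / statistics.fmean(x2)


def two_phase_ratio_mse(n1, n2, N, Y_bar, C_y, C_x, rho):
    """First-order MSE: Y_bar^2 [theta2 C_y^2 + (theta2 - theta1)(C_x^2 - 2 rho C_y C_x)]."""
    theta1 = 1 / n1 - 1 / N  # first-phase factor
    theta2 = 1 / n2 - 1 / N  # second-phase factor
    return Y_bar**2 * (theta2 * C_y**2
                       + (theta2 - theta1) * (C_x**2 - 2 * rho * C_y * C_x))


print(two_phase_ratio_estimate([10.0, 20.0], [2.0, 4.0], [3.0, 3.0, 3.0, 3.0]))
print(two_phase_ratio_mse(n1=100, n2=20, N=1000, Y_bar=10.0, C_y=0.3, C_x=0.2, rho=0.5))
```

Two limiting cases are a useful check: $n_1 = N$ gives $\theta_1 = 0$ and reproduces the single-phase MSE with $n = n_2$, while $n_1 = n_2$ leaves only $\theta_2\bar Y^2 C_y^2$, the variance of $\bar y_2$ alone.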
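The MSE of an estimator in single-phase sampling is usually obtained by writing $\bar y = \bar Y\left( {1 + {e_y}} \right)$ and $\bar x = \bar X\left( {1 + {e_x}} \right)$, where ${e_y}$ and ${e_x}$ are relative errors of estimation such that
$$E(e_y) = E(e_x) = 0, \quad E(e_y^2) = \theta C_y^2, \quad E(e_x^2) = \theta C_x^2, \quad E(e_y e_x) = \theta \rho C_y C_x. \qquad (5)$$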
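The MSE of an estimator in two-phase sampling is usually obtained by writing ${\bar y_2} = \bar Y\left( {1 + {e_{{y_2}}}} \right)$, ${\bar x_2} = \bar X\left( {1 + {e_{{x_2}}}} \right)$ and ${\bar x_1} = \bar X\left( {1 + {e_{{x_1}}}} \right)$, where ${e_{{y_2}}}$, ${e_{{x_2}}}$ and ${e_{{x_1}}}$ are relative errors of estimation such that
$$E(e_{y_2}) = E(e_{x_2}) = E(e_{x_1}) = 0, \quad E(e_{y_2}^2) = \theta_2 C_y^2, \quad E(e_{x_2}^2) = \theta_2 C_x^2, \quad E(e_{x_1}^2) = \theta_1 C_x^2,$$
$$E(e_{y_2} e_{x_2}) = \theta_2 \rho C_y C_x, \quad E(e_{y_2} e_{x_1}) = \theta_1 \rho C_y C_x, \quad E(e_{x_2} e_{x_1}) = \theta_1 C_x^2. \qquad (6)$$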
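Under simple random sampling without replacement, and with $C_x$ computed from the standard deviation with divisor $N - 1$, the moment $E(e_x^2) = \theta C_x^2$ holds exactly. The sketch below checks it by Monte Carlo; the function name, seed and toy population are illustrative assumptions.

```python
import random
import statistics


def error_moment_check(x_pop, n, reps=20000, seed=1):
    """Return (Monte Carlo mean of e_x^2, theta * C_x^2) under SRS without replacement."""
    rng = random.Random(seed)
    N = len(x_pop)
    X_bar = statistics.fmean(x_pop)
    C_x = statistics.stdev(x_pop) / X_bar  # stdev uses the N - 1 divisor
    theta = 1 / n - 1 / N
    total = 0.0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.sample(x_pop, n))  # draw n units without replacement
        e_x = x_bar / X_bar - 1                          # from x_bar = X_bar (1 + e_x)
        total += e_x**2
    return total / reps, theta * C_x**2


mc, theory = error_moment_check([float(v) for v in range(1, 201)], n=20)
print(f"simulated {mc:.5f} vs theta*C_x^2 {theory:.5f}")
```

The same loop, extended to record $e_y$ and the first- and second-phase errors, can be used to check the remaining moments.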
We will now discuss some important single- and two-phase sampling estimators in the following section.
3. Some existing estimators
Several estimators have been proposed from time to time for the estimation of the population mean in single- and two-phase sampling. Some of these are given in the following subsections.
3.1. Some single-phase sampling estimators
Some popular single-phase sampling estimators will now be discussed.
● Sisodia and Dwivedi estimator [2]
MSE:
● Singh estimator [23]
MSE:
where ${\beta _2}$ is the coefficient of kurtosis for the auxiliary variable X.
● Kadilar and Cingi estimator [9]
MSE:
where b is the sample regression coefficient of Y on X.
● Yan and Tian estimator [24]
Estimator:
MSE:
where ${\beta _1}$ is the coefficient of skewness for the auxiliary variable X.
● Singh estimator [25]
Estimator:
MSE:
where ${S_x}$ is the standard deviation of the auxiliary variable X.
● Subramani and Kumarpandiyan estimator [26]
Estimator:
MSE:
where ${M_d}$ is the mean deviation of the auxiliary variable X.
For the above estimators, the quantities ${C_x}, {C_y}, {S_x}, {M_d}, {\beta _1}$ and ${\beta _2}$ are assumed to be known.
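The formulas of the estimators above are not reproduced in this text. The correspondence between $t_i$ and $A_i$ in Section 4 suggests that several of them share the form $\bar y(\bar X + A)/(\bar x + A)$, with $A$ one of $C_x$, $\beta_2$, $\beta_1$, $S_x$ or $M_d$; the sketch below assumes that form and also assumes particular definitions of $\beta_1$, $\beta_2$ and $M_d$ (stated in the comments), so it should not be read as the authors' exact formulas. Writing $m = \bar X/(\bar X + A)$, one has $\bar x + A = (\bar X + A)(1 + m e_x)$, which gives the first-order MSE $\theta\bar Y^2(C_y^2 + m^2C_x^2 - 2m\rho C_yC_x)$.

```python
import statistics


def auxiliary_constants(x_pop):
    """Population constants of X that the ratio-type estimators plug in as A.

    Assumed definitions (not stated in the text): S_x uses the N - 1 divisor,
    beta_1 = mu_3^2 / mu_2^3 (Pearson's skewness measure), beta_2 = mu_4 / mu_2^2,
    and M_d is the mean absolute deviation about the mean.
    """
    N = len(x_pop)
    X_bar = statistics.fmean(x_pop)
    S_x = statistics.stdev(x_pop)
    mu2 = sum((x - X_bar) ** 2 for x in x_pop) / N
    mu3 = sum((x - X_bar) ** 3 for x in x_pop) / N
    mu4 = sum((x - X_bar) ** 4 for x in x_pop) / N
    return {
        "C_x": S_x / X_bar,
        "beta_2": mu4 / mu2**2,
        "beta_1": mu3**2 / mu2**3,
        "S_x": S_x,
        "M_d": sum(abs(x - X_bar) for x in x_pop) / N,
    }


def ratio_type(y_sample, x_sample, X_bar, A):
    """Hypothetical common form y_bar * (X_bar + A) / (x_bar + A); A = 0 gives the ratio estimator."""
    return statistics.fmean(y_sample) * (X_bar + A) / (statistics.fmean(x_sample) + A)


def ratio_type_mse(n, N, Y_bar, X_bar, C_y, C_x, rho, A):
    """First-order MSE of that form: theta Y_bar^2 (C_y^2 + m^2 C_x^2 - 2 m rho C_y C_x)."""
    m = X_bar / (X_bar + A)
    theta = 1 / n - 1 / N
    return theta * Y_bar**2 * (C_y**2 + m**2 * C_x**2 - 2 * m * rho * C_y * C_x)
```

Differentiating the MSE with respect to $m$ shows that it is smallest when $m = \rho C_y/C_x$, which is why the choice of the constant $A$ affects efficiency.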
3.2. Some two-phase sampling estimators
Some popular two-phase sampling estimators will now be discussed.
● Mohanty estimator [27]
Estimator:
MSE:
where Z is another auxiliary variable.
● Hanif, Hammad and Shahbaz estimator [28]
Estimator:
where k is a weighting constant such that $0 < k < 1$.
MSE:
where ${\rho _{xy}}$, ${\rho _{yz}}$ and ${\rho _{xz}}$ are the correlation coefficients between X and Y, between Y and Z, and between X and Z, respectively.
More details about single- and two-phase sampling estimators can be found in [21]. We will now propose a new ratio-type estimator in single-phase sampling.
4. A new class of estimators for the mean in single-phase sampling
The proposed class of estimators for single-phase sampling is
where $A_1 = C_x$, $A_2 = \beta_2$, $A_3 = \beta_1$, $A_4 = S_x$ and $A_5 = M_d$. The proposed class of estimators can also be written as
where $t_1 = \bar y_{SD}$, $t_2 = \bar y_{S04}$, $t_3 = \bar y_{YT}$, $t_4 = \bar y_{S03}$ and $t_5 = \bar y_{SK}$. Thus, the proposed class of estimators is a weighted sum of various ratio-type estimators. The bias and MSE of the proposed class of estimators are derived in the following subsection.
4.1. Bias and MSE
Using the notations from (5), the class of estimators given in (23) can be written as
Expanding the negative powers and retaining the linear terms only, we have
Expanding and applying expectation, the bias of the proposed class of estimators is
Again, expanding (24) and retaining only the linear terms we have
Squaring and applying expectation, the MSE of the proposed class of estimators is
Expanding the square, applying expectation to the individual terms, and ignoring the difference $E\left( b \right) - \beta $, where $\beta $ is the population regression coefficient of Y on X, the MSE of the proposed class of estimators is
The optimum value of $\alpha $ that minimizes the MSE is obtained by differentiating (25) with respect to $\alpha $ and equating the resulting derivative to zero. The optimum value thus obtained is
Substituting the optimum value of $\alpha $ in (25), the minimum MSE of the proposed class of estimators is
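The optimization step follows the usual pattern for an MSE that is quadratic in $\alpha$. Writing such an MSE generically, with placeholder coefficients $P > 0$, $Q$ and $R$ (these are not the paper's own notation), the optimum and the minimum are:

$$
\begin{aligned}
MSE(\alpha) &= \theta\bar Y^2\left(P\alpha^2 - 2Q\alpha + R\right),\\
\frac{d\,MSE(\alpha)}{d\alpha} &= \theta\bar Y^2\left(2P\alpha - 2Q\right) = 0 \;\Longrightarrow\; \alpha_{opt} = \frac{Q}{P},\\
MSE_{\min} &= \theta\bar Y^2\left(R - \frac{Q^2}{P}\right).
\end{aligned}
$$

Since $MSE(\alpha) - MSE_{\min} = \theta\bar Y^2 P(\alpha - Q/P)^2 \ge 0$, the stationary point is a global minimum, and the minimum MSE is never larger than the MSE at any fixed choice of $\alpha$.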
4.2. Comparison of the proposed class of estimators
In this subsection, we compare the MSE of the proposed class of estimators with those of some existing estimators. The proposed class of estimators will be more efficient than a competing estimator if
where t denotes any competing estimator. We first compare the proposed class of estimators with the classical ratio estimator. For this, we have
From the above, it follows that $MSE\left( {{{\bar y}_R}} \right) \geqslant MSE\left( {{{\bar y}_{DS_i^3}}} \right)$; hence the proposed class of estimators is never less precise than the ratio estimator, and it is strictly more precise unless $\rho {C_y} = {C_x}$, in which case equality holds.
Next, we compare the proposed class of estimators with the estimators given in Subsection 3.1. The proposed class of estimators will be more efficient than these estimators if
Now,
and $m_i$ is suitably defined. Comparing the MSE of ${t_i}$ with the MSE of the proposed class of estimators, we have
or
From (28) we can see that the proposed class of estimators will be more efficient than any of the estimators given in Subsection 3.1 if $C_y^2{\rho ^2} + 2\rho {C_y}{C_x} > m_i^2C_x^2$.
Finally, we will compare the proposed class of estimators with the estimator proposed in [9]. For this we have
or
hence the proposed class of estimators will always be more efficient than the estimator proposed in [9].
We will now propose a class of estimators for two-phase sampling in the following section.
5. A new class of estimators for the mean in two-phase sampling
The proposed class of estimators for two-phase sampling is
where $A_i$ is as defined in Section 4, ${\bar x_1}$ and ${\bar x_2}$ are the first- and second-phase sample means of the auxiliary variable X, and ${\bar y_2}$ is the second-phase sample mean of the study variable Y. The proposed class of estimators for two-phase sampling can also be written as
where
represent the two-phase sampling counterparts of the estimators ${\bar y_{SD}}, {\bar y_{S04}}, {\bar y_{YT}}, {\bar y_{S03}}$ and ${\bar y_{SK}}$. Also, ${\bar y_{KC\left( 2 \right)}}$ is the two-phase sampling counterpart of the estimator proposed in [9]; it is given as
From above we can see that the proposed class of estimators is a weighted sum of various ratio-type estimators in two-phase sampling. The bias and MSE of the proposed class of estimators are derived in the following subsections.
5.1. Bias and MSE
Using the notations from (6), the class of estimators given by (30) can be written as
Expanding the negative powers and retaining the linear terms only, we have
Expanding (31) and applying expectation, the bias of the proposed class of estimators for two-phase sampling is
Again, expanding (31) and retaining only the linear terms, we have
Squaring and applying expectation, the MSE of the proposed class of estimators is
Expanding the square and applying expectation to individual terms, the MSE of the proposed class of estimators is
The optimum value of $\alpha $ that minimizes the mean square error is obtained by differentiating (32) with respect to $\alpha $ and equating the resulting derivative to zero. The optimum value thus obtained is
The minimum MSE of the class of two-phase sampling estimators is obtained by substituting the optimum value of $\alpha $ from (33) into (32), as follows:
where $\Delta = C_x^2{\left( {1 - {m_i}} \right)^2} + 2\rho {C_y}{C_x}\left( {1 + {m_i}} \right) - {\rho ^2}C_y^2$.
6. Numerical study
In this section, we present a numerical study of the proposed estimators. The study comprises simulations for the single- and two-phase sampling estimators and an application of the proposed single-phase sampling estimator to some real populations. We first present the simulation study in the following subsection.
6.1. Simulation study
Here, we present the simulation study for the proposed single- and two-phase sampling estimators. The algorithm for single-phase sampling is given below.
1) Generate an artificial population of size 5000 from a bivariate normal distribution ${N_2}\left( {65, 50, {6^2}, {5^2}, \rho } \right)$ by using different values of the correlation coefficient.
2) Generate random samples of sizes 50, 100, 200 and 500 from the generated population.
3) Compute different estimators by using the generated samples.
4) Repeat Steps 2 and 3 a total of 20,000 times for each sample size.
5) Compute the bias and MSE of each estimator for different sample sizes by using
where i = R, SD, S04, KC, YT, S03, SK, $DS_1^3$, $DS_2^3$, $DS_3^3$, $DS_4^3$ and $DS_5^3$.
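The steps above can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the component with mean 65 and standard deviation 6 is taken as Y, samples are drawn without replacement, bias and MSE are taken as the usual Monte Carlo averages of $(\hat t - \bar Y)$ and $(\hat t - \bar Y)^2$, and only the sample mean and the classical ratio estimator are included.

```python
import numpy as np


def simulate_single_phase(rho, n_values=(50, 100, 200, 500), N=5000, reps=20000, seed=0):
    """Monte Carlo bias and MSE of the sample mean and the classical ratio estimator.

    Assumed: Y has mean 65 and sd 6, X has mean 50 and sd 5; samples are drawn
    without replacement; bias = mean(est - Y_bar), MSE = mean((est - Y_bar)^2).
    """
    rng = np.random.default_rng(seed)
    cov = [[6.0**2, rho * 6.0 * 5.0], [rho * 6.0 * 5.0, 5.0**2]]
    pop = rng.multivariate_normal([65.0, 50.0], cov, size=N)  # step 1: population
    Y, X = pop[:, 0], pop[:, 1]
    Y_bar, X_bar = Y.mean(), X.mean()

    results = {}
    for n in n_values:
        est = {"mean": np.empty(reps), "R": np.empty(reps)}
        for r in range(reps):                                  # step 4: repeat
            idx = rng.choice(N, size=n, replace=False)         # step 2: draw a sample
            y_bar, x_bar = Y[idx].mean(), X[idx].mean()        # step 3: estimators
            est["mean"][r] = y_bar
            est["R"][r] = y_bar * X_bar / x_bar
        # step 5: Monte Carlo (bias, MSE) for each estimator
        results[n] = {k: (v.mean() - Y_bar, np.mean((v - Y_bar) ** 2))
                      for k, v in est.items()}
    return results


# Fewer replications than the 20,000 used in the study, to keep the run short.
res = simulate_single_phase(rho=0.8, n_values=(50, 500), reps=2000)
for n, row in res.items():
    # relative efficiency (%) of R with respect to the sample mean (assumed convention)
    print(n, row["R"], 100 * row["mean"][1] / row["R"][1])
```

Other estimators would be added as extra entries of `est`. A relative efficiency such as $100 \times MSE(\bar y_R)/MSE(\bar y_i)$, a common convention that is assumed here, can then be computed from the returned MSE values.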
Table 1 contains the bias of various estimators and Table 2 contains the MSE of various estimators used in the simulation study.
From Table 1, we can see that the bias is negative for all sample sizes and all values of the correlation coefficient. From Table 2, we can see that all members of the proposed class of estimators are more efficient than the other estimators used in the study, and that the MSE of every estimator decreases as the sample size increases. The relative efficiency of the various estimators with respect to the classical ratio estimator was also computed; the results are plotted in Figure 1.
From Figure 1, we can see that all members of the proposed class of estimators are more efficient than the ratio estimator.
We also conducted the simulation study for the proposed class of estimators for two-phase sampling. The algorithm for the two-phase sampling simulation is given below.
1) Generate an artificial population of size 5000 from a bivariate normal distribution ${N_2}\left( {65, 50, {6^2}, {5^2}, \rho } \right)$ by using different values of the correlation coefficient.
2) Generate first-phase random samples of sizes 500 and 1000 from the generated population.
3) Generate second-phase random samples with sizes 5%, 10% and 20% of the first-phase sample.
4) Compute different estimators by using the second-phase sample mean of Y, first- and second-phase sample means of the auxiliary variable X and some population measures for the auxiliary variable X.
5) Repeat Steps 2–4 a total of 20,000 times for each combination of first- and second-phase sample sizes.
6) Compute the bias and MSE of each estimator for different sample sizes by using
where i = CR, $DS_1^3\left( 2 \right)$, $DS_2^3\left( 2 \right)$, $DS_3^3\left( 2 \right)$, $DS_4^3\left( 2 \right)$ and $DS_5^3\left( 2 \right)$.
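A corresponding sketch for the two-phase algorithm is shown below, again under assumptions that the text does not state: Y is the component with mean 65, both phases use simple random sampling without replacement, the second-phase size is rounded from the stated percentage, and only the classical two-phase ratio estimator (CR) is computed. The parameter name `frac` is hypothetical.

```python
import numpy as np


def simulate_two_phase(rho, n1, frac, N=5000, reps=20000, seed=0):
    """Monte Carlo (bias, MSE) of the two-phase ratio estimator y2_bar * x1_bar / x2_bar."""
    rng = np.random.default_rng(seed)
    cov = [[36.0, rho * 30.0], [rho * 30.0, 25.0]]           # sd 6 for Y, sd 5 for X
    pop = rng.multivariate_normal([65.0, 50.0], cov, size=N)  # step 1: population
    Y, X = pop[:, 0], pop[:, 1]
    Y_bar = Y.mean()
    n2 = int(round(frac * n1))                                # e.g. 5%, 10% or 20% of n1
    est = np.empty(reps)
    for r in range(reps):                                     # step 5: repeat
        first = rng.choice(N, size=n1, replace=False)         # step 2: first phase
        second = rng.choice(first, size=n2, replace=False)    # step 3: sub-sample
        x1_bar = X[first].mean()                              # X observed in phase 1
        y2_bar, x2_bar = Y[second].mean(), X[second].mean()   # Y, X observed in phase 2
        est[r] = y2_bar * x1_bar / x2_bar                     # step 4 (CR only)
    return est.mean() - Y_bar, np.mean((est - Y_bar) ** 2)    # step 6


res_small = simulate_two_phase(rho=0.8, n1=500, frac=0.05, reps=1000)
res_large = simulate_two_phase(rho=0.8, n1=500, frac=0.20, reps=1000)
print(res_small, res_large)
```

With the same seed, both calls use the same artificial population, so the difference in MSE reflects only the second-phase sample size.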
Table 3 contains the bias of various estimators whereas Table 4 contains the MSE of various estimators.
Table 3 shows that the bias of the various estimators fluctuates and is relatively large.
From Table 4, we can see that all members of the proposed class of estimators are more efficient than the classical two-phase sampling ratio estimator.
We also computed the relative efficiency of the various two-phase sampling estimators with respect to the two-phase sampling ratio estimator. The relative efficiencies are plotted for various combinations of $\rho $, $n_1$ and $n_2$ in Figure 2.
From Figure 2, we can see that the estimator ${\bar y_{DS_4^3\left( 2 \right)}}$ is the most efficient member of the new class. The member ${\bar y_{DS_3^3\left( 2 \right)}}$ is the least efficient in the class, but all members are more efficient than the ratio estimator. We now present a numerical study of the different estimators using some real populations.
6.2. Study using data from some real populations
In this subsection, we present a numerical comparison of various estimators using some real populations. A description of the populations used in the study, along with some measures computed from them, is given in Table 5.
We computed the MSE of various estimators by using the data on the above-mentioned populations. The results are shown in Table 6 below.
From Table 6, we can see that the proposed estimator outperforms all of the other estimators in the study, as it has the smallest MSE. We also computed the relative efficiency of the proposed estimator with respect to the various estimators. The MSE and relative efficiencies are shown in Figure 3.
Figure 3 confirms the results given in Table 6.
7. Conclusions
In this paper, we have proposed two new families of estimators of the population mean, one for single-phase and one for two-phase sampling. Expressions for the bias and MSE of the proposed families of estimators have been obtained. The proposed family of estimators in single-phase sampling is more efficient than the other estimators with which it was compared; this has been shown through analytical and empirical comparisons, and the results of the simulation study support it. We can thus conclude that the proposed single-phase sampling estimator estimates the population mean with greater precision. We have also found that the proposed family of estimators in two-phase sampling is more efficient than the classical ratio estimator for two-phase sampling; this conclusion is based on the results of an extensive simulation study of the two-phase family of estimators. We can, therefore, conclude that the proposed families of estimators provide more efficient estimates of the population mean in single- and two-phase sampling when information on a single auxiliary variable is available.
Conflict of interest
The authors declare no conflict of interest.