



The generalized linear model (GLM), a generalization of the linear model with wide applications in many research areas, was proposed by Nelder and Wedderburn [1] in 1972 for discrete dependent variables, which cannot be handled by the ordinary linear regression model. The GLM allows the response variable to follow nonnormal distributions, including the binomial, Poisson, gamma, and inverse Gaussian distributions, whose means are linked to the predictors through a link function.

Nowadays, with the rapid development of science and technology, massive data are ubiquitous in many fields, including medicine, industry, and economics. Extracting effective information from massive data is the core challenge of big data analysis, yet the limited computing power available means that analyzing the full data can be very time-consuming. To deal with this challenge, parallel and distributed computing are commonly used, and subsampling techniques have emerged as an alternative: a small number of representative samples are extracted from the massive data. Imberg et al. [2] proposed a theory of optimal design for general data subsampling problems, which includes and extends most existing methods, works out optimality conditions, offers algorithms for finding optimal subsampling designs, and introduces a new class of invariant linear optimality criteria. Chao et al. [3] presented an optimal subsampling approach for modal regression with big data. The estimators are obtained by means of a two-step algorithm based on the modal expectation maximization when the bandwidth is not related to the subsample size.

There has been a great deal of research on subsampling algorithms for specific models. Wang et al. [4] devised a rapid subsampling algorithm to approximate the maximum likelihood estimators in the context of logistic regression. Based on the previous study, Wang [5] presented an enhanced estimation method for logistic regression, which has a higher estimation efficiency. In the case that data are usually distributed in multiple distributed sites for storage, Zuo et al. [6] developed a distributed subsampling procedure to effectively approximate the maximum likelihood estimators of logistic regression. Ai et al. [7] focused on the optimal subsampling method under the A-optimality criterion based on the method developed by Wang et al. [4] for generalized linear models to quickly approximate maximum likelihood estimators from massive data. Yao and Wang [8] examined optimal subsampling methods for various models, including logistic regression models, softmax regression models, generalized linear models, quantile regression models, and quasi-likelihood estimators. Yu et al. [9] proposed an efficient subsampling procedure for online data streams with a multinomial logistic model. Yu et al. [10] studied the subsampling technique for the Akaike information criterion (AIC) and the smoothed AIC model-averaging framework for generalized linear models. Yu et al. [11] reviewed several subsampling methods for massive datasets from the viewpoint of statistical design.

To the best of our knowledge, all the existing methods above assume that the covariates are fully observable. However, in practice, this assumption is not realistic, and covariates may be inaccurately observed owing to measurement errors, which will lead to biases in the estimators of the regression coefficients. This means that we may incorrectly determine some unimportant variables as significant, which in turn affects model selection and interpretation. Therefore, it is necessary to consider measurement errors. Liang et al. [12], Li and Xue [13], and Liang and Li [14] investigated partial linear measurement error models. Stefanski [15] and Nakamura [16] obtained corrected score functions for the GLM, such as linear regression, gamma regression, inverse gamma regression, and Poisson regression. Yang et al. [17] proposed an empirical likelihood method based on the moment identity of the corrected score function to perform statistical inference for a class of generalized linear measurement error models. Fuller [18] estimated the errors-in-variables model using the maximum likelihood method and studied statistical inference. Hu and Cui [19] proposed a corrected error variance method to accurately estimate the error variance, which can effectively reduce the influence of measurement error and false correlation at the same time. Carroll et al. [20] summarized the measurement errors in linear regression and described some simple and universally applicable measurement error analysis methods. Yi et al. [21] presented a regression calibration method, which is one of the first statistical methods introduced to address measurement errors in the covariates. In addition, they presented an overview of the conditional score and corrected score approaches for measurement error correction. Extensive research has been carried out on measurement errors arising in different practical situations, and a variety of methods have been proposed; see [22,23,24,25]. Recently, a class of variable selection procedures has been developed for measurement error models, see [26,27]. More recently, Ju et al. [28] studied the optimal subsampling algorithm and the random perturbation subsampling algorithm for big data linear models with measurement errors. The aim of this paper is to estimate the parameters using a subsampling algorithm for a class of generalized linear measurement error models in massive data analysis.

    In this paper, we study a class of the GLM with measurement errors, such as logistic regression models and Poisson regression models. We combine the corrected score function method with subsampling techniques to investigate subsampling algorithms. The consistency and asymptotic normality of the estimators obtained in the general subsampling algorithm are derived. We optimize the subsampling probabilities based on the design of A-optimality and L-optimality criteria and incorporate a truncation method in the optimal subsampling probabilities to obtain the optimal estimators. In addition, we develop an adaptive two-step algorithm and obtain the consistency and asymptotic normality of the final subsampling estimators. Finally, the effectiveness of the proposed method is demonstrated through numerical simulations and real data analysis.

    The remainder of this paper is organized as follows: Section 2 introduces the corrected score function under different distributions and derives the general subsampling algorithm and the adaptive two-step algorithm. Sections 3 and 4 verify the effectiveness of the proposed method by generating simulated experimental data and two real data sets, respectively. Section 5 provides conclusions.

    In the GLM, it is assumed that the conditional distribution of the response variable belongs to the exponential family

f(y;\theta) = \exp\left\{\frac{\theta y - b(\theta)}{a(\phi)} + c(y,\phi)\right\},

where $a(\cdot)$, $b(\cdot)$, $c(\cdot,\cdot)$ are known functions, $\theta$ is called the natural parameter, and $\phi$ is called the dispersion parameter.

Let $\{(X_i,Y_i)\}_{i=1}^N$ be independent and identically distributed random samples, $\mu_i=E(Y_i\mid X_i)$, $V(\mu_i)=\mathrm{Var}(Y_i\mid X_i)$, where the covariate $X_i\in\mathbb{R}^p$, the response variable $Y_i\in\mathbb{R}$, and $V(\cdot)$ is a known variance function. The conditional expectation of $Y_i$ given $X_i$ satisfies

\begin{equation} g(\mu_i) = X_i^{\rm T}\beta, \end{equation} (2.1)

where $g(\cdot)$ is the canonical link function, and $\beta=(\beta_1,\ldots,\beta_p)^{\rm T}$ is a $p$-dimensional unknown regression parameter.

In practice, covariates are not always accurately observed, and there are measurement errors that cannot be ignored. Let $W_i$ be the observed, error-prone surrogate of the covariate $X_i$. Assume the additive measurement error model

\begin{equation} W_i = X_i + U_i, \end{equation} (2.2)

where $U_i\sim N_p({\bf 0},\Sigma_u)$ is independent of $(X_i,Y_i)$. Combining (2.1) and (2.2) yields a generalized linear model with measurement errors.

Define the log-likelihood function as $\ell(\beta;Y)=\sum_{i=1}^N \log f(Y_i;\beta)$. If $X_i$ is observable, the score function for $\beta$ in (2.1) is

\sum\limits_{i=1}^N \eta_i(\beta;X_i,Y_i) = \sum\limits_{i=1}^N \frac{\partial \ell(\beta;Y_i)}{\partial \beta} = \sum\limits_{i=1}^N \frac{Y_i-\mu_i}{V(\mu_i)}\frac{\partial \mu_i}{\partial \beta},

and it satisfies $E[\eta_i(\beta;X_i,Y_i)\mid X_i]=0$. However, when $X_i$ is measured with error, directly replacing $X_i$ with $W_i$ in $\eta_i(\beta;X_i,Y_i)$ causes a bias, i.e., $E[\eta_i(\beta;W_i,Y_i)\mid X_i]=0$ will not always hold, hence a correction is needed. Following the idea of [16], we define an unbiased score function $\eta_i^*(\Sigma_u,\beta;W_i,Y_i)$ for $\beta$ satisfying $E[\eta_i^*(\Sigma_u,\beta;W_i,Y_i)\mid X_i]=0$. The maximum likelihood estimator $\hat{\beta}_{\text{MLE}}$ of $\beta$ is the solution of the estimating equation

\begin{equation} Q(\beta) := \sum\limits_{i=1}^N \eta_i^*(\Sigma_u,\beta;W_i,Y_i) = {\bf 0}. \end{equation} (2.3)

Based on the following moment identities associated with the error model (2.2),

\begin{aligned}
&E(W_i\mid X_i) = X_i,\\
&E(W_iW_i^{\rm T}\mid X_i) = X_iX_i^{\rm T}+\Sigma_u,\\
&E\left(\exp(W_i^{\rm T}\beta)\mid X_i\right) = \exp\left(X_i^{\rm T}\beta+\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right),\\
&E\left[W_i\exp(W_i^{\rm T}\beta)\mid X_i\right] = (X_i+\Sigma_u\beta)\exp\left(X_i^{\rm T}\beta+\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right),\\
&E\left[W_i\exp(-W_i^{\rm T}\beta)\mid X_i\right] = (X_i-\Sigma_u\beta)\exp\left[-X_i^{\rm T}\beta+\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right],\\
&E\left[W_i\exp(-2W_i^{\rm T}\beta)\mid X_i\right] = (X_i-2\Sigma_u\beta)\exp\left[-2X_i^{\rm T}\beta+2\beta^{\rm T}\Sigma_u\beta\right],
\end{aligned}

    then we can construct the unbiased score function for binary logistic measurement error regression models and Poisson measurement error regression models, which are widely used in practice.

    (1) Binary logistic measurement error regression models.

    We consider the logistic measurement error regression model

\left\{\begin{aligned} &P(Y_i=1\mid X_i) = \frac{1}{1+\exp(-X_i^{\rm T}\beta)},\\ &W_i = X_i+U_i, \end{aligned}\right.

with mean $\mu_i=[1+\exp(-X_i^{\rm T}\beta)]^{-1}$ and variance $\mathrm{Var}(Y_i\mid X_i)=\mu_i(1-\mu_i)$. Following Huang and Wang [29], the corrected score function is

\eta_i^*(\Sigma_u,\beta;W_i,Y_i) = W_iY_i + (W_i+\Sigma_u\beta)\exp\left(-W_i^{\rm T}\beta-\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right)Y_i - W_i,

    and its first-order derivative is

{\bf{\Omega}}_i^*(\Sigma_u,\beta;W_i,Y_i) = \frac{\partial \eta_i^*(\Sigma_u,\beta;W_i,Y_i)}{\partial \beta^{\rm T}} = \left[\Sigma_u-(W_i+\Sigma_u\beta)(W_i+\Sigma_u\beta)^{\rm T}\right]\exp\left(-W_i^{\rm T}\beta-\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right)Y_i.

    (2) Poisson measurement error regression models.

Let $Y_i$ follow the Poisson distribution with mean $\mu_i$ and $\mathrm{Var}(Y_i\mid X_i)=\mu_i$. Consider the log-linear measurement error model

\left\{\begin{aligned} &\log(\mu_i) = X_i^{\rm T}\beta,\\ &W_i = X_i+U_i, \end{aligned}\right.

    then we have the corrected score function

\eta_i^*(\Sigma_u,\beta;W_i,Y_i) = W_iY_i - (W_i-\Sigma_u\beta)\exp\left(W_i^{\rm T}\beta-\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right),

    and its first-order derivative is

{\bf{\Omega}}_i^*(\Sigma_u,\beta;W_i,Y_i) = \frac{\partial \eta_i^*(\Sigma_u,\beta;W_i,Y_i)}{\partial \beta^{\rm T}} = \left[\Sigma_u-(W_i-\Sigma_u\beta)(W_i-\Sigma_u\beta)^{\rm T}\right]\exp\left(W_i^{\rm T}\beta-\tfrac{1}{2}\beta^{\rm T}\Sigma_u\beta\right).
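For concreteness, the two corrected score functions and their derivatives can be transcribed directly into R; the sketch below is a minimal, illustrative implementation (the function and argument names are ours, not from the paper).

```r
# Corrected score eta*_i and its derivative Omega*_i for a single observation,
# transcribed from the formulas above. w: p-vector surrogate W_i, y: response Y_i,
# beta: p-vector, Sigma_u: p x p measurement error covariance.
eta_logistic <- function(beta, w, y, Sigma_u) {
  e <- as.numeric(exp(-sum(w * beta) - 0.5 * t(beta) %*% Sigma_u %*% beta))
  drop(w * y + (w + Sigma_u %*% beta) * e * y - w)
}
Omega_logistic <- function(beta, w, y, Sigma_u) {
  e <- as.numeric(exp(-sum(w * beta) - 0.5 * t(beta) %*% Sigma_u %*% beta))
  v <- w + drop(Sigma_u %*% beta)
  (Sigma_u - v %*% t(v)) * e * y
}
eta_poisson <- function(beta, w, y, Sigma_u) {
  e <- as.numeric(exp(sum(w * beta) - 0.5 * t(beta) %*% Sigma_u %*% beta))
  drop(w * y - (w - Sigma_u %*% beta) * e)
}
Omega_poisson <- function(beta, w, y, Sigma_u) {
  e <- as.numeric(exp(sum(w * beta) - 0.5 * t(beta) %*% Sigma_u %*% beta))
  v <- w - drop(Sigma_u %*% beta)
  (Sigma_u - v %*% t(v)) * e
}
```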

It is assumed that $\pi_i$ is the probability of sampling the $i$-th sample $(W_i,Y_i)$, $i=1,\ldots,N$. Let $S$ be the set of subsampled points $(\tilde{W}_i,\tilde{Y}_i)$ with corresponding sampling probabilities $\tilde{\pi}_i$, i.e., $S=\{(\tilde{W}_i,\tilde{Y}_i,\tilde{\pi}_i)\}$ with subsample size $r$. The general subsampling algorithm is shown in Algorithm 1.

Algorithm 1 General subsampling algorithm.
Step 1. Given the subsampling probabilities $\pi_i$, $i=1,\ldots,N$, of all data points.
Step 2. Perform repeated sampling with replacement $r$ times to form the subsample set $S=\{(\tilde{W}_i,\tilde{Y}_i,\tilde{\pi}_i)\}$, where $\tilde{W}_i$, $\tilde{Y}_i$ and $\tilde{\pi}_i$ represent the covariate, response variable and subsampling probability of a point in the subsample, respectively.
Step 3. Based on the subsample set $S$, solve the weighted estimating equation to obtain $\overset{\smile}{\beta}$, where
\begin{equation} Q^{*}(\beta) := \frac{1}{r}\sum\limits_{i=1}^r \frac{1}{\tilde{\pi}_i}\tilde{\eta}_i^*(\Sigma_u,\beta;\tilde{W}_i,\tilde{Y}_i) = {\bf 0}, \end{equation} (2.4)
where $\tilde{\eta}_i^*(\Sigma_u,\beta;\tilde{W}_i,\tilde{Y}_i)$ is the unbiased score function of the $i$-th point in the subsample and $\tilde{\bf{\Omega}}_i^*(\Sigma_u,\beta;\tilde{W}_i,\tilde{Y}_i)$ is its first-order derivative.
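A minimal R sketch of Algorithm 1 might look as follows; it reuses the per-observation functions from the previous snippet and solves Eq (2.4) by a plain Newton iteration (the common factor 1/r is omitted since it does not change the root). All names and the solver choice are illustrative assumptions, not part of the paper.

```r
subsample_estimate <- function(W, Y, pi_all, r, Sigma_u, eta_fun, Omega_fun,
                               beta0, max_iter = 50, tol = 1e-8) {
  N <- nrow(W)
  # Step 2: draw r indices with replacement using the given probabilities.
  idx <- sample(N, r, replace = TRUE, prob = pi_all)
  Wi <- W[idx, , drop = FALSE]; Yi <- Y[idx]; pii <- pi_all[idx]
  beta <- beta0
  # Step 3: Newton iterations for the weighted estimating equation Q*(beta) = 0.
  for (it in seq_len(max_iter)) {
    Q <- 0; H <- 0
    for (i in seq_len(r)) {
      Q <- Q + eta_fun(beta, Wi[i, ], Yi[i], Sigma_u) / pii[i]
      H <- H + Omega_fun(beta, Wi[i, ], Yi[i], Sigma_u) / pii[i]
    }
    step <- drop(solve(H, Q))
    beta <- beta - step
    if (sqrt(sum(step^2)) < tol) break
  }
  list(beta = beta, idx = idx)
}
```

With pi_all = rep(1/N, N) this reduces to uniform subsampling.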

To obtain the consistency and asymptotic normality of $\overset{\smile}{\beta}$, the following assumptions should be made. For simplicity, denote $\eta_i^*(\Sigma_u,\beta;W_i,Y_i)$ and ${\bf{\Omega}}_i^*(\Sigma_u,\beta;W_i,Y_i)$ as $\eta_i^*(\Sigma_u,\beta)$ and ${\bf{\Omega}}_i^*(\Sigma_u,\beta)$.

A1: It is assumed that $W_i^{\rm T}\beta$ almost surely lies in the interior of a closed set $K\subset\Theta$, where $\Theta$ is the natural parameter space.

A2: The regression parameters are located in the ball $\Lambda=\{\beta\in\mathbb{R}^p:\|\beta\|_1\le B\}$; $\beta_t$ and $\hat{\beta}_{\text{MLE}}$ are the true parameter and the maximum likelihood estimator, both interior points of $\Lambda$, and $B$ is a constant, where $\|\cdot\|_1$ denotes the 1-norm.

A3: As $N\to\infty$, the observed information matrix ${\bf{M}}_X:=\frac{1}{N}\sum_{i=1}^N {\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})$ is a positive definite matrix in probability.

A4: Assume that for all $\beta\in\Lambda$, $\frac{1}{N}\sum_{i=1}^N \|\eta_i^*(\Sigma_u,\beta)\|^4 = O_P(1)$, where $\|\cdot\|$ denotes the Euclidean norm.

A5: Suppose that the full sample covariates have finite 6th-order moments, i.e., $E\|W_1\|^6<\infty$.

A6: For any $\delta\ge 0$, we assume that

\frac{1}{N^{2+\delta}}\sum\limits_{i=1}^N \frac{\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|^{2+\delta}}{\pi_i^{1+\delta}} = O_P(1), \qquad \frac{1}{N^{2+\delta}}\sum\limits_{i=1}^N \frac{|{\bf{\Omega}}_i^{*(j_1j_2)}(\Sigma_u,\hat{\beta}_{\text{MLE}})|^{2+\delta}}{\pi_i^{1+\delta}} = O_P(1),

where ${\bf{\Omega}}_i^{*(j_1j_2)}$ represents the element in the $j_1$-th row and $j_2$-th column of the matrix ${\bf{\Omega}}_i^*$.

A7: Assume that $\eta_i^*(\Sigma_u,\beta)$ and ${\bf{\Omega}}_i^*(\Sigma_u,\beta)$ are $m(W_i)$-Lipschitz continuous: for any $\beta_1,\beta_2\in\Lambda$, there exist functions $m_1(W_i)$ and $m_2(W_i)$ such that $\|\eta_i^*(\Sigma_u,\beta_1)-\eta_i^*(\Sigma_u,\beta_2)\|\le m_1(W_i)\|\beta_1-\beta_2\|$ and $\|{\bf{\Omega}}_i^*(\Sigma_u,\beta_1)-{\bf{\Omega}}_i^*(\Sigma_u,\beta_2)\|_S\le m_2(W_i)\|\beta_1-\beta_2\|$, where $\|{\bf A}\|_S$ denotes the spectral norm of matrix ${\bf A}$. Further assume that $E\{m_1(W_i)\}<\infty$ and $E\{m_2(W_i)\}<\infty$.

Assumptions A1 and A2 are also used in Clémençon et al. [30]. The set $\Lambda$ in Assumption A2 is also known as the admissible set and is a prerequisite for consistent estimation in the GLM with full data [31]. Assumption A3 imposes a condition on the covariates to ensure that the MLE based on the full dataset is consistent. In order to obtain the Bahadur representation of the subsampling estimators, Assumptions A4 and A5 are required. Assumption A6 is a moment condition on the subsampling probabilities and is also required for the Lindeberg-Feller central limit theorem. Assumption A7 adds a smoothness restriction, which can be found in [32].

    The following theorems show the consistency and asymptotic normality of the subsampling estimators.

Theorem 2.1. If Assumptions A1–A7 hold, as $r\to\infty$ and $N\to\infty$, $\overset{\smile}{\beta}$ converges to $\hat{\beta}_{\text{MLE}}$ in conditional probability given $\mathcal{F}_N$, and the convergence rate is $r^{-1/2}$. That is, for all $\varepsilon>0$, there exist constants $\Delta_\varepsilon$ and $r_\varepsilon$ such that

\begin{equation} P\left(\left\|\overset{\smile}{\beta}-\hat{\beta}_{\text{MLE}}\right\| \ge r^{-\frac{1}{2}}\Delta_\varepsilon \mid \mathcal{F}_N\right) < \varepsilon, \end{equation} (2.5)

for all $r>r_\varepsilon$.

Theorem 2.2. If Assumptions A1–A7 hold, as $r\to\infty$ and $N\to\infty$, conditional on $\mathcal{F}_N$, the estimator $\overset{\smile}{\beta}$ obtained from Algorithm 1 satisfies

\begin{equation} {\bf{V}}^{-\frac{1}{2}}(\overset{\smile}{\beta}-\hat{\beta}_{\text{MLE}}) \mathop{\to}\limits^{d} N_p({\bf 0},{\bf I}), \end{equation} (2.6)

where ${\bf{V}}={\bf{M}}_X^{-1}{\bf{V}}_{\text{C}}{\bf{M}}_X^{-1}=O_P(r^{-1})$, and

{\bf{V}}_{\text{C}} = \frac{1}{N^2 r}\sum\limits_{i=1}^N \frac{\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\eta_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_i}.

Remark 1. In order to get the standard error of the corresponding estimator, we estimate the variance-covariance matrix of $\overset{\smile}{\beta}$ by

\hat{\bf{V}} = \hat{\bf{M}}_X^{-1}\hat{\bf{V}}_{\text{C}}\hat{\bf{M}}_X^{-1},

where

\hat{\bf{M}}_X = \frac{1}{Nr}\sum\limits_{i=1}^r \frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i},
\hat{\bf{V}}_{\text{C}} = \frac{1}{N^2r^2}\sum\limits_{i=1}^r \frac{\tilde{\eta}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\tilde{\eta}_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i^2}.
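As an illustration, the plug-in estimator in Remark 1 can be computed from the drawn subsample as follows; this is a sketch with illustrative names, where eta_mat holds the r score vectors row-wise and Omega_list the r derivative matrices, both evaluated at the chosen estimate.

```r
subsample_vcov <- function(eta_mat, Omega_list, pii, N) {
  r <- length(pii)
  # M_hat = (1/(N r)) * sum_i Omega_i / pi_i
  M_hat <- Reduce(`+`, Map(`/`, Omega_list, pii)) / (N * r)
  # V_C_hat = (1/(N^2 r^2)) * sum_i eta_i eta_i^T / pi_i^2
  V_C <- crossprod(eta_mat / pii) / (N^2 * r^2)
  M_inv <- solve(M_hat)
  M_inv %*% V_C %*% M_inv          # sandwich estimate of the variance-covariance matrix
}
```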

Based on the A-optimality criterion in the language of optimal design, the optimal subsampling probabilities are obtained by minimizing the asymptotic mean squared error of $\overset{\smile}{\beta}$ in Theorem 2.2.

    However, Σu is usually unknown in practice. Therefore, we need to estimate the covariance matrix Σu as suggested by [12]. We observe that the consistent, unbiased moment estimator of Σu is

\hat{\Sigma}_u = \frac{\sum\limits_{i=1}^N\sum\limits_{j=1}^{m_i}(W_{ij}-\bar{W}_i)(W_{ij}-\bar{W}_i)^{\rm T}}{\sum\limits_{i=1}^N (m_i-1)},

where $\bar{W}_i$ is the sample mean of the replicates, and $m_i$ is the number of repeated measurements of the $i$-th individual.
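When replicate measurements are available, this moment estimator is straightforward to compute; a small R sketch (W_reps[[i]] is assumed to be the m_i x p matrix of replicates for the i-th individual):

```r
estimate_Sigma_u <- function(W_reps) {
  p <- ncol(W_reps[[1]])
  num <- matrix(0, p, p); den <- 0
  for (Wi in W_reps) {
    Wbar <- colMeans(Wi)
    # add sum_j (W_ij - Wbar_i)(W_ij - Wbar_i)^T
    num <- num + crossprod(sweep(Wi, 2, Wbar))
    den <- den + nrow(Wi) - 1
  }
  num / den
}
```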

Theorem 2.3. Define $g_i^{\text{mV}} = \|{\bf{M}}_X^{-1}\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|$, $i=1,\ldots,N$. The subsampling strategy is mV-optimal if the subsampling probability is chosen such that

\begin{equation} \pi_i^{\text{mV}} = \frac{g_i^{\text{mV}}}{\sum\limits_{j=1}^N g_j^{\text{mV}}}, \end{equation} (2.7)

which is obtained by minimizing $tr({\bf{V}})$.

Theorem 2.4. Define $g_i^{\text{mVc}} = \|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|$, $i=1,\ldots,N$. The subsampling strategy is mVc-optimal if the subsampling probability is chosen such that

\begin{equation} \pi_i^{\text{mVc}} = \frac{g_i^{\text{mVc}}}{\sum\limits_{j=1}^N g_j^{\text{mVc}}}, \end{equation} (2.8)

which is obtained by minimizing $tr({\bf{V}}_{\text{C}})$.

Remark 2. ${\bf{M}}_X$ and ${\bf{V}}_{\text{C}}$ are non-negative definite matrices and ${\bf{V}}={\bf{M}}_X^{-1}{\bf{V}}_{\text{C}}{\bf{M}}_X^{-1}$, so $tr({\bf{V}})=tr({\bf{M}}_X^{-1}{\bf{V}}_{\text{C}}{\bf{M}}_X^{-1})\le \sigma_{\max}({\bf{M}}_X^{-2})\,tr({\bf{V}}_{\text{C}})$, where $\sigma_{\max}({\bf A})$ represents the maximum eigenvalue of a square matrix ${\bf A}$. As $\sigma_{\max}({\bf{M}}_X^{-2})$ does not depend on $\pi$, minimizing $tr({\bf{V}}_{\text{C}})$ means minimizing an upper bound of $tr({\bf{V}})$. In fact, for two given subsampling probability vectors $\pi^{(1)}$ and $\pi^{(2)}$, ${\bf{V}}(\pi^{(1)})\le {\bf{V}}(\pi^{(2)})$ if and only if ${\bf{V}}_{\text{C}}(\pi^{(1)})\le {\bf{V}}_{\text{C}}(\pi^{(2)})$. Therefore, minimizing $tr({\bf{V}}_{\text{C}})$ saves considerable computational time compared with minimizing $tr({\bf{V}})$, although $tr({\bf{V}}_{\text{C}})$ does not take the structural information of the data into account.

The optimal subsampling probabilities are defined as $\{\pi_i^{\text{op}}\}_{i=1}^N = \{\pi_i^{\text{mV}}\}_{i=1}^N$ or $\{\pi_i^{\text{mVc}}\}_{i=1}^N$. However, because $\pi_i^{\text{op}}$ depends on $\hat{\beta}_{\text{MLE}}$, it cannot be used directly in applications. To calculate $\pi_i^{\text{op}}$, it is necessary to use a prior estimator $\tilde{\beta}_0$, which is obtained from a prior subsample of size $r_0$.

We know that $\pi_i^{\text{op}}$ is proportional to $\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|$; however, in actual situations, there may be some data points with $\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|=0$, which will never be included in a subsample, and some data points with $\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|\approx 0$ that have very small probabilities of being sampled. If these special data points are excluded, some sample information will be missed, but if they are included, the variance of the subsampling estimator may increase.

To prevent Eq (2.4) from being inflated by these special data points, this paper adopts a truncation method, setting a threshold $\omega$ for $\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|$, that is, replacing $\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|$ with $\max\{\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|,\omega\}$, where $\omega$ is a very small positive number, for example, $10^{-4}$. In applications, the choice and design of the truncation weight function, a commonly used technique, are crucial for improving the robustness of the model and optimizing its performance.

We replace $\hat{\beta}_{\text{MLE}}$ in the matrix ${\bf{V}}$ with $\tilde{\beta}_0$ and denote the result by $\tilde{\bf{V}}$; then $tr(\tilde{\bf{V}}) \le tr(\tilde{\bf{V}}_\omega) \le tr(\tilde{\bf{V}}) + \frac{\omega^2}{N^2r}\sum_{i=1}^N \frac{\|{\bf{M}}_X^{-1}\|^2}{\pi_i^{\text{op}}}$. Therefore, when $\omega$ is sufficiently small, $tr(\tilde{\bf{V}}_\omega)$ approaches $tr(\tilde{\bf{V}})$. The threshold $\omega$ is set to make the subsample estimators more robust without sacrificing too much estimation efficiency. $\tilde{\bf{M}}_X = \frac{1}{Nr_0}\sum_{i=1}^{r_0}\tilde{\bf{\Omega}}_i^*(\Sigma_u,\tilde{\beta}_0)/\tilde{\pi}_i^{\text{UNIF}}$, based on the prior subsample, can be used to approximate ${\bf{M}}_X$. The two-step algorithm is presented in Algorithm 2.

Algorithm 2 Optimal subsampling algorithm.
Step 1. Extract a prior subsample set $S_{r_0}$ of size $r_0$ from the full data, with the prior subsampling probabilities $\pi^{\text{UNIF}}=\{\pi_i:=\frac{1}{N}\}_{i=1}^N$. Use Algorithm 1 to obtain a prior estimator $\tilde{\beta}_0$, and replace $\hat{\beta}_{\text{MLE}}$ with $\tilde{\beta}_0$ in Eqs (2.7) and (2.8) to get the optimal subsampling probabilities $\{\pi_i^{\text{opt}}\}_{i=1}^N$.
Step 2. Use the optimal subsampling probabilities $\{\pi_i^{\text{opt}}\}_{i=1}^N$ computed in Step 1 to extract a subsample of size $r$ with replacement. Following the steps in Algorithm 1, combine this subsample with the one from Step 1 and solve the estimating Eq (2.4) to obtain the estimator $\check{\beta}$ based on a subsample of total size $r_0+r$.
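A compact R sketch of Algorithm 2 is given below, building on subsample_estimate from the earlier sketch and using the truncated mVc probabilities for concreteness; the mV variant would additionally weight each score by an approximation of the inverse of M_X. Function names, the solver choice, and the default omega are illustrative assumptions.

```r
two_step_estimate <- function(W, Y, Sigma_u, eta_fun, Omega_fun,
                              r0, r, beta_init, omega = 1e-4,
                              max_iter = 50, tol = 1e-8) {
  N <- nrow(W)
  # Step 1: pilot estimate from a uniform subsample of size r0.
  pilot <- subsample_estimate(W, Y, rep(1 / N, N), r0, Sigma_u,
                              eta_fun, Omega_fun, beta_init)
  beta0 <- pilot$beta
  # Truncated mVc probabilities: proportional to max(||eta*_i(beta0)||, omega).
  g <- vapply(seq_len(N), function(i)
    max(sqrt(sum(eta_fun(beta0, W[i, ], Y[i], Sigma_u)^2)), omega), numeric(1))
  pi_opt <- g / sum(g)
  # Step 2: draw r more points and solve the pooled estimating equation,
  # weighting pilot points by 1/pi^UNIF = N and new points by 1/pi_opt.
  idx_new <- sample(N, r, replace = TRUE, prob = pi_opt)
  idx <- c(pilot$idx, idx_new)
  wts <- c(rep(N, r0), 1 / pi_opt[idx_new])
  beta <- beta0
  for (it in seq_len(max_iter)) {
    Q <- 0; H <- 0
    for (j in seq_along(idx)) {
      i <- idx[j]
      Q <- Q + wts[j] * eta_fun(beta, W[i, ], Y[i], Sigma_u)
      H <- H + wts[j] * Omega_fun(beta, W[i, ], Y[i], Sigma_u)
    }
    step <- drop(solve(H, Q))
    beta <- beta - step
    if (sqrt(sum(step^2)) < tol) break
  }
  beta
}
```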

Remark 3. In Algorithm 2, $\tilde{\beta}_0$ in Step 1 satisfies

Q_{\tilde{\beta}_0}^{*0}(\beta) = \frac{1}{r_0}\sum\limits_{i=1}^{r_0}\frac{\tilde{\eta}_i^*(\Sigma_u,\beta)}{\pi_i^{\text{UNIF}}} = {\bf 0}

with the prior subsample set $S_{r_0}$, and

{\bf{M}}_X^{\tilde{\beta}_0} = \frac{1}{Nr_0}\sum\limits_{i=1}^{r_0}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\tilde{\beta}_0)}{\pi_i^{\text{UNIF}}}.

In Step 2, the subsampling probabilities are $\{\pi_i^{\text{opt}}\}_{i=1}^N = \{\pi_i^{\text{mVt}}\}_{i=1}^N$ or $\{\pi_i^{\text{mVct}}\}_{i=1}^N$; let

g_i^{\text{mVt}} = \left\{\begin{aligned} &\left\|{\bf{M}}_X^{-1}\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\right\|, &&\text{if } \|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\| > \omega,\\ &\omega\left\|{\bf{M}}_X^{-1}\right\|, &&\text{if } \|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\| \le \omega, \end{aligned}\right. \quad i=1,\ldots,N,
g_i^{\text{mVct}} = \max\left\{\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\|, \omega\right\},

then

\pi_i^{\text{mVt}} = \frac{g_i^{\text{mVt}}}{\sum\limits_{j=1}^N g_j^{\text{mVt}}} \quad \text{and} \quad \pi_i^{\text{mVct}} = \frac{g_i^{\text{mVct}}}{\sum\limits_{j=1}^N g_j^{\text{mVct}}}.

The subsample set is $S_{r_0}\cup\{(\tilde{W}_i,\tilde{Y}_i,\tilde{\pi}_i^{\text{opt}})\mid i=1,\ldots,r\}$ with a total subsample size of $r+r_0$, and $\check{\beta}$ is the solution to the corresponding estimating equation

Q_{\tilde{\beta}_0}^{\text{twostep}}(\beta) = \frac{1}{r+r_0}\sum\limits_{i=1}^{r+r_0}\frac{\tilde{\eta}_i^*(\Sigma_u,\beta)}{\tilde{\pi}_i^{\text{opt}}} = \frac{r}{r+r_0}Q_{\tilde{\beta}_0}^{*}(\beta) + \frac{r_0}{r+r_0}Q_{\tilde{\beta}_0}^{*0}(\beta) = {\bf 0},

where

Q_{\tilde{\beta}_0}^{*}(\beta) = \frac{1}{r}\sum\limits_{i=1}^{r}\frac{\tilde{\eta}_i^*(\Sigma_u,\beta)}{\tilde{\pi}_i^{\text{opt}}}.

Theorem 2.5. If Assumptions A1–A7 hold, as $r_0r^{-1}\to 0$, $r_0\to\infty$, $r\to\infty$ and $N\to\infty$, if $\tilde{\beta}_0$ exists, then the estimator $\check{\beta}$ obtained from Algorithm 2 converges to $\hat{\beta}_{\text{MLE}}$ in conditional probability given $\mathcal{F}_N$, and its convergence rate is $r^{-1/2}$. For all $\varepsilon>0$, there exist finite $\Delta_\varepsilon$ and $r_\varepsilon$ such that

\begin{equation} P\left(\left\|\check{\beta}-\hat{\beta}_{\text{MLE}}\right\| \ge r^{-\frac{1}{2}}\Delta_\varepsilon \mid \mathcal{F}_N\right) < \varepsilon, \end{equation} (2.9)

for all $r>r_\varepsilon$.

Theorem 2.6. If Assumptions A1–A7 hold, as $r_0r^{-1}\to 0$, $r_0\to\infty$, $r\to\infty$ and $N\to\infty$, conditional on $\mathcal{F}_N$, the estimator $\check{\beta}$ obtained from Algorithm 2 satisfies

\begin{equation} {\bf{V}}_{\text{opt}}^{-\frac{1}{2}}(\check{\beta}-\hat{\beta}_{\text{MLE}}) \mathop{\to}\limits^{d} N_p({\bf 0},{\bf I}), \end{equation} (2.10)

where ${\bf{V}}_{\text{opt}}={\bf{M}}_X^{-1}{\bf{V}}_{\text{C}}^{\text{opt}}{\bf{M}}_X^{-1}=O_P(r^{-1})$, and

{\bf{V}}_{\text{C}}^{\text{opt}} = \frac{1}{N^2r}\sum\limits_{i=1}^N \frac{\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\eta_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_i^{\text{opt}}}.

Remark 4. We estimate the variance-covariance matrix of $\check{\beta}$ by

\hat{\bf{V}}_{\text{opt}} = \hat{\bf{M}}_X^{-1}\hat{\bf{V}}_{\text{C}}^{\text{opt}}\hat{\bf{M}}_X^{-1},

where

\hat{\bf{M}}_X = \frac{1}{N(r_0+r)}\left[\sum\limits_{i=1}^{r_0}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i^{\text{UNIF}}} + \sum\limits_{i=1}^{r}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i^{\text{opt}}}\right],
\hat{\bf{V}}_{\text{C}}^{\text{opt}} = \frac{1}{N^2(r_0+r)^2}\left[\sum\limits_{i=1}^{r_0}\frac{\tilde{\eta}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\tilde{\eta}_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i^{\text{UNIF}\,2}} + \sum\limits_{i=1}^{r}\frac{\tilde{\eta}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\tilde{\eta}_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i^{\text{opt}\,2}}\right].

    In this section, we perform numerical simulations using synthetic data to evaluate the finite sample performance of the proposed method in Algorithm 2 (denoted as mV and mVc). For a fair comparison, we also give the results of the uniform subsampling method and set the size to be the same as that of Algorithm 2. The estimators of the above three subsampling methods, uniform—the uniform subsampling, mV—the mV probability subsampling, and mVc—the mVc probability subsampling, are compared with MLE—the maximum likelihood estimators for full data. In addition, we conduct simulation experiments using two models: the logistic regression model and the Poisson regression model.

Set the sample size $N=100000$ and the true value $\beta_t=(0.5,0.6,0.5)^{\rm T}$, and generate the covariate $X_i\sim N_3({\bf 0},\Sigma)$, where $\Sigma=0.5{\bf I}+0.5{\bf 1}{\bf 1}^{\rm T}$ and ${\bf I}$ is an identity matrix. The response $Y_i$ follows a binomial distribution with $P(Y_i=1\mid X_i)=(1+\exp(-X_i^{\rm T}\beta_t))^{-1}$. We consider the following three cases to generate the measurement error term $U_i$.

● Case 1: $U_i\sim N_3({\bf 0},0.4^2{\bf I})$;

● Case 2: $U_i\sim N_3({\bf 0},0.5^2{\bf I})$;

● Case 3: $U_i\sim N_3({\bf 0},0.6^2{\bf I})$.

The subsample size in Step 1 of Algorithm 2 is selected as $r_0=400$. The second-step subsample size $r$ is set to 500, 1000, 1500, 2000, 2500, and 5000. In order to verify that $\check{\beta}$ asymptotically approaches $\beta_t$, we repeat the experiment $K=1000$ times and calculate $\text{MSE}=\frac{1}{K}\sum_{k=1}^K\|\check{\beta}^{(k)}-\beta_t\|^2$, where $\check{\beta}^{(k)}$ is the parameter estimator from the subsample generated in the $k$-th repetition.
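For reference, a minimal R sketch of the data-generating mechanism for Case 1 of this logistic example (MASS::mvrnorm is used for the multivariate normal draws; the object names are ours):

```r
library(MASS)                              # for mvrnorm
set.seed(2024)
N       <- 100000
beta_t  <- c(0.5, 0.6, 0.5)
Sigma   <- 0.5 * diag(3) + 0.5             # Sigma = 0.5 I + 0.5 * 1 1^T
X       <- mvrnorm(N, mu = rep(0, 3), Sigma = Sigma)
Y       <- rbinom(N, 1, 1 / (1 + exp(-drop(X %*% beta_t))))
Sigma_u <- 0.4^2 * diag(3)                 # Case 1
W       <- X + mvrnorm(N, mu = rep(0, 3), Sigma = Sigma_u)
# Only (W, Y) are handed to the subsampling estimators; the MSE over K repetitions
# is then mean(sapply(estimates, function(b) sum((b - beta_t)^2))).
```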

The simulation results are shown in Figure 1, from which it can be seen that both mV and mVc always have smaller MSEs than uniform subsampling. The MSEs of all the subsampling methods decrease as $r$ increases, which confirms the theoretical consistency of the subsampling methods. As the variance of the error term increases, the MSEs of uniform, mV, and mVc also increase. The mV method is better than the mVc method because the subsampling probabilities of mV take the structural information of the data into account. A comparison between the corrected and uncorrected methods shows that the MSEs of the corrected methods are much smaller than those of the uncorrected methods, and the difference between them increases as the error variance increases.

    Figure 1.  MSEs for ˇβ with different second step subsample size r and r0=400. The colorful icons and lines represent the corrected subsampling methods. The gray icons and lines represent the uncorrected subsampling methods.

    Now, we evaluate the statistical inference performance of the optimal subsampling method for different r and variances of Ui. The parameter β1 is taken as an example, and a 95% confidence interval is constructed. Table 1 reports the empirical coverage probabilities and average lengths of three subsampling methods. It is evident that both mV and mVc have similar performance and consistently outperform the uniform subsampling method. As r increases, the length of the confidence interval uniformly decreases.

    Table 1.  Empirical coverage probabilities and average lengths of confidence intervals for β1 in the logistic regression models with different r and r0=500.
    uniform mV mVc
    Case r Coverage Length Coverage Length Coverage Length
    Case 1 500 0.958 0.565 0.932 0.331 0.942 0.457
    1000 0.952 0.453 0.925 0.248 0.954 0.333
    1500 0.960 0.387 0.920 0.206 0.964 0.274
    2000 0.932 0.345 0.907 0.180 0.954 0.237
    2500 0.938 0.313 0.910 0.160 0.956 0.211
    5000 0.964 0.302 0.908 0.148 0.937 0.202
    Case 2 500 0.956 0.634 0.946 0.602 0.962 0.613
    1000 0.946 0.621 0.934 0.586 0.946 0.593
    1500 0.927 0.597 0.954 0.551 0.962 0.561
    2000 0.943 0.543 0.956 0.524 0.921 0.518
    2500 0.970 0.475 0.958 0.453 0.944 0.462
    5000 0.963 0.438 0.932 0.417 0.947 0.441
    Case 3 500 0.958 0.706 0.956 0.432 0.968 0.550
    1000 0.946 0.561 0.972 0.399 0.970 0.409
    1500 0.944 0.479 0.968 0.321 0.960 0.329
    2000 0.936 0.425 0.964 0.265 0.958 0.281
    2500 0.926 0.389 0.966 0.249 0.954 0.250
    5000 0.915 0.356 0.947 0.220 0.942 0.236


Let $\beta_t=(0.5,0.6,0.5)^{\rm T}$ and generate the covariate $X_i\sim N_3({\bf 0},\Sigma)$, where $\Sigma=0.3{\bf I}+0.5{\bf 1}{\bf 1}^{\rm T}$ and ${\bf I}$ is an identity matrix. We consider the following three cases to generate the measurement error term $U_i$.

● Case 1: $U_i\sim N_3({\bf 0},0.3^2{\bf I})$;

● Case 2: $U_i\sim N_3({\bf 0},0.4^2{\bf I})$;

● Case 3: $U_i\sim N_3({\bf 0},0.5^2{\bf I})$.

We also generate a sample of size $N=100000$ from the Poisson($\mu_i$) distribution, where $\mu_i=\exp(X_i^{\rm T}\beta_t)$, and summarize the MSEs over $K=1000$ simulations in Figure 2. The other settings are the same as those in the logistic regression example.

    Figure 2.  MSEs for ˇβ with different second step subsample size r and r0=400. The colorful icons and lines represent the corrected subsampling methods. The gray icons and lines represent the uncorrected subsampling methods.

From Figure 2, it can be seen that the MSEs of both the mV and mVc methods are smaller than those of uniform subsampling, with the mV method being the best. In addition, the corrected method is clearly effective, which is consistent with Figure 1. Table 2 reports the empirical coverage probabilities and average lengths of the 95% confidence intervals for the parameter $\beta_3$ under the three subsampling methods. The conclusions of Table 2 are consistent with those of Table 1, but the average lengths of the intervals for Poisson regression are significantly longer than those for logistic regression.

    Table 2.  Empirical coverage probabilities and average lengths of confidence intervals for β3 in the Poisson regression models with different r and r0=500.
    uniform mV mVc
    Case r Coverage Length Coverage Length Coverage Length
    Case 1 500 0.962 0.441 0.962 0.383 0.958 0.399
    1000 0.944 0.352 0.964 0.291 0.964 0.304
    1500 0.932 0.302 0.964 0.241 0.966 0.255
    2000 0.952 0.268 0.930 0.210 0.944 0.223
    2500 0.946 0.244 0.958 0.188 0.974 0.201
    5000 0.952 0.234 0.961 0.173 0.943 0.185
    Case 2 500 0.938 0.127 0.936 0.108 0.948 0.109
    1000 0.936 0.102 0.946 0.082 0.934 0.082
    1500 0.942 0.087 0.934 0.069 0.936 0.068
    2000 0.952 0.078 0.956 0.060 0.952 0.059
    2500 0.946 0.071 0.932 0.053 0.944 0.053
    5000 0.935 0.068 0.965 0.045 0.971 0.047
    Case 3 500 0.940 0.185 0.936 0.153 0.953 0.156
    1000 0.950 0.148 0.954 0.113 0.958 0.118
    1500 0.932 0.127 0.950 0.094 0.958 0.099
    2000 0.946 0.113 0.952 0.082 0.960 0.086
    2500 0.942 0.103 0.932 0.073 0.950 0.077
    5000 0.937 0.096 0.956 0.065 0.964 0.061


In order to explore the influence of different subsample size allocations in the two-step algorithm, we calculate the MSEs for different proportions of $r_0$ while keeping the total subsample size constant. Set the total subsample size $r_0+r=3000$; the result is shown in Figure 3. It can be seen that the accuracy of the two-step algorithm initially improves as $r_0$ increases. However, when $r_0$ increases beyond a certain point, the accuracy of the algorithm begins to decrease. There are two reasons: (1) if $r_0$ is too small, the estimator in the first step will be biased, and it is difficult to ensure its accuracy; (2) if $r_0$ is too large, then the performances of mV and mVc are similar to that of uniform subsampling. When $r_0/(r_0+r)$ is around 0.25, the two-step algorithm performs best.

    Figure 3.  MSEs vs proportions of the first step subsample with fixed total subsample size for logistic and Poisson models with Case 1.

We use the Sys.time() function in R to calculate the running times of the three subsampling methods and of the full-data estimation. We conduct 1000 repetitions, set $r_0=200$, and consider different values of $r$ in Case 1. The results are shown in Tables 3 and 4. It is easy to see that the uniform subsampling algorithm requires the least computation time, because there is no need to calculate the subsampling probabilities. In addition, the mV method takes longer than the mVc method, which is consistent with the theoretical analysis in Section 2.
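An illustrative timing pattern with Sys.time() (the surrounding objects are the ones sketched above; this is not the authors' script):

```r
t0 <- Sys.time()
beta_hat <- two_step_estimate(W, Y, Sigma_u, eta_logistic, Omega_logistic,
                              r0 = 200, r = 500, beta_init = rep(0, 3))
elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
```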

    Table 3.  Computing time (in seconds) for logistic regression with Case 1 for different r and fixed r0=200.
    r
    Method 300 500 800 1200 1600 2000
    uniform 0.2993 0.3337 0.4985 0.5632 0.8547 0.5083
    mV 3.5461 3.6485 3.8623 4.1256 4.4325 5.2365
    mVc 3.2852 3.3658 3.5463 3.8562 4.0235 4.4235
    Full 45.9075

    Table 4.  Computing time (in seconds) for Poisson regression with Case 1 for different r and fixed r0=200.
    r
    Method 300 500 800 1200 1600 2000
    uniform 0.4213 0.4868 0.5327 0.5932 0.7147 0.8883
    mV 4.6723 4.8963 5.2369 5.6524 6.0128 6.3567
    mVc 4.3521 4.6329 4.9658 5.2156 5.7652 5.9635
    Full 51.2603


In this section, we apply the proposed method to analyze the 1994 global census data, which covers 42 countries, from the Machine Learning Database [33]. There are 5 covariates in the data: $x_1$ represents age; $x_2$ represents the population weight value, which is assigned by the Population Division of the Census Bureau and is related to socioeconomic characteristics; $x_3$ represents the highest level of education attained since primary school; $x_4$ represents capital loss, that is, the loss of income from bad investments, computed as the difference between the lower selling price and the higher purchase price of an individual's investment; $x_5$ represents weekly working hours. The response is $y_i=1$ if an individual's annual income exceeds 50,000 dollars and $y_i=0$ otherwise.

To verify the effectiveness of the proposed method, we add measurement errors to the covariates $x_2$, $x_4$ and $x_5$ in this dataset, and the covariance matrix of the measurement error is

\Sigma_u = \mathrm{diag}(0,\ 0.04,\ 0,\ 0.04,\ 0.04).

We split the full dataset into a training set of 32561 observations and a test set of 16281 observations in a 2:1 ratio. We apply the proposed method to the training set and evaluate the classification performance on the test set. We calculate $\text{LEMSE}=\log\left(\frac{1}{K}\sum_{k=1}^K\|\check{\beta}^{(k)}-\hat{\beta}_{\text{MLE}}\|^2\right)$ based on 1000 bootstrap subsample estimators with $r=500,1000,1500,2200,2500$, and $r_0=500$. The corrected MLE estimators for the training set are $\hat{\beta}^{\text{err}}_{\text{MLE},0}=-1.6121$, $\hat{\beta}^{\text{err}}_{\text{MLE},1}=1.1992$, $\hat{\beta}^{\text{err}}_{\text{MLE},2}=0.0103$, $\hat{\beta}^{\text{err}}_{\text{MLE},3}=0.9142$, $\hat{\beta}^{\text{err}}_{\text{MLE},4}=0.2617$, $\hat{\beta}^{\text{err}}_{\text{MLE},5}=0.8694$.
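The LEMSE criterion itself is a one-liner once the K subsample estimates are collected; a sketch (beta_checks is an assumed list of the K estimates and beta_mle_err the corrected full-data MLE):

```r
# LEMSE = log( (1/K) * sum_k || beta_check^(k) - beta_hat_MLE ||^2 )
LEMSE <- log(mean(sapply(beta_checks, function(b) sum((b - beta_mle_err)^2))))
```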

    Table 5 shows the average estimators and the corresponding standard errors based on the proposed method (r0=500, r=2000). It can be seen that the estimators from three subsampling methods are close to the estimators from the full data. In general, the mV and mVc subsampling methods produce small standard errors.

    Table 5.  Average estimators based on subsamples with measurement error and subsample size r=2000. The numbers in parentheses are the standard errors of the average estimators.
    uniform mV mVc
    Intercept -1.6084(0.069) -1.5998(0.055) -1.3122(0.052)
    ˇβerr1 1.2879(0.205) 1.1880(0.103) 1.2038(0.097)
    ˇβerr2 0.0105(0.106) 0.0104(0.059) 0.0111(0.046)
    ˇβerr3 1.0033(0.201) 0.9217(0.067) 0.9199(0.054)
    ˇβerr4 0.2636(0.094) 0.2698(0.054) 0.2555(0.063)
    ˇβerr5 0.9469(0.229) 0.8741(0.083) 0.8628(0.076)


All subsampling methods show that each variable has a positive impact on income, with age, highest education level, and weekly working hours having significant impacts. Interestingly, capital losses have a significant positive impact on income, because low-income people rarely invest. However, the population weight value has the smallest impact on income; the likely reason is that it reflects the overall distribution characteristics among groups rather than the specific economic performance of individuals. Income is a highly volatile variable, and the income gap between different groups may be large. Even under the same socioeconomic characteristics, the income distribution may have a large variance. This high variability weakens the overall impact of the population weight on income.

Fixing $r_0=500$, Figure 4(a) shows the LEMSEs calculated for the subsamples with measurement errors. We can see that the LEMSEs of the corrected methods are much smaller than those of the uncorrected methods. As $r$ increases, the LEMSEs become smaller, so the estimators of the subsampling methods are consistent, and the mV method is the best. Figure 4(b) shows the proportion of responses in the test set that are correctly classified for different subsample sizes. The mV method performs slightly better than the mVc method. It can also be seen that the prediction accuracy of the corrected subsampling methods is slightly higher than that of the corresponding uncorrected methods.

    Figure 4.  LEMSEs and model prediction accuracy (proportion of correctly classified models) for the subsample with measurement errors. The colorful icons and lines represent the corrected subsampling methods. The gray icons and lines represent the uncorrected subsampling methods.

This subsection applies the corrected subsampling method to the credit card fraud detection dataset from Kaggle*, where the dependent variable is whether an individual has committed credit card fraud. The dataset contains 284,807 records, including 492 fraud cases. Since the data involve sensitive information, the covariates have all been processed by principal component analysis, yielding a total of 28 principal components. Amount represents the consumption amount; Class is the dependent variable, with 1 representing fraud and 0 representing a normal transaction. The first four principal components and the consumption amount are selected as independent variables.

    *https://www.kaggle.com/datasets/creepycrap/creditcard-fraud-dataset

    To verify the effectiveness of the proposed method, we add the measurement errors to the covariates, and the covariance matrix of the measurement error is Σu=0.16I. We split the dataset into the training set and the test set in a 3:1 ratio and summarize the LEMSEs based on the number of simulations K=1000 with r=500,1000,1500,2200,2500,5000, and r0=500.

The MLE estimators for the training set are $\hat{\beta}^{\text{err}}_{\text{MLE},0}=-8.8016$, $\hat{\beta}^{\text{err}}_{\text{MLE},1}=-0.6070$, $\hat{\beta}^{\text{err}}_{\text{MLE},2}=0.0737$, $\hat{\beta}^{\text{err}}_{\text{MLE},3}=-0.9056$, $\hat{\beta}^{\text{err}}_{\text{MLE},4}=1.4553$, $\hat{\beta}^{\text{err}}_{\text{MLE},5}=-0.1329$. Table 6 shows the average estimators and the corresponding standard errors ($r_0=500$, $r=2000$). It can be seen that the estimators from the three subsampling methods are close to those from the full data. In general, the mV and mVc subsampling methods produce smaller standard errors. From Figure 5, we can obtain results similar to those in Figure 4.

    Table 6.  Average estimators based on subsamples with measurement error and subsample size r=2000. The numbers in parentheses are the standard errors of the average estimators.
    uniform mV mVc
    Intercept -8.7934(0.0678) -8.8105(0.0562) -8.8135(0.0543)
    ˇβerr1 -0.6123(0.341) -0.6047(0.142) -0.6035(0.105)
    ˇβerr2 0.0712(0.125) 0.0730(0.064) 0.0798(0.088)
    ˇβerr3 -0.9321(0.245) -0.9087(0.067) -0.9123(0.057)
    ˇβerr4 1.4618(0.198) 1.4580(0.054) 1.4603(0.075)
    ˇβerr5 -0.1435(0.531) -0.1347(0.242) -0.1408(0.225)

    Figure 5.  LEMSEs and model prediction accuracy (proportion of correctly classified models) for the subsample with measurement errors. The colorful icons and lines represent the corrected subsampling methods. The gray icons and lines represent the uncorrected subsampling methods.

In this paper, we not only combine the corrected score method with the subsampling technique, but also theoretically derive the consistency and asymptotic normality of the subsampling estimators. In addition, an adaptive two-step algorithm is developed based on optimal subsampling probabilities derived from the A-optimality and L-optimality criteria together with a truncation method. The theoretical results of the proposed method are tested with simulated data and two real datasets, and the experimental results demonstrate the effectiveness and good performance of the proposed method.

This paper only assumes that the covariates are affected by measurement error; however, in practical applications, the response variables can also be influenced by measurement errors. The optimal subsampling probabilities are obtained by minimizing $tr({\bf{V}})$ or $tr({\bf{V}}_{\text{C}})$ using the design ideas of the A-optimality and L-optimality criteria. In the future, other optimality criteria can be considered to develop more efficient subsampling algorithms.

    Ruiyuan Chang: Furnished the algorithms and numerical results presented in the manuscript and composed the original draft of the manuscript; Xiuli Wang: Rendered explicit guidance regarding the proof of the theorem and refined the language of the entire manuscript; Mingqiu Wang: Rendered explicit guidance regarding the proof of theorems and the writing of codes and refined the language of the entire manuscript. All authors have read and consented to the published version of the manuscript.

    The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was supported by the National Natural Science Foundation of China (12271294) and the Natural Science Foundation of Shandong Province (ZR2024MA089).

    The authors declare no conflict of interest.

    The proofs of the following lemmas and theorems are primarily based on Wang et al. [5], Ai et al. [7] and Yu et al. [34].

Lemma 1. If Assumptions A1–A4 and A6 hold, as $r\to\infty$ and $N\to\infty$, conditional on $\mathcal{F}_N$, we have

\begin{equation} \overset{\smile}{\bf{M}}_X - {\bf{M}}_X = O_{P\mid\mathcal{F}_N}(r^{-\frac{1}{2}}), \end{equation} (A.1)
\begin{equation} \frac{1}{N}\mathit{\boldsymbol{Q}}^{*}(\hat{\beta}_{\text{MLE}}) - \frac{1}{N}\mathit{\boldsymbol{Q}}(\hat{\beta}_{\text{MLE}}) = O_{P\mid\mathcal{F}_N}(r^{-\frac{1}{2}}), \end{equation} (A.2)
\begin{equation} \frac{1}{N}{\bf{V}}_{\text{C}}^{-\frac{1}{2}}\mathit{\boldsymbol{Q}}^{*}(\hat{\beta}_{\text{MLE}}) \mathop{\to}\limits^{d} N_p({\bf 0},{\bf I}), \end{equation} (A.3)

where

\overset{\smile}{\bf{M}}_X = \frac{1}{Nr}\sum\limits_{i=1}^r \frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i},

and

{\bf{V}}_{\text{C}} = \frac{1}{N^2r}\sum\limits_{i=1}^N \frac{\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\eta_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_i}.

    Proof.

E\left(\overset{\smile}{\bf{M}}_X \,\middle\vert\, \mathcal{F}_N\right) = E\left(\frac{1}{Nr}\sum\limits_{i=1}^r \frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i} \,\middle\vert\, \mathcal{F}_N\right) = \frac{1}{Nr}\sum\limits_{i=1}^r\sum\limits_{j=1}^N \pi_j\frac{{\bf{\Omega}}_j^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_j} = \frac{1}{N}\sum\limits_{i=1}^N {\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}) = {\bf{M}}_X.

    By Assumption A6, we have

\begin{aligned} E\left[\left(\overset{\smile}{\bf{M}}_X^{j_1j_2}-{\bf{M}}_X^{j_1j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N\right] & = E\left[\left(\frac{1}{Nr}\sum\limits_{i=1}^r \frac{\tilde{\bf{\Omega}}_i^{*(j_1j_2)}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i}-\frac{1}{N}\sum\limits_{i=1}^N {\bf{\Omega}}_i^{*(j_1j_2)}(\Sigma_u,\hat{\beta}_{\text{MLE}})\right)^2 \,\middle\vert\, \mathcal{F}_N\right] \\ & = \frac{1}{r}\sum\limits_{i=1}^N \pi_i\left(\frac{{\bf{\Omega}}_i^{*(j_1j_2)}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{N\pi_i}-{\bf{M}}_X^{j_1j_2}\right)^2 \\ & = \frac{1}{r}\sum\limits_{i=1}^N \pi_i\left(\frac{{\bf{\Omega}}_i^{*(j_1j_2)}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{N\pi_i}\right)^2 - \frac{1}{r}\left({\bf{M}}_X^{j_1j_2}\right)^2 \\ & \le \frac{1}{r}\sum\limits_{i=1}^N \pi_i\left(\frac{{\bf{\Omega}}_i^{*(j_1j_2)}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{N\pi_i}\right)^2 = O_P(r^{-1}). \end{aligned}

    It follows from Chebyshev's inequality that (A.1) holds.

E\left(\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}(\hat{\beta}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N\right) = E\left(\frac{1}{Nr}\sum\limits_{i=1}^r \frac{\tilde{\eta}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i} \,\middle\vert\, \mathcal{F}_N\right) = \frac{1}{Nr}\sum\limits_{i=1}^r\sum\limits_{j=1}^N \pi_j\frac{\eta_j^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_j} = \frac{1}{N}\sum\limits_{i=1}^N \eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}) = {\bf 0}.

    By Assumption A4, we have

\begin{aligned} \text{Var}\left(\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}(\hat{\beta}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N\right) & = \text{Var}\left(\frac{1}{Nr}\sum\limits_{i=1}^r \frac{\tilde{\eta}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i} \,\middle\vert\, \mathcal{F}_N\right) \\ & = \frac{1}{N^2r^2}\sum\limits_{i=1}^r\sum\limits_{j=1}^N \pi_j\frac{\eta_j^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\eta_j^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_j^2} \\ & = \frac{1}{N^2r}\sum\limits_{i=1}^N \frac{\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\eta_i^{*{\rm T}}(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\pi_i} = O_P(r^{-1}). \end{aligned}

    Now (A.2) follows from Markov's Inequality.

Let $\gamma_i^* = (N\tilde{\pi}_i)^{-1}\tilde{\eta}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})$; then $N^{-1}\mathit{\boldsymbol{Q}}^{*}(\hat{\beta}_{\text{MLE}}) = r^{-1}\sum_{i=1}^r \gamma_i^*$ holds. Based on Assumption A6, for all $\varepsilon>0$, we have

\begin{aligned} \sum\limits_{i=1}^r E\left\{\left\|r^{-\frac{1}{2}}\gamma_i^*\right\|^2 I\left(\left\|\gamma_i^*\right\|>r^{\frac{1}{2}}\varepsilon\right) \,\middle\vert\, \mathcal{F}_N\right\} & = \frac{1}{r}\sum\limits_{i=1}^r E\left\{\left\|\gamma_i^*\right\|^2 I\left(\left\|\gamma_i^*\right\|>r^{\frac{1}{2}}\varepsilon\right) \,\middle\vert\, \mathcal{F}_N\right\} \\ & \le \frac{1}{r^{\frac{3}{2}}\varepsilon}\sum\limits_{i=1}^r E\left\{\left\|\gamma_i^*\right\|^3 \,\middle\vert\, \mathcal{F}_N\right\} = \frac{1}{r^{\frac{1}{2}}\varepsilon}\frac{1}{N^3}\sum\limits_{i=1}^N \frac{\left\|\eta_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\right\|^3}{\pi_i^2} \\ & = O_P(r^{-\frac{1}{2}}) = o_P(1). \end{aligned}

    This shows that the Lindeberg-Feller conditions are satisfied in probability. Therefore (A.3) is true.

Lemma 2. If Assumptions A1–A7 hold, as $r\to\infty$ and $N\to\infty$, conditional on $\mathcal{F}_N$, for all $s_r\to{\bf 0}$, we have

\begin{equation} \frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}+s_r)}{\tilde{\pi}_i} - \frac{1}{N}\sum\limits_{i=1}^N {\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}) = o_{P\mid\mathcal{F}_N}(1). \end{equation} (A.4)

    Proof. The Eq (A.4) can be written as

\frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}+s_r)}{\tilde{\pi}_i} - \frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i} + \frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i} - \frac{1}{N}\sum\limits_{i=1}^N {\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}).

    Let

{\mathit{\boldsymbol{\tau}}}_1 := \frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}+s_r)}{\tilde{\pi}_i} - \frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})}{\tilde{\pi}_i},

    then by Assumption A7, we have

\begin{aligned} E\left(\left\|{\mathit{\boldsymbol{\tau}}}_1\right\|_S \,\middle\vert\, \mathcal{F}_N\right) & \le E\left\{\frac{1}{Nr}\sum\limits_{i=1}^{N}\frac{1}{\tilde{\pi}_i}\left\|\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}}+s_r)-\tilde{\bf{\Omega}}_i^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\right\|_S \,\middle\vert\, \mathcal{F}_N\right\} \\ & = \frac{1}{Nr}\sum\limits_{i=1}^r\sum\limits_{j=1}^N \pi_j\frac{1}{\pi_j}\left\|{\bf{\Omega}}_j^*(\Sigma_u,\hat{\beta}_{\text{MLE}}+s_r)-{\bf{\Omega}}_j^*(\Sigma_u,\hat{\beta}_{\text{MLE}})\right\|_S \\ & \le \frac{1}{N}\sum\limits_{i=1}^N m_2(W_i)\left\|s_r\right\| = o_P(1). \end{aligned}

    It follows from Markov's inequality that {{\mathit{\boldsymbol{\tau}}} _1} = {o_{P\mid\mathcal{F}_N}}(1) .

    Let

    {{\mathit{\boldsymbol{\tau}} }_2} : = \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})}}{{\widetilde \pi }_i} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})},

    then

    E\left\{ \frac{1}{Nr} \sum\limits_{i = 1}^N \frac{\widetilde{\bf{{\Omega}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde{\pi}_i} \,\middle\vert\, \mathcal{F}_N \right\} = \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}).

    From the proof of Lemma 1, it follows that

    E\left[ \left( \overset{\smile}{\bf{M}}_X^{j_1 j_2} - {\bf{M}}_X^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N \right] = {O_P}({r^{ - 1}}) = {o_P}(1).

Therefore {{\mathit{\boldsymbol{\tau}}} _2} = {o_{P\mid\mathcal{F}_N}}(1) , and (A.4) holds.

    Next, we will prove Theorems 2.1 and 2.2.

    Proof of Theorem 2.1. \overset{\smile}{\mathit{\boldsymbol{\beta}}} is the solution of {\mathit{\boldsymbol{Q}}^{*}}(\mathit{\boldsymbol{\beta}}) = \frac{1}{r}\sum\limits_{i = 1}^r {\frac{1}{{{{\widetilde \pi }_i}}}}{{\tilde{\mathit{\boldsymbol{\eta}}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}})} = {\mathbf{0}} , then

    E\left( \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) \,\middle\vert\, \mathcal{F}_N \right) = \frac{1}{Nr} \sum\limits_{i = 1}^r \sum\limits_{j = 1}^N \pi_j \frac{{\mathit{\boldsymbol{\eta}}}_{j}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_j} = \frac{1}{N} \sum\limits_{i = 1}^N {\mathit{\boldsymbol{\eta}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) = \frac{1}{N} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}).

    By Assumption A6, we have

    \begin{aligned} \text{Var}\left( \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) \,\middle\vert\, \mathcal{F}_N \right) & = \text{Var}\left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\widetilde{\pi}_i} \,\middle\vert\, \mathcal{F}_N \right) \\ & = \frac{1}{N^2 r^2} \sum\limits_{i = 1}^r \sum\limits_{j = 1}^N \pi_j \frac{\mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) {\mathit{\boldsymbol{\eta}}}_i^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_j^2} \\ & = \frac{1}{N^2 r} \sum\limits_{i = 1}^N \frac{{\mathit{\boldsymbol{\eta}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) {\mathit{\boldsymbol{\eta}}}_i^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_i} \\ & = O_P(r^{-1}). \end{aligned}

    Therefore, as r \to \infty , {N}^{-1} \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) - {N}^{-1} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}) \xrightarrow{} 0 for all {\boldsymbol \beta} \in \Lambda in conditional probability given \mathcal{F}_N . Thus, from Theorem 5.9 in [32], we have \left\| \overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} \right\| = o_{P\mid\mathcal{F}_N}(1) . By Taylor expansion,

    \begin{aligned} \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}(\overset{\smile}{\mathit{\boldsymbol{\beta}}}) = {\mathbf{0}} & = \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{1}{Nr} \sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r)}{\widetilde{\pi}_i} (\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). \end{aligned}

    By Lemma 2, it follows that

    \frac{1}{Nr} \sum\limits_{i = 1}^N \frac{\widetilde{\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r)}{\widetilde{\pi}_i} - \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = o_{P\mid\mathcal{F}_N}(1),

    then

    {\mathbf{0}} = \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) (\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1)(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}).

That is,

    \frac{1}{N}{\mathit{\boldsymbol{Q}}^{*}}({\mathit{\boldsymbol{\hat \beta}}_{\text {MLE}}}) + {{\bf{M}}_X}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}) + {o_{P\mid\mathcal{F}_N}}\left( {\left\| {\overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}} \right\|} \right) = {\mathbf{0}},

    we have

    \begin{equation} \begin{aligned} \overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} & = - {\bf{M}}_X^{-1} \left\{ \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N} \left( \left\| \overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} \right\| \right) \right\} \\ & = - {\bf{M}}_X^{-1} \mathit{\boldsymbol{V}}_{\text {C}}^{\frac{1}{2}} {\bf{V}}_{\text {C}}^{-\frac{1}{2}} \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + {\bf{M}}_X^{-1} o_{P\mid\mathcal{F}_N} \left( \left\| \overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} \right\| \right) \\ & = O_{P\mid\mathcal{F}_N} \left( r^{-\frac{1}{2}} \right) + o_{P\mid\mathcal{F}_N} \left( \left\| \overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} \right\| \right). \end{aligned} \end{equation} (A.5)

    By Lemma 1 and Assumption A3, {\bf{M}}_X^{ - 1} = {O_{P\mid\mathcal{F}_N}}\left(1 \right) , we have \overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} = {O_{P\mid\mathcal{F}_N}}\left({{r^{ - \frac{1}{2}}}} \right) .

    Proof of Theorem 2.2. By Lemma 1 and (A.5), as r \to \infty , conditional on \mathcal{F}_N , it holds that

    \begin{aligned} {\bf{V}}^{-\frac{1}{2}}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = - {\bf{V}}^{-\frac{1}{2}}{{\bf{M}}}_X^{-1}{\bf{V}}_{\text {C}}^{\frac{1}{2}}{\bf{V}}_{\text {C}}^{-\frac{1}{2}}\frac{1}{N}{\mathit{\boldsymbol{Q}}^{*}}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P|{\mathcal{F}_N}}\left(1\right). \end{aligned}

    By Lemma 1 and Slutsky's theorem, it follows that

    {{\bf{V}}^{ - \frac{1}{2}}}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathop \to \limits^d N_p({{\mathbf{0}}},{\bf I}).

    Proof of Theorem 2.3. To minimize the asymptotic variance tr({\bf{V}}) of \overset{\smile}{\mathit{\boldsymbol{\beta}}} , the optimization problem is

    \begin{equation} \left\{\begin{array}{l} \min tr({\bf{V}}) = \min \frac{1}{N^2 r} \sum\limits_{i = 1}^N \left[\frac{1}{\pi_i} \left\|{{\bf{M}}_X^{-1}} {\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}\right)\right\|^2\right], \\ \text { s.t. } \sum\limits_{i = 1}^N \pi_i = 1, \quad 0 \leq \pi_i \leq 1, \quad i = 1, \ldots, N. \end{array}\right. \end{equation} (A.6)

    Define g_i^{\text{mV}} = \left\|{{\bf{M}}_X^{-1}} {\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}\right)\right\|, \; i = 1, \ldots, N , it follows from Cauchy's inequality that

    \begin{aligned} {tr}({\bf{V}}) & = \frac{1}{N^2 r} \sum\limits_{i = 1}^N \left[ \frac{1}{\pi_i} \left\|{{\bf{M}}_X^{-1}} {\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}\right)\right\|^2 \right] \\ & = \frac{1}{N^2 r} \left( \sum\limits_{i = 1}^N \pi_i \right) \left\{ \sum\limits_{i = 1}^N \left[ \frac{1}{\pi_i} \left\|{{\bf{M}}_X^{-1}} {\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}\right)\right\|^2 \right] \right\} \\ &\ge \frac{1}{N^2 r} \left[ \sum\limits_{i = 1}^N \left\|{{\bf{M}}_X^{-1}} {\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}\right)\right\| \right]^2 \\ & = \frac{1}{N^2 r} \left[ \sum\limits_{i = 1}^N g_i^{\text{mV}} \right]^2. \end{aligned}

    The equality sign holds if and only if {\pi _i} \propto g_i^{\text{mV}} , therefore

    \pi_i^{\text{mV}} = \frac{g_i^{\text{mV}}}{\sum\limits_{j = 1}^N g_j^{\text{mV}}}

    is the optimal solution.

The proof of Theorem 2.4 is similar to that of Theorem 2.3.

    Lemma 3. If Assumptions A1–A4 and A6 hold, as r_0 \to \infty , r \to \infty and N \to \infty , conditional on \mathcal{F}_N , we have

    \begin{equation} {\overset{\smile}{\bf{M}}_X^{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}} - {{\bf{M}}_X} = {O_{P|{\mathcal{F}_N}}(r^{-\frac{1}{2}})}, \end{equation} (A.7)
    \begin{equation} {\bf{M}}_X^0 - {{\bf{M}}_X} = {O_{P|{\mathcal{F}_N}}}({r_0}^{ - \frac{1}{2}}), \end{equation} (A.8)
    \begin{equation} \frac{1}{N}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}}) = {O_{P|{\mathcal{F}_N}}}({r^{ - \frac{1}{2}}}), \end{equation} (A.9)
    \begin{equation} \frac{1}{N}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*0}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}}) = {O_{P|{\mathcal{F}_N}}}({{r_0}^{ - \frac{1}{2}}}), \end{equation} (A.10)
    \begin{equation} \frac{1}{N}{\bf{V}}_{\mathit{\text{C}}}^{\mathit{\text{opt}}- \frac{1}{2}}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})\mathop \to \limits^d N_p({{\mathbf{0}}},{\bf I}), \end{equation} (A.11)

    where

    \overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0} = \frac{1}{{Nr}}\sum\limits_{i = 1}^r {\frac{{{\widetilde{\bf{\Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}}{{\widetilde \pi _i^{\mathit{\text{opt}}}}}},
    {\bf{M}}_X^0 = \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{ \Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}}{{\widetilde \pi _i^{\mathit{\text{UNIF}}}}}}.

    Proof.

    \begin{aligned} E\left( \overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0} \,\middle\vert\, \mathcal{F}_N \right) & = E_{{\tilde{\boldsymbol \beta}}_0} \left[ E\left(\overset{\smile}{\bf{M}}_X^{{\tilde{\boldsymbol \beta}}_0} \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0\right) \right] \\ & = E_{{\tilde{\boldsymbol \beta}}_0} \left[ E \left( \frac{1}{Nr} \sum\limits_{i = 1}^r \frac{{\widetilde{\bf{{\Omega}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0 \right) \right] \\ & = E_{{\tilde{\boldsymbol \beta}}_0} \left[ E\left({\bf{M}}_X \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0\right) \right] \\ & = {\bf{M}}_X. \end{aligned}

    By Assumption A6, we have

    \begin{aligned} &E\left[ \left( \overset{\smile}{\bf{M}}_X^{{\tilde{\boldsymbol \beta}}_0, j_1 j_2} - {{\bf{M}}_X}^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N \right] \\ = & E_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0} \left\{ E \left[ \left( \overset{\smile}{\bf{M}}_X^{\tilde{\mathit{\boldsymbol{\beta}}}_0, j_1 j_2} - {\bf{M}}_X^{j_1 j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N, \tilde{\beta}_0 \right]\right\} \\ = & E_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0} \left[ \frac{1}{r} \sum\limits_{i = 1}^N {\pi}_i^{\text{opt}} \left( \frac{{\widetilde{\bf{{\Omega}}}}_{i}^{*j_1 j_2}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\boldsymbol \beta}}_{\text{MLE}})}{N {\pi}_i^{\text{opt}}} - {\bf{M}}_X^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N, \tilde{\boldsymbol \beta}_0 \right] \\ = & E_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0} \left[ \frac{1}{r} \sum\limits_{i = 1}^N {\pi}_i^{\text{opt}} \left( \frac{{\widetilde{\bf{{\Omega}}}}_{i}^{*j_1 j_2}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\boldsymbol \beta}}_{\text{MLE}})}{N {\pi}_i^{\text{opt}}} \right)^2 - \frac{1}{r} \left( {\bf{M}}_X^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N, {\widetilde{\boldsymbol \beta}}_0 \right] \\ \le& E_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0} \left[ \frac{1}{r} \sum\limits_{i = 1}^N {\pi}_i^{\text{opt}} \left( \frac{{\widetilde{\bf{{\Omega}}}}_{i}^{*j_1 j_2}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\boldsymbol \beta}}_{\text{MLE}})}{N {\pi}_i^{\text{opt}}} \right)^2 \,\middle\vert\, \mathcal{F}_N, \tilde{\boldsymbol \beta}_0 \right] \\ = & \frac{1}{r} \sum\limits_{i = 1}^N \frac{\left( {\bf{\Omega}}_{i}^{*j_1 j_2}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\boldsymbol \beta}}_{\text{MLE}}) \right)^2}{N^2 {{\pi}}_i^{\text{opt}}} \\ = & O_P(r^{-1}). \end{aligned}

    It follows from Chebyshev's inequality that (A.7) holds. Similarly, (A.8) also holds.

    E\left( \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N \right) = E_{\tilde{\boldsymbol \beta}_0} \left[ E\left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{{\eta}}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0 \right) \right] = \frac{1}{N} \sum\limits_{i = 1}^N \mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = {{\mathbf{0}}}.

    By Assumption A6, we have

    \begin{aligned} \text{Var}\left( \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N \right) & = E_{{\tilde{\boldsymbol \beta}}_0} \left\{\text{Var} \left[ \left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{{\eta}}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde{\pi}_i^{\text{opt}}}\right) \,\middle\vert\, \mathcal{F}_N, \tilde{\boldsymbol \beta}_0 \right] \right\}\\ & = \frac{1}{N^2 r} \sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) {\mathit{\boldsymbol{\eta}}}_{i}^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{{\pi}_i^{\text{opt}}} \\ & = O_P(r^{-1}). \end{aligned}

    Therefore, (A.9) and (A.10) follow from Markov's inequality.

    Let

    \mathit{\boldsymbol{\gamma}} _{i,{{\mathit{\boldsymbol{\tilde \beta}}}_0}}^{*} = \frac{{\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{N\widetilde \pi _i^{\text {opt}}}},

    so that {N}^{-1}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) = {r}^{-1}\sum\limits_{i = 1}^r {\mathit{\boldsymbol{\gamma}} _{i, {{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}} . For every \varepsilon > 0 ,

    \begin{aligned} &\sum\limits_{i = 1}^r E_{\tilde{\boldsymbol \beta}_0} \left\{E\left[ \left\| r^{-\frac{1}{2}} \mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*} \right\|^2 I \left( \left\| \mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*} \right\| > r^{\frac{1}{2}} \varepsilon \right) \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0 \right] \right\} \\ = & \frac{1}{r} \sum\limits_{i = 1}^r E_{{\tilde{\boldsymbol \beta}}_0} \left\{ E \left[ \left\| \mathit{\boldsymbol{\gamma}}_{i,{\tilde{\boldsymbol \beta}_0}}^{*} \right\|^2 I \left( \left\| \mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*} \right\| > r^{\frac{1}{2}} \varepsilon \right) \,\middle\vert\, \mathcal{F}_N, \tilde{\boldsymbol \beta}_0 \right] \right\} \\ \le & \frac{1}{r^{\frac{3}{2}} \varepsilon} \sum\limits_{i = 1}^r E_{\tilde{\boldsymbol \beta}_0} \left[ E \left( \left\| \mathit{\boldsymbol{\gamma}}_{i,{\tilde{\boldsymbol \beta}}_0}^{*} \right\|^3 \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0 \right) \right] \\ = & \frac{1}{r^{\frac{1}{2}} \varepsilon} \frac{1}{N^3} \sum\limits_{i = 1}^N \frac{\left\| {\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})} \right\|^3}{\pi_i^{\text{opt}^2}} \\ = & O_P(r^{-\frac{1}{2}}) = o_P(1). \end{aligned}

    This shows that the Lindeberg-Feller conditions are satisfied in probability. Therefore (A.11) is true.
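    By the Lindeberg–Feller central limit theorem, applied conditionally on \mathcal{F}_N , this yields the conditional asymptotic normality invoked at the start of the proof of Theorem 2.6:

    {\bf{V}}_{\text {C}}^{ - \frac{1}{2}} \frac{1}{N} \mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \mathop \to \limits^d N({{\mathbf{0}}}, {\bf I}),

    where {\bf{V}}_{\text {C}} denotes the conditional covariance \text{Var}\left( \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N \right) computed above.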

    Lemma 4. If Assumptions A1–A7 hold, as r_0 \to \infty , r \to \infty and N \to \infty , for all {{{\boldsymbol{s}}}_{r_0}} \to {\mathbf{0}} and {{{\boldsymbol{s}}}_r} \to {\mathbf{0}} , conditional on \mathcal{F}_N , we have

    \begin{equation} \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{ \Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}} + {{{\boldsymbol{s}}}_{r_0}})}}{{\widetilde \pi _i^{\mathit{\text{opt}}}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})} = {o_{P\mid\mathcal{F}_N}}(1), \end{equation} (A.12)
    \begin{equation} \frac{1}{{N{r}}}\sum\limits_{i = 1}^{{r}} {\frac{{{\widetilde{\bf{ \Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}} + {{{\boldsymbol{s}}}_r})}}{{\widetilde \pi _i^\mathit{\text{opt}}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})} = {o_{P\mid\mathcal{F}_N}}(1). \end{equation} (A.13)

    Proof. The left-hand side of Eq (A.12) can be decomposed as

    \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}} + {\mathit{\boldsymbol{s}}_{r_0}})}}{{{{\widetilde \pi }_i^{\text {opt}}}}}} - \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}} + \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}.

    Let

    \mathit{\boldsymbol{\tau}}_1^0 : = \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}} + {\mathit{\boldsymbol{s}}_{r_0}})}}{{\widetilde \pi _i^{\text{opt}}}}} - \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{ \Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}},

    then by Assumption A7, we have

    \begin{aligned} E \left( \left\| \mathit{\boldsymbol{\tau}}_1^0 \right\|_S \,\middle\vert\, \mathcal{F}_N \right) \le & E_{\tilde{\mathit{\boldsymbol{\beta}}}_0} \left\{ E \left[ \frac{1}{N r_0} \sum\limits_{i = 1}^{r_0} \frac{1}{\tilde{\pi}_i^{\text{opt}}} \left\| {\widetilde{\bf{\Omega}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} +{\mathit{\boldsymbol{s}}_{r_0}}) - {\widetilde{\bf{\Omega}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|_S \,\middle\vert\, \mathcal{F}_N, {\tilde{\mathit{\boldsymbol{\beta}}}}_0 \right] \right\} \\ = & \frac{1}{N} \sum\limits_{i = 1}^N \left\|{\bf{\Omega}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {\mathit{\boldsymbol{s}}_{r_0}}) - {\bf{\Omega}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|_S \\ \le& \frac{1}{N} \sum\limits_{i = 1}^N m_2(\mathit{\boldsymbol{W}}_i) \left\| {\mathit{\boldsymbol{s}}_{r_0}} \right\| \\ = & o_P(1). \end{aligned}

    It follows from Markov's inequality that \mathit{\boldsymbol{\tau}}_1^0 = {o_{P\mid\mathcal{F}_N}}(1) .
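    Written out, the Markov step is: for every \varepsilon > 0 ,

    P\left( \left\| \mathit{\boldsymbol{\tau}}_1^0 \right\|_S > \varepsilon \,\middle\vert\, \mathcal{F}_N \right) \le \frac{1}{\varepsilon}\, E \left( \left\| \mathit{\boldsymbol{\tau}}_1^0 \right\|_S \,\middle\vert\, \mathcal{F}_N \right) = o_P(1),

    which is precisely the meaning of \mathit{\boldsymbol{\tau}}_1^0 = {o_{P\mid\mathcal{F}_N}}(1) .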

    Let

    \mathit{\boldsymbol{\tau}}_2^0 : = \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})},

    then

    \begin{aligned} E_{\tilde{\boldsymbol \beta}_0} \left\{ E \left[ \frac{1}{N r_0} \sum\limits_{i = 1}^{r_0} \frac{{\widetilde{\bf{\Omega}}}_{i}^*\left(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}\right)}{\widetilde{\pi}_i^{\text {opt}}} \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0 \right] \right\} & = \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). \end{aligned}

    By the same argument as in the proof of Lemma 3 (with r replaced by r_0 ), it follows that

    E\left[ \left(\overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0, j_1 j_2} - {{\bf{M}}}_X^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N \right] = {O_P}({r_0}^{-1}) = {o_P}(1),

    and hence \mathit{\boldsymbol{\tau}}_2^0 = {o_{P\mid\mathcal{F}_N}}(1) . Therefore, (A.12) holds; (A.13) follows in the same way.

    Next, we will prove Theorems 2.5 and 2.6.

    Proof of Theorem 2.5.

    E\left( \frac{1}{N} \mathit{\boldsymbol{Q}}_{{\tilde{\boldsymbol \beta}}_0}^{*}(\mathit{\boldsymbol{\beta}}) \,\middle\vert\, \mathcal{F}_N \right) = E_{\tilde{\boldsymbol \beta}_0} \left[ E\left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, \tilde{\boldsymbol \beta}_0 \right) \right] = \frac{1}{N} \sum\limits_{i = 1}^N \mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) = \frac{1}{N} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}).

    By Assumption A6, we have

    {\text{Var}}\left( \frac{1}{N} {\mathit{\boldsymbol{Q}}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}({\mathit{\boldsymbol{\beta}}}) \,\middle\vert\, \mathcal{F}_N \right) = E_{\tilde{\mathit{\boldsymbol{\beta}}}_0} \left\{ \text {Var} \left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, \tilde{\mathit{\boldsymbol{\beta}}}_0 \right) \right\} = \frac{1}{N^2 r} \sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) \mathit{\boldsymbol{\eta}}_{i}^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_i^{\text{opt}}} = O_P(r^{-1}).

    Hence, as r \to \infty , {N}^{-1} \mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}(\mathit{\boldsymbol{\beta}}) - {N}^{-1} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}) \to {\mathbf{0}} in conditional probability given \mathcal{F}_N , for all {\boldsymbol \beta} \in \Lambda .

    Since \check {\mathit{\boldsymbol{\beta}}} is the solution of \mathit{\boldsymbol{Q}}_{{{{\tilde{\boldsymbol \beta}}}_0}}^{two - step}(\mathit{\boldsymbol{\beta}}) = {\mathbf{0}} , we have

    \begin{equation} {\mathbf{0}} = \frac{1}{N}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{{\beta}}}}}_0}}^{two - step}(\check{\mathit{\boldsymbol{\beta}}}) = \frac{r}{{r + {r_0}}}\frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\boldsymbol \beta}}_0}^{*}({\check{\mathit{\boldsymbol{\beta}}}}) + \frac{{{r_0}}}{{r + {r_0}}}\frac{1}{N}{\mathit{\boldsymbol{Q}}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*0}(\check{\mathit{\boldsymbol{\beta}}}). \end{equation} (A.14)

    By Lemma 4, we have

    \frac{1}{N r_0} \sum\limits_{i = 1}^{r_0} \frac{{\widetilde{\bf{\Omega}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{{\boldsymbol{s}}}_{r_0}})}{\widetilde{\pi}_i^{\text{opt}}} = \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P|\mathcal{F}_N}(1) = {\bf{M}_X} + {o_{P|\mathcal{F}_N}(1)},

    and

    \frac{1}{{Nr}}\sum\limits_{i = 1}^r {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}} + {{{\boldsymbol{s}}}_{r}})}}{{\widetilde \pi _i^{\text {opt}}}}} = {{\bf{M}}_X} + {o_{P|\mathcal{F}_N}(1)}.

    By Taylor expansion, we have

    \begin{equation} \begin{aligned} \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\mathit{\boldsymbol{\beta}}}_0}^{*}(\check{\mathit{\boldsymbol{{\beta}}}}) & = \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}(\hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}) + \frac{1}{Nr} \sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{{\boldsymbol{s}}}_{r}})}{\widetilde{\pi}_i^{\text{opt}}} (\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \\ & = \frac{1}{N} \mathit{\boldsymbol{Q}}_{{\tilde{\boldsymbol \beta}_0}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + {\bf{M}}_X (\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P|\mathcal{F}_N}(1) (\check {\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). \end{aligned} \end{equation} (A.15)

    Similarly,

    \begin{equation} \frac{1}{N}{\mathit{\boldsymbol{Q}}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*0}({\check{\mathit{\boldsymbol{\beta}}}}) = \frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*0}({\hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}}) + {{\bf{M}}_X}(\check{\mathit{\boldsymbol{\beta}}} - {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) + {o_{P|\mathcal{F}_N}}(1)(\check{\mathit{\boldsymbol{\beta}}} - {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}). \end{equation} (A.16)

    Since {{{r_0}}}{{r}^{-1}} \to 0 and {N}^{-1} \mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*0}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) = {O_{P|\mathcal{F}_N}}\left(r_0^{ - \frac{1}{2}}\right) , we have

    \frac{r_0}{r + r_0} \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}(\hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}) = \frac{r_0}{r + r_0} {O_{P|\mathcal{F}_N}}(r_0^{-\frac{1}{2}}) = {o_{P|\mathcal{F}_N}}(r^{-\frac{1}{2}}).

    Combining this with (A.14)–(A.16), we have

    \begin{equation} {\check {\boldsymbol \beta}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} = {O_{P|\mathcal{F}_N}} \left( r^{ - \frac{1}{2}} \right) + {o_{P|\mathcal{F}_N}} \left(\left\| {\check {\boldsymbol \beta}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} \right\| \right), \end{equation} (A.17)

    which implies that {\check {\boldsymbol \beta}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} = {O_{P|\mathcal{F}_N}} \left(r^{ - \frac{1}{2}} \right) .
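    To make the two-step construction behind (A.14) concrete, here is a minimal numerical sketch. It is an illustration under simplifying assumptions only: the function eta below is a plain logistic-regression score standing in for the paper's measurement-error-corrected \mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) , its Jacobian stands in for {\bf{\Omega}}_i^* , and the subsampling probabilities are taken proportional to \|\mathit{\boldsymbol{\eta}}_i\| in the spirit of the g_i^{\text{mVc}} terms used in the proof of Theorem 2.6; the names eta, weighted_score, weighted_jacobian, and pi_opt are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in estimating function (ordinary logistic score), NOT the corrected eta_i^*.
def eta(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return (y - p)[:, None] * X                       # (n, d): one contribution per unit

def weighted_score(beta, X, y, pi, N):
    # (1/(N r)) * sum_i eta_i(beta) / pi_i for a subsample drawn with probabilities pi
    r = len(y)
    return eta(beta, X, y).T @ (1.0 / (N * r * pi))

def weighted_jacobian(beta, X, pi, N):
    # (1/(N r)) * sum_i (d eta_i / d beta) / pi_i, the weighted analogue of M_X
    r = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p) / (N * r * pi)
    return -np.einsum('i,ij,ik->jk', w, X, X)

# Simulated full data of size N
N, d, r0, r = 20000, 3, 200, 1000
X = rng.normal(size=(N, d))
beta_true = np.array([0.5, -1.0, 0.25])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

# Step 1: uniform pilot subsample of size r0 and a pilot estimate
idx0 = rng.choice(N, size=r0, replace=True)
pi0 = np.full(r0, 1.0 / N)
beta_pilot = np.zeros(d)
for _ in range(20):
    beta_pilot -= np.linalg.solve(weighted_jacobian(beta_pilot, X[idx0], pi0, N),
                                  weighted_score(beta_pilot, X[idx0], y[idx0], pi0, N))

# mVc-type probabilities approximated from the pilot estimate
g = np.linalg.norm(eta(beta_pilot, X, y), axis=1) + 1e-12
pi_opt = g / g.sum()

# Step 2: subsample of size r with the approximated probabilities
idx1 = rng.choice(N, size=r, replace=True, p=pi_opt)
pi1 = pi_opt[idx1]

# Solve the combined two-step estimating equation (A.14) by Newton's method
beta = beta_pilot.copy()
w_r, w_r0 = r / (r + r0), r0 / (r + r0)
for _ in range(20):
    Q = w_r * weighted_score(beta, X[idx1], y[idx1], pi1, N) \
        + w_r0 * weighted_score(beta, X[idx0], y[idx0], pi0, N)
    J = w_r * weighted_jacobian(beta, X[idx1], pi1, N) \
        + w_r0 * weighted_jacobian(beta, X[idx0], pi0, N)
    beta -= np.linalg.solve(J, Q)

print("two-step subsample estimate:", beta)
```

    The combined equation weights the two subsamples by r/(r + r_0) and r_0/(r + r_0) exactly as in (A.14), which is why, for r_0/r \to 0 , the pilot term is asymptotically negligible in Theorem 2.5.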

    Proof of Theorem 2.6. By Lemma 3, \frac{1}{N}{\bf{V}}_{\text {C}}^{ - \frac{1}{2}}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})\mathop \to \limits^d N({{\mathbf{0}}}, {\bf I}) . To replace {\bf{V}}_{\text {C}} by {\bf{V}}_{\text {C}}^{\text{opt}} , we bound

    \begin{aligned} \left\| {\bf{V}}_{\text {C}} - {\bf{V}}_{\text {C}}^{\text{opt}} \right\|_S & = \left\| \frac{1}{N^2 r} \sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}) \mathit{\boldsymbol{\eta}}_{i}^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_i^{\text{op}}} - \frac{1}{N^2 r} \sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) {\mathit{\boldsymbol{\eta}}}_{i}^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_i^{\text{opt}}} \right\|_S \\ &\le \frac{1}{N^2 r} \sum\limits_{i = 1}^N \left\| \frac{1}{\pi_i^{\text{op}}} - \frac{1}{\pi_i^{\text{opt}}} \right\| \left\| {\mathit{\boldsymbol{\eta}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|^2 \\ & = \frac{1}{r} \sum\limits_{i = 1}^N \left\| 1 - \frac{\pi_i^{\text{op}}}{\pi_i^{\text{opt}}} \right\| \frac{\left\| {\mathit{\boldsymbol{\eta}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|^2}{N^2 \pi_i^{\text{op}}}. \end{aligned}

    Taking \pi _i^{\text{mVc}} as an example, by Assumption A4, the bound above can be further developed as

    \begin{aligned} &\frac{1}{r} \sum\limits_{i = 1}^N \left\| 1 - \frac{\pi_i^{\text{mVc}}}{\pi_i^{\text{mVct}}} \right\| \frac{\left\| \mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|^2}{N^2} \frac{\sum\limits_{j = 1}^N g_j^{\text{mVc}}}{g_i^{\text{mVc}}} \\ = & \frac{1}{r} \sum\limits_{i = 1}^N \left\| 1 - \frac{\pi_i^{\text{mVc}}}{\pi_i^{\text{mVct}}} \right\| \frac{\left\| \mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|^2}{N^2} \frac{\sum\limits_{j = 1}^N \left\| {\mathit{\boldsymbol{\eta}}}_j^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\| }{\left\| \mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\| } \\ \le & \frac{1}{r} \sum\limits_{i = 1}^N \left\| 1 - \frac{\pi_i^{\text{mVc}}}{\pi_i^{\text{mVct}}} \right\| \frac{\left\| {\mathit{\boldsymbol{\eta}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|}{N} \frac{\sum\limits_{j = 1}^N \left\| {\mathit{\boldsymbol{\eta}}}_j^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\| }{N} \\ \le &\frac{1}{r} \left( \frac{1}{N} \sum\limits_{i = 1}^N \left\| 1 - \frac{\pi_i^{\text{mVc}}}{\pi_i^{\text{mVct}}} \right\|^2 \right)^{\frac{1}{2}} \left( \sum\limits_{i = 1}^N \frac{\left\| {\mathit{\boldsymbol{\eta}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\|^2}{N} \right)^{\frac{1}{2}} \left(\frac{\sum\limits_{j = 1}^N \left\| {\mathit{\boldsymbol{\eta}}}_j^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right\| }{N} \right) \\ = & {o_{P\mid\mathcal{F}_N}}\left( {{r^{ - 1}}} \right). \end{aligned}

    Therefore {\left\| {{{\bf{V}}_{\text {C}}} - {\bf{V}}_{\text {C}}^{\text{opt}}} \right\|_S} = {o_{P\mid\mathcal{F}_N}}\left({{r^{ - 1}}} \right) , and

    \begin{aligned} {\bf{V}}_{\text {opt}}^{-\frac{1}{2}}(\check{\mathit{\boldsymbol{{\beta}}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) & = - {\bf{V}}_{\text {opt}}^{-\frac{1}{2}}{\bf{M}}_X^{-1}\frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{two-step}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P|\mathcal{F}_N}(1) \\ & = - {\bf{V}}_{\text {opt}}^{-\frac{1}{2}}{\bf{M}}_X^{-1} \left[ \frac{r}{r + r_0} \frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{r_0}{r + r_0} \frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*0}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \right] + o_{P|\mathcal{F}_N}(1) \\ & = - {\bf{V}}_{\text {opt}}^{-\frac{1}{2}}{\bf{M}}_X^{-1}{\left( {\bf{V}}_{\text {C}}^{\text {opt}} \right)^{\frac{1}{2}}}{\left( {\bf{V}}_{\text {C}}^{\text {opt}} \right)^{-\frac{1}{2}}}\frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P|\mathcal{F}_N}(1), \end{aligned}

    which implies that

    {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)^{\frac{1}{2}}}{\left\{ {{\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)}^{\frac{1}{2}}}} \right\}^{\rm T}} = {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)^{\frac{1}{2}}}{\left( {{\bf{V}}_{\text {C}}^{{\text {opt}}}} \right)^{\frac{1}{2}}}{\bf{M}}_X^{ - 1}{\bf{V}}_{\text {opt}}^{ - \frac{1}{2}} = {\bf{I}}.

    Therefore, since the matrix {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)^{\frac{1}{2}}} is orthogonal and, by Lemma 3 together with {\left\| {{{\bf{V}}_{\text {C}}} - {\bf{V}}_{\text {C}}^{\text{opt}}} \right\|_S} = {o_{P\mid\mathcal{F}_N}}\left(r^{-1}\right) , {\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)^{-\frac{1}{2}}}\frac{1}{N}\mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \mathop \to \limits^d N({{\mathbf{0}}}, {\bf I}) , we conclude that

    {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}({\check{\mathit{\boldsymbol{\beta}}}} - {{\mathit{\boldsymbol{\hat {\beta}}}}_{\text{MLE}}})\mathop \to \limits^d N_p({{\mathbf{0}}},{\bf I}).
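    As a usage note (a standard consequence of the limit above, not part of the original proof): by the continuous mapping theorem,

    (\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})^{\rm T} {\bf{V}}_{\text {opt}}^{-1} (\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \mathop \to \limits^d \chi_p^2

    conditionally on \mathcal{F}_N , which underlies Wald-type inference on the full-data estimator from the subsample.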


