A novel concentration inequality for the sum of independent sub-Gaussian variables with random, dependent weights is introduced in statistical settings for high-dimensional data. The random dependent weights are functions of regularized estimators. We apply the proposed concentration inequality to obtain a high-probability bound on the stochastic Lipschitz constant of the negative binomial loss functions involved in Lasso-penalized negative binomial regressions, and we use this bound to study oracle inequalities for Lasso estimators. Additionally, a similar concentration inequality is derived for a randomly weighted sum of independent centered exponential family variables.
Citation: Huiming Zhang, Hengzhen Huang. Concentration for multiplier empirical processes with dependent weights[J]. AIMS Mathematics, 2023, 8(12): 28738-28752. doi: 10.3934/math.20231471
Over the last two decades, modern data collection techniques have enabled scientists and engineers to access and load vast numbers of variables as random data in their experiments. Probability theory provides the mathematical foundation for statistics, and data-driven problems have led to various new advances in statistical research, which in turn contribute new and challenging problems in probability for further study. For instance, the rapid development of high-dimensional statistics has spurred the growth of probability theory and even pure mathematics, including concentration inequalities, random matrix theory, geometric functional analysis and more [2,16].
The emergence of high-throughput data has led to a surge in statistical research on complex data, particularly on high-dimensional data and statistical learning [14,20]. This trend has gained traction in various scientific fields despite the high cost of measurements. Data sets are typically small, with only tens or hundreds of observations, and limited computing power often restricts the size of usable finite samples. As a result, modern statisticians and data scientists have shifted their focus from asymptotic to non-asymptotic analysis, as the latter can handle small sample sizes and large model dimensions. Concentration inequalities play a crucial role in high-dimensional statistical inference, since they yield explicit non-asymptotic error bounds as functions of the sample size, sparsity level and dimension. When analyzing the various error bounds of regularized estimators, concentration inequalities are indispensable tools [3,20].
When the random variables are unbounded, the classical Hoeffding inequality [8] fails to provide a non-asymptotic analysis. We need the concept of sub-Gaussian random variables [9] to obtain tight Hoeffding-type concentration inequalities for sums of independent random variables. A centered random variable (r.v.) $X$ is called sub-Gaussian ($X\sim\mathrm{subG}(\sigma^2)$) if $Ee^{sX}\le e^{s^2\sigma^2/2}$ for all $s\in\mathbb{R}$, where $\sigma>0$ is the sub-Gaussian parameter. From Chernoff's inequality, the exponential decay of the sub-Gaussian tail is obtained by $P(X\ge t)\le\inf_{s>0}\exp\{-st\}E\exp\{sX\}\le\inf_{s>0}\exp(-st+\tfrac{\sigma^2s^2}{2})=\exp(-\tfrac{t^2}{2\sigma^2})$, minimizing the upper bound by putting $s=t/\sigma^2$. Moreover, for independent $\{X_i\}_{i=1}^n$ with $X_i\sim\mathrm{subG}(\sigma_i^2)$, we have the sub-Gaussian concentration inequality
$$P\Big(\Big|\sum_{i=1}^nX_i\Big|\ge t\Big)\le2\exp\Big\{-\frac{t^2}{2\sum_{i=1}^n\sigma_i^2}\Big\},\qquad t\ge0, \tag{1.1}$$
for any variance proxies $\{\sigma_i^2\}_{i=1}^n$ of $\{X_i\}_{i=1}^n$ (see Theorem 1.5 in [1]). Define the $L_p$-norm of an r.v. $X$ as $\|X\|_p:=(E|X|^p)^{1/p}$. An alternative form of the sub-Gaussian parameter is given by the sub-Gaussian norm $\|\cdot\|_{\theta_2}$: for a zero-mean r.v. $X$, it is defined as $\|X\|_{\theta_2}:=\sup_{p\ge1}\big[\frac{EX^{2p}}{(2p-1)!!}\big]^{1/(2p)}$ (see page 23 in [1]).
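As a quick sanity check (an illustration we add, not part of the paper; numpy is assumed available), the following snippet estimates this norm by Monte Carlo for a standard normal variable, for which $EX^{2p}=(2p-1)!!$ so the norm equals 1, and confirms numerically that the normalizer $(2p-1)!!$ coincides with $(2p)!/(2^pp!)$, the form used later in the proof of Theorem 1.

```python
import math
import numpy as np

def double_factorial_odd(p):
    # (2p-1)!! written as (2p)! / (2^p * p!)
    return math.factorial(2 * p) / (2 ** p * math.factorial(p))

def theta2_norm_mc(sample, p_max=6):
    # Monte Carlo estimate of sup_{p>=1} [ E X^{2p} / (2p-1)!! ]^{1/(2p)}
    vals = []
    for p in range(1, p_max + 1):
        moment = np.mean(sample ** (2 * p))
        vals.append((moment / double_factorial_odd(p)) ** (1.0 / (2 * p)))
    return max(vals)

rng = np.random.default_rng(0)
x = rng.standard_normal(10**6)
# For N(0,1), E X^{2p} = (2p-1)!!, so every ratio is 1 and the norm is 1.
print(theta2_norm_mc(x))                       # ~ 1.0 up to Monte Carlo error
# The two normalizers coincide: (2p-1)!! = (2p)!/(2^p p!).
print([double_factorial_odd(p) == math.prod(range(1, 2 * p, 2)) for p in range(1, 6)])
```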
Corollary 1.7 in [15] extended the sub-Gaussian concentration inequality (1.1) to the weighted sum of independent sub-Gaussian random variables with fixed weights.
Lemma 1. [Concentration for weighted sub-Gaussian sum] Let $Y_1,\dots,Y_n$ be $n$ independent r.v.s with $Y_i\sim\mathrm{subG}(\sigma_i^2)$. Define $\sigma^2=\max_{1\le i\le n}\sigma_i^2<\infty$. For any $w:=(w_1,\cdots,w_n)^\top$, we have
$$P\Big(\Big|\sum_{i=1}^nw_iY_i\Big|>t\Big)\le2\exp\Big(-\frac{t^2}{2\sigma^2\|w\|_2^2}\Big).$$
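The following Monte Carlo sketch (an illustration we add, with Gaussian $Y_i$ and arbitrary fixed weights) compares the empirical tail of the weighted sum with the bound of Lemma 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 200_000
w = rng.uniform(-1.0, 1.0, size=n)     # fixed, non-random weights
sigma2 = 1.0                           # each Y_i ~ N(0,1) is subG(1)

Y = rng.standard_normal((reps, n))
S = Y @ w                              # weighted sums sum_i w_i Y_i

for t in (2.0, 4.0, 6.0):
    emp = np.mean(np.abs(S) > t)
    bound = 2 * np.exp(-t**2 / (2 * sigma2 * np.sum(w**2)))
    print(f"t={t}: empirical {emp:.4g} <= bound {bound:.4g}")
```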
However, if the $w_i$'s are random in Lemma 1, the story is totally different. The goal of this paper is to obtain novel theoretical results on concentration inequalities for sums of dependent variables with random weights, in a high-dimensional data setting. Our theory is motivated by the non-asymptotic oracle inequalities of regularized estimators in high-dimensional negative binomial regressions [21], and by the concentration of random Lipschitz coefficients associated with empirical loss functions [4]. Our setting is different from classical multiplier empirical processes serving multiplier bootstrap inference, where the multipliers are random variables independent of $\{Y_i\}_{i=1}^n$ (see Chapter 2.9 of [17] and [6,7]). Mendelson [11] studied concentration inequalities for the centered multiplier process indexed by a functional class, where the i.i.d. multipliers need not be independent of the original empirical processes. In analyses of high-dimensional continuous-data regressions by empirical processes, researchers often resort to concentration inequalities for Lipschitz functions of strongly log-concave distributions (see Theorems 2.26 and 3.16 in [18]). For high-dimensional count-data regressions, our Section 3.3 discusses discrete distributions with a strongly log-concave structure, for which the definition of discrete strong log-concavity is hard to check (see (3.12) below); this strong assumption is usually intractable and unverifiable from the data. In contrast, the sub-Gaussian assumption for i.i.d. data is testable (see [23]).
In section two, we present the main results of the theory and demonstrate their applications in a class of high-dimensional generalized linear models. Theoretical proofs of the main results and some lemmas and additional results are given in section three. Finally, the conclusions are presented in section four.
When controlling a sum of functions of the random sample indexed by a common estimator $\hat\theta$, it is invalid to directly apply classical laws of large numbers and central limit theorems (or concentration inequalities for independent sums).
Formally, let $X_1,\dots,X_n$ be a random sample independently drawn from $P$ on a measurable space $(\mathcal{X},\mathcal{A})$. Given an estimator $\hat\theta$, we want to study the behavior of the centered sum of the functions $f_{\hat\theta}(X_i)$,
$$\frac1n\sum_{i=1}^n\big[f_{\hat\theta}(X_i)-Ef_{\hat\theta}(X_i)\big].$$
A possible solution is to prove a uniform version (the suprema of empirical processes, see [17]) over all possible $\hat\theta$ in a set $K$, which is usually stronger than what is needed:
$$\frac1n\sum_{i=1}^n\big[f_{\hat\theta}(X_i)-Ef_{\hat\theta}(X_i)\big]\le\sup_{\theta\in K}\Big|\frac1n\sum_{i=1}^n\big[f_{\theta}(X_i)-Ef_{\theta}(X_i)\big]\Big|.$$
The sum inside the supremum enjoys independence.
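To see why some uniform control is unavoidable, here is a toy illustration we add: if each weight is allowed to look at the data, e.g. $w_i=\mathrm{sign}(Y_i)$, the weighted average no longer concentrates around zero, and the quantity one can still control is the supremum over the whole weight class.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
Y = rng.standard_normal(n)

# Data-dependent weights: each w_i looks at Y_i itself.
w_dep = np.sign(Y)
print(np.mean(w_dep * Y))      # ~ E|Y| = sqrt(2/pi) ~ 0.80, far from 0

# Fixed (data-independent) weights of the same magnitude concentrate near 0.
w_fix = rng.choice([-1.0, 1.0], size=n)
print(np.mean(w_fix * Y))      # ~ 0, fluctuations of order 1/sqrt(n)

# What uniform arguments bound instead: the supremum over all sign vectors,
# sup_w |(1/n) sum_i w_i Y_i| = (1/n) sum_i |Y_i|.
print(np.mean(np.abs(Y)))
```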
In the following theorem, we extend Lemma 1 to dependent and random weights.
Theorem 1. [Concentration for weighted dependent sum] Let the $Y_i$'s be independent centered sub-Gaussian random variables with $\max_{1\le i\le n}\|Y_i\|_{\theta_2}<\infty$. Let the $w_i(\hat\theta)$'s be bounded functions of a bounded random vector $\hat\theta$ serving as the weights (they may depend on all the $Y_i$'s), where $\|\hat\theta\|_1\le r<\infty$ and $\max_{1\le i\le n}|w_i(\cdot)|\le1$. Then, with probability at least $1-\delta$,
$$\Big|\frac1n\sum_{i=1}^nw_i(\hat\theta)Y_i\Big|\le4\sqrt{\frac1n\sum_{i=1}^n\big\||Y_i-Y_i'|\big\|_{\theta_2}^2}\,\sqrt{\frac{\log\delta^{-1}}n}+2\sqrt{\frac1n\sum_{i=1}^nE(Y_i^2)}\,\sqrt{\frac{2\log(2p)}n}, \tag{2.1}$$
for all $\hat\theta$ with $\|\hat\theta\|_1\le r$, where $Y_i'$ is an independent copy of $Y_i$.
The first term in (2.1) is due to sub-Gaussian concentration, and the second term in (2.1) comes from the upper bound on the expectation of the supremum of the empirical process $f(Y):=\frac1n\sup_{\|\theta\|_1\le r}|\sum_{i=1}^nw_i(\theta)Y_i|$ (see the proof in Section 3).
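For concreteness, the following small helper (an illustration we add; the $\theta_2$-norms and second moments are assumed to be known or separately estimated, which goes beyond what the theorem itself provides) evaluates the right-hand side of (2.1) for given plug-in values.

```python
import numpy as np

def theorem1_bound(theta2_norms_diff, second_moments, p, delta):
    """Evaluate the right-hand side of (2.1).

    theta2_norms_diff : values of || |Y_i - Y_i'| ||_{theta_2} (assumed known or estimated)
    second_moments    : values of E[Y_i^2]
    p                 : the dimension appearing in log(2p)
    delta             : the bound holds with probability at least 1 - delta
    """
    theta2_norms_diff = np.asarray(theta2_norms_diff, dtype=float)
    second_moments = np.asarray(second_moments, dtype=float)
    n = len(theta2_norms_diff)
    term1 = 4 * np.sqrt(np.mean(theta2_norms_diff**2)) * np.sqrt(np.log(1 / delta) / n)
    term2 = 2 * np.sqrt(np.mean(second_moments)) * np.sqrt(2 * np.log(2 * p) / n)
    return term1 + term2

# Hypothetical example: n = 500, p = 1000, 99% confidence.
print(theorem1_bound(np.full(500, 2.0), np.full(500, 1.0), p=1000, delta=0.01))
```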
The concentration of random Lipschitz coefficients associated with empirical loss functions is crucial for deriving error bounds of Lasso or Elastic-net penalized high-dimensional generalized linear models (GLMs) in high-dimensional regressions. For more information, please refer to [3,4,10].
Definition 1. [Elastic-net or Lasso penalized loss problems] Let $\{(Y_i,X_i)\}_{i=1}^n$ be independent and identically distributed random variables taking values in $\mathbb{R}\times\mathbb{R}^p$, where $\{Y_i\}_{i=1}^n\sim Y$ are response variables and $\{X_i\}_{i=1}^n\sim X$ are covariates. Let $l(y,x,\beta)$ be a loss function of the parameter $\beta$ and data $(y,x)$. The empirical loss function is defined as
$$P_nl(Y,X,\beta):=\frac1n\sum_{i=1}^nl(Y_i,X_i,\beta).$$
Elastic-net (or Lasso) estimators are given by
$$\hat\beta=:\hat\beta(\lambda_1,\lambda_2)=\mathop{\arg\min}_{\beta\in\mathbb{R}^p}\big\{P_nl(Y,X,\beta)+\lambda_1\|\beta\|_1+\lambda_2\|\beta\|_2^2\big\}, \tag{2.2}$$
where λ1>0 and λ2≥0 are tuning parameters.
Define the minimizer
$$\beta^*=\mathop{\arg\min}_{\beta\in\mathbb{R}^p}E[l(Y,X,\beta)] \tag{2.3}$$
as the vector of true coefficients, where $l(Y,X,\beta)$ is the loss function. The $\ell_1$ ball is denoted as $S_R(\beta^*):=\{\beta\in\mathbb{R}^p:\|\beta-\beta^*\|_1\le R\}$. Theorem 1 can be used to establish exponential-type concentration inequalities for the local stochastic Lipschitz (LSL) constant:
$$\sup_{\beta\in S_R(\beta^*)}\frac{(P_n-P)\big[l(Y,X,\beta)-l(Y,X,\beta^*)\big]}{\|\beta-\beta^*\|_1}.$$
When studying error bounds $\|\hat\beta-\beta^*\|_1$ of Lasso or Elastic-net penalized high-dimensional GLMs, one must bound the dependent empirical process $\frac{(P_n-P)[l(Y,X,\hat\beta)-l(Y,X,\beta^*)]}{\|\hat\beta-\beta^*\|_1}$, with the LSL constant as its upper bound.
Next, we provide an example of the negative binomial loss in negative binomial regressions [21]. The negative binomial loss function is $l(y,x,\beta)=(\theta+y)\log(\theta+e^{x^\top\beta})-yx^\top\beta$, where $\theta$ is called the dispersion parameter. Denote the expected risk function as $Pl(Y,X,\beta):=El(Y,X,\beta)$. Let $l_1(y,x,\beta):=-y[x^\top\beta-\log(\theta+\exp\{x^\top\beta\})]$ and $l_2(x,\beta):=\theta\log(\theta+\exp\{x^\top\beta\})$; then
$$(P_n-P)[l(y,x,\beta)-l(y,x,\beta^*)]=(P_n-P)[l_1(y,x,\beta)-l_1(y,x,\beta^*)]+(P_n-P)[l_2(x,\beta)-l_2(x,\beta^*)].$$
Obtaining upper bounds for the first and second parts of the empirical process, $(P_n-P)(l_m(\beta^*)-l_m(\hat\beta))$ for $m=1,2$, is paramount for studying the error bound on $\|\hat\beta-\beta^*\|_1$.
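As an illustration of this loss and of the penalized problem (2.2), the sketch below codes the empirical negative binomial loss and the Elastic-net objective and minimizes it with a generic derivative-free optimizer; the toy data, tuning parameters and optimizer are hypothetical choices for illustration only and are not the algorithm used in [21].

```python
import numpy as np
from scipy.optimize import minimize

def nb_loss(beta, X, y, theta):
    # l(y, x, beta) = (theta + y) * log(theta + exp(x^T beta)) - y * x^T beta,
    # averaged over the sample: the empirical loss P_n l.
    eta = X @ beta
    return np.mean((theta + y) * np.log(theta + np.exp(eta)) - y * eta)

def penalized_objective(beta, X, y, theta, lam1, lam2):
    # Elastic-net objective from (2.2); lam2 = 0 gives the Lasso case.
    return nb_loss(beta, X, y, theta) + lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta**2)

# Toy data (hypothetical): n = 200 observations, p = 10 covariates, dispersion theta = 5.
rng = np.random.default_rng(3)
n, p, theta = 200, 10, 5.0
X = rng.uniform(-0.3, 0.3, size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = 1.0
mu = np.exp(X @ beta_true)
y = rng.negative_binomial(theta, theta / (theta + mu))   # NB responses with mean mu

res = minimize(penalized_objective, x0=np.zeros(p),
               args=(X, y, theta, 0.05, 0.0), method="Powell")
print(np.round(res.x, 2))
```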
Let λ be a positive constant that needs to be determined. We have
$$P\Big(\sup_{\beta\in S_R(\beta^*)}\frac{|(P_n-P)[l(Y,X,\beta)-l(Y,X,\beta^*)]|}{\|\beta-\beta^*\|_1}\ge\lambda\Big)\le P\Big(\sup_{\beta\in S_R(\beta^*)}\frac{|(P_n-P)[l_1(Y,X,\beta)-l_1(Y,X,\beta^*)]|}{\|\beta-\beta^*\|_1}\ge\frac\lambda2\Big)+P\Big(\sup_{\beta\in S_R(\beta^*)}\frac{|(P_n-P)[l_2(X,\beta)-l_2(X,\beta^*)]|}{\|\beta-\beta^*\|_1}\ge\frac\lambda2\Big). \tag{2.4}$$
Here, we assume that both $x$ and $\beta$ are bounded, and that $\theta$ is a known dispersion parameter. The second term in (2.4) is easy to handle if we apply McDiarmid's inequality (see Lemma 4 of [21]). However, the first term in (2.4) is hard to control since it contains the unbounded negative binomial variables $\{Y_i\}_{i=1}^n$. Zhang and Jia [21] used a concentration inequality for strongly log-concave discrete distributions to solve this problem, but the strong log-concavity property is difficult to check for discrete distributions (see (H.4) in [21]). The sub-Gaussian assumption is easy to verify for negative binomial variables, since the negative binomial distribution belongs to the exponential family when the dispersion parameter is given. When $\Theta$ is compact in (2.7) below, Proposition 3.2 in [20] shows that $\{Y_i\}_{i=1}^n$ is sub-Gaussian.
From the mean value form of Taylor's expansion, one has $\log(\theta+e^{x})-\log(\theta+e^{a})=\frac{e^{\tilde a}}{\theta+e^{\tilde a}}(x-a)$, where $\tilde a$ is some real number between $a$ and $x$. Let $X_i^\top\tilde\beta$ be some point between $X_i^\top\hat\beta$ and $X_i^\top\beta^*$, i.e., $\tilde\beta=(t_1\hat\beta_1,\dots,t_p\hat\beta_p)^\top+((1-t_1)\beta^*_1,\dots,(1-t_p)\beta^*_p)^\top$ for $\{t_j\}_{j=1}^p\subset[0,1]$. Observe that
$$(P_n-P)[l_1(\beta^*)-l_1(\hat\beta)]=-\frac1n\sum_{i=1}^n(Y_i-EY_i)\Big[X_i^\top(\beta^*-\hat\beta)-\log\Big(\frac{\theta+\exp\{X_i^\top\beta^*\}}{\theta+\exp\{X_i^\top\hat\beta\}}\Big)\Big]=-\frac1n\sum_{i=1}^n(Y_i-EY_i)\Big[X_i^\top(\beta^*-\hat\beta)-\frac{\exp\{X_i^\top\tilde\beta\}X_i^\top(\beta^*-\hat\beta)}{\theta+\exp\{X_i^\top\tilde\beta\}}\Big]=\frac1n\sum_{i=1}^n\frac{\theta X_i^\top(\hat\beta-\beta^*)}{\theta+\exp\{X_i^\top\tilde\beta\}}\cdot(Y_i-EY_i). \tag{2.5}$$
For a finite $M_0$, if $\hat\beta\in S_{M_0}(\beta^*)$, then $\tilde\beta\in S_{M_0}(\beta^*)$. This follows from $\|\tilde\beta-\beta^*\|_1\le\sum_{j=1}^pt_j|\hat\beta_j-\beta^*_j|\le\|\hat\beta-\beta^*\|_1\le M_0$. Suppose $\|X_i\|_\infty$ is uniformly bounded by $1/M_0$. Then (2.5) equals $\frac1n\sum_{i=1}^nw_i(\hat\beta)(Y_i-EY_i)$ with dependent weights
$$w_i(\hat\beta):=\frac{\theta X_i^\top(\hat\beta-\beta^*)}{\theta+\exp\{X_i^\top\tilde\beta\}}\quad\text{and}\quad|w_i(\hat\beta)|\le1.$$
Thus, the high probability upper bound in Theorem 1 is applicable to determine $\lambda/2$, i.e.,
$$\frac\lambda2=4\sqrt{\frac1n\sum_{k=1}^n\big\||Y_k-Y_k'|\big\|_{\theta_2}^2}\,\sqrt{\frac{\log\delta^{-1}}n}+2\sqrt{\frac1n\sum_{i=1}^nE(Y_i-EY_i)^2}\,\sqrt{\frac{2\log2p}n}. \tag{2.6}$$
In this section, let {Yi}ni=1 be exponential family random variables with density
$$f(y_i;\eta_i)=c(y_i)\exp\{y_i\eta_i-b(\eta_i)\},\qquad\eta_i\in\Theta. \tag{2.7}$$
Here, E(Yi)=˙b(ηi) and Var(Yi)=¨b(ηi). It should be noted that Proposition 3.2 in [20] shows that {Yi}ni=1 is sub-Gaussian if Θ is compact.
Under the distributional assumption (2.7), we will study Bernstein-type concentration inequalities for the randomly weighted sum of centered exponential family random variables (with different parameters $\eta_i$):
n∑i=1{wi(ˆθ)Yi−E[wi(ˆθ)Yi]}, |
where $\{w_i(\hat\theta)\}_{i=1}^n$ are called the multipliers (or random weights), and $\hat\theta$ is an estimator that may depend on $\{Y_i\}_{i=1}^n$.
Theorem 2. [Concentration inequalities for randomly weighted sums of exponential family r.v.s] Suppose $\{Y_i\}_{i=1}^n$ have densities of the form (2.7) and satisfy the moment conditions $E|Y_i|^k\le k!C_Y^k$ for all $k\ge1$, where $C_Y>0$ is a constant. We assume that
(i) Bounded weights: Wˆθ:=(w1(ˆθ),⋯,wn(ˆθ))⊤ is a random vector s.t. max1≤i≤n|wi(ˆθ)|≤w<∞;
(ii) Let E[|Yi|k|Wˆθ]=ρi,kE[|Yi|k] with Eρi,k=1;
(iii) There exists a non-decreasing sequence {un} and constant Cρ s.t. P(maxk≥1,1≤i≤nρi,k>un)≤Cρ/un. Then,
$$P\Big(\Big|\sum_{i=1}^n\big[w_i(\hat\theta)Y_i-E[w_i(\hat\theta)Y_i]\big]\Big|\ge t\Big)\le2\exp\Big\{-\frac{t^2}{16nu_n(wC_Y)^2+4wC_Yt}\Big\}+\frac{C_\rho}{u_n}, \tag{2.8}$$
where {wi(ˆθ)Yi} is dependent since each wi(ˆθ)Yi depends on a common estimator ˆθ from the data {Yi}ni=1.
It should be noted that our work here is related to Proposition 3.2 in [20], which concerns concentration inequalities for non-randomly weighted sums of exponential family random variables. For condition (ii), suppose that the estimator $\hat\theta$ converges to a true parameter $\theta^*$ almost surely; then $E[|Y_i|^k|W_{\theta^*}]=E[|Y_i|^k]$ since $W_{\theta^*}$ is non-random. The difference between the conditional and unconditional expectations is:
$$\max_{k\ge1,1\le i\le n}|\rho_{i,k}-1|=\max_{k\ge1,1\le i\le n}\frac{\big|E[|Y_i|^k|W_{\hat\theta}]-E[|Y_i|^k|W_{\theta^*}]\big|}{E[|Y_i|^k|W_{\theta^*}]}\le O_p(\|\hat\theta-\theta^*\|_1),$$
provided $E[|Y_i|^k|W_\beta]$ is an $\ell_1$-Lipschitz function of $\beta$. We call $P(\max_{k\ge1,1\le i\le n}\rho_{i,k}\ge u_n)\le C_\rho/u_n$ in assumption (iii) a high-level condition. Intuitively, due to the dependence in the summation, the randomly weighted sum loses some rate of convergence in the exponential inequality (the additional term $4wC_Yt$ appears), compared to the case of a non-randomly weighted sum. The assumption of a compact parameter space for the exponential family is key to obtaining sub-Gaussian-type concentration inequalities.
Our multiplier concentration inequality here is different from that of [11], which studies concentration upper bounds for centered multiplier empirical processes $\frac1{\sqrt n}\sum_{i=1}^n[W_iY_i-E(W_iY_i)]$ (the random weights $\{W_i\}$ and random variables $\{Y_i\}$ need not be independent); however, [11] assumes that $\{W_i\}$ is i.i.d. To the best of our knowledge, Theorem 2 is a new concentration inequality suitable for weighted sums of dependent random variables.
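A toy Monte Carlo sketch (ours, purely illustrative) of the setting of Theorem 2: Poisson responses, which are of the form (2.7), and a bounded weight that depends on every $Y_i$ through the sample mean $\hat\theta$; the centered, randomly weighted sum still shows fast tail decay, in line with (2.8). The constants $u_n$, $C_Y$ and $C_\rho$ are not evaluated here.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, mu = 200, 50_000, 3.0

Y = rng.poisson(mu, size=(reps, n))            # exponential-family responses
theta_hat = Y.mean(axis=1, keepdims=True)      # common estimator built from all Y_i

# Bounded weight (|w| <= 1) that depends on every Y_i through theta_hat.
w = 1.0 / (1.0 + (theta_hat - mu) ** 2)
S = (w * Y).sum(axis=1)                        # randomly weighted sum
S_centered = S - S.mean()                      # center by a Monte Carlo estimate of its mean

for t in (20, 40, 60):
    print(t, np.mean(np.abs(S_centered) > t))  # tails decay quickly with t
```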
Proof of Theorem 1: Let $Y=(Y_1,\cdots,Y_n)^\top$ be a vector of independent r.v.s in a space $\mathcal{Y}$, and define $(Y_1',\cdots,Y_n')^\top$ as an independent copy of $(Y_1,\cdots,Y_n)^\top$. For any function $f:\mathcal{Y}^n\to\mathbb{R}$, it is of interest to study the concentration of $f(Y)$ about its expectation. In the case of Theorem 1,
f(Y):=1nsup‖θ‖1≤r|n∑i=1wi(θ)Yi|. |
For z∈Y and k∈{1,…,n}, define the substitution operator
Skz:Yn→Yn by Skzy:=(y1,…,yk−1,z,yk+1,…,yn) |
and the centered conditional version of f
$$D_{f,Y_k}(y):=f(y_1,\dots,y_{k-1},Y_k,y_{k+1},\dots,y_n)-Ef(y_1,\dots,y_{k-1},Y_k',y_{k+1},\dots,y_n)=f(S^k_{Y_k}y)-Ef(S^k_{Y_k'}y)=E\big[f(S^k_{Y_k}y)-f(S^k_{Y_k'}y)\mid Y_k\big]. \tag{3.1}$$
Next, we use a sub-Gaussian concentration inequality for $f(Z)$ with a sharper constant, Corollary 4 in [22], which requires a $\|\cdot\|_{\theta_2}$-norm condition on the r.v.s $\{D_{f,Z_i}(z)\}_{i=1}^n$.
Lemma 2. If {Df,Zi(z)}ni=1 has finite ‖⋅‖θ2-norm for z∈Z, then f(Z)−Ef(Z)∼subG(8supz∈Z∑ni=1‖Df,Zi(z)‖2θ2) and
$$P\{f(Z)-Ef(Z)>t\}\le e^{-t^2/\big(16\sup_{z\in\mathcal{Z}}\sum_{i=1}^n\|D_{f,Z_i}(z)\|_{\theta_2}^2\big)},\qquad t\ge0.$$
From the identity in (3.1), we have
$$\begin{aligned}\|D_{f,Y_k}(y)\|_{\theta_2}&=\big\|f(y_1,\dots,y_{k-1},Y_k,y_{k+1},\dots,y_n)-Ef(y_1,\dots,y_{k-1},Y_k',y_{k+1},\dots,y_n)\big\|_{\theta_2}\\&=\Big\|E\Big[\tfrac1n\sup_{\|\theta\|_1\le r}\big|w_1(\theta)y_1+\cdots+w_{k-1}(\theta)y_{k-1}+w_k(\theta)Y_k+w_{k+1}(\theta)y_{k+1}+\cdots+w_n(\theta)y_n\big|\\&\qquad\quad-\tfrac1n\sup_{\|\theta\|_1\le r}\big|w_1(\theta)y_1+\cdots+w_{k-1}(\theta)y_{k-1}+w_k(\theta)Y_k'+w_{k+1}(\theta)y_{k+1}+\cdots+w_n(\theta)y_n\big|\,\Big|\,Y_k\Big]\Big\|_{\theta_2}\\&\le\frac1n\Big\|E\Big[\sup_{\|\theta\|_1\le r}|w_k(\theta)|\,|Y_k-Y_k'|\,\Big|\,Y_k\Big]\Big\|_{\theta_2}\le\frac1n\big\|E\big[|Y_k-Y_k'|\,\big|\,Y_k\big]\big\|_{\theta_2}. \end{aligned}\tag{3.2}$$
The conditional Jensen's inequality gives
$$E\big[\big|E[|Y_k-Y_k'|\mid Y_k]\big|^p\big]\le E\big[\big(E\{|Y_k-Y_k'|\mid Y_k\}\big)^p\big]=E\big[\big(E\{(|Y_k-Y_k'|^p)^{1/p}\mid Y_k\}\big)^p\big]\le E\big[E\{|Y_k-Y_k'|^p\mid Y_k\}\big]=E|Y_k-Y_k'|^p,\quad p\ge1. \tag{3.3}$$
The definition $\|X\|_{\theta_2}=\sup_{k\ge1}\big[\frac{2^kk!}{(2k)!}EX^{2k}\big]^{1/(2k)}$ (equivalent to the form given in Section 1, since $(2k)!/(2^kk!)=(2k-1)!!$) shows that $\|D_{f,Y_k}(y)\|_{\theta_2}\le\frac1n\big\||Y_k-Y_k'|\big\|_{\theta_2}$ by (3.2) and (3.3). Hence, we have $\sup_{z\in\mathcal{Z}}\sum_{i=1}^n\|D_{f,Z_i}(z)\|_{\theta_2}^2\le\frac1{n^2}\sum_{k=1}^n\big\||Y_k-Y_k'|\big\|_{\theta_2}^2$ in Lemma 2, which leads to
$$P\{f(Y)-Ef(Y)>t\}\le e^{-(nt)^2/\big(16\sum_{k=1}^n\||Y_k-Y_k'|\|_{\theta_2}^2\big)},\qquad t\ge0.$$
Let $\delta=e^{-(nt)^2/\big(16\sum_{k=1}^n\||Y_k-Y_k'|\|_{\theta_2}^2\big)}$, i.e., $t=4\sqrt{\frac1n\sum_{k=1}^n\||Y_k-Y_k'|\|_{\theta_2}^2}\sqrt{\frac{\log\delta^{-1}}n}$. We have
$$f(Y)\le t+Ef(Y)=4\sqrt{\frac1n\sum_{i=1}^n\big\||Y_i-Y_i'|\big\|_{\theta_2}^2}\,\sqrt{\frac{\log\delta^{-1}}n}+Ef(Y), \tag{3.4}$$
with probability at least 1−δ.
It remains to obtain a bound on $Ef(Y)$, which is upper bounded via the symmetrization theorem with different functions, Lemma 3 below. To see this, let $X_i=Y_i$ in Lemma 3 and $g_i(Y_i)=w_i(\theta)Y_i$ for $i=1,\cdots,n$.
Since the $w_i(\theta)$'s are a series of bounded functions of a common bounded vector $\theta$ with $\|\theta\|_1\le r$ and $\max_{1\le i\le n}|w_i(\cdot)|\le1$, for any vector $\theta$ with $\|\theta\|_1\le r$ there exists a sequence of vectors $\{a_{w_i}\}_{i=1}^n\subset\mathbb{R}^p$ with $\|a_{w_i}\|_\infty\le1/r$ such that
$$w_i(\theta)=a_{w_i}^\top\theta\le\|a_{w_i}\|_\infty\|\theta\|_1\le1. \tag{3.5}$$
Equation (3.5) and Lemma 3 imply
$$\begin{aligned}Ef(Y)&\le\frac2nE\Big(\sup_{\|\theta\|_1\le r}\Big|\sum_{i=1}^nw_i(\theta)\epsilon_iY_i\Big|\Big)=\frac2nE\Big(\sup_{\|\theta\|_1\le r}\Big|\sum_{i=1}^n\sum_{j=1}^p\epsilon_iY_ia_{w_ij}\theta_j\Big|\Big)=\frac2nE\Big(\sup_{\|\theta\|_1\le r}\Big|\sum_{j=1}^p\Big(\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\Big)\theta_j\Big|\Big)\\&\le\frac2nE\Big(\sup_{\|\theta\|_1\le r}\max_{1\le j\le p}\Big|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\Big|\cdot\|\theta\|_1\Big)\qquad\text{[by Hölder's inequality]}\\&\le\frac{2r}nE\Big(\max_{1\le j\le p}\Big|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\Big|\Big)=\frac{2r}nE\Big(E_\epsilon\max_{1\le j\le p}\Big|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\Big|\Big).\end{aligned}$$
Next, we apply the maximal inequality. By Corollary 7.5 in [20], with $E[\epsilon_iY_ia_{w_ij}|Y]=0$ and $|\epsilon_iY_ia_{w_ij}|\le\max_{1\le i\le n}\|a_{w_i}\|_\infty|Y_i|=r^{-1}|Y_i|$, one has
$$\frac{2r}nE\Big(E_\epsilon\max_{1\le j\le p}\Big|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\Big|\Big)\le\frac2n\sqrt{2\log(2p)}\,E\Big(\sqrt{\sum_{i=1}^nY_i^2}\Big)\overset{\text{[by Jensen's inequality]}}{\le}\frac2n\sqrt{2\log(2p)}\sqrt{E\sum_{i=1}^nY_i^2}=2\sqrt{\frac1n\sum_{i=1}^nEY_i^2}\,\sqrt{\frac{2\log(2p)}n}.$$
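The maximal inequality used in this step can be checked numerically. The sketch below (ours, for illustration) fixes $y$ and vectors $a_{w_i}$ with $\|a_{w_i}\|_\infty\le1/r$ and compares a Monte Carlo value of $E_\epsilon\max_j|\sum_i\epsilon_iy_ia_{w_ij}|$ with the bound $r^{-1}\sqrt{2\log(2p)}\sqrt{\sum_iy_i^2}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, r, reps = 100, 500, 2.0, 5000

y = rng.standard_normal(n)
A = rng.uniform(-1 / r, 1 / r, size=(n, p))     # columns a_{.j} with sup-norm <= 1/r

eps = rng.choice([-1.0, 1.0], size=(reps, n))   # Rademacher signs
Z = eps @ (y[:, None] * A)                      # Z[k, j] = sum_i eps_i y_i a_{ij}
lhs = np.abs(Z).max(axis=1).mean()              # E_eps max_j |sum_i eps_i y_i a_{ij}|
rhs = np.sqrt(2 * np.log(2 * p)) * np.sqrt(np.sum(y**2)) / r
print(lhs, "<=", rhs)
```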
Thus, $Ef(Y)\le2\sqrt{\frac1n\sum_{i=1}^nEY_i^2}\sqrt{\frac{2\log(2p)}n}$. Using (3.4),
$$f(Y)\le t+Ef(Y)=4\sqrt{\frac1n\sum_{i=1}^n\big\||Y_i-Y_i'|\big\|_{\theta_2}^2}\,\sqrt{\frac{\log\delta^{-1}}n}+2\sqrt{\frac1n\sum_{i=1}^nEY_i^2}\,\sqrt{\frac{2\log(2p)}n}, \tag{3.6}$$
with probability at least $1-\delta$.
Proof of Theorem 2: We will adopt the following result, which gives a moment inequality for the exponential family; it follows from the analytic properties of the absolute moments of exponential family random variables:
$$E|Y|^k\le k!C_Y^k,$$
see Proposition 5.2 in [20]. For notational simplicity, let $W:=W_{\hat\theta}$ and $W_i=w_i(\hat\theta)$. By using Taylor's expansion and the binomial coefficient formula, we have the following upper bound for the conditional moment generating function of $W_iY_i-E(W_iY_i)$, conditioning on the event $\{\max_{k\ge1,1\le i\le n}\rho_{i,k}\le u_n\}$:
$$\begin{aligned}E\big[e^{s(W_iY_i-E(W_iY_i))}\,\big|\,W\big]&=1+\sum_{m=2}^\infty\frac{s^m}{m!}E\big[(W_iY_i-E(W_iY_i))^m\,\big|\,W\big]\\&=1+\sum_{m=2}^\infty\frac{s^m}{m!}E\Big[\sum_{k=0}^m\binom mk(W_iY_i)^k(-E(W_iY_i))^{m-k}\,\Big|\,W\Big]\\&\le1+\sum_{m=2}^\infty\frac{s^m}{m!}\Big[w^m\sum_{k=0}^m\binom mkE(|Y_i|^k\,|\,W)(E|Y_i|)^{m-k}\Big]\\&\le1+\sum_{m=2}^\infty\frac{(2ws)^m}{m!}\max_{0\le k\le m}\big\{E(|Y_i|^k\,|\,W)(E|Y_i|)^{m-k}\big\}\\&\le1+u_n\sum_{m=2}^\infty\frac{(2ws)^m}{m!}\max_{0\le k\le m}\big\{E(|Y_i|^k)(E|Y_i|)^{m-k}\big\}\quad\text{(by assumption (ii), on the event }\{\max_{k\ge1,1\le i\le n}\rho_{i,k}\le u_n\}\text{)},\end{aligned}\tag{3.7}$$
for s∈(0,δ) with some δ>0.
Therefore, using $E|Y_i|^k\le k!C_Y^k$ (so that $\max_{0\le k\le m}\{E(|Y_i|^k)(E|Y_i|)^{m-k}\}\le m!C_Y^m$) and assuming $|2swC_Y|<1$, we have
$$E\big[e^{s(W_iY_i-E(W_iY_i))}\,\big|\,W\big]\le1+u_n\sum_{m=2}^\infty\frac{(2ws)^m}{m!}\big[m!C_Y^m\big]=1+u_n(2swC_Y)^2\sum_{m=2}^\infty(2swC_Y)^{m-2}=1+\frac{u_n(2swC_Y)^2}{1-2swC_Y}\le e^{\frac{u_n(2swC_Y)^2}{1-2swC_Y}}. \tag{3.8}$$
Define the randomly weighted sum $S_n^W:=\sum_{i=1}^nW_iY_i$. By the conditional independence of $\{W_iY_i\,|\,W\}_{i=1}^n$, it follows from (3.8) that
$$E\big[e^{s(S_n^W-ES_n^W)}\,\big|\,W\big]=\prod_{i=1}^nE\big[\exp\{s[W_iY_i-E(W_iY_i)]\}\,\big|\,W\big]\le e^{\frac{nu_n(2swC_Y)^2}{1-2swC_Y}}. \tag{3.9}$$
By the conditional Markov inequality, on the event $\{\max_{k\ge1,1\le i\le n}\rho_{i,k}\le u_n\}$ we have, for $a>0$,
$$\begin{aligned}P\big(|S_n^W-ES_n^W|\ge t\,\big|\,W\big)&\le P\big(a(S_n^W-ES_n^W)\ge at\,\big|\,W\big)+P\big(a(-S_n^W+ES_n^W)\ge at\,\big|\,W\big)\\&\le\frac{E\big[e^{a(S_n^W-ES_n^W)}\,|\,W\big]}{\exp(at)}+\frac{E\big[e^{a(-S_n^W+ES_n^W)}\,|\,W\big]}{\exp(at)}\\&\le2\exp\Big\{\frac{nu_n(2awC_Y)^2}{1-2awC_Y}-at\Big\}\qquad\text{[using (3.9) for }a\in(-\delta,\delta)\text{]}\\&=2\exp\Big\{-\frac{t^2}{16nu_n(wC_Y)^2+4wC_Yt}\Big\},\end{aligned}\tag{3.10}$$
where the last equality is obtained by setting $a=\frac{t}{8nu_n(wC_Y)^2+2wC_Yt}$.
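The algebra behind this choice of $a$ can be verified symbolically; the snippet below (ours, assuming sympy is available) checks that the two exponents in (3.10) coincide and displays $2awC_Y$, which equals $t/(4nu_nwC_Y+t)<1$, so the expansion (3.8) indeed applies.

```python
import sympy as sp

t, n, u, w, C = sp.symbols("t n u_n w C_Y", positive=True)

a = t / (8 * n * u * (w * C) ** 2 + 2 * w * C * t)       # the choice of a made above
exponent = n * u * (2 * a * w * C) ** 2 / (1 - 2 * a * w * C) - a * t
target = -t ** 2 / (16 * n * u * (w * C) ** 2 + 4 * w * C * t)

print(sp.simplify(exponent - target))   # 0: the two exponents in (3.10) agree
print(sp.simplify(2 * a * w * C))       # t/(4*n*u_n*w*C_Y + t), which is < 1
```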
Taking the expectation w.r.t. $W$ in (3.10) implies
$$\begin{aligned}P(|S_n^W-ES_n^W|\ge t)&=P\Big(|S_n^W-ES_n^W|\ge t,\max_{k\ge1,1\le i\le n}\rho_{i,k}>u_n\Big)+P\Big(|S_n^W-ES_n^W|\ge t,\max_{k\ge1,1\le i\le n}\rho_{i,k}\le u_n\Big)\\&\le P\Big(\max_{k\ge1,1\le i\le n}\rho_{i,k}>u_n\Big)+E\Big[P\Big(|S_n^W-ES_n^W|\ge t,\max_{k\ge1,1\le i\le n}\rho_{i,k}\le u_n\,\Big|\,W\Big)\Big]\\&\le C_\rho/u_n+2\exp\Big\{-\frac{t^2}{16nu_n(wC_Y)^2+4wC_Yt}\Big\}.\end{aligned}$$
Lemma 3. [Symmetrization theorem with different functions] Let ε1,...,εn be a Rademacher sequence with uniform distribution on {−1,1}, independent of X1,...,Xn and gi∈Gi. Then,
E(supg1,⋯,gn∈G1,⋯,Gn|n∑i=1[gi(Xi)−E{gi(Xi)}]|)≤2E[Eϵ{supg1,⋯,gn∈G1,⋯,Gn|n∑i=1ϵigi(Xi)|}], |
where Eϵ{⋅} refers to the expectation w.r.t. ϵ1,...,ϵn.
Proof: Let $\{X_i'\}_{i=1}^n$ be an independent copy of $\{X_i\}_{i=1}^n$. Here $E'$ denotes the expectation w.r.t. $\{X_i'\}_{i=1}^n$, and let $\mathcal{F}_n:=\sigma(X_1,\cdots,X_n)$. So,
$$\begin{aligned}E\Big(\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}\Big|\sum_{i=1}^n[g_i(X_i)-E\{g_i(X_i)\}]\Big|\Big)&=E\Big(\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}\Big|E'\Big[\sum_{i=1}^n[g_i(X_i)-g_i(X_i')]\,\Big|\,\mathcal{F}_n\Big]\Big|\Big)\\(\text{Jensen's inequality for the absolute value})\quad&\le E\Big(\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}E'\Big[\Big|\sum_{i=1}^n[g_i(X_i)-g_i(X_i')]\Big|\,\Big|\,\mathcal{F}_n\Big]\Big)\\(\text{the supremum of expectations is at most the expectation of the supremum})\quad&\le E\Big(E'\Big[\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}\Big|\sum_{i=1}^n[g_i(X_i)-g_i(X_i')]\Big|\,\Big|\,\mathcal{F}_n\Big]\Big)\\&=E\Big(\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}\Big|\sum_{i=1}^n[g_i(X_i)-g_i(X_i')]\Big|\Big),\end{aligned}$$
where we use the conditional expectation version of Jensen's inequalities.
Since $\varepsilon_i[g_i(X_i)-g_i(X_i')]$ and $g_i(X_i)-g_i(X_i')$ have the same distribution, the last expression equals
$$E\Big(\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}\Big|\sum_{i=1}^n\varepsilon_i[g_i(X_i)-g_i(X_i')]\Big|\Big)\le2E\Big[E_\epsilon\Big\{\sup_{g_1,\cdots,g_n\in\mathcal{G}_1,\cdots,\mathcal{G}_n}\Big|\sum_{i=1}^n\epsilon_ig_i(X_i)\Big|\Big\}\Big].$$
If gi=f,Gi=F for i=1,2,⋯,n, then we have the classical symmetrization theorem.
Lemma 4. [Symmetrization Theorem, Lemma 2.3.1 in [17]] Let ε1,...,εn be a Rademacher sequence with uniform distribution on {−1,1}, independent of X1,...,Xn and f∈F. Then, we have
E[supf∈F|n∑i=1[f(Xi)−E{f(Xi)}]|]≤2E[Eϵ{supf∈F|n∑i=1ϵif(Xi)|}], |
where E[⋅] refers to the expectation w.r.t. X1,...,Xn and Eϵ{⋅} w.r.t. ϵ1,...,ϵn.
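As a small numerical check of Lemma 4 (ours, for illustration), take the class $\{x\mapsto|x-\theta|:\theta\in[0,1]\}$ with $X_i\sim U(0,1)$, for which $E|X-\theta|=\theta^2-\theta+1/2$; a Monte Carlo comparison over a grid of $\theta$ illustrates the inequality.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 5000
grid = np.linspace(0.0, 1.0, 21)                     # class {x -> |x - theta|}

def emp_sup(X, signs=None):
    # sup over theta of |sum_i s_i f_theta(X_i)|, or of the centered sum if signs is None
    F = np.abs(X[:, :, None] - grid)                 # shape (reps, n, len(grid))
    if signs is None:
        mean_f = grid**2 - grid + 0.5                # E|U(0,1) - theta|
        return np.abs((F - mean_f).sum(axis=1)).max(axis=1)
    return np.abs((signs[:, :, None] * F).sum(axis=1)).max(axis=1)

X = rng.uniform(0.0, 1.0, size=(reps, n))
eps = rng.choice([-1.0, 1.0], size=(reps, n))
lhs = emp_sup(X).mean()                              # E sup_theta |sum (f - Ef)|
rhs = 2 * emp_sup(X, eps).mean()                     # 2 E sup_theta |sum eps_i f(X_i)|
print(lhs, "<=", rhs)
```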
In this section, we restate the concentration inequality for a function of the data under the so-called strongly log-concave discrete distribution assumption, which was used in the Supplementary Material of [21]. We utilize the convex geometry approach to establish the tail bounds. In convex geometry, the following discrete version of the Prékopa-Leindler inequality can be found in Theorem 1.2 of [5]. The discrete Prékopa-Leindler inequality is an essential tool for deriving concentration inequalities for strongly log-concave counting measures; this shares the same idea as the continuous version of the Prékopa-Leindler inequality (see Theorem 3.15 of [18]).
Let ⌊r⌋=max{m∈Z;m≤r} be the lower integer part of r∈R, and ⌈r⌉=−⌊−r⌋ be the upper integer part. Denote ⌊x⌋=(⌊x1⌋,…⌊xn⌋) and ⌈x⌉=(⌈x1⌉,…,⌈xn⌉).
Lemma 5. [Discrete Prékopa-Leindler inequality] Let f,g,h,k:Zn→[0,∞) be functions that satisfy the following inequality:
f(x)g(y)≤h(⌊λx+(1−λ)y⌋)k(⌈(1−λ)x+λy⌉),∀x,y∈Zn,∀λ∈[0,1]. | (3.11) |
Then, we have
(∑x∈Znf(x))(∑x∈Zng(x))≤(∑x∈Znh(x))(∑x∈Znk(x)). |
From a geometric perspective, the Prékopa-Leindler inequality is a valuable method for proving concentration inequalities for Lipschitz functions of strongly log-concave distributions. Following the idea in [12], a distribution $P$ with density $p(x)$ (w.r.t. the counting measure) is said to be strongly discrete log-concave if $\psi(x):=-\log p(x):\mathbb{Z}^n\to\mathbb{R}$ is strongly midpoint convex for some $\gamma>0$:
ψ(x)+ψ(y)−ψ(⌈12x+12y⌉)−ψ(⌊12x+12y⌋)≥γ4‖x−y‖22,∀x,y∈Zn. | (3.12) |
The inequality (3.12) is an extension of strong convexity for continuous functions on $\mathbb{R}^n$:
λψ(x)+(1−λ)ψ(y)−ψ(λx+(1−λ)y)≥γ2λ(1−λ)‖x−y‖22,∀x,y∈Rn,∀λ∈[0,1], |
with modulus of convexity γ [13].
The strong midpoint convexity property for a discrete function is typically obtained by restricting a continuous function to a lattice space. If $\gamma=0$, (3.12) reduces to the discrete midpoint convexity property for $\psi(x)$:
ψ(x)+ψ(y)≥ψ(⌈12x+12y⌉)+ψ(⌊12x+12y⌋),∀x,y∈Zn, |
see [12]. However, directly restricting a continuous convex function to some lattice space does not necessarily yield a discrete convex function. For a corresponding counterexample, see [19].
For one-dimensional P, the probability mass function p(x) is said to be log-concave if the sequence {p(x)}x∈Z is log-concave; that is, for any λn+(1−λ)m∈Z with m,n∈Z and λ∈(0,1), one has
$$p(\lambda n+(1-\lambda)m)\ge p(n)^\lambda p(m)^{1-\lambda}.$$
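For a quick numerical illustration of this definition (ours, assuming scipy is available), one can verify the standard three-term form of log-concavity, $p(k)^2\ge p(k-1)p(k+1)$, for some common count distributions supported on consecutive integers.

```python
import numpy as np
from scipy.stats import poisson, nbinom

def is_log_concave(pmf_vals):
    # For a pmf on consecutive integers, log-concavity reads p(k)^2 >= p(k-1) p(k+1).
    p = np.asarray(pmf_vals)
    return bool(np.all(p[1:-1] ** 2 >= p[:-2] * p[2:]))

k = np.arange(0, 60)
print(is_log_concave(poisson.pmf(k, mu=4.0)))   # True
print(is_log_concave(nbinom.pmf(k, 5, 0.4)))    # True for dispersion 5
```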
Proposition 1. [The concentration inequality of strongly log-concave discrete distributions] Consider a strongly log-concave discrete distribution $P_\gamma$ with index $\gamma>0$ on $\mathbb{Z}^n$. For any $f:\mathbb{R}^n\to\mathbb{R}$ that is $L$-Lipschitz w.r.t. the Euclidean norm,
$$\Pr\{|f(X)-Ef(X)|\ge t\}\le2e^{-\frac{\gamma t^2}{4L^2}}. \tag{3.13}$$
Proof: Let $h$ be a zero-mean function with Lipschitz constant $L$ (w.r.t. the Euclidean norm). It suffices to prove the upper bound $Ee^{h(X)}\le e^{L^2/\gamma}$ for the moment generating function. Then, for $f$ with Lipschitz constant $K$ and $\lambda\in\mathbb{R}$, we apply this upper bound to the zero-mean function $h(X):=\lambda(f(X)-Ef(X))$, which has Lipschitz constant $L=|\lambda|K$. For $y\in\mathbb{Z}^n$, define the proximity operator of $h$ as
l(y):=infx∈Zn{h(x)+γ4‖x−y‖22}. |
With this proximity operator, the proof proceeds by applying the discrete Prékopa-Leindler inequality (Lemma 5) with $\lambda=1/2$, taking the two functions on the right-hand side of (3.11) to be $p(t):=e^{-\psi(t)}$, and setting $f(x):=e^{-h(x)-\psi(x)}$ and $g(y):=e^{l(y)-\psi(y)}$. We check that
$$e^{\frac12[l(y)-h(x)-\psi(y)-\psi(x)]}\le e^{-\frac12\psi(\lceil\frac12x+\frac12y\rceil)}\cdot e^{-\frac12\psi(\lfloor\frac12x+\frac12y\rfloor)},\qquad\forall x,y\in\mathbb{Z}^n. \tag{3.14}$$
Then, (3.11) holds and Lemma 5 applies with $\lambda=1/2$.
By discrete strong convexity of the function ψ
$$\frac12\Big[\psi(x)+\psi(y)-\psi\big(\lceil\tfrac12x+\tfrac12y\rceil\big)-\psi\big(\lfloor\tfrac12x+\tfrac12y\rfloor\big)\Big]\ge\frac\gamma8\|x-y\|_2^2,$$
and the definition of the proximity operator of $h$ (which gives $l(y)-h(x)\le\frac\gamma4\|x-y\|_2^2$ for every $x$), we have
$$-\frac12\psi\big(\lceil\tfrac12x+\tfrac12y\rceil\big)-\frac12\psi\big(\lfloor\tfrac12x+\tfrac12y\rfloor\big)\ge\frac\gamma8\|x-y\|_2^2-\frac12\psi(x)-\frac12\psi(y)\ge\frac12\{l(y)-h(x)\}-\frac12\psi(y)-\frac12\psi(x),$$
which verifies (3.14).
Since $\sum_{x\in\mathbb{Z}^n}p(x)=1$, Lemma 5 gives
$$Ee^{-h(X)}\,Ee^{l(Y)}=\Big(\sum_{x\in\mathbb{Z}^n}e^{-h(x)-\psi(x)}\Big)\Big(\sum_{y\in\mathbb{Z}^n}e^{l(y)-\psi(y)}\Big)\le1.$$
Then, Jensen's inequality implies
$$Ee^{l(Y)}\le\big(Ee^{-h(X)}\big)^{-1}\le\big(e^{E[-h(X)]}\big)^{-1}=1,$$
where in the last equality we use $E[-h(X)]=-E[\lambda(f(X)-Ef(X))]=0$. The definition of the proximity operator shows
$$1\ge Ee^{l(Y)}=Ee^{\inf_{x\in\mathbb{Z}^n}\{h(x)+\frac\gamma4\|x-Y\|_2^2\}}=Ee^{\inf_{x\in\mathbb{Z}^n}\{h(Y)+[h(x)-h(Y)]+\frac\gamma4\|x-Y\|_2^2\}}\ge Ee^{h(Y)+\inf_{x\in\mathbb{R}^n}\{-L\|x-Y\|_2+\frac\gamma4\|x-Y\|_2^2\}}=Ee^{h(Y)-L^2/\gamma},$$
where the second-to-last inequality is due to the $L$-Lipschitz property of $h$, i.e., $|h(x)-h(Y)|\le L\|x-Y\|_2$, and the last equality uses $\inf_{u\ge0}\{\frac\gamma4u^2-Lu\}=-L^2/\gamma$.
Then, we have $Ee^{\lambda(f(X)-Ef(X))}\le e^{\lambda^2L^2/\gamma}=e^{\frac{\lambda^2}2\cdot\frac{2L^2}\gamma}$ for all $\lambda\in\mathbb{R}$, where now $L$ denotes the Lipschitz constant of $f$. This means that $f(X)-Ef(X)\sim\mathrm{subG}(\frac{2L^2}\gamma)$, hence the tail bound (3.13) follows.
Non-asymptotic statistical inference on high-dimensional data is important for many fields, such as data mining and machine learning. In this paper, we derived a novel concentration inequality for the sum of independent sub-Gaussian variables with random dependent weights in high-dimensional regression settings. We applied the proposed concentration inequality to obtain a high probability bound for the stochastic Lipschitz constant for negative binomial loss functions involved in Lasso-penalized negative binomial regressions, and used this bound to study oracle inequalities for the Lasso estimators. The usefulness of the proposed concentration inequality in applications was justified by solid theoretical proofs.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors would like to thank the reviewers and Dr. Pengfei Wang for their valuable suggestions that have significantly improved the quality of this paper. This work was supported in part by the National Natural Science Foundation of China under grant numbers 12101630 and 12261011, and the "double first-class" construction projects of Chinese universities under grant number ZG216S2348.
The authors declare no conflict of interest.
[1] | V. V. Buldygin, Y. V. Kozachenko, Metric characterization of random variables and random processes, Providence: American Mathematical Society, 2000. |
[2] | S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities: A nonasymptotic theory of independence, Oxford: Oxford University Press, 2013. |
[3] | P. Bühlmann, S. A. van de Geer, Statistics for high-dimensional data: methods, theory and applications, Berlin: Springer, 2011. https://doi.org/10.1007/978-3-642-20192-9 |
[4] | Z. Chi, A local stochastic Lipschitz condition with application to Lasso for high dimensional generalized linear models, arXiv: 1009.1052. https://doi.org/10.48550/arXiv.1009.1052 |
[5] | D. Halikias, B. Klartag, B. A. Slomka, Discrete variants of Brunn-Minkowski type inequalities, Annales de la Faculté des Sciences de Toulouse Mathématiques, 30 (2021), 267–279. https://doi.org/10.5802/afst.1674 |
[6] | Q. Han, J. A. Wellner, Convergence rates of least squares regression estimators with heavy-tailed errors, Ann. Statist., 47 (2019), 2286–2319. https://doi.org/10.1214/18-AOS1748 |
[7] | Q. Han, Multiplier U-processes: sharp bounds and applications, Bernoulli, 28 (2022), 87–124. https://doi.org/10.3150/21-BEJ1334 |
[8] | W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Am. Stat. Assoc., 58 (1963), 13–30. https://doi.org/10.1080/01621459.1963.10500830 |
[9] | J. Kahane, Propriétés locales des fonctions à séries de Fourier aléatoires, Stud. Math., 19 (1960), 1–25. https://doi.org/10.4064/sm-19-1-1-25 |
[10] | S. Li, H. Wei, X. Lei, Heterogeneous overdispersed count data regressions via double-penalized estimations, Mathematics, 10 (2022), 1700. https://doi.org/10.3390/math10101700 |
[11] | S. Mendelson, Upper bounds on product and multiplier empirical processes, Stoch. Proc. Appl., 126 (2016), 3652–3680. https://doi.org/10.1016/j.spa.2016.04.019 |
[12] | S. Moriguchi, K. Murota, A. Tamura, F. Tardella, Discrete midpoint convexity, Math. Oper. Res., 45 (2020), 99–128. https://doi.org/10.1287/moor.2018.0984 |
[13] | M. W. Mahoney, J. C. Duchi, A. C. Gilbert, The mathematics of data, Providence: American Mathematical Society, 2018. |
[14] | P. Massart, Some applications of concentration inequalities to statistics, Annales de la Faculté des Sciences de Toulouse Mathématiques, 9 (2000), 245–303. https://doi.org/10.5802/afst.961 |
[15] | P. Rigollet, J. C. Hütter, High dimensional statistics, New York: Springer, 2019. |
[16] | R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, arXiv: 1011.3027. https://doi.org/10.48550/arXiv.1011.3027 |
[17] | A. W. van der Vaart, J. A. Wellner, Weak convergence and empirical processes: with applications to statistics, New York: Springer, 1996. https://doi.org/10.1007/978-1-4757-2545-2 |
[18] | M. J. Wainwright, High-dimensional statistics: a non-asymptotic viewpoint, Cambridge: Cambridge University Press, 2019. |
[19] | Ü. Yüceer, Discrete convexity: convexity for functions defined on discrete spaces, Discrete Appl. Math., 119 (2002), 297–304. https://doi.org/10.1016/S0166-218X(01)00191-3 |
[20] | H. Zhang, S. Chen, Concentration inequalities for statistical inference, Commun. Math. Res., 37 (2021), 1–85. https://doi.org/10.4208/cmr.2020-0041 |
[21] | H. Zhang, J. Jia, Elastic-net regularized high-dimensional negative binomial regression: consistency and weak signals detection, Stat. Sinica, 32 (2022), 181–207. https://doi.org/10.5705/SS.202019.0315 |
[22] | H. Zhang, X. Lei, Growing-dimensional partially functional linear models: non-asymptotic optimal prediction error, Phys. Scr., 98 (2023), 095216. https://doi.org/10.1088/1402-4896/aceac0 |
[23] | H. Zhang, H. Wei, G. Cheng, Tight non-asymptotic inference via sub-Gaussian intrinsic moment norm, arXiv: 2303.07287. https://doi.org/10.48550/arXiv.2303.07287 |