Research article

Concentration for multiplier empirical processes with dependent weights

  • Received: 03 August 2023 Revised: 16 September 2023 Accepted: 27 September 2023 Published: 23 October 2023
  • MSC : 62K05, 62K15

  • A novel concentration inequality for the sum of independent sub-Gaussian variables with random dependent weights is introduced in statistical settings for high-dimensional data. The random dependent weights are functions of some regularized estimators. We applied the proposed concentration inequality to obtain a high probability bound for the stochastic Lipschitz constant for negative binomial loss functions involved in Lasso-penalized negative binomial regressions. We used this bound to study oracle inequalities for Lasso estimators. Additionally, a similar concentration inequality was derived for a randomly weighted sum of independent centred exponential family variables.

    Citation: Huiming Zhang, Hengzhen Huang. Concentration for multiplier empirical processes with dependent weights[J]. AIMS Mathematics, 2023, 8(12): 28738-28752. doi: 10.3934/math.20231471




Over the last two decades, modern data collection techniques have enabled scientists and engineers to access and load vast numbers of variables as random data in their experiments. Probability theory provides the mathematical foundations for statistics, and data-driven problems have led to various new advances in statistical research, which in turn contribute new and challenging problems in probability for further study. For instance, the rapid development of high-dimensional statistics has spurred the growth of probability theory and even pure mathematics, including concentration inequalities, random matrix theory, geometric functional analysis and more [2,16].

The emergence of high-throughput data has led to a surge in statistical research on complex data, particularly on high-dimensional data and statistical learning [14,20]. This trend has gained traction in various scientific fields despite the high cost of measurements. Data sets are typically small, with only tens or hundreds of observations, and limited computing power often restricts the size of usable finite samples. As a result, modern statisticians and data scientists have shifted their focus from asymptotic to non-asymptotic analysis, as the latter can handle small sample sizes and large model dimensions. Concentration inequalities play a crucial role in high-dimensional statistical inference, as they yield various explicit non-asymptotic error bounds as functions of the sample size, sparsity level and dimension. When analyzing the various error bounds of regularized estimators, concentration inequalities are indispensable tools [3,20].

When the random variables are unbounded, the classical Hoeffding inequality [8] fails to provide a non-asymptotic analysis. We need the concept of sub-Gaussian random variables [9] to obtain tight Hoeffding-type concentration inequalities for the sum of independent random variables. A centered random variable (r.v.) $X$ is called sub-Gaussian ($X\sim \mathrm{subG}(\sigma^2)$) if $\mathbb{E}e^{sX}\le e^{s^2\sigma^2/2}$ for all $s\in\mathbb{R}$, where $\sigma>0$ is the sub-Gaussian parameter. From Chernoff's inequality, the exponential decay of the sub-Gaussian tail is obtained by
$$P(X\ge t)\le \inf_{s>0}\exp\{-st\}\,\mathbb{E}\exp\{sX\}\le \inf_{s>0}\exp\Big(-st+\frac{\sigma^2s^2}{2}\Big)=\exp\Big(-\frac{t^2}{2\sigma^2}\Big),$$
minimizing the upper bound by putting $s=t/\sigma^2$. Moreover, for independent $\{X_i\}_{i=1}^n$ with $X_i\sim \mathrm{subG}(\sigma_i^2)$, we have the sub-Gaussian concentration inequality

$$P\left(\Big|\sum_{i=1}^n X_i\Big|\ge t\right)\le 2\exp\left\{-\frac{t^2}{2\sum_{i=1}^n\sigma_i^2}\right\},\quad \forall t\ge 0, \tag{1.1}$$

for any variance proxies $\{\sigma_i^2\}_{i=1}^n$ of $\{X_i\}_{i=1}^n$ (see Theorem 1.5 in [1]). Define the $L_p$-norm of a r.v. $X$ as $\|X\|_p:=(\mathbb{E}|X|^p)^{1/p}$. An alternative sub-Gaussian parameter is given by the sub-Gaussian norm $\|\cdot\|_{\theta_2}$: for a zero-mean r.v. $X$, $\|X\|_{\theta_2}:=\sup_{p\ge 1}\big[\frac{\mathbb{E}X^{2p}}{(2p-1)!!}\big]^{1/(2p)}$ (see page 23 in [1]).

    Corollary 1.7 in [15] extended the sub-Gaussian concentration inequality (1.1) to the weighted sum of independent sub-Gaussian random variables with fixed weights.

Lemma 1. [Concentration for weighted sub-Gaussian sum] Let $Y_1,\ldots,Y_n$ be $n$ independent r.v.s with $Y_i\sim \mathrm{subG}(\sigma_i^2)$. Define $\sigma^2=\max_{1\le i\le n}\sigma_i^2<\infty$. For any fixed $w:=(w_1,\ldots,w_n)^\top$, we have

$$P\left(\Big|\sum_{i=1}^n w_iY_i\Big|>t\right)\le 2\exp\left(-\frac{t^2}{2\sigma^2\|w\|_2^2}\right).$$
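As a quick numerical illustration (our addition, not part of the original argument), the following Python sketch compares the empirical tail of a fixed-weight sum of Gaussian variables, which are sub-Gaussian with variance proxy $\sigma_i^2$, against the bound of Lemma 1; the sample sizes, weight ranges and thresholds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_mc = 50, 200_000
sigma = rng.uniform(0.5, 2.0, size=n)        # sub-Gaussian parameters sigma_i
w = rng.uniform(-1.0, 1.0, size=n)           # fixed (non-random) weights
sigma_max2 = np.max(sigma**2)                # sigma^2 = max_i sigma_i^2

# Monte Carlo draws of sum_i w_i Y_i with Y_i ~ N(0, sigma_i^2)
Y = rng.normal(0.0, sigma, size=(n_mc, n))
S = Y @ w

for t in [5.0, 10.0, 20.0]:
    emp = np.mean(np.abs(S) > t)                                     # empirical tail
    bound = 2 * np.exp(-t**2 / (2 * sigma_max2 * np.sum(w**2)))      # Lemma 1 bound
    print(f"t={t}: empirical={emp:.5f}, Lemma 1 bound={bound:.5f}")
```

As expected, the bound is conservative but decays at the sub-Gaussian rate in $t$.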

However, if the $w_i$'s in Lemma 1 are random, the story is totally different. The goal of this paper is to obtain novel theoretical results on concentration inequalities for sums of dependent variables with random weights in a high-dimensional data setting. Our theory is motivated by the non-asymptotic oracle inequalities of regularized estimators in high-dimensional negative binomial regressions [21], and by the concentration of random Lipschitz coefficients associated with empirical loss functions [4]. Our setting is different from classical multiplier empirical processes serving multiplier bootstrap inference, where the multipliers are random variables independent of $\{Y_i\}_{i=1}^n$ (see Chapter 2.9 of [17] and [6,7]). Mendelson [11] studied concentration inequalities for the centered multiplier process indexed by a functional class, where the i.i.d. multipliers need not be independent of the original empirical processes. In the analysis of high-dimensional continuous-data regressions by empirical processes, researchers often resort to concentration inequalities for Lipschitz functions of strongly log-concave distributions (see Theorems 2.26 and 3.16 in [18]). For high-dimensional count data regressions, section three below discusses discrete distributions with strongly log-concave structure, for which the definition of a discrete strongly log-concave distribution (see (3.12) below) is considered hard to check. Moreover, this strong assumption is usually intractable and unverifiable from the data, whereas the sub-Gaussian assumption for i.i.d. data is testable (see [23]).

    In section two, we present the main results of the theory and demonstrate their applications in a class of high-dimensional generalized linear models. Theoretical proofs of the main results and some lemmas and additional results are given in section three. Finally, the conclusions are presented in section four.

When controlling the summation of a function of the random sample indexed by a common estimator $\hat{\theta}$, one cannot directly apply any classical law of large numbers or central limit theorem (or a concentration inequality for independent summands).

Formally, let $X_1,\ldots,X_n$ be a random sample independently drawn from $P$ on a measurable space $(\mathcal{X},\mathcal{A})$. Given an estimator $\hat{\theta}$, we want to study the behaviour of the summation of some function $f_{\hat{\theta}}(X_i)$,

$$\frac{1}{n}\sum_{i=1}^n\big[f_{\hat{\theta}}(X_i)-\mathbb{E}f_{\hat{\theta}}(X_i)\big].$$

A possible solution: prove a uniform version (the suprema of empirical processes, see [17]) over all possible $\hat{\theta}$ in a set $K$, which is usually stronger than what is needed:

$$\left|\frac{1}{n}\sum_{i=1}^n\big[f_{\hat{\theta}}(X_i)-\mathbb{E}f_{\hat{\theta}}(X_i)\big]\right|\le \sup_{\theta\in K}\left|\frac{1}{n}\sum_{i=1}^n\big[f_{\theta}(X_i)-\mathbb{E}f_{\theta}(X_i)\big]\right|.$$

The summation inside the supremum enjoys independence.

In the following theorem, we extend Lemma 1 to dependent and random weights.

Theorem 1. [Concentration for weighted dependent sum] Let the $Y_i$'s be independent centered sub-Gaussian random variables with $\max_{1\le i\le n}\|Y_i\|_{\theta_2}<\infty$. Let the $w_i(\hat{\theta})$'s be a sequence of bounded functions of a bounded random vector $\hat{\theta}\in\mathbb{R}^p$ serving as the weights (they may depend on all the $Y_i$'s), where $\|\hat{\theta}\|_1\le r<\infty$ and $\max_{1\le i\le n}\|w_i(\cdot)\|_\infty\le 1$. Then, with probability at least $1-\delta$,

$$\left|\frac{1}{n}\sum_{i=1}^n w_i(\hat{\theta})Y_i\right|\le 4\sqrt{\frac{1}{n}\sum_{i=1}^n\big\||Y_i-Y_i'|\big\|_{\theta_2}^2}\sqrt{\frac{\log\delta^{-1}}{n}}+2\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{E}(Y_i^2)}\sqrt{\frac{2\log(2p)}{n}}, \tag{2.1}$$

for all $\hat{\theta}$ with $\|\hat{\theta}\|_1\le r$, where $Y_i'$ is an independent copy of $Y_i$.

The first term in (2.1) is due to sub-Gaussian concentration, and the second term in (2.1) comes from the upper bound on the expectation of the supremum of the empirical process $f(Y):=\frac{1}{n}\sup_{\|\theta\|_1\le r}\big|\sum_{i=1}^n w_i(\theta)Y_i\big|$ (see the proof in section three).
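To see how the two terms of (2.1) behave numerically, the sketch below (our own illustration; the linear weight functions and the crude $\ell_1$ rescaling of the estimator are hypothetical choices) uses standard Gaussian $Y_i$, for which $\||Y_i-Y_i'|\|_{\theta_2}=\sqrt{2}$ and $\mathbb{E}Y_i^2=1$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r, delta = 200, 20, 1.0, 0.05

# linear weight functions w_i(theta) = a_i^T theta with |a_ij| <= 1/r, so |w_i(theta)| <= 1
A = rng.uniform(-1 / r, 1 / r, size=(n, p))

Y = rng.normal(size=n)                             # Y_i ~ N(0, 1), sub-Gaussian
theta_raw = A.T @ Y / n                            # a data-dependent quantity
theta_hat = r * theta_raw / max(np.sum(np.abs(theta_raw)), r)  # rescale so ||theta_hat||_1 <= r

weighted_avg = abs(np.mean((A @ theta_hat) * Y))   # |(1/n) sum_i w_i(theta_hat) Y_i|

# the two terms of (2.1): || |Y_i - Y_i'| ||_{theta_2} = sqrt(2) and E Y_i^2 = 1 for N(0,1)
term1 = 4 * np.sqrt(2.0) * np.sqrt(np.log(1 / delta) / n)
term2 = 2 * 1.0 * np.sqrt(2 * np.log(2 * p) / n)
print(f"|weighted average| = {weighted_avg:.4f}, bound (2.1) = {term1 + term2:.4f}")
```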

    The concentration of random Lipschitz coefficients associated with empirical loss functions is crucial for deriving error bounds of Lasso or Elastic-net penalized high-dimensional generalized linear models (GLMs) in high-dimensional regressions. For more information, please refer to [3,4,10].

Definition 1. [Elastic-net or Lasso penalized loss problems] Let $\{(Y_i,X_i)\}_{i=1}^n$ be independent identically distributed random variables with values in $\mathbb{R}\times\mathbb{R}^p$, where $\{Y_i\}_{i=1}^n\subset\mathcal{Y}$ are response variables and $\{X_i\}_{i=1}^n\subset\mathcal{X}$ are covariates. Let $l(y,x,\beta)$ be a loss function of the parameter $\beta$ and the data $(y,x)$. The empirical loss function is defined as

$$P_nl(Y,X,\beta):=\frac{1}{n}\sum_{i=1}^n l(Y_i,X_i,\beta).$$

Elastic-net (or Lasso) estimators are given by

$$\hat{\beta}:=\hat{\beta}(\lambda_1,\lambda_2)=\operatorname*{argmin}_{\beta\in\mathbb{R}^p}\left\{P_nl(Y,X,\beta)+\lambda_1\|\beta\|_1+\lambda_2\|\beta\|_2^2\right\}, \tag{2.2}$$

where $\lambda_1>0$ and $\lambda_2\ge 0$ are tuning parameters.
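As a computational aside (our own sketch, not part of the original text), one standard way to compute (2.2) for a smooth loss is proximal gradient descent with soft-thresholding; the step size, iteration count and the squared-loss example below are hypothetical choices made only for demonstration.

```python
import numpy as np

def soft_threshold(z, a):
    """Proximal operator of a*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

def elastic_net(grad_loss, p, lam1, lam2, step=0.01, n_iter=2000):
    """Proximal gradient descent for  P_n l(beta) + lam1*||beta||_1 + lam2*||beta||_2^2."""
    beta = np.zeros(p)
    for _ in range(n_iter):
        g = grad_loss(beta) + 2.0 * lam2 * beta          # gradient of the smooth part
        beta = soft_threshold(beta - step * g, step * lam1)
    return beta

# toy example with the squared loss P_n l(beta) = (1/2n)||y - X beta||_2^2
rng = np.random.default_rng(2)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(size=n)
grad = lambda b: -X.T @ (y - X @ b) / n
print(np.round(elastic_net(grad, p, lam1=0.1, lam2=0.01)[:6], 3))
```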

    Define the minimizer

$$\beta^*=\operatorname*{argmin}_{\beta\in\mathbb{R}^p}\mathbb{E}[l(Y,X,\beta)] \tag{2.3}$$

as the vector of true coefficients, where $l(Y,X,\beta)$ is the loss function. The $\ell_1$ ball is denoted by $S_R(\beta^*):=\{\beta\in\mathbb{R}^p:\|\beta-\beta^*\|_1\le R\}$. Theorem 1 can be used to establish exponential-type concentration inequalities for the local stochastic Lipschitz (LSL) constant:

$$\sup_{\beta\in S_R(\beta^*)}\frac{(P_n-P)[l(Y,X,\beta)-l(Y,X,\beta^*)]}{\|\beta-\beta^*\|_1}.$$

When studying error bounds for $\|\hat{\beta}-\beta^*\|_1$ of Lasso or Elastic-net penalized high-dimensional GLMs, one must bound the dependent empirical process $\frac{(P_n-P)[l(Y,X,\hat{\beta})-l(Y,X,\beta^*)]}{\|\hat{\beta}-\beta^*\|_1}$, with the LSL constant as its upper bound.

Next, we provide an example of the negative binomial loss in negative binomial regressions [21]. The negative binomial loss function is $l(y,x,\beta)=-yx^\top\beta+(\theta+y)\log(\theta+e^{x^\top\beta})$, where $\theta$ is called the dispersion parameter. Denote the expected risk function as $Pl(Y,X,\beta):=\mathbb{E}\,l(Y,X,\beta)$. Let $l_1(y,x,\beta):=-y[x^\top\beta-\log(\theta+\exp\{x^\top\beta\})]$ and $l_2(x,\beta):=\theta\log(\theta+\exp\{x^\top\beta\})$; then

$$(P_n-P)[l(y,x,\beta)-l(y,x,\beta^*)]=(P_n-P)[l_1(y,x,\beta)-l_1(y,x,\beta^*)]+(P_n-P)[l_2(x,\beta)-l_2(x,\beta^*)].$$

Upper bounds for the first and second parts of the empirical process, $(P_n-P)[l_m(\beta^*)-l_m(\hat{\beta})]$ for $m=1,2$, are paramount for studying the error bound of $\|\hat{\beta}-\beta^*\|_1$.
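For concreteness, the short sketch below (our addition; the sign convention follows the negative log-likelihood form written above, and the inputs are arbitrary) evaluates the negative binomial loss and checks numerically that $l=l_1+l_2$.

```python
import numpy as np

def nb_loss(y, x, beta, theta):
    """Negative binomial loss  l(y,x,beta) = -y*x'beta + (theta + y)*log(theta + exp(x'beta))."""
    eta = x @ beta
    return -y * eta + (theta + y) * np.log(theta + np.exp(eta))

def nb_loss_parts(y, x, beta, theta):
    """Decomposition l = l1 + l2 used in the text."""
    eta = x @ beta
    l1 = -y * (eta - np.log(theta + np.exp(eta)))        # part involving the response y
    l2 = theta * np.log(theta + np.exp(eta))             # part involving x only
    return l1, l2

rng = np.random.default_rng(3)
x, beta, theta, y = rng.normal(size=5), rng.normal(size=5), 2.0, 7.0
l1, l2 = nb_loss_parts(y, x, beta, theta)
assert np.isclose(nb_loss(y, x, beta, theta), l1 + l2)
```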

    Let λ be a positive constant that needs to be determined. We have

$$\begin{aligned}
&P\left(\sup_{\beta\in S_R(\beta^*)}\frac{\big|(P_n-P)[l(Y,X,\beta)-l(Y,X,\beta^*)]\big|}{\|\beta-\beta^*\|_1}\ge\lambda\right)\\
&\quad\le P\left(\sup_{\beta\in S_R(\beta^*)}\frac{\big|(P_n-P)[l_1(Y,X,\beta)-l_1(Y,X,\beta^*)]\big|}{\|\beta-\beta^*\|_1}\ge\frac{\lambda}{2}\right)+P\left(\sup_{\beta\in S_R(\beta^*)}\frac{\big|(P_n-P)[l_2(X,\beta)-l_2(X,\beta^*)]\big|}{\|\beta-\beta^*\|_1}\ge\frac{\lambda}{2}\right).
\end{aligned} \tag{2.4}$$

Here, we assume that both $x$ and $\beta$ are bounded, and $\theta$ is a known dispersion parameter. The high probability bound for the second term in (2.4) is easy to deal with if we apply McDiarmid's inequality (see Lemma 4 of [21]). However, the high probability bound for the first term in (2.4) is hard to control since it contains the unbounded negative binomial variables $\{Y_i\}_{i=1}^n$. Zhang and Jia [21] used a concentration inequality for strongly log-concave discrete distributions to solve this problem, but the strongly log-concave property is difficult to check for discrete distributions (see (H.4) in [21]). The sub-Gaussian assumption is easy to verify for negative binomial variables; this follows from the fact that the negative binomial distribution belongs to the exponential family when the dispersion parameter is given. When $\Theta$ is compact in (2.7) below, Proposition 3.2 in [20] shows that $\{Y_i\}_{i=1}^n$ is sub-Gaussian.

From Taylor's expansion of continuous functions, one has $\log(\theta+e^x)-\log(\theta+e^a)=\frac{e^{\tilde a}}{\theta+e^{\tilde a}}(x-a)$, where $\tilde a$ is some real number between $a$ and $x$. Let $X_i^\top\tilde\beta$ be a point between $X_i^\top\hat\beta$ and $X_i^\top\beta^*$, i.e., $\tilde\beta=(t_1\hat\beta_1,\ldots,t_p\hat\beta_p)^\top+((1-t_1)\beta_1^*,\ldots,(1-t_p)\beta_p^*)^\top$ for some $\{t_j\}_{j=1}^p\subset[0,1]$. Observe that

$$\begin{aligned}
(P_n-P)[l_1(\beta^*)-l_1(\hat\beta)]&=-\frac{1}{n}\sum_{i=1}^n(Y_i-\mathbb{E}Y_i)\left[X_i^\top(\beta^*-\hat\beta)-\log\left(\frac{\theta+\exp\{X_i^\top\beta^*\}}{\theta+\exp\{X_i^\top\hat\beta\}}\right)\right]\\
&=-\frac{1}{n}\sum_{i=1}^n(Y_i-\mathbb{E}Y_i)\left[X_i^\top(\beta^*-\hat\beta)-\frac{\exp\{X_i^\top\tilde\beta\}X_i^\top(\beta^*-\hat\beta)}{\theta+\exp\{X_i^\top\tilde\beta\}}\right]\\
&=\frac{1}{n}\sum_{i=1}^n\frac{\theta X_i^\top(\hat\beta-\beta^*)}{\theta+\exp\{X_i^\top\tilde\beta\}}(Y_i-\mathbb{E}Y_i).
\end{aligned} \tag{2.5}$$

For a finite $M_0$, if $\hat\beta\in S_{M_0}(\beta^*)$, then $\tilde\beta\in S_{M_0}(\beta^*)$. This follows from the fact that $\|\tilde\beta-\beta^*\|_1\le\sum_{j=1}^p t_j|\hat\beta_j-\beta_j^*|\le\|\hat\beta-\beta^*\|_1\le M_0$. Suppose $\|X_i\|_\infty$ is uniformly bounded by $1/M_0$. Then (2.5) can be written as $\frac{1}{n}\sum_{i=1}^n w_i(\hat\beta)(Y_i-\mathbb{E}Y_i)$ with dependent weights

$$w_i(\hat\beta):=\frac{\theta X_i^\top(\hat\beta-\beta^*)}{\theta+\exp\{X_i^\top\tilde\beta\}}\quad\text{and}\quad |w_i(\hat\beta)|\le 1.$$

Thus, the high probability upper bound in Theorem 1 is applicable to determine $\lambda/2$, i.e.,

$$\frac{\lambda}{2}=4\sqrt{\frac{1}{n}\sum_{k=1}^n\big\||Y_k-Y_k'|\big\|_{\theta_2}^2}\sqrt{\frac{\log\delta^{-1}}{n}}+2\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{E}(Y_i-\mathbb{E}Y_i)^2}\sqrt{\frac{2\log 2p}{n}}. \tag{2.6}$$
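To get a feel for the size of (2.6), the snippet below (our own illustration) evaluates $\lambda/2$ for simulated negative binomial responses; since the $\theta_2$-norm of $|Y_k-Y_k'|$ has no simple closed form here, it is approximated by a Monte Carlo estimate truncated at the first few moments, which is a crude but indicative approximation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p_dim, delta, theta = 500, 100, 0.05, 2.0
mu = rng.uniform(1.0, 5.0, size=n)                    # per-observation NB means
prob = theta / (theta + mu)                           # NB success probabilities

# two independent copies of the responses, to estimate || |Y_k - Y_k'| ||_{theta_2}
Y = rng.negative_binomial(theta, prob, size=(20_000, n)).astype(float)
Yp = rng.negative_binomial(theta, prob, size=(20_000, n)).astype(float)

def theta2_norm_mc(D, pmax=5):
    """Monte Carlo estimate of sup_p [E D^{2p}/(2p-1)!!]^{1/(2p)}, truncated at p = pmax."""
    vals = []
    for p in range(1, pmax + 1):
        double_fact = np.prod(np.arange(2 * p - 1, 0, -2))       # (2p-1)!!
        vals.append((np.mean(D ** (2 * p), axis=0) / double_fact) ** (1 / (2 * p)))
    return np.max(vals, axis=0)

norms2 = theta2_norm_mc(np.abs(Y - Yp)) ** 2          # || |Y_k - Y_k'| ||_{theta_2}^2
var_Y = mu + mu**2 / theta                            # Var(Y_i) = E(Y_i - E Y_i)^2 for NB2
half_lambda = (4 * np.sqrt(norms2.mean()) * np.sqrt(np.log(1 / delta) / n)
               + 2 * np.sqrt(var_Y.mean()) * np.sqrt(2 * np.log(2 * p_dim) / n))
print(f"lambda/2 approx = {half_lambda:.4f}")
```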

In this section, let $\{Y_i\}_{i=1}^n$ be exponential family random variables with density

$$f(y_i;\eta_i)=c(y_i)\exp\{y_i\eta_i-b(\eta_i)\},\quad \eta_i\in\Theta. \tag{2.7}$$

Here, $\mathbb{E}(Y_i)=\dot b(\eta_i)$ and $\mathrm{Var}(Y_i)=\ddot b(\eta_i)$. It should be noted that Proposition 3.2 in [20] shows that $\{Y_i\}_{i=1}^n$ is sub-Gaussian if $\Theta$ is compact.

Under the distributional assumption (2.7), we will study Bernstein-type concentration inequalities for the randomly weighted sum of centered exponential family random variables (with different parameters $\eta_i$'s):

$$\sum_{i=1}^n\big\{w_i(\hat\theta)Y_i-\mathbb{E}[w_i(\hat\theta)Y_i]\big\},$$

where the $\{w_i(\hat\theta)\}_{i=1}^n$ are called the multipliers (or random weights), and $\hat\theta$ is an estimator computed from the data, so the weights need not be independent of $\{Y_i\}_{i=1}^n$.

Theorem 2. [Concentration inequalities for randomly weighted sums of exponential family r.v.s] Suppose $\{Y_i\}_{i=1}^n$ have density (2.7) and satisfy the moment conditions $\mathbb{E}|Y_i|^k\le k!C_Y^k$ for all $k\ge 1$, where $C_Y>0$ is a constant. We assume that

(i) Bounded weights: $W_{\hat\theta}:=(w_1(\hat\theta),\ldots,w_n(\hat\theta))^\top$ is a random vector such that $\max_{1\le i\le n}|w_i(\hat\theta)|\le w<\infty$;

(ii) Let $\mathbb{E}[|Y_i|^k\mid W_{\hat\theta}]=\rho_{i,k}\mathbb{E}[|Y_i|^k]$ with $\mathbb{E}\rho_{i,k}=1$;

(iii) There exist a non-decreasing sequence $\{u_n\}$ and a constant $C_\rho$ such that $P(\max_{k\ge 1,1\le i\le n}\rho_{i,k}>u_n)\le C_\rho/u_n$. Then,

$$P\left(\left|\sum_{i=1}^n\big[w_i(\hat\theta)Y_i-\mathbb{E}(w_i(\hat\theta)Y_i)\big]\right|\ge t\right)\le 2\exp\left\{-\frac{t^2}{16nu_n(wC_Y)^2+4wC_Yt}\right\}+\frac{C_\rho}{u_n}, \tag{2.8}$$

where the sequence $\{w_i(\hat\theta)Y_i\}_{i=1}^n$ is dependent since each $w_i(\hat\theta)Y_i$ depends on a common estimator $\hat\theta$ computed from the data $\{Y_i\}_{i=1}^n$.
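The following small simulation (our addition; the weight construction and the constants $w$, $C_Y$, $u_n$ are purely illustrative) draws Poisson responses, forms bounded weights through the sample mean as a common estimator, and compares the empirical tail of the centered randomly weighted sum with the Bernstein-type term in (2.8); as is typical, the bound is conservative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_mc, mu = 100, 100_000, 1.0

Y = rng.poisson(mu, size=(n_mc, n))                # exponential family (Poisson) responses
theta_hat = Y.mean(axis=1, keepdims=True)          # common estimator computed from the data
idx = np.arange(1, n + 1)
W = np.sin(idx * theta_hat)                        # bounded dependent weights, |w_i(theta_hat)| <= 1

S = (W * Y).sum(axis=1)
S_centered = S - S.mean()                          # centre at (an estimate of) E[sum_i w_i Y_i]

w_bound, C_Y, u_n = 1.0, 2.0, 2.0                  # illustrative constants only
for t in [10.0, 20.0, 30.0]:
    emp = np.mean(np.abs(S_centered) >= t)
    bnd = 2 * np.exp(-t**2 / (16 * n * u_n * (w_bound * C_Y)**2 + 4 * w_bound * C_Y * t))
    print(f"t={t}: empirical tail = {emp:.5f}, Bernstein-type term = {bnd:.5f}")
```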

It should be noted that our work here is related to Proposition 3.2 in [20], which concerns concentration inequalities for non-randomly weighted sums of exponential family random variables. For condition (ii), suppose that the estimator $\hat\theta$ converges to the true parameter $\theta^*$ almost surely; then $\mathbb{E}[|Y_i|^k\mid W_{\theta^*}]=\mathbb{E}[|Y_i|^k]$ since $W_{\theta^*}$ is non-random. The relative difference between the conditional and unconditional expectations is

$$\max_{k\ge 1,1\le i\le n}|\rho_{i,k}-1|=\max_{k\ge 1,1\le i\le n}\frac{\big|\mathbb{E}[|Y_i|^k\mid W_{\hat\theta}]-\mathbb{E}[|Y_i|^k\mid W_{\theta^*}]\big|}{\mathbb{E}[|Y_i|^k\mid W_{\theta^*}]}\le O_p(\|\hat\theta-\theta^*\|_1),$$

if $\mathbb{E}[|Y_i|^k\mid W_\beta]$ is a 1-Lipschitz function of $\beta$. We call $P(\max_{k\ge 1,1\le i\le n}\rho_{i,k}>u_n)\le C_\rho/u_n$ in assumption (iii) a high-level condition. Intuitively, due to the dependent summation, the randomly weighted sum loses some rate of convergence in the exponential inequality (the additional term $4wC_Yt$ appears in the denominator) compared with the case of a non-randomly weighted sum. The assumption of a compact parameter space for the exponential family is key to obtaining sub-Gaussian type concentration inequalities.

Our multiplier concentration inequality here is different from that of [11], which studies concentration upper bounds for centered multiplier empirical processes $\frac{1}{n}\sum_{i=1}^n[W_iY_i-\mathbb{E}(W_iY_i)]$ (the random weights $\{W_i\}$ and random variables $\{Y_i\}$ need not be independent); however, they assume that $\{W_i\}$ is i.i.d. To the best of our knowledge, Theorem 2 is a new concentration inequality that is suitable for the weighted sum of dependent random variables.

Proof of Theorem 1: Let $Y=(Y_1,\ldots,Y_n)$ be a vector of independent r.v.s in a space $\mathcal{Y}$, and define $(Y_1',\ldots,Y_n')$ as an independent copy of $(Y_1,\ldots,Y_n)$. For any function $f:\mathcal{Y}^n\to\mathbb{R}$, it is of interest to study the concentration of $f(Y)$ about its expectation. In the case of Theorem 1,

$$f(Y):=\frac{1}{n}\sup_{\|\theta\|_1\le r}\left|\sum_{i=1}^n w_i(\theta)Y_i\right|.$$

For $z\in\mathcal{Y}$ and $k\in\{1,\ldots,n\}$, define the substitution operator

$$S^k_z:\mathcal{Y}^n\to\mathcal{Y}^n\quad\text{by}\quad S^k_zy:=(y_1,\ldots,y_{k-1},z,y_{k+1},\ldots,y_n)$$

    and the centered conditional version of f

$$D_{f,Y_k}(y):=f(y_1,\ldots,y_{k-1},Y_k,y_{k+1},\ldots,y_n)-\mathbb{E}f(y_1,\ldots,y_{k-1},Y_k',y_{k+1},\ldots,y_n)=f(S^k_{Y_k}y)-\mathbb{E}f(S^k_{Y_k'}y)=\mathbb{E}\big[f(S^k_{Y_k}y)-f(S^k_{Y_k'}y)\mid Y_k\big]. \tag{3.1}$$

Next, we use a sub-Gaussian concentration inequality (with a sharper constant) for $f(Z)$ from Corollary 4 in [22], which requires a $\theta_2$-norm condition on the r.v.s $\{D_{f,Z_i}(z)\}_{i=1}^n$.

Lemma 2. If $\{D_{f,Z_i}(z)\}_{i=1}^n$ have finite $\theta_2$-norms for all $z\in\mathcal{Z}$, then $f(Z)-\mathbb{E}f(Z)\sim \mathrm{subG}\big(8\sup_{z\in\mathcal{Z}}\sum_{i=1}^n\|D_{f,Z_i}(z)\|_{\theta_2}^2\big)$ and

$$P\{f(Z)-\mathbb{E}f(Z)>t\}\le e^{-t^2/\big(16\sup_{z\in\mathcal{Z}}\sum_{i=1}^n\|D_{f,Z_i}(z)\|_{\theta_2}^2\big)},\quad \forall t\ge 0.$$

    From the identity in (3.1), we have

$$\begin{aligned}
\|D_{f,Y_k}(y)\|_{\theta_2}&=\big\|f(y_1,\ldots,y_{k-1},Y_k,y_{k+1},\ldots,y_n)-\mathbb{E}f(y_1,\ldots,y_{k-1},Y_k',y_{k+1},\ldots,y_n)\big\|_{\theta_2}\\
&=\left\|\mathbb{E}\left[\frac{1}{n}\sup_{\|\theta\|_1\le r}\big|w_1(\theta)y_1+\cdots+w_{k-1}(\theta)y_{k-1}+w_k(\theta)Y_k+w_{k+1}(\theta)y_{k+1}+\cdots+w_n(\theta)y_n\big|\right.\right.\\
&\qquad\left.\left.-\frac{1}{n}\sup_{\|\theta\|_1\le r}\big|w_1(\theta)y_1+\cdots+w_{k-1}(\theta)y_{k-1}+w_k(\theta)Y_k'+w_{k+1}(\theta)y_{k+1}+\cdots+w_n(\theta)y_n\big|\,\middle|\,Y_k\right]\right\|_{\theta_2}\\
&\le\frac{1}{n}\left\|\mathbb{E}\left[\sup_{\|\theta\|_1\le r}|w_k(\theta)|\,|Y_k-Y_k'|\,\middle|\,Y_k\right]\right\|_{\theta_2}\le\frac{1}{n}\big\|\mathbb{E}\big[|Y_k-Y_k'|\,\big|\,Y_k\big]\big\|_{\theta_2}.
\end{aligned} \tag{3.2}$$

    The conditional Jensen's inequality gives

$$\mathbb{E}\big[\big|\mathbb{E}[|Y_k-Y_k'|\mid Y_k]\big|^p\big]\le\mathbb{E}\big[\mathbb{E}\{|Y_k-Y_k'|\mid Y_k\}^p\big]=\mathbb{E}\big[\mathbb{E}\{(|Y_k-Y_k'|^p)^{1/p}\mid Y_k\}^p\big]\le\mathbb{E}\big\{\mathbb{E}[|Y_k-Y_k'|^p\mid Y_k]\big\}=\mathbb{E}|Y_k-Y_k'|^p,\quad \forall p\ge 1. \tag{3.3}$$

The definition $\|X\|_{\theta_2}=\sup_{k\ge 1}\big[\frac{2^kk!}{(2k)!}\mathbb{E}X^{2k}\big]^{1/(2k)}$ shows $\|D_{f,Y_k}(y)\|_{\theta_2}\le\frac{1}{n}\big\||Y_k-Y_k'|\big\|_{\theta_2}$ by (3.2) and (3.3). Hence, we can take $\sup_{z\in\mathcal{Z}}\sum_{i=1}^n\|D_{f,Z_i}(z)\|_{\theta_2}^2\le\frac{1}{n^2}\sum_{k=1}^n\big\||Y_k-Y_k'|\big\|_{\theta_2}^2$ in Lemma 2, which leads to

$$P\{f(Y)-\mathbb{E}f(Y)>t\}\le e^{-(nt)^2/\big(16\sum_{k=1}^n\||Y_k-Y_k'|\|_{\theta_2}^2\big)},\quad \forall t\ge 0.$$

Let $\delta=e^{-(nt)^2/\big(16\sum_{k=1}^n\||Y_k-Y_k'|\|_{\theta_2}^2\big)}$, i.e., $t=4\sqrt{\frac{1}{n}\sum_{k=1}^n\big\||Y_k-Y_k'|\big\|_{\theta_2}^2}\sqrt{\frac{\log\delta^{-1}}{n}}$. We have

$$f(Y)\le t+\mathbb{E}f(Y)=4\sqrt{\frac{1}{n}\sum_{i=1}^n\big\||Y_i-Y_i'|\big\|_{\theta_2}^2}\sqrt{\frac{\log\delta^{-1}}{n}}+\mathbb{E}f(Y), \tag{3.4}$$

with probability at least $1-\delta$.

It remains to obtain a bound on $\mathbb{E}f(Y)$, which is controlled by the symmetrization theorem with different functions (Lemma 3). To see this, let $X_i=Y_i$ in Lemma 3 and $g_i(Y_i)=w_i(\theta)Y_i$ for $i=1,\ldots,n$.

Since the $w_i(\theta)$'s are a sequence of bounded functions of a common bounded vector $\theta$ with $\|\theta\|_1\le r$ and $\max_{1\le i\le n}\|w_i(\cdot)\|_\infty\le 1$, for any vector $\theta$ with $\|\theta\|_1\le r$ there exists a sequence of vectors $\{a_{w_i}\}_{i=1}^n\subset\mathbb{R}^p$ with $\|a_{w_i}\|_\infty\le 1/r$ such that

$$w_i(\theta)=a_{w_i}^\top\theta\le\|a_{w_i}\|_\infty\|\theta\|_1\le 1. \tag{3.5}$$

    Equation (3.5) and Lemma 3 imply

$$\begin{aligned}
\mathbb{E}f(Y)&\le\frac{2}{n}\mathbb{E}\left(\sup_{\|\theta\|_1\le r}\left|\sum_{i=1}^n w_i(\theta)\epsilon_iY_i\right|\right)=\frac{2}{n}\mathbb{E}\left(\sup_{\|\theta\|_1\le r}\left|\sum_{i=1}^n\sum_{j=1}^p\epsilon_iY_ia_{w_ij}\theta_j\right|\right)=\frac{2}{n}\mathbb{E}\left(\sup_{\|\theta\|_1\le r}\left|\sum_{j=1}^p\Big(\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\Big)\theta_j\right|\right)\\
&\overset{[\text{by H\"older's inequality}]}{\le}\frac{2}{n}\mathbb{E}\left(\sup_{\|\theta\|_1\le r}\max_{1\le j\le p}\left|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\right|\cdot\|\theta\|_1\right)\le\frac{2r}{n}\mathbb{E}\left(\max_{1\le j\le p}\left|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\right|\right)=\frac{2r}{n}\mathbb{E}\left(\mathbb{E}_\epsilon\max_{1\le j\le p}\left|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\right|\right).
\end{aligned}$$

Next, we apply the maximal inequality. By Corollary 7.5 in [20], with $\mathbb{E}[\epsilon_iY_ia_{w_ij}\mid Y]=0$ and $|\epsilon_iY_ia_{w_ij}|\le\max_{1\le i\le n}\|a_{w_i}\|_\infty|Y_i|\le r^{-1}|Y_i|$, one has

$$\frac{2r}{n}\mathbb{E}\left(\mathbb{E}_\epsilon\max_{1\le j\le p}\left|\sum_{i=1}^n\epsilon_iY_ia_{w_ij}\right|\right)\le\frac{2}{n}\sqrt{2\log(2p)}\,\mathbb{E}\left(\sqrt{\sum_{i=1}^nY_i^2}\right)\overset{[\text{by Jensen's inequality}]}{\le}\frac{2}{n}\sqrt{2\log(2p)}\sqrt{\mathbb{E}\sum_{i=1}^nY_i^2}=2\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{E}Y_i^2}\sqrt{\frac{2\log(2p)}{n}}.$$

Thus, $\mathbb{E}f(Y)\le 2\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{E}Y_i^2}\sqrt{\frac{2\log(2p)}{n}}$. Using (3.4),

$$f(Y)\le t+\mathbb{E}f(Y)\le 4\sqrt{\frac{1}{n}\sum_{i=1}^n\big\||Y_i-Y_i'|\big\|_{\theta_2}^2}\sqrt{\frac{\log\delta^{-1}}{n}}+2\sqrt{\frac{1}{n}\sum_{i=1}^n\mathbb{E}Y_i^2}\sqrt{\frac{2\log(2p)}{n}}, \tag{3.6}$$

with probability at least $1-\delta$.

Proof of Theorem 2: We will use the following result, which gives a moment inequality for the exponential family. It is deduced from analytic properties of the absolute moments of exponential family random variables:

$$\mathbb{E}|Y|^k\le k!C_Y^k,$$

see Proposition 5.2 in [20]. For notational simplicity, let $W:=W_{\hat\theta}$ and $W_i=w_i(\hat\theta)$. By using Taylor's expansion and the binomial formula, we have the following upper bound for the conditional moment generating function of $W_iY_i-\mathbb{E}(W_iY_i)$, on the event $\{\max_{k\ge 1,1\le i\le n}\rho_{i,k}\le u_n\}$:

$$\begin{aligned}
\mathbb{E}\big[e^{s(W_iY_i-\mathbb{E}(W_iY_i))}\mid W\big]&=1+\sum_{m=2}^\infty\frac{s^m}{m!}\mathbb{E}\big[(W_iY_i-\mathbb{E}(W_iY_i))^m\mid W\big]\\
&=1+\sum_{m=2}^\infty\frac{s^m}{m!}\mathbb{E}\left[\sum_{k=0}^m\binom{m}{k}(W_iY_i)^k(-\mathbb{E}(W_iY_i))^{m-k}\,\middle|\,W\right]\\
&\le 1+\sum_{m=2}^\infty\frac{s^m}{m!}\left[\sum_{k=0}^m\binom{m}{k}\mathbb{E}\big[|W_iY_i|^k\mid W\big](\mathbb{E}|W_iY_i|)^{m-k}\right]\\
&\overset{(\text{due to }\max_{1\le i\le n}|W_i|\le w)}{\le} 1+\sum_{m=2}^\infty\frac{s^m}{m!}\left[w^m\sum_{k=0}^m\binom{m}{k}\mathbb{E}(|Y_i|^k\mid W)(\mathbb{E}|Y_i|)^{m-k}\right]\\
&\le 1+\sum_{m=2}^\infty\frac{s^m}{m!}\left[(2w)^m\max_{1\le k\le m}\big\{\mathbb{E}(|Y_i|^k\mid W)(\mathbb{E}|Y_i|)^{m-k}\big\}\right]\\
&\overset{(\text{by assumption (iii)})}{\le} 1+u_n\sum_{m=2}^\infty\frac{(2ws)^m}{m!}\max_{1\le k\le m}\big\{\mathbb{E}(|Y_i|^k)(\mathbb{E}|Y_i|)^{m-k}\big\},
\end{aligned} \tag{3.7}$$

for $s\in(0,\delta)$ with some $\delta>0$.

Therefore, we can assume that $|2swC_Y|<1$, so

$$\mathbb{E}\big[e^{s(W_iY_i-\mathbb{E}(W_iY_i))}\mid W\big]\le 1+u_n\sum_{m=2}^\infty\frac{(2ws)^m}{m!}\big[m!C_Y^m\big]=1+u_n(2swC_Y)^2\sum_{m=2}^\infty(2swC_Y)^{m-2}=1+\frac{u_n(2swC_Y)^2}{1-2swC_Y}\le e^{\frac{u_n(2swC_Y)^2}{1-2swC_Y}}. \tag{3.8}$$

Define the randomly weighted sum $S_n^W:=\sum_{i=1}^nW_iY_i$. By the conditional independence of $\{W_iY_i\mid W\}_{i=1}^n$, it follows from (3.8) that

$$\mathbb{E}\big[e^{s(S_n^W-\mathbb{E}S_n^W)}\mid W\big]=\prod_{i=1}^n\mathbb{E}\big[\exp\{s[W_iY_i-\mathbb{E}(W_iY_i)]\}\mid W\big]\le e^{\frac{nu_n(2swC_Y)^2}{1-2swC_Y}}. \tag{3.9}$$

By the conditional Markov inequality, on the event $\{\max_{k\ge 1,1\le i\le n}\rho_{i,k}\le u_n\}$ we have, for $a>0$,

$$\begin{aligned}
P\big(|S_n^W-\mathbb{E}S_n^W|\ge t\mid W\big)&\le P\big(a(S_n^W-\mathbb{E}S_n^W)\ge at\mid W\big)+P\big(a(-S_n^W+\mathbb{E}S_n^W)\ge at\mid W\big)\\
&\le\frac{\mathbb{E}\big[e^{a(S_n^W-\mathbb{E}S_n^W)}\mid W\big]}{\exp(at)}+\frac{\mathbb{E}\big[e^{a(-S_n^W+\mathbb{E}S_n^W)}\mid W\big]}{\exp(at)}\\
&\overset{[\text{using (3.9) as }a\in(-\delta,\delta)]}{\le}2\exp\left\{\frac{nu_n(2awC_Y)^2}{1-2awC_Y}-at\right\}=2\exp\left\{-\frac{t^2}{16nu_n(wC_Y)^2+4wC_Yt}\right\},
\end{aligned} \tag{3.10}$$

where the last equality is obtained by setting $a=\frac{t}{8nu_n(wC_Y)^2+2wC_Yt}$.

Taking the expectation w.r.t. $W$ in (3.10), it follows that

$$\begin{aligned}
P\big(|S_n^W-\mathbb{E}S_n^W|\ge t\big)&=P\Big(|S_n^W-\mathbb{E}S_n^W|\ge t,\ \max_{k\ge 1,1\le i\le n}\rho_{i,k}>u_n\Big)+P\Big(|S_n^W-\mathbb{E}S_n^W|\ge t,\ \max_{k\ge 1,1\le i\le n}\rho_{i,k}\le u_n\Big)\\
&\le P\Big(\max_{k\ge 1,1\le i\le n}\rho_{i,k}>u_n\Big)+P\Big(|S_n^W-\mathbb{E}S_n^W|\ge t,\ \max_{k\ge 1,1\le i\le n}\rho_{i,k}\le u_n\Big)\\
&\le C_\rho/u_n+\mathbb{E}\left[P\Big(|S_n^W-\mathbb{E}S_n^W|\ge t,\ \max_{k\ge 1,1\le i\le n}\rho_{i,k}\le u_n\,\Big|\,W\Big)\right]\\
&\le C_\rho/u_n+2\exp\left\{-\frac{t^2}{16nu_n(wC_Y)^2+4wC_Yt}\right\}.
\end{aligned}$$

Lemma 3. [Symmetrization theorem with different functions] Let $\varepsilon_1,\ldots,\varepsilon_n$ be a Rademacher sequence with uniform distribution on $\{-1,1\}$, independent of $X_1,\ldots,X_n$, and let $g_i\in\mathcal{G}_i$. Then,

$$\mathbb{E}\left(\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\big[g_i(X_i)-\mathbb{E}\{g_i(X_i)\}\big]\right|\right)\le 2\,\mathbb{E}\left[\mathbb{E}_\varepsilon\left\{\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\varepsilon_ig_i(X_i)\right|\right\}\right],$$

where $\mathbb{E}_\varepsilon\{\cdot\}$ refers to the expectation w.r.t. $\varepsilon_1,\ldots,\varepsilon_n$.

Proof: Let $\{X_i'\}_{i=1}^n$ be an independent copy of $\{X_i\}_{i=1}^n$. Let $\mathbb{E}'$ denote the expectation w.r.t. $\{X_i'\}_{i=1}^n$, and let $\mathcal{F}_n=\sigma(X_1,\ldots,X_n)$. So,

$$\begin{aligned}
\mathbb{E}\left(\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\big[g_i(X_i)-\mathbb{E}\{g_i(X_i)\}\big]\right|\right)&=\mathbb{E}\left(\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\mathbb{E}'\left[\sum_{i=1}^n\big[g_i(X_i)-g_i(X_i')\big]\,\middle|\,\mathcal{F}_n\right]\right|\right)\\
(\text{Jensen's inequality for the absolute value function})\quad&\le\mathbb{E}\left(\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\mathbb{E}'\left[\left|\sum_{i=1}^n\big[g_i(X_i)-g_i(X_i')\big]\right|\,\middle|\,\mathcal{F}_n\right]\right)\\
(\text{Jensen's inequality for the max function})\quad&\le\mathbb{E}\left(\mathbb{E}'\left[\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\big[g_i(X_i)-g_i(X_i')\big]\right|\,\middle|\,\mathcal{F}_n\right]\right)\\
&=\mathbb{E}\left(\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\big[g_i(X_i)-g_i(X_i')\big]\right|\right),
\end{aligned}$$

where we use the conditional expectation version of Jensen's inequality.

Since $\varepsilon_i[g_i(X_i)-g_i(X_i')]$ and $g_i(X_i)-g_i(X_i')$ have the same distribution, the last expression

$$=\mathbb{E}\left(\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\varepsilon_i\big[g_i(X_i)-g_i(X_i')\big]\right|\right)\le 2\,\mathbb{E}\left[\mathbb{E}_\varepsilon\left\{\sup_{g_1,\ldots,g_n\in\mathcal{G}_1,\ldots,\mathcal{G}_n}\left|\sum_{i=1}^n\varepsilon_ig_i(X_i)\right|\right\}\right].$$

If $g_i=f$ and $\mathcal{G}_i=\mathcal{F}$ for $i=1,2,\ldots,n$, then we recover the classical symmetrization theorem.

Lemma 4. [Symmetrization Theorem, Lemma 2.3.1 in [17]] Let $\varepsilon_1,\ldots,\varepsilon_n$ be a Rademacher sequence with uniform distribution on $\{-1,1\}$, independent of $X_1,\ldots,X_n$, and let $f\in\mathcal{F}$. Then, we have

$$\mathbb{E}\left[\sup_{f\in\mathcal{F}}\left|\sum_{i=1}^n\big[f(X_i)-\mathbb{E}\{f(X_i)\}\big]\right|\right]\le 2\,\mathbb{E}\left[\mathbb{E}_\varepsilon\left\{\sup_{f\in\mathcal{F}}\left|\sum_{i=1}^n\varepsilon_if(X_i)\right|\right\}\right],$$

where $\mathbb{E}[\cdot]$ refers to the expectation w.r.t. $X_1,\ldots,X_n$ and $\mathbb{E}_\varepsilon\{\cdot\}$ w.r.t. $\varepsilon_1,\ldots,\varepsilon_n$.
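As a quick numerical sanity check of Lemma 4 (our addition; the function class, sample size and Monte Carlo budget are arbitrary), the snippet below estimates both sides of the symmetrization inequality for a finite class of cosine functions, using the fact that $\mathbb{E}\cos(aX)=e^{-a^2/2}$ when $X\sim N(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_mc = 50, 2000
a_grid = np.linspace(0.5, 3.0, 8)                  # finite class F = {x -> cos(a x)}

lhs, rhs = [], []
for _ in range(n_mc):
    X = rng.normal(size=n)
    F = np.cos(np.outer(a_grid, X))                # values f(X_i), shape (|F|, n)
    centered = F - np.exp(-a_grid[:, None] ** 2 / 2)   # subtract E cos(aX) = exp(-a^2/2)
    lhs.append(np.max(np.abs(centered.sum(axis=1))))   # sup_f |sum_i (f(X_i) - E f(X_i))|
    eps = rng.choice([-1, 1], size=n)                  # Rademacher multipliers
    rhs.append(np.max(np.abs((eps * F).sum(axis=1))))  # sup_f |sum_i eps_i f(X_i)|

print(f"E sup|sum(f - Ef)| ~ {np.mean(lhs):.3f}  <=  2 E sup|sum eps f| ~ {2 * np.mean(rhs):.3f}")
```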

In this section, we restate the application of the concentration inequality for a function of the data under the so-called strongly log-concave discrete distribution assumption, which was used in the Supplementary Material of [21]. We utilize a convex geometry approach to establish the tail bounds. In convex geometry, the following discrete version of the Prékopa-Leindler inequality can be found in Theorem 1.2 of [5]. The discrete version of the Prékopa-Leindler inequality is an essential tool for deriving concentration inequalities of strongly log-concave counting measures. This shares the same idea as the continuous version of the Prékopa-Leindler inequality (see Theorem 3.15 of [18]).

Let $\lfloor r\rfloor=\max\{m\in\mathbb{Z};\ m\le r\}$ be the lower integer part of $r\in\mathbb{R}$, and $\lceil r\rceil=-\lfloor -r\rfloor$ be the upper integer part. Denote $\lfloor x\rfloor=(\lfloor x_1\rfloor,\ldots,\lfloor x_n\rfloor)$ and $\lceil x\rceil=(\lceil x_1\rceil,\ldots,\lceil x_n\rceil)$.

Lemma 5. [Discrete Prékopa-Leindler inequality] Let $f,g,h,k:\mathbb{Z}^n\to[0,\infty)$ be functions that satisfy the following inequality:

$$f(x)g(y)\le h\big(\lfloor\lambda x+(1-\lambda)y\rfloor\big)\,k\big(\lceil(1-\lambda)x+\lambda y\rceil\big),\quad\forall x,y\in\mathbb{Z}^n,\ \lambda\in[0,1]. \tag{3.11}$$

    Then, we have

$$\left(\sum_{x\in\mathbb{Z}^n}f(x)\right)\left(\sum_{x\in\mathbb{Z}^n}g(x)\right)\le\left(\sum_{x\in\mathbb{Z}^n}h(x)\right)\left(\sum_{x\in\mathbb{Z}^n}k(x)\right).$$

From a geometric perspective, the Prékopa-Leindler inequality is a valuable method to prove concentration inequalities for Lipschitz functions of strongly log-concave distributions. Following the idea in [12], a distribution $P$ with a density $p(x)$ (w.r.t. the counting measure) is said to be strongly discrete log-concave if $\psi(x):=-\log p(x):\mathbb{Z}^n\to\mathbb{R}$ is strongly midpoint log-convex for some $\gamma>0$:

$$\psi(x)+\psi(y)-\psi\left(\left\lceil\tfrac{1}{2}x+\tfrac{1}{2}y\right\rceil\right)-\psi\left(\left\lfloor\tfrac{1}{2}x+\tfrac{1}{2}y\right\rfloor\right)\ge\frac{\gamma}{4}\|x-y\|_2^2,\quad\forall x,y\in\mathbb{Z}^n. \tag{3.12}$$

The inequality (3.12) is an extension of strong convexity for continuous functions on $\mathbb{R}^n$:

$$\lambda\psi(x)+(1-\lambda)\psi(y)-\psi(\lambda x+(1-\lambda)y)\ge\frac{\gamma}{2}\lambda(1-\lambda)\|x-y\|_2^2,\quad\forall x,y\in\mathbb{R}^n,\ \lambda\in[0,1],$$

    with modulus of convexity γ [13].

The strongly log-convex property for a discrete density function requires restricting continuous functions to a lattice space. If $\gamma=0$, (3.12) reduces to the discrete midpoint convexity property for $\psi(x)$:

$$\psi(x)+\psi(y)\ge\psi\left(\left\lceil\tfrac{1}{2}x+\tfrac{1}{2}y\right\rceil\right)+\psi\left(\left\lfloor\tfrac{1}{2}x+\tfrac{1}{2}y\right\rfloor\right),\quad\forall x,y\in\mathbb{Z}^n,$$

see [12]. However, directly restricting a continuous convex function to some lattice space does not necessarily yield a discrete convex function. For a corresponding counterexample, see [19].

For a one-dimensional $P$, the probability mass function $p(x)$ is said to be log-concave if the sequence $\{p(x)\}_{x\in\mathbb{Z}}$ is log-concave; that is, for any $\lambda n+(1-\lambda)m\in\mathbb{Z}$ with $m,n\in\mathbb{Z}$ and $\lambda\in(0,1)$, one has

$$p(\lambda n+(1-\lambda)m)\ge p(n)^\lambda p(m)^{1-\lambda}.$$
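For a pmf supported on consecutive integers this can be checked via the three-term inequality $p(k)^2\ge p(k-1)p(k+1)$; the following sketch (our own illustration; the choices of $\theta$, $\mu$ and the scanned range are arbitrary) verifies it numerically for a negative binomial pmf.

```python
import numpy as np
from math import lgamma, log

def nb_log_pmf(k, theta, mu):
    """log pmf of the negative binomial with mean mu and dispersion theta."""
    return (lgamma(k + theta) - lgamma(theta) - lgamma(k + 1)
            + theta * log(theta / (theta + mu)) + k * log(mu / (theta + mu)))

def is_log_concave(log_p, tol=1e-12):
    """Check 2*log p(k) >= log p(k-1) + log p(k+1) for all interior k."""
    lp = np.asarray(log_p)
    return bool(np.all(2 * lp[1:-1] >= lp[:-2] + lp[2:] - tol))

theta, mu, K = 2.0, 3.0, 200
log_p = [nb_log_pmf(k, theta, mu) for k in range(K)]
print("NB pmf log-concave on {0,...,%d}:" % (K - 1), is_log_concave(log_p))
```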

Proposition 1. [The concentration inequality of strongly log-concave discrete distributions] Consider a strongly log-concave discrete distribution $P_\gamma$ with index $\gamma>0$ on $\mathbb{Z}^n$. For $f:\mathbb{R}^n\to\mathbb{R}$ that is $L$-Lipschitz w.r.t. the Euclidean norm, we have

$$\Pr\{|f(X)-\mathbb{E}f(X)|\ge t\}\le 2e^{-\frac{\gamma t^2}{4L^2}}. \tag{3.13}$$

Proof: Let $h$ be a zero-mean function with Lipschitz constant $L$ (w.r.t. the Euclidean norm). It suffices to prove the moment generating function bound $\mathbb{E}e^{h(X)}\le e^{L^2/\gamma}$. Then, for $f$ with Lipschitz constant $K$ and $\lambda\in\mathbb{R}$, we apply this upper bound to the zero-mean function $h(X):=\lambda(f(X)-\mathbb{E}f(X))$, which has Lipschitz constant $L=|\lambda|K$. Given $\lambda\in(0,1)$ and $x,y\in\mathbb{Z}^n$, define the proximity operator of $h$ as

$$l(y):=\inf_{x\in\mathbb{Z}^n}\left\{h(x)+\frac{\gamma}{4}\|x-y\|_2^2\right\}.$$

With this proximity operator, the proof proceeds by using the discrete Prékopa-Leindler inequality (Lemma 5) with $\lambda=1/2$, $h(t)=k(t):=p(t)=e^{-\psi(t)}$, $f(x):=e^{-h(x)-\psi(x)}$ and $g(y):=e^{l(y)-\psi(y)}$ (with a slight abuse of notation: $h(t),k(t)$ here denote the Lemma 5 functions set equal to the pmf $p$, while $h(x)$ in the exponents is the zero-mean Lipschitz function above). We check that

$$e^{\frac{1}{2}[l(y)-h(x)-\psi(y)-\psi(x)]}\le e^{-\frac{1}{2}\psi\left(\left\lceil\frac{1}{2}x+\frac{1}{2}y\right\rceil\right)}e^{-\frac{1}{2}\psi\left(\left\lfloor\frac{1}{2}x+\frac{1}{2}y\right\rfloor\right)},\quad\forall x,y\in\mathbb{Z}^n. \tag{3.14}$$

Then, (3.11) is satisfied and Lemma 5 applies with $\lambda=1/2$.

    By discrete strong convexity of the function ψ

$$\frac{1}{2}\left[\psi(x)+\psi(y)-\psi\left(\left\lceil\tfrac{1}{2}x+\tfrac{1}{2}y\right\rceil\right)-\psi\left(\left\lfloor\tfrac{1}{2}x+\tfrac{1}{2}y\right\rfloor\right)\right]\ge\frac{\gamma}{8}\|x-y\|_2^2,$$

    and the proximity operator of h, we have

$$-\frac{1}{2}\psi\left(\left\lceil\tfrac{1}{2}x+\tfrac{1}{2}y\right\rceil\right)-\frac{1}{2}\psi\left(\left\lfloor\tfrac{1}{2}x+\tfrac{1}{2}y\right\rfloor\right)\ge\frac{1}{2}\left\{l(y)-h(x)-\frac{\gamma}{4}\|x-y\|_2^2\right\}-\frac{1}{2}\psi\left(\left\lceil\tfrac{1}{2}x+\tfrac{1}{2}y\right\rceil\right)-\frac{1}{2}\psi\left(\left\lfloor\tfrac{1}{2}x+\tfrac{1}{2}y\right\rfloor\right)\ge\frac{1}{2}\{l(y)-h(x)\}-\frac{1}{2}\psi(y)-\frac{1}{2}\psi(x),$$

    which verifies (3.14).

By $\sum_{x\in\mathbb{Z}^n}h(x)=\sum_{x\in\mathbb{Z}^n}k(x)=1$, we know that Lemma 5 gives

$$\mathbb{E}e^{l(Y)}\cdot\mathbb{E}e^{-h(X)}=\left(\sum_{x\in\mathbb{Z}^n}e^{-h(x)-\psi(x)}\right)\left(\sum_{y\in\mathbb{Z}^n}e^{l(y)-\psi(y)}\right)\le 1.$$

    Then, Jensen's inequality implies

$$\mathbb{E}e^{l(Y)}\le\big(\mathbb{E}e^{-h(X)}\big)^{-1}\le\big(e^{-\mathbb{E}[h(X)]}\big)^{-1}=1,$$

where in the last equality we use $\mathbb{E}[h(X)]=\mathbb{E}[\lambda(f(X)-\mathbb{E}f(X))]=0$. The definition of the proximity operator shows

$$1\ge\mathbb{E}e^{l(Y)}=\mathbb{E}e^{\inf_{x\in\mathbb{Z}^n}\{h(x)+\frac{\gamma}{4}\|x-Y\|_2^2\}}=\mathbb{E}e^{\inf_{x\in\mathbb{Z}^n}\{h(Y)+[h(x)-h(Y)]+\frac{\gamma}{4}\|x-Y\|_2^2\}}\ge\mathbb{E}e^{h(Y)+\inf_{x\in\mathbb{R}^n}\{-L\|x-Y\|_2+\frac{\gamma}{4}\|x-Y\|_2^2\}}=\mathbb{E}e^{h(Y)-L^2/\gamma},$$

where the second inequality is due to the $L$-Lipschitz property of $h$, i.e., $|h(x)-h(Y)|\le L\|x-Y\|_2$.

Then, we have $\mathbb{E}e^{\lambda(f(X)-\mathbb{E}f(X))}\le e^{\lambda^2K^2/\gamma}=e^{\frac{1}{2}\lambda^2\cdot\frac{2K^2}{\gamma}}$ for all $\lambda\in\mathbb{R}$. This means that $f(X)-\mathbb{E}f(X)\sim\mathrm{subG}\big(\tfrac{2K^2}{\gamma}\big)$; taking $K=L$ as in the statement of Proposition 1, the tail bound (3.13) follows.

    Non-asymptotic statistical inference on high-dimensional data is important for many fields, such as data mining and machine learning. In this paper, we derived a novel concentration inequality for the sum of independent sub-Gaussian variables with random dependent weights in high-dimensional regression settings. We applied the proposed concentration inequality to obtain a high probability bound for the stochastic Lipschitz constant for negative binomial loss functions involved in Lasso-penalized negative binomial regressions, and used this bound to study oracle inequalities for the Lasso estimators. The usefulness of the proposed concentration inequality in applications was justified by solid theoretical proofs.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors would like to thank the reviewers and Dr. Pengfei Wang for their valuable suggestions that have significantly improved the quality of this paper. This work was supported in part by the National Natural Science Foundation of China under grant numbers 12101630 and 12261011, and the "double first-class" construction projects of Chinese universities under grant number ZG216S2348.

    The authors declare no conflict of interest.



    [1] V. V. Buldygin, Y. V. Kozachenko, Metric characterization of random variables and random processes, Providence: American Mathematical Society, 2000.
    [2] S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities: A nonasymptotic theory of independence, Oxford: Oxford University Press, 2013.
    [3] P. Bühlmann, S. A. van de Geer, Statistics for high-dimensional data: methods, theory and applications, Berlin: Springer, 2011. https://doi.org/10.1007/978-3-642-20192-9
    [4] Z. Chi, A local stochastic Lipschitz condition with application to Lasso for high dimensional generalized linear models, arXiv: 1009.1052. https://doi.org/10.48550/arXiv.1009.1052
    [5] D. Halikias, B. Klartag, B. A. Slomka, Discrete variants of Brunn-Minkowski type inequalities, Annales de la Faculté des Sciences de Toulouse Mathématiques, 30 (2021), 267–279. https://doi.org/10.5802/afst.1674
    [6] Q. Han, J. A. Wellner, Convergence rates of least squares regression estimators with heavy-tailed errors, Ann. Statist., 47 (2019), 2286–2319. https://doi.org/10.1214/18-AOS1748 doi: 10.1214/18-AOS1748
    [7] Q. Han, Multiplier U-processes: sharp bounds and applications, Bernoulli, 28 (2022), 87–124. https://doi.org/10.3150/21-BEJ1334 doi: 10.3150/21-BEJ1334
    [8] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Am. Stat. Assoc., 58 (1963), 13–30. https://doi.org/10.1080/01621459.1963.10500830 doi: 10.1080/01621459.1963.10500830
    [9] J. Kahane, Propriétés locales des fonctions à séries de Fourier aléatoires, Stud. Math., 19 (1960), 1–25. https://doi.org/10.4064/sm-19-1-1-25 doi: 10.4064/sm-19-1-1-25
    [10] S. Li, H. Wei, X. Lei, Heterogeneous overdispersed count data regressions via double-penalized estimations, Mathematics, 10 (2022), 1700. https://doi.org/10.3390/math10101700 doi: 10.3390/math10101700
    [11] S. Mendelson, Upper bounds on product and multiplier empirical processes, Stoch. Proc. Appl., 126 (2016), 3652–3680. https://doi.org/10.1016/j.spa.2016.04.019 doi: 10.1016/j.spa.2016.04.019
    [12] S. Moriguchi, K. Murota, A. Tamura, F. Tardella, Discrete midpoint convexity, Math. Oper. Res., 45 (2020), 99–128. https://doi.org/10.1287/moor.2018.0984
    [13] M. W. Mahoney, J. C. Duchi, A. C. Gilbert, The mathematics of data, Providence: American Mathematical Society, 2018.
[14] P. Massart, Some applications of concentration inequalities to statistics, Annales de la Faculté des Sciences de Toulouse Mathématiques, 9 (2000), 245–303. https://doi.org/10.5802/afst.961 doi: 10.5802/afst.961
[15] P. Rigollet, J. C. Hütter, High dimensional statistics, New York: Springer, 2019.
    [16] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, arXiv: 1011.3027. https://doi.org/10.48550/arXiv.1011.3027
[17] A. W. van der Vaart, J. A. Wellner, Weak convergence and empirical processes: with applications to statistics, New York: Springer, 1996. https://doi.org/10.1007/978-1-4757-2545-2
    [18] M. J. Wainwright, High-dimensional statistics: a non-asymptotic viewpoint, Cambridge: Cambridge University Press, 2019.
    [19] Ü. Yüceer, Discrete convexity: convexity for functions defined on discrete spaces, Discrete Appl. Math., 119 (2002), 297–304. https://doi.org/10.1016/S0166-218X(01)00191-3 doi: 10.1016/S0166-218X(01)00191-3
    [20] H. Zhang, S. Chen, Concentration inequalities for statistical inference, Commun. Math. Res., 37 (2021), 1–85 https://doi.org/10.4208/cmr.2020-0041 doi: 10.4208/cmr.2020-0041
    [21] H. Zhang, J. Jia, Elastic-net regularized high-dimensional negative binomial regression: consistency and weak signals detection, Stat. Sinica, 32 (2022), 181–207. https://doi.org/10.5705/SS.202019.0315 doi: 10.5705/SS.202019.0315
    [22] H. Zhang, X. Lei, Growing-dimensional partially functional linear models: non-asymptotic optimal prediction error, Phys. Scr., 98 (2023), 095216. https://doi.org/10.1088/1402-4896/aceac0 doi: 10.1088/1402-4896/aceac0
    [23] H. Zhang, H. Wei, G. Cheng, Tight non-asymptotic inference via sub-Gaussian intrinsic moment norm, arXiv: 2303.07287. https://doi.org/10.48550/arXiv.2303.07287
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
