Research article

Cross-view learning with scatters and manifold exploitation in geodesic space


  • Received: 11 April 2023 Revised: 07 July 2023 Accepted: 23 July 2023 Published: 31 July 2023
  • Cross-view data correlation analysis is a typical learning paradigm in machine learning and pattern recognition. To associate data from different views, many approaches to correlation learning have been proposed, among which canonical correlation analysis (CCA) is a representative. When data is associated with label information, CCA can be extended to a supervised version by embedding the supervision information. Although most variants of CCA have achieved good performance, nearly all of their objective functions are nonconvex, implying that their optimal solutions are difficult to obtain. More seriously, the discriminative scatters and manifold structures are not exploited simultaneously. To overcome these shortcomings, in this paper we construct a Discriminative Correlation Learning with Manifold Preservation, DCLMP for short, in which, in addition to the within-view supervision information, discriminative knowledge as well as spatial structural information are exploited to benefit subsequent decision making. To pursue a closed-form solution, we remodel the objective of DCLMP from the Euclidean space to a geodesic space and obtain a convex formulation of DCLMP (C-DCLMP). Finally, we have comprehensively evaluated the proposed methods and demonstrated their superiority on both toy and real datasets.

    Citation: Qing Tian, Heng Zhang, Shiyu Xia, Heng Xu, Chuang Ma. Cross-view learning with scatters and manifold exploitation in geodesic space[J]. Electronic Research Archive, 2023, 31(9): 5425-5441. doi: 10.3934/era.2023275




    This paper considers the following heteroscedastic model:

    $$Y_i = f(X_i)U_i + g(X_i), \quad i \in \{1, \ldots, n\}. \tag{1.1}$$

    In this equation, $g(x)$ is a known mean function, and the variance function $r(x)$ $(r(x) := f^2(x))$ is unknown. Both the mean function $g(x)$ and the variance function $r(x)$ are defined on $[0,1]$. The random variables $U_1, \ldots, U_n$ are independent and identically distributed (i.i.d.) with $E[U_i] = 0$ and $V[U_i] = 1$. Furthermore, the random variable $X_i$ is independent of $U_i$ for any $i \in \{1, \ldots, n\}$. The purpose of this paper is to estimate the $m$th derivative function $r^{(m)}(x)$ $(m \in \mathbb{N})$ from the observed data $(X_1, Y_1), \ldots, (X_n, Y_n)$ by a wavelet method.
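To make the model concrete, the following minimal sketch draws a sample from (1.1). The uniform design and Gaussian noise anticipate assumption A4 and the simulation section; the particular `f` and `g` below, and the helper name `simulate_heteroscedastic`, are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_heteroscedastic(n, f, g, seed=0):
    """Draw (X_i, Y_i), i = 1..n, from Y_i = f(X_i) U_i + g(X_i)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)   # design points on [0, 1] (assumed uniform)
    u = rng.standard_normal(n)          # E[U] = 0, V[U] = 1, independent of X
    y = f(x) * u + g(x)
    return x, y

# Hypothetical functions, not taken from the paper.
f = lambda t: 0.5 + t                   # so r(t) = (0.5 + t)^2
g = lambda t: np.sin(2 * np.pi * t)     # known mean function

x, y = simulate_heteroscedastic(4096, f, g)
# (Y - g(X))^2 has conditional mean r(X), so its average approximates
# the integral of r over [0, 1], which equals 13/12 for this f.
print(np.mean((y - g(x)) ** 2))
```

Since $E[(Y - g(X))^2 \mid X] = r(X)$, the printed average gives a quick sanity check that the variance function, not the mean function, drives the spread of the data.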

    Heteroscedastic models are widely used in economics, engineering, biology, physical sciences and so on; see Box [1], Carroll and Ruppert [2], Härdle and Tsybakov [3], Fan and Yao [4], Quevedo and Vining [5] and Amerise [6]. For the above estimation model (1.1), the most popular method is the kernel method. Many important and interesting results of kernel estimators have been obtained by Wang et al. [7], Kulik and Wichelhaus [8] and Shen et al. [9]. However, the optimal bandwidth parameter of the kernel estimator is not easily obtained in some cases, especially when the function has some sharp spikes. Because of the good local properties in both time and frequency domains, the wavelet method has been widely used in nonparametric estimation problems; see Donoho and Johnstone [10], Cai [11], Nason et al. [12], Cai and Zhou [13], Abry and Didier [14] and Li and Zhang [15]. For the estimation problem (1.1), Kulik and Raimondo [16] studied the adaptive properties of warped wavelet nonlinear approximations over a wide range of Besov scales. Zhou et al. [17] developed wavelet estimators for detecting and estimating jumps and cusps in the mean function. Palanisamy and Ravichandran [18] proposed a data-driven estimator by applying wavelet thresholding along with the technique of sparse representation. The asymptotic normality for wavelet estimators of the variance function under an $\alpha$-mixing condition was obtained by Ding and Chen [19].

    In this paper, we focus on nonparametric estimation of the derivative function $r^{(m)}(x)$ of the variance function $r(x)$. It is well known that derivative estimation plays an important and useful role in many practical applications (Woltring [20], Zhou and Wolfe [21], Chacón and Duong [22], Wei et al. [23]). For the estimation model (1.1), a linear wavelet estimator and an adaptive nonlinear wavelet estimator for the derivative function $r^{(m)}(x)$ are constructed. Moreover, the convergence rates over $L^{\tilde p}$ $(1 \le \tilde p < \infty)$ risk of the two wavelet estimators are proved in the Besov space $B^s_{p,q}(\mathbb{R})$ under some mild conditions. Finally, numerical experiments are carried out, where an automatic selection method is used to obtain the best parameters of the two wavelet estimators. According to the simulation study, both wavelet estimators can efficiently estimate the derivative function. Furthermore, the nonlinear wavelet estimator shows better performance than the linear estimator.

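    This paper considers wavelet estimations of a derivative function in Besov space. We first introduce some basic concepts of wavelets. Let $\phi$ be an orthonormal scaling function, and denote the corresponding wavelet function by $\psi$. It is well known that $\{\phi_{\tau,k} := 2^{\tau/2}\phi(2^{\tau}x - k),\ \psi_{j,k} := 2^{j/2}\psi(2^{j}x - k),\ j \ge \tau,\ k \in \mathbb{Z}\}$ forms an orthonormal basis of $L^2(\mathbb{R})$. This paper uses the Daubechies wavelet, which has compact support. Then, for any integer $j_*$, a function $h(x) \in L^2([0,1])$ can be expanded into a wavelet series as

    $$h(x) = \sum_{k \in \Lambda_{j_*}} \alpha_{j_*,k}\,\phi_{j_*,k}(x) + \sum_{j=j_*}^{\infty} \sum_{k \in \Lambda_j} \beta_{j,k}\,\psi_{j,k}(x), \quad x \in [0,1]. \tag{1.2}$$

    In this equation, $\Lambda_j = \{0, 1, \ldots, 2^j - 1\}$, $\alpha_{j_*,k} = \langle h, \phi_{j_*,k} \rangle_{[0,1]}$ and $\beta_{j,k} = \langle h, \psi_{j,k} \rangle_{[0,1]}$.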
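The expansion above can be illustrated numerically. The sketch below uses the Haar system (the simplest member of the Daubechies family) purely because its scaling and wavelet functions have closed forms; the paper itself works with smoother Daubechies wavelets. The helper names, the test function and the truncation level `j_max` are assumptions made for illustration, and the inner products are approximated by a midpoint rule.

```python
import numpy as np

def haar_phi(j, k, x):
    """phi_{j,k}(x) = 2^{j/2} phi(2^j x - k), with phi the indicator of [0, 1)."""
    t = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * np.where((t >= 0) & (t < 1), 1.0, 0.0)

def haar_psi(j, k, x):
    """psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), with psi = +1 on [0, 1/2), -1 on [1/2, 1)."""
    t = 2.0 ** j * x - k
    up = np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
    down = np.where((t >= 0.5) & (t < 1), 1.0, 0.0)
    return 2.0 ** (j / 2) * (up - down)

def expand(h, j_star, j_max, n_grid=2 ** 14):
    """Truncated version of the wavelet series: levels j_star..j_max only."""
    x = (np.arange(n_grid) + 0.5) / n_grid      # midpoint grid on [0, 1]
    hx = h(x)
    approx = np.zeros_like(x)
    for k in range(2 ** j_star):                # scaling part, k in Lambda_{j_star}
        phi = haar_phi(j_star, k, x)
        approx += np.mean(hx * phi) * phi       # alpha_{j_star,k} ~ <h, phi_{j_star,k}>
    for j in range(j_star, j_max + 1):          # detail part
        for k in range(2 ** j):
            psi = haar_psi(j, k, x)
            approx += np.mean(hx * psi) * psi   # beta_{j,k} ~ <h, psi_{j,k}>
    return x, hx, approx

x, hx, approx = expand(lambda t: np.sin(2 * np.pi * t), j_star=2, j_max=6)
print(np.max(np.abs(hx - approx)))              # shrinks as j_max grows
```

Raising `j_max` adds finer detail levels, so the truncated series approaches $h$, which is the content of the expansion in the limit.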

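    Lemma 1.1. Let a scaling function $\phi$ be $t$-regular (i.e., $\phi \in C^t$ and $|D^{\alpha}\phi(x)| \le c(1+|x|^2)^{-l}$ for each $l \in \mathbb{Z}$ and $\alpha = 0, 1, \ldots, t$). If $\{\alpha_k\} \in l_p$ and $1 \le p \le \infty$, there exist $c_2 \ge c_1 > 0$ such that

    $$c_1 2^{j(\frac12 - \frac1p)} \|(\alpha_k)\|_p \le \Big\| \sum_{k \in \Lambda_j} \alpha_k 2^{\frac j2} \phi(2^j x - k) \Big\|_p \le c_2 2^{j(\frac12 - \frac1p)} \|(\alpha_k)\|_p.$$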

    Besov spaces contain many classical function spaces, such as the well known Sobolev and Hölder spaces. The following lemma gives an important equivalent definition of a Besov space. More details about wavelets and Besov spaces can be found in Meyer [24] and Härdle et al. [25].

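    Lemma 1.2. Let $\phi$ be $t$-regular and $h \in L^p([0,1])$. Then, for $p, q \in [1, \infty)$ and $0 < s < t$, the following assertions are equivalent:

    (i) $h \in B^s_{p,q}([0,1])$;

    (ii) $\{2^{js}\|h - P_j h\|_p\} \in l_q$;

    (iii) $\{2^{j(s - \frac1p + \frac12)}\|\beta_{j,k}\|_p\} \in l_q$.

    The Besov norm of $h$ can be defined by

    $$\|h\|_{B^s_{p,q}} = \|(\alpha_{\tau,k})\|_p + \Big\| \big(2^{j(s - \frac1p + \frac12)}\|\beta_{j,k}\|_p\big)_{j \ge \tau} \Big\|_q,$$

    where $\|\beta_{j,k}\|_p^p = \sum_{k \in \Lambda_j} |\beta_{j,k}|^p$.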
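As a small worked illustration of the sequence-space form of the Besov norm, the sketch below evaluates it for a finite set of coefficients. Truncating to finitely many levels, and the function name `besov_seq_norm`, are assumptions made for illustration; the definition itself runs over all $j \ge \tau$.

```python
import numpy as np

def besov_seq_norm(alpha, betas, s, p, q, tau):
    """Sequence Besov norm for finitely many levels.

    alpha : coefficients alpha_{tau,k}
    betas : list of arrays; betas[i] holds beta_{tau+i,k} over k
    """
    norm_alpha = np.sum(np.abs(np.asarray(alpha, dtype=float)) ** p) ** (1.0 / p)
    weighted = []
    for i, b in enumerate(betas):
        j = tau + i
        level_norm = np.sum(np.abs(np.asarray(b, dtype=float)) ** p) ** (1.0 / p)
        weighted.append(2.0 ** (j * (s - 1.0 / p + 0.5)) * level_norm)
    return norm_alpha + np.sum(np.array(weighted) ** q) ** (1.0 / q)

# Two scaling coefficients at level tau = 1 and detail levels j = 1, 2.
value = besov_seq_norm([3.0, 4.0], [[1.0, 0.0], [0.0, 0.0, 0.0, 2.0]],
                       s=0.5, p=2, q=1, tau=1)
print(value)
```

The weight $2^{j(s - 1/p + 1/2)}$ grows with $j$, so a function has finite norm only if its detail coefficients decay fast enough across levels; larger $s$ demands faster decay.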

    In this section, we will construct our wavelet estimators, and give the main theorem of this paper. The main theorem shows the convergence rates of wavelet estimators under some mild assumptions. Now, we first give the technical assumptions of the estimation model (1.1) in the following.

    A1: The variance function $r: [0,1] \to \mathbb{R}$ is bounded.

    A2: For any $i \in \{0, \ldots, m-1\}$, the variance function $r$ satisfies $r^{(i)}(0) = r^{(i)}(1) = 0$.

    A3: The mean function $g: [0,1] \to \mathbb{R}$ is bounded and known.

    A4: The random variable $X$ satisfies $X \sim U([0,1])$.

    A5: The random variable $U$ has a moment of order $2\tilde p$ $(\tilde p \ge 1)$.

    In the above assumptions, A1 and A3 are conventional conditions for nonparametric estimations. The condition A2 is used to prove the unbiasedness of the following wavelet estimators. In addition, A4 and A5 are technical assumptions, which will be used in Lemmas 4.3 and 4.5.

    According to the model (1.1), our linear wavelet estimator is constructed by

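    $$\hat r^{lin}_n(x) := \sum_{k \in \Lambda_{j_*}} \hat\alpha_{j_*,k}\,\phi_{j_*,k}(x). \tag{2.1}$$

    In this definition, the scale parameter $j_*$ will be given in the following main theorem, and

    $$\hat\alpha_{j_*,k} := \frac1n \sum_{i=1}^n Y_i^2 (-1)^m \phi^{(m)}_{j_*,k}(X_i) - \int_0^1 g^2(x) (-1)^m \phi^{(m)}_{j_*,k}(x)\,dx. \tag{2.2}$$

    More importantly, it should be pointed out that this linear wavelet estimator is an unbiased estimator of the derivative function $r^{(m)}(x)$ by Lemma 4.1 and the properties of wavelets.

    On the other hand, a nonlinear wavelet estimator is defined by

    $$\hat r^{non}_n(x) := \sum_{k \in \Lambda_{j_*}} \hat\alpha_{j_*,k}\,\phi_{j_*,k}(x) + \sum_{j=j_*}^{j_1} \sum_{k \in \Lambda_j} \hat\beta_{j,k}\, I_{\{|\hat\beta_{j,k}| \ge \kappa t_n\}}\,\psi_{j,k}(x). \tag{2.3}$$

    In this equation, $I_A$ denotes the indicator function over an event $A$, $t_n = 2^{mj}\sqrt{\ln n / n}$,

    $$\hat\beta_{j,k} := \frac1n \sum_{i=1}^n \big(Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - w_{j,k}\big)\, I_{\{|Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - w_{j,k}| \le \rho_n\}}, \tag{2.4}$$

    $\rho_n = 2^{mj}\sqrt{n / \ln n}$, and $w_{j,k} = \int_0^1 g^2(x) (-1)^m \psi^{(m)}_{j,k}(x)\,dx$. The positive integers $j_*$ and $j_1$ will also be given in our main theorem, and the constant $\kappa$ will be chosen in Lemma 4.5. In addition, we adopt the following notation: $x_+ := \max\{x, 0\}$; $A \lesssim B$ denotes $A \le cB$ for some constant $c > 0$; $A \gtrsim B$ means $B \lesssim A$; $A \sim B$ stands for both $A \lesssim B$ and $B \lesssim A$.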
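The two estimators can be sketched in code for the special case $m = 0$. The sketch below swaps in the Haar system for the Daubechies wavelet used in the paper (Haar functions are not differentiable, so this only covers $m = 0$), approximates the integrals of $g^2$ by a midpoint rule, and uses hypothetical choices of `f`, `g`, `j_star`, `j_1` and `kappa`. It is an illustration of the structure of the linear and thresholded estimators, not the authors' implementation.

```python
import numpy as np

def phi_jk(j, k, x):
    t = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * np.where((t >= 0) & (t < 1), 1.0, 0.0)

def psi_jk(j, k, x):
    t = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * (np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
                             - np.where((t >= 0.5) & (t < 1), 1.0, 0.0))

def estimate_variance(x, y, g, j_star, j_1=None, kappa=1.0, n_grid=2 ** 12):
    """Haar sketch with m = 0; j_1=None returns the linear estimator only."""
    n = len(x)
    grid = (np.arange(n_grid) + 0.5) / n_grid
    g2 = g(grid) ** 2
    est = np.zeros(n_grid)
    for k in range(2 ** j_star):
        # empirical mean of Y^2 phi(X) minus the known integral of g^2 phi
        a_hat = (np.mean(y ** 2 * phi_jk(j_star, k, x))
                 - np.mean(g2 * phi_jk(j_star, k, grid)))
        est += a_hat * phi_jk(j_star, k, grid)
    if j_1 is not None:
        t_n = np.sqrt(np.log(n) / n)       # threshold scale when m = 0
        rho_n = np.sqrt(n / np.log(n))     # truncation level when m = 0
        for j in range(j_star, j_1 + 1):
            for k in range(2 ** j):
                w = np.mean(g2 * psi_jk(j, k, grid))
                terms = y ** 2 * psi_jk(j, k, x) - w
                b_hat = np.mean(terms * (np.abs(terms) <= rho_n))  # truncated mean
                if abs(b_hat) >= kappa * t_n:                       # hard threshold
                    est += b_hat * psi_jk(j, k, grid)
    return grid, est

# Hypothetical data-generating functions.
rng = np.random.default_rng(0)
n = 4096
f = lambda t: 0.5 + t
g = lambda t: np.sin(2 * np.pi * t)
x = rng.uniform(size=n)
y = f(x) * rng.standard_normal(n) + g(x)

grid, r_lin = estimate_variance(x, y, g, j_star=3)
_, r_non = estimate_variance(x, y, g, j_star=3, j_1=5, kappa=2.0)
mse_lin = np.mean((r_lin - f(grid) ** 2) ** 2)
mse_non = np.mean((r_non - f(grid) ** 2) ** 2)
print(mse_lin, mse_non)
```

The linear part is determined entirely by `j_star`, while the nonlinear part keeps only those detail coefficients whose estimated magnitude clears the threshold, which is what makes it adapt to the data.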

    We are now in a position to state the convergence rates of the two wavelet estimators in the following main theorem.

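    Main theorem. For the estimation model (1.1) with the assumptions A1–A5, let $r^{(m)}(x) \in B^s_{p,q}([0,1])$ $(p, q \in [1, \infty),\ s > 0)$ and $1 \le \tilde p < \infty$. If $\{p > \tilde p \ge 1,\ s > 0\}$ or $\{1 \le p \le \tilde p,\ s > 1/p\}$, then:

    (a) the linear wavelet estimator $\hat r^{lin}_n(x)$ with $s' = s - (\frac1p - \frac1{\tilde p})_+$ and $2^{j_*} \sim n^{\frac{1}{2s'+2m+1}}$ satisfies

    $$E\big[\|\hat r^{lin}_n(x) - r^{(m)}(x)\|_{\tilde p}^{\tilde p}\big] \lesssim n^{-\frac{\tilde p s'}{2s'+2m+1}}. \tag{2.5}$$

    (b) the nonlinear wavelet estimator $\hat r^{non}_n(x)$ with $2^{j_*} \sim n^{\frac{1}{2t+2m+1}}$ $(t > s)$ and $2^{j_1} \sim (\frac{n}{\ln n})^{\frac{1}{2m+1}}$ satisfies

    $$E\big[\|\hat r^{non}_n(x) - r^{(m)}(x)\|_{\tilde p}^{\tilde p}\big] \lesssim (\ln n)^{\tilde p - 1} \Big(\frac{\ln n}{n}\Big)^{\tilde p \delta}, \tag{2.6}$$

    where

    $$\delta = \min\Big\{\frac{s}{2s+2m+1},\ \frac{s - 1/p + 1/\tilde p}{2(s - 1/p) + 2m + 1}\Big\} = \begin{cases} \dfrac{s}{2s+2m+1}, & p > \dfrac{\tilde p(2m+1)}{2s+2m+1}, \\[2ex] \dfrac{s - 1/p + 1/\tilde p}{2(s - 1/p) + 2m + 1}, & p \le \dfrac{\tilde p(2m+1)}{2s+2m+1}. \end{cases}$$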
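To see how the exponents in (2.5) and (2.6) behave, the short sketch below evaluates them for given $(s, p, \tilde p, m)$. The function names are hypothetical, and the parameter values in the usage lines are arbitrary examples chosen to satisfy the conditions of the theorem.

```python
def linear_rate_exponent(s, p, p_tilde, m):
    """Exponent e such that the bound in (2.5) reads n^{-e}."""
    s_prime = s - max(1.0 / p - 1.0 / p_tilde, 0.0)
    return p_tilde * s_prime / (2 * s_prime + 2 * m + 1)

def nonlinear_delta(s, p, p_tilde, m):
    """The exponent delta appearing in (2.6)."""
    dense = s / (2 * s + 2 * m + 1)
    sparse = (s - 1.0 / p + 1.0 / p_tilde) / (2 * (s - 1.0 / p) + 2 * m + 1)
    return min(dense, sparse)

# Case p > p_tilde with m = 0: both estimators share the exponent p_tilde * s / (2s + 1).
print(linear_rate_exponent(2, 4, 2, 0), 2 * nonlinear_delta(2, 4, 2, 0))

# Case p <= p_tilde with m = 1: the nonlinear exponent p_tilde * delta is larger.
print(linear_rate_exponent(2, 1, 4, 1), 4 * nonlinear_delta(2, 1, 4, 1))
```

In the second case the linear exponent is $5/5.5 \approx 0.91$ while $\tilde p\,\delta = 1$, so, up to logarithmic factors, the nonlinear bound decays faster.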

    Remark 1. Note that $n^{-\frac{s\tilde p}{2s+1}}$ $\big(n^{-\frac{(s - 1/p + 1/\tilde p)\tilde p}{2(s - 1/p)+1}}\big)$ is the optimal convergence rate over $L^{\tilde p}$ $(1 \le \tilde p < +\infty)$ risk for nonparametric wavelet estimations (Donoho et al. [26]). The linear wavelet estimator can obtain the optimal convergence rate when $p > \tilde p \ge 1$ and $m = 0$.

    Remark 2. When $m = 0$, this derivative estimation problem reduces to the classical variance function estimation. Then, the convergence rates of the nonlinear wavelet estimator are the same as the optimal convergence rates of nonparametric wavelet estimation up to a $\ln n$ factor in all cases.

    Remark 3. According to main theorem (a) and the definition of the linear wavelet estimator, it is easy to see that the construction of the linear wavelet estimator depends on the smoothness parameter $s$ of the unknown derivative function $r^{(m)}(x)$, which means that the linear estimator is not adaptive. In contrast, the nonlinear wavelet estimator only depends on the observed data and the sample size. Hence, the nonlinear estimator is adaptive. More importantly, the nonlinear wavelet estimator has a better convergence rate than the linear estimator in the case of $p \le \tilde p$.

    In order to illustrate the empirical performance of the proposed estimators, we produce a numerical illustration using an adaptive selection method, which is used to obtain the best parameters of the wavelet estimators. For the problem (1.1), we choose three common functions, HeaviSine, Corner and Spikes, as the mean function $g(x)$; see Figure 1. Those functions are commonly used in the wavelet literature. On the other hand, we choose the function $f(x)$ as $f_1(x) = 3(4x-2)^2 e^{-(4x-2)^2}$, $f_2(x) = \sin(2\pi \sin \pi x)$ and $f_3(x) = -(2x-1)^2 + 1$, respectively. In addition, we assume that the random variable $U$ satisfies $U \sim N(0,1)$. The aim of this paper is to estimate the derivative function $r^{(m)}(x)$ of the variance function $r(x)$ $(r = f^2)$ from the observed data $(X_1, Y_1), \ldots, (X_n, Y_n)$. In this section, we adopt $r_1(x) = [f_1(x)]^2$, $r_2(x) = [f_2(x)]^2$ and $r_3(x) = [f_3(x)]^2$. For the sake of simplicity, our simulation study focuses on the derivative function $r'(x)$ $(m = 1)$ and the variance function $r(x)$ $(m = 0)$ with the observed data $(X_1, Y_1), \ldots, (X_n, Y_n)$ $(n = 4096)$. Furthermore, we use the mean square error $\mathrm{MSE}(\hat r(x), r(x)) = \frac1n \sum_{i=1}^n (\hat r(X_i) - r(X_i))^2$ and the average magnitude of error $\mathrm{AME}(\hat r(x), r(x)) = \frac1n \sum_{i=1}^n |\hat r(X_i) - r(X_i)|$ to evaluate the performance of the wavelet estimators.
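The two error measures used in this section are straightforward to compute. A minimal sketch, with hypothetical function names and toy inputs:

```python
import numpy as np

def mse(r_hat, r):
    """Mean square error: average of (r_hat(X_i) - r(X_i))^2."""
    r_hat, r = np.asarray(r_hat, dtype=float), np.asarray(r, dtype=float)
    return float(np.mean((r_hat - r) ** 2))

def ame(r_hat, r):
    """Average magnitude of error: average of |r_hat(X_i) - r(X_i)|."""
    r_hat, r = np.asarray(r_hat, dtype=float), np.asarray(r, dtype=float)
    return float(np.mean(np.abs(r_hat - r)))

# Toy values standing in for estimates and true values at three design points.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), ame([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

MSE penalizes a few large deviations more heavily than AME does, so reporting both gives a fuller picture of where an estimator goes wrong.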

    Figure 1.  Three mean functions. (a) HeaviSine, (b) Corner, (c) Spikes.

    For the linear and nonlinear wavelet estimators, the scale parameter $j_*$ and the threshold value $\lambda$ $(\lambda = \kappa t_n)$ play important roles in the function estimation problem. In order to obtain the optimal scale parameter and threshold value of the wavelet estimators, this section uses the two-fold cross validation (2FCV) approach (Nason [27], Navarro and Saumard [28]). In the first example of the simulation study, we choose HeaviSine as the mean function $g(x)$, and $f_1(x) = 3(4x-2)^2 e^{-(4x-2)^2}$. The estimation results of the two wavelet estimators are presented in Figure 2. For the optimal scale parameter $j_*$ of the linear wavelet estimator, we build a collection of candidates $j_* = 1, \ldots, \log_2(n) - 1$. The best parameter $j_*$ is selected by minimizing a 2FCV criterion denoted by $2FCV(j_*)$; see Figure 2(a). According to Figure 2(a), both $2FCV(j_*)$ and the MSE attain their minimum at $j_* = 4$. For the nonlinear wavelet estimator, the best threshold value $\lambda$ is also obtained by the $2FCV(\lambda)$ criterion in Figure 2(b). Meanwhile, the parameter $j_*$ is the same as for the linear estimator, and the parameter $j_1$ is chosen as the maximum scale parameter $\log_2(n) - 1$. From Figure 2(c) and 2(d), the linear and nonlinear wavelet estimators both perform well with the best scale parameter and threshold value. More importantly, the nonlinear wavelet estimator shows better performance than the linear estimator.
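The selection of the scale parameter by two-fold cross validation can be sketched as follows. This is only one plausible reading of the procedure, under several assumptions that are not taken from the paper: the Haar system with $m = 0$ replaces the Daubechies wavelet, the data are split into even- and odd-indexed points after sorting by $X$, and the held-out score compares the fitted variance with the proxy $Y_i^2 - g^2(X_i)$, whose conditional mean is $r(X_i)$. All function names and the simulated `f`, `g` are hypothetical.

```python
import numpy as np

def haar_linear_fit(x, y, g, j, n_grid=2 ** 12):
    """Haar linear estimator with m = 0, returned as one value per dyadic bin."""
    grid = (np.arange(n_grid) + 0.5) / n_grid
    bins_x = np.minimum((x * 2 ** j).astype(int), 2 ** j - 1)
    bins_g = (grid * 2 ** j).astype(int)
    # 2^j * (1/n) * sum of Y_i^2 over the bin, minus the bin average of g^2
    y2 = np.bincount(bins_x, weights=y ** 2, minlength=2 ** j) * 2 ** j / len(x)
    g2 = np.bincount(bins_g, weights=g(grid) ** 2, minlength=2 ** j) * 2 ** j / n_grid
    return y2 - g2

def two_fold_cv(x, y, g, j):
    """Fit on one half, score on the other half, then swap and average."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    halves = [(xs[0::2], ys[0::2]), (xs[1::2], ys[1::2])]
    score = 0.0
    for a, b in ((0, 1), (1, 0)):
        r_hat = haar_linear_fit(halves[a][0], halves[a][1], g, j)
        xb, yb = halves[b]
        bins = np.minimum((xb * 2 ** j).astype(int), 2 ** j - 1)
        proxy = yb ** 2 - g(xb) ** 2          # E[proxy | X] = r(X)
        score += np.mean((proxy - r_hat[bins]) ** 2)
    return score / 2

def select_scale(x, y, g):
    candidates = range(1, int(np.log2(len(x))))   # j = 1, ..., log2(n) - 1
    scores = {j: two_fold_cv(x, y, g, j) for j in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical data-generating functions.
rng = np.random.default_rng(1)
n = 4096
f = lambda t: 0.5 + t
g = lambda t: np.sin(2 * np.pi * t)
x = rng.uniform(size=n)
y = f(x) * rng.standard_normal(n) + g(x)

j_best, scores = select_scale(x, y, g)
print(j_best)
```

Small scales underfit (large bias) and large scales overfit (large variance), so the held-out score typically has a minimum at a moderate scale; the same idea applies to choosing the threshold $\lambda$.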

    Figure 2.  The estimation results of wavelet estimators when $g(x)$ is HeaviSine and $r(x) = r_1(x)$. (a) Graphs of the MSE (black line) and 2FCV criterion (red line) of the linear estimator. (b) Graphs of the MSE (black line) and 2FCV criterion (blue line) of the nonlinear estimator. (c) Fluctuating data $(X, Y)$ (gray circles), the true variance $r(x)$ (black line), the linear estimator $\hat r^{lin}$ (red line) and the nonlinear estimator $\hat r^{non}$ (blue line). (d) The estimation results of the linear (red line) and nonlinear (blue line) estimators for the derivative function $r'(x)$.

    In the following simulation study, more numerical experiments are presented to verify the performance of the wavelet method. According to Figures 3–10, both wavelet estimators perform well in different cases. In particular, the nonlinear wavelet estimator gives better estimation results than the linear estimator. The MSE and AME of the wavelet estimators in all examples are provided in Table 1, which shows that the nonlinear wavelet estimators outperform the linear estimators in most cases.

    Figure 3.  The estimation results of wavelet estimators when $g(x)$ is HeaviSine and $r(x) = r_2(x)$.
    Figure 4.  The estimation results of wavelet estimators when $g(x)$ is HeaviSine and $r(x) = r_3(x)$.
    Figure 5.  The estimation results of wavelet estimators when $g(x)$ is Corner and $r(x) = r_1(x)$.
    Figure 6.  The estimation results of wavelet estimators when $g(x)$ is Corner and $r(x) = r_2(x)$.
    Figure 7.  The estimation results of wavelet estimators when $g(x)$ is Corner and $r(x) = r_3(x)$.
    Figure 8.  The estimation results of wavelet estimators when $g(x)$ is Spikes and $r(x) = r_1(x)$.
    Figure 9.  The estimation results of wavelet estimators when $g(x)$ is Spikes and $r(x) = r_2(x)$.
    Figure 10.  The estimation results of wavelet estimators when $g(x)$ is Spikes and $r(x) = r_3(x)$.
    Table 1.  The MSE and AME of the wavelet estimators.

                                        HeaviSine                   Corner                      Spikes
                                        r1      r2      r3          r1      r2      r3          r1      r2      r3
    MSE($\hat r^{lin}$, $r$)            0.0184  0.0073  0.0071      0.0189  0.0075  0.0064      0.0189  0.0069  0.0052
    MSE($\hat r^{non}$, $r$)            0.0048  0.0068  0.0064      0.0044  0.0070  0.0057      0.0042  0.0061  0.0046
    MSE($(\hat r')^{lin}$, $r'$)        0.7755  0.0547  0.0676      0.7767  0.1155  0.0737      0.7360  0.2566  0.0655
    MSE($(\hat r')^{non}$, $r'$)        0.2319  0.0573  0.0560      0.2204  0.0644  0.0616      0.2406  0.2868  0.0539
    AME($\hat r^{lin}$, $r$)            0.0935  0.0653  0.0652      0.0973  0.0667  0.0615      0.0964  0.0621  0.0550
    AME($\hat r^{non}$, $r$)            0.0506  0.0641  0.0619      0.0486  0.0649  0.0583      0.0430  0.0595  0.0518
    AME($(\hat r')^{lin}$, $r'$)        0.6911  0.1876  0.2348      0.7021  0.2686  0.2451      0.6605  0.4102  0.2320
    AME($(\hat r')^{non}$, $r'$)        0.3595  0.1862  0.2125      0.3450  0.2020  0.2229      0.3696  0.4198  0.2095


    Now, we provide some lemmas for the proof of the main Theorem.

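    Lemma 4.1. For the model (1.1) with A2 and A4,

    $$E[\hat\alpha_{j,k}] = \alpha_{j,k}, \tag{4.1}$$
    $$E\Big[\frac1n \sum_{i=1}^n \big(Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - w_{j,k}\big)\Big] = \beta_{j,k}. \tag{4.2}$$

    Proof. According to the definition of $\hat\alpha_{j,k}$,

    $$\begin{aligned} E[\hat\alpha_{j,k}] &= E\Big[\frac1n \sum_{i=1}^n Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i) - \int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx\Big] \\ &= \frac1n \sum_{i=1}^n E\big[Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i)\big] - \int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx \\ &= E\big[Y_1^2 (-1)^m \phi^{(m)}_{j,k}(X_1)\big] - \int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx \\ &= E\big[r(X_1) U_1^2 (-1)^m \phi^{(m)}_{j,k}(X_1)\big] + 2E\big[f(X_1) U_1 g(X_1) (-1)^m \phi^{(m)}_{j,k}(X_1)\big] \\ &\quad + E\big[g^2(X_1)(-1)^m \phi^{(m)}_{j,k}(X_1)\big] - \int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx. \end{aligned}$$

    Then, it follows from A4 that

    $$E\big[g^2(X_1)(-1)^m \phi^{(m)}_{j,k}(X_1)\big] = \int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx.$$

    Using the assumption of independence between $U_i$ and $X_i$,

    $$E\big[r(X_1) U_1^2 (-1)^m \phi^{(m)}_{j,k}(X_1)\big] = E[U_1^2]\, E\big[r(X_1)(-1)^m \phi^{(m)}_{j,k}(X_1)\big],$$
    $$E\big[f(X_1) U_1 g(X_1)(-1)^m \phi^{(m)}_{j,k}(X_1)\big] = E[U_1]\, E\big[f(X_1) g(X_1)(-1)^m \phi^{(m)}_{j,k}(X_1)\big].$$

    Meanwhile, the conditions $V[U_1] = 1$ and $E[U_1] = 0$ imply $E[U_1^2] = 1$. Hence, one gets

    $$E[\hat\alpha_{j,k}] = E\big[r(X_1)(-1)^m \phi^{(m)}_{j,k}(X_1)\big] = \int_0^1 r(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx = (-1)^m \int_0^1 r(x) \phi^{(m)}_{j,k}(x)\,dx = \int_0^1 r^{(m)}(x) \phi_{j,k}(x)\,dx = \alpha_{j,k}$$

    by the assumption A2.

    On the other hand, one takes $\psi$ instead of $\phi$, and $w_{j,k}$ instead of $\int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx$. The second equation is then proved by similar arguments.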
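The step in the proof that invokes A2 is an integration by parts. As a short worked sketch for $m = 1$, and then for general $m$ (assuming $r$ and $\phi_{j,k}$ are smooth enough for the boundary terms to make sense):

```latex
% m = 1: one integration by parts; the boundary term vanishes since r(0) = r(1) = 0.
-\int_0^1 r(x)\,\phi'_{j,k}(x)\,dx
  = -\big[r(x)\,\phi_{j,k}(x)\big]_0^1 + \int_0^1 r'(x)\,\phi_{j,k}(x)\,dx
  = \int_0^1 r'(x)\,\phi_{j,k}(x)\,dx .

% General m: repeat m times; every boundary term contains r^{(i)}(0) or r^{(i)}(1)
% with 0 <= i <= m-1, all of which vanish under A2.
(-1)^m \int_0^1 r(x)\,\phi^{(m)}_{j,k}(x)\,dx
  = (-1)^m \sum_{i=0}^{m-1} (-1)^i \big[r^{(i)}(x)\,\phi^{(m-1-i)}_{j,k}(x)\big]_0^1
    + \int_0^1 r^{(m)}(x)\,\phi_{j,k}(x)\,dx
  = \int_0^1 r^{(m)}(x)\,\phi_{j,k}(x)\,dx .
```

This is why A2 is stated for every $i \in \{0, \ldots, m-1\}$: each of the $m$ integrations by parts produces one boundary term of that order.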

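    Lemma 4.2. (Rosenthal's inequality) Let $X_1, \ldots, X_n$ be independent random variables such that $E[X_i] = 0$ and $E[|X_i|^p] < \infty$. Then,

    $$E\Big[\Big|\sum_{i=1}^n X_i\Big|^p\Big] \lesssim \begin{cases} \sum_{i=1}^n E[|X_i|^p] + \Big(\sum_{i=1}^n E[|X_i|^2]\Big)^{\frac p2}, & p > 2, \\[1ex] \Big(\sum_{i=1}^n E[|X_i|^2]\Big)^{\frac p2}, & 1 \le p \le 2. \end{cases}$$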

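    Lemma 4.3. For the model (1.1) with A1–A5, $2^j \le n$ and $1 \le \tilde p < \infty$,

    $$E\big[|\hat\alpha_{j,k} - \alpha_{j,k}|^{\tilde p}\big] \lesssim n^{-\frac{\tilde p}{2}}\, 2^{\tilde p m j}, \tag{4.3}$$
    $$E\big[|\hat\beta_{j,k} - \beta_{j,k}|^{\tilde p}\big] \lesssim \Big(\frac{\ln n}{n}\Big)^{\frac{\tilde p}{2}} 2^{\tilde p m j}. \tag{4.4}$$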

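    Proof. By (4.1) and the independence of random variables $X_i$ and $U_i$, one has

    $$\begin{aligned} |\hat\alpha_{j,k} - \alpha_{j,k}| &= \Big|\frac1n \sum_{i=1}^n Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i) - \int_0^1 g^2(x)(-1)^m \phi^{(m)}_{j,k}(x)\,dx - E[\hat\alpha_{j,k}]\Big| \\ &= \frac1n \Big|\sum_{i=1}^n \Big(Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i) - E\big[Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i)\big]\Big)\Big| = \frac1n \Big|\sum_{i=1}^n A_i\Big|. \end{aligned}$$

    In the above equation, $A_i := Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i) - E\big[Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i)\big]$.

    According to the definition of $A_i$, one knows that $E[A_i] = 0$ and

    $$\begin{aligned} E[|A_i|^{\tilde p}] &= E\Big[\big|Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i) - E[Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i)]\big|^{\tilde p}\Big] \lesssim E\Big[\big|Y_i^2 (-1)^m \phi^{(m)}_{j,k}(X_i)\big|^{\tilde p}\Big] \\ &\lesssim E\Big[\big|(r(X_1) U_1^2 + g^2(X_1))(-1)^m \phi^{(m)}_{j,k}(X_i)\big|^{\tilde p}\Big] \\ &\lesssim E[U_1^{2\tilde p}]\, E\Big[\big|r(X_1) \phi^{(m)}_{j,k}(X_i)\big|^{\tilde p}\Big] + E\Big[\big|g^2(X_1) \phi^{(m)}_{j,k}(X_i)\big|^{\tilde p}\Big]. \end{aligned}$$

    The assumption A5 shows $E[U_1^{2\tilde p}] \lesssim 1$. Furthermore, it follows from A1 and A3 that

    $$E[U_1^{2\tilde p}]\, E\Big[\big|r(X_1) \phi^{(m)}_{j,k}(X_1)\big|^{\tilde p}\Big] \lesssim E\Big[\big|\phi^{(m)}_{j,k}(X_1)\big|^{\tilde p}\Big], \qquad E\Big[g^{2\tilde p}(X_1) \big|\phi^{(m)}_{j,k}(X_1)\big|^{\tilde p}\Big] \lesssim E\Big[\big|\phi^{(m)}_{j,k}(X_1)\big|^{\tilde p}\Big].$$

    In addition, A4 and the properties of wavelet functions imply that

    $$E\Big[\big|\phi^{(m)}_{j,k}(X_i)\big|^{\tilde p}\Big] = \int_0^1 \big|\phi^{(m)}_{j,k}(x)\big|^{\tilde p}\,dx = 2^{j(\tilde p/2 + m\tilde p - 1)} \int_0^1 \big|\phi^{(m)}(2^j x - k)\big|^{\tilde p}\,d(2^j x - k) = 2^{j(\tilde p/2 + m\tilde p - 1)} \|\phi^{(m)}\|_{\tilde p}^{\tilde p} \lesssim 2^{j(\tilde p/2 + m\tilde p - 1)}.$$

    Hence,

    $$E[|A_i|^{\tilde p}] \lesssim 2^{j(\tilde p/2 + m\tilde p - 1)}.$$

    In particular, for $\tilde p = 2$, $E[|A_i|^2] \lesssim 2^{2mj}$.

    Using Rosenthal's inequality and $2^j \le n$,

    $$\begin{aligned} E\big[|\hat\alpha_{j,k} - \alpha_{j,k}|^{\tilde p}\big] = \frac{1}{n^{\tilde p}} E\Big[\Big|\sum_{i=1}^n A_i\Big|^{\tilde p}\Big] &\lesssim \begin{cases} \dfrac{1}{n^{\tilde p}} \Big(\sum_{i=1}^n E[|A_i|^{\tilde p}] + \big(\sum_{i=1}^n E[|A_i|^2]\big)^{\frac{\tilde p}{2}}\Big), & \tilde p > 2, \\[1.5ex] \dfrac{1}{n^{\tilde p}} \big(\sum_{i=1}^n E[|A_i|^2]\big)^{\frac{\tilde p}{2}}, & 1 \le \tilde p \le 2, \end{cases} \\ &\lesssim \begin{cases} \dfrac{1}{n^{\tilde p}} \Big(n \cdot 2^{j(\frac{\tilde p}{2} + m\tilde p - 1)} + (n 2^{2mj})^{\frac{\tilde p}{2}}\Big), & \tilde p > 2, \\[1.5ex] \dfrac{1}{n^{\tilde p}} (n 2^{2mj})^{\frac{\tilde p}{2}}, & 1 \le \tilde p \le 2, \end{cases} \\ &\lesssim n^{-\frac{\tilde p}{2}}\, 2^{\tilde p m j}. \end{aligned}$$

    Then, the first inequality is proved.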

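    For the second inequality, note that

    $$\beta_{j,k} = E\Big[\frac1n \sum_{i=1}^n \big(Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - w_{j,k}\big)\Big] = \frac1n \sum_{i=1}^n E\Big[Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - \int_0^1 g^2(x)(-1)^m \psi^{(m)}_{j,k}(x)\,dx\Big] = \frac1n \sum_{i=1}^n E[K_i]$$

    with (4.2) and $K_i := Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - \int_0^1 g^2(x)(-1)^m \psi^{(m)}_{j,k}(x)\,dx$.

    Let $B_i := K_i I_{\{|K_i| \le \rho_n\}} - E\big[K_i I_{\{|K_i| \le \rho_n\}}\big]$. Then, by the definition of $\hat\beta_{j,k}$ in (2.4),

    $$|\hat\beta_{j,k} - \beta_{j,k}| = \Big|\frac1n \sum_{i=1}^n K_i I_{\{|K_i| \le \rho_n\}} - \beta_{j,k}\Big| \le \frac1n \Big|\sum_{i=1}^n B_i\Big| + \frac1n \sum_{i=1}^n E\big[|K_i| I_{\{|K_i| > \rho_n\}}\big]. \tag{4.5}$$

    Similar to the arguments for $A_i$, it is easy to see that $E[B_i] = 0$ and

    $$E[|B_i|^{\tilde p}] \lesssim E\Big[\big|K_i I_{\{|K_i| \le \rho_n\}}\big|^{\tilde p}\Big] \lesssim E[|K_i|^{\tilde p}] \lesssim 2^{j(\frac{\tilde p}{2} + m\tilde p - 1)}.$$

    In particular, in the case of $\tilde p = 2$, one can obtain $E[|B_i|^2] \lesssim 2^{2mj}$. On the other hand,

    $$E\big[|K_i| I_{\{|K_i| > \rho_n\}}\big] \le E\Big[\frac{|K_i| \cdot |K_i|}{\rho_n}\Big] = \frac{E[K_1^2]}{\rho_n} \lesssim \frac{2^{2mj}}{\rho_n} = t_n = 2^{mj} \sqrt{\frac{\ln n}{n}}. \tag{4.6}$$

    According to Rosenthal's inequality and $2^j \le n$,

    $$\begin{aligned} E\big[|\hat\beta_{j,k} - \beta_{j,k}|^{\tilde p}\big] \lesssim \frac{1}{n^{\tilde p}} E\Big[\Big|\sum_{i=1}^n B_i\Big|^{\tilde p}\Big] + (t_n)^{\tilde p} &\lesssim \begin{cases} \dfrac{1}{n^{\tilde p}} \Big(\sum_{i=1}^n E[|B_i|^{\tilde p}] + \big(\sum_{i=1}^n E[|B_i|^2]\big)^{\frac{\tilde p}{2}}\Big) + (t_n)^{\tilde p}, & \tilde p > 2, \\[1.5ex] \dfrac{1}{n^{\tilde p}} \big(\sum_{i=1}^n E[|B_i|^2]\big)^{\frac{\tilde p}{2}} + (t_n)^{\tilde p}, & 1 \le \tilde p \le 2, \end{cases} \\ &\lesssim \begin{cases} \dfrac{1}{n^{\tilde p}} \Big(n \cdot 2^{j(\frac{\tilde p}{2} + m\tilde p - 1)} + (n 2^{2mj})^{\frac{\tilde p}{2}}\Big) + \Big(\dfrac{\ln n}{n}\Big)^{\frac{\tilde p}{2}} 2^{\tilde p m j}, & \tilde p > 2, \\[1.5ex] \dfrac{1}{n^{\tilde p}} (n 2^{2mj})^{\frac{\tilde p}{2}} + \Big(\dfrac{\ln n}{n}\Big)^{\frac{\tilde p}{2}} 2^{\tilde p m j}, & 1 \le \tilde p \le 2, \end{cases} \\ &\lesssim \Big(\frac{\ln n}{n}\Big)^{\frac{\tilde p}{2}} 2^{\tilde p m j}. \end{aligned}$$

    Then, the second inequality is proved.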

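    Lemma 4.4. (Bernstein's inequality) Let $X_1, \ldots, X_n$ be independent random variables such that $E[X_i] = 0$, $|X_i| < M$ and $E[|X_i|^2] := \sigma^2$. Then, for each $\nu > 0$,

    $$P\Big(\frac1n \Big|\sum_{i=1}^n X_i\Big| \ge \nu\Big) \le 2 \exp\Big\{-\frac{n\nu^2}{2(\sigma^2 + \nu M/3)}\Big\}.$$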
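Bernstein's inequality can be checked empirically. The sketch below compares the bound with a Monte Carlo estimate of the tail probability for centered uniform variables; the distribution, sample size, number of replications and $\nu$ are arbitrary illustrative choices.

```python
import numpy as np

def bernstein_bound(n, nu, sigma2, M):
    """Right-hand side of Bernstein's inequality."""
    return 2.0 * np.exp(-n * nu ** 2 / (2.0 * (sigma2 + nu * M / 3.0)))

rng = np.random.default_rng(0)
n, reps, nu = 200, 20000, 0.15

# X_i uniform on [-1, 1): E[X_i] = 0, |X_i| <= 1, E[X_i^2] = 1/3.
samples = rng.uniform(-1.0, 1.0, size=(reps, n))
empirical = np.mean(np.abs(samples.mean(axis=1)) >= nu)
bound = bernstein_bound(n, nu, sigma2=1.0 / 3.0, M=1.0)

print(empirical, bound)
```

The empirical frequency sits below the bound, which is expected since Bernstein's inequality is an upper bound rather than an exact tail formula. In the proof of Lemma 4.5 the same inequality is applied to the centered, truncated variables $B_i$.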

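    Lemma 4.5. For the model (1.1) with A1–A5 and $1 \le \tilde p < +\infty$, there exists a constant $\kappa > 1$ such that

    $$P\big(|\hat\beta_{j,k} - \beta_{j,k}| \ge \kappa t_n\big) \lesssim n^{-\tilde p}. \tag{4.7}$$

    Proof. According to (4.5), one gets $K_i = Y_i^2 (-1)^m \psi^{(m)}_{j,k}(X_i) - \int_0^1 g^2(x)(-1)^m \psi^{(m)}_{j,k}(x)\,dx$, $B_i = K_i I_{\{|K_i| \le \rho_n\}} - E\big[K_i I_{\{|K_i| \le \rho_n\}}\big]$ and

    $$|\hat\beta_{j,k} - \beta_{j,k}| \le \frac1n \Big|\sum_{i=1}^n B_i\Big| + \frac1n \sum_{i=1}^n E\big[|K_i| I_{\{|K_i| > \rho_n\}}\big].$$

    Meanwhile, (4.6) shows that there exists $c > 0$ such that $E\big[|K_i| I_{\{|K_i| > \rho_n\}}\big] \le c\,t_n$. Furthermore, the following inclusions hold:

    $$\big\{|\hat\beta_{j,k} - \beta_{j,k}| \ge \kappa t_n\big\} \subseteq \Big\{\Big[\frac1n \Big|\sum_{i=1}^n B_i\Big| + \frac1n \sum_{i=1}^n E\big(|K_i| I_{\{|K_i| > \rho_n\}}\big)\Big] \ge \kappa t_n\Big\} \subseteq \Big\{\frac1n \Big|\sum_{i=1}^n B_i\Big| \ge (\kappa - c)\,t_n\Big\}.$$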

    Note that the definition of Bi implies that |Bi|ρn and . Using the arguments of Lemma 4.3, . Furthermore, by Bernstein's inequality,

    Then, one can choose large enough such that

    Proof of (a): Note that

    Hence,

    (4.8)

    The stochastic term .

    It follows from Lemma 1.1 that

    Then, according to (4.3), and , one gets

    (4.9)

    The bias term .

    When , . Using Hölder inequality, Lemma 1.2 and ,

    When and , one knows that and

    Hence, the following inequality holds in both cases.

    (4.10)

    Finally, the results (4.8)–(4.10) show

    Proof of (b): By the definitions of and , one has

    Furthermore,

    (4.11)

    In this above inequality,

    For . According to (4.9) and ,

    (4.12)

    For . Using similar mathematical arguments as (4.10), when , one can obtain . This with leads to

    On the other hand, when and , one has and

    Therefore, for each ,

    (4.13)

    For . According to Hölder inequality and Lemma 1.1,

    Note that

    Meanwhile,

    Then, can be decomposed as

    (4.14)

    where

    For . It follows from the Hölder inequality that

    By Lemma 4.3, one gets

    This with Lemma 4.5, and shows that

    (4.15)

    For . One defines

    Clearly, . Furthermore, one rewrites

    (4.16)

    For . By Lemma 4.3 and

    (4.17)

    For . Using Lemma 4.3, one has

    When , by the Hölder inequality, , and Lemma 1.2, one can obtain that

    (4.18)

    When , it follows from Lemma 1.2 that

    (4.19)

    Take

    Then, (4.19) can be rewritten as

    (4.20)

    When holds if and only if , and

    (4.21)

    When holds if and only if , . Define

    and obviously, . Furthermore, one rewrites

    (4.22)

    For . Note that in the case of . Then, by the same arguments of (4.20), one gets

    (4.23)

    For . The conditions and imply . Similar to (4.18), one obtains

    (4.24)

    Combining (4.18), (4.21), (4.23) and (4.24),

    This with (4.16) and (4.17) shows that

    (4.25)

    For . According to the definition of , one can write

    For . It is easy to see that

    For . One rewrites . When , using the Hölder inequality and Lemma 1.2,

    When , one has

    For the case of , one can easily obtain that and

    When , . Moreover, by the definition of , one rewrites

    Note that

    On the other hand, similar to the arguments of (4.24), one has

    Therefore, in all of the above cases,

    (4.26)

    Finally, combining the above results (4.14), (4.15), (4.25) and (4.26), one gets

    This with (4.11)–(4.13) shows

    This paper considers wavelet estimations of the derivatives of the variance function in a heteroscedastic model. The upper bounds over $L^{\tilde p}$ $(1 \le \tilde p < \infty)$ risk of the wavelet estimators are discussed under some mild assumptions. The results show that the linear wavelet estimator can obtain the optimal convergence rate in the case of $p > \tilde p \ge 1$ and $m = 0$. When $p \le \tilde p$, the nonlinear wavelet estimator has a better convergence rate than the linear estimator. Moreover, the nonlinear wavelet estimator is adaptive. Finally, some numerical experiments are presented to verify the good performance of the wavelet estimators.

    We would like to thank the reviewers for their valuable comments and suggestions, which helped us to improve the quality of the manuscript. This paper is supported by the Guangxi Natural Science Foundation (No. 2022JJA110008), National Natural Science Foundation of China (No. 12001133), Center for Applied Mathematics of Guangxi (GUET), and Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation.

    All authors declare that they have no conflicts of interest.



    [1] P. L. Lai, C. Fyfe, Kernel and nonlinear canonical correlation analysis, Int. J. Neural Syst., 10 (2000), 365–377. https://doi.org/10.1142/S012906570000034X doi: 10.1142/S012906570000034X
    [2] D. R. Hardoon, S. Szedmak, J. Shawe-Taylor, Canonical correlation analysis: an overview with application to learning methods, Neural Comput., 16 (2004). https://doi.org/10.1162/0899766042321814 doi: 10.1162/0899766042321814
    [3] Q. Tian, C. Ma, M. Cao, S. Chen, H. Yin, A Convex Discriminant Semantic Correlation Analysis for Cross-View Recognition, IEEE Trans. Cybernetics, 52 (2020), 1–13. https://doi.org/10.1109/TCYB.2020.2988721 doi: 10.1109/TCYB.2020.2988721
    [4] Q. Tian, S. Xia, M. Cao, K. Chen, Reliable sensing data fusion through robust multiview prototype learning, IEEE Trans. Ind. Inform., 18 (2022), 2665–2673. https://doi.org/10.1109/TII.2021.3064358 doi: 10.1109/TII.2021.3064358
    [5] P. Zhuang, J. Wu, F. Porikli, C. Li, Underwater image enhancement with hyper-laplacian reflectance priors, IEEE Trans. Image Process., 31 (2022), 5442–5455. https://doi.org/10.1109/TIP.2022.3196546 doi: 10.1109/TIP.2022.3196546
    [6] V. Sindhwani, D. S. Rosenberg, An RKHS for multi-view learning and manifold co-regularization, IEEE Trans. Cybernetics, 99 (2020), 1–33. https://doi.org/10.1145/1390156.1390279 doi: 10.1145/1390156.1390279
    [7] M. H. Quang, L. Bazzani, V. Murino, A unifying framework for vector-valued manifold regularization and multi-view learning, in Proceedings of the 30th International Conference on Machine Learning, (2013), 100–108.
    [8] J. Zhao, X. Xie, X. Xu, S. Sun, Multi-view learning overview: Recent progress and new challenges, Inform. Fusion, 38 (2017), 43–54. https://doi.org/10.1016/j.inffus.2017.02.007 doi: 10.1016/j.inffus.2017.02.007
    [9] D. Zhang, T. He, F. Zhang, Real-time human mobility modeling with multi-view learning, ACM Trans. Intell. Syst. Technol., 9 (2017), 1–25. https://doi.org/10.1145/3092692 doi: 10.1145/3092692
    [10] D. Zhai, H. Chang, S. Shan, X. Chen, W. Gao, Multiview metric learning with global consistency and local smoothness, ACM Trans. Intell. Syst. Technol., 3 (2012), 1–22. https://doi.org/10.1145/2168752.2168767 doi: 10.1145/2168752.2168767
    [11] P. Zhuang, X. Ding, Underwater image enhancement using an edge-preserving filtering retinex algorithm, Multimed. Tools Appl., 79 (2020), 17257–17277. https://doi.org/10.1007/s11042-019-08404-4 doi: 10.1007/s11042-019-08404-4
    [12] T. Sun, S. Chen, J. Yang, P. Shi, A novel method of combined feature extraction for recognition, in 2008 Eighth IEEE International Conference on Data Mining, (2008), 1043–1048. https://doi.org/10.1109/ICDM.2008.28
    [13] Y. Peng, D. Zhang, J. Zhang, A new canonical correlation analysis algorithm with local discrimination, Neural Process. Lett., 31 (2010), 1–15. https://doi.org/10.1007/s11063-009-9123-3 doi: 10.1007/s11063-009-9123-3
    [14] S. Su, H. Ge, Y. H. Yuan, Multi-patch embedding canonical correlation analysis for multi-view feature learning, J. Vis. Commun. Image R., 41 (2016), 47–57. https://doi.org/10.1016/j.jvcir.2016.09.004 doi: 10.1016/j.jvcir.2016.09.004
    [15] Q. S. Sun, Z. D. Liu, P. A. Heng, D. S. Xia, Rapid and brief communication: A theorem on the generalized canonical projective vectors, Pattern Recogn., 38 (2005), 449–452. https://doi.org/10.1016/j.patcog.2004.08.009 doi: 10.1016/j.patcog.2004.08.009
    [16] H. K. Ji, Q. S. Sun, Y. H. Yuan, Z. X. Ji, Fractional-order embedding supervised canonical correlations analysis with applications to feature extraction and recognition, Neural Process. Lett., 45 (2017), 279–297. https://doi.org/10.1007/s11063-016-9524-z doi: 10.1007/s11063-016-9524-z
    [17] X. D. Zhou, X. H. Chen, S. C. Chen, Combined-feature-discriminability enhanced canonical correlation analysis, Pattern Recogn. Artif. Intell., 25 (2012), 285–291.
    [18] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell., 19 (1997), 711–720. https://doi.org/10.1109/34.598228 doi: 10.1109/34.598228
    [19] F. Zhao, L. Qiao, F. Shi, P. Yap, D. Shen, Feature fusion via hierarchical supervised local CCA for diagnosis of autism spectrum disorder, Brain Imaging Behav., 11 (2017), 1050–1060. https://doi.org/10.1007/s11682-016-9587-5 doi: 10.1007/s11682-016-9587-5
    [20] M. Haghighat, M. Abdel-Mottaleb, W. Alhalabi, Discriminant correlation analysis: Real-time feature level fusion for multimodal biometric recognition, IEEE Trans. Inform. Foren. Sec., 11 (2016), 1984–1996. https://doi.org/10.1109/TIFS.2016.2569061 doi: 10.1109/TIFS.2016.2569061
    [21] A. Sharma, A. Kumar, H. Daume, D. W. Jacobs, Generalized multiview analysis: A discriminative latent space, in 2012 IEEE Conference on Computer Vision and Pattern Recognition, (2012), 2160–2167. https://doi.org/10.1109/CVPR.2012.6247923
    [22] S. Sun, X. Xie, M. Yang, Multiview uncorrelated discriminant analysis, IEEE Trans. Cybernetics, 46 (2016), 3272–3284. https://doi.org/10.1109/TCYB.2015.2502248
    [23] P. Hu, D. Peng, J. Guo, L. Zhen, Local feature based multi-view discriminant analysis, Knowl.-Based Syst., 149 (2018), 34–46. https://doi.org/10.1016/j.knosys.2018.02.008
    [24] X. Fu, K. Huang, M. Hong, N. D. Sidiropoulos, A. M. C. So, Scalable and flexible multiview MAX-VAR canonical correlation analysis, IEEE Trans. Signal Process., 65 (2017), 4150–4165. https://doi.org/10.1109/TSP.2017.2698365
    [25] D. Y. Gao, Canonical duality theory and solutions to constrained nonconvex quadratic programming, J. Global Optim., 29 (2004), 377–399. https://doi.org/10.1023/B:JOGO.0000048034.94449.e3
    [26] J. Fan, S. Chen, Convex discriminant canonical correlation analysis, Pattern Recogn. Artif. Intell., 30 (2017), 740–746. https://doi.org/10.16451/j.cnki.issn1003-6059.201708008
    [27] C. Tang, X. Zheng, X. Liu, W. Zhang, J. Zhang, J. Xiong, et al., Cross-view locality preserved diversity and consensus learning for multi-view unsupervised feature selection, IEEE Trans. Knowl. Data Eng., 34 (2022), 4705–4716. https://doi.org/10.1109/TKDE.2020.3048678
    [28] C. Tang, Z. Li, J. Wang, X. Liu, W. Zhang, E. Zhu, Unified one-step multi-view spectral clustering, IEEE Trans. Knowl. Data Eng., 35 (2023), 6449–6460. https://doi.org/10.1109/TKDE.2022.3172687
    [29] J. Wang, C. Tang, Z. Wan, W. Zhang, K. Sun, A. Y. Zomaya, Efficient and effective one-step multiview clustering, IEEE Trans. Neur. Net. Learn. Syst., (2023), 1–12. https://doi.org/10.1109/TNNLS.2023.3253246
    [30] P. L. Lai, C. Fyfe, Kernel and nonlinear canonical correlation analysis, Int. J. Neural Syst., 10 (2000), 365–377.
    [31] K. Fukumizu, F. R. Bach, A. Gretton, Statistical consistency of kernel canonical correlation analysis, J. Mach. Learn. Res., 8 (2007), 361–383.
    [32] T. Liu, T. K. Pong, Further properties of the forward-backward envelope with applications to difference-of-convex programming, Comput. Optim. Appl., 67 (2017), 480–520. https://doi.org/10.1007/s10589-017-9900-2
    [33] T. P. Dinh, H. M. Le, H. A. Le Thi, F. Lauer, A difference of convex functions algorithm for switched linear regression, IEEE Trans. Automat. Contr., 59 (2014), 2277–2282. https://doi.org/10.1109/TAC.2014.2301575
    [34] P. Zadeh, R. Hosseini, S. Sra, Geometric mean metric learning, in Proceedings of The 33rd International Conference on Machine Learning, (2016), 2464–2471.
    [35] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004.
    [36] V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl., 29 (2007), 328–347. https://doi.org/10.1137/050637996
    [37] A. Papadopoulos, Metric Spaces, Convexity and Nonpositive Curvature, European Mathematical Society, Zurich, 2005.
    [38] T. Rapcsák, Geodesic convexity in nonlinear optimization, J. Optim. Theory Appl., 69 (1991), 169–183. https://doi.org/10.1007/BF00940467
    [39] C. L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten digit recognition: investigation of normalization and feature extraction techniques, Pattern Recogn., 37 (2004), 265–279. https://doi.org/10.1016/S0031-3203(03)00224-3
    [40] Pawlicki, D. S. Lee, Hull, Srihari, Neural network models and their application to handwritten digit recognition, in IEEE 1988 International Conference on Neural Networks, 2 (1988), 63–70. https://doi.org/10.1109/ICNN.1988.23913
    [41] C. H. Lampert, H. Nickisch, S. Harmeling, Learning to detect unseen object classes by between-class attribute transfer, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, (2009), 951–958. https://doi.org/10.1109/CVPR.2009.5206594
    [42] C. R. Jack, M. A. Bernstein, N. C. Fox, P. Thompson, G. Alexander, D. Harvey, et al., The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods, J. Magn. Reson. Imaging, 27 (2008), 685–691. https://doi.org/10.1002/jmri.21049
    [43] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, S. Zafeiriou, AgeDB: the first manually collected, in-the-wild age database, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, (2017), 51–59.
    [44] B. C. Chen, C. S. Chen, W. H. Hsu, Cross-age reference coding for age-invariant face recognition and retrieval, in Computer Vision – ECCV 2014, Springer, (2014), 768–783. https://doi.org/10.1007/978-3-319-10599-4_49
    [45] R. Rothe, R. Timofte, L. Van Gool, Deep expectation of real and apparent age from a single image without facial landmarks, Int. J. Comput. Vis., 126 (2018), 144–157. https://doi.org/10.1007/s11263-016-0940-3
    [46] G. Guo, G. Mu, Y. Fu, T. S. Huang, Human age estimation using bio-inspired features, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, (2009), 112–119. https://doi.org/10.1109/CVPR.2009.5206681
    [47] Q. Zhu, M. C. Yeh, K. T. Cheng, S. Avidan, Fast human detection using a cascade of histograms of oriented gradients, in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), (2006), 1491–1498. https://doi.org/10.1109/CVPR.2006.119
    [48] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)