
Citation: Daniel Gerth, Bernd Hofmann. Oversmoothing regularization with ℓ1-penalty term[J]. AIMS Mathematics, 2019, 4(4): 1223-1247. doi: 10.3934/math.2019.4.1223
In this paper we consider the problem of finding a stable approximate solution to the ill-posed linear operator equation
Ax=y, | (1.1) |
where A:X:=ℓ2→Y is a bounded linear operator with non-closed range, i.e., range(A)≠¯range(A), in the separable Hilbert space Y with norm ‖⋅‖. We recall that for 1≤p≤∞ the spaces ℓp are Banach spaces consisting of infinite sequences x={xi}∞i=1 with norms ‖x‖ℓp=(∑∞i=1|xi|p)^(1/p) for 1≤p<∞ and ‖x‖ℓ∞=supi∈N|xi|. For p=2 one obtains a Hilbert space. We assume that in (1.1) only noisy data yδ is available, satisfying the inequality
||y−yδ||≤δ | (1.2) |
with known noise level δ>0. In order to determine a stable solution to (1.1) from the noisy data, regularization is required. We are searching for approximations xδα of the exact solution x† with Ax†=y based on the noisy data yδ, which are calculated as ℓ1-regularized solutions
xδα:=argminx∈ℓ2{12||Ax−yδ||2+α||x||ℓ1} | (1.3) |
with regularization parameters α>0. This approach has been extensively studied over the last decade, and we refer to the brief discussion in Section 4. We remark that the Banach space ℓ1 is non-reflexive, which makes its analysis challenging. In particular, it was shown in [3, Prop. 3.3] that (1.1) is always ill-posed when the domain of A is restricted to ℓ1 and Y is a reflexive Banach space. On the other hand, ℓ1-regularization in the case ‖x†‖ℓ1<∞ is well understood as a prominent version of sparsity-promoting regularization. Hence, it is of great interest whether ℓ1-regularization also makes sense in the oversmoothing case, which is characterized by the fact that x†∈ℓ2∖ℓ1, so that ‖x†‖ℓ1=∞. The scenario of oversmoothing regularization has, to the best of the authors' knowledge, only been treated in the setting of regularization in Hilbert scales, see Section 2. There, the specific structure of the problem yields link conditions as tools to jump between different scale levels and thus to handle the analytical problems arising from the oversmoothing character of the penalty. For other space settings the corresponding tools are missing, and this paper is an attempt to overcome this deficit. We therefore stress that in this paper we are interested neither in specific applications nor in improving any sort of reconstruction quality. Equation (1.1) and procedure (1.3) merely constitute a new trial case for modeling oversmoothing regularization.
Let us now discuss our motivation for investigating this regularization idea. As we will see in our model problem, oversmoothing ℓ1-regularization already has one advantage over the classical Tikhonov regularization with penalty ‖⋅‖2ℓ2, i.e.,
xδα:=argminx∈ℓ2{12||Ax−yδ||2+α||x||2ℓ2}. | (1.4) |
Namely, it does not suffer from the saturation effect of the ℓ2-regularized approach, and the proposed a priori choice of the regularization parameter coincides with the discrepancy principle for all cases under consideration. Hence, one motivation for oversmoothing regularization is the prospect of improved convergence properties. But there are more potential benefits. An oversmoothing penalty might yield an easier optimization problem. For example, when x†∈ℓp∖ℓ2 with p>2, classical Tikhonov regularization (1.4) would be oversmoothing. However, since it is much easier to find the minimizer of the ℓ2-penalized Tikhonov functional (which corresponds to solving a linear system) than to minimize a functional with arbitrary ℓp-penalty, the former might be preferred. A third aspect of oversmoothing regularization is feature extraction. In terms of our model problem, ℓ1-regularization is known to yield sparse regularized solutions, i.e., only finitely many components are non-zero. As we will see later, ℓ1-regularization in some sense selects the best (least) discretization level needed to achieve a certain residual. Therefore, the approximate solutions need less storage space than ℓ2-regularized solutions. In general, features other than sparsity could be of interest. Finally, the analysis of oversmoothing regularization is challenging and might provide new insights for general regularization approaches.
The paper is organized as follows: In Section 2 we sketch the existing results on oversmoothing regularization in Hilbert scales. We proceed with properties of ℓ1-regularized solutions and discuss their implications in Section 3. After that we move to convergence rates for ℓ1-regularization including the oversmoothing case in Section 4. In this context, we restrict our studies to diagonal operators and present a specific simplified model problem, which is the basis for the subsequent numerical experiments. In Section 5 we compare our derived convergence rates with the ones known from classical Tikhonov regularization with ℓ2-penalty. Finally, numerical experiments supporting the theoretical results are presented in Section 6.
To the best of the authors' knowledge, oversmoothing regularization has so far only been investigated in the setting of regularization in Hilbert scales, starting with the seminal paper by Natterer [6]. There, Natterer considers a linear problem (1.1) between Hilbert spaces X and Y. The regularized solutions are obtained by the minimization problem
xδα:=argminx∈X{||Ax−yδ||2+α||x||2p}, | (2.1) |
where the penalty is a norm in a Hilbert scale. In this context, a family {Xs}s∈R of Hilbert spaces with X0:=X is called Hilbert scale if Xt⊆Xs whenever s<t and the inclusion is a continuous embedding, i.e., there exists cs,t>0 such that ||x||s≤cs,t||x||t. The Hilbert scale is induced by an unbounded, self-adjoint, and strictly positive definite operator T in X such that ‖x‖s:=‖Tsx‖X, s∈R, defines a norm in Xs.
Under the noise model (1.2) in combination with the smoothness assumption ||x†||q≤ρ for some q>0, and provided that there exists some a>0 such that with two constants m,M>0 and for all x∈X the inequality chain
m||x||−a≤||Ax||≤M||x||−a | (2.2) |
holds true, Natterer shows in [6] the error estimate
‖xδα−x†‖X≤Cδ^(q/(a+q))ρ^(a/(a+q)) | (2.3) |
whenever the penalty smoothness p in (2.1) is sufficiently high, in the sense that p≥(q−a)/2, and for the a-priori parameter choice
α(δ)=c(ρ)δ^(2(a+p)/(a+q)) | (2.4) |
with an appropriately chosen constant c(ρ)>0. It is interesting that the index p, which characterizes the smoothness of the penalty in (2.1), is limited for the rate result (2.3) only by a lower bound, not by an upper bound. Indeed, Natterer states that ``there is nothing wrong with high order regularization, even well above the order of the smoothness of the exact solution''. Even though x† may have arbitrarily small q-norms with q>0, one still obtains the order-optimal rate of convergence. The only adjustment to be made is an appropriate decrease of the regularization parameter.
Recently, Natterer's results have been extended to ill-posed inverse problems with nonlinear forward operators by Mathé and Hofmann [7,8]. In the first paper, the regularization parameter is chosen according to the discrepancy principle, while the second paper shows convergence rates under, in principle, the same a-priori parameter choice as occurring in (2.4). In both papers the obtained convergence rate coincides with that in (2.3), i.e., ‖xδα−x†‖X=O(δ^(q/(a+q))) as δ→0.
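As a concrete instance of (2.3) and (2.4) (our own illustration, not taken from the cited papers): for a=1 and q=2, any penalty smoothness p≥(q−a)/2=1/2 is admissible, and the estimate and parameter choice read

```latex
\|x_\alpha^\delta - x^\dagger\|_X \le C\,\delta^{\frac{2}{3}}\rho^{\frac{1}{3}},
\qquad
\alpha(\delta) = c(\rho)\,\delta^{\frac{2(1+p)}{3}} .
```

Note that the rate exponent 2/3 is independent of p; only the decay of α(δ) changes with the penalty smoothness.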
In this section we discuss some basic properties of oversmoothing ℓ1-regularization or the lack thereof. The results are easily generalized to arbitrary penalty functionals in Tikhonov-type regularization, as long as certain standard assumptions are fulfilled. We refer to, e.g., [9,Assumption 3.11 and Assumption 3.22] or [10,Assumption 8.1] for details. Basic properties of ℓ1-regularization are well known. As the following lemma shows, existence and stability of regularized solutions with respect to small perturbations in the data are not influenced by the oversmoothing setting.
Lemma 3.1. For all fixed α,δ>0, a minimizer of (1.3) exists and has a finite ℓ1-norm. Furthermore, if {yk} is a sequence norm-converging in Y to yδ, i.e. yk→yδ, then every associated sequence {xk} with
xk∈argminx∈ℓ2{12||Ax−yk||2+α‖x‖ℓ1} |
has a subsequence that converges weakly. The weak limit of every such convergent subsequence {x˜k} of {xk} is a minimizer ˜x of (1.3). Moreover, ‖x˜k‖ converges to ‖˜x‖.
Proof. Take α>0 arbitrary but fixed. Since ℓ1⊂ℓ2, there exists ˜x∈ℓ2 with finite values ||A˜x−yδ|| and ‖˜x‖ℓ1. This implies
‖xk‖ℓ1≤1α(12‖Axk−yk‖2+α‖xk‖ℓ1)≤1α(12‖A˜x−yk‖2+α‖˜x‖ℓ1)<∞ |
by the optimality of the minimizer. The remaining part of the proof follows from standard arguments [9,10].
We move on to identify some necessary conditions for the convergence of regularized solutions for δ→0 in the case of oversmoothing regularization.
Theorem 3.1. Let x†∈ℓ2 with ‖x†‖ℓ1=∞ denote a solution to (1.1). If the Tikhonov regularized solutions (1.3) under consideration are weakly convergent to x† for an a priori or a posteriori parameter choice rule α=α(δ,yδ), then the following items must hold for a sequence xk:=xδkαk⇀x† as δk→0:
a) limk→∞‖xk‖ℓ1=∞,
b) limk→∞αk=0,
c) lim supk→∞αk‖xk‖ℓ1≤C<∞.
Proof. By Lemma 3.1, ‖xk‖ℓ1<∞ for all k∈N. If, however, we assume that there is a subsequence {xkj} with ‖xkj‖ℓ1≤c<∞ uniformly for all j∈N, then the assumed weak convergence xkj⇀x† in X implies that ‖x†‖ℓ1≤c. This contradicts the assumption ‖x†‖ℓ1=∞ and yields item a) of the theorem. Now take some fixed ˜x∈ℓ1 and keep in mind the definition of the xk as minimizers of the functional (1.3). It is
12||Axk−yδk||2+αk‖xk‖ℓ1≤12||A˜x−yδk||2+αk‖˜x‖ℓ1≤δ2k+||A˜x−y||2+αk‖˜x‖ℓ1. |
Therefore
αk(‖xk‖ℓ1−‖˜x‖ℓ1)≤δ2k+||A˜x−y||2≤C<∞. |
Since ‖˜x‖ℓ1<∞, we need limk→∞αk=0 in order to allow limk→∞‖xk‖ℓ1=∞, as required by a). Additionally, the product αk‖xk‖ℓ1 has to stay bounded, which together yields b) and c).
The next step would be to show (weak) convergence of the regularized solutions to the exact solution as δ→0. However, no such result is known. Even for the Hilbert scale setting of Section 2 no general convergence assertion appears to be available. In the standard setting ‖x†‖ℓ1<∞, one has due to the optimality of the xδα
12‖Axδα−yδ‖2+α‖xδα‖ℓ1≤12δ2+α‖x†‖ℓ1<∞. | (3.1) |
Requiring α→0 and δ2/α→0 as δ→0 then ensures (weak) convergence of the regularized solutions to the exact one. In particular, the ℓ1-norm of the regularized solutions ‖xδα‖ℓ1 remains bounded by a constant for all δ>0. In the oversmoothing setting, the right-hand side in (3.1) is infinite, hence provides no useful information. It appears natural to replace x† by suitable auxiliary elements to bound the Tikhonov functional, but that is not enough. Let {xδ}δ>0 be a sequence with
‖xδ‖ℓ1=inf{‖x‖ℓ1 : x∈ℓ2, ‖Ax−y‖≤cδ}
for fixed constant c>0. Using the {xδ}δ>0 to bound the Tikhonov functional, we obtain
12‖Axδα−yδ‖2+α‖xδα‖ℓ1≤12‖Axδ−yδ‖2+α‖xδ‖ℓ1≤(c+1)δ2+α‖xδ‖ℓ1. | (3.2) |
For any choice of α=α(δ) one obtains ‖xδα‖ℓ1≤cδ²/α+‖xδ‖ℓ1. Even if δ²/α→0, this does not yield a bound for ‖xδα‖ℓ1 independent of δ, since ‖xδ‖ℓ1→∞. Therefore one cannot infer the existence of a (weakly) convergent subsequence among the regularized solutions xδα, as is done in the standard, non-oversmoothing convergence results. For the oversmoothing regularization one would instead need the xδα to be bounded in a norm weaker than the ℓ1-norm, for example the ℓ2-norm. It is currently not clear how such a connection can be established. At this point we also mention that the oversmoothing approach prevents the use of state-of-the-art regularization theory. In recent years, variational inequalities (sometimes also called variational source conditions) have emerged as a powerful tool in the theory of Banach-space regularization, and we refer, for example, to the papers [4,5]. For ℓ1-regularization, in [11] a variational inequality of the form
‖x−x†‖ℓ1≤‖x‖ℓ1−‖x†‖ℓ1+φ(‖Ax−Ax†‖) | (3.3) |
for a well-defined concave index function φ, valid for all x∈ℓ1, was established and used for deriving convergence rates in the case x†∈ℓ1. Clearly, this concept is inapplicable when x†∈ℓ2∖ℓ1 and hence ‖x†‖ℓ1=∞, but one could instead measure the error in the ℓ2-norm. It therefore seems interesting that not even for x†∈ℓ1 a variational inequality of the form
‖x−x†‖κℓ2≤‖x‖ℓ1−‖x†‖ℓ1+φ(‖Ax−Ax†‖) |
with κ=1 or κ=2 is known as an alternative to (3.3).
In this section, we derive convergence rates for the ℓ1-penalized Tikhonov-type regularization (1.3) of (1.1). This method became a popular and powerful tool in the last decade, sparked by the seminal paper [13]. Since then, many authors have contributed to its theory and application; here we only mention the papers [14,15,16]. As is typical in ℓ1-regularization, we assume that A is injective; for the non-injective case we refer to [17]. The vast majority of papers connected to ℓ1-regularization assumes sparsity in the sense that x†∈ℓ0, i.e., that it has only finitely many non-zero components. However, in [11] the situation that the exact solution x† is not sparse, but only x†∈ℓ1, was explored for the first time. The results were later refined and extended in [3,12,18,19]. In some sense, this paper is a continuation of this development, as we are now interested in the case that x† is not even in ℓ1, but x†∈ℓ2∖ℓ1. Due to this we will employ the ℓ2-norm to measure the speed of convergence, and we seek an index function φ, i.e., a continuous, monotonically increasing, and concave function with φ(0)=0, such that
||xδα−x†||ℓ2≤Cφ(δ) |
holds with some constant C>0. We will show that an appropriate choice of the regularization parameter yields such a function φ.
It is well known that ℓ1-regularization yields sparse solutions. We will see that for showing convergence rates when x†∉ℓ1 it is essential to estimate the support of the regularized solutions. In order to do this, we rely on the explicit calculation of the minimizers. Therefore, we restrict ourselves to diagonal operators in this paper. We denote by {e(i)}∞i=1 the canonical orthonormal basis in ℓ2, with components e(i)i=1 and e(i)k=0 for k≠i. Moreover, we denote by {v(i)}∞i=1 an orthonormal basis of the closure ¯range(A)⊆Y of the range of A. Then we say that a compact operator A:ℓ2→Y with decreasingly ordered singular values {σi}∞i=1 has diagonal structure if
Ae(i)=σiv(i) and e(i)=(1/σi)A∗v(i) (i=1,2,…).
This model includes compact linear operators mapping between general Hilbert spaces X and Y, since such operators can be diagonalized with respect to their singular system [1,2].
We now present a way of constructing functions φ that serve as prototypes for convergence rates. To this end, we define, for all n∈N, the linear projectors
Pn:ℓ2→ℓ2,Pnx=(x1,…,xn,0,0,…). | (4.1) |
Lemma 4.1. Let A:ℓ2→Y be a compact linear operator with diagonal structure as introduced above, possessing the singular values
‖A‖=σ1≥σ2≥…≥σn≥σn+1≥…→0 as n→∞,
and let x†∈ℓ2 denote the uniquely determined solution to the operator equation (1.1), and set
T(x†,n):=||(I−Pn)x†||ℓ2=√(∑∞i=n+1|x†i|2). | (4.2) |
Then for any x∈ℓ2 it is, for all n∈N,
||x−x†||ℓ2≤||(I−Pn)x||ℓ2+σ−1n||Ax−Ax†||+T(x†,n). | (4.3) |
Proof. We have, for any x∈ℓ2, with the linear projectors from (4.1) the relation
||x−x†||ℓ2≤||(I−Pn)x||ℓ2+||Pn(x−x†)||ℓ2+||(I−Pn)x†||ℓ2 | (4.4) |
which holds for all n∈N. We keep n arbitrary but fixed and start estimating the last term which describes the decay of the tail of the solution and thus its smoothness. It is a fixed function of n which goes to zero as n→∞, and we employ the convention (4.2).
Next we estimate the middle term ||Pn(x−x†)||ℓ2 in (4.4). In order to do so we recall the notion of the modulus of continuity, given by
ω(M,θ):=sup{||x||:x∈M,||Ax||≤θ} |
where M⊂ℓ2. This quantity is essentially related to the minimal error of any regularization method under noisy data. Since Pn(x−x†)∈ℓ2n:=span{e(1),…,e(n)}, we can use tools from approximation theory to estimate its norm. In [20, Proposition 3.9] it has been shown that for any finite-dimensional space Xn
ω(Xn,θ)=θ/Θ(A,n),
where the modulus of injectivity Θ(A,n) is defined as
Θ(A,n):=inf0≠x∈Xn ||Ax||/||x||.
For diagonal operators it is Θ(A,n)=σn and thus
ω(ℓ2n,θ)=σ−1nθ. |
Noting that for diagonal operators it also holds that
||APn(x−x†)||≤||A(x−x†)||, | (4.5) |
it is therefore
||Pn(x−x†)||ℓ2≤ω(ℓ2n,||APn(x−x†)||)=σ−1n||APn(x−x†)||≤σ−1n||A(x−x†)||. | (4.6) |
Inserting this and (4.2) into (4.4), we obtain (4.3).
In the best case scenario the term ||(I−Pn)x||ℓ2 in (4.3) vanishes (note that this is a crucial point to be shown for the ℓ1-regularized approximations to x†) and only the two rightmost terms remain. The best possible convergence rate is therefore determined by taking the infimum of those two expressions with respect to n. This yields a prototype rate function
φ(t)=infn∈N{t/σn+‖(I−Pn)x†‖ℓ2}. | (4.7) |
Note that, as an infimum of affine functions, φ is concave, and it is also monotonically increasing with φ(0)=0. Since for fixed t>0 the infimum is taken over a countable set, and σ−1n t→∞ monotonically as n→∞ while T(x†,n)→0 monotonically, the infimum is attained as a minimum and the corresponding index
ninf(t):=argminn∈N{t/σn+‖(I−Pn)x†‖ℓ2} | (4.8) |
is well defined. In order to show ||x−x†||ℓ2≤φ(‖Ax−Ax†‖) for the ℓ1-regularized solutions x=xδα from (1.3), we need the following assumptions.
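As an aside, φ and ninf from (4.7) and (4.8) are straightforward to evaluate numerically once finitely many singular values and solution components are stored. A minimal sketch (our own illustration; the helper name and the hypothetical truncation length N are not from the paper, and the tail beyond N is ignored):

```python
import numpy as np

def phi_and_ninf(t, sigma, xdag):
    """Evaluate the prototype rate phi(t) = min_n { t/sigma_n + ||(I-P_n)xdag||_2 }
    and its minimizing index n_inf(t), with all sequences truncated at length N."""
    # tail2[n-1] = sum_{i>n} xdag_i^2 (truncated); the last entry is 0 by convention
    tail2 = np.append(np.cumsum((xdag**2)[::-1])[::-1][1:], 0.0)
    vals = t / sigma + np.sqrt(tail2)
    n = int(np.argmin(vals)) + 1          # 1-based index as in the text
    return vals[n - 1], n

# Example: sigma_i = 1/i and xdag_i = 1/i (beta = eta = 1 in the model problem
# of Section 4), where the predicted rate is phi(t) ~ t^{1/3}.
i = np.arange(1, 100001, dtype=float)
phi1, n1 = phi_and_ninf(1e-3, 1.0 / i, 1.0 / i)
phi2, n2 = phi_and_ninf(1e-5, 1.0 / i, 1.0 / i)
slope = np.log(phi1 / phi2) / np.log(1e-3 / 1e-5)
```

The fitted slope comes out close to 1/3, in agreement with the rate derived for this model in Section 5.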
Assumption 4.1. (a) The singular values of A fulfill, for all n∈N,
1≤σn/σn+1≤Cσ
with a constant 1≤Cσ<∞.
(b) The tail is monotone, i.e., |x†n| is monotonically decreasing for all sufficiently large n.
(c) The convergence rate φ(t) is not dominated by the tail, i.e., there exists a constant ˜C such that
T(x†,ninf(t))≤˜Cσ−1ninf(t)t. |
(d) For the true solution x† it holds
|x†ninf(t)+1|≤Cσ−1ninf(t)t/√ninf(t)
for a constant 0≤C<∞ and sufficiently small t>0.
Part (a) limits the ill-posedness of the forward operator. A polynomial decay of the singular values fulfills the condition, and even an exponential decay of order σi=e^(−i) is permitted, but for any ε>0 a decay σi=e^(−i^(1+ε)) violates the condition. Note that the left-hand inequality is trivial, but will be used in a proof later. Part (b) is required purely for technical reasons. Parts (c) and (d) link the smoothness of the true solution with the ill-posedness of the forward operator. They will be discussed in the examples below. We remark that similar restrictions are standing assumptions in the Hilbert-scale setting of Section 2: Condition (2.2) ensures that the singular values of A decay asymptotically like i^(−a), and the condition p≥(q−a)/2 implies that the solution must not be significantly smoother than the penalty.
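Part (a) can be illustrated numerically by inspecting log(σn/σn+1), which must stay bounded. A small sketch (our own illustration, not from the paper), working with logarithms of the singular values to avoid underflow of e^(−n):

```python
import numpy as np

def log_ratios(log_sigma):
    """log(sigma_n / sigma_{n+1}) along a sequence given by its logarithms;
    Assumption 4.1 (a) holds iff these values stay bounded in n."""
    return log_sigma[:-1] - log_sigma[1:]

n = np.arange(1, 5001, dtype=float)
poly = log_ratios(-2.0 * np.log(n))   # sigma_i = i^{-2}: log-ratios tend to 0
expo = log_ratios(-n)                 # sigma_i = e^{-i}: constant log-ratio 1
fast = log_ratios(-n**1.1)            # sigma_i = e^{-i^{1.1}}: log-ratios grow
```

For the polynomial and exponential decays the values stay bounded (with Cσ=e in the exponential case), while for σi=e^(−i^(1.1)) they grow without bound, so part (a) fails.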
For later use we mention two consequences of Assumption 4.1.
Lemma 4.2. Let A be as in Lemma 4.1, and let Assumption 4.1 (a) and (c) hold. Then
(1/(1+˜C))φ(t)≤σ−1ninf(t)t≤φ(t). | (4.9) |
Furthermore, it is
‖APninf(t)x†−Ax†‖≤˜Ct. | (4.10) |
Proof. The right-hand inequality of (4.9) follows from the definition of φ in (4.7). Similarly, it is
φ(t)=σ−1ninf(t)t+T(x†,ninf(t))≤(1+˜C)σ−1ninf(t)t, |
which yields the left-hand inequality. To obtain the estimate of the residual, we observe that
‖APninf(t)x†−Ax†‖²=∑∞i=ninf(t)+1|σix†i|²≤σ²ninf(t)+1∑∞i=ninf(t)+1|x†i|²≤σ²ninf(t)T(x†,ninf(t))².
Inserting Assumption 4.1 (c) immediately yields (4.10).
In order to show convergence rates based on Lemma 4.1, two terms in (4.3) need to be estimated for x=xδα: the residual ‖Axδα−Ax†‖ and the tail ‖(I−Pn)xδα‖ℓ2.
Lemma 4.3. Let A be as in Lemma 4.1, and let Assumption 4.1 (c) hold. Then, with the a-priori choice
α(δ)=cαδ²/(√ninf(δ)φ(δ)), | (4.11) |
with constant 0<cα<∞, the minimizers xδα of (1.3) satisfy
||Axδα−yδ||≤((˜C+1)²+4cα)δ | (4.12) |
In particular, it is
‖Axδα−Ax†‖≤cδ | (4.13) |
with positive constant c=(˜C+1)²+4cα+1.
Proof. From the Tikhonov functional (1.3) we have for all n∈N
12||Axδα−yδ||2+α||xδα||ℓ1≤12||APnx†−yδ||2+α||Pnx†||ℓ1, |
which with
||xδα||ℓ1=||Pnxδα||ℓ1+||(I−Pn)xδα||ℓ1 |
and
||Pnx†||ℓ1≤||Pn(x†−xδα)||ℓ1+||Pnxδα||ℓ1 |
yields
12||Axδα−yδ||2+α||(I−Pn)xδα||ℓ1≤12||APnx†−yδ||2+α||Pn(x†−xδα)||ℓ1≤12‖APnx†−yδ‖2+α√n‖Pn(x†−xδα)‖ℓ2. | (4.14) |
Now fix n=ninf(δ). Using (4.6) and (4.9) on the right-hand side we have
||Pninf(δ)(x†−xδα)||ℓ2≤σ−1ninf(δ)‖Axδα−Ax†‖≤σ−1ninf(δ)(‖Axδα−yδ‖+‖Ax†−yδ‖)≤2σ−1ninf(δ)max{‖Axδα−yδ‖,δ}≤2φ(‖Axδα−yδ‖), | (4.15) |
where in the last estimate we have assumed ‖Axδα−yδ‖>δ, as otherwise the assertion of the lemma is trivially fulfilled. We combine this estimate with (4.14) and again set n=ninf(δ). This yields
12||Axδα−yδ||2≤12||APninf(δ)x†−yδ||2+2α√ninf(δ)φ(||Axδα−yδ||). | (4.16) |
Note that by (4.10) we have ||APninf(δ)x†−yδ||≤(˜C+1)δ. Inserting the parameter choice (4.11) into (4.16), we continue analogously to [5,Corollary 1]. Namely, it is by concavity of φ
(1/2)‖Axδα−yδ‖²≤((˜C+1)²/2)δ²+2cα(δ²/(√ninf(δ)φ(δ)))√ninf(δ)φ(‖Axδα−yδ‖)=((˜C+1)²/2)δ²+2cαδ²φ(‖Axδα−yδ‖)/φ(δ)≤δ²((˜C+1)²/2+2cα)φ(‖Axδα−yδ‖)/φ(δ)≤δ²((˜C+1)²/2+2cα)(φ(δ)/δ)‖Axδα−yδ‖/φ(δ)=δ((˜C+1)²/2+2cα)‖Axδα−yδ‖.
This yields ||Axδα−yδ||≤((˜C+1)²+4cα)δ for α from (4.11).
The second assertion follows from this, the noise assumption (1.2), and the triangle inequality.
Lemma 4.4. Let A be as in Lemma 4.1, and let Assumption 4.1 (a)–(d) hold. Let α be chosen according to (4.11) such that cα≥C(˜C+1). Then the minimizers xδα of (1.3) satisfy
‖(I−Pninf(δ))xδα‖ℓ2≤Cσφ(δ). | (4.17) |
Proof. The diagonal structure of the operator allows to calculate the minimizer of (1.3) explicitly, and (1.3) reads
12||Ax−yδ||2+α||x||ℓ1=12∑i∈N(σixi−yδi)2+α∑i∈N|xi|. | (4.18) |
Since the components are decoupled, the first order optimality condition for the above functional is
∂∂xi(12∑i∈N(σixi−yδi)2+α∑i∈N|xi|)=0∀i∈N, |
i.e., for each i∈N,
0∈σi(σixi−yδi)+αsgn(xi)
where
sgn(x):=1 for x>0, sgn(x):=−1 for x<0, and sgn(0):=[−1,1].
Consider the case [xδα]i>0. Then
[xδα]i=yδi/σi−α/σ²i>0.
On the other hand, for [xδα]i<0 we have
[xδα]i=yδi/σi+α/σ²i<0
and consequently
[xδα]i=0⇔|yδi/σi|≤α/σ²i. | (4.19) |
We will only consider the case [xδα]i>0 further, the results for [xδα]i<0 are analogous with inverted sign. First let y be exact, noise-free data, i.e., yi=σix†i for all i∈N, where the minimizer of (4.18) (with yδ temporarily replaced by y) is denoted by xα. Then (4.19) yields
[xα]i=0⇔|x†i|≤α/σ²i. | (4.20) |
Inserting the parameter choice (4.11) and considering i=ninf(δ)+1, we find with Assumption 4.1 (a) and (d), and with (4.9) that
α/σ²ninf(δ)+1=(α/σ²ninf(δ))·(σninf(δ)/σninf(δ)+1)²≥α/σ²ninf(δ)=cα(σ−1ninf(δ)δ)²/(√ninf(δ)φ(δ))≥(cα/(˜C+1))σ−1ninf(δ)δ/√ninf(δ)≥(cα/(C(˜C+1)))|x†ninf(δ)+1|.
Now, as long as cα≥C(˜C+1), this implies [xα]ninf(δ)+1=0 for sufficiently small δ. Since α/σ²i is increasing in i while |x†i| decreases, under Assumption 4.1 (b) all entries [xα]m with m>ninf(δ) must then also vanish. Hence, in the noise-free case we have ‖(I−Pninf(δ))xα‖ℓ2=0. Therefore, for noisy data and under the same parameter choice, any contribution to ‖(I−Pninf(δ))xδα‖ℓ2 must be due to the noise in the data. Consider the model y−yδ=δζ with some ζ∈ℓ2, ||ζ||ℓ2=1. Then ‖y−yδ‖=δ and
[xδα]i>0⇔x†i+δζi/σi−α/σ²i>0. | (4.21) |
The components x†i are fixed and decreasing. The regularization parameter α is fixed for all i, hence α/σ²i grows with increasing index. Therefore, the smaller i, the (potentially) larger the components [xδα]i can become. To estimate the tail ||(I−Pninf(δ))xδα||ℓ2 we are only interested in the higher indices i>ninf(δ). Due to the asymptotics of the individual terms, ||(I−Pninf(δ))xδα||ℓ2 therefore becomes largest when all the noise mass is concentrated in the lowest possible component ninf(δ)+1, i.e., ζninf(δ)+1=1 and (I−Pninf(δ))xδα=([xδα]ninf(δ)+1,0,…). From the noise-free considerations above we already know |x†i|≤α/σ²i for i>ninf(δ). Hence, any positive contribution in (4.21) comes from the noise. With an analogous argument for the case [xδα]i<0 we therefore get
|[xδα]ninf(δ)+1|≤δ/σninf(δ)+1.
Using Assumption 4.1 (a) and estimate (4.9) we obtain
δ/σninf(δ)+1=(δ/σninf(δ))·(σninf(δ)/σninf(δ)+1)≤Cσσ−1ninf(δ)δ≤Cσφ(δ).
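The explicit componentwise minimizer used in this proof is simply soft thresholding of the naive solution yδi/σi at level α/σ²i. A minimal sketch of this computation (our own illustration for the diagonal model; function and variable names are hypothetical):

```python
import numpy as np

def l1_diag_minimizer(sigma, ydelta, alpha):
    """Componentwise minimizer of 1/2*sum_i (sigma_i*x_i - y_i)^2 + alpha*sum_i |x_i|
    for a diagonal operator: soft thresholding of y_i/sigma_i at alpha/sigma_i^2."""
    z = ydelta / sigma              # naive (unregularized) componentwise solution
    thresh = alpha / sigma**2       # componentwise threshold, growing with i
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

# Example with sigma_i = 1/i and xdag_i = i^{-0.8} (noise-free data): the
# regularized solution is sparse, its support given by |y_i/sigma_i| > alpha/sigma_i^2.
i = np.arange(1, 51, dtype=float)
sigma, xdag = 1.0 / i, i**-0.8
x = l1_diag_minimizer(sigma, sigma * xdag, alpha=1e-4)
support = np.flatnonzero(x)
```

Note that the threshold α/σ²i increases with i, which is exactly the mechanism by which the tail of xδα is cut off in the proof above.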
Putting all the pieces together yields the main theorem.
Theorem 4.1. Let A:ℓ2→Y be a compact linear operator with diagonal structure, possessing the singular values
‖A‖=σ1≥σ2≥…≥σn≥σn+1≥…→0 as n→∞.
Let x†∈ℓ2 denote the uniquely determined solution to the operator equation (1.1), and let Assumption 4.1 hold. Then the ℓ1-regularized solutions xδα from (1.3) with noisy data yδ obeying the noise model ‖y−yδ‖≤δ satisfy for sufficiently small δ>0 the estimate
||xδα−x†||ℓ2≤(Cσ+c)φ(δ) |
with c as in (4.13) and with the concave index function
φ(t)=infn∈N{t/σn+‖(I−Pn)x†‖ℓ2},
provided the regularization parameter is chosen a priori as
α(δ)=cαδ²/(√ninf(δ)φ(δ)),
with constant cα≥C(˜C+1). The integers
ninf(δ):=argminn∈N{δ/σn+‖(I−Pn)x†‖ℓ2}
can be found for all δ>0.
Proof. Lemma 4.1 and the discussion thereafter provide us with the prototype of the convergence rate φ(t) (4.7) and with ninf(t) (4.8). Consider (4.3) with x replaced by xδα, i.e.,
||xδα−x†||ℓ2≤||(I−Pn)xδα||ℓ2+σ−1n||Axδα−Ax†||+T(x†,n). | (4.22) |
Fix n=ninf(δ). From Lemma 4.3 we then have ‖Axδα−Ax†‖≤cδ with c>1, hence
||xδα−x†||ℓ2≤||(I−Pninf(δ))xδα||ℓ2+c(σ−1ninf(δ)δ+T(x†,ninf(δ))). |
The term in brackets, by definition, attains the value φ(δ). The remaining term ||(I−Pninf(δ))xδα||ℓ2 was estimated in Lemma 4.4. Therefore
‖xδα−x†‖ℓ2≤(Cσ+c)φ(δ). |
The a-priori choice requires, in principle, knowledge of the exact solution and is thus infeasible in practice. In the following we comment on the discrepancy principle as an a-posteriori parameter choice. We begin with a helpful lemma.
Lemma 4.5. Let Assumption 4.1 hold and α∗(δ) be any choice of the regularization parameter in (1.3) such that
τ1δ≤‖Axδα∗−yδ‖≤τ2δ |
for (1+˜C)<τ1≤τ2<∞. Then ˉcα∗≥α with the a-priori choice of α from (4.11) and ˉc=2cα(τ2+1)/(τ1²−(˜C+1)²).
Proof. From the minimizing property of the Tikhonov-functional we have
12‖Axδα∗−yδ‖2+α∗‖xδα∗‖ℓ1≤12‖APninf(δ)x†−yδ‖2+α∗‖Pninf(δ)x†‖ℓ1. |
Similar to the proof of Lemma 4.3 this implies
12||Axδα∗−yδ||2≤12||APninf(δ)x†−yδ||2+α∗√ninf(δ)φ(||Axδα∗−Ax†||). | (4.23) |
Note that for this result we have replaced (4.15) by
||Pninf(δ)(x†−xδα∗)||ℓ2≤σ−1ninf(δ)‖Axδα∗−Ax†‖≤φ(‖Axδα∗−Ax†‖).
Using ||APninf(δ)x†−yδ||≤(˜C+1)δ (cf. Lemma 4.2) and τ1δ≤‖Axδα∗−yδ‖ in (4.23) we have
τ1²δ²/2≤((˜C+1)²/2)δ²+α∗√ninf(δ)φ(||Axδα∗−Ax†||),
and since τ1>(1+˜C)
δ²≤(2/(τ1²−(˜C+1)²))α∗√ninf(δ)φ(||Axδα∗−Ax†||).
Using the upper bound ‖Axδα∗−yδ‖≤τ2δ and the concavity of φ, it is
φ(||Axδα∗−Ax†||)≤φ(||Axδα∗−yδ||+||yδ−Ax†||)≤(τ2+1)φ(δ)
and we obtain, with α from (4.11),
α=cαδ²/(√ninf(δ)φ(δ))≤(2cα(τ2+1)/(τ1²−(˜C+1)²))α∗.
Theorem 4.2. Let A:ℓ2→Y be a compact linear operator with diagonal structure, possessing the singular values
‖A‖=σ1≥σ2≥…≥σn≥σn+1≥…→0 as n→∞.
Let x†∈ℓ2 denote the uniquely determined solution to the operator equation (1.1), and let Assumption 4.1 hold. Then the ℓ1-regularized solutions xδα from (1.3) with noisy data yδ obeying the noise model ‖y−yδ‖≤δ satisfy for sufficiently small δ>0 the estimate
||xδα−x†||ℓ2≤(τ2+Cσ+1)φ(δ) |
with the concave index function
φ(t)=infn∈N{t/σn+‖(I−Pn)x†‖ℓ2},
provided the regularization parameter is chosen a posteriori such that
τ1δ≤‖Axδα−yδ‖≤τ2δ, | (4.24) |
with parameters 1+˜C<τ1≤τ2<∞ such that 2C(˜C+1)(τ2+1)/(τ1²−(˜C+1)²)≤1.
Proof. We start as in the proof of Theorem 4.1 and consider (4.22). Due to the parameter choice (4.24) and the triangle inequality we have
‖Axδα−Ax†‖≤(τ2+1)δ, |
hence it is, with n=ninf(δ),
||xδα−x†||ℓ2≤||(I−Pninf(δ))xδα||ℓ2+(τ2+1)(σ−1ninf(δ)δ+T(x†,ninf(δ)))=||(I−Pninf(δ))xδα||ℓ2+(τ2+1)φ(δ). |
According to Lemma 4.5, the regularization parameter α∗ obtained from the discrepancy principle satisfies α≤α∗ with the a-priori parameter α from (4.11), provided ˉc≤1. The minimal admissible constant cα in Lemma 4.4 is cα=C(˜C+1). Inserting this into ˉc yields α≤α∗ if 2C(˜C+1)(τ2+1)/(τ1²−(˜C+1)²)≤1. It now follows analogously to the proof of Lemma 4.4 that for noise-free data ||(I−Pninf(δ))xδα||ℓ2=0, and in the worst-case noise scenario ||(I−Pninf(δ))xδα||ℓ2≤Cσφ(δ). Hence
||xδα−x†||ℓ2≤(τ2+Cσ+1)φ(δ). |
In order to exemplify and illustrate the general theory in more detail, we will now consider some simple model scenarios. As before, we assume that A:ℓ2→ℓ2 is diagonal, [Ax]i=σixi, i=1,2,…. For simplicity, we will denote all constants by a generic constant 0<c<∞.
Let σi=i^(−β) and x†i=i^(−η) for positive values β and η and all i∈N. In particular, for 1/2<η≤1 this yields a case of oversmoothing regularization with ‖x†‖ℓ1=∞, whereas for η>1 we have the classical model with ‖x†‖ℓ1<∞.
Theorem 4.3. Let A be diagonal with singular values σi=i^(−β), β>0. Let x†∈ℓ2 be such that [x†]i=i^(−η) for η>1/2. Then the ℓ1-regularized solutions xδα from (1.3) to (1.1), with noisy data yδ obeying ‖y−yδ‖≤δ, satisfy
||xδα−x†||ℓ2≤cδ^((2η−1)/(2η+2β−1)), | (4.25) |
provided the regularization parameter is chosen a priori as
α(δ)=cδ^((4β+2η)/(2η+2β−1)) | (4.26) |
or according to the discrepancy principle (4.24).
Proof. It remains to calculate the quantities occurring in the proof of Theorem 4.1 explicitly.
The tail of the exact solution satisfies
||(I−Pn)x†||ℓ2=√(∑∞i=n+1 i^(−2η))≤√((n+1)^(1−2η)/(2η−1))≤(2η−1)^(−1/2)n^(1/2−η). | (4.27) |
Inserting the structure of A, the rate prototype (4.7) becomes
φ(t)=infn∈N{n^β t+cηn^(1/2−η)}. | (4.28) |
It is simple calculus to show
φ(t)=ct^((2η−1)/(2η+2β−1))
and
ninf(t)=⌈ct^(−2/(2η+2β−1))⌉ | (4.29) |
where ⌈⋅⌉ denotes rounding up to the next integer. Inserting the previous results into the parameter choice (4.11) yields
α(δ)=δ²/(√ninf(δ)φ(δ))=cδ^((4β+2η)/(2η+2β−1)). | (4.30) |
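The "simple calculus" can be made explicit by treating n as a continuous variable and absorbing all constants into a generic c (a sketch of the computation, not the authors' exact derivation):

```latex
\frac{d}{dn}\Big( n^{\beta}\, t + c\, n^{\frac12-\eta} \Big)
  = \beta\, n^{\beta-1}\, t - c\,\big(\eta-\tfrac12\big)\, n^{-\frac12-\eta} \overset{!}{=} 0
\quad\Longrightarrow\quad
n_{\inf}(t) \sim c\, t^{-\frac{2}{2\eta+2\beta-1}},
\qquad
\varphi(t) \sim n_{\inf}(t)^{\beta}\, t
  \sim c\, t^{\,1-\frac{2\beta}{2\eta+2\beta-1}}
  = c\, t^{\frac{2\eta-1}{2\eta+2\beta-1}} .
```

At the continuous minimizer both terms of (4.28) are of the same order, which is why the rate can be read off from either of them.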
We mention that this model problem fulfills Assumption 4.1. Namely, we have x†n+1<x†n=n^(−η) for all n∈N, and from (4.27) we obtain
‖(I−Pn)x†‖ℓ2/√n≤cn^(−η).
Also Assumption 4.1 (a) is fulfilled, since
1≤σn/σn+1=n^(−β)/(n+1)^(−β)=((n+1)/n)^β≤2^β=:Cσ.
Finally, we observe that
δ²/α=δ²/(cδ^((4β+2η)/(2η+2β−1)))=cδ^((2η−2)/(2η+2β−1))
goes to infinity when 1/2<η<1, stays constant for η=1, and goes to zero for η>1. That means there is a seamless transition between the regime of oversmoothing regularization and that of classical regularization. Numerical experiments are presented in Section 6.
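This behavior can be probed in a quick toy simulation exploiting the explicit soft-thresholding form of the minimizer for diagonal operators. The following sketch (our own experiment, not the paper's Section 6 setup; all constants set to 1, sequences truncated at a hypothetical length N, random unit-norm noise) uses β=η=1, an oversmoothing case with predicted rate δ^(1/3) and a-priori choice α∼δ²:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4000
i = np.arange(1, N + 1, dtype=float)
sigma = 1.0 / i                       # beta = 1
xdag = 1.0 / i                        # eta = 1: x† in l2 but not in l1
y = sigma * xdag

def soft(z, thresh):
    """Componentwise soft thresholding: the explicit minimizer for diagonal A."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

deltas = [1e-2, 1e-3, 1e-4]
errors = []
for delta in deltas:
    zeta = rng.standard_normal(N)
    zeta /= np.linalg.norm(zeta)      # noise scaled so that ||y - ydelta|| = delta
    ydelta = y + delta * zeta
    alpha = delta**2                  # (4.26) with beta = eta = 1 and c = 1
    x = soft(ydelta / sigma, alpha / sigma**2)
    errors.append(np.linalg.norm(x - xdag))

slope = np.log(errors[0] / errors[-1]) / np.log(deltas[0] / deltas[-1])
```

The measured slope on this log-log scale comes out near the predicted 1/3, although truncation and the random noise realization blur the constant.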
We now consider for simplicity the case $\sigma_i=i^{-1}$ and $x^\dagger_i=e^{-i}$ for all $i\in\mathbb{N}$. This is not oversmoothing regularization, since $x^\dagger\in\ell^1$. Using MATHEMATICA® we calculate the tail in (4.7), $T(x^\dagger,n)\le c\,e^{-n}$. It is then simple calculus to show
$n_{\inf}(t)=c\ln\frac{1}{t}.$
This yields
$\varphi(t)=\sigma_{n_{\inf}(t)}^{-1}\,t+T(x^\dagger,n_{\inf}(t))=c\left(t\ln\frac{1}{t}+t\right).$
In contrast to the previous example, the two terms $\sigma_{n_{\inf}(\delta)}^{-1}\delta$ and $T(x^\dagger,n_{\inf}(\delta))$ are no longer of the same order, but for sufficiently small $\delta$ the term $\sigma_{n_{\inf}(\delta)}^{-1}\delta$ dominates, i.e., Assumption 4.1 (c) is fulfilled. Also part (d) of Assumption 4.1 holds, since $x^\dagger_{n_{\inf}(\delta)+1}<x^\dagger_{n_{\inf}(\delta)}=c\,\delta$, whereas $\sigma_{n_{\inf}(\delta)}^{-1}\delta/\sqrt{n_{\inf}(\delta)}=c\,\delta\sqrt{\ln\frac{1}{\delta}}$. The predicted rate $\|x_\alpha^\delta-x^\dagger\|_{\ell^2}\le c\,\varphi(\delta)=c\,\delta\ln\frac{1}{\delta}$ can be verified numerically, see the left plot of Figure 5.
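The dominance of $\sigma_{n_{\inf}(\delta)}^{-1}\delta$ over the tail, i.e., Assumption 4.1 (c), can also be observed by brute force; a sketch with the generic constants set to one (our illustration, not the paper's code):

```python
import numpy as np

# Model sigma_i = i^{-1}, x_i = e^{-i}: at the index minimizing
# sigma_n^{-1} t + T(x, n) ~ n*t + e^{-n}, the operator term should dominate.
def terms(t, n_max=200):
    n = np.arange(1.0, n_max + 1.0)
    f = n * t + np.exp(-n)              # sigma_n^{-1} * t + tail
    k = int(np.argmin(f))
    return n[k] * t, np.exp(-n[k])      # (operator term, tail term)

op_term, tail_term = terms(1e-8)
print(op_term, tail_term, op_term / tail_term)
```

At the discrete minimizer $n\approx\ln(1/t)$ the ratio of the two terms grows like $\ln(1/t)$, which matches the $\delta\ln\frac1\delta$ behaviour of $\varphi$.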
Let now $\sigma_i=e^{-i}$ and, for simplicity, $x^\dagger_i=i^{-1}$ for all $i\in\mathbb{N}$. We are again in the oversmoothing regime. To formulate the convergence rate, we recall the Lambert W function, defined implicitly by $z=W(z\,e^{z})$. With this, the minimizing argument of
$\varphi(t)=\inf_{n\in\mathbb{N}}\,e^{n}t+c\,n^{-\frac12}$
can be found, using MATHEMATICA®, to be
$n_{\inf}(t)=\left\lceil\tfrac{3}{2}\,W\!\left(\tfrac{2}{3}\,t^{-\frac{3}{2}}\right)\right\rceil,$
such that
$\varphi(t)=t\,e^{n_{\inf}(t)}+c\,n_{\inf}(t)^{-\frac12}.$
The first term decays faster than the second one, hence the rate is dominated by the tail of the exact solution, and Assumption 4.1 (c) is violated, see Figure 1. Consequently, Theorem 4.1 is not applicable. Indeed, in a numerical experiment, shown in the left part of Figure 6, the measured convergence rate is different from the one predicted here.
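The violation of Assumption 4.1 (c) shows up already in a brute-force evaluation of the two terms of $\varphi(t)$; again a sketch with generic constants set to one (our illustration):

```python
import numpy as np

# Model sigma_i = e^{-i}, x_i = i^{-1} (tail ~ n^{-1/2}): at the minimizing
# index, the tail term dominates, so Assumption 4.1 (c) fails here.
def terms(t, n_max=60):
    n = np.arange(1.0, n_max + 1.0)
    f = np.exp(n) * t + n ** -0.5       # e^n * t + c * n^{-1/2}, c = 1 assumed
    k = int(np.argmin(f))
    return np.exp(n[k]) * t, n[k] ** -0.5

op_term, tail_term = terms(1e-8)
print(op_term, tail_term)
```

In contrast to the previous model case, here the operator term at the minimizer is an order of magnitude below the tail term, consistent with Figure 1.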
Let for this example $\sigma_i=e^{-i}$ and $x^\dagger_i=e^{-i}$ for all $i\in\mathbb{N}$. Then
$\varphi(t)=\inf_{n\in\mathbb{N}}\,e^{n}t+c\,e^{-n}$
and
$n_{\inf}(t)=\frac12\ln\frac{1}{t},$
hence
$\varphi(t)=c\left(e^{\frac12\ln\frac{1}{t}}\,t+e^{-\frac12\ln\frac{1}{t}}\right)=c\sqrt{t}.$
Because $x^\dagger_{n_{\inf}(\delta)+1}<x^\dagger_{n_{\inf}(\delta)}=c\sqrt{\delta}$, but $\sigma_{n_{\inf}(\delta)}^{-1}\delta/\sqrt{n_{\inf}(\delta)}=c\sqrt{\delta/\ln\frac{1}{\delta}}$, Assumption 4.1 (d) is (formally) violated. Numerically, however, even for values of $\delta$ significantly smaller than the machine epsilon, a constant can be found such that Assumption 4.1 (d) holds, i.e., $\sqrt{\delta}\le C(\delta_0)\sqrt{\delta/\ln\frac{1}{\delta}}$ for $\delta>\delta_0$. A plot exemplifying this can be found in Figure 2. Consequently, our numerically retrieved convergence rate is close to the predicted $\sqrt{\delta}$-rate, see the right part of Figure 6.
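That a single constant $C(\delta_0)$ suffices over any practical range of $\delta$ follows from the slow growth of the ratio of the two sides, $\sqrt{\ln(1/\delta)}$; a minimal numerical sketch (our illustration):

```python
import numpy as np

# Ratio of the two sides of Assumption 4.1 (d) in this model case:
# sqrt(delta) / sqrt(delta / ln(1/delta)) = sqrt(ln(1/delta)).
delta = np.logspace(-16, -2, 200)       # down to machine-epsilon scale
ratio = np.sqrt(np.log(1.0 / delta))
print(ratio.min(), ratio.max())
```

Even at $\delta=10^{-16}$ the ratio is only $\sqrt{\ln 10^{16}}\approx 6$, so a moderate constant covers the whole range, as in Figure 2.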
Our last example goes back to the setting that sparked the developments around $\ell^1$-regularization, namely the case of sparse solutions. Let $x^\dagger=(x^\dagger_1,x^\dagger_2,\dots,x^\dagger_{n_0},0,0,\dots)$. Let $A$ be as in Theorem 4.1 with no further restrictions. Then
$\varphi(t)=\inf_{n\in\mathbb{N}}\,\sigma_n^{-1}t+\sum_{i=n+1}^{\infty}|x^\dagger_i|=\min\left\{\inf_{n\in\mathbb{N},\,n\le n_0}\sigma_n^{-1}t+\sum_{i=n+1}^{n_0}|x^\dagger_i|,\ \sigma_{n_0+1}^{-1}t\right\},$
i.e., (only) for sufficiently small values of $\delta$ can a linear convergence rate $\varphi(\delta)=c\,\delta$ be reached. Note that in the literature the linear convergence rate was derived in the $\ell^1$-norm, $\|x_\alpha^\delta-x^\dagger\|_{\ell^1}\le c\,\delta$, whereas we obtain the same rate in the $\ell^2$-norm, $\|x_\alpha^\delta-x^\dagger\|_{\ell^2}\le c\,\delta$. Since for sufficiently small $\delta>0$ we have $n_{\inf}(\delta)=n_0+1=\mathrm{const}$, we also recover from (4.11) the well-known parameter choice $\alpha=c\,\delta$.
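The transition to an exactly linear rate for small $t$ can be reproduced by evaluating the infimum in (4.7) for a sparse sequence; the concrete numbers below ($n_0=5$, $x^\dagger_i=2^{-i}$, $\sigma_i=i^{-1}$) are illustrative choices, not taken from the paper:

```python
import numpy as np

# For sparse x† the tail vanishes beyond n0, so for small t the infimum is
# attained at a fixed index and phi(t) is exactly linear in t.
i = np.arange(1, 51)
x = np.where(i <= 5, 2.0 ** (-i), 0.0)   # n0 = 5 nonzero entries
sigma = 1.0 / i                          # sigma_n = n^{-1}

def phi(t):
    tails = np.array([x[n:].sum() for n in i])   # sum_{j > n} |x_j|
    return np.min(t / sigma + tails)

r1, r2 = phi(1e-6) / 1e-6, phi(1e-9) / 1e-9
print(r1, r2)
```

The two ratios $\varphi(t)/t$ agree, i.e., $\varphi$ has become linear with a slope determined by the singular value at the (fixed) minimizing index.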
In order to get a feeling for the derived convergence rate, we compare the result from the previous section with classical, non-oversmoothing, $\ell^1$-regularization and with $\ell^2$-regularization. To be able to use explicit expressions for the convergence rates and parameter choice rules, we only consider the specific model problem of Section 4.3.1. Let us start with classical $\ell^1$-regularization, i.e., the approximate solution to (1.1) is obtained via (1.3) but under the assumption that $\|x^\dagger\|_{\ell^1}<\infty$. In [19] a convergence rate
$\|x_\alpha^\delta-x^\dagger\|_{\ell^1}\le c\,\delta^{\frac{\eta-1}{\eta+\beta-\frac12}}=:\varphi_{\ell^1}(\delta)$
was derived using the parameter choice
$\alpha(\delta)\sim\frac{\delta^2}{\varphi_{\ell^1}(\delta)}=c\,\delta^{\frac{4\beta+2\eta}{2\eta+2\beta-1}}.$
Now let us move to ℓ2-regularization. This corresponds to the classic Tikhonov regularization, i.e., the approximate solution to (1.1) is given by (1.4). It is then well known, see e.g. [1,2], that under the assumption
$x^\dagger\in\operatorname{range}\!\left((A^*A)^{\nu/2}\right) \qquad (5.1)$
for some 0≤ν≤2 the best possible convergence rate
$\|x_\alpha^\delta-x^\dagger\|_{\ell^2}\le c\,\delta^{\frac{\nu}{\nu+1}} \qquad (5.2)$
can be shown under the a-priori parameter choice
$\alpha(\delta)\sim\delta^{\frac{2}{\nu+1}}.$
In the diagonal setting of Section 4.3 the source condition (5.1) can easily be related to the parameters $\eta$ and $\beta$. Namely, (5.1) holds for all $\nu$ with $\frac{2\eta-1}{2\beta}>\nu$ [21]. Since we are interested in the largest $\nu$, we set $\nu=\frac{2\eta-1}{2\beta}$ for simplicity, acknowledging that we should actually write $\nu=\frac{2\eta-1}{2\beta}+\varepsilon$ for arbitrary but small $\varepsilon>0$. With this, the convergence rate becomes
$\|x_\alpha^\delta-x^\dagger\|_{\ell^2}\le c\,\delta^{\frac{\nu}{\nu+1}}=c\,\delta^{\frac{2\eta-1}{2\eta+2\beta-1}}=\varphi(\delta)$
with φ(δ) from (4.25), and the parameter choice is
$\alpha(\delta)\sim\frac{\delta^2}{\varphi(\delta)^2}=\frac{\delta^2}{\delta^{\frac{4\eta-2}{2\eta+2\beta-1}}}=\delta^{\frac{4\beta}{2\eta+2\beta-1}}.$
We summarize the convergence rates and parameter choices in Table 1. One sees that $\ell^1$–$\ell^2$-regularization, i.e., $\ell^1$-regularization with the $\ell^2$-norm as error measure, which includes the oversmoothing regime, inherits the parameter choice from classical $\ell^1$-regularization and the convergence rate from classical $\ell^2$-regularization. The parameter choice influences the residual $\|Ax_\alpha^\delta-y^\delta\|$ and the penalty value $\|x_\alpha^\delta\|_{\ell^1}$. Since it is most important to keep the residual at a level of about $\delta$, the $\ell^1$-parameter choice is used. This is in line with [18], where it has been observed that for $\ell^1$-regularization the regularization parameters obtained via the discrepancy principle and the a priori choice always coincide up to constant factors. It is, however, somewhat surprising that this property appears to persist even when $x^\dagger\notin\ell^1$. The less smooth the solution is, the smaller $\alpha$ has to be chosen.

On the other hand, the optimal convergence rate in $\ell^2$ is well known to be given by (5.2). Therefore $\ell^1$-regularization yields the optimal ($\ell^2$-)convergence rate, even when $x^\dagger\notin\ell^1$. Even more, it does not saturate like classical $\ell^2$-regularization, which possesses the qualification $\nu=\frac{2\eta-1}{2\beta}\le 2$. Take any $\frac12<\eta\le1$. Then $x^\dagger\in\ell^2\setminus\ell^1$. If $\beta<\frac{\eta}{2}-\frac14$, then $\nu>2$ and $\ell^2$-regularization (1.4) has saturated with the limiting convergence rate $\varphi(\delta)\sim\delta^{2/3}$. Oversmoothing $\ell^1$-regularization, on the other hand, yields a higher rate of convergence since it does not saturate.
Type | $\ell^1$ | $\ell^1$–$\ell^2$ | $\ell^2$ |
rate | $\varphi_{\ell^1}(\delta)=\delta^{\frac{2\eta-2}{2\eta+2\beta-1}}$ | $\varphi_{\ell^2}(\delta)=c\,\delta^{\frac{2\eta-1}{2\eta+2\beta-1}}$ | $\varphi_{\ell^2}(\delta)=c\,\delta^{\frac{2\eta-1}{2\eta+2\beta-1}}$ |
$\alpha$ | $\delta^{\frac{4\beta+2\eta}{2\eta+2\beta-1}}$ | $\delta^{\frac{4\beta+2\eta}{2\eta+2\beta-1}}$ | $\delta^{\frac{4\beta}{2\eta+2\beta-1}}$ |
$\alpha$ recipe | $\alpha=\frac{\delta^2}{\varphi_{\ell^1}(\delta)}$ | $\alpha=\frac{\delta^2}{\varphi_{\ell^2}(\delta)\,\delta^{-\frac{1}{2\eta+2\beta-1}}}$ | $\alpha=\frac{\delta^2}{\varphi_{\ell^2}(\delta)^2}$ |
In this section we consider an operator specifically tailored to the setting of Section 4.1. We start with the Volterra operator
$[\tilde Ax](s)=\int_0^s x(t)\,dt$
and discretize $\tilde A$ with the rectangular rule at $N=400$ points. In order to ensure the desired properties of the model cases of Section 4.3, we compute the SVD of the resulting matrix and manually set its singular values $\sigma_i$ accordingly. This means that the actual operator $A$ in (1.1) possesses the same singular vectors $\{u_i\}$ and $\{v_i\}$ as $\tilde A$, but different singular values $\{\sigma_i\}$. Using the SVD, we construct our solution such that $x^\dagger_i=\langle x^\dagger,v_i\rangle$ holds according to the scenario. We add random noise to the data $y=Ax^\dagger$ such that $\|y-y^\delta\|=\delta$. The range of $\delta$ is such that the relative error is between 25% and 0.2%. The solutions are computed via
$x_\alpha^\delta=\operatorname{argmin}\left\{\|Ax-y^\delta\|^2+\alpha\|x\|_{\ell^1}\right\},$
where the ℓ1-norm is taken using the coefficients with respect to the basis originating from the SVD. We compute the reconstruction error in the ℓ2 norm as well as the residuals. For larger values of η we can observe the convergence rate directly. For smaller values of η, we have to compensate for the error introduced by the discretization level. Namely, since we use a discretization level N=400, numerically we actually measure
$\|P_{400}(x_\alpha^\delta-x^\dagger)\|_{\ell^2}$
with the projectors P as before being the cut-off after N=400 elements. In the plots of the convergence rates we show
$\|P_{400}(x_\alpha^\delta-x^\dagger)+(I-P_{400})x^\dagger\|_{\ell^2}. \qquad (6.1)$
The second term can be calculated analytically and is supposed to correct for the fact that we cannot measure the regularization error for coefficients corresponding to larger $n$, i.e., we add the tail of $x^\dagger$ that cannot be observed. Because $x_\alpha^\delta$ has only finitely many non-zero components and we use a sufficiently large discretization, we do not make any error with respect to the tail of the solutions.
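Since the $\ell^1$-norm is taken with respect to the SVD basis, in which $A$ acts diagonally, the minimization above decouples componentwise and is solved in closed form by soft thresholding. The following sketch works directly with the coefficients (illustrative dimensions and constants, with the generic constant in (4.26) set to one; our illustration, not the authors' code):

```python
import numpy as np

# For diagonal A ([Ax]_i = sigma_i * x_i), minimizing ||Ax - y||^2 + alpha*||x||_1
# decouples into scalar problems with closed-form soft-thresholding solution:
#   x_i = sign(y_i/sigma_i) * max(|y_i/sigma_i| - alpha/(2*sigma_i^2), 0).
def l1_diag_solve(sigma, y, alpha):
    z = y / sigma
    return np.sign(z) * np.maximum(np.abs(z) - alpha / (2.0 * sigma ** 2), 0.0)

# Illustrative run in the oversmoothing regime (eta <= 1), a priori alpha (4.26):
n = 2000
i = np.arange(1, n + 1, dtype=float)
beta, eta = 1.0, 0.8
sigma, xdag = i ** -beta, i ** -eta
delta = 1e-4
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
ydelta = sigma * xdag + delta * noise / np.linalg.norm(noise)  # ||y - ydelta|| = delta
alpha = delta ** ((4 * beta + 2 * eta) / (2 * eta + 2 * beta - 1))
xrec = l1_diag_solve(sigma, ydelta, alpha)
print(np.linalg.norm(xrec - xdag))
```

The closed form follows from the scalar optimality condition of $(\sigma x - y)^2 + \alpha|x|$; it is the componentwise analogue of the thresholding used in iterative schemes for non-diagonal operators [13].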
Our main focus is on the fully diagonal case of Section 4.3.1, i.e., $\sigma_i=i^{-\beta}$ for $\beta>0$ and $x^\dagger_i=i^{-\eta}$ for $\eta>\frac12$. Here, the regularization parameter is chosen a priori according to (4.26) with $c_\alpha=1$.
We show selected plots of the convergence rates in Figure 3 for $\beta=1$ and Figure 4 for $\beta=2$. The plots are given with logarithmic scales on both axes. In each plot of the convergence rates we added the regression line under the assumption $\|x_\alpha^\delta-x^\dagger\|_{\ell^2}=c\,\delta^{e}$; the value of $e$ is given in the legend. The regularization parameter is shown in the titles of the figures in the form $\alpha=\delta^{a}$, with $a$ given. The results of our simulations for a larger number of parameters $\eta$ are shown in Tables 2 and 3 for $\beta=1$ and $\beta=2$, respectively. We see that for all values of $\eta$ the predicted and measured convergence rates coincide nicely. Additionally, the residual remains stable around $\|Ax_\alpha^\delta-y^\delta\|\sim\delta$. For small values of $\eta$ and $\beta=1$ the residual is a bit smaller than expected. We suppose this is due to the cut-off of $x^\dagger$ caused by the discretization; for fully correct results we would have to include a tail of the residual similar to (6.1). If $\eta$ is very large, i.e., the components of the solution decay rapidly, the observed convergence rate is essentially linear. We suppose this is due to numerical effects, as those solutions are numerically de facto sparse.
η | $\alpha=\delta^a$, $a$ | measured rate, $e$ | predicted rate, $e$ | residual, $d$ |
0.55 | 2.42 | 0.047 | 0.048 | 1.1 |
0.6 | 2.36 | 0.09 | 0.091 | 1.08 |
0.7 | 2.25 | 0.163 | 0.166 | 1.06 |
0.8 | 2.15 | 0.229 | 0.23 | 1.028 |
0.9 | 2.07 | 0.284 | 0.286 | 1.007 |
1 | 2 | 0.33 | 0.333 | 1.01 |
1.05 | 1.97 | 0.359 | 0.355 | 1.006 |
1.1 | 1.94 | 0.372 | 0.375 | 1.006 |
1.3 | 1.83 | 0.458 | 0.444 | 1.01 |
1.5 | 1.75 | 0.515 | 0.5 | 1.01 |
2 | 1.6 | 0.595 | 0.6 | 1.01 |
2.5 | 1.5 | 0.659 | 0.667 | 0.996 |
3 | 1.42 | 0.698 | 0.714 | 0.996 |
6 | 1.23 | 0.81 | 0.85 | 0.997 |
η | $\alpha=\delta^a$, $a$ | measured rate, $e$ | predicted rate, $e$ | residual, $d$ |
0.55 | 2.22 | 0.024 | 0.024 | 1.0005 |
0.6 | 2.19 | 0.0473 | 0.0476 | 1.002 |
0.7 | 2.13 | 0.089 | 0.091 | 1.01 |
0.8 | 2.09 | 0.128 | 0.13 | 1.006 |
0.9 | 2.04 | 0.166 | 0.167 | 1.006 |
1.01 | 1.996 | 0.209 | 0.203 | 0.999 |
1.1 | 1.96 | 0.236 | 0.23 | 0.994 |
1.3 | 1.89 | 0.284 | 0.286 | 1.002 |
1.5 | 1.83 | 0.329 | 0.333 | 0.999 |
1.75 | 1.76 | 0.384 | 0.385 | 0.996 |
2 | 1.71 | 0.421 | 0.428 | 0.999 |
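The columns "$\alpha=\delta^a$" and "predicted rate, $e$" of Tables 2 and 3 follow directly from (4.25) and (4.26); a quick sketch reproducing a few rows:

```python
# Exponents for the model sigma_i = i^{-beta}, x_i = i^{-eta}:
# rate e from (4.25) and parameter-choice exponent a from (4.26).
def exponents(eta, beta):
    d = 2 * eta + 2 * beta - 1
    return (2 * eta - 1) / d, (4 * beta + 2 * eta) / d   # (rate e, parameter a)

for eta, beta in [(1.0, 1.0), (2.5, 1.0), (0.9, 2.0)]:
    e, a = exponents(eta, beta)
    print(eta, beta, round(e, 3), round(a, 3))
```

For instance, $(\eta,\beta)=(1,1)$ gives $a=2$ and $e=1/3\approx0.333$, matching the corresponding row of Table 2, and $(\eta,\beta)=(0.9,2)$ gives $a\approx2.04$, $e\approx0.167$ as in Table 3.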
Finally, we compare the theoretically predicted convergence rates from the remaining model cases of Section 4.3 to measured rates from numerical experiments. In all experiments, the regularization parameter was chosen according to the discrepancy principle (4.24) with $\tau_1=1.1$ and $\tau_2=1.3$. We obtained a good match between the theoretical and observed convergence rates in the cases of Sections 4.3.2, 4.3.4, and 4.3.5. Only case 4.3.3 showed a significant difference. This is no surprise, since Assumption 4.1 does not hold in this case.
We have shown that oversmoothing regularization, i.e., Tikhonov regularization with a penalty that takes the value infinity at the exact solution, yields existence and stability of regularized solutions. We have discussed why it is difficult to show convergence of regularized solutions to the true solution in the limit of vanishing data noise. For the specific case of Tikhonov regularization with $\ell^1$-penalty term and a diagonal operator, we have derived convergence rates of the regularized solutions to the true solution even if the latter no longer belongs to $\ell^1$. The theoretical convergence rates have been verified numerically for a model problem.
D. Gerth was funded by Deutsche Forschungsgemeinschaft (DFG), project HO1454/10-1.
The authors declare that there is no conflict of interest in this paper.
[1] | H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Dordrecht: Kluwer Academic Publishers, 1996. |
[2] | A. K. Louis, Inverse und schlecht gestellte Probleme, Stuttgart: Teubner, 1989. |
[3] | D. Gerth and B. Hofmann, On ℓ1-regularization under continuity of the forward operator in weaker topologies. In: New Trends in Parameter Identification for Mathematical Models (Ed.: B. Hofmann, A. Leitao, J.P. Zubelli), Cham: Birkhäuser, (2018), 67-88. |
[4] | B. Hofmann, B. Kaltenbacher, C. Pöschl, et al. A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators, Inverse Probl., 23 (2007), 987-1010. doi: 10.1088/0266-5611/23/3/009 |
[5] | B. Hofmann and P. Mathé, Parameter choice in Banach space regularization under variational inequalities, Inverse Probl., 28 (2012), 104006. |
[6] | F. Natterer, Error bounds for Tikhonov regularization in Hilbert scales, Appl. Anal., 18 (1984), 29-37. doi: 10.1080/00036818408839508 |
[7] | B. Hofmann and P. Mathé, Tikhonov regularization with oversmoothing penalty for non-linear ill-posed problems in Hilbert scales, Inverse Probl., 34 (2018), 015007. |
[8] | B. Hofmann and P. Mathé, A priori parameter choice in Tikhonov regularization with oversmoothing penalty for non-linear ill-posed problems, 2019. Available from: https://arxiv.org/abs/1904.02014. |
[9] | T. Schuster, B. Kaltenbacher, B. Hofmann, et al. Regularization Methods in Banach spaces, Berlin/Boston: De Gruyter, 2012. |
[10] | B. Hofmann, On smoothness concepts in regularization for nonlinear inverse problems in Banach spaces. In: Mathematical and Computational Modeling: With Applications in Natural and Social Sciences, Engineering, and the Arts (Ed.: R. Melnik), New Jersey: John Wiley, (2015), 192-221. |
[11] | M. Burger, J. Flemming and B. Hofmann, Convergence rates in ℓ1-regularization if the sparsity assumption fails, Inverse Probl., 29 (2013), 025013. |
[12] | J. Flemming, B. Hofmann and I. Veselic, A unified approach to convergence rates for ℓ1-regularization and lacking sparsity, J. Inverse Ill-posed Probl., 24 (2016), 139-148. |
[13] | I. Daubechies, M. Defrise and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pur. Appl. Math., 57 (2004), 1413-1457. doi: 10.1002/cpa.20042 |
[14] | M. Grasmair, M. Haltmeier and O. Scherzer, Sparse regularization with lq penalty term, Inverse Probl., 24 (2008), 055020. |
[15] | D. Lorenz, Convergence rates and source conditions for Tikhonov regularization with sparsity constraints, J. Inverse ill-posed Probl, 16 (2008), 463-478. |
[16] | R. Ramlau, Regularization properties of Tikhonov regularization with sparsity constraints, Electron. T. Numer. Anal., 30 (2008), 54-74. |
[17] | J. Flemming, Convergence rates for l1-regularization without injectivity-type assumptions, Inverse Probl., 32 (2016), 095001. |
[18] | D. Gerth and J. Flemming, Injectivity and weak*-to-weak continuity suffice for convergence rates in ℓ1-regularization, J. Inverse Ill-posed Probl., 26 (2018), 85-94. doi: 10.1515/jiip-2017-0008 |
[19] | D. Gerth, Convergence rates for ℓ1-regularization without the help of a variational inequality, Electron. T. Numer. Anal., 46 (2017), 233-244. |
[20] | B. Hofmann, P. Mathé and M. Schieck, Modulus of continuity for conditionally stable ill-posed problems in Hilbert space, J. Inverse Ill-posed Probl., 16 (2008), 567-585. |
[21] | D. Gerth, Using Landweber iteration to quantify source conditions - a numerical study, J. Inverse Ill-posed Probl., 27 (2019), 367-383. doi: 10.1515/jiip-2018-0071 |