Citation: Giovanni S. Alberti, Yves Capdeboscq, Yannick Privat. On the randomised stability constant for inverse problems[J]. Mathematics in Engineering, 2020, 2(2): 264-286. doi: 10.3934/mine.2020013
Inverse problems are the key to all experimental setups where the physical quantity of interest is not directly observable and must be recovered from indirect measurements. They appear in many different contexts including medical imaging, non-destructive testing, seismic imaging or signal processing. In mathematical terms, an inverse problem consists in the inversion of a linear or nonlinear map
$$ T : X \to Y, \qquad x \mapsto T(x), $$
which models how the quantity of interest x belonging to a space X is related to the measurements y=T(x) in the space Y. The reader is referred to the many books on inverse problems for a comprehensive exposition (see, e.g., [2,3,4,24,27,30,44,47]).
Inverse problems can be ill-posed: the map T may not be injective (i.e., two different x1 and x2 may correspond to the same measurement T(x1)=T(x2)) or, when injective, T^{-1}: ran T ⊆ Y → X may not be continuous (i.e., two elements x1 and x2 that are far apart may correspond to almost identical measurements T(x1)≈T(x2)). Various strategies have been introduced to tackle the issue of inversion in this setting, Tikhonov regularisation being the most famous method [17,29].
Our purpose is to investigate the role of randomisation in the resolution of inverse problems. By randomisation, we refer to the use of random measurements, or to the case of random unknowns. We do not mean the well-established statistical approaches in inverse problems, as in the Bayesian framework, where probability is used to assess the reliability of the reconstruction.
Even with this distinction, the wording "randomisation" may refer to many different concepts in the framework of inverse problems. A huge literature is devoted to the issue of randomisation in the measuring process: the unknown x in X is fixed and deterministic, and we choose the measurements randomly according to some suitable distribution. For example, compressed sensing [19] and passive imaging with ambient noise [20] belong to this class. Another very popular idea consists in randomising the unknown: in this scenario, we try to recover most unknowns x in X, according to some distribution. For example, this is typically the situation when deep learning [21] is applied to inverse problems. We briefly review these instances in Appendix A.
This article deals with a more recent approach, introduced in the framework of observability or control theory: the so-called randomised observability constants [38,39,40]. We argue that, in contrast with the initial motivation for its introduction, the randomised observability constant is not necessarily indicative of the likelihood for randomised unknowns to be observable.
In the next section, we recall the notion of randomised observability constant and comment on its use in optimal design problems. In section 3, we reformulate the randomised observability constant in an abstract setting as randomised stability constant. In section 4 we show that when the classical (deterministic) stability constant is null, a positive randomised stability constant need not imply, as one could hope, that the inverse problem can be solved for most unknowns. In the course of our study, we make several observations on the properties of the randomised stability constant. Section 5 contains several concluding remarks and discusses possible future directions.
In this section we briefly review the randomised observability constant introduced in [38,39,40] and its main properties. This was motivated by the original idea of randomisation by Paley and Zygmund, which we now briefly discuss.
In order to understand the principle of randomisation that will be at the core of this paper, it is useful to recall the historical result by Paley and Zygmund on Fourier series. Let (cn)n∈Z be an element of ℓ2(C) and f be the Fourier series given by
$$ f:\mathbb{T}\ni\theta\mapsto\sum_{n\in\mathbb{Z}} c_n e^{in\theta}, $$
where T denotes the torus R/(2πZ). According to the so-called Parseval identity, the function f belongs to L2(T); furthermore, the coefficients cn can be chosen in such a way that f does not belong to any Lq(T) for q>2. Some of the results obtained by Paley and Zygmund (see [35,36,37]) address the regularity of the Fourier series f. They show that if one changes randomly and independently the signs of the Fourier coefficients cn, then the resulting random Fourier series belongs almost surely to any Lq(T) for q>2. More precisely, introducing a sequence (βνn)n∈Z of independent Bernoulli random variables on a probability space (A,A,P) such that
$$ \mathbb{P}\big(\beta^\nu_n=\pm 1\big)=\tfrac12, $$
then, the Fourier series fν given by
$$ f_\nu:\mathbb{T}\ni\theta\mapsto\sum_{n\in\mathbb{Z}}\beta^\nu_n c_n e^{in\theta} $$
belongs almost surely to Lq(T) for all q<+∞.
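To make the mechanism concrete, the following sketch (our own toy numerical experiment, not taken from [35,36,37]; the coefficient decay and the truncation order are arbitrary choices) flips the signs of the coefficients of a truncated real Fourier series at random and compares a few L^q norms of the original and randomised partial sums.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048                                   # truncation order (arbitrary choice)
n = np.arange(1, N + 1)
c = 1.0 / (n * np.sqrt(np.log(n + 1.0)))   # an l^2, slowly decaying coefficient sequence

theta = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
basis = np.cos(np.outer(n, theta))         # real cosine series, for simplicity

f = c @ basis                              # deterministic partial sum
signs = rng.choice([-1.0, 1.0], size=N)    # independent Bernoulli signs
f_rand = (signs * c) @ basis               # sign-randomised partial sum

# The deterministic sum concentrates a large peak at theta = 0, while the
# randomised one is typically much flatter, in the spirit of Paley-Zygmund.
for q in (2, 4, 8):
    norm_q = lambda g: np.mean(np.abs(g) ** q) ** (1.0 / q)
    print(f"q={q}:  deterministic {norm_q(f):7.3f}   randomised {norm_q(f_rand):7.3f}")
```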
In [13], the effect of randomisation on the initial data of solutions to dispersive equations was investigated. In that case, the initial data and solutions of the PDE under consideration are expanded in a Hilbert basis of L2(Ω) made of eigenfunctions (ϕj)j≥1 of the Laplace operator. The randomisation procedure then consists in multiplying all the terms of the series decomposition by well-chosen independent random variables. In particular, for the homogeneous wave equation with Dirichlet boundary conditions, one can show that for all initial data (y0,y1)∈H10(Ω)×L2(Ω), the Bernoulli randomisation keeps the H10×L2 norm constant. It is observed that many other choices of randomisation are possible. For instance, a positive effect of the randomisation can be observed by considering independent centred Gaussian random variables with variance 1 (Gaussian randomisation). Indeed, it allows one to generate a dense subset of the space of initial data H10(Ω)×L2(Ω) through the mapping
$$ R_{(y^0,y^1)} : A \to H^1_0(\Omega)\times L^2(\Omega),\qquad \nu \mapsto (y^0_\nu, y^1_\nu), $$
where (y0ν,y1ν) denotes the pair of randomised initial data, provided that all the coefficients in the series expansion of (y0,y1) are nonzero. Several other properties of these randomisation procedures are also established in [13].
We now review how randomisation appeared in the framework of inverse problems involving an observability inequality. The property of observability of a system is related to the following issue: How to recover the solutions of a PDE from the knowledge of partial measurements of the solutions. In what follows, we will concentrate on wave models, having in particular photoacoustic/thermoacoustic tomography imaging in mind.
Let T>0 and Ω⊆Rd be a bounded Lipschitz domain with outer unit normal ν. We consider the homogeneous wave equation with Dirichlet boundary conditions
$$ \begin{cases} \partial_{tt} y(t,x)-\Delta y(t,x)=0 & (t,x)\in[0,T]\times\Omega,\\ y(t,x)=0 & (t,x)\in[0,T]\times\partial\Omega. \end{cases} \tag{2.1} $$
It is well known that, for all (y0,y1)∈H10(Ω)×L2(Ω), there exists a unique solution y∈C0([0,T],H10(Ω))∩C1((0,T),L2(Ω)) of (2.1) such that y(0,x)=y0(x) and ∂ty(0,x)=y1(x) for almost every x∈Ω. Let Γ be a measurable subset of ∂Ω, representing the domain occupied by some sensors, which take some measurements over a time horizon [0,T].
The inverse problem under consideration reads as follows.
Inverse problem: Reconstruct the initial condition (y0,y1) from the knowledge of the partial boundary measurements
$$ 1_\Gamma(x)\,\frac{\partial y}{\partial\nu}(t,x),\qquad (t,x)\in[0,T]\times\partial\Omega. $$
To solve this problem, we introduce the so-called observability constant: CT(Γ) is defined as the largest non-negative constant C such that
$$ C\,\big\|(y(0,\cdot),\partial_t y(0,\cdot))\big\|^2_{H^1_0(\Omega)\times L^2(\Omega)} \le \int_0^T\!\!\int_\Gamma \Big|\frac{\partial y}{\partial\nu}(t,x)\Big|^2 \,d\mathcal{H}^{d-1}\,dt, \tag{2.2} $$
for any solution y of (2.1), where H10(Ω) is equipped with the norm ‖u‖H10(Ω)=‖∇u‖L2(Ω). Then, the aforementioned inverse problem is well-posed if and only if CT(Γ)>0. In such a case, we will say that observability holds true in time T. Moreover, observability holds true within the class of C∞ domains Ω if (Γ,T) satisfies the Geometric Control Condition (GCC) (see [8]), and this sufficient condition is almost necessary.
Let us express the observability constant more explicitly. Fix an orthonormal basis (ONB) (ϕj)j≥1 of L2(Ω) consisting of (real-valued) eigenfunctions of the Dirichlet-Laplacian operator on Ω, associated with the negative eigenvalues (−λ2j)j≥1. Then, any solution y of (2.1) can be expanded as
$$ y(t,x)=\sum_{j=1}^{+\infty} y_j(t)\,\phi_j(x)=\frac{1}{\sqrt2}\sum_{j=1}^{+\infty}\Big(\frac{a_j}{\lambda_j}e^{i\lambda_j t}+\frac{b_j}{\lambda_j}e^{-i\lambda_j t}\Big)\phi_j(x), \tag{2.3} $$
where the coefficients aj and bj account for initial data. More precisely, we consider the ONB of H10(Ω)×L2(Ω) given by {ψ+j,ψ−j:j≥1}, where
$$ \psi_j^+=\frac{1}{\sqrt2}\Big(\frac{\phi_j}{\lambda_j},\, i\phi_j\Big),\qquad \psi_j^-=\frac{1}{\sqrt2}\Big(\frac{\phi_j}{\lambda_j},\, -i\phi_j\Big). \tag{2.4} $$
Expanding now the initial data with respect to this basis we can write $(y^0,y^1)=\sum_{j=1}^{+\infty} \big(a_j\psi_j^+ + b_j\psi_j^-\big)$, namely,
$$ y^0=\sum_{j=1}^{+\infty}\frac{a_j+b_j}{\sqrt2\,\lambda_j}\,\phi_j,\qquad y^1=\sum_{j=1}^{+\infty} i\,\frac{a_j-b_j}{\sqrt2}\,\phi_j. $$
The corresponding solution to (2.1) is given by (2.3). In addition, Parseval's identity yields ‖(y0,y1)‖2H10(Ω)×L2(Ω)=∑+∞j=1|aj|2+|bj|2.
Then, the constant CT(Γ) rewrites
$$ C_T(\Gamma)=\inf_{\substack{(a_j),(b_j)\in\ell^2(\mathbb{C})\\ \sum_{j=1}^{+\infty}(|a_j|^2+|b_j|^2)=1}} \int_0^T\!\!\int_\Gamma \Big|\frac{\partial y}{\partial\nu}(t,x)\Big|^2\, d\mathcal{H}^{d-1}\,dt, $$
where y(t,x) is given by (2.3).
The constant CT(Γ) is deterministic and takes into account any (aj),(bj)∈ℓ2(C), including the worst possible cases. Interpreting CT(Γ) as a quantitative measure of the well-posed character of the aforementioned inverse problem, one could expect that such worst cases do not occur too often; thus it would appear desirable to consider a notion of observation in average.
Motivated by the findings of Paley and Zygmund (see §2.1) and its recent use in another context [11,13], making a random selection of all possible initial data for the wave equation (2.1) consists in replacing CT(Γ) with the so-called randomised observability constant defined by
$$ C_{T,\mathrm{rand}}(\Gamma)=\inf_{\substack{(a_j),(b_j)\in\ell^2(\mathbb{C})\\ \sum_{j=1}^{+\infty}(|a_j|^2+|b_j|^2)=1}} \mathbb{E}\left(\int_0^T\!\!\int_\Gamma \Big|\frac{\partial y_\nu}{\partial\nu}(t,x)\Big|^2\, d\mathcal{H}^{d-1}\,dt\right), \tag{2.5} $$
where
$$ y_\nu(t,x)=\frac{1}{\sqrt2}\sum_{j=1}^{+\infty}\Big(\beta^\nu_{1,j}\frac{a_j}{\lambda_j}e^{i\lambda_j t}+\beta^\nu_{2,j}\frac{b_j}{\lambda_j}e^{-i\lambda_j t}\Big)\phi_j(x) \tag{2.6} $$
and (βν1,j)j∈N and (βν2,j)j∈N are two sequences of independent random variables of Bernoulli or Gaussian type, on a probability space (A,A,P), with mean 0 and variance 1. Here, E denotes the expectation over the probability space, that is, over all possible events ν. In other words, we are randomising the Fourier coefficients {aj,bj}j≥1 of the initial data (y0,y1) with respect to the basis {ψ±j}j≥1.
The randomised observability constant was introduced in [38,39,40,41,42]. It can be expressed in terms of deterministic quantities (see [40,Theorem 2.2]).
Proposition 1. Let Γ⊂∂Ω be measurable. We have
$$ C_{T,\mathrm{rand}}(\Gamma)=\frac{T}{2}\,\inf_{j\in\mathbb{N}}\frac{1}{\lambda_j^2}\int_\Gamma\Big(\frac{\partial\phi_j}{\partial\nu}(x)\Big)^2 d\mathcal{H}^{d-1}. \tag{2.7} $$
Proof. In view of Lemma 1 (see below), we have that
$$ C_{T,\mathrm{rand}}(\Gamma)=\inf\Big\{\|1_\Gamma\,\partial_\nu y_j^+\|^2_{L^2([0,T]\times\partial\Omega)},\ \|1_\Gamma\,\partial_\nu y_j^-\|^2_{L^2([0,T]\times\partial\Omega)} : j\ge 1\Big\}, $$
where y±j is the solution to (2.1) with initial condition ψ±j. Thus
$$ y_j^\pm(t,x)=\frac{1}{\sqrt2\,\lambda_j}\,e^{\pm i\lambda_j t}\,\phi_j(x), $$
and in turn
$$ \|1_\Gamma\,\partial_\nu y_j^\pm\|^2_{L^2([0,T]\times\partial\Omega)}=\frac{1}{2\lambda_j^2}\int_{[0,T]\times\Gamma}\big|e^{\pm i\lambda_j t}\,\partial_\nu\phi_j(x)\big|^2\,dt\,d\mathcal{H}^{d-1}(x), $$
which yields the claim.
We have CT,rand(Γ)≥CT(Γ) (see Proposition 4 below). It has been noted in [41] that the observability inequality defining CT,rand(Γ) is associated to a deterministic control problem for the wave equation (2.1), where the control has a particular form but acts in the whole domain Ω.
Regarding CT,rand(Γ), we refer to [42,Section 4] for a discussion on the positivity of this constant. The authors show that if Ω is either a hypercube or a disk, then CT,rand(Γ)>0 for every non-empty relatively open subset Γ of ∂Ω. In particular, in some cases CT(Γ)=0 while CT,rand(Γ)>0. This raised hopes that, even if recovering all unknowns is an unstable process, recovering most unknowns could be feasible, since apparently most unknowns are observable. This heuristic argument, mentioned amongst the motivations for the study of the optimisation of the randomised observability constant, was not investigated further in the aforementioned papers. This matter will be studied in the following sections, dedicated more generally to the possible use of such a constant for investigating the well-posed character of general inverse problems.
A larger observability constant CT(Γ) in (2.2) leads to a smaller Lipschitz norm bound of the inverse map. Therefore CT(Γ) can be used as the quantity to maximise when searching for optimal sensor positions. However, this turns out to be somewhat impractical. When implementing a reconstruction process, one has to carry out in general a very large number of measurements; likewise, when implementing a control procedure, the control strategy is expected to be efficient in general, but possibly not for all cases. Thus, one aims at exhibiting an observation domain designed to be the best possible on average, that is, over a large number of experiments. Adopting this point of view, it appeared relevant to consider an average over random initial data. In [38,39,40], the best observation is modelled in terms of maximising a randomised observability constant, which coincides with CT,rand(Γ) when dealing with the boundary observation of the wave equation.
When dealing with internal observation of the wave equation on a closed manifold, it has been shown in [26] that the related observability constant reads as the minimum of two quantities: The infimum of the randomised observability constants over every orthonormal eigenbasis and a purely geometric criterion standing for the minimal average time spent by a geodesic in the observation set.
However, one should keep in mind that a large randomised constant may not be associated with a reconstruction method (see section 4).
It is convenient to generalise the construction of the previous section to an abstract setting. In what follows, none of the arguments require the precise form of the forward operator related to the wave equation, as they rely solely on the structure of the randomised constant.
For the remainder of this paper, we let X and Y be separable infinite-dimensional Hilbert spaces, and P:X→Y be an injective bounded linear operator.
If P−1:ranP→X is a bounded operator, the inverse problem of finding x from P(x) can be solved in a stable way for all x∈X, without the need for randomisation. This can be measured quantitatively by the constant
$$ C_{\mathrm{det}}=\inf_{x\in X\setminus\{0\}}\frac{\|Px\|_Y^2}{\|x\|_X^2}>0. $$
The smaller Cdet is, the more ill-conditioned the inversion becomes.
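In finite dimensions this constant is simply the square of the smallest singular value of the matrix representing P; the following sketch (our own illustration, with a randomly generated toy matrix) computes it and checks it against a Monte Carlo sampling of the ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((5, 3))            # a generic injective map R^3 -> R^5 (toy choice)

# C_det = inf_{x != 0} ||Px||^2 / ||x||^2 = (smallest singular value of P)^2
sigma_min = np.linalg.svd(P, compute_uv=False)[-1]
print("C_det =", sigma_min**2)

# Monte Carlo sanity check: sample the ratio over random directions
x = rng.standard_normal((3, 10000))
ratios = np.sum((P @ x)**2, axis=0) / np.sum(x**2, axis=0)
print("empirical minimum of the ratio:", ratios.min())
```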
On the other hand, when P−1 is unbounded, the situation is different, and the inverse problem is ill-posed [24,30]. In this case, although the kernel of P reduces to {0}, we have
$$ C_{\mathrm{det}}=\inf_{x\in X\setminus\{0\}}\frac{\|Px\|_Y^2}{\|x\|_X^2}=0. \tag{3.1} $$
Examples of such maps abound. The example that motivated our study was that introduced in section 2.2, with X=H10(Ω)×L2(Ω), Y=L2([0,T]×∂Ω) and
$$ P(y^0,y^1)=1_\Gamma\,\frac{\partial y}{\partial\nu}, $$
where Γ⊆∂Ω and y is the solution to (2.1) with initial condition (y0,y1): If Γ is not large enough, P is still injective but P−1 is unbounded [31]. Any injective compact linear operator satisfies (3.1).
Let us now introduce the randomised stability constant, which generalises the randomised observability constant to this general setting. We consider the class of random variables introduced in the last section. Choose an ONB e={ek}k∈N of X and write x=∑∞k=1xkek∈X. We consider random variables of the form
$$ x^\nu=\sum_{k=1}^{\infty}\beta^\nu_k x_k e_k, $$
where βνk are i.i.d. complex-valued random variables on a probability space (A,A,P) with vanishing mean and variance 1, so that E(|βνk|2)=1 for every k. These include the Bernoulli and Gaussian random variables considered in the previous section. It is worth observing that, in the case of Bernoulli random variables, we have |βνk|2=1 for every k, so that ‖xν‖X=‖x‖X.
Definition 1. The randomised stability constant is defined as
$$ C_{\mathrm{rand}}(e)=\inf_{x\in X\setminus\{0\}}\mathbb{E}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}\right). $$
As in the previous section, this constant should represent stability of the inverse problem for most unknowns. By definition, we have Crand(e)≥Cdet. In general, this is a strict inequality: We will provide examples in section 4. This can be heuristically seen also by the following deterministic expression for the randomised stability constant.
Lemma 1. There holds
$$ C_{\mathrm{rand}}(e)=\inf_k\|P(e_k)\|_Y^2. \tag{3.2} $$
Proof. Since (Y,‖⋅‖Y) is also a Hilbert space, we find
$$ \|P(x^\nu)\|_Y^2=\Big\langle \sum_{k=1}^{\infty}\beta^\nu_k x_k P(e_k),\ \sum_{l=1}^{\infty}\beta^\nu_l x_l P(e_l)\Big\rangle_Y=\sum_{k=1}^{\infty}|\beta^\nu_k|^2|x_k|^2\|P(e_k)\|_Y^2+\sum_{\substack{k,l\\ k\ne l}}\beta^\nu_k\overline{\beta^\nu_l}\,x_k\overline{x_l}\,\langle P(e_k),P(e_l)\rangle_Y. \tag{3.3} $$
Since βνk are i.i.d. with vanishing mean and such that E(|βνk|2)=1, we obtain
$$ \mathbb{E}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}\right)=\frac{\mathbb{E}\big(\|P(x^\nu)\|_Y^2\big)}{\sum_{k=1}^{\infty}|x_k|^2}=\frac{\sum_{k=1}^{\infty}|x_k|^2\|P(e_k)\|_Y^2}{\sum_{k=1}^{\infty}|x_k|^2}\ge\inf_k\|P(e_k)\|_Y^2, \tag{3.4} $$
which means Crand(e)≥infk‖P(ek)‖2Y. Choosing x=ek, we obtain (3.2).
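The identity (3.4) is easy to check numerically. The sketch below (our own illustration, with a toy matrix P on R^6 and Bernoulli signs) compares a Monte Carlo estimate of E(‖P(x^ν)‖²/‖x‖²) with the deterministic expression and with the constant in (3.2).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
P = rng.standard_normal((n, n))        # a toy injective operator on R^n
x = rng.standard_normal(n)             # a fixed unknown

col_norms_sq = np.sum(P**2, axis=0)    # ||P(e_k)||^2 for the canonical basis e

# Monte Carlo estimate of E(||P(x^nu)||^2 / ||x||^2) with Bernoulli signs beta_k
signs = rng.choice([-1.0, 1.0], size=(20000, n))
Pxnu = (signs * x) @ P.T               # each row is P(x^nu) for one draw of the signs
empirical = np.mean(np.sum(Pxnu**2, axis=1)) / np.sum(x**2)

exact = np.sum(x**2 * col_norms_sq) / np.sum(x**2)    # middle expression in (3.4)
print("Monte Carlo mean :", empirical)
print("formula (3.4)    :", exact)
print("C_rand(e) (3.2)  :", col_norms_sq.min())
```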
The aim of this section is to discuss the impact of the randomised observability constant in inverse problems. We would like to address the following question: how can the positivity of Crand be exploited in the solution of an ill-posed inverse problem? We will not fully address this issue, but rather provide a few positive and negative partial results.
We remind the reader that the randomisation introduced in the last section, in the case when (3.1) holds, was based on the point of view that the ratio in (3.1) is not "usually" small, and that, hopefully, in most cases inversion "should" be possible. It is worthwhile to observe that the subset of the x∈X such that ‖Px‖Y≥c‖x‖X for some fixed c>0 is never generic in X, since {x∈X:‖Px‖Y<c‖x‖X} is open and non-empty by (3.1). With this caveat in mind, we nevertheless wish to test whether some evidence can be given to support our optimistic approach that in most cases, inversion should be possible.
Proposition 2. For every ϵ>0 and x∈X, there exists c>0 such that
$$ \mathbb{P}\big(\|Px^\nu\|_Y\ge c\|x\|_X\big)>1-\epsilon. \tag{4.1} $$
Proof. Take x∈X. Define the real-valued map g(c)=P(‖Pxν‖Y≥c‖x‖X). Take a sequence cn↘0. It is enough to show that
limn→+∞g(cn)=1. |
We write
$$ g(c_n)=\int_A f_n(\nu)\,d\mathbb{P}(\nu),\qquad\text{where}\quad f_n(\nu)=\begin{cases}1&\text{if }\|Px^\nu\|_Y\ge c_n\|x\|_X,\\ 0&\text{otherwise}.\end{cases} $$
Note that fn is monotone increasing and, since kerP={0}, limn→∞fn(ν)=1 for every ν. Thus by the Monotone Convergence Theorem,
limn→∞∫Afn(ν)dP(ν)=∫A(limn→∞fn(ν))dP(ν)=1. |
This thus shows that, for a fixed x, our intuition is vindicated: in the vast majority of cases, the inequality ‖Pxν‖Y≥c‖x‖X holds true. This is true independently of Crand(e); we now investigate whether the positivity of Crand(e) may yield a stronger estimate.
The next step is to estimate the probability that, for a given x∈X∖{0}, the square of the ratio ‖P(xν)‖Y‖x‖X used in Definition 1 is close to its mean value. The large deviation result we could derive describes the deviation from an upper bound to Crand(e), namely the constant Krand(e) defined by
$$ K_{\mathrm{rand}}(e)=\sup_k\|P(e_k)\|_Y^2. \tag{4.2} $$
Theorem 1 (large deviation estimate). Assume that Y=L2(Σ,μ), where (Σ,S,μ) is a measure space. Let (βνk)k∈N be a sequence of independent random variables of Bernoulli type, on a probability space (A,A,P) with mean 0 and variance 1. Let x∈X∖{0} and xν=∑∞k=1βνkekxk. Then, for every δ>0 we have
$$ \mathbb{P}\big(\|Px^\nu\|_Y\ge\delta\|x^\nu\|_X\big)\le\exp\left(2-\frac{\delta}{e\sqrt{K_{\mathrm{rand}}(e)}}\right). $$
The proof of Theorem 1 is postponed to Appendix B. The argument follows the same lines as the one of [11,Theorem 2.1] and the general method introduced in [13].
Remark 1 (Application to a wave system). Considering the wave equation (2.1) and adopting the framework of section 2.2 leads to choose {ψ±j}j≥1 defined by (2.4) as the orthonormal basis e. In that case, X=H10(Ω)×L2(Ω), Σ=[0,T]×∂Ω, dμ=dtdHd−1 and Y=L2(Σ). Following the discussion in section 2.2, the map P is given by P(y0,y1)=1Γ∂y∂ν, where y is the unique solution of (2.1). Further, we have
$$ C_{\mathrm{rand}}(e)=\frac{T}{2}\inf_{j\in\mathbb{N}}\frac{1}{\lambda_j^2}\int_\Gamma\Big(\frac{\partial\phi_j}{\partial\nu}\Big)^2 d\mathcal{H}^{d-1},\qquad K_{\mathrm{rand}}(e)=\frac{T}{2}\sup_{j\in\mathbb{N}}\frac{1}{\lambda_j^2}\int_\Gamma\Big(\frac{\partial\phi_j}{\partial\nu}\Big)^2 d\mathcal{H}^{d-1}, $$
where the first equality is given in Proposition 1, and the second one follows by applying the same argument.
Note that according to the so-called Rellich identity*, we have 0<Krand(e)≤T2diam(Ω), under additional mild assumptions on the domain Ω.
This identity, discovered by Rellich in 1940 [43], reads
$$ 2\lambda^2=\int_{\partial\Omega}\langle x,\nu\rangle\Big(\frac{\partial\phi}{\partial\nu}\Big)^2 d\mathcal{H}^{d-1} $$
for every eigenpair (λ,ϕ) of the Laplacian-Dirichlet operator, Ω being a bounded connected domain of Rn either convex or with a C1,1 boundary.
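As a sanity check (our own illustration, not part of the paper), the identity can be verified numerically for the Dirichlet eigenfunctions of the square Ω=(0,π)², namely φ_{m,n}(x,y)=(2/π) sin(mx) sin(ny) with λ²=m²+n²; only the two sides of the boundary not containing the origin contribute.

```python
import numpy as np

def trapezoid(f_vals, s):
    """Simple trapezoidal quadrature."""
    return float(np.sum((f_vals[1:] + f_vals[:-1]) * np.diff(s)) / 2.0)

def rellich_boundary_integral(m, n, pts=4001):
    """Integral of <x,nu> (d phi_{m,n}/d nu)^2 over the boundary of (0,pi)^2."""
    s = np.linspace(0.0, np.pi, pts)
    # The sides x=0 and y=0 do not contribute, since <x,nu> = 0 there.
    # Side y=pi: nu=(0,1), <x,nu> = pi, d phi/d nu = (2/pi) n sin(m s) cos(n pi).
    top = trapezoid(np.pi * ((2/np.pi) * n * np.sin(m*s) * np.cos(n*np.pi))**2, s)
    # Side x=pi: nu=(1,0), <x,nu> = pi, d phi/d nu = (2/pi) m cos(m pi) sin(n s).
    right = trapezoid(np.pi * ((2/np.pi) * m * np.cos(m*np.pi) * np.sin(n*s))**2, s)
    return top + right

for (m, n) in [(1, 1), (2, 3), (5, 4)]:
    print(f"(m,n)=({m},{n}):  boundary integral = {rellich_boundary_integral(m, n):.6f}"
          f"   2*lambda^2 = {2*(m**2 + n**2)}")
```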
The estimate given in Theorem 1 is on the "wrong side", since we show that the ratio related to the inversion is much bigger than Krand(e) with low probability. The issue is not the boundedness of P, which is given a priori, but of its inverse. This would correspond to a result of the type
$$ \mathbb{P}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}<C_{\mathrm{rand}}(e)-\delta\right)\le \text{small constant}, \tag{4.3} $$
namely, a quantification of the estimate given in Proposition 2, uniform in x. If such a bound held, it would show that Crand(e) is a reliable estimator of the behaviour of the ratio ‖P(x)‖2Y‖x‖2X in general. Notice that, in the favourable case when P−1 is bounded, there exists δ0∈[0,Crand(e)) such that
$$ \mathbb{P}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}<C_{\mathrm{rand}}(e)-\delta\right)=0 $$
for all δ∈[δ0,Crand(e)).
In this general framework, estimate (4.3) does not hold, see Example 2. Using a concentration inequality, a weaker bound can be derived.
Proposition 3. Assume that Y=L2(Σ,μ), where (Σ,S,μ) is a measure space. Let (βνk)k∈N be a sequence of independent random variables of Bernoulli type, on a probability space (A,A,P) with mean 0 and variance 1. Let x∈X∖{0} and xν=∑∞k=1βνkekxk. Then, for every δ>0 we have
$$ \mathbb{P}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}-\mathbb{E}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}\right)<-\delta\right)<\exp\left(-\frac{\delta^2}{4K_{\mathrm{rand}}(e)^2}\right). \tag{4.4} $$
This result is based on an appropriate Hoeffding inequality; its proof is postponed to Appendix B. Note that (4.4) is not a large deviation result: the quantity under consideration is bounded between 0 and 1, and the upper bound obtained is not small. This is unavoidable, see Example 2.
We collect here several observations that suggest that the positivity of the randomised stability constant may not be helpful for solving the inverse problems, not even for most unknowns.
We remind the reader why (3.1) renders inversion unstable. Hypothesis (3.1) implies that there exists a sequence (xn)n∈N such that
$$ \|x_n\|_X=1 \quad\text{and}\quad \|Px_n\|_Y<\frac1n \qquad\text{for all } n\in\mathbb{N}. \tag{4.5} $$
Suppose that our measurements are not perfect, and are affected by a low level of noise δ, ‖δ‖Y≤ϵ, with ϵ>0. Then, for every n such that nϵ>1, we have
‖P(x+xn)−P(x)‖Y<ϵ, |
hence x and x+xn correspond to the same measured data, even if ‖(x+xn)−x‖X=1. This is an unavoidable consequence of the unboundedness of P−1, and is true for every x∈X, even if the randomised stability constant were positive (and possibly large).
Lemma 1 shows that the ONB used to randomise our input plays a role, as it appears explicitly in the formula (3.2). The following proposition underscores that point. Namely, if we consider all possible randomisations with respect to all ONB of X we recover the deterministic stability constant Cdet.
Proposition 4. We have
infeCrand(e)=Cdet, |
where the infimum is taken over all ONB of X. In particular, if P−1 is unbounded then infeCrand(e)=0.
Proof. By definition of Crand(e), we have that Crand(e)≥Cdet for every ONB e, and so it remains to prove that
infeCrand(e)≤Cdet. |
By definition of Cdet, we can find a sequence xn∈X such that ‖xn‖X=1 for every n and ‖Pxn‖2Y→Cdet. For every n, complete xn to an ONB of X, which we call e(n). By Lemma 1 we have Crand(e(n))≤‖Pxn‖2Y, and so
infeCrand(e)≤infnCrand(e(n))≤infn‖Pxn‖2Y≤limn→+∞‖Pxn‖2Y=Cdet. |
This result shows that, in general, the randomised stability constant strongly depends on the choice of the basis. There will always be bases for which it becomes arbitrarily small when P−1 is unbounded.
It is also worth observing that for compact operators, which arise frequently in inverse problems, the randomised stability constant is always zero.
Lemma 2. If P is compact then Crand(e)=0 for every ONB e of X.
Proof. Since ek tends to zero weakly in X, by the compactness of P we deduce that P(ek) tends to zero strongly in Y. Thus, by Lemma 1 we have
Crand(e)=infk‖P(ek)‖2Y≤limk→+∞‖P(ek)‖2Y=0, |
as desired.
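A minimal concrete instance (our own toy example): the diagonal operator P e_k = e_k/k on ℓ² is compact, and already a finite truncation shows that ‖P(e_k)‖² → 0, so the infimum in (3.2) vanishes.

```python
import numpy as np

K = 50
k = np.arange(1, K + 1)
col_norms_sq = (1.0 / k)**2            # ||P(e_k)||^2 for the compact operator P e_k = e_k / k

print("first values of ||P(e_k)||^2:", col_norms_sq[:5])
print("infimum over the first", K, "modes:", col_norms_sq.min())   # tends to 0 as K grows
```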
Let us now consider some examples. The first example is finite dimensional and the kernel of the operator is not trivial. Note that the definition of Crand(e) and all results above, except Proposition 2 and Lemma 2, are valid also in this case, with suitable changes due to the finiteness of the ONB.
Example 1. Choose X=R2 and Y=R, and consider the map
$$ S:\mathbb{R}^2\to\mathbb{R},\qquad (x,y)\mapsto x+y. $$
The associated inverse problem can be phrased: Find the two numbers whose sum is given. This problem is ill-posed and impossible to solve. The deterministic stability constant vanishes
$$ \inf_{(x_1,x_2)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|x_1+x_2|^2}{x_1^2+x_2^2}=0, $$
and S−1 does not exist. However, the randomised constant obtained using the canonical basis is positive. Indeed, |S(1,0)|=|S(0,1)|=1, therefore
Crand({(1,0),(0,1)})=inf{1,1}=1. |
The positivity of this constant does not imply the existence of any useful method to perform the reconstruction of x and y from x+y, even for most (x,y)∈R2.
Had we chosen as orthonormal vectors $\frac{1}{\sqrt2}(1,1)$ and $\frac{1}{\sqrt2}(1,-1)$, since $S(1,-1)=0$, we would have found
$$ C_{\mathrm{rand}}\Big(\Big\{\tfrac{1}{\sqrt2}(1,1),\,\tfrac{1}{\sqrt2}(1,-1)\Big\}\Big)=0. $$
One may wonder whether the features highlighted above are due to the fact that the kernel is not trivial. That is not the case, as the following infinite-dimensional generalisation with trivial kernel shows.
Example 2. Consider the case when X=Y=ℓ2, equipped with the canonical euclidean norm. Let e={ek}+∞k=0 denote the canonical ONB of ℓ2. Take a sequence (ηn)n∈N0 such that ηn>0 for all n, and limn→∞ηn=0. We consider the operator P defined by
P(e2n)=e2n+e2n+1,P(e2n+1)=e2n+(1+ηn)e2n+1. |
The operator P may be represented with respect to the canonical basis e by the block-diagonal matrix
$$ P=\begin{bmatrix} 1 & 1 & 0 & 0 & \cdots\\ 1 & 1+\eta_0 & 0 & 0 & \cdots\\ 0 & 0 & 1 & 1 & \cdots\\ 0 & 0 & 1 & 1+\eta_1 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. $$
In other words, P may be expressed as
P(x)=(x0+x1,x0+(1+η0)x1,x2+x3,x2+(1+η1)x3,…),x∈ℓ2. |
We note that ker P={0} and that its inverse is given by
$$ P^{-1}(y)=\big((1+\eta_0^{-1})y_0-\eta_0^{-1}y_1,\ \eta_0^{-1}(y_1-y_0),\ (1+\eta_1^{-1})y_2-\eta_1^{-1}y_3,\ \eta_1^{-1}(y_3-y_2),\ \dots\big), $$
which is an unbounded operator since η−1n→+∞. Given the block diagonal structure of this map, the inversion consists of solving countably many inverse problems (i.e., linear systems) of the form
$$ \begin{cases} x_{2n}+x_{2n+1}=y_{2n},\\ x_{2n}+(1+\eta_n)x_{2n+1}=y_{2n+1}. \end{cases} $$
As soon as $\tilde n$ is such that $\eta_{\tilde n}$ falls below the noise level, the inverse problems for $n\ge\tilde n$ can no longer be solved, since they reduce to the "sum of two numbers" problem discussed in Example 1.
Note that
$$ \|P(e_{2n})\|_2^2=2,\qquad \|P(e_{2n+1})\|_2^2=1+(1+\eta_n)^2, $$
therefore
Crand(e)=2. |
If we choose instead the rotated orthonormal basis,
$$ v_{2n}=\frac{1}{\sqrt2}(e_{2n}+e_{2n+1}),\qquad v_{2n+1}=\frac{1}{\sqrt2}(e_{2n}-e_{2n+1}), $$
then $P(v_{2n+1})=-\frac{\eta_n}{\sqrt2}\,e_{2n+1}$, and so
$$ C_{\mathrm{rand}}(\{v_k\}_k)=\inf_k\|P(v_k)\|_2^2\le\lim_{n\to\infty}\|P(v_{2n+1})\|_2^2=0. $$
We now turn to (4.3) and (4.4). For some k≥0, consider x=e2k. Then P(xν)=βν2k(e2k+e2k+1) and therefore ‖P(xν)‖Y=‖P(x)‖Y: there is no deviation as
$$ \frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}=\mathbb{E}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}\right)=C_{\mathrm{rand}}(e)=2. $$
Thus, the probabilities in (4.3) and (4.4) are equal to 0, and the inequalities are trivial.
Consider instead $x=e_{2k}+e_{2k+1}$. Then $\|P(x^\nu)\|_Y^2=4+(2+\eta_k)^2$ if $\beta^\nu_{2k}\beta^\nu_{2k+1}=1$ and $\|P(x^\nu)\|_Y^2=\eta_k^2$ if $\beta^\nu_{2k}\beta^\nu_{2k+1}=-1$. Therefore
$$ \frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}-C_{\mathrm{rand}}(e)=\begin{cases}\dfrac{(2+\eta_k)^2}{2}&\text{with probability }\tfrac12,\\[2mm] \dfrac{\eta_k^2}{2}-2&\text{with probability }\tfrac12.\end{cases} $$
As a consequence, (4.3) cannot be true in general for every x. Similarly, we have
$$ \frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}-\mathbb{E}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}\right)=\begin{cases}2+\eta_k&\text{with probability }\tfrac12,\\ -2-\eta_k&\text{with probability }\tfrac12,\end{cases} $$
and the left-hand side of (4.4) can indeed be large for some x.
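The computations of this example are easy to reproduce numerically. The sketch below (our own illustration, with a finite truncation and the particular choice η_n = 2^{-n}) evaluates the randomised constant in the canonical and in the rotated basis, and samples the ratio for x = e_{2k}+e_{2k+1}.

```python
import numpy as np

N = 10                                     # number of 2x2 blocks (truncation)
eta = 2.0 ** -np.arange(N)                 # eta_n = 2^{-n}, a positive null sequence

# Assemble the (truncated) block-diagonal operator P on R^{2N}
P = np.zeros((2 * N, 2 * N))
for n in range(N):
    P[2*n:2*n+2, 2*n:2*n+2] = [[1.0, 1.0], [1.0, 1.0 + eta[n]]]

print("C_rand(e) =", np.sum(P**2, axis=0).min())            # equals 2

# Rotated orthonormal basis v_{2n}, v_{2n+1}
V = np.zeros_like(P)
for n in range(N):
    V[2*n:2*n+2, 2*n]     = [1/np.sqrt(2),  1/np.sqrt(2)]
    V[2*n:2*n+2, 2*n + 1] = [1/np.sqrt(2), -1/np.sqrt(2)]
print("C_rand({v_k}) =", np.sum((P @ V)**2, axis=0).min())   # = eta_{N-1}^2/2, -> 0

# Deviation of the ratio for x = e_{2k} + e_{2k+1}
k = N - 1
x = np.zeros(2 * N); x[2*k] = x[2*k + 1] = 1.0
rng = np.random.default_rng(3)
for _ in range(4):
    s = rng.choice([-1.0, 1.0], size=2 * N)
    ratio = np.sum((P @ (s * x))**2) / np.sum(x**2)
    print("one draw of the ratio:", ratio)    # either about 4 + 2*eta_k or eta_k^2/2
```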
It is worth observing that a very similar example is considered in [32] to show that particular complications may arise when using neural networks for solving some inverse problems, even naive and small-scale ones (cf. §A.3).
These examples show that considering the observability constant for a particular basis sheds little light on a potential stable inversion of the problem in average, and that considering all possible randomisations leads to the same conclusion as the deterministic case (confirming Proposition 4).
The pitfalls we encountered when we tried to make use of the randomised stability constant all stem from the linearity of the problems we are considering. The seminal work of Burq and Tzvetkov [12], which showed existence of solutions in super-critical regimes for a semilinear problem, did not involve tinkering with the associated linear operator (the wave equation); it is the nonlinearity that controls the critical threshold. In both compressed sensing and passive imaging with random noise sources, nonlinearity plays a key role; further, deep networks are nonlinear maps (cf. Appendix A).
The naive intuition we discussed earlier, namely, that extreme situations do not occur often, is more plausible for nonlinear maps where pathological behaviour is local.
Example 3. As a toy finite-dimensional example, consider the map
$$ T:\mathbb{R}\to\mathbb{R},\qquad T(x)=x(x-\epsilon)(x+\epsilon), $$
for some small ϵ>0. Then T can be stably inverted outside of a region of size of order ϵ, where the inverse is continuous. Thus, a random initial datum has little chance of falling precisely in the problematic region. Such a situation cannot occur with linear maps.
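A minimal sketch of this behaviour (our own illustration; the values of ε and of the noise level are arbitrary): we invert T by root finding and observe that the reconstruction error degrades as the unknown approaches the fold region, inside which several preimages appear and the ambiguity is of order ε.

```python
import numpy as np

eps = 1e-2
T = lambda x: x * (x - eps) * (x + eps)     # T(x) = x^3 - eps^2 x

def invert(y):
    """Real preimages of y under T, via the roots of the cubic."""
    roots = np.roots([1.0, 0.0, -eps**2, -y])
    return np.real(roots[np.abs(np.imag(roots)) < 1e-10])

rng = np.random.default_rng(4)
noise = 1e-6

for x_true in (0.5, 0.1, 2 * eps, 0.5 * eps):   # the last point lies in the fold region
    y = T(x_true) + noise * rng.standard_normal()
    candidates = invert(y)
    err = np.min(np.abs(candidates - x_true))
    # Far from the fold the error stays close to the noise level; inside it,
    # several preimages appear and the error jumps to the order of eps.
    print(f"x = {x_true:8.4f}   preimages = {len(candidates)}   best error = {err:.2e}")
```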
Example 4. Let A:H→H be an unbounded linear operator on a Hilbert space with compact resolvent, so that the spectrum of A is discrete. Define the nonlinear map
T:H×[0,1]→H×[0,1],T(x,λ)=(Ax+λx,λ). |
Note that A+λI is invertible with probability 1 if λ is chosen uniformly in [0,1]. Thus, if H×[0,1] is equipped with a product probability measure whose factor on [0,1] is the uniform probability, then x may be reconstructed from T(x) with probability 1.
In this paper we focused on the randomised stability constant for linear inverse problems, which we introduced as a generalisation of the randomised observability constant.
We argue that, despite its intuitive and simple definition, the randomised stability constant has no implications for the practical solution of inverse problems, even for most unknowns. As the examples provided show, this may be due to the linearity of the problem. With nonlinear problems, the situation is expected to be completely different. It could be that the randomised stability constant is meaningful in the context of a nonlinear inversion process, involving for example a hierarchical decomposition [34,46], but we do not know of results in that direction: this is left for future research.
The authors declare no conflict of interest.
In this appendix we briefly review three different techniques for solving inverse problems where randomisation plays a crucial role. We do not aim to provide an exhaustive overview, or to report on the most recent advances, or to discuss the many variants that have been studied. The examples we present are used to contrast possible different approaches, and the level of mathematical understanding associated with them.
Since the seminal works [15,16], compressed sensing (CS) has provided a theoretical and numerical framework to overcome the Nyquist criterion in sampling theory for the reconstruction of sparse signals. In other words, sparse signals in Cn may be reconstructed from k discrete Fourier measurements, with k smaller than n and directly proportional to the sparsity of the signal (up to log factors, see eq. (A.2) below). Let us give a quick overview of the main aspects of CS, in order to show how it fits in the general framework of section 1. For additional details, the reader is referred to the book [19], and to the references therein.
Given s∈N={1,2,…}, let X be the set of s-sparse signals in Cn, namely
X={x∈Cn:#suppx≤s}. |
Let F:Cn→Cn denote the discrete Fourier transform. In fact, any unitary map may be considered, by means of the notion of incoherence [14]. In any case, the Fourier transform is a key example for the applications to Magnetic Resonance Imaging and Computerised Tomography (via the Fourier Slice Theorem). In order to subsample the Fourier measurements, we consider subsets Sa of cardinality k of {1,2,…,n} and parametrise them with a∈{1,2,…,(nk)}. Let Y=Ck and Pa:Cn→Ck be the projection selecting the entries corresponding to Sa. We then define the measurement map
Ta=Pa∘F:X→Ck. |
In other words, Ta is the partial Fourier transform, since only the frequencies in Sa are measured, and #Sa=k≤n.
Given an unknown signal x0∈X, we need to reconstruct it from the partial knowledge of its Fourier measurements represented by y:=Ta(x0). The sparsity of x0 has to play a crucial role in the reconstruction, since as soon as k<n the map Pa∘F:Cn→Ck necessarily has a non-trivial kernel. It is worth observing that sparsity is a nonlinear condition: If X were a linear subspace of Cn, the problem would be either trivial or impossible, depending on ker(Pa∘F)∩X. Thus nonlinearity plays a crucial role here.
The simplest reconstruction algorithm is to look for the sparsest solution to Tax=y, namely to solve the minimisation problem
$$ \min_{x\in\mathbb{C}^n}\|x\|_0\quad\text{subject to}\quad T_a x=y, $$
where ‖x‖0=#supp x. However, this problem is NP-hard, and its direct resolution is impractical. Considering the convex relaxation ‖⋅‖1 of ‖⋅‖0 leads to a well-defined minimisation problem
$$ \min_{x\in\mathbb{C}^n}\|x\|_1\quad\text{subject to}\quad T_a x=y, \tag{A.1} $$
whose solution may be easily found by convex optimisation (in fact, by linear programming).
The theory of CS guarantees exact reconstruction. More precisely, if ˜x is a minimiser of (A.1), then ˜x=x0 with high probability, provided that
$$ k\ge C\,s\log n, \tag{A.2} $$
and that a is chosen uniformly at random in {1,2,…,(nk)} (namely, the subset Sa is chosen uniformly at random among all the subsets of cardinality k of {1,2,…,n}) [15]. In addition, in the noisy case, with measurements of the form y=Ta(x0)+η where ‖η‖2≤ϵ, by relaxing the equality "Tax=y" to the inequality "‖Tax−y‖2≤ϵ" in (A.1), one obtains the linear convergence rate ‖x0−˜x‖2≤Cϵ, namely, the solution is stable.
In summary, CS allows for the stable reconstruction of all sparse signals from partial Fourier measurements, for most choices of the measured frequencies. The corresponding forward map Ta:X→Ck is nonlinear, simply because X is not a vector space.
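For completeness, the following sketch (our own illustration, assuming the cvxpy package is available; any convex programming toolbox would do) solves the ℓ¹ problem (A.1) for a real-valued toy signal, with the measurement matrix built by stacking the cosine and sine parts of randomly selected DFT rows, a real-valued simplification of the complex Fourier setting described above. All sizes are arbitrary choices.

```python
import numpy as np
import cvxpy as cp   # assumed to be installed

rng = np.random.default_rng(5)
n, s, k = 128, 5, 40                        # signal length, sparsity, measurements (toy sizes)

# s-sparse real signal
x0 = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x0[support] = rng.standard_normal(s)

# k measurements: cosine and sine parts of randomly selected DFT rows
freqs = rng.choice(n, size=k // 2, replace=False)
t = np.arange(n)
A = np.vstack([np.cos(2 * np.pi * np.outer(freqs, t) / n),
               np.sin(2 * np.pi * np.outer(freqs, t) / n)])
y = A @ x0

# l1 minimisation (A.1)
x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
problem.solve()
print("reconstruction error:", np.linalg.norm(x.value - x0))
```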
The material presented in this part is taken from [20], to which the reader is referred for more detailed discussion on this topic.
A typical multistatic imaging problem is the recovery of some properties of a medium with velocity of propagation c(x)>0 from some measurements at locations xj∈R3 of the solution u(t,x) of the wave equation
$$ \partial_t^2 u(t,x)-c(x)^2\Delta u(t,x)=f(t)\,\delta(x-y),\qquad (t,x)\in\mathbb{R}\times\mathbb{R}^3, $$
where f(t) is the source pulse located at y. One of the major applications of this setup is geophysical imaging, where one wants to recover properties of the structure of the earth from measurements taken on the surface. Generating sources in this context is expensive and disruptive. Earthquakes are often used as sources, but they are rare and isolated events. Yet, noisy signals, such as those recorded by seismographs, may be relevant and useful even when they are low in amplitude and no significant event occurs.
The key idea is to consider the data generated by random sources (e.g., in seismology, those related to the waves of the sea). The equation becomes
$$ \partial_t^2 u(t,x)-c(x)^2\Delta u(t,x)=n(t,x),\qquad (t,x)\in\mathbb{R}\times\mathbb{R}^3, $$
where the source term n(t,x) is a zero-mean stationary random process that models the ambient noise sources. We assume that its autocorrelation function is
$$ \mathbb{E}\big(n(t_1,y_1)\,n(t_2,y_2)\big)=F(t_2-t_1)\,K(y_1)\,\delta(y_1-y_2), $$
where F is the time correlation function (normalised so that F(0)=1) and K characterises the spatial support of the sources. The presence of δ(y1−y2) makes the process n delta-correlated in space.
The reconstruction is based on the calculation of the empirical cross correlation of the signals recorded at x1 and x2 up to time T:
$$ C_T(\tau,x_1,x_2)=\frac{1}{T}\int_0^T u(t,x_1)\,u(t+\tau,x_2)\,dt. $$
Its expectation is the statistical cross correlation
E(CT(τ,x1,x2))=C(1)(τ,x1,x2), |
which is given by
$$ C^{(1)}(\tau,x_1,x_2)=\frac{1}{2\pi}\int_{\mathbb{R}\times\mathbb{R}^3}\hat F(\omega)\,K(y)\,\overline{\hat G(\omega,x_1,y)}\,\hat G(\omega,x_2,y)\,e^{-i\omega\tau}\,d\omega\,dy, \tag{A.3} $$
where ˆ⋅ denotes the Fourier transform in time and G(t,x,y) is the time-dependent Green's function. Moreover, CT is a self-averaging quantity, namely
limT→+∞CT(τ,x1,x2)=C(1)(τ,x1,x2) |
in probability.
The role of randomised sources is now clear: from the measured empirical cross correlation CT with large values of T it is possible to estimate, with high probability, the statistical cross correlation C(1). Using (A.3), from C(1)(τ,x1,x2) it is possible to recover (some properties of) the Green function G, which yield useful information about the medium, such as travel times.
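As a toy illustration of this mechanism (ours, not from [20]): in a homogeneous 1D medium, two sensors record delayed copies of the same ambient noise, and the empirical cross correlation C_T peaks at the inter-sensor travel time, which is precisely the kind of information carried by C^{(1)}. All numerical values below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
dt, T = 1e-3, 60.0                          # sampling step and recording time [s]
t = np.arange(0.0, T, dt)
c = 2.0                                     # wave speed [km/s]
x1, x2 = 0.0, 3.0                           # sensor positions [km]

# Band-limited ambient noise emitted far to one side of both sensors; in a
# homogeneous 1D medium each sensor then records a delayed copy of the same signal.
white = rng.standard_normal(t.size)
kernel = np.exp(-0.5 * ((np.arange(-200, 201) * dt) / 0.05)**2)
source = np.convolve(white, kernel, mode="same")

delay = (x2 - x1) / c                       # inter-sensor travel time = 1.5 s
u1 = source
u2 = np.roll(source, int(round(delay / dt)))

# Empirical cross correlation C_T(tau, x1, x2) for non-negative lags tau
lags = np.arange(0, int(2.5 / dt))
C = np.array([np.mean(u1[:t.size - lag] * u2[lag:]) for lag in lags])
print("estimated travel time:", lags[np.argmax(C)] * dt, "s   (true:", delay, "s)")
```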
Convolutional Neural Networks have recently been used for a variety of imaging and parameter reconstruction problems [22], including Electrical Impedance Tomography (EIT) [23,33,48], optical tomography [18], inverse problems with internal data [9], diffusion problems in imaging [7], computerised tomography [10,28], photoacoustic tomography [5,25] and magnetic resonance imaging [49,50]. In the following brief discussion, we decided to focus on inverse problems for partial differential equations (PDE), and in particular on EIT, but similar considerations are valid for most methods cited above.
Significant improvement has been observed in EIT with deep learning compared to previous imaging approaches. Let Ω⊆Rd, d≥2, be a bounded Lipschitz domain with outer unit normal ν. The data in EIT is (a part of) the Dirichlet-to-Neumann map
$$ \Lambda_\sigma: H^{1/2}(\partial\Omega)/\mathbb{R}\to H^{-1/2}(\partial\Omega)/\mathbb{R},\qquad v\mapsto \sigma\nabla u\cdot\nu|_{\partial\Omega}, $$
where u(x) denotes the unique solution of the elliptic problem
$$ \begin{cases} \operatorname{div}(\sigma(x)\nabla u(x))=0 & x\in\Omega,\\ u(x)=v(x) & x\in\partial\Omega, \end{cases} $$
and σ(x)>0 is the unknown conductivity. The experimental data is usually part of the inverse map, namely the Neumann-to-Dirichlet map Λ−1σ. In two dimensions, provided that the electrodes are equally separated on the unit disk, the data may be modelled by
$$ T_N\,\Lambda_\sigma^{-1}\,T_N, $$
where TN is the L2 projection onto span{θ↦cos(nθ):1≤n≤N} and N is the number of electrodes: it is the partial Fourier transform limited to the first N coefficients.
Direct neural network inversion approaches suffer from drawbacks akin to those of direct non-regularised inversion attempts: the output is very sensitive to measurement errors and small variations. Successful approaches to Deep Learning EIT [23,33,48], and to other parameter identification problems in PDE, often involve two steps.
The first step consists in the derivation of an approximate conductivity σ by a stable, albeit blurry, regularised inversion method. For instance, in [23] the "D-bar" equation is used, while in [33] a one-step Gauss-Newton method is used. In both cases, the output of this step is a representation of the conductivity coefficient, which depends on the inversion method used. This first step is deterministic and its analysis is well understood. The forward problem, relating the conductivity to the Dirichlet-to-Neumann map, is nonlinear, independently of the inversion algorithm used. Indeed, the map Λσ is a linear operator, but σ↦Λσ is nonlinear.
The second, post-processing, step uses a neural network to "deblur" the image, and in fact restores details that were not identifiable after the first step.
The second step is not unlike other successful usage of deep-learning approaches for image classification; in general they are known to be successful only with very high probability (and in turn for random unknowns). More precisely, since the findings of [45], deep networks are known to be vulnerable to so-called "adversarial perturbations" (see the review article [1] and the references therein). Given an image x that is correctly classified by the network with high confidence, an adversarial perturbation is a small perturbation p such that the images x and y=x+p are visually indistinguishable but the perturbed image y is misclassified by the network, possibly with high confidence too. State-of-the-art classification networks are successful for the vast majority of natural images, but are very often vulnerable to such perturbations.
These instabilities are not specific to image classification problems; they appear in the same way in image reconstruction [6]. In this case, given an image that is well-reconstructed by the network, it is possible to create another image that is visually indistinguishable from the original one, but that is not well-reconstructed by the network.
A full mathematical understanding of deep networks is still lacking, and the reasons of this phenomenon are not fully known. However, the large Lipschitz constant of the network certainly plays a role, since it is a sign of potential instability: in order for the network to be effective, the weights of its linear steps need to be chosen large enough, and the composition of several layers yields an exponentially large constant.
The proofs of Theorem 1 and Proposition 3 rest upon a classical large deviation estimate, the so-called Hoeffding inequality, see e.g. [11,Prop. 2.2], whose proof is recalled for the convenience of the reader.
Proposition 5. Let (ανn)n≥1 be a sequence of independent random variables of Bernoulli type, on a probability space (A,A,P), with mean 0 and variance 1. Then, for any t>0 and any sequence (vn)n≥1∈ℓ2(C), we have
$$ \mathbb{P}\Big(\sum_{n=1}^{+\infty}\alpha^\nu_n v_n<-t\Big)=\mathbb{P}\Big(\sum_{n=1}^{+\infty}\alpha^\nu_n v_n>t\Big)\le\exp\left(-\frac{t^2}{2\sum_{n=1}^{+\infty}|v_n|^2}\right). $$
Proof. There holds $\mathbb{E}\big(\exp(\alpha^\nu_n v_n)\big)=\mathbb{E}\big(\sum_{k=0}^{\infty}\frac{(\alpha^\nu_n v_n)^k}{k!}\big)=\sum_{k=0}^{\infty}\frac{1}{k!}\mathbb{E}\big((\alpha^\nu_n v_n)^k\big)$. All the odd-order terms vanish, since $\alpha^\nu_n$ has zero mean and is symmetric. Therefore, for any $\lambda>0$,
$$ \mathbb{E}\big(\exp(\lambda\alpha^\nu_n v_n)\big)=\sum_{k=0}^{\infty}\frac{1}{(2k)!}\mathbb{E}\big((\lambda\alpha^\nu_n v_n)^{2k}\big)\le\sum_{k=0}^{\infty}\frac{\lambda^{2k}v_n^{2k}}{2^k\,k!}=\exp\Big(\frac{\lambda^2}{2}v_n^2\Big). $$
Applying Chernoff's inequality, we obtain for any λ>0
$$ \mathbb{P}\Big(\sum_{n=1}^{+\infty}\alpha^\nu_n v_n>t\Big)\le\mathbb{E}\Big[\exp\Big(\lambda\sum_{n=1}^{+\infty}\alpha^\nu_n v_n\Big)\Big]\exp(-\lambda t)\le\exp\Big(\frac{\lambda^2}{2}\sum_{n=1}^{+\infty}v_n^2-\lambda t\Big). $$
Choosing $\lambda=t/\sum_n v_n^2$, this yields
$$ \mathbb{P}\Big(\sum_{n=1}^{+\infty}\alpha^\nu_n v_n<-t\Big)=\mathbb{P}\Big(\sum_{n=1}^{+\infty}\alpha^\nu_n v_n>t\Big)\le\exp\Big(-\frac{t^2}{2\sum_n v_n^2}\Big), $$
as desired.
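A quick Monte Carlo check of this inequality (our own illustration, with an arbitrary ℓ² sequence truncated to 50 terms):

```python
import numpy as np

rng = np.random.default_rng(7)
v = 1.0 / np.arange(1, 51)**1.5            # a truncated l^2 sequence (arbitrary choice)
S = np.sum(v**2)

signs = rng.choice([-1.0, 1.0], size=(200000, v.size))   # Bernoulli variables alpha_n
sums = signs @ v

for thr in (0.5, 1.0, 1.5):
    empirical = np.mean(sums > thr)
    bound = np.exp(-thr**2 / (2 * S))
    print(f"t={thr}:  P(sum > t) ~ {empirical:.4f}   Hoeffding bound {bound:.4f}")
```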
We are now ready to prove Theorem 1.
Proof of Theorem 1. Fix $r\ge2$ and set $Y^\nu=P(x^\nu)/\|x^\nu\|_X$. Markov's inequality yields
$$ \mathbb{P}\big(\|Y^\nu\|_Y\ge\delta\big)=\mathbb{P}\big(\|Y^\nu\|_Y^r\ge\delta^r\big)\le\frac{1}{\delta^r}\,\mathbb{E}\big(\|Y^\nu\|_Y^r\big). \tag{B.1} $$
Let us denote by Lrν the standard Lebesgue space with respect to the probability measure dP. Recall that Y=L2(Σ,μ). To provide an estimate of the right-hand side, notice that
$$ \begin{aligned} \mathbb{E}\big(\|Y^\nu\|_Y^r\big)&=\int_A\|Y^\nu\|_Y^r\,d\mathbb{P}(\nu)=\int_A\Big(\int_\Sigma|Y^\nu(s)|^2 d\mu(s)\Big)^{r/2}d\mathbb{P}(\nu)=\Big\|\int_\Sigma|Y^\nu(s)|^2 d\mu(s)\Big\|_{L^{r/2}_\nu}^{r/2}\\ &\le\Big(\int_\Sigma\big\||Y^\nu(s)|^2\big\|_{L^{r/2}_\nu}\,d\mu(s)\Big)^{r/2}=\Big(\int_\Sigma\|Y^\nu(s)\|_{L^r_\nu}^2\,d\mu(s)\Big)^{r/2}=\big\|s\mapsto\|Y^\nu(s)\|_{L^r_\nu}\big\|_Y^r \end{aligned} \tag{B.2} $$
by using Jensen's inequality.
Furthermore, for a.e. s∈Σ, we have
$$ \|Y^\nu(s)\|_{L^r_\nu}^r=\int_A|Y^\nu(s)|^r\,d\mathbb{P}(\nu)=\int_0^{+\infty}r u^{r-1}\,\mathbb{P}\big(|Y^\nu(s)|>u\big)\,du. $$
Here, we use the fact that if X denotes a non-negative random variable and φ:R+→R+ is of class C1 with φ(0)=0, then
$$ \mathbb{E}(\varphi(X))=\int_0^{+\infty}\varphi'(u)\,\mathbb{P}(X>u)\,du. $$
and by using Proposition 5 and the fact that Yν(s) reads
$$ Y^\nu(s)=\frac{\sum_{k=1}^{\infty}\beta^\nu_k x_k (Pe_k)(s)}{\sqrt{\sum_{k=1}^{\infty}|x_k|^2}}, $$
one gets
$$ \|Y^\nu(s)\|_{L^r_\nu}^r\le 2\int_0^{+\infty}r u^{r-1}\exp\left(-\frac{1}{2}\,\frac{\sum_{k=1}^{\infty}|x_k|^2\, u^2}{\sum_{k=1}^{+\infty}|(Pe_k)(s)|^2|x_k|^2}\right)du. $$
As a consequence, by using the change of variable
$$ v=\frac{\sqrt{\sum_{k=1}^{\infty}|x_k|^2}\ u}{\sqrt{\sum_{k=1}^{+\infty}|(Pe_k)(s)|^2|x_k|^2}}, $$
we get
$$ \|Y^\nu(s)\|_{L^r_\nu}^r\le C(r)\left(\frac{\sum_{k=1}^{+\infty}|(Pe_k)(s)|^2|x_k|^2}{\sum_{k=1}^{\infty}|x_k|^2}\right)^{r/2} $$
with
$$ C(r)=2\int_0^{+\infty}r\,v^{r-1}e^{-\frac12 v^2}\,dv. $$
An elementary computation yields $C(r)<r^r$. Therefore,
$$ \|Y^\nu(s)\|_{L^r_\nu}<\left(\frac{r^2\sum_{k=1}^{+\infty}|(Pe_k)(s)|^2|x_k|^2}{\sum_{k=1}^{\infty}|x_k|^2}\right)^{1/2}. $$
According to (B.2), we infer that
$$ \mathbb{E}\big(\|Y^\nu\|_Y^r\big)<\left\|\left(\frac{r^2\sum_{k=1}^{+\infty}|(Pe_k)(\cdot)|^2|x_k|^2}{\sum_{k=1}^{\infty}|x_k|^2}\right)^{1/2}\right\|_Y^r=\left\|\frac{r^2\sum_{k=1}^{+\infty}|(Pe_k)(\cdot)|^2|x_k|^2}{\sum_{k=1}^{\infty}|x_k|^2}\right\|_{L^1(\Sigma)}^{r/2}. $$
From (4.2) and estimate (B.1), we get
$$ \mathbb{P}\big(\|Y^\nu\|_Y\ge\delta\big)<\frac{1}{\delta^r}\left(r^2\,\frac{\sum_{k=1}^{+\infty}\|Pe_k\|_Y^2|x_k|^2}{\sum_{k=1}^{\infty}|x_k|^2}\right)^{r/2}\le\left(\frac{K_{\mathrm{rand}}(e)}{\delta^2}\,r^2\right)^{r/2}, $$
using the triangle inequality. Minimising the upper bound with respect to r, that is, choosing
$$ r^2=\frac{e^{-2}\delta^2}{K_{\mathrm{rand}}(e)} $$
in the inequality above, one finally obtains
$$ \mathbb{P}\big(\|Y^\nu\|_Y\ge\delta\big)\le\exp\left(-\frac{\delta}{e\sqrt{K_{\mathrm{rand}}(e)}}\right). \tag{B.3} $$
Note that we have assumed $r\ge2$, thereby implicitly requiring that $e^{-2}\delta^2\ge 4K_{\mathrm{rand}}(e)$; multiplying the bound by $\exp(2)$ covers the other case as well.
We conclude by proving Proposition 3.
Proof of Proposition 3. As in (3.3), we have
$$ \|P(x^\nu)\|_Y^2=\sum_{k=1}^{\infty}x_k^2\|P(e_k)\|_Y^2+2\sum_{\substack{k,l\\ k<l}}\beta^\nu_k\beta^\nu_l\,x_k x_l\,\langle P(e_k),P(e_l)\rangle_Y. \tag{B.4} $$
Observing that the family $\alpha^\nu_{k,l}=\beta^\nu_k\beta^\nu_l$, $k<l$, is made of independent Bernoulli variables, equal almost surely to −1 or 1, each with probability 1/2, we apply Proposition 5 with
$$ v_{k,l}=2\,\frac{x_k}{\|x\|_X}\,\frac{x_l}{\|x\|_X}\,\langle P(e_k),P(e_l)\rangle_Y $$
and obtain for all δ>0
$$ \mathbb{P}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}-\mathbb{E}\left(\frac{\|P(x^\nu)\|_Y^2}{\|x\|_X^2}\right)<-\delta\right)<\exp\left(-\frac{\delta^2}{2K}\right), $$
with
$$ K=\sum_{\substack{k,l\\ k<l}}\left(2\,\frac{x_k x_l}{\|x\|_X^2}\,\langle P(e_k),P(e_l)\rangle_Y\right)^2\le 2\sum_{k,l}\left(\frac{|x_k||x_l|}{\|x\|_X^2}\,\|Pe_k\|_Y\|Pe_l\|_Y\right)^2=2\left(\sum_k\frac{|x_k|^2}{\|x\|_X^2}\,\|Pe_k\|_Y^2\right)^2\le 2K_{\mathrm{rand}}(e)^2. $$
[1] Akhtar N, Mian A (2018) Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access 6: 14410-14430. doi: 10.1109/ACCESS.2018.2807385
[2] Alberti GS, Capdeboscq Y (2018) Lectures on Elliptic Methods for Hybrid Inverse Problems, Paris: Société Mathématique de France.
[3] Ammari H (2008) An Introduction to Mathematics of Emerging Biomedical Imaging, Berlin: Springer.
[4] Ammari H, Garnier J, Kang H, et al. (2017) Multi-Wave Medical Imaging: Mathematical Modelling & Imaging Reconstruction, World Scientific.
[5] Antholzer S, Haltmeier M, Schwab J (2018) Deep learning for photoacoustic tomography from sparse data. Inverse Probl Sci Eng 27: 987-1005.
[6] Antun V, Renna F, Poon C, et al. (2019) On instabilities of deep learning in image reconstruction - Does AI come at a cost? arXiv preprint arXiv:1902.05300.
[7] Arridge S, Hauptmann A (2019) Networks for nonlinear diffusion problems in imaging. J Math Imaging Vis, doi: 10.1007/s10851-019-00901-3.
[8] Bardos C, Lebeau G, Rauch J (1992) Sharp sufficient conditions for the observation, control, and stabilization of waves from the boundary. SIAM J Control Optim 30: 1024-1065. doi: 10.1137/0330055
[9] Berg J, Nyström K (2017) Neural network augmented inverse problems for PDEs. arXiv preprint arXiv:1712.09685.
[10] Bubba TA, Kutyniok G, Lassas M, et al. (2019) Learning the invisible: A hybrid deep learning-shearlet framework for limited angle computed tomography. Inverse Probl 35: 064002. doi: 10.1088/1361-6420/ab10ca
[11] Burq N (2010) Random data Cauchy theory for dispersive partial differential equations. In: Proceedings of the International Congress of Mathematicians, New Delhi: Hindustan Book Agency, 3: 1862-1883.
[12] Burq N, Tzvetkov N (2008) Random data Cauchy theory for supercritical wave equations II: A global existence result. Invent Math 173: 477-496. doi: 10.1007/s00222-008-0123-0
[13] Burq N, Tzvetkov N (2014) Probabilistic well-posedness for the cubic wave equation. J Eur Math Soc 16: 1-30. doi: 10.4171/JEMS/426
[14] Candès EJ, Romberg J (2007) Sparsity and incoherence in compressive sampling. Inverse Probl 23: 969-985. doi: 10.1088/0266-5611/23/3/008
[15] Candès EJ, Romberg J, Tao T (2006) Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theory 52: 489-509. doi: 10.1109/TIT.2005.862083
[16] Donoho DL (2006) Compressed sensing. IEEE Trans Inform Theory 52: 1289-1306. doi: 10.1109/TIT.2006.871582
[17] Engl HW, Hanke M, Neubauer A (1996) Regularization of Inverse Problems, Dordrecht: Kluwer Academic Publishers Group.
[18] Feng J, Sun Q, Li Z, et al. (2018) Back-propagation neural network-based reconstruction algorithm for diffuse optical tomography. J Biomed Opt 24: 051407.
[19] Foucart S, Rauhut H (2013) A Mathematical Introduction to Compressive Sensing, New York: Birkhäuser/Springer.
[20] Garnier J, Papanicolaou G (2016) Passive Imaging with Ambient Noise, Cambridge: Cambridge University Press.
[21] Goodfellow I, Bengio Y, Courville A (2016) Deep Learning, MIT Press.
[22] Haltmeier M, Antholzer S, Schwab J (2019) Deep Learning for Image Reconstruction, World Scientific.
[23] Hamilton SJ, Hauptmann A (2018) Deep D-bar: Real-time electrical impedance tomography imaging with deep neural networks. IEEE Trans Med Imaging 37: 2367-2377. doi: 10.1109/TMI.2018.2828303
[24] Hasanoğlu AH, Romanov VG (2017) Introduction to Inverse Problems for Differential Equations, Cham: Springer.
[25] Hauptmann A, Lucka F, Betcke M, et al. (2018) Model-based learning for accelerated, limited-view 3-D photoacoustic tomography. IEEE Trans Med Imaging 37: 1382-1393. doi: 10.1109/TMI.2018.2820382
[26] Humbert E, Privat Y, Trélat E (2019) Observability properties of the homogeneous wave equation on a closed manifold. Commun Part Diff Eq 44: 749-772. doi: 10.1080/03605302.2019.1581799
[27] Isakov V (2006) Inverse Problems for Partial Differential Equations, New York: Springer.
[28] Jin KH, McCann MT, Froustey E, et al. (2017) Deep convolutional neural network for inverse problems in imaging. IEEE Trans Image Process 26: 4509-4522. doi: 10.1109/TIP.2017.2713099
[29] Kaltenbacher B, Neubauer A, Scherzer O (2008) Iterative Regularization Methods for Nonlinear Ill-posed Problems, Berlin: Walter de Gruyter.
[30] Kirsch A (2011) An Introduction to the Mathematical Theory of Inverse Problems, New York: Springer.
[31] Laurent C, Léautaud M (2019) Quantitative unique continuation for operators with partially analytic coefficients. Application to approximate control for waves. J Eur Math Soc 21: 957-1069.
[32] Maass P (2019) Deep learning for trivial inverse problems. In: Compressed Sensing and Its Applications, Cham: Springer International Publishing, 195-209.
[33] Martin S, Choi CTM (2017) A post-processing method for three-dimensional electrical impedance tomography. Sci Rep 7: 7212. doi: 10.1038/s41598-017-07727-2
[34] Modin K, Nachman A, Rondi L (2019) A multiscale theory for image registration and nonlinear inverse problems. Adv Math 346: 1009-1066. doi: 10.1016/j.aim.2019.02.014
[35] Paley R (1930) On some series of functions. Math Proc Cambridge 26: 458-474. doi: 10.1017/S0305004100016212
[36] Paley R, Zygmund A (1930) On some series of functions, (1). Math Proc Cambridge 26: 337-357. doi: 10.1017/S0305004100016078
[37] Paley R, Zygmund A (1932) On some series of functions, (3). Math Proc Cambridge 28: 190-205. doi: 10.1017/S0305004100010860
[38] Privat Y, Trélat E, Zuazua E (2013) Optimal observation of the one-dimensional wave equation. J Fourier Anal Appl 19: 514-544. doi: 10.1007/s00041-013-9267-4
[39] Privat Y, Trélat E, Zuazua E (2015) Optimal shape and location of sensors for parabolic equations with random initial data. Arch Ration Mech An 216: 921-981. doi: 10.1007/s00205-014-0823-0
[40] Privat Y, Trélat E, Zuazua E (2016) Optimal observability of the multi-dimensional wave and Schrödinger equations in quantum ergodic domains. J Eur Math Soc 18: 1043-1111. doi: 10.4171/JEMS/608
[41] Privat Y, Trélat E, Zuazua E (2016) Randomised observation, control and stabilization of waves [Based on the plenary lecture presented at the 86th Annual GAMM Conference, Lecce, Italy, March 24, 2015]. ZAMM Z Angew Math Mech 96: 538-549. doi: 10.1002/zamm.201500181
[42] Privat Y, Trélat E, Zuazua E (2019) Spectral shape optimization for the Neumann traces of the Dirichlet-Laplacian eigenfunctions. Calc Var Partial Dif 58: 64. doi: 10.1007/s00526-019-1522-3
[43] Rellich F (1940) Darstellung der Eigenwerte von Δu+λu=0 durch ein Randintegral. Math Z 46: 635-636. doi: 10.1007/BF01181459
[44] Scherzer O (2015) Handbook of Mathematical Methods in Imaging. Vol. 1, 2, 3, New York: Springer.
[45] Szegedy C, Zaremba W, Sutskever I, et al. (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
[46] Tadmor E, Nezzar S, Vese L (2004) A multiscale image representation using hierarchical (BV, L2) decompositions. Multiscale Model Simul 2: 554-579. doi: 10.1137/030600448
[47] Tarantola A (2005) Inverse Problem Theory and Methods for Model Parameter Estimation, Philadelphia: Society for Industrial and Applied Mathematics.
[48] Wei Z, Liu D, Chen X (2019) Dominant-current deep learning scheme for electrical impedance tomography. IEEE Trans Biomed Eng 66: 2546-2555. doi: 10.1109/TBME.2019.2891676
[49] Yang G, Yu S, Dong H, et al. (2018) DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Trans Med Imaging 37: 1310-1321. doi: 10.1109/TMI.2017.2785879
[50] Zhu B, Liu JZ, Cauley SF, et al. (2018) Image reconstruction by domain-transform manifold learning. Nature 555: 487-492. doi: 10.1038/nature25988