Research article

Performance analysis of the convex non-convex total variation denoising model

  • Received: 29 July 2024 Revised: 25 September 2024 Accepted: 09 October 2024 Published: 14 October 2024
  • MSC : 49J40, 65K10, 68U10

  • Total variation (TV) regularization is a powerful tool in image denoising, but it often exhibits limited performance in preserving edges. In contrast, non-convex TV regularization preserves edges and contours more effectively, albeit at the cost of a more challenging optimization problem. Recently, the convex non-convex (CNC) strategy has emerged as a potent approach that allows incorporating non-convex TV regularization terms while maintaining the overall convexity of the objective function. Its superior performance has been validated through various numerical experiments; however, theoretical analysis remains lacking. In this paper, we provide a theoretical analysis of the performance of the CNC-TV denoising model. By utilizing the oracle inequality, we derive an improved upper bound on its performance compared to TV regularization. In addition, we devise an alternating direction method of multipliers (ADMM) algorithm to solve the proposed model and verify its convergence properties. The proposed model is validated through numerical experiments in 1D and 2D denoising, demonstrating its excellent performance.

    Citation: Yating Zhu, Zixun Zeng, Zhong Chen, Deqiang Zhou, Jian Zou. Performance analysis of the convex non-convex total variation denoising model[J]. AIMS Mathematics, 2024, 9(10): 29031-29052. doi: 10.3934/math.20241409




    Image denoising aims to reduce or eliminate the noise introduced in the image degradation process while maintaining the basic features of the image [1]. Mathematically, the image degradation process can be expressed as

    $y = \theta + \varepsilon$, (1.1)

    where $\theta \in \mathbb{R}^n$ is the original image, $y \in \mathbb{R}^n$ is the observed image, and $\varepsilon$ is additive white Gaussian noise (AWGN) with variance $\sigma^2$, i.e., $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$. The concept of total variation (TV) is a mathematical measure that quantifies the level of variation present in an image, measuring how rapidly pixel intensities change across neighboring pixels. When an image contains noise, its pixel intensities exhibit random fluctuations, resulting in high TV values. By minimizing the TV of an image while preserving crucial features such as edges and textures, we can effectively reduce noise and enhance image quality [2,3,4,5,6,7]. The standard TV estimator $\hat{\theta}$ can be mathematically defined as

    $\hat{\theta} = \arg\min_{\theta} \frac{1}{2n}\|y - \theta\|_2^2 + \lambda\|\theta\|_{TV}$, (1.2)

    where the regularization parameter $\lambda > 0$ controls the trade-off between the fidelity term $\frac{1}{2n}\|y - \theta\|_2^2$ and the TV regularization $\|\theta\|_{TV}$, and $n$ is the pixel size of the image. By introducing a first-order difference matrix $D \in \mathbb{R}^{n \times n}$, the TV regularization $\|\theta\|_{TV}$ can be represented by the $\ell_1$ norm, i.e., $\|\theta\|_{TV} = \|D\theta\|_1$.
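    As a concrete illustration (ours, not code from the paper), the following minimal Python sketch builds a first-order difference matrix and evaluates $\|D\theta\|_1$; we use the common $(n-1)\times n$ convention for $D$, which differs slightly from the square matrix above:

```python
import numpy as np

def diff_matrix(n):
    """First-order difference matrix with (D theta)_i = theta_{i+1} - theta_i."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

theta = np.array([0.0, 0.0, 1.0, 1.0, 1.0])  # piecewise constant, one jump
D = diff_matrix(theta.size)
tv = np.linalg.norm(D @ theta, 1)            # ||D theta||_1 = 1.0
```

    A noisy signal inflates this value, which is exactly what the penalty in (1.2) discourages.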

    Due to the convexity of the $\ell_1$ norm, the TV denoising model (1.2) possesses several advantages: its objective function is free of spurious local minima and has a unique minimizer. However, owing to the bias inherent in $\ell_1$-based estimation, this model may also underestimate the magnitude of signal discontinuities [8,9,10].

    The limitations of TV regularization can be overcome by employing non-convex TV regularization, which not only enhances image texture details but also sharpens edges more effectively [11,12,13]. The non-convex TV denoising model can be described as follows:

    ˆθ=argminθ12n||yθ||22+λΨNC(Dθ), (1.3)

    where $\Psi_{NC}(\cdot)$ is a non-convex function, such as the $\ell_q$ norm with $0 < q < 1$ [14,15]. However, the cost function of (1.3) is usually non-convex and often has extraneous local minima, which complicates its solution.

    Recently, Selesnick and Lanza et al. proposed a novel non-convex TV regularization for signal and image denoising [9]. This type of non-convex regularization keeps the cost function convex by adjusting the non-convexity control parameters, thereby addressing the limitations of TV regularization while avoiding unnecessary local minima in the cost function. The concept, later known as convex non-convex total variation (CNC-TV) [9,16,17], has been widely utilized in fields such as medical image reconstruction [18,19,20], fault diagnosis [21,22], and computer vision [23,24]. The advantages of CNC-TV are twofold: First, the non-convex regularization not only effectively suppresses noise but also preserves image edge and texture information. Second, by adjusting the non-convexity control parameters, the global convexity of the objective function can be ensured, so the global optimum can be attained by convex optimization algorithms [25,26]. However, to the best of our knowledge, most research on CNC-TV focuses on algorithm design and applications, and theoretical performance analyses in the spirit of [27] are lacking. This motivates us to conduct a thorough analysis of the performance of the CNC-TV estimator, to substantiate that CNC-TV regularization surpasses TV regularization.

    In this paper, we consider the following CNC-TV denoising model:

    $\hat{\theta} = \arg\min_{\theta} \frac{1}{2n}\|y - \theta\|_2^2 + \lambda\Psi_B^{CNC}(D\theta)$. (1.4)

    The matrix parameter $B$ in (1.4) not only controls the non-convexity of the regularization term $\Psi_B^{CNC}(D\theta)$, but also guarantees the global convexity of the cost function in (1.4).

    Through rigorous theoretical analysis and comprehensive experimental evaluations, we show that the utilization of non-convex TV regularization significantly enhances the performance of the CNC-TV estimator, enabling accurate estimation of signal discontinuities. Our contributions can be summarized as follows:

    • Theoretically, for the CNC-TV denoising model, we first establish the conditions necessary to ensure global convexity of the cost function. Subsequently, by employing the oracle inequality, we derive an improved upper bound on performance compared to that of the standard TV denoising model.

    • Algorithmically, we derive an ADMM algorithm that ensures convergence to a critical point for the proposed CNC-TV denoising model.

    • Empirically, we demonstrate that the proposed CNC-TV denoising model outperforms the standard TV denoising model in both 1D and 2D denoising experiments, owing to its utilization of non-convex regularizations.

    The paper is structured as follows. In Section 2, we present preliminaries and related work on non-convex TV regularization, performance analysis results, and convex optimization algorithms. In Section 3, we give a theoretical analysis of the performance of the CNC-TV denoising model and propose an effective ADMM algorithm to solve it. The superiority of the proposed model is validated in Section 4 through both 1D and 2D denoising experiments. Finally, conclusions are summarized in Section 5.

    Notation: Throughout this paper, for a matrix $A$, $A^\top$ and $A^{\dagger}$ denote its transpose and Moore-Penrose pseudo-inverse, respectively, while $\sigma_{\max}$ and $\sigma_{\min}$ denote its maximum and minimum eigenvalues. $\ker(A)$ and $\ker(A)^{\perp}$ denote the kernel space of $A$ and its orthogonal complement. Moreover, $0 \prec A$ indicates that $A$ is positive definite, while $0 \preceq A$ indicates that $A$ is positive semi-definite. We use the symbol $O(\cdot)$ to compare the efficiency or complexity of different algorithms. For a set $T \subseteq \{1, 2, \dots, n\}$, $|T|$ and $T^c$ denote its cardinality and complement. For $\theta \in \mathbb{R}^n$, its $\ell_1$, $\ell_2$, and $\ell_\infty$ norms are defined as $\|\theta\|_1 = \sum_{i=1}^n |\theta_i|$, $\|\theta\|_2 = \sqrt{\sum_{i=1}^n \theta_i^2}$, and $\|\theta\|_\infty = \max_{1 \le i \le n} |\theta_i|$. For a function $f(\theta)$, we use $\partial f(\theta)$ to denote its gradient or subdifferential.

    Standard TV denoising is typically formulated as a convex optimization problem with an $\ell_1$ regularization term. However, it tends to underestimate the magnitudes of abrupt changes in the signal. To address this limitation, non-convex TV$_q$ ($0 < q < 1$) regularization was proposed, which can be expressed as $\|D\theta\|_q^q$ [14,15]. The effectiveness of edge preservation is further enhanced by this non-convex sparse regularization. However, due to its non-convex nature, TV$_q$ regularization complicates algorithm design, as the iterates may converge to local optima [28].

    By leveraging the advantages of both non-convex regularization and convex optimization, Selesnick and Lanza et al. [9] proposed a CNC strategy for designing non-convex TV regularizers that keeps the objective function globally convex. In essence, this is achieved by subtracting a smoothed version of the TV regularization from itself, with different smoothing techniques yielding distinct forms of non-convex TV regularization [29]. The most notable among these is the generalized Moreau envelope TV (GME-TV) regularization [9], which can be mathematically defined as follows:

    $\Psi_B^{GME}(D\theta) = \|D\theta\|_1 - \min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\theta - v)\|_2^2\right\}$. (2.1)

    The last term in (2.1), denoted as

    $S_B(D\theta) = \min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\theta - v)\|_2^2\right\}$, (2.2)

    represents the generalized Moreau envelope of $\|D\theta\|_1$, a commonly used smoothing technique. The matrix parameter $B$ adjusts the smoothness of $S_B(D\theta)$ and consequently the non-convexity of $\Psi_B^{GME}(D\theta)$. Specifically, when $B = 0$, then $S_B(D\theta) = 0$ and $\Psi_B^{GME}(D\theta)$ reduces to the TV regularization term $\|D\theta\|_1$. When $B^\top B$ is a scaled identity, i.e., $B = bI$, then $\Psi_B^{GME}(D\theta)$ simplifies to the generalized minimax-concave TV (GMC-TV) regularization introduced in [30].
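    To make the envelope concrete, the following sketch (our own illustration) evaluates $\Psi_B^{GME}(D\theta)$ by solving the inner minimization in (2.2) with a plain ISTA loop; the step size $1/\|B^\top B\|_2$ is our choice to keep the inner iteration convergent:

```python
import numpy as np

def soft(x, t):
    # Element-wise soft thresholding, the prox of t*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def gme_penalty(Dtheta, B, n_iter=200):
    """Psi_B^GME(D theta) = ||D theta||_1 - min_v {||v||_1 + 0.5*||B(D theta - v)||_2^2}."""
    BtB = B.T @ B
    L = max(np.linalg.norm(BtB, 2), 1e-12)  # Lipschitz constant of the smooth part
    v = np.zeros_like(Dtheta)
    for _ in range(n_iter):
        # ISTA step on v: gradient of the smooth part is -B^T B (D theta - v)
        v = soft(v + (BtB @ (Dtheta - v)) / L, 1.0 / L)
    envelope = np.linalg.norm(v, 1) + 0.5 * np.linalg.norm(B @ (Dtheta - v)) ** 2
    return np.linalg.norm(Dtheta, 1) - envelope
```

    With $B = 0$ the envelope vanishes and the function returns $\|D\theta\|_1$, matching the reduction to TV noted above.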

    In accordance with the conventions of the image denoising literature [27,31,32,33], the performance of a TV estimator $\hat{\theta}$ is evaluated by a high-probability upper bound on its mean squared error (MSE), defined as

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2$. (2.3)

    Upper bounds are presented through oracle inequalities, which provide theoretical guarantees that the estimator $\hat{\theta}$ captures the underlying structure of $\theta$ and adapts accordingly.

    Let $n$ represent the total number of pixels in the image. Mammen and van de Geer [31] first defined total variation via total derivatives and achieved an estimation rate of order $O(n^{-3/5})$. Wang et al. [33] investigated the more general framework of trend filtering on graphs and obtained a rate of order $O(n^{-4/5})$. Hütter and Rigollet [27] established sharp oracle inequalities with a rate of $O(n^{-1/2})$. Sadhanala et al. [34] demonstrated that the penalized TV denoising estimator achieves a near-minimax rate of $O(n^{-1/2})$. Chatterjee and Goswami [35] analyzed the recovery of piecewise rectangular images through the constrained version of 2D total variation denoising, providing a rate of $O(n^{-3/4})$. Varma et al. [36] determined the statistical error rates associated with first-order stationary points in graph trend filtering, employing non-convex regularizations such as the smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP). A more detailed summary of oracle results on TV denoising can be found in [37,38,39].

    Our analysis of error rates utilizes techniques from [27], which yield precise rates through oracle inequalities. Here, we present some basic definitions and conclusions from [27] that will also be used in the theoretical analysis of this paper.

    We first give the definitions of the compatibility factor and inverse scaling factor as follows.

    Definition 2.1. ([27], Definition 1) Let $D$ be a first-order difference matrix. The compatibility factor of $D$ for a set $T \subseteq [m]$ is defined as

    $\kappa_{\emptyset} := 1, \qquad \kappa_T = \kappa_T(D) := \inf_{\theta} \frac{\sqrt{|T|}\,\|\theta\|_2}{\|(D\theta)_T\|_1} \quad \text{for } T \neq \emptyset$. (2.4)

    Moreover, the inverse scaling factor of D is defined as

    $\varrho = \varrho(D) := \max_{j \in [m]} \|s_j\|_2$, (2.5)

    where, as in [27], $s_j$ denotes the $j$-th column of $D^{\dagger}$.
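    For intuition, $\varrho(D)$ can be computed numerically. A small sketch (ours):

```python
import numpy as np

n = 64
D = np.zeros((n - 1, n))
idx = np.arange(n - 1)
D[idx, idx], D[idx, idx + 1] = -1.0, 1.0

D_pinv = np.linalg.pinv(D)                     # Moore-Penrose pseudo-inverse
varrho = np.linalg.norm(D_pinv, axis=0).max()  # max_j ||s_j||_2 over columns s_j
print(varrho)                                  # for the 1D chain this grows with n, roughly like sqrt(n)
```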

    Then the sharp oracle inequality for TV denoising in [27] is expressed as Theorem 2.1.

    Theorem 2.1. ([27], Theorem 2) Fix $\delta \in (0,1)$ and $T \subseteq [m]$, and let $D$ be the first-order difference matrix. Define the regularization parameter

    $\lambda := \frac{1}{n}\sigma\varrho\sqrt{2\log\left(\frac{em}{\delta}\right)}$. (2.6)

    With this choice of $\lambda$, the TV estimator $\hat{\theta}$ satisfies

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2 \le \inf_{\bar{\theta}}\left\{\frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + 4\lambda\|(D\bar{\theta})_{T^c}\|_1\right\} + \frac{8\sigma^2}{n}\left(\frac{|T|\varrho^2}{\kappa_T^2}\log\left(\frac{em}{\delta}\right) + \log\left(\frac{e}{\delta}\right)\right)$, (2.7)

    on the estimation error, with probability at least $1 - 2\delta$.

    Consider the following optimization problem:

    $\min_{\theta}\{f_1(\theta) + f_2(\theta)\}$, (2.8)

    where $f_1(\theta)$ is convex and differentiable, and $f_2(\theta)$ is a proper, closed, convex, but nonsmooth regularization term. The proximal gradient descent (PGD) [40] and alternating direction method of multipliers (ADMM) [41] algorithms perform well on such problems, with theoretical convergence guarantees. Their iterative schemes for solving (2.8) are

    $\theta^{k+1} = \mathrm{prox}_{\alpha f_2}\left(\theta^k - \alpha\nabla f_1(\theta^k)\right)$, (PGD)

    and

    $\begin{cases} \theta^{k+1} = \mathrm{prox}_{\alpha f_1}(z^k - u^k), \\ z^{k+1} = \mathrm{prox}_{\alpha f_2}(\theta^{k+1} + u^k), \\ u^{k+1} = u^k + \theta^{k+1} - z^{k+1}. \end{cases}$ (ADMM)

    Furthermore, $\mathrm{prox}_f(\cdot)$ is the proximal operator of $f$, defined as

    $\mathrm{prox}_f(\theta) = \arg\min_{v}\left\{f(v) + \frac{1}{2}\|\theta - v\|_2^2\right\}$. (2.9)

    As a special case, if $f(\theta) = \lambda\|\theta\|_1$, then its proximal operator $\mathrm{prox}_{\lambda\|\cdot\|_1}$ is the element-wise soft-thresholding function

    $\mathrm{prox}_{\lambda\|\cdot\|_1}(\theta)_i = \mathrm{sgn}(\theta_i)\max\{|\theta_i| - \lambda, 0\}$, (2.10)

    where $\theta_i$ is the $i$-th element of $\theta$.
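    As a small self-contained example (ours, not from the paper), the soft-thresholding prox (2.10) drives the PGD iteration of (2.8) with $f_1(\theta) = \frac{1}{2}\|y - \theta\|_2^2$ and $f_2(\theta) = \lambda\|\theta\|_1$:

```python
import numpy as np

def soft(x, t):
    """prox of t*||.||_1, Eq. (2.10): element-wise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

y = np.array([1.5, -0.2, 3.0])
theta, lam, alpha = np.zeros_like(y), 1.0, 0.5
for _ in range(50):
    grad_f1 = theta - y                            # gradient of 0.5*||y - theta||^2
    theta = soft(theta - alpha * grad_f1, alpha * lam)
# theta converges to soft(y, lam) = [0.5, 0.0, 2.0]
```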

    In this section, we analyze the CNC-TV denoising model both theoretically and algorithmically. In particular, we consider the following GME-TV denoising model:

    $\hat{\theta} = \arg\min_{\theta} \frac{1}{2n}\|y - \theta\|_2^2 + \lambda\Psi_B^{GME}(D\theta)$, (3.1)

    where the GME-TV regularization $\Psi_B^{GME}(D\theta)$ is defined in (2.1), and the matrix parameter $B$ influences the non-convexity of $\Psi_B^{GME}(D\theta)$ and hence the global convexity of the cost function in (3.1).

    We first give the convexity condition of the cost function in (3.1), then analyze the performance of the GME-TV estimator. Finally, we solve the GME-TV denoising model (3.1) by the ADMM algorithm and prove its convergence.

    In this subsection, we verify that the cost function in (3.1) can be guaranteed to be convex by selecting an appropriate non-convex control parameter $B$.

    Theorem 3.1. Let $\theta \in \mathbb{R}^n$, $\lambda > 0$, and $B \in \mathbb{R}^{p \times n}$, and define the cost function of (3.1) as

    $f_B^{GME}(\theta) = \frac{1}{2n}\|y - \theta\|_2^2 + \lambda\|D\theta\|_1 - \lambda\min_{v}\left\{\frac{1}{2}\|B(D\theta - v)\|_2^2 + \|v\|_1\right\}$. (3.2)

    Then $f_B^{GME}(\theta)$ is convex if $D^\top B^\top BD \preceq \frac{1}{\lambda}I$, and $f_B^{GME}(\theta)$ is strongly convex if $D^\top B^\top BD \prec \frac{1}{\lambda}I$.

    Proof. We first rewrite (3.2) as follows:

    $f_B^{GME}(\theta) = \frac{1}{2n}\|\theta - y\|_2^2 + \lambda\|D\theta\|_1 - \lambda\min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\theta - v)\|_2^2\right\}$
    $= \frac{1}{2n}\|\theta - y\|_2^2 + \lambda\|D\theta\|_1 - \frac{\lambda}{2}\|BD\theta\|_2^2 - \lambda\min_{v}\left\{\|v\|_1 + \frac{1}{2}\|Bv\|_2^2 - v^\top B^\top B(D\theta)\right\}$
    $= \frac{1}{2n}\theta^\top\left(I - \lambda(BD)^\top(BD)\right)\theta + \frac{1}{2n}\|y\|_2^2 - \frac{1}{n}\theta^\top y + \lambda\|D\theta\|_1 + \lambda\max_{v}\left\{-\|v\|_1 + (Bv)^\top B(D\theta) - \frac{1}{2}\|Bv\|_2^2\right\}$. (3.3)

    Next, we analyze the convexity of each term in the last equation of (3.3). First, $-\frac{1}{n}\theta^\top y$ is linear in $\theta$, thus convex. Second, $\lambda\|D\theta\|_1$ is the $\ell_1$ norm of $D\theta$, which is also convex. For the last term, since the expression within the braces is affine (and therefore convex) in $\theta$, and the point-wise maximum of a family of convex functions remains convex, this term is also convex in $\theta$. Consequently, the convexity of $f_B^{GME}(\theta)$ depends solely on the first term $\frac{1}{2n}\theta^\top(I - \lambda(BD)^\top(BD))\theta$. If $D^\top B^\top BD \preceq \frac{1}{\lambda}I$ holds, then $I - \lambda(BD)^\top(BD)$ is positive semidefinite and $f_B^{GME}(\theta)$ is convex. If instead $D^\top B^\top BD \prec \frac{1}{\lambda}I$, then $I - \lambda(BD)^\top(BD)$ is positive definite and $f_B^{GME}(\theta)$ is strongly convex.

    The convexity conditions above can be maintained by constructing the matrix parameter $B$. Following a similar approach as in [30], one simple choice of $B$ is given by

    $BD = \sqrt{\frac{\zeta}{\lambda}}\, I, \qquad 0 \le \zeta \le 1$. (3.4)

    When $0 < \zeta < 1$, $D^\top B^\top BD = \frac{\zeta}{\lambda}I$ satisfies the convexity condition. When $\zeta = 0$, we have $B = 0$ and the GME-TV regularization $\Psi_B^{GME}(D\theta)$ reduces to the TV regularization $\|D\theta\|_1$. When $\zeta = 1$, the convexity condition in Theorem 3.1 holds with equality, resulting in a "maximally" non-convex regularization.

    According to Theorem 3.1, we can deduce the following corollary, which is useful in our proof of algorithm convergence.

    Corollary 3.1. For all $\mu \ge \lambda\sigma_{\max}$, $\lambda\Psi_B^{GME}(D\theta) + \frac{\mu}{2}\|D\theta\|_2^2$ is convex, where $\sigma_{\max}$ is the largest eigenvalue of $B^\top B$.

    Corollary 3.1 can be derived by replacing $\frac{1}{2n}\|y - \theta\|_2^2$ in (3.3) with $\frac{\mu}{2}\|D\theta\|_2^2$ and following the proof of Theorem 3.1.

    In this subsection, we analyze the performance of the GME-TV estimator (3.1) and use the oracle inequality to obtain an upper bound on the error rate that improves upon the upper bound for the standard TV estimator (1.2).

    We consider a general GME-TV estimator (3.1); that is, the GME-TV regularization term is non-convex and does not necessarily satisfy the convexity condition of Theorem 3.1. Due to this non-convexity, the global minimum of the GME-TV estimator may not be attainable. It is therefore crucial to understand the statistical performance of any first-order stationary point of the GME-TV estimator, which makes the analysis more relevant in practice. We define $\hat{\theta} \in \mathbb{R}^n$ as a stationary point of $f_B^{GME}(\theta)$ when it fulfills

    $0 \in \partial_{\theta} f_B^{GME}(\theta)\big|_{\theta = \hat{\theta}}$. (3.5)

    The compatibility factor of Definition 2.1 allows us to derive the following oracle inequality for the stationary points of the GME-TV estimator (3.1).

    Theorem 3.2. (Sharp oracle inequality for the GME-TV estimator) Let $D$ be the first-order difference matrix, and let $\sigma_{\max}$ and $\sigma_{\min}$ be the maximum and minimum eigenvalues of $B^\top B$, respectively. Set the regularization parameter $\lambda = \frac{1}{n}\sigma\varrho\sqrt{2\log\left(\frac{em}{\delta}\right)}$. For fixed $\delta \in (0,1)$, $0 < \mu < \frac{1}{n\|D\|_2^2}$, and $T \subseteq [m]$, any stationary point $\hat{\theta}$ of the GME-TV estimator (3.1) satisfies

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2 \le \inf_{\bar{\theta}}\left\{\frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta})\right\} + \frac{16\sigma^2}{n(1 - n\mu\|D\|_2^2)}\left(\frac{|T|\varrho^2}{\kappa_T^2}\log\left(\frac{em}{\delta}\right) + \log\left(\frac{e}{\delta}\right)\right)$, (3.6)

    with probability at least $1 - 2\delta$.

    Proof. Since $\hat{\theta}$ is a stationary point of $f_B^{GME}(\theta)$, according to the definition of a subgradient, we have

    $0 \in \partial_{\theta} f_B^{GME}(\theta)\big|_{\theta=\hat{\theta}} = \frac{1}{n}(\hat{\theta} - y) + \lambda\,\partial_{\theta}\Psi_B^{GME}(D\theta)\big|_{\theta=\hat{\theta}}$. (3.7)

    Furthermore, by the chain rule, we have

    $\partial_{\theta}\Psi_B^{GME}(D\theta)\big|_{\theta=\hat{\theta}} = D^\top \mathrm{sign}(D\hat{\theta}) - D^\top B^\top B\left(D\hat{\theta} - \arg\min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\hat{\theta} - v)\|_2^2\right\}\right)$. (3.8)

    Then, by (3.8), there exists $z \in \mathrm{sign}(D\hat{\theta}) - B^\top B\left(D\hat{\theta} - \arg\min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\hat{\theta} - v)\|_2^2\right\}\right)$ such that

    $\frac{1}{n}(y - \hat{\theta}) = \lambda D^\top z$. (3.9)

    In particular, for any $\bar{\theta} \in \mathbb{R}^n$, we have

    $\frac{1}{n}\bar{\theta}^\top(y - \hat{\theta}) = \lambda\bar{\theta}^\top D^\top z = \lambda(D\bar{\theta})^\top z$. (3.10)

    Similarly, for $\hat{\theta}$ itself, we have

    $\frac{1}{n}\hat{\theta}^\top(y - \hat{\theta}) = \lambda(D\hat{\theta})^\top z$. (3.11)

    By subtracting (3.11) from (3.10) and using the definition of a subgradient, we obtain

    $\frac{1}{n}(y - \hat{\theta})^\top(\bar{\theta} - \hat{\theta}) = \lambda z^\top(D\bar{\theta} - D\hat{\theta}) \le \lambda\Psi_B^{GME}(D\bar{\theta}) - \lambda\Psi_B^{GME}(D\hat{\theta})$. (3.12)

    Considering the model $y = \theta + \varepsilon$ and utilizing the polarization identity, i.e., $2x^\top y = \|x\|_2^2 + \|y\|_2^2 - \|x - y\|_2^2$, we can reformulate the left-hand side of (3.12) as

    $\frac{1}{n}(\theta - \hat{\theta} + \varepsilon)^\top(\bar{\theta} - \hat{\theta}) = \frac{1}{n}\left[(\bar{\theta} - \hat{\theta})^\top(\theta - \hat{\theta}) + \varepsilon^\top(\bar{\theta} - \hat{\theta})\right] = \frac{1}{2n}\|\bar{\theta} - \hat{\theta}\|_2^2 + \frac{1}{2n}\|\theta - \hat{\theta}\|_2^2 - \frac{1}{2n}\|\bar{\theta} - \theta\|_2^2 + \frac{1}{n}\varepsilon^\top(\bar{\theta} - \hat{\theta})$. (3.13)

    For any $\bar{\theta} \in \mathbb{R}^n$, combining (3.12) and (3.13) yields

    $\frac{1}{2n}\|\hat{\theta} - \bar{\theta}\|_2^2 + \frac{1}{2n}\|\theta - \hat{\theta}\|_2^2 \le \frac{1}{2n}\|\bar{\theta} - \theta\|_2^2 + \frac{1}{n}\varepsilon^\top(\hat{\theta} - \bar{\theta}) + \lambda\Psi_B^{GME}(D\bar{\theta}) - \lambda\Psi_B^{GME}(D\hat{\theta})$. (3.14)

    Next, we will analyze each term on the right-hand side of (3.14) separately.

    We first address $\varepsilon^\top(\hat{\theta} - \bar{\theta})$. Let $\Pi$ and $I - \Pi$ denote the projection matrices onto $\ker(D)$ and $\ker(D)^{\perp}$, respectively; moreover, $D^{\dagger}D = I - \Pi$. With the decomposition $\varepsilon = \Pi\varepsilon + (I - \Pi)\varepsilon$, we have

    $\varepsilon^\top(\hat{\theta} - \bar{\theta}) = (\Pi\varepsilon)^\top(\hat{\theta} - \bar{\theta}) + ((I - \Pi)\varepsilon)^\top(\hat{\theta} - \bar{\theta}) = (\Pi\varepsilon)^\top(\hat{\theta} - \bar{\theta}) + \varepsilon^\top D^{\dagger}D(\hat{\theta} - \bar{\theta}) \le \|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2 + \|(D^{\dagger})^\top\varepsilon\|_{\infty}\|D(\hat{\theta} - \bar{\theta})\|_1$. (3.15)

    The last inequality in (3.15) follows from Hölder's inequality, i.e., $x^\top y \le \|x\|_p\|y\|_q$ with $1/p + 1/q = 1$.

    By using the maximal inequality for Gaussian random variables [42], we have

    $\|\Pi\varepsilon\|_2^2 \le 2\sigma^2\log(e/\delta)$, (3.16)

    and

    $\|(D^{\dagger})^\top\varepsilon\|_{\infty} \le \sigma\varrho\sqrt{2\log(em/\delta)} = \lambda n$, (3.17)

    with probability at least $1 - 2\delta$.

    Substituting (3.17) into (3.15) and multiplying by $\frac{2}{n}$ yields

    $\frac{2}{n}\varepsilon^\top(\hat{\theta} - \bar{\theta}) \le 2\lambda\|D(\hat{\theta} - \bar{\theta})\|_1 + \frac{2}{n}\|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2$. (3.18)

    By using $\lambda\|D\theta\|_1 \le \lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\theta) + \frac{\mu}{2}\|D\theta\|_2^2$ and $\|D(\hat{\theta} - \bar{\theta})\|_2^2 \le \|D\|_2^2\|\hat{\theta} - \bar{\theta}\|_2^2$, we have

    $\frac{2}{n}\varepsilon^\top(\hat{\theta} - \bar{\theta}) \le 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big(D(\hat{\theta} - \bar{\theta})\big) + \mu\|D(\hat{\theta} - \bar{\theta})\|_2^2 + \frac{2}{n}\|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2$. (3.19)

    Multiplying (3.14) by two and substituting (3.19), we obtain

    $\frac{1}{n}\|\hat{\theta} - \bar{\theta}\|_2^2 + \frac{1}{n}\|\theta - \hat{\theta}\|_2^2 \le \frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + \frac{2}{n}\|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2 + \mu\|D\|_2^2\|\hat{\theta} - \bar{\theta}\|_2^2 + 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big(D(\hat{\theta} - \bar{\theta})\big) + 2\lambda\Psi_B^{GME}(D\bar{\theta}) - 2\lambda\Psi_B^{GME}(D\hat{\theta})$. (3.20)

    For the GME-TV regularization $\Psi_B^{GME}(D\bar{\theta})$, we have

    $\Psi_B^{GME}(D\bar{\theta}) = \|D\bar{\theta}\|_1 - \min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\bar{\theta} - v)\|_2^2\right\} \le \Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta})$, (3.21)

    and

    $\Psi_B^{GME}(D\hat{\theta}) = \|D\hat{\theta}\|_1 - \min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(D\hat{\theta} - v)\|_2^2\right\} \ge \Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\hat{\theta})$. (3.22)

    After substituting (3.21) and (3.22) into (3.20), we can rewrite (3.20) as

    $\frac{1}{n}\|\hat{\theta} - \bar{\theta}\|_2^2 + \frac{1}{n}\|\theta - \hat{\theta}\|_2^2 \le \frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + \frac{2}{n}\|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2 + \mu\|D\|_2^2\|\hat{\theta} - \bar{\theta}\|_2^2 + 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big(D(\hat{\theta} - \bar{\theta})\big) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\hat{\theta})$
    $= \frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + \frac{2}{n}\|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2 + \mu\|D\|_2^2\|\hat{\theta} - \bar{\theta}\|_2^2 + 2\lambda\left[\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big(D(\hat{\theta} - \bar{\theta})\big) + \Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta}) - \Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\hat{\theta})\right] + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta})$. (3.23)

    Note that for any $T$, $\Psi_B^{GME}(D\theta) = \Psi_B^{GME}\big((D\theta)_T\big) + \Psi_B^{GME}\big((D\theta)_{T^c}\big)$. Consequently, by utilizing the triangle inequality together with the subadditivity and symmetry of $\Psi_{\sqrt{\sigma_{\max}}I}^{GME}$, we obtain

    $\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big(D(\hat{\theta} - \bar{\theta})\big) + \Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta}) - \Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\hat{\theta})$
    $\le \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D(\hat{\theta} - \bar{\theta}))_T\big) + \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\hat{\theta})_{T^c}\big) + \Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta}) - \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\hat{\theta})_T\big) - \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\hat{\theta})_{T^c}\big)$
    $= \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D(\hat{\theta} - \bar{\theta}))_T\big) + 2\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_T\big) - \Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\hat{\theta})_T\big)$
    $\le 2\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D(\hat{\theta} - \bar{\theta}))_T\big) + 2\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big)$. (3.24)

    For $\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D(\hat{\theta} - \bar{\theta}))_T\big)$ in (3.24), we have

    $\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D(\hat{\theta} - \bar{\theta}))_T\big) \le \|(D(\hat{\theta} - \bar{\theta}))_T\|_1 \le \kappa_T^{-1}\sqrt{|T|}\,\|\hat{\theta} - \bar{\theta}\|_2$. (3.25)

    The last inequality is based on the definition of the compatibility factor in Definition 2.1.

    Next, combining (3.23)–(3.25), we derive

    $\frac{1}{n}\|\hat{\theta} - \bar{\theta}\|_2^2 + \frac{1}{n}\|\theta - \hat{\theta}\|_2^2 \le \frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + \frac{2}{n}\|\Pi\varepsilon\|_2\|\hat{\theta} - \bar{\theta}\|_2 + \mu\|D\|_2^2\|\hat{\theta} - \bar{\theta}\|_2^2 + 4\lambda\kappa_T^{-1}\sqrt{|T|}\,\|\hat{\theta} - \bar{\theta}\|_2 + 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta})$. (3.26)

    By utilizing Young's inequality, i.e., $2xy \le \frac{x^2}{m} + my^2$ for $m > 0$, with $x = \frac{1}{n}\|\Pi\varepsilon\|_2 + 2\lambda\kappa_T^{-1}\sqrt{|T|}$, $y = \|\hat{\theta} - \bar{\theta}\|_2$, and $m = \frac{1}{n} - \mu\|D\|_2^2$ (which requires $\mu < \frac{1}{n\|D\|_2^2}$), we can bound the cross terms in (3.26) as

    $2\left(\frac{1}{n}\|\Pi\varepsilon\|_2 + 2\lambda\kappa_T^{-1}\sqrt{|T|}\right)\|\hat{\theta} - \bar{\theta}\|_2 \le \frac{1}{m}\left(\frac{1}{n}\|\Pi\varepsilon\|_2 + 2\lambda\kappa_T^{-1}\sqrt{|T|}\right)^2 + m\|\hat{\theta} - \bar{\theta}\|_2^2 \le \frac{2n}{1 - n\mu\|D\|_2^2}\left(\frac{1}{n^2}\|\Pi\varepsilon\|_2^2 + 4\lambda^2\kappa_T^{-2}|T|\right) + m\|\hat{\theta} - \bar{\theta}\|_2^2$. (3.27)

    Substituting (3.27) into (3.26) results in the following inequality:

    $\frac{1}{n}\|\bar{\theta} - \hat{\theta}\|_2^2 + \frac{1}{n}\|\theta - \hat{\theta}\|_2^2 \le \frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + \frac{1}{n}\|\hat{\theta} - \bar{\theta}\|_2^2 + 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + \frac{2n}{1 - n\mu\|D\|_2^2}\left(\frac{1}{n^2}\|\Pi\varepsilon\|_2^2 + 4\lambda^2\kappa_T^{-2}|T|\right) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta})$. (3.28)

    Canceling $\frac{1}{n}\|\hat{\theta} - \bar{\theta}\|_2^2$ on both sides of (3.28) yields

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2 \le \frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta}) + \frac{2n}{1 - n\mu\|D\|_2^2}\left(\frac{1}{n^2}\|\Pi\varepsilon\|_2^2 + 4\lambda^2\kappa_T^{-2}|T|\right)$. (3.29)

    By substituting $\|\Pi\varepsilon\|_2^2 \le 2\sigma^2\log(e/\delta)$ and $\sigma\varrho\sqrt{2\log(em/\delta)} = \lambda n$ into (3.29), and taking the infimum over $\bar{\theta}$, we obtain

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2 \le \inf_{\bar{\theta}}\left\{\frac{1}{n}\|\bar{\theta} - \theta\|_2^2 + 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\bar{\theta})_{T^c}\big) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\bar{\theta}) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\bar{\theta})\right\} + \frac{16\sigma^2}{n(1 - n\mu\|D\|_2^2)}\left(\frac{|T|\varrho^2}{\kappa_T^2}\log\left(\frac{em}{\delta}\right) + \log\left(\frac{e}{\delta}\right)\right)$. (3.30)

    This completes the proof of Theorem 3.2.

    The oracle inequality remains valid for any $\hat{\theta}$ that satisfies the first-order optimality condition, even in the case of non-convex regularization. This mild requirement on $\hat{\theta}$ represents a significant departure from previous findings, such as Theorem 2 of [27], which applies to global minimizers, a property that is difficult to guarantee when employing non-convex regularizations.

    The error bound on the right-hand side of (3.6) can be optimized through careful choice of $\bar{\theta}$ and $T$. When we set $\bar{\theta} = \theta$, then

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2 \le 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}\big((D\theta)_{T^c}\big) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\theta) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\theta) + \frac{16\sigma^2}{n(1 - n\mu\|D\|_2^2)}\left(\frac{|T|\varrho^2}{\kappa_T^2}\log\left(\frac{em}{\delta}\right) + \log\left(\frac{e}{\delta}\right)\right)$. (3.31)

    When we further set $T$ as the empty set, i.e., $T = \emptyset$, then

    $\frac{1}{n}\|\hat{\theta} - \theta\|_2^2 \le 4\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\theta) + 2\lambda\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\theta) - 2\lambda\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\theta) + \frac{16\sigma^2}{n(1 - n\mu\|D\|_2^2)}\log\left(\frac{e}{\delta}\right)$. (3.32)

    We now compare the error bound for the GME-TV estimator with Theorem 2.1 in the special case $T = \emptyset$, i.e., (3.32). First, due to its non-convex nature, $\Psi_B^{GME}(D\theta)$ penalizes large coefficients of $D\theta$ less heavily than the $\ell_1$ norm. In particular, $\Psi_{\sqrt{\sigma_{\max}}I}^{GME}(D\theta) \le \|D\theta\|_1$ and $\Psi_{\sqrt{\sigma_{\min}}I}^{GME}(D\theta) \le \|D\theta\|_1$. Consequently, the first term in (3.32) is upper bounded by the corresponding term of Theorem 2.1. Moreover, the second term in (3.32) carries the factor $n(1 - n\mu\|D\|_2^2)$ in the denominator, which makes it an upper bound for the corresponding second term in Theorem 2.1. In summary, the error bound of GME-TV improves the first (approximation) term of Theorem 2.1, at the price of a slightly larger stochastic term.

    On the other hand, the above error gap can be reduced by adjusting the matrix parameter $B$. However, when $B$ is adjusted so that $\Psi_B^{GME}(D\theta)$ tends to $\|D\theta\|_1$, the improvement in the first term of the bound is erased.

    In conclusion, the proposed GME-TV estimator ensures strong statistical guarantees for any stationary point, despite its non-convex nature.

    In this subsection, we solve the GME-TV denoising model (3.1) using the ADMM algorithm and prove its convergence.

    By introducing an auxiliary variable z, problem (3.1) can be rewritten as

    $\arg\min_{\theta, z}\ \frac{1}{2n}\|y - \theta\|_2^2 + \lambda\Psi_B^{GME}(z) \quad \text{s.t.}\ z = D\theta$. (3.33)

    Furthermore, its augmented Lagrangian form can be expressed as

    $L(\theta, z, u) = \frac{1}{2n}\|\theta - y\|_2^2 + \lambda\Psi_B^{GME}(z) + u^\top(D\theta - z) + \frac{\rho}{2}\|z - D\theta\|_2^2$, (3.34)

    where u represents the Lagrange multiplier and ρ>0 serves as a penalty parameter.

    According to the ADMM iterations, solving the minimization problem (3.34) involves addressing three subproblems.

    Step 1. Update $\theta^{k+1}$ while keeping $z^k$ and $u^k$ fixed.

    $\theta^{k+1} = \arg\min_{\theta}\left\{\frac{1}{2n}\|\theta - y\|_2^2 + (u^k)^\top D\theta + \frac{\rho}{2}\|z^k - D\theta\|_2^2\right\} = \arg\min_{\theta}\left\{\frac{1}{2n}\|\theta - y\|_2^2 + \frac{\rho}{2}\left\|D\theta - \left(z^k - \frac{1}{\rho}u^k\right)\right\|_2^2\right\} = \left(\frac{1}{n}I + \rho D^\top D\right)^{-1}\left(\frac{1}{n}y + \rho D^\top z^k - D^\top u^k\right)$. (3.35)

    Step 2. Update $z^{k+1}$ while keeping $\theta^{k+1}$ and $u^k$ fixed.

    $z^{k+1} = \arg\min_{z}\left\{\lambda\Psi_B^{GME}(z) - (u^k)^\top z + \frac{\rho}{2}\|z - D\theta^{k+1}\|_2^2\right\} = \arg\min_{z}\left\{\lambda\Psi_B^{GME}(z) + \frac{\rho}{2}\left\|z - D\theta^{k+1} - \frac{1}{\rho}u^k\right\|_2^2\right\}$. (3.36)

    Equation (3.36) can be viewed as the proximal operator of $\Psi_B^{GME}(z)$. Although this operator has no explicit expression, we can compute it iteratively via PGD.

    By substituting the definition of $\Psi_B^{GME}(z)$ into (3.36), we can reformulate it as

    $z^{k+1} = \arg\min_{z}\left\{\lambda\|z\|_1 - \lambda\min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(z - v)\|_2^2\right\} + \frac{\rho}{2}\left\|z - D\theta^{k+1} - \frac{1}{\rho}u^k\right\|_2^2\right\}$. (3.37)

    Let $f_1(z) = \frac{\rho}{2}\left\|z - D\theta^{k+1} - \frac{1}{\rho}u^k\right\|_2^2 - \lambda\min_{v}\left\{\|v\|_1 + \frac{1}{2}\|B(z - v)\|_2^2\right\}$ and $f_2(z) = \lambda\|z\|_1$. The update of $z^{k+1}$ can then be obtained through the PGD algorithm as follows:

    $z^{k+1} = \mathrm{prox}_{\alpha f_2}\left(z^k - \alpha\nabla f_1(z^k)\right) = \mathrm{prox}_{\alpha\lambda\|\cdot\|_1}\left(z^k - \alpha\nabla f_1(z^k)\right)$, (3.38)

    where $\alpha$ is the step size. The main challenge in (3.38) is computing the gradient $\nabla f_1(z)$. According to Lemma 3 in [16], the last term of $f_1(z)$ is differentiable with respect to $z$, and the gradient of $f_1(z)$ can be expressed as

    $\nabla f_1(z) = \rho\left(z - D\theta^{k+1} - \frac{u^k}{\rho}\right) - \lambda B^\top B\left(z - \arg\min_{v}\left\{\frac{1}{2}\|B(z - v)\|_2^2 + \|v\|_1\right\}\right)$. (3.39)

    Note that (3.39) contains an $\ell_1$-regularized problem:

    $v^{k+1} = \arg\min_{v}\left\{\frac{1}{2}\|B(z - v)\|_2^2 + \|v\|_1\right\}$, (3.40)

    which can be solved through a proximal (forward-backward) step on the $\ell_1$ norm, i.e.,

    $v^{k+1} = \mathrm{prox}_{\|\cdot\|_1}\left(v^k + B^\top B(z - v^k)\right)$, (3.41)

    where $\mathrm{prox}_{\|\cdot\|_1}(\cdot)$ can be simply evaluated by the soft-thresholding operator (2.10).

    By substituting (3.41) and (3.39) into (3.38), we can succinctly summarize the update of $z^{k+1}$ as follows:

    $\begin{cases} v^{k+1} = \mathrm{prox}_{\|\cdot\|_1}\left(v^k + B^\top B(z^k - v^k)\right), \\ w^{k+1} = \left((1 - \alpha\rho)I + \alpha\lambda B^\top B\right)z^k + \alpha\rho D\theta^{k+1} + \alpha u^k - \alpha\lambda B^\top B v^{k+1}, \\ z^{k+1} = \mathrm{prox}_{\alpha\lambda\|\cdot\|_1}(w^{k+1}). \end{cases}$ (3.42)

    Step 3. Update $u^{k+1}$.

    $u^{k+1} = u^k + \rho(D\theta^{k+1} - z^{k+1})$. (3.43)

    Finally, the ADMM algorithm for GME-TV regularization is obtained as Algorithm 1 by rearranging Eqs (3.35), (3.42), and (3.43).

    Algorithm 1. ADMM for the GME-TV denoising model
    Require: $y, \theta^0, z^0, u^0, v^0, B, \lambda, \rho, \alpha$
    Ensure: θ
      1: while 'stopping criterion is not met' do
      2: $\theta^{k+1} = \left(\frac{1}{n}I + \rho D^\top D\right)^{-1}\left(\frac{1}{n}y + \rho D^\top z^k - D^\top u^k\right)$;
      3: $v^{k+1} = \mathrm{prox}_{\|\cdot\|_1}\left(v^k + B^\top B(z^k - v^k)\right)$;
      4: $w^{k+1} = \left((1 - \alpha\rho)I + \alpha\lambda B^\top B\right)z^k + \alpha\rho D\theta^{k+1} + \alpha u^k - \alpha\lambda B^\top B v^{k+1}$;
      5: $z^{k+1} = \mathrm{prox}_{\alpha\lambda\|\cdot\|_1}(w^{k+1})$;
      6: $u^{k+1} = u^k + \rho(D\theta^{k+1} - z^{k+1})$.
      7: end while
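    As a complement to the pseudocode, here is a minimal dense-linear-algebra sketch of Algorithm 1 in Python. It is our own translation under stated assumptions ($D \in \mathbb{R}^{m \times n}$, $B$ conformable, a fixed iteration count as the stopping criterion) rather than the authors' implementation:

```python
import numpy as np

def soft(x, t):
    """Element-wise soft thresholding, the prox of t*||.||_1 (Eq. (2.10))."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def gme_tv_admm(y, D, B, lam, rho=1.0, alpha=1.0, n_iter=300):
    """Sketch of Algorithm 1 for the GME-TV denoising model (3.1)."""
    n, m = y.size, D.shape[0]
    theta, z, u, v = y.copy(), D @ y, np.zeros(m), np.zeros(m)
    BtB = B.T @ B
    A_inv = np.linalg.inv(np.eye(n) / n + rho * D.T @ D)  # theta-update matrix, Eq. (3.35)
    for _ in range(n_iter):
        theta = A_inv @ (y / n + D.T @ (rho * z - u))                 # Eq. (3.35)
        v = soft(v + BtB @ (z - v), 1.0)                              # Eq. (3.41)
        w = ((1 - alpha * rho) * z + alpha * lam * (BtB @ (z - v))
             + alpha * rho * (D @ theta) + alpha * u)                 # Eq. (3.42)
        z = soft(w, alpha * lam)                                      # prox step in (3.42)
        u = u + rho * (D @ theta - z)                                 # Eq. (3.43)
    return theta
```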

    The total computational complexity of Algorithm 1 is dominated by the $\theta$-subproblem, which involves the matrix inversion $\left(\frac{1}{n}I + \rho D^\top D\right)^{-1}$, costing $O(n^3)$ for $D \in \mathbb{R}^{n \times n}$ and $\theta \in \mathbb{R}^n$. Fortunately, under periodic boundary conditions for $\theta$, $D^\top D$ can be represented as a block circulant matrix with circulant blocks. In this scenario, the $\theta$-subproblem can be solved with one forward fast Fourier transform (FFT) and one inverse FFT, each costing $O(n\log n)$ [43]. Consequently, the total computational complexity of Algorithm 1 is significantly reduced to $O(n\log n)$, a substantial improvement over the original $O(n^3)$.
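    A sketch of this FFT shortcut for the 1D case (ours; it assumes the cyclic difference operator under periodic boundaries, whose $D^\top D$ has DFT eigenvalues $2 - 2\cos(2\pi k/n)$):

```python
import numpy as np

def theta_update_fft(b, rho, n):
    """Solve (I/n + rho * D^T D) theta = b in O(n log n) for circulant D."""
    k = np.arange(n)
    eig = 2.0 - 2.0 * np.cos(2.0 * np.pi * k / n)   # eigenvalues of D^T D
    return np.real(np.fft.ifft(np.fft.fft(b) / (1.0 / n + rho * eig)))
```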

    Algorithm 1 implements the ADMM updates for the augmented Lagrangian introduced in Eq (3.34). We now provide a convergence guarantee for Algorithm 1.

    Theorem 3.3. By selecting an appropriate parameter $\rho$, the primal residual $r^k = D\theta^k - z^k$ and the dual residual $s^{k+1} = \rho D^\top(z^{k+1} - z^k)$ of Algorithm 1 satisfy $\lim_{k\to\infty}\|r^k\|_2^2 = 0$ and $\lim_{k\to\infty}\|s^k\|_2^2 = 0$.

    Proof. The proof follows the ideas of Proposition 1 in [44] and Theorem 3 in [36].

    Considering the augmented Lagrangian $L(\theta, z, u)$ as a function of $z$ with the other variables fixed, we can derive

    $L(\theta, z, u) = \frac{1}{2n}\|\theta - y\|_2^2 + \lambda\Psi_B^{GME}(z) + u^\top(D\theta - z) + \frac{\rho}{2}\|z - D\theta\|_2^2 = \frac{1}{2n}\|\theta - y\|_2^2 + \lambda\Psi_B^{GME}(z) + \frac{\rho}{2}\|z\|_2^2 - (u + \rho D\theta)^\top z + u^\top D\theta + \frac{\rho}{2}\|D\theta\|_2^2$. (3.44)

    As stated in Corollary 3.1, there exists $\mu > 0$ such that $\lambda\Psi_B^{GME}(z) + \frac{\mu}{2}\|z\|_2^2$ is convex. Additionally, note that the terms $\frac{1}{2n}\|\theta - y\|_2^2 + u^\top D\theta + \frac{\rho}{2}\|D\theta\|_2^2$ are independent of $z$. Therefore, by choosing $\rho \ge \mu$, $L(\theta, z, u)$ is convex with respect to each variable $\theta$, $z$, and $u$ separately. This property enables us to apply Theorem 5.1 in [45]. Consequently, Algorithm 1 converges toward limit points $\theta^{\star}$, $z^{\star}$, and $u^{\star}$. This implies that, for the dual residual, we have

    $\lim_{k\to\infty}\|s^k\|_2^2 = \|\rho D^\top(z^{\star} - z^{\star})\|_2^2 = 0$. (3.45)

    For the primal residual, it is worth noting that the update step for $u$ in Algorithm 1 implies that, for all $k, t \ge 0$,

    $u^{k+t} = u^k + \rho\sum_{i=1}^{t}\left(D\theta^{k+i} - z^{k+i}\right)$. (3.46)

    For fixed $t$ and letting $k \to \infty$, we have

    $u^{\star} = u^{\star} + t\rho(D\theta^{\star} - z^{\star})$, (3.47)

    where $t \neq 0$, and thus $D\theta^{\star} - z^{\star} = 0$. Therefore, it follows that

    $\lim_{k\to\infty}\|r^k\|_2^2 = \|D\theta^{\star} - z^{\star}\|_2^2 = 0$. (3.48)

    Theorem 3.3 establishes that Algorithm 1 achieves both primal and dual feasibility in the limit. It also demonstrates the equivalence between the augmented Lagrangian form (3.34), with fixed $z$ and $u$, and the original form (3.33). Each iteration of Algorithm 1 yields a stationary point $\theta^{\star}$ of (3.34) with fixed $z$ and $u$, which implies that $\theta^{\star}$ is also a stationary point of (3.33).

    In this section, we will compare the denoising performance of the proposed GME-TV regularization with the standard TV regularization [46] and MC-TV regularization [9,17] through 1D and 2D denoising experiments.

    In this experiment, we evaluate the performance of different regularizations for 1D signal denoising. Initially, a piecewise constant 1D signal with a length of $n = 180$ is generated and degraded by AWGN with noise level $\sigma = 0.9$. Subsequently, GME-TV, MC-TV, and standard TV regularization are applied for denoising. The denoising accuracy is quantified using the root mean squared error (RMSE), which is defined as

    $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{\theta}_i - \theta_i)^2}$. (4.1)

    The regularization parameter $\lambda$ and the non-convex control matrix parameter $B$ play a crucial role in determining the denoising performance of GME-TV. We set these two hyperparameters as follows: First, we plot the curve of RMSE with respect to $\lambda$ and select the value of $\lambda$ with the lowest RMSE. Second, based on (3.4), the matrix parameter $B$ is determined by $\zeta$; therefore, once $\lambda$ is fixed, we plot the curve of RMSE with respect to $\zeta$ and choose the value of $\zeta$ with the lowest RMSE, as sketched below. The remaining hyperparameters $\alpha$ and $\rho$ in Algorithm 1 are simply set as $\alpha = 1$ and $\rho = 1$, following the strategy proposed in [9]. The hyperparameters $\lambda$ and $\zeta$ for MC-TV are set in the same manner as for GME-TV. For TV, since there is no non-convex control parameter, we set $\lambda$ in the same way as for GME-TV. The remaining hyperparameters of MC-TV and TV are set by default according to [9] and [46], respectively.
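    The search described above amounts to two one-dimensional sweeps. A hedged sketch (ours; the test signal and the grids are illustrative, and `denoise(y, lam, zeta)` is a stand-in for Algorithm 1 with $B$ built from $\zeta$ via (3.4)):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.repeat([0.0, 2.0, -1.0], 60)        # piecewise constant, n = 180
y = theta_true + 0.9 * rng.standard_normal(theta_true.size)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))           # Eq. (4.1)

def sweep(denoise):
    """Sweep lambda first (zeta fixed), then sweep zeta with the best lambda."""
    lambdas = np.logspace(-2, 1, 20)
    best_lam = min(lambdas, key=lambda l: rmse(denoise(y, l, 0.8), theta_true))
    zetas = np.linspace(0.05, 0.95, 10)
    best_zeta = min(zetas, key=lambda z: rmse(denoise(y, best_lam, z), theta_true))
    return best_lam, best_zeta
```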

    The changes in RMSE values of different denoising models with λ and ζ are illustrated in Figure 1. It can be observed that the denoising performance of the models varies with different hyperparameter values. Moreover, GME-TV demonstrates superior denoising performance compared to other models.

    Figure 1.  (a) RMSE as a function of the regularization parameter λ; (b) RMSE as a function of the non-convex control parameter ζ.

    We set the hyperparameters of the different denoising models according to the results in Figure 1 and show the denoising results of the different models in Figure 2. As depicted in Figure 2, TV regularization effectively mitigates noise. However, it tends to underestimate the magnitude of certain jump-type discontinuities and loses the piecewise constant character of the original noiseless signal. In contrast, GME-TV and MC-TV regularization provide a more precise estimation of the breakpoints in piecewise constant signals and demonstrate superior denoising performance compared to TV regularization. The RMSE values also substantiate the inherent superiority of GME-TV regularization.

    Figure 2.  1D signal denoising using different regularizations.

    In this experiment, we test the performance of different regularizations on image denoising. Four commonly used test images, namely "Lena", "Man", "QRcode", and "Geometric", are utilized, as shown in Figures 3(a), 4(a), 5(a), and 6(a). All images have a size of 512×512 with grayscale values ranging from 0 to 255. All images are then degraded by additive white Gaussian noise with noise level $\sigma = 0.9$. The noisy images are shown in Figures 3(e), 4(e), 5(e), and 6(e).

    Figure 3.  Image denoising using different regularizations. (a) Original image; (b) TV denoising result, PSNR = 32.0488 dB; (c) MC-TV denoising result, PSNR = 33.5046 dB; (d) GME-TV denoising result, PSNR = 36.4250 dB; (e) Noisy image; (f) Difference between (a) and (b); (g) Difference between (a) and (c); (h) Difference between (a) and (d).
    Figure 4.  Image denoising using different regularizations. (a) Original image; (b) TV denoising result, PSNR = 32.0488 dB; (c) MC-TV denoising result, PSNR = 33.5046 dB; (d) GME-TV denoising result, PSNR = 36.4250 dB; (e) Noisy image; (f) Difference between (a) and (b); (g) Difference between (a) and (c); (h) Difference between (a) and (d).
    Figure 5.  Image denoising using different regularizations. (a) Original image; (b) TV denoising result, PSNR = 41.5842 dB; (c) MC-TV denoising result, PSNR = 44.0425 dB; (d) GME-TV denoising result, PSNR = 54.2998 dB; (e) Noisy image; (f) Difference between (a) and (b); (g) Difference between (a) and (c); (h) Difference between (a) and (d).
    Figure 6.  Image denoising using different regularizations. (a) Original image; (b) TV denoising result, PSNR = 41.1239 dB; (c) MC-TV denoising result, PSNR = 41.3220 dB; (d) GME-TV denoising result, PSNR = 43.1082 dB; (e) Noisy image; (f) Difference between (a) and (b); (g) Difference between (a) and (c); (h) Difference between (a) and (d).

    The image denoising accuracy is quantified by the peak signal-to-noise ratio (PSNR), which is defined as

    $\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$, (4.2)

    where $\mathrm{MAX} = 255$ is the maximum pixel value of the images, and

    $\mathrm{MSE} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(\theta_{ij} - \hat{\theta}_{ij}\right)^2$, (4.3)

    where $\theta_{ij}$ and $\hat{\theta}_{ij}$ are the pixel values of the original image $\theta$ and the denoised image $\hat{\theta}$, respectively, with $M = N = 512$ for all images. The higher the PSNR value, the smaller the difference between the denoised image and the original image.
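    For completeness, a direct transcription of (4.2) and (4.3) into Python (our own helper, not from the paper):

```python
import numpy as np

def psnr(orig, denoised, max_val=255.0):
    """Peak signal-to-noise ratio in dB per Eqs. (4.2)-(4.3)."""
    mse = np.mean((np.asarray(orig, float) - np.asarray(denoised, float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```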

    We employ the same strategy as in the 1D denoising experiment to determine the hyperparameter values of the different denoising models, and show the denoising results of each model in Figures 3–6. To enhance the visual contrast, in addition to enlarging a small patch of each image, we also highlight the difference between the denoised image and the original one. As shown in Figures 3–6, all three denoising models remove noise well. Compared with the TV denoising results, MC-TV and GME-TV not only enhance image texture and other intricate details but also sharpen edges more effectively, yielding a superior visual outcome. Moreover, the PSNR values further demonstrate the superior performance of GME-TV regularization, as evidenced by the improved PSNR values for all four images compared to TV and MC-TV regularization.

    In this paper, we proposed CNC-TV regularization as a valuable alternative to TV regularization. This approach effectively addresses the underestimation of signal discontinuities while ensuring the global convexity of the objective function. Empirical evidence demonstrates the superior performance of CNC-TV regularization compared to TV regularization. Furthermore, our theoretical analysis establishes a foundation for a deeper understanding of the estimation properties of CNC-TV regularization. Additionally, we demonstrated that CNC-TV regularization exhibits superior performance in both 1D and 2D denoising experiments.

    In this work, we mainly focused on the theoretical analysis of CNC-TV. These findings suggest its potential significance in future applications such as image denoising, image reconstruction, and seismic data processing. There remain challenges in the implementation of the CNC-TV model, such as the strategy for setting the non-convex control matrix parameter $B$, which we will study in future work.

    Yating Zhu: Conceptualization, Methodology, Software, Writing–original draft; Zixun Zeng: Software, Data curation; Zhong Chen: Funding acquisition, Supervision; Deqiang Zhou: Data curation, Formal analysis; Jian Zou: Project administration, Writing–review & editing, Supervision. All authors have read and agreed to the published version of the manuscript.

    This research was funded by the National Natural Science Foundation of China (grant number 62273060), and the Undergraduate Training Program of Yangtze University for Innovation and Entrepreneurship (Yz2023302).

    All authors declare no conflicts of interest.



    [1] M. Elad, B. Kawar, G. Vaksman, Image denoising: the deep learning revolution and beyond–a survey paper, SIAM J. Imaging Sci., 16 (2023), 1594–1654. https://doi.org/10.1137/23M1545859 doi: 10.1137/23M1545859
    [2] L. I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D, 60 (1992), 259–268. https://doi.org/10.1016/0167-2789(92)90242-F doi: 10.1016/0167-2789(92)90242-F
    [3] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock, An introduction to total variation for image analysis, In: Theoretical foundations and numerical methods for sparse recovery, Berlin, New York: De Gruyter, 2010,263–340. https://doi.org/10.1515/9783110226157.263
    [4] M. Burger, S. Osher, A guide to the TV zoo, In: Level set and PDE based reconstruction methods in imaging, Cham: Springer, 2013, 1–70. https://doi.org/10.1007/978-3-319-01712-9_1
    [5] M. Pragliola, L. Calatroni, A. Lanza, F. Sgallari, On and beyond total variation regularization in imaging: the role of space variance, SIAM Rev., 65 (2023), 601–685. https://doi.org/10.48550/arXiv.2104.03650 doi: 10.48550/arXiv.2104.03650
    [6] D. H. Zhao, R. Y. Huang, L. Feng, Proximity algorithms for the L1L2/TVα image denoising model, AIMS Math., 9 (2024), 16643–16665. https://doi.org/10.3934/math.2024807 doi: 10.3934/math.2024807
    [7] A. Ben-loghfyry, A. Hakim, A bilevel optimization problem with deep learning based on fractional total variation for image denoising, Multimed. Tools Appl., 83 (2024), 28595–28614. https://doi.org/10.1007/s11042-023-16583-4 doi: 10.1007/s11042-023-16583-4
    [8] A. Lanza, S. Morigi, F. Sgallari, Convex image denoising via non-convex regularization with parameter selection, J. Math. Imaging Vis., 56 (2016), 195–220. https://doi.org/10.1007/s10851-016-0655-7 doi: 10.1007/s10851-016-0655-7
    [9] I. Selesnick, A. Lanza, S. Morigi, F. Sgallari, Non-convex total variation regularization for convex denoising of signals, J. Math. Imaging Vis., 62 (2020), 825–841. https://doi.org/10.1007/s10851-019-00937-5 doi: 10.1007/s10851-019-00937-5
    [10] M. Kang, M. Jung, Nonconvex fractional order total variation based image denoising model under mixed stripe and Gaussian noise, AIMS Math., 9 (2024), 21094–21124. https://doi.org/10.3934/math.20241025 doi: 10.3934/math.20241025
    [11] I. W. Selesnick, A. Parekh, I. Bayram, Convex 1-D total variation denoising with non-convex regularization, IEEE Signal Process. Lett., 22 (2015), 141–144. https://doi.org/10.1109/LSP.2014.2349356 doi: 10.1109/LSP.2014.2349356
    [12] J. Darbon, M. Sigelle, Image restoration with discrete constrained total variation part Ⅱ: Levelable functions, convex priors and non-convex cases, J. Math. Imaging Vis., 26 (2006), 277–291. https://doi.org/10.1007/s10851-006-0644-3 doi: 10.1007/s10851-006-0644-3
    [13] H. L. Zhang, L. M. Tang, Z. Fang, C. C. Xiang, C. Y. Li, Nonconvex and nonsmooth total generalized variation model for image restoration, Signal Process., 143 (2018), 69–85. https://doi.org/10.1016/j.sigpro.2017.08.021 doi: 10.1016/j.sigpro.2017.08.021
    [14] M. Hintermüller, T. Wu, Nonconvex TVq-models in image restoration: analysis and a trust-region regularization–based superlinearly convergent solver, SIAM J. Imaging Sci., 6 (2013), 1385–1415. https://doi.org/10.1137/110854746 doi: 10.1137/110854746
    [15] Z. Fang, L. M. Tang, L. Wu, H. X. Liu, A nonconvex TVq-ℓ1 regularization model and the ADMM based algorithm, Sci. Rep., 12 (2022), 7942. https://doi.org/10.1038/s41598-022-11938-7 doi: 10.1038/s41598-022-11938-7
    [16] A. Lanza, S. Morigi, I. W. Selesnick, F. Sgallari, Sparsity-inducing nonconvex nonseparable regularization for convex image processing, SIAM J. Imaging Sci., 12 (2019), 1099–1134. https://doi.org/10.1137/18M1199149 doi: 10.1137/18M1199149
    [17] A. Lanza, S. Morigi, I. W. Selesnick, F. Sgallari, Convex non-convex variational models, In: Handbook of mathematical models and algorithms in computer vision and imaging: mathematical imaging and vision, Cham: Springer, 2023, 3–59. https://doi.org/10.1007/978-3-030-98661-2_61
    [18] Y. L. Liu, H. Q. Du, Z. X. Wang, W. B. Mei, Convex MR brain image reconstruction via non-convex total variation minimization, Int. J. Imaging Syst. Technol., 28 (2018), 246–253. https://doi.org/10.1002/ima.22275 doi: 10.1002/ima.22275
    [19] M. R. Shen, J. C. Li, T. Zhang, J. Zou, Magnetic resonance imaging reconstruction via non-convex total variation regularization, Int. J. Imaging Syst. Technol., 31 (2021), 412–424. https://doi.org/10.1002/ima.22463 doi: 10.1002/ima.22463
    [20] J. L. Li, Z. Y. Xie, G. Q. Liu, L. Yang, J. Zou, Diffusion optical tomography reconstruction based on convex-nonconvex graph total variation regularization, Math. Methods Appl. Sci., 46 (2023), 4534–4545. https://doi.org/10.1002/mma.8777 doi: 10.1002/mma.8777
    [21] Q. Li, A comprehensive survey of sparse regularization: fundamental, state-of-the-art methodologies and applications on fault diagnosis, Expert Syst. Appl., 229 (2023), 120517.
    [22] H. B. Lin, F. T. Wu, G. L. He, Rolling bearing fault diagnosis using impulse feature enhancement and nonconvex regularization, Mech. Syst. Signal Process., 142 (2020), 106790. https://doi.org/10.1016/j.ymssp.2020.106790 doi: 10.1016/j.ymssp.2020.106790
    [23] J. P. Wang, G. L. Xu, C. L. Li, Z. S. Wang, F. J. Yan, Surface defects detection using non-convex total variation regularized RPCA with kernelization, IEEE Trans. Instrum. Meas., 70 (2021), 1–13. https://doi.org/10.1109/TIM.2021.3056738 doi: 10.1109/TIM.2021.3056738
    [24] T. H. Wen, Z. Chen, T. Zhang, J. Zou, Graph-based semi-supervised learning with non-convex graph total variation regularization, Expert Syst. Appl., 255 (2024), 124709. https://doi.org/10.1016/j.eswa.2024.124709 doi: 10.1016/j.eswa.2024.124709
    [25] G. Scrivanti, É. Chouzenoux, J. C. Pesquet, A CNC approach for directional total variation, In: 2022 30th European Signal Processing Conference (EUSIPCO), 488–492. https://doi.org/10.23919/EUSIPCO55093.2022.9909763
    [26] H. Q. Du, Y. L. Liu, Minmax-concave total variation denoising, Signal Image Video Process., 12 (2018), 1027–1034. https://doi.org/10.1007/s11760-018-1248-2 doi: 10.1007/s11760-018-1248-2
    [27] J. C. Hütter, P. Rigollet, Optimal rates for total variation denoising, In: Conference on Learning Theory, 49 (2016), 1115–1146.
    [28] J. Q. Fan, R. Z. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc., 96 (2001), 1348–1360. https://doi.org/10.1198/016214501753382273 doi: 10.1198/016214501753382273
    [29] A. H. Al-Shabili, Y. Feng, I. Selesnick, Sharpening sparse regularizers via smoothing, IEEE Open J. Signal Process., 2 (2021), 396–409. https://doi.org/10.1109/OJSP.2021.3104497 doi: 10.1109/OJSP.2021.3104497
    [30] I. Selesnick, Sparse regularization via convex analysis, IEEE Trans. Signal Process., 65 (2017), 4481–4494. https://doi.org/10.1109/TSP.2017.2711501 doi: 10.1109/TSP.2017.2711501
    [31] E. Mammen, S. van de Geer, Locally adaptive regression splines, Ann. Statist., 25 (1997), 387–413. https://doi.org/10.1214/aos/1034276635 doi: 10.1214/aos/1034276635
    [32] D. Needell, R. Ward, Near-optimal compressed sensing guarantees for total variation minimization, IEEE. Trans. Image Process., 22 (2013), 3941–3949. https://doi.org/10.1109/TIP.2013.2264681 doi: 10.1109/TIP.2013.2264681
    [33] Y. X. Wang, J. Sharpnack, A. J. Smola, R. J. Tibshirani, Trend filtering on graphs, J. Mach. Learn. Res., 17 (2016), 1–41.
    [34] V. Sadhanala, Y. X. Wang, R. J. Tibshirani, Total variation classes beyond 1d: minimax rates, and the limitations of linear smoothers, In: Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016.
    [35] S. Chatterjee, S. Goswami, New risk bounds for 2D total variation denoising, IEEE Trans. Inform. Theory, 67 (2021), 4060–4091. https://doi.org/10.1109/TIT.2021.3059657 doi: 10.1109/TIT.2021.3059657
    [36] R. Varma, H. Lee, J. Kovačević, Y. Chi, Vector-valued graph trend filtering with non-convex penalties, IEEE Trans. Signal Inform. Process. Netw., 6 (2020), 48–62. https://doi.org/10.1109/TSIPN.2019.2957717 doi: 10.1109/TSIPN.2019.2957717
    [37] A. Guntuboyina, D. Lieu, S. Chatterjee, B. Sen, Adaptive risk bounds in univariate total variation denoising and trend filtering, Ann. Statist., 48 (2020), 205–229. https://doi.org/10.1214/18-AOS1799 doi: 10.1214/18-AOS1799
    [38] F. Ortelli, S. van de Geer, Adaptive rates for total variation image denoising, J. Mach. Learn. Res., 21 (2020), 1–38.
    [39] F. Ortelli, S. van de Geer, Prediction bounds for higher order total variation regularized least squares, Ann. Statist., 49 (2021), 2755–2773. https://doi.org/10.1214/21-AOS2054 doi: 10.1214/21-AOS2054
    [40] N. Parikh, S. Boyd, Proximal algorithms, Found. Trends Optim., 1 (2014), 127–239. https://doi.org/10.1561/2400000003
    [41] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), 1–122. http://dx.doi.org/10.1561/2200000016 doi: 10.1561/2200000016
    [42] S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities: a nonasymptotic theory of independence, Oxford University Press, 2013. https://doi.org/10.1093/acprof:oso/9780199535255.001.0001
    [43] N. Kumar, M. Sonkar, G. Bhatnagar, Efficient image restoration via non-convex total variation regularization and ADMM optimization, Appl. Math. Model., 132 (2024), 428–453. https://doi.org/10.1016/j.apm.2024.04.055 doi: 10.1016/j.apm.2024.04.055
    [44] S. J. Ma, J. Huang, A concave pairwise fusion approach to subgroup analysis, J. Amer. Statist. Assoc., 112 (2017), 410–423. https://doi.org/10.1080/01621459.2016.1148039 doi: 10.1080/01621459.2016.1148039
    [45] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109 (2001), 475–494. https://doi.org/10.1023/A:1017501703105 doi: 10.1023/A:1017501703105
    [46] L. Condat, A direct algorithm for 1-D total variation denoising, IEEE Signal Process. Lett., 20 (2013), 1054–1057. https://doi.org/10.1109/LSP.2013.2278339 doi: 10.1109/LSP.2013.2278339
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
