Research article

New regularization methods for convolutional kernel tensors

  • Received: 28 June 2023 Revised: 21 August 2023 Accepted: 31 August 2023 Published: 12 September 2023
  • MSC : 15B05, 65F15

• Convolution is a very basic and important operation for convolutional neural networks. For neural network training, how to bound the singular values of the convolutional layers is currently a popular research topic. Each convolutional layer is represented by a tensor, which corresponds to a structured transformation matrix. The objective is to ensure that the singular values of each transformation matrix are bounded around 1 by changing the entries of the tensor. We propose three new regularization terms for a convolutional kernel tensor and derive the gradient descent algorithm for each penalty function. Numerical examples are presented to demonstrate the effectiveness of the algorithms.

    Citation: Pei-Chang Guo. New regularization methods for convolutional kernel tensors[J]. AIMS Mathematics, 2023, 8(11): 26188-26198. doi: 10.3934/math.20231335




Convolutional neural networks (CNNs) are an important class of deep learning models, and they have been applied successfully to image understanding in recent years. The use of CNNs is now the dominant approach for almost all recognition and detection tasks [8]. Despite this great success, the training of deep convolutional networks remains difficult both theoretically and practically. It has been shown that exploiting orthogonality to regularize convolutional layers can improve the stability and performance of CNNs and alleviate the issue of unstable gradients [2,4,9,16,17,21,24]. In this paper, we propose three new regularization terms for convolutional layers and derive the gradient descent algorithm for each penalty function.

First we introduce some notation used in this paper. The symbol $*$ denotes the convolution operation in neural networks. $\mathrm{vec}(X)$ denotes the vectorization of $X$: when $X$ is a matrix, $\mathrm{vec}(X)$ is the column vector obtained by stacking the columns of $X$ on top of one another; when $X$ is a tensor, $\mathrm{vec}(X)$ is the column vector obtained by stacking the columns of the flattening of $X$ along the first index (see [7] for the flattening of a tensor). The notation $\lceil\cdot\rceil$ rounds a number up to the nearest integer greater than or equal to it. For a matrix $A$, $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$ denote the largest and smallest singular values, respectively.

The tensor is an important concept in many disciplines [5,15]. Tensors can represent multi-relational data or nonlinear relationships. In CNNs, the convolution is a basic and important operation, and it is represented by a tensor. Each convolution operation is associated with a structured linear transformation matrix. Given a convolutional kernel tensor $K$, $Y=K*X$ is mathematically equivalent to

$$\mathrm{vec}(Y)=M\,\mathrm{vec}(X), \quad (1.1)$$

    where M is the structured transformation matrix.

In the field of deep learning, there exist different forms of convolution arithmetic because of different choices of strides and padding patterns [6]. In this paper, without loss of generality, the "same" convolution with unit strides is used to introduce our method. For the one-channel case, a convolutional kernel is represented by a matrix $K\in\mathbb{R}^{k\times k}$ and the input is a matrix $X\in\mathbb{R}^{N\times N}$; then, the output $Y\in\mathbb{R}^{N\times N}$ is computed by

$$Y_{r,s}=(K*X)_{r,s}=\sum_{p\in\{1,\dots,k\}}\ \sum_{q\in\{1,\dots,k\}} X_{r-m+p,\,s-m+q}\,K_{p,q},$$

where $m=\lceil k/2\rceil$ and $X_{i,j}=0$ if $i\le 0$ or $i>N$, or if $j\le 0$ or $j>N$.
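To make the indexing above concrete, here is a minimal MATLAB sketch of this one-channel "same" convolution with unit strides and zero padding; the function name conv_same and the padding layout are illustrative choices, not notation from the paper.

```matlab
% A minimal sketch of the one-channel "same" convolution with unit strides
% and zero padding; conv_same is a hypothetical helper name.
function Y = conv_same(K, X)
    k = size(K, 1);                              % kernel is k x k
    N = size(X, 1);                              % input is N x N
    m = ceil(k / 2);
    Xpad = zeros(N + k - 1);                     % zero padding realizes X_{i,j} = 0 outside 1..N
    Xpad(m:m+N-1, m:m+N-1) = X;
    Y = zeros(N);
    for r = 1:N
        for s = 1:N
            patch = Xpad(r:r+k-1, s:s+k-1);      % entries X_{r-m+p, s-m+q}, p,q = 1..k
            Y(r, s) = sum(sum(patch .* K));      % sum over p,q of X_{r-m+p,s-m+q} K_{p,q}
        end
    end
end
```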

In deep convolutional networks, multi-channel convolutions are more common, and a convolutional kernel is represented by a four-dimensional tensor. For a kernel tensor $K\in\mathbb{R}^{k\times k\times g\times h}$ and an input represented by a three-dimensional tensor $X\in\mathbb{R}^{N\times N\times g}$, the output $Y=K*X$, $Y\in\mathbb{R}^{N\times N\times h}$, is given by

$$Y_{r,s,c}=(K*X)_{r,s,c}=\sum_{d\in\{1,\dots,g\}}\ \sum_{p\in\{1,\dots,k\}}\ \sum_{q\in\{1,\dots,k\}} X_{r-m+p,\,s-m+q,\,d}\,K_{p,q,d,c},$$

where $m=\lceil k/2\rceil$ and $X_{i,j,d}=0$ if $i\le 0$ or $i>N$, or if $j\le 0$ or $j>N$.
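The multi-channel formula can be sketched by reusing the one-channel routine above: each output channel is the sum, over the input channels, of a one-channel convolution with the corresponding kernel slice. The following is only an illustrative implementation consistent with the formula above, with conv_same_mc a hypothetical name.

```matlab
% A minimal sketch of the multi-channel "same" convolution, assuming the
% conv_same helper sketched earlier.
function Y = conv_same_mc(K, X)
    [~, ~, g, h] = size(K);                      % kernel is k x k x g x h
    N = size(X, 1);                              % input is N x N x g
    Y = zeros(N, N, h);
    for c = 1:h
        for d = 1:g
            % slice K(:,:,d,c) maps input channel d to output channel c
            Y(:, :, c) = Y(:, :, c) + conv_same(K(:, :, d, c), X(:, :, d));
        end
    end
end
```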

Deep neural networks are usually layered. The singular values of the Jacobian of a layer bound the factors by which the norms of forward-propagated and backpropagated signals change. In the backward direction, if the singular values of the layers are all close to zero or all significantly larger than 1, gradient vanishing or gradient exploding will occur, which are fundamental obstacles for training deep networks [8,11,17]. In the forward direction, if the singular values of the layers are all bounded, the computations will be more stable, the generalization error can be bounded and the robustness to adversarial examples can be improved [1,4,13,19,20,25]. Therefore, it is desirable to constrain the operator norms of network layers. The stability and Hopf bifurcation of some delayed neural networks have also been investigated [14,22,23]. Convolutional layers are important components of CNNs. In this paper, we give three new regularization terms for the singular values of convolutional layers and develop the corresponding gradient descent algorithms; thus, we can modify the singular values of $M$ in (1.1) as desired by changing the entries of $K$.

In the field of deep learning, there have been many papers studying how to enforce orthogonality or spectral norm regularization on the weights of a neural network [2,4,16,24]. Unlike the preceding papers, including [2,4,16,24] and the references therein, in this paper we handle convolutions differently. Those works reshape the kernel $K\in\mathbb{R}^{k\times k\times g\times h}$ into an $h\times(gk^2)$ matrix and then enforce the constraint directly on that matrix. We enforce the constraint on the transformation matrix $M$ associated with the convolutional kernel tensor $K$. In [17], the authors project a convolutional layer onto an operator-norm ball and confirm through numerical experiments that this is an effective regularizer. Although the projection method in [17] can effectively prevent the singular values of the transformation matrix from being large, it cannot prevent the singular values from being too small. In [10,21], regularization methods are given to ensure that the transformation matrix is nearly orthogonal, where the largest and smallest singular values are modified simultaneously.

In this paper, we present new regularization methods for the convolutional kernel tensor $K$. We have two main contributions. First, the proposed regularization terms can decrease the largest singular value and increase the smallest singular value of convolutional layers independently. Thus, the regularization is more flexible and targeted, depending on the practical need during the training process. Existing methods either have no clear impact on the singular values of the transformation matrix $M$, cannot effectively prevent the singular values from becoming too small, or can only simultaneously decrease the largest singular value and increase the smallest singular value [10,16,17,21,24]. Second, we give formulas for the partial derivatives of the proposed penalty functions with respect to the entries of the convolutional kernel tensor. These are first-order perturbation results, revealing how each entry of a convolutional kernel tensor affects the singular values of the associated structured transformation matrix.

The rest of the paper is organized as follows: In Section 2, as a warm-up, we handle the one-channel case in which the kernel $K$ is a $k\times k$ matrix; we propose the penalty functions and give the formulas for computing partial derivatives. In Section 3, we handle the multi-channel case, where the kernel is represented by a tensor $K\in\mathbb{R}^{k\times k\times g\times h}$; we again propose the penalty functions and give the gradient descent algorithms. In Section 4, we present numerical results to show that the proposed methods are effective. In Section 5, some conclusions and discussions are given.

For the one-channel case, the convolutional kernel is a $k\times k$ matrix, and there is one input channel and one output channel. Suppose that the convolutional kernel $K$ is a $3\times3$ matrix and the input data matrix is $N\times N$; we show the form of the associated structured transformation matrix. Here,

$$K=\begin{pmatrix} k_{11} & k_{12} & k_{13}\\ k_{21} & k_{22} & k_{23}\\ k_{31} & k_{32} & k_{33} \end{pmatrix}.$$

For $Y=K*X$, the linear transformation matrix $M$ satisfies the equation $\mathrm{vec}(Y)=M\,\mathrm{vec}(X)$, so we can get the linear transformation matrix $M$ as

$$M=\begin{pmatrix} A_0 & A_1 & 0 & \cdots & 0\\ A_{-1} & A_0 & A_1 & \ddots & \vdots\\ 0 & A_{-1} & A_0 & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & A_1\\ 0 & \cdots & 0 & A_{-1} & A_0 \end{pmatrix}, \quad (2.1)$$

    where

$$A_0=\begin{pmatrix} k_{22} & k_{32} & 0 & \cdots & 0\\ k_{12} & k_{22} & k_{32} & \ddots & \vdots\\ 0 & k_{12} & k_{22} & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & k_{32}\\ 0 & \cdots & 0 & k_{12} & k_{22} \end{pmatrix},\quad A_1=\begin{pmatrix} k_{23} & k_{33} & 0 & \cdots & 0\\ k_{13} & k_{23} & k_{33} & \ddots & \vdots\\ 0 & k_{13} & k_{23} & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & k_{33}\\ 0 & \cdots & 0 & k_{13} & k_{23} \end{pmatrix},$$
$$A_{-1}=\begin{pmatrix} k_{21} & k_{31} & 0 & \cdots & 0\\ k_{11} & k_{21} & k_{31} & \ddots & \vdots\\ 0 & k_{11} & k_{21} & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & k_{31}\\ 0 & \cdots & 0 & k_{11} & k_{21} \end{pmatrix}.$$

For this case, the $N^2\times N^2$ matrix $M$ is a doubly blocked banded Toeplitz matrix, i.e., a banded block Toeplitz matrix whose blocks are themselves banded Toeplitz matrices. For details about Toeplitz matrices, we recommend the references [3,12]. We use $T$ to represent the set of all matrices with the same structure as $M$ in (2.1), i.e., doubly blocked banded Toeplitz matrices with a fixed bandwidth.
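Since $\mathrm{vec}(Y)=M\,\mathrm{vec}(X)$ and the convolution is linear in $X$, the $j$-th column of $M$ equals $\mathrm{vec}(K*E_j)$, where $\mathrm{vec}(E_j)$ is the $j$-th standard basis vector. The following MATLAB sketch builds $M$ this way; it assumes the hypothetical conv_same helper above and relies on MATLAB's column-major vectorization, which matches the vec convention of this paper.

```matlab
% A minimal sketch: assemble the N^2 x N^2 transformation matrix M column by
% column, assuming the conv_same helper sketched above.
function M = kernel_to_matrix(K, N)
    M = zeros(N^2, N^2);
    for j = 1:N^2
        E = zeros(N);
        E(j) = 1;                                % linear indexing: vec(E) is the j-th basis vector
        Y = conv_same(K, E);
        M(:, j) = Y(:);                          % column j of M is vec(K * E)
    end
end
```

For a $3\times3$ kernel, the matrix produced this way has exactly the doubly blocked banded Toeplitz structure (2.1).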

From the structure of $M$, we see that one entry of $K$ corresponds to more than one entry of $M$: the value of $K_{p,q}$ appears at several $(i,j)$ positions of the matrix $M$. In this section, we use $S$ to denote this index set, to which each $(i,j)$ position corresponding to $K_{p,q}$ belongs. That is to say, $m_{ij}=K_{p,q}$ for each $(i,j)\in S$ and $m_{ij}\neq K_{p,q}$ for each $(i,j)\notin S$.

Given a matrix $M$, the square of its Frobenius norm, $\|M\|_F^2$, is the sum of the squares of all the entries of $M$; it is also equal to the sum of the squares of all the singular values of $M$ [7]. We will use $\frac12\|M\|_F^2$ as a regularization term for the convolutional kernel $K$ to prevent the singular values from being too large, and we derive the formula for $\partial\,\frac12\|M\|_F^2/\partial K_{p,q}$. We first give the following simple lemma, which will be useful in the derivation.

Lemma 2.1. For $A\in\mathbb{R}^{n\times n}$, the partial derivative of the square of its Frobenius norm with respect to the entry $a_{ij}$ satisfies $\partial\|A\|_F^2/\partial a_{ij}=2a_{ij}$.

Proof. Combining
$$\partial\Big(\sum_{i,j}a_{ij}^2\Big)\Big/\partial a_{ij}=2a_{ij} \quad\text{with}\quad \|A\|_F^2=\sum_{i,j}a_{ij}^2,$$
we get $\partial\|A\|_F^2/\partial a_{ij}=2a_{ij}$.

As we see, one entry of $K$ corresponds to more than one entry of $M$. For the entry $K_{p,q}$, the index set $S$ gives its locations in $M$. According to the chain rule, in order to get $\partial\|M\|_F^2/\partial K_{p,q}$, we need to compute $\partial\|M\|_F^2/\partial m_{ij}$ for all $(i,j)\in S$ and take the sum. We summarize this analysis in the following theorem.

Theorem 2.1. Let $M\in\mathbb{R}^{n\times n}$ be the structured transformation matrix associated with the kernel $K\in\mathbb{R}^{k\times k}$. Given $(p,q)$, if $S$ denotes the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q}$, it holds that

$$\frac{\partial\,\frac12\|M\|_F^2}{\partial K_{p,q}}=\sum_{(i,j)\in S}m_{ij}. \quad (2.2)$$

Proof. As seen from the structure of $M$, each entry of $K$ corresponds to more than one entry of $M$. Given $(p,q)$, since $S$ denotes the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q}$, combining Lemma 2.1 with the chain rule gives
$$\frac{\partial\,\frac12\|M\|_F^2}{\partial K_{p,q}}=\frac12\sum_{(i,j)\in S}\frac{\partial\|M\|_F^2}{\partial m_{ij}}=\sum_{(i,j)\in S}m_{ij}.$$

As noted above, the square of the Frobenius norm of a matrix equals the sum of the squares of its singular values. Formula (2.2) can be used to implement a gradient descent algorithm for $\frac12\|M\|_F^2$; thus, we can change the entries of the convolutional kernel $K$ to make the singular values of $M$ smaller.
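As an illustration of (2.2), the gradient can be computed entrywise by summing the entries of $M$ over the index set $S$ of each kernel entry; the 0/1 indicator of $S$ is obtained as the transformation matrix of a basis kernel. The sketch below relies on the hypothetical kernel_to_matrix helper above and is a naive reference implementation rather than an efficient one.

```matlab
% A minimal sketch of formula (2.2): the gradient of (1/2)||M||_F^2 with
% respect to K(p,q) is the sum of the entries of M located in the index set S.
function G = grad_frob_half(K, N)
    k = size(K, 1);
    M = kernel_to_matrix(K, N);
    G = zeros(k);
    for p = 1:k
        for q = 1:k
            E = zeros(k);  E(p, q) = 1;          % basis kernel picks out the pattern of K(p,q)
            P = kernel_to_matrix(E, N);          % 0/1 indicator of the index set S
            G(p, q) = sum(sum(M .* P));          % sum of m_ij over (i,j) in S
        end
    end
end
```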

In this subsection, we show how to increase the smallest singular value of $M$ by modifying the entries of $K$. To compute $\partial\sigma_{\min}(M)/\partial K_{p,q}$, we need the following lemma, which is a perturbation result for a simple singular value of a matrix; see [18] for details.

Lemma 2.2. Let $A=[a_{ij}]\in\mathbb{R}^{m\times m}$, let $\sigma$ be a simple singular value of $A$, and let $u$ and $v$ be the normalized left and right singular vectors associated with $\sigma$, respectively. Then $\partial\sigma/\partial a_{ij}=u_i v_j$, i.e., $\partial\sigma/\partial A=uv^T$.

The value of $K_{p,q}$ appears at several $(i,j)$ positions of the matrix $M$, indexed by the set $S$. Therefore, we can use the chain rule together with Lemma 2.2 to obtain the next theorem.

Theorem 2.2. For the one-channel convolutional kernel $K\in\mathbb{R}^{k\times k}$, let $M\in\mathbb{R}^{n\times n}$ be the structured transformation matrix. Assume that $\sigma_{\min}(M)$ is simple and $\sigma_{\min}(M)>0$, and that $u$, $v$ are the normalized left and right singular vectors associated with $\sigma_{\min}(M)$. Given $(p,q)$, if $S$ denotes the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q}$, we have

$$\frac{\partial\sigma_{\min}(M)}{\partial K_{p,q}}=\sum_{(i,j)\in S}u(i)\,v(j). \quad (2.3)$$

Proof. Each entry of $K$ corresponds to more than one entry of $M$, and, given $(p,q)$, $S$ denotes the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q}$. Combining Lemma 2.2 with the chain rule, we get (2.3).

Formula (2.3) can be used to implement gradient descent for the penalty function $-\sigma_{\min}(M)$; thus, we can modify the entries of $K$ to increase $\sigma_{\min}(M)$.
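A sketch of (2.3) in the same style is given below; it assumes the hypothetical kernel_to_matrix helper above and that $\sigma_{\min}(M)$ is simple, and it uses the fact that, for the square matrix $M$, the last columns of $U$ and $V$ returned by MATLAB's svd correspond to the smallest singular value.

```matlab
% A minimal sketch of formula (2.3): the gradient of sigma_min(M) with respect
% to K(p,q) is the sum of u(i)*v(j) over the index set S.
function G = grad_sigma_min(K, N)
    k = size(K, 1);
    M = kernel_to_matrix(K, N);
    [U, ~, V] = svd(M);
    u = U(:, end);  v = V(:, end);               % singular vectors of sigma_min(M)
    R = u * v';                                  % rank-one matrix with entries u(i)*v(j)
    G = zeros(k);
    for p = 1:k
        for q = 1:k
            E = zeros(k);  E(p, q) = 1;
            P = kernel_to_matrix(E, N);          % 0/1 indicator of the index set S
            G(p, q) = sum(sum(R .* P));          % sum of u(i)*v(j) over (i,j) in S
        end
    end
end
```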

Now we can combine Theorems 2.1 and 2.2 to ensure that the singular values of $M$ are neither too large nor too small. As noted above, $\|M\|_F^2$ is the sum of the squares of all singular values of $M$; if $M$ is $n\times n$, it is the sum of the squares of $n$ singular values. We may therefore choose $\frac12\|M\|_F^2-n\,\sigma_{\min}(M)$ as the regularization term to keep the singular values of $M$ neither too large nor too small. This leads to the next theorem.

Theorem 2.3. Let $M\in\mathbb{R}^{n\times n}$ be the structured transformation matrix corresponding to the one-channel convolutional kernel $K\in\mathbb{R}^{k\times k}$. Assume that $\sigma_{\min}(M)>0$ and is simple, and let $u$, $v$ be the normalized left and right singular vectors of $M$ associated with $\sigma_{\min}(M)$. Given $(p,q)$, if $S$ denotes the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q}$, we have

$$\frac{\partial}{\partial K_{p,q}}\Big(\frac12\|M\|_F^2-n\,\sigma_{\min}(M)\Big)=\sum_{(i,j)\in S}\big(m_{ij}-n\,u(i)\,v(j)\big). \quad (2.4)$$

    Proof. Combining (2.2) with (2.3), we can get (2.4).

For the case of multi-channel convolution, the convolutional kernel is represented by a tensor $K\in\mathbb{R}^{k\times k\times g\times h}$. The tensor $X\in\mathbb{R}^{N\times N\times g}$ denotes the input, where the element $X_{i,j,d}$ is the value of the input unit within channel $d$ at row $i$ and column $j$. The entries of $Y=K*X$, $Y\in\mathbb{R}^{N\times N\times h}$, are computed according to

$$Y_{r,s,c}=(K*X)_{r,s,c}=\sum_{d\in\{1,\dots,g\}}\ \sum_{p\in\{1,\dots,k\}}\ \sum_{q\in\{1,\dots,k\}} X_{r-m+p,\,s-m+q,\,d}\,K_{p,q,d,c},$$

where $X_{i,j,d}=0$ if $i\le 0$ or $i>N$, or if $j\le 0$ or $j>N$. By direct calculation, the structured transformation matrix $M$ such that $\mathrm{vec}(Y)=M\,\mathrm{vec}(X)$ is as follows:

$$M=\begin{pmatrix} M^{(1)(1)} & M^{(1)(2)} & \cdots & M^{(1)(g)}\\ M^{(2)(1)} & M^{(2)(2)} & \cdots & M^{(2)(g)}\\ \vdots & \vdots & & \vdots\\ M^{(h)(1)} & M^{(h)(2)} & \cdots & M^{(h)(g)} \end{pmatrix}, \quad (3.1)$$

where $M^{(c)(d)}\in T$, i.e., $M^{(c)(d)}$ is an $N^2\times N^2$ doubly blocked banded Toeplitz matrix. The block $M^{(c)(d)}$ corresponds to the slice $K_{:,:,d,c}$, which is convolved with the $d$-th input channel to contribute to the $c$-th output channel.
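The block structure in (3.1) can be assembled directly from the one-channel construction. The sketch below assumes the hypothetical kernel_to_matrix helper from Section 2 and a vec convention that stacks the channel slices of $X$ and $Y$ one after another (channel index outermost), which is what the block layout in (3.1) presumes.

```matlab
% A minimal sketch: assemble the (h N^2) x (g N^2) block matrix M of (3.1),
% with block (c,d) being the transformation matrix of the slice K(:,:,d,c).
function M = kernel4_to_matrix(K, N)
    [~, ~, g, h] = size(K);
    M = zeros(h * N^2, g * N^2);
    for c = 1:h
        for d = 1:g
            rows = (c - 1) * N^2 + (1:N^2);
            cols = (d - 1) * N^2 + (1:N^2);
            M(rows, cols) = kernel_to_matrix(K(:, :, d, c), N);   % block M^{(c)(d)}
        end
    end
end
```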

In this section, we use $\Omega_{p,q,z,y}$ to denote the set of all indices $(i,j)$ satisfying $m_{ij}=K_{p,q,z,y}$. That is to say, $m_{ij}=K_{p,q,z,y}$ for each $(i,j)\in\Omega_{p,q,z,y}$ and $m_{ij}\neq K_{p,q,z,y}$ for each $(i,j)\notin\Omega_{p,q,z,y}$.

We can generalize the results for one-channel convolution to the multi-channel case, as summarized in the following theorems.

Theorem 3.1. For the convolutional kernel $K\in\mathbb{R}^{k\times k\times g\times h}$, let $M$ be the associated structured transformation matrix defined in (3.1). Given $(p,q,z,y)$, if $\Omega_{p,q,z,y}$ is the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q,z,y}$, it holds that

$$\frac{\partial\,\frac12\|M\|_F^2}{\partial K_{p,q,z,y}}=\sum_{(i,j)\in\Omega_{p,q,z,y}}m_{ij}. \quad (3.2)$$

    The proof of Theorem 3.1 follows from Lemma 2.1 as in Theorem 2.1; it is omitted here.

Theorem 3.2. For the convolutional kernel $K\in\mathbb{R}^{k\times k\times g\times h}$, let $M$ be the associated structured transformation matrix defined in (3.1), and assume that $\sigma_{\min}(M)$ is simple with normalized left and right singular vectors $u$ and $v$. Given $(p,q,z,y)$, if $\Omega_{p,q,z,y}$ is the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q,z,y}$, it holds that

$$\frac{\partial\sigma_{\min}(M)}{\partial K_{p,q,z,y}}=\sum_{(i,j)\in\Omega_{p,q,z,y}}u(i)\,v(j). \quad (3.3)$$

    The proof of Theorem 3.2 follows from Lemma 2.2 as in Theorem 2.2; it is omitted here.

Theorem 3.3. For the convolutional kernel $K\in\mathbb{R}^{k\times k\times g\times h}$, let $M$ be the associated structured transformation matrix defined in (3.1), and assume that $\sigma_{\min}(M)$ is simple with normalized left and right singular vectors $u$ and $v$. Given $(p,q,z,y)$, if $\Omega_{p,q,z,y}$ is the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q,z,y}$, it holds that

$$\frac{\partial}{\partial K_{p,q,z,y}}\Big(\frac12\|M\|_F^2-\min(g,h)N^2\,\sigma_{\min}(M)\Big)=\sum_{(i,j)\in\Omega_{p,q,z,y}}\big(m_{ij}-\min(g,h)N^2\,u(i)\,v(j)\big). \quad (3.4)$$

    Here, min(g,h) denotes the smaller value of g and h.

    Proof. Combining (3.2) with (3.3), we can get (3.4).
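For illustration, the combined gradient (3.4) can be evaluated entrywise in the same naive way as in Section 2. The sketch below assumes the hypothetical kernel4_to_matrix helper above and a simple smallest singular value; the economy-size SVD is used so that the last columns of $U$ and $V$ correspond to $\sigma_{\min}(M)$ even when $M$ is rectangular.

```matlab
% A minimal sketch of formula (3.4) for the combined penalty
% (1/2)||M||_F^2 - min(g,h) N^2 sigma_min(M).
function G = grad_combined(K, N)
    [k, ~, g, h] = size(K);
    M = kernel4_to_matrix(K, N);
    [U, ~, V] = svd(M, 'econ');
    u = U(:, end);  v = V(:, end);               % singular vectors of sigma_min(M)
    R = M - min(g, h) * N^2 * (u * v');          % entries m_ij - min(g,h) N^2 u(i) v(j)
    G = zeros(k, k, g, h);
    for y = 1:h
        for z = 1:g
            for p = 1:k
                for q = 1:k
                    E = zeros(k, k, g, h);  E(p, q, z, y) = 1;
                    P = kernel4_to_matrix(E, N); % 0/1 indicator of Omega_{p,q,z,y}
                    G(p, q, z, y) = sum(sum(R .* P));
                end
            end
        end
    end
end
```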

We now present the detailed gradient descent algorithms for the three penalty functions; in Algorithm 3.3, $\min(g,h)$ again denotes the smaller of $g$ and $h$. A MATLAB usage sketch of Algorithm 3.3 is given after the listings.

Algorithm 3.1. Gradient descent algorithm for $R_\alpha(K)=\frac12\|M\|_F^2$
(1) Input: a convolutional kernel tensor $K\in\mathbb{R}^{k\times k\times g\times h}$, a step size $\lambda$ and the input size $N\times N\times g$.
(2) If $\sigma_{\max}(M)$ is large:
(3)   Compute $G=\big[\partial\,\frac12\|M\|_F^2/\partial K_{p,q,z,y}\big]_{p,q,z,y=1}^{k,k,g,h}$ by (3.2);
(4)   Update $K=K-\lambda G$;
(5) End

Algorithm 3.2. Gradient descent algorithm for $R_\alpha(K)=-\sigma_{\min}(M)$
(1) Input: a convolutional kernel tensor $K\in\mathbb{R}^{k\times k\times g\times h}$, a step size $\lambda$ and the input size $N\times N\times g$.
(2) If $\sigma_{\min}(M)$ is small:
(3)   Compute $G=\big[\partial\,(-\sigma_{\min}(M))/\partial K_{p,q,z,y}\big]_{p,q,z,y=1}^{k,k,g,h}$ by (3.3);
(4)   Update $K=K-\lambda G$;
(5) End

Algorithm 3.3. Gradient descent algorithm for $R_\alpha(K)=\frac12\|M\|_F^2-\min(g,h)N^2\sigma_{\min}(M)$
(1) Input: a convolutional kernel tensor $K\in\mathbb{R}^{k\times k\times g\times h}$, a step size $\lambda$ and the input size $N\times N\times g$.
(2) While not converged:
(3)   Compute $G=\big[\partial\big(\frac12\|M\|_F^2-\min(g,h)N^2\sigma_{\min}(M)\big)/\partial K_{p,q,z,y}\big]_{p,q,z,y=1}^{k,k,g,h}$ by (3.4);
(4)   Update $K=K-\lambda G$;
(5) End
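The following usage sketch runs Algorithm 3.3 with the hypothetical helpers sketched above; the kernel size, input size and step size follow the experiments in Section 4, while the iteration count is an arbitrary illustrative choice.

```matlab
% A usage sketch of Algorithm 3.3, assuming the kernel4_to_matrix and
% grad_combined helpers sketched above.
N = 20;  lambda = 1e-5;
rand('state', 1);  K = rand(3, 3, 3, 1);         % a random 3 x 3 x 3 x 1 kernel
for it = 1:100
    G = grad_combined(K, N);                     % gradient of the penalty in (3.4)
    K = K - lambda * G;                          % gradient descent step on the kernel entries
    M = kernel4_to_matrix(K, N);
    fprintf('iter %3d: sigma_max = %.4f, sigma_min = %.4f\n', ...
            it, norm(M), norm(M) / cond(M));     % norm(M) = sigma_max, norm/cond = sigma_min
end
```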

We performed numerical experiments using MATLAB R2016b on a laptop with a 3.0 GHz CPU and 16 GB of memory. $M$ denotes the transformation matrix corresponding to the convolutional kernel tensor. The values of $\sigma_{\max}(M)$ and $\sigma_{\min}(M)$ versus the iteration step (denoted "iter") are used to show the effectiveness of the proposed algorithms. We randomly generated multi-channel convolutional kernels using the following command:

rand('state',1), K = rand(k,k,g,h).

We considered $K\in\mathbb{R}^{3\times3\times g\times h}$ with different values of $g$ and $h$, i.e., kernels of different sizes with $3\times3$ filters. For each kernel, we used $20\times20\times g$ as the size of the input data. We then minimized the three different penalty functions by using Algorithms 3.1–3.3, respectively.

Regarding the choice of the step size $\lambda$, although we have no theoretical result, we have a good rule of thumb: according to our numerical experiments, $\lambda=10^{-5}$ is suitable for Algorithms 3.1 and 3.3, and $\lambda=10^{-4}$ is suitable for Algorithm 3.2. Regarding the process to obtain $\Omega_{p,q,z,y}$, i.e., the set of all indices $(i,j)$ such that $m_{ij}=K_{p,q,z,y}$, we first generated the structured matrix $A$ containing only the entry $K_{p,q,z,y}$ and then used the MATLAB command "find(A)" to get the row and column subscripts of each nonzero element of $A$. In addition, at each iteration step we used the MATLAB commands "norm(M)" and "cond(M)" to compute the largest and smallest singular values of the updated transformation matrix.
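As an illustration of this index-extraction step, the sketch below builds the structured matrix of a kernel whose only nonzero entry is at position (p,q,z,y) and reads off $\Omega_{p,q,z,y}$ with find; it assumes the hypothetical kernel4_to_matrix helper from Section 3.

```matlab
% A minimal sketch of obtaining the index set Omega_{p,q,z,y} with find(A).
k = 3;  g = 3;  h = 1;  N = 20;                  % sizes used in the experiments
p = 1;  q = 2;  z = 1;  y = 1;                   % an arbitrary kernel position
E = zeros(k, k, g, h);
E(p, q, z, y) = 1;                               % kernel with a single nonzero entry
A = kernel4_to_matrix(E, N);                     % its structured transformation matrix
[I, J] = find(A);                                % (I(t), J(t)) enumerate Omega_{p,q,z,y}
```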

We present the results for $3\times3\times3\times1$ and $3\times3\times1\times3$ kernels in the following figures. In Figure 1, we show how the largest singular value of $M$ changes when Algorithm 3.1 is used.

    Figure 1.  Changes of σmax(M) for different kernel sizes.

As the number of iterations increases, $\sigma_{\max}(M)$ decreases. In Figure 2, we show how the smallest singular value of $M$ changes when Algorithm 3.2 is used. As the number of iterations increases, $\sigma_{\min}(M)$ increases.

    Figure 2.  Changes of σmin(M) for different kernel sizes.

In Figure 3, the changes in $\sigma_{\max}(M)$ and $\sigma_{\min}(M)$ produced by Algorithm 3.3 are shown. As the number of iterations increases, $\sigma_{\max}(M)$ (left axis) decreases while $\sigma_{\min}(M)$ (right axis) increases. The changes in $\sigma_{\max}(M)$ and $\sigma_{\min}(M)$ shown in the figures confirm that the three proposed algorithms are effective. In the training of deep neural networks, practitioners can decide which algorithm to use based on knowledge of the specific network architecture.

    Figure 3.  Changes of σmax(M) and σmin(M) for different kernel sizes.

Numerical experiments were also performed on other randomly generated examples, including random kernels with each entry uniformly distributed on [0,1]. The convergence behavior of $\sigma_{\max}(M)$ and $\sigma_{\min}(M)$ was similar to that shown in the figures presented here.

In this paper, we have provided new methods to modify the singular values of convolutional kernel tensors. From the perspective of linear algebra, each convolution operation corresponds to a structured transformation matrix. We combined linear algebra with the chain rule for derivatives to obtain the new regularization methods. New regularization terms for convolutional kernels have been proposed, and gradient descent algorithms for these regularization terms have been provided. The methods are shown to be effective in modifying the singular values of convolutional kernel tensors.

    The author declares he has not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the National Natural Science Foundation of China (Grant No. 12001504) and the Fundamental Research Funds for the Central Universities (Grant No. 2652019320).

    The author declares no conflict of interest.



    [1] P. L. Bartlett, D. J. Foster, M. Telgarsky, Spectrally-normalized margin bounds for neural networks, Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, 6241–6250.
    [2] A. Brock, T. Lim, J. M. Ritchie, N. Weston, Neural photo editing with introspective adversarial networks, arXiv, 2017. https://doi.org/10.48550/arXiv.1609.07093 doi: 10.48550/arXiv.1609.07093
[3] R. H. F. Chan, X. Jin, An introduction to iterative Toeplitz solvers, SIAM Press, 2007.
    [4] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, N. Usunier, Parseval networks: improving robustness to adversarial examples, Proceedings of the 34th International Conference on Machine Learning, 70 (2017), 854–863.
    [5] W. Ding, Y. Wei, Theory and computation of tensors: multi-dimensional arrays, Academic Press, 2016. https://doi.org/10.1016/C2014-0-04764-8
    [6] V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, arXiv, 2018. https://doi.org/10.48550/arXiv.1603.07285 doi: 10.48550/arXiv.1603.07285
    [7] G. H. Golub, C. F. Van Loan, Matrix computations, Johns Hopkins University Press, 2013. https://doi.org/10.56021/9781421407944
    [8] I. J. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT Press, 2016.
    [9] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv, 2015. https://doi.org/10.48550/arXiv.1412.6572 doi: 10.48550/arXiv.1412.6572
    [10] P. C. Guo, Q. Ye, On the regularization of convolutional kernels in neural networks, Linear Multilinear Algebra, 70 (2022), 2318–2330. https://doi.org/10.1080/03081087.2020.1795058 doi: 10.1080/03081087.2020.1795058
    [11] J. F. Kolen, S. C. Kremer, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, Wiley-IEEE Press, 2001. https://doi.org/10.1109/9780470544037.ch14
    [12] X. Q. Jin, Developments and applications of block Toeplitz iterative solvers, Springer Science & Business Media, 2003.
    [13] J. Kovačević, A. Chebira, An introduction to frames, Now Publishers Inc., 2008.
    [14] P. Li, Y. Lu, C. Xu, J. Ren, Insight into Hopf bifurcation and control methods in fractional order BAM neural networks incorporating symmetric structure and delay, Cognit. Comput., 2023. https://doi.org/10.1007/s12559-023-10155-2 doi: 10.1007/s12559-023-10155-2
    [15] L. H. Lim, Tensors in computations, Acta Numer., 30 (2021), 555–764. https://doi.org/10.1017/S0962492921000076 doi: 10.1017/S0962492921000076
    [16] T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida, Spectral normalization for generative adversarial networks, arXiv, 2018. https://doi.org/10.48550/arXiv.1802.05957 doi: 10.48550/arXiv.1802.05957
    [17] H. Sedghi, V. Gupta, P. M. Long, The singular values of convolutional layers, arXiv, 2018. https://doi.org/10.48550/arXiv.1805.10408 doi: 10.48550/arXiv.1805.10408
[18] G. W. Stewart, Matrix algorithms, SIAM Publications Library, 2001. https://doi.org/10.1137/1.9780898718058
    [19] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, et al., Intriguing properties of neural networks, arXiv, 2013. https://doi.org/10.48550/arXiv.1312.6199 doi: 10.48550/arXiv.1312.6199
    [20] Y. Tsuzuku, I. Sato, M. Sugiyama, Lipschitz-Margin training: scalable certification of perturbation invariance for deep neural networks, Adv. Neural Inf. Process., 31 (2018), 6542–6551.
    [21] J. Wang, Y. Chen, R. Chakraborty, S. X. Yu, Orthogonal convolutional neural networks, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. https://doi.org/10.1109/CVPR42600.2020.01152 doi: 10.1109/CVPR42600.2020.01152
    [22] C. Xu, Z. Liu, P. Li, J. Yan, L. Yao, Bifurcation mechanism for fractional-order three-triangle multi-delayed neural networks, Neural Process. Lett., 2022. https://doi.org/10.1007/s11063-022-11130-y doi: 10.1007/s11063-022-11130-y
    [23] C. Xu, W. Zhang, Z. Liu, L. Yao, Delay-induced periodic oscillation for fractional-order neural networks with mixed delays, Neurocomputing, 488 (2022), 681–693. https://doi.org/10.1016/j.neucom.2021.11.079 doi: 10.1016/j.neucom.2021.11.079
    [24] Y. Yoshida, T. Miyato, Spectral norm regularization for improving the generalizability of deep learning, arXiv, 2017. https://doi.org/10.48550/arXiv.1705.10941 doi: 10.48550/arXiv.1705.10941
    [25] C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, Understanding deep learning (still) requires rethinking generalization, Commun. ACM, 64 (2021), 107–115. https://doi.org/10.1145/3446776 doi: 10.1145/3446776
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)