Research article

A sigmoidal fractional derivative for regularization

  • Received: 16 January 2020 Accepted: 26 March 2020 Published: 30 March 2020
  • MSC : 26A33

  • In this paper, we propose a new fractional derivative, which is based on a Caputo-type derivative with a smooth kernel. We show that the proposed fractional derivative reduces to the classical derivative and has a smoothing effect which is compatible with $\ell_1$ regularization. Moreover, it satisfies some classical properties.

    Citation: Mostafa Rezapour, Adebowale Sijuwade, Thomas Asaki. A sigmoidal fractional derivative for regularization[J]. AIMS Mathematics, 2020, 5(4): 3284-3297. doi: 10.3934/math.2020211



Fractional calculus has undergone significant developments in recent years and has found use in physics, engineering, economics, and other fields [1,2,3]. Classical results about the Riemann-Liouville and Caputo derivatives, as well as fractional differential equations, can be found in [4,5]. In [11] and [33], Caputo and Fabrizio suggested a new fractional derivative, whose properties were investigated by Losada and Nieto [15]. This fractional derivative has been utilized in various applications, including the fractional Nagumo equation in Alqahtani [23], coupled systems of time-fractional differential problems in Alsaedi et al. [24], and Fisher's reaction-diffusion equation in Atangana [25]. More applications of the Caputo-Fabrizio fractional derivative can be found in Aydogan et al. [26] and Atangana and Gómez-Aguilar [27].

For $0 \le \alpha \le 1$, $-\infty < a < t$, $f \in H^1(a,b)$ and $b > a$, the Caputo fractional derivative is defined by

$${}^{C}_{a}D^{\alpha}_{t}f(t) = \frac{1}{\Gamma(1-\alpha)}\int_a^t \frac{f'(s)}{(t-s)^{\alpha}}\,ds. \quad (1)$$

By replacing the term $\frac{1}{\Gamma(1-\alpha)}$ with a normalization constant $M(\alpha)$ such that $M(0)=M(1)=1$ and adjusting the kernel $(t-s)^{-\alpha}$, we obtain the Caputo-Fabrizio fractional derivative defined by

$${}^{CF}_{a}D^{\alpha}_{t}f(t) = \frac{M(\alpha)}{1-\alpha}\int_a^t f'(s)\,\exp\left(-\frac{\alpha(t-s)}{1-\alpha}\right)ds. \quad (2)$$

The Caputo-Fabrizio fractional derivative of a constant vanishes, as does the usual Caputo derivative; however, the new kernel $\exp\left(-\frac{\alpha(t-s)}{1-\alpha}\right)$ is no longer singular at $s=t$. Caputo and Fabrizio further extend their definition in [11] to functions in $L^1$ by

$${}^{CF}D^{\alpha}f(t) = \frac{\alpha M(\alpha)}{1-\alpha}\int_{-\infty}^{t}\left(f(t)-f(s)\right)\exp\left(-\frac{\alpha(t-s)}{1-\alpha}\right)ds.$$

Alqahtani [23] shows that the nonlinear Nagumo equation given by

$${}^{CF}_{0}D^{\alpha}_{t}u(x,t) + \beta u(x,t)^{n}\,\frac{\partial}{\partial x}u(x,t) = \frac{\partial}{\partial x}\left(\alpha u(x,t)^{n}\,\frac{\partial}{\partial x}u(x,t)\right) + \gamma u(x,t)\left(1-u^{m}\right)\left(u^{m}-\delta\right), \quad (3)$$

where $0<\alpha<1$ and $\beta,\gamma,\delta$ are constants, subject to the boundary conditions

$$u(x,0)=f(x), \qquad u(0,t)=g(t),$$

has an exact solution. The author shows that this PDE can be reformulated in terms of a Lipschitz kernel. Existence of the exact solution is established using a fixed-point approach, and uniqueness follows under suitable assumptions on the Lipschitz constant. The study argues that an exponential kernel is, in a certain sense, a better kernel than a power function, since the lack of a singularity provides a better filtering effect. In the context of fractional differential equation applications, since the associated functions are not defined in a Banach space, only approximate solutions to certain fractional differential equations can be investigated. The methods used to handle fractional differential problems such as ${}^{CF}D^{\alpha}f(t)=g(t,f(t))$ cannot be extended to problems resembling ${}^{CF}D^{\alpha}f(t)=g\left(t,f(t),{}^{CF}D^{\alpha}f(t)\right)$.

In Baleanu et al. [14], the Caputo-Fabrizio fractional derivative on the Banach space $C_{\mathbb{R}}[0,1]$ is considered in the context of higher-order series-type fractional integro-differential equations. More precisely, an extended Caputo-Fabrizio type fractional derivative of order $0\le\alpha<1$ is provided on $C_{\mathbb{R}}[0,b]$ for $b>0$ by

$${}^{CFN}D^{\alpha}f(t) = \frac{M(\alpha)}{1-\alpha}\left(f(t)-f(0)\right)\exp\left(-\frac{\alpha t}{1-\alpha}\right) + \frac{\alpha M(\alpha)}{(1-\alpha)^{2}}\int_0^t \left(f(t)-f(s)\right)\exp\left(-\frac{\alpha(t-s)}{1-\alpha}\right)ds.$$

These authors use a standard fixed-point approach to establish uniqueness of solutions to fractional series-type differential problems such as

$${}^{CFN}D^{\alpha}f(t) = \sum_{j=0}^{\infty}\frac{{}^{CFN}D^{\rho^{[j]}}g\left(t,\,f(t),\,(\phi f)(t),\,h(t)\,{}^{CFN}D^{\gamma}f(t),\,g(t)\,{}^{CFN}D^{\delta}f(t)\right)}{2^{j}},$$

with initial condition $f(0)=0$ and $\alpha,\gamma,\delta,\rho\in(0,1)$.

    An extension of this type which is compatible with orders beyond (0,1) has yet to be provided.

The Caputo-Fabrizio fractional derivative is discussed in the setting of distributions in [28]. Other types of fractional derivatives can be found in Katugampola [22] and de Oliveira and Machado [6]. In de Oliveira et al. [12], it is shown that the choice of kernel in a Caputo-type fractional derivative is connected to the Laplace transform via convolution.

Let $\mathcal{I}$ denote the Schwartz class of smooth test functions whose derivatives decay at infinity, and let $\mathcal{I}'$ denote the space of continuous linear functionals on $\mathcal{I}$. The distributional derivative $\{T'\}$ is defined as in [32] by

$$\int_{\mathbb{R}} T'(t)\,\phi(t)\,dt = -\int_{\mathbb{R}} T(t)\,\phi'(t)\,dt, \quad (4)$$

for all smooth, compactly supported test functions $\phi$ on $\mathbb{R}$. The distributional Laplace transform is given by

$$F(s) = \mathcal{L}(\phi(t)) = \mathcal{F}\left(\phi(t)e^{-\sigma t}\right)(\mu),$$

where $s=\sigma+i\mu$, $\mu<0$ and $\phi(t)e^{-\sigma t}\in\mathcal{I}$. Suppose that $f$ is supported on $(0,\infty)$ and that $\sigma>0$ is such that $f(t)e^{-\sigma t}\in\mathcal{I}$. It follows that the Laplace transform of the derivative is given by

$$\mathcal{L}(\phi'(t))(s) = s\,\mathcal{L}(\phi(t))(s).$$

Accordingly, let $\mathcal{L}_{D}$ denote the derivative defined via the distributional Laplace transform by

$$\mathcal{L}_{D}(f(x)) = \mathcal{L}^{-1}\left(s\,\mathcal{L}(f)\right).$$

One can define a more general fractional derivative as follows. Suppose that $\Phi(s,\alpha)$ is a fractional integro-differential symbol and $K(t,s):\mathbb{R}^2\to\mathbb{R}$ is a continuous kernel. Let $\Phi(s,\alpha)$ correspond to a fractional derivative $D^{\alpha}$ in the sense that

$$\mathcal{L}\left(D^{\alpha}f(t)\right) = \Phi(s,\alpha)\,\mathcal{L}\left(f(t)\right),$$

where $\Phi(s,1)=s$, $\Phi(s,-1)=\frac{1}{s}$ and $\Phi(s,0)=1$. Letting $\Phi(s,\alpha)=s\,\mathcal{L}\left(K(t,\alpha)\right)(s)$ and proceeding with the Convolution Theorem, we are left with a Caputo-type fractional operator of the form

$${}_{a}D^{\alpha}_{K}f(t) = \int_a^t K(t-s,\alpha)\,f'(s)\,ds, \quad (5)$$

which depends on the choice of kernel $K$. For $f\in H^1(a,b)$, commonly used kernels include the Caputo kernel $K_1=\frac{1}{\Gamma(1-\alpha)}(t-s)^{-\alpha}$, the Caputo-Fabrizio kernel $K_2=\frac{M(\alpha)}{1-\alpha}\exp\left(-\frac{\alpha(t-s)}{1-\alpha}\right)$ and the Gaussian kernel $K_3=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{t^2}{2\sigma^2}\right)$ [4,10,13].

The memory principle for fractional derivatives concerns the contribution of the history of $f(t)$ near the terminal point $t=a$. Let $L$ denote the memory length, satisfying $a+L\le t\le b$. Define the error in approximating the fractional derivative by

$$E_{L,\alpha,a}(t) = \left|{}_{a}D^{\alpha}_{K}f(t) - {}_{t-L}D^{\alpha}_{K}f(t)\right|,$$

where ${}_{a}D^{\alpha}_{K}f(t)$ is as in (5). If $|f'(t)|\le M$ for $a<t<b$ and $0<\alpha<1$, we have the following error estimate for the Caputo fractional derivative:

$$E_{L,\alpha,a}(t) = \left|\frac{1}{\Gamma(1-\alpha)}\int_{t-L}^{t}\frac{f'(s)}{(t-s)^{\alpha}}\,ds\right| \le \frac{M\,L^{1-\alpha}}{\left|\Gamma(2-\alpha)\right|}.$$

For all $\epsilon>0$ and $a+L\le t\le b$, we therefore have $E_{L,\alpha,a}(t)\le\epsilon$ whenever

$$L \le \left(\frac{M}{\epsilon\left|\Gamma(2-\alpha)\right|}\right)^{\frac{1}{\alpha-1}}. \quad (6)$$

Therefore, the Caputo fractional derivative with terminal point $a$ can be approximated by the corresponding fractional derivative with lower limit $t-L$, with the level of accuracy described above.
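For a quick numerical reading of (6), here is a minimal sketch (the values of $M$, $\epsilon$ and $\alpha$ are arbitrary illustrative choices, not from the paper):

```python
import math

def memory_length_bound(M: float, eps: float, alpha: float) -> float:
    """Evaluate the bound in (6): (M / (eps * |Gamma(2 - alpha)|)) ** (1/(alpha - 1)),
    the memory length L at which the error estimate above reaches eps."""
    return (M / (eps * abs(math.gamma(2.0 - alpha)))) ** (1.0 / (alpha - 1.0))

# Illustrative values: |f'| <= M = 1, tolerance eps = 1e-3, order alpha = 0.5
print(memory_length_bound(M=1.0, eps=1e-3, alpha=0.5))  # ~ 7.9e-07
```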

In this work, we propose a different fractional derivative that has a smooth kernel. Our primary interest in defining this fractional derivative is the improvement of machine learning algorithms. Caputo-type fractional derivatives have been applied in machine learning, as in Pu et al. [10]. In particular, fractional-order gradient methods have been considered in order to improve the performance of integer-order methods. For example, suppose that $f:\mathbb{R}^n\to\mathbb{R}$ is convex and differentiable with a Lipschitz gradient; then the integer-order gradient method defined by

$$x_{k+1} = x_k - \mu\,\nabla f(x_k)$$

has a linear convergence rate. Improving the performance of the integer-order gradient method is critical in optimization problems. In the recent literature, fractional calculus has been explored as a way to improve the integer-order gradient method, owing to its nonlocality and the memory principle. Fractional-order gradient methods based on the Caputo fractional derivative have been proposed that offer competitive convergence rates. For example, in [19], a Caputo fractional gradient method is proposed and shown to be monotone and to exhibit strong convergence.
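For concreteness, a minimal sketch of the integer-order iteration on a convex quadratic (the objective, step size, and iteration count are illustrative choices of ours, not from the paper):

```python
import numpy as np

def gradient_descent(grad, x0, mu=0.1, iters=100):
    """Integer-order gradient method: x_{k+1} = x_k - mu * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - mu * grad(x)
    return x

# f(x) = 0.5 * ||x||^2 is convex with a 1-Lipschitz gradient; minimizer at 0
print(gradient_descent(lambda x: x, x0=[3.0, -2.0]))  # approaches [0, 0] linearly
```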

Fractional derivatives have been used in the backpropagation algorithm for feedforward neural networks and convolutional neural networks in [20,31]. In both studies, the rate of convergence was shown to exceed that of the integer-order methods. Fractional-order methods have been used to investigate complex-valued neural networks in [17] and recurrent neural network models in [30]. In [19] and [16], gradients based on the Caputo fractional derivative are used to update parameters, while integer-order gradients are used to handle backpropagation, allowing for simpler computation. The experiments therein improve the accuracy of the neural network's performance compared to integer-order methods while being equally costly.

In the training of machine learning models, one often needs to obtain the feature weights that best fit the training data. In the case of maximum likelihood training, regularization is typically needed so that the model does not overfit the training data. In $\ell_p$ regularization, the weight vector is penalized by its $\ell_p$ norm. While the cases $p=1$ and $p=2$ are both very common and result in similar levels of accuracy, $\ell_1$ regularization is much more practical: because of the sparsity it induces, $\ell_1$ regularization is less memory-intensive and more time-effective than $\ell_2$ regularization. On the other hand, $\ell_1$ regularization is problematic in that, during the update process, the gradient of the regularization term is not differentiable at the origin, as the error function given below,

$$E_1 = E + \lambda\sum_{k=1}^{N}\left|x_k\right|, \quad (7)$$

has classical derivative

$$\frac{\partial E_1}{\partial x_j} = \frac{\partial E}{\partial x_j} + \lambda\,\mathrm{sgn}(x_j).$$

A typical remedy to this problem is the stochastic gradient descent method, which approximates the gradient using the training data. Although time-efficient for training, when the dimension of the feature space is large, the update process slows down significantly. Furthermore, the model becomes less sparse after training. The discontinuity induced by the regularizer proves to be problematic, as it deflects the direction of descent. The use of sigmoids in regularization problems has been explored previously, as in Krutikov et al. [29], but not in the context of fractional derivatives. Another remedy to the aforementioned problem is the use of fractional gradients in place of the classical descent methods. These methods are still in their infancy and problematic in that convergence to the local optimum is not always guaranteed, even when the algorithm converges. Furthermore, these methods often require an adjustment to the fractional derivative, by truncation as in [9], by variable-order techniques as in [18], or by methods based on the memory principle (6), owing to the computational expense and the failure of the Caputo kernel to be smooth.
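The discontinuity in the gradient of (7) is easy to exhibit numerically. In the minimal sketch below (the smooth loss $E$ and the value of $\lambda$ are placeholder choices), `np.sign` silently returns 0 at the origin, which is a subgradient choice rather than a derivative:

```python
import numpy as np

def l1_penalized_grad(grad_E, x, lam):
    """Classical gradient of E_1 = E + lam * sum|x_k|; not defined at x_j = 0."""
    return grad_E(x) + lam * np.sign(x)

grad_E = lambda x: x  # gradient of the placeholder smooth loss E = 0.5 * ||x||^2
print(l1_penalized_grad(grad_E, np.array([-0.5, 0.0, 0.5]), lam=0.1))
# the jump of lam * sgn(x_j) across x_j = 0 deflects the descent direction
```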

We would also like our operator to be nonlocal. In [13], it is shown that, unlike the Caputo derivative, the Caputo-Fabrizio fractional derivative is not a nonlocal operator. The linear fractional differential equation

$$\lambda\left({}^{CF}_{a}D^{\alpha}_{t}f(t)\right) + \nu(t)\,g(t) + \eta(t,t_0)\,Y(t_0) = 0$$

is shown to reduce to a first-order ordinary differential equation. This means that the Caputo-Fabrizio derivative cannot adequately describe processes with nonlocality and memory. With the correct choice of kernel, this complication can be avoided.

In this section, we define a new left-sided fractional derivative. We show that the proposed fractional derivative reduces to the $H^1$ derivative as the order approaches $1$. In the results to follow, for $0<\alpha\le 1$, we let $C_1(\alpha)$ denote a normalization constant $\frac{C(\alpha)}{\Gamma(2-\alpha)}$ satisfying $\frac{C(\alpha)}{\Gamma(1-\alpha)}\to\frac{1}{2}$ as $\alpha\to 1$.

Definition 2.1. (Left sigmoidal fractional derivative) Let $0<\alpha\le 1$, $f\in H^1((a,b))$, $t>a$, and let $\{f'(t)\}$ denote the $H^1$ distributional derivative as in (4). We define a new fractional derivative by

$${}^{\sigma}D^{\alpha}_{a}f(t) = C_1(\alpha)\int_a^t \{f'(s)\}\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds. \quad (8)$$
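To make (8) concrete, here is a minimal quadrature sketch (ours, not the paper's). It uses the scaling $C_1(\alpha)=\frac{1}{1-\alpha}$, chosen so that the one-sided integral, whose kernel is centered at the endpoint $s=t$ and so captures only half of the kernel's total mass, visibly recovers $f'(t)$; under the normalization stated above, the outputs would simply carry an extra factor of $\frac{1}{2}$:

```python
import numpy as np
from scipy.integrate import quad

def sigmoidal_deriv(df, a, t, alpha):
    """Left sigmoidal fractional derivative (8) with C1(alpha) = 1/(1 - alpha).
    df is the classical derivative f'; substituting u = (s - t)/(1 - alpha)
    turns the sech^2 bump into one of unit width, which quadrature handles well."""
    eps = 1.0 - alpha
    val, _ = quad(lambda u: df(t + eps * u) / np.cosh(u) ** 2, (a - t) / eps, 0.0)
    return val  # C1 * eps * integral(du) = (1/eps) * eps * val = val

# Reduction to the classical derivative: f(t) = t^2, so f'(2) = 4
for alpha in (0.5, 0.9, 0.99):
    print(alpha, sigmoidal_deriv(lambda s: 2.0 * s, a=0.0, t=2.0, alpha=alpha))
# prints values approaching 4 as alpha -> 1
```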

Now, we show that the left sigmoidal fractional derivative reduces to the $H^1$ derivative.

Theorem 2.1. (Reduction to the classical derivative) Suppose $f\in H^1(a,b)$. Then

$$\lim_{\alpha\to 1}{}^{\sigma}D^{\alpha}_{a}f(t) = \{f'(t)\}. \quad (9)$$

Proof.

$$\lim_{\alpha\to 1}{}^{\sigma}D^{\alpha}_{a}f(t) = \frac{C(\alpha)}{\Gamma(2-\alpha)}\lim_{\alpha\to 1}\int_a^t \{f'(s)\}\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds$$

$$= \frac{2C(\alpha)}{\Gamma(1-\alpha)}\lim_{\alpha\to 1}\int_a^t \{f'(s)\}\,\frac{\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)}{2(1-\alpha)}\,ds$$

$$= \lim_{\alpha\to 1}\frac{2C(\alpha)}{\Gamma(1-\alpha)}\left(\int_a^t \{f'(s)\}\,\lim_{\alpha\to 1}\frac{\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)}{2(1-\alpha)}\,ds\right) = \int_a^t \{f'(s)\}\,\delta(s-t)\,ds = \{f'(t)\},$$

where the last step follows from the observation that $\frac{1}{2\epsilon}\,\mathrm{sech}^2\left(\frac{x}{\epsilon}\right)\to\delta(x)$ as $\epsilon\to 0^{+}$, $\delta(t)$ being the Dirac distribution.

In the following theorem, we show that the left sigmoidal fractional derivative commutes with the classical derivative.

Theorem 2.2. Suppose that $f$ is at least twice continuously differentiable and that ${}^{\sigma}D^{\alpha}_{a}f(t)$ is differentiable. If $f'(a)=0$, then

$${}^{\sigma}D^{\alpha}_{a}\left({}^{\sigma}D^{1}_{a}f(t)\right) = {}^{\sigma}D^{1}_{a}\left({}^{\sigma}D^{\alpha}_{a}f(t)\right), \quad (10)$$

where $0<\alpha<1$.

Proof. From (8), integrating by parts yields

$${}^{\sigma}D^{\alpha}_{a}\left({}^{\sigma}D^{1}_{a}f(t)\right) = C_1(\alpha)\int_a^t f''(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds$$

$$= C_1(\alpha)\,f'(t) + \frac{2C(\alpha)}{\Gamma(2-\alpha)(1-\alpha)}\int_a^t f'(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)\tanh\left(\frac{s-t}{1-\alpha}\right)ds, \quad (11)$$

so we have

$${}^{\sigma}D^{1}_{a}\left({}^{\sigma}D^{\alpha}_{a}f(t)\right) = \lim_{\gamma\to 1}{}^{\sigma}D^{\gamma}_{a}\left({}^{\sigma}D^{\alpha}_{a}f(t)\right) = \frac{d}{dt}\left({}^{\sigma}D^{\alpha}_{a}f(t)\right) = C_1(\alpha)\,\frac{d}{dt}\int_a^t f'(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds$$

$$= C_1(\alpha)\,f'(t) + \frac{2C(\alpha)}{\Gamma(2-\alpha)(1-\alpha)}\int_a^t f'(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)\tanh\left(\frac{s-t}{1-\alpha}\right)ds, \quad (12)$$

appealing to the Leibniz integral rule

$$\frac{d}{dt}\left(\int_{a(t)}^{b(t)} f(t,s)\,ds\right) = f\left(t,b(t)\right)b'(t) - f\left(t,a(t)\right)a'(t) + \int_{a(t)}^{b(t)}\frac{\partial}{\partial t}f(t,s)\,ds.$$

From (11) and (12), the desired result is obtained.

In the next theorem, we show that the left sigmoidal fractional derivative does not satisfy the memory principle in the sense of (6). More precisely, the theorem shows that the left sigmoidal fractional derivative can be approximated by the corresponding fractional derivative with lower limit $t-L$, with the required memory length $L$ growing for orders at which $C_1(\alpha)$ is large.

Theorem 2.3. (Memory principle) Suppose that $f$ is differentiable on $(a,b)$, $a+L\le t\le b$ and $0<\alpha<1$. For every $\epsilon>0$, if there exists $C_0>0$ such that $|f'(t)|\le C_0$, then $\left|{}^{\sigma}D^{\alpha}_{a}f(t) - {}^{\sigma}D^{\alpha}_{t-L}f(t)\right|\le\epsilon$ whenever

$$L \ge (1-\alpha)\left(\frac{\left|C_1(\alpha)\right|C_0}{\epsilon} - 1\right)^{\frac{1}{2}}. \quad (13)$$

Proof. Making use of the inequality

$$\cosh^2(s) \ge 1+s^2,$$

we have

$$\left|{}^{\sigma}D^{\alpha}_{a}f(t) - {}^{\sigma}D^{\alpha}_{t-L}f(t)\right| = \left|C_1(\alpha)\int_a^{t-L} f'(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds\right| \le \left|C_1(\alpha)\right|C_0\int_a^{t-L}\frac{ds}{1+\left(\frac{s-t}{1-\alpha}\right)^2} \le \frac{\left|C_1(\alpha)\right|C_0}{1+\left(\frac{L}{1-\alpha}\right)^2},$$

and the result follows.
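As with (6), the bound (13) is easy to evaluate numerically; a minimal sketch with illustrative values of our choosing, reusing the normalization $C_1(\alpha)=\frac{1}{1-\alpha}$ from the earlier sketch:

```python
def sigmoidal_memory_length(C0: float, eps: float, alpha: float) -> float:
    """Memory length from (13): L >= (1 - alpha) * (|C1| * C0 / eps - 1) ** 0.5,
    with C1(alpha) = 1/(1 - alpha) as in the quadrature sketch above."""
    C1 = 1.0 / (1.0 - alpha)
    return (1.0 - alpha) * (C1 * C0 / eps - 1.0) ** 0.5

# Illustrative values: |f'| <= C0 = 1, tolerance eps = 1e-3, order alpha = 0.9
print(sigmoidal_memory_length(C0=1.0, eps=1e-3, alpha=0.9))  # ~ 10
```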

In the theorem below, we show that our new fractional derivative provides a sigmoidal approximation for functions that have a piecewise-linear $H^1$ distributional derivative. In particular, the proposed left sigmoidal fractional derivative is compatible with $\ell_1$ regularization: in the case of the $\ell_1$ norm, it can be used to define a fractional gradient which approximates the classical gradient via a family of sigmoids as $\alpha$ approaches $1$. This is promising in the context of gradient descent algorithms.

Theorem 2.4. (Norm-1 compatibility) ${}^{\sigma}D^{\alpha}_{a}$ provides a smooth approximation to the $\ell_1$ norm, defined by

$$\|x\|_1 = \sum_{k=1}^{n}\left|x_k\right|,$$

as $\alpha\to 1$, in the sense that for the error function $E_1$ given in (7), ${}^{\sigma}D^{\alpha}_{a}E_1(x_j)$ is given by

$${}^{\sigma}D^{\alpha}_{a}E(x_j) + \lambda\,C_1(\alpha)(\alpha-1)\tanh\left(\frac{a-x_j}{1-\alpha}\right),$$

where $a>0$.

Proof. The result follows from the observation that

$$C_1(\alpha)\int_a^t \{|s|'\}\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds = C_1(\alpha)\int_a^t H(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds$$

$$= C_1(\alpha)(\alpha-1)\tanh\left(\frac{a-t}{1-\alpha}\right) \to \frac{1}{2}\left(2H(t)-1\right) \text{ as } \alpha\to 1,$$

where $H(t)$ is the Heaviside function.
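Numerically, the content of Theorem 2.4 is that the fractional gradient replaces the discontinuous $\mathrm{sgn}(x_j)$ in the gradient of (7) by a tanh sigmoid whose width shrinks like $1-\alpha$. A minimal illustration (unit scale, zero shift $a$, and the limiting prefactor from $C_1(\alpha)(\alpha-1)$ absorbed into $\lambda$ are simplifications of ours):

```python
import numpy as np

def smoothed_l1_grad(x, lam, alpha):
    """Sigmoidal surrogate for lam * sgn(x): smooth at 0, sharpening as alpha -> 1."""
    return lam * np.tanh(x / (1.0 - alpha))

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
for alpha in (0.5, 0.9, 0.99):
    print(alpha, smoothed_l1_grad(x, lam=0.1, alpha=alpha))
# rows approach lam * sgn(x) while remaining differentiable at the origin
```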

Theorem 2.5. (Mittag-Leffler function) Suppose that $\gamma,\eta>0$ and $0<a<t$. Then

$${}^{\sigma}D^{\alpha}_{a}E_{\gamma,\eta}(t) \le C_1(\alpha)\,E_{\gamma,\eta}(t-a),$$

where $E_{\gamma,\eta}(z)=\sum_{k=0}^{\infty}\frac{z^k}{\Gamma(\gamma k+\eta)}$ is the two-parameter Mittag-Leffler function.

Proof.

$${}^{\sigma}D^{\alpha}_{a}E_{\gamma,\eta}(t) = C_1(\alpha)\int_a^t \mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)\frac{d}{ds}\sum_{k=0}^{\infty}\frac{s^k}{\Gamma(\gamma k+\eta)}\,ds$$

$$= C_1(\alpha)\int_a^t \mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)\sum_{k=0}^{\infty}\frac{k\,s^{k-1}}{\Gamma(\gamma k+\eta)}\,ds = C_1(\alpha)\sum_{k=0}^{\infty}\frac{k}{\Gamma(\gamma k+\eta)}\int_a^t s^{k-1}\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds$$

$$\le C_1(\alpha)\sum_{k=0}^{\infty}\frac{k}{\Gamma(\gamma k+\eta)}\int_a^t s^{k-1}\,ds = C_1(\alpha)\sum_{k=1}^{\infty}\frac{(t-a)^k}{\Gamma(\gamma k+\eta)} \le C_1(\alpha)\,E_{\gamma,\eta}(t-a).$$

Theorem 2.6. Suppose that $f\ge 0$, $1<p<\infty$, $0<\alpha<1$ and $0<t\le T$. If $f$ is differentiable with $f'\in L^p(\mathbb{R})$ and $M$ is the maximal operator given by

$$Mf(x) = \sup_{a>0}\frac{1}{2a}\int_{x-a}^{x+a}f(t)\,dt,$$

then

(a) ${}^{\sigma}D^{\alpha}_{-t}f(t) \le 2T\,C_1(\alpha)\,M\left(\left|f'\right|\right)(0)$;

(b) ${}^{\sigma}D^{\alpha}_{a}f(t)$ is integrable on $\mathbb{R}$.

Proof. (a) Since $\mathrm{sech}^2\le 1$,

$${}^{\sigma}D^{\alpha}_{-t}f(t) = C_1(\alpha)\int_{-t}^{t}f'(s)\,\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right)ds \le 2t\,C_1(\alpha)\,\frac{1}{2t}\int_{-t}^{t}f'(s)\,ds$$

$$\le 2T\,C_1(\alpha)\,\sup_{t>0}\frac{1}{2t}\int_{-t}^{t}\left|f'(s)\right|ds = 2T\,C_1(\alpha)\,M\left(\left|f'\right|\right)(0).$$

(b) From Young's convolution inequality, $\|f*g\|_{L^r} \le \|f\|_{L^p}\,\|g\|_{L^q}$ with $q=\frac{rp}{p+r(p-1)}$. Taking $r=1$,

$$\left\|f'(t)*\mathrm{sech}^2\left(\frac{t}{\alpha-1}\right)\right\|_{L^1(\mathbb{R})} \le \left\|f'\right\|_{L^p(\mathbb{R})}\left\|\mathrm{sech}^2\left(\frac{t}{\alpha-1}\right)\right\|_{L^{\frac{p}{2p-1}}(\mathbb{R})} < \infty.$$

The next theorem describes the effect of the Laplace and Fourier transforms, which extend to distributions as in de Oliveira and Machado [6]. The Convolution Theorem connects our choice of kernel as in (5) via the operator $\Phi(s,\alpha)=s\,\mathcal{L}\left(K(t,\alpha)\right)(s)$. In this case, $\Phi(s,\alpha)$ depends on the digamma function $\Psi(z)=\frac{\Gamma'(z)}{\Gamma(z)}$. This shows that the left sigmoidal fractional derivative does not reduce to the left-sided Riemann-Liouville fractional derivative.

Theorem 2.7. (Transformations) Suppose that $0<\alpha<1$, $\mathrm{Re}(s)>0$, $\omega\in\mathbb{R}$, $a\in\mathbb{R}$ and $f$ is a differentiable function of exponential order such that $f(0)=0$. If $T_1(s)$ and $T_2(\omega)$ are defined by

$$T_1(s) = 1+s\left(\frac{\Psi\left(\frac{2+s}{4}\right)-\Psi\left(\frac{s}{4}\right)}{2}\right), \qquad T_2(\omega) = \frac{\pi}{2}\,\mathrm{csch}\left(\frac{\pi\omega}{2}\right),$$

then

(a) $\mathcal{L}\left({}^{\sigma}D^{\alpha}_{0}f(t)\right)(s) = C_1(\alpha)\left(s(\alpha-1)\right)^2\,T_1\left((\alpha-1)s\right)\,\mathcal{L}(f)(s)$;

(b) $\mathcal{F}\left({}^{\sigma}D^{\alpha}_{0}f\right)(\omega) = C_1(\alpha)\,\omega^2\left|\alpha-1\right|(\alpha-1)\,T_2\left((\alpha-1)\omega\right)\,\mathcal{F}(f)(\omega)$,

where $\mathcal{L}(f)(s)$ denotes the Laplace transform of $f$ and $\mathcal{F}(f)(\omega)$ denotes the Fourier transform of $f$.

Proof. (a) follows from a standard application of the Convolution Theorem. Using the dilation property $\mathcal{L}\left(f(at)\right)=\frac{1}{a}F\left(\frac{s}{a}\right)$, we have

$$\frac{\mathcal{L}\left({}^{\sigma}D^{\alpha}_{0}f(t)\right)(s)}{C_1(\alpha)} = \mathcal{L}\left(f'*\mathrm{sech}^2\left(\frac{t}{\alpha-1}\right)\right) = \mathcal{L}\left(f'\right)\,\mathcal{L}\left(\mathrm{sech}^2\left(\frac{t}{\alpha-1}\right)\right)$$

$$= s(\alpha-1)\,\mathcal{L}(f)(s)\,\mathcal{L}\left(\mathrm{sech}^2\right)\left(s(\alpha-1)\right)$$

$$= \left((\alpha-1)s\right)^2\,\mathcal{L}(f)(s)\,\mathcal{L}\left(\tanh\right)\left(s(\alpha-1)\right).$$

The transform $\mathcal{L}(\tanh t)$ is handled as follows:

$$s^2\,\mathcal{L}\left(\tanh(t)\right)(s) = s^2\int_0^{\infty}e^{-st}\,\frac{1-e^{-2t}}{1+e^{-2t}}\,dt = s^2\int_0^{\infty}e^{-st}\left(1-e^{-2t}\right)\sum_{k=0}^{\infty}\left(-e^{-2t}\right)^k dt.$$

Because the series $\sum_{k=0}^{\infty}(-1)^k e^{-2kt}$ converges absolutely for $t>0$, we can exchange integration and summation. Continuing, we have

$$s + 2s^2\sum_{k=1}^{\infty}(-1)^k\,\mathcal{L}\left(e^{-2kt}\right) = s + 2s^2\sum_{k=1}^{\infty}\frac{(-1)^k}{2k+s} = s\left(1+s\left(\frac{\Psi\left(\frac{2+s}{4}\right)-\Psi\left(\frac{s}{4}\right)}{2}\right)\right).$$

The identity

$$\sum_{k=0}^{\infty}\frac{(-1)^k}{sk+1} = \frac{\Psi\left(\frac{s+1}{2s}\right)-\Psi\left(\frac{1}{2s}\right)}{2s}$$

used above comes from the Lerch transcendent, defined by

$$\Phi(z,s,a) = \sum_{k=0}^{\infty}\frac{z^k}{(a+k)^s},$$

where $|z|<1$ and $a\neq 0,-1,-2,\ldots$. Using the dilation property once more, the result follows.

(b) We proceed as in (a):

$$\mathcal{F}\left({}^{\sigma}D^{\alpha}_{0}f(t)\right) = \int_{-\infty}^{\infty}\left({}^{\sigma}D^{\alpha}_{0}f(t)\right)e^{-i\omega t}\,dt = \mathcal{F}\left(f'\right)\,\mathcal{F}\left(\mathrm{sech}^2\left(\frac{t}{\alpha-1}\right)\right)$$

$$= i\omega\,\mathcal{F}(f)\,\mathcal{F}\left(\mathrm{sech}^2\left(\frac{t}{\alpha-1}\right)\right) = i\omega\left|\alpha-1\right|\,\mathcal{F}(f)\,\mathcal{F}\left(\mathrm{sech}^2\right)\left((\alpha-1)\omega\right)$$

$$= \omega^2\left|\alpha-1\right|(\alpha-1)\,\mathcal{F}(f)\,\mathcal{F}\left(\tanh(t)\right)\left((\alpha-1)\omega\right).$$

To finish the proof, we recall the result

$$\mathcal{F}\left(\tanh(t)\right)(\omega) = -i\,T_2(\omega) = -i\,\frac{\pi}{2}\,\mathrm{csch}\left(\frac{\pi\omega}{2}\right).$$

Theorem 2.8. Suppose that $f$ is differentiable with $f'\ge 0$ and $0<\alpha<1$. Then

$$\int_a^t f'(s)\,e^{-\left(\frac{s-t}{1-\alpha}\right)^2}ds \;\le\; C_1(\alpha)^{-1}\,{}^{\sigma}D^{\alpha}_{a}\left(f(t)\right) \;\le\; \int_a^t \frac{(1-\alpha)^2\,f'(s)}{(1-\alpha)^2+(s-t)^2}\,ds \;\le\; f(t)-f(a).$$

Proof. Using the inequality

$$\cosh x \le e^{\frac{x^2}{2}},$$

we have

$$e^{-\frac{1}{2}\left(\frac{s-t}{1-\alpha}\right)^2} \le \mathrm{sech}\left(\frac{s-t}{1-\alpha}\right),$$

which, after squaring, yields the leftmost inequality. Noticing that $\cosh^2 x \ge 1+x^2$, we have

$$\mathrm{sech}^2\left(\frac{s-t}{1-\alpha}\right) \le \frac{(1-\alpha)^2}{(1-\alpha)^2+(s-t)^2} \le 1,$$

which establishes the remaining two inequalities.
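The sandwich in Theorem 2.8 can be checked numerically for a monotone test function; a minimal sketch (the test function and parameters are our choices):

```python
import numpy as np
from scipy.integrate import quad

a, t, alpha = 0.0, 2.0, 0.8
eps = 1.0 - alpha
df = lambda s: 2.0 * s  # f(t) = t^2 on [0, 2], so f' >= 0 and f(t) - f(a) = 4

lower, _ = quad(lambda s: df(s) * np.exp(-((s - t) / eps) ** 2), a, t)
mid, _ = quad(lambda s: df(s) / np.cosh((s - t) / eps) ** 2, a, t)  # C1^{-1} * sigD
upper, _ = quad(lambda s: eps**2 * df(s) / (eps**2 + (s - t) ** 2), a, t)

print(lower <= mid <= upper <= 4.0)  # True: the chain of inequalities in 2.8
```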

Theorem 2.9. The problem

$${}^{\sigma}D^{\alpha}_{a}\left(f(t)\right) = G(t), \qquad G(0)=0,$$

has the solution

$$f(t) = \frac{G(t)}{C_1(\alpha)} + f(0),$$

where $G(t)=\int_0^t g(s)\,ds$.

Proof. Differentiating the differential equation above, the problem reduces to

$$C_1(\alpha)\,f'(t) = g(t),$$

which can be integrated to obtain the result.

Theorem 2.10. Let $0<\alpha<1$ and let $g:(a,b)\times\mathbb{R}^2\to\mathbb{R}$ be a continuous function such that there exists a constant $C_0>0$ satisfying

$$\left|g(t,x_1,y_1) - g(t,x_2,y_2)\right| \le C_0\left(\left|x_1-x_2\right| + \left|y_1-y_2\right|\right)$$

for all $t\in(a,b)$ and $x_1,x_2,y_1,y_2\in\mathbb{R}$, and such that $\left|(\alpha-1)\,C(\alpha)\,C_0\right|<1$. Then the problem

$${}^{\sigma}D^{\alpha}_{a}f(t) = g\left(t,\,f(t),\,{}^{\sigma}D^{\alpha}_{a}f(t)\right)$$

has a unique solution.

Proof.

$$\left|g\left(t,{}^{\sigma}D^{\alpha}_{a}\left(f_1(t)\right)\right) - g\left(t,{}^{\sigma}D^{\alpha}_{a}\left(f_2(t)\right)\right)\right| \le \left|(\alpha-1)\,C_1(\alpha)\tanh\left(\frac{a-t}{1-\alpha}\right)\right|\left\|f_1-f_2\right\| \le \left|(\alpha-1)\,C_1(\alpha)\,C_0\right|\left\|f_1-f_2\right\|.$$

Since $\left|(\alpha-1)\,C(\alpha)\,C_0\right|<1$, the map $F:H^1(a,b)\to H^1(a,b)$ defined by

$$F(f)(t) = C_1(\alpha)^{-1}\,g\left(t,\,{}^{\sigma}D^{\alpha}_{a}\left(f(t)\right)\right)$$

is a contraction. By the Banach fixed-point theorem, it has a unique fixed point, finishing the proof.

We note that this result is advantageous in that the analogous existence and uniqueness results for fractional differential systems defined by the Caputo derivative depend heavily on the initial conditions imposed on the primary function of interest and its classical derivatives [4].

We now shift our attention to a gradient descent method. Suppose that $f(x)$ has a bounded derivative and a unique critical point $t^{*}$ such that $f'(t^{*})=0$. For $a\le t\le b$ and $0<\alpha<1$, define the scalar left sigmoidal fractional gradient descent method by

$$t_{k+1} = t_k - \mu\,{}^{\sigma}D^{\alpha}_{t_{k-1}}f(t_k), \quad (14)$$

where $0<\mu<1$ is the learning rate.
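A minimal numerical sketch of the iteration (14) follows (ours, not the paper's: the test objective, starting points, learning rate, and the normalization $C_1(\alpha)=\frac{1}{1-\alpha}$ from the earlier quadrature sketch are all illustrative choices):

```python
import numpy as np
from scipy.integrate import quad

def sig_deriv(df, a, t, alpha):
    """Left sigmoidal derivative (8) with C1(alpha) = 1/(1 - alpha), as sketched above."""
    eps = 1.0 - alpha
    val, _ = quad(lambda u: df(t + eps * u) / np.cosh(u) ** 2, (a - t) / eps, 0.0)
    return val  # C1 * eps * integral(du) = val after the substitution u = (s - t)/eps

def sigmoidal_gd(df, t0, t1, alpha=0.99, mu=0.5, iters=50):
    """Scalar iteration (14): t_{k+1} = t_k - mu * sigD_{t_{k-1}}^alpha f(t_k)."""
    prev, cur = t0, t1
    for _ in range(iters):
        prev, cur = cur, cur - mu * sig_deriv(df, prev, cur, alpha)
    return cur

# f(t) = (t - 1)^2 has its unique critical point at t* = 1
print(sigmoidal_gd(lambda s: 2.0 * (s - 1.0), t0=0.0, t1=0.2))
# lands within O(1 - alpha) of t* = 1 (about 1.007 here)
```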

Theorem 2.11. (Fractional gradient descent) Let $f$ be as above. Then the left sigmoidal fractional-order gradient method (14) converges to the true critical point $t^{*}$.

Proof. Denote the Lipschitz constant of $f$ by $L$. For $k\in\mathbb{N}$,

$$\left|t_k - t_{k+1}\right| = \mu\left|{}^{\sigma}D^{\alpha}_{t_{k-1}}f(t_k)\right| = C_1(\alpha)\,\mu\left|\int_{t_{k-1}}^{t_k} f'(s)\,\mathrm{sech}^2\left(\frac{s-t_k}{1-\alpha}\right)ds\right|$$

$$\le C_1(\alpha)\,\mu\,L\left|\alpha-1\right|\left|\tanh\left(\frac{t_{k-1}-t_k}{1-\alpha}\right)\right| \le C_1(\alpha)\,\mu\,L\left|t_k-t_{k-1}\right|.$$

Repeating this process, it follows that $(t_k)$ is a Cauchy sequence, guaranteeing convergence. To show that the sequence converges to the critical point, suppose for contradiction that the sequence $(t_k)_{k=0}^{\infty}$ converges to a point $\hat{t}\neq t^{*}$. Then, for every $\epsilon>0$, there exists $N\in\mathbb{N}$ such that for all $k\ge N$, $\left|f'(t_k)\right|>0$ and

$$\left|t_{k-1}-\hat{t}\right| < \epsilon < \left|t^{*}-\hat{t}\right|.$$

As a consequence of (14) and Theorem 2.8, we have

$$\left|t_{k+1}-t_k\right| = C_1(\alpha)\,\mu\left|\int_{t_{k-1}}^{t_k} f'(s)\,\mathrm{sech}^2\left(\frac{s-t_k}{1-\alpha}\right)ds\right| \ge C_1(\alpha)\,\mu\inf_{k>N}\left|\int_{t_{k-1}}^{t_k} f'(s)\,e^{-\left(\frac{s-t_k}{1-\alpha}\right)^2}ds\right|$$

$$\ge C_1(\alpha)\,\mu\inf_{k>N}\left|f'(t_{k-1})\right|\left|\int_{t_{k-1}}^{t_k} 1-\left(\frac{s-t_k}{1-\alpha}\right)^2 ds\right| \ge M_1\left|t_k-t_{k-1}\right|\left(1+\frac{\left|t_k-t_{k-1}\right|}{(1-\alpha)^3}\right) \ge M_1 M_2\left|t_k-t_{k-1}\right|^{\frac{3}{2}},$$

where

$$M_1 = C_1(\alpha)\,\mu\inf_{k>N}\left|f'(t_{k-1})\right|, \qquad M_2 = \frac{1}{3(1-\alpha)^3}.$$

On the other hand, we have the inequality

$$\left|t_{k+1}-t_{k-1}\right| \le \left|t_{k+1}-\hat{t}\right| + \left|\hat{t}-t_{k-1}\right| < 2\epsilon.$$

Choosing $\epsilon<\frac{1}{2}\left(M_1M_2\right)^2$ yields $M_1M_2 > \left|t_{k+1}-t_k\right|^{\frac{1}{2}}$, which implies that $\left|t_{k+1}-t_k\right| > \left|t_k-t_{k-1}\right|$, contradicting the assumption that the sequence $(t_k)$ is convergent.

In this paper, we defined a new sigmoidal fractional derivative, which is compatible with certain weakly differentiable functions. We showed that this fractional derivative satisfies forms of several classical properties and is compatible with the $\ell_1$ norm via a sigmoidal approximation. For further research, we will investigate this operator in optimization and machine learning. We note that the left sigmoidal fractional derivative can be applied in the context of gradient descent, which has applications in optimization and machine learning [7,8]. Recently, backpropagation and convolutional neural networks have been studied in the context of fractional derivatives, with Caputo-type derivatives typically used for gradient descent. This idea is still novel and needs improvement. For example, the gradient descent method has been treated by Sheng et al. [20], Chen et al. [21], Wang et al. [19], Wei et al. [9] and Bao et al. [16]. These methods are still early in development. The following topics still need to be fully addressed: convergence to an extreme point, extending the available range of fractional orders, more complicated neural networks, loss-function compatibility, and the usage of the chain rule.

    The authors declare that there is no conflict of interest.



[1] R. Gorenflo, F. Mainardi, Fractional calculus: integral and differential equations of fractional order, In: Fractals and Fractional Calculus in Continuum Mechanics, Wien and New York: Springer Verlag, 1997, 223-276.
    [2] A. N. Kochubei, General fractional calculus, evolution equations, and renewal processes, Integr. Equat. Oper. Th., 71 (2011), 583-600. doi: 10.1007/s00020-011-1918-8
    [3] V. Kiryakova, Generalised Fractional Calculus and Applications, Pitman Research Notes in Mathematics, CRC Press, 1993.
[4] I. Podlubny, Fractional Differential Equations, Academic Press, San Diego, 1999.
    [5] M. Caputo, Elasticità e Dissipazione, Zanichelli, Bologna, 1965.
[6] E. C. de Oliveira, J. A. T. Machado, A review of definitions for fractional derivatives and integral, Math. Probl. Eng., 2014 (2014), 238459.
    [7] J. S. Zeng and W. T. Yin, On nonconvex decentralized gradient descent, IEEE T. Signal Proces., 66 (2018), 2834-2848.
    [8] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), 436-444. doi: 10.1038/nature14539
    [9] Y. Wei, Y. Kang, W. Yin, et al. Design of generalized fractional order gradient descent method, preprint, 2018.
[10] Y. F. Pu, G. L. Zhou, Y. Zhang, et al. Fractional extreme value adaptive training method: fractional steepest descent approach, IEEE T. Neur. Net. Lear., 26 (2015), 653-662.
    [11] M. Caputo, M. Fabrizio, A new Definition of Fractional Derivative without Singular Kernel, Progr. Fract. Differ. Appl., 1 (2015), 1-13.
    [12] E. C. de Oliveira, S. Jarosz, J. Vaz Jr., Fractional Calculus via Laplace Transform and its Application in Relaxation Processes, Commun. Nonlinear Sci., 69 (2019), 58-72. doi: 10.1016/j.cnsns.2018.09.013
[13] V. E. Tarasov, No nonlocality. No fractional derivative, Commun. Nonlinear Sci., 62 (2018), 157-163. doi: 10.1016/j.cnsns.2018.02.019
[14] D. Baleanu, A. Mousalou, S. Rezapour, The extended fractional Caputo-Fabrizio derivative of order $0\le\sigma<1$ on $C_{\mathbb{R}}[0,1]$ and the existence and uniqueness of solutions for two higher-order series-type differential equations, Adv. Differ. Equ-NY, 2018 (2018), 255.
    [15] J. Losada, J. J. Nieto, Properties of a new fractional derivative without singular Kernel, Prog. Fract. Differ. Appl., 1 (2015), 87-92.
    [16] C. Bao, Y. PU, Y. Zhang, Fractional-Order Deep Backpropagation Neural Network, Comput. Intel. Neurosc., 2018 (2018), 1-10.
    [17] J. Wang, G. Yang, B. Zhang, et al. Convergence Analysis of Caputo-Type Fractional Order Complex-Valued Neural Networks, IEEE Access, 5 (2017), 14560-14571. doi: 10.1109/ACCESS.2017.2679185
    [18] S. Cheng, Y. Wei, Y. Chen, et al. An innovative fractional order LMS based on variable initial value and gradient order, Signal Process., 133 (2017), 260-269. doi: 10.1016/j.sigpro.2016.11.026
[19] J. Wang, Y. Wen, Y. Gou, et al. Fractional-order gradient descent learning of BP neural networks with Caputo derivative, Neural Networks, 89 (2017), 19-30. doi: 10.1016/j.neunet.2017.02.007
    [20] D. Sheng, Y. Wei, Y. Chen, et al. Convolutional neural networks with fractional order gradient method, Neurocomputing, 2019.
    [21] Y. Q. Chen, Q. Gao, Y. H. Wei, et al. Study on fractional order gradient methods, Appl. Math. Comput., 314 (2017), 310-321.
    [22] U. N. Katugampola, A New Fractional Derivative with Classical Properties, arXiv preprint arXiv:1410.6535, 2014.
    [23] R. T. Alqahtani, Fixed-point theorem for Caputo-Fabrizio fractional Nagumo equation with nonlinear diffusion and convection, J. Nonlinear Sci. Appl., 9 (2016), 1991-1999. doi: 10.22436/jnsa.009.05.05
[24] A. Alsaedi, D. Baleanu, S. Etemad, et al. On coupled systems of time-fractional differential problems by using a new fractional derivative, J. Funct. Space., 2016 (2016), 1-8.
    [25] A. Atangana, On the new fractional derivative and application to nonlinear Fisher's reaction-diffusion equation, Appl. Math. Comput., 273 (2016), 948-956.
    [26] S. M. Aydogan, D. Baleanu, A. Mousalou, et al. On approximate solutions for two higher-order Caputo-Fabrizio fractional integro-differential equations, Adv. Differ. Equ-NY, 2017 (2017), 221.
    [27] A. Atangana, J. F. Gómez-Aguilar, Decolonisation of fractional calculus rules: Breaking commutativity and associativity to capture more natural phenomena, Eur. Phys. J. Plus, 133 (2018), 166.
    [28] T. M. Atanacković, S. Pilipović, D. Zorica, Properties of the Caputo-Fabrizio fractional derivative and its distributional settings, Fract. Calc. Appl. Anal., 21 (2018), 29-44. doi: 10.1515/fca-2018-0003
    [29] V. N. Krutikov, L. A. Kazakovtsev, G. Shkaberina, et al. New method of training two-layer sigmoid neural networks using regularization, IOP Conference Series: Materials Science and Engineering, 537 (2019), 042055. doi: 10.1088/1757-899X/537/4/042055
[30] R. Rakkiyappan, R. Sivaranjani, G. Velmurugan, et al. Analysis of global $O(t^{-\alpha})$ stability and global asymptotical periodicity for a class of fractional-order complex-valued neural networks with time varying delays, Neural Networks, 77 (2016), 51-69. doi: 10.1016/j.neunet.2016.01.007
[31] X. Chen, Application of fractional calculus in BP neural networks, Ph.D. thesis, Nanjing Forestry University, Nanjing, Jiangsu, 2013.
[32] A. H. Zemanian, Distribution Theory and Transform Analysis, New York: Dover Publications, 1987.
    [33] M. Caputo, M. Fabrizio, Applications of new time and spatial fractional derivatives with exponential kernels, Progr. Fract. Differ. Appl., 2 (2016), 1-11. doi: 10.18576/pfda/020101
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)