Research article

Transferring monolingual model to low-resource language: the case of Tigrinya


  • In recent years, transformer models have achieved great success in natural language processing (NLP) tasks. Most current results are obtained with monolingual transformer models, where the model is pre-trained on a single-language unlabelled text corpus and then fine-tuned for the specific downstream task. However, the cost of pre-training a new transformer model is high for most languages. In this work, we propose a cost-effective transfer learning method to adapt a strong source-language model, trained on a large monolingual corpus, to a low-resource target language. Using the XLNet language model, we demonstrate performance competitive with mBERT and a pre-trained target-language model on the cross-lingual sentiment (CLS) dataset and on a new sentiment analysis dataset for the low-resource language Tigrinya. With only 10k examples of the Tigrinya sentiment analysis dataset, English XLNet achieved a 78.88% F1-score, outperforming BERT and mBERT by 10% and 7%, respectively. More interestingly, fine-tuning the (English) XLNet model on the CLS dataset showed promising results compared to mBERT and even outperformed mBERT on one of the Japanese-language datasets.

    Citation: Abrhalei Tela, Abraham Woubie, Ville Hautamäki. Transferring monolingual model to low-resource language: the case of Tigrinya[J]. Applied Computing and Intelligence, 2024, 4(2): 184-194. doi: 10.3934/aci.2024011




    In much of the literature, time fractional models, i.e., models described by fractional differential equations or pseudo state space descriptions, are defined using the Caputo definition [32,33,34,35,36]. The Caputo definition is widely acclaimed because it makes it possible to define initial conditions that relate to the integer derivatives of the differentiated functions in the models considered. However, this paper shows that this definition does not take initial conditions properly into account if used to define a time fractional model.

    The problem was analysed for the first time by Lorenzo and Hartley [1,2]. To take the past of the model into account in a convenient way on a finite interval, they introduced an initialization function. The idea of replacing the commonly used initial values by an initial function was further developed in [3]. In [4], the need to consider the "prehistories" of the differentiated functions before the initial instant was shown, making it possible to address the initialisation of fractional visco-elastic equations and to reach a unique solution. In [5,6], a counter-example was used to demonstrate that initial conditions cannot be correctly taken into account in a dynamical model, whether with the Caputo or the Riemann-Liouville definition. This led to the conclusion in [7] that fractional derivative initialization and time fractional model initialization are two distinct problems. Still using an initial time shifting method, counter-examples were proposed in [8] to show similar initialisation problems with the Caputo definition for partial differential equations. A time shifting technique was also recently used in [9] to analyse a groundwater flow model with time Caputo or Riemann-Liouville fractional partial derivatives; the non-objectivity of these models was demonstrated in that work. The authors of [9] did not address the problem of initialization, but this objectivity can be restored by also introducing an initialization function (instead of initial conditions).

    As previously mentioned, several studies and solutions have already been published on the initialisation of fractional models, but many papers in which initial conditions are taken into account incorrectly are still being published. The novelties and contributions of this paper are therefore new demonstrations and new simulations that highlight how initialisation must be done with a time fractional model. In this paper, two examples are used to show that the Caputo definition does not enable initial conditions to be correctly handled when this definition is used to define a time fractional model. In the first example, the response of a simple model, assumed to be at rest, is calculated analytically on a given time interval. Then, inside this interval, a second response is computed by considering initial conditions resulting from the first simulation, and ignoring the model past before the considered initial time. This is the initialisation currently found in the literature, and this example shows that it is unable to ensure the correct model trajectories. In the second example, two different histories are generated that produce the same initial conditions for the model. This example shows that in spite of equal initial conditions, the model responses are different, thus showing that all the model past must be taken into account to define its future. A similar analysis is also carried out with the Riemann-Liouville and Grünwald-Letnikov definitions, suggesting that other definitions are also problematic. Note that all the analyses carried out and the conclusions obtained in this paper relate to models involving only time fractional derivatives, not space fractional derivatives as in [29,30,31].

    The fractional integral of order ν, 0<ν<1, of a function y(t) is defined by [10]:

    $$ I_{t_0}^{\nu}y(t)=\frac{1}{\Gamma(\nu)}\int_{t_0}^{t}\frac{y(\tau)}{(t-\tau)^{1-\nu}}\,d\tau. \tag{1} $$

    where Γ(·) is Euler's gamma function. From this definition, the Caputo derivative of order ν, 0<ν<1, of a function y(t) is defined by [11]:

    $$ {}^{C}D_{t_0}^{\nu}y(t)=I_{t_0}^{1-\nu}\!\left(\frac{d}{dt}y(t)\right)=\frac{1}{\Gamma(1-\nu)}\int_{t_0}^{t}\frac{1}{(t-\tau)^{\nu}}\,\frac{dy(\tau)}{d\tau}\,d\tau. \tag{2} $$

    The Laplace transform applied to relation (2) reveals how initial conditions are associated with this definition:

    $$ \mathcal{L}\left\{{}^{C}D_{t_0}^{\nu}y(t)\right\}=\frac{1}{s^{1-\nu}}\left(sY(s)-y(t_0)\right)=s^{\nu}Y(s)-s^{\nu-1}y(t_0). \tag{3} $$
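    As a quick numerical illustration of definition (1) (a minimal sketch in Python; the product-integration rule, the test values and the function names are implementation choices, not taken from the paper), the fractional integral of a Heaviside input can be evaluated and compared with the closed form t^ν/Γ(ν+1) that appears later in relation (6):

```python
from math import gamma

def frac_integral(y, t, nu, n=4000):
    """Numerical evaluation of relation (1) with t0 = 0: product integration,
    i.e. the kernel (t - tau)^(nu - 1) is integrated exactly on each sub-interval
    and y is taken at the mid-point (a simple scheme, adequate here)."""
    h = t / n
    total = 0.0
    for k in range(n):
        a, b = k * h, (k + 1) * h
        w = ((t - a) ** nu - (t - b) ** nu) / nu   # exact integral of the kernel on [a, b]
        total += w * y((a + b) / 2)
    return total / gamma(nu)

nu, t = 0.6, 4.0                                   # arbitrary test values
print(frac_integral(lambda tau: 1.0, t, nu))       # I^nu of the Heaviside input at time t
print(t ** nu / gamma(nu + 1))                     # closed form used later in relation (6)
```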

    To demonstrate that the Caputo definition is not able to take initial conditions correctly into account when used to define a time fractional model (a fractional differential equation or a pseudo state space description), the following model is considered

    $$ D^{\nu}y(t)=-a\,y(t)+u(t), \qquad 0<\nu<1, \quad a>0. \tag{4} $$

    In relation (4), $D^{\nu}$ denotes the Caputo derivative in this section, and the Riemann-Liouville or Grünwald-Letnikov derivatives in the next section. The following algorithm is then used to study model (4).

    Algorithm 1

    1-Simulation on the time interval [0,t1] of the time fractional model (for instance model (4)) with null initial conditions (for t ∈ ]−∞, 0]). Let S1 denote this simulation.

    2-Record the model output y(t) and the integer derivatives of y(t) (denoted y'(t), y''(t), …) at time t0 such that 0<t0<t1.

    3-Simulate the model again on [t0,t1], using y(t0), y'(t0), y''(t0) … as initial conditions. Let S2 denote this simulation.

    4-Compare S1 and S2 on [t0,t1] and notice if they are different.

    Algorithm 1 is now applied to model (4) with a=0. The model is assumed to be at rest before t=0, and the input u(t) is assumed to be a Heaviside function H(t). In such conditions, relation (4) is equivalent to [11]

    $$ y(t)=y(t_0)+I_{t_0}^{\nu}\left\{H(t)\right\}, \qquad 0<\nu<1. \tag{5} $$

    As a consequence, the simulation defined in Algorithm 1 provides the following solutions:

    $$ S_1:\; y(t)=\frac{t^{\nu}}{\Gamma(\nu+1)}, \qquad 0<t<t_1 \tag{6} $$
    $$ S_2:\; y(t)=\frac{(t-t_0)^{\nu}}{\Gamma(\nu+1)}+y(t_0), \qquad t_0<t<t_1. \tag{7} $$
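    A minimal numerical sketch of relations (6) and (7) (Python; t0 = 5 s and ν = 0.6 are the values of Figure 1, while t1 = 10 s is an assumed horizon) shows the disagreement quantitatively:

```python
import numpy as np
from math import gamma

nu, t0, t1 = 0.6, 5.0, 10.0          # order and times (t1 assumed)
t = np.linspace(t0, t1, 501)

s1 = t ** nu / gamma(nu + 1)                     # relation (6), model never restarted
y_t0 = t0 ** nu / gamma(nu + 1)                  # "initial condition" recorded at t0
s2 = (t - t0) ** nu / gamma(nu + 1) + y_t0       # relation (7), restart at t0

print("max |S2 - S1| on [t0, t1]:", np.max(np.abs(s2 - s1)))   # clearly non-zero
```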

    Figure 1 compares S1 and S2 and reveals a difference, thus demonstrating that the Caputo definition does not correctly take initial conditions into account.

    Figure 1.  Comparison of the exact response of model (4) with the responses obtained with the Caputo definition and initial conditions (t0=5s, a=0, ν=0.6).

    Another way to illustrate this result is to consider two different input signals u1(t) and u2(t) that create two different histories with:

    $$ u_i(t)=A_i\,H(t+t_i)-A_i\,H(t) \qquad \text{with } t_i>0,\; i\in\{1,2\}. \tag{8} $$

    The model is assumed to be at rest on t ∈ ]−∞, −t_i]. A constraint is also imposed on these signals so that at t=0, the two resulting model outputs coincide:

    $$ y_1(0)=y_2(0). \tag{9} $$

    The output yi(t) is thus defined by:

    $$ y_i(t)=\frac{A_i}{a}\left(1-E_{\nu,1}^{1}\!\left(-a(t+t_i)^{\nu}\right)\right)H(t+t_i)-\frac{A_i}{a}\left(1-E_{\nu,1}^{1}\!\left(-a\,t^{\nu}\right)\right)H(t), \tag{10} $$

    where $E_{\alpha,\beta}^{\gamma}(z)$ is the Mittag-Leffler function defined by [12]:

    $$ E_{\alpha,\beta}^{\gamma}(z)=\sum_{k=0}^{\infty}\frac{\Gamma(\gamma+k)}{\Gamma(\alpha k+\beta)\,\Gamma(\gamma)}\,\frac{z^{k}}{k!}. \tag{11} $$

    Condition (9) thus leads to

    $$ \frac{A_1}{a}\left[\left(1-E_{\nu,1}^{1}\!\left(-a\,t_1^{\nu}\right)\right)-\left(1-E_{\nu,1}^{1}(0)\right)\right]=\frac{A_2}{a}\left[\left(1-E_{\nu,1}^{1}\!\left(-a\,t_2^{\nu}\right)\right)-\left(1-E_{\nu,1}^{1}(0)\right)\right] \tag{12} $$

    thus leading to the condition:

    $$ A_1=A_2\,\frac{1-E_{\nu,1}^{1}\!\left(-a\,t_2^{\nu}\right)}{1-E_{\nu,1}^{1}\!\left(-a\,t_1^{\nu}\right)}. \tag{13} $$

    With ν=0.4, a=1, t1=8s, t2=2s, A2=5, and thus A1 ≈ 4.17, Figure 2 shows the input signals u1(t) and u2(t) used for the analysis and compares the resulting outputs. The two responses have the same value at t=0, but their evolutions for t>0 are not the same. The information at t=0 is thus not enough to predict the future of the model; all of its past must be taken into account, which confirms that initialization as defined by the Caputo definition is not acceptable if used to define a time fractional model such as (4).
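    Condition (13) can be checked numerically with a truncated series for (11) (a minimal sketch in Python; the helper name and the truncation length are implementation choices, adequate for the moderate arguments used here). With the values above it returns A1 ≈ 4.17 and equal outputs at t = 0:

```python
from math import gamma

def ml(alpha, beta, z, n_terms=200):
    """Mittag-Leffler function E_{alpha,beta}(z), i.e. relation (11) with gamma = 1,
    evaluated by truncating the series (sufficient for the moderate |z| used here)."""
    return sum(z ** k / gamma(alpha * k + beta) for k in range(n_terms))

nu, a = 0.4, 1.0
t1, t2, A2 = 8.0, 2.0, 5.0                      # values used for Figure 2

# relation (13): amplitude A1 that makes the two outputs coincide at t = 0
A1 = A2 * (1.0 - ml(nu, 1.0, -a * t2 ** nu)) / (1.0 - ml(nu, 1.0, -a * t1 ** nu))
print(A1)                                       # approximately 4.17, as quoted in the text

# relation (10) evaluated at t = 0 for both histories
y1_0 = A1 / a * (1.0 - ml(nu, 1.0, -a * t1 ** nu))
y2_0 = A2 / a * (1.0 - ml(nu, 1.0, -a * t2 ** nu))
print(y1_0, y2_0)                               # identical values, i.e. condition (9)
```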

    Figure 2.  Comparison of the responses y1(t) and y2(t) of model (4) to two inputs that provide the same initial conditions.

    The previous section showed that the Caputo definition should no longer be used to define time fractional models such as (4). What about other definitions?

    The Riemann-Liouville derivative of order ν, 0<ν<1, of a function y(t) is defined by [11]:

    $$ {}^{RL}D_{t_0}^{\nu}y(t)=\frac{1}{\Gamma(1-\nu)}\,\frac{d}{dt}\int_{t_0}^{t}\frac{y(\tau)}{(t-\tau)^{\nu}}\,d\tau. \tag{14} $$

    The Laplace transform applied to relation (14) reveals how initial conditions are associated with this definition:

    $$ \mathcal{L}\left\{{}^{RL}D_{t_0}^{\nu}y(t)\right\}=\mathcal{L}\left\{\frac{d}{dt}\left(\frac{1}{\Gamma(1-\nu)}\int_{t_0}^{t}\frac{y(\tau)}{(t-\tau)^{\nu}}\,d\tau\right)\right\}=s\,\frac{1}{s^{1-\nu}}\,Y(s)-\left[I_{t_0}^{1-\nu}y(t)\right]_{t=t_0}. \tag{15} $$

    As a consequence, in [11,13], the initialisation of relation (4) is defined by

    $$ \frac{d^{\nu}}{dt^{\nu}}y(t)=-a\,y(t)+u(t), \qquad \left.I_{t_0}^{1-\nu}\{y(t)\}\right|_{t=t_0}=y_0 \tag{16} $$

    and thus the initialisation problem of relation (4) is equivalent to the integral equation

    $$ y(t)=\frac{y_0}{\Gamma(\nu)}(t-t_0)^{\nu-1}+I_{t_0}^{\nu}\left\{-a\,y(t)+u(t)\right\}. \tag{17} $$

    Algorithm 1 is applied again to model (4) with a=0. The model is assumed to be at rest before t=0, and the input u(t) is assumed to be a Heaviside function H(t). Algorithm 1 provides the following solutions:

    $$ S_1:\; y(t)=\frac{t^{\nu}}{\Gamma(\nu+1)}, \qquad 0<t<t_1 \tag{18} $$
    $$ S_2:\; y(t)=\frac{(t-t_0)^{\nu}}{\Gamma(\nu+1)}+\frac{y_0}{\Gamma(\nu)}(t-t_0)^{\nu-1}, \qquad t_0<t<t_1. \tag{19} $$

    Relation (19) seems to say that any value of y_0 can be chosen, but whatever the value selected, the S2 response y(t) tends toward infinity as t tends toward t0 if y_0 ≠ 0, whereas y(t0) = t_0^ν/Γ(ν+1) for S1. The two simulations thus give different results. This is illustrated in Figure 3 for various values of y_0.
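    A short numerical sketch of relations (18) and (19) (Python; t0 and ν are the values of Figure 3, the trial values of y0 are arbitrary) makes this divergence explicit:

```python
import numpy as np
from math import gamma

nu, t0 = 0.7, 5.0                              # values of Figure 3
t = t0 + np.array([1e-6, 1e-4, 1e-2, 1.0])     # instants just after the restart time t0

s1 = t ** nu / gamma(nu + 1)                   # relation (18)
for y0 in (0.5, 1.0, 2.0):                     # arbitrary trial values of y0
    s2 = (t - t0) ** nu / gamma(nu + 1) + y0 / gamma(nu) * (t - t0) ** (nu - 1)  # relation (19)
    print(f"y0 = {y0}:", np.round(s2 - s1, 3))   # S2 blows up near t0 whenever y0 != 0
```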

    Figure 3.  Comparison of the exact response of model (4) with the responses obtained with the Riemann-Liouville definition (t0 = 5s, a=0,ν=0.7).

    The Grünwald-Letnikov derivative of order ν, 0<ν<1, of a function y(t) is defined by:

    $$ {}^{GL}D_{t_0}^{\nu}y(t)=\lim_{h\to 0}\frac{1}{h^{\nu}}\sum_{0\le m<\infty}(-1)^{m}\binom{\nu}{m}\,y(t-mh), \qquad t>t_0 \tag{20} $$

    with $\displaystyle \binom{\nu}{m}=\frac{\Gamma(\nu+1)}{m!\,\Gamma(\nu-m+1)}=\frac{\nu(\nu-1)(\nu-2)\cdots(\nu-m+1)}{m(m-1)(m-2)\cdots(m-m+1)}$.
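    The binomial weights (−1)^m C(ν, m) of definition (20) are conveniently generated with the standard recursion w_0 = 1, w_m = w_{m−1}(1 − (ν+1)/m), which follows directly from the product form above; a minimal sketch (Python, helper name assumed):

```python
def gl_weights(nu, n):
    """Weights w_m = (-1)^m * binom(nu, m) of definition (20),
    via the recursion w_0 = 1, w_m = w_{m-1} * (1 - (nu + 1) / m)."""
    w = [1.0]
    for m in range(1, n + 1):
        w.append(w[-1] * (1.0 - (nu + 1.0) / m))
    return w

print(gl_weights(0.6, 5))   # fractional order: all weights are non-zero (infinite memory)
print(gl_weights(1.0, 5))   # nu = 1 recovers the first difference: 1, -1, then zeros
```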

    This definition is often used in the literature as it provides a simple numerical scheme for fractional derivative implementation. In some research [14,15,16], these numerical schemes are used to solve the initialisation problem:

    $$ D_{t_0}^{\nu}y(t)=-a\,y(t)+u(t), \quad 0<\nu<1,\; a>0, \quad \text{for } t_0<t<T, \tag{21} $$
    $$ y(t_0)=y_0. $$

    In this case, it is not the Grünwald-Letnikov derivative definition that is questionable, but the idea that a time fractional model can be initialized solely with information at the initial time. In relation (20), the variable m goes from 0 to infinity, so this definition is able to take into account the past of the differentiated function prior to t0. In (21), the problem is the way the initial conditions are defined.

    To illustrate this problem, Algorithm 1 is applied to model (4) with a=1. The model is assumed to be at rest before t=0, and the input u(t) is assumed to be a Heaviside function H(t). In such conditions, the simulation S1 defined in Algorithm 1 provides the following solution:

    $$ S_1:\; y(t)=\frac{t^{\nu}}{\Gamma(\nu+1)}, \qquad 0<t<t_1. \tag{22} $$

    Simulation S2 is done using the Grünwald-Letnikov formula (20) and provides

    $$ S_2:\; y(t)=\frac{-\dfrac{1}{h^{\nu}}\displaystyle\sum_{1\le m<\infty}(-1)^{m}\binom{\nu}{m}\,y(t-mh)+H(t)}{\dfrac{1}{h^{\nu}}+1}, \qquad t>t_0. \tag{23} $$

    This simulation is done under two conditions:

    - S21: by taking into account all the past of the model (all the values of y(t) on t ∈ [0, t0], provided by S1)

    - S22: by considering only an initial condition at t0 (value of y(t) at t0 provided by S1).

    The comparison of the three simulations is done in Figure 4 and reveals that the Grünwald-Letnikov definition produces an exact solution provided that all the past of the model is taken into account.
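    This comparison can be reproduced with a short script (Python; ν, a and t0 are the values of Figure 4, while the step size h, the horizon t1 and the restart bookkeeping are implementation choices). S21 restarts at t0 with the full history computed by S1 and matches it exactly, whereas S22 keeps only y(t0) and drifts away:

```python
import numpy as np

nu, a, h = 0.6, 1.0, 0.01            # order and parameter of Figure 4, assumed step size
t1, t0 = 6.0, 3.0                    # horizon (assumed) and restart instant of Figure 4
n1, n0 = int(t1 / h), int(t0 / h)

w = np.empty(n1 + 1)                 # GL weights w_m = (-1)^m * binom(nu, m)
w[0] = 1.0
for m in range(1, n1 + 1):
    w[m] = w[m - 1] * (1.0 - (nu + 1.0) / m)

def gl_step(past, u_k):
    """One step of relation (23): next sample from the weighted sum of past samples."""
    m = len(past)
    conv = np.dot(w[1:m + 1], past[::-1])            # sum over m >= 1 of w_m * y(t - m h)
    return (-conv / h ** nu + u_k) / (1.0 / h ** nu + a)

# S1: reference run from t = 0, model at rest before 0, input u = H(t)
y1 = []
for k in range(n1):
    y1.append(gl_step(np.array(y1), 1.0))
y1 = np.array(y1)

# S21: restart at t0 keeping all the past produced by S1
y21 = list(y1[:n0])
for k in range(n0, n1):
    y21.append(gl_step(np.array(y21), 1.0))

# S22: restart at t0 keeping only the value y(t0); the past before t0 is forced to zero
y22 = [0.0] * (n0 - 1) + [y1[n0 - 1]]
for k in range(n0, n1):
    y22.append(gl_step(np.array(y22), 1.0))

print("max |S21 - S1| on [t0, t1]:", np.max(np.abs(np.array(y21[n0:]) - y1[n0:])))  # ~0
print("max |S22 - S1| on [t0, t1]:", np.max(np.abs(np.array(y22[n0:]) - y1[n0:])))  # clearly non-zero
```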

    Figure 4.  Comparison of the exact response of model (4) with the responses obtained with the Grünwald-Letnikov definition (t0 = 3s, a=1,ν=0.6).

    Relation (23) is particularly interesting because it shows that a time fractional model (here a fractional integrator) is represented by an infinite difference equation, and therefore an initialization of all its terms is necessary for a prediction of the output y(t).

    This remark could also apply to the Caputo and Riemann-Liouville definitions, which would lead to reformulations with integrals on the interval ]−∞, t], as suggested in [27].

    The need to take into account all the past of a time fractional model, and not just the knowledge of its pseudo state at a single point in the past, can be demonstrated quite simply on relation (4) (a particular case of fractional differential equation or of pseudo state space description). Contrary to what relation (4) might suggest, Figure 5 highlights that the implementation of fractional differential equations does not explicitly involve the fractional differentiation operator but the fractional-order integration operator $I^{\nu}$. Thus, in practice, it is not necessary to specify which particular definition is used for $D^{\nu}$ in equation (4). Moreover, even if the system is assumed to have zero initial conditions at t=0, namely if the system is assumed to be at rest (u(t)=y(t)=0 for t<0), it is important to note that y(t) cannot be considered as a state for the time fractional model and that all the past of y(t) is required to compute the model evolution.

    Figure 5.  Block diagram of Eq (4).

    To better illustrate this concept, a simple time fractional model is used: a fractional integrator assumed to be at rest at t=0. The corresponding block diagram is shown in Figure 6.

    Figure 6.  Block diagram of an order ν fractional integrator.

    For an integer integrator, ν=1, relation (4) is really a state space description. At t1>0, the state y(t1) can be computed if the input between 0 and t1 is known:

    $$ y(t_1)=\int_{0}^{t_1}x(\tau)\,d\tau=y_1=\mathrm{cst}. \tag{24} $$

    Values of y(t) at times later than t1 are given by:

    $$ y(t)=\int_{0}^{t}x(\tau)\,d\tau=\underbrace{\int_{0}^{t_1}x(\tau)\,d\tau}_{y_1=\mathrm{cst}}+\int_{t_1}^{t}x(\tau)\,d\tau, \qquad t>t_1. \tag{25} $$

    Thus, y(t) can be computed if x(t) is known between t1 and t. The integrator output at time t1 thus summarizes the whole model past; y(t) is really the state of the dynamic model, in agreement with the definition given in [26].

    Let us apply the same reasoning to the fractional integrator case of order ν. From the definition of fractional integration, the value of y(t) at t1>0 can be computed if the input between t=0 and t1 is known:

    $$ y(t_1)=\frac{1}{\Gamma(\nu)}\int_{0}^{t_1}(t_1-\tau)^{\nu-1}x(\tau)\,d\tau=y_1=\mathrm{cst}. \tag{26} $$

    Variable y(t), t>t1, is thus given by:

    $$ y(t)=\frac{1}{\Gamma(\nu)}\int_{0}^{t}(t-\tau)^{\nu-1}x(\tau)\,d\tau=\underbrace{\frac{1}{\Gamma(\nu)}\int_{0}^{t_1}(t-\tau)^{\nu-1}x(\tau)\,d\tau}_{\alpha(t)\,y_1}+\frac{1}{\Gamma(\nu)}\int_{t_1}^{t}(t-\tau)^{\nu-1}x(\tau)\,d\tau. \tag{27} $$

    Two notable differences can be highlighted with respect to the integer case. First, the term α(t) in equation (27) is not a constant but depends on the considered time t. Moreover, even if y1=y(t1) is known, it is not enough to compute α(t). The output y(t) of the fractional integrator is thus not a state. The same analysis holds for the general case of a pseudo state space description or a fractional differential equation.

    Beyond discussions on the concept of state, the computation of α(t) in relation (27), whatever the time t, requires the knowledge of y(t) for all t ∈ [0, t1], thus all the model past. This clearly shows that knowledge of y(t) at a single point of the past is not enough.
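    This can also be checked numerically. In the sketch below (Python; the two input profiles and the numerical values are arbitrary choices, the ramp being scaled so that both inputs give the same y(t1)), the futures of the fractional integrator differ even though y(t1) is identical, because the α(t) term of (27) depends on the whole input history on [0, t1]:

```python
import numpy as np
from math import gamma

nu, t1 = 0.6, 5.0                        # integrator order and observation time (assumed values)

def past_contribution(x, t, n=20000):
    """(1/Gamma(nu)) * integral over [0, t1] of (t - tau)^(nu - 1) x(tau) dtau,
    i.e. the contribution of the past input (the alpha(t) * y1 term of (27) when t > t1)."""
    tau = (np.arange(n) + 0.5) * (t1 / n)            # mid-point rule on [0, t1]
    return np.sum((t - tau) ** (nu - 1) * x(tau)) * (t1 / n) / gamma(nu)

# two different histories on [0, t1]: a constant input and a ramp scaled to give the same y(t1)
x_const = lambda tau: np.ones_like(tau)
c = gamma(2 + nu) / (gamma(1 + nu) * t1)
x_ramp = lambda tau: c * tau

print("y(t1):", past_contribution(x_const, t1), past_contribution(x_ramp, t1))        # (almost) equal
for t in (t1 + 1.0, t1 + 3.0):                       # input switched off after t1
    print(f"t = {t} :", past_contribution(x_const, t), past_contribution(x_ramp, t))  # different futures
```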

    Fractional operators and the resulting time fractional models are known for their memory property. However, for the following two reasons, many studies proposed in the literature seem to ignore this property when the model initialization problem is considered:

    -they use the Caputo definition, which involves only integer derivatives of the differentiated function at the initial time,

    -they use other definitions, but initialization is done by taking into account only an initial value at the initial time.

    This kind of initialization means that the operator or model memory exists everywhere on the time axis except at the initial time, which is not consistent. Memory is an intrinsic property that exists at all times, as shown in this paper with very simple examples. While most of the fractional derivative definitions encountered in the literature [17] are not problematic from a mathematical point of view, this paper shows that the Caputo and Riemann-Liouville definitions are not able to ensure a proper initialization when used in a model definition. The paper also shows that this problem is not encountered with the Grünwald-Letnikov definition, provided that all the past of the model (from −∞) is taken into account. And this is precisely one of the drawbacks of time fractional models that induces a physical inconsistency and many analysis problems [17].

    What are the possible solutions? One solution can be to add an initialization function to the definition of the model, as proposed by Lorenzo and Hartley [1,2]. However, here again, it requires the knowledge of all the model past (from −∞). Another solution consists in introducing new kernels for the definition of fractional integration, as in [19]. But the goal would not be to solve only a singularity problem as in [19], but to reach a finite memory length, as was done for instance in [20]. Note that while it was claimed in [21] that this class of kernels is too restrictive, that criticism is linked to the problem analysed in this paper: the inability of the Caputo definition to take initial conditions properly into account if used to define a time fractional model [22]. Another solution is to introduce new models for fractional behaviour modelling, without the drawbacks associated with time fractional models [18]:

    -distributed time delay models [23];

    -non-linear models [24];

    -partial differential equation with spatially varying coefficients [25].

    Can all the conclusions presented in this paper be extended to models involving space fractional derivatives, as in [29,30,31]? As shown in [28], whatever the variable to which the derivative applies, a fractional model remains a doubly infinite dimensional model and, as such, requires an infinite amount of information for its initialization. The question remains open; the authors will seek to answer it in their future work.



    [1] S. Bird, Ewan Klein, Edward Loper, Natural language processing with Python: analyzing text with the natural language toolkit, 1 Ed., Sebastopol: O'Reilly Media, Inc., 2009.
    [2] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, et al., Google's neural machine translation system: bridging the gap between human and machine translation, arXiv: 1609.08144. http://dx.doi.org/10.48550/arXiv.1609.08144
    [3] B. Liu, Sentiment analysis and opinion mining, Cham: Springer, 2012. http://dx.doi.org/10.1007/978-3-031-02145-9
    [4] P. Rajpurkar, J. Zhang, K. Lopyrev, P. Liang, SQuAD: 100,000+ questions for machine comprehension of text, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, 2383–2392. http://dx.doi.org/10.18653/v1/D16-1264 doi: 10.18653/v1/D16-1264
    [5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, 6000–6010.
    [6] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, 4171–4186. http://dx.doi.org/10.18653/v1/N19-1423 doi: 10.18653/v1/N19-1423
    [7] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, Q. V. Le, XLNet: generalized autoregressive pretraining for language understanding, Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, 5753–5763.
    [8] S. Ruder, A. Søgaard, I. Vulić, Unsupervised cross-lingual representation learning, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, 2019, 31–38. http://dx.doi.org/10.18653/v1/P19-4007 doi: 10.18653/v1/P19-4007
    [9] C. Wang, M. Li, A. J. Smola, Language models with transformers, arXiv: 1904.09408. http://dx.doi.org/10.48550/arXiv.1904.09408
    [10] G. Lample, A. Conneau, Cross-lingual language model pretraining, Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, 7059–7069.
    [11] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, et al., Unsupervised cross-lingual representation learning at scale, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, 8440–8451. http://dx.doi.org/10.18653/v1/2020.acl-main.747 doi: 10.18653/v1/2020.acl-main.747
    [12] W. Vries, A. Cranenburgh, A. Bisazza, T. Caselli, G. Noord, M. Nissim, BERTje: a Dutch BERT model, arXiv: 1912.09582. http://dx.doi.org/10.48550/arXiv.1912.09582
    [13] A. Virtanen, J. Kanerva, R. Ilo, J. Luoma, J. Luotolahti, T. Salakoski, et al., Multilingual is not enough: BERT for Finnish, arXiv: 1912.07076. http://dx.doi.org/10.48550/arXiv.1912.07076
    [14] K. K, Z. Wang, S. Mayhew, D. Roth, Cross-lingual ability of multilingual BERT: an empirical study, arXiv: 1912.07840. http://dx.doi.org/10.48550/arXiv.1912.07840
    [15] M. Artetxe, S. Ruder, D. Yogatama, On the cross-lingual transferability of monolingual representations, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, 4623–4637. http://dx.doi.org/10.18653/v1/2020.acl-main.421 doi: 10.18653/v1/2020.acl-main.421
    [16] Y. Tedla, K. Yamamoto, Morphological segmentation with LSTM neural networks for Tigrinya, IJNLC, 7 (2018), 29–44. http://dx.doi.org/10.5121/ijnlc.2018.7203 doi: 10.5121/ijnlc.2018.7203
    [17] R. Hetzron, The Semitic languages, New York: Routledge, 1997.
    [18] O. Osman, Y. Mikami, Stemming Tigrinya words for information retrieval, Proceedings of COLING 2012: Demonstration Papers, 2012, 345–352.
    [19] M. Tadesse, Trilingual sentiment analysis on social media, Master Thesis, University of Addis Ababa, 2018.
    [20] Y. K. Tedla, K. Yamamoto, A. Marasinghe, Tigrinya part-of-speech tagging with morphological patterns and the new Nagaoka Tigrinya corpus, International Journal of Computer Applications, 146 (2016), 33–41. http://dx.doi.org/10.5120/IJCA2016910943 doi: 10.5120/IJCA2016910943
    [21] A. Sahle, Sewasiw Tigrinya B'sefihu/a comprehensive Tigrinya grammar, Lawrenceville: Red Sea Press, Inc., 1998.
    [22] T. Kudo, J. Richardson, SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2018, 66–71. http://dx.doi.org/10.18653/v1/D18-2012 doi: 10.18653/v1/D18-2012
    [23] Z. Chi, L. Dong, F. Wei, X. Mao, H. Huang, Can monolingual pretrained models help cross-lingual classification? Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020, 12–17.
    [24] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv: 1301.3781. http://dx.doi.org/10.48550/arXiv.1301.3781
    [25] P. Prettenhofer, B. Stein, Cross-language text classification using structural correspondence learning, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010, 1118–1127.
    [26] A. Sugiyama, N. Yoshinaga, Data augmentation using back-translation for context-aware neural machine translation, Proceedings of the Fourth Workshop on Discourse in Machine Translation, 2019, 35–44. http://dx.doi.org/10.18653/v1/D19-6504 doi: 10.18653/v1/D19-6504
    [27] J. Wei, K. Zou, EDA: easy data augmentation techniques for boosting performance on text classification tasks, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019, 6382–6388. http://dx.doi.org/10.18653/v1/D19-1670 doi: 10.18653/v1/D19-1670
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
