A hybrid CNN-LSTM model with adaptive instance normalization for one shot singing voice conversion

Assila Yousuf; David Solomon George

doi:10.3934/electreng.2024013

AIMS Electronics and Electrical Engineering

2024, Volume 8, Issue 3: 292-310. doi: 10.3934/electreng.2024013


Research article


Assila Yousuf
David Solomon George

Department of Electronics and Communication Engineering, Rajiv Gandhi Institute of Technology, Kottayam, Kerala, 686501, India (Affiliated to APJ Abdul Kalam Technological University, Kerala)

Received: 04 March 2024 Revised: 22 May 2024 Accepted: 23 May 2024 Published: 03 June 2024

Singing voice conversion methods must strike a delicate balance between synthesis quality and singer similarity. Traditional voice conversion techniques primarily emphasize singer similarity, which often leads to robotic-sounding singing voices. Deep learning-based singing voice conversion techniques instead focus on disentangling singer-dependent and singer-independent features. Although this approach can improve the quality of synthesized singing voices, many voice conversion systems still suffer from singer-dependent features leaking into the content embeddings. In the proposed singing voice conversion technique, an encoder-decoder framework was implemented using a hybrid model that combines a convolutional neural network (CNN) with long short-term memory (LSTM). This paper investigated the use of activation guidance and adaptive instance normalization for one-shot singing voice conversion. The instance normalization (IN) layers within the autoencoder separated singer and content representations. During conversion, singer representations were transferred using adaptive instance normalization (AdaIN) layers. With the help of an activation function (activation guidance), the system prevented singer information from being transferred along with the singing content. In addition, combining LSTM with CNN allows the model to capture both local and contextual features. The one-shot capability simplified the architecture to a single encoder and a single decoder. The proposed hybrid CNN-LSTM model achieved this without compromising either quality or similarity. Objective and subjective evaluations showed that it outperformed the baseline architectures, with a mean opinion score (MOS) of 2.93 for naturalness and 3.35 for melodic similarity.
These properties allow the hybrid CNN-LSTM technique to perform high-quality voice conversion with minimal training data, making it a promising solution for various applications.
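The IN/AdaIN mechanism described above can be illustrated with a minimal sketch. Following the usual definition of Huang and Belongie [37], AdaIN first normalizes each channel of a content feature map over time (instance normalization). It then rescales the result with the per-channel mean and standard deviation of a reference singer's feature map. This is a generic plain-Python illustration, not the authors' CNN-LSTM implementation. The channels × frames layout, the function names and the stabilizing constant `eps = 1e-5` are our assumptions.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one row per channel, one column per frame


def channel_stats(row: List[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Mean and eps-stabilized standard deviation of one channel over time."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return mean, math.sqrt(var + eps)


def instance_norm(x: FeatureMap) -> FeatureMap:
    """IN: normalize every channel to zero mean and (almost) unit variance."""
    out = []
    for row in x:
        mean, std = channel_stats(row)
        out.append([(v - mean) / std for v in row])
    return out


def adain(content: FeatureMap, reference: FeatureMap) -> FeatureMap:
    """AdaIN: give the normalized content the per-channel statistics
    of the reference (singer) features."""
    out = []
    for row, ref_row in zip(instance_norm(content), reference):
        mean, std = channel_stats(ref_row)
        out.append([std * v + mean for v in row])
    return out


content = [[0.0, 1.0, 2.0, 3.0], [5.0, -1.0, 2.0, 0.5]]
reference = [[10.0, 12.0, 14.0], [-3.0, -2.0, -4.0, -1.0, -5.0]]
print(adain(content, reference))
```

Because the statistics are computed per channel over frames, the content and reference clips may have different lengths.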

Keywords:

Citation: Assila Yousuf, David Solomon George. A hybrid CNN-LSTM model with adaptive instance normalization for one shot singing voice conversion[J]. AIMS Electronics and Electrical Engineering, 2024, 8(3): 292-310. doi: 10.3934/electreng.2024013



1. Introduction

We consider the following family of nonlinear oscillators

$\begin{equation} y_{zz}+k(y)y_{z}^{3}+h(y)y_{z}^{2}+f(y)y_{z}+g(y) = 0, \end{equation}$

(1.1)

where $k$ , $h$ , $f\neq0$ and $g\neq0$ are arbitrary sufficiently smooth functions. Particular members of (1.1) describe various processes in physics, mechanics and other fields, and they also appear as invariant reductions of nonlinear partial differential equations [1,2,3].

Integrability of (1.1) was studied in a number of works [4,5,6,7,8,9,10,11,12,13,14,15,16]. In particular, in [15] the linearization of (1.1) via the following generalized nonlocal transformations

$\begin{equation} w = F(y), \quad d\zeta = (G_{1}(y)y_{z}+G_{2}(y))dz. \end{equation}$

(1.2)

was considered. However, the equivalence problem with respect to transformations (1.2) for (1.1) and its integrable nonlinear subcases has not been studied previously. Therefore, in this work we deal with the equivalence problem for (1.1) and one of its integrable subcases from the Painlevé-Gambier classification. Namely, we construct an equivalence criterion for (1.1) and a non-canonical form of the Ince Ⅶ equation [17,18]. As a result, we obtain two new integrable subfamilies of (1.1). Moreover, we demonstrate that for any equation from (1.1) that satisfies one of these equivalence criteria, one can construct an autonomous first integral in parametric form. We use the Ince Ⅶ equation because it is one of the simplest integrable members of (1.1) with a known general solution and a known classification of invariant curves.

Moreover, we show that transformations (1.2) preserve autonomous invariant curves for equations from (1.1). Since the considered non-canonical form of the Ince Ⅶ equation admits two irreducible polynomial invariant curves, any equation from (1.1) that is equivalent to it also admits two invariant curves. These invariant curves can be used to construct an integrating factor for the equations from (1.1) that are equivalent to the Ince Ⅶ equation. If this integrating factor is of Darboux type, then the corresponding equation is Liouvillian integrable [19]. This demonstrates the connection between the nonlocal equivalence approach and Darboux integrability theory and its generalizations, which has recently been discussed for a less general class of nonlocal transformations in [20,21,22].

The rest of this work is organized as follows. In the next Section we present an equivalence criterion for (1.1) and a non-canonical form of the Ince Ⅶ equation. In addition, we show how to construct an autonomous first integral for an equation from (1.1) satisfying this equivalence criterion. We also demonstrate that transformations (1.2) preserve autonomous invariant curves for (1.1). In Section 3 we provide two examples of integrable equations from (1.1) and construct their parametric first integrals, invariant curves and integrating factors. In the last Section we briefly discuss and summarize our results.

2. Main results

We begin with the equivalence criterion between (1.1) and a non-canonical form of the Ince Ⅶ equation, namely [17,18]

$\begin{equation} w_{\zeta\zeta}+3w_{\zeta}+\epsilon w^{3}+2w = 0. \end{equation}$

(2.1)

Here $\epsilon\neq0$ is an arbitrary parameter, which can be set, without loss of generality, to be equal to $\pm 1$ .

The general solution of (2.1) is

$\begin{equation} w = {\rm e}^{-(\zeta-\zeta_{0})}\rm{cn}\left\{\sqrt{\epsilon}({\rm e}^{-(\zeta-\zeta_{0})}-C_{1}),\frac{1}{\sqrt{2}}\right\}. \end{equation}$

(2.2)

Here $\zeta_{0}$ and $C_{1}$ are arbitrary constants and $\rm{cn}$ is the Jacobian elliptic cosine. Expression (2.2) will be used below for constructing autonomous parametric first integrals for members of (1.1).
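As a quick sanity check of (2.2), the following sketch evaluates it numerically and substitutes it into (2.1) using central finite differences. It assumes $\epsilon>0$, so that $\sqrt{\epsilon}$ is real. It also reads the second argument $1/\sqrt{2}$ of $\rm{cn}$ in (2.2) as the elliptic modulus $k$, i.e. the parameter is $m=k^{2}=1/2$. The function names are hypothetical. $\rm{cn}$ is computed with the classical descending arithmetic-geometric-mean (Landen) scheme rather than with a library routine.

```python
import math


def jacobi_cn(u: float, m: float, tol: float = 1e-15) -> float:
    """Jacobi cn(u | m), 0 <= m < 1, via the descending AGM scheme."""
    a, b, c = 1.0, math.sqrt(1.0 - m), math.sqrt(m)
    a_seq, c_seq = [a], [c]
    while abs(c) > tol:
        # simultaneous update: (a_n, b_n, c_n) from (a_{n-1}, b_{n-1})
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        a_seq.append(a)
        c_seq.append(c)
    n = len(a_seq) - 1
    phi = (2 ** n) * a_seq[n] * u
    for j in range(n, 0, -1):  # backward recurrence phi_n -> phi_0
        phi = 0.5 * (phi + math.asin(c_seq[j] * math.sin(phi) / a_seq[j]))
    return math.cos(phi)


def w_general(zeta: float, eps: float = 1.0, zeta0: float = 0.0,
              c1: float = 0.0) -> float:
    """General solution (2.2) of (2.1), assuming eps > 0."""
    s = math.exp(-(zeta - zeta0))
    return s * jacobi_cn(math.sqrt(eps) * (s - c1), 0.5)


def residual_21(zeta: float, eps: float = 1.0, h: float = 1e-4,
                **consts: float) -> float:
    """Left-hand side of (2.1) evaluated on (2.2) by finite differences."""
    def w(t: float) -> float:
        return w_general(t, eps, **consts)

    w0 = w(zeta)
    w_z = (w(zeta + h) - w(zeta - h)) / (2 * h)
    w_zz = (w(zeta + h) - 2 * w0 + w(zeta - h)) / h**2
    return w_zz + 3 * w_z + eps * w0**3 + 2 * w0


for z in (0.0, 0.5, 1.0):
    print(z, residual_21(z, zeta0=0.2, c1=0.4))
```

The printed residuals should be of the order of the finite-difference error, not exactly zero.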

The equivalence criterion between (1.1) and (2.1) can be formulated as follows:

Theorem 2.1. Equation (1.1) is equivalent to (2.1) if and only if either

$\begin{equation} \begin{gathered} (I)\quad 25515lgp^{2}q_{y}+2352980l^{10}+\left(3430q-6667920p^{3}\right) l^{5}\\ -14580qp^{3}-10q^{2}-76545lgqpp_{y} = 0, \end{gathered} \end{equation}$

(2.3)

$\begin{equation} \begin{gathered} (II)\quad 343l^{5}-972p^{3} = 0, \end{gathered} \end{equation}$

(2.4)

holds. Here

$\begin{equation} \begin{gathered} l = 9(fg_{y}-gf_{y}+fgh-3kg^{2})-2f^{3}, \quad p = gl_{y}-3lg_{y}+l(f^{2}-3gh),\\ q = 25515g_{y}lp^{2}-5103lgpp_{y}+686l^{5}-8505p^{2}\left( f^{2}-3gh \right) l+6561p^{3}. \end{gathered} \end{equation}$

(2.5)

The expression for $G_{2}$ in each case is either

$\begin{equation} (I) \quad G_{2} = \frac {126 l^{2}qp^{2}}{470596 l^{10}- \left( 1333584 p^{3}+1372 q \right) l^{5}+q^{2}}, \end{equation}$

(2.6)

$\begin{equation} (II) \quad G_{2}^{2} = -\frac{49l^{3}G_{2}+9p^{2}}{189pl}. \end{equation}$

(2.7)

In all cases the functions $F$ and $G_{1}$ are given by

$\begin{equation} F^{2} = \frac{l}{81\epsilon G_{2}^{3}}, \quad G_{1} = \frac{G_{2}(f-3G_{2})}{3g}. \end{equation}$

(2.8)
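To see formulas (2.5) and (2.8) at work, the following sketch uses the coefficients of Example 1 (Eq (3.1)) as a test case. It evaluates $l$ from (2.5), approximating $g_{y}$ and $f_{y}$ by central finite differences, and then applies (2.8). It assumes $G_{2}=2\mu y$, which we obtained by hand from the last relation in (2.10) with $F=\mu y^{2}$ (cf. (3.2)); Example 1 does not state $G_{2}$ explicitly. If our hand computation is right, (2.8) should return $F^{2}=\mu^{2}y^{4}$ and $G_{1}=-4/(y(\epsilon\mu^{2}y^{4}+2))$. The parameter values and function names are illustrative.

```python
EPS, MU = 1.0, 0.7  # illustrative parameter values (assumption)


def D(y: float) -> float:
    return EPS * MU**2 * y**4 + 2.0


# Coefficients of Example 1, Eq (3.1)
def k(y: float) -> float:
    return -12 * EPS * MU * y / D(y) ** 2


def h(y: float) -> float:
    return 0.0


def f(y: float) -> float:
    return -6 * MU * y


def g(y: float) -> float:
    return 2 * MU**2 * y**3 * D(y)


def deriv(fun, y: float, step: float = 1e-5) -> float:
    """Central finite-difference approximation of fun'(y)."""
    return (fun(y + step) - fun(y - step)) / (2 * step)


def invariant_l(y: float) -> float:
    """The quantity l from (2.5)."""
    return 9 * (f(y) * deriv(g, y) - g(y) * deriv(f, y)
                + f(y) * g(y) * h(y) - 3 * k(y) * g(y) ** 2) - 2 * f(y) ** 3


def f_squared_and_g1(y: float, G2: float):
    """F^2 and G1 from (2.8) for a given value of G2."""
    F_sq = invariant_l(y) / (81 * EPS * G2**3)
    G1 = G2 * (f(y) - 3 * G2) / (3 * g(y))
    return F_sq, G1


y = 0.9
print(f_squared_and_g1(y, G2=2 * MU * y), (MU * y**2) ** 2, -4 / (y * D(y)))
```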

Proof. We begin with the necessary conditions. Substituting (1.2) into (2.1) we get

$\begin{equation} y_{zz}+k(y)y_{z}^{3}+h(y)y_{z}^{2}+f(y)y_{z}+g(y) = 0, \end{equation}$

(2.9)

where

$\begin{equation} \begin{gathered} k = \frac{FG_{1}^{3}(\epsilon F^{2}+2)+3G_{1}^{2}F_{y}+G_{1}F_{yy}-F_{y}G_{1,y}}{G_{2}F_{y}}, \\ h = \frac{G_{2}F_{yy}+(6G_{1}G_{2}-G_{2,y})F_{y}+3FG_{2}G_{1}^{2}(\epsilon F^{2}+2)}{G_{2}F_{y}},\\ f = \frac{3G_{2}(F_{y}+FG_{1}(\epsilon F^{2}+2))}{F_{y}}, \quad g = \frac{FG_{2}^{2}(\epsilon F^{2}+2)}{F_{y}}. \end{gathered} \end{equation}$

(2.10)

Consequently, (1.1) can be transformed into (2.1) via (1.2) if its coefficients have the form (2.10).

Conversely, if the functions $F$ , $G_{1}$ and $G_{2}$ satisfy (2.10) for given $k$ , $h$ , $f$ and $g$ , then (1.1) can be mapped into (2.1) via (1.2). Thus, the compatibility conditions of (2.10), regarded as an overdetermined system of equations for $F$ , $G_{1}$ and $G_{2}$ , give the necessary and sufficient conditions for (1.1) to be equivalent to (2.1) via (1.2).

To obtain the compatibility conditions, we simplify system (2.10) as follows. Using the last two equations from (2.10) we find the expression for $G_{1}$ given in (2.8). Then, with the help of this relation, from (2.10) we find that

$\begin{equation} 81\epsilon F^{2}G_{2}^{3}-l = 0, \end{equation}$

(2.11)

and

$\begin{equation} \begin{gathered} 567lG_{2}^{3}+(243lgh-81lf^{2}-81gl_{y}+243lg_{y})G_{2}-7l^{2} = 0,\\ 243lgG_{2,y}+324lG_{2}^{3}-81gl_{y}G_{2}+2l^{2} = 0, \end{gathered} \end{equation}$

(2.12)

Here $l$ is given by (2.5).

As a result, we need to find compatibility conditions only for (2.12). To obtain the generic case of these compatibility conditions, we differentiate the first equation twice and find the expression for $G_{2}^{2}$ and condition (2.3). Differentiating the first equation from (2.12) a third time, we obtain (2.6). Further differentiation does not lead to any new compatibility conditions. The particular case (2.4) can be treated in a similar way.

Finally, we remark that the cases of $l = 0$ , $p = 0$ and $q = 0$ result in the degeneration of transformations (1.2). This completes the proof.
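Relations (2.10) can also be evaluated directly. The following sketch uses data we derived by hand for Example 1 (assumptions, not stated in the text): $F=\mu y^{2}$ (from (3.2)), $G_{1}=-4/(y(\epsilon\mu^{2}y^{4}+2))$ and $G_{2}=2\mu y$. Their derivatives are written out analytically. The sketch computes $k$, $h$, $f$, $g$ from (2.10) and compares them with the coefficients of Eq (3.1). The parameter values and function names are hypothetical.

```python
EPS, MU = 1.0, 0.7  # illustrative parameter values (assumption)


def D(y):
    return EPS * MU**2 * y**4 + 2.0


def coefficients_from_210(y):
    """k, h, f, g from (2.10) with hand-derived F, G1, G2 for Example 1."""
    F, F_y, F_yy = MU * y**2, 2 * MU * y, 2 * MU
    G1 = -4.0 / (y * D(y))
    G1_y = 4.0 * (5 * EPS * MU**2 * y**4 + 2) / (y * D(y)) ** 2
    G2, G2_y = 2 * MU * y, 2 * MU
    W = EPS * F**2 + 2  # the factor (eps F^2 + 2) appearing in (2.10)
    k = (F * G1**3 * W + 3 * G1**2 * F_y + G1 * F_yy - F_y * G1_y) / (G2 * F_y)
    h = (G2 * F_yy + (6 * G1 * G2 - G2_y) * F_y
         + 3 * F * G2 * G1**2 * W) / (G2 * F_y)
    f = 3 * G2 * (F_y + F * G1 * W) / F_y
    g = F * G2**2 * W / F_y
    return k, h, f, g


def coefficients_of_31(y):
    """k, h, f, g read off from Eq (3.1)."""
    return (-12 * EPS * MU * y / D(y) ** 2, 0.0, -6 * MU * y,
            2 * MU**2 * y**3 * D(y))


for y in (0.3, 0.8, 1.5):
    print(coefficients_from_210(y), coefficients_of_31(y))
```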

As an immediate corollary of Theorem 2.1 we get

Corollary 2.1. If coefficients of an equation from (1.1) satisfy either (2.3) or (2.4), then an autonomous first integral of this equation can be presented in the parametric form as follows:

$\begin{equation} y = F^{-1}(w), \quad y_{z} = \frac{G_{2}w_{\zeta}}{F_{y}-G_{1}w_{\zeta}}\,. \end{equation}$

(2.13)

Here $w$ is the general solution of (2.1) given by (2.2). Notice also that, formally, (2.13) contains two arbitrary constants, namely $\zeta_{0}$ and $C_{1}$ . However, without loss of generality, one of them can be set equal to zero.

Now we demonstrate that transformations (1.2) preserve autonomous invariant curves for equations from (1.1).

First, we need to introduce the definition of an invariant curve for (1.1). We recall that Eq (1.1) can be transformed into an equivalent dynamical system

$\begin{equation} y_{z} = P, \quad u_{z} = Q, \quad P = u, \quad Q = -k u^{3}-h u^{2}-f u -g. \end{equation}$

(2.14)

A smooth function $H(y, u)$ is called an invariant curve of (2.14) (or, equivalently, of (1.1)) if it is a nontrivial solution of [19]

$\begin{equation} PH_{y}+QH_{u} = \lambda H, \end{equation}$

(2.15)

for some function $\lambda(y, u)$ , which is called the cofactor of $H$ .

Second, we need to introduce the equation that is equivalent to (1.1) via (1.2). Substituting (1.2) into (1.1) we get

$\begin{equation} w_{\zeta\zeta}+\tilde{k}w_{\zeta}^{3}+\tilde{h}w_{\zeta}^{2}+\tilde{f}w_{\zeta}+\tilde{g} = 0, \end{equation}$

(2.16)

where

$\begin{equation} \begin{gathered} \tilde{k} = \frac{kG_{2}^{3}-gG_{1}^{3}+(G_{1,y}-hG_{1})G_{2}^{2}+(fG_{1}-G_{2,y})G_{1}G_{2}}{F_{y}^{2}G_{2}^{2}},\\ \tilde{h} = \frac{(hF_{y}-F_{yy})G_{2}^{2}-(2fG_{1}-G_{2,y})G_{2}F_{y}+3gG_{1}^{2}F_{y}}{F_{y}^{2}G_{2}^{2}},\\ \tilde{f} = \frac{fG_{2}-3gG_{1}}{G_{2}^{2}}, \quad \tilde{g} = \frac{gF_{y}}{G_{2}^{2}}. \end{gathered} \end{equation}$

(2.17)

An invariant curve for (2.16) can be defined in the same way as that for (1.1). In what follows, we denote $w_{\zeta}$ by $v$ .
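As a consistency check of (2.17), the following sketch evaluates $\tilde{k}$, $\tilde{h}$, $\tilde{f}$ and $\tilde{g}$ for the data of Example 1. Since Eq (3.1) is claimed to be equivalent to (2.1), the transformed equation (2.16) should reduce to (2.1). That is, $\tilde{k}=\tilde{h}=0$, $\tilde{f}=3$ and $\tilde{g}=w(\epsilon w^{2}+2)$ with $w=\mu y^{2}$. The transformation data $F=\mu y^{2}$, $G_{1}=-4/(y(\epsilon\mu^{2}y^{4}+2))$ and $G_{2}=2\mu y$ were derived by hand and are assumptions here. The parameter values are illustrative.

```python
EPS, MU = 1.0, 0.7  # illustrative parameter values (assumption)


def D(y):
    return EPS * MU**2 * y**4 + 2.0


def tilde_coefficients(y):
    """k~, h~, f~, g~ from (2.17) for the data of Example 1."""
    # coefficients of (3.1)
    k, h = -12 * EPS * MU * y / D(y) ** 2, 0.0
    f, g = -6 * MU * y, 2 * MU**2 * y**3 * D(y)
    # hand-derived transformation data (assumption, not stated in the text)
    F_y, F_yy = 2 * MU * y, 2 * MU
    G1 = -4.0 / (y * D(y))
    G1_y = 4.0 * (5 * EPS * MU**2 * y**4 + 2) / (y * D(y)) ** 2
    G2, G2_y = 2 * MU * y, 2 * MU
    den = F_y**2 * G2**2
    kt = (k * G2**3 - g * G1**3 + (G1_y - h * G1) * G2**2
          + (f * G1 - G2_y) * G1 * G2) / den
    ht = ((h * F_y - F_yy) * G2**2 - (2 * f * G1 - G2_y) * G2 * F_y
          + 3 * g * G1**2 * F_y) / den
    ft = (f * G2 - 3 * g * G1) / G2**2
    gt = g * F_y / G2**2
    return kt, ht, ft, gt


y = 1.0
w = MU * y**2
print(tilde_coefficients(y), (0.0, 0.0, 3.0, w * (EPS * w**2 + 2)))
```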

Theorem 2.2. Suppose that either (1.1) possesses an invariant curve $H(y, u)$ with the cofactor $\lambda(y, u)$ or (2.16) possesses an invariant curve $\widetilde{H}(w, v)$ with the cofactor $\tilde{\lambda}(w, v)$ . Then the other equation also has an invariant curve, and the corresponding invariant curves and cofactors are connected via

$\begin{equation} \begin{gathered} H(y,u) = \widetilde{H}\left(F,\frac{F_{y}u}{G_{1}u+G_{2}}\right), \quad \lambda(y,u) = (G_{1}u+G_{2})\tilde{\lambda}\left(F,\frac{F_{y}u}{G_{1}u+G_{2}}\right). \end{gathered} \end{equation}$

(2.18)

Proof. Suppose that $\widetilde{H}(w, v)$ is an invariant curve for (2.16) with the cofactor $\widetilde{\lambda}(w, v)$ . Then it satisfies

$\begin{equation} v\widetilde{H}_{w}+(-\tilde{k} v^{3}-\tilde{h} v^{2}-\tilde{f} v -\tilde{g})\widetilde{H}_{v} = \widetilde{\lambda}\widetilde{H}. \end{equation}$

(2.19)

Substituting (1.2) into (2.19) and taking (2.18) into account, we get

$\begin{equation} uH_{y}+(-k u^{3}-h u^{2}-f u-g)H_{u} = (G_{1}u+G_{2}) \tilde{\lambda}\left(F,\frac{F_{y}u}{G_{1}u+G_{2}}\right) H . \end{equation}$

(2.20)

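This is condition (2.15) for $H(y, u)$ with the cofactor $\lambda(y, u)$ given in (2.18). The converse statement is proved in the same way by applying the inverse of transformations (1.2). This completes the proof.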

As an immediate consequence of Theorem 2.2 we have that transformations (1.2) preserve autonomous first integrals admitted by members of (1.1), since they are invariant curves with zero cofactors.

Another corollary of Theorem 2.2 is that any equation from (1.1) that is related to (2.1) via (1.2) admits two invariant curves that correspond to the irreducible polynomial invariant curves of (2.1). These invariant curves of (2.1) and the corresponding cofactors are the following (see formulas (3.18) and (3.19) in [23], taking into account scaling transformations; the upper and lower signs are taken simultaneously)

$\begin{equation} \widetilde{H} = \pm \frac{2}{\sqrt{-2\epsilon}}(v+w)+w^{2}, \quad \tilde{\lambda} = \pm \sqrt{-2\epsilon}w-2. \end{equation}$

(2.21)

Therefore, the following statement holds:

Corollary 2.2. If the coefficients of an equation from (1.1) satisfy either (2.3) or (2.4), then it admits the following invariant curves with the corresponding cofactors

$\begin{equation} H = \pm \frac{2}{\sqrt{-2\epsilon}}\left(\frac{F_{y}u}{G_{1}u+G_{2}}+F\right)+F^{2}, \quad \lambda = \left(G_{1}u+G_{2}\right)\left(\pm \sqrt{-2\epsilon}F-2\right). \end{equation}$

(2.22)

Let us remark that connections between (2.1) and non-autonomous variants of (1.1) can be considered via a non-autonomous generalization of transformations (1.2). However, one of the two nonlocally related equations must be autonomous, since otherwise nonlocal transformations do not map a differential equation into a differential equation [5].

In this Section we have obtained the equivalence criterion between (1.1) and (2.1), which defines two new completely integrable subfamilies of (1.1). We have also demonstrated that members of these subfamilies possess an autonomous parametric first integral and two autonomous invariant curves.

3. Examples

In this Section we provide two examples of integrable equations from (1.1) that satisfy the integrability conditions of Theorem 2.1.

Example 1. One can show that the coefficients of the following cubic oscillator

$\begin{equation} y_{zz}-\frac{12\epsilon \mu y}{(\epsilon \mu^{2}y^{4}+2)^{2}}y_{z}^{3}-6\mu y y_{z}+2\mu^{2}y^{3}(\epsilon\mu^{2}y^{4}+2) = 0, \end{equation}$

(3.1)

satisfy condition (2.3) from Theorem 2.1. Consequently, Eq (3.1) is completely integrable and its general solution can be obtained from (2.2) by inverting transformations (1.2). However, it is more convenient to use Corollary 2.1 and present the autonomous first integral of (3.1) in the parametric form as follows:

$\begin{equation} y = \pm\sqrt{\frac{w}{\mu}}, \quad y_{z} = \frac{w(\epsilon w^{2}+2) w_{\zeta}}{2w_{\zeta}+w(\epsilon w^{2}+2)}, \end{equation}$

(3.2)

where $w$ is given by (2.2), $\zeta$ is considered as a parameter and $\zeta_{0}$ , without loss of generality, can be set equal to zero. As a result, we see that (3.1) is integrable since it has an autonomous first integral.
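The parametric first integral (3.2) can be checked numerically. Starting from a point on the curve (3.2), we integrate (3.1) together with $d\zeta/dz=G_{1}y_{z}+G_{2}$ from (1.2), and verify that the trajectory stays on the curve at the updated value of $\zeta$. The sketch assumes $\epsilon=\mu=1$, $C_{1}=0$ and the '+' branch of $y$. It also uses the hand-derived $G_{1}=-4/(y(\epsilon\mu^{2}y^{4}+2))$ and $G_{2}=2\mu y$, which are not stated in the text. $\rm{cn}$ is computed with a plain AGM routine, $w_{\zeta}$ by a central difference, and the time stepping is classical RK4. All names are hypothetical.

```python
import math

EPS, MU, C1 = 1.0, 1.0, 0.0  # illustrative values; zeta0 = 0 as in the text


def jacobi_cn(u, m, tol=1e-15):
    """Jacobi cn(u | m), 0 <= m < 1, via the descending AGM scheme."""
    a, b, c = 1.0, math.sqrt(1.0 - m), math.sqrt(m)
    a_seq, c_seq = [a], [c]
    while abs(c) > tol:
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        a_seq.append(a)
        c_seq.append(c)
    phi = (2 ** (len(a_seq) - 1)) * a_seq[-1] * u
    for j in range(len(a_seq) - 1, 0, -1):
        phi = 0.5 * (phi + math.asin(c_seq[j] * math.sin(phi) / a_seq[j]))
    return math.cos(phi)


def w(zeta):
    """Solution (2.2) of (2.1) with zeta0 = 0."""
    s = math.exp(-zeta)
    return s * jacobi_cn(math.sqrt(EPS) * (s - C1), 0.5)


def on_curve(zeta, step=1e-6):
    """Point (y, y_z) of the parametric first integral (3.2), '+' branch."""
    wv = w(zeta)
    wz = (w(zeta + step) - w(zeta - step)) / (2 * step)
    a = wv * (EPS * wv**2 + 2)
    return math.sqrt(wv / MU), a * wz / (2 * wz + a)


def rhs(state):
    """(3.1) as a first-order system, plus d(zeta)/dz = G1*u + G2 from (1.2)."""
    y, u, _ = state
    D = EPS * MU**2 * y**4 + 2
    u_z = 12 * EPS * MU * y * u**3 / D**2 + 6 * MU * y * u - 2 * MU**2 * y**3 * D
    zeta_z = -4 * u / (y * D) + 2 * MU * y  # hand-derived G1, G2 (assumption)
    return (u, u_z, zeta_z)


def rk4(state, dz, n_steps):
    """Classical fourth-order Runge-Kutta integration."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dz * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dz * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + dz * k for s, k in zip(state, k3)))
        state = tuple(s + dz * (a + 2 * b + 2 * c + d) / 6
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


y0, u0 = on_curve(0.0)
y1, u1, zeta1 = rk4((y0, u0, 0.0), dz=1e-3, n_steps=200)
print(y1 - on_curve(zeta1)[0], u1 - on_curve(zeta1)[1])
```

The integration interval is kept short so that the denominator $2w_{\zeta}+w(\epsilon w^{2}+2)$ in (3.2) stays away from zero.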

Moreover, using Corollary 2.2 one can find invariant curves admitted by (3.1)

$\begin{equation} \begin{gathered} H_{1,2} = \frac{y^{4}\left[\sqrt{2}(\sqrt{2}\pm \sqrt{-\epsilon}\mu y^{2})^{2}(\sqrt{2}\mp\sqrt{-\epsilon}\mu y^{2})+2(\epsilon\mu y^{2}\mp\sqrt{-2\epsilon})u\right]}{2\mu y^{2}(\epsilon\mu^{2}y^{4}+2)-4u},\\ \lambda_{1,2} = \pm\frac{2(\mu y^{2}(\epsilon\mu^{2}y^{4}+2)-2u)(\sqrt{-2\epsilon}\mu y^{2}\mp 2)}{y(\epsilon\mu^{2}y^{4}+2)} \end{gathered} \end{equation}$

(3.3)

With the help of the standard technique of Darboux integrability theory [19], one finds the corresponding Darboux integrating factor of (3.1)

$\begin{equation} M = \frac{(\epsilon \mu^{2}y^{4}+2)^{\frac{9}{4}}}{(2\epsilon u^{2}+(\epsilon\mu^{2}y^{4}+2)^{2})^{\frac{3}{4}}(\mu y^{2}(\epsilon \mu^{2}y^{4}+2)-2u)^{\frac{3}{2}}}. \end{equation}$

(3.4)

Consequently, Eq (3.1) is Liouvillian integrable.

Example 2. Consider the Liénard (1, 9) equation

$\begin{equation} y_{zz}+(b_{i}y^{i})y_{z}+a_{j}y^{j} = 0, \quad i = 0,\ldots,4, \quad j = 0,\ldots,9. \end{equation}$

(3.5)

Here summation over repeated indices is assumed. One can show that this equation is equivalent to (2.1) if it is of the form

$\begin{equation} y_{zz}-9(y+\mu)(y+3\mu)^{3}y_{z}+2y(2y+3\mu)(y+3\mu)^{7} = 0, \end{equation}$

(3.6)

where $\mu$ is an arbitrary constant.

With the help of Corollary 2.1 one can present the first integral of (3.6) in the parametric form as follows:

$\begin{equation} y = \frac{3\sqrt{-2\epsilon}\mu w}{2-\sqrt{-2\epsilon}w}, \quad y_{z} = \frac{7776 \sqrt{2} \epsilon \mu^{5} w w_{\zeta}}{(\sqrt{-2\epsilon}w-2)^{5}(2\sqrt{-\epsilon}w_{\zeta}+(\sqrt{2}\epsilon w +2\sqrt{-\epsilon})w)}, \end{equation}$

(3.7)

where $w$ is given by (2.2). Thus, (3.6) is completely integrable due to the existence of this parametric autonomous first integral.

Using Corollary 2.2 we find two invariant curves of (3.6):

$\begin{equation} H_{1} = \frac{y^{2}[(2y+3\mu)(y+3\mu)^{4}-2u]}{(y+3\mu)^{2}[(y+3\mu)^{4}y-u]}, \quad \lambda_{1} = \frac{6\mu(u-y(y+3\mu)^{4})}{y(y+3\mu)}, \end{equation}$

(3.8)

and

$\begin{equation} H_{2} = \frac{y^{2}(y+3\mu)^{2}}{y(y+3\mu)^{4}-u}, \quad \lambda_{2} = \frac{2(2y+3\mu)(u-y(y+3\mu)^{4})}{y(y+3\mu)}. \end{equation}$

(3.9)

The corresponding Darboux integrating factor is

$\begin{equation} M = \left[y(y+3\mu)^{4}-u\right]^{-\frac{3}{2}}\left[(2y+3\mu)(y+3\mu)^{4}-2u\right]^{-\frac{3}{4}}. \end{equation}$

(3.10)

As a consequence, we see that Eq (3.6) is Liouvillian integrable.
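The invariant curve $H_{1}$ in (3.8) and the integrating factor (3.10) can be spot-checked numerically. The following sketch evaluates two quantities for system (2.14) of Eq (3.6). The first is $PH_{1,y}+QH_{1,u}-\lambda_{1}H_{1}$, i.e. (2.15). The second is the divergence $\partial_{y}(MP)+\partial_{u}(MQ)$, which vanishes for an integrating factor. Both use central finite differences and should be zero up to rounding. The value $\mu=0.5$ and the sample points are arbitrary choices. They are picked so that the bases of the fractional powers in $M$ are positive.

```python
MU = 0.5  # illustrative value of the constant in (3.6) (assumption)


def P(y, u):
    return u


def Q(y, u):
    """y_zz from (3.6), written in terms of u = y_z."""
    A = y + 3 * MU
    return 9 * (y + MU) * A**3 * u - 2 * y * (2 * y + 3 * MU) * A**7


def H1(y, u):
    """Invariant curve H_1 from (3.8)."""
    A = y + 3 * MU
    return y**2 * ((2 * y + 3 * MU) * A**4 - 2 * u) / (A**2 * (y * A**4 - u))


def lam1(y, u):
    """Cofactor lambda_1 from (3.8)."""
    A = y + 3 * MU
    return 6 * MU * (u - y * A**4) / (y * A)


def M(y, u):
    """Integrating factor (3.10); real only where both bases are positive."""
    A = y + 3 * MU
    return (y * A**4 - u) ** -1.5 * ((2 * y + 3 * MU) * A**4 - 2 * u) ** -0.75


def d_dy(fun, y, u, step=1e-6):
    return (fun(y + step, u) - fun(y - step, u)) / (2 * step)


def d_du(fun, y, u, step=1e-6):
    return (fun(y, u + step) - fun(y, u - step)) / (2 * step)


def cofactor_check(H, lam, y, u):
    """P*H_y + Q*H_u - lambda*H, cf. (2.15); returns (sum, scale)."""
    terms = (P(y, u) * d_dy(H, y, u), Q(y, u) * d_du(H, y, u),
             -lam(y, u) * H(y, u))
    return sum(terms), sum(abs(t) for t in terms)


def integrating_factor_check(Mf, y, u):
    """d(M P)/dy + d(M Q)/du; returns (sum, scale)."""
    terms = (d_dy(lambda a, b: Mf(a, b) * P(a, b), y, u),
             d_du(lambda a, b: Mf(a, b) * Q(a, b), y, u))
    return sum(terms), sum(abs(t) for t in terms)


for y, u in [(0.7, -0.3), (0.4, 1.0), (1.0, 2.0)]:
    print(cofactor_check(H1, lam1, y, u), integrating_factor_check(M, y, u))
```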

Therefore, the equations considered in Examples 1 and 2 are completely integrable from two points of view. First, they possess autonomous parametric first integrals. Second, they have Darboux integrating factors.

4. Conclusions and discussion

In this work we have considered the equivalence problem between the family of Eqs (1.1) and its integrable member (2.1), with equivalence transformations given by the generalized nonlocal transformations (1.2). We have constructed the corresponding equivalence criterion in explicit form, which leads to two new integrable subfamilies of (1.1). We have demonstrated that one can explicitly construct a parametric autonomous first integral for each equation that is equivalent to (2.1) via (1.2). We have also shown that transformations (1.2) preserve autonomous invariant curves for (1.1). As a consequence, equations from the obtained integrable subfamilies possess two autonomous invariant curves, which correspond to the irreducible polynomial invariant curves of (2.1). This fact demonstrates a connection between the nonlocal equivalence approach and the Darboux and Liouvillian integrability approaches. We have illustrated our results with two examples of integrable equations from (1.1).

Acknowledgments

The author was partially supported by Russian Science Foundation grant 19-71-10003.

Conflict of interest

The author declares no conflict of interest in this paper.

References

[1]	Helander E, Virtanen T, Nurminen J, Gabbouj M (2010) Voice conversion using partial least squares regression. IEEE/ACM Transactions on Audio, Speech and Language Processing 18: 912–921. https://doi.org/10.1109/TASL.2011.2165944
[2]	Saito Y, Takamichi S, Saruwatari H (2017) Voice conversion using input-to-output highway networks. IEICE T Inf Syst 100: 1925–1928. https://doi.org/10.1587/transinf.2017EDL8034
[3]	Yeh CC, Hsu PC, Chou JC, Lee HY, Lee LS (2018) Rhythm Flexible Voice Conversion Without Parallel Data Using Cycle-GAN Over Phoneme Posteriorgram Sequences. IEEE Spoken Language Technology Workshop (SLT) 274–281. https://doi.org/10.1109/SLT.2018.8639647
[4]	Sun L, Wang H, Kang S, Li K, Meng HM (2016) Personalized Cross-Lingual TTS Using Phonetic Posteriorgrams. Interspeech 322–326. https://doi.org/10.21437/Interspeech.2016-1043
[5]	Tian X, Chng ES, Li H (2019) A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. Interspeech 201–205. https://doi.org/10.21437/Interspeech.2019-1514
[6]	Takahashi N, Singh MK, Mitsufuji Y (2023) Robust One-Shot Singing Voice Conversion. arXiv: 2210.11096v2. https://doi.org/10.48550/arXiv.2210.11096
[7]	Hono Y, Hashimoto K, Oura K, Nankaku Y, Tokuda K (2019) Singing Voice Synthesis Based on Generative Adversarial Networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 6955–6959. https://doi.org/10.1109/ICASSP.2019.8683154
[8]	Sun L, Kang S, Li K, Meng H (2015) Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4869–4873. https://doi.org/10.1109/ICASSP.2015.7178896
[9]	Kaneko T, Kameoka H, Hiramatsu K, Kashino K (2017) Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. Interspeech 2017: 1283–1287. https://doi.org/10.21437/Interspeech.2017-970
[10]	Freixes M, Alías F, Carrie JC (2019) A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept. EURASIP Journal on Audio, Speech, and Music Processing 2019: 1–14. https://doi.org/10.1186/s13636-019-0163-y
[11]	Hono Y, Hashimoto K, Oura K, Nankaku Y, Tokuda K (2021) Sinsy: a deep neural network-based singing voice synthesis system. IEEE/ACM T Audio Spe 29: 2803–2815. https://doi.org/10.1109/TASLP.2021.3104165
[12]	Sisman B, Vijayan K, Dong M, Li H (2019) SINGAN: Singing Voice Conversion with Generative Adversarial Networks. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 112–118. https://doi.org/10.1109/APSIPAASC47483.2019.9023162
[13]	Sisman B, Li H (2020) Generative adversarial networks for singing voice conversion with and without parallel data. Odyssey 238–244. https://doi.org/10.21437/Odyssey.2020-34
[14]	Zhao W, Wang W, Sun Y, Tang T (2019) Singing voice conversion based on wd-gan algorithm. IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) 950–954. https://doi.org/10.1109/IAEAC47372.2019.8997824
[15]	Fang F, Yamagishi J, Echizen I, Lorenzo-Trueba J (2018) High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5279–5283. https://doi.org/10.1109/ICASSP.2018.8462342
[16]	Kameoka H, Kaneko T, Tanaka K, Hojo N (2018) StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks. IEEE Spoken Language Technology Workshop (SLT) 266–273. https://doi.org/10.1109/SLT.2018.8639535
[17]	Chen Y, Xia R, Yang K, Zou K (2023) MICU: Image Super-resolution via Multi-level Information Compensation and U-net. Expert Syst Appl 245: 123111. https://doi.org/10.1016/j.eswa.2023.123111
[18]	Chen Y, Xia R, Yang K, Zou K (2023) MFMAM: Image Inpainting via Multi-Scale Feature Module with Attention Module. Comput Vis Image Und 238: 103883. https://doi.org/10.1016/j.cviu.2023.103883
[19]	Chen Y, Xia R, Yang K, Zou K (2023) GCAM: Lightweight Image Inpainting via Group Convolution and Attention Mechanism. Int J Mach Learn Cyb 15: 1815–1825. https://doi.org/10.1007/s13042-023-01999-z
[20]	Chen Y, Xia R, Yang K, Zou K (2024) DNNAM: Image Inpainting Algorithm via Deep Neural Networks and Attention Mechanism. Appl Soft Comput 111392. https://doi.org/10.1016/j.asoc.2024.111392
[21]	Chen Y, Xia R, Yang K, Zou K (2023) DARGS: Image Inpainting Algorithm via Deep Attention Residuals Group and Semantics. J King Saud Univ-Comput 35: 101567. https://doi.org/10.1016/j.jksuci.2023.101567
[22]	Chen L, Zhang X, Li Y, Sun M, Chen W (2024) A Noise-Robust Voice Conversion Method with Controllable Background Sounds. Complex Intell Syst 1–14. https://doi.org/10.1007/s40747-024-01375-6
[23]	Walczyna T, Piotrowski Z (2023) Overview of Voice Conversion Methods Based on Deep Learning. Applied Sciences 13: 3100. https://doi.org/10.3390/app13053100
[24]	Liu EM, Yeh JW, Lu JH, Liu YW (2023) Speaker Embedding Space Cosine Similarity Comparisons of Singing Voice Conversion. The Journal of the Acoustical Society of America (JASA) 154: A244–A244. https://doi.org/10.1121/10.0023424
[25]	Hsu CC, Hwang HT, Wu YC, Tsao Y, Wang HM (2016) Voice conversion from non-parallel corpora using variational auto-encoder. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 1–6. https://doi.org/10.1109/APSIPA.2016.7820786
[26]	Tobing PL, Wu YC, Hayashi T, Kobayashi K, Toda T (2019) Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. Interspeech 674–678. https://doi.org/10.21437/Interspeech.2019-2307
[27]	Yook D, Leem SG, Lee K, Yoo IC (2020) Many-to-many voice conversion using cycle-consistent variational autoencoder with multiple decoders. Odyssey 215–221. https://doi.org/10.21437/Odyssey.2020-31
[28]	Hsu CC, Hwang HT, Wu YC, Tsao Y, Wang HM (2017) Voice conversion from unaligned corpora using variational autoencoding wasserstein generative adversarial networks. arXiv preprint arXiv: 1704.00849. https://doi.org/10.48550/arXiv.1704.00849
[29]	Huang WC, Violeta LP, Liu S, Shi J, Toda T (2023) The Singing Voice Conversion Challenge 2023. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 1–8. https://doi.org/10.1109/ASRU57964.2023.10389671
[30]	Chen Q, Tan M, Qi Y, Zhou J, Li Y, Wu Q (2022) V2C: Visual Voice Cloning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 21242–21251.
[31]	Qian K, Zhang Y, Chang S, Yang X, Hasegawa-Johnson M (2019) Autovc: Zero-shot voice style transfer with only autoencoder loss. International Conference on Machine Learning 5210–5219.
[32]	Patel M, Purohit M, Parmar M, Shah NJ, Patil HA (2020) Adagan: Adaptive gan for many-to-many non-parallel voice conversion.
[33]	Liu F, Wang H, Peng R, Zheng C, Li X (2021) U2-VC: one-shot voice conversion using two-level nested U-structure. EURASIP Journal on Audio, Speech, and Music Processing 2021: 1–15. https://doi.org/10.1186/s13636-021-00226-3
[34]	Liu F, Wang H, Ke Y, Zheng C (2022) One-shot voice conversion using a combination of U2-Net and vector quantization. Appl Acoust 199: 109014. https://doi.org/10.1016/j.apacoust.2022.109014
[35]	Wu DY, Lee HY (2020) One-shot voice conversion by vector quantization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 7734–7738. https://doi.org/10.1109/ICASSP40776.2020.9053854
[36]	Chou JC, Lee HY (2019) One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. Interspeech 664–668. https://doi.org/10.21437/Interspeech.2019-2663
[37]	Huang X, Belongie S (2017) Arbitrary style transfer in real-time with adaptive instance normalization. IEEE International Conference on Computer Vision (ICCV) 1501–1510. https://doi.org/10.1109/ICCV.2017.167
[38]	Lian J, Lin P, Dai Y, Li G (2022) Arbitrary Voice Conversion via Adversarial Learning and Cycle Consistency Loss. International Conference on Intelligent Computing 569–578. https://doi.org/10.1007/978-3-031-13829-4_49
[39]	Gu Y, Zhao X, Yi X, Xiao J (2022) Voice Conversion Using learnable Similarity-Guided Masked Autoencoder. International Workshop on Digital watermarking 13825: 53–67. https://doi.org/10.1007/978-3-031-25115-3_4
[40]	Chen YH, Wu DY, Wu TH, Lee HY (2021) AGAIN-VC: A one-shot voice conversion using activation guidance and adaptive instance normalization. IEEE International Conference on Acoustics, Speech, and Signal Processing 5954–5958. https://doi.org/10.1109/ICASSP39728.2021.9414257
[41]	Ulyanov D, Lebedev V, Vedaldi A, Lempitsky VS (2016) Texture networks: Feed-forward synthesis of textures and stylized images. Proceedings of the 33rd International Conference on Machine Learning 1349–1357.
[42]	Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on International Conference on Machine Learning 37: 448–456.
[43]	Li Y, Wang N, Shi J, Liu J, Hou X (2016) Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv: 1603.04779.
[44]	Ulyanov D, Vedaldi A, Lempitsky V (2017) Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4105–4113. https://doi.org/10.1109/CVPR.2017.437
[45]	Liu J, Han W, Ruan H, Chen X, Jiang D, Li H (2018) Learning Salient Features for Speech Emotion Recognition Using CNN. First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia) 1–5. https://doi.org/10.1109/ACIIAsia.2018.8470393
[46]	Lim W, Jang D, Lee T (2016) Speech emotion recognition using convolutional and Recurrent Neural Networks. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 1–4. https://doi.org/10.1109/APSIPA.2016.7820699
[47]	Hajarolasvadi N, Demirel H (2019) 3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms. Entropy (Basel) 21: 479. https://doi.org/10.3390/e21050479
[48]	Graves A (2012) Long Short-Term Memory Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence 385: 37–45. https://doi.org/10.1007/978-3-642-24797-2
[49]	Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv: 1412.3555.
[50]	Kumar K, Kumar R, de Boissiere T, Gestin L, Teoh WZ, Sotelo J, et al. (2019) Melgan: Generative adversarial networks for conditional waveform synthesis. Advances in Neural Information Processing Systems 14910–14921.
[51]	Kong J, Kim J, Bae J (2020) HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. Proceedings of the 34th International Conference on Neural Information Processing Systems 33: 17022–17033.
[52]	Duan Z, Fang H, Li B, Sim KC, Wang Y (2013) The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 1–9. https://doi.org/10.1109/APSIPA.2013.6694316 doi: 10.1109/APSIPA.2013.6694316
[53]	Kubichek R (1993) Mel-cepstral distance measure for objective speech quality assessment. Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing 1: 125–128. https://doi.org/10.1109/PACRIM.1993.407206 doi: 10.1109/PACRIM.1993.407206
[54]	Kobayashi K, Toda T, Nakamura S (2018) Intra-gender statistical singing voice conversion with direct waveform modification using log spectral differential. Speech Commun 99: 211–220. https://doi.org/10.1016/j.specom.2018.03.011 doi: 10.1016/j.specom.2018.03.011
[55]	Toda T, Tokuda K (2007) A speech parameter generation algorithm considering global variance for hmm-based speech synthesis. IEICE T Inf Syst 90: 816–824. https://doi.org/10.1093/ietisy/e90-d.5.816 doi: 10.1093/ietisy/e90-d.5.816

© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)