Singing voice conversion methods face the challenge of balancing synthesis quality against singer similarity. Traditional voice conversion techniques primarily emphasize singer similarity, often leading to robotic-sounding singing voices. Deep learning-based singing voice conversion techniques, in contrast, focus on disentangling singer-dependent and singer-independent features. While this approach can enhance the quality of synthesized singing voices, many voice conversion systems still suffer from leakage of singer-dependent features into the content embeddings. In the proposed singing voice conversion technique, an encoder-decoder framework was implemented using a hybrid model that combines a convolutional neural network (CNN) with long short-term memory (LSTM). This paper investigated the use of activation guidance and adaptive instance normalization techniques for one-shot singing voice conversion. The instance normalization (IN) layers within the auto-encoder effectively separated singer and content representations. During conversion, singer representations were transferred using adaptive instance normalization (AdaIN) layers. With the help of the activation function, the system prevented the transfer of singer information while conveying the singing content. Additionally, the fusion of LSTM with CNN can enhance voice conversion models by capturing both local and contextual features. The one-shot capability simplified the architecture, which uses a single encoder and decoder. The proposed hybrid CNN-LSTM model performed well without compromising either quality or similarity. Objective and subjective evaluations showed that the proposed hybrid CNN-LSTM model outperformed the baseline architectures. Evaluation results showed a mean opinion score (MOS) of 2.93 for naturalness and 3.35 for melodic similarity.
These hybrid CNN-LSTM techniques allowed the model to perform high-quality voice conversion with minimal training data, making it a promising solution for various applications.
Citation: Assila Yousuf, David Solomon George. A hybrid CNN-LSTM model with adaptive instance normalization for one shot singing voice conversion[J]. AIMS Electronics and Electrical Engineering, 2024, 8(3): 292-310. doi: 10.3934/electreng.2024013
Arabinoxylans (AX) are non-starch polysaccharides found in the cell walls of the most commonly consumed cereal grains, where they constitute a significant portion of dietary fiber [1]. AX are essential dietary components because of their potential health benefits as prebiotics: they stimulate specific intestinal microbes and fermentation patterns and improve colon function [2]. Many of the biological activities of AX are correlated with their structural features [3]. Structurally, AX are composed of a linear backbone of (1→4)-linked β-D-xylopyranosyl residues, which are substituted at the C(O)−2 or C(O)−3 positions with L-arabinofuranosyl residues [4]. Around 50–60% of the xylose residues are not substituted with arabinose units, and the substituted residues show different substitution patterns; these structural variations depend on the extraction source of the AX [5]. The arabinose units can be esterified with ferulic acid (FA) at the (O)−5 position, forming di- or tri-FA structures. The degree of esterification also varies with the cereal source and the extraction methodology employed [3]. The FA units in AX chains can be crosslinked using chemical or enzymatic free-radical-generating agents. Commonly, the laccase enzyme is used to oxidize FA and form additional di- or tri-FA structures [6]. The crosslinking of AX results in the formation of highly viscous solutions and gels (AX-Gel), which present unique physicochemical properties and play promising roles in the food industry and human health [7].
Generally, AX and AX-Gel consumption in humans is associated with beneficial postprandial effects, such as lower glucose and lipid levels, that help prevent disease development. The consumption of AX has been related to anti-obesogenic effects, a reduction in heart disease, and the attenuation of type 2 diabetes through improved carbohydrate, lipid, and amino acid metabolism [1],[8]–[10]. AX gels fabricated by covalent crosslinking can absorb large amounts of water, are stable under temperature or pH changes, and have a neutral taste and odor, which are essential characteristics for food applications [11],[12]. In addition, AX crosslinking can promote selective fermentation of these polysaccharides in the colon, limiting the growth of bacteria considered non-beneficial (Bacteroides) and favoring the growth of Bifidobacteria, a probiotic [13]. Furthermore, AX gels have been proposed as protectors of the gut microbiota against a high-fat diet when included in the diet formulation of Wistar rats [9].
A few human evaluations have reported minor effects or no changes in postprandial glucose after ingestion of an AX-rich diet [14],[15]. The principal hypothesis for the attenuation of metabolic responses by AX intake is an increase in the viscosity of the alimentary bolus in the gastrointestinal tract. It has been suggested that the high viscosity of AX and AX-Gel in aqueous media reduces the absorption rate of nutrients, which in turn lowers blood glucose and lipid levels [16]. Other studies indicate that the AX structure may reduce intestinal α-glucosidase activity, limiting the release of monomeric sugar units such as glucose or fructose from digestible dietary carbohydrates [17].
In animals, contrasting results on the postprandial effects of AX and AX-Gel consumption have also been reported. In some studies, postprandial glucose levels were considerably reduced by AX or AX-Gel intake, while in others, glucose levels remained similar to the control [18]. In addition, the consumption of AX-rich diets has sometimes been considered anti-nutritive because of changes in nutrient digestion and absorption rate associated with the viscous nature of AX [19],[20]. Despite these contradictory results, dietary fiber is an important part of the diet and does not always impart negative or anti-nutritive effects. For that reason, this work aimed to evaluate the impact of AX and AX-Gel on the blood serum lipid and glucose levels of Wistar rats. A standard diet supplemented with 5% (w/w) lyophilized AX or AX-Gel was administered in a single meal, after which blood samples were collected. Glucose levels and the serum lipid profile, including total cholesterol, triglycerides, high-density lipoprotein (HDL) cholesterol, and low-density lipoprotein (LDL) cholesterol, were determined from the blood samples.
Maize bran AX was isolated following the methodology of Carvajal-Millan et al. [21]. The obtained AX contained 0.025 µg of FA per mg of AX and had an arabinose-to-xylose (A/X) ratio of 0.8. Table 1 presents the composition of the AX obtained. Laccase enzyme (benzenediol: oxygen oxidoreductase, E.C.1.10.3.2) from Trametes versicolor and all other chemicals used were provided by Sigma Aldrich Co. (St. Louis, Missouri).
Component | Value |
Arabinose | 25.9 ± 0.138 |
Xylose | 32.3 ± 0.210 |
Galactose | 12.2 ± 0.203 |
Glucose | 9.3 ± 0.071 |
Protein | 3.4 ± 0.190 |
Ash | 1.1 ± 0.001 |
Ferulic acid | 0.025 ± 0.001 |
A/X ratio | 0.8 |
*Note: Carbohydrates, ash, and protein are expressed in g/100 g of AX dry matter. Ferulic acid is expressed in µg/mg of AX dry matter. Values are the mean ± standard deviation.
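As a quick consistency check on Table 1, the A/X ratio can be recomputed from the arabinose and xylose means. The sketch below is illustrative only: the dictionary keys and function name are assumptions introduced here, and only the mean values (not the standard deviations) are used.

```python
# Mean composition values copied from Table 1 (g/100 g of AX dry matter).
# The variable and function names are illustrative, not from the original work.
composition = {
    "arabinose": 25.9,
    "xylose": 32.3,
    "galactose": 12.2,
    "glucose": 9.3,
    "protein": 3.4,
    "ash": 1.1,
}


def ax_ratio(arabinose: float, xylose: float) -> float:
    """Arabinose-to-xylose ratio, used in the text as an index of branching."""
    if xylose <= 0:
        raise ValueError("xylose content must be positive")
    return arabinose / xylose


a_x = ax_ratio(composition["arabinose"], composition["xylose"])
accounted = sum(composition.values())  # dry matter covered by the listed components

print(f"A/X ratio: {a_x:.2f}")
print(f"Listed components: {accounted:.1f} g/100 g")
```

The recomputed ratio rounds to the 0.8 reported in the table, and the listed components account for roughly 84 g per 100 g of dry matter.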
Fourier transform infrared (FT-IR) analysis of dry maize bran AX and AX-Gel was carried out on a Nicolet FT-IR spectrophotometer (Nicolet Instrument Corp., Madison, WI, USA), using KBr pellets (2 mg sample/200 mg KBr). A blank pellet was used as background. FT-IR spectra were acquired in absorbance mode in the mid-infrared region (400–4000 cm−1) at 4 cm−1 of resolution and 32 scans [22]. The obtained spectra were analyzed with the OMNIC 9.3.32 software (Thermo Fisher Inc). The characteristic bands of AX were detected according to previously reported FT-IR spectra [23].
The AX-Gel was prepared following the methodology of Berlanga-Reyes et al. [24] and Martínez-López et al. [25] with some modifications. Maize bran AX was dispersed in 0.1 M sodium acetate buffer at pH 5.5 to obtain a 4% (w/v) solution. The solution was kept under constant stirring at room temperature for 24 h. Laccase was dispersed in 0.1 M sodium acetate buffer at pH 5.5 (0.4 U/µL) and incorporated into the AX solution as a crosslinking agent to prepare the AX-Gel (1.675 nkat per mg AX). The mixture was stirred for a few minutes, and the gel was then allowed to set at 25 °C overnight. Both the AX solution and the AX-Gel were frozen at −20 °C and freeze-dried at −40 °C/0.125 mbar (Labconco lyophilizer, Kansas, USA) for two days.
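The laccase dose above mixes two activity units (nkat and U). A minimal sketch of the conversion follows, assuming that U is the standard enzyme unit (1 U = 1 µmol/min, so 1 U ≈ 16.67 nkat) and that 0.4 U/µL is the concentration of the laccase stock; the function name and the 1 mL batch size are hypothetical.

```python
# 1 U = 1 µmol/min = (1000 nmol) / (60 s) ≈ 16.67 nkat.
NKAT_PER_UNIT = 1000.0 / 60.0


def laccase_stock_volume_ul(
    ax_conc_mg_per_ml: float,
    dose_nkat_per_mg: float,
    stock_u_per_ul: float,
    solution_ml: float = 1.0,
) -> float:
    """Volume of laccase stock (µL) needed to dose a given volume of AX solution."""
    ax_mg = ax_conc_mg_per_ml * solution_ml   # total AX in the batch
    dose_nkat = ax_mg * dose_nkat_per_mg      # required catalytic activity
    dose_units = dose_nkat / NKAT_PER_UNIT    # convert nkat to U
    return dose_units / stock_u_per_ul


# A 4% (w/v) solution holds 40 mg AX per mL; dose and stock values are from the text.
volume_ul = laccase_stock_volume_ul(
    ax_conc_mg_per_ml=40.0, dose_nkat_per_mg=1.675, stock_u_per_ul=0.4
)
print(f"Laccase stock per mL of AX solution: {volume_ul:.2f} µL")
```

Under these assumptions, each millilitre of 4% AX solution needs 67 nkat (about 4 U) of laccase, or roughly 10 µL of the stock.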
The mechanical spectrum of the AX-Gel was studied using a strain-controlled rheometer (Discovery HR-2 rheometer, TA Instruments, New Castle, DE, USA) in oscillatory mode. The storage (G′) and loss (G″) moduli and tan δ (G″/G′) were monitored to evaluate the gel hardness by a frequency sweep from 0.01 to 10 Hz at 5% strain and 25 °C. Measurements were made at the end of network formation, within the linear viscoelastic range. The rheological test was carried out in duplicate, and the results are reported as means [26].
The microstructure and surface morphology of the freeze-dried AX and AX-Gel were evaluated by field emission scanning electron microscopy (JEOL JSM-7401F, Peabody, MA, USA) without coating at low voltage (1.8 kV). The SEM images were obtained at 2000× magnification in secondary and backscattered electron imaging modes [22]. ImageJ software was used to evaluate the external morphology of the AX powder and AX-Gel.
Twelve male Wistar rats were housed individually in metabolic cages in an environment-controlled room (temperature: 23 ± 2 °C; relative humidity: 60 ± 5%) with a 12 h day/night cycle. The animals were acclimatized for one week, during which they were fed a standard pellet diet with free access to water. The weight of the animals on the study day was 300–320 g. Animal handling and all experiments were approved by the animal ethical committee of the Research Center for Food and Development (CIAD, AC), following the procedures and specifications of the Official Mexican Standard Norm (NOM-062-ZOO-1999).
The animals were fasted for 15 h with free access to water and randomly divided into three groups (4 per treatment). The rats were weighed, and their tail tips were washed with ethanol. A drop of blood was taken from the end of the tail vein, and the blood glucose concentration was determined with an Accu-Chek Performa glucometer (Roche, Mannheim, Germany). This first measurement was taken as the baseline blood glucose level. Two groups were then fed 20 g of a standard pellet diet containing 5% (w/w) of lyophilized AX or AX-Gel, while the control group was fed 20 g of the standard pellet diet alone. The rats consumed the whole meal within 4 h of presentation. Blood glucose concentrations were determined by collecting a new drop of blood from the tail vein at 2 and 10 h after food presentation. The next day, all groups were fed 20 g of the standard pellet diet.
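The supplement dose implied by this protocol can be worked out from the meal size, the inclusion level, and the reported body-weight range. The sketch below assumes, as the text states, that each rat ate the whole 20 g meal; the function name is hypothetical, and the per-kilogram figures are derived here rather than reported in the study.

```python
def supplement_dose(meal_g: float, fraction_w_w: float, body_weight_g: float) -> tuple:
    """Return (supplement mass in g, dose in g per kg of body weight)."""
    supplement_g = meal_g * fraction_w_w
    dose_g_per_kg = supplement_g / (body_weight_g / 1000.0)
    return supplement_g, dose_g_per_kg


# 20 g meal with 5% (w/w) AX or AX-Gel; body weights span the reported 300-320 g range.
for body_weight_g in (300.0, 320.0):
    supplement_g, dose = supplement_dose(20.0, 0.05, body_weight_g)
    print(f"{body_weight_g:.0f} g rat: {supplement_g:.1f} g supplement, {dose:.2f} g/kg")
```

Each rat therefore received about 1 g of AX or AX-Gel in the single test meal, which corresponds to roughly 3.1 to 3.3 g per kg of body weight.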
The concentrations of total cholesterol, triglycerides, HDL-cholesterol, and LDL-cholesterol were determined in blood serum 10 h after food consumption. A blood sample (approximately 500 µL) was taken from the tail vein of each rat. The samples were allowed to stand for 2 h so that coagulation could take place. The serum lipid profile was determined using a clinical chemistry autoanalyzer based on dry chemistry micro-slide technology (VITRO® 350 chemistry system, Johnson & Johnson, USA).
The blood glucose, total cholesterol, triglyceride, HDL-cholesterol, and LDL-cholesterol measurements were performed in triplicate, and the data are reported as means ± standard deviation.
FT-IR spectra of the AX extracted from maize bran and of the AX-Gel are presented in Figure 1. Both samples showed almost identical IR spectra, indicating similar molecular identity, and both presented the typical AX spectrum. The spectra display the characteristic absorbance bands for polysaccharides in the region of 1200–800 cm−1; in particular, the signal at 1028 cm−1 is related to the C-OH bending of xylans [27],[28]. Two small, poorly resolved shoulders at 1069 cm−1 and 904 cm−1 were detected, indicating the antisymmetric C-O-C stretching mode of the glycosidic links and arabinose substitution at C-3 of the xylose units [23],[28]. The band at 1639 cm−1 has been associated with the carbonyl stretching vibration of FA at a low degree of esterification [27]. The signal for the CH2 group was observed at 2923 cm−1, and a wide band for the OH group appeared at 3320 cm−1.
The mechanical spectrum of the AX-Gel after 12 h of gelation showed that G′ was higher than G″ and independent of frequency (Figure 2). The G′ values followed a flat, linear trend, ranging from 170 to 200 Pa, while the G″ values were lower and varied with frequency. The G′ and G″ values were used to calculate tan δ (G″/G′), which increased from 0.007 to 0.1 as the frequency increased.
The scanning electron micrographs of AX and AX-Gel are presented in Figure 3. The AX powder (Figure 3a) displayed a segmented, granular structure with an irregular, rough morphology that resembles small pores on the surface. The AX-Gel (Figure 3b) had a continuous surface characterized by an irregular three-dimensional arrangement, with continuous aggregates of nodular structures that create the porosity of the network.
The rats' blood glucose levels before and after food consumption are shown in Figure 4. The mean fasting blood glucose in each group, considered the basal level or time 0, was 125 ± 1, 132 ± 10, and 129 ± 9 mg/dL for the control, AX-Gel, and AX groups, respectively. Blood glucose levels were measured 2 and 10 h after food consumption. In general, the glucose values ranged from 130 to 150 mg/dL, and in a few cases they exceeded 160 mg/dL, especially in the AX-Gel group. Although postprandial blood glucose values tended to increase in both the AX-Gel and AX groups, they did not differ significantly from those of the control group.
The serum lipid profile was also assessed 10 h after food consumption. The mean total cholesterol values (Figure 5a) were similar for the AX and control groups, while the mean value increased significantly (p ≤ 0.05) for the AX-Gel group. The mean triglyceride levels (Figure 5b) of the AX, AX-Gel, and control groups were not significantly different. The HDL-cholesterol levels (Figure 5c) were similar among the three groups, with values around 50 mg/dL. The LDL-cholesterol levels (Figure 5d) of the AX and AX-Gel groups were significantly higher (p ≤ 0.05), at 14 and 7 mg/dL, respectively, than that of the control group (1.6 mg/dL).
The structural integrity of the maize bran AX was corroborated by the FT-IR spectrum, in which all typical signals of AX polysaccharides were detected. In addition, the A/X value (0.8) is similar to that reported for other AX obtained from maize bran and indicates a highly branched structure [29],[30]. However, the FA content (0.025 µg/mg) was lower than the value reported in the literature for other maize bran AX (7.8 µg/mg) [31]. Differences in FA content have been associated with the length of the alkaline process. Increasing the alkaline extraction time decreases the FA content, which directly impacts the gelling capacity of the AX because fewer crosslinking points are formed, probably leading to a low-viscosity solution [32],[33].
The viscoelastic properties of the AX-Gel were similar to those shown by other authors for AX solutions crosslinked with laccase at different concentrations [23],[25],[34],[35]. The mechanical spectrum confirmed the typical behavior of a solid-like material in the AX-Gel, with G′ > G″, a linear, frequency-independent G′, and a frequency-dependent G″ [30],[34]. In previous studies, high G′ values in AX gels have been attributed to the content of covalent crosslinks combined with physical entanglements between the chains [7],[25]. The frequency-independent behavior of G′ in the AX-Gel reflects the stability of the crosslinking points in the network, which has been seen in many AX gels [25],[31]. According to Mendez-Encinas et al. [36], the behavior of AX gels is related to the structural and conformational characteristics of AX, especially the content of FA, the molecule responsible for covalent crosslinking. Furthermore, the small tan δ values describe the nature of the network and, in this case, indicate the elastic character of the AX-Gel [22],[37]. Values of tan δ lower than 0.1 are associated with an elastic system, while values higher than 0.1 suggest a more liquid-like character in the network [38].
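The tan δ criterion described above can be sketched in a few lines, taking tan δ as the conventional ratio of loss to storage modulus (G″/G′). The function names are hypothetical, the boundary value 0.1 is treated as elastic (an assumption, chosen to match the text's reading of the 0.007–0.1 range), and pairing the lowest tan δ with the lowest G′ is only an illustration because the individual G″ readings are not reported.

```python
def loss_tangent(g_storage_pa: float, g_loss_pa: float) -> float:
    """tan δ = G''/G' (loss modulus over storage modulus)."""
    if g_storage_pa <= 0:
        raise ValueError("storage modulus must be positive")
    return g_loss_pa / g_storage_pa


def describe_network(tan_delta: float, threshold: float = 0.1) -> str:
    """Classify the network using the 0.1 threshold cited in the text."""
    return "elastic (solid-like)" if tan_delta <= threshold else "more liquid-like"


def implied_loss_modulus(g_storage_pa: float, tan_delta: float) -> float:
    """Back-calculate G'' from a reported G' and tan δ."""
    return g_storage_pa * tan_delta


# Reported ranges: G' from 170 to 200 Pa, tan δ from 0.007 to 0.1.
g_loss_low = implied_loss_modulus(170.0, 0.007)
g_loss_high = implied_loss_modulus(200.0, 0.1)
print(f"Implied G'' range: {g_loss_low:.2f} to {g_loss_high:.1f} Pa")

for tan_delta in (0.007, 0.1):
    print(f"tan δ = {tan_delta}: {describe_network(tan_delta)}")
```

With these pairings, G″ stays between about 1 and 20 Pa, an order of magnitude or more below G′, which is consistent with the solid-like behavior described in the text.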
The differences between the surface morphology of AX and AX-Gel have also been shown in previous works, which point out how several factors, such as the branching degree of the xylan backbone, the FA content, and the crosslinking of the AX chains, generate specific morphologies. Martínez-López et al. [25] observed similar surfaces with nodular clusters in AX microparticles crosslinked from a 4% AX solution. They associated this heterogeneous structure with the content and distribution of ferulic acid, because other studies have reported that gels crosslinked via phenolic groups present nodular clusters, which determine the pore size in the network.
In the present research, the consumption of a standard diet supplemented with 5% of AX or AX-Gel by Wistar rats did not significantly change postprandial glucose levels at 2 and 10 h relative to the control. Contrasting results have been reported in Wistar rats fed a diet containing 4% of crosslinked AX, which significantly reduced postprandial glucose levels [39]. According to those authors, the crosslinking of AX dramatically increases its viscosity in solution, which may delay bowel transit and the absorption rate, blunting the glucose response. The absence of changes in glucose levels observed in the present work after the ingestion of 5% AX-Gel could be associated with the low G′ values (170–200 Pa) of these gels. AX gels exhibit viscoelastic properties and a more compact microstructure than non-crosslinked AX [40]. However, the formation of the covalent crosslinking points that set the gel depends mainly on the FA content of the AX chain [4]. The low FA content of the AX used here (0.025 µg/mg) probably generates few crosslinking points, which directly affects gel characteristics such as viscoelasticity [40]. The small tan δ values (0.007–0.100) support the elastic character of the AX-Gel. However, the tan δ registered in the present study is higher than that reported for highly crosslinked AX gels (0.001) [31], which can be attributed to the lower FA content of the sample compared with that study (7.18 µg/mg AX). Thus, several structural characteristics of AX and AX gels, including the FA content, would modify their physiological functions. In a previous report [39], supplementation of food with 4% of non-crosslinked AX did not blunt the postprandial blood glucose response in Wistar rats, which agrees with the results of the present study. Those authors attributed the similar effect of AX and the control to the low viscosity of AX compared with AX-Gel.
Comparable results have also been reported in pigs, where no diet-induced differences were found in postprandial glucose levels even when the food contained 10% or 17% of native AX as dietary fiber [41],[42]. In humans, AX consumption has been shown to reduce postprandial glucose even when only 3.2% of AX was incorporated into food as dietary fiber [43],[44].
Regarding the lipid profile, the total cholesterol and LDL-cholesterol values were higher in the AX-Gel group than in the control and AX treatments, but the values were lower than those reported in previous studies in rats [45],[46]. Triglycerides and HDL-cholesterol remained close to the control for both treatments. These results differ from those of other studies in which the lipid profile was evaluated after 4 or 5 weeks of AX consumption. One study in rats showed that the ingestion of dietary fiber from diverse plants, which included AX, reduced total cholesterol, triglycerides, and LDL-cholesterol and increased HDL-cholesterol [47]. Those authors proposed an increase in bowel transit rate as the mechanism of action, altering the absorption of lipids such as cholesterol and fatty acids, which may bind to fiber and thereby fail to form micelles. Chen et al. [48] also reported decreased total cholesterol, triglycerides, and LDL-cholesterol after ingestion of a high-fat diet supplemented with 6% AX. The contrasting results of the present study suggest that the AX-Gel diet does not bind the cholesterol coming from a standard diet, probably because of the short duration of treatment consumption. Results similar to those of our study were found in pigs fed an AX diet, in which serum triglyceride concentrations were higher than with the control diet. Another study in pigs showed that an 8% AX-rich diet consumed for 4 weeks did not significantly affect LDL, HDL, or total cholesterol but decreased triglyceride levels [49]. In humans with metabolic syndrome or type 2 diabetes, consumption of an AX diet did not improve the lipid profile [50],[51].
LDL-cholesterol levels were low in all groups compared with those shown in previous studies in rats [45],[46]. However, the AX-Gel group presented higher LDL-cholesterol and total cholesterol levels than the control and AX treatments. The biological role of LDL is to transport cholesterol from the liver to the tissues, where it is incorporated into cellular membranes [46],[52]. The increased serum LDL-cholesterol in the AX-Gel group is probably related to the augmented total cholesterol in the same group, indicating its transport from the liver to extra-hepatic tissues.
The supplementation of maize bran AX and AX-Gel at 5% in food for Wistar rats does not interfere with the absorption of nutrients. The levels of postprandial glucose, triglycerides, and HDL cholesterol were close to control treatment. Only total cholesterol and LDL-cholesterol showed an increase in AX-Gel treatment, but the values were within normal ranges that have been reported for healthy organisms. The variation in postprandial responses after the consumption of AX and AX-Gel rich diets suggests that further studies are necessary to clarify the underlying mechanisms and the effect of polysaccharide structural variation and the characteristics of the gel formed.
[1] |
Helander E, Virtanen T, Nurminen J, Gabbouj M (2010) Voice conversion using partial least squares regression. IEEE/ACM Transactions on Audio, Speech and Language Processing 18: 912–921. https://doi.org/10.1109/TASL.2011.2165944 doi: 10.1109/TASL.2011.2165944
![]() |
[2] |
Saito Y, Takamichi S, Saruwatari H (2017) Voice conversion using input-to-output highway networks. IEICE T Inf Syst 100: 1925–1928. https://doi.org/10.1587/transinf.2017EDL8034 doi: 10.1587/transinf.2017EDL8034
![]() |
[3] |
Yeh CC, Hsu PC, Chou JC, Lee HY, Lee LS (2018) Rhythm Flexible Voice Conversion Without Parallel Data Using Cycle-GAN Over Phoneme Posteriorgram Sequences. IEEE Spoken Language Technology Workshop (SLT) 274–281. https://doi.org/10.1109/SLT.2018.8639647 doi: 10.1109/SLT.2018.8639647
![]() |
[4] |
Sun L, Wang H, Kang S, Li K, Meng HM (2016) Personalized Cross-Lingual TTS Using Phonetic Posteriorgrams. Interspeech 322–326. https://doi.org/10.21437/Interspeech.2016-1043 doi: 10.21437/Interspeech.2016-1043
![]() |
[5] |
Tian X, Chng ES, Li H (2019) A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. Interspeech 201–205. https://doi.org/10.21437/Interspeech.2019-1514 doi: 10.21437/Interspeech.2019-1514
![]() |
[6] | Takahashi N, Singh MK, Mitsufuji Y (2023) Robust One-Shot Singing Voice Conversion. arXiv: 2210.11096v2. https://doi.org/10.48550/arXiv.2210.11096 |
[7] |
Hono Y, Hashimoto K, Oura K, Nankaku Y, Tokuda K (2019) Singing Voice Synthesis Based on Generative Adversarial Networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 6955–6959. https://doi.org/10.1109/ICASSP.2019.8683154 doi: 10.1109/ICASSP.2019.8683154
![]() |
[8] |
Sun L, Kang S, Li K, Meng H (2015) Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4869–4873. https://doi.org/10.1109/ICASSP.2015.7178896 doi: 10.1109/ICASSP.2015.7178896
[9] | Kaneko T, Kameoka H, Hiramatsu K, Kashino K (2017) Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. Interspeech 2017: 1283–1287. https://doi.org/10.21437/Interspeech.2017-970 |
[10] | Freixes M, Alías F, Carrie JC (2019) A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept. EURASIP Journal on Audio, Speech, and Music Processing 2019: 1–14. https://doi.org/10.1186/s13636-019-0163-y |
[11] | Hono Y, Hashimoto K, Oura K, Nankaku Y, Tokuda K (2021) Sinsy: a deep neural network-based singing voice synthesis system. IEEE/ACM T Audio Spe 29: 2803–2815. https://doi.org/10.1109/TASLP.2021.3104165 |
[12] | Sisman B, Vijayan K, Dong M, Li H (2019) SINGAN: Singing Voice Conversion with Generative Adversarial Networks. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 112–118. https://doi.org/10.1109/APSIPAASC47483.2019.9023162 |
[13] | Sisman B, Li H (2020) Generative adversarial networks for singing voice conversion with and without parallel data. Odyssey 238–244. https://doi.org/10.21437/Odyssey.2020-34 |
[14] | Zhao W, Wang W, Sun Y, Tang T (2019) Singing voice conversion based on WD-GAN algorithm. IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) 950–954. https://doi.org/10.1109/IAEAC47372.2019.8997824 |
[15] | Fang F, Yamagishi J, Echizen I, Lorenzo-Trueba J (2018) High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5279–5283. https://doi.org/10.1109/ICASSP.2018.8462342 |
[16] | Kameoka H, Kaneko T, Tanaka K, Hojo N (2018) StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks. IEEE Spoken Language Technology Workshop (SLT) 266–273. https://doi.org/10.1109/SLT.2018.8639535 |
[17] | Chen Y, Xia R, Yang K, Zou K (2023) MICU: Image Super-resolution via Multi-level Information Compensation and U-net. Expert Syst Appl 245: 123111. https://doi.org/10.1016/j.eswa.2023.123111 |
[18] | Chen Y, Xia R, Yang K, Zou K (2023) MFMAM: Image Inpainting via Multi-Scale Feature Module with Attention Module. Comput Vis Image Und 238: 103883. https://doi.org/10.1016/j.cviu.2023.103883 |
[19] | Chen Y, Xia R, Yang K, Zou K (2023) GCAM: Lightweight Image Inpainting via Group Convolution and Attention Mechanism. Int J Mach Learn Cyb 15: 1815–1825. https://doi.org/10.1007/s13042-023-01999-z |
[20] | Chen Y, Xia R, Yang K, Zou K (2024) DNNAM: Image Inpainting Algorithm via Deep Neural Networks and Attention Mechanism. Appl Soft Comput 111392. https://doi.org/10.1016/j.asoc.2024.111392 |
[21] | Chen Y, Xia R, Yang K, Zou K (2023) DARGS: Image Inpainting Algorithm via Deep Attention Residuals Group and Semantics. J King Saud Univ-Comput 35: 101567. https://doi.org/10.1016/j.jksuci.2023.101567 |
[22] | Chen L, Zhang X, Li Y, Sun M, Chen W (2024) A Noise-Robust Voice Conversion Method with Controllable Background Sounds. Complex Intell Syst 1–14. https://doi.org/10.1007/s40747-024-01375-6 |
[23] | Walczyna T, Piotrowski Z (2023) Overview of Voice Conversion Methods Based on Deep Learning. Applied Sciences 13: 3100. https://doi.org/10.3390/app13053100 |
[24] | Liu EM, Yeh JW, Lu JH, Liu YW (2023) Speaker Embedding Space Cosine Similarity Comparisons of Singing Voice Conversion. The Journal of the Acoustical Society of America (JASA) 154: A244–A244. https://doi.org/10.1121/10.0023424 |
[25] | Hsu CC, Hwang HT, Wu YC, Tsao Y, Wang HM (2016) Voice conversion from non-parallel corpora using variational auto-encoder. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 1–6. https://doi.org/10.1109/APSIPA.2016.7820786 |
[26] | Tobing PL, Wu YC, Hayashi T, Kobayashi K, Toda T (2019) Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. Interspeech 674–678. https://doi.org/10.21437/Interspeech.2019-2307 |
[27] | Yook D, Leem SG, Lee K, Yoo IC (2020) Many-to-many voice conversion using cycle-consistent variational autoencoder with multiple decoders. Odyssey 215–221. https://doi.org/10.21437/Odyssey.2020-31 |
[28] | Hsu CC, Hwang HT, Wu YC, Tsao Y, Wang HM (2017) Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks. arXiv preprint arXiv:1704.00849. https://doi.org/10.48550/arXiv.1704.00849 |
[29] | Huang WC, Violeta LP, Liu S, Shi J, Toda T (2023) The Singing Voice Conversion Challenge 2023. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 1–8. https://doi.org/10.1109/ASRU57964.2023.10389671 |
[30] | Chen Q, Tan M, Qi Y, Zhou J, Li Y, Wu Q (2022) V2C: Visual Voice Cloning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 21242–21251. |
[31] | Qian K, Zhang Y, Chang S, Yang X, Hasegawa-Johnson M (2019) AutoVC: Zero-shot voice style transfer with only autoencoder loss. International Conference on Machine Learning 5210–5219. |
[32] | Patel M, Purohit M, Parmar M, Shah NJ, Patil HA (2020) AdaGAN: Adaptive GAN for many-to-many non-parallel voice conversion. |
[33] | Liu F, Wang H, Peng R, Zheng C, Li X (2021) U2-VC: one-shot voice conversion using two-level nested U-structure. EURASIP Journal on Audio, Speech, and Music Processing 2021: 1–15. https://doi.org/10.1186/s13636-021-00226-3 |
[34] | Liu F, Wang H, Ke Y, Zheng C (2022) One-shot voice conversion using a combination of U2-Net and vector quantization. Appl Acoust 199: 109014. https://doi.org/10.1016/j.apacoust.2022.109014 |
[35] | Wu DY, Lee HY (2020) One-shot voice conversion by vector quantization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 7734–7738. https://doi.org/10.1109/ICASSP40776.2020.9053854 |
[36] | Chou JC, Lee HY (2019) One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. Interspeech 664–668. https://doi.org/10.21437/Interspeech.2019-2663 |
[37] | Huang X, Belongie S (2017) Arbitrary style transfer in real-time with adaptive instance normalization. IEEE International Conference on Computer Vision (ICCV) 1501–1510. https://doi.org/10.1109/ICCV.2017.167 |
[38] | Lian J, Lin P, Dai Y, Li G (2022) Arbitrary Voice Conversion via Adversarial Learning and Cycle Consistency Loss. International Conference on Intelligent Computing 569–578. https://doi.org/10.1007/978-3-031-13829-4_49 |
[39] | Gu Y, Zhao X, Yi X, Xiao J (2022) Voice Conversion Using Learnable Similarity-Guided Masked Autoencoder. International Workshop on Digital Watermarking 13825: 53–67. https://doi.org/10.1007/978-3-031-25115-3_4 |
[40] | Chen YH, Wu DY, Wu TH, Lee HY (2021) AGAIN-VC: A one-shot voice conversion using activation guidance and adaptive instance normalization. IEEE International Conference on Acoustics, Speech, and Signal Processing 5954–5958. https://doi.org/10.1109/ICASSP39728.2021.9414257 |
[41] | Ulyanov D, Lebedev V, Vedaldi A, Lempitsky VS (2016) Texture networks: Feed-forward synthesis of textures and stylized images. Proceedings of the 33rd International Conference on Machine Learning 1349–1357. |
[42] | Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning 37: 448–456. |
[43] | Li Y, Wang N, Shi J, Liu J, Hou X (2016) Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv: 1603.04779. |
[44] | Ulyanov D, Vedaldi A, Lempitsky V (2017) Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4105–4113. https://doi.org/10.1109/CVPR.2017.437 |
[45] | Liu J, Han W, Ruan H, Chen X, Jiang D, Li H (2018) Learning Salient Features for Speech Emotion Recognition Using CNN. First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia) 1–5. https://doi.org/10.1109/ACIIAsia.2018.8470393 |
[46] | Lim W, Jang D, Lee T (2016) Speech emotion recognition using convolutional and Recurrent Neural Networks. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 1–4. https://doi.org/10.1109/APSIPA.2016.7820699 |
[47] | Hajarolasvadi N, Demirel H (2019) 3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms. Entropy (Basel) 21: 479. https://doi.org/10.3390/e21050479 |
[48] | Graves A (2012) Long Short-Term Memory. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence 385: 37–45. https://doi.org/10.1007/978-3-642-24797-2 |
[49] | Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv: 1412.3555. |
[50] | Kumar K, Kumar R, de Boissiere T, Gestin L, Teoh WZ, Sotelo J, et al. (2019) MelGAN: Generative adversarial networks for conditional waveform synthesis. Advances in Neural Information Processing Systems 14910–14921. |
[51] | Kong J, Kim J, Bae J (2020) HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. Proceedings of the 34th International Conference on Neural Information Processing Systems 33: 17022–17033. |
[52] | Duan Z, Fang H, Li B, Sim KC, Wang Y (2013) The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 1–9. https://doi.org/10.1109/APSIPA.2013.6694316 |
[53] | Kubichek R (1993) Mel-cepstral distance measure for objective speech quality assessment. Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing 1: 125–128. https://doi.org/10.1109/PACRIM.1993.407206 |
[54] | Kobayashi K, Toda T, Nakamura S (2018) Intra-gender statistical singing voice conversion with direct waveform modification using log spectral differential. Speech Commun 99: 211–220. https://doi.org/10.1016/j.specom.2018.03.011 |
[55] | Toda T, Tokuda K (2007) A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE T Inf Syst 90: 816–824. https://doi.org/10.1093/ietisy/e90-d.5.816 |