Research article

Text steganography on RNN-Generated lyrics

  • We present a Recurrent Neural Network (RNN) Encoder-Decoder model that generates Chinese pop music lyrics to hide secret information. In particular, given an initial line of a lyric, we use an LSTM model to generate the next Chinese character or word to form a new line, and in this way generate the entire lyric from what has been generated so far. Using common lyric formats and rhymes we extracted, we generate lyrics embedded with secret information that meet visual and pronunciation requirements. Experiments and theoretical analysis show that lyrics generated by our method offer higher embedding capacities for steganography and look more natural than those produced by existing generation-based text steganography methods.

    Citation: Yongju Tong, YuLing Liu, Jie Wang, Guojiang Xin. Text steganography on RNN-Generated lyrics[J]. Mathematical Biosciences and Engineering, 2019, 16(5): 5451-5463. doi: 10.3934/mbe.2019271



    Information hiding, an important research direction in the field of network security, has made great progress in recent years. Secret information is usually embedded into various multimedia carriers, such as text, images [1], audio [2] and videos [3]. Since text is the most widely used medium, text-based information hiding has great value. Most traditional text steganography methods hide secret information in an existing document by exploiting its typesetting (format-based) or the lexical, syntactic, and semantic features of its content (content-based). Format-based methods, for example, typically alter letter-spacing [4] and line-spacing [5], change font attributes [6], or utilize invisible characters [7]. These methods, however, are sensitive to formatting: a slight change of format can cause errors when extracting the secret information. Common content-based methods, on the other hand, change existing semantics using, e.g., synonym substitution [8,9], or adjust the syntactic structure of certain sentences [10]. Because these traditional methods embed secret information by modifying the text, they are vulnerable to steganalysis algorithms [11,12], which leads to poor security.

    In recent years, researchers have started to investigate how to generate text, instead of modifying existing text, to hide information. For example, Yu et al. [13] proposed to use a special style of poetry called "Song Ci", originated in the Song Dynasty of ancient China, for text steganography. This was the first text steganography algorithm to make use of a Chinese art form. However, their method only selects suitable words from an existing Song-Ci poem to recombine into a new Ci-poem; it does not really generate new Ci-poems from scratch, which limits its usage. In addition, their method chooses words randomly during generation and ignores word collocations and the relationships between lines. As a result, Ci-poems generated by their algorithm often lack a central theme, which might arouse suspicion and thus reduce security.

    Recently, deep-learning technologies have achieved great success in the field of image generation [14]. Benefiting from this, more attempts have been made to use such technologies to generate particular forms of text. For example, Zhang and Lapata [15] treated poem generation as a machine-translation problem using the standard encoder-decoder model. Their method compresses all previously generated lines of a poem into a vector to help generate the next line. Wang et al. [16,17] and Yi et al. [18] presented methods to generate a poem around a central theme using a bidirectional recurrent neural network (RNN) with an attention mechanism, which allows them to focus on the most critical components and deep dependencies between lines in a poem.

    Building on such generation models, researchers have applied text generation to text steganography. For example, Luo et al. [19] presented a method based on Ci-poem generation using a Markov chain model. While their method does produce better Ci-poems, it has a low embedding capacity for hiding information. Luo and Huang [20] also showed how to use an LSTM model to generate Chinese classical poems and embed secret information into a generated poem during the generation process.

    We note that ancient styles of poetry, although they once played an important role in literature, often have a unique style of expression that makes them difficult for general readers to comprehend. In particular, secret information embedded in a poem may look odd and thus arouse suspicion, leading to weaker security.

    To overcome this obstacle, we devise a novel text steganography method that uses an RNN model to generate Chinese pop music lyrics. Compared with poems written in a stringent style, lyrics are easier to understand and are allowed to be much longer, yielding better security and a higher embedding capacity. However, if neural networks are applied to lyric generation directly, especially in combination with text steganography, the resulting text may not look like lyrics at all, because lyrics must follow a certain structure, rhyme properly, and express emotions consistently. To tackle these issues, we analyze a large number of Chinese pop lyrics, summarize three common structure types, and use them as templates to guide lyric generation. We also adopt thirteen rhymes, similar to the oblique tones of poems, to generate lyrics with pleasant tones.

    Our contributions are the following: First, we devise an RNN-based lyric generator with steganography that offers a much higher embedding capacity than previous steganography methods. Second, we present three structure templates for Chinese pop music and thirteen rhymes to mitigate the decrease in lyric quality incurred during information hiding.

    The rest of the paper is organized as follows: In Section 2 we describe preliminary results. In Section 3 we present a detailed description of our RNN-based lyric-generation model suitable for text steganography. In Section 4 we describe the construction of the dataset, show the generated results, evaluate the proposed method, and compare it with previous methods. Finally, we conclude the paper in Section 5.

    In this section, we describe our preliminary results on Char-RNN, lyric formats, and rhymes.

    The sequence-to-sequence (S2S) model has been widely used in machine translation, automatic response, text generation, and other fields. S2S is typically carried out under the Encoder-Decoder framework and implemented with an RNN, CNN, or GRU. The Recurrent Neural Network (RNN) is the most widely used and has proved well suited to sequential tasks. However, a basic RNN model has only one hidden state, information from the beginning of a sequence is seldom retained at later stages, and training suffers from the vanishing-gradient problem [17]. Basic RNNs are therefore not suitable for generating long lyrics. To solve these problems, researchers have proposed a number of RNN variants, including the LSTM and the GRU.

    As a special kind of RNN, the LSTM [21] adopts four processing components that interact in a special way to handle long-term dependencies. The LSTM network model is shown in Figure 1. All recurrent neural networks have the form of a chain of repeating neural-network modules. In a standard RNN, this repeating module has a very simple structure, such as a single layer. The LSTM avoids long-term dependency problems through a special design: it uses structures called 'gates' to remove or add information to the transmitted state.

    Figure 1.  The specific structure of a node of LSTM.

    In 2015, Karpathy [22] proposed the Char-RNN model, allowing machines to generate text at the character level. In other words, the model computes the probability of the next character based on the observed ones. In this paper, we devise a Char-RNN model based on LSTM to deal with long-distance dependency problems; Figure 2 shows the execution process of the model.
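    To make the Char-RNN interface concrete, here is a minimal Python sketch of the quantity the model computes: a probability distribution over the next character given the observed ones. A character-bigram count model stands in for the trained LSTM here; the function names are ours, not the paper's.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count character bigrams; a crude stand-in for a trained Char-RNN."""
    counts = defaultdict(Counter)
    for line in corpus:
        for prev, nxt in zip(line, line[1:]):
            counts[prev][nxt] += 1
    return counts

def next_char_distribution(counts, prev_char):
    """P(next char | context) -- the distribution a Char-RNN outputs."""
    c = counts[prev_char]
    total = sum(c.values())
    return {ch: n / total for ch, n in c.items()}

model = train_bigram(["abab", "abc"])
dist = next_char_distribution(model, "a")  # here 'a' is always followed by 'b'
```

    A real Char-RNN replaces the bigram table with an LSTM that conditions on the entire observed prefix, but the output object, a distribution over next characters, is the same.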

    Figure 2.  The principle of Char-RNN.

    Chinese pop music lyrics follow certain formats. After analyzing a large number of them, we found that most follow one of three structures: (1) A1 + B, A2 + B; (2) A1 + A2 + A3; (3) A1 + A2 + B1 + B2, where each letter (with or without a number) represents a lyric segment consisting of 3 to 5 lines. Figure 3 shows an example of a classic 4-segment lyric.

    Figure 3.  A 4-segment lyric.

    A good Chinese pop lyric expresses strong emotions and is properly rhymed. To make a lyric rhyme, the writer must ensure that the pronunciations of the ending words of the lines rhyme. From the perspective of vocal music, Chinese syllables are divided into thirteen categories, called the Thirteen Rhymes [23], classified by the basic principle of identical or similar vowels. Table 1 shows the thirteen rhymes and common Chinese characters for each rhyme. Figure 4 shows the entire lyric of "Deskmate of You", which is popular in mainland China; its rhyme is Yi-Qi. In addition, a lyric may have more than one rhyme, and it is common for a lyric to use three to five rhymes.

    Table 1.  Thirteen rhymes.
    Rhymes Vowels Characters Music
    Fa-Hua a, ia, ua 巴、麻、花、沙 《绒花》
    Suo-Bo o, e, uo 波、多、朵、烁 《我爱你中国》
    Ye-Xie ie, ue, üe 街、谐、月、夜 《娄山关》
    Gu-Su u 出、珠、福、图 《闪光的珍珠》
    Yi-Qi i, ü, er 溪、雨、曲、句 《同桌的你》
    Huai-Lai ai, uai 白、海、爱、来 《四季歌》
    Hui-Dui ei, ui, uei 飞、灰、梅、泪 《大森林的早晨》
    Yao-Tiao ao, iao 高、烧、苗、笑 《绣荷包》
    You-Qiu ou, iu, iou 沟、球、舟、柳 《蝶恋花》
    Yan-Qian an, ian, uan, üan 帆、天、旋、唤 《黄河颂》
    Ren-Chen en, in, un, ün 春、纷、新、云 《兰花花》
    Jiang-Yang ang, iang, uang 芳、霜、香、长 《松花江上》
    Zhong-Dong eng, ing, ueng, ong 风、星 《故乡是北京》

    Figure 4.  Yi-Qi rhyme.

    Information hiding starts when the lyric generator produces an output vector Y from the input vector X. We denote the output vector by Y = (y_1, y_2, ..., y_N), where N is the length of the input and output sequences. Moreover, y_i (i = 1, 2, ..., N-1) is equal to x_{i+1}; in other words, only y_N is newly generated: the model selects y_N as the symbol with the highest probability given (y_1, y_2, ..., y_{N-1}). Here y_N can be a character or a word. Thus, to hide secret information, we filter candidate characters or words at selected positions, encode them, and select the candidate that accords with the secret information.

    There are three steps: (1) Set parameters. (2) Generate candidates for the next character (or word). (3) Select the most suitable candidate. The whole process of information hiding is shown in Figure 5. In Step 1, we set the following four parameters:

    Figure 5.  The process of the proposed steganography method.

    Structure. This is the underlying structure of the lyric to be generated, as described in Section 2.2.

    Initial line. This preheats the generator and helps generate the first few characters, but it is not part of the final lyric.

    Information. This is a binary stream converted from the secret information, which is hidden during generation and later extracted.

    Size. This is the size of the candidate pool from which we choose the most probable candidates. Size is clearly an important factor of the embedding capacity: the larger Size is, the more secret information can be embedded.

    In Step 2, the lyric generator produces a distribution (p_1, p_2, ..., p_Size) for y_N based on (y_1, y_2, ..., y_{N-1}), where p_i (i = 1, 2, ..., Size) is the probability of the i-th candidate.

    Step 3 is the most critical part of our lyric-steganography method. For each y_N, we first predict the probabilities of all possible next characters based on the previous information. We then select the m most likely characters as candidates, where m = Size, and encode these m characters (e.g., with a Huffman code). Finally, we choose the candidate whose code matches the next secret bits. In addition, if y_N is a space, we check whether the line has reached the minimum required length. If it has, the line is finished; otherwise, we change the order of the candidates so that lines do not end too early.
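    The candidate-encoding step can be sketched as follows: build a Huffman code over the m candidates from their probabilities, then emit the candidate whose codeword is a prefix of the remaining secret bits. This is a minimal Python sketch under our own naming; the paper does not specify its exact Huffman construction.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a Huffman code for the candidate pool.
    probs: dict mapping candidate char -> probability."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(counter), {ch: ""}) for ch, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

def embed_one(probs, bits):
    """Pick the candidate whose codeword prefixes the secret bit stream.
    Returns (chosen char, number of bits consumed)."""
    code = huffman_code(probs)
    for ch, c in code.items():
        if bits.startswith(c):
            return ch, len(c)
    raise ValueError("no candidate matches the bit stream")

# hypothetical 4-candidate pool with equal probabilities -> 2-bit codewords
probs = {"你": 0.25, "我": 0.25, "他": 0.25, "天": 0.25}
ch, used = embed_one(probs, "10")  # consumes exactly 2 secret bits
```

    With equal probabilities the code is fixed-length (log2 Size bits per position); skewed probabilities give shorter codes to likelier characters, trading capacity for naturalness.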

    To improve the quality of generated lyrics, we apply the lyric structures and rhyme table described above in our experiments and propose the Word-RNN model. We then show two results generated by the two models, and finally illustrate how the secret information is extracted.

    As described in Section 2.2, most Chinese pop lyrics follow three common structures. Thus, we introduce three structure templates to regulate lyric generation. Taking Structure 3 as an example, we may generate four-verse lyrics with four lines in each verse. A line in a verse commonly contains about 8 to 15 characters, so to standardize the structure it is crucial to control the number of characters per line. Therefore, during generation we check the character count of each line: if the count has not reached the minimum, no space (line separator) is generated; if it exceeds the maximum, a new line is started.
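    The length check above amounts to constraining the candidate pool before a character is chosen. The sketch below drops or forces the separator for illustration, whereas the paper reorders candidates; the function name and thresholds are ours.

```python
def adjust_for_length(candidates, line_len, min_len=8, max_len=15):
    """Enforce line-length limits on the ranked candidate list.
    candidates: chars ordered by probability; ' ' acts as the line separator."""
    if line_len < min_len:
        # line still too short: keep the separator from being chosen
        return [c for c in candidates if c != " "]
    if line_len >= max_len:
        # line too long: force a separator so a new line starts
        return [" "]
    return candidates

short_pool = adjust_for_length(["好", " ", "天"], line_len=3)   # no separator
full_pool = adjust_for_length(["好", " ", "天"], line_len=10)  # unchanged
```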

    In addition, as mentioned in Section 2.3, the ending character of each line should rhyme properly. Therefore, during generation we first determine whether the character to be generated is the ending character of a line. If it is, we select as candidates the m most probable characters among all results that rhyme properly; we then use the method of Section 3.1 to choose the most suitable character for y_N. If it is not the ending character, we choose the m most probable characters directly as the candidates.
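    The rhyme constraint can be sketched as a filter over the ranked candidates, using the Thirteen-Rhymes classes of Table 1. The finals dictionary below covers only a few characters for illustration (a real system would use a pinyin library); all names are ours.

```python
# hypothetical pinyin finals for a few characters; rhyme classes follow
# the Thirteen-Rhymes table (Table 1)
FINALS = {"溪": "i", "雨": "ü", "白": "ai", "来": "ai", "飞": "ei"}
RHYME_CLASS = {"i": "Yi-Qi", "ü": "Yi-Qi", "ai": "Huai-Lai", "ei": "Hui-Dui"}

def rhyming_candidates(ranked_chars, target_rhyme, m):
    """Keep the m most probable candidates in the target rhyme class."""
    picked = [c for c in ranked_chars
              if RHYME_CLASS.get(FINALS.get(c)) == target_rhyme]
    return picked[:m]

pool = rhyming_candidates(["白", "溪", "飞", "来"], "Huai-Lai", 2)
```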

    Directly choosing the most suitable character can cause a severe problem: the chosen character may not form a reasonable meaning with its adjacent characters, and the quality of the lyric then decreases sharply. To overcome this problem, we extend the Char-RNN model to a Word-RNN model with the same structure as Char-RNN; the only difference is that the smallest unit in the training and generation stages is a word instead of a character. Figure 6 shows the principle of Word-RNN.

    Figure 6.  The principle of Word-RNN.

    We present an example to show how to hide the secret information in the process of lyric generations. First, we set the parameters as follows:

    Initial Line:我们手牵手(Your hands in my hands).

    Size: 4 (the size of candidate pool).

    Structure: Classical-16 (that is, four verses with four lines in each verse).

    Information: 100111111100001101.... (converted from '信息隐藏').

    At the beginning, we choose a model from Char-RNN and Word-RNN (here we use Char-RNN as the example). The initial line is then fed in to preheat the model and is converted into the inputs x_i.

    For the first y, we select the 4 most probable characters as candidates and encode them using a Huffman code (shown in Table 2, Col. 1). Since the information stream begins with 10, '你' is chosen as the first character.

    Table 2.  The candidates pool.
    Bits Char1 Char2 Char3 Char4 Char5 Char6 Char7 Char8 Char9
    00 space space
    01 space 留(歇)
    10
    11


    To make the ending characters of lines rhyme, note that the bits '01' represent the character '歇' (shown in Table 2). However, this character does not sound well with the You-Qiu rhyme of the existing lines, so we replace it with another character that rhymes better. It should also be noted that the space character in Table 2 serves as the separator between two lines and can itself be used to hide information.

    After a lyric embedded with secret information is generated, the sender sends it to the receiver, together with the candidate-pool file. With the candidate pool, the receiver can recover the binary stream converted from the secret information; without the file, extraction is impossible. The candidate-pool file thus plays an essential role in information extraction: only legitimate recipients who hold the file can extract the secret information.
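    Extraction is the inverse of embedding: for each position, the shared candidate pool maps the observed character back to its codeword, and the codewords are concatenated. A minimal sketch, assuming the pool is stored as one char-to-bits table per position (our representation, not specified in the paper):

```python
def extract_bits(stego_chars, code_tables):
    """Recover the secret bit stream from a stego lyric.
    code_tables[i]: dict mapping char -> bit string for position i,
    i.e., the shared candidate-pool file."""
    return "".join(table[ch] for ch, table in zip(stego_chars, code_tables))

# hypothetical 2-position pool with 2-bit codewords per position
tables = [{"你": "10", "我": "00"}, {"好": "11", "走": "01"}]
bits = extract_bits(["你", "好"], tables)  # -> "1011"
```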

    In this section, we first describe the construction of the dataset and how each initial parameter is set. We then present two results, generated by Char-RNN and Word-RNN respectively. Finally, we give the performance evaluations.

    We need a large number of high-quality lyrics to train a satisfactory model. Unfortunately, no Chinese pop lyrics dataset has been published by any authoritative organization. To obtain training data, we used the Scrapy crawler framework to collect a total of 15,000 Chinese pop lyrics from music websites, covering 100 Chinese male singers, 100 Chinese female singers, and 100 Chinese bands, with 50 popular songs per artist. After removing all-English songs and duplicates, we obtained a dataset of about 13,500 Chinese pop lyrics.

    In the training process, we collect all the Chinese characters in the lyric data and construct a dictionary shared by both inputs and outputs. The number of distinct characters determines the length of the one-hot encoding. Finally, we train our model using TensorFlow.
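    The dictionary construction and one-hot encoding can be sketched as follows; the helper names are ours, and a real pipeline would feed these vectors into the TensorFlow model.

```python
def build_vocab(lyrics):
    """Build the shared input/output dictionary; its size is the
    length of the one-hot encoding."""
    chars = sorted({ch for lyric in lyrics for ch in lyric})
    return {ch: i for i, ch in enumerate(chars)}

def one_hot(ch, vocab):
    """Encode a character as a one-hot vector over the dictionary."""
    vec = [0] * len(vocab)
    vec[vocab[ch]] = 1
    return vec

vocab = build_vocab(["你好", "好天"])  # 3 distinct characters
v = one_hot("好", vocab)
```

    For the Word-RNN variant, each dictionary entry is a word produced by segmentation rather than a single character, but the encoding is otherwise identical.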

    For the Word-RNN model, we use the same methods, parameters, and optimization algorithm as the Char-RNN model. The only difference is that we first carry out word segmentation of the lyric data, expressing each song in terms of phrases. Namely, in the One-Hot Encoding, each position now represents a word, instead of a character.

    Figure 7 depicts two Chinese pop lyrics generated by the proposed method with steganography, where (a) is a lyric generated by Char-RNN and (b) is a lyric generated by Word-RNN.

    Figure 7.  Two examples of results: (a) Char-RNN, (b) Word-RNN.

    According to their literal meanings, Lyric (a) praises the beauty of love, while Lyric (b) describes the sorrow of love. Analyzing the lyrics, we find that those generated by the Char-RNN model contain repetitive characters (in red) and broken lines, while those generated by the Word-RNN model do not have these problems.

    We evaluate the proposed steganography method from two important aspects: embedding capacity and security.

    Embedding capacity, which represents how much information can be hidden in a stego-text, is a significant measure for evaluating a steganography method. Previous steganography methods based on text generation either have low embedding capacities or do not look natural. For example, the poetry-based methods not only produce a genre of poor naturalness but also have lower embedding capacity than the proposed method. The high embedding capacity of the proposed method stems from the longer length of lyrics and from hiding information at every character or word position. By the same reasoning, char-based methods have higher embedding capacities than word-based methods when the generated lyrics have the same length. The embedding-capacity formulas of the two methods are as follows:

    EC_1 = (\sum_{i=1}^{num} C_i + num - 1) \log_2 Size  (4.1)

    EC_2 = (\sum_{i=1}^{num} W_i + num - 1) \log_2 Size  (4.2)

    where EC_1 is the embedding capacity of a lyric generated by Char-RNN, num is the number of lines, C_i is the number of characters in line i, and Size is the number of candidates. Likewise, EC_2 is the embedding capacity of a lyric generated by Word-RNN, where W_i is the number of words in line i. The term num - 1 counts the spaces separating the lines, which also carry information.
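    The capacity formulas can be computed directly; the sketch below follows Eqs. (4.1)/(4.2), where 'units' are characters for Char-RNN or words for Word-RNN (the function name is ours).

```python
import math

def embedding_capacity(units_per_line, size):
    """EC = (sum of units + (num - 1) line separators) * log2(Size) bits,
    per Eqs. (4.1)/(4.2)."""
    num = len(units_per_line)
    total = sum(units_per_line) + num - 1
    return total * math.log2(size)

# a 4-line verse with 10 characters per line and Size = 4 candidates:
# (40 + 3) * 2 = 86 bits
cap = embedding_capacity([10, 10, 10, 10], 4)
```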

    Figure 8 compares the embedding capacities of different methods, where the embedding capacity is measured as the number of characters embedded in the text carrier. It is easy to see that the methods proposed in this paper have higher embedding capacities than the poem-based and Ci-based steganography methods, and as Size increases the gap widens further.

    Figure 8.  The comparisons of embedding capacities.

    Security refers to the invisibility and imperceptibility of the secret information, and it is one of the important measures for evaluating a steganography method.

    In modern society, people usually communicate in the vernacular. Although poetry is an important achievement of ancient Chinese literature, it is not practical for daily communication today, so steganography based on poetry generation may easily arouse suspicion when the text carrier is disseminated. Modern pop lyrics are more natural, which means that lyric text embedded with secret information will not arouse suspicion during transmission; the security is therefore higher than that of poetry-based steganography.

    Table 3 shows results generated by previous methods with secret information embedded. Compared with these two results, the lyrics generated by the proposed method are clearly easier to understand and more natural. Moreover, the proposed method is based on lyric generation and does not modify an existing text carrier as traditional text steganography does, so it can effectively resist detection by traditional steganalysis algorithms. Furthermore, since our method is not format-based, it also resists format-based attacks and steganalysis.

    Table 3.  The results of previous methods.
    Tang Poetry Song Ci
    风窗烟树中, 不知多少, 洞天谁道在, 一笑樽前.
    柔静雨光斑. 底事今年春事早, 回首当日三贤.
    夜里吹生下, 几度春风, 西风明月, 千里倍潸然.
    归人乱浪宽. 断肠风月, 少年无限当年.


    We presented a novel method that uses recurrent neural networks to generate Chinese pop lyrics and embeds secret information during the generation process. We first used Char-RNN to predict the probabilities of the possible next characters based on the observed data; we then selected the most suitable character according to the secret information and appended it to the observed sequence to continue the prediction and generation. When lyrics are generated directly from secret information, the generated characters may fail to combine into words; we therefore improved the Char-RNN model into the Word-RNN model, which uses words as the training units. Experimental comparisons show that the proposed method offers a higher hiding capacity and generates more natural text than previous methods. In the future, we will improve our method to generate lyrics of better quality and clearer theme, with higher embedding capacity and better security and robustness.

    This work was partially supported by National Natural Science Foundation of China (No. 61872134, 61502242), Natural Science Foundation of Hunan Province (No. 2018JJ2062, 2018JJ2301), and National Key Research and Development Program (2017YFC1703306), and Hunan Provincial 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property (2017TP1025).

    The authors declare no conflict of interest.



    [1] R. H. Meng, S. G. Rice, J. Wang, et al., A fusion steganographic algorithm based on faster R-CNN, CMC Comput. Mater. Con., 55 (2018), 1–16.
    [2] G. J. Xin, Y. L. Liu, T. Yang, et al., An adaptive audio steganography for covert wireless communication, Secur. Commun. Netw., 1 (2018), 1–10.
    [3] F. Peng, X. Q. Gong, M. Long, et al., A selective encryption scheme for protecting H.264/AVC video in multimedia social network, Multimed. Tools Appl., 76 (2018), 3235–3253.
    [4] Y. W. Kim, K. A. Moon and I. S. Oh, A text watermarking algorithm based on word classification and inter-word space statistics, International Conference on Document Analysis and Recognition, 2 (2003), 775–799.
    [5] A. M. Alattar and O. M. Alattar, Watermarking electronic text documents containing justified paragraphs and irregular line spacing, International Society for Optics and Photonics, 5306 (2004), 685–695.
    [6] B. K. Ramakrishnan, P. K. Thandra and A. V. S. M. Srinivasula, Text steganography: a novel character-level embedding algorithm using font attribute, Secur. Commun. Netw., 9 (2016), 6066–6079.
    [7] R. Kumar, A. Malik, S. Singh, et al., A space based reversible high capacity text steganography scheme using font type and style, International Conference on Computing, Communication and Automation, (2016), 1090–1094.
    [8] Q. Cao, X. M. Sun and L. Y. Xiang, A secure text steganography based on synonym substitution, IEEE Conference Anthology, (2014), 1–3.
    [9] L. Y. Xiang, Y. Li and W. Hao, Reversible natural language watermarking using synonym substitution and arithmetic coding, CMC Comput. Mater. Con., 55 (2018), 541–559.
    [10] J. Cong, D. Zhang and M. Pan, Chinese text information hiding based on paraphrasing technology, IEEE International Conference of Information Science and Management Engineering, 1 (2010), 39–42.
    [11] Y. Yang, Y. W. Chen and Y. L. Chen, A novel universal steganalysis algorithm based on the IQM and the SRM, CMC Comput. Mater. Con., 56 (2018), 261–271.
    [12] L. Y. Xiang, J. M. Yu, C. F. Yang, et al., A word-embedding-based steganalysis method for linguistic steganography via synonym-substitution, IEEE Access, 6 (2018), 64131–64141.
    [13] Z. S. Yu, L. S. Huang and Z. L. Chen, High embedding ratio text steganography by ci-poetry of the song dynasty, J. Chin. Inf. Proc., 23 (2009), 55–62.
    [14] J. W. Wang, T. Li, X. Y. Luo, et al., Identifying computer generated images based on quaternion central moments in color quaternion wavelet domain, IEEE. T. Circ. Syst. Vid., (2018), 1.
    [15] X. Zhang and M. Lapata, Chinese poetry generation with recurrent neural networks, International Conference on Empirical Methods in Natural Language Processing, (2014), 670–680.
    [16] Q. X. Wang, T. Y. Luo and D. Wang, Can machine generate traditional Chinese poetry? A Feigenbaum test, International Conference on Brain Inspired Cognitive Systems, 10023 (2016), 34–46.
    [17] Q. X. Wang, T. Y. Luo and D. Wang, Chinese song iambics generation with neural attention-based model, Association for Computing Machinery, (2016), 2943–2949.
    [18] X. Y. Yi, R. Y. Li and M. S. Sun, Generating Chinese classical poems with RNN encoder-decoder, in Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (eds. M. Sun, X. Wang, B. Chang, D. Xiong), Springer, 10565 (2017), 211–223.
    [19] Y. B. Luo, Y. F. Huang and F. F. Li, Text steganography based on ci-poetry generation using Markov chain model, KSII. T. Internet. Inf., 10 (2016), 4568–4584.
    [20] Y. B. Luo and Y. F. Huang, Text steganography with high embedding rate: Using recurrent neural networks to generate Chinese classic poetry, 5th ACM Workshop on Information Hiding and Multimedia Security, (2017), 99–104.
    [21] C. Olah, Understanding LSTM Networks, 2015. Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs.
    [22] A. Karpathy, The unreasonable effectiveness of recurrent neural networks, 2015. Available from: http://karpathy.github.io/2015/05/21/rnn-effectiveness.
    [23] Q. Y. Du, The application of the thirteen rhymes in singing technique, Journal of Xingyi Normal University for Nationalities, (2010), in Chinese.
  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)