Feature fusion–based preprocessing for steel plate surface defect recognition

Yong Tian; Tian Zhang; Qingchao Zhang; Yong Li; Zhaodong Wang; Yong Tian; Tian Zhang; Qingchao Zhang; Yong Li; Zhaodong Wang

doi:10.3934/mbe.2020305

Mathematical Biosciences and Engineering

2020, Volume 17, Issue 5: 5672-5685. doi: 10.3934/mbe.2020305

Previous Article Next Article

Research article Special Issues

Feature fusion–based preprocessing for steel plate surface defect recognition

State Key Laboratory of Rolling and Automation, Northeastern University, Shenyang 110819, China

Received: 18 May 2020 Accepted: 17 August 2020 Published: 26 August 2020

To address the problem of steel strip surface defect detection, a feature fusion–based preprocessing strategy is proposed based on machine vision technology. This strategy can increase the feature dimension of the image, highlight the pixel features of the image, and improve the recognition accuracy of the convolutional neural network. This method is based on commonly used image feature extraction operators (e.g., Sobel, Laplace, Prewitt, Robert, and local binary pattern) to process the defect image data, extract the edges and texture features of the defect image, and fuse the grayscale image processed by the feature operator with the original grayscale image by using three channels. To consider also computational efficiency and reduce the number of calculation parameters, the three channels are converted into a single channel according to a certain weight ratio. With this strategy, the steel plate surface defect database of NEU is processed, and fusion schemes with different operator combinations and different weight ratios for conversion to the single channel are explored. The test results show that, under the same network framework and with the same computational cost, the fusion scheme of Sobel:image:Laplace and the single-channel conversion weight ratio of 0.2:0.6:0.2 can improve the recognition rate of a previously unprocessed image by 3% and can achieve a final accuracy rate of 99.77%, thereby demonstrating the effectiveness of the proposed strategy.

Keywords:

Citation: Yong Tian, Tian Zhang, Qingchao Zhang, Yong Li, Zhaodong Wang. Feature fusion–based preprocessing for steel plate surface defect recognition[J]. Mathematical Biosciences and Engineering, 2020, 17(5): 5672-5685. doi: 10.3934/mbe.2020305

Related Papers:

[1]	Dingwei Tan, Yuliang Lu, Xuehu Yan, Lintao Liu, Longlong Li . High capacity reversible data hiding in MP3 based on Huffman table transformation. Mathematical Biosciences and Engineering, 2019, 16(4): 3183-3194. doi: 10.3934/mbe.2019158
[2]	Yongju Tong, YuLing Liu, Jie Wang, Guojiang Xin . Text steganography on RNN-Generated lyrics. Mathematical Biosciences and Engineering, 2019, 16(5): 5451-5463. doi: 10.3934/mbe.2019271
[3]	Kun Zheng, Junjie Shen, Guangmin Sun, Hui Li, Yu Li . Shielding facial physiological information in video. Mathematical Biosciences and Engineering, 2022, 19(5): 5153-5168. doi: 10.3934/mbe.2022241
[4]	Wanru Du, Xiaochuan Jing, Quan Zhu, Xiaoyin Wang, Xuan Liu . A cross-modal conditional mechanism based on attention for text-video retrieval. Mathematical Biosciences and Engineering, 2023, 20(11): 20073-20092. doi: 10.3934/mbe.2023889
[5]	Xue Li, Huibo Zhou, Ming Zhao . Transformer-based cascade networks with spatial and channel reconstruction convolution for deepfake detection. Mathematical Biosciences and Engineering, 2024, 21(3): 4142-4164. doi: 10.3934/mbe.2024183
[6]	Xianyi Chen, Anqi Qiu, Xingming Sun, Shuai Wang, Guo Wei . A high-capacity coverless image steganography method based on double-level index and block matching. Mathematical Biosciences and Engineering, 2019, 16(5): 4708-4722. doi: 10.3934/mbe.2019236
[7]	Shanqing Zhang, Xiaoyun Guo, Xianghua Xu, Li Li, Chin-Chen Chang . A video watermark algorithm based on tensor decomposition. Mathematical Biosciences and Engineering, 2019, 16(5): 3435-3449. doi: 10.3934/mbe.2019172
[8]	Pedro R. Palos Sánchez, José A. Folgado-Fernández, Mario Alberto Rojas Sánchez . Virtual Reality Technology: Analysis based on text and opinion mining. Mathematical Biosciences and Engineering, 2022, 19(8): 7856-7885. doi: 10.3934/mbe.2022367
[9]	Jimmy Ming-Tai Wu, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Youcef Djenouri, Chun-Hao Chen, Zhongcui Li . The density-based clustering method for privacy-preserving data mining. Mathematical Biosciences and Engineering, 2019, 16(3): 1718-1728. doi: 10.3934/mbe.2019082
[10]	Qing Ye, Qiaojia Zhang, Sijie Liu, Kaiqiang Chen . A novel chaotic system based on coupled map lattice and its application in HEVC encryption. Mathematical Biosciences and Engineering, 2021, 18(6): 9410-9429. doi: 10.3934/mbe.2021463

Abstract

1. Introduction

Steganography is an efficient privacy communication measurement in which secret messages are embedded into digital media, such as digital images, video or audio files, to implement the information delivery ^[1,2,3,4]. As a countermeasure to steganography, steganalysis ^[5,6,7,8] is mainly used to detect the presence of hidden data in a digital media. Traditional steganography mainly involves single digital image, and always combines the side-information based distortion function and Syndrome-Trellis Codes (STC) ^[4] to implement steganography. For example, data hider uses the non-round coefficients from one uncompressed image to measure the steganography costs of each DCT coefficient and then embed messages when saving them as compressed images. This strategy is consistently feasible in the context of modern steganography, because the side-information is not available to the recipient (or steganalyst). However, when the size of secret messages is too big, the steganography for single image does not work. One feasible way is to hide the messages among a batch of images. We usually name this scheme as batch steganography ^[9,10,11]. Nevertheless, batch steganography is not easy, at least inconvenient, to apply in real world due to the following two reasons. First, traditional distortion function mainly focuses on single image, the distortion definition for batch steganography is not straightforward. Second, batch steganography needs a huge of homogeneous covers, which are hard to be obtained due to diverse social networks.

One natural question arises: Is there a kind of media that can avoid the above two problems to facilitate data hiding? The answer is positive. Video sequences provide this possibility because they usually consist of a number of homogeneous image frames and thus have a higher capacity. Nevertheless, some ones may wonder if video steganography is as successful as traditional side-information based image steganography, because most of videos from different acquisition devices, e.g. cell phones or digital cameras, are always saved as JPEG format, not the uncompressed format. Accordingly, if we use the video to hide the secret messages, it has to answer two key questions: (1) How to design optimal distortion function by using continuous video frames with same (or approximate) scenes? (2) How to design embedding strategy to ensure that stego videos can resist diverse network attacks, such as usual noise attacks, video frame attacks and video compression attacks?

Aiming at the first question, since video always contains compressed image frames, the existing distortion function based on side-information cannot be transplanted directly. Nevertheless, we can get some inspiration from different definition of side-information, for example, designing the steganography cost by using multiple image with the same scene. Actually, several works has been developed in this direction. In ^[12], the authors proposed a new view to model the differences between the printed image and its scan version. Unfortunately, this scheme is inconvenient due to two pitfalls: (1) this scheme is rather labor-consuming due to requiring a large number of scan versions. (2) the difference among scan images maybe lead to the complication increasing. To remove this weakness, in ^[13], the authors designed a different type of side-information by multiple compressed images with the same scene. This scheme avoided time-consuming and formed a more secure method even if only two images are used. Although a quite significant increasing for anti-detection can be obtained easily with respect to the case of single image, this scheme is rather difficult to be practical because the image database is hard to build.

Regarding the second question, existing video steganographic methods ^{[14,15,16,17,18,19]} can be divided into two categories according to the information embedding domain. One is spatial domain based video steganography, in which the data is embedded directly into raw pixel values, and they usually refer to the processing of image steganography, such as Least Significant Bit (LSB) Matching method ^[1], Spread Spectrum (SS) method ^[20], and BCH code ^[15] et al.. Although spatial methods can embed high capacity messages, it is inevitable to loss the hidden messages once the stego videos are damaged by unexpected network interference, such as noise, compression, or frame losing. Another type is the joint-compression domain video steganography. In this category, most of methods embed data into different types of compressed video, e.g. motion vectors (MVs) methods ^[17,18], inter/intra prediction methods ^[19], quantized DCT coefficients methods ^[14,16] et al.. These compression domain based methods have a similarity, that is, lower embedding capacity. Moreover, although some compression domain based methods can effectively resist double compression attack, the secret messages are very hard to be recovered once the video frames are lost or damaged during delivery. Therefore, the robustness for video steganography needs to be further improved. This paper tries to fill this gap.

Facing the aforementioned problems, we make the following novel contributions in video steganography:

● We propose a robust video steganography scheme by a new distortion function and ensemble reconstruction mechanism. The proposed solution can not only improve the security performance of stego video, but also ensure the completeness of original data even if some video frames are damaged in delivery.

● Proposed scheme investigates another form of side-information by referring the adjacent image frames with the same scene, and then employs the side-information to design distortion function. This distortion function is effective because the message senders do not need to access the uncompressed image frames.

● Vandermonde matrix is used to expand and divide original data to multiple shares, which are embedded in the continuous frames by combining the designed distortion function and STC algorithm. Subsequently, ensemble reconstruction mechanism is designed to ensure the completeness and correctness of original data, even if partial data is damaged during delivery.

● Comprehensive experiments are performed with classical video sequences. The experimental results demonstrate that proposed scheme can significantly improve the overall performance on visual quality, robustness and anti-steganalysis, leading to a superiority for existing video steganographic methods.

The rest of this paper is organized as follows. Section 2 provides the details of proposed scheme by introducing the procedure of distortion function and data ensemble reconstruction. Subsequently, comprehensive experiments are performed to evaluate the performance of proposed scheme. The experimental results and corresponding discussions are presented in Section 3. Finally, Section 4 concludes the paper.

2. Data-loss resilience video steganography

2.1. The framework of proposed scheme

The framework of our proposed robust video steganography scheme is shown in Figure 1. The proposed scheme is mainly comprised of two parts: data embedding and data extraction. In the data embedding stage, we firstly use a multi-ary Vandermonde matrix to expand original data and then divide them into multiple small shares, which are considered as "actual embedding data". Secondly, we consider the continuous adjacent frames with same scene as pre-cover to provide the side-information, and then design an efficient distortion function by referring to the adjacent frames. Finally, the data shares are embedded into each frame by an existing cost-based embedding scheme. In the data extraction stage, we extract the undamaged shares from the received video frames, and then recover the original data by an ensemble reconstruction mechanism, although the video frames might be damaged or intercepted during the delivery. We claim that the original data can be recovered perfectly as long as the recipient can obtain enough undamaged shares.

Figure 1. Framework of proposed robust batch steganography scheme.

DownLoad: Full-Size Img PowerPoint

2.2. Data decomposition and reconstruction

A video sequence usually contains a lot of image frames with (approximately) same scene, if message sender hopes to deliver secret messages by video sequence, he can spread the messages into continuous image frames. At the receiving end, the receiver extracts the secret messages from these image frames according to a fixed order. Unfortunately, video sequences may be attacked/damaged during transmission, such as the network noise or the warden who might try to remove the video frames. In this case, it is unreasonable to assume that the recipient can receive the information completely and accurately. To improve the robustness of video steganography, in this section, we try to use matrix decomposition mechanism ^[11,21] to divide the original messages into multiple shares. Since each share only carry a small portion of valid information, partial loss for these shares do not affect the recovery for original messages. The corresponding details can be explained by Figure 2.

Figure 2. The procedure of original data decomposition using Vandermonde matrix.

DownLoad: Full-Size Img PowerPoint

Assume that the given secret messages are a binary stream. To expand the original data, we first present the original data into $q$ -ary symbol system, where $q$ is an odd prime. Actually, this procedure is rather simple. The messages are segmented into multiple pieces. Each piece contains $L_1$ bits, which can be converted to $L_2$ $q$ -ary digits according to the following equation.

$\begin{eqnarray} \begin{aligned} {L_1} = \left\lfloor {{L_2}\cdot{{\log }_2}q} \right\rfloor. \end{aligned} \end{eqnarray}$

(2.1)

We can provide a simple sample to explain it graphically. Assume that $L_1 = 4$ , $L_2 = 2$ and the original data is converted into 5-ary notational system. Three binary pieces, (1101 0110 1001), can be converted to six 5-ary digits (23 11 14). Notably, the size reduction from $L_1$ to $L_2$ can be calculated by the following equation.

$\begin{eqnarray} \begin{aligned} {r} = 1 - \frac{{{L_1}}}{{{L_2} \cdot {{\log }_2}q}} \end{aligned} \end{eqnarray}$

(2.2)

Clearly, when $L_1$ and $L_2$ are very large, $r$ is close to $0$ .

When the original data (binary stream) is converted completely, we integrate all $q$ -ary digits as a sequence and then expand them into multiple shares by the following steps.

$\bf{Step}$ $\bf{1:}$ Segment the $q$ -ary digit sequence into $K$ small blocks. Denote each of them as $\left\{ {{d_{k, 1}}, {d_{k, 2}}, \cdots, {d_{k, m}}} \right\}$ , where $m$ represents the length of a digital block and $k \in [1, K]$ .

$\bf{Step}$ $\bf{2:}$ Build Vandermonde matrix $\bf{A}$

$\begin{eqnarray} \begin{aligned} {\bf{A}} = \left[ {\begin{array}{*{20}{l}} 1&1& \cdots &1\\ {{a_1}}&{{a_2}}& \cdots &{{a_n}}\\ {a_1^2}&{a_2^2}& \cdots &{a_n^2}\\ \vdots & \vdots & \cdots & \vdots \\ {a_1^{m - 1}}&{a_2^{m - 1}}& \cdots &{a_n^{m - 1}} \end{array}} \right]\bmod {\rm{ }}q \end{aligned} \end{eqnarray}$

(2.3)

where $a_1, a_2, \cdots, a_n \in [0, q-1]$ are named as the indices of $\bf{A}$ and they are different with each other. In addition, $m$ , $n$ , and $q$ must satisfy $m\le n \le q$ .

$\bf{Step}$ $\bf{3:}$ With the following equation, each digital block $\{ d_{k, 1}$ , ${d_{k, 2}}$ , $\cdots$ , ${d_{k, m}} \}$ can be expanded to $n$ shares.

$\begin{eqnarray} \begin{aligned} \begin{array}{l} \left[ {\begin{array}{*{20}{l}} {{t_{k, 1}}}&{{t_{k, 2}}}&{{t_{k, 3}}}& \cdots &{{t_{k, n}}} \end{array}} \right] = \left[ {\begin{array}{*{20}{l}} {{d_{k, 1}}}&{{d_{k, 2}}}&{{d_{k, 3}}}& \cdots &{{d_{k, m}}} \end{array}} \right] \cdot {\bf{A}}, \end{array} \end{aligned} \end{eqnarray}$

(2.4)

where the symbol " $\cdot$ " in Equation (2.4) presents the multiplication operator in $q$ -ary notational system.

In order to understand data decomposition mechanism easily, we provide an actual example. Assume that $q = 7$ , $n = 6$ , $m = 3$ , and the original data are three 7-ary digits [2 1 4]. We set the indices $a_1, a_2, \cdots, a_n$ of Vandermonde matrix as [5 3 1 0 2 4]. So, the Vandermonde matrix can be built easily by Equation (2.3) and the original digit vector [2 1 4] can be expanded as [2 6 0 2 6 0] according to Equation (2.4).

According to the above steps, $m$ $q$ -ary digits from the original data can be expanded easily to $n$ $q$ -ary digits. Obviously, there is some redundancy in these $n$ $q$ -ary digits. We denote the redundancy rate as $R_e$ , which can be calculated easily as follows.

$\begin{eqnarray} \begin{aligned} {R_e} = 1 - \frac{{{m}}}{n}. \end{aligned} \end{eqnarray}$

(2.5)

Obviously, as long as the loss (or damaged) rate for $t_{k, 1}$ , $t_{k, 2}$ , $t_{k, 3}$ , $\cdots$ , $t_{k, n}$ is not more than $R_e$ , the original digits $d_{k, 1}$ , $d_{k, 2}$ , $d_{k, 3}$ , $\cdots$ , $d_{k, m}$ can be just reconstructed by Equation (2.6).

$\begin{eqnarray} \begin{aligned} \begin{array}{l} \begin{array}{l} \left[ {\begin{array}{*{20}{l}} {{d_{k, 1}}}&{{d_{k, 2}}}&{{d_{k, 3}}}& \cdots &{{d_{k, m}}} \end{array}} \right] = \left[ {\begin{array}{*{20}{l}} {{{t'}_{k, 1}}}&{{{t'}_{k, 2}}}&{{{t'}_{k, 3}}}& \cdots &{{{t'}_{k, m}}} \end{array}} \right] \cdot {\left( {{\bf{A'}}} \right)^{ - 1}} \end{array} \end{array} \end{aligned} \end{eqnarray}$

(2.6)

where $t'_{k, 1}$ , $t'_{k, 2}$ , $t'_{k, 3}$ , $\cdots$ , $t'_{k, m}$ are $m$ undamaged digits, which are selected from the received digits (they maybe contain the wrong digits). $\bf{A'}$ is $m\times m$ Vandermonde matrix built by the indices $a'_1$ , $a'_2$ , $\cdots$ , $a'_m$ (referring to Equation (2.3)). ${\bf{A'}}^{-1}$ is the inversion matrix of ${\bf{A'}}$ in $q$ -ary notational system, whose calculation details can be found in ^[11].

2.3. Distortion function based on frame reference

In this section, we describe the design details of a new distortion function when the sender possesses more than two continuous frames with the (approximately) same scene. Since the continuous video frames have strong correlation, when multiple continuous frames are used to carry the given messages, we can consider the adjacent frames as pre-covers, and calculate steganographic cost (distortion) of current cover to provide a better guidance for video steganography.

In the following, we describe the detailed designing procedure of distortion function when the continuous three cover frames are available. These three frames are considered as JPEG version and denoted ${\bf{F}^{\left(1 \right)}}$ , ${\bf{F}^{\left(2 \right)}}$ and ${\bf{F}^{\left(3 \right)}}$ . We denote the quantized DCT coefficients in three frames as $x_{ij}^{\left(1 \right)}$ , $x_{ij}^{\left(2 \right)}$ and $x_{ij}^{\left(3 \right)}$ , respectively. We then pronounce $x_{ij}^{\left(2 \right)}$ as cover frame, $x_{ij}^{\left(1 \right)}$ and $x_{ij}^{\left(3 \right)}$ as side-information.

When $x_{ij}^{\left(1 \right)}$ and $x_{ij}^{\left(3 \right)}$ are considered as side-information, the message sender can calculate the steganographic cost of modifying coefficient $x_{ij}^{\left(2 \right)}$ by -1 and +1. The corresponding costs are denoted as $\rho _{ij}^{\left(2 \right)}\left({ - 1} \right)$ and $\rho _{ij}^{\left(2 \right)}\left({ + 1} \right)$ . In order to ensure that the proposed distortion function can reflect the cost of changing the cover coefficient $x_{ij}^{\left(2 \right)}$ more accurately, we select the classical embedding schemes, such as J-UNIWARD ^[3], as the basis of calculating steganographic cost. Since the side-information is used to improve further the accuracy of distortion function, we keep the original costs (the costs calculating by classical steganographic schemes) when $x_{ij}^{\left(1 \right)}$ = $x_{ij}^{\left(2 \right)}$ and $x_{ij}^{\left(2 \right)}$ = $x_{ij}^{\left(3 \right)}$ , otherwise, re-modulate them. In other words, the values $x_{ij}^{\left(1 \right)}$ and $x_{ij}^{\left(3 \right)}$ are only useful when $x_{ij}^{\left(1 \right)}$ $\ne$ $x_{ij}^{\left(2 \right)}$ OR $x_{ij}^{\left(2 \right)}$ $\ne$ $x_{ij}^{\left(3 \right)}$ . Since proposed distortion function refers to two adjacent frames, the new cost $\rho _{ij}^{'}\left({ \pm 1} \right)$ can be explained by the following four-cases procedures:

$\bf{Case 1:}$ When $x_{ij}^{\left(1 \right)} = x_{ij}^{\left(2 \right)} \quad$ and $\quad x_{ij}^{\left(2 \right)} = x_{ij}^{\left(3 \right)}$ ,

$\begin{eqnarray} \rho _{ij}^{'}\left( { \pm 1} \right) = \rho _{ij}^{\left( 2 \right)}\left( { \pm 1} \right). \end{eqnarray}$

(2.7)

$\bf{Case 2:}$ When $x_{ij}^{\left(1 \right)} \ne x_{ij}^{\left(2 \right)} \quad$ and $\quad x_{ij}^{\left(2 \right)} = x_{ij}^{\left(3 \right)}$ ,

$\begin{eqnarray} \rho _{ij}^{'}\left( {{s_{ij}}} \right) = \frac{{\alpha \left( Q \right)\rho _{ij}^{\left( 2 \right)}\left( {{s_{ij}}} \right) + \rho _{ij}^{\left( 2 \right)}\left( {{s_{ij}}} \right)}}{2} , \end{eqnarray}$

(2.8)

where ${s_{ij}} = {\rm{sign}}\left({x_{ij}^{\left(1 \right)} - x_{ij}^{\left(2 \right)}} \right)$ .

$\bf{Case 3:}$ When $x_{ij}^{\left(1 \right)} = x_{ij}^{\left(2 \right)} \quad$ and $\quad x_{ij}^{\left(2 \right)} \ne x_{ij}^{\left(3 \right)}$ ,

$\begin{eqnarray} \rho _{ij}^{'}\left( {{s_{ij}}} \right) = \frac{\rho _{ij}^{\left( 2 \right)}\left( {{s_{ij}}} \right) +{\beta \left( Q \right)\rho _{ij}^{\left( 2 \right)}\left( {{s_{ij}}} \right)}}{2}, \end{eqnarray}$

(2.9)

where ${s_{ij}} = {\rm{sign}}\left({x_{ij}^{\left(3 \right)} - x_{ij}^{\left(2 \right)}} \right)$ .

$\bf{Case 4:}$ When $x_{ij}^{\left(1 \right)} \ne x_{ij}^{\left(2 \right)} \quad$ and $\quad x_{ij}^{\left(2 \right)} \ne x_{ij}^{\left(3 \right)}$

$\begin{eqnarray} \rho _{ij}^{'}\left( {{s_{ij}}} \right) = \frac{{\alpha \left( Q \right)\rho _{ij}^{\left( 2 \right)}\left( {{s_{ij}}} \right) + \beta \left( Q \right)\rho _{ij}^{\left( 2 \right)}\left( {{s_{ij}}} \right)}}{2}, \end{eqnarray}$

(2.10)

where ${s_{ij}} = {\rm{sign}}\left({x_{ij}^{\left(1 \right)} - x_{ij}^{\left(2 \right)} + x_{ij}^{\left(3 \right)} - x_{ij}^{\left(2 \right)}} \right)$ .

Clearly, $\alpha \left(Q \right)$ and $\beta \left(Q \right)$ are two modulation factors referring to the compression factor $Q$ , where $\alpha \left(Q \right)$ , $\beta \left(Q \right)$ $\in[0, 1]$ and $Q\in[1,100]$ . They are utilized to control the actual cost values calculated from the side-information and will be discussed later.

Notably, in this section, we only design an new distortion function, which is considered to provide a better steganographic cost measurement. An actual steganographic method can be formed by combining the new distortion function and STC algorithm ^[4]. The corresponding details can be found in following section.

2.4. Data embedding and ensemble reconstruction

2.4.1. Data embedding

Following data decomposition mechanism and distortion measurement, we design a robust video steganographic scheme.

Algorithm 1: Data Embedding in Video Sequence

Input:Video frames

$v_1, v_2, \dots, v_S$ ,

${\bf{T}} = \left\{ {{{\bf{t}}_1}, {{\bf{t}}_2}, \cdots, {{\bf{t}}_n}} \right\}$ , modulation parameters

$\alpha (Q)$ and

$\beta (Q)$ , multi-ary parameter

$q$ .

Output: Stego video sequence

$V'$ .

1 for

$t \leftarrow$

$\rm{1}$

$\bf{to}$

$S-2$ do

23 end

24 Integrate all stego frames

$v'_1, v'_2, \dots, v'_S$ to build stego video sequence

$V'$ \;

Denote the given secret messages $M_o$ and video sequence as $V$ . Proposed scheme tries to spread $M_o$ into video frames^* and ensure that the recipient can get the complete messages even if the video sequence is damaged during delivery. Therefore, proposed scheme is believed to be able to resist the diverse video attacks, such as noise, frame cropping or removal. The specific embedding procedure can be implemented as follows.

^* We do not consider that how the sender informs the recipient of the length of each share, or how many shares correspond to a frame or a video sequence, because it could be solved by hiding the information in the frame or video header or by other secret channel.

$\bf{Step 1:}$ Decompose the video sequence $V$ to a batch of frames $v_1, v_2, \dots, v_S$ , which are ensured to have the (approximate) same scenes^†. We denote the quality factor of video frames as $Q$ , and then get two parameters $\alpha (Q)$ and $\beta (Q)$ by referring to the modulation parameters table, which will be discussed in the next section.

^† In fact, the scene cuts are very common in video and always produce the frames with diversity contents. However, we do not discuss this special case lonely, because when new scene is cut, video frames can be re-extracted from the new scene to ensure they have the (approximate) same contents.

$\bf{Step 2:}$ Convert the original data $M_o$ to binary stream, which is subsequently converted to a $K\times m$ $q$ -ary digital matrix ${\bf{D}} = \left\{ {{{\bf{d}}_1}, {{\bf{d}}_2}, \cdots, {{\bf{d}}_m}} \right\}$ .

$\bf{Step 3:}$ Build Vandermonde matrix ${\bf{A}}$ by $a_1, a_2, \cdots, a_n$ and calculate the expanded data by the following equation.

$\begin{eqnarray} \begin{aligned} {\bf{T}} = {\bf{D}} \times {\bf{A}} \quad \bmod {\rm{ }}q \end{aligned} \end{eqnarray}$

(2.11)

where ${\bf{T}} = \left\{ {{{\bf{t}}_1}, {{\bf{t}}_2}, \cdots, {{\bf{t}}_n}} \right\}$ is a $K\times n$ matrix, ${\bf{t}}_i$ = $(t_{1, i}, t_{2, i}, \cdots, t_{K, i})^{\mathrm{T}}$ is a data vector, $i\in [1, n]$ . Since the indices of Vandermonde matrix must be delivered with the expanded data, we denote $(t_{1, i}, t_{2, i}, \cdots, t_{K, i}, a_i)^{\mathrm{T}}$ as a complete share.

$\bf{Step 4:}$ With the expanded data ${\bf{T}}$ and the cover frames $v_1, v_2, \dots, v_S$ . We can embed these shares into each frame by combining costs $\rho_{ij}^{'}$ and STC algorithm. The detailed procedure can be found in Algorithm 1.

2.4.2. Data ensemble reconstruction

Once stego video is delivered through insecure network channel, it might face to diverse network attacks. According to proposed data reconstruction procedure, the recipient can reconstruct the original information by ensemble decision, even if partial stego frames are removed or intercepted during delivery. Assume that the remaining stego frames can extract $m'$ complete shares (maybe contain some modified digits), $n\ge m' \ge m$ . We select $m$ shares from the remaining stego frames, and then extract their data vectors ${\bf{t'}}_1$ , ${\bf{t'}}_2$ , ${\bf{t'}}_3$ , $\cdots$ , ${\bf{t'}}_{m}$ and the corresponding indices $a'_1$ , $a'_2$ , $\cdots$ , $a'_{m}$ . The original information can be recovered correctly by an ensemble reconstruction mechanism whose detailed procedure is shown in Algorithm $2$ .

We also provide an actual example to explain our ensemble mechanism. According to the example in Section 2.2 and results of data decomposition, the complete expanded data should be [2 6 0 2 6 0]. We assume two digits are modified in delivery, that is, the last digit is lost and the second digit is changed (assuming '6' to '4'). Thus, the digits that are received by recipient are [2 $\bf{4}$ 0 2 6]. Since the original data have three digits, we can randomly select three digits from [2 $\bf{4}$ 0 2 6] and repeat four times (corresponding to the parameter $en = 4$ in Algorithm $2$ ). Assume that these four selections are [0 2 6], [ $\bf{4}$ 2 6], [2 0 6] and [2 $\bf{4}$ 2], respectively, we then calculate their Vandermonde inverse matrix according the method in ^[11] and obtain four "suspicious" original data, [2 1 4], [2 0 1], [2 1 4] and [2 4 2]. Finally, the majority voting (corresponding to the MaxVoting function in Algorithm $2$ ) is used to give the final decision [2 1 4]. An actual example can be found in Figure 3. Although the recipient does not know which digits are modified in video transmission, the ensemble mechanism probably makes a correct decision by majority voting.

Figure 3. An actual example for ensemble reconstruction mechanism.

DownLoad: Full-Size Img PowerPoint

Algorithm 2: Data Ensemble Reconstruction From Attacked Video

Input:

$m'$ data vectors

${\bf{t'}}_1$ ,

${\bf{t'}}_2$ ,

${\bf{t'}}_3$ ,

$\cdots$ ,

${\bf{t'}}_{m'}$ , the indices

$a'_1$ ,

$a'_2$ ,

$\cdots$ ,

$a'_{m'}$ for

$m'$ data vectors, multi-ary parameter

$q$ , ensemble rounds

$en$ .

Output:

${\bf{D}} = \left\{ {{{\bf{d}}_1}, {{\bf{d}}_2}, \cdots, {{\bf{d}}_m}} \right\}$ .

1 for

$i \leftarrow$

$\rm{1}$

$\bf{to}$

$en$ do

6 end

${{\bf{D}}}$ = MaxVoting(

${{\bf{D}}_{1}}, {{\bf{D}}_{2}}, {{\bf{D}}_{3}}, \cdots, {{\bf{D}}_{en}}$ ).

2.5. Several important notes

We would like to raise the attention to readers that if too many shares are lost, for example, when $m' < m$ , proposed scheme is not able to recover the original data. Actually, the main principle has been explained by Equation (2.5).

In this algorithm, we use ensemble voting strategy (the function MaxVoting in Algorithm $2$ ) to decide the correct original data. Although the received shares are complete, they maybe contain some modified digits, e.g. digit $3$ may be changed to $7$ due to the noise interference. Thus, the calculated original data matrices ${{\bf{D}}_{1}}, {{\bf{D}}_{2}}, {{\bf{D}}_{3}}, \cdots, {{\bf{D}}_{en}}$ in Algorithm $2$ might be different. We can give the correct decision by counting the maximum same occurrences for ${{\bf{D}}_{1}}, {{\bf{D}}_{2}}, {{\bf{D}}_{3}}, \cdots, {{\bf{D}}_{en}}$ . Also, we do not set $m' = m$ in data reconstruction, because if $m' = m$ , ${{\bf{D}}_{1}}, {{\bf{D}}_{2}}, {{\bf{D}}_{3}}, \cdots, {{\bf{D}}_{en}}$ might be different each other, this result is invalid in the ensemble strategy.

3. Experimental results and analysis

3.1. Experimental setup

We carry out our experiments on a classical video database ^[22], e.g. , which contains $15$ test sequences with $4:2:0$ YUV format. These video sequences have the same resolution of $352\times 288$ , and belong to diverse categories, including people, architecture, landscape, flowers, and so on. The detailed descriptions are given in Table 1.

Figure 4. The classical video sequences in our experiments.

DownLoad: Full-Size Img PowerPoint

Table 1. Detailed descriptions of video sequences.

Sequence	Resolution	Number of frames
Bus	$352\times288$	$150$
City	$352\times288$	$300$
Coastguard	$352\times288$	$300$
Crew	$352\times288$	$300$
Flower	$352\times288$	$250$
Football	$352\times288$	$260$
Foreman	$352\times288$	$300$
Harbour	$352\times288$	$300$
Highway	$352\times288$	$2000$
Ice	$352\times288$	$240$
Mobile	$352\times288$	$300$
Paris	$352\times288$	$1065$
Soccer	$352\times288$	$300$
Tempete	$352\times288$	$260$
Waterfall	$352\times288$	$260$

| Show Table

DownLoad: CSV

In addition, to verify our proposed method, each video is separated to a number of image frames. Then, all frames are compressed with same quality factor to avoid the influence of different quantization matrices for steganalysis. Since only the luminance contains a lot of non-zero coefficients for video frames, we just hide the messages into Y component. On the other hand, for a given steganographic algorithm, all frames are embedded with random messages embedding and then create new stego videos. Moreover, in order to test the performance of proposed scheme, we select some experimental video to train the corresponding parameters, e.g. $\alpha(Q)$ and $\beta(Q)$ . The ensemble classifier is employed to show the comparable results.

As the main concern of steganography, embedding capacity and anti-steganalysis performance are two important focuses. In our experiments, we measure the embedding capacity by bit per frame (bpf for short), which is explained as follows.

$\begin{eqnarray} r = \rm{\frac{{The\:total\:number\:of\:embedding\:bits}}{{The\:number\:of\:frames}}} \end{eqnarray}$

(3.1)

Similarly, the anti-steganalysis performance is evaluated by minimum average classification probability error ( $P_E$ in short).

$\begin{eqnarray} {P_E} = \mathop {\min }\limits_{{P_{FA}}} \left( {{P_{FA}} + {P_{MD}}} \right)/2 \end{eqnarray}$

(3.2)

where $P_{FA}$ and $P_{MD}$ are the false-alarm and the missing detection rates of a detector, respectively.

3.2. Test optimal modulation parameters $\alpha(Q)$ and $\beta(Q)$

We design a series of experiments to test the contribution of two modulation parameters $\alpha(Q)$ and $\beta(Q)$ and give their optimal values, which are determined when the $P_E$ has the minimum.

Two video sequences, Highway and Paris, are used to perform this experiment because they contain more frames $(2000+1065 = 3065)$ . We save these frames with different quality factors and then divide them equally into training and testing sets. Gabor Filter Residual (GFR) feature set ^[7] and ensemble classifier ^[23] are used to provide the experimental results because they can effectively detect modern steganography, e.g. J-UNIWARD ^[3]. We determine the optimal modulation factors experimentally by getting the minimal $P_E$ . Six quality factors, $70$ , $75$ , $80$ , $85$ , $90$ , $95$ , are tested to obtain the optimal modulation parameters. In , we give the changing trend of two parameters under relative payloads $r = 0.2$ Kbpf. As can be seen in this figure, there is only a slight difference between $\alpha(Q)$ and $\beta(Q)$ . This is because the adjacent frames have the same scene (same content). When the continuous adjacent frames are used as the reference (pre-cover) of current frame, there might have a (approximate) same steganographic costs. In addition, we can see that the modulation value becomes significantly bigger with the quality factors increasing. In fact, this interesting phenomenon is mainly related to the calculation procedure of original distortion function.

Figure 5. The modulation parameters

$\alpha(Q)$ and

$\beta(Q)$ with JPEG quality factor

$Q$ increasing. The relative payload is

$r = 0.2$ Kbpf.

DownLoad: Full-Size Img PowerPoint

shows the optimal modulation parameters for different payloads by carrying out a series of experiments. It can be observed that the optimal modulation parameter values gradually decrease with the payload increasing. Actually, since the new distortion depends on the original distortion (e.g. $\rho _{ij}\left({ \pm 1} \right)$ in Section 2.3), with respect to embedding in single frame, the new distortion function referring to the continuous adjacent frames will significantly increase empirical security, especially for the large payloads and small quality factors.

Table 2. Optimal modulation parameter combinations

$(\alpha(Q), \beta(Q))$ for different relative payloads (bpf).

Payload $r$	Quality factor $Q$
Payload $r$	$70$	$75$	$80$	$85$	$90$	$95$
$0.1$ K	$(0.087, 0.086)$	$(0.091, 0.092)$	$(0.192, 0.201)$	$(0.284, 0.291)$	$(0.479, 0.464)$	$(0.654, 0.662)$
$0.2$ K	$(0.095, 0.093)$	$(0.114, 0.107)$	$(0.190, 0.181)$	$(0.286, 0.274)$	$(0.457, 0.461)$	$(0.623, 0.619)$
$0.3$ K	$(0.094, 0.099)$	$(0.104, 0.117)$	$(0.167, 0.163)$	$(0.243, 0.252)$	$(0.437, 0.431)$	$(0.581, 0.597)$
$0.4$ K	$(0.085, 0.083)$	$(0.090, 0.103)$	$(0.150, 0.144)$	$(0.248, 0.224)$	$(0.407, 0.389)$	$(0.563, 0.542)$
$0.5$ K	$(0.076, 0.079)$	$(0.084, 0.091)$	$(0.137, 0.129)$	$(0.226, 0.224)$	$(0.375, 0.370)$	$(0.503, 0.489)$

| Show Table

DownLoad: CSV

3.3. Robustness analysis for proposed scheme

In our proposed scheme, we introduce Vandermonde matrix to divide the original data into multiple shares (e.g. Figure 2), and then hide these shares in a series of video frames by combining new distortion function and STC algorithm. In this section, we analyze the robustness of proposed scheme.

With the Equation (2.4) and , we know that $m$ $q$ -ary digits can be expanded to $n$ $q$ -ary digits by Vandermonde matrix. Obviously, $n$ $q$ -ary digits carry $m$ original digits. In other words, for $n$ expanded digits, each of them only carries $\frac{{{m}}}{n}$ valid original digits. Therefore, there is some redundancy in $n$ expanded digits, which can be calculated by $\frac{{{n-m}}}{n}$ (referring to Equation (2.5)). As such, as long as the recipient can receive no less than $m$ digits from $n$ expanded digits, he can recover the $m$ original digits. Actually, this procedure can be deduced easily from the . For example, we assume that the recipient has received $m'$ shares, $m\le m' \le n$ . He selects $m$ ones from $m'$ shares, and then extract $m$ expanded data vectors ${\bf{t'}}_1$ , ${\bf{t'}}_2$ , ${\bf{t'}}_3$ , $\cdots$ , ${\bf{t'}}_{m}$ and their corresponding indices $a'_1$ , $a'_2$ , $\cdots$ , $a'_m$ . According to Equation (2.3), a Vandermonde matrix $\bf{A'}$ can be built according to $m$ indices.

$\begin{eqnarray} \begin{aligned} {\bf{A'}} = \left[ {\begin{array}{*{20}{l}} 1&1& \cdots &1\\ {{a'_1}}&{{a'_2}}& \cdots &{{a'_m}}\\ {(a'_1)^2}&{(a'_2)^2}& \cdots &{(a'_m)^2}\\ \vdots & \vdots & \cdots & \vdots \\ {(a'_1)^{m - 1}}&{(a'_2)^{m - 1}}& \cdots &{(a'_m)^{m - 1}} \end{array}} \right]\bmod {\rm{ }}q. \end{aligned} \end{eqnarray}$

(3.3)

We can prove that matrix $\bf{A'}$ has an inverse matrix in $q$ -ary notational system because it is a full-rank matrix. We omit the actual proof ^[11] here due to space limitations. Denote $\bf{A'}^{-1}$ as the inverse matrix of $\bf{A'}$ , the original data ${\bf{D}} = \left\{ {{{\bf{d}}_1}, {{\bf{d}}_2}, \cdots, {{\bf{d}}_m}} \right\}$ can be calculated easily by the following equation.

$\begin{eqnarray} \begin{aligned} \begin{array}{l} \begin{array}{l} {\bf{D}} \quad = \left[ {\begin{array}{*{20}{l}} {{\bf{d}}_1}&{{\bf{d}}_2}& {{\bf{d}}_3}& \cdots &{{\bf{d}}_m} \end{array}} \right] = \left[ {\begin{array}{*{20}{l}} {\bf{t'}}_1 & {\bf{t'}}_2& {\bf{t'}}_3& \cdots& {\bf{t'}}_{m} \end{array}} \right] \cdot {\left( {{\bf{A'}}} \right)^{ - 1}} \end{array} \end{array} \end{aligned} \end{eqnarray}$

(3.4)

In general, the redundancy rate $R_e$ depends on the parameters $m$ and $n$ . It can be up very high if we set an extreme gap between $m$ and $n$ . shows the relationship between redundancy rate $R_e$ and parameters $m$ and $n$ . As can been in this table, when the parameter $m$ is fixed, $R_e$ will become higher with $n$ increasing. This conclusion can be also validated theoretically by the trend of lines in Figure 6.

Table 3. Different parameter combinations (

$m, n$ ) and different redundancy rate

$R_e$ .

$m$	$n$	$q$	$R_e$
$5$	$9$	$11$	$44.44\%$
$5$	$13$	$17$	$61.54\%$
$5$	$29$	$31$	$82.76\%$
$5$	$71$	$73$	$92.96\%$
$5$	$251$	$251$	$98.01\%$

| Show Table

DownLoad: CSV

Figure 6. The relationship between redundancy rate

$R_e$ and two parameters

$m$ and

$n$ . Six different values,

$m = 4$ ,

$m = 15$ ,

$m = 31$ ,

$m = 49$ ,

$m = 101$ ,

$m = 171$ , are tested and satisfy the condition

$m\le n \le q$ .

DownLoad: Full-Size Img PowerPoint

3.4. Comparison with the state of the arts

In this section, we compare the proposed video steganographic scheme with other state-of-the-arts. The performance comparison mainly focuses on three aspects: visual quality, robustness, and anti-steganalysis. We should raise the readers' attention that we do not use the video sequences Highway and Paris in the following experiments, because they are used in Section 3.2 to find the optimal modulation parameters. If these two video sequences are re-used, the corresponding experiments might encounter the over-fitting.

3.4.1. Visual quality comparison

The imperceptibility is very important for video steganography. It is always required that the steganographic method should not cause severe visual quality degradation.

In our experiment, we use J-UNIWARD algorithm to calculate original distortion and then employ proposed new distortion function to further improve the original distortion. Peak signal-to-noise ratio (PSNR for short) is used to evaluate the visual quality of stego video sequences. Since the video frames are compressed format, we calculate the PSNR (dB) by comparing the uncompressed video sequence before data embedding and the decompression reconstructed video sequence after data embedding. We test all video sequences with three payloads $0.1$ Kbpf, $0.2$ Kbpf, and $0.3$ Kbpf. Figure 7 shows the visual quality of proposed scheme with three payloads. As can be seen in this figure, the stego frames and original frames are apparently difficult to distinguish. This demonstrates that proposed steganographic scheme has a high visual quality. Additionally, Table 4 lists the PSNR values of all video sequences by comparing three steganographic methods, Chang's method ^[14], Liu's method ^[15], and Mstafa's method ^[16]. The payload is fixed to $0.1$ K bpf. We can get the conclusion from this table that proposed scheme has a slight visual quality degradation after data hiding, but, comparison with other methods, it still has a significant superior performance.

Figure 7. Visual quality of original and stego frames for Flower (frame 48), Foreman (frame 30) and Mobile (frame 63). The relative payloads are 0.1 Kbpf, 0.2 Kbpf, and 0.3 Kbpf.

DownLoad: Full-Size Img PowerPoint

Table 4. PSNR(dB) comparison by using three steganographic methods with payload

$r = 0.1$ K bpf. All testing video sequences are used in this experiment to give the average results.

Sequence	Cover	Chang et al. [14]	Liu et al.[15]	Mstafa et al.[16]	Proposed Scheme}
Bus	$33.378$	$32.963$	$33.084$	$33.190$	$33.212$
City	$34.898$	$34.657$	$34.775$	$34.782$	$34.804$
Coastguard	$34.562$	$34.157$	$34.321$	$34.379$	$34.483$
Crew	$37.012$	$36.792$	$36.801$	$36.887$	$36.901$
Flower	$34.501$	$34.125$	$34.301$	$34.374$	$34.423$
Football	$35.928$	$35.645$	$35.709$	$35.756$	$35.811$
Foreman	$36.068$	$35.798$	$35.887$	$35.892$	$35.975$
Harbour	$34.040$	$33.724$	$33.882$	$33.910$	$33.922$
Ice	$39.556$	$39.084$	$39.163$	$39.192$	$39.221$
Mobile	$33.476$	$33.003$	$33.098$	$33.104$	$33.192$
Soccer	$35.518$	$35.241$	$35.287$	$35.296$	$35.332$
Tempete	$34.531$	$34.302$	$34.339$	$34.387$	$34.401$
Waterfall	$34.668$	$34.012$	$34.206$	$34.213$	$34.279$

| Show Table

DownLoad: CSV

3.4.2. Robustness testing

We further test the robustness of proposed scheme with a series of experiments. The corresponding experiments can be performed by three attack forms: usual attack, frame attack and video compression attack.

We firstly test the robustness of proposed scheme for usual attack forms. Three usual attack forms, including Salt & Peppers noise, Gaussian noise and Median filtering, are used to provide testing results. In these experiments, we set data decomposition parameter combination as ( $m = 5$ , $n = 13$ , $q = 17$ ), the redundancy rate is thus $R_e = 61.54\%$ . In other words, as long as the expanded data is lost (or damaged) no more than $61.54\%$ , the original data can be recovered perfectly. In data embedding procedure, we evenly hide the expanded data (multiple shares) in each frame with payload $0.2K$ bpf. For Salt & Peppers noise attack, the noise intention is fixed $I = 0.01$ and $0.05$ , and for Gaussian noise attack, we also fix the parameters $V = 0.01$ and $0.05$ , while for Median filtering, the size of block is set to $3\times 3$ . We verify proposed scheme by using and un-using ensemble mechanism, where the ensemble rounds are $11$ (corresponding to $en = 11$ in Algorithm $2$ ). Each experiment is run $100$ times. Each time, we randomly extract data from the remaining (complete) frames. The results are from average calculation as the times that the original data can be recovered correctly over the total testing times. shows the experimental results. It can be observed that, for three usual attack forms, when the ensemble mechanism is not used, the ratio that original data can be recovered perfectly is rather low, only $67\%$ for Salt & Peppers noise with $I = 0.01$ , $63\%$ for Gaussian noise with $V = 0.01$ and $79\%$ for Median filtering. This is because the noise might modify the embedding information so that the recovered original data is wrong even if they seem to be complete.

Table 5. Robustness testing for three usual attack forms, Salt Peppers noise, Gaussian noise and Median filtering. We test proposed scheme with ensemble mechanism (Ensemble) and without ensemble mechanism (Non-Ensemble), respectively. All testing video sequences are used in this experiment to give the average results.

Attacking Forms	Attacking Parameters	Non-Ensemble	Ensemble
Salt & Peppers	$I = 0.01$	$67\%$	$100\%$
Salt & Peppers	$I = 0.05$	$46\%$	$94\%$
Gaussian noise	$V = 0.01$	$63\%$	$98\%$
Gaussian noise	$V = 0.05$	$42\%$	$92\%$
Median filtering	$3\times 3$ block	$79\%$	$100\%$

| Show Table

DownLoad: CSV

In addition, we also test two video frame attack forms, frame cropping and frame removal. For frame cropping, we set two cases, only one frame cropping and all frames cropping. The cropping scales are fixed $20\%$ and $40\%$ . For frame removal, five removal ratios are tested for all frames, $20\%$ , $40\%$ , $50\%$ , $60\%$ and $70\%$ . The corresponding experimental results are shown in . As can be seen from this table, when the cropping for all frames is more than $40\%$ , the data reconstruction ability for non-ensemble scheme becomes inferior. When the removal ratio is up to $70\%$ , the original data cannot be recovered because the lost ratio has exceeded the redundancy rate $R_e = 61.54\%$ . Overall, proposed scheme can implement a robust recovery for the original data even if they are lost or damaged during delivery.

Table 6. Robustness testing for two video frame attack forms, frame cropping and frame removal. We test proposed scheme with ensemble mechanism (Ensemble) and without ensemble mechanism (Non-Ensemble), respectively. All testing video sequences are used in this experiment to give the average results.

Attacking Forms	Attacking Parameters	Non-Ensemble	Ensemble
Frame Cropping	$20\%$ (One frame)	$100\%$	$100\%$
	$40\%$ (One frame)	$100\%$	$100\%$
	$20\%$ (All frames)	$82\%$	$98\%$
	$40\%$ (All frames)	$53\%$	$90\%$
Frame Removal	$20\%$ frames	$100\%$	$100\%$
	$40\%$ frames	$100\%$	$100\%$
	$50\%$ frames	$68\%$	$100\%$
	$60\%$ frames	$15\%$	$62\%$
	$70\%$ frames	$0\%$	$0\%$

| Show Table

DownLoad: CSV

Moreover, in order to gain more insight, we also test the proposed scheme by using the H.264 video compression with different quantization parameters ( $QP$ for short). In this attack from, compression is applied to every macroblock of video frame and $QP$ is used to control the level of compression. presents the experimental results. We can observe from this table that the data reconstruction ability significantly becomes inferior with $QP$ value increasing, because lower $QP$ value maybe cause an inferior video quality, leading to a larger data modification. Also, for compression testing with different $QP$ values, $QP = 10, 20, 30, 40$ , we further test PSNR for three video steganographic schemes, Chang's scheme ^[14], Dalal's scheme ^[24] and proposed scheme. Three video sequences, Bus, Flower and Foreman, are used to give the experimental results, which are shown in . As can be seen that increasing of $QP$ value results in more compression and leads to a lower video quality.

Table 7. Robustness testing for H.264 video compression attack with different quantization parameter (QP). We test proposed scheme with ensemble mechanism (Ensemble) and without ensemble mechanism (Non-Ensemble), respectively.

Attacking Forms	Attacking Parameters	Non-Ensemble	Ensemble
H.264 Compression	$QP = 40$	$31\%$	$78\%$
	$QP = 30$	$44\%$	$84\%$
	$QP = 20$	$52\%$	$95\%$
	$QP = 10$	$73\%$	$100\%$

| Show Table

DownLoad: CSV

Table 8. When H.264 compression attacks with different quantization parameter (

$QP$ ) are used, PSNR(dB) comparison for three video steganographic schemes, Chang's scheme ^[14], Dalal's scheme ^[24] and proposed scheme. Three video sequences, Bus, Flower and Foreman, are used in this experiment.

Schemes	Video	Quantization parameter $QP$
Schemes	Video	$QP = 10$	$QP = 20$	$QP = 30$	$QP = 40$
Chang et al. [14]	Bus	$33.544$	$33.012$	$32.910$	$32.874$
	Flower	$34.890$	$34.576$	$34.250$	$34.011$
	Foreman	$35.169$	$35.083$	$34.980$	$34.911$
Dalal et al. [24]	Bus	$33.788$	$33.654$	$33.540$	$33.217$
	Flower	$35.014$	$34.983$	$34.756$	$34.669$
	Foreman	$36.701$	$36.542$	$36.102$	$35.818$
Proposed	Bus	$34.095$	$34.013$	$33.953$	$33.881$
	Flower	$35.870$	$35.704$	$35.556$	$35.231$
	Foreman	$37.017$	$36.809$	$36.544$	$36.223$

| Show Table

DownLoad: CSV

3.4.3. Anti-steganalysis testing

As a good video steganographic method, the anti-steganalysis capability is also an important focus. Since video steganographic schemes consistently hide the messages in a batch of video frame, we use three batch steganographic detection schemes (also named steganographer detection methods), hierarchical clustering scheme (HC) ^[25], local outlier factor detection (LOF) ^[26], and ensemble clustering scheme (EC) ^[27], to demonstrate the security performance of different video steganographic schemes against steganalysis.

shows the comparison results between proposed scheme and three state-of-the-art schemes with three payloads $r$ = $0.01$ K, $0.02$ K, $0.03$ K bpf, respectively. In this experiment, we regard each video sequence as a cluster (total $13$ clusters). Each experiment, we randomly choose $7$ clusters (video sequences), and select $50$ frames from each one. Then, one cluster is randomly chosen as the guilty who uses three payloads mentioned above to hide messages, respectively. Each experiment is repeated $100$ times and the overall identification accuracy rate is used to evaluate anti-steganalysis that is denoted by the number of correctly identification over the total testing number. From Table 9, we can observe that proposed scheme has a lower accuracy rate than that of other schemes. This illustrates that proposed scheme is more secure. In addition, we find that the EC method is conclusively more efficient comparing with HC and LOF methods. This is because EC method uses the ensemble clustering mechanism containing a number of sub-clustering, it can experimentally give a superior detection performance. Actually, this interesting phenomenon has been conclusively verified in ^[27].

Table 9. Overall identification accuracy rate for four video steganographic schemes. Three steganalysis methods, HC, LOF and EC, are used.

Schemes	Payload $r$	HC method	LOF method	EC method
Chang et al. [14]	$0.1$ K	$82\%$	$88\%$	$90\%$
	$0.2$ K	$84\%$	$89\%$	$92\%$
	$0.3$ K	$88\%$	$94\%$	$97\%$
Liu et al. [15]	$0.1$ K	$79\%$	$82\%$	$87\%$
	$0.2$ K	$82\%$	$90\%$	$91\%$
	$0.3$ K	$86\%$	$90\%$	$93\%$
Mstafa et al. [16]	$0.1$ K	$77\%$	$80\%$	$84\%$
	$0.2$ K	$80\%$	$85\%$	$90\%$
	$0.3$ K	$85\%$	$86\%$	$89\%$
Proposed	$0.1$ K	$73\%$	$78\%$	$79\%$
	$0.2$ K	$78\%$	$82\%$	$86\%$
	$0.3$ K	$81\%$	$84\%$	$88\%$

| Show Table

DownLoad: CSV

4. Conclusions and future works

In this paper, we proposed a robust video steganographic scheme. We first expand original data to multiple shares. This mechanism can ensure that the recipient recover the original data successfully even if they only obtain a part of data. Then, a new distortion function is designed by using continuous adjacent video frames as side-information, which can further improve the security performance of steganography. Proposed scheme is robust in the sense that the recipient can recover the hidden data even if some frames are damaged or lost during delivery. Extensive experiments are performed to show that our proposed schemes outperform existing video steganographic schemes in terms of visual quality, robustness and anti-steganalysis.

While proposed scheme has shown a good performance in the diverse tests, we should note that it has an obvious short on the utilization of video sequences because proposed scheme consistently expands the original data to multiple shares, this makes that the actual hidden data become more and more, leading to a significant cover-consuming. Nevertheless, we believe that this is only a small problem because we can easily obtain massive video sequences from the Internet. Moreover, although our method is robust in social networks, if too many shares are damaged or lost, proposed scheme will do not work.

Finally, we believe that there may be some room for further improvement. For example, the distortion function can be designed by involving more adjacent frames, although the complexity may rise sharply. In addition, we should consider to further reduce the computational complexity for inverse matrix in $q$ -ary notation system. The above two issues are left as our future works.

Acknowledgments

This work was supported by Natural Science Foundation of China under Grants (No.61602295, No.U1736120) and Natural Science Foundation of Shanghai (No.16ZR1413100, No.18ZR1427500) and the Foreign Visiting Scholar Program of Shanghai Municipal Education Commission.

Conflict of interest

The authors declare that there is no conflict of interests regarding the publication of this article.

References

[1]	D. He, K. Xu, P. Zhou, D. Zhou, Surface defect classification of steels with a new semi-supervised learning method, Opt. Lasers Eng., 117 (2019), 40-48.
[2]	R. Gong, M. Chu, Y. Yang, Y. Feng, A multi-class classifier based on support vector hyper-spheres for steel plate surface defects, Chemom. Intell. Lab. Syst., 188 (2019), 70-78.
[3]	B. Wu, J. Zhou, X. Ji, Y. Yin, X. Shen, Research on Approaches for Computer Aided Detection of Casting Defects in X-ray Images with Feature Engineering and Machine Learning, Procedia Manuf., 37 (2019), 394-401.
[4]	X. Wen, K. Song, M. Niu, Z. Dong, Y. Yan, A three-dimensional inspection system for high temperature steel product surface sample height using stereo vision and blue encoded patterns, Optik, 130 (2017), 131-148
[5]	M. Kuffer, K. Pfeffer, R. Sliuzas, I. Baud, Extraction of slum areas from VHR imagery using GLCM variance, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 9 (2016), 1830-1840.
[6]	Y. Liu, S. Liu, Z. Wang, Multi-focus image fusion with dense SIFT, Inf. Fusion, 23 (2015), 139-155.
[7]	A. Zendehboudi, M. A. Baseer, R. Saidur, Application of support vector machine models for forecasting solar and wind energy resources: A review, J. Cleaner Prod., 199 (2018), 272-285
[8]	I. Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 workshop on empirical methods in artificial intelligence, 2001.
[9]	S. Zhang, X. Li, M. Zong, X. Zhu, R. Wang, Efficient knn classification with different numbers of nearest neighbors, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 1774-1785.
[10]	G. Biau, E. Scornet, A random forest guided tour, Test, 25 (2016), 197-227.
[11]	C. He, H. Kang, T. Yao, X. Li, An effective classifier based on convolutional neural network and regularized extreme learning machine, Math. Biosci. Eng., 16 (2019), 8309-8321.
[12]	L. Wen, Y. Dong, L. Gao, A new ensemble residual convolutional neural network for remaining useful life estimation, Math. Biosci. Eng., 16 (2019), 862-880.
[13]	Z. Ning, Y. Feng, M. Collotta, X. Kong, X. Wang, L. Guo, et al., Deep Learning in Edge of Vehicles: Exploring Tri-relationship for Data Transmission, IEEE Trans. Ind. Inf., 15 (2019), 5737-5746.
[14]	F. Chen, M. R. Jahanshahi, NB-CNN: Deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion, IEEE Trans. Ind. Electron., 65 (2018), 4392-4400.
[15]	Y. He, K. Song, Q. Meng, Y. Yan, An end-to-end steel surface defect detection approach via fusing multiple hierarchical features, IEEE Trans. Instrum. Meas., 69 (2020), 1493-1504.
[16]	Y. Wang, H. Xia, X. Yuan, L. Li, B. Sun, Distributed defect recognition on steel surfaces using an improved random forest algorithm with optimal multi-feature-set fusion, Multimedia Tools Appl., 77 (2018), 16741-16770.
[17]	S. Gupta, S. G. Mazumdar, Sobel edge detection algorithm, Int. J. Comput. Sci. Manage. Res., 2 (2013), 1578-1583.
[18]	W. Zheng, K. Liu, Research on Edge Detection Algorithm in Digital Image Processing, 2017 2nd International Conference on Materials Science, Machinery and Energy Engineering, 2017.
[19]	D. Adlakha, D. Adlakha, R. Tanwar, Analytical comparison between Sobel and Prewitt edge detection techniques, Int. J. Sci. Eng. Res., 7 (2016), 1482-1485.
[20]	P. Vit, Comparison of various edge detection technique, Int. J. Signal Process. Image Process. Pattern Recognit., 9 (2016), 143-158.
[21]	T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: Application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 28 (2006), 2037-2041.
[22]	K. Song, Y. Yan, A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects, Appl. Surf. Sci., 285 (2013), 858-864.
[23]	K. Li, X. Wang, L. Ji, Application of Multi-Scale Feature Fusion and Deep Learning in Detection of Steel Strip Surface Defect, International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), 2019.
[24]	A. El-Sawy, H. EL-Bakry, M. Loey, CNN for handwritten arabic digits recognition based on LeNet-5, International conference on advanced intelligent systems and informatics (AISI), 2016.
[25]	A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems (NIPS), 2012.
[26]	A. Vedaldi, A. Zisserman, Vgg convolutional neural networks practical, Dep. Eng. Sci. Univ. Oxford, 2016 (2016), 66.
[27]	C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2015.
[28]	K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.
[29]	R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks via gradient-based localization, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

This article has been cited by:

Mukesh Dalal, Mamta Juneja, A secure and robust video steganography scheme for covert communication in H.264/AVC, 2021, 1380-7501, 10.1007/s11042-020-10364-z

Reader Comments

Your name:*

Email:*
© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)