To improve the accuracy of detecting hidden messages in images, steganalysis features are selected as inputs to steganalysers. However, the existing Fisher criterion ignores the contribution that the dispersion of a steganalysis feature component makes to classification, so useful feature components are deleted and the detection accuracy of the selected features decreases. By analyzing the separability of steganalysis feature components, we introduce the sigmoid function into the Fisher criterion and propose an improved Fisher criterion (I-Fisher criterion), which compensates for the shortcomings of the traditional Fisher criterion in measuring the separability of steganalysis feature components. To optimize the steganalysis feature and reduce its dimension, we employ the improved Fisher criterion as the heuristic function of the decision rough set α-positive region reduction and propose a feature selection method based on the improved Fisher criterion. Experimental results show that the proposed method can reduce the dimension and memory cost of the high-dimensional GFR feature and the low-dimensional CC-PEV feature while maintaining or improving the detection accuracy.
1. Introduction
Steganalysis aims to detect hidden information in images, audio, video, text, 3D objects and other multimedia covers [1,2]. It is designed to counteract steganography, a technology that hides messages in multimedia covers [3,4,5,6]. After more than 20 years of development, research on steganalysis has made significant progress [7,8,9,10,11]. The key point of steganalysis is to identify statistical differences between features extracted from cover signals and stego signals. However, traditional steganalysis methods no longer work when the message is hidden by adaptive image steganography methods, such as those in [12]. Existing steganalysis methods require ever larger feature spaces and enormous memory and computation power, which limits the practical application of steganalysis. Both traditional and current steganalysis features need to be reduced, so how to reduce the dimension of steganalysis features has become an urgent problem.
Recently, researchers have proposed several steganalysis feature reduction methods. Typical methods are based on the Genetic or Particle Swarm Optimization algorithm combined with the integrated classifier [13,14]. These methods can reduce the steganalysis feature dimension, but their time complexity is high. Other feature selection methods use mutual information to select complementary features [15,16]. These methods can reduce the dimensionality of the steganalysis features, but they also reduce the detection accuracy for stego images. Principal Component Analysis (PCA) is also used by several methods [17,18] to reduce the feature dimension while aiming to maintain the accuracy of the steganalyser. These methods can reduce the feature dimension and maintain the detection accuracy of linear features on stego images. However, a large number of experimental results show that PCA-based reduction can lower the detection accuracy of nonlinear steganalysis features. In previous work, we conducted a series of related studies on feature selection for steganalysis [19,20]. A feature selection method for image steganalysis based on decision rough set α-positive region reduction was proposed to reduce the feature dimension [21]. This method can significantly reduce the feature dimension while maintaining or improving the detection accuracy. Nevertheless, the detection accuracy obtained with the feature set selected by this method can still be improved. The feature selection method based on the Fisher criterion can not only reduce the feature dimension efficiently while maintaining detection accuracy, but also greatly improve detection efficiency [22]. However, the resulting feature dimensions are still high.
To address the problems above, this paper analyzes the principle of steganalysis feature separability measurement and improves the Fisher criterion, yielding the I-Fisher criterion. The I-Fisher criterion is then applied to the decision rough set α-positive region reduction to define a steganalysis feature selection method, which is expected to further improve the detection accuracy while reducing the feature dimension.
The rest of this manuscript is organized as follows. Section 2 briefly introduces the Fisher criterion. Section 3 describes the improved Fisher criterion. Section 4 describes the proposed feature selection method based on the improved Fisher criterion. Section 5 gives the experimental results. Finally, Section 6 summarizes the paper and outlines future work.
2. Related work
The Fisher criterion measures how much a single feature component contributes to separating two classes. In steganalysis feature selection, it is usually used to measure the separability of a feature component, i.e. its contribution to distinguishing stego images from cover images, and the resulting scores serve as the basis for selection. Feature selection based on the Fisher criterion can therefore reduce the feature dimension.
Denote the cover image class as $X_C$ and the stego image class as $X_S$. The feature spaces $F_C$ and $F_S$ are extracted from $X_C$ and $X_S$, respectively, and $N$ is the number of feature components in $F_C$ and $F_S$. Then, the Fisher criterion is defined as:
$$F_{score}(f_i) = \frac{[m_C(f_i) - m_S(f_i)]^2}{d_C^2(f_i) + d_S^2(f_i)}, \quad 1 \le i \le N$$
where $m_C(f_i)$ and $m_S(f_i)$ denote the means of the $i$th feature component in the cover image class and the stego image class, respectively, and $d_C(f_i)$ and $d_S(f_i)$ denote the corresponding standard deviations.
The larger $F_{score}(f_i)$ is, the better the separability of the feature component and the greater its contribution to detecting stego images. For this reason, the $F_{score}$ values can provide a basis for reduction in feature selection. Feature selection based on the Fisher criterion has achieved good results in steganalysis [22].
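As a minimal sketch of this measurement, the following Python code computes a Fisher score for each feature component as the squared difference of the class means divided by the sum of the class variances, which matches the inter-class and inner-class terms discussed in Section 3.2. The function names, the toy data, the use of the population standard deviation, and the handling of a zero denominator are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def fisher_score(cover_values, stego_values):
    """Fisher score of one feature component from its cover and stego samples."""
    m_c, m_s = mean(cover_values), mean(stego_values)
    d_c, d_s = pstdev(cover_values), pstdev(stego_values)
    denom = d_c ** 2 + d_s ** 2
    if denom == 0:
        # Both classes are constant: a convention of this sketch, not of the paper.
        return float("inf") if m_c != m_s else 0.0
    return (m_c - m_s) ** 2 / denom


def fisher_scores(cover_matrix, stego_matrix):
    """Score every column (feature component) of two row-major sample matrices."""
    cover_cols = list(zip(*cover_matrix))
    stego_cols = list(zip(*stego_matrix))
    return [fisher_score(c, s) for c, s in zip(cover_cols, stego_cols)]


# Toy data: 4 cover and 4 stego samples, 2 feature components each.
cover = [[0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0]]
stego = [[4.0, 1.0], [5.0, 2.0], [4.0, 1.0], [5.0, 2.0]]
scores = fisher_scores(cover, stego)
print(scores)
```

In this toy example the first component has well-separated class means and receives a high score, while the second component has identical statistics in both classes and receives a score of zero.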
3. Separability measurement based on improved Fisher criterion
In this section, we first analyze the principle of measuring the separability of steganalysis feature components. Based on this analysis, we propose an improved Fisher criterion to measure the separability of a feature component, and then give the corresponding measurement algorithm.
3.1. Analysis of principle
In pattern recognition, the inner-inter class distance is the most intuitive basis for feature selection. It is assumed that if the distance between two classes (inter-class distance for short) is larger and the distance within a class (inner-class distance for short) is smaller, the two classes can be separated more accurately. Similarly, the greater the inter-class distance between the cover image class and the stego image class, and the smaller the inner-class distance within each class, the better the two image classes can be separated in the feature space. In addition, the dispersion degree is also a concept of distance, which can be used to measure how inconsistently the samples of the two classes are distributed in the feature space. A larger dispersion degree can also be a basis for separating the two classes of samples in the feature space: the larger the dispersion between the cover image class and the stego image class, the more inconsistent the distributions of the two classes, and the better the separability. The relationships between separability and the eight different combinations of these distances are shown in Table 1.
To illustrate this further, a schematic diagram is shown in Figure 1.
In this figure, the red dot is the center of the cover image class, and the green dot is the center of the stego image class. The straight line between the red dot and the green dot indicates the inter-class distance between the cover image class and the stego image class. A blue triangle and a yellow square represent a cover image and a stego image, respectively. From Figure 1, it is easy to see that the cover images and the stego images are not easily separated when the inter-class distance and the dispersion are both small, as in Figure 1(f) and (h). Given a large inter-class distance, the smaller the inner-class distance, the more easily the cover images and the stego images can be separated, as in Figure 1(c) and (d). Overall, feature components with "large inter-class distance and small inner-class distance" or "large dispersion" can better separate the cover image class from the stego image class. Therefore, to measure separability more comprehensively and accurately, we need to construct a criterion that captures both the "inner-inter class distance" and the "dispersion" of steganalysis feature components. However, the inter-class distance of a steganalysis feature component between the cover image class and the stego image class plays a more important role than dispersion in distinguishing the two classes. As shown in Figure 1(a) and (c), as long as the inter-class distance of the feature component is sufficiently large, the cover image class and the stego image class can be separated by that component.
That is to say, when the inter-class distance of the steganalysis feature component between the cover image class and the stego image class is large enough, the influence of the dispersion of the feature component on image classification is not obvious. All of this indicates that, when measuring the separability of a feature component, we should focus on the principle of "inner-inter class distance" while also taking into account the principle of "dispersion".
3.2. Improved Fisher criterion
The Fisher criterion can measure the separability of a steganalysis feature component, but it still has some shortcomings. This section improves the Fisher criterion so that it measures the separability of steganalysis feature components more comprehensively and provides a more accurate basis for selection. In the following, we present an algorithm to calculate the separability of a steganalysis feature component based on the I-Fisher criterion.
In the Fisher criterion, $[m_C(f_i) - m_S(f_i)]^2$ represents the inter-class distance of the feature component $f_i$ between the cover image class and the stego image class, and $d_C^2(f_i) + d_S^2(f_i)$ represents the inner-class distance of $f_i$ in the cover image class and the stego image class. According to the "inner-class aggregation, inter-class dispersion" principle of pattern recognition, a steganalysis feature component with a large inter-class distance contributes more to distinguishing the cover image class from the stego image class. Therefore, the Fisher criterion is used to measure the separability of the feature component between the two classes. The larger the $F_{score}(f_i)$ value, the better the separability of the feature component.
However, there is a problem when the Fisher criterion is used to measure the separability of a feature component. When $m_C(f_i) = m_S(f_i)$, $F_{score}(f_i) = 0$. This means that the measured separability of a feature component is zero whenever the distance between the cover image class and the stego image class is zero. However, this is not the case. According to the principle of pattern recognition, when the difference in inner-class aggregation between the cover image class and the stego image class (i.e. the dispersion) is significant, the feature component also contributes to distinguishing the stego image class from the cover image class.
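The following Python sketch illustrates this shortcoming on made-up data. Both classes share the same mean but differ strongly in spread, so a Fisher score computed as the squared mean difference over the sum of variances is zero, even though a simple threshold on the magnitude of the component separates the two classes. The sample values and the threshold of 1.0 are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, pstdev


def fisher_score(cover_values, stego_values):
    """Squared difference of class means over the sum of class variances."""
    m_c, m_s = mean(cover_values), mean(stego_values)
    d_c, d_s = pstdev(cover_values), pstdev(stego_values)
    return (m_c - m_s) ** 2 / (d_c ** 2 + d_s ** 2)


# Hypothetical feature component: equal means, very different dispersion.
cover = [-0.1, 0.1, -0.2, 0.2, -0.1, 0.1]
stego = [-3.0, 3.0, -4.0, 4.0, -3.5, 3.5]

score = fisher_score(cover, stego)

# A magnitude threshold (an assumption of this sketch) still separates the classes.
predicted_stego = [abs(x) > 1.0 for x in cover + stego]
truth = [False] * len(cover) + [True] * len(stego)
accuracy = sum(p == t for p, t in zip(predicted_stego, truth)) / len(truth)

print(score, accuracy)
```

The score says the component is useless, yet the component classifies every toy sample correctly, which is the kind of case the dispersion term is meant to capture.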
As discussed in the previous subsection, when measuring the separability of a feature component, we should focus on the principle of "inner-class aggregation, inter-class dispersion" while also taking into account the principle of "dispersion".
In general, the dispersion of a feature component can be represented by the difference between the standard deviations of the two classes. In addition, the ratios $d_C(f_i)/d_S(f_i)$ and $d_S(f_i)/d_C(f_i)$ can represent the dispersion of the feature component between the cover image class and the stego image class, so we consider a new equation for a dispersion quantity $g(f_i)$:
To prevent the value of $g(f_i)$ from being too large, we use the sigmoid function to bound it. We define the measurement equation of dispersion as
$$E(f_i) = \frac{1}{1 + e^{-g(f_i)}}$$
We incorporate the function $E(f_i)$ into the Fisher criterion to improve it. In addition, since the principle of "inter-class dispersion and inner-class aggregation" embodied in the Fisher criterion plays a relatively important role in the separability measurement of steganalysis features, we give an improved Fisher criterion (the I-Fisher criterion) that takes the principle of "dispersion" into account while increasing the weight of the Fisher criterion:
where the larger the value of $IF_{score}(f_i)$, the better the separability of feature component $f_i$ and the greater its contribution to detecting stego images.
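To show why the sigmoid is a reasonable way to bound the dispersion term, the short Python sketch below applies a numerically stable logistic function to a few raw dispersion values. The input values are hypothetical placeholders: the formula that produces $g(f_i)$ is not reproduced here, and the sketch does not attempt to combine the result with the Fisher term.

```python
import math


def sigmoid(g):
    """Numerically stable logistic function 1 / (1 + e^(-g))."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    z = math.exp(g)
    return z / (1.0 + z)


# Hypothetical raw dispersion values g(f_i), including a very large one.
raw_dispersion = [0.0, 0.5, 2.0, 10.0, 1000.0]
bounded = [sigmoid(g) for g in raw_dispersion]
print(bounded)
```

However large the raw dispersion becomes, the bounded value never exceeds 1, so the dispersion term cannot dominate a separately weighted Fisher term.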
3.3. Separability measurement based on I-Fisher criterion
This section introduces the application of the I-Fisher criterion to separability measurement. Let $f_i$ be a steganalysis feature component ($1 \le i \le N$), where $N$ is the number of components in a steganalysis feature. The $IF_{score}(f_i)$ value is calculated based on the I-Fisher criterion. The calculation process is shown in Figure 2.
As shown in Figure 2, each operator is placed in a gray box, where $(\cdot)^2$ denotes the square of its argument, $(\circ)/(\cdot)$ denotes the ratio of $\circ$ to $\cdot$, and $\times 4$ reflects the importance of the Fisher criterion. The data involved in the operations are placed in blue boxes. $m_C(f_i)$ and $m_S(f_i)$ represent the means of feature component $f_i$ in the cover image class and the stego image class, respectively. $d_C(f_i)$ and $d_S(f_i)$ represent the standard deviations of $f_i$ in the cover image class and the stego image class, respectively. Following the calculation procedure in this figure, we obtain the value of $IF_{score}(f_i)$, which is the separability of steganalysis feature component $f_i$.
Algorithm 1 measures the separability of a steganalysis feature component based on the I-Fisher criterion.
According to the I-Fisher criterion, the larger the $IF_{score}(f_i)$ value, the better the separability of the feature component. Comparing the $IF_{score}(f_i)$ values of all the feature components in the feature space, we keep the components with larger values and remove those with small values, which reduces the feature dimension while maintaining the accuracy of detecting stego images.
4. Feature selection based on I-Fisher criterion
In view of the advantages of the I-Fisher criterion in separability measurement, the I-Fisher criterion proposed in this paper is applied as a heuristic function in the decision rough set α-positive region reduction (the IF-based method for short).
The decision rough set α-positive region reduction can remove those feature components that do not fulfill the attribute independence and positive region non-reduction requirements. It is briefly described as follows.
Let $T$ be a decision table with object set $U = \{x_1, x_2, \cdots, x_{2m}\}$, where $H$ and $Q$ are the conditional attribute set and the decision attribute set on $U$, respectively, $X$ is a subset of $U$, and $\alpha \in [0,1]$. If the attribute subset $B \subseteq H$ satisfies the following two conditions:
1) Positive region non-reduction: $\|POS^{\alpha}_{B}(Q)\| \ge \|POS^{\alpha}_{H}(Q)\|$;
2) Attribute independence: for any $f_i \in B$, $\|POS^{\alpha}_{B-\{f_i\}}(Q)\| < \|POS^{\alpha}_{B}(Q)\|$;
then the attribute subset $B$ is a decision rough set α-positive region reduction of the attribute set $H$, where $POS^{\alpha}_{H}(Q) = \bigcup_{X \in U/Q} \underline{R}^{\alpha}_{H}(X)$, $\underline{R}^{\alpha}_{H}(X) = \{x \in U \mid P(X \mid [x]_H) \ge \alpha\}$, and the threshold $\alpha$ is a preset detection accuracy of objects. $\underline{R}^{\alpha}_{H}(X)$ is the α-lower approximation set of $X$, and $[x]_H$ represents the equivalence class of $x$ under attribute set $H$. $P(X \mid [x]_H)$ represents the conditional probability, $P(X \mid [x]_H) = \frac{\|X \cap [x]_H\|}{\|[x]_H\|}$, where $\|\cdot\|$ denotes the number of elements in a set.
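The definition above can be made concrete with a small Python sketch. It groups objects into equivalence classes under an attribute subset, builds the α-positive region as the union of the α-lower approximations of the decision classes, and checks the two conditions for a candidate subset. The function names and the toy decision table are hypothetical, and the sketch assumes that attribute values are already discrete; how continuous steganalysis features are mapped to equivalence classes is not specified here.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Group object indices whose values agree on every attribute in attrs."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(idx)
    return list(classes.values())


def alpha_positive_region(rows, labels, attrs, alpha):
    """Union over decision classes X of the alpha-lower approximation of X."""
    positive = set()
    for block in equivalence_classes(rows, attrs):
        for decision in set(labels):
            # P(X | [x]) = |X intersect [x]| / |[x]|
            hits = sum(1 for i in block if labels[i] == decision)
            if hits / len(block) >= alpha:
                positive.update(block)
                break
    return positive


def is_alpha_reduct(rows, labels, subset, full, alpha):
    """Check positive region non-reduction and attribute independence."""
    def size(attrs):
        return len(alpha_positive_region(rows, labels, attrs, alpha))

    if size(subset) < size(full):
        return False
    return all(size([a for a in subset if a != f]) < size(subset) for f in subset)


# Toy table: attribute 0 determines the label, attribute 1 is noise,
# attribute 2 duplicates attribute 0 (redundant).
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 1, 1]
print(is_alpha_reduct(rows, labels, [0], [0, 1, 2], 0.8))
print(is_alpha_reduct(rows, labels, [0, 2], [0, 1, 2], 0.8))
```

In this toy table the subset containing only attribute 0 satisfies both conditions, whereas adding the redundant attribute 2 violates attribute independence because removing attribute 0 no longer shrinks the positive region.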
Applying the decision rough set α-positive region reduction to the steganalysis feature yields several reduction subsets. The best result is then selected from these subsets.
The main steps of the IF-based method are as follows.
Step 1: Constructing a feature matrix and the corresponding decision table. Construct a feature matrix from the features extracted from the cover image class and the stego image class. Then, add a decision attribute as the last column of the matrix to construct a decision table $T$. The decision attribute value of a cover image is "0", and the decision attribute value of a stego image is "1".
Step 2: Measuring separability. Calculate the $IF_{score}$ value of each feature component in the matrix based on the I-Fisher criterion. The measurement method based on the I-Fisher criterion is the focus of this paper. See Section 3.3 for details.
Step 3: Reducing features. First, delete irrelevant feature components. Set $IF_{score}^{\min}$ as the lower limit of the $IF_{score}$ value. If $IF_{score}(f_i) < IF_{score}^{\min}$, then the feature component $f_i$ is an irrelevant feature component. The irrelevant feature components are removed, and the remaining feature components are added to the candidate feature component set $H'$, which is a multiset. Second, calculate the division step $\lambda = \frac{IF_{score}^{\max} - IF_{score}^{\min}}{m}$, where $m$ is the number of expected feature subsets. Third, sort the feature components of $H'$ in descending order of their $IF_{score}$ values. Fourth, according to the value of $\lambda$, divide the sorted steganalysis feature components into $m$ subsets, i.e. $H' = \{h_1, h_2, \cdots, h_m\}$, where $h_i = \{f_{i1}, f_{i2}, \cdots, f_{it}\}$ and $t$ is the number of elements in feature subset $h_i$. Fifth, initialize $B = \emptyset$ and add the feature subsets $h_i$ one at a time. After each addition, determine whether the reduction set satisfies the positive region non-reduction principle. If it does not, remove this subset and continue with the next one. If it does, test whether the feature components in $h_i$ satisfy the attribute independence principle: if so, keep the resulting candidate subset $B$; otherwise, remove components from this subset. Finally, output the reduction subsets $B$ that satisfy both the positive region non-reduction principle and the attribute independence principle.
Step 4: Selecting a feature subset. Detect the training images using the reduction feature subsets and calculate the detection accuracy. From the reduction subsets obtained in Step 3, select a feature subset $B$ with high detection accuracy and low dimension as the final reduction result. Output the subset $B$ and the corresponding column numbers $\eta$.
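The first four operations of Step 3 can be sketched in Python as follows. The sketch removes components whose score is below the lower limit, computes the division step $\lambda$, sorts the remaining components in descending order of score, and groups them into $m$ subsets. Interpreting "divide according to the value of $\lambda$" as equal-width score bands of width $\lambda$ is an assumption of this sketch, as are the function name and the toy scores; the subsequent positive region and attribute independence tests are not shown here.

```python
def partition_by_score(scores, score_min, m):
    """Drop low-scoring components, then split the rest into m score bands."""
    # Keep (index, score) pairs at or above the lower limit, highest score first.
    kept = sorted(
        ((i, s) for i, s in enumerate(scores) if s >= score_min),
        key=lambda pair: pair[1],
        reverse=True,
    )
    subsets = [[] for _ in range(m)]
    if not kept:
        return subsets
    score_max = kept[0][1]
    step = (score_max - score_min) / m  # division step lambda
    for i, s in kept:
        if step == 0:
            band = 0  # all kept scores are equal
        else:
            # Band 0 holds the highest scores; clamp the lower edge into the last band.
            band = min(int((score_max - s) / step), m - 1)
        subsets[band].append(i)
    return subsets


# Hypothetical scores for six feature components.
toy_scores = [0.05, 0.9, 0.4, 0.65, 0.1, 0.2]
subsets = partition_by_score(toy_scores, score_min=0.1, m=4)
print(subsets)
```

Each inner list holds the column indices of one subset $h_i$, ordered from the highest-scoring band to the lowest, so the subsets can be added to $B$ in that order.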
The detailed process is described in [21]. To aid understanding, a diagram of the IF-based method is given in Figure 3.
5. Experimental results
In this section, after introducing the experimental settings, we analyze the experimental results of steganalysis feature selection method based on I-Fisher criterion.
5.1. Experimental setting
The images used in the experiments are from the BOSSbase-1.01 database*, which contains 10000 grayscale images of size 512 × 512. First, all the grayscale PGM images are converted to grayscale JPEG images with quality factor 95. Second, stego images with payloads of 0.1, 0.25, 0.5, 0.8, and 1.0 bpac (bits per nonzero AC DCT coefficient) are generated by the SI-UNIWARD [12] steganographic algorithm, which has good anti-detection performance. This gives one group of cover images and five groups of stego images. The steganalysis features are extracted from all the cover and stego images using the GFR method [23] (17000-D) and the CC-PEV method [24] (548-D). A steganalysis feature database containing 120000 features is obtained.
* P. Bas, T. Filler, T. Pevny, available: http://agents.fel.cvut.cz/stegodata/
Table 2 shows the image sets used in the experiments.
In this paper we use the Ensemble Classifier for steganalysis. The detection error $P_E$ represents the sum of the false negatives (missed detections) and the false positives (false alarms). $\bar{P}_E$ represents the average of the detection error $P_E$. The average detection accuracy $\bar{P}_A = 1 - \bar{P}_E$ is used to evaluate the performance. 5000 pairs of cover and stego images are used as the training set, and 5000 pairs of cover and stego images are used as the testing set. The number of expected feature subsets is set to 100.
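As a minimal sketch of this evaluation metric, the Python code below computes a detection error per run and the average detection accuracy over several runs. Normalizing the count of false alarms and missed detections by the number of test images, and averaging over a list of runs, are assumptions of this sketch; the labels follow the decision table convention of "0" for cover and "1" for stego, and the predictions are made up.

```python
def detection_error(true_labels, predicted_labels):
    """P_E for one run: (false alarms + missed detections) / number of test images."""
    pairs = list(zip(true_labels, predicted_labels))
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (false_alarms + missed) / len(pairs)


def average_accuracy(runs):
    """Average detection accuracy: 1 minus the mean detection error over runs."""
    errors = [detection_error(t, p) for t, p in runs]
    return 1.0 - sum(errors) / len(errors)


# Toy test set with 4 cover (0) and 4 stego (1) images and two hypothetical runs.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
run_a = [0, 0, 0, 1, 1, 1, 1, 0]  # one false alarm, one missed detection
run_b = [0, 0, 0, 0, 1, 1, 1, 0]  # one missed detection
pa = average_accuracy([(truth, run_a), (truth, run_b)])
print(pa)
```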
5.2. Feature selection in GFR using IF-based method
The dimension of the GFR steganalysis feature proposed in [23] is 17000. It consists of five sub-features, which capture the changes in image statistics from five different perspectives. We first use the GFR features to conduct steganalysis. We then measure the $IF_{score}$ value of every feature component based on the I-Fisher criterion. According to the $IF_{score}$ values and the decision table, we use the decision rough set α-positive region reduction to reduce the GFR feature. Finally, the average detection accuracy of the selected feature is evaluated for image steganalysis.
In the plots of Figure 4, we show the average detection accuracy of the original GFR steganalysis feature and of the feature reduced by the IF-based method when the embedded payloads are 0.1, 0.25, 0.5, 0.8, and 1.0. The horizontal axis represents the feature dimension and the vertical axis represents the average detection accuracy. "∘" indicates the average detection accuracy at different dimensions after reduction. The red "⋆" indicates the average detection accuracy of the original GFR feature (17000-D) before reduction. The green "⋆" indicates the highest average detection accuracy and its feature dimension. As shown in Figure 4(c), when the feature dimension is reduced to 11956 with payload 0.5, the detection accuracy is 0.6593, which is about 0.21% higher than that of the original feature. As shown in Figure 4(d), when the feature dimension is reduced to 10325 with payload 0.8, the detection accuracy is 0.9156, which is about 0.26% higher than that of the original feature. From Figure 4, we can see that even if the feature dimension drops to less than one fourth of the original, the feature still achieves a good classification result. It can be observed that the proposed method significantly reduces the dimensionality of the steganalysis feature while also improving the stego image detection accuracy.
The memory costs of the GFR features of 10000 images are compared before and after reduction when the payloads are 0.1, 0.25, 0.5, 0.8, and 1.0. The memory costs when the detection accuracy is close to that of the original feature are as follows.
The storage space of the original GFR steganalysis feature is close to 0.5 GB for 10000 images. To evaluate the memory requirement of the reduced steganalysis feature, Table 3 gives the memory required when the reduced feature achieves the same stego image detection results as the original GFR steganalysis feature. From Table 3, it can be seen that the storage space of the GFR steganalysis feature is significantly reduced after feature selection under all payloads. For example, when the payload is 0.1, the original GFR feature requires 0.4610 GB of memory, while the feature selected by the IF-based method needs 0.2831 GB, which saves 0.1779 GB, i.e. 38.59% of the memory cost. When the payload is 1.0, the original GFR feature requires 0.4621 GB of memory, while the feature selected by the IF-based method needs 0.1121 GB, which saves 0.3500 GB, i.e. 75.74% of the memory cost. In summary, the features selected by the IF-based method save a large amount of storage space.
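The saving percentages quoted above follow directly from the reported memory figures, as the short Python check below shows. The helper name is hypothetical; the four input values are the ones given in the text for payloads 0.1 and 1.0.

```python
def memory_saving(original, reduced):
    """Return the absolute saving and the saving as a percentage of the original."""
    saved = original - reduced
    return saved, 100.0 * saved / original


# Memory figures reported in the text for payloads 0.1 and 1.0.
saved_01, pct_01 = memory_saving(0.4610, 0.2831)
saved_10, pct_10 = memory_saving(0.4621, 0.1121)

print(f"payload 0.1: saved {saved_01:.4f}, {pct_01:.2f}%")
print(f"payload 1.0: saved {saved_10:.4f}, {pct_10:.2f}%")
```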
5.3. Feature selection in CC-PEV using IF-based method
The dimension of the CC-PEV steganalysis feature proposed in [24] is 548. It was introduced in [24] to construct a new multi-class JPEG steganalyzer with markedly improved performance. We first use the CC-PEV features to conduct steganalysis. We then measure the $IF_{score}$ value of every feature component based on the I-Fisher criterion. According to the $IF_{score}$ values and the decision table, we use the decision rough set α-positive region reduction to reduce the CC-PEV feature. Finally, the average detection accuracy of the selected feature is evaluated for image steganalysis.
In the plots of Figure 5, we show the average detection accuracy of the original CC-PEV steganalysis feature and of the feature reduced by the IF-based method when the embedded payloads are 0.1, 0.25, 0.5, 0.8, and 1.0. The horizontal axis represents the feature dimension and the vertical axis represents the average detection accuracy. "∘" indicates the average detection accuracy at different dimensions after reduction. The red "⋆" indicates the average detection accuracy of the original CC-PEV feature (548-D) before reduction. The green "⋆" indicates the highest average detection accuracy and its feature dimension. As shown in Figure 5(b), when the feature dimension is reduced to 193 with payload 0.25, the detection accuracy is 0.52528, which is about 0.408% higher than that of the original feature. As shown in Figure 5(d), when the feature dimension is reduced to 504 with payload 0.8, the detection accuracy is 0.65245, which is about 0.125% higher than that of the original feature. From Figure 5, it can be observed that the proposed method reduces the dimensionality of the CC-PEV feature while also improving the stego image detection accuracy.
Since the original dimension and memory footprint of the CC-PEV steganalysis feature are small, the memory that can be saved by selection is also small. Therefore, the memory comparison experiment is not performed for the CC-PEV steganalysis feature.
5.4. Comparison with Steganalysis-α method
The Steganalysis-α method [21] is a general steganalysis feature selection method based on decision rough set α-positive region reduction. This method removes redundant steganalysis feature components based on the attribute independence principle of the decision rough set α-positive region reduction, which further reduces the feature dimension. In addition, it removes conflicting feature components based on the positive region non-reduction principle of the decision rough set α-positive region reduction, which maintains or even improves the detection accuracy. The main steps of the Steganalysis-α method are as follows. First, measure the ASM value of each feature component in the steganalysis feature based on the ASM criterion. Second, reorder the feature components in descending order of their ASM values. Third, reduce the reordered feature components based on the decision rough set α-positive region reduction. Then, from all the resulting reduction subsets, select one with high detection accuracy and low dimension as the final reduction result. Finally, detect stego images based on the selected feature components.
Figure 6 is a double-axis histogram. The x-axis represents the payload, the left y-axis represents the detection accuracy, and the right y-axis represents the number of features. In the plots of Figure 6, we show the average detection accuracy of the GFR steganalysis feature selected by the Steganalysis-α method and by the IF-based method when the embedded payloads are 0.1, 0.25, 0.5, 0.8, and 1.0. The blue bars represent the detection accuracy of the feature selected by the Steganalysis-α method, and the green bars represent the detection accuracy of the feature selected by the IF-based method. The yellow bars represent the number of features selected by the Steganalysis-α method, and the red bars represent the number of features selected by the IF-based method.
It can be seen from Figure 6 that both the Steganalysis-α method and the IF-based method can reduce the GFR image steganalysis feature. When the payload is 0.5, the Steganalysis-α method reduces the feature to 13205-D, removing 3795 feature components, and the average detection accuracy of the reduced feature is 0.6579. The feature reduced by the IF-based method has 1249 fewer dimensions than that of the Steganalysis-α method, and its detection accuracy is 0.14% higher. When the payload is 1.0, the Steganalysis-α method reduces the feature to 11092-D, and the average detection accuracy of the reduced feature is 0.9697. The feature reduced by the IF-based method has 1687 fewer dimensions than that of the Steganalysis-α method, and its detection accuracy is 0.03% higher.
From Figure 6, it can be observed that the proposed IF-based method selects a smaller steganalysis feature than the Steganalysis-α method, especially at higher payloads, while also providing slightly better detection results.
In the plots of Figure 7, we show the average detection accuracy of the CC-PEV feature selected by the Steganalysis-α method and by the IF-based method when the embedded payloads are 0.1, 0.25, 0.5, 0.8, and 1.0. The blue bars represent the detection accuracy of the feature selected by the Steganalysis-α method, and the green bars represent the detection accuracy of the feature selected by the IF-based method. The yellow bars represent the number of features selected by the Steganalysis-α method, and the red bars represent the number of features selected by the IF-based method.
It can be seen from Figure 7 that both the Steganalysis-α method and the IF-based method can reduce the CC-PEV image steganalysis feature. When the payload is 0.5, the Steganalysis-α method reduces the feature to 510-D, removing 38 feature components, and the average detection accuracy of the reduced feature is 0.52257. The feature reduced by the IF-based method has 5 fewer dimensions than that of the Steganalysis-α method, and its detection accuracy is 1.88% higher.
From Figure 7, it can be observed that the detection accuracy of the proposed method is higher than that of the Steganalysis-α method.
6. Conclusions
To further reduce the dimension of steganalysis features and improve the efficiency of steganalysis, this paper proposes a feature selection method based on the I-Fisher criterion. First, we improve the traditional Fisher criterion by adding a dispersion term. The I-Fisher criterion is able to measure separability more accurately than the traditional Fisher criterion. Then, we apply the I-Fisher criterion to the decision rough set α-positive region reduction to select steganalysis features. Finally, a series of feature selection experiments shows that the proposed method can improve the detection accuracy of steganalysis based on the selected features while reducing the dimension and memory cost. In future work, we will continue to study how to evaluate the contribution of various steganalysis features.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. U1804263, U1636219, 61872448, 61602508, 61772549, and U1736214), the National Key R&D Program of China (No. 2016YFB0801303, 2016QY01W0105), and the Science and Technology Innovation Talent Project of Henan Province (No. 2018JR0018).