
Citation: Xinyi Wang, He Wang, Shaozhang Niu, Jiwei Zhang. Detection and localization of image forgeries using improved mask regional convolutional neural network[J]. Mathematical Biosciences and Engineering, 2019, 16(5): 4581-4593. doi: 10.3934/mbe.2019229
The number of digital images has grown exponentially with the advent of new cameras, smartphones, and tablets, and social media such as Facebook and Twitter have further accelerated their distribution. However, digital content can be easily modified or tampered with by photo-editing software such as Photoshop and Neoimaging, which undermines the traditional notion that "seeing is believing". Certain manipulations, such as copy-move and splicing, can easily deceive the human perceptual system. Once such fake images are maliciously used to mislead the public, they can seriously threaten the stability and development of society. Therefore, identifying the authenticity of digital images and conducting forensic analysis has become an important topic in diverse scientific and security/surveillance applications.
Many techniques have been developed to authenticate a digital image. In general, they fall into two types: passive detection techniques [1,2,3] and active detection techniques [4,5]. Active detection techniques embed particular data into the image; when verifying authenticity, this data is extracted from the suspicious image and compared with the original. In contrast, passive detection techniques can verify the authenticity of an image without any additional pre-processing, which has attracted increasing attention recently.
Although tampering may leave no visible trace, it inevitably changes local or global features of the image. Based on this observation, a large number of passive techniques have been developed. The two main types of image forgery are copy-move and image splicing. Copy-move, in which parts of an image are copied and pasted elsewhere in the same image, is the type most commonly used by attackers; the primary task is to detect whether two or more similar regions exist within a single image. A number of passive techniques have been developed to detect copy-move forgery [6,7,8]. For image splicing, the primary task is to detect whether a given image is a composite generated by cutting and joining two or more images. Many studies address splicing detection: Shi et al. [9] proposed a natural image model for splicing detection, and Zhao et al. [10] later developed a 2-D Markov model to characterize the underlying image dependency and achieve splicing localization. Most state-of-the-art tampering detection approaches exploit the frequency-domain characteristics and/or statistical properties of an image. Many traditional image forensics tasks can be solved by designing suitable feature sets and then using these features to distinguish original images from processed ones, so research has usually focused on constructing complex handcrafted features. However, for many tasks it is difficult to determine which features should be extracted.
Recently, deep learning has become popular due to its promising performance in visual recognition tasks such as object detection [11,12], scene classification [13], and semantic segmentation [14]. Deep neural networks have proven capable of extracting complex statistical dependencies from high-dimensional sensory inputs and efficiently learning hierarchical representations. Deep learning based approaches have also been increasingly used in passive image forensics. Chen et al. [15] presented the first work applying a CNN to median filtering forensics, and Qian et al. [16] proposed a new paradigm for steganalysis that learns features automatically via deep learning models. Bayar et al. [17] replaced the low-pass filter layer with an adaptive kernel layer to learn the filtering kernel used in tampered regions. Rao et al. [18] presented an image forgery detection method that uses a convolutional neural network (CNN) to learn hierarchical representations automatically from input RGB color images. Zhou et al. [19] proposed a two-stream Faster R-CNN network trained end-to-end to detect tampered regions in a manipulated image. Most of these deep learning forensic techniques focus on detecting a single type of tampering. Only a few, such as [19], learn a more general forensics model, but the tampered area it locates is not pixel-level: it can only mark the tampered area with a bounding box.
To overcome these issues, we train an end-to-end model based on the Mask Regional Convolutional Neural Network (Mask R-CNN) [20] to distinguish manipulated regions from authentic ones, and we attach an Edge Agreement Head [21] to the mask branch of Mask R-CNN. This head applies a traditional edge detection filter, the Sobel kernel [22], to both the predicted mask and the ground-truth mask to encourage their edges to agree, improving detection accuracy. As the additional network head is only active during training, inference speed remains unchanged compared to Mask R-CNN. The overall framework (shown in Figure 1) can detect two types of image manipulation: copy-move and splicing.
Our main contributions are as follows:
1. We apply the Mask R-CNN model to successfully detect and locate manipulated regions. Pixel-level prior information about the tampered regions provides the supervisory signal for training Mask R-CNN.
2. We add a Sobel edge detection filter to focus on manipulation boundaries. This filter encourages predicted masks to have image gradients similar to those of the ground-truth mask, improving detection accuracy.
3. We create a synthetic tampering dataset based on COCO [23], since existing image tampering datasets [24,25] are not large enough to train a deep network.
To make full use of edge information and prior knowledge, the proposed method employs a Mask R-CNN model together with the Sobel filter. A feature pyramid network (FPN) based on ResNet, adapted to images with tampered regions, serves as the backbone of the Mask R-CNN. The Mask R-CNN extracts pyramid feature maps suitable for the images, guided by the pixel-level prior information of the tampered region. It then performs recognition and coarse segmentation of the tampered region, and the region of interest (RoI) is obtained by extending the bounding box of the tampered region produced by the recognition stage. We also introduce a parameter-free network head, the Edge Agreement Head, which applies a traditional edge detection filter, the Sobel filter, to both the predicted mask and the ground-truth mask to encourage their edges to agree. The architecture of our method is shown in Figure 2.
The Mask Regional Convolutional Neural Network (Mask R-CNN) is a simple but effective extension of the Faster R-CNN architecture that adds mask prediction. It replaces the classical RoI Pooling layer of Faster R-CNN with RoIAlign, which introduces an interpolation step that largely solves the misalignment caused by direct sampling through pooling alone. On this basis, a parallel fully convolutional network (FCN) branch [26] is added to predict pixel-level instance masks. Mask R-CNN also uses a Feature Pyramid Network (FPN) backbone [27], so the network can perform precise localization using the high-resolution feature maps in the lower layers while exploiting the lower-resolution but semantically richer features in the higher layers. Compared with Faster R-CNN, only a small computational overhead is added, and Mask R-CNN runs at about 5 fps. It can also be easily extended to other tasks, such as human pose estimation, and without bells and whistles it outperforms all existing single-model results on each task.
In our method, Mask R-CNN performs coarse tampering detection and localization in three stages: feature extraction, region proposal, and prediction. For an input image, features are first extracted by the residual convolutional network ResNet-101 to obtain the pyramid feature maps of the image; this feature extraction process is the same as in Faster R-CNN. The region proposal network (RPN) then generates candidate RoIs. Next, the feature map of each RoI is obtained and aligned using RoIAlign. Given the feature map of each RoI, its classification and bounding box are predicted, and the FCN branch predicts the category of each pixel within the RoI. Finally, a coarse segmentation of the image tampering region is obtained.
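To make this pipeline concrete, the following is a minimal sketch using torchvision's off-the-shelf Mask R-CNN; the ResNet-50 FPN backbone and the `maskrcnn_resnet50_fpn` API are illustrative assumptions only, since our model uses a ResNet-101 FPN backbone and adds the Edge Agreement Head described below.

```python
# Minimal sketch of the detect-and-segment pipeline (illustrative only).
import torch
import torchvision

# Three classes are assumed here: background, copy-move, splicing.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=3)
model.eval()

image = torch.rand(3, 512, 1024)    # dummy RGB image, CHW, values in [0, 1]
with torch.no_grad():
    outputs = model([image])        # one result dict per input image

det = outputs[0]
# det["boxes"]:  (N, 4) RoIs surviving the RPN and box head
# det["labels"], det["scores"]: per-RoI class and confidence
# det["masks"]:  (N, 1, H, W) per-RoI soft masks from the FCN mask branch
keep = det["scores"] > 0.5
print(det["boxes"][keep].shape, det["masks"][keep].shape)
```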
The Mask R-CNN loss is a multi-task loss based on the Faster R-CNN loss, and its loss function is defined as
$$L_{MRCNN} = L_{class} + L_{box} + L_{mask} \tag{1}$$
By examining the predicted masks of the mask branch, we observe that these masks often have blurred boundaries and do not follow the clear, fine contours of the original masks. When Mask R-CNN is used directly, without edge detection, only a coarse segmentation of the tampered area is obtained, and the loss function is the same as in [20].
When training a Mask R-CNN for image forgery detection and segmentation, we often observe incomplete or poor masks, especially during early training steps (shown in Figure 3). Furthermore, the masks often do not follow the true tampering boundaries.
In Figure 3, (a) is the ground truth, and (b)–(d) show three example mask predictions from the early training stages of Mask R-CNN. The figure illustrates typical mistakes such as missing parts and over-segmentation. To overcome this problem, we need mask edge information to supervise the training of Mask R-CNN, so we consider using an edge detection filter.
Edge detection identifies points with significant brightness changes in a digital image and is a basic operation in image processing and computer vision. Because edge detection captures edge information well, we combine it with Mask R-CNN: it encourages predicted masks to have image gradients similar to those of the ground-truth mask, which improves the segmentation of the tampered region.
There are many ways to perform edge detection, but most fall into two categories: gradient methods and Laplacian methods. Gradient methods detect edges by looking for the maxima and minima of the first derivative of the image, while Laplacian methods search for zero crossings in the second derivative. Edge detection filters such as the Sobel and Laplacian filters can be described as convolutions with a 3 × 3 kernel. In this paper, we choose the Sobel filter, applied within an Edge Agreement Head attached to the mask branch of Mask R-CNN (shown in Figure 2).
The Sobel operator is a directional operator comprising two kernels, one for detecting horizontal edges and one for vertical edges. Rather than a simple average or difference, it applies a weighted difference that gives more weight to the pixels adjacent to the center:
$$\begin{aligned} f'_x &= f(x-1,y+1) + 2f(x,y+1) + f(x+1,y+1) - f(x-1,y-1) - 2f(x,y-1) - f(x+1,y-1) \\ f'_y &= f(x-1,y-1) + 2f(x-1,y) + f(x-1,y+1) - f(x+1,y-1) - 2f(x+1,y) - f(x+1,y+1) \\ G[f(x,y)] &= |f'_x(x,y)| + |f'_y(x,y)| \end{aligned} \tag{2}$$
where $f'_x(x,y)$ and $f'_y(x,y)$ denote the first derivatives in the x and y directions respectively, $G[f(x,y)]$ is the gradient of the Sobel filter, and $f(x,y)$ is the input image with integer pixel coordinates.
The Sobel operator consists of two 3 × 3 kernels, one for the vertical direction and one for the horizontal direction:
$$G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \quad G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \tag{3}$$
The overall gradient magnitude and direction are calculated from the vertical and horizontal responses:
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\frac{G_y}{G_x} \tag{4}$$
If $\theta$ equals zero, the image has a vertical edge at that point, with the left side darker than the right side.
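As an illustration of Eqs. (2)–(4), the following sketch applies the two Sobel kernels with SciPy and combines the responses into gradient magnitude and direction; the use of `scipy.signal.correlate2d` (which matches the weighted sums in Eq. (2) directly) and the toy edge image are our own illustrative choices.

```python
import numpy as np
from scipy.signal import correlate2d

Gx = np.array([[1, 0, -1],
               [2, 0, -2],
               [1, 0, -1]], dtype=np.float32)   # responds to vertical edges
Gy = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]], dtype=np.float32)  # responds to horizontal edges

def sobel(image):
    gx = correlate2d(image, Gx, mode="same", boundary="symm")
    gy = correlate2d(image, Gy, mode="same", boundary="symm")
    magnitude = np.sqrt(gx ** 2 + gy ** 2)   # G in Eq. (4)
    direction = np.arctan2(gy, gx)           # theta in Eq. (4)
    return magnitude, direction

# Toy image with a vertical edge: left half dark, right half bright.
img = np.zeros((8, 8), dtype=np.float32)
img[:, 4:] = 1.0
mag, _ = sobel(img)
print(mag[4])   # strong responses concentrated around the edge column
```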
We attach the Sobel filter as an Edge Agreement Head to the mask branch of Mask R-CNN, which yields an auxiliary loss called the Edge Agreement Loss ($L_{edge}$), computed with an L2 loss:
$$L_{edge} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{5}$$
$L_{edge}$ measures the difference between the edges of the predicted mask and those of the ground-truth mask, as shown in Figure 4.
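For illustration, a minimal PyTorch sketch of this computation follows, assuming 28 × 28 mask tensors; the fixed two-channel Sobel convolution and the `F.mse_loss` call are illustrative choices, not our exact training code.

```python
import torch
import torch.nn.functional as F

# The two Sobel kernels of Eq. (3), stacked as a fixed (non-trainable)
# 2-channel convolution weight of shape (2, 1, 3, 3).
sobel = torch.tensor([[[[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]]],
                      [[[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]]]])

def edge_agreement_loss(pred_mask, gt_mask):
    # The head is parameter-free: the Sobel weight itself is never updated.
    pred_edges = F.conv2d(pred_mask, sobel, padding=1)
    gt_edges = F.conv2d(gt_mask, sobel, padding=1)
    return F.mse_loss(pred_edges, gt_edges)   # mean squared difference, Eq. (5)

pred = torch.rand(4, 1, 28, 28, requires_grad=True)   # mask branch output
gt = (torch.rand(4, 1, 28, 28) > 0.5).float()         # ground-truth masks
loss = edge_agreement_loss(pred, gt)
loss.backward()   # gradients flow into the mask branch, not the kernel
```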
Thus, the total loss is
$$L_{total} = L_{MRCNN} + L_{edge} = L_{class} + L_{box} + L_{mask} + L_{edge} \tag{6}$$
The total loss $L_{total}$ consists of the original Mask R-CNN loss $L_{MRCNN}$ (Eq. (1)) and the new Edge Agreement Loss $L_{edge}$. The classification loss $L_{class}$ and bounding-box loss $L_{box}$ are the same as those defined in [20]. The mask branch has a $Km^2$-dimensional output for each RoI, encoding K binary masks of resolution m × m, one for each of the K classes. We apply a per-pixel sigmoid and define $L_{mask}$ as the average binary cross-entropy loss. For an RoI associated with ground-truth class k, $L_{mask}$ is defined only on the k-th mask; the other mask outputs contribute no loss.
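For clarity, a small sketch of this class-selective mask loss (per-pixel sigmoid plus binary cross-entropy, keeping only the ground-truth class's mask) is shown below; the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

N, K, m = 8, 3, 28                       # RoIs, classes, mask resolution
mask_logits = torch.randn(N, K, m, m)    # raw outputs of the mask branch
gt_classes = torch.randint(0, K, (N,))   # ground-truth class k per RoI
gt_masks = (torch.rand(N, m, m) > 0.5).float()

# Gather the k-th mask for each RoI; other masks contribute no loss.
idx = torch.arange(N)
selected = mask_logits[idx, gt_classes]                  # (N, m, m)
l_mask = F.binary_cross_entropy_with_logits(selected, gt_masks)
print(l_mask)
```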
All training images are resized while maintaining their aspect ratio. The mask size is 28 × 28 pixels, and the image resolution is 1024 × 512 pixels. This differs from the original Mask R-CNN [20], where images are resized so that the shorter side is 800 pixels and the longer side is capped at 1024 pixels. We set the hyperparameters according to the characteristics of our method and the object detection settings of the original paper [20]. Anchors are selected based on the intersection-over-union (IoU) ratio between the anchor and ground-truth (GT) boxes, and the mask loss is defined only on positive RoIs. The mask target is the intersection between the RoI and its associated ground-truth mask. An RoI is considered positive if its IoU with a ground-truth box is at least 0.5, and negative otherwise.
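For illustration, this RoI labeling rule can be sketched as a plain-Python helper (illustrative only):

```python
# An RoI is positive if its IoU with some ground-truth box is >= 0.5.
def iou(box_a, box_b):
    """Boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def is_positive(roi, gt_boxes, threshold=0.5):
    return any(iou(roi, gt) >= threshold for gt in gt_boxes)

print(is_positive((0, 0, 10, 10), [(2, 2, 12, 12)]))   # IoU ~ 0.47 -> False
```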
Each mini-batch has two images per GPU, and N RoIs are sampled per image with a positive-to-negative ratio of 1:3 (N = 64 for the C4 backbone and N = 512 for FPN). We train with a batch size of 2 on a single GPU for 640K iterations, with a learning rate of 0.01 reduced by a factor of 10 at 240K iterations. Optimization is done by SGD with momentum 0.9 and weight decay 0.0001.
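A minimal sketch of this optimization setup in PyTorch follows; the stand-in module replaces the full network, and the loop body is a placeholder.

```python
import torch

model = torch.nn.Conv2d(3, 8, 3)   # stand-in for the full Mask R-CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0001)
# Drop the learning rate by 10x at 240K of the 640K total iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[240_000], gamma=0.1)

for step in range(640_000):
    optimizer.zero_grad()
    # ... forward pass and L_total.backward() would go here ...
    optimizer.step()
    scheduler.step()
    break   # single illustrative iteration
```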
In this section, experimental results are presented to demonstrate the effectiveness of our tampering detection and localization method. As mentioned above, we introduce an Edge Agreement Head, which applies the Sobel filter to both the predicted and ground-truth masks to encourage their edges to agree; we therefore verify whether segmentation accuracy improves after adding edge detection.
All experiments are conducted on Ubuntu 16.04 with an Intel Core i7-9700K CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory.
Current standard image tampering datasets do not contain enough images to train a deep neural network. To overcome this, we create a synthetic dataset from COCO images [23]. We use the COCO segmentation annotations to randomly select objects of different kinds and copy and paste them into the same or other images, so the tampered images in our synthetic dataset fall into two classes: (a) copy-move and (b) splicing (a sketch of this copy-paste generation is given after Table 1). We split the data into training (80%) and test (20%) sets, ensuring that the same backgrounds and tampered objects do not appear in both. In total, we create 30K tampered/authentic image pairs and train our model end-to-end on this synthetic dataset. We use Average Precision (AP) for evaluation, with the same metric as the COCO [23] detection evaluation. As Table 1 shows, the Mask R-CNN with the added Sobel filter performs better than the single Mask R-CNN.
| Method | AP (synthetic test set) |
| --- | --- |
| Single Mask R-CNN | 0.713 |
| Mask R-CNN + Sobel filter | 0.769 |
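The function below is an illustrative helper (not our exact generation pipeline) that pastes a mask-selected object into the same image (copy-move) or into another image (splicing), while recording the pixel-level ground-truth mask used for supervision; the COCO loading details are assumed.

```python
import numpy as np

def paste_object(src_img, src_mask, dst_img, offset):
    """Copy the masked object from src_img into dst_img at `offset`."""
    ys, xs = np.nonzero(src_mask)
    dy, dx = offset
    out = dst_img.copy()
    tampered_mask = np.zeros(dst_img.shape[:2], dtype=np.uint8)
    for y, x in zip(ys, xs):
        ty, tx = y + dy, x + dx
        if 0 <= ty < out.shape[0] and 0 <= tx < out.shape[1]:
            out[ty, tx] = src_img[y, x]
            tampered_mask[ty, tx] = 1   # pixel-level ground truth
    return out, tampered_mask

img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
obj = np.zeros((128, 128), dtype=np.uint8)
obj[30:60, 30:60] = 1                    # a toy instance mask
copy_move, gt = paste_object(img, obj, img, offset=(40, 40))   # same image
# For splicing, pass a different destination image instead:
# splice, gt = paste_object(img, obj, other_img, offset=(10, 10))
```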
We compare our method with current state-of-the-art methods on the Cover [24] and Columbia [25] datasets. The Cover dataset [24] focuses on copy-move forgery and conceals tampering artifacts by covering similar objects with the pasted regions. The Columbia dataset [25] focuses on splicing of uncompressed images. Ground-truth masks are provided for both datasets. The CASIA dataset contains more tampered images, but it does not provide the corresponding ground-truth masks, so we do not use it in this paper.
The evaluation metrics are AP (averaged over IoU thresholds), AP50 and AP75 (AP at IoU thresholds of 0.5 and 0.75), and the F1 score (a pixel-level localization metric). AP is evaluated using mask IoU, and the F1 metric is defined as:
$$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FN + FP} \tag{7}$$
where TP is the number of true-positive pixels (tampered pixels correctly classified as tampered), FN is the number of false-negative pixels (tampered pixels incorrectly classified as authentic), and FP is the number of false-positive pixels (authentic pixels incorrectly classified as tampered).
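A small NumPy sketch of this pixel-level F1 computation:

```python
import numpy as np

def pixel_f1(pred, gt):
    """F1 of Eq. (7) from binary predicted and ground-truth tamper masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)     # tampered pixels found
    fp = np.sum(pred & ~gt)    # authentic pixels flagged as tampered
    fn = np.sum(~pred & gt)    # tampered pixels missed
    return 2.0 * tp / (2.0 * tp + fn + fp) if tp else 0.0

pred = np.zeros((64, 64)); pred[10:30, 10:30] = 1
gt = np.zeros((64, 64)); gt[12:32, 12:32] = 1
print(pixel_f1(pred, gt))   # overlap of two shifted squares
```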
We evaluate the improved Mask R-CNN model against existing baseline approaches [19,28,29,30] using the same training and testing split protocols as [31] (for Cover) and [32] (for Columbia).
The average F1 score is calculated for each method according to this metric; the results are shown in Table 2. The proposed method clearly outperforms the existing baselines in F1 score, and adding the Sobel filter further improves the F1 score over the single Mask R-CNN.
| Methods | Columbia | Cover |
| --- | --- | --- |
| ELA [28] | 0.470 | 0.222 |
| NOI1 [29] | 0.574 | 0.269 |
| CFA1 [30] | 0.467 | 0.190 |
| RGB-N [19] | 0.697 | 0.437 |
| Single Mask R-CNN | 0.7405 | 0.530 |
| Mask R-CNN + Sobel filter (proposed) | 0.7825 | 0.612 |
Tampering detection and localization results of the Mask R-CNN with the Edge Agreement Head (using the Sobel edge detection filter) are shown in Figure 5. The first and second rows show detection results for copy-move and splicing tampering respectively; our method produces accurate results for both. With the Edge Agreement Head attached, the network also produces the correct classification for different types of forgery: we set the manipulation classes to splicing and copy-move so that the network learns distinct visual tampering artifacts and features for each class. The detection performance for the two types of tampering is shown in Table 3.
| | Cover | Columbia | Mean |
| --- | --- | --- | --- |
| AP | 0.936 | 0.978 | 0.957 |
We attribute the superiority of the Sobel filter to its structure: since it consists of two kernels, gradient descent can exploit not only the edge strength along the x and y axes but also the edge direction to minimize the total loss. This extra information can speed up training and improve tampering localization accuracy. Table 4 compares AP metrics on the training sets before and after adding the Sobel filter to the network, at 320K, 500K, and 640K training steps.
| Methods | Steps | AP | AP50 | AP75 |
| --- | --- | --- | --- | --- |
| Mask R-CNN + Sobel filter (proposed) | 320K | 73.2 ± 0.09 | 84.3 ± 0.29 | 72.5 ± 0.06 |
| | 500K | 74.5 | 84.9 | 74.4 |
| | 640K | 75.4 | 86.1 | 76.8 |
| Single Mask R-CNN | 320K | 72.6 ± 0.15 | 81.4 ± 0.23 | 71.2 ± 0.11 |
| | 500K | 73.1 | 82.6 | 73.7 |
| | 640K | 73.4 | 83.3 | 74.3 |
Even with longer training, the gap between the single Mask R-CNN and the model with the Edge Agreement Head persists, demonstrating that the additional loss is effective not only in the early stages of training but also in later steps. We observe that the edge contours of the detected tampered areas are more accurate after training with the Edge Agreement Loss; the contrast is shown in Figure 5.
Since tampered images are often attacked by JPEG compression and resizing, we test the robustness of the proposed method and compare it with two other methods in Table 5, which reports F1 scores under JPEG compression / resizing for pairs of JPEG quality factor and resize scale. The results show that our approach is more robust to these attacks than the other methods.
| Method | QF 100 / scale 1.0 | QF 90 / scale 0.9 | QF 70 / scale 0.7 |
| --- | --- | --- | --- |
| ELA [28] | 0.305 / 0.305 | 0.221 / 0.245 | 0.175 / 0.188 |
| NOI1 [29] | 0.347 / 0.347 | 0.261 / 0.275 | 0.230 / 0.244 |
| Our method | 0.633 / 0.633 | 0.562 / 0.580 | 0.543 / 0.564 |
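For illustration, the two attacks can be reproduced with Pillow as in the sketch below; the helper name and the bilinear interpolation choice are our own assumptions for this example, not a specification of the exact test conditions.

```python
import io
from PIL import Image

def attack(image, jpeg_quality=90, resize_factor=0.9):
    """Apply JPEG re-compression, then down-scale by resize_factor."""
    buf = io.BytesIO()
    image.save(buf, format="JPEG", quality=jpeg_quality)
    attacked = Image.open(buf).convert("RGB")
    w, h = attacked.size
    return attacked.resize((int(w * resize_factor),
                            int(h * resize_factor)), Image.BILINEAR)

img = Image.new("RGB", (256, 256), (128, 64, 32))
for q, r in [(100, 1.0), (90, 0.9), (70, 0.7)]:   # columns of Table 5
    print(q, r, attack(img, q, r).size)
```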
In this paper, we analyzed the behavior of the Mask R-CNN network in the early training steps. By examining the predicted masks of the mask branch, we found that they often exhibit blurred boundaries and sometimes do not follow the clear, complete contours of the original tampered-area mask. To improve the accuracy of tampering localization, we introduced a parameter-free network head that applies the Sobel edge detection filter to the masks and computes an L2 loss between the predicted and ground-truth mask contours. Experiments demonstrate the superior performance of the proposed method over other state-of-the-art image tampering detection methods. More features will be explored in future work.
This work is supported by the National Natural Science Foundation of China (Nos. 61370195 and U1536121).
The authors declare no conflict of interest.
[1] | W. Luo, Z. Qu, F. Pan, et al., A survey of passive technology for digital image forensics, FCS, 1 (2007), 166–179. |
[2] | J. G. R. Elwin, T. Aditya and S. M. Shankar, Survey on passive methods of image tampering detection, INCOCC, Erode, 2 (2010), 431–436. |
[3] | G. K. Birajdar and V. H. Mankar, Digital image forgery detection using passive techniques: A survey, Digit. Inve, 10 (2013), 226–245. |
[4] | L. Verdoliva, D. Cozzolino and G. Poggi, A feature-based approach for image tampering detection and localization, WIFS, Atlanta, GA, (2014), 149–154. |
[5] | U. H. Panchal and R. Srivastava, A comprehensive survey on digital image watermarking techniques, ICCSNT, (2015), 591–595. |
[6] | Al-Qershi, M. Osamah and B. E. Khoo, Passive detection of copy-move forgery in digital images: State-of-the-art, Foren. Sci. Int., 23 (2013), 284–295. |
[7] | H. Huang, W. Guo and Y. Zhang, Detection of copy-move forgery in digital images using SIFT algorithm, CIIA, (2008), 272–276. |
[8] | T. Mahmood, A. Irtaza, Z. Mehmood, et al., Copy-move forgery detection through stationary wavelets and local binary pattern variance for forensic analysis in digital images, Forensic Sci. Int., 279 (2017), 8–21. |
[9] | Y. Q. Shi, C. Chen and C. Wen, A natural image model approach to splicing detection, In Proceeding MM&Sec '07 Proceedings of the 9th workshop on Multimedia & security, (2007), 51–62. |
[10] | X. Zhao, S. Wang, S. Li, et al., Passive image-splicing detection by a 2-D noncausal Markov model, IEEE Trans. CSVT, 25 (2015), 185–199. |
[11] | R. Girshick, Fast R-CNN, ICCV, (2015), 1440–1448. |
[12] | B. H. Jawadul and A. K. Roy-Chowdhury, CNN based region proposals for efficient object detection, ICIP, (2016), 3658–3662. |
[13] | B. Zhou, A. Lapedriza, J. Xiao, et al., Learning deep features for scene recognition using places database, ANIPS, 1(2015), 13–20 |
[14] | L. Jonathan, E. Shelhamer and T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intel., 39 (2014), 640–651. |
[15] | J. Chen, X. Kang, Y. Liu, et al., Median filtering forensics based on convolutional neural networks, IEEE Sig. Pro. Lett., 22 (2015), 1849–1853. |
[16] | Y. Qian, J. Dong, W. Wang, et al., Deep learning for steganalysis via convolutional neural networks, Pro. SPIE. ISOE, 94 (2015), 9–14. |
[17] | B. Belhassen and M. C. Stamm, A deep learning approach to universal image manipulation detection using a new convolutional layer, IH&MMSec, (2016), 5–10. |
[18] | R. Yuan and J. Ni, A deep learning approach to detection of splicing and copy-move forgeries in images, WIFS, (2016), 1–6. |
[19] | P. Zhou, X. Han, V. I. Morariu and L. S. Davis, Learning rich features for image manipulation detection, IEEE Conf. Comput. Vis. Pattern Recognit., Salt Lake City, (2018), 1053–1061. |
[20] | K. He, G. Gkioxari, P. Dollár, et al., Mask R-CNN. ICCV, 99 (2017), 1–11. |
[21] | R. S. Zimmermann and J. N. Siems, Faster training of Mask R-CNN by focusing on instance boundaries, arXiv: 1809.07069. |
[22] | C. Lopez-Molina, H. Bustince, J. Fernández, et al., A t-norm based approach to edge detection, IWCANN, Springer, Berlin, Heidelberg, (2009), 302–309. |
[23] | T.Y. Lin, M. Maire, S. Belongie, et al., Microsoft coco: Common objects in context, ECCV, Springer, Cham, (2014), 740–755. |
[24] | B. Wen, Y. Zhu, R. Subramanian, et al., Coverage novel database for copy-move forgery detection. ICIP, (2016), 161–165. |
[25] | Y. F. Hsu and S. F. Chang, Detecting image splicing using geometry invariants and camera characteristics consistency, ICME, (2006), 549–552. |
[26] | L. Jonathan, E. Shelhamer and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 39 (2015), 3431–3440. |
[27] | T.Y. Lin, P. Dollar, R. Girshick, et al., Feature pyramid networks for object detection, CVPR, (2017), 2117–2125. |
[28] | N. Krawetz, A picture's worth, Hacker Factor Solutions, 6 (2007), 1–31. |
[29] | B. Mahdian and S. Saic, Using noise inconsistencies for blind image forensics, Image Vis. Comput., 7 (2009), 1497–1503. |
[30] | P. Ferrara, T. Bianchi, A. De Rosa, et al., Image forgery localization via fine-grained analysis of CFA artifacts, IEEE Trans. Inf. Foren. Secur., 7(2012), 1566–1577. |
[31] | J. H. Bappy, A. K. Roy-Chowdhury, J. Bunk, et al., Exploiting spatial structure for localizing manipulated image regions, ICCV, (2017), 4970–4979. |
[32] | R. Salloum, Y. Ren and C.-C. J. Kuo, Image splicing localization using a multi-task fully convolutional network (MFCN), J. Vis. Com. Image Repre., 51 (2018), 201–209. |