
Citation: Jing Wang, Jiaohua Qin, Xuyu Xiang, Yun Tan, Nan Pan. CAPTCHA recognition based on deep convolutional neural network[J]. Mathematical Biosciences and Engineering, 2019, 16(5): 5851-5861. doi: 10.3934/mbe.2019292
CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) was introduced to prevent automated programs from flooding websites with malicious requests in a short time and wasting network resources. Major websites now deploy many kinds of CAPTCHA that feature low resolution, heavy noise, deformed characters, and adhesive (touching) characters. A CAPTCHA recognition method can therefore help to verify the security of the various existing forms of CAPTCHA and assist in creating more robust ones. CAPTCHA recognition technology can also be applied to license plate recognition, optical character recognition, handwriting recognition, and similar tasks. Researchers have made considerable progress in this field, and the resulting methods fall into two groups: traditional CAPTCHA recognition methods and methods based on deep learning.
Traditional methods usually locate the region of a single digit or character in an image and identify each character after segmentation. For example, Lu Wang et al. [1] focused on the recognition of merged characters and proposed a method based on the local minimum and minimum projection values. It first divided the fuzzy-bonded characters and then used a convolutional neural network (CNN) to identify each single character, but the recognition rate was only 38%. Yan et al. [2] successfully segmented the Microsoft CAPTCHA and recognized it with multiple classifiers, but the recognition rate was only 60%. Liang Zhang et al. [3] proposed a recognition method based on an LSTM (Long Short-Term Memory) recurrent neural network (RNN). Long Yin et al. [4] suggested an approach based on the dense scale-invariant feature transform (DENSE SIFT) and the random sample consensus algorithm (RANSAC), which had a recognition rate of 88% for simple sticky characters and also worked well on difficult twisted CAPTCHAs. Hao Li et al. [5] proposed a Harris image matching method combining an adaptive threshold and RANSAC. Lingyun Xiang et al. [6] used adaptive binary arithmetic coding to encode English letters and later proposed a novel hashing method called discrete multi-graph hashing [7]. Yuling Liu et al. [8] proposed a valid method for outsourced word segmentation, which saved storage space. Haitao Tang et al. [9] suggested a self-organizing incremental neural network based on PNN-SOINN-RBF; the overall prediction accuracy for single characters on the verification set was 72.75% for the offline model and 50.25% for the online model. Yang Wang et al. [10] proposed a three-color denoising method based on RGB, segmenting characters with a combination of contour difference projection and the water droplet algorithm. This method recognized CAPTCHAs with background noise and distorted, adhesive characters very well. Yishan Chen et al. [11] proposed a method based on traditional digital image morphological processing for the segmentation and recognition of CAPTCHAs, with a recognition rate of 60%. Ye Wang et al. [12] proposed a new adaptive algorithm to denoise and segment CAPTCHA images and used OCR (Optical Character Recognition) and template matching to recognize single characters. Wentao Ma et al. [13] proposed an adaptive median filtering algorithm based on divide and conquer. Jinwei Wang et al. [14] proposed a CQWT-based forensics scheme for color images to distinguish CG and PG images. The above methods inevitably rely on manual data processing, which leads to three problems:
(1) Directly segmenting adhesive CAPTCHA images easily damages the characters and aggravates the training task.
(2) Features extracted from global statistical features or from local feature descriptors of color, texture, and shape cannot accurately represent the images.
(3) Because the data are imbalanced, classifier training results are often far from ideal, and parameter selection makes classifier training considerably harder.
It has been shown that a single feature cannot adequately represent image details [15], and combining global and local features has achieved good performance in image recognition [16]. With the rapid development of artificial intelligence, convolutional neural networks, which use shared convolution kernels, have proved effective for multi-feature extraction and have achieved excellent classification performance on two-dimensional images, with invariance to displacement, scaling, and other forms of distortion. For example, Mingli Wen et al. [17] built a CNN with only five layers, which achieved a recognition accuracy as high as 99%. Peng Yu et al. [18] used AlexNet for CAPTCHA recognition; after 20000 iterations, the recognition accuracy of this model was 99.43%. Zhang et al. [19] improved LeNet-5 and reached a recognition accuracy of 95.2%. Shuren Zhou et al. [20] proposed a traffic sign recognition algorithm based on IVGG. Wei Fang et al. [21] proposed a CNN-based image recognition model that enhances the classification model and effectively improves the accuracy of image recognition. Lv Yanping et al. [22] used a CNN to identify Chinese CAPTCHA images with distortion, rotation, and background noise. Garg and Pollett [23] developed a single neural network capable of breaking all character-based CAPTCHAs. Yunhang Shen et al. [24] proposed a structural model based on multi-scale angles to identify the currently popular Touclick Chinese CAPTCHA images. Wang Fan et al. [25] used the Keras framework to build a CNN that identifies Chinese CAPTCHA images with an accuracy of 92.8%. Feeding CAPTCHA images directly into a trained CNN avoids much of the manual intervention required by traditional methods, such as character segmentation, localization, and denoising. Overall, our main contributions are as follows: (1) we propose a CAPTCHA recognition method based on a deep CNN, which improves the recognition accuracy across different types of CAPTCHA images and gives website operators a convenient way to verify the security of their CAPTCHAs; (2) we design a new DenseNet that effectively reduces memory consumption and shows excellent performance.
In 2017, Gao Huang and Zhuang Liu et al. [26] constructed 4 deep CNNs called DenseNets, which connect every pair of layers in the network with a "skip connection." The input of each layer is thus the concatenation of the outputs of all the preceding layers, unlike a traditional network, in which each layer is connected only to the layer that follows it. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem and effectively reuse the features of all the preceding convolutional layers, which reduces the number of network parameters and yields excellent classification performance.
$$x_l = H_l([x_1, x_2, \ldots, x_{l-1}]) \tag{1}$$
In Eq 1, $x_l$ is the output of the $l$th layer, $H_l$ is the transformation applied by that layer, and $[x_1, x_2, \ldots, x_{l-1}]$ is the tensor formed by concatenating the feature maps of all the preceding convolutional layers, which the $l$th layer receives as input. Therefore, even the last layer can receive the output of the first layer as input. As shown in Figure 1, a given image is input into the DenseNet, and the network predicts the classification result after convolution and pooling in three dense blocks.
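The channel bookkeeping implied by Eq 1 can be sketched in a few lines of Python. This is only an illustration: the growth rate of 32 is the value used later in the paper, while the 64 input channels and the helper names are assumptions made for the example.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Return the number of input channels seen by each layer of a dense block.

    Each layer receives the concatenation of the block input and the outputs
    of all earlier layers, and every earlier layer adds `growth_rate` maps.
    """
    return [initial_channels + growth_rate * i for i in range(num_layers)]


def dense_block_output_channels(initial_channels, growth_rate, num_layers):
    """Channels after concatenating the block input with every layer's output."""
    return initial_channels + growth_rate * num_layers


# Assumed values: 64 input channels (hypothetical), growth rate k = 32, 6 layers.
inputs = dense_block_input_channels(64, 32, 6)
print(inputs)                                  # [64, 96, 128, 160, 192, 224]
print(dense_block_output_channels(64, 32, 6))  # 256
```

The input width grows linearly with depth inside a block, which is why the number of layers per block has a direct effect on memory use.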
Gao Huang and Zhuang Liu designed 4 networks: DenseNet-121, DenseNet-169, DenseNet-201, and DenseNet-264. In all of them, Dense Block 2 contains 12 convolutional groups. However, Ma N et al. [27] showed that convolutional groups increase the complexity of the network and occupy considerable memory. Therefore, we improve the structure of DenseNets and propose a CAPTCHA identification method based on them.
Based on the architecture of the DenseNets, we build a new deep CNN called the DFCR.
First, the original CAPTCHA images, with a size of 224×224, are convolved and pooled to produce feature maps with a size of 56×56.
Next, 4 dense blocks are connected in sequence. In each dense block, the "skip connection" and the sequence BN→ReLU→Conv(1×1)→BN→ReLU→Conv(3×3) are applied between every two layers, and a transition layer follows each of the first three dense blocks. The transition layer consists of BN→Conv(1×1)→AvgPool(2×2) and implements down-sampling, which reduces the dimension of the feature maps and the number of parameters and helps eliminate the computational bottleneck. More importantly, we set the number of bottleneck convolutional groups in Dense Block 2 to 6, in contrast to the 12 used in Gao Huang's DenseNets.
Finally, the feature maps are used directly as confidence maps for the classes. The values in each feature map are averaged, and the average is taken as the confidence value of a class and input into the corresponding softmax layer for classification. The classification layer is composed of global average pooling and softmax, which has fewer parameters and helps prevent overfitting. Since Dataset #1 has 5 characters, we use multi-task classification and attach 5 softmax classifiers. Dataset #2 has 4 characters, so the last fully connected layer is replaced by 4 dense layers. For Dataset #3, which also has 4 characters, we only need to recognize the Chinese character that is randomly rotated by 90°, so the original network design can be maintained. The DFCR's architecture is shown in Table 1. The growth rate k is 32. Note that each "conv" layer shown in the table corresponds to the sequence BN→ReLU→Conv.
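As a rough illustration of the multi-task output described above, the sketch below decodes one CAPTCHA from several softmax heads by taking the most probable class of each head. The alphabet, its ordering, and the helper names are assumptions for the example (the paper does not list the class encoding); the toy probabilities stand in for real network outputs.

```python
import string

# Assumed class list for Dataset #2: 10 digits followed by 26 uppercase letters.
ALPHABET = string.digits + string.ascii_uppercase


def decode_heads(head_probs, alphabet=ALPHABET):
    """Pick the most probable class from each softmax head and join the characters."""
    chars = []
    for probs in head_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        chars.append(alphabet[best])
    return "".join(chars)


def peaked_distribution(index, size, peak=0.9):
    """Build a toy probability vector that peaks at `index` (illustration only)."""
    rest = (1.0 - peak) / (size - 1)
    return [peak if i == index else rest for i in range(size)]


# Four toy heads whose peaks spell "W52S", the example image used in Figure 2.
toy = [peaked_distribution(ALPHABET.index(c), len(ALPHABET)) for c in "W52S"]
print(decode_heads(toy))  # W52S
```

With one head per character position, no explicit character segmentation is needed: each head learns which character sits at its position.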
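Layers | Output Size | Dataset #1 | Dataset #2 | Dataset #3 |
Convolution | 112×112 | 7×7 conv, stride 2 | 7×7 conv, stride 2 | 7×7 conv, stride 2 |
Pooling | 56×56 | 3×3 max pool, stride 2 | 3×3 max pool, stride 2 | 3×3 max pool, stride 2 |
Dense Block (1) | 56×56 | [1×1 conv, 3×3 conv] × 6 | [1×1 conv, 3×3 conv] × 6 | [1×1 conv, 3×3 conv] × 6 |
Transition (1) | 56×56 | 1×1 conv | 1×1 conv | 1×1 conv |
Transition (1) | 28×28 | 2×2 average pool, stride 2 | 2×2 average pool, stride 2 | 2×2 average pool, stride 2 |
Dense Block (2) | 28×28 | [1×1 conv, 3×3 conv] × 6 | [1×1 conv, 3×3 conv] × 6 | [1×1 conv, 3×3 conv] × 6 |
Transition (2) | 28×28 | 1×1 conv | 1×1 conv | 1×1 conv |
Transition (2) | 14×14 | 2×2 average pool, stride 2 | 2×2 average pool, stride 2 | 2×2 average pool, stride 2 |
Dense Block (3) | 14×14 | [1×1 conv, 3×3 conv] × 24 | [1×1 conv, 3×3 conv] × 24 | [1×1 conv, 3×3 conv] × 24 |
Transition (3) | 14×14 | 1×1 conv | 1×1 conv | 1×1 conv |
Transition (3) | 7×7 | 2×2 average pool, stride 2 | 2×2 average pool, stride 2 | 2×2 average pool, stride 2 |
Dense Block (4) | 7×7 | [1×1 conv, 3×3 conv] × 16 | [1×1 conv, 3×3 conv] × 16 | [1×1 conv, 3×3 conv] × 16 |
Classification Layer | 1×1 | 7×7 global average pool | 7×7 global average pool | 7×7 global average pool |
Classification Layer | 1×1 | 5 × 1000D fully-connected, softmax | 4 × 1000D fully-connected, softmax | 1000D fully-connected, softmax |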
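The output sizes in Table 1 can be cross-checked with a short trace of spatial size and channel count. The sketch below is not the authors' code. It assumes a 64-channel stem and a transition compression factor of 0.5 (the usual DenseNet-BC settings, not stated in the text), and that each stride-2 stage halves the spatial size; `trace_dfcr` is a hypothetical helper.

```python
import math


def trace_dfcr(block_layers, input_size=224, growth_rate=32,
               stem_channels=64, compression=0.5):
    """Trace (stage, spatial size, channels) through a DenseNet-style network."""
    size = math.ceil(input_size / 2)      # 7x7 conv, stride 2
    size = math.ceil(size / 2)            # 3x3 max pool, stride 2
    channels = stem_channels              # assumed stem width
    stages = [("stem", size, channels)]
    for i, layers in enumerate(block_layers, start=1):
        channels += growth_rate * layers  # each layer appends k feature maps
        stages.append((f"dense_block_{i}", size, channels))
        if i < len(block_layers):         # no transition after the last block
            channels = int(channels * compression)  # 1x1 conv (assumed 0.5)
            size //= 2                               # 2x2 average pool, stride 2
            stages.append((f"transition_{i}", size, channels))
    return stages


# Block sizes taken from Table 1.
for stage in trace_dfcr((6, 6, 24, 16)):
    print(stage)
```

Under these assumptions the trace ends at 7×7 with 976 channels for the (6, 6, 24, 16) layout of Table 1, and at 1024 channels for DenseNet-121's (6, 12, 24, 16) layout. The 784-dimensional feature that the paper reports for Datasets #1 and #2 would correspond to a (6, 6, 12, 16) layout under the same assumptions; the text does not state this, so it should be read as an observation rather than a confirmed configuration.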
Figure 2 shows how the DFCR we built identifies the image "W52S." First, 224×224 images are input to the network directly. They then pass through a convolution layer and a max pooling layer, followed by 4 dense blocks with 3 transition blocks, which produces 7×7 feature maps. For Dataset #2 in particular, we place 4 softmax layers at the end of the network. Panel (a) illustrates the "skip connection," in which the nth layer is directly connected to the mth layer: the nth layer outputs k1 feature maps and the mth layer's convolution produces k2 feature maps, so the mth layer outputs (k1+k2) feature maps. Panel (b) shows that each dense block has a different number of bottleneck layers. Panel (c) shows that an average pooling layer is used in the modified transition block.
In this paper, we used three types of CAPTCHA images provided by the organizing committee of the 9th China University Student Service Outsourcing Innovation and Entrepreneurship Competition, with each type consisting of 15000 CAPTCHA images. For each type, we randomly selected 8000 images for training, 2000 for validation, and 5000 for testing. The characteristics of the three types are as follows. Dataset #1 is a five-character CAPTCHA randomly composed of the 10 digits and the 26 English letters in upper and lower case, without slant. Dataset #2 is a four-character CAPTCHA randomly composed of the 10 digits and the 26 uppercase English letters, with skew, noise, and irregular curves. Dataset #3 is a four-character CAPTCHA composed of Chinese characters, one of which is randomly rotated by 90°. The recognition difficulty of the three types therefore increases successively. Table 2 shows examples of the three types.
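A reproducible random split with the sizes given above might look like the following sketch. The file names, the fixed seed, and the function name are hypothetical; the paper does not describe how the random selection was implemented.

```python
import random


def split_dataset(items, n_train=8000, n_val=2000, n_test=5000, seed=0):
    """Shuffle `items` reproducibly and cut them into train/validation/test lists."""
    if len(items) != n_train + n_val + n_test:
        raise ValueError("split sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state alone
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 15000 images of one dataset.
names = [f"captcha_{i:05d}.png" for i in range(15000)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))  # 8000 2000 5000
```

Fixing the seed keeps the three subsets disjoint and identical across runs, so that different networks are compared on the same test images.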
Table 2. Sample CAPTCHA images for Dataset #1, Dataset #2, and Dataset #3 (two samples per type; images omitted).
Our experiments ran on the Windows 10 operating system with an Intel(R) Core(TM) i5-8400 processor and a GTX 1060 GPU, and were implemented in Keras. Keras is a high-level neural network API that is modular, minimal, and extensible.
In our experiments, we first resized the CAPTCHA images to 224×224 and converted them to the TFRecord format. All the networks were trained using stochastic gradient descent (SGD) with an initial learning rate α = 0.001. Limited by the GPU memory, we set the batch size to 16 and trained for 100 epochs.
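The training budget implied by these settings follows from simple arithmetic. The sketch below assumes that every one of the 8000 training images is seen once per epoch, which the text does not state explicitly; `training_steps` is a hypothetical helper.

```python
import math


def training_steps(num_samples, batch_size, epochs):
    """Return (steps per epoch, total SGD updates) for a fixed-size training set."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 8000 training images, batch size 16, 100 epochs (values from the text).
print(training_steps(8000, 16, 100))  # (500, 50000)
```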
Figure 3 shows the training accuracy (1) and the loss value (2) on Dataset #1 over 100 epochs; the solid lines are the 5 classifiers of the DFCR, and the dotted lines are those of DenseNet-121. Although the training accuracy of the DFCR is lower than that of DenseNet-121 at the beginning, it rises faster in the subsequent iterations, especially in the third epoch, and reaches 98.6%. This suggests that reducing the number of convolutional groups makes the model easier to train and accelerates convergence. The loss behaves similarly: the DFCR loss falls fastest in the first three epochs and is largely stable after 6 epochs, whereas the DenseNet-121 loss falls fastest in the first 6 epochs and is almost stable only after 11 epochs.
Figure 4 shows the memory consumption and the training duration for 100 epochs. The memory consumption of DenseNet-121 during training is close to 80%, whereas that of the DFCR we built is only about 60%, and the training time is reduced by nearly 3 hours. Thus, the DFCR reduces both memory consumption and training time.
We compare the CAPTCHA identification accuracy and the parameters of the DFCR with those of ResNet-50 and DenseNet-121. The three 5000-image CAPTCHA test sets, in TFRecord format, are input to the best trained model, the recognition accuracy is computed against the existing labels, and the results are recorded in Tables 3 and 4.
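The paper does not say whether accuracy is counted per CAPTCHA or per character, so the sketch below shows both definitions side by side. The function names and the toy predictions are made up for illustration only.

```python
def captcha_accuracy(predictions, labels):
    """Fraction of CAPTCHAs whose predicted string matches the label exactly."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def character_accuracy(predictions, labels):
    """Fraction of individual character positions predicted correctly."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(len(t) for t in labels)
    correct = sum(pc == tc
                  for p, t in zip(predictions, labels)
                  for pc, tc in zip(p, t))
    return correct / total


# Toy data: one of four CAPTCHAs has a single wrong character.
preds = ["W52S", "A1B2", "ZZ9Q", "7K3M"]
truth = ["W52S", "A1B2", "ZZ9O", "7K3M"]
print(captcha_accuracy(preds, truth))    # 0.75
print(character_accuracy(preds, truth))  # 0.9375
```

The whole-string measure is the stricter of the two, since a single wrong character makes the entire CAPTCHA count as a failure.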
Model | Dataset #1 validation (2000) | Dataset #1 test (5000) | Dataset #2 validation (2000) | Dataset #2 test (5000) | Dataset #3 validation (2000) | Dataset #3 test (5000) |
ResNet-50 | 99.70% | 95.34% | 99.95% | 99.90% | 99.95% | 99.86% |
DenseNet-121 | 99.80% | 95.40% | 99.95% | 99.90% | 100% | 99.92% |
DFCR | 99.80% | 99.60% | 100% | 99.96% | 100% | 99.94% |
Model | Dataset #1 total params | Dataset #1 depth | Dataset #1 dimension | Dataset #2 total params | Dataset #2 depth | Dataset #2 dimension | Dataset #3 total params | Dataset #3 depth | Dataset #3 dimension |
ResNet-50 | 23966777 | 177 | 2048 | 23890964 | 177 | 2048 | 23595908 | 177 | 2048 |
DenseNet-121 | 7227129 | 428 | 1024 | 7189204 | 428 | 1024 | 7041604 | 428 | 1024 |
DFCR | 3781833 | 302 | 784 | 3752788 | 302 | 784 | 5919940 | 386 | 976 |
As Tables 3 and 4 show, the DFCR has better recognition accuracy than the fine-tuned ResNet-50 and DenseNet-121. On Dataset #1 in particular, the test accuracy of the DFCR is 4.2 percentage points higher than that of DenseNet-121 (99.60% versus 95.40%). In addition, the total parameters and feature dimensions of ResNet-50 are several times larger than ours, which adds much difficulty to subsequent data processing. On Datasets #1 and #2, the DFCR has roughly half as many parameters as DenseNet-121; on Dataset #3 the reduction is smaller (about 5.9 million versus 7.0 million). The dimension of the feature map is reduced as well, and the overall training time drops by several hours. These results indicate that mechanically deepening a network does not by itself yield excellent classification; in practical applications, a neural network needs to be designed for the specific data.
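The comparisons in this paragraph can be recomputed directly from the reported figures. The sketch below only re-uses the parameter totals from Table 4 and the Dataset #1 test accuracies from Table 3; the dictionary layout and the helper name are choices made for the example.

```python
# Total parameter counts as reported in Table 4.
TOTAL_PARAMS = {
    "ResNet-50":    {"Dataset #1": 23966777, "Dataset #2": 23890964, "Dataset #3": 23595908},
    "DenseNet-121": {"Dataset #1": 7227129,  "Dataset #2": 7189204,  "Dataset #3": 7041604},
    "DFCR":         {"Dataset #1": 3781833,  "Dataset #2": 3752788,  "Dataset #3": 5919940},
}


def param_ratio(model, baseline, dataset):
    """Parameters of `model` divided by parameters of `baseline` on one dataset."""
    return TOTAL_PARAMS[model][dataset] / TOTAL_PARAMS[baseline][dataset]


for dataset in ("Dataset #1", "Dataset #2", "Dataset #3"):
    vs_densenet = param_ratio("DFCR", "DenseNet-121", dataset)
    vs_resnet = param_ratio("ResNet-50", "DFCR", dataset)
    print(f"{dataset}: DFCR/DenseNet-121 = {vs_densenet:.3f}, "
          f"ResNet-50/DFCR = {vs_resnet:.2f}")

# Test-accuracy gap on Dataset #1 (Table 3), in percentage points.
gap = round(99.60 - 95.40, 2)
print(gap)  # 4.2
```

The ratio to DenseNet-121 is close to one half on Datasets #1 and #2 and noticeably higher on Dataset #3, where the reported DFCR is deeper and has more parameters.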
Table 5 visualizes the features learned for the CAPTCHA image "YEqKX." Specifically, we reconstruct the features of each convolutional layer and output a fixed feature. Even if the same input picture is transformed to some degree, the output can remain unchanged, which also indicates that CNNs are strongly robust.
Table 5. Feature-map visualizations of DenseNet-121 and DFCR at the conv1/relu, conv2_block4_1_relu, and pool2_relu layers (images omitted).
We visualize the feature maps of the conv1/relu, conv2_block4_1_relu, and pool2_relu layers and superimpose them across channels to obtain the visualizations in Table 5. Compared with DenseNet-121, the DFCR we built represents the output features more strongly at the same layer. In the output of the pool2_relu layer in particular, the feature profile of the DFCR is more concrete than that of DenseNet-121.
Although there are various kinds of CAPTCHAs, text-based CAPTCHAs are the most widely applied. On the one hand, they are convenient and user-friendly for website users; on the other hand, they are a low-cost solution for websites. However, text CAPTCHAs are known to be vulnerable and not as secure as expected. We therefore aim to design text CAPTCHAs with higher security and better usability.
Attacking CAPTCHAs to expose their weaknesses is one of the most effective ways to improve their security, and deep CNNs provide a more robust and useful tool for doing so. Overall, using deep learning techniques to enhance the security of CAPTCHAs is a promising direction. In this paper, we constructed a deep CNN, which we refer to as DFCR, and compared its effectiveness with that of ResNet-50 and DenseNet-121. The experimental results showed that the DFCR retained the compelling advantages of DenseNets and encouraged feature reuse. On the one hand, memory consumption was greatly reduced; on the other hand, the DFCR recognized CAPTCHAs better than the other networks. Unlike traditional methods, we used end-to-end learning to identify CAPTCHAs directly from the pixel image, which largely avoided manual intervention, reduced the complexity of model training, and helped prevent overfitting. We also found that the recognition difficulty of the three types of CAPTCHA images increases successively, which suggests that CAPTCHA images could be designed by rotating multiple Chinese characters. Whether other CAPTCHA alternatives are robust and whether new CAPTCHA designs can be secure remain open problems and are part of our ongoing work.
This work is supported by the National Natural Science Foundation of China (No.61772561), the Key Research & Development Plan of Hunan Province (No.2018NK2012), the Science Research Projects of Hunan Provincial Education Department (No.18A174, 18C0262), and the Science & Technology Innovation Platform and Talent Plan of Hunan Province (No.2017TP1022).
All authors declare no conflicts of interest in this paper.
[1] | L. Wang, R. Zhang and D. Yin, Image verification code identification of hyphen, Comput. Eng. Appl., 28 (2011), 150–153. |
[2] | J. Yan and A. S. E. Ahmad, A low-cost attack on a Microsoft CAPTCHA, Proceedings of the ACM Conference on Computer and Communications Security, (2008), 543–554. |
[3] | L. Zhang, S. W. Huang, Z. X. Shi, et al., CAPTCHA recognition method based on LSTM RNN, Pattern Recogn., 1 (2011), 40–47. |
[4] | L. Yin, D. Yin and R. Zhang, A recognition method of twisted and pasted character verification code, Pattern Recogn., 3 (2014), 235–241. |
[5] | H. Li, J. H. Qin and X. Y. Xiang, An efficient image matching algorithm based on adaptive threshold and RANSAC, IEEE Access, 6 (2018), 66963–66971. |
[6] | L. Y. Xiang, Y. Li and W. Hao, Reversible natural language watermarking using synonym substitution and arithmetic coding, Comput. Mat. Con., 3 (2018), 541–559. |
[7] | L. Y. Xiang, X. B. Shen, J. H. Qin, et al., Discrete multi-graph hashing for large-scale visual search, Neur. Process. Lett., 49 (2019), 1055–1069. |
[8] | Y. L. Liu, H. Peng and J. Wang, Verifiable diversity ranking search over encrypted outsourced Data, Comput. Mater. Con., 1 (2018), 37–57. |
[9] | H. T. Tang, Verification code recognition model and algorithm of self-organizing incremental neural network, MA thesis, Guangdong University of technology, 2016. |
[10] | Y. Wang, Y. Q. Xu and Y. B. Peng, Verification code identification of xiaonei network based on KNN technology, Comput. Moder., 2 (2017), 93–97. |
[11] | Y. S. Chen and Y. Zhang, Design and implementation of character-based image verification code recognition algorithm, Comput. K. T., 1 (2017), 190–192. |
[12] | Y. Wang and M. Lu, A self-adaptive algorithm to defeat text-based CAPTCHA, IEEE International Conference on Industrial Technology, (2016), 720–725. |
[13] | W. T. Ma, J. H. Qin, X. Y. Xiang, et al., Adaptive median filtering algorithm based on divide and conquer and its application in CAPTCHA recognition, Comput. Mater. Con., 58 (2019), 665–677. |
[14] | J. W. Wang, T. Li and X. Y. Luo, Identifying computer generated images based on quaternion central moments in color quaternion wavelet domain, IEEE T. Circ. Syst. Vid., 1 (2018), 1. |
[15] | X. W. Liu, L. Wang, Jian Zhang, et al., Global and local structure preservation for feature selection, IEEE T. Neur. Net. Lear., 25 (2014), 1083–1095. |
[16] | J. H. Qin, H. Li, X. Y. Xiang, et al., An encrypted image retrieval method based on Harris corner optimization and LSH in cloud computing, IEEE Access, 17 (2019), 24626–24633. |
[17] | M. L. Wen, X. Zhao, M. Q. Cai, et al., End-to-end verification code recognition based on deep learning, Wireless Inter. technol., 14 (2017), 85–86. |
[18] | Y. Peng, Research on verification code recognition based on deep convolutional neural network, Commu. world, 1 (2018), 66–67. |
[19] | Z. Zhang, S. F. Wang and L. Dong, Verification code recognition based on deep learning, J. hubei univ. technol., 2 (2018), 5–11. |
[20] | S. R. Zhou, W. L. Liang, J. G. Li, et al., Improved VGG model for road traffic sign recognition, Comput. Mat. Con., 1 (2018), 11–24. |
[21] | W. Fang, F. H. Zhang and V. S. Sheng, A method for improving CNN-based image recognition using DCGAN, Comput. Mat. Con., 1 (2018), 167–178. |
[22] | Y. P. Lv, F. P. Cai, D. Z. Lin, et al., Chinese character CAPTCHA recognition based on convolution-neural network, Proceedings of the IEEE Congress on Evolutionary Computation, (2016), 4854–4859. |
[23] | G. Garg and C. Pollett, Neural network CAPTCHA crackers, Proceedings of the Future Technologies Conference, (2016), 853–861. |
[24] | Y. H. Shen, R. G. Ji and D. L. Cao, Hacking Chinese touclick CAPTCHA by multiscale corner structure model with fast pattern matching, Proceedings of the ACM International Conference on Multimedia, (2014), 853–856. |
[25] | W. Fan, J. G. Han, Fan Gou, et al., Chinese character verification code recognition by convolutional neural network, Comput. Eng. Appl., 3 (2018), 160–165. |
[26] | G. Huang, Z. Liu, L. V. D Maaten, et al., Densely connected convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 2261–2269. |
[27] | N. Ma, X. Zhang, H. T. Zheng, et al., ShuffleNet V2: practical guidelines for efficient CNN architecture design, Computer Vision and Pattern Recognition, preprint, arXiv:1807.11164. |
40. | Pu Tian, Weixian Liao, Turhan Kimbrough, Erik Blasch, Wei Yu, 2022, Chapter 2, 978-3-031-09144-5, 17, 10.1007/978-3-031-09145-2_2 | |
41. | Ajay Sudhir Bale, S. Saravana Kumar, M. S. Kiran Mohan, N. Vinay, 2022, Chapter 15, 978-3-030-75944-5, 281, 10.1007/978-3-030-75945-2_15 | |
42. | Lingyun Xiang, Guohan Zhao, Qian Li, Gwang-jun Kim, Osama Alfarraj, Amr Tolba, A Fast and Effective Multiple Kernel Clustering Method on Incomplete Data, 2021, 67, 1546-2226, 267, 10.32604/cmc.2021.013488 | |
43. | Daniel Aguilar, Daniel Riofrio, Diego Benitez, Noel Perez, Ricardo Flores Moyano, 2021, Text-based CAPTCHA Vulnerability Assessment using a Deep Learning-based Solver, 978-1-6654-4141-4, 1, 10.1109/ETCM53643.2021.9590750 | |
44. | Denis O. Ishkov, Valery I. Terekhov, 2022, Text CAPTCHA Traversal with ConvNets: Impact of Color Channels, 978-1-6654-1434-0, 1, 10.1109/REEPE53907.2022.9731423 | |
45. | Zhangdong Wang, Jiaohua Qin, Xuyu Xiang, Yun Tan, Neal N. Xiong, Criss-Cross Attentional Siamese Networks for Object Tracking, 2022, 73, 1546-2226, 2931, 10.32604/cmc.2022.028896 | |
46. | Lingyun Xiang, Guoqing Guo, Qian Li, Chengzhang Zhu, Jiuren Chen, Haoliang Ma, Spam Detection in Reviews Using LSTM-Based Multi-Entity Temporal Features, 2020, 26, 1079-8587, 1375, 10.32604/iasc.2020.013382 | |
47. | Minghui Wei, Jingjing Tang, Haotian Tang, Rui Zhao, Xiaohui Gai, Renying Lin, Zhihan Lv, Adoption of Convolutional Neural Network Algorithm Combined with Augmented Reality in Building Data Visualization and Intelligent Detection, 2021, 2021, 1099-0526, 1, 10.1155/2021/5161111 | |
48. | Tang Xiaohui, An adaptive genetic algorithm-based background elimination model for English text, 2022, 26, 1432-7643, 8133, 10.1007/s00500-022-07204-7 | |
49. | Yao Wang, Yuliang Wei, Yifan Zhang, Chuhao Jin, Guodong Xin, Bailing Wang, Few-shot learning in realistic settings for text CAPTCHA recognition, 2023, 0941-0643, 10.1007/s00521-023-08262-0 | |
50. | Valery Terekhov, Valery Chernenky, Denis Ishkov, 2022, Chapter 10, 978-3-031-15167-5, 111, 10.1007/978-3-031-15168-2_10 | |
51. | Zongshuai Liu, Xuyu Xiang, Jiaohua Qin, Qin Zhang, Neal N. Xiong, Image Recognition of Citrus Diseases Based on Deep Learning, 2020, 66, 1546-2226, 457, 10.32604/cmc.2020.012165 | |
52. | Ke Qing, Rong Zhang, 2022, An Efficient ConvNet for Text-based CAPTCHA Recognition, 979-8-3503-3242-1, 1, 10.1109/ISPACS57703.2022.10082852 | |
53. | Wan Xing, Mohd Rizman Sultan Mohd, Juliana Johari, Fazlina Ahmat Ruslan, 2023, A Review on Text-based CAPTCHA Breaking Based on Deep Learning Methods, 979-8-3503-2903-2, 171, 10.1109/CEDL60560.2023.00040 | |
54. | Soumen Sinha, Mohammed Imaz Surve, 2023, CAPTCHA Recognition And Analysis Using Custom Based CNN Model - Capsecure, 979-8-3503-0060-4, 244, 10.1109/ICETCI58599.2023.10331187 | |
55. | Yaoting Li, Haixia Pan, Huolong Ye, Jiayu Zheng, 2023, Transformer Encoder for Efficient CAPTCHA Recognize, 979-8-3503-3144-8, 355, 10.1109/CBASE60015.2023.10439128 | |
56. | 2024, Redefining Security: Unveiling the Vulnerabilities of Captcha Mechanisms Using Deep Learning, 979-8-3503-0661-3, 1, 10.1109/ESCI59607.2024.10497359 | |
57. | Xuyu Xiang, Yang Tan, Jiaohua Qin, Yun Tan, Advancements and challenges in coverless image steganography: A survey, 2025, 228, 01651684, 109761, 10.1016/j.sigpro.2024.109761 | |
58. | Tao Li, Jiawei Yang, Chenxi Li, Lulu Lv, Kang Liu, Zhipeng Yuan, Youyong Li, Hongqing Yu, 2024, Chapter 4, 978-3-031-52215-4, 41, 10.1007/978-3-031-52216-1_4 | |
59. | Ashutosh Thakur, Priya Singh, 2024, Chapter 33, 978-981-99-6754-4, 431, 10.1007/978-981-99-6755-1_33 | |
60. | Tong Ji, Yuxin Luo, Yifeng Lin, Yuer Yang, Qian Zheng, Siwei Lian, Junjie Li, ImageVeriBypasser: An image verification code recognition approach based on Convolutional Neural Network, 2024, 41, 0266-4720, 10.1111/exsy.13658 | |
61. | Xing Wan, Juliana Johari, Fazlina Ahmat Ruslan, Variational Color Shift and Auto-Encoder Based on Large Separable Kernel Attention for Enhanced Text CAPTCHA Vulnerability Assessment, 2024, 15, 2078-2489, 717, 10.3390/info15110717 | |
62. | Xing Wan, Juliana Johari, Fazlina Ahmat Ruslan, Adaptive CAPTCHA: A CRNN-Based Text CAPTCHA Solver with Adaptive Fusion Filter Networks, 2024, 14, 2076-3417, 5016, 10.3390/app14125016 | |
63. | Matine Hajyan, Alireza Hosseni, Ramin Toosi, Mohammad Ali Akhaee, 2023, Farsi CAPTCHA Recognition Using Attention-Based Convolutional Neural Network, 979-8-3503-9969-1, 221, 10.1109/ICWR57742.2023.10139078 | |
64. | N Valarmathi, M Bharathi Kannan, M Elankumaran, M Gowri Shankar, B Gowtham, 2023, Fruit Disease Prediction with Fertilizer Recommendation for Citrus Family using Deep Learning, 978-1-6654-9199-0, 113, 10.1109/ICSCDS56580.2023.10104601 | |
65. | Xiaodong Wei, Zhichun Wang, Xuebin Chen, Hari Mohan Srivastava, 2023, Research on ship identification based on VGG network and millimeter wave radar, 9781510667600, 182, 10.1117/12.2686259 | |
66. | Hongfeng Niu, Ang Wei, Yunpeng Song, Zhongmin Cai, Exploring visual representations of computer mouse movements for bot detection using deep learning approaches, 2023, 229, 09574174, 120225, 10.1016/j.eswa.2023.120225 | |
67. | Mikołaj Wysocki, Henryk Gierszal, Piotr Tyczka, George Pantelis, Sophia Karagiorgou, 2024, Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification, 979-8-3503-6248-0, 2846, 10.1109/BigData62323.2024.10826049 | |
68. | N Kopperundevi, P. SaiTejeswarreddy, Malladi Revanth, M Gautham, 2024, Captcha Recognition Using CNN, 979-8-3503-5421-8, 1, 10.1109/ASIANCON62057.2024.10838000 |
Layers | Output Size | Dataset #1 | Dataset #2 | Dataset #3 |
Convolution | 112×112 | 7×7 conv, stride 2 | | |
Pooling | 56×56 | 3×3 max pool, stride 2 | | |
Dense Block (1) | 56×56 | (1×1 conv, 3×3 conv) × 6 | | |
Transition (1) | 56×56 | 1×1 conv | | |
 | 28×28 | 2×2 average pool, stride 2 | | |
Dense Block (2) | 28×28 | (1×1 conv, 3×3 conv) × 6 | | |
Transition (2) | 28×28 | 1×1 conv | | |
 | 14×14 | 2×2 average pool, stride 2 | | |
Dense Block (3) | 14×14 | (1×1 conv, 3×3 conv) × 24 | | |
Transition (3) | 14×14 | 1×1 conv | | |
 | 7×7 | 2×2 average pool, stride 2 | | |
Dense Block (4) | 7×7 | (1×1 conv, 3×3 conv) × 16 | | |
7×7 global average pool | | | | |
Classification Layer | 1×1 | 5 × 1000D fully-connected, softmax | 4 × 1000D fully-connected, softmax | 1000D fully-connected, softmax |
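The Output Size column above can be checked with the standard output-size formula for convolution and pooling layers. The sketch below is only illustrative: the 224×224 input and the padding values are assumptions (they are the usual DenseNet choices and are not stated in the table), and `conv_out` and `trace_sizes` are hypothetical helper names.

```python
def conv_out(size, kernel, stride, padding):
    """Side length after a square conv/pool layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# ASSUMPTION: 224x224 input and these paddings are not given in the table;
# they are common DenseNet settings that reproduce the listed sizes.
INPUT_SIZE = 224

# (layer name, kernel, stride, padding) for every layer that changes
# the spatial size. Dense blocks and 1x1 convs keep the size unchanged.
STAGES = [
    ("Convolution",         7, 2, 3),
    ("Pooling",             3, 2, 1),
    ("Transition (1) pool", 2, 2, 0),
    ("Transition (2) pool", 2, 2, 0),
    ("Transition (3) pool", 2, 2, 0),
]

def trace_sizes(input_size=INPUT_SIZE):
    """Return [(layer name, output side length)] in network order."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in STAGES:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

if __name__ == "__main__":
    for name, size in trace_sizes():
        print(f"{name}: {size}x{size}")
```

Under these assumptions the trace passes through 112, 56, 28, 14 and 7, matching the table; the final 7×7 global average pool then reduces each feature map to 1×1.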
Type | Sample 1 | Sample 2 |
Dataset #1 | ![]() | ![]() |
Dataset #2 | ![]() | ![]() |
Dataset #3 | ![]() | ![]() |
Model | Dataset #1, validation set (2000) | Dataset #1, test set (5000) | Dataset #2, validation set (2000) | Dataset #2, test set (5000) | Dataset #3, validation set (2000) | Dataset #3, test set (5000) |
ResNet50 | 99.70% | 95.34% | 99.95% | 99.90% | 99.95% | 99.86% |
DenseNet-121 | 99.80% | 95.40% | 99.95% | 99.90% | 100% | 99.92% |
DFCR | 99.80% | 99.60% | 100% | 99.96% | 100% | 99.94% |
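To make the test-set percentages easier to compare, they can be converted into absolute error counts. This is a small sketch, assuming each of the 5000 test images is scored as simply right or wrong; `misclassified` is a hypothetical helper, and the accuracy values are copied from the table.

```python
TEST_SET_SIZE = 5000  # test-set size given in the table header

# Test-set accuracy (%) for Dataset #1, #2, #3, copied from the table.
TEST_ACCURACY = {
    "ResNet50":     (95.34, 99.90, 99.86),
    "DenseNet-121": (95.40, 99.90, 99.92),
    "DFCR":         (99.60, 99.96, 99.94),
}

def misclassified(acc_percent, n=TEST_SET_SIZE):
    """Number of wrongly recognised images implied by an accuracy in percent."""
    return round(n * (100.0 - acc_percent) / 100.0)

# Map each model to its error counts on Dataset #1, #2, #3.
ERRORS = {
    model: tuple(misclassified(a) for a in accs)
    for model, accs in TEST_ACCURACY.items()
}

if __name__ == "__main__":
    for model, counts in ERRORS.items():
        print(model, counts)
```

On Dataset #1, for example, the reported accuracies correspond to 233 errors for ResNet50, 230 for DenseNet-121 and 20 for DFCR.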
Model | Dataset #1, total params | Dataset #1, depth | Dataset #1, dimension | Dataset #2, total params | Dataset #2, depth | Dataset #2, dimension | Dataset #3, total params | Dataset #3, depth | Dataset #3, dimension |
ResNet50 | 23966777 | 177 | 2048 | 23890964 | 177 | 2048 | 23595908 | 177 | 2048 |
DenseNet-121 | 7227129 | 428 | 1024 | 7189204 | 428 | 1024 | 7041604 | 428 | 1024 |
DFCR | 3781833 | 302 | 784 | 3752788 | 302 | 784 | 5919940 | 386 | 976 |
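The parameter counts above can be turned into relative model sizes with a few lines of arithmetic. The sketch below uses only the totals listed in the table; `relative_size` is a hypothetical helper name.

```python
# Total parameter counts for Dataset #1, #2, #3, copied from the table.
TOTAL_PARAMS = {
    "ResNet50":     (23966777, 23890964, 23595908),
    "DenseNet-121": (7227129, 7189204, 7041604),
    "DFCR":         (3781833, 3752788, 5919940),
}

def relative_size(model, baseline):
    """Per-dataset ratio of `model` parameters to `baseline` parameters."""
    return tuple(
        m / b for m, b in zip(TOTAL_PARAMS[model], TOTAL_PARAMS[baseline])
    )

DFCR_VS_DENSENET = relative_size("DFCR", "DenseNet-121")
DFCR_VS_RESNET = relative_size("DFCR", "ResNet50")

if __name__ == "__main__":
    for name, ratios in [("DenseNet-121", DFCR_VS_DENSENET),
                         ("ResNet50", DFCR_VS_RESNET)]:
        print(name, [f"{r:.1%}" for r in ratios])
```

By this measure DFCR has roughly half the parameters of DenseNet-121 on Datasets #1 and #2 and roughly 84% on Dataset #3, where the table lists a deeper DFCR variant (depth 386 instead of 302).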
Layers | DenseNet-121 | DFCR |
conv1/relu | ![]() | ![]() |
conv2_block4_1_relu | ![]() | ![]() |
pool2_relu | ![]() | ![]() |