With the rapid development and application of the mobile Internet, mobile traffic must be analyzed and classified to meet users' needs. Because data from some applications are difficult to collect, mobile traffic datasets exhibit a long-tailed distribution, which lowers classification accuracy. In addition, the original GAN is difficult to train and prone to "mode collapse". This paper therefore introduces a self-attention mechanism and gradient normalization into the auxiliary classifier generative adversarial network (ACGAN), forming the SA-ACGAN-GN model, to address both the long-tailed distribution of mobile traffic data and the instability of training. The method first converts the traffic into images. Second, to improve the quality of the generated images, it introduces self-attention into ACGAN to capture global geometric features of the images. Finally, it adds a gradient normalization strategy to SA-ACGAN to further improve data augmentation and training stability. Cross-validation experiments show that, with the same classifier, the proposed SA-ACGAN-GN algorithm achieves the highest precision of all compared algorithms, 93.8%. After gradient normalization is added, the classification loss decreases rapidly during training and the loss curve fluctuates less, indicating that the proposed method both alleviates the long-tail problem of the dataset and improves the stability of model training.
Citation: Xingyu Gong, Ling Jia, Na Li. Research on mobile traffic data augmentation methods based on SA-ACGAN-GN[J]. Mathematical Biosciences and Engineering, 2022, 19(11): 11512-11532. doi: 10.3934/mbe.2022536
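The abstract does not say how traffic is turned into images. A common convention in traffic-classification work, assumed here rather than taken from the paper, is to take the first 784 bytes of a flow or session, zero-pad shorter ones, and reshape the bytes into a 28×28 grayscale image. A minimal sketch under those assumptions (the function name `traffic_to_image` and the 784-byte size are hypothetical):

```python
# Hypothetical sketch: map the raw bytes of one flow/session to a 28x28
# grayscale image. The 784-byte length, zero padding, and [0, 1] scaling
# are assumptions, not details confirmed by the paper.
IMG_SIDE = 28  # assumed image side length (28 * 28 = 784 bytes)


def traffic_to_image(payload: bytes, side: int = IMG_SIDE) -> list:
    """Truncate or zero-pad `payload` to side*side bytes, then reshape row-major."""
    n = side * side
    buf = payload[:n].ljust(n, b"\x00")  # cut long flows, pad short ones with 0x00
    pixels = [byte / 255.0 for byte in buf]  # byte value 0..255 -> intensity 0..1
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    # A few bytes resembling the start of a TLS record, padded to 784 bytes.
    img = traffic_to_image(b"\x16\x03\x01\x02\x00")
    print(len(img), len(img[0]))  # 28 28
```

A fixed length gives every sample the same shape, which the convolutional layers of a GAN generator and discriminator require.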
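For reference, the base ACGAN objective (Odena et al., 2017) that SA-ACGAN-GN builds on combines a source term L_S (real vs. generated) and a class term L_C (correct class label), where X_fake = G(z, c) is generated from noise z and a requested class c. This section does not state how the paper modifies these losses once self-attention and gradient normalization are added:

```latex
\begin{aligned}
L_S &= \mathbb{E}\left[\log P(S=\text{real} \mid X_{\text{real}})\right]
     + \mathbb{E}\left[\log P(S=\text{fake} \mid X_{\text{fake}})\right],\\
L_C &= \mathbb{E}\left[\log P(C=c \mid X_{\text{real}})\right]
     + \mathbb{E}\left[\log P(C=c \mid X_{\text{fake}})\right].
\end{aligned}
```

The discriminator is trained to maximize L_S + L_C and the generator to maximize L_C - L_S. Because the generator is conditioned on c, it can be asked for samples of tail classes specifically, which is how an ACGAN-based method can rebalance a long-tailed dataset.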
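For the self-attention step, a plausible reading, assumed here because the abstract does not give the layer's exact form, is the SAGAN-style block (Zhang et al., Self-Attention GAN): each spatial position j attends to all positions i with weights β_{j,i} = softmax_i(f(x_i)·g(x_j)), and the layer returns y_j = γ·o_j + x_j with o_j = Σ_i β_{j,i} h(x_i). The pure-Python sketch below treats a feature map as a list of N channel vectors; the matrices `Wf`, `Wg`, `Wh` and the scalar `gamma` stand in for learned parameters, and SAGAN's extra output projection is folded into `Wh` for brevity.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats, Wf, Wg, Wh, gamma):
    """SAGAN-style attention over N positions, each a length-C vector.

    Returns y_j = gamma * o_j + x_j, where o_j = sum_i beta[j][i] * h(x_i)
    and beta[j] = softmax over i of f(x_i) . g(x_j).
    """
    f = [matvec(Wf, x) for x in feats]  # f(x_i)
    g = [matvec(Wg, x) for x in feats]  # g(x_j)
    h = [matvec(Wh, x) for x in feats]  # h(x_i), keeps C channels for the residual
    n = len(feats)
    out = []
    for j, x_j in enumerate(feats):
        scores = [sum(a * b for a, b in zip(f[i], g[j])) for i in range(n)]
        beta = softmax(scores)  # attention of position j over every position i
        o_j = [sum(beta[i] * h[i][c] for i in range(n)) for c in range(len(x_j))]
        out.append([gamma * o + x for o, x in zip(o_j, x_j)])  # residual blend
    return out


if __name__ == "__main__":
    # A 2x2 feature map with C = 2 channels, flattened to N = 4 positions.
    fmap = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    for row in self_attention(fmap, eye, eye, eye, gamma=0.1):
        print([round(v, 3) for v in row])
```

Because every position attends to every other, the cost grows as O(N²) in the number of positions; in exchange, the layer can relate distant regions of a traffic image that a small convolution kernel never sees together. SAGAN initializes γ to 0, so training starts from the plain convolutional features and blends in attention gradually.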
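Gradient normalization, as defined by Wu et al. (ICCV 2021), rescales the discriminator output f(x) by its own input gradient: f_hat(x) = f(x) / (||∇_x f(x)|| + |f(x)|). The normalized output satisfies |f_hat(x)| ≤ 1, and the GN paper shows the normalized function is piecewise 1-Lipschitz, so the constraint is enforced without weight clipping or a gradient penalty. The sketch below applies the formula to an arbitrary Python function; it estimates the gradient with central differences as a stand-in for the autograd a real discriminator would use, and the helper names and toy discriminators are hypothetical.

```python
import math


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of grad_x f(x); stands in for autograd."""
    grad = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += eps
        xm[k] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad


def gradient_normalize(f, x):
    """Gradient normalization: f(x) / (||grad_x f(x)|| + |f(x)|)."""
    fx = f(x)
    grad_norm = math.sqrt(sum(g * g for g in numerical_grad(f, x)))
    denom = grad_norm + abs(fx)
    # Guard against 0/0 (flat function with zero output); this check is an
    # addition for the sketch, not part of the published formula.
    return fx / denom if denom > 0 else 0.0


# Toy "discriminators" over a 2-D input (hypothetical examples).
def linear_d(v):
    return 3.0 * v[0] - 2.0 * v[1] + 0.5


def steep_d(v):
    return 100.0 * math.sin(v[0]) * v[1]


if __name__ == "__main__":
    for name, d in [("linear", linear_d), ("steep", steep_d)]:
        raw = d([1.0, 1.0])
        print(name, round(raw, 3), round(gradient_normalize(d, [1.0, 1.0]), 3))
```

The large raw output of `steep_d` is pulled back below 1 in magnitude, which illustrates how GN bounds the discriminator's output regardless of its raw scale.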