Research article Special Issues

Evading obscure communication from spam emails


  • Received: 02 October 2021 Accepted: 28 November 2021 Published: 22 December 2021
  • Spam is any form of annoying, unsolicited digital communication sent in bulk; it may contain offensive content and can spread viruses and cyber-attacks. The sharp increase in spam has necessitated more reliable and robust artificial intelligence-based anti-spam filters. Besides text, an email sometimes contains multimedia content such as audio, video, and images. However, text-centric spam filtering based on text classification techniques remains today's preferred choice. In this paper, we show that text pre-processing techniques nullify the detection of malicious content in an obscure communication framework. We use the SpamAssassin corpus with and without text pre-processing and examine it using machine learning (ML) and deep learning (DL) algorithms to classify emails as ham or spam. The proposed DL-based approach consistently outperforms the ML models. In the first stage, with pre-processing, the long short-term memory (LSTM) model achieves the highest results: 93.46% precision, 96.81% recall, and 95% F1-score. In the second stage, without pre-processing, LSTM achieves the best results: 95.26% precision, 97.18% recall, and 96% F1-score. The results show that DL algorithms outperform the standard ML algorithms in filtering spam. However, the results are unsatisfactory for detecting encrypted communication with both kinds of algorithms.
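The F1-scores quoted in the abstract can be cross-checked against the reported precision and recall. The following is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract for LSTM.
reported = {
    "with pre-processing": (93.46, 96.81),
    "without pre-processing": (95.26, 97.18),
}

for stage, (precision, recall) in reported.items():
    print(f"LSTM {stage}: F1 = {f1_score(precision, recall):.2f}%")
```

Under this definition, the two stages give about 95.11% and 96.21%, which round to the reported 95% and 96%.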

    Citation: Khan Farhan Rafat, Qin Xin, Abdul Rehman Javed, Zunera Jalil, Rana Zeeshan Ahmad. Evading obscure communication from spam emails. Mathematical Biosciences and Engineering, 2022, 19(2): 1926-1943. doi: 10.3934/mbe.2022091



  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)