The ubiquitous adoption of Android devices has brought a surge in malware threats that compromise user data, privacy, finances, and device integrity. To combat this, numerous efforts have explored automated botnet detection mechanisms, with anomaly-based approaches leveraging machine learning (ML) gaining traction due to their signature-agnostic nature. However, the challenge lies in devising accurate ML models that capture the ever-evolving malware landscape by effectively leveraging the features available in Android application packages (APKs). This paper addressed this problem by proposing, implementing, and evaluating an image-based Android malware detection (AMD) framework built on feature hybridization. The core idea of the framework was the conversion of text-based data extracted from Android APKs into grayscale images. The novelty of this work lay in its image feature extraction strategies and their subsequent hybridization to achieve accurate malware classification using ML models. More specifically, four distinct feature extraction methodologies were employed and hybridized for improved malware identification: Texture and histogram of oriented gradients (HOG) from the spatial domain, and discrete wavelet transform (DWT) and Gabor from the frequency domain. To this end, three image-based datasets, namely, Dex, Manifest, and Composite, derived from the Information Security Centre of Excellence (ISCX) Android Malware dataset, were used to determine the optimal data source for botnet classification. Popular ML classifiers, including naive Bayes (NB), multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), were employed for the classification task.
The experimental results demonstrated the efficacy of the proposed framework, which achieved a peak classification accuracy of 93.03% and a recall of 97.1% with the RF classifier on the Manifest dataset using a combination of Texture and HOG features. These findings validate the proof of concept and provide valuable insights for researchers exploring ML and deep learning (DL) approaches in the domain of AMD.
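To make the idea of feature hybridization concrete, the following minimal sketch converts raw bytes into a grayscale image, extracts one spatial-domain and one frequency-domain feature group, and concatenates them into a single vector. It is an illustration under stated assumptions and not the paper's implementation: the image width, the choice of first-order statistics as the texture descriptor, and the one-level Haar wavelet are not specified in the abstract, and all function names are hypothetical. HOG, Gabor features, and the classifier stage are omitted. The sketch uses only the Python standard library so that it stays self-contained.

```python
import math
from collections import Counter


def bytes_to_gray(data: bytes, width: int = 8) -> list[list[int]]:
    """Reshape bytes into rows of `width` pixels (0-255), zero-padding the last row.

    The width is an assumed parameter; the abstract does not state one.
    """
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def texture_features(img: list[list[int]]) -> list[float]:
    """Spatial domain: first-order statistics (mean, std, histogram entropy)."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())
    return [mean, std, entropy]


def haar_energy_features(img: list[list[int]]) -> list[float]:
    """Frequency domain: energy of the four sub-bands of a one-level 2D Haar DWT.

    Each 2x2 block is mapped to one approximation and three detail
    coefficients; a trailing odd row or column is ignored.
    """
    rows = len(img) - len(img) % 2
    cols = len(img[0]) - len(img[0]) % 2
    energy = [0.0, 0.0, 0.0, 0.0]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            coeffs = (
                (a + b + d + e) / 2,  # approximation (local average)
                (a + b - d - e) / 2,  # difference between the two rows
                (a - b + d - e) / 2,  # difference between the two columns
                (a - b - d + e) / 2,  # diagonal difference
            )
            for k, v in enumerate(coeffs):
                energy[k] += v * v
    return energy


def hybrid_features(data: bytes, width: int = 8) -> list[float]:
    """Hybridization here means concatenating the two feature groups."""
    img = bytes_to_gray(data, width)
    return texture_features(img) + haar_energy_features(img)


features = hybrid_features(bytes(range(64)), width=8)
```

In a full pipeline, one such vector per APK would form a row of the training matrix passed to a classifier such as RF. Because the Haar transform used here is orthonormal, the four sub-band energies sum to the total squared pixel intensity of the image, which gives a convenient sanity check.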
Citation: Abul Bashar. Employing combined spatial and frequency domain image features for machine learning-based malware detection[J]. Electronic Research Archive, 2024, 32(7): 4255-4290. doi: 10.3934/era.2024192