Most existing classification methods for LiDAR point clouds assume that the classes are balanced and do not address class imbalance. Moreover, given the data volume involved, LiDAR point cloud classification is a typical big data classification problem. Building on existing deep network structures and imbalanced sampling methods, this paper proposes an oversampling method based on a stacked autoencoder. The method generates synthetic samples automatically by learning the distribution characteristics of the positive class, which alleviates the imbalance in the training data. It takes only the geometric coordinates and intensity information of the point cloud as the input layer and requires no feature construction or fusion, which reduces the computational complexity. The paper also discusses the influence of the sampling number, the oversampling method and the classifier on the classification results, and evaluates performance with three measures: true positive rate, positive predictive value and accuracy. The results show that the oversampling method based on a stacked autoencoder is suitable for imbalanced LiDAR point cloud classification and improves the classification of the positive class. Combined with an optimized classifier, it greatly improves the classification performance on imbalanced point clouds.
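The three evaluation measures named in the abstract can be computed from the counts of a binary confusion matrix. The following is a minimal sketch and is not taken from the paper: the function name `binary_metrics`, the label encoding (1 for the positive class) and the toy labels are assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return true positive rate, positive predictive value and accuracy.

    Hypothetical helper for illustration; `positive` marks the label of
    the positive (minority) class.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # positive point correctly detected
            else:
                fp += 1  # negative point labelled as positive
        else:
            if truth == positive:
                fn += 1  # positive point missed
            else:
                tn += 1  # negative point correctly rejected

    total = tp + fp + tn + fn
    # Guard against empty denominators (e.g. no positive predictions).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"tpr": tpr, "ppv": ppv, "accuracy": accuracy}


# Toy imbalanced labels (assumed data): 2 positive points, 8 negative points.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)

# A classifier that always predicts the negative class.
all_negative = binary_metrics(y_true, [0] * len(y_true))
```

On these toy labels the always-negative classifier reaches the same accuracy as the first prediction while its true positive rate is zero, which illustrates why accuracy alone is not informative on imbalanced data and why the positive-class measures are reported alongside it.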
Citation: Peng Ren, Qunli Xia. Classification method for imbalanced LiDAR point cloud based on stack autoencoder[J]. Electronic Research Archive, 2023, 31(6): 3453-3470. doi: 10.3934/era.2023175