Research article

Group feature screening based on Gini impurity for ultrahigh-dimensional multi-classification

  • Received: 04 September 2022; Revised: 17 November 2022; Accepted: 22 November 2022; Published: 05 December 2022
  • MSC: 62H30, 62R07

  • Because most model-free feature screening methods focus on individual predictors, they cannot account for structured predictors such as grouped variables. In this study, we propose a model-free group screening method for classification models that directly extends the original sure independence screening approach using Gini impurity. Compared with existing feature screening approaches, the proposed method performs better in terms of screening efficiency and classification accuracy. We establish that, under certain regularity conditions, the proposed group screening procedure has the sure screening property and the ranking consistency property. Simulation studies and a real data analysis illustrate the finite-sample performance of the proposed method.

    Citation: Zhongzheng Wang, Guangming Deng, Haiyun Xu. Group feature screening based on Gini impurity for ultrahigh-dimensional multi-classification[J]. AIMS Mathematics, 2023, 8(2): 4342-4362. doi: 10.3934/math.2023216
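
    The idea summarized in the abstract, ranking predictor groups by a Gini-impurity-based utility and keeping the top-ranked groups, can be illustrated with a minimal sketch. It is not the authors' statistic: it assumes categorical predictors, and it scores a group by the mean of its per-feature Gini impurity reductions, which is an assumption made here for illustration. The names `gini_impurity`, `gini_reduction`, `screen_groups`, and the toy data are all hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_r p_r^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(x, y):
    """Impurity of y minus the weighted impurity of y within each level of x."""
    n = len(y)
    by_level = {}
    for xi, yi in zip(x, y):
        by_level.setdefault(xi, []).append(yi)
    conditional = sum(len(ys) / n * gini_impurity(ys) for ys in by_level.values())
    return gini_impurity(y) - conditional


def screen_groups(columns, groups, y, keep):
    """Rank groups by mean per-feature Gini reduction (assumed group score)
    and return the names of the `keep` highest-scoring groups plus all scores."""
    scores = {
        name: sum(gini_reduction(columns[j], y) for j in idx) / len(idx)
        for name, idx in groups.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Toy data: a three-class response and four categorical predictors.
y = [0, 0, 1, 1, 2, 2]
columns = [
    [0, 0, 1, 1, 2, 2],  # separates the classes perfectly
    [0, 0, 1, 1, 2, 2],  # separates the classes perfectly
    [0, 1, 0, 1, 0, 1],  # unrelated to the class
    [1, 1, 1, 1, 1, 1],  # constant
]
groups = {"signal": [0, 1], "noise": [2, 3]}

selected, scores = screen_groups(columns, groups, y, keep=1)
print(selected, scores)
```

    In this toy example the "signal" group removes all of the impurity in the response while the "noise" group removes none, so a screening step that keeps one group retains "signal". Continuous predictors would first need to be discretized, and other ways of combining features within a group are possible.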



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)