Group feature screening based on Gini impurity for ultrahigh-dimensional multi-classification

Zhongzheng Wang; Guangming Deng; Haiyun Xu

doi:10.3934/math.2023216

AIMS Mathematics

2023, Volume 8, Issue 2: 4342-4362. doi: 10.3934/math.2023216


Research article

1. College of Science, Guilin University of Technology, Guangxi 541000, China
2. Applied Statistics Institute, Guilin University of Technology, Guangxi 541000, China
3. School of Finance, Jiangxi University of Finance and Economics, Jiangxi 330013, China

Received: 04 September 2022; Revised: 17 November 2022; Accepted: 22 November 2022; Published: 05 December 2022
MSC: 62H30, 62R07

Abstract: Most model-free feature screening methods focus on individual predictors and therefore cannot handle structured predictors such as grouped variables. In this study, we propose a model-free group screening method for classification models that uses Gini impurity and directly extends the original sure independence screening approach. Compared with existing feature screening approaches, the proposed method performs better in terms of screening efficiency and classification accuracy. We establish that the proposed group screening procedure has the sure screening property and the ranking consistency property under certain regularity conditions. Simulation studies and a real data analysis illustrate the finite-sample performance of the proposed method.
Keywords: ultrahigh-dimensional; group feature screening; model-free; Gini impurity; classification model
Citation: Zhongzheng Wang, Guangming Deng, Haiyun Xu. Group feature screening based on Gini impurity for ultrahigh-dimensional multi-classification[J]. AIMS Mathematics, 2023, 8(2): 4342-4362. doi: 10.3934/math.2023216
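
The group screening idea summarized in the abstract can be illustrated with a minimal sketch. This page does not give the paper's exact screening statistic, so the index below is an assumption: it scores each group by the relative reduction in the Gini impurity of the class label, 1 - sum_r p_r^2, after splitting the sample on the joint levels of all (categorical) features in the group, and then keeps the top-ranked groups. The function names `gini_impurity`, `group_gini_index` and `screen_groups`, and the toy data, are hypothetical.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity 1 - sum_r p_r^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def group_gini_index(group_columns, labels):
    """Assumed group utility (not taken from the paper): relative reduction in
    the Gini impurity of the labels after conditioning on the joint levels of
    every feature in the group."""
    base = gini_impurity(labels)
    if base == 0.0:
        return 0.0  # only one class present, nothing to explain
    # Collect the labels that fall into each joint level of the group.
    cells = defaultdict(list)
    for joint_level, y in zip(zip(*group_columns), labels):
        cells[joint_level].append(y)
    n = len(labels)
    conditional = sum(len(ys) / n * gini_impurity(ys) for ys in cells.values())
    return (base - conditional) / base


def screen_groups(groups, labels, keep):
    """Rank groups by the assumed index and return the `keep` best names."""
    scores = {name: group_gini_index(cols, labels) for name, cols in groups.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Toy data: three classes, two groups of two categorical features each.
labels = [0, 0, 1, 1, 2, 2]
groups = {
    "informative": [[0, 0, 1, 1, 2, 2], [0, 0, 0, 0, 1, 1]],
    "noise": [[0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0]],
}
selected, scores = screen_groups(groups, labels, keep=1)
print(selected, scores)
```

A raw index of this kind tends to favour groups with many joint levels, because small cells are more likely to be pure; whether and how the paper adjusts for this is not stated on this page.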



© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
