Research article

Feature screening for ultrahigh-dimensional binary classification via linear projection

  • Received: 14 November 2022 Revised: 27 February 2023 Accepted: 24 March 2023 Published: 17 April 2023
  • MSC : 62H30, 62F07

  • Linear discriminant analysis (LDA) is one of the most widely used methods in discriminant classification and pattern recognition. However, with the rapid development of information science and technology, collected data are increasingly high- or ultrahigh-dimensional. When the dimension exceeds the sample size, the sample covariance matrix is singular and classical LDA breaks down. To address this issue, a feature screening procedure based on Fisher's linear projection and the marginal score test is proposed for the ultrahigh-dimensional binary classification problem. The sure screening property is established, which ensures that the important features are retained with probability tending to one and that irrelevant predictors can be eliminated. The finite-sample performance of the proposed procedure is assessed through Monte Carlo simulation studies and a real-data example.
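The abstract combines Fisher's linear projection with a marginal score test, but the exact screening statistic is defined only in the full paper and is not reproduced here. As a hedged illustration of the general idea of marginal screening for a binary label, the sketch below ranks each feature by the plain score-test statistic of a working logistic model with an intercept, testing H0: β_j = 0, and keeps the top d features. This is a swapped-in, simpler statistic, not the authors' procedure; the helper names `marginal_score_statistics` and `screen` and the default d = ⌊n / log n⌋ (a common choice in the sure independence screening literature) are assumptions for illustration.

```python
import numpy as np


def marginal_score_statistics(X, y):
    """Marginal score-test statistic of each column of X for a 0/1 label y.

    Working model (an assumption for illustration): for each feature j,
        logit P(Y = 1 | X_j) = alpha + beta_j * X_j,
    testing H0: beta_j = 0 with alpha fitted under H0 (fitted probability ybar).
    The score statistic is
        T_j = [sum_i (x_ij - xbar_j)(y_i - ybar)]^2
              / [ybar * (1 - ybar) * sum_i (x_ij - xbar_j)^2],
    which equals n * r_j^2, with r_j the sample correlation of X_j and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    if y.shape != (n,):
        raise ValueError("y must be 1-D with one label per row of X")
    if not np.all((y == 0) | (y == 1)):
        raise ValueError("y must contain only 0/1 labels")
    ybar = y.mean()
    if ybar <= 0.0 or ybar >= 1.0:
        raise ValueError("both classes must be present in y")
    Xc = X - X.mean(axis=0)                # center every feature
    yc = y - ybar                          # center the labels
    score = Xc.T @ yc                      # score U_j for all features at once
    info = ybar * (1.0 - ybar) * np.sum(Xc ** 2, axis=0)  # efficient information
    stats = np.zeros_like(score)
    ok = info > 0                          # constant features keep statistic 0
    stats[ok] = score[ok] ** 2 / info[ok]
    return stats


def screen(X, y, d=None):
    """Keep the d features with the largest marginal score statistics.

    The default d = floor(n / log n) is a hypothetical choice borrowed from
    the sure independence screening literature, not taken from the paper.
    """
    n, p = np.shape(X)
    if d is None:
        if n < 2:
            raise ValueError("need at least two observations")
        d = int(n / np.log(n))
    d = max(1, min(d, p))
    stats = marginal_score_statistics(X, y)
    top = np.argsort(stats)[::-1][:d]      # indices of the d largest statistics
    return np.sort(top), stats


# Usage on synthetic data: the first 5 of 2000 features are shifted by +1 in class 1.
rng = np.random.default_rng(0)
n, p = 200, 2000
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
X[:, :5] += y[:, None]
selected, stats = screen(X, y)
print(len(selected), selected[:5])
```

Because the label is binary, the score statistic reduces to n times the squared sample correlation, so the whole screen costs a single O(np) pass over the data matrix, which is what makes marginal screening feasible when p is in the thousands or more.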

    Citation: Peng Lai, Mingyue Wang, Fengli Song, Yanqiu Zhou. Feature screening for ultrahigh-dimensional binary classification via linear projection[J]. AIMS Mathematics, 2023, 8(6): 14270-14287. doi: 10.3934/math.2023730

  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)


Tables(5)
