Research article

Application of Bayesian variable selection in logistic regression model

  • Received: 20 February 2024 Revised: 22 March 2024 Accepted: 29 March 2024 Published: 10 April 2024
  • MSC : 62C10, 62C12

  • Typically, in high-dimensional data sets, many covariates are not significantly associated with the response. Moreover, those covariates are highly correlated, leading to a multicollinearity problem. Hence, the model is sparse, since the coefficients of most covariates are likely to be zero. Classical frequentist or likelihood-based variable selection, whether via a criterion such as the Bayesian information criterion (BIC) or the Akaike information criterion (AIC) or via stepwise subset selection, becomes infeasible when the number of variables is large. An alternative solution is Bayesian variable selection. In this study, we performed variable selection in the logistic regression model using Bayesian variable selection and the least absolute shrinkage and selection operator (LASSO) method. Both methods were then applied to real datasets.
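
    As a rough illustration of the LASSO side of this comparison, the sketch below fits an L1-penalised logistic regression by proximal gradient descent on simulated sparse data. It is not the implementation used in this article: the function names (`lasso_logistic`, `simulate`), the penalty value `lam=0.1`, the step size, and the simulated design with two nonzero coefficients are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t, clipping at zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=500):
    """Minimise mean logistic loss + lam * ||beta||_1 by proximal gradient.

    The intercept beta0 is left unpenalised (a common convention,
    assumed here rather than taken from the article).
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Fitted probability minus observed label for every observation.
        resid = [
            sigmoid(beta0 + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        beta0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta0, beta


def simulate(n=200, p=10, seed=0):
    # Hypothetical sparse truth: only the first two covariates matter.
    rng = random.Random(seed)
    true_beta = [2.0, -2.0] + [0.0] * (p - 2)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        prob = sigmoid(sum(b * x for b, x in zip(true_beta, row)))
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


X, y = simulate()
beta0, beta = lasso_logistic(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected covariates:", selected)
print("coefficients:", [round(b, 3) for b in beta])
```

    The soft-thresholding step is what sets coefficients exactly to zero, so the selected set is read off directly from the nonzero entries of `beta`. A Bayesian variable selection approach would instead report posterior inclusion probabilities for each covariate, which this sketch does not attempt.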

    Citation: Kannat Na Bangchang. Application of Bayesian variable selection in logistic regression model[J]. AIMS Mathematics, 2024, 9(5): 13336-13345. doi: 10.3934/math.2024650



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)


