Typically, in high-dimensional data sets, many covariates are not significantly associated with the response, and those covariates are often highly correlated, leading to multicollinearity. Hence, the model is sparse, since the coefficients of most covariates are likely to be zero. Classical frequentist or likelihood-based variable selection via criteria such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), or via stepwise subset selection, becomes infeasible when the number of variables is large. An alternative is Bayesian variable selection. In this study, we performed variable selection in the logistic regression model using a Bayesian variable selection approach and the least absolute shrinkage and selection operator (LASSO). These methods were also applied to real datasets.
Citation: Kannat Na Bangchang. Application of Bayesian variable selection in logistic regression model[J]. AIMS Mathematics, 2024, 9(5): 13336-13345. doi: 10.3934/math.2024650
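The following is a minimal, hypothetical Python sketch, not the authors' implementation: it illustrates one common form of Bayesian variable selection, a continuous spike-and-slab (SSVS) prior for a logistic regression model sampled with Metropolis-within-Gibbs, alongside an L1-penalised (LASSO) logistic fit from scikit-learn. The simulated data, prior settings (v0, v1, w), and tuning constants are all illustrative assumptions.

```python
# Hypothetical sketch only: spike-and-slab Bayesian variable selection and a LASSO
# fit for logistic regression. Settings below are illustrative assumptions, not the
# configuration used in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated sparse setting: only the first 3 of 20 covariates affect the response.
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 1.0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def loglik(beta):
    """Bernoulli log-likelihood of the logistic model, computed stably."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))

# Spike-and-slab prior: beta_j ~ N(0, v0) (spike) or N(0, v1) (slab), P(slab) = w.
v0, v1, w = 0.01, 4.0, 0.2
n_iter, burn, step = 4000, 2000, 0.25
beta = np.zeros(p)
gamma = np.zeros(p, dtype=int)   # 1 = slab (effectively included), 0 = spike
incl = np.zeros(p)

for it in range(n_iter):
    # Gibbs update of each inclusion indicator given the current beta_j
    # (the 1/sqrt(2*pi) normalising constants cancel in the ratio).
    dens_slab = w * np.exp(-0.5 * beta**2 / v1) / np.sqrt(v1)
    dens_spike = (1 - w) * np.exp(-0.5 * beta**2 / v0) / np.sqrt(v0)
    gamma = rng.binomial(1, dens_slab / (dens_slab + dens_spike))
    prior_var = np.where(gamma == 1, v1, v0)

    # Random-walk Metropolis update of each coefficient under its current prior variance.
    for j in range(p):
        prop = beta.copy()
        prop[j] = beta[j] + rng.normal(0.0, step)
        log_acc = (loglik(prop) - loglik(beta)
                   - 0.5 * (prop[j]**2 - beta[j]**2) / prior_var[j])
        if np.log(rng.uniform()) < log_acc:
            beta = prop

    if it >= burn:
        incl += gamma

print("Posterior inclusion probabilities:", np.round(incl / (n_iter - burn), 2))

# Frequentist comparison: LASSO (L1-penalised) logistic regression.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lasso.fit(X, y)
print("LASSO coefficients:", np.round(lasso.coef_.ravel(), 2))
```

In a sparse model of this kind, covariates with high posterior inclusion probabilities (Bayesian approach) or non-zero coefficients (LASSO) would be the ones retained.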