Application of Bayesian variable selection in logistic regression model

Kannat Na Bangchang

doi:10.3934/math.2024650

AIMS Mathematics

2024, Volume 9, Issue 5: 13336-13345. doi: 10.3934/math.2024650


Research article


Kannat Na Bangchang ^{1,2}

1. Department of Mathematics and Statistics, Faculty of Science and Technology, Thammasat University, Pathumthani, 12120, Thailand
2. Thammasat University Research Unit in Statistical Theory and Applications, Thailand

Received: 20 February 2024; Revised: 22 March 2024; Accepted: 29 March 2024; Published: 10 April 2024
MSC: 62C10, 62C12

Typically, in high-dimensional data sets, many covariates are not significantly associated with the response. Moreover, the covariates are often highly correlated, leading to a multicollinearity problem. Hence, the model is sparse, since the coefficients of most covariates are likely to be zero. Classical frequentist or likelihood-based variable selection via a criterion such as the Bayesian information criterion (BIC) or the Akaike information criterion (AIC), or via stepwise subset selection, becomes infeasible when the number of variables is large. An alternative is Bayesian variable selection. In this study, we performed variable selection in the logistic regression model using Bayesian variable selection and the least absolute shrinkage and selection operator (LASSO) method. These methods were then applied to real datasets.
Keywords: Bayesian variable selection; multicollinearity problem; logistic regression model; LASSO method
Citation: Kannat Na Bangchang. Application of Bayesian variable selection in logistic regression model[J]. AIMS Mathematics, 2024, 9(5): 13336-13345. doi: 10.3934/math.2024650
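
To make the LASSO side of the abstract concrete, the following is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent on synthetic, correlated covariates. It is not the author's implementation: the function names (`lasso_logistic`, `make_data`), the simulated data, the penalty value `lam`, and the step size are all assumptions chosen for illustration, and the article's Bayesian variable selection procedure is not reproduced here.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Proximal-gradient fit of L1-penalized logistic regression.

    Minimizes the mean negative log-likelihood plus lam * sum(|beta_j|).
    The intercept b0 is left unpenalized (an illustrative choice).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed label.
        resid = [sigmoid(b0 + sum(x * b for x, b in zip(row, beta))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        # Gradient step, then soft-thresholding sets small coefficients to 0.
        b0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return b0, beta


def make_data(n=300, p=10, seed=0):
    """Hypothetical data: correlated covariates, only the first two matter."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # common factor induces correlation
        row = [0.7 * shared + 0.7 * rng.gauss(0, 1) for _ in range(p)]
        eta = 2.0 * row[0] - 2.0 * row[1]
        X.append(row)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y


X, y = make_data()
b0, beta = lasso_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected covariate indices:", selected)
```

Covariates whose coefficient is shrunk exactly to zero are treated as excluded from the model. Larger values of `lam` select fewer covariates, and a sufficiently large value removes all of them.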




© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)