In this paper, we study the problem of simultaneous variable selection and estimation for longitudinal ordinal data with high-dimensional covariates. Using the penalized generalized estimating equation (GEE) method, we establish asymptotic properties for such data when the dimension of the covariates $ p_n $ tends to infinity as the number of clusters $ n $ approaches infinity. More precisely, under appropriate regularity conditions, all covariates with zero coefficients can be identified simultaneously with probability tending to 1, and the estimator of the non-zero coefficients has the asymptotic oracle property. Finally, we perform Monte Carlo studies to illustrate the theoretical analysis. The main result in this paper extends the elegant work of Wang et al. [
Citation: Xianbin Chen, Juliang Yin. Simultaneous variable selection and estimation for longitudinal ordinal data with a diverging number of covariates[J]. AIMS Mathematics, 2022, 7(4): 7199-7211. doi: 10.3934/math.2022402
[1] L. Wang, J. H. Zhou, A. N. Qu, Penalized generalized estimating equations for high-dimensional longitudinal data analysis, Biometrics, 68 (2012), 353–360. http://dx.doi.org/10.1111/j.1541-0420.2011.01678.x
[2] L. Wang, GEE analysis of clustered binary data with diverging number of covariates, Ann. Stat., 39 (2011), 389–417. https://doi.org/10.1214/10-AOS846
[3] H. Akaike, A new look at the statistical model identification, IEEE T. Automat. Contr., 19 (1974), 716–723. http://dx.doi.org/10.1109/tac.1974.1100705
[4] G. Schwarz, Estimating the dimension of a model, Ann. Stat., 6 (1978), 461–464. http://dx.doi.org/10.1214/aos/1176344136
[5] W. Pan, Akaike's information criterion in generalized estimating equations, Biometrics, 57 (2001), 120–125. https://doi.org/10.1111/j.0006-341X.2001.00120.x
[6] W. J. Fu, Penalized estimating equations, Biometrics, 59 (2003), 126–132. http://dx.doi.org/10.1111/1541-0420.00015
[7] E. Cantoni, J. M. Flemming, E. Ronchetti, Variable selection for marginal longitudinal generalized linear models, Biometrics, 61 (2005), 507–514. http://dx.doi.org/10.1111/j.1541-0420.2005.00331.x
[8] L. Wang, A. N. Qu, Consistent model selection and data-driven smooth tests for longitudinal data in the estimating equations approach, J. Roy. Statist. Soc., 71 (2009), 177–190. https://doi.org/10.1111/j.1467-9868.2008.00679.x
[9] H. Yang, P. Lin, G. H. Zou, H. Liang, Variable selection and model averaging for longitudinal data incorporating GEE approach, Stat. Sinica, 27 (2017), 389–413. http://dx.doi.org/10.5705/ss.2013.277
[10] Z. M. Chen, Z. F. Wang, Y. Ivan Chang, Sequential adaptive variables and subject selection for GEE methods, Biometrics, 76 (2020), 496–507. http://dx.doi.org/10.1111/biom.13160
[11] J. M. Williamson, H. M. Lin, H. X. Barnhart, A classification statistic for GEE categorical response models, Journal of Data Science, 1 (2003), 149–165. http://dx.doi.org/10.6339/JDS.2003.01(2).106
[12] S. R. Lipsitz, K. Kim, L. P. Zhao, Analysis of repeated categorical data using generalized estimating equations, Stat. Med., 13 (1994), 1149–1163. https://doi.org/10.1002/sim.4780131106
[13] K. C. Lin, Y. J. Chen, Assessing GEE models with longitudinal ordinal data by global odds ratio, Int. Statistical Inst.: Proc. 58th World Statistical Congress, (2011), 5763–5768.
[14] K. Y. Liang, S. L. Zeger, Longitudinal data analysis using generalized linear models, Biometrika, 73 (1986), 13–22. https://doi.org/10.1093/biomet/73.1.13
[15] J. Q. Fan, R. Z. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96 (2001), 1348–1360. https://doi.org/10.1198/016214501753382273
[16] L. Fahrmeir, G. Tutz, Multivariate statistical modelling based on generalized linear models, New York: Springer, 1994. https://doi.org/10.1007/978-1-4899-0010-4
[17] A. Touloumis, A. Agresti, M. Kateri, GEE for multinomial responses using a local odds ratios parameterization, Biometrics, 69 (2013), 633–640. http://dx.doi.org/10.1111/biom.12054
[18] S. G. Wang, J. H. Shi, S. J. Yin, M. X. Wu, Introduction to linear models, 3rd ed., Beijing: Science Press, 2004.
[19] A. Touloumis, Simulating correlated binary and multinomial responses under marginal model specification: the SimCorMultRes package, The R Journal, 8 (2016), 79–91. http://dx.doi.org/10.32614/RJ-2016-034
[20] X. B. Chen, J. L. Yin, Asymptotic properties of GEE estimator for clustered ordinal data with high-dimensional covariates, Commun. Stat.-Theor. M., (2021). http://dx.doi.org/10.1080/03610926.2021.1934029
[21] A. W. van der Vaart, J. A. Wellner, Weak convergence and empirical processes: with applications to statistics, New York: Springer, 1996.