Simultaneous variable selection and estimation for longitudinal ordinal data with a diverging number of covariates

Xianbin Chen; Juliang Yin

doi:10.3934/math.2022402

AIMS Mathematics

2022, Volume 7, Issue 4: 7199-7211. doi: 10.3934/math.2022402

Research article

Xianbin Chen ¹, Juliang Yin ¹

¹ School of Economics and Statistics, Guangzhou University, Guangzhou, 510006, China

Received: 01 November 2021; Revised: 18 January 2022; Accepted: 26 January 2022; Published: 09 February 2022
MSC: 62E20, 62F12, 62J12

In this paper, we study simultaneous variable selection and estimation for longitudinal ordinal data with high-dimensional covariates. Using the penalized generalized estimating equation (GEE) method, we establish asymptotic properties for such data when the dimension of the covariates $ p_n $ tends to infinity as the number of clusters $ n $ approaches infinity. More precisely, under appropriate regularity conditions, all covariates with zero coefficients can be identified simultaneously with probability tending to 1, and the estimator of the non-zero coefficients has the asymptotic oracle property. We also report Monte Carlo studies that illustrate the theoretical analysis. The main result extends the work of Wang et al. [1] to the case of a multinomial response variable.
Keywords: longitudinal ordinal data; penalized generalized estimating equations; high-dimensional covariates; consistency; asymptotic normality
Citation: Xianbin Chen, Juliang Yin. Simultaneous variable selection and estimation for longitudinal ordinal data with a diverging number of covariates[J]. AIMS Mathematics, 2022, 7(4): 7199-7211. doi: 10.3934/math.2022402
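
The abstract does not name the penalty function used in the penalized GEE. As an illustration only, the sketch below assumes the SCAD penalty of Fan and Li [15], which is the choice made in Wang et al. [1]. It shows the penalty derivative and a generic penalized estimating function of the form "score minus penalty derivative times sign". The function names, the default `a = 3.7`, and the treatment of the score as a precomputed vector are assumptions made for this example; the ordinal-response score and its normalization are not reproduced here.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |theta| (requires a > 2)."""
    theta = np.abs(np.asarray(theta, dtype=float))
    # For |theta| <= lam the shrinkage rate is constant (lasso-like).
    # Beyond lam it decays linearly and is exactly zero once |theta| >= a * lam,
    # so large coefficients are left essentially unpenalized.
    tail = np.maximum(a * lam - theta, 0.0) / (a - 1.0)
    return np.where(theta <= lam, lam, tail)

def penalized_score(score, beta, lam, a=3.7):
    """Hypothetical helper: GEE score vector minus the SCAD penalty term."""
    beta = np.asarray(beta, dtype=float)
    score = np.asarray(score, dtype=float)
    return score - scad_derivative(beta, lam, a) * np.sign(beta)
```

With `lam = 0.1`, a coefficient of size 0.05 is penalized at the full rate 0.1, while a coefficient of size 1.0 receives no penalty. This difference in treatment between small and large coefficients is the mechanism usually associated with the oracle property.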


References

[1] L. Wang, J. H. Zhou, A. N. Qu, Penalized generalized estimating equations for high-dimensional longitudinal data analysis, Biometrics, 68 (2012), 353–360. http://dx.doi.org/10.1111/j.1541-0420.2011.01678.x
[2] L. Wang, GEE analysis of clustered binary data with diverging number of covariates, Ann. Stat., 39 (2011), 389–417. https://doi.org/10.1214/10-AOS846
[3] H. Akaike, A new look at the statistical model identification, IEEE T. Automat. Contr., 19 (1974), 716–723. http://dx.doi.org/10.1109/tac.1974.1100705
[4] G. Schwarz, Estimating the dimension of a model, Ann. Stat., 6 (1978), 461–464. http://dx.doi.org/10.1214/aos/1176344136
[5] W. Pan, Akaike's information criterion in generalized estimating equations, Biometrics, 57 (2001), 120–125. https://doi.org/10.1111/j.0006-341X.2001.00120.x
[6] W. J. Fu, Penalized estimating equations, Biometrics, 59 (2003), 126–132. http://dx.doi.org/10.1111/1541-0420.00015
[7] E. Cantoni, J. M. Flemming, E. Ronchetti, Variable selection for marginal longitudinal generalized linear models, Biometrics, 61 (2005), 507–514. http://dx.doi.org/10.1111/j.1541-0420.2005.00331.x
[8] L. Wang, A. N. Qu, Consistent model selection and data-driven smooth tests for longitudinal data in the estimating equations approach, J. Roy. Statist. Soc., 71 (2009), 177–190. https://doi.org/10.1111/j.1467-9868.2008.00679.x
[9] H. Yang, P. Lin, G. H. Zou, H. Liang, Variable selection and model averaging for longitudinal data incorporating GEE approach, Stat. Sinica, 27 (2017), 389–413. http://dx.doi.org/10.5705/ss.2013.277
[10] Z. M. Chen, Z. F. Wang, Y. Ivan Chang, Sequential adaptive variables and subject selection for GEE methods, Biometrics, 76 (2020), 496–507. http://dx.doi.org/10.1111/biom.13160
[11] J. M. Williamson, H. M. Lin, H. X. Barnhart, A classification statistic for GEE categorical response models, Journal of Data Science, 1 (2003), 149–165. http://dx.doi.org/10.6339/JDS.2003.01(2).106
[12] S. R. Lipsitz, K. Kim, L. P. Zhao, Analysis of repeated categorical data using generalized estimating equations, Stat. Med., 13 (1994), 1149–1163. https://doi.org/10.1002/sim.4780131106
[13] K. C. Lin, Y. J. Chen, Assessing GEE models with longitudinal ordinal data by global odds ratio, Int. Statistical Inst.: Proc. 58th World Statistical Congress, (2011), 5763–5768.
[14] K. Y. Liang, S. L. Zeger, Longitudinal data analysis using generalized linear models, Biometrika, 73 (1986), 13–22. https://doi.org/10.1093/biomet/73.1.13
[15] J. Q. Fan, R. Z. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96 (2001), 1348–1360. https://doi.org/10.1198/016214501753382273
[16] L. Fahrmeir, G. Tutz, Multivariate statistical modelling based on generalized linear models, New York: Springer, 1994. https://doi.org/10.1007/978-1-4899-0010-4
[17] A. Touloumis, A. Agresti, M. Kateri, GEE for multinomial responses using a local odds ratios parameterization, Biometrics, 69 (2013), 633–640. http://dx.doi.org/10.1111/biom.12054
[18] S. G. Wang, J. H. Shi, S. J. Yin, M. X. Wu, Introduction to linear models, 3rd ed., Beijing: Science Press, 2004.
[19] A. Touloumis, Simulating correlated binary and multinomial responses under marginal model specification: the SimCorMultRes package, The R Journal, 8 (2016), 79–91. http://dx.doi.org/10.32614/RJ-2016-034
[20] X. B. Chen, J. L. Yin, Asymptotic properties of GEE estimator for clustered ordinal data with high-dimensional covariates, Commun. Stat.-Theor. M., (2021). http://dx.doi.org/10.1080/03610926.2021.1934029
[21] A. W. van der Vaart, J. A. Wellner, Weak convergence and empirical processes: with applications to statistics, New York: Springer, 1996.

© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
