For the case of samples that are not independent and identically distributed (non-i.i.d.), we propose a new uniformly ergodic Markov chain (ueMC) sampling algorithm, and study its generalization ability, learning rate, and convergence. We develop the ueMC algorithm to generate samples from given datasets, and present numerical results on benchmark datasets. The numerical simulations show that the logistic regression model with Markov sampling has better generalization ability on large training samples, and that its performance also exceeds that of classical machine learning algorithms such as random forest and AdaBoost.
Citation: Zhiyong Qian, Wangsen Xiao, Shulan Hu. The generalization ability of logistic regression with Markov sampling[J]. Electronic Research Archive, 2023, 31(9): 5250-5266. doi: 10.3934/era.2023267
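To make the sampling idea concrete, below is a minimal Python sketch of one plausible reading of Markov sampling from a finite dataset: candidate training points are proposed uniformly at random and accepted with a Metropolis-Hastings-style probability driven by the logistic loss under the current weights, so the accepted sequence forms a Markov chain rather than an i.i.d. draw. The names `logistic_loss` and `markov_sample`, the loss-ratio acceptance rule, and the toy weight vector are illustrative assumptions, not the paper's exact ueMC construction.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Negative log-likelihood of one sample (y in {0, 1}) under weights w."""
    z = x @ w
    return np.log1p(np.exp(z)) - y * z

def markov_sample(X, Y, w, m, seed=None):
    """Draw m dependent training samples from (X, Y): propose indices
    uniformly and accept with a loss-ratio probability, so the accepted
    index sequence is a Markov chain rather than an i.i.d. sample."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    cur = rng.integers(n)            # arbitrary initial state
    idx = []
    while len(idx) < m:
        cand = rng.integers(n)       # uniform proposal over the dataset
        # Accept with prob min(1, exp(loss(cur) - loss(cand))): lower-loss
        # candidates are always accepted, higher-loss ones only sometimes.
        ratio = np.exp(logistic_loss(w, X[cur], Y[cur])
                       - logistic_loss(w, X[cand], Y[cand]))
        if rng.random() < min(1.0, ratio):
            cur = cand
        idx.append(cur)
    return X[idx], Y[idx]

# Usage: resample a toy training set with a rough initial weight vector.
X = np.random.default_rng(0).normal(size=(500, 3))
Y = (X @ np.array([1.0, -2.0, 0.5]) + 0.3 > 0).astype(float)
Xm, Ym = markov_sample(X, Y, w=np.array([0.5, -1.0, 0.2]), m=200, seed=1)
```

In this sketch the chain can be regenerated after each model update, so the logistic regression weights and the Markov sample are refined alternately; the exact schedule is one of the assumed details.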