Research article

The generalization ability of logistic regression with Markov sampling

  • Received: 08 May 2023; Revised: 07 July 2023; Accepted: 17 July 2023; Published: 20 July 2023
  • For samples that are not independent and identically distributed (non-i.i.d.), we propose a new ueMC algorithm based on uniformly ergodic Markov samples, and study its generalization ability, learning rate and convergence. We use the ueMC algorithm to generate samples from given datasets, and present numerical results on benchmark datasets. The numerical simulations show that the logistic regression model with Markov sampling has better generalization ability on large training samples, and that it also outperforms classical machine learning algorithms such as random forest and AdaBoost.
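
    The abstract does not spell out the steps of the ueMC algorithm, so the following is only a minimal sketch of what Markov sampling for logistic regression could look like. It assumes a Metropolis-style chain over dataset indices with a uniform proposal, where a candidate is accepted with probability min(1, exp(loss(current) - loss(candidate))) under a preliminary model. That acceptance rule, and every name here (`fit_logistic`, `markov_sample_indices`, `w_prelim`), are illustrative assumptions and not the authors' method.

    ```python
    import math
    import random

    def sigmoid(z):
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict_proba(w, x):
        # P(y = 1 | x) under a linear logistic model with weights w.
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

    def log_loss(w, x, y):
        # Logistic loss of one labelled point, with y in {0, 1}.
        p = min(max(predict_proba(w, x), 1e-12), 1.0 - 1e-12)
        return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

    def fit_logistic(X, y, lr=0.5, n_iter=200):
        # Plain batch gradient descent on the mean logistic loss.
        w = [0.0] * len(X[0])
        n = len(X)
        for _ in range(n_iter):
            grad = [0.0] * len(w)
            for xi, yi in zip(X, y):
                err = predict_proba(w, xi) - yi
                for j, xij in enumerate(xi):
                    grad[j] += err * xij / n
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
        return w

    def markov_sample_indices(X, y, w_prelim, m, rng):
        # Assumed rule: uniform proposal over indices, Metropolis acceptance
        # based on the loss of each point under the preliminary model.
        losses = [log_loss(w_prelim, xi, yi) for xi, yi in zip(X, y)]
        current = rng.randrange(len(X))
        chain = [current]
        while len(chain) < m:
            cand = rng.randrange(len(X))
            accept_prob = min(1.0, math.exp(losses[current] - losses[cand]))
            if rng.random() < accept_prob:
                current = cand
            chain.append(current)  # on rejection the current state repeats
        return chain

    # Synthetic data for illustration only (first feature is a bias term).
    rng = random.Random(0)
    true_w = [2.0, -1.0, 0.5]
    X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
    y = [1 if rng.random() < predict_proba(true_w, xi) else 0 for xi in X]

    w_prelim = fit_logistic(X[:50], y[:50])           # preliminary model
    idx = markov_sample_indices(X, y, w_prelim, m=100, rng=rng)
    w_markov = fit_logistic([X[i] for i in idx], [y[i] for i in idx])
    ```

    In this sketch, consecutive draws depend on the previous state, so the resulting training set is a dependent (non-i.i.d.) sample. Whether such a chain is uniformly ergodic, and how it compares with the paper's ueMC procedure, is not something this illustration establishes.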

    Citation: Zhiyong Qian, Wangsen Xiao, Shulan Hu. The generalization ability of logistic regression with Markov sampling[J]. Electronic Research Archive, 2023, 31(9): 5250-5266. doi: 10.3934/era.2023267



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
