In the case of non-independent and identically distributed samples, we propose a new ueMC algorithm based on uniformly ergodic Markov samples, and study the generalization ability, the learning rate and convergence of the algorithm. We develop the ueMC algorithm to generate samples from given datasets, and present the numerical results for benchmark datasets. The numerical simulation shows that the logistic regression model with Markov sampling has better generalization ability on large training samples, and its performance is also better than that of classical machine learning algorithms, such as random forest and Adaboost.
Citation: Zhiyong Qian, Wangsen Xiao, Shulan Hu. The generalization ability of logistic regression with Markov sampling[J]. Electronic Research Archive, 2023, 31(9): 5250-5266. doi: 10.3934/era.2023267
Related Papers:
[1]
Qin Guo, Binlei Cai .
Learning capability of the rescaled pure greedy algorithm with non-iid sampling. Electronic Research Archive, 2023, 31(3): 1387-1404.
doi: 10.3934/era.2023071
[2]
Weishang Gao, Qin Gao, Lijie Sun, Yue Chen .
Design of a novel multimodal optimization algorithm and its application in logistics optimization. Electronic Research Archive, 2024, 32(3): 1946-1972.
doi: 10.3934/era.2024089
[3]
Showkat Ahmad Lone, Hanieh Panahi, Sadia Anwar, Sana Shahab .
Estimations and optimal censoring schemes for the unified progressive hybrid gamma-mixed Rayleigh distribution. Electronic Research Archive, 2023, 31(8): 4729-4752.
doi: 10.3934/era.2023242
[4]
Meixin Xiong, Liuhong Chen, Ju Ming, Jaemin Shin .
Accelerating the Bayesian inference of inverse problems by using data-driven compressive sensing method based on proper orthogonal decomposition. Electronic Research Archive, 2021, 29(5): 3383-3403.
doi: 10.3934/era.2021044
[5]
Dandan Zuo, Wansheng Wang, Lulu Zhang, Jing Han, Ling Chen .
Non-fragile sampled-data control for synchronizing Markov jump Lur'e systems with time-variant delay. Electronic Research Archive, 2024, 32(7): 4632-4658.
doi: 10.3934/era.2024211
[6]
Zhuang Wang, Renting Liu, Jie Xu, Yusheng Fu .
FedSC: A federated learning algorithm based on client-side clustering. Electronic Research Archive, 2023, 31(9): 5226-5249.
doi: 10.3934/era.2023266
[7]
Kaixuan Wang, Shixiong Zhang, Yang Cao, Lu Yang .
Weakly supervised anomaly detection based on sparsity prior. Electronic Research Archive, 2024, 32(6): 3728-3741.
doi: 10.3934/era.2024169
[8]
Shengming Hu, Yongfei Lu, Xuanchi Liu, Cheng Huang, Zhou Wang, Lei Huang, Weihang Zhang, Xiaoyang Li .
Stability prediction of circular sliding failure soil slopes based on a genetic algorithm optimization of random forest algorithm. Electronic Research Archive, 2024, 32(11): 6120-6139.
doi: 10.3934/era.2024284
[9]
Haoqing Wang, Wen Yi, Yannick Liu .
An innovative approach of determining the sample data size for machine learning models: a case study on health and safety management for infrastructure workers. Electronic Research Archive, 2022, 30(9): 3452-3462.
doi: 10.3934/era.2022176
[10]
Amal S. Hassan, Najwan Alsadat, Mohammed Elgarhy, Laxmi Prasad Sapkota, Oluwafemi Samson Balogun, Ahmed M. Gemeay .
A novel asymmetric form of the power half-logistic distribution with statistical inference and real data analysis. Electronic Research Archive, 2025, 33(2): 791-825.
doi: 10.3934/era.2025036
Abstract
In the case of non-independent and identically distributed samples, we propose a new ueMC algorithm based on uniformly ergodic Markov samples, and study the generalization ability, the learning rate and convergence of the algorithm. We develop the ueMC algorithm to generate samples from given datasets, and present the numerical results for benchmark datasets. The numerical simulation shows that the logistic regression model with Markov sampling has better generalization ability on large training samples, and its performance is also better than that of classical machine learning algorithms, such as random forest and Adaboost.
1.
Introduction
The logistic regression model has become one of the most popular machine learning methods in classification [1,2,3,4]. Besides the advantages of good performance and strong interpretability in practical applications, the logistic regression model also has a complete theoretical basis in terms of consistency and generalization performance when the training samples are independent and identically distributed (i.i.d) [5,6,7,8]. However, the hypothesis of i.i.d is quite hard to be proved in practice, so it is natural to consider the logistic regression model with non-i.i.d samples.
The relaxation of the i.i.d hypothesis has been discussed for a long time in machine learning and statistical literature. For example, Wang et al. [9] pointed out that the statistical learning theory in the case of small samples cannot be directly applied to large samples and proposed the generalization bounds of under-sample models based on strict Bayesian network processing. Sun and Wu [10], Sun and Guo [11] and Chu and Sun [12] used mixed samples to analyze the error of l2-norm least squares and l1-norm least squares regression, respectively. Guo and Shi [13] proved that the learning speed of regularized least squares regression was faster than the classical method. Machine learning algorithms in the non-i.i.d case are solved by concentration inequalities because the concentration inequalities can provide probability upper bounds for the deviation [14]. Modha and Masry [15] extended the classical inequalities in the condition from i.i.d to m-correlation and strong mixing, respectively. Merlevède et al. [16] obtained a Bernstein type inequality for a class of weakly dependent and bounded random variables. Fan et al. [17] studied the Hoeffding inequality for general Markov chains and time-dependent functions.
In the paper, we enhance the performance of the logistic regression model from small samples to large samples. Machine learning usually performs well in Markov chain samples [18], so we develop the uniformly ergodic Markov chain algorithm (ueMC algorithm) for the logistic regression model. The ueMC algorithm can identify samples with classification errors and close to the decision boundary, and determine the final Markov samples used in the pre-training models. Compared with the algorithm proposed by Thongkam et al. [19], the ueMC algorithm does not directly eliminate the misclassified samples, as they are based on a random probability. Similar to the method proposed by Miranda et al. [20], the previous eliminated samples may be selected later. Inspired by [21], we study the generalization ability of the ueMC algorithm, and establish the optimal learning rate of the logistic regression classification for ueMC samples. Through numerical study and the simulation, it is verified that the performance of the ueMC algorithm based on ueMC samples is more effective than the algorithm based on other random samples, and the performance of the ueMC algorithm is also better than of the classical machine learning algorithm.
The paper is organized as follows. In Section 2, some definitions and notations are given. In Section 3, we present and prove the main results on the learning rates of the logistic regression model with ueMC samples. In Section 4, we develop a new ueMC algorithm, and present the numerical studies on the generalization performance of the logistic regression model based on Markov sampling for benchmark datasets. Finally, the conclusions are given in Section 5.
2.
Preliminaries
In this section we introduce the definitions and notations throughout this paper.
2.1. Logistic regression model
Let (X,d) be a compact metric space and Y={0,1}. A binary classifier is a function ˆf:X→Y which labels every point x∈X⊆R with some y∈Y. Let φ be a probability distribution on Z=X×Y and (X,Y) be the corresponding random variable. Given Xi=(1,xi1,…,xik)T⊆Rk+1, i=1,2,…,N, the form of the separating hyperplane is as follows:
ˆf=WTX=w0+w1xn1+w2xn2+…+wkxnk=0,
(2.1)
where W=(w0,w1,…,wk)T is the coefficient of the variable.
Let P(Yi∣Xi,W)=11+e−WTXi=sigmod(ˆf) represent the probability of the sample being a positive sample (Yi=1), and 1−P(Yi∣Xi,W) represent the probability of the sample being a negative sample (Yi=0). So we have:
lnP(Yi=1∣Xi,W)P(Yi=0∣Xi,W)=WTXi,
(2.2)
where P(Yi=1∣Xi,W)=eWTXi1+eWTXi.
The log-likelihood function is:
g(W)=N∑i=1ln(YiP(Yi=1∣Xi,W)+(1−Yi)P(Yi=0∣Xi,W)).
(2.3)
The objective function of the logistic regression model is:
L(W)=argminWN∑i=1(YiWTXi+ln(1+eWTXi)).
(2.4)
Linear models are easy to over-fitting in high dimensions because of the correlation between features. In the paper, the weight term of the model is constrained to alleviate the problem of over-fitting by adding a regular term c(W). We choose norm l2 regularization and use BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm [22] to solve the parameters.
L(W)=argminWN∑i=1(YiWTXi+ln(1+eWTXi))+c(W).
(2.5)
Let H be Hilbert space, a set of real function on space X⊆R. If there is a kernel function K:X×X→R satisfying ∀x∈X,K⟨⋅,x⟩∈R, then HK is called a reproducing kernel Hilbert space satisfying ∀x∈X,∀f∈H,f(x)=⟨f,K⟨⋅,x⟩⟩.
For a function f : X→R, sign function sgn(f)=1 if f(x)≥0 and sgn(f)=0 if f(x)<0. Then the logistic regression model is defined as sgn(fz,λ), where fz,λ is a minimizer of the following optimization problem involving a set of random sample S={zi}mi=1∈Zm :
fz,λ:=argminf∈HK{λ‖f‖2HK+Ez(f)}.
(2.6)
In Formula (2.6), ℓ(f,z)=−[yln(sigmoid(f(x)))+(1−y)ln(1−sigmoid(f(x)))] is the loss function, Ez(f)=1m∑mi=1ℓ(f,zi) is the empirical error, E(f)=E[ℓ(f,z)] denotes the generalization error of the corresponding function f. λ=1/(2C) is the regularization parameter, where C is a constant which depends on m:C=C(m) and often limm→∞C(m)=∞.
2.2. Uniformly ergodic Markov chains (ueMC)
Suppose a Markov chain on (Z,S) is a sequence of random variables {Zi}i≥1 with a set of transition probability measures P(n)(A∣zi),A∈S,zi∈Z. It is assumed that
P(n)(A∣zi)=P{Zn+i∈A∣Zj,j<i,Zi=zi},n>0.
(2.7)
For any Markov chain, if the transition probability is independent of time, the Markov chain is stationary [23].
Definition 1.Given two probabilities ν1,ν2 on the measure space (Z,S), we define ‖ν1−ν2‖TV=supA∈S|ν1(A)−ν2(A)| as the total variation distance between the measures ν1 and ν2. Meyn and Tweedie [24] pointed out that for a Markov chain {Zi}i≥1, if there are 0<γ0<∞ and 0<ρ0<1,
‖Pk(⋅∣z)−π(⋅)‖TV≤γ0ρk0,∀k≥1,k∈N,
(2.8)
where π(⋅) is the stationary distribution of {Zi}i≥1, then {Zi}i≥1 is a ueMC.
Another equivalent definition of ueMC is the following Doeblin condition [25].
Proposition 1.(Doeblin condition): Suppose {Zi}i≥1 is a Markov chain with transition probability Pn(⋅∣⋅), and μ is a specific non-negative metric with nonzero mass μ0. If there is some integer k such that ∀z∈Z and for all measurable sets A,Pk(A∣z)≤μ(A), then for any integer n, ∀z1,z2∈Z, we have
‖Pk(⋅∣z1)−Pk(⋅∣z2)‖TV≤2βn/k1,
(2.9)
where β1=1−μ0.
3.
Bounds of generalization ability
In this section, we estimate the bounds on the generalization performance of the logistic regression model based on the ueMC sampling by following the enlightening ideas of [26].
3.1. Bounds of generalization ability
To measure the generalization ability of fz,λ, we identify how sgn(fz,λ) converges (with respect to the misclassification error) to the best classifier as C(m) tends to infinity. Recall the regression function of φ, fφ=∫yydφ(y|x), x∈X. Then the Bayes rule is given by the sign of the regression function fc=sgn(fφ). Referring to Vapnik [27], the speed at which fz,λ approaches fφ is measured by excess generalization error E(fz,λ)−E(fφ). Since the minimization (2.6) is taken over the discrete quantity Ez(f), we have to regulate the capacity of the function set. Here the capacity is measured by the covering number.
We should estimate the excess misclassification error R(sgn(fz,λ))−R(fc) to bound the generalization ability of fz,λ. The relation between excess misclassification error and excess generalization error E(f)−E(fφ) for convex loss is for f:X→R,
R(sgn(f))−R(fc)≤E(f)−E(fφ).
(3.1)
Definition 2.Let U>0, and BU={f:f∈HK,‖f‖HK≤U} be the sphere with radius U in HK, let N(ϵ)=N(BU,ϵ),ϵ>0 be the covering number of BU.
Definition 3.The complexity index of HK is s, if there is some Cs>0 such that ∀ϵ>0,
lnN(ϵ)≤Cs(1/ϵ)s.
Proposition 2.Let fz,λ be the function defined by (2.6), and define fλ:=argminf∈HK{λ‖f‖2HK+E(f)} as a regularizing function, then for any λ>0,
where ξ1=ℓ(fz,λ,z)−ℓ(fφ,z) and ξ2=ℓ(fλ,z)−ℓ(fφ,z). □
Definition 4.We say the function fφ can be approximated by HK with exponent 0<β≤1, if there exists a constant Cβ such that for any λ>0,Rλ≤Cβλβ. In the paper, we assume that there is a constant B such that |fφ|≤B.
3.2. Main tools
To prove the main results, there are four lemmas.
Lemma 1.Let f be a continuous function defined on X and ‖f‖∞=supx∈X|f(x)|. Let κ=supx∈X√K⟨x,x⟩, then ‖f‖∞=supx∈X|f(x)|≤κ‖f‖HK,∀f∈HK.
Proof. According to the Cauchy-Schwartz inequality of PSD kernel and the reproducing property of reproducing kernel Hilbert space, we have:
f2(x)≤⟨f,f⟩⋅K⟨x,x⟩,
it follows that
supx∈X|f(x)|≤κ‖f‖HK,∀f∈HK.
□
Lemma 2.ξ2=ℓ(fλ,z)−ℓ(fφ,z), |fφ|≤B, we have
|ξ2|≤d:=κ√Rλ/λ+B+ln(1+eκ√Rλ/λ)+ln(1+eB).
(3.3)
Proof. Due to
Rλ=E(fλ)−E(fφ)+λ‖fλ‖2HK≥λ‖fλ‖2HK,
then ‖fλ‖HK≤√Rλ/λ. By applying Lemma 1, there holds ‖fλ‖∞≤κ√Rλ/λ as ξ2=ℓ(fλ,z)−ℓ(fφ,z) and |fφ|≤B, and when y=1,
Then, the proof ends with |ξ2|=|ℓ(fλ,z)−ℓ(fφ,z)|≤|fλ|+|fφ|+|ln(1+efφ)|+|ln(1+efλ)|. □
Lemma 3.(proved in [21]): Hoeffding's inequality: Let Z={Zi}mi=1 be a ueMC sample and F be a set of bounded and measurable functions, i.e. there is a constant C, such that ∀z∈Z, ∀g∈F, 0≤g(z)≤C. For any ε>0, we have
Pr{|1mm∑i=1g(zi)−E(g)|≥ε}≤2exp{−mε256C‖Γ0‖2E(g)},
(3.4)
where Γ0=√2/(1−β1/2t) and β=1−μ0. Here μ0 and t are from Doeblin condition [25].
Lemma 4.(Lemma 4 in [28]) Let c1, c2>0, and p1>p2>0, then the equation xp1−c1xp2−c2=0 has a unique positive zero x∗. In addition, x∗≤max{(2c1)1/(p1−p2),(2c2)1/p1}.
3.3. Main results
Through the above inference process, we form the following main results.
Theorem 1.Let Z={Zi}mi=1 be a ueMC sample, then for any 0<δ<1,
Proof. The proof is easily obtained from Theorem 1 with the proof of Theorem 2 in [21]. □
4.
Markov sampling and numerical studies
In this section, we introduce a ueMC algorithm to generate the samples from a given dataset. We give numerical studies on the learning performance of the logistic regression model and present some useful discussions.
4.1. Markov sampling algorithm
Here are the notations:
● ST: the initial training set
● Siid: a i.i.d sample from ST
● Smkv: a ueMC sample from ST
● NT: the number of positive samples in Smkv
● NF: the number of negative samples in Smkv
● k,m,N,n,a, max_iteration: super-parameters
The pseudocode of the ueMC algorithm is as follows:
The ueMC algorithm: Markov sampling for Logistic Regression
Input: ST,m,k,N,n, max_iteration, a. Output: fk. Step 1: Set i=0, draw randomly m samples from ST called Siid, and train Siid, get the premilinary mode fi. Step 2: Let Smkv=∅, NT=0, NF=0, t:=1. Step 3: Draw randomly a sample zt from ST, let Smkv=Smkv∪zt, NT=NT+1, if yt=1; NF=NF+1, if yt=0. set j:=0. Step 4: Draw randomly another sample zcandidate from ST, calculate p1=e−ℓ(fi,zcantididete)/e−ℓ(fi,zt), j:=j+1. Step 5: Draw randomly a number prandom from U(0,1). Step 6: If 1>p1>prandom, accept zcandidate with p1; If p1≥1 and ytycandidate=1, calculate p2=e−ycandidatefi/e−ytfi, if p2>prandom, accept zcandidate with p2; If p1≥1 and ytycandidate=−1, accept zcandidate with p1; If n samples are rejected continuously, accept zcandidate with p3=ap1. Step 7: If ycandidate=1 and NT<N2, let NT=NT+1,Smkv=Smkv∪zcandidate; If ycandidate =−1 and NF<N2, let NF=NF+1,Smkv=Smkv∪zcandidate. Step 8: If j> max_iteration or NT+NF>N, train Smkv to get fi+1, else go to Step 4. Step 9: If i<k, let i:=i+1 and go to Step 2, else output fi+1.
Compared with data noise, the over-fitting problem caused by small sample size has a stronger impact on the generalization performance of the classifier ([29]). Different from the methods from [11,19] based on threshold to eliminate noise data, the ueMC is constructed to avoid the problem of over-fitting caused by small training samples.
The ueMC algorithm uses the initial model f0 of Siid and the loss function ℓ(f,z) to construct the transition probability p1,p2,p3 of ueMC. Since p1,p2,p3 is greater than 0 and the sample size in ST is limited, Smkv obtained by the ueMC algorithm is a ueMC, according to the research conclusion of [30].
Different from the MCMC algorithm proposed by [31,32], the ueMC algorithm does not need to know the probability distribution information of the training set samples. In addition, when k=0, the algorithm degenerates into the classical logistic regression model. In order to make the following experimental results without loss of generality, we take k=2 and a=1.2.
4.2. Experiment results
In this subsection, we present the numerical study on the learning performance of the logistic regression model based on linear prediction models for 9 real-world datasets (from http://archive.ics.uci.edu/ml/datasets and https://www.fml.tuebingen.mpg.de/). The information of these data sets is summarized in Table 1, and all these datasets are 2-classes realworld datasets. The samples in these datasets obey non-independent identical distribution. We use the SMOTE algorithm to deal with unbalanced data. For each data set, it is randomly divided into two parts: the training set and the test set.
Table 1.
General Information about 9 Real-world Datasets.
In order to simplify the experimental process, we take N=m in the ueMC algorithm, and carry out 50 repeated experiments for each dataset. The experimental results are shown in Table 2, where "MR (i.i.d.)" and "MR (Markov)" denote the misclassification rates of the logistic regression model based on random sampling and Markov sampling, respectively. In the following, we also discuss the experimental results based on N<m.
Table 2.
Misclassification Rates (%) for 500 Training Samples.
From Table 2, we can find that for 500 training samples, the standard deviations and means of average misclassification rates of the logistic regression model based on Markov sampling are smaller than that of random sampling except UCI-Heart Disease and UCI-Skin, and the means of average misclassification rates based on Markov sampling in UCI-Heart Disease and UCI-Skin datasets are still smaller than that of random samples. To show the learning performance of the logistic regression model based on Markov sampling, we present the average misclassification rates for 50 experimental results of the logistic regression model based on Markov sampling (non-i.i.d) and random sampling (i.i.d.) for different training sizes and four datasets in Figures 1–4.
Figure 1.
Average misclassification rates for UCI-HTRU2 and m = 1000, 1500, 2000, 2500, 3000.
To have a better understanding of learning performance of the logistic regression model based on Markov sampling, the following figures are presented to show the 50 experimental misclassification rates of the logistic regression model based on Markov sampling.
Figures 1–8 show that for UCI-HTRU2, UCI-Waveform, ELEC2 and UCI-Diabetes, the logistic regression model based on Markov sampling would have better learning performance than that of random sampling as the number of training samples is large. For other datasets, since the experimental results are similar, we do not present all of them here.
For the case of N<m, Table 3 shows that for the datasets of UCI-HTRU2, ELEC2 and UCI-Waveform, the logistic regression model based on smaller Markov chain samples (600 samples for UCI-HTRU2, ELEC2 and UCI-Waveform) can present smaller misclassification rates compared to more i.i.d. samples (2000 samples).
Table 3.
Misclassification Rates for Different Training Sizes.
We compare the performance of the logistic regression model with the ueMC samples with the classical logistic regression model, SVMC, Adaboost, RandomForest and other classical machine learning models on UCI-HTRU2, UCI-Skin and MNIST. Tables 4 and 5 show the average classification error rates and Wilkeson signed rank test results of 50 experiments.
On the three real data sets, the difference between the logistic regression model based on the ueMC sampling and classical logistic regression models in classification error rates is statistically significant. It can be concluded that: 1) The generalization ability of the logistic regression model based on the ueMC sampling is better than that of the classical logistic regression model classifier, and it is robust. 2) The generalization ability of the logistic regression model with the ueMC samples is comparable to that of complex classifiers, such as random forest and Adaboost.
4.3. Explanation of learning performance
The ueMC algorithm divides the samples into three categories according to the model pre-trained in the previous step. The first one is the samples with correct classification and close to the decision boundary, the second one is the samples with correct classification but far away from the decision boundary, and the third one is the samples with wrong classification.
According to the definition of the loss function, the distance between the sample and the decision boundary determines the size of the loss. Samples close to the decision boundary would train a better decision boundary for classification. The ueMC algorithm designs the acceptance probability p1 and p2 which ensure that the samples obtained according to the acceptance probability are close to the decision boundary. Therefore, when the initial logistic regression model can better fit the data set, the ueMC algorithm can ensure that the ueMC samples are excellent samples. In addition, we do not directly eliminate the samples with classification errors or far away from the decision boundary, but accept them with a small probability, which not only ensures the excellent properties of the training set, but also maintains the diversity of the training set samples to a certain extent, and can reduce the error caused by the misjudgment of the pre-training model. Therefore, the learning performance of the logistic regression model based on Markov sampling is better than that of random sampling, and the classifier based on Markov sampling is more sparse compared to random sampling.
5.
Conclusions
To study the generalization performance of the logistic regression model based on the ueMC sampling, inspired by the idea from [21], we estimate first the generalization error of logistic model algorithm based on the ueMC sampling. The generalization error is deconstructed into sample error and regularization error by error decomposition, and the convergence of the algorithm is proved. In addition, we also generate the samples from given dataset by the ueMC algorithm. The numerical studies show that as the number of training samples is large, the learning performance of the logistic regression model based on Markov sampling is better than that of random sampling, and its performance is also better than that of classical machine model algorithms, such as random forest and Adaboost. In other words, the ueMC algorithm significantly improves the learning performance of the logistic regression model. To our knowledge, this study is the first on this topic.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work is supported by the Innovation and Talent Base for Digital Technology and Finance (B21038) and "the Fundamental Research Funds for the Central Universities", Zhongnan University of Economics and Law (2722023EJ002).
Conflict of interest
The authors declare there is no conflicts of interest.
References
[1]
A. Bayaga, Multinomial logistic regression: Usage and application in risk analysis, J. Appl. Quant. Methods, 5 (2010), 288–297.
[2]
A. Selmoune, Z. Liu, J. Lee, To pay or not to pay? Understanding public acceptance of congestion pricing: A case study of Nanjing, Electron. Res. Arch, 30 (2022), 4136–4156. https://doi.org/10.3934/era.2022209 doi: 10.3934/era.2022209
[3]
Z. Ahmad, Z. Almaspoor, F. Khan, S. E. Alhazmi, M. El-Morshedy, O. Y. Ababneh, et al., On fitting and forecasting the log-returns of cryptocurrency exchange rates using a new logistic model and machine learning algorithms, AIMS Math., 7 (2022), 18031–18049. https://doi.org/10.3934/math.2022993 doi: 10.3934/math.2022993
[4]
N. Dwarika, Asset pricing models in South Africa: A comparative of regression analysis and the Bayesian approach, Data Sci. Financ. Econ., 3 (2023), 55–75. https://doi.org/10.3934/DSFE.2023004 doi: 10.3934/DSFE.2023004
N. Kordzakhia, G. D. Mishra, L. Reiersølmoen, Robust estimation in the logistic regression model, J. Stat. Plan. Infer., 98 (2001), 211–223. https://doi.org/10.1016/S0378-3758(00)00312-8 doi: 10.1016/S0378-3758(00)00312-8
[7]
M. Rashid, Inference on Logistic Regression Models, Ph.D thesis, Bowling Green State University, 2008.
[8]
D. Dai, D. Wang, A generalized Liu-type estimator for logistic partial linear regression model with multicollinearity, AIMS Math., 8 (2023), 11851–11874. https://doi.org/10.3934/math.2023600 doi: 10.3934/math.2023600
[9]
Z. Wang, Z. Wang, B. Fu, Learning restricted bayesian network classifiers with mixed non-i.i.d. sampling, in 2010 IEEE International Conference on Data Mining Workshops, (2010), 899–904. https://doi.org/10.1109/ICDMW.2010.199
[10]
H. Sun, Q. Wu, Least square regression with indefinite kernels and coefficient regularization, Appl. Comput. Harmon A, 30 (2011), 96–109 https://doi.org/10.1016/j.acha.2010.04.001 doi: 10.1016/j.acha.2010.04.001
[11]
H. Sun, Q. Guo, Coefficient regularized regression with non-iid sampling, Int. J. Comput. Math., 88 (2011), 3113–3124. https://doi.org/10.1080/00207160.2011.587511 doi: 10.1080/00207160.2011.587511
[12]
X. Chu, H. Sun, Regularized least square regression with unbounded and dependent sampling, Abstr. Appl. Anal., 2013 (2013), 900–914. https://doi.org/10.1155/2013/139318. doi: 10.1155/2013/139318
[13]
Z. C. Guo, L. Shi, Learning with coefficient-based regularization and l1-penalty, Adv. Comput. Math., 39 (2013), 493–510. https://doi.org/10.1007/s10444-012-9288-6 doi: 10.1007/s10444-012-9288-6
[14]
B. Jiang, Q. Sun, J. Q. Fan, Bernstein's inequality for general Markov chains, preprint, arXiv: 1805.10721.
[15]
D. S. Modha, E. Masry, Minimum complexity regression estimation with weakly dependent observations, IEEE Trans. Inf. Theory, 42 (1996), 2133–2145. https://doi.org/10.1109/18.556602 doi: 10.1109/18.556602
[16]
F. Merlevède, M. Peligrad, E. Rio, Bernstein inequality and moderate deviations under strong mixing conditions, Inst. Math. Stat. (IMS) Collect., 2009 (2009), 273–292. https://doi.org/10.1214/09-IMSCOLL518 doi: 10.1214/09-IMSCOLL518
[17]
J. Q. Fan, B. Jiang, Q. Sun, Hoeffding's lemma for Markov Chains and its applications to statistical learning, preprint, arXiv: 1802.00211.
[18]
P. J. M. Laarhoven, E. H. L. Aarts, Simulated Annealing: Theory and Applications, Springer, Dordrecht, 1987.
[19]
J. Thongkam, G. Xu, Y. Zhang, et.al., Support vector machine for outlier detection in breast cancer survivability prediction, in Asia-Pacific Web Conference, Springer, (2008), 99–109. https://doi.org/10.1007/978-3-540-89376-9_10
[20]
A. L. B. Miranda, L. P. F. Garcia, A. C. P. L. F. Carvalho, A. C. Lorena, Use of classification algorithms in noise detection and elimination, in International Conference on Hybrid Artificial Intelligence Systems, Springer, (2009), 417–424. https://doi.org/10.1007/978-3-642-02319-4_50
[21]
J. Xu, Y. Y. Tang, B. Zou, Z. Xu, L. Li, Y. Lu, et al., The generalization ability of SVM classification based on Markov sampling, IEEE Trans. Cybern., 45 (2014), 1169–1179. https://doi.org/10.1109/TCYB.2014.2346536 doi: 10.1109/TCYB.2014.2346536
[22]
J. D. Head, M. C. Zerner, A Broyden—Fletcher—Goldfarb—Shanno optimization procedure for molecular geometries, Chem. Phys. Lett., 122 (1985), 264–270. https://doi.org/10.1016/0009-2614(85)80574-1 doi: 10.1016/0009-2614(85)80574-1
[23]
M. Vidyasagar, Learning and Generalization: With Applications to Neural Networks, Springer, London, 2003.
[24]
S. P. Meyn, R. L. Tweedie, Markov Chains and Stochastic Stability, Springer, Berlin, 2012.
[25]
P. Doukhan, Mixing: Properties and Examples, Springer, Berlin, 2012.
[26]
P. Zhang, N. Riedel, Discriminant analysis: A unified approach, in Fifth IEEE International Conference on Data Mining (ICDM'05), 2005. https://doi.org/10.1109/ICDM.2005.51
[27]
V. N. Vapnik, An overview of statistical learning theory, IEEE T. Neur. Net. Lear., 10 (1999), 988–999. https://doi.org/10.1109/72.788640 doi: 10.1109/72.788640
[28]
F. Cucker, S. Smale, Best choices for regularization parameters in learning theory: On the bias-variance problem, Found. Comput. Math., 2 (2002), 413–428. https://doi.org/10.1007/s102080010030 doi: 10.1007/s102080010030
M. P. Qian, G. L. Gong, Applied random processes, Peking University Press, Beijing, 1998.
[31]
W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57 (1970), 97–109. https://doi.org/10.1093/biomet/57.1.97 doi: 10.1093/biomet/57.1.97
[32]
S. Geman S, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., 6 (1984), 721–741. https://doi.org/10.1109/TPAMI.1984.4767596 doi: 10.1109/TPAMI.1984.4767596