1. Introduction
Let H be a real Hilbert space and let K be a nonempty, closed and convex subset of H. The equilibrium problem (EP) is to find an element u∗∈K such that f(u∗,y)⩾0 for all y∈K, (1.1)
where f:K×K→R is a bifunction satisfying f(z,z)=0 for all z∈K, and EP(f,K) denotes the solution set of EP (1.1). EP (1.1) generalizes many problems in optimization, such as variational inequality problems, Nash equilibrium problems and linear programming problems, among others.
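For example, the classical variational inequality is recovered from (1.1) by a suitable choice of the bifunction; the following short illustration is a standard observation and not specific to this paper, assuming an operator F:K→H:

```latex
% Variational inequality as a special case of EP (1.1):
% with f(x,y) := <F(x), y - x>, a point u* solves EP(f,K) exactly when
% <F(u*), y - u*> >= 0 for all y in K, i.e. u* solves VI(F,K).
f(x,y) := \langle F(x),\, y - x\rangle
\quad\Longrightarrow\quad
u^{\ast}\in EP(f,K) \iff \langle F(u^{\ast}),\, y - u^{\ast}\rangle \ge 0 \quad \forall\, y \in K .
```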
To solve EP (1.1), the two-step extragradient method (TSEM) was proposed by Tran et al. [25] in 2008, inspired by the ideas of Korpelevich [10] for solving variational inequalities. Under a Lipschitz-type condition on the bifunction f, a convergence theorem was proved; however, only weak convergence was obtained in Hilbert spaces. Strong convergence was achieved by Hieu [8] with the Halpern subgradient extragradient method (HSEM), adapted from the method of Kraikaew and Saejung [11] for variational inequalities. This method is defined by u∈H and
where λ is a constant chosen from the interval that makes the bifunction f satisfy the Lipschitz-type condition, and {αi}⊂(0,1) satisfies the standard conditions
Very recently, Muangchoo [14] combined a viscosity-type method with the extragradient algorithm to obtain a strong convergence theorem for EP (1.1). This method is defined by
where μ∈(0,σ)⊂(0,min{1, 1/(2c1), 1/(2c2)}), g is a contraction on H with contraction constant ξ∈[0,1) (‖g(x)−g(y)‖≤ξ‖x−y‖, ∀x,y∈H), {αi} satisfies the standard conditions limi→∞αi=0 and ∑∞i=1αi=+∞, and the step sizes {λi} are generated by an updating rule that does not require knowledge of the Lipschitz-type constants of the bifunction f and satisfies the following:
where Ki=f(zi,ti)−f(zi,vi)−c1‖zi−vi‖2−c2‖ti−vi‖2+1.
Finding techniques to speed up the convergence of algorithms is of great interest to many mathematicians. One such technique is the inertial method, first introduced by Polyak [16]. Very recently, Shehu et al. [22] combined the inertial technique with a Halpern-type algorithm and the subgradient extragradient method to obtain strong convergence to a solution of EP(f,K) with f pseudomonotone. This method is defined by γ∈H and
where the inertial parameter δi∈[0,1/3), τ∈(0,1/2], and the updated step size {λi} satisfies the following:
and {αi} still satisfies the standard conditions limi→∞αi=0 and ∑∞i=1αi=∞. This step-size update is restrictive in computation and cannot be modified in other ways.
In this paper, motivated and inspired by the above works in the literature, we introduce a modified inertial viscosity extragradient-type method for solving equilibrium problems. Moreover, we apply our main result to a data classification problem in machine learning and demonstrate the performance of our algorithm by comparing it with several existing methods.
2. Preliminaries
Let us begin with some concepts of monotonicity of a bifunction [2,15]. Let K be a nonempty, closed and convex subset of H. A bifunction f:H×H→R is said to be:
(i) strongly monotone on K if there exists a constant γ>0 such that
(ii) monotone on K if f(x,y)⩽−f(y,x),∀x,y∈K;
(iii) pseudomonotone on K if f(x,y)⩾0⟹f(y,x)⩽0,∀x,y∈K;
(iv) satisfying Lipschitz-type condition on K if there exist two positive constants c1,c2 such that
The normal cone NK to K at a point x∈K is defined by NK(x)={w∈H:⟨w,x−y⟩⩾0,∀y∈K}. For every x∈H, the metric projection PKx of x onto K is the nearest point of x in K, that is, PKx=argmin{‖y−x‖:y∈K}. Since K is nonempty, closed and convex, PKx exists and is unique. For each x,z∈H, by ∂2f(z,x), we denote the subdifferential of convex function f(z,.) at x, i.e.,
In particular, for z∈K,
For proving the convergence of the proposed algorithm, we need the following lemmas.
Lemma 2.1. [1] For each x∈H and λ>0,
where proxλg(x)=argmin{λg(y)+(1/2)‖x−y‖2:y∈K}.
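When no closed form is available and K is a simple set (for instance a box), this proximal step can be evaluated numerically. The following is a minimal sketch in Python; the box K, the convex function g, and all numerical values are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def prox(x, lam, g, bounds):
    """Approximate prox_{lam*g}(x) = argmin_{y in K} { lam*g(y) + 0.5*||x - y||^2 },
    where K is the box described by `bounds`."""
    objective = lambda y: lam * g(y) + 0.5 * np.sum((y - x) ** 2)
    lower = np.array([b[0] for b in bounds])
    upper = np.array([b[1] for b in bounds])
    y0 = np.clip(x, lower, upper)                     # feasible starting point
    return minimize(objective, y0, bounds=bounds, method="L-BFGS-B").x

# illustrative data: K = [0, 1]^3 and a smooth convex g (softplus penalty)
g = lambda y: np.logaddexp(0.0, y).sum()
x = np.array([1.5, -0.3, 0.7])
print(prox(x, lam=0.1, g=g, bounds=[(0.0, 1.0)] * 3))
```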
For any point u∈H, there exists a point PKu∈K such that
PK is called the metric projection of H onto K. It is well known that PK is a nonexpansive mapping of H onto K, that is, ‖PKx−PKy‖≤‖x−y‖ for all x,y∈H.
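When K has a simple geometry, the projection is available in closed form; for a closed ball K={x:‖x−c‖⩽r} one has PKx = c + r(x−c)/max(r,‖x−c‖). A small numerical illustration of this formula and of the nonexpansiveness of PK (the center, radius, and sample points are made up):

```python
import numpy as np

def project_ball(x, c, r):
    """Metric projection onto the closed ball {y : ||y - c|| <= r}."""
    d = x - c
    return c + d * (r / max(r, np.linalg.norm(d)))

c, r = np.zeros(3), 1.0
x, y = np.array([2.0, 0.0, 0.0]), np.array([0.0, 3.0, 1.0])
Px, Py = project_ball(x, c, r), project_ball(y, c, r)
# nonexpansiveness: ||P_K x - P_K y|| <= ||x - y||
print(np.linalg.norm(Px - Py) <= np.linalg.norm(x - y))  # True
```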
Lemma 2.2. [18] Let g:K→R be a convex, subdifferentiable and lower semicontinuous function on K. Suppose the convex set K has nonempty interior, or g is continuous at a point x∗∈K. Then, x∗ is a solution to the following convex problem min{g(x):x∈K} if and only if 0∈∂g(x∗)+NK(x∗), where ∂g(.) denotes the subdifferential of g and NK(x∗) is the normal cone of K at x∗.
Lemma 2.3. [19,22] Let S be a nonempty, closed and convex subset of a real Hilbert space H. Let u∈H be arbitrarily given, z=PSu, and Ω={x∈H:⟨x−u,x−z⟩⩽0}. Then Ω∩S={z}.
Lemma 2.4. [28] Let {ai} and {ci} be nonnegative sequences of real numbers such that ∑∞i=1ci<∞, and let {bi} be a sequence of real numbers such that lim supi→∞bi⩽0. If, for each i∈N,
where {γi} is a sequence in (0,1) such that ∑∞i=1γi=∞, then limi→∞ai=0.
Lemma 2.5. [29] Let {ai}, {bi} and {ci} be positive sequences such that
If ∑∞i=1ci<+∞ and ∑∞i=1bi<+∞, then limi→+∞ai exists.
The convergence of Algorithm 3.1 will be established under the following conditions.
Condition 2.6. (A1) f is pseudomonotone on K with int(K)≠∅ or f(x,.) is continuous at some z∈K for every x∈K;
(A2) f satisfies Lipschitz-type condition on H with two constants c1 and c2;
(A3) f(.,y) is sequentially weakly upper semicontinuous on K for each fixed point y∈K, i.e., if {xi}⊂K is a sequence converging weakly to x∈K, then f(x,y)⩾lim supi→∞f(xi,y);
(A4) for x∈H, f(x,.) is convex and lower semicontinuous, subdifferentiable on H;
(A5) V:H→H is a contraction with contraction constant α.
3. Main results
Now, we are in a position to present a modification of algorithm (EGM) in [25] for equilibrium problems.
In this section, we will analyse the convergence of Algorithm 3.1.
For the rest of this paper, we assume the following condition.
Condition 3.2. (i) {αi}⊂(0,1] is non-increasing with limi→∞αi=0 and ∑∞i=1αi=∞;
(ii) 0≤θi≤θi+1≤θ<1/3 and limi→∞(θi/αi)‖xi−xi−1‖=0;
(iii) EP(f,K)≠∅.
Before we prove the strong convergence result, we need some lemmas below.
Lemma 3.3. Assume that Conditions 2.6 and 3.2 hold. Let {xi} be generated by Algorithm 3.1. Then there exists N>0 such that
Proof. By the definition of yi, and Lemma 2.1, we have
Putting y=zi into (3.2), we obtain
By the definition of zi, we have
(3.3) and (3.4) imply that
If f(wi,zi)−f(wi,yi)−f(yi,zi)>0, then
Observe that (3.6) is also satisfied if f(wi,zi)−f(wi,yi)−f(yi,zi)≤0. By (3.5) and (3.6), we have
Note that
and
Using (3.8) and (3.9) in (3.7), we obtain, for all y∈K,
Taking y=u∈EP(f,K)⊂K, one has f(u,yi)⩾0,∀i. By (A1), we obtain f(yi,u)⩽0, ∀i. Hence, we obtain from (3.10) that
It follows from λi∈(0, 1/(2max{c1,c2})) and (3.11) that
On the other hand, we have
Substituting (3.11) into (3.13), we obtain
Moreover, we have zi−wi=(1/τ)(xi+1−wi), which together with (3.14) gives
where ϵ=(1−τ)/τ.
Lemma 3.4. Assume that Conditions 2.6 and 3.2 hold. Let {xi} be generated by Algorithm 3.1. Then, for all u∈EP(f,K),
Proof. By Lemma 2.5, we have
Moreover, from the definition of wi, we obtain that
Replacing u by xi+1 in (3.18), we have
Substituting (3.18) and (3.19) into (3.17), we have
Therefore, we obtain
It follows that
Since θi is non-decreasing and αi is non-increasing, we then obtain
Lemma 3.5. Assume that Conditions 2.6 and 3.2 hold. Then {xi} generated by Algorithm 3.1 is bounded.
Proof. From (3.15) and Condition 3.2 (ii), there exists K>0 such that
This implies that ‖xi+1−u‖⩽max{‖x1−u‖, (‖V(u)−u‖+K)/(1−α)}. This shows that {xi} is bounded.
Lemma 3.6. Assume that Conditions 2.6 and 3.2 hold. Let {xi} be generated by Algorithm 3.1. For each i⩾1, define
Then ui⩾0.
Proof. Since {θi} is non-decreasing with 0⩽θi<1/3, and 2⟨x,y⟩=‖x‖2+‖y‖2−‖x−y‖2 for all x,y∈H, we have
This completes the proof.
Lemma 3.7. Assume that Conditions 2.6 and 3.2 hold. Let {xi} be generated by Algorithm 3.1. Suppose
and
Then {xi} converges strongly to u∈EP(f,K).
Proof. By our assumptions, we have
In the case
this implies that {xi} converges strongly to u immediately. Assume this limit does not hold. Then there is a subset N∗⊆N and a constant ρ>0 such that
Using (3.27) and θi⩽θ<1, for i∈N∗ it then follows that
Consequently, we have lim supi→∞‖xi−u‖⩽0. Since lim infi→∞‖xi−u‖⩾0 obviously holds, it follows that limi→∞‖xi−u‖=0. This implies (by (3.28))
for all i∈N∗ sufficiently large, a contradiction to the assumption that limi→∞‖xi+1−xi‖=0. This completes the proof.
We now give the following strong convergence result of Algorithm 3.1.
Theorem 3.8. Assume that Conditions 2.6 and 3.2 hold. Then {xi} generated by Algorithm 3.1 strongly converges to the solution u=PEP(f,K)V(u).
Proof. From Lemma 3.6 and (3.16), we have
Since PEP(f,K)V is contraction, by the Banach fixed point theorem, there exist unique u=PEP(f,K)V(u). It follow from Lemma 3.3 that
We will consider into 2 cases.
Case 1. Suppose ui+1⩽ui+ti for all i⩾i0 for some i0∈N, where ti⩾0 and ∑∞i=1ti<+∞. Since ui⩾0 for all i⩾1, by Lemma 2.5 the limit limi→∞ui exists. Since {xi} is bounded by Lemma 3.5, there exist M1>0 such that 2|⟨xi−u,xi−V(xi)⟩|⩽M1 and M2>0 such that ‖xi+1−V(xi+1)‖2+‖V(xi)−xi+1‖2⩽M2. Since 0⩽θi⩽θi+1⩽θ<1/3 and limi→∞αi=0, there exist N∈N and γ1>0 such that 1−3θi+1−αi⩾γ1 for all i⩾N. Therefore, for i⩾N, we obtain from (3.29) that
as i→∞. Hence limi→∞‖xi+1−xi‖=0. For u∈EP(f,K), we have
and from (3.14), we have
This implies that
By our condition and (3.31), we obtain
Since {xi} is bounded, there exists a subsequence {xik} of {xi} such that xik⇀x∗ for some x∗∈H. From (3.31) and (3.35), we get wik⇀x∗ and yik⇀x∗ as k→∞.
By the definition of zi and (3.6), we have
It follows from the boundedness of {zik}, 0<λik⩽λ<1/(2max{c1,c2}) and Condition 2.6 (A3) that 0⩽lim supk→∞f(yik,y)⩽f(x∗,y) for all y∈H. This implies that f(x∗,y)⩾0 for all y∈K, which shows that x∗∈EP(f,K). Then, we have
by u=PEP(f,K)V(u). Applying (3.36) to the inequality (3.30) with Lemma 2.4, we can conclude that xi→u=PEP(f,K)V(u) as i→∞.
Case 2. Otherwise, let ϕ:N→N be the map defined for all i⩾i0 (for some i0∈N large enough) by
Clearly, ϕ(i) is non-decreasing, ϕ(i)→∞ as i→∞, and uϕ(i)+tϕ(i)⩽uϕ(i)+1 for all i⩾i0. Hence, similarly to the proof of Case 1, we obtain from (3.31) that
for some constants M1,M2>0. Thus
By the same argument as in Case 1, one also derives
Again, observe that for j⩾0, by (3.29) we have uj+1<uj+tj when xj∉Ω={x∈H:⟨x−x0,x−u⟩⩽0}. Hence xϕ(i)∈Ω for all i⩾i0, since uϕ(i)+tϕ(i)⩽uϕ(i)+1. Since {xϕ(i)} is bounded, there exists a subsequence of {xϕ(i)} which converges weakly to some x∗∈H. As Ω is a closed and convex set, it is weakly closed and so x∗∈Ω. Using (3.40), one can see as in Case 1 that zϕ(i)⇀x∗ and x∗∈Ω∩EP(f,K), which contains u as its only element. We therefore have x∗=u. Furthermore,
due to xϕ(i)∈Ω. This gives
Hence
By the definition of uϕ(i)+1, we have
By our Condition 3.2 (i), (3.39) and (3.41), we obtain limi→∞uϕ(i)+1=0. We next show that we actually have limi→∞ui=0. To this end, first observe that, for i⩾i0, one has ui+ti⩽uϕ(i)+1 if i≠ϕ(i). It follows that for all i⩾i0, we have ui⩽max{uϕ(i),uϕ(i)+1}=uϕ(i)+1→0, since limi→∞ti=0, hence lim supi→∞ui⩽0. On the other hand, Lemma 3.6 implies that lim infi→∞ui⩾0. Hence, we obtain limi→∞ui=0. Consequently, the boundedness of {xi}, limi→∞αi=0, and (3.29) show that ‖xi−xi+1‖→0, as i→∞. Hence the definition of ui yields (‖xi+1−u‖2−θi‖xi−u‖2)→0, as i→∞. By using Lemma 3.7, we obtain the desired conclusion immediately.
Setting V(x)=x0, ∀x∈H, then we obtain the following modified Halpern inertial extragradient algorithm for EPs:
From Algorithm 3.1, the convergence depends on the parameter {λi} with the condition 0<λi≤λ<1/(2max{c1,c2}). Hence, the step size {λi} can be chosen in several ways. Applying the step-size concept of Shehu et al. [22], we obtain the following modified viscosity-type inertial extragradient step-size algorithm for EPs:
Remark 3.11. (i) Since V(x)=x0, ∀x∈H, is a contraction, the modified Halpern inertial extragradient Algorithm 3.9 converges strongly to x∗=PEP(f,K)x0 under Conditions 2.6 and 3.2;
(ii) Since the step size {λi} in Algorithm 3.10 is a monotonically decreasing sequence with lower bound min{λ1, 1/(2max{c1,c2})} [22], Algorithm 3.10 converges strongly to the solution u=PEP(f,K)V(u) by Theorem 3.8.
We now give an example in the infinite-dimensional space L2[0,1] to support the main theorem.
Example 3.12. Let V:L2[0,1]→L2[0,1] be defined by V(x(t))=x(t)/2, where x(t)∈L2[0,1]. We choose x0(t)=sin(t)/2 and x1(t)=sin(t). The stopping criterion is ‖xi−xi−1‖<10−2.
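A minimal numerical sketch of how the stopping criterion ‖xi−xi−1‖<10−2 can be checked in L2[0,1], approximating the norm on a uniform grid (the grid resolution and the pair of iterates shown are only illustrative):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)

def l2_norm(u, t):
    """Grid (Riemann-sum) approximation of the L^2[0,1] norm of a sampled function u."""
    return np.sqrt(np.sum(u ** 2) * (t[1] - t[0]))

x_prev = np.sin(t) / 2.0   # x_0(t) = sin(t)/2
x_curr = np.sin(t)         # x_1(t) = sin(t)
if l2_norm(x_curr - x_prev, t) < 1e-2:
    print("stopping criterion met")
else:
    print("continue iterating")
```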
We set the following parameters for each algorithm, as seen in Table 1.
Next, we compare the performance of Algorithms 3.1, 3.9 and 3.10. We obtain the results as seen in Table 2.
From Figure 1, we see that the performance of Algorithm 3.10 is better than Algorithms 3.1 and 3.9.
4. Application to data classification problem
According to the International Diabetes Federation (IDF), there are approximately 463 million people with diabetes worldwide, and it is estimated that by 2045 there will be 629 million. In Thailand, the incidence of diabetes is continuously increasing: there are about 300,000 new cases per year, and 3.2 million people with diabetes are registered in the Ministry of Public Health's registration system. This causes huge losses in health care costs; diabetes alone accounts for average treatment costs of up to 47,596 million baht per year, which has led to an ongoing campaign about the dangers of the disease. Furthermore, diabetes mellitus is a noncommunicable disease that puts patients at high risk, as they are more susceptible to infectious diseases such as COVID-19 [23]. Because it is a chronic disease that cannot be cured, there is a risk of complications spreading to the extent of losing vital organs of the body. The International Diabetes Federation and the World Health Organization (WHO) have designated November 14 of each year as World Diabetes Day to recognize the importance of this disease.
In this research, we used the PIMA Indians diabetes dataset, which was downloaded from Kaggle (https://www.kaggle.com/uciml/pima-indians-diabetesdatabase) and is publicly available on the UCI repository, for training our proposed algorithm. The dataset contains 768 pregnant female patients, of which 500 were non-diabetic and 268 were diabetic. There are 9 variables in the dataset; eight variables contain information about the patients, and the 9th variable is the class labeling the patients as diabetic or non-diabetic. The attributes are: number of times pregnant; plasma glucose concentration at 2 hours in an oral glucose tolerance test (GTIT); diastolic blood pressure (mm Hg); triceps skin fold thickness (mm); 2-hour serum insulin (lh/ml); body mass index [weight in kg/(height in m)²]; diabetes pedigree function; age (years); and a binary value indicating non-diabetic/diabetic. For the implementation of machine learning algorithms, 614 samples were used as the training dataset and 154 as the testing dataset, using 5-fold cross-validation [12]. For benchmarking the classifier, we consider the following methods which have been proposed to classify diabetes (see Table 3):
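A minimal sketch of the data handling described above, assuming the Kaggle CSV has been saved locally as `diabetes.csv` with the eight attributes in the first columns and the class label in the last column (the file name and column layout are assumptions):

```python
import pandas as pd
from sklearn.model_selection import KFold

data = pd.read_csv("diabetes.csv")            # 768 rows, 9 columns
X = data.iloc[:, :8].to_numpy(dtype=float)    # eight patient attributes
y = data.iloc[:, 8].to_numpy(dtype=int)       # 0 = non-diabetic, 1 = diabetic

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]   # roughly 614 / 154 samples per fold
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train the ELM classifier on (X_train, y_train) and evaluate on (X_test, y_test)
```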
We focus on the extreme learning machine (ELM) proposed by Huang et al. [9] to apply our algorithms to data classification problems. It is defined as follows:
Let E:={(xn,tn):xn∈Rn,tn∈Rm,n=1,2,...,P} be a training set of P distinct samples, where xn is an input training datum and tn is a training target. The output function of ELM for single-hidden-layer feedforward neural networks (SLFNs) with M hidden nodes and activation function U is
where wj and bj are the weight parameters and the bias, respectively, and Θj is the optimal output weight at the j-th hidden node. The hidden layer output matrix H is defined as follows:
To solve the ELM is to find the optimal output weight Θ=[Θ1^T,...,ΘM^T]^T such that HΘ=T, where T=[t1^T,...,tP^T]^T is the training target matrix. In some cases, Θ=H†T, where H† is the Moore–Penrose generalized inverse of H. However, if the inverse of H does not exist, finding such a solution Θ through convex minimization can overcome this difficulty.
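A minimal sketch of this ELM training step: build the hidden-layer output matrix H with random input weights and a sigmoid activation, then take Θ = H†T via the Moore–Penrose pseudoinverse (the random initialization, seed, and dimensions are illustrative, not the paper's exact setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, M, seed=0):
    """X: (P, d) inputs, T: (P, m) targets, M: number of hidden nodes.
    Returns the random input weights W, biases b, and Theta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    P, d = X.shape
    W = rng.standard_normal((d, M))    # input weights w_j
    b = rng.standard_normal(M)         # biases b_j
    H = sigmoid(X @ W + b)             # hidden-layer output matrix, shape (P, M)
    Theta = np.linalg.pinv(H) @ T      # output weights via the Moore-Penrose inverse
    return W, b, Theta

def elm_predict(X, W, b, Theta):
    return sigmoid(X @ W + b) @ Theta
```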
In this section, we perform some experiments on the classification problem. This problem can be seen as the following convex minimization problem:
where λ is a regularization parameter. This problem is called the least absolute shrinkage and selection operator (LASSO) [26]. We set f(Θ,ζ)=⟨H^T(HΘ−T),ζ−Θ⟩ and V(x)=Cx, where C is a constant in (0,1).
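With this choice, f(Θ,·) is affine in its second argument, so (assuming the feasible set is all of ℝⁿ) the proximal step argmin_y {λf(w,y)+(1/2)‖w−y‖²} has the closed form y = w − λH^T(Hw−T). The sketch below only mirrors the two-step extragradient structure recalled in the Introduction under this assumption; it is not a verbatim transcription of Algorithm 3.1, whose full statement also combines inertial and viscosity steps:

```python
import numpy as np

def ep_prox_step(center, point, lam, H, T):
    """argmin_y { lam*f(point, y) + 0.5*||center - y||^2 } over y in R^n,
    with f(u, y) = <H^T (H u - T), y - u>; since f(u, .) is affine,
    the minimizer is center - lam * H^T (H point - T)."""
    return center - lam * H.T @ (H @ point - T)

def viscosity(x, C=0.9999):
    """Viscosity map V(x) = Cx with C in (0, 1)."""
    return C * x

# illustrative shapes mimicking the ELM setting (random placeholder data)
rng = np.random.default_rng(0)
H = rng.standard_normal((614, 160))   # hidden-layer output matrix of a training fold
T = rng.standard_normal((614, 1))     # training targets
w = np.zeros((160, 1))                # current iterate
lam = 1e-3
y = ep_prox_step(w, w, lam, H, T)     # first extragradient sub-step
z = ep_prox_step(w, y, lam, H, T)     # second sub-step: f evaluated at y, centered at w
```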
The binary cross-entropy loss function, together with the sigmoid activation function for binary classification, calculates the loss of an example by computing the following average:
where ŷj is the j-th scalar value in the model output, yj is the corresponding target value, and K is the number of scalar values in the model output.
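A short sketch of this average binary cross-entropy with a sigmoid output (the small clipping constant is added only for numerical safety, and the sample values are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average BCE: -(1/K) * sum_j [ y_j*log(yhat_j) + (1 - y_j)*log(1 - yhat_j) ]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

scores = np.array([2.0, -1.0, 0.5])   # raw model outputs
y_hat = sigmoid(scores)               # predicted probabilities
y = np.array([1.0, 0.0, 1.0])         # target labels
print(binary_cross_entropy(y, y_hat))
```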
In this work, the performance of the machine learning techniques is measured over all classes. The accuracy is calculated by dividing the total number of correct predictions by the total number of predictions. The precision and recall performance parameters are also measured. The three measures [27] are defined as follows:
where the confusion matrix for the original and predicted classes is expressed in terms of TP, TN, FP, and FN, which denote the true positives, true negatives, false positives, and false negatives, respectively. Similarly, P and N are the positive and negative populations of diabetic and non-diabetic cases, respectively.
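A minimal sketch of the three measures computed from the confusion-matrix counts (standard formulas; the counts below are made up for illustration):

```python
def classification_metrics(TP, TN, FP, FN):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    return accuracy, precision, recall

# illustrative counts for one validation fold
print(classification_metrics(TP=40, TN=83, FP=14, FN=17))
```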
To start our computations, we set the activation function as sigmoid, hidden nodes M=160, regularization parameter λ=1×10−5, θi=0.3, αi=1/(i+1), τ=0.5, μ=0.2 for Algorithms 3.1, 3.9 and 3.10, and C=0.9999 for Algorithms 3.1 and 3.10. The stopping criterion is 250 iterations. We obtain the results for different parameters S when λi=S/max(eigenvalue(A^T A)) for Algorithms 3.1 and 3.9, and for different parameters λ1 for Algorithm 3.10, as seen in Table 4.
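A short sketch of how a step size of the form λi = S/max(eigenvalue(A^T A)) can be computed; here A is assumed to play the role of the hidden-layer output matrix, and its shape and the value S=0.99 are only examples from the tested range:

```python
import numpy as np

def step_size(A, S=0.99):
    """lambda = S / max(eigenvalue(A^T A)); eigvalsh is used since A^T A is symmetric."""
    return S / np.max(np.linalg.eigvalsh(A.T @ A))

A = np.random.default_rng(0).standard_normal((614, 160))  # e.g. training hidden-layer matrix
lam = step_size(A, S=0.99)
```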
We can see that λi=λ1=0.99 greatly improves the performance of Algorithms 3.1, 3.9 and 3.10. Therefore, we choose it as the default step-size parameter for the next computations.
We next choose λi=0.99/max(eigenvalue(A^T A)), αi=1/(i+1), τ=0.5 for Algorithms 3.1 and 3.9, C=0.9999 for Algorithm 3.1, and λ1=0.99/max(eigenvalue(A^T A)), αi=1/(i+1), τ=0.5, C=0.9999, μ=0.2 for Algorithm 3.10. The stopping criterion is 250 iterations. We consider different initialization parameters θ, where
where N is the number of iterations at which we want to stop. We obtain the numerical results shown in Table 5.
We can see that θ=0.3 greatly improves the performance of Algorithms 3.1, 3.9 and 3.10. Therefore, we choose it as the default inertial parameter for the next computations.
We next set λi=0.99/max(eigenvalue(A^T A)), θi=0.3, τ=0.5 for Algorithms 3.1 and 3.9, C=0.9999 for Algorithm 3.1, and λ1=0.99/max(eigenvalue(A^T A)), θi=0.3, τ=0.5, C=0.9999, μ=0.2 for Algorithm 3.10. The stopping criterion is 250 iterations. We consider different initialization parameters αi. The numerical results are shown in Table 6.
We can see that αi=1/(10i+1) greatly improves the performance of Algorithm 3.1, αi=1/(10i²+1) greatly improves the performance of Algorithm 3.9, and αi=1/(i²+1) greatly improves the performance of Algorithm 3.10. Therefore, we choose these as the default parameters for the next computations.
We next compute the numerical results by setting λi=0.99/max(eigenvalue(A^T A)), θi=0.3, αi=1/(10i+1) and C=0.9999 for Algorithm 3.1; λi=0.99/max(eigenvalue(A^T A)), θi=0.3, αi=1/(10i²+1) for Algorithm 3.9; and λ1=0.99/max(eigenvalue(A^T A)), θi=0.3, C=0.9999, αi=1/(i²+1), μ=0.2 for Algorithm 3.10. The stopping criterion is 250 iterations. We consider different initialization parameters τ. The numerical results are shown in Table 7.
We can see that τ=0.5 greatly improves the performance of Algorithms 3.1, 3.9 and 3.10. Therefore, we choose it as the default parameter for the next computations.
We next compute the numerical results by setting λi=0.99/max(eigenvalue(A^T A)), θi=0.3, τ=0.5 for Algorithms 3.1 and 3.9, with αi=1/(10i+1) for Algorithm 3.1 and αi=1/(10i²+1) for Algorithm 3.9, and λ1=0.99/max(eigenvalue(A^T A)), θi=0.3, αi=1/(i²+1), τ=0.5, μ=0.2 for Algorithm 3.10. The stopping criterion is 250 iterations. We obtain the results for different parameters C when V(x)=Cx for Algorithms 3.1 and 3.10, as seen in Table 8.
From Tables 3–8, we choose the parameters for Algorithm 3.1 in order to compare it with existing algorithms from the literature. Table 9 shows the chosen parameters of each algorithm.
For the comparison, we set the sigmoid as the activation function, hidden nodes M=160, and regularization parameter λ=1×10−5.
Table 10 shows that Algorithm 3.1 has the highest precision, recall, and accuracy, together with the lowest number of iterations. It has the highest probability of classifying patients correctly compared with the other algorithms examined. We present the training and validation loss together with the training accuracy to show that Algorithm 3.1 does not overfit the training dataset.
From Figures 2 and 3, we see that our Algorithm 3.1 gives a good fit; this means that Algorithm 3.1 suitably learns the training dataset and generalizes well to classify the PIMA Indians diabetes dataset.
5. Conclusions
In general, screening for diabetes in pregnancy follows the American College of Obstetricians and Gynecologists (ACOG) recommendations. The accuracy of our method is 80.03%, and such high accuracy may be used to correctly predict diabetes in pregnancy in the future.
In this paper, we introduced a modified extragradient method with an inertial extrapolation step and a viscosity-type method to solve equilibrium problems with pseudomonotone bifunctions in real Hilbert spaces. We then proved a strong convergence theorem for the proposed algorithm under the assumption that the bifunction satisfies a Lipschitz-type condition. Moreover, we showed that the step-size parameter {λi} can be chosen in several ways, which makes our algorithm flexible to use; see Algorithms 3.1 and 3.10. Finally, we showed that our algorithms perform better than existing algorithms in solving the diabetes mellitus classification problem in machine learning.
Acknowledgments
This research was also supported by Fundamental Fund 2022, Chiang Mai University and the NSRF via the Program Management Unit for Human Resources and Institutional Development, Research and Innovation (grant number B05F640183). W. Cholamjiak would like to thank National Research Council of Thailand (N42A650334) and Thailand Science Research and Innovation, the University of Phayao (FF65-UOE).
Conflict of interest
The authors declare no conflict of interest.