1. Introduction
Let H be a real Hilbert space equipped with an inner product ⟨⋅,⋅⟩ and its induced norm ‖⋅‖. In this work, we deal with the minimization of a strongly convex smooth function over the common fixed-point constraints, which is in the following form:
where FixTi:={x∈H:Ti(x)=x} and the objective function and the constrained operators satisfy the following assumptions:
(A1) The function f:H→R is η-strongly convex and L-smooth.
(A2) The operators Ti:H→H, i=1,…,m, are cutters with ⋂ᵢ₌₁ᵐ FixTi ≠ ∅.
Thanks to the closedness and convexity of the common fixed-point set ⋂ᵢ₌₁ᵐ FixTi and the strong convexity of f, the solution set of the problem (1.1) is a singleton. Throughout this work, we denote the unique solution of the problem (1.1) by x∗.
1.1. Related works
The minimization problem and its generalizations (such as the variational inequality problem) over common fixed-point constraints have been investigated by many authors; we briefly note some related works as follows. Bauschke [1] considered the problem (1.1) in the case when the objective function is given by f(x) := (1/2)‖x−a‖², where a∈H is a given point and the operators Ti, i=1,2,…,m, are nonexpansive. He proposed the following method: Given x1∈H and a nonnegative sequence {βk}∞k=1, for all k∈N, compute
where the function [⋅] is the modulo m function taking values in {1,…,m}. Bauschke proved that the sequence {xk}∞k=1 generated by the proposed method (1.2) converges strongly to the unique solution of the considered problem, provided that the constrained set satisfies
and the sequence {βk}∞k=1⊂[0,1) satisfies limk→∞βk=0, ∑∞k=1βk=∞, and ∑∞k=1|βk+m−βk|<∞.
After that, Yamada [2] considered a more general setting of the problem (1.1), namely the variational inequality problem governed by a strongly monotone and Lipschitz continuous operator F:H→H. Thanks to the optimality condition, it is well known that the problem (1.1) is recovered from this setting by taking F := ∇f. He proposed the so-called hybrid steepest descent method, which is in the following form: Given x1∈H and a nonnegative sequence {βk}∞k=1, for all k∈N, compute
Under the same assumption (1.3) on the constrained set and the same conditions on the control sequence {βk}∞k=1, the strong convergence of the method (1.4) to the unique solution of the considered problem was obtained.
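For concreteness, the following is a minimal numerical sketch of the hybrid steepest descent iteration, assuming its standard form xk+1 = T[k+1](xk) − μβk+1∇f(T[k+1](xk)); the operators, data, and parameter choices below are our own illustrative assumptions, not those of [2]:

```python
import numpy as np

def proj_halfspace(a, b):
    """Metric projection onto the half-space {x : <a, x> <= b}."""
    return lambda x: x - (max(0.0, a @ x - b) / (a @ a)) * a

def hsdm(grad_f, ops, x0, mu=0.5, iters=2000):
    """Hybrid steepest descent (sketch): cyclic operator step, then damped gradient step."""
    x = x0.astype(float)
    m = len(ops)
    for k in range(1, iters + 1):
        beta = 1.0 / k                 # beta_k -> 0 and sum beta_k = inf
        y = ops[(k - 1) % m](x)        # T_[k](x_k), cyclic control
        x = y - mu * beta * grad_f(y)  # damped step along -grad f
    return x

# minimize f(x) = 0.5*||x - a||^2 over {x1 <= 0} and {x2 <= 0};
# the solution is the projection of a = (1, 1) onto the nonpositive orthant, i.e. 0
a_pt = np.array([1.0, 1.0])
ops = [proj_halfspace(np.array([1.0, 0.0]), 0.0),
       proj_halfspace(np.array([0.0, 1.0]), 0.0)]
x_out = hsdm(lambda x: x - a_pt, ops, np.array([5.0, -3.0]))
assert np.linalg.norm(x_out) < 0.05
```

The iterates hover within roughly O(βk) of the solution, which is why the damping βk → 0 is essential.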
Xu and Kim [3] investigated the problem in the same setting as Yamada [2] and obtained convergence of the method (1.4) under a variant condition on the control sequence {βk}∞k=1⊂(0,1], namely limk→∞ βk/βk+1 = 1, in place of the condition ∑∞k=1|βk+m−βk|<∞ given in Yamada [2]. The other conditions on the control sequence {βk}∞k=1 and the assumption (1.3) on the constrained set remain the same as above. After the works of Yamada [2] and Xu and Kim [3] were presented, many authors considered variants and generalizations; see, for instance, [4,5,6,7,8,9].
Prangprakhon et al. [10] considered the variational inequality problem over the intersection of fixed-point sets of firmly nonexpansive operators. They presented an iterative algorithm that generalizes the hybrid steepest descent method by allowing appropriate additional terms when computing operator values; this added information can be interpreted as possible numerical errors in the computation of the operators' values. They subsequently proved the strong convergence of the proposed method without the assumption (1.3) on the constrained set.
Prangprakhon and Nimana [11] subsequently proposed the so-called extrapolated sequential constraint method with conjugate gradient direction (ESCoM-CGD) for solving the variational inequality problem over the intersection of the fixed-point sets of cutters. ESCoM-CGD is motivated by the ideas of the hybrid conjugate gradient method [12], an accelerated version of the hybrid steepest descent method, together with the extrapolated cyclic cutter methods [13,14]. This method has two interesting features: the extrapolation stepsize function σ(yk) (see Step 2 of Algorithm 1) and the search direction dk+1, which combines the current direction −∇f(xk+1) with the previous search direction dk. By assuming the boundedness of the generated sequence {xk}∞k=1, together with assumptions on the control sequences, they proved the strong convergence of the proposed method.
Very recently, Petrot et al. [15] proposed the so-called dynamic distributed three-term conjugate gradient method for solving the strongly monotone variational inequality problem over the intersection of fixed-point sets of firmly nonexpansive operators. Unlike [11], this method has a simultaneous structure that allows the independent computation of each operator along with a dynamic weight, which is updated at every iteration. Moreover, the strong convergence of the method was obtained without assuming that the sequence {xk}∞k=1 is bounded.
1.2. Our contribution
In this work, we present a modified version of ESCoM-CGD that uses the search direction considered in [15]. We prove the convergence of the proposed method without assuming that the generated sequence is bounded. It is important to underline that our proof is simpler and shorter than that of ESCoM-CGD given in [11, Theorem 3]. We also show that the proposed algorithm and its convergence result are applicable to binary classification via support vector machine learning.
The remainder of this work is organized as follows. In the rest of this section, we collect some mathematical preliminaries consisting of useful definitions and facts needed in the proofs. In Section 2, we present the modified version of ESCoM-CGD for finding the solution to the problem (1.1), together with the main convergence theorem and some important remarks. In Section 3, we provide some technical convergence properties of the generated sequences and, subsequently, prove the main convergence theorem. In Section 4, we illustrate the usefulness of the proposed method and its convergence result in solving the minimum-norm problem for a system of homogeneous linear inequalities, as well as binary classification via support vector machine learning. Lastly, we give a concluding remark.
1.3. Mathematical preliminaries
In this subsection, we will recall some definitions, properties, and key tools in proving our convergence results. The readers can consult the books [16,17] for more details.
We denote by Id the identity operator on a Hilbert space H. The strong and weak convergence of a sequence {xk}∞k=1 to a point x∈H are denoted by xk→x and xk⇀x, respectively.
We will recall the convexity and smoothness of a function. Now, let f:H→R be a real-valued function. Let α>0 be given. We say that the function f is convex if, for all x,y∈H and for all λ∈[0,1], it holds that
In particular, we say that the function f is α-strongly convex if, for all x,y∈H and for all λ∈[0,1], it holds that
The constant α is called the strong convexity parameter.
Let x∈H be given. We say that the function f is Fréchet differentiable or, in short, differentiable at x if there exists a vector g∈H such that
Moreover, this unique vector g is called the gradient of f at x and is denoted by ∇f(x).
It is well known that the convexity of f can be characterized in terms of differentiability as follows.
Fact 1.1. [16, Theorem 5.24] Let f:H→R be a real-valued differentiable function, then f is α-strongly convex if, and only if, for all x,y∈H, we have
The following fact is the cornerstone of our proving lines. It is the necessary and sufficient condition for the optimality of a convex function over a nonempty closed and convex set.
Fact 1.2. [16, Corollary 3.68] Let f:H→R be a continuously differentiable convex function and C⊂H be a closed and convex set, then f attains its minimum at a point x∗∈C if, and only if, for all y∈C, it holds that
Let L≥0 be given. We say that the function f is L-smooth if it is differentiable and, for all x,y∈H, it satisfies
The constant L is called the smoothness parameter. It is clear that an L-smooth function is a continuously differentiable function.
The following fact is a very useful tool in obtaining the convergence result. The proof can be found in [2, Lemma 3.1(b)].
Fact 1.3. Let f:H→R be η-strongly convex and L-smooth. For any μ∈(0, 2η/L²) and β∈(0,1], define the operator Fβ:H→H by Fβ(x) := x − μβ∇f(x) for all x∈H, then
for all x,y∈H, where τ := 1 − √(1 + μ²L² − 2μη) ∈ (0,1].
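As a quick numerical sanity check (our own sketch, assuming Fact 1.3 asserts that Fβ is a contraction with factor 1−τβ, as in [2, Lemma 3.1(b)]), take f(x) = (1/2)‖x‖², so that η = L = 1 and ∇f(x) = x:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, L = 1.0, 1.0                 # f(x) = 0.5*||x||^2 is 1-strongly convex and 1-smooth
grad = lambda x: x

mu = 1.0                          # any mu in (0, 2*eta/L**2) = (0, 2)
tau = 1.0 - np.sqrt(1.0 + mu**2 * L**2 - 2.0 * mu * eta)
assert 0.0 < tau <= 1.0

for beta in (0.1, 0.5, 1.0):
    F_beta = lambda x: x - mu * beta * grad(x)   # F_beta(x) = x - mu*beta*grad f(x)
    for _ in range(100):
        x, y = rng.normal(size=5), rng.normal(size=5)
        lhs = np.linalg.norm(F_beta(x) - F_beta(y))
        rhs = (1.0 - tau * beta) * np.linalg.norm(x - y)
        assert lhs <= rhs + 1e-12                # (1 - tau*beta)-contraction
```

For this quadratic f with μ = 1, the bound is attained with equality, so the factor 1−τβ cannot be improved in general.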
Next, we recall the definition of cutters and some useful properties.
Let T:H→H be an operator with FixT:={x∈H:T(x)=x}≠∅. We say the operator T is a cutter if, for all x∈H and z∈FixT, we have
The following are important properties of cutters.
Fact 1.4. [17, Remark 2.1.31 and Lemma 2.1.36.] Let T:H→H be a cutter, then the following statements hold:
(i) FixT is closed and convex.
(ii) For all x∈H and for all z∈FixT, we have ⟨T(x)−x, z−x⟩ ≥ ‖T(x)−x‖².
Next, we recall the notion of relaxation of an operator. Let T:H→H be an operator and λ∈[0,2] be given. The relaxation of the operator T is defined by Tλ(x) := (1−λ)x + λT(x) for all x∈H. In this case, the parameter λ is called a relaxation parameter. Moreover, let σ:H→(0,∞) be a function. The generalized relaxation of the operator T is defined by
for all x∈H. In this case, the function σ is called a stepsize function. If σ(x)≥1 for all x∈H, then the operator Tσ,λ is called an extrapolation of Tλ. We denote Tσ := Tσ,1. It is worth noting that, for all x∈H, the following hold
and for all λ≠0, we also have
For simplicity, we denote the compositions of operators as
and
We now recall some important properties that will be useful for the convergence analysis.
Fact 1.5. [17, Section 4.10] Let Ti:H→H, i=1,2,…,m, be cutters with ⋂ᵢ₌₁ᵐ FixTi ≠ ∅. Let σ:H→(0,∞) be defined by
then the following properties hold:
(i) For all x∉⋂ᵢ₌₁ᵐ FixTi, we have
(ii) The operator Tσ is a cutter.
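The relaxation and generalized relaxation defined above can be sketched as higher-order functions (our own illustration; the cutter T below is a hypothetical half-space projection):

```python
import numpy as np

def relax(T, lam):
    """Relaxation: T_lam(x) = (1 - lam) * x + lam * T(x)."""
    return lambda x: (1.0 - lam) * x + lam * T(x)

def gen_relax(T, sigma, lam):
    """Generalized relaxation: T_{sigma,lam}(x) = x + lam * sigma(x) * (T(x) - x)."""
    return lambda x: x + lam * sigma(x) * (T(x) - x)

# hypothetical cutter: metric projection onto the half-space {x : <a, x> <= 0}
a = np.array([1.0, 0.0])
T = lambda x: x - (max(0.0, a @ x) / (a @ a)) * a

x = np.array([2.0, 1.0])
lam = 1.5
# with the constant stepsize sigma == 1, the generalized relaxation
# coincides with the plain relaxation T_lam
assert np.allclose(gen_relax(T, lambda v: 1.0, lam)(x), relax(T, lam)(x))
```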
Next, we will recall the notion of demi-closedness principle as follows.
Let T:H→H be an operator having a fixed point. The operator T is said to satisfy the demi-closedness principle if, for every sequence {xk}⊂H such that xk⇀x∈H and ‖T(xk)−xk‖→0, we have x∈FixT.
We close this section by recalling the well-known metric projection, which is defined as follows. Let C be a nonempty subset of H and x∈H. If there is a point y∈C such that ‖x−y‖≤‖x−z‖ for all z∈C, then y is called a metric projection of x onto C and is denoted by PC(x). If PC(x) exists and is uniquely determined for all x∈H, then the operator PC:H→H is called the metric projection onto C. In fact, some additional properties of the set C are needed to ensure the existence and uniqueness of the metric projection PC(x) for every x∈H, as the following fact shows.
Fact 1.6. [17, Theorem 1.2.3.] Let C be a nonempty closed convex subset of H, then for any x∈H, there exists a metric projection PC(x) and it is uniquely determined.
Moreover, we finally note that the metric projection onto a nonempty closed convex set C is a cutter with FixPC=C; see [17, Theorem 2.2.21.] for further details.
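As a small numerical check (our own sketch), the metric projection onto a closed ball, a simple nonempty closed convex set, satisfies the cutter inequality of Fact 1.4(ii):

```python
import numpy as np

rng = np.random.default_rng(1)

def proj_ball(c, r):
    """Metric projection onto the closed ball B(c, r) (a closed convex set)."""
    def P(x):
        d = np.linalg.norm(x - c)
        return x.copy() if d <= r else c + r * (x - c) / d
    return P

P = proj_ball(np.zeros(3), 1.0)
for _ in range(200):
    x = 3.0 * rng.normal(size=3)
    z = rng.normal(size=3)
    z /= max(1.0, np.linalg.norm(z))          # z lies in Fix P = B(0, 1)
    # cutter inequality (Fact 1.4(ii)): <P(x) - x, z - x> >= ||P(x) - x||^2
    assert (P(x) - x) @ (z - x) >= np.linalg.norm(P(x) - x) ** 2 - 1e-9
```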
2. Algorithm and its convergence
In this section, we present a modified version of the extrapolated sequential constraint method with conjugate gradient direction (ESCoM-CGD) for solving the problem (1.1), stated as the following algorithm. We call the proposed method the modified extrapolated sequential constraint method with conjugate gradient direction (in short, MESCoM-CGD).
Some comments and particular situations of MESCoM-CGD are the following:
Remark 2.1. (ⅰ) As mentioned above, the convergence of ESCoM-CGD (see [11, Theorem 3]) relies on the assumption that the generated sequence {xk}∞k=1 is bounded. However, it is also pointed out in [11, Remark 3] that the boundedness of the search directions {dk}∞k=1⊂H and of the sequence {φk}∞k=1, together with the definition of the search direction, yields the boundedness of the sequence {xk}∞k=1. In this situation, if we construct the bounded search direction {dk/max{1,‖dk‖}}∞k=1 in place of the previous version, then the boundedness of the generated sequence {xk}∞k=1 becomes provable as well (see Lemma 3.3 below). One can notice that the key idea of this bounded strategy is nothing else but restricting the search direction to the unit ball centered at the origin.
(ⅱ) If the search direction {dk}∞k=1 is guaranteed to stay within the unit ball centered at the origin, then max{1,‖dk‖}=1, so MESCoM-CGD reduces to ESCoM-CGD in the case when the operator Tm=Id.
(ⅲ) If m=1, the relaxation parameter λk=1, the stepsize σ(yk)=1, and max{1,‖dk‖}=1 for all k∈N, then MESCoM-CGD is nothing else but the hybrid conjugate gradient method proposed in [12].
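The bounded-direction strategy discussed above amounts to the following one-liner (our own sketch):

```python
import numpy as np

def bound_direction(d):
    """Restrict a search direction to the unit ball: d / max(1, ||d||)."""
    return d / max(1.0, np.linalg.norm(d))

# a direction already in the unit ball is unchanged (max{1, ||d||} = 1),
# which is the case in which MESCoM-CGD reduces to ESCoM-CGD
short = np.array([0.3, -0.4])                 # norm 0.5
assert np.allclose(bound_direction(short), short)
# a long direction is rescaled onto the unit sphere
long_d = np.array([3.0, 4.0])                 # norm 5
assert np.isclose(np.linalg.norm(bound_direction(long_d)), 1.0)
```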
We now state the assumptions needed for the convergence result.
Assumption 2.2. Assume that
(H1) The sequence of stepsizes {βk}∞k=1⊂(0,1] satisfies ∑∞k=1 βk = ∞ and ∑∞k=1 βk² < ∞.
(H2) The sequence of parameters {φk}∞k=1⊂[0,∞) satisfies limk→∞φk=0.
(H3) There is a constant ε∈(0,1) such that the relaxation parameters {λk}∞k=1⊂[ε,2−ε].
(H4) The operators Ti, i=1,…,m, satisfy the demi-closedness principle.
Let us briefly discuss some particular situations in which hypotheses (H1)–(H4) hold as follows:
Remark 2.3. (ⅰ) An example of stepsizes {βk}∞k=1 satisfying (H1) is, for instance, βk = β0/(k+1)^b for all k∈N, with β0∈(0,1] and b∈(1/2,1]; note that b>1/2 is needed so that ∑∞k=1 βk² < ∞, while b≤1 guarantees ∑∞k=1 βk = ∞.
(ⅱ) An example of parameters {φk}∞k=1 satisfying (H2) is, for instance, φk = φ0/(k+1)^c for all k∈N, with φ0≥0 and c>0.
(ⅲ) The role of the constant ε∈(0,1) is to ensure that the term λk(2−λk) stays away from zero for all k∈N, which allows this term to be canceled in the estimates. One can take, for instance, λk=λ0 for some λ0∈(0,2) as a trivial example of relaxation parameters {λk}∞k=1 satisfying hypothesis (H3).
(ⅳ) The demi-closedness principle for the operators Ti, i=1,…,m, in (H4) is a key property and is typically assumed when dealing with convergence results for common fixed-point type problems. Actually, the demi-closedness principle is satisfied by any nonexpansive operator, i.e., one with ‖Ti(x)−Ti(y)‖≤‖x−y‖ for all x,y∈H; see [17, Lemma 3.2.5.] for the proof with some historical notes. This yields that the metric projection onto a nonempty closed and convex set also satisfies the demi-closedness principle; see [17, Theorem 2.2.21]. Moreover, it is worth mentioning that the demi-closedness principle is also satisfied by a subgradient projection operator Pf for a convex continuous function f:H→R that is Lipschitz continuous relative to every bounded subset of H; see [17, Theorem 4.2.7] for more details.
The main result of this work is the following theorem:
Theorem 2.4. Let the sequence {xk}∞k=1 be generated by MESCoM-CGD and assume that assumptions (A1) and (A2) and hypotheses (H1)–(H4) hold, then the sequence {xk}∞k=1 converges strongly to the unique solution x∗ of the problem (1.1).
3. Proof of Theorem 2.4
In this section, we provide some technical convergence properties and then prove the convergence of the proposed method to the unique solution of the problem (1.1).
For simplicity of notations, we denote
and, for every k∈N and x∈⋂ᵢ₌₁ᵐ FixTi, we denote
and
Moreover, for every k≥2, we denote
In order to prove the convergence result in Theorem 2.4, we collect several facts that will be useful in what follows. We first state a lemma relating the norm of the iterate xk to a common fixed point. The analysis is similar to that given in [11, Lemma 4]; however, we state it here for the sake of completeness.
Lemma 3.1. For all k∈N and x∈⋂ᵢ₌₁ᵐ FixTi, we have
Proof. Let k∈N and x∈⋂ᵢ₌₁ᵐ FixTi be given. By using the properties of extrapolation and Facts 1.4(ⅱ) and 1.5(ⅱ), we note that
We note from the definition of {yk}∞k=1 that
By invoking this relation in (3.1), we obtain
as required. □
In order to prove the boundedness of the generated sequence {xk}∞k=1, we need the following proposition [18, Lemma 3.1].
Proposition 3.2. Let {ak}∞k=1 be a sequence of nonnegative real numbers such that there exists a subsequence {akj}∞j=1 of {ak}∞k=1 with akj<akj+1 for all j∈N. If, for all k≥k0, we define
then the sequence {v(k)}k≥k0 is nondecreasing, av(k)≤av(k)+1, and ak≤av(k)+1 for every k≥k0.
Now, we are in a position to prove the boundedness of the generated sequence {xk}∞k=1 as the following lemma. The idea of the proof is based on [15, Lemma 5].
Lemma 3.3. The sequence {xk}∞k=1 is a bounded sequence.
Proof. Let k∈N and x∈⋂ᵢ₌₁ᵐ FixTi be given. Since λk∈[ε,2−ε], we have ε² ≤ λk(2−λk), and so
Thus, by using this property together with the inequality obtained in Lemma 3.1, we get
For the sake of simplicity, we set
for all k∈N. In this case, the inequality (3.2) can be rewritten as
Next, we divide the proof into two cases based on the behaviors of the sequence {Γk}∞k=1:
Case Ⅰ: Suppose that there exists k0∈N such that Γk+1≤Γk for all k≥k0. In this case, we have Γk≤Γk0 for all k≥k0, which is
for all k≥k0. Since the series ∑∞k=1β2k converges, we obtain that the sequence {‖xk−x‖2}∞k=1 is bounded and, subsequently, the sequence {xk}∞k=1 is also bounded.
Case Ⅱ: Suppose that there exists a subsequence {Γkj}∞j=1 of {Γk}∞k=1 such that Γkj<Γkj+1 for all j∈N, and let {v(k)}∞k=1 be defined as in Proposition 3.2. This yields, for all k≥k0, that
and
By using the relation (3.4) in the inequality (3.3) and the fact that the term 2μβkmax{1,‖dk‖} is positive, we have
Now, let us note that
and, since 0≤φk≤1, we have
Since f is η-strongly convex, we have from Fact 1.1 that
where the last inequality holds by the inequality (3.6). Thus, we obtain that
which implies that the sequence {‖xv(k)−x∗‖}∞k=1 is bounded. Now, let us observe that
which means that the sequence {Γv(k)+1}∞k=1 is bounded above. Finally, by using the inequality (3.5), we obtain that {Γk}∞k=1 is bounded and, subsequently, {xk}∞k=1 is also bounded. □
A simple consequence of Lemma 3.3 is the boundedness of related sequences.
Lemma 3.4. The sequences {∇f(xk)}∞k=1, {dk}∞k=1, and {yk}∞k=1 are bounded.
Proof. Let k∈N and x∈⋂ᵢ₌₁ᵐ FixTi be given. We first note from the L-smoothness of f that
where M1:=supk∈N‖xk‖. This means that the boundedness of the sequence {∇f(xk)}∞k=1 is now obtained.
Next, by the construction of dk, we note for all k≥1 that
Thus, we have ‖dk‖≤max{M+1,‖d1‖} for all k∈N and the boundedness of {dk}∞k=1 is obtained. Finally, these two obtained results immediately yield the boundedness of the sequence {yk}∞k=1. □
Lemma 3.5. For all k≥2 and x∈⋂ᵢ₌₁ᵐ FixTi, it holds that
Proof. Let k≥2 and x∈⋂ᵢ₌₁ᵐ FixTi be given. We note from the inequality (3.1) that
where the second inequality holds by the fact that ‖x+y‖2≤‖x‖2+2⟨y,x+y⟩ for all x,y∈H, and the third inequality holds by Fact 1.3. □
The following proposition will be a key tool for obtaining the convergence result in Theorem 2.4. The idea and its proof can be consulted in [19, Lemma 2.6] and [20, Lemma 2.4].
Proposition 3.6. Let {ak}∞k=1 be a nonnegative real sequence, {δk}∞k=1 be a real sequence, and {αk}∞k=1 be a real sequence in [0,1] such that ∑∞k=1 αk = ∞. Suppose that
If lim supj→∞δkj≤0 for every subsequence {akj}∞j=1 of {ak}∞k=1 satisfying
then limk→∞ak=0.
Now, we are in a position to prove Theorem 2.4.
Proof. Let x∗ be the unique solution to the problem (1.1). For simplicity, we denote ak:=‖xk−x∗‖2 for all k∈N. Now, by using the facts obtained in Lemmata 3.3 and 3.4 and the fact that limk→∞βk=0, we obtain
which implies that limk→∞εk=0.
Next, we let {akj}∞j=1 be a subsequence of the sequence {ak}∞k=1 satisfying
or, equivalently,
By utilizing the inequality obtained in Lemma 3.1 and the fact limk→∞εk=0, we obtain
This implies that
Since ε2≤λk(2−λk), we obtain that
On the other hand, since {yk}∞k=1 is a bounded sequence, so is the sequence {⟨ykj−x∗,∇f(x∗)⟩}∞j=1. Now, let {ykjℓ}∞ℓ=1 be a subsequence of {ykj}∞j=1 such that
Due to the boundedness of the sequence {ykj}∞j=1, there exist a weak cluster point z∈H and a subsequence {ykjℓ}∞ℓ=1 of {ykj}∞j=1 such that ykjℓ⇀z. According to the fact obtained in (3.7), we note that
Since T1 satisfies the demi-closedness principle, we obtain that z∈FixT1. Furthermore, we note that the facts ykjℓ⇀z and
yield that T1(ykjℓ)⇀T1(z)=z. Furthermore, we observe that
By invoking the assumption that T2 satisfies the demi-closedness principle, we also obtain that z∈FixT2. By continuing the same argument, we obtain that z∈FixTi for all i=1,2,…,m, and hence z∈⋂ᵢ₌₁ᵐ FixTi. Since x∗ is the unique solution to the problem (1.1), we note from the optimality condition in Fact 1.2 that
Now, the assumption that limk→∞φk=0, the boundedness of the sequences {yk}∞k=1 and {dk}∞k=1, and the relation (3.8) yield that
Hence, by applying Proposition 3.6, we conclude that limk→∞ak=0. The proof is completed. □
4. Some illustrative consequences
In this section, we will provide some simple consequences of the main theorem. We start with the minimization problem over the system of linear inequalities. This constraint is nothing else but the intersection of linear half-spaces. We subsequently show that the proposed algorithm and convergence result are also applicable in the well-known support vector machine learning.
4.1. Minimization problem over system of linear inequalities
In this subsection, we assume that the function f:H→R is η-strongly convex and L-smooth as given in the assumption (A1). Now, let ai∈H with ai≠0 and bi∈R, i=1,…,m. We consider the minimization of a strongly convex smooth function over the intersection of nonempty closed and convex sets, which is in the following form:
We may assume the consistency of the system of linear inequalities, and we denote the solution point of the problem (4.1) by x∗. Now, for each i=1,…,m, we let Hi:={x∈H:⟨ai,x⟩≤bi} be the half-space corresponding to ai and bi. Furthermore, we set Ti:=PHi, the metric projection onto the half-space Hi, for all i=1,…,m. It is worth noting that the metric projection onto a half-space has a closed-form expression, that is,
see [17, Subsection 4.1.3] for further details. Note that Hi is a closed and convex set and that FixTi=Hi for all i=1,…,m. To derive an iterative method for solving the problem (4.1), we recall the notations T:=TmTm−1⋯T1, S0:=Id, and Si:=TiTi−1⋯T1 for all i=1,2,…,m. Now, for every x∈H, setting ui:=Si(x), we have u0=x and ui=Ti(ui−1) for all i=1,2,…,m. In this case, the stepsize function σ:H→(0,∞) can be written as follows:
see [13, Section 4.2] for further details.
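For illustration (our own sketch), the standard closed-form half-space projection PHi(x) = x − (max{0, ⟨ai,x⟩−bi}/‖ai‖²)ai and the intermediate points ui = Ti(ui−1) of one cyclic sweep can be coded as:

```python
import numpy as np

def proj_halfspace(a, b, x):
    """Closed-form metric projection onto H = {x : <a, x> <= b}."""
    return x - (max(0.0, a @ x - b) / (a @ a)) * a

def cyclic_pass(A, b, x):
    """u_0 = x and u_i = T_i(u_{i-1}); returns all intermediate points u_i."""
    us = [x]
    for a_i, b_i in zip(A, b):
        us.append(proj_halfspace(a_i, b_i, us[-1]))
    return us

A = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([0.0, 1.0])
us = cyclic_pass(A, b, np.array([2.0, 0.0]))
# u_1 satisfies the first inequality; u_2 = T(x) additionally satisfies the second
assert us[1][0] <= 1e-12 and A[1] @ us[2] <= b[1] + 1e-12
```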
In order to solve the problem (4.1), we consider a particular situation of MESCoM-CGD as the following algorithm.
We obtain an immediate consequence of Theorem 2.4 in the following corollary.
Corollary 4.1. Let the sequence {xk}∞k=1 be generated by Algorithm 2 and assume that assumption (A1) and hypotheses (H1)–(H3) hold, then the sequence {xk}∞k=1 converges strongly to the unique solution x∗ of the problem (4.1).
To examine the behavior of Algorithm 2 and the convergence result in Corollary 4.1, we consider the minimum-norm problem for a system of homogeneous linear inequalities. We assume that the whole space H is finite dimensional, so that H=Rn. Given a matrix A=[a1|⋯|am]ᵀ with rows ai=(a1i,…,ani)∈Rn, i=1,…,m, and a vector b=(b1,…,bm)∈Rm, the minimum-norm problem is to find the vector x∈Rn that solves the problem
or, equivalently, in the explicit form
Now, by taking the constrained sets Hi:={x∈Rn:⟨ai,x⟩≤bi}, i=1,…,m, as half-spaces and Ti:=PHi, i=1,…,m, the metric projections onto Hi with FixTi=Hi satisfy the demi-closedness principle. Furthermore, the function f(x):=(1/2)‖x‖² is 1-strongly convex and 1-smooth. In this situation, the problem (4.2) is nothing else but a special case of the problem (4.1), so that Algorithm 2 and the convergence result in Corollary 4.1 are applicable.
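Since Algorithm 2 itself involves the extrapolation stepsize and the conjugate-gradient direction, we sketch only a simplified special case here (our own simplification with σ(yk)=1, λk=1, and the plain direction −∇f; it is not Algorithm 2 itself): one cyclic sweep of half-space projections followed by a damped gradient step for f(x) = (1/2)‖x‖².

```python
import numpy as np

def proj_halfspace(a, b, x):
    """Closed-form metric projection onto {x : <a, x> <= b}."""
    return x - (max(0.0, a @ x - b) / (a @ a)) * a

def min_norm(A, b, x0, mu=1.0, iters=3000):
    """Sketch: minimize 0.5*||x||^2 over {x : Ax <= b}."""
    x = x0.astype(float)
    for k in range(1, iters + 1):
        y = x
        for a_i, b_i in zip(A, b):   # one full cyclic sweep T_m ... T_1
            y = proj_halfspace(a_i, b_i, y)
        x = y - (mu / k) * y         # grad f(y) = y, beta_k = 1/k
    return x

# e.g. the single constraint x1 <= -1: the minimum-norm point is (-1, 0)
A_mn = np.array([[1.0, 0.0]])
b_mn = np.array([-1.0])
x_mn = min_norm(A_mn, b_mn, np.array([5.0, 5.0]))
assert np.linalg.norm(x_mn - np.array([-1.0, 0.0])) < 0.05
```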
To perform a numerical illustration of solving the problem (4.1), we generate the matrix A∈Rm×n, where m=200 and n=100, with entries uniformly distributed in (−5,5), and set the vector b=0. According to Remark 2.3 (ⅰ)-(ⅲ), we examine the influence of the parameters β0∈[0.8,1], φ0∈[0.1,1], and λ0∈[0.1,1.9], and fix the corresponding parameter μ=1.9. We terminate the experiment by the error of norm, that is, ‖xk+1−xk‖<ϵ. We manually selected the parameters β0=1, φ0=1, and λ0=1, which required the smallest number of iterations to meet the tolerance ϵ=10−4.
Next, we show the behavior of Algorithm 2 when solving the problem (4.2) for various tolerances ϵ. We set the number of variables to n=100 and consider several numbers of inequalities, namely m=100,200,300,400, and 500. Following the above experiments, we set the parameters β0=1, φ0=1, and λ0=1. We plot the number of iterations and the computational time in seconds in Figure 1.
It can be noticed from Figure 1 that a smaller number m needed a larger number of iterations for all errors of tolerance. Moreover, for the numbers m=100,200, and 300, it can be seen that a smaller number m needed a larger computational time.
4.2. Support vector machine learning problem
In this subsection, we consider constructing a classifier for a binary classification problem from a given training dataset of two classes. More precisely, we are given the ith training data ai∈Rn and its label bi∈{−1,+1}, for all i=1,…,m. Support vector machine learning trains a weight u∈Rn so that a (linear) classifier c(a):=⟨a,u⟩ assigns the correct class to every new test datum a∈Rn. In this situation, a is assigned to the class −1 if c(a)<0, and to the class +1 otherwise. Mathematically, support vector machine learning can be formulated as the following minimization problem:
where ξi≥0 is a nonnegative slack variable corresponding to the ith training data for all i=1,…,m. By introducing a new variable x:=(u,ξ1,…,ξm)∈Rn+m and using the idea given in, for instance, [15,21], the problem (4.3) can be written as
where the matrix A∈R2m×(n+m) is given by
and the vector b∈R2m is given by
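For illustration, one standard way to assemble A∈R2m×(n+m) and b∈R2m is sketched below (our own assumption, following the soft-margin constraints bi⟨ai,u⟩ ≥ 1 − ξi and ξi ≥ 0 rewritten as linear inequalities in x=(u,ξ); it may differ in sign conventions from the displays above):

```python
import numpy as np

def build_svm_system(data, labels):
    """Encode b_i <a_i, u> >= 1 - xi_i and xi_i >= 0 as A x <= b,
    where x = (u, xi_1, ..., xi_m) in R^(n+m)."""
    m, n = data.shape
    I = np.eye(m)
    top = np.hstack([-labels[:, None] * data, -I])  # -b_i <a_i, u> - xi_i <= -1
    bot = np.hstack([np.zeros((m, n)), -I])         # -xi_i <= 0
    return np.vstack([top, bot]), np.concatenate([-np.ones(m), np.zeros(m)])

data = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])   # m = 3 samples in R^2
labels = np.array([1.0, -1.0, -1.0])
A_sys, b_sys = build_svm_system(data, labels)
assert A_sys.shape == (6, 5) and b_sys.shape == (6,)
# u = 0 with slacks xi_i = 1 satisfies every constraint (equality in the top block)
x_feas = np.concatenate([np.zeros(2), np.ones(3)])
assert np.all(A_sys @ x_feas <= b_sys + 1e-12)
```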
The problem (4.4) is nothing else but a particular case of the problem (4.2) and, subsequently, Algorithm 2 and the convergence result in Corollary 4.1 are also applicable.
In the first experiment, we aim to classify the 28×28 gray-scale images of handwritten digits from the MNIST dataset, available at https://cs.nyu.edu/roweis/data.html. We used a dataset of 5000 images of the handwritten digit 9 and a dataset of 5000 images of the handwritten digits 0−8, labeled by the classes +1 and −1, respectively. We performed 10-fold cross-validation on the given datasets: in each class, we put a fold of 1000 images aside as the testing data and used the remaining 9 folds, consisting of 9000 images, as the training data. We repeated the cross-validation process 10 times so that each fold served once as the testing data.
Recalling that the class of digit 9 and the class of digits 0−8 are labeled +1 (positive) and −1 (negative), respectively, we count the test data as follows: a positive datum classified as positive is a true positive (TP); if it is classified as negative, it is a false negative (FN). A negative datum classified as negative is a true negative (TN); if it is classified as positive, it is a false positive (FP). These counts are summarized as the following details:
To measure the performance of each obtained classifier performed by each cross-validation, we consider classification performance metrics, namely, accuracy, precision, recall, specificity, and F-measure. These performance metrics are computed by the following details:
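For reference, these standard metrics can be computed from the confusion counts as follows (the sample counts below are hypothetical):

```python
def metrics(TP, FN, TN, FP):
    """Standard binary-classification performance metrics."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)                 # a.k.a. sensitivity
    specificity = TN / (TN + FP)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f_measure

# hypothetical fold: 1000 positive and 1000 negative test images
acc, prec, rec, spec, f1 = metrics(TP=940, FN=60, TN=970, FP=30)
assert abs(acc - 0.955) < 1e-12             # (940 + 970) / 2000
```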
In the experiment, we terminate each run when the number of iterations reaches k=50 and, subsequently, average each performance metric over the 10 repetitions of the cross-validation process. The classification performance metrics for each parameter are presented in Table 1.
The results given in Table 1 show that the parameters β0 and φ0 only slightly affected the classification performances. The highest accuracy and F-measure values were obtained for the case β0=0.5. Actually, we can observe that these five metrics, as well as the computational running times, were not much different. This suggests the stability of the proposed method, in the sense that the corresponding parameters do not greatly affect its convergence.
5. Conclusions
We presented a modified version of the extrapolated sequential constraint method proposed in [11] for solving the minimization problem over the intersection of the fixed-point sets of cutters. We not only presented a simpler proof of the strong convergence of the proposed method, but also omitted the boundedness assumption used in [11]. We also examined the proposed method on binary classification using support vector machines.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The authors are grateful to three anonymous reviewers for their comments and suggestions that improved the paper's quality and presentation. This work has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F650018]. D. Rakjarungkiat gratefully acknowledges the research capability enhancement program through graduate student scholarship, Faculty of Science, Khon Kaen University.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.