1. Introduction
The hidden Markov chain (HMC) model, as characterized in [1,2,3,4,5], is a doubly stochastic process, usually denoted by (X,Y). The process X, which models the image to be segmented, is assumed to be hidden and Markovian, and the process Y, which models the noisy version of the image to be recovered, is assumed to be observed and real-valued. The success of the HMC model stems from the fact that, when the noise is not too complex, several Bayesian classification approaches, such as the maximum a posteriori (MAP) and maximum posterior marginal (MPM) estimators, can be used to determine the image X=x from the observed version Y=y. However, the spatial regularity of pixels may hinder the effectiveness of the HMC model in image segmentation.
We consider the problem of segmenting a satellite image into two classes, "water" and "trees". The hidden image represents the classes, whereas the observed image represents the gray levels of the pixels. In general, neighboring pixels are considered to have a higher probability of belonging to the same class than pixels situated farther from each other. Nevertheless, in this situation, we can detect pixels that belong to a set of pixels of identical classes, or lie near edges, but have distinct appearances. Under the classical hypothesis, this attribute cannot be accommodated in the HMC model. For this reason, the HMC model has been extended to the pair-wise Markov chain (PMC) model. The PMC model is more general than the HMC model, since the process X is not necessarily a Markov chain while the pair (X,Y) is directly Markovian. Another advantage of the PMC model over the HMC model is its capacity to easily take the noise correlation into account. Furthermore, the PMC model, like the HMC model, allows Bayesian MAP and MPM restorations.
Another major problem is that when a one-dimensional sequence is generated from a two-dimensional image using any reading process, such as the Hilbert-Peano scan (HPS), some locality information of the pixels is lost. What prompted us in this study was the success, in the framework of the HMC model, of adding additional components to the observations in the studies [6,7,8] by taking advantage of a neighborhood of each pixel (for example, its four nearest neighbors), which culminated in an impressive model on a range of synthetic images. In this work, we apply the same concept to the PMC model in order to tackle the main issue of the spatial regularity of pixels, as well as the problem of pixel information loss in an image, which is the objective of the segmentation process. As a result, we formulate a PMC model with bidimensional observed data (BPMC), whose main task is to segment images while taking the last two above-mentioned difficulties into account. The focus of this study is to look into the potential benefits of the BPMC model over the PMC model in terms of unsupervised segmentation. Initially, the unknown parameters of the model must be estimated. In the case of the PMC model, the likelihood cannot be maximized analytically at each iteration, as the expectation-maximization (EM) algorithm [9,10] requires, so we apply iterative conditional estimation (ICE) and the stochastic expectation-maximization (SEM) algorithm [11,12,13] instead. In general, the ICE and SEM algorithms are very powerful when used with Gaussian and generalized mixture estimation. In this study, we adapt the ICE and SEM algorithms to the new model. Furthermore, the SEM and ICE algorithms are examined in the presence of Gaussian noise for both the PMC and BPMC models; these algorithms are more adaptable because the likelihood is not always required, and estimators beyond maximum likelihood can also be utilized. Therefore, we need to answer the following questions.
● Does the use of BPMC significantly improve restoration results, compared to the PMC model, and how do ICE and SEM algorithms work in the BPMC context?
● Does the use of BPMC compared to PMC permit the image to be converted into a one-dimensional sequence without losing pixel locality information?
The last point ought to be considered since all kinds of hidden Markov chain models based on an HPS have shown potential for image segmentation, and in certain situations, they may even compete with the hidden Markov field (HMF) model in classification [14,15,16]. On top of that, the HMF model allows for a more precise and more intuitive modeling of the spatial relationships between pixels [17], while the PMC model requires an HPS to account for the spatial details of the pixels. Conversely, the BPMC model takes temporal and spatial pixel information into account by incorporating an additional element directly into the observed process, allowing it to compete with the PMC model in this regard.
The rest of this paper is structured into seven sections. Section 2 presents studies relevant to the current work. In Section 3, we provide the reader with the mathematical background of the BPMC model. Section 4 presents the two models discussed in this study and their properties. Section 5 is devoted to the simulation of the PMC and BPMC models in a supervised way using MPM-based classification. Section 6 deals with parameter estimation algorithms; here, we describe the main steps of the ICE and SEM algorithms in the case of the two models, PMC and BPMC. In Section 7, with numerical experiments for various noise factors, we compare the two estimation procedures based on the accuracy of the parameter estimates for both models. Furthermore, using error rates and the peak signal-to-noise ratio (PSNR) index, we demonstrate the performance of these models on a sample of synthetic images. Finally, conclusions and some perspectives are discussed in Section 8.
2. Literature review
The pair-wise terminology was first attached to the hidden Markov model in the works [18,19,20], where the main motivation was to solve the problem of correlation between noise and pixels located on image boundaries. In fact, this difficulty was emphasized for the first time in the pair-wise Markov field (PMF) model and the pair-wise Markov tree (PMT) model. The PMF model has been successfully used for textured image segmentation [20], real synthetic-aperture radar (SAR) images [21], and hyper-spectral images [22]. The PMT model has also been used for image and signal restoration tasks [19]. Recently, the PMC model [18] has been used for applications in signal and image processing. The PMC model has specific applications in image processing, such as textured and real image segmentation [14,23,24,25,26,27] and text segmentation [28,29], where the authors used an HPS to convert the image and copulas to model the class-conditional densities. Multi-scale images can also be segmented using the PMC model [30]. Other uses include the fuzzy PMC model to treat spatially correlated fuzzy classes between observations [31,32].
Several other methods based on hidden Markov models have been attempted in order to address the relationship between pixels in the chain after the transformation of a bidimensional image. Among these, an approach suggested by [15] is based on a combination of the HMC and HMF models: the first model is employed in the estimation phase and the second is exploited in the final classification to determine the configuration of X. Besides, another strategy in [33,34] is centered on the integration of spectral and contextual information into the HMC model, since the structure of the gathered data is no longer Markovian when modeling an image using an HPS. Additionally, the study [35] suggests an algorithm that combines fuzzy C-means with an HPS. In [36], a new HMC is proposed for representing the semantic and spatial interactions between pixels.
A new segmentation method has recently been introduced in [7] that tackles the issues of spatial and temporal pixel information via a contextual HPS. This method adds an additional element to the observation process in the HMC model so that each pixel recovered from the HPS is linked to another pair of pixels next to it in the image but not in the chain. Similarly, in [8], a second component is introduced in the process Y and a two-dimensional model is generated, with estimation and segmentation techniques adapted to this scenario. Table 1 summarizes all of the methods covered here.
3. Mathematical context
In this section, we give a brief mathematical basis for the BPMC model proposed in this study. We consider a couple of stochastic processes (X,Y)=((Xn)1≤n≤N,(Yn)1≤n≤N), where N is the number of pixels. As previously explained, in the segmentation of images with a hidden Markov model (HMM), we need to estimate the latent process X=x via the observed process Y=y.
In the context of the PMC model with two-dimensional observed data, we consider another realization of a random variable ˉY=ˉy modeling information about each pixel based on its neighborhood in the image, but not in the chain, in such a way that the observed process is two-dimensional and is denoted by (Y,ˉY). Then, we intend to estimate the parameters of the joint probability p(x,y,ˉy) from the data in an unsupervised way. Based on the observations, Bayesian estimators are used to restore the hidden process. The segmentation problem in the present model involves two laws: an a priori law p(x) and a bidimensional conditional density p(y,ˉy|x).
To take on the above-mentioned segmentation problem, the a posteriori distribution must be identified, which contains all of the information on X that is available from (Y,ˉY). This distribution is given by $p(x|y,\bar{y})=\frac{p(x,y,\bar{y})}{\sum_{x}p(x,y,\bar{y})}$, where $p(x,y,\bar{y})=p(x)\,p(y,\bar{y}|x)$.
Let L be a bivariate loss function and ˆx be an approximation of the hidden realization, where the latter is obtained as the minimizer of the expected loss:
$\hat{x}=\arg\min_{x'}\,\mathbb{E}\big[L(X,x')\,\big|\,Y=y,\bar{Y}=\bar{y}\big].$
In Bayesian statistics, the two best-known estimators are MAP and MPM, which are associated respectively with the two loss functions $L_{MAP}(x,x')=\mathcal{L}(x,x')$ and $L_{MPM}(x,x')=\sum_{n=1}^{N}\mathcal{L}(x_n,x'_n)$, where $\mathcal{L}(a,b)=1-\delta(a,b)$, with $\delta$ the Kronecker symbol.
The two estimators are consequently and respectively defined by
$\hat{x}_{MAP}(y,\bar{y})=\arg\max_{x}\,p(x|y,\bar{y})$
and
$\hat{x}_{MPM}(y,\bar{y})=(\hat{x}_n)_{1\le n\le N}$, with $\hat{x}_n=\arg\max_{\omega_i\in\Omega}\,p(x_n=\omega_i|y,\bar{y}).$
When comparing these two estimators, MPM maximizes the a posteriori marginal probability of each xn, while MAP maximizes the a posteriori probability p(x|y,ˉy) directly, estimating the whole sequence x at once. In the following, we only utilize the MPM estimator since it is straightforward to use. We use iterative strategies to estimate the joint distribution of the unobserved process from the available data (y,ˉy). For calculating the joint distribution, we use a conditional expectation approximation in the case of the ICE method, or stochastic drawings to estimate a sequence of model parameters from the dataset in the case of the SEM algorithm.
4. Models
Let us consider two stochastic processes X=(Xn)1≤n≤N and Y=(Yn)1≤n≤N, where N is the number of pixels of an image. The process X is the unknown image, whereas Y is the observed one. Each random variable Xn takes its values in a finite set of K classes Ω={ω1,…,ωK}, and each Yn takes its values in the set of real numbers $\mathbb{R}$. Realizations of X and Y are denoted by x=(xn)1≤n≤N and y=(yn)1≤n≤N, respectively.
We suppose that the process X is stationary and that the random variables Y=(Yn)1≤n≤N are correlated and conditionally dependent on X. As in all Markov models, the classification problem consists of estimating the latent process X=x based on the observed process Y=y.
4.1. PMC model
This study uses the PMC model and proposes updating the classical pair-wise model. The classical PMC model is a doubly stochastic process, usually noted (X,Y), where Y (one-dimensional) represents the observed pixels of a hidden image X. The same concept holds for the bidimensional PMC model, except for the process Y. In the classical model, a direct Hilbert-Peano scan is used to transform the image from its two-dimensional version into a one-dimensional chain, followed by an inverse scan to reconstruct the final image; in the suggested model, the second component of Y serves this purpose. The visual results revealed a significant advantage of the proposed model over the classical model, especially for isolated pixels in the image to be segmented.
A bidimensional image can always be transformed into a one-dimensional form by concatenating the image column by column or line by line. Moreover, we can use the classical HPS for a $2^p\times 2^p$ image [2], or the generalized HPS for any given image [37]. In a moderately different sense than above, let us now consider the pair-wise process Z=(Z1,…,Zn,…,ZN) corresponding to the hidden process X and the observed process Y, where Zn=(Xn,Yn) for n=1,…,N. Lowercase letters are used to identify the realizations of such variables and processes. To keep things straightforward and brief, we use the letters p and f, respectively, to describe various distributions on Ω and densities on $\mathbb{R}$ in the following text; p(zn) is also employed for p(Zn=zn). Referring to the previous, the pair Z is a PMC model when its joint distribution can be written as $p(z)=p(z_1)\prod_{n=2}^{N}p(z_n|z_{n-1})$, with
$p(z_1)=p(x_1,y_1)=p(x_1)\,p(y_1|x_1)$
and
$p(z_n|z_{n-1})=p(x_n|x_{n-1},y_{n-1})\,p(y_n|x_n,x_{n-1},y_{n-1}).$
Assuming the process Z is stationary, we can also define the distribution of Z solely through p(z1)p(z2|z1)=p(z1,z2). The following proposition is inspired by the works [4,6,38].
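As an illustration of the scan step mentioned above, here is a minimal Python sketch of the classical Hilbert-Peano ordering for a $2^p\times 2^p$ image; the function names and the coordinate convention are ours, and the inverse scan simply reuses the same coordinate list.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map a 1-D Hilbert index d to coordinates (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(image):
    """Unfold a 2**p x 2**p image into a 1-D chain along the Hilbert-Peano curve."""
    n = image.shape[0]
    order = n.bit_length() - 1
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    chain = np.array([image[i, j] for i, j in coords])
    return chain, coords                 # coords allow the inverse scan

def hilbert_unscan(chain, coords, n):
    """Reorganize a 1-D chain back into its 2-D image (inverse scan)."""
    image = np.empty((n, n), dtype=chain.dtype)
    for value, (i, j) in zip(chain, coords):
        image[i, j] = value
    return image
```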
Proposition 4.1. For Z a stationary PMC model related to the processes X and Y, and for n ∈ {2,…,N}, the following results are obtained.
(1) Both the distribution of X conditionally on Y=y and that of Y conditionally on X=x are Markovian.
(2) If p(xn|xn−1,yn−1)=p(xn|xn−1) and p(yn|xn,xn−1,yn−1)=p(yn|xn,yn−1), then Z is an HMC with dependent noise.
(3) If p(xn|xn−1,yn−1)=p(xn|xn−1) and p(yn|xn,xn−1,yn−1)=p(yn|xn), then Z is an HMC with independent noise.
(4) If p(yn−1|xn−1,xn)=p(yn−1|xn−1), then Z is an HMC with dependent noise.
Proof. (1) Let Z=(X,Y) be a PMC model. Then, for n ∈ {2,…,N}, we have
We also have
This gives p(xn|x1,…,xn−1,y)=p(xn|xn−1,y); therefore X conditionally on Y=y is a Markov chain. In the same way, we can show that Y conditionally on X=x is a Markov chain, x and y playing symmetrical roles here.
(2) Z is a PMC model; then, under the hypotheses of point (2),
$p(z_n|z_{n-1})=p(x_n|x_{n-1},y_{n-1})\,p(y_n|x_n,x_{n-1},y_{n-1})=p(x_n|x_{n-1})\,p(y_n|x_n,y_{n-1}),$
from which Z is an HMC with dependent noise, identified by the initial distribution p(z1) given by
$p(z_1)=p(x_1)\,p(y_1|x_1)$
and the transition matrix p(zn|zn−1) given by
$p(z_n|z_{n-1})=p(x_n|x_{n-1})\,p(y_n|x_n,y_{n-1}).$
(3) Z is a PMC model; then, under the hypotheses of point (3),
$p(z_n|z_{n-1})=p(x_n|x_{n-1},y_{n-1})\,p(y_n|x_n,x_{n-1},y_{n-1})=p(x_n|x_{n-1})\,p(y_n|x_n),$
from which Z is an HMC with independent noise.
(4) Let us show here that the process X is a Markov chain. We have, for n ∈ {2,…,N},
Using the fact that p(yn−1|xn−1,xn)=p(yn−1|xn−1), we thus have p(xn|x1,…,xn−1)=p(xn|xn−1), which completes the proof.□
Remark 4.1. Properties 2 and 3 in Proposition 4.1 demonstrate that the PMC model is strictly more general than the HMC model.
4.2. BPMC model
In this model, we consider a second observed process, noted ˉY, so that the full observed process becomes (Y,ˉY)=(Y1,ˉY1,…,Yn,ˉYn,…,YN,ˉYN). Consequently, for each n=1,…,N, the random variable Xn is associated with the pair (Yn,ˉYn), where ˉYn is related only to the observation Yn as defined below. The problem remains unchanged: determining X based on the observations. To establish this new model, we state the following conditions (Figure 1).
(1) The process Z=(X,Y) is a PMC model.
(2) The random variables (ˉYn)1≤n≤N are conditionally independent given Z.
(3) The distribution of each ˉYn conditional on Z is equal to its distribution conditional on Zn.
The model to be studied, (Z,ˉY), called the BPMC model, can be interpreted as a semi-hidden Markov chain model [4]. Its distribution can be stated as follows:
$p(x,y,\bar{y})=p(z_1)\prod_{n=2}^{N}p(z_n|z_{n-1})\prod_{n=1}^{N}p(\bar{y}_n|z_n).$
Let us emphasize that the BPMC model is particularly interesting because, given the pair (Y,ˉY)=(y,ˉy), the distribution of X is that of a Markov chain. Using the same logic as in the demonstration of the first point of Proposition 4.1, we get the following proposition.
Proposition 4.2. For (Z,ˉY) a stationary BPMC model related to the processes X and Y, the following result is obtained.
The distribution of X conditional on (Y,ˉY)=(y,ˉy) is that of a Markov chain.
5. Restoration problem of simulated models
The success of the HMC model can be attributed to the fact that the process X conditionally on Y=y is a Markov chain whose transitions can be computed. This observation holds true in the case of the PMC model. However, when it comes to the BPMC model, this conditional distribution exhibits a complex structure. That is precisely why we employ the distribution p(x|y,ˉy). Both models, PMC and BPMC, are assumed to be stationary, such that their distributions are given by p(x1,y1,x2,y2) for PMC and p(x1,y1,ˉy1,x2,y2,ˉy2) for BPMC.
In this section, we contrast the two models using the MPM restoration algorithm in the Gaussian context. Both models utilize Gaussian densities, which means that p(yn−1,yn|xn−1,xn) and p(ˉyn−1|xn−1), for each n, are Gaussian (in the BPMC case, we take p(ˉyn−1|xn−1) equal to p(ˉyn−1|xn−1,xn)). In this scenario, the PMC model is represented by the K² probabilities p(xn−1=ωi,xn=ωj), the K² mean vectors $\mu_{ij}=\begin{pmatrix}\mu^{1}_{ij}\\ \mu^{2}_{ij}\end{pmatrix}$, and the K² variance-covariance matrices $\Sigma_{ij}=\begin{pmatrix}\sigma^{1}_{ij} & \sigma^{12}_{ij}\\ \sigma^{12}_{ij} & \sigma^{2}_{ij}\end{pmatrix}$ of the bidimensional densities fij(yn−1,yn). Additionally, the BPMC model is represented by the parameters $\mu^{1}_{ij}$, $\mu^{2}_{ij}$, $\sigma^{1}_{ij}$, and $\sigma^{2}_{ij}$ of the K² mono-dimensional densities fij(ˉyn−1). In the following subsection, we will simulate the two processes X and Y assuming that all the aforementioned parameters are initially known. Furthermore, we will reconstruct the process X based only on the process Y using the PMC model, while considering both Y and ˉY simultaneously using the BPMC model.
5.1. Simulation of processes in models
Here, we recall the previously defined distributions of the PMC and BPMC models. The distribution of the PMC model is given by $p(x_1,y_1)=p(x_1)\,p(y_1|x_1)$ and $p(x_n,y_n|x_{n-1},y_{n-1})=p(y_n|x_{n-1},y_{n-1},x_n)\,p(x_n|x_{n-1},y_{n-1})$, while the distribution of the BPMC model is represented by $p(x_1,y_1,\bar{y}_1)=p(x_1)\,p(y_1|x_1)\,p(\bar{y}_1|x_1,y_1)$, $p(x_n,y_n|x_{n-1},y_{n-1})$, and $p(\bar{y}_n|x_n,y_n)$.
We can simulate the processes X, Y, and ˉY of the two models jointly using the following approach.
● For the PMC model, x1, y1, xn, and yn, for each n=2,…,N, are simulated with the following probabilities: $x_1\sim p(x_1)$, $y_1\sim p(y_1|x_1)$, $x_n\sim p(x_n|x_{n-1},y_{n-1})$, and $y_n\sim p(y_n|x_n,x_{n-1},y_{n-1})$.
● For the BPMC model, x1, y1, xn, and yn are simulated in the same way as above. Additionally, we simulate ˉy1 and ˉyn, for each n=2,…,N, with the probabilities $\bar{y}_1\sim p(\bar{y}_1|x_1,y_1)$ and $\bar{y}_n\sim p(\bar{y}_n|x_n,y_n)$ (a code sketch of this simulation is given after Remark 5.1).
Remark 5.1. It can be seen here that (for more details, see [14,39])
● fx1,x2(y1) is a mono-dimensional Gaussian density N(μ1ij,√σ1ij).
● fxn−1,xn(yn−1) is a mono-dimensional Gaussian density N(μ1ij,√σ1ij).
● It can be shown that yn is generated according to a Gaussian drawing, where p(yn|xn,xn−1,yn−1) is a mono-dimensional Gaussian density of mean $\mu^{2}_{ij}+\frac{\sigma^{12}_{ij}}{\sigma^{1}_{ij}}\,(y_{n-1}-\mu^{1}_{ij})$ and standard deviation $\sqrt{\frac{\sigma^{1}_{ij}\sigma^{2}_{ij}-(\sigma^{12}_{ij})^{2}}{\sigma^{1}_{ij}}}$.
● fx1,x2(ˉy1) is a mono-dimensional Gaussian density N(μ1ij,√σ1ij).
● fxn−1,xn(ˉyn) is a mono-dimensional Gaussian density N(μ2ij,√σ2ij).
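To make these drawings concrete, here is a minimal Python sketch of the simulation loop for the stationary Gaussian case, following Remark 5.1. The array layout (p_joint[i, j] = p(ωi, ωj); mu[i, j] = (μ¹ij, μ²ij); sig[i, j] = (σ¹ij, σ²ij, σ¹²ij)) and the function names are ours, not the paper's.

```python
import numpy as np

def gauss(v, m, var):
    """Mono-dimensional Gaussian density with mean m and variance var."""
    return np.exp(-(v - m) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def simulate_pmc(p_joint, mu, sig, N, rng, with_ybar=False):
    """Simulate a stationary Gaussian PMC (X, Y) and, optionally, the second
    BPMC observation, following Section 5.1 and Remark 5.1 (N >= 2)."""
    K = p_joint.shape[0]
    x = np.empty(N, dtype=int)
    y = np.empty(N)
    ybar = np.empty(N)
    x[0] = rng.choice(K, p=p_joint.sum(axis=1))                  # x_1 ~ p(x_1)
    x[1] = rng.choice(K, p=p_joint[x[0]] / p_joint[x[0]].sum())  # x_2 ~ p(x_2 | x_1)
    # y_1 and ybar_1 drawn from the first marginal of f_{x_1, x_2} (Remark 5.1)
    y[0] = rng.normal(mu[x[0], x[1], 0], np.sqrt(sig[x[0], x[1], 0]))
    ybar[0] = rng.normal(mu[x[0], x[1], 0], np.sqrt(sig[x[0], x[1], 0]))
    for n in range(1, N):
        i = x[n - 1]
        if n > 1:
            # x_n ~ p(x_n | x_{n-1}, y_{n-1}), proportional to p(w_i, w_j) f_ij(y_{n-1})
            w = p_joint[i] * gauss(y[n - 1], mu[i, :, 0], sig[i, :, 0])
            x[n] = rng.choice(K, p=w / w.sum())
        j = x[n]
        s1, s2, s12 = sig[i, j]
        # y_n | x_{n-1}, x_n, y_{n-1}: the conditional Gaussian of Remark 5.1
        cm = mu[i, j, 1] + s12 / s1 * (y[n - 1] - mu[i, j, 0])
        cv = (s1 * s2 - s12 ** 2) / s1
        y[n] = rng.normal(cm, np.sqrt(cv))
        # second BPMC observation: f_{x_{n-1}, x_n}(ybar_n) = N(mu2_ij, s2_ij)
        ybar[n] = rng.normal(mu[i, j, 1], np.sqrt(s2))
    return (x, y, ybar) if with_ybar else (x, y)
```

With with_ybar=True, the same call also returns the second BPMC observation.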
5.2. Restoration of hidden process in models
In this work, we look at the MPM approach for reconstructing the hidden sequence X=x.
The MPM estimator for the PMC model consists of maximizing the a posteriori marginal probability of each xn as follows:
$\hat{x}_n=\arg\max_{\omega_i\in\Omega}\,\chi_n(\omega_i),$
where $\chi_n(\omega_i)=p(x_n=\omega_i|y)$. Formally, this probability is calculated using the "conditional forward probabilities" $\alpha_n(\omega_i)=p(x_n=\omega_i|y_1,\ldots,y_n)$ and the "conditional backward probabilities" $\beta_n(\omega_i)=\frac{p(y_{n+1},\ldots,y_N|x_n=\omega_i,y_n)}{p(y_{n+1},\ldots,y_N|y_1,\ldots,y_n)}$, which avoid the numerical problems encountered by the unnormalized probabilities in the classical case. These conditional probabilities can be computed using the following recursion, sketched in code after (5.12):
● Initiation phase: For i=1,…,K.
(1) Forward
$\alpha_1(\omega_i)=p(x_1=\omega_i|y_1)=\frac{p(x_1=\omega_i,y_1)}{\sum_{j=1}^{K}p(x_1=\omega_j,y_1)}. \qquad (5.8)$
(2) Backward
$\beta_N(\omega_i)=1. \qquad (5.9)$
● Induction phase: For i=1,…,K and n=1,…,N−1.
(1) Forward
$\alpha_{n+1}(\omega_i)=\frac{\sum_{j=1}^{K}\alpha_n(\omega_j)\,p(x_{n+1}=\omega_i,y_{n+1}|x_n=\omega_j,y_n)}{\sum_{k=1}^{K}\sum_{j=1}^{K}\alpha_n(\omega_j)\,p(x_{n+1}=\omega_k,y_{n+1}|x_n=\omega_j,y_n)}. \qquad (5.10)$
(2) Backward
$\beta_n(\omega_i)=\frac{\sum_{j=1}^{K}\beta_{n+1}(\omega_j)\,p(x_{n+1}=\omega_j,y_{n+1}|x_n=\omega_i,y_n)}{\sum_{k=1}^{K}\sum_{j=1}^{K}\alpha_n(\omega_j)\,p(x_{n+1}=\omega_k,y_{n+1}|x_n=\omega_j,y_n)}. \qquad (5.11)$
with $p(x_{n+1},y_{n+1}|x_n,y_n)=\frac{p(x_n,y_n,x_{n+1},y_{n+1})}{p(x_n,y_n)}=\frac{p(x_n,x_{n+1})\,f_{x_n,x_{n+1}}(y_n,y_{n+1})}{\sum_{x_{n+1}}p(x_n,x_{n+1})\,f_{x_n,x_{n+1}}(y_n)}.$
It is easy to demonstrate that
$\chi_n(\omega_i)=\alpha_n(\omega_i)\,\beta_n(\omega_i). \qquad (5.12)$
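The recursions (5.8)-(5.12) can be sketched compactly as follows, reusing the gauss helper and the array layout of the simulation sketch above; the returned alpha, beta, and transition kernels Ts are kept because the estimation sketches of Section 6 reuse them.

```python
def mpm_pmc(y, p_joint, mu, sig):
    """MPM restoration for a stationary Gaussian PMC via the conditional
    forward-backward recursions (5.8)-(5.12)."""
    N, K = len(y), p_joint.shape[0]

    def trans(n):
        """T[i, j] = p(x_{n+1} = w_j, y_{n+1} | x_n = w_i, y_n)."""
        T = np.empty((K, K))
        for i in range(K):
            s1, s2, s12 = sig[i, :, 0], sig[i, :, 1], sig[i, :, 2]
            f_yn = gauss(y[n], mu[i, :, 0], s1)            # first marginal f_ij(y_n)
            cm = mu[i, :, 1] + s12 / s1 * (y[n] - mu[i, :, 0])
            cv = (s1 * s2 - s12 ** 2) / s1                 # conditional variance
            T[i] = p_joint[i] * f_yn * gauss(y[n + 1], cm, cv) / (p_joint[i] * f_yn).sum()
        return T

    Ts = [trans(n) for n in range(N - 1)]
    alpha = np.empty((N, K))
    beta = np.ones((N, K))                                  # backward initiation (5.9)
    c = np.empty(N - 1)
    a0 = (p_joint * gauss(y[0], mu[..., 0], sig[..., 0])).sum(axis=1)
    alpha[0] = a0 / a0.sum()                                # forward initiation (5.8)
    for n in range(N - 1):                                  # forward induction (5.10)
        a = alpha[n] @ Ts[n]
        c[n] = a.sum()                                      # p(y_{n+1} | y_1, ..., y_n)
        alpha[n + 1] = a / c[n]
    for n in range(N - 2, -1, -1):                          # backward induction (5.11)
        beta[n] = (Ts[n] @ beta[n + 1]) / c[n]
    chi = alpha * beta                                      # (5.12)
    return chi.argmax(axis=1), chi, alpha, beta, Ts
```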
Besides this, the MPM estimator under the BPMC model can be calculated as
$\hat{x}_n=\arg\max_{\omega_i\in\Omega}\,\chi^{*}_n(\omega_i),$
where $\chi^{*}_n(\omega_i)=p(x_n=\omega_i|y,\bar{y})$. This probability can be obtained using the conditional forward and backward probabilities $\alpha^{*}_n(\omega_i)=p(x_n=\omega_i|y_1,\bar{y}_1,\ldots,y_n,\bar{y}_n)$ and $\beta^{*}_n(\omega_i)=\frac{p(y_{n+1},\bar{y}_{n+1},\ldots,y_N,\bar{y}_N|x_n=\omega_i,y_n,\bar{y}_n)}{p(y_{n+1},\bar{y}_{n+1},\ldots,y_N,\bar{y}_N|y_1,\bar{y}_1,\ldots,y_n,\bar{y}_n)}$. These conditional probabilities can be computed using the following recursion:
● Initiation phase: For i=1,…,K.
(1) Forward
$\alpha^{*}_1(\omega_i)=p(x_1=\omega_i|y_1,\bar{y}_1)=\frac{p(x_1=\omega_i,y_1,\bar{y}_1)}{\sum_{j=1}^{K}p(x_1=\omega_j,y_1,\bar{y}_1)}. \qquad (5.14)$
(2) Backward
$\beta^{*}_N(\omega_i)=1. \qquad (5.15)$
● Induction phase: For i=1,…,K and n=1,…,N−1.
(1) Forward
$\alpha^{*}_{n+1}(\omega_i)=\frac{\sum_{j=1}^{K}\alpha^{*}_n(\omega_j)\,p(x_{n+1}=\omega_i,y_{n+1},\bar{y}_{n+1}|x_n=\omega_j,y_n,\bar{y}_n)}{\sum_{k=1}^{K}\sum_{j=1}^{K}\alpha^{*}_n(\omega_j)\,p(x_{n+1}=\omega_k,y_{n+1},\bar{y}_{n+1}|x_n=\omega_j,y_n,\bar{y}_n)}. \qquad (5.16)$
(2) Backward
$\beta^{*}_n(\omega_i)=\frac{\sum_{j=1}^{K}\beta^{*}_{n+1}(\omega_j)\,p(x_{n+1}=\omega_j,y_{n+1},\bar{y}_{n+1}|x_n=\omega_i,y_n,\bar{y}_n)}{\sum_{k=1}^{K}\sum_{j=1}^{K}\alpha^{*}_n(\omega_j)\,p(x_{n+1}=\omega_k,y_{n+1},\bar{y}_{n+1}|x_n=\omega_j,y_n,\bar{y}_n)}. \qquad (5.17)$
with $p(x_{n+1},y_{n+1},\bar{y}_{n+1}|x_n,y_n,\bar{y}_n)=\frac{p(x_n,x_{n+1})\,f_{x_n,x_{n+1}}(y_n,y_{n+1})\,f_{x_n,x_{n+1}}(\bar{y}_{n+1})}{\sum_{x_{n+1}}p(x_n,x_{n+1})\,f_{x_n,x_{n+1}}(y_n)}.$
Considering the foregoing, the distribution of Xn conditionally on (Y=y,ˉY=ˉy) is given by $p(x_n=\omega_i|y,\bar{y})=\chi^{*}_n(\omega_i)$. This gives
$\chi^{*}_n(\omega_i)=\alpha^{*}_n(\omega_i)\,\beta^{*}_n(\omega_i). \qquad (5.18)$
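The forward-backward sketch adapts directly to the BPMC recursions (5.14)-(5.18): the only changes are the factor f_ij(ˉy_{n+1}) in the transition kernel and the factor f_ij(ˉy_1) in the initiation, both taken Gaussian as in Remark 5.1.

```python
def mpm_bpmc(y, ybar, p_joint, mu, sig):
    """MPM restoration for the BPMC model, recursions (5.14)-(5.18): identical
    to mpm_pmc except for the extra factors on the second observation."""
    N, K = len(y), p_joint.shape[0]
    Ts = []
    for n in range(N - 1):
        T = np.empty((K, K))
        for i in range(K):
            s1, s2, s12 = sig[i, :, 0], sig[i, :, 1], sig[i, :, 2]
            f_yn = gauss(y[n], mu[i, :, 0], s1)
            cm = mu[i, :, 1] + s12 / s1 * (y[n] - mu[i, :, 0])
            cv = (s1 * s2 - s12 ** 2) / s1
            # only change w.r.t. the PMC kernel: the factor f_ij(ybar_{n+1})
            T[i] = (p_joint[i] * f_yn * gauss(y[n + 1], cm, cv)
                    * gauss(ybar[n + 1], mu[i, :, 1], s2)) / (p_joint[i] * f_yn).sum()
        Ts.append(T)
    alpha = np.empty((N, K))
    beta = np.ones((N, K))                                  # (5.15)
    c = np.empty(N - 1)
    a0 = (p_joint * gauss(y[0], mu[..., 0], sig[..., 0])
                  * gauss(ybar[0], mu[..., 0], sig[..., 0])).sum(axis=1)
    alpha[0] = a0 / a0.sum()                                # (5.14)
    for n in range(N - 1):                                  # (5.16)
        a = alpha[n] @ Ts[n]
        c[n] = a.sum()
        alpha[n + 1] = a / c[n]
    for n in range(N - 2, -1, -1):                          # (5.17)
        beta[n] = (Ts[n] @ beta[n + 1]) / c[n]
    chi = alpha * beta                                      # (5.18)
    return chi.argmax(axis=1), chi, alpha, beta, Ts
```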
5.3. Restoration problem results
In this section, we attempt to show the importance of the BPMC model compared to the PMC model. For that, we treat an example with N=100000 and K=2 (i.e., Ω={ω1,ω2}). We investigate the models' performance by taking two factors into account: two different chain homogeneities, noted H1 and H2, and different noise parameters (these values are also arranged into code arrays after the list):
● Factor chain:
- Case H1: p(ω1,ω1)=p(ω2,ω2)=0.48 and p(ω1,ω2)=p(ω2,ω1)=0.02.
- Case H2: p(ω1,ω1)=p(ω2,ω2)=0.25 and p(ω1,ω2)=p(ω2,ω1)=0.25.
● Factor noise:
- Parameters of density $f_{\omega_1,\omega_1}$: $\mu^{1}_{\omega_1,\omega_1}=-3$, $\mu^{2}_{\omega_1,\omega_1}=-3$, $\sigma^{1}_{\omega_1,\omega_1}=14$, $\sigma^{2}_{\omega_1,\omega_1}=14$, and $\sigma^{12}_{\omega_1,\omega_1}=0.1$.
- Parameters of density $f_{\omega_1,\omega_2}$: $\mu^{1}_{\omega_1,\omega_2}=5$, $\mu^{2}_{\omega_1,\omega_2}=-5$, $\sigma^{1}_{\omega_1,\omega_2}=11$, $\sigma^{2}_{\omega_1,\omega_2}=9$, and $\sigma^{12}_{\omega_1,\omega_2}=0.9$.
- Parameters of density $f_{\omega_2,\omega_1}$: $\mu^{1}_{\omega_2,\omega_1}=-3$, $\mu^{2}_{\omega_2,\omega_1}=3$, $\sigma^{1}_{\omega_2,\omega_1}=9$, $\sigma^{2}_{\omega_2,\omega_1}=11$, and $\sigma^{12}_{\omega_2,\omega_1}=0.1$.
- Parameters of density $f_{\omega_2,\omega_2}$: $\mu^{1}_{\omega_2,\omega_2}=5$, $\mu^{2}_{\omega_2,\omega_2}=5$, $\sigma^{1}_{\omega_2,\omega_2}=18$, $\sigma^{2}_{\omega_2,\omega_2}=18$, and $\sigma^{12}_{\omega_2,\omega_2}=0.1$.
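For concreteness, these Case H1 values can be arranged into the array layout used by the sketches above (the layout itself is ours):

```python
import numpy as np

# p_joint[i, j] = p(x_n = w_{i+1}, x_{n+1} = w_{j+1}), Case H1
p_joint = np.array([[0.48, 0.02],
                    [0.02, 0.48]])

# mu[i, j] = (mu1_ij, mu2_ij)
mu = np.array([[[-3.0, -3.0], [ 5.0, -5.0]],
               [[-3.0,  3.0], [ 5.0,  5.0]]])

# sig[i, j] = (s1_ij, s2_ij, s12_ij)
sig = np.array([[[14.0, 14.0, 0.1], [11.0,  9.0, 0.9]],
                [[ 9.0, 11.0, 0.1], [18.0, 18.0, 0.1]]])
```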
The following parameter is used to quantify the influence of the noise correlation on the error rate:
$\rho=\sum_{i,j=1}^{K}\left(\frac{\rho_{\omega_i,\omega_j}}{\rho_{\omega'_i,\omega'_j}}\right)^{2}$ with $\rho_{\omega_i,\omega_j}>\rho_{\omega'_i,\omega'_j}$, where $\rho_{\omega_i,\omega_j}=\frac{\sigma^{12}_{\omega_i,\omega_j}}{\sqrt{\sigma^{1}_{\omega_i,\omega_j}}\sqrt{\sigma^{2}_{\omega_i,\omega_j}}}$ for $i,j=1,\ldots,K$.
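As a small worked example, the correlation coefficients $\rho_{\omega_i,\omega_j}$ of the series above can be computed directly from the sig array defined earlier; for instance, $\rho_{\omega_1,\omega_1}=0.1/\sqrt{14\cdot 14}\approx 0.007$.

```python
# correlation coefficient of each bidimensional density f_ij
rho_pair = sig[..., 2] / np.sqrt(sig[..., 0] * sig[..., 1])
# -> [[0.0071, 0.0905],
#     [0.0101, 0.0056]]
```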
We consider four sets of ρ and the covariance values associated with these four simulations are shown in Table 2.
Table 3 reports the misclassification rates computed by the MPM method, averaged over 300 independent experiments.
According to these results, for the chain in Case H1, the misclassification rates are higher when the coefficient ρ is large, and become progressively smaller as the coefficient ρ decreases (Experiment 4). It can be seen from all experiments that the BPMC model-based MPM restorations often work better than the PMC model-based ones. For Case H2 of the chain, the misclassification rates are consistently higher for both models. In this situation, the PMC model is preferable to the BPMC model. Other simulations have been done, showing that chain homogeneity is an important element that can be exploited during the restoration phase when the data is complete.
6. PMC and BPMC parameter estimation
The parameter estimation problem from incomplete data is considerably more complex for the PMC model than for the HMM. Considering the fact that we are dealing with unsupervised segmentation, all of the parameters, noted θ=(θ1,…,θl), must be estimated. We are interested, in this section, in estimating the K² probabilities p(ωi,ωj), the 2K² parameters of μij, and the 3K² parameters of Σij. Considering that the log-likelihood of the models under examination cannot be maximized analytically, the methods rely on simulation according to posterior distributions for parameter estimation. We use here the ICE and SEM algorithms.
6.1. Estimation using ICE algorithm
6.1.1. ICE algorithm according to PMC model
The ICE algorithm under the PMC model is an iterative procedure, working as follows:
● Initialize $\theta^{0}_i$;
● $\theta^{q+1}_i$ is calculated using $\theta^{q+1}_i=E_{\theta^{q}}\big[\hat{\theta}_i(X,Y)\,\big|\,Y=y\big]$ if this expectation is computable, or $\theta^{q+1}_i=\frac{1}{T}\sum_{t=1}^{T}\hat{\theta}_i(x^{t},y)$ if the expectation above is not computable, where $x^{1},\ldots,x^{T}$ are simulated according to $p(x|y,\theta^{q})$.
Returning to our problem, we can take the following estimator for p(ωi,ωj):
$\hat{p}(\omega_i,\omega_j)(x)=\frac{1}{N-1}\sum_{n=1}^{N-1}\mathbb{1}\big[x_n=\omega_i,\,x_{n+1}=\omega_j\big]. \qquad (6.1)$
The conditional expectation of this estimator, at iteration (q+1), can be calculated and gives
$p^{(q+1)}(\omega_i,\omega_j)=\frac{1}{N-1}\sum_{n=1}^{N-1}\psi^{(q)}_n(\omega_i,\omega_j), \qquad (6.2)$
where $\psi^{(q)}_n(\omega_i,\omega_j)=p(x_n=\omega_i,x_{n+1}=\omega_j|y)$ is the joint a posteriori probability of two consecutive classes, and can be calculated using
$\psi^{(q)}_n(\omega_i,\omega_j)=\frac{\alpha^{(q)}_n(\omega_i)\,p^{(q)}(x_{n+1}=\omega_j,y_{n+1}|x_n=\omega_i,y_n)\,\beta^{(q)}_{n+1}(\omega_j)}{\sum_{\omega_k,\omega_l}\alpha^{(q)}_n(\omega_k)\,p^{(q)}(x_{n+1}=\omega_l,y_{n+1}|x_n=\omega_k,y_n)\,\beta^{(q)}_{n+1}(\omega_l)}. \qquad (6.3)$
For the parameters μij and Σij, we can choose the following empirical estimators:
$\hat{\mu}_{ij}(x,y)=\frac{1}{N_{ij}}\sum_{n:\;x_n=\omega_i,\,x_{n+1}=\omega_j}\begin{pmatrix}y_n\\ y_{n+1}\end{pmatrix}, \qquad (6.4)$
$\hat{\Sigma}_{ij}(x,y)=\frac{1}{N_{ij}}\sum_{n:\;x_n=\omega_i,\,x_{n+1}=\omega_j}\left[\begin{pmatrix}y_n\\ y_{n+1}\end{pmatrix}-\hat{\mu}_{ij}\right]\left[\begin{pmatrix}y_n\\ y_{n+1}\end{pmatrix}-\hat{\mu}_{ij}\right]^{\top}, \qquad (6.5)$
where $N_{ij}=\sum_{n=1}^{N-1}\mathbb{1}[x_n=\omega_i,\,x_{n+1}=\omega_j]$.
However, we cannot calculate the conditional expectation of these estimators. In this situation, we employ a stochastic strategy based on the simulation of X according to its a posteriori distribution, which is a nonstationary Markov chain, as we demonstrated previously:
$p(x|y)=p(x_1|y)\prod_{n=2}^{N}p(x_n|x_{n-1},y), \qquad (6.6)$
with an initial distribution χ1(ωi) and a transition matrix
$p(x_{n+1}=\omega_j|x_n=\omega_i,y)=\frac{p(x_{n+1}=\omega_j,y_{n+1}|x_n=\omega_i,y_n)\,\beta_{n+1}(\omega_j)}{\sum_{\omega_k}p(x_{n+1}=\omega_k,y_{n+1}|x_n=\omega_i,y_n)\,\beta_{n+1}(\omega_k)}. \qquad (6.7)$
The ICE algorithm for the PMC model then proceeds as follows (a compact code sketch is given after the procedure).
● Iteration (q=0), for i,j=1,…,K.
(1) Initialization of the algorithm with p(q)(ωi,ωj),μ(q)ij,Σ(q)ij.
(2) Calculation of probabilities α(q)n(ωi) according to (5.8) and (5.10).
(3) Calculation of probabilities β(q)n(ωi) according to (5.9) and (5.11).
(4) Deduction of probabilities χ(q)n(ωi) according to (5.12).
(5) Deduction of probabilities ψ(q)n(ωi,ωj) according to (6.3).
(6) Simulation of x(q),1,…,x(q),T according to (6.6).
● Iteration (q+1), for i,j=1,…,K.
(1) Calculation of p(q+1)(ωi,ωj) according to (6.2).
(2) Calculation of $\mu^{(q+1)}_{ij}$ and $\Sigma^{(q+1)}_{ij}$ by
$\mu^{(q+1)}_{ij}=\frac{1}{T}\sum_{t=1}^{T}\mu^{t}_{ij},\qquad \Sigma^{(q+1)}_{ij}=\frac{1}{T}\sum_{t=1}^{T}\Sigma^{t}_{ij}, \qquad (6.8)$
where μtij and Σtij are calculated according to (6.4) and (6.5).
● Iterations are stopped if the algorithm converges; otherwise, the preceding step is repeated.
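A compact sketch of one ICE iteration under the PMC model, reusing alpha, beta, and the transition kernels Ts returned by the forward-backward sketch of Section 5.2; the variable names, the number of draws T, and the omitted empty-class guards are ours.

```python
def ice_step_pmc(y, alpha, beta, Ts, rng, T=10):
    """One ICE iteration under the PMC model: exact update (6.2) of p(w_i, w_j),
    stochastic update (6.8) of mu_ij and Sigma_ij from T posterior draws of X."""
    N, K = alpha.shape
    # psi_n(i, j) = p(x_n = w_i, x_{n+1} = w_j | y), cf. (6.3)
    psi = np.array([np.outer(alpha[n], beta[n + 1]) * Ts[n] for n in range(N - 1)])
    psi /= psi.sum(axis=(1, 2), keepdims=True)
    p_new = psi.mean(axis=0)                                # (6.2)

    mu_new = np.zeros((K, K, 2))
    sig_new = np.zeros((K, K, 3))
    for _ in range(T):
        # x ~ p(x | y): nonstationary Markov chain (6.6) with transitions (6.7)
        x = np.empty(N, dtype=int)
        chi1 = alpha[0] * beta[0]
        x[0] = rng.choice(K, p=chi1 / chi1.sum())
        for n in range(N - 1):
            w = Ts[n][x[n]] * beta[n + 1]
            x[n + 1] = rng.choice(K, p=w / w.sum())
        # empirical estimators (6.4)-(6.5) on the pairs (y_n, y_{n+1})
        for i in range(K):
            for j in range(K):
                sel = (x[:-1] == i) & (x[1:] == j)
                pairs = np.column_stack([y[:-1][sel], y[1:][sel]])
                mu_new[i, j] += pairs.mean(axis=0)
                cov = np.cov(pairs, rowvar=False, bias=True)
                sig_new[i, j] += (cov[0, 0], cov[1, 1], cov[0, 1])
    return p_new, mu_new / T, sig_new / T                   # (6.8)
```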
6.1.2. ICE algorithm according to BPMC model
Under the BPMC model, the ICE algorithm follows the same scheme:
● Initialize θ0i;
● $\theta^{q+1}_i$ is calculated using $\theta^{q+1}_i=E_{\theta^{q}}\big[\hat{\theta}_i(X,Y,\bar{Y})\,\big|\,Y=y,\bar{Y}=\bar{y}\big]$ if this expectation is computable, or $\theta^{q+1}_i=\frac{1}{T}\sum_{t=1}^{T}\hat{\theta}_i(x^{t},y,\bar{y})$ if the expectation above is not computable, where $x^{1},\ldots,x^{T}$ are simulated according to $p(x|y,\bar{y},\theta^{q})$.
Here, we take the same estimator for p(ωi,ωj) as given in (6.1). The conditional expectation of this estimator, at iteration (q+1), can be calculated and gives
$p^{(q+1)}(\omega_i,\omega_j)=\frac{1}{N-1}\sum_{n=1}^{N-1}\psi^{*(q)}_n(\omega_i,\omega_j), \qquad (6.9)$
where $\psi^{*(q)}_n(\omega_i,\omega_j)=p(x_n=\omega_i,x_{n+1}=\omega_j|y,\bar{y})$ is the joint a posteriori probability of two consecutive classes, and it can be shown that
$\psi^{*(q)}_n(\omega_i,\omega_j)=\frac{\alpha^{*(q)}_n(\omega_i)\,p^{(q)}(x_{n+1}=\omega_j,y_{n+1},\bar{y}_{n+1}|x_n=\omega_i,y_n,\bar{y}_n)\,\beta^{*(q)}_{n+1}(\omega_j)}{\sum_{\omega_k,\omega_l}\alpha^{*(q)}_n(\omega_k)\,p^{(q)}(x_{n+1}=\omega_l,y_{n+1},\bar{y}_{n+1}|x_n=\omega_k,y_n,\bar{y}_n)\,\beta^{*(q)}_{n+1}(\omega_l)}. \qquad (6.10)$
For the parameters μij and Σij, we choose the same estimators as in (6.4) and (6.5).
Here, we simulate the process X according to its a posteriori distribution under the BPMC model p(x|y,ˉy), where
$p(x|y,\bar{y})=p(x_1|y,\bar{y})\prod_{n=2}^{N}p(x_n|x_{n-1},y,\bar{y}), \qquad (6.11)$
with an initial distribution χ∗1(ωi) and a transition matrix
$p(x_{n+1}=\omega_j|x_n=\omega_i,y,\bar{y})=\frac{p(x_{n+1}=\omega_j,y_{n+1},\bar{y}_{n+1}|x_n=\omega_i,y_n,\bar{y}_n)\,\beta^{*}_{n+1}(\omega_j)}{\sum_{\omega_k}p(x_{n+1}=\omega_k,y_{n+1},\bar{y}_{n+1}|x_n=\omega_i,y_n,\bar{y}_n)\,\beta^{*}_{n+1}(\omega_k)}.$
The ICE algorithm procedure for the BPMC model is then explained as follows.
● Iteration (q=0), for i,j=1,…,K.
(1) Initialization of the algorithm with p(q)(ωi,ωj),μ(q)ij,Σ(q)ij.
(2) Calculation of probabilities α∗(q)n(ωi) according to (5.14) and (5.16).
(3) Calculation of probabilities β∗(q)n(ωi) according to (5.15) and (5.17).
(4) Deduction of probabilities χ∗(q)n(ωi) according to (5.18).
(5) Deduction of probabilities ψ∗(q)n(ωi,ωj) according to (6.10).
(6) Simulation of x(q),1,…,x(q),T according to (6.11).
● Iteration (q+1), for i,j=1,…,K.
(1) Calculation of p(q+1)(ωi,ωj) according to (6.9).
(2) Calculation of μ(q+1)ij and Σ(q+1)ij according to (6.8).
● Iterations are stopped if the algorithm converges; otherwise, the preceding step is repeated.
6.2. Estimation using SEM algorithm
The SEM algorithm is an iterative procedure that uses stochastic drawings to estimate a sequence of model parameters from observations and realizations of X. In each iteration, we simulate the process X according to its a posteriori distribution based on the parameters obtained in the current iteration. Then, the SEM algorithm for the models in this work proceeds as follows:
● We initialize the algorithm with θ0i.
● At each iteration (q), we simulate just one realization x of X according to its a posteriori distribution considering the current parameters. The parameters θ(q+1)i are then calculated.
6.2.1. SEM algorithm according to PMC model
The SEM algorithm under the PMC model runs as follows (a code sketch is given after the procedure):
● Iteration (q=0), for i,j=1,…,K.
(1) Initialization of the algorithm with p(q)(ωi,ωj),μ(q)ij,Σ(q)ij.
(2) Calculation of probabilities α(q)n(ωi) according to (5.8) and (5.10).
(3) Calculation of probabilities β(q)n(ωi) according to (5.9) and (5.11).
(4) Deduction of probabilities χ(q)n(ωi) according to (5.12).
(5) Deduction of probabilities ψ(q)n(ωi,ωj) according to (6.3).
(6) Simulation of x(q) according to (6.6).
● Iteration (q+1), for i,j=1,…,K.
(1) Calculation of $p^{(q+1)}(\omega_i,\omega_j)$ by
$p^{(q+1)}(\omega_i,\omega_j)=\frac{1}{N-1}\sum_{n=1}^{N-1}\mathbb{1}\big[x^{(q)}_n=\omega_i,\,x^{(q)}_{n+1}=\omega_j\big].$
(2) Calculation of $\mu^{(q+1)}_{ij}$ and $\Sigma^{(q+1)}_{ij}$ by applying the empirical estimators (6.4) and (6.5) to the simulated realization $x^{(q)}$.
● Iterations are stopped if the algorithm converges; otherwise, the preceding step is repeated.
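The corresponding sketch of one SEM iteration differs from the ICE one only in that a single posterior realization is drawn and all parameters, including p(ωi,ωj), are estimated empirically from it.

```python
def sem_step_pmc(y, alpha, beta, Ts, rng):
    """One SEM iteration under the PMC model: a single posterior draw of X,
    then fully empirical parameter estimates. Guards for empty classes omitted."""
    N, K = alpha.shape
    x = np.empty(N, dtype=int)
    chi1 = alpha[0] * beta[0]
    x[0] = rng.choice(K, p=chi1 / chi1.sum())
    for n in range(N - 1):
        w = Ts[n][x[n]] * beta[n + 1]
        x[n + 1] = rng.choice(K, p=w / w.sum())
    p_new = np.zeros((K, K))
    mu_new = np.zeros((K, K, 2))
    sig_new = np.zeros((K, K, 3))
    for i in range(K):
        for j in range(K):
            sel = (x[:-1] == i) & (x[1:] == j)
            p_new[i, j] = sel.mean()                           # empirical p(w_i, w_j)
            pairs = np.column_stack([y[:-1][sel], y[1:][sel]])
            mu_new[i, j] = pairs.mean(axis=0)                  # empirical (6.4)
            cov = np.cov(pairs, rowvar=False, bias=True)
            sig_new[i, j] = (cov[0, 0], cov[1, 1], cov[0, 1])  # empirical (6.5)
    return p_new, mu_new, sig_new
```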
6.2.2. SEM algorithm according to BPMC model
The SEM algorithm under the BPMC model runs as follows:
● Iteration (q=0), for i,j=1,…,K.
(1) Initialization of the algorithm with p(q)(ωi,ωj),μ(q)ij,Σ(q)ij.
(2) Calculation of probabilities α∗(q)n(ωi) according to (5.14) and (5.16).
(3) Calculation of probabilities β∗(q)n(ωi) according to (5.15) and (5.17).
(4) Deduction of probabilities χ∗(q)n(ωi) according to (5.18).
(5) Deduction of probabilities ψ∗(q)n(ωi,ωj) according to (6.10).
(6) Simulation of x(q) according to (6.11).
● Iteration (q+1), for i,j=1,…,K.
(1) Calculation of $p^{(q+1)}(\omega_i,\omega_j)$ by
$p^{(q+1)}(\omega_i,\omega_j)=\frac{1}{N-1}\sum_{n=1}^{N-1}\mathbb{1}\big[x^{(q)}_n=\omega_i,\,x^{(q)}_{n+1}=\omega_j\big].$
(2) Calculation of $\mu^{(q+1)}_{ij}$ and $\Sigma^{(q+1)}_{ij}$ by applying the empirical estimators (6.4) and (6.5) to the simulated realization $x^{(q)}$.
● Iterations are stopped if the algorithm converges; otherwise, the preceding step is repeated.
7. Results of image segmentation
In the present section, we show several results on the use of the PMC and BPMC models for gray-level image segmentation in the Gaussian context. First, we present a comparative study of the results obtained from both the PMC and BPMC models concerning the misclassification rates of the Bayesian MPM algorithm, the PSNR index, and the parameter estimates obtained by the ICE and SEM algorithms. Second, we illustrate the segmentation results obtained on simulated images corrupted with correlated noise. These tests give an idea of the behavior of the ICE and SEM methods in terms of numerical and visual results.
7.1. Numerical results
We segment some simulated images corrupted with correlated noise using an unsupervised method in order to evaluate the robustness of the BPMC model in comparison with the PMC model, as well as the consequences when the data fits neither the PMC nor the HMC model. For the PMC model-based segmentation methods, the modeling of the image is done via an HPS to convert the bidimensional collection of pixels into a mono-dimensional chain, and the image is then reorganized for segmentation using an inverse HPS. This transformation gives us a stochastic process with a very complex structure. This difficulty can be seen in a few papers [2,7,14].
In this context, the observation process representing the correlated image was obtained as follows: for each xs, we simulate the variable Ns according to a Gaussian distribution N(μxs,σ2xs), then we take
$y_s=N_s+a\sum_{i=1}^{4}N_{s_i},$
where S is the set of pixels of the image in its bidimensional form and si is a neighbor of pixel s in a neighborhood of four nearest neighbors.
We study the performance of the models presented in this work on two-class images of different sizes, which have been corrupted by the previous method. For the BPMC model-based segmentation methods, the sequence of pixels has been obtained via a "line by line" procedure. We use the same form of Ys as above, and for the second element of the observable process, we set ˉYs as the average of two observations of pixels neighboring s in the image but not in the chain (a code sketch of these two constructions follows).
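A sketch of these two constructions in Python; the weighting of the four neighboring noise variables by the parameter a, the periodic boundaries given by np.roll, and the clamped edge rows are simplifying conventions of ours.

```python
import numpy as np

def correlated_noisy_image(x_img, mu_cls, std_cls, a, rng):
    """Correlated noisy version of a class image: N_s ~ N(mu_{x_s}, sigma_{x_s}^2),
    combined with the four nearest neighbors N_{s_i} weighted by a."""
    noise = rng.normal(mu_cls[x_img], std_cls[x_img])
    y = noise.copy()
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        y += a * np.roll(noise, (dr, dc), axis=(0, 1))
    return y

def second_component(y_img):
    """Second observed component for a line-by-line scan: ybar_s averages two
    observations neighboring s in the image but not in the chain (here, the
    pixels above and below s, with clamped edge rows)."""
    up = np.vstack([y_img[:1], y_img[:-1]])
    down = np.vstack([y_img[1:], y_img[-1:]])
    return 0.5 * (up + down)
```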
The sequence of pixels collected by converting a bidimensional image to a one-dimensional chain is designated by (si)1≤i≤N. The two realizations of the stochastic processes X and Y are, respectively, x=(x1,…,xn,…,xN) and y=(y1,…,yn,…,yN), where xi=xsi and yi=ysi. Under these assumptions, both processes Z=(X,Y) and (Z,ˉY) have fairly complex structures, and we can see that the distribution of Y conditionally on X=x is not necessarily Markovian; nevertheless, we have segmented the noisy images using an unsupervised method based on MPM restoration under the PMC and BPMC models, where all parameters are estimated with the ICE and SEM algorithms. The question is therefore whether using the BPMC model instead of the PMC model in such a context can improve the segmentation results. To provide some experiments, we consider two series of parameters a, μxs, and σxs. Based on the parameters shown in Table 4, the original images and their noisy versions are reported in Figure 2.
We performed six segmentations, and we note that the complete data (C-D) means the original image and its noised version.
● Segmentation by MPM based on parameters of PMC model obtained from the C-D.
● Segmentation by MPM based on parameters of BPMC model obtained from the C-D.
● Segmentation by MPM based on parameters of PMC model obtained with ICE algorithm.
● Segmentation by MPM based on parameters of PMC model obtained with SEM algorithm.
● Segmentation by MPM based on parameters of BPMC model obtained with ICE algorithm.
● Segmentation by MPM based on parameters of BPMC model obtained with SEM algorithm.
We computed several evaluation criteria to confirm the visual results. The comparison of the algorithms is performed using two quality measures: the error rate and the PSNR index (Table 5).
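These two quality measures can be computed as follows (standard definitions; the peak value 255 for 8-bit images is an assumption, as the paper does not specify it):

```python
import numpy as np

def error_rate(x_true, x_hat):
    """Misclassification rate between the true and restored class images."""
    return np.mean(x_true != x_hat)

def psnr(img_true, img_test, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak = 255 assumes 8-bit images."""
    mse = np.mean((np.asarray(img_true, float) - np.asarray(img_test, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```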
To start, we can make a number of important observations. Columns 1 and 2 can be seen as a kind of reference for the two models presented in this work, on the grounds that the complete data is used in the estimations. Comparing the results obtained by the estimation algorithms with those calculated from the complete data, one can notice that both the ICE and SEM algorithms implemented under the BPMC model show favorable results, indicating that the misclassification rates and PSNR indexes are similar for both noise parameter series, except in the case of \verb"Image 4" corrupted by noise with the parameters of series 1.
The first conclusion of interest in image segmentation is that the SEM algorithm behaves correctly in the situation under consideration. The comparison of the BPMC and PMC models is the second conclusion. One might expect that, when the data fits neither an HMC nor a PMC, the PMC model-based MPM with an HPS would give better results than any other model. This is not the case: according to these findings and other experiments we have executed, even in the presence of strong noise, BPMC model-based MPM restorations surpass PMC model-based ones. Accordingly, using MPM under the BPMC model, the ICE and SEM algorithms provide strong unsupervised segmentation methods that may substantially beat PMC model-based ones with an HPS.
Table 6 also reports the parameters computed from the complete data and the parameters estimated by the ICE and SEM algorithms based on the BPMC and PMC models. Regarding these estimations, one can perceive that, in the majority of situations of segmented images, the estimates obtained by both algorithms are similar to those obtained from the C-D for the PMC and BPMC models. With respect to this, the takeaway is that the BPMC model, in its current form, provides another choice that can compete with, and likely outperform, nearly any other model in the circumstances at hand.
7.2. Visual results
The visual results of the unsupervised segmentation using both the BPMC and PMC models are discussed here. The latter have been initialized using a two-class segmentation performed by the K-means algorithm to generate the initial configuration of the process X (a code sketch follows).
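A typical initialization, assuming scikit-learn is available (the paper does not detail its K-means settings):

```python
from sklearn.cluster import KMeans

def kmeans_init(y_img, K=2, seed=0):
    """Initial configuration of X: a K-means clustering of the gray levels."""
    km = KMeans(n_clusters=K, n_init=10, random_state=seed)
    return km.fit_predict(y_img.reshape(-1, 1)).reshape(y_img.shape)
```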
The images (Figure 2) to be segmented are chosen based on the homogeneity of the objects. \verb"Images 1" and \verb"2" show extremely fine details, whereas \verb"Images 3" and \verb"4" are not particularly homogeneous and lack fine details. Thus, it is easy to demonstrate a significant advantage of the BPMC model over the PMC model with an HPS in the context of image segmentation. In fact, regarding Table 5, the rating is based on the average error rate of each algorithm across eight experiments: under the BPMC model, ICE was 5.25\% and SEM was 2.75\%, while under the PMC model, ICE was 8.57\% and SEM was 5.19\%. The average PSNR index results in the following rankings: ICE (37.39) and SEM (37.51) under the BPMC model, and ICE (37.36) and SEM (35.84) under the PMC model.
We can also visually observe that, depending on the segmentation phase required in these experiments, the error rate is not always the most important criterion. In fact, in \verb"Image 4", the details of the zebra (Figure 3) are better restored and there are fewer spurious white and black pixels on the background in the case of the BPMC model, whereas this is not the case for the PMC model (Figure 4). However, the error rates of the two models are very close.
Nonetheless, we obtained the following conclusions:
(1) The introduction of a second component in the observation process makes the BPMC model a good rival to the PMC model with HPS in all numerical (Table 5) and visual scenarios (Figure 3) in the context of image segmentation.
(2) The SEM algorithm performs better than the ICE algorithm, especially when working on the BPMC model (Table 5).
Remark 7.1. The HPS is a space-filling curve used to read pixels during image processing. This scan takes advantage of the two-dimensional locality of pixels. It should be noted that such a scan is only conceivable for $2^n\times 2^n$ images, which limits our study in the case of the PMC model. However, the advantage of the proposed BPMC model is that it is capable of handling images of any size.
7.3. Practical implications of the proposed approach
An unsupervised approach is particularly useful when little or no labeled data is available for training. The proposed approach has shown superior results compared to the classical PMC model and can potentially be utilized for various applications. In particular, the suggested model can be used for the segmentation of radar images, textured images, and medical images, because these types of images exhibit significant noise correlation.
8. Conclusions and perspectives
We proposed an unsupervised approach for restoring hidden data based on the PMC model. This study contributes by providing a 2D observed process for the latter model. The novel BPMC model is proposed first to compete with the classical PMC model for image segmentation, and then to solve the problem of noise correlation and the difficulty of preserving two-dimensional pixel locality during the one-dimensional transformation of an image. The parameter estimation methods described for this model are applicable to Gaussian and possibly correlated noise. We initially found that when the data follows a PMC structure, Bayesian restorations according to the BPMC model beat those based on the PMC model. Likewise, the parameter estimation methods described for the BPMC model are extremely efficient, as restorations based on parameters computed from complete data are comparable to restorations using estimated values. Next, we executed a series of tests combining the use of unsupervised segmentation of images via an HPS in the case of the PMC model and the insertion of a second element into the observed process in the case of the BPMC model. The assessment of interest in this study is based on two points. In the first step, we showed that the BPMC model works best when the stochastic process coming from a noisy class image with correlated noise is very complex (neither PMC nor HMC). Several experiments in this study show that the BPMC model always gives better results than the PMC model, and the difference can even be quite significant in some situations. Thus, one way to generate complex data is to use a noisy image in a correlated version together with an HPS or a line-by-line technique. In the second step, we proved that the estimation methods corresponding to the proposed model can compete with classical methods based on the PMC model with high efficiency and are therefore of interest for the image segmentation problem.
As a general conclusion, we can confirm that the proposed model, where we introduced a second component in the observed process, offers an interesting alternative to the PMC model with an HPS. Basically, the composition of the second component of Y based on the four-nearest-neighbor neighborhood of each pixel plays the same role as the HPS, while the BPMC-based unsupervised restorations with parameter estimation methods surpass those based on the PMC model.
As perspectives for further work, we may apply the concept used here to the triplet Markov model described in [40,41] in the case of Gaussian noise [4], in such a way that the observed process will be two-dimensional and the resulting model will therefore be a bidimensional triplet Markov chain. Of course, the same concept could be used for the segmentation of fuzzy data: it consists of considering a fuzzy HMC model with bidimensional observations, as presented in [42], and applying it to medical images. In the proposed model, both the spatial correlation and the locality details of pixels should be considered at the same time. Another potential direction concerns other types of noise, such as applying non-Gaussian noise to image segmentation using bidimensional Markov models.
Author contributions
A. Joumad: Conceptualization, Data curation, Writing-original draft; A. El Moutaouakkil: Conceptualization, Formal analysis, Writing-original draft; A. Nasroallah: Data curation, Formal analysis, Methodology; O. Boutkhoum: Methodology, Project administration, Software; Mejdl Safran: Funding acquisition, Investigation, Visualization; Sultan Alfarhood: Investigation, Project administration, Visualization; Imran Ashraf: Supervision, Validation, Writing-review & editing. All authors have read and approved the final version of the manuscript for publication.
Acknowledgements
The authors extend their appreciation to King Saud University for funding this research through Researchers Supporting Project Number (RSPD2024R890), King Saud University, Riyadh, Saudi Arabia.
Conflict of interest
The authors declare there is no conflict of interest.