Weakly supervised salient object detection via image category annotation

Ruoqi Zhang; Xiaoming Huang; Qiang Zhu

doi:10.3934/mbe.2023945

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 12: 21359-21381. doi: 10.3934/mbe.2023945


Research article


1. Computer School, Beijing Information Science and Technology University, Beijing 100192, China
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310013, China

Academic Editor: Shangce Gao

Received: 02 October 2023 Revised: 20 November 2023 Accepted: 23 November 2023 Published: 01 December 2023

The rapid development of deep learning has brought great progress to the salient object detection task. Fully supervised methods need a large number of pixel-level annotations. To avoid laborious and time-consuming annotation, weakly supervised methods rely on low-cost annotations such as category labels, bounding boxes and scribbles. Because category annotation is simple and large-scale classification datasets already exist, category annotation based methods have received more attention, yet they still suffer from inaccurate detection. In this work, we proposed a weakly supervised method with category annotation. First, we proposed a coarse object location network (COLN) to roughly locate the object in an image with category annotation. Second, we refined the coarse object location to generate pixel-level pseudo-labels and proposed a quality check strategy to select high quality pseudo-labels. To this end, we studied COLN twice, each time followed by refinement, to obtain a pseudo-label pair, and calculated the consistency of the pair to select high quality labels. Third, we proposed a multi-decoder neural network (MDN) for saliency detection supervised by pseudo-label pairs. Both the loss of each decoder and the loss between decoders are considered. Last but not least, we proposed a pseudo-label update strategy to iteratively optimize the pseudo-labels and the saliency detection models. Performance evaluation on four public datasets shows that our method outperforms other image category annotation based work.


Citation: Ruoqi Zhang, Xiaoming Huang, Qiang Zhu. Weakly supervised salient object detection via image category annotation[J]. Mathematical Biosciences and Engineering, 2023, 20(12): 21359-21381. doi: 10.3934/mbe.2023945



1. Introduction

A fast diagnosis is an urgent need in situations such as COVID-19 testing ^[1,2]. The routine test for the virus follows the RT-PCR method. However, this test is carried out sequentially, so there is a chance of a high false-negative rate (FNR), and the test takes a long time to complete. In addition, RT-PCR test kits can be in short supply. There is therefore a severe need for alternative tactics that diagnose patients more quickly in these situations. An infrared image is sufficient on its own for identifying these diseases quickly by measuring temperature. CT scans and other pathological tests are essential in evaluating a patient with a suspected pandemic infection ^[3,4]. Moreover, deep precognitive analysis is applied to disease detection during pandemic situations, where a biologically inspired convolutional fuzzy network would be more effective ^[5]. However, a patient's radiological findings may be expected at first. In this paper, we collect information from a patient's eye-retina with an infrared camera to measure temperature, using a smartphone whose advanced camera has infrared features. Almost every well-configured mobile phone has this feature. If not, an infrared-enabled application can be installed from the Google Play Store. The temperature is then noted, and the next course of action is taken. Instead of using a regular infrared camera, we want to use a mobile camera with its features, so this technique supports both applications (through mobile and infrared cameras).

There are numerous applications of infrared images for detecting humans and their body parts, including visual surveillance, human-activity tracking ^[6], medical applications such as eye-disease detection ^[7], driver safety and homeland security. An essential use is monitoring people with CCTV and surveillance camera-based systems. Infrared images are also used to authenticate human facial parts ^[8,9]. An infrared camera can capture images in any lighting conditions, while an RGB camera frequently needs proper lighting to capture clear, high-resolution images. Infrared imaging schemes use infrared light sources to produce a clearer image without natural or artificial light. Traditional face recognition systems depend on self-reporting or manual measurement schemes ^[10], whereas the infrared technique depends on an automated system ^[11]. The traditional system is weak at checking continuously and quickly. Researchers have looked into contactless sensors and infrared cameras to solve these issues. Since temperature is directly connected to the physical parts of a human face such as the eyes, forehead, cheeks and lips, a few researchers have used these parts to identify temperature. Lips, mouth and eyebrows are closely related to different temperature measurement schemes, so a projected model is addressed. The work in ^[10] proposed a method to identify human faces by visual features such as the blinking ratio of closed and open eyes. In additional work, researchers traced 3D face images using mobile phones. In their recent study, M. Kim, B. H. Kim and S. Jo ^[12] developed a contactless real-time scheme for recognizing whether a driver operating a car is in a sleepy or drowsy state.

The use of a digital camera is commonplace. However, owing to its various good characteristics, the infrared camera and its imaging techniques are used in applications such as pandemic disease identification. Recent literature proposes that the computation time of an infrared imaging technique is less than that of an RGB imaging technique ^[13], and our work takes advantage of this difference. Using different imaging types, researchers have demonstrated their efficiency in quantitative measurement. For example, the work in ^[14] achieved 65% accuracy using images, whereas the researchers in ^[15] achieved more than 91% accuracy using infrared images. This demonstrates that a greater recognition accuracy is achievable by employing infrared imaging techniques.

The following are the key contributions of our manuscript.

● We proposed an infrared imaging technique to identify the pandemic disease and make accurate and quicker measurements.

● Only a few parts of the face are considered for measuring temperature, so a segmentation technique is used to divide the face into various classes.

● A novel searching technique is incorporated to search the regions from left to right.

The paper is organized as follows:

Section 2 reviews related works. The proposed tracking scheme is described in detail in section 3. The image registration process with its sub-sections (Pre-processing, Segmentation by improved gradient method, Feature extraction, Hybrid adaptive optimized classifier system, Head pose estimation and correction, Feature weights extraction, Regions detection, Vector formation from the segmented region) is presented in section 4. The feature extraction technique is also described in section 4. The findings of the experiment are presented in section 5. Finally, a conclusion is reached, and we discuss the direction of our future work.

2. Related works

Mobile phones and other electronic gadgets are now used for face tracking ^[16] from a short distance in many applications. However, this may be unsuitable in many situations. Many scholars have used facial signals for security and safety to address surveillance-related issues. A video-based human identification system can offer an efficient, nonintrusive solution for day-to-day use. Video-based imaging approaches routinely use two imaging methods, infrared and color. The work in ^[17] presented a high-resolution infrared facial image database with extensive manual annotations and demonstrated its flexibility in various applications. The authors proposed a set of correlated algorithms for detection and approximation within a facial image. They advised using a multi-process description based on networking middleware, which might be used for real-time face authentication and recognition with infrared pictures. By evaluating the frontal view of facial regions, scholars ^[18] showed how infrared video can be exploited with the help of the SIFT flow approach. Furthermore, video-based eye tracking and its effectiveness have been presented, along with suggested efficiencies ^[19]. Finally, the evaluation is performed using the face-temperature histogram value.

Surveillance systems that use network devices such as cameras to monitor and track activities generate massive volumes of data ^[20]. Both the migration of data under bandwidth constraints and the accumulation of lag in network technologies are problems that need to be solved. A decentralized facial recognition algorithm applying PCA and LBP ^[21] has been proposed for a distributed surveillance system using wireless networks and cloud computing. In this regionalized face tracking approach, face recognition, feature extraction and matching ^[22] are done in two steps. First, face detection and feature extraction are done on a specified cloudlet adjacent to the security cameras, which avoids sending vast amounts of data to a distant processing center. Face matching, on the other hand, is done using the facial feature vector in any private cloudlet environment. According to this study, the suggested approach works effectively in the wireless network-cloud architecture and is useful in finding "lost" people.

In recent years, security has become an essential part of human life. The most significant consideration at this point is cost, and this technology is quite advantageous in minimizing the cost of external movement monitoring. A real-time recognition method that handles photographs quickly is provided in ^[23]. Its main objective is to recognize people so that the house and office can be secured. A PIR sensor is used to detect movement in a defined area. A Raspberry Pi then captures the photographs, and the face in the captured image is detected and recognized ^[24]. Finally, the photographs and notifications are uploaded to a smartphone over a wireless network via the Telegram program. The proposed system computes in real time and is rapid and low-cost. The experiments suggest that the proposed facial recognition system may be employed in real time.

A more efficient feature-based face recognition approach uses a newly created process named Floor of Log (FoL). This method has the benefit of conserving space and energy while maintaining precision. The scholars ^[25] used K-Nearest Neighbours (KNN) and Support Vector Machine (SVM) ^[26] approaches to discover the optimal factor of the FoL technique using cross-validation. The accuracy and the size after compression of the suggested approach were assessed. On the Extended YaleB, AR, LFW and CelebA face datasets, the FoL produced better results than a technique with the equivalent classifiers ^[27] without compressed features, with 86% to 91% relative to the same data size. This study provides a robust and easy feature compression technique for FER applications for the recognition of various parts ^[28]. The FoL is a supervised compression approach that may be modified to obtain better results and is compatible with edge computing schemes.

The wireless network is a concept that integrates technology into our day-to-day activities by applying deep neural network and convolutional neural network (CNN) learning techniques ^[29,30]. In their FER system, the authors of ^[31] consider introducing convolutional neural network (CNN) and long short-term memory (LSTM) techniques. On the other hand, researchers ^[32] implemented a CNN using less data and an eight-fold approach, yielding improved results. One of the key areas in which technology assists us is safety and privacy. Smartphones may be used as a safety alert scheme because they are the most extensively used smart gadgets. Artificial intelligence (AI)-enabled intelligent wireless network devices have grown in popularity in recent years. One study developed an innovative network security solution for the smart home using a neuro-fuzzy optimized classifier ^[33]. The security system is built around a Raspberry Pi and a NoIR (no infrared) Pi camera unit that records and captures images ^[34,35]. A passive infrared motion sensor (PIR-MS) is also used to identify motion. Images and motion sensor records from the NoIR Pi camera unit are combined to identify a safety threat using the algorithm's facial recognition classification technique ^[36,37]. In the event of an emergency, the system can notify the user. The proposed system has 95.5% accuracy and 91% precision in detecting any security threat.

The Visual Internet of Things (VIoT) and wireless communication have attracted much interest in recent years because of their capacity to extract object positions from scene picture information, attach an optical tag to the item, and then return scene object information to the wireless network. Face recognition is one of the ideal visual network methods since a person's face is an intrinsic label ^[38]. The researchers in ^[39] developed a pose estimation technique to resolve the poor FER performance caused by long-range pixels. However, owing to a lack of processing resources, existing state-of-the-art facial recognition methods based on huge deep artificial neural networks (ANN) ^[40] are challenging to implement on the embedded platform of a visual wireless network. To overcome this problem, a small deep ANN-based facial recognition system for the VIoT has been provided. The proposed technique employs deep neural networks with minimal complexity ^[41] so that it can function in an embedded setting.

Moreover, it can withstand changes in lighting and position. Comparable accuracy and performance results are reported for the LFW authentication benchmark using the mobile facial recognition dataset. Scholars ^[42] have demonstrated that a Facial Action Coding System (FACS) can be used to characterize a human being and that the same system can be applied to categorize varied consumer goods by employing the affective reactions of selected consumers. The work in ^[43] implemented a real-time method on an Android platform for panic-face detection ^[44] and fatigue detection ^[45] by incorporating mutation, genetic algorithms ^[46,47] and many more techniques in expression recognition. Moreover, scholars ^[48] have shown that a histogram approach is very appropriate in these predictions for quicker depth measurement and recognition of various expressions.

Alzheimer's disease is a progressive degenerative neurological illness. It is currently incurable, and those who suffer from it have less freedom to leave their houses than the general population. One article aims to develop an IoT prototype that can identify people with Alzheimer's disease, thereby improving their quality of life and easing caregivers' work. The patient wears a small dorsal belt that contains a NodeMCU ESP8266 board, a GPS module, and a small portable WiFi modem/router. The patient's location is tracked via a web application and an Android/iOS mobile application. This research also uses a Kalman filter to track the patient's movement and estimate the patient's position, especially when the patient wanders outside.

Pneumonia causes a high morbidity and mortality rate in infants. This sickness affects the lungs' tiny air sacs, necessitating rapid diagnosis and treatment. One of the most popular tools for diagnosing pneumonia is a chest X-ray. One paper explains how to identify pneumonia in chest X-ray pictures using a real-time wireless network system. Three medical experts reviewed the data, which included 6000 chest X-ray images of children. The study adopted twelve ImageNet-trained convolutional neural network (CNN) ^[49,50] architectures as feature extractors. CNNs and deep neural networks are increasingly used in FER for disguised and distorted expression recognition ^[51]. Many prototype designs based on deep CNNs have been proposed by scholars ^[52] for identifying people's faces, irises and finger veins in daily life. The CNNs were then integrated with other learning approaches such as KNN, Naive Bayes (NB), Random Forest (RF), Multilayer Perceptron (MLP) ^[53], and SVM. The best model for diagnosing pneumonia in these chest radiographs was the VGG19 architecture with the SVM classifier and RBF kernel. The accuracy and precision scores reported for this combination were 96.47%, 96.46%, and 96.46%, respectively. Compared with other articles in the literature, the proposed approach yielded better results for the measures used. These findings imply that using a real-time wireless network system to detect pneumonia in children is beneficial and could be used as a diagnostic tool in the future. Doctors will obtain faster and more precise findings using this technology, allowing them to provide the best treatment possible.

Moreover, many works use machine learning-based recognition approaches. Researchers employ machine learning models such as SVM, CNN, ANN (Artificial Neural Network) and genetic algorithms for facial identification. A CNN is a deep learning approach ^[54,55] with a high performance level that can extract features from training data. It extracts features through several convolutional layers, which are often followed by a series of fully connected layers ^[30]. On the CK+ database, investigators ^[32] raised CNN accuracy to 96.76%. Improved pre-processing steps such as sample formation and intensity normalization increased the accuracy. A BDBN with numerous facial expression classifications was employed in another investigation. Its accuracy was 96.7% on the CK+ database and 91.8% on the JAFFE database. However, its execution time was eight days.

In this study, the proposed technique saves processing time while improving recognition performance by using wireless communication together with a cloudlet server. The proposed method uses numerous pre-processing techniques in conjunction with a CNN to achieve excellent accuracy. It is also recommended that the ARs focus on the data that are critical for categorization when forecasting expressions, which also helps to cut down on processing time. In addition, parallelism ^[30] is frequently employed to improve speed and precision.

Finally, we conclude that appearance-based techniques can obviate the need for meticulously crafted visual features to characterize gaze in an eye-tracking system. However, we incur a substantial penalty in execution time and storage space if we apply the complete input image to a classifier to forecast the gaze. Moreover, training eye image data, including poses and locations, are required during the development phase.

3. Proposed tracking scheme

Classifying distinct face portions is the primary problem when tracking a human face and extracting features from an infrared facial image. Poses have a direct impact on a person's facial expression. The head movement and the poses of a gaze vector are inextricably linked. We suggest a two-part flow diagram, illustrated with our experimental results in Figure 2. First, we use all strategies to build the face-region feature vectors of Experiment Ⅰ.

Figure 1. Infrared image and its divisions into six classes.


Figure 2. Facial image is separated into Six Classes and Twelve Regions.


Figure 3. Proposed flow diagram for implemented infrared image retrieval classification and analysis system.


Here we followed the processes of image registration, sequence maintenance, and central-point and facial-region detection. Then, using the mapping function obtained from these processes, we examined the correlation among the class calibrations in Experiment Ⅱ. The system proceeds to Experiment Ⅱ after completing the calibration process; otherwise, it returns to the beginning of Experiment Ⅰ. Finally, we retrieved attributes from the grouped regions.

4. Image registration

Image registration is required to guarantee an acceptable series of infrared images collected from a face. First, the image must be registered in the database if this has not already been done. As a result, the proposed technique can help detect picture duplication during image registration. During the image registration process, video images are converted into frames.

4.1. Pre-processing

Because of noise in the images, traditional recognition methods have shown that obtaining the correct portions of a human face in unusual settings (low lighting, darkness, rain, or natural calamity) is exceedingly challenging. To avoid such difficulties we used histograms, which ensure that lighting effects do not cause too many complications. Normal and sensitive histograms can be used to embed 3D data. Noise can be removed from the infrared image at this stage. It is common practice to use nonlinear optical filtering to remove noise from an image. Pre-processing applies different types of filtering to the infrared image. In our application, we applied the Adaptive Wiener Filter (AWF) ^[43] to remove Gaussian noise and other noise accompanying the infrared images.
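The adaptive Wiener filtering step can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the common pixel-wise formulation based on local mean and variance, and the function name `adaptive_wiener`, the window `radius`, and the default noise estimate (the mean of the local variances) are all illustrative assumptions.

```python
def adaptive_wiener(image, radius=1, noise_var=None):
    """Pixel-wise adaptive Wiener filter on a 2D list of numbers (illustrative sketch)."""
    rows, cols = len(image), len(image[0])
    means = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Local window around (r, c), clipped at the image border.
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mu = sum(window) / len(window)
            means[r][c] = mu
            variances[r][c] = sum((v - mu) ** 2 for v in window) / len(window)
    if noise_var is None:
        # Assumption: estimate the noise power as the mean of the local variances.
        noise_var = sum(map(sum, variances)) / (rows * cols)
    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            var = variances[r][c]
            denom = max(var, noise_var)
            # Flat areas (var <= noise_var) collapse to the local mean;
            # high-variance areas such as edges are mostly preserved.
            gain = max(var - noise_var, 0.0) / denom if denom > 0 else 0.0
            filtered[r][c] = means[r][c] + gain * (image[r][c] - means[r][c])
    return filtered
```

With a large assumed noise variance the filter behaves like a local mean, so an isolated spike is pulled towards its neighbourhood; with a small one, the image is left almost unchanged. Library routines such as `scipy.signal.wiener` provide a comparable filter for array data.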

Algorithm 1. Pose estimation (PE) algorithm

$P_m = PE(P_{2D}, P_{3D}, f_w, f)$

$Input: P_{2D}, P_{3D}, f_w, f$

$X = Q[P_{2D}];\; c = R[X]$

$Y = P_{2D_x}/f;\; X = P_{2D_y}/f$

$h = [P_{3D}, c];\; o = Pinv(h)$

$while\;(true)\;\{$

$j = o \times Y;\; k = o \times X;$

$L_z = \frac{1}{\sqrt{\frac{1}{j} \times \frac{1}{k}}};$

$p_{m_1} = j \times L_z;$

$p_{m_2} = k \times L_z;$

$R_{m_1} = p_{m_1}(1:6);\; R_{m_2} = p_{m_2}(1:6);$

$R_{m_3} = \frac{R_{m_1}}{\|R_{m_1}\|} \times \frac{R_{m_2}}{\|R_{m_2}\|};$

$p_{m_3} = [R_{m_3}, L_z];$

$c = h \times \frac{p_{m_3}}{L_z};$

$YY = Y;\; XX = X;$

$Y = c \cdot f_w \cdot P_{2D_x};$

$X = c \cdot f_w \cdot P_{2D_y};$

$E_x = Y - Y^2;\; E_y = X - X^2;$

$if\;(\|E\| < E_x)\;\{$

$p_{m_6} = [p_{m_1}(1:6), p_{m_2}(1:6), p_{m_3}(1:6), p_{m_4}(1:6), p_{m_5}(1:6), p_{m_6}(1:6), L_z, 1]^{tm};$

$break;$

$\}$

$\}$

$Output: P_m$


This is a straightforward way of restoring a clean signal. The suggested study employs the AWF to handle photo positions efficiently. The AWF is used to simplify the image with the least fluctuation. Histograms are commonly equalized to improve the picture's uniformity. Histogram correction is a computer-assisted process for enhancing visual contrast. The main intensity values are substantially stretched, i.e., the range of image intensities is broadened. It creates a less close association with the development of district ties. As a result, following histogram correction, the overall picture contrast increases.

${N}^{x} = \frac{P}{\mathrm{T}\mathrm{P}}$

(1)

where $x$ = 0, 1 and -1, $P$ is the number of pixels, and $TP$ is the total number of pixels.

The histogram-equalized image can then be calculated with the following equation:

$H{Q_{m,n}} = lo{g^{{e^{\left( {1 - x} \right)}}}}\left( {\sum_{x = 0}^{{b^{m,n}}} {\left( {{N^x}} \right)} } \right)$

(2)

where ${log}^{{e}^{\left(1-x\right)}}$ represents the nearest-neighbourhood integer value.

A similar representation with respect to the pixel intensity value is as follows:

$\frac{\partial P}{\partial x}\left(\int_0^P H Q(x) = \partial P(x) \partial y = \partial P\left(P^{\frac{1}{x}} . \mathrm{P}\right) \delta / \delta P\right.$

(3)

Finally, the probability distribution function (PDF) can be expressed as uniform through $\frac{\partial P}{\partial x}$.

The outcome shows that the equalization method can smooth and improve histograms.
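The equalization step can be made concrete with a short sketch. It assumes the textbook form of histogram equalization (normalized histogram as in Eq (1), followed by a cumulative sum rounded to the nearest integer level), which may differ in detail from Eq (2); the function names and the `levels` parameter are illustrative.

```python
def normalized_histogram(image, levels=256):
    """Fraction of pixels at each grey level: count at level x / total pixels."""
    total = len(image) * len(image[0])
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return [count / total for count in counts]


def equalize_histogram(image, levels=256):
    """Map each grey level through the cumulative histogram (textbook equalization)."""
    total = len(image) * len(image[0])
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    mapping = []
    cumulative = 0
    for count in counts:
        cumulative += count
        # Nearest integer to (levels - 1) * cumulative / total, in integer arithmetic.
        mapping.append(((levels - 1) * cumulative * 2 + total) // (2 * total))
    return [[mapping[value] for value in row] for row in image]
```

For a tiny four-level image such as `[[0, 0], [1, 1]]`, the two dark levels are pushed apart towards the top of the range, which is the contrast stretching described above.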

4.2. Segmentation by improved gradient method

Image line boundary detection ^[47], also known as edge identification, is critical in visual interpretation. Edges store massive amounts of information. As a result, the image size is drastically decreased and less relevant material is filtered out, while the image's essential core elements are preserved. Edge localization is extensively employed in image segmentation because edges frequently appear at the boundaries of objects in a picture. We incorporated the AROI by selecting from the six divided classes of the face regions. The boundaries of a target depicted in an image or volume are calculated here. This segmentation method determines whether or not neighboring pixels of the starting seeds should be added to a region, combining pixels or sub-regions into larger ones. The simplest of these processes is pixel aggregation, which starts with a collection of seed points and grows regions by linking pixels with common characteristics.

${AROI}^{S} = \sum\limits_{m, n\in Q2}^{S} Vel.\left({Hp}_{m}, {Lp}_{n}\right)P.\log{d}_{m}+\delta \int {e}_{m}\partial P$

(4)

where ${AROI}^{S}$ is the separated AROI, $Vel$ is the velocity measured by the gradient value, ${Hp}_{m}$ and ${Lp}_{n}$ are the high and low pixel values, $\log{d}_{m}$ represents the spatial image size, $\delta$ signifies the image frequency coefficient, and ${d}_{m}$ is the distance between two pixels.
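The seed-based pixel aggregation described in this section can be sketched as a simple region-growing routine. This is a generic illustration, not Eq (4): the function name `grow_region`, the 4-connected neighbourhood and the fixed intensity `tolerance` relative to the seed are assumptions made for the example.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Grow a 4-connected region from `seed` over pixels close to the seed intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                # Add the neighbour only if it shares the seed's characteristic,
                # here taken to be a similar grey level.
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```

Starting from a seed inside a warm facial area, the routine returns the set of connected pixel coordinates whose intensity stays within the tolerance, which could then be treated as one candidate region.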

4.3. Feature extraction

We used a grey level co-occurrence matrix (GLCM) to capture texture characteristics of the grey levels. The spatial information of the input image determines the probability of co-occurring values. This method examines 18 texture attributes. We applied an image retrieval method in which a sample image is provided and the images that appear closest to it are retrieved from a large dataset. A range of infrared images is used to evaluate the algorithm's performance on that texture dataset. We propose sixteen texture classes. Each texture image is broken into six sub-images for all these examples. Images are retrieved depending on the distance between the query and the images in the dataset. The image features are extracted and used in the investigation. The grey level co-occurrence matrix captures the frequency of pixel pairs within each result. The directional value of the segmentation can then be used to extract the image attributes used in the segmentation. The following is an example of the grey level co-occurrence matrix technique:

$C(m, n) = Vc(m, n, u, v)\sum\limits_{m = 1}^{H}Vc(m, n, u, v)$

(5)

where $Vc$ is the vector, $m$, $n$, $u$ and $v$ are the pixel values with respect to high and low, and $C$ is the image characteristics. We used the grey level co-occurrence matrix to obtain the various attributes for feature extraction. Finally, the features are chosen based on texture and color.
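As an illustration of the co-occurrence idea, the sketch below builds a grey level co-occurrence matrix for one pixel offset and derives a single texture attribute from it. It uses the standard definition (pair counts divided by the total number of pairs), which is an assumption and not a transcription of Eq (5); the names `glcm` and `contrast` and the `offset` parameter are illustrative.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalized grey level co-occurrence matrix for one (row, column) offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Count how often level image[r][c] is followed by image[nr][nc].
                counts[image[r][c]][image[nr][nc]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[count / total for count in row] for row in counts]


def contrast(matrix):
    """Contrast texture attribute: sum of p(i, j) * (i - j)^2."""
    size = len(matrix)
    return sum(matrix[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))
```

Other texture attributes (energy, homogeneity, correlation and so on) are computed from the same normalized matrix; repeating the calculation for several offsets gives the directional information mentioned above.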

4.4. Hybrid adaptive optimized classifier system

We incorporated the Enhanced Cuckoo Search Optimization (ECSO) algorithm ^[33] to obtain the Adaptive Optimized Classification (AOC). The Cuckoo Search Algorithm (CSA) evaluates the inconsistency characteristics. Moreover, the CSA is proposed to minimize the cost of the network congestion we may face while collecting input images. In the ECSO algorithm, the guideline is that a random value is placed on each node when an arbitrary node is selected on the cloudlets. The next cloudlet then moves to the most vital node, the one with the largest number of images. The host server is static, and the cuckoo value can be calculated according to a probability Pr[0; 1] given by the host value. ECSO is a required method for overcoming the challenge posed by network congestion and picture optimization variables.

Algorithm 2. ECSO

$Input: Image\_Features\;(Img_{ftr}),\; Image\_Coordinate\;(Img_{cor})$

$Output: Classified\;Value\;(CL_{val})$

$Begin\; to\; compute\; the\; Random\; Value\; \left(RV\right)$

$for\; m = 1 : Range(Img_{ftr}, 1)$

$for\; n = 1 : Range(Img_{ftr}, 1)$

$Distance(m, n) = \sqrt{Img_{ftr}\left(m, 1\right) + Img_{ftr}\left(m-1\right) + {\left(Img_{ftr}\left(m, 1\right) - Img_{ftr}\left(n, 1\right)\right)}^{2}}$

$End$

$Img_{ftr} = Img_{ftr}\,Distance(m, n)$

$RV = \left({Img_{ftr}}^{Distance(m, n)}\right)$

$Class\; label\left(CL\right) = unique\left(node\right)$

$L = length\left(CL\right)$

$for\; m = 1 : L$

$T = mean\_all\_cloudlet\_node\left(Img\right)$

$X\left(Img, n\right) = -\frac{1}{2} \times T \times mean\_all\_cloudlet\_node + \log\left(m\right)$

$X\left(Img, 2:end\right) = T$

$End$

After the infrared images are categorized, linear discriminant analysis (LDA) is used to assess the significance of each image. The results of the classifier are passed to the LDA, which measures the precision of the picture statistics. Infrared image classification plays an important role here. LDA is commonly used as a data analysis tool to reduce the dimension of numerous interrelated variables while retaining the maximum amount of relevant information, and it is also valid as a form of image categorization. Matrix creation is demonstrated for use in operators performing image processing. The LDA procedure is as follows. In the first phase, the LDA function collects samples to prepare properties from the testing datasets. This dataset is built from the six divided classes. The training datasets are prepared and collected from the twelve regions. The LDA has K classes (K ≤ 6) and R regions (R ≤ 12). Each class has a vector representation in a multidimensional space. To choose the features from the regions, we applied the search algorithm (ECSO) stated in Algorithm 2. The sensitive picture histogram is shown in Figure 4, which contains four images, each converted using sensitive histograms under various illuminations.

Figure 4. Dissimilar illuminations with image histograms equalization.

4.5. Features weights extraction

As shown in Figure 5, the feature weights (F.W.) must now be determined; they establish the image's critical pixels and their weights. The weights reflect the importance of the features represented by the pixels. For example, a set of features represents an active region (A.R.). Figure 5 depicts the method of locating points of interest (POI) and important A.R.s such as the nose-tip, right eye (RE), left eye (LE), and lip regions. Essential features are given larger weights to improve the reliability of the information, which improves recognition.

Figure 5. Recognition from Active Regions.

The following phase separates the features. The Stochastic Face Shape Model (SFSM) and Optimized Principal Component Analysis (OPCA) were used to compare patterns on the original and altered pictures (after the angle corrections). The features are then retrieved and expressed in vector form. Once the vectors are formed, phase I is complete. Note that an image-based sequence should be kept. The procedure for extracting facial features and preparing vectors is described below.

This section explains how to track facial features in video sequences using the head pose estimation technique. Previous head pose estimation (P.E.) methods have relied on a stereo camera to provide accurate 3-D information for the head pose and to make the necessary correction by rotation and normalization ^[35]. However, a head model's complex representation and its need for an exact starting value make a real-time solution difficult. For example, the human head is frequently modeled as an ellipsoid. Therefore, the cylindrical head model (CHM) calculates the head position from 3-D positions on the corresponding sinusoidal surface. A 2-D to 3-D conversion scheme is used to obtain the head pose statistics when a 2-D facial feature is tracked in each video frame. Pose scaling with iterations is a two-dimensional to three-dimensional adaptation, because the 2-D facial features have a variety of ramifications when recreating the position.

4.6. Different regions detection

Our proposed method detects regions based on the feature weights of the corresponding regions. First, we smoothed the ocular region using the L0 gradient minimization method (GMM) to estimate the radius, because it helps remove noise from the image pixels. Then, we applied the Canny edge detector to the ocular areas. This yields a few invalid edges, which we remove with a filter. Finally, we collect the related information from the identified regions based on E1 and E2.

${E_1} = \sum \left( {{R_E} \times {C_R}} \right)$

(6)

${E_2} = \sqrt {G_X^2 + G_Y^2}$

(7)

where R_E and C_R represent the regions; the radius of any region is measured from the values of R_E and C_R. The pixels are measured horizontally and vertically, and G_X and G_Y represent these two values, respectively. To detect a region, we minimize the intensity of that region and maximize its strength or weightage. The parameter τ controls the trade-off. That is

$\left({U}_{c}, {V}_{c}\right) = \underset{(U, V)}{\mathrm{min}}\left\{{E}_{1}\left(U, V\right)-\tau \left({\int }_{-\pi /5}^{\pi /5}{E}_{1}(U, V)\,ds+{\int }_{4\pi /5}^{6\pi /5}{E}_{2}(U, V)\,ds\right)\right\}$

(8)

where $({U_c}, {V_c})$ is the coordinate of that region. The integration intervals are set to $[- \frac{1}{5}\pi, \frac{1}{5}\pi]$ and $[\frac{4}{5}\pi, \frac{6}{5}\pi]$ because the regions do not intersect with each other.
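To make Eq. (7) concrete, the sketch below evaluates the edge strength $E_2 = \sqrt{G_X^2 + G_Y^2}$ at every pixel of a small grey-level patch. It is an illustration under stated assumptions: G_X and G_Y are taken as unnormalised central differences with clamped borders, which is a simplification of the gradient step inside the Canny detector, and the function name and test patch are hypothetical.

```python
import math


def gradient_magnitude(img):
    """Return E2 = sqrt(Gx^2 + Gy^2) for each pixel of a 2-D list of intensities."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Horizontal and vertical differences; indices are clamped at the borders.
            gx = img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical patch with a vertical edge between the second and third columns.
patch = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
e2 = gradient_magnitude(patch)
```

Pixels next to the vertical edge receive a large E2, while flat areas stay at zero, which is the property the region search relies on.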

4.7. Vector Formation from segmented region

Let G represent the identified region and p the vector of that region. Gaze data are transmitted to execute a mapping function. The user performs a calibration procedure, during which the region vectors are registered. The mapping function links the vector of a region to its coordinate, which is measured by (U, V). We applied the SLM, i.e., a simple linear SVM model. We also incorporated the polynomial model (PM) to establish the appropriate mapping functions. In addition, we applied second-order and third-order polynomial functions in the calibration stage to obtain better-synthesized results. As a result, a mapping algorithm can be determined accurately from a segregated frame.

5. Experimental evaluation

We carried out our experiment using both quantitative and qualitative evaluation. We organized the entire procedure into five regions, which are processed in parallel in a time-sharing manner. In this testing phase, we examine forehead, eye, nose-tip, and lip feature detection.

5.1. Active regions detection

5.2. Classification based on CNN-Tree-Level Method

Recognizing the eye and extracting its features is a challenging task. We propose the EPDF function to detect the eye area accurately. The CK+ and NVIE datasets are used in our experiment. The CK+ dataset has 1000 grayscale images of ten subjects under different lighting conditions and at different scales. Our experiment achieves a recognition accuracy of 91.73% on RGB images and a detection accuracy of 92.39% on twenty-one infrared images. Facial expression recognition (FER) is a new field that requires a high identification rate. This study combines the AROI and CNN methods for identifying and classifying facial expressions ^[35]. Five optimized active regions ^[30] of interest (OAROI), namely the forehead, LE, RE, nose-tip, and lip regions, are taken into account. The mentioned three AROIs are trained for CNN. The final classification outcome is obtained using a decision tree-level synthesized method. The applied method is illustrated in Figure 6.

Figure 6. Schematic diagram of learning feature weight, classification of expression based on decision-tree level synthesis scheme.

The extents of the optimized active regions (OAR) differ for each image. Therefore, the following steps are taken to obtain a uniform result.

1. Each OAR of interest (OAROI) is measured as 120*80 pixels.

2. The features of the images and their respective feature weights are learned by the CNN density layers and then by the sub-sampling layers simultaneously.

3. Lastly, the result passes through the fully connected layer.

The fully connected layers apply the learned feature weights to obtain the classified expressions. Finally, a one-hot code is used as the label, in which a bit set to one (1) marks the labelled class. There are five regions.

The one-hot code has five bits, and each bit corresponds to one class of region. A zero (0) bit means false and a one (1) bit means true, so the code indicates exactly one region. Figure 6 displays an example in which the resultant region is the nose tip.
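The five-bit code described above can be sketched as follows. This is only an illustration: the ordering of the regions inside the code and the helper names are assumptions, since the text does not fix which bit belongs to which region.

```python
# Assumed bit order; the source only states that there are five region classes.
REGIONS = ["forehead", "left_eye", "right_eye", "nose_tip", "lip"]


def one_hot(region):
    """Encode a region name as a five-bit code with a single 1 (true) bit."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return [1 if name == region else 0 for name in REGIONS]


def decode(code):
    """Map a five-bit code back to the region whose bit is set."""
    if len(code) != len(REGIONS) or sum(code) != 1:
        raise ValueError("a valid code has exactly one bit set")
    return REGIONS[code.index(1)]


nose_code = one_hot("nose_tip")
```

With this ordering, the nose-tip example corresponds to the code `[0, 0, 0, 1, 0]`.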

The final recognition result is obtained in the last phase using the decision-tree level synthesized scheme. The decision-level fusion technique works as follows.

According to the rule given below, if two or more classifiers categorize the expression as the same class i, i∈I, then the final classification result is i. Therefore, if the classifications of two CNNs agree, the synthesis result is reinforced. On the other hand, when the results of all five CNNs differ, we choose the outcome of the CNN, among the five AROIs (Forehead, LE, RE, Nose and Lip), that has the maximum accuracy, as shown in the "Experimental Section".

${ Final\; Result }=\left\{\begin{array}{ll} i, & \text{if 2 or more CNNs classify the expression as } i,\; i \in I \\ \text{the result of the CNN for the AROI of Forehead, Left Eye, Right Eye, Lip}, & \text{otherwise} \end{array}\right.$
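The fusion rule can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the expression labels and function name are hypothetical, the per-region accuracies are the CNN values quoted in Table 2 and are used only for the fallback case, and the behaviour when two different labels both receive two votes is not specified in the text (here the label seen first wins).

```python
from collections import Counter

# Per-region CNN accuracies as quoted in Table 2; used only when no two CNNs agree.
REGION_ACCURACY = {
    "forehead": 89.32,
    "left_eye": 91.14,
    "right_eye": 90.75,
    "nose": 91.15,
    "lip": 93.32,
}


def fuse(predictions, accuracy=REGION_ACCURACY):
    """Decision-level fusion of per-region CNN outputs (region -> predicted label)."""
    label, votes = Counter(predictions.values()).most_common(1)[0]
    if votes >= 2:
        # Two or more CNNs classify the expression as the same class i.
        return label
    # All CNNs disagree: trust the region whose CNN has the highest accuracy.
    best_region = max(predictions, key=lambda region: accuracy[region])
    return predictions[best_region]


# Hypothetical predictions for one face image.
agree = {"forehead": "happy", "left_eye": "sad", "right_eye": "happy",
         "nose": "fear", "lip": "anger"}
differ = {"forehead": "happy", "left_eye": "sad", "right_eye": "surprise",
          "nose": "fear", "lip": "anger"}
```

In the first case two regions vote for the same label, so that label is returned; in the second case the lip CNN decides because it has the highest quoted accuracy.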

5.3. CNN Structure with Ten Fold Validity

The proposed CNN configuration is shown in Figure 7. The CNN accepts 80*60 input infrared images. We incorporated three convolution layers, and each of these layers is paired with a sub-sampling layer. As shown in Figure 7, the kernel sizes are 5*5 and 2*2. The stride is fixed to 2 for each sub-sampling layer. As seen in Figure 7, the feature maps after the convolution have dimensions 80*60*32. The same pattern is applied to the remaining layers. Each of the two fully connected layers has 1024 neurons. Finally, the CNN generates the outputs based on the vote results of the five different classified statements. The output N, as shown in Figure 7, is calculated based on the images stored in the database.

Figure 7. CNN structure with convolution layers, sub-sampling layers and fully connected layers.
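The spatial sizes implied by this description can be traced with a short calculation. The sketch rests on assumptions that the text does not state explicitly: the 5*5 convolutions are padded so that they keep the input size (consistent with the 80*60*32 maps after the first convolution), the 2*2 kernels belong to the sub-sampling layers, and odd sizes are rounded down. Channel counts beyond the first layer are not given, so only height and width are tracked; all function names are hypothetical.

```python
def conv_same(h, w):
    """A padded 5*5 convolution is assumed to keep the spatial size unchanged."""
    return h, w


def subsample(h, w, kernel=2, stride=2):
    """Output size of a 2*2 sub-sampling window moved with stride 2 (no padding)."""
    return (h - kernel) // stride + 1, (w - kernel) // stride + 1


def trace_shapes(h, w, blocks=3):
    """List the spatial size after every convolution and sub-sampling layer."""
    shapes = []
    for _ in range(blocks):
        h, w = conv_same(h, w)
        shapes.append(("conv", h, w))
        h, w = subsample(h, w)
        shapes.append(("subsample", h, w))
    return shapes


shapes = trace_shapes(80, 60)
```

Under these assumptions the maps shrink from 80*60 to 40*30, 20*15 and finally 10*7 before the fully connected layers.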

6. Performance analysis

Testing is carried out on the ground-truth and CK+ datasets. The tracking accuracy is shown in Table 1. On the CK+ image set, detection in the nose-tip region is the highest, at 93.98%. The accuracy in the forehead, left eye, and right eye regions is 89.71%, 90.89%, and 92.57%, respectively. The lip region, at 93.02%, has better accuracy than the eye regions. Tracking with infrared images gave higher accuracy than with RGB images. Using twenty-one infrared image sets, we achieved 94.35% on the nose-tip region. Likewise, the accuracy in the forehead, LE, RE, and lip regions is 91.21%, 91.47%, 93.31%, and 94.46%, respectively. The average detection accuracy is therefore 92.96% for infrared images, compared with 92.03% for RGB images.

Table 1. Recognition accuracy.

Applied images	Forehead region accuracy	Left eye region accuracy	Right eye region accuracy	Nose tip region accuracy	Lip region detection accuracy
RGB images	89.71%	90.89%	92.57%	93.98%	93.02%
Infrared images	91.21%	91.47%	93.31%	94.35%	94.46%
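The two averages quoted in the text follow directly from the rows of Table 1, as the short check below shows. The dictionary layout is only a convenience for this illustration; the numbers are copied from the table.

```python
from statistics import mean

# Region order: forehead, left eye, right eye, nose tip, lip (values in percent).
TABLE_1 = {
    "RGB": [89.71, 90.89, 92.57, 93.98, 93.02],
    "Infrared": [91.21, 91.47, 93.31, 94.35, 94.46],
}

# Average accuracy per image type, rounded to two decimals as in the text.
averages = {kind: round(mean(values), 2) for kind, values in TABLE_1.items()}
gain = round(averages["Infrared"] - averages["RGB"], 2)
```

The difference between the two averages is 0.93 percentage points in favour of the infrared images.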

7. Conclusion

Using infrared images provides a quicker way to recognize facial temperature, because an infrared image is sufficient on its own to measure temperature quickly. Moreover, we incorporated an AI-based machine intelligence scheme that accelerated the diagnosis process. To choose the active regions from the six classes and twelve regions, we applied the Enhanced Cuckoo Search Optimization (ECSO) algorithm, as stated in Algorithm 2, over a wireless network with a cloudlet server so that processing becomes more accessible with minimal infrastructure. The devices (infrared or mobile camera) first help measure temperature. Deep learning with a CNN is then applied during record processing and analysis to yield better-synthesized results. Table 1 shows the accuracy results for the five regions. With the CK+ image set, the recognition rate is highest in the nose-tip region, at 93.98%. The accuracy in the left and right eye regions is 90.89% and 92.57%, respectively. The accuracy of the lip region, at 93.02%, is higher than that of the eyes. The tracking outcome improved when infrared images were used instead of RGB images. We obtained 94.35% on the nose-tip region using twenty-one infrared images.

Similarly, the accuracy is 91.47%, 93.31%, and 94.46% in the left eye, right eye, and lip regions, respectively (Table 1). As a result, the average recognition rate increased from 91.73% to 92.39%, higher than for the RGB image set. Table 2 shows that the CNN method applied to the forehead, LE, RE, nose, and lip regions achieved accuracies of 89.32%, 91.14%, 90.75%, 91.15%, and 93.32%, respectively. By incorporating our decision tree-level synthesis method and the ten-fold validation technique on the same regions, we achieved accuracies of 91.36%, 94.14%, 93.49%, 94.44%, and 97.26%, respectively. Overall, we achieved 3.29% greater accuracy by incorporating the "decision tree level synthesis scheme" and the "ten-fold validation method."

Table 2. Performance result analysis with different investigators.

Investigators	Applied Method	Accuracy found by applied Datasets
Investigators	Applied Method	JAFFE	CK+	MNI	NVIE	SFEW	OWN
S. Happy et al. ^[17]	SFP	85.06%	89.64%
Y. Liu et al. ^[21]	AUDN + 8-fold validity	63.40%	92.40%
L. Zhong, et al. ^[23]	CSPL		89.89%	73.53%
S.H. Lee, et al. ^[27]	SRC	87.85%	94.70%	93.81%
A.Mollahosseini et al. ^[29]	DNN		93.20%	77.90%
R. Saranya et al. ^[31]	CNN + LSTM with 10-fold validity			81.60%		56.68%	95.21%
A.T. Lopes et al. ^[32]	Active appearance method (AAM)				66.24%
	Integrated AAM				71.72%
	DAF				79.16%
	Integrated DAF				91.52%
J.A. Zhanfu et al. ^[39]	AAM + infrared thermal + KNN				63.00%
	Bayesian networks (BNs)				85.51%
P. Shen, et al. ^[34]	KNN				73.00%
Our applied method	CNN method applied on 1. Forehead region 2. Left-Eye region 3. Right-Eye region 4. Nose region 5. Lips region	89.32% 91.14% 90.75% 91.15% 93.32%
Our proposed method	Decision tree level synthesized technique and 10-fold validation method applied on 1. Forehead region 2. Left-Eye region 3. Right-Eye region 4. Nose region 5. Lips region		91.36% 94.14% 93.49% 94.44% 97.26%
Our Achievement	We achieved 3.29% increased accuracy by incorporating decision tree level synthesized scheme and 10 folded validity.

Acknowledgments

We received no funding for this study.

Conflict of interest

The authors declare there is no conflict of interest.

References

[1]	R. Fan, Q. Hou, M. M. Cheng, G. Yu, R. R. Martin, S. M. Hu, Associating inter-image salient instances for weakly supervised semantic segmentation, in Proceedings of the European Conference on Computer Vision, (2018), 367–383. https://doi.org/10.1007/978-3-030-01240-3_23
[2]	N. Meeboonmak, N. Cooharojananone, Aircraft segmentation from remote sensing images using modified deeply supervised salient object detection with short connections, in International Conference on Mathematics and Computers in Science and Engineering, (2020), 184–187. https://doi.org/10.1109/MACISE49704.2020.00040
[3]	X. Yao, R. Li, J. Zhang, J. Sun, C. Zhang, Explicit boundary guided semi-push-pull contrastive learning for supervised anomaly detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2023), 24490–24499.
[4]	N. Yu, H. Li, Q. Xu, A full-flow inspection method based on machine vision to detect wafer surface defects, Math. Biosci. Eng., 20 (2023), 11821–11846. https://doi.org/10.3934/mbe.2023526
[5]	S. Hong, T. You, S. Kwak, B. Han, Online tracking by learning discriminative saliency map with convolutional neural network, in International Conference on Machine Learning, (2015), 597–606. https://doi.org/10.48550/arXiv.1502.06796
[6]	Q. Yan, L. Xu, J. Shi, J. Jia, Hierarchical saliency detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2013), 1155–1162. https://doi.org/10.1109/CVPR.2013.153
[7]	F. Perazzi, P. Krähenbühl, Y. Pritch, A. Hornung, Saliency filters: Contrast based filtering for salient region detection, in 2012 IEEE Conference on Computer Vision and Pattern Recognition, (2012), 733–740. https://doi.org/10.1109/CVPR.2012.6247743
[8]	L. Zhang, W. Chen, W. Wang, Z. Jin, C. Zhao, Z. Cai, et al., CBGRU: A detection method of smart contract vulnerability based on a hybrid model, Sensors, 22 (2022), 3577. https://doi.org/10.3390/s22093577
[9]	L. Zhang, Y. Li, T. Jin, W. Wang, Z. Jin, C. Zhao, et al., SPCBIG-EC: a robust serial hybrid model for smart contract vulnerability detection, Sensors, 22 (2022), 4621. https://doi.org/10.3390/s22124621
[10]	L. Zhang, J. Wang, W. Wang, Z. Jin, C. Zhao, Z. Cai, et al., A novel smart contract vulnerability detection method based on information graph and ensemble learning, Sensors, 22 (2022), 3581. https://doi.org/10.3390/s22093581
[11]	K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 770–778. https://doi.org/10.1109/CVPR.2016.90
[12]	Y. Li, H. Jin, Z. Li, A weakly supervised learning-based segmentation network for dental diseases, Math. Biosci. Eng., 20 (2023), 2039–2060. https://doi.org/10.3934/mbe.2023094
[13]	F. Chen, H. Ma, W. Zhang, SegT: Separated edge-guidance transformer network for polyp segmentation, Math. Biosci. Eng., 20 (2023), 17803–17821. https://doi.org/10.3934/mbe.2023791
[14]	Q. Feng, X. Xu, Z. Wang, Deep learning-based small object detection: A survey, Math. Biosci. Eng., 20 (2023), 6551–6590. https://doi.org/10.3934/mbe.2023282
[15]	C. Wu, L. Chen, A model with deep analysis on a large drug network for drug classification, Math. Biosci. Eng., 20 (2023), 383–401. https://doi.org/10.3934/mbe.2023018
[16]	X. Qin, Z. Zhang, C. Huang, C. Gao, M. Dehghan, M. Jagersand, BASNet: Boundary-aware salient object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2019), 7479–7489. https://doi.org/10.1109/CVPR.2019.00766
[17]	J. X. Zhao, J. J. Liu, D. P. Fan, Y. Cao, J. Yang, M. M. Cheng, EGNet: Edge guidance network for salient object detection, in Proceedings of the IEEE International Conference on Computer Vision, (2019), 8779–8788. https://doi.org/10.1109/ICCV.2019.00887
[18]	W. Wang, S. Zhao, J. Shen, S. C. Hoi, A. Borji, Salient object detection with pyramid attention and salient edges, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2019), 1448–1457. https://doi.org/10.1109/CVPR.2019.00154
[19]	Y. Liu, P. Wang, Y. Cao, Z. Liang, R. W. Lau, Weakly-supervised salient object detection with saliency bounding boxes, IEEE Trans. Image Process., 30 (2021), 4423–4435. https://doi.org/10.1109/TIP.2021.3071691
[20]	G. Li, Y. Xie, L. Lin, Weakly supervised salient object detection using image labels, in Proceedings of the AAAI Conference on Artificial Intelligence, 32 (2018), 7024–7031. https://doi.org/10.1609/aaai.v32i1.12308
[21]	Y. Piao, J. Wang, M. Zhang, H. Lu, MFNet: Multi-filter directive network for weakly supervised salient object detection, in Proceedings of the IEEE International Conference on Computer Vision, (2021), 4136–4145. https://doi.org/10.1109/ICCV48922.2021.00410
[22]	Y. Piao, W. Wu, M. Zhang, Y. Jiang, H. Lu, Noise-sensitive adversarial learning for weakly supervised salient object detection, IEEE Trans. Multimedia, 25 (2023), 2888–2897. https://doi.org/10.1109/TMM.2022.3152567
[23]	J. Zhang, X. Yu, A. Li, P. Song, B. Liu, Y. Dai, Weakly-supervised salient object detection via scribble annotations, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2020), 12546–12555. https://doi.org/10.1109/CVPR42600.2020.01256
[24]	B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 2921–2929. https://doi.org/10.1109/CVPR.2016.319
[25]	L. Wang, H. Lu, Y. Wang, M. Feng, D. Wang, B. Yin, et al., Learning to detect salient objects with image-level supervision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 136–145. https://doi.org/10.1109/CVPR.2017.404
[26]	X. Zhu, C. Tang, P. Wang, H. Xu, M. Wang, J. Che, et al., Saliency detection via affinity graph learning and weighted manifold ranking, Neurocomputing, 312 (2018), 239–250. https://doi.org/10.1016/j.neucom.2018.05.106
[27]	W. Zou, N. Komodakis, HARF: Hierarchy-associated rich features for salient object detection, in Proceedings of the IEEE International Conference on Computer Vision, (2015), 406–414. https://doi.org/10.1109/ICCV.2015.54
[28]	Y. Pang, X. Zhao, L. Zhang, H. Lu, Multi-scale interactive network for salient object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2020), 9413–9422. https://doi.org/10.1109/CVPR42600.2020.00943
[29]	X. Qin, Z. Zhang, C. Huang, M. Dehghan, O. R. Zaiane, M. Jagersand, U2-Net: Going deeper with nested U-structure for salient object detection, Pattern Recognit., 106 (2020), 107404. https://doi.org/10.1016/j.patcog.2020.107404
[30]	X. Zhao, Y. Pang, L. Zhang, H. Lu, L. Zhang, Suppress and balance: A simple gated network for salient object detection, in Proceedings of the European Conference on Computer Vision, (2020), 35–51. https://doi.org/10.1007/978-3-030-58536-5_3
[31]	L. Tang, B. Li, Y. Zhong, S. Ding, M. Song, Disentangled high quality salient object detection, in Proceedings of the IEEE International Conference on Computer Vision, (2021), 3580–3590. https://doi.org/10.1109/ICCV48922.2021.00356
[32]	M. Ma, C. Xia, J. Li, Pyramidal feature shrinking for salient object detection, in Proceedings of the AAAI Conference on Artificial Intelligence, 35 (2021), 2311–2318. https://doi.org/10.1609/aaai.v35i3.16331
[33]	Y. Song, H. Tang, M. Zhao, N. Sebe, W. Wang, Quasi-equilibrium feature pyramid network for salient object detection, IEEE Trans. Image Process., 31 (2022), 7144–7153. https://doi.org/10.1109/TIP.2022.322005
[34]	M. Zhuge, D. P. Fan, N. Liu, D. Zhang, D. Xu, L. Shao, Salient object detection via integrity learning, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2022), 3738–3752. https://doi.org/10.1109/TPAMI.2022.3179526
[35]	Z. Wu, S. Li, C. Chen, H. Qin, A. Hao, Salient object detection via dynamic scale routing, IEEE Trans. Image Process., 31 (2022), 6649–6663. https://doi.org/10.1109/TIP.2022.3214332
[36]	Y. H. Wu, Y. Liu, L. Zhang, M. M. Cheng, B. Ren, EDN: Salient object detection via extremely-downsampled network, IEEE Trans. Image Process., 31 (2022), 3125–3136. https://doi.org/10.1109/TIP.2022.3164550
[37]	M. Ma, C. Xia, C. Xie, X. Chen, J. Li, Boosting broader receptive fields for salient object detection, IEEE Trans. Image Process., 32 (2023), 1026–1038. https://doi.org/10.1109/TIP.2022.3232209
[38]	R. Bi, C. Ji, Z. Yang, M. Qiao, P. Lv, H. Wang, Residual based attention-unet combing DAC and RMP modules for automatic liver tumor segmentation in CT, Math. Biosci. Eng., 19 (2022), 4703–4718. https://doi.org/10.3934/mbe.2022219
[39]	H. Zhu, X. He, M. Wang, M. Zhang, L. Qing, Medical visual question answering via corresponding feature fusion combined with semantic attention, Math. Biosci. Eng., 19 (2022), 10192–10212. https://doi.org/10.3934/mbe.2022478
[40]	C. Jin, J. Huang, T. Wei, Y. Chen, Neural architecture search based on dual attention mechanism for image classification, Math. Biosci. Eng., 20 (2023), 2691–2715. https://doi.org/10.3934/mbe.2023126
[41]	M. Chen, S. Yi, M. Yang, Z. Yang, X. Zhang, Unet segmentation network of COVID-19 CT images with multi-scale attention, Math. Biosci. Eng., 20 (2023), 16762–16785. https://doi.org/10.3934/mbe.2023747
[42]	N. Liu, N. Zhang, K. Wan, L. Shao, J. Han, Visual saliency transformer, in Proceedings of the IEEE International Conference on Computer Vision, (2021), 4722–4732. https://doi.org/10.1109/ICCV48922.2021.00468
[43]	Z. Wang, P. Wang, Y. Han, X. Zhang, M. Sun, Q. Tian, Curiosity-driven salient object detection with fragment attention, IEEE Trans. Image Process., 31 (2022), 5989–6001. https://doi.org/10.1109/TIP.2022.3203605
[44]	C. Xie, C. Xia, M. Ma, Z. Zhao, X. Chen, J. Li, Pyramid grafting network for one-stage high resolution saliency detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2022), 11717–11726. https://doi.org/10.1109/CVPR52688.2022.01142
[45]	D. P. Fan, J. Zhang, G. Xu, M. M. Cheng, L. Shao, Salient objects in clutter, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2022), 2344–2366. https://doi.org/10.1109/TPAMI.2022.3166451
[46]	M. M. Cheng, S. H. Gao, A. Borji, Y. Q. Tan, Z. Lin, M. Wang, A highly efficient model to study the semantics of salient object detection, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2021), 8006–8021. https://doi.org/10.1109/TPAMI.2021.3107956
[47]	X. Tian, K. Xu, X. Yang, L. Du, B. Yin, R. W. Lau, Bi-directional object-context prioritization learning for saliency ranking, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), 5882–5891.
[48]	X. Tian, X. Yang, B. Yin, R. W. Lau, Weakly-supervised salient instance detection, preprint, arXiv: 2009.13898.
[49]	X. Tian, K. Xu, X. Yang, B. Yin, R. W. Lau, Learning to detect instance-level salient objects using complementary image labels, Int. J. Comput. Vision, 130 (2022), 729–746. https://doi.org/10.1007/s11263-021-01553-w
[50]	Z. Liang, P. Wang, K. Xu, P. Zhang, R. W. Lau, Weakly-supervised salient object detection on light fields, IEEE Trans. Image Process., 31 (2022), 6295–6305. https://doi.org/10.1109/TIP.2022.3207605
[51]	X. Zheng, X. Tan, J. Zhou, L. Ma, R. W. H. Lau, Weakly-supervised saliency detection via salient object subitizing, IEEE Trans. Circuits Syst. Video Technol., 31 (2021), 4370–4380. https://doi.org/10.1109/TCSVT.2021.3049408
[52]	X. Liu, J. Guo, S. Zheng, Weakly-supervised salient object detection with label decoupling siamese network, in Proceedings of the 8th International Conference on Computing and Artificial Intelligence, (2022), 412–418. https://doi.org/10.1145/3532213.3532275
[53]	Y. Zeng, Y. Zhuge, H. Lu, L. Zhang, M. Qian, Y. Yu, Multi-source weak supervision for saliency detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2019), 6074–6083. https://doi.org/10.1109/CVPR.2019.00623
[54]	H. Zhang, Y. Zeng, H. Lu, L. Zhang, J. Li, J. Qi, Learning to detect salient object with multi-source weak supervision, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2021), 3577–3589. https://doi.org/10.1109/TPAMI.2021.3059783
[55]	C. Rother, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graphics, 23 (2004), 309–314. https://doi.org/10.1145/1015706.1015720
[56]	Y. Boykov, M. Jolly, Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images, in Proceedings of the IEEE International Conference on Computer Vision, (2001), 105–112. https://doi.org/10.1109/ICCV.2001.937505
[57]	J. J. Liu, Q. Hou, M. M. Cheng, J. Feng, J. Jiang, A simple pooling-based design for real-time salient object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2019), 3917–3926. https://doi.org/10.1109/CVPR.2019.00404
[58]	K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
[59]	Y. Li, X. Hou, C. Koch, J. M. Rehg, A. L. Yuille, The secrets of salient object segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2014), 280–287. https://doi.org/10.1109/CVPR.2014.43
[60]	G. Li, Y. Yu, Visual saliency based on multiscale deep features, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), 5455–5463. https://doi.org/10.1109/CVPR.2015.7299184
[61]	Y. Liu, X. Y. Zhang, J. W. Bian, L. Zhang, M. M. Cheng, SAMNet: Stereoscopically attentive multi-scale network for lightweight salient object detection, IEEE Trans. Image Process., 30 (2021), 3804–3814. https://doi.org/10.1109/TIP.2021.3065239
[62]	X. Zhang, T. Wang, J. Qi, H. Lu, G. Wang, Progressive attention guided recurrent network for salient object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), 714–722.
[63]	T. Zhao, X. Wu, Pyramid feature attention network for saliency detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 3085–3094.
[64]	P. Zhang, D. Wang, H. Lu, H. Wang, B. Yin, Learning uncertain convolutional features for accurate saliency detection, in Proceedings of the IEEE International Conference on Computer Vision, (2017), 212–221.

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)


