
High-accuracy image recognition tasks in specific scenarios have always been a prominent research topic in computer vision [1,2,3,4]. Railway stations, banks, airports and border controls are critical systems where deep learning technology has been widely applied [5,6]. On the one hand, the extensive use of deep learning in these scenarios has significantly reduced human and material resources. On the other hand, the uniqueness of these scenarios imposes a high demand for accuracy in deep learning techniques. In particular, security systems necessitate real-time and efficient tracking, management and protection of valuable materials. In this context, cargo recognition plays a critical role. It enables automated and efficient management, enhancing both work efficiency and safety. Therefore, achieving high accuracy and reliability in cargo recognition within high-security systems has emerged as a pressing concern in the field of security technology.
Traditional methods for cargo identification require manual intervention to obtain additional information about the cargo [7,8,9], and their recognition results tend to be suboptimal in complex cargo environments. In contrast, deep learning-based cargo recognition methods exhibit stable performance in intricate environments and can be readily deployed in practical production settings. With appropriate neural network models and training techniques, cargo recognition can be performed automatically in diverse environments, largely unaffected by human or other external factors.
Zhu et al. [10] proposed an attribute-guided two-layer learning framework that can identify unknown image categories, thus improving the robustness and performance of few-shot image recognition. Zeng et al. [11] proposed a convolutional neural network model for classifying house styles and achieved reasonable classification results on a small sample dataset, which confirmed the feasibility of house style recognition. Yi et al. [12] proposed an end-to-end trained superpixel convolutional neural network that treats irregular superpixel crystals as 2D point clouds and uses PointConv layers instead of standard convolutional layers to process them, thereby learning high-level representations of image superpixels and achieving considerable recognition performance while improving superpixel efficiency. The aforementioned methods achieve satisfactory recognition results in their respective scenarios, but they focus only on global, coarse-grained features and ignore the fine-grained features of the objects to be recognized.
Koyun et al. [13] proposed a two-stage object detection framework named "Focus-and-Detect" for detecting small objects in aerial images and introduced the Incomplete Box Suppression (IBS) method to address the truncation effect of region search methods; this framework demonstrated the best small-object detection performance on the VisDrone validation dataset. Wang et al. [14] presented a small object detection method based on an enhanced Single Shot MultiBox Detector (SSD) algorithm, replacing the original VGG-16 with an improved dense convolutional network (C-DenseNet) and incorporating residual prediction layers and DIoU-NMS; this approach effectively reduces false and missed detections of small objects. Dong et al. [15] proposed a new object detection method based on a feature pyramid network (FPN), which introduces a multi-scale deformable attention module (MSDAM) and a multi-level feature aggregation module (MLFAM) to enhance remote sensing object detection (RSOD), achieving accurate detection on the DIOR and RSOD optical remote sensing datasets. These object detection methods achieve effective object localization and recognition, but at the cost of extensive manual annotation, and they attend mainly to local, fine-grained object features.
Addressing the aforementioned issues, we present an attention-guided multi-granularity feature fusion network (AGMG-Net) to enhance the accuracy of cargo recognition in security scenarios. The proposed network can effectively capture the distribution of focused regions, accurately locate the target position without manual annotation, separate the target from the background and fuse multi-granularity features to achieve precise identification of cargo. The major contributions of this study are outlined as follows:
● We propose an AMAA method to address the difficulty of locating targets in complex security system environments.
● We also propose a multi-region confidence-dependent optimal selection method to reduce the dependency on the threshold of foreground-background segmentation.
● Building on these two methods, we present an attention-guided multi-granularity feature fusion network that effectively enhances the accuracy of cargo recognition in security systems.
This section provides a brief review of the most relevant work, encompassing multi-branch models, weakly supervised object localization (WSOL) and fine-grained visual classification.
Multi-branch networks, as a fundamental structure in deep learning, have found wide application in various task domains including semantic segmentation [16,17] and object detection [18,19,20], enabling the capture and learning of richer and more diverse features. For instance, Xie et al. [21] proposed a multi-branch network for disease detection in retinal images, enhancing the representation of disease-specific features through the fusion of multi-scale and spatial features. To overcome the limited capacity for extracting global spatial information, Xu et al. [22] introduced a dual-branch network composed of a grouped bidirectional LSTM (GBiLSTM) network and a multi-level fusion convolutional transformer (MFCT), generating distinct and robust spectral-spatial features for hyperspectral image classification with limited labeled samples. Addressing the challenge of small object detection in aerial images with limited samples, Zhang et al. [23] proposed a multi-branch network incorporating a transformer branch, leveraging the strengths of generative models and transformer networks to improve the robustness of small object detection in complex environments. To address the significant non-linear differences between image blocks in image matching, Yu et al. [24] presented a composite metric network comprising a main metric network module and multiple branch metric network modules to capture richer and more distinctive feature differences. Overall, the concept of multi-branch networks has emerged as a crucial research direction in deep learning, offering practical solutions for diverse task domains.
Since Zhou et al. [25] introduced the use of Class Activation Maps (CAM) to characterize object locations, an increasing number of works have applied them to the field of WSOL [26,27,28,29,30,31]. A CAM is obtained by weighting the feature maps that feed the global average pooling layer of a classification network with the class-specific weights used before the softmax, thereby highlighting the position of the target object. Hwang et al. [32] introduced a target localization strategy using entropy regularization, which considers the one-hot labels and the entropy of predicted probabilities, thereby striking a balance between WSOL scores and classification performance. Zhang and Yang [33] proposed an adaptive attention enhancer to address the limitation of existing WSOL methods that lack modeling of the correlation between different regions of the target object. This enhancer supplements object attention by discovering the semantic correspondence between different regions. Gao et al. [34] presented a token semantics coupled attention map (TS-CAM) to tackle the challenge of learning object localization models given only image category labels. The self-attention mechanism in vision transformers is utilized to extract long-term dependencies and compensate for the partial-activation limitation of Convolutional Neural Networks (CNNs). These works demonstrate that WSOL can accomplish object localization tasks with only image-level annotations, eliminating the need for manual annotation in object detection tasks. Moreover, they hold significant value in image recognition: WSOL methods allow networks to quickly and accurately identify the recognized subject, while the fine-grained features of the image are extracted through the CG-Net branch in this work.
The approaches closest to this paper are [35,36]. Wang et al. [35] proposed an accurate semantic-guided discriminative region localization method for fine-grained image recognition, addressing methods that ignore the spatial correspondence between low-level details and high-level semantics. Du et al. [36] introduced a novel framework for fine-grained visual classification that addresses the challenges of identifying discriminative granularities and fusing information; the framework includes a progressive training strategy and a jigsaw puzzle generator and achieves state-of-the-art performance on benchmark datasets. The difference between this paper and [35,36] is that this paper introduces CAM, a classic practice in WSOL, to initially localize the target, and performs richer feature extraction through operations such as attention accumulation and coarse- and fine-grained feature fusion, so as to adequately learn the multi-granularity feature information in the image. Wang et al. [37] proposed the Prompting vision-Language Evaluator (PLEor), a novel framework for open-set fine-grained retrieval based on the Contrastive Language-Image Pretraining (CLIP) model; PLEor leverages the pre-trained CLIP model to infer category-specific discrepancies and transfer them to the backbone network trained in closed-set scenarios. Wang et al. [38] introduced Fine-grained Retrieval Prompt Tuning (FRPT), which, by utilizing sample prompting and feature adaptation, achieves state-of-the-art performance on fine-grained datasets with fewer parameters.
This paper introduces the AGMG-Net, which consists of three subnetworks: FG-Net, CG-Net and the Multi-Granularity Fusion Net (MGF-Net). Cargo image information is complex and exhibits both coarse-grained and fine-grained features. Conventional deep learning approaches primarily focus on learning and extracting coarse-grained features, which limits their ability to capture fine-grained features and leads to inaccuracies and omissions in cargo recognition. To address these challenges, we propose the AGMG-Net. FG-Net uses global attention to extract features and learn coarse-grained characteristics such as color, shape and position. CG-Net employs deep convolution to extract fine-grained features, including texture and structure. MGF-Net fuses and learns from the multi-granularity features. The final cargo recognition result is obtained by applying a majority rule after classification using fully connected layers, as sketched below. Figure 1 illustrates the network structure.
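To make the overall flow concrete, the following minimal sketch (in PyTorch, matching the software stack reported later) shows how the three branch predictions could be combined by a majority rule. The module names and interfaces (fg_net, cg_net, mgf_net and what they return) are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of the three-branch inference flow; module interfaces are assumed.
import torch

def agmg_predict(image, fg_net, cg_net, mgf_net):
    # FG-Net: coarse-grained feature map, comprehensive attention map and a class prediction
    fg_feat, attention_map, fg_logits = fg_net(image)
    # CG-Net: fine-grained feature map and a class prediction from the
    # attention-guided crop of the input (MOSBC is assumed to run inside cg_net)
    cg_feat, cg_logits = cg_net(image, attention_map)
    # MGF-Net: fuse the coarse- and fine-grained feature maps and classify
    mgf_logits = mgf_net(fg_feat, cg_feat)
    # Majority rule over the three branch predictions
    votes = torch.stack([fg_logits.argmax(1), cg_logits.argmax(1), mgf_logits.argmax(1)])
    prediction, _ = torch.mode(votes, dim=0)
    return prediction
```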
The input image undergoes initial processing by the FG-Net module to extract coarse-grained features. The FG-Net module comprises three components: a feature extractor, a classifier and an AMAA module. The feature extractor consists of convolutional and global self-attention blocks, which enable the learning of global features and the extraction of coarse-grained features. The classifier generates the cargo classification result using a global average pooling layer and a fully connected layer. During the early-to-mid training phase, the AMAA module accumulates the multi-stage attention map, directing the CG-Net to focus on the target object for extracting fine-grained features. As training progresses, the AMAA module guides the CG-Net to pay more attention to the overall image, to a certain extent, facilitating effective fusion of the cargo's coarse-grained features.
Suppose the original image is as shown in Figure 2(a); the AMAA module generates an attention map using the Class Activation Mapping (CAM) method, as illustrated in Figure 2(b). Let $I \in \mathbb{R}^{C \times W \times H}$ denote the input. The attention map acquired at each iteration is denoted as $A_t \in \mathbb{R}^{w \times h}$. Setting the initial condition $M_0 = A_0$, the cumulative attention map $M^c$ is calculated using Eq (3.1).
$M^c_t = \max(M^c_{t-1}, A^c_t), \quad t = 1, 2, \ldots$ | (3.1) |
Here, $C$ is the number of channels, $H$ and $W$ are the height and width of the image, $h$ and $w$ are the height and width of the attention map, and $c$ denotes the category.
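As a quick illustration of Eq (3.1), a minimal sketch of the max-accumulation step is given below; how the per-class CAM $A^c_t$ is produced from the classifier is assumed to be available, and the helper name is ours.

```python
# Sketch of Eq (3.1): keep, at every pixel, the largest activation seen so far for class c.
import numpy as np

def update_cumulative_map(M_prev: np.ndarray, A_t: np.ndarray) -> np.ndarray:
    """M_t = max(M_{t-1}, A_t), applied element-wise over the w x h attention map."""
    return np.maximum(M_prev, A_t)

# M_0 = A_0, then the cumulative map is updated once per training stage.
```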
Different positions are emphasized at each stage of the network during training, and the resulting cumulative attention map $M^c_t$ reflects the attention region distribution for a given category. In the early and middle stages of training, $M^c_t$ exhibits more accurate localization ability than the attention map $A^c_t$. However, because maxima accumulate gradually during the calculation of the attention map $M$, regions with excessively high attention values can lead to inaccurate target localization. To address this issue, we propose the AMAA module. The AMAA module retains the attention maps from the previous $k-1$ stages along with the cumulative attention map $M_t$, forming an attention sequence denoted as $M_L$ of length $k$. The specific formulation of $M_L$ is presented in Eq (3.2). By utilizing this attention sequence, which jointly considers the cumulative attention map $M$ and the recent $k-1$ attention maps, and performing a weighted summation of $M_L$ using Eq (3.3), we can prevent abnormal attention maps from misleading the final localization.
$M_L = [M_t, A_t, A_{t-1}, \ldots, A_{t-k+1}]$ | (3.2) |
$A = \sum_{i=0}^{k} \alpha_i M_{L_i}$ | (3.3) |
where $\alpha \in \mathbb{R}^k$. Since taking a simple average of the attention sequence $M_L$ would diminish the impact of the most recent attention map on the overall attention map $A$, this paper adopts the approximate forgetting function [39] $L(x,k)$ to compute the initial value of $\alpha$. Subsequently, $\alpha$ is normalized using Eq (3.5).
$L(x,k) = \dfrac{1.84k}{1.25x + 1.84k}$ | (3.4) |
$\alpha_i = \mathrm{softmax}(L(i,k))$ | (3.5) |
The AMAA module not only considers the cumulative attention map but also incorporates the attention maps computed in the previous $k-1$ iterations to generate the comprehensive attention map $A$ for the current iteration. As the model is trained, AMAA gradually converges towards the actual distribution of the target position, i.e., the true location of the target object. This enables AMAA to accomplish target localization efficiently and thereby guide CG-Net in performing localization and recognition.
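A minimal sketch of the AMAA combination in Eqs (3.2)-(3.5) follows. The closed form of the forgetting function $L(x,k)$ is reconstructed here from the garbled source as $1.84k/(1.25x + 1.84k)$ and should be treated as an assumption rather than the authors' exact expression; the deque bookkeeping is likewise illustrative.

```python
# Sketch of the AMAA weighted combination (Eqs (3.2)-(3.5)).
from collections import deque
import numpy as np

def forgetting_weight(x: float, k: int) -> float:
    # Assumed form of the approximate forgetting function L(x, k).
    return 1.84 * k / (1.25 * x + 1.84 * k)

def comprehensive_attention(M_t: np.ndarray, recent_maps: deque, k: int) -> np.ndarray:
    """Weighted sum of M_L = [M_t, A_t, A_{t-1}, ...] with softmax-normalized
    forgetting-function weights (Eqs (3.3) and (3.5))."""
    sequence = [M_t] + list(recent_maps)           # M_L, newest attention map first
    raw = np.array([forgetting_weight(i, k) for i in range(len(sequence))])
    alpha = np.exp(raw) / np.exp(raw).sum()        # softmax normalization
    return sum(a * m for a, m in zip(alpha, sequence))

# recent_maps would be a deque(maxlen=k-1) holding the latest attention maps A_t, A_{t-1}, ...
```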
The comprehensive attention map generated by FG-Net and the original input are both fed into CG-Net for fine-grained feature extraction. CG-Net consists of three components: the MOSBC module, the feature extractor and the classifier. As shown in Figure 3, the MOSBC module first performs image fusion on the input and then selects the target region. The feature extractor has 11 convolutional layers, which enable the capture of detailed information and local features in the image, facilitating the extraction of fine-grained image features. The classifier consists of one global average pooling layer and two fully connected layers, enabling precise image classification.
The comprehensive attention map $A$, as illustrated in Figure 2(c), still includes non-target areas and exhibits blurred edges within the target region despite attention accumulation. To address this issue, this paper proposes a region localization method called MOSBC, which aims to identify the area where the target is most likely located. Initially, two-dimensional bilinear interpolation is employed to resize the comprehensive attention map $A$ to the width and height of the input, denoted as $A' \in \mathbb{R}^{H \times W}$. Subsequently, a threshold $\gamma \in (0,1)$ is used to separate the foreground and background of the attention map, as depicted in Eq (3.6).
$A^* = \begin{cases} 0, & A' \le \gamma \\ 1, & \text{otherwise} \end{cases}$ | (3.6) |
After the foreground and background have been separated, the Euclidean distance from each foreground position to the nearest background position is first calculated. The local peaks of this distance map are then found, and their connected components are analyzed using 8-connectivity. The watershed algorithm [40] is then used to segment the image into a set of target regions, generating $N$ candidate regions $R_n = (x_n, y_n, h_n, w_n),\ n = 1, \ldots, N$, where $(x_n, y_n)$ is the center coordinate of the candidate region $R_n$ and $h_n, w_n$ are its height and width. Eq (3.7) is then used to calculate a confidence score $\rho_n$ for each region, and Eq (3.8) selects the region with the highest confidence score as the final target $Region$, as sketched after Eq (3.8).
$\rho_n = \dfrac{1}{h_n \times w_n} \sum_{i=0}^{h_n} \sum_{j=0}^{w_n} R_n$ | (3.7) |
$Region = \arg\max\{R_i \mid \rho_i = \max\{\rho_j\},\ 1 \le i, j \le N\}$ | (3.8) |
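The following sketch assembles the MOSBC steps (Eqs (3.6)-(3.8)) from standard SciPy and scikit-image building blocks. The choice of these particular library routines, and reading Eq (3.7) as the mean attention value inside each candidate box, are assumptions made for illustration.

```python
# Sketch of MOSBC: upsample and threshold the attention map, split the foreground
# with a distance transform + watershed, then keep the highest-confidence box.
import numpy as np
import torch
import torch.nn.functional as F
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
from skimage.measure import regionprops

def select_region(A: np.ndarray, H: int, W: int, gamma: float = 0.4):
    # Resize the comprehensive attention map to the input resolution (bilinear interpolation)
    A_prime = F.interpolate(torch.from_numpy(A)[None, None].float(),
                            size=(H, W), mode="bilinear", align_corners=False)[0, 0].numpy()
    foreground = A_prime > gamma                              # Eq (3.6)
    # Distance from each foreground pixel to the nearest background pixel
    distance = ndimage.distance_transform_edt(foreground)
    # Local peaks of the distance map (3x3 footprint, i.e. 8-connected neighborhood)
    peaks = peak_local_max(distance, footprint=np.ones((3, 3)), labels=foreground)
    peak_mask = np.zeros_like(foreground, dtype=bool)
    peak_mask[tuple(peaks.T)] = True
    markers, _ = ndimage.label(peak_mask)
    labels = watershed(-distance, markers, mask=foreground)   # candidate regions
    # Score each candidate box by its mean attention and keep the best one
    best_box, best_score = None, -np.inf
    for region in regionprops(labels):
        y0, x0, y1, x1 = region.bbox
        score = A_prime[y0:y1, x0:x1].mean()                  # Eq (3.7), our reading
        if score > best_score:
            best_box, best_score = (x0, y0, x1, y1), score    # Eq (3.8)
    return best_box
```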
It is important to note that the $Region$ obtained at this stage is only a localization box; it must be overlaid onto the input $I$ and the corresponding area cropped out to obtain the fine-grained image of the target. The fine-grained image is then resized with two-dimensional bilinear interpolation to match the size of $I$ and fed into the feature extractor of CG-Net for fine-grained feature extraction.
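A short sketch of this crop-and-resize step is given below; the tensor layout and the function name are illustrative assumptions.

```python
# Crop the located Region from the input and resize it back with bilinear interpolation.
import torch.nn.functional as F

def crop_and_resize(I, box, out_size):
    """I: (C, H, W) tensor; box: (x0, y0, x1, y1); out_size: (H, W) of the original input."""
    x0, y0, x1, y1 = box
    patch = I[:, y0:y1, x0:x1].unsqueeze(0)                   # add a batch dimension
    patch = F.interpolate(patch, size=out_size, mode="bilinear", align_corners=False)
    return patch.squeeze(0)                                   # fine-grained image for CG-Net
```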
MGF-Net is responsible for multi-scale feature fusion and classification. It consists of a feature fusion layer and a classifier. The feature fusion layer uses the concatenation operation to achieve multi-scale feature fusion at the channel level. This is in contrast to other feature fusion methods (such as addition or multiplication), which do not ensure the diversity and completeness of features. Concatenation can handle feature maps of multiple scales simultaneously, which makes it more flexible. The MGF-Net classifier consists of one global average pooling layer and three fully connected layers. This enables multi-scale image classification. The structure of MGF-Net is depicted in Figure 4 and the network's layer parameters are detailed in Table 1.
Layers | Resolution | Description |
Input | (7,7,1026),(7,7,512) | - |
Concat | (7,7,1026+512) | Concat operation |
Global Pool | (7,7,1538) | Global average pooling |
FC | - | 4096-dimensional FC layer |
FC | - | 4096-dimensional FC layer |
FC | - | k-dimensional FC layer |
The feature map with 1026 channels is derived from FG-Net, while the feature map with 512 channels originates from CG-Net. The 1026-channel map is obtained after the input image passes through the four convolutional layers and one transformer layer of FG-Net, and the 512-channel map is obtained after the image processed by the MOSBC module passes through the 11 convolutional layers of CG-Net, as shown in Figure 3.
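A minimal sketch of the MGF-Net head described in Table 1 is given below; the activation functions between the fully connected layers are an assumption, since Table 1 only lists the layer sizes.

```python
# Sketch of the MGF-Net head from Table 1: channel-wise concatenation of the
# (7, 7, 1026) FG-Net map and the (7, 7, 512) CG-Net map, global average pooling,
# then three fully connected layers (4096, 4096, k classes).
import torch
import torch.nn as nn

class MGFNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.classifier = nn.Sequential(
            nn.Linear(1026 + 512, 4096), nn.ReLU(inplace=True),   # ReLU is an assumption
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, fg_feat: torch.Tensor, cg_feat: torch.Tensor) -> torch.Tensor:
        # Inputs are (B, 1026, 7, 7) and (B, 512, 7, 7); concatenate on the channel axis
        fused = torch.cat([fg_feat, cg_feat], dim=1)
        pooled = self.pool(fused).flatten(1)          # (B, 1538)
        return self.classifier(pooled)
```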
To evaluate the effectiveness of our proposed method, AGMG-Net was compared against state-of-the-art methods, and ablation experiments were conducted on three publicly available and self-built image recognition datasets.
To simulate real-world scenarios, the experiments were performed on the publicly available Flower and Butterfly datasets. The Flower dataset, obtained from http://download.tensorflow.org/example_images/flower_photos.tgz, comprises 4323 images of flowers from 5 categories, each with random resolutions. The original butterfly dataset [41] consists of 200 categories, from which 20 categories were selected to create a smaller dataset named Butterfly20. This subset contains 2066 images with random resolutions. Sample images from both datasets are depicted in Figure 5, exhibiting diverse lighting angles, shooting angles, distances and backgrounds, resembling the conditions and environments encountered in security systems for cargoes. Utilizing these datasets enables the experiments to reflect practical scenarios more accurately, enhancing their reliability and generalization ability. Evaluating the performance of AGMG-Net under different conditions using these datasets allows for a more comprehensive assessment, thereby improving the algorithm's robustness.
In addition, we created a self-built cargo recognition dataset named "Cargo" specifically for recognizing cargoes in security systems. The dataset comprises 3 categories and 4715 images with random resolutions, organized in a structure identical to the aforementioned publicly available datasets. An overview of the Cargo dataset is provided in Table 2.
Item | Values |
Modalities | RGB
Total number of images | 4715 |
Number of classes | 3 |
Number of angle classes | 6 |
Number of distance classes | 5 |
Number of background classes | 7 |
The hardware environment used for this experiment was an Intel® Xeon® Platinum 8255C CPU @ 2.50 GHz with 12 CPU cores, 32 GB of DDR4 memory and an NVIDIA GeForce RTX 3090 graphics card. The software platform was the Ubuntu 20.04 LTS operating system with Python 3.7, PyTorch 1.12.1, CUDA 11.3 and cuDNN 8.2.1.
The hyperparameters were set as follows: 100 training iterations were performed for the Flower dataset, 200 for the Butterfly20 dataset and 120 for the Cargo dataset. The Adam optimizer was utilized with a batch size of 16 and a learning rate of 0.001. The training set and testing set were randomly split in a ratio of 8:2, ensuring an equal number of samples for each class.
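A minimal sketch of this training configuration is shown below; the dataset object, the use of scikit-learn for the stratified 8:2 split and the fixed random seed are assumptions made for illustration.

```python
# Sketch of the stated setup: Adam, lr 0.001, batch size 16, stratified 8:2 split.
import torch
from torch.utils.data import DataLoader, Subset
from sklearn.model_selection import train_test_split

def make_loaders_and_optimizer(dataset, labels, model):
    # Stratified split so that each class keeps the same 8:2 proportion
    train_idx, test_idx = train_test_split(
        list(range(len(dataset))), test_size=0.2, stratify=labels, random_state=0)
    train_loader = DataLoader(Subset(dataset, train_idx), batch_size=16, shuffle=True)
    test_loader = DataLoader(Subset(dataset, test_idx), batch_size=16, shuffle=False)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    return train_loader, test_loader, optimizer
```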
To objectively evaluate the classification performance of different models, two metrics commonly used in classification tasks, accuracy and F1-score, were adopted. The F1-score is the harmonic mean of precision and recall, as expressed in Eq (4.1).
$F1\text{-score} = \dfrac{2 \times Precision \times Recall}{Precision + Recall}$ | (4.1) |
where $Precision = TP/(TP+FP)$ and $Recall = TP/(TP+FN)$. Here, TP is the number of positive samples correctly classified as positive, TN the number of negative samples correctly classified as negative, FP the number of negative samples incorrectly classified as positive and FN the number of positive samples incorrectly classified as negative.
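For reference, a small illustrative helper computing Eq (4.1) from these counts:

```python
# Sketch of Eq (4.1): F1-score from the confusion counts defined above.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```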
In this section, we first conducted experiments on CG-Net with the cropping threshold γ, while fixing the attention sequence length ML at 4. The performance of AGMG-Net on the Cargo dataset is presented in Table 3.
Threshold γ | Accuracy (%) | F1-score (%) |
0.1 | 98.81 | 94.26 |
0.2 | 99.12 | 95.33 |
0.4 | 99.22 | 96.14 |
0.6 | 99.22 | 96.09 |
0.8 | 99.01 | 95.51 |
As shown in Table 3, the accuracy of AGMG-Net is the best when the cropping threshold γ is 0.2 to 0.6. The accuracy is the worst when it is 0.1, and there is a significant decrease in accuracy when it is 0.8. This shows that γ in the range of 0.2 to 0.6 can help the model to complete the target cropping and effectively improve the recognition performance of the model. When γ is set to 0.4, the F1-score of AGMG-Net reaches the maximum value of 96.14%. Therefore, all subsequent experiments set γ to 0.4.
Furthermore, we conducted additional experiments on the attention sequence length ML with γ set to 0.4. The performance of AGMG-Net on the Cargo dataset is presented in Table 4.
ML | Accuracy (%) | F1-score (%) |
2 | 98.28 | 96.02 |
4 | 99.22 | 96.14 |
6 | 99.43 | 97.21 |
8 | 99.58 | 97.35 |
10 | 99.36 | 97.13 |
Table 4 shows that AGMG-Net achieves the highest accuracy when ML is between 6 and 8. The accuracy is noticeably lower when ML is 2 or 4, and it drops again when ML is set to 10. These findings suggest that maintaining 6 to 8 adjacent attention maps helps mitigate the influence of outliers during training and improves recognition performance. The model achieves its best recognition performance when ML is set to 8. Therefore, we set ML to 8 in the subsequent experiments.
The ablation experiment was conducted to investigate the impact of the AMAA and MOSBC modules in AGMG-Net on network performance. We removed the AMAA and MOSBC modules from FG-Net and CG-Net and conducted ablation experiments on the three datasets mentioned in Section 4.1. The experimental results, depicted in Figure 6, show that the AMAA and MOSBC modules have a significant impact on the recognition accuracy of the AGMG-Net model.
It is evident from Figure 6(c) that the addition of the AMAA and MOSBC modules increases the number of parameters that AGMG-Net needs to learn. In the early training process, only FG-Net is trained, resulting in relatively low accuracy. However, once the test accuracy reaches the threshold of 0.95, CG-Net starts participating in the training process, leading to rapid object localization and multi-scale feature fusion by round 63. This can be observed in Figure 6(a) and (b), where AGMG-Net surpasses the other models on the Flower (threshold: 0.8), Butterfly20 (threshold: 0.8) and Cargo datasets, with increasing margins from rounds 43, 81 and 63 onward. The optimal values are achieved at rounds 97, 105 and 143, with leads of 3.7, 5.48 and 0.29%. These experiments demonstrate that the AMAA and MOSBC modules positively impact the recognition accuracy of AGMG-Net in image recognition.
Table 5 quantifies the ablation experiment shown in Figure 6. In Table 5, using only the AMAA module yields higher accuracy than Primitive, whereas using only the MOSBC module yields lower accuracy. This indicates that using the raw CAM alone as the basis for localization does not allow the model to learn more features and can even be misled by anomalous CAMs, whereas the accumulated attention from AMAA avoids this problem. When AMAA and MOSBC work together, the model shows the highest accuracy and the lowest loss, indicating that the MOSBC module can perform proper target acquisition once the target has been effectively localized, providing more complete target information for AGMG-Net.
Methods | Flower | Butterfly20 | Cargo | |||
Accuracy(%) | Loss | Accuracy(%) | Loss | Accuracy(%) | Loss | |
Primitive | 89.03 | 0.7017 | 83.10 | 0.8899 | 99.29 | 0.0698 |
Primitive + AMAA | 89.27 | 0.4241 | 85.95 | 0.7994 | 99.47 | 0.0762 |
Primitive + MOSBC | 86.72 | 0.6617 | 81.01 | 1.4036 | 98.79 | 0.1127 |
Primitive ++ | 92.73 | 0.3697 | 88.57 | 0.3751 | 99.58 | 0.0211 |
To evaluate the effectiveness of the proposed model, this section reports comparative experiments on the Flower, Butterfly20 and Cargo datasets. AGMG-Net is compared with a representative traditional convolutional network (VGG [42]), a residual network (ResNeSt [43]), a visual transformer (ViT [44]), CoAtNet [45], which integrates the advantages of both convolution and the transformer, and the state-of-the-art DeepMAD model. The experiments use consistent training parameters, including batch size, learning rate, number of iterations and weight decay. The results are presented in Tables 6–8.
Methods | Accuracy (%)↑ | F1-score (%)↑ | Loss↓ | Train (SPI)↓ | Test (SPI)↓ |
VGG | 74.13 | 37.84 | 0.8189 | 0.0066 | 0.0020 |
ResNeSt | 86.49 | 44.01 | 0.6651 | 0.0075 | 0.0020 |
ViT | 69.86 | 34.08 | 0.8850 | 0.0066 | 0.0025 |
CoAtNet | 88.50 | 54.11 | 0.3550 | 0.0103 | 0.0031 |
DeepMAD (SOTA) | 90.57 | 61.33 | 0.3368 | 0.0226 | 0.0068 |
AGMG-Net | 92.73 | 63.71 | 0.3697 | 0.0212 | 0.0127 |
Methods | Accuracy (%)↑ | F1-score (%)↑ | Loss↓ | Train (SPI)↓ | Test (SPI)↓ |
VGG | 62.62 | 34.11 | 1.1135 | 0.0066 | 0.0020 |
ResNeSt | 77.62 | 41.45 | 0.8012 | 0.0076 | 0.0021 |
ViT | 50.00 | 15.52 | 1.7806 | 0.0066 | 0.0026 |
CoAtNet | 85.00 | 52.60 | 0.6337 | 0.0101 | 0.0031 |
DeepMAD (SOTA) | 88.41 | 71.62 | 0.4246 | 0.0232 | 0.0081 |
AGMG-Net | 88.57 | 73.45 | 0.3751 | 0.0217 | 0.0131 |
Methods | Accuracy (%)↑ | F1-score (%)↑ | Loss↓ | Train (SPI)↓ | Test (SPI)↓ |
VGG | 99.15 | 94.69 | 0.0421 | 0.0067 | 0.0019 |
ResNeSt | 99.21 | 95.97 | 0.0332 | 0.0075 | 0.0020 |
ViT | 92.48 | 69.16 | 0.2025 | 0.0066 | 0.0025 |
CoAtNet | 99.23 | 96.02 | 0.0244 | 0.0103 | 0.0032 |
DeepMAD (SOTA) | 99.53 | 97.04 | 0.0256 | 0.0225 | 0.0065 |
AGMG-Net | 99.58 | 97.35 | 0.0211 | 0.0194 | 0.0111 |
VGG represents traditional convolutional networks and performs excellently in image classification tasks due to its regular network structure, which comprises convolutional and pooling layers. This structure enables VGG to quickly capture image features within a limited number of training iterations. ResNeSt represents residual networks, which incorporate residual blocks that facilitate easier model training and enable capturing deep features of cargoes in complex environments. ViT is a transformer-based visual model that partitions the image into small blocks and processes them through multi-head self-attention mechanisms. This allows ViT to capture global information and the relationships between image blocks, resulting in high recognition accuracy on the ImageNet dataset. CoAtNet combines deep convolutional networks with self-attention mechanisms, enabling it to efficiently extract target features and perform well on both small-scale and large-scale datasets.
The results of ViT in Table 7 show that visual transformers excel at learning global and coarse-grained features, but they struggle to extract effective features from a limited number of samples. On the other hand, convolutional models like VGG and ResNeSt perform relatively poorly at learning fine-grained features of the target, but they can still achieve decent results on small-scale datasets. CoAtNet combines the strengths of both approaches, exhibits powerful feature extraction capabilities and achieves an impressive accuracy of 85%. AGMG-Net, with its CG-Net subnetwork that extracts finer-grained features and combines multiscale features, surpasses CoAtNet in accuracy, demonstrating the effectiveness of combining coarse and fine-grained features.
The information presented in Tables 6 and 8 shows that the distinctive characteristics of the ViT model are used effectively on the medium-scale Flower and Cargo datasets, facilitating the extraction of features at a global scale. Furthermore, the VGG and ResNeSt models demonstrate improved learning capabilities in capturing intricate details within these datasets. However, the CoAtNet model surpasses both VGG and ResNeSt in feature extraction ability, achieving accuracy rates of 88.5 and 99.23% and F1-scores of 54.11 and 96.02% on the two datasets, respectively. Nevertheless, the proposed AGMG-Net model, which incorporates more comprehensive coarse-grained features and richer fine-grained features, outperforms the CoAtNet model, yielding accuracy gains of 4.23 and 0.35%. Comparing the running speeds of the models in Tables 6–8, AGMG-Net is more time-consuming in the inference process but faster than the state-of-the-art (SOTA) DeepMAD model in the training process. The multi-branch AGMG-Net achieves 99.58% accuracy on the Cargo dataset at only a slightly longer runtime, which shows that the model proposed in this paper is well suited to security systems that do not require high recognition speed but demand high recognition accuracy.
The above experiments demonstrate that AGMG-Net can effectively leverage the multi-granularity features of the data, resulting in higher classification accuracy and precision. Additionally, AGMG-Net has the potential to accurately classify cargo in complex environmental conditions, demonstrating its practicality and usability for cargo recognition tasks.
The AGMG-Net proposed in this paper has significant advantages in cargo recognition. Unlike existing methods, AGMG-Net specifically considers the fine-grained features of cargo and leverages multiscale features to enhance recognition accuracy, even when data is limited and there are minimal differences between cargo classes. AGMG-Net incorporates the coarse-grained branch's AMAA module for target localization and the fine-grained branch's MOSBC module for target cropping. It then combines the feature maps of both branches through a multiscale fusion branch in Concat mode. Furthermore, it improves prediction accuracy by employing the majority voting method in the prediction layer.
The average recognition rates on the self-built Cargo dataset, as well as the public datasets Flower and Butterfly20, are 99.58, 92.73 and 88.57%. Experimental results demonstrate that AGMG-Net outperforms VGG, ResNeSt, ViT and CoAtNet in terms of classification effectiveness.
In conclusion, AGMG-Net is an effective cargo recognition model that enhances recognition accuracy and classification effectiveness through the integration of attention-guided multiscale feature fusion. It can be successfully applied to cargo recognition tasks within complex security system environments, thereby significantly contributing to cargo safety.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors declare there is no conflict of interest.