Research article

A method based on multi-standard active learning to recognize entities in electronic medical record

  • Received: 29 September 2020 Accepted: 21 December 2020 Published: 05 January 2021
  • Deep neural networks (DNN) have achieved good results in Named Entity Recognition (NER), but most DNN methods rely on large amounts of annotated data. Electronic Medical Records (EMRs) are text data from a specialized professional field, and annotating them requires experts with strong medical knowledge and considerable labeling time. To tackle the problems of a specialized medical domain, large data volume, and annotation difficulty in EMRs, we propose a new method based on multi-standard active learning to recognize entities in EMRs. Our approach uses three criteria: the number of labeled data, the cost of sentence annotation, and the balance of data sampling to determine the choice of active learning strategy. We put forward an uncertainty calculation and a sentence-annotation measurement rule that better suit NER neural network models. Also, we use incremental training to speed up the iterative training in the process of active learning. Finally, the named entity experiment on breast clinical EMRs shows that the method achieves the same NER accuracy given samples of the same quality. Compared with the traditional supervised learning method of randomly selecting labeled data, the method proposed in this paper reduces the amount of data that needs to be labeled by 66.67%. Besides, an improved TF-IDF method based on Word2Vec is also proposed to vectorize the text by considering word frequency.

    Citation: Qiao Pan, Chen Huang, Dehua Chen. A method based on multi-standard active learning to recognize entities in electronic medical record[J]. Mathematical Biosciences and Engineering, 2021, 18(2): 1000-1021. doi: 10.3934/mbe.2021054



    Abbreviations: CAM: class activation mapping; CNN: convolutional neural network; DA: data augmentation; DARS: DA and regularization strategy; DL: deep learning; LR: learning rate; m-CutMix: modified CutMix; m-OLS: modified Online Label Smoothing; M: million; ML: machine learning; NPSC: number of parameter-setting combinations; OA: overall accuracy; OLS: online label smoothing; RE: regularization-enhanced; RSI: remote sensing images; SC: semantic classification; SE: squeeze-and-excitation; SOTA: state-of-the-art; TR: training ratio; TT: training trick; ViT: vision transformer.

    Remote sensing is a crucial technique for Earth observation. With the expansion of onboard, ground-based [1] and underwater sensors [2], monitoring data has become too large to be interpreted entirely by hand. Consequently, machine learning (ML) algorithms have played a prominent role in meeting the need for automatic and intelligent recognition. With the advent of the deep learning (DL) era, CNNs have come to dominate the recognition tasks of remote sensing images (RSI), including but not limited to semantic classification (SC), segmentation [3] and object detection or recognition [4]. Since CNNs are the fundamental algorithm, researchers have proposed many CNN methods for RSI-SC in the past decade. To boost model performance, these methods have commonly relied on human re-modeling strategies, including modifying pre-trained models, fusing features from models, creating combined models and so on, resulting in a remarkable increase in hardware and time costs. However, these ingenious methods still achieved only modest performance advances due to some imperceptible flaws. The reasons are given as follows:

    First, a model's training can be viewed as a search of the parameter space for an optimal solution. On this premise, the shallow models of traditional ML correspond to small search spaces, and consequently, their training can employ exhaustive computation. Hence, for shallow models, any human-modeling strategy can be verified as effective or not. Given their far larger capacity, however, it is impossible to perform an exhaustive search for deep CNNs. Besides, re-modeling enlarges the search space due to the increased number of parameters. Therefore, it becomes harder to verify the effectiveness of a modeling strategy when the method faces an expanded parameter space but still searches it with a finite number of training steps.

    Second, the benchmark RSI sets are commonly much smaller than natural image sets. For example, the Aerial Image Dataset (AID) and Northwestern Polytechnical University (NWPU) [5,6] RSI sets contain only 10,000 and 31,500 images, respectively, while ImageNet-1K has more than one million. Currently, all the off-the-shelf CNNs for RSI recognition are developed on ImageNet-1K. Hence, a CNN easily overfits on RSI datasets due to its large capacity. To avoid overfitting, data augmentation (DA) was widely employed in previous studies. However, a DA-processed training set has a data distribution shift compared to the original one. With this shift problem unhandled, a CNN will easily achieve suboptimal performance [7]. To the authors' best knowledge, this problem has been commonly ignored in previous studies.

    Third, some emerging training tricks (TTs) can boost a CNN model's performance, but they are also developed based on the attributes of natural images. Several previous methods for RSI-SC have employed TTs in training and achieved better performance. However, these studies simply copied the TTs' usage from ImageNet-1K without considering the inherent differences of RSI. Therefore, this simplistic operation may also be suboptimal due to the domain gap. In addition, previous methods commonly employed transfer learning as the technical route, yet most of them break several basic rules of transfer learning, e.g., using a larger learning rate (LR) of 1E-02. Some evidence reveals that a larger LR can easily make a CNN overfit on RSI sets [8].

    To examine these problems, we designed a simple experiment by training a lightweight CNN [9] (i.e., EfficientNet-B0) on RSI sets. In detail, an EfficientNet-B0 with pre-trained weights from ImageNet-1K is retrained on AID and NWPU using a smaller LR of 1E-04. The whole training includes only 60 epochs without any secondary modeling. Given the same training ratios (TRs), the OA results are compared to some previous state-of-the-art (SOTA) CNN methods published before 2023 (see Table 1, where "None" means no data presented in the related studies). As shown in the table, the single EfficientNet-B0 can match the other methods by using beginner-friendly transfer learning with far fewer parameters. Therefore, it reveals that there may be a more efficient but still accurate solution for RSI-SC using a single CNN.
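    As a concrete illustration, the sketch below shows how such a baseline could be set up in PyTorch. It is a hedged example, not the authors' released code: it assumes a recent torchvision for the pre-trained weights API, a 30-class target set such as AID, and a user-supplied `train_loader`; the optimizer choice is likewise an assumption, since only the LR of 1E-04 and the 60 epochs are fixed by the text above.

```python
# Hedged sketch of the baseline transfer-learning experiment (EfficientNet-B0,
# LR = 1E-04, 60 epochs). `train_loader` and the optimizer choice are assumptions.
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load EfficientNet-B0 pre-trained on ImageNet-1K and replace its classifier head
# with one matching the number of RSI categories (30 for AID, 45 for NWPU).
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
num_classes = 30
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)
model = model.to(device)

# Retrain all layers with the small LR of 1E-04 mentioned above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(train_loader):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# for epoch in range(60):
#     train_one_epoch(train_loader)
```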

    Table 1.  OA (%) comparison of CNN methods on AID and NWPU.
    Methods | Roadmap | Param size (M) | AID (TR = 20%) | NWPU (TR = 10%)
    [10] in 2018 | Fine-tuning loss | 23.4 | 90.82 ± 0.16 | 89.22 ± 0.50
    [11] in 2019 | Feature fusion | 276 | 95.02 ± 0.28 | 91.30 ± 0.18
    [12] in 2019 | CNNs ensemble | 167 | None | 92.44 ± 0.37
    [13] in 2020 | Attention aided | 14 | 95.73 ± 0.22 | 92.70 ± 0.32
    [14] in 2020 | Feature fusion | 24 | None | 92.17 ± 0.08
    [15] in 2021 | Attention aided | 24 | 94.45 ± 0.76 | None
    [16] in 2022 | Models combination | 54 | 96.19 ± 0.48 | 93.67 ± 0.21
    [17] in 2022 | Attention aided | 8 | 95.43 ± 0.32 | 93.39 ± 0.39
    [18] in 2022 | CNNs ensemble | 47 | 96.20 ± 0.15 | 93.24 ± 0.15
    [this work] | Single EfficientNet-B0 | 5.3 | 96.08 ± 0.07 | 93.47 ± 0.05


    To tackle these problems, we propose a simple, lightweight but accurate method for RSI-SC named regularization-enhanced EfficientNet (RE-EfficientNet). It consists of a single EfficientNet-B3 model and a concise training algorithm called RE-CNN. EfficientNet-B3 has better performance on ImageNet-1K and far fewer parameters compared to other classical CNNs such as ResNet. Some experiments also show that EfficientNet-B3 can achieve better performance on RSI sets than other CNNs [19]. The RE-CNN algorithm includes some specific ideas to handle the aforementioned problems arising from DA and from the simple copying of TTs' usage. It presents an effective combination of regularizations that can significantly boost the CNN's performance. The experimental results on AID and NWPU show that RE-EfficientNet can prevail over 30 SOTA CNN and ViT methods published before 2023. The study's contributions are summarized as follows:

    First, the work presents a simple, lightweight but more accurate single-CNN method for classifying RSI. The RE-EfficientNet only consists of an off-the-shelf EfficientNet-B3 with 12 M parameters and employs a replicable open-source training algorithm through transfer learning. Extensive experimental results on AID and NWPU prove that RE-EfficientNet can surpass other SOTA methods with not only remarkable OA improvements but also much fewer parameters.

    Second, this work demonstrates that modified TTs based on the inherent characteristics of RSI can significantly improve a CNN's performance for RSI-SC. Ablation experimental results prove that the RE-CNN algorithm can boost a CNN's performance by 1% OA if the TTs are effectively combined.

    Third, the work proves that classification tasks for RSI can be better performed using a simpler roadmap at lower hardware and time costs. The easily accessible EfficientNet-B3 and transfer learning pipeline mean that RE-EfficientNet is a time-saving and beginner-friendly solution. In addition, in the authors' view, the model training ideas proposed in this paper can also help develop more efficient approaches in the future.

    The rest of the paper is organized as follows: Section 2 is a brief review of related works. Sections 3 and 4 present the proposed theory and method. Section 5 presents the experimental results, and Sections 6 and 7 present the discussion and conclusions.

    In the beginning, CNNs were used as fixed feature extractors for RSI-SC due to a lack of sufficient knowledge accumulation. For example, with CNNs pre-trained on ImageNet-1K, Chaib et al. [20] directly took the model's outputs as the representation of RSI but skipped retraining. The method greatly improved OA compared to previous approaches using shallow models. However, this roadmap lacks effective retraining on RSI sets, which limits its performance. Afterward, some researchers proposed another roadmap by retraining the CNN on RSI sets using fine-tuned objective functions. For example, Cheng et al. [9], Liu et al. [21,22] and Bazi et al. [23] proposed new auxiliary losses for training CNNs. These ideas involve complex human-modeling procedures and indicators but do surpass the extractor roadmap.

    Following that, researchers tried more creative methods to find better solutions. These ideas mainly include architecture fine-tuning, feature fusion and model combination. For example, Zhang et al. [24] proposed a method combining two CNNs with capsule nets to verify the effectiveness of multi-model combinations. Xie et al. [25] proposed another approach by fine-tuning the CNN's last pooling layer to make the model capable of processing input images of arbitrary resolution. Using the layer's activated information, Zhu et al. [10], Guo et al. [11] and Li et al. [14] proposed similar methods that choose the relatively important features to boost the model's performance. Sun et al. [26] proposed a method that fuses outputs from a CNN's different layers to generate so-called semantic and appearance features that boost the model's performance. These methods presented meaningful novelty, but the improvements were not very remarkable.

    With the success of vision transformers (ViTs), attention mechanisms were widely introduced into a CNN's architecture for gaining advances. For example, Tong et al. [13], Guo et al. [27] and Tang et al. [28] proposed similar methods by adding attention modules to a pre-trained CNN and verified that attention-enhanced CNNs can be better for RSI-SC than the original models without attention modules. Moreover, by using pre-trained models with built-in attention modules, Alhichri et al. [15], Li et al. [16] and Chen et al. [17] proposed other methods by fusing the CNN's outputs with human-engineered indicators or feeding the CNN's outputs into a sequential model to boost the method's performance. All these methods are creative and achieve better performance to some extent, but the final performance still lacks exciting advancement.

    In addition, Minetto et al. [12] and Zhao et al. [18] proposed two different ensemble methods using multiple CNNs. The first method employs twelve CNNs coarsely trained on RSI sets and uses the weighted outputs of the individual models as votes for prediction. The second method employs a single CNN as the main classifier but embeds multiple different CNNs as functional branch modules to boost the method's performance. These two methods are competitive for RSI-SC to some extent. However, they also have the disadvantages of high hardware costs and tedious training procedures. As ViTs arose, Bazi et al. [29], Zhang et al. [30] and Wang et al. [31] proposed different methods using a single ViT and proved that ViTs can be as competitive as CNNs for RSI-SC. However, all these methods' performances are still unremarkable compared to the results shown in Table 1.

    Moreover, researchers also proposed other creative approaches to find better solutions for RSI-SC. Shi et al. [32] proposed a single-CNN method by using a so-called self-compensating module to obtain a lightweight model that can achieve acceptable performance compared to ready-made CNNs. Chen et al. [33] proposed a single-CNN approach by inserting two different attention modules as functional branches in a pre-trained CNN to find performance advances. Deng et al. [34] proposed a dual-model combining method by fusing the outputs of a pre-trained CNN and ViT and showed competitive performance. Miao et al. [35] proposed a multi-granularity decoupling CNN method by using class-imbalanced pseudo-label selection. It aims to tackle CNN's dependence on a great number of training samples and shows acceptable performance. Song et al. [36] proposed a single CNN method by applying clustering to the extracted features and demonstrated improved performance when training samples were limited. Wang et al. [37] proposed a multi-CNN method by employing two CNNs as cooperative classifiers and an adaptive image size transformer. The model can achieve competitive performance when processing images with different resolutions. Xu et al. [38] proposed a knowledge distillation method by distilling out the long-range features contained in a ViT teacher and transferring them to a CNN student. However, all these methods still share a poor tradeoff between hardware costs and accuracy.

    Currently, regularization and DA are commonly used in CNN training, although these techniques also need professional implementation [39]. For example, with finite training epochs, too strong a regularization results in poor fitting, and likewise, too weak a regularization leads to overfitting. Similarly, a DA-processed training set will have a significant shift in data distribution, making the model suboptimal if the shift is not well handled [7]. Moreover, most TTs are developed on ImageNet-1K, which has a domain gap compared to RSI. Nevertheless, TTs are very attractive due to their low hardware dependency and common open-source nature. Therefore, we should carefully study the usability of TTs before using them. However, most of the previous methods simply copied the TTs' procedures from natural images without any modification.

    To tackle the above problems and build on the findings of previous studies, we first proposed some novel ideas for the model choice and training algorithm based on an in-depth theoretical analysis. Then, we selected a simple pipeline that not only achieves better performance for RSI-SC but also has much lower hardware and time costs. First, the proposed RE-EfficientNet method employs only an EfficientNet-B3 as the base model and excludes any re-modeling procedures to keep the parameter search space unexpanded. Second, to adequately utilize the pre-trained weights from ImageNet-1K, the method still employs transfer learning to reduce the time cost of training. Third, the RE-CNN training algorithm employs two different combinations of DA transformations over the whole training process to ensure that the data distribution shift introduced by DA is alleviated. Fourth, we modified two TTs, named CutMix [40] and Online Label Smoothing (OLS) [41], according to the inherent properties of RSI. The RE-CNN algorithm then employs the combination of modified CutMix (m-CutMix) and modified OLS (m-OLS) as regularization in training to boost the model's performance. In brief, the RE-EfficientNet method is a simple, lightweight but more accurate approach for RSI-SC, and its pipeline is substantially different from those of previous studies.

    The CNN learns image features through the operation of convolution. In the architecture, the implementation of convolution is called the convolutional layer. Let $X \in \mathbb{R}^{C \times H \times W}$ be the input signal of a convolutional layer, where C, H and W denote its channel number, height and width in pixels. Let $Y \in \mathbb{R}^{C_O \times H_O \times W_O}$ be the layer's output, with $C_O$, $H_O$ and $W_O$ defined analogously to C, H and W. Then, the convolution processing can be described as follows:

    $$ Y = \sum_{n=1}^{C} Wt_{C_O} \times X + Bias_{C_O} \qquad (3.1) $$

    where Wt denotes the adjustable weights of every convolution kernel, and Bias denotes a random variable.
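    As a quick numerical illustration of Eq (3.1), the snippet below (an example, not part of the paper) verifies that one output element of a convolutional layer equals the element-wise multiply-accumulate over the kernel window plus the bias term.

```python
# Numerical check of Eq (3.1): an output element of a convolution is an
# element-wise multiply-accumulate over the kernel window plus a bias.
import torch
import torch.nn.functional as F

C, H, W = 3, 5, 5          # input channels, height, width
C_O, K = 2, 3              # output channels, kernel size
X = torch.randn(1, C, H, W)
Wt = torch.randn(C_O, C, K, K)
Bias = torch.randn(C_O)

Y = F.conv2d(X, Wt, Bias)  # library convolution, stride 1, no padding

# Recompute the top-left element of output channel 0 by hand:
window = X[0, :, 0:K, 0:K]
manual = (window * Wt[0]).sum() + Bias[0]
print(torch.allclose(Y[0, 0, 0, 0], manual, atol=1e-5))  # True
```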

    In training, based on the common gradient descent technique, we employ the back-propagation algorithm to fit a training set. Let Z be the output of the layer following the one defined in Eq (3.1), with the same Y, Wt and X as in Eq (3.1). Then, the gradient of Z with respect to Wt can be described as

    $$ \frac{\partial Z}{\partial Wt} = \frac{\partial Z}{\partial Y} \cdot \frac{\partial Y}{\partial Wt} \qquad (3.2) $$

    The convolution processing, as shown in Eq (3.1), is an element-wise multiplication and accumulation operation in mathematics. Then, according to the chain rule in differential calculus, the gradient of Z with respect to a single weight $Wt_i$ can also be described as

    $$ \frac{\partial Z}{\partial Wt_i} = \frac{\partial Z}{\partial Y_1}\frac{\partial Y_1}{\partial Wt_i} + \frac{\partial Z}{\partial Y_2}\frac{\partial Y_2}{\partial Wt_i} + \cdots + \frac{\partial Z}{\partial Y_m}\frac{\partial Y_m}{\partial Wt_i}. \qquad (3.3) $$

    According to Eqs (3.1) and (3.3), we can derive that every parameter of a convolutional kernel shares the same gradient value in a back-propagation process due to the addition operation. Hence, in the simplest case, we can treat every kernel of a layer as a single meta-function. Let F be the meta-function, with the same definitions of X, Y, $H_O$, $W_O$ and $C_O$ as in Eq (3.1). Then, the sliding window operation of a convolutional layer can be described as

    $$ Y_C = \begin{bmatrix} F_{C_O,1,1}(X) & \cdots & F_{C_O,W_O,1}(X) \\ \vdots & \ddots & \vdots \\ F_{C_O,1,H_O}(X) & \cdots & F_{C_O,W_O,H_O}(X) \end{bmatrix}. \qquad (3.4) $$

    Currently, the architecture of deep CNNs commonly consists of cascaded convolutional layers with a fully connected classifier at the tail. Let Pred be the prediction of a CNN model with i neurons and k layers. Let L denote the operation of every convolutional layer of the model, with the same definition of X as in Eq (3.1). Then, in the simplest case, the predicting process of the CNN can be described as follows:

    $$ Pred = \sum_{n=1}^{i} L_k(L_{k-1}(\cdots L_1(X))). \qquad (3.5) $$

    In the back-propagation process, all the parameters of a CNN are set to certain values at each training step. The state of all these parameters, according to the definitions of permutation and combination in probability theory, is a combination problem. Let us hypothesize that, in the simplest case, the value of every parameter can only be 0 or 1. Let $C_{NPSC}$ be the number of parameter-setting combinations (NPSC). Then, the NPSC for a certain CNN layer with N parameters can be described as follows:

    $$ C_{NPSC} = 2^N. \qquad (3.6) $$

    Let $C_{NPSC\text{-}CNN}$ be the NPSC for a whole CNN with M layers and the same number of N parameters in each layer. Then, according to Eq (3.6), its NPSC can be described as

    $$ C_{NPSC\text{-}CNN} = 2^{N \times M}. \qquad (3.7) $$

    Note that, in real training, a single CNN parameter can take any value in a continuous interval rather than only 0 or 1. As a result, the real NPSC is much larger than $C_{NPSC\text{-}CNN}$ in Eq (3.7). The CNN prediction function in Eq (3.5) is a complex composite function of multidimensional variables. To date, we commonly employ the average gradient of a mini-batch to train a CNN within finite training epochs because an exhaustive search is unrealistic. Hence, training a CNN is essentially a non-convex problem with many different locally optimal solutions reachable in training. Therefore, a method with a larger NPSC corresponds to a lower probability of reaching a good locally optimal solution unless the searching process employs some efficient mode, such as regional sampling or greedy searching. Unfortunately, to the authors' best knowledge, such an efficient and sufficiently general training algorithm has not emerged yet. As a result, we can conclude that a CNN with a smaller NPSC corresponds to a higher probability of achieving its best potential performance within finite training epochs, or, in other words, a shallower model achieves convergence more easily.
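    The following toy computation (with arbitrary example sizes) illustrates how quickly the NPSC of Eqs (3.6) and (3.7) grows under the binary-parameter assumption made above.

```python
# Toy illustration of Eqs (3.6) and (3.7) under the binary-parameter assumption.
N = 8   # parameters per layer (example value)
M = 4   # number of layers (example value)

C_NPSC_layer = 2 ** N        # Eq (3.6): combinations for one layer
C_NPSC_CNN = 2 ** (N * M)    # Eq (3.7): combinations for the whole CNN

print(C_NPSC_layer)  # 256
print(C_NPSC_CNN)    # 4294967296
```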

    The attention module in a CNN's architecture commonly employs one of two techniques, channel-wise or spatial-wise attention, to re-weight each output stream of the current layer. For example, Hu et al. [42] proposed the squeeze-and-excitation (SE) block as channel-wise attention to boost a CNN's performance. The SE structure, as shown in the red rectangle of Figure 1, performs SE and a scale operation on the inputs and outputs of the current layer, respectively. Let X and $X_{Scale}$ be the original and weighted signals as defined in Eq (3.1), with the same W, H and C. Then, the scale operation can be described as follows:

    $$ X_{Scale\text{-}C} = X_C \times \begin{bmatrix} Wt_{SE\text{-}C} & \cdots & Wt_{SE\text{-}C} \\ \vdots & \ddots & \vdots \\ Wt_{SE\text{-}C} & \cdots & Wt_{SE\text{-}C} \end{bmatrix}, \qquad (3.8) $$
    Figure 1.  Squeeze-and-excitation block.

    where $Wt_{SE} \in [0, 1]$ denotes the SE output weight matrix. Note that $Wt_{SE}$ still has C channels after channel-wise compression, and within the same channel, all weights are equal.

    After pre-training, the weights of $Wt_{SE}$ in Eq (3.8) are set to larger or smaller values correlated with the channel-wise features' importance. In Eq (3.8), X multiplied by $Wt_{SE}$ yields $X_{Scale}$, so the channels of $X_{Scale}$ multiplied by larger weights become more important. In the back-propagation process, according to Eq (3.3), the gradient of the prediction is more sensitive to the parts with high information entropy. Therefore, with the SE attention activated, each iteration step in training becomes sensitive to partial channel-wise regions of the CNN's architecture, so the other regions are only slightly updated. Then, according to Eq (3.7), with the attention module inside, the NPSC of a pre-trained CNN is effectively smaller than that of one without attention because the former's searching process always ignores part of its channels. Therefore, given finite training steps, a CNN with built-in attention modules commonly performs better. In practice, the class activation mapping technique [43] is often employed to visualize this selective importance intuitively.
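    For reference, a minimal PyTorch sketch of such an SE block is given below. It follows the generic design of Hu et al. [42]; the reduction ratio of 16 is the commonly used default, not a value taken from this paper.

```python
# Minimal squeeze-and-excitation (SE) block: squeeze (global pooling), excitation
# (two linear layers + sigmoid), then channel-wise scaling as in Eq (3.8).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # one value per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                        # channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)           # squeeze: (B, C)
        w = self.excite(w).view(b, c, 1, 1)      # excitation: per-channel weight
        return x * w                             # scale: broadcast over H and W

# x = torch.randn(2, 64, 32, 32); y = SEBlock(64)(x)  # y has the same shape as x
```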

    In the feature fusion method, however, the NPSC is always larger than that without fusion if the feature extractors are hooked in the same back-propagation chain. Let $C_{NPSC\text{-}FF}$ be the NPSC of a fusion method, with k types of deep features extracted from different layers of a single CNN. Let us hypothesize that the method concatenates all the fused features as the input of the last classifier, or, more accurately, hooks all the features in the chain. Then, its NPSC, according to Eq (3.7), can be described as follows:

    $$ C_{NPSC\text{-}FF} = \prod_{j=1}^{k} C_{NPSC\text{-}ET}, \qquad (3.9) $$

    where $C_{NPSC\text{-}ET}$ is the NPSC of a certain extractor. Comparing Eqs (3.7) and (3.9), we can easily find that, with the larger NPSC, fusion methods that concatenate all the fused features have a lower chance of reaching good locally optimal solutions than methods without fusion, given equal training steps. Therefore, with finite training epochs, the common fusion strategy may face a lower probability of achieving better performance unless it employs some efficient searching mode or unhooks the features.

    Similarly, the method of combining CNNs will have a much larger NPSC if the combined CNNs are in cascaded mode. Theoretically, the method's NPSC is the product of the NPSCs of each CNN. Similarly, with multiple CNNs inside, the NPSC of the combined method in parallel mode is the same as the cascaded CNNs if the combined CNNs are hooked. Therefore, a combining-CNN method also requires some efficient searching algorithms to ensure its true performance can be achieved in finite training steps.

    However, the ensemble of CNNs does not increase the NPSC of a single CNN due to its independent training process for its individual classifiers. Similarly, this rule is also right if a method has multiple CNNs but employs independent training processes for each model, or more accurately, unhooks its multiple CNNs. Nonetheless, these two strategies are more complicated and costly than a single CNN method.

    The prevalence of human re-modeling mainly derives from two ideas: First, human modeling did succeed in traditional ML fields. Second, given the imaging conditions that differ from ImageNet-1K, the domain gap may weaken a pre-trained CNN's performance on RSI. These viewpoints, however, are not entirely correct, for two reasons. First, as concluded before, some complex human re-modeling strategies reduce the probability of finding better solutions, while a deep CNN easily overfits on the smaller RSI sets. Second, the invariant features learned on ImageNet-1K do share similarities with RSI. Given an all-layer-frozen CNN, we can still achieve an OA of approximately 75% on AID and NWPU if only the last classifier is trained (see Table 2).

    Table 2.  OA (%) comparison of frozen tests on AID and NWPU.
    Methods | Roadmap | AID (TR = 20%) | AID (TR = 50%) | NWPU (TR = 10%) | NWPU (TR = 20%)
    Frozen CNN layers with a trainable classifier | An EfficientNet-B0 | 78.66 | 83.42 | 72.74 | 76.87


    Based on the above mathematical derivations and analyses, this work employed a simpler transfer learning strategy by using a pre-trained CNN that has built-in attention modules. In particular, the proposed method excludes any architecture modification or feature fusion to maintain the unexpanded NPSC. Given this premise, this work also cautiously evaluated the real value of the invariant feature in the smaller RSI sets. More importantly, it employed a smaller LR and an effective combination of TTs as regularization to avoid overfitting.

    In this paper, we propose a concise and beginner-friendly training framework, as shown in Figure 2. In summary, the training process consists of 300 successive training epochs in total but can be viewed as two steps because the DA and regularization strategy (DARS) differs between them. In detail, the DARS of the first 60 epochs, named Step 1, consists only of a cascading combination of routine geometric transformations, coupled with m-OLS as regularization. The DARS of the remaining 240 epochs, called Step 2, includes the same cascading combination of routine geometric transformations but additionally uses m-OLS and m-CutMix as regularizations.

    Figure 2.  The proposed training framework.

    The training procedures, as shown in Figure 2, start with the red arrows with the DARS of Step 1 activated. Then, at epoch 61, the model undergoes another 240 training epochs using the DARS of Step 2, just as the blue arrows indicate.

    The procedures of RE-CNN are presented in Algorithm 1. In training, the training and testing images' resolutions are 256 × 256. In detail, the combination of transformations in Steps 1 and 2 is the same, consisting of a resize, a color jitter, a horizontal and vertical flip and a rotation. The mini-batch size in training is 30. The initial LRs in Steps 1 and 2 are both 1E-04. The LR scheduler is a cosine decay algorithm with maximum iterations equal to the number of training epochs. The objective function is cross-entropy. The parameter update algorithm is AdamW [44], with a weight decay of 1E-06.
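    The snippet below sketches this configuration in PyTorch/torchvision. The jitter and rotation magnitudes are illustrative assumptions, since the text only names the transformation types; the optimizer, scheduler and loss follow the values stated above.

```python
# Sketch of the stated training configuration: 256x256 inputs, geometric DA,
# AdamW (weight decay 1E-06), initial LR 1E-04, cosine decay, cross-entropy loss.
import torch
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # assumed strengths
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=90),                                 # assumed range
    transforms.ToTensor(),
])

def build_optimization(model, epochs):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-6)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = torch.nn.CrossEntropyLoss()
    return optimizer, scheduler, criterion
```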

    Algorithm 1: The procedures of RE-CNN.
    Definition.
    X: a mini-batch of image samples; Y: the corresponding labels of X; TNI: the total number of images in the training dataset; CT: a combination of transformations; CM: the m-CutMix transformation; CMLoss: the corresponding loss function of CM; OLSLoss: the corresponding loss function of m-OLS; F: the forward function of the CNN model; Update: the parameter updating function of CNN through back propagating; Acc: the model's prediction accuracy; Results: the dictionary of Acc and its weights file.
    1 CT = { resize, color Jitter, horizontal and vertical flip, rotation }
    2 Initialize (model, pre-trained weights file from ImageNet-1K)
    3 for Epoch = 1, …, 60 do
    4      for batch = 1, …, TNI / 30 do
    5       Z = F (CT (X))
    6     Grad = OLSLoss (Z, Y)
    7     Update (model, Grad)
    8     end for
    9   Acc = F (Testing Dataset)
    10   If Acc is the best then
    11     Save the weights file of the best Acc in Results
    12 end for
    13 Initialize (model, the weights file of the best Acc in Results)
    14 for Epoch = 61, …, 300 do
    15      for batch = 1, …, TNI / 30 do
    16       Z = F (CM (CT (X)))
    17     Grad = CMLoss (OLSLoss (Z, Y))
    18     Update (model, Grad)
    19     end for
    20   Acc = F (Testing Dataset)
    21   If Acc is the best then
    22     Save the weights file of the best Acc in Results
    23 end for

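    To make the two-step schedule of Algorithm 1 easier to follow, a simplified Python rendering is given below. The helpers `m_ols_loss`, `m_cutmix`, `cm_loss` and `evaluate` are hypothetical stand-ins for the modified OLS and CutMix described in the following subsections; this is a sketch of the control flow, not the authors' released implementation.

```python
# Simplified control flow of Algorithm 1 (Step 1: epochs 1-60 with m-OLS only;
# Step 2: epochs 61-300 with m-OLS + m-CutMix). All helper callables are passed
# in by the caller and are hypothetical stand-ins for the paper's components.
import copy

def run_re_cnn(model, train_loader, test_loader, optimizer, scheduler,
               evaluate, m_ols_loss, m_cutmix, cm_loss):
    best_acc, best_state = 0.0, None

    def maybe_save():
        nonlocal best_acc, best_state
        acc = evaluate(model, test_loader)
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())

    # Step 1: routine geometric DA + m-OLS regularization.
    for epoch in range(1, 61):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = m_ols_loss(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
        maybe_save()

    # Step 2: reload the best weights, then add m-CutMix on top of m-OLS.
    model.load_state_dict(best_state)
    for epoch in range(61, 301):
        for x, y in train_loader:
            x_mix, y_a, y_b, lam = m_cutmix(x, y)         # cut-and-mix per Algorithm 2
            optimizer.zero_grad()
            loss = cm_loss(model(x_mix), y_a, y_b, lam)   # mixed loss, cf. Eq (3.13)
            loss.backward()
            optimizer.step()
        scheduler.step()
        maybe_save()

    return best_acc, best_state
```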

    RE-EfficientNet employs a single EfficientNet-B3 model as the base model, while EfficientNet-B0 is also used to support some viewpoints. EfficientNets are well-known deep CNN models with built-in SE blocks, and their architecture charts can be found in [8]. The attention mechanism, as concluded before, helps EfficientNets gain better performance on ImageNet-1K. In this work, all the CNN models, as well as the pre-trained weights on ImageNet-1K, follow the original settings in PyTorch without any modification.

    Szegedy et al. [45] proposed label smoothing as a regularization to boost performance and avoid overfitting. This technique converts the one-hot hard label of 1 or 0 to a soft version. For example, a value of 0.9 denotes the probability of a certain sample belonging to class A, while the residual value of 0.1 denotes the sum of the likelihoods of belonging to the other classes. In this setting, the residual 0.1 is shared equally among the other classes without considering the differences in similarity between classes. To tackle this problem using an online-updated training style, Zhang et al. [41] proposed OLS as a solution that learns category similarity from training sets. Let $L_{OLS}$ be the final label of a subclass, $L_H$ denote the hard label and $L_S$ denote the soft label value of OLS. Then, the computation of a final label can be described as follows:

    $$ L_{OLS} = \alpha \times L_H + (1-\alpha) \times L_S, \qquad (3.10) $$

    where α (∈ [0, 1]) is a hyperparameter to balance the hard and soft labels. In training, OLS stores and accesses the labels in a matrix variable of the categories' similarity. Let $Matrix_S$ be this soft label matrix and n be the number of classes in a dataset. Then, the data shape of $Matrix_S$ can be described as follows:

    $$ Matrix_S = \begin{bmatrix} label_{1,1} & \cdots & label_{1,n} \\ \vdots & \ddots & \vdots \\ label_{n,1} & \cdots & label_{n,n} \end{bmatrix}, \qquad (3.11) $$

    where $label_{1,n}$ denotes the updated probability that a first-class sample belongs to class n. The updating algorithm for $Matrix_S$ can be found in [41].

    Experiments on ImageNet-1K in [41] set the α in Eq (3.10) to 0.5 and proved that OLS is more effective than label smoothing. In this paper, based on the obviously larger differences in similarity between RSI subclasses, α is given an empirical value of 0.9.
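    The small example below shows the soft-label computation of Eq (3.10) with the empirical α of 0.9; the three-class similarity row is made-up illustrative data, not taken from AID or NWPU.

```python
# Soft-label computation of Eq (3.10) with alpha = 0.9 (illustrative 3-class example).
import torch

alpha = 0.9
num_classes = 3
target = 1                               # ground-truth class index

L_H = torch.zeros(num_classes)
L_H[target] = 1.0                        # one-hot hard label

# One row of the online-updated similarity matrix Matrix_S (Eq (3.11)),
# i.e., the current soft label for the target class (made-up values).
L_S = torch.tensor([0.15, 0.70, 0.15])

L_OLS = alpha * L_H + (1 - alpha) * L_S  # Eq (3.10)
print(L_OLS)                             # tensor([0.0150, 0.9700, 0.0150])
```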

    Mixup, proposed by Zhang et al. [46], can help a CNN avoid overconfidence on the training dataset. In implementation, it overlaps an A-class sample onto another B-class sample with a fuzzy overlap rather than a complete replacement. Consequently, in training, the model learns from an unnatural image. As an improved version, Yun et al. [40] proposed CutMix and gave its operation algorithm in their paper. Different from Mixup, CutMix directly replaces an equal proportion of a B-class sample with a cut patch from an A-class sample. CutMix then modifies the label of the cut-and-mixed image according to the proportion of the patch. Let $L_{CM}$ be the label of a cut-and-mixed image, $L_O$ be the label of an original image and λ be the proportion of the patch. Then, $L_{CM}$ can be described as follows:

    $$ L_{CM} = \lambda \times L_{O\text{-}A} + (1-\lambda) \times L_{O\text{-}B}. \qquad (3.12) $$

    Experiments on ImageNet-1K proved CutMix combined with OLS can further improve CNN performance. However, with the same experiments on RSI, it may not work that well. The reason lies in Figure 3.

    Figure 3.  The cut-and-mixed images.

    In Figure 3, rows 1 and 2 are images from ImageNet-1K, with the originals in the first row and the cut-and-mixed ones in the second. At the bottom of Figure 3, rows 3 and 4 are the originals and cut-and-mixed images from the RSI dataset. Taking the natural images for instance, even without zooming in, we can easily tell where the image patch is and classify both the patch and the background. However, with a quick look at the RSI without zooming in, it becomes more difficult to locate the patch and tell the cut-and-mixed image's category as the semantic scenes change from left to right.

    This contradiction in Figure 3 clearly illustrates the large difference in similarity between natural images and RSI. The definition in Eq (3.12) works well for natural images; however, it does not hold as well for RSI. More specifically, for RSI, the patch from an A-class image does have some similarity with the cut-and-mixed B-class background. In training, if we assign a B-class label to an A-class patch, the CNN model will correlate some features in the patch with the B-category and weaken their correlation with the A-category. As a result, with the impaired features of the A-class samples, the model will show poor performance when classifying samples similar to the A-category. Hence, we argue that TTs should be modified based on the inherent differences of RSI. Therefore, we rebuilt several parts of CutMix, including its algorithm and usage, as follows:

    First, based on the greater inter-class similarity of RSI, this paper introduces a new parameter to control the probability of a cut-and-mix operation in training. This parameter is given an empirical value of 0.1 based on extensive experimental results. Second, also based on the similarity of RSI, this paper further utilizes the hierarchical similarity among categories contained in $Matrix_S$ as defined in Eq (3.11). More specifically, in each training epoch, the image patch of m-CutMix is only cut from images belonging to the 15 classes with the lowest similarity to the current sample's class. The m-CutMix algorithm is shown in Algorithm 2.

    Algorithm 2: The cut-and-mix procedures of m-CutMix.
    Definition.
    X: a mini-batch of samples; Z: a mini-batch of cut-and-mixed samples; CMO: the original CutMix function; P: the probability of a cut-and-mix operation in training (empirically 0.1); MatrixS: the matrix containing the hierarchical similarity among categories updated by OLS; sort: the sorting function; random: the random pick-one-out function; rand: a uniform random draw from [0, 1].
    1 for x in X do
    2   if rand(0, 1) ≤ P then
    3      index = sort (MatrixS (x), type = ascent)
    4      target = random (index [0 : 15])
    5      z = CMO (x, X [target])
    6      Z.append(z)
    7   else
    8      Z.append(x)
    9 end for


    In Step 2, m-CutMix is active in conjunction with m-OLS. Hence, the label of each training sample should be recalculated according to Eqs (3.10) and (3.12). Let $L_{Step2}$ be the sample's label in Step 2; then its calculation can be described as follows:

    $$ L_{Step2} = L_{CM}(L_{O\text{-}A}, L_{O\text{-}B}) = \lambda \times L_{O\text{-}A} + (1-\lambda) \times L_{O\text{-}B}, \qquad (3.13) $$

    where $L_O$ here denotes the per-class soft label produced by m-OLS in Eq (3.10), and the mixing follows $L_{CM}$ in Eq (3.12).
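    Under this reading of Eq (3.13), the Step-2 label can be computed as in the sketch below: the m-OLS soft labels of the patch class (A) and the background class (B) are mixed with the CutMix proportion λ. The similarity matrix and λ value are illustrative assumptions, not values from the paper.

```python
# One possible reading of the Step-2 label (Eqs (3.10)-(3.13)): mix the m-OLS soft
# labels of the patch class A and the background class B with proportion lambda.
import torch

def step2_label(matrix_s, class_a, class_b, lam, alpha=0.9):
    """Mixed soft label for a cut-and-mixed sample (class-A patch on a class-B image)."""
    n = matrix_s.shape[0]

    def ols_label(c):
        hard = torch.zeros(n)
        hard[c] = 1.0
        return alpha * hard + (1 - alpha) * matrix_s[c]   # Eq (3.10)

    return lam * ols_label(class_a) + (1 - lam) * ols_label(class_b)  # Eqs (3.12)/(3.13)

# Example with a made-up 3-class similarity matrix and lambda = 0.3:
matrix_s = torch.tensor([[0.8, 0.1, 0.1],
                         [0.1, 0.8, 0.1],
                         [0.1, 0.1, 0.8]])
print(step2_label(matrix_s, class_a=0, class_b=2, lam=0.3))
```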

    This study employs AID and NWPU as benchmarks to verify the method's effectiveness. Introductions to the two datasets can be found in [5,6]. As a brief summary, both sets are cropped from Google Earth. AID has 30 categories with 10,000 images at a fixed resolution of 600 × 600, while NWPU contains 45 categories with 31,500 images at a fixed resolution of 256 × 256. With the categories numbered, representative images for each AID category are shown in Figure 4, while those for NWPU are shown in Figure 5.

    Figure 4.  Sample images for each AID category.
    Figure 5.  Sample images for each NWPU category.

    In this study, to get a fair comparison, the TRs of two datasets are used with the same settings as previous studies, i.e., AID is 20% and 50%, and NWPU is 10% and 20%. The training subsets are also selected at random, using the rest of the samples as testing subsets. This paper also employs the OA and confusion matrix as criteria for different methods' effectiveness evaluations. In detail, the OA can be described as follows:

    $$ OA = \frac{N_C}{N_T}, \qquad (3.14) $$

    where $N_C$ is the total number of correctly classified samples, and $N_T$ is the total number of tested samples.
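    As a minimal example, the OA of Eq (3.14) and the confusion matrix can be computed from predictions as shown below; scikit-learn is used here only for illustration and is not referenced by the paper.

```python
# Computing Eq (3.14) and a confusion matrix from toy predictions.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2])   # toy ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])   # toy predictions

oa = (y_pred == y_true).sum() / len(y_true)   # Eq (3.14): N_C / N_T
cm = confusion_matrix(y_true, y_pred)

print(f"OA = {oa:.2%}")   # OA = 66.67%
print(cm)
```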

    The experiments were performed on four computers, each equipped with a single RTX 2060. PyTorch 1.11.0 was running with the Compute Unified Device Architecture 11.5 on Windows 10. All the experimental results were averaged over five runs.

    The training curves at different TRs are shown in Figure 6, with the top two charts corresponding to loss curves and the bottom two charts being testing accuracy curves. As shown in Figures 6a and 6b, the training losses of AID and NWPU both show a fast decrease in Step 1, rebound with obvious amplitude at the beginning of Step 2 and then present a slow decrease in the following epochs. As mentioned before, the loss calculations in Steps 1 and 2 are very different according to Eqs (3.10) and (3.13).

    Figure 6.  The fitting curves of AID and NWPU.

    Hence, as shown in Figure 6c,d, the testing accuracy (i.e., the OA) in Step 2 achieves a visible improvement on AID as well as NWPU. Meanwhile, when m-CutMix was activated, the OA curves showed a slight but clear decline at the beginning of Step 2. In other words, the cut-and-mix operation does have a disruptive effect on the model's performance, but with a cautious reconstruction, it can still help the CNN model gain better accuracy. This phenomenon indicates that a higher frequency of cut-and-mix operations may generate greater impacts on the loss and gradient changes. As some evidence in the experimental results of [8] suggests, this would result in larger parameter updates through back-propagation and make the CNN suboptimal.

    The OA comparison of different methods published from 2017 to 2023 is shown in Table 3. In summary, the OA results include 30 previous studies, consisting of 23 CNN and 7 ViT methods. The "Parameter size" column gives the exact self-reported value or a minimum estimate according to the open information in the related papers. In addition, "Not mentioned" means the exact information is lacking, while "None" corresponds to no related information.

    Table 3.  OA (%) comparison of different methods on AID and NWPU.
    Methods | Roadmap | Parameter size (M) | AID (TR = 20%) | AID (TR = 50%) | NWPU (TR = 10%) | NWPU (TR = 20%)
    [10] in 2019 | Feature fusion | 23.4 | 93.68 ± 0.29 | 94.75 ± 0.24 | 90.58 ± 0.19 | 91.91 ± 0.23
    [11] in 2019 | Feature fusion | 276 | 95.02 ± 0.28 | 96.66 ± 0.19 | 91.30 ± 0.18 | 93.45 ± 0.17
    [20] in 2017 | Feature fusion | 138 | None | 89.71 ± 0.33 | None | None
    [24] in 2019 | Feature fusion | 145 | 91.63 ± 0.19 | 94.74 ± 0.17 | 89.18 ± 0.14 | 85.08 ± 0.13
    [25] in 2019 | Feature fusion | 138 | 93.60 ± 0.12 | 96.66 ± 0.11 | 89.89 ± 0.16 | 92.55 ± 0.14
    [14] in 2020 | Feature fusion | 24 | 93.33 ± 0.29 | 95.38 ± 0.29 | 92.17 ± 0.08 | 92.46 ± 0.09
    [23] in 2019 | A single EfficientNet-B3 with feature fusion | > 12 | 94.19 ± 0.15 | None | 91.08 ± 0.14 | None
    [9] in 2018 | Fine-tuning loss | 138 | 90.82 ± 0.16 | 91.89 ± 0.22 | 89.22 ± 0.50 | 91.89 ± 0.22
    [21] in 2019 | Models combination with fusion and fine-tuning loss | 633 | None | 96.37 ± 0.30*1 | None | 93.27 ± 0.17*1
    [22] in 2019 | Models combination with fusion and fine-tuning loss | 237 | None | 97.24 ± 0.32*1 | None | None
    [13] in 2020 | Attention aided | 14 | 95.73 ± 0.22 | 97.16 ± 0.26 | 92.70 ± 0.32 | 94.58 ± 0.26
    [17] in 2022 | Attention aided | 8 | 95.43 ± 0.32 | 97.39 ± 0.24 | 93.39 ± 0.39 | 94.95 ± 0.36
    [15] in 2021 | Attention aided | 24 | 94.45 ± 0.76 | 96.56 ± 0.12 | None | None
    [27] in 2020 | Attention aided | 89 | None | None | 93.15 ± 0.09 | 94.86 ± 0.15
    [33] in 2022 | Attention aided | 26 | 95.60 ± 0.17 | 97.14 ± 0.03 | 92.32 ± 0.15 | 94.66 ± 0.11
    [28] in 2021 | Models combination | 276 | 93.33 ± 0.29 | 95.38 ± 0.29 | 91.09 ± 0.13 | 92.42 ± 0.16
    [26] in 2020 | Models combination | 138 | 92.20 ± 0.23 | 95.48 ± 0.12 | None | None
    [16] in 2022 | Models combination | 54 | 96.19 ± 0.48 | 97.48 ± 0.39 | 93.67 ± 0.21 | 95.32 ± 0.36
    [12] in 2019 | CNNs ensemble | 167 | None | None | 92.44 ± 0.37 | None
    [18] in 2022 | CNNs ensemble | 47 | 96.20 ± 0.15 | 98.54 ± 0.17 | 93.24 ± 0.15 | 95.58 ± 0.08
    [29] in 2021 | ViT | 307 | 95.86 ± 0.28 | None | 93.83 ± 0.46 | None
    [30] in 2021 | ViT | 46 | 95.54 ± 0.18 | 98.48 ± 0.06 | 93.06 ± 0.11 | 95.56 ± 0.20
    [31] in 2022 | ViT | 28 | 96.61 ± 0.07 | 98.08 ± 0.03 | 93.90 ± 0.07 | 95.29 ± 0.12
    [47] in 2022 | ViT | 40 | 95.56 ± 0.17 | 96.98 ± 0.16 | 92.72 ± 0.04 | 94.66 ± 0.10
    [32] in 2022 | A self-designed CNN | 0.5 | 93.15 ± 0.25 | 97.31 ± 0.10 | 92.02 ± 0.50 | 94.39 ± 0.16
    [34] in 2022 | CNNs combined with ViT | Not mentioned | 96.25 ± 0.10 | 97.70 ± 0.11 | 93.90 ± 0.14 | 95.40 ± 0.15
    [35] in 2023 | Decoupling CNNs | Not mentioned | 86.52 ± 0.18 | None | 84.81 ± 0.36 | 91.41 ± 0.69
    [36] in 2022 | CNN using clustering | Not mentioned | 91.32*2 | None | 91.96*2 | None
    [37] in 2022 | CNNs using adaptive transform | 16 | 94.55 ± 0.27 | 96.72 ± 0.23 | 90.25 ± 0.14 | 93.05 ± 0.12
    [38] in 2022 | CNN combined with ViT | Not mentioned | 95.58 | 96.88 | 92.72 | 94.50
    RE-EfficientNet [this work] | Single EfficientNet-B3 | 12 | 97.11 ± 0.06 | 98.15 ± 0.10 | 94.60 ± 0.05 | 96.15 ± 0.03
    *1: The methods used a testing ratio of 30% on AID and 70% on NWPU.
    *2: The method only used a testing ratio of 20% on AID and NWPU, respectively.


    As shown in Table 3, RE-EfficientNet shows an obvious advance on AID and NWPU, while only the CNN ensemble in [18] and the ViT in [30] show a partial lead at a TR of 50% on AID. Even with the advantage of more parameters, the methods [18,30] still show a notable OA gap when the testing sets grow larger (the fourth, sixth and last columns of Table 3). Hence, in the authors' view, the partial lead of [18,30] may come from a reduced total number of testing samples. Taking RE-EfficientNet's OA as the baseline, the improvement at the 20% TR of AID is 0.50% over the leading method [31], while the improvements at the 10% and 20% TRs of NWPU are 0.70% and 0.75% over the other leading method [34], respectively. We can see that RE-EfficientNet's lead is more notable when the testing sets are larger. Taking the parameter size into account, RE-EfficientNet has only a quarter of the parameters of [31] and far fewer than [34], which contains several CNNs and a ViT. Hence, as a fair comparison, the results demonstrate that RE-EfficientNet is more effective and lightweight than the previous methods.

    In Table 3, given the OA results between different technical routes, we can find out some valuable facts as follows:

    First, the strategy of feature fusion in [10,11,14,20,24,25] commonly falls behind, while the strategy using attention modules in [13,15,17,27,33] is more likely to perform better when feature fusion is abandoned. Attention modules, however, do not guarantee better performance when feature fusion is coupled with them, as in [23].

    Second, putting the complex training procedure and huge parameter size aside, we can find that the strategy of model combination in [16,26,28] has not shown significant improvements over a single CNN with built-in attention modules. However, in [28], the combination strategy using individually trained models can gain a slight lead over its competitor with built-in attention modules if its hardware-intensive budget is acceptable.

    Third, despite the complex training procedure, the ensemble of CNNs in [18] does show competitive performance. In [18], each CNN of the ensemble has newly added attention modules, and for this reason, we can see that the ensemble in [12] falls behind due to its inferior individual classifiers without attention modules.

    Fourth, regardless of the larger parameter size, the ViT methods commonly perform better on the larger NWPU than the other single-CNN methods, except RE-EfficientNet. In other words, ViTs have not shown persuasive advantages when training sets are smaller. Nevertheless, considering the "data-hungry" effect of the ViT architecture widely observed on natural images [47,48], we can easily accept these results. In addition, it is commonly considered that a CNN has built-in inductive biases, such as translation invariance, that can speed up the model's fitting [49]. Contrary to CNNs, the ViT's advantage lies in the long-range dependence between different features and sequence information [50]. Hence, in the authors' view, transfer learning tasks on RSI are easier and cheaper with CNNs, unless the hardware-expensive ViT offers other acceptable advantages in special tasks.

    Fifth, the two TTs, i.e., label smoothing and Mixup, are used in several previous methods [13,17,27] without any modification. More specifically, the study in [37] even proposed an adaptive label updating pipeline. However, in Table 3, these four methods do not show exciting performance.

    Therefore, all the results validate the effectiveness and superiority of RE-EfficientNet. Meanwhile, based on the comparative results and analysis, the reasons for the leading performance of RE-EfficientNet can be traced to several aspects. First, the attention modules in EfficientNet-B3 give RE-EfficientNet a great chance to achieve better performance compared to the other classical CNNs without attention. Second, the unmodified EfficientNet-B3 in RE-EfficientNet avoids an expanded search space and can effectively take advantage of the better starting points in the search space provided by pre-training on ImageNet-1K. Third, RE-EfficientNet handles the data distribution shift from DA well by using different effective combinations of transformations. Fourth, the TTs modified according to the nature of RSI are more effective than the simple usage in previous studies. Fifth, compared to ViTs, the inductive biases of CNNs have enhanced RE-EfficientNet's performance for classifying the smaller RSI sets such as AID and NWPU.

    The confusion matrices for AID and NWPU are presented in Figures 7 and 8, with the most confusing categories marked by a red arrow. As concluded before, RE-EfficientNet shows more obvious leads when the training dataset is smaller. Hence, to get a good understanding of how RE-EfficientNet works, this paper selectively shows the matrices of AID and NWPU at the same TR of 20%.

    Figure 7.  The confusion matrix of AID at a 20% TR.
    Figure 8.  The confusion matrix of NWPU at a 20% TR.

    As shown in Figure 7, the confusion mainly occurs in categories including center, church, industry, park, resort, school and square. Meanwhile, except for medium residential, which has an OA of 96.60%, the OAs of all other classes are higher than the mean value of 97.11%. Comparing the confusion result to the feature fusion methods [11,24,25], we can see remarkable improvements in all categories. However, making the same comparison to another fusion method [10], we find that RE-EfficientNet still shows obvious leads in most categories, excluding resort, school and square. More specifically, in this paper, the OA results of these three classes are lower than the results in [10]. This fact suggests that feature fusion may have a positive impact on a CNN's performance for a certain class. With a close look at the CNN methods with attention modules [13,17,33], we can see that RE-EfficientNet also shows clear leads in all categories. However, looking at the ViTs [30,38], we find another discrepancy in the confusing categories. In detail, in [30], the most confusing classes are bare land and desert, with OAs of 87% and 51%, respectively. Yet, in [38], the most confusing classes are the same as for RE-EfficientNet, with a slight OA gap. Because the other ViT methods [29,31] lack related information, we cannot analyze this deviation further. In addition, among the other methods providing the necessary information, the CNN ensemble [18] and model combination [28] also have suboptimal performance compared to RE-EfficientNet. In particular, in [18], the ensemble shows the same partial improvements on three categories as [10] does and also has worse performance on the other categories.

    As shown in Figure 8, at the 20% TR of NWPU, the most confusing categories include church, dense residential, industry, medium residential, palace, rectangular farmland and wetland, with OAs less than 93%. Meanwhile, another ten categories have an OA greater than 93% but below the mean value of 96.15%, including commercial area, desert, freeway, island, lake, meadow, mountain, railway, railway station and runway. Comparing RE-EfficientNet's results to the other methods [10,11,13,24,25,30,33,38], we can see an overall improvement for all the most confusing categories, and the temporary lead of [10] on AID does not exist on NWPU. However, in [18], the CNN ensemble's most confusing categories are quite different from all others, including sparse residential, forest and overpass, with OAs less than 90%. Yet, the OAs of these three categories in RE-EfficientNet are all above 96.30%. Hence, this contradiction still illustrates the local-optimal-solution problem in deep learning.

    In summary, based on all the comparisons, we can draw two conclusions. First, compared with previous studies, RE-EfficientNet does perform better on AID and NWPU with fewer training samples, and the improvements are clear for all categories. Second, some previous methods have found partial advances for several confusing categories when the RSI set is small. However, these partial improvements disappear when the RSI set becomes larger.

    To understand the difference in regularization between RSI and ImageNet-1K, this work also studied the changes in soft labels modified by the m-OLS, and the results are shown in Figure 9. In the figure, the category number has the same definition as in Figures 4 and 5, while "Step" is also the same as in Figure 2.

    Figure 9.  Changes in the category labels modified by m-OLS.

    As shown in Figure 9a, we can see three facts. First, with a 20% TR, the smallest category labels in Step 1 are school and resort. In contrast, the smallest labels in Step 2 are the railway station and storage tank, and all 30 category labels share a small gap of less than 0.1. Second, with a 50% TR, the school and resort in Step 1 still have the smallest label values, but the values are obviously higher than the ones of the 20% TR. However, the categories with the smallest label values in Step 2 change to the church and school, corresponding to the confusing categories in Figure 7. Third, we can see that all the labels of Step 2 are obviously less than the ones of Step 1, while the values of Step 1 show greater volatility with a 20% TR. This phenomenon is mainly because the m-OLS algorithm dynamically updates label values according to the model's prediction results for training sets (see [41]).

    Similarly, as shown in Figure 9b, we can see two facts. First, all 45 label values of Step 1 show greater volatility and are smaller than the values of Step 2. Second, the categories with the smallest labels, including church, medium residential and palace, are the same in Steps 1 and 2 and still correspond to the confusing categories shown in Figure 8. Taking Figure 9 as a whole, we can find that the m-OLS values show smoother fluctuations as the number of training samples increases. This phenomenon arises because the model's increasing prediction accuracy on the training sets causes the m-OLS algorithm to make only slight changes to the categories' similarity.

    Figure 10.  Gradient class activation mapping visualizations.

    Based on all these facts, we can understand that the m-OLS is an adaptive soft label generator based on the similarity of the training sets. It can prevent a CNN from being overconfident about hard labels. Meanwhile, comparing these m-OLS labels to the ones for natural images shown in [41], we can see a larger gap of approximately 0.2. Hence, we should first modify the usage of OLS developed on ImageNet-1K to exclude larger fluctuations and then design an algorithm with proper training epochs to get the soft labels closer to the true similarity distribution of RSI.

    To validate the effectiveness of the modified TTs in training, this work performed ablation experiments for RE-EfficientNet, as shown in Table 4. In the table, a row with only "Baseline" selected denotes the training procedure of Algorithm 1 with both m-OLS and m-CutMix excluded. Likewise, a row with both "Baseline" and "m-OLS" selected denotes the same procedure with m-OLS activated. In other words, the last row of Table 4 represents the complete training procedure described in Algorithm 1.

    Table 4.  OAs (%) of ablation experiments with different combinations of DARS.

    | Baseline | m-OLS | m-CutMix | AID, TR 20% | AID, TR 50% | NWPU, TR 10% | NWPU, TR 20% |
    |----------|-------|----------|--------------|--------------|---------------|---------------|
    | ✓ |   |   | 96.09 ± 0.16 | 97.61 ± 0.13 | 93.52 ± 0.10 | 95.37 ± 0.06 |
    | ✓ | ✓ |   | 96.41 ± 0.01 | 97.74 ± 0.02 | 93.89 ± 0.06 | 95.66 ± 0.11 |
    | ✓ |   | ✓ | 96.84 ± 0.08 | 98.08 ± 0.02 | 94.38 ± 0.07 | 96.03 ± 0.04 |
    | ✓ | ✓ | ✓ | 97.11 ± 0.06 | 98.15 ± 0.10 | 94.60 ± 0.05 | 96.15 ± 0.03 |


    The first row of Table 4 shows that the baseline training strategy alone allows a single EfficientNet-B3 model to achieve performance competitive with the other previous methods listed in Table 1. However, comparing this baseline result with that of the more lightweight EfficientNet-B0 in Table 1, no obvious improvement can be found, even though EfficientNet-B3 has a larger capacity. This reveals that, without proper regularization, a larger CNN easily overfits on the same RSI dataset. The second row of Table 4 shows that, with m-OLS activated, a single EfficientNet-B3 achieves clearly improved OAs by 0.10% to 0.30%, and the gains are more visible at smaller TRs. The third row shows that, with m-CutMix active, a single EfficientNet-B3 gains a further 0.3% to 0.7% OA, and again the improvements are clearer at smaller TRs. Finally, RE-EfficientNet achieves remarkably improved OAs of 0.55% to 1.1% compared with the baseline model trained without the modified TTs.
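    For orientation, the following sketch shows what such a baseline transfer-learning setup could look like: an ImageNet-1K pre-trained EfficientNet-B3 whose classification head is replaced for the RSI categories and fine-tuned with a routine optimizer. It assumes torchvision (>= 0.13) pre-trained weights and uses illustrative hyperparameters; it is not the exact recipe of Algorithm 1.

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative setting: 45 scene categories, as in NWPU.
num_classes = 45

# Load EfficientNet-B3 pre-trained on ImageNet-1K and replace the classifier head.
model = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.IMAGENET1K_V1)
in_features = model.classifier[1].in_features          # 1536 for EfficientNet-B3
model.classifier[1] = nn.Linear(in_features, num_classes)

# Routine optimizer and loss; hyperparameters are assumptions, not the paper's recipe.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    """One fine-tuning step on a mini-batch of RSI samples."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```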

    To gain a deeper understanding of RE-EfficientNet, we employed the gradient class activation mapping (CAM) technique [43] to generate visualizations for different cut-and-mixed samples; the resulting images, covering ten categories, are shown in Figure 10. In Figure 10, the label at the top of each column corresponds to a category, and a pair of labels surrounded by a red rectangle indicates that the two categories were cut and mixed with each other. All images are arranged in five rows: the first and third rows are the original and cut-and-mixed images; the second row is the CAM of the original image; and the fourth and fifth rows are the CAMs of the cut-and-mixed image when predicting the first and the second label of each pair, respectively.
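    The visualizations in Figure 10 can, in principle, be reproduced with a standard gradient CAM computation such as the minimal sketch below: the gradients of the target class score with respect to the last convolutional feature map are globally averaged and used to weight that feature map. The function name and hook-based structure are assumptions for illustration, not the exact visualization code used in this work.

```python
import torch

def grad_cam(model, image, target_class, conv_layer):
    """Minimal gradient CAM sketch (after [43]). `conv_layer` is assumed to be
    the last convolutional block of the CNN (e.g., model.features[-1] for a
    torchvision EfficientNet). Returns a heat map normalized to [0, 1] with the
    spatial size of that layer's feature map."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["value"] = output.detach()

    def bwd_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0].detach()

    h1 = conv_layer.register_forward_hook(fwd_hook)
    h2 = conv_layer.register_full_backward_hook(bwd_hook)
    try:
        model.eval()
        logits = model(image.unsqueeze(0))            # image: (C, H, W)
        model.zero_grad()
        logits[0, target_class].backward()            # gradients of the class score
        weights = gradients["value"].mean(dim=(2, 3), keepdim=True)  # GAP of gradients
        cam = (weights * activations["value"]).sum(dim=1).relu()[0]
        cam = cam / (cam.max() + 1e-8)                # normalize to [0, 1]
    finally:
        h1.remove()
        h2.remove()
    return cam
```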

    As shown in the first and second rows of Figure 10, the brighter part of the CNN's CAM, i.e., the activated region carrying the most important information, agrees with human semantic cognition when the RSI scene contains a salient ground object: e.g., the waves of a beach, the station building of a railway station, or the court of a basketball court. This indicates that the model has learned to classify samples based on invariant features belonging to each category. Without a salient object, the activation becomes wider and relatively darker. This still agrees with human cognition, since we also recognize such a scene through its global pattern, although this pattern is not as discriminative as a salient ground object. Looking at the cut-and-mixed images in the fourth and fifth rows of Figure 10, two facts can be found:

    First, in the fourth row, where the prediction targets the first category, the activation is almost the same as that in the second row, although it becomes distracted if the mixed-in patch lies close to the original activation location. Second, in the last row, where the prediction targets the patch's category, the activation focuses on the image patch whenever the patch covers part of the activation location shown in the second row.

    Hence, the CAM results show that the RE-CNN training algorithm helps the model focus its attention on the salient object of an RSI scene. Meanwhile, with a restrained cut-and-mix operation, the model's attention is not obviously distracted. However, if the cut-and-mix operation happens too frequently, this learned attention is very likely to be weakened, leaving the model suboptimal. In view of this, the proposed modification to CutMix is reasonable.
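    To illustrate the idea of a restrained cut-and-mix, the sketch below applies a standard CutMix step only with a small probability, leaving most batches untouched so that the learned attention is not distracted too often. The function `restrained_cutmix`, the probability `prob` and the Beta parameter are illustrative assumptions; the exact m-CutMix schedule used in this work may differ.

```python
import numpy as np
import torch

def restrained_cutmix(images, targets, num_classes, beta=1.0, prob=0.25):
    """Sketch of a CutMix step applied with a restrained probability `prob`.
    Returns (possibly) mixed images and soft targets; all parameters are
    illustrative, not the paper's m-CutMix settings."""
    one_hot = torch.nn.functional.one_hot(targets, num_classes).float()
    if np.random.rand() > prob:
        return images, one_hot                        # leave the batch unchanged

    lam = np.random.beta(beta, beta)
    index = torch.randperm(images.size(0))
    _, _, h, w = images.shape

    # Sample a rectangular patch whose area ratio is roughly (1 - lam).
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    # Paste the patch from a shuffled copy of the batch and mix the labels
    # according to the actual kept-area ratio.
    images[:, :, y1:y2, x1:x2] = images[index, :, y1:y2, x1:x2]
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_targets = lam_adj * one_hot + (1 - lam_adj) * one_hot[index]
    return images, mixed_targets
```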

    To validate the effectiveness of the extracted features, this paper employs t-Distributed Stochastic Neighbor Embedding (t-SNE) [51] to intuitively visualize the similarity of classified samples; the contrastive results are shown in Figure 11. Two EfficientNet-B3 models are compared: the left one is trained with the combination of m-OLS and m-CutMix, whereas the right one is not.

    Figure 11.  Visualization of the category's similarity through t-SNE.

    In Figure 11, each NWPU category consists of two hundred randomly selected samples, and the category names are the same as in Figure 5. Most categories are clearly separated from each other, which shows that the features extracted by RE-EfficientNet are effective. A quick glance also shows that the model on the right has more overlapping categories (marked with red rectangles): the overlapping pairs on the right include church with palace, dense residential with medium residential, and rectangular farmland with terrace, whereas the left shows only church with palace. This is consistent with the confusion matrices and clearly visualizes the performance gap between the two training strategies. Compared with the t-SNE visualizations in previous studies [14,18,33,35,38], this work achieves a better separation of categories. Hence, the t-SNE visualization also demonstrates the effectiveness and superiority of RE-EfficientNet.
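    A visualization like Figure 11 can be produced roughly as in the sketch below: penultimate-layer features are extracted from the trained model and embedded into two dimensions with scikit-learn's t-SNE. The sketch assumes a torchvision-style EfficientNet (with `features` and `avgpool` attributes); the sampling of two hundred images per category and all plotting details are illustrative.

```python
import numpy as np
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

@torch.no_grad()
def extract_features(model, loader, device="cpu"):
    """Collect penultimate feature vectors and labels from a data loader."""
    model.eval().to(device)
    feats, labels = [], []
    for images, targets in loader:
        x = model.features(images.to(device))   # torchvision EfficientNet backbone
        x = model.avgpool(x).flatten(1)          # global-pooled feature vectors
        feats.append(x.cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

def plot_tsne(features, labels, out_path="tsne.png"):
    """Embed the features into 2-D with t-SNE and save a scatter plot."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.figure(figsize=(8, 8))
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab20")
    plt.axis("off")
    plt.savefig(out_path, dpi=300)
```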

    To date, the overwhelming majority of methods proposed for RSI-SC employ deep models pre-trained on natural image datasets, mainly because the RSI research domain lacks large-scale datasets. Nevertheless, some researchers are building and releasing larger-scale RSI datasets. For example, in [31], Wang et al. trained a ResNet-50 CNN and a ViT from scratch on an RSI dataset called Million-AID [52], which contains about one million images cropped from Google Earth. Using the weights pre-trained on Million-AID, [31] performed classification tests on AID and NWPU; the results are shown in Table 5.

    Table 5.  OA (%) comparison of different models pre-trained on natural images and RSI.

    | Method | Roadmap | Parameter size (M) | AID, 20%-TR | AID, 50%-TR | NWPU, 10%-TR | NWPU, 20%-TR |
    |--------|---------|--------------------|--------------|--------------|---------------|---------------|
    | Wang et al. [31] | A single ResNet-50 pre-trained on Million-AID | 24 | 96.81 ± 0.03 | 97.89 ± 0.08 | 93.93 ± 0.10 | 95.02 ± 0.06 |
    | Wang et al. [31] | A single Vision Transformer pre-trained on Million-AID | 19 | 96.91 ± 0.06 | 98.22 ± 0.09 | 94.41 ± 0.11 | 95.60 ± 0.06 |
    | RE-EfficientNet (this work) | A single EfficientNet-B3 pre-trained on ImageNet-1K | 12 | 97.11 ± 0.06 | 98.15 ± 0.10 | 94.60 ± 0.05 | 96.15 ± 0.03 |


    As shown in Table 5, with weights pre-trained on Million-AID, the ViT achieves clearly improved OAs compared with its counterpart pre-trained on ImageNet-1K. Compared with the ViT pre-trained on Million-AID, RE-EfficientNet still performs better in three of the four settings, although the OA gaps are reduced. Compared with the ResNet-50 pre-trained on Million-AID, RE-EfficientNet performs much better, with OAs improved by 0.26% to 1.13%. As mentioned before, ViTs commonly need more training samples to perform well. Therefore, the result indicates that the time-consuming pre-training in [31] still does not close the performance gap on RSI sets relative to RE-EfficientNet. Meanwhile, the ResNet-50 in [31] lacks attention modules, and its training algorithm excludes effective combinations of DA transformations and TTs, which explains its weaker performance compared with RE-EfficientNet.

    According to common experience in deep learning, a CNN pre-trained on a large-scale RSI set stands a good chance of achieving better performance, because the domain gap between ImageNet-1K and RSI is larger. Currently, only 10,000 samples of the publicly available Million-AID carry category information. Hence, if possible, the authors will test this promising pre-training strategy on large-scale RSI datasets in the future.

    As deep learning techniques have come to dominate computer vision tasks on RSI, many creative CNN and ViT methods have been proposed for SC. Recently, researchers have pursued ideal performance through methods that are effective but costly. However, a simple, lightweight and accurate solution is still attainable.

    This work first presented a theoretical analysis to explain why it is difficult for those complex and costly methods to achieve the ideal solutions expected of them. It then chose a simple route to competitive performance for RSI-SC: a single CNN combined with beginner-friendly transfer learning. The proposed RE-EfficientNet includes only a lightweight EfficientNet-B3 with 12 M parameters, and the RE-CNN training algorithm consists only of routine DA transformations and two modified TTs. Experimental results on AID and NWPU show that RE-EfficientNet performs much better, with remarkable OA improvements of 0.5% to 0.75% over the leading one among the 30 SOTA methods published before 2023. The ablation results also demonstrate that the proposed combination of DA transformations and modified TTs is effective, boosting EfficientNet-B3's OAs by 0.55% to 1.1%. Since RE-EfficientNet consists only of an open-source model and training algorithm, it is easy to reproduce, and its small parameter count and transfer-learning strategy make it efficient and beginner-friendly. The performance of RE-EfficientNet reveals that a single CNN can perform better for RSI-SC when the whole pipeline and training algorithm are effective. In addition, the authors argue that the proposed mathematical derivation can help develop more efficient methods for RSI-SC in the future.

    There are still some shortcomings in this work. First, RE-EfficientNet might perform better if it were pre-trained on a large-scale RSI set. Second, the effectiveness of the RE-CNN algorithm for other classical CNNs has not been verified. The authors will address these issues in future work.

    The authors declare that no artificial intelligence (AI) tool was used in the creation of this article.

    This study was supported by doctoral research funding from Hunan University of Arts and Science (grant number 16BSQD23).

    The authors declare that there is no conflict of interest.



    [1] C. Zeng, G. Hui, Construction of electronic medical record system for standardized diagnosis and treatment of breast cancer, J. Chin. Med. Dev., 29 (2014), 46–48.
    [2] Q. M. Ling, Research on the advantages and development of electronic medical record in medical record management, Electron. J. Gen. Stomatol., 7 (2020), 26–31.
    [3] L. Liu, D. B. Wang, Summary of research on named entity recognition, J. Chin. Soc. Sci. Tech. Inf., 3 (2018), 329–340.
    [4] C. Y. Kun, L. T. A, M. Q. Zhu, D. J. C, X. Hua, A study of active learning methods for named entity recognition in clinical text, J. Biomed. Inf., 58 (2015), 11–18. doi: 10.1016/j.jbi.2015.09.010
    [5] Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, A. Anandkumar, Deep active learning for named entity recognition, preprint, arXiv: 1707.05928.
    [6] W. W. Ning, L. Yang, G. M. Zu, L. X. Yan, Research progress of active learning algorithm based on sampling strategy, J. Comput. Res. Dev., 49 (2012), 1162–1173.
    [7] W. R. Qi, L. X. Li, H. Y. Li, B. He, G. Yi, Research on active learning method for named entity recognition of Chinese electronic medical record, Chin. Digital Med., 12 (2017), 51–53.
    [8] L. M. Qun, S. Martin, E. E. Khaled, M. B. A, Efficient active learning for electronic medical record de-identification, AMIA Summits Transl. Sci. Proc., (2019), 462–471.
    [9] M. Kholghi, L. Sitbon, G. Zuccon, A. Nguyen, External knowledge and query strategies in active learning: a study in clinical information extraction, in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (2015), 143–152.
    [10] J. Zhu, E. H. Hovy, Active learning for word sense disambiguation with methods for addressing the class imbalance problem, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), (2007), 783–790.
    [11] M. Bloodgood, K. Vijay–Shanker, Taking into account the differences between actively and passively acquired data: The case of active learning with support vector machines for imbalanced datasets, preprint, arXiv: 1409.4835.
    [12] K. Tomanek, U. Hahn, Reducing class imbalance during active learning for named entity annotation, in Proceedings of the fifth international conference on Knowledge capture, (2009), 105–112.
    [13] S. Ertekin, J. Huang, C. L. Giles, Active learning for class imbalance problem, in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 1 (2007), 823–824.
    [14] S. Ertekin, J. Huang, L. Bottou, C. L. Giles, Learning on the border: Active learning in imbalanced data classification, in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, (2007), 127–136.
    [15] H. Guang, Z. C. Xia, H. X. Lei, A new SVM active learning algorithm and its application in obstacle detection, J. Comput. Res. Dev., 46 (2009), 1934–1941.
    [16] B. C. Mei, Classification of weighted support vector machines based on active learning, Comput. Eng. Des., 30 (2009), 966–970.
    [17] Y. F. Liang, Research on Active Learning Algorithm Based on Expert Committee, Master thesis, Ocean University of China, 2010.
    [18] L. Feng, Research and Application of Active Semi-supervised K-means Clustering Algorithm, Master thesis, Hebei University of Geosciences, 2018.
    [19] X. Li, Y. Guo, Adaptive active learning for image classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013 (2013), 859–866.
    [20] M. Kholghi, L. D. Vine, L. Sitbon, G. Zuccon, A. Nguyen, Clinical information extraction using small data: an active learning approach based on sequence representations and word embeddings, J. Assoc. Inf. Sci. Technol., 68 (2017), 2543–2556. doi: 10.1002/asi.23936
    [21] D. Angluin, Queries and concept learning, Mach. Learn., 2 (1988), 319–342.
    [22] R. Grishman, B. Sundheim, Message understanding conference-6: A brief history, in COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics, 1996.
    [23] C. Friedman, P. O. Alderson, J. H. Austin, S. B. Johnson, A general natural-language text processor for clinical radiology, J. Am. Med. Inf. Assoc., 1 (1994), 161–174. doi: 10.1136/jamia.1994.95236146
    [24] W. S. Li, Research on Chinese Electronic Medical Record of Named Entity Recognition Based on Improved Deep Belief Network, Master thesis, Beijing University of Chemical Technology, 2018.
    [25] G. K. Savova, J. J. Masanz, P. V. Ogren, J. P. Zheng, S. W. Sohn, C. G. Chute, Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications, J. Am. Med. Inf. Assoc., 17 (2010), 507–513. doi: 10.1136/jamia.2009.001560
    [26] S. T. Wu, H. F. Liu, D. C. Li, C. Tao, M. A. Musen, N. H. Shah, Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis, J. Am. Med. Inf. Assoc., 19 (2012), 149–156. doi: 10.1136/amiajnl-2012-000844
    [27] E. F. Sang, F. D. Meulder, Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, preprint, arXiv: 0306050.
    [28] Y. Li, S. L. Gorman, N. Elhadad, Section classification in clinical notes using supervised hidden markov model, in Proceedings of the 1st ACM International Health Informatics Symposium, (2010), 744–750.
    [29] P. Y. Wang, D. H. Gi, Disease name extraction based on multi-label CRF, Appl. Res. Comput., 1 (2017), 118–122.
    [30] F. Ye, Y. Y. Chen, G. G. Zhou, H. M. Li, Y. Li, Intelligent recognition of named entities in electronic medical records, Chin. J. Biomed. Eng., 2 (2011), 98–104.
    [31] J. Liang, X. M. Xian, X. J. He, M. F. Xu, S. Dai, J. Y. Xin, A novel approach towards medical entity recognition in Chinese clinical text, J. Healthcare Eng., (2017), 1–16.
    [32] G. Luo, X. Huang, C. Y, Z. Nie, Joint entity recognition and disambiguation, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (2015), 879–888.
    [33] A. Passos, V. Kumar, M. C. Andrew, Lexicon infused phrase embeddings for named entity resolution, preprint, arXiv: 1404.5367.
    [34] Y. J. Zhang, Z. T. Xu, X. Y. Xue, A maximum entropy Chinese named entity recognition model of integrating multiple features, J. Comput. Res. Dev., 6 (2008), 1004–1010.
    [35] A. Mccallum, W. Li, Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons, Comput. Sci. Dep. Fac. Publ. Ser. 11, (2003), 188–191.
    [36] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, J. Mach. Learn. Res., 12(2011), 2493–2537.
    [37] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, preprint, arXiv: 1603.01360.
    [38] A. Jagannatha, Y. Hong, Structured prediction models for RNN based sequence labeling in clinical text, in Proceedings of the conference on empirical methods in natural language processing. conference on empirical methods in natural language processing, (2016), 856–865.
    [39] J. Zhu, H. Wang, B. K. Tsou, M. Ma, Active learning with sampling by uncertainty and density for data annotations, IEEE. Trans. Audio, Speech, Lang. Process., 18 (2012), 1323–1331.
    [40] X. Yan, Research on Image Annotation Method Based on Active Learning, Master thesis, Liaoning University of Technology, 2014.
    [41] L. Jin, Y. F. Cao, C. X. Su, J. Y. Ren, Multi-class image classification based on HS sample selection and BvSB feedback, J. Guizhou Norm. Univ. (Nat. Sci.), (2014), 56–61.
    [42] Q. H. Zhao, Two active learning methods, Master thesis, He Bei University, 2010.
    [43] S. Ertekin, J. Huang, C. L. Giles, Active learning for class imbalance problem, in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, (2007), 823–824.
    [44] H. S. Seung, M. Opper, H. Sompolinsky, Query by Committee, in Proceedings of the fifth annual workshop on Computational learning theory, (1992), 287–294.
    [45] D. D. Lewis, J. Catlett, Heterogeneous uncertainty sampling for supervised learning, in Machine learning proceedings, Morgan Kaufmann, 1994,148–156.
    [46] T. Scheffer, C. Decomain, S. Wrobel, Active hidden markov models for information extraction, in International Symposium on Intelligent Data Analysis, Springer, Berlin, Heidelberg, (2001), 309–318.
    [47] S. Tong, D. Koller, Support vector machine active learning with applications to text classification, J. Mach. Learn. Res., 2 (2002), 45–66.
    [48] A. Kapoor, E. Horvitz, S. Basu, Selective supervision: guiding supervised learning with decision-theoretic active learning, in IJCAI, 7 (2007), 877–882.
    [49] S. Arora, E. Nyberg, C. P. Rose, Estimating annotation cost for active learning in a multi-annotator environment, in Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, (2009), 18–26.
    [50] J. Carroll, R. Haertel, P. McClanahan, E. K. Ringger, K. Seppi, Assessing the costs of sampling methods in active learning for annotation, Fac. Publ., (2008), 185.
    [51] M. Kholghi, L. Sitbon, G. Zuccon, A. Nguyen, Active learning: a step towards automating medical concept extraction, J. Am. Med. Inf. Assoc., 23 (2016), 289–296. doi: 10.1093/jamia/ocv069
    [52] R. Q. Wang, X. L. Li, Y. L. Huang, B. He, Y. Guan, Research on active learning method of Chinese electronic medical record named entity recognition, China Digital Med., 12 (2017), 51–53.
    [53] J. Z. Cheng, W. Qiang, A. Franklin, T. Cohen, H. Xu, Cost-sensitive active learning for phenotyping of electronic health records, AMIA Summits Transl. Sci. Proc., 2019 (2019), 829–838.
    [54] Ö. Uzuner, B. R. South, S. Shen, S. L. Duvall, 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text, J. Am. Med. Inf. Assoc., 18 (2011), 552–556. doi: 10.1136/amiajnl-2011-000203
    [55] S. Pradhan, N. Elhadad, B. South, D. Martinez, L. Christensen, A. Vogel, Task 1: ShARe/CLEF ehealth evaluation lab 2013, in CLEF (Working Notes), 2013.
    [56] G. S. Wang, X. J. Huang, Text classification model of convolutional neural network based on Word2vec and improved TF-IDF, J. Chin. Mini-Micro Comput. Syst., 40 (2019), 210–216.
    [57] M. Kholghi, L. Sitbon, G. Zuccon, A. Nguyen, Active learning reduces annotation time for clinical concept extraction, Int. J. Med. Inform., 106 (2017), 25–31. doi: 10.1016/j.ijmedinf.2017.08.001
    [58] T. H. Nguyen, A. Sil, G. Dinu, R. Florian, Toward mention detection robustness with recurrent neural networks, preprint, arXiv: 1602.07749.
    [59] Z. H. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, preprint, arXiv: 1508.01991.
    [60] Z. L. Yang, R. Salakhutdinov, W. Cohen, Multi-task cross-lingual sequence tagging from scratch, preprint, arXiv: 1603.06270.