
The mean curvature-based image deblurring model is widely used to enhance the quality of deblurred images. However, the discretization of the associated Euler-Lagrange equations produces a nonlinear, ill-conditioned system that affects the convergence of numerical algorithms such as Krylov subspace methods. To overcome this difficulty, in this paper we present two new symmetric positive definite (SPD) preconditioners. An efficient algorithm is presented for the mean curvature-based image deblurring problem that combines a fixed point iteration (FPI) with the new preconditioned matrices to handle the nonlinearity and ill-conditioned nature of the large system. An eigenvalue analysis is also presented. The numerical results show fast convergence with the proposed preconditioners.
Citation: Shahbaz Ahmad, Adel M. Al-Mahdi, Rashad Ahmed. Two new preconditioners for mean curvature-based image deblurring problem[J]. AIMS Mathematics, 2021, 6(12): 13824-13844. doi: 10.3934/math.2021802
Cyberattacks occur frequently, causing serious impacts on people's daily life. In 2017, the WannaCry ransomware event broke out globally, hitting at least 300,000 users and causing 8 billion USD in damage [1]. In 2020, a cyberattack on Venezuela's national grid trunk line caused widespread power outages across the country [2]. In 2021, the U.S. refined-products pipeline operator Colonial Pipeline was forced to shut down its fuel network serving the eastern seaboard states due to a ransomware attack [3]. With the frequent occurrence of cyberattacks, existing methods, such as firewalls, data encryption, and authentication, cannot meet security requirements [4]. Therefore, intrusion detection systems have gained the attention of researchers.
Intrusion detection systems play an important role in protecting critical information infrastructure [5]. According to their detection techniques, they are categorized into signature-based intrusion detection systems (SIDS) and anomaly-based intrusion detection systems (AIDS) [6,7]. SIDS maintains an attack library that stores historical attack records; if the current traffic matches a record in the attack library, the traffic is classified as an attack. AIDS analyzes historical traffic using statistical methods to learn a logical model; if the current traffic deviates from normal traffic, it is classified as an attack. SIDS offers the advantages of fast detection and a low false alarm rate, but it cannot detect unknown attacks [8]. By contrast, AIDS can detect unknown attacks and has broad application prospects. Figure 1 shows the block diagram of an intrusion detection system [9]. It consists of the following key components: (1) Information collection: network data, application logs, audit records, and other relevant information are collected from the network or hosts. The collected information is used for intrusion analysis. (2) Analysis engine: modeling or behavior matching is performed based on the collected network information, which in turn forms the corresponding knowledge base. It alerts the network administrator if an intrusion is found, and the intrusion itself also becomes part of the collected information. (3) Knowledge base: historical behavior records or trained models are stored here. The knowledge base can be used to analyze current traffic, but it needs to be updated regularly.
Intrusion detection is considered a classification problem, which has prompted researchers to adopt machine learning techniques to improve the performance of intrusion detection systems. In recent years, machine learning techniques have been applied broadly in intrusion detection and have shown encouraging results in many studies [10]. Machine learning techniques can be classified as shallow learning and deep learning [11]. Shallow learning methods, such as K-nearest neighbors [12], decision trees [13], support vector machines [14] and random forests [15], are widely used because of their strong explainability. Among deep learning methods, autoencoders [16], deep belief networks [17] and convolutional neural networks [18] have achieved great success in intrusion detection owing to their ability to extract features. Consequently, finding suitable machine learning techniques to improve the performance of intrusion detection systems has become a hot topic for researchers.
Researchers have proposed many approaches to detect intrusions based on machine learning techniques. This paper reviews related work from the perspectives of anomaly analysis and feature analysis. In anomaly analysis, Chouhan et al. [19] developed an autoencoder-based residual learning technique to enhance the classification capability of convolutional neural networks. Andresini et al. [20] combined feature selection techniques and residual learning to improve the performance of intrusion detection systems. The above residual thresholds need to be set manually, so Aygun et al. [21] developed a method to determine the thresholds adaptively. Yang et al. [22] developed a method that uses a modified conditional variational autoencoder to generate attack samples for balancing the data. Min et al. [23] developed a memory-enhanced autoencoder to improve the generalizability of the model. Autoencoders have also been used for nonlinear dimensionality reduction [24,25]. In addition, to improve the performance of intrusion detection systems, some researchers have developed two-stage decision methods. Belouch et al. [26] introduced a two-stage classification model. In the first stage, a RepTree classifier is used to classify the traffic into normal and abnormal. In the second stage, a classifier is used to classify the anomalies detected in the first stage to identify the attack classes. Niyaz et al. [27] proposed an intrusion detection system based on two phases. The first stage uses a sparse autoencoder for feature extraction from the original data. The second stage feeds the processed features into SoftMax regression (SM) and self-taught learning (STL) classifiers for learning, respectively. Zhang et al. [28] applied machine-learning techniques to intrusion detection in in-vehicle networks and proposed a two-stage anomaly detection framework.
In feature analysis, Gu et al. [29] used the marginal density ratio method for data enhancement to improve the performance of intrusion detection. Ieracitano et al. [30] used statistical analysis techniques to identify outliers and redundant data and, thus, remove unnecessary features. Zhang et al. [31] developed a feature fusion technique to improve model classification performance. Tree-based methods are often used for feature selection. Kasongo et al. [32] used extreme gradient boosting trees for feature selection followed by shallow methods for classification. Megantara and Ahmad [33] developed a hybrid feature analysis method. This method first uses a decision tree to select the important features. After that, local outlier factors are used to exclude outlier and anomalous features. Rashid et al. [34] used univariate techniques for feature analysis and ensemble methods for classification. Bio-inspired heuristics, such as the widely used particle swarm optimization algorithm and genetic algorithm, have also been used for feature selection [35,36,37]. In addition, deep learning methods are often used for nonlinear feature dimensionality reduction. To address the problem that isolated points and noisy data can affect model performance, Seo et al. [38] used a restricted Boltzmann machine to remove isolated points and noisy data from the dataset. Wuke et al. [39] proposed a combination of multilayer extreme learning machines and autoencoders to reduce the dimensionality of the data; the reduced-dimensional data are then trained by the extreme learning machine. Zhao et al. [40] proposed a method that uses deep belief networks and least-squares support vector machines. The method first uses a deep belief network for dimensionality reduction and then uses a particle swarm algorithm to optimize the parameters of the least-squares support vector machine.
Although much research has focused on intrusion detection systems, there are still some issues that need to be addressed. One important issue is the curse of dimensionality. High-dimensional data makes it difficult for intrusion detection systems to learn effective data representations, which affects their detection efficiency. Another problem is the increasing number of zero-day attacks [6]. Various attack methods are emerging, leaving network administrators with shorter response times. To address these issues, a two-stage anomaly detection framework based on LightGBM and an autoencoder is proposed in this study. The framework can detect novel attacks while improving detection efficiency. LightGBM is an ensemble approach that introduces an exclusive feature bundling algorithm and a gradient-based one-sided sampling algorithm. The exclusive feature bundling (EFB) algorithm bundles mutually exclusive features to reduce the feature dimensionality, and the gradient-based one-sided sampling (GOSS) method reduces the number of small-gradient samples during model training. As a result, the LightGBM algorithm has less time overhead. The focal loss function increases the weight of difficult samples, which is beneficial for learning attack samples that are difficult to classify. The autoencoder learns an implicit representation of the data in the encoding layer and reconstructs the original data in the decoding layer; by exploiting the reconstruction error, the autoencoder can enhance anomaly detection. Therefore, our main innovation is to introduce the focal loss function into LightGBM in place of its default cross-entropy loss to improve the detection of attack samples. In addition, the reconstruction error of the autoencoder is utilized to further enhance the detection of misclassified samples. For data processing, we use recursive feature elimination, a wrapper-based feature selection method that selects the best features based on feature scores. Unlike existing methods, we use a two-stage decision step based on the improved LightGBM and an autoencoder. To the best of our knowledge, this is the first time such a method has been proposed. The proposed method has less time overhead and improves the performance of the intrusion detection system.
In our previous work [41], we used an autoencoder to fit the sampled data and a LightGBM classifier for multiclass prediction. However, in this work, we utilize the autoencoder and a modified LightGBM model for anomaly detection. We modified the objective function of LightGBM and designed a two-stage decision step. The main contributions of this paper are as follows:
(1) To address the dimensionality curse, we propose to use a recursive feature elimination method based on LightGBM to reduce the dimensionality of the original data. The detection efficiency of the intrusion detection system is improved.
(2) To address the problem that the standard LightGBM method cannot effectively detect difficult samples, the focal loss function is introduced into LightGBM. In addition, the improved LightGBM is combined with an autoencoder to effectively respond to zero-day attacks.
(3) Finally, we conducted experiments on the NSL-KDD and UNSWNB15 datasets. The experiments compare the proposed method not only with classical methods, but also with current state-of-the-art methods.
The remainder of this paper is structured as follows: Section 2 introduces the relevant theories. Section 3 presents our method. Section 4 provides the experimental results and discussion. Section 5 presents the conclusions and future work of this paper.
In 2017, the Microsoft team proposed the LightGBM model [42]. LightGBM has less time overhead compared with extreme gradient boosting (Xgboost). Xgboost uses a pre-sorting algorithm when determining the best split nodes of the tree [43]. Since the pre-sorting algorithm needs to traverse all the features, it is inefficient. In general, the time complexity of the Xgboost algorithm is proportional to the size of the data volume [44]: the larger the data volume, the higher the computational overhead. The LightGBM algorithm bins the continuous features and divides different features into different bins, which reduces the computational overhead of the model. This process is called the histogram algorithm. In addition, to further improve the training efficiency of the model, LightGBM introduces the gradient-based one-sided sampling method and the exclusive feature bundling algorithm. The details of LightGBM are described in Algorithm 1.
Gradient-based one-sided sampling method. The gradient is a vector that denotes the direction of the greatest change in the value of a function, and the maximum rate of change in that direction is the magnitude of the gradient. In machine learning, the size of the gradient of a sample during training indicates how much that sample contributes to the final model. A sample with a large gradient indicates that the model still has room to improve on it, so it is beneficial for training the model. In contrast, a sample with a small gradient indicates that the sample is already well-learned and contributes less to training. Therefore, it is possible to keep all of the large-gradient samples and reduce the number of small-gradient samples. This process is called the gradient-based one-sided sampling method. Specifically, the gradient information of each sample is calculated. For selection purposes, the gradients of all samples are sorted in descending order according to their absolute values. After that, the samples with large gradients are retained, and some samples with small gradients are randomly excluded.
Assume the training set has n samples, denoted as {x1,…,xn}. At each iteration, the negative gradient of the model output is denoted as {g1,…,gn}. For the gradient boosting decision tree, its information gain is calculated as follows. Let O be the training set of the node on the decision tree, then the information gain of the split feature j of the node at d is calculated as:
$$V_{j|O}(d)=\frac{1}{n_O}\left(\frac{\Big(\sum_{\{x_i\in O:\,x_{ij}\le d\}}g_i\Big)^2}{n^j_{l|O}(d)}+\frac{\Big(\sum_{\{x_i\in O:\,x_{ij}>d\}}g_i\Big)^2}{n^j_{r|O}(d)}\right), \tag{1}$$
where $n_O=\sum I[x_i\in O]$, $n^j_{l|O}(d)=\sum I[x_i\in O:x_{ij}\le d]$ and $n^j_{r|O}(d)=\sum I[x_i\in O:x_{ij}>d]$.
For the GOSS algorithm, let $a$ and $b$ be the sampling ratios of large-gradient and small-gradient instances, respectively. According to the sorted absolute gradient values, the top $a\times 100\%$ large-gradient samples are selected to form set $A$, and then $b\times 100\%$ small-gradient samples are randomly selected from the remaining data to form set $B$. To maintain the original sample distribution, all small-gradient samples in set $B$ are multiplied by the coefficient $(1-a)/b$. The final estimated information gain is then:
$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\Big(\sum_{x_i\in A_l}g_i+\frac{1-a}{b}\sum_{x_i\in B_l}g_i\Big)^2}{n^j_l(d)}+\frac{\Big(\sum_{x_i\in A_r}g_i+\frac{1-a}{b}\sum_{x_i\in B_r}g_i\Big)^2}{n^j_r(d)}\right), \tag{2}$$
where $A_l=\{x_i\in A:x_{ij}\le d\}$, $A_r=\{x_i\in A:x_{ij}>d\}$, $B_l=\{x_i\in B:x_{ij}\le d\}$ and $B_r=\{x_i\in B:x_{ij}>d\}$.
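To make the sampling step concrete, the following minimal sketch implements the GOSS selection and re-weighting described above in Python/NumPy. The function name and parameters are our own illustration, not part of the LightGBM library or the authors' code.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Select instance indices and weights with gradient-based one-sided sampling.

    Keeps the top a*100% of instances by absolute gradient, randomly keeps
    b*100% of the remaining instances, and up-weights the latter by (1-a)/b
    so that the estimated information gain stays approximately unbiased.
    """
    rng = np.random.default_rng(rng)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))           # sort by |gradient|, descending
    top_n = int(a * n)
    rand_n = int(b * n)

    large = order[:top_n]                            # set A: large-gradient instances
    small = rng.choice(order[top_n:], size=rand_n,   # set B: sampled small-gradient instances
                       replace=False)

    indices = np.concatenate([large, small])
    weights = np.ones(len(indices))
    weights[top_n:] = (1.0 - a) / b                  # compensate for the discarded instances
    return indices, weights

# Example on 10,000 synthetic gradients
g = np.random.randn(10_000)
idx, w = goss_sample(g, a=0.2, b=0.1, rng=42)
print(len(idx), w[:3], w[-3:])
```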
Exclusive feature bundling. GOSS reduces the number of samples, while EFB reduces the dimensionality of the features. The dimensionality of the features is another important factor that affects the time overhead. EFB uses the mutually exclusive nature of the features to reduce its dimensionality. Specifically, the EFB algorithm solves this problem by constructing a graph with weights. The nodes of the graph are represented by the features of the samples, while the weights indicate the degree of feature mutual exclusion. Finally, it is transformed into a graph coloring problem and a greedy strategy is used to solve it.
Algorithm 1: LightGBM |
Input: |
Training data: $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, $x_i\in\mathcal{X}$, $\mathcal{X}\subseteq\mathbb{R}$, $y_i\in\{-1,+1\}$;
Loss function: L(y,θ(x)); // y is the true value and θ(x) is the predicted value |
Iterations: M; |
Big gradient data sampling ratio: a; |
Small gradient data sampling ratio: b; |
1: Use exclusive feature bundling (EFB) to combine mutually exclusive features of $x_i$, $i=1,\dots,n$ (features that are never simultaneously non-zero);
2: Initialize the predicted values: $\theta_0(x)=\arg\min_c\sum_{i=1}^{n}L(y_i,c)$;
3: For $m=1$ to $M$ do:
4:  Calculate gradient absolute values: $g_i=\left|\frac{\partial L(y_i,\theta(x_i))}{\partial\theta(x_i)}\right|_{\theta(x_i)=\theta_{m-1}(x_i)}$, $i=1,\dots,n$;
5:  Resample the dataset using gradient-based one-side sampling (GOSS): topN $=a\times$ len($D$); randN $=b\times$ len($D$); sorted $=$ GetSortedIndices(abs($g$)); $A=$ sorted[1:topN]; $B=$ RandomPick(sorted[topN:len($D$)], randN); $D'=A+B$;
6:  Calculate the information gains: $\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\big(\sum_{x_i\in A_l}g_i+\frac{1-a}{b}\sum_{x_i\in B_l}g_i\big)^2}{n^j_l(d)}+\frac{\big(\sum_{x_i\in A_r}g_i+\frac{1-a}{b}\sum_{x_i\in B_r}g_i\big)^2}{n^j_r(d)}\right)$;
7:  Get a new decision tree $\theta_m(x)'$ on set $D'$;
8:  Update $\theta_m(x)=\theta_{m-1}(x)+\theta_m(x)'$;
9: End for
10: Return $\tilde{\theta}(x)=\theta_M(x)$;
The focal loss function is derived from the cross entropy loss function to boost the recognition of difficult samples [45,46]. The cross-entropy loss function is a typical objective function that measures the closeness of true and observed distributions. A smaller cross-entropy shows a better classification result. The expression of the binary classification cross-entropy (BCE) loss function is shown below:
$$\mathrm{BCE}=-y\log\tilde{y}-(1-y)\log(1-\tilde{y}), \tag{3}$$
where $y$ and $\tilde{y}$ are the true label and the predicted label, respectively.
The focal loss function adds the modulation factors $(1-\tilde{y})^{\gamma}$ and $\tilde{y}^{\gamma}$ to the cross-entropy function, which enables the model to assign greater learning weights to difficult samples. As such,
$$\mathrm{FL}=-y(1-\tilde{y})^{\gamma}\log\tilde{y}-(1-y)\tilde{y}^{\gamma}\log(1-\tilde{y}), \tag{4}$$
where $\gamma\in[0,5]$ is the focal parameter. When $\gamma=0$, Eq (4) reduces to the cross-entropy loss function. The effect of the value of $\gamma$ on the loss is shown in Figure 2.
In addition, the focal loss function introduces a weighting factor $\alpha$, which is used to adjust the weighted losses of the different classes. The final focal loss function is represented as:
$$\mathrm{FL}=-\alpha y(1-\tilde{y})^{\gamma}\log\tilde{y}-(1-\alpha)(1-y)\tilde{y}^{\gamma}\log(1-\tilde{y}), \tag{5}$$
where $\alpha\in[0,1]$.
When performing the binary classification task, the objective function of the LightGBM defaults to the binary cross-entropy loss function. As shown in Figure 2, the classification results with the focal loss function are better than the binary cross-entropy loss. In this paper, we adopt the focal loss function as the objective function in LightGBM to enhance the learning of difficult samples.
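As an illustration of how this substitution can be wired up, the sketch below defines the binary focal loss of Eq (5) and passes it to LightGBM as a custom objective. This is not the authors' implementation: the finite-difference derivatives, the clipping constants and the synthetic data are our own simplifications, and the way the callable is passed depends on the LightGBM version.

```python
import numpy as np
import lightgbm as lgb

def make_focal_objective(alpha=0.25, gamma=2.0):
    """Binary focal loss (Eq (5)) as a LightGBM custom objective on raw scores."""
    def focal(z, y):
        p = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-9, 1 - 1e-9)
        return -(alpha * y * (1 - p) ** gamma * np.log(p)
                 + (1 - alpha) * (1 - y) * p ** gamma * np.log(1 - p))

    def objective(z, dataset):
        y = dataset.get_label()
        eps = 1e-4                                    # central finite differences keep the sketch short
        grad = (focal(z + eps, y) - focal(z - eps, y)) / (2 * eps)
        hess = (focal(z + eps, y) - 2 * focal(z, y) + focal(z - eps, y)) / eps ** 2
        return grad, np.maximum(hess, 1e-6)           # keep the Hessian positive for stable splits
    return objective

# Usage on synthetic data. Depending on the LightGBM version, the callable goes
# either into params["objective"] (4.x) or into the fobj argument of lgb.train (3.x).
X, y = np.random.rand(1000, 40), np.random.randint(0, 2, 1000)
train_set = lgb.Dataset(X, label=y)
params = {"learning_rate": 0.03, "verbose": -1, "objective": make_focal_objective(0.1, 0.9)}
booster = lgb.train(params, train_set, num_boost_round=200)
scores = 1.0 / (1.0 + np.exp(-booster.predict(X)))    # raw scores -> probabilities
```

The α = 0.1 and γ = 0.9 values in the example mirror the NSL-KDD setting reported later in the experiments; they are not a general recommendation.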
Autoencoders are neural networks composed of multiple layers of neurons. Essentially, an autoencoder is a multilayer perceptron that uses a feed-forward architecture [19,47]. The difference is that the autoencoder has the same number of neurons in the input and output layers, which facilitates the reconstruction of the data. In general, an autoencoder consists of an input layer, an encoder, a middle layer, a decoder, and an output layer [24]. The encoder, middle layer, and decoder are also called hidden layers. Its structure is shown in Figure 3. The size of the input and output layers is determined by the dimensionality of the dataset. The encoder is used to compress the dataset, and the decoder is used to reconstruct it. The middle layer is a compressed representation of the dataset, and its size is smaller than the dimensionality of the dataset. In the encoder module, for an input x, a compressed representation y is obtained after mapping; its mathematical expression is shown in Eq (6). In the decoder module, the data x' is reconstructed using different weights w' and biases b'. This process is the opposite of the encoder, and its mathematical expression is shown in Eq (7). Usually, we use a nonlinear activation function f, so that the network can approximate arbitrary functions. In addition, the autoencoder needs an objective function to measure the similarity of x and x'. When x and x' are close, the autoencoder is well trained. In this study, we use the mean square error (MSE) function, one of the most commonly used loss functions, to define the loss of the autoencoder. As such,
$$y=f(wx+b), \tag{6}$$
$$x'=f(w'y+b'), \tag{7}$$
where $w$ is the weight coefficient of the encoder layer and $b$ is the bias vector; $w'$ and $b'$ are the weight coefficients and bias vectors of the decoder layer, respectively. These parameters are updated by the backpropagation of the network. Thus,
$$\mathrm{MSE}=\frac{1}{m}\sum_{i=1}^{m}\left(x'_i-x_i\right)^{2}, \tag{8}$$
where m denotes the number of samples.
To avoid overfitting, adding regularization to the objective function is a common strategy. In this paper, we use L1 regularization to impose restrictions on the weight coefficients and thus improve generalization. Autoencoders that use such regularization are called sparse autoencoders [48]. In addition, they can be further classified into shallow sparse autoencoders and deep sparse autoencoders, based on the number of hidden layers. The difference between them is shown in Figure 4. In the figure, $x\in\mathbb{R}^n$ is the input data, $y\in\mathbb{R}^m$ is the output of the middle layer, $h_l\in\mathbb{R}^k$ is the vector of the $l$th hidden layer, and $x'\in\mathbb{R}^n$ is the output vector of the sparse autoencoder. A shallow sparse autoencoder consists of three layers, i.e., an input layer, a single hidden layer (middle layer) and an output layer [41]. A deep sparse autoencoder consists of multiple hidden layers stacked on top of each other. It can learn more important implicit information from the original data than the shallow sparse autoencoder. In this study, we use a deep sparse autoencoder for our work. As such,
$$L_1=\alpha\|\omega\|_1, \tag{9}$$
where $\|\omega\|_1$ denotes the L1 norm, i.e., the sum of the absolute values of all weight parameters $\omega$, and $\alpha$ is the penalty factor.
According to the above theory, the original data $x$ and the reconstructed data $x'$ are very similar when an autoencoder is trained successfully. Their difference is also called the reconstruction error. In intrusion detection, there is a vast difference between normal samples and attack samples in the dataset. When an autoencoder trained only on normal samples is used to reconstruct attack samples, their reconstruction error will be larger than that of reconstructed normal samples. Therefore, we use the reconstruction error to perform anomaly detection. Suppose the normal sample is $x^+$ and the attack sample is $x^-$. We use only the normal samples $x^+$ to train the autoencoder. Let the reconstructed normal sample be $x'^+$ and the reconstructed attack sample be $x'^-$. Then we find $\|x^+-x'^+\|<\|x^--x'^-\|$. Let the current sample be denoted as $x^*$ and its reconstruction as $x'^*$. Assume that $\|x^+-x'^+\|$ is less than a certain threshold $c$. When $\|x^*-x'^*\|>c$, the sample is judged to be an attack sample.
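The decision rule above can be written in a few lines. The sketch below is our own illustration, assuming a trained Keras autoencoder `ae` and an already chosen threshold `c`; it flags a sample as an attack when its reconstruction error exceeds the threshold.

```python
import numpy as np

def reconstruction_errors(ae, X):
    """Per-sample mean squared reconstruction error (Eq (8)) for a trained autoencoder."""
    X_rec = ae.predict(X, verbose=0)
    return np.mean((X_rec - X) ** 2, axis=1)

def detect_attacks(ae, X, c):
    """Return 1 (attack) where the reconstruction error exceeds the threshold c, else 0 (normal)."""
    return (reconstruction_errors(ae, X) > c).astype(int)

# Hypothetical usage: one common heuristic is to pick c from the error
# distribution of normal training data (the paper itself tunes c empirically).
# errors_normal = reconstruction_errors(ae, X_train_normal)
# c = np.percentile(errors_normal, 95)
# y_flag = detect_attacks(ae, X_test, c)
```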
Figure 5 shows the flow chart of the proposed method. It consists of four parts including data preprocessing, feature selection, model training and classification decision. The details are described below.
Data pre-processing. In the data preprocessing, since the model cannot handle non-numerical features, the training and test sets are first converted to numerical form. The non-numerical (categorical) features are encoded with the one-hot method, which produces a sparse representation that helps enrich the data features. For example, the non-numeric feature "Protocol" has three values [TCP, UDP, ICMP], which can be coded as [1,0,0], [0,1,0] and [0,0,1], respectively. The numerical features have different value ranges, which is not conducive to training the model. Therefore, in order to reduce the convergence time of the model, normalization is needed. In this paper, the maximum-minimum normalization method is used to scale the values into the range [0, 1]. The maximum-minimum normalization method is represented as follows:
$$x_{\mathrm{normalized}}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}, \tag{10}$$
where $x_{\max}$ and $x_{\min}$ denote the maximum and minimum values of feature $x$, respectively.
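A minimal preprocessing sketch along these lines, using scikit-learn, is shown below. The column names are placeholders for the categorical features of the respective dataset, and the pipeline is an assumption of ours rather than the authors' exact code.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

# Placeholder column names; for NSL-KDD the categorical columns are
# the protocol, service and flag features, the rest are numeric.
categorical_cols = ["protocol_type", "service", "flag"]

def build_preprocessor(numeric_cols):
    """One-hot encode categorical features and scale numeric features to [0, 1] (Eq (10))."""
    return ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("minmax", MinMaxScaler(), numeric_cols),
    ])

# Hypothetical usage: fit on the training set only, then transform both sets
# so that the test set is scaled with the training minima/maxima.
# pre = build_preprocessor(numeric_cols)
# X_train = pre.fit_transform(train_df.drop(columns=["label"]))
# X_test = pre.transform(test_df.drop(columns=["label"]))
```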
Feature selection. In the feature selection, the recursive feature elimination method is adopted for feature selection. The recursive feature elimination method is a wrapper method that selects features based on the performance of the classification algorithm. Essentially, the recursive feature elimination method is a greedy algorithm. The recursive deletion is performed based on the ranking score of the features. The method needs to iterate through all the features and remove those that have little impact on the model performance until the desired number of features is satisfied.
Model training. In the model training, we use the process described in Algorithm 1 to build the model. First, the iteration number of the model is set. According to the number of iterations, several different decision trees are trained. Each decision tree is built relying on the performance of the previous decision tree. After several iterations, an integrated model consisting of several weak decision trees is obtained. In particular, we use the focal loss function instead of the default cross-entropy loss function in the definition of the objective function.
Classification decision. The classification decision has two phases. In the first phase, the LightGBM model with the focal loss function is used for pre-classification. In the second phase, the samples predicted as normal in the first phase are classified a second time using the sparse autoencoder. If a sample is judged to be abnormal in this phase, it is finally predicted as an attack; otherwise, it is finally predicted as normal. The two-stage classification decision enables the intrusion detection system to improve its accuracy and its ability to detect unknown attacks.
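The two-stage decision can be expressed compactly. The sketch below combines a fitted LightGBM booster (stage 1) with the autoencoder-based check (stage 2); `gbm`, `ae` and the threshold `c` are assumed objects, and the 0.5 cut-off is an illustrative choice rather than the authors' setting.

```python
import numpy as np

def two_stage_predict(gbm, ae, X, c):
    """Stage 1: LightGBM pre-classification on raw scores; stage 2: autoencoder check
    of the samples predicted as normal. Returns 1 for attack, 0 for normal."""
    probs = 1.0 / (1.0 + np.exp(-gbm.predict(X)))     # sigmoid of raw scores (custom objective)
    pred = (probs >= 0.5).astype(int)                 # stage-1 decision

    normal_idx = np.where(pred == 0)[0]               # only re-examine "normal" predictions
    if normal_idx.size:
        X_rec = ae.predict(X[normal_idx], verbose=0)
        err = np.mean((X_rec - X[normal_idx]) ** 2, axis=1)
        pred[normal_idx[err > c]] = 1                 # large reconstruction error -> attack
    return pred
```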
Description. The NSL-KDD dataset is an improved version of the KDDCup99 dataset [49]. The KDDCup99 dataset is derived from the MIT Lincoln Laboratory's intrusion detection evaluation project, in which data were collected from nine weeks of network connections and system audits. According to Tavallaee et al. [49], the training and testing sets of the KDDCup99 dataset contain 78% and 75% redundant data, respectively. To address this redundancy problem, Tavallaee extracted the NSL-KDD dataset, without redundant data, from the KDDCup99 dataset. The improved NSL-KDD dataset has the following advantages: (1) There are no duplicate records in the training and test sets, so the classifier is not affected by duplicate records. (2) The numbers of records in the training and test sets are reasonable and do not require high-performance hardware. As shown in Table 1, the NSL-KDD dataset consists of 42 features. The values of each feature are divided into numerical and non-numerical types. Among them, the values of three features, namely protocols, services and flags, are non-numeric, and the rest are numeric. The NSL-KDD dataset contains four attack categories, namely DoS, Probe, user-to-root (U2R) and remote-to-local (R2L). All these attack categories are considered anomalies. The size of the NSL-KDD dataset is shown in Figure 6. The training set and test set contain 125,973 and 22,544 records, respectively. The proportions of normal samples and attack samples in the training set are 53.46% and 46.54%, respectively. In the test set, the proportions of normal samples and attack samples are 43.08% and 56.92%, respectively. It is important to note that the attack samples are composed of a variety of different attack types. The test set contains an additional 18 attack types, which means that the test set has different attack patterns [21]. Therefore, it can be used to simulate the detection of zero-day attacks.
Dataset | Feature name | Size |
NSL-KDD | duration, protocols_types, services, flag, src_bytes, dst_bytes, land, wrong_fragment, urgent, hot num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root, num_shells, num_access_files, num_outbound_cmds, is_hot_login, Is_guest_login, count, srv_count, serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate, srv_diff_host_rate, dst_host_count, dst_host_srv_count, dst_host_same_srv_rate, dst_host_diff_srv_rate, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate, dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate, label. | 42 |
UNSWNB15 | srcip, sport, dstip, dsport, protocol, state, dur, sbytes, dbytes, sttl, dttl, sloss, dloss, service, sload, dload, skts, dpkts, swin, dwin, stcpb, dtcpb, smeansz, dmeansz, trans_depth, res_bdy_len, sjit, djit, stime, ltime, sintpkt, dintpkt, tcprtt, synack, ackdat, is_sm_ips_ports, ct_state_ttl, ct_flw_http_mthd, is_ftp_login, ct_ftp_cmd, ct_srv_src, ct_srv_dst, ct_dst_ltm, ct_src_ ltm, ct_src_dport_ltm, ct_dst_sport_ltm, ct_dst_src_ltm, attack_type, label. | 49 |
The UNSWNB15 dataset was created by the Australian Centre for Cyber Security in 2015 using the IXIA tool [50]. The dataset contains a total of 2 million records saved in four different CSV files [51]. To facilitate its use, the UNSWNB15 dataset was divided into a training set and a test set, named UNSWNB15Train and UNSWNB15Test, respectively. As shown in Table 1, a total of 49 features are included in the dataset. Among them, three features, namely protocol, service and state, are non-numeric, and the rest are numeric. Unlike the NSL-KDD dataset, the UNSWNB15 dataset includes nine new attack types: Backdoor, Shellcode, Reconnaissance, Worms, Fuzzers, Analysis, Exploits, DoS and Generic. In this paper, we used the UNSWNB15Train and UNSWNB15Test datasets for our experiments. The information on this dataset is shown in Figure 6. Specifically, the training set and test set contain 175,341 and 82,332 samples, respectively. In the training set, the proportions of normal samples and attack samples are 31.94% and 68.06%, respectively. In the test set, the proportions of normal samples and attack samples are 44.94% and 55.06%, respectively.
Preprocessing. Each NSL-KDD sample contains 41-dimensional features. Because the model cannot handle symbolic data, it is necessary to convert the symbolic features into numeric form. In addition, the data are encoded with the one-hot method. Specifically, the protocol feature has three values, so it is represented by three numbers consisting of 0 and 1. The service feature has 70 values, so it is represented by 70 numbers consisting of 0 and 1. The flag feature has 11 values, so it is represented by 11 numbers consisting of 0 and 1. After one-hot encoding, the dimension of the NSL-KDD dataset is expanded to 122. The UNSWNB15 dataset is processed in the same way, and its dimensionality becomes 196.
In this study, the recursive feature elimination method was used to reduce the data dimensionality. The recursive feature elimination method selects features based on feature importance [52]. First, all the original features are trained using the LightGBM classifier to obtain the weight coefficients of each feature. After that, the features with the smallest weight coefficients are selected and removed from the original feature set to obtain a new subset of features. Finally, the new feature subset is trained again using the LightGBM classifier to obtain the weight coefficients. This process is repeated until the required number of features is obtained. Algorithm 2 describes this process. According to the results in Section 4.3.1, we select 40 and 60 features for the NSL-KDD and UNSWNB15 datasets, respectively.
The sparse autoencoder is used for the second-stage prediction. First, the autoencoder is trained on the normal-class samples from the training set. Then, the trained model is used to reconstruct the test set. In this study, the sparse autoencoder is composed of seven layers in total, five of which are hidden. The encoder layers have 64 and 32 nodes, respectively, and the size of the middle layer is 16. Since the encoder and decoder are symmetric, the decoder layers have 32 and 64 nodes, respectively. The sizes of the input and output layers are 122 for the NSL-KDD dataset and 196 for the UNSWNB15 dataset. The ReLU function is used as the activation function between neurons.
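A sketch of such a sparse autoencoder in Keras is given below. The layer sizes follow the description above (122-dimensional input for NSL-KDD, encoder 64-32, middle layer 16); the output activation, optimizer, regularization strength and training settings are illustrative assumptions, not the authors' exact configuration.

```python
from tensorflow.keras import layers, models, regularizers

def build_sparse_autoencoder(input_dim=122, l1=1e-5):
    """Deep sparse autoencoder: input -> 64 -> 32 -> 16 -> 32 -> 64 -> input,
    ReLU activations and an L1 penalty on the middle-layer weights (Eq (9))."""
    inputs = layers.Input(shape=(input_dim,))
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dense(32, activation="relu")(x)
    code = layers.Dense(16, activation="relu",
                        kernel_regularizer=regularizers.l1(l1))(x)
    x = layers.Dense(32, activation="relu")(code)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(input_dim, activation="sigmoid")(x)  # inputs are scaled to [0, 1]

    ae = models.Model(inputs, outputs)
    ae.compile(optimizer="adam", loss="mse")                    # MSE reconstruction loss (Eq (8))
    return ae

# Hypothetical usage: train on normal samples only, as described above.
# ae = build_sparse_autoencoder(input_dim=122)
# ae.fit(X_train_normal, X_train_normal, epochs=50, batch_size=256, validation_split=0.1)
```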
Algorithm 2: RFE |
Input: Original feature set $S=[1,2,3,\dots,D]$ // D denotes the number of features in a sample; Expected number of features: $N$
Output: Feature ordering set R=[] |
Start: Initialize feature weights wi=1(i=1,…,d) // d denotes the dimensionality of the features in the original dataset |
1: while len(S) ≠ N do:
2: Train the current feature set S with the LightGBM classifier |
3: Calculate the feature weight coefficients in set S |
4: Find the feature with the smallest weight coefficient: r=argminj(wj)(j=1,…,d) |
5: Update feature ordering set: R= [r,R] |
6: Remove less important features: S=S−[r] |
7: d = d-1 |
8: end while
9: return R
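The same procedure can be reproduced with scikit-learn's RFE wrapper around a LightGBM classifier, as in the sketch below. This is a rough equivalent of Algorithm 2 under our own parameter choices, not the authors' code.

```python
import lightgbm as lgb
from sklearn.feature_selection import RFE

def select_features(X_train, y_train, n_features=40):
    """Recursive feature elimination driven by LightGBM feature importances
    (roughly Algorithm 2): drop the least important feature until n_features remain."""
    estimator = lgb.LGBMClassifier(n_estimators=200, random_state=42)
    selector = RFE(estimator, n_features_to_select=n_features, step=1)
    selector.fit(X_train, y_train)
    return selector            # selector.support_ marks the retained columns

# Hypothetical usage: 40 features for NSL-KDD, 60 for UNSWNB15 (Section 4.3.1).
# sel = select_features(X_train, y_train, n_features=40)
# X_train_sel, X_test_sel = sel.transform(X_train), sel.transform(X_test)
```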
The experiments were conducted on a Dell host configured as follows: 32 GB RAM, an Intel Core i7-9700 CPU, and a Radeon RX 550X GPU. To speed up the training of the model, we used the GPU on a Linux server to train the autoencoder. We used TensorFlow version 2.2.0 as the backend. Furthermore, scikit-learn and Keras were used to process the dataset, and the native LightGBM (lgb) library [53] was used to build the model. In the experiments, we set the number of iterations to 200 and the random seed to 42. In addition, for the NSL-KDD dataset, we set α = 0.1 and γ = 0.9. For the UNSWNB15 dataset, we set α = 0.2 and γ = 5.
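For reference, the hyperparameters stated above can be collected in a single configuration dictionary, as sketched below. The key names are our own illustration; only the values come from the text.

```python
# Hyperparameters stated in the text; key names are illustrative.
EXPERIMENT_CONFIG = {
    "num_boost_round": 200,    # number of boosting iterations
    "random_seed": 42,
    "NSL-KDD":  {"alpha": 0.1, "gamma": 0.9, "n_features": 40},
    "UNSWNB15": {"alpha": 0.2, "gamma": 5.0, "n_features": 60},
}
```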
In this paper, we used accuracy, precision, recall and F1 score to evaluate the performance of the model. The accuracy represents the proportion of instances that are correctly predicted to account for all instances. The precision represents the proportion of correctly predicted attack instances to all predicted attack instances. The recall represents the proportion of attack instances that were correctly predicted by the classifier. The F1 score is a metric of balancing precision and recall. The formula of each metric is determined by Eqs (11)–(14).
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}, \tag{11}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}, \tag{12}$$
$$\mathrm{Recall}=\frac{TP}{TP+FN}, \tag{13}$$
$$\mathrm{F1\ score}=2\Big/\left(\frac{1}{\mathrm{Precision}}+\frac{1}{\mathrm{Recall}}\right), \tag{14}$$
where TP represents the number of attack instances that are correctly predicted. TN represents the number of correctly predicted normal instances. FP represents the number of normal instances that are mispredicted. FN represents the number of attack instances that are mispredicted.
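These metrics can be computed directly from the predictions, for example with scikit-learn as in the following illustrative sketch (the attack class is assumed to be labeled 1).

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """Compute Eqs (11)-(14), treating the attack class as the positive label (1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "confusion": {"TP": tp, "TN": tn, "FP": fp, "FN": fn},
    }
```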
In order to find the appropriate number of features, different numbers of features were used for comparison. Figures 7 and 8 show the accuracy of the two datasets for the different number of features. As can be seen from Figure 7, the highest accuracy is obtained on the NSL-KDD dataset when the number of features is 40. However, for the UNSWNB15 dataset, the number of features is 60. Therefore, we set the number of features to 40 and 60 for the NSL-KDD and UNSWNB15 datasets, respectively.
Figure 9 shows the performance of the proposed method on each metric. On the NSL-KDD dataset, the proposed method achieves 92.57%, 89.93%, 97.91% and 93.75% in accuracy, precision, recall and F1 score, respectively. Among them, the recall is the best, which indicates that the proposed method can detect almost all attack samples. The confusion matrix on the NSL-KDD dataset is given in Table 2. It can be seen that only 267 attack samples are not recognized by the proposed method. For the UNSWNB15 dataset, the proposed method achieves 92.71%, 93.43%, 93.32% and 93.38% in terms of accuracy, precision, recall and F1 score, respectively. It can be concluded that the performance on all metrics on this dataset is more balanced. Table 3 shows the confusion matrix on the UNSWNB15 dataset.
NSL-KDD | Predicted label | ||
Normal | Attack | ||
True label | Normal | 8305 | 1406 |
Attack | 267 | 12,566 |
UNSWNB15 | Predicted label | ||
Normal | Attack | ||
True label | Normal | 34,029 | 2971 |
Attack | 3026 | 42,306 |
Table 4 shows the time overhead (the sum of the training time and prediction time of the proposed method). When the feature selection method is not used, the time overheads on the two datasets are 8.22 and 18.6 seconds, respectively. In contrast, when using the recursive feature elimination method, the time overheads of the proposed method are 5.9 and 17.25 seconds, respectively. This indicates that the decision efficiency of the model is improved after applying recursive feature elimination.
Dataset | Original | RFE |
NSL-KDD | 8.22 s | 5.9 s |
UNSWNB15 | 18.6 s | 17.25 s |
In this section, we perform an ablation analysis of the proposed method. The LightGBM model without the focal loss function is taken as the base model, denoted LGBM. The model with the focal loss function is the improved model, denoted FL_LGBM. The proposed model is FL_LGBM-AE. Figure 10 shows the performance of these three models in terms of accuracy and F1 score. The F1 score reflects the harmonic mean of precision and recall. It can be concluded that, for both datasets, the FL_LGBM model shows a clear improvement over the base model, which indicates that introducing the focal loss function into LGBM is effective. Furthermore, on the NSL-KDD dataset, compared with FL_LGBM, the FL_LGBM-AE model improved by 11.5% and 13.08% in accuracy and F1 score, respectively. The FL_LGBM-AE model also performs better than the FL_LGBM model on the UNSWNB15 dataset.
In the proposed method, the learning rate and threshold are two important hyperparameters. The proposed method enables the model to learn from difficult samples by the introduction of the focal loss function. It makes the learning rate of the proposed model more important.
Figures 11 and 12 show the effect of different learning rates on these two datasets. On the NSL-KDD dataset, the model performs the worst when the learning rate is equal to 0.003. When the learning rate equals 0.0035, the model without the autoencoder performs the best, reaching an accuracy of 87.85%. As the learning rate increases, the accuracy of the model gradually decreases. Conversely, the accuracy of the model using autoencoder increased as the learning rate increased. The reason is that, when the learning rate increases, the learning pace of the model becomes larger, resulting in the model not converging to the global minimum. As the learning rate becomes larger, it misclassifies most of the attack classes as normal classes. When using the autoencoder, the attack samples that are misclassified as normal classes are accurately identified. The highest accuracy of the model is obtained when the learning rate is equal to 0.03. Overall, most models that used autoencoders were above 90% accurate, which was higher than the models that did not use autoencoders.
On the UNSWNB15 dataset, the performance of the models is stable, whether or not we use autoencoders. When the learning rate reaches 0.004, the accuracy of the model with the autoencoder is slightly higher than the model without the autoencoder. After this point, the accuracy of the model with the autoencoder is slightly smaller than the model without the autoencoder. The possible reason is that the accuracy of the model without the autoencoder is already over 90%. When using the autoencoder, the accuracy of the model only slightly improves. As the learning rate increases, the model performance deteriorates and becomes worse with the autoencoder. However, it cannot demonstrate that the proposed model is ineffective. Autoencoders can still play an important role, as long as a suitable learning rate is found. As can be seen in Figure 12, the best performance of the model is obtained when the learning rate is 0.004.
When training datasets with autoencoders, the range of reconstruction errors produced by different datasets is different. Figure 13 presents the mean squared error for both datasets. Figures 14 and 15 show the effect of different thresholds on the model. For the NSL-KDD dataset, the proposed method produces the best results when the threshold reaches 0.00095. After that, as the threshold increases, the accuracy of the model gradually decreases. The reason is that an increased threshold causes the autoencoder to fail to identify attack samples with large reconstruction errors. For the UNSWNB15 dataset, the proposed method achieves the best results when the threshold is 0.0095. As the threshold increases further, the performance of the model plateaus: most of the attack samples with large reconstruction errors have already been identified, so increasing the threshold has little influence.
Tables 5 and 6 show the performance of the different methods on the four evaluation metrics. For the NSL-KDD dataset, the proposed method achieves the best results in terms of recall, F1 score and accuracy. It is notable that the random forest (RF), gradient boosting decision tree (GBDT) and Xgboost models all exceed 92% accuracy. For the UNSWNB15 dataset, the proposed method achieves the best performance in terms of precision, F1 score and accuracy. In contrast, the other methods do not exceed 81% in precision and do not reach 90% in accuracy. Although these methods have higher recall, they perform poorly on the other metrics. In particular, the proposed method exceeds 90% in F1 score, which indicates that our method is more balanced in precision and recall, whereas the other methods show a significant imbalance between precision and recall. Overall, the proposed method has better performance on these two datasets, which proves its effectiveness.
Method | Precision (%) | Recall (%) | F1 score (%) | Accuracy (%)
DT | 89.80 | 97.12 | 93.32 | 92.08 |
SVM | 89.89 | 94.45 | 92.12 | 90.80 |
RF | 90.05 | 96.89 | 93.35 | 92.14 |
GBDT | 90.00 | 97.09 | 93.41 | 92.21 |
Xgboost | 90.00 | 96.94 | 93.34 | 92.13 |
Adaboost | 89.68 | 97.15 | 93.27 | 92.02 |
Proposed method | 89.93 | 97.91 | 93.75 | 92.57 |
Method | Precision (%) | Recall (%) | F1 score (%) | Accuracy (%)
DT | 80.42 | 95.40 | 87.27 | 84.68 |
SVM | 75.03 | 99.58 | 85.58 | 81.52 |
RF | 77.51 | 99.32 | 87.07 | 83.76 |
GBDT | 76.04 | 99.51 | 86.20 | 82.46 |
Xgboost | 76.47 | 98.86 | 86.23 | 82.26 |
Adaboost | 76.57 | 98.85 | 86.30 | 82.72 |
Proposed method | 93.43 | 93.32 | 93.38 | 92.71 |
Figures 16 and 17 show the time overhead for the different methods. It can be seen that the support vector machine (SVM) model has the highest time overhead on these two datasets. The reason is that, after mapping the data to the nonlinear space, the SVM needs to calculate the maximum interval of the decision boundary, which increases the computational overhead. As the number of samples increases, the time overhead becomes larger. It shows that the SVM is not suitable for handling large datasets. In addition, the decision tree (DT) model has the least time overhead owing to the simple decision tree algorithm. The proposed method adds a portion of time overhead due to the use of the focal loss function. The time overheads of the proposed method are 5.9 and 17.25 seconds for these two datasets, respectively. Although the proposed method is not optimal in terms of time overhead, it is still less than the overheads of the SVM, RF, GBDT and Adaboost models. It means that the proposed method still has an advantage in terms of time overhead.
Table 7 shows the comparison of our method with existing methods; the results are taken from the respective publications. For the NSL-KDD dataset, it can be seen that our method performs the best in terms of accuracy, recall and F1 score, reaching 92.57%, 97.91% and 93.75%, respectively. In terms of precision, the method in [20] performs the best. For the UNSWNB15 dataset, our method performs the best in terms of accuracy and F1 score, reaching 92.71% and 93.38%, respectively. In terms of recall, the methods in [15] and [32] obtained 99.28% and 98.06%, respectively, which are the best among all methods. In terms of precision, the method in [31] obtained the best result, achieving 93.88%. The reason is that [15] and [31] used ensembles of several different classifiers, and [32] used a deep learning approach based on artificial neural networks. However, none of them achieved an accuracy of 90%. In contrast, our method exceeded 90% in all metrics, showing that it is more effective. This is owing to our proposed two-stage decision step.
Dataset | Method | Accuracy (%) | Recall (%) | Precision(%) | F1 score(%) |
NSL-KDDTest | RandomTree+NBtree [13] | 89.24 | N/A | N/A | N/A |
CBR-CNN [19] | 89.41 | N/A | N/A | N/A | |
AE-LSTM [24] | 89.00 | 88.00 | N/A | N/A | |
AIDA [20] | 92.41 | 92.00 | 94.52 | 93.24 | |
STL [27] | 88.39 | 95.95 | 85.44 | 90.40 | |
AE-IDS [30] | 84.21 | 80.37 | 87.00 | 81.98 | |
MFFSEM [31] | 84.33 | 96.43 | 74.61 | 84.13 | |
Our Method | 92.57 | 97.91 | 89.93 | 93.75 | |
UNSWNB15Test | Voting-CMN [15] | 89.29 | 99.28 | 82.37 | 90.04 |
RepTree [26] | 88.95 | N/A | N/A | N/A | |
ANN [32] | 86.71 | 98.06 | 81.54 | 89.04 | |
MFFSEM [31] | 88.85 | 80.44 | 93.88 | 86.64 | |
LOF [33] | 91.86 | N/A | N/A | N/A | |
GAA [54] | 91.80 | 91.00 | N/A | N/A | |
GBM [55] | 91.31 | N/A | N/A | N/A | |
Our Method | 92.71 | 93.32 | 93.43 | 93.38 |
In this work, we proposed a two-stage intrusion detection framework based on LightGBM and autoencoders. In this framework, to address the curse of dimensionality, the recursive feature elimination method was used for feature selection. In addition, the focal loss function was introduced into LightGBM to enhance the learning of difficult samples. To improve the capability of detecting zero-day attacks, the decision-making process was divided into two stages, thereby improving the performance of the intrusion detection system. The experiments were performed on the NSL-KDD and UNSWNB15 datasets, and the accuracy rates were 92.57% and 92.71%, respectively, while the recall reached 97.91% and 93.32%, respectively. The experiments compared both classical and state-of-the-art methods, and the results proved the effectiveness of the proposed method. We conclude that the proposed method can improve the efficiency and performance of intrusion detection systems.
Although the proposed method achieves a high recall on the NSL-KDD dataset, the recall on UNSWNB15 still needs to be improved. In addition, the precision of our method on both datasets is not yet state-of-the-art, which means that we need to further optimize the model. In future work, we will mainly focus on two aspects: First, since the threshold of the autoencoder is a key factor affecting the model, we will develop a method to set the threshold automatically. Second, we will segment the attack types and adopt a suitable sampling method to further improve the model performance.
This work was supported by the National Natural Science Foundation of China under Grant 61862007, and the Guangxi Natural Science Foundation under Grant 2020GXNSFBA297103.
We declare that there are no conflicts of interest.