Invertible neural network (INN) is a promising tool for inverse design optimization. While generating forward predictions from given inputs to the system response, INN enables the inverse process without much extra cost. The inverse process of INN predicts the possible input parameters for the specified system response qualitatively. For the purpose of design space exploration and reasoning for critical engineering systems, accurate predictions from the inverse process are required. Moreover, INN predictions lack effective uncertainty quantification for regression tasks, which increases the challenges of decision making. This paper proposes the probabilistic invertible neural network (P-INN): the epistemic uncertainty and aleatoric uncertainty are integrated with INN. A new loss function is formulated to guide the training process with enhancement in the inverse process accuracy. Numerical evaluations have shown that the proposed P-INN has noticeable improvement on the inverse process accuracy and the prediction uncertainty is reliable.
Citation: Yiming Zhang, Zhiwei Pan, Shuyou Zhang, Na Qiu. Probabilistic invertible neural network for inverse design space exploration and reasoning[J]. Electronic Research Archive, 2023, 31(2): 860-881. doi: 10.3934/era.2023043
1. Introduction
In recent years, with the popularity of smartphones, wearable devices, intelligent home appliances, and autonomous driving, an abundance of data has been generated by many distributed devices. These data are usually concentrated in a data center for effective use. However, a crucial issue arises: concentrated data storage can leak personal privacy [1]. At the same time, as the computing power of these mobile devices increases, it becomes attractive to store data locally while completing the related computing tasks. Federated learning is a distributed machine learning framework that allows multiple parties to collaboratively train a model without sharing raw data [2,3], and it has recently attracted significant attention from industry and academia. [4] summarizes and discusses the application of federated learning in big data and its future direction. Although federated learning has essential significance and advantages in protecting user privacy, it also faces many challenges.
First of all, due to its distributed nature, federated learning is vulnerable to Byzantine attacks. Notably, it has been shown that, with just one Byzantine client, the whole federated optimization algorithm can be compromised and fail to converge [5]. When the training data is not independent and identically distributed (non-iid), defending against Byzantine attacks becomes even harder, and it is difficult to guarantee the convergence of the model [6].
Methods for defending against Byzantine attacks in federated learning have been extensively studied, including the coordinate-wise trimmed mean [9], the coordinate-wise median [7,8], the geometric median [10,11], and the distance-based methods Krum [12], BREA [6], and Bulyan [5]. In addition to the above methods based on statistical knowledge, [14] proposes a new idea based on anomaly detection to detect Byzantine clients in the learning process. [13] discusses the challenges and future directions of federated learning in real-time scenarios in terms of cybersecurity.
The above methods can defend against Byzantine attacks to some extent, but they also have limitations. First, the methods based on statistical knowledge have high computational complexity, and their defense abilities are weakened by the non-iid data in federated learning. Second, the anomaly detection algorithm [14] relies on the premise that the detection model is trained on the test data set. This premise cannot be realized in practical applications, because it is difficult to obtain a data set that covers almost all data distributions. Therefore, it is necessary for the anomaly detection model to be pre-trained without relying on the test data set and to be updated dynamically on non-iid data.
In this paper, we propose a new method in which each client shares a small amount of data with the server, making a trade-off between client privacy and model performance. Unlike FedAvg [2], we use the credibility score, rather than the sample size, as the weight in model aggregation. The credibility score of each client is obtained by integrating the verification score and the detection score. The former is calculated using the shared data.
The main contributions of this paper are:
▪ We propose a new federated learning framework (BRCA) which combines credibility assessment and unified update. BRCA not only effectively defends against Byzantine attacks, but also reduces the impact of non-iid data on the aggregated global model.
▪ The credibility assessment, which combines anomaly detection and data verification, effectively detects Byzantine attacks on non-iid data.
▪ By incorporating an adaptive mechanism and transfer learning into the anomaly detection model, the anomaly detection model can dynamically improve detection performance. Moreover, its pre-training no longer relies on the test data set.
▪ We customize four different data distributions for each data set, and explore the influence of data distribution on defense methods against Byzantine attacks.
2. Related work
FedAvg was first proposed in [2] as an aggregation algorithm for federated learning. The server updates the global model by a weighted average of the clients' model updates, and the aggregation weight of each client is determined by its data sample size. Stich [15] and Woodworth et al. [16] analyze the convergence of FedAvg on strongly convex smooth loss functions. However, they assume that the data is iid, which is not suitable for federated learning [17,18]. Li et al. [19] give the first convergence analysis of FedAvg when the data is non-iid. [20] uses clustering to improve federated learning on non-iid data. Regrettably, naive FedAvg has very weak resistance to Byzantine attacks.
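As a minimal sketch of the sample-size-weighted average described above, assuming each client update is a flat parameter vector (the function name `fedavg` and the toy numbers are illustrative, not taken from [2]):

```python
import numpy as np

def fedavg(updates, sample_sizes):
    """Average client updates with weights proportional to sample size."""
    updates = np.asarray(updates, dtype=float)        # shape (k, d)
    weights = np.asarray(sample_sizes, dtype=float)   # shape (k,)
    weights = weights / weights.sum()                 # normalise to sum to 1
    return weights @ updates                          # shape (d,)

# Two clients: the second holds three times as many samples as the first.
agg = fedavg([[1.0, 2.0], [3.0, 4.0]], sample_sizes=[1, 3])
print(agg)
```

Because the weights depend only on the sample sizes, a single client reporting an arbitrary update can move this average arbitrarily far, which is the weakness the Byzantine-robust methods below address.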
In the iterative process of federated aggregation, honest clients send the true model updates to the server, wishing to train a global model by consolidating their private data. However, Byzantine clients attempt to perturb the optimization process [21]. Byzantine attacks may be caused by some data corruption events in the computing or communication process such as software crashes, hardware failures and transmission errors. Simultaneously, they may also be caused by malicious clients through actively transmitting error information, in order to mislead the learning process [21].
Byzantine-robust federated learning has received increasing attention in recent years. Krum [12] is designed specifically to defend against Byzantine attacks in federated learning. Krum generates the global model from the single client model update whose distances to its neighbors are the shortest. GeoMed [10] uses the geometric median, which generalizes the median from one dimension to multiple dimensions. Unlike Krum, GeoMed uses all client updates, not just one, to generate the new global model. Trimmed Mean [9] obtains each dimension of the global model by averaging the parameters of the clients' model updates in that dimension, after deleting the largest and smallest parameters in that dimension. Xie et al. [22] and Mhamdi et al. [5] are variants of this method. BREA [6] also considers the security of information transmission, but its defense method is still based on distance calculation. Zero [23] uses a watermark-based detection approach to detect attacks such as malware, phishing, and cryptojacking. [24] surveys intrusion detection techniques in mobile cloud computing environments.
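The two statistical defenses described above can be sketched as follows. This is an illustrative reading, not the reference implementations of [9] or [12]: the Krum score used here is the sum of squared distances to the n − f − 2 nearest neighbors, and the parameter names `trim` and `num_byzantine` are assumptions.

```python
import numpy as np

def trimmed_mean(updates, trim):
    """Per dimension, drop the `trim` largest and smallest values, then average."""
    u = np.sort(np.asarray(updates, dtype=float), axis=0)  # sort each column
    if 2 * trim >= len(u):
        raise ValueError("trim removes every update")
    return u[trim:len(u) - trim].mean(axis=0)

def krum(updates, num_byzantine):
    """Return the single update closest to its nearest neighbours."""
    u = np.asarray(updates, dtype=float)
    n = len(u)
    m = n - num_byzantine - 2                     # neighbours counted per update
    if m < 1:
        raise ValueError("need n > num_byzantine + 2")
    d2 = ((u[:, None, :] - u[None, :, :]) ** 2).sum(axis=2)  # pairwise sq. distances
    scores = [np.sort(np.delete(d2[i], i))[:m].sum() for i in range(n)]
    return u[int(np.argmin(scores))]

updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [5.0, 5.0]]
tm = trimmed_mean(updates, trim=1)
kr = krum(updates, num_byzantine=1)
```

In this toy example the outlying update `[5.0, 5.0]` is discarded by both rules. When honest updates themselves differ strongly, as on non-iid data, the same rules can discard honest clients as well.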
All of the above defense methods based on statistical knowledge and distance are not effective in defending against Byzantine attacks in non-iid settings. Abnormal [25] uses an anomaly detection model to complete the detection of Byzantine attacks.
The concept of independent and identically distributed (iid) data is clear, but non-iid can mean many things. In this work, we only consider label distribution skew [17]: the categories of samples may vary across clients. For example, in a face recognition task, each user generally has only their own face data; on mobile devices, some users may use emojis that do not appear on other users' devices.
We summarize the contributions and limitations of the existing works in Table 1.
Table 1. Summary of the contributions and limitations of the related papers.
In this paper, we propose a method that combines credibility assessment and unified update to make federated learning robust against Byzantine attacks on non-iid data.
3. Byzantine-robust federated learning on non-iid data
We use a federated setting in which one server communicates with many clients. For the rest of the paper, we use the following notation: $A$ is the total client set, $|A| = n$; $S$ is the client set selected in each iteration, $|S| = k$; within $S$, $B$ is the Byzantine client set, $|B| = b$, and $H$ is the honest client set, $|H| = h$. $w_i^t$ is the model update sent by client $i$ to the server at round $t$; the Byzantine attack rate is $\xi = b/k$; $w^t$ is the global model at round $t$; $D^P = \{D_1^P, \ldots, D_n^P\}$ is the clients' private data; $D^s = \{D_1^s, \ldots, D_n^s\}$ is the clients' shared data; and the data-sharing rate is $\gamma = |D^s| / (|D^P| + |D^s|)$, where $|\cdot|$ denotes the sample size of a data set.
3.1. BRCA: Byzantine-robust federated learning via credibility assessment
To enhance the robustness of federated learning against Byzantine attacks on non-iid data, BRCA combines credibility assessment and unified update. Figure 1 depicts the architecture of BRCA.
Before training, each client shares some of its private data with the server. In each iteration, the server randomly selects some clients and sends the latest global model to them. These clients use their private data to train the model locally and send the model updates to the server. After receiving the model updates, the server conducts a credibility assessment of each model update and calculates its credibility score. Momentum is an effective measure to improve the ability of federated learning to resist Byzantine attacks [26], so our aggregation rule, Eq (1), is as follows:
$$w^{t+1} = \alpha w^t + (1 - \alpha) \sum_{i \in S} r_i^t w_i^t \tag{1}$$
where $r_i^t$ is the credibility score of client $i$ at round $t$ and $\alpha$ ($0 < \alpha < 1$) is a decay factor. Finally, the unified update uses the shared data to update this primary global model and obtain the new global model for the round.
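As a minimal sketch of Eq (1), assuming the model and the updates are flat parameter vectors and the credibility scores are already normalized (the function name `aggregate` and the toy values are illustrative):

```python
import numpy as np

def aggregate(w_global, updates, cred, alpha):
    """Eq (1): momentum-style mix of the old model and credibility-weighted updates."""
    w_global = np.asarray(w_global, dtype=float)   # w^t, shape (d,)
    updates = np.asarray(updates, dtype=float)     # w_i^t, shape (k, d)
    cred = np.asarray(cred, dtype=float)           # r_i^t, shape (k,)
    return alpha * w_global + (1.0 - alpha) * (cred @ updates)

# The third client has credibility 0, so its update is ignored entirely.
w_next = aggregate(
    w_global=[0.0, 0.0],
    updates=[[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]],
    cred=[0.5, 0.5, 0.0],
    alpha=0.5,
)
print(w_next)
```

A client whose credibility score is zero contributes nothing to the sum, which is how a flagged update is excluded without changing the form of the rule.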
Algorithm 1 describes BRCA, which contains Credibility Assessment in line 22 and Unified Update in line 28. The crucial part of BRCA in defending against Byzantine attacks is the credibility assessment. On non-iid data, the differences between the data distributions of different clients are immense, and it is difficult to judge whether a difference is caused by Byzantine attacks or by the non-iid data. However, the model update of an honest client should have a positive effect on its own private data, and this effect is not influenced by other clients. Moreover, an anomaly detection model can effectively detect Byzantine attacks [25]. Thus, we combine these two ideas to detect Byzantine attacks. To overcome the shortcomings of existing anomaly detection models, we propose an adaptive anomaly detection model. In this paper, the shared data is randomly selected by each client based on the sample category. Other sampling methods, such as clustering, could also be used. In addition, the shared data is used only on the server, not on the clients, which effectively protects the clients' privacy.
Algorithm 1: BRCA
Input: total clients $A$; total number of iterations $T$; learning rates $\eta_{server}$, $\eta_{client}$, $\eta_{detection}$; Byzantine attack rate $\xi$; epochs $E_{server}$, $E_{client}$; initial global model $w^0$; clients' private data $D^P = \{D_1^P, \ldots, D_n^P\}$; clients' shared data $D^s = \{D_1^s, \ldots, D_n^s\}$; initial anomaly detection model $\theta^0$; $\beta$; $\alpha$; $d$; $k$
Output: global model $w^{T+1}$; anomaly detection model $\theta^{T+1}$
To summarize, BRCA has five steps. First, the server pre-trains an anomaly detection model on source data and initializes a global model. Second, every client shares a small amount of private data with the server. Third, every selected client downloads the newest global model from the server, computes its model update on its private data, and sends the update to the server. Fourth, the server updates the global model and adapts the anomaly detection model using the model updates from the clients. Fifth, the server applies the unified update to the primary global model, which yields the new global model. Steps three to five are repeated until the global model converges.
Our work differs from the recent state of the art in two ways. First, Krum, GeoMed and Trimmed Mean are representative methods based on geometric knowledge, but they assume that the clients' data is independent and identically distributed (iid). Our method is instead based on the practical setting of federated learning and targets non-iid data. Second, Abnormal is the first method to detect Byzantine attacks with an auto-encoder anomaly detection model. However, its anomaly detection model is trained on the test data set and remains static. Our method improves on both points: 1) we pre-train the anomaly detection model on related but different source data, without relying on the test data set; 2) we introduce an adaptive mechanism into the anomaly detection model, which lets the detection model be updated dynamically during the federated iterations.
3.2. Credibility assessment
Algorithm 2 (Credibility Assessment) is the key part of BRCA; it assigns a credibility score to each client model update. A Byzantine client is given a much lower credibility score than an honest client. To guarantee the accuracy of the credibility score, Credibility Assessment integrates the adaptive anomaly detection model and data verification.
Algorithm 2: Credibility Assessment
Input: local model updates $Q$; clients' shared data $D^s = \{D_1^s, \ldots, D_n^s\}$; anomaly detection model $\theta^t$; $\beta$; selected clients $S$; $\eta_{detection}$; $d$; $k$
Output: credibility scores of clients $R$; honest client set $H$; anomaly detection model $\theta^{t+1}$
1 $R = \emptyset$: credibility score set; $H = \emptyset$: the honest client set; $sum = 0$; $sum_e = 0$; $sum_f = 0$
2 $C = \{C_1^t, \ldots, C_i^t, \ldots, C_k^t\}$, client $i \in S$, where $C_i^t$ is the weight of the last convolutional layer of $w_i^t$
In Algorithm 2, line 4 is the data verification, which calculates the verification score $f_i$ for the model update of client $i$. Line 5 is the get-anomaly-score() of the adaptive anomaly detection model, which calculates the detection score $e_i$. The credibility of the model update is then $r_i = \beta e_i + (1 - \beta) f_i$, with $R = \{r_1, \ldots, r_i, \ldots, r_k\}$, client $i \in S$. The make-adaption() in line 24 implements the adaptation of the anomaly detection model.
In this paper, we judge a model update whose credibility score is lower than the mean of $R$ to be a Byzantine attack and set its credibility score to zero. Finally, we normalize the scores to obtain the final credibility scores.
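The scoring rule above can be sketched as follows, assuming the detection scores and verification scores have already been computed and are positive (the function name `credibility_scores` and the toy values are illustrative):

```python
import numpy as np

def credibility_scores(e, f, beta):
    """Combine scores, zero out below-mean clients, and normalise."""
    e = np.asarray(e, dtype=float)        # detection scores e_i
    f = np.asarray(f, dtype=float)        # verification scores f_i
    r = beta * e + (1.0 - beta) * f       # r_i = beta*e_i + (1-beta)*f_i
    honest = r >= r.mean()                # below the mean => judged Byzantine
    r = np.where(honest, r, 0.0)
    # Assumes positive scores, so at least one entry survives and the sum is > 0.
    return r / r.sum(), honest

r, honest = credibility_scores(
    e=[0.9, 0.8, 0.1, 0.85],
    f=[0.8, 0.9, 0.2, 0.75],
    beta=0.5,
)
print(r, honest)
```

Here the third client falls below the mean combined score, receives weight zero, and the remaining weights are rescaled to sum to one.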
3.2.1. Adaptive anomaly detection model
In the training process, we cannot predict the type of attack, but we can estimate the model update of an honest client. Therefore, we can adopt a one-class classification algorithm to build the anomaly detection model from normal model updates. Such a technique learns the distribution boundary of the model updates in order to determine whether a new sample is abnormal. The auto-encoder is an effective one-class learning model for detecting anomalies, especially for high-dimensional data [27].
In practical applications, we cannot get the target data to complete the pre-training of our anomaly detection model. Therefore, the initialized anomaly detection model will be pre-trained on the source data with the idea of transfer learning.
At round $t$, the detection score $e_i^t$ of client $i$ is:
$$e_i^t = \exp\left(\frac{\mathrm{Mse}\left(C_i^t - \theta^t(C_i^t)\right) - \mu(E)}{\sigma(E)}\right) \tag{2}$$
Our anomaly detection model differs from the one in Abnormal in two ways. 1) Abnormal uses the test set of the data set to train the anomaly detection model. Although the resulting detection model can complete the detection task very well, in most cases the test data set is not available. Therefore, based on the idea of transfer learning, we complete the pre-training of the anomaly detection model in the source domain. 2) Abnormal's anomaly detection model is not updated after training on the test set. We think this is unreasonable, because the test set is only a tiny part of the overall data, and a model trained on such a small part may not detect anomalies in the remaining data accurately. Therefore, after pre-training in the source domain, we use the data of the target domain to fine-tune the anomaly detection model during the iterative process, so that it is updated dynamically, as shown by make-adaption in Algorithm 3.
Algorithm 3: AADM adaptive anomaly detection model
Input: anomaly detection model $\theta^t$; weights of the last convolutional layer of the local models $C$; $\eta_{detection}$; credibility scores $R$; honest client set $H$; $d$; $k$
Output: updated anomaly detection model $\theta^{t+1}$
The non-iid nature of the client data increases the difficulty of Byzantine defense. However, the performance of each client's updated model on its own shared data is not affected by other clients, which effectively solves this problem. Therefore, we use the clients' shared data $D^s = \{D_1^s, \ldots, D_i^s, \ldots, D_k^s\}$, client $i \in S$, to calculate the verification score of their updated models:
$$f_i^t = \left(\exp\left(\frac{l_i^t - \mu(L)}{\sigma(L)}\right)\right)^{-2} \tag{3}$$
where $l_i^t$ is the loss of client $i$, calculated with model $w_i^t$ on the shared data $D_i^s$ at round $t$:
$$l_i^t = \frac{1}{|D_i^s|} \sum_{j=1}^{|D_i^s|} l\left(D_i^{s(j)}, w_i^t\right) \tag{4}$$
where $D_i^{s(j)}$ is the $j$-th sample of $D_i^s$, and $\mu(L)$, $\sigma(L)$ are the mean and variance of the set $L = \{l_1, \ldots, l_k\}$, respectively.
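As a minimal sketch of Eq (3), assuming the per-client mean losses of Eq (4) are already available. Two choices here are assumptions: the spread $\sigma(L)$ is computed with `np.std` (the text calls it the variance, in which case `np.var` would be substituted), and a small `eps` guards against identical losses.

```python
import numpy as np

def verification_scores(losses, eps=1e-12):
    """Eq (3): f_i = exp((l_i - mu) / sigma) ** -2, one score per client."""
    l = np.asarray(losses, dtype=float)   # l_i^t from Eq (4), shape (k,)
    mu = l.mean()
    sigma = l.std() + eps                 # assumption: std, plus a zero guard
    return np.exp((l - mu) / sigma) ** -2

# The fourth client's update performs badly on its own shared data.
f = verification_scores([0.20, 0.30, 0.25, 2.00])
print(f)
```

The score decreases monotonically with the loss, so an update that fits the client's own shared data poorly receives a low verification score regardless of what the other clients' data looks like.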
3.3. Unified update
After obtaining the credibility score $r_i^t$ in Algorithm 2 from the anomaly score $e_i^t$ and the verification score $f_i^t$, we can aggregate the clients' local model updates with Eq (1) and obtain a preliminary updated global model. However, because the client data is non-iid, the knowledge learned by the local model of each client is limited, and the differences between the models of two clients are significant. Therefore, to address the lack of a clear and consistent goal in the preliminary aggregated model, we introduce an additional unified update procedure that uses the shared data on the server. Details are given in Algorithm 4.
Algorithm 4: Unified update
Input: global model $w^{t+1}$; clients' shared data $D^s = \{D_1^s, \ldots, D_n^s\}$; $E_{server}$; $\eta_{server}$; honest client set $H$
Output: global model $w^{t+1}$
1 foreach epoch $e = 0$ to $E_{server}$ do
Because the data used for the unified update is composed of each client's data, it can more comprehensively cover the distribution of the overall data. The goal and direction of the unified update are based on the overall situation and will not tend to individual data distribution.
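One way to picture the unified update is a few epochs of gradient descent on the pooled shared data. The sketch below is an assumption-laden illustration, not Algorithm 4 itself: it uses a softmax regression model, full-batch gradient descent, and only the shared data of clients in the honest set $H$; the names `unified_update`, `shared`, and `honest` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    """Mean cross-entropy of a softmax regression model with weights W."""
    p = softmax(X @ W)
    return float(-np.log(p[np.arange(len(y)), y]).mean())

def unified_update(W, shared, honest, epochs, lr):
    """Fine-tune W on the pooled shared data of the honest clients."""
    X = np.vstack([shared[i][0] for i in honest])
    y = np.concatenate([shared[i][1] for i in honest])
    Y = np.eye(W.shape[1])[y]               # one-hot labels
    for _ in range(epochs):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        W = W - lr * grad
    return W

shared = {
    0: (np.array([[1.0, 0.0], [0.9, 0.1]]), np.array([0, 0])),
    1: (np.array([[0.0, 1.0], [0.1, 0.9]]), np.array([1, 1])),
    2: (np.array([[1.0, 0.0]]), np.array([1])),   # flagged client, not used
}
W0 = np.zeros((2, 2))
W1 = unified_update(W0, shared, honest=[0, 1], epochs=10, lr=0.5)
```

Because the pooled data mixes samples from clients with different label distributions, each gradient step pulls the model toward the overall distribution rather than toward any single client's classes.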
4. Experiments
To verify the effectiveness of BRCA, we structure the clients' data into varying degrees of non-iid and explore the impact of different amounts of shared data on the global model. We also compare the performance of our anomaly detection model with that of Abnormal and explore the necessity of the unified update.
4.1. Experimental setup
4.1.1. Datasets
Mnist and Cifar10 are the two most commonly used public data sets in image classification, and most of the benchmark methods in our work also use these two data sets for experiments. Using these two data sets, it is easier to compare with other existing methods.
We do the experiments on Mnist and Cifar10, and customize four different data distributions: (a) non-iid-1: each client only has one class of data. (b) non-iid-2: each client has 2 classes of data. (c) non-iid-3: each client has 5 classes of data. (d) iid: each client has 10 classes of data.
For Mnist, we use 100 clients and four data distributions: (a) non-iid-1: each class of data in the training dataset is divided into 10 pieces, and each client selects one piece as its private data. (b) non-iid-2: each class of data in the training dataset is divided into 20 pieces, and each client selects 2 pieces from different classes. (c) non-iid-3: each class of data in the training dataset is divided into 50 pieces, and each client selects 5 pieces from different classes. (d) iid: each class of data in the training dataset is divided into 100 pieces, and each client selects 10 pieces from different classes. As the source domain for the pre-training of the anomaly detection model, we randomly select 20,000 lowercase letters from the Nist dataset.
For Cifar10, there are 10 clients, and the configuration of the four data distributions is similar to that of Mnist. We select some classes of Cifar100 as the source domain: lamp (number: 40), lawn mower (41), lobster (45), man (46), forest (47), mountain (49), girl (35), Snake (78), Rose (70) and Tao (68). These classes do not exist in Cifar10.
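The piece-based splitting scheme for Mnist can be sketched as below. This is an illustrative construction under stated assumptions: pieces are handed out in a fixed cyclic order so that each client receives pieces from distinct classes and every piece is used exactly once, whereas the text only says that each client "selects" its pieces. The function name `label_skew_partition` is hypothetical.

```python
import numpy as np

def label_skew_partition(labels, n_clients, classes_per_client, n_classes=10, seed=0):
    """Split sample indices so each client holds `classes_per_client` classes."""
    labels = np.asarray(labels)
    total = n_clients * classes_per_client
    if total % n_classes != 0:
        raise ValueError("pieces cannot be spread evenly over the classes")
    pieces_per_class = total // n_classes      # e.g. 100 * 2 / 10 = 20 pieces
    rng = np.random.default_rng(seed)
    pieces = []
    for c in range(n_classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        pieces.append(np.array_split(idx, pieces_per_class))
    clients = []
    for i in range(n_clients):
        parts = []
        for j in range(classes_per_client):
            t = i * classes_per_client + j     # running piece counter
            parts.append(pieces[t % n_classes][t // n_classes])
        clients.append(np.concatenate(parts))
    return clients

# Toy stand-in for the label vector: 10 classes with 100 samples each.
labels = np.repeat(np.arange(10), 100)
clients = label_skew_partition(labels, n_clients=100, classes_per_client=2)
```

With `classes_per_client` set to 1, 2, 5 and 10, this reproduces the piece counts of non-iid-1, non-iid-2, non-iid-3 and iid (10, 20, 50 and 100 pieces per class).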
4.1.2. Models
We use logistic regression on the Mnist dataset, with ηserver = 0.1, ηclient = 0.1, ηdetection = 0.02, Eclient = 5, Eserver = 1, n = 100, k = 30, ξ = 20%. On Cifar10, we use a model with two convolutional layers and three fully connected layers, with ηserver = 0.05, ηclient = 0.05, ηdetection = 0.002, Eclient = 10, Eserver = 10, n = 10, k = 10, ξ = 20%. The structures of the models are the same as in [10].
4.1.3. Benchmark byzantine attacks
Same-value attacks: a Byzantine client $i$ sends the model update $\omega_i = c\mathbf{1}$ to the server, where $\mathbf{1}$ is the all-ones vector and $c$ is a constant; we set $c = 5$. Sign-flipping attacks: each client $i$ computes its true model update $\omega_i$, and Byzantine clients then send $\omega_i = a\,\omega_i$ ($a < 0$) to the server; we set $a = -5$. Gaussian attacks: Byzantine clients add Gaussian noise to all dimensions of the model update, $\omega_i = \omega_i + \epsilon$, where $\epsilon$ follows the Gaussian distribution $N(0, g^2)$ with standard deviation $g$; we set $g = 0.3$.
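The three attacks can be sketched directly from their definitions, assuming a model update is a NumPy array. The function names are illustrative, and `g` is treated as the standard deviation of the noise, following the $N(0, g^2)$ notation.

```python
import numpy as np

def same_value_attack(update, c=5.0):
    """Replace every coordinate with the constant c (omega_i = c * 1)."""
    return np.full(np.shape(update), c, dtype=float)

def sign_flipping_attack(update, a=-5.0):
    """Scale the true update by a negative factor (omega_i = a * omega_i)."""
    return a * np.asarray(update, dtype=float)

def gaussian_attack(update, g=0.3, rng=None):
    """Add N(0, g^2) noise to every coordinate of the true update."""
    rng = np.random.default_rng() if rng is None else rng
    update = np.asarray(update, dtype=float)
    return update + rng.normal(0.0, g, size=update.shape)

w = np.array([1.0, -2.0])
sv = same_value_attack(w)
sf = sign_flipping_attack(w)
ga = gaussian_attack(w, rng=np.random.default_rng(0))
```

The same-value and sign-flipping attacks are deterministic given the true update, while the Gaussian attack keeps the update close to the honest one, which makes it the hardest of the three to separate from ordinary client-to-client variation.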
4.1.4. Benchmark defense methods
Defenses: Krum, GeoMed, Trimmed Mean, Abnormal and No Defense. No Defense does not use any defense methods.
4.2. Result and discussion
4.2.1. Impact of shared data rate
In the first experiment, we test the influence of the shared data rate γ on our algorithm, using the non-iid-2 data distribution. We test five values of γ: 1, 3, 5, 7 and 10%. Figures 2 and 3 show the accuracy and loss for Cifar10. We find that: 1) In all cases of Byzantine attacks, our algorithm is superior to the three benchmark methods. 2) Sharing only 1% of the client data significantly improves the performance of the global model. Under all three Byzantine attacks, Krum, GeoMed, Trimmed Mean and No Defense are unable to converge. This also shows that when the model is complex, such methods are less able to resist Byzantine attacks.
Figure 2. The accuracy on Cifar10. The Byzantine attack types from (a) to (c) are: same value, sign flipping and Gaussian noise. Six defense methods are adopted for each type of attack, in order: No Defense, Krum, GeoMed, Trimmed Mean, Abnormal and BRCA. For BRCA, there are five different shared data rates (1, 3, 5, 7 and 10%), labeled BRCA 1, BRCA 3, BRCA 5, BRCA 7 and BRCA 10, respectively.
As the client data-sharing ratio increases, the additional gain in the performance of the global model becomes smaller. When the ratio grows from 1 to 10%, the average growth rate over the three Byzantine attacks is 1.8→1.41→0.97→0.92%. Sharing only 1% of the client data already improves the performance of the global model greatly.
Figure 4 clearly demonstrates the impact of different shared data rates on the loss value of the global model on Cifar10.
Figure 4. The loss of BRCA on Cifar10 with five different shared data rates.
In this part, the purposes of our experiment are: 1) to compare our anomaly detection model with that of Abnormal; 2) to explore the robustness of the anomaly detection model to non-iid data. The shared data rate γ is 5%; the same rate is used in Sections 4.2.3 and 4.2.4.
To compare the detection performance of the anomaly detection models of BRCA and Abnormal against Byzantine attacks, we use the cross-entropy loss, calculated from the detection scores, as the evaluation metric. First, we obtain the detection scores $E = \{e_1, \ldots, e_i, \ldots, e_k\}$ from the model updates $\omega_i$ and $\theta$, client $i \in S$. Then, we set $P = \mathrm{Sigmoid}(E - \mu(E))$, which represents the probability that a client is honest, so that $1 - P$ is the probability that the client is Byzantine. Lastly, we use $P$ and the true labels $Y$ ($y_i = 0$ for $i \in B$ and $y_i = 1$ for $i \in H$) to calculate the cross-entropy loss $l = \sum_{i=1}^{k} y_i \ln(P_i)$.
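The evaluation metric can be sketched as follows. Note one substitution: the formula in the text shows only the $y_i \ln(P_i)$ term, whereas this sketch uses the standard binary cross-entropy, which adds the $(1 - y_i)\ln(1 - P_i)$ term and a leading minus sign. That choice, the function name, and the toy scores are assumptions made for illustration.

```python
import numpy as np

def detection_cross_entropy(scores, y):
    """Binary cross-entropy between P = sigmoid(E - mean(E)) and the labels."""
    e = np.asarray(scores, dtype=float)       # detection scores E
    y = np.asarray(y, dtype=float)            # 1 = honest, 0 = Byzantine
    p = 1.0 / (1.0 + np.exp(-(e - e.mean()))) # probability of being honest
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).sum())

scores = [2.0, 1.8, 0.1, 1.9]                 # third client scores low
labels = [1, 1, 0, 1]                         # and is in fact Byzantine
good = detection_cross_entropy(scores, labels)
bad = detection_cross_entropy(scores, [0, 0, 1, 0])   # labels inverted
```

A detector whose scores separate honest from Byzantine clients yields a low loss (`good`), while the same scores evaluated against inverted labels yield a much higher one (`bad`).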
Figure 5(a)–(c) compare the loss of the anomaly detection models of BRCA and Abnormal. From the figures, we can see that our model has a greater loss than Abnormal in the initial stage, mainly because the anomaly detection model is pre-trained with transfer learning, and the initial pre-trained model does not yet fit the target domain well. As the adaptation progresses, the loss of our model decreases and our model gradually outperforms Abnormal. Although Abnormal has a low loss in the initial stage, its loss gradually increases as training progresses, and its detection ability degenerates.
Figure 5. The cross-entropy loss of our anomaly detection model and that of Abnormal on Cifar10 with non-iid-2. (a)–(c) show the performance under the three Byzantine attacks.
Figure 6(a)–(c) show the influence of different data distributions on our detection model. For different data distributions, the detection ability of the model is different, but it is worth pointing out that: as the degree of non-iid of the data increases, the detection ability of the model also increases.
Figure 6. (a)–(c) show the performance of our anomaly detection model on four different data distributions (iid, non-iid-1, non-iid-2, non-iid-3) against Byzantine attacks (Gaussian noise, sign flipping, same value).
In this part, we study the impact of the unified update on the global model. Figure 7 shows the accuracy of the global model with and without unified update on Cifar10.
Figure 7. The accuracy of BRCA and BRCA No on Cifar10. BRCA No is BRCA with the unified update removed.
From non-iid-1 to iid, the improvement in the global model's accuracy due to the unified update is as follows: 35.1→13.6→4.7→2.3% (same value), 34.8→10.5→3.0→3.1% (Gaussian noise), 24.9→9.9→2.8→3.0% (sign flipping). Combined with Figure 7, it can be clearly seen that the fewer classes each client's data contains, the more the unified update improves the global model.
When the data is non-iid, the directions of the model updates between clients are different. The higher the degree of non-iid of data, the more significant the difference. The global model obtained by weighted aggregation does not fit well with the global data. Unified update on the shared data can effectively integrate the model updates of multiple clients, giving the global model a consistent direction.
Therefore, it is necessary to implement a unified update to the primary aggregation model when data is non-iid.
4.2.4. Impact of non-iid
Tables 2 and 3 show the accuracy and loss of each defense method under different data distributions on Cifar10. Our method performs best, and its performance is relatively stable across data distributions. For the other defense methods, the higher the degree of non-iid, that is, the fewer classes each client holds, the lower the performance.
Table 2. The accuracy of the six defenses under four different data distributions on Cifar10, against three attacks.
Our analysis is as follows: 1) Non-iid data causes large differences between the clients' models, and it is difficult for a defense method to judge whether an anomaly is caused by the non-iid data or by a Byzantine attack, which makes defending against Byzantine attacks harder. 2) Krum and GeoMed use statistical knowledge to select the median or an individual client's model to represent the global model. This type of method can effectively defend against Byzantine attacks when the data is iid. However, when the data is non-iid, each client's model focuses on only a small region of the data, is highly independent of the others, cannot cover the domains of other clients, and therefore cannot represent the global model. 3) Trimmed Mean defends against Byzantine attacks by averaging. It performs well when the parameter dimension of the model is low, but as the complexity of the model increases, the method cannot converge stably.
5. Conclusions
In this work, we propose BRCA, a federated learning framework that is robust against Byzantine attacks when the data is non-iid. BRCA detects Byzantine attacks by credibility assessment. It also performs a unified update of the global model on the shared data, so that the global model has a consistent direction and its performance improves. BRCA makes the global model converge well under different data distributions. For pre-training the anomaly detection model, transfer learning removes the model's dependence on the test data set. Experiments show that BRCA performs well on both non-iid and iid data, especially on non-iid data. In the future, we will improve our method by studying how to protect the privacy and security of the shared data.
Acknowledgments
This work was partially supported by the Shanghai Science and Technology Innovation Action Plan under Grant 19511101300.
Conflict of interest
All authors declare no conflicts of interest in this paper.
References
[1]
S. Ghosh, G. A. Padmanabha, C. Peng, V. Andreoli, S. Atkinson, P. Pandita, et al., Inverse aerodynamic design of Gas turbine blades using probabilistic machine learning, J. Mech. Des., 144 (2022), 021706. https://doi.org/10.1115/1.4052301 doi: 10.1115/1.4052301
[2]
S. Obayashi, S Takanashi, Genetic optimization of target pressure distributions for inverse design methods, AIAA J., 34 (1996), 881–886. https://doi.org/10.2514/3.13163 doi: 10.2514/3.13163
[3]
P. Boselli, M. Zangeneh, An inverse design based methodology for rapid 3D multi-objective/multidisciplinary optimization of axial turbines, ASME J. Turbomach., 7 (2011), 1459–1468. https://doi.org/10.1115/GT2011-46729 doi: 10.1115/GT2011-46729
[4]
A. Nickless, P. J. Rayner, B. Erni, R. J. Scholes, Comparison of the genetic algorithm and incremental optimisation routines for a Bayesian inverse modelling based network design, Inverse Probl., 34 (2018), 055006. https://doi.org/10.1088/1361-6420/aab46c doi: 10.1088/1361-6420/aab46c
[5]
B. Hofmeister, M. Bruns, R. Rolfes, Finite element model updating using deterministic optimisation: a global pattern search approach, Eng. Struct., 195 (2019), 373–381. https://doi.org/10.1016/j.engstruct.2019.05.047 doi: 10.1016/j.engstruct.2019.05.047
[6]
S. S. Kadre, V. K. Tripathi, Advanced surrogate models for design optimization, Int. J. Eng. Sci., 9 (2016), 66–73.
D. M. Blei, A. Kucukelbir, J. D. McAuliffe, Variational inference: a review for statisticians, J. Am. Stat. Assoc., 112 (2017), 859–877. https://doi.org/10.1080/01621459.2017.1285773 doi: 10.1080/01621459.2017.1285773
[10]
L. Wu, W. Ji, S. M. AbouRizk, Bayesian inference with markov chain monte carlo–based numerical approach for input model updating, J. Comput. Civ. Eng., 34 (2020), 04019043. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000862 doi: 10.1061/(ASCE)CP.1943-5487.0000862
[11]
J. J. Xu, W. G. Chen, C. Demartino, T. Y. Xie, Y. Yu, C. F. Fang, et al., A Bayesian model updating approach applied to mechanical properties of recycled aggregate concrete under uniaxial or triaxial compression, Constr. Build. Mater., 301 (2021), 124274. https://doi.org/10.1016/j.conbuildmat.2021.124274 doi: 10.1016/j.conbuildmat.2021.124274
[12]
Y. Yin, W. Yin, P. Meng, H. Liu, On a hybrid approach for recovering multiple obstacles, Commun. Comput. Phys., 31 (2022), 869–892. https://doi.org/10.4208/cicp.OA-2021-0124 doi: 10.4208/cicp.OA-2021-0124
[13]
N. C. Laurenciu, S. D. Cotofana, Probability density function based reliability evaluation of large-scale ICs, in Proceedings of the 2014 IEEE/ACM International Symposium on Nanoscale Architectures, (2014), 157–162. https://doi.org/10.1145/2770287.2770326
[14]
V. Raj, S. Kalyani, Design of communication systems using deep learning: a variational inference perspective, IEEE Trans. Cognit. Commun. Networking, 6 (2020), 1320–1334. https://doi.org/10.1109/TCCN.2020.2985371 doi: 10.1109/TCCN.2020.2985371
[15]
H. Liu, On local and global structures of transmission eigenfunctions and beyond, J. Inverse Ill-Posed Probl., 30 (2020), 287–305. https://doi.org/10.1515/jiip-2020-0099 doi: 10.1515/jiip-2020-0099
[16]
Y. Gao, H. Liu, X. Wang, K. Zhang, On an artificial neural network for inverse scattering problems, J. Comput. Phys., 448 (2022), 110771. https://doi.org/10.1016/j.jcp.2021.110771 doi: 10.1016/j.jcp.2021.110771
[17]
W. Yin, W. Yang, H. Liu, A neural network scheme for recovering scattering obstacles with limited phaseless far-field data, J. Comput. Phys., 417 (2020), 109594. https://doi.org/10.1016/j.jcp.2020.109594 doi: 10.1016/j.jcp.2020.109594
[18]
P. Zhang, P. Meng, W. Yin, H. Liu, A neural network method for time-dependent inverse source problem with limited-aperture data, J. Comput. Appl. Math., 421 (2023), 114842. https://doi.org/10.1016/j.cam.2022.114842 doi: 10.1016/j.cam.2022.114842
[19]
Y. Lu, Z. Tu, A two-level neural network approach for dynamic FE model updating including damping, J. Sound Vib., 275 (2004), 931–952. https://doi.org/10.1016/S0022-460X(03)00796-X doi: 10.1016/S0022-460X(03)00796-X
[20]
H. Sung, S. Chang, M. Cho, Reduction method based structural model updating method via neural networks, 2020. https://doi.org/10.2514/6.2020-1445
[21]
H. Sung, S. Chang, M. Cho, Efficient model updating method for system identification using a convolutional neural network, AIAAJ, 59 (2021), 3480–3489. https://doi.org/10.2514/1.J059964 doi: 10.2514/1.J059964
[22]
T. Yin, H. Zhu, An efficient algorithm for architecture design of Bayesian neural network in structural model updating, Comput.-Aided Civ. Infrastruct. Eng., 35 (2020), 354–372. https://doi.org/10.1111/mice.12492 doi: 10.1111/mice.12492
E. Yilmaz, B. German, Conditional generative adversarial network framework for airfoil inverse design, AIAA, 2020 (2020). https://doi.org/10.2514/6.2020-3185 doi: 10.2514/6.2020-3185
[26]
J. A. Hodge, K. V. Mishra, A. I. Zaghloul, Joint multi-layer GAN-based design of tensorial RF metasurfaces, in 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), (2019), 1–6. https://doi.org/10.1109/MLSP.2019.8918860
[27]
A. H. Nobari, W. Chen, F. Ahmed, PcDGAN: A continuous conditional diverse generative adversarial network for inverse design, preprint, arXiv: 2106.03620.
[28]
A. H. Nobari, W. Chen, F. Ahmed, Range-GAN: Range-constrained generative adversarial network for conditioned design synthesis, in Proceedings of the ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, 3B (2021), V03BT03A039. https://doi.org/10.1115/DETC2021-69963
[29]
L. Ardizzone, J. Kruse, S. Wirkert, D. Rahner, E. W. Pellegrini, R. S. Klessen, et al., Analyzing inverse problems with invertible neural networks, preprint, arXiv: 1808.04730.
[30]
L. Dinh, D. Krueger, Y. Bengio, NICE: Non-linear independent components estimation, preprint, arXiv: 1410.8516.
[31]
L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using real NVP, preprint, arXiv: 1605.08803.
[32]
Z. Guan, J. Jing, X. Deng, M. Xu, L. Jiang, Z. Zhang, et al., DeepMIH: Deep invertible network for multiple image hiding, IEEE Trans. Pattern Anal. Mach. Intell., 2022 (2022). https://doi.org/10.1109/TPAMI.2022.3141725
[33]
Y. Liu, Z. Qin, S. Anwar, P. Ji, D. Kim, S. Caldwell, et al., Invertible denoising network: a light solution for real noise removal, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 13360–13369. https://doi.org/10.1109/CVPR46437.2021.01316
[34]
M. Oddiraju, A. Behjat, M. Nouh, S. Chowdhury, Inverse design framework with invertible neural networks for passive vibration suppression in phononic structures, J. Mech. Des., 144 (2022), 021707. https://doi.org/10.1115/1.4052300 doi: 10.1115/1.4052300
[35]
V. Fung, J. Zhang, G. Hu, P. Ganesh, B. G. Sumpter, Inverse design of two-dimensional materials with invertible neural networks, npj Comput. Mater., 7 (2021), 200. https://doi.org/10.1038/s41524-021-00670-x doi: 10.1038/s41524-021-00670-x
[36]
P. Noever-Castelos, L. Ardizzone, C. Balzani, Model updating of wind turbine blade cross sections with invertible neural networks, Wind Energy, 25 (2022), 573–599. https://doi.org/10.1002/we.2687 doi: 10.1002/we.2687
[37]
S. Ghosh, G. A. Padmanabha, C. Peng, S. Atkinson, V. Andreoli, P. Pandita, et al., Pro-ML IDeAS: A probabilistic framework for explicit inverse design using invertible neural network, AIAA, 2021 (2021). https://doi.org/10.2514/6.2021-0465
[38]
Y. Gal, Z. Ghahramani, Dropout as a bayesian approximation: representing model uncertainty in deep learning, in Proceedings of 33rd International Conference on Machine Learning, 48 (2016), 1050–1059. Available from: http://proceedings.mlr.press/v48/gal16.html?ref=https://githubhelp.com.
[39]
M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, et al., A review of uncertainty quantification in deep learning: techniques, applications and challenges, Inf. Fusion, 76 (2021), 243–297. https://doi.org/10.1016/j.inffus.2021.05.008 doi: 10.1016/j.inffus.2021.05.008
[40]
E. Hüllermeier, W. Waegeman, Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods, Mach. Learn., 110 (2021), 457–506. https://doi.org/10.1007/s10994-021-05946-3 doi: 10.1007/s10994-021-05946-3
[41]
M. Yadav, A. Misra, A. Malhotra, N. Kumar, Design and analysis of a high-pressure turbine blade in a jet engine using advanced materials, Mater. Today:. Proc., 25 (2020), 639–645. https://doi.org/10.1016/j.matpr.2019.07.530 doi: 10.1016/j.matpr.2019.07.530
Cryptography is used to protect the security of information transmitted between clients and server.
Defense against Byzantine attacks is still based on using distance to find outliers, and has limited defense capability.
Algorithm 1: BRCA
Input: total clients A; total number of iterations T; learning rates η_server, η_client, η_detection; Byzantine attack rate ξ; epochs E_server, E_client; initial global model w_0; clients' private data D^P = {D^P_1, …, D^P_N}; clients' shared data D^s = {D^s_1, …, D^s_n}; initial anomaly detection model θ_0; β; α; d; k
Output: global model w_{T+1}; anomaly detection model θ_{T+1}
R = ∅: the credibility score set.
H = ∅: the honest client set.
Function AddAttack(w):
Algorithm 2: Credibility Assessment
Input: local model updates Q; clients' shared data D^s = {D^s_1, …, D^s_n}; anomaly detection model θ_t; β; selected clients S; η_detection; d; k
Output: credibility scores of clients R; honest client set H; anomaly detection model θ_{t+1}
R = ∅: the credibility score set; H = ∅: the honest client set; sum = 0; sum_e = 0; sum_f = 0
C = {C^t_1, …, C^t_i, …, C^t_k}, client i ∈ S; C^t_i is the weight of the last convolutional layer of W^t_i
foreach client i ∈ S do
Algorithm 3: AADM (adaptive anomaly detection model)
Input: anomaly detection model θ_t; weights of the last convolutional layer of the local models C; η_detection; credibility scores R; honest client set H; d; k
Output: updated anomaly detection model θ_{t+1}
Function get-anomaly-score(θ_t, C^t_i):
Algorithm 4: Unified update
Input: global model w_{t+1}; clients' shared data D^s = {D^s_1, …, D^s_n}; E_server; η_server; honest client set H
Output: global model w_{t+1}
foreach epoch e = 0 to E_server do
Table 2 (accuracy)
Attack          Distribution  No      Krum   GeoMed  Abnormal  TrimmedMean  BRCA
Same value      Non-iid-1     0.1     0.1    0.1     0.178     0.1          0.529
Same value      Non-iid-2     0.101   0.207  0.205   0.480     0.1          0.619
Same value      Non-iid-3     0.1     0.398  0.398   0.634     0.1          0.691
Same value      iid           0.098   0.696  0.705   0.698     0.101        0.713
Gaussian noisy  Non-iid-1     0.1     0.1    0.1     0.178     0.1          0.529
Gaussian noisy  Non-iid-2     0.191   0.204  0.205   0.513     0.059        0.623
Gaussian noisy  Non-iid-3     0.0409  0.398  0.394   0.660     0.171        0.692
Gaussian noisy  iid           0.1     0.697  0.694   0.710     0.120        0.715
Sign flipping   Non-iid-1     0.1     0.101  0.1     0.177     0.1          0.426
Sign flipping   Non-iid-2     0.1     0.192  0.214   0.5131    0.1          0.621
Sign flipping   Non-iid-3     0.1     0.397  0.398   0.651     0.1          0.686
Sign flipping   iid           0.1     0.697  0.703   0.711     0.1          0.718
Table 3 (loss)
Attack          Distribution  No       Krum   GeoMed  Abnormal  TrimmedMean  BRCA
Same value      Non-iid-1     2.84e16  11.72  9.61    2.29      6.05e17      2.09
Same value      Non-iid-2     6.99e16  7.29   8.01    2.06      3.63e16      2.09
Same value      Non-iid-3     4.48e16  2.35   2.38    1.893     3.37e16      0.691
Same value      iid           1.51e16  0.794  0.774   1.837     3.17e16      1.79
Gaussian noisy  Non-iid-1     8.635e4  8.41   9.37    2.29      936.17       1.54
Gaussian noisy  Non-iid-2     9.51     7.57   8.37    1.34      7.98         0.623
Gaussian noisy  Non-iid-3     8.22     2.01   2.31    0.94      6.07         0.692
Gaussian noisy  iid           8.09     0.81   0.79    0.82      3.12         0.76
Sign flipping   Non-iid-1     2.30     10.72  9.91    2.29      2.30         1.54
Sign flipping   Non-iid-2     2.31     7.77   7.10    1.34      2.30         0.621
Sign flipping   Non-iid-3     2.31     2.36   2.13    0.94      2.30         0.686
Sign flipping   iid           2.31     0.79   0.80    0.81      2.31         0.76
Figure 1. The framework of P-INN. In this work, the INN architecture incorporates EU. The latent variable z captures the information missing from the forward process, while the noise channel σ provides the AU of our model. The forward process yields y and its uncertainty from x. The posterior distribution of x for a specific y is obtained with the inverse process by sampling z and fixing σ. The data pre-processing and post-processing parts make P-INN more versatile
Figure 2. INN architecture. The INN is formed by concatenating invertible blocks and permutation blocks. si and ti are trainable functions that transform the input vectors. They need not be invertible themselves because the invertible blocks are affine coupling layers; neural networks are the preferred choice for them
Figure 3. Response of the multi-objective problems. The ground-truth surfaces are shown in (a) and (b). Surfaces fitted by P-INN are shown in (c) and (d). Surfaces fitted by DNN are shown in (e) and (f). The poor fit of INN is not shown here
Figure 4. Loss curves for training data. (a) Total loss. (b) MSE term loss. (c) Likelihood term loss. (d) Reconstruction term loss. (e) Distribution term loss of x and z. (f) Bidirectional term loss
Figure 5. Visual demonstration of Wdg to output (weight) mapping using different methods. (a) Ground truth curve and training data points. (b) Comparison of P-INN fitted curve with ground truth curve. (c) Comparison of INN fitted curve with ground truth curve. (d) Comparison of DNN fitted curve with ground truth curve
Figure 8. Posterior pdfs of the output converted from the input. (a and b) Posterior pdfs predicted by P-INN (green histogram) and INN (orange histogram) when y = 942.46. (c and d) Posterior pdfs predicted by P-INN (green histogram) and INN (orange histogram) when y = 720.85. (e and f) Posterior pdfs predicted by P-INN (green histogram) and INN (orange histogram) when y = 1000.37