Evolving blocks by segmentation for neural architecture search

Xiaoping Zhao; Liwen Jiang; Adam Slowik; Zhenman Zhang; Yu Xue

doi:10.3934/era.2024092

Electronic Research Archive

2024, Volume 32, Issue 3: 2016-2032. doi: 10.3934/era.2024092

Research article

1. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
2. Department of Electronics and Computer Science, Koszalin University of Technology, Koszalin 75-453, Poland

Received: 18 October 2023 Revised: 23 February 2024 Accepted: 27 February 2024 Published: 06 March 2024

Convolutional neural networks (CNNs) play a prominent role in solving problems in various domains such as pattern recognition, image tasks, and natural language processing. In recent years, neural architecture search (NAS), the automatic design of neural network architectures by means of an optimization algorithm, has become a popular method for designing CNN architectures that meet requirements associated with the network's function. However, many NAS algorithms have a complex search space, which can negatively affect the efficiency of the search process. The representation of the neural network architecture, and thus the encoding of the resulting search space, therefore plays a fundamental role in the performance of the designed CNN. In this paper, to make the search process more effective, we propose a novel compact representation of the search space that treats network blocks as elementary units. The study focuses on a popular CNN called DenseNet. To perform the NAS, we use an ad-hoc implementation of particle swarm optimization, denoted PSO-CNN. In addition, to reduce the size of the final model, we propose a segmentation method to cut the blocks. We also transfer the final model to different datasets, demonstrating that our proposed algorithm has good transferable performance. The proposed PSO-CNN is compared with 11 state-of-the-art algorithms on CIFAR10 and CIFAR100. Numerical results show that our proposed algorithm is competitive in terms of accuracy and number of parameters.

Citation: Xiaoping Zhao, Liwen Jiang, Adam Slowik, Zhenman Zhang, Yu Xue. Evolving blocks by segmentation for neural architecture search[J]. Electronic Research Archive, 2024, 32(3): 2016-2032. doi: 10.3934/era.2024092

1. Introduction

In recent years, the popularity of smartphones, wearable devices, intelligent home appliances, and autonomous driving has produced an abundance of data generated by many distributed devices. These data are usually gathered in a data center for effective use. However, a crucial issue arises: centralized data storage can leak personal privacy ^[1]. At the same time, as the computing power of these mobile devices increases, it becomes attractive to store data locally and complete related computing tasks on the device. Federated learning is a distributed machine learning framework that allows multiple parties to collaboratively train a model without sharing raw data ^[2,3], and it has recently attracted significant attention from industry and academia. ^[4] summarizes the applications of federated learning to big data and discusses its future directions. Although federated learning has important advantages in protecting user privacy, it also faces many challenges.

First, because of its distributed nature, federated learning is vulnerable to Byzantine attacks. Notably, it has been shown that a single Byzantine client can compromise the whole federated optimization algorithm and prevent it from converging ^[5]. When the training data are not independent and identically distributed (non-iid), defending against Byzantine attacks becomes even harder, and convergence of the model is difficult to guarantee ^[6].

Methods for defending against Byzantine attacks in federated learning have been extensively studied, including the coordinate-wise trimmed mean ^[9], the coordinate-wise median ^[7,8], the geometric median ^[10,11], and distance-based methods such as Krum ^[12], BREA ^[6] and Bulyan ^[5]. Beyond these statistics-based methods, ^[14] proposes detecting Byzantine clients during the learning process with anomaly detection. ^[13] discusses the cybersecurity challenges and future directions of federated learning in real-time scenarios.

The above methods can defend against Byzantine attacks to some extent, but they have limitations. First, the statistics-based methods have high computational complexity, and their defensive ability is weakened by the non-iid data typical of federated learning. Second, the anomaly detection algorithm ^[14] assumes that the detection model is trained on the test data set. This assumption rarely holds in practice, because it is difficult to obtain a data set that covers almost all data distributions. Therefore, the anomaly detection model needs to be pre-trained without relying on the test dataset and updated dynamically on non-iid data.

In this paper, we propose a new method in which each client shares some of its data with the server, trading off client privacy against model performance. Unlike FedAvg ^[2], we use the credibility score, rather than the sample size, as the aggregation weight. Each client's credibility score integrates a verification score and a detection score; the former is calculated on the shared data.

The main contributions of this paper are:

▪ We propose a new federated learning framework (BRCA) which combines credibility assessment and unified update. BRCA not only effectively defends against Byzantine attacks, but also reduces the impact of non-iid data on the aggregated global model.

▪ The credibility assessment, combining anomaly detection and data verification, effectively detects Byzantine attacks on non-iid data.

▪ By incorporating an adaptive mechanism and transfer learning into the anomaly detection model, the anomaly detection model can dynamically improve detection performance. Moreover, its pre-training no longer relies on the test data set.

▪ We customize four different data distributions for each data set, and explore the influence of data distribution on defense methods against Byzantine attacks.

2. Related work

FedAvg was first proposed in ^[2] as an aggregation algorithm for federated learning. The server updates the global model with a weighted average of the clients' model updates, where each client's weight is determined by its data sample size. Stich ^[15] and Woodworth et al. ^[16] analyze the convergence of FedAvg on strongly convex smooth loss functions. However, they assume that the data are iid, which does not hold in federated learning ^[17,18]. Li et al. ^[19] give the first convergence analysis of FedAvg on non-iid data, and ^[20] uses clustering to improve federated learning on non-iid data. Unfortunately, naive FedAvg has very weak resistance to Byzantine attacks.

In the iterative process of federated aggregation, honest clients send their true model updates to the server, aiming to train a global model from their combined private data. Byzantine clients, however, attempt to perturb the optimization process ^[21]. Byzantine behavior may stem from data corruption during computation or communication, such as software crashes, hardware failures and transmission errors. It may also be caused by malicious clients that actively transmit erroneous information to mislead the learning process ^[21].

Byzantine-robust federated learning has received increasing attention in recent years. Krum ^[12] is designed specifically to defend against Byzantine attacks in federated learning: it takes as the global model the single client update whose summed distance to its nearest neighbors is smallest. GeoMed ^[10] uses the geometric median, a generalization of the median from one dimension to multiple dimensions. Unlike Krum, GeoMed uses all client updates, not just one, to generate the new global model. In Trimmed Mean ^[9], each dimension of the global model is obtained by averaging the clients' parameters in that dimension after the largest and smallest values in that dimension have been removed; Xie et al. ^[22] and Mhamdi et al. ^[5] propose variants of it. BREA ^[6] also considers the security of information transmission, but its defense is still based on distance calculations. Zero ^[23] uses a watermark-based detection approach to detect attacks such as malware, phishing and cryptojacking. ^[24] surveys intrusion detection techniques in mobile cloud computing environments.

All of the above defenses, which rely on statistics and distances, are ineffective against Byzantine attacks in non-iid settings. Abnormal ^[25] uses an anomaly detection model to detect Byzantine attacks.

The notion of independent and identically distributed (iid) data is clear, but non-iid can mean many things. In this work, we consider only label distribution skew ^[17]: the categories of samples may vary across clients. For example, in face recognition, each user generally has only their own face data; on mobile devices, some users may use emojis that never appear on other users' devices.

We summarize the contributions and limitations of the existing works in Table 1.

Table 1. The summary of the contributions and limitations of the related papers.

Reference	Contributions	Limitations
^[12] ^[10] ^[9] ^[5] ^[22]	Krum, GeoMed and Trimmed Mean achieve Byzantine defense based on statistics; easy to deploy.	Assume the clients' data are iid; high computational complexity.
^[25]	First applies an auto-encoder anomaly detection model to detect Byzantine attacks.	The anomaly detection model is pre-trained on the test dataset and remains static.
^[6]	Uses cryptography to protect information transmitted between clients and server.	Defense against Byzantine attacks still relies on distance-based outlier detection, with limited defensive capability.

In this paper, we propose a method that combines credibility assessment and unified update to make federated learning robust against Byzantine attacks on non-iid data.

3. Byzantine-robust federated learning on non-iid data

We consider a federated setting in which one server communicates with many clients. For the rest of the paper, we use the following notation: $A$ is the set of all clients, $|A| = n$; $S$ is the set of clients selected in each iteration, $|S| = k$; among them, $B$ is the set of Byzantine clients, $|B| = b$, and $H$ is the set of honest clients, $|H| = h$. $w_i^t$ is the model update sent by client $i$ to the server at round $t$; the Byzantine attack rate is $\xi = \frac{b}{k}$; $w^t$ is the global model at round $t$; $D_P = \{D_1^P, \dots, D_n^P\}$ is the clients' private data; $D_s = \{D_1^s, \dots, D_n^s\}$ is the clients' shared data; and the data-sharing rate is $\gamma = \frac{|D_s|}{|D_P| + |D_s|}$, where $|\cdot|$ denotes the sample size of a data set.

3.1. BRCA: Byzantine-robust federated learning via credibility assessment

To enhance the robustness of federated learning against Byzantine attacks on non-iid data, BRCA combines credibility assessment and unified update. Figure 1 depicts the architecture of BRCA.

Figure 1. The frame diagram of the BRCA.

Before training, each client shares some of its private data with the server. In each iteration, the server randomly selects some clients and sends them the latest global model. These clients train the model locally on their private data and send their model updates to the server. After receiving the model updates, the server performs a credibility assessment of each update and calculates its credibility score. Momentum is an effective way to improve the ability of federated learning to resist Byzantine attacks ^[26], so our aggregation rule, Eq (1), is:

$w^{t+1} = \alpha w^{t} + (1-\alpha)\sum_{i\in S} r_i^{t} w_i^{t}$

(1)

where $r_i^t$ is the credibility score of client $i$ at round $t$ and $\alpha$ ($0 < \alpha < 1$) is a decay factor. Finally, the unified update uses the shared data to refine this preliminary global model into the new global model for the round.
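A minimal sketch of the aggregation in Eq (1), assuming each $w_i^t$ is flattened into a parameter vector and the credibility scores $r_i^t$ have already been normalized (as described in Section 3.2). The function name `aggregate` is hypothetical.

```python
import numpy as np

def aggregate(global_w, client_ws, scores, alpha):
    """Momentum aggregation of Eq (1): w^{t+1} = alpha * w^t + (1 - alpha) * sum_i r_i^t w_i^t.

    global_w:  flattened global model w^t, shape (d,)
    client_ws: one row per selected client's w_i^t, shape (k, d)
    scores:    credibility scores r_i^t, assumed already normalised
    alpha:     decay factor with 0 < alpha < 1
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    client_ws = np.asarray(client_ws, dtype=float)
    scores = np.asarray(scores, dtype=float)
    weighted = scores @ client_ws  # sum_i r_i * w_i, shape (d,)
    return alpha * np.asarray(global_w, dtype=float) + (1.0 - alpha) * weighted
```

A client whose score was set to zero (for example the third row in `aggregate(np.zeros(2), [[1, 1], [3, 3], [100, 100]], [0.5, 0.5, 0.0], 0.5)`) has no influence on the result, however extreme its update is.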

Algorithm 1 describes BRCA; it invokes Credibility Assessment in line 22 and Unified Update in line 28. The crux of BRCA's defense against Byzantine attacks is the credibility assessment. On non-iid data, the data distributions of different clients differ greatly, so it is difficult to tell whether a difference between updates is caused by a Byzantine attack or by the non-iid data. However, an honest client's model update should perform well on that client's own data, regardless of the other clients. At the same time, an anomaly detection model can effectively detect Byzantine attacks ^[25]. We therefore combine these two ideas to detect Byzantine attacks, and, to overcome the shortcomings of existing anomaly detection models, we propose an adaptive anomaly detection model. In this paper, each client selects its shared data randomly within each sample category; other sampling methods, such as clustering, could also be used. Note that the shared data are used only on the server, not on the clients, which helps protect the clients' privacy.
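The per-category random selection of shared data can be sketched as a stratified split of one client's data. This is an illustration under assumptions: the function `split_shared` and its `rate` parameter are hypothetical, and sharing the same fraction of every class is one way to make $|D_s| / (|D_P| + |D_s|)$ approximately equal to $\gamma$.

```python
import random
from collections import defaultdict

def split_shared(samples, labels, rate, seed=0):
    """Hypothetical per-class random split of one client's data.

    rate: fraction of each class to share; after per-class rounding,
          |D_s| / (|D_P| + |D_s|) is approximately this rate (gamma).
    Returns (private, shared) as lists of (sample, label) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append((x, y))
    private, shared = [], []
    for y in sorted(by_class):          # iterate classes in a fixed order
        items = by_class[y][:]
        rng.shuffle(items)              # random selection within the class
        n_share = round(rate * len(items))
        shared.extend(items[:n_share])
        private.extend(items[n_share:])
    return private, shared
```

Sampling within each class keeps every category the client holds represented in its shared data, even when the client's classes are very unbalanced.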

Algorithm 1: BRCA
Input: total clients $A$; total number of iterations $T$; learning rates η_server, η_client, η_detection; Byzantine attack rate $\xi$; epochs E_server, E_client; initial global model $w^0$; clients' private data $D_P = \{D_1^P, \dots, D_n^P\}$; clients' shared data $D_s = \{D_1^s, \dots, D_n^s\}$; initial anomaly detection model $\theta^0$; $\beta$; $\alpha$; $d$; $k$
Output: global model $w^{T+1}$, anomaly detection model $\theta^{T+1}$
1 $R$ = $\varnothing$ : the credibility score set.
2 $H$ = $\varnothing$ : the honest client set.
3 Function Add Attack(w):

In summary, BRCA has five steps. First, the server pre-trains an anomaly detection model on source data and initializes a global model. Second, every client shares a small amount of its private data with the server. Third, each selected client downloads the newest global model from the server, updates it on its private data, and sends the model update to the server. Fourth, the server updates the global model and adapts the anomaly detection model using the clients' model updates. Fifth, the server refines the preliminary global model with the unified update, producing the new global model. Steps three to five are repeated until the global model converges.

Our work differs from the recent state of the art. First, Krum, GeoMed and Trimmed Mean are representative methods based on geometric knowledge, but they assume that the clients' data are independent and identically distributed (iid). Our method instead targets non-iid data, which matches the practical setting of FL. Second, Abnormal is the first method to detect Byzantine attacks with an auto-encoder anomaly detection model. However, its anomaly detection model is trained on the test dataset and remains static. Our method addresses both problems: 1) we pre-train the anomaly detection model on related but different source data, without relying on the test dataset; 2) we introduce an adaptive mechanism that lets the detection model be updated dynamically during federated iterations.

3.2. Credibility assessment

Algorithm 2 (Credibility Assessment) is the key part of BRCA; it assigns a credibility score to each client's model update. A Byzantine client should receive a much lower credibility score than an honest client. To make the credibility score accurate, Credibility Assessment integrates the adaptive anomaly detection model and data verification.

Algorithm 2: Credibility Assessment
Input: local model updates $Q$; clients' shared data $D_s = \{D_1^s, \dots, D_n^s\}$; anomaly detection model $\theta^t$; $\beta$; selected clients $S$; η_detection; $d$; $k$
Output: credibility scores of clients $R$; honest client set $H$; anomaly detection model $\theta^{t+1}$
1 $R$ = $\varnothing$ : credibility score set; $H$ = $\varnothing$ : the honest client set; $sum$ = 0; $su{m}_{e}$ = 0; $su{m}_{f}$ = 0
2 $C = \{C_1^t, \dots, C_i^t, \dots, C_k^t\}$, client $i \in S$, where $C_i^t$ is the weight of the last convolutional layer of $w_i^t$
3 for each client $i$ ∈ $S$ do

In Algorithm 2, line 4 is the data verification, which calculates the verification score $f_i$ for the model update of client $i$, and line 5 calls get-anomaly-score() of the adaptive anomaly detection model, which calculates the detection score $e_i$. The credibility of the model update is then $r_i = \beta e_i + (1-\beta) f_i$, and $R = \{r_1, \dots, r_i, \dots, r_k\}$, client $i \in S$. The make-adaption() call in line 24 implements the adaptation of the anomaly detection model.

In this paper, any model update whose credibility score is lower than the mean of $R$ is judged to be a Byzantine attack, and its credibility score is set to zero. Finally, the scores are normalized to obtain the final credibility scores.
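The scoring rule can be sketched as follows. The text does not state how the scores are normalized; dividing by their sum (so that they act as aggregation weights in Eq (1)) is an assumption here, as is the function name `credibility`.

```python
import numpy as np

def credibility(e, f, beta):
    """Combine detection scores e_i and verification scores f_i.

    r_i = beta * e_i + (1 - beta) * f_i; scores below mean(R) are treated as
    Byzantine and set to zero; the rest are divided by their sum (assumed
    normalisation). Returns (normalised scores, indices kept as honest).
    """
    e = np.asarray(e, dtype=float)
    f = np.asarray(f, dtype=float)
    r = beta * e + (1.0 - beta) * f
    honest = r >= r.mean()              # below the mean => judged Byzantine
    r = np.where(honest, r, 0.0)
    return r / r.sum(), np.flatnonzero(honest)
```

Because the largest score is never below the mean, at least one client always survives the threshold, so the division is safe whenever the scores are positive (as the exponential forms of $e_i$ and $f_i$ guarantee).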

3.2.1. Adaptive anomaly detection model

During training, we cannot predict the type of attack, but we can estimate what an honest client's model update looks like. We can therefore build the anomaly detection model with a one-class classification algorithm trained on normal model updates. Such a technique learns the distribution boundary of the model updates and uses it to determine whether a new sample is abnormal. The auto-encoder is an effective one-class model for anomaly detection, especially for high-dimensional data ^[27].

In practical applications, we cannot obtain target data to pre-train our anomaly detection model. Therefore, following the idea of transfer learning, the initialized anomaly detection model is pre-trained on source data.

At round $t$, the detection score $e_i^t$ of client $i$ is:

$e_i^t = \exp\left(\frac{\mathrm{MSE}\left(C_i^t - \theta^t(C_i^t)\right) - \mu(E)}{\sigma(E)}\right)$

(2)
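Eq (2) standardizes each client's reconstruction error across the $k$ selected clients and exponentiates it. The sketch below follows the formula as printed, assuming $\mu(E)$ and $\sigma(E)$ are the mean and standard deviation of the clients' reconstruction errors. The `reconstruct` callable is a hypothetical stand-in for the auto-encoder $\theta^t$, whose architecture is not given here.

```python
import numpy as np

def detection_scores(C, reconstruct, eps=1e-12):
    """Eq (2) as printed: e_i = exp((MSE_i - mu(E)) / sigma(E)).

    C:           array (k, d); row i is the flattened C_i^t
                 (last-convolutional-layer weights of client i's model)
    reconstruct: hypothetical stand-in for the auto-encoder theta^t
    eps:         guards against a zero spread when all errors are equal
    """
    C = np.asarray(C, dtype=float)
    mse = np.mean((C - reconstruct(C)) ** 2, axis=1)  # one error per client
    z = (mse - mse.mean()) / (mse.std() + eps)        # standardise across clients
    # As printed, a larger reconstruction error yields a larger e_i.
    return np.exp(z)
```

Standardizing across the selected clients makes the score relative to the current round rather than dependent on the absolute scale of the reconstruction error.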

Our anomaly detection model differs from the one in Abnormal in two ways. 1) Abnormal trains its anomaly detection model on the test set of the data set. Although the resulting detection model performs the detection task very well, in most cases the test data set is not available. Therefore, following the idea of transfer learning, we pre-train the anomaly detection model in the source domain. 2) Abnormal's anomaly detection model is not updated after training on the test set. We consider this unreasonable, because the test set is only a tiny part of the overall data, and using a small part of the data to judge most of the remaining data may give inaccurate results. Therefore, after pre-training the anomaly detection model in the source domain, we fine-tune it on target-domain data during the iterative process so that it is updated dynamically, as make-adaption shows in Algorithm 3.

Algorithm 3: AADM (adaptive anomaly detection model)
Input: anomaly detection model $\theta^t$; weights of the last convolutional layer of the local models $C$; η_detection; credibility scores $R$; honest client set $H$; $d$; $k$
Output: updated anomaly detection model $\theta^{t+1}$
1 Function get-anomaly-score ( ${\theta }^{t}, {C}_{i}^{t}$ ):

3.2.2. Data verification

The non-iid nature of client data increases the difficulty of Byzantine defense. However, the performance of each client's updated model on that client's shared data is not affected by other clients, which helps address this problem. Therefore, we use the clients' shared data $D_S = \{D_1^s, \dots, D_i^s, \dots, D_k^s\}$, client $i \in S$, to calculate the verification score of each updated model:

$f_i^t = \left(\exp\left(\frac{l_i^t - \mu(L)}{\sigma(L)}\right)\right)^{-2}$

(3)

where $l_i^t$ is the loss of client $i$'s model $w_i^t$ on the shared data $D_i^s$ at round $t$:

$l_i^t = \frac{1}{|D_i^s|}\sum_{j=1}^{|D_i^s|} l\left(D_i^{s(j)}, w_i^t\right)$

(4)

where $D_i^{s(j)}$ is the $j$-th sample of $D_i^s$, and $\mu(L)$ and $\sigma(L)$ are the mean and standard deviation of the set $L = \{l_1, \dots, l_k\}$, respectively.
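Eqs (3) and (4) can be sketched directly from per-sample losses. The sketch assumes the losses $l(D_i^{s(j)}, w_i^t)$ have already been computed for each client, and takes $\sigma(L)$ to be the standard deviation; the function name is hypothetical.

```python
import numpy as np

def verification_scores(per_sample_losses, eps=1e-12):
    """Eqs (3)-(4): verification score of each selected client.

    per_sample_losses: list of k 1-D arrays; entry i holds the loss of
                       model w_i^t on every sample of its shared data D_i^s
    eps:               guards against a zero spread when all losses are equal
    """
    L = np.array([np.mean(x) for x in per_sample_losses])  # Eq (4): mean loss l_i
    z = (L - L.mean()) / (L.std() + eps)                    # standardise over the k clients
    return np.exp(z) ** -2.0                                # Eq (3): (exp(z_i))^(-2)
```

Since $(\exp z)^{-2} = \exp(-2z)$, a client whose model has an above-average loss on its own shared data gets a verification score below 1, and the penalty grows exponentially with the deviation.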

3.3. Unified update

After obtaining the credibility score $r_i^t$ in Algorithm 2 from the anomaly score $e_i^t$ and the verification score $f_i^t$, we aggregate the clients' local model updates with Eq (1) to obtain a preliminary global model. However, because the client data are non-iid, the knowledge learned by each client's local model is limited, and the models of different clients differ significantly. To address the lack of a clear and consistent goal in the preliminary aggregated model, we introduce an additional unified update on the server using the shared data; Algorithm 4 gives the details.

Algorithm 4: Unified update
Input: global model $w^{t+1}$; clients' shared data $D_s = \{D_1^s, \dots, D_n^s\}$; E_server; η_server; honest client set $H$
Output: global model $w^{t+1}$
1 for each epoch $e$ = 0 to E_server do

Because the data used for the unified update are drawn from every client's data, they cover the overall data distribution more comprehensively. The unified update therefore follows the overall goal and does not favor any individual client's data distribution.
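Algorithm 4 is truncated in this text, so the following is only a hypothetical sketch of a server-side unified update. It assumes a softmax-regression global model (a weight matrix `W`), pools the shared data of the clients in the honest set $H$, and runs E_server epochs of full-batch gradient descent on the cross-entropy loss with step size η_server. The function name and the data layout are assumptions.

```python
import numpy as np

def unified_update(W, shared, honest, epochs, lr):
    """Hypothetical unified update for a softmax-regression global model.

    W:      global model weights, shape (d, c)
    shared: dict client_id -> (X_i, y_i), X_i of shape (n_i, d), integer labels y_i
    honest: ids of clients judged honest (the set H)
    epochs: number of server epochs (E_server); lr: step size (eta_server)
    """
    W = np.asarray(W, dtype=float)
    X = np.vstack([shared[i][0] for i in honest])      # pool honest clients' shared data
    y = np.concatenate([shared[i][1] for i in honest])
    Y = np.eye(W.shape[1])[y]                           # one-hot labels
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)               # softmax probabilities
        grad = X.T @ (P - Y) / len(y)                   # gradient of mean cross-entropy
        W = W - lr * grad
    return W
```

Training on the pooled data pulls the preliminary model toward a single objective defined over all represented classes, which is the "consistent direction" the unified update is meant to provide.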

4. Experiments

To verify the effectiveness of BRCA, we partition the clients' data with varying degrees of non-iid and explore the impact of different amounts of shared data on the global model. We also compare our anomaly detection model with Abnormal's and examine whether the unified update is necessary.

4.1. Experimental setup

4.1.1. Datasets

Mnist and Cifar10 are the two most commonly used public data sets for image classification, and most of the benchmark methods in our work also use them, which makes comparison with existing methods easier.

We do the experiments on Mnist and Cifar10, and customize four different data distributions: (a) non-iid-1: each client only has one class of data. (b) non-iid-2: each client has 2 classes of data. (c) non-iid-3: each client has 5 classes of data. (d) iid: each client has 10 classes of data.

For Mnist, we use 100 clients and four data distributions: (a) non-iid-1: each class of data in the training dataset is divided into 10 pieces, and each client selects one piece as its private data. (b) non-iid-2: each class is divided into 20 pieces, and each client selects 2 pieces from different classes. (c) non-iid-3: each class is divided into 50 pieces, and each client selects 5 pieces from different classes. (d) iid: each class is divided into 100 pieces, and each client selects 10 pieces from different classes. As the source domain for pre-training the anomaly detection model, we randomly select 20,000 lowercase letters from the Nist dataset.
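The piece counts above are chosen so that every piece is used exactly once (e.g. for non-iid-2, 10 classes × 20 pieces = 200 pieces = 100 clients × 2). The sketch below reproduces such a split. The rule that gives client $c$ the classes $(c + j) \bmod 10$ is a hypothetical choice that guarantees distinct classes per client, not necessarily the authors' selection procedure, and `partition` is an illustrative name.

```python
import numpy as np

def partition(labels, n_clients=100, classes_per_client=2, n_classes=10, seed=0):
    """Hypothetical shard-based split: each class is cut into
    n_clients * classes_per_client / n_classes pieces, and client c gets one
    piece from each of the classes (c + j) mod n_classes, j < classes_per_client.
    Returns one array of sample indices per client."""
    assert n_clients % n_classes == 0, "every piece must be used exactly once"
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pieces_per_class = n_clients * classes_per_client // n_classes
    pieces = {}
    for y in range(n_classes):
        idx = rng.permutation(np.flatnonzero(labels == y))   # shuffle the class
        pieces[y] = np.array_split(idx, pieces_per_class)    # cut into pieces
    next_piece = {y: 0 for y in range(n_classes)}
    clients = []
    for c in range(n_clients):
        own = []
        for j in range(classes_per_client):
            y = (c + j) % n_classes                          # distinct classes per client
            own.append(pieces[y][next_piece[y]])
            next_piece[y] += 1
        clients.append(np.concatenate(own))
    return clients
```

Setting `classes_per_client` to 1, 2, 5 or 10 yields the non-iid-1, non-iid-2, non-iid-3 and iid settings respectively.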

For Cifar10, there are 10 clients, and the four data distributions are configured similarly to Mnist. As the source domain, we select the following classes from Cifar100: lamp (label 40), lawn mower (41), lobster (45), man (46), forest (47), mountain (49), girl (35), snake (78), rose (70) and road (68); none of these classes appear in Cifar10.

4.1.2. Models

We use logistic regression on the Mnist dataset, with ${\eta }_{server}$ = 0.1, ${\eta }_{client}$ = 0.1, ${\eta }_{detection}$ = 0.02, ${E}_{client}$ = 5, ${E}_{server}$ = 1, n = 100, k = 30 and ξ = 20%. On Cifar10, we use a network with two convolutional layers and three fully connected layers, with ${\eta }_{server}$ = 0.05, ${\eta }_{client}$ = 0.05, ${\eta }_{detection}$ = 0.002, ${E}_{client}$ = 10, ${E}_{server}$ = 10, n = 10, k = 10 and ξ = 20%. The model structures are the same as in ^[10].

4.1.3. Benchmark byzantine attacks

Same-value attacks: a Byzantine client $i$ sends the model update $\omega_i = c\mathbf{1}$ to the server, where $\mathbf{1}$ is the all-ones vector and $c$ is a constant; we set $c = 5$. Sign-flipping attacks: each client $i$ computes its true model update $\omega_i$, and Byzantine clients then send $a\omega_i$ ($a < 0$) to the server; we set $a = -5$. Gaussian attacks: Byzantine clients add Gaussian noise to every dimension of the model update, $\omega_i = \omega_i + \epsilon$, where $\epsilon$ follows the Gaussian distribution $N(0, g^2)$ with standard deviation $g$; we set $g = 0.3$.
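The three attacks can be sketched as transformations of a flattened model update, using the constants stated above ($c = 5$, $a = -5$, $g = 0.3$). The function names are illustrative, and $g$ is treated as the standard deviation of the noise, matching $N(0, g^2)$.

```python
import numpy as np

def same_value(w, c=5.0):
    """Same-value attack: replace the update with c * (all-ones vector)."""
    return c * np.ones_like(w)

def sign_flip(w, a=-5.0):
    """Sign-flipping attack: send a * w with a < 0."""
    return a * w

def gaussian(w, g=0.3, rng=None):
    """Gaussian attack: add N(0, g^2) noise to every dimension of w."""
    rng = np.random.default_rng() if rng is None else rng
    return w + rng.normal(0.0, g, size=w.shape)  # scale argument is the std g
```

For example, `sign_flip(np.array([1.0, -2.0]))` gives `[-5.0, 10.0]`: the update points the opposite way and is five times larger.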

4.1.4. Benchmark defense methods

Defenses: Krum, GeoMed, Trimmed Mean, Abnormal and No Defense. No Defense does not use any defense methods.

4.2. Result and discussion

4.2.1. Impact of shared data rate

In the first experiment, we test the influence of the shared data rate γ on our algorithm, using the non-iid-2 data distribution. We evaluate five values: 1, 3, 5, 7 and 10%. Figures 2 and 3 show the accuracy and loss on Cifar10. We find that: 1) under all Byzantine attacks, our algorithm outperforms the three benchmark methods; 2) sharing only 1% of each client's data significantly improves the performance of the global model. Under all three Byzantine attacks, Krum, GeoMed, Trimmed Mean and No Defense fail to converge. This also suggests that such methods resist Byzantine attacks less well when the model is complex.

Figure 2. The accuracy on Cifar10. The Byzantine attack types in (a) to (c) are: same value, sign flipping and Gaussian noise. Six defense methods are applied to each attack type, in order: No Defense, Krum, GeoMed, Trimmed Mean, Abnormal and BRCA. For BRCA, there are five shared data rates (1, 3, 5, 7 and 10%), labeled BRCA 1, BRCA 3, BRCA 5, BRCA 7 and BRCA 10.

Figure 3. The loss of Cifar10. The legends are the same as Figure 2.

As the client data-sharing rate increases, the additional gain in global-model performance becomes smaller. As the shared data rate goes from 1 to 10%, the average growth rates under the three Byzantine attacks are 1.8→1.41→0.97→0.92%. Sharing only one percent of the data already improves the performance of the global model substantially.

Figure 4 clearly demonstrates the impact of different shared data rates on the loss value of the global model on Cifar10.

Figure 4. The loss of BRCA on Cifar10 with five different shared data rates.

4.2.2. Performance of anomaly detection model

In this part, our experiments aim to: 1) compare our anomaly detection model with Abnormal's; 2) explore the robustness of the anomaly detection model to non-iid data. The shared data rate γ is 5% here and in Sections 4.2.3 and 4.2.4.

To compare the detection performance of the BRCA and Abnormal anomaly detection models against Byzantine attacks, we use as the evaluation metric the cross-entropy loss computed from the detection scores. First, we obtain the detection scores $E = \{e_1, \dots, e_i, \dots, e_k\}$ from the model updates $\omega_i$ and $\theta$, client $i \in S$. Then, $P = \mathrm{Sigmoid}(E - \mu(E))$ gives the probability that each client is honest, and $1 - P$ the probability that it is Byzantine. Finally, we use $P$ and the true labels $Y$ ($y_i = 0$ for $i \in B$ and $y_i = 1$ for $i \in H$) to calculate the cross-entropy loss $l = -\sum_{i=1}^{k}\left[y_i \ln(P_i) + (1 - y_i)\ln(1 - P_i)\right]$.
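The metric can be sketched as the binary cross-entropy implied by the description (P is the probability of being honest and 1 − P the probability of being Byzantine), summed over the $k$ selected clients. The function name and the small `eps` guard are assumptions.

```python
import numpy as np

def detection_bce(E, y, eps=1e-12):
    """Cross-entropy of the detector's honest/Byzantine predictions.

    E: detection scores e_1..e_k; y: 1 for honest clients, 0 for Byzantine.
    P = sigmoid(E - mean(E)) is the predicted probability of being honest.
    """
    E = np.asarray(E, dtype=float)
    y = np.asarray(y, dtype=float)
    P = 1.0 / (1.0 + np.exp(-(E - E.mean())))
    # eps keeps the logarithms finite when P is exactly 0 or 1
    return -np.sum(y * np.log(P + eps) + (1.0 - y) * np.log(1.0 - P + eps))
```

When all detection scores are equal, every P is 0.5 and the loss equals k·ln 2 regardless of the labels, which serves as a "no information" baseline; a useful detector should stay well below it.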

Figure 5(a)–(c) compare the loss of the anomaly detection models of BRCA and Abnormal. Our model has a larger loss than Abnormal in the initial stage, mainly because it is pre-trained with transfer learning, and the initial pre-trained model does not yet fit the target domain well. As adaptation progresses, the loss of our model decreases and gradually falls below Abnormal's. Although Abnormal has a low loss in the initial stage, its loss gradually increases as training progresses, and its detection ability degrades.

Figure 5. The cross-entropy loss of the anomaly detection models of BRCA and Abnormal on Cifar10 with non-iid-2. (a)–(c) show the performance under the three Byzantine attacks.


Figure 6(a)–(c) show the influence of different data distributions on our detection model. The detection ability of the model differs across data distributions; notably, it increases as the degree of non-iid of the data increases.

Figure 6. (a)–(c) show our anomaly detection model's performance on four data distributions (iid, non-iid-1, non-iid-2, non-iid-3) against three Byzantine attacks (Gaussian noisy, sign flipping, same value).


4.2.3. Impact of unified update

In this part, we study the impact of the unified update on the global model. Figure 7 shows the accuracy of the global model on Cifar10 with and without the unified update.

Figure 7. The accuracy of BRCA and BRCA No on Cifar10, where BRCA No is BRCA with the unified update removed.


From non-iid-1 to iid, the unified update improves the accuracy of the global model as follows: 35.1 → 13.6 → 4.7 → 2.3% (Same value), 34.8 → 10.5 → 3.0 → 3.1% (Gaussian noisy), and 24.9 → 9.9 → 2.8 → 3.0% (Sign flipping). Together with Figure 7, this shows that the more homogeneous each client's local data is (i.e., the higher the degree of non-iid), the larger the improvement the unified update brings to the global model.

When the data is non-iid, the model updates of different clients point in different directions, and the higher the degree of non-iid, the larger this difference. As a result, the global model obtained by weighted aggregation does not fit the global data well. A unified update on the shared data effectively integrates the model updates of multiple clients and gives the global model a consistent direction.

Therefore, a unified update of the primary aggregated model is necessary when the data is non-iid.
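The effect of a unified update can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than BRCA's actual procedure: the section does not specify the model, loss, or optimizer, so it assumes a linear model with mean-squared-error loss, a weighted average of client parameter vectors as the aggregation step, and a fixed number of full-batch gradient-descent steps on the shared data. The names `weighted_aggregate` and `unified_update`, and the values of `lr` and `steps`, are hypothetical.

```python
from typing import Sequence

Vector = list[float]


def weighted_aggregate(models: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of client parameter vectors (weights could be sample counts)."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[j] for m, w in zip(models, weights)) / total for j in range(dim)]


def unified_update(global_model: Vector, shared_x: Sequence[Vector],
                   shared_y: Sequence[float], lr: float = 0.1, steps: int = 100) -> Vector:
    """Refine the aggregated model by full-batch gradient descent on the shared data,
    assuming a linear model y = w . x and mean-squared-error loss."""
    w = list(global_model)
    n = len(shared_x)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(shared_x, shared_y):
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n  # d/dw_j of the mean squared error
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Two clients whose local models point in different directions.
aggregated = weighted_aggregate([[3.0, 0.0], [0.0, 2.0]], [3, 1])
# Shared samples generated by y = x1 + x2 (true weights [1, 1]).
shared_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_y = [1.0, 1.0, 2.0]
print(aggregated)
print(unified_update(aggregated, shared_x, shared_y))
```

Here the weighted average [2.25, 0.5] is biased toward the more heavily weighted client, and the gradient steps on the shared samples move it close to [1.0, 1.0], the solution consistent with the shared data.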

4.2.4. Impact of non-iid

Tables 2 and 3 show the accuracy and loss of each defense method under different data distributions on Cifar10. Our method achieves the highest accuracy in every setting, and its performance is relatively stable across data distributions. The higher the degree of non-iid, the less diverse the data of each client and the lower the performance of the defense methods.

Table 2. The accuracy of the six defenses under four different data distributions on Cifar10, against three attacks.

Attack	Distribution	No defense	Krum	GeoMed	Abnormal	TrimmedMean	BRCA
Same value	Non-iid-1	0.1	0.1	0.1	0.178	0.1	0.529
	Non-iid-2	0.101	0.207	0.205	0.480	0.1	0.619
	Non-iid-3	0.1	0.398	0.398	0.634	0.1	0.691
	iid	0.098	0.696	0.705	0.698	0.101	0.713
Gaussian noisy	Non-iid-1	0.1	0.1	0.1	0.178	0.1	0.529
	Non-iid-2	0.191	0.204	0.205	0.513	0.059	0.623
	Non-iid-3	0.0409	0.398	0.394	0.660	0.171	0.692
	iid	0.1	0.697	0.694	0.710	0.120	0.715
Sign flipping	Non-iid-1	0.1	0.101	0.1	0.177	0.1	0.426
	Non-iid-2	0.1	0.192	0.214	0.5131	0.1	0.621
	Non-iid-3	0.1	0.397	0.398	0.651	0.1	0.686
	iid	0.1	0.697	0.703	0.711	0.1	0.718


Table 3. The loss of the six defenses under four different data distributions on Cifar10, against three attacks.

Attack	Distribution	No defense	Krum	GeoMed	Abnormal	TrimmedMean	BRCA
Same value	Non-iid-1	2.84e16	11.72	9.61	2.29	6.05e17	2.09
	Non-iid-2	6.99e16	7.29	8.01	2.06	3.63e16	2.09
	Non-iid-3	4.48e16	2.35	2.38	1.893	3.37e16	0.691
	iid	1.51e16	0.794	0.774	1.837	3.17e16	1.79
Gaussian noisy	Non-iid-1	8.635e4	8.41	9.37	2.29	936.17	1.54
	Non-iid-2	9.51	7.57	8.37	1.34	7.98	0.623
	Non-iid-3	8.22	2.01	2.31	0.94	6.07	0.692
	iid	8.09	0.81	0.79	0.82	3.12	0.76
Sign flipping	Non-iid-1	2.30	10.72	9.91	2.29	2.30	1.54
	Non-iid-2	2.31	7.77	7.10	1.34	2.30	0.621
	Non-iid-3	2.31	2.36	2.13	0.94	2.30	0.686
	iid	2.31	0.79	0.80	0.81	2.31	0.76


Our analysis is as follows. 1) Non-iid data among clients causes large differences between the clients' models, and it is difficult for a defense method to judge whether an anomaly is caused by the non-iid data or by a Byzantine attack, which makes Byzantine attacks harder to defend against. 2) Krum and GeoMed use statistical criteria to select an individual client's model or the median to represent the global model. This type of method can effectively defend against Byzantine attacks when the data is iid. However, when the data is non-iid, each client's model focuses only on a small region of the data distribution and is highly specialized; it cannot cover the domains of other clients and therefore cannot represent the global model. 3) Trimmed Mean defends against Byzantine attacks by averaging. It performs well when the parameter dimension of the model is low, but as the complexity of the model increases, it cannot converge stably.
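For reference, the three baseline aggregation rules discussed in points 2) and 3) can be sketched as follows. These follow their common textbook formulations (Krum scores each update by the summed squared distance to its n - f - 2 nearest neighbours, GeoMed is approximated with Weiszfeld's iteration, and Trimmed Mean works coordinate-wise); they are assumptions here rather than the exact configurations used in the experiments, and `f`, `beta`, and the example updates are hypothetical.

```python
import math
from typing import Sequence

Vector = list[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum(updates: Sequence[Vector], f: int) -> Vector:
    """Select the update with the smallest summed squared distance to its
    n - f - 2 nearest neighbours (f = assumed number of Byzantine clients)."""
    n = len(updates)
    k = n - f - 2
    if k < 1:
        raise ValueError("Krum needs at least f + 3 updates")
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(_sq_dist(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:k]))
    return list(updates[scores.index(min(scores))])


def geometric_median(updates: Sequence[Vector], iters: int = 100,
                     eps: float = 1e-9) -> Vector:
    """Approximate the geometric median with Weiszfeld's iteration,
    starting from the coordinate-wise mean."""
    dim = len(updates[0])
    y = [sum(u[j] for u in updates) / len(updates) for j in range(dim)]
    for _ in range(iters):
        # Inverse distances act as weights; eps avoids division by zero.
        inv = [1.0 / max(math.sqrt(_sq_dist(u, y)), eps) for u in updates]
        total = sum(inv)
        y = [sum(w * u[j] for w, u in zip(inv, updates)) / total for j in range(dim)]
    return y


def trimmed_mean(updates: Sequence[Vector], beta: int) -> Vector:
    """Coordinate-wise trimmed mean: drop the beta largest and beta smallest
    values in each coordinate and average the rest."""
    n = len(updates)
    if n <= 2 * beta:
        raise ValueError("need more than 2 * beta updates")
    dim = len(updates[0])
    result = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[beta:n - beta]
        result.append(sum(kept) / len(kept))
    return result


# Four similar honest updates and one Byzantine outlier.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [100.0, -100.0]]
print(krum(updates, f=1))
print(geometric_median(updates))
print(trimmed_mean(updates, beta=1))
```

With four mutually consistent honest updates and one outlier, all three rules return a vector close to the honest cluster. Under non-iid data the honest updates themselves drift apart, so distance- and order-based selection of this kind loses its reference point, which matches the analysis above.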

5. Conclusions

In this work, we propose BRCA, a robust federated learning framework against Byzantine attacks when the data is non-iid. BRCA detects Byzantine attacks through credibility assessment. Meanwhile, it performs a unified update of the global model on the shared data, so that the global model has a consistent direction and its performance improves. BRCA enables the global model to converge well under different data distributions. In addition, pre-training the anomaly detection model with transfer learning removes its dependence on the test dataset. Experiments show that BRCA performs well on both non-iid and iid data, especially on non-iid data. In future work, we will study how to protect the privacy and security of the shared data.

Acknowledgments

This work was partially supported by the Shanghai Science and Technology Innovation Action Plan under Grant 19511101300.

Conflict of interest

All authors declare no conflicts of interest in this paper.

References

[1]	K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2016), 770–778. https://doi.org/10.1109/CVPR.2016.90
[2]	S. Singaravel, J. Suykens, P. Geyer, Deep-learning neural-network architectures and methods: Using component-based models in building-design energy prediction, Adv. Eng. Inf., 38 (2018), 81–90. https://doi.org/10.1016/j.aei.2018.06.004 doi: 10.1016/j.aei.2018.06.004
[3]	H. Xu, J. Kong, M. Liang, H. Sun, M. Qi, Video behavior recognition based on actional-structural graph convolution and temporal extension module, Electron. Res. Arch., 30 (2022), 4157–4177. https://doi.org/10.3934/era.2022210 doi: 10.3934/era.2022210
[4]	D. Peng, Y. Lei, H. Munawar, Y. Guo, W. Li, Semantic-aware domain generalized segmentation, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 2584–2595. https://doi.org/10.1109/CVPR52688.2022.00262
[5]	T. Korbak, K. Shi, A Chen, R. V. Bhalerao, C. Buckley, J. Phang, et al., Pretraining language models with human preferences, in Proceedings of the 40th International Conference on Machine Learning, PMLR, 202 (2023), 17506–17533.
[6]	A. Krizhevsky, I. Sutskever, G. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM, 60 (2017), 84–90. https://doi.org/10.1145/3065386 doi: 10.1145/3065386
[7]	K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
[8]	C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2015), 1–9. https://doi.org/10.1109/CVPR.2015.7298594
[9]	G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2017), 2261–2269. https://doi.org/10.1109/CVPR.2017.243
[10]	J. Xi, Z. Xu, Z. Yan, W. Liu, Y. Liu, Portrait age recognition method based on improved ResNet and deformable convolution, Electron. Res. Arch., 31 (2023), 6585–6599. https://doi.org/10.3934/era.2023333 doi: 10.3934/era.2023333
[11]	C. Swarup, K. U. Singh, A. Kumar, S. K. Pandey, N. Varshney, T. Singh, Brain tumor detection using CNN, AlexNet & GoogLeNet ensembling learning approaches, Electron. Res. Arch., 31 (2023), 2900–2924. https://doi.org/10.3934/era.2023146 doi: 10.3934/era.2023146
[12]	B. Zoph, Q. V. Le, Neural architecture search with reinforcement learning, preprint, arXiv: 1611.01578.
[13]	T. Elsken, J. H. Metzen, F. Hutter, Neural architecture search: A survey, J. Mach. Learn. Res., 20 (2019), 1997–2017.
[14]	Y. Xue, W. Tong, F. Neri, P. Chen, T. Luo, L. Zhen, et al., Evolutionary architecture search for generative adversarial networks based on weight sharing, IEEE Trans. Evol. Comput., 2023 (2023), 1. https://doi.org/10.1109/TEVC.2023.3338371 doi: 10.1109/TEVC.2023.3338371
[15]	Y. Xue, X. Han, Z. Wang, Self-adaptive weight based on dual-attention for differentiable neural, IEEE Trans. Ind. Inf., 2024 (2024), 1–10. https://doi.org/10.1109/TII.2023.3348843 doi: 10.1109/TII.2023.3348843
[16]	Y. Xue, Z. Zhang, F. Neri, Similarity surrogate-assisted evolutionary neural architecture search with dual encoding strategy, Electron. Res. Arch., 32 (2024), 1017–1043. https://doi.org/10.3934/era.2024050 doi: 10.3934/era.2024050
[17]	H. Liu, K. Simonyan, Y. Yang, DARTS: Differentiable architecture search, preprint, arXiv: 1806.09055.
[18]	Y. Xue, J. Qin, Partial connection based on channel attention for differentiable neural architecture search, IEEE Trans. Ind. Inf., 19 (2023), 6804–6813. https://doi.org/10.1109/TII.2022.3184700 doi: 10.1109/TII.2022.3184700
[19]	Y. Xue, C. Lu, F. Neri, J. Qin, Improved differentiable architecture search with multi-stage progressive partial channel connections, IEEE Trans. Emerging Top. Comput. Intell., 8 (2024), 32–43. https://doi.org/10.1109/TETCI.2023.3301395 doi: 10.1109/TETCI.2023.3301395
[20]	Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, K. C. Tan, A survey on evolutionary neural architecture search, IEEE Trans. Neural Networks Learn. Syst., 34 (2023), 550–570. https://doi.org/10.1109/TNNLS.2021.3100554 doi: 10.1109/TNNLS.2021.3100554
[21]	Y. Xue, C. Chen, A. Slowik, Neural architecture search based on a multi-objective evolutionary algorithm with probability stack, IEEE Trans. Evol. Comput., 27 (2023), 778–786. https://doi.org/10.1109/TEVC.2023.3252612 doi: 10.1109/TEVC.2023.3252612
[22]	E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, et al., Large-scale evolution of image classifiers, in Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR, 70 (2017), 2902–2911.
[23]	Y. Sun, B. Xue, M. Zhang, G. G. Yen, Completely automated CNN architecture design based on blocks, IEEE Trans. Neural Networks Learn. Syst., 31 (2020), 1242–1254. https://doi.org/10.1109/TNNLS.2019.2919608 doi: 10.1109/TNNLS.2019.2919608
[24]	Y. Xue, Y. Wang, J. Liang, A. Slowik, A self-adaptive mutation neural architecture search algorithm based on blocks, IEEE Comput. Intell. Mag., 16 (2021), 67–78. https://doi.org/10.1109/MCI.2021.3084435 doi: 10.1109/MCI.2021.3084435
[25]	D. Song, C. Xu, X. Jia, Y. Chen, C. Xu, Y. Wang, Efficient residual dense block search for image super-resolution, in Proceedings of the AAAI conference on artificial intelligence, AAAI Press, 34 (2020), 12007–12014. https://doi.org/10.1609/aaai.v34i07.6877
[26]	J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, X. Wang, Densely connected search space for more flexible neural architecture search, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020), 10625–10634. https://doi.org/10.1109/CVPR42600.2020.01064
[27]	J. Kennedy, R. C. Eberhart, Particle swarm optimization, in Proceedings of ICNN'95 - International Conference on Neural Networks, IEEE, 4 (1995), 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
[28]	B. Wang, B. Xue, M. Zhang, Particle swarm optimisation for evolving deep neural networks for image classification by evolving and stacking transferable blocks, in 2020 IEEE Congress on Evolutionary Computation (CEC), IEEE, (2020), 1–8. https://doi.org/10.1109/CEC48606.2020.9185541
[29]	E. Real, A. Aggarwal, Y. Huang, Q. V. Le, Regularized evolution for image classifier architecture search, in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, 33 (2019), 4780–4789. https://doi.org/10.1609/aaai.v33i01.33014780
[30]	G. Huang, S. Liu, L. v. d. Maaten, K. Q. Weinberger, CondenseNet: An efficient denseNet using learned group convolutions, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2018), 2752–2761. https://doi.org/10.1109/CVPR.2018.00291
[31]	H. Cai, T. Chen, W. Zhang, Y. Yu, J. Wang, Regularized evolution for image classifier architecture search, in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, 32 (2018), 4780–4789. https://doi.org/10.1609/aaai.v32i1.11709
[32]	B. Baker, O. Gupta, N. Naik, R. Raskar, Designing neural network architectures using reinforcement learning, preprint, arXiv: 1611.02167.
[33]	L. Xie, A. Yuille, Genetic CNN, in 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, (2017), 1388–1397. https://doi.org/10.1109/ICCV.2017.154
[34]	H. Liu, K. Simonyan, O. Vinyals, C. Fernando, K. Kavukcuoglu, Hierarchical representations for efficient architecture search, preprint, arXiv: 1711.00436.
[35]	A. I. Sharaf, E. S. F. Radwan, An automated approach for developing a convolutional neural network using a modified firefly algorithm for image classification, in Applications of Firefly Algorithm and its Variants, Springer, (2020), 99–118. https://doi.org/10.1007/978-981-15-0306-1_5
[36]	G. Cuccu, J. Togelius, P. Cudre-Mauroux, Playing atari with six neurons (Extended abstract), in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), International Joint Conferences on Artificial Intelligence Organization, (2020), 4711–4715. https://doi.org/10.24963/ijcai.2020/651

This article has been cited by:

1.	Chang Xu, Yu Jia, Liehuang Zhu, Chuan Zhang, Guoxie Jin, Kashif Sharif, TDFL: Truth Discovery Based Byzantine Robust Federated Learning, 2022, 33, 1045-9219, 4835, 10.1109/TPDS.2022.3205714
2.	Jie Wen, Zhixia Zhang, Yang Lan, Zhihua Cui, Jianghui Cai, Wensheng Zhang, A survey on federated learning: challenges and applications, 2023, 14, 1868-8071, 513, 10.1007/s13042-022-01647-y
3.	Qingtie Li, Xuemei Wang, Shougang Ren, A Privacy Robust Aggregation Method Based on Federated Learning in the IoT, 2023, 12, 2079-9292, 2951, 10.3390/electronics12132951
4.	Wenbin Yao, Bangli Pan, Yingying Hou, Xiaoyong Li, Yamei Xia, An Adaptive Model Filtering Algorithm Based on Grubbs Test in Federated Learning, 2023, 25, 1099-4300, 715, 10.3390/e25050715
5.	Chang Zhang, Shunkun Yang, Lingfeng Mao, Huansheng Ning, Anomaly detection and defense techniques in federated learning: a comprehensive review, 2024, 57, 1573-7462, 10.1007/s10462-024-10796-1
6.	Hiralal Bhaskar Solunke, Pawan Bhaladhare, Amol Potgantwar, 2024, chapter 17, 9798369334942, 299, 10.4018/979-8-3693-3494-2.ch017
7.	Caiyu Su, Jinri Wei, Yuan Lei, Hongkun Xuan, Jiahui Li, Chenchu Xu, Empowering precise advertising with Fed-GANCC: A novel federated learning approach leveraging Generative Adversarial Networks and group clustering, 2024, 19, 1932-6203, e0298261, 10.1371/journal.pone.0298261
8.	Kai Hu, Sheng Gong, Qi Zhang, Chaowen Seng, Min Xia, Shanshan Jiang, An overview of implementing security and privacy in federated learning, 2024, 57, 1573-7462, 10.1007/s10462-024-10846-8
9.	S. Annamalai, N. Sangeetha, M. Kumaresan, Dommaraju Tejavarma, Gandhodi Harsha Vardhan, A. Suresh Kumar, 2025, 9781394219216, 127, 10.1002/9781394219230.ch7
10.	Zheng Yang, Ke Gu, Yiming Zuo, Byzantine Robust Federated Learning Scheme Based on Backdoor Triggers, 2024, 79, 1546-2226, 2813, 10.32604/cmc.2024.050025

Reader Comments

Your name:*

Email:*
© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

Corresponding author: Chen Bin (陈斌), bchen63@163.com

1.
School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142
