A comprehensive review of graph convolutional networks: approaches and applications

Xinzheng Xu; Xiaoyang Zhao; Meng Wei; Zhongnian Li

doi:10.3934/era.2023213

Electronic Research Archive

2023, Volume 31, Issue 7: 4185-4215. doi: 10.3934/era.2023213

Review

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

Received: 03 October 2022 Revised: 04 March 2023 Accepted: 28 March 2023 Published: 31 May 2023

Convolutional neural networks (CNNs) utilize local translation invariance in the Euclidean domain and have remarkable achievements in computer vision tasks. However, there are many data types with non-Euclidean structures, such as social networks, chemical molecules, knowledge graphs, etc., which are crucial to real-world applications. The graph convolutional neural network (GCN), as a derivative of CNNs for non-Euclidean data, was established for non-Euclidean graph data. In this paper, we mainly survey the progress of GCNs and introduce in detail several basic models based on GCNs. First, we review the challenges in building GCNs, including large-scale graph data, directed graphs and multi-scale graph tasks. Also, we briefly discuss some applications of GCNs, including computer vision, transportation networks and other fields. Furthermore, we point out some open issues and highlight some future research trends for GCNs.

Citation: Xinzheng Xu, Xiaoyang Zhao, Meng Wei, Zhongnian Li. A comprehensive review of graph convolutional networks: approaches and applications[J]. Electronic Research Archive, 2023, 31(7): 4185-4215. doi: 10.3934/era.2023213

1. Introduction

In recent years, the popularity of smartphones, wearable devices, intelligent home appliances and autonomous driving has led to an abundance of data generated by many distributed devices. These data are usually concentrated in a data center for effective use. However, a crucial issue arises: centralized data storage can leak personal privacy ^[1]. Simultaneously, as the computing power of these mobile devices increases, it becomes attractive to store data locally while completing the related computing tasks. Federated learning is a distributed machine learning framework that allows multiple parties to collaboratively train a model without sharing raw data ^[2,3], and it has recently attracted significant attention from industry and academia. ^[4] summarizes and discusses the application of federated learning in big data and its future directions. Although federated learning has important advantages in protecting user privacy, it also faces many challenges.

First of all, due to its distributed nature, federated learning is vulnerable to Byzantine attacks. Notably, it has been shown that, with just one Byzantine client, the whole federated optimization algorithm can be compromised and fail to converge ^[5]. When the training data is not independent and identically distributed (non-iid), defending against Byzantine attacks becomes even harder and it is difficult to guarantee the convergence of the model ^[6].

Methods for defending against Byzantine attacks in federated learning have been extensively studied, including the coordinate-wise trimmed mean ^[9], the coordinate-wise median ^[7,8], the geometric median ^[10,11], and the distance-based methods Krum ^[12], BREA ^[6] and Bulyan ^[5]. In addition to the above methods based on statistical knowledge, ^[14] proposes a new idea based on anomaly detection to detect Byzantine clients during the learning process. ^[13] discusses the challenges and future directions of federated learning in real-time scenarios in terms of cybersecurity.

The above methods can defend against Byzantine attacks to some extent, but they also have limitations. First, the methods based on statistical knowledge have high computational complexity, and their defense abilities are weakened by the non-iid data in federated learning. Second, the anomaly detection algorithm ^[14] relies on the premise that the detection model is trained on the test data set. This premise cannot be realized in practical applications, because it is difficult to obtain a data set that covers almost all data distributions. Therefore, it is necessary for the anomaly detection model to be pre-trained without relying on the test data set and to be updated dynamically on non-iid data.

In this paper, we propose a new method in which each client shares some data with the server, making a trade-off between client privacy and model performance. Unlike FedAvg ^[2], we use the credibility score, not the sample size, as the weight for model aggregation. The credibility score of each client is obtained by integrating the verification score and the detection score. The former is calculated on the shared data.

The main contributions of this paper are:

▪ We propose a new federated learning framework (BRCA) which combines credibility assessment and unified update. BRCA not only effectively defends against Byzantine attacks, but also reduces the impact of non-iid data on the aggregated global model.

▪ The credibility assessment, which combines anomaly detection and data verification, effectively detects Byzantine attacks on non-iid data.

▪ By incorporating an adaptive mechanism and transfer learning into the anomaly detection model, the anomaly detection model can dynamically improve detection performance. Moreover, its pre-training no longer relies on the test data set.

▪ We customize four different data distributions for each data set, and explore the influence of data distribution on defense methods against Byzantine attacks.

2. Related work

FedAvg was first proposed in ^[2] as an aggregation algorithm for federated learning. The server updates the global model with a weighted average of the clients' model updates, where each client's aggregation weight is determined by its data sample size. Stich ^[15] and Woodworth et al. ^[16] analyze the convergence of FedAvg on strongly convex smooth loss functions. However, they assume that the data is iid, which is not suitable for federated learning ^[17,18]. Li et al. ^[19] give the first convergence analysis of FedAvg on non-iid data. ^[20] uses clustering to improve federated learning on non-iid data. Regrettably, naive FedAvg has very weak resistance to Byzantine attacks.
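The sample-size weighting described above can be illustrated with a minimal sketch. The function name `fedavg` and the toy numbers are assumptions made for illustration only; they are not taken from ^[2].

```python
import numpy as np

def fedavg(client_updates, sample_sizes):
    """Weighted average of client updates, weights proportional to sample size."""
    updates = np.asarray(client_updates, dtype=float)   # shape (k, d)
    sizes = np.asarray(sample_sizes, dtype=float)       # shape (k,)
    weights = sizes / sizes.sum()                       # weights sum to 1
    return weights @ updates                            # shape (d,)

# Hypothetical example: the second client holds three times as much data.
global_update = fedavg([[0.0, 0.0], [4.0, 8.0]], [1, 3])
```

Because every update enters the average, a single client sending an arbitrarily large vector can move the result arbitrarily far, which is the weakness the robust rules below try to address.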

In the iterative process of federated aggregation, honest clients send the true model updates to the server, wishing to train a global model by consolidating their private data. However, Byzantine clients attempt to perturb the optimization process ^[21]. Byzantine attacks may be caused by some data corruption events in the computing or communication process such as software crashes, hardware failures and transmission errors. Simultaneously, they may also be caused by malicious clients through actively transmitting error information, in order to mislead the learning process ^[21].

Byzantine-robust federated learning has received increasing attention in recent years. Krum ^[12] is designed specifically to defend against Byzantine attacks in federated learning. Krum generates the global model from the single client model update whose distance to its neighbors is the shortest. GeoMed ^[10] uses the geometric median, which generalizes the median from one dimension to multiple dimensions. Unlike Krum, GeoMed uses all client updates to generate a new global model, not just one client update. In Trimmed Mean ^[9], each dimension of the global model is obtained by averaging the parameters of the clients' model updates in that dimension, after the largest and smallest parameters in that dimension have been deleted. The methods of Xie et al. ^[22] and Mhamdi et al. ^[5] are variants of it. BREA ^[6] also considers the security of information transmission, but its defense method is still based on distance calculation. Zero ^[23] uses a watermark-based detection approach to detect attacks such as malware, phishing and cryptojacking. ^[24] surveys intrusion detection techniques in mobile cloud computing environments.
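As a rough illustration of the coordinate-wise trimmed mean described above, the following sketch sorts each dimension, drops a fixed number of extreme values at both ends and averages the rest. The parameter name `trim` and the toy updates are assumptions; ^[9] specifies the trimming in its own terms.

```python
import numpy as np

def trimmed_mean(client_updates, trim=1):
    """Coordinate-wise mean after dropping the `trim` smallest and largest values."""
    updates = np.sort(np.asarray(client_updates, dtype=float), axis=0)  # sort per dimension
    k = updates.shape[0]
    if 2 * trim >= k:
        raise ValueError("trim removes every client update")
    return updates[trim:k - trim].mean(axis=0)

# Hypothetical example: the last client sends an extreme update.
robust_update = trimmed_mean([[1, 10], [2, 20], [3, 30], [100, -100]], trim=1)
```

In this toy case the extreme values 100 and -100 are discarded before averaging, so they do not influence the result.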

All of the above defense methods based on statistical knowledge and distance are not effective in defending against Byzantine attacks in non-iid settings. Abnormal ^[25] uses an anomaly detection model to complete the detection of Byzantine attacks.

The concept of independent and identically distributed (iid) data is clear, but non-iid can have many meanings. In this work, we only consider label distribution skew ^[17], in which the categories of samples may vary across clients. For example, in a face recognition task, each user generally has only their own face data; on mobile devices, some users may use emojis that do not show up on others' devices.

We summarize the contributions and limitations of the existing works in Table 1.

Table 1. The summary of the contributions and limitations of the related papers.

Reference	Contributions	Limitations
^[12] ^[10] ^[9] ^[5] ^[22]	Krum, GeoMed and Trimmed Mean complete the Byzantine defense based on statistical knowledge. Easy to deploy applications.	The assumption is that the data of the clients is iid. High computational complexity.
^[25]	The auto-encoder anomaly detection model is firstly applied to detect Byzantine attacks.	The pre-training of the anomaly detection model is completed on test dataset. The anomaly detection model is static.
^[6]	Cryptography is used to protect the security of information transmitted between clients and server.	Defense against Byzantine attacks is still based on distance to find outliers, and has limited defense capabilities.

In this paper, we propose a method that combines credibility assessment and unified update to make federated learning robust against Byzantine attacks on non-iid data.

3. Byzantine-robust federated learning on non-iid data

We utilize a federated setting in which one server communicates with many clients. For the rest of the paper, we use the following symbol definitions: $A$ is the total client set, $\left|A\right| = n$; $S$ is the selected client set in every iteration, $\left|S\right| = k$; among them, $B$ is the Byzantine client set, $\left|B\right| = b$, and $H$ is the honest client set, $\left|H\right| = h$. $w_i^t$ is the model update sent by client $i$ to the server at round $t$, the Byzantine attack rate is $\xi = \frac{b}{k}$, $w^t$ is the global model at round $t$, $D_P = \{D_1^P, \dots, D_n^P\}$ is the clients' private data, $D_s = \{D_1^s, \dots, D_n^s\}$ is the clients' shared data, and the data-sharing rate is $\gamma = \frac{\left|D_s\right|}{\left|D_P\right| + \left|D_s\right|}$ ($\left|\cdot\right|$ represents the sample size of a data set).

3.1. BRCA: Byzantine-robust federated learning via credibility assessment

In order to enhance the robustness of federated learning against Byzantine attacks on non-iid data, BRCA combines credibility assessment and unified update. Figure 1 depicts the architecture of BRCA.

Figure 1. The frame diagram of the BRCA.

Before training, each client needs to share some private data with the server. In each iteration, the server randomly selects some clients and sends the latest global model to them. These clients use their private data to train the model locally and send the model updates to the server. After receiving the model updates, the server conducts a credibility assessment for each model update and calculates its credibility score. Momentum is an effective measure to improve the ability of federated learning to resist Byzantine attacks ^[26]. Therefore, our aggregation rule, Eq (1), is as follows:

$w^{t+1} = \alpha w^{t} + \left(1-\alpha\right)\sum_{i\in S} r_i^t w_i^t$

(1)

where $r_i^t$ is the credibility score of client $i$ at round $t$ and $\alpha$ ($0 < \alpha < 1$) is a decay factor. Finally, the unified update uses the shared data to update the primary global model and obtain the new global model for this round.
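The aggregation in Eq (1) can be sketched in a few lines of NumPy. The function name `aggregate`, the flattening of each model update into a vector and the toy numbers are assumptions for illustration; the credibility scores are taken as already computed and normalized.

```python
import numpy as np

def aggregate(w_global, client_updates, credibility, alpha=0.5):
    """Eq (1): w^{t+1} = alpha * w^t + (1 - alpha) * sum_i r_i^t * w_i^t."""
    w_global = np.asarray(w_global, dtype=float)        # shape (d,)
    updates = np.asarray(client_updates, dtype=float)   # shape (k, d)
    r = np.asarray(credibility, dtype=float)            # shape (k,)
    return alpha * w_global + (1.0 - alpha) * (r @ updates)

# Hypothetical round: the third client was judged Byzantine, so its score is 0.
w_t = np.zeros(3)
updates = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [100.0, -100.0, 100.0]]
scores = [0.5, 0.5, 0.0]
w_next = aggregate(w_t, updates, scores, alpha=0.5)
```

A client with credibility score zero contributes nothing to the sum, so the extreme third update has no effect on `w_next` in this example.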

Algorithm 1 describes BRCA, which contains Credibility Assessment in line 22 and Unified Update in line 28. The key to BRCA's defense against Byzantine attacks is credibility assessment. On non-iid data, the differences between the data distributions of different clients are large, and it is difficult to judge whether a difference is caused by a Byzantine attack or by the non-iid data. However, the model update of an honest client should have a positive effect on its private data, which is not affected by other clients. Simultaneously, an anomaly detection model can effectively detect Byzantine attacks ^[25]. Thus, we combine the above two ideas to detect Byzantine attacks. In order to overcome the shortcomings of the existing anomaly detection models, we propose an adaptive anomaly detection model. In this paper, the shared data is randomly selected by each client based on the sample category. Of course, other sampling methods could also be used, such as clustering. In addition, it must be pointed out that the shared data is only used on the server, not on the clients, which effectively protects the clients' privacy.

Algorithm 1: BRCA
Input: total clients $A$; total number of iterations $T$; learning rates η_server, η_client, η_detection; Byzantine attack rate $\xi$; epochs E_server, E_client; initial global model $w^{0}$; clients' private data $D_P = \{D_1^P, \dots, D_n^P\}$; clients' shared data $D_s = \{D_1^s, \dots, D_n^s\}$; initial anomaly detection model $\theta^{0}$; $\beta$; $\alpha$; $d$; $k$
Output: global model $w^{T+1}$; anomaly detection model $\theta^{T+1}$
1 $R$ = $\varnothing$ : the credibility score set.
2 $H$ = $\varnothing$ : the honest client set.
3 Function Add Attack(w):

To summarize, BRCA has five steps. First, the server pre-trains an anomaly detection model on source data and initializes a global model. Second, every client shares a small amount of private data with the server. Third, every client downloads the newest global model from the server, completes its model update on its private data and sends the model update to the server. Fourth, the server updates the global model and adapts the anomaly detection model using the model updates from the clients. Fifth, the server updates the primary global model with the unified update, which yields the new global model. Steps three to five are repeated until the global model converges.

Our work differs from the recent state of the art in two ways. First, Krum, GeoMed and Trimmed Mean are the representative methods based on geometric knowledge, but their premise is that the data of the clients is independent and identically distributed (iid). Our method is instead based on the actual application background of FL and targets non-iid data. Second, Abnormal is the first method to detect Byzantine attacks with an auto-encoder anomaly detection model. However, its anomaly detection model is trained on the test dataset and is static. Our method improves on both problems: 1) we pre-train the anomaly detection model on related but different source data, without relying on the test dataset; 2) we introduce an adaptive mechanism into the anomaly detection model, which helps the detection model update dynamically during the federated iterations.

3.2. Credibility assessment

Algorithm 2 (Credibility Assessment) is the key part of BRCA, which assigns a credibility score to each client model update. A Byzantine client is given a much lower credibility score than an honest client. To guarantee the accuracy of the credibility score, Credibility Assessment integrates the adaptive anomaly detection model and data verification.

Algorithm 2: Credibility Assessment
Input: local model updates $Q$; clients' shared data $D_s = \{D_1^s, \dots, D_n^s\}$; anomaly detection model $\theta^{t}$; $\beta$; selected clients $S$; η_detection; $d$; $k$
Output: credibility scores of clients $R$; honest client set $H$; anomaly detection model $\theta^{t+1}$
1 $R$ = $\varnothing$ : credibility score set; $H$ = $\varnothing$ : the honest client set; $sum$ = 0; $sum_e$ = 0; $sum_f$ = 0
2 $C = \{C_1^t, \dots, C_i^t, \dots, C_k^t\}$, client $i \in S$, where $C_i^t$ is the weight of the last convolutional layer of $w_i^t$
3 for each client $i$ ∈ $S$ do

In Algorithm 2, line 4 is the data verification, which calculates the verification score $f_i$ for the model update of client $i$. Line 5 is the get-anomaly-score() function of the adaptive anomaly detection model, which calculates the detection score $e_i$. Subsequently, the credibility $r_i$ of the model update is $r_i = \beta e_i + (1 - \beta) f_i$, with $R = \{r_1, \dots, r_i, \dots, r_k\}$, client $i \in S$. The make-adaption() function in line 24 implements the adaptation of the anomaly detection model.

In this paper, we judge a model update with a credibility score lower than the mean of $R$ to be a Byzantine attack, and set its credibility score to zero. Finally, we normalize the scores to get the final credibility scores.
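The scoring rule above (combine, threshold at the mean, normalize) can be sketched as follows. The function name `credibility_scores` is hypothetical, and normalizing so that the scores sum to one is an assumption, since the text does not state which normalization is used.

```python
import numpy as np

def credibility_scores(e, f, beta=0.5):
    """r_i = beta * e_i + (1 - beta) * f_i, zeroed below the mean, then normalized."""
    e = np.asarray(e, dtype=float)   # detection scores
    f = np.asarray(f, dtype=float)   # verification scores
    r = beta * e + (1.0 - beta) * f
    r = np.where(r < r.mean(), 0.0, r)       # below-mean updates are treated as Byzantine
    total = r.sum()
    return r / total if total > 0 else r     # assumption: normalize to sum to 1

# Hypothetical scores for three clients; the third looks suspicious on both measures.
R = credibility_scores(e=[1.0, 0.8, 0.1], f=[0.9, 1.0, 0.2], beta=0.5)
```

With these toy inputs the combined scores are 0.95, 0.90 and 0.15, so only the third client falls below the mean and is zeroed out.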

3.2.1. Adaptive anomaly detection model

In the training process, we cannot predict the type of attack, but we can estimate the model update of an honest client. Therefore, we can adopt a one-class classification algorithm to build the anomaly detection model from normal model updates. Such a technique learns the distribution boundary of the model updates to determine whether a new sample is abnormal. The auto-encoder is an effective one-class learning model for detecting anomalies, especially for high-dimensional data ^[27].

In practical applications, we cannot get the target data to complete the pre-training of our anomaly detection model. Therefore, the initialized anomaly detection model will be pre-trained on the source data with the idea of transfer learning.

At round t, the detection score ${e}_{i}^{t}$ of client $i$ :

$e_i^t = \exp\left(\frac{\mathrm{Mse}\left(C_i^t - \theta^t\left(C_i^t\right)\right) - \mu\left(E\right)}{\sigma\left(E\right)}\right)$

(2)

Our anomaly detection model differs from the one in Abnormal in two ways. 1) Abnormal uses the test set of the data set to train the anomaly detection model. Although the resulting detection model can complete the detection task very well, in most cases the test data set is not available. Therefore, based on the idea of transfer learning, we complete the pre-training of the anomaly detection model in the source domain. 2) Abnormal's anomaly detection model is not updated after training on the test set. We think this is unreasonable, because the test set is only a tiny part of the overall data, and using a small part of the data to detect most of the remaining data may not be accurate enough. Therefore, after pre-training in the source domain, we use the data of the target domain to fine-tune the anomaly detection model during the iterative process so that it is updated dynamically, as shown by make-adaption in Algorithm 3.

Algorithm 3: AADM adaptive anomaly detection model
Input: anomaly detection model $\theta^{t}$; weights of the last convolutional layer of the local model $C$; η_detection; credibility score $R$; honest client set $H$; $d$; $k$
Output: updated anomaly detection model $\theta^{t+1}$
1 Function get-anomaly-score ( ${\theta }^{t}, {C}_{i}^{t}$ ):

3.2.2. Data verification

The non-iid nature of client data increases the difficulty of Byzantine defense. However, the performance of each client's updated model on its own shared data is not affected by other clients, which can effectively solve this problem. Therefore, we use the clients' shared data $D_s = \{D_1^s, \dots, D_i^s, \dots, D_k^s\}$, client $i \in S$, to calculate the verification score of their updated models:

$f_i^t = \left(\exp\left(\frac{l_i^t - \mu\left(L\right)}{\sigma\left(L\right)}\right)\right)^{-2}$

(3)

where ${l}_{i}^{t}$ is loss of client $i$ calculated on model ${w}_{i}^{t}$ using the shared data ${D}_{i}^{s}$ at round $t$ :

$l_i^t = \frac{1}{\left|D_i^s\right|}\sum_{j=1}^{\left|D_i^s\right|} l\left(D_i^{s(j)}, w_i^t\right)$

(4)

where $D_i^{s(j)}$ is the $j$-th sample of $D_i^s$, and $\mu\left(L\right)$, $\sigma\left(L\right)$ are the mean and variance of the set $L = \left\{l_1, \dots, l_k\right\}$, respectively.
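A minimal sketch of Eqs (3) and (4) follows. The function names are hypothetical. One assumption needs flagging: the sketch divides by the standard deviation of $L$, which is the usual reading of $\sigma$; to follow the wording "variance" literally, replace `L.std()` with `L.var()`. The guard for identical losses is also an addition not discussed in the text.

```python
import numpy as np

def mean_shared_loss(per_sample_losses):
    """Eq (4): average loss of client i's model over its shared samples."""
    return float(np.mean(per_sample_losses))

def verification_scores(losses):
    """Eq (3): f_i = (exp((l_i - mu(L)) / sigma(L)))^(-2) for every selected client."""
    L = np.asarray(losses, dtype=float)
    sigma = L.std()                     # assumption: sigma taken as standard deviation
    if sigma == 0:                      # all losses equal: no client stands out
        return np.ones_like(L)
    z = (L - L.mean()) / sigma
    return np.exp(z) ** (-2.0)

# Hypothetical losses: the third client's model performs badly on its own shared data.
client_losses = [mean_shared_loss([0.1, 0.3]), mean_shared_loss([0.3, 0.3]), 5.0]
F = verification_scores(client_losses)
```

A loss above the mean gives a positive exponent and therefore a score below one, while a loss below the mean gives a score above one, so low-loss updates receive more weight.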

3.3. Unified update

After getting the credibility score $r_i^t$ in Algorithm 2 from the anomaly score $e_i^t$ and the verification score $f_i^t$, we can complete the aggregation of the clients' local model updates in Eq (1) and get a preliminary updated global model. However, due to the non-iid nature of client data, the knowledge learned by the local model of each client is limited, and the model differences between two clients are also significant. Therefore, to solve the problem that the preliminary aggregation model lacks a clear and consistent goal, we introduce an additional unified update procedure with the shared data on the server. Details can be seen in Algorithm 4.

Algorithm 4: Unified update
Input: global model $w^{t+1}$; clients' shared data $D_s = \{D_1^s, \dots, D_n^s\}$; E_server; η_server; honest client set $H$
Output: global model $w^{t+1}$
1 for each epoch $e$ = 0 to E_server do

Because the data used for the unified update is composed of each client's data, it covers the distribution of the overall data more comprehensively. The goal and direction of the unified update are therefore based on the overall situation and are not biased toward any individual client's data distribution.

4. Experiments

To verify the effectiveness of BRCA, we structure the clients' data with varying degrees of non-iid, and explore the impact of different amounts of shared data on the global model. We also compare the performance of our anomaly detection model with that of Abnormal and explore the necessity of the unified update.

4.1. Experimental setup

4.1.1. Datasets

Mnist and Cifar10 are the two most commonly used public data sets in image classification, and most of the benchmark methods in our work also use these two data sets for experiments. Using these two data sets, it is easier to compare with other existing methods.

We do the experiments on Mnist and Cifar10, and customize four different data distributions: (a) non-iid-1: each client only has one class of data. (b) non-iid-2: each client has 2 classes of data. (c) non-iid-3: each client has 5 classes of data. (d) iid: each client has 10 classes of data.

For Mnist, we use 100 clients and four data distributions: (a) non-iid-1: each class of data in the training dataset is divided into 10 pieces, and each client selects one piece as its private data. (b) non-iid-2: each class of data in the training dataset is divided into 20 pieces, and each client selects 2 pieces from different classes. (c) non-iid-3: each class of data in the training dataset is divided into 50 pieces, and each client selects 5 pieces from different classes. (d) iid: each class of data in the training dataset is divided into 100 pieces, and each client selects 10 pieces from different classes. As the source domain used for the pre-training of the anomaly detection model, we randomly select 20,000 lowercase letters from the Nist dataset.
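The label-skew partition described above can be sketched as follows. The function name `partition_by_label` and the round-robin choice of classes are assumptions made so that every piece is used exactly once; the text only says that each client selects pieces from different classes. The sketch also assumes the number of clients is a multiple of the number of classes.

```python
import numpy as np

def partition_by_label(labels, n_clients=100, classes_per_client=2, n_classes=10, seed=0):
    """Split each class into pieces and give every client pieces from distinct classes."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pieces_per_class = n_clients * classes_per_client // n_classes
    # Shuffle the sample indices of each class and cut them into pieces.
    pieces = {
        c: np.array_split(rng.permutation(np.flatnonzero(labels == c)), pieces_per_class)
        for c in range(n_classes)
    }
    next_piece = [0] * n_classes
    clients = []
    for i in range(n_clients):
        chosen = []
        for j in range(classes_per_client):
            c = (i + j) % n_classes          # assumption: round-robin class assignment
            chosen.append(pieces[c][next_piece[c]])
            next_piece[c] += 1
        clients.append(np.concatenate(chosen))
    return clients

# Toy labels standing in for a training set: 100 samples for each of 10 classes.
toy_labels = np.repeat(np.arange(10), 100)
client_indices = partition_by_label(toy_labels, n_clients=100, classes_per_client=2)
```

Setting `classes_per_client` to 1, 2, 5 or 10 corresponds to the non-iid-1, non-iid-2, non-iid-3 and iid settings, with 10, 20, 50 and 100 pieces per class respectively.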

For Cifar10, there are 10 clients, and the configuration of the four data distributions is similar to that of Mnist. We select some classes of data in Cifar100 as the source domain, which are as follows: lamp (number: 40), lawn mower (41), lobster (45), man (46), forest (47), mountain (49), girl (35), Snake (78), Rose (70) and Tao (68). These classes do not exist in Cifar10.

4.1.2. Models

We use logistic regression on the Mnist dataset, with ${\eta }_{server}$ = 0.1, ${\eta }_{client}$ = 0.1, ${\eta }_{detection}$ = 0.02, ${E}_{client}$ = 5, ${E}_{server}$ = 1, n = 100, k = 30, ξ = 20%. On Cifar10, we use a model with two convolutional layers and three fully connected layers, with ${\eta }_{server}$ = 0.05, ${\eta }_{client}$ = 0.05, ${\eta }_{detection}$ = 0.002, ${E}_{client}$ = 10, ${E}_{server}$ = 10, n = 10, k = 10, ξ = 20%. The structure of the models is the same as in ^[10].

4.1.3. Benchmark Byzantine attacks

Same-value attacks: A Byzantine client $i$ sends the model update $\omega_i = c\mathbf{1}$ to the server, where $\mathbf{1}$ is the all-ones vector and $c$ is a constant; we set $c = 5$. Sign-flipping attacks: Each client $i$ computes its true model update $\omega_i$, then Byzantine clients send $\omega_i = a\omega_i$ ($a < 0$) to the server; we set $a = -5$. Gaussian attacks: Byzantine clients add Gaussian noise to all dimensions of the model update, $\omega_i = \omega_i + \epsilon$, where $\epsilon$ follows the Gaussian distribution $N(0, g^2)$ with variance $g^2$; we set $g = 0.3$.
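The three benchmark attacks can be written as small functions acting on a flattened model update. The function names are hypothetical, and the sketch reads $N(0, g^2)$ in the usual way, so `g` is passed to NumPy as the standard deviation of the noise.

```python
import numpy as np

def same_value_attack(update, c=5.0):
    """Replace every coordinate of the update with the constant c."""
    return np.full(np.shape(update), c, dtype=float)

def sign_flipping_attack(update, a=-5.0):
    """Scale the true update by a negative factor a."""
    return a * np.asarray(update, dtype=float)

def gaussian_attack(update, g=0.3, rng=None):
    """Add zero-mean Gaussian noise with standard deviation g to every coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    update = np.asarray(update, dtype=float)
    return update + rng.normal(loc=0.0, scale=g, size=update.shape)

# Hypothetical honest update used to show the effect of each attack.
honest = np.array([1.0, -2.0, 0.5])
attacked_same = same_value_attack(honest)
attacked_flip = sign_flipping_attack(honest)
attacked_noise = gaussian_attack(honest, rng=np.random.default_rng(0))
```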

4.1.4. Benchmark defense methods

Defenses: Krum, GeoMed, Trimmed Mean, Abnormal and No Defense. No Defense does not use any defense methods.

4.2. Result and discussion

4.2.1. Impact of shared data rate

In the first experiment, we test the influence of the shared data rate γ in our algorithm, using the non-iid-2 data distribution. We evaluate five different values (1, 3, 5, 7 and 10%). Figures 2 and 3 show the accuracy and loss for Cifar10. We find that: 1) In all cases of Byzantine attacks, our algorithm is superior to the three benchmark methods. 2) Sharing only 1% of the clients' data can significantly improve the performance of the global model. Under all three Byzantine attacks, Krum, GeoMed, Trimmed Mean and No Defense are all unable to converge. This also shows that when the model is complex, such methods are less able to resist Byzantine attacks.

Figure 2. The accuracy on Cifar10. Byzantine attack types from (a) to (c) are as follows: Same value, Sign flipping and Gaussian noise. Six defense methods are adopted for each type of attack, in order: No Defense, Krum, GeoMed, Trimmed Mean, Abnormal and BRCA. For BRCA, there are five different shared data rates (1, 3, 5, 7 and 10%), which correspond to BRCA 1, BRCA 3, BRCA 5, BRCA 7 and BRCA 10, respectively.

Figure 3. The loss of Cifar10. The legends are the same as Figure 2.

As the client data-sharing ratio increases, the performance gain of the global model slows down. When the client data-sharing ratio goes from 1 to 10%, the average growth rates under the three Byzantine attacks are: 1.8→1.41→0.97→0.92%. Even when the clients share only one percent of their data, the performance of the global model is greatly improved.

Figure 4 clearly demonstrates the impact of different shared data rates on the loss value of the global model on Cifar10.

Figure 4. The loss of BRCA on Cifar10 with five different shared rate.

4.2.2. Performance of anomaly detection model

In this part, the purposes of our experiment are: 1) to compare our anomaly detection model with that of Abnormal; 2) to explore the robustness of the anomaly detection model to non-iid data. The shared data rate γ is 5%, as it is in Sections 4.2.3 and 4.2.4.

In order to compare the detection performance of the anomaly detection models of BRCA and Abnormal against Byzantine attacks, we use the cross-entropy loss, calculated from the detection scores, as the evaluation metric. First, we get the detection scores $E = \{e_1, \dots, e_i, \dots, e_k\}$ based on the model updates $\omega_i$ and $\theta$, client $i \in S$. Then, we set $P = \mathrm{Sigmoid}(E - \mu(E))$, which represents the probability that a client is honest, so $1 - P$ is the probability that the client is Byzantine. Lastly, we use $P$ and the true labels $Y$ ($y_i = 0$ for $i \in B$ and $y_i = 1$ for $i \in H$) to calculate the cross-entropy loss $l = \sum_{i=1}^{k} y_i \ln\left(P_i\right)$.
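The evaluation metric can be sketched directly from the formula as printed above. The function name `detection_metric` is hypothetical. Note that the printed expression has no leading minus sign and no term for Byzantine clients, so this sketch returns a non-positive value that is closer to zero when honest clients receive higher probabilities; the sign convention used for the plotted loss is not stated in the text.

```python
import numpy as np

def detection_metric(e, y):
    """l = sum_i y_i * ln(P_i) with P = Sigmoid(E - mean(E)), as printed in the text."""
    e = np.asarray(e, dtype=float)   # detection scores of the selected clients
    y = np.asarray(y, dtype=float)   # 1 for honest clients, 0 for Byzantine clients
    p = 1.0 / (1.0 + np.exp(-(e - e.mean())))   # probability of being honest
    return float(np.sum(y * np.log(p)))

# Hypothetical scores: two honest clients followed by one Byzantine client.
labels = [1, 1, 0]
good_detector = detection_metric([2.0, 2.0, 0.0], labels)   # honest clients score high
bad_detector = detection_metric([0.0, 0.0, 2.0], labels)    # honest clients score low
```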

Figure 5(a)–(c) compare the loss of the anomaly detection models of BRCA and Abnormal. From the figures, we can see that our model has a greater loss than Abnormal in the initial stage, mainly because the anomaly detection model is pre-trained with transfer learning, and the initial pre-trained model does not yet work well in the target domain. As the adaptation progresses, the loss of our model decreases and our model gradually outperforms Abnormal. Although Abnormal has a low loss in the initial stage, its loss gradually increases as training progresses, and its detection ability degenerates.

Figure 5. The cross-entropy loss of our anomaly detection model and that of Abnormal on Cifar10 with non-iid-2. (a)–(c) show the performance under the three Byzantine attacks.

Figure 6(a)–(c) shows the influence of different data distributions on our detection model. The detection ability of the model differs across data distributions, but it is worth pointing out that, as the degree of non-iid of the data increases, the detection ability of the model also increases.

Figure 6. (a)–(c) show our anomaly detection model's performance on four different data distributions (iid, non-iid-1, non-iid-2, non-iid-3) against three Byzantine attacks (Gaussian noisy, sign flipping, same value).


4.2.3. Impact of unified update

In this part, we study the impact of the unified update on the global model. Figure 7 shows the accuracy of the global model with and without unified update on Cifar10.

Figure 7. The accuracy of BRCA and BRCA No on Cifar10. BRCA No is BRCA with the unified update removed.


From non-iid-1 to iid, the unified update improves the global model's accuracy by 35.1→13.6→4.7→2.3% (same value), 34.8→10.5→3.0→3.1% (Gaussian noisy) and 24.9→9.9→2.8→3.0% (sign flipping). Combined with Figure 7, this clearly shows that the less diverse each client's data is, the more the unified update improves the global model.

When the data is non-iid, the model updates of different clients point in different directions, and the higher the degree of non-iid, the more significant the difference. As a result, the global model obtained by weighted aggregation does not fit the global data well. A unified update on the shared data effectively integrates the model updates of multiple clients, giving the global model a consistent direction.

Therefore, it is necessary to implement a unified update to the primary aggregation model when data is non-iid.
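The two stages discussed in this subsection, primary weighted aggregation followed by a unified update on the shared data, can be sketched as follows. This is a toy illustration under stated assumptions, not the BRCA implementation: the model is a linear regressor trained with mean squared error, the unified update is plain gradient descent, and the function names, aggregation weights and data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target)


def weighted_aggregate(client_params: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Primary aggregation: weighted average of client parameter vectors."""
    total = sum(weights)
    dim = len(client_params[0])
    return [sum(w * p[d] for w, p in zip(weights, client_params)) / total
            for d in range(dim)]


def mse(params: Sequence[float], data: Sequence[Sample]) -> float:
    """Mean squared error of the linear model y = params . x on the data."""
    return sum((sum(w * xi for w, xi in zip(params, x)) - y) ** 2
               for x, y in data) / len(data)


def unified_update(params: Sequence[float], shared_data: Sequence[Sample],
                   lr: float = 0.1, steps: int = 50) -> List[float]:
    """Fine-tune the aggregated model by gradient descent on the shared data."""
    params = list(params)
    n = len(shared_data)
    for _ in range(steps):
        grad = [0.0] * len(params)
        for x, y in shared_data:
            err = sum(w * xi for w, xi in zip(params, x)) - y
            for d, xi in enumerate(x):
                grad[d] += 2.0 * err * xi / n
        params = [w - lr * g for w, g in zip(params, grad)]
    return params


# Hypothetical client models that disagree because each saw different data.
client_models = [[3.0, 0.0], [1.0, 2.0], [2.5, 0.5]]
credibility_weights = [1.0, 1.0, 1.0]  # placeholder weights, not from the paper

# Small shared data set generated by y = 2 * x + 1 (second feature is a bias term).
shared_data = [((0.0, 1.0), 1.0), ((1.0, 1.0), 3.0),
               ((2.0, 1.0), 5.0), ((3.0, 1.0), 7.0)]

primary = weighted_aggregate(client_models, credibility_weights)
unified = unified_update(primary, shared_data)
```

In this toy setting the averaged model sits between the clients' disagreeing solutions, and the extra steps on the shared data pull it toward a single solution that fits the common distribution.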

4.2.4. Impact of non-iid

Tables 2 and 3 show the accuracy and loss of each defense method under different data distributions on Cifar10. Our method performs best, and its performance is relatively stable across data distributions. The higher the degree of non-iid, the less diverse the data of each client and the lower the performance of the defense methods.

Table 2. The accuracy of the six defenses under four different data distributions on Cifar10, against three attacks.

Attacks	Distribution	No	Krum	GeoMed	Abnormal	TrimmedMean	BRCA
Same value	Non-iid-1	0.1	0.1	0.1	0.178	0.1	0.529
	Non-iid-2	0.101	0.207	0.205	0.480	0.1	0.619
	Non-iid-3	0.1	0.398	0.398	0.634	0.1	0.691
	iid	0.098	0.696	0.705	0.698	0.101	0.713
Gaussian noisy	Non-iid-1	0.1	0.1	0.1	0.178	0.1	0.529
	Non-iid-2	0.191	0.204	0.205	0.513	0.059	0.623
	Non-iid-3	0.0409	0.398	0.394	0.660	0.171	0.692
	iid	0.1	0.697	0.694	0.710	0.120	0.715
Sign flipping	Non-iid-1	0.1	0.101	0.1	0.177	0.1	0.426
	Non-iid-2	0.1	0.192	0.214	0.5131	0.1	0.621
	Non-iid-3	0.1	0.397	0.398	0.651	0.1	0.686
	iid	0.1	0.697	0.703	0.711	0.1	0.718


Table 3. The loss of the six defenses under four different data distributions on Cifar10, against three attacks.

Attacks	Distribution	No	Krum	GeoMed	Abnormal	TrimmedMean	BRCA
Same value	Non-iid-1	2.84e16	11.72	9.61	2.29	6.05e17	2.09
	Non-iid-2	6.99e16	7.29	8.01	2.06	3.63e16	2.09
	Non-iid-3	4.48e16	2.35	2.38	1.893	3.37e16	0.691
	iid	1.51e16	0.794	0.774	1.837	3.17e16	1.79
Gaussian noisy	Non-iid-1	8.635e4	8.41	9.37	2.29	936.17	1.54
	Non-iid-2	9.51	7.57	8.37	1.34	7.98	0.623
	Non-iid-3	8.22	2.01	2.31	0.94	6.07	0.692
	iid	8.09	0.81	0.79	0.82	3.12	0.76
Sign flipping	Non-iid-1	2.30	10.72	9.91	2.29	2.30	1.54
	Non-iid-2	2.31	7.77	7.10	1.34	2.30	0.621
	Non-iid-3	2.31	2.36	2.13	0.94	2.30	0.686
	iid	2.31	0.79	0.80	0.81	2.31	0.76


Our analysis is as follows: 1) Non-iid data causes large differences between clients' models, so it is difficult for a defense method to judge whether an anomaly is caused by the non-iid data or by a Byzantine attack, which makes defending against Byzantine attacks harder. 2) Krum and GeoMed use statistical knowledge to select the median or an individual client's model to represent the global model. This type of method can effectively defend against Byzantine attacks when the data is iid. However, when the data is non-iid, each client's model focuses on only a small region of the data and is highly independent of the others; it cannot cover the domains of the other clients and therefore cannot represent the global model. 3) Trimmed Mean defends against Byzantine attacks by averaging. It performs well when the parameter dimension of the model is low, but as the complexity of the model increases, it cannot converge stably.
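To make the contrast between selection-based and averaging-based defenses concrete, the sketch below implements the commonly described Krum selection rule and the coordinate-wise trimmed mean on plain Python lists. It is a simplified illustration, not the code used in the experiments: the function names, the parameter names `f` and `trim`, and the toy updates are assumptions, and GeoMed (the geometric median) is omitted for brevity.

```python
from typing import List, Sequence


def krum(updates: Sequence[Sequence[float]], f: int) -> List[float]:
    """Select the update with the smallest sum of squared distances
    to its n - f - 2 nearest neighbours (f = assumed number of Byzantine clients)."""
    n = len(updates)
    m = n - f - 2
    if m < 1:
        raise ValueError("Krum needs at least f + 3 updates")
    best_idx, best_score = 0, float("inf")
    for i, u in enumerate(updates):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(u, v))
            for j, v in enumerate(updates) if j != i
        )
        score = sum(dists[:m])
        if score < best_score:
            best_idx, best_score = i, score
    return list(updates[best_idx])


def trimmed_mean(updates: Sequence[Sequence[float]], trim: int) -> List[float]:
    """Coordinate-wise mean after dropping the `trim` smallest and
    `trim` largest values in every coordinate."""
    n = len(updates)
    if 2 * trim >= n:
        raise ValueError("trim is too large for the number of updates")
    result = []
    for d in range(len(updates[0])):
        column = sorted(u[d] for u in updates)
        kept = column[trim:n - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four similar (honest-looking) updates and one extreme outlier; made-up values.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [100.0, -100.0]]
selected = krum(updates, f=1)
averaged = trimmed_mean(updates, trim=1)
```

Krum returns a single client's update, which illustrates point 2) above: when clients hold non-iid data, no individual update is representative of the whole population. The usual robustness guarantee for Krum also presumes n > 2f + 2, which the toy data satisfies.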

5. Conclusions

In this work, we propose a robust federated learning framework, BRCA, against Byzantine attacks when the data is non-iid. BRCA detects Byzantine attacks by credibility assessment. Meanwhile, it performs a unified update of the global model on the shared data, so that the global model has a consistent direction and its performance improves. BRCA makes the global model converge well under different data distributions. For the pre-training of the anomaly detection model, transfer learning removes the model's dependence on the test data set. Experiments show that BRCA performs well on both non-iid and iid data, especially on non-iid data. In the future, we will improve our method by studying how to protect the privacy and security of the shared data.

Acknowledgments

This work was partially supported by the Shanghai Science and Technology Innovation Action Plan under Grant 19511101300.

Conflict of interest

All authors declare no conflicts of interest in this paper.

References

[1]	Z. Zhang, P. Cui, W. Zhu, Deep Learning on Graphs: A Survey, IEEE Trans. Knowl. Data Eng., 34 (2022), 249–270. https://doi.org/10.1109/TKDE.2020.2981333 doi: 10.1109/TKDE.2020.2981333
[2]	D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, P. Vandergheynst, The emerging field of signal processing on graphs: Extending high dimensional data analysis to networks and other irregular domains, IEEE Signal Process. Mag., 30 (2013), 83–98. https://doi.org/10.1109/MSP.2012.2235192 doi: 10.1109/MSP.2012.2235192
[3]	A. Sandryhaila, J. M. F. Moura, Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure, IEEE Signal Process. Mag., 31 (2014), 80–90. https://doi.org/10.1109/MSP.2014.2329213 doi: 10.1109/MSP.2014.2329213
[4]	A. Sandryhaila, J. M. F. Moura, Discrete signal processing on graphs, IEEE Trans. Signal Process., 61 (2013), 1644–1656. https://doi.org/10.1109/TSP.2013.2238935 doi: 10.1109/TSP.2013.2238935
[5]	J. Bruna, W. Zaremba, A. Szlam, Y. Lecun, Spectral networks and locally connected networks on graphs, arXiv preprint, (2013), arXiv: 1312.6203. https://doi.org/10.48550/arXiv.1312.6203
[6]	D. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, eet al., Convolutional networks on graphs for learning molecular fingerprints, Adv. Neural Inf. Process. Syst., 28 (2015), 2224–2232.
[7]	T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint, (2016), arXiv: 1609.02907.
[8]	J. Atwood, D. Towsley, Diffusion-convolutional neural networks, Adv. Neural Inf. Process. Syst., 29 (2016), 1993–2001.
[9]	M. Defferrard, X. Bresson, P. Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, Adv. Neural Inf. Process. Syst., 29 (2016), 3837–3845.
[10]	R. Levie, F. Monti, X. Bresson, M. M. Bronstein, CayleyNets: Graph convolutional neural networks with complex rational spectral filters, IEEE Trans. Signal Process., 67 (2019), 97–109. https://doi.org/10.1109/TSP.2018.2879624 doi: 10.1109/TSP.2018.2879624
[11]	R. Levie, W. Huang, L. Bucci, M. Bronstein, G. Kutyniok, Transferability of Spectral Graph Convolutional Neural Networks, J. Mach. Learn. Res., 22 (2021), 12462–112520.
[12]	F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M. M. Bronstein, Geometric deep learning on graphs and manifolds using mixture model CNNs, . IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, 5425–5434. https://doi.org/10.1109/CVPR.2017.576
[13]	M. Fey, J. E. Lenssen, F. Weichert, H. Müller, SplineCNN: fast geometric deep learning with continuous b-spline kernels, Proc. IEEE Conf. Comput. Vis. Pattern Recognit., (2018), 869–877. https://doi.org/10.1109/CVPR.2018.00097 doi: 10.1109/CVPR.2018.00097
[14]	W. L. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Adv. Neural Inf. Process. Syst., 30 (2017), 1024–1034.
[15]	Y. Zhao, J. Qi, Q. Liu, R. Zhang, WGCN: Graph Convolutional Networks with Weighted Structural Features, in 2021 SIGIR, (2021), 624–633. https://doi.org/10.1145/3404835.3462834
[16]	H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, L. Zhu, Adversarial examples for graph data: Deep insights into attack and defense, arXiv preprint, (2019), arXiv: 1903.01610. https://doi.org/10.48550/arXiv.1903.01610
[17]	D. Zügner, S. Günnemann, Adversarial attacks on graph neural networks via meta learning, arXiv preprint, (2019), arXiv: 1902.08412. https://doi.org/10.48550/arXiv.1902.08412
[18]	K. Xu, H. Chen, S. Liu, P. Chen, T. Weng, M. Hong, et al., Topology attack and defense for graph neural networks: An optimization perspective, in Proc. Int. Joint Conf. Artif. Intell., (2019), 3961–3967. https://doi.org/10.24963/ijcai.2019/550
[19]	L. Chen, J. Li, J. Peng, A survey of adversarial learning on graph, arXiv preprint, (2003), arXiv: 2003.05730. https://doi.org/10.48550/arXiv.2003.05730
[20]	L. Chen, J. Li, J. Peng, Y. Liu, Z. Zheng, C. Yang, Understanding Structural Vulnerability in Graph Convolutional Networks, in Proc. Int. Joint Conf. Artif. Intell., (2021), 2249–2255. https://doi.org/10.24963/ijcai.2021/310
[21]	P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph attention networks, arXiv preprint, (2017), arXiv: 1710.10903. https://doi.org/10.48550/arXiv.1710.10903
[22]	A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit. L. Jones, A. N. Gomez, et al., Attention is all you need, Adv. Neural Inf. Process. Syst., 30 (2017), 5998–6008.
[23]	C. Zhuang, Q. Ma, Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification, in Proc. Int. Conf. World Wide Web, (2018), 499–508. https://doi.org/10.1145/3178876.3186116
[24]	F. Hu, Y. Zhu, S. Wu, L. Wang, T. Tan, Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification, in Proc. Int. Joint Conf. Artif. Intell., (2019), 4532–4539. https://doi.org/10.24963/ijcai.2019/630
[25]	Y. Zhang, S. Pal, M. Coates, D. Ü stebay, Bayesian graph convolutional neural networks for semi-supervised classification, in Proc. Int. Joint Conf. Artif. Intell., 33 (2019), 5829–5836. https://doi.org/10.1609/aaai.v33i01.33015829
[26]	Y. Luo, R. Ji, T. Guan, J. Yu, P. Liu, Y. Yang, Every node counts: Self-ensembling graph convolutional networks for semi-supervised learning, Pattern Recognit., 106 (2020), 107451. https://doi.org/10.1016/j.patcog.2020.107451 doi: 10.1016/j.patcog.2020.107451
[27]	P. Gong, L. Ai, Neighborhood Adaptive Graph Convolutional Network for Node Classification, IEEE Access, 7 (2019), 170578–170588. https://doi.org/10.1109/ACCESS.2019.2955487 doi: 10.1109/ACCESS.2019.2955487
[28]	I. Chami, Z. Ying, C. Ré, J. Leskovec, Hyperbolic graph convolutional neural networks, in Proc. Adv. Neural Inf. Process. Syst., (2019), 4868–4879.
[29]	J. Dai, Y. Wu, Z. Gao, Y. Jia, A Hyperbolic-to-Hyperbolic Graph Convolutional Network, in 2021 IEEE/CVF Conf. Computer Vision Pattern Recogn. (CVPR), (2021), 154–163. https://doi.org/10.1109/CVPR46437.2021.00022
[30]	S. Rhee, S. Seo, S. Kim, Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification, in 2018 Int. Joint Conf. Artif. Intell., (2018), 3527–3534. https://doi.org/10.24963/ijcai.2018/490
[31]	J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural Message Passing for Quantum Chemistry, in 2017 Int. Conf. Machine Learn., (2017), 1263–1272.
[32]	M. Zhang, Z. Cui, M. Neumann, Y. Chen, An End-to-End Deep Learning Architecture for Graph Classification, in Proc. Artif. Intell., (2018), 4438–4445. https://doi.org/10.1609/aaai.v32i1.11782
[33]	R. Ying, J. You, C. Morris, Hierarchical graph representation learning with differentiable pooling, in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., (2018), 4805–4815.
[34]	Y. Ma, S. Wang, C. C Aggarwal, J. Tang, Graph convolutional networks with eigenpooling, in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., (2019), 723–731. https://doi.org/10.1145/3292500.3330982
[35]	J. Lee, I. Lee, J. Kang, Self-attention graph pooling, in Proc. 36th Int. Conf. Machine Learn., (2019), 3734–3743. Available from: http://proceedings.mlr.press/v97/lee19c/lee19c.pdf
[36]	C. Cangea, P. Velickovic, N. Jovanovic, T. Kipf, P. Lio, Towards sparse hierarchical graph classifiers, in Proc. Adv. Neural Inf. Process. Syst., (2018). https://doi.org/10.48550/arXiv.1811.01287
[37]	H. Gao, S. Ji, Graph U-Nets, in Proc. 36th Int. Conf. Machine Learn., (2019), 2083–2092. https://doi.org/10.1109/TPAMI.2021.3081010
[38]	H. Gao, Z. Wang, S. Ji, Large-Scale Learnable Graph Convolutional Networks, in Proc. Knowl. Disc. Data Min., (2018), 1416–1424. https://doi.org/10.1145/3219819.3219947
[39]	W. Chiang, X. Liu, S. Si, Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks, in Proc. Knowl. Disc. Data Min., (2019), 257–266. https://doi.org/10.1145/3292500.3330925
[40]	D. Zou, Z. Hu, Y. Wang, S. Jiang, Y. Sun, Q. Gu, Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks, in Proc. Adv. Neural Inf. Process. Syst., (2019), 11249–11259.
[41]	J. Wang, Y. Wang, Z. Yang, Bi-GCN: Binary Graph Convolutional Network, in 2021 IEEE/CVF Conf. Comput. Vision Pattern Recogn. (CVPR), (2021), 1561–1570. https://doi.org/10.1109/CVPR46437.2021.00161
[42]	F. Monti, K. Otness, M. M. Bronstein, MOTIFNET: A Motif-Based Graph Convolutional Network for Directed Graphs, in Proc. IEEE Data Sci. Workshop, (2018), 225–228. https://doi.org/10.1109/DSW.2018.8439897
[43]	J. Du, S. Zhang, G. Wu, J. M. F. Moura, S. Kar, Topology adaptive graph convolutional networks, arXiv preprint, (2017), arXiv: 1710.10370.
[44]	E. Yu, Y. Wang, Y. Fu, D. B. Chen, M. Xie, Identifying critical nodes in complex networks via graph convolutional networks, Knowl.-Based Syst., 198 (2020), 105893. https://doi.org/10.1016/j.knosys.2020.105893 doi: 10.1016/j.knosys.2020.105893
[45]	C. Li, X. Qin, X. Xu, D. Yang, G. Wei, Scalable Graph Convolutional Networks with Fast Localized Spectral Filter for Directed Graphs, IEEE Access, 8 (2020), 105634–105644. https://doi.org/10.1109/ACCESS.2020.2999520 doi: 10.1109/ACCESS.2020.2999520
[46]	S. Abu-El-Haija, A. Kapoor, B. Perozzi, J. Lee, N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification, in Proc. Conf. Uncertainty in Artif. Intell., (2019), 841–851.
[47]	S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, J. Yang, Multiscale Dynamic Graph Convolutional Network for Hyperspectral Image Classification, IEEE Trans. Geosci. Remote. Sens., 58 (2020), 3162–3177. https://doi.org/10.1109/TGRS.2019.2949180 doi: 10.1109/TGRS.2019.2949180
[48]	R. Liao, Z. Zhao, R. Urtasun, R. S. Zemel, LanczosNet: Multi-Scale Deep Graph Convolutional Networks, arXiv preprint., (2019), arXiv: 1901.01484. Available from: https://openreview.net/pdf?id = BkedznAqKQ
[49]	S. Luan, M. Zhao, X. Chang, D. Precup, Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks, in Proc. Conf. Workshop on Neural Inform. Process. Syst., 32 (2019), 10943–10953. Available from: https://proceedings.neurips.cc/paper_files/paper/2019/file/ccdf3864e2fa9089f9eca4fc7a48ea0a-Paper.pdf
[50]	F. Manessi, A. Rozza, M. Manzo, Dynamic Graph Convolutional Networks, Pattern Recogn., 97 (2020), 107000. https://doi.org/10.1016/j.patcog.2019.107000 doi: 10.1016/j.patcog.2019.107000
[51]	A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs, in Proc. Int. Joint Conf. Artif. Intell., (2020), 5363–5370. https://doi.org/10.1609/aaai.v34i04.5984
[52]	Z. Qiu, K. Qiu, J. Fu, D. Fu, DGCN: Dynamic Graph Convolutional Network for Efficient Multi-Person Pose Estimation, in Proc. Int. Joint Conf. Artif. Intell., (2020), 11924–11931. https://doi.org/10.1609/aaai.v34i07.6867
[53]	T. Song, Z. Cui, Y. Wang, W. Zheng, Q. Ji, Dynamic Probabilistic Graph Convolution for Facial Action Unit Intensity Estimation, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., (2021), 4845–4854. https://doi.org/10.1109/CVPR46437.2021.00481
[54]	M. S. Schlichtkrull, T. N. Kipf, P. Bloem, R. Berg, I. Titov, M. Welling, Modeling Relational Data with Graph Convolutional Networks, In The Semantic Web: 15th Int. Conf., ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, 593–607. https://doi.org/10.1007/978-3-319-93417-4_38
[55]	Z. Huang, X. Li, Y. Ye, M. K. Ng, MR-GCN: Multi-Relational Graph Convolutional Networks based on Generalized Tensor Product, in Proc. Int. Joint Conf. Artif. Intell., (2020), 1258–1264. https://doi.org/10.24963/ijcai.2020/175
[56]	J. Chen, L. Pan, Z. Wei, X. Wang, C. W. Ngo, T. S. Chua, Zero-Shot Ingredient Recognition by Multi-Relational Graph Convolutional Network, in Proc. Int. Joint Conf. Artif. Intell., 34 (2020), 10542–10550. https://doi.org/10.1609/aaai.v34i07.6626
[57]	P. Gopalan, S. Gerrish, M. Freedman, D. Blei, D. Mimno, Scalable inference of overlapping communities, in Proc. Conf. Workshop on Neural Inform. Process. Syst., (2012), 2249–2257.
[58]	R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intell., 34 (2012), 2274–2282. https://doi.org/10.1109/TPAMI.2012.120 doi: 10.1109/TPAMI.2012.120
[59]	W. Zheng, P. Jing, Q. Xu, Action Recognition Based on Spatial Temporal Graph Convolutional Networks, in Proc. 3rd Int. Conf. Comput. Sci. Appl. Eng., 118 (2019), 1–5. https://doi.org/10.1145/3331453.3361651
[60]	D. Tian, Z. Lu, X. Chen, L. Ma, An attentional spatial temporal graph convolutional network with co-occurrence feature learning for action recognition, Multimed. Tools Appl., 79 (2020), 12679–12697. https://doi.org/10.1007/s11042-020-08611-4 doi: 10.1007/s11042-020-08611-4
[61]	Y. Chen, G. Ma, C. Yuan, B. Li, H. Zhang, F. Wang, et al., Graph convolutional network with structure pooling and joint-wise channel attention for action recognition, Pattern Recogn., 103 (2020), 107321. https://doi.org/10.1016/j.patcog.2020.107321 doi: 10.1016/j.patcog.2020.107321
[62]	J. Dong, Y. Gao, H. J. Lee, H. Zhou, Y. Yao, Z. Fang, et al., Action Recognition Based on the Fusion of Graph Convolutional Networks with High Order Features, Appl. Sci., 10 (2020), 1482. https://doi.org/10.3390/app10041482 doi: 10.3390/app10041482
[63]	Z. Chen, S. Li, B. Yang, Q. Li, H. Liu, Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition, in Proc. Int. Joint Conf. Artif. Intell., 35 (2021), 1113–1122. https://doi.org/10.1609/aaai.v35i2.16197
[64]	Y. Bin, Z. Chen, X. Wei, X. Chen, C. Gao, N. Sang, Structure-aware human pose estimation with graph convolutional networks, Pattern Recogn., 106 (2020), 107410. https://doi.org/10.1016/j.patcog.2020.107410 doi: 10.1016/j.patcog.2020.107410
[65]	R. Wang, C. Huang, X. Wang, Global Relation Reasoning Graph Convolutional Networks for Human Pose Estimation, IEEE Access, 8 (2020), 38472–38480. https://doi.org/10.1109/ACCESS.2020.2973039 doi: 10.1109/ACCESS.2020.2973039
[66]	T. Sofianos, A. Sampieri, L. Franco, F. Galasso, Space-Time-Separable Graph Convolutional Network for Pose Forecasting, in Proc. IEEE/ICCV Int. Conf. Comput. Vision, (2021), 11209–11218. https://doi.org/10.48550/arXiv.2110.04573
[67]	Z. Zou, W. Tang, Modulated Graph Convolutional Network for 3D Human Pose Estimation, in Proc. ICCV, (2021), 11457–11467. https://doi.org/10.1109/ICCV48922.2021.01128
[68]	B. Yu, H. Yin, Z. Zhu, Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting, in Proc. Int. Joint Conf. Artif. Intell., (2018), 3634–3640. https://doi.org/10.24963/ijcai.2018/505
[69]	Y. Han, S. Wang, Y. Ren, C. Wang, P. Gao, G. Chen, Predicting Station-Level Short-Term Passenger Flow in a Citywide Metro Network Using Spatiotemporal Graph Convolutional Neural Networks, ISPRS Int. J. Geo-Inform., 8 (2019), 243. https://doi.org/10.3390/ijgi8060243 doi: 10.3390/ijgi8060243
[70]	B. Zhao, X. Gao, J. Liu, J. Zhao, C. Xu, Spatiotemporal Data Fusion in Graph Convolutional Networks for Traffic Prediction, IEEE Access, 8 (2020), 76632–76641. https://doi.org/10.1109/ACCESS.2020.2989443 doi: 10.1109/ACCESS.2020.2989443
[71]	L. Ge, H. Li, J. Liu, A. Zhou, Temporal Graph Convolutional Networks for Traffic Speed Prediction Considering External Factors, in Proc. Int. Conf. Mobile Data Manag., (2019), 234–242. https://doi.org/10.1109/MDM.2019.00-52
[72]	L. Ge, S. Li, Y. Wang, F. Chang, K. Wu, Global Spatial-Temporal Graph Convolutional Network for Urban Traffic Speed Prediction, Appl. Sci.-basel, 10 (2020), 1509. https://doi.org/10.3390/app10041509 doi: 10.3390/app10041509
[73]	P. Han, P. Yang, P. Zhao, S. Shang, Y. Liu, J. Zhou, et al., GCN-MF: Disease-Gene Association Identification by Graph Convolutional Networks and Matrix Factorization, Knowl. Disc. Data Min., (2019), 705–713. https://doi.org/10.1145/3292500.3330912 doi: 10.1145/3292500.3330912
[74]	J. Li, Z. Li, R. Nie, Z. You, W. Bao, FCGCNMDA: predicting miRNA-disease associations by applying fully connected graph convolutional networks, Mol. Genet. Genom., 295 (2020), 1197–1209. https://doi.org/10.1007/s00438-020-01693-7 doi: 10.1007/s00438-020-01693-7
[75]	L. Wang, Z. You, Y. Li, K. Zhang, Y. Huang, GCNCDA: A new method for predicting circRNA-disease associations based on Graph Convolutional Network Algorithm, PLoS Comput. Biol., 16 (2020), e1007568. https://doi.org/10.1371/journal.pcbi.1007568 doi: 10.1371/journal.pcbi.1007568
[76]	C. Wang, J. Guo, N. Zhao, Y. Liu, X. Liu, G. Liu, et al., A Cancer Survival Prediction Method Based on Graph Convolutional Network, IEEE Trans. NanoBiosci., 19 (2019), 117–126. https://doi.org/10.1109/TNB.2019.2936398 doi: 10.1109/TNB.2019.2936398
[77]	H. Chen, F. Zhuang, L. Xiao, L. Ma, H. Liu, R. Zhang, et al., AMA-GCN: Adaptive Multi-layer Aggregation Graph Convolutional Network for Disease Prediction, in Proc. IJCAI, (2021), 2235–2241. https://doi.org/10.24963/ijcai.2021/308
[78]	K. Gopinath, C. Desrosiers, H. Lombaert, Learnable Pooling in Graph Convolutional Networks for Brain Surface Analysis, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2022), 864–876. https://doi.org/10.1109/TPAMI.2020.3028391 doi: 10.1109/TPAMI.2020.3028391
[79]	R. Ying, R. He, K. Chen, Graph Convolutional Neural Networks for Web-Scale Recommender Systems, in Proc. Knowl. Disc. Data Min., (2018), 974–983. https://doi.org/10.1145/3219819.3219890
[80]	X. Xia, H. Yin, J. Yu, Q. Wang, L. Cui, X. Zhang, Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation, in Proc. Int. Joint Conf. Artif. Intell., 35 (2021), 4503–4511. https://doi.org/10.1609/aaai.v35i5.16578
[81]	H. Chen, L. Wang, Y. Lin, C. Yeh, F. Wang, H. Yang, Structured Graph Convolutional Networks with Stochastic Masks for Recommender Systems, in Proc. SIGIR, (2021), 614–623. https://doi.org/10.1145/3404835.3462868
[82]	L. Chen, Y. Xie, Z. Zheng, H. Zheng, J. Xie, Friend Recommendation Based on Multi-Social Graph Convolutional Network, IEEE Access, 8 (2020), 43618–43629. https://doi.org/10.1109/ACCESS.2020.2977407 doi: 10.1109/ACCESS.2020.2977407
[83]	T. Zhong, S. Zhang, F. Zhou, K. Zhang, G. Trajcevski, J. Wu, Hybrid graph convolutional networks with multi-head attention for location recommendation, World Wide Web, 23 (2020), 3125–33151. https://doi.org/10.1007/s11280-020-00824-9 doi: 10.1007/s11280-020-00824-9
[84]	T. H. Nguyen, R. Grishman, Graph Convolutional Networks with Argument-Aware Pooling for Event Detection, in Proc. AAAI Confer. Artif. Intell., 32 (2018). https://doi.org/10.1609/aaai.v32i1.12039
[85]	Z. Guo, Y. Zhang, W. Lu, Attention Guided Graph Convolutional Networks for Relation Extraction, Ann. Meet. Assoc. Comput. Linguist., (2019), 241–251. https://doi.org/10.18653/v1/P19-1024 doi: 10.18653/v1/P19-1024
[86]	Y. Hong, Y. Liu, S. Yang, K. Zhang, A. Wen, J. Hu, Improving Graph Convolutional Networks Based on Relation-Aware Attention for End-to-End Relation Extraction, IEEE Access, 8 (2020), 51315–51323. https://doi.org/10.1109/ACCESS.2020.2980859 doi: 10.1109/ACCESS.2020.2980859
[87]	Z. Meng, S. Tian, L. Yu, Y. Lv, Joint extraction of entities and relations based on character graph convolutional network and Multi-Head Self-Attention Mechanism, J. Exp. Theor. Artif. Intell., 33 (2021), 349–362. https://doi.org/10.1080/0952813X.2020.1744198 doi: 10.1080/0952813X.2020.1744198
[88]	L. Yao, C. Mao, Y. Luo, Graph Convolutional Networks for Text Classification, Artif. Intell., (2019), 7370–7377. https://doi.org/10.1609/aaai.v33i01.33017370 doi: 10.1609/aaai.v33i01.33017370
[89]	M. Chandra, D. Ganguly, P. Mitra, B. Pal, J. Thomas, NIP-GCN: An Augmented Graph Convolutional Network with Node Interaction Patterns, in Proc. SIGIR, (2021), 2242–2246. https://doi.org/10.1145/3404835.3463082
[90]	L. Xiao, X. Hu, Y. Chen, Y. Xue, D. Gu, B. Chen, et al., Targeted Sentiment Classification Based on Attentional Encoding and Graph Convolutional Networks, Appl. Sci., 10 (2020), 957. https://doi.org/10.3390/app10030957 doi: 10.3390/app10030957
[91]	P. Zhao, L. Hou, O. Wu, Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification, Knowl.-Based Syst., 193 (2020), 105443. https://doi.org/10.1016/j.knosys.2019.105443 doi: 10.1016/j.knosys.2019.105443
[92]	S. Jiang, Q. Chen, X. Liu, B. Hu, L. Zhang, Multi-hop Graph Convolutional Network with High-order Chebyshev Approximation for Text Reasoning, arXiv preprint, (2021), arXiv: 2106.05221. https://doi.org/10.18653/v1/2021.acl-long.513
[93]	R. Li, H. Chen, F. Feng, Z. Ma, X. Wang, E. Hovy, Dual Graph Convolutional Networks for Aspect-based Sentiment Analysis, in Proc. 59 Ann. Meet. Assoc. Comput. Linguist. And 11^th Int. joint Conf. Nat. Language process., 1 (2021), 6319–6329.
[94]	L. Lv, J. Cheng, N. Peng, M. Fan, D. Zhao, J. Zhang, Auto-encoder based Graph Convolutional Networks for Online Financial Anti-fraud, IEEE Comput. Intell. Financ. Eng. Econ., (2019), 1–6. https://doi.org/10.1109/CIFEr.2019.8759109 doi: 10.1109/CIFEr.2019.8759109
[95]	C. Li, D. Goldwasser, Encoding Social Information with Graph Convolutional Networks for Political Perspective Detection in News Media, in Proc. 57th Ann. Meet. Assoc. Comput. Linguist., (2019), 2594–2604. https://doi.org/10.18653/v1/p19-1247
[96]	Y. Sun, T. He, J. Hu, H. Hang, B. Chen, Socially-Aware Graph Convolutional Network for Human Trajectory Prediction, in 2019 IEEE 3rd Inf. Technol. Network. Electron. Autom. Control Conf. (ITNEC), (2019), 325–333. https://doi.org/10.1109/ITNEC.2019.8729387
[97]	J. Chen, J. Li, M. Ahmed, J. Pang, M. Lu, X. Sun, Next Location Prediction with a Graph Convolutional Network Based on a Seq2seq Framework, KSII Trans. Internet Inf. Syst., 14 (2020), 1909–1928. https://doi.org/10.3837/tiis.2020.05.003 doi: 10.3837/tiis.2020.05.003
[98]	X. Li, Y. Xin, C. Zhao, Y. Yang, Y. Chen, Graph Convolutional Networks for Privacy Metrics in Online Social Networks, Appl. Sci.-Basel, 10 (2020), 1327. https://doi.org/10.3390/app10041327 doi: 10.3390/app10041327

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)