
Federated learning (FL) is a distributed machine learning framework that obtains an optimal model from clients' local updates. As an efficient design for model convergence and data communication, cloud-edge-client hierarchical federated learning (HFL) attracts more attention than the typical cloud-client architecture. However, HFL still exposes clients' sensitive data to threats, since adversaries can analyze the uploaded and downloaded parameters. In this paper, to address information leakage effectively, we propose a novel privacy-preserving scheme based on the concept of differential privacy (DP), adding Gaussian noises to the shared parameters when uploading them to edge and cloud servers and broadcasting them to clients. Our algorithm can obtain global differential privacy with adjustable noises in the architecture. We evaluate the performance on image classification tasks. In our experiment on the Modified National Institute of Standards and Technology (MNIST) dataset, we get 91% model accuracy. Compared to the previous two-layer HFL-DP, our design is more secure while being as accurate.
Citation: Youqun Long, Jianhui Zhang, Gaoli Wang, Jie Fu. Hierarchical federated learning with global differential privacy[J]. Electronic Research Archive, 2023, 31(7): 3741-3758. doi: 10.3934/era.2023190
With the rapid development of mobile devices and the Internet-of-Things (IoT), it is expected that everything will be connected closely[1,2]. To apply artificial intelligence (AI) technology to different scenarios, distributed machine learning (ML) is often preferred so that devices can process tasks in cooperation with each other[3]. Practical as it is, distributed learning suffers many reliability and security risks in a real distributed environment[4,5]. When models are trained cooperatively, the privacy of clients' sensitive datasets becomes a great concern. Hence, federated learning, in which datasets are processed only on the client side, has been proposed[6,7]. FL aims to fit a common model through an empirical risk minimization (ERM) objective. Nevertheless, FL still raises many concerns about information leakage, communication efficiency and device diversity[8,9,10,11,12].
As a commonly used architecture, client-server FL also suffers from stability problems, especially under poor network conditions [13]. To address these problems, the more efficient and stable client-edge-cloud hierarchical FL architecture has been proposed[14]. HFL allows multiple edge servers to perform partial model aggregation. In this way, the model can be trained faster, and better communication-computation trade-offs can be achieved. Edge servers are designed to aggregate the clients' model parameters; hence, they reduce the computing burden, message delay and communication burden of the cloud server[15].
FL usually adopts stochastic gradient descent (SGD) in the training process. Based on distributed SGD with a one-step local update, [16,17] have analyzed the convergence performance of federated learning. Federated proximal (FedProx) [18], taking advantage of regularization on a local loss function, improves the convergence performance and system stability.
With FL becoming more and more attractive, data security and confidentiality also raise great concerns. FL seems to protect clients' individual data by exchanging parameters instead of sensitive information. However, there exist many malicious adversaries that are capable of stealing some secret information[19]. The most useful way to prevent information exposure is to apply differential privacy by adding noise to the trained parameters [20]. Regarding research on DP in FL, there are mainly three directions: client-level, sample-level and LDP-FL [21,22,23,24,25,26].
Though there are various designs as above, global differential privacy for three-layer hierarchical federated learning is still unexplored. For two-layer HFL-DP[27], the authors proposed a privacy-preserving scheme based on the theory of local differential privacy, adding noise to the shared model parameters before uploading them to edge and cloud servers. However, they did not control the total amount of Gaussian noise in the architecture. Hence, if we expand the layered architecture, we may add so much noise that the utility is damaged, even making convergence difficult. Our design solves this problem.
To design a practical and efficient algorithm that balances the privacy of sensitive data and the quality of the trained model, in this paper we propose a novel framework based on a DP mechanism. The key value of our design is to realize a global privacy budget over the whole architecture. At each step, we control the overall amount of noise we add instead of letting it grow blindly. Consequently, the model accuracy can be adjusted through the trade-off between model accuracy and privacy cost. To the best of our knowledge, there are few papers studying global privacy in hierarchical federated learning. We conduct extensive tests on real-world datasets to validate the performance of our algorithm, and the evaluations show that theoretical results and test results are consistent. To summarize, we propose a novel scheme that satisfies the requirement of DP on global data under a given noise perturbation level by adding proper Gaussian noise, and we explore the trade-off between privacy parameters and model utility to understand the relation between privacy level and model quality. By doing so, sensitive user data can be protected even when there exist many adversaries from inner and outer cyberspace.
At the client level, in order to protect the clients' information, we treat each client's model as a piece of data. In [21], DP-FedAvg and DP-FedSGD were proposed: clients are sampled, and the noises are added at the center. Sensitivity is calculated based on the sampling rate, with average weights for each client. McMahan et al. suggest that with a huge number of participating users, realizing differential privacy costs more computation rather than decreasing utility. In their experiments, differentially private LSTM language models can be similar to un-noised models, both quantitatively and qualitatively, when using large datasets. They consider the effect of large step updates on the model. In [22], the model uploaded by each client is clipped at the center so that an adaptive clipping operation can be performed (such as taking the median of the norms of the client models as the clipping norm). The key point is concealing the contribution of individual clients in the training process. It was shown that when the clients' participation is hidden, the model obtained by federated learning still performs well; we can thus conclude that when enough clients participate in the learning, the model can achieve high accuracy. In [23], uplink privacy protection is carried out first, then downlink noises are added on top of the uplink noise addition, and a convergence analysis is given. A theoretical convergence bound is specified for the loss function of the trained FL model. From this bound, we can see that the better the convergence performance, the lower the protection level, and vice versa; given a fixed privacy protection level, increasing the overall number N of clients participating in FL can improve the convergence performance of the final model; and under a given privacy protection level, there is an optimal maximum number of aggregations T in terms of convergence performance.
At the sample level, we protect the sensitive information of the samples held by the client. Each sample on the client is regarded as a piece of data, and the federation center is regarded as an adversary. The work in [24] integrates the DPAGD-CNN algorithm into federated machine learning. DPAGD-CNN is an adaptive allocation algorithm for the differential privacy budget under centralized machine learning. Considering the differences in user data, the model becomes more accurate when the differential privacy budget is adjusted adaptively. The work in [25] samples clients instead of the data on the client side, applies the privacy measurement method of the moments accountant (MA), and then adds a method based on the privacy budget to adapt the optimal number of iterations, which is effectively a novel form of self-adaptive privacy budget allocation. Wei et al. derived an exact relation between the communication round T and the privacy level.
In LDP-FL, a client has only one data sample; if there are multiple samples, multiple gradients will be trained and uploaded to the federation center. In [26], the CLDP-SGD algorithm was proposed, which first samples the clients, then samples data within the clients and finally shuffles the gradients of different clients. When there are multiple sample gradients, the clients pass multiple gradients to the center. Noises are added locally based on coded perturbation.
In federated learning, each participant can build the model without disclosing the underlying data and only shares the model's weight updates and gradient information with the server. To ensure the security of federated learning, noise is added to the model update to obscure the contribution of the client. In [28], the authors suggest a novel DP aggregation scheme to improve the update strategy. Meanwhile, they adopt f-differential privacy to analyze the privacy loss, so that the loss is calculated more accurately at every round. In [27], the authors suggest a design based on local differential privacy. They analyze the privacy loss with the moments accountant, so their design gets a tight privacy guarantee. By adding noise to the shared model parameters before uploading them to edge and cloud servers, they realize a strict differential privacy guarantee for the layers of clients and edge servers.
In this section, we present the HFL framework, the threat model on which our paper is based and the related knowledge about DP.
Let us consider a general HFL system consisting of three different parts in three logical layers, as shown in Figure 1. The clients connected to edge servers are various computers and mobile devices. They train a local model from their individual data and then send local updates to the edge server. Edge servers and the cloud server are responsible for aggregating the updates and transmitting the aggregation back to the corresponding clients.
To further improve the security of our system, we consider clients, edge servers and the cloud server to be all semi-honest, which means they strictly follow the HFL framework but may try to infer private information. Also, there may be external attackers trying to steal clients' privacy. They are capable of inverting the broadcast models from the servers in order to seize private individual data, as explained in [29].
In fact, download channels suffer more security risks than upload channels. That is because clients upload their model updates through direct and ephemeral upload channels, while download channels are broadcast and long-lasting. We consider at most t1 exposures of uploaded updates from each client to its edge server. Due to the broadcast effect, we assume that aggregated parameters will be exposed in the downlink channel at every communication. Therefore, we assume t2 exposures from each edge server to clients; these occur at iterations t satisfying t mod k1 = 0 and t mod (k1k2) ≠ 0. Similarly, we assume there are t4 exposures from each edge server to the cloud server and t5 exposures from the cloud server to edge servers; the latter occur at iterations with t mod (k1k2) = 0. To be more specific, we assume that t3 of the t1 exposures are counted for cloud aggregation, and the remaining t1−t3 exposures are counted only for edge aggregation.
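To make the divisibility conditions concrete, here is a minimal Python sketch (not the authors' code) that counts the downlink broadcast rounds corresponding to t2 and t5, assuming edge aggregation happens when t mod k1 = 0 and cloud aggregation when t mod (k1k2) = 0; the function name and example values are ours:

```python
# A minimal sketch that counts the downlink broadcast rounds corresponding to
# t2 and t5, assuming edge aggregation happens when t mod k1 == 0 and cloud
# aggregation when t mod (k1*k2) == 0.

def count_downlink_exposures(T, k1, k2):
    # edge -> client broadcasts without a cloud round in the same iteration
    t2 = sum(1 for t in range(1, T + 1) if t % k1 == 0 and t % (k1 * k2) != 0)
    # cloud -> edge broadcasts
    t5 = sum(1 for t in range(1, T + 1) if t % (k1 * k2) == 0)
    return t2, t5

print(count_downlink_exposures(T=50, k1=2, k2=2))  # (13, 12) for the settings used later
```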
In our hierarchical architecture, we apply a DP mechanism with parameters ϵ and δ to provide strong privacy preservation. ϵ > 0 represents the distinguishable bound of all outputs on neighboring datasets in a database, and δ is the probability of the event that the ratio of the probabilities for two adjacent datasets cannot be bounded by e^ϵ after the mechanism. Therefore, a larger ϵ entails a higher risk of privacy leakage for a clearer distinguishability of two neighboring datasets[30]. Now, we will formally define DP as follows.
Definition 1. ((ϵ, δ)-Differential privacy): A randomized mechanism M: X → R with domain X and range R satisfies (ϵ, δ)-DP if, for every result set S ⊆ R and any neighboring datasets D_i, D'_i ∈ X which differ by only one sample, we have

$\Pr[M(D_i)\in S] \le e^{\epsilon}\,\Pr[M(D'_i)\in S] + \delta.$ (3.1)
In this paper, we use the Gaussian mechanism, which adopts the L2-norm sensitivity. To protect the function output s(x), zero-mean Gaussian noise with covariance σ²I is added to each coordinate:

$M(x) = s(x) + \mathcal{N}(0, \sigma^2 I),$ (3.2)

where I is the identity matrix and s(·) is a real-valued function.

To achieve (ϵ, δ)-DP, the noise scale must satisfy σ ≥ cΔs/ϵ, with the constant c ≥ √(2 ln(1.25/δ)) for ϵ ∈ (0, 1). Here, Δs is the sensitivity of the function s(x), which can be computed as $\Delta s = \max_{D_i, D'_i}\lVert s(D_i) - s(D'_i)\rVert$. Given these definitions, we must choose a proper value of σ to satisfy (ϵ, δ)-DP.
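As an illustration of the mechanism above, the following Python sketch computes the noise scale σ = cΔs/ϵ with c = √(2 ln(1.25/δ)) and perturbs a function output; the function names and example values are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch of the Gaussian mechanism described above: sigma >= c * delta_s / eps
# with c >= sqrt(2 * ln(1.25 / delta)) for eps in (0, 1).

def gaussian_noise_scale(sensitivity, eps, delta):
    c = np.sqrt(2.0 * np.log(1.25 / delta))
    return c * sensitivity / eps

def gaussian_mechanism(s_x, sensitivity, eps, delta, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    sigma = gaussian_noise_scale(sensitivity, eps, delta)
    return s_x + rng.normal(0.0, sigma, size=np.shape(s_x))

# Example: perturb a 10-dimensional output with sensitivity 0.1 under (0.5, 1e-5)-DP.
noisy = gaussian_mechanism(np.zeros(10), sensitivity=0.1, eps=0.5, delta=1e-5)
```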
To protect the parameters in the upload and download channels, we design a global (ϵ, δ)-DP mechanism. First, we elaborate our design for the parameters that clients upload to the edge server. In every iteration, we take the batch size in the local training process to be equal to the number of training data, and then the local training output of the i-th client can be defined as
$s_{D_i}^{U} = \arg\min_{w} F_i(w, D_i) = \frac{1}{|D_i|}\sum_{j=1}^{|D_i|} \arg\min_{w} F_i(w, D_{i,j}),$ (4.1)
where D_i is the dataset of the i-th client, and D_{i,j} is the j-th sample in D_i. Hence, we can define the sensitivity of s_{D_i}^{U} as
$\Delta s_{D_i}^{U} = \max_{D_i, D'_i}\left\lVert s_{D_i}^{U} - s_{D'_i}^{U}\right\rVert = \max_{D_i, D'_i}\left\lVert \frac{1}{|D_i|}\sum_{j=1}^{|D_i|}\arg\min_{w} F_i(w, D_{i,j}) - \frac{1}{|D'_i|}\sum_{j=1}^{|D'_i|}\arg\min_{w} F_i(w, D'_{i,j})\right\rVert = \frac{2C}{|D_i|},$ (4.2)
where C is a clipping bound for wi, and Di and D′i are neighboring datasets. Generally, we define the global sensitivity in the upload channel as
$\Delta s_U = \max_{i}\left\{\Delta s_{D_i}^{U}\right\} = \max_{i}\left\{\frac{2C}{|D_i|}\right\}.$ (4.3)
When clients use all of their training data, the global sensitivity becomes small. Hence, we lower-bound the size of the local dataset D_i by m, so that Δs_U = 2C/m. Due to the exposures in the upload channel, we define the Gaussian noise scale σ_U = cΔs_U/ϵ. After t_1 and t_3 exposures, the required noise scales increase to σ_{U1} = c t_1 Δs_U/ϵ and σ_{U2} = c t_3 Δs_U/ϵ, due to the linear relation between ϵ and σ_U. Similarly, σ_E = c t_4 Δs_E/ϵ in the process from edge server to cloud server.
Then, we consider the edge servers, which upload parameters to the cloud server. The aggregated model at an edge server can be expressed as

$w = p_1 w_1 + p_2 w_2 + \dots + p_n w_n,$ (4.4)

where n is the number of clients connected to the edge server, and w represents the result of the aggregation. Thus, the sensitivity of the edge server's aggregation, Δs_{D_i}^{E}, can be expressed as
$\Delta s_{D_i}^{E} = \max_{D_i, D'_i}\left\lVert s_{D_i}^{E} - s_{D'_i}^{E}\right\rVert.$ (4.5)
We have
$s_{D_i}^{E} = p_1 w_1(D_1) + \dots + p_i w_i(D_i) + \dots + p_n w_n(D_n)$ (4.6)
and
$s_{D'_i}^{E} = p_1 w_1(D_1) + \dots + p_i w_i(D'_i) + \dots + p_n w_n(D_n).$ (4.7)
Therefore, the sensitivity can be expressed as
$\Delta s_{D_i}^{E} = \max_{D_i, D'_i}\left\lVert p_i w_i(D_i) - p_i w_i(D'_i)\right\rVert = p_i \max_{D_i, D'_i}\left\lVert w_i(D_i) - w_i(D'_i)\right\rVert = p_i\,\Delta s_{D_i}^{U} \le \frac{2Cp_i}{m}.$ (4.8)
To meet the demand of small global sensitivity in the channel, the sensitivity ΔsE can be given as
$\Delta s_E = \max_{i}\left\{\Delta s_{D_i}^{E}\right\} = \max_{i}\left\{\frac{2Cp_i}{m}\right\},$ (4.9)
where m is the minimum size of the clients' datasets. To reach the proper sensitivity Δs_E, we set all p_i = 1/n, so that Δs_E = 2C/(mn). Similarly, we can define the sensitivity at the cloud server as Δs_C = max{2Cp_iq_i/m} = 2C/(mnN) with q_i = 1/N, where p_i is the weight of the selected client and q_i is the weight of the selected edge server. We need to add enough Gaussian noise in the upload channel and then consider whether or not to add noise in the download channel to satisfy the global (ϵ, δ)-DP.
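The three channel sensitivities then reduce to simple closed forms. The short Python sketch below evaluates them under the uniform-weight assumption p_i = 1/n and q_i = 1/N; the function name and example values are ours, not from the paper:

```python
# Sketch of the channel sensitivities derived above, assuming uniform weights
# p_i = 1/n and q_i = 1/N. C is the clipping bound, m the minimum local dataset
# size, n the number of clients per edge server, N the number of edge servers.

def channel_sensitivities(C, m, n, N):
    ds_U = 2 * C / m            # Eq. (4.3): client -> edge upload
    ds_E = 2 * C / (m * n)      # Eq. (4.9) with p_i = 1/n: edge -> cloud upload
    ds_C = 2 * C / (m * n * N)  # cloud aggregation with q_i = 1/N
    return ds_U, ds_E, ds_C

print(channel_sensitivities(C=15, m=1200, n=50, N=5))
```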
In our hierarchical system, we realize a secure and efficient model under global (ϵ, δ)-DP in Algorithm 1. We define the initial global parameter w(0) and introduce the proximal term μ to improve the stability of the overall framework. The essence of this correction term is to penalize the difference between the parameters of the local model and the parameters of the global model, which keeps the local updates from drifting too far.
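For clarity, here is a minimal PyTorch sketch of one local step with the proximal correction term; the model, data batch, loss function and the value of μ are placeholders rather than the authors' implementation:

```python
import torch

# Minimal sketch of one local step with the proximal term (mu/2)*||w_i - w^{t-1}||^2.
# The model, data batch, loss function and the value of mu are placeholders.

def local_step(model, global_params, batch, loss_fn, optimizer, mu=0.01):
    x, y = batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # The proximal term penalizes deviation from the last global parameters.
    prox = 0.0
    for w, w_glob in zip(model.parameters(), global_params):
        prox = prox + torch.sum((w - w_glob.detach()) ** 2)
    (loss + 0.5 * mu * prox).backward()
    optimizer.step()
```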
Theorem 1. (DP guarantee for downlink channels): To ensure (ϵ, δ)-DP in the downlink channels with aggregations, the standard deviations of the Gaussian noises n_E and n_C can be given as
$n_E = \begin{cases} \dfrac{2cC}{\epsilon_1 mn}\sqrt{t_2^2 - n t_1^2}, & t_2 \ge \sqrt{n}\,t_1,\\[1mm] 0, & t_2 < \sqrt{n}\,t_1. \end{cases}$ (4.10)

$n_C = \begin{cases} \dfrac{2cC}{\epsilon_2 mnN}\sqrt{t_5^2 - N t_4^2 - Nn\,t_3^2}, & t_5 \ge \sqrt{N t_4^2 + Nn\,t_3^2},\\[1mm] 0, & t_5 < \sqrt{N t_4^2 + Nn\,t_3^2}. \end{cases}$ (4.11)
Proof. First, we prove the value of n_E. According to the definition of global (ϵ, δ)-DP in the uplink channels, the additive noise scale of the clients can be given as σ_U = c t_1 Δs_U/ϵ_1, owing to the linear relation between ϵ_1 and σ_U under the Gaussian mechanism. Here, we assume the noises added by the clients obey the same distribution n ∼ ϕ(n), due to the property of global (ϵ, δ)-DP. Then, the aggregation at an edge server with additive noises can be given as
$\tilde{w} = \sum_{i=1}^{n} p_i (w_i + n_i) = \sum_{i=1}^{n} p_i w_i + \sum_{i=1}^{n} p_i n_i.$ (4.12)
Hence, the distribution of $\sum_{i=1}^{n} p_i n_i$ can be given as

$\Phi(n) = \bigotimes_{i=1}^{n} \phi_i(n),$ (4.13)
where ⨂ denotes convolution and p_i n_i ∼ ϕ_i(n). If we set n_i with the clients' noise scale σ_U by the Gaussian mechanism, then $\sum_{i=1}^{n} p_i n_i$ follows a Gaussian distribution. According to the definition of Δs_E, p_i = 1/n leads to a small sensitivity. To obtain global (ϵ, δ)-DP in the download channels, we set the required noise scale to σ_A = c t_2 Δs_E/ϵ_1. Therefore, the additive noise at the edge server can be expressed as
$n_E = \sqrt{\sigma_A^2 - \frac{\sigma_U^2}{n}} = \begin{cases} \dfrac{2cC}{\epsilon_1 mn}\sqrt{t_2^2 - n t_1^2}, & t_2 \ge \sqrt{n}\,t_1,\\[1mm] 0, & t_2 < \sqrt{n}\,t_1. \end{cases}$ (4.14)
Hence, the value of nE has been proved.
Then, we consider the value of n_C. In the uplink channels, we set the standard deviations of the additive noises in clients and edge servers to σ_U = c t_3 Δs_U/ϵ_2 and σ_E = c t_4 Δs_E/ϵ_2. To ensure the global (ϵ, δ)-DP, the noise vectors of each client obey the same distribution n ∼ ϕ(n), and the noises of each edge server obey the same distribution N ∼ ϕ(N). We can then write the aggregation at the cloud server with artificial noises from the clients and edge servers as
$\tilde{w} = \sum_{j=1}^{N}\sum_{i=1}^{n} q_j p_i (w_i + n_i) + \sum_{j=1}^{N} q_j N_j = \sum_{j=1}^{N}\sum_{i=1}^{n} q_j p_i w_i + \sum_{j=1}^{N}\sum_{i=1}^{n} q_j p_i n_i + \sum_{j=1}^{N} q_j N_j.$ (4.15)
We know that the distributions of q_j p_i n_i and q_j N_j are Gaussian. Moreover, we set the standard deviation of the required noise to σ_A = c t_5 Δs_C/ϵ_2. Therefore, the additive noise at the cloud server can be expressed as
$n_C = \sqrt{\sigma_A^2 - \frac{\sigma_U^2}{nN} - \frac{\sigma_E^2}{N}} = \begin{cases} \dfrac{2cC}{\epsilon_2 mnN}\sqrt{t_5^2 - N t_4^2 - Nn\,t_3^2}, & t_5 \ge \sqrt{N t_4^2 + Nn\,t_3^2},\\[1mm] 0, & t_5 < \sqrt{N t_4^2 + Nn\,t_3^2}. \end{cases}$ (4.16)
Hence, the value of nC has been proved.
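The closed forms in Theorem 1 can be evaluated directly. The following Python sketch computes n_E and n_C from Eqs (4.10) and (4.11), taking c = √(2 ln(1.25/δ)) as in the Gaussian mechanism; the function and argument names are ours:

```python
import math

# Sketch evaluating the downlink noise scales of Theorem 1, Eqs (4.10) and (4.11),
# with c = sqrt(2 * ln(1.25 / delta)).

def n_edge(C, m, n, t1, t2, eps1, delta):
    c = math.sqrt(2 * math.log(1.25 / delta))
    if t2 < math.sqrt(n) * t1:
        return 0.0
    return 2 * c * C / (eps1 * m * n) * math.sqrt(t2 ** 2 - n * t1 ** 2)

def n_cloud(C, m, n, N, t3, t4, t5, eps2, delta):
    c = math.sqrt(2 * math.log(1.25 / delta))
    if t5 < math.sqrt(N * t4 ** 2 + N * n * t3 ** 2):
        return 0.0
    return 2 * c * C / (eps2 * m * n * N) * math.sqrt(t5 ** 2 - N * t4 ** 2 - N * n * t3 ** 2)
```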
At the beginning of the algorithm, the edge servers and the cloud server broadcast their own privacy guarantees, and the cloud server sends the initial global parameter to all clients. In every local update, each client trains the model parameters using its own local dataset. After the local training, the local parameters are clipped by the threshold C, and then Gaussian noise is added to the parameters wi of each client i. The amount of noise depends on the parameter σU, which is calculated above. Upon receiving all of its clients' local parameters, the edge server calculates the aggregated model parameters by a weighted average.
After aggregating the model parameters, the edge server either broadcasts the aggregation to the selected clients or uploads it to the cloud server. In the first case, the aggregated parameters need to be broadcast to the clients, and the noise nE should be added to the aggregation according to Theorem 1. In the other case, the aggregated parameters to be sent to the cloud server also need protection at the edge, with noise added according to the value of σE.
The cloud server aggregates the models received from all edge servers if the iteration t meets the preset condition. After the aggregation at the cloud server, noise is added to the model parameters according to the value of nC. Here, we need to pay attention to the privacy protection performance of our algorithm. First, no malicious adversary can gain access to individual training datasets. Due to the local perturbations, it is very tough to infer any valid information from the uploaded parameters. Similarly, an adversary may try to reveal sensitive information from parameters in a download channel. Hence, when broadcasting the aggregated model parameters to clients, additive noises are added to the parameters based on the theorem above.
Algorithm 1 HFL-DP
1:  for t = 1, 2, 3, ..., T do
2:      // Client:
3:      for Client i, i = 1, 2, ..., n do
4:          if t mod k1 == 0 then
5:              Update the local parameters wti as
6:                  wti = argmin_{wi} (Fi(wi) + (μ/2)·||wi − w^{t−1}||²)
7:              Clip the local parameters
8:                  wti = wti / max(1, ||wti|| / C)
9:              Add noises and upload parameters
10:                 w̃ti = wti + σU
11:         else
12:             Update the local parameters wti as
13:                 wti = argmin_{wi} (Fi(wi) + (μ/2)·||wi − w^{t−1}||²)
14:         end if
15:     end for
16:     // Edge Server:
17:     if t mod k1 == 0 then
18:         for Edge server l, l = 1, 2, ..., N do
19:             Conduct edge aggregation
20:                 wl ← Σ_{i∈n} |Di|·w̃ti / |Dl|
21:             if t mod (k1·k2) == 0 then
22:                 Add noises and upload parameters
23:                     w̃l = wl + σE
24:             else
25:                 for Client i, i ∈ n do
26:                     wti ← wl + nE
27:                 end for
28:             end if
29:         end for
30:     end if
31:     // Cloud Server:
32:     if t mod (k1·k2) == 0 then
33:         Conduct cloud aggregation
34:             w ← Σ_{l∈N} |Dl|·w̃l / |D|
35:         for Client i, i ∈ n do
36:             wti ← w + nC
37:         end for
38:     end if
39: end for
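To illustrate how the pieces of Algorithm 1 fit together, the following schematic NumPy simulation runs the loop on flat parameter vectors; the local solver, dataset sizes and noise scales below are placeholders, not the authors' code or tuned values:

```python
import numpy as np

# Schematic walk-through of the HFL-DP loop on flat parameter vectors.
rng = np.random.default_rng(0)
d, n, N, T, k1, k2, C = 20, 3, 2, 8, 2, 2, 15.0
sigma_U, sigma_E, n_E, n_C = 0.1, 0.05, 0.05, 0.02   # illustrative noise scales

def local_update(w_start):
    # stand-in for argmin_w F_i(w) + (mu/2)||w - w_start||^2
    return w_start + 0.1 * rng.standard_normal(d)

def clip(w):
    return w / max(1.0, np.linalg.norm(w) / C)

# w[e][i]: parameters held by client i of edge server e
w = [[np.zeros(d) for _ in range(n)] for _ in range(N)]

for t in range(1, T + 1):
    for e in range(N):
        w[e] = [local_update(wi) for wi in w[e]]
    if t % k1 != 0:
        continue
    # clients clip, perturb with sigma_U and upload; each edge server aggregates
    edge_agg = [np.mean([clip(wi) + rng.normal(0, sigma_U, d) for wi in w[e]], axis=0)
                for e in range(N)]
    if t % (k1 * k2) == 0:
        # edge servers perturb with sigma_E and upload; the cloud aggregates,
        # adds downlink noise n_C and broadcasts to all clients
        w_cloud = np.mean([we + rng.normal(0, sigma_E, d) for we in edge_agg], axis=0)
        broadcast = w_cloud + rng.normal(0, n_C, d)
        w = [[broadcast.copy() for _ in range(n)] for _ in range(N)]
    else:
        # each edge server adds downlink noise n_E and broadcasts to its own clients
        for e in range(N):
            noisy = edge_agg[e] + rng.normal(0, n_E, d)
            w[e] = [noisy.copy() for _ in range(n)]
```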
In this part, we evaluate the proposed HFL-DP in Python 3.7 and PyTorch 1.12.1 on real-world datasets. To analyze the performance of our algorithm, we conduct experiments varying the protection level ϵ, the number of clients n, the number of edge servers N and the clipping bound C.
We train our convolutional neural network (CNN) on the standard MNIST dataset for handwritten digit classification and on the CIFAR-10 dataset of ten-class images. The MNIST dataset is composed of 60,000 training examples and 10,000 testing examples; each example is a 28 × 28 grey-level image of a handwritten digit. The CIFAR-10 dataset is composed of 60,000 32 × 32 color images in ten classes; we use 40,000 examples for training, 10,000 for testing and 10,000 for validation.
In our neural network, we use two convolutional layers with ReLU units and a softmax output, and we use the cross-entropy function as the training criterion. We assume the data are IID. In our experiments, the relevant parameters of the same layer are consistent, and we assume the communication channels among different layers are unimpeded.
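A sketch of such a network in PyTorch is given below; the channel counts and kernel sizes are illustrative assumptions (for CIFAR-10 the input channels and the linear layer size would change accordingly), and the softmax is folded into the cross-entropy criterion:

```python
import torch
import torch.nn as nn

# Sketch of a two-convolutional-layer CNN of the kind described above.
class SimpleCNN(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # sized for 28x28 MNIST input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SimpleCNN()
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
```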
To show the influence of privacy preservation, we compare the proposed algorithm with the non-private HierFAVG[14] on MNIST. We set n=50, N=5, C=15, ϵ1=20 and ϵ2=25. Figures 2 and 3 indicate that privacy preservation costs a bit of accuracy. The non-private model performs better than the privacy-preserving model because of the unfavorable influence of the Gaussian noise, and HFL converges faster than HFL-DP. However, the loss of accuracy decreases as the iteration rounds proceed. Though there may be 5 and 11% accuracy losses in the final models, the cost is modest.
In Figure 4, we set different values of the protection level, ϵ=10, ϵ=20 and ϵ=50, for both ϵ1 and ϵ2. In this experiment, we choose n=50, N=5, T=50, k1=2 and k2=2, and we record the model accuracy over the aggregation rounds. As shown in Figure 4, the accuracy of the model increases if we relax the privacy guarantees. That is because ϵ represents the distinguishable bound of all outputs on neighboring datasets in a database, and a larger ϵ entails a higher risk of privacy leakage through a clearer distinguishability of two neighboring datasets. However, as we continue to increase the value of ϵ, the accuracy does not improve much, so a good choice of ϵ balances safety and utility. For example, once ϵ grows beyond 20, the model accuracy does not get noticeably better, and we can take 20 as a preferable value in this case.
In this experiment, we set the protection parameter ϵ=20, fix T=50, k1=2 and k2=2, and compare four groups with different numbers of clients n and edge servers N (such as n=50 and N=5). Figure 5 shows that when the scale of the communication system becomes larger, more noise is needed to realize security, so a larger communication system may entail more performance loss. In our algorithm, we control the amount of noise by Eqs (4.10) and (4.11) instead of adding Gaussian noise blindly, so the accuracy does not decrease greatly. For example, when n=100 and N=20, the accuracy is 82%; under the same conditions, when n=10 and N=5, the accuracy is 91%. Moreover, a more practical model can be trained by fewer participants while preserving sensitive information. A secure system that contains many users remains to be researched.
We choose various clipping bounds C=5, C=10 and C=15. In this experiment, we set ϵ=20, n=50 and N=5. Figure 6 indicates that once the clipping bound is large enough, the model accuracy no longer increases. First, the gradient is clipped so that the norm of every sample's gradient is no larger than the clipping bound C. Then, sufficiently large Gaussian noise is added to the clipped gradient to achieve differential privacy. The clipping bound C plays the role of the sensitivity mentioned before, i.e., the maximum impact that each sample can cause will not exceed C. When large enough noise, positively correlated with C, is added to the gradient, the individual sample can be hidden to meet the requirements of differential privacy.
To further summarize the existing methodological analyses of HFL-DP, we compare the results of this paper with the existing literature. The results of the analysis of the different HFL-DP schemes are shown in Table 3. In [28], the authors researched differential privacy in the server-client architecture. They used f-DP to track the training in order to obtain a tighter accounting of privacy, which allows more communication rounds than using a moments accountant. Meanwhile, they applied a trick to the clipping method to improve the model's learning curve. In their experiment, the final model accuracy after 25 rounds was 91%. In [27], the authors calculated the noises to realize (ϵli, δli)-DP for the clients and (ϵl, δl)-DP for the edge; the final model accuracy was 90%. This idea effectively prevented privacy leakage in the upload direction.
Main notations | |
k1 | Number of local updates for one edge aggregation |
k2 | Number of edge aggregations for one cloud aggregation |
T | Number of iterations |
w0 | Initial global parameter |
σU | Standard deviation of the Gaussian noise for a client
σE | Standard deviation of the Gaussian noise for an edge server
t | Index of iteration |
wti | Local parameters of client i in iteration t |
C | Clipping bound |
n | Number of clients |
N | Number of edge servers |
μ | Presetting constant of the proximal term |
Acronyms | |
federated learning | FL |
hierarchy federated learning | HFL |
differential privacy | DP |
Local Differential Privacy | LDP |
Internet-of-Things | IoT |
artificial intelligence | AI |
machine learning | ML |
empirical risk minimization | ERM |
Federated Proximal | FedProx |
stochastic gradient descent | SGD |
convolutional neural network | CNN |
Linear rectification function | ReLu |
Independent Identically Distribution | IID |
In [28], the design targets the server-client architecture at the client level; the authors do not consider the three-layer HFL. In [27], though the result seems better, the scheme ignores the privacy leakage in the download channel. If we directly add adequate noises in the download channel as the authors mention in [27], the noise accumulation in the hierarchical system will have a negative effect on the model utility. Because of the hierarchical structure, we may add noises on the edge side and the cloud side in the download channel; although the amount of noise on each side is different, it still harms the model after several transmissions. Therefore, our design aims to control the amount of noise at the level of the whole system, as articulated in Algorithm 1. The additive noises in every download channel can be calculated by Eqs (4.10) and (4.11).
In summary, a better model utility can be obtained in a three-layer HFL-DP by studying the overall privacy loss and then controlling the amount of noise added each time.
In this paper, we proposed a novel privacy-preserving method based on the concept of global differential privacy. Our scheme provides confidentiality for clients' sensitive data to avoid security threats not only from participants in the communication system but also from adversaries in outer cyberspace. We realize the communication protection by adding noises to the shared model parameters in each channel, so user data cannot be directly inferred by any adversary while the trained model still keeps high utility. In our experiment on the MNIST dataset, we get 91% model accuracy. Compared to the previous two-layer HFL-DP[27], our design is more secure while being as accurate. We then evaluated the performance on image classification tasks, and the experimental results show that our method is practical and efficient with proper parameters. Therefore, we can balance the trade-off between efficiency and safety by adjusting the relevant parameters. In the future, we will do further research on adaptive adjustment to realize a preferable privacy budget and on the effect of non-IID data on the utility of our algorithm. Meanwhile, we will tune our algorithm to adapt to changes in user data distribution.
This work was supported by the National Key Research and Development Program of China (2020YFA0712300), National Natural Science Foundation of China (62132005), Shanghai Trusted Industry Internet Software Collaborative Innovation Center.
The authors declare there is no conflict of interest.
[1] C. Wang, X. Wu, G. Liu, T. Deng, K. Peng, S. Wan, Safeguarding cross-silo federated learning with local differential privacy, Digital Commun. Networks, 8 (2022), 446–454. https://doi.org/10.1016/j.dcan.2021.11.006
[2] J. Shi, P. Cong, L. Zhao, X. Wang, S. Wan, M. Guizani, A two-stage strategy for UAV-enabled wireless power transfer in unknown environments, IEEE Trans. Mob. Comput., 2023 (2023), 1–15. https://doi.org/10.1109/TMC.2023.3240763
[3] Q. Liu, Z. Zeng, Y. Jin, Distributed machine learning, optimization and applications, Neurocomputing, 489 (2022), 486–487. https://doi.org/10.1016/j.neucom.2021.12.058
[4] M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe, Privacy preserving distributed machine learning with federated learning, Comput. Commun., 171 (2021), 112–125. https://doi.org/10.1016/j.comcom.2021.02.014
[5] M. Sun, R. Yang, L. Hu, A secure distributed machine learning protocol against static semi-honest adversaries, Appl. Soft Comput., 102 (2021), 107095. https://doi.org/10.1016/j.asoc.2021.107095
[6] J. Liu, J. Huang, Y. Zhou, X. Li, S. Ji, H. Xiong, et al., From distributed machine learning to federated learning: a survey, Knowl. Inf. Syst., 64 (2022), 885–917. https://doi.org/10.1007/s10115-022-01664-x
[7] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, Y. Gao, A survey on federated learning, Knowledge-Based Syst., 216 (2021), 106775. https://doi.org/10.1016/j.knosys.2021.106775
[8] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, M. Chen, In-Edge AI: intelligentizing mobile edge computing, caching and communication by federated learning, IEEE Network, 33 (2019), 156–165. https://doi.org/10.1109/MNET.2019.1800286
[9] T. Li, A. K. Sahu, A. Talwalkar, V. Smith, Federated learning: challenges, methods, and future directions, preprint, arXiv:1908.07873.
[10] H. Yang, Z. Liu, T. Q. S. Quek, H. V. Poor, Scheduling policies for federated learning in wireless networks, IEEE Trans. Commun., 68 (2019), 317–333. https://doi.org/10.1109/TCOMM.2019.2944169
[11] M. Hao, H. Li, G. Xu, S. Liu, H. Yang, Towards efficient and privacy-preserving federated deep learning, in ICC 2019 - 2019 IEEE International Conference on Communications (ICC), Paris, France, 2019. https://doi.org/10.1109/ICC.2019.8761267
[12] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, M. Guizani, Reliable federated learning for mobile networks, IEEE Wireless Commun., 27 (2020), 72–80. https://doi.org/10.1109/MWC.001.1900119
[13] S. Liu, J. Yu, X. Deng, S. Wan, FedCPF: an efficient-communication federated learning approach for vehicular edge computing in 6G communication networks, IEEE Trans. Intell. Transp. Syst., 23 (2022), 1616–1629. https://doi.org/10.1109/TITS.2021.3099368
[14] L. Liu, J. Zhang, S. H. Song, K. B. Letaief, Client-Edge-Cloud hierarchical federated learning, in ICC 2020 - 2020 IEEE International Conference on Communications (ICC), (2020), 1–6. https://doi.org/10.1109/ICC40277.2020.9148862
[15] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, et al., Adaptive federated learning in resource constrained edge computing systems, IEEE J. Sel. Areas Commun., 37 (2019), 1205–1221. https://doi.org/10.1109/JSAC.2019.2904348
[16] A. Agarwal, J. C. Duchi, Distributed delayed stochastic optimization, in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012. https://doi.org/10.1109/CDC.2012.6426626
[17] X. Lian, Y. Huang, Y. Li, J. Liu, Asynchronous parallel stochastic gradient for nonconvex optimization, ACM NIPS, (2015), 2737–2745. Available from: https://proceedings.neurips.cc/paper/2015/hash/452bf208bf901322968557227b8f6efe-Abstract.html.
[18] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, V. Smith, Federated optimization in heterogeneous networks, preprint, arXiv:1812.06127.
[19] A. Wainakh, A. S. Guinea, T. Grube, M. Mühlhäuser, Enhancing privacy via hierarchical federated learning, in 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS & PW), (2020), 344–347. https://doi.org/10.1109/EuroSPW51379.2020.00053
[20] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, et al., Deep learning with differential privacy, in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, (2016), 308–318. https://doi.org/10.1145/2976749.2978318
[21] H. B. McMahan, D. Ramage, K. Talwar, L. Zhang, Learning differentially private recurrent language models, in ICLR 2018, (2018), 1–14. Available from: https://openreview.net/forum?id=BJ0hF1Z0b.
[22] R. C. Geyer, T. Klein, M. Nabi, Differentially private federated learning: a client level perspective, preprint, arXiv:1712.07557.
[23] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, et al., Federated learning with differential privacy: algorithms and performance analysis, IEEE Trans. Inf. Forensics Secur., 15 (2020), 3454–3469. https://doi.org/10.1109/TIFS.2020.2988575
[24] X. Huang, Y. Ding, Z. L. Jiang, S. Qi, X. Wang, Q. Liao, DP-FL: a novel differentially private federated learning framework for the unbalanced data, World Wide Web, 23 (2020), 2529–2545. https://doi.org/10.1007/s11280-020-00780-4
[25] K. Wei, J. Li, M. Ding, C. Ma, H. Su, B. Zhang, et al., User-level privacy-preserving federated learning: analysis and performance optimization, IEEE Trans. Mob. Comput., 21 (2022), 3388–3401. https://doi.org/10.1109/TMC.2021.3056991
[26] A. Girgis, D. Data, S. Diggavi, P. Kairouz, A. T. Suresh, Shuffled model of differential privacy in federated learning, in Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 130 (2021), 2521–2529. Available from: http://proceedings.mlr.press/v130/girgis21a.html.
[27] L. Shi, J. Shu, W. Zhang, Y. Liu, HFL-DP: hierarchical federated learning with differential privacy, in 2021 IEEE Global Communications Conference (GLOBECOM), (2021), 7–11. https://doi.org/10.1109/GLOBECOM46510.2021.9685644
[28] T. Zhou, Hierarchical federated learning with gaussian differential privacy, in AISS '22: Proceedings of the 4th International Conference on Advanced Information Science and System, (2022), 61. https://doi.org/10.1145/3573834.3574544
[29] M. Fredrikson, S. Jha, T. Ristenpart, Model inversion attacks that exploit confidence information and basic countermeasures, in CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, (2015), 1322–1333. https://doi.org/10.1145/2810103.2813677
[30] H. Zhu, F. Yin, S. Peng, X. Tang, Differentially private hierarchical tree with high efficiency, Comput. Secur., 118 (2022), 102727. https://doi.org/10.1016/j.cose.2022.102727
Table 3. Comparison of different HFL-DP schemes.
Client | Edge | ϵ | ACC | Reference
100 | - | - | 82% | paper[28] |
100 | 10 | 10 | 90% | paper[27] |
100 | 10 | 10 | 82% | this paper |