Research article

Mechanistic modeling of alarm signaling in seed-harvester ants


  • Ant colonies demonstrate a finely tuned alarm response to potential threats, offering a uniquely manageable empirical setting for exploring adaptive information diffusion within groups. To address potential dangers effectively, a social group must communicate the threat swiftly throughout the collective while conserving energy in case the threat proves unfounded. Through a combination of modeling, simulation, and empirical observations of alarm spread and damping patterns, we identified the behavioral rules governing this adaptive response. Experimental trials in which alarmed ant workers (Pogonomyrmex californicus) were released into a tranquil group of nestmates revealed a consistent pattern of rapid alarm propagation followed by a comparatively extended decay period [1]. The experiments in [1] showed that individual ants exhibiting alarm behavior increased their movement speed, with variation in their response to alarm stimuli, particularly during the peak of the reaction. We used the data in [1] to investigate whether these observed characteristics alone could account for the swift increase in mobility and the gradual decay of alarm excitement. Our self-propelled particle model incorporated a switch-like mechanism for ants' response to alarm signals and individual variation in the magnitude of the speed increase after encountering these signals. This study aligns with the established hypothesis that individual ants possess cognitive abilities to process and disseminate information, contributing to collective cognition within the colony (see [2] and the references therein). The elements examined in this research support this hypothesis by reproducing statistical features of the empirical speed distribution across various parameter values.

    Citation: Michael R. Lin, Xiaohui Guo, Asma Azizi, Jennifer H. Fewell, Fabio Milner. Mechanistic modeling of alarm signaling in seed-harvester ants[J]. Mathematical Biosciences and Engineering, 2024, 21(4): 5536-5555. doi: 10.3934/mbe.2024244




    In recent years, with the popularity of smartphones, wearable devices, intelligent home appliances, and autonomous driving, an abundance of data has been generated on many distributed devices. These data are usually concentrated in a data center for effective use. However, a crucial issue arises: centralized data storage can leak personal privacy [1]. Simultaneously, as the computing power of these mobile devices increases, it becomes attractive to store data locally while completing the related computing tasks there. Federated learning is a distributed machine learning framework that allows multiple parties to collaboratively train a model without sharing raw data [2,3], and it has recently attracted significant attention from industry and academia. [4] summarizes and discusses the applications of federated learning in big data and its future directions. Although federated learning has essential significance and advantages in protecting user privacy, it also faces many challenges.

    First of all, due to the distributed nature of federated learning, it is vulnerable to Byzantine attacks. Notably, it has been shown that, with just one Byzantine client, the whole federated optimization algorithm can be compromised and fail to converge [5]. The difficulty of defending against Byzantine attacks increases further when the training data is not independent and identically distributed (non-iid), in which case it is also difficult to guarantee the convergence of the model [6].

    Methods for defending against Byzantine attacks in federated learning have been extensively studied, including the coordinate-wise trimmed mean [9], the coordinate-wise median [7,8], the geometric median [10,11], and distance-based methods such as Krum [12], BREA [6], and Bulyan [5]. In addition to these methods based on statistical knowledge, [14] proposes a new idea based on anomaly detection to identify Byzantine clients during the learning process. [13] discusses the challenges and future directions of federated learning in real-time scenarios in terms of cybersecurity.

    The above methods can defend against Byzantine attacks to some extent, but they also have limitations. First, the methods based on statistical knowledge have high computational complexity, and their defense abilities are weakened by the non-iid data in federated learning. Second, the anomaly detection algorithm [14] rests on the premise that the detection model is trained on the test data set. This premise cannot be realized in practical applications, because it is difficult to obtain a data set that covers almost all data distributions. Therefore, it is necessary for the anomaly detection model to be pre-trained without relying on the test dataset and to be updated dynamically on non-iid data.

    In this paper, we propose a new method in which each client shares some data with the server, which makes a trade-off between client privacy and model performance. Unlike FedAvg [2], we use the credibility score as the weight for model aggregation rather than the sample size. The credibility score of each client is obtained by integrating a verification score and a detection score; the former is calculated on the shared data.

    The main contributions of this paper are:

    ▪ We propose a new federated learning framework (BRCA) which combines credibility assessment and unified update. BRCA not only effectively defends against Byzantine attacks, but also reduces the impact of non-iid data on the aggregated global model.

    ▪ The credibility assessment, combining anomaly detection and data verification, effectively detects Byzantine attacks on non-iid data.

    ▪ By incorporating an adaptive mechanism and transfer learning into the anomaly detection model, its detection performance improves dynamically. Moreover, its pre-training no longer relies on the test data set.

    ▪ We customize four different data distributions for each data set, and explore the influence of data distribution on defense methods against Byzantine attacks.

    FedAvg was first proposed in [2] as an aggregation algorithm for federated learning. The server updates the global model with a weighted average of the clients' model updates, and each aggregation weight is determined by the client's data sample size. Stich [15] and Woodworth et al. [16] analyze the convergence of FedAvg on strongly convex smooth loss functions. However, they assume that the data is iid, which is not suitable for federated learning [17,18]. Li et al. [19] give the first convergence analysis of FedAvg when the data is non-iid. [20] uses clustering to improve federated learning on non-iid data. Regrettably, naive FedAvg is very weak at resisting Byzantine attacks.
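    As a concrete illustration of the weighted average described above, the following is a minimal NumPy sketch of FedAvg-style aggregation. It assumes each client's update is a flat parameter vector; the function name and data layout are illustrative, not the reference implementation of [2].

```python
import numpy as np

def fedavg_aggregate(client_updates, sample_sizes):
    """Weighted average of client model updates, FedAvg-style:
    each client's weight is proportional to its local sample size."""
    weights = np.asarray(sample_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_updates)            # shape: (n_clients, n_params)
    return np.tensordot(weights, stacked, axes=1)

# Example: three clients with different data sizes.
updates = [np.array([1.0, 2.0]), np.array([2.0, 0.0]), np.array([0.0, 4.0])]
global_update = fedavg_aggregate(updates, sample_sizes=[10, 30, 60])
```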

    In the iterative process of federated aggregation, honest clients send their true model updates to the server, wishing to train a global model by consolidating their private data. However, Byzantine clients attempt to perturb the optimization process [21]. Byzantine attacks may be caused by data corruption events in the computing or communication process, such as software crashes, hardware failures, and transmission errors. They may also be caused by malicious clients that actively transmit erroneous information in order to mislead the learning process [21].

    Byzantine-robust federated learning has received increasing attention in recent years. Krum [12] is designed specifically to defend against Byzantine attacks in federated learning: it generates the global model from the single client model update whose summed distance to its nearest neighbors is smallest. GeoMed [10] uses the geometric median, a generalization of the median from one dimension to multiple dimensions. Unlike Krum, GeoMed uses all client updates to generate the new global model, not just one. Trimmed Mean [9] obtains each dimension of the global model by averaging the clients' parameters in that dimension after deleting the largest and smallest fraction of the parameters in that dimension; the methods of Xie et al. [22] and Mhamdi et al. [5] are variants of it. BREA [6] also considers the security of information transmission, but its defense is still based on distance calculations. Zero [23] uses a watermark-detection approach to detect attacks such as malware, phishing, and cryptojacking. [24] surveys intrusion detection techniques in the mobile cloud computing environment.
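    For concreteness, here is a hedged sketch of two of the defenses mentioned above, the coordinate-wise Trimmed Mean and Krum, written for flat NumPy parameter vectors. It follows the usual formulations of these rules rather than any specific implementation from the cited papers.

```python
import numpy as np

def trimmed_mean(updates, trim_k):
    """Coordinate-wise trimmed mean: drop the trim_k largest and trim_k smallest
    values in every coordinate before averaging (cf. Trimmed Mean [9])."""
    stacked = np.sort(np.stack(updates), axis=0)   # sort each coordinate over clients
    return stacked[trim_k:len(updates) - trim_k].mean(axis=0)

def krum(updates, n_byzantine):
    """Krum [12]: select the single update whose summed squared distance to its
    n - n_byzantine - 2 closest neighbours is smallest."""
    stacked = np.stack(updates)
    n = len(updates)
    sq_dists = np.linalg.norm(stacked[:, None, :] - stacked[None, :, :], axis=-1) ** 2
    scores = []
    for i in range(n):
        neighbours = np.sort(np.delete(sq_dists[i], i))[: n - n_byzantine - 2]
        scores.append(neighbours.sum())
    return stacked[int(np.argmin(scores))]
```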

    None of the above defense methods based on statistical knowledge or distance is effective against Byzantine attacks in non-iid settings. Abnormal [25] uses an anomaly detection model to detect Byzantine attacks.

    The concept of independent and identically distributed (iid) data is clear, but non-iid has many meanings. In this work, we only consider label distribution skew [17]: the categories of samples may vary across clients. For example, in a face recognition task, each user generally has only their own face data; on mobile devices, some users may use emojis that do not show up on other users' devices.

    We summarize the contributions and limitations of the existing works in Table 1.

    Table 1.  The summary of the contributions and limitations of the related papers.

    Reference | Contributions | Limitations
    [12], [10], [9], [5], [22] | Krum, GeoMed and Trimmed Mean complete the Byzantine defense based on statistical knowledge; easy to deploy in applications. | Assume that the clients' data is iid; high computational complexity.
    [25] | The auto-encoder anomaly detection model is first applied to detect Byzantine attacks. | The pre-training of the anomaly detection model is completed on the test dataset; the anomaly detection model is static.
    [6] | Cryptography is used to protect the security of information transmitted between the clients and the server. | Defense against Byzantine attacks is still based on distance to find outliers, and has limited defense capability.


    In this paper, we propose a method that combines credibility assessment and a unified update to make federated learning robust against Byzantine attacks on non-iid data.

    We consider a federated setting in which one server communicates with many clients. For the rest of the paper, we use the following notation: $A$ is the total client set, $|A| = n$; $S$ is the set of clients selected in each iteration, $|S| = k$; among them, $B$ is the Byzantine client set, $|B| = b$, and $H$ is the honest client set, $|H| = h$; $w_i^t$ is the model update sent by client $i$ to the server at round $t$; the Byzantine attack rate is $\xi = b/k$; $w^t$ is the global model at round $t$; $D_P = \{D_P^1, \ldots, D_P^n\}$ is the clients' private data; $D_S = \{D_S^1, \ldots, D_S^n\}$ is the clients' shared data; and the data-sharing rate is $\gamma = \frac{|D_S|}{|D_P| + |D_S|}$ (where $|\cdot|$ denotes the sample size of a data set).

    In order to enhance the robustness of federated learning against Byzantine attacks on non-iid data, BRCA combines credibility assessment and a unified update. Figure 1 depicts the architecture of BRCA.

    Figure 1.  The architecture of BRCA.

    Before training, each client shares some private data with the server. In each iteration, the server randomly selects some clients and sends them the latest global model. These clients train the model locally on their private data and send their model updates to the server. After receiving the model updates, the server conducts a credibility assessment for each model update and calculates its credibility score. Momentum is an effective measure to improve the ability of federated learning to resist Byzantine attacks [26], so our aggregation rule, Eq (1), is as follows:

    $$ w^{t+1} = \alpha w^t + (1-\alpha)\sum_{i \in S} r_i^t\, w_i^t \qquad (1) $$

    where $r_i^t$ is the credibility score of client $i$ at round $t$ and $\alpha$ ($0 < \alpha < 1$) is a decay factor. Last, the unified update uses the shared data to refine the primary global model and obtain the new global model for this round.
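    The aggregation of Eq (1) can be sketched as follows in NumPy, assuming the credibility scores have already been computed and normalized as described later; the function name and array layout are our own.

```python
import numpy as np

def brca_aggregate(global_model, client_updates, credibility, alpha=0.5):
    """Eq (1): momentum term alpha * w^t plus the credibility-weighted
    sum of the selected clients' updates."""
    r = np.asarray(credibility, dtype=float)
    weighted_sum = np.tensordot(r, np.stack(client_updates), axes=1)
    return alpha * global_model + (1.0 - alpha) * weighted_sum
```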

    Algorithm 1 describes BRCA, which contains Credibility Assessment in line 22 and Unified Update in line 28. The crux of BRCA's defense against Byzantine attacks is the credibility assessment. On non-iid data, the differences between clients' data distributions are large, and it is difficult to judge whether a difference is caused by a Byzantine attack or by the non-iid data. However, the model update of an honest client should have a positive effect on its own private data, which is not affected by other clients. Simultaneously, an anomaly detection model can effectively detect Byzantine attacks [25]. Thus, we combine these two ideas to detect Byzantine attacks. To address the shortcomings of existing anomaly detection models, we propose an adaptive anomaly detection model. In this paper, the shared data is randomly selected by each client based on the sample category; of course, other sampling methods could also be used, such as clustering. In addition, it must be pointed out that the shared data is only used on the server, not on the clients, which effectively protects the clients' privacy.

        Algorithm 1: BRCA
      Input: total clients A; total number of iterations T; learning rates ηserver, ηclient, ηdetection; Byzantine attack rate ξ; epochs Eserver, Eclient; initial global model w^0; clients' private data D_P = {D_P^1, ..., D_P^n}; clients' shared data D_S = {D_S^1, ..., D_S^n}; initial anomaly detection model θ^0; β; α; d; k
      Output: global model w^{T+1}, anomaly detection model θ^{T+1}
    1  R = ∅: the credibility score set.
    2  H = ∅: the honest client set.
    3  Function Add Attack(w):


    To summarize, BRCA has five steps. First, the server pre-trains an anomaly detection model on source data and initializes a global model. Second, every client shares a little private data with the server. Third, every client downloads the newest global model from the server, completes its model update on its private data, and sends the model update to the server. Fourth, the server updates the global model and adapts the anomaly detection model using the clients' model updates. Fifth, the server refines the primary global model with the unified update, after which the new global model for the round is complete. Steps three to five are repeated until the global model converges.

    Our work differs from the recent state of the art. First, Krum, GeoMed and Trimmed Mean are representative methods based on geometric knowledge, but their premise is that the clients' data is independent and identically distributed (iid). The hypothesis of our method is based on the actual application background of FL and targets non-iid data. Second, Abnormal is the first method to detect Byzantine attacks with an auto-encoder anomaly detection model. However, the anomaly detection model in that method is trained on the test dataset, and the detection model is static. We improve on both of these problems: 1) we pre-train the anomaly detection model on related but different source data, without relying on the test dataset; 2) we introduce an adaptive mechanism to the anomaly detection model, which helps the detection model update dynamically during the federated iterations.

    Algorithm 2 (Credibility Assessment) is the key part of BRCA; it assigns a credibility score to each client model update. A Byzantine client is given a much lower credibility score than an honest client. To guarantee the accuracy of the credibility score, Credibility Assessment integrates an adaptive anomaly detection model and data verification.

        Algorithm 2: Credibility Assessment
      Input: local model updates Q; clients' shared data D_S = {D_S^1, ..., D_S^n}; anomaly detection model θ^t; β; selected clients S; ηdetection; d; k
      Output: credibility scores of clients R; honest client set H; anomaly detection model θ^{t+1}
    1  R = ∅: credibility score set; H = ∅: the honest client set; sum = 0; sume = 0; sumf = 0
    2  C = {C_1^t, ..., C_i^t, ..., C_k^t}, client i ∈ S, where C_i^t is the weight of the last convolutional layer of w_i^t
    3  for each client i ∈ S do


    In Algorithm 2, line 4 is the data verification, which calculates the verification score $f_i$ for the model update of client $i$, and line 5 is the get-anomaly-score() routine of the adaptive anomaly detection model, which calculates the detection score $e_i$. Subsequently, the credibility of the model update is $r_i = \beta e_i + (1-\beta) f_i$, with $R = \{r_1, \ldots, r_i, \ldots, r_k\}$, client $i \in S$. The make-adaption() call in line 24 implements the adaptation of the anomaly detection model.

    In this paper, we judge a model update with a credibility score lower than the mean of $R$ to be a Byzantine attack and set its credibility score to zero. Finally, the scores are normalized to obtain the final credibility scores.
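    A minimal sketch of this combination, thresholding, and normalization step is given below, assuming the detection and verification scores have already been computed; beta and the function name are illustrative.

```python
import numpy as np

def credibility_scores(detection_scores, verification_scores, beta=0.5):
    """r_i = beta * e_i + (1 - beta) * f_i; scores below the mean of R are
    treated as Byzantine and zeroed, then the rest are normalized to sum to 1."""
    e = np.asarray(detection_scores, dtype=float)
    f = np.asarray(verification_scores, dtype=float)
    r = beta * e + (1.0 - beta) * f
    r[r < r.mean()] = 0.0              # suspected Byzantine clients get zero weight
    total = r.sum()
    return r / total if total > 0 else r
```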

    In the training process, we cannot predict the type of attack, but we can estimate the model updates of honest clients. Therefore, we can adopt a one-class classification algorithm to build the anomaly detection model from normal model updates. Such a technique learns the distribution boundary of the model updates to determine whether a new sample is abnormal. The auto-encoder is an effective one-class learning model for detecting anomalies, especially for high-dimensional data [27].

    In practical applications, we cannot get the target data to complete the pre-training of our anomaly detection model. Therefore, the initialized anomaly detection model will be pre-trained on the source data with the idea of transfer learning.

    At round $t$, the detection score $e_i^t$ of client $i$ is:

    $$ e_i^t = \exp\!\left(-\frac{\mathrm{Mse}\!\left(C_i^t,\, \theta^t(C_i^t)\right) - \mu(E)}{\sigma(E)}\right) \qquad (2) $$
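    A hedged sketch of how the detection score of Eq (2) could be computed: it assumes theta is a callable auto-encoder that reconstructs the flattened last-convolutional-layer weights C_i^t, and that E denotes the reconstruction errors of the currently selected clients; all names are illustrative.

```python
import numpy as np

def detection_scores(last_layer_weights, autoencoder):
    """Eq (2): standardize each client's reconstruction error (MSE) across the
    selected clients, then map it through exp(-z) so that poorly reconstructed
    (suspicious) updates receive low scores."""
    mse = np.array([np.mean((c - autoencoder(c)) ** 2) for c in last_layer_weights])
    z = (mse - mse.mean()) / (mse.std() + 1e-12)
    return np.exp(-z)
```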

    Our anomaly detection model differs from the one in Abnormal in two ways: 1) Abnormal uses the test set of the data set to train the anomaly detection model. Although the resulting detection model can complete the detection task very well, in most cases the test data set is not available. Therefore, following the idea of transfer learning, we complete the pre-training of the anomaly detection model in the source domain. 2) Abnormal's anomaly detection model is not updated after being trained on the test set. We think this is unreasonable, because the test set is only a tiny part of the overall data, and using a small part of the data to judge most of the remaining data may not be accurate enough. Therefore, after pre-training the anomaly detection model in the source domain, we fine-tune it with data from the target domain during the iterative process, so that the detection model is updated dynamically, as in the make-adaption routine of Algorithm 3.

        Algorithm 3: AADM (adaptive anomaly detection model)
      Input: anomaly detection model θ^t; weights of the last convolutional layer of the local models C; ηdetection; credibility scores R; honest client set H; d; k
      Output: updated anomaly detection model θ^{t+1}
    1  Function get-anomaly-score(θ^t, C_i^t):


    The non-iid nature of client data increases the difficulty of Byzantine defense. However, the performance of each client's updated model on its own shared data is not affected by other clients, which helps solve this problem. Therefore, we use the clients' shared data $D_S = \{D_S^1, \ldots, D_S^i, \ldots, D_S^k\}$, client $i \in S$, to calculate the verification score of each updated model:

    $$ f_i^t = \left(\exp\!\left(-\frac{l_i^t - \mu(L)}{\sigma(L)}\right)\right)^2 \qquad (3) $$

    where $l_i^t$ is the loss of client $i$ computed with model $w_i^t$ on the shared data $D_S^i$ at round $t$:

    $$ l_i^t = \frac{1}{|D_S^i|}\sum_{j=0}^{|D_S^i|} l\!\left(D_S^{i(j)},\, w_i^t\right) \qquad (4) $$

    where $D_S^{i(j)}$ is the $j$th sample of $D_S^i$, and $\mu(L)$, $\sigma(L)$ are the mean and variance of the set $L = \{l_1, \ldots, l_k\}$, respectively.
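    The verification score of Eqs (3) and (4) can be sketched as follows, assuming the per-client losses l_i^t on the shared data have already been computed; the function name is illustrative.

```python
import numpy as np

def verification_scores(per_client_losses):
    """Eqs (3)-(4): standardize each client's average loss on its own shared
    data and map it through exp(-z)**2, so high-loss (suspicious) updates
    receive low scores."""
    l = np.asarray(per_client_losses, dtype=float)
    z = (l - l.mean()) / (l.std() + 1e-12)
    return np.exp(-z) ** 2
```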

    After obtaining the credibility score $r_i^t$ in Algorithm 2 from the detection score $e_i^t$ and the verification score $f_i^t$, we can aggregate the clients' local model updates with Eq (1) and obtain a preliminary updated global model. However, because the client data is non-iid, the knowledge learned by each client's local model is limited, and the differences between two clients' models are significant. Therefore, to address the problem that the preliminary aggregated model lacks a clear and consistent goal, we introduce an additional unified update on the shared data at the server; details are given in Algorithm 4.

      Algorithm 4: Unified update
      Input: global model w^{t+1}; clients' shared data D_S = {D_S^1, ..., D_S^n}; Eserver; ηserver; honest client set H
      Output: global model w^{t+1}.
    1 for each epoch e = 0 to Eserver do


    Because the data used for the unified update comes from every client, it covers the distribution of the overall data more comprehensively. The goal and direction of the unified update are therefore based on the overall situation and do not drift toward any individual client's data distribution.
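    A minimal PyTorch sketch of the unified update of Algorithm 4, assuming the global model is an nn.Module and the shared data is a list of per-client Datasets; Eserver and ηserver map to the epochs and lr arguments, and the batch size is our own choice.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, ConcatDataset

def unified_update(global_model, shared_datasets, epochs, lr):
    """Fine-tune the aggregated global model on the pooled shared data so that
    it follows one consistent objective (cf. Algorithm 4)."""
    loader = DataLoader(ConcatDataset(shared_datasets), batch_size=64, shuffle=True)
    optimizer = torch.optim.SGD(global_model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    global_model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(global_model(x), y).backward()
            optimizer.step()
    return global_model
```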

    To verify the effectiveness of BRCA, we structure the clients' data into varying degrees of non-iid and explore the impact of different amounts of shared data on the global model. At the same time, we compare the performance of our anomaly detection model with Abnormal's and explore the necessity of the unified update.

    Mnist and Cifar10 are two of the most commonly used public data sets in image classification, and most of the benchmark methods in our work also use these two data sets for their experiments. Using them makes it easier to compare with existing methods.

    We do the experiments on Mnist and Cifar10, and customize four different data distributions: (a) non-iid-1: each client only has one class of data. (b) non-iid-2: each client has 2 classes of data. (c) non-iid-3: each client has 5 classes of data. (d) iid: each client has 10 classes of data.

    For Mnist, we use 100 clients and four data distributions: (a) non-iid-1: each class of data in the training dataset is divided into 10 pieces, and each client selects one piece as its private data. (b) non-iid-2: each class of data in the training dataset is divided into 20 pieces, and each client selects 2 pieces of different classes. (c) non-iid-3: each class of data in the training dataset is divided into 50 pieces, and each client selects 5 pieces of different classes. (d) iid: each class of data in the training dataset is divided into 100 pieces, and each client selects 10 pieces of different classes. As the source domain used for pre-training the anomaly detection model, we randomly select 20,000 lowercase letters from the Nist dataset.

    For Cifar10, there are 10 clients, and the configuration of the four data distributions is similar to that of Mnist. We select some classes of data from Cifar100 as the source domain: lamp (number 40), lawn mower (41), lobster (45), man (46), forest (47), mountain (49), girl (35), snake (78), rose (70) and Tao (68); these samples do not exist in Cifar10.
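    The label-skew splits described above can be generated roughly as follows; this is only an illustrative sketch (function name, seeding and shard bookkeeping are our own, and it assumes n_clients * classes_per_client is a multiple of the number of classes), not the exact script used in the experiments.

```python
import numpy as np

def label_skew_partition(labels, n_clients, classes_per_client, seed=0):
    """Each class is cut into equal shards and every client receives one shard
    from each of `classes_per_client` distinct classes (1, 2, 5 or 10 give
    non-iid-1, non-iid-2, non-iid-3 and iid, respectively)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    shards_per_class = n_clients * classes_per_client // len(classes)
    shards = {c: iter(np.array_split(rng.permutation(np.where(labels == c)[0]),
                                     shards_per_class))
              for c in classes}
    clients = []
    for i in range(n_clients):
        # Assign `classes_per_client` distinct classes to client i, round-robin.
        picked = [classes[(i * classes_per_client + j) % len(classes)]
                  for j in range(classes_per_client)]
        clients.append(np.concatenate([next(shards[c]) for c in picked]))
    return clients
```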

    We use logistic regression on the Mnist dataset, with ηserver = 0.1, ηclient = 0.1, ηdetection = 0.02, Eclient = 5, Eserver = 1, n = 100, k = 30, ξ = 20%. On Cifar10 we use a network with two convolutional layers and three fully connected layers, with ηserver = 0.05, ηclient = 0.05, ηdetection = 0.002, Eclient = 10, Eserver = 10, n = 10, k = 10, ξ = 20%. The structures of the models are the same as in [10].

    Same-value attacks: a Byzantine client $i$ sends the model update $\omega_i = c\mathbf{1}$ to the server ($\mathbf{1}$ is the all-ones vector, $c$ is a constant); we set $c = 5$. Sign-flipping attacks: each client $i$ computes its true model update $\omega_i$, and then the Byzantine clients send $a\,\omega_i$ ($a < 0$) to the server; we set $a = -5$. Gaussian attacks: Byzantine clients add Gaussian noise to all dimensions of the model update, $\omega_i + \epsilon$, where $\epsilon$ follows the Gaussian distribution $N(0, g^2)$; we set $g = 0.3$.
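    The three attacks can be simulated on a flat update vector roughly as follows; the function names are our own, and the Gaussian attack uses g as the noise scale in line with the N(0, g^2) notation above.

```python
import numpy as np

def same_value_attack(update, c=5.0):
    """Replace the true update with the constant vector c * 1."""
    return np.full_like(update, c)

def sign_flipping_attack(update, a=-5.0):
    """Send a * update with a < 0 instead of the true update."""
    return a * update

def gaussian_attack(update, g=0.3, seed=None):
    """Add zero-mean Gaussian noise N(0, g^2) to every coordinate."""
    rng = np.random.default_rng(seed)
    return update + rng.normal(0.0, g, size=update.shape)
```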

    Defenses: Krum, GeoMed, Trimmed Mean, Abnormal and No Defense. No Defense does not use any defense methods.

    In the first experiment, we test the influence of the shared data rate γ in our algorithm, using the non-iid-2 data distribution. We evaluate five different values (1, 3, 5, 7 and 10%). Figures 2 and 3 show the accuracy and loss on Cifar10. We find that: 1) under all Byzantine attacks, our algorithm is superior to the benchmark methods; 2) sharing only 1% of the clients' data already significantly improves the performance of the global model. For all three Byzantine attacks, Krum, GeoMed, Trimmed Mean and No Defense are unable to converge, which also shows that when the model is complex, such methods are less able to resist Byzantine attacks.

    Figure 2.  The accuracy on Cifar10. The Byzantine attack types from (a) to (c) are: same value, sign flipping and Gaussian noisy. Six defense methods are adopted for each type of attack, in order: No defense, Krum, GeoMed, Trimmed Mean, Abnormal and BRCA. For BRCA, five different shared data rates (1, 3, 5, 7 and 10%) are shown, labeled BRCA 1, BRCA 3, BRCA 5, BRCA 7 and BRCA 10.
    Figure 3.  The loss on Cifar10. The legend is the same as in Figure 2.

    As the client data-sharing ratio increases, the marginal gain in global model performance decreases. When the sharing ratio grows from 1 to 10%, the average growth rates under the three Byzantine attacks are 1.8→1.41→0.97→0.92%. Even when clients share only one percent of their data, the performance of the global model is greatly improved.

    Figure 4 clearly demonstrates the impact of different shared data rates on the loss value of the global model on Cifar10.

    Figure 4.  The loss of BRCA on Cifar10 with five different shared data rates.

    In this part, the purposes of our experiments are: 1) to compare our anomaly detection model with Abnormal's, and 2) to explore the robustness of the anomaly detection model to non-iid data. The shared data rate γ is 5%, here and in Sections 4.2.3 and 4.2.4.

    In order to compare the detection performance against Byzantine attacks of the anomaly detection models in BRCA and Abnormal, we use the cross-entropy loss, computed from the detection scores, as the evaluation metric. First, we obtain the detection scores $E = \{e_1, \ldots, e_i, \ldots, e_k\}$ from the model updates $\omega_i$ and $\theta$, client $i \in S$. Then, we let $P = \mathrm{Sigmoid}(E - \mu(E))$ represent the probability that a client is honest, so $1 - P$ is the probability that the client is Byzantine. Lastly, we use $P$ and the true labels $Y$ ($y_i = 0$ for $i \in B$ and $y_i = 1$ for $i \in H$) to calculate the cross-entropy loss $l = -\sum_{i=1}^{k} y_i \ln(P_i)$.
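    This evaluation metric can be sketched as follows, assuming the detection scores of the selected clients and their true honest/Byzantine labels are available; the function name is illustrative.

```python
import numpy as np

def detector_cross_entropy(detection_scores, is_honest):
    """Treat sigmoid(e_i - mean(E)) as the probability that client i is honest
    and compute the cross-entropy against the true labels (1 = honest, 0 = Byzantine)."""
    e = np.asarray(detection_scores, dtype=float)
    y = np.asarray(is_honest, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(e - e.mean())))   # sigmoid
    return -np.sum(y * np.log(p + 1e-12))
```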

    Figure 5(a)–(c) compare the loss of the anomaly detection models of BRCA and Abnormal. From the figures, we can see that our model has a greater loss than Abnormal in the initial stage, mainly because the anomaly detection model is pre-trained by transfer learning and the initial pre-trained model does not yet fit the target domain well. As the adaptation progresses, the loss of our model decreases and it gradually outperforms Abnormal. Although Abnormal has a low loss in the initial stage, its loss gradually increases as training progresses and its detection ability degrades.

    Figure 5.  The cross-entropy loss of our anomaly detection model and Abnormal's, on Cifar10 with non-iid-2. (a)–(c) show the performance for the three Byzantine attacks.

    Figure 6(a)–(c) show the influence of different data distributions on our detection model. The detection ability of the model differs across data distributions, but it is worth pointing out that as the degree of non-iid of the data increases, the detection ability of the model also increases.

    Figure 6.  (a)–(c) show our anomaly detection model's performance on four different data distributions (iid, non-iid-1, non-iid-2, non-iid-3) against the Byzantine attacks (Gaussian noisy, sign flipping, same value).

    In this part, we study the impact of the unified update on the global model. Figure 7 shows the accuracy of the global model with and without unified update on Cifar10.

    Figure 7.  The accuracy of BRCA and BRCA No on Cifar10. BRCA No is BRCA with the unified update removed.

    From non-iid-1 to iid, the improvements in the global model's accuracy due to the unified update are: 35.1→13.6→4.7→2.3% (same value), 34.8→10.5→3.0→3.1% (Gaussian noisy), and 24.9→9.9→2.8→3.0% (sign flipping). Combined with Figure 7, it is clear that the more skewed a client's data is, the more the unified update improves the global model.

    When the data is non-iid, the directions of the model updates between clients are different. The higher the degree of non-iid of data, the more significant the difference. The global model obtained by weighted aggregation does not fit well with the global data. Unified update on the shared data can effectively integrate the model updates of multiple clients, giving the global model a consistent direction.

    Therefore, it is necessary to implement a unified update to the primary aggregation model when data is non-iid.

    Tables 2 and 3 show the accuracy and loss of each defense method under different data distributions on Cifar10. Our method performs best, and its performance is relatively stable across data distributions. The higher the degree of non-iid of the data, and thus the more homogeneous each client's data, the lower the performance of the defense methods.

    Table 2.  The accuracy of the six defenses under four different data distributions on Cifar10, against three attacks.
    Attacks Distribution No Krum GeoMed Abnormal TrimmedMean BRCA
    Same value Non-iid-1 0.1 0.1 0.1 0.178 0.1 0.529
    Non-iid-2 0.101 0.207 0.205 0.480 0.1 0.619
    Non-iid-3 0.1 0.398 0.398 0.634 0.1 0.691
    iid 0.098 0.696 0.705 0.698 0.101 0.713
    Gaussian noisy Non-iid-1 0.1 0.1 0.1 0.178 0.1 0.529
    Non-iid-2 0.191 0.204 0.205 0.513 0.059 0.623
    Non-iid-3 0.0409 0.398 0.394 0.660 0.171 0.692
    iid 0.1 0.697 0.694 0.710 0.120 0.715
    Sign flipping Non-iid-1 0.1 0.101 0.1 0.177 0.1 0.426
    Non-iid-2 0.1 0.192 0.214 0.5131 0.1 0.621
    Non-iid-3 0.1 0.397 0.398 0.651 0.1 0.686
    iid 0.1 0.697 0.703 0.711 0.1 0.718

    Table 3.  The loss of the six defenses under four different data distributions on Cifar10, against three attacks.
    Attacks Distribution No Krum GeoMed Abnormal TrimmedMean BRCA
    Same value Non-iid-1 2.84e16 11.72 9.61 2.29 6.05e17 2.09
    Non-iid-2 6.99e16 7.29 8.01 2.06 3.63e16 2.09
    Non-iid-3 4.48e16 2.35 2.38 1.893 3.37e16 0.691
    iid 1.51e16 0.794 0.774 1.837 3.17e16 1.79
    Gaussian noisy Non-iid-1 8.635e4 8.41 9.37 2.29 936.17 1.54
    Non-iid-2 9.51 7.57 8.37 1.34 7.98 0.623
    Non-iid-3 8.22 2.01 2.31 0.94 6.07 0.692
    iid 8.09 0.81 0.79 0.82 3.12 0.76
    Sign flipping Non-iid-1 2.30 10.72 9.91 2.29 2.30 1.54
    Non-iid-2 2.31 7.77 7.10 1.34 2.30 0.621
    Non-iid-3 2.31 2.36 2.13 0.94 2.30 0.686
    iid 2.31 0.79 0.80 0.81 2.31 0.76


    Our analysis is as follows: 1) The non-iid data among clients causes large differences between clients' models, and it is difficult for a defense method to judge whether an anomaly is caused by the non-iid data or by a Byzantine attack, which increases the difficulty of defending against Byzantine attacks. 2) Krum and GeoMed use statistical knowledge to select the median or an individual client's model to represent the global model. This type of method can effectively defend against Byzantine attacks when the data is iid. However, when the data is non-iid, each client's model focuses only on a small region and is highly independent; it cannot cover the domains of the other clients and obviously cannot represent the global model. 3) Trimmed Mean defends against Byzantine attacks based on the idea of averaging. When the parameter dimension of the model is low, it performs well, but as the complexity of the model increases, the method cannot converge stably.

    In this work, we propose a robust federated learning framework against Byzantine attacks when the data is non-iid. BRCA detects Byzantine attacks through credibility assessment. Meanwhile, it performs a unified update of the global model on the shared data, so that the global model has a consistent direction and its performance is improved. BRCA makes the global model converge well under different data distributions. For the pre-training of the anomaly detection model, transfer learning frees it from its dependence on the test data set. Experiments show that BRCA performs well on both non-iid and iid data, especially on non-iid data. In the future, we will improve our method by studying how to protect the privacy and security of the shared data.

    This work was partially supported by the Shanghai Science and Technology Innovation Action Plan under Grant 19511101300.

    All authors declare no conflicts of interest in this paper.



    [1] X. Guo, M. R. Lin, A. Azizi, L. P. Saldyt, Y. Kang, T. P. Pavlic, et al., Decoding alarm signal propagation of seed-harvester ants using automated movement tracking and supervised machine learning, Proc. R. Soc. B, 289 (2022), 20212176. https://doi.org/10.1098/rspb.2021.2176 doi: 10.1098/rspb.2021.2176
    [2] O. Feinerman, A. Korman, Individual versus collective cognition in social insects, J. Exp. Biol., 220 (2017), 73–82. https://doi.org/10.1242/jeb.143891 doi: 10.1242/jeb.143891
    [3] B. Doerr, M. Fouz, T. Friedrich, Why rumors spread fast in social networks, Commun. ACM, 55 (2012), 70–75. https://doi.org/10.1145/2184319.2184338 doi: 10.1145/2184319.2184338
    [4] L. Bonnasse-Gahot, H. Berestycki, M. Depuiset, M. B. Gordon, S. Roché, N. Rodriguez, et al., Epidemiological modelling of the 2005 french riots: A spreading wave and the role of contagion, Sci. Rep., 8 (2018), 107. https://doi.org/10.1038/s41598-017-18093-4 doi: 10.1038/s41598-017-18093-4
    [5] D. A. Sprague, T. House, Evidence for complex contagion models of social contagion from observational data, PLoS One, 12 (2017), 1–12. https://doi.org/10.1371/journal.pone.0180802 doi: 10.1371/journal.pone.0180802
    [6] C. E. Coltart, B. Lindsey, I. Ghinai, A. M. Johnson, D. L. Heymann, The ebola outbreak, 2013–2016: Old lessons for new epidemics, Phil. Trans. R. Soc. B, 372 (2017), 20160297. https://doi.org/10.1098/rstb.2016.0297 doi: 10.1098/rstb.2016.0297
    [7] B. Hölldobler, E. O. Wilson, The Ants, Harvard University Press, 1990.
    [8] F. E. Regnier, E. O. Wilson, The alarm-defence system of the ant acanthomyops claviger, J. Insect Physiol., 14 (1968), 955–970. https://doi.org/10.1016/0022-1910(68)90006-1 doi: 10.1016/0022-1910(68)90006-1
    [9] W. H. Bossert, E. O. Wilson, The analysis of olfactory communication among animals, J. Theor. Biol., 5 (1963), 443–469. https://doi.org/10.1016/0022-5193(63)90089-4 doi: 10.1016/0022-5193(63)90089-4
    [10] E. Frehland, B. Kleutsch, H. Markl, Modelling a two-dimensional random alarm process, BioSystems, 18 (1985), 197–208. https://doi.org/10.1016/0303-2647(85)90071-1 doi: 10.1016/0303-2647(85)90071-1
    [11] D. J. McGurk, J. Frost, E. J. Eisenbraun, K. Vick, W. A. Drew, J. Young, Volatile compounds in ants: Identification of 4-methyl-3-heptanone from Pogonomyrmex ants, J. Insect Physiol., 12 (1966), 1435–1441. https://doi.org/10.1016/0022-1910(66)90157-0 doi: 10.1016/0022-1910(66)90157-0
    [12] J. B. Xavier, J. M. Monk, S. Poudel, C. J. Norsigian, A. V. Sastry, C. Liao, et al., Mathematical models to study the biology of pathogens and the infectious diseases they cause, Iscience, 25 (2022), 104079. https://doi.org/10.1016/j.isci.2022.104079 doi: 10.1016/j.isci.2022.104079
    [13] P. Törnberg, Echo chambers and viral misinformation: Modeling fake news as complex contagion, PLoS One, 13 (2018), 1–21. https://doi.org/10.1371/journal.pone.0203958 doi: 10.1371/journal.pone.0203958
    [14] G. I. Marchuk, Mathematical Modelling of Immune Response in Infectious Diseases, Springer, 2013. https://doi.org/10.1007/978-94-015-8798-3
    [15] L. G. de Pillis, A. E. Radunskaya, A mathematical model of immune response to tumor invasion, in Computational Fluid and Solid Mechanics 2003, Elsevier, (2003), 1661–1668. https://doi.org/10.1016/B978-008044046-0.50404-8
    [16] L. G. de Pillis, A. Eladdadi, A. E. Radunskaya, Modeling cancer-immune responses to therapy, J. Pharmacokinet. Pharmacodyn., 41 (2014), 461–478. https://doi.org/10.1007/s10928-014-9386-9 doi: 10.1007/s10928-014-9386-9
    [17] A. M. Smith, Validated models of immune response to virus infection, Curr. Opin. Syst. Biol., 12 (2018), 46–52. https://doi.org/10.1016/j.coisb.2018.10.005 doi: 10.1016/j.coisb.2018.10.005
    [18] J. M. Conway, R. M. Ribeiro, Modeling the immune response to hive infection, Curr. Opin. Syst. Biol., 12 (2018), 61–69. https://doi.org/10.1016/j.coisb.2018.10.006 doi: 10.1016/j.coisb.2018.10.006
    [19] S. Legewie, N. Blüthgen, H. Herzel, Mathematical modeling identifies inhibitors of apoptosis as mediators of positive feedback and bistability, PLoS Comput. Biol., 2 (2006), e120. https://doi.org/10.1371/journal.pcbi.0020120 doi: 10.1371/journal.pcbi.0020120
    [20] T. Fasciano, H. Nguyen, A. Dornhaus, M. C. Shin, Tracking multiple ants in a colony, in 2013 IEEE Workshop on Applications of Computer Vision (WACV), IEEE, (2013), 534–540. https://doi.org/10.1109/WACV.2013.6475065
    [21] C. W. Reynolds, Flocks, herds and schools: A distributed behavioral model, in SIGGRAPH '87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, ACM, (1987), 25–34. https://doi.org/10.1145/37401.37406
    [22] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, O. Shochet, Novel type of phase transition in a system of self-driven particles, Phys. Rev. Lett., 75 (1995), 1226–1229. https://doi.org/10.1103/PhysRevLett.75.1226 doi: 10.1103/PhysRevLett.75.1226
    [23] H. Chaté, F. Ginelli, G. Grégoire, F. Raynaud, Collective motion of self-propelled particles interacting without cohesion, Phys. Rev. E, 77 (2008), 046113. https://doi.org/10.1103/PhysRevE.77.046113 doi: 10.1103/PhysRevE.77.046113
    [24] I. D. Couzin, J. Krause, R. James, G. D. Ruxton, N. R. Franks, Collective memory and spatial sorting in animal groups, J. Theor. Biol., 218 (2002), 1–11. https://doi.org/10.1006/jtbi.2002.3065 doi: 10.1006/jtbi.2002.3065
    [25] H. S. Fisher, L. Giomi, H. E. Hoekstra, L. Mahadevan, The dynamics of sperm cooperation in a competitive environment, Proc. R. Soc. B, 281 (2014), 20140296. https://doi.org/10.1098/rspb.2014.0296 doi: 10.1098/rspb.2014.0296
    [26] H. Hildenbrandt, C. Carere, C. K. Hemelrijk, Self-organized aerial displays of thousands of starlings: A model, Behav. Ecol., 21 (2010), 1349–1359. https://doi.org/10.1093/beheco/arq149 doi: 10.1093/beheco/arq149
    [27] J. C. Lagarias, J. A. Reeds, M. H. Wright, P. E. Wright, Convergence properties of the nelder–mead simplex method in low dimensions, SIAM J. Optim., 9 (1998), 112–147. https://doi.org/10.1137/S1052623496303470 doi: 10.1137/S1052623496303470
    [28] A. Lipp, H. Wolf, F. Lehmann, Walking on inclines: Energetics of locomotion in the ant Camponotus, J. Exp. Biol., 208 (2005), 707–719. https://doi.org/10.1242/jeb.01434 doi: 10.1242/jeb.01434
    [29] N. C. Holt, G. N. Askew, Locomotion on a slope in leaf-cutter ants: Metabolic energy use, behavioural adaptations and the implications for route selection on hilly terrain, J. Exp. Biol., 215 (2012), 2545–2550. https://doi.org/10.1242/jeb.057695 doi: 10.1242/jeb.057695
    [30] M. J. Greene, D. M. Gordon, Interaction rate informs harvester ant task decisions, Behav. Ecol., 18 (2007), 451–455. https://doi.org/10.1093/beheco/arl105 doi: 10.1093/beheco/arl105
    [31] D. M. Gordon, N. J. Mehdiabadi, Encounter rate and task allocation in harvester ants, Behav. Ecol. Sociobiol., 45 (1999), 370–377. https://doi.org/10.1007/s002650050573 doi: 10.1007/s002650050573
    [32] D. M. Gordon, The regulation of foraging activity in red harvester ant colonies, Am. Nat., 159 (2002), 509–518. https://doi.org/10.1086/339461 doi: 10.1086/339461
    [33] N. Razin, J. Eckmann, O. Feinerman, Desert ants achieve reliable recruitment across noisy interactions, J. R. Soc. Interface, 10 (2013), 20130079. https://doi.org/10.1098/rsif.2013.0079
    [34] S. C. Pratt, Behavioral mechanisms of collective nest-site choice by the ant Temnothorax curvispinosus, Insect. Soc., 52 (2005), 383–392. https://doi.org/10.1007/s00040-005-0823-z doi: 10.1007/s00040-005-0823-z
    [35] S. C. Pratt, Quorum sensing by encounter rates in the ant Temnothorax albipennis, Behav. Ecol., 16 (2005), 488–496. https://doi.org/10.1093/beheco/ari020 doi: 10.1093/beheco/ari020
    [36] A. Dornhaus, Specialization does not predict individual efficiency in an ant, PLoS Biol., 6 (2008), e285. https://doi.org/10.1371/journal.pbio.0060285 doi: 10.1371/journal.pbio.0060285
    [37] S. N. Beshers, J. H. Fewell, Models of division of labor in social insects, Annu. Rev. Entomol., 46 (2001), 413–440. https://doi.org/10.1146/annurev.ento.46.1.413 doi: 10.1146/annurev.ento.46.1.413
    [38] D. Charbonneau, C. Poff, H. Nguyen, M. C. Shin, K. Kierstead, A. Dornhaus, Who are the "lazy" ants? The function of inactivity in social insects and a possible role of constraint: Inactive ants are corpulent and may be young and/or selfish, Integr. Comp. Biol., 57 (2017), 649–667. https://doi.org/10.1093/icb/icx029 doi: 10.1093/icb/icx029
    [39] A. Bernadou, J. Busch, J. Heinze, Diversity in identity: Behavioral flexibility, dominance, and age polyethism in a clonal ant, Behav. Ecol. Sociobiol., 69 (2015), 1365–1375. https://doi.org/10.1007/s00265-015-1950-9 doi: 10.1007/s00265-015-1950-9
    [40] E. J. H. Robinson, T. O. Richardson, A. B. Sendova-Franks, O. Feinerman, N. R. Franks, Radio tagging reveals the roles of corpulence, experience and social information in ant decision making, Behav. Ecol. Sociobiol., 63 (2009), 627–636. https://doi.org/10.1007/s00265-008-0696-z doi: 10.1007/s00265-008-0696-z
    [41] H. G. Tanner, A. Jadbabaie, G. J. Pappas, Stable flocking of mobile agents, part I: Fixed topology, in 42nd IEEE International Conference on Decision and Control, IEEE, (2003), 2010–2015. https://doi.org/10.1109/CDC.2003.1272910
    [42] A. Kolpas, M. Busch, H. Li, I. D. Couzin, L. Petzold, J. Moehlis, How the spatial position of individuals affects their influence on swarms: A numerical comparison of two popular swarm dynamics models, PloS One, 8 (2013), e58525. https://doi.org/10.1371/journal.pone.0058525 doi: 10.1371/journal.pone.0058525
  • mbe-21-04-244-suplementary.pdf
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)