This paper theoretically and empirically revisits carbon pricing from the supply-side perspective for carbon assets to solve the recent low price issue which may delay the development of emission reduction technologies in the sense of marginal abatement costs. We propose a carbon pricing model linked to crude oil prices, which has historically been employed in supply-side driven pricing of long-term contracts for early-stage energy trading. Since the model is designed to hold carbon prices between certain lower and upper boundaries using S-shaped carbon price linkage to crude oil prices, it can be useful to overcome a recent low carbon price issue. In addition, it is shown that the model can alleviate the difficulties of carbon derivative pricing in selecting market price of risk. Empirical studies using EUA and Brent crude oil futures prices estimate the parameters of the Brent crude oil-linked EUA price model. The comparison of EUA prices simulated from the model with historical EUA prices suggests that simulated EUA prices be kept relatively higher than historical EUA prices. This is preferable for accelerating carbon emission reductions in that it can make emission reduction technologies with high marginal abatement costs affordable. It may imply that EUA must be priced using a crude oil-linked carbon price model in the early stage of EUA trading until EUA markets mature. This is a sharp contrast to current carbon markets employing premature market-based or supply and demand based pricing models. To show usefulness of crude oil-linked carbon pricing, we also give a numerical example of European carbon option pricing based on the Brent crude oil-linked EUA price model by using the Crank-Nicolson finite difference method. Finally we discuss the relation between crude oil-linked carbon pricing and emission reduction risk.
These studies may suggest carbon policy makers should take account of crude oil-linked carbon pricing to tackle low price and low liquidity issues of carbon assets.
1. Introduction
In the last few decades, technology has significantly dominated our lives and is currently considered the driving force of recent improvements in the medical health care area. Wireless sensor networks (WSNs) have demonstrated considerable importance due to their usage in many different aspects of human lives, such as medical health care, surveillance, environmental monitoring, military fields, and many other useful applications [1,2,3].
With the widespread use of smartphones, researchers have concentrated on the use of mobile technology for mobile medical health care, focusing on systems of medical data aggregation that are used to collect and send health data from patient smartphones directly to health care organizations.
With the exponential growth of health data, the processes of aggregating and analyzing vast amounts of data require immense storage capabilities, powerful computational resources, and fast and secure means of communication. Achieving these requirements by relying only on traditional WSNs is difficult and expensive for health care organizations [4].
Cloud-based solutions have proliferated in the medical health care field due to their extensive benefits [5]. These benefits include large-scale and on-demand storage, agility, cost-effectiveness and continuous service availability for information processing. Therefore, cloud-based solutions have considerable potential to enhance collaboration among the various participating entities in medical health care, such as patients and health care organizations.
Despite these benefits, cloud-based solutions are associated with elevated threat levels in terms of security and privacy. These threats include identity spoofing; data tampering; information disclosure; and violations of data integrity, confidentiality, authenticity, and accountability [6,7].
Mobile health networks (MHNWs) consist of small and inexpensive sensors that are deployed in unsupervised environments and are easily exposed to malfunctions and malicious attacks. Thus, fault tolerance is an important characteristic that must be considered when designing sensor network schemes [8].
In data aggregation schemes, sensor failures can cause incorrect data to be collected and transferred without any guarantee of quality. Fault tolerance is defined as "the ability of the network to sustain its functionalities properly, even in the presence of failures in some of its nodes". Fault tolerance aims to eliminate critical privacy threats and assure strong privacy protection for users who contribute their data to aggregators to ensure that the applied technology can deliver excellent service quality [9].
Two approaches can be employed to achieve fault tolerance. The first is a reactive approach, in which a system can recover from failures when they occur [10]. A small error can be recovered by a state-of-the-art protocol despite failures [11]. Unfortunately, these protocols can tolerate only partial failures and are not efficient in terms of bandwidth and delay. The second is a proactive approach that handles failures using multiple message exchanges between the nodes and the aggregator before faults occur. This approach substantially reduces the required recovery time, as the information needed for fault recovery is available.
Despite the fact that the state-of-the-art binary proactive protocol achieves a low delay, it suffers from communication overhead, bandwidth costs and large errors [10]. Won et al. [10] presented a novel design for a future ciphertext mechanism; this design supports differential privacy and achieves a higher bandwidth than the state-of-the-art binary proactive protocol. Chen et al. [12] presented a data aggregation scheme that preserves user privacy and guarantees data integrity by adopting a future ciphertext mechanism to provide fault tolerance capability.
Due to the confidential nature of health information and the importance of protecting and preserving the confidentiality of data, information security systems should be designed and developed with consideration of legal, ethical and security issues. Therefore, to design a workable data aggregation scheme for medical health care, the following issues must be addressed. The first issue is how to protect and preserve the security and privacy of data and maintain data confidentiality. The second issue is how to protect a system against failures.
Therefore, we propose a novel design for a fault-tolerant privacy-preserving data aggregation scheme. We use the cloud to aggregate, store and process data. Our contributions can be summarized as follows:
● The proposed architecture achieves a fault-tolerant privacy-preserving data aggregation scheme for lightweight health data with end-to-end verification. Moreover, when some failures occur, the cloud can compute the aggregation result, and health care institutions (HCs) can verify the correctness of the aggregated result.
● We modify the future ciphertext mechanism by adding a threshold for the number of faulty nodes. This modification avoids scenarios in which the cloud continues to compute meaningless aggregation when a serious abnormality occurs in the system.
● For secure aggregation and identity protection, we use homomorphic encryption, as it enables aggregation functions to be performed on encrypted data. We use random noise to achieve differential privacy.
● We provide a security and privacy analysis to show that our proposed scheme supports privacy preservation, fault tolerance, and data integrity verification. Additionally, we evaluate the efficiency, robustness and reliability of our scheme to confirm that it has good real-time performance and low aggregation error.
The remainder of the paper is organized as follows: Section 2 provides a background of health data aggregation and investigates the privacy and security challenges of data aggregation. Section 3 reviews related studies. Preliminaries and the proposed scheme are presented in Sections 4 and 5, respectively. The security analysis is provided in Section 6, followed by the performance analysis in Section 7. The conclusion of the study and future research ideas are discussed in Section 8.
2. Background
WSNs are formed by hundreds of thousands of sensor nodes that are used to measure and transmit physical or environmental changes, such as temperature and pressure or motion within a monitoring environment. Each sensor node consists of a sensing unit, memory, a processing unit, a power supply and a wireless communication unit [13]. The characteristics of WSNs include limited power, mobility, ability to cope with node failures, and low cost. These features have prompted researchers to introduce a new research area in the medical health care field: MHNWs. Recently, wearable devices and smartphones have been extensively applied in offering health monitoring services based on health data gathered from users. As these health data are very sensitive, any data leakage may violate user privacy [2]. In this section, we present an overview of the different uses of cloud computing in terms of data aggregation and address the security and privacy issues associated with MHNWs.
2.1. Security and privacy challenges
The usage of WSNs has rapidly increased over the past few years in various fields. A massive amount of data is being collected, transmitted and aggregated to perform processing operations. A large number of threats surround WSNs. Consequently, the design of a privacy-preserving data aggregation protocol should address these threats [13], which include the following:
a) Privacy Preservation and Eavesdropping: Eavesdropping is a type of attack in which the intruder tries to obtain confidential data by listening to transmissions over neighboring wireless links. Therefore, privacy preservation assures data privacy that may be threatened by trusted sensor nodes and adversaries. Some aggregation functions, such as min and max, can also be used to breach data privacy. Therefore, the designed protocol must maintain data privacy while using aggregation functions [13].
b) Data Integrity and Data Tampering: One of the most common types of attack on data privacy is data tampering, in which the attacker tries to manipulate (with an intermediate result) sensor data at the aggregator level during the data aggregation phase. This type of attack leads to an incorrect aggregation result and eventually to an incorrect decision [2,13].
c) Efficiency: In WSNs, it is very difficult to avoid communication overhead, but it can be greatly minimized by reducing communication costs, computational costs, and memory and payload sizes. In WSNs, data aggregation must fulfill both bandwidth and energy efficiency requirements throughout network processing.
d) Accuracy and Dynamism: In WSNs, energy constraints must be properly managed. The data generated from all sensing nodes are important. Therefore, all nodes should have sufficient power to process the collected data [3,13].
2.2. Essential requirements for privacy preservation in an e-health cloud
Certain rules and regulations are defined to ensure the privacy of the data within an organization and are called the CIA model (confidentiality, integrity and availability) [5]. Nevertheless, the data managed by third-party vendors require more privacy measures than those existing in the CIA model. Abbas and Khan [7] stated that there are many threats to privacy in the cloud, such as spoofing, masquerading, tampering, replaying and denial of service. The following requirements must be fulfilled to achieve data privacy preservation:
● Confidentiality: The health information of patients must be protected not only in the cloud environment but also from external anomalies and unauthorized users [7].
● Integrity: The data must be protected from illegitimate actions while ensuring that the data have not been altered or tampered with by either authorized or unauthorized users [5,7].
● Anonymity: Health data contain vital information, such as the patient's diseases and name, and this information must be hidden [14,15]. The patient's identity must be protected from intruders, unauthorized users and other internal or external adversaries. Anonymity can be achieved by using a technique known as pseudonymity [5,7].
● Nonrepudiation: These threats are posed by a user who performs tasks and denies them later. In the medical health care area, neither a patient nor a doctor can deny modifying data [7].
3. Related work
Data aggregation, as a powerful technique for MHNWs, has attracted substantial attention in both academia and industry. Recently, many privacy-preserving data aggregation schemes have been presented. In [11], Ács and Castelluccia proposed a privacy-preserving data aggregation scheme that applies the differential privacy concept by adding Laplace noise to aggregated data. However, the scheme increases network bandwidth and delay. Subsequently, they extended their scheme to support partial fault tolerance.
Lu et al. [15] introduced an efficient privacy-preserving scheme that reduces the computational overhead and delays in the network with its features, thereby providing fewer calculations, less traffic, higher accuracy and verifiable completeness. Khan et al. [16] proposed a fog-enabled fault-tolerant privacy-preserving data aggregation scheme in which the Boneh-Goh-Nissim (BGN) cryptosystem is used to preserve privacy. This scheme also reduces communication and computation costs. Zhang et al. [17] presented a privacy-preserving data aggregation scheme for health data monitoring in which the health data were stored and processed in the cloud and various strategies were applied based on the prioritization of the dataset. This scheme reduces the communication overhead but is not tolerant of node failures.
Won et al. [10] introduced a novel design for future ciphertext buffering to tolerate malfunctioning smart meters and achieved both differential privacy and error optimization. Chen et al. [12] also adopted the future ciphertext buffering mechanism that was proposed in [10] and proposed an aggregation scheme that supports fault tolerance, privacy preservation and data integrity. In addition, confidentiality was guaranteed by using Diffie-Hellman cryptography, while integrity was achieved by attaching a homomorphic message authentication code (HMAC) to each message.
Han et al. [8] addressed the fault tolerance issue within the health data monitoring framework. They proposed a cloud-based data aggregation scheme that supports additive and nonadditive data aggregation. A BGN cryptosystem was used to protect user privacy. Differential privacy was achieved by using multiple cloud servers. The scheme also guarantees data integrity. Chen et al. [18] presented another multifunctional data aggregation scheme (MUDA) that takes advantage of the homomorphic property of the BGN cryptosystem and a bilinear map to provide confidentiality to user data. The MUDA was also extended to support differential privacy [19,20]. Zhu et al. [21] proposed a secure data integrity verification scheme based on a short signature algorithm. They introduced the use of cloud computing to augment computing and storage resources.
Since health data aggregation requires very high computational capabilities, the privacy of sensitive information can be guaranteed if it is encrypted [22] by the owner. Homomorphic encryption enables the cloud to compute the result of aggregation without knowing the raw data. Another way to provide security for health data is through the use of cryptographic storage [3,8,17].
Moreover, verification is an extremely crucial step in health data aggregation, as any tampering with the data results in an invalid aggregation, and such interference must therefore be detected and rejected. The message authentication code (MAC) is a protocol that is commonly used to detect false data and to protect and guarantee the integrity of the data [12,23,24,25]. Zhang et al. [25] and Chen et al. [12] took advantage of the homomorphic properties of the MAC to guarantee data integrity. The hash-based MAC was used by Chen et al. [12] and Zhuo et al. [26] to verify user data.
4. Preliminaries
This section reviews the relevant definitions and terminologies. These definitions are necessary to understand the remainder of this work. The basic notations and symbols are listed in Table 1.
4.1. Bilinear pairing
A bilinear pairing \mathrm{e} is a map \mathrm{e}:{\mathrm{G}}_{1}\times {\mathrm{G}}_{2}\to {\mathrm{G}}_{\mathrm{T}} , where {\mathrm{G}}_{1}, {\mathrm{G}}_{2} and {\mathrm{G}}_{\mathrm{T}} are cyclic multiplicative groups of the same prime order \mathrm{q} , {\mathrm{g}}_{1} is a generator of {\mathrm{G}}_{1} , and {\mathrm{g}}_{2} is a generator of {\mathrm{G}}_{2} . The pairing \mathrm{e} has the following properties:
Bilinearity: \mathrm{e}\left({\mathrm{g}}_{1}^{\mathrm{a}}, {\mathrm{g}}_{2}^{\mathrm{b}}\right) = \mathrm{e}{\left({\mathrm{g}}_{1}, {\mathrm{g}}_{2}\right)}^{\mathrm{a}\mathrm{b}} \ \forall {\mathrm{g}}_{1}\in {\mathrm{G}}_{1}, {\mathrm{g}}_{2}\in {\mathrm{G}}_{2} and \mathrm{a}, \ \mathrm{b}\in {\mathrm{Z}}_{\mathrm{q}}^{\mathrm{*}} .
Computability: \forall {\mathrm{g}}_{1}\in {\mathrm{G}}_{1}, {\mathrm{g}}_{2}\in {\mathrm{G}}_{2} , \mathrm{e}\left({\mathrm{g}}_{1}, {\mathrm{g}}_{2}\right) can be computed by an efficient algorithm.
Nondegeneracy: For the generators {\mathrm{g}}_{1} of {\mathrm{G}}_{1} and {\mathrm{g}}_{2} of {\mathrm{G}}_{2} , \mathrm{e}\left({\mathrm{g}}_{1}, {\mathrm{g}}_{2}\right)\ne 1.
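The bilinearity and nondegeneracy properties can be checked numerically on a toy map. The following is a minimal sketch under strong assumptions and is not a secure pairing: elements of {\mathrm{G}}_{1} and {\mathrm{G}}_{2} are represented directly by their exponents (so discrete logarithms are trivially known), {\mathrm{G}}_{\mathrm{T}} is taken to be the order-q subgroup of the integers modulo a small prime p, and the names `pair`, `P`, `Q` and `GT_GEN` are hypothetical.

```python
# Toy bilinear map for illustration only (NOT secure).
# G1 and G2 are modelled by exponents in Z_q: the integer u stands for g1^u
# and the integer v stands for g2^v. G_T is the order-q subgroup of Z_p^*.
P = 23        # small prime modulus with p = 2q + 1
Q = 11        # prime group order q
GT_GEN = 4    # element of order q in Z_p^*, playing the role of e(g1, g2)


def pair(u: int, v: int) -> int:
    """Return e(g1^u, g2^v) = e(g1, g2)^(u*v) in the toy target group."""
    return pow(GT_GEN, (u * v) % Q, P)


a, b = 3, 5
lhs = pair(a, b)                  # e(g1^a, g2^b)
rhs = pow(pair(1, 1), a * b, P)   # e(g1, g2)^(ab)
print(lhs == rhs)                 # bilinearity holds for this pair of exponents
print(pair(1, 1) != 1)            # nondegeneracy for the generators
```

Real deployments would rely on a pairing-friendly elliptic curve; the toy map only makes the algebraic identity concrete.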
4.2. Complexity assumptions
Definition 1. Discrete Logarithm Problem [27] (DLP): Assume that {\mathrm{G}}_{1}, {\mathrm{G}}_{2} are two cyclic multiplicative groups, {\mathrm{G}}_{1} is generated by {\mathrm{g}}_{1} , and {\mathrm{G}}_{2} is generated by {\mathrm{g}}_{2} . Suppose that {\mathrm{g}}_{0}, {\mathrm{g}}_{1} are two elements in {\mathrm{G}}_{1} . It is computationally intractable to compute \mathrm{a} such that {\mathrm{g}}_{0} = {\mathrm{g}}_{1}^{\mathrm{a}} .
Definition 2. Computational Diffie-Hellman Problem [28] (CDH): Assume that {\mathrm{G}}_{1}, {\mathrm{G}}_{2} are two cyclic multiplicative groups, {\mathrm{G}}_{1} is generated by {\mathrm{g}}_{1} , and {\mathrm{G}}_{2} is generated by {\mathrm{g}}_{2} . Given \left({\mathrm{g}}_{1}, {\mathrm{g}}_{1}^{\mathrm{a}}, {\mathrm{g}}_{1}^{\mathrm{b}}\right) for unknown \mathrm{a}, \ \mathrm{b}\in {\mathrm{Z}}_{\mathrm{q}}^{\mathrm{*}} , it is intractable to compute {\mathrm{g}}_{1}^{\mathrm{a}\mathrm{b}} in polynomial time.
Definition 3. Decisional Diffie-Hellman Problem [26] (DDH): Assume that {\mathrm{G}}_{1}, {\mathrm{G}}_{2} are two cyclic multiplicative groups, {\mathrm{G}}_{1} is generated by {\mathrm{g}}_{1} , and {\mathrm{G}}_{2} is generated by {\mathrm{g}}_{2} . Given \left({\mathrm{g}}_{1}, {\mathrm{g}}_{1}^{\mathrm{a}}, {\mathrm{g}}_{1}^{\mathrm{b}}, {\mathrm{g}}_{1}^{\mathrm{c}}\right) , where \mathrm{a}, \ \mathrm{b}, \ \mathrm{c}\in {\mathrm{Z}}_{\mathrm{q}}^{\mathrm{*}} , a DDH solver determines whether \mathrm{c} = \mathrm{a}\mathrm{b} \ \mathrm{mod} \ \mathrm{q} by checking as follows:
\mathrm{e}\left({\mathrm{g}}_{1}^{\mathrm{a}}, {\mathrm{g}}_{1}^{\mathrm{b}}\right) = \mathrm{e}\left({\mathrm{g}}_{1}, {\mathrm{g}}_{1}^{\mathrm{c}}\right).
Definition 4. Gap Diffie-Hellman [29] (GDH) Group: A cyclic multiplicative group is a Gap Diffie-Hellman group if the computational Diffie-Hellman problem is hard in it but the decisional Diffie-Hellman problem can be solved efficiently.
4.3. Differential privacy
Definition 1. ( \mathrm{ϵ}- Differential Privacy) [30] A randomized mechanism \mathrm{A} satisfies \mathrm{ϵ}- differential privacy if for any two datasets {\mathrm{D}}_{1} and {\mathrm{D}}_{2} , where {\mathrm{D}}_{1} is obtained from {\mathrm{D}}_{2} by adding or removing a single element, and for all \mathrm{S} ⊆ \mathrm{Range}\left(\mathrm{A}\right),
\Pr\left[\mathrm{A}\left({\mathrm{D}}_{1}\right)\in \mathrm{S}\right]\le {\mathrm{e}}^{\mathrm{ϵ}}\cdot \Pr\left[\mathrm{A}\left({\mathrm{D}}_{2}\right)\in \mathrm{S}\right].
In the above definition, the parameter \mathrm{ϵ} represents the privacy cost, which allows us to control the desired privacy level. A smaller value of \mathrm{ϵ} denotes better privacy protection but implies that more noise is required and that the result will have lower accuracy. The most common mechanism for achieving \mathrm{ϵ} -differential privacy is to add i.i.d Laplace noise sampled from the Laplace distribution to the aggregated result.
Definition 2. ( 2\mathrm{ϵ}- Differential Privacy) [31] The noise \mathrm{L}\mathrm{a}\mathrm{p}\left({\rm{ \mathsf{ λ} }}\right) is sampled from the Laplace distribution with mean 0 and variance 2{{\rm{ \mathsf{ λ} }}}^{2} . The probability density function of the distribution is given by
\mathrm{f}\left(\mathrm{x}\right) = \frac{1}{2{\rm{ \mathsf{ λ} }}}{\mathrm{e}}^{-\left|\mathrm{x}\right|/{\rm{ \mathsf{ λ} }}}.
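In our scenario, each participant should generate random noise following a Laplace distribution. The Laplace distribution is infinitely divisible, so a Laplace random variable can be written as a summation of n other random variables as follows:
\mathrm{L}\mathrm{a}\mathrm{p}\left({\rm{ \mathsf{ λ} }}\right) = \sum _{\mathrm{i} = 1}^{\mathrm{n}}\left[{\mathrm{G}}_{\mathrm{i}}\left(\mathrm{n}, \ {\rm{ \mathsf{ λ} }}\right)-{\mathrm{G}}_{\mathrm{i}}'\left(\mathrm{n}, \ {\rm{ \mathsf{ λ} }}\right)\right],
where {\mathrm{G}}_{\mathrm{i}}\left(\mathrm{n}, \ {\rm{ \mathsf{ λ} }}\right) and {\mathrm{G}}_{\mathrm{i}}'(\mathrm{n}, {\rm{ \mathsf{ λ} }}) are independent gamma-distributed random variables with a gamma density given by
\mathrm{g}\left(\mathrm{x}; \mathrm{n}, {\rm{ \mathsf{ λ} }}\right) = \frac{{\left(1/{\rm{ \mathsf{ λ} }}\right)}^{1/\mathrm{n}}}{\mathrm{\Gamma }\left(1/\mathrm{n}\right)}{\mathrm{x}}^{\frac{1}{\mathrm{n}}-1}{\mathrm{e}}^{-\mathrm{x}/{\rm{ \mathsf{ λ} }}}, \ \mathrm{x}\ge 0.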
Additionally, \mathrm{\Gamma }(1/\mathrm{n}) is the gamma function evaluated at 1/\mathrm{n} .
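The divisibility argument can be illustrated empirically. The sketch below assumes the standard decomposition in which each of the n participants contributes the difference of two independent gamma variables with shape 1/n and scale λ, so that the n shares sum to a Lap(λ) variable. The function names `noise_share` and `aggregate_noise` and the parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def noise_share(n: int, lam: float, rng: random.Random) -> float:
    """One participant's share: G - G', both Gamma(shape=1/n, scale=lam)."""
    return rng.gammavariate(1.0 / n, lam) - rng.gammavariate(1.0 / n, lam)


def aggregate_noise(n: int, lam: float, rng: random.Random) -> float:
    """Sum of the n participants' shares; distributed as Lap(lam)."""
    return sum(noise_share(n, lam, rng) for _ in range(n))


rng = random.Random(0)          # fixed seed so the experiment is repeatable
n, lam = 10, 2.0
samples = [aggregate_noise(n, lam, rng) for _ in range(20000)]

# The sample mean should be close to 0 and the sample variance close to
# 2 * lam**2, the variance of a Lap(lam) variable.
print(statistics.mean(samples), statistics.variance(samples))
```

No single share is Laplace distributed on its own; only the aggregate carries the full noise, which is why the cloud learns a differentially private sum without any participant having to add the whole noise term.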
4.4. YASHE
Yet Another Somewhat Homomorphic Encryption (YASHE) is a scheme based on a modified version of the N-th degree truncated polynomial ring units (NTRU) cryptosystem and the multikey homomorphic encryption scheme [32]. It has become a popular fully homomorphic encryption (FHE) scheme due to its superior performance with lightweight data compared with the performances of other homomorphic schemes [32,33].
The security of YASHE is based on the hardness of the decisional ring learning with errors (RLWE) problem [34]: Given a sample \mathrm{a}\leftarrow {\mathrm{R}}_{\mathrm{q}} drawn uniformly at random, an error term \mathrm{e}\leftarrow χ , and a secret \mathrm{s}\leftarrow {\mathrm{R}}_{\mathrm{q}} , it is computationally hard for an adversary that does not know \mathrm{s} and \mathrm{e} to distinguish between the distribution of (\mathrm{a}, \ \mathrm{s}\mathrm{a}+\mathrm{e}) and that of (\mathrm{a}, \ \mathrm{b}) , where \mathrm{b}\leftarrow {\mathrm{R}}_{\mathrm{q}} is drawn uniformly at random.
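To make the shape of an RLWE sample concrete, the following sketch builds one pair (a, b) with b = a·s + e. It is an illustration under assumptions: the ring is taken to be Z_q[x]/(x^d + 1) with d a power of two, the secret and error coefficients are drawn uniformly from {-1, 0, 1} rather than from a discrete Gaussian, and the toy parameters d = 8 and q = 257 offer no security. The helper names `negacyclic_mul` and `rlwe_sample` are hypothetical.

```python
import random


def negacyclic_mul(a, b, q):
    """Multiply two polynomials in Z_q[x]/(x^d + 1); coefficients are low-to-high."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < d:
                out[k] = (out[k] + ai * bj) % q
            else:
                # x^d = -1 in this ring, so the term wraps around with a sign flip
                out[k - d] = (out[k - d] - ai * bj) % q
    return out


def rlwe_sample(s, q, rng, bound=1):
    """Return (a, b, e) with a uniform, e small, and b = a*s + e mod q."""
    d = len(s)
    a = [rng.randrange(q) for _ in range(d)]
    e = [rng.randint(-bound, bound) for _ in range(d)]
    b = [(x + y) % q for x, y in zip(negacyclic_mul(a, s, q), e)]
    return a, b, e


rng = random.Random(1)
d, q = 8, 257
s = [rng.randint(-1, 1) for _ in range(d)]   # small secret (toy choice)
a, b, e = rlwe_sample(s, q, rng)

# Whoever knows s can strip a*s from b and recover the small error e;
# without s, the pair (a, b) is meant to look uniformly random.
recovered = [(bi - ci) % q for bi, ci in zip(b, negacyclic_mul(a, s, q))]
print(recovered == [x % q for x in e])
```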
YASHE. ParamGen( {\rm{ \mathsf{ λ} }} ): Given a set of parameters {\rm{ \mathsf{ λ} }} , \mathrm{d} , \mathrm{q} , \mathrm{t} , {\mathrm{x}}_{\mathrm{k}\mathrm{e}\mathrm{y}} , {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}} and \mathrm{w} , where {\rm{ \mathsf{ λ} }} is a security parameter, \mathrm{d} is a fixed positive integer that determines \mathrm{R} , and moduli \mathrm{q} and \mathrm{t} exist, with 1<\mathrm{t}<\mathrm{q} , {\mathrm{x}}_{\mathrm{k}\mathrm{e}\mathrm{y}} and {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}} are distributions on \mathrm{R} , and \mathrm{w} is an integer base where \mathrm{w}>1 . The algorithm generates (\mathrm{d} , \mathrm{q} , \mathrm{t} , {\rm{ \mathsf{ λ} }} , {\mathrm{x}}_{\mathrm{k}\mathrm{e}\mathrm{y}} , {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}} , \mathrm{w} ).
YASHE. keyGen (\mathrm{d} , \mathrm{q} , \mathrm{t} , {\rm{ \mathsf{ λ} }} , {\mathrm{x}}_{\mathrm{k}\mathrm{e}\mathrm{y}} , {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}} , \mathrm{w} ): {\mathrm{f}}', \ \mathrm{g}\leftarrow {\mathrm{x}}_{\mathrm{k}\mathrm{e}\mathrm{y}} are sampled; then, \mathrm{f} = {\left[\mathrm{t}{\mathrm{f}}'+1\right]}_{\mathrm{q}} is set ( {\mathrm{f}}' is resampled if \mathrm{f} is not invertible modulo \mathrm{q} ), and \mathrm{h} = {\left[\mathrm{t}\mathrm{g}{\mathrm{f}}^{-1}\right]}_{\mathrm{q}} is computed. \mathrm{e}, \ \mathrm{s}\leftarrow {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}}^{{\mathrm{l}}_{\mathrm{w}, \ \mathrm{q}}} are sampled, and \mathrm{\gamma } = {\left[{\mathrm{P}}_{\mathrm{w}, \ \mathrm{q}}\left(\mathrm{f}\right)+\mathrm{e}+\mathrm{h}\cdot \mathrm{s}\right]}_{\mathrm{q}}\in {\mathrm{R}}^{{\mathrm{l}}_{\mathrm{w}, \ \mathrm{q}}} is computed. Then, (\mathrm{p}\mathrm{k} , \mathrm{s}\mathrm{k} , \mathrm{e}\mathrm{v}\mathrm{k} ) = (\mathrm{h} , \mathrm{f} , \mathrm{\gamma } ) is generated.
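YASHE. Encrypt ( \mathrm{p}\mathrm{k}, \ \mathrm{x}) : To encrypt \mathrm{x}\in \mathrm{R} , \mathrm{e}, \ \mathrm{s}\leftarrow {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}} are sampled, and the ciphertext \mathrm{c} = {\left[∆{\left[\mathrm{x}\right]}_{\mathrm{t}}+\mathrm{e}+\mathrm{h}\mathrm{s}\right]}_{\mathrm{q}}\in \mathrm{R} is generated, where ∆ = ⌊\mathrm{q}/\mathrm{t}⌋ .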
YASHE. Decrypt ( \mathrm{s}\mathrm{k}, \ \mathrm{c}) : A ciphertext \mathrm{c} is decrypted by \mathrm{x} = {\left[⌊\frac{\mathrm{t}}{\mathrm{q}}\cdot {\left[\mathrm{f}\mathrm{c}\right]}_{\mathrm{q}}⌉\right]}_{\mathrm{t}}\in \mathrm{R} .
YASHE. Add ({\mathrm{c}}_{1} , {\mathrm{c}}_{2} ): The ciphertext {\mathrm{c}}_{\mathrm{a}\mathrm{d}\mathrm{d}} = {\left[{\mathrm{c}}_{1}+{\mathrm{c}}_{2}\right]}_{\mathrm{q}} is output.
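The reason that YASHE.Add yields an encryption of the sum can be seen by expanding the two ciphertexts. The short derivation below is a sketch that assumes both ciphertexts were produced by YASHE.Encrypt under the same public key h, with fresh error terms (e_1, s_1) and (e_2, s_2) and with Δ denoting ⌊q/t⌋; these subscripted symbols are introduced here only for illustration.

```latex
\begin{align*}
c_1 + c_2
  &\equiv \Delta [x_1]_t + e_1 + h s_1 + \Delta [x_2]_t + e_2 + h s_2 \pmod{q} \\
  &\equiv \Delta \left( [x_1]_t + [x_2]_t \right) + (e_1 + e_2) + h\,(s_1 + s_2) \pmod{q}.
\end{align*}
```

The result has the same form as a fresh ciphertext of x_1 + x_2 with error terms e_1 + e_2 and s_1 + s_2 (up to a small additional term when the message sum wraps modulo t), so it decrypts to [x_1 + x_2]_t as long as the accumulated noise stays below the decryption bound. Because the noise grows with every addition, the number of ciphertexts that can be aggregated is limited by the chosen parameters.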
4.5. Homomorphic MAC function
One of the basic methods for ensuring data integrity and preventing tampering attacks is to use a homomorphic MAC function. The homomorphic property means that for two messages {\mathrm{x}}_{1} and {\mathrm{x}}_{2} , given two homomorphic MACs (MAC ( {\mathrm{x}}_{1} ) and MAC ( {\mathrm{x}}_{2} )), anyone can compute MAC ( {\mathrm{x}}_{1}+{\mathrm{x}}_{2} ) without knowing {\mathrm{x}}_{1} or {\mathrm{x}}_{2} . The MAC function can be constructed as follows:
where {\mathrm{x}}_{\mathrm{i}}<\mathrm{q} . This MAC function satisfies the homomorphic property since it follows that
4.6. Hash function
The cryptographic hash function is used to check the integrity and source of the given data. This function accepts an input of arbitrary length and maps it to a fixed-length output with a one-way, collision-resistant mapping. It is computationally infeasible to find two different inputs \mathrm{a} and \mathrm{b} , where \mathrm{a}\ne \mathrm{b} , that map to the same output such that \mathrm{h}\left(\mathrm{a}\right) = \mathrm{h}\left(\mathrm{b}\right) . Additionally, it is computationally infeasible to infer \mathrm{a} from \mathrm{h}\left(\mathrm{a}\right) [35].
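As a small illustration of the fixed-length and tamper-evidence properties, the sketch below uses SHA-256 from the Python standard library. The choice of SHA-256 is an assumption, since the scheme only requires some secure hash function, and the report strings are made-up examples.

```python
import hashlib


def h(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


report = b"heart_rate=72;t=1"
tampered = b"heart_rate=73;t=1"

digest = h(report)
print(len(digest))              # 64 hex characters, whatever the input length
print(h(report) == digest)      # the same input always gives the same digest
print(h(tampered) == digest)    # a one-character change gives a different digest
```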
5. Proposed approach
Data aggregation is an important tool in MHNWs, in which a vast amount of sensitive data is transmitted, processed, and analyzed. Therefore, fault tolerance and privacy have become critical issues for health data aggregation. Without appropriate privacy protection, users may not be willing to share their data. Therefore, we introduce a fault-tolerant privacy-preserving data aggregation scheme for health data.
In our scheme, the computational overhead is reduced. Privacy is provided by the fully homomorphic YASHE in addition to embedded noise for differential privacy. Fault tolerance is achieved by applying the future message mechanism to properly sustain network operability even in the presence of failures. To enhance the efficiency of the proposed scheme, a health institution can control malfunctioning nodes. The basic notations and symbols of the scheme are listed in Table 2.
5.1. System Model
Our system model consists of four main entities, as shown in Figure 1: mobile workers (MWs), the health care institution (HC), the cloud (C), and the trusted authority (TA).
■ Trusted Authority (TA): The primary responsibility of the TA is the initialization of the entire system, which includes registering the participants, the HCs and the cloud; generating the required public parameters; and distributing the keys.
■ Health Care Institution (HC): The HC is the requester that seeks aggregation statistics from patients' data. Due to limited storage and computation capabilities, the HC delegates computations to the cloud.
■ Cloud: The cloud server receives encrypted data from MWs and computes the desired statistical results. The cloud server encrypts the computation results and forwards them to the HC.
■ Participant (U): Participants refers to users or MWs who have smartphones and contribute their data to an HC. MWs are randomly chosen and encrypt and send their sensing data to the cloud.
Figure 2 depicts the framework of our proposed scheme, which contains three main entities: the client, the cloud, and the health institution. The cloud is the most prominent of these entities in our proposed scheme and contains three main modules: the data integrity verification module, the fault tolerance module, and the data aggregation module.
The workflow of our framework is as follows: First, the user's encrypted data are sent with two parameters: the first is the future ciphertext, and the second is the verification code. Then, the cloud server verifies the data integrity and calculates the aggregation result. If the aggregator fails to receive the data from one or more users, up to a maximum of M users, the aggregator uses the future ciphertexts from the buffer memory to calculate the aggregation result and then sends the result to the HC. Finally, the HC decrypts the result.
5.2. A novel fault-tolerant privacy-preserving cloud-based data aggregation scheme for lightweight health data
Step 1: Setup and key management
The TA generates the necessary parameters and keys for the system, generates the bilinear parameters ( \mathrm{q}, \ \mathrm{g}, \ \mathrm{h}, \ \mathrm{e}, {\mathrm{G}}_{1}, {\mathrm{G}}_{2}, {\mathrm{G}}_{\mathrm{T}} ) and encryption parameters for YASHE (\mathrm{d} , \mathrm{q} , \mathrm{t} , {\rm{ \mathsf{ λ} }} , {\mathrm{x}}_{\mathrm{k}\mathrm{e}\mathrm{y}} , {\mathrm{x}}_{\mathrm{e}\mathrm{r}\mathrm{r}} , \mathrm{w} ) and chooses a secure hash function \mathrm{H}\left(\mathrm{x}\right) . The TA registers all mobile users, the requester and the cloud in the system by sending them a private/public key pair ( {\mathrm{s}\mathrm{k}}_{\mathrm{c}}, {\mathrm{p}\mathrm{k}}_{\mathrm{c}}) . The TA selects N mobile users and registers them. Each registered MW is assigned private/public key pairs ( {\mathrm{s}\mathrm{k}}_{\mathrm{i}}, {\mathrm{p}\mathrm{k}}_{\mathrm{i}} ). Both the requester and workers are assigned encryption keys (α, β) for the homomorphic MAC.
Step 2: Sensing and reporting
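During each time period t, each participant {\mathrm{U}}_{\mathrm{i}} reports his/her sensing data {\mathrm{x}}_{\mathrm{i}, \ \mathrm{t}} as follows. First, {\mathrm{U}}_{\mathrm{i}} computes
{\widehat{\mathrm{x}}}_{\mathrm{i}, \ \mathrm{t}} = {\mathrm{x}}_{\mathrm{i}, \ \mathrm{t}}+{\widehat{\mathrm{r}}}_{\mathrm{i}, \ \mathrm{t}},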
where {\widehat{\mathrm{r}}}_{\mathrm{i}, \ \mathrm{t}} represents random noise variables with gamma densities. The sum of all random noise from all participants guarantees differential privacy due to the divisibility of the Laplace distribution, as described in Section 4.3.
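However, adding random noise \widehat{\mathrm{r}} is not adequate for ensuring the privacy of the data. As a result, the noisy data {\widehat{\mathrm{x}}}_{\mathrm{i}, \ \mathrm{t}} should be encrypted using the public key {\mathrm{p}\mathrm{k}}_{\mathrm{c}} of the requester to obtain the ciphertext
{\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} = \mathrm{YASHE.Encrypt}\left({\mathrm{p}\mathrm{k}}_{\mathrm{c}}, \ {\widehat{\mathrm{x}}}_{\mathrm{i}, \ \mathrm{t}}\right).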
Each ciphertext {\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} is signed with its corresponding signature {{\rm{ \mathsf{ σ} }}}_{\mathrm{i}, \ \mathrm{t}} (generated by the secure hash function H() using participants' private keys {\mathrm{s}\mathrm{k}}_{\mathrm{i}} ) to prevent tampering attacks and ensure data integrity as follows:
To address fault tolerance, we use the proactive aggregation protocol based on the future ciphertext mechanism. Each participant {\mathrm{U}}_{\mathrm{i}} computes two kinds of ciphertext— {\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} for {\widehat{\mathrm{x}}}_{\mathrm{i}, \ \mathrm{t}} and a future ciphertext {\widehat{\mathrm{c}}}_{\mathrm{i}, \ \mathrm{t}} adapted from {\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} as follows:
We assume that the aggregator has a buffer memory ( \mathrm{B} ) to store future ciphertexts for each node. In our design, the aggregator is the cloud, which has ample storage. Each node \mathrm{i} sends its ciphertext {\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} at time \mathrm{t} and \mathrm{B} future ciphertexts {\widehat{\mathrm{c}}}_{\mathrm{i}, \ \mathrm{t}} , {\widehat{\mathrm{c}}}_{\mathrm{i}, \ \mathrm{t}+1} , {\widehat{\mathrm{c}}}_{\mathrm{i}, \ \mathrm{t}+2} , ..., {\widehat{\mathrm{c}}}_{\mathrm{i}, \ \mathrm{t}+\mathrm{B}-1} , as shown in Figure 3. In the next iteration, each node sends two ciphertexts: the first is the current ciphertext {\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} , and the second is the future ciphertext {\widehat{\mathrm{c}}}_{\mathrm{i}, \ \mathrm{t}+\mathrm{B}} , together with the corresponding signature {{\rm{ \mathsf{ σ} }}}_{\mathrm{i}, \ \mathrm{t}} . The purpose of a future ciphertext is to replace a given ciphertext if the cloud does not receive the ciphertext from the corresponding participant node. For increased efficiency, the HC controls the number of malfunctioning nodes using the parameter \mathrm{M} .
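The bookkeeping behind the future ciphertext buffer can be sketched as follows. This is an illustrative model under assumptions, not the implementation of the scheme: the class `FutureBuffer` and its methods are hypothetical, ciphertexts are represented by plain strings, and slots are indexed by the round for which each future ciphertext would act as a substitute.

```python
class FutureBuffer:
    """Cloud-side store of one node's future ciphertexts, keyed by round."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # the buffer size B
        self.slots = {}               # round index -> future ciphertext

    def store(self, rnd: int, future_ct) -> None:
        """Keep the future ciphertext that can stand in for round rnd."""
        if len(self.slots) >= self.capacity:
            raise OverflowError("buffer already holds B future ciphertexts")
        self.slots[rnd] = future_ct

    def release(self, rnd: int) -> None:
        """The node reported in round rnd, so its substitute is not needed."""
        self.slots.pop(rnd, None)

    def substitute(self, rnd: int):
        """The node was silent in round rnd: hand out the stored substitute."""
        return self.slots.pop(rnd)    # KeyError once the buffer has run dry


B = 3
buf = FutureBuffer(B)
for k in range(B):                    # first round: B future ciphertexts arrive
    buf.store(k, f"future[{k}]")
buf.release(0)                        # round 0: the node reported normally

buf.release(1)                        # round 1: current ciphertext arrives ...
buf.store(1 + B - 1, "future[3]")     # ... along with the one for round t + B

replacement = buf.substitute(2)       # round 2: node is silent, use the buffer
print(replacement, sorted(buf.slots))
```

In this model a node that stays silent for more rounds than there are buffered slots exhausts its substitutes, which is the situation in which the cloud can no longer compensate for that node.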
Step 3: Verifying the correctness of the health data aggregation
To ensure end-to-end verification, we use the homomorphic MAC (HMAC) function MAC. Each participant {\mathrm{U}}_{\mathrm{i}} signs the reported data with the corresponding homomorphic MAC value MAC ( {\widehat{x}}_{i, t} ) and calculates the homomorphic MAC value for the future ciphertext \widehat{MAC} ( {\widehat{r}}_{i, t+B} ). Participant {\mathrm{U}}_{\mathrm{i}} sends {c}_{i, t} , {\widehat{c}}_{i, t+B } , {\sigma }_{i, t} , MAC ( {\widehat{x}}_{i, t} ) and \widehat{MAC} ( {\widehat{r}}_{i, t+B} ) to the cloud.
Step 4: Data aggregation and verification
After receiving all reports, the cloud verifies whether the received reports were obtained from the chosen participants for each ciphertext {c}_{i, t} using participants' public keys {pk}_{i} by checking
If the above equation is valid, then data integrity is guaranteed, and the cloud proceeds to compute the aggregation result \mu . If not, a breach has occurred.
However, this equation does not consider fault tolerance. If some reports are not received by the cloud, the cloud cannot verify the received reports or obtain the aggregation result. For a more efficient and reliable scheme, we modify the future ciphertext mechanism to enable users to set a preference configuration parameter M and to tolerate the failure of at most M participants out of N total participants.
If the cloud does not receive the ciphertext \mathrm{c} from between one and M nodes, where the HC specifies M, the cloud retrieves from the buffer memory the future ciphertext \widehat{\mathrm{c}} that corresponds to each malfunctioning node and uses it in place of the missing ciphertext. If the number of malfunctioning nodes exceeds M, then the system is reinitialized to choose new medical health care nodes.
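The substitution rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: ciphertexts are modeled as plain integers, and the names `collect_round`, `ReinitializeRequired`, `received`, `buffer` and `max_failures` are hypothetical.

```python
from typing import Dict, List, Optional


class ReinitializeRequired(Exception):
    """Signals that the round cannot be completed and nodes must be re-chosen."""


def collect_round(received: Dict[int, Optional[int]],
                  buffer: Dict[int, List[int]],
                  max_failures: int) -> List[int]:
    """Return one ciphertext per node, ordered by node id.

    received     maps node id -> current ciphertext, or None if nothing arrived.
    buffer       maps node id -> stored future ciphertexts (oldest first).
    max_failures is the parameter M chosen by the HC.
    Ciphertexts are plain integers here, purely as placeholders (assumption).
    """
    failed = [node for node, c in received.items() if c is None]
    if len(failed) > max_failures:
        raise ReinitializeRequired(
            f"{len(failed)} failed nodes exceed M={max_failures}")

    result = []
    for node in sorted(received):
        c = received[node]
        if c is None:
            if not buffer.get(node):
                raise ReinitializeRequired(f"no future ciphertext for node {node}")
            # Replace the missing report with the oldest buffered future ciphertext.
            c = buffer[node].pop(0)
        result.append(c)
    return result
```

For example, with `received = {1: 10, 2: None, 3: 30}`, a buffer holding one future ciphertext per node and `max_failures = 1`, the call returns the two received values plus the buffered value for node 2; with `max_failures = 0` it raises `ReinitializeRequired`.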
To verify the correctness of the aggregation result, the cloud computes the corresponding homomorphic message authentication code MAC as follows:
If the number of participants who fail to send their data is at most M, the cloud verifies the correctness of the aggregation result as follows:
The cloud forwards the results and the corresponding homomorphic MAC values \{ {\rm{ \mathsf{ μ} }}, \ {\rm{ \mathsf{ σ} }}\} to the requester (the HC).
Step 5: Decryption and verification of the results
When HC receives \{ {\rm{ \mathsf{ μ} }}, \ {\rm{ \mathsf{ σ} }}\} from the cloud, it derives the aggregation result \sum _{\mathrm{i} = 1}^{\mathrm{n}}{\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}} by decrypting {\rm{ \mathsf{ μ} }} as follows:
The HC verifies the correctness of the aggregation result obtained using the homomorphic MAC algorithm by checking
If the verification fails, the HC rejects the results. Otherwise, the HC accepts the results.
6. Security analysis
This section analyzes the security and privacy requirements satisfied by our proposed scheme. Moreover, we demonstrate how our proposed scheme resists different types of adversary models.
6.1. Data privacy
In our scheme, health data {\mathrm{x}}_{\mathrm{i}} are encrypted using YASHE, which is indistinguishable under chosen-plaintext attack (IND-CPA) and secure under the decisional-RLWE assumption [34]. It is infeasible for any time-bounded adversary to decrypt the ciphertext and obtain the health data without knowledge of the private key, which is known only by the HC.
● Resilience against external attacks:
Proof: An external adversary who eavesdrops on the ciphertext {c}_{i, \ t} cannot extract {x}_{i, \ t} since he/she has no knowledge of t, q, f or e, h . Obtaining this knowledge is infeasible because f is held securely by participant {U}_{i} and e, h are privately held by the HC.
6.2. Differential privacy
During each time period t, the cloud can perform one of the above two types of queries. Both queries provide 2ϵ -differential privacy, where {\rm{ \mathsf{ λ} }} = GS/ϵ and GS is the global sensitivity of the aggregation result. Even if the cloud uses the current and future ciphertexts to try to infer the sensing data {x}_{i, t} , i.e., {c}_{i, \ t} - {\widehat{c}}_{i, \ t} = {x}_{i, t}-La{p}_{i, t}\left({\rm{ \mathsf{ λ} }}\right) , the scheme still provides ϵ -differential privacy for the data {x}_{i, \ t} [12], as the Laplace distribution is symmetric around its mean of zero. Therefore, during each time period, from the participants' perspective, our scheme provides 2ϵ -differential privacy based on its parallel composition and sequential composition properties. Furthermore, our scheme provides protection against human factor-aware differential aggregation (HAD) [36]. This type of attack aims to break individual privacy. Suppose there are three MWs {\mathrm{M}\mathrm{W}}_{1} , {\mathrm{M}\mathrm{W}}_{2} and {\mathrm{M}\mathrm{W}}_{3} , and the sensing data {x}_{1}, {x}_{2} of {\mathrm{M}\mathrm{W}}_{1} , {\mathrm{M}\mathrm{W}}_{2} , respectively, are stable at time slots {t}_{1} and {t}_{2} . {\mathrm{M}\mathrm{W}}_{3} does not report any data at time slot {t}_{2} . From Eqs (5), (9), (10) and (13), the aggregated results for {t}_{1} and {t}_{2} are {M}_{1} = \sum _{i = 1}^{3}{x}_{i, 1}+La{p}_{1}\left({\rm{ \mathsf{ λ} }}\right) and {M}_{2} = \sum _{i = 1}^{2}{x}_{i, 2}+La{p}_{2}\left({\rm{ \mathsf{ λ} }}\right)+La{p}_{3, 2}\left({\rm{ \mathsf{ λ} }}\right) , respectively. It is infeasible for the adversary to derive the sensing data {x}_{3} of {\mathrm{M}\mathrm{W}}_{3} at time slot {t}_{1} by comparing the aggregated results of {t}_{1} and {t}_{2} since {M}_{1}-{M}_{2} = {x}_{3, 1}-La{p}_{3, 1}\left({\rm{ \mathsf{ λ} }}\right) .
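To make the role of the scale {\rm{ \mathsf{ λ} }} = GS/ϵ concrete, the following Python sketch draws Laplace noise with that scale and adds it to a sum. It is a plain, centralized Laplace mechanism given only as an illustration; it does not reproduce the distributed noise generation among the MWs, and the function names are hypothetical.

```python
import random
from typing import Sequence


def laplace_scale(global_sensitivity: float, epsilon: float) -> float:
    """Scale lambda = GS / epsilon of the Laplace noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return global_sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a zero-mean Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_sum(values: Sequence[float], global_sensitivity: float,
              epsilon: float, rng: random.Random) -> float:
    """Sum of the values perturbed by a single Laplace draw (illustrative only)."""
    scale = laplace_scale(global_sensitivity, epsilon)
    return sum(values) + sample_laplace(scale, rng)
```

A smaller ϵ gives a larger scale and therefore noisier aggregates; for instance, a global sensitivity of 7 with ϵ = 0.5 yields a scale of 14.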
6.3. Data integrity
In our scheme, the cloud can easily detect whether a report has been modified or interrupted by any adversary. Each report is signed at each time t using the secure hash function H() and the participant's private key.
● Resilience against modification attacks:
Proof: Assume that the adversary modifies {c}_{i, t} and {\sigma }_{i, t} into {c}_{i, t}' and {\sigma }_{i, t}' , respectively. The modified message passes the verification step if and only if {\sigma }_{i, t}' is guessed correctly. However, GDH group theory posits that it is infeasible for the adversary to determine {\sigma }_{i, t}' from e\left({\sigma }_{i, t}', {g}_{2}\right) = e\left(H\left(t||{c}_{i, t}'\right), {pk}_{i}\right) since {G}_{1} is a GDH group. Additionally, for the given {\sigma }_{i, t}' , it is infeasible to extract {c}_{i, t}' from e\left({\sigma }_{i, t}', {g}_{2}\right) = e\left(H\left(t||{c}_{i, t}'\right), {pk}_{i}\right) due to the features of the secure hash function and the GDH group.
Therefore, when the adversary tries to transmit a modified message {c}_{i, t}' to the cloud, the modification can be detected by the cloud. As a result, our proposed scheme is resilient against modification attacks.
● Resilience against impersonation attacks:
Proof: To impersonate {\mathrm{U}}_{\mathrm{i}} , the adversary must know the private key {\mathrm{s}\mathrm{k}}_{\mathrm{i}} . Given the public key {\mathrm{p}\mathrm{k}}_{\mathrm{i}} and the signature {{\rm{ \mathsf{ σ} }}}_{\mathrm{i}, \ \mathrm{t}} = {\mathrm{H}\left(\mathrm{t}||{\mathrm{c}}_{\mathrm{i}, \ \mathrm{t}}\right)}^{{\mathrm{s}\mathrm{k}}_{\mathrm{i}}} , it is intractable to find {\mathrm{s}\mathrm{k}}_{\mathrm{i}} in polynomial time due to the discrete logarithm assumption in {\mathrm{G}}_{1} .
● Resilience against replay attacks:
Proof: The adversary launches a replay attack by sending, at time {\mathrm{t}}_{2} , the ciphertext {\mathrm{c}}_{\mathrm{i}, 1} with the signature {{\rm{ \mathsf{ σ} }}}_{\mathrm{i}, 1} , both of which were already used at time {\mathrm{t}}_{1} , where {\mathrm{t}}_{1}<{\mathrm{t}}_{2} . This can be detected by the cloud because the signature binds the time period: \mathrm{e}\left({{\rm{ \mathsf{ σ} }}}_{\mathrm{i}, 1}, {\mathrm{g}}_{2}\right) = \mathrm{e}\left(\mathrm{H}\left({\mathrm{t}}_{1}||{\mathrm{c}}_{\mathrm{i}, 1}\right), {\mathrm{p}\mathrm{k}}_{\mathrm{i}}\right) , which does not match the value \mathrm{e}\left(\mathrm{H}\left({\mathrm{t}}_{2}||{\mathrm{c}}_{\mathrm{i}, 1}\right), {\mathrm{p}\mathrm{k}}_{\mathrm{i}}\right) that the cloud computes at time {\mathrm{t}}_{2} .
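The idea that binding the time period into the signed message defeats replay can be illustrated with a short Python sketch. As a loudly stated assumption, it uses a keyed tag from Python's standard hash-based `hmac` module as a stand-in; this is neither the pairing-based signature of the scheme nor the homomorphic MAC of Step 3, and the key is shared rather than public. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, t: int, ciphertext: bytes) -> bytes:
    """Tag over (t || ciphertext); a stand-in for the signature on H(t || c)."""
    message = t.to_bytes(8, "big") + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, t: int, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag for the current time period and compare in constant time."""
    return hmac.compare_digest(sign(key, t, ciphertext), tag)


# A report signed at time period 1 verifies at period 1 ...
key = b"demo-key-for-illustration-only"
report = b"ciphertext-bytes"
tag_t1 = sign(key, 1, report)
accepted_at_t1 = verify(key, 1, report, tag_t1)

# ... but the same (ciphertext, tag) pair replayed at period 2 is rejected,
# because the verifier hashes the current time period into the message.
accepted_at_t2 = verify(key, 2, report, tag_t1)
```

The design point is only that the verifier, not the sender, supplies the time period used in the check, so a stale pair cannot pass.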
6.4. Robustness
To achieve robustness and node failure resistance in our scheme, we utilize a future ciphertext mechanism that requires low memory expenses. In the case of node failure, the cloud can still compute the aggregation and allows the HC to verify the correctness of aggregation. This in turn guarantees fault tolerance and robustness.
6.5. Correctness of the verification process
We use HMAC to ensure the correctness of the obtained aggregation result. First, during each time period t, the cloud computes the sum {\rm{ \mathsf{ σ} }} of the HMAC values for all received data and forwards this sum to the HC with the aggregation result {\rm{ \mathsf{ μ} }} . Then, the HC computes the HMAC for the aggregation result {\rm{ \mathsf{ μ} }} and checks whether the equation below holds:
Therefore, if the adversary tampers with the aggregation result, this tampering can be detected by the HC. Moreover, Table 3 demonstrates a comparison between the security features of our proposed scheme and those of other works [10,12,26].
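The check "sum of tags equals the tag of the sum" can be illustrated with a toy additively homomorphic MAC in Python. This is a sketch under stated assumptions and not the construction used in the scheme: the tag form a·x + PRF(k, i, t) mod p, the prime `P`, the shared secrets `a` and `key`, and all function names are hypothetical choices made for illustration.

```python
import hashlib
import hmac
from typing import Iterable

P = 2**61 - 1  # a Mersenne prime used as the toy modulus (assumption)


def prf(key: bytes, node: int, t: int) -> int:
    """Pseudorandom pad derived from (node, t), reduced modulo P."""
    message = node.to_bytes(4, "big") + t.to_bytes(8, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def tag(a: int, key: bytes, node: int, t: int, x: int) -> int:
    """Tag of reading x from `node` at time t: a*x + PRF(key, node, t) mod P."""
    return (a * x + prf(key, node, t)) % P


def combine(tags: Iterable[int]) -> int:
    """The aggregator adds tags without knowing a or key."""
    return sum(tags) % P


def verify_sum(a: int, key: bytes, nodes: Iterable[int], t: int,
               total: int, combined_tag: int) -> bool:
    """The verifier recomputes the tag that the claimed total should carry."""
    expected = (a * total + sum(prf(key, n, t) for n in nodes)) % P
    return expected == combined_tag


# Three readings (e.g., temperatures in tenths of a degree) at time period 7.
a, key, t = 123456789, b"toy-secret", 7
readings = {1: 366, 2: 371, 3: 359}
sigma = combine(tag(a, key, n, t, x) for n, x in readings.items())
total = sum(readings.values())
ok = verify_sum(a, key, readings, t, total, sigma)
tampered_ok = verify_sum(a, key, readings, t, total + 1, sigma)
```

Because the tag is linear in x, the combined tag matches only the true total; changing the claimed total by any amount that is nonzero modulo P makes the check fail.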
7. Performance analysis
Our proposed scheme is implemented based on the homomorphic scheme developed by Lepoint and Naehrig [32] using the Fast Library for Number Theory (FLINT) arithmetic library and the GNU Multiple Precision (GMP) math library. Our simulation experiments and benchmark tests are executed on a laptop with an Intel Core i5 processor, 6 GB of RAM and the Windows 7 (64-bit) operating system. We also implement the scheme of Won et al. [10] for comparison. The performance results are reported in milliseconds.
Additionally, we assume that the health data are processed by the patient's mobile phone (MW). Encryption is performed by the MW before sending the data to the cloud, and decryption is performed by the HC after it receives the encrypted result from the cloud. The cloud receives the encrypted data, computes the summation of these encrypted data, and forwards the encrypted results to the HC. The size of the encrypted dataset is relatively small, as our scheme focuses on lightweight health data. Our simulation dataset consists of randomly generated human body temperature readings between 35 and 42.
7.1. Cost of key generation
First, we compare the key generation costs of an MW in our scheme with those in the scheme of Won et al. [10] by changing the security bit to examine the key generation costs at different security levels. Table 4 shows the parameter sets used in our benchmarks. We choose these parameters based on [37]. The comparison is shown in Figures 4 and 5 for the real-time and CPU time flags yielded by benchmark testing. For this comparison, we calculate the cost based on a group of 100 MWs. The graphs plotted in Figures 4 and 5 indicate that the required time for key generation in our scheme is lower than that in the scheme of Won et al. [10] in terms of both real time and CPU time. The real time required for key generation in the scheme of Won et al. [10] is three times higher than that required in our scheme, and in terms of CPU time our scheme is four times faster than that of Won et al. [10]. The key generation cost is critical to the MW, as a lower cost for key generation leads to a longer battery life.
7.2. Cost of encryption
We also simulate the costs of encryption incurred by an MW when each group of our scheme has 100 participants (U) and compare the calculated costs with those of the scheme of Won et al. [10] at different security levels by changing the security bit. The simulation results are shown in Figures 6 and 7 for the real-time and CPU time flags yielded by benchmark testing. As shown in Figures 6 and 7, the encryption time of our scheme is lower than that of the scheme developed by Won et al. [10]. In terms of both real time and CPU time, the encryption cost of our scheme is six times lower than that of the scheme of Won et al. [10]. The low efficiency of the Won et al. [10] scheme is attributed to its encryption mechanism, where each participant in each time period t must communicate with all partners from the same group to exchange the secret keys {sk}_{i, \ t} to be used as the encryption key. Reducing the encryption cost in the scheme of Won et al. [10] would require reducing the number of participants (U) in each group. This reduction would decrease the privacy level of the data and result in a reduced security level, increasing the opportunity for the adversary to attack and disclose the data.
7.3. Low aggregation error
To make our scheme more practical, we utilize the future ciphertext mechanism proposed by Won et al. [10] to guarantee fault tolerance at the expense of two main requirements. If failures occur, the cloud can still calculate the aggregation result and the corresponding data integrity verification value. To evaluate our fault tolerance protocol, we use the root mean square error (RMSE) to measure how close the noisy sum is to the actual sum of the data sequence. Figure 8 shows the simulation result of our proposed scheme, where p is the probability of failure for MWs. The error in our scheme is significantly lower than that in the scheme developed by Won et al. [10].
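For reference, the RMSE between a sequence of actual sums and the corresponding noisy sums can be computed as in the following Python sketch. The function name and the sample values are hypothetical and are not taken from the experiments.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], noisy: Sequence[float]) -> float:
    """Root mean square error between actual and noisy aggregation results."""
    if len(actual) != len(noisy) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - n) ** 2 for a, n in zip(actual, noisy)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical per-period sums and their noisy counterparts.
actual_sums = [3650.0, 3662.0, 3648.0]
noisy_sums = [3653.0, 3658.0, 3648.0]
error = rmse(actual_sums, noisy_sums)
```

A value of zero means the noisy sums coincide with the actual sums; larger values indicate that noise and substituted future ciphertexts have moved the result further from the true aggregate.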
8. Conclusions and future work
We propose a fault-tolerant privacy-preserving cloud-based data aggregation scheme for lightweight health data. Our proposed scheme takes advantage of the numerous capabilities of the cloud by enabling an HC to delegate data aggregation tasks to the cloud. In our proposed scheme, we implement YASHE to protect the patient's identity and privacy, which enables the cloud to calculate the aggregation result with encrypted data. For differential privacy, we distribute noise among the MWs. Our scheme enables the HC to verify the correctness of the aggregation result, and its fault tolerance mechanism is proactive and based on a future ciphertext mechanism. For increased efficiency, we enable the HC to control the number of acceptable malfunctioning nodes.
Compared with the scheme of Won et al. [10], the aggregation process in our scheme has a lower aggregation error and is not affected by the number of malfunctioning nodes. In addition, the performance evaluation shows that the computational overhead is significantly reduced. Unlike in the scheme of Won et al. [10], the encryption time in our scheme is not affected by the number of participants. The simulation results demonstrate the efficiency and feasibility of our scheme. In future work, we will improve our scheme to support multifunctional health data aggregation. Additionally, we will apply batch verification instead of individually verifying the reported data, which will improve the performance of the scheme.
Acknowledgments
This research project was supported by a grant from the Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.
Conflict of interest
All authors declare no conflicts of interest in this paper.