Citation: Weidong Bao, Wenhua Xiao, Haoran Ji, Chao Chen, Xiaomin Zhu, Jianhong Wu. Towards big data processing in clouds: An online cost-minimization approach[J]. Big Data and Information Analytics, 2016, 1(1): 15-29. doi: 10.3934/bdia.2016.1.15
The cloud computing paradigm offers a convenient way for users to dynamically adjust the computing resources they rent from cloud service providers (CSPs) according to demand, in a Pay-As-You-Go (PAYG) manner. Specifically, thanks to the development of virtualization technology [3], virtual machine (VM) resources can be scaled up and down to match application demands. Compared with traditional approaches, the cloud computing paradigm eliminates users' costs of purchasing and maintaining their own infrastructure.
The elastic, on-demand nature of resource provisioning attracts many users to deploy their applications in the clouds, especially computation-intensive big data analysis. In the age of big data, data analysis is increasingly important for applications such as financial analysis, social networking web sites, and astronomical telescope services. For example, Facebook-like social media sites can uncover usage patterns and hidden correlations by analyzing their history records (e.g., click records and activity records) to support marketing decisions. We call this kind of organization a Data Service Provider (DSP) in this paper. Under this paradigm, a DSP must first solve two problems: 1) how to transfer large-scale data sets from various locations into the clouds, and 2) how many resources, such as computing and storage resources, should be rented in the clouds for processing.
Although much effort has been devoted to designing computing models for fast big data analysis, such as MapReduce [6] and Spark [27], the problems of moving large-scale data to the clouds and simultaneously provisioning adequate resources in the clouds are rarely considered in the community. Currently, for the data moving problem, practices such as copying the data onto large hard drives for physical transportation [2,15], or even moving entire machines [1] to datacenters, are adopted. These methods not only incur undesirable delays but are also unsafe, since hard drives may be damaged in transit. For the resource provisioning problem, some work has been done to cope with dynamic workloads in clouds [16,21]. However, these methods often consider the data moving problem and the resource provisioning problem in isolation.
In this paper, targeting the analysis of big data from different locations with a MapReduce-like framework in the clouds, we propose an online approach that systematically addresses the data moving problem and the resource provisioning problem, with the goal of minimizing the overall cost of running big data analytics in the clouds. To achieve this goal, we first formulate the problem as a joint stochastic optimization problem and then apply the Lyapunov optimization framework. Such an approach does not require predicting future system states and makes decisions based only on the current system state [13]. Based on the drift-plus-penalty transformation, we propose an online algorithm that moves data from multiple regions to distributed datacenters and dynamically rents a near-optimal amount of computing and storage resources to satisfy user requirements for data analysis.
The major contributions of this work are summarized as follows:
● We propose a novel framework that systematically handles data moving from multiple locations to multiple datacenters and resource renting in each datacenter in a nearly optimal manner. In particular, we consider the bandwidth cost, computing cost, storage cost and delay cost as the overall cost and guarantee the data can be processed within a desirable delay. In our framework, VMs in the cloud have different types and are priced dynamically.
● We propose an algorithm that solves the joint stochastic problem using the Lyapunov optimization framework and makes resource renting and data moving decisions online. Moreover, the algorithm admits a distributed implementation.
● We analyze the performance of the algorithm theoretically. The analysis shows that the algorithm approximates the optimal solution within provable bounds and is capable of processing the tasks within a preset delay.
The remainder of this paper is organized as follows: Section 2 summarizes related works; Section 3 describes the system modeling and the problem formulation; Section 4 gives the online algorithm for solving the problem; Section 5 analyzes the proposed algorithm; Section 6 concludes the paper.
Recent years have witnessed the proliferation of cloud-based services in both academia and industry. Much effort has been made to migrate applications such as cloud-based live streaming [9,21], cloud-based online games [18], cloud-based conferencing [7] and social media applications [24] into clouds. The majority of these studies focus on how to scale resources in the clouds up and down to meet user demand, or on how to migrate workflows into clouds.
Few studies have addressed moving large-scale data into clouds. Paper [4] studied how to transfer data to the cloud provider via the Internet and courier services. Study [5] proposed a solution to minimize the transfer latency under a budget constraint. In [11], the authors studied data stream storage for real-time big data processing. Unlike our study, these works deal with the data transfer problem in a static scenario in which the data amount is fixed, while our work considers dynamically generated data. In addition, the aforementioned studies considered a single datacenter, while our work takes multiple datacenters into account. The most relevant work is Zhang et al. [28], which proposed an online algorithm to migrate dynamically generated data from various locations to the clouds for processing. However, our work differs significantly since we consider resource provisioning and data moving simultaneously and apply the Lyapunov framework to address the problem.
There is also a line of research on resource provisioning in clouds. In the clouds, the server pool and the capacity of each server become elastic. Studies [16,12] considered elastic server capacity supported by virtualization technologies. Work [16] proposed an adaptive request allocation and service capacity scaling mechanism, mainly to cope with flash crowds. Study [23] took the VM renting cost and the storage cost into account when making scheduling decisions. Unlike these works, which often need certain mechanisms to predict future workloads, our work does not rely on any future information about big data tasks since the Lyapunov optimization framework is adopted. Studies on how to schedule tasks with different objectives in clouds have also been conducted. Works [29,31] proposed efficient scheduling strategies for real-time tasks with energy minimization, while studies [31,20] developed task scheduling algorithms that take fault tolerance into consideration. These works are often restricted to a single datacenter.
In addition, the Lyapunov optimization technique was first proposed in [17] to address the network stability problem and was then introduced into cloud computing to deal with job admission and resource allocation problems [19,10]. Yao et al. [26] extended it from a single time scale to two time scales to reduce electricity cost in geographically distributed datacenters. Recently, this approach has been used for resource management in cloud-based video services [22,25]. In our work, we use this approach to simultaneously address data moving from multiple locations to multiple datacenters and resource provisioning in each datacenter.
To summarize, our work differs from existing works as follows. 1) We address the problems of data moving and resource provisioning systematically and design an online algorithm that can be implemented in a distributed manner. 2) With the Lyapunov framework, our method does not rely on predicting the future big data processing workload, which differs significantly from the assumptions made in [21,23].
We consider the system scenario presented in Fig. 1: a DSP (e.g., a global astronomical telescope department) manages multiple geographically distributed data locations that continuously produce large volumes of data. The DSP deploys its data analytics application in the cloud and connects the data sources to different datacenters located in multiple places. All the data are moved to the datacenters and processed in the corresponding datacenter with a distributed computing model such as the MapReduce framework. In this system, the DSP observes the state of each datacenter (e.g., VM price, datacenter load state, network state) and decides the amount of data to be moved to each datacenter and the amount of resources to be rented from each datacenter, with the aim of minimizing cost. Finally, the datacenters return the analysis results to the DSP after the data have been processed and analyzed.
Formally, considering the geo-distributed datacenters set
The system operates according to time slots, denoted by
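| $D$ | set of datacenters distributed over multiple regions |
| $R$ | set of data locations |
| $K$ | set of VM types |
| $a_r(t)$ | amount of data generated from region $r$ in time slot $t$ |
| $A_r^{\max}$ | max amount of data generated from region $r$ |
| $\lambda_r^d(t)$ | amount of data allocated to datacenter $d$ from region $r$ in time slot $t$ |
| $n_d^{k,\max}$ | max number of VMs of type-$k$ in datacenter $d$ |
| $n_d^k(t)$ | number of type-$k$ VMs rented in datacenter $d$ in time slot $t$ |
| $p_d^k(t)$ | price of a type-$k$ VM in datacenter $d$ in time slot $t$ |
| $s_d$ | price of storage in datacenter $d$ |
| $b_r^d$ | price of bandwidth between location $r$ and datacenter $d$ |
| $v_k$ | data processing rate of a type-$k$ VM |
| $\varepsilon_d$ | preset constant for controlling queueing delay in datacenter $d$ |
| $l$ | max delay of data processing |
| $H_d(t)$ | unprocessed data in datacenter $d$ in time slot $t$ |
| $Z_d(t)$ | virtual queue associated with $H_d(t)$ |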
In this subsection, we first formulate the cost incurred in the system and then define the objective of the problem mathematically.
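As mentioned above, the system runs in a time-slotted fashion and data are dynamically generated in different regions in each time slot. Let $a_r(t)$ denote the amount of data generated at location $r$ in time slot $t$, bounded by $A_r^{\max}$, and let $\lambda_r^d(t)$ denote the amount of that data allocated to datacenter $d$. Then: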
$$a_r(t) \le A_r^{\max}, \quad \forall r,\ t \in [1, T]. \tag{1}$$
$$a_r(t) = \sum_{d \in D} \lambda_r^d(t), \quad \forall r,\ t \in [1, T]. \tag{2}$$
The goal of the DSP is to minimize the overall cost incurred in the system by optimizing the amount of data allocated to each datacenter and the amount of resources rented. Specifically, the following cost components are considered in this paper: bandwidth cost, latency cost, storage cost and computing cost. Each cost component is defined as follows.
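(1) Bandwidth cost. The bandwidth price usually varies across VPN links because the links often belong to different Internet service providers. Let $b_r^d$ denote the price of bandwidth between location $r$ and datacenter $d$. The bandwidth cost in time slot $t$ is: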
$$C_b(t) = \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot b_r^d. \tag{3}$$
(2) Storage cost. Storage cost is an important factor in choosing a datacenter for data analytics, since big data applications usually store large amounts of data. Let $s_d$ denote the price of storage in datacenter $d$. The storage cost in time slot $t$ is:
$$C_s(t) = \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot s_d. \tag{4}$$
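(3) Computing cost. Because VM prices vary over time slots, the number of VMs rented from each datacenter has an important impact on the overall cost of the system as well as on the QoS of the big data application. Let $n_d^k(t)$ denote the number of type-$k$ VMs rented in datacenter $d$ in time slot $t$ and $p_d^k(t)$ the corresponding price. The computing cost in time slot $t$ is: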
$$C_p(t) = \sum_{d \in D} \sum_{k \in K} n_d^k(t) \cdot p_d^k(t). \tag{5}$$
(4) Latency cost. The latency incurred by uploading data to the datacenters is an important performance measure, which is to be minimized in the data moving process.
$$C_l(t) = \sum_{d \in D} \sum_{r \in R} \alpha \cdot \lambda_r^d(t) \cdot L_r^d(t), \tag{6}$$
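where $L_r^d(t)$ is the latency between location $r$ and datacenter $d$ and $\alpha$ is the weight that converts latency into monetary cost.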
Based on the above cost formulation, the overall cost incurred in the system can be derived as:
$$C(t) = C_p(t) + C_s(t) + C_b(t) + C_l(t). \tag{7}$$
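As an illustration of how the cost components (3)-(7) combine in a single time slot, the following Python sketch evaluates them for given decisions. The function name `slot_cost` and the dictionary-based data layout are assumptions made for this example only and are not part of the original formulation.

```python
def slot_cost(lam, n, price, storage_price, bandwidth_price, latency, alpha):
    """Evaluate the one-slot costs of Eqs. (3)-(7).

    lam[(d, r)]             : data moved from location r to datacenter d
    n[(d, k)]               : number of type-k VMs rented in datacenter d
    price[(d, k)]           : price of a type-k VM in datacenter d
    storage_price[d]        : storage price in datacenter d
    bandwidth_price[(d, r)] : bandwidth price between r and d
    latency[(d, r)]         : latency between r and d
    alpha                   : weight of the latency term (hypothetical value)
    """
    c_b = sum(x * bandwidth_price[dr] for dr, x in lam.items())   # Eq. (3)
    c_s = sum(x * storage_price[dr[0]] for dr, x in lam.items())  # Eq. (4)
    c_p = sum(cnt * price[dk] for dk, cnt in n.items())           # Eq. (5)
    c_l = sum(alpha * x * latency[dr] for dr, x in lam.items())   # Eq. (6)
    return c_p, c_s, c_b, c_l, c_p + c_s + c_b + c_l              # Eq. (7)


# Toy instance: one location r1, two datacenters, one VM type.
lam = {("d1", "r1"): 10.0, ("d2", "r1"): 0.0}
n = {("d1", "small"): 2}
price = {("d1", "small"): 0.5}
storage_price = {"d1": 0.1, "d2": 0.2}
bandwidth_price = {("d1", "r1"): 0.05, ("d2", "r1"): 0.02}
latency = {("d1", "r1"): 2.0, ("d2", "r1"): 1.0}

costs = slot_cost(lam, n, price, storage_price, bandwidth_price, latency, alpha=0.01)
```

In this toy instance all data from `r1` go to `d1`, so only `d1` contributes to the bandwidth, storage and latency terms.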
Hence, the problem of minimizing the time-average cost of data moving and processing over a long-term period of $T$ time slots can be formulated as:
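$$\textbf{P1.}\quad \min:\ \bar{C} \tag{8}$$
$$\text{s.t.:}\quad a_r(t) \le A_r^{\max}, \quad \forall r,\ t \in [1, T] \tag{9}$$
$$a_r(t) = \sum_{d \in D} \lambda_r^d(t), \quad \forall r,\ t \in [1, T] \tag{10}$$
$$0 \le n_d^k(t) \le n_d^{k,\max}, \quad \forall d,\ \forall k,\ t \in [1, T] \tag{11}$$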
where
Since the data generation is unknown in advance, the problem formulated above is a constrained stochastic optimization problem. Our objective is to minimize the long-term average cost by optimizing the amount of data allocated to each datacenter as well as the number of VMs rented in each datacenter. To deal with this problem, a recently developed optimization technique is adopted in this paper. The details of the solution based on the Lyapunov optimization framework are presented in the next section.
In this section, we exploit the Lyapunov optimization theory to design our online control algorithm. An outstanding feature of this method is that it does not require future information about workload. By greedily minimizing the drift-plus-penalty in each time slot, it also can be proved to approach a time averaged cost that is arbitrarily close to optimum, while still maintaining system stability.
According to the standard optimization framework theory in [13], we first transform the problem P1 to an optimization problem of minimizing the Lyapunov drift-plus-penalty term and then design the corresponding online algorithm.
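Let $H_d(t)$ denote the amount of unprocessed data in datacenter $d$ in time slot $t$. Its queueing dynamics are: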
$$H_d(t+1) = \max\Big[H_d(t) - \sum_{k \in K} n_d^k(t) \cdot v_k,\ 0\Big] + \sum_{r \in R} \lambda_r^d(t). \tag{12}$$
The above queue update implies that the amounts of departed data and newly arrived data are $\sum_{k \in K} n_d^k(t) \cdot v_k$ and $\sum_{r \in R} \lambda_r^d(t)$, respectively.
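To guarantee that the worst-case queueing delay in queue $H_d(t)$ is bounded, we introduce a virtual queue $Z_d(t)$ associated with $H_d(t)$, which evolves as: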
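$$Z_d(t+1) = \max\Big[Z_d(t) + 1_{\{H_d(t)>0\}}\Big(\varepsilon_d - \sum_{k \in K} n_d^k(t) \cdot v_k\Big) - 1_{\{H_d(t)=0\}} \sum_{k \in K} n_d^{k,\max} \cdot v_k,\ 0\Big], \tag{13}$$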
where the indicator function $1_{\{\cdot\}}$ equals 1 when its condition holds and 0 otherwise.
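The queue dynamics (12) and (13) can be sketched for a single datacenter as follows. This is a minimal illustration under the assumption that the per-slot service capacity $\sum_k n_d^k(t) v_k$ and the maximum capacity $\sum_k n_d^{k,\max} v_k$ have already been computed; the function and argument names are hypothetical.

```python
def update_queues(H, Z, served_capacity, max_capacity, arrivals, eps):
    """Advance the actual queue H and the virtual queue Z by one time slot.

    H, Z            : current backlogs H_d(t) and Z_d(t)
    served_capacity : sum_k n_d^k(t) * v_k, data processed in this slot
    max_capacity    : sum_k n_d^{k,max} * v_k, largest possible service
    arrivals        : sum_r lambda_r^d(t), data arriving in this slot
    eps             : preset constant epsilon_d
    """
    # Eq. (12): serve first, then add the new arrivals.
    H_next = max(H - served_capacity, 0.0) + arrivals
    # Eq. (13): Z grows by eps minus the service while H is non-empty,
    # and is drained at the maximum rate when H is empty.
    if H > 0:
        Z_next = max(Z + eps - served_capacity, 0.0)
    else:
        Z_next = max(Z - max_capacity, 0.0)
    return H_next, Z_next


busy = update_queues(H=10.0, Z=3.0, served_capacity=4.0,
                     max_capacity=8.0, arrivals=5.0, eps=2.0)
idle = update_queues(H=0.0, Z=3.0, served_capacity=0.0,
                     max_capacity=8.0, arrivals=5.0, eps=2.0)
```

The two calls exercise both branches of the indicator in (13): a non-empty queue and an empty one.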
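Let $\Theta(t) = [H(t), Z(t)]$ denote the concatenated vector of the actual and virtual queues. We define the Lyapunov function as: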
$$L(\Theta(t)) = \frac{1}{2} \sum_{d \in D} \big\{ Z_d(t)^2 + H_d(t)^2 \big\}, \tag{14}$$
where
$$\Delta(\Theta(t)) = E\{ L(\Theta(t+1)) - L(\Theta(t)) \mid \Theta(t) \}. \tag{15}$$
In the Lyapunov optimization framework, the drift-plus-penalty expression is obtained by adding the cost incurred by the system to the above Lyapunov drift, namely,
$$\Delta(\Theta(t)) + V \cdot E\{ C(t) \mid \Theta(t) \}, \tag{16}$$
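where $V \ge 0$ is a control parameter that weights the cost against the queue backlog. Problem P1 is thus transformed into: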
$$\textbf{P2.}\quad \min:\ (16) \tag{17}$$
$$\text{s.t.:}\quad (9), (10), (11). \tag{18}$$
To solve problem P2, rather than directly minimizing the drift-plus-penalty expression (16), we seek to minimize an upper bound on it, which does not undermine the optimality and performance of the algorithm according to [13]. Therefore, the key is to find an upper bound on expression (16). It can be proved that (16) is bounded as:
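$$\Delta(\Theta(t)) + V \cdot E\{ C(t) \mid \Theta(t) \} \le B + E\Big\{ \sum_{d \in D} \sum_{k \in K} n_d^k(t) \cdot \big( V p_d^k(t) - H_d(t) v_k - Z_d(t) v_k \big) \,\Big|\, \Theta(t) \Big\} + E\Big\{ \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot \big( V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big) \,\Big|\, \Theta(t) \Big\}, \tag{19}$$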
where
Fortunately, a careful investigation of the R.H.S of inequality (19) reveals that the optimization problem can be equivalently decoupled into two subproblems: 1) data allocation and 2) resource provisioning. The details of solving the two subproblems are presented as follows.
1) Data Allocation: To minimize the R.H.S of (19), by observing the relationship among variables, the part related to Data Allocation can be extracted from the R.H.S of (19) as:
$$E\Big\{ \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot \big( V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big) \,\Big|\, \Theta(t) \Big\}. \tag{20}$$
Furthermore, since the data generated at each location are independent, the centralized minimization can be carried out independently at each location, in a distributed manner. Considering the data allocation at location $r$ in time slot $t$, we solve the following problem:
$$\min \sum_{d \in D} \lambda_r^d(t) \big[ V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big] \quad \text{s.t. } (9), (10). \tag{21}$$
In fact, the above problem is a generalized min-weight problem where the amount of data from location
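$$\lambda_r^d(t) = \begin{cases} a_r(t), & d = d^* \\ 0, & \text{else,} \end{cases} \tag{22}$$
where $d^* = \arg\min_{d \in D} \big[ V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big]$.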
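The min-weight rule in (21) and (22) can be sketched as follows for a single location $r$. This is an illustrative sketch, not the authors' implementation; the function name `allocate_data` and the per-location dictionaries are assumptions. Ties between datacenters are broken here in favor of the first datacenter in the list, which is also an assumption.

```python
def allocate_data(a_r, datacenters, V, s, b_r, L_r, H, alpha):
    """Send all data generated at one location to the min-weight datacenter.

    a_r         : amount of data generated at the location in this slot
    datacenters : list of datacenter identifiers
    s[d]        : storage price in d
    b_r[d]      : bandwidth price between the location and d
    L_r[d]      : latency between the location and d
    H[d]        : current backlog H_d(t)
    """
    # Weight of each datacenter, i.e. the bracket in Eq. (21).
    weights = {d: V * s[d] + V * b_r[d] + V * alpha * L_r[d] + H[d]
               for d in datacenters}
    d_star = min(datacenters, key=lambda d: weights[d])
    # Eq. (22): everything goes to d_star, nothing elsewhere.
    return {d: (a_r if d == d_star else 0.0) for d in datacenters}


allocation = allocate_data(
    a_r=8.0,
    datacenters=["d1", "d2"],
    V=10.0,
    s={"d1": 0.1, "d2": 0.2},
    b_r={"d1": 0.05, "d2": 0.02},
    L_r={"d1": 2.0, "d2": 1.0},
    H={"d1": 5.0, "d2": 1.0},
    alpha=0.01,
)
```

In this toy instance `d1` is cheaper to store in, but its larger backlog raises its weight, so the data are routed to `d2`.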
2) Resource Provisioning: The remaining part of the R.H.S. of (19), which depends on the variables $n_d^k(t)$, can be regarded as the resource provisioning problem:
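$$\min\ E\Big\{ \sum_{d \in D} \sum_{k \in K} n_d^k(t) \cdot \big( V p_d^k(t) - H_d(t) v_k - Z_d(t) v_k \big) \,\Big|\, \Theta(t) \Big\} \quad \text{s.t. } (11). \tag{23}$$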
Since the resource provisioning problems of different datacenters are independent, as in the data allocation problem, (23) can be solved within each datacenter in a distributed manner. For a single datacenter $d$, the problem becomes:
$$\min\ E\Big\{ \sum_{k \in K} n_d^k(t) \cdot \big( V p_d^k(t) - H_d(t) v_k - Z_d(t) v_k \big) \,\Big|\, \Theta(t) \Big\} \quad \text{s.t. } (11). \tag{24}$$
The optimal solution to the above linear problem is:
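$$n_d^k(t) = \begin{cases} n_d^{k,\max}, & \text{if } H_d(t) + Z_d(t) > \dfrac{V p_d^k(t)}{v_k} \\ 0, & \text{if } H_d(t) + Z_d(t) \le \dfrac{V p_d^k(t)}{v_k}. \end{cases} \tag{25}$$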
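The above solution indicates that type-$k$ VMs are rented in datacenter $d$ only when the combined backlog $H_d(t) + Z_d(t)$ exceeds the threshold $V p_d^k(t) / v_k$; otherwise no type-$k$ VM is rented.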
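The threshold rule (25) can be sketched for one datacenter as below. The function name `provision_vms` and the dictionaries keyed by VM type are assumptions for illustration. The comparison is written in the multiplied form $(H_d + Z_d)\, v_k > V p_d^k$, which is equivalent to (25) for $v_k > 0$ and avoids a division.

```python
def provision_vms(H_d, Z_d, V, prices, rates, n_max):
    """Decide how many VMs of each type to rent in one datacenter.

    H_d, Z_d  : current backlogs of the actual and virtual queues
    prices[k] : current price p_d^k(t) of a type-k VM
    rates[k]  : processing rate v_k of a type-k VM
    n_max[k]  : maximum number of type-k VMs available
    """
    decision = {}
    for k in prices:
        # Rent all available type-k VMs when the coefficient of n_d^k(t)
        # in Eq. (24) is negative, and none otherwise.
        if (H_d + Z_d) * rates[k] > V * prices[k]:
            decision[k] = n_max[k]
        else:
            decision[k] = 0
    return decision


vms = provision_vms(
    H_d=5.0, Z_d=3.0, V=10.0,
    prices={"small": 0.5, "large": 2.0},
    rates={"small": 1.0, "large": 2.0},
    n_max={"small": 4, "large": 2},
)
```

With these hypothetical numbers the backlog of 8 justifies the small type (threshold 5) but not the large type (threshold 10).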
With the Lyapunov framework, the two problems of data allocation and resource provisioning are thus solved efficiently by simple closed-form rules. This simplicity facilitates online deployment of the algorithm in real-world systems. The online algorithm is detailed in Algorithm 1.
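Algorithm 1: Procedures of the Proposed Online Algorithm in Time Slot $t$
  Input: $H_d(t)$, $Z_d(t)$, $p_d^k(t)$, $a_r(t)$, $s_d$, $b_r^d$, $L_r^d$, $v_k$, $n_d^{k,\max}$, $\varepsilon_d$, $V$, $\alpha$
  Output: $n_d^k(t)$, $\lambda_r^d(t)$
  Resource provisioning:
    foreach datacenter $d \in D$ do
      set $n_d^k(t)$ for every $k \in K$ according to (25)
  Data allocation:
    foreach location $r \in R$ do
      set $\lambda_r^d(t)$ for every $d \in D$ according to (22)
  Update the queues $H_d(t)$ and $Z_d(t)$ according to (12) and (13)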
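The per-slot procedure of Algorithm 1 can be sketched end to end as follows. This is a minimal sketch that combines rules (25) and (22) with the queue updates (12) and (13); the function name `online_step`, the dictionary layout and the toy numbers are assumptions, not the authors' implementation.

```python
def online_step(H, Z, a, p, s, b, L, v, n_max, eps, V, alpha):
    """Run one time slot and return (n, lam, H_next, Z_next).

    H[d], Z[d]  : backlogs;  a[r] : data generated at location r
    p[(d, k)]   : VM prices; n_max[(d, k)] : VM limits; v[k] : rates
    s[d], b[(d, r)], L[(d, r)] : storage price, bandwidth price, latency
    """
    D, K = list(H), list(v)
    # Resource provisioning, rule (25).
    n = {(d, k): (n_max[d, k] if (H[d] + Z[d]) * v[k] > V * p[d, k] else 0)
         for d in D for k in K}
    # Data allocation, rule (22): each location picks its min-weight datacenter.
    lam = {(d, r): 0.0 for d in D for r in a}
    for r, amount in a.items():
        d_star = min(D, key=lambda d: V * (s[d] + b[d, r] + alpha * L[d, r]) + H[d])
        lam[d_star, r] = amount
    # Queue updates, Eqs. (12) and (13).
    H_next, Z_next = {}, {}
    for d in D:
        service = sum(n[d, k] * v[k] for k in K)
        full = sum(n_max[d, k] * v[k] for k in K)
        H_next[d] = max(H[d] - service, 0.0) + sum(lam[d, r] for r in a)
        if H[d] > 0:
            Z_next[d] = max(Z[d] + eps[d] - service, 0.0)
        else:
            Z_next[d] = max(Z[d] - full, 0.0)
    return n, lam, H_next, Z_next


# Toy instance: two datacenters, one location, one VM type.
params = dict(
    a={"r1": 6.0},
    p={("d1", "k"): 1.0, ("d2", "k"): 1.0},
    s={"d1": 0.1, "d2": 0.3},
    b={("d1", "r1"): 0.1, ("d2", "r1"): 0.1},
    L={("d1", "r1"): 1.0, ("d2", "r1"): 1.0},
    v={"k": 2.0},
    n_max={("d1", "k"): 3, ("d2", "k"): 3},
    eps={"d1": 1.0, "d2": 1.0},
    V=1.0,
    alpha=0.0,
)
H0 = {"d1": 0.0, "d2": 0.0}
Z0 = {"d1": 0.0, "d2": 0.0}
n1, lam1, H1, Z1 = online_step(H0, Z0, **params)
n2, lam2, H2, Z2 = online_step(H1, Z1, **params)
```

In the first slot both queues are empty, so no VM is rented and the data go to the cheaper datacenter `d1`. In the second slot the backlog at `d1` triggers renting there and steers the new data to `d2`.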
Next, to show its merits, we theoretically analyze the performance of Algorithm 1 in terms of cost optimality, queueing delay bound, and the worst-case delay of data processing.
Theorem 5.1. (Cost Optimality) Suppose the data generation rate
$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} E\{ C(t) \} \le C^* + \frac{B}{V}, \tag{26}$$
where
Proof. Please see Appendix B.
This theorem shows that the gap between the time-average cost obtained by the algorithm proposed in this paper and the optimal cost obtained offline is within $B/V$, i.e., $O(1/V)$.
Theorem 5.2. (Queues Delay Bound) Assume
$$Z_d^{\max} = \frac{V p_d^{\max}}{v_{\min}} + \varepsilon_d, \tag{27}$$
and
$$H_d^{\max} = \frac{V p_d^{\max}}{v_{\min}} + \sum_{r \in R} A_r^{\max} \tag{28}$$
where
Proof. Please see Appendix C.
This theorem shows that the queue backlog is within $O(V)$.
Theorem 5.3. (Worst Case Delay) Assume that the system runs in a First-In-First-Out manner. Then the worst-case delay of data processing in queue $H_d(t)$ is bounded by
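$$l = \Big\lceil \frac{H_d^{\max} + Z_d^{\max}}{\varepsilon_d} \Big\rceil, \tag{29}$$
where $\lceil x \rceil$ denotes the smallest integer not less than $x$.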
Proof. Please see Appendix D.
This implies that the data arriving at any time slot $t$ are processed within $l$ time slots.
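To make the bounds of Theorems 5.2 and 5.3 concrete, the following sketch evaluates (27), (28) and (29) numerically. It reads (29) as the ceiling of $(H_d^{\max} + Z_d^{\max}) / \varepsilon_d$, which is an interpretation of the printed formula. The function names and all parameter values are hypothetical.

```python
import math


def queue_bounds(V, p_max, v_min, eps, A_max):
    """Return (Z_max, H_max) for one datacenter, following Eqs. (27) and (28).

    p_max : largest VM price in the datacenter
    v_min : smallest VM processing rate
    eps   : preset constant epsilon_d
    A_max : list of per-location maximum data amounts A_r^max
    """
    Z_max = V * p_max / v_min + eps
    H_max = V * p_max / v_min + sum(A_max)
    return Z_max, H_max


def worst_case_delay(H_max, Z_max, eps):
    """Worst-case delay in time slots, reading Eq. (29) as a ceiling."""
    return math.ceil((H_max + Z_max) / eps)


Z_max, H_max = queue_bounds(V=10.0, p_max=2.0, v_min=4.0, eps=5.0, A_max=[3.0, 2.0])
delay = worst_case_delay(H_max, Z_max, eps=5.0)
```

Both bounds grow linearly in $V$, whereas the cost gap in (26) shrinks as $B/V$, so $V$ trades cost against delay.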
Targeting the processing of big data from different locations in geo-distributed datacenters, we propose a systematic approach to data moving and resource provisioning with the goal of cost minimization. The model accounts for data analysis applications that run in a dynamic environment (e.g., unpredictable data generation and dynamic VM pricing). Using the Lyapunov technique, we transform the original problem into two independent subproblems that can be solved efficiently online. Theoretical analysis shows that the algorithm maintains the stability of the dynamic system and completes data processing within a bounded number of time slots. The effectiveness of the proposed algorithm remains to be validated through extensive experiments. Other considerations that may be incorporated into the framework include dependencies between data processing in consecutive time slots and migration of data processing among datacenters.
For the vectors of
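$$\begin{aligned} Z_d(t+1)^2 - Z_d(t)^2 &\le \Big\{ 1_{\{H_d(t)>0\}} \Big( \varepsilon_d - \sum_{k \in K} n_d^k(t) v_k \Big) - 1_{\{H_d(t)=0\}} \sum_{k \in K} n_d^{k,\max} v_k \Big\}^2 \\ &\quad + 2 Z_d(t) \Big\{ 1_{\{H_d(t)>0\}} \Big( \varepsilon_d - \sum_{k \in K} n_d^k(t) v_k \Big) - 1_{\{H_d(t)=0\}} \sum_{k \in K} n_d^{k,\max} v_k \Big\} \\ &\le (\varepsilon_d)^2 + \Big( \sum_{k \in K} n_d^{k,\max} v_k \Big)^2 + 2 Z_d(t) \Big( \varepsilon_d - \sum_{k \in K} n_d^k(t) v_k \Big) \end{aligned} \tag{30}$$
$$H_d(t+1)^2 - H_d(t)^2 \le \Big( \sum_{k \in K} n_d^k(t) v_k \Big)^2 + \Big( \sum_{r \in R} \lambda_r^d(t) \Big)^2 + 2 H_d(t) \Big( \sum_{r \in R} \lambda_r^d(t) - \sum_{k \in K} n_d^k(t) v_k \Big) \tag{31}$$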
Since
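$$\begin{aligned} \Delta(\Theta(t)) &= \frac{1}{2} \sum_{d \in D} E\{ H_d(t+1)^2 - H_d(t)^2 \mid \Theta(t) \} + \frac{1}{2} \sum_{d \in D} E\{ Z_d(t+1)^2 - Z_d(t)^2 \mid \Theta(t) \} \\ &\le B + \sum_{d \in D} E\Big\{ H_d(t) \Big( \sum_{r \in R} \lambda_r^d(t) - \sum_{k \in K} n_d^k(t) v_k \Big) \,\Big|\, \Theta(t) \Big\} + \sum_{d \in D} E\Big\{ Z_d(t) \Big( \varepsilon_d - \sum_{k \in K} n_d^k(t) v_k \Big) \,\Big|\, \Theta(t) \Big\} \end{aligned} \tag{32}$$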
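Finally, adding the term $V \cdot E\{ C(t) \mid \Theta(t) \}$ to both sides of (32) and grouping the terms by decision variable yields the bound in (19).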
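For concreteness, one admissible choice of the constant $B$ follows directly from the squared terms in (30) and (31). The expression below is a sketch under the assumptions $0 \le n_d^k(t) \le n_d^{k,\max}$ and $0 \le \lambda_r^d(t) \le A_r^{\max}$; it is not necessarily the constant used by the authors. The second line spells out the grouping of terms after $V \cdot C(t)$ is added. The last term in that grouping does not depend on the decision variables, so it does not affect the per-slot minimization.

```latex
% One admissible constant B, obtained by bounding the squared terms
% in (30) and (31) with the maxima of the decision variables:
B = \frac{1}{2} \sum_{d \in D} \Big[ \varepsilon_d^2
      + 2 \Big( \sum_{k \in K} n_d^{k,\max} v_k \Big)^2
      + \Big( \sum_{r \in R} A_r^{\max} \Big)^2 \Big]

% Grouping the linear terms of (32) plus V C(t) by decision variable:
\sum_{d \in D} \Big[ H_d(t) \Big( \sum_{r \in R} \lambda_r^d(t) - \sum_{k \in K} n_d^k(t) v_k \Big)
      + Z_d(t) \Big( \varepsilon_d - \sum_{k \in K} n_d^k(t) v_k \Big) \Big] + V C(t)
  = \sum_{d \in D} \sum_{k \in K} n_d^k(t) \big( V p_d^k(t) - H_d(t) v_k - Z_d(t) v_k \big)
  + \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \big( V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big)
  + \sum_{d \in D} Z_d(t) \varepsilon_d
```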
To prove this theorem, we first give the following lemma.
Lemma B.1. (Existence of Optimal Randomized Stationary Policy): There exists at least one policy
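$$E\{ C(t) \} = C^*, \quad E\Big\{ \sum_{r \in R} \lambda_r^{d,\pi}(t) \Big\} \le E\Big\{ \sum_{k \in K} n_d^{k,\pi}(t) v_k \Big\}, \quad \varepsilon_d \le E\Big\{ \sum_{k \in K} n_d^{k,\pi}(t) v_k \Big\} \tag{33}$$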
where
Based on Lemma B.1, we next prove the time-average cost bound of our algorithm in (26) as follows.
Proof. From Lemma B.1, it can be derived that there exists a constant $\delta > 0$ such that
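$$E\Big\{ \sum_{r \in R} \lambda_r^{d,\pi}(t) \Big\} \le E\Big\{ \sum_{k \in K} n_d^{k,\pi}(t) v_k \Big\} - \delta \tag{34}$$
$$\varepsilon_d \le E\Big\{ \sum_{k \in K} n_d^{k,\pi}(t) v_k \Big\} - \delta \tag{35}$$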
Therefore, recalling that our algorithm seeks to minimize the right-hand side of inequality (19) over all feasible decisions in each time slot, and applying Lemma B.1 together with (34) and (35) to (19), we obtain:
$$\Delta(\Theta(t)) + V \cdot E\{ C(t) \mid \Theta(t) \} \le B + V C^* - \delta \sum_{d \in D} E\{ H_d(t) \} - \delta \sum_{d \in D} E\{ Z_d(t) \} \tag{36}$$
Taking the expectation of (36) and using the law of iterated expectations, we have:
$$E\{ L(\Theta(t+1)) - L(\Theta(t)) \} + V \cdot E\{ C(t) \} \le B + V C^* - \delta \sum_{d \in D} E\{ H_d(t) \} - \delta \sum_{d \in D} E\{ Z_d(t) \} \tag{37}$$
Summing (37) over $t \in \{0, 1, \ldots, T-1\}$ with the law of telescoping sums and dividing both sides by $T$, we have:
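$$\frac{E\{ L(\Theta(T)) - L(\Theta(0)) \}}{T} + \frac{V}{T} \sum_{t=0}^{T-1} E\{ C(t) \} \le B + V C^* - \frac{\delta}{T} \sum_{t=0}^{T-1} \sum_{d \in D} E\{ H_d(t) \} - \frac{\delta}{T} \sum_{t=0}^{T-1} \sum_{d \in D} E\{ Z_d(t) \} \tag{38}$$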
Rearranging the terms and considering the fact that
$$\frac{1}{T} \sum_{t=0}^{T-1} E\{ C(t) \} \le C^* + \frac{B}{V} \tag{39}$$
Now (26) follows by taking the limit as $T \to \infty$.
Proof. For
Similarly, for
Proof. If there is a time slot
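$$Z_d(\tau+1) = \max\Big[ Z_d(\tau) + \varepsilon_d - \sum_{k \in K} n_d^k(\tau) \cdot v_k,\ 0 \Big] \ge Z_d(\tau) + \varepsilon_d - \sum_{k \in K} n_d^k(\tau) \cdot v_k \tag{40}$$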
Summing the above inequality over time slots
$$Z_d(t+l+1) \ge Z_d(t+1) + l \cdot \varepsilon_d - \sum_{\tau=t+1}^{t+l+1} \sum_{k \in K} n_d^k(\tau) \cdot v_k. \tag{41}$$
Hence, it can be derived that
$$\sum_{\tau=t+1}^{t+l+1} \sum_{k \in K} n_d^k(\tau) \cdot v_k \ge H_d^{\max} \ge H_d(t). \tag{42}$$
Since the system running in First-in-First-out model, the data arriving at time slot
[1] Moving an elephant: Large scale hadoop data migration at facebook, http://www.facebook.com/notes/paul-yang/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920.
[2] AWS Import/Export, http://aws.amazon.com/importexport/.
[3] P. Barham, B. Dragovic and K. Fraser, Xen and the art of virtualization, SIGOPS Operating Systems Review, 37(2003), 164-177.
[4] B. Cho and I. Gupta, New algorithms for planning bulk transfer via internet and shipping networks, in Proc. IEEE ICDCS, (2010), 305-314.
[5] B. Cho and I. Gupta, Budget-constrained bulk data transfer via internet and shipping networks, in Proc. ACM ICAC, (2011), 71-80.
[6] J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM, 51(2008), 107-113.
[7] Y. Feng, B. Li and B. Li, Airlift: Video conferencing as a cloud service using interdatacenter networks, in Proceedings of the IEEE International Conference on Network Protocols (ICNP'12), (2012), 1-11.
[8] L. Georgiadis, M. J. Neely and L. Tassiulas, Resource allocation and cross-layer control in wireless networks, Foundations and Trends in Networking, 1(2006), 1-144.
[9] Z. Huang, C. Mei, L. Li and T. Woo, CloudStream: Delivering high-quality streaming videos through a cloud-based SVC proxy, in Proceedings of the IEEE INFOCOM, (2011), 201-205.
[10] F. Liu, Z. Zhou, H. Jin, B. Li, B. Li and H. Jiang, On arbitrating the power-performance tradeoff in SaaS clouds, IEEE Transactions on Parallel and Distributed Systems, 25(2014), 2648-2658.
[11] X. Mo and H. Wang, Asynchronous index strategy for high performance real-time big data stream storage, in Network Infrastructure and Digital Content (IC-NIDC), (2012), 232-236.
[12] X. Nan, Y. He and L. Guan, Optimal resource allocation for multimedia cloud based on queuing model, in Proc. of IEEE MMSP Workshop, (2011), 1-6.
[13] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan and Claypool, 2010.
[14] M. J. Neely, Opportunistic scheduling with worst case delay guarantees in single and multi-hop networks, in Proc. of INFOCOM, (2011), 1728-1736.
[15] E. E. Schadt, M. D. Linderman, J. Sorenson, L. Lee and G. P. Nolan, Computational solutions to large-scale data management and analysis, Nat Rev Genet, 11(2010), 647-657.
[16] J. Tang, W. P. Tay and Y. Wen, Dynamic request redirection and elastic service scaling in cloud-centric media networks, IEEE Transactions on Multimedia, 16(2014), 1434-1445.
[17] L. Tassiulas and A. Ephremides, Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks, IEEE Transactions on Automatic Control, 37(1992), 1936-1948.
[18] C. Union, Homepage http://www.cloudunion.cn/.
[19] R. Urgaonkar, U. Kozat, K. Igarashi and M. J. Neely, Resource allocation and power management in virtualized data centers, in Proceedings of the IEEE Network Operations and Management Symp (NOMS'10), (2010), 479-486.
[20] J. Wang, W. Bao, X. Zhu, L. T. Yang and Y. Xiang, FESTAL: Fault-tolerant elastic scheduling algorithm for real-time tasks in virtualized clouds, IEEE Transactions on Computers, 64(2014), 2445-2558.
[21] F. Wang, J. Liu and M. Chen, CALMS: Cloud-assisted live media streaming for globalized demands with time/region diversities, in Proceedings of the IEEE INFOCOM, (2012), 199-207.
[22] D. Wu, Z. Xue and J. He, iCloudAccess: Cost-effective streaming of videogames from the cloud with low latency, IEEE Transactions on Circuits and Systems for Video Technology, 28(2014), 1405-1416.
[23] Y. Wu, C. Wu, B. Li, X. Qiu and F. C. M. Lau, Cloudmedia: When cloud on demand meets video on demand, in Proc. of IEEE ICDCS, (2011), 268-277.
[24] Y. Wu, C. Wu, B. Li, L. Zhang, Z. Li and F. Lau, Scaling social media applications into geo-distributed clouds, in Proc. IEEE INFOCOM, (2012), 684-692.
[25] W. Xiao, W. Bao, X. Zhu, C. Wang, L. Chen and L. T. Yang, Dynamic request redirection and resource provisioning for cloud-based video services under heterogeneous environment, IEEE Transactions on Parallel and Distributed Systems, pp (2015), p1.
[26] Y. Yao, L. Huang, A. B. Sharma, L. Golubchik and M. J. Neely, Power cost reduction in distributed data centers: A two-time-scale approach for delay tolerant workloads, IEEE Transactions on Parallel and Distributed Systems, 25(2014), 200-211.
[27] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, Spark: Cluster computing with working sets, in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10), Berkeley, CA, USA, (2010), p10.