Loading [MathJax]/jax/output/SVG/jax.js

Qualitative properties of mathematical model for data flow

  • Received: 01 November 2020 Revised: 01 May 2021 Published: 01 July 2021
  • 35L65, 35L40

  • In this paper, properties of a recently proposed mathematical model for data flow in large-scale asynchronous computer systems are analyzed. In particular, the existence of special weak solutions based on propagating fronts is established. Qualitative properties of these solutions are investigated, both theoretically and numerically.

    Citation: Cory D. Hauck, Michael Herty, Giuseppe Visconti. Qualitative properties of mathematical model for data flow[J]. Networks and Heterogeneous Media, 2021, 16(4): 513-533. doi: 10.3934/nhm.2021015

    Related Papers:

    [1] Cory D. Hauck, Michael Herty, Giuseppe Visconti . Qualitative properties of mathematical model for data flow. Networks and Heterogeneous Media, 2021, 16(4): 513-533. doi: 10.3934/nhm.2021015
    [2] Tong Li . Qualitative analysis of some PDE models of traffic flow. Networks and Heterogeneous Media, 2013, 8(3): 773-781. doi: 10.3934/nhm.2013.8.773
    [3] Giuseppe Maria Coclite, Lorenzo di Ruvo, Jan Ernest, Siddhartha Mishra . Convergence of vanishing capillarity approximations for scalar conservation laws with discontinuous fluxes. Networks and Heterogeneous Media, 2013, 8(4): 969-984. doi: 10.3934/nhm.2013.8.969
    [4] Christophe Chalons, Paola Goatin, Nicolas Seguin . General constrained conservation laws. Application to pedestrian flow modeling. Networks and Heterogeneous Media, 2013, 8(2): 433-463. doi: 10.3934/nhm.2013.8.433
    [5] Helge Holden, Nils Henrik Risebro . Follow-the-Leader models can be viewed as a numerical approximation to the Lighthill-Whitham-Richards model for traffic flow. Networks and Heterogeneous Media, 2018, 13(3): 409-421. doi: 10.3934/nhm.2018018
    [6] Frederike Kissling, Christian Rohde . The computation of nonclassical shock waves with a heterogeneous multiscale method. Networks and Heterogeneous Media, 2010, 5(3): 661-674. doi: 10.3934/nhm.2010.5.661
    [7] Mauro Garavello, Roberto Natalini, Benedetto Piccoli, Andrea Terracina . Conservation laws with discontinuous flux. Networks and Heterogeneous Media, 2007, 2(1): 159-179. doi: 10.3934/nhm.2007.2.159
    [8] Raimund Bürger, Stefan Diehl, M. Carmen Martí, Yolanda Vásquez . A difference scheme for a triangular system of conservation laws with discontinuous flux modeling three-phase flows. Networks and Heterogeneous Media, 2023, 18(1): 140-190. doi: 10.3934/nhm.2023006
    [9] Dong Li, Tong Li . Shock formation in a traffic flow model with Arrhenius look-ahead dynamics. Networks and Heterogeneous Media, 2011, 6(4): 681-694. doi: 10.3934/nhm.2011.6.681
    [10] Maria Laura Delle Monache, Paola Goatin . Stability estimates for scalar conservation laws with moving flux constraints. Networks and Heterogeneous Media, 2017, 12(2): 245-258. doi: 10.3934/nhm.2017010
  • In this paper, properties of a recently proposed mathematical model for data flow in large-scale asynchronous computer systems are analyzed. In particular, the existence of special weak solutions based on propagating fronts is established. Qualitative properties of these solutions are investigated, both theoretically and numerically.



    The increasing number and diversity of processing units in modern supercomputers presents a significant challenge to the understanding of how data is globally distributed in these systems. This issue is critically important in emerging computer architectures for which the cost of data movement is becoming a critical driver for system design, affecting hardware and software choices and even basic algorithms [12, 17, 18]. At the same time, these supercomputers are becoming increasingly difficult to model. Indeed, simulating the detailed state of a supercomputer requires an even larger computer, thereby making such approaches prohibitive for the largest systems.

    With the help of new mathematical models, a more efficient utilization of available processing units might be possible. Such models are necessarily coarse-grained approximations, used to understand the gross behavior of the system in order to make design, control, and optimization more tractable. In this spirit, a coarse-grained mathematical model has been recently derived in [5] to describe the movement of processed data in an extreme-scale computer system. This model focuses on a idealized setting, in which a lattice of processors perform operations in parallel and communicate with nearest neighbors. Although very simple, the model allows for local variations in processor speed, due to differences in hardware performance or local problem complexity, as well as asynchronously operation, meaning that processors manipulate and share data with their neighbors as they are able, as opposed to a staged process in which processors communicate only after they have all completed a step of assigned work.

    The model proposed in [5] takes the form of a PDE for the density of data at a particular processor location and stage in the global computation. It is derived formally from a microscopic ODE description. The notion of coarse-graining discrete and semi-discrete model of networks and related systems to derive PDE models is common mathematical strategy that is used in many applications, including vehicular [4, 11, 16] and pedestrian traffic flow [1, 6, 8, 15], supply chains [2, 3], bacterial movement [14] and machine learning applications [7].

    In the current work, we establish some qualitative properties of the coarse-grained PDE model in [5]. We focus specifically on the phenomena of front propagation and present simulation results that highlight the theoretical findings. In the context of the current application, these fronts describe the movement of data through the different stages of a computational task. The reason for this analysis is two-fold. First, it was observed in [5] that, after a transient phase, many initial conditions eventually relax to a front whose shape depends on local variations in the processor speed. Second, these fronts can be used to characterize three important features for application: the first time at which some portion of the data reaches a state of completion, the rate at which the rest of the data reaches this state, and the time at which all data reaches this stage. Moreover, the relatively simple form of these fronts enables the investigation of optimization and control strategies based on some application-specific metrics. As a first step in this direction, we explore how variations in the processor speed affect front behavior.

    In this section, we briefly summary the microscopic ODE model from [5] and the macroscopic model that is obtained from it. The microscopic model describes a computer system as a lattice of processors, the dimension of which can be arbitrary, but finite. For application purposes, only one-, two-, and three-dimensional lattices are realistic, and here we focus the one-dimensional case for simplicity. It is assumed in [5] that each processor performs the same task, which is broken into discrete stages. The rate at which a given processor moves data from stage to stage depends not only on its intrinsic processing rate, but also on the availability of usable data in the processor and its neighbors.

    Mathematically, the amount of data at time t that sits in stage k{1,,kmax} of processor i{1,,imax} is given by qi,k(t). In the absence of any throttling, the flow of data between stages is given by a (processor dependent) rate ai(t)0. However the actual flow may be reduced due to a lack of available data in a given processor or its nearest neighbors. Throttling due to a lack of usable data in the nearest neighbors is characterized by the parameter η:=kmax/imax>0. As η increases the global effect of local slowdowns increases.

    For each i{1,,imax} and k{1,,kmax}, the evolution of qi,k is modeled by the ordinary differential equation

    ˙qi,k(t)=fi,k1(t)fi,k(t),qi,k(0)=q0i,k,fi,0(t)=fini(t), (1)

    where q0i,k is the (known) initial amount of data present in each processor i at each stage k and fini(t) is the (known) inflow of unprocessed (or raw) data at processor i and time t. For k{1,,kmax}, fi,k is the flow of data in processor i from stage k to k+1. It is given by the product of the maximum processing rate ai and the composition of two throttling functions v1 and v2; that is,

    fi,k=aiv1(v2(qi,k,Qi+1,kQi,k+qi,k,Qi1,kQi,k+qi,k);q), (2)

    where Qi,k(t) is the total amount of data in processor i that by time t has traversed the first k1 stages of the computation:

    Qi,k(t)=kmaxj=kqi,j(t)+t0fi,kmax(s)ds, (3)

    and Qi±1,kQi,k+qi,k is the amount of usable data available from processor i±1. The throttling functions are given by

    v1(q;q)=max{0,min{1,qq}}, (4)
    v2(q,Δ+,Δ)=min{qi,k,max{Δ+,0},max{Δ,0}}. (5)

    The function v1 is a linear ramp, sometimes referred to as roof-line model [19]. The steepness of the ramp is determined by the parameter q>0, the value at which the processor is fully on. The function v2 determines the amount of data ready to process based on what is available locally and from nearest neighbors. In particular, if not enough data is available from neighboring processors, then fi,k<ai.

    1 In [5], there is another parameter β[0,1] in the definition of v2 that measures the effective strength of coupling between processors. Here we assume β=1 and do not consider it any further.1

    The macroscopic model in [5] is a continuum approximation for (1) that is derived in the limit of infinitely many processors (imax) and stages (kmax), subject to that constraint that η is a fixed, positive, finite constant. This continuum model is a PDE for a function ρ, where ρ(t,x,z) is interpreted as the local density of data at time t, processor location x, and stage completion variable z. It takes the form

    tρ(t,x,z)+zΦ(ρ(t,x,z),xP(t,x,z))=0,(t,x,z)R+×T×(0,1), (6a)
    ρ(0,x,z)=ρ0(x,z),(x,z)T×(0,1), (6b)
    Φ(ρ(t,x,0),xP(t,x,0))=Fin(t,x),(t,x)R+×(0,1), (6c)

    where R+=(0,); T is the torus parameterized by x[0,1); Fin is a scaled, continuous version of fin; P and Φ are given by

    P(t,x,z)=1zρ(t,x,y)dy+t0Φ(ρ(s,x,1),xP(s,x,1))ds, (7a)
    Φ(ρ,σ)=αw1(w2(ρ,σ,σ)); (7b)

    and w1 and w2 are scaled version of v1 and v2:

    w1(u)=min{max{uρ,0},1} (8a)
    w2(u,v1,v2)=min{u,max{u+ηv1,0},max{u+ηv2,0}}. (8b)

    The model (6) is a conservation law that describes the movement of data entering the system at z=0 and exiting at z=1. The flux Φ is the composition of the two scaled throttling functions: First w2 determines the amount of data available based on the local density and the density of left and right neighbors. Once w2 is specified, w1 computes the effective processing speed assuming a linear ramp, where the parameter ρ>0 is the minimum data density needed to operate at full capacity. The parameter η controls the degree of throttling due to the lack of available data in neighboring processors.

    At first glance, (6) appears to include a recursive definition of the flux function Φ due to equation (7a). However, after integrating (6a) over (s,z)[0,t]×[0,1] and applying (6c) and (6b), the result can substituted into (7a) to find the following formula for P:

    P(t,x,z)=t0Fin(s,x)ds+10ρ0(x,z)dzz0ρ(t,x,z)dz. (9)

    Alternatively, since zP(t,x,z)=ρ(t,x,z), (6a) can be rewritten as a closed Hamilton-Jacobi equation for P:

    tPΦ(zP,xP)=0. (10)

    The reformulation of (6a) in terms of (10) is the basis for the analysis and simulation performed in [5]. In the current work, we proceed by analyzing (6) directly in terms of ρ.

    In this section, we investigate qualitative properties of (6). We begin by simplifying the expression for Φ in (7b) using the auxiliary function

    W2(s)=min{¯w2(s),¯w2(s)}, (11)

    where

    ¯w2(s)=min{1,max{1+ηs,0}}; (12)

    (see Figure 1). In terms of w1 and W2, Φ takes the form

    Φ(ρ,σ)={αw1(ρW2(σρ)),ρ00,ρ=0. (13)
    Figure 1.  Shape of the throttling functions used in the definition of the flux Φ in (13).

    Next for simplicity, we artificially extend the domain in z to [0,) and assume initial data with compact support. This assumption allows us to ignore the contribution of the outflow in (7a). Under this assumption, the definition of P is modified as follows:

    Assumption:P(t,x,z)=zρ(t,x,y)dyt0,xT,z[0,). (14)

    Hence, in the following we consider the model (6) on the extended domain

    D:=R+×T×R+ (15)

    and introduce the auxiliary variable

    σ(t,x,z):=xzρ(t,x,y)dy,(t,x,z)D. (16)

    The resulting system for (ρ,σ):D2R2 is then

    tρ+zΦ(ρ,σ)=0,(t,x,z)D, (17a)
    xρ+zσ=0,(t,x,z)D, (17b)
    ρ(0,x,z)=ρ0(x,z),(x,z)T×R+, (17c)
    ρ(t,x,0)=ρb(t,x),σ(t,x,0)=σb(t,x),(t,x)R+×T. (17d)

    The relation of the quantities (ρ,σ) to the original model (6), modified by the previous Assumption, is as follows:

    σb(t,x)=x0ρ(t,x,y)dy, (18a)
    Φ(ρb(t,x),σb(t,x))=Fin(t,x), (18b)
    limzσ(t,x,z)=0. (18c)

    Definition 3.1 [Weak solution] Given initial data ρ0L(T×R+) and boundary data (ρb,σb)L(R+×T)2, we call (ρ,σ)[L(D)]2 a weak solution of (17) if for all smooth compactly supported functions φ:DR and ψ:DR,

    Dρtφ+Φ(ρ,σ)zφdzdxdt+T×R+ρ0(x,z)φ(0,x,z)dzdx+R+×TΦ(ρb(t,x),σb(t,x))φ(t,x,0)dxdt=0, (19a)
    Dρxψ+σzψdzdxdt+R+×Tσb(t,x)ψ(t,x,0)dxdt=0. (19b)

    We do not know whether a solution in the sense of Definition 3.1 exists for (17) with general initial and boundary data or whether such a solution is unique and depends continuously on the data.2 However, for the particular initial and boundary data given in (28), we establish the existence of special solutions below.

    2In [5], a continuous vanishing viscosity solution is established using known results from the mathematical literature on Hamilton-Jacobi equations. However, translating such a result to an L1-theory for ρ would require P to be absolutely continuous. While possible, this additional regularity has not yet been determined.

    The weak formulation (19) gives rise to a Rankine–Hugoniot jump condition [10]. Consider the surface

    S:={(t,x,z)D:z=ζ(t,x)} (20)

    for a differentiable function ζ:R+0×TR+0 and assume that the functions ρ and σ satisfy (17) point-wise in the interior of DS. Then standard arguments (see for example [13, Section 11.1.1]) can be used to establish that

    Φ(ρ,σ)Φ(ρr,σr)=tζ(t,x)(ρρr), (21)

    where

    ρ=ρ(t,x,ζ(t,x)),σ=σ(t,x,ζ(t,x)), (22)
    ρr=ρ(t,x,ζ(t,x)+),σr=σ(t,x,ζ(t,x)+). (23)

    In this section, we investigate solutions to (17) that take the form of fronts; see Figure 2. As discussed in the introduction, such solutions are important from the point of view of applications.

    Figure 2.  Left: Contour plot of the density ρ(t,x,z) for a fixed time t. Processed data with constant density r>0 is depicted in blue up to a stage of completion z=ζ(t,x). Zero data (grey) is prescribed for completion stages z>ζ(t,x), c.f. equation (28). Right: Similar plot of a density with constant values r1 and r2 and regions separated by functions ζ1(t,x) and ζ2(t,x).

    In the case of a single front, the initial and boundary conditions for ρ take the form

    ρ0(x,z)=rH(ζ0(x)z)andρb(t,x)=r, (24)

    where H denotes the Heaviside function and r>0 is a positive constant. In addition, we enforce the consistency of σb as prescribed in (18a):

    σb(t,x)=rxζ(t,x). (25)

    Theorem 3.2. Let ρ,η>0 be positive constants and let α=α(t,x)C1(R+×T) be a strictly positive function. Given initial and boundary conditions of the form (24) and (25), where

    0<r<ρ (26)

    and ζ0C1(T) is non-negative, assume that there is a C1 solution ζ:R+0×TR+ that satisfies

    tζ(t,x)=α(t,x)ρW2(xζ(t,x)),ζ(0,x)=ζ0(x). (27)

    Then there pair (ρ, σ) given by

    ρ(t,x,z)=rH(ζ(t,x)z),σ(t,x,z)=rH(ζ(t,x)z)xζ(t,x) (28)

    is a weak solution for (17) in the sense of Definition 3.1 with initial and boundary conditions given by (24) and (25).

    Before proving this result, some remarks are in order.

    1. The condition in (26) guarantees that rρW2(σr)<1. In this case the flux function Φ (c.f. (13)) is given by the simplified formula

    Φ(ρ,σ)={αρρW2(σρ),ρ0,0,ρ=0, (29)

    This formula and the Rankine–Hugoniot condition in (21) together form the key components of the existence proof.

    2. While the condition r<ρ is sufficient to obtain (29), it is not necessary. Indeed W2(σ/ρ)=0 for σ sufficiently large; in particular, (29) holds at any (x,t) such that xζ(t,x)>η1.

    3. It is currently open as to whether equation (27) has a C1 solution. However, since α,ρ, and W2 are non-negative, it is clear that any such solution will be non-decreasing. This fact is consistent with the notion that data is always processed toward completion.

    4. A result similar to Theorem 3.2 can be obtained for initial data separated by a finite number of non-intersecting fronts. Consider for example the case of two fronts (see Figure 2-b):

    ρ0(x,z)=r1H(ζ1,0(x)z)+r2[H(ζ2,0(x)z)H(ζ1,0(x)z)] (30)

    with positive constants r1r2 and r1<ρ and r2<ρ. If 0<ζ1,0(x)<ζ2,0(x) for all xT, then there is a weak solution (ρ,σ) in the sense of Definition 3.1 of the form

    ρ(t,x,z)=(r1r2)H(ζ1(t,x)z)+r2H(ζ2(t,x)z) (31a)
    σ(t,x,z)=(r1r2)xζ1(t,x)H(ζ1(t,x)z) (31b)
    +r2xζ2(t,x)H(ζ2(t,x)z), (31c)

    provided ζ1(0,x)=ζ1,0(x), ζ2(0,x)=ζ2,0(x), and

    tζ1(t,x)=α(t,x)ρ(r2r1)[r2W2(xζ2(t,x))r1W2((r1r2r1)xζ1(t,x)+r2r1xζ2(t,x))], (32a)
    tζ2(t,x)=α(t,x)ρW2(xζ2(t,x)). (32b)

    Proof of Theorem 3.2. With the C1 assumption on ζ, (ρ,σ) given by equation (28) clearly belongs to L(D)2. By construction the given solution (28) fulfills boundary and initial condition pointwise and hence also in weak form. It therefore remains to verify that (ρ,σ) is a weak solution in the interior of D.

    We first verify that (ρ,σ) fulfills (19b). For ψ smooth and compactly supported in the interior of D, we need to show that:

    D[rH(ζ(t,x)z)xψ(t,x,z)+rH(ζ(t,x)z)xζ(t,x)zψ(t,x,z)]dzdxdt=0. (33)

    By definition of the Heaviside function,

    D[rH(ζ(t,x)z)xψ(t,x,z)+rH(ζ(t,x)z)xζ(t,x)zψ(t,x,z)]dzdxdt=rR+0×Tζ(t,x)0[xψ(t,x,z)+xζ(t,x)zψ(t,x,z)]dzdxdt=rR+0×T[ζ(t,x)0xψ(t,x,z)dz+xζ(t,x)ψ(t,x,ζ(t,x))]dxdt=rR+0×Tx(ζ(t,x)0ψ(t,x,z)dz)dxdt=0, (34)

    where the integral in the last line vanishes due to periodicity of the domain with respect to x.

    We next verify that (ρ,σ) fulfills (19a). Let S={(t,x,z)D:z=ζ(t,x)}D be the surface of the front described by ζ, and let S+:={(t,x,z)D:z>ζ(t,x)} and S:={(t,x,z)D:z<ζ(t,x)}. For z<ζ(t,x) the explicit form of (ρ,σ) implies the following relation

    σ(t,x,z)ρ(t,x,z)=xζ(t,x). (35)

    Hence according to (29),

    Φ(ρ(t,x,z),σ(t,x,z))={α(t,x)rρW2(xζ(t,x))(t,x,z)S0(t,x,z)S+. (36)

    Let φ be any smooth, compactly supported function in the interior of D. In S±, Φ(ρ,σ) and ρ are constant, and in S+, they both vanish. Hence,

    Dρtφ+Φ(ρ,σ)zφdzdxdt=Srtφ+αrρW2(xζ)zφdzdxdt=T{(t,z):z=ζ(t,x)}φ(rtζ+αrρW2(xζ))1(1,tζ)dA(t,z)dx, (37)

    where dA(t,z) denotes the surface measure on the curve {(t,z):z=ζ(t,x)}. The final expression above vanishes due to the definition of ζ given in (27). This finishes the proof.

    It is the equation for the front ζ in (27) that essentially determines the behavior of the special solutions (28). We assume again that 0<r<ρ. With the definition of W2 in (11), this equation takes the form

    tζ(t,x)=α(t,x)ρ×{1η|xζ(t,x)|,|xζ(t,x)|<η1,0,|xζ(t,x)|η1. (38)

    In general, we do not know if there exists a solution to (38), due to the discontinuous flux. However, we may consider particular solutions related to special initial conditions ζ0(x). Those special solutions are used as test cases for numerical simulations in the next section.

    ● If for all (t,x)R+×T, |xζ(t,x)|<η1 and α(t,x)=α is constant, then formally xζ(t,x) fulfills the conservation law

    t(xζ)(t,x)+αηρx|(xζ)(t,x)|=0,(xζ)(0,x)=xζ0(x) (39)

    with flux function f(u)=C|u| and C=αηρ. It is known that there exists a weak (entropic) solution to (39) as long as xζ0(x) is in L, has locally bounded variation, and is continuous from the left [9, Proposition 3.1]. Moreover, if xζ0(x) is piecewise constant, this solution is given by piecewise constant states separated by a finite number of traveling discontinuities [9, Lemma 3.2]. We investigate such piece-wise solutions in Examples 2 and 3 in the numerical results of Section 4.

    ● If α is constant, and if 0xζ0(x)<η1 locally in x, then for sufficiently small t,

    ζ(t,x)=ζ0(xηαρt)+αρt. (40)

    Similarly if η1<xζ0(x)0 locally, then for sufficiently small t,

    ζ(t,x)=ζ0(x+ηαρt)+αρt. (41)

    We use these formulas to make comparisons with numerical results in Examples 2-4 in Section 4.

    ● If locally |xζ0(x)|η1, then ζ(t,x)=ζ0(x) for t sufficiently small. In this case the model predicts that all processors are stalled. However, the condition on ζ0 cannot hold globally due to the periodicity assumption in x. We explore the behavior of the model for this scenario in Example 5 in Section 4.

    In this section, we construct a numerical discretization for the system. Results of numerical simulations using this discretization are presented to illustrate the behavior of special front-type solutions discussed in the previous section.

    Our strategy for approximating (17) is based on the relaxed system of equations

    tρ+zΦ(ρ,σ)=0, (42a)
    εtσ+xρ+zσ=0, (42b)

    where ϵ>0 is a small parameter. Formally, (17a) and (17b) are obtained from (42) in the limit as ϵ0. This limit is considered later after a suitable discretization of (42).

    Before discretizing (42), we investigate hyperbolicity in the simplified case that α(t,x)>0 is constant.3 In this case, (42) takes the non-conservative form

    3 Hyperbolicity will not change if α is space or time–dependent, provided it is sufficiently smooth.

    tU+Bx(U)xU+Bz(U)zU=0, (43)

    where U=(ρ,σ) and

    Bx(U)=(001ϵ0),Bz(U)=(ρΦσΦ01ϵ). (44)

    Under the assumption (26) and provided that ρ>0, the simplified form of Φ in (29) implies that

    ρΦ(ρ,σ)=αρ(W2(s)sW2(s)),σΦ(ρ,σ)=αρW2(s), (45)

    where s=σ/ρ.

    The system (43) is hyperbolic if for any ξ=(ξ1,ξ2)R2 the matrix B(U,ξ)=ξ1Bx(U)+ξ2Bz(U) is diagonalizable in the field of real numbers [13]. Because the function W2 is not smooth, B(U,ξ) is not differentiable when s=0 or s=±1/η. However, away from these points,

    B(U,ξ)={(00ξ1ϵξ2ϵ),|s|>1η,](ξ2αρξ2sgn(s)ηαρξ1ϵξ2ϵ),0<|s|<1η, (46)

    and the eigenvalues of B(U,ξ) are

    (λ1,λ2)={(0,ξ2ϵ),|s|>1η,(λ1,λ2),0<|s|<1η, (47)

    where (λ1,λ2) are the roots of the characteristic polynomial

    p(λ)=λ2ξ2(αρ+1ϵ)λ+ξ2αϵρ(ξ2+sgn(s)ξ1η). (48)

    For |s|>η1, the system is strictly hyperbolic. For 0<|s|<η1, the polynomial p in (48) has discriminant

    dϵ(ξ1,ξ2)=(ξ2ϵρ)2[(ϵαρ)24sgn(s)ϵαρηξ1ξ2] (49)

    which, for ξ20, is negative (so that λ1 and λ2 have non-zero imaginary component) if and only if

    sgn(s)ξ1ξ2>(ϵαρ)24ϵαρη. (50)

    For any ϵ>0, the non-hyperbolic region {(ξ1,ξ2):dϵ(ξ1,ξ2)<0} is nontrivial; see Figure 3. However, for any fixed (ξ1,ξ2), there exists an ϵ(ξ1,ξ2) small enough that dϵ(ξ1,ξ2)0 for all ϵ<ϵ(ξ1,ξ2). Hence as ϵ0+, the non-hyperbolic region vanishes. The numerical method below is constructed by discretizing the relaxation system in (42) and then formally setting ϵ=0. This fact partially justifies the use of (42) even though it is not everywhere hyperbolic when ϵ>0.

    Figure 3.  Domain of hyperbolicity of system (42). In both cases, the non-hyperbolic region (gray) is bounded by the ξ1-axis and by ξ2=±Cξ1 with slope C=4ϵαρη(ϵαρ)20 as ϵ0+.

    To derive a numerical scheme for (42), we first discretize in time using a first-order method. We are eventually interested in the ϵ0 limit of the fully discretized scheme; hence the equation for ρ in (42a) is treated explicitly, but the equation for σ in (42b) is treated implicitly. Let ρn(x,z)ρ(nΔt,x,z) and σn(x,z)σ(nΔt,x,z) be the discrete approximations of ρ and σ, respectively, at time nΔt. We set

    ρn+1=ρnΔtzΦ(ρn,σn), (51a)
    σn+1=σnΔtε(xρn+1+zσn+1). (51b)

    To discretize (51) in x and z, we introduce a bounded computational domain (x,z)[0,1)×[0,1], which is divided into Nx×Nz uniform cells of size Δx×Δz. The cell centers are denoted by (xi,zj), for i{1,,Nx} and j{1,,Nz}; and the cell edges are denoted by xi+12=(i+12)Δx and zj+12=(j+12)Δz, respectively, for i{0,,Nx} and j{0,,Nz}. The approximate cell averages of ρn and σn on the cell centered at (xi,zj) are denoted by

    ¯Rnij1ΔxΔzxi+1/2xi1/2zj+1/2zj1/2ρn(x,z)dxdz (52)

    and

    ¯Snij1ΔxΔzxi+1/2xi1/2zj+1/2zj1/2σn(x,z)dxdz, (53)

    respectively.

    We use a Lax-Friedrichs approximation of zΦ in (51a) and in (51b) a centered discretization for xρ and a one-sided discretization of zσ. The resulting fully discrete scheme is

    ¯Rn+1ij=¯RnijΔtΔz(Fni,j+1/2Fni,j1/2)+14(¯Rni+1,j2¯Rnij+¯Rni1,j), (54a)
    ¯Sn+1ij=¯SnijΔt2ϵΔx(¯Rn+1i+1,j¯Rn+1i1,j)ΔtϵΔz(¯Sn+1i,j+1¯Sn+1ij), (54b)

    where

    Fni,j+12=12(Φ(¯Rni,j+1,¯Sni,j+1)+Φ(¯Rnij,¯Snij)a(¯Rni,j+1¯Rnij)). (55)

    The monotonicity of the numerical flux Fni,j+12 is guaranteed provided that

    max(R,S)ρΦ(R,S)=:aΔzΔt. (56)

    This condition motivates a dynamic choice of Δt: At each time level tn we set Δt=Δzan for

    an=max(i,j):1iNx,1jNzρΦ(¯Rnij,¯Snij). (57)

    The right-biased stencil in the discretization of zσ in equation (54b) warrants some discussion. Indeed, given that the boundary condition for σ is given at z=0, it seems more natural to use a left-biased stencil. However, such an approach does not allow the boundary condition to be computed implicitly via the consistency relation in (18a). However, if ρ has compact support, then (18c) holds. As long as the support of the solution does not reach the boundary of the computational domain at z=1, this condition can be enforced there, in an implicit fashion. This approach is consistent with the fact that σ at a given z is determined entirely by ρ at values of z>z. Moreover, numerical calculations confirm that enforcing the boundary condition in this way yields stable results.

    The final scheme used in Section 4.2 is given by the formal limit of (54) in the limit ϵ0:

    ¯Rn+1ij¯Rnij=ΔtΔz[Fni,j+1/2Fni,j1/2]+14(¯Rni+1,j2¯Rnij+¯Rni1,j), (58a)
    ¯Sn+1ij=Δz2Δx(¯Rn+1i+1,j¯Rn+1i1,j)+¯Sn+1i,j+1. (58b)

    This scheme above must be accompanied by boundary and initial conditions. The initial values ¯R0ij are obtained by integration of the initial data ρ0:

    ¯R0ij=1ΔxΔzxi+1/2xi1/2zj+1/2zj1/2ρ0(x,y)dxdz. (59)

    The cell averages ¯S0ij are obtained by applying a midpoint rule to the consistency relation (18a):

    ¯S0ij=1ΔxNz=jρ0(xi+1/2,zj)ρ0(xi1/2,zj). (60)

    For ¯Snij, zero boundary conditions are prescribed at j=Nz consistent with the right-based stencil in the discretization of zσ:

    ¯Sni,Nz=0. (61)

    For ¯Rnij, boundary conditions at z=0 are computed from ρb(t). For the special solutions discussed in the previous section ρb=r and hence

    ¯Rni,1=r. (62)

    Due to the central discretization of zρ, boundary conditions for ¯Rnij at j=Nz need to be prescribed. Since in the simulation no data reaches this boundary we implement Neumann boundary conditions

    ¯Rni,Nz=¯Rni,Nz1. (63)

    Example 1: validation with the microscopic model. We consider a test problem that has been used to to validate the macroscopic model (17) and the numerical method (58) by comparing with a simulation of the microscopic model introduced in [5]. For this problem, ρ=1.0 and η=1.0. The initial and boundary conditions are given by

    ρ0(x,z)=1.5(sin(2πz))6χ[0,0.5](z),ρb(t,x)=0,σb(t,x)=0, (64)

    and the processor speed is

    α(x)=10.4(sin(πx))2. (65)

    Both models are simulated up to a final time Tfin=0.5.

    The (x,z) domain for the macroscopic model is discretized using Nx=Nz=800 cells, and simulated with the algorithm in (58). In order to be consistent with the choice of η, the microscopic model uses imax=800 processors and kmax=800 stages. The microscopic model is integrated in time using the explicit Euler method.

    In Figure 4 we show the initial density and the final density profiles for both models at time Tfin. We observe qualitative agreement between the microscopic and the macroscopic solutions. In both cases, the data around x=0.5 is processed more slowly than the data in the rest of the domain, due to a slower processing rate α there.

    Figure 4.  Density profiles for Example 1.

    In the remaining examples of this subsection, we explore the behavior of front-type solutions. Unless otherwise stated the following parameters are used in all simulations:

    r=0.5,ρ=0.8,α=0.1,Tfin=2.0. (66)

    The value of Tfin ensures that data does not reach z=1; in particular the assumption in (14) holds. These parameter choices, along with the initial and boundary conditions ensure that

    0ρ(t,x,z)<ρ,forall(t,x,z)D, (67)

    in which case the flux Φ is given by the simplified formula in (29).

    Example 2: front with constant profile. We consider the case of an initial constant front, namely

    ζ0(x)=ζ0=0.2,xT, (68)

    which provides a trivial example for a piece-wise constant solution xζ to (39). Moreover, the evolution of this front is given explicitly in (40) and (41):

    ζ(t,x)=ζ0+αρt. (69)

    In this example η is set to 0.5 although the solution in (69) is independent of this choice.

    Numerical results are shown in Figure 5. Since the processing rate α is constant the front moves with the same speed for each x[0,1), as expected. Moreover, the analytical and the numerical solutions provide the same position of the front at final time.

    Figure 5.  Density profiles based on the a simulation of the constant front solution (69) with the scheme in (58). The simulations in panels (a)-(c) use Nx=Nz=800 computational cells, while in panel (d), numerical solutions are computed at different refinement levels. In the top row, the analytical front is marked by circles.

    Example 3: V-shaped front with small η We consider an initial condition characterized by a V-shape front under the condition that η|xζ|<1; cf. (38). This condition models the physical situation in which processors are not initially at the same stage of completion. Such a situation may be realized if a local group of processors is, at some previous time, slower than the rest [5]. Specifically, we let

    ζ0(x)=(12z0)|x0.5|+z0 (70)

    where z0=ζ0(0.5)<0.5 is a fixed parameter.

    Based on the theoretical findings in [9] for (39), we expect that the front at later times will be piecewise linear. Moreover, the formulas in (40) and (41) can be used to generate the local solution analytically. Indeed, under the assumption η|xζ|<1, a small calculation shows that the local, short time solution away from x=0.5 is given by

    ζloc(t,x)=ζ0(x)+αρ(1η|xζ0|)t. (71)

    where |xζ0|=(12z0). However at x=0.5, Theorem 3.2 does not apply, since the front is not in differentiable there; hence (40) and (41) are not valid. We conjecture that at this point, the front moves forward at full speed α/ρ and, as it catches up with neighboring points in the front, these points also move forward at full speed. This means that the global front solution is given by

    ζ(t,x)=max{ζloc(t,x),z0+αρt,}. (72)

    In Figure 6 we show numerical results for this test problem using the algorithm in (58).. We use a grid with Nx=Nz=800 computational cells. We set z0=0.2 (so that |xζ0(x)|=0.6) and let η=1/1.1. It is clear from these plots that the analytic solution (72) and the numerical solution provide the same position of the front.

    Figure 6.  Density profiles based on the simulation of the V-shaped initial front (70) with the scheme in (58) using Nx=Nz=800. The analytic solution is computed using (72). In the top row, the analytical front is marked by circles.

    Example 4: Smooth front with small η. We repeat the test problem above, except now the initial front is given by a smooth function:

    ζ0(x)=14cos(2πx)+0.3. (73)

    In order to fulfill the condition η|xζ|<1 we choose η1=maxxTζ0(x)+1=π/2+1.

    In Figure 7 we show numerical results for this test problem using the algorithm in (58). We use a grid with Nx=Nz=800 computational cells and simulate the solution to a time horizon Tfin=2. As in the previous example, (72) can be used to evaluate the location of the front analytically. The results in Figure 7 demonstrate that the numerical and analytical solutions agree even when xζ0 is not piece-wise constant.

    Figure 7.  Density profiles based on the simulation of the V-shaped initial front in (70), using the scheme in (58) with Nx=Nz=800. The analytic solution is computed using (72). In the top row, the analytical front is marked by circles.

    Example 5: V-shaped front with large η. We consider again the initial condition (70) characterized by a V-shape front. The model parameters are the same as in Example 3, except that η is now large enough to ensure that η|xζ|>1. In particular |xζ|=0.6, but η=10. In this case, we expect stalling for those processors x0.5; see (38). Again at x=0.5 the solution is not described by Theorem 3.2, since there the front is not C1. We expect that the front at this point will proceed at the maximum speed α/ρ and that the stalled points away front x=0.5 will also begin to move at this rate after they have been overtaken. In other words, we conjecture that the global formula for the solution front is

    ζ(t,x)=max{ζ0(x),z0+αρt}. (74)

    In Figure 8 we show numerical results for a grid with Nx=Nz=800 computational cells and a time horizon Tfin=2. These results agree with the analytical solution in (74).

    Figure 8.  Density profiles based on the simulation of the V-shaped initial front (70) with the scheme in (58) using Nx=Nz=800. The analytical solution is computed using (74). In the top row, the analytical front is marked by circles.

    The discrete model (1) can be used to predict how variations in the processor rate affect computer performance, and to understand whether these rates can be controlled to achieve a prescribed objective, such as faster throughput. For the continuum model in (17), such a control is specified via the function α. In this section, we briefly explore the effects of adapting α with the intent of a more extensive study in later work.

    In order to normalize the control action, we assume that for all t,

    Tα(t,x)dx=ˉα. (75)

    Since α is non-negative, so too is ˉα; we assume further that ˉα>0. Then several choices for α can be envisioned. The simplest choice is to set α equal to a constant. In this case, the normalization in (75) implies that α=ˉα. For a front-type solution, this simple choice will be compared with a policy that depends on the initial data

    ρ0(x,z)=rH(ζ0(x)z),r<ρ,andζ0(x)>0, (76)

    Since the slope xζ(t,x) controls the throttling due to neighboring processors, the condition xζ(t,x)=0 is desirable. We therefore propose a choice for α that gives priority to processors that are trailing in the initial configuration. Specifically, we consider

    α(x)=Cαρ(ζmaxζ0(x)), (77)

    where ζmax>maxxTζ0(x) is a fixed constant and Cα, which depends on ζ0, is chosen so that (75) holds. This choice of α will increase the speed of trailing processors

    In order to compare the temporal and spatial evolution of ρ based on the policy (77) versus the constant α=ˉα policy, we compute some quantities of interest

    ω1(t,z):=t0TΦ(ρ(s,x,z),σ(s,x,z))dxds,ω2(t,z):=TΦ(ρ(t,x,z),σ(t,x,z)dx,ω3(t,z):=t0Tρ(s,x,z)dxds. (78)

    The quantity ω1 measures the cumulative outflow at stage of completion z up to time t; the quantity ω2 measures the current outflow at stage of completion z; and the quantity ω2 measures the cumulative part density at stage of completion z. The first two quantities are indicators of the processed data whereas the last indicator shows the load of the processors, which may be related to the consumption of energy during computation. In each case, larger values of ωi are preferable.

    For our numerical tests, we set Tfin=1 and ˉα=0.5. The initial front ζ0(x) is given as in (70) with z0=0.1, in which case Cα0.5/0.575. The quantities of interest are evaluated for z=0.5 and z=0.75. Figure 9 shows the initial density profile and the profiles at Tfin=1 for each of the two policies. For the policy based on equation (77), the control has been been active long enough to push the processors that were initially trailing ahead of the others. This means the control has been on for too long and should have been modified or turned off at an some earlier time. While the results of the two policies appear similar, the quantities of interest wi(t), i=1,2,3 displayed in Figure 10 demonstrate the differences. In particular, with respect to w1, the policy in (77) outperforms the constant α policy.

    Figure 9.  Density profiles for two different policies choices of α with the scheme in (58) using Nx=Nz=800.
    Figure 10.  Comparison of the quantities of interest wi,i=1,2,3 given in (78).

    In this paper, we have analyzed a recently derived mathematical model for the evolution of processed data in large-scale, asynchronous computers [5]. After the introduction of an auxiliary variable, the model is expressed as a system of partial differential equations. It is possible to prove existence of discontinuous solutions to this system for a particular class of initial and boundary conditions. These solution describes the flow of information as a series of propagating fronts. A numerical scheme for the reformulated system has also been designed based on a relaxation approximation. Numerical simulations based on this scheme demonstrate qualitative agreement with theoretical findings.

    We have also briefly explored the effects of local processor speed on quantities of interest predicted by the model. In future work, we intend to investigate more extensively control mechanisms for optimizing important objectives related to performance of the large-scale computers. Further it is planned to validate the model based on experimental data.

    MH and GV thank the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) for the financial support through 20021702/ GRK2326, 333849990/IRTG-2379, HE5386/14, 15, 18-1, 19-1, 22-1 and under Germany's Excellence Strategy EXC-2023 Internet of Production 390621612. The funding through HIDSS-004 is acknowledged. This work has been supported also by the US National Science Foundation, RNMS (KI-Net) grant 11-07444, and the U.S. Department of Energy, Office of Advanced Scientific Computing Research. The work of CDH was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. De-AC05-00OR22725.



    [1] Two-way multi-lane traffic model for pedestrians in corridors. Netw. Heterog. Media (2011) 6: 351-381.
    [2] A model for the dynamics of large queuing networks and supply chains. SIAM J. Appl. Math. (2006) 66: 896-920.
    [3] A scalar conservation law with discontinuous flux for supply chains with finite buffers. SIAM J. Appl. Math. (2011) 71: 1070-1087.
    [4] Derivation of continuum traffic flow models from microscopic follow-the-leader models. SIAM J. Appl. Math. (2002) 63: 259-278.
    [5] A mathematical model of asynchronous data flow in parallel computers. IMA Journal of Applied Mathematics (2020) 85: 865-891.
    [6] On the modelling crowd dynamics from scaling to hyperbolic macroscopic models. Mathematical Models and Methods in Applied Sciences (2008) 18: 1317-1345.
    [7] H. Chan, M. J. Cherukara, B. Narayanan, T. D. Loeffler, C. Benmore, S. K. Gray and S. K. R. S. Sankaranarayanan, Machine learning coarse grained models for water, Nature Communications, 10 (2019), 379. doi: 10.1038/s41467-018-08222-6
    [8] Pedestrian flow models with slowdown interactions. Mathematical Models and Methods in Applied Sciences (2014) 24: 249-275.
    [9] Polygonal approximations of solutions of the initial value problem for a conservation law. J. Math. Anal. Appl. (1972) 38: 33-41.
    [10] C. M. Dafermos, Hyperbolic Conservation Laws in Continuum Physics, vol. 325 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd edition doi: 10.1007/3-540-29089-3
    [11] Rigorous derivation of nonlinear scalar conservation laws from follow-the-leader type models via many particle limit. Archive for Rational Mechanics and Analysis (2015) 217: 831-871.
    [12] J. Dongarra, J. Hittinger, J. Bell, L. Chacón, R. Falgout, M. Heroux, P. Hovland, E. Ng, C. Webster and S. Wild, Applied Mathematics Research for Exascale Computing, Technical report, U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research Program, 2014. doi: 10.2172/1149042
    [13] L. C. Evans, Partial Differential Equations, American Mathematical Society, 2010. doi: 10.1090/gsm/019
    [14] Modeling selective local interactions with memory. Physica D: Nonlinear Phenomena (2013) 260: 176-190.
    [15] A fluid dynamic model for the movement of pedestrians. Complex Systems (1992) 6: 391-415.
    [16] Models for dense multilane vehicular traffic. SIAM J. Math. Anal. (2019) 51: 3694-3713.
    [17] C. Murray et al., Basic Research Needs for Microelectronics: Report of the Office of Science Workshop on Basic Research Needs for Microelectronics, Technical report, USDOE Office of Science (SC)(United States), 2018.
    [18] J. S. Vetter et al., Extreme Heterogeneity 2018-Productive Computational Science in the Era of Extreme Heterogeneity: Report for DOE ASCR Workshop on Extreme Heterogeneity, Technical report, USDOE Office of Science (SC), Washington, DC (United States), 2018. doi: 10.2172/1473756
    [19] Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM (2009) 52: 65-76.
  • This article has been cited by:

    1. Richard C Barnard, Kai Huang, Cory Hauck, A mathematical model of asynchronous data flow in parallel computers*, 2020, 85, 0272-4960, 865, 10.1093/imamat/hxaa031
    2. Mohammad Daneshvar, Richard C. Barnard, Cory Hauck, Ilya Timofeyev, Modeling information flow in a computer processor with a multi-stage queuing model, 2024, 01672789, 134446, 10.1016/j.physd.2024.134446
  • Reader Comments
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(2061) PDF downloads(211) Cited by(2)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog