Citation: Hamidou Tembine. Mean-field-type games[J]. AIMS Mathematics, 2017, 2(4): 706-735. doi: 10.3934/Math.2017.4.706
The term "mean-field" refers to a physics concept that describes the effect of an infinite number of particles on the motion of a single particle. Researchers began applying the concept to the social sciences in the early 1960s to study how an infinite number of factors affects individual decisions. In a game-theoretic context, however, the key ingredient is the influence of the distribution of states and/or control actions on the payoffs of the decision-makers, who may have different preferences and characteristics and are not necessarily exchangeable (or indistinguishable) per class/type. A mean-field-type game is a game in which the payoffs and/or the coefficient functions of the state dynamics involve not only the state and action profiles but also the distribution of the state-action process (or its marginal distributions). This stands in contrast to mean-field games [2,3], in which a single player does not influence the mean-field terms: in mean-field-type games, a single player may have a strong impact on the mean-field terms. This paper presents the key ingredients and recent developments of mean-field-type game theory.
Stochastic games model dynamic interactions in which the state evolves in a way that depends on the actions of the decision-makers. The model was introduced in [1], which proved that two-player zero-sum discounted games with finitely many states under perfect state observation have a value, and that both players have optimal stationary strategies. In Shapley's initial model of stochastic games, the state transitions and the instant payoff were functions of the state-action profiles; explicit dependence on the distribution of states or the distribution of actions was not examined.
A stochastic optimization problem in which the first moment (the expected value of the state) influences both the state dynamics and the performance criterion was examined in [10]. This corresponds to a first-moment-based mean-field-type game with a single decision-maker. The model was extended in [11] to include aggregative structures of the distribution of states, and a new stochastic maximum principle was established in the risk-neutral setting. The resulting system is not a simple augmented-state-space approach because the expected value of the Hamiltonian is involved in it. The work of [17] extends that result to performance criteria that include the entire probability measure of the states. The works [12,13] consider both the expected value of the states and the expected value of the control actions in the performance criteria. We refer the reader to [4,14,15,16] for stochastic maximum principles of mean-field type.
Mean-field-type games with two or more players can be seen as the multi-agent generalization of the single-agent mean-field-type control problem. In [18,19] it is shown that the methodology extends to risk-sensitive mean-field-type games under weakened conditions on the drifts. State-measurement noise and partial observation were studied in [20,21]. See also [27,28,29,31,32,33,34,35,36,37,38,39,46,47,48,49,50].
A Wiener chaos expansion was proposed in [30] to transform mean-field-type games into equivalent deterministic game problems in which the state process is replaced by its polynomial (Wiener) chaos expansion. This decoupling series expansion is similar to the Kosambi-Karhunen-Loève representation known in the theory of stochastic processes: a stochastic process is written as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded domain. Cooperative mean-field-type games were introduced in [5,41,42,43,44,45]. In [40], several engineering applications of mean-field-type games are provided.
The rest of the paper is structured as follows. Section 2 presents a generic model. Section 3 focuses on solution methods. Classes of mean-field-type games are provided in Section 4. Section 5 examines the partial state observation case. Section 6 presents some recent developments and extensions of mean-field-type games. Section 7 concludes the paper.
Table 1 summarizes the notations used in the article.
$I$ | ≜ | set of decision-makers |
$T$ | ≜ | length of the horizon |
$[0,T]$ | ≜ | horizon of the mean-field-type game |
$t$ | ≜ | time index |
$S$ | ≜ | state space |
$s(t)$ | ≜ | state at time $t$ |
$\Delta(S)$ | ≜ | set of probability measures on $S$ |
$m(t,\cdot)$ | ≜ | probability measure of the state at time $t$ |
$A_i$ | ≜ | control action set of decision-maker $i\in I$ |
$a_i(\cdot)$ | ≜ | strategy of decision-maker $i\in I$ |
$a=(a_i)_{i\in I}$ | ≜ | strategy profile |
$b(t,s,m,a)$ | ≜ | drift coefficient function |
$\sigma(t,s,m,a)$ | ≜ | diffusion coefficient function |
$r_i(t,s,m,a)$ | ≜ | instant payoff of decision-maker $i\in I$ |
$g_i(s,m)$ | ≜ | terminal payoff of decision-maker $i\in I$ |
$R_{i,T}(m_0,a)$ | ≜ | cumulative payoff of $i$ |
$V_i(t,m)$ | ≜ | equilibrium payoff of $i\in I$ |
$(p_i,q_i)$ | ≜ | first-order adjoint process of $i\in I$ |
$(P_i,Q_i)$ | ≜ | second-order adjoint process of $i\in I$ |
$v^*_i(t,s)$ | ≜ | dual function of $i\in I$ |
$H_i$ | ≜ | $r_i+b\,p_i+\sigma q_i$ of $i\in I$ |
$H_{i,s}$ | ≜ | $r_{i,s}+b_sp_i+\sigma_sq_i$ |
$H_{i,ss}$ | ≜ | $r_{i,ss}+b_{ss}p_i+\sigma_{ss}q_i$ |
A basic mean-field-type game in continuous time is described by:
• The set of decision-makers I={1,2,…,I}, where I∈N.
• The horizon of the interaction is the interval [0,T], T>0.
• There is a non-empty state space S. The set of probability measures on S is denoted by Δ(S).
• For each decision-maker i, a non-empty control action set Ai is available. The set Ai is not necessarily convex. The set of all control actions of all the decision-makers is A=∏i∈IAi.
• An instant payoff of decision-maker i is ri: S×Δ(S)×A→R
• The state evolution is explicitly given by a controlled Itô's stochastic differential equation of mean-field type, called controlled McKean-Vlasov equation:
$$ s(t)=s_0+\int_0^t b\,dt'+\int_0^t \sigma\,dB(t'),\qquad t>0. $$
with $s(0)=s_0\in S$ and $b,\sigma: S\times\Delta(S)\times A\to\mathbb{R}$, where $b$ is the drift coefficient functional, $\sigma$ is the diffusion coefficient functional, and $B$ is a one-dimensional standard Brownian motion on a given filtered probability space $(\Omega,\mathcal{F},\mathbb{P})$.
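For intuition, dynamics of this McKean-Vlasov type can be approximated numerically by an interacting-particle system in which the measure $m(t,\cdot)$ is replaced by the empirical law of $N$ samples. The following minimal Python sketch applies an Euler-Maruyama step; the drift $b(s,m)=-(s-\mathbb{E}s)$ and the constant diffusion are illustrative assumptions, not a model from the paper.

```python
import numpy as np

# Sketch: particle approximation of controlled McKean-Vlasov dynamics
#   ds = b(t, s, m, a) dt + sigma(t, s, m, a) dB,
# where the measure m(t,.) is replaced by the empirical law of N particles.
# The mean-reverting drift and constant sigma are illustrative choices.

def simulate_mckean_vlasov(n_particles=2000, T=1.0, n_steps=200,
                           sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = rng.normal(loc=1.0, scale=0.5, size=n_particles)  # s0 ~ m0
    for _ in range(n_steps):
        mean_field = s.mean()              # empirical substitute for m(t,.)
        drift = -(s - mean_field)          # b(t, s, m, a): attraction to the mean
        dB = rng.normal(scale=np.sqrt(dt), size=n_particles)
        s = s + drift * dt + sigma * dB    # Euler-Maruyama step
    return s

particles = simulate_mckean_vlasov()
# With this drift the empirical mean is approximately conserved, while the
# spread around the mean contracts toward a small stationary level.
print(particles.mean(), particles.std())
```

As $N\to\infty$, propagation of chaos suggests the empirical law converges to the solution of the McKean-Vlasov equation; the finite-$N$ system is only a proxy.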
Given an initial state $s_0$ drawn from the initial distribution $m_0$, the game $G(s_0)$ proceeds as follows. At each instant, each decision-maker observes the state (perfect monitoring, perfect state observation), chooses a control action according to her strategy (defined below), and observes/measures her payoff.
For this game, we adopt the following notions. An admissible control of decision-maker $i\in I$ is a progressively measurable process with respect to the filtration $\mathcal{F}_t$, taking values in $A_i$. $L^2_{\mathcal{F}}([0,T],A_j)$ denotes the set of $\mathcal{F}$-adapted, $A_j$-valued processes on $[0,T]$, and $C([0,T],X)$ denotes the set of continuous (hence measurable) maps $[0,T]\to X$. The set of admissible controls of decision-maker $i$ is denoted by $\mathcal{A}_i$.
We identify two processes ai and ˜ai in Ai if
$$ \mathbb{P}\big(a_i=\tilde a_i,\ \text{a.e. on }[0,T]\big)=1. $$
A strategy for decision-maker $i$ starting at time $0$ is a measurable map (again denoted by) $a_i:[0,T]\times C([0,T],X)\times\prod_{j\neq i}L^2_{\mathcal{F}}([0,T],A_j)\to A_i$ for which there exists $\epsilon>0$ such that for any
$$ (t',f^1,f^2)\in(0,T]\times\Big\{C([0,T],X)\times\prod_{j\neq i}L^2_{\mathcal{F}}([0,T],A_j)\Big\}^2, $$
if $f^1=f^2$ on $[0,t']$ then $a_i(\cdot,f^1)=a_i(\cdot,f^2)$ on $[0,t'+\epsilon]$. To each strategy profile one can associate a control process profile. This allows us to work with both open-loop and feedback forms of strategies by considering $a_i(\cdot,s^{s_0,a},m)$. For open-loop strategies the information structure is limited to $\{t\}$: $a_i$ is simply a measurable function of time (and the initial data). The stochastic maximum principle can be used as a methodology for finding optimal open-loop strategies. The information structure for feedback strategies is $s$ and its distribution (and the initial data). The dual adjoint functions obtained from the Bellman functions can be used for finding feedback strategies, where the feedback is a state-and-mean-field feedback.
The cumulative payoff of decision-maker i is
$$ R_{i,T}(m_0,a):=g_i(s(T),m(T,\cdot))+\int_0^T r_i(s(t),m(t,\cdot),a(t))\,dt, $$
where gi(s(T),m(T,.)) is the terminal payoff of i. The risk-neutral payoff of i is E[Ri,T(m0,a)]. The risk-neutral mean-field-type game G0,T(m0) is the normal-form game (I,(Ai,ERi,T)i∈I).
We now introduce the key problems addressed in this paper and some solution concepts.
Given $a_{-i}:=(a_1,\dots,a_{i-1},a_{i+1},\dots)$, the best-response value problem associated with decision-maker $i$ is
$$ V_i(0,m_0)=\begin{cases}\sup_{a_i\in\mathcal{A}_i}\mathbb{E}[R_{i,T}(m_0,a)]\\ s(t)=s^{s_0,a}(t),\ s(0)=s_0\sim m_0\\ m(t,\cdot)=m^{m_0,a}(t,\cdot)=P_{s^{s_0,a}(t)},\end{cases}\tag{1} $$
The strategies $a_i$ of $i$ that achieve the above best-response value $V_i(0,m_0)$ are called best-response strategies of $i$ to $a_{-i}:=(a_j)_{j\neq i}$. The set of such strategies is denoted by $BR_i(a_{-i})$. We are interested in characterizing the best-response strategies of every decision-maker.
An equilibrium point is a strategy profile a such that for every decision-maker i, ai∈BRi(a−i) and the resulting mean-field is m(t,.)=Pss0,a(t). We are interested in characterizing equilibrium strategies of every decision-maker.
As the state dynamics is a stochastic differential equation of mean-field type, the existence and uniqueness of a solution to the state equation is not always guaranteed. Below, we provide sufficient conditions for existence and uniqueness of a solution.
Lemma 1 (Existence). If the coefficient functions b,σ are continuously differentiable with respect to s,m and the Gateaux-derivatives are bounded, then for each strategy profile (ai)i∈I∈∏i∈IAi, there is a unique solution to the state dynamics which we denote by s(t):=ss0,a(t).
A proof of this Lemma can be directly obtained from [7] by choosing a strategy profile (ai)i∈I∈∏i∈IAi.
Lemma 2. The probability law of the state solves (in the weak sense) the following Fokker-Planck-Kolmogorov equation of mean-field type:
$$ m_t+(b\,m)_s-\frac12(\sigma^2 m)_{ss}=0,\qquad m(0,\cdot)=m_0(\cdot). \tag{2} $$
The proof is by now standard, using integration by parts in the sense of distributions.
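Equation (2) can be checked numerically in a simple special case. In the sketch below (grid sizes and coefficients are illustrative assumptions), the drift-free case $b=0$ with constant $\sigma$ reduces (2) to a heat equation, whose exact solution starting from a Gaussian is again a Gaussian with variance $\mathrm{var}(0)+\sigma^2 t$:

```python
import numpy as np

# Sketch: explicit finite-difference check of the FPK equation
#   m_t + (b m)_s - (1/2)(sigma^2 m)_ss = 0
# in the drift-free case b = 0 with constant sigma.  Exact solution:
# a Gaussian whose variance grows linearly, var(t) = var(0) + sigma^2 t.

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def solve_fpk(sigma=1.0, var0=1.0, T=0.5, nx=401, nt=4000, L=8.0):
    x = np.linspace(-L, L, nx)
    dx = x[1] - x[0]
    dt = T / nt                               # dt*sigma^2/(2 dx^2) < 1/2: stable
    m = gaussian(x, var0)
    for _ in range(nt):
        m_ss = (np.roll(m, -1) - 2 * m + np.roll(m, 1)) / dx**2
        m_ss[0] = m_ss[-1] = 0.0              # density is ~0 at the boundary
        m = m + dt * 0.5 * sigma**2 * m_ss
    return x, m

x, m_num = solve_fpk()
m_exact = gaussian(x, 1.0 + 1.0**2 * 0.5)
print(np.max(np.abs(m_num - m_exact)))        # small discretization error
```

The scheme also (approximately) conserves total mass, as the divergence-form structure of (2) suggests.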
In this section we present three different solution approaches for solving mean-field-type game problems. We also provide some relationships between these methods. We introduce a notion of Gâteaux differentiability with respect to a measure.
Let $S$ be a vector space, let $b\in L^{\alpha}(\Omega\times[0,T]\times S\times\Delta(S);\mathbb{R})$, and let $O\subset L^{\alpha^*}$ be an open set.
Definition 1 (Gâteaux derivative). The Gâteaux differential bm[˜m] of the functional b at m∈O in the direction ˜m is defined as
$$ \lim_{\epsilon\to0^+}\frac{d}{d\epsilon}\,b(\cdot,t,s,m+\epsilon\tilde m)=\int b_m(\cdot,t,s,m)(\xi)\,\tilde m(d\xi). $$
If the limit exists for all directions $\tilde m$, then one says that $b$ is Gâteaux differentiable at $m$.
If m∈Lα and bm∈Lα∗, the limit above is finite. The limit appearing in Definition 1 is taken relative to the topology of Lα∗.
Note that one can connect the derivative with respect to $s$ of the Gâteaux derivative $b_m$ to the notion of functional derivative with respect to a measure, and also to the Wasserstein gradient.
We provide the Gâteaux differentiation of $\|s\|_\alpha$-based functions:
• $\alpha$-norm: Let $g(m)=\big(\int|s|^\alpha m(t,ds)\big)^{1/\alpha}=:m_\alpha^{1/\alpha}$ and $m\in L^\alpha$; then
$$ \lim_{\epsilon\to0^+}\frac{d}{d\epsilon}\,g(m+\epsilon\tilde m)=\frac1\alpha\Big[\int|\xi|^\alpha\,m(d\xi)\Big]^{\frac1\alpha-1}\Big[\int|\xi|^\alpha\,\tilde m(d\xi)\Big]. $$
Thus, $g_m(m)(s)=\dfrac{s^\alpha}{\alpha\,m_\alpha^{(\alpha-1)/\alpha}}$, and $\partial_s g_m(m)(s)=\dfrac{s^{\alpha-1}}{m_\alpha^{(\alpha-1)/\alpha}}\neq 0=\partial_m[g_s]$.
• $L^\alpha$-normed drift: Let
$$ \bar b=\Big(\int_{y\in S}|b|^\alpha(\cdot,t,s^a(t),y,a(t))\,m^a(t,dy)\Big)^{\frac1\alpha}, $$
i.e., the $L^\alpha$-norm of $b$ with respect to the measure $m^a(t,\cdot)$. We compute the Gâteaux derivative of the $L^\alpha$-normed drift: $\bar b_m(\cdot,t,s,m)(\xi):=\frac{b^\alpha(\cdot,t,s,\xi)}{\alpha\,\bar b^{\alpha-1}}$. By changing variables, one has $\bar b_m(\cdot,t,\xi,m)(s)=\frac{b^\alpha(\cdot,t,\xi,s)}{\alpha\,\bar b^{\alpha-1}(t,\xi,m)}$. We differentiate with respect to the state $s$ to get:
$$ \partial_s\bar b_m(\cdot,t,\xi,m)(s)=\frac{b^{\alpha-1}(\cdot,t,\xi,s)\,b_y(\cdot,t,\xi,s)}{\bar b^{\alpha-1}(t,\xi,m)}. \tag{3} $$
Moreover,
$$ \mathbb{E}\big[L\,\partial_s\bar b_m(\cdot,t,S,m)(s)\big]=\mathbb{E}\Big[L\,\frac{b^{\alpha-1}(\cdot,t,S,s)\,b_y(\cdot,t,S,s)}{\bar b^{\alpha-1}(t,S,m)}\Big]=:\tilde{\mathbb{E}}\Big[\tilde L\,\frac{b^{\alpha-1}(\cdot,t,\tilde S,s)\,b_y(\cdot,t,\tilde S,s)}{\bar b^{\alpha-1}(t,\tilde S,m)}\Big], $$
where $\tilde{\mathbb{E}}$ denotes the expectation with respect to the variables carrying a tilde, $\tilde S$ being a copy of $S$. Replacing the argument $s$ by $S$ gives $\tilde{\mathbb{E}}\big[\tilde L\,\partial_s\bar b_m(\cdot,t,\tilde S,m)(S)\big]=\tilde{\mathbb{E}}\Big[\tilde L\,\frac{b^{\alpha-1}(\cdot,t,\tilde S,S)\,b_y(\cdot,t,\tilde S,S)}{\bar b^{\alpha-1}(t,\tilde S,m)}\Big]$.
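The closed-form Gâteaux derivative of the $\alpha$-norm functional above can be validated by a finite-difference quotient on a discrete measure. In the sketch below, the support points, weights, and direction are arbitrary illustrative data:

```python
import numpy as np

# Sketch: finite-difference check of the Gateaux derivative of
#   g(m) = ( \int |xi|^alpha m(d xi) )^{1/alpha}
# on a discrete measure m = sum_i w_i delta_{x_i}.

alpha = 3.0
x = np.array([0.5, 1.0, 1.5, 2.0])          # support of m
w = np.array([0.1, 0.4, 0.3, 0.2])          # weights of m (sum to 1)
w_tilde = np.array([0.3, -0.1, 0.2, -0.4])  # direction m~ (a signed measure)

def g(weights):
    return (np.sum(np.abs(x) ** alpha * weights)) ** (1.0 / alpha)

# Closed form: g_m(m)(xi) = |xi|^alpha / (alpha * m_alpha^{(alpha-1)/alpha})
m_alpha = np.sum(np.abs(x) ** alpha * w)
g_m = np.abs(x) ** alpha / (alpha * m_alpha ** ((alpha - 1.0) / alpha))
closed_form = np.sum(g_m * w_tilde)          # \int g_m(xi) m~(d xi)

eps = 1e-6
finite_diff = (g(w + eps * w_tilde) - g(w)) / eps
print(closed_form, finite_diff)              # agree up to O(eps)
```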
In contrast to classical differential games or classical mean-field games, in which one can directly use the standard Dynamic Programming Principle (DPP) on $s(t)$, here the presence of the term $m^a(t,\cdot)$ in the functions $r_i,b,\sigma$ may create a time-inconsistency, and $s(t)$ is not the appropriate state for the DPP. The next proposition identifies an appropriate state space under which a DPP can be obtained and a Hamilton-Jacobi-Bellman (HJB) equation can be derived.
Proposition 1. If there exists V such that
$$ \begin{cases} i\in I,\\ V_{i,t}+\displaystyle\int_s\Big[\sup_{a_i}\,r_i+b\,V_{i,sm}+\frac{\sigma^2}{2}V_{i,ssm}\Big]m(t,ds)=0,\\ V_i(T,m)=\displaystyle\int_s m(T,ds)\,g_i(s,m(T,\cdot)),\\ m(t,\cdot)=P_{s^a(t)},\end{cases}\tag{4} $$
then $V_i(0,m_0)$ is an equilibrium payoff and the optimal strategy $a_i$ of $i$ maximizes $r_i+b\,V_{i,sm}+\frac{\sigma^2}{2}V_{i,ssm}$.
Proof. In order to use a dynamic programming principle we look for an augmented state such that one gets a Markovian system. By rewriting the payoff functional ERi,T as a function of the measures m(t,.) and the strategies ai(.), we obtain a classical setup. Hence, we only need the evolution of the measures m(t,.) which is given by (2). Then, one obtains a deterministic differential game problem with m as state (in infinite dimensions). The best response of i is
$$ V_i(0,m_0)=\begin{cases}\sup_{a_i\in\mathcal{A}_i}\mathbb{E}[R_{i,T}(m_0,a)]\\ m(t,\cdot)\ \text{solves (2)}.\end{cases} $$
Applying the classical DPP to the deterministic problem yields
$$ V_{i,t}+\sup_{a_i}\Big[\int_s r_i\,m(t,ds)+\langle\tilde b,V_{i,m}\rangle\Big]=0, $$
where $\tilde b(m)(t,s)=-(bm)_s+\frac12(\sigma^2m)_{ss}$. We apply the theory of distributions to establish the following equalities:
$$ \langle\tilde b,V_{i,m}\rangle=\int_s\tilde b(m)\,V_{i,m}\,ds \tag{5} $$
$$ =\int_s\Big[-(bm)_sV_{i,m}+\frac12(\sigma^2m)_{ss}V_{i,m}\Big]ds \tag{6} $$
$$ =\int_s\Big[b\,V_{i,sm}+\frac{\sigma^2}{2}V_{i,ssm}\Big]m(t,ds), \tag{7} $$
which is obtained by integration by parts (in the sense of distributions). This completes the proof.
Example 1 (Mean-field-free equations). If $r_i,g_i,b,\sigma$ are all independent of $m$, then $V_i(t,m)=\int v_i(t,s)\,m(t,ds)=\mathbb{E}_{m(t,\cdot)}v_i(t,s(t))$, where the functions $(v_i(t,s))_i$ solve the classical Bellman system
$$ \begin{cases} i\in I,\\ v_{i,t}+H_i=0,\\ v_i(T,s)=g_i(s,m(T,\cdot)).\end{cases}\tag{8} $$
Moreover
$$ v_i(0,s_0)=\begin{cases}\sup_{a_i\in\mathcal{A}_i}\mathbb{E}[R_{i,T}(s_0,a)]\\ s(t)=s^{s_0,a}(t),\ s(0)=s_0.\end{cases}\tag{9} $$
Proposition 1 provides a sufficient condition for equilibria. However, the existence of a classical solution to the infinite-dimensional HJB equation is not guaranteed, and the value function may not be differentiable as in the classical case. Weaker notions of solution, such as viscosity solutions using weak sub/super-differential sets, will be introduced in the next section.
Since the DPP equilibrium system (4) is in infinite dimensions, the solvability of such an integro-partial differential equation poses some technical difficulties. However, when the value function $V$ is weakly differentiable with respect to $(s,m)$, a finite-dimensional quantity can be obtained, as it lives in the dual space.
Definition 2. The function v∗(t,s)=Vm(t,s) is the dual function associated to V.
Example 2. In the mean-field-free setting, i.e., when ri,gi,b,σ are independent of m, the dual function v∗(t,s)=Vm(t,s) coincides with the function v(t,s). However, the two functions do not coincide in the mean-field-dependent setting.
Proposition 2. If Vm,Vsm,Vssm exist and are continuous then the equilibrium dual function v∗=Vm solves the following system of partial differential equations
$$ \begin{cases} i\in I,\\ v^*_{i,t}+H_i+\mathbb{E}H_{i,m}=0,\\ v^*_i(T,s)=g_i(s,m(T,\cdot))+\mathbb{E}g_{i,m},\\ m(t,\cdot)=P_{s^a(t)},\ m(0,\cdot)=m_0(\cdot),\end{cases}\tag{10} $$
where $H_i(t,s,m,v^*_{i,s},v^*_{i,ss})=\sup_{a_i}\big[r_i+b\,v^*_{i,s}+\frac{\sigma^2}{2}v^*_{i,ss}\big]$.
Proof. We differentiate the HJB system (4) in the direction $\tilde m=m$:
$$ \begin{cases} i\in I,\\ (V_{i,t})_m+\Big[\sup_{a_i}r_i+b\,V_{i,sm}+\frac{\sigma^2}{2}V_{i,ssm}\Big]+\displaystyle\int_{\tilde s}\Big[\sup_{a_i}r_i+b\,V_{i,sm}+\frac{\sigma^2}{2}V_{i,ssm}\Big]_m(t,\tilde s,m)(s)\,m(t,d\tilde s)=0,\\ V_{i,m}(T,s)=g_i(s,m(T,\cdot))+\displaystyle\int_{\tilde s}g_{i,m}(\tilde s,m(T,\cdot))(s)\,m(T,d\tilde s),\\ m(t,\cdot)=P_{s^a(t)}.\end{cases}\tag{11} $$
Noting that (Vi,t)m=v∗i,t, Vi,sm=v∗i,s and Vi,ssm=v∗i,ss, the latter system (11) becomes the announced one. This completes the proof.
As pointed out by [17], the dual function v∗i(0,s0) is not the equilibrium payoff of the decision-maker i. The dual function plays an important role in establishing a stochastic maximum principle as we will see in the next subsection.
We now present first-order and second-order adjoint processes that are useful in establishing necessary conditions for mean-field-type equilibria. The advantage of the adjoint processes is that they solve a class of linear backward SDEs of mean-field type. The following result provides existence and uniqueness conditions.
Lemma 3 ([9,11]). Consider the following mean-field backward SDE
$$ p(t)=p(T)+\int_t^T\tilde{\mathbb{E}}\big[\hat f(t',\tilde p(t'),\tilde q(t'),p(t'),q(t'))\big]\,dt'-\int_t^T q(t')\,dB(t'), $$
where p(T) is a progressively measurable, square integrable random variable. Let ˆf(t,.,.,.,.) be Lipschitz for all time t∈[0,T] and t↦ˆf(t,0,0,0,0) be square integrable over [0,T]. Then, the mean-field backward SDE has a unique adapted solution satisfying
$$ \mathbb{E}\Big[\sup_{t\in[0,T]}|p(t)|^2+\int_0^T|q(t)|^2\,dt\Big]<\infty. \tag{12} $$
Consider the first-order adjoint processes (pi,qi)i∈I
$$ \begin{cases} dp_i=-\big[H_{i,s}+\tilde{\mathbb{E}}\,\partial_sH_{i,m}\big]dt+q_i\,dB,\\ p_i(T)=g_{i,s}(T)+\tilde{\mathbb{E}}\{\partial_sg_{i,m}(T)\},\quad i\in I.\end{cases}\tag{13} $$
Lemma 4. Suppose the functions $b,\sigma,r_i,g_i$ are continuously differentiable with respect to $(s,m)$, and all their first-order derivatives with respect to $(s,m)$ are continuous in $(s,m,a)$ and bounded. Then the first-order adjoint system is a linear backward SDE with almost surely bounded coefficient functions, and there is a unique $\mathcal{F}$-adapted solution such that
$$ \mathbb{E}\Big[\sup_{t\in[0,T]}|p_i(t)|^2+\int_0^T|q_i(t)|^2\,dt\Big]<+\infty. $$
These strong smoothness conditions on b,σ,ri,gi can be considerably weakened using representations of weak sub/super-differential sets. We refer the reader to [8,9] for more details on existence and uniqueness of solutions to backward SDE of mean-field type.
Proof. By choosing $\hat f(t,\tilde p(t),\tilde q(t),p(t),q(t))=\alpha_0(t,\cdot)+\alpha_1(t,\cdot)\tilde p(t)+\alpha_2(t,\cdot)\tilde q(t)+\alpha_3(t,\cdot)p(t)+\alpha_4(t,\cdot)q(t)$, where the $\alpha_i(t,\cdot)$ are measurable bounded coefficient functions, one gets a backward equation in the form of the adjoint equations. Under the assumptions imposed on $b,\sigma,r_i,g_i$, the coefficients $\alpha_i$ fulfill the requirements, and the result follows as a direct application.
Note that for linear-quadratic mean-field-type games, the payoff function $r_i=-s^2-\int y^2\,m(t,dy)-a_i^2-\int(a_i')^2\,P_{a_i}(da_i')$ does not satisfy the above assumptions. In that case, one can relax the boundedness assumption and replace it by $L^2$-estimates, provided the second moment is finite.
Consider the second-order adjoint processes (Pi,Qi)i∈I
$$ \begin{cases} dP_i=-\big\{H_{i,ss}+\tilde{\mathbb{E}}\,\partial_{ss}H_{i,m}+(2b_s+\sigma_s^2)P_i+2\sigma_sQ_i\big\}dt+Q_i\,dB,\\ P_i(T)=g_{i,ss}(T)+\tilde{\mathbb{E}}\{\partial_{ss}g_{i,m}(T)\},\quad i\in I.\end{cases}\tag{14} $$
Lemma 5. Suppose the functions $b,\sigma,r_i,g_i$ are twice continuously differentiable with respect to $(s,m)$, and all their derivatives up to second order with respect to $(s,m)$ are continuous in $(s,m,a)$ and bounded. Then the second-order adjoint system is a linear backward SDE with almost surely bounded coefficient functions, and there is a unique $\mathcal{F}$-adapted solution such that
$$ \mathbb{E}\Big[\sup_{t\in[0,T]}|P_i(t)|^2+\int_0^T|Q_i(t)|^2\,dt\Big]<+\infty. $$
The proof follows immediately from Lemma 3.
Proposition 3 (Stochastic Maximum Principle). If $(s,a)$ is a pair of equilibrium state and equilibrium strategy, then there is a vector of $\mathcal{F}$-adapted processes $(p_i,q_i,P_i,Q_i)_{i\in I}$ such that
$$ H_i(s,m,a_i',a_{-i})-H_i(s,m,a)+\frac12P_i\big(\sigma(s,m,a_i',a_{-i})-\sigma(s,m,a)\big)^2\le0 $$
for all $a_i'\in A_i$, for almost every $t\in[0,T]$ and $\mathbb{P}$-almost surely. In particular, if $\sigma(t,s,m,a)=\sigma(t,s,m)$ is independent of $a_i$, then $\sup_{a_i'\in A_i}H_i(s,m,a_i',a_{-i})\le H_i(s,m,a)$.
Proof. The proof follows similar steps as in [10], replacing the derivative with respect to the first-moment component by the Gâteaux derivative with respect to $m$, for each decision-maker $i$, with the strategies of the other decision-makers fixed.
In order to provide weaker conditions, we introduce super- and sub-differentials of the viscosity solution $V$. Define the first-order super-differential of the dual function as
$$ D^{1,+}_{t,w}V_m(t,w)=\Big\{(d_1,d_2)\in\mathbb{R}\times T(S)\ \Big|\ \limsup_{t',w'\to t,w}\frac{V_m(t',w')-V_m(t,w)-d_1(t'-t)-d_2(w'-w)}{|t'-t|+\|w'-w\|}\le0\Big\}. $$
Similarly, the first-order sub-differential is
$$ D^{1,-}_{t,w}V_m(t,w)=\Big\{(d_1,d_2)\in\mathbb{R}\times T(S)\ \Big|\ \liminf_{t',w'\to t,w}\frac{V_m(t',w')-V_m(t,w)-d_1(t'-t)-d_2(w'-w)}{|t'-t|+\|w'-w\|}\ge0\Big\}. $$
Using these weak derivatives, one has a relationship between the stochastic maximum principle and the dynamic programming principle in terms of an inclusion: $p(t)\in D^{1,+}_{s}V_m(t,w)$. In particular, if $V$ is Gâteaux-differentiable with respect to $m$ and $s$, then
$$ \begin{cases} p_i(t)=V_{i,sm}(s(t),m(t))=d_2,\\ q_i(t)=\sigma V_{i,ssm}(s(t),m(t)).\end{cases} $$
Proposition 4. If the first- and second-order weak derivatives with respect to $s$ of $V_m$ exist, then the process $p_i(t)=V_{i,sm}(s(t))$ solves the backward SDE
$$ dp_i=-\alpha_i\,dt+\beta_i\,dB,\qquad p_i(T)=g_{i,s}(s(T),m(T))+\int_s\partial_sg_{i,m}\,m(T,ds), $$
where $\alpha_i=H_{i,s}+\tilde{\mathbb{E}}[\partial_sH_{i,m}]$, $H_{i,s}$ is a notation for $r_{i,s}+b_sp_i+\sigma_sq_i$ (not to be confused with a derivative of the functional $H_i$), and $\beta_i=\sigma V_{i,ssm}=q_i$.
Proof. Let $\tilde f_i(t,s)=V_{i,sm}(t,s)=v^*_{i,s}(t,s)$. Applying Itô's formula to $\tilde p_i(t)=\tilde f_i(t,s(t))$ yields
$$ d\tilde p_i=\Big[(\tilde f_i)_t+b(\tilde f_i)_s+\frac{\sigma^2}{2}(\tilde f_i)_{ss}\Big]dt+\sigma(\tilde f_i)_s\,dB. $$
We identify the coefficient process $\tilde q_i=\sigma(\tilde f_i)_s=\sigma v^*_{i,ss}=\sigma V_{i,ssm}$. It remains to find the drift coefficient of $\tilde p_i$.
$$ \begin{cases}\tilde p_i=v^*_{i,s}(t,s)=V_{i,sm}(t,s),\\ (\tilde f_i)_t=v^*_{i,ts},\\ b(\tilde f_i)_s=b\,v^*_{i,ss},\\ \frac{\sigma^2}{2}(\tilde f_i)_{ss}=\frac{\sigma^2}{2}v^*_{i,sss}.\end{cases} $$
Summing together one obtains
$$ (\tilde f_i)_t+b(\tilde f_i)_s+\frac{\sigma^2}{2}(\tilde f_i)_{ss}=v^*_{i,ts}+b\,v^*_{i,ss}+\frac{\sigma^2}{2}v^*_{i,sss}. \tag{15} $$
In order to identify the latter term, one differentiates with respect to s the equation (10) satisfied by the dual function v∗i.
$$ \begin{cases} i\in I,\\ v^*_{i,st}+r_{i,s}+(b\,v^*_{i,s})_s+\big(\frac{\sigma^2}{2}v^*_{i,ss}\big)_s+\tilde{\mathbb{E}}H_{i,sm}=0,\\ v^*_{i,s}(T,s)=g_{i,s}(s,m(T,\cdot))+\tilde{\mathbb{E}}g_{i,sm},\\ m(t,\cdot)=P_{s^a(t)},\ m(0,\cdot)=m_0(\cdot),\end{cases}\tag{16} $$
which is expanded as
$$ \begin{cases} i\in I,\\ (v^*_{i,s})_t+b\,v^*_{i,ss}+\frac{\sigma^2}{2}v^*_{i,sss}+\big[r_{i,s}+b_sv^*_{i,s}+\sigma_s(\sigma v^*_{i,ss})+\tilde{\mathbb{E}}H_{i,sm}\big]=0,\\ v^*_{i,s}(T,s)=g_{i,s}(s,m(T,\cdot))+\tilde{\mathbb{E}}g_{i,sm},\\ m(t,\cdot)=P_{s^a(t)},\ m(0,\cdot)=m_0(\cdot).\end{cases}\tag{17} $$
It follows that $(v^*_{i,s})_t+b\,v^*_{i,ss}+\frac{\sigma^2}{2}v^*_{i,sss}=-\big[r_{i,s}+b_sv^*_{i,s}+\sigma_s(\sigma v^*_{i,ss})+\tilde{\mathbb{E}}H_{i,sm}\big]$, and
$$ d\tilde p_i=-\big[r_{i,s}+b_sv^*_{i,s}+\sigma_s(\sigma v^*_{i,ss})+\tilde{\mathbb{E}}H_{i,sm}\big]dt+\sigma v^*_{i,ss}\,dB=-\big[H_{i,s}+\tilde{\mathbb{E}}H_{i,sm}\big]dt+q_i\,dB. $$
The two pairs of processes $(p_i,q_i)$ and $(\tilde p_i=v^*_{i,s},\ \tilde q_i=\sigma v^*_{i,ss})$ solve the same backward SDE with the same terminal condition. Under the assumptions above, these two processes are identical. This completes the proof.
Define the second-order super-differential
$$ D^{1,2,+}_{t,w}V_m(t,w)=\Big\{(d_1,d_2,d_3)\in\mathbb{R}\times T(S)\times S^n\ \Big|\ \limsup_{t',w'\to t,w}\frac{V_m(t',w')-V_m(t,w)-d_1(t'-t)-d_2(w'-w)-\frac12(w'-w)'d_3(w'-w)}{|t'-t|+\|w'-w\|^2}\le0\Big\}. $$
Similarly, the second-order sub-differential is
$$ D^{1,2,-}_{t,w}V_m(t,w)=\Big\{(d_1,d_2,d_3)\in\mathbb{R}\times T(S)\times S^n\ \Big|\ \liminf_{t',w'\to t,w}\frac{V_m(t',w')-V_m(t,w)-d_1(t'-t)-d_2(w'-w)-\frac12(w'-w)'d_3(w'-w)}{|t'-t|+\|w'-w\|^2}\ge0\Big\}. $$
$V_i$ is a viscosity solution if and only if $V_i(T,m)=\int_s g_i\,m(T,ds)$ for any measure $m$ and, for all $(t,m)\in[0,T)\times\Delta(S)$,
$$ \begin{cases} d_1+\displaystyle\int\sup_{a_i(\cdot)\in A_i}H_i(t,s,m,a,d_2,d_3)\le0,\ \forall\,d\in D^{1,2,+}_{t,w}V_{i,m}(t,w),\\ d_1+\displaystyle\int\sup_{a_i(\cdot)\in A_i}H_i\ge0,\ \forall\,d\in D^{1,2,-}_{t,w}V_{i,m}(t,w).\end{cases} $$
In particular, if $V_{i,m}\in C^{1,3}$ then
$$ \begin{cases} P_i(t)=V_{i,ssm}(t,s(t))=d_3,\\ Q_i(t)=\sigma V_{i,sssm}(t,s(t)).\end{cases} $$
Proposition 5. If the first- and second-order weak derivatives with respect to $s$ of $V_m$ exist, then the process $P_i(t)=V_{i,ssm}(t,s(t))$ solves the backward SDE
$$ dP_i=-\gamma_i\,dt+\kappa_i\,dB,\qquad P_i(T)=g_{i,ss}(s(T),m(T))+\int_s\partial_{ss}g_{i,m}\,m(T,ds), $$
where $\gamma_i=H_{i,ss}+\tilde{\mathbb{E}}\,\partial_{ss}H_{i,m}+(2b_s+\sigma_s^2)P_i+2\sigma_sQ_i$ and $\kappa_i=\sigma V_{i,sssm}=Q_i$.
Proof. We compute the second-order terms explicitly. Let $\tilde g_i(t,s)=V_{i,ssm}(t,s)=v^*_{i,ss}(t,s)$. Applying Itô's formula to $\tilde P_i(t)=\tilde g_i(t,s(t))$ yields
$$ d\tilde P_i=\Big[(\tilde g_i)_t+b(\tilde g_i)_s+\frac{\sigma^2}{2}(\tilde g_i)_{ss}\Big]dt+\sigma(\tilde g_i)_s\,dB. $$
We identify the coefficient process $\tilde Q_i=\sigma(\tilde g_i)_s=\sigma v^*_{i,sss}=\sigma V_{i,sssm}$. It remains to find the drift coefficient of $\tilde P_i$.
$$ \begin{cases}\tilde P_i=v^*_{i,ss}(t,s)=V_{i,ssm}(t,s),\\ (\tilde g_i)_t=v^*_{i,tss},\\ b(\tilde g_i)_s=b\,v^*_{i,sss},\\ \frac{\sigma^2}{2}(\tilde g_i)_{ss}=\frac{\sigma^2}{2}v^*_{i,ssss}.\end{cases} $$
Summing together, one obtains
$$ (\tilde g_i)_t+b(\tilde g_i)_s+\frac{\sigma^2}{2}(\tilde g_i)_{ss}=v^*_{i,tss}+b\,v^*_{i,sss}+\frac{\sigma^2}{2}v^*_{i,ssss}. \tag{18} $$
In order to identify the latter term, one differentiates equation (10), satisfied by the dual function $v^*_i$, twice with respect to $s$:
$$ \begin{cases} i\in I,\\ v^*_{i,sst}+r_{i,ss}+(b\,v^*_{i,s})_{ss}+\big(\frac{\sigma^2}{2}v^*_{i,ss}\big)_{ss}+\tilde{\mathbb{E}}H_{i,ssm}=0,\\ v^*_{i,ss}(T,s)=g_{i,ss}(s,m(T,\cdot))+\tilde{\mathbb{E}}g_{i,ssm},\\ m(t,\cdot)=P_{s^a(t)},\ m(0,\cdot)=m_0(\cdot),\end{cases}\tag{19} $$
with
$$ \begin{cases} (b\,v^*_{i,s})_{ss}=(b_sv^*_{i,s}+b\,v^*_{i,ss})_s=b_{ss}v^*_{i,s}+2b_sv^*_{i,ss}+b\,v^*_{i,sss},\\ \big(\frac{\sigma^2}{2}v^*_{i,ss}\big)_{ss}=\big[\sigma_s\sigma v^*_{i,ss}+\frac{\sigma^2}{2}v^*_{i,sss}\big]_s=(\sigma_{ss}\sigma+\sigma_s^2)v^*_{i,ss}+2\sigma_s\sigma v^*_{i,sss}+\frac{\sigma^2}{2}v^*_{i,ssss}.\end{cases} $$
By substitution we obtain
$$ 0=v^*_{i,sst}+r_{i,ss}+(b\,v^*_{i,s})_{ss}+\Big(\tfrac{\sigma^2}{2}v^*_{i,ss}\Big)_{ss}+\tilde{\mathbb{E}}H_{i,ssm}=v^*_{i,tss}+b\,v^*_{i,sss}+\tfrac{\sigma^2}{2}v^*_{i,ssss}+r_{i,ss}+b_{ss}v^*_{i,s}+\sigma_{ss}(\sigma v^*_{i,ss})+(2b_s+\sigma_s^2)v^*_{i,ss}+2\sigma_s(\sigma v^*_{i,sss})+\tilde{\mathbb{E}}H_{i,ssm}, $$
so that
$$ -\big[v^*_{i,tss}+b\,v^*_{i,sss}+\tfrac{\sigma^2}{2}v^*_{i,ssss}\big]=r_{i,ss}+b_{ss}v^*_{i,s}+\sigma_{ss}\sigma v^*_{i,ss}+(2b_s+\sigma_s^2)v^*_{i,ss}+2\sigma_s(\sigma v^*_{i,sss})+\tilde{\mathbb{E}}H_{i,ssm}=H_{i,ss}+(2b_s+\sigma_s^2)\tilde P_i+2\sigma_s\tilde Q_i+\tilde{\mathbb{E}}H_{i,ssm}, $$
and
$$ d\tilde P_i=\Big[(\tilde g_i)_t+b(\tilde g_i)_s+\tfrac{\sigma^2}{2}(\tilde g_i)_{ss}\Big]dt+\sigma(\tilde g_i)_s\,dB=-\big[H_{i,ss}+\tilde{\mathbb{E}}H_{i,ssm}+(2b_s+\sigma_s^2)\tilde P_i+2\sigma_s\tilde Q_i\big]dt+\tilde Q_i\,dB, $$
where $H_{i,ss}=r_{i,ss}+b_{ss}p_i+\sigma_{ss}q_i$. This completes the proof.
The space of square-integrable real-valued deterministic functions over $[0,T]$ is denoted by $L^2([0,T],\mathbb{R})$. By the Riesz-Fischer theorem, the space $L^2([0,T],\mathbb{R})$ of square Lebesgue-integrable functions over $[0,T]$ is an (infinite-dimensional) complete metric space. Thus, $L^2([0,T],\mathbb{R})$ is a Hilbert space (with the inner product $\langle f,g\rangle=\int_0^T fg\,dt$).
Lemma 6 ([22], prop. 5.14). Every Hilbert space that is not reduced to the singleton $\{0\}$ has an orthonormal basis. In particular, $L^2([0,T],\mathbb{R})$ has an orthonormal basis. Let $\{\hat m_k(\cdot),k\in\mathbb{Z}_+\}$ be a family in the Hilbert space $L^2([0,T],\mathbb{R})$. Then the following are equivalent:
• $\{\hat m_k(\cdot),k\in\mathbb{Z}_+\}$ is an orthonormal basis of the Hilbert space $L^2([0,T],\mathbb{R})$.
• $\forall f\in L^2([0,T],\mathbb{R})$, $f(t)=\sum_{k\in\mathbb{Z}_+}\langle f,\hat m_k\rangle\,\hat m_k(t)=\sum_{k\in\mathbb{Z}_+}\big(\int_0^T f(t')\hat m_k(t')\,dt'\big)\hat m_k(t)$.
• One has $\|f\|^2_{L^2([0,T],\mathbb{R})}=\sum_{k\in\mathbb{Z}_+}|\langle f,\hat m_k\rangle|^2=\sum_{k\in\mathbb{Z}_+}\big(\int_0^T f(t)\hat m_k(t)\,dt\big)^2$ for all $f\in L^2([0,T],\mathbb{R})$.
• $\langle f,\hat m_k\rangle=0$ for all $k\in\mathbb{Z}_+$ implies $f=0$ (the identically null function over $[0,T]$).
See Robinson ([22], prop. 5.14) for a proof of this result.
Example 3. Orthonormal Basis of L2([0,T],R):
• An important orthogonal basis of $L^2([0,T],\mathbb{R})$ is the set $\{1,\cos(\frac{2\pi}{T}kt),\sin(\frac{2\pi}{T}kt)\}$, $k\ge1$. Normalizing by the respective norms in $L^2([0,T],\mathbb{R})$, one gets an orthonormal basis.
• One can use the Gram-Schmidt algorithm to construct new bases from more-or-less arbitrary collections of vectors. This is an inductive process applied to a basis $\{f_k\ |\ k\in\mathbb{Z}_+\}$ of square-integrable functions over $[0,T]$. (i) Since $f_0\neq0$, we set $\hat m_0=f_0/\|f_0\|_{L^2([0,T],\mathbb{R})}$, and (ii) then inductively
$$ \hat f_{k+1}(t)=f_{k+1}(t)-\sum_{i=0}^{k}\langle f_{k+1},\hat m_i\rangle\,\hat m_i(t); $$
here $\hat f_{k+1}\neq0$, because otherwise $f_{k+1}$ could be written as a linear combination of the vectors $\hat m_0(\cdot),\dots,\hat m_k(\cdot)$. Thus, set $\hat m_{k+1}=\hat f_{k+1}/\|\hat f_{k+1}\|_{L^2([0,T],\mathbb{R})}$ to get a unit vector. (iii) Then the set $\{\hat m_k(\cdot)\ |\ k\in\mathbb{Z}_+\}$ is an orthonormal basis of $L^2([0,T],\mathbb{R})$.
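The Gram-Schmidt construction above can be sketched on a discretized $L^2([0,T],\mathbb{R})$, here applied to the monomials $\{1,t,t^2,t^3\}$; the grid resolution is an illustrative choice.

```python
import numpy as np

# Sketch: Gram-Schmidt orthonormalization in a discretized L^2([0,T],R).
# The inner product <f,g> = \int_0^T f(t) g(t) dt is approximated by the
# trapezoidal rule on a uniform grid.

T = 1.0
t = np.linspace(0.0, T, 20001)
dt = t[1] - t[0]

def inner(f, g):
    h = f * g
    return dt * (0.5 * h[0] + h[1:-1].sum() + 0.5 * h[-1])  # trapezoidal rule

def gram_schmidt(functions):
    basis = []
    for f in functions:
        fhat = f.copy()
        for m_hat in basis:                       # subtract earlier projections
            fhat = fhat - inner(f, m_hat) * m_hat
        basis.append(fhat / np.sqrt(inner(fhat, fhat)))  # normalize
    return basis

basis = gram_schmidt([t ** k for k in range(4)])

# The Gram matrix of the output should be (numerically) the identity.
G = np.array([[inner(u, v) for v in basis] for u in basis])
print(np.round(G, 6))
```

For monomials on $[0,1]$, the resulting functions are, up to normalization, the shifted Legendre polynomials.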
Lemma 7. From any orthonormal basis $\{\hat m_k\}_k$ of $L^2([0,T],\mathbb{R})$, set $\xi_k=\int_0^T\hat m_k(t)\,dB(t)$, the stochastic Itô integral. Then the random variables $\{\xi_k\}_k$ are identically distributed Gaussians with zero mean and variance equal to
$$ \int_0^T\hat m_k^2(t)\,dt=\|\hat m_k\|^2_{L^2([0,T],\mathbb{R})}=1. $$
Furthermore, the standard Brownian motion can be decomposed as follows:
$$ B(t)=\int_0^t dB(t')=\int_0^T 1\!\!1_{[0,t]}(t')\,dB(t')=\int_0^T\Big(\sum_{k\ge0}\hat m_k(t')\langle 1\!\!1_{[0,t]}(\cdot),\hat m_k(\cdot)\rangle\Big)dB(t')=\sum_{k\ge0}\langle 1\!\!1_{[0,t]}(\cdot),\hat m_k(\cdot)\rangle\int_0^T\hat m_k(t')\,dB(t')=\sum_{k\ge0}\Big(\int_0^t\hat m_k(t')\,dt'\Big)\xi_k. \tag{20} $$
The expansion
$$ B(t)=\sum_{k\ge0}\xi_k\int_0^t\hat m_k(t')\,dt' $$
converges in the mean-square sense:
$$ \mathbb{E}\Big[\Big\|B(t)-\sum_{k=0}^{K}\xi_k\int_0^t\hat m_k(t')\,dt'\Big\|^2\Big]\to0 $$
as K goes to infinity, for t≤T.
The mean-square error is
$$ \mathbb{E}\Big[\Big\|\sum_{k=K+1}^{+\infty}\xi_k\int_0^t\hat m_k(t')\,dt'\Big\|^2\Big]=\sum_{k=K+1}^{+\infty}\Big(\int_0^t\hat m_k(t')\,dt'\Big)^2=O\Big(\frac{t}{K}\Big). $$
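Because the $\xi_k$ are orthonormal in $L^2(\Omega)$, the mean-square truncation error of the expansion (20) is available in closed form: it equals $t-\sum_{k\le K}\big(\int_0^t\hat m_k(t')\,dt'\big)^2$, which tends to $0$ by Parseval's identity applied to $1\!\!1_{[0,t]}$. The following sketch uses the trigonometric basis (an illustrative choice of basis):

```python
import numpy as np

# Sketch: closed-form mean-square truncation error of the basis expansion
# of Brownian motion,  E|B(t) - sum_{k<=K} xi_k \int_0^t m_hat_k|^2
#                      = t - sum_{k<=K} ( \int_0^t m_hat_k(t') dt' )^2,
# for the orthonormal trigonometric basis on [0,T] with T = 1.

T = 1.0

def basis_integral(k, t):
    """\\int_0^t m_hat_k(t') dt' for the orthonormal trigonometric basis."""
    if k == 0:
        return t / np.sqrt(T)                         # m_hat_0 = 1/sqrt(T)
    j = (k + 1) // 2
    w = 2.0 * np.pi * j / T
    if k % 2 == 1:                                    # sqrt(2/T) cos(w t')
        return np.sqrt(2.0 / T) * np.sin(w * t) / w
    return np.sqrt(2.0 / T) * (1.0 - np.cos(w * t)) / w  # sqrt(2/T) sin(w t')

def truncation_error(t, K):
    partial = sum(basis_integral(k, t) ** 2 for k in range(K + 1))
    return t - partial

t0 = 0.37
errors = [truncation_error(t0, K) for K in (1, 10, 100, 1000)]
print(errors)   # positive, decreasing toward 0
```

With only the constant basis function ($K=0$), the error is exactly $t-t^2$ for $T=1$, which the test below uses as an anchor.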
As a result of the above lemma, we can view the Itô process
$$ s(t)=s(0)+\int_0^t b(t',s(t'),m(t'),a(t'))\,dt'+\int_0^t\sigma(t',s(t'),m(t'),a(t'))\,dB(t'), $$
as a function of t,s(0),w and the set of random variables ξ=(ξk)k. We denote the solution of the state equation as s(t):=ss0,a(t).
Definition 3. With the one-dimensional Hermite polynomials
$$ \hat H_k(t)=\frac{(-1)^k}{k!}\,e^{\frac{t^2}{2}}\frac{d^k}{dt^k}\Big[e^{-\frac{t^2}{2}}\Big], $$
the basis polynomials of the Wiener chaos space are defined by
$$ \chi_\alpha(\xi)=\sqrt{\alpha!}\,\prod_i\hat H_{\alpha_i}(\xi_i), $$
where $\alpha!$ is the product of the factorials of all the components, and $\alpha$ denotes a multi-index from the set
$$ \Big\{\alpha=(\alpha_i)_i\ \Big|\ \alpha_i\ge0,\ |\alpha|=\sum_{i=0}^{+\infty}\alpha_i<+\infty\Big\}. $$
Note that, for a multi-index α such that |α|<+∞, the number of the terms in the product ∏iˆHαi is finite. From Definition 3 and the properties of the Hermite polynomials one directly shows the orthogonality of the above polynomial basis, which are often referred to as Wick polynomials. If the controlled state process s∈L2(Ω×[0,T],R) then one can decompose it as
$$ s(t)=\sum_\alpha\langle s(t),\chi_\alpha(\xi)\rangle\,\chi_\alpha(\xi)=\sum_\alpha\mathbb{E}[s(t)\chi_\alpha(\xi)]\,\chi_\alpha(\xi), $$
where $\langle x,y\rangle=\mathbb{E}[xy]$ when $x$ and $y$ are random variables. In particular, the coefficient function of order zero, $s_0(t)=\mathbb{E}[s(t)\chi_0(\xi)]$, obtains a special meaning: it coincides with the expectation of the process $s(t)$, as the basis polynomial of order zero is identically one.
This is summarized in the following well-known result:
Theorem 3.1 (Cameron and Martin [23]). Assume that the state process $s(t)$ is adapted to the filtration generated by the Wiener chaos $\{\chi_\alpha(\xi)\}_\alpha$ and is in $L^2(\Omega\times[0,T],\mathbb{R})$, i.e., it satisfies the integrability condition $\mathbb{E}[\int_0^T|s(t)|^2\,dt]<+\infty$. Then $s(t)$ can be expanded in $[0,T]$ as
$$ s(t,\xi)=\sum_\alpha s_\alpha(t)\chi_\alpha(\xi), $$
where the deterministic coefficient functions $s_\alpha(t)=\mathbb{E}[s(t)\chi_\alpha(\xi)]$ can be interpreted as projections of the process $s(t)$ onto the corresponding chaos basis.
Note that the variance involves the indices $|\alpha|\ge1$ and not the index $0$. The polynomial chaos (PC) or Wiener chaos framework was developed by Norbert Wiener [24,25] and later generalized by Cameron and Martin [23]. The Fourier-Hermite series of $s(t)$ is often called the Wiener chaos expansion (WCE).
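As a sanity check of the normalization in Definition 3, note that in one dimension the Wick polynomials reduce to $\chi_k(\xi)=He_k(\xi)/\sqrt{k!}$, with $He_k$ the probabilists' Hermite polynomials. The sketch below verifies their orthonormality under the standard Gaussian law of $\xi$ by Gauss-Hermite quadrature:

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

# Sketch: check that the normalized (probabilists') Hermite polynomials
# He_k(xi)/sqrt(k!) -- the one-dimensional Wick polynomials chi_k -- are
# orthonormal with respect to the standard Gaussian law of xi.  E[.] is
# evaluated by Gauss-Hermite quadrature for the weight exp(-x^2/2).

nodes, weights = He.hermegauss(40)        # exact for polynomials of degree <= 79
weights = weights / np.sqrt(2.0 * np.pi)  # normalize so the weights sum to 1

def chi(k, x):
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                        # select He_k
    return He.hermeval(x, coeffs) / np.sqrt(factorial(k))

# Gram matrix E[chi_j(xi) chi_k(xi)] for j, k = 0..5
G = np.array([[np.sum(weights * chi(j, nodes) * chi(k, nodes))
               for k in range(6)] for j in range(6)])
print(np.round(G, 8))                      # approximately the identity matrix
```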
Theorem 3.2 (Error estimates, Theorem 2.1 in [26]). If there exists $k$ such that $\|\partial_\xi^k s(t,\xi)\|_2<+\infty$, then the following error estimate holds:
$$ \Big\|s(t)-\sum_{|\alpha|\le K}s_\alpha(t)\chi_\alpha(\xi)\Big\|_2\le\frac{\|\partial_\xi^k s(t,\xi)\|_2}{\prod_{i=0}^{k-1}(K-i+1)}=:\epsilon_K. $$
It is important to notice that this error $\epsilon_K$ is much better than Monte-Carlo sampling (of order $O(1/\sqrt{K})$ for $K$ samples) when the random processes of the basis capture the state process $s$. This provides a more efficient way to solve mean-field-type game problems. However, the choice of a basis is crucial, and the chosen elements of the basis need to be sparse enough in order to reduce the curse of dimensionality.
We now reformulate the mean-field-type game problems using the PC framework above. We replace the state process s and the control process by their respective PC expansions. Therefore, s is determined by its coefficient functions (sα)α. We can compute the cost as
$$ \mathbb{E}g_i(s(T),m(T))=\hat g_i\big((s_\alpha(T))_\alpha\big),\qquad \mathbb{E}r_i(t,s(t),m(t),a(t))=\hat r_i\big(t,(s_\alpha(t))_\alpha,(a_\alpha(t))_\alpha\big), \tag{21} $$
and the state dynamics coefficients are
$$ s_\alpha(t)=s_\alpha(0)1\!\!1_{\{\alpha=0\}}+\Big\langle\chi_\alpha(\cdot),\int_0^t b\,dt'\Big\rangle+\Big\langle\chi_\alpha(\cdot),\int_0^t\sigma\,dB(t')\Big\rangle. $$
Since the Lebesgue integral term is $\langle\chi_\alpha(\cdot),\int_0^t b\,dt'\rangle=\int_0^t b_\alpha\,dt'$, the Itô integral term becomes
$$ \Big\langle\chi_\alpha(\cdot),\int_0^t\sigma\,dB(t')\Big\rangle=\int_0^t\sigma\,\partial_{t'}[\chi_\alpha]\,dt'. $$
Using Hermite polynomials, we know that
$$ \partial_{t'}[\chi_\alpha]=\sum_{k=1}^{\infty}\sqrt{\alpha_k}\,\hat m_k(t')\,\chi_{\hat\alpha^k}(\xi), $$
where $\hat\alpha^k=(\alpha_1,\dots,\alpha_{k-1},\alpha_k-1,\alpha_{k+1},\dots)$, i.e., $\hat\alpha^k_i=(\alpha_k-1)1\!\!1_{\{i=k\}}+\alpha_i1\!\!1_{\{i\neq k\}}$. It turns out that $s_\alpha$ solves the ordinary differential system
$$ \dot s_\alpha(t)=b_\alpha(t)+\sum_{k=1}^{\infty}\sqrt{\alpha_k}\,\hat m_k(t)\,\mathbb{E}\big[\sigma(t,\cdot)\chi_{\hat\alpha^k}(\xi)\big] $$
with initial condition $s_\alpha(0)=s(0)1\!\!1_{\{\alpha=0\}}$.
Proposition 6. The best-response problem (1) becomes the following: for each decision-maker $i$, given the coefficient strategies of the others ($\{a_{j,\alpha}\}_\alpha$, $j\neq i$), the best response solves
$$ \begin{cases}\sup_{(a_{i,\alpha}(\cdot))_\alpha}\mathbb{E}\hat R_{i,T},\ \text{subject to}\\ \dot s_\alpha(t)=b_\alpha(t)+\displaystyle\sum_{k=1}^{\infty}\sqrt{\alpha_k}\,\hat m_k(t)\,\mathbb{E}[\sigma(t,\cdot)\chi_{\hat\alpha^k}],\\ s_\alpha(0)=s(0)1\!\!1_{\{\alpha=0\}},\end{cases}\tag{22} $$
where $\hat R_{i,T}=\hat g_i+\int_0^T\hat r_i\,dt$.
Proof. We first rewrite the instantaneous payoff functions in terms of the Wiener chaos. We replace the state process s(t) by its decomposition \sum_\alpha s_\alpha(t)\chi_\alpha(\xi):
\mathbb{E} r_i(t, s(t), m(t), a(t)) =: \hat{r}_i(t, \{s_\alpha(t)\}_\alpha, \{a_\alpha(t)\}_\alpha). | (23) |
In addition, the coefficient dynamics s_\alpha(t) are easily obtained from the state dynamics, and s_\alpha solves a standard differential system:
\dot{s}_\alpha(t) = b_\alpha(t) + \sum\limits_{k=1}^{\infty} \sqrt{\alpha_k}\, m_k(t)\, \mathbb{E}[\sigma(t, \cdot)\chi_{\hat{\alpha}^k}(\xi)] |
with initial condition s_\alpha(0) = s(0)\,{\rm{ll}}_{\alpha = 0}. Collecting these together, we obtain a standard dynamic optimization problem subject to coefficient dynamics with multiple indices. This completes the proof.
As we can observe, problem (22) is now a standard differential game with deterministic state dynamics. Note that, in order to obtain these transformations, the underlying problem must live in the Hilbert space of square-integrable random processes. This implies that the variance of the problem must be finite.
In this section, equilibrium systems for mean-field-type games with aggregative structures are presented.
In this subsection we take b,σ,ri,gi as functions of ma only through aggregative terms ∫ϕ.(s)ma(t,ds)=E[ϕ.(sa(t))], and ∫ϕ.(a′)Pa(t,da′)=E[ϕ.(a(t))]. In that case the Gâteaux differentiation with respect to m can be reduced to a finite-dimensional differentiation.
\begin{cases} R_{i, T} = \int_0^T r_i(t, s^a(t), \mathbb{E}[\phi_{r_i}(s^a(t))], a(t), \mathbb{E}[\phi_{r_i}(a(t))])\, dt + g_i(s^a(T), \mathbb{E}[\phi_{g_i}(s^a(T))]), \\ \sup_{a_i \in \mathcal{U}_i} \mathbb{E} R_{i, T} \text{ subject to } \\ ds^a(t) = b(t, s^a(t), \mathbb{E}[\phi_b(s^a(t))], a(t), \mathbb{E}[\phi_b(a(t))])\, dt + \sigma(t, s^a(t), \mathbb{E}[\phi_\sigma(s^a(t))], a(t), \mathbb{E}[\phi_\sigma(a(t))])\, dB(t), \\ s^a(0) = s_0. \end{cases} | (24) |
If b, \sigma, r_i are only functions of (t, s^a, \mathbb{E}[\phi_.(s^a(t))], a, \mathbb{E}[\phi_.(a(t))]), then the partial derivatives of the dual functions p_i = V_{i, sm}(t, s^a(t)) and q_i = \sigma V_{i, ssm}(t, s^a(t)) solve the backward SDE system
\begin{cases} dp_i = \{-H_{i, s} - \phi_{r_i, s}\mathbb{E}[r_{i, y}] - \phi_{b, s}\mathbb{E}[b_y p_i] - \phi_{\sigma, s}\mathbb{E}[\sigma_y q_i]\}\, dt + q_i\, dB, \\ p_i(T) = g_{i, s}(s^a(T), \mathbb{E}[\phi_{g_i}(s^a(T))]) + \phi_{g_i, s}(T)\,\mathbb{E}[g_{i, y}(s^a(T), \mathbb{E}[\phi_{g_i}(s^a(T))])], \end{cases} | (25) |
where H_{i, s} = r_{i, s} + b_s\, p_i + \sigma_s\, q_i.
Note that the aggregative structure in the form ˜E[ϕ(S(t),˜S(t))]=∫wϕ(S,w)m(t,dw) which is a random variable, is already included in the cases ϕ(t,s,m) discussed in the previous section.
We take b, \sigma, r_i as functions of the k-th moment of the state \int s^k m^a(t, ds) = \mathbb{E}[(s^a(t))^k], and the l-th moment of the control action of decision-maker i, \int (a_i')^l P^{a_i}(t, da_i') = \mathbb{E}[a_i^l(t)]. Thus, the aggregative function is \phi_.(y) = y^k, with derivative \phi_{., y} = k y^{k-1}. For k > 1, the dynamics yields
\begin{cases} dp_i = -\{H_{i, s} + k s^{k-1}\,\mathbb{E} H_{i, y}\}\, dt + q_i\, dB, \\ p_i(T) = g_{i, s}(s^a(T), \mathbb{E}[(s^a(T))^k]) + k s^{k-1}(T)\,\mathbb{E}[g_{i, y}(s^a(T), \mathbb{E}[(s^a(T))^k])]. \end{cases} | (26) |
The existence of a solution is not immediate: it requires (k-1)-moment estimates of the state for k > 1.
We take b, \sigma, r_i as functions of the first moments \int s\, m^a(t, ds) = \mathbb{E}[s^a(t)] and \int a'\, P^a(t, da') = \mathbb{E}[a(t)]. Then the aggregative functions reduce to the identity \phi_.(y) = y, with derivative \phi_{., y} = 1. Hence, the first-order risk-neutral adjoint processes p_i(t) = V_{i, sm}(t, s^a(t)) and q_i(t) = \sigma V_{i, ssm}(t, s^a(t)) solve a simpler backward SDE system, and (25) reduces to
\begin{cases} dp_i = \{-H_{i, s} - \mathbb{E}[H_{i, y}(t, s^a, \mathbb{E}[s^a], p_i, q_i)]\}\, dt + q_i\, dB, \\ p_i(T) = g_{i, s}(s^a(T), \mathbb{E}[s^a(T)]) + \mathbb{E}[g_{i, y}(s^a(T), \mathbb{E}[s^a(T)])]. \end{cases} |
However, the optimal control strategy equation is modified because of the presence of \mathbb{E}[a(t)]. In the convex control set case, one obtains the variational inequality (H_{i, a} + \mathbb{E} H_{i, \bar{a}})(a_i - a_i') \geq 0, where H_{i, \bar{a}_i} denotes the Hamiltonian derivative with respect to the component \bar{a}_i = \mathbb{E}[a_i(t)].
This section examines situations in which the state is partially observed.
\begin{cases} R_{i, T}(m_0, a) = \int_0^T r_i(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dt + g_i(s^{s_0, a}(T), m^{m_0, a}(T)), \\ \sup_{a_i \in \mathcal{A}_i} \mathbb{E}^a R_{i, T}(m_0, a) \text{ subject to } \\ ds^{s_0, a}(t) = b(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dt + \sigma(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dB(t) + \sigma_o(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dB_o(t), \\ s^a(0) = s_0, \\ dy^a(t) = b_o(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dt + dB_o(t), \ y^a(0) = 0, \\ m^{m_0, a}(t, .) := P_{s^{s_0, a}(t)}. \end{cases} | (27) |
where \mathbb{E}^a denotes the expectation with respect to the probability space (\Omega, \mathcal{F}, \{\mathcal{F}_t\}_t, \mathbb{P}^a), where \mathbb{P}^a is defined below. Under partial state observation, an admissible strategy a_i is a random process a_i: [0, T]\times\Omega \rightarrow A_i that is adapted to \mathcal{F}^y_t = \sigma(y(t'), \ t' \leq t) and satisfies \mathbb{E}\int_0^T |a_i(t, .)|^\alpha dt < \infty. The set of admissible strategies is denoted by \hat{\mathcal{A}}_i. Introduce the density process \rho^a(t) = e^{\int_0^t b_o(t')dy^a(t') - \frac{1}{2}\int_0^t |b_o(t')|^2 dt'}. Then \rho^a(t) solves the forward SDE d\rho^a = \rho^a b_o\, dy^a, \ \rho^a(0) = 1. By the Girsanov transform d\mathbb{P}^a = \rho^a d\mathbb{P}, the partial observation problem is transformed into a full observation problem with respect to \mathbb{P} and with a new state (\hat{s}^a, \rho^a).
{supai∈ˆAi E[∫T0ρa(t)ri(t,ˆss0,a(t),mm0,a(t),a(t))dt+ρa(T)gi(ˆss0,a(T),mm0,a(T))]subject todˆsa(t)=b dt+σdB(t)+σo[dya(t)−bo dt],=[b−σobo] dt+σdB+σodya(t),ˆsa(0)=s0,dρa=ρabodya, ρa(0)=1. | (28) |
This problem is similar to (1) but with a new state (ˆss0,a,ρa). One obtains an augmented state (ˆs,ρ) and the infinite-dimensional DPP can be directly applied by considering the probability measure m(t,dˆsdρ)=Pˆsa(t),ρa(t). Similarly the stochastic maximum principle can be applied with the state (ˆs,ρ).
d\begin{pmatrix} \hat{s}^a \\ \rho^a \end{pmatrix} = \begin{pmatrix} b - \sigma_o b_o \\ 0 \end{pmatrix} dt + \begin{pmatrix} \sigma & \sigma_o \\ 0 & \rho^a b_o \end{pmatrix}\begin{pmatrix} dB \\ dy^a \end{pmatrix}. |
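The Girsanov step above hinges on the density process \rho being a \mathbb{P}-martingale with \mathbb{E}[\rho(t)] = 1. A minimal Monte-Carlo sketch, under the simplifying assumptions of a scalar constant observation drift b_o and y a standard Brownian motion under the reference measure \mathbb{P}, so that \rho(T) has the explicit exponential form given earlier:

```python
import math, random

# Assumptions: scalar constant b_o; under the reference measure P, y is a
# standard Brownian motion, so rho(T) = exp(b_o*y(T) - 0.5*b_o^2*T) solves
# d rho = rho*b_o*dy with rho(0) = 1.  Girsanov requires E_P[rho(T)] = 1.
random.seed(0)
b_o, T, n = 0.7, 1.0, 200_000

total = 0.0
for _ in range(n):
    yT = random.gauss(0.0, math.sqrt(T))   # y(T) ~ N(0, T) under P
    total += math.exp(b_o * yT - 0.5 * b_o**2 * T)
mean_rho = total / n
print(mean_rho)                            # close to 1.0 (martingale property)
```

The sample mean of \rho(T) stays near 1 up to Monte-Carlo error, which is exactly the normalization that makes d\mathbb{P}^a = \rho^a d\mathbb{P} a probability measure.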
The new Hamiltonian for decision-maker i is
H_i(t, s, \rho, m, (p_1, p_2), q) = \rho r_i + [b - \sigma_o b_o] p_1 + \mbox{trace}(\Gamma' q), |
where \Gamma := \begin{pmatrix} \sigma & \sigma_o \\ 0 & \rho^a b_o \end{pmatrix}, \Gamma' is the transpose of the matrix \Gamma, and q := \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix}. The optimal control action, however, should be conditioned on the observation filtration \mathcal{F}^y_t. A necessary condition in the smooth case yields
E[δHi+12Pi[δσ]2 | Fyt]≤0. |
Note that the methodology extends to the case of individual observation per decision-maker i using the filtration Fyii,t.
This section focuses on recent developments and extensions of mean-field-type games.
Consider a mean-field-type game setup with the following data:
{Time step: t∈{0,1,…,T−1}Set of decision-makers:IInitial state : s0∼ms0Stochastic state dynamics: s(t+1)∼qt+1(.| s(t),ms(t),ma(t),a(t))Instant payoff of i:ri(t,s(t),ms(t),ma(t),a(t))Terminal payoff of i:gi(s(T),ms(T)) |
Here the time space is the discrete set \{0, 1, \ldots, T-1\}, where T \geq 1 is the length of the horizon and t denotes a time step; s is the state process of the system, and m_{s_0} is the initial distribution of states. A decision-maker, denoted by i, has an action space A_i. The system state s(t) is stochastic, and its probability transition from t to t+1 is given by q_{t+1}: \mathcal{S}\times\Delta(\mathcal{S})\times\prod_i\Delta(A_i)\times\prod_i A_i \rightarrow \Delta(\mathcal{S}). Denote by m_s(t) the distribution of states and by m_a(t) the distribution of actions at time t. Decision-maker i's cumulative payoff is
Ri,T(ms0,a)=T−1∑t=0ri(t,s(t),ms(t),ma(t),a(t))+gi(s(T),ms(T)), |
The risk-neutral best response problem of i is
{supaiERi,T(ms0,a) subject tos(t+1)∼qt+1(.| s(t),ms(t),ma(t),a(t)),ms(t,.)=Ps(t), s0∼ms0, | (29) |
Let us express the expected payoff in terms of the measure m(t, .).
Eri(t,s(t),ms(t),ma(t),a(t))=∫ri(t,ˉs,ms(t),ma(t),a(t))ms(t,dˉs)=ˆri(t,m(t),a(t)) |
where ˆri depends only on the measure m(t) and the strategy profile a(t). Similarly one can rewrite the expected value of the terminal payoff as
Egi(s(T),ms(T))=∫gi(ˉs,ms(T))ms(T,dˉs)=ˆgi(ms(T)). |
Proposition 7. On the space of measures, one has a deterministic dynamic game problem over multiple stages. Therefore a dynamic programming principle (DPP) holds:
{Vi(t,ms(t))=supa′i{ˆri(t,ms(t),a′i(t),a−i(t))+Vi(t+1,ms(t+1))}ms(t+1,ds′)=∫sqt+1(ds′| s,ms(t),ma(t),a(t))ms(t,ds) | (30) |
As we can see, the best-response strategy may depend on the state and the mean-field m_s; this is referred to as a (state-and-mean-field) feedback strategy. Therefore, m_a(t) can be expressed as a function of (s(t), m_s(t, .)). Thus, the payoff \hat{r}_i(t, .) can be expressed as a function of (m(t, .), a(t)).
Proposition 8. Suppose a sequence of real-valued functions V_i(t, .), \ t \leq T, defined on the set of probability measures over \mathcal{S}, satisfies the DPP relation above. Then V_i(t, m) is the value function on \Delta(\mathcal{S}) starting from m(t) = m. Moreover, if the supremum is achieved for some action a^*_i(., m), then the best response strategy is in (state-and-mean-field) feedback form, and the payoff value is
Ri(a∗)=Vi(0,m0). |
Proposition 8 provides a sufficient condition for best-response strategies in terms of (s, m_s(t)). The proof is immediate and follows from the verification theorem of DPP in deterministic dynamic games.
Particular cases of interest
• Finite state space: Suppose that the state space and the action spaces are nonempty and finite. Let the state transition be
P(s(t+1)=s′ | s(t),ms(t),ma(t),a(t))=qt+1(s′| s(t),ms(t),ma(t),a(t)), |
The DPP becomes
{Vi(t,ms(t))=supa′i{ˆri(t,m(t),a′it,a−i,t)+Vi(t+1,ms(t+1))}ms(t+1,s′)=∑s∈Sqt+1(s′| s,ms(t),ma(t),a(t))ms(t,s) |
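For a concrete feel of this recursion, here is a hypothetical single-decision-maker instance with two states and two actions; all model data below are invented for illustration. Since the measure dynamics are deterministic, the value computed by the DPP at m_0 must coincide with a brute-force search over open-loop action sequences:

```python
from itertools import product

# Hypothetical data: 2 states, 2 actions, horizon T = 3; the mean field m
# enters both the kernel and the payoff (all coefficients are assumptions).
T = 3
S, A = (0, 1), (0, 1)

def q(s2, s, m, a):
    """Transition kernel q_{t+1}(s'| s, m, a); values stay in [0, 1]."""
    stay = 0.6 + 0.3 * a - 0.2 * m[1] * s
    return stay if s2 == s else 1.0 - stay

def rhat(m, a):
    """Expected instant payoff with a quadratic mean-field term."""
    return sum(m[s] * (s - 0.5 * a) for s in S) - m[1] ** 2

def push(m, a):
    """One step of the measure dynamics m(t+1, s') = sum_s q(s'|s, m, a) m(s)."""
    return tuple(sum(q(s2, s, m, a) * m[s] for s in S) for s2 in S)

def V(t, m):
    """Measure-level DPP: V(t, m) = max_a [rhat(m, a) + V(t+1, push(m, a))]."""
    if t == T:
        return m[1]                        # terminal payoff ghat(m) = m(1)
    return max(rhat(m, a) + V(t + 1, push(m, a)) for a in A)

def payoff_of(seq, m0):
    m, total = m0, 0.0
    for a in seq:
        total += rhat(m, a)
        m = push(m, a)
    return total + m[1]

m0 = (0.5, 0.5)
brute = max(payoff_of(seq, m0) for seq in product(A, repeat=T))
print(V(0, m0), brute)                     # the two values coincide
```

The coincidence is specific to the single-decision-maker case with deterministic measure dynamics; with several interacting decision-makers the fixed-point structure of the equilibrium does not reduce to a single maximization.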
As for classical polymatrix games, a pure mean-field equilibrium may not exist in general in mean-field-type games with finite actions. However, a mixed extension can be adopted. By extending the action space to the set of probability measures on A_i, and the functions \hat{r}_{i, t}, \hat{g}_{i, T}, q_{t+1} to the corresponding mixed extensions, one gets the existence of mean-field equilibria in behavioral (mixed) strategies for payoffs that are independent of m_a.
• Continuous state space: Consider the interactive state dynamics in discrete time
s(t+1)=s(t)+b(t+1,s(t),ms(t),ma(t),a(t),ηt+1) |
where \eta is a random process. The transition kernel of s(t+1) given (s(t), m_s(t), m_a(t), a(t)) is
q_{t+1}(ds' |\ s(t), m_s(t), m_a(t), a(t)) = \int_\eta {\rm{ll}}_{\{s(t) + b_{t+1}(s(t), m_s(t), m_a(t), a(t), \eta)\, \in\, ds'\}}\ P_{\eta_{t+1}}(d\eta) |
where P_{\eta_{t+1}}(d\eta) denotes the probability distribution of \eta_{t+1}. The dynamic programming equation of Proposition 7 applies to this case.
• Mean-field free case: If ri(s,a,ms,ma)=ri(s,a) and gi(s,ms)=gi(s) for every decision-maker i then
ˆri(m,a)=∫sri(s,a)ms(t,ds). |
There exists a function vi such that
Vi(t,ms(t))=⟨vi(t,.),ms(t,.)⟩=∫svi(t,s)ms(t,ds), |
where v_i(t, s) is a mean-field-free equilibrium payoff function of i. In that case, the dynamic programming reduces to v_i(t, s) = \sup_{a'_{i, t}} H_i(t, s, a'_{i, t}, a_{-i, t}), with
Hi=ri(t,s,a′it,a−i,t)+∫s′vi(t+1,s′)qt+1(ds′|s(t),a′i(t),a−i(t)), |
which is the classical Bellman-Shapley equilibrium system. Note that in this case vi(t,s)=∂m(t,s)V(t,m).
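The linearity behind v_i(t, s) = \partial_{m(t, s)}V(t, m) can be checked numerically in a toy mean-field-free instance; all coefficients below are assumptions. Classical backward induction gives v(t, s), the measure-level DPP with state-feedback policies gives V(t, m), and the two satisfy V(t, m) = \langle v(t, \cdot), m \rangle:

```python
from itertools import product

# Hypothetical mean-field-free instance (all coefficients assumed): when the
# payoffs and kernel do not depend on the mean field, the measure-level value
# is linear in m, i.e. V(t, m) = sum_s v(t, s) m(s), with v from classical
# Bellman-Shapley backward induction.
T = 3
S, A = (0, 1), (0, 1)

def q(s2, s, a):
    stay = 0.7 if a == 0 else 0.4
    return stay if s2 == s else 1.0 - stay

def r(s, a):
    return float(s) - 0.3 * a

g = {0: 0.0, 1: 1.0}                       # terminal payoff g(s)

# Classical backward induction for v(t, s).
v = {(T, s): g[s] for s in S}
for t in range(T - 1, -1, -1):
    for s in S:
        v[(t, s)] = max(r(s, a) + sum(q(s2, s, a) * v[(t + 1, s2)] for s2 in S)
                        for a in A)

def V(t, m):
    """Measure-level DPP; the sup runs over state-feedback policies pol: S -> A."""
    if t == T:
        return sum(g[s] * m[s] for s in S)
    best = float("-inf")
    for pol in product(A, repeat=len(S)):  # pol[s] = action used in state s
        inst = sum(m[s] * r(s, pol[s]) for s in S)
        m2 = tuple(sum(q(s2, s, pol[s]) * m[s] for s in S) for s2 in S)
        best = max(best, inst + V(t + 1, m2))
    return best

m0 = (0.3, 0.7)
print(V(0, m0), sum(v[(0, s)] * m0[s] for s in S))   # equal: V is linear in m
```

The equality holds because the per-state maximizations decouple when m has nonnegative weights, so the measure-level supremum is attained by the classical feedback policy.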
Finite state continuous time games
Discrete-state (countable or finite) games of mean-field type can be analyzed using the above methodologies. The state process becomes a continuous-time interactive Markovian decision process in which mean-field terms, such as the distribution of states and the distribution of control actions, are involved.
Each decision-maker has a risk-sensitivity index θi∈R. When θi vanishes, one gets the risk-neutral setup discussed above. Consider the risk-sensitive best response of i as
{supai∈Ai1θilog(EeθiRi,T) subject tos(t)=s0+∫t0bdt′+∫t0σdB(t′), t>0s(t)=ss0,a(t), s(0)=s0∼m0m(t,.)=mm0,a(t,.)=Pss0,a(t) | (31) |
Introduce the following augmented state (s,z) such that zi(0)=0, and dzi(t)=ri(t,sa(t),ma(t),a(t))dt. Then,
eθiRi,T=eθizi(T)+θigi(sa(T),ma(T)). |
Thus, we obtain the following mean-field-type control problem for decision-maker i:
{supaiEeθi[zi(T)+gi(sa(T),ma(T))]zai(t)=∫t0ridt′,zai(0)=0,s(t)=s0+∫t0bdt′+∫t0σdB(t′), t>0sa(0)=s0. | (32) |
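The augmentation trick can be exercised numerically. In the hypothetical uncontrolled case ds = \sigma\, dB, s(0) = 0, r = s, g = 0, the augmented component z(T) = \int_0^T s\, dt is Gaussian with variance \sigma^2 T^3/3, so the risk-sensitive value \frac{1}{\theta}\log\mathbb{E} e^{\theta z(T)} equals \theta\sigma^2 T^3/6 in closed form; a Monte-Carlo sketch of the augmented state recovers it:

```python
import math, random

# Assumptions: uncontrolled scalar state ds = sigma dB, s(0) = 0, running
# payoff r = s, terminal payoff g = 0.  Augmenting with dz = r dt makes the
# risk-sensitive payoff terminal-only; here z(T) ~ N(0, sigma^2 T^3/3), so
# (1/theta) log E[exp(theta z(T))] = theta sigma^2 T^3 / 6 exactly.
random.seed(1)
theta, sigma, T = 0.8, 1.0, 1.0
n_paths, n_steps = 50_000, 100
dt = T / n_steps

acc = 0.0
for _ in range(n_paths):
    s = z = 0.0
    for _ in range(n_steps):
        z += s * dt                              # dz = r dt with r = s
        s += sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
    acc += math.exp(theta * z)
mc_value = math.log(acc / n_paths) / theta
exact = theta * sigma**2 * T**3 / 6.0
print(mc_value, exact)   # agree up to Monte-Carlo and time-discretization error
```

Only the terminal pair (z(T), s(T)) enters the evaluation, which is precisely what turns the running payoff into a terminal-only cost in the measure formulation that follows.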
Let us introduce μ(t,dsdz)=Psa(t),za(t), the probability measure associated with the joint process (sa(t),za(t)). Generically, the measure μ solves the Fokker-Planck-Kolmogorov forward equation in the distributional (weak) sense. The term EeθiRi,T can be rewritten in a deterministic manner as
∫μ(T,dsdz) eθi(zi+gi(s,∫˜zμ(T,.,d˜z))) . |
This is a terminal cost in the sense that it is evaluated only at \mu(T, .). The measure \mu is an infinite-dimensional quantity that serves as a state in the problem. The advantage now is that \mu(.) is a deterministic quantity. Since there is no running cost, one can write directly the HJB equation using classical calculus of variations for
\hat{V}^{\theta}_i(t, \mu) = \sup\limits_{a_i\in \mathcal{A}_i} \int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}, |
starting from \mu(t, .) = \mu at time t. The risk-sensitive best value is {V}^{\theta}_i(t, \mu): = \frac{1}{\theta_i}\log \hat{V}^{\theta}_i(t, \mu).
\mu_t = -\partial_s[{b} \mu] -\mbox{div}_z(r \mu)+\frac{1}{2}\partial_{ss}(\sigma'\sigma \mu), | (33) |
with initial distribution \ \mu(0, ds, dz) = m_0(ds)\delta_0(dz), where {\sigma}' is the transpose of {\sigma}.
The risk-sensitive HJB system yields
\begin{eqnarray} \nonumber 0& = & \hat{V}^{\theta}_{i, t}(t, \mu)+ \int {H}^{\theta}_i \mu(t, dsdz), \\ \nonumber && \hat{V}^{\theta}_i(T, \mu) =\int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}, \ \\ && i\in \mathcal{I} \label{hjbrsnoisy} \end{eqnarray} | (34) |
where {H}^{\theta}_i(t, s, z, \mu, \hat{V}_{i, z\mu}, \hat{V}_{i, ss\mu}) =\sup_{a_i} {b}\partial_s \hat{V}_{i, \mu} +\langle r, \partial_z \hat{V}_{i, \mu}\rangle +\frac{\sigma^2}{2} \partial_{ss} \hat{V}_{i, \mu}. Note that
\begin{eqnarray} \nonumber \hat{V}^{\theta}_i(T, \mu)& = & \int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}\\ \nonumber & = &\int_{s, z_i} \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}_i}\int_{\tilde{z}_{-i}}\mu(T, ., d\tilde{z})))} \int_{z_{-i}}\mu(T, dsdz)\\ & = &\int_{s, z_i} \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}_i}\mu_i(T, ., d\tilde{z}_i)))}\mu_i(T, dsdz_i) =\hat{V}^{\theta}_i(T, \mu_i), \end{eqnarray} | (35) |
where \mu_i(t, dsdz_i) = \int_{z_{-i}} \mu(t, dsdz_idz_{-i}) is the marginal of \mu with respect to (s, z_i), and \hat{V}^{\theta}_i(t, \mu) = \tilde{V}_i(t, \mu_i). The above calculation shows that the value function \frac{1}{\theta_i}\log \hat{V}^{\theta}_i depends only on the measure \mu_i.
Proposition 9. If there exists a function \tilde{V} with
\tilde{V}_{i, t}, \tilde{V}_{i, s\mu_i}, \tilde{V}_{i, ss\mu_i}, \tilde{V}_{i, zz\mu_i}, |
satisfying
\begin{eqnarray} \nonumber 0& = &\tilde{V}_{i, t}+ \int {H}^{\theta}_i \mu_i(t, dsdz_i), \\ \nonumber && \tilde{V}_i(T, \mu_i) =\int \mu_i(T, dsdz_i) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu_i(T, ., d\tilde{z})))}, \ \\ \nonumber && \mu_i(t, dsdz_i) = \int_{z_{-i}} \mu(t, dsdz_idz_{-i}), \\ \nonumber && \mu_t = -\partial_s[{b} \mu] -\mbox{div}_z(r \mu)+\frac{1}{2}\partial_{ss}(\sigma^2 \mu), \\ \nonumber && \mu(0, dsdz) = m_0(ds)\delta_{0}(dz), \\ && i\in \mathcal{I}, \ \label{hjbrsnoisytt} \end{eqnarray} | (36) |
where the integrand Hamiltonian is
\begin{array}{l} {H}^{\theta}_i (t, s, z, \mu_i, \tilde{V}_{i, s\mu_i}, \tilde{V}_{i, z_i\mu_i}, \tilde{V}_{i, ss\mu_i}) \ =\sup\limits_{a_i}[{b}\tilde{V}_{i, s\mu_i} + r_i\tilde{V}_{i, z_i\mu_i} +\frac{\sigma^2}{2} \tilde{V}_{i, ss\mu_i}]\\ = \tilde{V}_{i, z_i\mu_i}\sup\limits_{a_i}[r_i +{b}\frac{\tilde{V}_{i, s\mu_i} }{\tilde{V}_{i, z_i\mu_i}} +\frac{\sigma^2}{2} \frac{\tilde{V}_{i, ss\mu_i}}{\tilde{V}_{i, z_i\mu_i}}]\ = \tilde{V}_{i, z_i\mu_i}H_i(t, s, z, \mu_i, \frac{\tilde{V}_{i, s\mu_i} }{\tilde{V}_{i, z_i\mu_i}}, \frac{\tilde{V}_{i, ss\mu_i}}{\tilde{V}_{i, z_i\mu_i}}) \end{array} |
then \frac{1}{\theta_i}\log \tilde{V}_{i}(0, \mu) is an equilibrium payoff for decision-maker i and the best-response strategy a_i minimizes {H}^{\theta}_i given the other decision-makers' strategies a_{-i}.
Proof. Apply the recipe of Proposition 1 with the augmented state (s, z) and the measure \mu.
From the relation
{V}^{\theta}_i(t, \mu): =\frac{1}{\theta_i}\log \hat{V}^{\theta}_i(t, \mu) \implies e^{\theta_i {V}^{\theta}_i(t, \mu)} =\hat{V}^{\theta}_i(t, \mu), | (37) |
it follows that
\begin{array}{l} \hat{V}^{\theta}_{i, \mu}(t, \mu) =\theta_i {V}^{\theta}_{i, \mu}(t, \mu) e^{\theta_i {V}^{\theta}_i(t, \mu)} =\theta_i {V}^{\theta}_{i, \mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, s\mu}(t, \mu) =\theta_i {V}^{\theta}_{i, s\mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, z\mu}(t, \mu) =\theta_i {V}^{\theta}_{i, z\mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, ss\mu}(t, \mu) =\theta_i {V}^{\theta}_{i, ss\mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, t}(t, \mu) =\theta_i {V}^{\theta}_{i, t}(t, \mu) \hat{V}^{\theta}_i(t, \mu). \end{array} |
Thus,
\begin{array}{l} {H}^{\theta}_i (t, s, z, \mu_i, \hat{V}_{i, s\mu_i}, \hat{V}_{i, z_i\mu_i}, \hat{V}_{i, ss\mu_i}) \ = \hat{V}_{i, z_i\mu_i}H_i(t, s, z, \mu_i, \frac{\hat{V}_{i, s\mu_i} }{\hat{V}_{i, z_i\mu_i}}, \frac{\hat{V}_{i, ss\mu_i}}{\hat{V}_{i, z_i\mu_i}})\\ % = \theta_i {V}^{\theta}_{i, z\mu} \hat{V}^{\theta}_i(t, \mu) H_i(t, s, z, \mu_i, \frac{{V}^{\theta}_{i, s\mu_i} }{{V}^{\theta}_{i, z_i\mu_i}}, \frac{{V}^{\theta}_{i, ss\mu_i}}{{V}^{\theta}_{i, z_i\mu_i}}) \end{array} |
\begin{eqnarray} \nonumber 0& = & \theta_i {V}^{\theta}_{i, t}(t, \mu) \hat{V}^{\theta}_i(t, \mu) + \int \theta_i {V}^{\theta}_{i, z\mu} \hat{V}^{\theta}_i(t, \mu). H_i(t, s, z, \mu_i, \frac{{V}^{\theta}_{i, s\mu_i} }{{V}^{\theta}_{i, z_i\mu_i}}, \frac{{V}^{\theta}_{i, ss\mu_i}}{{V}^{\theta}_{i, z_i\mu_i}}) \mu_i(t, dsdz_i), \\ \nonumber && {V}^{\theta}_i(T, \mu_i) =\frac{1}{\theta_i}\log \left[\int \mu_i(T, dsdz_i) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu_i(T, ., d\tilde{z})))}\right], \ \\ \nonumber \label{hjbrsnoisyttty} \end{eqnarray} |
Dividing by \theta_i \hat{V}^{\theta}_i(t, \mu)\neq 0, we obtain that the risk-sensitive best-response payoff {V}^{\theta}_i(t, \mu) solves the functional PDE given by
\begin{eqnarray} \nonumber 0& = & {V}^{\theta}_{i, t}(t, \mu) + \int {V}^{\theta}_{i, z\mu} . H_i(t, s, z, \mu_i, \frac{{V}^{\theta}_{i, s\mu_i} }{{V}^{\theta}_{i, z_i\mu_i}}, \frac{{V}^{\theta}_{i, ss\mu_i}}{{V}^{\theta}_{i, z_i\mu_i}}) \mu_i(t, dsdz_i), \\ \nonumber && {V}^{\theta}_i(T, \mu_i) =\frac{1}{\theta_i}\log \left[\int \mu_i(T, dsdz_i) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu_i(T, ., d\tilde{z})))}\right], \ \\ \nonumber && \mu_i(t, dsdz_i) = \int_{z_{-i}} \mu(t, dsdz_idz_{-i}), \\ \nonumber && \mu_t = -\partial_s[{b} \mu] -\mbox{div}_z(r \mu)+\frac{1}{2}\partial_{ss}(\sigma^2 \mu), \\ \nonumber && \mu(0, dsdz) = m_0(ds)\delta_{0}(dz), \\ && i\in \mathcal{I}, \ \label{hjbrsnoisytttyt} \end{eqnarray} | (38) |
Let \tilde{v}^*_i(t, s, z_i): = \tilde{V}_{i, \mu_i}(\mu_i)(t, s, z_i). It follows that \tilde{v}^*_i(t, s, z_i) solves the following PDE system:
\begin{array}{l} 0 = \tilde{v}^*_{i, t}+ \tilde{v}^*_{i, z_i} H_i(t, s, z_i, \mu_i, \frac{\tilde{v}^*_{i, s}}{\tilde{v}^*_{i, z_i}}, \frac{\tilde{v}^*_{i, ss}}{\tilde{v}^*_{i, z_i}})\ + \int \tilde{v}^*_{i, \tilde{z}_i}H_{i, \mu_i} \mu_i(t, d\tilde{s}, d\tilde{z}_i). \end{array} |
The terminal condition for this PDE is
\tilde{v}^*_i(T, s, z_i) = e^{\theta_i (z_i+g_i(s, m(T)))}+\theta_i \int g_{i, m}(\tilde{s}, m(T))({s}) e^{\theta_i (\tilde{z}_i+g_i(\tilde{s}, m(T)))}\ \mu_i(T, d\tilde{s}, d\tilde{z}_i). |
From the equality \tilde{V}^{\theta}_{i, \mu}(t, \mu) =\theta_i {V}^{\theta}_{i, \mu}(t, \mu) \tilde{V}^{\theta}_i(t, \mu), we deduce that the function {v}^*_i(t, s, z_i): = {V}^{\theta}_{i, \mu_i}(\mu_i)(t, s, z_i) =\frac{\tilde{V}^{\theta}_{i, \mu}}{\theta_i \tilde{V}^{\theta}_i} = \frac{\tilde{v}^*_i(t, s, z_i)}{\theta_i \tilde{V}^{\theta}_i} is the dual function associated with the risk-sensitive best response value of decision-maker i.
Note that, as in the risk-neutral case, the risk-sensitive dual function {v}^*(t, s, z_i) is in general different from the risk-sensitive payoff function {V}^{\theta}(t, \mu_i) or {V}^{\theta}(t, \mu).
Itô's formula applied to
\begin{array}{lll} p^{\theta}_i(t) = \frac{\tilde{v}^*_{i, s}(t, s(t), z_i(t))}{\tilde{v}^*_{i, z_i}(t, s(t), z_i(t))}\ =\frac{ \tilde{V}_{i, s\mu_i}(t, s(t), z_i(t))}{\tilde{V}_{i, z_i\mu_i}(t, s(t), z_i(t))}, \end{array} |
provides a risk-sensitive first-order adjoint equation with the risk-sensitive Hamiltonian H_i^{\theta} = H_i(t, s, m, p^{\theta}_i, q_i^{\theta}+p_i^{\theta} l_i).
Proposition 10. For \theta_i\neq 0, the risk-sensitive first order adjoint process (p_i^{\theta}, q_i^{\theta}, l_i, \eta_i)_{i\in \mathcal{I}} solves
\begin{array}{ll} dp^{\theta}_i = - \{H_{i, s} +\frac{1}{\eta_i} \mathbb{E}[\eta_i H_{i, s\mu_i}]\} dt +q_i^{\theta}[-l_i dt+ dB], \\ p^{\theta}_i(T) = g_{i, s}+\frac{1}{\eta_i(T)}\mathbb{E}\{ \eta_i(T) \partial_s g_{i, m}\}, \\ d\eta_i = \eta_i l_i dB, \\ \eta_i(T) =\theta_i e^{\theta_i[z_i(T)+g_i(s(T), m(T, .))]} \end{array}
Proposition 10 provides a risk-sensitive first order system for stochastic maximum principle of mean-field type. By identification of processes, one obtains
\begin{array}{lll} p_i^{\theta}(t) = \frac{\tilde{v}^*_{i, s}(t, s(t), z_i(t))}{\tilde{v}^*_{i, z_i}(t, s(t), z_i(t))} =\frac{ \tilde{V}_{i, s\mu_i}(t, s(t), z_i(t))}{\tilde{V}_{i, z_i\mu_i}(t, s(t), z_i(t))}, \\ q^{\theta}_i = \sigma \frac{\tilde{v}^*_{i, ss}}{ \eta_i}-p_i^{\theta} l_i, \\ \eta_i(t) = \tilde{v}^*_{i, z_i}(t, s(t), z_i(t)), \\ l_i = \sigma\frac{\eta_{i, s}}{\eta_i} = \sigma \partial_s[\ \log \eta_i] \end{array} |
Proof. A proof can be obtained by following similar steps as for the proof of Proposition 4.
One of the fundamental elements in the theory of cooperative games is the formulation of the optimal behavior for the decision-makers. Decision-maker behavior (control actions and imputations) satisfying specific optimality principles then constitutes a solution of the game. In other words, a solution concept of a dynamic cooperative game is produced by a set of optimality principles, such as a dynamic bargaining solution and a payoff allocation procedure. Altruism and cooperation are fascinating research areas. Intuitively, one is tempted to claim that the decision-makers are better off when they all work cooperatively. However, one often observes behaviors that are far from cooperation. So, if cooperation is the answer, what is the question, and why do these strange behaviors arise?
Let us consider a simple example of a cooperative mean-field-type game [42] with two decision-makers. Assume that if they work together (jointly) they will be able to get V(\{12\}, m_0, [0, T]); DM 1 gets V(\{1\}, m_0, [0, T]) if he or she works alone, and DM 2 gets V(\{2\}, m_0, [0, T]). From these three values alone, it is not clear why these decision-makers should work together. In order to formalize this in terms of their interests, we introduce a cost of making a coalition, C(\{12\}, m_0, [0, T])\geq 0, which is the cost incurred when both decision-makers pool their effort (it includes information exchange costs, coalition creation costs, etc.). While this cost is often neglected in the literature, it may be important in many setups. Thus, a necessary condition for possible cooperation between the decision-makers is
V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T]) > V(\{1\}, m_0, [0, T])+V(\{2\}, m_0, [0, T]). |
Then, the next question is: what will be their payoffs if they cooperate? To answer this question, we need to know how to share the outcome of the cooperation. It is clear that allocating the equal share \frac{1}{2}[V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T])] to each decision-maker is not necessarily appropriate, since it can be less than \min_{i}(V(\{i\}, m_0, [0, T])). Thus, the allocation has to be done in a more clever way.
Cooperative game-theoretic solutions, such as the bargaining solution, the core, the Shapley value, and the nucleolus, deal with such problems. When stochasticity and time dependence are involved, the solution concepts require careful adaptations. In addition, if the payoff function and the state dynamics are of mean-field type, the optimality equations need to be established. In terms of equilibrium payoffs, a mean-field-type Nash equilibrium can play the role of a benchmark in a cooperative game, i.e., it gives what decision-makers could secure for themselves if there is no agreement, namely V(\{i\}, m_0, [0, T]).
Let R_{c, i} be the total before-side-payment cooperative payoff of decision-maker i, i.e.,
\mathbb{E} [g_i(s_c(T), m_c(T))+\int_0^T r_{i}(t, s_c(t), m_c(t), a_c(t))dt] |
where s_c is the optimal state under the cooperative (joint) decision-making scenario. One has \sum_{j\in \mathcal{J}} R_{c, j} = V(\mathcal{J}, m_0, [0, T]). However, one needs to find a better way to share the payoff V(\mathcal{J}, m_0, [0, T])-C(\mathcal{J}, m_0, [0, T]). This leads to the introduction of the notion of imputation, i.e., a vector profile (\gamma_j)_{j\in \mathcal{J}} such that \sum_{j\in \mathcal{J}'}\gamma_j\geq V(\mathcal{J}', m_0, [0, T])-C(\mathcal{J}', m_0, [0, T]) for any \mathcal{J}'\subset \mathcal{J}, and \sum_{j\in \mathcal{J}}\gamma_j = V(\mathcal{J}, m_0, [0, T])-C(\mathcal{J}, m_0, [0, T]).
By virtue of mean-field-type joint optimization, the sum of individual payoffs under cooperation is greater than or equal to its noncooperative counterpart, i.e.,
\sum\limits_{j\in \mathcal{J}} R_{c, j} \geq \sum\limits_{j\in \mathcal{J}} V(\{j\}, m_0, [0, T]). |
Thus, the dividend of cooperation (without the coalition making cost) is
DC = \sum\limits_{j\in \mathcal{J}} R_{c, j} - \sum\limits_{j\in \mathcal{J}} V(\{j\}, m_0, [0, T])\geq 0. |
Thus, the dividend of cooperation (with the coalition making cost) to be distributed among the decision-makers is DC-C(\mathcal{J}, m_0, [0, T]).
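A hypothetical numerical instance of this cooperation test, with all values invented for illustration:

```python
# Hypothetical values V({.}, m_0, [0, T]) and coalition-making cost (invented):
V1, V2, V12 = 3.0, 5.0, 10.0
C12 = 1.5

dc_gross = V12 - (V1 + V2)          # dividend of cooperation without the cost
dc_net = dc_gross - C12             # dividend actually available to distribute
cooperate = V12 - C12 > V1 + V2     # necessary condition for cooperation
print(dc_gross, dc_net, cooperate)  # 2.0 0.5 True
```

Here the gross dividend of 2.0 shrinks to 0.5 once the coalition-making cost is paid; had C12 exceeded 2.0, the necessary condition would fail and no coalition should form.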
As a first consequence, it is clear that if the coalition-making cost is too high (compared to the game coalition value), then there is no reason for the decision-makers to form a coalition.
Therefore, for cooperation to take place, we require the positivity of DC-C(\mathcal{J}, m_0, [0, T]). Using a cooperative game approach yields individual payoffs for the whole interval [0, T]. The selected imputation has, by definition, the property that each decision-maker's payoff in the cooperative game is higher than or equal to what she would get in a noncooperative game played on the same time interval.
Let \gamma_i(t) = \gamma(\{i\}, \mathcal{J}, m_0, [t, T]) be the cooperative payoff-to-go after side payment for decision-maker i at position [t, T], \ 0 < t < T, of the game. This is the amount of individual payoff that decision-maker i will actually get. One way of sharing the payoff is to use a dynamical Shapley value. The payoff allocated to decision-maker i under the Shapley value is
\begin{eqnarray}\gamma_i& = &\sum\limits_{\mathcal{J}'\subset \mathcal{J}, i\notin \mathcal{J}'} \frac{|\mathcal{J}'|!(|\mathcal{J}|-|\mathcal{J}'|-1)!}{|\mathcal{J}|!} [(V-C)(\mathcal{J}' \cup \{i\}, [t, T])-(V-C)(\mathcal{J}', [t, T])].\end{eqnarray} | (39) |
For two decision-makers case, the payoffs of the decision-makers are
\begin{equation} \begin{array}{ll}\gamma_1 =\\ V(\{1\}, m_0, [0, T])+ \frac{1}{2}\left[V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T])- V(\{1\}, m_0, [0, T])- V(\{2\}, m_0, [0, T]) \right], \nonumber\\ \gamma_2 = \\ V(\{2\}, m_0, [0, T])+ \frac{1}{2}\left[V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T])- V(\{1\}, m_0, [0, T])- V(\{2\}, m_0, [0, T]) \right]. \nonumber \end{array} \end{equation} |
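The allocation rule (39) can be implemented directly. The sketch below uses hypothetical net coalition values W(\mathcal{J}') = V(\mathcal{J}') - C(\mathcal{J}') (the numbers are assumptions) and checks the explicit two-decision-maker formulas above:

```python
from itertools import combinations
from math import factorial

# Sketch of the Shapley allocation (39) for net values W(J') = V(J') - C(J')
# on [t, T]; the numerical values below are assumptions for illustration.
def shapley(players, W):
    """W maps frozensets of players to net coalition values, W(empty) = 0."""
    n = len(players)
    gamma = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(n):
            for sub in combinations(others, size):
                Jp = frozenset(sub)
                w = factorial(len(Jp)) * factorial(n - len(Jp) - 1) / factorial(n)
                total += w * (W[Jp | {i}] - W[Jp])
        gamma[i] = total
    return gamma

V1, V2, V12, C12 = 3.0, 5.0, 10.0, 1.5     # assumed game values and cost
W = {frozenset(): 0.0, frozenset({1}): V1, frozenset({2}): V2,
     frozenset({1, 2}): V12 - C12}
g = shapley({1, 2}, W)
# Matches the explicit two-player formulas: gamma_1 = 3.25, gamma_2 = 5.25,
# and the allocations sum to the net grand-coalition value 8.5.
print(g[1], g[2])
```

Each decision-maker receives her stand-alone value plus half of the net dividend, and efficiency holds since the allocations exhaust W(\{12\}).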
Definition 4. A cooperative solution is time consistent if, at any position t \in [0, T] the cooperation solution payoff to go \gamma(\{i\}, m(t), [t, T])\geq V(\{i\}, m(t), [t, T]) where the deviating payoff V(\{i\}, m(t), [t, T]) is computed along the cooperative state trajectory s_c(t).
This notion of time consistency and its implementation in cooperative differential games was introduced in [6]. A stronger notion of time consistency is that the cooperative payoff-to-go dominates the noncooperative payoff-to-go for any state s(t), \ t\in [0, T]. This is called a sub-game consistent solution.
Though the dynamical Shapley value is one of the most commonly used allocation principles, in the case when decision-makers are asymmetric in their power and in the size of their payoffs, an equal imputation of cooperative gains may not be agreeable to an asymmetric decision-maker. To overcome this, one can suggest an allocation principle in which the decision-makers' shares of the gain from cooperation are proportional to the relative sizes of their expected deviating payoffs. Thus, a proportional time-consistent solution is given by
\begin{eqnarray} \gamma_i(t) =\frac{V(\{i\}, [t, T])}{\sum\limits_{i}V(\{i\}, [t, T])} [V(\mathcal{J}, [t, T])-C(\mathcal{J}, [t, T])]. \end{eqnarray} | (40) |
If these quantities are positive, one gets \gamma(\{i\}, \mathcal{I}, [t, T])\geq V(\{i\}, m(t), [t, T]), \ \forall t (individual rationality), and \sum_{i\in\mathcal{I}} \gamma(\{i\}, \mathcal{I}, [t, T]) = V(\mathcal{I}, m(t), [t, T])-C(\mathcal{I}, [t, T]) (efficiency at any time).
For dynamic games, an additional and stringent condition on the solutions is required: The specific optimality principle must remain optimal at any instant of time throughout the game duration along the optimal state trajectory. This condition is known as dynamic stability or time consistency.
In the context of mean-field-type games, the notion of time consistency is crucial since the initial distribution of states and the starting time naturally influence the Kolmogorov forward equation. A cooperative solution is sub-game consistent if an extension of the cooperative strategy to a situation with a later starting time and to any possible state brought about by the prior optimal behavior of the decision-makers remains optimal. Sub-game consistency is a stronger notion of time consistency. In the presence of stochastic elements, sub-game consistency is required for a credible cooperative solution. In the field of cooperative mean-field-type games, little research has been published to date on sub-game consistent solutions.
If the set \mathcal{A} is a non-convex but a general separable complete metric space (Polish space), Pontryagin's approach suggests the following perturbation method called spike variation. The approach is well-adapted to sub-game perfection in games. Fix (t, s)\in [0, T]\times \mathcal{S} and define the control law a_{\epsilon} as the spike variation of \hat{a} over the set [t, t+\epsilon], \ \epsilon>0 i.e.,
a_{\epsilon}(t') =a(t') {\rm{ll}}_{[t, t+\epsilon]}(t') + \hat{a}(t') {\rm{ll}}_{[0, T]\backslash [t, t+\epsilon]}(t'), |
where a is an arbitrary admissible control and {\rm{ll}}_{[t, t+\epsilon]} is the indicator function over the set [t, t+\epsilon].
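On a time grid, the spike variation is just an indicator mix of the two control laws; a minimal sketch, where the candidate and perturbation controls are placeholders:

```python
# Minimal sketch of the spike variation: replace the candidate control a_hat
# by an arbitrary admissible control a on the short window [t, t+eps] only.
def spike_variation(a_hat, a, t, eps):
    """Return the control law t' -> a_eps(t') defined by the indicator mix."""
    def a_eps(t_prime):
        if t <= t_prime <= t + eps:
            return a(t_prime)              # perturbed on [t, t+eps]
        return a_hat(t_prime)              # unchanged elsewhere on [0, T]
    return a_eps

a_hat = lambda t_: 0.0                     # placeholder candidate control
a_pert = lambda t_: 1.0                    # placeholder perturbation
a_eps = spike_variation(a_hat, a_pert, t=0.5, eps=0.01)
print(a_eps(0.505), a_eps(0.2))            # 1.0 0.0
```

The point of the construction is that a_\epsilon differs from \hat{a} only on a set of measure \epsilon, so the payoff difference in Definition 5 can be expanded to first order in \epsilon.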
Definition 5. Let R_J be the objective to be maximized in a. We say that \hat{a} is a sub-game perfect cooperative strategy under spike variation if for any t_0, s_0, a,
\lim\limits_{\epsilon \rightarrow 0}\ \frac{1}{\epsilon}\left[R_J([t_0, T], s_0, \hat{a})- R_J([t_0, T], s_0, a_{\epsilon})\ \right] \geq 0. |
Note that a sub-game perfect cooperative strategy under spike variation is in particular a time consistent solution.
The key difference here is that the solution that we are looking for, should not depend on the initial data (when and where we started).
Let H^{t_0, s_0} be the Pontryagin function associated with the random variable s that starts from s_0 at t_0\in [0, T]. H^{t_0, s_0}(t, s, m, a, p, q) =b p^{t_0, s_0}+\sigma q^{t_0, s_0} +r^{t_0, s_0}_J, where the notation r^{t_0, s_0} is obtained from r when the aggregate term is m = m^{t_0, s_0, a}. The first order adjoint equation process (p^{t_0, s_0}, q^{t_0, s_0}) under optimal cooperative control law becomes
\begin{equation}\begin{array}{ll} dp^{t_0, s_0} =-[H^{t_0, s_0}_s+\mathbb{E} \partial_s H^{t_0, s_0}_{m}] dt +q^{t_0, s_0} dB(t), \ t>t_0, \\ p^{t_0, s_0}(T) = g^{t_0, s_0}_s(T)+\mathbb{E} \partial_s g^{t_0, s_0}_{m}(T). \end{array} \end{equation} | (41) |
The second order adjoint equation (P^{t_0, s_0}, Q^{t_0, s_0}) is defined following similar steps as in (14) but with conditioning on t_0, s_0.
Proposition 11. Let the assumptions of Lemma 5 hold. If (\hat{x}, \hat{a}) is an optimal solution of the cooperative mean-field type game then there are two pairs of processes (p, q), (P, Q) that satisfy the first order and the second order adjoint equations, such that
H^{t, \hat{x}(t)}(t, \hat{x}, \hat{m}, \hat{a}, p^{t, \hat{x}(t)}, q^{t, \hat{x}(t)})-H^{t, \hat{x}(t)}(t, \hat{x}, \hat{m}, a, p^{t, \hat{x}(t)}, q^{t, \hat{x}(t)}) |
+\frac{1}{2}P^{t, \hat{x}(t)}(t)\left( \sigma(t, \hat{x}, \hat{m}, \hat{a})- \sigma(t, \hat{x}, \hat{m}, a)\right)^2 \geq 0, |
for all a(.)\in \mathcal{A}, almost every t and \mathbb{P}-almost surely.
In this article, basic results on mean-field-type games were presented. Most of the results presented here extend to the multi-dimensional state case (vector or matrix), and the state-of-the-art stochastic maximum principle can be carried out with random coefficient functions, common noise, time delays, and backward-forward stochastic integro-differential equations. The state-of-the-art dynamic programming principle works in infinite dimension, in the space of measures. By introducing dual functions, which are weak Gateaux derivatives, relationships between the stochastic maximum principle and dynamic programming were established. The methodology was shown to be flexible enough to handle partial state observation and imperfect state measurement in the non-degenerate case using the Girsanov transform. Wiener chaos expansions of the underlying processes were proposed to solve the game problems, and truncation error bounds were derived using the Kosambi-Karhunen-Loeve approach. This allows one to solve the mean-field-type problem much faster than standard methods such as multi-level Monte-Carlo sampling or stochastic collocation. The choice of a basis is crucial, as the number of elements influences the curse of dimensionality of the problem. Sparse representations and non-intrusive proper generalized decompositions of the processes may be needed in order to significantly reduce the complexity.
This research work is supported by the U.S. Air Force Office of Scientific Research under grant number FA9550-17-1-0259. The author is grateful to Prof. Boualem Djehiche for useful comments on the Wiener chaos expansions in mean-field-type games. The author is grateful to the Editor and the reviewers for their valuable comments, which improved the manuscript.
The author declares that there is no conflict of interest regarding the publication of this paper.
\mathcal{I} | \triangleq | set of decision-makers |
T | \triangleq | length of the horizon |
[0, T] | \triangleq | horizon of the mean-field-type game |
t | \triangleq | time index |
\mathcal{S} | \triangleq | state space |
s(t) | \triangleq | state at time t |
\Delta(\mathcal{S}) | \triangleq | set of probability measures on \mathcal{S} |
m(t, .) | \triangleq | probability measure of the state at time t |
A_i | \triangleq | control action set of decision-maker i\in \mathcal{I} |
a_i(.) | \triangleq | strategy of decision-maker i\in \mathcal{I} |
a=(a_i)_{i\in \mathcal{I}} | \triangleq | strategy profile |
b(t, s, m, a) | \triangleq | drift coefficient function |
\sigma(t, s, m, a) | \triangleq | diffusion coefficient function |
r_i(t, s, m, a) | \triangleq | instant payoff of decision-maker i\in \mathcal{I} |
g_i(s, m) | \triangleq | terminal payoff of decision-maker i\in \mathcal{I} |
\mathcal{R}_{i, T}(m_0, a) | \triangleq | cumulative payoff of decision-maker i\in \mathcal{I} |
V_i(t, m) | \triangleq | equilibrium payoff of decision-maker i\in \mathcal{I} |
(p_i, q_i) | \triangleq | first-order adjoint process of decision-maker i\in \mathcal{I} |
(P_i, Q_i) | \triangleq | second-order adjoint process of decision-maker i\in \mathcal{I} |
v_i^*(t, s) | \triangleq | dual function of decision-maker i\in \mathcal{I} |
H_i | \triangleq | Hamiltonian r_i+bp_i+\sigma q_i of decision-maker i\in \mathcal{I} |
H_{i, s} | \triangleq | r_{i, s}+b_sp_i+\sigma_s q_i |
H_{i, ss} | \triangleq | r_{i, ss}+b_{ss}p_i+\sigma_{ss} q_i |