Two-person zero-sum stochastic games with varying discount factors

Xiao Wu; Qi Wang; Yinying Kong

doi:10.3934/math.2021668

AIMS Mathematics

2021, Volume 6, Issue 10: 11516-11529. doi: 10.3934/math.2021668


Research article

1. School of Mathematics and Statistics, Zhaoqing University, Zhaoqing, 526061, China
2. School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
3. School of Intelligence Financial Accounting Management, Guangdong University of Finance and Economics, Guangzhou, 510320, China

Received: 07 May 2021; Accepted: 04 August 2021; Published: 09 August 2021
MSC: 91A15, 60J05

Keywords: two-person zero-sum stochastic games; expected discount criterion; varying discount factors
Citation: Xiao Wu, Qi Wang, Yinying Kong. Two-person zero-sum stochastic games with varying discount factors[J]. AIMS Mathematics, 2021, 6(10): 11516-11529. doi: 10.3934/math.2021668

Abstract

This paper studies two-person zero-sum Markov games with Borel state and action spaces, an unbounded reward function, and state-dependent discount factors. The optimality criterion is the expected discount criterion. First, sufficient conditions are given for the existence of optimal policies in two-person zero-sum Markov games with varying discount factors. Then, the existence of optimal policies is proved using the Banach fixed point theorem. Finally, an example from reservoir operations illustrates the existence results.
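
The fixed-point idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' construction: it assumes a finite state space, two actions per player, bounded rewards, and a discount factor alpha[x] < 1 that depends only on the current state, whereas the paper treats Borel spaces and unbounded rewards. Under these assumptions the Shapley-type operator is a sup-norm contraction with modulus max(alpha), so repeated application converges to its unique fixed point. All names (matrix_game_value, shapley_operator, solve, reward, trans, alpha) are hypothetical and chosen for illustration only.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best pure guarantee of the row player
    upper = min(max(a, c), max(b, d))  # best pure guarantee of the column player
    if upper - lower <= 1e-12:         # a pure saddle point exists
        return lower
    # No saddle point: classical mixed-strategy formula for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


def shapley_operator(u, reward, trans, alpha):
    """Apply one step of the operator with state-dependent discount alpha[x]."""
    new_u = []
    for x in range(len(u)):
        # Auxiliary one-shot game: reward plus discounted expected continuation.
        aux = [
            [
                reward[x][a][b]
                + alpha[x] * sum(p * u[y] for y, p in enumerate(trans[x][a][b]))
                for b in range(2)
            ]
            for a in range(2)
        ]
        new_u.append(matrix_game_value(aux))
    return new_u


def solve(reward, trans, alpha, tol=1e-10, max_iter=100_000):
    """Iterate the operator from u = 0 until the sup-norm change is below tol."""
    u = [0.0] * len(alpha)
    for _ in range(max_iter):
        new_u = shapley_operator(u, reward, trans, alpha)
        if max(abs(p - q) for p, q in zip(new_u, u)) < tol:
            return new_u
        u = new_u
    raise RuntimeError("value iteration did not converge")


# Toy data (assumed, not from the paper).
# State 0: matching-pennies rewards, always moves to state 1, discount 0.5.
# State 1: reward 1 for every action pair, absorbing, discount 0.9.
reward = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
trans = [
    [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],
    [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],
]
alpha = [0.5, 0.9]

values = solve(reward, trans, alpha)
print(values)
```

In this toy game the iterates approach 10 in state 1, since 1/(1 - 0.9) = 10, and 5 in state 0, since the matching-pennies stage game has value 0 and the continuation is discounted by 0.5. The 2x2 closed form keeps the sketch free of external solvers; larger action sets would need a linear program for each auxiliary game.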


© 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)