This paper studies two-person zero-sum Markov games with Borel state and action spaces, unbounded reward functions, and state-dependent discount factors. The optimality criterion is the expected discounted criterion. First, sufficient conditions are given for the existence of optimal policies in two-person zero-sum Markov games with varying discount factors. The existence of optimal policies is then proved using the Banach fixed point theorem. Finally, an example on reservoir operations illustrates the existence results.
Citation: Xiao Wu, Qi Wang, Yinying Kong. Two-person zero-sum stochastic games with varying discount factors[J]. AIMS Mathematics, 2021, 6(10): 11516-11529. doi: 10.3934/math.2021668
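The fixed-point argument mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's construction: it assumes a finite state space, two actions per player, and bounded rewards, whereas the paper works with Borel spaces and unbounded rewards. All names (`matrix_game_value_2x2`, `shapley_operator`, `value_iteration`) and all numerical data are hypothetical. It iterates a Shapley-type operator in which the discount factor `alpha[x]` depends on the current state; since every `alpha[x]` is below 1, the operator is a sup-norm contraction and the iteration converges to its unique fixed point.

```python
from typing import List

Matrix = List[List[float]]


def matrix_game_value_2x2(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best guaranteed payoff over pure rows
    upper = min(max(a, c), max(b, d))  # best guaranteed cap over pure columns
    if lower == upper:                 # a pure saddle point exists
        return lower
    # No saddle point: classical mixed-strategy formula for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


def shapley_operator(u, reward, trans, alpha):
    """Apply one step of a Shapley-type operator with discount alpha[x]."""
    n = len(u)
    new_u = []
    for x in range(n):
        # Auxiliary one-shot game at state x: immediate reward plus the
        # state-dependent discounted expectation of u at the next state.
        aux = [
            [
                reward[x][i][j]
                + alpha[x] * sum(trans[x][i][j][y] * u[y] for y in range(n))
                for j in range(2)
            ]
            for i in range(2)
        ]
        new_u.append(matrix_game_value_2x2(aux))
    return new_u


def value_iteration(reward, trans, alpha, tol=1e-10, max_iter=10_000):
    """Iterate the operator from u = 0 until successive iterates agree."""
    u = [0.0] * len(reward)
    for _ in range(max_iter):
        new_u = shapley_operator(u, reward, trans, alpha)
        if max(abs(p - q) for p, q in zip(new_u, u)) < tol:
            return new_u
        u = new_u
    return u


# Hypothetical toy data: 2 states, 2 actions per player.
reward = [
    [[1.0, -1.0], [-1.0, 1.0]],  # state 0: matching pennies
    [[2.0, 0.0], [0.0, 1.0]],    # state 1
]
# trans[x][i][j] is the next-state distribution under actions (i, j).
trans = [
    [[[0.5, 0.5], [0.8, 0.2]], [[0.8, 0.2], [0.5, 0.5]]],
    [[[0.3, 0.7], [0.6, 0.4]], [[0.6, 0.4], [0.1, 0.9]]],
]
alpha = [0.5, 0.9]  # state-dependent discount factors, each below 1

u_star = value_iteration(reward, trans, alpha)
print(u_star)
```

In this finite setting the contraction modulus is `max(alpha)`. The general case treated in the paper needs further conditions on the rewards and transition law, which this sketch does not attempt to reproduce.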