This paper analyzes the existence of Nash equilibria in a discrete-time Markov stopping game with two players. At each decision epoch, Player Ⅱ chooses either to end the game, thereby granting Player Ⅰ a final reward, or to let the game continue. In the latter case, Player Ⅰ performs an action that affects the transitions and receives a running reward from Player Ⅱ. We assume that Player Ⅰ has a constant, non-zero risk-sensitivity coefficient, while Player Ⅱ strives to minimize Player Ⅰ's utility. The effectiveness of decision strategies is measured by the risk-sensitive expected total reward of Player Ⅰ. Exploiting mild continuity-compactness conditions and communication-ergodicity properties, we show that the value function of the game is characterized as the unique fixed point of an equilibrium operator, which determines a Nash equilibrium. In addition, we provide an illustrative example in which our assumptions hold.
Citation: Jaicer López-Rivero, Hugo Cruz-Suárez, Carlos Camilo-Garay. Nash equilibria in risk-sensitive Markov stopping games under communication conditions[J]. AIMS Mathematics, 2024, 9(9): 23997-24017. doi: 10.3934/math.20241167
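To make the fixed-point characterization concrete, the following is a minimal sketch of value iteration for the equilibrium operator of a risk-sensitive stopping game on a finite state space. The model data (states, rewards `G` and `r`, transition law `P`, and the coefficient `lam`) are hypothetical illustrations, not the paper's example: at each state, Player Ⅱ takes the minimum between stopping (exponential utility of the terminal reward) and letting Player Ⅰ continue, where Player Ⅰ maximizes over actions.

```python
import math

# Hedged illustration: fixed-point iteration for the equilibrium operator of a
# risk-sensitive Markov stopping game. All model data below are hypothetical.

lam = 0.5                        # risk-sensitivity coefficient of Player I (non-zero)
G = [2.0, 0.5]                   # terminal reward if Player II stops at state x
r = [[0.2, 0.1], [0.05, 0.3]]    # running reward r(x, a) paid by Player II
P = [                            # transition law p(y | x, a)
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]

def equilibrium_operator(V):
    """One application of the operator: Player II minimizes over {stop, continue};
    Player I maximizes over actions in the continuation branch (multiplicative
    form of the risk-sensitive total reward, lam > 0)."""
    out = []
    for x in range(len(V)):
        stop = math.exp(lam * G[x])
        cont = max(
            math.exp(lam * r[x][a]) * sum(P[x][a][y] * V[y] for y in range(len(V)))
            for a in range(len(r[x]))
        )
        out.append(min(stop, cont))
    return out

V = [math.exp(lam * g) for g in G]   # start from "Player II stops immediately"
for _ in range(300):
    Vn = equilibrium_operator(V)
    if max(abs(a - b) for a, b in zip(Vn, V)) < 1e-13:
        break
    V = Vn
# V now approximates the fixed point; the states where the stop branch attains
# the minimum, together with the maximizing actions elsewhere, form the
# candidate equilibrium pair of strategies.
```

In this toy instance, the continuation branch attains the minimum in state 0 and the stop branch in state 1, so the resulting strategy pair stops only in state 1. Any resemblance to the paper's example is not implied; the sketch only illustrates the fixed-point scheme the abstract describes.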