
Policy iteration for stochastic games

Venue

Dipartimento di Matematica Guido Castelnuovo, Università Sapienza Roma

Room
Sala di Consiglio
Speaker

Marianne Akian, INRIA Saclay-Ile-de-France and CMAP, Ecole Polytechnique

Zero-sum stochastic games with finite state and action spaces, perfect information, and mean-payoff criteria arise in particular from the monotone discretization of mean-payoff pursuit-evasion deterministic differential games. When the Markov chains associated with the strategies are irreducible, the value of the game can be computed using the policy iteration algorithm of Hoffman and Karp (1966), which is similar to the one introduced by Denardo (1967) for solving discounted games. A feature of policy iteration is that the number of iterations is small in practice, whereas in general it can only be bounded by the number of strategies. Recent results of Ye and of Hansen, Miltersen, and Zwick show that policy iteration for one- or two-player zero-sum stochastic games, restricted to instances with a fixed discount rate, is strongly polynomial. We shall show that the Hoffman-Karp algorithm is also strongly polynomial for mean-payoff games with a bounded first mean return time to a given state. The proof is based on methods of nonlinear Perron-Frobenius theory and on a reduction of the mean-payoff problem to a discounted problem with a state-dependent discount rate. When no irreducibility assumption is satisfied (multichain games), the policy iteration algorithm needs to be adapted to avoid cycling in degenerate iterations. Such an adaptation was proposed by Cochet-Terrasson and Gaubert (2006) and developed in particular in the software PIGAMES of Detournay (2012). We shall present numerical results of this algorithm on pursuit-evasion games. This talk covers joint work with Jean Cochet-Terrasson, Sylvie Detournay, and Stéphane Gaubert; see arXiv:1310.4953 and arXiv:1208.0446.
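To make the evaluate/improve loop underlying these methods concrete, here is a minimal sketch of policy iteration for a one-player discounted problem, the setting of Denardo's algorithm that the Hoffman-Karp scheme extends to two players. The function name, data layout, and the toy instance below are illustrative assumptions, not taken from the talk or the cited papers.

import numpy as np

def policy_iteration(P, r, gamma):
    """One-player discounted policy iteration (illustrative sketch).

    P[a] : (n, n) transition matrix of action a, so P has shape (A, n, n)
    r[a] : (n,)  reward vector of action a, so r has shape (A, n)
    gamma: discount factor in (0, 1)
    """
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)  # start from an arbitrary strategy
    idx = np.arange(n)
    while True:
        # Evaluation: solve (I - gamma * P_pi) v = r_pi exactly for the
        # value of the current strategy.
        P_pi = P[policy, idx]        # row s is P[policy[s], s, :]
        r_pi = r[policy, idx]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Improvement: greedy one-step lookahead in every state.
        q = r + gamma * (P @ v)      # q[a, s] = value of playing a in s
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return v, policy         # greedy strategy is stable: optimal
        policy = new_policy

# Toy 2-state, 2-action instance (hypothetical numbers, for illustration).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
v, pi = policy_iteration(P, r, gamma=0.95)

In the two-player Hoffman-Karp variant, the evaluation step is itself an optimization: one player's strategy is fixed and the opponent's best response is computed exactly before the improvement step. It is this inner structure that the mean-payoff and multichain adaptations discussed in the talk must handle to avoid cycling in degenerate iterations.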