Solving a Dec-POMDP is computationally very challenging. In fact, finding an optimal solution for a finite-horizon Dec-POMDP with as few as two agents is known to be NEXP-complete [1].
Consequently, considerable research effort over the last decade has gone into developing efficient methods for finding exact or approximate Dec-POMDP solutions; [2] provides a recent survey of existing methods.
In [3], a decentralized POMDP (Dec-POMDP) model was used for rate-adaptive video streaming. The authors solved their model with Joint Equilibrium-based Search for Policies (JESP) [4], which is guaranteed to find a locally optimal joint policy. JESP relies on a procedure called alternating maximization, which computes a best-response policy for one agent at a time while keeping the policies of the other agents fixed.
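To give a feel for alternating maximization, here is a minimal sketch on a stateless toy problem: two agents each pick one action and share a joint reward, standing in for the (much harder) case of full Dec-POMDP policies. The reward matrix and starting actions are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical joint-reward matrix for two agents with 3 actions each
# (rows: agent 1's action, columns: agent 2's action).
R = np.array([
    [4, 0, 0],
    [0, 3, 0],
    [0, 0, 5],
])

def alternating_maximization(R, a1=0, a2=0, max_iters=100):
    """Fix one agent's choice, best-respond with the other, and alternate
    until neither agent can improve unilaterally (a local optimum)."""
    for _ in range(max_iters):
        new_a1 = int(np.argmax(R[:, a2]))      # agent 1 best-responds to a2
        new_a2 = int(np.argmax(R[new_a1, :]))  # agent 2 best-responds to a1
        if (new_a1, new_a2) == (a1, a2):
            break  # converged: a locally optimal joint choice
        a1, a2 = new_a1, new_a2
    return a1, a2, int(R[a1, a2])
```

Starting from actions (0, 0), the procedure stops at (0, 0) with reward 4, even though (2, 2) with reward 5 is globally better. This mirrors JESP's guarantee: the joint policy it returns is locally, not necessarily globally, optimal, and the outcome depends on the initialization.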
The Multi-Agent Decision Process (MADP) Toolbox [5] provides software tools for modeling, specifying, planning, and learning in a variety of decision-theoretic multi-agent problems.
References
[1] D. S. Bernstein, S. Zilberstein, and N. Immerman, “The Complexity of Decentralized Control of Markov Decision Processes,” in Proc. Uncertainty in Artificial Intelligence, 2000, pp. 32–37.
[2] C. Amato, G. Chowdhary, A. Geramifard, N. K. Ure, and M. J. Kochenderfer, “Decentralized Control of Partially Observable Markov Decision Processes,” in Proc. IEEE CDC, 2013, pp. 2398–2405.
[3] M. Hemmati, A. Yassine, and S. Shirmohammadi, “A Dec-POMDP Model for Congestion Avoidance and Fair Allocation of Network Bandwidth in Rate-Adaptive Video Streaming,” in Proc. IEEE Symposium Series on Computational Intelligence, 2015.
[4] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, “Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings,” in Proc. IJCAI, 2003, pp. 705–711.
[5] F. A. Oliehoek, M. T. J. Spaan, and P. Robbel, “MultiAgent Decision Process (MADP) Toolbox 0.3,” 2014.