Thursday, August 11, 2016

MDP vs. POMDP

A Markov Decision Process (MDP) [1] models sequential decision problems under uncertainty when full state information is available. In many real-world problems this is not the case, and only incomplete state information is observable. The Partially Observable Markov Decision Process (POMDP) [2] provides a powerful modeling framework for such problems: instead of acting on the true state, the agent maintains a belief (a probability distribution over states) and updates it from noisy observations. In multi-agent environments with several active decision-makers, the Decentralized POMDP (Dec-POMDP) [3] is used.
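The key mechanism that distinguishes a POMDP from an MDP is the Bayesian belief update: after taking action a and receiving observation o, the agent computes b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A minimal sketch in Python, using a hypothetical two-state example (the states, action, and probabilities below are illustrative assumptions, not from any of the cited papers):

```python
def belief_update(belief, T, O, a, o):
    """Bayes-rule belief update for a discrete POMDP.

    belief: list, belief[s] = P(state = s)
    T: T[a][s][s2] = P(next state s2 | state s, action a)
    O: O[a][s2][o] = P(observation o | next state s2, action a)
    """
    n = len(belief)
    # Predict the next state, then weight by the observation likelihood.
    new_b = [
        O[a][s2][o] * sum(T[a][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    norm = sum(new_b)
    return [p / norm for p in new_b]  # renormalize to a probability distribution

# Hypothetical example: 2 hidden states (0 = "good channel", 1 = "congested"),
# one action, two possible observations.
T = {0: [[0.9, 0.1], [0.2, 0.8]]}   # T[a][s][s2]
O = {0: [[0.8, 0.2], [0.3, 0.7]]}   # O[a][s2][o]

b = belief_update([0.5, 0.5], T, O, a=0, o=0)
# Observing o=0 (more likely in state 0) shifts the belief toward state 0.
```

An MDP agent would skip this step entirely, since it observes the state directly; the belief update is exactly what makes POMDP planning harder but applicable to partially observed settings.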

In [4], a Dec-POMDP model has been used for congestion avoidance and fair bandwidth allocation in rate-adaptive video streaming.

References
[1] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. I–II. Belmont, MA: Athena Scientific, 1995.
[2] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains," Artificial Intelligence, vol. 101, no. 1–2, pp. 99–134, May 1998.
[3] F. A. Oliehoek, "Decentralized POMDPs," in Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, Eds. Springer, 2012, pp. 471–503.
[4] M. Hemmati, A. Yassine, and S. Shirmohammadi, "A Dec-POMDP model for congestion avoidance and fair allocation of network bandwidth in rate-adaptive video streaming," in Proc. IEEE Symposium Series on Computational Intelligence (SSCI), 2015.
