Tuesday, September 20, 2016

How to use an MDP to solve an optimization problem?

First, we define a set of states, actions, rewards, and transition probabilities. The states must be chosen so that the Markov property holds: the next state depends only on the current state and the action taken, not on the full history. We can then use dynamic programming within the reinforcement learning framework [1] to compute the optimal value function, which maximizes the expected long-term (discounted) reward, and extract an optimal policy from it.
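As a concrete illustration, here is a minimal sketch of value iteration (one such dynamic programming method) on a small made-up two-state MDP; the states, actions, rewards, and transition probabilities below are hypothetical, chosen only to show the mechanics:

```python
# Toy MDP (hypothetical values for illustration):
# states 0 and 1; actions 0 and 1.
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(1, 1.0)], 1: [(0, 0.8), (1, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
gamma = 0.9  # discount factor for long-term reward

# Value iteration: repeatedly apply the Bellman optimality update
# V(s) <- max_a [ R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
V = {s: 0.0 for s in P}
for _ in range(1000):
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Greedy policy with respect to the converged value function
policy = {
    s: max(
        P[s],
        key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]),
    )
    for s in P
}
print(V)       # optimal state values
print(policy)  # optimal action in each state
```

For this toy problem the iteration converges to V = {0: 19.0, 1: 20.0}, with the optimal policy taking action 1 in state 0 and action 0 in state 1.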


References
[1] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
