From the reinforcement learning course at Wanmen University: a Python implementation of the Grid_world_evaluation algorithm. Its update formula is as follows.
V(S) = V(S) + A * ( R(S) + r*V(new_S) - V(S) )
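As a rough illustration of the update above, here is a minimal Python sketch. It is not the uploaded implementation. It assumes that A is a learning rate (called `alpha` here), r is a discount factor (called `gamma` here), and R(S) is the reward received in state S; the function name `td_update` and the four-cell corridor are hypothetical, chosen only to keep the example small.

```python
def td_update(V, s, new_s, reward, alpha=0.1, gamma=0.9):
    """Apply V(S) = V(S) + A * (R(S) + r*V(new_S) - V(S)) in place.

    alpha stands in for A and gamma for r (assumed meanings).
    Returns the updated value of V[s].
    """
    V[s] = V[s] + alpha * (reward + gamma * V[new_s] - V[s])
    return V[s]


# Hypothetical 1-D grid world: cells 0..3, cell 3 is terminal.
# The agent always steps right; only cell 2 (next to the goal) pays a reward.
REWARDS = [0.0, 0.0, 1.0, 0.0]   # R(S) for each cell (assumed values)
V = [0.0, 0.0, 0.0, 0.0]         # value estimates, V[3] stays 0 (terminal)

for episode in range(200):
    for s in range(3):           # visit cells 0, 1, 2 in order
        td_update(V, s, s + 1, REWARDS[s])

print([round(v, 3) for v in V])
```

With these assumed settings the estimates approach 1 for cell 2 and shrink by the discount factor for each cell further from the goal.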
The Grid_world_Policy algorithm uses the following update formula.
P(S) = P(S) + A * ( R(S) + r*P(new_S) - P(S) )