Introduction - If you run into problems using this code, please search online (e.g. Google) for a solution yourself.
Python implementations of algorithms from the Wanmen University reinforcement learning course. The RW (Rescorla-Wagner) update rule is:
$$V(CS) \leftarrow V(CS) + A\left(V(US)\cdot us - V(CS)\cdot cs\right)$$
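A minimal sketch of the RW update above, transcribing the formula exactly as written. It assumes (this is not stated in the original) that `cs` and `us` are 0/1 indicators of whether the conditioned and unconditioned stimulus are present on a trial, and that `lr` plays the role of `A`. The helper names `rw_update` and `simulate_acquisition` are hypothetical, chosen for illustration.

```python
def rw_update(v_cs: float, v_us: float, cs: float, us: float, lr: float) -> float:
    """One Rescorla-Wagner step: V(CS) + A * (V(US)*us - V(CS)*cs).

    Assumption: cs and us are 0/1 presence indicators; lr is the rate A.
    """
    prediction_error = v_us * us - v_cs * cs
    return v_cs + lr * prediction_error


def simulate_acquisition(n_trials: int, v_us: float = 1.0, lr: float = 0.1) -> list:
    """Pair CS and US on every trial and record V(CS) after each trial."""
    v_cs = 0.0
    history = []
    for _ in range(n_trials):
        v_cs = rw_update(v_cs, v_us, cs=1.0, us=1.0, lr=lr)
        history.append(v_cs)
    return history


if __name__ == "__main__":
    # V(CS) rises toward V(US) = 1.0 as 1 - (1 - lr)^n: 0.1, 0.19, 0.271, ...
    print(simulate_acquisition(5))
    # Extinction: CS present, US absent, so V(CS) decays toward 0.
    print(rw_update(1.0, 1.0, cs=1.0, us=0.0, lr=0.5))
```

With the CS and US always paired, the prediction error shrinks geometrically, so learning is fast at first and slows as V(CS) approaches V(US).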
The TD (temporal-difference) update rule is:
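$$V(S_t) \leftarrow V(S_t) + \alpha\left[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\right]$$

where α is the step size (learning rate) and γ is the discount factor.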
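A minimal tabular TD(0) sketch of the update above. The environment is a hypothetical deterministic chain invented for illustration: states 0, 1, ..., n-1 are visited in order, the last step enters a terminal state with reward 1, and every other reward is 0. The names `td0_update` and `run_chain` are likewise hypothetical; the value of the terminal state is taken as 0.

```python
from typing import List, Optional


def td0_update(V: List[float], s: int, r: float, s_next: Optional[int],
               alpha: float, gamma: float, terminal: bool = False) -> float:
    """Apply V(S_t) += alpha * [R_{t+1} + gamma * V(S_{t+1}) - V(S_t)] in place.

    A terminal next state is treated as having value 0. Returns the TD error.
    """
    v_next = 0.0 if terminal else V[s_next]
    td_error = r + gamma * v_next - V[s]
    V[s] += alpha * td_error
    return td_error


def run_chain(n_states: int = 3, episodes: int = 500,
              alpha: float = 0.1, gamma: float = 0.9) -> List[float]:
    """Run TD(0) on the hypothetical chain 0 -> 1 -> ... -> terminal."""
    V = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            last = s == n_states - 1
            r = 1.0 if last else 0.0  # reward only on entering the terminal state
            td0_update(V, s, r, None if last else s + 1, alpha, gamma, terminal=last)
    return V


if __name__ == "__main__":
    # True values are gamma^(distance to the reward): about [0.81, 0.9, 1.0].
    print(run_chain())
```

The reward first reaches only the last state; on later episodes it is propagated backward one step at a time through the bootstrapped term γV(S_{t+1}).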