I am currently trying to implement $\epsilon$-greedy, which requires me to do this:
"One such method is $\epsilon$-greedy, when the agent chooses the action that it believes has the best long-term effect with probability $1-\epsilon$, and it chooses an action uniformly at random, otherwise." (source: http://ift.tt/20rQKSu)
Generating a random value is not a problem, but how do I choose the best action with probability $1-\epsilon$ and a uniformly random action the rest of the time?
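For reference, one common way to turn the quoted rule into code is to draw a single uniform number in [0, 1) and compare it with $\epsilon$. The following is only a minimal sketch under assumptions that are not in the quote: the action-value estimates are held in a plain list called `q_values`, and the function name `epsilon_greedy` and its `rng` parameter are made up for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return the index of the chosen action.

    q_values: list of estimated action values (hypothetical name).
    epsilon:  exploration probability, between 0 and 1.
    rng:      object providing random() and randrange(); defaults to the random module.
    """
    # random() is uniform on [0, 1), so this branch is taken with probability epsilon.
    if rng.random() < epsilon:
        # Explore: every action is equally likely, including the current best one.
        return rng.randrange(len(q_values))
    # Exploit: index of the highest estimate (the first one if there is a tie).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Calling `epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1)` would return index 1 most of the time. Because the random branch can also pick the best action, that action ends up being chosen with probability $1-\epsilon+\epsilon/n$ for $n$ actions, which is the usual reading of the rule.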