Monday, May 23, 2016

eps-greedy: how do I decide when to choose a random action?

I am struggling a bit with implementing this. I think it is simple, but I might be overcomplicating things. I am trying to implement an eps-greedy action selector for Q-learning. The method is described in this snippet:

Snippet from: http://ift.tt/1WdycHe

How do I make the agent

"select at each time step a random action with a fixed probability 0 <= eps <= 1, instead of selecting greedily one of the learned optimal actions with respect to the Q-function"?

I have implemented an AI player that uses Q-learning to determine future actions, but since I am using greedy action selection, it doesn't explore all its options and quickly gets stuck in a local minimum. That is why I want to try the eps-greedy action selector instead and see whether it helps.

The way I understand it, if eps were 0.1, the function would return the greedy action 90% of the time and a random action 10% of the time. But how do I make it do that? I would prefer to avoid having a counter that counts up to 100 and returns either a random or a greedy action depending on its value. Is there no way around this?
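For illustration, here is a minimal sketch of how this is usually done without any counter: draw one uniform random number per step and compare it with eps. Following the quoted definition, eps is the probability of the random action, so about 10% exploration corresponds to eps = 0.1. The function name `epsilon_greedy`, the `q_values` list (one Q-value per action in the current state), and the optional `rng` parameter are assumptions made for this example, not part of the original code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for the current state.

    q_values: list of Q-values, one per action (hypothetical layout).
    epsilon:  probability of taking a uniformly random action, 0 <= epsilon <= 1.
    rng:      the random module or a random.Random instance (for reproducibility).
    """
    # rng.random() is uniform on [0, 1), so this branch is taken
    # with probability epsilon.
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))

    # Exploit: pick a greedy action, breaking ties at random so the
    # agent does not always favor the lowest index.
    best = max(q_values)
    best_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(best_actions)
```

Each call is an independent draw, so no state has to be kept between steps. Note that the random branch can also land on the greedy action by chance, so the greedy action is chosen slightly more often than 1 - eps of the time.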



