I am implementing Monte Carlo tree search (MCTS) for a two-player strategy game (where you can win, lose, or draw).
I search through the tree by following the node with the highest UCB (upper confidence bound, as applied to trees) value. If I reach a node with no children, I add all possible moves to it as children, select one of them, and run a simulation from it. I have three questions:
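For reference, the descent described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses the standard UCB1 formula with an exploration constant of sqrt(2), and the `Node` class, its field names, and the helper names `ucb` and `select_leaf` are hypothetical, not taken from an actual implementation.

```python
import math


class Node:
    """Hypothetical tree node; the field names are illustrative only."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0          # number of simulations through this node
        self.total_value = 0.0   # sum of backpropagated values


def ucb(node, c=math.sqrt(2)):
    """UCB1 score of a child node (assumed formula, assumed constant c)."""
    if node.visits == 0:
        return math.inf  # unvisited children are tried before any others
    exploit = node.total_value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select_leaf(root):
    """Follow the highest-UCB child until a node with no children is reached."""
    node = root
    while node.children:
        node = max(node.children, key=ucb)
    return node
```

The node returned by `select_leaf` is where the expansion step (adding all possible moves as children) would take place.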
- How do I choose among children when several child nodes have the same UCB value? Should I pick one at random, or take the one that comes first in the for loop (the max search)? (Does it even matter?)
- Which values should I backpropagate? For example, if I win a simulation, should I backpropagate 10 or 1? If I draw, I backpropagate 0 (only the visit count increases). Which value should I backpropagate if I lose a simulation: 0 (as for a draw), or -1/-10?
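To make the two questions above concrete, here is a minimal sketch of the alternatives being compared. Everything in it is an assumption for illustration: the `Node` class, the function names, and in particular the choice to flip the result between win and loss at each level on the way up (a common convention for two-player games, but not something stated above). The win/draw/loss values are passed in as a parameter so that schemes such as 1/0/0, 1/0/-1, or 10/0/-10 can be tried side by side.

```python
import random


class Node:
    """Hypothetical tree node; the field names are illustrative only."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_value = 0.0


def pick_first(children, score):
    """Tie-break by loop order: max() returns the first maximal item."""
    return max(children, key=score)


def pick_random(children, score, rng=random):
    """Tie-break uniformly at random among the top-scoring children."""
    best = max(score(child) for child in children)
    return rng.choice([child for child in children if score(child) == best])


def backpropagate(leaf, result, values):
    """Walk from leaf to root, adding one visit and values[result] per node.

    result is 'win', 'draw' or 'loss' from the point of view of the player
    who moved into `leaf`. Assumption: the point of view alternates at each
    level, so a win for one node counts as a loss for its parent.
    """
    flip = {"win": "loss", "loss": "win", "draw": "draw"}
    node = leaf
    while node is not None:
        node.visits += 1
        node.total_value += values[result]
        result = flip[result]
        node = node.parent
```

For example, `backpropagate(leaf, "win", {"win": 1, "draw": 0, "loss": -1})` would apply the 1/0/-1 scheme, while `{"win": 1, "draw": 0, "loss": 0}` would treat a loss like a draw.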