Suppose I am offered the chance to keep one of three slot machines, each with its own unknown reward distribution. Each machine outputs -1, 0, or 1 on every try. I have collected the following data:
Slot machine 1 data: Attempts: 100, Average reward: 0.3
Slot machine 2 data: Attempts: 10, Average reward: 0.4
Slot machine 3 data: Attempts: 4, Average reward: 0.5
If I want to keep the slot machine that maximizes the expected reward, which one should it be, and why?
Some context: I understand that more attempts make the estimate of a machine's expected reward more certain, which is desirable. For example, the 3rd machine has the highest average reward but has been tried the fewest times, so its estimate is the least reliable and choosing it carries a high risk. Is there a statistical formula that helps to make this decision?
This is not a Multi-Armed Bandit problem: I don't get to try the slot machines again before making another decision. The question is about making a single decision now, given the data.
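To make the trade-off between a high average and few attempts concrete, here is a minimal sketch of one possible way to quantify it, not a definitive answer. It assumes the attempts on each machine are independent draws and uses Hoeffding's inequality, which only needs the rewards to be bounded (here in [-1, 1]), to compute a lower confidence bound on each machine's expected reward. The function name `hoeffding_lower_bound` and the choice of a 95% confidence level (`delta = 0.05`) are my own assumptions for illustration.

```python
import math

def hoeffding_lower_bound(mean, n, delta=0.05, low=-1.0, high=1.0):
    """Lower bound on the true expected reward that holds with
    probability at least 1 - delta, assuming n independent rewards
    bounded in [low, high] (Hoeffding's inequality)."""
    margin = (high - low) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean - margin

# Data from the question: (attempts, average reward)
machines = {
    "Slot machine 1": (100, 0.3),
    "Slot machine 2": (10, 0.4),
    "Slot machine 3": (4, 0.5),
}

# One lower confidence bound per machine
bounds = {
    name: hoeffding_lower_bound(avg, n)
    for name, (n, avg) in machines.items()
}

# Pessimistic rule (an assumption, not the only sensible one):
# keep the machine whose worst plausible expected reward is highest.
best = max(bounds, key=bounds.get)

for name, lb in bounds.items():
    print(f"{name}: lower bound = {lb:.3f}")
print("Choice under the lower-bound rule:", best)
```

Under this pessimistic rule, the machine with 100 attempts comes out ahead, because it is the only one whose bound stays above zero. A risk-neutral rule that simply picks the highest sample average would instead pick the 3rd machine, so the result depends on how much risk one is willing to accept.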