Wednesday, November 20, 2019

MySQL random entry with weight fails

I'm trying to display weighted random results from my database, but I can't get results with the expected accuracy. I've followed what I learnt here and here.

This would be my table:

+--------+-----------+
| weight | image     |
+--------+-----------+
|     50 | A         |
|     25 | B         |
|     25 | C         |
+--------+-----------+

I need image A to appear 50% of the time, image B 25% of the time, and image C the remaining 25% of the time.

The SQL statement I'm using is:

SELECT image FROM images WHERE weight > 0 ORDER BY -LOG(1.0 - RAND()) / weight LIMIT 10
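
For reference, the ordering trick can be reproduced outside MySQL. The following is a minimal sketch in Python, not the original test script: it assumes that only the first returned row is counted in each iteration, and the names `weighted_pick` and `run_trials`, as well as the fixed seed, are illustrative assumptions. It draws the same key, -ln(1 - U) / weight, for every row and keeps the row with the smallest key, which is what an ascending ORDER BY would put first.

```python
import math
import random
from collections import Counter

# Same weights as the table above.
WEIGHTS = {"A": 50, "B": 25, "C": 25}


def weighted_pick(weights, rng):
    """Return the row with the smallest -ln(1 - U) / weight (first row of an ascending sort)."""
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is always defined.
    return min(weights, key=lambda k: -math.log(1.0 - rng.random()) / weights[k])


def run_trials(n, seed=0):
    """Repeat the pick n times and count how often each image comes first."""
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    return Counter(weighted_pick(WEIGHTS, rng) for _ in range(n))


if __name__ == "__main__":
    n = 10_000
    counts = run_trials(n)
    for image in sorted(WEIGHTS):
        print(f"{image} total: {counts[image]} - {100 * counts[image] / n:.2f}%")
```

If a generator other than MySQL's RAND() shows no persistent gap between B and C under the same formula, that would point at the random source or the test harness rather than at the formula itself.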

To test this properly, I wrote a PHP script that runs the query 10,000 times, counts how many times a, b or c is shown, and prints the totals with percentages, like this:

a total: 4976 - 49,76%
b total: 2538 - 25,38%
c total: 2486 - 24,86%

With only 10,000 iterations, and considering that RAND() is just a randomization function, I would consider these results accurate enough. The problem is that I ran this script about 100 times and noticed that in 98 of the 100 runs b had a higher percentage than c.

I'm trying to understand what's wrong: both rows (b and c) have the same weight in the table, and I'm not introducing any other ordering factor. I took it up a notch and ran 100,000 iterations of the SQL statement. These are the results:

a total: 50185 - 50,185%
b total: 25201 - 25,201%
c total: 24614 - 24,614%

I ran this last test about 50 times (with long wait times between runs). This time b was above c every time, and the accuracy was worse than at 10,000 iterations. You would expect the percentage variation to shrink and the results to become more accurate as the number of iterations grows. It's obvious that either I'm doing something wrong or RAND() is not really random enough.

Mathematically speaking, if it were perfectly random, accuracy should improve as the number of iterations grows, not get worse.
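
To put a number on that expectation: if every iteration were an independent draw, the count for an image with expected share p over n iterations would be binomial, so the observed share would have a standard error of sqrt(p(1 - p) / n). For p = 0.25 that is about 0.43 percentage points at 10,000 iterations and about 0.14 at 100,000. The sketch below, in Python, applies this to the counts reported above; the function names are illustrative, and independence between iterations is an assumption.

```python
import math


def proportion_std_error(p, n):
    """Standard error of an observed share when each of n independent draws hits with probability p."""
    return math.sqrt(p * (1.0 - p) / n)


def z_score(observed_count, p, n):
    """Number of standard errors between the observed share and the expected share p."""
    return (observed_count / n - p) / proportion_std_error(p, n)


if __name__ == "__main__":
    # Counts copied from the two runs reported above; b and c both have expected share 0.25.
    for n, b, c in [(10_000, 2538, 2486), (100_000, 25201, 24614)]:
        se = proportion_std_error(0.25, n)
        print(
            f"n={n}: standard error {100 * se:.3f}%, "
            f"z(b)={z_score(b, 0.25, n):+.2f}, z(c)={z_score(c, 0.25, n):+.2f}"
        )
```

A single run landing one or two standard errors away from 25% is ordinary noise. A deviation that keeps the same sign across nearly every run, as with b above c here, is not something sampling noise alone would be expected to produce.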

Any explanation/solution is welcome.
