mercredi 3 avril 2019

Sorting List randomly by weights

The idea is that i have a list of items with each item having a weight attached. Now i want to randomize the order of the list, but i also want to take the weight into account to "bias" the randomization process.

There are multiple approaches for that, but i'm particulary interested in one certain approach and why it is producing different distribution than i expect. The code is probably fine, but i want to get understanding, why it does what it does?

I know other algorithms that produce expected result.

One is to basically create a range, each with the length of particular item's weight and then randomly pick a point from the full range that got produced. This creates item one-by-one, by doing that over and over again until there are no items/no range to pick from. It produces the expected ratios over a million tries.

There's also another algorithm, that doesn't need range to be produced, but expects initial list to be in random order and then through substraction and checking against x <= 0, also takes items one-by-one to produce a list of randomly, but biased order of items. It produces the expected ratios over a million tries.

The approach i want to currently concentrate on, is to produce a ordering value for each item, and then order the whole list in one go. The code i have written does not produce expected ratios over a million tries.

C# code for a console application

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleTest1
{
    class Program
    {
        static void Main(string[] args)
        {
            var myList = new List<Item>
            {
                new Item { Name = "A70", Weight = 70},
                new Item { Name = "B20", Weight = 20},
                new Item { Name = "C10", Weight = 10},
            };

            var stats = new Dictionary<string, int>();
            myList.ForEach(x => stats.Add(x.Name, 0));

            var rnd = new Random();
            for (var i = 0; i < 1000000; ++i)
            {
                var newList = GetSorted(myList, rnd);
                ++stats[newList.First().Name];
            }

            var sum = stats.ToList().Sum(x => x.Value);
            stats.ToList().ForEach(x => Console.WriteLine($"{x.Key}: {((float)x.Value / sum * 100):0.00}%"));

            Console.ReadLine();
        }

        private static IEnumerable<Item> GetSorted(IEnumerable<Item> list, Random rnd)
        {
            return list
                .Select(x => new
                {
                    Order = x.Weight * rnd.NextDouble(),
                    Item = x
                })
                .OrderByDescending(x => x.Order)
                .Select(x => x.Item);
        }
    }

    class Item
    {
        public string Name { get; set; }
        public int Weight { get; set; }
    }
}

By this code, i would expect probability of each item to be in the first position of the list to be very similar to the weights of each item. Instead of 70-20-10% ratio, i get roughly 85-13-2% ratios. It almost looks like there's some kind of non-linearity coming into play, but i just don't get it right now.

The question here is to understand, how this given code works. I know and have different approaches that work and produces expected result.

Thank you!




Aucun commentaire:

Enregistrer un commentaire