Friday, June 18, 2021

Fastest way to sample the largest set of numbers with minimum difference larger than a value from a Python list

Say I have a list of random floats between 1 and 10. I want to find a largest subset in which any two numbers differ by more than mindiff = 1. Right now I am using a brute-force search, going from the largest to the smallest subset sizes with itertools.combinations. As shown below, the code takes about 4 s to find such a subset for a list of 20 numbers.

from itertools import combinations
import random
from time import time

mindiff = 1.
length = 20
random.seed(99)
lst = [random.uniform(1., 10.) for _ in range(length)]

t0 = time()
n = len(lst)
sample = []
found = False
while not found:
    # get all subsets with size n
    subsets = list(combinations(lst, n))
    random.shuffle(subsets)
    for subset in subsets:
        # sort the subset numbers
        ss = sorted(subset)
        # calculate the differences between every two adjacent numbers
        diffs = [j-i for i, j in zip(ss[:-1], ss[1:])]
        # all() also handles a one-element subset, where diffs is empty and min() would raise
        if all(d > mindiff for d in diffs):
            sample = set(subset)
            found = True
            break
    # no valid subset of this size was found; try subsets one element smaller
    n -= 1

print(sample)
print(time()-t0)

Output:

{2.3704888087015568, 4.365818049020534, 5.403474619948962, 6.518944556233767, 7.8388969285727015, 9.117993839791751}
4.182451486587524

However, in reality I have a list of 200 numbers, for which brute-force enumeration is infeasible. I want a fast algorithm that samples just one random largest subset whose minimum pairwise difference is larger than 1. Note that each sample should be both random and of maximum size. Any suggestions?
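
For reference, here is a minimal sketch of one possible direction, not a definitive answer. It assumes that "random" means uniform over all maximum-size subsets. The idea is to sort the list and run a small dynamic program that records, for each element, the size of the largest valid subset starting there and how many such subsets exist. A subset is then drawn by walking left to right and picking each next element with probability proportional to those counts. The function name sample_max_subset and the names best and ways are made up for this illustration.

```python
import random


def sample_max_subset(values, mindiff, rng=random):
    """Draw one maximum-size subset whose adjacent sorted gaps all exceed mindiff."""
    a = sorted(values)
    n = len(a)
    if n == 0:
        return []

    # best[i]: size of the largest valid subset whose smallest element is a[i]
    # ways[i]: number of distinct subsets achieving best[i]
    best = [1] * n
    ways = [1] * n
    # first_ok[i]: first index j > i with a[j] - a[i] > mindiff (n if none)
    first_ok = [n] * n
    for i in range(n - 1, -1, -1):
        first_ok[i] = next(
            (j for j in range(i + 1, n) if a[j] - a[i] > mindiff), n
        )
        if first_ok[i] < n:
            tail = range(first_ok[i], n)
            top = max(best[j] for j in tail)
            best[i] = 1 + top
            ways[i] = sum(ways[j] for j in tail if best[j] == top)

    # Walk left to right; at each step choose among the elements that can
    # still start a subset of the remaining target size, weighted by ways.
    result = []
    lo = 0
    target = max(best)
    while target > 0:
        cands = [j for j in range(lo, n) if best[j] == target]
        i = rng.choices(cands, weights=[ways[j] for j in cands])[0]
        result.append(a[i])
        lo = first_ok[i]
        target -= 1
    return result
```

With the variables from the snippet above, sample_max_subset(lst, mindiff) returns a sorted list. The scan is quadratic in the list length, which should be fine for 200 numbers. The weights are converted to floats inside random.choices, so uniformity is only approximate if the counts become extremely large.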
