Given a set of unique sets, I want to sample a minimum amount of sets that has the union of the largest union, i.e., the universe. As an example, let's say we have a set of 20 random sets of integers with different sizes ranging from 1 to 10:
import random
random.seed(99)
length = 20
ss = {frozenset(random.sample(range(100), random.randint(1,10))) for _ in range(length)}
assert len(ss) == 20 # This might be smaller than 20 if frozensets are not all unique
The largest union (universe) is given by
universe = frozenset().union(*ss)
print(universe)
# frozenset({0, 6, 7, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25,
# 26, 27, 29, 31, 32, 34, 37, 39, 40, 42, 43, 45, 46, 47, 48, 49,
# 51, 52, 53, 54, 56, 59, 60, 62, 63, 64, 66, 67, 68, 69, 75, 76,
# 77, 78, 79, 80, 81, 84, 86, 87, 88, 89, 91, 92, 93, 95, 97, 98, 99})
Right now I am using a brute-force method to search from the unions of 1 to 20 subsets using itertools.combinations
. As shown below, the code finds a minimum amount of 17 subsets after 2.95 s.
from itertools import combinations
from time import time
t0 = time()
n = 1
sample = []
found = False
while not found:
# Get all combinations of n subsets
all_n_ss = list(combinations(ss, n))
# Shuffle to gain randomness
random.shuffle(all_n_ss)
for n_ss in all_n_ss:
u = frozenset().union(*n_ss)
if u == universe:
sample = n_ss
found = True
break
# Add one more subset
n += 1
print(len(sample))
print(sample)
print(time()-t0)
# 17
# (frozenset({0, 66, 7, 42, 48, 17, 81, 51, 25, 27}),
# frozenset({49, 27, 87, 47}),
# frozenset({76, 48, 17, 22, 25, 29, 31}),
# frozenset({14}),
# frozenset({0, 66, 68, 10, 46, 54, 25, 26, 59}),
# frozenset({75, 92, 53, 78}),
# frozenset({67, 68, 11, 79, 87, 89, 62}),
# frozenset({67, 99, 40, 10, 43, 11, 51, 86, 91, 60}),
# frozenset({6, 59, 91, 76, 45, 16, 20, 56, 27, 95}),
# frozenset({32, 98, 40, 46, 15, 86, 23, 29, 63}),
# frozenset({99, 37, 12, 77, 15, 18, 19, 52, 22, 95}),
# frozenset({39, 10, 11, 80, 18, 53, 54, 87}),
# frozenset({32, 93}),
# frozenset({34}),
# frozenset({64, 84, 22}),
# frozenset({32, 97, 69, 45, 16, 51, 88, 60}),
# frozenset({21}))
# 2.9506494998931885
However, in reality I have a set of 200 sets, which is infeasible for a brute-froce enumeration. I want a fast algorithm to sample just one random solution. Note that I want each sample has randomness, minimum amount and largest union.
Any suggestions?
Aucun commentaire:
Enregistrer un commentaire