mardi 29 mars 2022

Computing a random subset in Python - what's wrong in my code?

This code is to return a subset with size k from the set size n (EPI 5.15). That is, take n > 0, k <= n, and from n we (hypothetically) form a set {0, 1, 2, ..., n-1} from which we pick k elements to form a subset. There are nCk possibilities of picking a subset and we want it to be uniformly picked and we also want the permutation in that subset is also random. There are three versions of codes -- from official solution, my tweak, and my own solution. The latter two are WRONG but I don't know why. I will explain the gist of the algorithm right below the three codes.

Official Solution

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        rmap = H.get(r, r)
        imap = H.get(i, i)
        H[r] = imap
        H[i] = rmap
    return [H[i] for i in range(k)]

Change to the official solution (WRONG)

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        H[r] = H.get(i, i)
        H[i] = H.get(r, r)
    return [H[i] for i in range(k)]

My solution (DEAD WRONG)

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        if r in H:
            H[i] = H[r], i
        else:
            H[i] = r, i
    return [H[i] for i in range(k)]

Underlying Logic

We pick an element from the part of array A, <0, 1, 2, ..., n-1>, without duplicate. At first, pick r from array A and swap it with A[0]; then pick another r and swap it with A[1] ... until we have filled A[k-1], in total we have k elements, like the following code:

'''
A = <0, 1, 2, ..., n-1>
i    0  1  2       n-1
'''

def random_sampling(k, A):
    for i in range(k):
        r = random.randrange(i, len(A))
        A[i], A[r] = A[r], A[i]

A = [i for i in range(n)]
k = some_constant
random_sampling(k, A)
answer = A[:k]

To reduce space complexity from O(n) to O(k) by mimicking an array <0, 1, 2, ..., n-1>, we changed the right above code into an official solution that uses a hash table from which we select an element to be included in a subset. The problem arises in the way I use hash table differently from original answer but I don't know why.




Aucun commentaire:

Enregistrer un commentaire