Wednesday, 25 January 2017

Bootstrap resampling of data using Python with an upper limit on the occurrences of each point

I would like to measure the uncertainty of a method using bootstrap resampling. I have 200 data points to resample from. What is the fastest way to resample them? I have two requirements: no data point may appear more than twice in a bootstrap sample, and fewer than 7 points may be replaced in total. This matters a lot in my case, because otherwise the structure of the data changes drastically.

The speed of the resampling is also crucial, because I would like to repeat both the resampling and the measurement more than 5000 times.

Currently I am doing the following, which is very slow and not practical at all:

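import numpy as np
from collections import Counter

# Draw one ordinary bootstrap sample of row indices (with replacement)
SeqNr = np.arange(cat.shape[0])
L = np.random.choice(SeqNr, size=cat.shape[0])
# Count how often each index was drawn
counts = Counter(L)
maximum = max(counts.values())               # most repeats of any single point
occurrence = list(counts.values()).count(2)  # number of points drawn exactly twice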

Here cat is the NumPy array holding the data, and the code runs inside a while loop that redraws until the sample satisfies the conditions above. I would appreciate it if someone could recommend a faster approach.
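One possible direction, sketched here under stated assumptions rather than as a definitive answer, is to build a valid sample directly instead of redrawing until one happens to pass. The sketch reads "fewer than 7 points replaced" as: at most 6 original points are dropped, and each is replaced by a second copy of a different, distinct point, so no point ever appears more than twice. The function name constrained_bootstrap_indices and the parameter max_replaced are hypothetical, and drawing the number of replaced points uniformly from 0 to 6 is an arbitrary choice, so the resulting samples will not follow the same distribution as the rejection loop above.

```python
import numpy as np

def constrained_bootstrap_indices(n, max_replaced=6, rng=None):
    """Return n row indices in which at most `max_replaced` points are
    dropped and each dropped point is replaced by a duplicate of another
    point, so every index occurs 0, 1 or 2 times.

    Hypothetical helper: the uniform draw of k is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(0, max_replaced + 1))  # number of points to replace
    idx = np.arange(n)
    if k == 0:
        return idx  # nothing replaced: the sample is the original data
    # Pick 2k distinct indices: the first k are dropped, the last k are doubled
    picked = rng.choice(n, size=2 * k, replace=False)
    dropped, doubled = picked[:k], picked[k:]
    idx[dropped] = doubled  # overwrite each dropped slot with a doubled index
    return idx
```

With the data in cat, one resample would then be cat[constrained_bootstrap_indices(cat.shape[0])]. Each call costs one small draw without replacement and no retries, so repeating it 5000 times should be cheap compared with the measurement itself.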



