I have the following function to generate a random sample from an original data set:
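```python
import numpy as np

def randomSampling(originalData):
    # Random weights, one per element (np.random.random_integers is deprecated;
    # randint's upper bound is exclusive, so 501 keeps the original 0..500 range)
    a = np.random.randint(0, 501, size=originalData.shape)
    # Number of elements in the result. We split in half because we want a 50% sample
    N = round(originalData.shape[0] / 2)
    result = np.zeros(originalData.shape)
    # Indices of every element of the flattened array (rows * columns of them)
    ia = np.arange(result.size)
    # Cast the sum of the flattened weights to float to normalise them into probabilities
    tw = float(np.sum(a.ravel()))
    # Mark N distinct elements with 1, chosen with probability proportional to their weight
    result.ravel()[np.random.choice(ia, p=a.ravel() / tw, size=N, replace=False)] = 1
    return result
```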
My goal is to obtain a subset of the original data that is 50% of the size of the original set. This way I can run the function twice: once to get the training subset and once to get the test subset.
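Calling a random sampler twice draws two independent samples, so the same row can end up in both the training and the test subset. A minimal sketch of a disjoint, row-level 50/50 split, assuming the data is a 2-D NumPy array with one record per row; `split_half` and the toy array are hypothetical names used only for illustration:

```python
import numpy as np

def split_half(data, seed=None):
    """Shuffle the row indices once and cut them in half, so train and test are disjoint."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[0])   # every row index exactly once, in random order
    n_train = data.shape[0] // 2           # half of the rows (rounded down)
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return data[train_idx], data[test_idx]

# Hypothetical toy data: 10 rows, 3 columns
data = np.arange(30).reshape(10, 3)
train, test = split_half(data, seed=0)
print(train.shape, test.shape)  # (5, 3) (5, 3)
```

Because both halves come from one permutation, every row lands in exactly one subset.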
My problem is that the original data has a field called `status` whose values are `0` or `1`, and I want to keep the proportion between both classes in the training and test subsets.
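One way to keep the 0/1 ratio is a stratified split: split the rows of each class separately, then concatenate the pieces. A minimal sketch, assuming the `status` value of each row is available as a 1-D NumPy array aligned with the rows of the data; `stratified_split` is a hypothetical helper. If scikit-learn is available, `sklearn.model_selection.train_test_split(X, y, test_size=0.5, stratify=y)` does the same job.

```python
import numpy as np

def stratified_split(data, status, train_frac=0.5, seed=None):
    """Split rows into train/test so each class of `status` keeps (about) the same proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(status):
        cls_idx = np.flatnonzero(status == cls)   # row indices belonging to this class
        rng.shuffle(cls_idx)                      # random order within the class
        n_train = int(round(len(cls_idx) * train_frac))
        train_idx.append(cls_idx[:n_train])
        test_idx.append(cls_idx[n_train:])
    train_idx = np.concatenate(train_idx)
    test_idx = np.concatenate(test_idx)
    return data[train_idx], data[test_idx], status[train_idx], status[test_idx]

# Hypothetical toy data: 12 rows, 8 with status 1 and 4 with status 0
data = np.arange(24).reshape(12, 2)
status = np.array([1] * 8 + [0] * 4)
X_train, X_test, y_train, y_test = stratified_split(data, status, seed=42)
print(np.bincount(y_train), np.bincount(y_test))  # [2 4] [2 4]
```

Both subsets keep the 1:2 ratio of zeros to ones found in the toy data; with odd class counts the halves can differ by one row per class.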
How can I do this in Python? Also, I am not sure that the function really creates samples containing half of the records of the original set. In theory, this line should ensure that the size is 50%: `N=round(parkinsonData.shape[0]/2)`
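On the size question: in `randomSampling` the indices are drawn from `np.arange(result.size)`, i.e. from every element of the flattened array, not from the rows. So N counts cells out of rows × columns, and rows are not selected as a whole. A small check on a hypothetical 10×4 array, with the mask-building lines mirroring the function:

```python
import numpy as np

originalData = np.ones((10, 4))               # hypothetical: 10 rows, 4 columns
N = round(originalData.shape[0] / 2)          # 5: half of the *rows*

weights = np.random.randint(0, 501, size=originalData.shape)
mask = np.zeros(originalData.shape)
ia = np.arange(mask.size)                     # 40 indices: one per element, not per row
mask.ravel()[np.random.choice(ia, p=weights.ravel() / weights.sum(), size=N, replace=False)] = 1

print(mask.size, mask.sum())                  # 40 5.0
print(mask.sum() / mask.size)                 # 0.125 -> 12.5% of the elements
rows_touched = np.count_nonzero(mask.any(axis=1))  # between 2 and 5, varies per run
```

The mask holds exactly N ones, but they mark individual cells scattered over the rows, so it does not describe a 50% sample of the records.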