Friday, April 28, 2017

Improve random sample function

I have the following function to generate a random sample from an original data set:

import numpy as np

def randomSampling(originalData):
    # Random integer weights in [0, 500], one per element of the data
    a = np.random.randint(0, 501, size=originalData.shape)
    # Number of elements in the result: half the rows, because we want a 50% sample
    N = round(originalData.shape[0] / 2)
    result = np.zeros(originalData.shape)
    ia = np.arange(result.size)
    # Sum of the flattened weights, as a float, used to normalize them into probabilities
    tw = float(np.sum(a.ravel()))
    # Mark N distinct positions of the flattened result with 1
    result.ravel()[np.random.choice(ia, p=a.ravel() / tw, size=N, replace=False)] = 1
    return result

My goal is to obtain a subset containing 50% of the records of the original set. That way I can run this function twice: once to get the training subset and once to get the test subset.
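
To make the goal concrete, here is a minimal sketch of a 50/50 split in which the two halves cannot overlap, obtained by shuffling the row indices once instead of sampling twice. The helper name `split_in_half` and its `seed` parameter are hypothetical, and the sketch assumes the data is a NumPy array with one record per row.

```python
import numpy as np

def split_in_half(data, seed=None):
    """Return two disjoint row subsets, each about half of `data` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once so the two halves cannot share a row.
    indices = rng.permutation(data.shape[0])
    half = data.shape[0] // 2
    return data[indices[:half]], data[indices[half:]]

# Toy stand-in for the real data: 10 records with 3 fields each.
toy = np.arange(30).reshape(10, 3)
train, test = split_in_half(toy, seed=0)
print(train.shape, test.shape)
```

Calling a sampling function twice independently would not guarantee this: some records could land in both subsets and others in neither.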

My problem is that the original data has a field called status whose values are 0 or 1, and I want to keep the proportion between the two classes the same in the training and test subsets.
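
The kind of split described here is usually called a stratified split. The following is only a sketch of the idea in plain NumPy, not a definitive implementation: each class is shuffled and halved separately, so both subsets keep roughly the original class ratio. The function name `stratified_half_split` is hypothetical, and it assumes the status values are available as a separate 1-D array aligned with the rows of the data.

```python
import numpy as np

def stratified_half_split(data, status, seed=None):
    """Split rows 50/50 while keeping the class ratio of `status` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for label in np.unique(status):
        # Row indices of this class, in random order.
        rows = rng.permutation(np.flatnonzero(status == label))
        half = len(rows) // 2
        train_parts.append(rows[:half])
        test_parts.append(rows[half:])
    train_idx = np.concatenate(train_parts)
    test_idx = np.concatenate(test_parts)
    return data[train_idx], data[test_idx], status[train_idx], status[test_idx]

# Toy data: 4 records with status 0 and 8 records with status 1.
status = np.array([0] * 4 + [1] * 8)
data = np.arange(24).reshape(12, 2)
X_train, X_test, y_train, y_test = stratified_half_split(data, status, seed=0)
print(np.bincount(y_train), np.bincount(y_test))
```

With the toy data above, each subset receives 2 records of class 0 and 4 of class 1, matching the 1:2 ratio of the full set. When a class has an odd number of records, the extra one goes to the test subset in this sketch.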

How can I do this in Python? Also, I am not sure the function really creates samples with half the records of the original set. In theory, computing N as half the number of rows, round(shape[0] / 2), should ensure that the size is 50%.
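
One way to check the size is to rebuild the mask on toy data and count what gets marked. The sketch below follows the same steps as the function above, with two assumptions of mine: the data is a made-up 20 x 4 array, and the weights start at 1 rather than 0 so that no element has zero probability.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.normal(size=(20, 4))  # toy stand-in: 20 records, 4 fields

# Build the mask as in the function above: N ones scattered over the
# flattened array, drawn without replacement using random weights.
weights = rng.integers(1, 501, size=original.shape).ravel().astype(float)
N = round(original.shape[0] / 2)
mask = np.zeros(original.shape)
chosen = rng.choice(mask.size, size=N, replace=False, p=weights / weights.sum())
mask.ravel()[chosen] = 1

print(int(mask.sum()))  # number of marked elements
print(mask.size)        # total number of elements (rows * fields)
```

The count of marked elements equals N, but those are individual cells of the flattened array, not whole records: with 20 records of 4 fields, 10 cells out of 80 are marked, which is not the same thing as selecting 10 of the 20 rows.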



