Saturday, March 11, 2017

Should I specify a random_state in any classifier when doing a 10x10 StratifiedShuffleSplit?

I am running a 10x10 StratifiedShuffleSplit and averaging the accuracies over the folds of each repetition. As you can see in the code, I use 10 different random_state values for the SSS, but I do not set one for RandomForest. Should I also specify random_state in RandomForest? What happens if I do not?

Thank you.

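    import numpy as np
    import scipy.io as sio
    from sklearn import preprocessing
    from sklearn.ensemble import RandomForestClassifier
    # sklearn.model_selection replaces the removed sklearn.cross_validation module
    from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
    from sklearn.pipeline import make_pipeline

    result_list = []

    for name in ["AWA"]:
        for el in ['Fp1']:
            # Load the .mat file once and read both arrays from it
            data = sio.loadmat('/home/TrainVal/{}_{}.mat'.format(name, el))
            x = data['x']
            y = np.ravel(data['y'])

            print(name, el, x.shape, y.shape)
            print("")

            # RandomForestClassifier is created without a random_state here
            clf = make_pipeline(preprocessing.RobustScaler(), RandomForestClassifier())

            ################## 10x10 SSS #############
            print("10x10")
            xSSSmean10 = []
            for i in range(10):
                # One seed per repetition; y is passed to cross_val_score, not to the splitter
                sss = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=i)
                scoresSSS = cross_val_score(clf, x, y, cv=sss)

                # Mean accuracy over the 10 splits of this repetition
                xSSSmean10.append(scoresSSS.mean())

            result_list.append(xSSSmean10)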
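
One way to see what the forest's random_state changes is to run the whole 10x10 procedure twice and compare the results. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` in place of the `.mat` files, sets `n_estimators=10` to keep it fast, and the helper `mean_scores` and its `rf_seed` parameter are hypothetical names that do not appear in the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the question's x and y (assumption: the real data is not available here)
x, y = make_classification(n_samples=200, n_features=10, random_state=0)


def mean_scores(rf_seed):
    """Run the 10x10 SSS procedure; rf_seed is passed to the forest as random_state."""
    clf = make_pipeline(
        RobustScaler(),
        RandomForestClassifier(n_estimators=10, random_state=rf_seed),
    )
    means = []
    for i in range(10):
        # The splits depend only on i, so they are the same on every call
        sss = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=i)
        means.append(cross_val_score(clf, x, y, cv=sss).mean())
    return np.array(means)


# Same forest seed on both runs: splits and trees are both reproducible
seeded_a = mean_scores(rf_seed=0)
seeded_b = mean_scores(rf_seed=0)

# No forest seed: the splits are still fixed, but the trees are grown differently each run
unseeded_a = mean_scores(rf_seed=None)
unseeded_b = mean_scores(rf_seed=None)

print("seeded runs identical:  ", np.array_equal(seeded_a, seeded_b))
print("unseeded runs identical:", np.array_equal(unseeded_a, unseeded_b))
print("max difference between unseeded runs:", np.abs(unseeded_a - unseeded_b).max())
```

With a fixed forest seed the two runs should match exactly, because the only remaining source of randomness (bootstrap sampling and feature selection inside the trees) is pinned. Without it, the splits stay the same but the reported accuracies will typically differ a little from run to run, so the averaged result is still a valid estimate but not exactly reproducible.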



