Monday, October 21, 2019

RandomForestClassifier random_state impact

I have split my dataset into 70/25/5 train/validation/test.

I run a cross-validated hyper-parameter search for a few models on my training set (random forest, XGBoost, SVC, logistic regression), all scored with AUROC. Once each model is cross-validated and refitted, I would like to select the “best” model based on its AUROC on my validation set. I noticed that the random forest results are strongly sensitive to the random_state parameter, while the other models are stable.

In that case, how am I supposed to select the most appropriate model? Should I average the AUROC of random forest models fitted with different random_state values? Am I missing something?
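
To make the averaging idea concrete, here is a minimal sketch of what I mean. It is not my actual pipeline: the data is a synthetic stand-in from make_classification, the helper name validation_aurocs and the n_estimators value are made up for illustration, and the tuned hyper-parameters would be passed in place of the ones shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption: binary labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def validation_aurocs(seeds, **rf_params):
    """Fit one forest per seed and return the validation AUROC of each."""
    scores = []
    for seed in seeds:
        model = RandomForestClassifier(random_state=seed, **rf_params)
        model.fit(X_train, y_train)
        # AUROC is computed from the predicted probability of the positive class.
        proba = model.predict_proba(X_val)[:, 1]
        scores.append(roc_auc_score(y_val, proba))
    return np.array(scores)


# Hypothetical hyper-parameters; the cross-validated ones would go here.
scores = validation_aurocs(seeds=range(10), n_estimators=100)
print(f"mean AUROC = {scores.mean():.3f}, std = {scores.std(ddof=1):.3f}")
```

The standard deviation across seeds gives a rough idea of how much of the gap between two models could be explained by the seed alone.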

See the confusion matrices and classification reports below.

TIA

random_state=0

Confusion=
[[67 78]
 [62 76]]

         precision    recall  f1-score   support
   -1.0       0.52      0.46      0.49       145
    1.0       0.49      0.55      0.52       138

random_state=10000

Confusion=
[[ 15 130]
 [ 13 125]]

         precision    recall  f1-score   support
   -1.0       0.54      0.10      0.17       145
    1.0       0.49      0.91      0.64       138
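
As a sanity check, the precision and recall values in the two reports can be recomputed from the confusion matrices. This is a small sketch assuming the scikit-learn layout (rows are true classes, columns are predicted classes, in the order -1.0 then 1.0), which is consistent with the row sums matching the support column. The helper name per_class_metrics is made up for illustration.

```python
import numpy as np

# Confusion matrices copied from the results above
# (assumed layout: rows = true class, columns = predicted class).
confusions = {
    0: np.array([[67, 78], [62, 76]]),
    10000: np.array([[15, 130], [13, 125]]),
}


def per_class_metrics(cm):
    """Return (precision, recall) arrays with one entry per class."""
    true_pos = np.diag(cm)
    precision = true_pos / cm.sum(axis=0)  # column sums = predicted counts
    recall = true_pos / cm.sum(axis=1)     # row sums = support
    return precision, recall


for seed, cm in confusions.items():
    precision, recall = per_class_metrics(cm)
    # Share of validation rows predicted as the positive class (1.0).
    predicted_positive = cm[:, 1].sum() / cm.sum()
    print(seed, precision.round(2), recall.round(2), round(predicted_positive, 2))
```

With random_state=10000, about 90% of the validation rows are predicted as class 1.0, against about 54% with random_state=0. Both reports depend on the default decision threshold, whereas AUROC is computed from the ranking of predicted probabilities, so the two views may not move together.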


