I have split my dataset 70/25/5 into train/validation/test sets.
I run a cross-validated hyper-parameter search for a few models (RandomForest, XGBoost, SVC, logistic regression, all scored with AUROC) on my training set. Once everything is cross-validated and refit, I would like to select the "best" model based on AUROC on my validation set. I noticed that the results for the RandomForest are strongly sensitive to the random_state parameter (while they are stable for the other models).
In that case, how am I supposed to select the most appropriate model? Should I average the AUROC of RandomForest models fitted with different random_state values? Am I missing something?
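To make the averaging idea concrete, here is a minimal sketch of what I have in mind, assuming a scikit-learn workflow. The function name mean_val_auroc, the seed range, and best_params are placeholders I made up; X_train/y_train/X_val/y_val stand in for my actual splits:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def mean_val_auroc(X_train, y_train, X_val, y_val, best_params, seeds=range(10)):
    """Refit the forest once per seed and average validation AUROC.

    best_params: tuned hyper-parameters from cross-validation
    (must not itself contain random_state).
    """
    scores = []
    for seed in seeds:
        clf = RandomForestClassifier(random_state=seed, **best_params)
        clf.fit(X_train, y_train)
        # classes_ is sorted, so column 1 is P(y = 1)
        proba = clf.predict_proba(X_val)[:, 1]
        scores.append(roc_auc_score(y_val, proba))
    return np.mean(scores), np.std(scores)

The standard deviation across seeds would at least quantify how unstable the forest is relative to the gap between models.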
See the confusion matrices and classification reports below.
TIA
random_state=0
Confusion matrix:
[[67 78]
 [62 76]]

        precision  recall  f1-score  support
 -1.0        0.52    0.46      0.49      145
  1.0        0.49    0.55      0.52      138
random_state=10000
Confusion matrix:
[[ 15 130]
 [ 13 125]]

        precision  recall  f1-score  support
 -1.0        0.54    0.10      0.17      145
  1.0        0.49    0.91      0.64      138