Tuesday, 31 March 2020

Ranger Predicted Class Probability of each row in a data frame

I have a question related to the post "Predicted probabilities in R ranger package".

Imagine I have a mixed data frame, df (containing both factor and numeric variables), and I want to do classification using ranger. I split this data frame into training and test sets, Train_Set and Test_Set. BiClass is the factor variable I want to predict; it has two levels, 0 and 1.

I want to calculate class probabilities with ranger and attach them to the data frame, using the following commands:

Biclass.ranger <- ranger(BiClass ~ ., data = Train_Set, num.trees = 500, importance = "impurity", save.memory = TRUE, probability = TRUE)

probabilities <- as.data.frame(predict(Biclass.ranger, data = Test_Set, num.trees = 200, type='response', verbose = TRUE)$predictions)

The resulting data frame, probabilities, has 2 columns (0 and 1) and as many rows as Test_Set.

Does this mean that if I append probabilities to Test_Set as its last two columns, each row shows the probability of that observation being 0 or 1? Is my understanding correct?
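To make the setup concrete, here is a minimal sketch of what I mean by attaching the probabilities. The toy data frame, its columns x1 and x2, and the names fit, prob, prob_0, prob_1 and Test_With_Prob are made up for illustration and are not my real data. It assumes a probability forest returns one row per test observation and one column per class level, which is what I observe.

```r
library(ranger)

set.seed(1)
n <- 200
# Hypothetical stand-in for the mixed data frame described above
df <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b"), n, replace = TRUE))
)
df$BiClass <- factor(as.integer(df$x1 + rnorm(n) > 0))

idx <- sample(n, 150)
Train_Set <- df[idx, ]
Test_Set  <- df[-idx, ]

fit  <- ranger(BiClass ~ ., data = Train_Set, num.trees = 500, probability = TRUE)
prob <- predict(fit, data = Test_Set)$predictions

# One row per test observation, one column per class level
dim(prob)
colnames(prob)

# Select columns by name rather than by position, then attach them
Test_With_Prob <- cbind(Test_Set, prob_0 = prob[, "0"], prob_1 = prob[, "1"])
head(Test_With_Prob)
```

Selecting the columns by name avoids relying on the order in which the class columns happen to be returned.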

My second question: when I attempt to calculate a confusion matrix with

pred = predict(Biclass.ranger, data=Test_Set, num.trees = 500, type='response', verbose = TRUE)
table(Test_Set$BiClass, pred$predictions)

I get the following error: Error in table(Test_Set$BiClass, pred$predictions) : all arguments must have the same length

What am I doing wrong?
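For reference, the length mismatch can be reproduced on toy data. The sketch below uses a made-up data frame and hypothetical names (fit, pred_class, conf_mat). It shows that with probability = TRUE, pred$predictions is a two-column matrix rather than a vector of class labels, so it holds twice as many elements as Test_Set$BiClass. Taking the most probable class per row is one possible workaround, not necessarily the recommended one; a fixed cut-off or a forest fitted without probability = TRUE might be more appropriate.

```r
library(ranger)

set.seed(1)
n <- 200
# Hypothetical toy data, not the real data set
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$BiClass <- factor(as.integer(df$x1 + df$x2 + rnorm(n) > 0))

idx <- sample(n, 150)
Train_Set <- df[idx, ]
Test_Set  <- df[-idx, ]

fit  <- ranger(BiClass ~ ., data = Train_Set, num.trees = 500, probability = TRUE)
pred <- predict(fit, data = Test_Set)

# The two arguments passed to table() differ in length
length(Test_Set$BiClass)
length(pred$predictions)

# Assumed workaround: pick the column with the highest probability in each row
best <- max.col(pred$predictions, ties.method = "first")
pred_class <- factor(colnames(pred$predictions)[best],
                     levels = levels(Test_Set$BiClass))

conf_mat <- table(actual = Test_Set$BiClass, predicted = pred_class)
conf_mat
```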



