I am working on a project that involves fitting a random forest model to a housing dataset in order to predict resale housing prices. Based on the variable importance plot, I made the following interpretation. (Figure: VarImptPlot)
- Based on %IncMSE, town and remaining lease are important predictors: randomly shuffling their values would increase the MSE by at least 50%.
My concern is that town is a categorical variable with over 20 classes, and not all of them are shown in the plot. Is it still possible to conclude that town is a significant predictor, given that many of its levels rank high in the plot?
- Based on IncNodePurity, floor area and remaining lease are identified as important variables: splitting on them yields a substantial decrease in node impurity.
- Hence town, remaining lease, and floor area (sqm) are significant predictors that influence the resale price of HDB flats.
Is it correct to interpret the significance of the predictors in this way?
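To make the shuffling idea behind a permutation-based importance concrete, here is a minimal Python sketch. It is not the random forest used in the project: the data are synthetic, the `model` function is a hypothetical stand-in for a fitted predictor, and names such as `pct_increase_in_mse`, `floor_area`, and `town_A` are made up for illustration. It computes a literal percentage increase in MSE, which may not match how a given package scales its %IncMSE values (some implementations normalize the increase rather than report a raw percentage). The sketch also shuffles all dummy columns of a categorical variable together, which is one possible way to score town as a single predictor instead of level by level.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of the model's predictions over the rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def pct_increase_in_mse(model, rows, targets, columns, n_repeats=20, seed=0):
    """Average percentage increase in MSE after jointly shuffling `columns`.

    All listed columns are permuted with the same row order, so the dummy
    columns of one categorical variable are treated as a single predictor.
    """
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        shuffled = [dict(r) for r in rows]  # copy so the originals stay intact
        for i, j in enumerate(order):
            for c in columns:
                shuffled[i][c] = rows[j][c]
        increases.append(mse(model, shuffled, targets) - baseline)
    return 100.0 * (sum(increases) / n_repeats) / baseline


# Synthetic data (assumption): price depends on floor area and town only.
data_rng = random.Random(42)
towns = ["A", "B", "C"]
town_effect = {"A": 0.0, "B": 50.0, "C": 100.0}
rows, targets = [], []
for _ in range(300):
    town = data_rng.choice(towns)
    row = {"floor_area": data_rng.uniform(60, 120),
           "irrelevant": data_rng.uniform(0, 1)}
    for name in towns:
        row["town_" + name] = 1.0 if name == town else 0.0
    rows.append(row)
    targets.append(2.0 * row["floor_area"] + town_effect[town]
                   + data_rng.gauss(0, 10))


def model(r):
    """Hypothetical stand-in for a fitted model (not a random forest)."""
    return 2.0 * r["floor_area"] + 50.0 * r["town_B"] + 100.0 * r["town_C"]


town_columns = ["town_" + name for name in towns]
print("town (all levels):", pct_increase_in_mse(model, rows, targets, town_columns))
print("floor_area:", pct_increase_in_mse(model, rows, targets, ["floor_area"]))
print("irrelevant:", pct_increase_in_mse(model, rows, targets, ["irrelevant"]))
```

A variable the model never uses gets an increase of zero, while a variable the model relies on gets a large one. Scoring the town columns as a group gives one number for the whole variable, so it does not matter which individual levels happen to appear in a plot.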