I am trying to build several random forest classifiers and analyze their performance. I also want to analyze the performance of the individual component trees in each forest, but I have not found a way to do that. Below is my code:
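import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()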
data=pd.DataFrame({
'sepal length':iris.data[:,0],
'sepal width':iris.data[:,1],
'petal length':iris.data[:,2],
'petal width':iris.data[:,3],
'species':iris.target
})
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]  # Features
y = data['species']  # Labels
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
As you can see, I can print the overall accuracy score of the forest, but I don't know how to get the accuracy score of the individual trees. There are 100 component trees in this forest, and I would like to know how to get their individual characteristics, such as accuracy score, visualization, and confusion matrix. It would also be helpful to retrieve the 10 component trees with the highest accuracy scores. Thanks, and please let me know how I can edit the question to make it better.
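One possible way to get at the individual trees, offered as a sketch rather than a definitive answer: a fitted scikit-learn `RandomForestClassifier` exposes its component trees through the `estimators_` attribute, and each one can be scored like any other classifier. The example below makes a few assumptions that are not in the code above: it uses plain NumPy arrays instead of the DataFrame (with the DataFrame, `X_test.to_numpy()` would be passed to the trees), it fixes `random_state=0` for repeatability, and the helper `tree_predict` is a made-up name. The trees predict class indices, so the helper maps them back through `clf.classes_`.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import export_text

iris = datasets.load_iris()
# Assumption: NumPy arrays and a fixed seed, unlike the original snippet.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)


def tree_predict(tree, X):
    """Hypothetical helper: predict with one component tree.

    The component trees output class indices, so map them back
    to the original labels through clf.classes_.
    """
    return clf.classes_[tree.predict(X).astype(int)]


# Accuracy of every component tree on the held-out test set.
tree_scores = np.array(
    [accuracy_score(y_test, tree_predict(tree, X_test)) for tree in clf.estimators_]
)

# Indices of the 10 trees with the highest test accuracy, best first.
top10_idx = np.argsort(tree_scores)[::-1][:10]
for i in top10_idx:
    print(f"tree {i}: accuracy = {tree_scores[i]:.3f}")

# Confusion matrix and a text rendering of the single best tree.
best_tree = clf.estimators_[top10_idx[0]]
best_cm = confusion_matrix(
    y_test, tree_predict(best_tree, X_test), labels=clf.classes_
)
print(best_cm)
print(export_text(best_tree, feature_names=list(iris.feature_names), max_depth=2))
```

For a graphical view, `sklearn.tree.plot_tree(best_tree)` should work in the same way if matplotlib is installed. Note that ranking trees by test-set accuracy and then reporting that same accuracy gives an optimistic picture; a separate validation split would be a safer basis for picking the top 10.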