I am using the eli5 explain_weights function on a RandomForestClassifier from scikit-learn. According to the eli5 documentation (pp. 30-31), this function can return the feature importance (mean weight plus standard deviation) for each class to predict. However, when I use it on my dataset, the function only returns feature importances for the whole model, not for each class.
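For context, a minimal sketch of the difference I am seeing (this only uses scikit-learn attributes, not eli5): a random forest stores a single importance vector for the whole model, whereas a linear multiclass model such as SGDClassifier stores one row of weights per class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

# Same setup as in the example below: 5 features, 4 classes
x, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=2, n_classes=4, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(x, y)
sgd = SGDClassifier(random_state=0).fit(x, y)

print(rf.feature_importances_.shape)  # (5,)  -> one vector for the whole model
print(sgd.coef_.shape)                # (4, 5) -> one weight row per class
```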
Here is a reproducible example generated with the scikit-learn make_classification function:
import pandas as pd
import eli5
from eli5.sklearn import PermutationImportance
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

features = ['feat_1', 'feat_2', 'feat_3', 'feat_4', 'feat_5']

# Synthetic 4-class dataset with 5 features
x, y = datasets.make_classification(n_samples=200, n_features=5, n_informative=3, n_redundant=2, n_classes=4)
df = pd.concat([pd.DataFrame(x, columns=features), pd.DataFrame(y, columns=['classe'])], axis=1)
df = df.replace({'classe': {0: '1st', 1: '2nd', 2: '3rd', 3: '4th'}})
labels = pd.unique(df['classe'])

train, test = train_test_split(df, stratify=df['classe'], test_size=0.40)
rf = RandomForestClassifier()
rf.fit(train[features], train['classe'])

# Permutation importance computed on the held-out test split
perm = PermutationImportance(rf).fit(test[features], test['classe'])
var_imp_classes = eli5.explain_weights(perm, top=5, targets=labels, target_names=labels, feature_names=features)
print(eli5.format_as_text(var_imp_classes))
I have renamed the features and classes, but this is not required to reproduce the issue. Likewise, the PermutationImportance step can be skipped by passing rf instead of perm to eli5.explain_weights.
This code returns the following:
Explained as: feature importances
Feature importances, computed as a decrease in score when feature
values are permuted (i.e. become noise). This is also known as
permutation importance.
If feature importances are computed on the same data as used for training,
they don't reflect importance of features for generalization. Use a held-out
dataset if you want generalization feature importances.
0.3475 ± 0.1111 feat_1
0.1900 ± 0.1134 feat_4
0.0700 ± 0.0200 feat_3
0.0550 ± 0.0624 feat_2
0.0300 ± 0.0300 feat_5
I can't find the detailed result for each class, as shown in this question. I am using explain_weights rather than show_weights because I would like to store the output in a DataFrame, but the same issue arises with show_weights. I also get the same behaviour with other classifiers, such as SGDClassifier, and after removing the PermutationImportance step.
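For the DataFrame part, here is a minimal sketch of a workaround I have considered, using scikit-learn's own permutation_importance (sklearn.inspection, available since scikit-learn 0.22) instead of eli5, and assembling the mean/std table with pandas myself; like eli5, it still yields one model-level table, not one per class:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

features = ['feat_1', 'feat_2', 'feat_3', 'feat_4', 'feat_5']
x, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=2, n_classes=4, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, stratify=y, test_size=0.40, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(x_train, y_train)

# Permute each feature on the held-out split and measure the score drop
result = permutation_importance(rf, x_test, y_test, n_repeats=10,
                                random_state=0)

# Same "mean weight +/- std" layout as the eli5 text output, as a DataFrame
var_imp = (pd.DataFrame({'feature': features,
                         'mean_weight': result.importances_mean,
                         'std': result.importances_std})
           .sort_values('mean_weight', ascending=False)
           .reset_index(drop=True))
print(var_imp)
```

I believe eli5 also provides a format_as_dataframe helper for its explanation objects, but it has the same per-model granularity here.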
What is wrong with my code?
Thank you all!