Monday, December 9, 2019

eli5 explain_weights does not return feature importances for each class with a scikit-learn Random Forest classifier

I am using the eli5 explain_weights function on a Random Forest classifier from scikit-learn. The eli5 documentation (pp. 30-31) shows that this function can return feature importances (mean weight plus standard deviation) for each target class. However, on my dataset the function only returns feature importances for the model as a whole, not for each class.
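For comparison, the per-class layout in the documentation seems to come from estimators that carry one weight vector per class. Here is a minimal sketch of the output I expected, assuming a multiclass LogisticRegression (my stand-in for the documented example, not the model from my actual code):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
import eli5

# A linear model stores one coefficient vector per class, so eli5
# prints a separate "y=<class>" section for each target.
x_demo, y_demo = make_classification(n_samples=200, n_features=5, n_informative=3, n_redundant=2, n_classes=4)
logreg = LogisticRegression(max_iter=1000).fit(x_demo, y_demo)
print(eli5.format_as_text(eli5.explain_weights(logreg)))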

Here is a reproducible example, generated with scikit-learn's make_classification function:

import pandas as pd
import eli5
from eli5.sklearn import PermutationImportance
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Toy 4-class dataset
x, y = datasets.make_classification(n_samples=200, n_features=5, n_informative=3, n_redundant=2, n_classes=4)

features = ['feat_1', 'feat_2', 'feat_3', 'feat_4', 'feat_5']
df = pd.concat([pd.DataFrame(x, columns=features), pd.DataFrame(y, columns=['classe'])], axis=1)
df = df.replace({'classe': {0: '1st', 1: '2nd', 2: '3rd', 3: '4th'}})

labels = pd.unique(df['classe'])

train, test = train_test_split(df, stratify=df['classe'], test_size=0.40)

rf = RandomForestClassifier()
rf.fit(train[features], train['classe'])

# Permutation importance computed on the held-out test set
perm = PermutationImportance(rf).fit(test[features], test['classe'])
var_imp_classes = eli5.explain_weights(perm, top=5, targets=labels, target_names=labels, feature_names=features)

print(eli5.format_as_text(var_imp_classes))

Renaming the features and classes is not essential here. Likewise, the PermutationImportance step can be skipped by passing rf instead of perm to eli5.explain_weights, as in the sketch below.
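For instance, this direct call (reusing the variables above) still yields a single global ranking:

# Explaining the fitted forest directly - eli5 falls back to the
# model's built-in (Gini) feature importances, again with no classes
var_imp_rf = eli5.explain_weights(rf, top=5, feature_names=features)
print(eli5.format_as_text(var_imp_rf))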

This code prints the following output:

Explained as: feature importances

Feature importances, computed as a decrease in score when feature
values are permuted (i.e. become noise). This is also known as 
permutation importance.

If feature importances are computed on the same data as used for training, 
they don't reflect importance of features for generalization. Use a held-out
dataset if you want generalization feature importances.

0.3475 ± 0.1111  feat_1
0.1900 ± 0.1134  feat_4
0.0700 ± 0.0200  feat_3
0.0550 ± 0.0624  feat_2
0.0300 ± 0.0300  feat_5

I can't find the detailed per-class results shown in this question. I am using explain_weights rather than show_weights because I would like to store the output in a DataFrame, but the same issue arises with show_weights. I also get the same behavior with other classifiers, such as SGDClassifier, and after removing the PermutationImportance step.
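For completeness, this is how I store the output in a DataFrame (using eli5.format_as_dataframe, assuming I am reading its API correctly):

# Flatten the explanation into a DataFrame; I only get 'feature',
# 'weight' and 'std' columns, with no per-class breakdown.
imp_df = eli5.format_as_dataframe(var_imp_classes)
print(imp_df)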

What is wrong with my code?

Thank you all!



