Friday, 10 August 2018

Confused about random_state in sklearn

I'm using a random forest (RF) for descriptive modelling, as follows:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import class_weight

# Keyword arguments are required by recent scikit-learn releases
class_weights = class_weight.compute_class_weight('balanced', classes=np.unique(y), y=y)
class_weights = dict(enumerate(class_weights))
class_weights

{0: 0.5561096747856852, 1: 4.955559597429368}

clf = RandomForestClassifier(class_weight=class_weights, random_state=0)
clf = clf.fit(X, y)

cross_val_score(clf, X, y, cv=10, scoring='f1').mean()

I then plot the variable importances with:

import matplotlib.pyplot as plt

def plot_importances(clf, features, n):
    """Plot the n largest feature importances and return their names."""
    importances = clf.feature_importances_
    # Feature positions sorted from most to least important
    indices = np.argsort(importances)[::-1]

    # Keep only the top n features (n=None or 0 keeps them all)
    if n:
        indices = indices[:n]

    plt.figure(figsize=(10, 5))
    plt.title("Feature importances")
    plt.bar(range(len(indices)), importances[indices], align='center')
    plt.xticks(range(len(indices)), features[indices], rotation=90)
    plt.xlim([-1, len(indices)])
    plt.show()

    return features[indices]

imp = plot_importances(clf, X.columns, 30)

Since random_state is fixed, I expected the variable importances to be identical across runs. However, they change every time I re-run the notebook.

I don't understand why that is. Is it related to the cross_val_score method somehow?
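One way to narrow this down is a small reproducibility check. The sketch below is only an illustration: it uses a synthetic dataset from make_classification as a stand-in for the real X and y, and the names X_demo, y_demo and fit_importances are made up for the example. It fits the same forest twice with the same random_state and then calls cross_val_score, which fits clones of the estimator rather than the object passed in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced stand-in for the real X and y (an assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, weights=[0.9, 0.1], random_state=0
)

def fit_importances(seed):
    """Fit a forest with the given seed; return it with a copy of its importances."""
    clf = RandomForestClassifier(
        n_estimators=20, class_weight="balanced", random_state=seed
    )
    clf.fit(X_demo, y_demo)
    return clf, clf.feature_importances_.copy()

# Same data and same seed, fitted twice.
clf_a, imp_a = fit_importances(0)
clf_b, imp_b = fit_importances(0)
same_seed_match = np.allclose(imp_a, imp_b)

# cross_val_score works on clones, so clf_a itself should be left as fitted.
cross_val_score(clf_a, X_demo, y_demo, cv=5, scoring="f1")
unchanged_after_cv = np.allclose(imp_a, clf_a.feature_importances_)

print("same seed, same importances:", same_seed_match)
print("importances unchanged after cross_val_score:", unchanged_after_cv)
```

If both checks hold on the real data as well, the variation probably comes from X or y differing between notebook runs, for example an unseeded sampling or train/test split earlier in the notebook, rather than from the forest or from cross_val_score.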
