Monday, 18 November 2019

KMeans not returning reproducible results in sklearn, even with random_state fixed

The following code fits KMeans for several values of n_clusters and tries to pick the "best" one using the inertia criterion. However, it is not reproducible: even with random_state fixed, every call to kmeans(df) on the same dataset produces a different clustering, and even a different n_clusters. Am I missing something here?

import numpy as np
from sklearn.cluster import KMeans
from tqdm import tqdm_notebook

def kmeans(df):
    inertia = []
    models = {}
    start = 3
    end = 40
    # Fit one model per candidate n_clusters in [start, end)
    for i in tqdm_notebook(range(start, end)):
        k = KMeans(n_clusters=i, init='k-means++', n_init=50, random_state=10, n_jobs=-1).fit(df.values)
        inertia.append(k.inertia_)
        models[i] = k
    # Elbow point: position of the largest second derivative of the inertia curve
    ep = np.argmax(np.gradient(np.gradient(np.array(inertia)))) + start
    return models[ep]
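
To narrow this down, one check is whether a single fit is reproducible at all, separately from the elbow selection. The following is only a minimal sketch under assumptions: it uses synthetic make_blobs data as a stand-in for df.values (the real dataset is not shown), the helper names fit_once and elbow are hypothetical, and n_jobs is left out because recent scikit-learn releases no longer accept it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for df.values (assumption: the original data is not available)
X, _ = make_blobs(n_samples=300, centers=5, n_features=4, random_state=0)

def fit_once(X, n_clusters, seed=10):
    # Hypothetical helper: one KMeans fit with a fixed seed and no n_jobs
    return KMeans(n_clusters=n_clusters, init="k-means++",
                  n_init=10, random_state=seed).fit(X)

def elbow(inertia, start):
    # Same selection rule as in kmeans(df): largest second derivative of inertia
    return int(np.argmax(np.gradient(np.gradient(np.array(inertia, dtype=float))))) + start

# Two independent fits with identical settings
a = fit_once(X, 5)
b = fit_once(X, 5)

same_labels = np.array_equal(a.labels_, b.labels_)
same_inertia = bool(np.isclose(a.inertia_, b.inertia_))
print("same labels:", same_labels, "| same inertia:", same_inertia)

# The selection step is a pure function of the inertia list
print("elbow:", elbow([100, 50, 30, 25, 22, 20], start=3))
```

If the two fits agree here but kmeans(df) still varies, the difference would have to come from somewhere else, for example the parallel execution or df itself changing between calls. The elbow step cannot be the source on its own, since it returns the same value for the same inertia list.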


