The following code fits KMeans for several values of n_clusters and tries to find the "best" n_clusters using the inertia criterion. However, it is not reproducible: even with random_state fixed, every time I call kmeans(df) on the same dataset it produces a different clustering, and sometimes even a different n_clusters. Am I missing something here?
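import numpy as np
from sklearn.cluster import KMeans
from tqdm import tqdm_notebook

def kmeans(df):
    inertia = []
    models = {}
    start = 3
    end = 40
    # Fit one model per candidate n_clusters and record its inertia.
    for i in tqdm_notebook(range(start, end)):
        # n_jobs was deprecated in scikit-learn 0.23 and removed in 1.0; omit it there.
        k = KMeans(n_clusters=i, init='k-means++', n_init=50, random_state=10, n_jobs=-1).fit(df.values)
        inertia.append(k.inertia_)
        models[i] = k
    # Elbow: index of the largest second derivative of the inertia curve, shifted back to k.
    ep = np.argmax(np.gradient(np.gradient(np.array(inertia)))) + start
    return models[ep]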
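To see what the selection step does with the inertia values, here is a minimal pure-Python sketch of the np.argmax(np.gradient(np.gradient(...))) + start expression. The helper names gradient and elbow_k and the inertia numbers are made up for illustration; gradient is assumed to mirror np.gradient with unit spacing and its default edge handling.

```python
def gradient(values):
    """Mimic np.gradient for unit spacing: central differences in the
    interior, one-sided differences at the two ends (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    out = []
    for i in range(n):
        if i == 0:
            out.append(values[1] - values[0])
        elif i == n - 1:
            out.append(values[-1] - values[-2])
        else:
            out.append((values[i + 1] - values[i - 1]) / 2.0)
    return out


def elbow_k(inertia, start):
    """Return the k whose position maximizes the twice-applied gradient
    (hypothetical helper; ties resolve to the first maximum, like argmax)."""
    curvature = gradient(gradient(inertia))
    best_index = max(range(len(curvature)), key=curvature.__getitem__)
    return best_index + start


# Made-up inertia values for k = 3..8, not taken from any real dataset.
example_inertia = [100.0, 60.0, 30.0, 20.0, 15.0, 12.0]
print(elbow_k(example_inertia, start=3))  # prints 5
```

Because the chosen k depends only on the list of inertia values, printing that list on two successive calls of kmeans(df) would show whether the fitted models themselves differ between runs or only the selection does.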