Friday, June 4, 2021

How to control randomness when training word2vec in Gensim?

I'm working with gensim's Word2Vec model, but different runs on the same dataset produce different models. I tried fixing the seed, setting PYTHONHASHSEED, and setting the number of workers to one, but none of these made the results reproducible.

I included my code here:

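import gensim

def word2vec_model(data):
    # Train on the tokenized sentences in `data`.
    # Note: `size` is the gensim 3.x parameter name (gensim 4.x renamed it to `vector_size`).
    model = gensim.models.Word2Vec(data, size=300, window=20, workers=4, min_count=1)
    # Save only the word vectors, then reload them memory-mapped and read-only.
    model.wv.save("word2vec.wordvectors")
    embed = gensim.models.KeyedVectors.load("word2vec.wordvectors", mmap='r')
    return embed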

I then checked the output of the following call:

Cooking.similar_by_vector(Cooking['apple'], topn=10, restrict_vocab=None)

Example output:

[('apple', 0.9999999403953552),
 ('charcoal', 0.2554503381252289),
 ('response', 0.25395694375038147),
 ('boring', 0.2537640631198883),
 ('healthy', 0.24807702004909515),
 ('wrong', 0.24783077836036682),
 ('juice', 0.24270494282245636),
 ('lacta', 0.2373320758342743),
 ('saw', 0.2359238862991333),
 ('insufferable', 0.23015251755714417)]

Each run gives me different similar words.

Does anyone know how to solve this? I would appreciate any example code or pointers to documentation. Thank you in advance!
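
One thing worth ruling out for the PYTHONHASHSEED attempt: Python reads that variable only when the interpreter starts, so assigning it through os.environ inside an already running script does not change string hashing in that process. The sketch below uses only the standard library, and the helper name hash_in_child is hypothetical, not part of gensim. It starts a fresh interpreter with the variable set and shows that string hashes then become stable across launches. The gensim documentation also states that a fully reproducible run needs a fixed seed together with workers=1, whereas the function above passes workers=4 and no seed.

```python
import os
import subprocess
import sys


def hash_in_child(text, hash_seed):
    """Return hash(text) as computed by a fresh interpreter.

    The child process is started with PYTHONHASHSEED=hash_seed, which is
    the only point at which Python honours that variable.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Same seed, two separate interpreter launches: the hashes should match.
    first = hash_in_child("apple", 0)
    second = hash_in_child("apple", 0)
    print("stable across launches:", first == second)
```

In practice this means launching the training script itself with the variable already exported, for example PYTHONHASHSEED=0 python train.py from a shell (train.py is a placeholder name), instead of setting it from within the script.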



