Wednesday, April 25, 2018

Why are serialized numpy random_state objects different when they are loaded?

I'm trying to figure out why certain cross-validations using a defined set of indices, the same input data, and the same random_state in sklearn give different results with the same LogisticRegression hyperparameters. My first thought was that the initial random_state might differ between runs. Then I noticed that when I pickle the random_state and load it back, comparing the two objects directly says they are different, yet the values returned by the get_state method are the same. Why is this?

import pickle

import numpy as np

random_state = np.random.RandomState(0)
print(random_state)
# <mtrand.RandomState object at 0x12424e480>

with open("./rs.pkl", "wb") as f:
    pickle.dump(random_state, f, protocol=pickle.HIGHEST_PROTOCOL)
with open("./rs.pkl", "rb") as f:
    random_state_copy = pickle.load(f)
    print(random_state_copy)
# <mtrand.RandomState object at 0x126465240>
print(random_state == random_state_copy)
# False
print(str(random_state.get_state()) == str(random_state_copy.get_state()))
# True
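
For reference, here is a minimal sketch of comparing the two generators by their internal state instead of with ==. RandomState does not define its own equality, so == falls back to object identity, and a freshly unpickled object is always a different object. The helper name same_state is hypothetical (it is not part of numpy), and the sketch assumes get_state() returns the legacy 5-tuple of (name, key array, position, has_gauss, cached_gaussian).

```python
import pickle

import numpy as np


def same_state(a, b):
    """Hypothetical helper: True if two RandomState objects hold the same state."""
    # Assumes the legacy 5-tuple layout returned by RandomState.get_state().
    name_a, key_a, pos_a, has_gauss_a, cached_a = a.get_state()
    name_b, key_b, pos_b, has_gauss_b, cached_b = b.get_state()
    return (
        name_a == name_b
        and np.array_equal(key_a, key_b)  # the Mersenne Twister key array
        and (pos_a, has_gauss_a, cached_a) == (pos_b, has_gauss_b, cached_b)
    )


random_state = np.random.RandomState(0)
# Round-trip through pickle in memory instead of a file on disk.
random_state_copy = pickle.loads(
    pickle.dumps(random_state, protocol=pickle.HIGHEST_PROTOCOL)
)

# Identity differs: these are two distinct Python objects.
print(random_state is random_state_copy)

# The internal state is the same.
print(same_state(random_state, random_state_copy))

# Both generators therefore produce the same draws.
print(np.array_equal(random_state.rand(5), random_state_copy.rand(5)))
```

If this holds, the pickled copy should behave identically as a generator, which would suggest the differing cross-validation results come from somewhere other than the random_state itself.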

Versions:

numpy = '1.13.3'

Python = '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'



