Monday, 29 August 2016

Is random_state maintained when running the script again?

Suppose I have a program, called script.py:

import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection is its replacement
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    df = pd.DataFrame({"x": np.random.randint(5, size=20),
                       "y": np.random.randint(2, size=20)})

    train, test = train_test_split(df, test_size=0.20, random_state=100)

If I run this script from my command line once:

H:\>python script.py

How can I ensure that the train and test DataFrames in subsequent runs (i.e. when I run script.py again) are identical to the train and test DataFrames from previous runs? I know that random_state gives the same split as long as I stay in the same console session, but would the train and test sets still be identical if I came back tomorrow, turned my PC back on, and re-ran script.py?
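To make the question concrete, here is a minimal sketch of what "identical across runs" would require. It assumes a recent scikit-learn (hence sklearn.model_selection) and introduces a hypothetical helper make_split with a hypothetical data_seed argument, neither of which is in the original script. The point it illustrates: random_state only fixes how the rows are shuffled, so the DataFrame itself also has to be the same each time, and in script.py it is drawn from an unseeded np.random.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_split(data_seed=0, split_seed=100):
    """Build the toy DataFrame and split it (hypothetical helper).

    data_seed is an assumption added for this sketch: it seeds the
    generator that creates the data, so the rows are the same every time.
    split_seed plays the role of random_state in the original script.
    """
    rng = np.random.RandomState(data_seed)
    df = pd.DataFrame({"x": rng.randint(5, size=20),
                       "y": rng.randint(2, size=20)})
    return train_test_split(df, test_size=0.20, random_state=split_seed)


# Two independent calls stand in for two separate runs of the script.
train_a, test_a = make_split()
train_b, test_b = make_split()

print(train_a.equals(train_b), test_a.equals(test_b))
```

With both seeds fixed, the two calls should produce equal train and test sets. Whether the result is also stable across different NumPy or scikit-learn versions is not something this sketch guarantees.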

I am comparing the accuracy of different machine learning algorithms, each stored in its own script, which is why I want to make sure the train and test sets are identical across scripts.
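One way to get the same sets into several scripts, sketched here as an assumption rather than as the only approach, is to split once, write the result to disk, and have each script load the saved files. The file names train.csv and test.csv and the temporary directory are hypothetical; the data is seeded only so the sketch is self-contained.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data as in script.py, but seeded so the sketch is repeatable.
rng = np.random.RandomState(0)
df = pd.DataFrame({"x": rng.randint(5, size=20),
                   "y": rng.randint(2, size=20)})

train, test = train_test_split(df, test_size=0.20, random_state=100)

# Hypothetical location; a real project would use a fixed shared folder.
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")
test_path = os.path.join(out_dir, "test.csv")

# Write the split once, keeping the original row index.
train.to_csv(train_path)
test.to_csv(test_path)

# Any other script can then load exactly these rows.
train_loaded = pd.read_csv(train_path, index_col=0)
test_loaded = pd.read_csv(test_path, index_col=0)

print(train_loaded.shape, test_loaded.shape)
```

Saving the split removes the dependence on random number generation at load time: every script reads the same rows regardless of seeds or library versions.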



