Suppose I have a program called `script.py`:
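```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation

if __name__ == "__main__":
    # Toy data: 20 rows of random integers (NumPy is not seeded here)
    df = pd.DataFrame({"x": np.random.randint(5, size=20),
                       "y": np.random.randint(2, size=20)})
    # 80/20 split with a fixed seed for the shuffling
    train, test = train_test_split(df, test_size=0.20, random_state=100)
```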
If I run this script from my command line once:

```
H:\>python script.py
```
How can I ensure that the `train` and `test` dataframes produced in subsequent runs (i.e. when I run `script.py` again) are identical to the `train` and `test` dataframes from previous runs? I know `random_state` gives the same split as long as I stay in the same console session, but would the `train` and `test` sets still be identical if I came back tomorrow, turned my PC back on, and re-ran `script.py`?
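One way to check this empirically, rather than within a single console session, is to run the same split in two completely separate Python processes and compare the results. The sketch below is a minimal test harness, assuming scikit-learn 0.18 or later (`sklearn.model_selection`); it uses deterministic data from `np.arange` so that only the split's own randomness is being tested. The names `CHILD` and `split_in_fresh_process` are hypothetical helpers for this illustration.

```python
import subprocess
import sys

# Code executed in each brand-new interpreter: fixed data, fixed random_state
CHILD = """
import numpy as np
from sklearn.model_selection import train_test_split
X = np.arange(20).reshape(-1, 1)
train, test = train_test_split(X, test_size=0.20, random_state=100)
print(sorted(test.ravel().tolist()))
"""


def split_in_fresh_process():
    """Run CHILD in a new Python process and return the test rows it printed."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    first = split_in_fresh_process()
    second = split_in_fresh_process()
    print("run 1:", first)
    print("run 2:", second)
    print("identical:", first == second)
```

This only compares runs made with the same input data and the same library versions. Note also that in `script.py` above, `df` is built from unseeded `np.random.randint` calls, so the data itself (not just the split) changes from run to run; the rows would need to come from seeded or fixed data for the split contents to match.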
I am comparing the accuracy of several machine learning algorithms, each stored in its own script, which is why I need the train and test sets to be identical across scripts.
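A different technique for sharing one split across several scripts, instead of relying on every script regenerating it, is to create the split once and save it to disk, then have each script load the saved rows. The sketch below assumes pandas and scikit-learn are installed; the function `load_or_create_split` and the `split_cache` directory are hypothetical names chosen for this example.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical location for the cached split; adjust to your project layout
SPLIT_DIR = Path("split_cache")


def load_or_create_split(df, split_dir=SPLIT_DIR):
    """Return (train, test), creating and saving them only on the first call."""
    train_path = split_dir / "train.csv"
    test_path = split_dir / "test.csv"
    if train_path.exists() and test_path.exists():
        # Later runs and other scripts reuse exactly the rows saved earlier
        train = pd.read_csv(train_path, index_col=0)
        test = pd.read_csv(test_path, index_col=0)
    else:
        # First run: split with a fixed seed, then persist both parts
        train, test = train_test_split(df, test_size=0.20, random_state=100)
        split_dir.mkdir(parents=True, exist_ok=True)
        train.to_csv(train_path)
        test.to_csv(test_path)
    return train, test
```

Each algorithm script would call `load_or_create_split(df)` before fitting. Once the CSV files exist they become the source of truth, so the scripts agree even if `df` is regenerated differently later; delete the cache directory to force a new split. Column dtypes may differ slightly after the CSV round trip (for example, `int32` read back as `int64`).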