How to properly split train/test with a random seed and np.random.rand()? [duplicate]

Thursday, February 4, 2021

Why does calling numpy.random.seed():

import numpy as np
np.random.seed(40)

not guarantee that I always get the same train/test split with np.random.rand()?

msk = np.random.rand(len(df)) < 0.8  # boolean mask, roughly 80% True
train = df[msk]   # rows where the mask is True
test = df[~msk]   # all remaining rows
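A likely explanation (an assumption, since the question does not show how the code was run): np.random.seed() resets NumPy's global generator only once, and every later np.random.rand() call advances that shared state. A second split made without re-seeding therefore draws different numbers and produces a different mask. The minimal sketch below illustrates this, with a hypothetical length n standing in for len(df):

```python
import numpy as np

n = 10  # hypothetical stand-in for len(df)

# Seed once, then draw twice: the second call continues from the
# advanced global state, so it returns different numbers.
np.random.seed(40)
vals_a = np.random.rand(n)
vals_b = np.random.rand(n)
mask_a = vals_a < 0.8

# Re-seed immediately before each draw: both draws start from the
# same state and therefore produce the same mask.
np.random.seed(40)
mask_c = np.random.rand(n) < 0.8
np.random.seed(40)
mask_d = np.random.rand(n) < 0.8

print(np.array_equal(vals_a, vals_b))  # False
print(np.array_equal(mask_a, mask_c))  # True
print(np.array_equal(mask_c, mask_d))  # True
```

In other words, the seed call has to run right before the rand() call each time the split is made, not just once at the top of a script or notebook.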

Training set from the first run:

0  12,886,167 
1  12,777,434 
2  14,054,459 
3  14,520,707 
4  12,618,535 
...

Training set from the second run:

0  12,886,167 
1  12,777,434 
2  14,054,459 
3  14,520,707 
5   8,489,784 
...

How can I get the same np.random.rand() train/test separation every time?
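One way to make the split reproducible, sketched here as an assumption rather than as the accepted answer to this question: build the mask from a dedicated np.random.default_rng(seed) Generator created inside a helper, so unrelated random calls elsewhere in the program cannot shift it. The function name split_train_test and the toy DataFrame are hypothetical. Note that default_rng uses a different bit generator (PCG64) than np.random.seed (Mersenne Twister), so this split will not match the one produced by np.random.seed(40).

```python
import numpy as np
import pandas as pd


def split_train_test(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 40):
    """Return (train, test); the same seed always yields the same split."""
    # A fresh Generator on every call, so earlier random draws made
    # elsewhere in the program cannot change the mask.
    rng = np.random.default_rng(seed)
    msk = rng.random(len(df)) < train_frac
    return df[msk], df[~msk]


df = pd.DataFrame({"value": range(20)})  # toy data for illustration

train1, test1 = split_train_test(df)
np.random.rand(5)  # unrelated draw from the global generator
train2, test2 = split_train_test(df)

print(train1.index.equals(train2.index))  # True
```

As with the original mask, the training fraction is only approximately 0.8. If an exact fraction is wanted, pandas' df.sample(frac=0.8, random_state=40) followed by df.drop(train.index) is a common alternative.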
