I want to get reproducible samples of data. A quick experiment suggests that numpy.random.seed does influence pandas.DataFrame.sample, but this is not documented.
Does anybody know whether this is intended behavior that can be relied on?
What I tried
I ran the following a couple of times and always got the same results back:
```python
#!/usr/bin/env python
import pandas as pd
import numpy as np

df = pd.DataFrame([(1, 2, 1),
                   (1, 2, 2),
                   (1, 2, 3),
                   (4, 1, 612),
                   (4, 1, 612),
                   (4, 1, 1),
                   (3, 2, 1),
                   ],
                  columns=['groupid', 'a', 'b'],
                  index=['India', 'France', 'England', 'Germany', 'UK', 'USA',
                         'Indonesia'])

# Seed NumPy's global random number generator, then sample one row five times
np.random.seed(0)
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
```
Which gives:
- Indonesia
- France
- Indonesia
- USA
- England
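A minimal sketch of how the experiment above could be turned into an automated check rather than compared by eye. It assumes that, when `random_state` is left as `None`, `DataFrame.sample` draws from NumPy's global generator, which is what the observation above suggests. The second helper instead passes an explicit `np.random.RandomState` through the documented `random_state` parameter of `DataFrame.sample`, so the result should not depend on the global seed at all. The helper names `draw_with_global_seed` and `draw_with_random_state` are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question.
df = pd.DataFrame(
    [(1, 2, 1), (1, 2, 2), (1, 2, 3), (4, 1, 612), (4, 1, 612), (4, 1, 1), (3, 2, 1)],
    columns=["groupid", "a", "b"],
    index=["India", "France", "England", "Germany", "UK", "USA", "Indonesia"],
)


def draw_with_global_seed(seed, k=5):
    """Hypothetical helper: reseed NumPy's global generator, then draw k rows.

    random_state is left as None, so (by assumption) pandas falls back to
    the global np.random state that np.random.seed controls.
    """
    np.random.seed(seed)
    return [df.sample(n=1).index[0] for _ in range(k)]


def draw_with_random_state(seed, k=5):
    """Hypothetical helper: use the documented random_state parameter.

    One RandomState object is reused across calls, so successive draws can
    differ, but the whole sequence is reproducible and does not touch the
    global NumPy state.
    """
    rng = np.random.RandomState(seed)
    return [df.sample(n=1, random_state=rng).index[0] for _ in range(k)]


first = draw_with_global_seed(0)
second = draw_with_global_seed(0)
print(first == second)  # same global seed, same sequence of index labels

a = draw_with_random_state(0)
np.random.seed(123)  # disturb the global state; it should not matter here
b = draw_with_random_state(0)
print(a == b)  # explicit generator, same sequence regardless of global seed
```

Passing a plain integer such as `random_state=0` to every call would instead return the same row each time, because pandas builds a fresh generator from the integer on each call; reusing one generator object, as above, avoids that.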