Monday, 4 December 2017

Does pandas use NumPy as its random number generator?

I want to get reproducible samples of data. A quick experiment suggests that numpy.random.seed does influence pandas.DataFrame.sample, but this is not documented.

Does anybody know whether pandas uses NumPy's random number generator?

What I tried

I ran the following script a couple of times and always got the same results back:

#!/usr/bin/env python

import pandas as pd
import numpy as np


df = pd.DataFrame([(1, 2, 1),
                   (1, 2, 2),
                   (1, 2, 3),
                   (4, 1, 612),
                   (4, 1, 612),
                   (4, 1, 1),
                   (3, 2, 1),
                   ],
                  columns=['groupid', 'a', 'b'],
                  index=['India', 'France', 'England', 'Germany', 'UK', 'USA',
                         'Indonesia'])
np.random.seed(0)
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))

Which gives:

  • Indonesia
  • France
  • Indonesia
  • USA
  • England
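
The experiment above can be turned into an explicit check. The following is only a sketch, not a statement of documented pandas guarantees: the first part re-seeds NumPy's global generator and compares two runs of five draws, which tests the observed behaviour that df.sample() without arguments follows numpy.random.seed. The second part uses the random_state parameter of DataFrame.sample, which accepts an integer seed or a numpy.random.RandomState and avoids relying on global state. The small DataFrame and the names draw_five, rng_a and rng_b are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data, shaped like the DataFrame in the question.
df = pd.DataFrame(
    {"groupid": [1, 1, 1, 4, 4, 4, 3], "b": [1, 2, 3, 612, 612, 1, 1]},
    index=["India", "France", "England", "Germany", "UK", "USA", "Indonesia"],
)


def draw_five(frame, **kwargs):
    """Return the index labels of five successive single-row samples."""
    return [frame.sample(n=1, **kwargs).index[0] for _ in range(5)]


# Part 1: re-seed the global NumPy generator and repeat the draws.
# If sample() falls back to numpy.random when no random_state is given,
# both sequences are identical.
np.random.seed(0)
global_first = draw_five(df)
np.random.seed(0)
global_second = draw_five(df)
print("global seed reproducible:", global_first == global_second)

# Part 2: pass the seed explicitly instead of relying on global state.
# The same integer seed selects the same row every time.
first = df.sample(n=1, random_state=42)
second = df.sample(n=1, random_state=42)
print("integer seed reproducible:", first.equals(second))

# A dedicated RandomState advances between calls, so successive samples
# differ from each other while the whole sequence stays reproducible.
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)
draws_a = draw_five(df, random_state=rng_a)
draws_b = draw_five(df, random_state=rng_b)
print("RandomState reproducible:", draws_a == draws_b)
```

Passing random_state keeps the sampling independent of any other code that calls numpy.random in between, which a global np.random.seed call cannot promise.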


