Wednesday, 19 July 2017

How to use a consistent random sample in Python Pandas?

Below is code where I read a CSV file and take a random sample of 700 rows from it. I need to do this for multiple files, but if I iterate over the files, the sample (being random) will be different for each file, whereas I want to keep the same rows once they have been randomly chosen.

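import pandas as pd

df = pd.read_csv("file.csv", delim_whitespace=True)  # whitespace-separated columns
df_s = df.sample(n=700)  # 700 random rows, a different set on every call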

My idea is to record the sampled row numbers from the first file and pass them on to the next file, but this does not seem very elegant.
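For reference, here is a minimal sketch of that row-number idea. It assumes every file has at least as many rows as the shortest one and that rows line up by position across files. The helper names `sample_positions` and `take_same_rows`, the `seed` parameter, and the small in-memory frames standing in for the CSV files are all made up for illustration.

```python
import numpy as np
import pandas as pd


def sample_positions(n_rows, n_sample, seed=0):
    """Draw n_sample distinct row positions from range(n_rows), reproducibly."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_rows, size=n_sample, replace=False))


def take_same_rows(frames, n_sample, seed=0):
    """Return the same row positions from every DataFrame in frames."""
    # Assumption: rows correspond by position, so positions are drawn
    # only from the range that exists in the shortest frame.
    n_rows = min(len(df) for df in frames)
    positions = sample_positions(n_rows, n_sample, seed)
    return [df.iloc[positions] for df in frames]


# Hypothetical stand-ins for the frames read from the CSV files
df_a = pd.DataFrame({"x": range(1000)})
df_b = pd.DataFrame({"x": range(1000, 2000)})
sample_a, sample_b = take_same_rows([df_a, df_b], n_sample=700)
```

The positions are drawn once and reused with `iloc`, so each file contributes the same 700 rows. If all files have exactly the same number of rows, passing the same `random_state` value to `df.sample` for each file should also pick the same positions, though I have not relied on that here.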

Do you know any good solutions to this issue?



