Tuesday, 24 March 2015

How can I make my random row selection more efficient?

I used the program below to select random rows from my DataFrame, but the console shows a 'MemoryError'. I guess that is because the code is not efficient enough?



import numpy as np
import pandas as pd
from random import sample

df = pd.read_csv("path").set_index('c1')
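# I'd like to randomly pick 30k rows from a 60k-row df, i.e. 60k (rows) x 2k (cols)
# len(df) replaces the undefined name df_index_1; range replaces the Python 2-only xrange
rindex = np.array(sample(range(len(df)), 30000))
# rindex holds row positions, so select by position with .iloc (.ix is deprecated)
df.iloc[rindex].to_csv('path')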

#author: http://ift.tt/1C7dvAC


Does anyone know how to solve this?
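
One possible direction, as a minimal sketch rather than a definitive fix: if the MemoryError comes from loading all 60k x 2k values before sampling, the row positions can be drawn first and only those rows parsed. The helper name sample_csv_rows and its parameters n_total, n_sample and seed are hypothetical, and the sketch assumes the CSV has a single header line, one record per line, and a known row count.

```python
import numpy as np
import pandas as pd


def sample_csv_rows(src, n_total, n_sample, seed=None):
    """Read only a random subset of data rows from a CSV file.

    Hypothetical helper. Assumes `src` has exactly one header line
    followed by `n_total` data rows, one record per line.
    """
    rng = np.random.default_rng(seed)
    # Draw distinct data-row positions, then shift by 1 because
    # file row 0 is the header.
    chosen = rng.choice(n_total, size=n_sample, replace=False) + 1
    keep = set(chosen.tolist())
    # read_csv calls the function with each file row number and
    # skips the row when it returns True; the header is always kept.
    return pd.read_csv(src, skiprows=lambda i: i != 0 and i not in keep)
```

With the numbers from the question, a call might look like sample_csv_rows("path", n_total=60000, n_sample=30000).set_index('c1').to_csv('path'). The result still holds 30k x 2k values, so this roughly halves peak memory rather than removing the cost entirely.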

