I used the program below to select random rows from my dataframe, but the console shows 'MemoryError'. I guess the code is not efficient enough?
import numpy as np
import pandas as pd
from random import sample
df = pd.read_csv("path").set_index('c1')
rindex = np.array(sample(xrange(len(df)), 30000))
# I'd like to randomly pick 30k rows from a 60k-row df, i.e. 60k (rows) * 2k (cols)
df.ix[rindex].to_csv('path')
#author: http://ift.tt/1C7dvAC
Does anyone know how to work this issue out?
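One direction that might help, sketched here under assumptions: instead of loading all 60k rows and then indexing, pick the row numbers first and let `pd.read_csv` skip everything else through a callable `skiprows`, so the full frame is never held in memory. The helper name `sample_csv_rows`, its parameters, and the small in-memory demo CSV are made up for illustration. It also assumes the total number of data rows is known in advance (60k in the question) and that the file has a single header line.

```python
import io
import random

import pandas as pd


def sample_csv_rows(source, n_rows_total, n_sample, index_col=None, seed=None):
    """Read only a random subset of data rows from a CSV (hypothetical helper)."""
    rng = random.Random(seed)
    # Choose which data rows (0-based, header excluded) to keep.
    keep = set(rng.sample(range(n_rows_total), n_sample))

    # File line 0 is the header, so data row i sits on file line i + 1.
    def skip(line_no):
        return line_no > 0 and (line_no - 1) not in keep

    return pd.read_csv(source, skiprows=skip, index_col=index_col)


# Tiny in-memory demo standing in for the real file (assumed layout: c1, c2, c3).
demo_csv = "c1,c2,c3\n" + "".join(f"{i},{i * 2},{i * 3}\n" for i in range(100))
subset = sample_csv_rows(io.StringIO(demo_csv), n_rows_total=100,
                         n_sample=10, index_col="c1", seed=0)
print(subset.shape)
```

For the case in the question this would be called with the real path, `n_rows_total=60000`, `n_sample=30000` and `index_col='c1'`, and the result written out with `to_csv`. If the whole frame does fit in memory, `df.sample(n=30000)` (available in newer pandas versions) is the simpler route; the skipping approach trades some parsing speed for a smaller peak footprint.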