Friday, November 24, 2017

Random select is taking too much memory in Python

I am trying to select which lines to read from a CSV file. The selection is based on:

import random

n = 123456789
s = n // 10
skip = sorted(random.sample(range(1, n + 1), k=n - s))  # to be used with skiprows in pd.read_csv

This statement brings the computer to a halt because of the amount of RAM it consumes.

Is there a more memory-efficient way to select the rows to be skipped?
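One possible direction, sketched here under some assumptions, is to avoid building the list of skipped row numbers at all and instead decide row by row whether to skip. The helper name `make_skip_predicate` and its parameters are hypothetical, and row 0 is assumed to be a header line. Note that this changes the semantics slightly: each row is kept independently with probability s/n, so the number of rows kept is only approximately s, not exactly s.

```python
import random


def make_skip_predicate(keep_fraction, seed=None):
    """Return a function f(row_index) that is True when the row should be skipped.

    Hypothetical helper: nothing is stored per row, so memory use stays constant.
    """
    rng = random.Random(seed)

    def should_skip(row_index):
        # Assumption: row 0 is the header line, so it is never skipped.
        if row_index == 0:
            return False
        # Skip each data row independently with probability 1 - keep_fraction.
        return rng.random() >= keep_fraction

    return should_skip


n = 123456789
s = n // 10
skip = make_skip_predicate(keep_fraction=s / n, seed=0)
```

Since `skiprows` in `pd.read_csv` also accepts a callable that is evaluated against row indices and returns True for rows to skip, something like `pd.read_csv(path, skiprows=skip)` might work here, with `path` standing in for the actual file. If exactly s rows are required, this sketch would not be sufficient as written.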



