Friday, 9 July 2021

Receive randomized rows as new csv

I am quite new to Python (and mainly want to use it for academic purposes), so please bear with my question!

I have collected Twitter data through their academic developer account. However, the dataset is quite large, so I want to create a randomized sample. I already have the data in both JSON and CSV format.

I now want to get a randomized sample of x rows (say 1000 rows), preferably for one specific column (column "CG", header = text). If that is difficult, the values of the whole row can be used instead.

What I found is this code, which outputs randomized values.

  1. How can I amend it so that it outputs randomized rows rather than random values, so that the output always contains the content of the same column?
  2. How can I have it create a new CSV file with the randomly chosen rows as the sample?

P.S.: I also tried to get datatools running in order to use its csvrows tool. However, despite following the instructions, I couldn't get the csvrows tool to run.

MWE:

import csv
import random

with open('test.csv', 'r', newline='') as csv_file:
    lines = [tuple(line) for line in csv.reader(csv_file)]

n = 1000  # number of rows to pick elements from

# pick n rows from the list (random.choices samples with replacement)
chosen_rows = random.choices(lines, k=n)

# pick one random value from each chosen row
chosen_values = [random.choice(row) for row in chosen_rows]

print('\n'.join(chosen_values))
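
One possible way to get what the two questions above ask for, sketched with only the standard library. It assumes the first line of the file is a header row and that the file is small enough to load into memory. The function name `sample_rows`, its parameters, and the output file name `sample.csv` are made up for illustration. Note that it uses `random.sample`, which picks rows without replacement, whereas `random.choices` in the MWE can pick the same row more than once.

```python
import csv
import random


def sample_rows(in_path, out_path, n, column=None, seed=None):
    """Write n randomly chosen rows of in_path to out_path (hypothetical helper).

    If column is given (a header name), only that column is written.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)  # pass a seed to make the sample reproducible

    with open(in_path, newline='', encoding='utf-8') as src:
        reader = csv.reader(src)
        header = next(reader)   # assumption: the first line is the header
        rows = list(reader)     # assumption: the file fits into memory

    # Sample whole rows without replacement; never ask for more rows than exist.
    chosen = rng.sample(rows, k=min(n, len(rows)))

    if column is not None:
        idx = header.index(column)  # raises ValueError if the header is missing
        header = [header[idx]]
        chosen = [[row[idx]] for row in chosen]

    with open(out_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(chosen)

    return len(chosen)
```

Calling `sample_rows('test.csv', 'sample.csv', 1000, column='text')` would then write the header plus up to 1000 values of the `text` column; leaving out `column` keeps the complete rows.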


