I am quite new to Python (and mainly want to use it for academic purposes), so please bear with my question!
I have collected Twitter data through their academic developer account. However, the dataset is quite big, so I want to create a randomized sample. I already have the data in JSON as well as CSV format.
I now want to get a randomized sample of x rows (let's say 1000 rows), preferably for a specific column (column "CG", header = text). If that is difficult, the values of the whole row should be included.
What I found is this code (see the MWE), which prints randomized values.
- How can I amend it so that it returns randomized rows rather than random values, so that the output always contains the content of the same column?
- How can I have it write the sampled rows to a new CSV file?
P.S.: I also tried to get datatools running in order to use its csvrows tool. However, despite following the instructions, I couldn't get csvrows to run.
MWE:
import csv
import random

with open('test.csv', 'r') as csv_file:
    lines = [tuple(line) for line in csv.reader(csv_file)]

n = 1000  # number of rows you want to pick elements from

# pick n rows in the list
chosen_rows = random.choices(lines, k=n)

# pick a value from each row
chosen_values = [random.choice(row) for row in chosen_rows]

print('\n'.join(chosen_values))
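For reference, here is a minimal sketch of what I am trying to achieve, using only the standard library. The function name `sample_csv_rows`, the output file name `sample.csv`, the `seed` parameter, and the UTF-8 encoding are all assumptions made for illustration. Only `test.csv` and the `text` header come from my data. It reads the rows by header name, draws whole rows at random, and writes them to a new CSV, optionally keeping just one column.

```python
import csv
import random


def sample_csv_rows(src_path, dst_path, n, column=None, seed=None):
    """Write up to n randomly chosen rows of src_path to dst_path.

    If column is given, only that column is kept in the output.
    Returns the number of rows written (excluding the header).
    """
    rng = random.Random(seed)  # seed is optional; set it for a repeatable sample

    # Assumption: the file is UTF-8 encoded and its first line is a header.
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # random.sample draws without replacement, so no row appears twice.
    # random.choices (as in the MWE) may return the same row repeatedly.
    chosen = rng.sample(rows, k=min(n, len(rows)))

    out_fields = [column] if column else fieldnames
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # extrasaction="ignore" drops the columns that are not in out_fields.
        writer = csv.DictWriter(dst, fieldnames=out_fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(chosen)

    return len(chosen)
```

Calling `sample_csv_rows('test.csv', 'sample.csv', 1000, column='text')` would then write 1000 randomly picked values of the `text` column to `sample.csv`. Leaving out `column` keeps the whole row. Note that this loads the entire file into memory, which may not be suitable for a very large dataset.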