lundi 27 février 2017

Loading a random sample from CSV with pandas

I have a CSV of the format

Team, Player

What I want to do is apply a filter to the field Team, then take a random subset of 3 players from EACH team.

So for instance, my CSV looks like :

Man Utd, Ryan Giggs
Man Utd, Paul Scholes
Man Utd, Paul Ince
Man Utd, Danny Pugh
Liverpool, Steven Gerrard
Liverpool, Kenny Dalglish
...

I want to end up with an XLS consisting of 3 random players from each team, and only 1 or 2 in the case where there is less than 3 e.g,

Man Utd, Paul Scholes
Man Utd, Paul Ince
Man Utd, Danny Pugh
Liverpool, Steven Gerrard
Liverpool, Kenny Dalglish

I started out using XLRD, my original post is here.

I am now trying to use Pandas as I believe this will be more flexible into the future.

So, in psuedocode what I want to do is :

foreach(team in csv)
   print random 3 players + team they are assigned to

I've been looking through Pandas and trying to find the best approach to doing this, but I can't find anything similar to what I want to do (it's a difficult thing to Google!). Here's my attempt so far :

import pandas as pd
from collections import defaultdict
import csv as csv


columns = defaultdict(list) # each value in each column is appended to a list

with open('C:\\Users\\ADMIN\\Desktop\\CSV_1.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        print(row)
        #for (k,v) in row.items(): # go over each column name and value
        #    columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

So I have commented out the last two lines as I am not really sure if I am needed. I now each row being printed, so I just need to select a random 3 rows per each football team (or 1 or 2 in the case where there are less).

How can I accomplish this ? Any tips/tricks?

Thanks.




Aucun commentaire:

Enregistrer un commentaire