Thursday, May 16, 2019

How can I apply .sample(4) to each column of a DataFrame, over a variable range of columns, drawing only from rows that have a "Y" value?

I'm creating a tool that automatically draws random names and contact information from a .csv file chosen by the user.

I want to use the .sample(4) function to give me 4 random rows for each column in my DataFrame.

There are 2 problems I'm trying to figure out:

  1. The .csv files it can pull in, depending on user input, have different numbers of columns. Each spreadsheet represents a month, and each month has a different number of events. One thing is consistent across all the spreadsheets, though: the first 3 columns are name and contact info, and the last 2 columns are also contact info. I'm assuming there is a way to write "give me a .sample(4) for each column between the first 3 columns and the last 2 columns", so that whether there are 50 events or 10 events, it knows how many .sample(4) draws to generate.

  2. I only want the sample to choose a row if that row has a "Y" value, rather than NaN, in the specific column being sampled.

I found an explanation here: Use sample() function to apply in a range of column

It explains how to do almost the exact OPPOSITE of what I'm trying to do. That answer selects a random sample for every ROW, whereas I want a random sample (of full rows) for every column.

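import pandas

month = input("What month are you drawing for? ")

year = input("What year are you drawing for? ")

# The file name is the month followed by the year, plus the .csv extension
ticket_entries = pandas.read_csv(month + year + '.csv')

# This draws 4 random rows from the whole DataFrame, not 4 per event column
ticket_entries.sample(4)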
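
For reference, here is a rough sketch of what the two requirements above might look like in code. It is only one possible approach, not a confirmed solution. The function name draw_winners and its parameters (n, lead, trail, seed) are made up for illustration. It assumes the event columns are exactly the ones between the first 3 and the last 2, and that an entry is marked with the literal string "Y".

```python
import numpy as np
import pandas as pd


def draw_winners(entries, n=4, lead=3, trail=2, seed=None):
    """Return a dict mapping each event column to up to n randomly drawn rows.

    Hypothetical helper: `lead` and `trail` are the numbers of contact-info
    columns at the start and end of the sheet (3 and 2 in the question).
    """
    # Event columns are whatever lies between the leading and trailing
    # contact-info columns, however many there are that month.
    event_cols = entries.columns[lead:len(entries.columns) - trail]

    # One shared random state so a given seed reproduces the whole drawing.
    rng = np.random.RandomState(seed)

    draws = {}
    for col in event_cols:
        # Keep only rows marked "Y" for this event; NaN compares as False.
        eligible = entries[entries[col] == "Y"]
        # Never ask for more rows than are eligible (sampling is without
        # replacement, so that would raise an error).
        k = min(n, len(eligible))
        draws[col] = eligible.sample(n=k, random_state=rng) if k else eligible
    return draws
```

With the ticket_entries DataFrame from the code above, draw_winners(ticket_entries) would give one small DataFrame of full rows per event, for example draw_winners(ticket_entries)["some event column"]. Events with fewer than 4 "Y" rows simply return all of their eligible rows.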



