Tuesday, June 7, 2022

Running unique samples

I have written code that takes a random sample based on certain criteria, using a for loop over the criteria to pull a specific number of samples for each item. Each random sample is a list of IDs, and I need all of the IDs I pull to be unique. If a random sample contains an ID that has already been appended to the full list of random IDs, I want that iteration to re-run until none of the sampled IDs are found in the full list. The Samples dataframe contains the sample criteria and the number of samples I want to pull for each criterion. Filtered_df contains the data I want to run the sample on.

def randomSample(df_column, EncounterID_column):
    '''
    Output the Encounters from the filtered dataframe 
    Inputs:
        df_column - Filtered_df["{}".format(column name where identifier value is found in sample)]
        Encounter ID - Column Name for the Encounter ID
    '''
    Encounters = []
    df_column = df_column.astype(str)
    for i in range(len(Samples['Identifier'])):
        new_df = Filtered_df[~df_column.str.contains(str(Samples['Identifier'][i]))==False]
        for x in range(len(new_df[EncounterID_column])):
            while new_df[EncounterID_column][x] not in Encounters:
                Encounters.append(list(new_df[EncounterID_column].sample(n=int(Samples['Number of Samples'][i]))))
    
    return Encounters
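
One possible way to get the uniqueness described above, sketched under a few assumptions: instead of re-running an iteration until it happens to avoid duplicates, the IDs that were already drawn are removed from the candidate pool before each draw, so a repeat cannot occur. This is a different technique from the retry loop in the code above, not a fix of it. The function name sample_unique_encounters and the parameters match_column, id_column and seed are hypothetical names introduced for illustration. The sketch also assumes the identifier should be matched as literal text (regex=False) and that raising an error is acceptable when too few unused IDs remain.

```python
import pandas as pd


def sample_unique_encounters(filtered_df, samples, match_column, id_column, seed=None):
    """Draw the requested number of IDs per identifier without repeating any ID.

    filtered_df  - data to sample from
    samples      - dataframe with 'Identifier' and 'Number of Samples' columns
    match_column - column of filtered_df searched for each identifier
    id_column    - column of filtered_df holding the IDs to draw
    seed         - optional seed for reproducible draws (assumption, not in the original)
    """
    chosen = []  # every ID drawn so far, in draw order
    text = filtered_df[match_column].astype(str)

    for identifier, n in zip(samples["Identifier"], samples["Number of Samples"]):
        n = int(n)
        # Rows whose match column contains this identifier as literal text.
        matches = filtered_df[text.str.contains(str(identifier), regex=False)]
        # Candidate pool: distinct IDs that have not been drawn in an earlier iteration.
        unused = ~matches[id_column].isin(chosen)
        pool = matches.loc[unused, id_column].drop_duplicates()
        if len(pool) < n:
            raise ValueError(
                f"Only {len(pool)} unused IDs match {identifier!r}, but {n} were requested"
            )
        # Sampling without replacement from a duplicate-free pool gives unique IDs.
        chosen.extend(pool.sample(n=n, random_state=seed).tolist())

    return chosen
```

With the dataframes from the question, a call might look like sample_unique_encounters(Filtered_df, Samples, "some column", "EncounterID"), where both column names are placeholders. The function returns one flat list of IDs rather than a list of lists, which keeps the membership check simple.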


