I have written code that takes random samples of IDs based on certain criteria. It loops through the criteria with a for loop and pulls a specific number of samples for each one. I need all of the random IDs I pull to be unique. If a random sample contains an ID that has already been appended to the full list of random IDs, I want that iteration to re-run until none of the IDs in the new sample are found in the full list. Each random sample is a list. The Samples dataframe contains the sample criteria and the number of samples I want to pull for each criterion. Filtered_df contains the data I want to sample from.
def randomSample(df_column, EncounterID_column):
    '''
    Output the Encounters from the filtered dataframe
    Inputs:
    df_column - Filtered_df["{}".format(column name where identifier value is found in sample)]
    Encounter ID - Column Name for the Encounter ID
    '''
    Encounters = []
    df_column = df_column.astype(str)
    for i in range(len(Samples['Identifier'])):
        new_df = Filtered_df[~df_column.str.contains(str(Samples['Identifier'][i]))==False]
        for x in range(len(new_df[EncounterID_column])):
            while new_df[EncounterID_column][x] not in Encounters:
                Encounters.append(list(new_df[EncounterID_column].sample(n=int(Samples['Number of Samples'][i]))))
    return Encounters
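One possible way to get unique IDs without re-running an iteration is to remove the IDs that were already drawn from the candidate pool before each call to sample, so a repeat can never be pulled in the first place. The sketch below is only an illustration of that idea, not a tested fix for the real data. The function name random_sample_unique, its parameters, the seed argument, and the toy Department/EncounterID data are all hypothetical. It also assumes the identifier should be matched as a literal substring (regex=False), which may differ from the original str.contains call, and it raises an error when a criterion does not have enough unused IDs left.

```python
import pandas as pd


def random_sample_unique(filtered_df, samples, match_column, id_column, seed=None):
    """Draw random IDs per identifier so that no ID appears twice overall.

    Hypothetical helper: the names and the seed parameter are assumptions.
    """
    chosen = []  # flat list of every ID drawn so far
    text = filtered_df[match_column].astype(str)

    pairs = zip(samples["Identifier"], samples["Number of Samples"])
    for offset, (identifier, n) in enumerate(pairs):
        n = int(n)
        # Rows whose match column contains the identifier as a literal substring.
        mask = text.str.contains(str(identifier), regex=False)
        candidates = filtered_df.loc[mask, id_column]
        # Exclude IDs drawn in earlier iterations and duplicates within the pool.
        pool = candidates[~candidates.isin(chosen)].drop_duplicates()
        if len(pool) < n:
            raise ValueError(
                f"Only {len(pool)} unused IDs for {identifier!r}, need {n}"
            )
        # Vary the seed per iteration so draws are reproducible but not identical.
        state = None if seed is None else seed + offset
        chosen.extend(pool.sample(n=n, random_state=state).tolist())
    return chosen


# Toy data standing in for Filtered_df and Samples (made up for illustration).
filtered_df = pd.DataFrame({
    "Department": ["Cardio", "Cardio", "Cardio", "Ortho", "Ortho", "Cardio/Ortho"],
    "EncounterID": [101, 102, 103, 201, 202, 301],
})
samples = pd.DataFrame({
    "Identifier": ["Cardio", "Ortho"],
    "Number of Samples": [2, 2],
})

encounters = random_sample_unique(
    filtered_df, samples, "Department", "EncounterID", seed=0
)
print(encounters)
```

Because sample draws without replacement by default and the pool never contains an ID that is already in the result, the returned list is flat and free of repeats, and there is no while loop that could run forever when too few unused IDs remain.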