Tuesday, December 10, 2019

Pseudo-randomization using Python

I am trying to pseudo-randomize the order of entries in a dataframe by applying specific criteria.

I found a really useful thread on this topic here: thread. To give some background, let's suppose I have a dataframe like this:

import pandas as pd
data2 = [['fire', 'a', '1'], ['smoke', 'b', '1'], ['honeybee', 'a', '2'], ['curtain', 'c', '2']]
df2 = pd.DataFrame(data2, columns=['item', 'label1', 'label2'])

I want to shuffle the rows of the dataframe into a new random order each time, such that neither label1 nor label2 has the same value in two consecutive rows.

For example, the following ordering would not be acceptable: label1 is fine, but label2 has the value 2 in two consecutive rows:

item      label1    label2
fire      a         1
curtain   c         2
honeybee  a         2
smoke     b         1
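
To make the rule precise, here is a minimal sketch of the check I have in mind. The helper name has_adjacent_repeat and the two example orderings are my own illustration, not something taken from the linked thread.

```python
import pandas as pd

def has_adjacent_repeat(df, columns):
    """Return True if any of the given columns repeats a value in consecutive rows."""
    for col in columns:
        values = df[col].to_numpy()
        # Compare every row (from the second on) with the row just before it.
        if bool((values[1:] == values[:-1]).any()):
            return True
    return False

columns = ["item", "label1", "label2"]
# The rejected ordering from the example above.
bad_order = pd.DataFrame(
    [["fire", "a", "1"], ["curtain", "c", "2"], ["honeybee", "a", "2"], ["smoke", "b", "1"]],
    columns=columns,
)
# An ordering of the same rows that satisfies both restrictions.
good_order = pd.DataFrame(
    [["fire", "a", "1"], ["curtain", "c", "2"], ["smoke", "b", "1"], ["honeybee", "a", "2"]],
    columns=columns,
)

print(has_adjacent_repeat(bad_order, ["label1", "label2"]))   # True
print(has_adjacent_repeat(good_order, ["label1", "label2"]))  # False
```

An ordering counts as valid only when this check returns False for both label columns.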

To achieve this, I am running the following code:

import pandas as pd

randomized = False
while not randomized:
    # Shuffle all rows of df2 (the original dataframe) and renumber the index.
    exp_df_2 = df2.sample(frac=1).reset_index(drop=True)
    # Accept the shuffle unless some pair of adjacent rows shares a label.
    randomized = True
    for i in range(len(exp_df_2) - 1):
        same_label1 = exp_df_2['label1'][i] == exp_df_2['label1'][i + 1]
        same_label2 = exp_df_2['label2'][i] == exp_df_2['label2'][i + 1]
        if same_label1 or same_label2:
            randomized = False  # repeat found, so reshuffle
            break

It seems to work fine, but I wonder whether it has any unwanted side effects. Does it?

Once I am sure that this code does what I want, I would like to ask one more thing: how can I loosen the restrictions?

For example, what if I want to allow at most 2 consecutive identical values for label2, while leaving the restriction on label1 as it is?
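
As a rough sketch of what I mean by loosening, each column could get its own limit on the longest run of identical values (1 for label1, 2 for label2). The names max_run_length and constrained_shuffle, and the max_tries and seed parameters, are hypothetical and my own. I added max_tries because a plain while loop would never finish if no valid ordering exists.

```python
import pandas as pd

def max_run_length(values):
    """Length of the longest run of identical consecutive values."""
    longest = current = 0
    previous = object()  # sentinel that is not equal to any value in the data
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest

def constrained_shuffle(df, max_runs, max_tries=10_000, seed=None):
    """Reshuffle df until every column in max_runs stays within its run limit."""
    for attempt in range(max_tries):
        # A different seed per attempt keeps the result reproducible when seed is set.
        state = None if seed is None else seed + attempt
        candidate = df.sample(frac=1, random_state=state).reset_index(drop=True)
        if all(max_run_length(candidate[col]) <= limit
               for col, limit in max_runs.items()):
            return candidate
    raise RuntimeError("no valid ordering found within max_tries")

data2 = [["fire", "a", "1"], ["smoke", "b", "1"],
         ["honeybee", "a", "2"], ["curtain", "c", "2"]]
df2 = pd.DataFrame(data2, columns=["item", "label1", "label2"])

# label1 may never repeat; label2 may repeat at most twice in a row.
shuffled = constrained_shuffle(df2, {"label1": 1, "label2": 2}, seed=0)
print(shuffled)
```

Setting both limits to 1 would reproduce the strict version of the rule, so the same function would cover both cases.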



