mardi 25 février 2020

Dataframe apply balanced allocation of rows to lists based on row type

I have a dataframe with 192 rows , each rows represent a sentence with some metadata and a specific type (H or L). So, for each type I have total of 96 sentences. I need to allocate them to 10 different lists with the following conditions:

  1. Each list has 6 sub-lists with the size of 16 each
  2. Each sub-list has equal number of sentences from each type (8 H and 8 L)
  3. Each list has exactly one sentence from each of the 96 sets (So one sentence from each set and total of 96 sentences per list, 48 H and 48 L)
  4. Across the 10 lists, for each set , there will be 5 sentences from the type H and 5 sentences type L. Meaning, there will be a balance of the number of allocation of type per specific condition.

    So, if we take set#1 as an example - the sentence " I went to work" will appear in 5 lists (anywhere in the list - meaning the sublist doesn't matter at all in this point) , and the sentence "She went to work" will appear in the other 5 lists.

  5. The allocation of specific sentence per list and sub-list shoud be as randomly as possible

And example of a dataframe is:

SetNum       Sentence         Type   Index
1          I went to work      H       0
1          She went to work    L       1
2          I drink coffee      H       2
2          She drinks coffee   L       3
3          The desk is red     H       4
3          The desk is white   L       5
4          The TV is big       H       6
4          The TV is white     L       7
5          This is a car       H       8
5          This is a plane     L       9

..
96         Good morning        H       194
96         Good night          L       195

How can it be done?

Thanks!




Aucun commentaire:

Enregistrer un commentaire