Sunday, 11 April 2021

Splitting a data frame into subsets without duplicating any level of a factor

I have a data frame of 750 rows and 3 columns.

  • One column is "IDs" (e.g. 001, 002, 003, 004, 005, etc.). There are a total of 250 levels of IDs.
  • One column is "type" (e.g. OO, AA, AP). There are a total of 3 levels of types.
  • One column is "face" (e.g. CFD, OFD, RAD). There are a total of 3 levels of faces.

[Screenshot: first 5 rows of my df]

Each "ID" is repeated for each of the three levels of "type" for a total of 750 rows.
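
For anyone who wants to reproduce the layout described above, here is a minimal sketch of a toy data frame with the same shape. It is not the real data: the column names follow the description, but the assumption that "face" is fixed per ID, and the face counts used (40 RAD, 90 OFD, 120 CFD), are invented purely for illustration.

```r
set.seed(1)  # only so the toy data is reproducible

# Hypothetical stand-in for the real data: 250 IDs, each crossed with the 3 types.
ids <- sprintf("%03d", 1:250)
df <- expand.grid(IDs = ids, type = c("OO", "AA", "AP"),
                  stringsAsFactors = FALSE)

# ASSUMPTION: "face" is a property of the ID (same value on all 3 of its rows).
# The counts 40 / 90 / 120 are made up; the real proportions are unknown.
face_of_id <- setNames(sample(rep(c("RAD", "OFD", "CFD"), c(40, 90, 120))), ids)
df$face <- unname(face_of_id[df$IDs])

# Structure checks: 750 rows, and every ID appears exactly once per type.
stopifnot(nrow(df) == 750)
stopifnot(all(table(df$IDs, df$type) == 1))

head(df, 5)
```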

I'd like to randomly split my data frame into 25 subsets of 30 IDs, each made up of:

  • 10 IDs of the "OO" level of type (1 to 2 from the RAD level of face, 3 to 4 from the OFD level, 4 to 5 from the CFD level)
  • 10 IDs of the "AA" level of type (1 to 2 from the RAD level of face, 3 to 4 from the OFD level, 4 to 5 from the CFD level)
  • 10 IDs of the "AP" level of type (1 to 2 from the RAD level of face, 3 to 4 from the OFD level, 4 to 5 from the CFD level)

No subset should contain the same ID more than once.
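
One possible way to satisfy all of these constraints at once is sketched below. It is only a sketch under stated assumptions, not a definitive solution: it assumes "face" is fixed per ID, and that the overall face counts fall within 25 to 50 RAD, 75 to 100 OFD, and 100 to 125 CFD (otherwise the per-subset ranges cannot all be met by this scheme). The toy data, the `block` and `subset` columns, and the `shift` vector are hypothetical names introduced for illustration. The idea is to deal the IDs into 25 face-balanced blocks of 10, then send each block to a different subset for each type by a cyclic shift, so an ID never lands in the same subset twice.

```r
set.seed(42)  # only for reproducibility of the sketch

# Toy data (hypothetical): 250 IDs x 3 types, face fixed per ID.
ids <- sprintf("%03d", 1:250)
face_of_id <- setNames(sample(rep(c("RAD", "OFD", "CFD"), c(40, 90, 120))), ids)
df <- expand.grid(IDs = ids, type = c("OO", "AA", "AP"),
                  stringsAsFactors = FALSE)
df$face <- unname(face_of_id[df$IDs])

n_sub <- 25

# Step 1: one row per ID, sorted by face with random order within each face.
id_tab <- unique(df[, c("IDs", "face")])
id_tab <- id_tab[order(id_tab$face, runif(nrow(id_tab))), ]

# Deal the sorted IDs out like cards into 25 randomly labelled blocks.
# Each block gets exactly 10 IDs, and each face is spread as evenly as possible.
block_labels <- sample(n_sub)
id_tab$block <- block_labels[(seq_len(nrow(id_tab)) - 1) %% n_sub + 1]

# Step 2: for each type, shift the block number by a different amount (mod 25).
# Distinct shifts mean the three rows of an ID go to three different subsets.
shift <- setNames(sample(0:(n_sub - 1), 3), c("OO", "AA", "AP"))
df$block <- id_tab$block[match(df$IDs, id_tab$IDs)]
df$subset <- (df$block - 1 + unname(shift[df$type])) %% n_sub + 1

subsets <- split(df, df$subset)

# Step 3: verify every requirement on every subset.
stopifnot(length(subsets) == n_sub)
for (s in subsets) {
  stopifnot(nrow(s) == 30, !anyDuplicated(s$IDs))
  stopifnot(all(table(s$type) == 10))
  tab <- table(s$type, s$face)
  stopifnot(all(tab[, "RAD"] %in% 1:2),
            all(tab[, "OFD"] %in% 3:4),
            all(tab[, "CFD"] %in% 4:5))
}
```

A trade-off of this design is that the 10 IDs of a block always travel together (they share a subset for each type), so the split is random but not uniformly random over all valid splits. If that matters, a rejection-sampling or constraint-solving approach would be needed instead.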

I have tried combinations of split(), unique(), and sample(), but nothing has worked. Any ideas? Thanks in advance.



