Sunday, 19 April 2020

What is a good way of generating ranked/dependent data in R?

For a personal project I am aiming to generate ranked data, for example voting preferences, where each user/observation has a fixed preference list. To illustrate:

data0 <- data.frame(
  voter         = 1:5,
  first_choice  = c("A", "A", "C", "B", "A"),
  second_choice = c("B", "C", "B", "A", "C"),
  third_choice  = c("C", "B", "A", "C", "B")
)

To clarify: for each voter, second_choice cannot equal first_choice, and third_choice cannot equal first_choice or second_choice. In other words, each row is an ordering of the options with no repeats.

This is of course a toy example. I am aiming to generate approximately 10000 "voters", each ranking 9 options instead of 3. Googling did not turn up anything useful on how to do this efficiently in R. Right now I'm brute-forcing it: I randomly generate a choice for each voter and check whether it has already been picked as one of their previous choices, which is very inefficient. So I was wondering if anyone has ideas for a smarter way to do it.
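Since each voter's list is just an ordering of the options with no repeats, one possible approach is to draw a random permutation per voter instead of rejecting duplicates. Below is a minimal sketch, assuming every ordering is equally likely; the names n_voters, options, prefs and data1, the A to I labels, and the choice_1 to choice_9 column names are placeholders of mine, not part of the original data.

```r
set.seed(42)  # assumption: fixed seed, used only to make the sketch reproducible

n_voters <- 10000
options  <- LETTERS[1:9]  # assumption: nine options labelled "A" to "I"

# sample(options) draws without replacement, i.e. a random permutation,
# so no option can appear twice for the same voter.
# replicate() returns a 9 x n_voters matrix (one column per voter),
# so it is transposed to get one row per voter.
prefs <- t(replicate(n_voters, sample(options)))
colnames(prefs) <- paste0("choice_", seq_along(options))

data1 <- data.frame(voter = seq_len(n_voters), prefs, stringsAsFactors = FALSE)

# Sanity check: no row contains a repeated option
stopifnot(all(apply(prefs, 1, function(x) anyDuplicated(x) == 0)))
```

If some options should be more popular than others, sample() also takes a prob argument, so something like sample(options, prob = w) with a hypothetical weight vector w would bias the rankings while still avoiding repeats.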

Thanks in advance!



