I have a data set in which some ids have multiple predicted values in a column. This means my algorithm had a hard time deciding their labels, so for those ids the prediction is no better than a random guess. I think this is a good opportunity to assess the accuracy of my predictions against the true labels. Before I do that, I need a way to randomly pick one value for each id that has multiple predicted values in the prediction column, so that every id ends up with a single predicted value that can be matched to its true label by id.
My data looks like this:
id <- c(1, 2, 2, 2, 3, 4, 5, 6)
predicted <- c(14, 5, 7, 11, 6, 1, 4, 2)
df <- data.frame(id, predicted)
head(df)
  id predicted
1  1        14
2  2         5
3  2         7
4  2        11
5  3         6
6  4         1
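As a quick check before sampling, a base-R sketch like the following lists the ids that received more than one prediction. It rebuilds the example data so it runs on its own; the names `counts` and `ambiguous_ids` are illustrative, not part of the original question.

```r
# Rebuild the example data from the question
id <- c(1, 2, 2, 2, 3, 4, 5, 6)
predicted <- c(14, 5, 7, 11, 6, 1, 4, 2)
df <- data.frame(id, predicted)

# Count how many predictions each id received
counts <- table(df$id)

# ids with more than one prediction (the ambiguous cases)
ambiguous_ids <- as.numeric(names(counts)[counts > 1])
ambiguous_ids
# [1] 2
```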
id 2 obviously has more than one predicted value, and it needs to be reduced to one by randomly selecting one of the values in the predicted column where id = 2. How can I generate a new data frame in which each id is matched to exactly one predicted value?
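One possible approach, sketched in base R and assuming `df` is exactly the data frame above: split the row indices by `id` and keep one uniformly random row from each group. The helper `pick_one`, the result name `df_unique`, and the seed value are hypothetical choices for illustration.

```r
# Rebuild the example data from the question
id <- c(1, 2, 2, 2, 3, 4, 5, 6)
predicted <- c(14, 5, 7, 11, 6, 1, 4, 2)
df <- data.frame(id, predicted)

# Pick one element of x uniformly at random.
# sample.int(length(x), 1) is used instead of sample(x, 1) because
# sample() on a single number n draws from 1:n, which would break
# groups that contain only one row.
pick_one <- function(x) x[sample.int(length(x), 1)]

set.seed(42)  # only for reproducibility; remove it to get a fresh draw each run

# Group row indices by id, then keep one random row index per group
rows <- vapply(split(seq_len(nrow(df)), df$id), pick_one, integer(1))

# One row per id, ordered by id
df_unique <- df[rows, ]
rownames(df_unique) <- NULL
df_unique
```

Ids with a single prediction keep it unchanged; only id 2 is resolved at random among 5, 7 and 11. If dplyr (1.0.0 or later) is already in use, `df |> dplyr::group_by(id) |> dplyr::slice_sample(n = 1) |> dplyr::ungroup()` should give the same kind of result. Since `df_unique` has exactly one row per id, it can then be joined to the true labels on `id`, for example with `merge()`.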