I have a data frame with a name for each row. I would like to replace each name with a random string or number, using the same replacement whenever a name appears more than once (e.g. for Adam and Camille below).
df <- data.frame("name" = c("Adam", "Adam", "Billy", "Camille", "Camille", "Dennis"), "favourite food" = c("Apples", "Banana", "Oranges", "Banana", "Apples", "Oranges"), stringsAsFactors = FALSE)
The expected output is something like this (what the random string looks like, and how long it is, does not matter):
df_exp <- data.frame("name" = c("xxyz", "xxyz", "xyyz", "xyzz", "xyzz", "yyzz"), "favourite food" = c("Apples", "Banana", "Oranges", "Banana", "Apples", "Oranges"), stringsAsFactors = FALSE)
I have tried several random-replacement functions in R, but each of them generates a new random string for every row instead of one string per distinct name, e.g. stri_rand_strings:
library(stringi)
library(magrittr)
library(tidyr)
library(dplyr)
df <- df %>%
  mutate(UniqueID = do.call(paste0, Map(stri_rand_strings, n = 6, length = c(2, 6),
                                        pattern = c('[A-Z]', '[0-9]'))))
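For comparison, here is a minimal sketch of one possible way to get the behaviour I am after: generate one random code per distinct name, then look each row's name up in that set with match(). The helper names unique_names and codes are only illustrative, and the code format (2 letters followed by 6 digits) is just carried over from my attempt above. I am not sure this is the idiomatic way to do it.

```r
library(stringi)

df <- data.frame("name" = c("Adam", "Adam", "Billy", "Camille", "Camille", "Dennis"),
                 "favourite food" = c("Apples", "Banana", "Oranges", "Banana", "Apples", "Oranges"),
                 stringsAsFactors = FALSE)

# Distinct names, in order of first appearance
unique_names <- unique(df$name)

# One random code per distinct name: 2 uppercase letters followed by 6 digits
codes <- paste0(stri_rand_strings(length(unique_names), 2, "[A-Z]"),
                stri_rand_strings(length(unique_names), 6, "[0-9]"))

# Random codes could collide in principle; stop rather than merge two names
stopifnot(!anyDuplicated(codes))

# Replace each name with the code belonging to its distinct name
df$name <- codes[match(df$name, unique_names)]
```

With this, both Adam rows share one code and both Camille rows share another, while Billy and Dennis each get their own. The codes change on every run, so the mapping would have to be stored if it needs to be reproducible.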