Tuesday, 22 December 2020

R: How to replace values in column with random numbers WITH duplicates

I have a data frame with a name for each row. I would like to replace the names with a random string or number, but every occurrence of the same name should get the same string (e.g. Adam and Camille below).

df <- data.frame("name" = c("Adam", "Adam", "Billy", "Camille", "Camille", "Dennis"), "favourite food" = c("Apples", "Banana", "Oranges", "Banana", "Apples", "Oranges"), stringsAsFactors = F)

The expected output is something like this (neither the exact form of the random string nor its length matters):

df_exp <- data.frame("name" = c("xxyz", "xxyz", "xyyz", "xyzz", "xyzz", "yyzz"), "favourite food" = c("Apples", "Banana", "Oranges", "Banana", "Apples", "Oranges"), stringsAsFactors = F)

I have tried several random-replacement functions in R, but each of them generates a new random string for every row rather than one shared string per duplicated name, e.g. stri_rand_strings:


library(stringi)
library(magrittr)
library(tidyr)
library(dplyr)

df <- df %>%
    mutate(UniqueID = do.call(paste0, Map(stri_rand_strings, n=6, length=c(2, 6),
                                          pattern = c('[A-Z]', '[0-9]'))))
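
One possible way around this, sketched here in base R only, is to generate one random ID per distinct name and then map the IDs back onto the rows through a named lookup vector. The helper make_ids and the ID format (2 capital letters followed by 6 digits, mirroring the attempt above) are assumptions made for illustration, not something taken from an existing package:

```r
# Same example data as in the question
df <- data.frame("name" = c("Adam", "Adam", "Billy", "Camille", "Camille", "Dennis"),
                 "favourite food" = c("Apples", "Banana", "Oranges", "Banana", "Apples", "Oranges"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: returns n distinct random IDs,
# each made of 2 capital letters followed by 6 digits
make_ids <- function(n) {
  ids <- character(0)
  # Keep drawing until there are n distinct IDs (collisions are unlikely but possible)
  while (length(ids) < n) {
    candidates <- replicate(n - length(ids), paste0(
      paste(sample(LETTERS, 2, replace = TRUE), collapse = ""),
      paste(sample(0:9, 6, replace = TRUE), collapse = "")
    ))
    ids <- unique(c(ids, candidates))
  }
  ids
}

# One ID per distinct name, stored as a named lookup vector
unique_names <- unique(df$name)
lookup <- setNames(make_ids(length(unique_names)), unique_names)

# Indexing the lookup by name gives duplicated names the same ID
df$name <- unname(lookup[df$name])
```

With this approach both Adam rows share one ID and both Camille rows share another, while the IDs themselves change on every run unless set.seed() is called first. Keeping lookup around also makes it possible to translate the IDs back to the original names later.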


