Friday, December 3, 2021

R - map vector of unique values to dataframe column with duplicates

I have a dataframe with a character column. I would like to add a column of ID codes to this dataframe, so that each distinct value in the character column gets its own unique ID. Here is some toy data:

fnames <- c("joey", "joey", "joey", "jimmy", "jimmy", "tommy", "michael", "michael", "michael", "michael", "michael", "kevin", "kevin", "christopher", "aaron", "joshua", "joshua", "joshua", "arvid", "aiden", "kentavious", "lawrence", "xavier")

names <- as.data.frame(fnames)

To get the number of unique values of fnames I run:

unique_fnames <- length(unique(names$fnames))

To generate unique IDs for each unique name, I found the following function:

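create_unique_ids <- function(n, seed_no = 16169, char_len = 6){
  # Fix the seed so the same IDs are produced on every run
  set.seed(seed_no)
  # Characters an ID may contain: a-z, A-Z, 0-9
  pool <- c(letters, LETTERS, 0:9)

  res <- character(n)
  # seq_len(n) is empty when n is 0, whereas seq(0) would give c(1, 0)
  for(i in seq_len(n)){
    this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    # Redraw until the candidate does not collide with an earlier ID
    while(this_res %in% res){
      this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    }
    res[i] <- this_res
  }
  res
}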

Applying create_unique_ids to unique_fnames I get the desired number of ID codes:

unique_fname_id <- create_unique_ids(unique_fnames)

My question is this:

How do I add the vector of unique_fname_id to my dataframe names? The desired result is a dataframe names with a unique_fname_id column that looks something like this:

unique_fname_id <- c("VvWMKt", "VvWMKt", "VvWMKt", "yEbpFq", "yEbpFq", "Z3xCdO"...)

where "VvWMKt" corresponds to "joey", "yEbpFq" corresponds to "jimmy" and so on. The dataframe names would have the same number of rows as the original, just with this added column.

Is there a way to do this? All suggestions are welcome and appreciated. Thanks!
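
One possible approach, sketched here in base R rather than offered as the definitive answer, is to look up each row's name in the vector of distinct names with match() and use the resulting positions to index the ID vector. The sketch repeats create_unique_ids so that it runs on its own, uses a shortened copy of the toy data, and introduces distinct_fnames as a hypothetical helper name that is not in the original code. It assumes the IDs are generated in the same order as unique(names$fnames), which is the order of first appearance.

```r
# Function from the question, repeated so this sketch is self-contained.
create_unique_ids <- function(n, seed_no = 16169, char_len = 6) {
  set.seed(seed_no)
  pool <- c(letters, LETTERS, 0:9)
  res <- character(n)
  for (i in seq_len(n)) {
    this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    while (this_res %in% res) {
      this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    }
    res[i] <- this_res
  }
  res
}

# Shortened copy of the toy data from the question.
fnames <- c("joey", "joey", "joey", "jimmy", "jimmy", "tommy",
            "michael", "michael")
names <- data.frame(fnames = fnames, stringsAsFactors = FALSE)

# distinct_fnames is a hypothetical helper name (not in the original post):
# the distinct names, in order of first appearance.
distinct_fnames <- unique(names$fnames)

# One ID per distinct name, in the same order as distinct_fnames.
unique_fname_id <- create_unique_ids(length(distinct_fnames))

# match() gives, for every row, the position of its name in distinct_fnames;
# indexing the ID vector by those positions repeats each ID as needed.
names$unique_fname_id <- unique_fname_id[match(names$fnames, distinct_fnames)]

print(names)
```

The dataframe keeps its original number of rows, and every occurrence of the same name receives the same ID. A named lookup vector, for example setNames(unique_fname_id, distinct_fnames) indexed by names$fnames, would be an equivalent alternative under the same assumptions.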



