I have a column in a dataframe that is a character vector. I would like to add to my dataframe a column containing unique ID values/codes corresponding to each unique value in said column. Here is some toy data:
fnames <- c("joey", "joey", "joey", "jimmy", "jimmy", "tommy", "michael", "michael", "michael", "michael", "michael", "kevin", "kevin", "christopher", "aaron", "joshua", "joshua", "joshua", "arvid", "aiden", "kentavious", "lawrence", "xavier")
names <- as.data.frame(fnames)
To get the number of unique values of fnames, I run:
unique_fnames <- length(unique(names$fnames))
To generate unique IDs for each unique name, I found the following function:
create_unique_ids <- function(n, seed_no = 16169, char_len = 6){
  # Fix the seed so the same IDs are generated on every run
  set.seed(seed_no)
  pool <- c(letters, LETTERS, 0:9)
  res <- character(n)
  for(i in seq_len(n)){  # seq_len() gives an empty loop when n is 0
    this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    # Redraw until the candidate differs from every ID generated so far
    while(this_res %in% res){
      this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    }
    res[i] <- this_res
  }
  res
}
Applying create_unique_ids to unique_fnames, I get the desired number of ID codes:
unique_fname_id <- create_unique_ids(unique_fnames)
My question is this: how do I add the vector of unique_fname_id to my dataframe names? The desired result is a dataframe names with a unique_fname_id column that looks something like this:
unique_fname_id <- c("VvWMKt", "VvWMKt", "VvWMKt", "yEbpFq", "yEbpFq", "Z3xCdO"...)
where "VvWMKt" corresponds to "joey", "yEbpFq" corresponds to "jimmy", and so on. The dataframe names would have the same number of rows as the original, just with this added column.
Is there a way to do this? All suggestions are welcome and appreciated. Thanks!
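For reference, one way the mapping described above could be sketched in base R is with match(), which looks up each row's name in the vector of unique names and returns its position. This is only a minimal sketch: the data frame name people and the sprintf()-generated placeholder IDs are assumptions made for illustration; in the setup above, the IDs would instead come from create_unique_ids().

```r
# Shortened toy data; `people` is a hypothetical name chosen here
# to avoid shadowing base::names().
fnames <- c("joey", "joey", "jimmy", "tommy", "jimmy")
people <- data.frame(fnames = fnames, stringsAsFactors = FALSE)

# Unique names in order of first appearance
uniq <- unique(people$fnames)

# Placeholder IDs (assumption): one per unique name. With the function
# from the question this would be create_unique_ids(length(uniq)).
ids <- sprintf("id%03d", seq_along(uniq))

# match() returns, for every row, the position of its name in `uniq`;
# indexing `ids` by that position repeats each ID once per matching row.
people$unique_fname_id <- ids[match(people$fnames, uniq)]

print(people)
```

Because the lookup is positional, the data frame keeps its original row count and row order; only the new column is added.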