Wednesday, 28 July 2021

R data frame: filling named columns with random sample data

I am trying to generate a large amount of test data for other programs. Working in RStudio, I use haven to import an SPSS .sav file into a data frame called td. The file has 73 variables, with their values and labels recorded in it, so this gives me all the variable names I need to work with. Then I delete all the existing data:

td <- td[0,]
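Selecting zero rows keeps every column, with its name and type, while dropping all of the data. A minimal sketch on a small hypothetical data frame (template, Q1 and Q2 are made-up names standing in for the imported SPSS file):

```r
# Hypothetical stand-in for the data frame imported from the .sav file
template <- data.frame(ID = c(1L, 2L), Q1 = c(1, NA), Q2 = c(NA, 1))

# Select zero rows: the columns (names and types) survive, the data does not
empty <- template[0, ]

# Show the remaining structure: 0 obs. of 3 variables
str(empty)
```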

Then I generate 10,000 test data rows by loading the index IDs:

td$ID <- 12340000:12349999

So far so good.
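One caveat worth knowing: with a plain base R data.frame, assigning a 10,000-element vector through $<- to a frame that currently has zero rows is rejected, because the replacement is longer than the data. Tibbles (which haven returns) follow their own rules. A defensive sketch for a plain data.frame of ordinary columns is to grow the empty frame to the target number of rows first, so every column becomes NA of its original type, and only then assign the IDs. The template below is hypothetical, and the tibble with labelled columns that haven produces may need a different approach.

```r
# Hypothetical empty template: the right columns, but no rows
td <- data.frame(ID = integer(0), Q1 = numeric(0), Q2 = numeric(0))

ThismanyRows <- 10000

# Indexing rows 1..n of a zero-row data.frame returns NA rows,
# so this yields n rows of NA while keeping each column's type
td <- td[seq_len(ThismanyRows), ]
rownames(td) <- NULL  # replace the placeholder row names with 1..n

# The ID vector now has the same length as the frame
td$ID <- 12340000:12349999
```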

I have a few supporting objects:

- a constant, ThismanyRows <- 10000
- a large list (character vector) of column header names in a variable called BinaryVariables
- a vector of valid values for those columns, CheckedOrNot <- c(NA, 1)
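With replace = TRUE, sample() draws every element independently from the supplied values, and NA is treated as an ordinary value to draw. One call therefore yields a single column's worth of "checked" (1) or "unchecked" (NA) entries. A short sketch using the names from the question:

```r
set.seed(1)  # only to make the illustration reproducible

ThismanyRows <- 10000
CheckedOrNot <- c(NA, 1)

# One call: a single vector of ThismanyRows values, each either NA or 1
one_column <- sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)

# Count each value, including NA
table(one_column, useNA = "ifany")
```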

This is where the problem is:

td[,BinaryVariables] <- sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)

This does fill all the columns with data, but it's all exactly the same data, which isn't what I want. I want the sample function to run separately against each column, rather than having one set of values copied into every column.
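The reason every column ends up identical: the right-hand side is evaluated only once, so sample() produces one vector of ThismanyRows values, and the data frame assignment then copies that same vector into each column named in BinaryVariables. A small reproduction with hypothetical column names:

```r
set.seed(2)

ThismanyRows <- 5
CheckedOrNot <- c(NA, 1)
BinaryVariables <- c("Q1", "Q2", "Q3")  # hypothetical column names

td <- data.frame(ID = seq_len(ThismanyRows))

# sample() runs once; its single result is reused for all three columns
td[, BinaryVariables] <- sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)

identical(td$Q1, td$Q2)
```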

I also tried wrapping the call in a function:

Fillbinary <- function () {sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)}

and applying it to each column with

td <- lapply(td[,BinaryVariables],Fillbinary)

but that generates: Error in FUN(X[[i]], ...) : unused argument (X[[i]])
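The error arises because lapply() calls the function once per column and passes that column in as the first argument, while Fillbinary() is declared with no arguments. Separately, assigning the lapply() result to td would replace the whole data frame with a plain list. A minimal fix, sketched on a hypothetical frame, is to give the function a parameter it simply ignores and to assign the result back into just the selected columns:

```r
set.seed(3)

ThismanyRows <- 10000
CheckedOrNot <- c(NA, 1)
BinaryVariables <- c("Q1", "Q2", "Q3")  # hypothetical column names

td <- data.frame(ID = 12340000:12349999, Q1 = NA_real_, Q2 = NA_real_, Q3 = NA_real_)

# lapply() passes each column in as the first argument; accept it and ignore it
Fillbinary <- function(column) {
  sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)
}

# Assign into the selected columns only, so td stays a data frame
td[BinaryVariables] <- lapply(td[BinaryVariables], Fillbinary)
```

Declaring the function as function(...) would work equally well with lapply(). Using length(column) in place of the global ThismanyRows would tie the sample size to each column's actual length.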

So far I have not been able to work out how to treat each column separately and apply the sample function to it.
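Another way to run sample() once per column is an explicit loop over the column names with [[<-; replicate() with simplify = FALSE is a compact equivalent that returns a list with one independent draw per column. Both are sketched below on a hypothetical frame; either one alone is enough, and here the second simply overwrites the columns filled by the first.

```r
set.seed(4)

ThismanyRows <- 10000
CheckedOrNot <- c(NA, 1)
BinaryVariables <- c("Q1", "Q2", "Q3")  # hypothetical column names

td <- data.frame(ID = 12340000:12349999)

# Option 1: one sample() call per column name
for (v in BinaryVariables) {
  td[[v]] <- sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)
}

# Option 2: replicate() evaluates sample() once per column and returns a list
td[BinaryVariables] <- replicate(
  length(BinaryVariables),
  sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE),
  simplify = FALSE
)
```

The loop is the most explicit of the approaches; replicate() avoids writing the loop but reads less obviously to someone unfamiliar with simplify = FALSE.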

Any help much appreciated.



