I am trying to generate a lot of test data for other programs. Working in RStudio, I use haven to import an SPSS .sav file, which has 73 variables with their values and labels recorded, as a data frame "td". This gives me all the variable names I need to work with. Then I delete all the existing data.
td <- td[0,]
Then I generate 10,000 test data rows by loading the index IDs:
td$ID <- 12340000:12349999
So far so good.
I have a constant called ThismanyRows <- 10000
I have a large list of column header names in a variable called BinaryVariables,
and a vector of valid values for those columns called CheckedOrNot <- c(NA, 1)
This is where the problem is:
td[,BinaryVariables] <- sample(x = CheckedOrNot, size= ThismanyRows, replace = TRUE)
does fill all the columns with data. But it's exactly the same data in every column, which isn't what I want. I want the sample function to run once for each column, not once for each value in each column.
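A small sketch of what appears to be happening: sample() is evaluated once, before the assignment, and the single resulting vector is then recycled across every selected column. The data frame df and the column names q1, q2, q3 below are made up for illustration and use a plain base R data frame rather than the haven import.

```r
# Hypothetical stand-in for the real data: 5 rows, 3 binary columns.
n_rows <- 5
cols <- c("q1", "q2", "q3")            # assumed names, not from the real file
df <- data.frame(ID = seq_len(n_rows))
df[cols] <- NA                          # create the empty columns

# sample() runs exactly once here; its result is one vector of length n_rows.
one_draw <- sample(x = c(NA, 1), size = n_rows, replace = TRUE)

# The same vector is reused for every column on the left-hand side.
df[, cols] <- one_draw

# Every column is a copy of the same draw.
same_everywhere <- all(vapply(df[cols], identical, logical(1), one_draw))
print(same_everywhere)
```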
Even defining
Fillbinary <- function () {sample(x = CheckedOrNot, size= ThismanyRows, replace = TRUE)}
and
td <- lapply(td[,BinaryVariables],Fillbinary)
generates: Error in FUN(X[[i]], ...) : unused argument (X[[i]])
So far I have not been able to work out how to deal with each column as a column and apply the sample function to it.
Any help much appreciated.
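One possible way around both problems, offered as a sketch rather than a definitive answer: lapply always passes each element to the function as its first argument, so a function declared with no parameters fails with "unused argument". Letting Fillbinary accept (and ignore) that argument, and assigning the result back into the selected columns instead of overwriting td, gives an independent draw per column. The column names and the small row count below are assumptions for illustration, and td is a plain base R data frame here, not the haven import.

```r
ThismanyRows <- 10                          # kept small for the example
CheckedOrNot <- c(NA, 1)
BinaryVariables <- c("q1", "q2", "q3")      # hypothetical column names
td <- data.frame(ID = seq_len(ThismanyRows))

# "..." absorbs whatever lapply passes in, so no "unused argument" error.
Fillbinary <- function(...) {
  sample(x = CheckedOrNot, size = ThismanyRows, replace = TRUE)
}

# lapply calls Fillbinary once per column name, so each column gets its own draw.
# Assigning into td[BinaryVariables] keeps td a data frame; "td <- lapply(...)"
# would replace the whole data frame with a plain list.
td[BinaryVariables] <- lapply(BinaryVariables, Fillbinary)

str(td)
```

An equivalent form without the helper function would be replicate(length(BinaryVariables), sample(CheckedOrNot, ThismanyRows, replace = TRUE), simplify = FALSE) on the right-hand side.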