Friday, January 30, 2015

Assigning data frame column values probabilistically

I am trying to create a data frame named "students" with four variables: Gender, Year (Freshman, Sophomore, Junior, Senior), Age, and GPA. The idea is to have a data frame that illustrates the four levels of measurement: nominal, ordinal, interval, and ratio.


At this point it looks something like this:



ID  Gender  Year       Age  GPA
1   Male    Sophomore  0    3.9
2   Male    Junior     0    3.3
3   Female  Junior     0    3.6
4   Male    Freshman   0    3.1
5   Female  Senior     0    2.9
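For reference, a data frame like the one above could be built along these lines. This is only a sketch: keeping ID as a regular column and storing Year as an ordered factor are assumptions, not necessarily how the original was constructed.

```r
# Sketch: rebuild the five example rows shown above.
students <- data.frame(
  ID     = 1:5,
  Gender = factor(c("Male", "Male", "Female", "Male", "Female")),  # nominal
  Year   = factor(
    c("Sophomore", "Junior", "Junior", "Freshman", "Senior"),
    levels  = c("Freshman", "Sophomore", "Junior", "Senior"),
    ordered = TRUE                                                  # ordinal
  ),
  Age    = 0,  # placeholder value, recycled to every row
  GPA    = c(3.9, 3.3, 3.6, 3.1, 2.9)
)
```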


I'm having a problem with Age. I would like Age to be assigned at random, with probabilities that depend on Year. For example, if a student is a Freshman, I'd like Age to be drawn from something like the following distribution:



Age  Probability
14   0.47
15   0.48
16   0.05


I have a function set up to do that, like this:



1: Age <- function(df) {
2:   for (i in 1:nrow(df)) {
3:     if (df[i, "Year"] == "Freshman") {
4:       df[i, "Age"] <- 15
5:     } else if (df[i, "Year"] == "Sophomore") {
6:       # ... continue through the years
7:     }
8:   }
9:   df  # return the modified data frame
10: }


My thinking is that I want to change the right side of the assignment in Line 4 to something that will assign the age probabilistically. That's what I cannot figure out how to do.
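One candidate for that right-hand side is base R's sample(), which accepts a prob argument. A minimal sketch, using the Freshman probabilities from the table above; the names freshman_ages and freshman_probs are made up for illustration.

```r
# Hypothetical names; the values come from the Freshman table above.
freshman_ages  <- c(14, 15, 16)
freshman_probs <- c(0.47, 0.48, 0.05)

set.seed(42)  # only to make the draw reproducible

# Draw a single age according to the given probabilities.
age <- sample(freshman_ages, size = 1, prob = freshman_probs)
```

Inside the loop, the literal 15 on Line 4 would then be replaced by the sample() call.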


On a related note, if there's a better way to do it than what I'm considering, I'd be appreciative of hearing that.
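One possible alternative to looping over rows is to draw all the ages for a given Year in a single sample() call. The sketch below assumes a lookup list called age_dist and a helper called assign_ages, both hypothetical names; only the Freshman entry comes from the figures above, and rows whose Year has no entry are left unchanged.

```r
# Hypothetical lookup: one entry per Year. Only Freshman is taken from the question.
age_dist <- list(
  Freshman = list(ages = c(14, 15, 16), probs = c(0.47, 0.48, 0.05))
  # Sophomore, Junior and Senior entries would follow the same shape
)

# Hypothetical helper: fill Age for every row whose Year has an entry in `dist`.
assign_ages <- function(df, dist) {
  for (yr in names(dist)) {
    rows <- which(df$Year == yr)
    if (length(rows) > 0) {
      # One draw per matching row, sampling with replacement.
      df$Age[rows] <- sample(dist[[yr]]$ages, size = length(rows),
                             replace = TRUE, prob = dist[[yr]]$probs)
    }
  }
  df
}

# Small stand-in data frame so the sketch runs on its own.
students <- data.frame(
  Year = c("Sophomore", "Junior", "Junior", "Freshman", "Senior"),
  Age  = 0
)
set.seed(1)  # only to make the draws reproducible
students <- assign_ages(students, age_dist)
```

The loop here runs once per Year rather than once per student, so it stays short even for a large data frame.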


And on a final note, I've Googled the web at large, queried the R forums on Reddit and Talk Stats, and searched the R tags on this site, all to no avail. I can't believe I'm the first person who's ever wanted to do something like this, so it occurs to me that maybe I'm phrasing the query wrong. If that's the case, any guidance there would also be appreciated.




