I would like to know how can I generate an OUTLIER-FREE data using R. I'm generating data using RNORM.
Say I have a linear equation
Y = B0 + B1*X + E, where X~N(5,9) and E~N(0,1).
I'm going to use RNORM in generating X and E. Below are the codes used:
X <- rnorm(50,5,3) #I'm generating 50 Xi's w/ mean=5 & var=9
E <- rnorm(50,0,1) #I'm generating 50 residuals w/ mean=0 & var=1
Now, I'm going to generate Y by plugging the generated data on X & E above in the linear equation.
If the data I've generated above is outlier-free (no influential observation), then no Cook's Distance of observations should exceed 4/n, which is the usual cut-off for detecting influential/outlying observations.
But I wasn't not able to get this so far. I'm still getting outliers once I generate data following this procedure.
Can you help me out on this? Do you know a way how can I generate data which is OUTLIER-FREE.
Thanks a lot!
Aucun commentaire:
Enregistrer un commentaire