Sunday, 21 February 2016

How to generate OUTLIER-FREE data in R?

I would like to know how I can generate OUTLIER-FREE data in R. I'm currently generating the data with rnorm().

Say I have a linear equation

   Y = B0 + B1*X + E,     where X~N(5,9) and E~N(0,1).

I'm using rnorm() to generate X and E. Below is the code used:

  X <- rnorm(50, mean = 5, sd = 3)   # 50 Xi's with mean = 5 and var = 9 (sd = 3)
  E <- rnorm(50, mean = 0, sd = 1)   # 50 error terms with mean = 0 and var = 1

Next, I generate Y by plugging the generated X and E into the linear equation above.
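
To make that step concrete, here is a minimal sketch of it. The coefficient values B0 = 2 and B1 = 0.5 and the seed are placeholders I picked only for illustration; they are not part of the actual problem.

```r
set.seed(1)   # assumed seed, only so the run is reproducible

n  <- 50
B0 <- 2       # hypothetical intercept (placeholder value)
B1 <- 0.5     # hypothetical slope (placeholder value)

X <- rnorm(n, mean = 5, sd = 3)   # X ~ N(5, 9): rnorm() takes the sd, not the variance
E <- rnorm(n, mean = 0, sd = 1)   # E ~ N(0, 1)

# Plug X and E into the linear equation Y = B0 + B1*X + E
Y <- B0 + B1 * X + E
```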

If the generated data is outlier-free (no influential observations), then no observation's Cook's distance should exceed 4/n (here 4/50 = 0.08), which is the usual cut-off for detecting influential/outlying observations.
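
This is roughly how I check that criterion. It is only a sketch: the coefficients (2 and 0.5) and the seed are placeholder values, and the names fit, cd, cutoff and flagged are just ones I chose here.

```r
set.seed(1)   # assumed seed, only so the run is reproducible

n <- 50
X <- rnorm(n, mean = 5, sd = 3)
E <- rnorm(n, mean = 0, sd = 1)
Y <- 2 + 0.5 * X + E              # hypothetical B0 = 2, B1 = 0.5

fit <- lm(Y ~ X)                  # ordinary least-squares fit of Y on X
cd  <- cooks.distance(fit)        # one Cook's distance per observation

cutoff  <- 4 / n                  # the 4/n rule of thumb
flagged <- which(cd > cutoff)     # indices of observations above the cut-off

length(flagged)                   # number of flagged observations
```

The data would count as outlier-free under this rule only when flagged is empty.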

But I haven't been able to achieve this so far. I still get outliers whenever I generate data following this procedure.

Can you help me out with this? Do you know of a way to generate data that is OUTLIER-FREE?

Thanks a lot!



