vendredi 9 septembre 2016

Random draw imputation through Gaussian Mixture Model

I'm currently trying to write the missing data imputation code through Gaussian mixture model. My current reference as shown at this url (Because i can't provide the image of random draw equation): http://ift.tt/2cga2rs

This is the code for e-step and m-step through the Gaussian mixture model:

data(faithful);
library(mvtnorm);   # using multivariate normal package 
N = dim(faithful)[1];   # number of sample points
X = faithful[,1:2]; # the data matrix
alpha1 = alpha2 = .5;       # these are our initial class probability
m1 = c(2,90);       # initial means, chosen to be bad
m2 = c(4,50);  
Sigma1 = Sigma2 = diag(c(var(X[,1]),var(X[,2])));   # initial covariance matrices computed from entire data

e_step<-function(x,m1,m2,Sigma1,Sigma2,alpha){ 

  x_missing<-x
  x_missing$eruptions[1:10] <- NA
  x_missing$waiting[10:20] <- NA
  x_missing$eruptions[is.na(x_missing$eruptions)] = mean(x_missing$eruptions, na.rm=TRUE)
  x_missing$waiting[is.na(x_missing$waiting)] = mean(x_missing$waiting, na.rm=TRUE)
  x<-x_missing

  comp1.prod <- dmvnorm(x, m1, Sigma1) * alpha[1]
  comp2.prod <- dmvnorm(x, m2, Sigma2) * alpha[2]
  sum.of.comps <- comp1.prod + comp2.prod
  comp1.post <- comp1.prod / sum.of.comps
  comp2.post <- comp2.prod / sum.of.comps
  ll = sum(log(sum.of.comps))

  list("ll" = ll,
       "posterior.df" = cbind(comp1.post, comp2.post))
}

m_step<-function(x,posterior.df){
  comp1.n <- sum(posterior.df[, 1])
  comp2.n <- sum(posterior.df[, 2])

  comp1.alpha <- comp1.n / length(x)
  comp2.alpha <- comp2.n / length(x)

  comp1.mu <- colSums(posterior.df[, 1]*X)/comp1.n
  comp2.mu <- colSums(posterior.df[, 2]*X)/comp2.n

  resp1=posterior.df[, 1]
  resp2=posterior.df[, 2]
  acc1 = acc2 = matrix(0,nrow=2,ncol=2);
  Y = as.matrix(x); 
  for (n in 1:N) {
    acc1 = acc1 +  resp1[n] * ((Y[n,] - m1)  %*% t(Y[n,]-m1));   
    acc2 = acc2 + resp2[n] * ((Y[n,] - m2)  %*% t(Y[n,]-m2));   
  }
  Sigma1 = acc1/sum(resp1); Sigma2 = acc2/sum(resp1);

  list("m1" = comp1.mu, 
       "m2"   = comp2.mu,
       "Sigma1" = Sigma1, 
       "Sigma2" = Sigma2,
       "alpha" = c(comp1.alpha, comp2.alpha))
}

for (i in 1:100) {   #### 
  if (i == 1) {
    # Initialization
    e.step <- e_step(X,m1,m2,Sigma1,Sigma2,alpha)
    m.step <- m_step(X, e.step[["posterior.df"]])
    cur.loglik <- e.step[["ll"]]
    loglik.vector <- e.step[["ll"]]
    cat(i,cur.loglik, "\n")
  } else {
    # Repeat E and M steps till convergence
    e.step <- e_step(X,m.step[["m1"]],m.step[["m2"]],m.step[["Sigma1"]],m.step[["Sigma2"]],m.step[["alpha"]])
    m.step <- m_step(X, e.step[["posterior.df"]])
    loglik.vector <- c(loglik.vector, e.step[["ll"]])
    loglik.diff <- abs((cur.loglik - e.step[["ll"]]))
    if(loglik.diff< 0.000001) 
    {
      break
    } else {
      cur.loglik <- e.step[["ll"]]
    }
    cat(i,cur.loglik, "\n")  

  }
}

I have noticed that mean imputation part in the e-step is not in right way. Therefore i want to apply the random draw imputation as mentioned in the paper at page number 5308.

Can anybody here guide me on how to write the random draw imputation in R? How to write the random draw imputation in R specifically?

Thank you in advance

-Jas




Aucun commentaire:

Enregistrer un commentaire