I have simple count data set in a form of data vector PC6. I am trying to identify distribution in order to create pseudo generator for my simulation. Freq table of my data is
> table(PC6)
PC6
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 23 26 28 29 30 31 32 34 36
96 45 38 60 40 47 37 26 32 16 12 13 12 11 6 4 4 5 1 1 2 2 1 1 1 2 1 1 1 1 1
41 48 59 64
1 1 1 1
I define several model in order to identify AIC to compare models and find best match. Since big proportion of possible zeros I included "zero inflated" models and hurdle model. My models are:
> pc6zip <- zeroinfl(PC5 ~ 1|1, dist = "poisson")
> pc6zinb <- zeroinfl(PC6 ~ 1|1, dist = "negbin")
> pc6nb <- fitdistr(PC6, "negative binomial")
> pc6pois <- fitdistr(PC6, "poisson")
> pc6hurdle <- hurdle(PC6 ~ 1|1, dist = "poisson", zero.dist = "poisson")
> AIC(pc6hurdle,pc6nb,pc6pois,pc6zinb,pc6zip)
df AIC
pc6hurdle 2 4092.290
pc6nb 2 2957.367
pc6pois 1 4794.918
pc6zinb 3 2951.646
pc6zip 2 3286.346
I see that negative binomial and zero inflated negative binomial are quite close. If I done this part correctly, my question is how I can generate new dateset that will follow this ZINB model since i have only this parameter theta
> summary(pc6zinb)
Call:
zeroinfl(formula = PC6 ~ 1 | 1, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.9387 -0.7732 -0.2767 0.3853 9.6531
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.82669 0.05153 35.446 <2e-16 ***
Log(theta) 0.23829 0.12304 1.937 0.0528 .
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3489 0.3354 -7.004 2.48e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 1.2691
Number of iterations in BFGS optimization: 13
Log-likelihood: -1473 on 3 Df
If I take a look this negative binomial model i got this output
Call:
glm.nb(formula = PC6 ~ 1, init.theta = 0.9328223932, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9109 -1.1309 -0.2927 0.3290 3.6570
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.73550 0.04881 35.56 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.9328) family taken to be 1)
Null deviance: 603.94 on 523 degrees of freedom
Residual deviance: 603.94 on 523 degrees of freedom
AIC: 2957.4
Number of Fisher Scoring iterations: 1
Theta: 0.9328
Std. Err.: 0.0714
2 x log-likelihood: -2953.3670
Since AIC are real close, do you think that negative binomial is better model, and how can I generate new dataset that follow ZINB model using this theta parameter

Aucun commentaire:
Enregistrer un commentaire