jeudi 15 octobre 2020

How to fit simple count data to model in order to generate new dataset

I have simple count data set in a form of data vector PC6. I am trying to identify distribution in order to create pseudo generator for my simulation. Freq table of my data is

> table(PC6)
PC6
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 23 26 28 29 30 31 32 34 36 
96 45 38 60 40 47 37 26 32 16 12 13 12 11  6  4  4  5  1  1  2  2  1  1  1  2  1  1  1  1  1 
41 48 59 64 
 1  1  1  1

enter image description here

I define several model in order to identify AIC to compare models and find best match. Since big proportion of possible zeros I included "zero inflated" models and hurdle model. My models are:

> pc6zip <- zeroinfl(PC5 ~ 1|1, dist = "poisson")
> pc6zinb <- zeroinfl(PC6 ~ 1|1, dist = "negbin")
> pc6nb <- fitdistr(PC6, "negative binomial")
> pc6pois <- fitdistr(PC6, "poisson")
> pc6hurdle <- hurdle(PC6 ~ 1|1, dist = "poisson", zero.dist = "poisson")

> AIC(pc6hurdle,pc6nb,pc6pois,pc6zinb,pc6zip)
          df      AIC
pc6hurdle  2 4092.290
pc6nb      2 2957.367
pc6pois    1 4794.918
pc6zinb    3 2951.646
pc6zip     2 3286.346

I see that negative binomial and zero inflated negative binomial are quite close. If I done this part correctly, my question is how I can generate new dateset that will follow this ZINB model since i have only this parameter theta

> summary(pc6zinb)

Call:
zeroinfl(formula = PC6 ~ 1 | 1, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.9387 -0.7732 -0.2767  0.3853  9.6531 

Count model coefficients (negbin with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.82669    0.05153  35.446   <2e-16 ***
Log(theta)   0.23829    0.12304   1.937   0.0528 .  

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.3489     0.3354  -7.004 2.48e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 1.2691 
Number of iterations in BFGS optimization: 13 
Log-likelihood: -1473 on 3 Df

If I take a look this negative binomial model i got this output

Call:
glm.nb(formula = PC6 ~ 1, init.theta = 0.9328223932, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9109  -1.1309  -0.2927   0.3290   3.6570  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.73550    0.04881   35.56   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.9328) family taken to be 1)

    Null deviance: 603.94  on 523  degrees of freedom
Residual deviance: 603.94  on 523  degrees of freedom
AIC: 2957.4

Number of Fisher Scoring iterations: 1


              Theta:  0.9328 
          Std. Err.:  0.0714 

 2 x log-likelihood:  -2953.3670 

Since AIC are real close, do you think that negative binomial is better model, and how can I generate new dataset that follow ZINB model using this theta parameter




Aucun commentaire:

Enregistrer un commentaire