random: How to fit simple count data to model in order to generate new dataset

jeudi 15 octobre 2020

How to fit simple count data to model in order to generate new dataset

I have simple count data set in a form of data vector PC6. I am trying to identify distribution in order to create pseudo generator for my simulation. Freq table of my data is

> table(PC6)
PC6
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 23 26 28 29 30 31 32 34 36 
96 45 38 60 40 47 37 26 32 16 12 13 12 11  6  4  4  5  1  1  2  2  1  1  1  2  1  1  1  1  1 
41 48 59 64 
 1  1  1  1

I define several model in order to identify AIC to compare models and find best match. Since big proportion of possible zeros I included "zero inflated" models and hurdle model. My models are:

> pc6zip <- zeroinfl(PC5 ~ 1|1, dist = "poisson")
> pc6zinb <- zeroinfl(PC6 ~ 1|1, dist = "negbin")
> pc6nb <- fitdistr(PC6, "negative binomial")
> pc6pois <- fitdistr(PC6, "poisson")
> pc6hurdle <- hurdle(PC6 ~ 1|1, dist = "poisson", zero.dist = "poisson")

> AIC(pc6hurdle,pc6nb,pc6pois,pc6zinb,pc6zip)
          df      AIC
pc6hurdle  2 4092.290
pc6nb      2 2957.367
pc6pois    1 4794.918
pc6zinb    3 2951.646
pc6zip     2 3286.346

I see that negative binomial and zero inflated negative binomial are quite close. If I done this part correctly, my question is how I can generate new dateset that will follow this ZINB model since i have only this parameter theta

> summary(pc6zinb)

Call:
zeroinfl(formula = PC6 ~ 1 | 1, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.9387 -0.7732 -0.2767  0.3853  9.6531 

Count model coefficients (negbin with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.82669    0.05153  35.446   <2e-16 ***
Log(theta)   0.23829    0.12304   1.937   0.0528 .  

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.3489     0.3354  -7.004 2.48e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 1.2691 
Number of iterations in BFGS optimization: 13 
Log-likelihood: -1473 on 3 Df

If I take a look this negative binomial model i got this output

Call:
glm.nb(formula = PC6 ~ 1, init.theta = 0.9328223932, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9109  -1.1309  -0.2927   0.3290   3.6570  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.73550    0.04881   35.56   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.9328) family taken to be 1)

    Null deviance: 603.94  on 523  degrees of freedom
Residual deviance: 603.94  on 523  degrees of freedom
AIC: 2957.4

Number of Fisher Scoring iterations: 1


              Theta:  0.9328 
          Std. Err.:  0.0714 

 2 x log-likelihood:  -2953.3670

Since AIC are real close, do you think that negative binomial is better model, and how can I generate new dataset that follow ZINB model using this theta parameter

random

jeudi 15 octobre 2020

How to fit simple count data to model in order to generate new dataset

Aucun commentaire:

Enregistrer un commentaire