Monday, August 27, 2018

Zero-Inflated Poisson and Negative Binomial Models with GLMMadaptive

Clustered/Grouped Count Data

Clustered/grouped count data often exhibit extra zeros and over-dispersion. To account for these features, Poisson and negative binomial mixed effects models with an extra zero-inflation part are used. These models combine a logistic regression model for the extra zeros with a Poisson or negative binomial model for the remaining zeros and the positive counts. Random effects are included in both parts to account for the correlations in the repeated measurements.
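For the Poisson case, the two parts can be written out as below. The notation is introduced here for illustration and is not taken from the package: Y_ij is the j-th count in group i, pi_ij the probability of an extra zero, mu_ij the Poisson mean, and b_i = (b_i^(y), b_i^(zi)) the random effects of the count and zero parts.

```latex
\begin{aligned}
\Pr(Y_{ij} = 0 \mid b_i) &= \pi_{ij} + (1 - \pi_{ij})\, e^{-\mu_{ij}}, \\
\Pr(Y_{ij} = y \mid b_i) &= (1 - \pi_{ij})\, \frac{\mu_{ij}^{y}\, e^{-\mu_{ij}}}{y!}, \qquad y = 1, 2, \ldots, \\
\log \mu_{ij} &= x_{ij}^\top \beta + z_{ij}^\top b_i^{(y)}, \\
\operatorname{logit} \pi_{ij} &= \tilde{x}_{ij}^\top \gamma + \tilde{z}_{ij}^\top b_i^{(zi)}.
\end{aligned}
```

The zero-inflated negative binomial model has the same structure, with the Poisson probabilities replaced by negative binomial ones.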

Estimation under maximum likelihood is challenging due to the high dimension of the random effects vector. In this post, we illustrate how to estimate the parameters of such models using the GLMMadaptive package, which uses the adaptive Gaussian quadrature rule to approximate the integrals over the random effects. The function in the package that fits these models is mixed_model(). The user defines the type of model using the family argument. The fixed and random arguments specify the R formulas for the fixed- and random-effects parts of the model for the remaining zeros and the positive counts, and the zi_fixed and zi_random arguments specify the formulas for the fixed- and random-effects parts of the extra zeros part.
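In symbols, the quantity being maximized is the marginal log-likelihood. The notation is assumed here for illustration: y_i collects the counts of group i, b_i is its random effects vector, and theta holds all model parameters.

```latex
\ell(\theta) = \sum_{i=1}^{n} \log \int p(y_i \mid b_i; \theta)\, p(b_i; \theta)\, db_i
```

The integral has no closed form for these models, so it is approximated numerically, here with adaptive Gaussian quadrature.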

Simulate Zero-Inflated Negative Binomial Data

To illustrate the use of function mixed_model() to fit these models, we start by simulating longitudinal data from a zero-inflated negative binomial distribution:
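A minimal sketch of such a simulation, using base R only. The design (100 subjects, 8 measurements each), the covariates sex and time, and all parameter values are illustrative assumptions, and the two random intercepts are drawn independently.

```r
set.seed(1234)
n <- 100      # number of subjects (assumed)
K <- 8        # measurements per subject (assumed)
t_max <- 5    # maximum follow-up time (assumed)

# Design: a baseline measurement at time 0, then random follow-up times
DF <- data.frame(
  id = rep(seq_len(n), each = K),
  time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
  sex = rep(gl(2, n / 2, labels = c("male", "female")), each = K)
)

# Fixed-effects design matrices for the count part and the zero part
X <- model.matrix(~ sex * time, data = DF)
X_zi <- model.matrix(~ sex, data = DF)

betas <- c(1.5, 0.05, 0.05, -0.03)  # fixed effects, count part (assumed)
shape <- 2                          # negative binomial size parameter (assumed)
gammas <- c(-1.5, 0.5)              # fixed effects, zero part (assumed)
D11 <- 0.5                          # variance of random intercepts, count part
D22 <- 0.4                          # variance of random intercepts, zero part

# One random intercept per subject for each part
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))

# Linear predictors: log scale for the counts, logit scale for the extra zeros
eta_y <- as.vector(X %*% betas) + b[DF$id, 1]
eta_zi <- as.vector(X_zi %*% gammas) + b[DF$id, 2]

# Negative binomial counts, then overwrite a random subset with extra zeros
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
DF$y[rbinom(n * K, size = 1, prob = plogis(eta_zi)) == 1] <- 0
```

The result is a long-format data frame DF with one row per measurement and columns id, time, sex and y.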



Zero-Inflated Poisson Mixed Effects Model

A zero-inflated Poisson mixed model with only fixed effects in the zero part is fitted with the following call to mixed_model() that specifies the zi.poisson() family object in the family argument:
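A sketch of what this call may look like. It assumes a long-format data frame DF with a count outcome y, covariates sex and time, and a grouping variable id; these names and the chosen covariates are illustrative assumptions.

```r
library(GLMMadaptive)

# Assumption: DF has columns y, sex, time and id (illustrative names).
# Count part: fixed effects sex * time plus a random intercept per id.
# Zero part: fixed effects only (here just sex).
fm1 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = zi.poisson(), zi_fixed = ~ sex)

summary(fm1)
```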


 
Only the log link is currently available for the non-zero part and the logit link for the zero part. Hence, the estimated fixed effects for the two parts are interpreted accordingly. We extend fm1 by also allowing for random intercepts in the zero part. Note that by default the random intercept of the non-zero part is correlated with the random intercept of the zero part:
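A sketch of the extended model under the same assumptions about DF (columns y, sex, time, id); the object name fm2 is chosen here for illustration.

```r
library(GLMMadaptive)

# Assumption: DF has columns y, sex, time and id (illustrative names).
# Same specification as the first model, plus a random intercept
# in the zero part via zi_random.
fm2 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = zi.poisson(), zi_fixed = ~ sex,
                   zi_random = ~ 1 | id)

summary(fm2)
```

If fm1 is already in the workspace, update(fm1, zi_random = ~ 1 | id) should give the same model without repeating the full call.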



We test if we need the extra random effect using a likelihood ratio test:
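A sketch of the test, assuming fm1 is the fit without and fm2 (a name used here for illustration) the fit with random intercepts in the zero part:

```r
library(GLMMadaptive)

# Assumption: fm1 and fm2 are nested zero-inflated Poisson fits
# to the same data, differing only in the zi_random term.
anova(fm1, fm2)
```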





Zero-Inflated Negative Binomial Mixed Effects Model

We continue with the same data, but we now take into account the potential over-dispersion using a zero-inflated negative binomial model. To fit this mixed model we use almost the same syntax as above; the only difference is that we now specify the zi.negative.binomial() object as the family:
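A sketch of this fit, again assuming a data frame DF with columns y, sex, time and id (illustrative names):

```r
library(GLMMadaptive)

# Assumption: DF has columns y, sex, time and id (illustrative names).
# Same formulas as the zero-inflated Poisson fit with fixed effects
# only in the zero part; only the family object changes.
gm1 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
                   family = zi.negative.binomial(), zi_fixed = ~ sex)

summary(gm1)
```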



Similarly to fm1, in gm1 we specified only fixed effects for the logistic regression for the zero part. We now compare this model with the zero-inflated Poisson model that allowed for a random intercept in the zero part. The comparison can be done with the anova() method; because the two models are not nested, we set test = FALSE in the call to anova(), i.e.:
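A sketch of the comparison, assuming gm1 is the negative binomial fit and fm2 (a name used here for illustration) is the zero-inflated Poisson fit with random intercepts in the zero part:

```r
library(GLMMadaptive)

# Assumption: gm1 and fm2 were fitted to the same data.
# test = FALSE skips the likelihood ratio test, which is not valid
# for non-nested models, and reports only the fit criteria.
anova(gm1, fm2, test = FALSE)
```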



We observe that accounting for the over-dispersion seems to improve the fit more than including the random intercepts term in the zero part.