Monday, August 27, 2018

Zero-Inflated Poisson and Negative Binomial Models with GLMMadaptive

Clustered/Grouped Count Data

Often cluster/grouped count data exhibit extra zeros and over-dispersion. To account for these features, Poisson and negative binomial mixed effects models with an extra zero-inflation part are used. These models entail a logistic regression model for the extra zeros, and a Poisson or negative binomial model for the remaining zeros and the positive counts. In both models, random effects are included to account for the correlations in the repeated measurements.

Estimation under maximum likelihood is challenging due to the high dimension of the random effects vector. In this post, we will illustrate how to estimate the parameters of such models using the package GLMMadaptive that uses the adaptive Gaussian quadrature rule to approximate the integrals over the random effects. The function in the package that fits these models is mixed_model(). The user defines the type of model using the family argument. Arguments fixed and random specify the R formulas for the fixed- and random-effects parts of the model for the remaining zeros and the positive counts, and arguments zi_fixed and zi_random specify the formulas for the fixed- and random-effects parts for the extra zeros part.

Simulate Zero-Inflated Negative Binomial Data

To illustrate the use of function mixed_model() to fit these models, we start by simulating longitudinal data from a zero-inflated negative binomial distribution:



Zero-Inflated Poisson Mixed Effects Model

A zero-inflated Poisson mixed model with only fixed effects in the zero part is fitted with the following call to mixed_model() that specifies the zi.poisson() family object in the family argument:


 
Only the log link is currently available for the non-zero part and the logit link for the zero part. Hence, the estimated fixed effects for the two parts are interpreted accordingly. We extend fm1 by also allowing for random intercepts in the zero part. We should note that by default the random intercept of the non-zero part is correlated with the random intercept from the zero part:



We test if we need the extra random effect using a likelihood ratio test:





Zero-Inflated Negative Binomial Mixed Effects Model

We continue with the same data, but we now take into account the potential over-dispersion in the data using a zero-inflated negative binomial model. To fit this mixed model we use an almost identical syntax to what we just did above - the only difference is that we now specify as family the zi.negative.binomial() object:



Similarly to fm1, in gm1 we specified only fixed effects for the logistic regression for the zero part. We now compare this model with the zero-inflated Poisson model that allowed for a random intercept in the zero part. The comparison can be done with the anova() method; because the two models are not nested, we set test = FALSE in the call to anova(), i.e.:



We observe that accounting for the over-dispersion seems to better improve the fit than including the random intercepts term in the zero part.

Thursday, June 14, 2018

Mixed Models with Adaptive Gaussian Quadrature

Overview

In this post, I would like to introduce my new R package GLMMadaptive for fitting mixed-effects models for non-Gaussian grouped/clustered outcomes using marginal maximum likelihood.

Admittedly, there is a number of packages available for fitting similar models, e.g., lme4, glmmsr, glmmTMB, glmmEP, and glmmML among others; more information on other available packages can also be found in GLMM-FAQ. GLMMadaptive differs from these packages in approximating the integrals over the random effects in the definition of the marginal log-likelihood function using an adaptive Gaussian quadrature rule, while allowing for multiple correlated random effects.

An Example: Mixed Effects Logistic Regression

We illustrate the use of the package in the simple case of a mixed effects logistic regression. We start by simulating some data:



We continue by fitting the mixed effects logistic regression for the longitudinal outcome y assuming random intercepts for the random-effects part. The primary model-fitting function in the package is the mixed_model(), and has four required arguments, namely,
  • fixed: a formula for the fixed effects,  
  • random: a formula for the random effects,  
  • family: a family object specifying the type of response variable, and  
  • data: a data frame containing the variables in the previously mentioned formulas.
 Assuming that the package has been loaded using library("GLMMadaptive"), the call to fit the random intercepts logistic regression is:



By default, 11 quadrature points are used, but this can be adjusted using the nAGQ control argument. We extend model fm1 by also including a random slopes term; however, we assume that the covariance between the random intercepts and random slopes is zero. This is achieved by using the `||` symbol in the specification of the random argument. We fit the new model and compare it with fm1 using the anova() function that performs a likelihood ratio test:



 We further extend the model by estimating the covariance between the random intercepts and random slopes, and we use 15 quadrature points for the numerical integration:



Capabilities and Further Reading

The package offers a wide range of methods for standard generic functions in R applicable to regression models objects and mixed models objects in particular (e.g., fixef(), ranef(), etc.); more information can be found in the Methods for MixMod Objects vignette. In addition, some highlights of its capabilities:
  • It allows for user-defined family objects implementing not standardly available outcome distributions; more information can be found in the Custom Models vignette.
  • It can calculate fixed effects coefficients with a marginal / population averaged interpretation using the function marginal_coefs().
  • Function effectPlotData() calculates predictions and confidence intervals for constructing effect plots.
The development version of the package is available on the GitHub repository.