Clustered/Grouped Count Data
Often cluster/grouped count data exhibit extra zeros and over-dispersion. To account for these features, Poisson and negative binomial mixed effects models with an extra zero-inflation part are used. These models entail a logistic regression model for the extra zeros, and a Poisson or negative binomial model for the remaining zeros and the positive counts. In both models, random effects are included to account for the correlations in the repeated measurements.Estimation under maximum likelihood is challenging due to the high dimension of the random effects vector. In this post, we will illustrate how to estimate the parameters of such models using the package GLMMadaptive that uses the adaptive Gaussian quadrature rule to approximate the integrals over the random effects. The function in the package that fits these models is mixed_model(). The user defines the type of model using the family argument. Arguments fixed and random specify the R formulas for the fixed- and random-effects parts of the model for the remaining zeros and the positive counts, and arguments zi_fixed and zi_random specify the formulas for the fixed- and random-effects parts for the extra zeros part.
Simulate Zero-Inflated Negative Binomial Data
To illustrate the use of function mixed_model() to fit these models, we start by simulating longitudinal data from a zero-inflated negative binomial distribution:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
set.seed(1234) | |
n <- 100 # number of subjects | |
K <- 8 # number of measurements per subject | |
t_max <- 5 # maximum follow-up time | |
# we constuct a data frame with the design: | |
# everyone has a baseline measurment, and then measurements | |
# at random follow-up times | |
DF <- data.frame(id = rep(seq_len(n), each = K), | |
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))), | |
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K)) | |
# design matrices for the fixed and random effects non-zero part | |
X <- model.matrix(~ sex * time, data = DF) | |
Z <- model.matrix(~ 1, data = DF) | |
# design matrices for the fixed and random effects zero part | |
X_zi <- model.matrix(~ sex, data = DF) | |
Z_zi <- model.matrix(~ 1, data = DF) | |
betas <- c(1.5, 0.05, 0.05, -0.03) # fixed effects coefficients non-zero part | |
shape <- 2 | |
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part | |
D11 <- 0.5 # variance of random intercepts non-zero part | |
D22 <- 0.4 # variance of random intercepts zero part | |
# we simulate random effects | |
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22))) | |
# linear predictor non-zero part | |
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE])) | |
# linear predictor zero part | |
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE])) | |
# we simulate negative binomial longitudinal data | |
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y)) | |
# we set the extra zeros | |
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0 |
Zero-Inflated Poisson Mixed Effects Model
A zero-inflated Poisson mixed model with only fixed effects in the zero part is fitted with the following call to mixed_model() that specifies the zi.poisson() family object in the family argument:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF, | |
family = zi.poisson(), zi_fixed = ~ sex) | |
fm1 | |
#> | |
#> Call: | |
#> mixed_model(fixed = y ~ sex * time, random = ~1 | id, data = DF, | |
#> family = zi.poisson(), zi_fixed = ~sex) | |
#> | |
#> | |
#> Model: | |
#> family: zero-inflated poisson | |
#> link: log | |
#> | |
#> Random effects covariance matrix: | |
#> StdDev | |
#> (Intercept) 0.8272898 | |
#> | |
#> Fixed effects: | |
#> (Intercept) sexfemale time sexfemale:time | |
#> 1.5275644687 0.0173647260 -0.0040365383 0.0004326841 | |
#> | |
#> Zero-part coefficients: | |
#> (Intercept) sexfemale | |
#> -1.210597 0.498611 | |
#> | |
#> log-Lik: -2223.937 |
Only the log link is currently available for the non-zero part and the logit link for the zero part. Hence, the estimated fixed effects for the two parts are interpreted accordingly. We extend fm1 by also allowing for random intercepts in the zero part. We should note that by default the random intercept of the non-zero part is correlated with the random intercept from the zero part:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
fm2 <- update(fm1, zi_random = ~ 1 | id) | |
fm2 | |
#> | |
#> Call: | |
#> mixed_model(fixed = y ~ sex * time, random = ~1 | id, data = DF, | |
#> family = zi.poisson(), zi_fixed = ~sex, zi_random = ~1 | | |
#> id) | |
#> | |
#> | |
#> Model: | |
#> family: zero-inflated poisson | |
#> link: log | |
#> | |
#> Random effects covariance matrix: | |
#> StdDev Corr | |
#> (Intercept) 0.7391 | |
#> zi_(Intercept) 0.7157 -0.6334 | |
#> | |
#> Fixed effects: | |
#> (Intercept) sexfemale time sexfemale:time | |
#> 1.5902225072 -0.0117290754 -0.0029500093 0.0001046371 | |
#> | |
#> Zero-part coefficients: | |
#> (Intercept) sexfemale | |
#> -1.1201182 0.4416912 | |
#> | |
#> log-Lik: -2214.306 |
We test if we need the extra random effect using a likelihood ratio test:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
anova(fm1, fm2) | |
#> | |
#> AIC BIC log.Lik LRT df p.value | |
#> fm1 4461.87 4480.11 -2223.94 | |
#> fm2 4446.61 4446.61 -2214.31 19.26 2 1e-04 |
Zero-Inflated Negative Binomial Mixed Effects Model
We continue with the same data, but we now take into account the potential over-dispersion in the data using a zero-inflated negative binomial model. To fit this mixed model we use an almost identical syntax to what we just did above - the only difference is that we now specify as family the zi.negative.binomial() object:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
gm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF, | |
family = zi.negative.binomial(), zi_fixed = ~ sex) | |
gm1 | |
#> | |
#> Call: | |
#> mixed_model(fixed = y ~ sex * time, random = ~1 | id, data = DF, | |
#> family = zi.negative.binomial(), zi_fixed = ~sex) | |
#> | |
#> | |
#> Model: | |
#> family: zero-inflated negative binomial | |
#> link: log | |
#> | |
#> Random effects covariance matrix: | |
#> StdDev | |
#> (Intercept) 0.8291406 | |
#> | |
#> Fixed effects: | |
#> (Intercept) sexfemale time sexfemale:time | |
#> 1.471442764 0.027018363 0.003096235 -0.006574982 | |
#> | |
#> Zero-part coefficients: | |
#> (Intercept) sexfemale | |
#> -1.5551763 0.5913118 | |
#> | |
#> dispersion parameter: | |
#> 2.227321 | |
#> | |
#> log-Lik: -1958.746 |
Similarly to fm1, in gm1 we specified only fixed effects for the logistic regression for the zero part. We now compare this model with the zero-inflated Poisson model that allowed for a random intercept in the zero part. The comparison can be done with the anova() method; because the two models are not nested, we set test = FALSE in the call to anova(), i.e.:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
anova(gm1, fm2, test = FALSE) | |
#> | |
#> AIC BIC log.Lik df | |
#> gm1 3933.49 3954.33 -1958.75 | |
#> fm2 4446.61 4446.61 -2214.31 1 |
We observe that accounting for the over-dispersion seems to better improve the fit than including the random intercepts term in the zero part.