Thursday, September 4, 2014

Joint Models for Longitudinal and Survival Data

What are joint models for longitudinal and survival data?

In this post we will introduce, in layman's terms, the framework of joint models for longitudinal and time-to-event data. These models are applied in settings where the sample units are followed up over time; for example, we may be interested in patients suffering from a specific disease who are followed up to monitor their progress. In this context, typically different types of outcomes are collected for the sample units at hand. In general, these can be categorized as longitudinal outcomes, i.e., the same outcome repeatedly measured over time for the same sample unit, and event-time outcomes, i.e., the time until a specific event of interest occurs. Often our research questions require analyzing each of these outcomes separately; for instance, does a new treatment have a beneficial effect on the survival of the patients, or is there any evidence for a difference in the average longitudinal evolutions between males and females? However, there are also research questions of a more complex nature for which a joint modeling approach of these outcomes is required.

What are endogenous time-varying covariates (in the setting described above)?

A time-varying covariate is termed endogenous if its value at any time point t can be affected by an event occurring at an earlier time point s < t, and exogenous if its value at t is not affected by an earlier event.

Examples of exogenous covariates are the nurse who took the blood sample or performed the echo, the period of the year (e.g., winter versus summer) and environmental factors (e.g., pollution levels). On the other hand, all covariates measured on the patient (e.g., biomarkers) are endogenous. To make this distinction clearer, let's consider two time-varying covariates in the context of a study on asthma: a biomarker for asthma measured during follow-up for the patients in the study, and the air pollution levels in the neighborhood where each patient lives. Suppose that a particular patient has an asthma attack s = 5 months after the start of the study. It is directly evident that at a future time point, say t = 5.2 months, the level of the biomarker will be affected by the fact that this patient had an asthma attack, whereas the air pollution levels at the same time point t = 5.2 months will not be affected by this attack.

What is non-random dropout (in the setting described above)?

A common and important problem in the analysis of longitudinal outcomes is missing data. Namely, even though measurements are planned at specific time points, often, and for a variety of reasons, the subjects under study do not adhere to these scheduled visit times, or they even drop out from the study completely. When the reasons for dropping out (or, more accurately, the probability of dropping out) depend on unobserved longitudinal responses, the process that generates these missing data cannot be ignored, even if the main focus is on the longitudinal outcome. In this case the dropout process is termed non-random.

How do joint models work?

The intuitive idea behind joint models for longitudinal and survival outcomes is given in the following figure:

In the top panel the hazard process is depicted, which describes how the instantaneous risk (hazard function) of an event changes over time; for example, the hazard of a patient having a relapse. In the bottom panel the asterisks denote the observed longitudinal responses and the green line the underlying longitudinal process; for example, the values of a biomarker for the disease under study.

Joint models postulate a relative risk (proportional hazards) model for the event-time outcome, which is directly associated with the longitudinal process denoted by the green line. This green line is recovered from the observed data (asterisks) using a mixed-effects model. This model contains fixed effects, describing the average longitudinal evolution over time, and random effects, describing how each patient deviates from this average evolution. In their basic form, joint models assume that the hazard function at any particular time point t, denoted by the vertical dashed line, is associated with the value of the longitudinal process (green line) at the same time point. By contrast, the blue line represents the assumption behind the time-dependent Cox model, which posits that the value of the longitudinal outcome remains constant between the observation times. Estimation of the model is based on the joint distribution of the two outcomes and can be done either under maximum likelihood or under a Bayesian approach. The framework of joint models can be used to account for both endogenous time-varying covariates and non-random dropout.
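The basic formulation described above can also be written down explicitly. Using standard notation (mine, not taken from the figure), with y_i(t) the observed longitudinal response, m_i(t) the underlying longitudinal process (the green line), and h_i(t) the hazard:

```latex
% longitudinal submodel: observed marker = true subject-specific trajectory + error
y_i(t) = m_i(t) + \varepsilon_i(t), \qquad
m_i(t) = \mathbf{x}_i^\top(t)\,\boldsymbol{\beta} + \mathbf{z}_i^\top(t)\,\mathbf{b}_i

% survival submodel: hazard at t depends on the current true value m_i(t)
h_i(t) = h_0(t)\exp\bigl\{\boldsymbol{\gamma}^\top \mathbf{w}_i + \alpha\, m_i(t)\bigr\}
```

Here the fixed effects β describe the average evolution, the random effects b_i describe each subject's deviation from it, w_i are baseline covariates, and α is the association parameter linking the current value of the longitudinal process to the hazard.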

Fitting joint models using package JMbayes

There are several packages available on CRAN for fitting this type of joint model, namely JM, JMbayes, joineR and lcmm, among others. In this post we will show how joint models can be fitted using package JMbayes, which fits joint models under the Bayesian paradigm. For this illustration we will use the Primary Biliary Cirrhosis (PBC) data set (available in the package and also in the survival package), collected by the Mayo Clinic from 1974 to 1984. For our analysis we will consider the 312 patients who were randomized to D-penicillamine or placebo. During follow-up several biomarkers associated with PBC have been collected for these patients. Here we focus on serum bilirubin levels, considered one of the most important markers of disease progression. In package JMbayes the PBC data are available in the data frames pbc2 and pbc2.id, containing the longitudinal and survival information, respectively (i.e., the former is in the long format, while the latter contains a single row per patient).
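To make the long versus one-row-per-subject distinction concrete, here is a minimal sketch with toy data (hypothetical values, not the actual PBC measurements):

```r
## long format, as in pbc2: one row per measurement occasion
long <- data.frame(id = c(1, 1, 1, 2, 2),
                   year = c(0, 0.5, 1.1, 0, 0.7),
                   serBilir = c(1.2, 1.5, 1.9, 0.8, 0.9))

## one row per subject, as in pbc2.id: keep each subject's first row
surv <- long[!duplicated(long$id), ]

nrow(long)  # 5: repeated measurements
nrow(surv)  # 2: one row per subject
```

The survival information (event time, status, baseline covariates) lives naturally in the one-row-per-subject frame, while the repeated biomarker values live in the long frame.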

We start by loading the JMbayes and lattice packages and defining the indicator status2 for the composite event, namely transplantation or death:

library("JMbayes")
library("lattice")
pbc2$status2 <- as.numeric(pbc2$status != "alive")
pbc2.id$status2 <- as.numeric(pbc2.id$status != "alive")



The design followed by package JMbayes requires first fitting, separately, a linear mixed model for the longitudinal outcome and a Cox model for the survival outcome. The aim of the linear mixed model is to describe/recover the subject-specific longitudinal trajectories. Close inspection of the shapes of the log serum bilirubin profiles indicates that for some individuals these seem to be nonlinear.

set.seed(1)
## we take a sample of patients with more than six 
## measurements
long_ids <- names(which(table(pbc2$id) > 6))
ids <- sample(long_ids, 16)
xyplot(log(serBilir) ~ year | id, data = pbc2, 
       subset = id %in% ids, type = c("p", "smooth"), 
       lwd = 2, layout = c(4, 4))



Hence, to allow for flexibility in the specification of these profiles we include natural cubic splines in both the fixed- and random-effects parts of the mixed model. This model can be fitted using the following calls to functions lme() and ns() (the latter from package splines): 

lmeFit <- lme(log(serBilir) ~ ns(year, 2), data = pbc2,
              random = ~ ns(year, 2) | id) 
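As a quick sanity check of what ns(year, 2) contributes to the mixed model, note that it simply expands the time variable into a natural cubic spline basis with two columns, each of which then gets its own fixed (and random) coefficient. The values below are illustrative only:

```r
library("splines")

## a few hypothetical follow-up times (years)
yrs <- seq(0, 14, length.out = 5)

## natural cubic spline basis with 2 degrees of freedom
B <- ns(yrs, df = 2)
dim(B)  # 5 time points x 2 basis columns
```

So the formula log(serBilir) ~ ns(year, 2) fits a smooth nonlinear time effect using just two coefficients, rather than a straight line in year.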

Analogously, in the Cox model we control for treatment and age, and also allow for their interaction:

coxFit <- coxph(Surv(years, status2) ~ drug * age, 
                data = pbc2.id, x = TRUE)

In the call to coxph() argument x is set to TRUE such that the design matrix is also included in the resulting model object. Using as main arguments the lmeFit and coxFit objects, the corresponding joint model is fitted using the code:

jointFit <- jointModelBayes(lmeFit, coxFit, timeVar = "year")

summary(jointFit)



Call:
jointModelBayes(lmeObject = lmeFit, survObject = coxFit, 
    timeVar = "year")

Data Descriptives:
Longitudinal Process             Event Process
Number of Observations: 1945     Number of Events: 169 (54.2%)
Number of subjects: 312

Joint Model Summary:
Longitudinal Process: Linear mixed-effects model
Event Process: Relative risk model with penalized-spline-approximated 
  baseline risk function
Parameterization: Time-dependent value 

      LPML      DIC       pD
 -3169.172 6116.463 939.5384

Variance Components:
              StdDev    Corr        
(Intercept)   1.0097  (Intr)  n(,2)1
ns(year, 2)1  2.3555  0.3640        
ns(year, 2)2  2.2430  0.3609  0.5701
Residual      0.3023                

Coefficients:
Longitudinal Process
              Value Std.Err Std.Dev   2.5%  97.5%      P
(Intercept)  0.4918  0.0212  0.0699 0.3676 0.6316 <0.001
ns(year, 2)1 2.4117  0.1032  0.2337 1.9423 2.7918 <0.001
ns(year, 2)2 2.3611  0.0895  0.2906 1.7765 2.8609 <0.001

Event Process
                     Value Std.Err  Std.Dev    2.5%    97.5%      P
drugD-penicil      -0.9648  0.1272   0.7543 -2.4530   0.4693  0.199
age                 0.0370  0.0018   0.0107  0.0135   0.0571 <0.001
drugD-penicil:age   0.0179  0.0025   0.0145 -0.0096   0.0470  0.211
Assoct              1.4181  0.0062   0.0953  1.2367   1.6041 <0.001
Bs.gammas1         -6.6205  0.1009   0.5992 -7.7762  -5.3132 <0.001
Bs.gammas2         -6.5832  0.1031   0.5962 -7.7639  -5.2468 <0.001
... 

MCMC summary:
iterations: 20000 
adapt: 3000 
burn-in: 3000 
thinning: 10 
time: 2.6 min
 
Argument timeVar is a character string that specifies the name of the time variable in the mixed model (the scale of time, e.g., days, months, or years, must be the same in both the mixed and Cox models). The default call to jointModelBayes() includes in the linear predictor of the relative risk model the subject-specific linear predictor of the mixed model, which in this case represents the average patient-specific log serum bilirubin level.

The output of the summary() method is rather self-explanatory and contains model summary statistics, namely LPML (the log pseudo-marginal likelihood value), DIC (deviance information criterion), and pD (the effective number of parameters component of DIC), posterior means for all parameters, standard errors (with effective sample sizes estimated using time series methodology), standard deviations, 95% credibility intervals, and tail probabilities for all regression coefficients in the two submodels. The association parameter, denoted in the output as Assoct, measures how strongly the longitudinal outcome at any particular time point t is associated with the hazard of an event at the same time point. The results suggest that serum bilirubin is strongly related to the risk for the composite event, with a doubling of serum bilirubin levels resulting in a 2.7-fold (95% CI: 2.3; 3.1) increase in the risk.
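The 2.7-fold figure can be reproduced from the posterior mean of Assoct in the output above. Since the model uses log(serBilir), a doubling of serum bilirubin increases the covariate by log(2), so the hazard ratio is exp(Assoct × log(2)):

```r
## posterior mean of the association parameter, from the summary output
assoct <- 1.4181

## hazard ratio for a doubling of serum bilirubin
round(exp(assoct * log(2)), 1)  # 2.7
```

(The reported credible interval is based on the posterior samples, so transforming the interval endpoints of Assoct the same way gives only approximately the same limits.)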

5 comments:

  1. Dimitris, congratulations for your blog!
    It will be very helpful for all of us who are interested in these statistical models, which allow us to couple the longitudinal and the time-to-event approach.

    Best regards.
    Xavier_P

  2. Great package - and I really enjoyed your webinar on youtube
    https://www.youtube.com/watch?v=vqGpQmnup0Y

    For the uninitiated (me...!), please can you elaborate on the purpose and nature of the spline fit in the lme function, and when you would and wouldn't do this.

    Just guessing - but it seems you are fitting to a fitted spline, rather than to the underlying data.

    Thanks,
    Dave

    1. Dave,

      The reason to use splines is that sometimes (as in the posted example) the longitudinal profiles of the patients/subjects may be nonlinear. In such cases it is important to capture these nonlinearities, and splines are just one (easy) way to do it.

      Best,
      Dimitris

  3. Many Thanks! My confusion is all my fault. I was thinking that vanilla-flavour lme() was doing more. The user needs to add the raspberry ripple.

    I also found the following helpful...
    http://stats.stackexchange.com/questions/5015/what-are-the-advantages-disadvantages-of-using-splines-smoothed-splines-and
    along with page 178 et seq. of http://www.amazon.co.uk/An-R-Companion-Applied-Regression/dp/141297514X

    Thanks,
    Dave



  4. The package is great! Splines are used when there is nonlinearity. What about polynomials? Which one is better to capture nonlinearity?

    Thanks,
    Mesfin
