Friday, March 18, 2016

An Integrated Shiny App for a Course on Repeated Measurements Analysis (completed)

Repeated Measurements Analysis

Repeated measurements analysis, and in particular longitudinal data analysis, is one of the two most frequently used types of analysis in my field (Biostatistics), the other being survival analysis. Starting this year I will be teaching at my university a new course on regression models for repeated measurements data, aimed primarily at applied statisticians, epidemiologists and clinicians. This type of audience often finds the topic quite difficult, mainly because one has to carefully consider the two levels of such data, namely how to model longitudinal evolutions and how to model correlations. On top of that, many of the researchers following this course have been primarily exposed to SPSS, which makes the transition to R, the software I will be using in the course, somewhat more demanding.

Shiny app

Based on the considerations mentioned above, when I was developing the course I was thinking of ways to facilitate both the understanding of the key concepts of repeated measurements analysis and the use of R to analyze such data. The answer to both questions was to utilize the great capabilities of shiny. I have created an app that replays all analyses done in the course; a snapshot is shown below.

The students can select a chapter and a section, see the code used in that section in the 'Code' tab, and examine the output in the 'Output' tab. The slides of the course are also integrated in the app, and can be seen in the 'Slides' tab. The 'Help' tab explains the basic usage of the main functions used in the selected chapter. To further enhance understanding of some key concepts, such as how random effects capture correlations and how longitudinal evolutions are affected by the levels of baseline covariates, the app allows users to interactively change the values of some parameters that control these features. The app also includes four practicals aimed, respectively, at Chapter 2, which introduces marginal models for continuous data, Chapter 3, which explains linear mixed effects models, Chapter 4, which presents the framework of generalized estimating equations, and Chapter 5, which presents generalized linear mixed effects models. Chapter 6 focuses on explaining the issues with incomplete data in longitudinal studies. For each practical the students may reveal the answer to specific questions they have trouble solving, or download a whole R markdown report with a detailed explanation of the solutions.
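As a small illustration of one of the key concepts mentioned above, how random effects capture correlations, the following sketch simulates two measurements per subject that share a random intercept. It is not taken from the app; the function name sim_cor and all parameter values are hypothetical and chosen only for illustration.

```r
# Minimal base-R sketch (not from the app): a shared random intercept
# induces correlation between repeated measurements of the same subject.
set.seed(2016)

# sim_cor() is a hypothetical helper: it simulates two measurements per
# subject and returns the theoretical and empirical correlation.
sim_cor <- function(sigma_b, sigma_e, n = 5000) {
  b  <- rnorm(n, mean = 0, sd = sigma_b)         # subject-specific random intercepts
  y1 <- 10 + b + rnorm(n, mean = 0, sd = sigma_e) # first measurement
  y2 <- 10 + b + rnorm(n, mean = 0, sd = sigma_e) # second measurement
  c(theory    = sigma_b^2 / (sigma_b^2 + sigma_e^2),  # intraclass correlation
    empirical = cor(y1, y2))
}

res_high <- sim_cor(sigma_b = 2,   sigma_e = 1)  # large between-subject variance
res_low  <- sim_cor(sigma_b = 0.5, sigma_e = 1)  # small between-subject variance
rbind(high = res_high, low = res_low)
```

The larger the variance of the random intercepts relative to the error variance, the stronger the correlation between measurements of the same subject.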

The app is based on some popular packages for repeated measurements analysis (nlme, lme4, MCMCglmm, geepack), and some additional utility packages (lattice, MASS, corrplot).

The app is available in my dedicated GitHub repository for this course, and can be invoked using the command (assuming that you have the aforementioned packages installed):

shiny::runGitHub("Repeated_Measurements", "drizopoulos")
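One possible way to check that the required packages are available before launching the app is sketched below. The package list is the one given above plus shiny itself; the object names pkgs and missing_pkgs are only illustrative.

```r
# Sketch: find out which of the required packages are not yet installed.
pkgs <- c("shiny", "nlme", "lme4", "MCMCglmm", "geepack",
          "lattice", "MASS", "corrplot")
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install what is missing and launch the app only in an interactive session.
if (interactive()) {
  if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
  shiny::runGitHub("Repeated_Measurements", "drizopoulos")
}
```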

Friday, March 4, 2016

Dynamic Predictions using Joint Models

What are Dynamic Predictions

In this post we will explain the concept of dynamic predictions and illustrate how these can be computed using the framework of joint models for longitudinal and survival data, and the R package JMbayes. The type of dynamic predictions we will discuss here are calculated in follow-up studies in which some sample units (e.g., patients) who are followed up in time provide a set of longitudinal measurements. These longitudinal measurements are expected to be associated with events that the sample units may experience during follow-up (e.g., death, onset of disease, having a child, dropout from the study, etc.). In this context, we would like to utilize the longitudinal information we have available up to a particular time point t to predict the risk of an event after t. For example, for a particular patient we would like to use his available blood values up to year 5 to predict the chance that he will develop a disease before year 7 (i.e., within two years from his last available measurement). The dynamic nature of these predictions stems from the fact that each time we obtain a new longitudinal measurement we can update the prediction we have previously calculated.
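One common way to write down such a prediction is as a conditional survival probability. The notation below is an assumption introduced here for illustration: T_j^* denotes the true event time of subject j, \mathcal{Y}_j(t) the longitudinal measurements of that subject recorded up to time t, and \mathcal{D}_n the sample on which the model was fitted.

```latex
\pi_j(u \mid t) = \Pr\bigl\{ T_j^* \ge u \mid T_j^* > t,\ \mathcal{Y}_j(t),\ \mathcal{D}_n \bigr\},
\qquad u > t .
```

In the example above, t = 5 and u = 7, and the risk of developing the disease before year 7 is 1 - \pi_j(7 \mid 5). When a new measurement is recorded at a later time t' > t, the prediction is updated to \pi_j(u \mid t').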

Joint models for longitudinal and survival data have been shown to be a valuable tool for obtaining such predictions. They allow us to investigate which features of the longitudinal profiles are most predictive, while appropriately accounting for the complex correlations in the longitudinal measurements.

Fit a Joint Model

For this illustration we will be using the Primary Biliary Cirrhosis (PBC) data set collected by the Mayo Clinic from 1974 to 1984. For our analysis we will consider 312 patients who were randomized to D-penicillamine or placebo. During follow-up several biomarkers associated with PBC have been collected for these patients. Here we focus on serum bilirubin levels, which are considered among the most important biomarkers associated with disease progression. In package JMbayes the PBC data are available in the data frames pbc2 and pbc2.id, containing the longitudinal and survival information, respectively (i.e., the former is in the long format while the latter contains a single row per patient).

We start by fitting a joint model to the PBC data set. For the log-transformed serum bilirubin we use a linear mixed effects model with natural cubic splines in the fixed and random effects for time, and also correct in the fixed part for age and sex. For the time-to-death we use a Cox model with baseline covariates age, sex and their interaction, and the underlying level of serum bilirubin as estimated from the mixed model. This joint model is fitted using the following piece of code:
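A minimal sketch of such a fit is given here, under stated assumptions: the splines use two degrees of freedom, the event indicator (named event here, a hypothetical column) codes death from the status factor, and all other settings of jointModelBayes() are left at their defaults. These details are not given in the text and may differ from the model used for the results reported in this post.

```r
library("JMbayes")
library("nlme")      # lme()
library("survival")  # coxph(), Surv()
library("splines")   # ns(), natural cubic splines

data("pbc2", "pbc2.id", package = "JMbayes")

# Assumption: the event of interest is death, coded from the 'status' factor.
# The indicator is added to both data frames so it is available wherever needed.
pbc2$event    <- as.numeric(pbc2$status == "dead")
pbc2.id$event <- as.numeric(pbc2.id$status == "dead")

# Longitudinal submodel: log serum bilirubin with natural cubic splines of time
# (2 degrees of freedom is an assumption) in the fixed and random effects,
# corrected for age and sex.
lmeFit <- lme(log(serBilir) ~ ns(year, 2) + age + sex, data = pbc2,
              random = ~ ns(year, 2) | id)

# Survival submodel: Cox model with age, sex and their interaction.
# x = TRUE keeps the design matrix, which the joint model needs.
coxFit <- coxph(Surv(years, event) ~ age * sex, data = pbc2.id, x = TRUE)

# Joint model linking the hazard to the underlying level of serum bilirubin;
# this runs an MCMC sampler and may take a few minutes.
jointFit <- jointModelBayes(lmeFit, coxFit, timeVar = "year")
summary(jointFit)
```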

Calculate Dynamic Predictions

In package JMbayes these subject-specific predictions are calculated using function survfitJM(). As an illustration, we show how this function can be utilized to derive predictions for Patient 2 from the PBC data set using our fitted joint model jointFit. We first extract the data of this patient in a separate data frame and then we call survfitJM().
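The two steps just described could look as follows. This sketch assumes that a fitted joint model named jointFit is already in the workspace; the object names ND and sfit are illustrative.

```r
library("JMbayes")

# Assumes 'jointFit' is a joint model fitted with jointModelBayes() and that
# the long-format data frame pbc2 contains every variable used in the model.
stopifnot(exists("jointFit"))

# Step 1: the longitudinal records of Patient 2 (one row per visit)
ND <- pbc2[pbc2$id == 2, ]

# Step 2: conditional survival probabilities from the last visit onwards,
# estimated with the default Monte Carlo settings
sfit <- survfitJM(jointFit, newdata = ND)
sfit
```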

The last available measurement of this patient was in year 8.83, and survfitJM() will by default produce estimates of event-free probabilities starting from this last time point to the end of the follow-up. The calculation of these probabilities is based on a Monte Carlo procedure, and in the output we obtain as estimates the mean and median over the Monte Carlo samples along with the 95% pointwise credible intervals. For this patient, the estimated probability of surviving up to year 11.2 is 60%. A plot of these probabilities can be obtained using the plot() method for objects returned by survfitJM().
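A possible call to this plot() method is sketched below. It assumes that sfit holds the object returned by survfitJM() for Patient 2; the choice of arguments (showing only the mean estimate, shading the credible band, and adding the observed longitudinal measurements) is just one option.

```r
library("JMbayes")

# Assumes 'sfit' is the object returned by survfitJM() for Patient 2.
stopifnot(exists("sfit"))

# Mean survival estimate with a shaded 95% pointwise credible band;
# include.y = TRUE also draws the observed longitudinal measurements.
plot(sfit, estimator = "mean", conf.int = TRUE, fill.area = TRUE,
     col.area = "lightgrey", include.y = TRUE)
```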

Shiny app for Dynamic Predictions

To facilitate the use of dynamic predictions in practice, a web interface has been written using package shiny. This is available in the demo folder of the package and can be invoked with a call to the runDynPred() function. With this interface users may load an R workspace with the fitted joint model(s), load the data of the new subject, and subsequently obtain dynamic estimates of survival probabilities and future longitudinal measurements (i.e., an estimate after each longitudinal measurement). Several additional options are provided to calculate predictions based on different joint models (if the R workspace contains more than one model), to obtain estimates at specific horizon times, and to extract the data set with the estimated conditional survival probabilities. A detailed description of the options of this app is provided in the 'Help' tab within the app.
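A short sketch of how one might prepare a workspace for the app and then launch it is shown below. It assumes a fitted joint model named jointFit is available; the file name jointFit.RData is hypothetical.

```r
library("JMbayes")

# Assumes 'jointFit' is a fitted joint model; save it in an R workspace
# file (hypothetical file name) that can then be loaded from within the app.
stopifnot(exists("jointFit"))
save(jointFit, file = "jointFit.RData")

# Launch the web interface (only meaningful in an interactive session).
if (interactive()) runDynPred()
```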