class: title-slide

# Session 5.1: Hierarchical models: longitudinal data<span style="display:block; margin-top: 10px ;"></span>

### Imperial College London

---
layout: true

.my-footer[ .alignleft[ © Marta Blangiardo | Monica Pirani ] .aligncenter[ MSc in Epidemiology ] .alignright[ Imperial College London ] ]

---

# Learning Objectives

After this session you should be able to:
<span style="display:block; margin-top: 10px ;"></span>
- Specify hierarchical models for longitudinal data
<span style="display:block; margin-top: 10px ;"></span>
- Distinguish between random intercept and random slope models and recognise when each is more appropriate
<span style="display:block; margin-top: 10px ;"></span>
- Run the above models in `R-INLA`
<span style="display:block; margin-top: 10px ;"></span>

The topics of this lecture are covered in Chapter 4 of Gómez-Rubio (2020a).

---

# Outline

There is huge scope for elaborating the basic hierarchical models discussed in the previous lecture to reflect additional structure and complexity in the data, e.g.
<span style="display:block; margin-top: 10px ;"></span>
- Adding covariates at different levels of the hierarchy
<span style="display:block; margin-top: 10px ;"></span>
- Adding further levels to the hierarchy (patients within wards within hospitals, pupils within schools within local authorities, `\(\ldots\)`)
<span style="display:block; margin-top: 10px ;"></span>
- Adding non-nested (cross-classified) levels (patients within GPs crossed with hospitals, `\(\ldots\)`)
<span style="display:block; margin-top: 10px ;"></span>
- **Repeated observations on some/all units (longitudinal data - covered in this lecture)**
<span style="display:block; margin-top: 10px ;"></span>
- Modelling temporal or spatial structure in data, `\(\ldots\)` (covered from next week)
<span style="display:block; margin-top: 10px ;"></span>

---

# Outline

1\. [What are longitudinal data](#longitudinal)
<span style="display:block; margin-top: 30px ;"></span>
2\. [Example: sleep study](#example)
<span style="display:block; margin-top: 30px ;"></span>
3\. [Model specification](#model)
<span style="display:block; margin-top: 30px ;"></span>
4\. [Interpretation](#interpretation)

---
name: longitudinal

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **What are longitudinal data**]]]

---

# What are longitudinal data?
- Arise in studies where individuals (or units) are measured repeatedly over time
<span style="display:block; margin-top: 20px ;"></span>
- For a given individual, observations over time will typically be dependent
<span style="display:block; margin-top: 20px ;"></span>
- Longitudinal data can arise in various forms:
<span style="display:block; margin-top: 20px ;"></span>
  - continuous or discrete response; a discrete response can be binary/binomial, categorical or counts
<span style="display:block; margin-top: 10px ;"></span>
  - equally spaced or irregularly spaced
<span style="display:block; margin-top: 10px ;"></span>
  - same or different time points for each individual
<span style="display:block; margin-top: 10px ;"></span>
  - with or without missing data
<span style="display:block; margin-top: 10px ;"></span>
  - many or few time points, `\(T\)`
<span style="display:block; margin-top: 10px ;"></span>
  - many or few individuals or units, `\(n\)`

---

# Analysing longitudinal data

- There are many different ways to analyse longitudinal data
<span style="display:block; margin-top: 20px ;"></span>
- This is a very big field, so we have to be selective
<span style="display:block; margin-top: 20px ;"></span>
- The key feature of longitudinal data is the need to account for the dependence structure of the data
<span style="display:block; margin-top: 20px ;"></span>
- Two common methods:
  - random effects (hierarchical) models
<span style="display:block; margin-top: 10px ;"></span>
  - autoregressive models
<span style="display:block; margin-top: 20px ;"></span>
- Here, we will focus on random effects models

---
name: example

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **Example: sleep study**]]]

---

# Sleep study

- Belenky, Wesensten, Thorne, Thomas, Sing, Redmond, Russo, and Balkin (2003) describe a study of reaction time in subjects under sleep deprivation for up to 10 days.
<span style="display:block; margin-top: 20px ;"></span>
- 18 subjects followed for 10 days
<span style="display:block; margin-top: 20px ;"></span>
- Subjects' average reaction time (in ms) over a series of tasks was recorded at each measurement
<span style="display:block; margin-top: 20px ;"></span>

``` r
> library(lme4)
> data(sleepstudy)
> head(sleepstudy)
```

```
  Reaction Days Subject
1 249.5600    0     308
2 258.7047    1     308
3 250.8006    2     308
4 321.4398    3     308
5 356.8519    4     308
6 414.6901    5     308
```

<span style="display:block; margin-top: 20px ;"></span>
- Reaction time will be rescaled by dividing by 1000, to express it in seconds

``` r
> sleepstudy$Reaction <- sleepstudy$Reaction / 1000
```

---

# Sleep Study: long vs wide format

- In `R-INLA` it is extremely easy to work with longitudinal data, as long as the dataset is in *long format*

.panelset[
.panel[.panel-name[Long format]

```
    Reaction Days Subject
1  0.2495600    0     308
2  0.2587047    1     308
3  0.2508006    2     308
4  0.3214398    3     308
5  0.3568519    4     308
6  0.4146901    5     308
7  0.3822038    6     308
8  0.2901486    7     308
9  0.4305853    8     308
10 0.4663535    9     308
```

- Note that a measurement is present only if the corresponding subject has been observed at that time point
]
.panel[.panel-name[Wide format]

```
   Subject Reaction.0 Reaction.1 Reaction.2 Reaction.3 Reaction.4
1      308  0.2495600  0.2587047  0.2508006  0.3214398  0.3568519
11     309  0.2227339  0.2052658  0.2029778  0.2047070  0.2077161
21     310  0.1990539  0.1943322  0.2343200  0.2328416  0.2293074
31     330  0.3215426  0.3004002  0.2838565  0.2851330  0.2857973
41     331  0.2876079  0.2850000  0.3018206  0.3201153  0.3162773
51     332  0.2348606  0.2428118  0.2729613  0.3097688  0.3174629
61     333  0.2838424  0.2895550  0.2767693  0.2998097  0.2971710
71     334  0.2654731  0.2762012  0.2433647  0.2546723  0.2790244
81     335  0.2416083  0.2739472  0.2544907  0.2708021  0.2514519
91     337  0.3123666  0.3138058  0.2916112  0.3461222  0.3657324
```

- Note that with this format, should a subject not have measurements for the entire set of days, R would pad these out with
`NA`s
]
]

---

# Sleep Study: data

<img src="./img/unnamed-chunk-7-1.png" style="display: block; margin: auto;" width="50%">

---

# Sleep Study: objective

- Study objective: is the length of sleep deprivation a determinant of reaction time?
<span style="display:block; margin-top: 20px ;"></span>
- The variables we will use are:
<span style="display:block; margin-top: 10px ;"></span>
  - `\(y\)`: Reaction time (in s)
  - `\(t\)`: Days
<span style="display:block; margin-top: 20px ;"></span>
- For simplicity we will
<span style="display:block; margin-top: 10px ;"></span>
  - assume a linear relationship
<span style="display:block; margin-top: 20px ;"></span>
- The models we will consider are:
<span style="display:block; margin-top: 10px ;"></span>
  - a non-hierarchical model (standard linear regression) (LM)
<span style="display:block; margin-top: 10px ;"></span>
  - a hierarchical model with random intercepts (LMM)
<span style="display:block; margin-top: 10px ;"></span>
  - a hierarchical model with random intercepts and random slopes (LMM2)

---
name: model

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **Model specification**]]]

---

# Sleep Study: a Bayesian (non-hierarchical) linear model (LM)

- Specification:

1\. probability distribution for responses:
`$$y_{it} \sim \mbox{Normal}(\mu_{it},\sigma^2)$$`
  - `\(\color{blue}{y_{it}}\)` = the reaction time for individual `\(i\)` on day `\(t\)` `\((\mbox{days } 0,\ldots,9)\)`
<span style="display:block; margin-top: 10px ;"></span>
  - linear predictor: `\(\color{blue}{\mu_{it}}=\alpha+\beta t\)`
<span style="display:block; margin-top: 10px ;"></span>
  - `\(\color{blue}{t}\)` = the day of the measurement
<span style="display:block; margin-top: 20px ;"></span>
2\. .red[In this model no account is taken of the repeated structure (observations are nested within individuals)]
<span style="display:block; margin-top: 20px ;"></span>
3\.
Assume vague priors for all parameters:
`\begin{align*}
\alpha, \beta & \sim \mbox{Normal}(0,10000)\\
\frac{1}{\sigma^2} & \sim \mbox{Gamma}(1,0.001)
\end{align*}`

---

# Sleep Study: DAG for LM

<center><img src=./img/DAGS0.png width='80%' title='DAG for the linear model (LM)'></center>

---

# Sleep Study: `R-INLA` code for LM

.panelset[
.panel[.panel-name[Code]

``` r
> library(INLA)
> library(ggplot2)
> formula_LM <- Reaction ~ Days
> lm <- inla(formula_LM, data=sleepstudy, family="gaussian",
+            control.compute = list(waic=TRUE))
>
> # Plot
> ggplot(data = sleepstudy, aes(x=Days, y=Reaction)) + geom_point(aes(colour = Subject), alpha = .9) +
+   #geom_smooth(method="lm") +
+   geom_abline(intercept=lm$summary.fixed[1,1], slope=lm$summary.fixed[2,1]) +
+   labs(title = "Linear Model (LM)") +
+   theme(legend.position = "none")
```

]
.panel[.panel-name[Plot]

<img src="./img/unnamed-chunk-9-1.png" style="display: block; margin: auto;" width="45%">

- all the subjects share the same intercept and slope
]
]

---

# Sleep Study: a Bayesian hierarchical linear model (LMM)

- Modify LM to allow a separate intercept for each individual:
<span style="display:block; margin-top: 20px ;"></span>
`\begin{align*}
y_{it} & \sim \mbox{Normal}(\mu_{it},\sigma^2) \\
\mu_{it} & = \textcolor{red}{\alpha_i}+\beta t
\end{align*}`

We are assuming that *conditionally* on `\(\alpha_i\)`, `\(\{y_{it}, t=0,\ldots,9 \}\)` are independent
<span style="display:block; margin-top: 10px ;"></span>
- Assume that all the `\(\{\alpha_i\}\)` follow a *common* prior distribution, e.g.
`$$\color{red}{\alpha_i \sim \mbox{Normal}(\mu_{\alpha}, \sigma^2_{\alpha}) \;\;\;\; i=1,\ldots,18}$$`

- Here we are assuming exchangeability between all the individuals

- We may then assume vague priors for the *hyperparameters* of the population distribution:
`\begin{align*}
\color{red}{\mu_{\alpha}} & \color{red}{\sim \mbox{Normal}(0,10000)} \\
\color{red}{\frac{1}{\sigma^2_{\alpha}}} & \color{red}{\sim \mbox{Gamma}(1,0.001)}
\end{align*}`

- This is an example of a *Hierarchical LM*, or *Linear Mixed Model (LMM)*, or *Random Intercepts* model

---

# Sleep Study: DAG for LM and LMM

<center><img src=./img/DAGS.png width='80%' title='DAGs for the LM and the random intercept LMM'></center>

.pull-left[ (left): LM ]
.pull-right[ (right): LMM - random intercept ]

---

# Sleep Study: `R-INLA` code for LMM

.panelset[
.panel[.panel-name[Code]

``` r
> library(INLA)
> formula_LMM <- Reaction ~ -1 + Days + f(Subject, model="iid")
> lmm <- inla(formula_LMM, data=sleepstudy, family="gaussian",
+             control.compute = list(waic=TRUE))
>
> # Plot (with -1 in the formula, Days is the only fixed effect,
> # and the subject-specific intercepts are the random effects)
> p <- ggplot(data = sleepstudy, aes(x=Days, y=Reaction, group=Subject))
> p + geom_point(aes(colour = Subject), alpha = .9) +
+   geom_abline(slope=lmm$summary.fixed[1,1],
+               intercept=lmm$summary.random$Subject$mean) +
+   labs(title = "Random Intercept Model (LMM)") +
+   theme(legend.position = "none")
```

]
.panel[.panel-name[Plot]

<img src="./img/unnamed-chunk-11-1.png" style="display: block; margin: auto;" width="40%">

- each individual has a different regression line
- but all individuals have the same slope (parallel lines)
]
]

---
name: interpretation

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **Interpretation**]]]

---

# Sleep Study: revisiting the data

.pull-left[
<img src="./img/unnamed-chunk-12-1.png" style="display: block; margin: auto;" width="100%">
]
.pull-right[
- .blue[blue]: regression line for each subject;
- black: regression line from the LMM
<span style="display:block; margin-top: 20px ;"></span>
- Note that
some of the subjects do not fit well in the *parallel lines* scenario
<span style="display:block; margin-top: 20px ;"></span>
- .red[So we add random slopes to the hierarchical model]
]

---

# Sleep Study: adding random slopes (LMM2)

- Modify LMM to allow a separate slope for each individual:
`\begin{align*}
y_{it} & \sim \mbox{Normal}(\mu_{it},\sigma^2) \\
\mu_{it} & = \alpha_i+\color{red}{\beta_{i}}t
\end{align*}`

- As for the `\(\{\alpha_i\}\)`, assume that the `\(\{\beta_{i}\}\)` follow a *common* prior distribution, with vague priors on its *hyperparameters*

--

<span style="display:block; margin-top: 10px ;"></span>
- In `R-INLA`

``` r
> Subject2 <- sleepstudy$Subject
> formula_LMM2 <- Reaction ~ -1 + f(Subject, model="iid", constr=FALSE) +
+   f(Subject2, Days, model="iid", constr=FALSE)
> lmm2 <- inla(formula_LMM2, data=sleepstudy, family="gaussian",
+              control.compute = list(waic=TRUE))
> # To access the summaries of the beta_i
> lmm2$summary.random$Subject2[1:5,]
```

```
   ID        mean          sd    0.025quant    0.5quant  0.975quant        mode          kld
1 308 0.020669784 0.002755561  0.0152484767 0.020673643 0.026069246 0.020673687 3.306737e-09
2 309 0.002253148 0.002725737 -0.0030978834 0.002252863 0.007605816 0.002252865 3.069076e-09
3 310 0.005886208 0.002727767  0.0005290737 0.005886665 0.011240752 0.005886669 3.062921e-09
4 330 0.003006517 0.002726258 -0.0023451021 0.003006078 0.008360649 0.003006081 3.072187e-09
5 331 0.005133740 0.002727411 -0.0002214127 0.005133746 0.010488872 0.005133750 3.060754e-09
```

---

# Sleep Study: DAG for LM, LMM and LMM2

<center><img src=./img/DAGS2.png width='100%' title='DAGs for the LM, LMM and LMM2 models'></center>

.center[ (left): LM (center): LMM - random intercept (right): LMM2 - random intercept and slope ]

---

# Sleep Study: random intercepts and slopes (LMM2)

.pull-left[
<img src="./img/unnamed-chunk-14-1.png" style="display: block; margin: auto;" width="100%">
]
.pull-right[
- .red[LMM with random intercepts only:]
  - each individual has a different regression line
  - but
only the intercept varies by individual (the slope is common to all)
<span style="display:block; margin-top: 20px ;"></span>
- .red[LMM with random intercepts and random slopes:]
  - now intercepts and slopes both vary
  - better fit for each individual
]

---

# Which model is the best?

.pull-left[
- Comparing WAIC

``` r
> # Linear model
> lm$waic$waic
```

```
[1] -580.1714
```

``` r
> # Random intercept
> lmm$waic$waic
```

```
[1] -716.8285
```

``` r
> # Random intercept and slope
> lmm2$waic$waic
```

```
[1] -763.4412
```

- It looks like the model with random intercept and slope fits the data best (lowest WAIC)
]
.pull-right[
- We can look at the variance hyperparameters of the lmm2 model to see how variable the intercept and slope are across individuals

``` r
> # Random slope model
> # Posterior mean of the random intercept variance
> inla.emarginal(fun=function(x) 1/x,
+                lmm2$marginals.hyperpar[[2]])
```

```
[1] 0.06367934
```

``` r
> # Posterior mean of the random slope variance
> inla.emarginal(fun=function(x) 1/x,
+                lmm2$marginals.hyperpar[[3]])
```

```
[1] 0.0001472731
```

- There is some variability in both the intercepts and the slopes.
]

---

# References

Belenky, G., N. J. Wesensten, D. R. Thorne, et al. (2003). "Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study". In: _Journal of Sleep Research_ 12.1, pp. 1-12.

Gómez-Rubio, V. (2020a). _Bayesian inference with INLA_. CRC Press.
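---

# Extra: within-subject correlation implied by the LMM

One way to see how the random intercept model accounts for the repeated structure is to work out the covariance it induces between measurements on the same subject. The following is a sketch of the standard derivation under the LMM assumptions above (here `\(t \neq t'\)` are two distinct days for the same individual `\(i\)`, and `\(\alpha_i\)` is integrated out):

`\begin{align*}
\mbox{Var}(y_{it}) & = \sigma^2_{\alpha} + \sigma^2 \\
\mbox{Cov}(y_{it}, y_{it'}) & = \sigma^2_{\alpha} \\
\mbox{Corr}(y_{it}, y_{it'}) & = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2}
\end{align*}`

So, marginally, observations on the same subject are positively correlated, with a constant *intraclass correlation*; the non-hierarchical LM corresponds to the special case `\(\sigma^2_{\alpha} = 0\)` (independent observations).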