class: title-slide

# Session 3.1: Introduction to INLA and `R-INLA`<span style="display:block; margin-top: 10px ;"></span>

### Imperial College London

---
layout: true

.my-footer[ .alignleft[ © Marta Blangiardo | Monica Pirani ] .aligncenter[ MSc in Epidemiology ] .alignright[ Imperial College London ] ]

<style>
pre { overflow-x: auto; }
pre code { word-wrap: normal; white-space: pre; }
</style>

---

# Learning objectives

After this lecture you should be able to
<span style="display:block; margin-top: 40px ;"></span>
- Present the class of latent Gaussian models
<span style="display:block; margin-top: 40px ;"></span>
- Present the Laplace approximation and the INLA approach
<span style="display:block; margin-top: 40px ;"></span>
- Use the basic functions of the `R-INLA` package
<span style="display:block; margin-top: 40px ;"></span>

The topics treated in this lecture are presented in Chapter 4 of Blangiardo and Cameletti (2015).

---

# Outline

<span style="display:block; margin-top: 30px ;"></span>
1\. [MCMC to INLA](#mcmcINLA)
<span style="display:block; margin-top: 30px ;"></span>
2\. [Latent Gaussian models](#LGM)
<span style="display:block; margin-top: 30px ;"></span>
3\. [The INLA approach](#INLA)
<span style="display:block; margin-top: 30px ;"></span>
4\. [`R-INLA` package](#Package)

---
name: mcmcINLA

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **MCMC to INLA**]]]

---

# Why Markov Chain Monte Carlo?

- For all but trivial examples it will be difficult to draw an iid Monte Carlo sample directly from the posterior distribution.
  - This happens, for example, when the dimension of the parameter vector `\(\bm{\theta}\)` is high
  - Also, to use MC methods we must have a known form for the posterior distribution

--

<span style="display:block; margin-top: 10px ;"></span>
- Alternatively, *correlated* values can be (more easily) drawn via MCMC to approximate the posterior distribution of the parameter of interest
  - Instead of simulating independent values from the posterior distribution, we draw a sample by running a Markov chain whose stationary distribution is the posterior density `\(p(\theta\mid y)\)`

--

<span style="display:block; margin-top: 10px ;"></span>
- A sequence of values `\(\left\{\theta^{(1)},\ldots, \theta^{(m)}\right\}\)` generated from a Markov chain that has reached its stationary distribution (i.e. has converged) can be considered as an approximation to the posterior distribution and can be used to compute all the summaries of interest.

--

<span style="display:block; margin-top: 10px ;"></span>
- MCMC methods are very general and can effectively be applied to *any* model
  - Even if, **in theory**, MCMC can provide (nearly) exact inference (given perfect convergence and MC error `\(\rightarrow 0\)`), in practice this has to be balanced with model complexity and running time
  - This is an issue particularly for problems characterised by large data or very complex structure (e.g. hierarchical models)

---

# MCMC: Gibbs sampling

The .red[**Gibbs sampling**] (GS) algorithm is one of the most popular schemes for MCMC. Consider the case of a generic `\(J\)`-dimensional parameter set `\((\theta_1,\theta_2,\ldots,\theta_J)\)`:

1. Select a set of initial values `\(\color{blue}{(\theta^{(0)}_1,\theta^{(0)}_2,\ldots,\theta^{(0)}_J)}\)`
2. Sample `\(\color{blue}{\theta^{(1)}_1}\)` from the conditional distribution `\(\color{blue}{ p(\theta_1\mid \theta^{(0)}_2,\theta^{(0)}_3,\ldots,\theta^{(0)}_J,y)}\)`;
   Sample `\(\color{blue}{\theta^{(1)}_2}\)` from the conditional distribution `\(\color{blue}{p(\theta_2\mid \theta^{(1)}_1,\theta^{(0)}_3,\ldots,\theta^{(0)}_J,y)}\)`;
   ...;
   Sample `\(\color{blue}{\theta^{(1)}_J}\)` from the conditional distribution `\(\color{blue}{p(\theta_J\mid \theta^{(1)}_1,\theta^{(1)}_2,\ldots,\theta^{(1)}_{J-1},y)}\)`

--

3. Repeat step 2. `\(S\)` times, until convergence to the target distribution `\(\color{blue}{p(\bm\theta\mid y)}\)` is reached
4. Use the sample from the target distribution to compute all relevant statistics: (posterior) mean, variance, credible intervals, etc.

If the *full conditionals* are not available in closed form, the corresponding draws need to be obtained by other means (e.g. via Metropolis-Hastings steps) within the GS

Accessible references for MCMC are
- Blangiardo and Cameletti (2015), Chapter 4
- Johnson, Ott, and Dogucu (2022), Chapters 6-7
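---
count: false

# MCMC: Gibbs sampling

As a toy illustration of steps 1.-4., here is a minimal, self-contained `R` sketch (all values are made up for illustration) for a bivariate Normal target with correlation `\(\rho\)`, whose full conditionals are available in closed form, `\(\theta_1\mid\theta_2, y \sim \mbox{Normal}(\rho\theta_2, 1-\rho^2)\)`:

``` r
set.seed(1)
S <- 5000; rho <- 0.8
theta <- matrix(NA, nrow = S, ncol = 2)
theta[1, ] <- c(-3, 3)                  # 1. initial values
for (s in 2:S) {                        # 2.-3. iterate the full conditionals
  theta[s, 1] <- rnorm(1, rho * theta[s - 1, 2], sqrt(1 - rho^2))
  theta[s, 2] <- rnorm(1, rho * theta[s, 1], sqrt(1 - rho^2))
}
colMeans(theta[-(1:500), ])             # 4. summaries (after burn-in)
```

Discarding the first draws as burn-in, the averages approach the target means `\((0,0)\)`; the plots on the next slides illustrate this kind of convergence behaviour.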
---

# MCMC: convergence

<center><img src=./img/NormGibbs1.jpeg width='50%' title=''></center>

---
count: false

# MCMC: convergence

<center><img src=./img/NormGibbs2.jpeg width='53%' title=''></center>

---
count: false

# MCMC: convergence

<center><img src=./img/NormGibbs3.jpeg width='50%' title=''></center>

---

# From MCMC to INLA

- *Standard* MCMC samplers are generally easy-ish to program and are in fact implemented in readily available software

- MCMC methods are flexible and able to deal with virtually any type of data and model, but they involve computationally and time intensive simulations to obtain the posterior distribution for the parameters. For this reason, model complexity and data dimension often remain fundamental issues.

--

- The INLA algorithm proposed by Rue, Martino, and Chopin (2009) is a .red[*deterministic*] algorithm for Bayesian inference: it represents an alternative to MCMC, which is instead a simulation-based algorithm.

--

- The INLA algorithm is designed for the class of .red[*latent Gaussian models*] and, compared to MCMC, it provides (as) accurate results in a shorter time.

--

.pull-left[
- INLA has become very popular among statisticians and applied researchers, and in the past few years the number of papers reporting usage and extensions of the INLA method has increased considerably.
]
.pull-right[
<center><img src=./img/citations.png width='80%' title=''></center>
]

---

# INLA website and community

- The website contains source code, examples, papers and reports discussing the theory and applications of INLA.
- There is also a discussion forum where users can post queries and requests for help.
- Almost every year there is an INLA-related scientific meeting.
<center><img src=./img/INLA_website.png width='50%' title=''></center>

.center[[INLA website](https://www.r-inla.org)]

---
name: LGM

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **Latent Gaussian models**]]]

---

# Latent Gaussian models (LGMs)

- The general problem of (parametric) inference is posed by assuming a probability model for the observed data `\(\boldsymbol y=\left(y_1,\ldots, y_n\right)\)`, as a function of some relevant parameters

<span style="display:block; margin-top: -10px ;"></span>
.myblue[$$\boldsymbol{y} \mid \boldsymbol{\theta},\boldsymbol\psi \sim p(\boldsymbol{y} \mid \boldsymbol{\theta},\boldsymbol\psi) = \prod_{i=1}^n p(y_i \mid \boldsymbol{\theta},\boldsymbol\psi)$$]

--

<span style="display:block; margin-top: -10px ;"></span>
- Often (in fact for a surprisingly large range of models!), we can assume that the parameters are described by a .red[**Gaussian Markov Random Field** (GMRF)]

.myblue[ `$$\boldsymbol{\theta} \mid \boldsymbol\psi \sim \text{Normal}(\boldsymbol 0,\boldsymbol{Q^{-1}}(\boldsymbol\psi))$$`]
.myblue[ `$$\theta_i \perp\!\!\!\perp \theta_j \mid \boldsymbol\theta_{-i,j} \Longleftrightarrow Q_{ij}(\boldsymbol\psi)=0$$` ]

<span style="display:block; margin-top: -20px ;"></span>
where
- The precision matrix `\(\bm Q\)` depends on some hyperparameters `\(\bm\psi\)`
- The notation `\(-i,j\)` indicates all the other elements of the parameter vector, excluding elements `\(i\)` and `\(j\)`
- The components of `\(\bm \theta\)` are assumed to be *conditionally independent*, with the consequence that `\(\bm Q\)` is a sparse precision matrix
- This kind of model is often referred to as a .olive[**Latent Gaussian model**].
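---
count: false

# Latent Gaussian models (LGMs)

The sparsity of `\(\bm Q\)` is the Markov property in matrix form. As a minimal sketch of this (the RW1 model and `\(\tau=1\)` are just illustrative choices; base `R` only):

``` r
# Precision matrix Q of a first-order random walk (RW1) of length 5,
# with increments theta_{i+1} - theta_i ~ Normal(0, 1/tau)
tau <- 1
D <- diff(diag(5))       # 4 x 5 matrix of first differences
Q <- tau * crossprod(D)  # Q = tau * t(D) %*% D, a tridiagonal matrix
Q                        # Q[i, j] = 0 whenever |i - j| > 1
```

The zero entries correspond exactly to conditional independence: `\(\theta_i\)` and `\(\theta_j\)` are independent given all the other components whenever `\(Q_{ij}=0\)`, and this sparsity is what makes computation fast.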
---

# LGMs as a general framework

- In general

.myblue[ `\begin{align*} \bm y \mid \bm \theta,\bm\psi &\sim \prod_i p(y_i\mid\bm\theta,\bm\psi)\class{black}{\mbox{ (Data model)}}\\ \bm\theta \mid \bm\psi &\sim p(\bm\theta\mid\bm\psi) = \mbox{Normal}(0,\bm Q^{-1}(\bm\psi)) \class{black}{\mbox{ (Latent Gaussian Field)}}\\ \bm\psi &\sim p(\bm\psi) \class{black}{\mbox{ (Hyperprior)}} \end{align*}` ]

--

<span style="display:block; margin-top: 30px ;"></span>
- The dimension of `\(\class{myblue}{\bm\theta}\)` can be very large (e.g. `\(10^2\)`-`\(10^5\)`).

<span style="display:block; margin-top: 10px ;"></span>
- Conversely, the dimension of `\(\class{myblue}{\bm\psi}\)` must be relatively small, to avoid an exponential increase in the computational costs of the model.

---

# LGMs as a general framework

<span style="display:block; margin-top: -10px ;"></span>
- A very general way of specifying the problem is to specify a distribution for `\(\class{myblue}{y_i}\)` characterised by a parameter `\(\class{myblue}{\phi_i}\)` (usually the mean), defined as a function of a structured additive predictor `\(\class{myblue}{\eta_i}\)` on a suitable scale, such that `\(\class{myblue}{g(\phi_i)=\eta_i}\)` (e.g. the logit link for binomial data):

.myblue[ `$$\eta_i = \beta_0 + \sum_{m=1}^M \beta_m x_{mi} + \sum_{l=1}^L f_l(z_{li})$$` ]

where
- `\(\beta_0\)` is the intercept;
- `\(\bm\beta=\{\beta_1,\ldots,\beta_M\}\)` quantify the effect of the covariates `\(\bm{x}=(\bm{x_1},\ldots, \bm{x_M})\)` on the response;
- `\(\bm{f}=\{f_1(\cdot),\ldots,f_L(\cdot)\}\)` is a set of functions defined in terms of some covariates `\(\bm{z}=(\bm z_1,\ldots, \bm z_L)\)`;

and then assume

`$$\class{red}{\bm\theta=\{\beta_0,\bm \beta,\bm f \} \sim \mbox{Normal}(\bm 0,\bm Q^{-1}(\bm\psi)) = \mbox{GMRF}(\bm\psi)}$$`

--

- **NB**: This of course implies some form of Normally-distributed marginals for `\(\beta_0,\bm{\beta}\)` and `\(\bm{f}\)`

---

# LGMs as a general framework --- examples

<span style="display:block; margin-top: -10px ;"></span>
Upon varying the form of the functions `\(\class{myblue}{f_l(\cdot)}\)`, this formulation can accommodate a wide range of models

- Standard regression
  - `\(\class{myblue}{f_l(\cdot) = \text{NULL}}\)`

--

- Hierarchical models
  - `\(\class{myblue}{f_l(\cdot) \sim \mbox{Normal}(0,\sigma^2_f)}\)` (Exchangeable)
  - `\(\class{myblue}{\sigma^2_f\mid \bm\psi \sim}\)` some common distribution

--

- Spatial and spatio-temporal models
  - Areal data: `\(\class{myblue}{f_1(\cdot) \sim \mbox{CAR}}\)` (Spatially structured effects); `\(\class{myblue}{f_2(\cdot) \sim \mbox{Normal}(0,\sigma^2_{f_2})}\)` (Unstructured residual)
  - Geostatistical data: `\(\class{myblue}{f(\cdot) \sim \mbox{Gaussian field}}\)`
  - Temporal component: `\(\class{myblue}{f(\cdot) \sim \mbox{RW}}\)`

--

- Spline smoothing
  - `\(\class{myblue}{f_l(\cdot) \sim \mbox{AR}(\phi,\sigma^2_\varepsilon)}\)`

--

- Survival models, log-Gaussian Cox processes, etc.

<!-- # MCMC and LGMs -->
<!-- - (Standard) MCMC methods can perform poorly when applied to (non-trivial) LGMs. This is due to several factors -->
<!-- <span style="display:block; margin-top: 20px ;"></span> -->
<!-- - The components of the latent Gaussian field `\(\class{myblue}{\bm{\theta}}\)` tend to be highly correlated, thus impacting on convergence and autocorrelation -->
<!-- <span style="display:block; margin-top: 20px ;"></span> -->
<!-- - Especially when the number of observations is large, `\(\class{myblue}{\bm{\theta}}\)` and `\(\class{myblue}{\bm\psi}\)` also tend to be highly correlated -->
<!-- <span style="display:block; margin-top: 20px ;"></span> -->
<!-- - Time to run can be very long -->
<!-- - Blocking and overparameterisation can **alleviate**, but rarely eliminate the problem -->

---
name: INLA

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **The INLA approach**]]]

---

# Integrated Nested Laplace Approximation (INLA)

- The first *ingredient* of the INLA approach is the definition of conditional probability, which holds for any pair of variables `\((x,z)\)`. Technically, provided `\(p(z)>0\)`,

`$$\class{myblue}{p(x\mid z) := \frac{p(x,z)}{p(z)} \rightarrow p(x,z)=p(x\mid z)p(z)}$$`

so that `\(\class{myblue}{p(z)}\)` can be re-written as

`$$\class{myblue}{p(z) = \frac{p(x,z)}{p(x\mid z)}}$$`

--

- In particular, a conditional version can be obtained by further considering a third variable `\(w\)`, as

`$$\class{myblue}{p(z\mid w) = \frac{p(x,z\mid w)}{p(x\mid z,w)}}$$`

which is particularly relevant to the Bayesian case.

--

- The second *ingredient* is the .red[Laplace approximation].
**Main idea**: approximate the logarithm of a generic function `\(\class{myblue}{f(x)}\)` with a quadratic function, by means of a Taylor series expansion around the mode `\(x^*=\arg\!\max_x \log f(x)\)`. Thus, under the LA, `\(\class{red}{f(x)\approx \mbox{Normal}(x^*, {\sigma^2}^*)}\)`.
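---
count: false

# Integrated Nested Laplace Approximation (INLA)

A minimal numerical sketch of the Laplace approximation (the Gamma(4,1) target and all values are purely illustrative choices):

``` r
# target: f(x) = dgamma(x, 4, 1), so log f(x) = 3*log(x) - x + const
shape <- 4; rate <- 1
xstar <- (shape - 1) / rate        # mode: argmax of log f(x), here 3
# curvature: (d^2/dx^2) log f(x) = -(shape - 1)/x^2, so at the mode
sigma2 <- xstar^2 / (shape - 1)    # variance of the approximating Normal
curve(dgamma(x, shape, rate), 0, 12, ylab = "density")     # exact f(x)
curve(dnorm(x, xstar, sqrt(sigma2)), add = TRUE, lty = 2)  # Laplace approx.
```

The approximation is exact for Gaussian targets and accurate near the mode, which is why it works so well for the nearly-Gaussian conditionals arising in LGMs.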
---

# INLA objectives

- In a Bayesian LGM, the required distributions are

`$$\class{myblue}{p(\theta_i\mid\boldsymbol{y})} \class{black}{= \int p(\theta_i,\boldsymbol\psi\mid \boldsymbol{y}) d\boldsymbol\psi = \int} \class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})} \class{orange}{p(\theta_i \mid \boldsymbol\psi,\boldsymbol{y})}\class{black}{d\boldsymbol\psi}$$`

`$$\class{myblue}{p(\psi_k\mid\boldsymbol{y})} \class{black}{ = \int}\class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})}\class{black}{d\boldsymbol\psi_{-k}}$$`

--

- Thus we need to estimate:

1\. `\(\class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})}\)`, from which all the relevant marginals `\(\class{myblue}{p(\psi_k\mid\boldsymbol{y})}\)` can also be obtained;

--

2\. `\(\color{orange}{p(\theta_i \mid \boldsymbol\psi,\boldsymbol{y})}\)`, which is needed to compute the marginal posterior for the parameters

---

# Integrated Nested Laplace Approximation (INLA)

1\. can be easily estimated as

`\begin{align} \class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})} & = \frac{p(\boldsymbol\theta,\boldsymbol\psi\mid\boldsymbol{y})}{p(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})} \propto \frac{p(\boldsymbol\psi)p(\boldsymbol\theta\mid\boldsymbol\psi)p(\boldsymbol{y}\mid\boldsymbol\theta,\boldsymbol\psi)}{p(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})}\\ & \approx \left. \frac{p(\boldsymbol\psi)p(\boldsymbol\theta\mid\boldsymbol\psi)p(\boldsymbol{y}\mid\boldsymbol\theta,\boldsymbol\psi)}{\tilde{p}(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})}\right |_{\boldsymbol\theta={\boldsymbol\theta}^*(\boldsymbol\psi)} =: \class{red}{\tilde{p}(\boldsymbol\psi\mid\boldsymbol{y})} \end{align}`

<span style="display:block; margin-top: -10px ;"></span>
- `\(\tilde{p}(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})\)` is the Gaussian approximation of `\(p(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})\)` given by the Laplace method
- `\(\boldsymbol\theta={\boldsymbol\theta}^*(\boldsymbol\psi)\)` is its mode for a given `\(\boldsymbol \psi\)`

<span style="display:block; margin-top: 20px ;"></span>
2\. is slightly more complex, because in general there will be more elements in `\(\boldsymbol\theta\)` than there are in `\(\boldsymbol\psi\)` and thus this computation is more expensive. Three possible approaches to approximate `\(\color{orange}{p(\theta_i \mid \boldsymbol\psi,\bm{y})}\)`:

- **Gaussian Approximation**
<span style="display:block; margin-top: 20px ;"></span>
- **Full Laplace Approximation**
<span style="display:block; margin-top: 20px ;"></span>
- **Simplified Laplace Approximation** (implemented by default by `R-INLA`)

---

# INLA – in a nutshell...

.pull-left[ .medium[
<ol style="counter-reset: my-counter 0;">
<li>Select a grid of \(H\) points \(\{\bm\psi_h^*\}\) and area weights \(\{\Delta_h\}\); interpolate the density to approximate the posterior</li>
</ol>
]
<span style="display:block; margin-top: 18px ;"></span>
<center><img src=./img/inla1-1.png width='80%' title=''></center>
]

--

.pull-right[ .medium[
<ol style="counter-reset: my-counter 1;">
<li> Approximate the conditional posterior of each \(\theta_j\), given \(\bm\psi\) and \(\bm{y}\), on the grid of \(H\) points</li>
</ol>
]
<span style="display:block; margin-top: 50px ;"></span>
<center><img src=./img/inla2-1.png width='85%' title=''></center>
]

---
count: false

# INLA – in a nutshell...
.pull-left[ .medium[
<ol style="counter-reset: my-counter 2;">
<li> Weight the conditional marginal posteriors by the density associated with each \(\psi_h^*\) <span style="display:block; margin-top: 20px ;"></span> </li>
</ol>
]
<span style="display:block; margin-top: 10px ;"></span>
<center><img src=./img/inla3-1.png width='85%' title=''></center>
]

--

.pull-right[ .medium[
<ol style="counter-reset: my-counter 3;">
<li> (Numerically) sum over all the conditional densities to obtain the marginal posterior for \(\theta_j\) <span style="display:block; margin-top: 50px ;"></span> </li>
</ol>
]
<span style="display:block; margin-top: -10px ;"></span>
<center><img src=./img/inla4-1.png width='85%' title=''></center>
]

More details on the estimation procedure are available in Chapter 2 of Gómez-Rubio (2020)

---
name: Package

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **The `R-INLA` package**]]]

---

# The `INLA` package for `R`

- Good news is that all the procedures needed to perform INLA are implemented in an `R` package. It effectively consists of two components

<span style="display:block; margin-top: 20px ;"></span>

--

1\. The .red[`GMRFLib`] library
  - A `C` library for fast and exact simulation of GMRFs

--

2\. The .blue[`inla`] program
  - A standalone `C` program built upon the `GMRFLib` library (it performs the relevant computation and returns the results in a standardised way)

<span style="display:block; margin-top: 20px ;"></span>

--

**NB**: Because the package `R-INLA` relies on a standalone `C` program (among other reasons), it is not available directly from `CRAN`.

<span style="display:block; margin-top: 20px ;"></span>
`R-INLA` runs natively under Linux, Windows and Mac, and it is possible to do multi-threading using `OpenMP`

---

# The `INLA` package for `R`: installation

- From `R`, installation of the stable version is performed by typing

`install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)`

<span style="display:block; margin-top: 20px ;"></span>
- Later, you can upgrade the package by typing

`library(INLA)`
`inla.upgrade()`

<span style="display:block; margin-top: 20px ;"></span>
- A test version (which may contain unstable updates/new functions) can be obtained by typing

`inla.upgrade(testing=TRUE)`

<span style="display:block; margin-top: 20px ;"></span>
- Type `inla.version()` to find out the installed version

---

# Step by step guide to using `R-INLA`

1\. The first thing to do is to .red[**specify the model**]

- For example, assume we have a generic model

.myblue[ `\begin{align*} y_i & \stackrel{\small{iid}}{\sim} p(y_i \mid \theta_i) \\ \eta_i & = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + f(z_i) \end{align*}` ]

where
- `\(\color{olive}{\bm x = (\bm x_1,\bm x_2)}\)` are observed covariates
- `\(\color{olive}{\bm \beta = (\beta_0,\beta_1,\beta_2)\sim \mbox{Normal}(0,\tau_1^{-1})}\)` are unstructured (*fixed*) effects
- `\(\color{olive}{\bm z}\)` is an **index**. This can be used to include structured (*random*), spatial or spatio-temporal effects, etc.
- `\(\color{olive}{f \sim \mbox{Normal}(0,\bm{Q}^{-1}_f(\tau_2))}\)` is a suitable function used to model the structured effects

--

<span style="display:block; margin-top: 20px ;"></span>
- As mentioned earlier, this formulation can actually be used to represent quite a wide class of models! (A simulated example of such data is sketched on the next slide)
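---
count: false

# Step by step guide to using `R-INLA`

- To make the next steps concrete, here is a minimal sketch that simulates data consistent with the generic model above (all parameter values are made up, and `f` is taken to be a simple iid effect, purely for illustration):

``` r
set.seed(1)
n  <- 100
x1 <- rnorm(n)                          # continuous covariate
x2 <- rbinom(n, 1, 0.5)                 # binary covariate
z  <- rep(1:10, each = 10)              # index for the structured effect
f  <- rnorm(10, sd = 0.5)               # iid structured effect f(z)
eta <- 1 + 0.5 * x1 - 0.3 * x2 + f[z]   # linear predictor
y   <- rnorm(n, mean = eta, sd = 0.1)   # Gaussian observations
```

- The vectors `y`, `x1`, `x2` and `z` match the objects used in the `formula` and `inla()` calls on the following slides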
---

# Step by step guide to using `R-INLA`

- The model is translated into `R` code using a .red[`formula`]
  - This is sort of standard in `R` (you would do pretty much the same for calls to functions such as `lm` or `glm`)

.olive[`formula = y `] `\(\color{olive}{\sim}\)` .olive[`1 + x1 + x2 + f(name=z, model="...", hyper=...)`]

--

<span style="display:block; margin-top: -10px ;"></span>
- The .olive[`f()`] function can account for several structured nonlinear effects. We have, for example:
  - .olive[`iid`] specifies independent random effects
  - .olive[`rw1`], .olive[`rw2`], .olive[`ar1`] specify smooth effects of covariates or time effects
  - .olive[`besag`] models spatially structured effects (CAR)
  - .olive[`generic`] allows a user-defined precision matrix
  - Type .olive[`inla.list.models("latent")`] for the complete list and find descriptions at .olive[[https://www.r-inla.org/documentation](https://www.r-inla.org/documentation)]
  - Some options can be specified for the .olive[`f()`] term: for example .olive[`hyper`] is used to specify the prior on the hyperparameters (more later).

--

<span style="display:block; margin-top: -10px ;"></span>
- It is possible to include several .olive[`f()`] terms in the `formula`, specifying them separately, e.g.

.olive[`formula <- y`] `\(\color{olive}{\sim}\)` .olive[`x1 + x2 + f(z1,...) + f(z2,...) + f(z3,...)`]

---

# Step by step guide to using `R-INLA`

2\. Call the function .olive[`inla`] to fit the model, specifying the data and options (more on this later), e.g.

.olive[`m = inla(formula, family="...", data=data.frame(y,x1,x2,z))`]

<span style="display:block; margin-top: 20px ;"></span>

--

- The data need to be included in a suitable .olive[`data.frame`]

<span style="display:block; margin-top: 20px ;"></span>

--

- The distribution of the data (i.e. the likelihood) is specified with the .olive[`family`] option.
  - Type .olive[`inla.list.models("likelihood")`] for the complete list of likelihood functions (we have .olive[`poisson, binomial, gamma, beta, gaussian`] and many others).

<span style="display:block; margin-top: 20px ;"></span>

--

- Visit .olive[[https://www.r-inla.org/documentation](https://www.r-inla.org/documentation)] for the complete description

---

# Step by step guide to using `R-INLA`

The .olive[`control.xxx=list(...)`] statements in the .olive[`inla`] function control various parts of the INLA program:

- .olive[`control.compute`]: for computing measures of fit (e.g. DIC)
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.predictor`]: for specifying the *Observation matrix* `\(\bm A\)` which links the latent field to the data
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.family`]: for changing the prior distribution of the likelihood hyperparameters
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.fixed`]: for changing the prior distribution of the fixed effects
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.inla`]: for changing the strategy used for the approximations ('gaussian', 'simplified.laplace' (default) or 'laplace') or the grid exploration strategy
<span style="display:block; margin-top: 20px ;"></span>
- and many others for expert use.

---

# Step by step guide to using `R-INLA`

`R` returns an object .olive[`m`] of class `inla`, for which some methods are available, for example .olive[`summary()`] and .olive[`plot()`].
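For instance, putting the previous steps together on the simulated data from before (a sketch: the `iid` latent model and the `control.compute` option are just illustrative choices):

``` r
formula <- y ~ 1 + x1 + x2 + f(z, model = "iid")
m <- inla(formula, family = "gaussian",
          data = data.frame(y, x1, x2, z),
          control.compute = list(dic = TRUE))  # also return the DIC
summary(m)  # fixed effects, hyperparameters, DIC, running time
plot(m)     # posterior marginals of the model components
```

---
count: false

# Step by step guide to using `R-INLA`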
<span style="display:block; margin-top: 20px ;"></span>
Some of the elements in an `inla` object returned by a call to `inla()` are:

<center><img src=./img/Table_out_inla.png width='60%' title=''></center>

.center[Source: Gómez-Rubio (2020), Section 2.3.1]

---

# Toy example

We consider the .olive[`iris`] dataset included in the .olive[`R`] datasets (see .olive[`?iris`]), which reports the measurements in centimeters of the variables *sepal length* and *width* and *petal length* and *width* for 50 flowers from each of 3 species of iris (*Iris setosa*, *versicolor* and *virginica*). See Section 2.6 of Gómez-Rubio (2020).

``` r
> summary(iris)
```

```
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
```

We specify a simple regression model with `Petal.Length` and `Petal.Width` as dependent and independent variables, respectively:

`\begin{align*} \text{Petal.Length}_i&\sim \text{Normal}(\eta_i,\sigma^2_e)\\ \eta_i &= \beta_0 + \beta_1\text{ Petal.Width}_i \end{align*}`

---

# Toy example: run INLA + exploring the fixed effects output

<span style="display:block; margin-top: -15px ;"></span>

``` r
> library(INLA)
> formula <- Petal.Length ~ 1 + Petal.Width
> output <- inla(formula, family="gaussian", data=iris)
```

``` r
> output$summary.fixed
```

```
                mean         sd 0.025quant 0.5quant 0.975quant     mode
(Intercept) 1.083565 0.07287746  0.9404618 1.083565   1.226669 1.083565
Petal.Width 2.229935 0.05133317  2.1291358 2.229935   2.330733 2.229935
                     kld
(Intercept) 3.637008e-09
Petal.Width 3.636986e-09
```

- For each unstructured *fixed* effect, `R-INLA` reports a set of summary statistics from the posterior distribution.

--

- The value of the Kullback-Leibler divergence `kld` describes the difference between the Gaussian approximation and the Simplified Laplace Approximation (SLA) to the marginal posterior densities:
  - Small values indicate that the posterior distribution is well approximated by a Normal distribution
  - If so, the more sophisticated SLA gives a *good* error rate and therefore there is no need to use the more computationally intensive *full* Laplace approximation.

---

# Exploring the output: hyperparameters

``` r
> output$summary.hyperpar
```

```
                                            mean        sd 0.025quant 0.5quant
Precision for the Gaussian observations 4.431767 0.5117038   3.486737 4.412182
                                        0.975quant     mode
Precision for the Gaussian observations   5.490169 4.372837
```

- For each hyperparameter, the summary statistics are reported to describe the posterior distribution.

- **NB**: `INLA` reports results on the **precision** scale (more on this later).
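- For a quick interpretable summary, the precision marginal can be transformed, e.g. to obtain the posterior mean of the residual standard deviation `\(\sigma_e\)` (a sketch; the next slides cover the manipulation of marginals in detail):

``` r
> prec_post <- output$marginals.hyperpar[[1]]
> inla.emarginal(fun=function(x) 1/sqrt(x),
+                marg=prec_post)  # roughly 1/sqrt(4.43), i.e. about 0.48
```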
---

# Manipulating the marginals from `R-INLA`: fixed effects

.panelset[
.panel[.panel-name[marginals1]
.pull-left[

``` r
> names(output$marginals.fixed)
```

```
[1] "(Intercept)" "Petal.Width"
```

``` r
> beta1_post <- output$marginals.fixed[[2]]
> marg <- inla.smarginal(beta1_post)
> q <- inla.qmarginal(0.05,beta1_post)
```

``` r
> plot(marg,t="l",
+      ylab="",xlab="",
+      main=expression(paste("p(",beta[1], "| y)")))
> polygon(c(marg$x[marg$x <= q ], q),
+         c(marg$y[marg$x <= q ], 0),
+         col = "slateblue1", border = 1)
```
]
.pull-right[
<img src="./img/explore_plot_out-1.png" >
]
]
.panel[.panel-name[marginals2]
.small[.pull-left[

``` r
> inla.pmarginal(q,beta1_post)
```

```
[1] 0.05
```

``` r
> d <- inla.dmarginal(q,beta1_post)
> d
```

```
[1] 1.994065
```

``` r
> inla.rmarginal(4, beta1_post)
```

```
[1] 2.257847 2.290692 2.231191 2.252115
```
] ]
.pull-right[
<img src="./img/explore_plot1_out-1.png" >
]
]
]

---

# Manipulating the marginals from `R-INLA`: hyperparameters

`INLA` works with the precision by default

.panelset[
.panel[.panel-name[Precision]
.pull-left[

``` r
> names(output$marginals.hyperpar)
```

```
[1] "Precision for the Gaussian observations"
```

``` r
> prec_post <- output$marginals.hyperpar[[1]]
```

``` r
> plot(inla.smarginal(prec_post),t="l",
+      ylab="",xlab="",
+      main=expression(paste("p(",1/sigma[e]^2, "| y)")))
```
]
.pull-right[
<img src="./img/explore_plot_hyp_out-1.png" >
]
]
.panel[.panel-name[Variance]
.pull-left[

``` r
> var_post = inla.tmarginal(fun=function(x)
+   1/x, mar=prec_post)
> inla.emarginal(fun=function(x) 1/x,
+                marg=prec_post)
```

```
[1] 0.2286684
```

``` r
> plot(inla.smarginal(var_post),t="l",
+      ylab="",xlab="",
+      main=expression(paste("p(",sigma[e]^2, "| y)")))
```
]
.pull-right[
<img src="./img/explore_plot_hyp1_out-1.png" >
]
]
]

---

# Summary

The INLA approach is not a rival or replacement of MCMC; it is just a better option for the class of LGMs.

<span style="display:block; margin-top: 20px ;"></span>
- The basic idea behind the INLA procedure is simple
  - Repeatedly use the Laplace approximation and take advantage of the computational simplifications due to the structure of the model
  - Use numerical integration to compute the required posterior marginal distributions

<span style="display:block; margin-top: 20px ;"></span>
- Complications are mostly computational and occur when
  - extending to a large number of hyperparameters
  - dealing with markedly non-Gaussian observations

---

# References

Blangiardo, M. and M. Cameletti (2015). _Spatial and spatio-temporal Bayesian models with R-INLA_. John Wiley & Sons.

Gómez-Rubio, V. (2020). _Bayesian inference with INLA_. CRC Press.

Johnson, A. A., M. Q. Ott, and M. Dogucu (2022). _Bayes Rules!: An Introduction to Applied Bayesian Modeling_. CRC Press.

Rue, H., S. Martino, and N. Chopin (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion)". In: _J. R. Statist. Soc. B_ 71, pp. 319-392.