class: title-slide

# Session 3.1: Introduction to INLA and `R-INLA`<span style="display:block; margin-top: 10px ;"></span>

### Imperial College London

---
layout: true

.my-footer[ .alignleft[ © Marta Blangiardo | Monica Pirani ] .aligncenter[ MSc in Epidemiology ] .alignright[ Imperial College London ] ]

<style>
pre { overflow-x: auto; }
pre code { word-wrap: normal; white-space: pre; }
</style>

---

# Learning objectives

After this lecture you should be able to
<span style="display:block; margin-top: 40px ;"></span>
- Present the class of latent Gaussian models
<span style="display:block; margin-top: 40px ;"></span>
- Present the Laplace approximation and the INLA approach
<span style="display:block; margin-top: 40px ;"></span>
- Use the basic functions of the `R-INLA` package
<span style="display:block; margin-top: 40px ;"></span>

The topics treated in this lecture are presented in Chapter 4 of Blangiardo and Cameletti (2015).

---

# Outline

<span style="display:block; margin-top: 30px ;"></span>
1\. [MCMC to INLA](#mcmcINLA)
<span style="display:block; margin-top: 30px ;"></span>
2\. [Latent Gaussian models](#LGM)
<span style="display:block; margin-top: 30px ;"></span>
3\. [The INLA approach](#INLA)
<span style="display:block; margin-top: 30px ;"></span>
4\. [`R-INLA` package](#Package)

---
name: mcmcINLA

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **MCMC to INLA**]]]

---

# Why Markov Chain Monte Carlo?

- For all but trivial examples it will be difficult to draw an iid Monte Carlo sample directly from the posterior distribution.
  - This happens, for example, when the dimension of the parameter vector `\(\bm{\theta}\)` is high
  - Also, to use MC methods we must have a known form for the posterior distribution

--

<span style="display:block; margin-top: 10px ;"></span>
- Alternatively, *correlated* values can be (more easily) drawn via MCMC to approximate the posterior distribution of the parameter of interest
  - Instead of simulating independent values from the posterior distribution, we draw a sample by running a Markov chain whose stationary distribution is the posterior density `\(p(\theta\mid y)\)`

--

<span style="display:block; margin-top: 10px ;"></span>
- A sequence of values `\(\left\{\theta^{(1)},\ldots, \theta^{(m)}\right\}\)` generated from a Markov chain that has reached its stationary distribution (i.e. has converged) can be considered as an approximation to the posterior distribution and can be used to compute all the summaries of interest.

--

<span style="display:block; margin-top: 10px ;"></span>
- MCMC methods are very general and can effectively be applied to *any* model
  - Even if, **in theory**, MCMC can provide (nearly) exact inference (given perfect convergence and MC error `\(\rightarrow 0\)`), in practice this has to be balanced with model complexity and running time
  - This is an issue particularly for problems characterised by large data or very complex structure (e.g. hierarchical models)

---

# MCMC: Gibbs sampling

The .red[**Gibbs sampling**] (GS) algorithm is one of the most popular schemes for MCMC. Consider the case of a generic `\(J\)`-dimensional parameter set `\((\theta_1,\theta_2,\ldots,\theta_J)\)`:

1. Select a set of initial values `\(\color{blue}{(\theta^{(0)}_1,\theta^{(0)}_2,\ldots,\theta^{(0)}_J)}\)`
2. Sample `\(\color{blue}{\theta^{(1)}_1}\)` from the conditional distribution `\(\color{blue}{ p(\theta_1\mid \theta^{(0)}_2,\theta^{(0)}_3,\ldots,\theta^{(0)}_J,y)}\)`;
   Sample `\(\color{blue}{\theta^{(1)}_2}\)` from the conditional distribution `\(\color{blue}{p(\theta_2\mid \theta^{(1)}_1,\theta^{(0)}_3,\ldots,\theta^{(0)}_J,y)}\)`;
   ...;
   Sample `\(\color{blue}{\theta^{(1)}_J}\)` from the conditional distribution `\(\color{blue}{p(\theta_J\mid \theta^{(1)}_1,\theta^{(1)}_2,\ldots,\theta^{(1)}_{J-1},y)}\)`

--

3. Repeat step 2. `\(S\)` times, until convergence to the target distribution `\(\color{blue}{p(\bm\theta\mid y)}\)` is reached
4. Use the sample from the target distribution to compute all relevant statistics: (posterior) mean, variance, credible intervals, etc.

If the *full conditionals* are not available in closed form, the corresponding draws need to be obtained by other means (e.g. via Metropolis-Hastings steps) within the GS

Accessible references for MCMC are
- Blangiardo and Cameletti (2015), Chapter 4
- Johnson, Ott, and Dogucu (2022), Chapters 6-7
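---
count: false

# MCMC: Gibbs sampling

As a toy illustration of steps 1.-4., here is a minimal, self-contained `R` sketch (all values are made up for illustration) for a bivariate Normal target with correlation `\(\rho\)`, whose full conditionals are available in closed form, `\(\theta_1\mid\theta_2, y \sim \mbox{Normal}(\rho\theta_2, 1-\rho^2)\)`:

``` r
set.seed(1)
S <- 5000; rho <- 0.8
theta <- matrix(NA, nrow = S, ncol = 2)
theta[1, ] <- c(-3, 3)                  # 1. initial values
for (s in 2:S) {                        # 2.-3. iterate the full conditionals
  theta[s, 1] <- rnorm(1, rho * theta[s - 1, 2], sqrt(1 - rho^2))
  theta[s, 2] <- rnorm(1, rho * theta[s, 1], sqrt(1 - rho^2))
}
colMeans(theta[-(1:500), ])             # 4. summaries (after burn-in)
```

Discarding the first draws as burn-in, the averages approach the target means `\((0,0)\)`; the plots on the next slides illustrate this kind of convergence behaviour.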
---

# MCMC: convergence

<center><img src=./img/NormGibbs1.jpeg width='50%' title=''></center>

---
count: false

# MCMC: convergence

<center><img src=./img/NormGibbs2.jpeg width='53%' title=''></center>

---
count: false

# MCMC: convergence

<center><img src=./img/NormGibbs3.jpeg width='50%' title=''></center>

---

# From MCMC to INLA

- *Standard* MCMC samplers are generally easy-ish to program and are in fact implemented in readily available software

- MCMC methods are flexible and able to deal with virtually any type of data and model, but they involve computationally and time intensive simulations to obtain the posterior distribution for the parameters. For this reason, model complexity and data dimension often remain fundamental issues.

--

- The INLA algorithm proposed by Rue, Martino, and Chopin (2009) is a .red[*deterministic*] algorithm for Bayesian inference: it represents an alternative to MCMC, which is instead a simulation-based algorithm.

--

- The INLA algorithm is designed for the class of .red[*latent Gaussian models*] and, compared to MCMC, it provides (as) accurate results in a shorter time.

--

.pull-left[
- INLA has become very popular among statisticians and applied researchers, and in the past few years the number of papers reporting usage and extensions of the INLA method has increased considerably.
]
.pull-right[
<center><img src=./img/citations.png width='80%' title=''></center>
]

---

# INLA website and community

- The website contains source code, examples, papers and reports discussing the theory and applications of INLA.
- There is also a discussion forum where users can post queries and requests for help.
- Almost every year there is an INLA-related scientific meeting.
<center><img src=./img/INLA_website.png width='50%' title=''></center>

.center[[INLA website](https://www.r-inla.org)]

---
name: LGM

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **Latent Gaussian models**]]]

---

# Latent Gaussian models (LGMs)

- The general problem of (parametric) inference is posed by assuming a probability model for the observed data `\(\boldsymbol y=\left(y_1,\ldots, y_n\right)\)`, as a function of some relevant parameters

<span style="display:block; margin-top: -10px ;"></span>
.myblue[$$\boldsymbol{y} \mid \boldsymbol{\theta},\boldsymbol\psi \sim p(\boldsymbol{y} \mid \boldsymbol{\theta},\boldsymbol\psi) = \prod_{i=1}^n p(y_i \mid \boldsymbol{\theta},\boldsymbol\psi)$$]

--

<span style="display:block; margin-top: -10px ;"></span>
- Often (in fact for a surprisingly large range of models!), we can assume that the parameters are described by a .red[**Gaussian Markov Random Field** (GMRF)]

.myblue[ `$$\boldsymbol{\theta} \mid \boldsymbol\psi \sim \text{Normal}(\boldsymbol 0,\boldsymbol{Q^{-1}}(\boldsymbol\psi))$$`]
.myblue[ `$$\theta_i \perp\!\!\!\perp \theta_j \mid \boldsymbol\theta_{-i,j} \Longleftrightarrow Q_{ij}(\boldsymbol\psi)=0$$` ]

<span style="display:block; margin-top: -20px ;"></span>
where
- The precision matrix `\(\bm Q\)` depends on some hyperparameters `\(\bm\psi\)`
- The notation `\(-i,j\)` indicates all the other elements of the parameter vector, excluding elements `\(i\)` and `\(j\)`
- The components of `\(\bm \theta\)` are assumed to be *conditionally independent*, with the consequence that `\(\bm Q\)` is a sparse precision matrix
- This kind of model is often referred to as a .olive[**Latent Gaussian model**].
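---
count: false

# Latent Gaussian models (LGMs)

The sparsity of `\(\bm Q\)` is the Markov property in matrix form. As a minimal sketch of this (the RW1 model and `\(\tau=1\)` are just illustrative choices; base `R` only):

``` r
# Precision matrix Q of a first-order random walk (RW1) of length 5,
# with increments theta_{i+1} - theta_i ~ Normal(0, 1/tau)
tau <- 1
D <- diff(diag(5))       # 4 x 5 matrix of first differences
Q <- tau * crossprod(D)  # Q = tau * t(D) %*% D, a tridiagonal matrix
Q                        # Q[i, j] = 0 whenever |i - j| > 1
```

The zero entries correspond exactly to conditional independence: `\(\theta_i\)` and `\(\theta_j\)` are independent given all the other components whenever `\(Q_{ij}=0\)`, and this sparsity is what makes computation fast.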
---

# LGMs as a general framework

- In general

.myblue[ `\begin{align*} \bm y \mid \bm \theta,\bm\psi &\sim \prod_i p(y_i\mid\bm\theta,\bm\psi)\class{black}{\mbox{ (Data model)}}\\ \bm\theta \mid \bm\psi &\sim p(\bm\theta\mid\bm\psi) = \mbox{Normal}(0,\bm Q^{-1}(\bm\psi)) \class{black}{\mbox{ (Latent Gaussian Field)}}\\ \bm\psi &\sim p(\bm\psi) \class{black}{\mbox{ (Hyperprior)}} \end{align*}` ]

--

<span style="display:block; margin-top: 30px ;"></span>
- The dimension of `\(\class{myblue}{\bm\theta}\)` can be very large (e.g. `\(10^2\)`-`\(10^5\)`).

<span style="display:block; margin-top: 10px ;"></span>
- Conversely, the dimension of `\(\class{myblue}{\bm\psi}\)` must be relatively small, to avoid an exponential increase in the computational costs of the model.

---

# LGMs as a general framework

<span style="display:block; margin-top: -10px ;"></span>
- A very general way of specifying the problem is to specify a distribution for `\(\class{myblue}{y_i}\)` characterised by a parameter `\(\class{myblue}{\phi_i}\)` (usually the mean), defined as a function of a structured additive predictor `\(\class{myblue}{\eta_i}\)` on a suitable scale, such that `\(\class{myblue}{g(\phi_i)=\eta_i}\)` (e.g. the logit link for binomial data):

.myblue[ `$$\eta_i = \beta_0 + \sum_{m=1}^M \beta_m x_{mi} + \sum_{l=1}^L f_l(z_{li})$$` ]

where
- `\(\beta_0\)` is the intercept;
- `\(\bm\beta=\{\beta_1,\ldots,\beta_M\}\)` quantify the effect of the covariates `\(\bm{x}=(\bm{x_1},\ldots, \bm{x_M})\)` on the response;
- `\(\bm{f}=\{f_1(\cdot),\ldots,f_L(\cdot)\}\)` is a set of functions defined in terms of some covariates `\(\bm{z}=(\bm z_1,\ldots, \bm z_L)\)`;

and then assume

`$$\class{red}{\bm\theta=\{\beta_0,\bm \beta,\bm f \} \sim \mbox{Normal}(\bm 0,\bm Q^{-1}(\bm\psi)) = \mbox{GMRF}(\bm\psi)}$$`

--

- **NB**: This of course implies some form of Normally-distributed marginals for `\(\beta_0,\bm{\beta}\)` and `\(\bm{f}\)`

---

# LGMs as a general framework --- examples

<span style="display:block; margin-top: -10px ;"></span>
Upon varying the form of the functions `\(\class{myblue}{f_l(\cdot)}\)`, this formulation can accommodate a wide range of models

- Standard regression
  - `\(\class{myblue}{f_l(\cdot) = \text{NULL}}\)`

--

- Hierarchical models
  - `\(\class{myblue}{f_l(\cdot) \sim \mbox{Normal}(0,\sigma^2_f)}\)` (Exchangeable)
  - `\(\class{myblue}{\sigma^2_f\mid \bm\psi \sim}\)` some common distribution

--

- Spatial and spatio-temporal models
  - Areal data: `\(\class{myblue}{f_1(\cdot) \sim \mbox{CAR}}\)` (Spatially structured effects); `\(\class{myblue}{f_2(\cdot) \sim \mbox{Normal}(0,\sigma^2_{f_2})}\)` (Unstructured residual)
  - Geostatistical data: `\(\class{myblue}{f(\cdot) \sim \mbox{Gaussian field}}\)`
  - Temporal component: `\(\class{myblue}{f(\cdot) \sim \mbox{RW}}\)`

--

- Spline smoothing
  - `\(\class{myblue}{f_l(\cdot) \sim \mbox{AR}(\phi,\sigma^2_\varepsilon)}\)`

--

- Survival models, log-Gaussian Cox processes, etc.

<!-- # MCMC and LGMs -->
<!-- - (Standard) MCMC methods can perform poorly when applied to (non-trivial) LGMs. This is due to several factors -->
<!-- <span style="display:block; margin-top: 20px ;"></span> -->
<!-- - The components of the latent Gaussian field `\(\class{myblue}{\bm{\theta}}\)` tend to be highly correlated, thus impacting on convergence and autocorrelation -->
<!-- <span style="display:block; margin-top: 20px ;"></span> -->
<!-- - Especially when the number of observations is large, `\(\class{myblue}{\bm{\theta}}\)` and `\(\class{myblue}{\bm\psi}\)` also tend to be highly correlated -->
<!-- <span style="display:block; margin-top: 20px ;"></span> -->
<!-- - Time to run can be very long -->
<!-- - Blocking and overparameterisation can **alleviate**, but rarely eliminate the problem -->

---
name: INLA

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **The INLA approach**]]]

---

# Integrated Nested Laplace Approximation (INLA)

- The first *ingredient* of the INLA approach is the definition of conditional probability, which holds for any pair of variables `\((x,z)\)`. Technically, provided `\(p(z)>0\)`,

`$$\class{myblue}{p(x\mid z) := \frac{p(x,z)}{p(z)} \rightarrow p(x,z)=p(x\mid z)p(z)}$$`

so that `\(\class{myblue}{p(z)}\)` can be re-written as

`$$\class{myblue}{p(z) = \frac{p(x,z)}{p(x\mid z)}}$$`

--

- In particular, a conditional version can be obtained by further considering a third variable `\(w\)`, as

`$$\class{myblue}{p(z\mid w) = \frac{p(x,z\mid w)}{p(x\mid z,w)}}$$`

which is particularly relevant to the Bayesian case.

--

- The second *ingredient* is the .red[Laplace approximation].
**Main idea**: approximate the logarithm of a generic function `\(\class{myblue}{f(x)}\)` with a quadratic function, by means of a Taylor series expansion around the mode `\(x^*=\arg\!\max_x \log f(x)\)`. Thus, under the LA, `\(\class{red}{f(x)\approx \mbox{Normal}(x^*, {\sigma^2}^*)}\)`.
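---
count: false

# Integrated Nested Laplace Approximation (INLA)

A minimal numerical sketch of the Laplace approximation (the Gamma(4,1) target and all values are purely illustrative choices):

``` r
# target: f(x) = dgamma(x, 4, 1), so log f(x) = 3*log(x) - x + const
shape <- 4; rate <- 1
xstar <- (shape - 1) / rate        # mode: argmax of log f(x), here 3
# curvature: (d^2/dx^2) log f(x) = -(shape - 1)/x^2, so at the mode
sigma2 <- xstar^2 / (shape - 1)    # variance of the approximating Normal
curve(dgamma(x, shape, rate), 0, 12, ylab = "density")     # exact f(x)
curve(dnorm(x, xstar, sqrt(sigma2)), add = TRUE, lty = 2)  # Laplace approx.
```

The approximation is exact for Gaussian targets and accurate near the mode, which is why it works so well for the nearly-Gaussian conditionals arising in LGMs.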
---

# INLA objectives

- In a Bayesian LGM, the required distributions are

`$$\class{myblue}{p(\theta_i\mid\boldsymbol{y})} \class{black}{= \int p(\theta_i,\boldsymbol\psi\mid \boldsymbol{y}) d\boldsymbol\psi = \int} \class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})} \class{orange}{p(\theta_i \mid \boldsymbol\psi,\boldsymbol{y})}\class{black}{d\boldsymbol\psi}$$`

`$$\class{myblue}{p(\psi_k\mid\boldsymbol{y})} \class{black}{ = \int}\class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})}\class{black}{d\boldsymbol\psi_{-k}}$$`

--

- Thus we need to estimate:

1\. `\(\class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})}\)`, from which all the relevant marginals `\(\class{myblue}{p(\psi_k\mid\boldsymbol{y})}\)` can also be obtained;

--

2\. `\(\color{orange}{p(\theta_i \mid \boldsymbol\psi,\boldsymbol{y})}\)`, which is needed to compute the marginal posterior for the parameters

---

# Integrated Nested Laplace Approximation (INLA)

1\. can be easily estimated as

`\begin{align} \class{red}{p(\boldsymbol\psi\mid\boldsymbol{y})} & = \frac{p(\boldsymbol\theta,\boldsymbol\psi\mid\boldsymbol{y})}{p(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})} \propto \frac{p(\boldsymbol\psi)p(\boldsymbol\theta\mid\boldsymbol\psi)p(\boldsymbol{y}\mid\boldsymbol\theta,\boldsymbol\psi)}{p(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})}\\ & \approx \left. \frac{p(\boldsymbol\psi)p(\boldsymbol\theta\mid\boldsymbol\psi)p(\boldsymbol{y}\mid\boldsymbol\theta,\boldsymbol\psi)}{\tilde{p}(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})}\right |_{\boldsymbol\theta={\boldsymbol\theta}^*(\boldsymbol\psi)} =: \class{red}{\tilde{p}(\boldsymbol\psi\mid\boldsymbol{y})} \end{align}`

<span style="display:block; margin-top: -10px ;"></span>
- `\(\tilde{p}(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})\)` is the Gaussian approximation of `\(p(\boldsymbol\theta\mid\boldsymbol\psi,\boldsymbol{y})\)` given by the Laplace method
- `\(\boldsymbol\theta={\boldsymbol\theta}^*(\boldsymbol\psi)\)` is its mode for a given `\(\boldsymbol \psi\)`

<span style="display:block; margin-top: 20px ;"></span>
2\. is slightly more complex, because in general there will be more elements in `\(\boldsymbol\theta\)` than there are in `\(\boldsymbol\psi\)` and thus this computation is more expensive. Three possible approaches to approximate `\(\color{orange}{p(\theta_i \mid \boldsymbol\psi,\bm{y})}\)`:

- **Gaussian Approximation**
<span style="display:block; margin-top: 20px ;"></span>
- **Full Laplace Approximation**
<span style="display:block; margin-top: 20px ;"></span>
- **Simplified Laplace Approximation** (implemented by default by `R-INLA`)

---

# INLA – in a nutshell...

.pull-left[ .medium[
<ol style="counter-reset: my-counter 0;">
<li>Select a grid of \(H\) points \(\{\bm\psi_h^*\}\) and area weights \(\{\Delta_h\}\); interpolate the density to approximate the posterior</li>
</ol>
]
<span style="display:block; margin-top: 18px ;"></span>
<center><img src=./img/inla1-1.png width='80%' title=''></center>
]

--

.pull-right[ .medium[
<ol style="counter-reset: my-counter 1;">
<li> Approximate the conditional posterior of each \(\theta_j\), given \(\bm\psi\) and \(\bm{y}\), on the grid of \(H\) points</li>
</ol>
]
<span style="display:block; margin-top: 50px ;"></span>
<center><img src=./img/inla2-1.png width='85%' title=''></center>
]

---
count: false

# INLA – in a nutshell...
.pull-left[ .medium[
<ol style="counter-reset: my-counter 2;">
<li> Weight the conditional marginal posteriors by the density associated with each \(\psi_h^*\) <span style="display:block; margin-top: 20px ;"></span> </li>
</ol>
]
<span style="display:block; margin-top: 10px ;"></span>
<center><img src=./img/inla3-1.png width='85%' title=''></center>
]

--

.pull-right[ .medium[
<ol style="counter-reset: my-counter 3;">
<li> (Numerically) sum over all the conditional densities to obtain the marginal posterior for \(\theta_j\) <span style="display:block; margin-top: 50px ;"></span> </li>
</ol>
]
<span style="display:block; margin-top: -10px ;"></span>
<center><img src=./img/inla4-1.png width='85%' title=''></center>
]

More details on the estimation procedure are available in Chapter 2 of Gómez-Rubio (2020)

---
name: Package

<span style="display:block; margin-top: 250px ;"></span>
.myblue[.center[.huge[ **The `R-INLA` package**]]]

---

# The `INLA` package for `R`

- Good news is that all the procedures needed to perform INLA are implemented in an `R` package. It effectively consists of two components

<span style="display:block; margin-top: 20px ;"></span>

--

1\. The .red[`GMRFLib`] library
  - A `C` library for fast and exact simulation of GMRFs

--

2\. The .blue[`inla`] program
  - A standalone `C` program built upon the `GMRFLib` library (it performs the relevant computation and returns the results in a standardised way)

<span style="display:block; margin-top: 20px ;"></span>

--

**NB**: Because the package `R-INLA` relies on a standalone `C` program (among other reasons), it is not available directly from `CRAN`.

<span style="display:block; margin-top: 20px ;"></span>
`R-INLA` runs natively under Linux, Windows and Mac, and it is possible to do multi-threading using `OpenMP`

---

# The `INLA` package for `R`: installation

- From `R`, installation of the stable version is performed by typing

`install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)`

<span style="display:block; margin-top: 20px ;"></span>
- Later, you can upgrade the package by typing

`library(INLA)`
`inla.upgrade()`

<span style="display:block; margin-top: 20px ;"></span>
- A test version (which may contain unstable updates/new functions) can be obtained by typing

`inla.upgrade(testing=TRUE)`

<span style="display:block; margin-top: 20px ;"></span>
- Type `inla.version()` to find out the installed version

---

# Step by step guide to using `R-INLA`

1\. The first thing to do is to .red[**specify the model**]

- For example, assume we have a generic model

.myblue[ `\begin{align*} y_i & \stackrel{\small{iid}}{\sim} p(y_i \mid \theta_i) \\ \eta_i & = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + f(z_i) \end{align*}` ]

where
- `\(\color{olive}{\bm x = (\bm x_1,\bm x_2)}\)` are observed covariates
- `\(\color{olive}{\bm \beta = (\beta_0,\beta_1,\beta_2)\sim \mbox{Normal}(0,\tau_1^{-1})}\)` are unstructured (*fixed*) effects
- `\(\color{olive}{\bm z}\)` is an **index**. This can be used to include structured (*random*), spatial or spatio-temporal effects, etc.
- `\(\color{olive}{f \sim \mbox{Normal}(0,\bm{Q}^{-1}_f(\tau_2))}\)` is a suitable function used to model the structured effects

--

<span style="display:block; margin-top: 20px ;"></span>
- As mentioned earlier, this formulation can actually be used to represent quite a wide class of models! (A simulated example of such data is sketched on the next slide)
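---
count: false

# Step by step guide to using `R-INLA`

- To make the next steps concrete, here is a minimal sketch that simulates data consistent with the generic model above (all parameter values are made up, and `f` is taken to be a simple iid effect, purely for illustration):

``` r
set.seed(1)
n  <- 100
x1 <- rnorm(n)                          # continuous covariate
x2 <- rbinom(n, 1, 0.5)                 # binary covariate
z  <- rep(1:10, each = 10)              # index for the structured effect
f  <- rnorm(10, sd = 0.5)               # iid structured effect f(z)
eta <- 1 + 0.5 * x1 - 0.3 * x2 + f[z]   # linear predictor
y   <- rnorm(n, mean = eta, sd = 0.1)   # Gaussian observations
```

- The vectors `y`, `x1`, `x2` and `z` match the objects used in the `formula` and `inla()` calls on the following slides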
---

# Step by step guide to using `R-INLA`

- The model is translated into `R` code using a .red[`formula`]
  - This is sort of standard in `R` (you would do pretty much the same for calls to functions such as `lm` or `glm`)

.olive[`formula = y `] `\(\color{olive}{\sim}\)` .olive[`1 + x1 + x2 + f(name=z, model="...", hyper=...)`]

--

<span style="display:block; margin-top: -10px ;"></span>
- The .olive[`f()`] function can account for several structured nonlinear effects. We have, for example:
  - .olive[`iid`] specifies independent random effects
  - .olive[`rw1`], .olive[`rw2`], .olive[`ar1`] specify smooth effects of covariates or time effects
  - .olive[`besag`] models spatially structured effects (CAR)
  - .olive[`generic`] allows a user-defined precision matrix
  - Type .olive[`inla.list.models("latent")`] for the complete list and find descriptions at .olive[[https://www.r-inla.org/documentation](https://www.r-inla.org/documentation)]
  - Some options can be specified for the .olive[`f()`] term: for example .olive[`hyper`] is used to specify the prior on the hyperparameters (more later).

--

<span style="display:block; margin-top: -10px ;"></span>
- It is possible to include several .olive[`f()`] terms in the `formula`, specifying them separately, e.g.

.olive[`formula <- y`] `\(\color{olive}{\sim}\)` .olive[`x1 + x2 + f(z1,...) + f(z2,...) + f(z3,...)`]

---

# Step by step guide to using `R-INLA`

2\. Call the function .olive[`inla`] to fit the model, specifying the data and options (more on this later), e.g.

.olive[`m = inla(formula, family="...", data=data.frame(y,x1,x2,z))`]

<span style="display:block; margin-top: 20px ;"></span>

--

- The data need to be included in a suitable .olive[`data.frame`]

<span style="display:block; margin-top: 20px ;"></span>

--

- The distribution of the data (i.e. the likelihood) is specified with the .olive[`family`] option.
  - Type .olive[`inla.list.models("likelihood")`] for the complete list of likelihood functions (we have .olive[`poisson, binomial, gamma, beta, gaussian`] and many others).

<span style="display:block; margin-top: 20px ;"></span>

--

- Visit .olive[[https://www.r-inla.org/documentation](https://www.r-inla.org/documentation)] for the complete description

---

# Step by step guide to using `R-INLA`

The .olive[`control.xxx=list(...)`] statements in the .olive[`inla`] function control various parts of the INLA program:

- .olive[`control.compute`]: for computing measures of fit (e.g. DIC)
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.predictor`]: for specifying the *Observation matrix* `\(\bm A\)` which links the latent field to the data
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.family`]: for changing the prior distribution of the likelihood hyperparameters
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.fixed`]: for changing the prior distribution of the fixed effects
<span style="display:block; margin-top: 20px ;"></span>
- .olive[`control.inla`]: for changing the strategy used for the approximations ('gaussian', 'simplified.laplace' (default) or 'laplace') or the grid exploration strategy
<span style="display:block; margin-top: 20px ;"></span>
- and many others for expert use.

---

# Step by step guide to using `R-INLA`

`R` returns an object .olive[`m`] of class `inla`, for which some methods are available, for example .olive[`summary()`] and .olive[`plot()`].
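For instance, putting the previous steps together on the simulated data from before (a sketch: the `iid` latent model and the `control.compute` option are just illustrative choices):

``` r
formula <- y ~ 1 + x1 + x2 + f(z, model = "iid")
m <- inla(formula, family = "gaussian",
          data = data.frame(y, x1, x2, z),
          control.compute = list(dic = TRUE))  # also return the DIC
summary(m)  # fixed effects, hyperparameters, DIC, running time
plot(m)     # posterior marginals of the model components
```

---
count: false

# Step by step guide to using `R-INLA`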
<span style="display:block; margin-top: 20px ;"></span>
Some of the elements in an `inla` object returned by a call to `inla()` are:

<center><img src=./img/Table_out_inla.png width='60%' title=''></center>

.center[Source: Gómez-Rubio (2020), Section 2.3.1]

---

# Toy example

We consider the .olive[`iris`] dataset included in the .olive[`R`] datasets (see .olive[`?iris`]), which reports the measurements in centimeters of the variables *sepal length* and *width* and *petal length* and *width* for 50 flowers from each of 3 species of iris (*Iris setosa*, *versicolor* and *virginica*). See Section 2.6 of Gómez-Rubio (2020).

``` r
> summary(iris)
```

```
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
```

We specify a simple regression model with `Petal.Length` and `Petal.Width` as dependent and independent variables, respectively:

`\begin{align*} \text{Petal.Length}_i&\sim \text{Normal}(\eta_i,\sigma^2_e)\\ \eta_i &= \beta_0 + \beta_1\text{ Petal.Width}_i \end{align*}`

---

# Toy example: run INLA + exploring the fixed effects output

<span style="display:block; margin-top: -15px ;"></span>

``` r
> library(INLA)
> formula <- Petal.Length ~ 1 + Petal.Width
> output <- inla(formula, family="gaussian", data=iris)
```

``` r
> output$summary.fixed
```

```
                mean         sd 0.025quant 0.5quant 0.975quant     mode
(Intercept) 1.083565 0.07287746  0.9404618 1.083565   1.226669 1.083565
Petal.Width 2.229935 0.05133317  2.1291358 2.229935   2.330733 2.229935
                     kld
(Intercept) 3.637008e-09
Petal.Width 3.636986e-09
```

- For each unstructured *fixed* effect, `R-INLA` reports a set of summary statistics from the posterior distribution.

--

- The value of the Kullback-Leibler divergence `kld` describes the difference between the Gaussian approximation and the Simplified Laplace Approximation (SLA) to the marginal posterior densities:
  - Small values indicate that the posterior distribution is well approximated by a Normal distribution
  - If so, the more sophisticated SLA gives a *good* error rate and therefore there is no need to use the more computationally intensive *full* Laplace approximation.

---

# Exploring the output: hyperparameters

``` r
> output$summary.hyperpar
```

```
                                            mean        sd 0.025quant 0.5quant
Precision for the Gaussian observations 4.431767 0.5117038   3.486737 4.412182
                                        0.975quant     mode
Precision for the Gaussian observations   5.490169 4.372837
```

- For each hyperparameter, the summary statistics are reported to describe the posterior distribution.

- **NB**: `INLA` reports results on the **precision** scale (more on this later).
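- For a quick interpretable summary, the precision marginal can be transformed, e.g. to obtain the posterior mean of the residual standard deviation `\(\sigma_e\)` (a sketch; the next slides cover the manipulation of marginals in detail):

``` r
> prec_post <- output$marginals.hyperpar[[1]]
> inla.emarginal(fun=function(x) 1/sqrt(x),
+                marg=prec_post)  # roughly 1/sqrt(4.43), i.e. about 0.48
```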
---

# Manipulating the marginals from `R-INLA`: fixed effects

.panelset[
.panel[.panel-name[marginals1]
.pull-left[

``` r
> names(output$marginals.fixed)
```

```
[1] "(Intercept)" "Petal.Width"
```

``` r
> beta1_post <- output$marginals.fixed[[2]]
> marg <- inla.smarginal(beta1_post)
> q <- inla.qmarginal(0.05,beta1_post)
```

``` r
> plot(marg,t="l",
+      ylab="",xlab="",
+      main=expression(paste("p(",beta[1], "| y)")))
> polygon(c(marg$x[marg$x <= q ], q),
+         c(marg$y[marg$x <= q ], 0),
+         col = "slateblue1", border = 1)
```
]
.pull-right[
<img src="./img/explore_plot_out-1.png" >
]
]
.panel[.panel-name[marginals2]
.small[.pull-left[

``` r
> inla.pmarginal(q,beta1_post)
```

```
[1] 0.05
```

``` r
> d <- inla.dmarginal(q,beta1_post)
> d
```

```
[1] 1.994065
```

``` r
> inla.rmarginal(4, beta1_post)
```

```
[1] 2.257847 2.290692 2.231191 2.252115
```
] ]
.pull-right[
<img src="./img/explore_plot1_out-1.png" >
]
]
]

---

# Manipulating the marginals from `R-INLA`: hyperparameters

`INLA` works with the precision by default

.panelset[
.panel[.panel-name[Precision]
.pull-left[

``` r
> names(output$marginals.hyperpar)
```

```
[1] "Precision for the Gaussian observations"
```

``` r
> prec_post <- output$marginals.hyperpar[[1]]
```

``` r
> plot(inla.smarginal(prec_post),t="l",
+      ylab="",xlab="",
+      main=expression(paste("p(",1/sigma[e]^2, "| y)")))
```
]
.pull-right[
<img src="./img/explore_plot_hyp_out-1.png" >
]
]
.panel[.panel-name[Variance]
.pull-left[

``` r
> var_post = inla.tmarginal(fun=function(x)
+   1/x, mar=prec_post)
> inla.emarginal(fun=function(x) 1/x,
+                marg=prec_post)
```

```
[1] 0.2286684
```

``` r
> plot(inla.smarginal(var_post),t="l",
+      ylab="",xlab="",
+      main=expression(paste("p(",sigma[e]^2, "| y)")))
```
]
.pull-right[
<img src="./img/explore_plot_hyp1_out-1.png" >
]
]
]

---

# Summary

The INLA approach is not a rival or replacement of MCMC; it is just a better option for the class of LGMs.

<span style="display:block; margin-top: 20px ;"></span>
- The basic idea behind the INLA procedure is simple
  - Repeatedly use the Laplace approximation and take advantage of the computational simplifications due to the structure of the model
  - Use numerical integration to compute the required posterior marginal distributions

<span style="display:block; margin-top: 20px ;"></span>
- Complications are mostly computational and occur when
  - extending to a large number of hyperparameters
  - dealing with markedly non-Gaussian observations

---

# References

Blangiardo, M. and M. Cameletti (2015). _Spatial and spatio-temporal Bayesian models with R-INLA_. John Wiley & Sons.

Gómez-Rubio, V. (2020). _Bayesian inference with INLA_. CRC Press.

Johnson, A. A., M. Q. Ott, and M. Dogucu (2022). _Bayes Rules!: An Introduction to Applied Bayesian Modeling_. CRC Press.

Rue, H., S. Martino, and N. Chopin (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion)". In: _J. R. Statist. Soc. B_ 71, pp. 319-392.