class: title-slide

# Session 2.2: Posterior Predictive Distribution and Monte Carlo Computation

<span style="display:block; margin-top: 10px ;"></span>

### Imperial College London

---
layout: true

.my-footer[
.alignleft[© Marta Blangiardo | Monica Pirani]
.aligncenter[MSc in Epidemiology]
.alignright[Imperial College London]
]

---

# Learning objectives

After this lecture you should be able to

<span style="display:block; margin-top: 40px ;"></span>

- Describe what the posterior predictive distribution (PPD) is

<span style="display:block; margin-top: 40px ;"></span>

- Explain how the PPD can be computed

<span style="display:block; margin-top: 40px ;"></span>

- Describe what Monte Carlo simulation is, and why it is useful

<span style="display:block; margin-top: 30px ;"></span>

- The topic of posterior prediction is treated in Section 8.3 of Johnson, Ott, and Dogucu (2022)
- The topic of Monte Carlo simulation is presented in Sections 4.1-4.4 of Blangiardo and Cameletti (2015)

---

# Outline

<span style="display:block; margin-top: 30px ;"></span>

1\. [Bayesian predictive distribution](#Bayesian prediction)

<span style="display:block; margin-top: 30px ;"></span>

2\. [Computation of PPD](#Computation of PPD)

<span style="display:block; margin-top: 30px ;"></span>

3\. [Introduction to Bayesian computing using Monte Carlo simulation](#Introduction to Bayesian computing: Monte Carlo simulation)

<span style="display:block; margin-top: 30px ;"></span>

4\. [Example of MC computation](#Example of MC computation)

---
name: Bayesian prediction

<span style="display:block; margin-top: 250px ;"></span>

.myblue[.center[.huge[**Posterior Predictive Distribution**]]]

---

# Bayesian prediction

- Often the objective of our analysis is to predict a future event, based on the data currently available.

<span style="display:block; margin-top: 20px ;"></span>

--

- Consider this example:

<span style="display:block; margin-top: 20px ;"></span>

  We estimate the prevalence of a disease in a UK hospital using a sample of `\(n = 58\)` individuals.

<span style="display:block; margin-top: 10px ;"></span>

  We find that `\(y = 10\)` individuals have the disease.

<span style="display:block; margin-top: 10px ;"></span>

  .red[What is the probability that, if we additionally sample `\(k = 30\)` individuals this year, at least 5 will have the disease?]

--

<span style="display:block; margin-top: 20px ;"></span>

- As usual, we start by specifying the data distribution:
`$$y \sim \text{Binomial}(\theta, n = 58)$$`

- Let `\(\theta\)` be the true disease prevalence and `\(y^{*}\)` be the .red[predicted value]

--

- If `\(\theta\)` were known, then we would predict
`$$y^* \mid \theta \sim \text{Binomial}(30, \theta)$$`
thus `\(\text{P}(y^* \geq 5) = 1 - \sum_{j=0}^{4} \binom{30}{j} \theta^{j}(1-\theta)^{30-j}\)`

  .blue[BUT ...] `\(\theta\)` is unknown

---

# Sources of variation in prediction

- We do not know the true value of the parameter `\(\theta\)`, so we specify a prior on it:
`$$\theta \sim \text{Beta}(a,b)$$`

- There is sampling variability ( `\(\rightarrow\)` choice of the data distribution)

--

To account for both sources of variation, we iterate the following two steps (see the `R` sketch on the next slide):

1. Sample from the posterior distribution `\(\theta \sim p(\theta \mid y)\)`
2. Sample a new value `\(y^* \sim p(y \mid \theta)\)`

- By repeating these steps a large number of times, we eventually obtain a reasonable approximation to the .red[posterior predictive distribution].
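
---

# Sampling from the PPD in `R`

A minimal sketch of the two-step scheme for the prevalence example, assuming the conjugate `\(\text{Beta}(1,1)\)` prior used later in this session (so the posterior is `\(\text{Beta}(11, 49)\)`; the seed is arbitrary):

``` r
> set.seed(44)  # for reproducibility
> M <- 50000    # number of posterior draws
> # Step 1: sample theta from the posterior Beta(y + a, n - y + b)
> theta <- rbeta(M, shape1 = 10 + 1, shape2 = 58 - 10 + 1)
> # Step 2: for each theta, sample a new count out of k = 30 individuals
> ystar <- rbinom(M, size = 30, prob = theta)
> # Monte Carlo approximation of P(y* >= 5)
> mean(ystar >= 5)
```

Each value in `ystar` is a draw from the PPD, because the `theta` used to generate it was itself drawn from the posterior.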

---

# Posterior Predictive Distribution (PPD)

- The .red[PPD] represents our uncertainty about the outcome of a future data collection, accounting for the observed data and the model choice

- For the sake of prediction, the parameters are not of interest in themselves: they are vehicles by which the data inform the predictive model

- The .red[PPD] averages over their posterior uncertainty (assuming `\(y^*\)` and `\(y\)` are conditionally independent given `\(\theta\)`)
`$$p(y^* \mid y) = \int p(y^* \mid \theta)\, p(\theta \mid y)\, d\theta$$`

- This properly accounts for parametric uncertainty

- The input is the observed data; the output is a predictive distribution

---
name: Computation of PPD

<span style="display:block; margin-top: 250px ;"></span>

.myblue[.center[.huge[**Computation**]]]

---

# Computing the PPD

- Say `\(\theta^{(1)},\ldots,\theta^{(M)}\)` are samples from the posterior

- If we draw a value of `\(y^*\)` for each `\(\theta^{(m)}\)`,
`$$y^{*(m)} \sim p(y \mid \theta^{(m)})$$`
then the `\(y^{*(m)}\)` are samples from the PPD

- The posterior predictive mean is approximated by the sample mean of the `\(y^{*(m)}\)`

- The probability that `\(y^{*} \geq 5\)` is approximated by the sample proportion of the `\(y^{*(m)}\)` that are greater than or equal to 5

---
name: Example

<span style="display:block; margin-top: 250px ;"></span>

.myblue[.center[.huge[**Example**]]]

---

# Example

- We estimate the prevalence of a disease in the UK population using a sample of `\(n = 58\)` individuals.

- We find that `\(y = 10\)` individuals have the disease.

- .red[What is the probability that, if we additionally sample `\(k = 30\)` individuals this year, at least 5 will have the disease?]

.pull-left[
1. .blue[Likelihood]: `\(y \sim \text{Binomial}(\theta, 58)\)`
2. .blue[Prior]: `\(\theta \sim \text{Beta}(1,1)\)`
3. .blue[Posterior]: `\(\theta \mid y \sim \text{Beta}(10+1, 58-10+1)\)`
4. .blue[PPD]: draw `\(\theta \sim p(\theta \mid y)\)`, then `\(y^* \sim \text{Binomial}(30, \theta)\)`
5. `\(\text{P}(y^* \geq 5) = \sum_{j=5}^{30} \text{P}(y^* = j)\)`
]

.pull-right[
<center><img src=./img/PPD_example.png width='60%' title=''></center>
]

<span style="display:block; margin-top: -100px ;"></span>

<img src="./img/unnamed-chunk-3-1.png" style="display: block; margin: auto;" width="40%">

---
name: Introduction to Bayesian computing: Monte Carlo simulation

<span style="display:block; margin-top: 250px ;"></span>

.myblue[.center[.huge[**Introduction to Bayesian computing: Monte Carlo simulations**]]]

---

# Bayesian computing

- In Session 2.1 we introduced the concept of .red[conjugacy]: if the prior and posterior come from the same family of distributions, the prior is said to be **conjugate** to the likelihood `\(\rightarrow\)` the posterior is a known distribution.

<span style="display:block; margin-top: 10px ;"></span>

- In real-life applications it is (almost) never possible to rely on conjugacy, so we need to resort to simulation-based approaches or approximations to perform the computation:

  - Monte Carlo methods

<span style="display:block; margin-top: 10px ;"></span>

  - Markov Chain Monte Carlo (MCMC) methods

<span style="display:block; margin-top: 10px ;"></span>

  - Integrated Nested Laplace Approximation (INLA)

---

# Monte Carlo simulation

- A Monte Carlo (MC) simulation is a computer experiment based on repeated random sampling.

- MC sampling is based on the idea that, if you have a large random sample from a certain distribution, the statistics that you can calculate in this sample (mean, standard deviation, percentiles, ...) will be very similar to the corresponding theoretical values of the distribution.

- If a distribution has a complicated mathematical expression and you cannot calculate its important quantities algebraically, you can get the computer to generate a large random sample from it.

- By calculating the mean of that quantity in the sample, you can estimate the mean of the original distribution with great precision.

---

# Example: a Monte Carlo approach to estimating tail areas of distributions

Suppose we want to know the probability of getting 8 or more heads when we toss a fair coin 10 times.

- An *algebraic* approach would be:

<center><img src=./img/Coin_Alg.png width='80%' title=''></center>

- A *physical* approach would be to repeatedly throw a set of 10 coins and count the proportion of throws with 8 or more heads.

---

- A *simulation* approach uses a computer to toss the coins (see the `R` sketch on the next slide)!

<center><img src=./img/coin-toss.jpg width='60%' title=''></center>

<span style="display:block; margin-top: 10px ;"></span>

Proportion with 8 or more heads in 10 tosses: (a) after 100 throws (0.02); (b) after 10,000 throws (0.0577); (c) the true Binomial probability (0.0547).
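
---

# Coin tossing in `R`

A minimal sketch of the simulation approach (the seed is arbitrary, so the first two estimates will vary from run to run):

``` r
> set.seed(8)  # arbitrary seed, for reproducibility
> # Number of heads in each of 10,000 simulated sets of 10 fair-coin tosses
> heads <- rbinom(10000, size = 10, prob = 0.5)
> mean(heads[1:100] >= 8)               # proportion after 100 throws
> mean(heads >= 8)                      # proportion after 10,000 throws
> 1 - pbinom(7, size = 10, prob = 0.5)  # exact Binomial tail: 0.0547
```

As the number of simulated throws grows, the sample proportion converges to the exact tail probability.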

---

# Monte Carlo approach to approximate the log-odds

- We start with a Binomial likelihood
`$$y \mid \theta \sim \text{Binomial}(\theta, n)$$`
combined with a `\(\text{Beta}(a,b)\)` prior for the probability of success `\(\theta\)`.

<span style="display:block; margin-top: 10px ;"></span>

- We are interested in the log-odds function of `\(\theta\)`, defined as
`$$\log \left(\frac{\theta}{1-\theta}\right)$$`

- The posterior mean of the log-odds,
`$$\int_0^1 \log\left(\frac{\theta}{1-\theta}\right) p(\theta \mid y)\,\text{d}\theta,$$`
is awkward to compute algebraically, so we resort to Monte Carlo approximation.

---

# Example of MC in practice

- We simulate `\(m\)` independent values `\(\left\{\theta^{(1)},\ldots,\theta^{(m)}\right\}\)` from the
`$$\text{Beta}(a_1 = y + a,\; b_1 = n - y + b)$$`
posterior distribution, using the property of conjugacy (the Beta prior is conjugate to the Binomial likelihood).

- We apply the log-odds transformation to each value, obtaining the set of values
`$$\left\{\log\left(\frac{\theta^{(1)}}{1-\theta^{(1)}}\right),\ldots,\log\left(\frac{\theta^{(m)}}{1-\theta^{(m)}}\right)\right\}$$`

- Finally, we compute the sample mean
`$$\frac{1}{m}\sum_{i=1}^m \log\left(\frac{\theta^{(i)}}{1-\theta^{(i)}}\right)$$`
which is the Monte Carlo approximation to the posterior mean of the log-odds, `\(\text{E}\left[\log\left(\frac{\theta}{1-\theta}\right) \,\Big|\, y\right]\)`

---

# Example of MC: R code

In `R`:

``` r
> a <- 1
> b <- 1
> theta <- rbeta(1, a, b)                 # draw a "true" prevalence
> n <- 1000
> y <- rbinom(1, size = n, prob = theta)  # simulate the observed data
```

--

- With this setting, the exact posterior distribution of `\(\theta\)` is
`$$\text{Beta}(a_1 = a + y,\; b_1 = n - y + b)$$`

- To approximate the log-odds, we simulate `\(m = 50000\)` values from this Beta posterior distribution using the `rbeta` function.

``` r
> a1 <- a + y
> b1 <- n - y + b
> sim <- rbeta(n = 50000, shape1 = a1, shape2 = b1)
> logodds <- log(sim/(1 - sim))
```

---

# Results and comparison with the theoretical distribution

The empirical distribution of the Monte Carlo sample is plotted below, together with the exact posterior distribution of `\(\theta\)`.

.pull-left[
<center><img src=./img/MCexample.png width='80%' title=''></center>
]

.pull-right[
<center><img src=./img/MCexampleLogOdds.png width='80%' title=''></center>
]
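
---

# Checking the MC approximation

To finish the example, the Monte Carlo estimate is just the sample mean of `logodds`. A quick check, using the fact that for a Beta posterior the exact posterior mean of the log-odds is `\(\psi(a_1) - \psi(b_1)\)`, where `\(\psi\)` is the digamma function:

``` r
> mean(logodds)               # Monte Carlo approximation
> digamma(a1) - digamma(b1)   # exact posterior mean of the log-odds
> sd(logodds)/sqrt(50000)     # Monte Carlo standard error of the estimate
```

With `\(m = 50000\)` draws, the Monte Carlo standard error is negligible relative to the posterior standard deviation, so the two values above agree closely.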