Posterior Inference I

Lauren Talluto

28.11.2024

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

Dataset: Palmer penguins

Dataset: Dr. Kristen Gorman, University of Alaska (Gorman et al 2014)

R package palmerpenguins

Artwork by @allison_horst

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

data {
    int<lower=0> n;     // number of observations
    vector[n] x;        // predictor (bill length)
    vector[n] y;        // response (bill depth)
}
parameters {
    real intercept;
    real slope;
    real<lower=0> s;    // residual standard deviation
}
model {
    // likelihood; no explicit priors are given, so Stan uses flat (improper) priors
    y ~ normal(intercept + slope * x, s);
}
library(rstan)

# compile the Stan model
penguin_lm = stan_model("stan/basic_lm.stan")

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

data(penguins, package = "palmerpenguins")

# keep complete cases only, and use just the Gentoo penguins
penguins = as.data.frame(penguins[complete.cases(penguins),])
penguins = subset(penguins, species == "Gentoo")

# data list with names matching the Stan data block
peng_dat = with(penguins, list(
    x = bill_length_mm,
    y = bill_depth_mm,
    n = length(bill_length_mm)
))

# refresh = 0 just turns off the status display to keep the slides neat
# you can omit it in your own code!
peng_fit = sampling(penguin_lm, data = peng_dat, refresh = 0)
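
Before extracting the samples it can be useful to summarise the fitted object directly; the print method for stanfit objects reports posterior means and quantiles along with sampler diagnostics (a minimal check, output not shown here):

# quick summary of the marginal posteriors for the model parameters
print(peng_fit, pars = c("intercept", "slope", "s"))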

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
# samples are a bit easier to deal with in a matrix
peng_samps = as.matrix(peng_fit)
head(peng_samps)
##           parameters
## iterations intercept     slope         s      lp__
##       [1,]  6.790132 0.1739852 0.8368009 -27.18301
##       [2,]  6.707194 0.1758275 0.7691898 -26.04086
##       [3,]  4.091735 0.2309583 0.7726220 -25.72605
##       [4,]  3.572192 0.2392024 0.7861896 -25.84279
##       [5,]  3.892801 0.2343940 0.7330994 -25.43181
##       [6,]  6.447683 0.1803667 0.7783494 -25.40238

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?

slope_015 = peng_samps[, 'slope'] > 0.15
table(slope_015)
## slope_015
## FALSE  TRUE 
##    26  3974
sum(slope_015) / nrow(peng_samps)
## [1] 0.9935
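
Equivalently, because the mean of a logical vector is just the proportion of TRUE values, the same probability can be computed in one line:

# proportion of posterior samples with slope > 0.15
mean(peng_samps[, 'slope'] > 0.15)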

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, e.g., \(0.15 < slope < 0.25\)?

sum(peng_samps[, 'slope'] > 0.15 & peng_samps[, 'slope'] < 0.25) / 
    nrow(peng_samps)
## [1] 0.9675

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, say \(0.15 < slope < 0.25\)
    • What interval encompasses 90% of the probability mass (90% Credible Interval)?

t(apply(peng_samps, 2, quantile, c(0.05, 0.95)))
##            
## parameters           5%         95%
##   intercept   3.3830868   6.9219857
##   slope       0.1691958   0.2439565
##   s           0.6840424   0.8440211
##   lp__      -28.3636050 -24.5817686
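
For comparison, the analogous frequentist interval can be computed with lm on the same Gentoo subset (a sketch; because the model above has flat priors the numbers should be very close, although the interpretation differs):

# frequentist 90% confidence intervals for the same regression
confint(lm(bill_depth_mm ~ bill_length_mm, data = penguins), level = 0.90)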

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, say \(0.15 < slope < 0.25\)
    • What interval encompasses 90% of the probability mass (90% Credible Interval)?
  • We could emulate the output of summary(lm(...))
tab = data.frame(
    estimate = apply(peng_samps[, 1:2], 2, median), # posterior medians
    ste = apply(peng_samps[, 1:2], 2, sd)           # posterior standard deviations
)
             Estimate  Std. Error
intercept        5.14        1.10
slope            0.21        0.02

Posterior inference I: Sampling

  • What about central tendency? What’s the most likely or “best guess” for each parameter?
  • Candidates are the posterior mean, median, and mode; for a normally distributed posterior, all three are equal
  • Otherwise the median usually performs best, but always prefer an interval to a point estimate
tab = data.frame(
    mean = colMeans(peng_samps[, 1:3]),
    median = apply(peng_samps[, 1:3], 2, median),
    ## need optimizing to get the posterior mode
    mode = optimizing(penguin_lm, data = peng_dat)$par)
               mean   median     mode
intercept   5.14159  5.13949  5.12095
slope       0.20718  0.20717  0.20761
s           0.75794  0.75534  0.74274
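
If re-running the optimiser is inconvenient, the mode can also be approximated directly from the samples with a kernel density estimate (a rough sketch; the result depends on the density bandwidth):

# approximate posterior mode of each parameter from the samples
post_mode = function(x) {
    d = density(x)
    d$x[which.max(d$y)]
}
apply(peng_samps[, 1:3], 2, post_mode)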

Posterior prediction

  • We can use the medians to easily predict the regression line (see the sketch below).
intercept  slope     s
     5.14   0.21  0.76
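
For example, the median line can be drawn over the data with base graphics (a minimal sketch; colours and labels are arbitrary choices):

# scatterplot of the data with the posterior-median regression line
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
abline(a = median(peng_samps[, 'intercept']),
       b = median(peng_samps[, 'slope']), col = "blue", lwd = 2)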

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

intercept  slope     s
     5.14   0.21  0.76

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

  • We can use this to get a posterior distribution of regression lines
    • Each posterior sample is one potential regression line (see the sketch below)
intercept  slope     s
     5.14   0.21  0.76
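
One way to visualise this distribution of lines is to overlay a random subset of them on the data (a sketch; 200 draws and the transparency level are arbitrary choices):

# overlay 200 regression lines sampled from the posterior
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
for(i in sample(nrow(peng_samps), 200))
    abline(a = peng_samps[i, 'intercept'], b = peng_samps[i, 'slope'],
           col = "#0000ff20")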

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
    • There is a 90% chance that the conditional expectation falls within this range(!)
# predict from one x, one sample
pry = function(samp, x) samp[1] + samp[2] * x

test_x = 55.6
# just apply prediction to every row of samples
test_y = apply(peng_samps, 1, pry, x = test_x)
# then get the quantiles
quantile(test_y, c(0.05, 0.95))
##       5%      95% 
## 16.33720 16.98103

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
# predict from many x, one sample
pry = function(samp, x) samp[1] + samp[2] * x

test_x = seq(40, 60, length.out = 200)

# test_x is a vector, so result is a matrix
# each row in this matrix is the prediction for a single x
# each column a single sample
test_y = apply(peng_samps, 1, pry, x = test_x)
colnames(test_y) = paste0("samp", 1:nrow(peng_samps))
rownames(test_y) = paste0("bill_len=", round(test_x, 2))
test_y[1:3, 1:6]
##                iterations
##                    samp1    samp2    samp3    samp4    samp5    samp6
##   bill_len=40   13.74954 13.74029 13.33007 13.14029 13.26856 13.66235
##   bill_len=40.1 13.76703 13.75796 13.35328 13.16433 13.29212 13.68048
##   bill_len=40.2 13.78451 13.77563 13.37649 13.18837 13.31567 13.69861

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
# then we get the quantiles by row
interval_y = apply(test_y, 1, quantile, c(0.05, 0.95))
interval_y[, 1:5]
##      
##       bill_len=40 bill_len=40.1 bill_len=40.2 bill_len=40.3 bill_len=40.4
##   5%     13.11952      13.14429      13.16920      13.19365      13.21858
##   95%    13.73734      13.75457      13.77286      13.79004      13.80743
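
These quantiles can then be drawn as a ribbon around the median regression line (a minimal base-graphics sketch):

# shaded 90% credible ribbon for E(y), with the data and the median line
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
polygon(c(test_x, rev(test_x)), c(interval_y[1,], rev(interval_y[2,])),
        col = "#0000ff30", border = NA)
abline(a = median(peng_samps[, 'intercept']), b = median(peng_samps[, 'slope']))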

Posterior prediction

  • Our model is generative
  • It postulates a statistical process (not mechanistic) by which the outcomes \(y\) are created
  • We can use posterior predictive simulations to learn the distribution of outcomes
  • For a given value of \(x\), the interval tells you where 90% of the values of \(y\) will fall (not \(\mathbb{E}[y]\))
  • To do this:
    • for each sample of \(a\), \(b\), and \(s\)
    • for each value of a prediction dataset \(\hat{x}\)
    • compute \(\eta = \mathbb{E}(y)\) using the regression equation
    • simulate a new dataset \(\hat{y}\) from \(\eta\) and \(s\)
    • compute quantiles for \(\hat{y} | \hat{x}\)
  • Similar to typical regression prediction intervals
# from a single sample, generate a new prediction dataset from xhat
sim = function(samp, xhat) {
    eta = samp[1] + samp[2] * xhat
    rnorm(length(xhat), eta, samp[3])
}

test_x = seq(40, 60, length.out = 200)
pr_test_y = matrix(0, ncol = nrow(peng_samps), nrow = length(test_x))

# for clarity, using a for loop. could (should) do this instead with mapply
for(i in 1:nrow(peng_samps))
    pr_test_y[,i] = sim(peng_samps[i,], test_x)

# now get quantiles for each value in x
pr_int_y = apply(pr_test_y, 1, quantile, c(0.05, 0.95))
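
As with the ribbon for \(\mathbb{E}(y)\), these quantiles can be plotted; the band will be noticeably wider because it includes the residual standard deviation \(s\) (a sketch):

# 90% posterior predictive interval: where new observations are expected to fall
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
polygon(c(test_x, rev(test_x)), c(pr_int_y[1,], rev(pr_int_y[2,])),
        col = "#ff000030", border = NA)
abline(a = median(peng_samps[, 'intercept']), b = median(peng_samps[, 'slope']))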
