Posterior Inference I

Lauren Talluto

28.11.2024

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

Dataset: Palmer penguins

Dataset: Dr. Kristen Gorman, University of Alaska (Gorman et al 2014)

R package palmerpenguins

Artwork by @allison_horst

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

data {
    int<lower=0> n;     // number of observations
    vector[n] x;        // predictor (bill length)
    vector[n] y;        // response (bill depth)
}
parameters {
    real intercept;
    real slope;
    real<lower=0> s;    // residual standard deviation
}
model {
    // likelihood; no explicit priors are given, so Stan uses flat (improper) priors
    y ~ normal(intercept + slope * x, s);
}
library(rstan)

# compile the Stan model
penguin_lm = stan_model("stan/basic_lm.stan")

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

data(penguins, package = "palmerpenguins")

# keep complete cases only, and use just the Gentoo penguins
penguins = as.data.frame(penguins[complete.cases(penguins),])
penguins = subset(penguins, species == "Gentoo")

# data list with names matching the Stan data block
peng_dat = with(penguins, list(
    x = bill_length_mm,
    y = bill_depth_mm,
    n = length(bill_length_mm)
))

# refresh = 0 just turns off the status display to keep the slides neat
# you can omit it in your own code!
peng_fit = sampling(penguin_lm, data = peng_dat, refresh = 0)
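
Before extracting the samples it can be useful to summarise the fitted object directly; the print method for stanfit objects reports posterior means and quantiles along with sampler diagnostics (a minimal check, output not shown here):

# quick summary of the marginal posteriors for the model parameters
print(peng_fit, pars = c("intercept", "slope", "s"))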

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
# samples are a bit easier to deal with in a matrix
peng_samps = as.matrix(peng_fit)
head(peng_samps)
##           parameters
## iterations intercept     slope         s      lp__
##       [1,]  6.790132 0.1739852 0.8368009 -27.18301
##       [2,]  6.707194 0.1758275 0.7691898 -26.04086
##       [3,]  4.091735 0.2309583 0.7726220 -25.72605
##       [4,]  3.572192 0.2392024 0.7861896 -25.84279
##       [5,]  3.892801 0.2343940 0.7330994 -25.43181
##       [6,]  6.447683 0.1803667 0.7783494 -25.40238

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?

slope_015 = peng_samps[, 'slope'] > 0.15
table(slope_015)
## slope_015
## FALSE  TRUE 
##    26  3974
sum(slope_015) / nrow(peng_samps)
## [1] 0.9935
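
Equivalently, because the mean of a logical vector is just the proportion of TRUE values, the same probability can be computed in one line:

# proportion of posterior samples with slope > 0.15
mean(peng_samps[, 'slope'] > 0.15)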

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, e.g., \(0.15 < slope < 0.25\)?

sum(peng_samps[, 'slope'] > 0.15 & peng_samps[, 'slope'] < 0.25) / 
    nrow(peng_samps)
## [1] 0.9675

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, say \(0.15 < slope < 0.25\)
    • What interval encompasses 90% of the probability mass (90% Credible Interval)?

t(apply(peng_samps, 2, quantile, c(0.05, 0.95)))
##            
## parameters           5%         95%
##   intercept   3.3830868   6.9219857
##   slope       0.1691958   0.2439565
##   s           0.6840424   0.8440211
##   lp__      -28.3636050 -24.5817686
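
For comparison, the analogous frequentist interval can be computed with lm on the same Gentoo subset (a sketch; because the model above has flat priors the numbers should be very close, although the interpretation differs):

# frequentist 90% confidence intervals for the same regression
confint(lm(bill_depth_mm ~ bill_length_mm, data = penguins), level = 0.90)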

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, say \(0.15 < slope < 0.25\)
    • What interval encompasses 90% of the probability mass (90% Credible Interval)?
  • We could emulate the output of summary(lm(...))
tab = data.frame(
    estimate = apply(peng_samps[, 1:2], 2, median), # posterior medians
    ste = apply(peng_samps[, 1:2], 2, sd)           # posterior standard deviations
)
             Estimate  Std. Error
intercept        5.14        1.10
slope            0.21        0.02

Posterior inference I: Sampling

  • What about central tendency? What’s the most likely or “best guess” for each parameter?
  • Candidates are the posterior mean, median, and mode; for a normally distributed posterior, all three are equal
  • Otherwise the median usually performs best, but always prefer an interval to a point estimate
tab = data.frame(
    mean = colMeans(peng_samps[, 1:3]),
    median = apply(peng_samps[, 1:3], 2, median),
    ## need optimizing to get the posterior mode
    mode = optimizing(penguin_lm, data = peng_dat)$par)
               mean   median     mode
intercept   5.14159  5.13949  5.12095
slope       0.20718  0.20717  0.20761
s           0.75794  0.75534  0.74274
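
If re-running the optimiser is inconvenient, the mode can also be approximated directly from the samples with a kernel density estimate (a rough sketch; the result depends on the density bandwidth):

# approximate posterior mode of each parameter from the samples
post_mode = function(x) {
    d = density(x)
    d$x[which.max(d$y)]
}
apply(peng_samps[, 1:3], 2, post_mode)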

Posterior prediction

  • We can use the medians to easily predict the regression line (see the sketch below).
intercept  slope     s
     5.14   0.21  0.76
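
For example, the median line can be drawn over the data with base graphics (a minimal sketch; colours and labels are arbitrary choices):

# scatterplot of the data with the posterior-median regression line
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
abline(a = median(peng_samps[, 'intercept']),
       b = median(peng_samps[, 'slope']), col = "blue", lwd = 2)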

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

intercept  slope     s
     5.14   0.21  0.76

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

  • We can use this to get a posterior distribution of regression lines
    • Each posterior sample is one potential regression line (see the sketch below)
intercept  slope     s
     5.14   0.21  0.76
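
One way to visualise this distribution of lines is to overlay a random subset of them on the data (a sketch; 200 draws and the transparency level are arbitrary choices):

# overlay 200 regression lines sampled from the posterior
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
for(i in sample(nrow(peng_samps), 200))
    abline(a = peng_samps[i, 'intercept'], b = peng_samps[i, 'slope'],
           col = "#0000ff20")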

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
    • There is a 90% chance that the conditional expectation falls within this range(!)
# predict from one x, one sample
pry = function(samp, x) samp[1] + samp[2] * x

test_x = 55.6
# just apply prediction to every row of samples
test_y = apply(peng_samps, 1, pry, x = test_x)
# then get the quantiles
quantile(test_y, c(0.05, 0.95))
##       5%      95% 
## 16.33720 16.98103

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
# predict from many x, one sample
pry = function(samp, x) samp[1] + samp[2] * x

test_x = seq(40, 60, length.out = 200)

# test_x is a vector, so result is a matrix
# each row in this matrix is the prediction for a single x
# each column a single sample
test_y = apply(peng_samps, 1, pry, x = test_x)
colnames(test_y) = paste0("samp", 1:nrow(peng_samps))
rownames(test_y) = paste0("bill_len=", round(test_x, 2))
test_y[1:3, 1:6]
##                iterations
##                    samp1    samp2    samp3    samp4    samp5    samp6
##   bill_len=40   13.74954 13.74029 13.33007 13.14029 13.26856 13.66235
##   bill_len=40.1 13.76703 13.75796 13.35328 13.16433 13.29212 13.68048
##   bill_len=40.2 13.78451 13.77563 13.37649 13.18837 13.31567 13.69861

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
# then we get the quantiles by row
interval_y = apply(test_y, 1, quantile, c(0.05, 0.95))
interval_y[, 1:5]
##      
##       bill_len=40 bill_len=40.1 bill_len=40.2 bill_len=40.3 bill_len=40.4
##   5%     13.11952      13.14429      13.16920      13.19365      13.21858
##   95%    13.73734      13.75457      13.77286      13.79004      13.80743
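
These quantiles can then be drawn as a ribbon around the median regression line (a minimal base-graphics sketch):

# shaded 90% credible ribbon for E(y), with the data and the median line
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
polygon(c(test_x, rev(test_x)), c(interval_y[1,], rev(interval_y[2,])),
        col = "#0000ff30", border = NA)
abline(a = median(peng_samps[, 'intercept']), b = median(peng_samps[, 'slope']))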

Posterior prediction

  • Our model is generative
  • It postulates a statistical process (not mechanistic) by which the outcomes \(y\) are created
  • We can use posterior predictive simulations to learn the distribution of outcomes
  • For a given value of \(x\), the interval tells you where 90% of the values of \(y\) will fall (not \(\mathbb{E}[y]\))
  • To do this:
    • for each sample of \(a\), \(b\), and \(s\)
    • for each value of a prediction dataset \(\hat{x}\)
    • compute \(\eta = \mathbb{E}(y)\) using the regression equation
    • simulate a new dataset \(\hat{y}\) from \(\eta\) and \(s\)
    • compute quantiles for \(\hat{y} | \hat{x}\)
  • Similar to typical regression prediction intervals
# from a single sample, generate a new prediction dataset from xhat
sim = function(samp, xhat) {
    eta = samp[1] + samp[2] * xhat
    rnorm(length(xhat), eta, samp[3])
}

test_x = seq(40, 60, length.out = 200)
pr_test_y = matrix(0, ncol = nrow(peng_samps), nrow = length(test_x))

# for clarity, using a for loop. could (should) do this instead with mapply
for(i in 1:nrow(peng_samps))
    pr_test_y[,i] = sim(peng_samps[i,], test_x)

# now get quantiles for each value in x
pr_int_y = apply(pr_test_y, 1, quantile, c(0.05, 0.95))
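
As with the ribbon for \(\mathbb{E}(y)\), these quantiles can be plotted; the band will be noticeably wider because it includes the residual standard deviation \(s\) (a sketch):

# 90% posterior predictive interval: where new observations are expected to fall
plot(peng_dat$x, peng_dat$y, pch = 16, col = "#777777",
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
polygon(c(test_x, rev(test_x)), c(pr_int_y[1,], rev(pr_int_y[2,])),
        col = "#ff000030", border = NA)
abline(a = median(peng_samps[, 'intercept']), b = median(peng_samps[, 'slope']))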
