Posterior Inference I

Lauren Talluto

03.02.2026

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

Dataset: Palmer penguins

Data source: Dr. Kristen Gorman, University of Alaska (Gorman et al. 2014)

R package palmerpenguins

Artwork by @allison_horst
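
As a quick illustration of the first point (a minimal sketch using a known distribution, not part of the penguin example), we can recover summaries of a distribution simply by summarising random samples drawn from it:

# draw many samples from a known distribution...
x = rnorm(5000, mean = 2, sd = 1)
# ...and summarise them: these approximate the true mean (2) and the true quantiles
mean(x)
quantile(x, c(0.05, 0.95))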

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

Dataset: Palmer penguins

Data source: Dr. Kristen Gorman, University of Alaska (Gorman et al. 2014)

R package palmerpenguins

Artwork by @allison_horst

data {
    int<lower=0> n;     // number of observations
    vector[n] x;        // predictor
    vector[n] y;        // response
}
parameters {
    real intercept;
    real slope;
    real<lower=0> s;    // residual standard deviation
}
model {
    // no explicit priors: Stan uses flat (improper) priors on all parameters
    y ~ normal(intercept + slope * x, s);
}

library(rstan)
penguin_lm = stan_model("stan/basic_lm.stan")

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan

Dataset: Palmer penguins

Data source: Dr. Kristen Gorman, University of Alaska (Gorman et al. 2014)

R package palmerpenguins

Artwork by @allison_horst

data(penguins, package = "palmerpenguins")
penguins = as.data.frame(penguins[complete.cases(penguins),])
penguins = subset(penguins, species == "Gentoo")
peng_dat = with(penguins, list(
    x = bill_length_mm,
    y = bill_depth_mm,
    n = length(bill_length_mm)
))

# refresh = 0 just turns off the status display to keep the slides neat
# you can omit it in your own code!
peng_fit = sampling(penguin_lm, data = peng_dat, refresh = 0)
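
Before interrogating the samples, it is worth a quick look at the fitted object; rstan's print method reports posterior summaries along with convergence diagnostics such as n_eff and Rhat:

# posterior summaries and convergence diagnostics for the fitted model
print(peng_fit, pars = c("intercept", "slope", "s"))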

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
# samples are a bit easier to deal with in a matrix
peng_samps = as.matrix(peng_fit)
head(peng_samps)
##           parameters
## iterations intercept     slope         s      lp__
##       [1,]  4.327070 0.2240388 0.6671229 -26.35499
##       [2,]  4.417360 0.2207002 0.6792690 -26.56142
##       [3,]  4.893192 0.2111910 0.6604217 -26.76978
##       [4,]  4.902920 0.2113316 0.6599114 -26.57599
##       [5,]  3.651149 0.2364696 0.8081511 -26.70807
##       [6,]  5.829585 0.1946857 0.7548675 -25.50562

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?

slope_015 = peng_samps[, 'slope'] > 0.15
table(slope_015)
## slope_015
## FALSE  TRUE 
##    36  3964
sum(slope_015) / nrow(peng_samps)  # equivalently, mean(slope_015)
## [1] 0.991

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, e.g., \(0.15 < slope < 0.25\)?

sum(peng_samps[, 'slope'] > 0.15 & peng_samps[, 'slope'] < 0.25) / 
    nrow(peng_samps)
## [1] 0.969

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, say \(0.15 < slope < 0.25\)?
    • What interval encompasses 90% of the probability mass (90% Credible Interval)?

t(apply(peng_samps, 2, quantile, c(0.05, 0.95)))
##            
## parameters           5%         95%
##   intercept   3.4496078   6.9806483
##   slope       0.1687487   0.2422317
##   s           0.6778897   0.8448743
##   lp__      -28.4490187 -24.5702378

Posterior inference I: Sampling

  • Taking samples turns out to be a very useful way to learn about a distribution!
  • For example: Bayesian linear regression in Stan
  • What can we learn from our samples?
    • What is the probability that the slope > 0.15?
    • What is the probability of a range of values, say \(0.15 < slope < 0.25\)?
    • What interval encompasses 90% of the probability mass (90% Credible Interval)?
  • We could emulate the output of summary(lm(...))
tab = data.frame(
    estimate = apply(peng_samps[, 1:2], 2, median),
    ste = apply(peng_samps[, 1:2], 2, sd)
)
            Estimate  Std. Error
intercept       5.15        1.06
slope           0.21        0.02

Posterior inference I: Sampling

  • What about central tendency? What is the most likely or “best guess” value for each parameter?
  • Common choices are the posterior mean, median, and mode; for a normally distributed posterior, all three are equal
  • Otherwise, the median generally performs best, but always prefer an interval to a point estimate
tab = data.frame(
    mean = colMeans(peng_samps[, 1:3]),
    median = apply(peng_samps[, 1:3], 2, median),
    ## need optimizing to get the posterior mode
    mode = optimizing(penguin_lm, data = peng_dat)$par)
              mean   median     mode
intercept  5.16935  5.15127  5.12090
slope      0.20658  0.20708  0.20761
s          0.75635  0.75220  0.74274

Posterior prediction

  • We can use the medians to easily predict the regression line.
intercept  slope     s
     5.15   0.21  0.75
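
For example (a minimal sketch; the plotting details are my own, not from the slides), the median regression line can be drawn over the data like this:

# posterior medians of intercept and slope
med = apply(peng_samps[, c("intercept", "slope")], 2, median)

# scatterplot of the data with the median regression line
plot(peng_dat$x, peng_dat$y, xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
abline(a = med["intercept"], b = med["slope"])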

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

intercept  slope     s
     5.15   0.21  0.75
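
A small sketch of transitivity (the transformed quantity here is hypothetical, chosen only for illustration): if we wanted the slope per centimetre of bill length rather than per millimetre, we could transform every sample and summarise the transformed samples directly:

# posterior of the slope per cm: apply the transformation to each sample
slope_per_cm = peng_samps[, "slope"] * 10
quantile(slope_per_cm, c(0.05, 0.95))  # 90% credible interval for the new quantity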

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

  • We can use this to get a posterior distribution of regression lines
    • Each posterior sample is one potential regression line
intercept  slope     s
     5.15   0.21  0.75

Posterior prediction

  • We can use the medians to easily predict the regression line.
  • Posterior distributions are transitive

If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)

  • We can use this to get a posterior distribution of regression lines
    • Each posterior sample is one potential regression line
intercept  slope     s
     5.15   0.21  0.75
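
A minimal sketch of this idea (the plotting choices are assumptions): overlay a random subset of the sampled regression lines on the data.

plot(peng_dat$x, peng_dat$y, xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
# each semi-transparent line is the regression implied by one posterior sample
for(i in sample(nrow(peng_samps), 100))
    abline(a = peng_samps[i, "intercept"], b = peng_samps[i, "slope"], col = "#0000ff20")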

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
    • There is a 90% chance that the conditional expectation falls within this range(!)
# predict from one x, one sample
pry = function(samp, x) samp[1] + samp[2] * x

test_x = 55.6
# just apply prediction to every row of samples
test_y = apply(peng_samps, 1, pry, x = test_x)
# then get the quantiles
quantile(test_y, c(0.05, 0.95))
##       5%      95% 
## 16.33331 16.96370

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
# predict from many x, one sample
pry = function(samp, x) samp[1] + samp[2] * x

test_x = seq(40, 60, length.out = 200)

# test_x is a vector, so result is a matrix
# each row in this matrix is the prediction for a single x
# each column a single sample
test_y = apply(peng_samps, 1, pry, x = test_x)
colnames(test_y) = paste0("samp", 1:nrow(peng_samps))
rownames(test_y) = paste0("bill_len=", round(test_x, 2))
test_y[1:3, 1:6]
##                iterations
##                    samp1    samp2    samp3    samp4    samp5    samp6
##   bill_len=40   13.28862 13.24537 13.34083 13.35618 13.10993 13.61701
##   bill_len=40.1 13.31114 13.26755 13.36206 13.37742 13.13370 13.63658
##   bill_len=40.2 13.33365 13.28973 13.38328 13.39866 13.15747 13.65614

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
# then we get the quantiles by row
interval_y = apply(test_y, 1, quantile, c(0.05, 0.95))
interval_y[, 1:5]
##      
##       bill_len=40 bill_len=40.1 bill_len=40.2 bill_len=40.3 bill_len=40.4
##   5%     13.14260      13.16763      13.19113      13.21401      13.23752
##   95%    13.74101      13.75833      13.77511      13.79278      13.80974

Posterior prediction

  • This means we can easily use the samples to predict a credible interval for \(\mathbb{E}(y)\) for any arbitrary value of \(x\)
  • What information is this telling us?
  • We can do the same across many x-values to produce a confidence ribbon.
    • This is very similar to regression confidence intervals produced by lm
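
One way to draw the ribbon (a sketch; the plotting code is an assumption, not from the slides):

# plot the data, then shade the 90% credible ribbon for E(y)
plot(peng_dat$x, peng_dat$y, xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
polygon(c(test_x, rev(test_x)), c(interval_y["5%", ], rev(interval_y["95%", ])),
    col = "#0000ff30", border = NA)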

Posterior prediction

  • Our model is generative
  • It postulates a statistical process (not mechanistic) by which the outcomes \(y\) are created
  • We can use posterior predictive simulations to learn the distribution of outcomes
  • For a given value of \(x\), the interval tells you where 90% of the values of \(y\) will fall (not \(\mathbb{E}[y]\))
  • To do this:
    • for each posterior sample of the intercept (\(a\)), slope (\(b\)), and standard deviation (\(s\))
    • for each value of a prediction dataset \(\hat{x}\)
    • compute \(\eta = \mathbb{E}(y)\) using the regression equation
    • simulate a new dataset \(\hat{y}\) from \(\eta\) and \(s\)
    • compute quantiles for \(\hat{y} | \hat{x}\)
  • Similar to typical regression prediction intervals
# from a single sample, generate a new prediction dataset from xhat
sim = function(samp, xhat) {
    eta = samp[1] + samp[2] * xhat
    rnorm(length(xhat), eta, samp[3])
}

test_x = seq(40, 60, length.out = 200)
pr_test_y = matrix(0, ncol = nrow(peng_samps), nrow = length(test_x))

# for clarity, using a for loop. could (should) do this instead with mapply
for(i in 1:nrow(peng_samps))
    pr_test_y[,i] = sim(peng_samps[i,], test_x)

# now get quantiles for each value in x
pr_int_y = apply(pr_test_y, 1, quantile, c(0.05, 0.95))
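
For reference, the explicit loop could be replaced with a single apply call (one possible vectorised form; the comment above suggests mapply, this is just an alternative sketch):

# equivalent to the for loop: one column of simulated y values per posterior sample
pr_test_y = apply(peng_samps, 1, sim, xhat = test_x)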

Posterior prediction

  • Our model is generative
  • It postulates a statistical process (not mechanistic) by which the outcomes \(y\) are created
  • We can use posterior predictive simulations to learn the distribution of outcomes
  • For a given value of \(x\), the interval tells you where 90% of the values of \(y\) will fall (not \(\mathbb{E}[y]\))
  • To do this:
    • for each posterior sample of the intercept (\(a\)), slope (\(b\)), and standard deviation (\(s\))
    • for each value of a prediction dataset \(\hat{x}\)
    • compute \(\eta = \mathbb{E}(y)\) using the regression equation
    • simulate a new dataset \(\hat{y}\) from \(\eta\) and \(s\)
    • compute quantiles for \(\hat{y} | \hat{x}\)
  • Similar to typical regression prediction intervals
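
As with the credible ribbon for \(\mathbb{E}(y)\), the posterior predictive interval can be drawn over the data (a sketch; plotting details are assumptions):

# shade the 90% posterior predictive interval; it is wider than the ribbon for E(y)
plot(peng_dat$x, peng_dat$y, xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
polygon(c(test_x, rev(test_x)), c(pr_int_y["5%", ], rev(pr_int_y["95%", ])),
    col = "#ff000020", border = NA)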