Lauren. Talluto
28.11.2024
Dataset: Palmer penguins
Dataset: Dr. Kristen Gorman, University of Alaska (Gorman et al 2014)
R package
palmerpenguins
Artwork by @allison_horst
Dataset: Palmer penguins
Dataset: Dr. Kristen Gorman, University of Alaska (Gorman et al 2014)
R package
palmerpenguins
Artwork by @allison_horst
Dataset: Palmer penguins
Dataset: Dr. Kristen Gorman, University of Alaska (Gorman et al 2014)
R package
palmerpenguins
Artwork by @allison_horst
data(penguins, package = "palmerpenguins")
penguins = as.data.frame(penguins[complete.cases(penguins),])
penguins = subset(penguins, species == "Gentoo")
peng_dat = with(penguins, list(
x = bill_length_mm,
y = bill_depth_mm,
n = length(bill_length_mm)
))
# refresh = 0 just turns off the status display to keep the slides neat
# you can omit it in your own code!
peng_fit = sampling(penguin_lm, data = peng_dat, refresh = 0)
# samples are a bit easier to deal with in a matrix
peng_samps = as.matrix(peng_fit)
head(peng_samps)
## parameters
## iterations intercept slope s lp__
## [1,] 6.790132 0.1739852 0.8368009 -27.18301
## [2,] 6.707194 0.1758275 0.7691898 -26.04086
## [3,] 4.091735 0.2309583 0.7726220 -25.72605
## [4,] 3.572192 0.2392024 0.7861896 -25.84279
## [5,] 3.892801 0.2343940 0.7330994 -25.43181
## [6,] 6.447683 0.1803667 0.7783494 -25.40238
summary(lm(...))
tab = data.frame(
mean = colMeans(peng_samps[, 1:3]),
median = apply(peng_samps[, 1:3], 2, median),
## need optimizing to get the posterior mode
mode = optimizing(penguin_lm, data = peng_dat)$par)
mean | median | mode | |
---|---|---|---|
intercept | 5.14159 | 5.13949 | 5.12095 |
slope | 0.20718 | 0.20717 | 0.20761 |
s | 0.75794 | 0.75534 | 0.74274 |
intercept | slope | s | |
---|---|---|---|
5.14 | 0.21 | 0.76 |
If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)
intercept | slope | s | |
---|---|---|---|
5.14 | 0.21 | 0.76 |
If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)
intercept | slope | s | |
---|---|---|---|
5.14 | 0.21 | 0.76 |
If \(\hat{\theta}\) is a set of samples approximating the posterior distribution of \(\theta\), and if some desired variable \(Z = f(\theta)\), then \(f(\hat{\theta})\) approximates the posterior distribution of \(Z\)
intercept | slope | s | |
---|---|---|---|
5.14 | 0.21 | 0.76 |
lm
# predict from many x, one sample
pry = function(samp, x) samp[1] + samp[2] * x
test_x = seq(40, 60, length.out = 200)
# test_x is a vector, so result is a matrix
# each row in this matrix is the prediction for a single x
# each column a single sample
test_y = apply(peng_samps, 1, pry, x = test_x)
colnames(test_y) = paste0("samp", 1:nrow(peng_samps))
rownames(test_y) = paste0("bill_len=", round(test_x, 2))
test_y[1:3, 1:6]
## iterations
## samp1 samp2 samp3 samp4 samp5 samp6
## bill_len=40 13.74954 13.74029 13.33007 13.14029 13.26856 13.66235
## bill_len=40.1 13.76703 13.75796 13.35328 13.16433 13.29212 13.68048
## bill_len=40.2 13.78451 13.77563 13.37649 13.18837 13.31567 13.69861
lm
lm
# from a single sample, generate a new prediction dataset from xhat
sim = function(samp, xhat) {
eta = samp[1] + samp[2] * xhat
rnorm(length(xhat), eta, samp[3])
}
test_x = seq(40, 60, length.out = 200)
pr_test_y = matrix(0, ncol = nrow(peng_samps), nrow = length(test_x))
# for clarity, using a for loop. could (should) do this instead with mapply
for(i in 1:nrow(peng_samps))
pr_test_y[,i] = sim(peng_samps[i,], test_x)
# now get quantiles for each value in x
pr_int_y = apply(pr_test_y, 1, quantile, c(0.05, 0.95))