1. Optional: Implement your own Metropolis algorithm

See the bottom of this page for some scaffolding code to get you started. Fill in the appropriate places with the code needed to make the algorithm work. Alternatively, there is an implementation provided on the GitHub repository.

2. German Tank Problem

Recall the German tank problem presented in lecture. Use the following captured serial numbers:

s = c(147, 126, 183, 88, 9, 203, 16, 10, 112, 205)

Your goal is to estimate a single parameter, \(N\), the highest possible serial number (indicating the number of tanks actually produced).

  a. What likelihood function is appropriate? Can you write this as an equation? The likelihood function should be \(\Pr(s \mid N)\).
  b. Translate this likelihood function into R code, and plot the function for varying values of \(N\). (A sketch of this and the remaining parts appears after this list.)
  c. Translate a and b above into a Stan statement for the model block. It will look something like s ~ ...
  d. Add a prior for \(N\) to your Stan program. What prior is reasonable? Bonus: Write a prior and posterior function in R, and plot them as in part b.
  e. Finish the Stan program, then use it to get the MAP estimate for \(N\) using the optimizing() function. What's the MAP estimate?
    • Hint: You will need to use the vector datatype, which we haven’t seen yet. Look it up in the Stan manual to see if you can understand how and where to use it.
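
A minimal sketch covering parts b through e. It treats the captured serials as independent uniform draws on \([0, N]\) (a simplification; sampling without replacement would give a combinatorial likelihood instead) and uses an exponential prior purely as an illustration; the prior choice in part d is yours to justify.

s = c(147, 126, 183, 88, 9, 203, 16, 10, 112, 205)

# part b: the likelihood is zero (-Inf on the log scale) whenever
# N < max(s), and proportional to N^(-k) otherwise
log_lik = function(N, s) {
    if (N < max(s)) return(-Inf)
    -length(s) * log(N)
}
N_grid = seq(max(s), 600)
plot(N_grid, exp(sapply(N_grid, log_lik, s = s)), type = "l",
    xlab = "N", ylab = "likelihood")

# parts c-e: a Stan program as a string; note the vector datatype from
# the hint appearing in the data block
library(rstan)
tank_code = "
data {
    int<lower=1> k;         // number of captured serials
    vector<lower=0>[k] s;   // the serial numbers themselves
}
parameters {
    real<lower=max(s)> N;   // total tanks produced; continuous for now
}
model {
    s ~ uniform(0, N);      // the likelihood from parts a and b
    N ~ exponential(0.001); // an assumed weak prior; revisit in part d
}
"
tank_model = stan_model(model_code = tank_code)
fit_map = optimizing(tank_model, data = list(k = length(s), s = s))
fit_map$par  # the MAP estimate of N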

3. Sampling the posterior

Use sampling() to get 5000 samples from the posterior distribution (a minimal sketch appears below). Alternatively, if you finished part 1 above, try this with your own sampler.
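
A minimal sketch, assuming tank_model and the data list from the sketch in exercise 2; four chains with 1250 post-warmup iterations each give the requested 5000 samples:

fit = sampling(tank_model, data = list(k = length(s), s = s),
    chains = 4, iter = 2500, warmup = 1250)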

  a. Evaluate your samples using mcmc_trace() and mcmc_hist() from the bayesplot package (or implement your own versions). You might need to use as.array or as.matrix to convert the samples from Stan into something that bayesplot can use (see the sketch after this list). Compare the histogram of samples to the posterior density plot you made in 2d.
  b. What summary statistics can we get from the samples? How do your estimates of central tendency (mean, median, etc.) compare with the MAP? What metrics of dispersion might be useful? Can you imagine how you might calculate a credible interval (i.e., a Bayesian confidence interval)?
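
A sketch of both parts, assuming fit is the stanfit object from the sketch above:

library(bayesplot)

# part a: bayesplot wants an array (or matrix) of draws, not a stanfit object
draws = as.array(fit)
mcmc_trace(draws, pars = "N")
mcmc_hist(draws, pars = "N")

# part b: summary statistics computed directly from the samples
samples = as.matrix(fit)[, "N"]
mean(samples)
median(samples)
sd(samples)
quantile(samples, c(0.025, 0.975))  # one way to form a 95% credible interval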

Bonus: discrete uniform parameters

You have probably produced a model in exercise 2 that treats \(N\) as a continuous variable, resulting in estimates that say something like “1457.3 tanks were produced.” This is of course impossible: \(N\) and \(s\) are both discrete. Can you design a model that respects this constraint? How do the results differ?
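
One possible direction, sketched as a direct grid evaluation in R: Stan cannot declare integer parameters, so evaluating the posterior over a grid of integer \(N\) values is a simple way to respect the discreteness (the exponential prior and the upper cutoff here are again only assumptions):

N_int = max(s):2000                   # integer candidates; 2000 is an arbitrary cutoff
log_post = -length(s) * log(N_int) + dexp(N_int, rate = 0.001, log = TRUE)
post = exp(log_post - max(log_post))  # back to the probability scale, safely
post = post / sum(post)               # normalise over the grid
sum(N_int * post)                     # posterior mean under the discrete model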

Scaffolding for implementing the Metropolis algorithm

#' Simple single-parameter metropolis algorithm
#' @param target Target function (returning log unnormalised posterior density);
#'  this function should take the parameter as its first argument and a data list as its second
#' @param initial Initial value of the parameter
#' @param data Data to pass to the target
#' @param iter Number of iterations
#' @param scale Scale for the proposal distribution; defaults to 1
#' 
#' @return A list, with three components: 'chain' is the Markov chain, 'scale' 
#'      is the scale parameter used, and 'accept' is the acceptance rate
metropolis = function(target, initial, data, iter = 5000, scale = 1) {


    ##### OPTIONAL
    ## here, you can run an adaptation phase to set the scale. 
    ## The steps should be a repeat of everything below
    ## The only addition: if you accept the proposal: scale = scale * 1.1 (or some other constant)
    ## if you reject: scale = scale / 1.1
    ## At the end of adaptation, discard the chain; you can't use those samples

    # set up the Markov chain
    # here we preallocate a vector to hold the state of the chain
    chain = numeric(iter)

    # it is important to keep track of how many times we accept the proposals
    # the acceptance rate is an important diagnostic
    accept = 0


    # the first step in the chain gets initial values
    chain[1] = initial

    for(i in 2:iter) {

        ## STEPS FOR THE ALGORITHM
        ## 1. generate a proposal for chain[i]
        ##     this proposal should be drawn from a proposal distribution centred around
        ##     chain[i-1] and using the scale to determine how wide the distribution is
        ##
        ## 2. Compute the acceptance probability of the proposal
        ##     remember that this is the ratio of the probabilities from the target distribution
        ##     target(proposal, data)/target(chain[i-1], data)
        ##     If your target returns a log probability (it should), then you need to convert
        ##     from log-scale to probability scale
        ##     
        ## 3. Do a Bernoulli trial - on a success, accept the proposal; on a failure, reject it
        ## 
        ## 4. Save the result; if you accepted, chain[i] gets the proposal. 
        ##    If not, chain[i] will be the same as chain[i-1]. Don't forget to track acceptances. 
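
        ## A sketch of one possible implementation of steps 1-4, assuming a
        ## normal random-walk proposal (the repository version may differ):
        proposal = rnorm(1, mean = chain[i - 1], sd = scale)
        log_ratio = target(proposal, data) - target(chain[i - 1], data)
        if (runif(1) < exp(log_ratio)) {
            chain[i] = proposal      # accept: the chain moves
            accept = accept + 1
        } else {
            chain[i] = chain[i - 1]  # reject: the chain stays put
        }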

    }


    # note: only iter - 1 proposals are made, since the first state is fixed
    return(list(chain = chain, accept = accept / (iter - 1), scale = scale))
}



log_posterior = function(params, data) {
    ## fill in a log posterior for the problem you are working on here
}

# fill in initial values, data, and your guess at the scale
## fit = metropolis(log_posterior, initial = , data = , scale = )
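
# As an illustration, a hypothetical filled-in log posterior for the German
# tank problem, matching the uniform likelihood and the assumed exponential
# prior from exercise 2; 'data' is a list with the serials in element 's'
log_posterior_tanks = function(params, data) {
    N = params
    if (N < max(data$s)) return(-Inf)  # zero likelihood below the largest serial
    sum(dunif(data$s, min = 0, max = N, log = TRUE)) +
        dexp(N, rate = 0.001, log = TRUE)
}

fit = metropolis(log_posterior_tanks, initial = max(s) + 1,
    data = list(s = s), scale = 50)  # scale = 50 is just a first guess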