For those wishing to follow along with the R-based demo in class, click here for the companion R script for this lecture.
MCMC is a special type of random number generator that is designed to sample from difficult-to-describe (e.g., multivariate, hierarchical) probability distributions. In many/most cases, the posterior distribution for ecological problems is a very difficult-to-describe probability distribution.
MCMC is kind of magical in that it allows you to sample from probability distributions that are impossible to fully define mathematically! The MCMC approach uses random jumps in parameter space that eventually end up sampling from the posterior distribution! The key to MCMC is the following:
The probability of a successful jump in parameter space from point A to point B is proportional to the ratio of the posterior probabilities at these two points.
The probability of a successful jump in parameter space can be characterized as:
\(Prob(jump) * Prob(accept)\)
That is: in order to jump, you need to propose a specific jump (according to some probability distribution) and then accept that jump!
The ratio of jump probabilities can be characterized as:
\(\frac{Prob(jump_{a\rightarrow b})\cdot Prob(accept b|a)}{Prob(jump_{b\rightarrow a})\cdot Prob(accept a|b)}\)
For this procedure to (eventually) extract random samples from the posterior distribution, this ratio MUST be equal to the ratio of the posterior probabilities at points A and B:
\(\frac{Posterior(B)}{Posterior(A)}\)
If this rule is met, then in the long run the chain will explore each segment of parameter space in accordance with the posterior probability of that segment- thereby eventually capturing the entire posterior distribution (if run long enough).
Note: if the proposal distribution is symmetrical, then the probability of proposing a jump from A to B is the same as the probability of proposing a jump from B to A. Therefore, the proposal probabilities drop out of the above equation…
Amazingly, MCMC at its core is not that difficult to describe or even to implement. Let’s look at a simple MCMC algorithm.
This algorithm is essentially the same as the simulated annealing algorithm we discussed in the “optimization” lecture! The main difference: the “temperature” doesn’t decrease over time and the temperature parameter k is always set to 1.
The M-H algorithm can be expressed as:
\(Prob(accept B|A) = min(1,\frac{Posterior(B)}{Posterior(A)}\cdot \frac{Prob(b\rightarrow a)}{Prob(a\rightarrow b)})\)
Q: Does it make sense that this algorithm meets the basic rule of MCMC, that is, that the ratio of jump probabilities to/from any two points in parameter/model space is equal to the ratio of posterior probabilities evaluated at those two points?
This example is modified from this link by Darren Wilkinson
Remember that MCMC samplers are just a type of random number generator. We can use a Metropolis-Hastings sampler to develop our own random number generator for a simple known distribution. In this example, we use a M-H Sampler to generate random numbers from a standard bivariate normal probability distribution.
We don’t need an MCMC sampler for this simple example (we already have ‘perfect’ random number generators, like the one below). One way to do this would be to use the following code, which draws and visualizes an arbitrary number of independent samples from the bivariate standard normal distribution with a correlation parameter, \(\rho\).
# Simple example of MCMC sampling -----------------------
# first, let's build a function that generates random numbers from a bivariate standard normal distribution
rbvn<-function (n, rho) #function for drawing an arbitrary number of independent samples from the bivariate standard normal distribution.
{
x <- rnorm(n, 0, 1)
y <- rnorm(n, rho * x, sqrt(1 - rho^2))
cbind(x, y)
}
# Now, plot the random draws from this distribution, make sure this makes sense!
bvn<-rbvn(10000,0.98)
par(mfrow=c(3,2))
plot(bvn,col=1:10000)
plot(bvn,type="l")
plot(ts(bvn[,1]))
plot(ts(bvn[,2]))
hist(bvn[,1],40)
hist(bvn[,2],40)
par(mfrow=c(1,1))
# Metropolis-Hastings implementation of bivariate normal sampler...
library(mvtnorm) # load a package that allows us to compute probability densities for mv normal distribution
metropolisHastings <- function (n, rho=0.98){ # an MCMC sampler implementation of a bivariate random number generator
mat <- matrix(ncol = 2, nrow = n) # matrix for storing the random samples
x <- 0 # initial values for all parameters
y <- 0
prev <- mvtnorm::dmvnorm(c(x,y),mean=c(0,0),sigma = matrix(c(1,rho,rho,1),ncol=2)) # probability density of the distribution at the starting values
mat[1, ] <- c(x, y) # initialize the markov chain
counter <- 1
while(counter<=n) {
newx <- rnorm(1,x,0.5) # make a jump. Note the symmetrical proposal distribution
newy <- rnorm(1,y,0.5)
newprob <- mvtnorm::dmvnorm(c(newx,newy),sigma = matrix(c(1,rho,rho,1),ncol=2)) # assess whether the new jump is good!
ratio <- newprob/prev # compute the ratio of probabilities at the old (jump from) and proposed (jump to) locations.
prob.accept <- min(1,ratio) # decide the probability of accepting the new jump!
rand <- runif(1)
if(rand<=prob.accept){
x=newx;y=newy # set x and y to the new location
mat[counter,] <- c(x,y) # store this in the storage array
prev <- newprob # get ready for the next iteration
}else{
mat[counter,] <- c(x,y)
}
counter=counter+1
}
return(mat)
}
Then we can use the M-H sampler to get random samples from this known distribution…
# Test the new M-H sampler
bvn<-metropolisHastings(10000,0.98)
par(mfrow=c(3,2))
plot(bvn,col=1:10000)
plot(bvn,type="l")
plot(ts(bvn[,1]))
plot(ts(bvn[,2]))
hist(bvn[,1],40)
hist(bvn[,2],40)
par(mfrow=c(1,1))
Q: Why does the MCMC routine produce ‘lower quality’ random numbers than the first (bivariate normal) sampler we built?
Okay, enough with super simple examples- let’s try it for a non-trivial problem, like the Myxomatosis example from the Bolker book!
# MCMC implementation of the Myxomatosis example from the Bolker book --------------
library(emdbook)
MyxDat <- MyxoTiter_sum
Myx <- subset(MyxDat,grade==1)
head(Myx)
## grade day titer
## 1 1 2 5.207
## 2 1 2 5.734
## 3 1 2 6.613
## 4 1 3 5.997
## 5 1 3 6.612
## 6 1 3 6.810
Recall that we are modeling the distribution of measured titers (virus loads) for Australian rabbits. Bolker chose to use a Gamma distribution. Here is the empirical distribution:
# Visualize the Myxomatosis data for the 100th time!
hist(Myx$titer,freq=FALSE)
We need to estimate the gamma shape and scale parameters that best fit this empirical distribution. Here is one example of a Gamma fit to this distribution:
# ... and overlay a proposed data-generating model (gamma distribution)
hist(Myx$titer,freq=FALSE)
curve(dgamma(x,shape=40,scale=0.15),add=T,col="red")
Recall that the 2-D (log) likelihood surface looks something like this:
# define 2-D parameter space!
shapevec <- seq(3,100,by=0.1)
scalevec <- seq(0.01,0.5,by=0.001)
# define the likelihood surface -------------
GammaLogLikelihoodFunction <- function(params){
sum(dgamma(Myx$titer,shape=params['shape'],scale=params['scale'],log=T))
}
surface2D <- matrix(nrow=length(shapevec),ncol=length(scalevec)) # initialize storage variable
newparams <- c(shape=50,scale=0.2)
for(i in 1:length(shapevec)){
newparams['shape'] <- shapevec[i]
for(j in 1:length(scalevec)){
newparams['scale'] <- scalevec[j]
surface2D[i,j] <- GammaLogLikelihoodFunction(newparams)
}
}
# Visualize the likelihood surface
image(x=shapevec,y=scalevec,z=surface2D,zlim=c(-1000,-30),col=topo.colors(12))
contour(x=shapevec,y=scalevec,z=surface2D,levels=c(-30,-40,-80,-500),add=T)
Here is an implementation of the M-H algorithm to find the joint posterior distribution!
First, we need a likelihood function (our old friend!)- this time, we will return real probabilities– NOT log-transformed probabilities
# Write a non-log-transformed likelihood function ------------
GammaLikelihoodFunction <- function(params){
prod(dgamma(Myx$titer,shape=params['shape'],scale=params['scale'],log=F))
}
# and here's the log likelihood function
GammaLogLikelihoodFunction <- function(params){
sum(dgamma(Myx$titer,shape=params['shape'],scale=params['scale'],log=T))
}
params <- c(shape=40,scale=0.15)
params
## shape scale
## 40.00 0.15
GammaLikelihoodFunction(params)
## [1] 2.906766e-22
GammaLogLikelihoodFunction(params)
## [1] -49.58983
Then, we need a prior distribution for our parameters! Let’s assign relatively flat priors for both of our parameters. In this case, let’s assign a \(gamma(shape=0.001,scale=1000)\) for both parameters (mean of 1 and very large variance):
# Function for returning the prior probability density for any point in parameter space
GammaPriorFunction <- function(params){
prior <- c(shape=NA,scale=NA)
prior['shape'] <- dgamma(params['shape'],shape=0.001,scale=1000)
prior['scale'] <- dgamma(params['scale'],shape=0.001,scale=1000)
# prior['shape'] <- dunif(params['shape'],3,100) # alternative: could use uniform prior!
# prior['scale'] <- dunif(params['scale'],0.01,0.5)
return(prod(prior))
}
GammaLogPriorFunction <- function(params){
prior <- c(shape=NA,scale=NA)
prior['shape'] <- dgamma(params['shape'],shape=0.001,scale=1000,log=T)
prior['scale'] <- dgamma(params['scale'],shape=0.001,scale=1000,log=T)
# prior['shape'] <- dunif(params['shape'],3,100) # alternative: could use uniform prior!
# prior['scale'] <- dunif(params['scale'],0.01,0.5)
return(sum(prior))
}
curve(dgamma(x,shape=0.001,scale=1000),3,100)
params <- c(shape=40,scale=0.15)
params
## shape scale
## 40.00 0.15
GammaPriorFunction(params)
## [1] 1.583765e-07
prior2D <- matrix(nrow=length(shapevec),ncol=length(scalevec)) # initialize storage variable
newparams <- c(shape=50,scale=0.2)
for(i in 1:length(shapevec)){
newparams['shape'] <- shapevec[i]
for(j in 1:length(scalevec)){
newparams['scale'] <- scalevec[j]
prior2D[i,j] <- GammaPriorFunction(newparams)
}
}
# Visualize the prior likelihood surface
image(x=shapevec,y=scalevec,z=prior2D,zlim=c(0.000000001,0.001),col=topo.colors(12))
#contour(x=shapevec,y=scalevec,z=prior2D,levels=c(-30,-40,-80,-500),add=T)
Note that we are also assuming that the shape and scale are independent in the prior (multiplicative probabilities for the joint prior). However, this does not impose the same assumption on the posterior- parameter correlations in the likelihood can and will be reflected in the posterior distribution.
Then, we need a function that can compute the ratio of posterior probabilities for any given jump in parameter space. Because we are dealing with a ratio of posterior probabilities, we do NOT need to compute the normalization constant.
Without the need for a normalization constant, we just need to compute the ratio of weighted likelihoods (that is, the likelihood weighted by the prior)
# Function for computing the ratio of posterior densities -----------------
PosteriorRatio <- function(oldguess,newguess){
oldLik <- max(1e-90,GammaLikelihoodFunction(oldguess)) # compute likelihood and prior density at old guess
oldPrior <- max(1e-90,GammaPriorFunction(oldguess))
newLik <- GammaLikelihoodFunction(newguess) # compute likelihood and prior density at new guess
newPrior <- GammaPriorFunction(newguess)
return((newLik*newPrior)/(oldLik*oldPrior)) # compute ratio of weighted likelihoods
}
PosteriorRatio2 <- function(oldguess,newguess){
oldLogLik <- GammaLogLikelihoodFunction(oldguess) # compute likelihood and prior density at old guess
oldLogPrior <- GammaLogPriorFunction(oldguess)
newLogLik <- GammaLogLikelihoodFunction(newguess) # compute likelihood and prior density at new guess
newLogPrior <- GammaLogPriorFunction(newguess)
return(exp((newLogLik+newLogPrior)-(oldLogLik+oldLogPrior))) # compute ratio of weighted likelihoods
}
oldguess <- params
newguess <- c(shape=39,scale=0.15)
PosteriorRatio(oldguess,newguess)
## [1] 0.01423757
PosteriorRatio2(oldguess,newguess)
## [1] 0.01423757
Then we need a function for making new guesses, or jumps in parameter space:
# Define proposal distribution --------------------------
#for jumps in parameter space (use normal distribution)!
# function for making new guesses
newGuess <- function(oldguess){
sdshapejump <- 3 #4
sdscalejump <- 0.05
corjump <- -0.6
vcv <- diag(2)*c(sdshapejump^2,sdscalejump^2)
vcv[2,1] <- vcv[1,2] <- corjump*sqrt(sdshapejump^2*sdscalejump^2)
newguess <- mvtnorm::rmvnorm(1,oldguess,vcv)[1,] # c(shape=rnorm(1,mean=0,sd=sdshapejump),scale=rnorm(1,0,sdscalejump))
# newguess <- abs(oldguess + jump) # note: by taking the abs val to avoid negative numbers, our proposal jump probs are not strictly symmetrical, but this should not present a big issue in practice
newguess <- ifelse(newguess<0.001,0.001,newguess)
return(newguess)
}
# set a new "guess" near to the original guess
newGuess(oldguess=params)
## shape scale
## 38.7879892 0.1032595
newGuess(oldguess=params)
## shape scale
## 41.19306213 0.08522816
newGuess(oldguess=params)
## shape scale
## 45.1268979 0.1102452
Now we are ready to implement the Metropolis-Hastings MCMC algorithm:
We need a starting point:
# Set a starting point in parameter space -------------------
startingvals <- c(shape=75,scale=0.28) # starting point for the algorithm
Let’s play with the different functions we have so far…
# Try our new functions ------------------
startingvals
## shape scale
## 75.00 0.28
newguess <- newGuess(startingvals) # take a jump in parameter space
newguess
## shape scale
## 72.5651002 0.2364431
PosteriorRatio2(startingvals,newguess) # difference in posterior ratio
## [1] 7.032794e+121
Now let’s look at the Metropolis-Hastings routine:
# Visualize the Metropolis-Hastings routine: ---------------
chain.length <- 11
oldguess <- startingvals
guesses <- matrix(0,nrow=chain.length,ncol=2)
colnames(guesses) <- names(startingvals)
guesses[1,] <- startingvals
counter <- 2
while(counter <= chain.length){
newguess <- newGuess(oldguess)
post.rat <- PosteriorRatio2(oldguess,newguess)
prob.accept <- min(1,post.rat)
rand <- runif(1)
if(rand<=prob.accept){
oldguess <- newguess
guesses[counter,] <- newguess
}else{
guesses[counter,] <- oldguess
}
counter=counter+1
}
# visualize!
image(x=shapevec,y=scalevec,z=surface2D,zlim=c(-1000,-30),col=topo.colors(12))
contour(x=shapevec,y=scalevec,z=surface2D,levels=c(-30,-40,-80,-500),add=T)
lines(guesses,col="red")
Let’s run it for longer…
# Get more MCMC samples --------------
chain.length <- 1000
oldguess <- startingvals
guesses <- matrix(0,nrow=chain.length,ncol=2)
colnames(guesses) <- names(startingvals)
guesses[1,] <- startingvals
counter <- 2
while(counter <= chain.length){
newguess <- newGuess(oldguess)
post.rat <- PosteriorRatio2(oldguess,newguess)
prob.accept <- min(1,post.rat)
rand <- runif(1)
if(rand<=prob.accept){
oldguess <- newguess
guesses[counter,] <- newguess
}else{
guesses[counter,] <- oldguess
}
counter=counter+1
}
# visualize!
image(x=shapevec,y=scalevec,z=surface2D,zlim=c(-1000,-30),col=topo.colors(12))
contour(x=shapevec,y=scalevec,z=surface2D,levels=c(-30,-40,-80,-500),add=T)
lines(guesses,col="red")
How about for even longer??
# And more... -------------------
chain.length <- 10000
oldguess <- startingvals
guesses <- matrix(0,nrow=chain.length,ncol=2)
colnames(guesses) <- names(startingvals)
guesses[1,] <- startingvals
counter <- 2
while(counter <= chain.length){
newguess <- newGuess(oldguess)
post.rat <- PosteriorRatio2(oldguess,newguess)
prob.accept <- min(1,post.rat)
rand <- runif(1)
if(rand<=prob.accept){
oldguess <- newguess
guesses[counter,] <- newguess
}else{
guesses[counter,] <- oldguess
}
counter=counter+1
}
# visualize!
image(x=shapevec,y=scalevec,z=surface2D,zlim=c(-1000,-30),col=topo.colors(12))
contour(x=shapevec,y=scalevec,z=surface2D,levels=c(-30,-40,-80,-500),add=T)
lines(guesses,col="red")
This looks better! The search algorithm is finding the high-likelihood parts of parameter space pretty well!
Now, let’s look at the chain for the “shape” parameter
# Evaluate "traceplot" for the MCMC samples... ---------------------
## Shape parameter
plot(1:chain.length,guesses[,'shape'],type="l",main="shape parameter",xlab="iteration",ylab="shape")
And for the scale parameter…
## Scale parameter
plot(1:chain.length,guesses[,'scale'],type="l",main="scale parameter",xlab="iteration",ylab="scale")
Can we say that these chains have converged on the posterior distribution for the shape parameter??
First of all, the beginning of the chain “remembers” the starting value, and is therefore not a stationary distribution. We need to remove the first part of the chain, called the ‘burn-in’.
# Remove "burn-in" (allow MCMC routine some time to get to the posterior) --------------
burn.in <- 1000
MCMCsamples <- guesses[-c(1:burn.in),]
chain.length=chain.length-burn.in
plot(1:chain.length,MCMCsamples[,'shape'],type="l",main="shape parameter",xlab="iteration",ylab="shape")
plot(1:chain.length,MCMCsamples[,'scale'],type="l",main="scale parameter",xlab="iteration",ylab="scale")
But it still doesn’t look all that great. Let’s run it for even longer, and see if we get something that looks more like a proper random number generator (white noise)…
# Try again- run for much longer ---------------------
chain.length <- 100000
oldguess <- startingvals
guesses <- matrix(0,nrow=chain.length,ncol=2)
colnames(guesses) <- names(startingvals)
guesses[1,] <- startingvals
counter <- 2
while(counter <= chain.length){
newguess <- newGuess(oldguess)
post.rat <- PosteriorRatio2(oldguess,newguess)
prob.accept <- min(1,post.rat)
rand <- runif(1)
if(rand<=prob.accept){
oldguess <- newguess
guesses[counter,] <- newguess
}else{
guesses[counter,] <- oldguess
}
counter=counter+1
}
# visualize!
image(x=shapevec,y=scalevec,z=surface2D,zlim=c(-1000,-30),col=topo.colors(12))
contour(x=shapevec,y=scalevec,z=surface2D,levels=c(-30,-40,-80,-500),add=T)
lines(guesses,col="red")
Let’s first remove the first 5000 samples as a burn-in
# Use longer "burn-in" ------------------
burn.in <- 25000
MCMCsamples <- guesses[-c(1:burn.in),]
chain.length=chain.length-burn.in
Now, let’s look at the chains again
plot(1:chain.length,MCMCsamples[,'shape'],type="l",main="shape parameter",xlab="iteration",ylab="shape")
plot(1:chain.length,MCMCsamples[,'scale'],type="l",main="scale parameter",xlab="iteration",ylab="scale")
When evaluating these trace plots, we are hoping to see a “stationary distribution” that looks like white noise. This trace plot looks like it might have a little autocorrelation. One way to “fix” this is to thin the MCMC samples:
# "thin" the MCMC samples -----------------------
thinnedMCMC <- MCMCsamples[seq(1,chain.length,by=10),]
plot(1:nrow(thinnedMCMC),thinnedMCMC[,'shape'],type="l",main="shape parameter",xlab="iteration",ylab="shape")
plot(1:nrow(thinnedMCMC),thinnedMCMC[,'scale'],type="l",main="scale parameter",xlab="iteration",ylab="scale")
Now we can examine our posterior distribution!
# Visualize the posterior!
plot(density(thinnedMCMC[,'scale']),main="scale parameter",xlab="scale")
plot(density(thinnedMCMC[,'shape']),main="shape parameter",xlab="shape")
And we can visualize as before.
# More visual posterior checks... -----------------
par(mfrow=c(3,2))
plot(thinnedMCMC,col=1:10000)
plot(thinnedMCMC,type="l")
plot(ts(thinnedMCMC[,1]))
plot(ts(thinnedMCMC[,2]))
hist(thinnedMCMC[,1],40)
hist(thinnedMCMC[,2],40)
par(mfrow=c(1,1))
Hopefully it is clear that the Metropolis-Hastings MCMC method could be modified to fit arbitrary numbers of free parameters for arbitrary models. However, the M-H algorithm by itself is not necessarily the most efficient- and it can be very difficult to develop proposal distributions for multi-dimensional parameter space. The Gibbs sampler tends to be a more effective sampler for multi-dimensional joint posterior distributions. In lab we will play around with Gibbs samplers, mostly using a modeling language called BUGS (Bayesian Inference Using Gibbs Sampling).
NOTE: BUGS implementations (e.g., JAGS) actually tend to use a combination of M-H and Gibbs sampling!
NOTE: M-H and Gibbs samplers aren’t the only MCMC routines out there. For example, the popular program ‘stan’ uses a modification of M-H sampling called ’Hamiltonian Monte Carlo”, which tends to explore parameter space more efficiently
The Gibbs sampler is amazingly straightforward and efficient. Basically, the algorithm successively samples from the full conditional probability distribution – that is, the posterior distribution for arbitrary parameter i conditional on known values for all other parameters in the model.
In many cases, we can’t work out the full posterior distribution for our model directly, but we CAN work out the conditional posterior distribution analytically if all parameters except for the parameter in question were known with certainty. This is especially true if we use conjugate priors for our model specification (what BUGS/JAGS tries to do!). Even if not, the full conditional is often analytically tractable. Nonetheless, even if it’s not analytically tractable, we can use a M-H procedure as a “brute force” last resort!
Q: Why is a Gibbs sampler usually much more efficient than a pure M-H sampler?
Again, remember that MCMC samplers are just a type of random number generator. We can use a Gibbs sampler to develop our own random number generator for a fairly simple known distribution. In this example (same as before), we use a Gibbs Sampler to generate random numbers from a standard bivariate normal probability distribution. Notice that the Gibbs sampler is in many ways more simple and straightforward than the M-H algorithm.
# Simple example of a Gibbs sampler ----------------
# first, recall our simple bivariate normal sampler
rbvn<-function (n, rho){ #function for drawing an arbitrary number of independent samples from the bivariate standard normal distribution.
x <- rnorm(n, 0, 1)
y <- rnorm(n, rho * x, sqrt(1 - rho^2))
cbind(x, y)
}
bvn<-rbvn(10000,0.98)
par(mfrow=c(3,2))
plot(bvn,col=1:10000)
plot(bvn,type="l")
plot(ts(bvn[,1]))
plot(ts(bvn[,2]))
hist(bvn[,1],40)
hist(bvn[,2],40)
par(mfrow=c(1,1))
# Now construct a Gibbs sampler alternative ---------------
gibbs<-function (n, rho){ # a gibbs sampler implementation of a bivariate random number generator
mat <- matrix(ncol = 2, nrow = n) # matrix for storing the random samples
x <- 0
y <- 0
mat[1, ] <- c(x, y) # initialize the markov chain
for (i in 2:n) {
x <- rnorm(1, rho * y, sqrt(1 - rho^2)) # sample from x conditional on y
y <- rnorm(1, rho * x, sqrt(1 - rho^2)) # sample from y conditional on x
mat[i, ] <- c(x, y)
}
mat
}
Then we can use the Gibbs sampler to get random samples from this known distribution…
# Test the Gibbs sampler ------------------
bvn<-gibbs(10000,0.98)
par(mfrow=c(3,2))
plot(bvn,col=1:10000)
plot(bvn,type="l")
plot(ts(bvn[,1]))
plot(ts(bvn[,2]))
hist(bvn[,1],40)
hist(bvn[,2],40)
par(mfrow=c(1,1))
There is quite a bit of apparent autocorrelation in the samples of the Markov chain here. Gibbs samplers frequently have this issue!
Finally, let’s build a Gibbs sampler for our favorite Myxomatosis example! To do this, we will use the BUGS language (as implemented in JAGS), to help us!
The BUGS language looks similar to R, but there are several key differences:
Here is the myxomatosis example, as implemented in the BUGS language:
NOTE: this code looks a little like R, but don’t be confused- this is not R!
model {
#############
# LIKELIHOOD
############
for(obs in 1:n.observations){
titer[obs] ~ dgamma(shape,rate)
}
#############
# PRIORS
############
shape ~ dgamma(0.001,0.001)
scale ~ dgamma(0.001,0.001)
rate <- 1/scale # convert the scale parameter to a "rate" for BUGS
}
We can use the “cat” function in R to write out this model to a text file in your working directory:
# Myxomatosis example in BUGS modeling language ---------------
# Write the BUGS model to file
cat("
model {
#############
# LIKELIHOOD
############
for(obs in 1:n.observations){
titer[obs] ~ dgamma(shape,rate)
}
#############
# PRIORS
############
shape ~ dgamma(0.001,0.001)
scale ~ dgamma(0.001,0.001)
rate <- 1/scale
}
", file="BUGSmodel.txt")
Now that we have the BUGS model packaged as a text file, we bundle the data into a single list object that contains all the relevant data referenced in the BUGS code:
# Encapsulate the data into a single "list" object ------------------
myx.data.for.bugs <- list(
titer = Myx$titer,
n.observations = length(Myx$titer)
)
myx.data.for.bugs
## $titer
## [1] 5.207 5.734 6.613 5.997 6.612 6.810 5.930 6.501 7.182 7.292 7.819 7.489
## [13] 6.918 6.808 6.235 6.916 4.196 7.682 8.189 7.707 7.597 7.112 7.354 7.158
## [25] 7.466 7.927 8.499
##
## $n.observations
## [1] 27
Then we need to define the initial values for all parameters. It is convenient to define this as a function, so that each MCMC chain can be initialized with different starting values. This will become clear later!
# Function for generating random initial values --------------
init.vals.for.bugs <- function(){
list(
shape=runif(1,20,100),
scale=runif(1,0.05,0.3)
)
}
init.vals.for.bugs()
## $shape
## [1] 29.34063
##
## $scale
## [1] 0.1670234
init.vals.for.bugs()
## $shape
## [1] 35.34258
##
## $scale
## [1] 0.09382479
init.vals.for.bugs()
## $shape
## [1] 42.82044
##
## $scale
## [1] 0.2720501
Now we can call JAGS!
# Run JAGS!!!! ------------------
#library(R2jags)
library(jagsUI)
library(coda)
params.to.store <- c("shape","scale")
jags.fit <- jags(model="BUGSmodel.txt",data=myx.data.for.bugs,inits=init.vals.for.bugs,parameters.to.save=params.to.store,n.iter=5000,n.chains = 3,n.adapt = 100,n.burnin = 0)
##
## Processing function input.......
##
## Done.
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 27
## Unobserved stochastic nodes: 2
## Total graph size: 33
##
## Initializing model
##
## Adaptive phase, 100 iterations x 3 chains
## If no progress bar appears JAGS has decided not to adapt
##
## No burn-in specified
##
## Sampling from joint posterior, 5000 iterations x 3 chains
##
##
## Calculating statistics.......
##
## Done.
jagsfit.mcmc <- jags.fit$samples # extract "MCMC" object
summary(jagsfit.mcmc)
##
## Iterations = 101:5100
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## shape 49.1477 13.22589 0.1079890 1.852842
## scale 0.1518 0.04294 0.0003506 0.005888
## deviance 77.3206 1.88662 0.0154042 0.097958
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## shape 27.45608 39.3182 47.5946 57.8432 76.8916
## scale 0.09013 0.1196 0.1456 0.1763 0.2521
## deviance 75.38302 75.9368 76.7888 78.1168 82.2395
plot(jagsfit.mcmc)
This is probably a good time to talk about convergence of MCMC chains on the stationary posterior distribution. The above plots don’t look great. We want to see white noise, and we want to see chains that look similar to one another.
The first check is just visual- we look for the following to assess convergence:
One way we might be able to do a better job here is to run the chains for longer and discard the initial samples as a burn in!
We can also try to reduce serial autocorrelation by thinning our chain- here we retain only 1 out of every 100 samples.
# Run the chains for longer! -----------------
jags.fit <- jags(model="BUGSmodel.txt",data=myx.data.for.bugs,inits=init.vals.for.bugs,parameters.to.save=params.to.store,
n.iter = 100000,n.chains = 3,n.adapt = 1000,n.burnin = 10000,
n.thin=100,parallel=T )
##
## Processing function input.......
##
## Done.
##
## Beginning parallel processing using 3 cores. Console output will be suppressed.
##
## Parallel processing completed.
##
## Calculating statistics.......
##
## Done.
jagsfit.mcmc <- jags.fit$samples # convert to "MCMC" object (coda package)
summary(jagsfit.mcmc)
##
## Iterations = 10100:1e+05
## Thinning interval = 100
## Number of chains = 3
## Sample size per chain = 900
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## shape 47.6273 13.138 0.2528351 0.487647
## scale 0.1573 0.046 0.0008852 0.001747
## deviance 77.3699 1.990 0.0382902 0.048006
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## shape 25.66686 38.4330 46.2101 55.7803 75.8888
## scale 0.09118 0.1246 0.1498 0.1806 0.2721
## deviance 75.38681 75.9127 76.7803 78.1943 82.8007
plot(jagsfit.mcmc)
Just visually, this looks better. Now we can use some more quantitative convergence metrics.
One simple and intuitive convergence diagnostic is the Gelman-Rubin diagnostic, which assesses whether chains are more different from one another than they should be on the basis of simple Monte Carlo error:
# Run convergence diagnostics ------------------
gelman.diag(jagsfit.mcmc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## shape 1.01 1.02
## scale 1.01 1.03
## deviance 1.01 1.03
##
## Multivariate psrf
##
## 1.01
In general, values of 1.1 or higher are considered poorly converged. Compute the G-R diagnostic for all free parameters in your model. If your tests fail, you should try running longer chains!
So this model looks pretty good!!