These next few weeks are focused on fitting models, specifically estimating model parameters and confidence intervals, using likelihood-based techniques (maximum likelihood and Bayesian model fitting). Estimating model parameters means finding the values of a set of parameters that best ‘fit’ the data. Likelihood is a metric that represents the probability of drawing your particular data set given a fully specified model (e.g., a particular data-generating model with a particular set of parameter values). This lab is designed to take two lab sessions to complete.
As with all labs in this course, your answers will take the form of either R functions (submitted as an R script) or short written responses (submitted together in a Word document). The R functions (and only the functions, not your testing code) should be stored in an R script file ('.R' extension). You don't need to follow any naming convention for your R script as long as you submit via WebCampus.
Please submit the R script and the Word document via WebCampus by midnight on the due date (one week after the final lab session allocated for this topic – here, Oct. 25, 2021). You can work in groups but please submit the materials individually.
First, take a little time to review the likelihood lecture!
Next, load the reed frog predation data from the Bolker book (it can be found here). Save this file to your working directory.
This dataset represents predation data for Hyperolius spinigularis (Vonesh and Bolker 2005). You can read more about this data set in the Bolker book.
NOTE: if you’d like to follow along, the R script for this demo can be found here
###### Read in the reed frog data set
#rfp <- read.csv("ReedfrogPred.csv")
########
# alternatively, load the data using the 'emdbook' package:
library(emdbook)
rfp <- ReedfrogPred
head(rfp)
Because predation on tadpoles is size- and density-dependent, we will subset these data to a single size class ('small') and density (10), keeping all treatments that include a predator (this simplifies the problem!). Subset your data now:
##### Take a subset of the data
rfp_sub <- subset(rfp, pred=="pred" & size=="small" & density==10)
rfp_sub
For each individual, the per-trial probability of being eaten by a predator is a binomial process (i.e., they can survive or die during the interval). Recall that the likelihood that k out of N individuals are eaten as a function of the per capita predation probability p is:
\(Prob(k|p,N) = \binom{N}{k}p^{k}(1-p)^{N-k}\)
Since the observations are independent, the joint likelihood of the whole data set is the product of the likelihood of each individual observation. So, if we have \(n\) observations, each with the same total number of tadpoles \(N\), and the number of tadpoles killed in the \(i\)th observation is \(k_i\), then the likelihood is:
\(L = \prod_{i=1}^{n}\binom{N}{k_{i}}p^{k_{i}}(1-p)^{N-k_{i}}\)
Here we assume the data are binomially distributed – the binomial distribution is the natural choice for data that are represented as k 'successes' out of N 'trials'. We conventionally work in terms of the log-likelihood (LL), which is:
\(LL = \sum_{i=1}^{n}\left [ \log\binom{N}{k_{i}}+k_{i}\log(p)+(N-k_{i})\log(1-p) \right ]\)
In R this would be
killed <- rfp_sub$density-rfp_sub$surv
N <- rfp_sub$density
p <- 0.5
sum(dbinom(killed, size=N, prob=p, log=TRUE)) # data likelihood (log scale) at p = 0.5
There is only one parameter in this calculation, p, because we know how many individuals we started with (N = 10 for each trial) and how many survived in each trial (7, 5, 9, and 9, so the numbers killed were 3, 5, 1, and 1). So we want to solve for the most likely value of p given our observations. In essence we do this by picking a possible value of p (which can only range from 0 to 1), calculating the log-likelihood using the equation above, picking another value of p, recalculating the log-likelihood, and so on until we have covered the range of possible values of p and identified the one with the highest likelihood. Of course R has useful built-in functions to help us optimize the likelihood function!
The "dbinom()" function calculates the binomial likelihood for a specified data set: a vector of the numbers of successes (or events) k, given a probability p and a number of trials N. Specify your vector of successes (here a 'success' means being eaten by a predator!):
num_killed <- rfp_sub$density-rfp_sub$surv # specify vector of "successes" (being eaten!)
num_killed
## [1] 3 5 1 1
Given our observed k (number killed), and N = 10 for each trial, what is the likelihood that p = 0.5 for each of our trials?
dbinom(num_killed,size=10,prob=0.5) # evaluate data likelihood with p=0.5
## [1] 0.117187500 0.246093750 0.009765625 0.009765625
We can see that given our data, fixed sample size, and model (with p = 0.5), our observed outcomes are very unlikely.
What is the likelihood of observing all 4 of our outcomes, i.e., the joint probability of our data?
prod(dbinom(num_killed,size=10,prob=0.5)) # joint data likelihood
## [1] 2.750312e-06
The joint likelihood will always be less than 1, and it gets smaller and smaller each time we add more data (can you see why?). This is why we prefer to work with log-likelihoods, which are much easier to handle numerically (sums instead of tiny products). Because the log of a value < 1 is negative, our log-likelihood values are usually negative.
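To see why the joint likelihood shrinks toward zero, and why logs help, here is a tiny illustration (made-up probabilities, not the lab data): multiplying many values less than 1 quickly underflows to zero on a computer, while summing their logs stays perfectly manageable.
#########
# Why work on the log scale? (illustration with made-up values)
many_probs <- rep(0.01, 200)   # 200 hypothetical per-observation likelihoods
prod(many_probs)               # the product underflows to 0
sum(log(many_probs))           # the log-likelihood is about -921, no problem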
We can now build on this process to evaluate the likelihood function over the entire possible parameter space (the probability of being eaten, which can range from 0 to 1).
First we make a sequence of 100 possible parameter values from 0.01 to 1.
p <- seq(0.01, 1, length=100) # prepare for visualizing the likelihood across parameter space
Then we make an empty storage vector for the likelihoods we'll calculate:
Lik <- numeric(length=100)
Now for the for loop! For every value of p (a sequence of 100 values), we will calculate the joint binomial likelihood of the data and store it in the 'Lik' vector.
#########
# plot out the likelihood
for(i in 1:100){
Lik[i] <- prod(dbinom(num_killed,size=10,prob=p[i]))
}
plot(Lik~p,lty="solid",type="l", xlab="Predation Probability", ylab="Likelihood")
But we want to maximize the log-likelihood:
########
# plot out the log-likelihood
p <- seq(0.01, 1, by=0.01)
LogLik <- numeric(length=100)
for(i in 1:100){
LogLik[i] <- sum(dbinom(num_killed, size=10,
prob=p[i],log=TRUE))
}
plot(LogLik~p,lty="solid",type="l", xlab="Predation Probability", ylab="Log Likelihood")
We can ask R to tell us at which value of p the Log-Likelihood is maximized:
p[which(LogLik==max(LogLik))] # MLE for probability of predation
## [1] 0.25
And we can add a vertical line with "abline()" to indicate the maximum likelihood estimate:
plot(LogLik~p,lty="solid",type="l", xlab="Predation Probability", ylab="Log Likelihood")
abline(v=0.25,lwd=3)
Alternatively, we can use the optim() or mle2() functions to find the maximum likelihood estimate. Although we seek the most likely (maximum likelihood) estimate, in practice we generally minimize the negative log-likelihood. To do so, first write a function that calculates the binomial negative log-likelihood for any given value of the parameter p.
###########
# Write a likelihood function
# p: probability of predation per trial (param to estimate)
# k: number killed per trial (data)
# N: number of tadpoles per trial (data)
binomNLL1 <- function(p, k, N) {
  -sum(dbinom(k, size=N, prob=p, log=TRUE))
}
As we did in class, you can use the 'optim()' function to minimize your negative log-likelihood function ('binomNLL1()') given a vector of starting parameters and your data. The starting parameters need not be accurate, but they do need to be reasonable for the function to work; that's why we spent time in class eyeballing curves (also read the Bolker book for a discussion of the 'method of moments', which can help you get reasonable starting values!). Given that there is only one estimable parameter, p, in the binomial function, you need only provide a starting estimate for it. Now minimize the negative log-likelihood:
#####
# use "optim()" to find the MLE
opt1 <- optim(fn=binomNLL1, par = c(p=0.5), N = 10, k = num_killed, method = "BFGS") # use "optim()" to estimate the parameter value that maximizes the likelihood function
## Warning in dbinom(k, size = N, prob = p, log = TRUE): NaNs produced
## Warning in dbinom(k, size = N, prob = p, log = TRUE): NaNs produced
## Warning in dbinom(k, size = N, prob = p, log = TRUE): NaNs produced
## Warning in dbinom(k, size = N, prob = p, log = TRUE): NaNs produced
## Warning in dbinom(k, size = N, prob = p, log = TRUE): NaNs produced
You may get several warning messages; can you think why? The object opt1 is a list that stores information about your optimization process.
opt1 # check out the results of "optim()"
## $par
## p
## 0.2500002
##
## $value
## [1] 7.571315
##
## $counts
## function gradient
## 17 7
##
## $convergence
## [1] 0
##
## $message
## NULL
The important bits are whether the optimization achieved convergence and the parameter estimate it converged on.
opt1$convergence
## [1] 0
Here a value of 0 means convergence was achieved; any nonzero value indicates a problem (for example, 1 means the iteration limit was reached before convergence). There is more information about convergence and alternative optimization options in Chapter 7 of the Bolker book.
Your best-fit estimate of p is:
opt1$par # MLE
## p
## 0.2500002
This numerically computed answer is (almost exactly) equal to the theoretical answer of 0.25 (the pooled proportion killed: 10 of 40 tadpoles). The value of the function you optimized, binomNLL1, at this estimate is:
opt1$value # max. likelihood (actually minimum negative-log-likelihood)
## [1] 7.571315
which is the negative log-likelihood for the model. And, as we already know, the absolute likelihood of this particular outcome (3, 5, 1, and 1 out of 10 tadpoles eaten in the four replicates) is quite low, even for this simple four-observation scenario:
exp(-opt1$value) # convert to likelihood
## [1] 0.0005150149
Plot your observed outcomes against your predictions under the maximum likelihood model:
hist(num_killed,xlim=c(0,10),freq=F)
curve(dbinom(x,prob=opt1$par,size=10),add=T,from=0,to=10,n=11)
Note that “freq=F” scales the y-axis of a histogram to “density”, which allows us to overlay probability density functions.
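As an aside, the same one-parameter fit can be obtained with the mle2() function from Bolker's 'bbmle' package, which was mentioned above alongside optim(). This is just a sketch (it assumes bbmle is installed); it should return essentially the same estimate of p.
#########
# Optional: the same fit via mle2() from the 'bbmle' package
library(bbmle)
mle1 <- mle2(minuslogl=binomNLL1, start=list(p=0.5),
             data=list(k=num_killed, N=10))
coef(mle1)   # should be ~0.25, matching optim()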
In this exercise you will develop and use a likelihood function that returns the data likelihood for the following scenario: you visit each of three known-occupied wetland sites ten times, and for each site you record the number of visits on which a particular frog species was detected (at least one call within a 5-minute period). Your data are as follows: 3, 2, and 6 detections for sites 1, 2, and 3, respectively.
Write a function (likelihood function) called "NLL_frogOccupancy()" for computing the data likelihood (actually, negative log-likelihood) for the above scenario. Your likelihood function should compute the likelihood of these data [3, 2, and 6 detections for sites 1, 2, and 3, respectively] for any given detection probability \(p\), assuming that all sites are occupied continuously. Assume that all sites have the same detection probability (\(p\), which is our free parameter).
Include your function in your submitted R script!
You can use this “sandbox” (below) to develop and test your function!
NLL_frogOccupancy <- function(params,data,N){
# [add code here!]
}
#NLL_frogOccupancy(params=0.5,data=c(3,2,6),N=10)
And test your function! For example:
NLL_frogOccupancy(params=0.5,data=c(3,2,6),N=10) # test your function
## [1] 6.853154
In your Word document, respond briefly to the following questions:
So far we've looked at how to obtain the likelihood of our data set given a stochastic model (the binomial distribution), but now we want to consider more interesting ecological questions, such as cases where the parameters of that model vary among groups or depend upon covariates. Recall that we subset our data above because we expected survival to be (in part) density-dependent. Here we'll consider how to model the probability of a tadpole being killed as a function of the initial density of tadpoles in the population. To do so, we need to incorporate a deterministic function into our stochastic model.
Save the reed frog functional response data set to your working directory (it can be found here).
First, examine the first few lines:
#####
# 3.2a
#rffr <- read.csv("ReedfrogFuncResp.csv",row.names = 1)
# alternative:
rffr <- emdbook::ReedfrogFuncresp # from Bolker's "emdbook" package
# ?Reedfrog # learn more about this dataset
head(rffr)
Let's look at the distribution of the data (the observed proportion of tadpoles killed):
hist(rffr$Killed/rffr$Initial)
Based on what we know mechanistically about the data, we’ll use a binomial distribution to describe the observed number killed.
Plot the number killed against the initial density (using plot()) to see what sort of deterministic function might describe the pattern. It looks like it could be linear, but we know that this is a predation response and that predators become handling-limited (saturated) at high prey densities. On page 182, Bolker notes that if the predation rate is \(aN/(1+ahN)\) (a Holling type II functional response), then the per-capita predation rate of tadpoles decreases hyperbolically with tadpole density \((= a/(1 + ahN))\). We'll use this deterministic function for our data.
First, let's see what that curve would look like over our data points with an initial guess at the parameters (we always need an initial guess to seed our optimization algorithms). Recall that the a parameter of this hyperbolic function sets the initial slope, which we'll guess to be around 0.5, and the h parameter is the reciprocal of the asymptote, which we fiddled with until it roughly matched the data (so try 1/80).
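For example (a minimal sketch; the plotting details are up to you, and the parameter values below are just the initial guesses described above), you could overlay the Holling type II expectation on the raw data with curve():
#########
# Eyeball an initial guess for the Holling type II curve (a = 0.5, h = 1/80)
plot(rffr$Initial, rffr$Killed, xlab="Initial tadpole density", ylab="Number killed")
a.guess <- 0.5
h.guess <- 1/80
curve(a.guess*x/(1 + a.guess*h.guess*x), add=TRUE)   # expected number killed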
This looks pretty good, but we want to actually fit the line to the data instead of making guesses, and we’ll use likelihood to do that. Just like before, we’ll write a negative log likelihood function, but this time we’ll incorporate the deterministic model!
Reed frog functional response challenge #1: write a likelihood function for the Holling-II functional response!
Write a function called “binomNLL2()” for computing the data likelihood for this model. This likelihood function should reflect that the data are drawn from a binomial distribution (either killed or not), and that the probability of predation (the number killed divided by the initial number) is explained by the Holling type II equation.
Include your function in your submitted R script!
You can use this “sandbox” (below) to develop and test your function!
binomNLL2 <- function(params,N,k){
# [add code here!]
}
# binomNLL2(c(a=0.4,h=1/200),c(5,10,15),c(3,5,6))
You can test your function using something like the following:
binomNLL2(c(a=0.4,h=1/200),c(5,10,15),c(3,5,6))
## [1] 4.706114
Now we can find the parameter values that best describe these data using ‘optim()’. We’ll use the same initial values for a and h that we used to plot the curve. N is the initial number of tadpoles, and k is the number of tadpoles killed.
opt2 <- suppressWarnings( optim(fn=binomNLL2, par=c(a=0.5,h=(1/80)), N=rffr$Initial, k=rffr$Killed) ) #use default simplex algorithm
opt2
## $par
## a h
## 0.52593924 0.01660613
##
## $value
## [1] 46.72136
##
## $counts
## function gradient
## 53 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
The results are not that different from our starting values, so we made a good guess!
Write a function called “Rffuncresp()” for computing the maximum-likelihood estimates and plotting the goodness-of-fit for this model.
Include your function in your submitted R script!
You can use this “sandbox” (below) to develop and test your function!
Rffuncresp <- function(params,data){
# [add code here!]
}
#Rffuncresp(params=c(a=0.5,h=(1/30)),data=rffr)
You can test the function using something like this:
inits <- c(a=0.6,h=(1/60)) # test the function
Rffuncresp(params=inits,data=rffr)
## a h
## 0.52592194 0.01660555
NOTE: the "prediction interval" you are asked to generate here is sometimes called a "plug-in" prediction interval, and it is a quick-and-dirty way to assess goodness-of-fit. Essentially, you take the best-fit model (with parameters set at their MLE values) and use the 0.025 and 0.975 quantiles of the stochastic process (here, the binomial distribution) to define the range of data that would generally be produced under this model. NOTE: in "qbinom()", for the "prob" argument, you can use the ratio of the predicted y values to the x values (the initial densities) to get a probability between 0 and 1. For example:
########
# how to generate 'plug-in' prediction intervals
xvec <- seq(40,80,5)
yvec <- 0.5/(1+0.5*0.015*xvec) * xvec
upper<-qbinom(0.975,prob=yvec/xvec, size=xvec)
lower<-qbinom(0.025,prob=yvec/xvec, size=xvec)
upper
lower
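If you want to see how such intervals might look on a plot (purely illustrative; xvec and yvec above are made-up values, not your fitted model), you could overlay them with lines():
# overlay the (made-up) plug-in interval on a plot of expected kills
plot(xvec, yvec, type="l", xlab="Initial density", ylab="Number killed", ylim=range(lower, upper))
lines(xvec, upper, lty=2)
lines(xvec, lower, lty=2)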
In your Word document, respond briefly to the following questions:
Try some different starting values for the a and h parameters. Can you find any starting values that are so bad they cause the optimization algorithm (the default algorithm used by the "optim()" function in R) to fail?
Bolker calls the prediction interval you generated above a plug-in prediction interval. In what way(s) is this interval different from a true prediction interval? (hint: are there any sources of error that the plug-in prediction interval ignores?)
Step 1. Identify the response and explanatory variables (e.g., Predation probability and Initial Population Size). Just stating what the response and explanatory variables are will help you start modeling.
Step 2. Determine the stochastic distribution (e.g., Binomial). In this case, the stochastic distribution was easy to identify because we chose it mechanistically. Other times it may not be so clear what the best distribution is, and looking at the histogram and plotting different distributions over the top will be helpful.
Step 3. Specify the deterministic function (e.g., Holling type II). Again, we chose this function mechanistically, but we could have chosen different functions just by looking at the plot of the points.
Step 4. Specify the likelihood of the data given our deterministic expectations and the stochastic distribution. Our negative log likelihood function combined the stochastic and deterministic elements together by having the stochastic parameter (in this case the binomial probability, p) be dependent upon the deterministic parameters.
Step 5. Make a guess for the initial parameters (e.g., a=0.5, h=1/80). You need to have an initial guess at the parameters to make 'optim()' work, and we plotted the Holling curve to make our guess. Sometimes you will also need to make a guess at the parameters of the stochastic distribution. In these cases, the method of moments is often the best option (see the Bolker book for details; a quick illustration follows these steps).
Step 6. Estimate the best-fit parameters using maximum likelihood. We used optim() to search over combinations of the parameters a and h for the values that correspond to the minimum negative log-likelihood.
Step 7. Add confidence or prediction intervals around your estimates to represent uncertainty. We calculated some plug-in estimates to put “pseudo-prediction intervals” around our estimates based on the stochastic distribution.
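As a quick illustration of the method-of-moments idea from Step 5 (using the binomial predation example from earlier in this lab, so nothing new here): for binomial data, the observed proportion of 'successes' is a natural starting value, and in this case it coincides with the MLE we found with optim().
#########
# Method-of-moments starting value for the binomial predation example
mean(num_killed)/10   # observed proportion killed = 0.25, the same as the MLE above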
Let’s continue the myxomatosis virus titer example from the optimization lecture. The difference is that this time we’ll model a deterministic process (decay of viral loads over time) in addition to the stochastic process. Our goal is to fit a model to data on viral titers through time for the viruses that are grade 1 (highest virulence). Biological common sense, and data from other titer levels, tells us to expect titer levels to start at zero, increase over time to a peak, and then to decline. Given those expectations, we’ll fit a Ricker model to these data, following Bolker’s example and extending it just a bit.
Our goals are to:
You can add the data from the ‘emdbook’ package:
#######
# Exercise 3.3a
########
# Myxomatosis data
library(emdbook)
data(MyxoTiter_sum) # load the data
head(MyxoTiter_sum)
Select just the grade 1 titers:
myxdat <- subset(MyxoTiter_sum, grade==1) # select just the most virulent strain
plot(myxdat$titer~myxdat$day,xlim=c(0,10)) # visualize the relationship
Fit a Ricker model to the myxomatosis data – that is, use a Ricker deterministic equation to model the expected (mean) virus titer as a function of days since infection (see below for step-by step instructions).
Write a function called "NLL_myxRicker()" for computing the data likelihood (negative log-likelihood) for this model.
This task (developing the likelihood function) can be broken down into a few steps, just as we did above!
Our question is: how does a virus titer change in rabbits as a function of time since infection? This is almost exactly the same problem as we just did (above), but we’re using different distributions and functions. To solve the problem, you’ll need to go through the same steps outlined above.
Step 1. Identify the response and explanatory variables. The response is the virus titer and the explanatory variable is the days since infection.
Step 2. Determine the stochastic distribution.
Start by plotting a histogram of the response variable. Bolker suggests a gamma distribution – does it look like a gamma would work? Write down the parameters of the gamma distribution (page 133). (For the gamma distribution, use the shape and scale parameters, not the rate.) A rough visual check is sketched after the aside below.
Step 3. Specify the deterministic function.
Plot the data points (hint: look at figure 6.5b). Bolker suggests the Ricker curve – does it look like the Ricker curve would work? Write down the equation and parameters for the Ricker curve (page 94).
Step 4. Specify the likelihood of the data given our deterministic expectations and the stochastic distribution. Take a moment to think about how the parameters of the stochastic distribution are determined by the parameters of the deterministic function. For the gamma distribution, both the shape and scale parameters are related to the mean of the distribution: mean = shape × scale (page 133). So how will you specify that the deterministic function (the Ricker model) represents the mean? What parameters do you need your negative log-likelihood function to estimate? Write out your negative log-likelihood function.
ASIDE: the shape and scale parameters of the gamma distribution can be written in terms of its mean and variance: the shape parameter is \(\frac{mean^2}{var}\) and the scale parameter is \(\frac{var}{mean}\)!
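To make the aside (and the histogram-plus-overlay idea from Step 2) concrete without giving away the exercise, here is a minimal sketch that overlays a gamma curve on the titer histogram, with shape and scale computed from the sample mean and variance. Note that this ignores the day effect entirely, so it is only a rough visual check, not the model you are asked to fit.
#########
# Rough visual check (illustration only): overlay a moment-matched gamma
# on the titer histogram; this ignores the time trend
m <- mean(myxdat$titer)
v <- var(myxdat$titer)
hist(myxdat$titer, freq=FALSE, xlab="Virus titer", main="")
curve(dgamma(x, shape=m^2/v, scale=v/m), add=TRUE)   # widen ylim if the curve is clipped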
Include your function in your submitted R script!
You can use this “sandbox” (below) to develop and test your function!
NLL_myxRicker <- function(params,data){
# [add code here!]
}
#NLL_myxRicker(params=c(a=2,b=0.1,shape=80),data=myxdat[,-1])
You can test your function using something like this:
NLL_myxRicker(params=c(a=4,b=0.2,shape=40),data=myxdat[,-1]) # test the function
## [1] 35.39427
Write a function called “MyxRicker()” for computing the maximum likelihood estimates and plotting the goodness-of-fit for this model.
Completing this exercise involves working through the final steps (5-7) of the process outlined above:
Step 5. Make a guess for the initial parameters. We need initial parameters to put into “optim()”. Remember that the Ricker curve parameters can be estimated based on the initial slope and maximum (see pg. 95). Try plotting the curve over the points to get an approximate fit. There really isn’t any easier way to get there than trial and error for the deterministic function.
Step 6. Estimate the best fit parameters using maximum likelihood.
Now use ‘optim()’ to get your maximum likelihood parameter estimates.
Step 7. Add confidence/prediction intervals around your estimates.
After your run of 'optim()' (did you achieve convergence?), plot your fitted Ricker curve over your data. Revisit the earlier prediction interval code to add plug-in prediction intervals around your predicted curve based on gamma-distributed errors (your plot should resemble Figure 6.5b on page 184 of the text).
Use the “qgamma()” function to build your plug-in intervals!
########
# Plug-in prediction intervals!
upper<-qgamma(0.975,shape=?, scale=?) # remember that the mean of the gamma distribution is shape*scale
lower<-qgamma(0.025,shape=?, scale=?)
Include your function in your submitted R script!
You can use this “sandbox” (below) to develop and test your function!
MyxRicker <- function(params,data){
# [add code here!]
}
#MyxRicker(params=c(a=1,b=0.1,shape=30),data=myxdat[,-1])
And as always, make sure to test your function code using something like this:
MyxRicker(params=c(a=2,b=0.2,shape=30),data=myxdat[,-1]) # test the function
## a b shape
## 3.5611591 0.1713262 90.5287907
Making "plug-in" prediction intervals looks nice on the plot (and is a useful, quick-and-dirty way to assess goodness-of-fit), but if we also want to visualize parameter uncertainty (and not just the uncertainty arising from the stochastic, or residual error, component) we need to consider other plausible parameter values from our n-dimensional likelihood surface (with as many dimensions as there are free parameters).
Make a function called “MyxRicker_ci()” that takes any two parameters from your myxomatosis model as inputs (holding any remaining parameters constant) and visualizes how the likelihood changes across this 2-dimensional parameter space. What we’ll end up with will look something like Figure 6.7 in the Bolker book.
NOTE: my testing code will only test your function with the parameters "a" and "b", so you can hard-code your function with these two parameters as the ones defining your 2-D parameter space (holding the 'shape' parameter constant).
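To make the idea of a 2-D likelihood surface concrete without solving the myxomatosis exercise for you, here is a minimal sketch using the reed frog functional response model from earlier (it assumes your binomNLL2() function and the rffr data are in your workspace; the grid limits are arbitrary choices). Your MyxRicker_ci() function can follow the same basic recipe: loop over a grid of the two selected parameters, evaluate the negative log-likelihood at each combination, and draw the surface.
#########
# Sketch: 2-D negative log-likelihood surface over 'a' and 'h' for the
# functional response model (assumes binomNLL2() and rffr from above)
a_vec <- seq(0.3, 0.8, length=50)
h_vec <- seq(0.005, 0.03, length=50)
nll_surf <- matrix(NA, nrow=length(a_vec), ncol=length(h_vec))
for(i in seq_along(a_vec)){
  for(j in seq_along(h_vec)){
    nll_surf[i,j] <- binomNLL2(c(a=a_vec[i], h=h_vec[j]),
                               N=rffr$Initial, k=rffr$Killed)
  }
}
contour(a_vec, h_vec, nll_surf, xlab="a", ylab="h",
        levels=min(nll_surf)+c(0.5,1,2,5,10))   # contours relative to the minimum NLL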
Include your function in your submitted R script!
You can use this “sandbox” (below) to develop and test your function!
MyxRicker_ci <- function(LikFunc,params,params_selected,data,param1_lims,param2_lims){
# [add code here!]
}
# MyxRicker_ci(LikFunc = NLL_myxRicker, params=c(a=1,b=0.3,shape=50), params_selected = c("a","b"),
# data=myxdat[,-1], param1_lims= c(0.1,9),param2_lims=c(0.01,0.5))
Try testing your code- something like this:
# test: Ricker params "a" and "b" [make sure this one works]
testab <- MyxRicker_ci(LikFunc = NLL_myxRicker, params=c(a=2,b=0.2,shape=30), params_selected = c("a","b"), data=myxdat[,-1], param1_lims= c(0.1,9),param2_lims=c(0.01,0.5))
# test: Ricker param "a" and gamma "shape" [if this works, you fully implemented the suggested algorithm!]
testas <- MyxRicker_ci(LikFunc = NLL_myxRicker, params=c(a=2,b=0.2,shape=30), params_selected = c("a","shape"), data=myxdat[,-1], param1_lims= c(1,8),param2_lims=c(10,300))
And here is the output from the functions (approximate profile likelihood confidence intervals):
testab
testas
Construct profile likelihood confidence intervals for any one selected parameter using repeated calls to the “optim()” function. You will need to modify your likelihood functions and “optim()” commands to estimate only the parameters other than the one you are trying to get a CI for (because you’ll be fixing that parameter at many values on either side of the MLE).
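To illustrate the recipe (again using the reed frog functional response model from earlier rather than the myxomatosis model; it assumes binomNLL2() and rffr are in your workspace, and the grid and bounds are arbitrary choices), a profile for the a parameter might look something like this. For each fixed value of a we re-fit the remaining parameter (h) with optim(), and the approximate 95% interval is the set of a values whose profile negative log-likelihood lies within qchisq(0.95, 1)/2 (about 1.92) of the minimum.
#########
# Sketch: profile negative log-likelihood for 'a' in the functional response
# model (assumes binomNLL2() and rffr from above)
a_prof <- seq(0.3, 0.9, length=100)        # grid of fixed values for 'a'
prof_nll <- sapply(a_prof, function(a_fix){
  # re-optimize h with 'a' held fixed at a_fix
  optim(par=0.015,
        fn=function(h) binomNLL2(c(a=a_fix, h=h), N=rffr$Initial, k=rffr$Killed),
        method="Brent", lower=0.001, upper=0.1)$value
})
cutoff <- min(prof_nll) + qchisq(0.95, df=1)/2   # likelihood-ratio cutoff (~1.92 above the minimum)
range(a_prof[prof_nll <= cutoff])                # approximate 95% profile CI for 'a'
plot(a_prof, prof_nll, type="l", xlab="a", ylab="Profile negative log-likelihood")
abline(h=cutoff, lty=2)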
–End of lab 3–