To follow along with the R-based lessons and demos, right-click (or command-click) on this link and save the script to your working directory.
\(\chi^2 = \sum_{i=1}^k\frac{(O_i-E_i)^2}{E_i}\)
Here, \(E_i\) is the expected number of observations (count) in category \(i\) (e.g., under the null hypothesis), and \(O_i\) is the observed number/count in category \(i\). So the Chi-squared statistic summarizes the degree to which the observed counts disagree with the expected counts across all categories.
Why is the numerator squared? First, squaring makes all deviations positive, so negative and positive deviations cannot cancel each other out. Second, it allows us to use the well-described Chi-squared distribution!
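To make the formula concrete, here is a minimal sketch translating it directly into R (the function name and the toy counts are just for illustration):
# a direct translation of the Chi-squared formula
chisq.stat.fun <- function(obs,exp) sum((obs-exp)^2/exp)
chisq.stat.fun(obs=c(10,20,30),exp=c(20,20,20)) # toy example
## [1] 10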
The Chi-squared distribution, like the t distribution, is described by its degrees of freedom (‘degrees of freedom’ is the only parameter needed to describe the distribution, just as the mean and standard deviation are the parameters of the normal distribution).
If you are performing a goodness-of-fit test, the degrees of freedom for this test is one less than the number of categories.
If you are performing a test for independence of two categorical variables, the degrees of freedom is computed as \((r-1)(c-1)\), where r and c are the number of categories in each of the two variables (i.e., the number of rows and columns of the contingency table).
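Since degrees of freedom is the only parameter, it fully controls the shape of the distribution. Here is a minimal sketch plotting the Chi-squared density for a few different degrees of freedom:
curve(dchisq(x,df=2),0,25,ylab="density") # Chi-squared density with 2 df
curve(dchisq(x,df=5),add=TRUE,col="blue") # 5 df
curve(dchisq(x,df=11),add=TRUE,col="red") # 11 df (as in the birth-month example below)
legend("topright",lty=1,col=c("black","blue","red"),legend=c("df = 2","df = 5","df = 11"),bty="n")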
This test asks the question “do the observations sort into categories according to your (null) hypothesis?” In a Chi-squared goodness-of-fit test, the response variable of interest is categorical. Like the one-sample t-test, there is no predictor variable.
Example: Are graduate students more likely to be born some months versus other months?
What is the population? sample? parameter? statistic?
In this example, the response variable of interest is the birth month, which is best considered a categorical variable. You can run this test on non-categorical variables (e.g., continuous numeric), but you have to turn your variable into a categorical variable (e.g., by “binning”, as demonstrated in the normality example below) prior to conducting this analysis.
## Chi squared goodness-of-fit example
birthdays.bymonth <- c(40,23,33,39,28,29,45,31,22,34,44,20) # observed birthday counts by month
months <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
names(birthdays.bymonth) <- months
sample.size <- sum(birthdays.bymonth) # total sample size
k <- length(birthdays.bymonth) # number of categories (months)
exp.birthdays.bymonth <- sample.size*rep(1/k,times=k) # compute the expected number under the null hypothesis (equal probability per month)
Chisq.stat <- sum((birthdays.bymonth-exp.birthdays.bymonth)^2/exp.birthdays.bymonth) # the Chi-squared statistic
Chisq.stat
## [1] 24.14433
## View the summary statistic along with its sampling distribution under the null hypothesis
curve(dchisq(x,k-1),0,75) # null sampling distribution: Chi-squared with k-1 df
abline(v=Chisq.stat,col="green",lwd=3) # observed test statistic
p <- 1-pchisq(Chisq.stat,k-1) # upper-tail probability = p-value
p
## [1] 0.01213825
### use R's built in chi squared function
chisq.test(birthdays.bymonth) # should get the same p value!
##
## Chi-squared test for given probabilities
##
## data: birthdays.bymonth
## X-squared = 24.144, df = 11, p-value = 0.01214
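As an optional sanity check, we can simulate the null sampling distribution directly: repeatedly draw birth-month counts under the null hypothesis (equal probability per month) with rmultinom(), recompute the statistic, and compare the histogram to the Chi-squared distribution with k-1 degrees of freedom. This is just a sketch for building intuition:
# simulate the null sampling distribution of the Chi-squared statistic
set.seed(1) # for reproducibility
null.stats <- replicate(5000,{
  sim <- rmultinom(1,size=sample.size,prob=rep(1/k,k)) # counts under the null
  sum((sim-exp.birthdays.bymonth)^2/exp.birthdays.bymonth)
})
hist(null.stats,freq=FALSE,breaks=50)
curve(dchisq(x,k-1),add=TRUE,col="green",lwd=2) # theoretical null distribution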
Here is another example; this time we use a Chi-squared goodness-of-fit test to assess the normality of a continuous variable. This example is meant to enhance understanding of the Chi-squared goodness-of-fit test; I don't suggest actually running this test to assess normality, since the Shapiro-Wilk test is a better test!
# Chi-squared test for normality
normal.data <- rnorm(250,4,1.5) # generate normal data
nonnormal.data <- runif(250,2,6) # generate non-normal data
hist(normal.data)
hist(nonnormal.data)
# First bin the data (chi squared test must be on categorical data!)
breaks <- c(-Inf,1,2,3,3.5,4,4.5,5,6,7,Inf)
normal.data.binned <- cut(normal.data,breaks,labels=breaks[-1])
nonnormal.data.binned <- cut(nonnormal.data,breaks,labels=breaks[-1])
obs_table_norm <- table(normal.data.binned)
obs_table_nonnorm <- table(nonnormal.data.binned)
# Determine expected values in each cell if the underlying distribution were normal
normal.probs <- sapply(2:length(breaks), function(t) pnorm(breaks[t],mean(normal.data),sd(normal.data))-pnorm(breaks[t-1],mean(normal.data),sd(normal.data)) )
exp_table_norm <- length(normal.data)*normal.probs # check that expected vals are greater than 5
nonnormal.probs <- sapply(2:length(breaks), function(t) pnorm(breaks[t],mean(nonnormal.data),sd(nonnormal.data))-pnorm(breaks[t-1],mean(nonnormal.data),sd(nonnormal.data)) )
exp_table_nonnorm <- length(nonnormal.data)*nonnormal.probs # check that expected vals are greater than 5
# Chi squared stat for normality test on normal data
chistat_norm <- sum((obs_table_norm-exp_table_norm)^2/exp_table_norm)
pval_norm <- 1-pchisq(chistat_norm,length(exp_table_norm)-1) # note: strictly speaking, estimating the mean and sd from the data costs 2 more degrees of freedom; we use k-1 here for simplicity
pval_norm
## [1] 0.1633256
chistat_nonnorm <- sum((obs_table_nonnorm-exp_table_nonnorm)^2/exp_table_nonnorm)
pval_nonnorm <- 1-pchisq(chistat_nonnorm,length(exp_table_nonnorm)-1)
pval_nonnorm
## [1] 1.785932e-09
# Compare with the Shapiro-Wilk test (which is a MUCH better test!)
shapiro.test(normal.data)
##
## Shapiro-Wilk normality test
##
## data: normal.data
## W = 0.9955, p-value = 0.6817
shapiro.test(nonnormal.data)
##
## Shapiro-Wilk normality test
##
## data: nonnormal.data
## W = 0.94983, p-value = 1.392e-07
In the Chi-squared test for independence, we are testing for a relationship between two categorical variables. That is, both your response variable and your predictor variable are categorical. For example, we might test whether rabbit ear type (floppy, pointy, mixed) is associated with rabbit coat color (white, brown, mixed) (e.g., are floppy-eared rabbits more likely to be white than bunnies with pointy ears?).
Here we are testing whether bunnies' ear type is related to coat color. First we summarize the number of bunnies with each possible combination of coat color and ear type (this is our contingency table). Then we determine what counts would be expected in each cell if the null hypothesis were true. Finally, we compute the Chi-squared statistic and compare it with the null sampling distribution (a Chi-squared distribution). Here is the code:
n.bunnies <- 112
# sample assuming null hypothesis is true
all.bunnies <- data.frame(
ear_type = sample(c("floppy","pointy","mixed"),n.bunnies,replace = T),
coat_color = sample(c("white","brown","mixed"),n.bunnies,replace = T)
)
head(all.bunnies)
## ear_type coat_color
## 1 floppy mixed
## 2 pointy brown
## 3 pointy brown
## 4 pointy mixed
## 5 pointy brown
## 6 floppy mixed
# make contingency table
con_table <- table(all.bunnies$ear_type,all.bunnies$coat_color)
# generate expected values: expected count in cell (i,j) = (row i proportion) x (column j total)
prop.ears <- rowSums(con_table)/n.bunnies # marginal proportions of each ear type
sum.coats <- colSums(con_table) # marginal totals for each coat color
exp_table <- sapply(1:ncol(con_table),function(t) sum.coats[t]*prop.ears)
colnames(exp_table) <- colnames(con_table)
# compute chi-squared statistic
Chisq.stat <- sum((con_table-exp_table)^2/exp_table)
# Compare chi squared statistic with null sampling distribution
curve(dchisq(x,4),0,10) # null sampling distribution: df = (3-1)*(3-1) = 4
abline(v=Chisq.stat,col="blue",lwd=2)
p.value <- 1-pchisq(Chisq.stat,4)
p.value
## [1] 0.08196423
text(7,0.15,paste0("p = ",round(p.value,3)))
# Compare with R's built in chi squared function
chisq.test(con_table)
##
## Pearson's Chi-squared test
##
## data: con_table
## X-squared = 8.2763, df = 4, p-value = 0.08196
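As a check on our hand computation, the expected counts are also stored in the object returned by chisq.test():
chisq.test(con_table)$expected # should match our exp_table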
Many statistics texts recommend the G test as an alternative to the Pearson Chi-squared test. The test is very similar to the Chi-squared test and has the same number of degrees of freedom. The G statistic is computed as:
\(G = 2\sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right)\)
As before, \(O_i\) and \(E_i\) are the observed and expected counts in category \(i\).
The sampling distribution for the G statistic is, for large sample sizes, approximately Chi-squared distributed with the same number of degrees of freedom as for the corresponding Chi-squared test.
Let’s repeat the bunny example above, this time using both the Chi-squared test and the G test.
n.bunnies <- 112
# sample assuming null hypothesis is true
all.bunnies <- data.frame(
ear_type = sample(c("floppy","pointy","mixed"),n.bunnies,replace = T),
coat_color = sample(c("white","brown","mixed"),n.bunnies,replace = T)
)
head(all.bunnies)
## ear_type coat_color
## 1 mixed mixed
## 2 mixed brown
## 3 mixed white
## 4 mixed white
## 5 mixed mixed
## 6 pointy white
# make contingency table
con_table <- table(all.bunnies$ear_type,all.bunnies$coat_color)
# generate expected values
prop.ears <- rowSums(con_table)/n.bunnies
sum.coats <- colSums(con_table)
exp_table <- sapply(1:ncol(con_table),function(t) sum.coats[t]*prop.ears)
colnames(exp_table) <- colnames(con_table)
# compute chi-squared and G statistics
Chisq.stat <- sum((con_table-exp_table)^2/exp_table)
G.stat <- 2*sum(con_table*log(con_table/exp_table)) # slightly different! (note: this fails if any observed count is zero)
# Compare chi squared and G statistics with null sampling distribution
curve(dchisq(x,4),0,10)
abline(v=Chisq.stat,col="blue",lwd=1)
abline(v=G.stat,col="red",lwd=1)
legend("topright",lty=c(1,1),lwd=c(1,1),col=c("blue","red"),legend=c("Chi-squared","G stat"),bty="n")
p.value.Chi <- 1-pchisq(Chisq.stat,4)
p.value.G <- 1-pchisq(G.stat,4)
text(7,0.12,paste0("p.Chi = ",round(p.value.Chi,3)))
text(7,0.1,paste0("p.G = ",round(p.value.G,3)))
As you can see, in practice it usually won't matter whether you use the Chi-squared or the G statistic! But some reviewers may ask you to use the G statistic because many statisticians argue that it performs slightly better than the classical Chi-squared statistic.
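If you'd rather not compute G by hand, the GTest() function from the DescTools package (assuming you have that package installed) runs the same test:
# install.packages("DescTools") # if not already installed
library(DescTools)
GTest(con_table) # G test of independence on the contingency table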
The Chi-squared test is sometimes considered a non-parametric test because it does not assume anything about the underlying distribution of the data or the residuals. The Chi-squared test simply makes the assumption that the observations are independent. Given independence of observations AND a large enough sample size, the sampling distribution of the Chi-squared statistic (and the G statistic) should follow a Chi-squared distribution with the appropriate degrees of freedom.
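A common rule of thumb for 'large enough' is that every expected count should be at least 5. With the expected table computed above, a quick check might look like this:
# rule-of-thumb check: Chi-squared approximation is questionable if any expected count < 5
any(exp_table<5)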
BUT… the Fisher exact test drops one assumption the Chi-squared test requires, namely that the sample size is ‘large enough’, and should be used in place of the Chi-squared test when you have a small sample size. Here is the bunny example again, this time with a small sample size:
n.bunnies <- 20
# sample assuming null hypothesis is true
all.bunnies <- data.frame(
ear_type = sample(c("floppy","pointy","mixed"),n.bunnies,replace = T),
coat_color = sample(c("white","brown","mixed"),n.bunnies,replace = T)
)
head(all.bunnies)
## ear_type coat_color
## 1 pointy mixed
## 2 pointy brown
## 3 floppy brown
## 4 pointy brown
## 5 mixed mixed
## 6 pointy white
# make contingency table
con_table <- table(all.bunnies$ear_type,all.bunnies$coat_color)
# generate expected values
prop.ears <- rowSums(con_table)/n.bunnies
sum.coats <- colSums(con_table)
exp_table <- sapply(1:ncol(con_table),function(t) sum.coats[t]*prop.ears)
colnames(exp_table) <- colnames(con_table)
# run fisher exact test
#?fisher.test
chisq.test(con_table)
##
## Pearson's Chi-squared test
##
## data: con_table
## X-squared = 4.3333, df = 4, p-value = 0.3628
fisher.test(con_table) # better test
##
## Fisher's Exact Test for Count Data
##
## data: con_table
## p-value = 0.4312
## alternative hypothesis: two.sided
fisher.test(con_table,simulate.p.value = T,B=5000) # use simulated p-value - usually very similar answer
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 5000 replicates)
##
## data: con_table
## p-value = 0.4323
## alternative hypothesis: two.sided
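Note that chisq.test() can also compute a Monte Carlo p-value (via its simulate.p.value argument), which likewise avoids the large-sample approximation, although the Fisher exact test remains the standard recommendation for small samples:
chisq.test(con_table,simulate.p.value=T,B=5000) # simulation-based p-value, no large-sample assumption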