An Extremely Brief Introduction to Time Series Analysis

In many of the models that we use, we assume independent observations. With time series data, that assumption does not hold. A time series is a sequence of observations recorded at regular time intervals, and each observation is typically correlated with the observations that came before it. Time series are recorded at widely varying intervals (yearly, quarterly, monthly, daily, and so on), but similar methods apply to all of them.
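To make the independence point concrete, here is a minimal sketch in base R that compares the lag-1 autocorrelation of independent noise with that of a simulated series in which each value depends on the previous one. The coefficient 0.8, the series length, and the helper name lag1 are arbitrary choices for illustration, not values from the temperature data.

```r
set.seed(42)
n <- 500

# Independent observations: white noise
independent <- rnorm(n)

# Dependent observations: an AR(1) process, where each value carries over
# 0.8 of the previous value plus fresh noise (0.8 is an arbitrary choice)
dependent <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))

# Hypothetical helper: correlation between each value and the one before it
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

lag1_independent <- lag1(independent)
lag1_dependent <- lag1(dependent)

print(c(independent = lag1_independent, dependent = lag1_dependent))
```

The independent series should show a lag-1 autocorrelation near zero, while the simulated dependent series should show one close to the coefficient used to generate it.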

When do you use time series analysis?

Use it any time individual observations are correlated through time. Time series methods can separate the trend, seasonality, and other time-dependent components of the data from the remaining noise.
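As a rough illustration of separating these components, the sketch below applies a classical additive decomposition to nottem, a monthly temperature series that ships with R. It stands in for our data here, and decompose() is only one of several ways to do this (stl() is another).

```r
# nottem: monthly average air temperatures at Nottingham, 1920-1939
# (a built-in dataset, used here purely as a stand-in)
parts <- decompose(nottem)  # classical additive decomposition

# The result holds one series per component:
#   parts$trend    - slow, long-run movement (moving average)
#   parts$seasonal - repeating within-year pattern
#   parts$random   - what is left after removing the other two

# The estimated seasonal effect for each of the 12 months
print(round(parts$figure, 1))
```

Calling plot(parts) draws the observed series and the three components in stacked panels.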

What can you accomplish with time series analysis?

Time series analysis allows you to identify when important events occur, which may not be easily visible with other forms of analysis. It also allows you to make predictions about a system, an important tool for natural systems with management implications.

We’re going to examine data using two popular time series analyses: an ARIMA forecast and a Fourier transform.

Forecasting with ARIMA

ARIMA stands for Autoregressive Integrated Moving Average.

AR = Autoregressive parameter, denoted by p in the model. This value is the number of lagged observations (autoregressive terms) in the model; it captures how strongly each value depends on the values before it.

I = Integrated parameter, denoted by d in the model. This value is the number of times the series is differenced to make it stationary.

MA = Moving Average parameter, denoted by q in the model. This value is the number of lagged forecast errors in the model; it captures how past shocks carry over into the current value.
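To see how p, d, and q map onto a fitted model, here is a small sketch that simulates a series with known order (1, 1, 1) and fits it with arima() from base R's stats package. The coefficients 0.6 and 0.3 and the series length are arbitrary assumptions for illustration; this is not the model we will fit to the temperature data.

```r
set.seed(1)

# Simulate an ARIMA(1, 1, 1) series: p = 1, d = 1, q = 1
# (ar = 0.6 and ma = 0.3 are arbitrary illustrative values)
sim <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.3),
                 n = 1000)

# Fit a model of the same order: order = c(p, d, q)
fit <- arima(sim, order = c(1, 1, 1))

# One AR coefficient (ar1) because p = 1, one MA coefficient (ma1) because q = 1
print(coef(fit))
```

The differencing order d does not appear as a coefficient. It only controls how many times the series is differenced before the AR and MA terms are estimated.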

Essentially, an ARIMA model uses differencing to remove trends from the data, then models the autocorrelation that remains in order to make reasonable predictions about the future. Let’s use the following temperature data to make a prediction.
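The effect of differencing is easy to check on a toy example. The sketch below builds a synthetic series with a linear trend (slope 0.5, an arbitrary choice), takes first differences with diff(), and compares the fitted slope before and after. The variable names are illustrative only.

```r
set.seed(7)
t <- 1:200

# Synthetic series: a linear trend of 0.5 per time step plus noise
trended <- 0.5 * t + rnorm(200)

# First difference (d = 1): each value minus the one before it
differenced <- diff(trended)

# Slope of a straight line fitted through each series over time
slope_before <- coef(lm(trended ~ t))[["t"]]
t_diff <- t[-1]  # differencing drops the first time point
slope_after <- coef(lm(differenced ~ t_diff))[["t_diff"]]

print(c(before = slope_before, after = slope_after))
```

After one difference the trend becomes a constant offset (the mean of the differenced series sits near 0.5), so the fitted slope drops to roughly zero.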

First, let’s load some libraries. If you don’t already have them, uncomment the install.packages() line and run it first:

#install.packages(c("forecast","tseries", "lubridate", "tidyverse"))
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(forecast)
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
##   method             from    
##   fitted.fracdiff    fracdiff
##   residuals.fracdiff fracdiff
library(tseries)

Now import the data we’ll be using. This data comes from Google Earth Engine.

#import data
df <- read_csv("CENMET.csv")
## Parsed with column specification:
## cols(
##   date = col_character(),
##   prcp = col_double(),
##   tmin = col_double(),
##   tmax = col_double(),
##   elev = col_double()
## )
str(df)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 6939 obs. of  5 variables:
##  $ date: chr  "2000/1/1" "2000/1/2" "2000/1/3" "2000/1/4" ...
##  $ prcp: num  16.944 10.47 2.214 31.28 0.891 ...
##  $ tmin: num  271 270 272 272 268 ...
##  $ tmax: num  277 275 278 279 277 ...
##  $ elev: num  1126 1126 1126 1126 1126 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   date = col_character(),
##   ..   prcp = col_double(),
##   ..   tmin = col_double(),
##   ..   tmax = col_double(),
##   ..   elev = col_double()
##   .. )

The date column was read as character, so we need to convert it to a Date before we can create our time series object.

df$date <- ymd(df$date)
str(df)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 6939 obs. of  5 variables:
##  $ date: Date, format: "2000-01-01" "2000-01-02" ...
##  $ prcp: num  16.944 10.47 2.214 31.28 0.891 ...
##  $ tmin: num  271 270 272 272 268 ...
##  $ tmax: num  277 275 278 279 277 ...
##  $ elev: num  1126 1126 1126 1126 1126 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   date = col_character(),
##   ..   prcp = col_double(),
##   ..   tmin = col_double(),
##   ..   tmax = col_double(),
##   ..   elev = col_double()
##   .. )

Looks good. Now, we can glance at the data.

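# Plot daily maximum temperature over the full record.
# Note: the first argument of scale_x_date() is the axis title, not the break
# spacing, so break spacing and label format are set explicitly here.
ggplot(df, aes(date, tmax)) +
  geom_line() +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  ylab("Tmax") +
  xlab("") +
  theme_bw()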
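Because a time series assumes regularly spaced observations, it can be worth checking a daily record for gaps before building a time series object from it. The sketch below does this in base R on a small, made-up data frame; the toy values and the missing day are invented for illustration and say nothing about CENMET.csv itself.

```r
# Hypothetical daily records with one missing day (2000-01-04)
toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-05")),
  tmax = c(277, 275, 278, 277)
)

# The complete daily sequence the record should contain
expected <- seq(min(toy$date), max(toy$date), by = "day")

# Dates in the expected sequence that are absent from the data
missing_dates <- expected[!expected %in% toy$date]

# Spacing between consecutive observations, in days;
# a regular daily record has a spacing of 1 everywhere
gaps <- as.numeric(diff(toy$date))

print(missing_dates)
print(gaps)
```

The same two checks can be run on df$date once it has been converted to a Date.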