SGPE ECNM11049

Outline of today’s lecturer-led lab

Principal components analysis (PCA)
Digression: Factor augmented VARs (FA-VARs) based on PCA

Packages we will use in this lab

zoo: For working with time series data. The package includes important functions such as computing rolling correlations, rolling standard deviations and aggregation (e.g., converting data from daily to monthly frequency).
vars: Includes methods and tools specifically for VAR analysis.
factoextra: Easy-to-use functions to extract and visualise the output of PCA.
corrplot: Provides a visual exploratory tool on correlation matrix.

# Important packages described above
library(zoo)
library(vars)
library(factoextra)
library(corrplot)

options(scipen = 9) # Avoid scientific notation

Part 0: Plot time series over time

In this lab, we will use a monthly dataset for the US ranging from January 1980 to December 2019. The data includes year-on-year (y-o-y) consumer price inflation (CPIAUCSL), the Federal Funds rate (FEDFUNDS), and y-o-y growth rates for seven different industrial production measures: INDPRO, IPMANSICS, IPCONGD, IPMINE, IPDCONGD, IPBUSEQ, IPMAT. The time series were directly transformed and downloaded as a .csv file from FRED.

# Allocate the macroeconomic variables to the object "usmacro"
if("usmonthly.csv" %in% list.files()){
  usmacro <- read.table("usmonthly.csv", sep=",",header=T)
}else{ # file.choose() allows to choose the file interactively
  usmacro <- read.table(file.choose(), sep=",",header=T)
}
# Select the following macroeconomic variables 
var.slct <- c("INDPRO", "IPMANSICS", "IPCONGD", "IPMINE", "IPDCONGD",  "IPBUSEQ", "IPMAT", 
              "CPIAUCSL", "FEDFUNDS")
# Specify "usmacro" as time series object
usmacro <- ts(usmacro[,var.slct], end = c(2019, 12), frequency = 12)
colnames(usmacro) <- c("IP1", "IP2", "IP3", "IP4", "IP5",  "IP6", "IP7",  
                       "CPI", "FFR") # Use shorter labels for variables

We may wish to subset our time series.

# Subset time series and define a new end date
usmacro <- window(usmacro,           # Select the ts object
                  end = c(2006,12)   # Define new end date of time series
)

Part I: Some descriptives

In the first two parts, we focus solely on the seven different industrial production measures. Let’s plot them over time with ts.plot().

# Time series plot of industrial production measures 
IP.series <- usmacro[,1:7]

ts.plot(IP.series, 
        main = "Different Industrial Production Measures",  # Title of plot
        ylab = "Y-o-y growth rate",                         # Label of y-axis
        xlab = "Month",                                     # Label of x-axis
        col  = c(gray(seq(0.7,0.1,length.out = 7))),        # Define colours
        lty  = "solid",                                     # Define line types
        lwd  = 2                                            # Define line width
)
abline(h = 0) # Horizontal line at zero

Among the seven different industrial production growth rates, we can eyeball a substantial degree of comovement. It seems that, particularly around recessions, the comovement is exceptionally strong. Let’s take a closer look at the actual correlations. The corrplot() function provides a nice visualisation of these correlations.

# Obtain raw correlation coefficients 
cor <- cor(IP.series)

# Nice correlation plot for the industrial production measures 
corrplot::corrplot(cor)

How can we interpret this correlation plot?

Part II: Principal Component Analysis

We can use the function prcomp() to run principal components analysis (PCA), extract the principal components (which can be used as estimates of the factors), loadings matrix, and other series-specific quantities.

# Obtain principal components
PCA <- prcomp(IP.series, scale = TRUE)
# Extract the principal component and specify them to a time series
IP.PCs <- ts(PCA$x, start = start(IP.series), frequency = 12)
# Extract the loadings
IP.load <- PCA$rotation
# Extract the results for variables 
PCA.var <- get_pca_var(PCA)
# Obtain summary for each component
summary(PCA)

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.2726 1.1351 0.5736 0.35667 0.27267 0.11652 0.05306
## Proportion of Variance 0.7378 0.1840 0.0470 0.01817 0.01062 0.00194 0.00040
## Cumulative Proportion  0.7378 0.9219 0.9689 0.98704 0.99766 0.99960 1.00000

How can we interpret the output we get from summary(PCA)?

Standard deviation captures the idea that the variances of the PCs are as large as possible subject to a constraint, Proportion of Variance refers to the fraction of total variability in the data explained by each factor/PC, and Cumulative Proportion refers to the total variability in the data explained by the factors/PCs considered.

We can visualise the variation explained by each principal component using a scree plot. A scree plot is an informal way of selecting the number of factors by plotting the variation explained by each component (from largest to smallest) against the number of components. To do this, we use the function fviz_eig() from the factoextra package.

fviz_eig(PCA, addlabels = TRUE, ylim = c(0, 100))

Based on the scree plot, how many factors (principal components) would you select?

We can also plot the first principal component alongside all the industrial production series.

# Time series plot together with first principal component
ts.plot(ts.intersect(IP.series, IP.PCs[,1]),
        main = "Different Industrial Production Measures",    # Title of plot
        ylab = "Y-o-y growth rate",                           # Label of y-axis
        xlab = "Month",                                       # Label of x-axis
        col  =  c(gray(seq(0.7, 0.1,length.out = 7)), "red"), # Define colours
        lty  = "solid",                                       # Define line types
        lwd  = c(rep(1.5, 7), 2)                              # Define line width
)
abline(h = 0) # Horizontal line at zero

In addition, we can also use the corrplot() function to look at the size/sign of the loadings. We only need to account for the fact that, unlike correlations, the loadings are not bounded between 0 and 1, and are on a different scale (the real line). This can be done via the option is.corr = FALSE.

corrplot::corrplot(IP.load, is.corr = FALSE)

We observe that the first factor loads equally on all the industrial production series, except for IP4. We can also visualise how much series-specific variation is explained by each principal component (which tells a similar story). While the first factor (principal component) strongly loads on all series except IP4, the second factor (principal component) explains a large portion of the variation in IP4.

corrplot::corrplot(PCA.var$cos2, is.corr=FALSE)

Part III - Digression: Factor augmented VARs (FA-VARs) based on PCA

In the following, for simplicity, we assume a single factor. We therefore extract the first principal component and augment the other macroeconomic variables with it.

# Extract the first principal component 
IP.PC1  <- IP.PCs[,1]

# Augment other macro variables with this first principal component
favar.data <- ts.intersect(IP.PC1, usmacro[,c("CPI", "FFR")])
colnames(favar.data) <- c("IP.PC1", "CPI", "FFR")

# Ordering of variables
var.order <- c("IP.PC1",   # real activity factor: slow-moving  (1st variable)
               "CPI",      # price index: slow-moving           (2nd variable)
               "FFR"       # policy rate: fast-moving           (3rd variable)
          )

favar.data   <- favar.data[,var.order]  # Reorder columns in dataset

Estimation and impulse response analysis for FA-VARs is no different than from conventional VARs.

VARs are the generalisation of univariate autoregressive (AR) models to the case in which there is more than one endogenous variable. A VAR is simply a collection of autoregressive distributed lag (ARDL) models. In a VAR model, we regress several time series variables on the lags of these variables.

In R, there are many methods and tools available for VAR analysis. One of these useful packages is the vars package. We can now use to the command VAR() to run a VAR model. Here we use twelve lags and include an intercept for our FA-VAR.

# Estimate a FA-VAR(12) model, including an intercept 
fa.var <- vars::VAR(favar.data,     # Available data (3 endogenous variables)
                    p = 12,         # Consider 12 lags (monthly data)
                    type = "const") # Include an intercept

summary(fa.var)

How can we interpret this output? How would we interpret the coefficients?

What can we learn from this VAR model about each equation and time series?

Model selection

For model selection, we can compute the Akaike Information Criterion (AIC) which you have seen previously. With monthly time series these days, it is almost a convention to work with p = 12 lags, accounting for the dynamics of a full year. However, we could also use any information criterion like the AIC to select the lag length.

Impulse response analysis with FA-VARs

Analysing the estimation output above can be a tedious task. We will therefore use impulse response functions (IRFs) to analyse the dynamic response over time of endogenous variables to a monetary policy shock, assuming that all other structural shocks are held at zero. IRFs measure how much future realisations are expected to differ in response to an on-off shock at time t with respect to the response when no such shock is imposed. IRFs of VARs are commonly used to quantify the dynamic response of certain endogenous variables after isolating a structural economic shock (keeping everything else constant). Intuitively, it is the marginal response to a specific shock.

In the following analysis, let’s focus on Cholesky identification to pin down a monetary policy shock. The interest rate, which is associated with a monetary policy shock, is ordered last. This specific ordering of variables implies that a monetary policy shock has no contemporaneous impact on the industrial production factor \(f_t\) (IP.PC1) and consumer price inflation (CPI). We will impose a positive one standard deviation shock and consider a maximum horizon of 60 months (i.e., 5 years).

# Obtain impulse responses
nhor <- 60 # Maximum horizon of impulse responses
irf.favar <- irf(fa.var,                # VAR object
                 impulse = "FFR",       # Impulse variable (in this case a "monetary policy shock")
                 response =  var.order, # Response variables
                 n.ahead = nhor,  
                 ci = 0.67,             # Confidence interval (67% corresponds to +/- one standard deviation) 
                 runs = 1000)           # Number of bootstrapping runs to obtain confidence intervals

To plot impulse responses, we could simply use plot(), but these default plots typically do not look nice.

# Plot impulse responses
irf.mp   <- data.frame(irf.favar$irf$FFR, irf.favar$Lower$FFR, irf.favar$Upper$FFR)
colnames(irf.mp) <- c(paste0(var.order, ".m"), 
                      paste0(var.order, ".l"),
                      paste0(var.order, ".u"))

par(mfrow=c(1,3)) # Divides the plotting window into three separate panels (3 columns)

# 1.) IRF of main IP factor to MP shock
plot(x = 0:nhor,
     y = irf.mp$IP.PC1.m, type = "l", main = "Main IP factor to MP shock", 
     xlab = '', ylab = '',
     ylim = c(min(irf.mp$IP.PC1.l), max(irf.mp$IP.PC1.u))*1.1)
polygon(x = c(0:nhor, rev(0:nhor)), 
        y = c(irf.mp$IP.PC1.u, rev(irf.mp$IP.PC1.l)),
        col = 'lightgray',
        lty = 0)
lines(x = 0:nhor, y = irf.mp$IP.PC1.m, lty=1, col = "black", lwd = 2) # Solid line for median response
lines(x = 0:nhor, y = irf.mp$IP.PC1.l,  lty=2, col = "black", lwd = 1) # Dashed lines for bounds
lines(x = 0:nhor, y = irf.mp$IP.PC1.u,  lty=2, col = "black", lwd = 1)
abline(h=0) # Horizontal zero line

# 2.) IRF of inflation to MP shock
plot(x = 0:nhor,
     y = irf.mp$CPI.m, type = "l", main = "Inflation to MP shock", 
     xlab = '', ylab = '',
     ylim = c(min(irf.mp$CPI.l), max(irf.mp$CPI.u))*1.1)
polygon(x = c(0:nhor, rev(0:nhor)), 
        y = c(irf.mp$CPI.u, rev(irf.mp$CPI.l)),
        col = 'lightgray',
        lty = 0)
lines(x = 0:nhor, y = irf.mp$CPI.m, lty=1, col = "black", lwd = 2) # Solid line for median response
lines(x = 0:nhor, y = irf.mp$CPI.l,  lty=2, col = "black", lwd = 1) # Dashed lines for bounds
lines(x = 0:nhor, y = irf.mp$CPI.u,  lty=2, col = "black", lwd = 1)
abline(h=0) # Horizontal zero line

# 3.) IRF of FFR to MP shock
plot(x = 0:nhor,
     y = irf.mp$FFR.m, type = "l", main = "FFR to MP shock", 
     xlab = '', ylab = '',
     ylim = c(min(irf.mp$FFR.l), max(irf.mp$FFR.u))*1.1)
polygon(x = c(0:nhor, rev(0:nhor)), 
        y = c(irf.mp$FFR.u, rev(irf.mp$FFR.l)),
        col = 'lightgray',
        lty = 0)
lines(x = 0:nhor, y = irf.mp$FFR.m, lty=1, col = "black", lwd = 2) # Solid line for median response
lines(x = 0:nhor, y = irf.mp$FFR.l,  lty=2, col = "black", lwd = 1) # Dashed lines for bounds
lines(x = 0:nhor, y = irf.mp$FFR.u,  lty=2, col = "black", lwd = 1)
abline(h=0) # Horizontal zero line

How can we interpret the impulse responses from above?

The FA-VAR is a powerful tool that also allows for obtaining series-specific impulse responses for each industrial production measure. We use the impulse response of the industrial production factor and exploit the relationship between each series and this factor. In other words, this amounts to scaling the single common factor by series-specific factor loadings.

# Impulse responses to each component (point estimate only)
# Duplicate the response from the common component for each series
irf.IPs <- matrix(rep(irf.mp$IP.PC1.m, each = ncol(IP.series)), ncol(IP.series), nhor+1)
rownames(irf.IPs) <- colnames(IP.series)

# Scale them with the associated loading of the 1st component
irf.IPs <- t(irf.IPs*IP.load[,1])
# Define them as a time series
irf.ip <- ts(irf.IPs, start = 0)

# Plot impulse responses of each IP series
par(mfrow = c(1,1))
ts.plot(irf.IPs,
        main = "IP series-specific IRFs to an MP shock",   # Title of plot
        ylab = "", xlab = "",                                      
        col  =  c(gray(seq(0.7, 0.1,length.out = 7))), # Define colours
        lty  = "solid",                                # Define line types
        lwd  = 1.5                                     # Define line width
)
abline(h = 0)

SGPE ECNM11049 — ATSE

Static factor models

Ping Wu (ping.wu@strath.ac.uk)