dfms: Efficient estimation of dynamic factor models (DFMs) using the expectation maximization (EM) algorithm or two-step (2S) estimation.
xts:
Similar to the zoo
package, it includes important functions
for working with time series.
httr: To download data directly into R from webpages.
# Important packages described above
install.packages("dfms", repos = "https://cran.rstudio.com/",
dependencies = TRUE)
install.packages("xts", repos = "https://cran.rstudio.com/",
dependencies = TRUE)
install.packages("httr", repos = "https://cran.rstudio.com/",
dependencies = TRUE)
The next step is to make sure that you can access the routines in
this package by making use of the library
command, which
would need to be run regardless of the machine that you are using.
# Important packages described above
library(dfms)
library(xts)
library(httr)
library(dplyr)
In this computer tutorial we directly source the current data vintage
from the FRED webpage. We use the GET()
function from the
httr
package. To do so, we need the url of the data file,
which we temporarily download/save and then write on an object named
usmacro
.
url_fred <- "https://www.stlouisfed.org/-/media/project/frbstl/stlouisfed/research/fred-md/monthly/current.csv?sc_lang=en&hash=80445D12401C59CF716410F3F7863B64" # Specify URL of data file.
GET(url_fred, write_disk(tf <- tempfile(fileext = "usdat.csv"))) # Temporarily save the csv file
usmacro <- read.table(tf, sep=",",header=T) #
# Select the following macroeconomic variables
var.slct <- c("INDPRO", "IPFPNSS", "IPFINAL", "IPCONGD", "IPDCONGD", "IPNCONGD", "IPBUSEQ","IPMAT", "IPDMAT", "IPNMAT", # 10 industrial production series
"CPIAUCSL", "CPIAPPSL", "CPITRNSL", "CPIULFSL", "PCEPI" # 5 price series
)
var.plot <- var.slct # Select the following macroeconomic variables for the plot
# var.slct <- c("RPI", "INDPRO", "PAYEMS", "S.P.500")
usmacro <- usmacro[,var.slct]
usmacro <- ts(usmacro[-1,], start = c(1959, 1), frequency = 12)
usmacro <- window(usmacro, start = c(1970, 1), end = c(2023, 12))
ts.plot(scale(usmacro[,var.plot]), lwd = 1, main = "US macro time series")
You will see the data is not stationary, so first take the log difference
usgrowth = usmacro %>%
log() %>% # take log
diff() %>% # take the difference
scale() # standardise data
ts.plot(usgrowth[,var.plot], lwd = 1, main = "Growth rates of US macro time series")
abline(h = 0)
Before estimating a model, the ICr()
function can be
applied to determine the number of factors. It computes three
information criteria proposed in Bai and NG (2002), whereby the second
criteria generally suggests the most parsimonious model
ic = ICr(usgrowth)
print(ic)
plot(ic)
Another option is to use a scree plot (see previous lab) to gauge the
number of factors by looking for a kink in the plot. A mathematical
procedure for finding the kink was suggested by Onatski (2010), but this
is currently not implemented in ICr()
.
screeplot(ic)
Based on the scree plot, I gauge that a model with 3 or 4 factors should be estimated. Next to the number of factors, the lag order of the factor-VAR of the transition equation should be estimated (the default is 1 lag). This can be done using the VARselect() function from the vars package, with PCA factor estimates reported by ICr()
vars::VARselect(ic$F_pca[, 1:4]) # Using 3 PCs to estimate the VAR lag order
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 4 2 1 4
##
## $criteria
## 1 2 3 4 5 6 7
## AIC(n) 2.571221 2.473769 2.430330 2.405649 2.416996 2.429584 2.442200
## HQ(n) 2.625546 2.571555 2.571576 2.590355 2.645162 2.701210 2.757287
## SC(n) 2.711151 2.725643 2.794148 2.881411 3.004702 3.129234 3.253794
## FPE(n) 13.081803 11.867181 11.362891 11.086184 11.213204 11.355990 11.501212
## 8 9 10
## AIC(n) 2.459526 2.467761 2.414020
## HQ(n) 2.818072 2.869767 2.859486
## SC(n) 3.383064 3.503243 3.561445
## FPE(n) 11.703611 11.802181 11.186776
The selection thus suggests we should estimate a factor model with r = 4 factors and p = 4 lags. (In Gary’s slides, the number of factors is represented by m.)
The DFM can be written as
\[\begin{aligned} y_t &= C f_t + \varepsilon_t,\\ f_t &= A_1 f_{t-1} + A_2 f_{t-2} + \cdots + A_p f_{t-p} + u_t, \quad \varepsilon_t \sim \mathcal{N}(0, W),\\ \varepsilon_t &= \rho_1 \varepsilon_{t-1} + \cdots + \rho_q \varepsilon_{t-q} +e_t, \quad e_t \sim \mathcal{N}(0, V). \end{aligned}\]Estimation can be done using the DFM()
function with
parameters r and p
model1 = DFM(usgrowth,
r = 4, # Number of factors
p = 4, # Number of lags
idio.ar1 = FALSE,# Model observation errors as AR(1) processes?
max.iter = 500
)
## Converged after 27 iterations.
summary(model1)
## Dynamic Factor Model: n = 15, T = 647, r = 4, p = 4, %NA = 0
##
## Call: DFM(X = usgrowth, r = 4, p = 4, idio.ar1 = FALSE, max.iter = 500)
##
## Summary Statistics of Factors [F]
## N Mean Median SD Min Max
## f1 647 0 0.1589 2.7767 -41.792 18.2906
## f2 647 0 -0.2256 1.8624 -11.8058 6.7822
## f3 647 0 0.0017 1.0412 -5.7385 3.2917
## f4 647 -0 0.0218 0.9574 -5.6904 4.9577
##
## Factor Transition Matrix [A]
## L1.f1 L1.f2 L1.f3 L1.f4 L2.f1 L2.f2 L2.f3 L2.f4
## f1 0.31051 0.30261 0.1415 -0.28354 -0.12621 -0.24208 -0.01003 0.03743
## f2 0.08357 0.55209 -0.1091 0.13570 0.02156 -0.03037 0.13159 0.04575
## f3 0.09928 0.02507 -0.2109 -0.05436 0.03370 -0.06323 -0.04600 -0.06917
## f4 0.02839 0.01643 0.1152 -0.18811 0.00325 -0.05274 -0.02714 -0.22540
## L3.f1 L3.f2 L3.f3 L3.f4 L4.f1 L4.f2 L4.f3 L4.f4
## f1 0.0617040 0.07189 -0.09753 0.252646 0.0316171 -0.1007523 0.10761 0.16607
## f2 0.0007484 0.05427 -0.05512 0.105254 0.0065051 0.1795951 -0.17631 -0.01718
## f3 0.0209634 -0.06858 0.03420 -0.004894 0.0360414 0.0130395 0.05148 -0.03445
## f4 0.0294371 0.03741 0.02433 0.006286 0.0002602 -0.0002518 -0.03791 -0.09391
##
## Factor Covariance Matrix [cov(F)]
## f1 f2 f3 f4
## f1 7.7101 -0.0039 0.0195 -0.0068
## f2 -0.0039 3.4686 0.0121 -0.0345
## f3 0.0195 0.0121 1.0842 0.0292
## f4 -0.0068 -0.0345 0.0292 0.9166
##
## Factor Transition Error Covariance Matrix [Q]
## u1 u2 u3 u4
## u1 6.6713 -0.3468 -0.2119 -0.1359
## u2 -0.3468 1.8341 0.0107 -0.0332
## u3 -0.2119 0.0107 0.9168 -0.0109
## u4 -0.1359 -0.0332 -0.0109 0.8298
##
## Observation Matrix [C]
## f1 f2 f3 f4
## INDPRO 0.3510 -0.0772 0.1371 0.0821
## IPFPNSS 0.3526 -0.0707 -0.1355 -0.0696
## IPFINAL 0.3449 -0.0762 -0.1836 -0.0824
## IPCONGD 0.3163 -0.0967 -0.3191 0.0639
## IPDCONGD 0.3026 -0.0721 -0.0304 -0.3473
## IPNCONGD 0.2089 -0.0722 -0.5666 0.5609
## IPBUSEQ 0.3174 -0.0493 0.0331 -0.2816
## IPMAT 0.3073 -0.0749 0.4174 0.2380
## IPDMAT 0.3093 -0.0588 0.2262 -0.1312
## IPNMAT 0.2060 -0.0308 0.3376 0.3093
## CPIAUCSL 0.0916 0.5046 -0.0366 0.0419
## CPIAPPSL 0.1141 0.1851 0.0642 -0.0105
## CPITRNSL 0.1158 0.3875 0.0214 0.0134
## CPIULFSL 0.0955 0.4936 -0.0367 0.0404
## PCEPI 0.0843 0.4775 -0.0635 0.0246
##
## Observation Error Covariance Matrix [diag(R) - Restricted]
## INDPRO IPFPNSS IPFINAL IPCONGD IPDCONGD IPNCONGD IPBUSEQ IPMAT
## 0.0002 0.0005 0.0206 0.0843 0.1614 0.0157 0.1385 0.0007
## IPDMAT IPNMAT CPIAUCSL CPIAPPSL CPITRNSL CPIULFSL PCEPI
## 0.1768 0.4484 0.0337 0.7715 0.3638 0.0667 0.1354
##
## Observation Residual Covariance Matrix [cov(resid(DFM))]
## INDPRO IPFPNSS IPFINAL IPCONGD IPDCONGD IPNCONGD IPBUSEQ
## INDPRO 0.0000 -0.0001* -0.0002* -0.0001 0.0000 0.0000 -0.0002
## IPFPNSS -0.0001* 0.0002 -0.0003* -0.0009* -0.0014* 0.0000 0.0000
## IPFINAL -0.0002* -0.0003* 0.0202 0.0184* 0.0238* 0.0002 0.0166*
## IPCONGD -0.0001 -0.0009* 0.0184* 0.0830 0.1044* -0.0022* -0.0609*
## IPDCONGD 0.0000 -0.0014* 0.0238* 0.1044* 0.1589 -0.0019* -0.0695*
## IPNCONGD 0.0000 0.0000 0.0002 -0.0022* -0.0019* 0.0008 0.0045*
## IPBUSEQ -0.0002 0.0000 0.0166* -0.0609* -0.0695* 0.0045* 0.1366
## IPMAT -0.0001* 0.0002* 0.0007* 0.0005* 0.0004 0.0000 0.0004
## IPDMAT -0.0004* 0.0006* -0.0093* 0.0103* 0.0074 0.0034* -0.0319*
## IPNMAT 0.0000 0.0013* -0.0274* -0.0257* -0.0325* -0.0037* -0.0092
## CPIAUCSL 0.0000 -0.0001 0.0006 -0.0017 -0.0018 0.0001 0.0029
## CPIAPPSL -0.0005* 0.0006 0.0084 0.0135 0.0217 0.0005 0.0061
## CPITRNSL 0.0002 -0.0002 -0.0107* 0.0210* 0.0291* 0.0007 -0.0467*
## CPIULFSL 0.0000 -0.0002 0.0006 0.0108* 0.0152* -0.0001 -0.0107*
## PCEPI -0.0001 0.0002 0.0010 -0.0026 -0.0066 -0.0004 0.0075
## IPMAT IPDMAT IPNMAT CPIAUCSL CPIAPPSL CPITRNSL CPIULFSL
## INDPRO -0.0001* -0.0004* 0.0000 0.0000 -0.0005* 0.0002 0.0000
## IPFPNSS 0.0002* 0.0006* 0.0013* -0.0001 0.0006 -0.0002 -0.0002
## IPFINAL 0.0007* -0.0093* -0.0274* 0.0006 0.0084 -0.0107* 0.0006
## IPCONGD 0.0005* 0.0103* -0.0257* -0.0017 0.0135 0.0210* 0.0108*
## IPDCONGD 0.0004 0.0074 -0.0325* -0.0018 0.0217 0.0291* 0.0152*
## IPNCONGD 0.0000 0.0034* -0.0037* 0.0001 0.0005 0.0007 -0.0001
## IPBUSEQ 0.0004 -0.0319* -0.0092 0.0029 0.0061 -0.0467* -0.0107*
## IPMAT 0.0003 0.0007* -0.0006 -0.0001 0.0012* -0.0005 -0.0001
## IPDMAT 0.0007* 0.1757 -0.0679* 0.0017 -0.0346* 0.0467* 0.0037
## IPNMAT -0.0006 -0.0679* 0.4483 -0.0046 -0.0080 -0.0049 -0.0023
## CPIAUCSL -0.0001 0.0017 -0.0046 0.0157 -0.0149* -0.0214* -0.0178*
## CPIAPPSL 0.0012* -0.0346* -0.0080 -0.0149* 0.7698 -0.1022* -0.0046
## CPITRNSL -0.0005 0.0467* -0.0049 -0.0214* -0.1022* 0.3533 0.0439*
## CPIULFSL -0.0001 0.0037 -0.0023 -0.0178* -0.0046 0.0439* 0.0495
## PCEPI 0.0003 -0.0151* 0.0193* -0.0114* 0.0408* -0.0577* -0.0327*
## PCEPI
## INDPRO -0.0001
## IPFPNSS 0.0002
## IPFINAL 0.0010
## IPCONGD -0.0026
## IPDCONGD -0.0066
## IPNCONGD -0.0004
## IPBUSEQ 0.0075
## IPMAT 0.0003
## IPDMAT -0.0151*
## IPNMAT 0.0193*
## CPIAUCSL -0.0114*
## CPIAPPSL 0.0408*
## CPITRNSL -0.0577*
## CPIULFSL -0.0327*
## PCEPI 0.1196
##
## Residual AR(1) Serial Correlation
## INDPRO IPFPNSS IPFINAL IPCONGD IPDCONGD IPNCONGD IPBUSEQ IPMAT
## -0.004101 0.012939 -0.009475 0.113489 0.194212 0.103491 0.153206 0.011628
## IPDMAT IPNMAT CPIAUCSL CPIAPPSL CPITRNSL CPIULFSL PCEPI
## 0.112309 -0.002949 -0.073317 0.090848 0.440820 0.102576 0.296435
##
## Summary of Residual AR(1) Serial Correlations
## N Mean Median SD Min Max
## 15 0.1028 0.1026 0.1325 -0.0733 0.4408
##
## Goodness of Fit: R-Squared
## INDPRO IPFPNSS IPFINAL IPCONGD IPDCONGD IPNCONGD IPBUSEQ IPMAT
## 1.0000 0.9998 0.9798 0.9170 0.8411 0.9992 0.8634 0.9997
## IPDMAT IPNMAT CPIAUCSL CPIAPPSL CPITRNSL CPIULFSL PCEPI
## 0.8243 0.5517 0.9843 0.2302 0.6467 0.9505 0.8804
##
## Summary of Individual R-Squared's
## N Mean Median SD Min Max
## 15 0.8445 0.917 0.2163 0.2302 1
plot(model1)
It also gives you the log likelihood until convergence
ll <- model1$loglik
cat("LL", ll[length(ll)])
## LL -4616.225
If I provide the output table, could you write down the equation and the estimated parameters?
Can you experiment with different values for p, q, and r, and compare their information criteria?