Title: | High-Dimensional Aggregate Density Forecasts |
---|---|
Description: | Provides a forecasting method that efficiently maps vast numbers of (scalar-valued) signals into an aggregate density forecast in a time-varying and computationally fast manner. The method proceeds in two steps: First, it transforms a predictive signal into a density forecast and, second, it combines the resulting candidate density forecasts into an ultimate aggregate density forecast. For a detailed explanation of the method, please refer to Adaemmer et al. (2023) <doi:10.2139/ssrn.4342487>. |
Authors: | Sven Lehmann [aut, cre, cph], Philipp Adämmer [aut], Rainer Schüssler [aut] |
Maintainer: | Sven Lehmann <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.3.0 |
Built: | 2024-11-09 06:08:54 UTC |
Source: | https://github.com/lehmasve/hdflex |
The dsc()
function generates
dynamic forecast combinations from a set of
candidate density forecasts. For each period,
it selects and combines a subset of predictive densities
with the highest ranks regarding local predictive accuracy.
The identities of the candidate forecasting models and
the subset sizes used for building the aggregate predictive density
may vary over time based on the data.
If only one candidate forecast is picked,
the approach temporarily collapses to pure model selection.
dsc( y, point_forecasts, variance_forecasts, gamma_grid, psi_grid, delta, burn_in, burn_in_dsc, metric, equal_weight, incl, portfolio_params = NULL )
dsc( y, point_forecasts, variance_forecasts, gamma_grid, psi_grid, delta, burn_in, burn_in_dsc, metric, equal_weight, incl, portfolio_params = NULL )
y |
A matrix of dimension |
point_forecasts |
A matrix with |
variance_forecasts |
A matrix with |
gamma_grid |
A numeric vector containing potential discount factors between 0 and 1 to exponentially down-weight the past predictive performance of the candidate forecasting models. The values of this tuning parameter are chosen in a procedure that amounts to leave-one-out cross-validation, taking into account the time series structure of the data. For details, see Adaemmer et al. (2023). |
psi_grid |
An integer vector that controls the (possible) sizes of the subsets. The values of this tuning parameter are chosen in a procedure that amounts to leave-one-out cross-validation, taking taking into account the time series structure of the data. For details, see Adaemmer et al. (2023). |
delta |
A numeric value between 0 and 1 denoting the discount factor applied to down-weight the past predictive performance of the aggregate predictive densities. |
burn_in |
An integer value |
burn_in_dsc |
An integer value |
metric |
An integer from the set |
equal_weight |
A boolean that denotes whether equal weights are used to
combine the candidate forecasts within a subset. If |
incl |
An optional integer vector that denotes signals that
must be included in the subset combinations. For example, |
portfolio_params |
A numeric vector of length 3 containing the following elements:
This parameter is only required if |
A list containing:
A list containing:
A vector with the actual values of the target variable.
A vector with the first moments of the aggregate predictive densities of the DSC model.
A vector with the second moments of the aggregate predictive densities of the DSC model.
A list containing:
A vector containing the selected values for the tuning parameter gamma.
A vector containing the selected values for the tuning parameter psi.
A matrix containing the selected candidate forecasting models.
A list containing:
The grid of gamma values used in the model.
The grid of psi values used in the model.
The delta value used in the model.
The burn-in period used in the model.
The burn-in period used in the model.
The ranking metric used in the model.
A boolean indicating if equal weighting was used.
Additional included parameters.
Philipp Adämmer, Sven Lehmann, Rainer Schüssler
Beckmann, J., Koop, G., Korobilis, D., and Schüssler, R. A. (2020) "Exchange rate predictability and dynamic bayesian learning." Journal of Applied Econometrics, 35 (4): 410–421.
Dangl, T. and Halling, M. (2012) "Predictive regressions with time-varying coefficients." Journal of Financial Economics, 106 (1): 157–181.
Del Negro, M., Hasegawa, R. B., and Schorfheide, F. (2016) "Dynamic prediction pools: An investigation of financial frictions and forecasting performance." Journal of Econometrics, 192 (2): 391–405.
Koop, G. and Korobilis, D. (2012) "Forecasting inflation using dynamic model averaging." International Economic Review, 53 (3): 867–886.
Koop, G. and Korobilis, D. (2023) "Bayesian dynamic variable selection in high dimensions." International Economic Review.
Raftery, A. E., Kárn'y, M., and Ettler, P. (2010) "Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill." Technometrics, 52 (1): 52–66.
West, M. and Harrison, J. (1997) "Bayesian forecasting and dynamic models" Springer, 2nd edn.
https://github.com/lehmasve/hdflex#readme
# See example for tvc().
# See example for tvc().
hdflex contains the forecasting algorithm STSC developed by Adämmer, Lehmann and Schüssler (2023) doi:10.2139/ssrn.4342487. STSC is a novel time series forecasting method designed to handle very large sets of predictive signals, many of which are irrelevant or have only short-lived predictive power. Please cite the paper when using the package.
Philipp Adämmer, Sven Lehmann, Rainer Schüssler
A high-dimensional dataset created by Koop and Korobilis (2023) that integrates predictive signals from various macroeconomic and financial sources.
inflation_data
inflation_data
A matrix with 245 quarterly observations (rows) and 462 signals (columns):
Transformed target variable: Total CPI (CPIAUCSL)
First and second lag of the target variable
Lagged and transformed signals from the sources listed above
External point forecasts available from 1976-Q1 to 2021-Q4 for quarterly Total CPI (CPIAUCSL), including:
Generated using regression trees, ridge regressions, and elastic nets over expanding and rolling windows
Based on models discussed in Koop and Korobilis (2023) such as Gaussian process regressions (GPR_FAC5), Unobserved Component Stochastic Volatility (UCSV), and Variational Bayes Dynamic Variable Selection (VBDVS_X)
The dataset includes data from the following sources:
FRED-QD dataset (McCracken and Ng, 2020)
Portfolio data (Jurado et al., 2015)
Stock market predictors (Welch and Goyal, 2008)
University of Michigan consumer surveys
World Bank’s Pink Sheet commodity prices
Key macroeconomic indicators from the Federal Reserve Economic Data for Canada, Germany, Japan, and the United Kingdom
The dataset is pre-processed for one-step-ahead forecasts and includes external point forecasts. It spans from 1960-Q3 to 2021-Q4.
Jurado, K., Ludvigson, S. C., and Ng, S. (2015) "Measuring uncertainty." American Economic Review, 105 (3): 1177–1216.
Koop, G. and Korobilis, D. (2023) "Bayesian dynamic variable selection in high dimensions." International Economic Review.
McCracken, M., and S. Ng (2020) “FRED-QD: A Quarterly Database for Macroeconomic Research” National Bureau of Economic Research, Working Paper 26872.
Welch, I. and Goyal, A. (2008) "A comprehensive look at the empirical performance of equity premium prediction." The Review of Financial Studies, 21 (4): 1455–1508.
stsc()
is a time series forecasting method designed to handle
vast sets of predictive signals, many of which may be irrelevant or
short-lived. This method transforms heterogeneous scalar-valued signals into
candidate density forecasts via time-varying coefficient models (TV-C),
and subsequently, combines them into an ultimate aggregate density forecast
via dynamic subset combinations (DSC).
stsc( y, X, Ext_F, init, lambda_grid, kappa_grid, bias, gamma_grid, psi_grid, delta, burn_in, burn_in_dsc, metric, equal_weight, incl, parallel = FALSE, n_threads = parallel::detectCores() - 2, portfolio_params = NULL )
stsc( y, X, Ext_F, init, lambda_grid, kappa_grid, bias, gamma_grid, psi_grid, delta, burn_in, burn_in_dsc, metric, equal_weight, incl, parallel = FALSE, n_threads = parallel::detectCores() - 2, portfolio_params = NULL )
y |
A matrix of dimension |
X |
A matrix with |
Ext_F |
A matrix with |
init |
An integer that denotes the number of observations used to initialize the observational variance and the coefficients' variance in the TV-C models. |
lambda_grid |
A numeric vector which takes values between 0 and 1
denoting the discount factor(s) that control the dynamics of the time-varying
coefficients. Each signal in combination with each value of
lambda provides a separate candidate forecast.
Constant coefficients are nested for the case |
kappa_grid |
A numeric vector which takes values between 0 and 1
to accommodate time-varying volatility in the TV-C models.
The observational variance is estimated via
Exponentially Weighted Moving Average and kappa denotes the underlying
decay factor. Constant variance is nested for the case |
bias |
A boolean to indicate whether the TV-C-models
allow for a bias correction to F-signals.
|
gamma_grid |
A numeric vector containing potential discount factors between 0 and 1 to exponentially down-weight the past predictive performance of the candidate forecasting models. The values of this tuning parameter are chosen in a procedure that amounts to leave-one-out cross-validation, taking into account the time series structure of the data. For details, see Adaemmer et al. (2023). |
psi_grid |
An integer vector that controls the (possible) sizes of the subsets. The values of this tuning parameter are chosen in a procedure that amounts to leave-one-out cross-validation, taking taking into account the time series structure of the data. For details, see Adaemmer et al. (2023). |
delta |
A numeric value between 0 and 1 denoting the discount factor applied to down-weight the past predictive performance of the aggregate predictive densities. |
burn_in |
An integer value |
burn_in_dsc |
An integer value |
metric |
An integer from the set |
equal_weight |
A boolean that denotes whether equal weights are used to
combine the candidate forecasts within a subset. If |
incl |
An optional integer vector that denotes signals that
must be included in the subset combinations. For example, |
parallel |
A boolean indicating whether the function should be parallelized. |
n_threads |
An integer that denotes the number of cores used
for parallelization. Only necessary if |
portfolio_params |
A numeric vector of length 3 containing the following elements:
This parameter is only required if |
A list containing:
A list containing:
A vector with the actual values of the target variable.
A vector with the first moments of the aggregate predictive densities of the STSC model.
A vector with the second moments of the aggregate predictive densities of the STSC model.
A list containing:
A vector containing the selected values for the tuning parameter gamma.
A vector containing the selected values for the tuning parameter psi.
A matrix containing the selected signals.
A matrix containing the selected values for the tuning parameter lambda.
A matrix containing the selected values for the tuning parameter kappa.
A list containing:
The grid of lambda values used in the model.
The grid of kappa values used in the model.
The grid of gamma values used in the model.
The grid of psi values used in the model.
The delta value used in the model.
The init value used in the model.
The burn-in period used in the model.
The burn-in period used in the model.
The ranking metric used in the model.
A boolean indicating if equal weighting was used.
A boolean indicating if bias correction was applied.
Additional included parameters.
Philipp Adämmer, Sven Lehmann, Rainer Schüssler
Beckmann, J., Koop, G., Korobilis, D., and Schüssler, R. A. (2020) "Exchange rate predictability and dynamic bayesian learning." Journal of Applied Econometrics, 35 (4): 410–421.
Dangl, T. and Halling, M. (2012) "Predictive regressions with time-varying coefficients." Journal of Financial Economics, 106 (1): 157–181.
Del Negro, M., Hasegawa, R. B., and Schorfheide, F. (2016) "Dynamic prediction pools: An investigation of financial frictions and forecasting performance." Journal of Econometrics, 192 (2): 391–405.
Koop, G. and Korobilis, D. (2012) "Forecasting inflation using dynamic model averaging." International Economic Review, 53 (3): 867–886.
Koop, G. and Korobilis, D. (2023) "Bayesian dynamic variable selection in high dimensions." International Economic Review.
Raftery, A. E., Kárn'y, M., and Ettler, P. (2010) "Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill." Technometrics, 52 (1): 52–66.
West, M. and Harrison, J. (1997) "Bayesian forecasting and dynamic models" Springer, 2nd edn.
https://github.com/lehmasve/hdflex#readme
######################################################### ######### Forecasting quarterly U.S. inflation ########## #### Please see Koop & Korobilis (2023) for further #### #### details regarding the data & external forecasts #### ######################################################### # Load Package library("hdflex") library("ggplot2") library("cowplot") ########## Get Data ########## # Load Package Data inflation_data <- inflation_data # Set Target Variable y <- inflation_data[, 1] # Set 'P-Signals' X <- inflation_data[, 2:442] # Set 'F-Signals' Ext_F <- inflation_data[, 443:462] # Get Dates and Number of Observations tdates <- rownames(inflation_data) tlength <- length(tdates) # First complete observation (no missing values) first_complete <- which(complete.cases(inflation_data))[1] ########## Rolling AR2-Benchmark ########## # Set up matrix for predictions benchmark <- matrix(NA, nrow = tlength, ncol = 1, dimnames = list(tdates, "AR2")) # Set Window-Size (15 years of quarterly data) window_size <- 15 * 4 # Time Sequence t_seq <- seq(window_size, tlength - 1) # Loop with rolling window for (t in t_seq) { # Split Data for Training Train Data x_train <- cbind(int = 1, X[(t - window_size + 1):t, 1:2]) y_train <- y[(t - window_size + 1):t] # Split Data for Prediction x_pred <- cbind(int = 1, X[t + 1, 1:2, drop = FALSE]) # Fit AR-Model model_ar <- .lm.fit(x_train, y_train) # Predict and store in benchmark matrix benchmark[t + 1, ] <- x_pred %*% model_ar$coefficients } ########## STSC ########## # Set TV-C-Parameter init <- 5 * 4 lambda_grid <- c(0.90, 0.95, 1.00) kappa_grid <- c(0.94, 0.96, 0.98) bias <- TRUE # Set DSC-Parameter gamma_grid <- c(0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00) n_tvc <- (ncol(X) + ncol(Ext_F)) * length(lambda_grid) * length(kappa_grid) psi_grid <- c(1:100, sapply(1:4, function(i) floor(i * n_tvc / 4))) delta <- 0.95 burn_in <- first_complete + init / 2 burn_in_dsc <- 1 metric <- 5 equal_weight <- TRUE incl <- NULL parallel <- FALSE n_threads <- NULL # Apply STSC-Function results <- hdflex::stsc(y, X, Ext_F, init, lambda_grid, kappa_grid, bias, gamma_grid, psi_grid, delta, burn_in, burn_in_dsc, metric, equal_weight, incl, parallel, n_threads, NULL) ########## Evaluation ########## # Define Evaluation Period (OOS-Period) eval_period <- which(tdates >= "1991-04-01" & tdates <= "2021-12-01") # Get Evaluation Summary for STSC eval_results <- summary(obj = results, eval_period = eval_period) # Calculate (Mean-)Squared-Errors for AR2-Benchmark se_ar2 <- (y[eval_period] - benchmark[eval_period, 1])^2 mse_ar2 <- mean(se_ar2) # Create CSSED-Plot cssed <- cumsum(se_ar2 - eval_results$MSE[[2]]) plot_cssed <- ggplot( data = data.frame(eval_period, cssed), aes(x = eval_period, y = cssed) ) + geom_line() + ylim(-0.0008, 0.0008) + ggtitle("Cumulative Squared Error Differences") + xlab("Time Index") + ylab("CSSED") + geom_hline(yintercept = 0, linetype = "dashed", color = "darkgray") + theme_minimal(base_size = 15) + theme( panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(colour = "black", fill = NA), axis.ticks = element_line(colour = "black"), plot.title = element_text(hjust = 0.5) ) # Show Plots options(repr.plot.width = 15, repr.plot.height = 15) plots_list <- eval_results$Plots plots_list <- c(list(plot_cssed), plots_list) cowplot::plot_grid(plotlist = plots_list, ncol = 2, nrow = 3, align = "hv") # Relative MSE print(paste("Relative MSE:", round(eval_results$MSE[[1]] / mse_ar2, 4)))
######################################################### ######### Forecasting quarterly U.S. inflation ########## #### Please see Koop & Korobilis (2023) for further #### #### details regarding the data & external forecasts #### ######################################################### # Load Package library("hdflex") library("ggplot2") library("cowplot") ########## Get Data ########## # Load Package Data inflation_data <- inflation_data # Set Target Variable y <- inflation_data[, 1] # Set 'P-Signals' X <- inflation_data[, 2:442] # Set 'F-Signals' Ext_F <- inflation_data[, 443:462] # Get Dates and Number of Observations tdates <- rownames(inflation_data) tlength <- length(tdates) # First complete observation (no missing values) first_complete <- which(complete.cases(inflation_data))[1] ########## Rolling AR2-Benchmark ########## # Set up matrix for predictions benchmark <- matrix(NA, nrow = tlength, ncol = 1, dimnames = list(tdates, "AR2")) # Set Window-Size (15 years of quarterly data) window_size <- 15 * 4 # Time Sequence t_seq <- seq(window_size, tlength - 1) # Loop with rolling window for (t in t_seq) { # Split Data for Training Train Data x_train <- cbind(int = 1, X[(t - window_size + 1):t, 1:2]) y_train <- y[(t - window_size + 1):t] # Split Data for Prediction x_pred <- cbind(int = 1, X[t + 1, 1:2, drop = FALSE]) # Fit AR-Model model_ar <- .lm.fit(x_train, y_train) # Predict and store in benchmark matrix benchmark[t + 1, ] <- x_pred %*% model_ar$coefficients } ########## STSC ########## # Set TV-C-Parameter init <- 5 * 4 lambda_grid <- c(0.90, 0.95, 1.00) kappa_grid <- c(0.94, 0.96, 0.98) bias <- TRUE # Set DSC-Parameter gamma_grid <- c(0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00) n_tvc <- (ncol(X) + ncol(Ext_F)) * length(lambda_grid) * length(kappa_grid) psi_grid <- c(1:100, sapply(1:4, function(i) floor(i * n_tvc / 4))) delta <- 0.95 burn_in <- first_complete + init / 2 burn_in_dsc <- 1 metric <- 5 equal_weight <- TRUE incl <- NULL parallel <- FALSE n_threads <- NULL # Apply STSC-Function results <- hdflex::stsc(y, X, Ext_F, init, lambda_grid, kappa_grid, bias, gamma_grid, psi_grid, delta, burn_in, burn_in_dsc, metric, equal_weight, incl, parallel, n_threads, NULL) ########## Evaluation ########## # Define Evaluation Period (OOS-Period) eval_period <- which(tdates >= "1991-04-01" & tdates <= "2021-12-01") # Get Evaluation Summary for STSC eval_results <- summary(obj = results, eval_period = eval_period) # Calculate (Mean-)Squared-Errors for AR2-Benchmark se_ar2 <- (y[eval_period] - benchmark[eval_period, 1])^2 mse_ar2 <- mean(se_ar2) # Create CSSED-Plot cssed <- cumsum(se_ar2 - eval_results$MSE[[2]]) plot_cssed <- ggplot( data = data.frame(eval_period, cssed), aes(x = eval_period, y = cssed) ) + geom_line() + ylim(-0.0008, 0.0008) + ggtitle("Cumulative Squared Error Differences") + xlab("Time Index") + ylab("CSSED") + geom_hline(yintercept = 0, linetype = "dashed", color = "darkgray") + theme_minimal(base_size = 15) + theme( panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(colour = "black", fill = NA), axis.ticks = element_line(colour = "black"), plot.title = element_text(hjust = 0.5) ) # Show Plots options(repr.plot.width = 15, repr.plot.height = 15) plots_list <- eval_results$Plots plots_list <- c(list(plot_cssed), plots_list) cowplot::plot_grid(plotlist = plots_list, ncol = 2, nrow = 3, align = "hv") # Relative MSE print(paste("Relative MSE:", round(eval_results$MSE[[1]] / mse_ar2, 4)))
This function plots the evolution of the tuning parameters for a 'dsc' object and returns basic performance metrics.
## S3 method for class 'dsc_obj' summary(object, eval_period = NULL, ...)
## S3 method for class 'dsc_obj' summary(object, eval_period = NULL, ...)
object |
An object of type 'dsc'. |
eval_period |
(Optional) A vector of indices to specify the evaluation period. Defaults to the entire period after burn-in. |
... |
Additional arguments to be consistent with the S3 print() function. |
A list containing:
A list with the mean squared error (MSE) and squared errors (SE).
A list with the average continuous ranked probability score (ACRPS) and CRPS values.
A list with the average predictive log-likelihood (APLL) and predictive log-likelihood (PLL) values.
A list of ggplot objects for visualizing the tuning parameters and selected CFMs.
Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T. (2005): Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation. Monthly Weather Review, 133: 1098–1118.
Jordan, A., Krueger, F., and Lerch, S. (2019): "Evaluating Probabilistic Forecasts with scoringRules." Journal of Statistical Software, 90(12): 1-37.
# See example for tvc().
# See example for tvc().
This function plots the evolution of the tuning parameters for an 'stsc' object and returns basic performance metrics.
## S3 method for class 'stsc_obj' summary(object, eval_period = NULL, ...)
## S3 method for class 'stsc_obj' summary(object, eval_period = NULL, ...)
object |
An object of type 'stsc'. |
eval_period |
(Optional) A vector of indices to specify the evaluation period. Defaults to the entire period after burn-in. |
... |
Additional arguments to be consistent with the S3 print() function. |
A list containing:
A list with the mean squared error (MSE) and squared errors (SE).
A list with the average continuous ranked probability score (ACRPS) and CRPS values.
A list with the average predictive log-likelihood (APLL) and predictive log-likelihood (PLL) values.
A list of ggplot objects for visualizing the tuning parameters and selected signals.
Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T. (2005): Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation. Monthly Weather Review, 133: 1098–1118.
Jordan, A., Krueger, F., and Lerch, S. (2019): "Evaluating Probabilistic Forecasts with scoringRules." Journal of Statistical Software, 90(12): 1-37.
# See example for stsc().
# See example for stsc().
The tvc()
function generates density forecasts
based on univariate time-varying coefficient models in state-space form.
Each forecasting model includes an intercept and one predictive signal,
which can either be a 'P-signal' or 'F-signal'.
All models are estimated independently and both
estimation and forecasting are carried out recursively.
tvc(y, X, Ext_F, init, lambda_grid, kappa_grid, bias)
tvc(y, X, Ext_F, init, lambda_grid, kappa_grid, bias)
y |
A matrix of dimension |
X |
A matrix with |
Ext_F |
A matrix with |
init |
An integer that denotes the number of observations used to initialize the observational variance and the coefficients' variance in the TV-C models. |
lambda_grid |
A numeric vector which takes values between 0 and 1
denoting the discount factor(s) that control the dynamics of the time-varying
coefficients. Each signal in combination with each value of
lambda provides a separate candidate forecast.
Constant coefficients are nested for the case |
kappa_grid |
A numeric vector which takes values between 0 and 1
to accommodate time-varying volatility in the TV-C models.
The observational variance is estimated via
Exponentially Weighted Moving Average and kappa denotes the underlying
decay factor. Constant variance is nested for the case |
bias |
A boolean to indicate whether the TV-C-models
allow for a bias correction to F-signals.
|
A list containing:
A list containing:
A vector with the actual values of the target variable.
A vector with the first moments of the predictive densities.
A vector with the second moments of the predictive densities.
A list containing:
The grid of lambda values used in the model.
The grid of kappa values used in the model.
The init value used in the model.
A boolean indicating if bias correct was applied to F-signals.
Philipp Adämmer, Sven Lehmann, Rainer Schüssler
Beckmann, J., Koop, G., Korobilis, D., and Schüssler, R. A. (2020) "Exchange rate predictability and dynamic bayesian learning." Journal of Applied Econometrics, 35 (4): 410–421.
Dangl, T. and Halling, M. (2012) "Predictive regressions with time-varying coefficients." Journal of Financial Economics, 106 (1): 157–181.
Del Negro, M., Hasegawa, R. B., and Schorfheide, F. (2016) "Dynamic prediction pools: An investigation of financial frictions and forecasting performance." Journal of Econometrics, 192 (2): 391–405.
Koop, G. and Korobilis, D. (2012) "Forecasting inflation using dynamic model averaging." International Economic Review, 53 (3): 867–886.
Koop, G. and Korobilis, D. (2023) "Bayesian dynamic variable selection in high dimensions." International Economic Review.
Raftery, A. E., Kárn'y, M., and Ettler, P. (2010) "Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill." Technometrics, 52 (1): 52–66.
West, M. and Harrison, J. (1997) "Bayesian forecasting and dynamic models" Springer, 2nd edn.
https://github.com/lehmasve/hdflex#readme
######################################################### ######### Forecasting quarterly U.S. inflation ########## #### Please see Koop & Korobilis (2023) for further #### #### details regarding the data & external forecasts #### ######################################################### # Load Package library("hdflex") library("ggplot2") library("cowplot") ########## Get Data ########## # Load Package Data inflation_data <- inflation_data # Set Target Variable y <- inflation_data[, 1] # Set 'P-Signals' X <- inflation_data[, 2:442] # Set 'F-Signals' Ext_F <- inflation_data[, 443:462] # Get Dates and Number of Observations tdates <- rownames(inflation_data) tlength <- length(tdates) # First complete observation (no missing values) first_complete <- which(complete.cases(inflation_data))[1] ########## Rolling AR2-Benchmark ########## # Set up matrix for predictions benchmark <- matrix(NA, nrow = tlength, ncol = 1, dimnames = list(tdates, "AR2")) # Set Window-Size (15 years of quarterly data) window_size <- 15 * 4 # Time Sequence t_seq <- seq(window_size, tlength - 1) # Loop with rolling window for (t in t_seq) { # Split Data for Training Train Data x_train <- cbind(int = 1, X[(t - window_size + 1):t, 1:2]) y_train <- y[(t - window_size + 1):t] # Split Data for Prediction x_pred <- cbind(int = 1, X[t + 1, 1:2, drop = FALSE]) # Fit AR-Model model_ar <- .lm.fit(x_train, y_train) # Predict and store in benchmark matrix benchmark[t + 1, ] <- x_pred %*% model_ar$coefficients } ########## STSC ########## ### Part 1: TVC-Function # Set TV-C-Parameter init <- 5 * 4 lambda_grid <- c(0.90, 0.95, 1.00) kappa_grid <- c(0.94, 0.96, 0.98) bias <- TRUE # Apply TVC-Function tvc_results <- hdflex::tvc(y, X, Ext_F, init, lambda_grid, kappa_grid, bias) # Assign TVC-Results forecast_tvc <- tvc_results$Forecasts$Point_Forecasts variance_tvc <- tvc_results$Forecasts$Variance_Forecasts # First complete forecast period (no missing values) sub_period <- seq(which(complete.cases(forecast_tvc))[1], tlength) ### Part 2: DSC-Function # Set DSC-Parameter gamma_grid <- c(0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00) psi_grid <- c(1:100, sapply(1:4, function(i) floor(i * ncol(forecast_tvc) / 4))) delta <- 0.95 burn_in_tvc <- (init / 2) + 1 burn_in_dsc <- 1 metric <- 5 equal_weight <- TRUE incl <- NULL # Apply DSC-Function dsc_results <- hdflex::dsc(y[sub_period], forecast_tvc[sub_period, , drop = FALSE], variance_tvc[sub_period, , drop = FALSE], gamma_grid, psi_grid, delta, burn_in_tvc, burn_in_dsc, metric, equal_weight, incl, NULL) # Assign DSC-Results pred_stsc <- dsc_results$Forecasts$Point_Forecasts var_stsc <- dsc_results$Forecasts$Variance_Forecasts ########## Evaluation ########## # Define Evaluation Period (OOS-Period) eval_period <- which(tdates[sub_period] >= "1991-04-01" & tdates[sub_period] <= "2021-12-01") # Get Evaluation Summary for STSC eval_results <- summary(obj = dsc_results, eval_period = eval_period) # Calculate (Mean-)Squared-Errors for AR2-Benchmark oos_y <- y[sub_period][eval_period] oos_benchmark <- benchmark[sub_period[eval_period], , drop = FALSE] se_ar2 <- (oos_y - oos_benchmark)^2 mse_ar2 <- mean(se_ar2) # Create Cumulative Squared Error Differences (CSSED) Plot cssed <- cumsum(se_ar2 - eval_results$MSE[[2]]) plot_cssed <- ggplot( data.frame(eval_period, cssed), aes(x = eval_period, y = cssed) ) + geom_line() + ylim(-0.0008, 0.0008) + ggtitle("Cumulative Squared Error Differences") + xlab("Time Index") + ylab("CSSED") + geom_hline(yintercept = 0, linetype = "dashed", color = "darkgray") + theme_minimal(base_size = 15) + theme( panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(colour = "black", fill = NA), axis.ticks = element_line(colour = "black"), plot.title = element_text(hjust = 0.5) ) # Show Plots options(repr.plot.width = 15, repr.plot.height = 15) plots_list <- eval_results$Plots plots_list <- c(list(plot_cssed), plots_list) cowplot::plot_grid(plotlist = plots_list, ncol = 2, nrow = 3, align = "hv") # Relative MSE print(paste("Relative MSE:", round(eval_results$MSE[[1]] / mse_ar2, 4)))
######################################################### ######### Forecasting quarterly U.S. inflation ########## #### Please see Koop & Korobilis (2023) for further #### #### details regarding the data & external forecasts #### ######################################################### # Load Package library("hdflex") library("ggplot2") library("cowplot") ########## Get Data ########## # Load Package Data inflation_data <- inflation_data # Set Target Variable y <- inflation_data[, 1] # Set 'P-Signals' X <- inflation_data[, 2:442] # Set 'F-Signals' Ext_F <- inflation_data[, 443:462] # Get Dates and Number of Observations tdates <- rownames(inflation_data) tlength <- length(tdates) # First complete observation (no missing values) first_complete <- which(complete.cases(inflation_data))[1] ########## Rolling AR2-Benchmark ########## # Set up matrix for predictions benchmark <- matrix(NA, nrow = tlength, ncol = 1, dimnames = list(tdates, "AR2")) # Set Window-Size (15 years of quarterly data) window_size <- 15 * 4 # Time Sequence t_seq <- seq(window_size, tlength - 1) # Loop with rolling window for (t in t_seq) { # Split Data for Training Train Data x_train <- cbind(int = 1, X[(t - window_size + 1):t, 1:2]) y_train <- y[(t - window_size + 1):t] # Split Data for Prediction x_pred <- cbind(int = 1, X[t + 1, 1:2, drop = FALSE]) # Fit AR-Model model_ar <- .lm.fit(x_train, y_train) # Predict and store in benchmark matrix benchmark[t + 1, ] <- x_pred %*% model_ar$coefficients } ########## STSC ########## ### Part 1: TVC-Function # Set TV-C-Parameter init <- 5 * 4 lambda_grid <- c(0.90, 0.95, 1.00) kappa_grid <- c(0.94, 0.96, 0.98) bias <- TRUE # Apply TVC-Function tvc_results <- hdflex::tvc(y, X, Ext_F, init, lambda_grid, kappa_grid, bias) # Assign TVC-Results forecast_tvc <- tvc_results$Forecasts$Point_Forecasts variance_tvc <- tvc_results$Forecasts$Variance_Forecasts # First complete forecast period (no missing values) sub_period <- seq(which(complete.cases(forecast_tvc))[1], tlength) ### Part 2: DSC-Function # Set DSC-Parameter gamma_grid <- c(0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00) psi_grid <- c(1:100, sapply(1:4, function(i) floor(i * ncol(forecast_tvc) / 4))) delta <- 0.95 burn_in_tvc <- (init / 2) + 1 burn_in_dsc <- 1 metric <- 5 equal_weight <- TRUE incl <- NULL # Apply DSC-Function dsc_results <- hdflex::dsc(y[sub_period], forecast_tvc[sub_period, , drop = FALSE], variance_tvc[sub_period, , drop = FALSE], gamma_grid, psi_grid, delta, burn_in_tvc, burn_in_dsc, metric, equal_weight, incl, NULL) # Assign DSC-Results pred_stsc <- dsc_results$Forecasts$Point_Forecasts var_stsc <- dsc_results$Forecasts$Variance_Forecasts ########## Evaluation ########## # Define Evaluation Period (OOS-Period) eval_period <- which(tdates[sub_period] >= "1991-04-01" & tdates[sub_period] <= "2021-12-01") # Get Evaluation Summary for STSC eval_results <- summary(obj = dsc_results, eval_period = eval_period) # Calculate (Mean-)Squared-Errors for AR2-Benchmark oos_y <- y[sub_period][eval_period] oos_benchmark <- benchmark[sub_period[eval_period], , drop = FALSE] se_ar2 <- (oos_y - oos_benchmark)^2 mse_ar2 <- mean(se_ar2) # Create Cumulative Squared Error Differences (CSSED) Plot cssed <- cumsum(se_ar2 - eval_results$MSE[[2]]) plot_cssed <- ggplot( data.frame(eval_period, cssed), aes(x = eval_period, y = cssed) ) + geom_line() + ylim(-0.0008, 0.0008) + ggtitle("Cumulative Squared Error Differences") + xlab("Time Index") + ylab("CSSED") + geom_hline(yintercept = 0, linetype = "dashed", color = "darkgray") + theme_minimal(base_size = 15) + theme( panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_rect(colour = "black", fill = NA), axis.ticks = element_line(colour = "black"), plot.title = element_text(hjust = 0.5) ) # Show Plots options(repr.plot.width = 15, repr.plot.height = 15) plots_list <- eval_results$Plots plots_list <- c(list(plot_cssed), plots_list) cowplot::plot_grid(plotlist = plots_list, ncol = 2, nrow = 3, align = "hv") # Relative MSE print(paste("Relative MSE:", round(eval_results$MSE[[1]] / mse_ar2, 4)))