Joshua Angrist, V. Chernozhukov, Iván Fernández-Val (2004)
Quantile Regression Under Misspecification, with an Application to the U.S. Wage Structure. Econometrics eJournal
Iván Fernández-Val, J. Angrist, V. Chernozhukov (2004)
Quantile Regression under Misspecification
Tobias Fissler, J. Ziegel (2015)
Higher order elicitability and Osband’s principle. Annals of Statistics, 44
P. Hansen, Asger Lunde, James Nason (2010)
The Model Confidence Set. Econometrics eJournal
Jeremy Berkowitz (2001)
Testing Density Forecasts, With Applications to Risk Management. Journal of Business & Economic Statistics, 19
W. Wong (2008)
Backtesting Trading Risk of Commercial Banks Using Expected Shortfall. Banking & Insurance eJournal
W. Hendricks, R. Koenker (1990)
Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association, 87
David Ardia, Kris Boudt, Leopoldo Catania (2016)
Generalized Autoregressive Score Models in R: The GAS Package. ERN: Forecasting Techniques (Topic)
R. Engle, Jeffrey Russell (1998)
Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica, 66
Nick Costanzino, Michael Curran (2015)
A Simple Traffic Light Approach to Backtesting Expected Shortfall. ERN: Value-at-Risk (Topic)
A. McNeil, R. Frey (2000)
Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. Journal of Empirical Finance, 7
A. Weiss (1991)
Estimating Nonlinear Dynamic Models Using Least Absolute Error Estimation. Econometric Theory, 7
K. Holden, D. Peel (1990)
On Testing for Unbiasedness and Efficiency of Forecasts. The Manchester School, 58
(1996)
Overview of the Amendment to the Capital Accord to Incorporate Market Risks
J. Mincer (1970)
Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance
S. Koopman, A. Lucas, Marcel Scharth (2012)
Predicting Time-Varying Parameters with Parameter-Driven and Observation-Driven Models. Review of Economics and Statistics, 98
Jeroen Kerkhof, B. Melenberg (2002)
Backtesting for Risk-Based Regulatory Capital. Risk Management
J. Ziegel (2013)
Coherence and Elicitability. Mathematical Finance, 26
(2014)
‘Fundamental Review of the Trading Book: A Revised Market Risk Framework’ Second Consultative Document by the Basel Committee on Banking Supervision IMPACT ANALYSIS
Tobias Fissler, J. Ziegel, T. Gneiting (2015)
Expected Shortfall is jointly elicitable with Value at Risk - Implications for backtesting. arXiv: Risk Management
B. Efron, R. Tibshirani (1994)
An Introduction to the Bootstrap
P. Embrechts, Haiyan Liu, Ruodu Wang (2017)
Quantile-Based Risk Sharing. Risk Management eJournal
M. Righi, Paulo Ceretta (2013)
Individual and Flexible Expected Shortfall Backtesting. ERN: Simulation Methods (Topic)
(2011)
Messages from the academic literature on risk measurement for the trading book
(2016)
Pillar 3 disclosure requirements – consolidated and enhanced framework. Technical report, Basel Committee on Banking Supervision
Fabio Bellini, B. Klar, A. Müller, Emanuela Gianin (2013)
Generalized Quantiles as Risk Measures. Microeconomics: Decision-Making under Risk & Uncertainty eJournal
G. Barone-Adesi, K. Giannopoulos, L. Vosper (1999)
VaR without correlations for portfolios of derivative securities. Journal of Futures Markets, 19
S. Nadarajah, Bo Zhang, S. Chan (2014)
Estimation methods for expected shortfall. Quantitative Finance, 14
Drew Creal, S. Koopman, A. Lucas (2013)
Generalized Autoregressive Score Models with Applications. Journal of Applied Econometrics, 28
Zaichao Du, Juan Carlos Escanciano (2017)
Backtesting Expected Shortfall. Management Science
Susanne Emmer, M. Kratz, Dirk Tasche (2013)
What is the Best Risk Measure in Practice? A Comparison of Standard Measures. Econometric Modeling: Capital Markets - Risk eJournal
Zaichao Du, J. Escanciano (2015)
Backtesting Expected Shortfall: Accounting for Tail Risk. Monetary Economics eJournal
M. Righi, Paulo Ceretta (2014)
A Comparison of Expected Shortfall Estimation Models. Econometrics: Econometric Model Construction
Cox Pdf (1977)
The Theory Of Stochastic Processes. The Mathematical Gazette, 61
Daniel Nelson (1991)
Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59
Nick Costanzino, Michael Curran (2015)
Backtesting General Spectral Risk Measures with Application to Expected Shortfall. Econometrics: Econometric & Statistical Methods - Special Topics eJournal
C. Lloyd (2005)
Estimating test power adjusted for size. Journal of Statistical Computation and Simulation, 75
B. Efron (1991)
Regression Percentiles Using Asymmetric Squared Error Loss
Kemal Guler, Pin Ng, Zhijie Xiao (2017)
Mincer–Zarnowitz quantile and expectile regressions for forecast evaluations under asymmetric loss functions. Journal of Forecasting, 36
Whitney Newey, D. McFadden (1986)
Large sample estimation and hypothesis testing. Handbook of Econometrics, 4
R. Cont, Romain Deguest, Giacomo Scandolo (2008)
Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance, 10
Savas Papadopoulos, Pantelis Stavroulias, T. Sager, Etti Baranoff (2018)
A Three-State Early Warning System for the European Union. European Economics: Macroeconomics & Monetary Economics eJournal
(2013)
Mooted VAR substitute cannot be back-tested, says top quant
Timo Dimitriadis, Sebastian Bayer (2017)
A joint quantile and expected shortfall regression framework. Electronic Journal of Statistics
T. Gneiting (2009)
Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106
Stefan Weber (2006)
Distribution-Invariant Risk Measures, Information, and Dynamic Consistency. Mathematical Finance, 16
Andrew Patton, J. Ziegel, Rui Chen (2017)
Dynamic Semiparametric Models for Expected Shortfall (and Value-At-Risk). ERN: Time-Series Models (Single) (Topic)
Michael Curran (2014)
Backtesting Expected Shortfall
S. Kotz (1974)
The Theory Of Stochastic Processes I
Marilena Furno, Domenico Vistocco (2018)
Quantile Regression. Wiley Series in Probability and Statistics
J. Orgeldinger (2018)
Recent Issues in the Implementation of the New Basel Minimum Capital Requirements for Market Risk, 2
Alasdair Graham, János Pál (2014)
Backtesting value-at-risk tail losses on a dynamic portfolio. The Journal of Risk Model Validation, 8
Ivana Komunjer (2005)
Quasi-maximum likelihood estimation for conditional quantiles. Journal of Econometrics, 128
H. White (1985)
Asymptotic theory for econometricians
C. Gouriéroux, A. Monfort, A. Trognon (1984)
Pseudo Maximum Likelihood Methods: Theory. Econometrica, 52
Halbert White Jr., Tae-Hwan Kim (2002)
Estimation, Inference, and Specification Testing for Possibly Misspecified Quantile Regression. Econometrics eJournal
A. Harvey (2013)
Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series
Sander Barendse (2017)
Efficiently Weighted Estimation of Tail and Interquantile Expectations. Econometrics: Econometric & Statistical Methods - General eJournal
J. MacKinnon (2007)
Bootstrap Hypothesis Testing
L. Glosten, R. Jagannathan, D. Runkle (1993)
On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance, 48
Yasuhiro Yamai, Toshinao Yoshiba (2002)
On the Validity of Value-at-Risk: Comparative Analyses with Expected Shortfall. Monetary and Economic Studies, 20
W. Gaglianone, L. Lima, Oliver Linton, Daniel Smith (2008)
Evaluating Value-at-Risk Models via Quantile Regression. Journal of Business & Economic Statistics, 29
Robert Löser, Dominik Wied, D. Ziggel (2018)
New Backtests for Unconditional Coverage of Expected Shortfall. Journal of Risk
David Harvey (1997)
The evaluation of economic forecasts
H. White (1980)
Using Least Squares to Approximate Unknown Regression Functions. International Economic Review, 21
Philippe Artzner, F. Delbaen, J. Eber, D. Heath (1999)
Coherent Measures of Risk. Mathematical Finance, 9
D. Hinkley (2008)
Bootstrap Methods: Another Look at the Jackknife
N. Nolde, J. Ziegel (2016)
Elicitability and backtesting: Perspectives for banking regulation. The Annals of Applied Statistics, 11
S. Bayer, T. Dimitriadis (2019a)
esback: Expected Shortfall Backtesting. R package version 0.3.0
M. Kratz, Y. Lok, A. McNeil (2016)
Multinomial VAR Backtests: A Simple Implicit Approach to Backtesting Expected Shortfall. Risk Management eJournal
R. Koenker, G. Bassett (2007)
Regression Quantiles
P. Embrechts, Jón Daníelsson, C. Goodhart, C. Keating, F. Muennich, O. Renault, H. Shin (2001)
An academic response to Basel II, 130
James Taylor (2019)
Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution. Journal of Business & Economic Statistics, 37
Angrist (2006)
Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure. Econometrica, 74
Ophélie Couperier, J. Leymarie (2020)
Backtesting Expected Shortfall via Multi-Quantile Regression
Drew Creal, S. Koopman, A. Lucas (2011)
Generalized Autoregressive Score Models with Applications
T. Bollerslev (1986)
Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31
H. Bierens, H. White (1996)
Estimation, Inference and Specification Analysis. Journal of the American Statistical Association, 91
P. Huber (1967)
The behavior of maximum likelihood estimates under nonstandard conditions
This article introduces novel backtests for the risk measure Expected Shortfall (ES) following the testing idea of Mincer and Zarnowitz (1969). Estimating a regression model for the ES stand-alone is infeasible and thus, our tests are based on a joint regression model for the Value at Risk (VaR) and the ES, which allows for different test specifications. These ES backtests are the first which solely backtest the ES in the sense that they only require ES forecasts as input variables. As the tests are potentially subject to model misspecification, we provide asymptotic theory under misspecification for the underlying joint regression. We find that employing a misspecification robust covariance estimator substantially improves the tests’ performance. We compare our backtests to existing joint VaR and ES backtests and find that our tests outperform the existing alternatives throughout all considered simulations. In an empirical illustration, we apply our backtests to ES forecasts for 200 stocks of the S&P 500 index.

Key words: asymptotic theory, backtesting, expected shortfall, forecast evaluation, Mincer–Zarnowitz regression, model misspecification

JEL classification: C12, C32, C52, C53, C58, G32

* We thank the editor Andrew Patton, an anonymous associate editor, and two referees for very helpful comments. We further thank Tobias Fissler, Lyudmila Grigoryeva, Roxana Halbleib, Phillip Heiler, Ekaterina Kazak, Winfried Pohlmeier, James Taylor, and Johanna Ziegel for suggestions which inspired some results of this paper. Financial support by the Heidelberg Academy of Sciences and Humanities (HAW) within the project “Analyzing, Measuring and Forecasting Financial Risks by means of High-Frequency Data”, the Klaus Tschira Foundation, the University of Hohenheim, and by the German Research Foundation (DFG) within the research group “Robust Risk Measures in Real Time Settings” is gratefully acknowledged.
The authors acknowledge support by the state of Baden-Württemberg through bwHPC. The majority of the work on this paper was conducted while both authors were at the University of Konstanz.

© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Journal of Financial Econometrics. Downloaded from https://academic.oup.com/jfec/article/20/3/437/5912157 by DeepDyve user on 20 July 2022.

Through the transition from Value at Risk (VaR) to Expected Shortfall (ES) as the primary market risk measure in the Basel Accords (Basel Committee, 2016, 2017), there is a great demand for reliable methods for estimating, forecasting, and backtesting the ES. Formally, the ES at level $\tau \in (0, 1)$ is defined as the mean of the returns smaller than the respective $\tau$-quantile (the VaR), where $\tau$ is usually chosen to be 2.5% as stipulated by the Basel Accords. The ES is introduced into the banking regulation because it overcomes several shortcomings of the VaR, such as its lack of coherence and its inability to capture tail risks beyond the $\tau$-quantile (Artzner et al., 1999; Danielsson et al., 2001; Basel Committee, 2013). In contrast to estimation and forecasting of ES, where most of the existing models for the VaR can easily be adapted and generalized to the ES, such a generalization is not as straightforward for backtesting ES forecasts (Emmer, Kratz, and Tasche, 2015). In general, backtesting of a risk measure is the process of testing whether given forecasts for this risk measure are correctly specified, which is carried out by comparing the history of the issued risk forecasts with the corresponding realized returns.
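The verbal definition above (ES as the mean of the returns at or below the $\tau$-quantile) can be illustrated with a purely empirical calculation. This is a minimal sketch, not the estimation method of the article: the function name `empirical_var_es` and the toy return sample are hypothetical, and the quantile is taken as the $\lceil n\tau \rceil$-th smallest observation, which is one of several possible sample-quantile conventions.

```python
import math

def empirical_var_es(returns, tau=0.025):
    """Empirical tau-quantile (VaR) and mean of the returns at or below it (ES).

    Sign convention as in the article: negative returns are losses, so both
    numbers are typically negative for small tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    ordered = sorted(returns)
    # Number of observations in the lower tail (at least one).
    k = max(1, math.ceil(tau * len(ordered)))
    var = ordered[k - 1]          # k-th smallest return: empirical tau-quantile
    es = sum(ordered[:k]) / k     # average of the k smallest returns
    return var, es

# Hypothetical toy sample; tau = 0.2 keeps the tail visible with ten points.
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03, 0.04]
var, es = empirical_var_es(returns, tau=0.2)
```

By construction the ES never exceeds the VaR, which reflects that the ES looks further into the tail than the quantile does.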
The primary difficulty in directly backtesting ES is its non-elicitability and non-identifiability (Weber, 2006; Gneiting, 2011; Fissler and Ziegel, 2016; Fissler, Ziegel, and Gneiting, 2016). Consequently, there is no analog to the hit sequence, which is the natural identification function of quantiles and which lies at the heart of almost all VaR backtests.

As a consequence, most of the proposed procedures in the growing literature on backtesting ES use indirect approaches by formally backtesting some quantity which is closely related to the ES. Examples include tests based on the entire tail distribution, a linear approximation of the ES through several quantiles, or the pair consisting of the VaR and the ES. We argue that formally, these approaches are backtests for the auxiliary quantities rather than for the ES itself; see also Nolde and Ziegel (2017). This distinction is particularly important as these backtests require further input variables such as forecasts for the VaR at multiple levels, the tail distribution beyond some quantile, or even the entire distribution. The regulatory authorities, however, do not have this additional information at hand as it is not mandatorily reported by the financial institutions (Aramonte et al., 2011; Basel Committee, 2016, 2017). As a consequence, the existing, so-called ES backtests are not applicable where they are most needed.

In this article, we propose novel backtests for ES forecasts which are the first strict ES backtests in the literature in the sense that besides the realized returns, they only require ES forecasts as input variables. Our tests follow the general regression-based testing idea of Mincer and Zarnowitz (1969).
For this, we estimate a regression model that models the conditional ES at level $\tau$ as a linear function $ES_\tau(Y_t \mid \mathcal{F}_{t-1}) = \gamma_1 + \gamma_2 \hat{e}_t$, where we use financial returns $Y_t$ as the response variable and the given ES forecasts $\hat{e}_t$ as the explanatory variable including an intercept term. For correctly specified ES forecasts, the intercept and slope parameters equal zero and one, which we test for by using a Wald statistic.

Footnote 1: See Yamai and Yoshiba (2002), Kerkhof and Melenberg (2004), Carver (2013), Acerbi and Szekely (2014), Emmer, Kratz, and Tasche (2015), Ziegel (2016), Fissler, Ziegel, and Gneiting (2016), Nolde and Ziegel (2017) for the ongoing discussion on backtestability of the ES.

Footnote 2: In particular, several tests require the whole or tail distribution of the returns or equivalently the cumulative violation process (Kerkhof and Melenberg, 2004; Wong, 2008; Graham and Pál, 2014; Acerbi and Szekely, 2014; Du and Escanciano, 2017; Löser, Wied, and Ziggel, 2018; Costanzino and Curran, 2018), multiple quantiles at different levels (Emmer, Kratz, and Tasche, 2015; Costanzino and Curran, 2015; Kratz, Lok, and McNeil, 2018; Couperier and Leymarie, 2019), the VaR and the volatility (McNeil and Frey, 2000; Nolde and Ziegel, 2017; Righi and Ceretta, 2013, 2015), or the VaR (McNeil and Frey, 2000; Nolde and Ziegel, 2017) in addition to the ES forecasts. See Section S.1.2 in the Supplementary Appendix for an overview over the existing backtesting approaches.
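To make the regression-based testing idea concrete, the following sketch implements the classical Mincer and Zarnowitz (1969) test for mean forecasts, which is the template the ES regression extends. It is an illustration under simplifying assumptions and not the ESR backtest itself: it uses ordinary least squares with a homoskedastic covariance estimate, and the function name `mincer_zarnowitz_wald` and the toy data are hypothetical. With two restrictions, the Wald statistic is compared to a chi-squared distribution with two degrees of freedom, whose survival function is exactly $\exp(-w/2)$.

```python
import math

def mincer_zarnowitz_wald(realizations, forecasts):
    """Regress realizations on an intercept and the forecasts (OLS) and test
    H0: (intercept, slope) = (0, 1) with a Wald statistic.

    Assumes homoskedastic errors; returns (intercept, slope, wald, p_value).
    """
    n = len(realizations)
    if n != len(forecasts) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    x_bar = sum(forecasts) / n
    y_bar = sum(realizations) / n
    s_xx = sum((x - x_bar) ** 2 for x in forecasts)
    if s_xx == 0.0:
        raise ValueError("forecasts must vary over time")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(forecasts, realizations))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [y - intercept - slope * x for x, y in zip(forecasts, realizations)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    if sigma2 == 0.0:
        raise ValueError("perfect fit: residual variance is zero")
    # Wald statistic d' (X'X / sigma2) d with d = estimate - (0, 1).
    d1, d2 = intercept, slope - 1.0
    sum_x = sum(forecasts)
    sum_x2 = sum(x * x for x in forecasts)
    wald = (n * d1 * d1 + 2.0 * sum_x * d1 * d2 + sum_x2 * d2 * d2) / sigma2
    p_value = math.exp(-wald / 2.0)   # chi-squared(2) survival function
    return intercept, slope, wald, p_value

# Hypothetical data: deterministic forecasts with alternating "noise".
forecasts = [i / 10 for i in range(50)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(50)]
unbiased = [f + e for f, e in zip(forecasts, noise)]
biased = [0.5 + f + e for f, e in zip(forecasts, noise)]
result_unbiased = mincer_zarnowitz_wald(unbiased, forecasts)
result_biased = mincer_zarnowitz_wald(biased, forecasts)
```

In this toy example the test does not reject for the unbiased series and rejects clearly for the series shifted by 0.5. For the ES, the same hypothesis is tested, but the parameters cannot be obtained by least squares and a joint quantile and ES regression is needed.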
As the ES is not elicitable (Gneiting, 2011), we face the methodological difficulty that we cannot estimate such a regression model for the ES stand-alone, as neither loss nor identification functions are available for the ES which could be used as objective functions for M- or generalized method of moments (GMM) estimation (Dimitriadis and Bayer, 2019). Recently, Patton, Ziegel, and Chen (2019) and Dimitriadis and Bayer (2019) propose a feasible alternative by specifying an auxiliary quantile regression equation $Q_\tau(Y_t \mid \mathcal{F}_{t-1}) = \beta_1 + \beta_2 \hat{q}_t$ (with explanatory variable $\hat{q}_t$) and by jointly estimating the regression parameters $(\beta, \gamma)$ by employing a joint loss function for the quantile and the ES from Fissler and Ziegel (2016).

The specification of the quantile equation allows for different testing approaches. First, we employ auxiliary VaR forecasts $\hat{v}_t$ as the explanatory variable in the quantile equation, but only test the ES-specific parameters $\gamma$. We refer to this test as the Auxiliary ESR (ES regression) backtest. The main drawback of this test is that it requires auxiliary VaR forecasts and consequently, it is formally a joint backtest for the VaR and ES which, however, mainly focuses on the ES by only testing the ES-specific regression parameters. Second, we use the ES forecasts $\hat{e}_t$ as the explanatory variable in both the quantile and the ES equation and again only test the ES-specific parameters $\gamma$. We refer to this test as the Strict ESR backtest as it only requires ES forecasts as input variables and thus, it is the first test in the literature which solely backtests ES forecasts. This testing idea comes at the drawback of a potential model misspecification in the quantile equation if the underlying data go beyond a pure scale (volatility) model.
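A joint loss function for the pair (VaR, ES) can be sketched as follows. The example uses the so-called FZ0 loss, one member of the Fissler and Ziegel (2016) class that appears in Patton, Ziegel, and Chen (2019); whether this exact member is the one used for the ESR backtests is not stated in this section, so it should be read as an assumption. The function names and the toy sample are hypothetical. The loss is only defined for negative ES values, which matches the sign convention that losses are negative returns.

```python
import math

def fz0_loss(y, v, e, tau=0.025):
    """FZ0 joint loss for a return y, a VaR value v and an ES value e < 0:
    L = -1{y <= v} (v - y) / (tau * e) + v / e + log(-e) - 1.
    """
    if e >= 0.0:
        raise ValueError("the FZ0 loss requires a negative ES value")
    hit = 1.0 if y <= v else 0.0
    return -hit * (v - y) / (tau * e) + v / e + math.log(-e) - 1.0

def average_fz0_loss(returns, var_values, es_values, tau=0.025):
    """Sample average of the FZ0 loss over aligned series."""
    losses = [fz0_loss(y, v, e, tau) for y, v, e in zip(returns, var_values, es_values)]
    return sum(losses) / len(losses)

# Hypothetical toy sample with empirical 20% quantile -0.03 and tail mean -0.04.
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03, 0.04]
n = len(returns)
loss_at_truth = average_fz0_loss(returns, [-0.03] * n, [-0.04] * n, tau=0.2)
loss_wrong_es = average_fz0_loss(returns, [-0.03] * n, [-0.06] * n, tau=0.2)
loss_wrong_var = average_fz0_loss(returns, [0.0] * n, [-0.04] * n, tau=0.2)
```

In the toy sample, the average loss is smallest at the empirical (VaR, ES) pair and increases when either component is moved away from it. This is the property that makes joint M-estimation of the parameters $(\beta, \gamma)$ feasible even though the ES has no loss function of its own.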
Therefore, we provide asymptotic theory for the joint quantile and ES regression framework under model misspecification, which generalizes the asymptotic theory introduced in Dimitriadis and Bayer (2019) and Patton, Ziegel, and Chen (2019). The potential model misspecification results in a more complex and usually inflated asymptotic covariance matrix. We account for this in the implementation of our tests by employing a covariance estimation technique which explicitly estimates these additional covariance terms.

We further introduce an intercept variant of the Strict ESR backtest by fixing the slope parameter in the ES regression to one, and by only estimating and testing the intercept term. We refer to this backtest as the Intercept ESR backtest. This test allows for testing against both one-sided and two-sided alternatives. In contrast, the other two proposed ESR backtests only allow for testing against two-sided alternatives as it is generally unclear how underestimated and overestimated ES forecasts influence the intercept and slope parameters. Because the capital requirements that the financial institutions must keep as a reserve depend on the reported risk forecasts, the market participants have an incentive to report risk forecasts which are too risky (i.e., which understate the risk) in order to minimize the expensive capital requirements. In contrast, issuing too conservative risk forecasts results in holding costly capital reserves for the financial institutions but poses no risk to the society as a whole. Thus, the regulators only have to prevent and penalize the underestimation of the financial risks, which demonstrates the necessity of one-sided testing procedures. For example, the currently applied traffic light system (Basel Committee, 1996) is in fact a one-sided VaR backtest.
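The one-sided nature of the traffic light system can be sketched in a few lines. The sketch assumes the standard calibration of the 1996 Basel framework (250 trading days, 99% VaR, green zone up to 4 exceptions, yellow zone 5 to 9, red zone from 10), which is not spelled out in the text above; the function names and the toy series are hypothetical. Only too many violations count against the forecaster, so the p-value is the upper binomial tail.

```python
from math import comb

def count_violations(returns, var_forecasts):
    """Number of days on which the return falls below the VaR forecast."""
    return sum(1 for y, v in zip(returns, var_forecasts) if y < v)

def one_sided_binomial_pvalue(violations, n_obs=250, level=0.01):
    """P(X >= violations) for X ~ Binomial(n_obs, level): small values indicate
    that risk was underestimated; too few violations are never penalized."""
    below = sum(
        comb(n_obs, k) * level ** k * (1.0 - level) ** (n_obs - k)
        for k in range(violations)
    )
    return 1.0 - below

def traffic_light_zone(violations):
    """Zones for 250 observations of a 99% VaR (assumed Basel calibration)."""
    if violations <= 4:
        return "green"
    if violations <= 9:
        return "yellow"
    return "red"

# Hypothetical toy series: one return falls below the VaR forecast.
toy_returns = [-0.03, 0.01, -0.01, 0.02]
toy_var = [-0.02, -0.02, -0.02, -0.02]
toy_violations = count_violations(toy_returns, toy_var)
p_five = one_sided_binomial_pvalue(5)
```

Five exceptions in 250 days are the first count that leaves the green zone, and under a correct 99% VaR this or a larger count occurs with a probability of roughly 11%.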
Like the Strict ESR backtest, the Intercept ESR backtest also has the desired characteristic to only require ES forecasts as input variables and consequently is the first procedure that solely backtests the ES against a one-sided alternative. We provide implementations of the three ESR backtests proposed in this article in the R package esback (Bayer and Dimitriadis, 2019a).

Such regression-based forecast evaluation approaches are already used for testing mean forecasts (Mincer and Zarnowitz, 1969), quantile forecasts (Gaglianone et al., 2011; Guler, Ng, and Xiao, 2017), and expectile forecasts (Guler, Ng, and Xiao, 2017). In contrast to these functionals, where regression techniques are easily available (see, e.g., Koenker and Bassett, 1978; Efron, 1991), the non-elicitability of the ES makes our approach more involved but also opens up the possibility for the different testing specifications we introduce. Our multivariate generalization approach of the Mincer and Zarnowitz (1969) testing idea can be applied equivalently to other higher-order elicitable functionals (Fissler and Ziegel, 2016) such as, for example, the variance (in the presence of a non-zero mean) and the Range VaR (Cont, Deguest, and Scandolo, 2010; Embrechts, Liu, and Wang, 2018).

We evaluate the empirical properties of our ESR backtests and compare them to the existing joint VaR and ES backtests of McNeil and Frey (2000) and Nolde and Ziegel (2017) in several simulation designs. In the first setup, we implement the classical size and power analysis for backtesting risk measures, where we simulate data stemming from several realistic data generating processes (DGPs) and evaluate the empirical rejection frequencies of the backtests for forecasts stemming from the true and from some misspecified forecasting model.
In order to assess how the potential model misspecification affects the Strict and the Intercept ESR backtests, we utilize DGPs which go beyond the class of pure scale (volatility) processes. For this, we implement two different Student’s-t generalized autoregressive score (GAS) models with time-varying higher moments (Creal, Koopman, and Lucas, 2013) and furthermore use an autoregressive (AR) GARCH (generalized autoregressive conditional heteroskedasticity) model which allows for gradually increasing the degree of misspecification through the AR parameter. In the second setup, we introduce a new technique for evaluating the power of backtests for financial risk measures, where we continuously misspecify certain model parameters of the DGP to obtain a continuum of alternative models with a gradually increasing degree of misspecification. Misspecifying the different model parameters separately allows us to misspecify certain model characteristics (such as the reaction to shocks) in isolation, which permits a closer examination of the proposed backtesting procedures.

The simulations show that all three ESR backtests we propose in this article are well-sized, especially when the tests are applied using the covariance estimation method which accounts for possible model misspecification. We further find that the performance of our testing procedures is almost unaffected by the DGPs which cause model misspecification in the Strict and the Intercept ESR tests. Moreover, our tests are more powerful than the existing backtests of McNeil and Frey (2000) and Nolde and Ziegel (2017) in almost all of the considered simulation designs for testing against both one-sided and two-sided alternatives. Notably, throughout all simulation designs, the ESR backtests are able to detect the various misspecifications of the forecasts.
In contrast, the existing backtests sometimes completely fail to detect certain misspecifications, for instance when the forecaster reports risk forecasts for a misspecified probability level.

The rest of this article is organized as follows. Section 1 introduces the ESR backtests and presents asymptotic theory under model misspecification. Section 2 contains several simulation studies and Section 3 applies the backtests to ES forecasts for a large number of stocks from the S&P 500 index. Section 4 concludes. The proofs are deferred to Appendix A and a Supplementary Appendix contains further details of the proofs.

1 Theory

1.1 Setup and Notation

We consider a stochastic process

$$Z = \{Z_t : \Omega \to \mathbb{R}^{l+1},\ l \in \mathbb{N},\ t = 1, \dots, T\}, \tag{1.1}$$

defined on some complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with the filtration $\mathcal{F} = \{\mathcal{F}_t,\ t = 1, \dots, T\}$ and $\mathcal{F}_t = \sigma\{Z_s,\ s \le t\}$ for all $t = 1, \dots, T$, where $T \in \mathbb{N}$. We partition the stochastic process $Z_t = (Y_t, U_t)$, where $Y_t$ is an absolutely continuous random variable of interest and $U_t$ is an $l$-dimensional vector of explanatory variables. We denote the conditional cumulative distribution function of $Y_t$ given the past information $\mathcal{F}_{t-1}$ by $F_t(y) = \mathbb{P}(Y_t \le y \mid \mathcal{F}_{t-1})$ and the corresponding probability density function by $f_t$. Whenever they exist, the mean and the variance of $F_t$ are denoted by $E_t[\cdot]$ and $\mathrm{Var}_t(\cdot)$.

For financial applications, the variable $Y_t$ denotes the daily log returns of a financial asset (for instance, a stock or a portfolio), that is, $Y_t = \log P_t - \log P_{t-1}$, where $P_t$ denotes the price of the asset at day $t = 1, \dots, T$. This means that throughout this article, we use the sign convention that positive returns denote profits, and negative returns denote losses.
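The return definition and the sign convention can be illustrated with a short sketch. The function name `log_returns` and the price series are hypothetical and only serve to show that a price increase yields a positive return (a profit) and a price decrease a negative one (a loss).

```python
import math

def log_returns(prices):
    """Daily log returns Y_t = log P_t - log P_{t-1} from a price series."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical prices: a gain of 10% followed by a loss of 10%.
prices = [100.0, 110.0, 99.0]
daily_returns = log_returns(prices)
```

A series of $T + 1$ prices yields $T$ returns, and these returns are the response variable $Y_t$ in all regressions that follow.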
The vector U contains further variables that are used to produce forecasts for certain function- als (usually risk measures) of the random variable Y . We are interested in testing whether forecasts for a certain d-dimensional, d 2 N functional (risk measure) q ¼ qðF Þ of the conditional distribution F are correctly specified. For that, we define the most frequently used functionals for financial risk management in the following. The conditional quantile of Y given the information set F at level s 2ð0; 1Þ is defined as t t1 Q ðY jF Þ¼ F ðsÞ¼ inffy 2 R : F ðyÞ sg, which is called the VaR at level s in finan- s t t1 t cial applications. Furthermore, we define the functional ES at level s of Y given F as t t1 1 1 ES ðY jF Þ¼ F ðsÞds. If the distribution function F is continuous at its s-quantile, s t t1 t s 0 this definition can be simplified to the truncated tail mean of Y , ES ðY jF Þ¼ E ½Y jY Q ðY jF Þ: (1.2) s t t1 t t t s t t1 We denote an F -measurable one-step-ahead forecast for day t for the risk measure q t1 of the distribution F , stemming from some external forecaster or from some given forecast- ing model by q ^ ¼ q ^ ðF Þ. Following this notation, we denote forecasts for the s-VaR by t t t1 v^ and for the s-ES by e^ for some fixed level s 2ð0; 1Þ. For simplicity of notation, we drop t t the dependence on s as it is a fixed quantity. As both, the incentive of the forecaster and the underlying method used to generate the forecasts are in general unknown, these forecasts are not necessarily correctly specified. The focus of this article is to develop statistical tests for correctness of a given series of fore- casts fq ^ ; t ¼ 1; .. . ; Tg for the risk measure q relative to the realized return series 3 For recent overviews on VaR and ES forecasting approaches, see Komunjer (2004) and Nadarajah, Zhang, and Chan (2014). 
Downloaded from https://academic.oup.com/jfec/article/20/3/437/5912157 by DeepDyve user on 20 July 2022 442 Journal of Financial Econometrics fY ; t ¼ 1; .. . ; Tg. This is in the literature usually referred to as backtesting of the risk measure q without strictly defining this terminology. We provide such a definition in the following. Definition 1.1 A backtest for the series of forecasts fq ^ ; t ¼ 1; .. . ; Tg for the d-dimensional risk measure (functional) q relative to the realized return series fY ; t ¼ 1; .. . ; Tg is a function T Td f : R R !f0; 1g; (1.3) which maps the return and forecast series onto the respective test decision. The core message of this definition is that besides the realized return series, a backtest for some risk measure is only allowed to require forecasts for this risk measure as input varia- bles. This strict differentiation becomes relevant in the context of backtesting ES as, in con- trast to the existing VaR backtests, the recently proposed ES backtests require further input variables such as forecasts for the VaR, the volatility, or the entire tail distribution. The de- mand for these further quantities induces the following practical problems. First, the regula- tory authorities who rely on such backtesting methods do not necessarily receive forecasts from the financial institutions for the additional information required by these tests, which makes such backtests inapplicable for the regulatory authorities. Second, a rejection of the tests does not necessarily imply that the ES is misspecified, but that the forecasts for any of the input components are misspecified. Consequently, these tests are in fact not backtests for the ES, but rather backtests for some vector of risk measures (or the entire tail distribution). 1.2 The ESR Backtests We propose backtests for the risk measure ES that test whether a series of ES forecasts fe ; t ¼ 1; .. . 
$T\}$, stemming from some external forecaster or forecasting model, is correctly specified relative to a series of realized returns $\{Y_t;\ t = 1, \dots, T\}$. We follow the general testing idea of Mincer and Zarnowitz (1969) and regress the returns $Y_t$ on the forecasts $\hat e_t$ and an intercept term by using a regression equation designed specifically for the functional ES,

$$ Y_t = \gamma_1 + \gamma_2 \hat e_t + u_t^e, \tag{1.4} $$

where $\mathrm{ES}_\tau(u_t^e \mid \mathcal{F}_{t-1}) = 0$ almost surely. Given the structure in Equation (1.4) and since the forecasts $\hat e_t$ are generated by using the information set $\mathcal{F}_{t-1}$, this condition on the error term is equivalent to

$$ \mathrm{ES}_\tau(Y_t \mid \mathcal{F}_{t-1}) = \gamma_1 + \gamma_2 \hat e_t. \tag{1.5} $$

We then test the hypothesis

$$ H_0: (\gamma_1, \gamma_2) = (0, 1) \quad \text{against} \quad H_A: (\gamma_1, \gamma_2) \neq (0, 1). \tag{1.6} $$

Under $H_0$, the ES forecasts are correctly specified as it holds that $\hat e_t = \mathrm{ES}_\tau(Y_t \mid \mathcal{F}_{t-1})$ almost surely. In general, Equation (1.4) is an example of a linear regression equation for the ES of the form $Y_t = W_t^\top \gamma + u_t^e$, for some general vector of covariates $W_t$. As outlined in Dimitriadis and Bayer (2019) and Patton, Ziegel, and Chen (2019), stand-alone estimation of the parameters $\gamma$ by M- or GMM-estimation is not possible since there do not exist strictly consistent loss and identification functions for the functional ES (Gneiting, 2011).

Footnote 4: Given that the ES forecasts are correctly specified, that is, $\hat e_t = \mathrm{ES}_\tau(Y_t \mid \mathcal{F}_{t-1})$, the correct specification condition in Equation (1.5) is equivalent to $\gamma_1 = (1 - \gamma_2)\hat e_t$. This results in the remark of Holden and Peel (1990), who claim that the null hypothesis given in Equation (1.6) is only a sufficient, but not a necessary condition for correctly specified forecasts, as $\gamma_1 = (1 - \gamma_2)\hat e_t$ is the required necessary condition. However, this more general condition implies (for $\gamma_2 \neq 1$) that the forecasts $\hat e_t$ are constant for all $t = 1, \dots, T$, which is highly unrealistic given the dynamic nature of financial time series. Consequently, we employ the hypotheses given in Equation (1.6) for our backtesting procedure.

Based on the seminal work of Fissler and Ziegel (2016), who introduce joint loss and identification functions for the VaR and ES, Dimitriadis and Bayer (2019), Patton, Ziegel, and Chen (2019), and Barendse (2020) propose the joint regression technique,

$$ Y_t = V_t^\top \beta + u_t^q, \quad \text{and} \quad Y_t = W_t^\top \gamma + u_t^e, \tag{1.7} $$

where $V_t$ and $W_t$ are $k$-dimensional, $\mathcal{F}_{t-1}$-measurable covariate vectors, and where $Q_\tau(u_t^q \mid \mathcal{F}_{t-1}) = 0$ and $\mathrm{ES}_\tau(u_t^e \mid \mathcal{F}_{t-1}) = 0$ almost surely. Setting up this joint regression model facilitates the estimation of the joint parameters $(\beta, \gamma)$, whereas stand-alone estimation of $\gamma$ is infeasible. We use this joint setup to propose the following regression-based backtests for the ES:

The Auxiliary ESR Backtest: We choose $V_t = (1, \hat v_t)$ and $W_t = (1, \hat e_t)$, i.e. we set up the regression system

$$ Y_t = \beta_1 + \beta_2 \hat v_t + u_t^q, \quad \text{and} \quad Y_t = \gamma_1 + \gamma_2 \hat e_t + u_t^e, \tag{1.8} $$

and test

$$ H_0: (\gamma_1, \gamma_2) = (0, 1) \quad \text{against} \quad H_A: (\gamma_1, \gamma_2) \neq (0, 1), \tag{1.9} $$

using the Wald-type test statistic

$$ T_{AESR} = T \, (\hat\gamma_T - (0, 1)) \, \hat\Omega_\gamma^{-1} \, (\hat\gamma_T - (0, 1))^\top, \tag{1.10} $$

based on some (consistent) covariance estimator $\hat\Omega_\gamma$ for the covariance of the subvector $\gamma$.

The Strict ESR Backtest: We choose $V_t = W_t = (1, \hat e_t)$, i.e. we set up the regression system

$$ Y_t = \beta_1 + \beta_2 \hat e_t + u_t^q, \quad \text{and} \quad Y_t = \gamma_1 + \gamma_2 \hat e_t + u_t^e, \tag{1.11} $$

and test

$$ H_0: (\gamma_1, \gamma_2) = (0, 1) \quad \text{against} \quad H_A: (\gamma_1, \gamma_2) \neq (0, 1), \tag{1.12} $$

using the Wald-type test statistic

$$ T_{SESR} = T \, (\hat\gamma_T - (0, 1)) \, \hat\Omega_\gamma^{-1} \, (\hat\gamma_T - (0, 1))^\top, \tag{1.13} $$

based on some (consistent) covariance estimator $\hat\Omega_\gamma$ for the covariance of the subvector $\gamma$. We discuss the employed covariance estimators in Section 1.5.

Whereas setting up Mincer–Zarnowitz tests for classical elicitable functionals such as the mean, quantiles, and expectiles is straightforward (see Mincer and Zarnowitz (1969), Gaglianone et al. (2011), Guler, Ng, and Xiao (2017)), in the case of higher-order elicitable functionals such as the ES, we have several choices as illustrated above. The Auxiliary ESR backtest is based on the regression specification in Equation (1.8) and requires both VaR and ES forecasts as input variables. Thus, following Definition 1.1, this backtest is formally a joint VaR and ES backtest, however, with a strong emphasis on backtesting ES forecasts. In contrast, the Strict ESR backtest only incorporates ES forecasts and consequently is the first backtest for the ES stand-alone.

The Strict ESR test, however, comes at the cost of a potential model misspecification. Given that the financial returns $Y_t$ follow some pure scale (volatility) process, it holds that the VaR and ES forecasts are perfectly colinear, $\hat e_t = c \hat v_t$ for some $c \in \mathbb{R}$. Consequently, if $\hat v_t$ equals the true conditional VaR, the first equation in (1.11) is correctly specified for the true parameter values $(\beta_1, \beta_2) = (0, 1/c)$. Most of the financial econometrics literature (almost the entire GARCH, stochastic volatility, and Realized Volatility literature) is based on such an assumption for daily returns, which motivates the applicability of this Strict ESR backtest.
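As a rough illustration of the Wald-type statistics in Equations (1.10) and (1.13), the following Python sketch evaluates the quadratic form for a two-dimensional ES parameter and converts it into a p-value using a chi-squared distribution with two degrees of freedom. The function name `wald_esr` and all numerical inputs are hypothetical; in an application, the estimate of $\gamma$ and its covariance would come from the joint regression fit.

```python
import math


def wald_esr(gamma_hat, omega_hat, n_obs, gamma_null=(0.0, 1.0)):
    """Wald statistic T * d Omega^{-1} d' with d = gamma_hat - gamma_null.

    gamma_hat : (intercept, slope) estimate of the ES equation
    omega_hat : 2x2 covariance estimate for sqrt(T) * gamma_hat (nested tuples)
    n_obs     : sample size T
    Returns the statistic and its chi-squared(2) p-value.
    """
    d0 = gamma_hat[0] - gamma_null[0]
    d1 = gamma_hat[1] - gamma_null[1]
    (a, b), (c, d) = omega_hat
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance estimate must be positive definite")
    # Quadratic form via the closed-form inverse of a 2x2 matrix.
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    stat = n_obs * quad
    # The survival function of a chi-squared(2) variable is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical inputs, chosen only to exercise the function.
stat, p_value = wald_esr((0.1, 1.0), ((2.0, 0.0), (0.0, 1.0)), n_obs=100)
```

The closed-form p-value is specific to two degrees of freedom; a test on a single parameter would need the chi-squared distribution with one degree of freedom instead.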
However, this backtest is also applicable in the general case where the true VaR and ES forecasts are not necessarily colinear. For this, we provide asymptotic theory for M-estimation of the joint VaR and ES regression under potential model misspecification in Section 1.4.

1.3 The One-Sided Intercept ESR Backtest

The two ESR backtests introduced in the previous section only allow for testing two-sided hypotheses as specified in Equations (1.9) and (1.12), as it is generally unclear how too risky (or too conservative) forecasts influence the parameters $\gamma_1$ and $\gamma_2$. Because the capital that financial institutions have to hold as a reserve depends on the reported risk forecasts, market participants have an incentive to report too risky ES forecasts, that is, forecasts that understate the risk, in order to keep their capital requirements as low as possible. In contrast, issuing too conservative risk forecasts and facing higher capital requirements does not have to be punished by the regulatory authorities. Thus, the regulators only have to prevent and consequently penalize the underestimation of financial risks, which can be done by using one-sided backtesting procedures. For example, the traffic light system (Basel Committee, 1996), currently implemented in the Basel Accords, is in fact a one-sided backtest for the hit ratios of VaR forecasts. Hence, we also introduce a regression-based backtesting procedure for the ES that allows for testing one-sided hypotheses.

The Intercept ESR Backtest: This backtest is based on a regression setup similar to the Strict ESR backtest by regressing the forecast errors, $Y_t - \hat e_t$, only on an intercept term in the ES-specific regression equation,

$$ Y_t - \hat e_t = \beta_1 + \beta_2 \hat e_t + u_t^q, \quad \text{and} \quad Y_t - \hat e_t = \gamma_1 + u_t^e, \tag{1.14} $$

where $Q_\tau(u_t^q \mid \mathcal{F}_{t-1}) = 0$ and $\mathrm{ES}_\tau(u_t^e \mid \mathcal{F}_{t-1}) = 0$ almost surely.
By using this restricted regression equation, we can define a one-sided and a two-sided alternative,

$$ \begin{aligned} H_0^{2s}&: \gamma_1 = 0 \quad \text{against} \quad H_A^{2s}: \gamma_1 \neq 0, \quad \text{and} \\ H_0^{1s}&: \gamma_1 \geq 0 \quad \text{against} \quad H_A^{1s}: \gamma_1 < 0, \end{aligned} \tag{1.15} $$

which we test by using a t-test based on the estimated asymptotic covariance described in Section 1.5. Note that this testing procedure is equivalent to fixing the slope parameter of the ES equation in the Strict ESR test given in Equation (1.11) to one and only estimating and testing the intercept term. Therefore, we call this backtest the Intercept ESR backtest. We keep the slope parameter in the quantile regression equation, as for pure scale models where $\hat e_t = c \hat v_t$, it holds that $\beta_1 = 0$ and $\beta_2 = (1 - c)/c$ under the null hypothesis.

Footnote 5: One could interpret the higher capital requirements as a punishment for too conservative risk forecasts.

1.4 Asymptotic Theory under Model Misspecification

In this section, we consider the asymptotic properties of the M-estimator of the joint VaR and ES regression framework given in Equation (1.7) under potential model misspecification. In the following, we write $X_t = (V_t, W_t)$ for the compound vector of covariates. Following Dimitriadis and Bayer (2019) and Patton, Ziegel, and Chen (2019), the M-estimator of the regression parameters $\theta = (\beta, \gamma)$ is defined by:

$$ \hat\theta_T = \arg\min_{\theta \in \Theta} Q_T(\theta), \quad \text{where} \tag{1.16} $$

$$ Q_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \rho(Y_t, X_t, \theta) \quad \text{and} \tag{1.17} $$

$$ \rho(Y_t, X_t, \theta) = -\frac{1}{W_t^\top \gamma} \left( W_t^\top \gamma - V_t^\top \beta + \frac{(V_t^\top \beta - Y_t)\, \mathbb{1}_{\{Y_t \leq V_t^\top \beta\}}}{\tau} \right) + \log(-W_t^\top \gamma), \tag{1.18} $$

where the loss function in Equation (1.18) is a strictly consistent loss function for the pair quantile and ES (Fissler and Ziegel, 2016).
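To make the joint loss of Equation (1.18) concrete, the following Python sketch evaluates it for constant fitted values and checks numerically that the true quantile and ES of a standard normal sample attain a smaller average loss than two perturbed candidates. The helper names `fz_loss` and `average_loss`, the deterministic grid of normal quantiles, and the perturbed values are illustrative assumptions, not part of the authors' implementation.

```python
import math
from statistics import NormalDist


def fz_loss(y, v, e, tau):
    """Joint quantile/ES loss of Eq. (1.18) for one observation.

    v is the fitted quantile (V'beta) and e the fitted ES (W'gamma);
    the logarithm requires e to be strictly negative.
    """
    if e >= 0:
        raise ValueError("fitted ES must be strictly negative")
    hit = 1.0 if y <= v else 0.0
    return -(e - v + (v - y) * hit / tau) / e + math.log(-e)


def average_loss(ys, v, e, tau):
    """Sample objective Q_T for constant fitted values (v, e)."""
    return sum(fz_loss(y, v, e, tau) for y in ys) / len(ys)


# Hypothetical data: a deterministic grid of standard normal quantiles.
tau = 0.025
nd = NormalDist()
ys = [nd.inv_cdf((i + 0.5) / 2000) for i in range(2000)]

v_true = nd.inv_cdf(tau)           # true tau-quantile
e_true = -nd.pdf(v_true) / tau     # true tau-ES of a standard normal

loss_true = average_loss(ys, v_true, e_true, tau)
loss_wrong_v = average_loss(ys, -1.5, e_true, tau)   # perturbed quantile
loss_wrong_e = average_loss(ys, v_true, -3.0, tau)   # perturbed ES
```

In the regression setting, `v` and `e` would be replaced by the linear predictors for each observation, and the average would be minimized over the parameters numerically.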
Dimitriadis and Bayer (2019) and Patton, Ziegel, and Chen (2019) show consistency and asymptotic normality for the M-estimator in the case of a correctly specified parametric model, that is, under the assumption that there exists a true parameter $\theta_0 \in \Theta$ such that $Q_\tau(u_t^q \mid \mathcal{F}_{t-1}) = 0$ and $\mathrm{ES}_\tau(u_t^e \mid \mathcal{F}_{t-1}) = 0$, almost surely. In the following, we extend this theory by relaxing these assumptions, which allows for the general case of misspecified models. For this, we define the pseudo-true parameter

$$ \theta_T^* = \arg\min_{\theta \in \Theta} Q_T^0(\theta), \quad \text{where} \quad Q_T^0(\theta) = \mathbb{E}[Q_T(\theta)]. \tag{1.19} $$

For the classical case of a correctly specified model, the pseudo-true parameter coincides with the true regression parameter, $\theta_T^* = \theta_0$, and is independent of $T$. In the following, we restrict our attention to processes and models for the conditional quantile and ES which satisfy the following conditions.

Assumption 1.2

(A1) The distribution $F_t$ is absolutely continuous with density function $f_t$, which is bounded from above, that is, there exists a constant $c > 0$ s.t. $\sup_{y \in \mathbb{R}} f_t(y) \leq c$ and $\sup_{y \in \mathbb{R}} f_t'(y) \leq c$.

(A2) The parameter space $\Theta \subset \mathbb{R}^{2k}$ is compact, convex, and has nonempty interior.

(A3) We assume that the pseudo-true parameter $\theta_T^*$ defined in Equation (1.19) is in the interior of $\Theta$ and is the unique minimizer of the objective function $Q_T^0(\theta)$ and that the sequence $\nabla_\theta \mathbb{E}_t[\rho(Y_t, X_t, \theta_T^*)]$ is uncorrelated.

(A4) $V_t, W_t \in \mathcal{F}_{t-1}$ and the matrices $\mathbb{E}[V_t V_t^\top]$ and $\mathbb{E}[W_t W_t^\top]$ have full rank.

(A5) The matrix $\Lambda_T$, defined in Theorem 1.4, has strictly positive eigenvalues for all $T$ sufficiently large.

(A6) The stochastic process $\{Y_t, V_t, W_t\}$ is strong mixing of size $-r/(r-2)$ for some $r > 2$.

(A7) For all $\theta \in \Theta$, it holds that $\left| \frac{1}{W_t^\top \gamma} \right| \leq K < \infty$ for some constant $K > 0$.

(A8) It holds that $\mathbb{E}[\|V_t\|^{r+1}] < \infty$, $\mathbb{E}[\|W_t\|^{r+1}] < \infty$, $\mathbb{E}[\|V_t\|^{r+1} \|W_t\|^{r}] < \infty$, and $\mathbb{E}[\|W_t\|^{r+1} |Y_t|^{r}] < \infty$ for the $r > 2$ from condition (A6).

(A9) For any $T \in \mathbb{N}$, $\sup_{\theta \in \Theta} \sum_{t=1}^{T} \mathbb{1}_{\{Y_t = V_t^\top \beta\}} \leq K$ a.s. for some constant $K > 0$.

The conditions in Assumption 1.2 mainly resemble the regularity conditions for asymptotic normality for correctly specified models from Patton, Ziegel, and Chen (2019), and we refer to Patton, Ziegel, and Chen (2019) for a discussion of these conditions. The key condition that allows for misspecified models is the unique minimization condition of the pseudo-true parameter $\theta_T^*$ in condition (A3). The above assumptions contain the case of correctly specified models, as then condition (A3) is naturally fulfilled because the utilized loss function is a strictly consistent loss function for the VaR and the ES (Fissler and Ziegel, 2016).

We connect this weaker condition (A3) to classical misspecified regression models for the mean and for quantiles of White (1980), Gourieroux, Monfort, and Trognon (1984), Kim and White (2003), Komunjer (2005), and Angrist, Chernozhukov, and Fernandez-Val (2006). For correctly specified models, we usually impose the strong condition that for all $t = 1, \dots, T$,

$$ \mathbb{E}_t[\psi(Y_t, X_t, \theta)] = 0 \ \text{a.s.} \iff \theta = \theta_0, \tag{1.20} $$

where $\psi(Y_t, X_t, \theta)$ is almost surely the derivative of $\rho(Y_t, X_t, \theta)$ and corresponds to the identification functions of the model (Gneiting, 2011). The weaker condition (A3) is essentially equivalent to the unconditional moment condition

$$ \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} \psi(Y_t, X_t, \theta) \right] = 0 \iff \theta = \theta_T^*. \tag{1.21} $$

Thus, the condition (1.21) can be interpreted as an average identification condition, that is, $V_t^\top \beta_T^*$ and $W_t^\top \gamma_T^*$ are some best averaged linear approximations of the true unknown conditional quantile and ES models.

Theorem 1.3 (Consistency Misspecified Model). Given the conditions from Assumption 1.2, it holds that $\hat\theta_T - \theta_T^* \overset{P}{\to} 0$, as $T \to \infty$, where $\theta_T^*$ is the pseudo-true parameter as defined in Equation (1.19).
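The proof of Theorem 1.3 is given in Appendix A.

Theorem 1.4 (Asymptotic Normality Misspecified Model). Given the conditions of Assumption 1.2, it holds that

$$ \Sigma_T^{-1/2}(\theta_T^*) \, \Lambda_T(\theta_T^*) \, \sqrt{T} \, (\hat\theta_T - \theta_T^*) \overset{d}{\to} \mathcal{N}(0, I_{2k}), \tag{1.22} $$

where

$$ \Lambda_T(\theta_T^*) = \begin{pmatrix} \Lambda_{11,T}(\theta_T^*) & \Lambda_{12,T}(\theta_T^*) \\ \Lambda_{21,T}(\theta_T^*) & \Lambda_{22,T}(\theta_T^*) \end{pmatrix} \quad \text{and} \quad \Sigma_T(\theta_T^*) = \begin{pmatrix} \Sigma_{11,T}(\theta_T^*) & \Sigma_{12,T}(\theta_T^*) \\ \Sigma_{21,T}(\theta_T^*) & \Sigma_{22,T}(\theta_T^*) \end{pmatrix} \tag{1.23} $$

with

$$ \Lambda_{11,T}(\theta_T^*) = -\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ V_t V_t^\top \, \frac{f_t(V_t^\top \beta_T^*)}{\tau \, W_t^\top \gamma_T^*} \right], \tag{1.24} $$

$$ \Lambda_{12,T}(\theta_T^*) = \Lambda_{21,T}(\theta_T^*)^\top = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ V_t W_t^\top \, \frac{F_t(V_t^\top \beta_T^*) - \tau}{\tau \, (W_t^\top \gamma_T^*)^2} \right], \tag{1.25} $$

$$ \Lambda_{22,T}(\theta_T^*) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ \frac{W_t W_t^\top}{(W_t^\top \gamma_T^*)^2} \right] \tag{1.26} $$

$$ \qquad - \frac{2}{T} \sum_{t=1}^{T} \mathbb{E}\left[ \frac{W_t W_t^\top}{(W_t^\top \gamma_T^*)^3} \left( W_t^\top \gamma_T^* - \frac{1}{\tau} \mathbb{E}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \beta_T^*\}} \big] + \frac{F_t(V_t^\top \beta_T^*) - \tau}{\tau} \, V_t^\top \beta_T^* \right) \right], \tag{1.27} $$

and

$$ \Sigma_{11,T}(\theta_T^*) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ \frac{V_t V_t^\top}{(W_t^\top \gamma_T^*)^2} \left( \frac{1-\tau}{\tau} + \frac{(1 - 2\tau)\,(F_t(V_t^\top \beta_T^*) - \tau)}{\tau^2} \right) \right], \tag{1.28} $$

$$ \Sigma_{12,T}(\theta_T^*) = -\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\Bigg[ \frac{V_t W_t^\top}{(W_t^\top \gamma_T^*)^3} \Bigg\{ \frac{1-\tau}{\tau} \big( V_t^\top \beta_T^* - W_t^\top \gamma_T^* \big) \tag{1.29} $$

$$ \qquad + \frac{1-\tau}{\tau} \left( \frac{F_t(V_t^\top \beta_T^*) - \tau}{\tau} \, V_t^\top \beta_T^* + W_t^\top \gamma_T^* - \frac{1}{\tau} \mathbb{E}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \beta_T^*\}} \big] \right) \tag{1.30} $$

$$ \qquad - \frac{F_t(V_t^\top \beta_T^*) - \tau}{\tau} \big( V_t^\top \beta_T^* - W_t^\top \gamma_T^* \big) \Bigg\} \Bigg], \tag{1.31} $$

$$ \Sigma_{22,T}(\theta_T^*) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\Bigg[ \frac{W_t W_t^\top}{(W_t^\top \gamma_T^*)^4} \Bigg\{ \frac{1}{\tau} \mathrm{Var}_t\big( V_t^\top \beta_T^* - Y_t \,\big|\, Y_t \leq V_t^\top \beta_T^* \big) + \frac{1-\tau}{\tau} \big( V_t^\top \beta_T^* - W_t^\top \gamma_T^* \big)^2 \tag{1.32} $$

$$ \qquad + 2 \, \frac{\tau - F_t(V_t^\top \beta_T^*)}{\tau} \big( V_t^\top \beta_T^* - W_t^\top \gamma_T^* \big) V_t^\top \beta_T^* \Bigg\} \Bigg]. \tag{1.33} $$

The proof of Theorem 1.4 is given in Appendix A.

The asymptotic theory derived here embeds the asymptotic theory of Patton, Ziegel, and Chen (2019) and Dimitriadis and Bayer (2019) in the simplified case of correctly specified models. Correct specification implies that $F_t(V_t^\top \beta_T^*) = \tau$ and $W_t^\top \gamma_T^* = \frac{1}{\tau} \mathbb{E}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \beta_T^*\}} \big]$ almost surely for all $t = 1, \dots, T$. Imposing these two conditions simplifies the asymptotic covariance matrix of Theorem 1.4 to the asymptotic covariances from Patton, Ziegel, and Chen (2019) and Dimitriadis and Bayer (2019).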
In general, allowing for model misspecification in regression models comes at the cost of an inflated and more complicated asymptotic covariance matrix, see White (1980), White (1994), Kim and White (2003), Komunjer (2005), and Angrist, Chernozhukov, and Fernandez-Val (2006) for examples of semiparametric models for the mean and quantiles.

Given consistency and asymptotic normality, we can derive the asymptotic distribution of the test statistics of the regression-based ESR backtests. Henceforth, we use the short notation $\Omega_T = \Lambda_T^{-1}(\theta_T^*) \, \Sigma_T(\theta_T^*) \, \Lambda_T^{-1}(\theta_T^*)$ for the asymptotic covariance. As the Auxiliary ESR backtest is not subject to model misspecification, under the null hypothesis it holds that $\gamma_T^* = (0, 1)$ for all $T \in \mathbb{N}$. However, this does not necessarily hold for the Strict ESR and the Intercept ESR backtests, and we define the following modified test statistics for these backtests,

$$ \tilde T_{SESR}(\gamma_T^0) = T \, (\hat\gamma_T - \gamma_T^0) \, \hat\Omega_{T,\gamma}^{-1} \, (\hat\gamma_T - \gamma_T^0)^\top, \tag{1.34} $$

$$ \tilde T_{IESR}(\gamma_{1,T}^0) = T \, (\hat\gamma_{1,T} - \gamma_{1,T}^0) \, \hat\Omega_{T,\gamma_1}^{-1} \, (\hat\gamma_{1,T} - \gamma_{1,T}^0)^\top, \tag{1.35} $$

depending on the parameter $\gamma_T^0$, which we test for. The matrices $\hat\Omega_{T,\gamma}$ and $\hat\Omega_{T,\gamma_1}$ are the ES-specific parts of the estimators for the asymptotic covariance matrix, and $\hat\gamma_{1,T}$ and $\gamma_{1,T}^0$ refer to the intercept components of the ES-specific parameter vectors $\hat\gamma_T$ and $\gamma_T^0$. Given these modified test statistics, we can state the following corollary.

Corollary 1.5 Given the conditions of Assumption 1.2 and given that $\hat\Omega_T - \Omega_T \overset{P}{\to} 0$, it holds that

$$ T_{AESR} \overset{d}{\to} \chi_2^2, \quad \tilde T_{SESR}(\gamma_T^*) \overset{d}{\to} \chi_2^2, \quad \text{and} \quad \tilde T_{IESR}(\gamma_{1,T}^*) \overset{d}{\to} \chi_1^2. \tag{1.36} $$

The proof of Corollary 1.5 is given in Appendix A. For the Auxiliary ESR test, we can simply use the test statistic $T_{AESR}$ in order to test whether $\gamma_T^* = (0, 1)$. However, for the Strict and Intercept ESR tests, we do not know the exact form of the pseudo-true parameter $\gamma_T^*$ under the null hypothesis due to the potential model misspecification. Consequently, we derive the distribution of the test statistic $\tilde T_{SESR}(\gamma_T^*)$ at the pseudo-true parameter $\gamma_T^*$. In the following, we argue that in realistic financial settings, it holds that $\gamma_T^* \approx (0, 1)$ and thus, $\tilde T_{SESR}(\gamma_T^*) \approx \tilde T_{SESR}((0, 1)) = T_{SESR}$ holds approximately, and equivalently for the Intercept ESR test. This implies that these tests still have approximately correct size under these slightly misspecified null hypotheses.

First, the majority of the literature in financial econometrics finds that pure scale processes (e.g., GARCH and stochastic volatility models) approximate the true underlying daily financial data well enough. Thus, $\hat e_t \approx c \hat v_t$ for some $c > 0$, and we find that under the null hypothesis, the regression model in Equation (1.11) is only subject to slight model misspecification. Second, the misspecification is in the auxiliary quantile equation, while we test the parameters of the ES equation in Equation (1.11), which is correctly specified under the null. Thus, the model misspecification enters our test statistic only indirectly through the auxiliary effect of the joint parameter estimation. We confirm that these approximations are very precise in the simulation setup of Appendix B, even for cases of unrealistically strong model misspecification. Furthermore, the simulation study in Section 2 confirms these results by showing that the Strict ESR backtest based on $T_{SESR}$ exhibits correct size and performs almost indistinguishably from the Auxiliary ESR backtest, also in the simulation setups where the underlying data do not follow a pure scale process.
This shows that the approximation error is negligible under the null hypothesis in realistic financial settings and that the Strict and Intercept ESR backtests can indeed be applied in practice.

The following corollary specifies the behavior of our tests under alternative hypotheses. For this, we define the hypothetical parameter $\gamma_T^0$ as the ES-specific pseudo-true parameter of the model for correctly specified ES forecasts. While for correctly specified regression equations it holds that $\gamma_T^0 = (0, 1)$, its exact form is unknown in the general case.

Corollary 1.6 Under the alternative hypotheses,

$$ H_A^{AESR}: \gamma_T^* \neq (0, 1), \quad H_A^{SESR}: \gamma_T^* \neq \gamma_T^0, \quad \text{and} \quad H_A^{IESR}: \gamma_{1,T}^* \neq \gamma_{1,T}^0, \tag{1.37} $$

for all $T \geq T_A$ for some $T_A \in \mathbb{N}$, and given the conditions of Assumption 1.2 and given that $\hat\Omega_T - \Omega_T \overset{P}{\to} 0$, it holds that for all $c > 0$,

$$ \mathbb{P}\big( T_{AESR} \geq c \big) \to 1, \quad \mathbb{P}\big( \tilde T_{SESR}(\gamma_T^0) \geq c \big) \to 1, \quad \text{and} \quad \mathbb{P}\big( \tilde T_{IESR}(\gamma_{1,T}^0) \geq c \big) \to 1. \tag{1.38} $$

The proof of Corollary 1.6 is given in Appendix A. While the parameter $\gamma_T^0$ is unknown in the general case, we argue above that $\gamma_T^0 \approx (0, 1)$ still holds approximately in realistic settings and consequently $\tilde T_{SESR}(\gamma_T^0) \approx \tilde T_{SESR}((0, 1)) = T_{SESR}$. Corollary 1.6 theoretically implies diverging power for any case where $\gamma_T^* \neq \gamma_T^0$, that is, also in misspecified cases when $\gamma_T^0 \neq (0, 1)$, and thus, diverging power by employing the approximated test statistic $T_{SESR} = \tilde T_{SESR}((0, 1))$. While this holds theoretically, the empirical performance of the Strict and Intercept ESR tests is almost entirely unaffected by the small approximation error stemming from the indirect misspecification of the quantile model, as can be seen in the simulation results in Appendix B and in Section 2.

1.5 Implementation of the Tests

The M-estimation of the parameters $\theta$ is carried out by using the R package esreg (Bayer and Dimitriadis, 2019b).
The main difficulty in the implementation of the backtests is the estimation of the asymptotic covariance matrix $\Omega_T = \Lambda_T^{-1}(\theta_T^*) \, \Sigma_T(\theta_T^*) \, \Lambda_T^{-1}(\theta_T^*)$. Generally, this is implemented by using the sample counterparts of the expectations of the components given in Equations (1.24)–(1.33) in Theorem 1.4, which however depend on the following four nuisance quantities:

(a) the conditional density function, evaluated at the conditional quantile, $\hat f_t(V_t^\top \hat\beta_T)$,
(b) the conditional truncated variance, $\widehat{\mathrm{Var}}_t(V_t^\top \hat\beta_T - Y_t \mid Y_t \leq V_t^\top \hat\beta_T)$,
(c) the conditional distribution function, $\hat F_t(V_t^\top \hat\beta_T)$, and
(d) the conditional truncated expectation $\frac{1}{\tau} \hat{\mathbb{E}}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \hat\beta_T\}} \big]$.

We implement a novel and misspecification-robust covariance estimator by estimating the four nuisance quantities above in the following way. The terms (a) and (b) also appear in the asymptotic covariance of correctly specified models for the quantile and the ES of Dimitriadis and Bayer (2019), Patton, Ziegel, and Chen (2019), and Barendse (2020). Thus, we follow the approach of Dimitriadis and Bayer (2019) and apply the nid estimator of Hendricks and Koenker (1992) for (a), the conditional density, and the flexible scl-sp estimator of Dimitriadis and Bayer (2019) for (b), the conditional truncated variance.

In order to estimate (c), the conditional distribution function $\hat F_t(V_t^\top \hat\beta_T)$, we follow the general approach of the scl-sp estimator of Dimitriadis and Bayer (2019), that is, we assume that $F_t$ follows a conditional location-scale model with innovations $\varepsilon_t$ with a flexible zero mean and unit variance distribution. We standardize $Y_t$ by the estimates of the conditional mean and variance, estimated by pseudo-maximum likelihood, and apply a kernel density estimator in order to obtain the distribution function of $\varepsilon_t$.
Hence, we can recover the distribution of $Y_t$ given $\mathcal{F}_{t-1}$. Notice that for the minor degree of misspecification we are subject to in our backtesting approach, it approximately holds that $\hat F_t(V_t^\top \hat\beta_T) \approx \tau$ for all $t$. We find that this semiparametric estimation approach, which is subject to the location-scale assumption, performs better than pure nonparametric alternatives as we are estimating the conditional distribution evaluated at rather extreme quantiles such as at $\tau = 2.5\%$.

The last nuisance quantity, $\frac{1}{\tau} \mathbb{E}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \hat\beta_T\}} \big]$, is the mean of the observations that are smaller than the possibly misspecified linear quantile model. This quantity is closely related to the conditional ES, which is assumed to be a linear function in our approach. As for realistic financial data we only face a minor degree of misspecification in the quantile model, this nuisance quantity is assumed to still be approximately linear, and thus, we obtain that $\frac{1}{\tau} \hat{\mathbb{E}}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \hat\beta_T\}} \big] = W_t^\top \hat\gamma_T$ for all $t$. Nonparametric estimation of this nuisance quantity again introduces too much estimation noise.

We further implement our backtests based on a covariance estimator from Dimitriadis and Bayer (2019) and Patton, Ziegel, and Chen (2019), which does not account for possible model misspecification. This estimator is based on the simplified covariance structure given in Dimitriadis and Bayer (2019) and Patton, Ziegel, and Chen (2019), where the correct model specification assumption implies that $F_t(V_t^\top \beta_T^*) = \tau$ and $\frac{1}{\tau} \mathbb{E}_t\big[ Y_t \mathbb{1}_{\{Y_t \leq V_t^\top \beta_T^*\}} \big] = W_t^\top \gamma_T^*$, almost surely. Thus, we only estimate the nuisance quantities (a) and (b) in this approach.

2 Monte-Carlo Simulations

In this section, we evaluate the empirical performance of our proposed ESR backtests and compare them to the tests of McNeil and Frey (2000) and Nolde and Ziegel (2017).
For this, we assess the empirical size and power of the tests, which are defined as the rejection frequency of the tests under the null and alternative hypothesis, respectively. This comparison is conducted using two different approaches. The first, presented in Section 2.1, follows the typical strategy in the related literature of first assessing the size of the backtests with several realistic DGPs, followed by an evaluation of the power by backtesting forecasts stemming from an overly simplified model, in this case the Historical Simulation (HS) model. In the second setup, presented in Section 2.2, we continuously misspecify certain parameters of the true model and thereby obtain alternative models with a continuously increasing degree of misspecification. This approach of evaluating backtests has two advantages. First, we obtain power curves which can be used to draw conclusions about how an increasing model misspecification influences the test decisions. Second, misspecifying the different model parameters in isolation allows us to misspecify certain model characteristics while leaving the remaining model unchanged.

2.1 Traditional Size and Power Comparisons

In order to compare the proposed backtests from the previous sections, we simulate data from several DGPs. Besides pure scale (volatility) model specifications, under which the Strict and Intercept ESR backtests are correctly specified, we also consider more general Student's-t GAS models (Creal, Koopman, and Lucas, 2013) with time-varying higher moments and AR-GARCH specifications where our ESR backtests are subject to model misspecification under the null hypothesis.
EGARCH: The first DGP is an EGARCH(1,1) model (Nelson, 1991) with t-distributed innovations, where the parameter values are calibrated to daily returns of the S&P 500 index,

$$ \begin{aligned} Y_t &= \sigma_t z_t, \quad \text{where} \quad z_t \sim t_{7.39}, \quad \text{and} \\ \log(\sigma_t^2) &= -0.0012 - 0.161\, z_{t-1} + 0.136\, \big( |z_{t-1}| - \mathbb{E}[|z_{t-1}|] \big) + 0.978 \log(\sigma_{t-1}^2). \end{aligned} \tag{2.1} $$

This model represents a highly flexible GARCH specification and, due to its calibrated parameter values, this DGP accurately replicates the distributional properties of daily financial returns. As we assume zero mean for this model, the true VaR and ES forecasts are perfectly colinear and consequently, the regression equations for the Strict and the Intercept ESR backtests are correctly specified under the null hypothesis.

AR-GARCH: The next specification is an AR(1)-GARCH(1,1) model with Gaussian innovations,

$$ \begin{aligned} Y_t &= \phi Y_{t-1} + \sigma_t z_t, \quad \text{where} \quad z_t \sim \mathcal{N}(0, 1), \quad \text{and} \\ \sigma_t^2 &= 0.01 + 0.1\, Y_{t-1}^2 + 0.85\, \sigma_{t-1}^2, \end{aligned} \tag{2.2} $$

where we consider the three specifications $\phi \in \{0, 0.1, 0.5\}$ for the AR parameter. This DGP introduces model misspecification for the Strict and Intercept ESR backtests through the non-zero conditional mean specification, while leaving the realistic volatility structure of the financial returns unchanged. For this DGP, the ratio between true ES and VaR is given by:

$$ \frac{\hat e_t}{\hat v_t} = \frac{\mu_t + \sigma_t \xi_z(\tau)}{\mu_t + \sigma_t q_z(\tau)}, \tag{2.3} $$

where $\mu_t$ is the conditional mean of $Y_t$ given $\mathcal{F}_{t-1}$ and $q_z(\tau)$ and $\xi_z(\tau)$ are the $\tau$-quantile and the $\tau$-ES of the innovations $z_t$, respectively. If $\mu_t$ equals zero, the ratio is constant and thus, the regression equations in (1.11) are correctly specified under the null. By increasing the time-dependence of the conditional mean model through the AR parameter, we can monotonically strengthen the model misspecification in this DGP.

GAS-STD: We use a 3-factor Student's-t GAS model with time-varying location $\mu_t$, scale $\sigma_t$, and degrees of freedom $\nu_t$ with parameters calibrated to daily returns of the S&P 500 index.
This model is estimated and simulated by using the R package GAS (Ardia, Boudt, and Catania, 2019) and is based on the following model specification

$$ Y_t \mid (Y_1, \dots, Y_{t-1}) \sim t(\mu_t, \sigma_t, \nu_t), \tag{2.4} $$

where the vector $(\mu_t, \sigma_t, \nu_t)$ follows an autoregressive specification, driven by the lagged score of the log-likelihood of the distributional specification in Equation (2.4). Creal, Koopman, and Lucas (2013) and Harvey (2013) introduce the general GAS specification, which nests many well-known models, including ARMA, GARCH (Bollerslev, 1986), and ACD (Engle and Russell, 1998) models. Koopman, Lucas, and Scharth (2016) provide an overview of GAS and related models. We refer to Appendix A of Ardia, Boudt, and Catania (2019) for the exact parametric specification of this Student's-t GAS model.

GAS-SSTD: We generalize the previous GAS model to a 4-factor asymmetric Student's-t GAS model with time-varying location $\mu_t$, scale $\sigma_t$, skewness $\lambda_t$, and degrees of freedom $\nu_t$,

$$ Y_t \mid (Y_1, \dots, Y_{t-1}) \sim t(\mu_t, \sigma_t, \lambda_t, \nu_t). \tag{2.5} $$

Compared to the previous 3-factor GAS specification, this model further allows for asymmetries in the conditional return distribution through an additional time-varying skewness parameter with an autoregressive GAS specification.

For the two location-scale DGPs, we obtain VaR and ES forecasts at level $\tau$ by:

$$ \hat v_t = \hat\mu_t + \hat\sigma_t q_z(\tau) \quad \text{and} \quad \hat e_t = \hat\mu_t + \hat\sigma_t \xi_z(\tau), \tag{2.6} $$

where $\hat\mu_t$ and $\hat\sigma_t$ are the respective location and volatility forecasts generated by the location and scale models and $q_z(\tau)$ and $\xi_z(\tau)$ are the $\tau$-quantile and the $\tau$-ES of the innovations $z_t$, respectively. For the t-distributions of the two GAS models, we obtain the ES forecasts through numerical integration.
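The AR-GARCH recursion in Equation (2.2) and the location-scale forecasts in Equation (2.6) can be sketched in Python as follows, using the closed-form quantile and ES of a standard normal innovation. The function names, the burn-in length, the starting value of the variance recursion, and the seed are assumptions made for this illustration and are not taken from the article.

```python
import math
import random
from statistics import NormalDist


def gaussian_var_es(tau):
    """tau-quantile q_z and tau-ES xi_z of a standard normal innovation."""
    nd = NormalDist()
    q = nd.inv_cdf(tau)
    xi = -nd.pdf(q) / tau  # closed form of E[z | z <= q]
    return q, xi


def simulate_ar_garch(n, phi, tau=0.025, burn_in=500, seed=0):
    """Simulate (Y_t, v_t, e_t) from the AR(1)-GARCH(1,1) DGP of Eq. (2.2).

    v_t and e_t are the true conditional VaR and ES as in Eq. (2.6).
    burn_in, seed and the starting variance are arbitrary choices.
    """
    rng = random.Random(seed)
    q, xi = gaussian_var_es(tau)
    y_prev, sig2 = 0.0, 0.2
    path = []
    for t in range(n + burn_in):
        sig2 = 0.01 + 0.1 * y_prev ** 2 + 0.85 * sig2
        sigma = math.sqrt(sig2)
        mu = phi * y_prev
        y = mu + sigma * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append((y, mu + sigma * q, mu + sigma * xi))
        y_prev = y
    return path


q_z, xi_z = gaussian_var_es(0.025)
path_scale = simulate_ar_garch(20000, phi=0.0, seed=1)   # pure scale case
path_ar = simulate_ar_garch(2000, phi=0.5, seed=1)       # non-zero mean
hit_rate = sum(y <= v for y, v, _ in path_scale) / len(path_scale)
```

For `phi=0.0` the ratio of ES to VaR is the constant `xi_z / q_z`, which is the pure scale case discussed above; for `phi=0.5` the ratio varies with the conditional mean.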
For the following size and power analysis of the backtests, we simulate data from the DGPs given above with varying sample sizes of 250, 500, 1000, 2500, and 5000 observations and 250 additional pre-sample values required for the power analysis. We run 10,000 Monte-Carlo replications for each of the DGPs. As stipulated by the Basel Accords, we fix the probability level to $\tau = 2.5\%$ for the VaR and ES forecasts for each of the DGPs. In this part of the study, we focus on two-sided hypotheses and defer the one-sided case to Section 2.3. We compare our three ESR backtests to two specifications of the conditional calibration (CC) backtest of Nolde and Ziegel (2017) and to two specifications of the exceedance residual (ER) backtests of McNeil and Frey (2000), which are further described in Section S.1.2.1 and Section S.1.2.2 in the Supplementary Appendix.

Table 1 presents the empirical sizes of the considered backtests for the different DGPs introduced above and for the different sample sizes and a nominal test size of 5%. Table S.1 and Table S.2 in Section S.1.3 in the Supplementary Appendix show equivalent results for nominal significance levels of 1% and 10%. We find that in large samples, all backtests display rejection rates close to the respective nominal size for all considered DGPs. However, in small samples, the ESR tests based on the misspecification-robust covariance estimator exhibit much better sizes compared to the equivalent ESR tests which do not account for the potential misspecification. As this holds both for DGPs which do and for DGPs which do not generate misspecification under the null, this indicates that the misspecification-robust covariance estimator better approximates the finite sample distribution and should consequently be applied in empirical applications.

We further find that the Strict ESR test and the Auxiliary ESR test perform almost identically throughout all considered DGPs.
This implies that the indirect misspecification the Strict ESR test introduces is negligible for realistic financial data. Even for the AR-GARCH model with increasing AR parameter $\phi$, the size properties of the Strict and the Intercept ESR tests are not adversely affected by the increasing degree of misspecification, see the results of Appendix B for further details on this. Of the four backtests from the literature, the general CC and the ER and its standardized version exhibit satisfactory sizes, whereas the Simple CC test is severely oversized, especially in small samples.

Table 1. Empirical sizes for the first simulation study

The first three ESR columns use the misspecification-robust covariance estimator (Misspec), the next three the classical covariance estimator (Classical).

| DGP | Sample size | Str. ESR (Misspec) | Aux. ESR (Misspec) | Int. ESR (Misspec) | Str. ESR (Classical) | Aux. ESR (Classical) | Int. ESR (Classical) | Gen. CC | Sim. CC | Std. ER | ER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EGARCH-STD | 250 | 0.09 | 0.09 | 0.04 | 0.24 | 0.25 | 0.15 | 0.08 | 0.29 | 0.07 | 0.09 |
| EGARCH-STD | 500 | 0.06 | 0.06 | 0.05 | 0.15 | 0.15 | 0.11 | 0.10 | 0.20 | 0.04 | 0.07 |
| EGARCH-STD | 1000 | 0.05 | 0.05 | 0.04 | 0.11 | 0.11 | 0.09 | 0.09 | 0.14 | 0.05 | 0.07 |
| EGARCH-STD | 2500 | 0.04 | 0.04 | 0.04 | 0.06 | 0.06 | 0.06 | 0.07 | 0.09 | 0.05 | 0.06 |
| EGARCH-STD | 5000 | 0.04 | 0.04 | 0.05 | 0.06 | 0.06 | 0.06 | 0.06 | 0.08 | 0.05 | 0.06 |
| GAS-STD | 250 | 0.10 | 0.10 | 0.05 | 0.26 | 0.26 | 0.15 | 0.07 | 0.28 | 0.07 | 0.08 |
| GAS-STD | 500 | 0.08 | 0.08 | 0.05 | 0.16 | 0.16 | 0.11 | 0.10 | 0.20 | 0.06 | 0.06 |
| GAS-STD | 1000 | 0.06 | 0.06 | 0.05 | 0.11 | 0.11 | 0.09 | 0.09 | 0.14 | 0.06 | 0.07 |
| GAS-STD | 2500 | 0.05 | 0.05 | 0.05 | 0.08 | 0.08 | 0.07 | 0.07 | 0.10 | 0.06 | 0.06 |
| GAS-STD | 5000 | 0.04 | 0.05 | 0.05 | 0.06 | 0.06 | 0.06 | 0.06 | 0.08 | 0.06 | 0.06 |
| GAS-SSTD | 250 | 0.09 | 0.09 | 0.05 | 0.25 | 0.25 | 0.15 | 0.07 | 0.26 | 0.08 | 0.07 |
| GAS-SSTD | 500 | 0.06 | 0.06 | 0.04 | 0.15 | 0.15 | 0.10 | 0.09 | 0.18 | 0.06 | 0.05 |
| GAS-SSTD | 1000 | 0.05 | 0.05 | 0.04 | 0.10 | 0.10 | 0.08 | 0.08 | 0.13 | 0.07 | 0.06 |
| GAS-SSTD | 2500 | 0.04 | 0.04 | 0.04 | 0.06 | 0.06 | 0.06 | 0.07 | 0.09 | 0.06 | 0.05 |
| GAS-SSTD | 5000 | 0.04 | 0.04 | 0.04 | 0.05 | 0.05 | 0.05 | 0.07 | 0.07 | 0.06 | 0.05 |
| AR-GARCH, $\phi = 0.0$ | 250 | 0.05 | 0.05 | 0.03 | 0.18 | 0.18 | 0.11 | 0.06 | 0.22 | 0.06 | 0.07 |
| AR-GARCH, $\phi = 0.0$ | 500 | 0.04 | 0.04 | 0.03 | 0.12 | 0.12 | 0.09 | 0.07 | 0.14 | 0.04 | 0.04 |
| AR-GARCH, $\phi = 0.0$ | 1000 | 0.04 | 0.04 | 0.03 | 0.09 | 0.09 | 0.07 | 0.07 | 0.10 | 0.04 | 0.04 |
| AR-GARCH, $\phi = 0.0$ | 2500 | 0.03 | 0.03 | 0.04 | 0.06 | 0.06 | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 |
| AR-GARCH, $\phi = 0.0$ | 5000 | 0.04 | 0.04 | 0.04 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 |
| AR-GARCH, $\phi = 0.1$ | 250 | 0.05 | 0.05 | 0.03 | 0.18 | 0.18 | 0.11 | 0.06 | 0.22 | 0.06 | 0.07 |
| AR-GARCH, $\phi = 0.1$ | 500 | 0.04 | 0.04 | 0.03 | 0.12 | 0.12 | 0.08 | 0.07 | 0.14 | 0.04 | 0.04 |
| AR-GARCH, $\phi = 0.1$ | 1000 | 0.04 | 0.04 | 0.03 | 0.09 | 0.09 | 0.07 | 0.07 | 0.10 | 0.04 | 0.04 |
| AR-GARCH, $\phi = 0.1$ | 2500 | 0.03 | 0.03 | 0.04 | 0.07 | 0.07 | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 |
| AR-GARCH, $\phi = 0.1$ | 5000 | 0.04 | 0.04 | 0.04 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 |
| AR-GARCH, $\phi = 0.5$ | 250 | 0.04 | 0.04 | 0.02 | 0.17 | 0.17 | 0.11 | 0.06 | 0.22 | 0.06 | 0.07 |
| AR-GARCH, $\phi = 0.5$ | 500 | 0.04 | 0.04 | 0.03 | 0.12 | 0.12 | 0.09 | 0.07 | 0.14 | 0.04 | 0.04 |
| AR-GARCH, $\phi = 0.5$ | 1000 | 0.04 | 0.04 | 0.03 | 0.09 | 0.09 | 0.07 | 0.07 | 0.10 | 0.04 | 0.04 |
| AR-GARCH, $\phi = 0.5$ | 2500 | 0.04 | 0.04 | 0.03 | 0.07 | 0.07 | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 |
| AR-GARCH, $\phi = 0.5$ | 5000 | 0.04 | 0.04 | 0.04 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 |

Notes: This table reports the empirical sizes of the backtests for the different DGPs described in Section 2.1 and for a nominal test size of 5%. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\tau = 2.5\%$. ESR refers to the three backtests introduced in this article and we consider versions with covariance estimation with and without model misspecification. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000).

For a comparison of the power of the backtests, we evaluate their ability to reject the null hypothesis for risk models producing incorrect ES forecasts. We utilize the HS approach, which forecasts the VaR and ES by using their empirical counterparts from previous trading days,

$$ \hat v_t = \hat Q_\tau(Y_{t-1}, Y_{t-2}, \dots, Y_{t-w}) \quad \text{and} \quad \hat e_t = \frac{1}{\sum_{i=1}^{w} \mathbb{1}_{\{Y_{t-i} \leq \hat v_t\}}} \sum_{i=1}^{w} Y_{t-i} \, \mathbb{1}_{\{Y_{t-i} \leq \hat v_t\}}, \tag{2.7} $$

where $\hat Q_\tau$ is the empirical $\tau$-quantile and $w$ is the length of a rolling window, which we set to 250, that is, one year of data.
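A minimal Python sketch of the rolling-window Historical Simulation forecasts in Equation (2.7) is given below. The empirical quantile is taken as the $\lceil \tau w \rceil$-th smallest observation in the window, which is one of several common conventions and an assumption of this sketch; the function names and the toy data are hypothetical.

```python
import math


def empirical_quantile(window, tau):
    """Lower empirical tau-quantile: the ceil(tau * w)-th smallest value."""
    ordered = sorted(window)
    k = max(math.ceil(tau * len(ordered)), 1)
    return ordered[k - 1]


def hs_forecasts(returns, tau=0.025, w=250):
    """Rolling HS forecasts (v_t, e_t) for t = w, ..., len(returns) - 1.

    Each forecast only uses the w returns preceding time t.
    """
    forecasts = []
    for t in range(w, len(returns)):
        window = returns[t - w:t]
        v = empirical_quantile(window, tau)
        tail = [y for y in window if y <= v]   # observations at or below the VaR
        e = sum(tail) / len(tail)              # their average is the ES forecast
        forecasts.append((v, e))
    return forecasts


# Toy data: the window 1, 2, ..., 250 followed by one further observation.
toy_returns = [float(i) for i in range(1, 251)] + [0.0]
toy_forecasts = hs_forecasts(toy_returns, tau=0.025, w=250)
```

With the toy window, the VaR forecast is the 7th smallest value and the ES forecast is the mean of the seven smallest values.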
Since the standardized ER and the general CC backtests require forecasts of the volatility, we estimate this quantity with the sample standard deviation of the returns over the same rolling window.

For a meaningful and fair comparison of the power of the backtests to reject the null hypothesis, we compare the size-adjusted power of the backtests (Lloyd, 2005). For this, the original critical values of the tests are modified such that the rejection frequencies of the true model equal the nominal test sizes. The size-adjusted power is then given by the rejection frequencies of the alternative models using these modified critical values.

The left panels in Figures 1 and 2 contain the size-adjusted power of the backtests for all empirical sizes in the unit interval for a sample size of 1000 and for the different DGPs. The black line depicts the case of equal empirical size and power, which can be seen as a lower bound for any reasonable test: whenever the power is below this line, randomly guessing the test decision is more accurate than performing the test. For the three ESR backtests, we only report power for the tests relying on the misspecification-robust covariance estimator, as these versions of the tests exhibit superior size properties for all considered DGPs. We observe that throughout all six considered DGPs, the Strict and the Auxiliary ESR backtests clearly dominate the other tests in terms of power at almost all empirical sizes, including the most relevant region of test sizes between 1% and 10%. The Intercept ESR test is not as powerful, which is not unexpected: because its slope coefficient is restricted to unity, it cannot account for misspecifications in the dynamics as precisely as the Strict and Auxiliary ESR tests.
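The size adjustment described above can be illustrated with simulated test statistics. This is a rough sketch under stated assumptions: the function names are hypothetical, the test is assumed to reject for large values of the statistic, and ties among the simulated statistics are ignored.

```python
def size_adjusted_critical_value(null_stats, size):
    """Critical value such that a fraction `size` of the null draws rejects."""
    ordered = sorted(null_stats, reverse=True)
    n_reject = int(round(size * len(ordered)))
    if n_reject == 0:
        return float("inf")          # no null draw may reject
    return ordered[n_reject - 1]     # smallest null statistic that still rejects


def rejection_rate(stats, crit):
    """Share of statistics at or above the critical value."""
    return sum(s >= crit for s in stats) / len(stats)


def size_adjusted_power(null_stats, alt_stats, size):
    """Rejection rate under the alternative at the size-adjusted critical value."""
    crit = size_adjusted_critical_value(null_stats, size)
    return rejection_rate(alt_stats, crit)
```

Evaluating `size_adjusted_power` on a grid of sizes in the unit interval traces out one power curve of the kind shown in the left panels of the figures.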
In order to present results for all considered sample sizes in condensed form for the relevant range of empirical sizes between 1% and 10%, we summarize the size-adjusted power by the partial area under the curve (PAUC), as proposed by Lloyd (2005). For this, we numerically compute the area under each power curve for the empirical sizes between 1% and 10%, which can be interpreted as the test power averaged over the different test sizes. In the right-hand panels of Figures 1 and 2, we present the PAUC for all backtests, DGPs, and sample sizes. As expected, the average power increases with the sample size, so that using more information leads to more reliable decisions about the quality of a forecast. We find that for all considered sample sizes, the Strict and Auxiliary ESR backtests dominate the other testing approaches. The almost identical performance of the Strict and the Auxiliary ESR tests throughout all simulation designs in Figures 1 and 2 emphasizes that the misspecification introduced by the Strict ESR test is unproblematic for realistic financial data.

6 A comparison of the raw power, that is, the raw rejection rate of the null hypotheses, could be misleading due to the differences in the empirical sizes of the backtests. In particular, an oversized test would exhibit unrealistically large rejection rates.

7 These plots are known as receiver operating characteristic curves and originate from the psychometrics literature (Lloyd, 2005). They are an effective presentation method for general binary classification tasks such as hypothesis testing, as they show the size-adjusted power simultaneously for all significance levels.

Figure 1. Size-adjusted power and PAUC plots against HS for a sample size of 1000 days. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\tau = 2.5\%$. ESR refers to the backtests introduced in this article, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000). (a) EGARCH: Size-adjusted Power; (b) EGARCH: PAUC; (c) GAS-STD: Size-adjusted Power; (d) GAS-STD: PAUC; (e) GAS-SSTD: Size-adjusted Power; (f) GAS-SSTD: PAUC.

Figure 2. Size-adjusted power and PAUC plots against HS for a sample size of 1000 days. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\tau = 2.5\%$. ESR refers to the backtests introduced in this article, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000). (a) AR-GARCH $\phi = 0$: Size-adjusted Power; (b) AR-GARCH $\phi = 0$: PAUC; (c) AR-GARCH $\phi = 0.1$: Size-adjusted Power; (d) AR-GARCH $\phi = 0.1$: PAUC; (e) AR-GARCH $\phi = 0.5$: Size-adjusted Power; (f) AR-GARCH $\phi = 0.5$: PAUC.

2.2 Continuous Model Misspecification

In the second simulation study, we use a GARCH(1,1) model with standardized Student-t distributed innovations,

$$Y_t = \sigma_t z_t, \quad \text{where } z_t \sim t_\nu, \quad \text{and} \quad \sigma_t^2 = \eta_0 + \eta_1 Y_{t-1}^2 + \eta_2 \sigma_{t-1}^2, \tag{2.8}$$

with the parameter values $\eta_0 = 0.01$, $\eta_1 = 0.1$, $\eta_2 = 0.85$, and $\nu = 5$ for the true model. For the analysis of the backtests, we simulate 10,000 times from this model with a fixed sample size of 2500 observations and consider the probability level $\tau = 2.5\%$ for the VaR and the ES. Table 2 presents the empirical sizes of the backtests for a nominal size of 5% for both the two-sided and the one-sided hypotheses. As in the first simulation study, we find that most of the backtests are reasonably sized, with rejection frequencies close to the nominal value.

Table 2. Empirical sizes for the second simulation study

| Hypothesis | Str. ESR (m) | Aux. ESR (m) | Int. ESR (m) | Str. ESR | Aux. ESR | Int. ESR | General CC | Simple CC | Std. ER | ER |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-sided | 0.07 | 0.07 | 0.07 | 0.05 | 0.05 | 0.05 | 0.07 | 0.09 | 0.05 | 0.05 |
| One-sided | – | – | 0.02 | – | – | 0.01 | 0.02 | 0.03 | 0.06 | 0.06 |

Notes: This table shows the empirical sizes of the backtests for the GARCH(1,1)-t model given in Equation (2.8), for a nominal test size of 5% and for both one-sided and two-sided hypotheses. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\tau = 2.5\%$. ESR refers to the backtests introduced in this article; columns marked (m) use covariance estimation under model misspecification, and the unmarked ESR columns use the classical covariance estimator. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000). Note that the Strict and Auxiliary ESR tests do not permit testing against a one-sided alternative; we therefore only present their sizes for the two-sided hypothesis.

For a detailed analysis of the power of the backtests, we continuously misspecify the true model according to the following five designs:

(a) We misspecify how the conditional variance reacts to the squared returns by varying the ARCH parameter $\eta_1$. We choose $\tilde\eta_1$ between 0.03 and 0.2 and let $\tilde\eta_2 = 0.95 - \tilde\eta_1$, such that the persistence of the GARCH process remains constant.
When $\tilde\eta_1 < \eta_1$, there is too little variation in the ES forecasts due to the reduced response to shocks, and the GARCH process approaches a constant volatility model.

(b) We alter the unconditional variance of the GARCH process, $E[\sigma_t^2] = \eta_0 / (1 - \eta_1 - \eta_2)$, between 0.5 and 0.01 by varying the parameter $\eta_0$ while holding $\eta_1$ and $\eta_2$ constant. Since the conditional variance is a weighted combination of the unconditional variance, the past squared returns, and the past conditional variance, this change implies that the ES forecasts are too conservative when the unconditional variance is larger than its true value, and vice versa.

(c) We vary the persistence of shocks between 0.9 and 0.999 by setting $\tilde\eta_1 = d\,\eta_1$ and $\tilde\eta_2 = d\,\eta_2$ for a varying constant $d > 0$ and by setting $\tilde\eta_0 = E[\sigma_t^2](1 - \tilde\eta_1 - \tilde\eta_2)$ in order to stabilize the unconditional variance. A higher persistence causes a stronger and longer reaction to shocks.

(d) We vary the degrees of freedom of the underlying Student-t distribution between 3 and $\infty$. Since the conditional variance is unaffected, this modification implies a relative horizontal shift of the ES forecasts.

(e) We misspecify the probability level $\tilde\tau$ of the ES forecasts between 0.5% and 5%. This represents the scenario that a forecaster submits (accidentally or on purpose) predictions for some level $\tilde\tau \neq \tau$. Similar to changing the degrees of freedom, this modification implies a relative horizontal shift of the ES forecasts.

As an illustrative example of these misspecifications, Figures S.1a to S.1e in the Supplementary Appendix depict 250 realizations of the returns of the true DGP in Equation (2.8), together with the corresponding ES forecasts of the true model (black dashed line) and of two exemplary models following the parameter misspecifications described in points (a) to (e) above.

We present the size-adjusted rejection rates plotted against the respective misspecified parameters for these five designs in Figure 3a–e.
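The parameter mappings of designs (a) to (c) and the variance recursion of Equation (2.8) can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the function names, and the way Student-t draws are generated from the standard library are assumptions, not the authors' code.

```python
import random

# True parameter values from Equation (2.8).
TRUE = {"eta0": 0.01, "eta1": 0.10, "eta2": 0.85, "nu": 5.0}


def unconditional_variance(p):
    return p["eta0"] / (1.0 - p["eta1"] - p["eta2"])


def design_a(eta1_tilde, p=TRUE):
    """Vary the ARCH parameter while keeping eta1 + eta2 (the persistence) fixed."""
    return {**p, "eta1": eta1_tilde, "eta2": p["eta1"] + p["eta2"] - eta1_tilde}


def design_b(target_variance, p=TRUE):
    """Vary the unconditional variance through eta0 only."""
    return {**p, "eta0": target_variance * (1.0 - p["eta1"] - p["eta2"])}


def design_c(d, p=TRUE):
    """Scale the persistence by d and adjust eta0 to keep the unconditional variance."""
    eta1, eta2 = d * p["eta1"], d * p["eta2"]
    eta0 = unconditional_variance(p) * (1.0 - eta1 - eta2)
    return {**p, "eta0": eta0, "eta1": eta1, "eta2": eta2}


def simulate(p, n, seed=0):
    """Simulate n returns and conditional variances from the GARCH(1,1)-t model."""
    rng = random.Random(seed)
    scale = ((p["nu"] - 2.0) / p["nu"]) ** 0.5      # standardize t draws to unit variance
    sigma2 = unconditional_variance(p)               # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        chi2 = rng.gammavariate(p["nu"] / 2.0, 2.0)  # chi-square with nu degrees of freedom
        z = scale * rng.gauss(0.0, 1.0) / (chi2 / p["nu"]) ** 0.5
        y = sigma2 ** 0.5 * z
        returns.append(y)
        variances.append(sigma2)
        sigma2 = p["eta0"] + p["eta1"] * y * y + p["eta2"] * sigma2
    return returns, variances
```

For example, `design_c(0.999 / 0.95)` raises the persistence to 0.999 while leaving the unconditional variance at its true value of 0.2.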
The true model is indicated by the gray vertical line and, motivated by the results of Figure S.1 in the Supplementary Appendix, the X-axis is oriented such that too risky (too small in absolute value) ES forecasts are on the right side of the true model. Even though there is no backtest that dominates the others throughout all considered designs, several conclusions can be drawn from this figure.

8 Notice that this inequality of the forecast magnitude only holds on average in the cases of Figure 3a and c, whereas it holds strictly for Figure 3b, d, and e.

Figure 3. Size-adjusted rejection rates for various types of misspecification. The gray vertical line depicts the true model. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\tau = 2.5\%$. ESR refers to the backtests introduced in this article, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000). (a) Changing the reaction to the squared returns; (b) Changing the unconditional variance; (c) Changing the persistence; (d) Changing the degrees of freedom; (e) Changing the probability level.

Figure 4. Size-adjusted rejection rates for various types of misspecification with a one-sided hypothesis. The gray vertical line depicts the true model. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\tau = 2.5\%$. ESR refers to the backtests introduced in this article, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000). (a) Changing the reaction to the squared returns; (b) Changing the unconditional variance; (c) Changing the persistence; (d) Changing the degrees of freedom; (e) Changing the probability level.

1. Overall, the Strict and Auxiliary ESR tests perform almost indistinguishably, and in four out of the five considered designs their performance is superior to that of the general CC and both ER backtesting approaches (Figure 3a–c and 3e). The ESR backtests outperform the competitors especially when we misspecify the volatility dynamics of the underlying GARCH process (Figure 3a–c). This shows that, in contrast to the existing approaches, our ESR backtests can be used to detect misspecifications in the dynamics used to construct the ES forecasts which go beyond level shifts.

2. The two ER tests (and the general CC test, which is constructed to be similar to the ER backtest) can hardly detect VaR and ES forecasts that stem from misspecified volatility processes (Figure 3a–c) or from misspecified probability levels $\tilde\tau \neq \tau$ (Figure 3e). This confirms the theoretical results discussed in Section S.1.2.1 in the Supplementary Appendix that these backtests only reject misspecifications which affect the relation (distance) between the VaR and ES forecasts. In contrast, these backtests perform well in the case of misspecified tails of the residual distribution, which particularly affects the relative distance between the VaR and ES forecasts (Figure 3d).
If these backtests were used by the regulatory authorities, banks could submit joint VaR and ES forecasts for some level $\tilde\tau > \tau$ or for some (too small) volatility process in order to minimize their capital requirements without facing the risk of being detected by these backtests. In comparison, our Intercept ESR backtest, which is similar to the ER backtests by construction, is clearly able to identify these misspecified probability levels.

3. Throughout all five misspecifications, the simple CC backtest also exhibits good power properties, similar to our proposed backtests. However, our three ESR backtests exhibit much better size properties (see Tables 1 and 2) and, in contrast to the simple CC test, they do not fail to reject the HS forecasts in the first simulation study (see Figure 1).

4. The Intercept ESR test performs well for misspecifications in the residual distribution, while it exhibits lower power against misspecifications in the dynamics of the model when we alter the ARCH parameter and the persistence of the process. This confirms the theoretical considerations that the Strict and Auxiliary tests have a greater ability to reject these misspecifications.

Together with the results from the first simulation study, these findings demonstrate that our proposed ESR backtests are a powerful choice for backtesting ES forecasts. They are reasonably sized and exhibit good power properties against a variety of misspecifications. Notably, in contrast to the existing backtests, there is no single type of misspecification where our ESR tests are unable to discriminate between forecasts of the true and the misspecified models.

2.3 Testing One-Sided Hypotheses

For the regulatory authorities, testing against a one-sided alternative might be more meaningful than the two-sided versions of the tests we consider in the previous sections.
Holding more money than stipulated by the Basel Accords is of no concern to regulators, as it is only important that banks keep enough monetary reserves to cover the risks from their market activities. In the following, we assess the performance of the Intercept ESR backtest and the one-sided versions of the four competitor backtests in rejecting the null hypothesis that the issued ES forecasts are at least as conservative (not smaller in absolute value) as the true ES, that is, that the associated market risk is not underestimated.

In Figure 4a–e, we present the size-adjusted rejection rates for the one-sided versions of the considered backtests and for the five continuous parameter misspecifications described in points (a)–(e) of the previous section. The structure of these figures is analogous to the two-sided case, where the X-axis is oriented such that too risky ES forecasts are on the right side of the true model (vertical gray line). As can be seen in Figures S.1a to S.1e in the Supplementary Appendix, the five modifications of the true model exhibit clear patterns regarding when they issue too risky or too conservative forecasts for the true ES, where this finding holds strictly for the cases (b), (d), and (e) and on average for the cases (a) and (c). Thus, the one-sided backtests should only reject the null hypothesis for ES forecasts that are too risky (too small in absolute value), that is, which are on the right side of the true model in Figure 4a–e.

We find that our Intercept ESR backtest is reasonably sized (compare Table 2) and dominates the ER and the CC tests in terms of power in three out of the five misspecification designs. Only when altering the degrees of freedom are the ER tests more powerful than the Intercept ESR test.
When changing the persistence of the process, the Intercept ESR test performs overall comparably to its competitors throughout the different degrees of misspecification. Surprisingly, we see that in four out of the five cases, the one-sided CC tests (both the simple and the general version) also reject too conservative ES forecasts, even though these should not be rejected according to the specification of the one-sided tests. Furthermore, as for the two-sided tests, both ER backtests fail to detect misspecifications of the underlying volatility process and of the underlying probability level. Summarizing these results, the proposed Intercept ESR backtest is a powerful backtest with good size properties for testing one-sided hypotheses, which dominates the existing one-sided (joint VaR and ES) backtesting techniques in the literature.

3 Empirical Application

In the empirical application, we apply our backtests to compare ES forecasts along three dimensions: the complexity of the risk model, the length of the estimation window, and the model refitting frequency. From a practitioner's point of view, it would be desirable to have a parsimonious model that can be estimated with few observations and is valid over a long period of time, for reasons of low engineering effort, data storage, and human and computational effort for updating the model. To assess whether such a setup is reasonable and, if not, which dimensions are crucial for a good performance, we compare rejection rates of ES forecasts using our backtests.

For this application, we use daily log returns of the 200 most highly capitalized stocks of the S&P 500 index (as of September 1, 2019) with a sufficiently long history of stock prices. We consider four different risk models: the standard GARCH(1,1) of Bollerslev (1986) and the GJR-GARCH(1,1) model of Glosten, Jagannathan, and Runkle (1993), both coupled with Gaussian and Student-t distributed innovations.
For all four models and 200 stocks, we consider the same evaluation horizon, the period from January 2010 to August 2019 with a total of 2432 daily observations. We furthermore consider five different lengths of the rolling estimation window, ranging from one year (250 trading days) up to eight years (2000 trading days), and refitting horizons of 5, 21, 62, 125, and 250 days, corresponding to weekly, monthly, quarterly, half-yearly, and yearly updating of the models.

Table 3 presents the rejection rates of the one-sided Intercept ESR backtest with a nominal size of 5% for the 200 stocks under investigation, for the four GARCH specifications, the five estimation window sizes, and the five refitting frequencies. We choose to use the one-sided Intercept ESR test as this is the only one-sided and strict ES backtest in the literature. Given the currently implemented traffic light system of the Basel Committee, this one-sided test might be the one with the highest practical relevance for backtesting the ES.

Table 3. Results of the empirical application

| Model | Rolling window | Refit 5 | Refit 21 | Refit 62 | Refit 125 | Refit 250 |
|---|---|---|---|---|---|---|
| GARCH-N | 250 | 0.96 | 0.96 | 0.95 | 0.95 | 0.92 |
| GARCH-N | 500 | 0.94 | 0.94 | 0.93 | 0.91 | 0.87 |
| GARCH-N | 1000 | 0.86 | 0.88 | 0.87 | 0.83 | 0.83 |
| GARCH-N | 1500 | 0.85 | 0.88 | 0.86 | 0.87 | 0.87 |
| GARCH-N | 2000 | 0.81 | 0.82 | 0.83 | 0.82 | 0.80 |
| GJR-GARCH-N | 250 | 0.96 | 0.96 | 0.95 | 0.92 | 0.90 |
| GJR-GARCH-N | 500 | 0.93 | 0.93 | 0.94 | 0.94 | 0.91 |
| GJR-GARCH-N | 1000 | 0.93 | 0.93 | 0.91 | 0.89 | 0.91 |
| GJR-GARCH-N | 1500 | 0.93 | 0.93 | 0.93 | 0.93 | 0.91 |
| GJR-GARCH-N | 2000 | 0.92 | 0.92 | 0.91 | 0.92 | 0.91 |
| GARCH-t | 250 | 0.00 | 0.01 | 0.04 | 0.06 | 0.12 |
| GARCH-t | 500 | 0.00 | 0.00 | 0.01 | 0.01 | 0.02 |
| GARCH-t | 1000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GARCH-t | 1500 | 0.00 | 0.01 | 0.00 | 0.01 | 0.01 |
| GARCH-t | 2000 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 |
| GJR-GARCH-t | 250 | 0.00 | 0.00 | 0.01 | 0.02 | 0.04 |
| GJR-GARCH-t | 500 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 |
| GJR-GARCH-t | 1000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| GJR-GARCH-t | 1500 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
| GJR-GARCH-t | 2000 | 0.00 | 0.01 | 0.00 | 0.01 | 0.01 |

Notes: This table shows the rejection rates of the one-sided ESR backtest for ES forecasts stemming from the two GARCH-type models with Student's t and Gaussian residuals, different rolling window sizes (in days), and model refitting frequencies (in days). The rejection frequencies are averaged over the analyzed 200 most capitalized stocks of the S&P 500 index. The out-of-sample window covers the time from January 2010 to August 2019, resulting in a sample size of 2432 days.

The results show that both the GARCH-N and the GJR-GARCH-N models are rejected for almost all the stocks (in at least 80% of the cases) uniformly over the different estimation sample sizes and refitting frequencies. Independent of the sample length and refitting frequency, this supports the well-known finding that Gaussian residuals generally fail to capture the riskiness of financial assets, especially in the tails of the distribution. In contrast, for the two GARCH specifications with Student-t distributed innovations, the rejection frequencies are considerably lower, and for almost all choices of the refitting frequency and the estimation window length they are below the nominal significance level of 5%.
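The rolling-window and refitting scheme of this application can be sketched as an index schedule. The sketch below is illustrative only: the function names and the convention that the evaluation period starts at position `offset` of the full return series are assumptions, and the actual model estimation is left out.

```python
def refit_days(n_eval, refit_every):
    """Evaluation-period days (0-based) on which the model is re-estimated."""
    return list(range(0, n_eval, refit_every))


def fit_windows(n_eval, window, refit_every, offset):
    """Yield (refit_day, start, end) with the half-open estimation slice [start, end).

    `offset` is the (assumed) index of the first evaluation day in the full
    return series, so each fit uses the `window` returns preceding the refit day.
    """
    if offset < window:
        raise ValueError("not enough history before the evaluation period")
    for day in refit_days(n_eval, refit_every):
        end = offset + day
        yield day, end - window, end


# Example with the settings of the application: 2432 evaluation days,
# a 1000-day estimation window, and yearly refitting (every 250 days).
schedule = list(fit_windows(n_eval=2432, window=1000, refit_every=250, offset=2000))
```

Between two refit days, the fitted parameters are held fixed and only the forecasts are updated, which is why less frequent refitting reduces the computational effort.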
Furthermore, refitting the models more frequently tends to slightly decrease the rejection frequency for the models with Student-t distributed innovations, whereas it tends to increase the rejection frequency for the Gaussian models. Overall, this implies that the refitting frequency is not a key factor for the model performance. Increasing the size of the estimation window tends to decrease the rejection frequency, although the results stabilize for lengths of 1000 days and above. Interestingly, employing the GJR-GARCH model, which accounts for a potential leverage effect in the volatility process, does not perform better than the standard GARCH model. Overall, the results of this application, which are diversified over 200 individual stocks, imply that using a fat-tailed residual distribution and an estimation window of at least 1000 days (roughly four years) suffices to obtain rejection rates of about 1% or less.

4 Conclusion

With the upcoming implementation of the third Basel Accords, risk managers and regulators will shift attention to the risk measure ES for the forecasting and evaluation of financial risks. In this article, we introduce regression-based ESR backtests for ES forecasts, which extend the classical Mincer and Zarnowitz (1969) test to ES-specific versions. As estimation of regression parameters for the ES stand-alone is infeasible, our tests build on a recently developed joint VaR and ES regression, which allows for different specifications of our tests, titled the Auxiliary, Strict, and Intercept ESR backtests. As these tests are potentially subject to model misspecification, we extend the asymptotic theory for the joint VaR and ES regression model to possibly misspecified models and verify the tests' performance in finite samples through an extensive simulation study.
We apply our tests to 200 stocks from the S&P 500 index in order to analyze the performance of ES forecasts stemming from the GARCH model family. We find that using fat-tailed (Student's t) residual distributions and at least four years of data yields satisfactory ES forecasts.

A unique and essential feature of the Strict and Intercept ESR backtests is that they solely require forecasts for the ES and are consequently the first backtests for the ES stand-alone. In contrast, a common drawback of the existing backtests in the literature is that they need forecasts of further input variables, such as the VaR, the volatility, the tail distribution, or even the whole return distribution. Using more information than the ES forecasts is problematic for two reasons. First, these tests are not applicable for the regulatory authorities, who receive forecasts of the ES, but not of the additional information required by these tests. Second, rejecting the null hypothesis does not necessarily imply that the ES forecasts are incorrect, as the rejection can be a result of a false prediction of any of the input parameters.

This article contributes to the ongoing discussion about which risk measure is the best in practice in the following way. As the VaR is criticized for not being subadditive and for not capturing tail risks beyond itself, the recent literature proposes both the ES and expectiles as alternative risk measures. Expectiles are suggested as they are coherent, elicitable, and able to capture extreme risks beyond the VaR, and thus they simultaneously overcome the drawbacks of the VaR and the ES (Bellini et al., 2014; Ziegel, 2016). Unfortunately, as opposed to the VaR and ES, they lack a visual and intuitive interpretation (Emmer, Kratz, and Tasche, 2015). In contrast, the ES is mainly criticized for its theoretical deficiencies of being not elicitable and not (or only with difficulty) backtestable.
However, starting with the joint elicitability result of VaR and ES of Fissler and Ziegel (2016), there is a growing body of literature using this result for a regression procedure (Dimitriadis and Bayer, 2019; Patton, Ziegel, and Chen, 2019; Barendse, 2020) and for relative forecast comparison (Fissler, Ziegel, and Gneiting, 2016; Nolde and Ziegel, 2017). This article extends that literature by introducing the ESR backtests, which are the first sensible backtests for the ES stand-alone. This shows that, even though technically more demanding, the ES can be modeled, evaluated, and backtested in the same way as quantiles and expectiles. Combining this with its ability to capture extreme tail risks and its intuitive visual interpretation, the ES is an appropriate candidate for being the standard risk measure in practice.

Supplementary Data

Supplementary data are available at Journal of Financial Econometrics online.

Appendix A: Proofs

Proof of Theorem 1.3: We check that the necessary conditions (i)–(iv) of the basic consistency theorem, given in Theorem 2.1 in Newey and McFadden (1994), p. 2121, hold, where we consider the objective functions $Q_T(\theta)$ and $Q_T^0(\theta)$ as defined in Equations (1.17) and (1.19). First, notice that condition (ii) holds by imposing condition (A2). The unique identification condition (i) holds by assumption (A3). Next, we verify the uniform convergence condition (iv) by applying the uniform weak law of large numbers given in Theorem A.2.5 in White (1994). For that, we have to show that

(A) the map $\theta \mapsto \rho(Y_t, X_t, \theta)$ is Lipschitz-$L_1$ on $\Theta$, see Definition A.2.3 in White (1994),

(B) for all $\theta^o \in \Theta$, there exists $\delta^o > 0$ such that for all $\delta$, $0 < \delta \le \delta^o$, the sequences

$$\bar\rho_t(\theta^o, \delta) := \sup_{\theta \in \Theta}\{\rho(Y_t, X_t, \theta) \mid \|\theta - \theta^o\| < \delta\} \quad \text{and} \tag{A.1}$$

$$\underline\rho_t(\theta^o, \delta) := \inf_{\theta \in \Theta}\{\rho(Y_t, X_t, \theta) \mid \|\theta - \theta^o\| < \delta\} \tag{A.2}$$

obey a weak law of large numbers.

9 Notice that we do not have a double index and thus we suppress the $n$ in the notation of White (1994). Furthermore, we apply the definition by using the identity function for $a_t^o$.

Condition (A) follows directly from Lemma S.1.1 and we turn to condition (B). As the process $\{Y_t, V_t, W_t\}$ is strong mixing of size $-r/(r-2)$ for some $r > 2$ by condition (A6), the processes $V_t$ and $W_t$ are strong mixing of the same size by Theorem 3.49 in White (2001), p. 50. As the functions $\rho(Y_t, X_t, \theta)$ and the supremum/infimum functions are $\mathcal{F}_t$-measurable for all $t \in \mathbb{N}$, we can conclude that the sequences $\bar\rho_t(\theta^o, \delta)$ and $\underline\rho_t(\theta^o, \delta)$ are also strong mixing of the same size by applying the same theorem. Furthermore, for $\tilde r > 1$ and for some $\delta > 0$ sufficiently small, $r \ge \tilde r + \delta$ and thus $E[|\bar\rho_t(\theta^o, \delta)|^{\tilde r + \delta}] \le \sup_{1 \le t \le T} E[\sup_{\theta \in \Theta} |\rho(Y_t, X_t, \theta)|^r]$ for all $t$, $1 \le t \le T$, $T \ge 1$. As $\Theta$ is compact, there exists some $c > 0$ such that $\sup_{\theta \in \Theta} \|\theta\| \le c$ and thus, for all $t = 1, \dots, T$, it holds that

$$E\Big[\sup_{\theta \in \Theta} |\rho(Y_t, X_t, \theta)|^r\Big] \le 4^{r-1}\Big\{1 + \frac{c}{K}\Big(1 + \frac{1}{\tau}\Big) E\|V_t\|^r + \frac{1}{\tau K} E\|Y_t\|^r + \sup_{\theta \in \Theta} E\|\log(-W_t^\top \gamma)\|^r\Big\}, \tag{A.3}$$

which is bounded by condition (A8) and as $\log(z) \le z$ for $z$ large enough. The same inequality holds for $|\underline\rho_t(\theta^o, \delta)|$. Thus, we can apply the weak law of large numbers for strong mixing sequences in Corollary 3.48 in White (2001), p. 49, in order to conclude that for all $\theta^o \in \Theta$ such that $\|\theta^o - \theta\| \le \delta$, it holds that $\frac{1}{T}\sum_{t=1}^{T} (\bar\rho_t(\theta^o, \delta) - E[\bar\rho_t(\theta^o, \delta)]) \overset{P}{\to} 0$ and $\frac{1}{T}\sum_{t=1}^{T} (\underline\rho_t(\theta^o, \delta) - E[\underline\rho_t(\theta^o, \delta)]) \overset{P}{\to} 0$, which shows condition (B). Consequently, the uniform convergence condition (iv) holds by applying the uniform weak law of large numbers given in Theorem A.2.5 in White (1994).

As we have shown in Lemma S.1.1 that the map $\theta \mapsto \rho(Y_t, X_t, \theta)$ is Lipschitz-$L_1$, the map $\theta \mapsto Q_T^0(\theta) = \frac{1}{T}\sum_{t=1}^{T} E[\rho(Y_t, X_t, \theta)]$ is also continuous, which shows condition (iii). Thus, we can apply Theorem 2.1 of Newey and McFadden (1994), which concludes the proof of this theorem. □

Proof of Theorem 1.4: Let

$$\psi(Y_t, X_t, \theta) = \begin{pmatrix} -\dfrac{V_t}{\tau W_t^\top \gamma}\big(\mathbb{1}_{\{Y_t \le V_t^\top \beta\}} - \tau\big) \\[2mm] \dfrac{W_t}{(W_t^\top \gamma)^2}\Big(W_t^\top \gamma - V_t^\top \beta + \dfrac{1}{\tau}(V_t^\top \beta - Y_t)\mathbb{1}_{\{Y_t \le V_t^\top \beta\}}\Big) \end{pmatrix}, \tag{A.4}$$

which is almost surely the derivative of $\rho(Y_t, X_t, \theta)$ with respect to $\theta$. We further define $\Psi_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} \psi(Y_t, X_t, \theta)$ and $\Psi_T^0(\theta) = E[\Psi_T(\theta)]$. From the proof of Lemma S.1.2, we get the mean value expansion (for $\hat\theta_T$ close to $\theta_T^0$),

$$\Psi_T^0(\hat\theta_T) - \Psi_T^0(\theta_T^0) = \Delta_T(\tilde\theta_1, \tilde\theta_2)(\hat\theta_T - \theta_T^0), \tag{A.5}$$

for some values $\tilde\theta_1$ and $\tilde\theta_2$ somewhere on the line between $\hat\theta_T$ and $\theta_T^0$, where the components of $\Delta_T(\tilde\theta_1, \tilde\theta_2)$ are given in Equation (S.1.8) and Equation (S.1.9), and where $\Psi_T^0(\theta_T^0) = 0$.

10 The mean-value theorem cannot be generalized in a straightforward fashion to vector-valued functions. Thus, we have to consider the mean value expansion in each component separately, which gives this more complicated expression.

Furthermore, it holds that $\Delta_T(\theta_T^0, \theta_T^0) = \Lambda_T(\theta_T^0)$ and $\Delta_T(\tilde\theta_1, \tilde\theta_2)$ is a continuous function in its arguments $\tilde\theta_1$ and $\tilde\theta_2$. Using that $\Lambda_T(\theta_T^0)$ has eigenvalues bounded away from zero (for $T$ large enough), we also get that $\Delta_T(\tilde\theta_1, \tilde\theta_2)$ is non-singular in a neighborhood around $\theta_T^0$ (for all arguments) for $T$ large enough, as the map which maps a matrix onto its eigenvalues is continuous. As we further know that $\hat\theta_T - \theta_T^0 \overset{P}{\to} 0$ and $\|\tilde\theta_j - \theta_T^0\| \le \|\hat\theta_T - \theta_T^0\|$ for all $j = 1, 2$, we get from the continuous mapping theorem that

$$\Delta_T^{-1}(\tilde\theta_1, \tilde\theta_2) - \Lambda_T^{-1}(\theta_T^0) \overset{P}{\to} 0. \tag{A.6}$$

In the following, we apply Lemma A.1 in Weiss (1991) (by verifying its assumptions), which extends the i.i.d. results of Huber (1967) to strong mixing sequences.
Assumption (N1) of Lemma A.1 in Weiss (1991) is satisfied as every almost surely continuous stochastic process is separable in the sense of Doob (Gikhman and Skorokhod, 2004) and the functions $\psi(Y_t, X_t, \theta)$ are almost surely continuous for all $t \in \mathbb{N}$. Assumption (N2) is satisfied as shown in the proof of Theorem 1.3. Assumption (N3)(i) is shown in Lemma S.1.2. The technical Assumptions (N3)(ii) and (N3)(iii) follow from Lemma 4 and Lemma 5 in the Supplementary Appendix of Patton, Ziegel, and Chen (2019). For this, notice that the moment conditions in Assumption 2 (C) and (D) of Patton, Ziegel, and Chen (2019) are implied by the condition (A8) in Assumption 1.2 for the simplified case of linear models. Assumption (N4) follows from the moment conditions (A8) in Assumption 1.2 and Assumption (N5) from the strong mixing condition (A6). Furthermore, Lemma 2 in the Supplementary Appendix of Patton, Ziegel, and Chen (2019) implies that $\sqrt{T}\,\Psi_T(\hat\theta_T) \overset{P}{\to} 0$. Thus, we can apply Lemma A.1 in Weiss (1991) and get that

$$\sqrt{T}\,\Psi_T^0(\hat\theta_T) + \sqrt{T}\,\Psi_T(\theta_T^0) \overset{P}{\to} 0. \tag{A.7}$$

Combining Equations (A.5), (A.6), and (A.7), we get that

$$\sqrt{T}(\hat\theta_T - \theta_T^0) = \Delta_T^{-1}(\tilde\theta_1, \tilde\theta_2)\sqrt{T}\,\Psi_T^0(\hat\theta_T) \tag{A.8}$$

$$= \big(\Lambda_T^{-1}(\theta_T^0) + o_P(1)\big)\big(-\sqrt{T}\,\Psi_T(\theta_T^0) + o_P(1)\big) \tag{A.9}$$

$$= -\Lambda_T^{-1}(\theta_T^0)\sqrt{T}\,\Psi_T(\theta_T^0) + o_P(1). \tag{A.10}$$

Furthermore,

$$\Sigma_T^{-1/2}(\theta_T^0)\sqrt{T}\,\Psi_T(\theta_T^0) = \Sigma_T^{-1/2}(\theta_T^0)\sqrt{T}\big(\Psi_T(\theta_T^0) - \Psi_T^0(\theta_T^0)\big) \overset{d}{\to} N(0, I_{2k}), \tag{A.11}$$

by Lemma S.1.3 and thus,

$$\Sigma_T^{-1/2}(\theta_T^0)\Lambda_T(\theta_T^0)\sqrt{T}(\hat\theta_T - \theta_T^0) \overset{d}{\to} N(0, I_{2k}), \tag{A.12}$$

which concludes the proof of this theorem. □

Proof of Corollary 1.5: We first notice that

$$\hat\Omega_T^{-1/2}\sqrt{T}(\hat\theta_T - \theta_T^0) = \Omega_T^{-1/2}\sqrt{T}(\hat\theta_T - \theta_T^0) + (\hat\Omega_T^{-1/2} - \Omega_T^{-1/2})\sqrt{T}(\hat\theta_T - \theta_T^0). \tag{A.13}$$

From Theorem 1.4, we get that $\Omega_T^{-1/2}\sqrt{T}(\hat\theta_T - \theta_T^0) \overset{d}{\to} N(0, I_4)$. Furthermore, as $(\hat\Omega_T^{-1/2} - \Omega_T^{-1/2}) = o_P(1)$, it holds by Slutsky's theorem that $(\hat\Omega_T^{-1/2} - \Omega_T^{-1/2})\sqrt{T}(\hat\theta_T - \theta_T^0) = o_P(1)$ and consequently,

$$\hat\Omega_T^{-1/2}\sqrt{T}(\hat\theta_T - \theta_T^0) \overset{d}{\to} N(0, I_4). \tag{A.14}$$

Thus,

$$T_{\mathrm{AESR}} = \big(\hat\Omega_{T,\gamma}^{-1/2}\sqrt{T}(\hat\gamma_T - \gamma_T^0)\big)^\top \big(\hat\Omega_{T,\gamma}^{-1/2}\sqrt{T}(\hat\gamma_T - \gamma_T^0)\big) \overset{d}{\to} \chi_2^2, \tag{A.15}$$

$$\tilde T_{\mathrm{SESR}}(\gamma_T^0) = \big(\hat\Omega_{T,\gamma}^{-1/2}\sqrt{T}(\hat\gamma_T - \gamma_T^0)\big)^\top \big(\hat\Omega_{T,\gamma}^{-1/2}\sqrt{T}(\hat\gamma_T - \gamma_T^0)\big) \overset{d}{\to} \chi_2^2, \quad \text{and} \tag{A.16}$$

$$\tilde T_{\mathrm{IESR}}(\gamma_{1,T}^0) = \big(\hat\Omega_{T,\gamma_1}^{-1/2}\sqrt{T}(\hat\gamma_{1,T} - \gamma_{1,T}^0)\big) \big(\hat\Omega_{T,\gamma_1}^{-1/2}\sqrt{T}(\hat\gamma_{1,T} - \gamma_{1,T}^0)\big) \overset{d}{\to} \chi_1^2. \tag{A.17}$$

□

Proof of Corollary 1.6: In the following, we show the result for the Strict ESR test statistic, while equivalent results for the other two ESR tests follow from straightforward simplifications of this proof. Given the alternative hypothesis

$$H_A^{\mathrm{SESR}}: \gamma_T^A \neq \gamma_T^{H_0} \quad \forall\, T \ge T_A \text{ for some } T_A \in \mathbb{N}, \tag{A.18}$$

it holds that $\|\gamma_T^A - \gamma_T^{H_0}\| \ge 2\varepsilon$ for all $T \ge T_A$ and for some $\varepsilon > 0$.
Thus,

$$
\|\hat\gamma_T - \gamma_T^{H_0}\| = \|\hat\gamma_T - \gamma_T^* + \gamma_T^* - \gamma_T^{H_0}\| \ge \left| \|\hat\gamma_T - \gamma_T^*\| - \|\gamma_T^* - \gamma_T^{H_0}\| \right| \ge \varepsilon > 0,
\tag{A.19}
$$

with probability approaching one by the inverse triangle inequality and as $\|\hat\gamma_T - \gamma_T^*\| \overset{P}{\to} 0$ and thus, $\|\hat\gamma_T - \gamma_T^*\| \le \varepsilon$ with probability approaching one as $T \to \infty$. Consequently, for all $c \in \mathbb{R}$, it holds that

$$
P\left( \left\| \sqrt{T}\, \hat\Omega_{T,\gamma}^{-1/2} (\hat\gamma_T - \gamma_T^{H_0}) \right\| \ge c \right) \to 1,
\tag{A.20}
$$

and thus,

$$
P\left( \tilde T_{SESR}(\gamma_T^{H_0}) \ge c \right) \to 1.
\tag{A.21}
$$

The proof for $T_{IESR}$ follows along the same lines, and the one for $T_{AESR}$ is a simplified version as we can consider the test parameters $(0, 1)$ instead of the hypothetical pseudo-true parameters under the null $\gamma_T^{H_0}$. □

Appendix B: Approximation Accuracy of the Misspecified Parameters

In this section, we present a simulation study in order to analyze the accuracy of the approximations of the pseudo-true parameter $\gamma_T^*$ by the tested restriction $\gamma = (0, 1)$ under the null. Subsequently, we analyze the approximation of the test statistic $T_{AESR}$ by $T_{SESR}$.

Figure 5. This figure illustrates the effect the misspecified regression equations of the Strict ESR test have on the respective (average) parameter estimates and associated test statistics. The three plot columns of the figure correspond to the different values of the AR parameter $\phi \in \{0, 0.1, 0.5\}$, which governs the degree of misspecification. For each column, the first row illustrates the empirical degree of misspecification through a plot of $c_t = \hat e_t / \hat v_t$. The subsequent rows show the estimated densities of the Strict ESR test (solid black lines) and the Auxiliary ESR test (dashed green lines) for the quantile-specific parameters $\hat\beta_T$, the ES-specific parameter $\hat\gamma_T$ and the respective test statistics $T_{SESR}$ and $T_{AESR}$.

For this, we generate 1000 Monte-Carlo replications of the AR-GARCH DGP, given in Equation (2.2), of length $T = 1000$ with varying AR parameter $\phi \in \{0, 0.1, 0.5\}$, and generate optimal VaR and ES forecasts $\hat v_t$ and $\hat e_t$. As already specified in Equation (2.3), for this DGP, the degree of misspecification in the quantile model of the Strict and the Intercept ESR tests is analytically tractable through the magnitude of $\phi$, where $\phi = 0$ represents the case of a correct specification. In contrast, the case $\phi = 0.5$, which is highly unrealistic for financial returns, can be seen as a worst-case scenario. For each simulated time series, we estimate the underlying regression equations of the Auxiliary and the Strict ESR tests, given in Equations (1.8) and (1.11), and denote the respective parameter estimates through the superscripts, $\hat\gamma_T^{AESR}$ and $\hat\gamma_T^{SESR}$. Under the null hypothesis, the (average) difference of the estimated parameters $\hat\gamma_T^{AESR}$ and $\hat\gamma_T^{SESR}$ governs the (average) effect the misspecification has on the ES-specific parameters.

Figure 5 illustrates the results, where the different columns of the plots correspond to the different values of $\phi$. In each column, the first row illustrates the degree of misspecification through a plot of $c_t = \hat e_t / \hat v_t$. The subsequent rows illustrate the estimated densities of the Strict and Auxiliary ESR tests for the quantile-specific parameters $\hat\beta_T$, the ES-specific parameter $\hat\gamma_T$ and the respective test statistics $T_{SESR}$ and $T_{AESR}$.

For the quantile-specific parameters $\hat\beta_T$, we can verify the expected misspecified behavior of the quantile regression equation through differing parameter estimates between the Strict and Auxiliary ESR tests. For $\phi = 0$, $c_t$ is constant and the quantile regression is correctly specified, however with a slope coefficient generally unequal to 1.
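The statement that $c_t$ is constant for $\phi = 0$ and time-varying otherwise can be checked with a small simulation. The following Python sketch is only an illustration under stated assumptions: it uses a zero-intercept AR(1)-GARCH(1,1) process with Gaussian innovations, and the GARCH parameters `omega`, `a`, `b`, the level `ALPHA` and the helper name `simulate_ratio` are hypothetical choices, not the specification of Equation (2.2).

```python
import random
from statistics import NormalDist

ALPHA = 0.025  # assumed probability level; this section does not state it


def simulate_ratio(phi, T=1000, omega=0.01, a=0.1, b=0.85, seed=0):
    """Return the path c_t = e_t / v_t for a zero-intercept AR(1)-GARCH(1,1).

    Hypothetical helper: omega, a, b and the Gaussian innovations are
    illustrative assumptions, not the values of Equation (2.2).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(ALPHA)          # innovation quantile
    xi_alpha = -std_normal.pdf(z_alpha) / ALPHA  # E[Z | Z <= z_alpha]
    y_prev = 0.0
    sigma2 = omega / (1.0 - a - b)               # unconditional variance
    ratios = []
    for _ in range(T):
        mu = phi * y_prev                        # conditional mean
        sigma = sigma2 ** 0.5                    # conditional volatility
        v = mu + sigma * z_alpha                 # optimal VaR forecast
        e = mu + sigma * xi_alpha                # optimal ES forecast
        ratios.append(e / v)
        eps = sigma * rng.gauss(0.0, 1.0)        # GARCH innovation
        y_prev = mu + eps
        sigma2 = omega + a * eps ** 2 + b * sigma2
    return ratios


for phi in (0.0, 0.1, 0.5):
    c = simulate_ratio(phi)
    print(f"phi={phi}: min c_t={min(c):.3f}, max c_t={max(c):.3f}")
```

In this sketch the conditional mean vanishes for $\phi = 0$, so the ratio reduces to the constant $\xi_\alpha / z_\alpha$ and the printed minimum and maximum coincide up to floating-point rounding. For $\phi > 0$ the conditional mean enters both forecasts and the ratio varies over time.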
For an increasing degree of misspecification, we find, as expected, an increasing discrepancy between the estimates $\hat\beta_T^{SESR}$ and $\hat\beta_T^{AESR}$. In contrast, this effect is almost negligible for the ES-specific parameter estimates $\hat\gamma_T^{SESR}$ compared to $\hat\gamma_T^{AESR}$. This illustrates that the misspecification mainly affects the quantile parameters while leaving the ES-specific parameters almost unchanged, even for the worst-case scenario of $\phi = 0.5$. The same approximation behavior can be observed for the associated test statistics reported in the last row of plots. This finding explains the almost identical behavior of the Auxiliary and the Strict ESR tests in the simulation exercises in Section 2.

11 We use this as a proxy for the approximation of $T_{SESR} = \tilde T_{SESR}((0, 1))$ by $\tilde T_{SESR}(\gamma_T^*)$ as $\tilde T_{SESR}(\gamma_T^*)$ is unfortunately not observable.

References

Acerbi, C., and Szekely B. 2014. Backtesting Expected Shortfall. Risk Magazine, December: 76–81.
Angrist, J., Chernozhukov V., and Fernandez-Val I. 2006. Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure. Econometrica 74: 539–563.
Aramonte, S., Durand P., Kobayashi S., Kwast M., Lopez J. A., Mazzoni G., Raupach P., Summer M., and Wu J. 2011. “Messages from the Academic Literature on Risk Measurement for the Trading Book.” Technical report, Bank for International Settlements. Working paper No. 19. Available at https://www.bis.org/publ/bcbs_wp19.pdf. Accessed on 23 May 2020.
Ardia, D., Boudt K., and Catania L. 2019. Generalized Autoregressive Score Models in R: The GAS Package. Journal of Statistical Software 88: 1–28.
Artzner, P., Delbaen F., Eber J.-M., and Heath D. 1999. Coherent Measures of Risk. Mathematical Finance 9: 203–228.
Barendse, S. 2020. “Efficiently Weighted Estimation of Tail and Interquartile Expectations.” Working paper. Available at https://drive.google.com/file/d/1nI0QAWbM_VchAZDVg79p2vJcKoCrQB8o/view. Accessed on 23 May 2020.
Basel Committee 1996. “Overview of the Amendment to the Capital Accord to Incorporate Market Risks.” Technical report, Bank for International Settlements. Available at http://www.bis.org/publ/bcbs23.pdf. Accessed on 23 May 2020.
Basel Committee 2013. “Fundamental Review of the Trading Book: A Revised Market Risk Framework.” Technical report, Bank for International Settlements. Available at http://www.bis.org/publ/bcbs265.pdf. Accessed on 23 May 2020.
Basel Committee 2016. “Minimum Capital Requirements for Market Risk.” Technical report, Bank for International Settlements. Available at http://www.bis.org/bcbs/publ/d352.pdf. Accessed on 23 May 2020.
Basel Committee 2017. “Pillar 3 Disclosure Requirements – Consolidated and Enhanced Framework.” Technical report, Basel Committee on Banking Supervision. Available at http://www.bis.org/bcbs/publ/d400.pdf. Accessed on 23 May 2020.
Bayer, S., and Dimitriadis T. 2019a. esback: Expected Shortfall Backtesting. R package version 0.3.0. Available at https://CRAN.R-project.org/package=esback.
Bayer, S., and Dimitriadis T. 2019b. esreg: Joint Quantile and Expected Shortfall Regression. R package version 0.5.0. Available at https://CRAN.R-project.org/package=esreg.
Bellini, F., Klar B., Müller A., and Gianin E. R. 2014. Generalized Quantiles as Risk Measures. Insurance: Mathematics and Economics 54: 41–48.
Bollerslev, T. 1986. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics 31: 307–327.
Carver, L. 2013. Mooted VAR Substitute Cannot Be Back-Tested, Says Top Quant. Risk Magazine, March.
Cont, R., Deguest R., and Scandolo G. 2010. Robustness and Sensitivity Analysis of Risk Measurement Procedures. Quantitative Finance 10: 593–606.
Costanzino, N., and Curran M. 2015. Backtesting General Spectral Risk Measures with Application to Expected Shortfall. Risk Magazine, March.
Costanzino, N., and Curran M. 2018. A Simple Traffic Light Approach to Backtesting Expected Shortfall. Risks 6: 1–7.
Couperier, O., and Leymarie J. 2019. “Backtesting Expected Shortfall via Multi-Quantile Regression.” Working paper. Available at https://halshs.archives-ouvertes.fr/halshs-01909375v4/document. Accessed on 23 May 2020.
Creal, D., Koopman S. J., and Lucas A. 2013. Generalized Autoregressive Score Models with Applications. Journal of Applied Econometrics 28: 777–795.
Danielsson, J., Embrechts P., Goodhart C., Keating C., Muennich F., Renault O., and Shin H. S. 2001. An Academic Response to Basel II. Financial Markets Group Special Papers. Available at https://www.research-collection.ethz.ch/handle/20.500.11850/145525. Accessed on 23 May
Dimitriadis, T., and Bayer S. 2019. A Joint Quantile and Expected Shortfall Regression Framework. Electronic Journal of Statistics 13: 1823–1871.
Du, Z., and Escanciano J. C. 2017. Backtesting Expected Shortfall: Accounting for Tail Risk. Management Science 63: 940–958.
Efron, B. 1991. Regression Percentiles Using Asymmetric Squared Error Loss. Statistica Sinica 1: 93–125.
Embrechts, P., Liu H., and Wang R. 2018. Quantile-Based Risk Sharing. Operations Research 66: 936–949.
Emmer, S., Kratz M., and Tasche D. 2015. What is the Best Risk Measure in Practice? A Comparison of Standard Measures. The Journal of Risk 18: 31–60.
Engle, R. F., and Russell J. R. 1998. Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica 66: 1127–1162.
Fissler, T., and Ziegel J. F. 2016. Higher Order Elicitability and Osband’s Principle. The Annals of Statistics 44: 1680–1707.
Fissler, T., Ziegel J. F., and Gneiting T. 2016. Expected Shortfall is Jointly Elicitable with Value at Risk - Implications for Backtesting. Risk Magazine, January: 58–61.
Gaglianone, W. P., Lima L. R., Linton O., and Smith D. R. 2011. Evaluating Value-at-Risk Models via Quantile Regression. Journal of Business & Economic Statistics 29: 150–160.
Gikhman, I., and Skorokhod A. 2004. The Theory of Stochastic Processes I, Volume 210 of Classics in Mathematics. Springer Verlag Berlin Heidelberg.
Glosten, L. R., Jagannathan R., and Runkle D. E. 1993. On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance 48: 1779–1801.
Gneiting, T. 2011. Making and Evaluating Point Forecasts. Journal of the American Statistical Association 106: 746–762.
Gourieroux, C., Monfort A., and Trognon A. 1984. Pseudo Maximum Likelihood Methods: Theory. Econometrica 52: 681–700.
Graham, A., and Pál J. 2014. Backtesting Value-at-Risk Tail Losses on a Dynamic Portfolio. The Journal of Risk Model Validation 8: 59.
Guler, K., Ng P. T., and Xiao Z. 2017. Mincer–Zarnowitz Quantile and Expectile Regressions for Forecast Evaluations under Asymmetric Loss Functions. Journal of Forecasting 36: 651–679.
Harvey, A. 2013. Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series. Econometric Society Monographs. Cambridge, UK: Cambridge University Press.
Hendricks, W., and Koenker R. 1992. Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association 87: 58–68.
Holden, K., and Peel D. A. 1990. On Testing for Unbiasedness and Efficiency of Forecasts. The Manchester School 58: 120–127.
Huber, P. 1967. “The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions.” In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 221–233. Berkeley: University of California Press.
Kerkhof, J., and Melenberg B. 2004. Backtesting for Risk-Based Regulatory Capital. Journal of Banking & Finance 28: 1845–1865.
Kim, T.-H., and White H. 2003. “Estimation, Inference, and Specification Testing for Possibly Misspecified Quantile Regression.” In T. B. Fomby, and R. Carter Hill (eds.), Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, pp. 107–132. Emerald Group Publishing Limited.
Koenker, R. W., and Bassett G. 1978. Regression Quantiles. Econometrica 46: 33–50.
Komunjer, I. 2004. “Quantile Prediction.” In G. Elliott, and A. Timmermann (eds.), Handbook of Economic Forecasting, vol. 2, chapter 17, pp. 961–994. Amsterdam: Elsevier.
Komunjer, I. 2005. Quasi-Maximum Likelihood Estimation for Conditional Quantiles. Journal of Econometrics 128: 137–164.
Koopman, S. J., Lucas A., and Scharth M. 2016. Predicting Time-Varying Parameters with Parameter-Driven and Observation-Driven Models. Review of Economics and Statistics 98: 97–110.
Kratz, M., Lok Y. H., and McNeil A. J. 2018. Multinomial VaR Backtests: A Simple Implicit Approach to Backtesting Expected Shortfall. Journal of Banking & Finance 88: 393–407.
Lloyd, C. J. 2005. Estimating Test Power Adjusted for Size. Journal of Statistical Computation and Simulation 75: 921–933.
Löser, R., Wied D., and Ziggel D. 2018. New Backtests for Unconditional Coverage of Expected Shortfall. Journal of Risk 21: 1–21.
McNeil, A. J., and Frey R. 2000. Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: An Extreme Value Approach. Journal of Empirical Finance 7: 271–300.
Mincer, J., and Zarnowitz V. 1969. “The Evaluation of Economic Forecasts.” In J. Mincer (ed.), Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance, pp. 3–64. New York: National Bureau of Economic Research.
Nadarajah, S., Zhang B., and Chan S. 2014. Estimation Methods for Expected Shortfall. Quantitative Finance 14: 271–291.
Nelson, D. B. 1991. Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica 59: 347–370.
Newey, W., and McFadden D. 1994. “Large Sample Estimation and Hypothesis Testing.” In R. Engle, and D. McFadden (eds.), Handbook of Econometrics, vol. 4, chapter 36, pp. 2111–2245. Amsterdam: Elsevier.
Nolde, N., and Ziegel J. F. 2017. Elicitability and Backtesting: Perspectives for Banking Regulation. The Annals of Applied Statistics 11: 1833–1874.
Patton, A. J., Ziegel J. F., and Chen R. 2019. Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk). Journal of Econometrics 211: 388–413.
Righi, M. B., and Ceretta P. S. 2013. Individual and Flexible Expected Shortfall Backtesting. The Journal of Risk Model Validation 7: 3–20.
Righi, M. B., and Ceretta P. S. 2015. A Comparison of Expected Shortfall Estimation Models. Journal of Economics and Business 78: 14–47.
Weber, S. 2006. Distribution Invariant Risk Measures, Information, and Dynamic Consistency. Mathematical Finance 16: 419–441.
Weiss, A. A. 1991. Estimating Nonlinear Dynamic Models Using Least Absolute Error Estimation. Econometric Theory 7: 46–68.
White, H. 1980. Using Least Squares to Approximate Unknown Regression Functions. International Economic Review 21: 149–170.
White, H. 1994. Estimation, Inference and Specification Analysis. Econometric Society Monographs. Cambridge: Cambridge University Press.
White, H. 2001. Asymptotic Theory for Econometricians. San Diego: Academic Press.
Wong, W. K. 2008. Backtesting Trading Risk of Commercial Banks Using Expected Shortfall. Journal of Banking & Finance 32: 1404–1415.
Yamai, Y., and Yoshiba T. 2002. On the Validity of Value-at-Risk: Comparative Analyses with Expected Shortfall. Monetary and Economic Studies 20: 57–85.
Ziegel, J. F. 2016. Coherence and Elicitability. Mathematical Finance 26: 901–918.
Journal of Financial Econometrics – Oxford University Press
Published: Jun 8, 2022