References
Joshua Angrist, V. Chernozhukov, Iván Fernández-Val (2004). Quantile Regression Under Misspecification, with an Application to the U.S. Wage Structure. Econometrics eJournal.
Iván Fernández-Val, J. Angrist, V. Chernozhukov (2004). Quantile Regression under Misspecification.
Tobias Fissler, J. Ziegel (2015). Higher order elicitability and Osband’s principle. Annals of Statistics, 44.
P. Hansen, Asger Lunde, James Nason (2010). The Model Confidence Set. Econometrics eJournal.
Jeremy Berkowitz (2001). Testing Density Forecasts, With Applications to Risk Management. Journal of Business & Economic Statistics, 19.
W. Wong (2008). Backtesting Trading Risk of Commercial Banks Using Expected Shortfall. Banking & Insurance eJournal.
W. Hendricks, R. Koenker (1990). Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association, 87.
David Ardia, Kris Boudt, Leopoldo Catania (2016). Generalized Autoregressive Score Models in R: The GAS Package. ERN: Forecasting Techniques (Topic).
R. Engle, Jeffrey Russell (1998). Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica, 66.
Nick Costanzino, Michael Curran (2015). A Simple Traffic Light Approach to Backtesting Expected Shortfall. ERN: Value-at-Risk (Topic).
A. McNeil, R. Frey (2000). Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. Journal of Empirical Finance, 7.
A. Weiss (1991). Estimating Nonlinear Dynamic Models Using Least Absolute Error Estimation. Econometric Theory, 7.
K. Holden, D. Peel (1990). On Testing for Unbiasedness and Efficiency of Forecasts. The Manchester School, 58.
(1996). Overview of the Amendment to the Capital Accord to Incorporate Market Risks.
J. Mincer (1970). Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance.
S. Koopman, A. Lucas, Marcel Scharth (2012). Predicting Time-Varying Parameters with Parameter-Driven and Observation-Driven Models. Review of Economics and Statistics, 98.
Jeroen Kerkhof, B. Melenberg (2002). Backtesting for Risk-Based Regulatory Capital. Risk Management.
J. Ziegel (2013). Coherence and Elicitability. Mathematical Finance, 26.
(2014). ‘Fundamental Review of the Trading Book: A Revised Market Risk Framework’ Second Consultative Document by the Basel Committee on Banking Supervision: Impact Analysis.
Tobias Fissler, J. Ziegel, T. Gneiting (2015). Expected Shortfall is jointly elicitable with Value at Risk: Implications for backtesting. arXiv: Risk Management.
B. Efron, R. Tibshirani (1994). An Introduction to the Bootstrap.
P. Embrechts, Haiyan Liu, Ruodu Wang (2017). Quantile-Based Risk Sharing. Risk Management eJournal.
M. Righi, Paulo Ceretta (2013). Individual and Flexible Expected Shortfall Backtesting. ERN: Simulation Methods (Topic).
(2011). Messages from the academic literature on risk measurement for the trading book.
(2016). Pillar 3 disclosure requirements – consolidated and enhanced framework. Technical report, Basel Committee on Banking Supervision.
Fabio Bellini, B. Klar, A. Müller, Emanuela Gianin (2013). Generalized Quantiles as Risk Measures. Microeconomics: Decision-Making under Risk & Uncertainty eJournal.
G. Barone-Adesi, K. Giannopoulos, L. Vosper (1999). VaR without correlations for portfolios of derivative securities. Journal of Futures Markets, 19.
S. Nadarajah, Bo Zhang, S. Chan (2014). Estimation methods for expected shortfall. Quantitative Finance, 14.
Drew Creal, S. Koopman, A. Lucas (2013). Generalized Autoregressive Score Models with Applications. Journal of Applied Econometrics, 28.
Zaichao Du, Juan Carlos Escanciano (2017). Backtesting Expected Shortfall. Management Science.
Susanne Emmer, M. Kratz, Dirk Tasche (2013). What is the Best Risk Measure in Practice? A Comparison of Standard Measures. Econometric Modeling: Capital Markets - Risk eJournal.
Zaichao Du, J. Escanciano (2015). Backtesting Expected Shortfall: Accounting for Tail Risk. Monetary Economics eJournal.
M. Righi, Paulo Ceretta (2014). A Comparison of Expected Shortfall Estimation Models. Econometrics: Econometric Model Construction.
Cox (1977). The Theory Of Stochastic Processes. The Mathematical Gazette, 61.
Daniel Nelson (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59.
Nick Costanzino, Michael Curran (2015). Backtesting General Spectral Risk Measures with Application to Expected Shortfall. Econometrics: Econometric & Statistical Methods - Special Topics eJournal.
C. Lloyd (2005). Estimating test power adjusted for size. Journal of Statistical Computation and Simulation, 75.
Regression Percentiles Using Asymmetric Squared Error Loss.
Kemal Guler, Pin Ng, Zhijie Xiao (2017). Mincer–Zarnowitz quantile and expectile regressions for forecast evaluations under asymmetric loss functions. Journal of Forecasting, 36.
Whitney Newey, D. McFadden (1986). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4.
R. Cont, Romain Deguest, Giacomo Scandolo (2008). Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance, 10.
Savas Papadopoulos, Pantelis Stavroulias, T. Sager, Etti Baranoff (2018). A Three-State Early Warning System for the European Union. European Economics: Macroeconomics & Monetary Economics eJournal.
(2013). Mooted VAR substitute cannot be back-tested, says top quant.
Timo Dimitriadis, Sebastian Bayer (2017). A joint quantile and expected shortfall regression framework. Electronic Journal of Statistics.
T. Gneiting (2009). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106.
Stefan Weber (2006). Distribution-Invariant Risk Measures, Information, and Dynamic Consistency. Mathematical Finance, 16.
Andrew Patton, J. Ziegel, Rui Chen (2017). Dynamic Semiparametric Models for Expected Shortfall (and Value-At-Risk). ERN: Time-Series Models (Single) (Topic).
Michael Curran (2014). Backtesting Expected Shortfall.
S. Kotz (1974). The Theory Of Stochastic Processes I.
Marilena Furno, Domenico Vistocco (2018). Quantile Regression. Wiley Series in Probability and Statistics.
J. Orgeldinger (2018). Recent Issues in the Implementation of the New Basel Minimum Capital Requirements for Market Risk, 2.
Alasdair Graham, János Pál (2014). Backtesting value-at-risk tail losses on a dynamic portfolio. The Journal of Risk Model Validation, 8.
Ivana Komunjer (2005). Quasi-maximum likelihood estimation for conditional quantiles. Journal of Econometrics, 128.
H. White (1985). Asymptotic theory for econometricians.
C. Gouriéroux, A. Monfort, A. Trognon (1984). Pseudo Maximum Likelihood Methods: Theory. Econometrica, 52.
Halbert White Jr., Tae-Hwan Kim (2002). Estimation, Inference, and Specification Testing for Possibly Misspecified Quantile Regression. Econometrics eJournal.
A. Harvey (2013). Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series.
Sander Barendse (2017). Efficiently Weighted Estimation of Tail and Interquantile Expectations. Econometrics: Econometric & Statistical Methods - General eJournal.
J. MacKinnon (2007). Bootstrap Hypothesis Testing.
L. Glosten, R. Jagannathan, D. Runkle (1993). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance, 48.
Yasuhiro Yamai, Toshinao Yoshiba (2002). On the Validity of Value-at-Risk: Comparative Analyses with Expected Shortfall. Monetary and Economic Studies, 20.
W. Gaglianone, L. Lima, Oliver Linton, Daniel Smith (2008). Evaluating Value-at-Risk Models via Quantile Regression. Journal of Business & Economic Statistics, 29.
Robert Löser, Dominik Wied, D. Ziggel (2018). New Backtests for Unconditional Coverage of Expected Shortfall. Journal of Risk.
David Harvey (1997). The evaluation of economic forecasts.
H. White (1980). Using Least Squares to Approximate Unknown Regression Functions. International Economic Review, 21.
Philippe Artzner, F. Delbaen, J. Eber, D. Heath (1999). Coherent Measures of Risk. Mathematical Finance, 9.
D. Hinkley (2008). Bootstrap Methods: Another Look at the Jackknife.
N. Nolde, J. Ziegel (2016). Elicitability and backtesting: Perspectives for banking regulation. The Annals of Applied Statistics, 11.
Sebastian Bayer, Timo Dimitriadis (2019a). esback: Expected Shortfall Backtesting. R package version 0.3.0.
M. Kratz, Y. Lok, A. McNeil (2016). Multinomial VAR Backtests: A Simple Implicit Approach to Backtesting Expected Shortfall. Risk Management eJournal.
R. Koenker, G. Bassett (2007). Regression Quantiles.
P. Embrechts, Jón Daníelsson, C. Goodhart, C. Keating, F. Muennich, O. Renault, H. Shin (2001). An academic response to Basel II, 130.
James Taylor (2019). Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution. Journal of Business & Economic Statistics, 37.
Angrist (2006). Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure. Econometrica, 74.
Ophélie Couperier, J. Leymarie (2020). Backtesting Expected Shortfall via Multi-Quantile Regression.
Drew Creal, S. Koopman, A. Lucas (2011). Generalized Autoregressive Score Models with Applications.
T. Bollerslev (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31.
H. Bierens, H. White (1996). Estimation, Inference and Specification Analysis. Journal of the American Statistical Association, 91.
P. Huber (1967). The behavior of maximum likelihood estimates under nonstandard conditions.
This paper introduces novel backtests for the risk measure Expected Shortfall (ES) following the testing idea of Mincer and Zarnowitz (1969). Estimating a regression framework for the ES stand-alone is infeasible, and thus, our tests are based on a joint regression for the Value at Risk and the ES, which allows for different test specifications. These ES backtests are the first which solely backtest the ES in the sense that they only require ES forecasts as input parameters. As the tests are potentially subject to model misspecification, we provide asymptotic theory under misspecification for the underlying joint regression. We find that employing a misspecification-robust covariance estimator substantially improves the tests' performance. We compare our backtests to existing approaches and find that our tests outperform the competitors throughout all considered simulations. In an empirical illustration, we apply our backtests to ES forecasts for 200 stocks of the S&P 500 index.

JEL Codes: C12, C32, C52, C53, C58, G32
Keywords: Expected Shortfall, Backtesting, Mincer-Zarnowitz Regression, Forecast Evaluation, Model Misspecification, Asymptotic Theory

1. Introduction

Through the transition from Value at Risk (VaR) to Expected Shortfall (ES) as the primary market risk measure in the Basel Accords (Basel Committee, 2016, 2017), there is a great demand for reliable methods for estimating, forecasting and backtesting the ES. Formally, the ES at level $\alpha \in (0,1)$ is defined as the mean of the returns smaller than the respective $\alpha$-quantile (the VaR), where $\alpha$ is usually chosen to be 2.5% as stipulated by the Basel Accords. The ES was introduced into banking regulation because it overcomes several shortcomings of the VaR, such as its lack of coherence and its inability to capture tail risks beyond the $\alpha$-quantile (Artzner et al., 1999; Danielsson et al., 2001; Basel Committee, 2013).
In contrast to the estimation and forecasting of ES, where most of the existing models for the VaR can easily be adapted and generalized to the ES, such a generalization is not as straightforward for backtesting ES forecasts (Emmer et al., 2015). In general, backtesting of a risk measure is the process of testing whether given forecasts for this risk measure are correctly specified, which is carried out by comparing the history of the issued risk forecasts with the corresponding realized returns. The primary difficulty in directly backtesting ES is its non-elicitability and non-identifiability (Weber, 2006; Gneiting, 2011; Fissler and Ziegel, 2016; Fissler et al., 2016): consequently, there is no analog of the hit sequence, which is the natural identification function of quantiles and lies at the heart of almost all VaR backtests.¹ As a result, most of the procedures proposed in the growing literature on backtesting ES use indirect approaches by formally backtesting some quantity which is closely related to the ES. Examples include tests based on the entire tail distribution, a linear approximation of the ES through several quantiles, or the pair consisting of the VaR and the ES.² We argue that, formally, these approaches are backtests for the auxiliary quantities rather than for the ES itself; see also Nolde and Ziegel (2017). This distinction is particularly important as these backtests require further input parameters such as forecasts for the VaR at multiple levels, the tail distribution beyond some quantile, or even the entire distribution.

Corresponding author: Heidelberg Institute for Theoretical Studies, Heidelberg, Germany and University of Hohenheim, Germany, e-mail: timo.dimitriadis@h-its.org
University of Konstanz, Konstanz, Germany, e-mail: sebastian.bayer@uni-konstanz.de
arXiv:1801.04112v2 [q-fin.RM] 21 Sep 2019
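To make the contrast concrete: the hit sequence $\mathbb{1}\{Y_t \le \hat v_t\} - \alpha$ has zero conditional mean exactly when $\hat v_t$ is the true $\alpha$-quantile, which is what makes VaR backtests straightforward. The Python sketch below computes it together with a simple unconditional coverage z-statistic. The i.i.d. Bernoulli variance used for the standardisation, the simulated standard normal returns and the function names are illustrative assumptions, not part of this paper's methodology.

```python
import random
from statistics import NormalDist


def hit_sequence(returns, var_forecasts, alpha):
    """Quantile identification function 1{Y_t <= v_t} - alpha for each day t."""
    return [(1.0 if y <= v else 0.0) - alpha for y, v in zip(returns, var_forecasts)]


def unconditional_coverage_z(hits, alpha):
    """z-statistic for H0: E[hit_t] = 0, standardised with the variance
    alpha * (1 - alpha) of i.i.d. Bernoulli(alpha) violations (an assumption)."""
    n = len(hits)
    mean_hit = sum(hits) / n
    return mean_hit / (alpha * (1.0 - alpha) / n) ** 0.5


# Usage: i.i.d. standard normal returns with constant VaR forecasts.
rng = random.Random(1)
alpha = 0.025
y = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
correct = [NormalDist().inv_cdf(alpha)] * len(y)   # true 2.5% quantile
too_risky = [NormalDist().inv_cdf(0.05)] * len(y)  # VaR set too high (too risky)
print(unconditional_coverage_z(hit_sequence(y, correct, alpha), alpha))
print(unconditional_coverage_z(hit_sequence(y, too_risky, alpha), alpha))
```

The correct forecasts should give a z-statistic close to zero, while the too risky forecasts produce roughly twice as many violations as expected and hence a large positive z-statistic. No comparable identification function exists for the ES on its own.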
However, the regulatory authorities do not have this additional information at hand, as it is not mandatorily reported by the financial institutions (Aramonte et al., 2011; Basel Committee, 2016, 2017). As a consequence, the existing, so-called ES backtests are not applicable where they are most needed. In this paper, we propose novel backtests for ES forecasts which are the first strict ES backtests in the literature in the sense that, besides the realized returns, they only require ES forecasts as input parameters. Our tests follow the general regression-based testing idea of Mincer and Zarnowitz (1969). For this, we estimate a regression framework which models the conditional ES at level $\alpha$ as a linear function $\mathrm{ES}_\alpha(Y_t \mid \mathcal{F}_{t-1}) = \gamma_1 + \gamma_2 \hat e_t$, where we use the financial returns $Y_t$ as the response variable and the given ES forecasts $\hat e_t$ as the explanatory variable, including an intercept term. For correctly specified ES forecasts, the intercept and slope parameters equal zero and one, respectively, which we test using a Wald statistic. As the ES is not elicitable (Gneiting, 2011), we face the methodological difficulty that we cannot estimate such a regression framework for the ES stand-alone, as neither loss nor identification functions are available for the ES which could be used as objective functions for M- or GMM-estimation (Dimitriadis and Bayer, 2019). Recently, Patton et al. (2019a) and Dimitriadis and Bayer (2019) proposed a feasible alternative by specifying an auxiliary quantile regression equation $Q_\alpha(Y_t \mid \mathcal{F}_{t-1}) = \beta_1 + \beta_2 \hat z_t$ (with explanatory variable $\hat z_t$) and by jointly estimating the regression parameters $(\beta, \gamma)$ using a joint loss function for the quantile and the ES from Fissler and Ziegel (2016). The specification of the quantile equation allows for different testing approaches. First, we employ auxiliary VaR forecasts $\hat v_t$ as the explanatory variable in the quantile equation, but only test the ES-specific parameters $\gamma$.
We refer to this test as the Auxiliary ESR (ES Regression) backtest. The main drawback of this test is that it requires auxiliary VaR forecasts; consequently, it is formally a joint backtest for the VaR and ES which, however, mainly focuses on the ES by only testing the ES-specific regression parameters. Second, we use the ES forecasts $\hat e_t$ as the explanatory variable in both the quantile and the ES equation and again only test the ES-specific parameters $\gamma$. We refer to this test as the Strict ESR backtest, as it only requires ES forecasts as input parameters and consequently is the first test in the literature which solely backtests ES forecasts. This testing idea comes with the drawback of a potential model misspecification in the quantile equation if the underlying data go beyond a pure scale (volatility) model. Therefore, we provide asymptotic theory for this joint quantile and ES regression framework under model misspecification, which generalizes the asymptotic theory introduced in Dimitriadis and Bayer (2019) and Patton et al. (2019a). The potential model misspecification results in a more complex and usually inflated asymptotic covariance matrix. We account for this in the implementation of our tests by employing a new covariance estimation technique which explicitly estimates these additional covariance terms.

We further introduce an intercept variant of the Strict ESR backtest by fixing the slope parameter in the regression to one and by only estimating and testing the intercept term. We refer to this backtest as the Intercept ESR backtest. This test allows for testing against both one-sided and two-sided alternatives. In contrast, the other two proposed ESR backtests only allow for testing against two-sided alternatives, as it is generally unclear how underestimated and overestimated ES forecasts influence the intercept and slope parameters. Because the capital requirements that financial institutions must keep as a reserve depend on the reported risk forecasts, market participants have an incentive to report risk forecasts which are too risky, i.e. which understate the risk, in order to minimize the expensive capital requirements. In contrast, issuing too conservative risk forecasts results in larger capital reserves, which need not be punished by the regulatory authorities. Thus, the regulators only have to prevent and penalize the underestimation of financial risks, which demonstrates the necessity of one-sided testing procedures. For example, the currently applied traffic light system (Basel Committee, 1996) is in fact a one-sided VaR backtest. Like the Strict ESR backtest, the Intercept ESR backtest only requires ES forecasts as input parameters and consequently is the first procedure that solely backtests the ES against a one-sided alternative. We provide implementations of the three ESR backtests proposed in this paper in the R package esback (Bayer and Dimitriadis, 2019a).

¹ See Yamai and Yoshiba (2002); Kerkhof and Melenberg (2004); Carver (2013); Acerbi and Szekely (2014); Emmer et al. (2015); Ziegel (2016); Fissler et al. (2016); Nolde and Ziegel (2017) for the ongoing discussion on the backtestability of the ES.
² In particular, in addition to the ES forecasts, several tests require the whole or tail distribution of the returns or, equivalently, the cumulative violation process (Kerkhof and Melenberg, 2004; Wong, 2008; Graham and Pál, 2014; Acerbi and Szekely, 2014; Du and Escanciano, 2017; Löser et al., 2018; Costanzino and Curran, 2018), multiple quantiles at different levels (Emmer et al., 2015; Costanzino and Curran, 2015; Kratz et al., 2018; Couperier and Leymarie, 2019), the VaR and the volatility (McNeil and Frey, 2000; Nolde and Ziegel, 2017; Righi and Ceretta, 2013, 2015), or the VaR (McNeil and Frey, 2000; Nolde and Ziegel, 2017). See Appendix C for an overview of the existing backtesting approaches.
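The one-sided logic can be illustrated with a t-test on the intercept $\gamma_1$ of the Intercept ESR regression, formalised in (2.15). Because $\hat e_t$ is $\mathcal{F}_{t-1}$-measurable, a negative intercept means that the true conditional ES lies below the forecast, i.e. the forecasts understate tail risk. The sketch assumes that an estimate of $\gamma_1$ and its standard error are already available (the covariance estimator of Section 2.5 is not reproduced here) and uses a standard normal reference distribution; the function name is hypothetical.

```python
from statistics import NormalDist


def intercept_esr_test(gamma1_hat, se):
    """t-statistic for the intercept gamma_1 with one- and two-sided p-values.

    se is an estimated standard error of gamma1_hat, assumed to be given.
    """
    if se <= 0.0:
        raise ValueError("standard error must be positive")
    t = gamma1_hat / se
    nd = NormalDist()
    p_one_sided = nd.cdf(t)                     # H0: gamma_1 >= 0 vs H1: gamma_1 < 0
    p_two_sided = 2.0 * (1.0 - nd.cdf(abs(t)))  # H0: gamma_1 = 0 vs H1: gamma_1 != 0
    return t, p_one_sided, p_two_sided
```

For instance, $\hat\gamma_1 = -0.2$ with standard error 0.1 gives $t = -2$, a one-sided p-value of about 0.023 and a two-sided p-value of about 0.046. A positive estimate (conservative forecasts) yields a one-sided p-value above 0.5, so it is not rejected in the one-sided direction at any conventional level.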
Such regression-based forecast evaluation approaches are already used for testing mean forecasts (Mincer and Zarnowitz, 1969), quantile forecasts (Gaglianone et al., 2011; Guler et al., 2017), and expectile forecasts (Guler et al., 2017). In contrast to these functionals, for which regression techniques are readily available (see e.g. Koenker and Bassett, 1978; Efron, 1991), the non-elicitability of the ES makes our approach more involved but also opens up the possibility for the different testing specifications we introduce. Our multivariate generalization of the Mincer and Zarnowitz (1969) testing idea can equivalently be applied to other higher-order elicitable functionals (Fissler and Ziegel, 2016) such as the variance (in the presence of a non-zero mean) and the Range VaR (Cont et al., 2010; Embrechts et al., 2018).

We evaluate the empirical properties of our ESR backtests and compare them to the existing joint VaR and ES backtests of McNeil and Frey (2000) and Nolde and Ziegel (2017) through several simulation designs. In the first setup, we implement the classical size and power analysis for backtesting risk measures, where we simulate data stemming from several realistic data generating processes (DGPs) and evaluate the empirical rejection frequencies of the backtests for forecasts stemming from the true and from some misspecified forecasting model. In order to assess how the potential model misspecification affects the Strict and the Intercept ESR backtests, we utilize DGPs which go beyond the class of pure scale (volatility) processes. For this, we implement two different Student's t GAS models with time-varying higher moments (Creal et al., 2013) and furthermore use an AR-GARCH model which allows for gradually increasing the degree of misspecification through the AR parameter.
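A minimal simulation of an AR(1)-GARCH(1,1) process shows why a non-zero AR parameter takes the DGP outside the pure scale class. With Gaussian innovations, the true conditional VaR and ES are $\mu_t + \sigma_t z_\alpha$ and $\mu_t + \sigma_t \mathrm{ES}_\alpha(\varepsilon)$ with $\mu_t = \phi Y_{t-1}$ and $\mathrm{ES}_\alpha(\varepsilon) = -\varphi(z_\alpha)/\alpha$, so their ratio is constant only when $\phi = 0$. The Gaussian innovations, the parameter values and the function names are assumptions for illustration; the exact AR-GARCH specification used in the paper's simulations is not given in this section.

```python
import math
import random
from statistics import NormalDist


def std_normal_var_es(alpha):
    """alpha-quantile and lower-tail alpha-ES of the standard normal distribution."""
    nd = NormalDist()
    z = nd.inv_cdf(alpha)
    return z, -nd.pdf(z) / alpha


def simulate_ar_garch(n, phi, omega=0.01, a=0.1, b=0.85, alpha=0.025, seed=0):
    """Simulate Y_t = phi*Y_{t-1} + u_t, u_t = sigma_t*eps_t, eps_t ~ N(0, 1),
    sigma_t^2 = omega + a*u_{t-1}^2 + b*sigma_{t-1}^2 (hypothetical parameters).
    Returns the returns and the true one-step-ahead alpha-VaR and alpha-ES."""
    rng = random.Random(seed)
    z, es_z = std_normal_var_es(alpha)
    y_prev, u_prev = 0.0, 0.0
    sigma2 = omega / (1.0 - a - b)  # initialise at the unconditional variance
    ys, var_true, es_true = [], [], []
    for _ in range(n):
        sigma2 = omega + a * u_prev ** 2 + b * sigma2
        mu, sigma = phi * y_prev, math.sqrt(sigma2)
        var_true.append(mu + sigma * z)     # location-scale quantile
        es_true.append(mu + sigma * es_z)   # location-scale ES
        u_prev = sigma * rng.gauss(0.0, 1.0)
        y_prev = mu + u_prev
        ys.append(y_prev)
    return ys, var_true, es_true


def es_var_ratio_range(var_true, es_true):
    """Spread of ES_t / VaR_t over time; zero spread means perfect collinearity."""
    ratios = [e / v for v, e in zip(var_true, es_true)]
    return max(ratios) - min(ratios)
```

With `phi=0.0` the spread returned by `es_var_ratio_range` is zero up to rounding, whereas with `phi=0.5` it is clearly positive: the true VaR is then, in general, not a fixed linear function of the true ES, which is the kind of quantile-equation misspecification the Strict and Intercept ESR backtests have to cope with.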
In the second setup, we introduce a new technique for evaluating the power of backtests for financial risk measures, where we continuously misspecify certain model parameters of the data generating process to obtain a continuum of alternative models with a gradually increasing degree of misspecification. Misspecifying the different model parameters separately allows us to misspecify certain model characteristics (such as the reaction to shocks) in isolation, which permits a closer examination of the proposed backtesting procedures.

The simulations show that all three ESR backtests we propose in this paper are well-sized, especially when the tests are applied using the new covariance estimation method which accounts for possible model misspecification. We further find that the performance of our testing procedures is almost unaffected by the DGPs which cause model misspecification in the Strict and the Intercept ESR tests. Moreover, our tests are more powerful than the existing backtests of McNeil and Frey (2000) and Nolde and Ziegel (2017) in almost all of the considered simulation designs, for testing against both one-sided and two-sided alternatives. Notably, throughout all simulation designs, the ESR backtests are able to detect the various misspecifications of the forecasts. In contrast, the existing backtests sometimes completely fail to detect certain misspecifications, for instance when the forecaster reports risk forecasts for a misspecified probability level.

The rest of this paper is organized as follows. Section 2 introduces our new ESR backtests and presents asymptotic theory under model misspecification. Section 3 contains several simulation studies and Section 4 applies the backtests to ES forecasts for a large number of stocks from the S&P 500 index. Section 5 concludes. The proofs are deferred to Appendix A and Appendix B.

2. Theory

2.1. Setup and Notation

We consider a stochastic process
$$Z = \{ Z_t : \Omega \to \mathbb{R}^{l+1},\ l \in \mathbb{N},\ t = 1, \dots, T \}, \qquad (2.1)$$
defined on some complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with the filtration $\mathbb{F} = \{\mathcal{F}_t,\ t = 1, \dots, T\}$ and $\mathcal{F}_t = \sigma\{Z_s,\ s \le t\}$ for all $t = 1, \dots, T$, where $T \in \mathbb{N}$. We partition the stochastic process as $Z_t = (Y_t, U_t)$, where $Y_t$ is an absolutely continuous random variable of interest and $U_t$ is an $l$-dimensional vector of explanatory variables. We denote the conditional cumulative distribution function of $Y_t$ given the past information $\mathcal{F}_{t-1}$ by $F_t(y) = \mathbb{P}(Y_t \le y \mid \mathcal{F}_{t-1})$ and the corresponding probability density function by $f_t$. Whenever they exist, the mean and the variance of $F_t$ are denoted by $\mathbb{E}_t[\cdot]$ and $\mathrm{Var}_t(\cdot)$.

For financial applications, the variable $Y_t$ denotes the daily log returns of a financial asset (for instance, a stock or a portfolio), i.e. $Y_t = \log P_t - \log P_{t-1}$, where $P_t$ denotes the price of the asset at day $t = 1, \dots, T$. Throughout this paper, we therefore use the sign convention that positive returns denote profits and negative returns denote losses. The vector $U_t$ contains further variables that are used to produce forecasts for certain functionals (usually risk measures) of the random variable $Y_t$. We are interested in testing whether forecasts for a certain $d$-dimensional ($d \in \mathbb{N}$) functional (risk measure) $\rho_t = \rho(F_t)$ of the conditional distribution $F_t$ are correctly specified. For that, we define the most frequently used functionals for financial risk management in the following. The conditional quantile of $Y_t$ given the information set $\mathcal{F}_{t-1}$ at level $\alpha \in (0,1)$ is defined as $Q_\alpha(Y_t \mid \mathcal{F}_{t-1}) = F_t^{-1}(\alpha) = \inf\{ y \in \mathbb{R} : F_t(y) \ge \alpha \}$, which is called the VaR at level $\alpha$ in financial applications. Furthermore, we define the functional ES at level $\alpha$ of $Y_t$ given $\mathcal{F}_{t-1}$ as $\mathrm{ES}_\alpha(Y_t \mid \mathcal{F}_{t-1}) = \frac{1}{\alpha} \int_0^\alpha F_t^{-1}(s)\, ds$. If the distribution function $F_t$ is continuous at its $\alpha$-quantile, this definition can be simplified to the truncated tail mean of $Y_t$,
$$\mathrm{ES}_\alpha(Y_t \mid \mathcal{F}_{t-1}) = \mathbb{E}_t\big[ Y_t \mid Y_t \le Q_\alpha(Y_t \mid \mathcal{F}_{t-1}) \big]. \qquad (2.2)$$
We denote an $\mathcal{F}_{t-1}$-measurable one-step-ahead forecast for day $t$ for the risk measure $\rho$ of the distribution $F_t$, stemming from some external forecaster or from some given forecasting model,³ by $\hat\rho_t = \hat\rho_t(\mathcal{F}_{t-1})$. Following this notation, we denote forecasts for the $\alpha$-VaR by $\hat v_t$ and for the $\alpha$-ES by $\hat e_t$ for some fixed level $\alpha \in (0,1)$. For simplicity of notation, we drop the dependence on $\alpha$ as it is a fixed quantity. As both the incentive of the forecaster and the underlying method used to generate the forecasts are in general unknown, these forecasts are not necessarily correctly specified. The focus of this paper is to develop statistical tests for the correctness of a given series of forecasts $\{\hat\rho_t,\ t = 1, \dots, T\}$ for the risk measure $\rho$ relative to the realized return series $\{Y_t,\ t = 1, \dots, T\}$. In the literature, this is usually referred to as backtesting of the risk measure, without a strict definition of this terminology. We provide such a definition in the following.

Definition 2.1. A backtest for the series of forecasts $\{\hat\rho_t,\ t = 1, \dots, T\}$ for the $d$-dimensional risk measure (functional) $\rho$ relative to the realized return series $\{Y_t,\ t = 1, \dots, T\}$ is a function
$$f: \mathbb{R}^{T} \times \mathbb{R}^{Td} \to \{0, 1\}, \qquad (2.3)$$
which maps the return and forecast series onto the respective test decision.

The core message of this definition is that, besides the realized return series, a backtest for some risk measure is only allowed to require forecasts for this risk measure as input parameters.

³ For recent overviews on VaR and ES forecasting approaches, see Komunjer (2004) and Nadarajah et al. (2014).
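As a numerical check on the two ES definitions above, the sketch below approximates $\frac{1}{\alpha}\int_0^\alpha F^{-1}(s)\,ds$ for a standard normal distribution with the midpoint rule and compares it with the closed form $-\varphi(z_\alpha)/\alpha$ of the truncated tail mean (2.2). It also gives plug-in VaR and ES estimates from a sample. The standard normal example, the grid size and the function names are illustrative choices.

```python
import math
from statistics import NormalDist


def es_from_quantile_integral(inv_cdf, alpha, n_grid=100_000):
    """ES_alpha = (1/alpha) * int_0^alpha F^{-1}(s) ds, approximated by the midpoint rule."""
    h = alpha / n_grid
    return sum(inv_cdf((i + 0.5) * h) for i in range(n_grid)) * h / alpha


def empirical_var_es(sample, alpha):
    """Empirical alpha-quantile inf{y : F_n(y) >= alpha} and the mean of the
    k = ceil(alpha * n) smallest observations (a truncated tail mean)."""
    xs = sorted(sample)
    k = max(1, math.ceil(alpha * len(xs)))
    return xs[k - 1], sum(xs[:k]) / k


nd = NormalDist()
alpha = 0.025
closed_form = -nd.pdf(nd.inv_cdf(alpha)) / alpha      # truncated tail mean (2.2)
print(closed_form, es_from_quantile_integral(nd.inv_cdf, alpha))
```

For an empirical distribution, which is not continuous, the mean of the $\lceil \alpha n \rceil$ smallest observations coincides with the integral definition only when $\alpha n$ is an integer.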
This strict differentiation becomes relevant in the context of backtesting ES as, in contrast to the existing VaR backtests, the recently proposed ES backtests require further input parameters such as forecasts for the VaR, the volatility, or the entire tail distribution. The demand for these further quantities induces the following practical problems. First, the regulatory authorities who rely on such backtesting methods do not necessarily receive forecasts from the financial institutions for the additional information required by these tests, which makes such backtests inapplicable for the regulatory authorities. Second, a rejection of such a test does not necessarily imply that the ES is misspecified, but only that the forecasts for at least one of the input components are misspecified. Consequently, these tests are in fact not backtests for the ES, but rather backtests for some vector of risk measures (or the entire tail distribution).

2.2. The ESR Backtests

We propose backtests for the risk measure ES that test whether a series of ES forecasts $\{\hat e_t,\ t = 1, \dots, T\}$, stemming from some external forecaster or forecasting model, is correctly specified relative to a series of realized returns $\{Y_t,\ t = 1, \dots, T\}$. We follow the general testing idea of Mincer and Zarnowitz (1969) and regress the returns $Y_t$ on the forecasts $\hat e_t$ and an intercept term by using a regression equation designed specifically for the functional ES,
$$Y_t = \gamma_1 + \gamma_2 \hat e_t + u_t, \qquad (2.4)$$
where $\mathrm{ES}_\alpha(u_t \mid \mathcal{F}_{t-1}) = 0$ almost surely.
Given the structure in (2.4) and since the forecasts $\hat e_t$ are generated by using the information set $\mathcal{F}_{t-1}$, this condition on the error term is equivalent to
$$\mathrm{ES}_\alpha(Y_t \mid \mathcal{F}_{t-1}) = \gamma_1 + \gamma_2 \hat e_t. \qquad (2.5)$$
We then test the hypothesis
$$H_0: (\gamma_1, \gamma_2) = (0, 1) \quad \text{against} \quad H_1: (\gamma_1, \gamma_2) \neq (0, 1). \qquad (2.6)$$
Under $H_0$, the ES forecasts are correctly specified, as it holds that $\hat e_t = \mathrm{ES}_\alpha(Y_t \mid \mathcal{F}_{t-1})$ almost surely.⁴ In general, (2.4) is an example of a linear regression equation for the ES of the form $Y_t = W_t^\top \gamma + u_t^e$ for some general vector of covariates $W_t$. As outlined in Dimitriadis and Bayer (2019) and Patton et al. (2019a), estimating the parameters $\gamma$ stand-alone by M- or GMM-estimation is not possible, since there do not exist strictly consistent loss or identification functions for the functional ES (Gneiting, 2011). Based on the seminal work of Fissler and Ziegel (2016), who introduce joint loss and identification functions for the VaR and ES, Dimitriadis and Bayer (2019), Patton et al. (2019a) and Barendse (2018) propose the joint regression technique
$$Y_t = V_t^\top \beta + u_t^q \quad \text{and} \quad Y_t = W_t^\top \gamma + u_t^e, \qquad (2.7)$$
where $V_t$ and $W_t$ are $k$-dimensional, $\mathcal{F}_{t-1}$-measurable covariate vectors and where $Q_\alpha(u_t^q \mid \mathcal{F}_{t-1}) = 0$ and $\mathrm{ES}_\alpha(u_t^e \mid \mathcal{F}_{t-1}) = 0$ almost surely.

⁴ Given that the ES forecasts are correctly specified, i.e. $\hat e_t = \mathrm{ES}_\alpha(Y_t \mid \mathcal{F}_{t-1})$, the correct specification condition (2.5) is equivalent to $\gamma_1 = (1 - \gamma_2)\hat e_t$. This results in the remark of Holden and Peel (1990), who claim that the null hypothesis given in (2.6) is only a sufficient, but not a necessary condition for correctly specified forecasts, as $\gamma_1 = (1 - \gamma_2)\hat e_t$ is the required necessary condition. However, for any $(\gamma_1, \gamma_2) \neq (0, 1)$, this more general condition implies that the forecasts $\hat e_t = \gamma_1 / (1 - \gamma_2)$ are constant for all $t = 1, \dots, T$, which is highly unrealistic given the dynamic nature of financial time series. Consequently, we employ the hypotheses given in (2.6) for our backtesting procedure.
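Once an estimate $\hat\gamma_T$ and a covariance estimate are available from the joint regression, the hypothesis (2.6) is assessed with a Wald statistic of the form used in (2.10) and (2.13). A minimal sketch for the two-parameter case follows. It assumes that the supplied symmetric 2×2 matrix estimates the asymptotic covariance of $\sqrt{T}(\hat\gamma_T - \gamma)$ and uses the $\chi^2_2$ reference distribution that applies to Wald statistics under the usual regularity conditions. The function names are hypothetical, and the covariance estimators of Section 2.5 are not reproduced.

```python
import math


def esr_wald_statistic(gamma_hat, cov_gamma, T):
    """T * (g - (0, 1))' Sigma^{-1} (g - (0, 1)) for a symmetric 2x2 estimate
    Sigma = cov_gamma of the asymptotic covariance of sqrt(T) * (gamma_hat - gamma)."""
    d0, d1 = gamma_hat[0] - 0.0, gamma_hat[1] - 1.0
    (a, b), (c, d) = cov_gamma
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance estimate must be positive definite")
    # Sigma^{-1} = [[d, -b], [-c, a]] / det
    quad = (d0 * (d * d0 - b * d1) + d1 * (a * d1 - c * d0)) / det
    return T * quad


def chi2_2df_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)
```

A statistic above $-2\log(0.05) \approx 5.99$ rejects $H_0$ at the 5% level.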
Setting up this joint regression framework facilitates the estimation of the joint regression parameters $(\beta, \gamma)$, whereas stand-alone estimation of $\gamma$ is infeasible. We use this joint regression setup to propose the following regression-based backtests for the ES:

The Auxiliary ESR Backtest
We choose $V_t = (1, \hat v_t)^\top$ and $W_t = (1, \hat e_t)^\top$, i.e. we set up the regression system
$$Y_t = \beta_1 + \beta_2 \hat v_t + u_t^q \quad \text{and} \quad Y_t = \gamma_1 + \gamma_2 \hat e_t + u_t^e, \qquad (2.8)$$
and test
$$H_0: (\gamma_1, \gamma_2) = (0, 1) \quad \text{against} \quad H_1: (\gamma_1, \gamma_2) \neq (0, 1), \qquad (2.9)$$
using the Wald-type test statistic
$$T_{\text{A-ESR}} = T \big(\hat\gamma_T - (0, 1)^\top\big)^\top \hat\Sigma_\gamma^{-1} \big(\hat\gamma_T - (0, 1)^\top\big), \qquad (2.10)$$
based on some (consistent) covariance estimator $\hat\Sigma_\gamma$ for the covariance of the subvector $\hat\gamma_T$.

The Strict ESR Backtest
We choose $V_t = W_t = (1, \hat e_t)^\top$, i.e. we set up the regression system
$$Y_t = \beta_1 + \beta_2 \hat e_t + u_t^q \quad \text{and} \quad Y_t = \gamma_1 + \gamma_2 \hat e_t + u_t^e, \qquad (2.11)$$
and test
$$H_0: (\gamma_1, \gamma_2) = (0, 1) \quad \text{against} \quad H_1: (\gamma_1, \gamma_2) \neq (0, 1), \qquad (2.12)$$
using the Wald-type test statistic
$$T_{\text{S-ESR}} = T \big(\hat\gamma_T - (0, 1)^\top\big)^\top \hat\Sigma_\gamma^{-1} \big(\hat\gamma_T - (0, 1)^\top\big), \qquad (2.13)$$
based on some (consistent) covariance estimator $\hat\Sigma_\gamma$ for the covariance of the subvector $\hat\gamma_T$. We discuss the employed covariance estimators in Section 2.5.

Whereas setting up Mincer-Zarnowitz tests for classical elicitable functionals such as the mean, quantiles and expectiles is straightforward (see Mincer and Zarnowitz (1969), Gaglianone et al. (2011), Guler et al. (2017)), in the case of higher-order elicitable functionals such as the ES we have several choices, as illustrated above. The Auxiliary ESR backtest is based on the regression specification (2.8) and requires both VaR and ES forecasts as input parameters. Thus, following Definition 2.1, this backtest is formally a joint VaR and ES backtest, albeit with a strong emphasis on backtesting ES forecasts. In contrast, the Strict ESR backtest only incorporates ES forecasts and consequently is the first backtest for the ES stand-alone.

The Strict ESR test, however, comes at the cost of a potential model misspecification.
Given that the financial returns $Y_t$ follow some pure scale (volatility) process, the VaR and ES forecasts are perfectly collinear, $\hat e_t = c\, \hat v_t$ for some constant $c \in \mathbb{R}$, $c \neq 0$. Consequently, if $\hat v_t$ equals the true conditional VaR, the first equation in (2.11) is correctly specified for the true parameter values $(\beta_1, \beta_2) = (0, 1/c)$. Most of the financial econometrics literature (almost the entire GARCH, stochastic volatility and realized volatility literature) is based on such an assumption for daily returns, which motivates the applicability of the Strict ESR backtest. However, this backtest is also applicable in the general case where the true VaR and ES are not necessarily collinear. For this, we provide asymptotic theory for M-estimation of the joint VaR and ES regression under potential model misspecification in Section 2.4.

2.3. The One-Sided Intercept ESR Backtest

The two ESR backtests introduced in the previous section only allow for testing two-sided hypotheses as specified in (2.9) and (2.12), as it is generally unclear how too risky (or too conservative) forecasts influence the parameters $\gamma_1$ and $\gamma_2$. Because the capital requirements that financial institutions have to keep as a reserve depend on the reported risk forecasts, market participants have an incentive to report too risky ES forecasts in order to keep their capital requirements as low as possible. In contrast, issuing too conservative risk forecasts and facing higher capital requirements need not be punished by the regulatory authorities.⁵ Thus, the regulators only have to prevent and consequently penalize the underestimation of financial risks, which can be done by using one-sided backtesting procedures. For example, the traffic light system (Basel Committee, 1996), currently implemented in the Basel Accords, is in fact a one-sided backtest for the hit ratios of VaR forecasts.
Consequently, we also introduce a regression-based backtesting procedure for the ES that allows for testing one-sided hypotheses.

The Intercept ESR Backtest. This backtest is based on the regression setup of the Strict ESR backtest, but regresses the forecast errors $Y_t - \hat e_t$ on an intercept term only,

$$Y_t - \hat e_t = \beta_1 + u_t^q \quad\text{and}\quad Y_t - \hat e_t = \gamma_1 + u_t^e, \tag{2.14}$$

where $Q_\alpha(u_t^q \mid \mathcal F_{t-1}) = 0$ and $\mathrm{ES}_\alpha(u_t^e \mid \mathcal F_{t-1}) = 0$ almost surely. Using this restricted regression equation, we can define a two-sided and a one-sided alternative,

$$H_0^{2s}: \gamma_1 = 0 \ \text{ against } \ H_1^{2s}: \gamma_1 \neq 0, \quad\text{and}\quad H_0^{1s}: \gamma_1 \geq 0 \ \text{ against } \ H_1^{1s}: \gamma_1 < 0, \tag{2.15}$$

which we test by using a t-test based on the estimated asymptotic covariance described in Section 2.5. Note that this testing procedure is equivalent to fixing the slope parameter of the Strict ESR test given in (2.11) to one and only estimating and testing the intercept term. Therefore, we call this backtest the Intercept ESR backtest.

⁵ One could interpret the higher capital requirements as a punishment for too conservative risk forecasts.

2.4. Asymptotic Theory under Model Misspecification

In this section, we consider the asymptotic properties of the M-estimator of the joint VaR and ES regression framework given in (2.7) under potential model misspecification. In the following, we write $X_t = (V_t, W_t)$ for the compound vector of covariates. Following Dimitriadis and Bayer (2019) and Patton et al. (2019a), the M-estimator of the regression parameters is defined by

$$\hat\theta_T = \arg\min_{\theta\in\Theta} Q_T(\theta), \quad\text{where} \tag{2.16}$$

$$Q_T(\theta) = \sum_{t=1}^T \rho(Y_t, X_t, \theta) \quad\text{and} \tag{2.17}$$

$$\rho(Y_t, X_t, \theta) = \frac{-1}{W_t^\top\gamma}\left(W_t^\top\gamma - V_t^\top\beta + \frac{(V_t^\top\beta - Y_t)\,\mathbb 1_{\{Y_t\le V_t^\top\beta\}}}{\alpha}\right) + \log(-W_t^\top\gamma), \tag{2.18}$$

where the loss function in (2.18) is a strictly consistent loss function for the pair of quantile and ES (Fissler and Ziegel, 2016). Dimitriadis and Bayer (2019) and Patton et al.
(2019a) show consistency and asymptotic normality of the M-estimator in the case of a correctly specified parametric model, i.e., under the assumption that there exists a true parameter $\theta_0 \in \Theta$ such that $Q_\alpha(u_t^q \mid \mathcal F_{t-1}) = 0$ and $\mathrm{ES}_\alpha(u_t^e \mid \mathcal F_{t-1}) = 0$ almost surely. In the following, we extend this theory by relaxing these assumptions, which allows for the general case of misspecified models. For this, we define the pseudo-true parameter

$$\theta_T^* = \arg\min_{\theta\in\Theta} \bar Q_T(\theta), \quad\text{where}\quad \bar Q_T(\theta) = \mathbb E\left[Q_T(\theta)\right]. \tag{2.19}$$

For the classical case of a correctly specified model, the pseudo-true parameter coincides with the true regression parameter, $\theta_T^* = \theta_0$, and is independent of $T$. In the following, we restrict our attention to processes and models for the conditional quantile and ES which satisfy the following conditions.

Assumption 2.2.
(A1) The distribution $F_t$ is absolutely continuous with density function $f_t$, which is bounded from above, i.e., there exists a constant $c > 0$ such that $\sup_{y\in\mathbb R} f_t(y) \le c$ and $\sup_{y\in\mathbb R} |f_t'(y)| \le c$.
(A2) The parameter space $\Theta \subset \mathbb R^{2k}$ is compact, convex and has non-empty interior.
(A3) The pseudo-true parameter $\theta_T^*$ defined in (2.19) is in the interior of $\Theta$ and is the unique minimizer of the objective function $\bar Q_T(\theta)$, and the sequence $\{\nabla_\theta \rho(Y_t, X_t, \theta_T^*) - \mathbb E[\nabla_\theta \rho(Y_t, X_t, \theta_T^*)]\}_t$ is uncorrelated.
(A4) $V_t, W_t \in \mathcal F_{t-1}$, and the matrices $\mathbb E[V_t V_t^\top]$ and $\mathbb E[W_t W_t^\top]$ have full rank.
(A5) The matrix $C_T(\theta_T^*)$, defined in Theorem 2.4, has strictly positive eigenvalues for all $T$ sufficiently large.
(A6) The stochastic process $\{Y_t, V_t, W_t\}$ is strong mixing of size $-r/(r-2)$ for some $r > 2$.
(A7) For all $\theta \in \Theta$, it holds that $1/|W_t^\top\gamma| \le K < \infty$ for some constant $K > 0$.
(A8) It holds that $\mathbb E\|V_t\|^{r+1} < \infty$, $\mathbb E\|W_t\|^{r+1} < \infty$, $\mathbb E\big[\|V_t\|^{r+1}\|W_t\|^{r}\big] < \infty$ and $\mathbb E\big[\|W_t\|^{r+1}|Y_t|^{r}\big] < \infty$ for the $r > 2$ from condition (A6).
(A9) For any $T \in \mathbb N$, $\sup_{\theta\in\Theta} \sum_{t=1}^T \mathbb 1_{\{Y_t = V_t^\top\beta\}} \le K$ almost surely for some constant $K > 0$.

The conditions in Assumption 2.2 mainly resemble the regularity conditions for asymptotic normality of correctly specified models from Patton et al.
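(2019a), and we refer to Patton et al. (2019a) for a discussion of these conditions. The key condition which allows for misspecified models is the unique minimization condition for the pseudo-true parameter in (A3). The above assumptions contain the case of correctly specified models, as condition (A3) is then naturally fulfilled because the utilized loss function is strictly consistent for the VaR and the ES (Fissler and Ziegel, 2016). We connect this weaker condition (A3) to the classical misspecified regression models for the mean and for quantiles of White (1980), Gourieroux et al. (1984), Kim and White (2003), Komunjer (2005) and Angrist et al. (2006). For correctly specified models, one usually imposes the strong condition that for all $t = 1, \dots, T$,

$$\mathbb E\left[\psi(Y_t, X_t, \theta) \mid \mathcal F_{t-1}\right] = 0 \ \text{a.s.} \iff \theta = \theta_0, \tag{2.20}$$

where $\psi(Y_t, X_t, \theta)$ is almost surely the derivative of $\rho(Y_t, X_t, \theta)$ with respect to $\theta$ and corresponds to the identification functions of the model (Gneiting, 2011). The weaker condition (A3) is essentially equivalent to the unconditional moment condition

$$\mathbb E\left[\sum_{t=1}^T \psi(Y_t, X_t, \theta)\right] = 0 \iff \theta = \theta_T^*. \tag{2.21}$$

Thus, condition (2.21) can be interpreted as an average identification condition, i.e., $V_t^\top\beta_T^*$ and $W_t^\top\gamma_T^*$ are the best averaged linear approximations of the true unknown conditional quantile and ES models.

Theorem 2.3 (Consistency under Model Misspecification). Given the conditions of Assumption 2.2, it holds that $\hat\theta_T - \theta_T^* \xrightarrow{P} 0$ as $T \to \infty$, where $\theta_T^*$ is the pseudo-true parameter defined in (2.19).

The proof of Theorem 2.3 is given in Appendix A.

Theorem 2.4 (Asymptotic Normality under Model Misspecification). Given the conditions of Assumption 2.2, it holds that

$$C_T(\theta_T^*)^{-1/2}\,\Lambda_T(\theta_T^*)\,\sqrt{T}\,\big(\hat\theta_T - \theta_T^*\big) \xrightarrow{d} N(0, I_{2k}), \tag{2.22}$$

where

$$\Lambda_T(\theta) = \begin{pmatrix}\Lambda_{11,T}(\theta) & \Lambda_{12,T}(\theta)\\ \Lambda_{21,T}(\theta) & \Lambda_{22,T}(\theta)\end{pmatrix} \quad\text{and}\quad C_T(\theta) = \begin{pmatrix} C_{11,T}(\theta) & C_{12,T}(\theta)\\ C_{21,T}(\theta) & C_{22,T}(\theta)\end{pmatrix}, \tag{2.23}$$

with $\theta = (\beta, \gamma)$, where $\mathbb E_t[\cdot]$ and $\mathrm{Var}_t(\cdot)$ denote expectation and variance conditional on $\mathcal F_{t-1}$, and

$$\Lambda_{11,T}(\theta) = \frac{1}{T}\sum_{t=1}^T \mathbb E\left[\frac{-1}{\alpha\, W_t^\top\gamma}\, V_t V_t^\top f_t(V_t^\top\beta)\right], \tag{2.24}$$

$$\Lambda_{12,T}(\theta) = \Lambda_{21,T}(\theta)^\top = \frac{1}{T}\sum_{t=1}^T \mathbb E\left[\frac{F_t(V_t^\top\beta) - \alpha}{\alpha\,(W_t^\top\gamma)^2}\, V_t W_t^\top\right], \tag{2.25}$$

$$\begin{aligned}\Lambda_{22,T}(\theta) = {}& \frac{1}{T}\sum_{t=1}^T \mathbb E\left[\frac{W_t W_t^\top}{(W_t^\top\gamma)^2}\right] \\ &- \frac{2}{T}\sum_{t=1}^T \mathbb E\left[\frac{W_t W_t^\top}{(W_t^\top\gamma)^3}\left(W_t^\top\gamma - \frac{1}{\alpha}\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta\}}\big] + \frac{V_t^\top\beta\,\big(F_t(V_t^\top\beta)-\alpha\big)}{\alpha}\right)\right],\end{aligned} \tag{2.26-2.27}$$

and

$$C_{11,T}(\theta) = \frac{1}{T}\sum_{t=1}^T \mathbb E\left[\frac{V_t V_t^\top}{(W_t^\top\gamma)^2}\left(\frac{1-\alpha}{\alpha} + \frac{(1-2\alpha)\big(F_t(V_t^\top\beta)-\alpha\big)}{\alpha^2}\right)\right], \tag{2.28}$$

$$\begin{aligned}C_{12,T}(\theta) = C_{21,T}(\theta)^\top = \frac{1}{T}\sum_{t=1}^T \mathbb E\Bigg[\frac{V_t W_t^\top}{\alpha\,(W_t^\top\gamma)^3}\Bigg(& \big(1 - F_t(V_t^\top\beta)\big)\big(W_t^\top\gamma - V_t^\top\beta\big) \\ & - (1-\alpha)\Big(W_t^\top\gamma - \frac{1}{\alpha}\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta\}}\big] + \frac{V_t^\top\beta\,\big(F_t(V_t^\top\beta)-\alpha\big)}{\alpha}\Big)\Bigg)\Bigg],\end{aligned} \tag{2.29-2.31}$$

$$\begin{aligned}C_{22,T}(\theta) = \frac{1}{T}\sum_{t=1}^T \mathbb E\Bigg[\frac{W_t W_t^\top}{(W_t^\top\gamma)^4}\Bigg(& \frac{F_t(V_t^\top\beta)}{\alpha^2}\,\mathrm{Var}_t\big(V_t^\top\beta - Y_t \mid Y_t \le V_t^\top\beta\big) + \frac{1 - F_t(V_t^\top\beta)}{\alpha^2 F_t(V_t^\top\beta)}\Big(V_t^\top\beta\,F_t(V_t^\top\beta) - \mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta\}}\big]\Big)^2 \\ & + \Big(W_t^\top\gamma - \frac{1}{\alpha}\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta\}}\big] + \frac{V_t^\top\beta\,\big(F_t(V_t^\top\beta)-\alpha\big)}{\alpha}\Big)^2\Bigg)\Bigg].\end{aligned} \tag{2.32-2.33}$$

The proof of Theorem 2.4 is given in Appendix A. The asymptotic theory derived here embeds the asymptotic theory of Patton et al. (2019a) and Dimitriadis and Bayer (2019) as the special case of correctly specified models. Correct specification implies that $F_t(V_t^\top\beta_T^*) = \alpha$ and $W_t^\top\gamma_T^* = \frac{1}{\alpha}\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta_T^*\}}\big]$ almost surely for all $t = 1, \dots, T$. Imposing these two conditions simplifies the asymptotic covariance matrix of Theorem 2.4 to the asymptotic covariances of Patton et al. (2019a) and Dimitriadis and Bayer (2019). In general, allowing for model misspecification in regression models comes at the cost of an inflated and more complicated asymptotic covariance matrix; see, e.g., White (1980), White (1994), Kim and White (2003), Komunjer (2005) and Angrist et al. (2006) for examples of semiparametric models for the mean and quantiles. Given consistency and asymptotic normality, we can derive the asymptotic distribution of the test statistics of our new regression-based ESR backtests. Henceforth, we use the short notation $\Sigma_T = \Lambda_T^{-1}(\theta_T^*)\,C_T(\theta_T^*)\,\Lambda_T^{-1}(\theta_T^*)$ for the asymptotic covariance.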
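Under the two correct-specification conditions $F_t(V_t^\top\beta) = \alpha$ and $\frac{1}{\alpha}\mathbb E_t[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta\}}] = W_t^\top\gamma$, the off-diagonal block $\Lambda_{12,T}$ of the covariance components in Theorem 2.4 vanishes, and the remaining blocks depend only on the density $f_t(V_t^\top\beta)$ and the truncated variance $\mathrm{Var}_t(V_t^\top\beta - Y_t \mid Y_t \le V_t^\top\beta)$. The following minimal sketch computes the resulting sample-counterpart sandwich $\hat\Lambda_T^{-1}\hat C_T\hat\Lambda_T^{-1}$. It assumes that the nuisance estimates are supplied by the user. The block formulas are our own simplification of the components under these two conditions, and the function name is hypothetical.

```python
import numpy as np


def classical_esr_covariance(V, W, v, e, dens, trunc_var, alpha):
    """Sample-counterpart sandwich Lambda^{-1} C Lambda^{-1} under correct specification.

    V, W      : (T, k_v) and (T, k_w) covariates of the quantile and ES equations
    v, e      : (T,) fitted values V @ beta_hat and W @ gamma_hat (requires e < 0)
    dens      : (T,) estimates of f_t(v_t), nuisance quantity (a)
    trunc_var : (T,) estimates of Var_t(v_t - Y_t | Y_t <= v_t), nuisance quantity (b)
    """
    T, k_v = V.shape
    k_w = W.shape[1]

    def wmean(A, w, B):
        # (1/T) * sum_t w_t * A_t B_t'
        return (A * w[:, None]).T @ B / T

    # Lambda is block-diagonal because F_t(v_t) = alpha removes the cross term
    L11 = wmean(V, dens / (-alpha * e), V)
    L22 = wmean(W, 1.0 / e**2, W)
    Lam = np.block([[L11, np.zeros((k_v, k_w))],
                    [np.zeros((k_w, k_v)), L22]])

    # C: second moments of the score (mean zero under correct specification)
    C11 = wmean(V, (1 - alpha) / (alpha * e**2), V)
    C12 = wmean(V, (1 - alpha) * (e - v) / (alpha * e**3), W)
    C22 = wmean(W, (trunc_var + (1 - alpha) * (v - e) ** 2) / (alpha * e**4), W)
    C = np.block([[C11, C12], [C12.T, C22]])

    Lam_inv = np.linalg.inv(Lam)
    return Lam_inv @ C @ Lam_inv


# Toy check with constant fitted values and intercept-only regressors
T = 4
ones = np.ones((T, 1))
Sigma = classical_esr_covariance(
    V=ones, W=ones, v=np.full(T, -2.0), e=np.full(T, -2.5),
    dens=np.full(T, 0.05), trunc_var=np.full(T, 0.3), alpha=0.025)
```

The ES-specific block that enters the Wald statistics is the lower-right block, `Sigma[k_v:, k_v:]`. The misspecification-robust version would replace the two simplifying conditions by estimates of $F_t(V_t^\top\beta)$ and of the truncated expectation.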
As the Auxiliary ESR backtest is not subject to model misspecification, under the null hypothesis it holds that $\gamma_T^* = (0, 1)$ for all $T \in \mathbb N$. However, this does not necessarily hold for the Strict ESR and the Intercept ESR backtests, and we consequently define the following modified test statistics for these backtests,

$$\tilde T_{S\text{-}ESR} = T\,\big(\hat\gamma_T - \gamma_T^*\big)^\top\,\hat\Sigma_{T,\gamma}^{-1}\,\big(\hat\gamma_T - \gamma_T^*\big), \tag{2.34}$$

$$\tilde T_{I\text{-}ESR} = T\,\big(\hat\gamma_{1,T} - \gamma_{1,T}^*\big)\,\hat\Sigma_{T,\gamma_1}^{-1}\,\big(\hat\gamma_{1,T} - \gamma_{1,T}^*\big), \tag{2.35}$$

where $\hat\Sigma_{T,\gamma}$ and $\hat\Sigma_{T,\gamma_1}$ are the ES-specific parts of the estimator of the asymptotic covariance matrix $\Sigma_T$, and $\gamma_{1,T}^*$ refers to the intercept component of the pseudo-true ES-specific parameter vector $\gamma_T^*$.

Corollary 2.5. Given the conditions of Assumption 2.2 and given that $\hat\Sigma_T - \Sigma_T \xrightarrow{P} 0$, it holds that

$$T_{A\text{-}ESR} \xrightarrow{d} \chi^2_2, \qquad \tilde T_{S\text{-}ESR} \xrightarrow{d} \chi^2_2, \qquad \text{and} \qquad \tilde T_{I\text{-}ESR} \xrightarrow{d} \chi^2_1. \tag{2.36}$$

The proof of Corollary 2.5 is given in Appendix A. For the Strict ESR test (and its intercept version), we do not know the exact form of the pseudo-true parameter $\gamma_T^*$ in practice. In the following, we argue that in realistic financial settings $\gamma_T^* \approx (0, 1)$, and thus $\tilde T_{S\text{-}ESR} \approx T_{S\text{-}ESR}$ holds approximately. First, the majority of the literature in financial econometrics finds that pure scale processes (e.g., GARCH and stochastic volatility models) approximate daily financial data well enough. Thus, $\hat v_t \approx c\,\hat e_t$ for some $c > 0$, and under the null hypothesis the regression model in (2.11) is only subject to a slight model misspecification. Second, the misspecification lies in the auxiliary quantile equation, while we test the parameters of the correctly specified ES equation in (2.11). Thus, the model misspecification enters our test statistic only indirectly, through the joint parameter estimation. Third, our simulation results in Section 3 show that the Strict ESR backtest based on $T_{S\text{-}ESR}$ exhibits correct size properties and performs almost indistinguishably from the Auxiliary ESR backtest, also in the simulation setups where the underlying data do not follow a pure scale process.
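The Intercept ESR decision rule in (2.15) can be sketched as follows. The sketch assumes that an estimate $\hat\gamma_{1,T}$ and an estimated asymptotic variance of $\sqrt{T}(\hat\gamma_{1,T} - \gamma_1)$ are available, and it approximates $\gamma_{1,T}^*$ by 0 under the null, as argued above. The function name is hypothetical. The one-sided version rejects for large negative t-statistics, i.e., for ES forecasts that are too risky. The two-sided p-value coincides with the $\chi^2_1$ p-value of $\tilde T_{I\text{-}ESR} = t^2$.

```python
import numpy as np
from scipy import stats


def intercept_esr_test(gamma1_hat, avar_gamma1, n_obs, one_sided=True):
    """t-test of the intercept in Y_t - e_hat_t = gamma_1 + u_t^e.

    one_sided=True : H0: gamma_1 >= 0 against H1: gamma_1 < 0
    one_sided=False: H0: gamma_1 == 0 against H1: gamma_1 != 0
    avar_gamma1    : estimated asymptotic variance of sqrt(T) * (gamma1_hat - gamma_1)
    """
    t_stat = float(np.sqrt(n_obs) * gamma1_hat / np.sqrt(avar_gamma1))
    if one_sided:
        p_value = float(stats.norm.cdf(t_stat))          # small p for large negative t
    else:
        p_value = float(2.0 * stats.norm.sf(abs(t_stat)))
    return t_stat, p_value


t1, p1 = intercept_esr_test(-0.1, avar_gamma1=4.0, n_obs=400)                  # t = -1
t2, p2 = intercept_esr_test(-0.1, avar_gamma1=4.0, n_obs=400, one_sided=False)
```

Here the t-statistic is $-1$, giving a one-sided p-value of about 0.159 and a two-sided p-value of about 0.317. At the 5% level, neither test would flag these hypothetical forecasts.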
This indicates that the approximation error is negligible in realistic financial settings and that the Strict and Intercept ESR backtests can be applied in practice.

2.5. Implementation of the Tests

The M-estimation of the parameters is carried out using the R package esreg (Bayer and Dimitriadis, 2019b). The main difficulty in implementing the backtests is the estimation of the asymptotic covariance matrix $\Sigma_T = \Lambda_T^{-1}(\theta_T^*)\,C_T(\theta_T^*)\,\Lambda_T^{-1}(\theta_T^*)$. Generally, this is done by using sample counterparts of the expectations in the components (2.24)-(2.33) of Theorem 2.4, which, however, depend on the following four nuisance quantities:

(a) the conditional density function evaluated at the conditional quantile, $\hat f_t(V_t^\top\hat\beta_T)$,
(b) the conditional truncated variance, $\widehat{\mathrm{Var}}_t(V_t^\top\hat\beta_T - Y_t \mid Y_t \le V_t^\top\hat\beta_T)$,
(c) the conditional distribution function, $\hat F_t(V_t^\top\hat\beta_T)$, and
(d) the conditional truncated expectation, $\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\hat\beta_T\}}\big]$.

We implement a novel, misspecification-robust covariance estimator by estimating these four nuisance quantities as follows. The terms (a) and (b) also appear in the asymptotic covariance of correctly specified models for the quantile and the ES of Dimitriadis and Bayer (2019), Patton et al. (2019a) and Barendse (2018). Thus, we follow the approach of Dimitriadis and Bayer (2019) and apply the nid estimator of Hendricks and Koenker (1992) for (a), the conditional density, and the flexible scl-sp estimator of Dimitriadis and Bayer (2019) for (b), the conditional truncated variance. To estimate (c), the conditional distribution function $\hat F_t(V_t^\top\hat\beta_T)$, we follow the general approach of the scl-sp estimator of Dimitriadis and Bayer (2019), i.e., we assume that $F_t$ follows a conditional location-scale model whose innovations $\varepsilon_t$ follow a flexible zero-mean, unit-variance distribution.
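esreg is an R package. As a language-neutral illustration only, the following Python sketch minimizes the objective (2.16)-(2.18) with SciPy's derivative-free Nelder-Mead method. This is not the algorithm used by esreg. The optimizer choice, the starting values, the simulated data and the function names are our assumptions. Averaging instead of summing in (2.17) leaves the minimizer unchanged.

```python
import numpy as np
from scipy.optimize import minimize


def joint_loss(y, v, e, alpha):
    """Average of the loss (2.18) for VaR v and ES e; requires e < 0."""
    hit = (y <= v).astype(float)
    return float(np.mean(-(e - v + (v - y) * hit / alpha) / e + np.log(-e)))


def fit_joint_regression(y, V, W, theta0, alpha=0.025):
    """Minimize Q_T(theta) over theta = (beta, gamma) for linear VaR and ES models."""
    k = V.shape[1]

    def objective(theta):
        v = V @ theta[:k]
        e = W @ theta[k:]
        if np.any(e >= 0):          # log(-e) is undefined: exclude such candidates
            return np.inf
        return joint_loss(y, v, e, alpha)

    res = minimize(objective, np.asarray(theta0, dtype=float), method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10})
    return res.x, res.fun, objective


# Illustration on simulated N(0, 1) returns with intercept-only VaR and ES models
rng = np.random.default_rng(1)
y = rng.standard_normal(5000)
ones = np.ones((y.size, 1))
theta_hat, q_min, objective = fit_joint_regression(y, ones, ones, theta0=[-1.5, -2.0])
```

With intercept-only regressors, the estimates approximate the unconditional 2.5% quantile and ES of the sample. For the backtests, `V` and `W` would instead contain the columns $(1, \hat v_t)$ or $(1, \hat e_t)$, as in (2.8) and (2.11).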
We standardize $Y_t$ by estimates of the conditional mean and variance, obtained by pseudo-maximum likelihood, and apply a kernel density estimator to obtain the distribution function of $\varepsilon_t$. Hence, we can recover the distribution of $Y_t$ given $\mathcal F_{t-1}$. Notice that for the minor degree of misspecification to which our backtesting approach is subject, it approximately holds that $\hat F_t(V_t^\top\hat\beta_T) \approx \alpha$ for all $t$. We find that this semiparametric estimation approach, which relies on the location-scale assumption, performs better than purely nonparametric alternatives, as we are estimating the conditional distribution at rather extreme quantiles such as $\alpha = 2.5\%$.

The last nuisance quantity, $\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\hat\beta_T\}}\big]$, is the truncated mean of the observations below the possibly misspecified linear quantile model. This quantity is closely related to the conditional ES, which is assumed to be a linear function in our approach. As realistic financial data only exhibit a minor degree of misspecification in the quantile model, this nuisance quantity is assumed to remain approximately linear, and thus we obtain $\frac{1}{\alpha}\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\hat\beta_T\}}\big] = W_t^\top\hat\gamma_T$ for all $t$. Nonparametric estimation of this nuisance quantity would again introduce too much estimation noise.

We further implement our backtests based on the covariance estimator of Dimitriadis and Bayer (2019) and Patton et al. (2019a), which does not account for possible model misspecification. This estimator is based on the simplified covariance structure given in Dimitriadis and Bayer (2019) and Patton et al. (2019a), where the assumption of correct model specification implies that $F_t(V_t^\top\beta) = \alpha$ and $\frac{1}{\alpha}\mathbb E_t\big[Y_t\mathbb 1_{\{Y_t\le V_t^\top\beta\}}\big] = W_t^\top\gamma$ almost surely. Thus, we only estimate the nuisance quantities (a) and (b) in this approach.

3. Monte-Carlo Simulations

In this section, we evaluate the empirical performance of our proposed ESR backtests and compare them to the tests of McNeil and Frey (2000) and Nolde and Ziegel (2017). For that, we assess the empirical size and power of the tests, which are defined as the rejection frequencies of the tests under the null and the alternative hypothesis, respectively. This comparison is conducted using two different approaches. The first, presented in Section 3.1, follows the typical strategy in the related literature: we first assess the size of the backtests with several realistic data generating processes (DGPs) and then evaluate their power by backtesting forecasts stemming from an overly simplified model, in this case the Historical Simulation (HS) model. In the second setup, presented in Section 3.2, we continuously misspecify certain parameters of the true model and thereby obtain alternative models with a continuously increasing degree of misspecification. This approach of evaluating backtests has two advantages. First, we obtain power curves which show how an increasing degree of model misspecification influences the test decisions. Second, misspecifying the different model parameters in isolation allows us to misspecify certain model characteristics while leaving the rest of the model unchanged.

3.1. Traditional Size and Power Comparisons

In order to compare the backtests proposed in the previous sections, we simulate data from several DGPs. Besides pure scale (volatility) model specifications, under which the Strict and Intercept ESR backtests are correctly specified, we also consider more general Student's-t GAS models (Creal et al., 2013) with time-varying higher moments, and AR-GARCH specifications, under which our ESR backtests are subject to model misspecification under the null hypothesis.
EGARCH: The first DGP is an EGARCH(1,1) model (Nelson, 1991) with t-distributed innovations, where the parameter values are calibrated to daily returns of the S&P 500 index,

$$Y_t = \sigma_t z_t, \quad\text{where}\quad z_t \sim t_{7.39}, \quad\text{and}\quad \log(\sigma_t^2) = 0.0012 - 0.161\,z_{t-1} + 0.136\big(|z_{t-1}| - \mathbb E|z_{t-1}|\big) + 0.978\,\log(\sigma_{t-1}^2). \tag{3.1}$$

This model represents a highly flexible GARCH specification, and due to its calibrated parameter values, this DGP accurately replicates the distributional properties of daily financial returns. As we assume a zero mean for this model, the true VaR and ES forecasts are perfectly collinear, and consequently the regression equations of the Strict and the Intercept ESR backtests are correctly specified under the null hypothesis.

AR-GARCH: The next specification is an AR(1)-GARCH(1,1) model with Gaussian innovations,

$$Y_t = \phi\,Y_{t-1} + \sigma_t z_t, \quad\text{where}\quad z_t \sim N(0, 1), \quad\text{and}\quad \sigma_t^2 = 0.01 + 0.1\,Y_{t-1}^2 + 0.85\,\sigma_{t-1}^2, \tag{3.2}$$

where we consider the three specifications $\phi \in \{0, 0.1, 0.5\}$ for the AR parameter. This DGP introduces model misspecification for the Strict and Intercept ESR backtests through the non-zero conditional mean, while leaving the realistic volatility structure of the financial returns unchanged. For this DGP, the ratio between the true VaR and ES is given by

$$\frac{\hat v_t}{\hat e_t} = \frac{\mu_t + \sigma_t\, q_z(\alpha)}{\mu_t + \sigma_t\, \mathrm{ES}_z(\alpha)}, \tag{3.3}$$

where $\mu_t$ is the conditional mean of $Y_t$ given $\mathcal F_{t-1}$, and $q_z(\alpha)$ and $\mathrm{ES}_z(\alpha)$ denote the $\alpha$-quantile and the $\alpha$-ES of $z_t$. If $\mu_t$ equals zero, the ratio is constant, and thus the regression equations in (2.11) are correctly specified under the null. By increasing the time dependence of the conditional mean through the AR parameter $\phi$, we can monotonically strengthen the model misspecification in this DGP.

GAS-STD: We use a 3-factor Student's-t GAS model with time-varying location $\mu_t$, scale $\sigma_t$ and degrees of freedom $\nu_t$, with parameters calibrated to daily returns of the S&P 500 index.
This model is estimated and simulated using the R package GAS (Ardia et al., 2019) and is based on the model specification

$$Y_t \mid Y_1, \dots, Y_{t-1} \sim t(\mu_t, \sigma_t, \nu_t), \tag{3.4}$$

where the vector $(\mu_t, \sigma_t, \nu_t)$ follows an autoregressive specification driven by the lagged score of the log-likelihood of the distributional specification in (3.4). Creal et al. (2013) and Harvey (2013) introduce the general GAS specification, which nests many well-known models, including ARMA, GARCH (Bollerslev, 1986) and ACD (Engle and Russell, 1998) models. Koopman et al. (2016) provide an overview of GAS and related models. We refer to Appendix A of Ardia et al. (2019) for the exact parametric specification of this Student's-t GAS model.

GAS-SSTD: We generalize the previous GAS model to a 4-factor asymmetric Student's-t GAS model with time-varying location $\mu_t$, scale $\sigma_t$, skewness $\kappa_t$ and degrees of freedom $\nu_t$,

$$Y_t \mid Y_1, \dots, Y_{t-1} \sim t(\mu_t, \sigma_t, \kappa_t, \nu_t). \tag{3.5}$$

Compared to the previous 3-factor GAS specification, this model additionally allows for asymmetries in the conditional return distribution through a time-varying skewness parameter with an autoregressive GAS specification.

For the two location-scale DGPs, we obtain VaR and ES forecasts at level $\alpha$ by

$$\hat v_t = \hat\mu_t + \hat\sigma_t\, q_z(\alpha) \quad\text{and}\quad \hat e_t = \hat\mu_t + \hat\sigma_t\, \mathrm{ES}_z(\alpha), \tag{3.6}$$

where $\hat\mu_t$ and $\hat\sigma_t$ are the location and volatility forecasts generated by the location and scale models, and $q_z(\alpha)$ and $\mathrm{ES}_z(\alpha)$ are the $\alpha$-quantile and the $\alpha$-ES of the innovations $z_t$. For the t-distributions of the two GAS models, we obtain the ES forecasts through numerical integration. For the following size and power analysis of the backtests, we simulate data from the DGPs given above with sample sizes of 250, 500, 1000, 2500 and 5000 observations, plus 250 additional pre-sample values required for the power analysis. We run 10,000 Monte Carlo replications for each of the DGPs.
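The following is a minimal simulation sketch of the AR(1)-GARCH(1,1) DGP (3.2), together with its true VaR and ES from (3.6) with Gaussian innovations. The burn-in length, the initialization of $\sigma_t^2$, the seed and the function name are our assumptions; the text only specifies the pre-sample needed for the power analysis. The AR parameter is called `phi` here. Because the returned VaR is the true conditional quantile, the hit rate should be close to $\alpha$. For `phi = 0`, the ratio (3.3) is constant.

```python
import numpy as np
from scipy import stats


def simulate_ar_garch(n, phi, alpha=0.025, burn=500, seed=0):
    """Simulate (3.2) and return the returns with their true VaR and ES from (3.6)."""
    rng = np.random.default_rng(seed)
    q = stats.norm.ppf(alpha)                 # q_z(alpha) of N(0, 1)
    es = -stats.norm.pdf(q) / alpha           # ES_z(alpha) of N(0, 1)
    m = n + burn
    y, mu = np.zeros(m), np.zeros(m)
    sig2 = np.full(m, 0.01 / (1 - 0.1 - 0.85))   # start at the phi = 0 unconditional variance
    for t in range(1, m):
        sig2[t] = 0.01 + 0.1 * y[t - 1] ** 2 + 0.85 * sig2[t - 1]
        mu[t] = phi * y[t - 1]
        y[t] = mu[t] + np.sqrt(sig2[t]) * rng.standard_normal()
    sigma = np.sqrt(sig2[burn:])
    return y[burn:], mu[burn:] + sigma * q, mu[burn:] + sigma * es


y, v_true, e_true = simulate_ar_garch(20000, phi=0.5)
hit_rate = np.mean(y <= v_true)
```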
As stipulated by the Basel Accords, we fix the probability level to $\alpha = 2.5\%$ for the VaR and ES forecasts for each of the DGPs. In this part of the study, we focus on two-sided hypotheses and defer the one-sided case to Section 3.3. We compare our three ESR backtests to two specifications of the conditional calibration (CC) backtest of Nolde and Ziegel (2017) and to two specifications of the exceedance residual (ER) backtest of McNeil and Frey (2000), which are further described in Appendix C.1 and Appendix C.2.

Table 1 presents the empirical sizes of the considered backtests for the different DGPs introduced above, for the different sample sizes and for a nominal test size of 5%. Table 4 and Table 5 in Appendix D show equivalent results for nominal significance levels of 1% and 10%. We find that in large samples, all backtests display rejection rates close to the respective nominal size for all considered DGPs. However, in small samples, the ESR tests based on the misspecification-robust covariance estimator exhibit much better sizes than the equivalent ESR tests which do not account for potential misspecification. As this holds both for DGPs which do and for DGPs which do not generate misspecification under the null, this indicates that the misspecification-robust covariance estimator better approximates the finite-sample distribution and should consequently be applied in empirical applications. We further find that the Strict ESR test and the Auxiliary ESR test perform very similarly throughout all considered DGPs. This implies that the indirect misspecification introduced by the Strict ESR test is negligible for realistic financial data. Even for the AR-GARCH model with increasing AR parameter $\phi$, the size properties of the Strict and the Intercept ESR tests are not adversely affected by the increasing degree of misspecification.
Of the four competitor backtests, the general CC test, the ER test and its standardized version exhibit satisfactory sizes, whereas the simple CC test is severely oversized, especially in small samples.

For a comparison of the power of the backtests, we evaluate their ability to reject the null hypothesis for risk models producing incorrect ES forecasts. We use the Historical Simulation (HS) approach, which forecasts the VaR and ES by their empirical counterparts from previous trading days,

$$\hat v_t = \hat Q_\alpha(Y_{t-1}, Y_{t-2}, \dots, Y_{t-w}) \quad\text{and}\quad \hat e_t = \frac{\sum_{i=1}^w Y_{t-i}\,\mathbb 1_{\{Y_{t-i}\le\hat v_{t-i}\}}}{\sum_{i=1}^w \mathbb 1_{\{Y_{t-i}\le\hat v_{t-i}\}}}, \tag{3.7}$$

where $\hat Q_\alpha$ is the empirical $\alpha$-quantile and $w$ is the length of a rolling window, which we set to 250, i.e., one year of data. Since the standardized ER and the general CC backtests require volatility forecasts, we estimate this quantity by the sample standard deviation of the returns over the same rolling window. For a meaningful and fair comparison of the power of the backtests, we compare their size-adjusted power⁶ (Lloyd, 2005). For this, the original critical values of the tests are modified such that the rejection frequencies of the true model equal the nominal test sizes. The size-adjusted power is then given by the rejection frequencies of the alternative models using these modified critical values. The left panels of Figure 1 and Figure 2 contain the size-adjusted power of the backtests for all empirical sizes in the unit interval, for a sample size of 1000 and for the different DGPs.⁷ The black line depicts the case of equal empirical size and power, which can be seen as a lower bound for any reasonable test: whenever the power lies below this line, randomly guessing the test decision is more accurate than performing the test.
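The size-adjustment step can be sketched as follows. The sketch assumes Monte Carlo draws of a test statistic under the true model (null) and under an alternative, and a test that rejects for large values, as the Wald-type statistics do. For the one-sided Intercept ESR t-test, one would pass the negated statistics. The function name and the toy draws are hypothetical.

```python
import numpy as np


def size_adjusted_power(stat_null, stat_alt, size):
    """Rejection rate under the alternative, using the critical value that makes
    the empirical rejection rate under the null equal to `size`."""
    crit = np.quantile(np.asarray(stat_null, dtype=float), 1.0 - size)
    return float(np.mean(np.asarray(stat_alt, dtype=float) > crit))


null_draws = np.arange(100.0)           # toy null statistics 0, 1, ..., 99
alt_draws = np.arange(50.0, 150.0)      # toy alternative statistics, shifted upwards
power_5 = size_adjusted_power(null_draws, alt_draws, size=0.05)
```

Evaluating this function on a grid of sizes in the unit interval traces out ROC-type curves like those in the left panels of Figure 1 and Figure 2.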
For the three ESR backtests, we only report power for the tests relying on the misspecification-robust covariance estimator, as these versions exhibit superior size properties for all considered DGPs. We observe that throughout all six considered DGPs, the three ESR backtests clearly dominate the four competitors in terms of power at almost all empirical sizes, including the most relevant region of test sizes between 1% and 10%. In particular, the Strict and the Auxiliary ESR tests exhibit substantially larger power. In order to present results for all considered sample sizes in condensed form for the relevant range of empirical sizes between 1% and 10%, we summarize the size-adjusted power by the partial area under the curve (PAUC), as proposed by Lloyd (2005). For that, we numerically compute the area under each power curve for empirical sizes between 1% and 10%, which can be interpreted as the test power averaged over the different test sizes. In the right-hand panels of Figure 1 and Figure 2, we present the PAUC for all backtests, DGPs and sample sizes. As expected, the average power increases with the sample size, i.e., using more information leads to more reliable decisions about the quality of a forecast. We find that for all considered sample sizes, the ESR backtests dominate the other testing approaches. This dominance is especially pronounced for the Strict and the Auxiliary ESR tests.

⁶ A comparison of the raw power, i.e., the raw rejection rate of the null hypothesis, could be misleading due to the differences in the empirical sizes of the backtests. In particular, an oversized test would exhibit unrealistically large rejection rates.

⁷ These plots are known as receiver operating characteristic (ROC) curves and originate from the psychometrics literature (Lloyd, 2005). They are an effective presentation method for general binary classification tasks such as hypothesis testing, as they show the size-adjusted power simultaneously for all significance levels.

Table 1: Empirical sizes for the first simulation study.

| DGP | Sample size | Str. ESR (m) | Aux. ESR (m) | Int. ESR (m) | Str. ESR (classical) | Aux. ESR (classical) | Int. ESR (classical) | Gen. CC | Sim. CC | Std. ER | ER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EGARCH-STD | 250 | 0.09 | 0.09 | 0.14 | 0.24 | 0.25 | 0.16 | 0.08 | 0.29 | 0.07 | 0.09 |
| | 500 | 0.06 | 0.07 | 0.10 | 0.15 | 0.15 | 0.11 | 0.10 | 0.20 | 0.04 | 0.07 |
| | 1000 | 0.05 | 0.05 | 0.08 | 0.11 | 0.11 | 0.08 | 0.09 | 0.14 | 0.05 | 0.07 |
| | 2500 | 0.04 | 0.04 | 0.06 | 0.06 | 0.06 | 0.06 | 0.07 | 0.09 | 0.05 | 0.06 |
| | 5000 | 0.04 | 0.04 | 0.07 | 0.06 | 0.06 | 0.07 | 0.06 | 0.08 | 0.05 | 0.06 |
| GAS-STD | 250 | 0.10 | 0.10 | 0.14 | 0.26 | 0.26 | 0.15 | 0.07 | 0.28 | 0.07 | 0.08 |
| | 500 | 0.07 | 0.08 | 0.10 | 0.16 | 0.16 | 0.11 | 0.10 | 0.20 | 0.06 | 0.06 |
| | 1000 | 0.06 | 0.06 | 0.07 | 0.11 | 0.11 | 0.07 | 0.09 | 0.14 | 0.06 | 0.07 |
| | 2500 | 0.05 | 0.05 | 0.06 | 0.08 | 0.08 | 0.06 | 0.07 | 0.10 | 0.06 | 0.06 |
| | 5000 | 0.04 | 0.05 | 0.08 | 0.06 | 0.06 | 0.08 | 0.06 | 0.08 | 0.06 | 0.06 |
| GAS-SSTD | 250 | 0.09 | 0.09 | 0.13 | 0.25 | 0.25 | 0.15 | 0.07 | 0.26 | 0.08 | 0.07 |
| | 500 | 0.06 | 0.06 | 0.10 | 0.15 | 0.15 | 0.10 | 0.09 | 0.18 | 0.06 | 0.05 |
| | 1000 | 0.05 | 0.05 | 0.07 | 0.10 | 0.10 | 0.07 | 0.08 | 0.13 | 0.07 | 0.06 |
| | 2500 | 0.04 | 0.04 | 0.06 | 0.06 | 0.06 | 0.06 | 0.07 | 0.09 | 0.06 | 0.05 |
| | 5000 | 0.04 | 0.04 | 0.06 | 0.05 | 0.05 | 0.06 | 0.07 | 0.07 | 0.06 | 0.05 |
| AR-GARCH, φ = 0.0 | 250 | 0.05 | 0.04 | 0.11 | 0.18 | 0.18 | 0.13 | 0.06 | 0.22 | 0.06 | 0.07 |
| | 500 | 0.04 | 0.04 | 0.09 | 0.12 | 0.12 | 0.09 | 0.07 | 0.14 | 0.04 | 0.04 |
| | 1000 | 0.03 | 0.04 | 0.07 | 0.09 | 0.09 | 0.07 | 0.07 | 0.10 | 0.04 | 0.04 |
| | 2500 | 0.03 | 0.03 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 |
| | 5000 | 0.04 | 0.04 | 0.05 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 |
| AR-GARCH, φ = 0.1 | 250 | 0.05 | 0.05 | 0.11 | 0.18 | 0.18 | 0.13 | 0.06 | 0.22 | 0.06 | 0.07 |
| | 500 | 0.04 | 0.04 | 0.09 | 0.12 | 0.12 | 0.09 | 0.07 | 0.14 | 0.04 | 0.04 |
| | 1000 | 0.04 | 0.04 | 0.07 | 0.09 | 0.09 | 0.07 | 0.07 | 0.10 | 0.04 | 0.04 |
| | 2500 | 0.03 | 0.03 | 0.06 | 0.07 | 0.07 | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 |
| | 5000 | 0.04 | 0.04 | 0.05 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 |
| AR-GARCH, φ = 0.5 | 250 | 0.04 | 0.04 | 0.11 | 0.17 | 0.17 | 0.13 | 0.06 | 0.22 | 0.06 | 0.07 |
| | 500 | 0.04 | 0.04 | 0.09 | 0.12 | 0.12 | 0.09 | 0.07 | 0.14 | 0.04 | 0.04 |
| | 1000 | 0.04 | 0.04 | 0.07 | 0.09 | 0.09 | 0.07 | 0.07 | 0.10 | 0.04 | 0.04 |
| | 2500 | 0.04 | 0.04 | 0.06 | 0.07 | 0.07 | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 |
| | 5000 | 0.04 | 0.04 | 0.05 | 0.06 | 0.06 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 |

Notes: The table reports the empirical sizes of the backtests for the different DGPs described in Section 3.1 and for a nominal test size of 5%. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is α = 2.5%. ESR refers to the three backtests introduced in this paper; (m) denotes the versions with the misspecification-robust covariance estimator and (classical) the versions with the covariance estimator that does not account for model misspecification. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000).

[Figure 1, panels: (a) EGARCH: Size-adjusted Power; (b) EGARCH: Partial Area Under the Curve; (c) GAS-STD: Size-adjusted Power; (d) GAS-STD: Partial Area Under the Curve; (e) GAS-SSTD: Size-adjusted Power; (f) GAS-SSTD: Partial Area Under the Curve. Left panels: Empirical Power of the Test against Empirical Size of the Test. Right panels: Partial Area Under the Curve against Sample Size (250 to 5000). Legend: Str. ESR (m), Aux. ESR (m), Int. ESR (m), General CC, Simple CC, Std. ER, ER.]

Figure 1: Size-adjusted power and Partial Area Under the Curve plots against Historical Simulation for a sample size of 1000 days. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is α = 2.5%. ESR refers to the backtests introduced in this paper, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000).

[Figure 2, panels: (a) AR-GARCH φ = 0: Size-adjusted Power; (b) AR-GARCH φ = 0: Partial Area Under the Curve; (c) AR-GARCH φ = 0.1: Size-adjusted Power; (d) AR-GARCH φ = 0.1: Partial Area Under the Curve; (e) AR-GARCH φ = 0.5: Size-adjusted Power; (f) AR-GARCH φ = 0.5: Partial Area Under the Curve. Axes and legend as in Figure 1.]

Figure 2: Size-adjusted power and Partial Area Under the Curve plots against Historical Simulation for a sample size of 1000 days. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is α = 2.5%. ESR refers to the backtests introduced in this paper, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000).

Table 2: Empirical sizes for the second simulation study.

| Hypothesis | Str. ESR (m) | Aux. ESR (m) | Int. ESR (m) | Str. ESR (classical) | Aux. ESR (classical) | Int. ESR (classical) | General CC | Simple CC | Std. ER | ER |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-sided | 0.07 | 0.07 | 0.06 | 0.05 | 0.05 | 0.06 | 0.07 | 0.09 | 0.05 | 0.05 |
| One-sided | – | – | 0.03 | – | – | 0.03 | 0.02 | 0.03 | 0.06 | 0.06 |

Notes: This table shows the empirical sizes of the backtests for the GARCH(1,1)-t model given in (3.8), for a nominal test size of 5% and for both one-sided and two-sided hypotheses. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is α = 2.5%. ESR refers to the backtests introduced in this paper; (m) and (classical) denote the misspecification-robust and the classical covariance estimators. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000). Note that the Strict and Auxiliary ESR tests do not permit testing against a one-sided alternative; therefore, we only present their sizes for the two-sided hypothesis.
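A minimal sketch of the PAUC summary follows. It uses the same kind of Monte Carlo statistics as the size-adjustment step and a test that rejects for large values. The area is normalized by the width of the size interval so that it reads as average power; whether the figures use exactly this normalization is our assumption. The function name and the toy draws are hypothetical. A test that is no better than random guessing scores about 0.055 on the interval from 1% to 10%.

```python
import numpy as np


def partial_auc(stat_null, stat_alt, lo=0.01, hi=0.10, n_grid=200):
    """Average size-adjusted power over empirical sizes in [lo, hi] (trapezoidal rule)."""
    sizes = np.linspace(lo, hi, n_grid)
    crit = np.quantile(np.asarray(stat_null, dtype=float), 1.0 - sizes)
    alt = np.asarray(stat_alt, dtype=float)
    power = (alt[:, None] > crit[None, :]).mean(axis=0)     # power at each size
    area = np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(sizes))
    return float(area / (hi - lo))      # normalized so that a perfect test scores 1


draws = np.arange(10000.0)
pauc_uninformative = partial_auc(draws, draws)        # power equals size: about 0.055
pauc_perfect = partial_auc(draws, draws + 1e6)        # always rejects: 1.0
```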
The almost identical performance of the Strict and the Auxiliary ESR tests throughout all simulation designs in Figure 1 and Figure 2 suggests that the misspecification introduced by the Strict ESR test is unproblematic for realistic financial data.

3.2. Continuous Model Misspecification

In the second simulation study, we use a GARCH(1,1) model with standardized Student-t distributed innovations,

$$Y_t = \sigma_t z_t, \quad\text{where}\quad z_t \sim t_\nu, \quad\text{and}\quad \sigma_t^2 = \omega_0 + \omega_1 Y_{t-1}^2 + \omega_2 \sigma_{t-1}^2, \tag{3.8}$$

with the parameter values $\omega_0 = 0.01$, $\omega_1 = 0.1$, $\omega_2 = 0.85$ and $\nu = 5$ for the true model. For the analysis of the backtests, we simulate 10,000 times from this model with a fixed sample size of 2500 observations and consider the probability level $\alpha = 2.5\%$ for the VaR and the ES. Table 2 presents the empirical sizes of the backtests for a nominal size of 5% for both the two-sided and the one-sided hypotheses. As in the first simulation study, we find that most of the backtests are reasonably sized, with rejection frequencies close to the nominal value.

For a detailed analysis of the power of the backtests, we continuously misspecify the true model according to the following five designs:

(a) We misspecify how the conditional variance reacts to squared returns by varying the ARCH parameter $\omega_1$. We choose $\tilde\omega_1$ between 0.03 and 0.2 and let $\tilde\omega_2 = 0.95 - \tilde\omega_1$, such that the persistence of the GARCH process remains constant. When $\tilde\omega_1 < \omega_1$, there is too little variation in the ES forecasts due to the reduced response to shocks, and the GARCH process approaches a constant volatility model.
(b) We alter the unconditional variance of the GARCH process, $\mathbb E[\sigma_t^2] = \omega_0/(1 - \omega_1 - \omega_2)$, between 0.5 and 0.01 by varying the parameter $\omega_0$ while holding $\omega_1$ and $\omega_2$ constant.
Since the conditional variance is a weighted combination of the unconditional variance, the past squared returns and the past conditional variance, this change implies that the ES forecasts are too conservative when the unconditional variance is larger than its true value, and vice versa.
(c) We vary the persistence of shocks between 0.9 and 0.999 by setting $\tilde\gamma_1 = c\gamma_1$ and $\tilde\gamma_2 = c\gamma_2$ for a varying constant $c$, and by setting $\tilde\gamma_0 = E[\sigma_t^2](1 - \tilde\gamma_1 - \tilde\gamma_2)$ in order to stabilize the unconditional variance. A higher persistence causes a stronger and longer reaction to shocks.
(d) We vary the degrees of freedom of the underlying Student-t distribution between 3 and ∞. Since the conditional variance is unaffected, this modification implies a relative horizontal shift of the ES forecasts.
(e) We misspecify the probability level $\tilde\alpha$ of the ES forecasts between 0.5% and 5%. This represents the scenario that a forecaster submits (accidentally or on purpose) predictions for some level $\tilde\alpha \neq \alpha$. Similar to changing the degrees of freedom, this modification implies a relative horizontal shift of the ES forecasts.

As an illustrative example of these misspecifications, Figures 5a to 5e in Appendix D depict 250 realizations of the returns of the true DGP in (3.8), together with the corresponding ES forecasts of the true model (black dashed line) and of two exemplary models following the parameter misspecifications described in points (a) to (e) above. We present the size-adjusted rejection rates plotted against the respective misspecified parameters for these five designs in Figures 3a to 3e. The true model is indicated by the gray vertical line and, in line with the results of Figure 5 in Appendix D, the x-axis is oriented such that too risky (too small in absolute value) ES forecasts are on the right side of the true model.⁸ Even though no backtest dominates the others throughout all considered designs, several conclusions can be drawn from this figure.
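The DGP (3.8) and the parameter maps of designs (a) to (c) can be sketched with the Python standard library only. The names `g0`, `g1`, `g2` are our labels for the constant, ARCH and GARCH parameters of (3.8); the Student-t draws are generated as a normal variate divided by the square root of a scaled chi-square variate and rescaled to unit variance, and the burn-in length is an illustrative choice, not a setting reported in the paper. Designs (d) and (e) only change ν or the probability level of the forecasting model and are therefore not shown.

```python
import math
import random

# True parameters of (3.8) as quoted in the text; variable names are ours.
GAMMA0, GAMMA1, GAMMA2, NU = 0.01, 0.10, 0.85, 5.0
PERSISTENCE = GAMMA1 + GAMMA2                  # 0.95
UNCOND_VAR = GAMMA0 / (1.0 - PERSISTENCE)      # E[sigma_t^2], 0.2 up to rounding


def standardized_t(rng, nu):
    """Student-t draw with nu > 2 degrees of freedom, rescaled to unit variance."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(nu/2, scale 2)
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)


def simulate_garch_t(n, g0, g1, g2, nu, seed=0, burn_in=500):
    """Simulate Y_t = sigma_t z_t, sigma_t^2 = g0 + g1 Y_{t-1}^2 + g2 sigma_{t-1}^2."""
    rng = random.Random(seed)
    sigma2 = g0 / (1.0 - g1 - g2)  # start at the unconditional variance
    y_prev = 0.0
    ys, sigmas = [], []
    for t in range(n + burn_in):
        if t > 0:
            sigma2 = g0 + g1 * y_prev ** 2 + g2 * sigma2
        y_prev = math.sqrt(sigma2) * standardized_t(rng, nu)
        if t >= burn_in:  # discard the burn-in period
            ys.append(y_prev)
            sigmas.append(math.sqrt(sigma2))
    return ys, sigmas


def design_a(g1_tilde):
    """(a) Vary the ARCH parameter while keeping the persistence at 0.95."""
    return GAMMA0, g1_tilde, PERSISTENCE - g1_tilde


def design_b(uncond_var):
    """(b) Vary the unconditional variance through g0, keeping g1 and g2."""
    return uncond_var * (1.0 - GAMMA1 - GAMMA2), GAMMA1, GAMMA2


def design_c(persistence):
    """(c) Scale g1, g2 by c and reset g0 so that E[sigma_t^2] stays fixed."""
    c = persistence / PERSISTENCE
    g1, g2 = c * GAMMA1, c * GAMMA2
    return UNCOND_VAR * (1.0 - g1 - g2), g1, g2
```

For example, `simulate_garch_t(2500, *design_c(0.99), NU)` draws one sample path from a misspecified, more persistent model with the same unconditional variance as the true one.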
(1) Overall, the Strict and Auxiliary ESR tests perform almost indistinguishably, and in four out of the five considered designs their performance is superior to that of the general CC and both ER backtesting approaches (Figures 3a to 3c and 3e). The ESR backtests outperform the competitors especially when we misspecify the volatility dynamics of the underlying GARCH process (Figures 3a to 3c). This shows that, in contrast to the existing approaches, our ESR backtests can detect misspecifications in the dynamics used to construct the ES forecasts which go beyond level shifts.
(2) The two ER tests (and the general CC test, which is constructed to be similar to the ER backtest) can hardly discriminate between the true VaR and ES forecasts and those issued through misspecified volatility processes (Figures 3a to 3c) or through misspecified probability levels $\tilde\alpha \neq \alpha$ (Figure 3e). This confirms the theoretical results discussed in Section C.1 of Appendix C that these backtests only reject misspecifications which affect the relation (distance) between the VaR and ES forecasts. In contrast, these backtests perform well in the case of misspecified tails of the residual distribution, which particularly affects the relative distance between the VaR and ES forecasts (Figure 3d). If these backtests were used by the regulatory authorities, banks could submit joint VaR and ES forecasts for some level $\tilde\alpha > \alpha$ or some (too small) volatility process in order to minimize their capital requirements without facing the risk of being detected by these backtests. In comparison, our Intercept ESR backtest, which is similar to the ER backtests by construction, is clearly able to identify these misspecified probability levels.
(3) Throughout all five misspecifications, the simple CC backtest also exhibits good power properties, similar to our proposed backtests.
However, our three ESR backtests exhibit much better size properties (see Table 1 and Table 2) and, in contrast to the simple CC test, they do not fail to reject the HS forecasts in the first simulation study (see Figure 1).

⁸ Notice that this inequality of the forecast magnitude only holds on average in the cases of Figures 3a and 3c, whereas it holds strictly for Figures 3b, 3d and 3e.

[Figure 3, panels: (a) Changing the ARCH parameter; (b) Changing the unconditional variance; (c) Changing the persistence; (d) Changing the degrees of freedom of the Student-t; (e) Changing the probability level (in %). Y-axis: rejection rate. Legend: Str. ESR (m), Aux. ESR (m), Int. ESR (m), General CC, Simple CC, Std. ER, ER.]
Figure 3: Size-adjusted rejection rates for various types of misspecification. The gray vertical line depicts the true model. The number of Monte Carlo repetitions is 10,000 and the probability level for the risk measures is α = 2.5%. ESR refers to the backtests introduced in this paper, with (m) indicating the versions which account for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000).
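Size-adjusted rejection rates remove size distortions before power is compared. A common construction, which we assume here because the exact procedure is not spelled out in this excerpt, takes the critical value from the simulated distribution of the test statistic under the true model (3.8) and counts how often the statistics computed under a misspecified model exceed it. The sketch assumes that large statistics lead to rejection (as for Wald-type χ² statistics); the function name is hypothetical.

```python
import math


def size_adjusted_rejection_rate(null_stats, alt_stats, level=0.05):
    """Share of alternative statistics above the empirical (1 - level) quantile
    of the statistics simulated under the true model.

    Assumes that large values of the statistic lead to rejection.
    """
    s = sorted(null_stats)
    n_exceed = int(math.floor(level * len(s)))  # null draws allowed above crit
    crit = s[len(s) - n_exceed - 1]             # size-adjusted critical value
    return sum(x > crit for x in alt_stats) / len(alt_stats)
```

By construction, applying the function to the null statistics themselves returns the nominal level (up to ties), so every backtest is compared at the same effective size.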
Together with the results from the first simulation study, these findings demonstrate that our proposed ESR backtests are a powerful choice for backtesting ES forecasts. They are reasonably sized and exhibit good power properties against a variety of misspecifications. Notably, in contrast to the existing backtests, there is no type of misspecification among those considered for which our ESR tests are unable to discriminate between forecasts of the true and the misspecified models.

3.3. Testing One-Sided Hypotheses

For the regulatory authorities, testing against a one-sided alternative might be more meaningful than the two-sided versions of the tests we consider in the previous sections. Holding more money than stipulated by the Basel Accords is of no concern to regulators, as it is only important that banks keep enough monetary reserves to cover the risks from their market activities. In the following, we assess the performance of the Intercept ESR backtest and the one-sided versions of the four competitor backtests in rejecting the null hypothesis that the issued ES forecasts are at least as conservative (not smaller in absolute value) as the true ES, i.e. that the associated market risk is not underestimated. In Figures 4a to 4e, we present the size-adjusted rejection rates for the one-sided versions of the considered backtests and for the five continuous parameter misspecifications described in points (a) to (e) of the previous section. The structure of these figures is analogous to the two-sided case, where the x-axis is oriented such that too risky ES forecasts are on the right side of the true model (vertical gray line).
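The difference between the two hypotheses can be illustrated with an asymptotically standard normal test statistic $z$. The sketch below assumes, as a hypothetical sign convention, that negative values of $z$ indicate too risky ES forecasts; the one-sided p-value then only becomes small for such forecasts, whereas the two-sided p-value also flags too conservative ones. The function name is illustrative and not taken from the paper.

```python
from statistics import NormalDist


def p_values(z):
    """Two-sided and one-sided p-values for an asymptotically N(0, 1) statistic.

    Hypothetical sign convention: z < 0 means the ES forecasts are too risky,
    so the one-sided test (H0: forecasts at least as conservative as the
    true ES) rejects only for large negative z.
    """
    phi = NormalDist().cdf
    return {
        "two_sided": 2.0 * (1.0 - phi(abs(z))),
        "one_sided": phi(z),
    }


# Too risky forecasts (z = -2.5) versus too conservative forecasts (z = +2.5):
print(p_values(-2.5))  # both p-values are small
print(p_values(2.5))   # two-sided p-value small, one-sided close to 1
```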
As can be seen in Figures 5a to 5e in Appendix D, the five modifications of the true model exhibit clear patterns of when they issue too risky or too conservative forecasts of the true ES; this holds strictly for cases (b), (d) and (e), and on average for cases (a) and (c). Thus, the one-sided backtests should only reject the null hypothesis for too risky (too small in absolute value) ES forecasts, i.e. those on the right side of the true model in Figures 4a to 4e. We find that our Intercept ESR backtest is reasonably sized (compare Table 2) and clearly dominates the ER and the CC tests in terms of power in four out of the five misspecification designs. Only when altering the degrees of freedom are the ER tests slightly more powerful than the Intercept ESR test. Surprisingly, in four out of the five cases, the one-sided CC tests (both the simple and the general version) also reject too conservative ES forecasts, even though these should not be rejected under the specification of the one-sided tests. Furthermore, as for the two-sided tests, both ER backtests fail to detect misspecifications of the underlying volatility process and of the underlying probability level. In summary, the proposed Intercept ESR backtest is a powerful backtest with good size properties for testing one-sided hypotheses, and it clearly dominates the existing one-sided (joint VaR and ES) backtesting techniques in the literature.

4. Empirical Application

In the empirical application, we apply our backtests to compare ES forecasts along three dimensions: the complexity of the risk model, the length of the estimation window, and the model refit frequency.
From a practitioner's point of view, it would be desirable to have a parsimonious model that can be estimated with few observations and remains valid over a long period of time, as this keeps the engineering effort, the data storage, and the human and computational effort for updating the model low. To assess whether such a setup is reasonable and, if not, which dimensions are crucial for a good performance, we compare rejection rates of ES forecasts using our backtests. For this application, we use daily log returns of the 200 most highly capitalized stocks of the S&P 500 index (as of September 1, 2019) with a sufficiently long history of stock prices. We consider four different risk models: the standard GARCH(1,1) of Bollerslev (1986) and the GJR-GARCH(1,1) model of Glosten et al.

[Figure 4, panels: (a) Changing the reaction to the squared returns (ARCH parameter); (b) Changing the unconditional variance; (c) Changing the persistence; (d) Changing the degrees of freedom of the Student-t; (e) Changing the probability level (in %). Y-axis: rejection rate. Legend: Int. ESR (m), General CC, Simple CC, Std. ER, ER.]
Figure 4: Size-adjusted rejection rates for various types of misspecification with a one-sided hypothesis. The gray vertical line depicts the true model. The number of Monte Carlo repetitions is 10,000 and the probability level for the risk measures is α = 2.5%.
ESR refers to the backtests introduced in this paper, with (m) indicating the version which accounts for the additional covariance terms induced by the misspecified model. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residuals tests of McNeil and Frey (2000).

(1993), both coupled with Gaussian and Student-t distributed innovations. For all four models and 200 stocks, we use the same evaluation horizon, the period from January 2010 to August 2019, with a total of 2432 daily observations. We furthermore consider five different lengths of the rolling estimation window, ranging from one year (250 trading days) up to eight years (2000 trading days), and refit horizons of 5, 21, 62, 125 and 250 days, corresponding to weekly, monthly, quarterly, semi-annual and yearly updating of the models. Table 3 presents the rejection rates of the one-sided Intercept ESR backtest with a nominal size of 5% for the 200 stocks under investigation, for the four GARCH specifications, the five estimation window sizes and the five refit frequencies. We choose the one-sided Intercept test as it is the only one-sided and strict ES backtest in the literature. Given the currently implemented traffic light system of the Basel Committee, this one-sided test might be the one with the highest practical relevance for backtesting the ES.
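The combination of a rolling estimation window and a refit frequency can be written as a simple schedule. In the sketch below (the function name and the 0-based index conventions are hypothetical), the model is re-estimated every `refit_every` days on the preceding `window` observations and its parameters are then used for one-step-ahead forecasts until the next refit; how the fixed parameters are applied between refits is not modeled here.

```python
def refit_schedule(n_obs, n_eval, window, refit_every):
    """Return one tuple (fit_start, fit_end, first_day, end_day) per refit.

    The model is estimated on observations [fit_start, fit_end) and used for
    one-step-ahead forecasts of days [first_day, end_day). The evaluation
    period consists of the last n_eval of the n_obs observations.
    """
    start_eval = n_obs - n_eval
    if start_eval < window:
        raise ValueError("not enough history for the first estimation window")
    plan = []
    for s in range(start_eval, n_obs, refit_every):
        plan.append((s - window, s, s, min(s + refit_every, n_obs)))
    return plan


# Eight-year window (2000 days), quarterly refits (62 days), 2432 evaluation days:
plan = refit_schedule(2000 + 2432, 2432, window=2000, refit_every=62)
print(len(plan))  # number of re-estimations over the evaluation period
```

With 2432 evaluation days, quarterly refitting requires 40 re-estimations, whereas weekly refitting (5 days) requires 487, which illustrates the computational trade-off discussed in the text.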
Table 3: Results of the empirical application

                    GARCH-N                            GJR-GARCH-N
Rolling window      Refit frequency                    Refit frequency
                    5     21    62    125   250        5     21    62    125   250
250                 1.00  1.00  1.00  1.00  1.00       1.00  1.00  1.00  1.00  1.00
500                 0.99  0.99  0.99  0.99  0.99       1.00  1.00  1.00  0.99  1.00
1000                0.98  0.97  0.98  0.96  0.96       0.99  0.99  0.99  0.99  0.99
1500                0.97  0.97  0.97  0.97  0.96       0.98  0.99  0.99  0.99  0.99
2000                0.96  0.96  0.94  0.95  0.94       0.98  0.98  0.98  0.98  0.97

                    GARCH-t                            GJR-GARCH-t
250                 0.26  0.32  0.32  0.36  0.59       0.28  0.32  0.29  0.37  0.42
500                 0.10  0.09  0.12  0.12  0.20       0.13  0.17  0.17  0.22  0.25
1000                0.07  0.07  0.07  0.08  0.09       0.14  0.11  0.11  0.12  0.14
1500                0.09  0.09  0.09  0.10  0.09       0.08  0.10  0.09  0.09  0.09
2000                0.10  0.08  0.08  0.08  0.09       0.09  0.09  0.10  0.09  0.10

Notes: This table shows the rejection rates of the one-sided ESR backtest for ES forecasts stemming from the two GARCH-type models with Student-t and Gaussian residuals, for different rolling window sizes and model refit lengths (in days). The rejection frequencies are averaged over the 200 most highly capitalized stocks of the S&P 500 index analyzed. The out-of-sample window covers the time from Jan. 2010 to Aug. 2019, resulting in a sample size of 2432 days.

The results show that both the GARCH-N and the GJR-GARCH-N are rejected for almost all stocks (in at least 94% of the cases), uniformly over the different estimation sample sizes and refit frequencies. Independent of the sample length and refit frequency, this supports the well-known finding that Gaussian residuals generally fail to capture the riskiness of financial assets, especially in the tail of the distribution. In contrast, for the two GARCH specifications with Student-t distributed innovations, the rejection frequencies are considerably lower and, for many choices of refit frequencies and estimation windows, they are just above the nominal significance level. These results imply that using fat-tailed distributions generally decreases the rejection frequency.
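The recommendation on window length and refit frequency can be checked directly against the Student-t panels of Table 3. The short script below copies those entries and computes the largest rejection rate for a given minimum window and maximum refit interval (the helper name is ours).

```python
REFITS = (5, 21, 62, 125, 250)  # refit frequency in days (table columns)

# Student-t panels of Table 3: rolling window (days) -> rejection rates
GARCH_T = {
    250: (0.26, 0.32, 0.32, 0.36, 0.59),
    500: (0.10, 0.09, 0.12, 0.12, 0.20),
    1000: (0.07, 0.07, 0.07, 0.08, 0.09),
    1500: (0.09, 0.09, 0.09, 0.10, 0.09),
    2000: (0.10, 0.08, 0.08, 0.08, 0.09),
}
GJR_GARCH_T = {
    250: (0.28, 0.32, 0.29, 0.37, 0.42),
    500: (0.13, 0.17, 0.17, 0.22, 0.25),
    1000: (0.14, 0.11, 0.11, 0.12, 0.14),
    1500: (0.08, 0.10, 0.09, 0.09, 0.09),
    2000: (0.09, 0.09, 0.10, 0.09, 0.10),
}


def worst_rate(panel, min_window, max_refit):
    """Largest rejection rate over windows >= min_window and refits <= max_refit."""
    return max(rate
               for window, row in panel.items() if window >= min_window
               for refit, rate in zip(REFITS, row) if refit <= max_refit)


# More than four years of data (1500 or 2000 days), refits at least quarterly:
print(max(worst_rate(p, 1500, 62) for p in (GARCH_T, GJR_GARCH_T)))
# Exactly four years (1000 days) is not sufficient for the GJR-GARCH-t model:
print(worst_rate(GJR_GARCH_T, 1000, 62))
```

The first value stays below 11%, while including the 1000-day window raises the worst case for the GJR-GARCH-t model to 14%, which is why the recommendation requires more than four years of data.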
Similarly, refitting the models more frequently tends to decrease the rejection frequency, especially if the estimation window is short. However, refitting the model on a monthly or even weekly basis is not required: infrequent regular updates (such as quarterly) suffice, as more frequent updates do not improve performance if the estimation window is large enough. Interestingly, the GJR-GARCH model, which accounts for a potential leverage effect in the volatility, does not perform better than the standard GARCH model.

Generally, refitting the Student-t models at least quarterly using more than four years of past data (1500 or 2000 trading days) suffices to obtain rejection rates uniformly below 11%. The results of this application, which are diversified over 200 individual stocks, indicate that in order to obtain satisfactory ES forecasts, one should use fat-tailed residual distributions, more than four years of data, and regular refitting of the models at least once per quarter.

5. Conclusion

With the upcoming implementation of the third Basel Accord, risk managers and regulators will shift their attention to the risk measure Expected Shortfall (ES) for the forecasting and evaluation of financial risks. In this paper, we introduce regression-based ESR backtests for ES forecasts, which extend the classical Mincer and Zarnowitz (1969) test to ES-specific versions. As estimating regression parameters for the ES stand-alone is infeasible, our tests build on a recently developed joint VaR and ES regression, which allows for different specifications of our tests, referred to as the Auxiliary, Strict, and Intercept ESR backtests. As these tests are potentially subject to model misspecification, we extend the asymptotic theory for the joint VaR and ES regression framework to possibly misspecified models and verify the tests' performance in finite samples through an extensive simulation study.
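For reference, the classical Mincer and Zarnowitz (1969) regression for mean forecasts regresses the realizations on an intercept and the forecasts and tests whether the intercept is zero and the slope is one. The sketch below only computes the OLS coefficients with the standard library (function name hypothetical); it is not the ESR backtest, which replaces OLS by the joint VaR and ES regression so that the coefficients describe the conditional ES instead of the conditional mean.

```python
def mincer_zarnowitz(y, f):
    """OLS fit of y_t = b0 + b1 * f_t + u_t; returns (b0, b1).

    For well-calibrated mean forecasts, b0 should be close to 0 and b1 to 1.
    """
    n = len(y)
    if n != len(f) or n < 2:
        raise ValueError("y and f must have the same length >= 2")
    f_bar = sum(f) / n
    y_bar = sum(y) / n
    s_ff = sum((fi - f_bar) ** 2 for fi in f)
    s_fy = sum((fi - f_bar) * (yi - y_bar) for fi, yi in zip(f, y))
    b1 = s_fy / s_ff
    b0 = y_bar - b1 * f_bar
    return b0, b1


# Forecasts that are systematically off: y = 1 + 2 f exactly.
print(mincer_zarnowitz([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0]))
```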
We apply our tests to 200 stocks from the S&P 500 index in order to analyze the performance of ES forecasts stemming from the GARCH model family. We find that using fat-tailed (Student-t) residual distributions, more than four years of data and regular refitting of the models at least once per quarter yields satisfactory ES forecasts.

A unique and essential feature of the Strict and Intercept ESR backtests is that they solely require forecasts for the ES and are consequently the first backtests for the ES stand-alone. In contrast, a common drawback of the existing backtests in the literature is that they need forecasts of further input parameters, such as the VaR, the volatility, the tail distribution or even the whole return distribution. Using more information than the ES forecasts is problematic for two reasons. First, these tests cannot be applied by the regulatory authorities, who receive forecasts of the ES, but not the additional information required by these tests. Second, rejecting the null hypothesis does not necessarily imply that the ES forecasts are incorrect, as the rejection can be the result of a false prediction of any of the input parameters.

This paper contributes to the ongoing discussion about which risk measure is best in practice in the following way. As the VaR is criticized for not being subadditive and for not capturing tail risks beyond itself, the recent literature proposes both the ES and expectiles as alternative risk measures. Expectiles are suggested as they are coherent, elicitable and able to capture extreme risks beyond the VaR; thus, they simultaneously overcome the drawbacks of the VaR and the ES (Bellini et al., 2014; Ziegel, 2016). Unfortunately, as opposed to the VaR and ES, they lack a visual and intuitive interpretation (Emmer et al., 2015). In contrast, the ES is mainly criticized for its theoretical deficiencies of not being elicitable and not being backtestable (or only with difficulties).
However, starting with the joint elicitability result for VaR and ES of Fissler and Ziegel (2016), there is a growing body of literature using this result for regression procedures (Dimitriadis and Bayer, 2019; Barendse, 2018; Patton et al., 2019a) and for relative forecast comparison (Fissler et al., 2016; Nolde and Ziegel, 2017). This paper extends this literature by introducing the ESR backtests, which are the first sensible backtests for the ES stand-alone. This shows that, even though technically more demanding, the ES can be modeled, evaluated and backtested in the same way as quantiles and expectiles. Combined with its ability to capture extreme tail risks and its intuitive visual interpretation, this makes the ES an appropriate candidate for being the standard risk measure in practice.

Acknowledgments

We thank the editor Andrew Patton, an anonymous associate editor and two referees for very helpful comments. We further thank Tobias Fissler, Lyudmila Grigoryeva, Roxana Halbleib, Phillip Heiler, Ekaterina Kazak, Winfried Pohlmeier, James Taylor, and Johanna Ziegel for suggestions which inspired some results of this paper. Financial support by the Heidelberg Academy of Sciences and Humanities (HAW) within the project "Analyzing, Measuring and Forecasting Financial Risks by means of High-Frequency Data" and by the German Research Foundation (DFG) within the research group "Robust Risk Measures in Real Time Settings" is gratefully acknowledged. The authors acknowledge support by the state of Baden-Württemberg through bwHPC. The majority of the work on this paper was conducted while both authors were at the Department of Economics, Universität Konstanz.

Appendix A Proofs

Proof of Theorem 2.3. We check that the necessary conditions (i) to (iv) of the basic consistency theorem, given in Theorem 2.1 in Newey and McFadden (1994), p. 2121, hold, where we consider the objective functions $\hat Q_T(\theta)$ and $Q_T(\theta)$ as defined in (2.17) and (2.19).
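First, notice that condition (ii) holds by imposing condition (A2). The unique identification condition (i) holds by assumption (A3). Next, we verify the uniform convergence condition (iv) by applying the uniform weak law of large numbers given in Theorem A.2.5 in White (1994). For that, we have to show that
(A) the map $\theta \mapsto \rho(Y_t, X_t, \theta)$ is Lipschitz-$L_1$ on $\Theta$, see Definition A.2.3 in White (1994),⁹ and
(B) for all $\theta^o \in \Theta$, there exists $\delta^o > 0$ such that for all $\delta$ with $0 < \delta < \delta^o$, the sequences
$$\bar\rho_t(\theta^o, \delta) := \sup_{\|\theta - \theta^o\| < \delta} \rho(Y_t, X_t, \theta) \qquad \text{(A.1)}$$
$$\underline\rho_t(\theta^o, \delta) := \inf_{\|\theta - \theta^o\| < \delta} \rho(Y_t, X_t, \theta) \qquad \text{(A.2)}$$
obey a weak law of large numbers.
Condition (A) follows directly from Lemma B.1, and we turn to condition (B). As the process $\{Y_t, V_t, W_t\}$ is strong mixing of size $-r/(r-2)$ for some $r > 2$ by condition (A6), the processes $V_t$ and $W_t$ are strong mixing of the same size by Theorem 3.49 in White (2001), p. 50. As the functions $\rho(Y_t, X_t, \theta)$ and the supremum/infimum functions are $\mathcal{F}_t$-measurable for all $t \in \mathbb{N}$, we can conclude by the same theorem that the sequences $\bar\rho_t(\theta^o, \delta)$ and $\underline\rho_t(\theta^o, \delta)$ are also strong mixing of the same size. Furthermore, for $\tilde r > 1$ and some sufficiently small $\varepsilon > 0$, it holds that $r \ge \tilde r + \varepsilon$, and thus $E|\bar\rho_t(\theta^o, \delta)|^{\tilde r + \varepsilon} \le \sup_{1 \le t \le T} E\big[\sup_{\theta \in \Theta} |\rho(Y_t, X_t, \theta)|^{\tilde r + \varepsilon}\big]$ for all $t$, $1 \le t \le T$, $T \ge 1$, which is finite whenever the corresponding $r$-th moments are. As $\Theta$ is compact, there exists some $c > 0$ such that $\sup_{\theta \in \Theta} \|\theta\| \le c$, and thus, for all $t = 1, \ldots, T$, it holds that
$$E\Big[\sup_{\theta \in \Theta} |\rho(Y_t, X_t, \theta)|^r\Big] \le 4^{r-1}\Big( \Big(\frac{c}{K}\Big)^r \Big(1 + \frac{1}{\alpha}\Big)^r E\|V_t\|^r + \frac{1}{(\alpha K)^r} E\|Y_t\|^r + 1 + \sup_{\theta \in \Theta} E\big|\log(-W_t^\top \theta^e)\big|^r \Big), \qquad \text{(A.3)}$$
which is bounded by condition (A8) and as $|\log(z)| \le z$ for $z$ large enough. The same inequality holds for $|\underline\rho_t(\theta^o, \delta)|$. Thus, we can apply the weak law of large numbers for strong mixing sequences in Corollary 3.48 in White (2001), p. 49, in order to conclude that for all $\theta^o \in \Theta$ and $\delta$ such that $\|\theta - \theta^o\| \le \delta$, it holds that
$$\frac{1}{T}\sum_{t=1}^T \bar\rho_t(\theta^o, \delta) - \frac{1}{T}\sum_{t=1}^T E\big[\bar\rho_t(\theta^o, \delta)\big] \xrightarrow{P} 0 \quad \text{and} \quad \frac{1}{T}\sum_{t=1}^T \underline\rho_t(\theta^o, \delta) - \frac{1}{T}\sum_{t=1}^T E\big[\underline\rho_t(\theta^o, \delta)\big] \xrightarrow{P} 0,$$
which shows condition (B). Consequently, the uniform convergence condition (iv) holds by the uniform weak law of large numbers given in Theorem A.2.5 in White (1994). As we have shown in Lemma B.1 that the map $\theta \mapsto \rho(Y_t, X_t, \theta)$ is Lipschitz-$L_1$, the map $\theta \mapsto Q_T(\theta) = \frac{1}{T}\sum_{t=1}^T E\big[\rho(Y_t, X_t, \theta)\big]$ is also continuous, which shows condition (iii). Thus, we can apply Theorem 2.1 of Newey and McFadden (1994), which concludes the proof of this theorem.

⁹ Notice that we do not have a double index, and thus we suppress the $n$ in the notation of White (1994). Furthermore, we apply the definition by using the identity function for $a_n$.

Proof of Theorem 2.4. Let
$$\psi(Y_t, X_t, \theta) = \begin{pmatrix} -\dfrac{V_t}{\alpha W_t^\top\theta^e}\big(\mathbb{1}_{\{Y_t \le V_t^\top\theta^v\}} - \alpha\big) \\[2ex] \dfrac{W_t}{(W_t^\top\theta^e)^2}\Big(W_t^\top\theta^e - V_t^\top\theta^v + \dfrac{1}{\alpha}(V_t^\top\theta^v - Y_t)\mathbb{1}_{\{Y_t \le V_t^\top\theta^v\}}\Big) \end{pmatrix}, \qquad \text{(A.4)}$$
which is almost surely the derivative of $\rho(Y_t, X_t, \theta)$ with respect to $\theta$. We further define $\psi_T(\theta) = \frac{1}{T}\sum_{t=1}^T \psi(Y_t, X_t, \theta)$ and $\lambda_T(\theta) = E[\psi_T(\theta)]$, with Jacobian $\Lambda_T(\theta) = \partial\lambda_T(\theta)/\partial\theta^\top$. From the proof of Lemma B.2, we get the mean value expansion (for $\hat\theta_T$ close to $\theta^*_T$)
$$\lambda_T(\hat\theta_T) - \lambda_T(\theta^*_T) = \Lambda_T(\tilde\theta_1, \tilde\theta_2)(\hat\theta_T - \theta^*_T), \qquad \text{(A.5)}$$
for some values $\tilde\theta_1$ and $\tilde\theta_2$ on the line between $\hat\theta_T$ and $\theta^*_T$, where the components of $\Lambda_T(\tilde\theta_1, \tilde\theta_2)$ are given in (B.8) and (B.9), and where $\lambda_T(\theta^*_T) = 0$.¹⁰
Furthermore, it holds that $\Lambda_T(\theta^*_T, \theta^*_T) = \Lambda_T(\theta^*_T)$, and $\Lambda_T(\tilde\theta_1, \tilde\theta_2)$ is a continuous function in its arguments $\tilde\theta_1$ and $\tilde\theta_2$. Using that $\Lambda_T(\theta^*_T)$ has eigenvalues bounded away from zero (for $T$ large enough), we also get that $\Lambda_T(\tilde\theta_1, \tilde\theta_2)$ is non-singular in a neighborhood around $\theta^*_T$ (for all arguments) for $T$ large enough, as the map which maps a matrix onto its eigenvalues is continuous. As we further know that $\hat\theta_T - \theta^*_T \xrightarrow{P} 0$ and $\|\tilde\theta_j - \theta^*_T\| \le \|\hat\theta_T - \theta^*_T\|$ for $j = 1, 2$, we get from the continuous mapping theorem that
$$\Lambda_T(\tilde\theta_1, \tilde\theta_2)^{-1} - \Lambda_T(\theta^*_T)^{-1} \xrightarrow{P} 0. \qquad \text{(A.6)}$$
In the following, we apply Lemma A.1 in Weiss (1991) (by verifying its assumptions), which extends the iid results of Huber (1967) to strong mixing sequences.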
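The score $\psi$ in (A.4) should coincide with the gradient of the loss wherever $Y_t \neq V_t^\top\theta^v$, which can be checked numerically. The sketch below assumes the loss split used in the proof of Lemma B.1, $\rho = \rho_1 + \rho_2$ with $\rho_1 = -\mathbb{1}_{\{Y_t \le V_t^\top\theta^v\}}(V_t^\top\theta^v - Y_t)/(\alpha W_t^\top\theta^e)$ and $\rho_2 = (V_t^\top\theta^v - W_t^\top\theta^e)/(W_t^\top\theta^e) + \log(-W_t^\top\theta^e)$; this is our reading of the extraction-damaged formulas, and all function names are illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rho(y, v_reg, w_reg, theta_v, theta_e, alpha):
    """Loss rho = rho_1 + rho_2 for one observation (our reading of Lemma B.1)."""
    q = _dot(v_reg, theta_v)  # V_t' theta^v, the VaR index
    e = _dot(w_reg, theta_e)  # W_t' theta^e, the ES index (must be negative)
    if e >= 0:
        raise ValueError("the ES index W_t' theta^e must be negative")
    hit = 1.0 if y <= q else 0.0
    rho1 = -hit * (q - y) / (alpha * e)
    rho2 = (q - e) / e + math.log(-e)
    return rho1 + rho2


def psi(y, v_reg, w_reg, theta_v, theta_e, alpha):
    """Score (A.4): gradient of rho w.r.t. (theta_v, theta_e), valid off y == q."""
    q = _dot(v_reg, theta_v)
    e = _dot(w_reg, theta_e)
    hit = 1.0 if y <= q else 0.0
    g_v = [-vi * (hit - alpha) / (alpha * e) for vi in v_reg]
    g_e = [wi / e ** 2 * (e - q + (q - y) * hit / alpha) for wi in w_reg]
    return g_v + g_e


def numerical_gradient(y, v_reg, w_reg, theta_v, theta_e, alpha, h=1e-6):
    """Central finite differences of rho, used to cross-check psi."""
    theta = list(theta_v) + list(theta_e)
    k = len(theta_v)
    grad = []
    for j in range(len(theta)):
        up, dn = theta[:], theta[:]
        up[j] += h
        dn[j] -= h
        f_up = rho(y, v_reg, w_reg, up[:k], up[k:], alpha)
        f_dn = rho(y, v_reg, w_reg, dn[:k], dn[k:], alpha)
        grad.append((f_up - f_dn) / (2.0 * h))
    return grad


# A VaR exceedance (y <= q) with intercept and forecast as regressors:
args = (-2.0, [1.0, -1.5], [1.0, -2.0], [0.0, 1.0], [0.0, 1.0], 0.025)
print(psi(*args))
print(numerical_gradient(*args))
```

Away from the kink at $Y_t = V_t^\top\theta^v$, the analytic score and the finite-difference gradient agree up to discretization error, which is the "almost surely the derivative" statement following (A.4).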
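Assumption (N1) of Lemma A.1 in Weiss (1991) is satisfied, as every almost surely continuous stochastic process is separable in the sense of Doob (Gikhman and Skorokhod, 2004) and the functions $\psi(Y_t, X_t, \theta)$ are almost surely continuous for all $t \in \mathbb{N}$. Assumption (N2) is satisfied as shown in the proof of Theorem 2.3. Assumption (N3)(i) is shown in Lemma B.2. The technical Assumptions (N3)(ii) and (N3)(iii) follow from Lemma 4 and Lemma 5 in Patton et al. (2019b). For this, notice that the moment conditions in Assumption 2 (C) and (D) of Patton et al. (2019a) are implied by condition (A8) in Assumption 2.2 for the simplified case of linear models. Assumption (N4) follows from the moment conditions (A8) in Assumption 2.2, and Assumption (N5) from the strong mixing condition (A6). Furthermore, Lemma 2 of Patton et al. (2019b) implies that $\sqrt{T}\,\psi_T(\hat\theta_T) \xrightarrow{P} 0$. Thus, we can apply Lemma A.1 in Weiss (1991) and get that
$$\sqrt{T}\,\lambda_T(\hat\theta_T) + \sqrt{T}\,\psi_T(\theta^*_T) \xrightarrow{P} 0. \qquad \text{(A.7)}$$

¹⁰ The mean value theorem cannot be generalized in a straightforward fashion to vector-valued functions. Thus, we have to consider the mean value expansion in each component separately, which gives this more complicated expression.

Combining (A.5), (A.6) and (A.7), and using that $\sqrt{T}\,\psi_T(\theta^*_T) = O_P(1)$ by Lemma B.3, we get that
$$\begin{aligned} \sqrt{T}(\hat\theta_T - \theta^*_T) &= \Lambda_T(\tilde\theta_1, \tilde\theta_2)^{-1}\sqrt{T}\,\lambda_T(\hat\theta_T) && \text{(A.8)} \\ &= \big(\Lambda_T(\theta^*_T)^{-1} + o_P(1)\big)\big(-\sqrt{T}\,\psi_T(\theta^*_T) + o_P(1)\big) && \text{(A.9)} \\ &= -\Lambda_T(\theta^*_T)^{-1}\sqrt{T}\,\psi_T(\theta^*_T) + o_P(1). && \text{(A.10)} \end{aligned}$$
Furthermore,
$$\Sigma_T(\theta^*_T)^{-1/2}\sqrt{T}\,\psi_T(\theta^*_T) = \Sigma_T(\theta^*_T)^{-1/2}\sqrt{T}\big(\psi_T(\theta^*_T) - \lambda_T(\theta^*_T)\big) \xrightarrow{d} N(0, I_{2k}) \qquad \text{(A.11)}$$
by Lemma B.3, and thus
$$\Sigma_T(\theta^*_T)^{-1/2}\Lambda_T(\theta^*_T)\sqrt{T}(\hat\theta_T - \theta^*_T) \xrightarrow{d} N(0, I_{2k}), \qquad \text{(A.12)}$$
which concludes the proof of this theorem.

Proof of Corollary 2.5. We first notice that
$$\sqrt{T}\,\widehat\Omega_T^{-1/2}(\hat\theta_T - \theta^*_T) = \sqrt{T}\,\Omega_T^{-1/2}(\hat\theta_T - \theta^*_T) + \big(\widehat\Omega_T^{-1/2} - \Omega_T^{-1/2}\big)\sqrt{T}(\hat\theta_T - \theta^*_T). \qquad \text{(A.13)}$$
From Theorem 2.4, we get that $\sqrt{T}\,\Omega_T^{-1/2}(\hat\theta_T - \theta^*_T) \xrightarrow{d} N(0, I_4)$. Furthermore, as $\widehat\Omega_T^{-1/2} - \Omega_T^{-1/2} = o_P(1)$, it holds by Slutsky's theorem that $\big(\widehat\Omega_T^{-1/2} - \Omega_T^{-1/2}\big)\sqrt{T}(\hat\theta_T - \theta^*_T) = o_P(1)$ and, consequently,
$$\sqrt{T}\,\widehat\Omega_T^{-1/2}(\hat\theta_T - \theta^*_T) \xrightarrow{d} N(0, I_4). \qquad \text{(A.14)}$$
Thus, with $\widehat\Omega^{ee}_T$ denoting the block of $\widehat\Omega_T$ belonging to the ES parameters $\theta^e$,
$$T_{\text{A-ESR}} = T\,(\hat\theta^e_T - \theta^{*e}_T)^\top(\widehat\Omega^{ee}_T)^{-1}(\hat\theta^e_T - \theta^{*e}_T) \xrightarrow{d} \chi^2_2, \qquad \text{(A.15)}$$
$$T_{\text{S-ESR}} = T\,(\hat\theta^e_T - \theta^{*e}_T)^\top(\widehat\Omega^{ee}_T)^{-1}(\hat\theta^e_T - \theta^{*e}_T) \xrightarrow{d} \chi^2_2 \quad \text{(computed from the Strict ESR regression), and} \qquad \text{(A.16)}$$
$$T_{\text{I-ESR}} = T\,\frac{(\hat\theta^e_{T,1} - \theta^{*e}_{T,1})^2}{\widehat\Omega^{ee}_{T,11}} \xrightarrow{d} \chi^2_1. \qquad \text{(A.17)}$$

SUPPLEMENTARY MATERIAL

Appendix B Technical Proofs

Lemma B.1. Given the conditions from Assumption 2.2, the function $\rho(Y_t, X_t, \theta)$ is Lipschitz-$L_1$ on $\Theta$ with $\mathcal{F}_t$-measurable and integrable Lipschitz constant.

Proof. We split the $\rho$-function as $\rho(Y_t, X_t, \theta) = \rho_1(Y_t, X_t, \theta) + \rho_2(Y_t, X_t, \theta)$, where
$$\rho_1(Y_t, X_t, \theta) = -\frac{1}{\alpha W_t^\top\theta^e}\mathbb{1}_{\{Y_t \le V_t^\top\theta^v\}}(V_t^\top\theta^v - Y_t), \qquad \rho_2(Y_t, X_t, \theta) = \frac{V_t^\top\theta^v - W_t^\top\theta^e}{W_t^\top\theta^e} + \log(-W_t^\top\theta^e).$$
Local Lipschitz continuity of $\rho_2$ follows since it is a continuously differentiable function in $\theta$ (on the set where $W_t^\top\theta^e \neq 0$) and thus (locally) Lipschitz-$L_1$. We consequently get that for all $\theta^o = (\theta^{o,v}, \theta^{o,e}) \in \Theta$, there exists a $\delta^o > 0$ such that for all $\theta \in U_{\delta^o}(\theta^o) := \{\theta \in \Theta : \|\theta - \theta^o\| < \delta^o\}$, it holds that
$$\big|\rho_2(Y_t, X_t, \theta) - \rho_2(Y_t, X_t, \theta^o)\big| \le \sup_{\theta \in U_{\delta^o}(\theta^o)}\Big(\frac{\|V_t\|}{|W_t^\top\theta^e|} + \frac{\|W_t\|\,\big(|V_t^\top\theta^v| + |W_t^\top\theta^e|\big)}{(W_t^\top\theta^e)^2}\Big)\|\theta - \theta^o\|, \qquad \text{(B.1)}$$
where the sequences $\frac{1}{T}\sum_{t=1}^T E\Big[\sup_{\theta \in U_{\delta^o}(\theta^o)}\frac{\|V_t\|}{|W_t^\top\theta^e|}\Big]$ and $\frac{1}{T}\sum_{t=1}^T E\Big[\sup_{\theta \in U_{\delta^o}(\theta^o)}\frac{\|W_t\|(|V_t^\top\theta^v| + |W_t^\top\theta^e|)}{(W_t^\top\theta^e)^2}\Big]$ are bounded for all $\theta^o \in \Theta$ by conditions (A7) and (A8) in Assumption 2.2.
For the function $\rho_1$, we consider four cases, where throughout $\theta \in U_{\delta^o}(\theta^o)$. First, let $\Omega_1 = \{\omega \in \Omega : V_t(\omega)^\top\theta^v < Y_t(\omega) \text{ and } V_t(\omega)^\top\theta^{o,v} < Y_t(\omega)\}$. Then, on $\Omega_1$, it holds that
$$\rho_1(Y_t, X_t, \theta) = \rho_1(Y_t, X_t, \theta^o) = 0, \qquad \text{(B.2)}$$
which is obviously Lipschitz-$L_1$.
Second, let $\Omega_2 = \{\omega \in \Omega : V_t(\omega)^\top\theta^v \ge Y_t(\omega) \text{ and } V_t(\omega)^\top\theta^{o,v} \ge Y_t(\omega)\}$. On $\Omega_2$, for both $\vartheta \in \{\theta, \theta^o\}$, it holds that
$$\rho_1(Y_t, X_t, \vartheta) = -\frac{V_t^\top\vartheta^v - Y_t}{\alpha W_t^\top\vartheta^e}, \qquad \text{(B.3)}$$
which is a continuously differentiable function. Thus,
$$\big|\rho_1(Y_t, X_t, \theta) - \rho_1(Y_t, X_t, \theta^o)\big| \le \Big(\sup_{\theta \in U_{\delta^o}(\theta^o)}\frac{\|V_t\|}{\alpha|W_t^\top\theta^e|} + \sup_{\theta \in U_{\delta^o}(\theta^o)}\frac{\|W_t\|\,|V_t^\top\theta^v - Y_t|}{\alpha(W_t^\top\theta^e)^2}\Big)\|\theta - \theta^o\|, \qquad \text{(B.4)}$$
where the averages of the expectations of the suprema in (B.4) are bounded by conditions (A7) and (A8) in Assumption 2.2.
Finally, let $\Omega_3 = \{\omega \in \Omega : V_t(\omega)^\top\theta^v < Y_t(\omega) \le V_t(\omega)^\top\theta^{o,v}\}$.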
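As on $\Omega_3$, $|V_t^\top\theta^{o,v} - Y_t| \le |V_t^\top\theta^{o,v} - V_t^\top\theta^v|$ almost surely, it holds that
$$\big|\rho_1(Y_t, X_t, \theta) - \rho_1(Y_t, X_t, \theta^o)\big| = \frac{|V_t^\top\theta^{o,v} - Y_t|}{\alpha|W_t^\top\theta^{o,e}|} \le \frac{|V_t^\top\theta^{o,v} - V_t^\top\theta^v|}{\alpha|W_t^\top\theta^{o,e}|} \le \sup_{\theta \in U_{\delta^o}(\theta^o)}\frac{\|V_t\|}{\alpha|W_t^\top\theta^e|}\,\|\theta - \theta^o\|.$$
As above, the averages of the expectations of these suprema are bounded by conditions (A7) and (A8) in Assumption 2.2. An equivalent argument holds for $\Omega_4 = \{\omega \in \Omega : V_t(\omega)^\top\theta^{o,v} < Y_t(\omega) \le V_t(\omega)^\top\theta^v\}$. As $\Omega = \bigcup_{i=1}^4 \Omega_i$, we can conclude that the function $\rho(Y_t, X_t, \theta)$ is Lipschitz-$L_1$ on $\Theta$.

Lemma B.2. Given the conditions from Assumption 2.2, there exist constants $a, d_0 > 0$ such that
$$\|\lambda_T(\theta)\| \ge a\,\|\theta - \theta^*_T\| \quad \text{for any } \theta \in \Theta \text{ such that } \|\theta - \theta^*_T\| \le d_0, \qquad \text{(B.5)}$$
and for all $T \ge T_0$, where $T_0 \in \mathbb{N}$ is large enough.

Proof. Let $\theta \in \Theta$ such that $\|\theta - \theta^*_T\| \le d_0$ for some (small) constant $d_0 > 0$, and define
$$\lambda_{T,1}(\theta) = \frac{1}{T}\sum_{t=1}^T E\Big[-\frac{V_t}{\alpha W_t^\top\theta^e}\big(F_t(V_t^\top\theta^v) - \alpha\big)\Big] \quad \text{and} \qquad \text{(B.6)}$$
$$\lambda_{T,2}(\theta) = \frac{1}{T}\sum_{t=1}^T E\Big[\frac{W_t}{(W_t^\top\theta^e)^2}\Big(W_t^\top\theta^e - V_t^\top\theta^v + \frac{1}{\alpha}(V_t^\top\theta^v - Y_t)\mathbb{1}_{\{Y_t \le V_t^\top\theta^v\}}\Big)\Big], \qquad \text{(B.7)}$$
such that $\lambda_T(\theta) = \big(\lambda_{T,1}(\theta)^\top, \lambda_{T,2}(\theta)^\top\big)^\top$. Then, by applying the mean value theorem, we get that
$$\lambda_{T,1}(\theta) - \lambda_{T,1}(\theta^*_T) = \frac{1}{T}\sum_{t=1}^T E\Big[\Big(-\frac{V_tV_t^\top f_t(V_t^\top\tilde\theta_1^v)}{\alpha W_t^\top\tilde\theta_1^e},\ \frac{V_tW_t^\top\big(F_t(V_t^\top\tilde\theta_1^v) - \alpha\big)}{\alpha(W_t^\top\tilde\theta_1^e)^2}\Big)\Big](\theta - \theta^*_T) = \Lambda_{T,1}(\tilde\theta_1)(\theta - \theta^*_T), \qquad \text{(B.8)}$$
for some $\tilde\theta_1$ on the line between $\theta$ and $\theta^*_T$. Equivalently, for the second component,
$$\begin{aligned}\lambda_{T,2}(\theta) - \lambda_{T,2}(\theta^*_T) &= \frac{1}{T}\sum_{t=1}^T E\Big[\Big(\frac{W_tV_t^\top\big(F_t(V_t^\top\tilde\theta_2^v) - \alpha\big)}{\alpha(W_t^\top\tilde\theta_2^e)^2},\ \frac{W_tW_t^\top}{(W_t^\top\tilde\theta_2^e)^2} - \frac{2W_tW_t^\top}{(W_t^\top\tilde\theta_2^e)^3}\Big(W_t^\top\tilde\theta_2^e - V_t^\top\tilde\theta_2^v + \frac{1}{\alpha}(V_t^\top\tilde\theta_2^v - Y_t)\mathbb{1}_{\{Y_t \le V_t^\top\tilde\theta_2^v\}}\Big)\Big)\Big](\theta - \theta^*_T) \\ &= \Lambda_{T,2}(\tilde\theta_2)(\theta - \theta^*_T), \end{aligned} \qquad \text{(B.9)}$$
for some $\tilde\theta_2$ on the line between $\theta$ and $\theta^*_T$. Notice that $\tilde\theta_1$ and $\tilde\theta_2$ are not necessarily the same, as the mean value theorem does not hold in its classical form for vector-valued functions. Thus, for $\Lambda_T(\tilde\theta_1, \tilde\theta_2) = \begin{pmatrix}\Lambda_{T,1}(\tilde\theta_1) \\ \Lambda_{T,2}(\tilde\theta_2)\end{pmatrix}$, we get that
$$\lambda_T(\theta) - \lambda_T(\theta^*_T) = \Lambda_T(\tilde\theta_1, \tilde\theta_2)(\theta - \theta^*_T). \qquad \text{(B.10)}$$
In the following, we show that $\big\|\Lambda_T(\tilde\theta_1, \tilde\theta_2) - \Lambda_T(\theta^*_T)\big\| \le c_1\|\theta - \theta^*_T\|$.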
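For the first component of $\Lambda_{T,1}(\tilde\theta_1)$, i.e. the block $\Lambda_{T,11}$ of derivatives with respect to $\theta^v$, we get that
$$\big\|\Lambda_{T,11}(\tilde\theta_1) - \Lambda_{T,11}(\theta^*_T)\big\| = \Big\|\frac{\partial}{\partial\theta^\top}\frac{1}{T}\sum_{t=1}^T E\Big[\operatorname{vec}\Big(-\frac{V_tV_t^\top f_t(V_t^\top\theta^v)}{\alpha W_t^\top\theta^e}\Big)\Big]\Big|_{\theta = \bar\theta}(\tilde\theta_1 - \theta^*_T)\Big\| \qquad \text{(B.11)}$$
$$\le \frac{1}{T}\sum_{t=1}^T E\Big[\frac{\|V_t\|^3\,|f_t'(V_t^\top\bar\theta^v)|}{\alpha|W_t^\top\bar\theta^e|} + \frac{\|V_t\|^2\|W_t\|\,f_t(V_t^\top\bar\theta^v)}{\alpha(W_t^\top\bar\theta^e)^2}\Big]\|\tilde\theta_1 - \theta^*_T\|, \qquad \text{(B.12)}$$
for some $\bar\theta = \bar\theta(\tilde\theta_1, \theta^*_T)$ on the line between $\tilde\theta_1$ and $\theta^*_T$.
For the second component of $\Lambda_{T,1}(\tilde\theta_1)$ (and equivalently for the first component of $\Lambda_{T,2}(\tilde\theta_2)$), we get that
$$\big\|\Lambda_{T,12}(\tilde\theta_1) - \Lambda_{T,12}(\theta^*_T)\big\| = \Big\|\frac{\partial}{\partial\theta^\top}\frac{1}{T}\sum_{t=1}^T E\Big[\operatorname{vec}\Big(\frac{V_tW_t^\top\big(F_t(V_t^\top\theta^v) - \alpha\big)}{\alpha(W_t^\top\theta^e)^2}\Big)\Big]\Big|_{\theta = \bar\theta}(\tilde\theta_1 - \theta^*_T)\Big\| \qquad \text{(B.13)}$$
$$\le \frac{1}{T}\sum_{t=1}^T E\Big[\frac{\|V_t\|^2\|W_t\|\,f_t(V_t^\top\bar\theta^v)}{\alpha(W_t^\top\bar\theta^e)^2} + \frac{2\|V_t\|\|W_t\|^2\,\big|F_t(V_t^\top\bar\theta^v) - \alpha\big|}{\alpha|W_t^\top\bar\theta^e|^3}\Big]\|\tilde\theta_1 - \theta^*_T\|, \qquad \text{(B.14)}$$
for some $\bar\theta = \bar\theta(\tilde\theta_1, \theta^*_T)$ on the line between $\tilde\theta_1$ and $\theta^*_T$.
Eventually, for the second component of $\Lambda_{T,2}(\tilde\theta_2)$, writing $g_t(\theta) := W_t^\top\theta^e - V_t^\top\theta^v + \frac{1}{\alpha}(V_t^\top\theta^v - Y_t)\mathbb{1}_{\{Y_t \le V_t^\top\theta^v\}}$, we get that
$$\big\|\Lambda_{T,22}(\tilde\theta_2) - \Lambda_{T,22}(\theta^*_T)\big\| \qquad \text{(B.15)}$$
$$= \Big\|\frac{\partial}{\partial\theta^\top}\frac{1}{T}\sum_{t=1}^T E\Big[\operatorname{vec}\Big(\frac{W_tW_t^\top}{(W_t^\top\theta^e)^2} - \frac{2W_tW_t^\top}{(W_t^\top\theta^e)^3}\,g_t(\theta)\Big)\Big]\Big|_{\theta = \bar\theta}(\tilde\theta_2 - \theta^*_T)\Big\| \qquad \text{(B.16)}$$
$$\le \frac{1}{T}\sum_{t=1}^T E\Big[\frac{2\|V_t\|\|W_t\|^2\,\big|F_t(V_t^\top\bar\theta^v) - \alpha\big|}{\alpha|W_t^\top\bar\theta^e|^3} + \frac{4\|W_t\|^3}{|W_t^\top\bar\theta^e|^3} + \frac{6\|W_t\|^3\,|g_t(\bar\theta)|}{(W_t^\top\bar\theta^e)^4}\Big]\|\tilde\theta_2 - \theta^*_T\|, \qquad \text{(B.17)}$$
for some $\bar\theta = \bar\theta(\tilde\theta_2, \theta^*_T)$ on the line between $\tilde\theta_2$ and $\theta^*_T$. As the respective moments are finite given the moment conditions (A8) in Assumption 2.2, and since $\|\tilde\theta_1 - \theta^*_T\| \le \|\theta - \theta^*_T\|$ and $\|\tilde\theta_2 - \theta^*_T\| \le \|\theta - \theta^*_T\|$, we have shown that for all $T$ sufficiently large, there exists a constant $c_1 > 0$ such that
$$\big\|\Lambda_T(\tilde\theta_1, \tilde\theta_2) - \Lambda_T(\theta^*_T)\big\| \le c_1\|\theta - \theta^*_T\|. \qquad \text{(B.18)}$$
Furthermore, as the matrix $\Lambda_T(\theta^*_T)$ has eigenvalues bounded from below (for $T$ large enough) by assumption, there exists a constant $c_2 > 0$ such that
$$\big\|\Lambda_T(\theta^*_T)(\theta - \theta^*_T)\big\| \ge c_2\|\theta - \theta^*_T\|. \qquad \text{(B.19)}$$
Thus, we choose $d_0 > 0$ small enough such that $d_0 < \frac{c_2}{2c_1}$. Then $\|\theta - \theta^*_T\| \le d_0 < \frac{c_2}{2c_1}$, and thus $2c_1\|\theta - \theta^*_T\|^2 \le c_2\|\theta - \theta^*_T\|$.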
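Consequently, $\big\|\big(\Lambda_T(\tilde\theta_1, \tilde\theta_2) - \Lambda_T(\theta^*_T)\big)(\theta - \theta^*_T)\big\| \le c_1\|\theta - \theta^*_T\|^2 \le \frac{c_2}{2}\|\theta - \theta^*_T\|$, and thus
$$\begin{aligned} \|\lambda_T(\theta)\| &= \big\|\Lambda_T(\tilde\theta_1, \tilde\theta_2)(\theta - \theta^*_T)\big\| && \text{(B.20)} \\ &= \big\|\Lambda_T(\theta^*_T)(\theta - \theta^*_T) + \big(\Lambda_T(\tilde\theta_1, \tilde\theta_2) - \Lambda_T(\theta^*_T)\big)(\theta - \theta^*_T)\big\| && \text{(B.21)} \\ &\ge \big\|\Lambda_T(\theta^*_T)(\theta - \theta^*_T)\big\| - \big\|\big(\Lambda_T(\tilde\theta_1, \tilde\theta_2) - \Lambda_T(\theta^*_T)\big)(\theta - \theta^*_T)\big\| && \text{(B.22)} \\ &\ge \frac{c_2}{2}\|\theta - \theta^*_T\|, && \text{(B.23)} \end{aligned}$$
by applying the mean value expansion (B.10) with $\lambda_T(\theta^*_T) = 0$ and the reverse triangle inequality. This shows (B.5) with $a = c_2/2$.

Lemma B.3. Given Assumption 2.2, it holds that
$$\Sigma_T(\theta^*_T)^{-1/2}\sqrt{T}\,\psi_T(\theta^*_T) \xrightarrow{d} N(0, I_{2k}). \qquad \text{(B.24)}$$

Proof. We show this multivariate result by applying the Cramér-Wold theorem, i.e. by showing that the conditions for the univariate CLT for $\alpha$-mixing sequences given in Theorem 5.20 in White (2001), p. 130, hold for all linear combinations $u^\top\psi(Y_t, X_t, \theta^*_T)$ for all $u \in \mathbb{R}^{2k}$ such that $\|u\| = 1$. By Theorem 3.49 in White (2001), p. 50, we get that the sequences $\psi(Y_t, X_t, \theta^*_T)$ and $u^\top\psi(Y_t, X_t, \theta^*_T)$ are strong mixing of size $-r/(r-2)$ for some $r > 2$. Furthermore, for all $t \in \mathbb{N}$, it holds that
$$\begin{aligned} E\big|u^\top\psi(Y_t, X_t, \theta^*_T)\big|^r &\le E\big\|\psi(Y_t, X_t, \theta^*_T)\big\|^r \\ &\le 4^{r-1}\Big(\max\Big(\frac{1}{\alpha}, 1\Big)^r E\Big[\frac{\|V_t\|^r}{|W_t^\top\theta^{*e}_T|^r}\Big] + E\Big[\frac{\|W_t\|^r}{|W_t^\top\theta^{*e}_T|^r}\Big] + \Big(1 + \frac{1}{\alpha}\Big)^r E\Big[\frac{\|W_t V_t^\top\theta^{*v}_T\|^r}{(W_t^\top\theta^{*e}_T)^{2r}}\Big] + \frac{1}{\alpha^r}E\Big[\frac{\|W_tY_t\|^r}{(W_t^\top\theta^{*e}_T)^{2r}}\Big]\Big) \\ &\le 4^{r-1}\Big(\max\Big(\frac{1}{\alpha}, 1\Big)^r \frac{E\|V_t\|^r}{K^r} + \frac{E\|W_t\|^r}{K^r} + \Big(1 + \frac{1}{\alpha}\Big)^r \frac{E\|W_tV_t^\top\theta^{*v}_T\|^r}{K^{2r}} + \frac{E\|W_tY_t\|^r}{\alpha^r K^{2r}}\Big) < \infty, \end{aligned}$$
by applying Jensen's inequality and the moment conditions (A8) in Assumption 2.2, where $r > 2$ (from condition (A6)). As the sequence $\psi(Y_t, X_t, \theta^*_T)$ is uncorrelated by condition (A3) in Assumption 2.2, we get that for all $T \ge 1$,
$$\operatorname{Var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T \psi(Y_t, X_t, \theta^*_T)\Big) = \frac{1}{T}\sum_{t=1}^T E\big[\psi(Y_t, X_t, \theta^*_T)\,\psi(Y_t, X_t, \theta^*_T)^\top\big] = \Sigma_T(\theta^*_T). \qquad \text{(B.25)}$$
As $\Sigma_T(\theta^*_T)$ is real, symmetric and positive definite, it can be diagonalized with a real orthogonal matrix $S$, i.e. $S\,\Sigma_T(\theta^*_T)\,S^\top = D_T$, where $D_T$ is a diagonal matrix containing the eigenvalues of $\Sigma_T(\theta^*_T)$, denoted by $\{\lambda_{1,T}, \ldots, \lambda_{2k,T}\}$. Consequently, for any $u \in \mathbb{R}^{2k}$ with $\|u\| = 1$,
$$\operatorname{Var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T u^\top\psi(Y_t, X_t, \theta^*_T)\Big) = u^\top\Sigma_T(\theta^*_T)u = u^\top S^\top D_T S u = v^\top D_T v \ge \min_{i=1,\ldots,2k}\lambda_{i,T}, \qquad \text{(B.26)}$$
where $v = Su$, i.e. $\|v\| = 1$ as $S$ is orthogonal, and where the eigenvalues $\{\lambda_{1,T}, \ldots, \lambda_{2k,T}\}$ are bounded away from zero for $T$ sufficiently large.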
Thus, we can apply Theorem 5.20 in White (2001), p. 130, for asymptotic normality of the sequences $u'\psi(Y_t, X_t, \theta_T)$ for all $u \in \mathbb{R}^k$ such that $\|u\| = 1$. Applying the Cramér–Wold theorem concludes the proof.

Appendix C Existing Backtests

Over the past two decades, and especially driven by the recent transition from VaR to ES in the Basel regulatory framework (Basel Committee, 2016, 2017), a large literature on backtesting the ES has emerged. These backtests are usually introduced with financial regulators in mind, who need to verify the risk forecasts they receive from the financial institutions. To be applicable by the regulatory authorities, a backtest for the risk measure ES must therefore follow Definition 2.1 and require only the observed return series and the ES forecasts as input variables. However, many of the proposed ES backtests lack this property. In particular, several tests require, in addition to the ES forecasts, the whole return distribution (or, equivalently, the cumulative violation process $\int_0^\alpha \mathbb{1}_{\{Y_t \le \hat v_t(p)\}}\,dp$) (Kerkhof and Melenberg, 2004; Wong, 2008; Graham and Pál, 2014; Acerbi and Szekely, 2014; Du and Escanciano, 2017; Löser et al., 2018; Costanzino and Curran, 2018), multiple quantile levels (Emmer et al., 2015; Costanzino and Curran, 2015; Kratz et al., 2018; Couperier and Leymarie, 2019), the VaR and the volatility (McNeil and Frey, 2000; Nolde and Ziegel, 2017; Righi and Ceretta, 2013, 2015), or the VaR (McNeil and Frey, 2000; Nolde and Ziegel, 2017). This information, however, is not reported by the financial institutions, and therefore most of these tests cannot be used by the regulators (Aramonte et al., 2011; Basel Committee, 2017). Furthermore, when more information than solely the ES forecasts is used for backtesting, a rejection of the null hypothesis does not necessarily imply that the ES forecasts are wrong. More precisely, a rejection of the null only implies that some component of the input parameters is incorrect (cf.
Nolde and Ziegel, 2017). A related concern is raised by Aramonte et al. (2011), who note that financial institutions could be tempted to submit forecasts of this additional information chosen such that the tests have particularly low power, so that the correctness of their internal model (and of their issued ES forecasts) is not called into question.

Strictly following Definition 2.1, we would have to distinguish between backtests for the ES and joint backtests for the pair VaR and ES. However, as the ES is strongly intertwined with the VaR (through its definition and through the joint elicitability), sensible ES forecasts are based on correctly specified VaR forecasts. Consequently, it is reasonable to backtest both quantities jointly, and we thus compare the performance of our ESR backtests to existing joint VaR and ES backtests in the literature. In the subsequent two sections, we describe in detail the exceedance residual (ER) backtests of McNeil and Frey (2000) and the conditional calibration (CC) backtests of Nolde and Ziegel (2017), since both have versions that only require VaR forecasts in addition to the ES forecasts.

C.1 Testing the Exceedance Residuals

One of the first and still most frequently used tests for the ES is the exceedance residual (ER) backtest of McNeil and Frey (2000). This approach is based on the ES-specified residuals on the days on which the return exceeds the VaR, $er_t = (Y_t - \hat e_t)\mathbb{1}_{\{Y_t \le \hat v_t\}}$, which form a martingale difference sequence given that $\hat v_t$ and $\hat e_t$ are the true quantile and ES conditional on the information $\mathcal{F}_{t-1}$. McNeil and Frey (2000) further consider a second version that uses exceedance residuals standardized by a given volatility forecast, i.e. $er_t/\hat\sigma_t$. This backtest tests whether the expected value of the (raw or standardized) ER, $\mu = E[er_t]$, is zero, using the estimate $\hat\mu = \big(\sum_{t=1}^T \mathbb{1}_{\{Y_t \le \hat v_t\}}\big)^{-1}\sum_{t=1}^T er_t$ in conjunction with a bootstrap hypothesis test (see Efron and Tibshirani, 1994, p. 224).
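The raw ER statistic and a bootstrap test of $\mu = 0$ can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the procedure of McNeil and Frey (2000) or the esback implementation. It assumes a studentized one-sample bootstrap mean test in which the exceedance residuals are recentred so that the null holds. The function name `er_backtest` and its arguments are hypothetical.

```python
import numpy as np


def er_backtest(y, v_hat, e_hat, n_boot=2000, seed=None):
    """Sketch of the raw exceedance residual (ER) backtest (hypothetical helper).

    Returns (mu_hat, p_two_sided, p_one_sided); the one-sided p-value refers
    to H1: mu < 0. The bootstrap imposes H0 by recentring the residuals.
    """
    y, v_hat, e_hat = (np.asarray(a, dtype=float) for a in (y, v_hat, e_hat))
    hit = y <= v_hat                          # VaR violations 1{Y_t <= v_hat_t}
    er = (y - e_hat) * hit                    # er_t, zero on non-violation days
    n_hit = int(hit.sum())                    # T_tilde, the number of violations
    if n_hit < 2:
        raise ValueError("need at least two VaR violations")
    mu_hat = er.sum() / n_hit                 # average ER over violation days
    z = er[hit]                               # exceedance residuals only

    def t_stat(sample):
        # studentized mean along the last axis
        return sample.mean(axis=-1) / (sample.std(axis=-1, ddof=1) / np.sqrt(n_hit))

    t_obs = t_stat(z)
    rng = np.random.default_rng(seed)
    z0 = z - z.mean()                         # recentre so that H0: mu = 0 holds
    boot = z0[rng.integers(0, n_hit, size=(n_boot, n_hit))]
    with np.errstate(divide="ignore", invalid="ignore"):
        t_boot = t_stat(boot)                 # degenerate resamples give nan
    p_two = float(np.mean(np.abs(t_boot) >= abs(t_obs)))
    p_one = float(np.mean(t_boot <= t_obs))
    return float(mu_hat), p_two, p_one
```

The returned `mu_hat` equals $\tilde T^{-1}\sum_t Y_t\mathbb{1}_{\{Y_t \le \hat v_t\}} - \tilde T^{-1}\sum_t \hat e_t\mathbb{1}_{\{Y_t \le \hat v_t\}}$. The probability level $\alpha$ never enters the computation, which is why the ER test cannot detect forecasts issued for a wrong level.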
In the original paper, McNeil and Frey (2000) propose to test against the one-sided alternative that $\mu$ is negative, i.e. that the issued ES forecasts underestimate the risk (are too small in absolute value). In this paper, however, we consider tests based on both one-sided and two-sided hypotheses, so that in addition to the original proposal we also include a two-sided test:
$$H_0^{2s}: \mu = 0 \ \text{ against } \ H_1^{2s}: \mu \neq 0, \quad\text{and}\quad H_0^{1s}: \mu \ge 0 \ \text{ against } \ H_1^{1s}: \mu < 0. \tag{C.1}$$
By Definition 2.1, the test using the standardized ER is in fact a joint backtest for the triple VaR, ES and volatility, whereas the test using the raw ER is a joint backtest for the pair VaR and ES. In light of the discussion above, the test using the raw ER is therefore preferred. Nevertheless, we apply both approaches in the simulation studies and the empirical application and find that they perform similarly.

Even though the intercept ESR test introduced in Section 2.3 and the ER backtest appear to be similar, there is a subtle but crucial difference between the two test statistics. For the intercept ESR test, we compute the empirical ES of $Y_t - \hat e_t$, i.e. the average of $Y_t - \hat e_t$ given that $Y_t - \hat e_t$ is smaller than its empirical $\alpha$-quantile. In contrast, the ER backtest computes the average of $Y_t - \hat e_t$ given that $Y_t$ is smaller than the respective forecast $\hat v_t$ for its $\alpha$-quantile. This difference seems marginal, but it has severe consequences for the theoretical and empirical properties of the tests. As we can write
$$\hat\mu = \frac{1}{\tilde T}\sum_{t=1}^T Y_t\mathbb{1}_{\{Y_t \le \hat v_t\}} - \frac{1}{\tilde T}\sum_{t=1}^T \hat e_t\mathbb{1}_{\{Y_t \le \hat v_t\}}, \qquad \text{where } \tilde T = \sum_{t=1}^T \mathbb{1}_{\{Y_t \le \hat v_t\}},$$
the ER backtest in fact compares the empirical average of $Y_t$ truncated at $\hat v_t$ to the average ES forecast $\hat e_t$ on the days with a VaR violation. Thus, this backtest rejects whenever the relation between the VaR and ES forecasts is incorrect. However, simultaneous misspecifications of both forecasts, such as e.g.
those generated by a misspecified volatility process in location-scale models, cannot be detected. In the same spirit, the ER backtest cannot distinguish between correct forecasts for the VaR and ES at level $\alpha$ and (correct) forecasts for a misspecified probability level $\tilde\alpha \neq \alpha$, as the given level $\alpha$ does not influence the ER test statistic at all. In contrast, by computing the empirical $\alpha$-quantile of $Y_t - \hat e_t$ (instead of using the forecast $\hat v_t$), the intercept ESR test does not suffer from these shortcomings, as can be observed in the simulation results in Section 3.2.

C.2 Conditional Calibration Backtests

Nolde and Ziegel (2017) introduce the concept of conditional calibration (CC) based on strict identification functions (also known as moment conditions or estimating equations) of the respective functional, and show that many classical backtests for risk measures can be unified using this concept. For the pair VaR and ES at level $\alpha \in (0, 1)$, they choose the strict identification function
$$V(Y, v, e) = \begin{pmatrix} \alpha - \mathbb{1}_{\{Y \le v\}} \\ e - v + \frac{1}{\alpha}(v - Y)\mathbb{1}_{\{Y \le v\}} \end{pmatrix}, \tag{C.2}$$
whose expectation is zero if and only if $v$ and $e$ equal the true VaR and ES of the random variable $Y$, respectively. The CC backtest for the VaR forecasts $\hat v_t$ and the ES forecasts $\hat e_t$ is based on the hypotheses
$$H_0^{2s}: E\big[V(Y_t, \hat v_t, \hat e_t) \mid \mathcal{F}_{t-1}\big] = 0 \ \text{ against } \ H_1^{2s}: E\big[V(Y_t, \hat v_t, \hat e_t) \mid \mathcal{F}_{t-1}\big] \neq 0, \quad\text{and}$$
$$H_0^{1s}: E\big[V(Y_t, \hat v_t, \hat e_t) \mid \mathcal{F}_{t-1}\big] \ge 0 \ \text{ against } \ H_1^{1s}: E\big[V(Y_t, \hat v_t, \hat e_t) \mid \mathcal{F}_{t-1}\big] < 0, \tag{C.3}$$
component-wise and almost surely for all $t = 1, \ldots, T$. This is equivalent to testing $E\big[h_t'V(Y_t, \hat v_t, \hat e_t)\big] = 0$ for all $\mathcal{F}_{t-1}$-measurable $\mathbb{R}^2$-valued functions $h_t$. As this is infeasible, Nolde and Ziegel (2017) propose to use an $\mathcal{F}_{t-1}$-measurable sequence of $q \times 2$ matrices of test functions $h_t$ for some $q \in \mathbb{N}$, together with the Wald-type test statistic
$$T_{CC} = T\Big(\frac{1}{T}\sum_{t=1}^T h_tV(Y_t, \hat v_t, \hat e_t)\Big)'\,\hat\Omega_T^{-1}\,\Big(\frac{1}{T}\sum_{t=1}^T h_tV(Y_t, \hat v_t, \hat e_t)\Big), \tag{C.4}$$
where $\hat\Omega_T = \frac{1}{T}\sum_{t=1}^T \big(h_tV(Y_t, \hat v_t, \hat e_t)\big)\big(h_tV(Y_t, \hat v_t, \hat e_t)\big)'$ is a consistent estimator of the covariance of the $q$-dimensional vector $h_tV(Y_t, \hat v_t, \hat e_t)$. Under $H_0$, the test statistic asymptotically follows a $\chi^2$ distribution with $q$ degrees of freedom. Nolde and Ziegel (2017) propose two versions of this test. The first uses no information besides the risk forecasts (termed the simple CC test), whereas the second additionally requires volatility forecasts (termed the general CC test). For the simple CC test, the test function is the identity matrix, $h_t = I_2$, for both the one- and two-sided hypotheses. For the general CC test, they propose to choose
$$h_t = \begin{pmatrix} 1 & |\hat v_t| & 0 & 0 \\ 0 & 0 & 1 & \hat\sigma_t^{-1} \end{pmatrix}' \quad\text{and}\quad h_t = \big(\hat\sigma_t^{-1}(\hat e_t - \hat v_t),\ 1\big) \tag{C.5}$$
for the two-sided and for the one-sided test, respectively, where $\hat\sigma_t$ is a forecast for the volatility. As with the standardized ER test, the general CC test is, strictly speaking, a backtest for the triple VaR, ES and volatility, but we nevertheless include both versions in our empirical comparisons. We provide implementations of the two ESR backtests proposed in this paper, both ER backtests of McNeil and Frey (2000), and both CC backtests of Nolde and Ziegel (2017) in the R package esback (Bayer and Dimitriadis, 2019a).
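Below is a minimal Python sketch of the two-sided Wald-type statistic in (C.4). It is not the esback implementation, and it omits the one-sided tests. The identification function is assumed to have first component $\alpha - \mathbb{1}_{\{Y \le v\}}$; flipping that sign leaves the two-sided statistic unchanged. Test functions are passed as a hypothetical array `h` of shape `(T, q, 2)`, with $h_t = I_2$ (the simple CC test) as the default. To avoid a SciPy dependency, the $\chi^2_q$ p-value uses the closed-form survival function, which is valid only for even $q$. This covers $q = 2$ for the simple test and $q = 4$ for the two-sided general test.

```python
import math
import numpy as np


def identification_function(y, v, e, alpha):
    """V(Y, v, e) for (VaR, ES) at level alpha; one row per observation."""
    hit = (y <= v).astype(float)
    return np.column_stack([alpha - hit, e - v + hit * (v - y) / alpha])


def chi2_sf_even(x, q):
    """P(chi^2_q > x) for even q via the closed form exp(-x/2) * sum (x/2)^k / k!."""
    if q % 2 or q <= 0:
        raise ValueError("this sketch only handles even, positive q")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(q // 2))


def cc_backtest(y, v_hat, e_hat, alpha, h=None):
    """Two-sided CC Wald test (hypothetical helper); h has shape (T, q, 2)."""
    y, v_hat, e_hat = (np.asarray(a, dtype=float) for a in (y, v_hat, e_hat))
    V = identification_function(y, v_hat, e_hat, alpha)   # shape (T, 2)
    T = V.shape[0]
    if h is None:
        h = np.broadcast_to(np.eye(2), (T, 2, 2))           # simple CC: h_t = I_2
    hV = np.einsum("tqk,tk->tq", h, V)                      # row t is h_t V_t
    m = hV.mean(axis=0)                                     # (1/T) sum_t h_t V_t
    omega = hV.T @ hV / T                                   # uncentred, as in (C.4)
    stat = float(T * m @ np.linalg.solve(omega, m))
    return stat, chi2_sf_even(stat, hV.shape[1])
```

Because $\hat\Omega_T$ is the uncentred second-moment matrix, exactly as written in (C.4), the statistic is bounded by $T$. Grossly wrong forecasts still produce a p-value that is numerically zero.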
Appendix D Additional Material

[Figure 5, panels: (a) Changing the reaction to the squared returns (legend: ARCH parameter); (b) Changing the unconditional variance (legend: Unconditional Variance); (c) Changing the persistence (legend: Persistence 0.9, 0.95 (true), 1.0); (d) Changing the degrees of freedom (legend: Degrees of freedom of the Student-t); (e) Changing the probability level (legend: Probability Level (in %) 1.0, 2.5 (true), 5.0). Axes: Observation Number (x), Return and ES (y).]

Figure 5: These plots show exemplary simulated return series with 250 observations for the DGP given in (3.8) and for the five parameter misspecifications illustrated in points (a)–(e) in Section 3.2. In each of the subfigures, the black dashed line corresponds to the true model parameters.

Table 4: Empirical sizes for the first simulation study.

Columns: sample size; Str. ESR, Aux. ESR, Int. ESR (misspecification covariance); Str. ESR, Aux. ESR, Int. ESR (classical covariance); Gen. CC; Sim. CC; ER; Std. ER.

EGARCH-STD
  250   0.05 0.05 0.09   0.16 0.16 0.10   0.01 0.22   0.04 0.05
  500   0.03 0.03 0.05   0.08 0.08 0.06   0.03 0.13   0.01 0.01
 1000   0.02 0.02 0.03   0.05 0.05 0.03   0.04 0.08   0.01 0.02
 2500   0.01 0.01 0.01   0.02 0.02 0.01   0.02 0.04   0.01 0.01
 5000   0.01 0.01 0.01   0.02 0.01 0.01   0.02 0.03   0.01 0.01
GAS-STD
  250   0.05 0.06 0.08   0.17 0.17 0.10   0.01 0.21   0.04 0.05
  500   0.04 0.04 0.06   0.09 0.09 0.06   0.02 0.13   0.01 0.01
 1000   0.02 0.02 0.03   0.06 0.06 0.03   0.03 0.08   0.02 0.02
 2500   0.01 0.02 0.01   0.03 0.03 0.01   0.02 0.05   0.01 0.01
 5000   0.01 0.01 0.01   0.02 0.02 0.01   0.02 0.03   0.01 0.01
GAS-SSTD
  250   0.05 0.05 0.08   0.16 0.16 0.09   0.01 0.20   0.04 0.04
  500   0.03 0.03 0.05   0.08 0.08 0.06   0.02 0.12   0.01 0.01
 1000   0.02 0.02 0.03   0.04 0.04 0.03   0.03 0.07   0.02 0.01
 2500   0.01 0.01 0.01   0.02 0.02 0.01   0.02 0.04   0.01 0.01
 5000   0.01 0.01 0.01   0.01 0.01 0.01   0.02 0.03   0.02 0.01
AR-GARCH (0.0)
  250   0.02 0.02 0.06   0.10 0.10 0.07   0.01 0.17   0.04 0.04
  500   0.02 0.02 0.04   0.05 0.05 0.05   0.02 0.09   0.00 0.00
 1000   0.01 0.01 0.03   0.04 0.04 0.03   0.02 0.05   0.00 0.01
 2500   0.01 0.01 0.02   0.02 0.02 0.02   0.02 0.03   0.01 0.01
 5000   0.01 0.01 0.01   0.02 0.02 0.01   0.01 0.02   0.01 0.01
AR-GARCH (0.1)
  250   0.02 0.02 0.06   0.10 0.10 0.07   0.01 0.17   0.04 0.04
  500   0.02 0.02 0.04   0.05 0.05 0.05   0.02 0.09   0.00 0.00
 1000   0.01 0.01 0.03   0.03 0.04 0.03   0.02 0.05   0.00 0.01
 2500   0.01 0.01 0.02   0.02 0.02 0.02   0.02 0.03   0.01 0.01
 5000   0.01 0.01 0.01   0.01 0.02 0.01   0.01 0.02   0.01 0.01
AR-GARCH (0.5)
  250   0.02 0.02 0.06   0.09 0.09 0.07   0.01 0.17   0.04 0.04
  500   0.02 0.02 0.04   0.06 0.06 0.04   0.02 0.09   0.00 0.00
 1000   0.01 0.01 0.03   0.04 0.04 0.03   0.02 0.05   0.00 0.01
 2500   0.01 0.01 0.02   0.02 0.02 0.02   0.02 0.03   0.01 0.01
 5000   0.01 0.01 0.01   0.02 0.02 0.01   0.01 0.02   0.01 0.01

Notes: The table reports the empirical sizes of the backtests for the different DGPs described in Section 3.1 and for a nominal test size of 1%.
The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\alpha = 2.5\%$. ESR refers to the three backtests introduced in this paper, and we consider versions with covariance estimation with and without model misspecification. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residual tests of McNeil and Frey (2000).

Table 5: Empirical sizes for the first simulation study.

Columns: sample size; Str. ESR, Aux. ESR, Int. ESR (misspecification covariance); Str. ESR, Aux. ESR, Int. ESR (classical covariance); Gen. CC; Sim. CC; ER; Std. ER.

EGARCH-STD
  250   0.12 0.12 0.19   0.31 0.31 0.20   0.16 0.33   0.12 0.14
  500   0.09 0.09 0.15   0.21 0.21 0.16   0.16 0.25   0.10 0.12
 1000   0.08 0.08 0.13   0.16 0.16 0.13   0.14 0.19   0.10 0.13
 2500   0.08 0.08 0.11   0.11 0.12 0.11   0.12 0.14   0.10 0.11
 5000   0.08 0.08 0.15   0.10 0.10 0.15   0.11 0.13   0.10 0.11
GAS-STD
  250   0.14 0.14 0.18   0.32 0.32 0.20   0.15 0.32   0.13 0.13
  500   0.11 0.11 0.14   0.22 0.22 0.15   0.15 0.25   0.12 0.12
 1000   0.09 0.09 0.12   0.16 0.16 0.12   0.14 0.19   0.12 0.12
 2500   0.09 0.09 0.12   0.13 0.12 0.12   0.12 0.15   0.11 0.12
 5000   0.09 0.09 0.17   0.11 0.11 0.17   0.11 0.13   0.11 0.11
GAS-SSTD
  250   0.12 0.12 0.17   0.31 0.31 0.19   0.15 0.30   0.14 0.13
  500   0.09 0.09 0.14   0.20 0.20 0.15   0.15 0.23   0.12 0.11
 1000   0.08 0.08 0.12   0.15 0.15 0.12   0.13 0.18   0.12 0.11
 2500   0.07 0.07 0.11   0.11 0.11 0.11   0.13 0.14   0.12 0.10
 5000   0.08 0.08 0.13   0.10 0.10 0.13   0.12 0.12   0.11 0.10
AR-GARCH (0.0)
  250   0.07 0.07 0.16   0.24 0.24 0.18   0.13 0.26   0.11 0.11
  500   0.07 0.07 0.14   0.18 0.18 0.15   0.13 0.19   0.08 0.09
 1000   0.07 0.07 0.12   0.14 0.14 0.12   0.12 0.16   0.09 0.09
 2500   0.07 0.07 0.11   0.11 0.11 0.11   0.11 0.12   0.10 0.10
 5000   0.08 0.08 0.11   0.11 0.11 0.11   0.11 0.11   0.10 0.10
AR-GARCH (0.1)
  250   0.07 0.07 0.16   0.24 0.24 0.18   0.13 0.26   0.11 0.11
  500   0.07 0.07 0.14   0.18 0.18 0.15   0.13 0.19   0.08 0.09
 1000   0.06 0.07 0.12   0.14 0.14 0.12   0.12 0.16   0.09 0.09
 2500   0.07 0.07 0.11   0.11 0.11 0.11   0.11 0.12   0.10 0.10
 5000   0.08 0.07 0.11   0.11 0.11 0.11   0.11 0.11   0.10 0.10
AR-GARCH (0.5)
  250   0.06 0.06 0.16   0.23 0.23 0.18   0.13 0.26   0.11 0.11
  500   0.06 0.06 0.14   0.18 0.18 0.15   0.13 0.19   0.08 0.09
 1000   0.06 0.06 0.12   0.14 0.14 0.12   0.12 0.16   0.09 0.09
 2500   0.07 0.07 0.11   0.12 0.12 0.11   0.11 0.12   0.10 0.10
 5000   0.08 0.08 0.11   0.12 0.11 0.11   0.11 0.11   0.10 0.10

Notes: The table reports the empirical sizes of the backtests for the different DGPs described in Section 3.1 and for a nominal test size of 10%. The number of Monte-Carlo repetitions is 10,000 and the probability level for the risk measures is $\alpha = 2.5\%$. ESR refers to the three backtests introduced in this paper, and we consider versions with covariance estimation with and without model misspecification. CC refers to the conditional calibration tests of Nolde and Ziegel (2017), and ER to the exceedance residual tests of McNeil and Frey (2000).

References

Acerbi, C. and Szekely, B. (2014). Backtesting Expected Shortfall. Risk Magazine, December:76–81.
Angrist, J., Chernozhukov, V., and Fernandez-Val, I. (2006). Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica, 74(2):539–563.
Aramonte, S., Durand, P., Kobayashi, S., Kwast, M., Lopez, J. A., Mazzoni, G., Raupach, P., Summer, M., and Wu, J. (2011). Messages from the academic literature on risk measurement for the trading book. Technical report, Bank for International Settlements. Working Paper No. 19, available at http://www.bis.org/publ/bcbs_wp19.pdf.
Ardia, D., Boudt, K., and Catania, L. (2019). Generalized autoregressive score models in R: The GAS package. Journal of Statistical Software, Articles, 88(6):1–28.
Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent Measures of Risk. Mathematical Finance, 9(3):203–228.
Barendse, S. (2018). Interquantile Expectation Regression. Tinbergen Institute Discussion Paper 2017-034/III. Available at https://ssrn.com/abstract=2937665.
Basel Committee (1996). Overview of the Amendment to the Capital Accord to Incorporate Market Risks. Technical report, Bank for International Settlements. Available at http://www.bis.org/publ/bcbs23.pdf.
Basel Committee (2013). Fundamental review of the trading book: A revised market risk framework. Technical report, Bank for International Settlements. Available at http://www.bis.org/publ/bcbs265.pdf.
Basel Committee (2016). Minimum capital requirements for Market Risk. Technical report, Bank for International Settlements. Available at http://www.bis.org/bcbs/publ/d352.pdf.
Basel Committee (2017). Pillar 3 disclosure requirements – consolidated and enhanced framework. Technical report, Basel Committee on Banking Supervision. Available at http://www.bis.org/bcbs/publ/d400.pdf.
Bayer, S. and Dimitriadis, T. (2019a). esback: Expected Shortfall Backtesting. R package version 0.2.0, available at https://github.com/BayerSe/esback.
Bayer, S. and Dimitriadis, T. (2019b). esreg: Joint Quantile and Expected Shortfall Regression. R package version 0.4.0, available at https://CRAN.R-project.org/package=esreg.
Bellini, F., Klar, B., Müller, A., and Gianin, E. R. (2014). Generalized quantiles as risk measures. Insurance: Mathematics and Economics, 54(C):41–48.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.
Carver, L. (2013). Mooted VAR substitute cannot be back-tested, says top quant. Risk Magazine, March.
Cont, R., Deguest, R., and Scandolo, G. (2010). Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance, 10(6):593–606.
Costanzino, N. and Curran, M. (2015). Backtesting general spectral risk measures with application to expected shortfall. Risk Magazine, March.
Costanzino, N. and Curran, M. (2018). A simple traffic light approach to backtesting expected shortfall. Risks, 6(1).
Couperier, O. and Leymarie, J. (2019). Backtesting expected shortfall via multi-quantile regression. Available at https://halshs.archives-ouvertes.fr/halshs-01909375v2.
Creal, D., Koopman, S. J., and Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28(5):777–795.
Danielsson, J., Embrechts, P., Goodhart, C., Keating, C., Muennich, F., Renault, O., and Shin, H. S. (2001). An Academic Response to Basel II. Financial Markets Group Special Papers, available at https://EconPapers.repec.org/RePEc:fmg:fmgsps:sp130.
Dimitriadis, T. and Bayer, S. (2019). A joint quantile and expected shortfall regression framework. Electron. J. Statist., 13(1):1823–1871.
Du, Z. and Escanciano, J. C. (2017). Backtesting Expected Shortfall: Accounting for Tail Risk. Management Science, 63(4):940–958.
Efron, B. (1991). Regression percentiles using asymmetric squared error loss. Statistica Sinica, 1(1):93–125.
Efron, B. and Tibshirani, R. J. (1994). An Introduction to the Bootstrap. New York: Chapman and Hall.
Embrechts, P., Liu, H., and Wang, R. (2018). Quantile-based risk sharing. Oper. Res., 66(4):936–949.
Emmer, S., Kratz, M., and Tasche, D. (2015). What Is the Best Risk Measure in Practice? A Comparison of Standard Measures. Journal of Risk, 18(2):31–60.
Engle, R. F. and Russell, J. R. (1998). Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica, 66(5):1127–1162.
Fissler, T. and Ziegel, J. F. (2016). Higher order elicitability and Osband's principle. Annals of Statistics, 44(4):1680–1707.
Fissler, T., Ziegel, J. F., and Gneiting, T. (2016). Expected Shortfall is jointly elicitable with Value at Risk - Implications for backtesting. Risk Magazine, January:58–61.
Gaglianone, W. P., Lima, L. R., Linton, O., and Smith, D. R. (2011). Evaluating Value-at-Risk Models via Quantile Regression. Journal of Business & Economic Statistics, 29(1):150–160.
Gikhman, I. and Skorokhod, A. (2004). The Theory of Stochastic Processes I, volume 210 of Classics in Mathematics. Springer Berlin Heidelberg.
Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance, 48(5):1779–1801.
Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494):746–762.
Gourieroux, C., Monfort, A., and Trognon, A. (1984). Pseudo maximum likelihood methods: Theory. Econometrica, 52(3):681–700.
Graham, A. and Pál, J. (2014). Backtesting value-at-risk tail losses on a dynamic portfolio. The Journal of Risk Model Validation, 8(2):59.
Guler, K., Ng, P. T., and Xiao, Z. (2017). Mincer–Zarnowitz quantile and expectile regressions for forecast evaluations under asymmetric loss functions. Journal of Forecasting, 36(6):651–679.
Harvey, A. (2013). Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series. Econometric Society monographs. Cambridge University Press.
Hendricks, W. and Koenker, R. (1992). Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association, 87(417):58–68.
Holden, K. and Peel, D. A. (1990). On Testing For Unbiasedness And Efficiency Of Forecasts. The Manchester School, 58(2):120–127.
Huber, P. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 221–233. Berkeley: University of California Press.
Kerkhof, J. and Melenberg, B. (2004). Backtesting for risk-based regulatory capital. Journal of Banking & Finance, 28(8):1845–1865.
Kim, T.-H. and White, H. (2003). Estimation, inference, and specification testing for possibly misspecified quantile regression. In Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, pages 107–132. Emerald Group Publishing Limited.
Koenker, R. W. and Bassett, G. (1978). Regression quantiles. Econometrica, 46(1):33–50.
Komunjer, I. (2004). Quantile Prediction. In Elliott, G. and Timmermann, A., editors, Handbook of Economic Forecasting, volume 2, chapter 17, pages 961–994. Elsevier.
Komunjer, I. (2005). Quasi-maximum likelihood estimation for conditional quantiles. Journal of Econometrics, 128(1):137–164.
Koopman, S. J., Lucas, A., and Scharth, M. (2016). Predicting time-varying parameters with parameter-driven and observation-driven models. Review of Economics and Statistics, 98(1):97–110.
Kratz, M., Lok, Y. H., and McNeil, A. J. (2018). Multinomial VaR backtests: A simple implicit approach to backtesting expected shortfall. Journal of Banking & Finance, 88(C):393–407.
Lloyd, C. J. (2005). Estimating test power adjusted for size. Journal of Statistical Computation and Simulation, 75(11):921–933.
Löser, R., Wied, D., and Ziggel, D. (2018). New backtests for unconditional coverage of expected shortfall. Journal of Risk, 21(4):1–21.
McNeil, A. J. and Frey, R. (2000). Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. Journal of Empirical Finance, 7(3–4):271–300.
Mincer, J. and Zarnowitz, V. (1969). The Evaluation of Economic Forecasts. In Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance, pages 3–46. National Bureau of Economic Research, Inc.
Nadarajah, S., Zhang, B., and Chan, S. (2014). Estimation methods for expected shortfall. Quantitative Finance, 14(2):271–291.
Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59(2):347–370.
Newey, W. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Engle, R. and McFadden, D., editors, Handbook of Econometrics, volume 4, chapter 36, pages 2111–2245. Elsevier.
Nolde, N. and Ziegel, J. F. (2017). Elicitability and backtesting: Perspectives for banking regulation. The Annals of Applied Statistics, 11(4):1833–1874.
Patton, A. J., Ziegel, J. F., and Chen, R. (2019a). Dynamic semiparametric models for expected shortfall (and value-at-risk). Journal of Econometrics, 211(2):388–413.
Patton, A. J., Ziegel, J. F., and Chen, R. (2019b). Supplemental appendix for dynamic semiparametric models for expected shortfall (and value-at-risk). Available at https://doi.org/10.1016/j.jeconom.2018.10.008.
Righi, M. B. and Ceretta, P. S. (2013). Individual and flexible expected shortfall backtesting. Journal of Risk Model Validation, 7(3):3–20.
Righi, M. B. and Ceretta, P. S. (2015). A comparison of Expected Shortfall estimation models. Journal of Economics and Business, 78:14–47.
Weber, S. (2006). Distribution Invariant Risk Measures, Information, and Dynamic Consistency. Mathematical Finance, 16(2):419–441.
Weiss, A. A. (1991). Estimating nonlinear dynamic models using least absolute error estimation. Econometric Theory, 7(1):46–68.
White, H. (1980). Using least squares to approximate unknown regression functions. International Economic Review, 21(1):149–70.
White, H. (1994). Estimation, Inference and Specification Analysis. Econometric Society Monographs. Cambridge University Press.
White, H. (2001). Asymptotic Theory for Econometricians. Academic Press, San Diego.
Wong, W. K. (2008). Backtesting trading risk of commercial banks using expected shortfall. Journal of Banking & Finance, 32(7):1404–1415.
Yamai, Y. and Yoshiba, T. (2002). On the validity of value-at-risk: comparative analyses with expected shortfall. Monetary and Economic Studies, 20(1):57–85.
Ziegel, J. F. (2016). Coherence and elicitability. Mathematical Finance, 26(4):901–918.
Quantitative Finance – arXiv (Cornell University)
Published: Jan 12, 2018