A Joint Quantile and Expected Shortfall Regression Framework

We introduce a novel regression framework which simultaneously models the quantile and the Expected Shortfall (ES) of a response variable given a set of covariates. This regression is based on a strictly consistent loss function for the pair quantile and ES, which allows for M- and Z-estimation of the joint regression parameters. We show consistency and asymptotic normality for both estimators under weak regularity conditions. The underlying loss function depends on two specification functions, whose choice affects the properties of the resulting estimators. We find that the Z-estimator is numerically unstable and thus, we rely on M-estimation of the model parameters. Extensive simulations verify the asymptotic properties and analyze the small sample behavior of the M-estimator for different specification functions. This joint regression framework allows for various applications including estimating, forecasting, and backtesting ES, which is particularly relevant in light of the recent introduction of ES into the Basel Accords.

Keywords: Expected Shortfall, Joint Elicitability, Joint Regression, M-estimation, Quantile Regression

1. Introduction

Measuring and forecasting risks is essential for a variety of academic disciplines. For this purpose, risk measures, which are formally defined as a map (with certain properties) from a space of random variables to a real number, are applied to condense the complex nature of the involved risks to a single number (Artzner et al., 1999). In the context of financial risk measurement, to date the most commonly used risk measure is the Value-at-Risk (VaR), which is the $\alpha$-quantile of the return distribution. Its popularity is mainly due to its simple nature and the fact that up to now, the Basel Accords stipulate its use for the calculation of capital requirements for banks.
Besides not being coherent (Artzner et al., 1999), the main drawback of the VaR is its inability to capture tail risks beyond itself. This deficiency is overcome by the risk measure Expected Shortfall (ES) at level $\alpha$, which is defined as the mean of the returns which are smaller than the $\alpha$-quantile of the return distribution. The ES has the desired ability to capture information from the whole left tail of the return distribution, which is particularly important for measuring extreme financial risks. Over the past few years, ES has increasingly become the object of interest for practitioners, academics, and regulators, especially since its recent introduction into the Basel Accords (Basel Committee, 2016).

Corresponding author. Email addresses: timo.dimitriadis@uni-konstanz.de, sebastian.bayer@uni-konstanz.de. arXiv:1704.02213v3 [math.ST], 8 Aug 2017.

A major drawback of the ES (regarded as a statistical functional) is that it is not elicitable, which means that there exists no loss function (scoring function, scoring rule) which the ES uniquely minimizes in expectation (Gneiting, 2011; Weber, 2006). This result has two main consequences. First, consistent ranking of competing forecasts for the ES based on such a loss function is infeasible. Second, and more substantial for this paper, modeling the conditional ES given a set of covariates through a regression model without specifying the full conditional distribution is infeasible since estimation of the regression parameters through M-estimation requires such a loss function. Consequently, and in contrast to quantile regression (which can be used to model the VaR), to date, there exists no such regression framework which models the ES based on a set of covariates. Nadarajah et al. (2014) provide an overview of estimation methods for the ES.
However, the reviewed approaches are only applicable to univariate data and are not suitable for estimating the conditional ES based on covariates as in mean and quantile regression. Nevertheless, there are some approaches for the ES which incorporate explanatory variables through indirect estimation procedures. Taylor (2008b) proposes an implicit approach for forecasting ES using exponentially weighted quantile regression and Taylor (2008a) introduces a procedure based on expectile regression and a relationship between the ES and expectiles. Taylor (2017) suggests a joint modeling technique for the quantile and the ES based on maximum likelihood estimation of the asymmetric Laplace distribution. Barendse (2017) proposes generalized method of moments (GMM) estimation for a regression framework for the interquantile expectation.

Even though the ES is not elicitable stand-alone, Fissler and Ziegel (2016) show in their seminal paper that the quantile (the VaR) and the ES are jointly elicitable by introducing a class of joint loss functions whose expectation is minimized by these two functionals. This joint elicitability result and the associated class of loss functions have given rise to a growing literature on both joint estimation (Zwingmann and Holzmann, 2016) and joint forecast evaluation (Acerbi and Szekely, 2014; Fissler et al., 2016; Nolde and Ziegel, 2017; Ziegel et al., 2017) for the risk measures VaR and ES. In this paper, we utilize the class of loss functions of Fissler and Ziegel (2016) to introduce a novel simultaneous regression framework for the quantile and the ES and propose both an M- and a Z-estimator for the joint regression parameters. These strictly consistent loss functions make it possible to introduce M- and Z-estimation of the regression parameters without specifying the full conditional distribution of the model, as opposed to maximum likelihood estimation.
We show consistency and asymptotic normality for both estimators under weak regularity conditions which are typical for such a regression framework. To the best of our knowledge, we are the first to propose such a joint regression framework for the quantile and the ES together with the joint M- and Z-estimation and the associated results of consistency and asymptotic normality. Furthermore, we are the first to propose a joint semiparametric regression framework for two different functionals based on joint M-estimation without specifying the full conditional distribution. The employed joint loss function, the estimating equations (for the Z-estimator) and the resulting parameter estimates depend on two specification functions, which can be chosen from some class of functions. Even though consistency and asymptotic normality hold for all applicable choices of these specification functions, they affect the necessary moment conditions, the resulting asymptotic covariance matrices of the estimators, the numerical stability of the optimization algorithm, and the computation times. We discuss the choice of these functions in a theoretical context with respect to asymptotic efficiency and necessary regularity conditions, and with respect to the numerical properties of the optimization algorithm. The estimation of the asymptotic covariance matrix imposes some difficulties. The first occurs in the estimation of the density quantile function, analogous to quantile regression (cf. Koenker, 2005) and thus, we utilize estimation procedures stemming from this literature. The second issue is the estimation of the variance of the negative quantile residuals conditional on the covariates, a nuisance quantity which is new to the literature. We introduce several estimators for this quantity which are able to cope with limited sample sizes and which can model the dependency of the negative quantile residuals on the covariates. 
Furthermore, we estimate the covariance matrix using the bootstrap. For ease of application, we provide an R package (Bayer and Dimitriadis, 2017a) which contains the implementation of the M- and Z-estimator. The user can choose the specification functions, the numerical optimization procedure and the estimation method for the covariance matrix of the parameter estimates.

We conduct a Monte-Carlo simulation study where we consider three data generating processes with different properties. We numerically verify consistency and asymptotic normality of the M-estimator for a range of different choices of the specification functions. Furthermore, we find that the Z-estimator is numerically unstable due to the redescending nature of the utilized estimating equations and consequently, we rely on M-estimation of the regression parameters. Moreover, we find that the performance of the M-estimator strongly depends on the specification functions, where choices resulting in positively homogeneous loss functions (Nolde and Ziegel, 2017; Efron, 1991) lead to a superior performance in terms of asymptotic efficiency, computation times, and mean squared error of the estimator.

This joint regression technique for the quantile and ES has a wide range of potential applications as it generalizes quantile regression to the pair consisting of the quantile and the ES. In the context of financial risk management, it opens up the possibility to extend the existing applications of quantile regression on VaR in the financial literature to ES, such as those in Chernozhukov and Umantsev (2001), Engle and Manganelli (2004), Koenker and Xiao (2006), Gaglianone et al. (2011), Halbleib and Pohlmeier (2012), Komunjer (2013), Xiao et al. (2015) and Žikeš and Baruník (2016). Such estimation, forecasting, and backtesting methods for the ES are particularly sought-after in light of the recent shift from VaR to ES in the Basel Accords.
As an illustration, we present an empirical application where we use our regression framework to jointly forecast VaR and ES based on the realized volatility. The rest of the paper is organized as follows. In Section 2, we introduce the joint regression framework, the underlying regularity conditions together with the asymptotic properties of our estimators and discuss the choice of the specification functions. Section 3 provides details on the numerical implementation of the estimators and on the estimation of the asymptotic covariance matrix. Section 4 presents an extensive simulation study and Section 5 contains an empirical application. Section 6 provides concluding remarks. The proofs are deferred to Appendices B and C.

2. Methodology

2.1. The Joint Regression Framework

Following Lambert et al. (2008), Gneiting (2011) and Fissler and Ziegel (2016), we introduce the concept of (multivariate) $p$-elicitability. We consider a random variable $Z : \Omega \to \mathbb{R}^d$, defined on some complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, a class of distributions $\mathcal{P}$ on $\mathbb{R}^d$, equipped with the Borel $\sigma$-field, and a functional $T : \mathcal{P} \to D$ with its domain of action $D \subseteq \mathbb{R}^p$, $p \in \mathbb{N}$. We call an integrable loss function $\rho : \mathbb{R}^d \times D \to \mathbb{R}$ strictly consistent for the functional $T$ relative to the class of distributions $\mathcal{P}$, if $T(F)$ is the unique minimizer of $\mathbb{E}[\rho(Z, \cdot)]$ for all distributions $F \in \mathcal{P}$, where $F$ is the distribution of $Z$. Furthermore, we call a $p$-dimensional functional $T$ $p$-elicitable relative to the class $\mathcal{P}$, if there exists a loss function $\rho$ which is strictly consistent for $T$ relative to $\mathcal{P}$. If the dimension $p$ is clear from the context, we simply call the functional elicitable instead of $p$-elicitable.

Given the generalized $\alpha$-quantile $Q_\alpha(Z) = F^{-1}(\alpha) = \inf\{z \in \mathbb{R} : F(z) \ge \alpha\}$ for some $\alpha \in (0, 1)$, the ES of the random variable $Z$ at level $\alpha$ is defined as $ES_\alpha(Z) = \frac{1}{\alpha} \int_0^\alpha Q_u(Z)\, du$. If the distribution function of $Z$ is continuous at its $\alpha$-quantile, this definition can be simplified to the conditional tail expectation $ES_\alpha(Z) = \mathbb{E}[Z \mid Z \le Q_\alpha(Z)]$.
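To make the two definitions concrete, the following minimal sketch applies them to the empirical distribution of a small sample. The function names and the toy return series are illustrative assumptions, not part of the paper, and the tail-mean form is only an approximation to the integral definition when the empirical distribution has atoms at the quantile.

```python
import math

def generalized_quantile(sample, alpha):
    """Generalized alpha-quantile inf{z : F_n(z) >= alpha} of the empirical CDF F_n."""
    ordered = sorted(sample)
    n = len(ordered)
    # Smallest k with k / n >= alpha; the tiny offset guards against float rounding.
    k = max(1, math.ceil(alpha * n - 1e-12))
    return ordered[k - 1]

def expected_shortfall(sample, alpha):
    """Conditional tail expectation E[Z | Z <= Q_alpha(Z)] under the empirical distribution."""
    q = generalized_quantile(sample, alpha)
    tail = [z for z in sample if z <= q]
    return sum(tail) / len(tail)

# Hypothetical daily returns, used only for illustration.
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03, 0.04]
q = generalized_quantile(returns, 0.2)
es = expected_shortfall(returns, 0.2)
print(q, es)
```

Here the ES lies below the quantile, as it must, because it averages only the observations at or below the quantile.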
Gneiting (2011) shows that the ES is not 1-elicitable with respect to any class $\mathcal{P}$ of probability distributions on intervals $I \subseteq \mathbb{R}$ which contains measures with finite support or finite mixtures of absolutely continuous distributions with compact support (see also Weber, 2006). This result has several consequences for the risk measure ES. First, consistent and meaningful ranking of competing forecasts for the functional ES is infeasible. Second, and more consequential for this work, estimating the parameters $\theta^e$ of a stand-alone regression model for the functional ES in the sense that $ES_\alpha(Y \mid X) = X'\theta^e$ by means of M-estimation, i.e. by minimizing some strictly consistent loss function, is infeasible.

Even though the ES is not 1-elicitable, Fissler and Ziegel (2016) show that the pair consisting of the ES and the quantile at common probability level $\alpha$ is 2-elicitable relative to the class of distributions with finite first moments and unique $\alpha$-quantiles and they characterize the full class of strictly consistent loss functions for this pair subject to some regularity conditions. Since the definition of the ES already depends on the respective quantile, the fact that the ES is only elicitable jointly with the quantile is not surprising. We utilize this joint elicitability result for the introduction of a new joint regression framework for the quantile and the ES where the aforementioned class of strictly consistent loss functions serves as the basis for the M-estimation of the joint regression parameters.

For this, let $Y : \Omega \to \mathbb{R}$ and $X : \Omega \to \mathbb{R}^k$ be random variables defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$ as above. Henceforth, the transpose of $X$ will be denoted by $X'$, the cumulative distribution function of $Y$ given $X$ by $F_{Y|X}$ and the conditional density function by $f_{Y|X}$. For a $k$-times differentiable real-valued function $G : \mathbb{R} \to \mathbb{R}$, we denote the $k$-th derivative by $G^{(k)}(\cdot)$.

Assumption 2.1 (The joint regression model).
The regression framework which jointly models the conditional quantile and ES of $Y$ given $X$ for some fixed level $\alpha \in (0, 1)$ is given by

$$Y = X'\theta_0^q + u^q \quad \text{and} \quad Y = X'\theta_0^e + u^e, \qquad (2.1)$$

where $Q_\alpha(u^q \mid X) = 0$ and $ES_\alpha(u^e \mid X) = 0$. The model is parametrized by $\theta_0 = (\theta_0^{q\prime}, \theta_0^{e\prime})' \in \Theta \subset \mathbb{R}^{2k}$, where the parameter space $\Theta$ is compact with nonempty interior, $\operatorname{int}(\Theta) \neq \emptyset$.

We propose both an M-estimation and a Z-estimation procedure for the compound regression parameter vector $\theta_0$. For the M-estimation, we adapt the class of strictly consistent joint loss functions [1] for the quantile and ES as given in Fissler and Ziegel (2016) such that it can be used in a regression framework,

$$\rho(Y, X, \theta) = \left(\mathbb{1}_{\{Y \le X'\theta^q\}} - \alpha\right) G_1(X'\theta^q) - \mathbb{1}_{\{Y \le X'\theta^q\}} G_1(Y) + G_2(X'\theta^e) \left( X'\theta^e - X'\theta^q + \frac{(X'\theta^q - Y)\, \mathbb{1}_{\{Y \le X'\theta^q\}}}{\alpha} \right) - \mathcal{G}_2(X'\theta^e) + a(Y), \qquad (2.2)$$

where the function $G_1$ is twice continuously differentiable, $\mathcal{G}_2$ is three times continuously differentiable, $\mathcal{G}_2^{(1)} = G_2$, $G_2$ and $G_2^{(1)}$ are strictly positive, $G_1$ is increasing and $a$ and $G_1$ are integrable. We discuss the choice of the specification functions $G_1$ and $G_2$ in a theoretical context in Section 2.3 and by their numerical performance in Section 4.2. The corresponding ($\rho$-type) M-estimator is defined by a sequence $\hat\theta_{\rho,n}$, such that $\hat\theta_{\rho,n} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \theta)$.

Instead of minimizing some objective function $\rho(Y, X, \theta)$ such as in (2.2), we can also define the corresponding Z-estimator (or $\psi$-type M-estimator), which sets a vector of estimating equations (moment conditions), denoted by $\psi(Y, X, \theta)$, to zero. More generally, it suffices that these estimating equations converge to zero almost surely. Formally, the Z-estimator is a sequence $\hat\theta_{\psi,n}$, such that $\frac{1}{n} \sum_{i=1}^n \psi(Y_i, X_i, \hat\theta_{\psi,n}) \to 0$ almost surely, where

$$\psi(Y, X, \theta) = \begin{pmatrix} \psi_1(Y, X, \theta) \\ \psi_2(Y, X, \theta) \end{pmatrix} = \begin{pmatrix} \left(\mathbb{1}_{\{Y \le X'\theta^q\}} - \alpha\right) \left( X G_1^{(1)}(X'\theta^q) + \frac{1}{\alpha} X G_2(X'\theta^e) \right) \\ X G_2^{(1)}(X'\theta^e) \left( X'\theta^e - X'\theta^q + \frac{1}{\alpha} (X'\theta^q - Y)\, \mathbb{1}_{\{Y \le X'\theta^q\}} \right) \end{pmatrix}, \qquad (2.3)$$

which is obtained by differentiating [2] (2.2) and where the functions $G_1$ and $G_2$ are given as above.
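As a minimal sketch of how the loss (2.2) and the estimating equations (2.3) can be evaluated for a single observation, consider the Python functions below. The function names, the toy sample, and the illustrative choice $G_1(z) = z$, $G_2(z) = \mathcal{G}_2(z) = \exp(z)$ with $a(Y) = 0$ are assumptions made here for demonstration; this is not the implementation of the authors' R package.

```python
import math

def joint_loss(y, xq, xe, alpha, G1, G2, calG2, a=lambda y: 0.0):
    """Joint loss (2.2) for one observation.

    xq and xe are the linear predictors x'theta^q and x'theta^e;
    calG2 is an antiderivative of G2.
    """
    hit = 1.0 if y <= xq else 0.0
    quantile_part = (hit - alpha) * G1(xq) - hit * G1(y)
    es_part = G2(xe) * (xe - xq + (xq - y) * hit / alpha) - calG2(xe)
    return quantile_part + es_part + a(y)

def estimating_equations(y, x, xq, xe, alpha, dG1, G2, dG2):
    """Estimating equations (2.3) for one observation; x is the covariate vector."""
    hit = 1.0 if y <= xq else 0.0
    w1 = (hit - alpha) * (dG1(xq) + G2(xe) / alpha)
    w2 = dG2(xe) * (xe - xq + (xq - y) * hit / alpha)
    return [xj * w1 for xj in x], [xj * w2 for xj in x]

# Illustrative specification (an assumption): G1(z) = z, G2(z) = calG2(z) = exp(z).
G1, G2, calG2 = (lambda z: z), math.exp, math.exp

def average_loss(ys, q, e, alpha):
    """Average loss of an intercept-only model with quantile q and ES e."""
    return sum(joint_loss(y, q, e, alpha, G1, G2, calG2) for y in ys) / len(ys)

ys = [-2.0, -1.2, -0.7, -0.3, 0.0, 0.2, 0.5, 0.9, 1.4, 2.1]
alpha = 0.25
q_star = -0.7                                # empirical 0.25-quantile of ys
e_star = (-2.0 - 1.2 + 0.5 * -0.7) / 2.5     # (1/alpha) * integral of empirical quantiles
best = average_loss(ys, q_star, e_star, alpha)
```

In this intercept-only toy example the average loss is smallest at the empirical quantile and ES, which mirrors the strict consistency property on which the M-estimator relies.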
When the loss function $\rho(Y, X, \theta)$ is continuously differentiable in $\theta$, it is obvious that the M- and Z-estimation approaches are equivalent. However, in this case the loss function $\rho(Y, X, \theta)$ is not differentiable and $\psi(Y, X, \theta)$ is discontinuous at the points where $Y = X'\theta^q$. Thus, we treat these two estimation approaches as different estimators and show their asymptotic behavior separately.

[1] One can interpret the structure of this loss function as follows (Fissler et al., 2016): The first summand in (2.2) is a strictly consistent loss function for the quantile (Gneiting, 2011) and hence only depends on the quantile, whereas the second summand cannot be split into a part depending only on the quantile and one depending only on the ES. This illustrates the fact that the ES itself is not 1-elicitable, but 2-elicitable together with the respective quantile.

[2] Note that the function $\rho(Y, X, \theta)$ given in (2.2) is only differentiable for $Y \neq X'\theta^q$. However, the points of non-differentiability, $Y = X'\theta^q$, form a nullset with respect to the absolutely continuous distribution of $Y$ given $X$.

2.2. Asymptotic Properties

In this section, we present the asymptotic properties of the M- and Z-estimator of the regression parameters. Consistency and asymptotic normality hold under the following set of weak regularity conditions, which are natural for this regression framework.

Assumption 2.2 (Regularity Conditions).

(A-1) The data $(Y_i, X_i)$ for $i = 1, \dots, n$ is an iid series of random variables, distributed as $(Y, X)$ given above. Furthermore, the conditional distribution $F_{Y|X}$ has finite second moments and is absolutely continuous with probability density function $f_{Y|X}$, which is strictly positive, continuous and bounded in a neighbourhood of the true conditional quantile, $X'\theta_0^q$.

(A-2) The matrix $\mathbb{E}[X X']$ is positive definite.
(A-3) The functions $\rho(Y, X, \theta)$ and $\psi(Y, X, \theta)$ are given as in (2.2) and (2.3), where the function $G_1$ is twice continuously differentiable, $\mathcal{G}_2$ is three times continuously differentiable, $\mathcal{G}_2^{(1)} = G_2$, $G_2$ and $G_2^{(1)}$ are strictly positive, $G_1$ is increasing and $a$ and $G_1$ are integrable.

Remark 2.3 (Finite Moment Conditions). We further have to assume that certain moments of $X$ are finite. For the sake of space, we specify the Finite Moment Conditions (M-1) - (M-4) in Appendix A. Note that these general moment conditions simplify substantially for sensible choices of the specification functions $G_1$ and $G_2$ as further outlined in Section 2.3.

Assumption (A-1) is a combination of typical regularity conditions of mean and quantile regression. Absolute continuity of $F_{Y|X}$ with a strictly positive, bounded and continuous density function in a neighborhood of the true conditional quantile is also imposed for the asymptotic theory of quantile regression. Existence of the conditional moments of $Y$ given $X$ is subject to the conditions of mean regression and is included in our regularity conditions since ES is a truncated mean. The positive definiteness (full rank condition) in (A-2) is common for any regression design with stochastic regressors in order to exclude perfect multicollinearity of the regressors. The conditions for the specification functions $G_1$ and $G_2$ in (A-3) mainly originate from the conditions for the joint elicitability of the quantile and ES in Fissler and Ziegel (2016). Differentiability of these functions is required in this setup for obtaining the estimating equations and for the differentiations in the computation of the asymptotic covariance in Theorem 2.6 and Theorem 2.7. The existence of certain moments of the explanatory variables as in conditions (M-1) - (M-4) in Appendix A is also standard in any regression design relying on stochastic regressors.
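Even though compactness of the parameter space $\Theta$ in Assumption 2.1 generally simplifies the proofs, in this setup it is crucial for consistency of the Z-estimator as the estimating equations are redescending to zero for many reasonable choices of the $G_2$ function, such as the choices resulting in positively homogeneous loss functions. For details on this, we refer to Section 3.1.

Theorem 2.4. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-1) in Appendix A hold true. Then, for every sequence $\hat\theta_{\psi,n} \in \Theta$ satisfying $\frac{1}{n} \sum_{i=1}^n \psi(Y_i, X_i, \hat\theta_{\psi,n}) \xrightarrow{a.s.} 0$, it holds that $\hat\theta_{\psi,n} \xrightarrow{a.s.} \theta_0$.

Theorem 2.5. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-2) in Appendix A hold true. Then, for every sequence $\hat\theta_{\rho,n} \in \Theta$ such that $\frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \hat\theta_{\rho,n}) \le \frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \theta_0) + o_P(1)$, it holds that $\hat\theta_{\rho,n} \xrightarrow{P} \theta_0$.

Theorem 2.6. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-3) in Appendix A hold true. Then, for every sequence $\hat\theta_{\psi,n} \in \Theta$ satisfying $\frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(Y_i, X_i, \hat\theta_{\psi,n}) \xrightarrow{P} 0$, it holds that

$$\sqrt{n} \left( \hat\theta_{\psi,n} - \theta_0 \right) \xrightarrow{d} N\left(0, \Lambda^{-1} C \Lambda^{-1}\right), \qquad (2.4)$$

with

$$\Lambda = \begin{pmatrix} \Lambda_{11} & 0 \\ 0 & \Lambda_{22} \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}, \qquad (2.5)$$

where

$$\Lambda_{11} = \frac{1}{\alpha} \mathbb{E}\left[ (X X')\, f_{Y|X}(X'\theta_0^q) \left( \alpha G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \right) \right], \qquad (2.6)$$

$$\Lambda_{22} = \mathbb{E}\left[ (X X')\, G_2^{(1)}(X'\theta_0^e) \right], \qquad (2.7)$$

$$C_{11} = \frac{1-\alpha}{\alpha} \mathbb{E}\left[ (X X') \left( \alpha G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \right)^2 \right], \qquad (2.8)$$

$$C_{12} = C_{21} = \frac{1-\alpha}{\alpha} \mathbb{E}\left[ (X X') \left( X'\theta_0^q - X'\theta_0^e \right) \left( \alpha G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \right) G_2^{(1)}(X'\theta_0^e) \right], \qquad (2.9)$$

$$C_{22} = \mathbb{E}\left[ (X X') \left( G_2^{(1)}(X'\theta_0^e) \right)^2 \left( \frac{1}{\alpha} \operatorname{Var}\left( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \right) + \frac{1-\alpha}{\alpha} \left( X'\theta_0^q - X'\theta_0^e \right)^2 \right) \right]. \qquad (2.10)$$

Theorem 2.7. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-4) in Appendix A hold true. Then, for every sequence $\hat\theta_{\rho,n} \in \Theta$ such that $\frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \hat\theta_{\rho,n}) \le \inf_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \theta) + o_P(n^{-1})$, it holds that

$$\sqrt{n} \left( \hat\theta_{\rho,n} - \theta_0 \right) \xrightarrow{d} N\left(0, \Lambda^{-1} C \Lambda^{-1}\right), \qquad (2.11)$$

where the matrices $\Lambda$ and $C$ are given as in Theorem 2.6.

Remark 2.8 (Quantile Regression).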
Notice that the asymptotic covariance matrix of the quantile-specific parameter estimates $\hat\theta^q$ is given by $\alpha(1-\alpha) D_1^{-1} D_0 D_1^{-1}$, where

$$D_1 = \mathbb{E}\left[ (X X')\, f_{Y|X}(X'\theta_0^q) \left( \alpha G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \right) \right] \quad \text{and} \qquad (2.12)$$

$$D_0 = \mathbb{E}\left[ (X X') \left( \alpha G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \right)^2 \right]. \qquad (2.13)$$

This simplifies to the covariance matrix of quantile regression parameter estimates by setting $G_1(z) = z$ and $G_2(z) = 0$, which means ignoring the ES-specific part of our loss function and estimating equations. This demonstrates that the quantile regression method is nested in our regression procedure, also in terms of its asymptotic distribution.

Remark 2.9 (Asymptotic Covariance of the ES and the Oracle Estimator). The ES-specific part of the asymptotic covariance is mainly governed by the term $C_{22}$, which depends on the quantity

$$\frac{1}{\alpha} \operatorname{Var}\left( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \right) + \frac{1-\alpha}{\alpha} \left( X'\theta_0^q - X'\theta_0^e \right)^2 = \frac{1}{\alpha^2} \operatorname{Var}\left( (Y - X'\theta_0^q)\, \mathbb{1}_{\{Y \le X'\theta_0^q\}} \mid X \right). \qquad (2.14)$$

It is reasonable that the asymptotic covariance of ES regression parameters depends on the truncated variance of $Y$ given $X$ as the asymptotic covariance of mean regression parameters is driven by the conditional (non-truncated) variance of $Y$ given $X$. The second term, $\frac{1-\alpha}{\alpha}(X'\theta_0^q - X'\theta_0^e)^2$, in (2.14) is included since the ES represents a truncated mean where the truncation point itself is a statistical functional (the quantile). In comparison, we consider an oracle M-estimator for the ES-specific regression parameters $\theta^e$, given by the loss function

$$\rho_{\text{Oracle}}(Y, X, \theta^e) = (Y - X'\theta^e)^2\, \mathbb{1}_{\{Y \le X'\theta_0^q\}}, \qquad (2.15)$$

where we assume that the true quantile regression parameters $\theta_0^q$ are known. The resulting asymptotic covariance is given by

$$\operatorname{AVar}\left( \hat\theta^e_{\text{Oracle}} \right) = \frac{1}{\alpha} \mathbb{E}[X X']^{-1}\, \mathbb{E}\left[ (X X') \operatorname{Var}\left( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \right) \right] \mathbb{E}[X X']^{-1}, \qquad (2.16)$$

which shows that the additional term involving $(X'\theta_0^q - X'\theta_0^e)^2$ is not included for this estimator with fixed truncation point $X'\theta_0^q$.

Remark 2.10 (Joint Estimation of the Sample Quantile and ES).
We can use this regression framework to jointly estimate the quantile and ES of an identically distributed sample $Y_1, \dots, Y_n$ by regressing on a constant only. The asymptotic covariance matrix given in Theorem 2.6 and Theorem 2.7 then simplifies to $\Sigma$ with components

$$\Sigma_{11} = \frac{\alpha(1-\alpha)}{f_Y^2(\theta_0^q)}, \qquad (2.17)$$

$$\Sigma_{12} = \Sigma_{21} = (1-\alpha)\, \frac{\theta_0^q - \theta_0^e}{f_Y(\theta_0^q)}, \qquad (2.18)$$

$$\Sigma_{22} = \frac{1}{\alpha} \operatorname{Var}\left( Y - \theta_0^q \mid Y \le \theta_0^q \right) + \frac{1-\alpha}{\alpha} \left( \theta_0^q - \theta_0^e \right)^2, \qquad (2.19)$$

where $\theta_0^q$ and $\theta_0^e$ are the true quantile and ES of $Y$. The same result is obtained by Zwingmann and Holzmann (2016), who further allow for a distribution function for $Y$ which is not differentiable at the quantile with strictly positive derivative. Notice that in this simplified case without covariates, the asymptotic covariance matrix is independent of the specification functions $G_1$ and $G_2$ used in the loss function and in the estimating equations. Furthermore, (2.17) implies that quantile estimates stemming from our joint estimation procedure have the same asymptotic efficiency as quantile estimates stemming from minimizing the generalized piecewise linear loss (Gneiting, 2011) and as sample quantiles (cf. Koenker, 2005). The same holds true for the efficiency of the sample ES estimators (based on the sample quantile) of Brazauskas et al. (2008) and Chen (2008).

Remark 2.11 (Pseudo-$R^2$ and the choice of $a(Y)$). By choosing $a(Y) = \alpha G_1(Y) + \mathcal{G}_2(Y)$ in (2.2), we can guarantee non-negative losses $\rho(Y, X, \theta) \ge 0$. This choice enables us to define a pseudo-$R^2$ for our joint regression framework in the sense of Koenker and Machado (1999),

$$R^{QE} = 1 - \frac{\rho(Y, X, \hat\theta)}{\rho(Y, X, \tilde\theta)}, \qquad (2.20)$$

where $\hat\theta$ denotes the parameter estimates of the full regression model and $\tilde\theta$ denotes the parameter estimates of a regression model restricted to an intercept term only. However, this choice of $a(Y)$ comes at the cost of more restrictive moment conditions, since we need to impose that $\mathbb{E}\left| \alpha G_1(Y) + \mathcal{G}_2(Y) \right| < \infty$.

2.3.
Choice of the Specification Functions

The loss functions and the estimating equations given in (2.2) and (2.3) depend on two specification functions, $G_1$ and $\mathcal{G}_2$ (with derivative $G_2$), which have to fulfill the regularity conditions (A-3) in Assumption 2.2. Fissler et al. (2016) already mention the feasible choices $G_1(z) = 0$, $G_1(z) = z$, $G_2(z) = \exp(z)$ and $G_2(z) = \exp(z) / (1 + \exp(z))$ in order to show that this class is non-empty. In contrast to the loss functions of mean, quantile and expectile regression, there is no natural choice for these specification functions for the quantile and ES yet (Nolde and Ziegel, 2017). However, as the choice of these functions strongly influences the performance of our regression procedure in terms of its asymptotic efficiency, the necessary moment conditions of the regressors and the numerical performance of the optimization algorithm, we discuss sensible selection criteria in the following.

Efron (1991) and Nolde and Ziegel (2017) argue that for M-estimation of regression parameters it is crucial that the utilized loss function is positively homogeneous of some order $b \in \mathbb{R}$ in the sense that

$$\rho(cY, X, c\theta) = c^b \rho(Y, X, \theta) \qquad (2.21)$$

for all $c > 0$. This is an important property for loss functions since the ordering of the losses should be independent of the unit of measurement, e.g. the currency in which we measure the prices and risk forecasts. Loss functions with this property guarantee that we can change the scaling and still obtain the same optima and consequently the same parameter estimates. For the pair consisting of the quantile and the ES, Nolde and Ziegel (2017) characterize the full class of positively homogeneous [3] loss functions of order $b$ for the case where we restrict the domain of $\mathcal{G}_2$, i.e.
the conditional ES, to the negative real line [4]:

$$b < 0: \quad G_1(z) = -c_0, \quad \mathcal{G}_2(z) = c_1 (-z)^b + c_0, \qquad (2.22)$$

$$b = 0: \quad G_1(z) = d_0 \mathbb{1}_{\{z \le 0\}} + d_0' \mathbb{1}_{\{z > 0\}}, \quad \mathcal{G}_2(z) = -c_1 \log(-z) + c_0, \qquad (2.23)$$

$$b \in (0, 1): \quad G_1(z) = \left( d_1 \mathbb{1}_{\{z \le 0\}} + d_1' \mathbb{1}_{\{z > 0\}} \right) |z|^b - c_0, \quad \mathcal{G}_2(z) = -c_1 (-z)^b + c_0, \qquad (2.24)$$

for some constants $c_0, d_0, d_0' \in \mathbb{R}$ with $d_0 \le d_0'$, $d_1, d_1' \ge 0$ and $c_1 > 0$. There are no positively homogeneous loss functions for the cases $b \ge 1$. Our numerical simulations show that there is no gain in efficiency or numerical accuracy by deviating from the choice $G_1(z) = 0$ (see also Fissler et al., 2016; Nolde and Ziegel, 2017; Ziegel et al., 2017), which is also consistent with the homogeneity result. Consequently, we use $G_1(z) = 0$ in the following.

A different natural guiding principle for selecting the specification functions is induced by choosing $G_2$ (and $G_1$) such that the moment conditions (M-1) - (M-4) in Appendix A are as unrestrictive and as parsimonious as possible. For instance, choosing $G_2$ such that $G_2$ and its first and second derivatives are bounded functions (and $G_1(z) = 0$) results in the moment condition $\mathbb{E}\left[ \|X\|^5 + \|X\|^4\, \mathbb{E}[|Y| \mid X] + \|X\|^3\, \mathbb{E}[Y^2 \mid X] + |a(Y)| \right] < \infty$. This motivates the usage of bounded functions [5] for $G_2$ such as the second example of Fissler et al. (2016), $G_2(z) = \exp(z) / (1 + \exp(z))$, which is the distribution function of the standard logistic distribution. Further examples of bounded $G_2$ functions include the distribution functions of absolutely continuous distributions on the real line. In the simulation study in Section 4.2, we compare the performance of different specification functions in terms of mean squared error, asymptotic efficiency of the estimator and computation times.

3. Numerical Estimation of the Model

In this section, we discuss the difficulties one encounters and the solutions we propose for estimating the joint regression model.
Section 3.1 illustrates the numerical optimization procedure we employ for estimating the regression parameters and Section 3.2 discusses different estimation methods for the covariance matrix of the estimator.

3.1. Optimization

Theorem 2.6 and Theorem 2.7 imply that both M-estimation and Z-estimation of the regression parameters have the same asymptotic efficiency and consequently, we discuss these estimation approaches in terms of their numerical performance in the following. The numerical implementation of the Z-estimator relies on root-finding of the estimating equations given in (2.3), which we implement as in GMM-estimation by minimizing the inner product $\left( \sum_i \psi(Y_i, X_i, \theta) \right)' \cdot \left( \sum_i \psi(Y_i, X_i, \theta) \right)$. However, the estimating equations are redescending to zero for many attractive choices of $G_2$ in the sense that $\psi_2(Y, X, \theta) \to 0$ for $X'\theta^e \to -\infty$. Consequently, for $\tilde\theta$ such that $\tilde\theta^q = \theta_0^q$ and $X'\tilde\theta^e \to -\infty$, we get the same minimal value of the Z-estimation objective function $\left( \sum_i \psi(Y_i, X_i, \theta) \right)' \cdot \left( \sum_i \psi(Y_i, X_i, \theta) \right)$ as for the true regression parameters $\theta_0$. Thus, the Z-estimator is numerically unstable and diverges in many setups. Consequently, we rely on M-estimation of the regression parameters in the following.

[3] For $b = 0$, only the loss differences are positively homogeneous. However, the ordering of the losses is still unaffected under this slightly weaker property.

[4] Since the conditional ES of financial assets for small probability levels is always negative, this is no critical restriction. However, for the numerical parameter estimation, we have to restrict the parameter space $\Theta$ such that $X'\theta^e < 0$ for all $\theta \in \Theta$ and for all $X$ in the underlying sample. For details on this, we refer to Section 3.1.

[5] Note that the positively homogeneous loss functions exhibit unbounded $G_2$ functions. However, as the function $G_2(z)$ does not grow faster than linearly as $z$ tends to infinity, the resulting finite moment conditions are not too restrictive.
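The redescending behavior can be illustrated numerically. The sketch below evaluates the sample mean of $\psi_2$ in an intercept-only model, assuming the specification $G_2(z) = -1/z$ (so that $G_2^{(1)}(z) = 1/z^2$) on the negative real line; the function name and the toy sample are hypothetical and serve only as an illustration.

```python
def mean_psi2(ys, q, e, alpha):
    """Sample mean of psi_2 for an intercept-only model (X = 1).

    Assumes G2(z) = -1/z, whose derivative is 1/z**2, so e must be negative.
    """
    total = 0.0
    for y in ys:
        hit = 1.0 if y <= q else 0.0
        total += (1.0 / e**2) * (e - q + (q - y) * hit / alpha)
    return total / len(ys)

ys = [-2.0, -1.2, -0.7, -0.3, 0.0, 0.2, 0.5, 0.9, 1.4, 2.1]
alpha, q = 0.25, -0.7   # q is the empirical 0.25-quantile of ys

# The first value of e is the empirical ES (a genuine root); the others drift to -infinity.
for e in (-1.42, -10.0, -100.0, -1000.0):
    print(e, mean_psi2(ys, q, e, alpha))
```

The estimating equation is essentially zero at the empirical ES, but its magnitude also shrinks towards zero as the ES parameter moves towards minus infinity, so a root-finder or GMM-type objective cannot reliably distinguish the genuine root from a diverging sequence.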
As the loss functions given in (2.2) are not differentiable and non-convex for all applicable choices of the specification functions (Fissler, 2017), we apply a derivative-free global optimization technique. More specifically, we use the Iterated Local Search (ILS) meta-heuristic of Lourenço et al. (2003), which successively refines the parameter estimates by repeated optimizations with iteratively perturbed starting values. Our exact implementation consists of the following steps. First, we obtain starting values for $\theta^q$ and $\theta^e$ from two quantile regressions of $Y$ on $X$ for the probability levels $\alpha$ and $\tilde\alpha$, where we choose $\tilde\alpha$ such that the $\tilde\alpha$-quantile and the $\alpha$-ES coincide under normality. Second, using these starting values we minimize the loss function with the derivative-free and robust Nelder-Mead Simplex algorithm (Nelder and Mead, 1965). Third, we perturb the resulting parameter estimates by adding normally distributed noise with zero mean and standard deviation equal to the estimated asymptotic standard errors of the initial quantile regression estimates. Fourth, we re-optimize the model with the perturbed parameter estimates as new starting values. If the loss is further decreased by this re-optimization, we update the estimates and otherwise, we retain the previous ones. Fifth, we iterate over the previous two steps until the loss does not decrease in $m = 10$ consecutive iterations. Our numerical experiments indicate that this repeated optimization procedure yields estimates very close to the ones stemming from other global optimization techniques such as simulated annealing, whereas the major advantage of ILS is the considerably lower computation time.

For the choices of the specification functions which result in positively homogeneous loss functions, we have to restrict the domain of $\mathcal{G}_2$ to the negative real line as already discussed in Section 2.3.
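The iterated local search steps described above can be sketched as follows for an intercept-only model. This is a simplified illustration under several assumptions that are not taken from the paper: a plain compass (pattern) search stands in for the Nelder-Mead algorithm, the perturbation scale `noise_sd` is a fixed hypothetical value instead of estimated quantile regression standard errors, a safety cap `max_iter` is added, and the loss uses $G_1(z) = 0$ with $G_2(z) = -1/z$. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def avg_loss(theta, ys, alpha):
    """Average joint loss, intercept-only model, G1 = 0 and G2(z) = -1/z."""
    q, e = theta
    if e >= 0:                      # this specification needs a negative ES
        return math.inf
    total = 0.0
    for y in ys:
        hit = 1.0 if y <= q else 0.0
        total += (-1.0 / e) * (e - q + (q - y) * hit / alpha) + math.log(-e)
    return total / len(ys)

def sample_quantile(ys, level):
    ordered = sorted(ys)
    k = max(1, math.ceil(level * len(ordered)))
    return ordered[k - 1]

def compass_search(f, start, step=0.1, tol=1e-4):
    """Derivative-free local search; a simple stand-in for Nelder-Mead."""
    x, fx = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0             # refine the mesh once no neighbor improves
    return x, fx

def iterated_local_search(f, start, noise_sd, max_stall=10, max_iter=50, seed=0):
    rng = random.Random(seed)
    best, best_loss = compass_search(f, start)            # initial optimization
    stall = 0
    for _ in range(max_iter):                             # safety cap (assumption)
        if stall >= max_stall:                            # no decrease in m runs
            break
        perturbed = [b + rng.gauss(0.0, noise_sd) for b in best]
        cand, cand_loss = compass_search(f, perturbed)    # re-optimize
        if cand_loss < best_loss:
            best, best_loss, stall = cand, cand_loss, 0   # keep the improvement
        else:
            stall += 1                                    # retain previous estimate
    return best, best_loss

# Simulated standard normal data, for illustration only.
data_rng = random.Random(1)
ys = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
alpha = 0.025
# alpha_tilde: level whose quantile equals the alpha-ES under normality.
z_alpha = NormalDist().inv_cdf(alpha)
alpha_tilde = NormalDist().cdf(-NormalDist().pdf(z_alpha) / alpha)
start = [sample_quantile(ys, alpha), sample_quantile(ys, alpha_tilde)]
objective = lambda theta: avg_loss(theta, ys, alpha)
estimate, loss = iterated_local_search(objective, start, noise_sd=0.1)
print(estimate, loss)
```

Because each candidate is accepted only if it lowers the loss, the returned value can never be worse than the loss at the starting values; the perturbation step is what allows the search to leave a poor local optimum of the non-convex objective.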
We therefore have to restrict $\Theta$ such that $X_i'\theta^e < 0$ for all $\theta \in \Theta$ and for all $i = 1, \dots, n$ during the optimization process. Even though in financial risk management the response variable $Y$ is usually given by financial returns where the true (conditional) ES is strictly negative, there might still be some outliers $X_i$ such that $X_i'\theta^e_0 \ge 0$. In such a case, imposing the restriction $X_i'\theta^e < 0$ for all $i = 1, \dots, n$ during the optimization process generates substantially biased estimates for $\theta_0$. In order to avoid this, we estimate the regression model for the transformed dependent variable $Y - \max(Y)$ for the positively homogeneous loss functions and add $\max(Y)$ to the estimated intercept parameters to undo the transformation.⁶

We provide an R package for the estimation of the regression parameters (see Bayer and Dimitriadis, 2017a). This package contains an implementation of both the M- and the Z-estimator, where different optimization algorithms can be chosen (ILS, simulated annealing). The package allows for choosing the specification functions $G_1$ and $\mathcal{G}_2$ and it includes an option to estimate the model either with or without the translation of the dependent variable. Furthermore, the covariance matrix of the parameter estimates can be estimated either by using the asymptotic theory and the resulting techniques we discuss in the next section, or by using the nonparametric iid bootstrap (Efron, 1979). We recommend applying the M-estimator with the ILS algorithm as this procedure exhibits the best performance in our numerical experiments with respect to accuracy, stability and computation times.

⁶ Note that this data transformation changes the average loss function as the applied loss functions are in general not translation invariant. Thus, optimizing the translated loss function can lead to different parameter estimates. However, we do not face the risk of obtaining substantially biased estimates in cases where $X_i'\theta^e_0 \ge 0$ for some $i \in \{1, \dots, n\}$.
Our numerical experiments indicate that the difference between estimating the model for $Y$ and for $Y - \max(Y)$ is small when $X_i'\theta^e_0 < 0$ for all $i \in \{1, \dots, n\}$, but can be quite substantial if there is an outlier $X_i$ such that $X_i'\theta^e_0 \ge 0$.

3.2. Asymptotic Covariance Estimation

While most parts of the asymptotic covariance matrix given in Theorem 2.6 and Theorem 2.7 are straightforward to estimate, two nuisance quantities pose some difficulties. The first is the density quantile function $f_{Y|X}(X'\theta^q_0)$, which is already well investigated in the quantile regression literature. In particular, we consider the estimators proposed by Koenker (1994), henceforth denoted by iid, and by Hendricks and Koenker (1992), henceforth denoted by nid. The main difference between these is that the first is based on the assumption that the quantile residuals are independent of the covariates, whereas the second allows for a linear dependence structure. Both approaches depend on a bandwidth parameter which we choose according to Hall and Sheather (1988). The second nuisance quantity is the variance of the quantile residuals, conditional on the covariates and given that these residuals are negative,

$$\operatorname{Var}\big(Y - X'\theta^q_0 \,\big|\, Y \le X'\theta^q_0, X\big) = \operatorname{Var}\big(u^q \,\big|\, u^q \le 0, X\big). \qquad (3.1)$$

Estimation of this quantity is demanding for two reasons. First, for very small probability levels which are typical in financial risk management, such as $\alpha = 2.5\%$, the truncation $u^q \le 0$ cuts off all but very few (about $\alpha \cdot n$) observations. Second, modeling this truncated variance conditional on the covariates $X$ is challenging, especially considering the very small sample sizes. Under the assumption of homoscedasticity, i.e. that the distribution of $u^q$ is independent of the covariates $X$, we can simply estimate (3.1) by the sample variance of the negative quantile residuals and we refer to this estimator as ind in the following.
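As a minimal sketch of the ind estimator just described, the following Python function computes the sample variance of the non-positive quantile residuals. The function name is hypothetical, and the use of the unbiased (n − 1) sample variance is an assumption, since the text does not specify the normalization.

```python
from statistics import variance


def truncated_variance_ind(quantile_residuals):
    """Sample variance of the quantile residuals u^q with u^q <= 0.

    Assumes homoscedasticity, i.e. the distribution of u^q does not
    depend on the covariates, so all truncated residuals are pooled.
    """
    negative = [u for u in quantile_residuals if u <= 0]
    if len(negative) < 2:
        raise ValueError("need at least two non-positive residuals")
    return variance(negative)  # unbiased (n - 1) normalization
```

With $\alpha = 2.5\%$ and n = 250, only about six residuals enter this variance, which illustrates why the text calls the estimation demanding.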
We propose two further estimators which allow for a dependence of the quantile residuals on the covariates. For this purpose, we assume a location-scale process with linear⁷ specifications of the conditional mean and standard deviation in order to explicitly model the conditional relationship of $u^q$ on $X$,

$$u^q = X'\zeta + (X'\phi) \cdot \varepsilon, \qquad (3.2)$$

for some parameter vectors $\zeta, \phi \in \mathbb{R}^k$ and where $\varepsilon \sim G(0, 1)$ follows a zero mean, unit variance distribution, such that $u^q \mid X \sim G\big(X'\zeta, (X'\phi)^2\big)$ with distribution function $F_G$ and density $f_G$. As we need to estimate the truncated variance of $u^q$ given $u^q \le 0$, i.e. a truncated variant of $(X'\phi)^2$, one possibility is to estimate (3.2) only for those observations where $u^q \le 0$. However, this approach particularly suffers from the very few negative quantile residuals as we need to estimate additional parameters compared to the ind approach. We present a feasible alternative by estimating the parameters $\zeta$ and $\phi$ using all available observations of $u^q$ and $X$ by quasi generalized pseudo maximum likelihood (Gourieroux and Monfort, 1995, Section 8.4.4) and we obtain the truncated conditional variance by the scaling formula

$$\operatorname{Var}\big(u^q \,\big|\, u^q \le 0, X\big) = \int_{-\infty}^{0} z^2 h(z)\, dz - \left( \int_{-\infty}^{0} z\, h(z)\, dz \right)^2,$$

where $h(z) = f_G(z) / F_G(0)$ is the truncated conditional density of $u^q$ given $X$ and $u^q \le 0$. We propose one parametric estimator, henceforth denoted by scl-N, where we assume that the distribution $G$ is the normal distribution and apply a closed-form solution to the scaling formula. We further propose a semiparametric estimator, henceforth denoted by scl-sp, where we estimate the distribution $G$ nonparametrically and then apply the scaling formula for this estimated density by numerical integration.

4. Simulation Study

In this section, we investigate the finite sample behavior of the M-estimator and verify the asymptotic properties derived in Section 2.2 through simulations.
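To make the scaling formula of Section 3.2 concrete, the sketch below evaluates it in two ways for a normal location-scale model: with the standard closed-form variance of an upper-truncated normal distribution (the kind of closed form the scl-N estimator can use), and by numerical integration of the scaling formula for an arbitrary density (the route described for scl-sp). The function names, the midpoint rule and the integration range are assumptions made for illustration only; `mu` and `sigma` play the role of the fitted conditional mean and standard deviation of $u^q$ for one observation.

```python
from statistics import NormalDist


def truncated_normal_variance(mu, sigma):
    """Closed form of Var(u | u <= 0) for u ~ N(mu, sigma^2)."""
    std = NormalDist()
    beta = (0.0 - mu) / sigma          # standardized truncation point
    lam = std.pdf(beta) / std.cdf(beta)
    return sigma ** 2 * (1.0 - beta * lam - lam ** 2)


def truncated_variance_numeric(pdf, lower, n=20_000):
    """Scaling formula by the midpoint rule on [lower, 0].

    `pdf` is the conditional density of u; `lower` should be far enough
    in the left tail that the mass below it is negligible.
    """
    h = (0.0 - lower) / n
    mass = m1 = m2 = 0.0
    for i in range(n):
        z = lower + (i + 0.5) * h
        w = pdf(z) * h
        mass += w          # approximates F(0)
        m1 += z * w        # approximates the integral of z f(z)
        m2 += z * z * w    # approximates the integral of z^2 f(z)
    mean = m1 / mass
    return m2 / mass - mean ** 2
```

For a standard normal residual the closed form gives $1 - 2/\pi$, and the numerical version agrees with the closed form up to discretization error.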
Furthermore, we compare the performance of different choices for the specification functions and evaluate the precision of the different covariance matrix estimators described in Section 3.2.

⁷ This approach can further be generalized by considering more general specifications for the conditional mean and standard deviation. However, our numerical experiments indicate that the estimation accuracy for the asymptotic covariance matrix does not increase by deviating from these linear specifications.

4.1. Data Generating Process

In order to assess the numerical properties of estimating the joint regression model, we simulate data from a linear location-scale data generating process (DGP),

$$Y = X'\gamma + (X'\eta) \cdot v, \qquad (4.1)$$

where $v \sim F(0, 1)$ has zero mean and unit variance, $X = (1, X_2, \dots, X_k)$ and $\gamma, \eta \in \mathbb{R}^k$. For this process, the true conditional quantile and ES are linear functions in $X$, given by

$$Q_\alpha(Y \mid X) = X'(\gamma + z_\alpha \eta) \quad \text{and} \quad ES_\alpha(Y \mid X) = X'(\gamma + \xi_\alpha \eta), \qquad (4.2)$$

where $z_\alpha$ and $\xi_\alpha$ are the $\alpha$-quantile and $\alpha$-ES of the distribution $F(0, 1)$, which implies that $\theta^q_0 = \gamma + z_\alpha \eta$ and $\theta^e_0 = \gamma + \xi_\alpha \eta$. Furthermore, the conditional distributions of the quantile- and ES-residuals are given by

$$u^q \mid X \sim F\big(-z_\alpha (X'\eta), (X'\eta)^2\big) \quad \text{and} \quad u^e \mid X \sim F\big(-\xi_\alpha (X'\eta), (X'\eta)^2\big). \qquad (4.3)$$

For the simulation study, we want to assess the performance of our regression procedure in various setups. Thus, we specify $\gamma$, $\eta$ and $F$ in the following such that we get data which is homoscedastic (DGP-(1)) and heteroscedastic (DGP-(2)). Furthermore, we include a regression setup with multiple, correlated regressors and a leptokurtic conditional distribution (DGP-(3)),

DGP-(1): $X = (1, X_2)$, $X_2 \sim \chi^2_1$ and $Y \mid X \sim N(-X_2, 1)$

DGP-(2): $X = (1, X_2)$, $X_2 \sim \chi^2_1$ and $Y \mid X \sim N\big(-X_2, (1 + 0.5 X_2)^2\big)$

DGP-(3): $X = (1, X_2, X_3)$, $X_2, X_3 \sim U[0, 1]$ with $\operatorname{corr}(X_2, X_3) = 0.5$ and $Y \mid X \sim t_5\big(X_2 - X_3, (1 + X_2 + X_3)^2\big)$.

We simulate all three processes 25,000 times with varying sample sizes of n = 250, 500, 1000, 2000 and 5000 observations.
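The relation between the DGP parameters and the true regression parameters in (4.2) can be illustrated with a small sketch for Gaussian innovations, where the standard normal $\alpha$-ES is $-\varphi(z_\alpha)/\alpha$. The function names and the example values of the location and scale vectors are hypothetical and chosen only for illustration; they are not claimed to be the exact designs of the simulation study.

```python
import random
from statistics import NormalDist


def normal_quantile_and_es(alpha):
    """alpha-quantile z and alpha-ES xi of the standard normal."""
    std = NormalDist()
    z = std.inv_cdf(alpha)
    xi = -std.pdf(z) / alpha
    return z, xi


def true_parameters(gamma, eta, alpha):
    """theta^q_0 = gamma + z * eta and theta^e_0 = gamma + xi * eta."""
    z, xi = normal_quantile_and_es(alpha)
    theta_q = [g + z * e for g, e in zip(gamma, eta)]
    theta_e = [g + xi * e for g, e in zip(gamma, eta)]
    return theta_q, theta_e


def simulate(gamma, eta, covariates, seed=0):
    """Draw Y = X'gamma + (X'eta) * v with v ~ N(0, 1) for each row X."""
    rng = random.Random(seed)
    ys = []
    for x in covariates:
        location = sum(g * xj for g, xj in zip(gamma, x))
        scale = sum(e * xj for e, xj in zip(eta, x))
        ys.append(location + scale * rng.gauss(0.0, 1.0))
    return ys
```

For example, `true_parameters([0.0, -1.0], [1.0, 0.5], 0.025)` returns the quantile and ES parameter vectors of a heteroscedastic Gaussian location-scale process; the ES intercept lies below the quantile intercept because $\xi_\alpha < z_\alpha$.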
For each replication and for each of the sample sizes we regress the simulated $Y$'s on the covariates $X$ using our joint regression method for the probability level $\alpha = 2.5\%$.

4.2. Comparing the Specification Functions

We start the discussion of the simulation results by investigating the numerical performance of the M-estimator based on different choices of the specification function⁸ $\mathcal{G}_2$ used in the loss function in (2.2). We use three natural examples resulting in positively homogeneous loss functions of order b = −1, b = 0 and b = 0.5 respectively⁹, a bounded $\mathcal{G}_2$ function and the (unbounded) exponential function:

$$\mathcal{G}_2(z) = -1/z, \quad \mathcal{G}_2(z) = -\log(-z), \quad \mathcal{G}_2(z) = -\sqrt{-z}, \quad \mathcal{G}_2(z) = \log\big(1 + \exp(z)\big), \quad \text{and} \quad \mathcal{G}_2(z) = \exp(z). \qquad (4.4)$$

Figure 1 presents the sum (over the 2k regression parameters) of the mean squared errors (MSE) of the regression parameters for the three DGPs described above, different sample sizes and for the five choices of the specification functions given in (4.4). As implied by the asymptotic theory, we obtain consistent parameter estimates for all five choices of the specification functions as the MSEs converge to zero for all three DGPs. However, they differ substantially with respect to their small sample properties. The three positively homogeneous specifications result in the most accurate estimates, where the choices $\mathcal{G}_2(z) = -\sqrt{-z}$ and $\mathcal{G}_2(z) = -\log(-z)$ tend to perform slightly better than the choice $\mathcal{G}_2(z) = -1/z$. Furthermore, the bounded choice $\mathcal{G}_2(z) = \log\big(1 + \exp(z)\big)$ still performs better than the unbounded exponential function.

⁸ Following the reasoning of Section 2.3 and Nolde and Ziegel (2017); Ziegel et al. (2017), we fix $G_1(z) = 0$ throughout the simulation study.

⁹ Our numerical simulations show that the numerical results are unaffected by different choices of the associated constants in (2.22) - (2.24).
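The five specification functions in (4.4) can be written down together with their derivatives as in the sketch below. It assumes that the loss in (2.2) has the joint quantile and ES form of Fissler and Ziegel (2016) with $G_1 = 0$ (as in footnote 8) and $a(Y) = 0$, and that $G_2$ denotes the derivative of $\mathcal{G}_2$; the dictionary and function names are hypothetical.

```python
import math

# name -> (calG2, G2), where G2 is the derivative of calG2.
# The first three choices are only defined for z < 0.
SPEC = {
    "-1/z": (lambda z: -1.0 / z, lambda z: 1.0 / z ** 2),
    "-log(-z)": (lambda z: -math.log(-z), lambda z: -1.0 / z),
    "-sqrt(-z)": (lambda z: -math.sqrt(-z), lambda z: 0.5 / math.sqrt(-z)),
    "log(1+exp(z))": (
        lambda z: math.log1p(math.exp(z)),
        lambda z: 1.0 / (1.0 + math.exp(-z)),
    ),
    "exp(z)": (math.exp, math.exp),
}


def joint_loss(y, q, e, alpha, name):
    """Joint quantile/ES loss for one observation (G1 = 0, a = 0).

    q and e are the fitted conditional quantile and ES, e.g. X'theta^q
    and X'theta^e; `name` selects a specification function from SPEC.
    """
    cal_g2, g2 = SPEC[name]
    hit = 1.0 if y <= q else 0.0
    return g2(e) * (e - q + (q - y) * hit / alpha) - cal_g2(e)


def average_loss(ys, qs, es, alpha, name):
    """Sample average of the loss, i.e. the M-estimation objective."""
    return sum(joint_loss(y, q, e, alpha, name)
               for y, q, e in zip(ys, qs, es)) / len(ys)
```

Under these assumptions, the choice $-1/z$ gives a loss that is positively homogeneous of order −1: rescaling `y`, `q` and `e` by a factor c > 0 rescales the loss by 1/c.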
[Figure 1: three panels (DGP-(1), DGP-(2), DGP-(3)); x-axis: Sample Size (250, 500, 1000, 2000, 5000); one line per choice of $\mathcal{G}_2(z)$: −log(−z), −√(−z), −1/z, log(1 + exp(z)), exp(z).]

Figure 1: Sum of the mean squared errors of the parameter estimates for all three DGPs. The results are shown for the five choices of the specification functions given in (4.4) and a range of sample sizes.

Table 1 reports the Frobenius norms of the lower triangular parts of the true asymptotic covariance matrices and of the respective (lower triangular) quantile-specific and the ES-specific sub-matrices for the three DGPs and for the five choices of the specification functions given in (4.4). For comparison, we also report the Frobenius norm of the lower triangular part of the asymptotic covariance of the quantile regression estimator. We approximate the true asymptotic covariance matrix through Monte-Carlo integration with a sample size of 10 using the formulas in Theorem 2.6 and by using the true density and conditional truncated variance. On average, the specification functions $\mathcal{G}_2(z) = -\log(-z)$ and $\mathcal{G}_2(z) = -\sqrt{-z}$ exhibit the smallest asymptotic covariances, closely followed by the third choice for a positively homogeneous loss function, $\mathcal{G}_2(z) = -1/z$. The non-homogeneous choices lead to considerably larger asymptotic variances for all considered DGPs and sub-matrices. Furthermore, by comparing the quantile-specific parameters of the joint estimation approach (from the positively homogeneous loss functions) to quantile regression estimates, we roughly obtain the same asymptotic efficiency.
Table 1: This table reports the Frobenius norms of the lower triangular parts of the asymptotic covariance matrices and the respective quantile-specific and the ES-specific sub-matrices for the three DGPs and for the five choices of the specification functions given in (4.4). For comparison, we report the same quantity for the asymptotic covariance of the quantile regression estimator.

                                DGP-(1)               DGP-(2)                  DGP-(3)
                              Q     ES   Full       Q     ES   Full        Q       ES     Full
G2(z) = −log(−z)            7.5   13.1    9.2    17.9   26.9   20.0    581.1   1739.1   1053.0
G2(z) = −√(−z)              7.0   11.8    8.4    18.0   25.4   19.3    584.5   1740.1   1054.4
G2(z) = −1/z                9.1   16.9   11.8    24.1   39.4   28.5    613.7   1851.9   1119.8
G2(z) = log(1 + exp(z))    15.4   21.5   16.6    72.4   80.1   67.1    987.9   2393.0   1496.4
G2(z) = exp(z)             15.8   22.6   17.2    74.6   84.5   70.0   1001.9   2440.4   1524.6
Quantile Regression         6.8      –      –    21.4      –      –    600.5        –        –

4.3. Comparing the Variance-Covariance Estimators

In this section, we compare the empirical performance of the asymptotic covariance estimators discussed in Section 3.2. For the comparison of their precision, Figure 2 reports the average of the Frobenius norm of the lower triangular part of the differences between the estimated covariances and the empirical covariance of the estimated parameters.
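The comparison criterion used in Table 1 and Figure 2 can be sketched as follows. The sketch assumes that the lower triangular part includes the diagonal, which the text does not state explicitly, and that matrices are given as nested lists; the function names are hypothetical.

```python
import math


def lower_tri_frobenius(matrix):
    """Frobenius norm of the lower triangular part (diagonal included)."""
    return math.sqrt(sum(matrix[i][j] ** 2
                         for i in range(len(matrix))
                         for j in range(i + 1)))


def covariance_distance(estimated, empirical):
    """Norm of the lower triangle of (estimated - empirical)."""
    diff = [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(estimated, empirical)]
    return lower_tri_frobenius(diff)
```

Restricting the norm to the lower triangle avoids counting the off-diagonal entries of a symmetric covariance matrix twice.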
[Figure 2: 3 x 3 grid of panels; rows: DGP-(1), DGP-(2), DGP-(3); columns: $\mathcal{G}_2(z)$ = −log(−z), −√(−z), −1/z; x-axis: Sample Size (250, 500, 1000, 2000, 5000); y-axis: Frobenius Norm; lines: iid / ind, nid / scl-N, nid / scl-sp, Bootstrap.]

Figure 2: This figure compares four covariance estimation approaches described in Section 3.2 for the three data generating processes, a range of sample sizes and the three positively homogeneous choices of the $\mathcal{G}_2$-functions. We report the average of the Frobenius norm of the lower triangular part of the differences between the estimated asymptotic covariances and the empirical covariance of the M-estimator.

We report results for the three homogeneous loss functions and the three DGPs, where each of the plots presents the average norm differences for the four covariance estimators (iid/ind, nid/scl-N, nid/scl-sp and the iid bootstrap) depending on the sample size. We find that the iid/ind estimator performs well for the first, homoscedastic DGP, whereas for the other two DGPs it fails to capture the more complicated underlying dynamics of the data.
The nid/scl-N estimator outperforms the other estimation approaches in the first two DGPs, where the underlying conditional distribution is normal, whereas its performance drops for the third DGP, which follows a Student-t distribution. The performance of the flexible nid/scl-sp estimator is the most stable throughout all three DGPs. Finally, the bootstrap estimator accurately estimates the covariance for all three DGPs and, in comparison to the other estimators, performs particularly well in small samples. The provided R package contains all four covariance estimators.

5. Empirical Application

In this empirical application, we use our joint regression framework for forecasting the VaR and ES of the close-to-close log returns of the IBM stock. For that purpose, we adopt the forecasting framework of Žikeš and Baruník (2016) and jointly forecast the VaR and ES of daily financial returns $r_t$ by

$$Q_\alpha(r_t \mid RV_{t-1}) = \theta^q_1 + \theta^q_2 RV_{t-1} \quad \text{and} \quad ES_\alpha(r_t \mid RV_{t-1}) = \theta^e_1 + \theta^e_2 RV_{t-1}, \qquad (5.1)$$

where $RV_t = \big(\sum_i r_{t,i}^2\big)^{1/2}$ denotes the realized volatility estimator (Andersen and Bollerslev, 1998) for day $t$ and $r_{t,i}$ denotes the $i$-th high-frequency return of day $t$. Our dataset consists of the five minute returns of the IBM stock from January 3, 2001 to July 18, 2017 with a total of 4120 days, which we obtain from the TAQ database. We estimate the model parameters using a rolling window of 1000 days and evaluate the forecasts on the remaining 3120 days. We compare the predictive power of this model against three standard models from the literature. The first is the historical simulation (HS) approach, which forecasts the VaR and ES for day $t$ as the sample quantile and ES of the daily returns of the past 250 trading days. The second is an AR(1)-GARCH(1,1)-t model (Bollerslev, 1986), and the third is the Heterogeneous Auto-Regressive (HAR) model of Corsi (2009), based on the realized volatility estimates given above.
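The ingredients of this forecasting exercise can be sketched in Python as follows: the realized volatility of one day, the joint forecast in (5.1) for given parameter estimates, and the historical simulation benchmark. The function names are hypothetical, and the particular sample quantile and sample ES definitions used for historical simulation (an order statistic and the mean of the observations up to it) are assumptions, since the text does not fix them.

```python
import math


def realized_volatility(intraday_returns):
    """RV_t: square root of the sum of squared intraday returns of a day."""
    return math.sqrt(sum(r * r for r in intraday_returns))


def joint_forecast(theta_q, theta_e, rv_prev):
    """VaR and ES forecasts from (5.1) given yesterday's RV."""
    var_forecast = theta_q[0] + theta_q[1] * rv_prev
    es_forecast = theta_e[0] + theta_e[1] * rv_prev
    return var_forecast, es_forecast


def historical_simulation(daily_returns, alpha=0.025, window=250):
    """Sample quantile and sample ES of the most recent `window` returns."""
    recent = sorted(daily_returns[-window:])
    # Number of tail observations; the small offset guards against
    # floating-point round-up when alpha * n is an integer.
    k = max(1, math.ceil(alpha * len(recent) - 1e-12))
    tail = recent[:k]
    return tail[-1], sum(tail) / k
```

In a rolling-window evaluation, `theta_q` and `theta_e` would be re-estimated on each window of 1000 days before calling `joint_forecast` for the next day.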
Forecasts of the VaR and ES for the HAR model are obtained from the volatility forecasts and by assuming a Gaussian return distribution. While the first two of these approaches rely on daily data only, the third one incorporates the same high frequency information as our approach. We evaluate the forecasting power of the VaR and ES of these models by the class of strictly consistent loss (scoring) functions for the VaR and ES of Fissler and Ziegel (2016). We use Murphy diagrams introduced by Ehm et al. (2016) and Ziegel et al. (2017), which provide a parsimonious way to evaluate competing forecasts simultaneously for a full class of strictly consistent loss functions. In fact, one forecasting model significantly dominates another one with respect to the full class of strictly consistent loss functions if and only if the elementary score differences plotted in the Murphy diagrams are strictly negative (positive). For further details on the theory and the implementation of Murphy diagrams, we refer to Ehm et al. (2016) and Ziegel et al. (2017).

[Figure 3: three panels (Difference to HS, Difference to GARCH, Difference to HAR); x-axis: Threshold; y-axis: Score Difference.]

Figure 3: Elementary Score Differences of the VaR/ES Regression and the respective comparison models

Figure 3 displays the average of the elementary score differences of the joint VaR and ES regression model against the three alternative models together with the respective 95% pointwise confidence bands for the elementary scores provided in Ziegel et al. (2017) for the pair VaR and ES. Using this graphical method, we can see that the elementary score differences for the joint regression forecasting model against the historical simulation and AR(1)-GARCH(1,1)-t model are significantly negative for the vast majority of threshold values. This implies that the joint regression forecasting model significantly dominates these other two forecasting approaches.
Even though we also observe strictly negative elementary score differences in comparison against the HAR model, these differences are not significant and consequently, we cannot significantly outperform this model.

6. Conclusion

In this paper, we introduce a joint regression technique for the quantile (the VaR) and the ES. This regression approach relies on the class of strictly consistent joint loss functions introduced by Fissler and Ziegel (2016), which permits the joint elicitation of the quantile and the ES. We introduce an M- and a Z-estimator for the parameters of the joint regression model. Given a set of standard regularity conditions, we show consistency and asymptotic normality for both estimators, which we also verify numerically through extensive simulations. The underlying loss functions, the estimating equations and the asymptotic covariance matrices of the estimators depend on the choice of two specification functions, which we investigate in terms of the resulting moment conditions, asymptotic efficiency, numerical performance and computation times. In our numerical simulations, we find that choices resulting in positively homogeneous loss functions dominate other choices with respect to the aforementioned criteria. Furthermore, we propose several estimation methods for the asymptotic covariance matrix, which are able to cope with different properties of the underlying data. We provide an R package (see Bayer and Dimitriadis, 2017a), which implements the M- and Z-estimation procedures where one can choose the underlying specification functions, the numerical optimization approach and the estimation method for the asymptotic covariance matrix.

Our new joint regression technique allows for a wide range of applications for the risk measures VaR and ES. This regression approach can be used to model the ES (jointly with the VaR) by generalizing existing applications of quantile regression on VaR, such as
in Koenker and Xiao (2006), Engle and Manganelli (2004), Chernozhukov and Umantsev (2001), Žikeš and Baruník (2016), Halbleib and Pohlmeier (2012), Komunjer (2013) and Xiao et al. (2015). As an illustration, we present an empirical application in this paper where we use this regression framework to jointly forecast VaR and ES based on realized volatility estimates. Furthermore, Bayer and Dimitriadis (2017b) use this regression to develop an ES backtest which is particularly relevant in light of the recent introduction of ES into the Basel regulatory framework and the present lack of accurate backtesting methods for the ES.

Acknowledgements

We thank Tobias Fissler, Lyudmila Grigoryeva, Roxana Halbleib, Phillip Heiler, Frederic Menninger, Winfried Pohlmeier, Patrick Schmidt, Johanna Ziegel and the participants of the Stochastics Colloquium on 11/30/2016 at the University of Konstanz for fruitful discussions and suggestions which inspired some of the results of this paper. Financial support by the Heidelberg Academy of Sciences and Humanities (HAW) within the project “Analyzing, Measuring and Forecasting Financial Risks by means of High-Frequency Data”, by the German Research Foundation (DFG) within the research group “Robust Risk Measures in Real Time Settings” and general support by the Graduate School of Decision Sciences (University of Konstanz) is gratefully acknowledged. The computation in this work was performed on the computational resource bwUniCluster funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC.

Appendix A Finite Moment Conditions

For convenience of the supremum notation, for all $\theta \in \operatorname{int}(\Theta)$ and for $d > 0$, we define the open neighborhood $U_d(\theta) = \{\tau \in \Theta : \|\tau - \theta\| < d\}$ and its closure $\bar{U}_d(\theta) = \{\tau \in \Theta : \|\tau - \theta\| \le d\}$.
d d (M-1) For Theorem 2.4, we assume that the following moments are finite for some d > 0: ¹1º ¹2º 2 0 q 3 0 e • E»jjXjj sup jG ¹X  ºj¼ • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º 2U ¹ º d 0 1 d 0 2 0 0 ¹2º 2 0 q • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º 1 ¹1º d 0 2 0 e • E»jjXjj sup jG ¹X  ºj E»jYjjX¼¼ 2U ¹ º d 0 2 2 0 e • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º d 0 ¹1º ¹2º 3 0 e 2 0 e • E»jjXjj sup jG ¹X  ºj¼ • E»jjXjj sup jG ¹X  ºj E»jYjjX¼¼ 2U ¹ º 2U ¹ º d 0 2 d 0 2 0 0 (M-2) For Theorem 2.5, we assume that the following moments are finite: 15 2 0 e • E»jjXjj ¼ • E»jjXjj sup jG ¹X  ºj¼ 0 q • E»sup jG ¹X  ºj¼ 2 0 e • E»sup jG ¹X  ºj E»jYjjX¼¼ • E»jG ¹Yºj¼ 0 e • E»ja¹Yºj¼ • E»sup jG ¹X  ºj¼ (M-3) For Theorem 2.6, we assume that the following moments are finite for some constant d > 0 and for all  2 U ¹ º: d 0 ¹1º ¹2º 3 0 q 0 q • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 1 d 0 1 0 0 ¹1º ¹1º 3 0 q 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 1 d 0 2 0 0 ¹2º 3 0 e 0 q • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 d 0 1 0 0 ¹1º 3 0 e 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ 2 ¯ 2U ¹ º  ˜2U ¹ º d 0 d 0 0 0 ¹1º 3 0 q 2 • E»jjXjj sup ¹G ¹X  ºº ¼ 2U ¹ º d 0 1 3 0 e 2 • E»jjXjj sup ¹G ¹X  ºº ¼ 2U ¹ º d 0 ¹1º 3 0 q 0 e • E»jjXjj sup G ¹X  ºG ¹X  º¼ 2U ¹ º d 0 1 ¹1º ¹2º 5 0 e 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 2 d 0 2 0 0 ¹1º 5 0 e 2 • E»jjXjj ¹sup G ¹X  ºº ¼ 2U ¹ º d 0 2 ¹1º ¹2º 4 0 e 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ººE»jYjjX¼¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º 2 2 d 0 d 0 0 0 ¹1º ¹1º 3 0 e 0 e • E»jjXjj G ¹X  º¹sup G ¹X  ººE»jYjjX¼¼ 2U ¹ º 2 d 0 2 ¹1º ¹2º 3 0 e 0 e 2 • E»jjXjj G ¹X  º¹sup G ¹X  ººE»Y jX¼¼ 2U ¹ º 2 d 0 2 ¹1º ¹2º 3 0 e 0 e 2 • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ººE»Y jX¼¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 2 d 0 2 0 0 (M-4) For Theorem 2.7, we assume that the following moments are finite for some constant d > 0: ¹1º 2 0 e • E»jG ¹Yºj¼ • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º d 0 2 2 0 e 2 • E»ja¹Yºj¼ • E»jjXjj sup ¹G 
¹X  ºº ¼ 2U ¹ º d 0 ¹1º 0 q ¹1º 4 0 e 2 • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º • E»jjXjj sup ¹G ¹X  ºº ¼ d 0 1 0 2U ¹ º 0 2 ¹1º ¹1º 2 0 q 2 0 e • E»jjXjj sup ¹G ¹X  ºº ¼ ¯ • E»jjXjj sup jG ¹X  ºj E»jYjjX¼¼ 2U ¹ º ¯ d 0 1 2U ¹ º d 0 2 ¹1º ¹1º 3 0 e 2 2 0 q 0 e • E»jjXjj sup ¹G ¹X  ºº E»jYjjX¼¼ • E»jjXjj sup jG ¹X  ºG ¹X  ºj¼ ¯ ¯ 2 2U ¹ º 2U ¹ º d 0 2 d 0 1 ¹1º 0 e 2 0 e 2 2 • E»jjXjj sup jG ¹X  ºj¼ • E»jjXjj sup ¹G ¹X  ºº E»Y jX¼¼ ¯ 2 ¯ 2U ¹ º 2U ¹ º d 0 d 0 2 0 0 Appendix B Proofs Henceforth, jjvjj denotes the maximum norm for a vector v 2 R and for a matrix A, jjAjj denotes the row-sum matrix norm which is induced by the maximum norm for vectors. For convenience of the supremum notation, for all  2 int¹º and for some d > 0, we define the open neighborhood U ¹º = f 2  : jj jj < dg and its closure U ¹º = f 2  : jj jj  dg. All references to d d Appendix C refer to the online supplement Dimitriadis and Bayer (2017). Proof of Theorem 2.4. We apply Theorem 2 from Huber (1967) and show that the function ¹Y; X; º as given in (2.3) satisfies the respective assumptions of this theorem. Note that the parameter space is assumed to be compact and thus, we do not have to show condition (B-4) in the notation of Huber 16 (1967). As the product of continuous functions and the indicator function 1 0 q , the function is fYX  g measurable and regarded as a stochastic process in , is separable in the sense of Doob as it is almost surely continuous in  (Gikhman and Skorokhod, 2004, p.164). This condition assures measurability of the suprema10 given below and in Lemma C.1. In oder to show that has a unique root at  , let us first define the sets q q 0 q 0 0 q 0 U = ! 2 X¹!º  , X¹!º  ; and W = ! 2 X¹!º  = X¹!º  ; (B.1) 0 0 for all  2  such that = W [ U and W \ U = ;. We first show that P¹U º > 0 for all  ,  . In order to see this, we assume the converse, i.e. 
let us assume that for a fixed  ,  , it holds that 0 q 0 P¹W º = P X  = X  = 1, which implies that q q q q 0 0 q 0 q 0 ¹  º E»X X ¼¹  º = E X  X  = 0: (B.2) 0 0 0 q 0 However, since  ,  , this contradicts the assumption that the matrix E»X X ¼ is positive definite and we can conclude that P¹U º > 0. The quantity h i ¹1º q 0 q 0 e 0 q 0 ¹º = E ¹Y; X; º = 1 E X G ¹X  º + G ¹X  º F ¹X  º F ¹X  º 1 1 2 YjX YjX 1 0 exists under the moment conditions (M-1) in Appendix A and if  =  , it holds that  ¹º = 0. Now, we assume that  2  such that  ,  . By splitting the expectation, we get that 0 q ¹º ¹  º h i ¹1º q q 0 q 0 e 0 q 0 0 q 0 = 1 E G ¹X  º + G ¹X  º X  X  F ¹X  º F ¹X  º 1 2 YjX YjX f!2W g 1 0 0 h i ¹1º q q 0 q 0 e 0 q 0 0 q 0 + 1 E G ¹X  º + G ¹X  º X  X  F ¹X  º F ¹X  º 1 : 2 YjX YjX f!2U g 1 0 0 0 q 0 The first summand is obviously zero since for all ! 2 W , F ¹X  º F ¹X  º = 0. Since the YjX YjX distribution of Y given X has strictly positive density in a neighbourhood of X  , we get that F is YjX strictly increasing in a neighbourhood of X  and thus q q 0 q 0 0 q 0 X  X  F ¹X  º F ¹X  º > 0 (B.3) YjX YjX 0 0 ¹1º 0 q 0 e for all ! 2 U . Furthermore, since G ¹X  º + G ¹X  º > 0 for all  2  and P¹U º > 0, we get that 0 q ¹º ¹  º h i ¹1º q q 0 q 0 e 0 q 0 0 q 0 = 1 E G ¹X  º + G ¹X  º X  X  F ¹X  º F ¹X  º 1 > 0; 2 YjX YjX f!2U g 1 0 0 and consequently  ¹º , 0. This implies that  ¹º = 0 if and only if  =  . Furthermore, 1 1 h i ¹1º 0 e 0 q 0 q 0 e ¹º = E XG ¹X  º X  F ¹X  º  + X  1 E Y1 0 q X : (B.4) 2 YjX fYX  g q q q 0 q 0 Assuming that  =  , which results from  ¹º = 0, we get that F ¹X  º = F ¹X  º = and 1 YjX YjX 0 0 ¹1º 0 e 0 0 e e e 1 E Y1 X = X  . Thus, (B.4) simplifies to E ¹X X ºG ¹X  º   and by applying fYX  g 0 2 0 ¹1º 0 0 e Lemma C.2, we get that the matrix E ¹X X ºG ¹X  º is positive definite for all  2 . Consequently, e e ¹º = 0 if and only if  =  and together with the arguments for  , we get that ¹º = 0 if and only if 2 1 =  . 
Eventually, assumption (B-2)’ from Theorem 2 of Huber (1967) follows directly from Lemma C.1, which concludes this proof. 10 Many other authors such as e.g. Newey and McFadden (1994); Andrews (1994); van der Vaart (1998) rely on outer probability in order to avoid these measurability issues. 17 Proof of Theorem 2.5. For this proof, we apply Theorem 5.7 from van der Vaart (1998) and show that the respective assumptions of this theorem hold. As in the proof of Theorem 2.6, we can conclude measurability of the suprema since the process  is continuous and consequently separable in the sense of Doob. Thus, we do not have to rely on outer probability measures such as in van der Vaart (1998). We start by showing uniform convergence in probability of the empirical mean of the objective function by the help of Lemma 2.4 of Newey and McFadden (1994). Since we have iid data, a compact parameter space and ¹Y; X; º is continuous for all  2 , it remains to show that there exists a dominating function d¹Y; Xº  j¹Y; X; º for all  2  with E d¹Y; Xº < 1. We define 0 q 0 e 0 q d¹Y; Xº = supjG ¹X  º + 1 G ¹X  º¹X  Yºj + G ¹Yº 1 2 1 (B.5) 0 e 0 e 0 q 0 e + sup G ¹X  º X  X  + supjG ¹X  ºj + G ¹Yº + a¹Yº 2 2 1 2 2 and it holds that d¹Y; Xº  ¹Y; X; º for all  2  and consequently, we can conclude uniform convergence in probability. We now show that E ¹Y; X; º has a unique and global minimum at  =  . For this, we assume that 2  such that  ,  and we define the sets 0 q 0 0 e 0 e U = ! 2 X¹!º  , X¹!º  or X¹!º  , X¹!º  and (B.6) 0 q 0 0 e 0 e W = ! 2 X¹!º  = X¹!º  and X¹!º  = X¹!º  ; (B.7) such that = U [ W and U \ W = ;. We first show that P¹U º > 0 for all  ,  . In order to see this, we assume the converse, i.e. we assume that P¹W º = 1, which implies that h i q q q 2 q q 0 0 q 0 q 0 0 q ¹  º E»X X ¼¹  º = E X  X  = 0, since P X  = X = 1 and equivalently 0 0 0 0 e e 0 0 e e q e e ¹  º E»X X ¼¹  º = 0. 
However, since  ,  and consequently either  ,  or  ,  , this 0 0 0 0 contradicts the assumption that the matrix E»X X ¼ is positive definite and it follows that P¹U º > 0. From the joint elicitability property of the quantile and ES of Fissler and Ziegel (2016), Corollary 5.5 k 0 q 0 0 e 0 e we get that for all x 2 R such that x  , x  or x  , x  , it holds that 0 0 E ¹Y; X;  º X = x < E ¹Y; X; º X = x ; (B.8) since the distribution of Y given X has a finite first moment and a unique -quantile. Thus, for all ! 2 U , E ¹Y; X;  º X ¹!º < E ¹Y; X; º X ¹!º: (B.9) We now define the random variable h¹X; ;  º¹!º = E ¹Y; X;  º X ¹!º E ¹Y; X; º X ¹!º; (B.10) 0 0 and (B.9) implies that h X; ;  ¹!º < 0 for all ! 2 U . Since P¹U º > 0, this implies that E h¹X; ;  º1 < 0. Furthermore, for all ! 2 W , it obviously holds that h¹X; ;  º¹!º = 0 and 0 f!2U g  0 consequently E h¹X; ;  º1 = 0. Thus, we get that 0 f!2W g E h¹X; ;  º = E h¹X; ;  º1 + E h¹X; ;  º1 < 0 (B.11) 0 0 f!2U g 0 f!2W g for all  2  such that  ,  , which shows that E ¹Y; X; º has a unique minimum at  =  . 0 0 Proof of Theorem 2.6. We apply Theorem 3 of Huber (1967) for the -function as given in (2.3) and show the respective assumptions of this theorem. Consistency of the Z-estimator is shown in Theorem 2.4. For the measureability and separability of the function, we refer to the proof of Theorem 2.4. It is already shown in the proof of Theorem 2.4 that there exists a  2  such that ¹ º = 0. For the 0 0 technical conditions (N-3), we apply Lemma C.3, Lemma C.1 and Lemma C.4. It remains to show that E jj ¹Y; X;  ºjj < 1, which follows from the subsequent computation of C and the Moment 18 1 1 Conditions (M-3) in Appendix A. The asymptotic covariance matrix is given by  C , where C = E ¹Y; X;  º ¹Y; X;  º and 0 0 @ ¹º @ ¹º 1 1 © q e ª @¹º @ @ 11 12 ­ 0 0® = = = : (B.12) @ ¹º @ ¹º ® 2 2 21 22 = q e @ @ 0 0 « ¬ Straightforward calculations yield the matrix C as given in (2.8) - (2.10). 
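For the computation of $\Lambda$, we first notice that the function
$$E\big[\psi(Y,X,\theta) \mid X\big] = \begin{pmatrix} X\big(F_{Y|X}(X'\theta^q) - \alpha\big)\Big(G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} G_2(X'\theta^e)\Big) \\[1ex] X\,G_2^{(1)}(X'\theta^e)\Big(X'\theta^e - X'\theta^q + \tfrac{1}{\alpha} E\big[(X'\theta^q - Y)\,\mathbb{1}_{\{Y \le X'\theta^q\}} \mid X\big]\Big) \end{pmatrix} \tag{B.13}$$
is continuously differentiable for all $\theta$ in some neighborhood $U_d(\theta_0)$ around $\theta_0$, since the distribution $F_{Y|X}$ has a density which is strictly positive, continuous and bounded in this area. Let us choose a value $\tilde\theta \in U_d(\theta_0)$ such that $X'\tilde\theta^q \le X'\theta^q$. Then,
$$\begin{aligned} \frac{\partial}{\partial \theta^q} E\big[Y\,\mathbb{1}_{\{Y \le X'\theta^q\}} \mid X\big] &= \frac{\partial}{\partial \theta^q} E\big[Y\,\mathbb{1}_{\{Y \le X'\tilde\theta^q\}} \mid X\big] + \frac{\partial}{\partial \theta^q} E\big[Y\,\mathbb{1}_{\{X'\tilde\theta^q < Y \le X'\theta^q\}} \mid X\big] \\ &= \frac{\partial}{\partial \theta^q} \int_{X'\tilde\theta^q}^{X'\theta^q} y\, f_{Y|X}(y)\,dy = X\,(X'\theta^q)\, f_{Y|X}(X'\theta^q). \end{aligned} \tag{B.14}$$
We consequently get that for all $\theta \in U_d(\theta_0)$,
$$\frac{\partial}{\partial \theta^q} E\big[\psi_1(Y,X,\theta) \mid X\big] = (XX')\Big[\Big(G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} G_2(X'\theta^e)\Big) f_{Y|X}(X'\theta^q) + G_1^{(2)}(X'\theta^q)\big(F_{Y|X}(X'\theta^q) - \alpha\big)\Big],$$
$$\frac{\partial}{\partial \theta^e} E\big[\psi_1(Y,X,\theta) \mid X\big] = \frac{\partial}{\partial \theta^q} E\big[\psi_2(Y,X,\theta) \mid X\big] = \tfrac{1}{\alpha}(XX')\,G_2^{(1)}(X'\theta^e)\big(F_{Y|X}(X'\theta^q) - \alpha\big),$$
$$\frac{\partial}{\partial \theta^e} E\big[\psi_2(Y,X,\theta) \mid X\big] = (XX')\,G_2^{(2)}(X'\theta^e)\Big(\tfrac{1}{\alpha} X'\theta^q\big(F_{Y|X}(X'\theta^q) - \alpha\big) + X'\theta^e - \tfrac{1}{\alpha} E\big[Y\,\mathbb{1}_{\{Y \le X'\theta^q\}} \mid X\big]\Big) + (XX')\,G_2^{(1)}(X'\theta^e).$$
In order to conclude that $\frac{\partial}{\partial \theta} E\big[E[\psi(Y,X,\theta) \mid X]\big] = E\big[\frac{\partial}{\partial \theta} E[\psi(Y,X,\theta) \mid X]\big]$, we apply a measure-theoretical version of the Leibniz integration rule, which requires that the derivative of the integrand exists and is absolutely bounded by some integrable function $d(Y,X)$, independent of $\theta$. For the first term, this can easily be obtained by defining
$$d(Y,X) = \sup_{\theta \in U_d(\theta_0)} \Big\| (XX')\Big[\Big(G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} G_2(X'\theta^e)\Big) f_{Y|X}(X'\theta^q) + G_1^{(2)}(X'\theta^q)\big(F_{Y|X}(X'\theta^q) - \alpha\big)\Big] \Big\|,$$
which has finite expectation by the Moment Conditions (M-3). The other two terms follow the same reasoning. Inserting $\theta = \theta_0$ eventually shows (2.6) and (2.7).

Proof of Theorem 2.7. For this proof, we apply Theorem 5.23 from van der Vaart (1998) and show that the respective assumptions of this theorem hold. Theorem 2.5 shows consistency of the M-estimator. The map $(Y,X) \mapsto \rho(Y,X,\theta)$ is obviously measurable as the sum of measurable functions. Furthermore, the map $\theta \mapsto \rho(Y,X,\theta)$ is almost surely differentiable since the only point of non-differentiability occurs where $Y = X'\theta^q$, which is a nullset with respect to the joint distribution of $Y$ and $X$, and for all $\theta \in \Theta$ such that $Y \neq X'\theta^q$, its derivative is given by $\psi(Y,X,\theta)$. Local Lipschitz continuity with square-integrable Lipschitz-constant follows from Lemma C.5. We have already seen in the proof of Theorem 2.5 that the function $E[\rho(Y,X,\theta)]$ is uniquely minimized at the point $\theta_0$ and is twice continuously differentiable and consequently admits a second-order Taylor expansion at $\theta_0$. Thus, we have shown the necessary assumptions of Theorem 5.23 from van der Vaart (1998).

For the computation of the covariance matrix, we notice that the distribution of $Y$ given $X$ has a density $f_{Y|X}$ in a neighborhood of $X'\theta_0^q$, which is strictly positive, continuous and bounded. Therefore, by the same arguments as in (B.14), we get that $\frac{\partial}{\partial \theta^q} E\big[G_1(Y)\,\mathbb{1}_{\{Y \le X'\theta^q\}} \mid X\big] = X\,G_1(X'\theta^q)\, f_{Y|X}(X'\theta^q)$. Thus, straightforward calculations yield that for all $\theta \in U_d(\theta_0)$, it holds that $\frac{\partial}{\partial \theta} E[\rho(Y,X,\theta) \mid X] = E[\psi(Y,X,\theta) \mid X]$, and by applying the Leibniz integration rule as in the proof of Theorem 2.6, we finally get that
$$\frac{\partial}{\partial \theta} E\big[\rho(Y,X,\theta)\big] = E\big[\psi(Y,X,\theta)\big]. \tag{B.15}$$
Consequently, the asymptotic covariance matrix equals the one given in Theorem 2.6.

Appendix C Technical Results

Lemma C.1. Let
$$u(Y,X,\theta,d) = \sup_{\tau \in \bar U_d(\theta)} \big\|\psi(Y,X,\tau) - \psi(Y,X,\theta)\big\| \tag{C.1}$$
and assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-1) in Appendix A hold. Then, there are strictly positive real numbers $b$ and $d_0$, such that
$$E\big[u(Y,X,\theta,d)\big] \le b \cdot d \quad \text{for } \|\theta - \theta_0\| + d \le d_0, \tag{C.2}$$
and for all $d \ge 0$.

Proof of Lemma C.1. For measurability of the suprema, we refer to the proof of Theorem 2.4. Let in the following $d > 0$ and $\theta \in \Theta$ such that $\|\theta - \theta_0\| + d \le d_0$. We first notice that for some fixed $X \in \mathbb{R}^k$ and for all $\tau \in \bar U_d(\theta)$, it holds that
$$\big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \le \mathbb{1}_{\{X'\tau_-^q \le Y \le X'\tau_+^q\}} \tag{C.3}$$
for all $Y \in \mathbb{R}$ and for some $\tau_-, \tau_+ \in \bar U_d(\theta)$.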
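Since $\bar U_d(\theta)$ is compact, we get that
$$\sup_{\tau \in \bar U_d(\theta)} \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \le \mathbb{1}_{\{X'\tau_-^q \le Y \le X'\tau_+^q\}} \tag{C.4}$$
for all $Y \in \mathbb{R}$ and for some values $\tau_-, \tau_+ \in \bar U_d(\theta)$. Note that the values $\tau_-^q$ and $\tau_+^q$ depend on $X$ and $\theta$, however they are independent of $Y$. Consequently, it holds that
$$\begin{aligned} E\Big[\sup_{\tau \in \bar U_d(\theta)} \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \,\Big|\, X\Big] &\le E\big[\mathbb{1}_{\{X'\tau_-^q \le Y \le X'\tau_+^q\}} \mid X\big] \\ &= F_{Y|X}(X'\tau_+^q) - F_{Y|X}(X'\tau_-^q) = f_{Y|X}(X'\tilde\tau^q)\,\big(X'\tau_+^q - X'\tau_-^q\big) \\ &\le 2\|X\| \cdot \sup_{\tau \in \bar U_d(\theta)} f_{Y|X}(X'\tau^q) \cdot d, \end{aligned} \tag{C.5}$$
where we apply the mean value theorem for some $\tilde\tau^q$ on the line between $\tau_-^q$ and $\tau_+^q$, i.e. $\tilde\tau \in \bar U_d(\theta)$.

For the first component of $\psi$, we get that
$$\begin{aligned} E\Big[\sup_{\tau \in \bar U_d(\theta)} \big\|\psi_1(Y,X,\tau) - \psi_1(Y,X,\theta)\big\|\Big] \le{}& E\bigg[\sup_{\tau \in \bar U_d(\theta)} \|X\| \cdot \Big| G_1^{(1)}(X'\tau^q) - G_1^{(1)}(X'\theta^q) + \frac{G_2(X'\tau^e) - G_2(X'\theta^e)}{\alpha} \Big| \bigg] \\ &+ E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha}\Big) \Big\| \cdot E\Big[\sup_{\tau \in \bar U_d(\theta)} \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \,\Big|\, X\Big]\bigg]. \end{aligned} \tag{C.6}$$
The first term in (C.6) is $O(d)$ since $G_1^{(1)}(X'\theta^q)$ and $G_2(X'\theta^e)$ are continuously differentiable functions w.r.t. $\theta$ and thus, by the mean value theorem we get that
$$\begin{aligned} \sup_{\tau \in \bar U_d(\theta)} \big|G_1^{(1)}(X'\tau^q) - G_1^{(1)}(X'\theta^q)\big| &\le \sup_{\tilde\tau \in \bar U_d(\theta)} \big\|X\,G_1^{(2)}(X'\tilde\tau^q)\big\| \cdot \sup_{\tau \in \bar U_d(\theta)} \|\tau^q - \theta^q\| \\ &\le \sup_{\tilde\tau \in \bar U_d(\theta)} \big\|X\,G_1^{(2)}(X'\tilde\tau^q)\big\| \cdot d, \end{aligned} \tag{C.7}$$
and the respective moments are finite by assumption. The same arguments hold for the function $G_2$. For the second term in (C.6), we apply (C.5) and thus get that
$$\begin{aligned} & E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha}\Big) \Big\| \cdot E\Big[\sup_{\tau \in \bar U_d(\theta)} \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \,\Big|\, X\Big]\bigg] \\ &\quad \le E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha}\Big) \Big\| \cdot 2\|X\| \cdot \sup_{\tau \in \bar U_d(\theta)} f_{Y|X}(X'\tau^q)\bigg] \cdot d. \end{aligned} \tag{C.8}$$
Since the density $f_{Y|X}$ is bounded in a neighborhood of $X'\theta_0^q$ and the respective moments are finite by assumption, we get that this term is also $O(d)$.

For the second component of $\psi$, we get that
$$\begin{aligned} E\Big[\sup_{\tau \in \bar U_d(\theta)} \big\|\psi_2(Y,X,\tau) - \psi_2(Y,X,\theta)\big\|\Big] \le{}& E\Big[\sup_{\tau \in \bar U_d(\theta)} \big\| X(X'\tau^e - X'\tau^q)\,G_2^{(1)}(X'\tau^e) - X(X'\theta^e - X'\theta^q)\,G_2^{(1)}(X'\theta^e) \big\|\Big] \\ &+ E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \cdot E\Big[\sup_{\tau \in \bar U_d(\theta)} \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \,\Big|\, X\Big]\bigg] \\ &+ E\bigg[E\Big[\sup_{\tau \in \bar U_d(\theta)} \Big\|\frac{X\,G_2^{(1)}(X'\tau^e)\,X'\tau^q - X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \,\mathbb{1}_{\{Y \le X'\tau^q\}} \,\Big|\, X\Big]\bigg] \\ &+ E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\| \cdot E\Big[\sup_{\tau \in \bar U_d(\theta)} |Y| \cdot \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \,\Big|\, X\Big]\bigg] \\ &+ E\bigg[E\Big[\sup_{\tau \in \bar U_d(\theta)} \Big|\frac{Y\,\mathbb{1}_{\{Y \le X'\tau^q\}}}{\alpha}\Big| \cdot \big\|X\,G_2^{(1)}(X'\tau^e) - X\,G_2^{(1)}(X'\theta^e)\big\| \,\Big|\, X\Big]\bigg] \\ ={}& (i) + (ii) + (iii) + (iv) + (v). \end{aligned}$$
The first, third and fifth term are linearly bounded by (C.7) since the functions $(X'\theta^e - X'\theta^q)\,G_2^{(1)}(X'\theta^e)$, $(X'\theta^q)\,G_2^{(1)}(X'\theta^e)$ and $G_2^{(1)}(X'\theta^e)$ are continuously differentiable. For the second term, we use the arguments from (C.5). For the fourth term, we use similar arguments as in (C.5), and get that there exist some $\tau_-, \tau_+ \in \bar U_d(\theta)$ and a value $\tilde\tau^q$ on the line between $\tau_-^q$ and $\tau_+^q$, such that
$$\begin{aligned} & E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\| \cdot E\Big[\sup_{\tau \in \bar U_d(\theta)} |Y| \cdot \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \,\Big|\, X\Big]\bigg] \\ &\quad \le E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\| \cdot E\big[|Y|\,\mathbb{1}_{\{X'\tau_-^q \le Y \le X'\tau_+^q\}} \mid X\big]\bigg] \\ &\quad = E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\| \int_{X'\tau_-^q}^{X'\tau_+^q} |y|\, f_{Y|X}(y)\,dy\bigg] \\ &\quad = E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\| \cdot |X'\tilde\tau^q|\, f_{Y|X}(X'\tilde\tau^q)\,\big(X'\tau_+^q - X'\tau_-^q\big)\bigg] \\ &\quad \le \frac{2}{\alpha}\, E\bigg[\big|G_2^{(1)}(X'\theta^e)\big| \cdot \|X\|^2 \sup_{\tau \in \bar U_d(\theta)} |X'\tau^q|\, f_{Y|X}(X'\tau^q)\bigg] \cdot d = O(d), \end{aligned} \tag{C.9}$$
since $f_{Y|X}$ is bounded in a neighborhood of $X'\theta_0^q$ and the respective moments exist by assumption. This concludes the proof of the lemma.

Lemma C.2. Let the random variable $X \in \mathbb{R}^k$ with distribution $P$ be such that its second moments exist and the matrix $E[XX']$ is positive definite. Furthermore, let $\Theta \subset \mathbb{R}^{2k}$ be a compact subspace with nonempty interior and let $g: \mathbb{R}^k \times \Theta \to \mathbb{R}$ be a strictly positive function. Then, the matrix
$$E\big[(XX')\,g(X,\theta)\big] \tag{C.10}$$
is also positive definite.

Proof of Lemma C.2. Since $E[XX']$ is positive definite, we know that for all $z \in \mathbb{R}^k$ with $z \neq 0$, it holds that $0 < z'E[XX']z = E[z'(XX')z] = E[(X'z)^2]$ and consequently $P(X'z \neq 0) > 0$. Since $g(X,\theta)$ is a strictly positive scalar for all $\theta \in \Theta$, it also holds that $P\big((X'z)\sqrt{g(X,\theta)} \neq 0\big) > 0$ and thus, for all $z \neq 0$,
$$z'E\big[(XX')\,g(X,\theta)\big]z = E\Big[\big(X'z\,\sqrt{g(X,\theta)}\big)^2\Big] > 0. \tag{C.11}$$
This positivity statement holds since $\big(X'z\,\sqrt{g(X,\theta)}\big)^2$ is a non-negative random variable and $P\big((X'z)\sqrt{g(X,\theta)} \neq 0\big) > 0$.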
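This shows that the matrix $E\big[(XX')\,g(X,\theta)\big]$ is positive definite.

Lemma C.3. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-3) in Appendix A hold. Then, for
$$\lambda(\theta) = E\big[\psi(Y,X,\theta)\big], \tag{C.12}$$
there are strictly positive numbers $a, d_0$, such that
$$\|\lambda(\theta)\| \ge a \cdot \|\theta - \theta_0\| \quad \text{for } \|\theta - \theta_0\| \le d_0. \tag{C.13}$$

Proof of Lemma C.3. Let $d_0 > 0$ and let $\|\theta - \theta_0\| \le d_0$. Then, applying the mean value theorem, we get that
$$\lambda_1(\theta) = E\Big[(XX')\Big(G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} G_2(X'\theta^e)\Big) f_{Y|X}(X'\tilde\theta^q)\Big]\,(\theta^q - \theta_0^q) \tag{C.14}$$
for some $\tilde\theta^q$ on the line between $\theta^q$ and $\theta_0^q$. Similarly, for the second component we get that
$$\lambda_2(\theta) = E\bigg[\frac{G_2^{(1)}(X'\theta^e)\, f_{Y|X}(X'\tilde\theta^q)}{2\alpha}\, XX'(\theta^q - \theta_0^q)\, X'(\theta^q - \theta_0^q)\bigg] + E\big[(XX')\,G_2^{(1)}(X'\theta^e)\big]\,(\theta^e - \theta_0^e), \tag{C.15}$$
where $\tilde\theta^q$ lies on the line between $\theta^q$ and $\theta_0^q$.

We first assume that $\|\theta - \theta_0\| = \|\theta^q - \theta_0^q\|$, i.e. $\|\theta^q - \theta_0^q\| \ge \|\theta^e - \theta_0^e\|$. Since the matrix
$$A(\theta) := E\Big[(XX')\Big(G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} G_2(X'\theta^e)\Big) f_{Y|X}(X'\tilde\theta^q)\Big] \tag{C.16}$$
exists and has full rank for all $\theta \in \Theta$ by Lemma C.2 and is obviously symmetric, $A$ has strictly positive real Eigenvalues $\gamma_1(\theta), \dots, \gamma_k(\theta)$ with minimum $\gamma_{(1)}(\theta)$ and we thus get that (see footnote 11)
$$\|\lambda(\theta)\| \ge \|\lambda_1(\theta)\| = \|A(\theta)(\theta^q - \theta_0^q)\| \ge \gamma_{(1)}(\theta) \cdot \|\theta^q - \theta_0^q\| \tag{C.17}$$
$$\ge \inf_{\|\theta - \theta_0\| \le d_0} \gamma_{(1)}(\theta) \cdot \|\theta - \theta_0\| = c_1 \|\theta - \theta_0\|. \tag{C.18}$$
Since $\{\theta : \|\theta - \theta_0\| \le d_0\}$ is a compact set and the function $\theta \mapsto \gamma_{(1)}(\theta)$, where $\gamma_{(1)}(\theta)$ is the smallest Eigenvalue of the matrix $A(\theta)$, is continuous (see footnote 12), we get that the infimum coincides with the minimum and thus, the constant $c_1 := \inf_{\|\theta - \theta_0\| \le d_0} \gamma_{(1)}(\theta)$ is strictly positive and does not depend on $\theta$.

Now, we assume that $\|\theta - \theta_0\| = \|\theta^e - \theta_0^e\| \le d_0$, i.e. $\|\theta^e - \theta_0^e\| \ge \|\theta^q - \theta_0^q\|$. For the first term of $\lambda_2(\theta)$, given in (C.15), we define the vector
$$b(\theta) := E\bigg[\frac{G_2^{(1)}(X'\theta^e)\, f_{Y|X}(X'\tilde\theta^q)}{2\alpha}\, XX'(\theta^q - \theta_0^q)\,\big(X'\theta^q - X'\theta_0^q\big)\bigg], \tag{C.19}$$
and for its $l$-th component, we get that
$$\begin{aligned} |b_l(\theta)| &= \bigg| \sum_{i,j} (\theta_i^q - \theta_{0i}^q)(\theta_j^q - \theta_{0j}^q)\, E\bigg[\frac{G_2^{(1)}(X'\theta^e)\, f_{Y|X}(X'\tilde\theta^q)}{2\alpha}\, X_l X_i X_j\bigg] \bigg| \\ &\le \sum_{i,j} \bigg| E\bigg[\frac{G_2^{(1)}(X'\theta^e)\, f_{Y|X}(X'\tilde\theta^q)}{2\alpha}\, X_l X_i X_j\bigg] \bigg| \cdot |\theta_i^q - \theta_{0i}^q| \cdot |\theta_j^q - \theta_{0j}^q| \\ &\le c_2 \sum_{i,j} |\theta_i^q - \theta_{0i}^q| \cdot |\theta_j^q - \theta_{0j}^q| \\ &\le c_2\, k^2\, \|\theta - \theta_0\|^2, \end{aligned} \tag{C.20}$$
for all $l = 1, \dots, k$, which implies that
$$\|b(\theta)\| \le c_3 \|\theta - \theta_0\|^2, \tag{C.21}$$
for some $c_3 > 0$. For $D(\theta) := E\big[(XX')\,G_2^{(1)}(X'\theta^e)\big]$, it holds that $\|D(\theta)(\theta^e - \theta_0^e)\| \ge c_4 \|\theta^e - \theta_0^e\| = c_4 \|\theta - \theta_0\|$ for $c_4 > 0$ by the same arguments as in (C.17).

Footnote 11: For a symmetric matrix $A$ with full rank, we can find an orthogonal basis of Eigenvectors $\{v_1, \dots, v_k\}$ with corresponding nonzero Eigenvalues $\{\gamma_1, \dots, \gamma_k\}$ such that $x = \sum_j b_j v_j$ with $b_j \in \mathbb{R}$. Then, $\|Ax\| = \|A \sum_j b_j v_j\| = \|\sum_j b_j A v_j\| = \|\sum_j b_j \gamma_j v_j\| \ge \min_j |\gamma_j| \cdot \|\sum_j b_j v_j\| = \min_j |\gamma_j| \cdot \|x\|$.

Footnote 12: This follows since the entries of the matrix $A(\theta)$ are continuous in $\theta$, as the expectation of a continuous function which is dominated by an integrable function is again continuous by the dominated convergence theorem. Furthermore, the Eigenvalues of a matrix are the solutions of the characteristic polynomial, which has continuous coefficients since our matrix entries are continuous in $\theta$. Eventually, since the roots of any polynomial with continuous coefficients are again continuous, we can conclude that the Eigenvalues of $A(\theta)$ are continuous in $\theta$.

From (C.20), we can choose $d_0$ small enough such that
$$2\|b(\theta)\| \le 2 c_3 \|\theta - \theta_0\|^2 \le c_4 \|\theta - \theta_0\| \le \|D(\theta)(\theta^e - \theta_0^e)\|. \tag{C.22}$$
Furthermore, by the submultiplicativity of the matrix norm, we also get that $\|D(\theta)(\theta^e - \theta_0^e)\| \le \|D(\theta)\| \cdot \|\theta^e - \theta_0^e\| = c\,\|\theta^e - \theta_0^e\|$ and by the inverse triangle inequality, we get that
$$\|\lambda(\theta)\| \ge \|\lambda_2(\theta)\| = \big\|D(\theta)(\theta^e - \theta_0^e) + b(\theta)\big\| \ge \Big|\, \|D(\theta)(\theta^e - \theta_0^e)\| - \|b(\theta)\| \,\Big|. \tag{C.23}$$
From (C.22), we can choose $d_0$ small enough such that $\|D(\theta^e - \theta_0^e)\| > 2\|b\|$ and thus
$$\Big|\, \|D(\theta^e - \theta_0^e)\| - \|b\| \,\Big| = \|D(\theta^e - \theta_0^e)\| - \|b\| \ge \tfrac{1}{2}\|D(\theta^e - \theta_0^e)\| \tag{C.24}$$
$$\ge \frac{c_4}{2}\|\theta^e - \theta_0^e\| = \frac{c_4}{2}\|\theta - \theta_0\|. \tag{C.25}$$

Lemma C.4. Let
$$u(Y,X,\theta,d) = \sup_{\tau \in \bar U_d(\theta)} \big\|\psi(Y,X,\tau) - \psi(Y,X,\theta)\big\| \tag{C.26}$$
and assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-3) in Appendix A hold. Then, there are strictly positive numbers $c$ and $d_0$, such that
$$E\big[u(Y,X,\theta,d)^2\big] \le c \cdot d \quad \text{for } \|\theta - \theta_0\| + d \le d_0, \tag{C.27}$$
and for all $d \ge 0$.

Proof of Lemma C.4. Let in the following $d > 0$ and $\theta \in \Theta$ such that $\|\theta - \theta_0\| + d \le d_0$. It holds that
$$\Big(\sup_{\tau \in \bar U_d(\theta)} \big\|\psi(Y,X,\tau) - \psi(Y,X,\theta)\big\|\Big)^2 = \sup_{\tau \in \bar U_d(\theta)} \big\|\psi(Y,X,\tau) - \psi(Y,X,\theta)\big\|^2 \tag{C.28}$$
and consequently, we show that
$$E\Big[\sup_{\tau \in \bar U_d(\theta)} \big\|\psi_j(Y,X,\tau) - \psi_j(Y,X,\theta)\big\|^2\Big] = O(d) \tag{C.29}$$
for both components $j = 1, 2$ and for some $d > 0$ small enough.

For the first squared component, we get that
$$\begin{aligned} & E\Big[\sup_{\tau \in \bar U_d(\theta)} \big\|\psi_1(Y,X,\tau) - \psi_1(Y,X,\theta)\big\|^2\Big] \\ &\quad \le \max\{\alpha, 1-\alpha\}^2 \cdot E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \tfrac{1}{\alpha}G_2(X'\tau^e) - G_1^{(1)}(X'\theta^q) - \tfrac{1}{\alpha}G_2(X'\theta^e)\Big) \Big\|^2\bigg] \\ &\qquad + E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \tfrac{1}{\alpha}G_2(X'\tau^e)\Big) \Big\|^2 \cdot 2\|X\| \sup_{\tau \in \bar U_d(\theta)} f_{Y|X}(X'\tau^q)\bigg] \cdot d \\ &\qquad + 2\max\{1-\alpha, \alpha\} \cdot E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \tfrac{1}{\alpha}G_2(X'\tau^e) - G_1^{(1)}(X'\theta^q) - \tfrac{1}{\alpha}G_2(X'\theta^e)\Big) \Big\| \cdot \Big\| X\Big(G_1^{(1)}(X'\tau^q) + \tfrac{1}{\alpha}G_2(X'\tau^e)\Big) \Big\|\bigg], \end{aligned}$$
where we apply (C.5) for the second summand. The remaining two summands can be bounded linearly by the arguments given in (C.7) since $G_1^{(1)}$ and $G_2$ are continuously differentiable functions and the respective moments are finite.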
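For the second component of $\psi$, we get that
$$\begin{aligned} \big\|\psi_2(Y,X,\tau) - \psi_2(Y,X,\theta)\big\| \le{}& \big\| X(X'\tau^e - X'\tau^q)\,G_2^{(1)}(X'\tau^e) - X(X'\theta^e - X'\theta^q)\,G_2^{(1)}(X'\theta^e) \big\| \\ &+ \Big\|\frac{X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \cdot \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \\ &+ \Big\|\frac{X\,G_2^{(1)}(X'\tau^e)\,X'\tau^q - X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \cdot \mathbb{1}_{\{Y \le X'\tau^q\}} \\ &+ \Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\| \cdot |Y| \cdot \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \\ &+ \Big|\frac{Y\,\mathbb{1}_{\{Y \le X'\tau^q\}}}{\alpha}\Big| \cdot \big\| X\,G_2^{(1)}(X'\tau^e) - X\,G_2^{(1)}(X'\theta^e) \big\| \\ ={}& (i) + (ii) + (iii) + (iv) + (v). \end{aligned} \tag{C.30}$$
Thus, in order to evaluate $E\big[\sup_{\tau \in \bar U_d(\theta)} \|\psi_2(Y,X,\tau) - \psi_2(Y,X,\theta)\|^2\big]$, we have to consider all the cross products out of the five summands in (C.30). Since the techniques applied are very similar, we only show details for two of the cross products.
$$\begin{aligned} E\Big[\sup_{\tau \in \bar U_d(\theta)} (ii)(v)\Big] &= E\bigg[\sup_{\tau \in \bar U_d(\theta)} \Big\|\frac{X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \cdot \big|\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big| \cdot \Big|\frac{Y\,\mathbb{1}_{\{Y \le X'\tau^q\}}}{\alpha}\Big| \cdot \big\| X\,G_2^{(1)}(X'\tau^e) - X\,G_2^{(1)}(X'\theta^e) \big\|\bigg] \\ &\le E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \cdot \frac{E\big[|Y| \mid X\big]}{\alpha} \cdot \|X\| \cdot \sup_{\tau \in \bar U_d(\theta)} \big|G_2^{(1)}(X'\tau^e) - G_2^{(1)}(X'\theta^e)\big|\bigg] \\ &\le E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)\,X'\theta^q}{\alpha}\Big\| \cdot \frac{E\big[|Y| \mid X\big]}{\alpha} \cdot \|X\| \cdot \sup_{\tau \in \bar U_d(\theta)} \big\|X\,G_2^{(2)}(X'\tau^e)\big\|\bigg] \cdot d \\ &= O(d), \end{aligned}$$
by (C.7) since $G_2^{(1)}$ is continuously differentiable.

The following cross products can be bounded analogously by bounding the indicator functions and by applying the mean value theorem as in (C.7): $(i)^2$, $(iii)^2$, $(v)^2$, $(i)(iii)$, $(i)(iv)$, $(i)(v)$, $(ii)(iv)$, $(ii)(v)$, $(iii)(iv)$, $(iii)(v)$ and $(iv)(v)$.

A second type of technique, similar to the arguments in (C.9), arises in the cases $(ii)^2$, $(iv)^2$ and $(ii)(iv)$. We get that there exist $\tau_-, \tau_+ \in \bar U_d(\theta)$ and a value $\tilde\tau^q$ on the line between $\tau_-^q$ and $\tau_+^q$, such that
$$\begin{aligned} E\Big[\sup_{\tau \in \bar U_d(\theta)} (iv)^2\Big] &\le E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\|^2 \cdot E\Big[\sup_{\tau \in \bar U_d(\theta)} Y^2 \big(\mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}}\big)^2 \,\Big|\, X\Big]\bigg] \\ &\le E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\|^2 \cdot E\big[Y^2\,\mathbb{1}_{\{X'\tau_-^q \le Y \le X'\tau_+^q\}} \mid X\big]\bigg] \\ &= E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\|^2 \int_{X'\tau_-^q}^{X'\tau_+^q} y^2 f_{Y|X}(y)\,dy\bigg] \\ &= E\bigg[\Big\|\frac{X\,G_2^{(1)}(X'\theta^e)}{\alpha}\Big\|^2 (X'\tilde\tau^q)^2 f_{Y|X}(X'\tilde\tau^q)\,\big(X'\tau_+^q - X'\tau_-^q\big)\bigg] \\ &\le \frac{2}{\alpha^2}\, E\bigg[\|X\|^3\, G_2^{(1)}(X'\theta^e)^2 \cdot \sup_{\tau \in \bar U_d(\theta)} (X'\tau^q)^2 f_{Y|X}(X'\tau^q)\bigg] \cdot d \\ &= O(d), \end{aligned}$$
where we apply a multivariate version of the mean value theorem and notice that $f_{Y|X}$ is bounded.

Lemma C.5.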
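Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-4) in Appendix A hold. Then, the function $\rho(Y,X,\theta)$, given in (2.2), is locally Lipschitz continuous in $\theta$ in the sense that for all $\theta_1, \theta_2 \in U_d(\theta_0)$ in some neighborhood of $\theta_0$, it holds that
$$\big|\rho(Y,X,\theta_1) - \rho(Y,X,\theta_2)\big| \le K(Y,X) \cdot \|\theta_1 - \theta_2\|, \tag{C.31}$$
where $E\big[K(Y,X)^2\big] < \infty$.

Proof. We start the proof by splitting the $\rho$ function into two parts,
$$\rho(Y,X,\theta) = \rho_1(Y,X,\theta) + \rho_2(Y,X,\theta), \tag{C.32}$$
where
$$\rho_1(Y,X,\theta) = \mathbb{1}_{\{Y \le X'\theta^q\}}\Big(G_1(X'\theta^q) - G_1(Y) + \tfrac{1}{\alpha} G_2(X'\theta^e)(X'\theta^q - Y)\Big), \tag{C.33}$$
$$\rho_2(Y,X,\theta) = G_2(X'\theta^e)\big(X'\theta^e - X'\theta^q\big) - \mathcal{G}_2(X'\theta^e) - \alpha\,G_1(X'\theta^q) + a(Y). \tag{C.34}$$
Local Lipschitz continuity of $\rho_2$ follows since it is a continuously differentiable function and thus locally Lipschitz. We consequently get that for some $d > 0$ and for all $\theta_1, \theta_2 \in U_d(\theta_0)$, it holds that
$$\big|\rho_2(Y,X,\theta_1) - \rho_2(Y,X,\theta_2)\big| \le \|\theta_1 - \theta_2\| \cdot \sup_{\theta \in U_d(\theta_0)} \left\| \begin{pmatrix} -X\,G_2(X'\theta^e) - \alpha X\,G_1^{(1)}(X'\theta^q) \\ X\,G_2^{(1)}(X'\theta^e)\big(X'\theta^e - X'\theta^q\big) \end{pmatrix} \right\|, \tag{C.35}$$
with Lipschitz-constant
$$K(Y,X) = \sup_{\theta \in U_d(\theta_0)} \left\| \begin{pmatrix} -X\,G_2(X'\theta^e) - \alpha X\,G_1^{(1)}(X'\theta^q) \\ X\,G_2^{(1)}(X'\theta^e)\big(X'\theta^e - X'\theta^q\big) \end{pmatrix} \right\|, \tag{C.36}$$
which is square-integrable by the Moment Conditions (M-4).

For the function $\rho_1$, we consider three cases. First, let $\theta_1, \theta_2 \in \Theta$ such that $X'\theta_1^q \le X'\theta_2^q < Y$. Then it holds that
$$\rho_1(Y,X,\theta_1) = \rho_1(Y,X,\theta_2) = 0, \tag{C.37}$$
since $\mathbb{1}_{\{Y \le X'\theta_1^q\}} = \mathbb{1}_{\{Y \le X'\theta_2^q\}} = 0$, which is obviously a Lipschitz continuous function.

Second, let $\theta_1, \theta_2 \in \Theta$ such that $Y \le X'\theta_1^q \le X'\theta_2^q$. Then, for $\theta = \theta_1, \theta_2$,
$$\rho_1(Y,X,\theta) = G_1(X'\theta^q) - G_1(Y) + \tfrac{1}{\alpha} G_2(X'\theta^e)(X'\theta^q - Y), \tag{C.38}$$
which is a continuously differentiable function and thus
$$\big|\rho_1(Y,X,\theta_1) - \rho_1(Y,X,\theta_2)\big| \le \|\theta_1 - \theta_2\| \cdot \sup_{\theta \in U_d(\theta_0)} \left\| \begin{pmatrix} X\,G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} X\,G_2(X'\theta^e) \\ \tfrac{1}{\alpha} X\,G_2^{(1)}(X'\theta^e)(X'\theta^q - Y) \end{pmatrix} \right\|. \tag{C.39}$$
Finally, let $\theta_1, \theta_2 \in \Theta$ such that $X'\theta_1^q < Y \le X'\theta_2^q$.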
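Then, since $G_1$ is increasing, we get that
$$\begin{aligned} \big|\rho_1(Y,X,\theta_1) - \rho_1(Y,X,\theta_2)\big| &= \Big| G_1(X'\theta_2^q) - G_1(Y) + \tfrac{1}{\alpha} G_2(X'\theta_2^e)(X'\theta_2^q - Y) \Big| \\ &\le \Big| G_1(X'\theta_2^q) - G_1(X'\theta_1^q) + \tfrac{1}{\alpha} G_2(X'\theta_2^e)(X'\theta_2^q - X'\theta_1^q) \Big| \\ &\le \|\theta_1^q - \theta_2^q\| \cdot \sup_{\theta \in U_d(\theta_0)} \Big\| X\,G_1^{(1)}(X'\theta^q) + \tfrac{1}{\alpha} X\,G_2(X'\theta^e) \Big\|. \end{aligned}$$
Thus, the function $\rho(Y,X,\theta)$ is locally Lipschitz continuous in $\theta$ with square-integrable Lipschitz constants, $E\big[K(Y,X)^2\big] < \infty$, by the Moment Conditions (M-4) in Appendix A.

Proposition C.6. Let $Y$ be a real-valued random variable with distribution function $F$, finite first and second moments and a unique $\alpha$-quantile $q_\alpha = F^{-1}(\alpha)$. Then,
$$\frac{1}{\alpha^2} \int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{q_\alpha} \big(F(x \wedge y) - F(x)F(y)\big)\,dx\,dy = \frac{1}{\alpha}\operatorname{Var}(Y \mid Y \le q_\alpha) + \frac{1-\alpha}{\alpha}\,(q_\alpha - \xi_\alpha)^2, \tag{C.40}$$
where $\xi_\alpha = E[Y \mid Y \le q_\alpha]$ denotes the $\alpha$-ES of $Y$.

Proof. We first notice that for a distribution $F$ with finite second moment and a unique $\alpha$-quantile, it holds that
$$E[Y \mid Y \le q_\alpha] = -\frac{1}{\alpha}\int_{-\infty}^{q_\alpha} F(x)\,dx + q_\alpha \quad \text{and} \tag{C.41}$$
$$E[Y^2 \mid Y \le q_\alpha] = -\frac{2}{\alpha}\int_{-\infty}^{q_\alpha} x F(x)\,dx + q_\alpha^2, \tag{C.42}$$
which can be obtained by using the identity
$$Y\,\mathbb{1}_{\{Y \le q_\alpha\}} = \mathbb{1}_{\{Y \le q_\alpha\}}\Big(\int_0^{\infty} \mathbb{1}_{\{Y > t\}}\,dt - \int_{-\infty}^{0} \mathbb{1}_{\{Y \le t\}}\,dt\Big) \tag{C.43}$$
and by taking expectations on both sides. By applying (C.41), we get that
$$\int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{q_\alpha} F(x)F(y)\,dx\,dy = \Big(\int_{-\infty}^{q_\alpha} F(x)\,dx\Big)^2 = \alpha^2\big(q_\alpha - E[Y \mid Y \le q_\alpha]\big)^2 = \alpha^2 (q_\alpha - \xi_\alpha)^2. \tag{C.44}$$
Furthermore, notice that
$$\int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{q_\alpha} F(x \wedge y)\,dx\,dy = \int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{y} F(x)\,dx\,dy + \int_{-\infty}^{q_\alpha}\!\int_{y}^{q_\alpha} F(y)\,dx\,dy, \tag{C.45}$$
and by rearranging the order of integration for the first term in (C.45), we get that
$$\begin{aligned} \int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{y} F(x)\,dx\,dy &= \iint_{\{(x,y):\, y \le q_\alpha,\ x \le y\}} F(x)\,dx\,dy = \iint_{\{(x,y):\, x \le q_\alpha,\ x \le y \le q_\alpha\}} F(x)\,dy\,dx \\ &= \int_{-\infty}^{q_\alpha}\!\int_{x}^{q_\alpha} F(x)\,dy\,dx = \int_{-\infty}^{q_\alpha} F(x)(q_\alpha - x)\,dx. \end{aligned} \tag{C.46}$$
Thus, by first using (C.45) and (C.46) and by plugging in (C.41) and (C.42), we obtain
$$\begin{aligned} \int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{q_\alpha} F(x \wedge y)\,dx\,dy &= 2\int_{-\infty}^{q_\alpha}\!\int_{y}^{q_\alpha} F(y)\,dx\,dy \\ &= 2\int_{-\infty}^{q_\alpha} F(y)(q_\alpha - y)\,dy \\ &= 2 q_\alpha \int_{-\infty}^{q_\alpha} F(y)\,dy - 2\int_{-\infty}^{q_\alpha} y F(y)\,dy \\ &= 2 q_\alpha\,\alpha\,(q_\alpha - \xi_\alpha) + \alpha\big(E[Y^2 \mid Y \le q_\alpha] - q_\alpha^2\big) \\ &= \alpha\big(E[Y^2 \mid Y \le q_\alpha] + q_\alpha^2 - 2 q_\alpha \xi_\alpha\big). \end{aligned} \tag{C.47}$$
Eventually, using (C.44) and (C.47), straightforward calculations yield that
$$\frac{1}{\alpha^2} \int_{-\infty}^{q_\alpha}\!\int_{-\infty}^{q_\alpha} \big(F(x \wedge y) - F(x)F(y)\big)\,dx\,dy = \frac{1}{\alpha}\operatorname{Var}(Y \mid Y \le q_\alpha) + \frac{1-\alpha}{\alpha}\,(q_\alpha - \xi_\alpha)^2, \tag{C.48}$$
which concludes the proof.

Appendix D Separability of almost surely continuous functions

Definition D.1 (Separability of a Stochastic Process).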
A stochastic process $\psi(x,\theta): \Omega \times \Theta \to \mathcal{Y}$ is called separable in the sense of Doob, if there exist in $\Theta$ an everywhere dense countable set $I$, and in $\Omega$ a nullset $N$, such that for any arbitrary open set $G \subset \Theta$ and every closed set $F \subset \mathcal{Y}$, the two sets
$$\{x \mid \psi(x,\theta) \in F,\ \forall \theta \in G\} \quad \text{and} \tag{D.1}$$
$$\{x \mid \psi(x,\theta) \in F,\ \forall \theta \in G \cap I\} \tag{D.2}$$
differ from each other at most by a subset of $N$.

Proposition D.2 (Gikhman and Skorokhod (2004)). Let $\Theta$ and $\mathcal{Y}$ be metric spaces, $\Theta$ be a separable space. The sets (D.1) and (D.2) coincide for all $x \in \Omega$ for which the stochastic process $\psi(x,\theta)$ is continuous in $\theta$.

Proof. It is clear that $\{x \mid \psi(x,\theta) \in F,\ \forall \theta \in G\} \subseteq \{x \mid \psi(x,\theta) \in F,\ \forall \theta \in G \cap I\}$. We thus only show the reverse inclusion. Let $G \subset \Theta$ be an arbitrary open set and $F \subset \mathcal{Y}$ an arbitrary closed set. Let furthermore $x \in \Omega$ be such that $\psi(x,\theta) \in F$ for all $\theta \in G \cap I$. We have to show that $\psi(x,\tilde\theta) \in F$ for all $\tilde\theta \in G$ with $\tilde\theta \notin I$. Thus, let $\tilde\theta \in G \setminus I$. Since $I$ is a dense set in $\Theta$, there exists a sequence $(\theta_n)_{n \in \mathbb{N}} \subset \Theta \cap I$, such that $\theta_n \to \tilde\theta$, and since $G$ is an open set in $\Theta$ and $\tilde\theta \in G$, we can conclude that for $m \in \mathbb{N}$ large enough, $\theta_n \in G$ for all $n \ge m$. Furthermore, by continuity at $\tilde\theta$, it holds that $\psi(x,\theta_n) \to \psi(x,\tilde\theta)$, and since $\theta_n \in G \cap I$ for all $n$ large enough, $\psi(x,\theta_n) \in F$ by assumption. Eventually, since $F$ is a closed set, $\psi(x,\tilde\theta) \in F$, which proves the proposition.

Corollary D.3 (Separability of continuous functions). Let $\Theta$ and $\mathcal{Y}$ be metric spaces, $\Theta$ be a separable space, and let the stochastic process $\psi(x,\theta)$ be almost surely continuous. Then, $\psi$ is separable.

Proof. Since $\psi(x,\theta)$ is almost surely continuous, it is continuous for all $x \in \Omega \setminus N$ for some $N$ with $P(N) = 0$. We get from Proposition D.2 that the sets (D.1) and (D.2) coincide for all $x \in \Omega \setminus N$, i.e. they differ only by a subset of $N$.

References

Acerbi, C. and Szekely, B. (2014). Backtesting Expected Shortfall. Risk.
Andersen, T. and Bollerslev, T. (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review, 39:885–905.
Andrews, D. (1994). Empirical Process Methods in Econometrics. In Engle, R. and McFadden, D., editors, Handbook of Econometrics, volume 4, chapter 37, pages 2247–2294. Elsevier.
Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent Measures of Risk. Mathematical Finance, 9(3):203–228.
Barendse, S. (2017). Interquantile Expectation Regression. Available at https://ssrn.com/abstract=2937665.
Basel Committee (2016). Minimum capital requirements for Market Risk. Technical report, Basel Committee on Banking Supervision. Available at http://www.bis.org/bcbs/publ/d352.pdf.
Bayer, S. and Dimitriadis, T. (2017a). esreg: Joint (VaR, ES) Regression. R package version 0.2.0, available at https://github.com/BayerSe/esreg.
Bayer, S. and Dimitriadis, T. (2017b). Regression-based Expected Shortfall Backtesting. Working Paper.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.
Brazauskas, V., Jones, B. L., Puri, M. L., and Zitikis, R. (2008). Estimating conditional tail expectation with actuarial applications in view. Journal of Statistical Planning and Inference, 138(11):3590–3604.
Chen, S. X. (2008). Nonparametric Estimation of Expected Shortfall. Journal of Financial Econometrics, 6(1):87–107.
Chernozhukov, V. and Umantsev, L. (2001). Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics, 26(1):271–292.
Corsi, F. (2009). A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics, 7(2):174–196.
Dimitriadis, T. and Bayer, S. (2017). Online Supplement for “A Joint Quantile and Expected Shortfall Regression Framework”.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1):1–26.
Efron, B. (1991). Regression percentiles using asymmetric squared error loss. Statistica Sinica, 1:93–125.
Ehm, W., Gneiting, T., Jordan, A., and Krüger, F. (2016). Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(3):505–562.
Engle, R. and Manganelli, S. (2004). CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles. Journal of Business and Economic Statistics, 22(4):367–381.
Fissler, T. (2017). On Higher Order Elicitability and Some Limit Theorems on the Poisson and Wiener Space. PhD thesis, Universität Bern.
Fissler, T. and Ziegel, J. F. (2016). Higher order elicitability and Osband’s principle. Annals of Statistics, 44(4):1680–1707.
Fissler, T., Ziegel, J. F., and Gneiting, T. (2016). Expected Shortfall is jointly elicitable with Value at Risk - Implications for backtesting. Risk Magazine, January 2016.
Gaglianone, W. P., Lima, L. R., Linton, O., and Smith, D. R. (2011). Evaluating Value-at-Risk Models via Quantile Regression. Journal of Business & Economic Statistics, 29(1):150–160.
Gikhman, I. and Skorokhod, A. (2004). The Theory of Stochastic Processes I, volume 210 of Classics in Mathematics. Springer Berlin Heidelberg.
Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494):746–762.
Gourieroux, C. and Monfort, A. (1995). Statistics and Econometric Models: Volume 1, General Concepts, Estimation, Prediction and Algorithms. Cambridge University Press.
Halbleib, R. and Pohlmeier, W. (2012). Improving the value at risk forecasts: Theory and evidence from the financial crisis. Journal of Economic Dynamics and Control, 36(8):1212–1228.
Hall, P. and Sheather, S. J. (1988). On the Distribution of a Studentized Quantile. Journal of the Royal Statistical Society. Series B (Methodological), 50(3):381–391.
Hendricks, W. and Koenker, R. (1992). Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association, 87(417):58–68.
Huber, P. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 221–233. Berkeley: University of California Press.
Koenker, R. (1994). Confidence Intervals for Regression Quantiles. In Mandl, P. and Hušková, M., editors, Asymptotic Statistics: Proceedings of the Fifth Prague Symposium, held from September 4–9, 1993, pages 349–359. Physica-Verlag Heidelberg.
Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge University Press.
Koenker, R. and Machado, J. A. F. (1999). Goodness of Fit and Related Inference Processes for Quantile Regression. Journal of the American Statistical Association, 94(448):1296–1310.
Koenker, R. and Xiao, Z. (2006). Quantile Autoregression. Journal of the American Statistical Association, 101(475):980–990.
Komunjer, I. (2013). Quantile Prediction. In Handbook of Economic Forecasting, volume 2, chapter 17, pages 961–994. Elsevier.
Lambert, N. S., Pennock, D. M., and Shoham, Y. (2008). Eliciting Properties of Probability Distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 129–138. ACM.
Lourenço, H. R., Martin, O. C., and Stützle, T. (2003). Iterated Local Search. In Glover, F. and Kochenberger, G. A., editors, Handbook of Metaheuristics, pages 320–353. Springer US, Boston, MA.
Nadarajah, S., Zhang, B., and Chan, S. (2014). Estimation methods for expected shortfall. Quantitative Finance, 14(2):271–291.
Nelder, J. A. and Mead, R. (1965). A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313.
Newey, W. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Engle, R. and McFadden, D., editors, Handbook of Econometrics, volume 4, chapter 36, pages 2111–2245. Elsevier.
Nolde, N. and Ziegel, J. F. (2017). Elicitability and backtesting: Perspectives for banking regulation. arXiv:1608.05498 [q-fin.RM].
Taylor, J. W. (2008a). Estimating Value at Risk and Expected Shortfall Using Expectiles. Journal of Financial Econometrics, 6(2):231–252.
Taylor, J. W. (2008b). Using Exponentially Weighted Quantile Regression to Estimate Value at Risk and Expected Shortfall. Journal of Financial Econometrics, 6(3):382–406.
Taylor, J. W. (2017). Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution. Forthcoming in Journal of Business and Economic Statistics.
van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Žikeš, F. and Baruník, J. (2016). Semi-parametric Conditional Quantile Models for Financial Returns and Realized Volatility. Journal of Financial Econometrics, 14(1):185–226.
Weber, S. (2006). Distribution Invariant Risk Measures, Information, and Dynamic Consistency. Mathematical Finance, 16(2):419–441.
Xiao, Z., Guo, H., and Lam, M. S. (2015). Quantile Regression and Value at Risk. In Lee, C.-F. and Lee, J. C., editors, Handbook of Financial Econometrics and Statistics, pages 1143–1167. Springer.
Ziegel, J. F., Krüger, F., Jordan, A., and Fasciati, F. (2017). Murphy Diagrams: Forecast Evaluation of Expected Shortfall. arXiv:1705.04537 [q-fin.RM].
Zwingmann, T. and Holzmann, H. (2016). Asymptotics for the expected shortfall. arXiv:1611.07222 [math.ST].

Quantitative Finance arXiv (Cornell University)

A Joint Quantile and Expected Shortfall Regression Framework

Quantitative Finance , Volume 2020 (1704) – Apr 7, 2017


References (49)

ISSN
1935-7524
eISSN
ARCH-3346
DOI
10.1214/19-EJS1560

Abstract

We introduce a novel regression framework which simultaneously models the quantile and the Expected Shortfall (ES) of a response variable given a set of covariates. This regression is based on a strictly consistent loss function for the pair quantile and ES, which allows for M- and Z-estimation of the joint regression parameters. We show consistency and asymptotic normality for both estimators under weak regularity conditions. The underlying loss function depends on two specification functions, whose choice affects the properties of the resulting estimators. We find that the Z-estimator is numerically unstable and thus, we rely on M-estimation of the model parameters. Extensive simulations verify the asymptotic properties and analyze the small sample behavior of the M-estimator for different specification functions. This joint regression framework allows for various applications including estimating, forecasting, and backtesting ES, which is particularly relevant in light of the recent introduction of ES into the Basel Accords.

Keywords: Expected Shortfall, Joint Elicitability, Joint Regression, M-estimation, Quantile Regression

1. Introduction

Measuring and forecasting risks is essential for a variety of academic disciplines. For this purpose, risk measures, which are formally defined as a map (with certain properties) from a space of random variables to a real number, are applied to condense the complex nature of the involved risks to a single number (Artzner et al., 1999). In the context of financial risk measurement, to date the most commonly used risk measure is the Value-at-Risk (VaR), which is the $\alpha$-quantile of the return distribution. Its popularity is mainly due to its simple nature and the fact that up to now, the Basel Accords stipulate its use for the calculation of capital requirements for banks. Besides not being coherent (Artzner et al., 1999), the main drawback of the VaR is its inability to capture tail risks beyond itself.
This deficiency is overcome by the risk measure Expected Shortfall (ES) at level $\alpha$, which is defined as the mean of the returns which are smaller than the $\alpha$-quantile of the return distribution. The ES has the desired ability to capture information from the whole left tail of the return distribution, which is particularly important for measuring extreme financial risks. Over the past few years, ES has increasingly become the object of interest for practitioners, academics, and regulators, especially since its recent introduction into the Basel Accords (Basel Committee, 2016). A major drawback of the ES (regarded as a statistical functional) is that it is not elicitable, which means that there exists no loss function (scoring function, scoring rule) which the ES uniquely minimizes in expectation (Gneiting, 2011; Weber, 2006). This result has two main consequences. First, consistent ranking of competing forecasts for the ES based on such a loss function is infeasible. Second, and more substantial for this paper, modeling the conditional ES given a set of covariates through a regression model without specifying the full conditional distribution is infeasible since estimation of the regression parameters through M-estimation requires such a loss function. Consequently, and in contrast to quantile regression (which can be used to model the VaR), to date, there exists no such regression framework which models the ES based on a set of covariates. Nadarajah et al. (2014) provide an overview of estimation methods for the ES. However, the reviewed approaches are only applicable to univariate data and are not suitable for estimating the conditional ES based on covariates such as in mean and quantile regression. Nevertheless, there are some approaches for the ES which incorporate explanatory variables through indirect estimation procedures.

Footnote: Corresponding author. Email addresses: timo.dimitriadis@uni-konstanz.de, sebastian.bayer@uni-konstanz.de. arXiv:1704.02213v3 [math.ST] 8 Aug 2017.
Taylor (2008b) proposes an implicit approach for forecasting ES using exponentially weighted quantile regression, and Taylor (2008a) introduces a procedure based on expectile regression and a relationship between the ES and expectiles. Taylor (2017) suggests a joint modeling technique for the quantile and the ES based on maximum likelihood estimation of the asymmetric Laplace distribution. Barendse (2017) proposes generalized method of moments (GMM) estimation for a regression framework for the interquantile expectation. Even though the ES is not elicitable stand-alone, Fissler and Ziegel (2016) show in their seminal paper that the quantile (the VaR) and the ES are jointly elicitable by introducing a class of joint loss functions, whose expectation is minimized by these two functionals. This joint elicitability result and the associated class of loss functions give rise to a growing literature in both joint estimation (Zwingmann and Holzmann, 2016) and joint forecast evaluation (Acerbi and Szekely, 2014; Fissler et al., 2016; Nolde and Ziegel, 2017; Ziegel et al., 2017) for the risk measures VaR and ES. In this paper, we utilize the class of loss functions of Fissler and Ziegel (2016) to introduce a novel simultaneous regression framework for the quantile and the ES and propose both an M- and a Z-estimator for the joint regression parameters. These strictly consistent loss functions make M- and Z-estimation of the regression parameters possible without specifying the full conditional distribution of the model, as opposed to maximum likelihood estimation. We show consistency and asymptotic normality for both estimators under weak regularity conditions which are typical for such a regression framework.
To the best of our knowledge, we are the first to propose such a joint regression framework for the quantile and the ES together with the joint M- and Z-estimation and the associated results of consistency and asymptotic normality. Furthermore, we are the first to propose a joint semiparametric regression framework for two different functionals based on joint M-estimation without specifying the full conditional distribution.

The employed joint loss function, the estimating equations (for the Z-estimator) and the resulting parameter estimates depend on two specification functions, which can be chosen from some class of functions. Even though consistency and asymptotic normality hold for all applicable choices of these specification functions, they affect the necessary moment conditions, the resulting asymptotic covariance matrices of the estimators, the numerical stability of the optimization algorithm, and the computation times. We discuss the choice of these functions in a theoretical context with respect to asymptotic efficiency and necessary regularity conditions, and with respect to the numerical properties of the optimization algorithm.

The estimation of the asymptotic covariance matrix poses two difficulties. The first occurs in the estimation of the density quantile function, analogous to quantile regression (cf. Koenker, 2005), and thus we utilize estimation procedures stemming from this literature. The second issue is the estimation of the variance of the negative quantile residuals conditional on the covariates, a nuisance quantity which is new to the literature. We introduce several estimators for this quantity which are able to cope with limited sample sizes and which can model the dependency of the negative quantile residuals on the covariates. Furthermore, we estimate the covariance matrix using the bootstrap.

For ease of application, we provide an R package (Bayer and Dimitriadis, 2017a) which contains the implementation of the M- and Z-estimator.
The user can choose the specification functions, the numerical optimization procedure and the estimation method for the covariance matrix of the parameter estimates.

We conduct a Monte-Carlo simulation study where we consider three data generating processes with different properties. We numerically verify consistency and asymptotic normality of the M-estimator for a range of different choices of the specification functions. Furthermore, we find that the Z-estimator is numerically unstable due to the redescending nature of the utilized estimating equations and consequently, we rely on M-estimation of the regression parameters. Moreover, we find that the performance of the M-estimator strongly depends on the specification functions, where choices resulting in positively homogeneous loss functions (Nolde and Ziegel, 2017; Efron, 1991) lead to a superior performance in terms of asymptotic efficiency, computation times, and mean squared error of the estimator.

This joint regression technique for the quantile and ES has a wide range of potential applications as it generalizes quantile regression to the pair consisting of the quantile and the ES. In the context of financial risk management, it opens up the possibility to extend the existing applications of quantile regression on VaR in the financial literature to ES, such as in Chernozhukov and Umantsev (2001), Engle and Manganelli (2004), Koenker and Xiao (2006), Gaglianone et al. (2011), Halbleib and Pohlmeier (2012), Komunjer (2013), Xiao et al. (2015) and Žikeš and Baruník (2016). Such estimation, forecasting, and backtesting methods for the ES are particularly sought-after in light of the recent shift from VaR to ES in the Basel Accords. As an illustration, we present an empirical application where we use our regression framework to jointly forecast VaR and ES based on the realized volatility.

The rest of the paper is organized as follows.
In Section 2, we introduce the joint regression framework and the underlying regularity conditions together with the asymptotic properties of our estimators, and discuss the choice of the specification functions. Section 3 provides details on the numerical implementation of the estimators and on the estimation of the asymptotic covariance matrix. Section 4 presents an extensive simulation study and Section 5 contains an empirical application. Section 6 provides concluding remarks. The proofs are deferred to Appendices B and C.

2. Methodology

2.1. The Joint Regression Framework

Following Lambert et al. (2008), Gneiting (2011) and Fissler and Ziegel (2016), we introduce the concept of (multivariate) p-elicitability. We consider a random variable $Z: \Omega \to \mathbb{R}$, defined on some complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, a class of distributions $\mathcal{P}$ on $\mathbb{R}$, equipped with the Borel $\sigma$-field, and a functional $T: \mathcal{P} \to D$ with its domain of action $D \subseteq \mathbb{R}^p$, $p \in \mathbb{N}$. We call an integrable loss function $\rho: \mathbb{R} \times D \to \mathbb{R}$ strictly consistent for the functional $T$ relative to the class of distributions $\mathcal{P}$, if $T(F)$ is the unique minimizer of $E[\rho(Z, \cdot)]$ for all distributions $F \in \mathcal{P}$, where $F$ is the distribution of $Z$. Furthermore, we call a p-dimensional functional $T$ p-elicitable relative to the class $\mathcal{P}$, if there exists a loss function $\rho$ which is strictly consistent for $T$ relative to $\mathcal{P}$. If the dimension p is clear from the context, we simply call the functional elicitable instead of p-elicitable.

Given the generalized $\alpha$-quantile $Q_\alpha(Z) = F^{-1}(\alpha) = \inf\{z \in \mathbb{R}: F(z) \ge \alpha\}$ for some $\alpha \in (0,1)$, the ES of the random variable $Z$ at level $\alpha$ is defined as

$$ES_\alpha(Z) = \frac{1}{\alpha} \int_0^\alpha Q_u(Z)\, du.$$

If the distribution function of $Z$ is continuous at its $\alpha$-quantile, this definition can be simplified to the conditional tail expectation

$$ES_\alpha(Z) = E\big[Z \mid Z \le Q_\alpha(Z)\big].$$
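To make the two definitions concrete, the following minimal sketch computes the generalized sample quantile and the corresponding tail mean for a univariate sample. The function names `sample_quantile` and `sample_es` are illustrative and not taken from the paper or its R package, and the tail mean only matches the integral definition when there are no ties at the quantile and $\alpha n$ is an integer.

```python
import math


def sample_quantile(values, alpha):
    """Generalized alpha-quantile inf{z : F_n(z) >= alpha} of the empirical distribution."""
    ordered = sorted(values)
    # Smallest order statistic whose empirical CDF value reaches alpha.
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def sample_es(values, alpha):
    """Tail mean: average of all observations at or below the sample alpha-quantile."""
    q = sample_quantile(values, alpha)
    tail = [v for v in values if v <= q]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    returns = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
    print(sample_quantile(returns, 0.2), sample_es(returns, 0.2))
```

By construction the tail mean can never exceed the quantile it is based on, which mirrors the ordering of ES and VaR in the left tail.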
Gneiting (2011) shows that the ES is not 1-elicitable with respect to any class $\mathcal{P}$ of probability distributions on intervals $I \subseteq \mathbb{R}$, which contain measures with finite support or finite mixtures of absolutely continuous distributions with compact support (see also Weber, 2006). This result has several consequences for the risk measure ES. First, consistent and meaningful ranking of competing forecasts for the functional ES is infeasible. Second, and more consequential for this work, estimating the parameters $\theta^e$ of a stand-alone regression model for the functional ES in the sense that $ES_\alpha(Y \mid X) = X'\theta^e$ by means of M-estimation, i.e. by minimizing some strictly consistent loss function, is infeasible.

Even though the ES is not 1-elicitable, Fissler and Ziegel (2016) show that the pair consisting of the ES and the quantile at common probability level $\alpha$ is 2-elicitable relative to the class of distributions with finite first moments and unique $\alpha$-quantiles, and they characterize the full class of strictly consistent loss functions for this pair subject to some regularity conditions. Since the definition of the ES already depends on the respective quantile, the fact that the ES is only elicitable jointly with the quantile is not surprising.

We utilize this joint elicitability result for the introduction of a new joint regression framework for the quantile and the ES, where the aforementioned class of strictly consistent loss functions serves as the basis for the M-estimation of the joint regression parameters. For this, let $Y: \Omega \to \mathbb{R}$ and $X: \Omega \to \mathbb{R}^k$ be random variables defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$ as above. Henceforth, the transpose of $X$ will be denoted by $X'$, the cumulative distribution function of $Y$ given $X$ by $F_{Y|X}$ and the conditional density function by $f_{Y|X}$. For a k-times differentiable real-valued function $G: \mathbb{R} \to \mathbb{R}$, we denote the k-th derivative by $G^{(k)}(\cdot)$.

Assumption 2.1 (The joint regression model).
The regression framework which jointly models the conditional quantile and ES of $Y$ given $X$ for some fixed level $\alpha \in (0,1)$ is given by

$$Y = X'\theta_0^q + u^q \quad \text{and} \quad Y = X'\theta_0^e + u^e, \tag{2.1}$$

where $Q_\alpha(u^q \mid X) = 0$ and $ES_\alpha(u^e \mid X) = 0$. The model is parametrized by $\theta_0 = (\theta_0^{q\prime}, \theta_0^{e\prime})' \in \Theta \subset \mathbb{R}^{2k}$, where the parameter space $\Theta$ is compact with nonempty interior, $\mathrm{int}(\Theta) \neq \emptyset$.

We propose both an M-estimation and a Z-estimation procedure for the compound regression parameter vector $\theta_0$. For the M-estimation, we adapt the class of strictly consistent joint loss functions [1] for the quantile and ES as given in Fissler and Ziegel (2016) such that it can be used in a regression framework,

$$\rho(Y, X, \theta) = \big(\mathbb{1}_{\{Y \le X'\theta^q\}} - \alpha\big)\, G_1(X'\theta^q) - \mathbb{1}_{\{Y \le X'\theta^q\}}\, G_1(Y) + G_2(X'\theta^e)\left(X'\theta^e - X'\theta^q + \frac{(X'\theta^q - Y)\,\mathbb{1}_{\{Y \le X'\theta^q\}}}{\alpha}\right) - \mathcal{G}_2(X'\theta^e) + a(Y), \tag{2.2}$$

where the function $G_1$ is twice continuously differentiable, $\mathcal{G}_2$ is three times continuously differentiable, $\mathcal{G}_2^{(1)} = G_2$, $G_2$ and $G_2^{(1)}$ are strictly positive, $G_1$ is increasing and $a$ and $G_1$ are integrable. We discuss the choice of the specification functions $G_1$ and $\mathcal{G}_2$ in a theoretical context in Section 2.3 and by their numerical performance in Section 4.2. The corresponding ($\rho$-type) M-estimator is defined by a sequence $\hat\theta_{\rho,n}$ such that

$$\hat\theta_{\rho,n} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \theta).$$

Instead of minimizing some objective function $\rho(Y, X, \theta)$ such as in (2.2), we can also define the corresponding Z-estimator (or $\psi$-type M-estimator), which sets a vector of estimating equations (moment conditions), denoted by $\psi(Y, X, \theta)$, to zero. More generally, it suffices that these estimating equations converge to zero almost surely. Formally, the Z-estimator is a sequence $\hat\theta_{\psi,n}$ such that $\frac{1}{n} \sum_{i=1}^n \psi(Y_i, X_i, \hat\theta_{\psi,n}) \to 0$ almost surely, where

$$\psi(Y, X, \theta) = \begin{pmatrix} \psi_1(Y, X, \theta) \\ \psi_2(Y, X, \theta) \end{pmatrix} = \begin{pmatrix} \left( X\, G_1^{(1)}(X'\theta^q) + \frac{1}{\alpha}\, X\, G_2(X'\theta^e) \right) \big(\mathbb{1}_{\{Y \le X'\theta^q\}} - \alpha\big) \\[4pt] X\, G_2^{(1)}(X'\theta^e) \left( X'\theta^e - X'\theta^q + \frac{1}{\alpha} (X'\theta^q - Y)\,\mathbb{1}_{\{Y \le X'\theta^q\}} \right) \end{pmatrix}, \tag{2.3}$$

which is obtained by differentiating [2] (2.2) and where the functions $G_1$ and $\mathcal{G}_2$ are given as above.
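The loss in (2.2) can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the implementation in the authors' R package: the names `joint_loss` and `average_loss` are hypothetical, the specification functions are passed as plain callables, and $a(Y)$ defaults to zero.

```python
import math


def joint_loss(y, q, e, alpha, g1, g2, g2_cal, a=lambda y: 0.0):
    """Pointwise loss (2.2) for a quantile forecast q = x'theta_q and an ES forecast e = x'theta_e.

    g1 is G_1, g2 is G_2 and g2_cal is its antiderivative (calligraphic G_2).
    """
    hit = 1.0 if y <= q else 0.0
    quantile_part = (hit - alpha) * g1(q) - hit * g1(y)
    es_part = g2(e) * (e - q + (q - y) * hit / alpha) - g2_cal(e)
    return quantile_part + es_part + a(y)


def average_loss(ys, xs, theta_q, theta_e, alpha, g1, g2, g2_cal):
    """Sample average of the loss, i.e. the objective minimized by the M-estimator."""
    total = 0.0
    for y, x in zip(ys, xs):
        q = sum(xj * tj for xj, tj in zip(x, theta_q))
        e = sum(xj * tj for xj, tj in zip(x, theta_e))
        total += joint_loss(y, q, e, alpha, g1, g2, g2_cal)
    return total / len(ys)


if __name__ == "__main__":
    # Illustrative choice: G_1(z) = 0 and G_2(z) = exp(z), whose antiderivative is exp(z).
    value = joint_loss(-1.0, -0.5, -1.5, 0.5, lambda z: 0.0, math.exp, math.exp)
    print(value)
```

For an intercept-only model the regressor is the constant `(1.0,)`, so `theta_q` and `theta_e` are simply the quantile and ES candidates. Strict consistency then suggests that, in large samples, the average loss should be smaller at the true pair than at perturbed values.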
When the loss function $\rho(Y, X, \theta)$ is continuously differentiable in $\theta$, it is obvious that the M- and Z-estimation approaches are equivalent. In the present case, however, the loss function $\rho(Y, X, \theta)$ is not differentiable and $\psi(Y, X, \theta)$ is discontinuous at the points where $Y = X'\theta^q$. Thus, we treat these two estimation approaches as different estimators and show their asymptotic behavior separately.

[1] One can interpret the structure of this loss function as follows (Fissler et al., 2016): The first summand in (2.2) is a strictly consistent loss function for the quantile (Gneiting, 2011) and hence only depends on the quantile, whereas the second summand cannot be split into a part depending only on the quantile and one depending only on the ES. This illustrates the fact that the ES itself is not 1-elicitable, but 2-elicitable together with the respective quantile.

[2] Note that the function $\rho(Y, X, \theta)$ given in (2.2) is only differentiable for $Y \neq X'\theta^q$. However, the points of non-differentiability, $Y = X'\theta^q$, form a nullset with respect to the absolutely continuous distribution of $Y$ given $X$.

2.2. Asymptotic Properties

In this section, we present the asymptotic properties of the M- and Z-estimator of the regression parameters. Consistency and asymptotic normality hold under the following set of weak regularity conditions, which are natural for this regression framework.

Assumption 2.2 (Regularity Conditions).

(A-1) The data $(Y_i, X_i)$ for $i = 1, \dots, n$ is an iid series of random variables, distributed as $(Y, X)$ given above. Furthermore, the conditional distribution $F_{Y|X}$ has finite second moments and is absolutely continuous with probability density function $f_{Y|X}$, which is strictly positive, continuous and bounded in a neighborhood of the true conditional quantile, $X'\theta_0^q$.

(A-2) The matrix $E[X X']$ is positive definite.
(A-3) The functions $\rho(Y, X, \theta)$ and $\psi(Y, X, \theta)$ are given as in (2.2) and (2.3), where the function $G_1$ is twice continuously differentiable, $\mathcal{G}_2$ is three times continuously differentiable, $\mathcal{G}_2^{(1)} = G_2$, $G_2$ and $G_2^{(1)}$ are strictly positive, $G_1$ is increasing and $a$ and $G_1$ are integrable.

Remark 2.3 (Finite Moment Conditions). We further have to assume that certain moments of $X$ are finite. For the sake of space, we specify the Finite Moment Conditions (M-1) - (M-4) in Appendix A. Note that these general moment conditions simplify substantially for sensible choices of the specification functions $G_1$ and $\mathcal{G}_2$, as further outlined in Section 2.3.

Assumption (A-1) is a combination of typical regularity conditions of mean and quantile regression. Absolute continuity of $F_{Y|X}$ with a strictly positive, bounded and continuous density function in a neighborhood of the true conditional quantile is also imposed for the asymptotic theory of quantile regression. Existence of the conditional moments of $Y$ given $X$ is a standard condition of mean regression and is included in our regularity conditions since ES is a truncated mean. The positive definiteness (full rank condition) in (A-2) is common for any regression design with stochastic regressors in order to exclude perfect multicollinearity of the regressors. The conditions for the specification functions $G_1$ and $\mathcal{G}_2$ in (A-3) mainly originate from the conditions for the joint elicitability of the quantile and ES in Fissler and Ziegel (2016). Differentiability of these functions is required in this setup for obtaining the estimating equations and for the differentiations in the computation of the asymptotic covariance in Theorem 2.6 and Theorem 2.7. The existence of certain moments of the explanatory variables as in conditions (M-1) - (M-4) in Appendix A is also standard in any regression design relying on stochastic regressors.
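Even though compactness of the parameter space $\Theta$ in Assumption 2.1 generally simplifies the proofs, in this setup it is crucial for consistency of the Z-estimator, as the estimating equations are redescending to zero for many reasonable choices of the $\mathcal{G}_2$ function, such as the choices resulting in positively homogeneous loss functions. For details on this, we refer to Section 3.1.

Theorem 2.4. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-1) in Appendix A hold true. Then, for every sequence $\hat\theta_{\psi,n} \in \Theta$ satisfying $\frac{1}{n} \sum_{i=1}^n \psi(Y_i, X_i, \hat\theta_{\psi,n}) \xrightarrow{a.s.} 0$, it holds that $\hat\theta_{\psi,n} \xrightarrow{a.s.} \theta_0$.

Theorem 2.5. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-2) in Appendix A hold true. Then, for every sequence $\hat\theta_{\rho,n} \in \Theta$ such that $\frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \hat\theta_{\rho,n}) \le \frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \theta_0) + o_P(1)$, it holds that $\hat\theta_{\rho,n} \xrightarrow{P} \theta_0$.

Theorem 2.6. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-3) in Appendix A hold true. Then, for every sequence $\hat\theta_{\psi,n} \in \Theta$ satisfying $\frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(Y_i, X_i, \hat\theta_{\psi,n}) \xrightarrow{P} 0$, it holds that

$$\sqrt{n}\,\big(\hat\theta_{\psi,n} - \theta_0\big) \xrightarrow{d} N\big(0, \Lambda^{-1} C \Lambda^{-1}\big), \tag{2.4}$$

with

$$\Lambda = \begin{pmatrix} \Lambda_{11} & 0 \\ 0 & \Lambda_{22} \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}, \tag{2.5}$$

where

$$\Lambda_{11} = \frac{1}{\alpha}\, E\Big[ (X X')\, f_{Y|X}(X'\theta_0^q) \big( \alpha\, G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \big) \Big], \tag{2.6}$$

$$\Lambda_{22} = E\Big[ (X X')\, G_2^{(1)}(X'\theta_0^e) \Big], \tag{2.7}$$

$$C_{11} = \frac{1-\alpha}{\alpha}\, E\Big[ (X X') \big( \alpha\, G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \big)^2 \Big], \tag{2.8}$$

$$C_{12} = C_{21} = \frac{1-\alpha}{\alpha}\, E\Big[ (X X') \big( X'\theta_0^q - X'\theta_0^e \big) \big( \alpha\, G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \big)\, G_2^{(1)}(X'\theta_0^e) \Big], \tag{2.9}$$

$$C_{22} = E\bigg[ (X X') \big( G_2^{(1)}(X'\theta_0^e) \big)^2 \bigg( \frac{1}{\alpha} \mathrm{Var}\big( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \big) + \frac{1-\alpha}{\alpha} \big( X'\theta_0^q - X'\theta_0^e \big)^2 \bigg) \bigg]. \tag{2.10}$$

Theorem 2.7. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-4) in Appendix A hold true. Then, for every sequence $\hat\theta_{\rho,n} \in \Theta$ such that $\frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \hat\theta_{\rho,n}) \le \inf_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n \rho(Y_i, X_i, \theta) + o_P(n^{-1})$, it holds that

$$\sqrt{n}\,\big(\hat\theta_{\rho,n} - \theta_0\big) \xrightarrow{d} N\big(0, \Lambda^{-1} C \Lambda^{-1}\big), \tag{2.11}$$

where the matrices $\Lambda$ and $C$ are given as in Theorem 2.6.

Remark 2.8 (Quantile Regression).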
Notice that the asymptotic covariance matrix of the quantile-specific parameter estimates $\hat\theta^q$ is given by $\alpha(1-\alpha)\, D_1^{-1} D_0 D_1^{-1}$, where

$$D_1 = E\Big[ (X X')\, f_{Y|X}(X'\theta_0^q) \big( \alpha\, G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \big) \Big] \quad \text{and} \tag{2.12}$$

$$D_0 = E\Big[ (X X') \big( \alpha\, G_1^{(1)}(X'\theta_0^q) + G_2(X'\theta_0^e) \big)^2 \Big]. \tag{2.13}$$

This simplifies to the covariance matrix of quantile regression parameter estimates by setting $G_1(z) = z$ and $G_2(z) = 0$, which means ignoring the ES-specific part of our loss function and estimating equations. This demonstrates that the quantile regression method is nested in our regression procedure, also in terms of its asymptotic distribution.

Remark 2.9 (Asymptotic Covariance of the ES and the Oracle Estimator). The ES-specific part of the asymptotic covariance is mainly governed by the term $C_{22}$, which depends on the quantity

$$\frac{1}{\alpha} \mathrm{Var}\big( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \big) + \frac{1-\alpha}{\alpha} \big( X'\theta_0^q - X'\theta_0^e \big)^2 = \frac{1}{\alpha^2} \mathrm{Var}\Big( (Y - X'\theta_0^q)\, \mathbb{1}_{\{Y \le X'\theta_0^q\}} \,\Big|\, X \Big). \tag{2.14}$$

It is reasonable that the asymptotic covariance of ES regression parameters depends on the truncated variance of $Y$ given $X$, as the asymptotic covariance of mean regression parameters is driven by the conditional (non-truncated) variance of $Y$ given $X$. The second term in (2.14), which involves $(X'\theta_0^q - X'\theta_0^e)^2$, is included since the ES represents a truncated mean where the truncation point itself is a statistical functional (the quantile). In comparison, we consider an oracle M-estimator for the ES-specific regression parameters $\theta^e$, given by the loss function

$$\rho_{\mathrm{Oracle}}(Y, X, \theta^e) = (Y - X'\theta^e)^2\, \mathbb{1}_{\{Y \le X'\theta_0^q\}}, \tag{2.15}$$

where we assume that the true quantile regression parameters $\theta_0^q$ are known. The resulting asymptotic covariance is given by

$$\mathrm{AVar}\big( \hat\theta^e_{\mathrm{Oracle}} \big) = \frac{1}{\alpha}\, E[X X']^{-1}\, E\Big[ (X X')\, \mathrm{Var}\big( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \big) \Big]\, E[X X']^{-1}, \tag{2.16}$$

which shows that the additional term involving $(X'\theta_0^q - X'\theta_0^e)^2$ is not included for this estimator with fixed truncation point $X'\theta_0^q$.

Remark 2.10 (Joint Estimation of the Sample Quantile and ES).
We can use this regression framework to jointly estimate the quantile and ES of an identically distributed sample $Y_1, \dots, Y_n$ by regressing on a constant only. The asymptotic covariance matrix given in Theorem 2.6 and Theorem 2.7 then simplifies to a matrix $\Sigma$ with components

$$\Sigma_{11} = \frac{\alpha(1-\alpha)}{f_Y^2(\theta_0^q)}, \tag{2.17}$$

$$\Sigma_{12} = \Sigma_{21} = (1-\alpha)\, \frac{\theta_0^q - \theta_0^e}{f_Y(\theta_0^q)}, \tag{2.18}$$

$$\Sigma_{22} = \frac{1}{\alpha} \mathrm{Var}\big( Y - \theta_0^q \mid Y \le \theta_0^q \big) + \frac{1-\alpha}{\alpha} \big( \theta_0^q - \theta_0^e \big)^2, \tag{2.19}$$

where $\theta_0^q$ and $\theta_0^e$ are the true quantile and ES of $Y$. The same result is obtained by Zwingmann and Holzmann (2016), who further allow for a distribution function for $Y$ which is not differentiable at the quantile with strictly positive derivative. Notice that in this simplified case without covariates, the asymptotic covariance matrix is independent of the specification functions $G_1$ and $\mathcal{G}_2$ used in the loss function and in the estimating equations. Furthermore, (2.17) implies that quantile estimates stemming from our joint estimation procedure have the same asymptotic efficiency as quantile estimates stemming from minimizing the generalized piecewise linear loss (Gneiting, 2011) and as sample quantiles (cf. Koenker, 2005). The same holds true for the efficiency of the sample ES estimators (based on the sample quantile) of Brazauskas et al. (2008) and Chen (2008).

Remark 2.11 (Pseudo-$R^2$ and the choice of $a(Y)$). By choosing $a(Y) = \alpha\, G_1(Y) + \mathcal{G}_2(Y)$ in (2.2), we can guarantee non-negative losses $\rho(Y, X, \theta) \ge 0$. This choice enables us to define a pseudo-$R^2$ for our joint regression framework in the sense of Koenker and Machado (1999),

$$R^{QE} = 1 - \frac{\rho(Y, X, \hat\theta)}{\rho(Y, X, \tilde\theta)}, \tag{2.20}$$

where $\hat\theta$ denotes the parameter estimates of the full regression model and $\tilde\theta$ denotes the parameter estimates of a regression model restricted to an intercept term only. However, this choice of $a(Y)$ comes at the cost of more restrictive moment conditions, since we need to impose that $E\big| \alpha\, G_1(Y) + \mathcal{G}_2(Y) \big| < \infty$.

2.3. Choice of the Specification Functions

The loss functions and the estimating equations given in (2.2) and (2.3) depend on two specification functions, $G_1$ and $\mathcal{G}_2$ (with derivative $G_2$), which have to fulfill the regularity conditions (A-3) in Assumption 2.2. Fissler et al. (2016) already mention the feasible choices $G_1(z) = 0$, $G_1(z) = z$, $G_2(z) = \exp(z)$ and $G_2(z) = \exp(z) / (1 + \exp(z))$ in order to show that this class is non-empty. In contrast to the loss functions of mean, quantile and expectile regression, there is no natural choice for these specification functions for the quantile and ES yet (Nolde and Ziegel, 2017). However, as the choice of these functions strongly influences the performance of our regression procedure in terms of its asymptotic efficiency, the necessary moment conditions of the regressors and the numerical performance of the optimization algorithm, we discuss sensible selection criteria in the following.

Efron (1991) and Nolde and Ziegel (2017) argue that for M-estimation of regression parameters it is crucial that the utilized loss function is positively homogeneous of some order $b \in \mathbb{R}$ in the sense that

$$\rho(cY, X, c\theta) = c^b\, \rho(Y, X, \theta) \tag{2.21}$$

for all $c > 0$. This is an important property for loss functions since the ordering of the losses should be independent of the unit of measurement, e.g. the currency in which we measure the prices and risk forecasts. Loss functions with this property guarantee that we can change the scaling and still obtain the same optima and consequently the same parameter estimates. For the pair consisting of the quantile and the ES, Nolde and Ziegel (2017) characterize the full class of positively homogeneous [3] loss functions of order $b$ for the case where we restrict the domain of $\mathcal{G}_2$, i.e.
the conditional ES, to the negative real line [4]:

$$b < 0: \quad G_1(z) = -c_0, \qquad \mathcal{G}_2(z) = c_1 (-z)^b + c_0, \tag{2.22}$$

$$b = 0: \quad G_1(z) = d_0\, \mathbb{1}_{\{z \le 0\}} + d_0'\, \mathbb{1}_{\{z > 0\}}, \qquad \mathcal{G}_2(z) = -c_1 \log(-z) + c_0, \tag{2.23}$$

$$b \in (0, 1): \quad G_1(z) = \big( -d_1\, \mathbb{1}_{\{z \le 0\}} + d_1'\, \mathbb{1}_{\{z > 0\}} \big) |z|^b - c_0, \qquad \mathcal{G}_2(z) = -c_1 (-z)^b + c_0, \tag{2.24}$$

for some constants $c_0, d_0, d_0' \in \mathbb{R}$ with $d_0 \le d_0'$, $d_1, d_1' \ge 0$ and $c_1 > 0$. There are no positively homogeneous loss functions for the cases $b \ge 1$. Our numerical simulations show that there is no gain in efficiency or numerical accuracy by deviating from the choice $G_1(z) = 0$ (see also Fissler et al., 2016; Nolde and Ziegel, 2017; Ziegel et al., 2017), which is also consistent with the homogeneity result. Consequently, we use $G_1(z) = 0$ in the following.

A different natural guiding principle for selecting the specification functions is induced by choosing $\mathcal{G}_2$ (and $G_1$) such that the moment conditions (M-1) - (M-4) in Appendix A are as unrestrictive and as parsimonious as possible. For instance, choosing $\mathcal{G}_2$ such that $G_2$ and its first and second derivatives are bounded functions (and $G_1(z) = 0$) results in the moment condition

$$E\Big[ \|X\|^5 + \|X\|^4\, E\big[ |Y| \mid X \big] + \|X\|^3\, E\big[ Y^2 \mid X \big] + |a(Y)| \Big] < \infty.$$

This motivates the usage of bounded functions [5] for $G_2$, such as the second example of Fissler et al. (2016), $G_2(z) = \exp(z) / (1 + \exp(z))$, which is the distribution function of the standard logistic distribution. Further examples of bounded $G_2$ functions include the distribution functions of absolutely continuous distributions on the real line. In the simulation study in Section 4.2, we compare the performance of different specification functions in terms of mean squared error, asymptotic efficiency of the estimator and computation times.

3. Numerical Estimation of the Model

In this section, we discuss the difficulties one encounters and the solutions we propose for estimating the joint regression model.
Section 3.1 illustrates the numerical optimization procedure we employ for estimating the regression parameters and Section 3.2 discusses different estimation methods for the covariance matrix of the estimator.

3.1. Optimization

Theorem 2.6 and Theorem 2.7 imply that both M-estimation and Z-estimation of the regression parameters have the same asymptotic efficiency and consequently, we discuss these estimation approaches in terms of their numerical performance in the following. The numerical implementation of the Z-estimator relies on root-finding of the estimating equations given in (2.3), which we implement as in GMM-estimation by minimizing the inner product $\big( \sum_i \psi(Y_i, X_i, \theta) \big)' \cdot \big( \sum_i \psi(Y_i, X_i, \theta) \big)$. However, the estimating equations are redescending to zero for many attractive choices of $\mathcal{G}_2$ in the sense that $\psi_2(Y, X, \theta) \to 0$ for $X'\theta^e \to -\infty$. Consequently, for $\theta$ such that $\theta^q = \theta_0^q$ and $X'\theta^e \to -\infty$, we get the same minimal value of the Z-estimation objective function $\big( \sum_i \psi(Y_i, X_i, \theta) \big)' \cdot \big( \sum_i \psi(Y_i, X_i, \theta) \big)$ as for the true regression parameters $\theta_0$. Thus, the Z-estimator is numerically unstable and diverges in many setups. Consequently, we rely on M-estimation of the regression parameters in the following.

[3] For $b = 0$, only the loss differences are positively homogeneous. However, the ordering of the losses is still unaffected under this slightly weaker property.

[4] Since the conditional ES of financial assets for small probability levels is always negative, this is no critical restriction. However, for the numerical parameter estimation, we have to restrict the parameter space $\Theta$ such that $X'\theta^e < 0$ for all $\theta \in \Theta$ and for all $X$ in the underlying sample. For details on this, we refer to Section 3.1.

[5] Note that the positively homogeneous loss functions exhibit unbounded $G_2$ functions. However, as the function $G_2(z)$ does not grow faster than linear as $z$ tends to infinity, the resulting finite moment conditions are not too restrictive.
As the loss functions given in (2.2) are not differentiable and non-convex for all applicable choices of the specification functions (Fissler, 2017), we apply a derivative-free global optimization technique. More specifically, we use the Iterated Local Search (ILS) meta-heuristic of Lourenço et al. (2003), which successively refines the parameter estimates by repeated optimizations with iteratively perturbed starting values. Our exact implementation consists of the following steps. First, we obtain starting values for $\theta^q$ and $\theta^e$ from two quantile regressions of $Y$ on $X$ for the probability levels $\alpha$ and $\tilde\alpha$, where we choose $\tilde\alpha$ such that the $\tilde\alpha$-quantile and the $\alpha$-ES coincide under normality. Second, using these starting values we minimize the loss function with the derivative-free and robust Nelder-Mead Simplex algorithm (Nelder and Mead, 1965). Third, we perturb the resulting parameter estimates by adding normally distributed noise with zero mean and standard deviation equal to the estimated asymptotic standard errors of the initial quantile regression estimates. Fourth, we re-optimize the model with the perturbed parameter estimates as new starting values. If the loss is further decreased by this re-optimization, we update the estimates; otherwise, we retain the previous ones. Fifth, we iterate over the previous two steps until the loss does not decrease in $m = 10$ consecutive iterations. Our numerical experiments indicate that this repeated optimization procedure yields estimates very close to the ones stemming from other global optimization techniques such as simulated annealing, whereas the major advantage of ILS is the considerably lower computation time.

For the choices of the specification functions which result in positively homogeneous loss functions, we have to restrict the domain of $\mathcal{G}_2$ to the negative real line, as already discussed in Section 2.3.
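The perturb-and-reoptimize loop of steps three to five can be sketched as follows. This is a hedged illustration, not the authors' implementation: the local optimizer here is a simple compass (coordinate pattern) search standing in for the Nelder-Mead simplex algorithm, the starting values and noise scales are passed in by the caller instead of coming from quantile regressions, and all function names are hypothetical.

```python
import random


def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Derivative-free local search; a simple stand-in for Nelder-Mead in this sketch."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for j in range(len(x)):
            for delta in (step, -step):
                cand = x.copy()
                cand[j] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step /= 2.0  # shrink the pattern once no coordinate move helps
    return x, fx


def iterated_local_search(f, start, noise_sd, local_search, patience=10, rng=None):
    """Re-optimize from perturbed estimates until `patience` consecutive tries fail to improve."""
    rng = rng or random.Random(0)
    best_x, best_f = local_search(f, start)
    stale = 0
    while stale < patience:
        perturbed = [b + rng.gauss(0.0, s) for b, s in zip(best_x, noise_sd)]
        x, fx = local_search(f, perturbed)
        if fx < best_f:
            best_x, best_f, stale = x, fx, 0  # keep the improvement, reset the counter
        else:
            stale += 1  # retain the previous estimates
    return best_x, best_f


if __name__ == "__main__":
    objective = lambda x: abs(x[0] - 1.0) + abs(x[1] + 2.0)  # non-smooth toy objective
    print(iterated_local_search(objective, [0.0, 0.0], [0.5, 0.5], compass_search))
```

In an application, `f` would be the average joint loss as a function of the stacked parameter vector, and `patience=10` corresponds to the stopping rule $m = 10$ described above.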
For these choices, we therefore have to restrict $\Theta$ such that $X_i'\theta^e < 0$ for all $\theta \in \Theta$ and for all $i = 1, \dots, n$ during the optimization process. Even though in financial risk management the response variable $Y$ is usually given by financial returns where the true (conditional) ES is strictly negative, there might still be some outliers $X_i$ such that $X_i'\theta_0^e \ge 0$. In such a case, imposing the restriction $X_i'\theta^e < 0$ for all $i = 1, \dots, n$ during the optimization process generates substantially biased estimates for $\theta_0$. In order to avoid this, we estimate the regression model for the transformed dependent variable $Y - \max(Y)$ for the positively homogeneous loss functions and add $\max(Y)$ to the estimated intercept parameters to undo the transformation [6].

We provide an R package for the estimation of the regression parameters (see Bayer and Dimitriadis, 2017a). This package contains an implementation of both the M- and the Z-estimator, where different optimization algorithms can be chosen (ILS, simulated annealing). The package allows for choosing the specification functions $G_1$ and $\mathcal{G}_2$, and it includes an option to estimate the model either with or without the translation of the dependent variable. Furthermore, the covariance matrix of the parameter estimates can be estimated either by using the asymptotic theory and the resulting techniques we discuss in the next section, or by using the nonparametric iid bootstrap (Efron, 1979). We recommend applying the M-estimator with the ILS algorithm, as this procedure exhibits the best performance in our numerical experiments with respect to accuracy, stability and computation times.

[6] Note that this data transformation changes the average loss function, as the applied loss functions are in general not translation invariant. Thus, optimizing the translated loss function can lead to different parameter estimates. However, we do not face the risk of obtaining substantially biased estimates in cases where $X_i'\theta_0^e \ge 0$ for some $i \in \{1, \dots, n\}$.
Our numerical experiments indicate that the difference between estimating the model for $Y$ and for $Y - \max(Y)$ is small when $X_i'\theta_0^e < 0$ for all $i \in \{1, \dots, n\}$, but can be quite substantial if there is an outlier for $X_i$ such that $X_i'\theta_0^e \ge 0$.

3.2. Asymptotic Covariance Estimation

While most parts of the asymptotic covariance matrix given in Theorem 2.6 and Theorem 2.7 are straightforward to estimate, two nuisance quantities pose some difficulties. The first is the density quantile function $f_{Y|X}(X'\theta_0^q)$, which is already well investigated in the quantile regression literature. In particular, we consider the estimators proposed by Koenker (1994), henceforth denoted by iid, and by Hendricks and Koenker (1992), henceforth denoted by nid. The main difference between these is that the first is based on the assumption that the quantile residuals are independent of the covariates, whereas the second allows for a linear dependence structure. Both approaches depend on a bandwidth parameter which we choose according to Hall and Sheather (1988).

The second nuisance quantity is the variance of the quantile residuals, conditional on the covariates and given that these residuals are negative,

$$\mathrm{Var}\big( Y - X'\theta_0^q \mid Y \le X'\theta_0^q, X \big) = \mathrm{Var}\big( u^q \mid u^q \le 0, X \big). \tag{3.1}$$

Estimation of this quantity is demanding for two reasons. First, for very small probability levels which are typical in financial risk management, such as $\alpha = 2.5\%$, the truncation $u^q \le 0$ cuts off all but very few (about $\alpha \cdot n$) observations. Second, modeling this truncated variance conditional on the covariates $X$ is challenging, especially considering the very small sample sizes. Under the assumption of homoscedasticity, i.e. that the distribution of $u^q$ is independent of the covariates $X$, we can simply estimate (3.1) by the sample variance of the negative quantile residuals, and we refer to this estimator as ind in the following.
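A minimal sketch of the ind estimator follows, assuming the fitted conditional quantiles are already available as a list. The function name `truncated_variance_ind` is illustrative and not taken from the authors' R package.

```python
import statistics


def truncated_variance_ind(y, fitted_quantile):
    """Sample variance of the non-positive quantile residuals u_i = y_i - fitted quantile_i."""
    residuals = [yi - qi for yi, qi in zip(y, fitted_quantile)]
    negative = [u for u in residuals if u <= 0]
    if len(negative) < 2:
        # With roughly alpha * n observations in the tail, this can happen in small samples.
        raise ValueError("need at least two non-positive residuals")
    return statistics.variance(negative)


if __name__ == "__main__":
    print(truncated_variance_ind([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]))
```

Because only about $\alpha \cdot n$ residuals enter the variance, the estimate can be noisy for small $\alpha$, which is the difficulty described above.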
We propose two further estimators which allow for a dependence of the quantile residuals on the covariates. For this purpose, we assume a location-scale process with linear [7] specifications of the conditional mean and standard deviation in order to explicitly model the conditional relationship of $u^q$ on $X$,

$$u^q = X'\zeta + (X'\phi) \cdot \varepsilon, \tag{3.2}$$

for some parameter vectors $\zeta, \phi \in \mathbb{R}^k$ and where $\varepsilon \sim G(0, 1)$ follows a zero mean, unit variance distribution, such that $u^q \mid X \sim G\big( X'\zeta, (X'\phi)^2 \big)$ with distribution function $F_G$ and density $f_G$. As we need to estimate the truncated variance of $u^q$ given $u^q \le 0$, i.e. a truncated variant of $(X'\phi)^2$, one possibility is to estimate (3.2) only for those observations where $u^q \le 0$. However, this approach particularly suffers from the very few negative quantile residuals, as we need to estimate additional parameters compared to the ind approach. We present a feasible alternative by estimating the parameters $\zeta$ and $\phi$ using all available observations of $u^q$ and $X$ by quasi generalized pseudo maximum likelihood (Gourieroux and Monfort, 1995, Section 8.4.4), and we obtain the truncated conditional variance by the scaling formula

$$\mathrm{Var}\big( u^q \mid u^q \le 0, X \big) = \int_{-\infty}^0 z^2\, h(z)\, dz - \left( \int_{-\infty}^0 z\, h(z)\, dz \right)^2,$$

where $h(z) = f_G(z) / F_G(0)$ is the truncated conditional density of $u^q$ given $X$ and $u^q \le 0$. We propose one parametric estimator, henceforth denoted by scl-N, where we assume that the distribution $G$ is the normal distribution and apply a closed-form solution to the scaling formula. We further propose a semiparametric estimator, henceforth denoted by scl-sp, where we estimate the distribution $G$ nonparametrically and then apply the scaling formula for this estimated density by numerical integration.

4. Simulation Study

In this section, we investigate the finite sample behavior of the M-estimator and verify the asymptotic properties derived in Section 2.2 through simulations.
Furthermore, we compare the performance of different choices for the specification functions and evaluate the precision of the different covariance matrix estimators described in Section 3.2.

⁷ This approach can further be generalized by considering more general specifications for the conditional mean and standard deviation. However, our numerical experiments indicate that the estimation accuracy for the asymptotic covariance matrix does not increase by deviating from these linear specifications.

4.1. Data Generating Process

In order to assess the numerical properties of estimating the joint regression model, we simulate data from a linear location-scale data generating process (DGP),

$$Y = X'\gamma + (X'\eta) \cdot v, \tag{4.1}$$

where $v \sim F(0, 1)$ has zero mean and unit variance, $X = (1, X_2, \dots, X_k)$ and $\gamma, \eta \in \mathbb{R}^k$. For this process, the true conditional quantile and ES are linear functions in $X$, given by

$$Q_\alpha(Y \mid X) = X'(\gamma + z_\alpha \eta) \quad \text{and} \quad ES_\alpha(Y \mid X) = X'(\gamma + \xi_\alpha \eta), \tag{4.2}$$

where $z_\alpha$ and $\xi_\alpha$ are the $\alpha$-quantile and ES of the distribution $F(0, 1)$, which implies that $\theta_0^q = \gamma + z_\alpha \eta$ and $\theta_0^e = \gamma + \xi_\alpha \eta$. Furthermore, the conditional distributions of the quantile- and ES-residuals are given by

$$u^q \mid X \sim F\big(-z_\alpha (X'\eta), (X'\eta)^2\big) \quad \text{and} \quad u^e \mid X \sim F\big(-\xi_\alpha (X'\eta), (X'\eta)^2\big). \tag{4.3}$$

For the simulation study, we want to assess the performance of our regression procedure in various setups. Thus, we specify $\gamma$, $\eta$ and $F$ in the following such that we get data which is homoscedastic (DGP-(1)) and heteroscedastic (DGP-(2)). Furthermore, we include a regression setup with multiple, correlated regressors and a leptokurtic conditional distribution (DGP-(3)):

DGP-(1): $X = (1, X_2)$, $X_2 \sim \chi^2_1$ and $Y \mid X \sim N(-X_2, 1)$

DGP-(2): $X = (1, X_2)$, $X_2 \sim \chi^2_1$ and $Y \mid X \sim N\big(-X_2, (1 + 0.5 X_2)^2\big)$

DGP-(3): $X = (1, X_2, X_3)$, $X_2, X_3 \sim U[0, 1]$ with $\operatorname{corr}(X_2, X_3) = 0.5$ and $Y \mid X \sim t_5\big(X_2 - X_3, (1 + X_2 + X_3)^2\big)$

We simulate all three processes 25,000 times with varying sample sizes of n = 250, 500, 1000, 2000 and 5000 observations.
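As a worked illustration of (4.2), the following sketch computes the true quantile and ES parameters for a location-scale process with standard normal innovations, using the standard closed form $\xi_\alpha = -\varphi(z_\alpha)/\alpha$ for the lower-tail ES of $N(0, 1)$. The function names and the example values of $\gamma$ and $\eta$ are hypothetical choices made for this sketch (picked to resemble a heteroscedastic normal design); they are not taken from the accompanying R package.

```python
from statistics import NormalDist


def normal_quantile_and_es(alpha):
    """Return (z_alpha, xi_alpha) for v ~ N(0, 1), with xi_alpha = E[v | v <= z_alpha]."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(alpha)
    xi_alpha = -std_normal.pdf(z_alpha) / alpha
    return z_alpha, xi_alpha


def true_parameters(gamma, eta, alpha):
    """True parameters theta^q = gamma + z_alpha * eta and theta^e = gamma + xi_alpha * eta."""
    z_alpha, xi_alpha = normal_quantile_and_es(alpha)
    theta_q = [g + z_alpha * e for g, e in zip(gamma, eta)]
    theta_e = [g + xi_alpha * e for g, e in zip(gamma, eta)]
    return theta_q, theta_e


# Assumed example: conditional mean -X_2 and conditional scale 1 + 0.5 * X_2.
theta_q, theta_e = true_parameters(gamma=[0.0, -1.0], eta=[1.0, 0.5], alpha=0.025)
```

At $\alpha = 2.5\%$ the intercepts are approximately $-1.96$ for the quantile and $-2.34$ for the ES, so the ES parameters lie below the quantile parameters whenever the corresponding entry of $\eta$ is positive.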
For each replication and for each of the sample sizes we regress the simulated $Y$'s on the covariates $X$ using our joint regression method for the probability level $\alpha = 2.5\%$.

4.2. Comparing the Specification Functions

We start the discussion of the simulation results by investigating the numerical performance of the M-estimator based on different choices of the specification function⁸ $\mathcal{G}_2$ used in the loss function in (2.2). We use three natural examples resulting in positively homogeneous loss functions of order $b = -1$, $b = 0$ and $b = 0.5$ respectively⁹, a bounded specification function and the (unbounded) exponential function:

$$\mathcal{G}_2(z) = -1/z, \quad \mathcal{G}_2(z) = -\log(-z), \quad \mathcal{G}_2(z) = -\sqrt{-z}, \quad \mathcal{G}_2(z) = \log\big(1 + \exp(z)\big), \quad \text{and} \quad \mathcal{G}_2(z) = \exp(z). \tag{4.4}$$

Figure 1 presents the sum (over the $2k$ regression parameters) of the mean squared errors (MSE) of the regression parameters for the three DGPs described above, different sample sizes and for the five choices of the specification functions given in (4.4). As implied by the asymptotic theory, we obtain consistent parameter estimates for all five choices of the specification functions as the MSEs converge to zero for all three DGPs. However, they differ substantially with respect to their small sample properties. The three positively homogeneous specifications result in the most accurate estimates, with the choices $\mathcal{G}_2(z) = -\sqrt{-z}$ and $\mathcal{G}_2(z) = -\log(-z)$ tending to perform slightly better than the choice $\mathcal{G}_2(z) = -1/z$. Furthermore, the bounded choice $\mathcal{G}_2(z) = \log\big(1 + \exp(z)\big)$ still performs better than the unbounded exponential function.

⁸ Following the reasoning of Section 2.3 and Nolde and Ziegel (2017); Ziegel et al. (2017), we fix $G_1(z) = 0$ throughout the simulation study.

⁹ Our numerical simulations show that the numerical results are unaffected by different choices of the associated constants in (2.22) - (2.24).
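To make the role of the specification function concrete, the sketch below evaluates a joint quantile/ES loss for the choice $\mathcal{G}_2(z) = -\log(-z)$, for which $G_2(z) = \mathcal{G}_2'(z) = -1/z$. The loss (2.2) is not reproduced in this section, so the form used here is an assumption: the Fissler and Ziegel (2016) type loss with $G_1 = 0$ and $a = 0$, i.e. $G_2(e)\big(e - q + (q - y)\mathbb{1}\{y \le q\}/\alpha\big) - \mathcal{G}_2(e)$. All function and variable names are hypothetical. The expectation is approximated on a deterministic grid of standard normal quantiles instead of a random sample.

```python
import math
from statistics import NormalDist

ALPHA = 0.025


def joint_loss(y, q, e, alpha=ALPHA):
    """Assumed joint loss with G_1 = 0, a = 0 and script-G_2(z) = -log(-z).

    G_2(z) = -1/z is the derivative of script-G_2, so e must be negative.
    """
    if e >= 0:
        raise ValueError("e must be negative for this specification function")
    hit = 1.0 if y <= q else 0.0
    g2 = -1.0 / e
    # The final term is -script-G_2(e) = log(-e).
    return g2 * (e - q + (q - y) * hit / alpha) + math.log(-e)


def expected_loss(q, e, grid):
    """Average loss over a grid of outcomes standing in for the distribution of Y."""
    return sum(joint_loss(y, q, e) for y in grid) / len(grid)


# Deterministic quantile grid approximating Y ~ N(0, 1).
N = 20000
std_normal = NormalDist()
grid = [std_normal.inv_cdf((i + 0.5) / N) for i in range(N)]

z_alpha = std_normal.inv_cdf(ALPHA)            # true quantile
xi_alpha = -std_normal.pdf(z_alpha) / ALPHA    # true ES

loss_true = expected_loss(z_alpha, xi_alpha, grid)
loss_shifted_es = expected_loss(z_alpha, -2.0, grid)
loss_shifted_both = expected_loss(-1.5, -2.0, grid)
```

Under these assumptions the expected loss is smallest at the true pair $(z_\alpha, \xi_\alpha)$, which is the strict consistency property that M-estimation relies on. Swapping in another $\mathcal{G}_2$ from (4.4) only requires changing `g2` and the final term.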
[Figure 1: three panels (DGP-(1), DGP-(2), DGP-(3)); x-axis: Sample Size (250, 500, 1000, 2000, 5000); y-axis: Mean Squared Error; one line per specification function $\mathcal{G}_2(z)$: $-\log(-z)$, $-\sqrt{-z}$, $-1/z$, $\log(1 + \exp(z))$, $\exp(z)$.]

Figure 1: Sum of the mean squared errors of the parameter estimates for all three DGPs. The results are shown for the five choices of the specification functions given in (4.4) and a range of sample sizes.

Table 1 reports the Frobenius norms of the lower triangular parts of the true asymptotic covariance matrices and of the respective (lower triangular) quantile-specific and ES-specific sub-matrices for the three DGPs and for the five choices of the specification functions given in (4.4). For comparison, we also report the Frobenius norm of the lower triangular part of the asymptotic covariance of the quantile regression estimator. We approximate the true asymptotic covariance matrix through Monte-Carlo integration with a sample size of 10 using the formulas in Theorem 2.6 and by using the true density and conditional truncated variance. On average, the specification functions $\mathcal{G}_2(z) = -\log(-z)$ and $\mathcal{G}_2(z) = -\sqrt{-z}$ exhibit the smallest asymptotic covariances, closely followed by the third choice for a positively homogeneous loss function, $\mathcal{G}_2(z) = -1/z$. The non-homogeneous choices lead to considerably larger asymptotic variances for all considered DGPs and sub-matrices. Furthermore, the quantile-specific parameter estimates of the joint estimation approach (for the positively homogeneous loss functions) attain roughly the same asymptotic efficiency as the quantile regression estimates.
Table 1: This table reports the Frobenius norms of the lower triangular parts of the asymptotic covariance matrices and the respective quantile-specific and the ES-specific sub-matrices for the three DGPs and for the five choices of the specification functions given in (4.4). For comparison, we report the same quantity for the asymptotic covariance of the quantile regression estimator.

| | DGP-(1) Q | DGP-(1) ES | DGP-(1) Full | DGP-(2) Q | DGP-(2) ES | DGP-(2) Full | DGP-(3) Q | DGP-(3) ES | DGP-(3) Full |
|---|---|---|---|---|---|---|---|---|---|
| $\mathcal{G}_2(z) = -\log(-z)$ | 7.5 | 13.1 | 9.2 | 17.9 | 26.9 | 20.0 | 581.1 | 1739.1 | 1053.0 |
| $\mathcal{G}_2(z) = -\sqrt{-z}$ | 7.0 | 11.8 | 8.4 | 18.0 | 25.4 | 19.3 | 584.5 | 1740.1 | 1054.4 |
| $\mathcal{G}_2(z) = -1/z$ | 9.1 | 16.9 | 11.8 | 24.1 | 39.4 | 28.5 | 613.7 | 1851.9 | 1119.8 |
| $\mathcal{G}_2(z) = \log(1 + \exp(z))$ | 15.4 | 21.5 | 16.6 | 72.4 | 80.1 | 67.1 | 987.9 | 2393.0 | 1496.4 |
| $\mathcal{G}_2(z) = \exp(z)$ | 15.8 | 22.6 | 17.2 | 74.6 | 84.5 | 70.0 | 1001.9 | 2440.4 | 1524.6 |
| Quantile Regression | 6.8 | – | – | 21.4 | – | – | 600.5 | – | – |

4.3. Comparing the Variance-Covariance Estimators

In this section, we compare the empirical performance of the asymptotic covariance estimators discussed in Section 3.2. For the comparison of their precision, Figure 2 reports the average of the Frobenius norm of the lower triangular part of the differences between the estimated covariances and the empirical covariance of the estimated parameters.
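The accuracy measure used here can be written down directly. The following minimal sketch computes the Frobenius norm of the lower triangular part of a matrix and applies it to the difference of two covariance matrices. It assumes that the diagonal counts as part of the lower triangle, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def lower_triangular_frobenius(matrix):
    """Frobenius norm of the lower triangular part of a square matrix (diagonal included)."""
    total = 0.0
    for i, row in enumerate(matrix):
        for j in range(i + 1):
            total += row[j] ** 2
    return math.sqrt(total)


def covariance_distance(estimated, empirical):
    """Norm of the lower triangular part of (estimated - empirical)."""
    diff = [[a - b for a, b in zip(row_est, row_emp)]
            for row_est, row_emp in zip(estimated, empirical)]
    return lower_triangular_frobenius(diff)


# Toy usage with two 2 x 2 matrices.
distance = covariance_distance([[2.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [3.0, 2.0]])
```

Restricting the norm to the lower triangle avoids counting each off-diagonal covariance twice, since covariance matrices are symmetric.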
We report results for the three homogeneous loss functions and the three DGPs, where each of the plots presents the average norm differences for the four covariance estimators (iid/ind, nid/scl-N, nid/scl-sp and the iid bootstrap) depending on the sample size.

[Figure 2: 3 x 3 grid of panels; rows: DGP-(1), DGP-(2), DGP-(3); columns: $\mathcal{G}_2(z) = -\log(-z)$, $\mathcal{G}_2(z) = -\sqrt{-z}$, $\mathcal{G}_2(z) = -1/z$; x-axis: Sample Size (250, 500, 1000, 2000, 5000); y-axis: Frobenius Norm; lines: iid/ind, nid/scl-N, nid/scl-sp, Bootstrap.]

Figure 2: This figure compares four covariance estimation approaches described in Section 3.2 for the three data generating processes, a range of sample sizes and the three positively homogeneous choices of the $\mathcal{G}_2$-functions. We report the average of the Frobenius norm of the lower triangular part of the differences between the estimated asymptotic covariances and the empirical covariance of the M-estimator.

We find that the iid/ind estimator performs well for the first, homoscedastic DGP, whereas for the other two DGPs, it fails to capture the more complicated underlying dynamics of the data.
The nid/scl-N estimator outperforms the other estimation approaches in the first two DGPs, where the underlying conditional distribution follows a normal distribution, whereas its performance drops for the third DGP, which follows a Student-t distribution. The performance of the flexible nid/scl-sp estimator is the most stable throughout all three DGPs. Finally, the bootstrap estimator accurately estimates the covariance for all three DGPs and, in comparison to the other estimators, it is particularly accurate in small samples. The provided R package contains all four covariance estimators.

5. Empirical Application

In this empirical application, we use our joint regression framework for forecasting the VaR and ES of the close-to-close log returns of the IBM stock. For that purpose, we adopt the forecasting framework of Žikeš and Baruník (2016) and jointly forecast the VaR and ES of daily financial returns $r_t$ by

$$Q_\alpha(r_t \mid RV_{t-1}) = \theta_1^q + \theta_2^q RV_{t-1} \quad \text{and} \quad ES_\alpha(r_t \mid RV_{t-1}) = \theta_1^e + \theta_2^e RV_{t-1}, \tag{5.1}$$

where $RV_t = \big(\sum_i r_{t,i}^2\big)^{1/2}$ denotes the realized volatility estimator (Andersen and Bollerslev, 1998) for day $t$, where $r_{t,i}$ denotes the $i$-th high-frequency return of day $t$. Our dataset consists of the five-minute returns of the IBM stock from January 3, 2001 to July 18, 2017 with a total of 4120 days, which we obtain from the TAQ database. We estimate the model parameters using a rolling window of 1000 days and evaluate the forecasts on the remaining 3120 days.

We compare the predictive power of this model against three standard models from the literature. The first is the historical simulation (HS) approach, which forecasts the VaR and ES for day $t$ as the sample quantile and ES of the daily returns of the past 250 trading days. The second is an AR(1)-GARCH(1,1)-t model (Bollerslev, 1986), and the third is the Heterogeneous Auto-Regressive (HAR) model of Corsi (2009), based on the realized volatility estimates given above.
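The forecasting rule in (5.1) and the historical simulation benchmark can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the parameter values in the usage lines are made up, and the empirical quantile convention (the $\lceil \alpha n \rceil$-th smallest return, with the ES as the mean of all returns up to and including it) is an assumption, since the text does not specify which sample quantile definition is used.

```python
import math


def realized_volatility(intraday_returns):
    """Square root of the sum of squared intraday returns of one day."""
    return math.sqrt(sum(r * r for r in intraday_returns))


def joint_forecast(theta_q, theta_e, rv_previous_day):
    """Linear VaR and ES forecasts of the form (5.1) for parameter pairs (theta_1, theta_2)."""
    var_forecast = theta_q[0] + theta_q[1] * rv_previous_day
    es_forecast = theta_e[0] + theta_e[1] * rv_previous_day
    return var_forecast, es_forecast


def historical_simulation(daily_returns, alpha=0.025, window=250):
    """Sample quantile and sample ES of the most recent `window` daily returns."""
    recent = sorted(daily_returns[-window:])
    k = max(1, math.ceil(alpha * len(recent)))  # number of tail observations
    var_forecast = recent[k - 1]
    es_forecast = sum(recent[:k]) / k
    return var_forecast, es_forecast


# Toy usage with made-up numbers.
rv = realized_volatility([0.003, -0.004])
var_f, es_f = joint_forecast((-0.01, -1.5), (-0.02, -2.0), rv_previous_day=0.01)
hs_var, hs_es = historical_simulation([i / 1000 for i in range(-20, 20)], alpha=0.25, window=40)
```

With $\alpha = 2.5\%$ and a window of 250 days, the historical simulation forecast under this convention rests on only seven tail observations, whereas the regression forecast reacts to the previous day's realized volatility.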
Forecasts of the VaR and ES for the HAR model are obtained from the volatility forecasts and by assuming a Gaussian return distribution. While the first two of these approaches rely on daily data only, the third one incorporates the same high-frequency information as our approach.

We evaluate the VaR and ES forecasts of these models using the class of strictly consistent loss (scoring) functions for the VaR and ES of Fissler and Ziegel (2016). We use Murphy diagrams introduced by Ehm et al. (2016) and Ziegel et al. (2017), which provide a parsimonious way to evaluate competing forecasts simultaneously for a full class of strictly consistent loss functions. In fact, one forecasting model significantly dominates another one with respect to the full class of strictly consistent loss functions if and only if the elementary score differences plotted in the Murphy diagrams are strictly negative (positive). For further details on the theory and the implementation of Murphy diagrams, we refer to Ehm et al. (2016) and Ziegel et al. (2017).

[Figure 3: three panels (Difference to HS, Difference to GARCH, Difference to HAR); x-axis: Threshold (−0.10 to 0.00); y-axis: Score Difference (−0.008 to 0.000).]

Figure 3: Elementary Score Differences of the VaR/ES Regression and the respective comparison models

Figure 3 displays the average of the elementary score differences of the joint VaR and ES regression model against the three alternative models together with the respective 95% pointwise confidence bands for the elementary scores provided in Ziegel et al. (2017) for the pair VaR and ES. Using this graphical method, we can see that the elementary score differences for the joint regression forecasting model against the historical simulation and AR(1)-GARCH(1,1)-t model are significantly negative for the vast majority of threshold values. This implies that the joint regression forecasting model significantly dominates these other two forecasting approaches.
Even though we also observe strictly negative elementary score differences in comparison against the HAR model, these differences are not significant and consequently, our model does not significantly outperform the HAR model.

6. Conclusion

In this paper, we introduce a joint regression technique for the quantile (the VaR) and the ES. This regression approach relies on the class of strictly consistent joint loss functions introduced by Fissler and Ziegel (2016), which permits the joint elicitation of the quantile and the ES. We introduce an M- and a Z-estimator for the parameters of the joint regression model. Given a set of standard regularity conditions, we show consistency and asymptotic normality for both estimators, which we also verify numerically through extensive simulations. The underlying loss functions, the estimating equations and the asymptotic covariance matrices of the estimators depend on the choice of two specification functions, which we investigate in terms of the resulting moment conditions, asymptotic efficiency, numerical performance and computation times. In our numerical simulations, we find that choices resulting in positively homogeneous loss functions dominate other choices with respect to the aforementioned criteria. Furthermore, we propose several estimation methods for the asymptotic covariance matrix, which are able to cope with different properties of the underlying data. We provide an R package (see Bayer and Dimitriadis, 2017a), which implements the M- and Z-estimation procedures where one can choose the underlying specification functions, the numerical optimization approach and the estimation method for the asymptotic covariance matrix.

Our new joint regression technique allows for a wide range of applications for the risk measures VaR and ES. This regression approach can be used to model the ES (jointly with the VaR) by generalizing existing applications of quantile regression on VaR, such as
in Koenker and Xiao (2006), Engle and Manganelli (2004), Chernozhukov and Umantsev (2001), Žikeš and Baruník (2016), Halbleib and Pohlmeier (2012), Komunjer (2013) and Xiao et al. (2015). As an illustration, we present an empirical application in this paper where we use this regression framework to jointly forecast VaR and ES based on realized volatility estimates. Furthermore, Bayer and Dimitriadis (2017b) use this regression to develop an ES backtest which is particularly relevant in light of the recent introduction of ES into the Basel regulatory framework and the present lack of accurate backtesting methods for the ES.

Acknowledgements

We thank Tobias Fissler, Lyudmila Grigoryeva, Roxana Halbleib, Phillip Heiler, Frederic Menninger, Winfried Pohlmeier, Patrick Schmidt, Johanna Ziegel and the participants of the Stochastics Colloquium on 11/30/2016 at the University of Konstanz for fruitful discussions and suggestions which inspired some of the results of this paper. Financial support by the Heidelberg Academy of Sciences and Humanities (HAW) within the project “Analyzing, Measuring and Forecasting Financial Risks by means of High-Frequency Data”, by the German Research Foundation (DFG) within the research group “Robust Risk Measures in Real Time Settings” and general support by the Graduate School of Decision Sciences (University of Konstanz) is gratefully acknowledged. The computation in this work was performed on the computational resource bwUniCluster funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC.

Appendix A Finite Moment Conditions

For convenience of the supremum notation, for all $\theta \in \operatorname{int}(\Theta)$ and for $d > 0$, we define the open neighborhood $U_d(\theta) = \{\tau \in \Theta : \|\tau - \theta\| < d\}$ and its closure $\bar{U}_d(\theta) = \{\tau \in \Theta : \|\tau - \theta\| \le d\}$.
d d (M-1) For Theorem 2.4, we assume that the following moments are finite for some d > 0: ¹1º ¹2º 2 0 q 3 0 e • E»jjXjj sup jG ¹X  ºj¼ • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º 2U ¹ º d 0 1 d 0 2 0 0 ¹2º 2 0 q • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º 1 ¹1º d 0 2 0 e • E»jjXjj sup jG ¹X  ºj E»jYjjX¼¼ 2U ¹ º d 0 2 2 0 e • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º d 0 ¹1º ¹2º 3 0 e 2 0 e • E»jjXjj sup jG ¹X  ºj¼ • E»jjXjj sup jG ¹X  ºj E»jYjjX¼¼ 2U ¹ º 2U ¹ º d 0 2 d 0 2 0 0 (M-2) For Theorem 2.5, we assume that the following moments are finite: 15 2 0 e • E»jjXjj ¼ • E»jjXjj sup jG ¹X  ºj¼ 0 q • E»sup jG ¹X  ºj¼ 2 0 e • E»sup jG ¹X  ºj E»jYjjX¼¼ • E»jG ¹Yºj¼ 0 e • E»ja¹Yºj¼ • E»sup jG ¹X  ºj¼ (M-3) For Theorem 2.6, we assume that the following moments are finite for some constant d > 0 and for all  2 U ¹ º: d 0 ¹1º ¹2º 3 0 q 0 q • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 1 d 0 1 0 0 ¹1º ¹1º 3 0 q 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 1 d 0 2 0 0 ¹2º 3 0 e 0 q • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 d 0 1 0 0 ¹1º 3 0 e 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ 2 ¯ 2U ¹ º  ˜2U ¹ º d 0 d 0 0 0 ¹1º 3 0 q 2 • E»jjXjj sup ¹G ¹X  ºº ¼ 2U ¹ º d 0 1 3 0 e 2 • E»jjXjj sup ¹G ¹X  ºº ¼ 2U ¹ º d 0 ¹1º 3 0 q 0 e • E»jjXjj sup G ¹X  ºG ¹X  º¼ 2U ¹ º d 0 1 ¹1º ¹2º 5 0 e 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ºº¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 2 d 0 2 0 0 ¹1º 5 0 e 2 • E»jjXjj ¹sup G ¹X  ºº ¼ 2U ¹ º d 0 2 ¹1º ¹2º 4 0 e 0 e • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ººE»jYjjX¼¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º 2 2 d 0 d 0 0 0 ¹1º ¹1º 3 0 e 0 e • E»jjXjj G ¹X  º¹sup G ¹X  ººE»jYjjX¼¼ 2U ¹ º 2 d 0 2 ¹1º ¹2º 3 0 e 0 e 2 • E»jjXjj G ¹X  º¹sup G ¹X  ººE»Y jX¼¼ 2U ¹ º 2 d 0 2 ¹1º ¹2º 3 0 e 0 e 2 • E»jjXjj ¹sup G ¹X  ºº¹sup G ¹X  ˜ ººE»Y jX¼¼ ¯ ¯ 2U ¹ º  ˜2U ¹ º d 0 2 d 0 2 0 0 (M-4) For Theorem 2.7, we assume that the following moments are finite for some constant d > 0: ¹1º 2 0 e • E»jG ¹Yºj¼ • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º d 0 2 2 0 e 2 • E»ja¹Yºj¼ • E»jjXjj sup ¹G 
¹X  ºº ¼ 2U ¹ º d 0 ¹1º 0 q ¹1º 4 0 e 2 • E»jjXjj sup jG ¹X  ºj¼ 2U ¹ º • E»jjXjj sup ¹G ¹X  ºº ¼ d 0 1 0 2U ¹ º 0 2 ¹1º ¹1º 2 0 q 2 0 e • E»jjXjj sup ¹G ¹X  ºº ¼ ¯ • E»jjXjj sup jG ¹X  ºj E»jYjjX¼¼ 2U ¹ º ¯ d 0 1 2U ¹ º d 0 2 ¹1º ¹1º 3 0 e 2 2 0 q 0 e • E»jjXjj sup ¹G ¹X  ºº E»jYjjX¼¼ • E»jjXjj sup jG ¹X  ºG ¹X  ºj¼ ¯ ¯ 2 2U ¹ º 2U ¹ º d 0 2 d 0 1 ¹1º 0 e 2 0 e 2 2 • E»jjXjj sup jG ¹X  ºj¼ • E»jjXjj sup ¹G ¹X  ºº E»Y jX¼¼ ¯ 2 ¯ 2U ¹ º 2U ¹ º d 0 d 0 2 0 0 Appendix B Proofs Henceforth, jjvjj denotes the maximum norm for a vector v 2 R and for a matrix A, jjAjj denotes the row-sum matrix norm which is induced by the maximum norm for vectors. For convenience of the supremum notation, for all  2 int¹º and for some d > 0, we define the open neighborhood U ¹º = f 2  : jj jj < dg and its closure U ¹º = f 2  : jj jj  dg. All references to d d Appendix C refer to the online supplement Dimitriadis and Bayer (2017). Proof of Theorem 2.4. We apply Theorem 2 from Huber (1967) and show that the function ¹Y; X; º as given in (2.3) satisfies the respective assumptions of this theorem. Note that the parameter space is assumed to be compact and thus, we do not have to show condition (B-4) in the notation of Huber 16 (1967). As the product of continuous functions and the indicator function 1 0 q , the function is fYX  g measurable and regarded as a stochastic process in , is separable in the sense of Doob as it is almost surely continuous in  (Gikhman and Skorokhod, 2004, p.164). This condition assures measurability of the suprema10 given below and in Lemma C.1. In oder to show that has a unique root at  , let us first define the sets q q 0 q 0 0 q 0 U = ! 2 X¹!º  , X¹!º  ; and W = ! 2 X¹!º  = X¹!º  ; (B.1) 0 0 for all  2  such that = W [ U and W \ U = ;. We first show that P¹U º > 0 for all  ,  . In order to see this, we assume the converse, i.e. 
let us assume that for a fixed  ,  , it holds that 0 q 0 P¹W º = P X  = X  = 1, which implies that q q q q 0 0 q 0 q 0 ¹  º E»X X ¼¹  º = E X  X  = 0: (B.2) 0 0 0 q 0 However, since  ,  , this contradicts the assumption that the matrix E»X X ¼ is positive definite and we can conclude that P¹U º > 0. The quantity h i ¹1º q 0 q 0 e 0 q 0 ¹º = E ¹Y; X; º = 1 E X G ¹X  º + G ¹X  º F ¹X  º F ¹X  º 1 1 2 YjX YjX 1 0 exists under the moment conditions (M-1) in Appendix A and if  =  , it holds that  ¹º = 0. Now, we assume that  2  such that  ,  . By splitting the expectation, we get that 0 q ¹º ¹  º h i ¹1º q q 0 q 0 e 0 q 0 0 q 0 = 1 E G ¹X  º + G ¹X  º X  X  F ¹X  º F ¹X  º 1 2 YjX YjX f!2W g 1 0 0 h i ¹1º q q 0 q 0 e 0 q 0 0 q 0 + 1 E G ¹X  º + G ¹X  º X  X  F ¹X  º F ¹X  º 1 : 2 YjX YjX f!2U g 1 0 0 0 q 0 The first summand is obviously zero since for all ! 2 W , F ¹X  º F ¹X  º = 0. Since the YjX YjX distribution of Y given X has strictly positive density in a neighbourhood of X  , we get that F is YjX strictly increasing in a neighbourhood of X  and thus q q 0 q 0 0 q 0 X  X  F ¹X  º F ¹X  º > 0 (B.3) YjX YjX 0 0 ¹1º 0 q 0 e for all ! 2 U . Furthermore, since G ¹X  º + G ¹X  º > 0 for all  2  and P¹U º > 0, we get that 0 q ¹º ¹  º h i ¹1º q q 0 q 0 e 0 q 0 0 q 0 = 1 E G ¹X  º + G ¹X  º X  X  F ¹X  º F ¹X  º 1 > 0; 2 YjX YjX f!2U g 1 0 0 and consequently  ¹º , 0. This implies that  ¹º = 0 if and only if  =  . Furthermore, 1 1 h i ¹1º 0 e 0 q 0 q 0 e ¹º = E XG ¹X  º X  F ¹X  º  + X  1 E Y1 0 q X : (B.4) 2 YjX fYX  g q q q 0 q 0 Assuming that  =  , which results from  ¹º = 0, we get that F ¹X  º = F ¹X  º = and 1 YjX YjX 0 0 ¹1º 0 e 0 0 e e e 1 E Y1 X = X  . Thus, (B.4) simplifies to E ¹X X ºG ¹X  º   and by applying fYX  g 0 2 0 ¹1º 0 0 e Lemma C.2, we get that the matrix E ¹X X ºG ¹X  º is positive definite for all  2 . Consequently, e e ¹º = 0 if and only if  =  and together with the arguments for  , we get that ¹º = 0 if and only if 2 1 =  . 
Eventually, assumption (B-2)’ from Theorem 2 of Huber (1967) follows directly from Lemma C.1, which concludes this proof. 10 Many other authors such as e.g. Newey and McFadden (1994); Andrews (1994); van der Vaart (1998) rely on outer probability in order to avoid these measurability issues. 17 Proof of Theorem 2.5. For this proof, we apply Theorem 5.7 from van der Vaart (1998) and show that the respective assumptions of this theorem hold. As in the proof of Theorem 2.6, we can conclude measurability of the suprema since the process  is continuous and consequently separable in the sense of Doob. Thus, we do not have to rely on outer probability measures such as in van der Vaart (1998). We start by showing uniform convergence in probability of the empirical mean of the objective function by the help of Lemma 2.4 of Newey and McFadden (1994). Since we have iid data, a compact parameter space and ¹Y; X; º is continuous for all  2 , it remains to show that there exists a dominating function d¹Y; Xº  j¹Y; X; º for all  2  with E d¹Y; Xº < 1. We define 0 q 0 e 0 q d¹Y; Xº = supjG ¹X  º + 1 G ¹X  º¹X  Yºj + G ¹Yº 1 2 1 (B.5) 0 e 0 e 0 q 0 e + sup G ¹X  º X  X  + supjG ¹X  ºj + G ¹Yº + a¹Yº 2 2 1 2 2 and it holds that d¹Y; Xº  ¹Y; X; º for all  2  and consequently, we can conclude uniform convergence in probability. We now show that E ¹Y; X; º has a unique and global minimum at  =  . For this, we assume that 2  such that  ,  and we define the sets 0 q 0 0 e 0 e U = ! 2 X¹!º  , X¹!º  or X¹!º  , X¹!º  and (B.6) 0 q 0 0 e 0 e W = ! 2 X¹!º  = X¹!º  and X¹!º  = X¹!º  ; (B.7) such that = U [ W and U \ W = ;. We first show that P¹U º > 0 for all  ,  . In order to see this, we assume the converse, i.e. we assume that P¹W º = 1, which implies that h i q q q 2 q q 0 0 q 0 q 0 0 q ¹  º E»X X ¼¹  º = E X  X  = 0, since P X  = X = 1 and equivalently 0 0 0 0 e e 0 0 e e q e e ¹  º E»X X ¼¹  º = 0. 
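However, since $\theta \ne \theta_0$ and consequently either $\theta^q \ne \theta_0^q$ or $\theta^e \ne \theta_0^e$, this contradicts the assumption that the matrix $E[XX']$ is positive definite and it follows that $P(U_\theta) > 0$.

From the joint elicitability property of the quantile and ES of Fissler and Ziegel (2016), Corollary 5.5, we get that for all $x \in \mathbb{R}^k$ such that $x'\theta^q \ne x'\theta_0^q$ or $x'\theta^e \ne x'\theta_0^e$, it holds that

$$E\big[\rho(Y, X, \theta_0) \mid X = x\big] < E\big[\rho(Y, X, \theta) \mid X = x\big], \tag{B.8}$$

since the distribution of $Y$ given $X$ has a finite first moment and a unique $\alpha$-quantile. Thus, for all $\omega \in U_\theta$,

$$E\big[\rho(Y, X, \theta_0) \mid X\big](\omega) < E\big[\rho(Y, X, \theta) \mid X\big](\omega). \tag{B.9}$$

We now define the random variable

$$h(X, \theta, \theta_0)(\omega) = E\big[\rho(Y, X, \theta_0) \mid X\big](\omega) - E\big[\rho(Y, X, \theta) \mid X\big](\omega), \tag{B.10}$$

and (B.9) implies that $h(X, \theta, \theta_0)(\omega) < 0$ for all $\omega \in U_\theta$. Since $P(U_\theta) > 0$, this implies that $E\big[h(X, \theta, \theta_0)\,\mathbb{1}_{\{\omega \in U_\theta\}}\big] < 0$. Furthermore, for all $\omega \in W_\theta$, it obviously holds that $h(X, \theta, \theta_0)(\omega) = 0$ and consequently $E\big[h(X, \theta, \theta_0)\,\mathbb{1}_{\{\omega \in W_\theta\}}\big] = 0$. Thus, we get that

$$E\big[h(X, \theta, \theta_0)\big] = E\big[h(X, \theta, \theta_0)\,\mathbb{1}_{\{\omega \in U_\theta\}}\big] + E\big[h(X, \theta, \theta_0)\,\mathbb{1}_{\{\omega \in W_\theta\}}\big] < 0 \tag{B.11}$$

for all $\theta \in \Theta$ such that $\theta \ne \theta_0$, which shows that $E[\rho(Y, X, \theta)]$ has a unique minimum at $\theta = \theta_0$.

Proof of Theorem 2.6. We apply Theorem 3 of Huber (1967) for the $\psi$-function as given in (2.3) and show the respective assumptions of this theorem. Consistency of the Z-estimator is shown in Theorem 2.4. For the measurability and separability of the $\psi$ function, we refer to the proof of Theorem 2.4. It is already shown in the proof of Theorem 2.4 that there exists a $\theta_0 \in \Theta$ such that $\lambda(\theta_0) = 0$. For the technical conditions (N-3), we apply Lemma C.3, Lemma C.1 and Lemma C.4. It remains to show that $E\big[\|\psi(Y, X, \theta_0)\|^2\big] < \infty$, which follows from the subsequent computation of $C$ and the Moment Conditions (M-3) in Appendix A.

The asymptotic covariance matrix is given by $\Lambda^{-1} C \Lambda^{-1}$, where $C = E\big[\psi(Y, X, \theta_0)\,\psi(Y, X, \theta_0)'\big]$ and

$$\Lambda = \frac{\partial \lambda(\theta)}{\partial \theta}\bigg|_{\theta = \theta_0} = \begin{pmatrix} \dfrac{\partial \lambda_1(\theta)}{\partial \theta^q} & \dfrac{\partial \lambda_1(\theta)}{\partial \theta^e} \\[2ex] \dfrac{\partial \lambda_2(\theta)}{\partial \theta^q} & \dfrac{\partial \lambda_2(\theta)}{\partial \theta^e} \end{pmatrix}\Bigg|_{\theta = \theta_0} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}. \tag{B.12}$$

Straightforward calculations yield the matrix $C$ as given in (2.8) - (2.10).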
For the computation of , we first notice that the function ¹1º 1 0 q 0 q 0 e F ¹X  º G ¹X  º + G ¹X  º YjX 2 E ¹Y; X; º X =   (B.13) ¹1º 0 e 0 e 0 q 0 q 0 q XG ¹X  º X  X  + E ¹X  Yº1 X fYX  g is continuously differentiable for all  in some neighborhood U ¹ º around  , since the distribution d 0 0 F has a density which is strictly positive, continuous and bounded in this area. Let us choose a value YjX 0 0 ˜ ˜ 2 U ¹ º such that X   X . Then, d 0 @ @ @ 0 q E Y1 X = E Y1 0 q X + E Y1 0 q 0 q X ˜ ˜ fYX  g fYX  g fX  <YX  g q q q @ @ @ ¹ 0 q (B.14) 0 q 0 q = y f ¹yºdy = X¹X  º f ¹X  º: YjX YjX @ 0 q We consequently get that for all  2 U ¹ º, d 0 ¹1º 0 0 q 0 e 0 q E ¹Y; X; º X = 1 ¹X X º G ¹X  º + G ¹X  º f ¹X  º 1 2 YjX ¹2º 0 q 0 q +G ¹X  º F ¹X  º ; YjX @ @ ¹1º 0 0 e 0 q E ¹Y; X; º X = E ¹Y; X; º X = 1 ¹X X ºG ¹X  º F ¹X  º ; 1 2 YjX e q 2 @ @ ¹2º 0 0 e 0 q 0 q 0 e E ¹Y; X; º X = 1 ¹X X ºG ¹X  º X  F ¹X  º + ¹X  º E Y1 0 q X 2 YjX fYX  g ¹1º 0 0 e +¹X X ºG ¹X  º: @ @ In order to conclude that E E ¹Y; X; º X = E E ¹Y; X; º X , we apply a measure-theoretical @ @ version of the Leibniz integration rule, which requires that the derivative of the integrand exists and is absolutely bounded by some integrable function d¹Y; Xº, independent of . For the first term, this can easily be obtained by defining h i ¹1º ¹2º 0 0 q 0 e 0 q 0 q 0 q d¹Y; Xº = sup 1 ¹X X º G ¹X  º + G ¹X  º f ¹X  º + G ¹X  º F ¹X  º ; 2 YjX YjX 1 1 2U ¹ º d 0 which has finite expectation by the Moment Conditions (M-3). The other two terms follow the same reasoning. Inserting  =  eventually shows (2.6) and (2.7). Proof of Theorem 2.7. For this proof, we apply Theorem 5.23 from van der Vaart (1998) and show that the respective assumptions of this theorem hold. Theorem 2.5 shows consistency of the M-estimator. The map¹Y; Xº 7! ¹Y; X; º is obviously measurable as the sum of measurable functions. Furthermore, the map  7! 
¹Y; X; º is almost surely differentiable since the only point of non-differentiability occurs 0 q where Y = X  , which is a nullset with respect to the joint distribution of Y and X and for all  2  such 0 q that Y , X  , its derivative is given by ¹Y; X; º. Local Lipschitz continuity with square-integrable Lipschitz-constant follows from Lemma C.5. We have already seen in the proof of Theorem 2.5 that the function E ¹Y; X; º is uniquely minimized at the point  and is twice continuously differentiable and consequently admits a second-order Taylor expansion at  . Thus, we have shown the necessary 19 assumptions of Theorem 5.23 from van der Vaart (1998). For the computation of the covariance matrix, we notice that the distribution of Y given X has a density f in a neighborhood of X  , which is strictly positive, continuous and bounded. Therefore, by the same YjX 0 @ 0 q 0 q arguments as in (B.14), we get that E G ¹Yº1 0 q X = XG ¹X  º f ¹X  º. Thus, straight- q 1 fYX  g 1 YjX forward calculations yield that for all  2 U ¹ º, it holds that E ¹Y; X; º X = E ¹Y; X; º X and d 0 by applying the Leibniz integration rule such as in the proof of Theorem 2.6, we finally get that E ¹Y; X; º = E ¹Y; X; º : (B.15) Consequently, the asymptotic covariance matrix equals the one given in Theorem 2.6. Appendix C Technical Results Lemma C.1. Let u¹Y; X; ; dº = sup ¹Y; X; º ¹Y; X; º (C.1) 2U ¹º and assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-1) in Appendix A hold. Then, there are strictly positive real numbers b and d , such that E u¹Y; X; ; dº  b d for jj  jj + d  d ; (C.2) 0 0 and for all d  0. Proof of Lemma C.1. For measurability of the suprema, we refer to the proof of Theorem 2.4. Let in the following d > 0 and  2  such that jj  jj + d  d . We first notice that for some fixed X 2 R and 0 0 for all  2 U ¹º, it holds that q q 1 0 q 1 0 q  1 (C.3) 0 0 fYX  g fYX  g fX  YX  g ¯ ¯ for all Y 2 R and for some  ;  2 U ¹º. 
Since U ¹º is compact, we get that d d 0 q 0 q q q sup 1 1  1 0 0 (C.4) fYX  g fYX  g fX  YX  g 2U ¹º q q q q for all Y 2 R and for some values  ;  2 U ¹º. Note that the values  and  depend on X and , + + however they are independent of Y. Consequently, it holds that " # h i q q E sup 1 0 q 1 0 q X  E 1 X 0 0 fYX  g fYX  g fX  YX  g 2U ¹º q q (C.5) 0 0 q 0 q 0 0 q = F X  F X  = f ¹X  º X  X YjX YjX YjX + + 0 q 2jjXjj  sup f ¹X  º d; YjX 2U ¹º q q q ˜ ˜ ¯ where we apply the mean value theorem for some  on the line between  and  , i.e.  2 U ¹º. 20 For the first component of , we get that " # E sup ¹Y; X; º ¹Y; X; º 1 1 2U ¹º " # 0 e 0 e G ¹X  º G ¹X  º ¹1º ¹1º 2 2 0 q 0 q E sup X G ¹X  º G ¹X  º + (C.6) 1 1 2U ¹º " " # # 0 e G ¹X  º ¹1º 2 0 q 0 q 0 q + E sup X G ¹X  º +  E sup 1 1 X : fYX  g fYX  g ¯ ¯ 2U ¹º 2U ¹º d d ¹1º 0 q 0 e The first term in (C.6) is O¹dº since G ¹X  º and G ¹X  º are continuously differentiable functions w.r.t  and thus, by the mean value theorem we get that ¹1º ¹1º ¹2º 0 q 0 q 0 q q q sup G ¹X  º G ¹X  º  sup XG ¹X  ˜ º  sup 1 1 1 ¯ ¯ ¯ 2U ¹º  ˜2U ¹º 2U ¹º d d d (C.7) ¹2º 0 q sup XG ¹X  ˜ º  d; ˜2U ¹º and the respective moments are finite by assumption. The same arguments hold for the function G . For the second term in (C.6), we apply (C.5) and thus get that " " # # 0 e G ¹X  º ¹1º 2 0 q 0 q 0 q E sup X G ¹X  º +  E sup 1 1 X fYX  g fYX  g ¯ ¯ 2U ¹º 2U ¹º d d " # (C.8) 0 e G ¹X  º ¹1º 2 0 q 0 q E sup X G ¹X  º + jjXjj  sup f ¹X  º  d: YjX ¯ ¯ 2U ¹º 2U ¹º d d Since the density f is bounded in a neighborhood of X  and the respective moments are finite by YjX assumption, we get that this term is also O¹dº. For the second component of , we get that " # E sup ¹Y; X; º ¹Y; X; º 2 2 2U ¹º " # ¹1º ¹1º 0 e 0 q 0 e 0 e 0 q 0 e E sup X¹X  X  ºG ¹X  º X¹X  X  ºG ¹X  º 2 2 2U ¹º " " # # ¹1º 0 e 0 q XG ¹X  ºX + E  E sup 1 0 q 1 0 q X fYX  g fYX  g 2U ¹º " " ! 
# # ¹1º ¹1º 0 e 0 q 0 e 0 q XG ¹X  ºX  XG ¹X  ºX 2 2 0 q + E E sup 1 X fYX  g 2U ¹º " " # # ¹1º 0 e XG ¹X  º + E  E sup Y 1 0 q 1 0 q X fYX  g fYX  g 2U ¹º " " # # 0 q Y1 fYX  g ¹1º ¹1º 0 e 0 e + E E sup XG ¹X  º XG ¹X  º X 2 2 2U ¹º = ¹iº +¹iiº +¹iiiº +¹ivº +¹vº: ¹1º 0 e 0 q 0 e The first, third and fifth term are linearly bounded by (C.7) since the functions¹X  X  ºG ¹X  º ¹1º ¹1º 0 q 0 e 0 e and ¹X  ºG ¹X  º and G ¹X  º are continuously differentiable. For the second term, we use the 2 2 21 arguments from (C.5). For the fourth term, we use similar arguments as in (C.5), and get that there exist q q q q q ¯ ˜ some  ;  2 U ¹º and a value  on the line between  and  , such that + + " " # # ¹1º 0 e XG ¹X  º 0 q 0 q E E sup Y 1 1 X fYX  g fYX  g 2U ¹º " # ¹1º 0 e h i XG ¹X  º q q E E jYj 1 X 0 0 fX  YX  g " # ¹ 0 ¹1º 0 e XG ¹X  º (C.9) = E jyj f ¹yºdy YjX " # ¹1º 0 e XG ¹X  º 2 0 q 0 q 0 0 q ˜ ˜ E jX  j f ¹X  º X  X YjX + " # ¹1º 0 e 0 q 0 q E G ¹X  º X sup jX  j f ¹X  º  d = O¹dº YjX 2U ¹º since f is bounded in a neighborhood of X  and the respective moments exist by assumption. This YjX concludes the proof of the lemma. Lemma C.2. Let the random variable X 2 R with distribution P be such that its second moments exist 0 k and the matrix E»X X ¼ is positive definite. Furthermore, let   R be a compact subspace with nonempty interior and let g : R   ! R be a strictly positive function. Then, the matrix E ¹X X ºg¹X; º (C.10) is also positive definite. 0 k Proof of Lemma C.2. Since E»X X ¼ is positive definite, we know that for all z 2 R with z , 0, it holds 0 0 0 0 0 2 0 that 0 < z E»X X ¼z = E»z ¹X X ºz¼ = E»¹X zº ¼ and consequently P X z , 0 > 0. Since g¹X; º is a strictly positive scalar for all  2 , it also holds that P ¹X zº g¹X; º , 0 > 0 and thus, for all z , 0, 0 0 0 z E ¹X X ºg¹X; º¼z = E X z g¹X; º > 0: (C.11) p p 0 0 This positivity statement holds since X z g¹X; º is a non-negative random variable andP ¹X zº g¹X; º , 0 > 0. 
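This shows that the matrix $E\big[ (XX') \, g(X,\theta) \big]$ is positive definite.

Lemma C.3. Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-3) in Appendix A hold. Then, for
\[
\lambda(\theta) = E\big[ \psi(Y,X,\theta) \big], \tag{C.12}
\]
there are strictly positive numbers $a, d_0$, such that
\[
\|\lambda(\theta)\| \ge a \, \|\theta - \theta_0\| \quad \text{for} \quad \|\theta - \theta_0\| \le d_0. \tag{C.13}
\]

Proof of Lemma C.3. Let $d_0 > 0$ and let $\|\theta - \theta_0\| \le d_0$. Then, applying the mean value theorem, we get that
\[
\lambda_1(\theta) = E\Big[ (XX') \Big( G_1^{(1)}(X'\theta^q) + \frac{G_2(X'\theta^e)}{\alpha} \Big) f_{Y|X}(X'\tilde\theta^q) \Big] (\theta^q - \theta_0^q) \tag{C.14}
\]
for some $\tilde\theta^q$ on the line between $\theta^q$ and $\theta_0^q$. Similarly, for the second component we get that
\[
\lambda_2(\theta) = E\Big[ X \, \frac{G_2^{(1)}(X'\theta^e) \, f_{Y|X}(X'\tilde\theta^q)}{2\alpha} \, \big( X'(\theta^q - \theta_0^q) \big)^2 \Big] + E\big[ (XX') \, G_2^{(1)}(X'\theta^e) \big] (\theta^e - \theta_0^e), \tag{C.15}
\]
where $\tilde\theta^q$ lies on the line between $\theta^q$ and $\theta_0^q$.

We first assume that $\|\theta - \theta_0\| = \|\theta^q - \theta_0^q\|$, i.e. $\|\theta^q - \theta_0^q\| \ge \|\theta^e - \theta_0^e\|$. Since the matrix
\[
A(\theta) := E\Big[ (XX') \Big( G_1^{(1)}(X'\theta^q) + \frac{G_2(X'\theta^e)}{\alpha} \Big) f_{Y|X}(X'\tilde\theta^q) \Big] \tag{C.16}
\]
exists and has full rank for all $\theta \in \Theta$ by Lemma C.2 and is obviously symmetric, $A(\theta)$ has strictly positive real eigenvalues $\gamma_1(\theta), \dots, \gamma_k(\theta)$ with minimum $\gamma_{(1)}(\theta)$, and we thus get that (see footnote 11)
\[
\begin{aligned}
\|\lambda(\theta)\| \ge \|\lambda_1(\theta)\| = \|A(\theta)(\theta^q - \theta_0^q)\| &\ge \gamma_{(1)}(\theta) \, \|\theta^q - \theta_0^q\| & \text{(C.17)} \\
&\ge \inf_{\|\theta - \theta_0\| \le d_0} \gamma_{(1)}(\theta) \cdot \|\theta^q - \theta_0^q\| = c_1 \|\theta - \theta_0\|. & \text{(C.18)}
\end{aligned}
\]
Since $\{\theta : \|\theta - \theta_0\| \le d_0\}$ is a compact set and the function $\theta \mapsto \gamma_{(1)}(\theta)$, where $\gamma_{(1)}(\theta)$ is the smallest eigenvalue of the matrix $A(\theta)$, is continuous (see footnote 12), we get that the infimum coincides with the minimum and thus, the constant $c_1 := \inf_{\|\theta - \theta_0\| \le d_0} \gamma_{(1)}(\theta)$ is strictly positive and does not depend on $\theta$.

Now, we assume that $\|\theta - \theta_0\| = \|\theta^e - \theta_0^e\| \le d_0$, i.e. $\|\theta^e - \theta_0^e\| \ge \|\theta^q - \theta_0^q\|$.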
For the first term of $\lambda_2(\theta)$, given in (C.15), we define the vector
\[
b(\theta) := E\Big[ X \, \frac{G_2^{(1)}(X'\theta^e) \, f_{Y|X}(X'\tilde\theta^q)}{2\alpha} \, (X'\theta^q - X'\theta_0^q)^2 \Big], \tag{C.19}
\]
and for its $l$-th component, we get that
\[
\begin{aligned}
|b_l(\theta)| &= \Big| \sum_{i,j} (\theta_i^q - \theta_{0i}^q)(\theta_j^q - \theta_{0j}^q) \, E\Big[ X_l X_i X_j \, \frac{G_2^{(1)}(X'\theta^e) \, f_{Y|X}(X'\tilde\theta^q)}{2\alpha} \Big] \Big| \\
&\le \sum_{i,j} E\Big[ \Big| X_l X_i X_j \, \frac{G_2^{(1)}(X'\theta^e) \, f_{Y|X}(X'\tilde\theta^q)}{2\alpha} \Big| \Big] \cdot |\theta_i^q - \theta_{0i}^q| \cdot |\theta_j^q - \theta_{0j}^q| \\
&\le c_2 \sum_{i,j} |\theta_i^q - \theta_{0i}^q| \cdot |\theta_j^q - \theta_{0j}^q| \\
&\le c_2 \, k^2 \, \|\theta - \theta_0\|^2,
\end{aligned} \tag{C.20}
\]
for all $l = 1, \dots, k$, which implies that
\[
\|b(\theta)\| \le c_3 \|\theta - \theta_0\|^2, \tag{C.21}
\]
for some $c_3 > 0$. For $D(\theta) := E\big[ (XX') \, G_2^{(1)}(X'\theta^e) \big]$, it holds that $\|D(\theta)(\theta^e - \theta_0^e)\| \ge c_4 \|\theta^e - \theta_0^e\| = c_4 \|\theta - \theta_0\|$ for $c_4 > 0$ by the same arguments as in (C.17).

Footnote 11. For a symmetric matrix $A$ with full rank, we can find an orthogonal basis of eigenvectors $\{v_1, \dots, v_k\}$ with corresponding nonzero eigenvalues $\{\gamma_1, \dots, \gamma_k\}$ such that $x = \sum_j b_j v_j$ with $b_j \in \mathbb{R}$. Then, $\|Ax\| = \|A \sum_j b_j v_j\| = \|\sum_j b_j A v_j\| = \|\sum_j b_j \gamma_j v_j\| \ge \min_j |\gamma_j| \cdot \|\sum_j b_j v_j\| = \min_j |\gamma_j| \cdot \|x\|$.

Footnote 12. This follows since the entries of the matrix $A(\theta)$ are continuous in $\theta$, as the expectation of a continuous function which is dominated by an integrable function is again continuous by the dominated convergence theorem. Furthermore, the eigenvalues of a matrix are the solutions of the characteristic polynomial, which has continuous coefficients since our matrix entries are continuous in $\theta$. Eventually, since the roots of any polynomial with continuous coefficients are again continuous, we can conclude that the eigenvalues of $A(\theta)$ are continuous in $\theta$.
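From (C.20), we can choose $d_0$ small enough such that
\[
2\|b(\theta)\| \le 2 c_3 \|\theta - \theta_0\|^2 \le c_4 \|\theta^e - \theta_0^e\| \le \|D(\theta)(\theta^e - \theta_0^e)\|. \tag{C.22}
\]
Furthermore, by the submultiplicativity of the matrix norm, we also get that $\|D(\theta)(\theta^e - \theta_0^e)\| \le \|D(\theta)\| \cdot \|\theta^e - \theta_0^e\| = c \, \|\theta^e - \theta_0^e\|$ and by the inverse triangle inequality, we get that
\[
\|\lambda(\theta)\| \ge \|\lambda_2(\theta)\| = \big\| D(\theta)(\theta^e - \theta_0^e) + b(\theta) \big\| \ge \Big| \|D(\theta)(\theta^e - \theta_0^e)\| - \|b(\theta)\| \Big|. \tag{C.23}
\]
From (C.22), we can choose $d_0$ small enough such that $\|D(\theta^e - \theta_0^e)\| > 2\|b\|$ and thus
\[
\begin{aligned}
\Big| \|D(\theta^e - \theta_0^e)\| - \|b\| \Big| = \|D(\theta^e - \theta_0^e)\| - \|b\| &\ge \frac{1}{2} \|D(\theta^e - \theta_0^e)\| & \text{(C.24)} \\
&\ge \frac{c_4}{2} \|\theta^e - \theta_0^e\| = \frac{c_4}{2} \|\theta - \theta_0\|. & \text{(C.25)}
\end{aligned}
\]

Lemma C.4. Let
\[
u(Y,X,\theta,d) = \sup_{\tau \in \bar U_d(\theta)} \big\| \psi(Y,X,\tau) - \psi(Y,X,\theta) \big\| \tag{C.26}
\]
and assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-3) in Appendix A hold. Then, there are strictly positive numbers $c$ and $d_0$, such that
\[
E\big[ u(Y,X,\theta,d)^2 \big] \le c \, d \quad \text{for} \quad \|\theta - \theta_0\| + d \le d_0, \tag{C.27}
\]
and for all $d \ge 0$.

Proof of Lemma C.4. Let in the following $d > 0$ and $\theta \in \Theta$ such that $\|\theta - \theta_0\| + d \le d_0$. It holds that
\[
\Big( \sup_{\tau \in \bar U_d(\theta)} \big\| \psi(Y,X,\tau) - \psi(Y,X,\theta) \big\| \Big)^2 = \sup_{\tau \in \bar U_d(\theta)} \big\| \psi(Y,X,\tau) - \psi(Y,X,\theta) \big\|^2 \tag{C.28}
\]
and consequently, we show that
\[
E\Big[ \sup_{\tau \in \bar U_d(\theta)} \big\| \psi_j(Y,X,\tau) - \psi_j(Y,X,\theta) \big\|^2 \Big] = O(d) \tag{C.29}
\]
for both components $j = 1, 2$ and for some $d > 0$ small enough.

For the first squared component, we get that
\[
\begin{aligned}
& E\Big[ \sup_{\tau \in \bar U_d(\theta)} \big\| \psi_1(Y,X,\tau) - \psi_1(Y,X,\theta) \big\|^2 \Big] \\
&\quad \le \max\{\alpha, 1-\alpha\} \cdot E\Big[ \sup_{\tau \in \bar U_d(\theta)} \|X\|^2 \, \Big| G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha} - G_1^{(1)}(X'\theta^q) - \frac{G_2(X'\theta^e)}{\alpha} \Big|^2 \Big] \\
&\qquad + E\Big[ \sup_{\tau \in \bar U_d(\theta)} \|X\|^2 \, \Big| G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha} \Big|^2 \cdot 2\|X\| \sup_{\tau \in \bar U_d(\theta)} f_{Y|X}(X'\tau^q) \Big] \cdot d \\
&\qquad + 2\max\{1-\alpha, \alpha\} \cdot E\Big[ \sup_{\tau \in \bar U_d(\theta)} \|X\| \, \Big| G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha} - G_1^{(1)}(X'\theta^q) - \frac{G_2(X'\theta^e)}{\alpha} \Big| \cdot \|X\| \, \Big| G_1^{(1)}(X'\tau^q) + \frac{G_2(X'\tau^e)}{\alpha} \Big| \Big],
\end{aligned}
\]
where we apply (C.5) for the second summand. The remaining two summands can be bounded linearly in $d$ by the arguments given in (C.7), since $G_1^{(1)}$ and $G_2$ are continuously differentiable functions and the respective moments are finite.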
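For the second component of $\psi$, we get that
\[
\begin{aligned}
\big\| \psi_2(Y,X,\tau) - \psi_2(Y,X,\theta) \big\|
&\le \big\| X(X'\tau^e - X'\tau^q) G_2^{(1)}(X'\tau^e) - X(X'\theta^e - X'\theta^q) G_2^{(1)}(X'\theta^e) \big\| \\
&\quad + \Big\| \frac{X G_2^{(1)}(X'\tau^e) X'\tau^q}{\alpha} \Big\| \cdot \big| \mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}} \big| \\
&\quad + \Big\| \frac{X G_2^{(1)}(X'\tau^e) X'\tau^q}{\alpha} - \frac{X G_2^{(1)}(X'\theta^e) X'\theta^q}{\alpha} \Big\| \cdot \mathbb{1}_{\{Y \le X'\theta^q\}} \\
&\quad + \Big\| \frac{X G_2^{(1)}(X'\tau^e)}{\alpha} \Big\| \cdot |Y| \cdot \big| \mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}} \big| \\
&\quad + \frac{\big| Y \mathbb{1}_{\{Y \le X'\theta^q\}} \big|}{\alpha} \cdot \big\| X G_2^{(1)}(X'\tau^e) - X G_2^{(1)}(X'\theta^e) \big\| \\
&= (i) + (ii) + (iii) + (iv) + (v).
\end{aligned} \tag{C.30}
\]
Thus, in order to evaluate $E\big[ \sup_{\tau \in \bar U_d(\theta)} \| \psi_2(Y,X,\tau) - \psi_2(Y,X,\theta) \|^2 \big]$, we have to consider all the cross products out of the five summands in (C.30). Since the techniques applied are very similar, we only show details for two of the cross products.
\[
\begin{aligned}
E\Big[ \sup_{\tau \in \bar U_d(\theta)} (ii) \cdot (v) \Big]
&= E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e) X'\tau^q}{\alpha} \Big\| \cdot \big| \mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}} \big| \cdot \frac{\big| Y \mathbb{1}_{\{Y \le X'\theta^q\}} \big|}{\alpha} \cdot \big\| X G_2^{(1)}(X'\tau^e) - X G_2^{(1)}(X'\theta^e) \big\| \Big] \\
&\le E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e) X'\tau^q}{\alpha} \Big\| \cdot \frac{E\big[ |Y| \,\big|\, X \big]}{\alpha} \cdot \|X\| \cdot \sup_{\tau \in \bar U_d(\theta)} \big| G_2^{(1)}(X'\tau^e) - G_2^{(1)}(X'\theta^e) \big| \Big] \\
&\le E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e) X'\tau^q}{\alpha} \Big\| \cdot \frac{E\big[ |Y| \,\big|\, X \big]}{\alpha} \cdot \|X\| \cdot \sup_{\tau \in \bar U_d(\theta)} \big\| X G_2^{(2)}(X'\tau^e) \big\| \Big] \cdot d \\
&= O(d),
\end{aligned}
\]
by (C.7) since $G_2^{(1)}$ is continuously differentiable.

The following cross products can be bounded analogously by bounding the indicator functions and by applying the mean value theorem as in (C.7): $(i)^2$, $(iii)^2$, $(v)^2$, $(i)(iii)$, $(i)(iv)$, $(i)(v)$, $(ii)(iv)$, $(ii)(v)$, $(iii)(iv)$, $(iii)(v)$ and $(iv)(v)$.

A second type of technique, similar to the arguments in (C.9), arises in the cases $(ii)^2$, $(iv)^2$ and $(ii)(iv)$. We get that there exist $\tau_-^q, \tau_+^q \in \bar U_d(\theta)$ and a value $\tilde\tau^q$ on the line between $\tau_-^q$ and $\tau_+^q$, such that
\[
\begin{aligned}
E\Big[ \sup_{\tau \in \bar U_d(\theta)} (iv)^2 \Big]
&\le E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e)}{\alpha} \Big\|^2 \cdot E\Big[ \sup_{\tau \in \bar U_d(\theta)} Y^2 \, \big| \mathbb{1}_{\{Y \le X'\tau^q\}} - \mathbb{1}_{\{Y \le X'\theta^q\}} \big| \,\Big|\, X \Big] \Big] \\
&\le E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e)}{\alpha} \Big\|^2 \cdot E\big[ Y^2 \, \mathbb{1}_{\{X'\tau_-^q \le Y \le X'\tau_+^q\}} \,\big|\, X \big] \Big] \\
&= E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e)}{\alpha} \Big\|^2 \int_{X'\tau_-^q}^{X'\tau_+^q} y^2 \, f_{Y|X}(y) \, dy \Big] \\
&= E\Big[ \sup_{\tau \in \bar U_d(\theta)} \Big\| \frac{X G_2^{(1)}(X'\tau^e)}{\alpha} \Big\|^2 \cdot (X'\tilde\tau^q)^2 \, f_{Y|X}(X'\tilde\tau^q) \, (X'\tau_+^q - X'\tau_-^q) \Big] \\
&\le \frac{2}{\alpha^2} \, E\Big[ \|X\|^3 \sup_{\tau \in \bar U_d(\theta)} \big( G_2^{(1)}(X'\tau^e) \big)^2 \cdot \sup_{\tau \in \bar U_d(\theta)} (X'\tau^q)^2 \, f_{Y|X}(X'\tau^q) \Big] \cdot d \\
&= O(d),
\end{aligned}
\]
where we apply a multivariate version of the mean value theorem and notice that $f_{Y|X}$ is bounded.

Lemma C.5.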
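Assume that Assumption 2.1, Assumption 2.2 and the Moment Conditions (M-4) in Appendix A hold. Then, the function $\rho(Y,X,\theta)$, given in (2.2), is locally Lipschitz continuous in $\theta$ in the sense that for all $\theta_1, \theta_2 \in U_d(\theta_0)$ in some neighborhood of $\theta_0$, it holds that
\[
\big| \rho(Y,X,\theta_1) - \rho(Y,X,\theta_2) \big| \le K(Y,X) \cdot \|\theta_1 - \theta_2\|, \tag{C.31}
\]
where $E\big[ K(Y,X)^2 \big] < \infty$.

Proof. We start the proof by splitting the $\rho$ function into two parts,
\[
\rho(Y,X,\theta) = \rho_1(Y,X,\theta) + \rho_2(Y,X,\theta), \tag{C.32}
\]
where
\[
\rho_1(Y,X,\theta) = \mathbb{1}_{\{Y \le X'\theta^q\}} \Big( G_1(X'\theta^q) - G_1(Y) + \frac{1}{\alpha} G_2(X'\theta^e)(X'\theta^q - Y) \Big), \tag{C.33}
\]
\[
\rho_2(Y,X,\theta) = G_2(X'\theta^e)\big( X'\theta^e - X'\theta^q \big) - \mathcal{G}_2(X'\theta^e) - \alpha \, G_1(X'\theta^q) + a(Y). \tag{C.34}
\]
Local Lipschitz continuity of $\rho_2$ follows since it is a continuously differentiable function and thus locally Lipschitz. We consequently get that for some $d > 0$ and for all $\theta_1, \theta_2 \in U_d(\theta_0)$, it holds that
\[
\big| \rho_2(Y,X,\theta_1) - \rho_2(Y,X,\theta_2) \big| \le \|\theta_1 - \theta_2\| \cdot \sup_{\theta \in U_d(\theta_0)} \left\| \begin{pmatrix} -\alpha X G_1^{(1)}(X'\theta^q) - X G_2(X'\theta^e) \\ X G_2^{(1)}(X'\theta^e)\big( X'\theta^e - X'\theta^q \big) \end{pmatrix} \right\|, \tag{C.35}
\]
with Lipschitz constant
\[
K(Y,X) = \sup_{\theta \in U_d(\theta_0)} \left\| \begin{pmatrix} -\alpha X G_1^{(1)}(X'\theta^q) - X G_2(X'\theta^e) \\ X G_2^{(1)}(X'\theta^e)\big( X'\theta^e - X'\theta^q \big) \end{pmatrix} \right\|, \tag{C.36}
\]
which is square-integrable by the Moment Conditions (M-4).

For the function $\rho_1$, we consider three cases. First, let $\theta_1, \theta_2 \in \Theta$ such that $X'\theta_1^q \le X'\theta_2^q < Y$. Then it holds that
\[
\rho_1(Y,X,\theta_1) = \rho_1(Y,X,\theta_2) = 0, \tag{C.37}
\]
since $\mathbb{1}_{\{Y \le X'\theta_1^q\}} = \mathbb{1}_{\{Y \le X'\theta_2^q\}} = 0$, which is obviously a Lipschitz continuous function.

Second, let $\theta_1, \theta_2 \in \Theta$ such that $Y \le X'\theta_1^q \le X'\theta_2^q$. Then, for $\theta = \theta_1, \theta_2$,
\[
\rho_1(Y,X,\theta) = G_1(X'\theta^q) - G_1(Y) + \frac{1}{\alpha} G_2(X'\theta^e)(X'\theta^q - Y), \tag{C.38}
\]
which is a continuously differentiable function and thus
\[
\big| \rho_1(Y,X,\theta_1) - \rho_1(Y,X,\theta_2) \big| \le \|\theta_1 - \theta_2\| \cdot \sup_{\theta \in U_d(\theta_0)} \left\| \begin{pmatrix} X G_1^{(1)}(X'\theta^q) + \frac{1}{\alpha} X G_2(X'\theta^e) \\ \frac{1}{\alpha} X G_2^{(1)}(X'\theta^e)(X'\theta^q - Y) \end{pmatrix} \right\|. \tag{C.39}
\]

Finally, let $\theta_1, \theta_2 \in \Theta$ such that $X'\theta_1^q < Y \le X'\theta_2^q$.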
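Then, since $G_1$ is increasing, we get that
\[
\begin{aligned}
\big| \rho_1(Y,X,\theta_1) - \rho_1(Y,X,\theta_2) \big|
&= \Big| G_1(X'\theta_2^q) - G_1(Y) + \frac{1}{\alpha} G_2(X'\theta_2^e)(X'\theta_2^q - Y) \Big| \\
&\le \Big| G_1(X'\theta_2^q) - G_1(X'\theta_1^q) + \frac{1}{\alpha} G_2(X'\theta_2^e)(X'\theta_2^q - X'\theta_1^q) \Big| \\
&\le \|\theta_1^q - \theta_2^q\| \cdot \sup_{\theta \in U_d(\theta_0)} \Big\| X G_1^{(1)}(X'\theta^q) + \frac{1}{\alpha} X G_2(X'\theta^e) \Big\|.
\end{aligned}
\]
Thus, the function $\rho(Y,X,\theta)$ is locally Lipschitz continuous in $\theta$ with square-integrable Lipschitz constants, $E\big[ K(Y,X)^2 \big] < \infty$, by the Moment Conditions (M-4) in Appendix A.

Proposition C.6. Let $Y$ be a real-valued random variable with distribution function $F$, finite first and second moments and a unique $\alpha$-quantile $q_\alpha = F^{-1}(\alpha)$. Then,
\[
\frac{1}{\alpha^2} \int_{-\infty}^{q_\alpha} \int_{-\infty}^{q_\alpha} \big( F(x \wedge y) - F(x)F(y) \big) \, dx \, dy = \frac{1}{\alpha} \operatorname{Var}(Y \mid Y \le q_\alpha) + \frac{1-\alpha}{\alpha} (q_\alpha - \xi_\alpha)^2, \tag{C.40}
\]
where $\xi_\alpha = E[Y \mid Y \le q_\alpha]$ denotes the $\alpha$-ES of $Y$.

Proof. We first notice that for a distribution $F$ with finite second moment and unique $\alpha$-quantile, it holds that
\[
E[Y \mid Y \le q_\alpha] = -\frac{1}{\alpha} \int_{-\infty}^{q_\alpha} F(x) \, dx + q_\alpha \quad \text{and} \tag{C.41}
\]
\[
E[Y^2 \mid Y \le q_\alpha] = -\frac{2}{\alpha} \int_{-\infty}^{q_\alpha} x F(x) \, dx + q_\alpha^2, \tag{C.42}
\]
which can be obtained by using the identity
\[
Y \mathbb{1}_{\{Y \le q_\alpha\}} = \mathbb{1}_{\{Y \le q_\alpha\}} \Big( \int_0^{\infty} \mathbb{1}_{\{Y > t\}} \, dt - \int_{-\infty}^{0} \mathbb{1}_{\{Y \le t\}} \, dt \Big) \tag{C.43}
\]
and by taking expectations on both sides. By applying (C.41), we get that
\[
\int_{-\infty}^{q_\alpha} \int_{-\infty}^{q_\alpha} F(x)F(y) \, dx \, dy = \Big( \int_{-\infty}^{q_\alpha} F(x) \, dx \Big)^2 = \big( \alpha q_\alpha - \alpha E[Y \mid Y \le q_\alpha] \big)^2 = \alpha^2 (q_\alpha - \xi_\alpha)^2. \tag{C.44}
\]
Furthermore, notice that
\[
\int_{-\infty}^{q_\alpha} \int_{-\infty}^{q_\alpha} F(x \wedge y) \, dx \, dy = \int_{-\infty}^{q_\alpha} \int_{-\infty}^{y} F(x) \, dx \, dy + \int_{-\infty}^{q_\alpha} \int_{y}^{q_\alpha} F(y) \, dx \, dy, \tag{C.45}
\]
and by rearranging the order of integration for the first term in (C.45), we get that
\[
\begin{aligned}
\int_{-\infty}^{q_\alpha} \int_{-\infty}^{y} F(x) \, dx \, dy
&= \iint_{\{(x,y):\, y \le q_\alpha,\, x \le y\}} F(x) \, dx \, dy = \iint_{\{(x,y):\, x \le q_\alpha,\, x \le y \le q_\alpha\}} F(x) \, dy \, dx \\
&= \int_{-\infty}^{q_\alpha} \int_{x}^{q_\alpha} F(x) \, dy \, dx = \int_{-\infty}^{q_\alpha} F(x)(q_\alpha - x) \, dx.
\end{aligned} \tag{C.46}
\]
Thus, by first using (C.45) and (C.46) and by plugging in (C.41) and (C.42), we obtain
\[
\begin{aligned}
\int_{-\infty}^{q_\alpha} \int_{-\infty}^{q_\alpha} F(x \wedge y) \, dx \, dy
&= 2 \int_{-\infty}^{q_\alpha} \int_{y}^{q_\alpha} F(y) \, dx \, dy \\
&= 2 \int_{-\infty}^{q_\alpha} F(y)(q_\alpha - y) \, dy \\
&= 2 q_\alpha \int_{-\infty}^{q_\alpha} F(y) \, dy - 2 \int_{-\infty}^{q_\alpha} y F(y) \, dy \\
&= 2 \alpha q_\alpha (q_\alpha - \xi_\alpha) + \alpha \big( E[Y^2 \mid Y \le q_\alpha] - q_\alpha^2 \big) \\
&= \alpha \big( E[Y^2 \mid Y \le q_\alpha] + q_\alpha^2 - 2 q_\alpha \xi_\alpha \big).
\end{aligned} \tag{C.47}
\]
Eventually, using (C.44) and (C.47), straightforward calculations yield that
\[
\frac{1}{\alpha^2} \int_{-\infty}^{q_\alpha} \int_{-\infty}^{q_\alpha} \big( F(x \wedge y) - F(x)F(y) \big) \, dx \, dy = \frac{1}{\alpha} \operatorname{Var}(Y \mid Y \le q_\alpha) + \frac{1-\alpha}{\alpha} (q_\alpha - \xi_\alpha)^2, \tag{C.48}
\]
which concludes the proof.

Appendix D Separability of almost surely continuous functions

Definition D.1 (Separability of a Stochastic Process).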
A stochastic process $\psi(x,\theta): \Omega \times \Theta \to Y$ is called separable in the sense of Doob, if there exists in $\Theta$ an everywhere dense countable set $I$, and in $\Omega$ a nullset $N$, such that for any arbitrary open set $G \subset \Theta$ and every closed set $F \subset Y$, the two sets
\[
\{ x \mid \psi(x,\theta) \in F, \ \forall \theta \in G \} \quad \text{and} \tag{D.1}
\]
\[
\{ x \mid \psi(x,\theta) \in F, \ \forall \theta \in G \cap I \} \tag{D.2}
\]
differ from each other at most by a subset of $N$.

Proposition D.2 (Gikhman and Skorokhod (2004)). Let $\Theta$ and $Y$ be metric spaces, $\Theta$ be a separable space. The sets (D.1) and (D.2) coincide for all $x \in \Omega$ for which the stochastic process $\psi(x,\theta)$ is continuous in $\theta$.

Proof. It is clear that $\{ x \mid \psi(x,\theta) \in F, \ \forall \theta \in G \} \subseteq \{ x \mid \psi(x,\theta) \in F, \ \forall \theta \in G \cap I \}$. We thus only show the reverse. Let $G \subset \Theta$ be an arbitrary open set and $F \subset Y$ an arbitrary closed set. Let furthermore $x \in \Omega$ such that $\psi(x,\theta) \in F$ for all $\theta \in G \cap I$. We have to show that $\psi(x,\tilde\theta) \in F$ for all $\tilde\theta \in G$ with $\tilde\theta \notin I$. Thus, let $\tilde\theta \in G \setminus I$. Since $I$ is a dense set in $\Theta$, there exists a sequence $(\theta_n)_{n \in \mathbb{N}} \subset \Theta \cap I$ such that $\theta_n \to \tilde\theta$, and since $G$ is an open set in $\Theta$ and $\tilde\theta \in G$, we can conclude that for $m \in \mathbb{N}$ large enough, $\theta_n \in G$ for all $n \ge m$. Furthermore, by continuity at $\tilde\theta$, it holds that $\psi(x,\theta_n) \to \psi(x,\tilde\theta)$ and since $\theta_n \in G \cap I$ for all $n$ large enough, $\psi(x,\theta_n) \in F$ by assumption. Eventually, since $F$ is a closed set, $\psi(x,\tilde\theta) \in F$, which proves the proposition.

Corollary D.3 (Separability of continuous functions). Let $\Theta$ and $Y$ be metric spaces, $\Theta$ be a separable space, and let the stochastic process $\psi(x,\theta)$ be almost surely continuous. Then, $\psi$ is separable.

Proof. Since $\psi(x,\theta)$ is almost surely continuous, it is continuous for all $x \in \Omega \setminus N$ for some $N \subset \Omega$ with $P(N) = 0$. We get from Proposition D.2 that the sets (D.1) and (D.2) coincide for all $x \in \Omega \setminus N$, i.e. they differ only by a subset of $N$.

References

Acerbi, C. and Szekely, B. (2014). Backtesting Expected Shortfall. Risk.
Andersen, T. and Bollerslev, T. (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review, 39:885–905.
Andrews, D. (1994). Empirical Process Methods in Econometrics. In Engle, R.
and McFadden, D., editors, Handbook of Econometrics, volume 4, chapter 37, pages 2247–2294. Elsevier.
Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent Measures of Risk. Mathematical Finance, 9(3):203–228.
Barendse, S. (2017). Interquantile Expectation Regression. Available at https://ssrn.com/abstract=2937665.
Basel Committee (2016). Minimum capital requirements for Market Risk. Technical report, Basel Committee on Banking Supervision. Available at http://www.bis.org/bcbs/publ/d352.pdf.
Bayer, S. and Dimitriadis, T. (2017a). esreg: Joint (VaR, ES) Regression. R package version 0.2.0, available at https://github.com/BayerSe/esreg.
Bayer, S. and Dimitriadis, T. (2017b). Regression-based Expected Shortfall Backtesting. Working Paper.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.
Brazauskas, V., Jones, B. L., Puri, M. L., and Zitikis, R. (2008). Estimating conditional tail expectation with actuarial applications in view. Journal of Statistical Planning and Inference, 138(11):3590–3604.
Chen, S. X. (2008). Nonparametric Estimation of Expected Shortfall. Journal of Financial Econometrics, 6(1):87–107.
Chernozhukov, V. and Umantsev, L. (2001). Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics, 26(1):271–292.
Corsi, F. (2009). A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics, 7(2):174–196.
Dimitriadis, T. and Bayer, S. (2017). Online Supplement for “A Joint Quantile and Expected Shortfall Regression Framework”.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1):1–26.
Efron, B. (1991). Regression percentiles using asymmetric squared error loss. Statistica Sinica, 1:93–125.
Ehm, W., Gneiting, T., Jordan, A., and Krüger, F. (2016). Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(3):505–562.
Engle, R. and Manganelli, S. (2004). CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles. Journal of Business and Economic Statistics, 22(4):367–381.
Fissler, T. (2017). On Higher Order Elicitability and Some Limit Theorems on the Poisson and Wiener Space. PhD thesis, Universität Bern.
Fissler, T. and Ziegel, J. F. (2016). Higher order elicitability and Osband’s principle. Annals of Statistics, 44(4):1680–1707.
Fissler, T., Ziegel, J. F., and Gneiting, T. (2016). Expected Shortfall is jointly elicitable with Value at Risk - Implications for backtesting. Risk Magazine, January 2016.
Gaglianone, W. P., Lima, L. R., Linton, O., and Smith, D. R. (2011). Evaluating Value-at-Risk Models via Quantile Regression. Journal of Business & Economic Statistics, 29(1):150–160.
Gikhman, I. and Skorokhod, A. (2004). The Theory of Stochastic Processes I, volume 210 of Classics in Mathematics. Springer Berlin Heidelberg.
Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494):746–762.
Gourieroux, C. and Monfort, A. (1995). Statistics and Econometric Models: Volume 1, General Concepts, Estimation, Prediction and Algorithms. Cambridge University Press.
Halbleib, R. and Pohlmeier, W. (2012). Improving the value at risk forecasts: Theory and evidence from the financial crisis. Journal of Economic Dynamics and Control, 36(8):1212–1228.
Hall, P. and Sheather, S. J. (1988). On the Distribution of a Studentized Quantile. Journal of the Royal Statistical Society. Series B (Methodological), 50(3):381–391.
Hendricks, W. and Koenker, R. (1992). Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association, 87(417):58–68.
Huber, P. (1967). The behavior of maximum likelihood estimates under nonstandard conditions.
In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 221–233. Berkeley: University of California Press.
Koenker, R. (1994). Confidence Intervals for Regression Quantiles. In Mandl, P. and Hušková, M., editors, Asymptotic Statistics: Proceedings of the Fifth Prague Symposium, held from September 4–9, 1993, pages 349–359. Physica-Verlag Heidelberg.
Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge University Press.
Koenker, R. and Machado, J. A. F. (1999). Goodness of Fit and Related Inference Processes for Quantile Regression. Journal of the American Statistical Association, 94(448):1296–1310.
Koenker, R. and Xiao, Z. (2006). Quantile Autoregression. Journal of the American Statistical Association, 101(475):980–990.
Komunjer, I. (2013). Quantile Prediction. In Handbook of Economic Forecasting, volume 2, chapter 17, pages 961–994. Elsevier.
Lambert, N. S., Pennock, D. M., and Shoham, Y. (2008). Eliciting Properties of Probability Distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 129–138. ACM.
Lourenço, H. R., Martin, O. C., and Stützle, T. (2003). Iterated Local Search. In Glover, F. and Kochenberger, G. A., editors, Handbook of Metaheuristics, pages 320–353. Springer US, Boston, MA.
Nadarajah, S., Zhang, B., and Chan, S. (2014). Estimation methods for expected shortfall. Quantitative Finance, 14(2):271–291.
Nelder, J. A. and Mead, R. (1965). A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313.
Newey, W. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Engle, R. and McFadden, D., editors, Handbook of Econometrics, volume 4, chapter 36, pages 2111–2245. Elsevier.
Nolde, N. and Ziegel, J. F. (2017). Elicitability and backtesting: Perspectives for banking regulation. arXiv:1608.05498 [q-fin.RM].
Taylor, J. W. (2008a). Estimating Value at Risk and Expected Shortfall Using Expectiles.
Journal of Financial Econometrics, 6(2):231–252.
Taylor, J. W. (2008b). Using Exponentially Weighted Quantile Regression to Estimate Value at Risk and Expected Shortfall. Journal of Financial Econometrics, 6(3):382–406.
Taylor, J. W. (2017). Forecasting Value at Risk and Expected Shortfall Using a Semiparametric Approach Based on the Asymmetric Laplace Distribution. Forthcoming in Journal of Business and Economic Statistics.
van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Žikeš, F. and Baruník, J. (2016). Semi-parametric Conditional Quantile Models for Financial Returns and Realized Volatility. Journal of Financial Econometrics, 14(1):185–226.
Weber, S. (2006). Distribution Invariant Risk Measures, Information, and Dynamic Consistency. Mathematical Finance, 16(2):419–441.
Xiao, Z., Guo, H., and Lam, M. S. (2015). Quantile Regression and Value at Risk. In Lee, C.-F. and Lee, J. C., editors, Handbook of Financial Econometrics and Statistics, pages 1143–1167. Springer.
Ziegel, J. F., Krüger, F., Jordan, A., and Fasciati, F. (2017). Murphy Diagrams: Forecast Evaluation of Expected Shortfall. arXiv:1705.04537 [q-fin.RM].
Zwingmann, T. and Holzmann, H. (2016). Asymptotics for the expected shortfall. arXiv:1611.07222 [math.ST].
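
As a numerical sanity check of the identity stated in Proposition C.6, the following sketch evaluates both sides of (C.40) for a standard normal Y. The choice of distribution, the truncation point `lower`, the grid size `n`, the trapezoidal quadrature and the function name `check_identity` are illustrative assumptions and not part of the paper. The closed-form tail moments used on the right-hand side are the standard ones for the normal distribution.

```python
import numpy as np
from statistics import NormalDist


def check_identity(alpha, lower=-10.0, n=1501):
    """Return (lhs, rhs) of (C.40) for a standard normal Y (illustrative only)."""
    nd = NormalDist()
    q = nd.inv_cdf(alpha)                      # alpha-quantile q_alpha

    # Grid on [lower, q]; the mass below `lower` is negligible for N(0, 1).
    x = np.linspace(lower, q, n)
    F = np.array([nd.cdf(v) for v in x])

    # Trapezoidal weights for a product rule on the square [lower, q]^2.
    h = x[1] - x[0]
    w = np.full(n, h)
    w[0] = w[-1] = h / 2.0

    # F is increasing, so F(min(x, y)) = min(F(x), F(y)).
    integrand = np.minimum.outer(F, F) - np.outer(F, F)
    lhs = (w @ integrand @ w) / alpha**2

    # Closed-form truncated moments of N(0, 1) given Y <= q.
    phi_q = nd.pdf(q)
    xi = -phi_q / alpha                        # alpha-ES, E[Y | Y <= q]
    second_moment = 1.0 - q * phi_q / alpha    # E[Y^2 | Y <= q]
    var_tail = second_moment - xi**2
    rhs = var_tail / alpha + (1.0 - alpha) / alpha * (q - xi) ** 2
    return lhs, rhs


lhs, rhs = check_identity(0.025)
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")
```

Under these assumptions the two numbers should agree up to quadrature error. Refining the grid or moving `lower` further into the tail is a simple way to probe how sensitive the comparison is to the discretization.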

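The statement of Lemma C.2 can be illustrated on simulated data: if the second-moment matrix of X is positive definite, then weighting by a strictly positive function g keeps it positive definite. The sketch below uses sample averages in place of expectations. The design (an intercept plus Gaussian regressors), the weight function `exp(X'theta)`, the parameter values and the helper name `weighted_second_moment` are hypothetical choices made only for illustration.

```python
import numpy as np


def weighted_second_moment(X, weights):
    """Sample analogue of E[(X X') g(X, theta)] for given positive weights."""
    return (X * weights[:, None]).T @ X / X.shape[0]


rng = np.random.default_rng(0)
n, k = 5000, 3

# Hypothetical design: intercept plus two standard normal covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hypothetical strictly positive weight function g(X, theta) = exp(X'theta).
theta = np.array([0.2, -0.5, 0.3])
g = np.exp(X @ theta)

M_plain = weighted_second_moment(X, np.ones(n))   # analogue of E[X X']
M_weighted = weighted_second_moment(X, g)         # analogue of E[(X X') g]

# Smallest eigenvalues; both should be strictly positive.
min_eig_plain = np.linalg.eigvalsh(M_plain).min()
min_eig_weighted = np.linalg.eigvalsh(M_weighted).min()
print(min_eig_plain, min_eig_weighted)

# The quadratic form equals the sample mean of (X'z)^2 g, as in (C.11).
z = np.array([1.0, -2.0, 0.5])
quad_form = z @ M_weighted @ z
mean_form = np.mean((X @ z) ** 2 * g)
```

The last two quantities coincide by construction, which mirrors the step in the proof where the quadratic form is rewritten as the expectation of a non-negative random variable.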
Journal

Quantitative Finance, arXiv (Cornell University)

Published: Apr 7, 2017
