Bayesian Nonparametric Estimation of Ex Post Variance

Abstract

Variance estimation is central to many questions in finance and economics. Until now, ex post variance estimation has been based on infill asymptotic assumptions that exploit high-frequency data. This article offers a new exact finite-sample approach to estimating ex post variance using Bayesian nonparametric methods. In contrast to the classical counterpart, the proposed method exploits pooling over high-frequency observations with similar variances. Bayesian nonparametric variance estimators are introduced and discussed for three settings: no noise, heteroskedastic microstructure noise, and serially correlated microstructure noise. Monte Carlo simulation results show that the proposed approach can increase the accuracy of variance estimation. Applications to equity data, with comparisons to realized variance and realized kernel estimators, are included.

Volatility is an indispensable quantity in finance and a key input into asset pricing, risk management, and portfolio management. In the last two decades, researchers have taken advantage of high-frequency data to estimate ex post variance using intraperiod returns. Barndorff-Nielsen and Shephard (2002) and Andersen et al. (2003) formalized the idea of using high-frequency data to measure the volatility of lower-frequency returns. They show that realized variance (RV) is a consistent estimator of quadratic variation under ideal conditions. Unlike parametric models of volatility, in which the model specification is important, RV is a model-free estimate of quadratic variation in that it is valid under a wide range of spot volatility dynamics.¹ RV provides an accurate measure of ex post variance if there is no market microstructure noise. In reality, however, observed prices at high frequency are inevitably contaminated by noise, and the resulting returns are no longer uncorrelated. In this case, RV is a biased and inconsistent estimator (Hansen and Lunde, 2006; Aït-Sahalia, Mykland, and Zhang, 2011).
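To make this bias concrete, here is a small Monte Carlo sketch (our illustration, not the authors' code). It assumes constant spot volatility and i.i.d. Gaussian noise with a hypothetical variance $\omega^2$; under these assumptions each observed return picks up the noise difference $\epsilon_i - \epsilon_{i-1}$, so $E[RV] = V + 2n\omega^2$, and the bias grows with the number $n$ of intraperiod returns.

```python
import numpy as np

def realized_variance(log_prices):
    """RV: sum of squared intraperiod log returns."""
    returns = np.diff(log_prices)
    return float(np.sum(returns ** 2))

rng = np.random.default_rng(1)
V = 1.0          # true daily variance (hypothetical; constant spot volatility)
omega2 = 1e-3    # i.i.d. noise variance (hypothetical)
results = {}
for step in (300, 60, 10):                 # 5-minute, 1-minute, 10-second sampling
    n = 23_400 // step                     # number of intraday returns
    clean, noisy = [], []
    for _ in range(200):                   # Monte Carlo replications
        r = rng.normal(0.0, np.sqrt(V / n), size=n)
        p = np.concatenate(([0.0], np.cumsum(r)))                 # efficient log price
        p_obs = p + rng.normal(0.0, np.sqrt(omega2), size=n + 1)  # observed = efficient + noise
        clean.append(realized_variance(p))
        noisy.append(realized_variance(p_obs))
    # Under these assumptions E[RV_noisy] = V + 2 * n * omega2.
    results[step] = (np.mean(clean), np.mean(noisy), V + 2 * n * omega2)
    print(f"n={n:5d}  RV clean={results[step][0]:.3f}  "
          f"RV noisy={results[step][1]:.3f}  theory={results[step][2]:.3f}")
```

The noise-free RV stays close to $V$ at every frequency, while the noisy RV drifts upward as sampling becomes finer.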
The impact of market microstructure noise on forecasting is explored in Aït-Sahalia and Mancini (2008) and Andersen, Bollerslev, and Meddahi (2011). Several approaches have been proposed to estimate ex post variance under microstructure noise. Zhou (1996) first introduced the idea of using a kernel-based method to estimate ex post variance. Barndorff-Nielsen et al. (2008) formally discussed the realized kernel and showed how to use it in practice in a later article (Barndorff-Nielsen et al., 2009). Another approach is the subsampling method of Zhang, Mykland, and Aït-Sahalia (2005). Hansen, Large, and Lunde (2008) showed how a time series model can be used to filter out market microstructure noise to obtain corrected estimates of ex post variance. A robust version of the predictive density of integrated volatility is derived in Corradi, Distaso, and Swanson (2009). Although bootstrap refinements are explored in Gonçalves and Meddahi (2009), all distributional results from this literature rely on infill asymptotics.

Much of the literature has focused on the asymptotic properties of adaptations of realized variation that are robust to market microstructure noise. However, an argument can be made against the direct use of realized variation even in the no-noise case if the time between observations does not converge to zero. Realized variation is the sum of squared intraperiod returns, and each component of that sum is an unbiased estimator of the corresponding intraperiod integrated volatility. It is well understood that unbiased estimators based on one observation can be suboptimal in terms of mean squared error and risk (see, for example, Brown and Zhao, 2012). Shrinkage estimators, which pool information from related estimates, are one way to construct estimators with better properties.
This suggests that estimators of the integrated volatility with smaller mean squared error than realized variation can be constructed by summing shrunken estimates of the intraperiod integrated volatility.² When will the differences be substantial? Intuitively, shrinkage estimates work well when the unbiased estimators are noisy and information can be usefully pooled. If high-frequency data had no noise, we would expect the difference to decrease as the sampling frequency increases (the benefit of pooling disappears asymptotically). In reality, high-frequency financial data include noise, and it is less clear that substantial differences will disappear asymptotically. In our simulation experiments, we find evidence that the difference in mean squared error persists at high frequency.

We use a Bayesian hierarchical approach to achieve shrinkage by pooling the information from related estimates. To our knowledge, this idea has not been used in the estimation of volatility using high-frequency data. We assume that the intraperiod integrated volatilities over short periods are exchangeable, which implies that they can be modeled as conditionally independent draws from a prior distribution. The choice of distribution strongly affects the form of pooling, so we infer this distribution from the data using Bayesian nonparametric methods rather than choosing a parametric family (such as the generalized inverse Gaussian distribution). We model intraperiod returns with a Dirichlet process mixture (DPM) model. This is a countably infinite mixture of distributions that facilitates the clustering of return observations into distinct groups sharing the same variance parameter.

Our proposed method benefits variance estimation in at least two respects. First, intraperiod variances with common values can be pooled into the same group, leading to a more precise estimate.
The pooling is done endogenously, along with estimation of the other model parameters. Second, the Bayesian nonparametric model delivers exact finite-sample inference regarding ex post variance or transformations such as its logarithm. As such, uncertainty around the estimate of ex post volatility is readily available from the predictive density. Unlike the existing asymptotic theory, which may give confidence intervals that contain negative values for variance, density intervals always lie on the positive real line and can accommodate asymmetry. By extending key results in Hansen, Large, and Lunde (2008), we adapt the DPM models to deal with returns contaminated with heteroskedastic noise and serially correlated noise.

Mykland and Zhang (2009) considered links between local parametric inference and high-frequency financial data analysis. Their approach assumes that quantities such as volatility are constant over blocks of returns, which can lead to more efficient estimation and to new estimators. Our method can be seen as a generalization of their blocked RV estimator that uses partitioning ideas from Bayesian nonparametrics, defining clusters rather than blocks of returns. Our approach finds these clusters endogenously and so is not restricted to clusters or blocks of returns that are consecutive in time.

Monte Carlo simulation results show the Bayesian approach to be a very competitive alternative. Overall, pooling can lead to more precise estimates of ex post variance and better coverage frequencies. These results are robust to different prior settings, irregularly spaced prices, and tick-time sampling. We show that the new variance estimators can be used with confidence and effectively recover both the average statistical features of daily ex post variance and its time series properties. Two applications to real-world data, with comparisons to RV and kernel-based estimators, are included.

This article is organized as follows.
The Bayesian nonparametric model, daily variance estimator, and model estimation methods are discussed in Section 1. Section 2 extends the Bayesian nonparametric model to deal with heteroskedastic and serially correlated microstructure noise. The consistency of the estimator is considered in Section 3. Section 4 provides an extensive simulation study and comparison of the estimators. Applications to IBM and S&P 500 ETF data are found in Section 5. Section 6 concludes, followed by an Appendix.

1 Bayesian Nonparametric Ex Post Variance Estimation

In this section, we introduce a Bayesian nonparametric ex post volatility estimator. After defining the daily variance conditional on the data, the discussion moves to the DPM model, which provides the framework for the proposed estimator. The approach discussed in this section deals with returns without microstructure noise; an estimator suitable for returns with microstructure noise is presented in Section 2.

1.1 Model of High-Frequency Returns

First, we consider the case with no market microstructure noise. We are interested in estimating the integrated volatility over fixed periods (which for simplicity will subsequently be called a day) using high-frequency intraday log returns. We assume that there are $n_t$ intraday log returns for day $t$, recorded at times $\tau_{t,1}, \tau_{t,2}, \ldots, \tau_{t,n_t}$ and denoted $r_{t,1}, \ldots, r_{t,n_t}$. The model for each log return is

$$r_{t,i} = \mu_t + \sigma_{t,i} z_{t,i}, \quad z_{t,i} \overset{iid}{\sim} N(0,1), \quad i = 1, \ldots, n_t, \tag{1}$$

where $\mu_t$ is constant within day $t$ and $0 < \sigma_{t,i} < \infty$ for all $i$.³ We make no assumptions on the stochastic process generating $\sigma^2_{t,i}$. For example, volatility may have jumps, undergo structural change, or possess long memory. Given this assumed discrete data generating process (DGP), we cannot distinguish between continuous and discrete (jump) components, as is commonly done in the literature (Barndorff-Nielsen and Shephard, 2006).
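A small simulation can make Equations (1)-(3) concrete. The sketch below (our illustration, not the authors' code) draws returns from Equation (1) for a hypothetical volatility path in which the intraday variance takes only three distinct values, the kind of structure that pooling over common variances can exploit. It also checks numerically that, conditional on the volatility path, $E[RV_t] = V_t + n_t\mu_t^2$, so RV targets $V_t$ when $\mu_t$ is small. The values of $\mu_t$ and the variance path are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t = 78                      # e.g. 5-minute returns in a 6.5-hour day
mu_t = 0.002                  # daily-constant mean (hypothetical value)
# Hypothetical volatility path F_t with three variance levels (values are illustrative).
sigma2 = np.repeat([0.02, 0.005, 0.012], [20, 38, 20])
V_t = sigma2.sum()            # Eq. (3): ex post variance conditional on F_t

def draw_day(rng):
    """One day of returns from Eq. (1): r_{t,i} = mu_t + sigma_{t,i} z_{t,i}."""
    return mu_t + np.sqrt(sigma2) * rng.standard_normal(n_t)

# Conditional on F_t, E[RV_t] = V_t + n_t * mu_t**2, so RV_t targets V_t when mu_t is small.
rv = np.array([np.sum(draw_day(rng) ** 2) for _ in range(20_000)])
print(V_t, rv.mean(), V_t + n_t * mu_t ** 2)
```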
The daily return is

$$r_t = \sum_{i=1}^{n_t} r_{t,i}, \tag{2}$$

and it follows that, conditional on the unknown realized volatility path $\mathcal{F}_t \equiv \{\sigma^2_{t,i}\}_{i=1}^{n_t}$, the ex post variance is

$$V_t \equiv \operatorname{Var}(r_t \mid \mathcal{F}_t) = \sum_{i=1}^{n_t} \sigma^2_{t,i}. \tag{3}$$

In our Bayesian setting, $V_t$ is the target to estimate conditional on the data $\{r_{t,i}\}_{i=1}^{n_t}$.

1.2 A Bayesian Model with Pooling

In this section, we discuss a nonparametric prior for the model of Equation (1) that allows for pooling over common values of $\sigma^2_{t,i}$. The DPM model is a Bayesian nonparametric mixture model that has been used in density estimation and for modeling unknown hierarchical effects, among many other applications. A key advantage of the model is that it naturally incorporates parameter pooling. Our nonparametric model has the following hierarchical form:

$$r_{t,i} \mid \mu_t, \sigma_{t,i} \overset{iid}{\sim} N(\mu_t, \sigma^2_{t,i}), \quad i = 1, \ldots, n_t, \tag{4}$$
$$\sigma^2_{t,i} \mid G_t \overset{iid}{\sim} G_t, \tag{5}$$
$$G_t \mid G_{0,t}, \alpha_t \sim DP(\alpha_t, G_{0,t}), \tag{6}$$
$$G_{0,t} \equiv IG(v_{0,t}, s_{0,t}), \tag{7}$$

where the base measure is the inverse-gamma distribution, denoted $IG(v, s)$, which has mean $s/(v-1)$ for $v > 1$. The return mean $\mu_t$ is assumed to be constant over $i$. The Dirichlet process was formally introduced by Ferguson (1973) and is a distribution over distributions. Note that each day has an independent DPM model, indexed by $t$; we do not pursue pooling over $G_t$.

A draw from $DP(\alpha_t, G_{0,t})$ is an almost surely discrete distribution centered on the base distribution $G_{0,t}$, in the sense that $E[G_t(B)] = G_{0,t}(B)$ for any set $B$. The concentration parameter $\alpha_t > 0$ governs how closely a draw $G_t$ resembles $G_{0,t}$: larger values of $\alpha_t$ lead to $G_t$ being closer to $G_{0,t}$. Since the realizations $G_t$ are discrete, a sample from $\sigma^2_{t,i} \mid G_t \sim G_t$ has a positive probability of repeated values. This has led to the use of DPMs for clustering problems. If $K_n$ is the number of distinct values in a sample of size $n$, then $E[K_n] \approx \alpha_t \log(1 + n/\alpha_t)$. Therefore, the number of distinct values grows logarithmically with sample size, and larger values of $\alpha_t$ tend to lead to more distinct values.
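The logarithmic growth of the number of clusters can be checked numerically. Under the Pólya-urn (Chinese restaurant) representation of the DP with a continuous base measure, the $(i+1)$-th draw takes a new value with probability $\alpha/(\alpha+i)$, independently across $i$, so $E[K_n] = \sum_{i=0}^{n-1} \alpha/(\alpha+i)$ exactly; $\alpha\log(1+n/\alpha)$ is the corresponding integral approximation and a lower bound. The sketch below compares the two with a simulation; the values of $\alpha$ and $n = 78$ are illustrative choices, not the paper's settings.

```python
import numpy as np

def expected_clusters_exact(n, alpha):
    """E[K_n] for n draws from G ~ DP(alpha, G0) with continuous G0:
    draw i+1 is a new value with probability alpha / (alpha + i)."""
    return float(np.sum(alpha / (alpha + np.arange(n))))

def expected_clusters_approx(n, alpha):
    """The approximation E[K_n] ~ alpha * log(1 + n / alpha) quoted in the text."""
    return float(alpha * np.log1p(n / alpha))

def simulate_clusters(n, alpha, reps, rng):
    """Monte Carlo draws of K_n via the Polya-urn (Chinese restaurant) scheme,
    where the new-value indicators are independent Bernoulli(alpha / (alpha + i))."""
    p_new = alpha / (alpha + np.arange(n))
    return (rng.uniform(size=(reps, n)) < p_new).sum(axis=1)

rng = np.random.default_rng(0)
n = 78                                   # e.g. 5-minute returns in one day
for alpha in (0.25, 1.0, 5.0, 50.0):
    k = simulate_clusters(n, alpha, 20_000, rng)
    print(f"alpha={alpha:5.2f}  simulated={k.mean():6.2f}  "
          f"exact={expected_clusters_exact(n, alpha):6.2f}  "
          f"approx={expected_clusters_approx(n, alpha):6.2f}")
```

Small values of $\alpha$ concentrate the 78 returns into a handful of variance clusters (the pooling regime), while large values give almost every return its own variance.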
In fact, as $\alpha_t \to \infty$, $G_t \to G_{0,t}$, which implies that every $r_{t,i}$ has a unique $\sigma^2_{t,i}$ drawn from the centering distribution. In this model, the inverse-gamma distribution is used as the centering base measure because it is the standard conjugate choice and leads to relatively simple computational schemes for inference. In the case $\alpha_t \to \infty$, there is no pooling, and we have a setting very close to the classical counterpart discussed above. For finite $\alpha_t$, however, pooling can take place. The other extreme is complete pooling as $\alpha_t \to 0$, in which case all observations share one common variance, $\sigma^2_{t,i} = \sigma^2_{t,1}$ for all $i$. Since $\alpha_t$ plays an important role in pooling, we place a prior on it and estimate it along with the other model parameters for each day.

A stick-breaking representation (Sethuraman, 1994) of the DPM in Equation (5) is given as follows:⁴

$$p(r_{t,i} \mid \mu_t, \Psi_t, w_t) = \sum_{j=1}^{\infty} w_{t,j}\, N(r_{t,i} \mid \mu_t, \psi^2_{t,j}), \tag{8}$$
$$w_{t,j} = v_{t,j} \prod_{l=1}^{j-1} (1 - v_{t,l}), \tag{9}$$
$$v_{t,j} \overset{iid}{\sim} \text{Beta}(1, \alpha_t), \tag{10}$$

where $N(\cdot \mid \cdot, \cdot)$ denotes the density of the normal distribution, $\Psi_t = \{\psi^2_{t,1}, \psi^2_{t,2}, \ldots\}$ is the set of unique values of $\sigma^2_{t,i}$, $w_t = \{w_{t,1}, w_{t,2}, \ldots\}$, and $w_{t,j}$ is the weight associated with the $j$-th component. This formulation of the model facilitates posterior sampling, which is discussed in the next section.

Because our focus is on intraday returns, the number of observations in a day can be small, especially at lower frequencies such as five minutes. Therefore, the prior should be chosen carefully. It is straightforward to show that the prior predictive distribution of $\sigma^2_{t,i}$ is $G_{0,t}$. For $\sigma^2_{t,i} \sim IG(v_{0,t}, s_{0,t})$, the mean and variance of $\sigma^2_{t,i}$ are

$$E(\sigma^2_{t,i}) = \frac{s_{0,t}}{v_{0,t} - 1} \quad \text{and} \quad \operatorname{var}(\sigma^2_{t,i}) = \frac{s^2_{0,t}}{(v_{0,t} - 1)^2 (v_{0,t} - 2)}. \tag{11}$$

Solving these two equations gives

$$v_{0,t} = \frac{[E(\sigma^2_{t,i})]^2}{\operatorname{var}(\sigma^2_{t,i})} + 2 \quad \text{and} \quad s_{0,t} = E(\sigma^2_{t,i})\,(v_{0,t} - 1). \tag{12}$$

We use the sample statistics $\widehat{\operatorname{var}}(r_{t,i})$ and $\widehat{\operatorname{var}}(r^2_{t,i})$, calculated from three days of intraday returns (days $t-1$, $t$, and $t+1$), to set the values of $E(\sigma^2_{t,i})$ and $\operatorname{var}(\sigma^2_{t,i})$, respectively, and then use Equation (12) to find $v_{0,t}$ and $s_{0,t}$. A shrinkage prior $N(0, v^2)$ is used for $\mu_t$, since $\mu_t$ is expected to be close to zero. The prior variance of $\mu_t$ is adjusted according to the data frequency: $v^2 = \zeta^2/n_t$, where $n_t$ is the number of intraday returns. Finally, $\alpha_t \sim \text{Gamma}(a, b)$. See Table 1 for the prior settings.

For a finite dataset $i = 1, \ldots, n_t$, our target is the following posterior moment:

$$E\left[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}\right] = E\left[\sum_{i=1}^{n_t} \sigma^2_{t,i} \,\middle|\, \{r_{t,i}\}_{i=1}^{n_t}\right]. \tag{13}$$

Note that the posterior mean of $V_t$ can also be regarded as the posterior mean of RV, $RV_t = \sum_{i=1}^{n_t} r^2_{t,i}$, assuming $\mu_t$ is small. In this sense, $RV_t$ treats each $\sigma^2_{t,i}$ separately and corresponds to no pooling.

Mykland and Zhang (2009) discuss the use of blocks of high-frequency data in volatility estimation. Our method can be seen as a generalization of Mykland and Zhang (2009): we allow returns with the same variance to form groups flexibly and do not impose the restriction that the returns in one group are consecutive in time. Another distinction is that our approach allows the group size to vary over clusters and to be determined endogenously, whereas Mykland and Zhang (2009) use one fixed block size for all blocks, preset by the econometrician. Furthermore, unlike standard blocking, the proposed method is invariant to permutations of the returns, since the DPM model assumes exchangeable data.

Table 1 Prior specifications of models

| Model | $\mu_t$ | $\sigma^2_{t,i}$ | $\Theta_t$ | $\alpha_t$ |
|---|---|---|---|---|
| DPM | $N(0, v^2)$ | $IG(v_{0,t}, s_{0,t})$ | – | Gamma(2, 8) |
| DPM-MA(q) | $N(0, v^2)$ | $IG(v_{0,t}, s_{0,t})$ | $N(0, I)\,\mathbf{1}\{\lvert\Theta_t\rvert\}$ | Gamma(2, 8) |

Note: $v_{0,t}$ and $s_{0,t}$ are calculated using Equation (12); $\mathbf{1}\{\lvert\Theta_t\rvert\}$ denotes the invertibility condition for the MA(q) model; $v^2 = \zeta^2/n_t$, where $\zeta^2 = 0.01$ and $n_t$ is the number of intraday returns.
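The moment matching in Equations (11)-(12) and the frequency-adjusted prior variance of $\mu_t$ take only a few lines. The sketch below is one way to compute them, assuming the "sample statistics" are unbiased (ddof = 1) sample variances, which the text does not specify; the simulated returns are hypothetical. For Gaussian returns with roughly constant variance, $\widehat{\operatorname{var}}(r^2) \approx 2[\widehat{\operatorname{var}}(r)]^2$, so $v_{0,t}$ tends to sit near 2.5, a heavy-tailed base measure whose variance is only just finite.

```python
import numpy as np

def calibrate_base_measure(r_prev, r_curr, r_next, zeta2=0.01):
    """Prior hyperparameters for day t from three days of intraday returns.

    E(sigma2) is set to the sample variance of r_{t,i} and var(sigma2) to the
    sample variance of r_{t,i}^2 (ddof=1 is our choice); Eq. (12) then gives
    (v0, s0). Also returns the prior variance v^2 = zeta^2 / n_t of mu_t.
    """
    r3 = np.concatenate([r_prev, r_curr, r_next])
    m = np.var(r3, ddof=1)
    var_sigma2 = np.var(r3 ** 2, ddof=1)
    v0 = m ** 2 / var_sigma2 + 2.0        # Eq. (12), always > 2
    s0 = m * (v0 - 1.0)
    return v0, s0, zeta2 / len(r_curr)

def ig_mean_var(v0, s0):
    """Mean and variance of IG(v0, s0) from Eq. (11); requires v0 > 2."""
    return s0 / (v0 - 1.0), s0 ** 2 / ((v0 - 1.0) ** 2 * (v0 - 2.0))

rng = np.random.default_rng(0)
days = [rng.normal(0.0, 0.1, size=78) for _ in range(3)]   # hypothetical 5-minute returns
v0, s0, v2 = calibrate_base_measure(*days)
print(v0, s0, v2, ig_mean_var(v0, s0))
```

Feeding the calibrated $(v_{0,t}, s_{0,t})$ back through Equation (11) recovers the two sample moments, which is a quick consistency check on any implementation.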
1.3 Model Estimation

Estimation relies on Markov chain Monte Carlo (MCMC) techniques. We apply the slice sampler of Kalli, Griffin, and Walker (2011), along with Gibbs sampling, to estimate the DPM model. The slice sampler provides an elegant way to deal with the infinite number of states in Equation (8). It introduces auxiliary variables $u_{t,1:n_t} = \{u_{t,1}, \ldots, u_{t,n_t}\}$ that randomly truncate the state space to a finite set at each MCMC iteration but marginally deliver draws from the desired posterior. The joint distribution of $r_{t,i}$ and the auxiliary variable $u_{t,i}$ is given by

$$f(r_{t,i}, u_{t,i} \mid w_t, \mu_t, \Psi_t) = \sum_{j=1}^{\infty} \mathbf{1}(u_{t,i} < w_{t,j})\, N(r_{t,i} \mid \mu_t, \psi^2_{t,j}), \tag{14}$$

and integrating out $u_{t,i}$ recovers Equation (8). It is convenient to rewrite the model in terms of a latent state variable $s_{t,i} \in \{1, 2, \ldots\}$ that maps each observation to an associated component and parameter, $\sigma^2_{t,i} = \psi^2_{t,s_{t,i}}$. Observations with a common state share the same variance parameter. For a finite dataset, the number of states (clusters) is finite, and they are ordered $1, \ldots, K$. Note that the number of clusters $K$ is not fixed over the MCMC iterations: a new cluster with variance $\psi^2_{t,K+1} \sim G_{0,t}$ can be created if the existing clusters do not fit an observation well, and clusters sharing a similar variance can be merged into one. The joint posterior is

$$p(\mu_t) \prod_{j=1}^{K} \left[p(\psi^2_{t,j})\right] p(\alpha_t) \prod_{i=1}^{n_t} \mathbf{1}(u_{t,i} < w_{t,s_{t,i}})\, N(r_{t,i} \mid \mu_t, \psi^2_{t,s_{t,i}}). \tag{15}$$

Each MCMC iteration contains the following sampling steps.

1. $\pi(\mu_t \mid r_{t,1:n_t}, \{\psi^2_{t,j}\}_{j=1}^{K}, s_{t,1:n_t}) \propto p(\mu_t) \prod_{i=1}^{n_t} p(r_{t,i} \mid \mu_t, \psi^2_{t,s_{t,i}})$.
2. $\pi(\psi^2_{t,j} \mid r_{t,1:n_t}, s_{t,1:n_t}, \mu_t) \propto p(\psi^2_{t,j}) \prod_{i: s_{t,i} = j} p(r_{t,i} \mid \mu_t, \psi^2_{t,j})$ for $j = 1, \ldots, K$.
3. $\pi(v_{t,j} \mid s_{t,1:n_t}) \propto \text{Beta}(v_{t,j} \mid a_{t,j}, b_{t,j})$ with $a_{t,j} = 1 + \sum_{i=1}^{n_t} \mathbf{1}(s_{t,i} = j)$ and $b_{t,j} = \alpha_t + \sum_{i=1}^{n_t} \mathbf{1}(s_{t,i} > j)$; update $w_{t,j} = v_{t,j} \prod_{l<j} (1 - v_{t,l})$ for $j = 1, \ldots, K$.
4. $\pi(u_{t,i} \mid w_t, s_{t,1:n_t}) \propto \mathbf{1}(0 < u_{t,i} < w_{t,s_{t,i}})$ for $i = 1, \ldots, n_t$.
5. Find the smallest $K$ such that $\sum_{j=1}^{K} w_{t,j} > 1 - \min(u_{t,1:n_t})$.
6. $\Pr(s_{t,i} = j \mid r_{t,1:n_t}, \mu_t, \{\psi^2_{t,j}\}_{j=1}^{K}, u_{t,1:n_t}, K) \propto \mathbf{1}(u_{t,i} < w_{t,j})\, p(r_{t,i} \mid \mu_t, \psi^2_{t,j})$ for $j = 1, \ldots, K$ and $i = 1, \ldots, n_t$.
7. $\pi(\alpha_t \mid K) \propto p(\alpha_t)\, p(K \mid \alpha_t)$.

In the first step, $\mu_t$ is common to all returns, and this is a standard Gibbs step given the conjugate prior. Step 2 is a standard Gibbs step for each variance parameter $\psi^2_{t,j}$ based on the data assigned to cluster $j$. The remaining steps are standard for slice sampling of DPM models. In Step 7, $\alpha_t$ is sampled following Escobar and West (1994). Steps 1–7 give one iteration of the posterior sampler. After discarding a suitable burn-in, $M$ additional samples $\{\theta^{(m)}\}_{m=1}^{M}$ are collected, where $\theta = \{\mu_t, \psi^2_{t,1}, \ldots, \psi^2_{t,K}, s_{t,1:n_t}, \alpha_t\}$. Posterior moments of interest can be estimated from sample averages of the MCMC output.

1.4 Ex Post Variance Estimator

Conditional on the parameter vector $\theta$, the estimate of $V_t$ is

$$E[V_t \mid \theta] = \sum_{i=1}^{n_t} \sigma^2_{t,i} = \sum_{i=1}^{n_t} \psi^2_{t,s_{t,i}}. \tag{16}$$

The posterior mean of $V_t$ is obtained by integrating out all parameters and distributional uncertainty. $E[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}]$ is estimated as

$$\hat{V}_t = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{n_t} \sigma^{2(m)}_{t,i}, \tag{17}$$

where $\sigma^{2(m)}_{t,i} = \psi^{2(m)}_{t,s^{(m)}_{t,i}}$. Other features of the posterior distribution of $V_t$ can be obtained similarly. For instance, a $(1-\alpha)$ probability density interval for $V_t$ is given by the quantiles of $\sum_{i=1}^{n_t} \psi^2_{t,s_{t,i}}$ associated with probabilities $\alpha/2$ and $1 - \alpha/2$. Conditional on the model and prior, these are exact finite-sample estimates, in contrast to the classical estimator, which relies on infill asymptotics to derive confidence intervals. If $\log(V_t)$ is the quantity of interest, the estimator of $E[\log(V_t) \mid \{r_{t,i}\}_{i=1}^{n_t}]$ is

$$\widehat{\log(V_t)} = \frac{1}{M} \sum_{m=1}^{M} \log\left(\sum_{i=1}^{n_t} \sigma^{2(m)}_{t,i}\right). \tag{18}$$

As before, quantiles of the posterior of $\log(V_t)$ can be estimated from the MCMC output.

2 Bayesian Estimator under Microstructure Error

An early approach to dealing with market microstructure noise was to prefilter with a time series model (Andersen et al., 2001; Bollen and Inder, 2002; Maheu and McCurdy, 2002). Hansen, Large, and Lunde (2008) show that prefiltering results in a bias to RV that can be easily corrected. We apply these insights to moving average specifications to account for noisy high-frequency returns. A significant difference is that we allow for heteroskedasticity in the noise process.

2.1 DPM-MA(1) Model

The existence of microstructure noise turns the intraday return process into an autocorrelated process. First, consider the case in which the error is white noise:

$$\tilde{p}_{t,i} = p_{t,i} + \epsilon_{t,i}, \quad \epsilon_{t,i} \sim N(0, \omega^2_{t,i}), \tag{19}$$

where $\tilde{p}_{t,i}$ denotes the observed log price with error, $p_{t,i}$ is the unobserved fundamental log price, and $\omega^2_{t,i}$ is the heteroskedastic noise variance. Given this structure, it can be shown that the observed return series $\tilde{r}_{t,i} = \tilde{p}_{t,i} - \tilde{p}_{t,i-1}$ has nonzero first-order autocorrelation but zero higher-order autocorrelation. That is, $\operatorname{cov}(\tilde{r}_{t,i+1}, \tilde{r}_{t,i}) = -\omega^2_{t,i}$ and $\operatorname{cov}(\tilde{r}_{t,i+j}, \tilde{r}_{t,i}) = 0$ for $j \ge 2$. This suggests a moving average model of order one.⁵ Combining the MA(1) parameterization with our Bayesian nonparametric framework yields the DPM-MA(1) model:

$$\tilde{r}_{t,i} \mid \mu_t, \theta_t, \delta^2_{t,i} = \mu_t + \theta_t \eta_{t,i-1} + \eta_{t,i}, \quad \eta_{t,i} \sim N(0, \delta^2_{t,i}), \tag{20}$$
$$\delta^2_{t,i} \mid G_t \sim G_t, \tag{21}$$
$$G_t \mid G_{0,t}, \alpha_t \sim DP(\alpha_t, G_{0,t}), \tag{22}$$
$$G_{0,t} \equiv IG(v_{0,t}, s_{0,t}). \tag{23}$$

The noise terms are heteroskedastic. Note that the conditional mean of $\tilde{r}_{t,i}$ is not constant but includes a moving average term. The MA parameter $\theta_t$ is constant over $i$ but changes with the day $t$. The prior is $\theta_t \sim N(m_\theta, v^2_\theta)\,\mathbf{1}\{|\theta_t| < 1\}$, which keeps the MA model invertible. The error term $\eta_{t,0}$ is assumed to be zero. Other model settings remain the same as for the DPM in Section 1.
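Since the DPM-MA(1) model reuses the Section 1 machinery, it may help to see the no-noise slice sampler (Steps 1–7 of Section 1.3) and the summaries (16)-(18) in one place. The sketch below is our own NumPy illustration, not the authors' implementation. Assumptions: Gamma(2, 8) is read as shape 2 and rate 8; $\alpha_t$ is updated with the standard Escobar-West auxiliary-variable scheme; the chain starts from a single cluster; trailing empty components are dropped after Step 6; labels are 0-based; and the input values in the usage example are hypothetical.

```python
import numpy as np

def dpm_slice_sampler(r, v0, s0, tau2, a=2.0, b=8.0, n_iter=6000, burn=1000, seed=0):
    """Slice sampler sketch for the DPM of Eqs. (4)-(7), following Steps 1-7.
    Returns post-burn-in draws of V_t = sum_i sigma2_{t,i}."""
    rng = np.random.default_rng(seed)
    n = r.size
    s = np.zeros(n, dtype=int)            # cluster labels s_{t,i} (0-based); one cluster to start
    psi2 = np.array([np.var(r)])          # unique variances psi2_{t,j}
    mu, alpha = 0.0, a / b                # starting values (our choice)
    draws = []
    for it in range(n_iter):
        K = psi2.size
        # Step 1: mu | rest, conjugate normal update under the N(0, tau2) prior.
        prec = 1.0 / tau2 + np.sum(1.0 / psi2[s])
        mu = rng.normal(np.sum(r / psi2[s]) / prec, np.sqrt(1.0 / prec))
        # Step 2: psi2_j | rest, conjugate IG update; empty clusters are drawn from G0.
        nj = np.bincount(s, minlength=K)
        ssq = np.bincount(s, weights=(r - mu) ** 2, minlength=K)
        psi2 = 1.0 / rng.gamma(v0 + nj / 2.0, 1.0 / (s0 + ssq / 2.0))
        # Step 3: v_j ~ Beta(1 + n_j, alpha + #{i: s_i > j}), then stick-breaking weights w_j.
        v = rng.beta(1.0 + nj, alpha + (n - np.cumsum(nj)))
        w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
        # Step 4: slice variables u_i ~ U(0, w_{s_i}).
        u = rng.uniform(0.0, w[s])
        # Step 5: add prior components until leftover stick mass < min(u),
        # i.e. until sum(w) > 1 - min(u).
        rest = np.prod(1.0 - v)
        while rest >= u.min() and rest > 0.0:
            v_new = rng.beta(1.0, alpha)
            w = np.append(w, v_new * rest)
            rest *= 1.0 - v_new
            psi2 = np.append(psi2, 1.0 / rng.gamma(v0, 1.0 / s0))
        # Step 6: P(s_i = j) proportional to 1(u_i < w_j) N(r_i | mu, psi2_j).
        logp = -0.5 * np.log(psi2) - 0.5 * (r[:, None] - mu) ** 2 / psi2
        logp = np.where(u[:, None] <= w, logp, -np.inf)   # <= only guards float ties
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        c = np.cumsum(p, axis=1)
        s = np.argmax(c > rng.uniform(size=(n, 1)) * c[:, -1:], axis=1)
        psi2 = psi2[: s.max() + 1]        # drop trailing empty components
        # Step 7: alpha | K via Escobar-West, Gamma(a, b) prior read as shape a, rate b.
        k = np.unique(s).size
        eta = rng.beta(alpha + 1.0, n)
        rate = b - np.log(eta)
        odds = (a + k - 1.0) / (n * rate)
        shape = a + k if rng.uniform() < odds / (1.0 + odds) else a + k - 1.0
        alpha = rng.gamma(shape, 1.0 / rate)
        if it >= burn:
            draws.append(np.sum(psi2[s]))  # Eq. (16) evaluated at this draw
    return np.array(draws)

rng = np.random.default_rng(42)
sigma2 = np.repeat([4e-3, 1e-3], 39)              # hypothetical two-regime day, n_t = 78
r = rng.normal(0.0, np.sqrt(sigma2))
draws = dpm_slice_sampler(r, v0=2.5, s0=1.5 * 2.5e-3, tau2=0.01 / r.size,
                          n_iter=3000, burn=500)
V_hat = draws.mean()                              # Eq. (17)
lo, hi = np.quantile(draws, [0.025, 0.975])       # 0.95 density interval
log_V_hat = np.log(draws).mean()                  # Eq. (18)
print(V_hat, (lo, hi), log_V_hat, np.sum(r ** 2))
```

For the DPM-MA(1) model, Steps 1 and 2 of this sketch would be replaced by Metropolis-Hastings updates for $\mu_t$ and $\theta_t$, and the remaining updates would carry over with $r_{t,i} - \mu_t$ replaced by the MA residuals $\eta_{t,i}$.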
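Later we show how estimates from this specification can be used to recover an estimate of the ex post variance $V_t$ of the true return process.

2.2 DPM-MA(q) Model

For lower sampling frequencies, such as one minute or more, first-order autocorrelation is the main effect of market microstructure noise. As such, the MA(1) model will be sufficient for many applications. At higher sampling frequencies, however, the dependence may be stronger. To allow for a more complex effect of the noise process on returns, consider MA(q − 1) noise:

$$\tilde{p}_{t,i} = p_{t,i} + \epsilon_{t,i} - \rho_1 \epsilon_{t,i-1} - \cdots - \rho_{q-1} \epsilon_{t,i-q+1}, \quad \epsilon_{t,i} \sim N(0, \omega^2_{t,i}). \tag{24}$$

For returns, this leads to the following DPM-MA(q) model:

$$\tilde{r}_{t,i} \mid \mu_t, \{\theta_{t,j}\}_{j=1}^{q}, \delta^2_{t,i} = \mu_t + \sum_{j=1}^{q} \theta_{t,j} \eta_{t,i-j} + \eta_{t,i}, \quad \eta_{t,i} \sim N(0, \delta^2_{t,i}), \tag{25}$$
$$\delta^2_{t,i} \mid G_t \sim G_t, \tag{26}$$
$$G_t \mid G_{0,t}, \alpha_t \sim DP(\alpha_t, G_{0,t}), \tag{27}$$
$$G_{0,t} \equiv IG(v_{0,t}, s_{0,t}). \tag{28}$$

The joint prior of $(\theta_{t,1}, \ldots, \theta_{t,q})$ is $N(M_\Theta, V_\Theta)\,\mathbf{1}\{\Theta\}$,⁶ and $(\eta_{t,0}, \ldots, \eta_{t,-(q-1)}) = (0, \ldots, 0)$.

2.3 Model Estimation

We discuss the estimation of the DPM-MA(1) model; the approach extends easily to the DPM-MA(q) model. The main difference from the DPM model is that the conditional mean parameters $\mu_t$ and $\theta_t$ require a Metropolis-Hastings (MH) step to sample from their conditional posteriors. The remaining MCMC steps are essentially the same. As before, let $\psi^2_{t,j}$ denote the unique values of $\delta^2_{t,i}$; each MCMC iteration then samples from the following conditional distributions.

1. $\pi(\mu_t \mid \tilde{r}_{t,1:n_t}, \{\psi^2_{t,j}\}_{j=1}^{K}, \theta_t, s_{t,1:n_t}) \propto p(\mu_t) \prod_{i=1}^{n_t} N(\tilde{r}_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi^2_{t,s_{t,i}})$.
2. $\pi(\theta_t \mid \tilde{r}_{t,1:n_t}, \mu_t, \{\psi^2_{t,j}\}_{j=1}^{K}, s_{t,1:n_t}) \propto p(\theta_t) \prod_{i=1}^{n_t} p(\tilde{r}_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi^2_{t,s_{t,i}})$.
3. $\pi(\psi^2_{t,j} \mid \tilde{r}_{t,1:n_t}, \mu_t, \theta_t, s_{t,1:n_t}) \propto p(\psi^2_{t,j}) \prod_{i: s_{t,i} = j} p(\tilde{r}_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi^2_{t,j})$ for $j = 1, \ldots, K$.
4. $\pi(v_{t,j} \mid s_{t,1:n_t}) \propto \text{Beta}(v_{t,j} \mid a_{t,j}, b_{t,j})$ with $a_{t,j} = 1 + \sum_{i=1}^{n_t} \mathbf{1}(s_{t,i} = j)$ and $b_{t,j} = \alpha_t + \sum_{i=1}^{n_t} \mathbf{1}(s_{t,i} > j)$; update $w_{t,j} = v_{t,j} \prod_{l<j} (1 - v_{t,l})$ for $j = 1, \ldots, K$.
5. $\pi(u_{t,i} \mid w_t, s_{t,1:n_t}) \propto \mathbf{1}(0 < u_{t,i} < w_{t,s_{t,i}})$ for $i = 1, \ldots, n_t$.
6. Find the smallest $K$ such that $\sum_{j=1}^{K} w_{t,j} > 1 - \min(u_{t,1:n_t})$.
7. $\Pr(s_{t,i} = j \mid \tilde{r}_{t,1:n_t}, \mu_t, \theta_t, \{\psi^2_{t,j}\}_{j=1}^{K}, u_{t,1:n_t}, K) \propto \mathbf{1}(u_{t,i} < w_{t,j})\, N(\tilde{r}_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi^2_{t,j})$ for $j = 1, \ldots, K$ and $i = 1, \ldots, n_t$.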
8. $\pi(\alpha_t \mid K) \propto p(\alpha_t)\, p(K \mid \alpha_t)$.

In Steps 1 and 2, the likelihood requires the sequential calculation of the lagged error as $\eta_{t,i-1} = \tilde{r}_{t,i-1} - \mu_t - \theta_t \eta_{t,i-2}$, which precludes a Gibbs sampling step. Therefore, $\mu_t$ and $\theta_t$ are sampled using an MH step with a random walk proposal. The proposal is calibrated to achieve an acceptance rate between 0.3 and 0.5.

2.4 Ex Post Variance Estimator under Microstructure Error

Hansen, Large, and Lunde (2008) showed that prefiltering with an MA model results in a bias in the RV estimator.⁷ In the Appendix, it is shown that the Hansen, Large, and Lunde (2008) bias correction provides an accurate adjustment to our Bayesian estimator in the context of heteroskedastic noise. From the DPM-MA(1) model, the posterior mean of $V_t$ under independent microstructure error is

$$\hat{V}_{t,MA(1)} = \frac{1}{M} \sum_{m=1}^{M} \left(1 + \theta^{(m)}_t\right)^2 \sum_{i=1}^{n_t} \delta^{2(m)}_{t,i}, \tag{29}$$

where $\delta^{2(m)}_{t,i} = \psi^{2(m)}_{t,s^{(m)}_{t,i}}$. The log of $V_t$, the square root of $V_t$, and density intervals can be estimated in the same way as for the Bayesian nonparametric ex post variance estimator without microstructure error. In the case of higher-order autocorrelation, the adjusted posterior estimate of $V_t$ from the DPM-MA(q) model is

$$\hat{V}_{t,MA(q)} = \frac{1}{M} \sum_{m=1}^{M} \left(1 + \sum_{j=1}^{q} \theta^{(m)}_{t,j}\right)^2 \sum_{i=1}^{n_t} \delta^{2(m)}_{t,i}. \tag{30}$$

Next, we consider simulation evidence on these estimators.

3 Consistency

Each of the previously discussed estimators (posterior means) of the integrated volatility (3) can be shown fairly easily to be consistent as the sampling frequency increases. The posterior mean of $V_t$ can be shown to equal a consistent estimator plus a bias term that goes to zero in probability as $n_t \to \infty$. We provide the proof for the case with no market microstructure noise.

Theorem 1. Suppose that $p$ is an arbitrage-free price process with zero mean, that $\sup_j(\tau_{t,j+1} - \tau_{t,j}) \to 0$ as $n_t \to \infty$, and that $s_{0,n_t} = O(n_t^{-\alpha})$ for some $\alpha > 0$. Then $E[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}]$ is a consistent estimator of the integrated volatility.

See the Appendix for the proof.
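Returning to the DPM-MA(1) sampler of Section 2.3, the residual recursion that rules out a Gibbs step for $\mu_t$ and $\theta_t$, a random-walk MH update for $\theta_t$, and the corrected estimator (29) can be sketched as below. This is our own illustration under simplifying assumptions: the variances $\delta^2_{t,i}$ are held fixed rather than drawn from the DPM, $\mu_t$ is fixed at zero in the usage example, the prior hyperparameters $m_\theta = 0$ and $v^2_\theta = 1$ are hypothetical, and the step size 0.05 is arbitrary rather than tuned to the 0.3 to 0.5 acceptance rate mentioned in the text.

```python
import numpy as np

def ma1_residuals(r_obs, mu, theta):
    """eta_{t,i} = r_i - mu - theta * eta_{t,i-1}, starting from eta_{t,0} = 0 (Eq. 20)."""
    eta = np.empty_like(r_obs)
    prev = 0.0
    for i, ri in enumerate(r_obs):
        prev = ri - mu - theta * prev
        eta[i] = prev
    return eta

def ma1_loglik(r_obs, mu, theta, delta2):
    """Gaussian log-likelihood of Eq. (20) given the variances delta2_{t,i}."""
    eta = ma1_residuals(r_obs, mu, theta)
    return -0.5 * np.sum(np.log(2.0 * np.pi * delta2) + eta ** 2 / delta2)

def mh_step_theta(r_obs, mu, theta, delta2, rng, step=0.05, m_theta=0.0, v2_theta=1.0):
    """One random-walk MH update for theta under N(m_theta, v2_theta) 1{|theta| < 1}."""
    prop = theta + step * rng.standard_normal()
    if abs(prop) >= 1.0:                   # prior density is zero outside the invertible region
        return theta
    def log_post(th):
        return ma1_loglik(r_obs, mu, th, delta2) - 0.5 * (th - m_theta) ** 2 / v2_theta
    return prop if np.log(rng.uniform()) < log_post(prop) - log_post(theta) else theta

def v_hat_ma1(theta_draws, delta2_sum_draws):
    """Eq. (29): average over draws of (1 + theta^(m))^2 * sum_i delta2_{t,i}^(m)."""
    return float(np.mean((1.0 + theta_draws) ** 2 * delta2_sum_draws))

# Usage on simulated MA(1) returns with known, homoskedastic delta2 (illustrative only).
rng = np.random.default_rng(3)
n, theta_true, d2 = 500, -0.4, 1e-4
eta_true = rng.normal(0.0, np.sqrt(d2), size=n)
r_obs = eta_true + theta_true * np.concatenate(([0.0], eta_true[:-1]))
delta2 = np.full(n, d2)
theta, chain = 0.0, []
for _ in range(1500):
    theta = mh_step_theta(r_obs, 0.0, theta, delta2, rng)
    chain.append(theta)
chain = np.array(chain[500:])              # discard burn-in
print(chain.mean(), v_hat_ma1(chain, np.full(chain.size, delta2.sum())))
```

Because each residual depends on the previous one, the likelihood has to be recomputed in full for every proposed $(\mu_t, \theta_t)$, which is what makes the MH step necessary.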
A similar argument can be used for the MA processes, with the residuals from the MA process replacing the returns. Hansen, Large, and Lunde (2008) argue that scaling $RV_t$ avoids the inconsistency of realized volatility under market microstructure noise.

4 Simulation Results

4.1 Data Generating Process

We consider four DGPs commonly used in the literature. The first is the GARCH(1,1) diffusion introduced by Andersen and Bollerslev (1998). The log price follows

$$dp(t) = \mu\, dt + \sigma(t)\, dW_p(t), \tag{31}$$
$$d\sigma^2(t) = \alpha\left(\beta - \sigma^2(t)\right) dt + \gamma \sigma^2(t)\, dW_\sigma(t), \tag{32}$$

where $W_p(t)$ and $W_\sigma(t)$ are two independent Wiener processes. The parameter values follow Andersen and Bollerslev (1998): $\mu = 0.03$, $\alpha = 0.035$, $\beta = 0.636$, and $\gamma = 0.144$, which were estimated using foreign exchange data. Following Huang and Tauchen (2005), the second and third DGPs are a one-factor stochastic volatility diffusion (SV1F) and a one-factor stochastic volatility diffusion with jumps (SV1FJ). SV1F is given by

$$dp(t) = \mu\, dt + \exp\left(\beta_0 + \beta_1 v(t)\right) dW_p(t), \tag{33}$$
$$dv(t) = \alpha v(t)\, dt + dW_v(t), \tag{34}$$

and the price process for SV1FJ is

$$dp(t) = \mu\, dt + \exp\left(\beta_0 + \beta_1 v(t)\right) dW_p(t) + dJ(t), \tag{35}$$

where $\operatorname{corr}(dW_p(t), dW_v(t)) = \rho$ and $J(t)$ is a Poisson process with jump intensity $\lambda$ and jump size $\delta \sim N(0, \sigma^2_J)$. We adopt the parameter settings of Huang and Tauchen (2005) and set $\mu = 0.03$, $\beta_0 = 0.0$, $\beta_1 = 0.125$, $\alpha = -0.1$, $\rho = -0.62$, $\lambda = 0.014$, and $\sigma^2_J = 0.5$. The final DGP is the two-factor stochastic volatility diffusion (SV2F) of Chernov et al. (2003) and Huang and Tauchen (2005):⁸

$$dp(t) = \mu\, dt + \text{s-exp}\left(\beta_0 + \beta_1 v_1(t) + \beta_2 v_2(t)\right) dW_p(t), \tag{36}$$
$$dv_1(t) = \alpha_1 v_1(t)\, dt + dW_{v_1}(t), \tag{37}$$
$$dv_2(t) = \alpha_2 v_2(t)\, dt + \left(1 + \psi v_2(t)\right) dW_{v_2}(t), \tag{38}$$

where $\operatorname{corr}(dW_p(t), dW_{v_1}(t)) = \rho_1$ and $\operatorname{corr}(dW_p(t), dW_{v_2}(t)) = \rho_2$. The parameter values in SV2F, taken from Huang and Tauchen (2005), are $\mu = 0.03$, $\beta_0 = -1.2$, $\beta_1 = 0.04$, $\beta_2 = 1.5$, $\alpha_1 = -0.00137$, $\alpha_2 = -1.386$, $\psi = 0.25$, and $\rho_1 = \rho_2 = -0.3$. Data are simulated for the four DGPs using a basic Euler discretization at one-second frequency.
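The Euler scheme is straightforward for the GARCH(1,1) diffusion in Equations (31)-(32). The sketch below simulates one day at one-second steps and then samples returns every 300, 60, 30, or 10 seconds. It is our illustration rather than the authors' code: it assumes time is measured in days (so $dt = 1/23400$), carries the last variance of one day into the next, starts from the long-run mean $\beta$, and floors the variance at a tiny positive value in case an Euler step overshoots below zero.

```python
import numpy as np

def simulate_garch_diffusion_day(sigma2_start, rng, mu=0.03, alpha=0.035, beta=0.636,
                                 gamma=0.144, n_sec=23_400):
    """Euler scheme for Eqs. (31)-(32) over one 6.5-hour day at one-second steps.
    Time is measured in days, so dt = 1 / n_sec (our assumption)."""
    dt = 1.0 / n_sec
    dW_p = rng.standard_normal(n_sec) * np.sqrt(dt)
    dW_s = rng.standard_normal(n_sec) * np.sqrt(dt)     # independent of dW_p
    sigma2 = np.empty(n_sec)
    s2 = sigma2_start
    for i in range(n_sec):
        sigma2[i] = s2
        s2 += alpha * (beta - s2) * dt + gamma * s2 * dW_s[i]
        s2 = max(s2, 1e-12)                              # guard against a step below zero
    r_sec = mu * dt + np.sqrt(sigma2) * dW_p             # one-second log returns
    log_p = np.concatenate(([0.0], np.cumsum(r_sec)))    # log-price path within the day
    return log_p, sigma2, s2                             # s2 seeds day t + 1

def sampled_returns(log_p, step):
    """Intraday returns from every `step`-th second (300, 60, 30, or 10)."""
    return np.diff(log_p[::step])

rng = np.random.default_rng(0)
s2 = 0.636                                  # start at the long-run mean beta (our choice)
for t in range(3):
    log_p, sigma2, s2 = simulate_garch_diffusion_day(s2, rng)
    qv = np.sum(np.diff(log_p) ** 2)        # one-second proxy for QV_t
    rv5 = np.sum(sampled_returns(log_p, 300) ** 2)
    print(t, round(qv, 4), round(rv5, 4), len(sampled_returns(log_p, 300)))
```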
Assuming a daily trading session of 6.5 hours (23,400 seconds), we first simulate the log price every second. We then compute 5-minute, 1-minute, 30-second, and 10-second intraday returns by taking differences every 300, 60, 30, and 10 steps, respectively. The initial volatility level on day $t$ (for example, $v_{1t}$ and $v_{2t}$ in SV2F) is set equal to the last volatility value of the previous day, $t-1$. $T = 5000$ days of intraday returns are simulated from each of the four DGPs and used to report the sampling properties of the volatility estimators. In each case, 500 initial days are dropped from the simulation to remove dependence on the start-up conditions.

4.1.1 Independent noise

Following Barndorff-Nielsen et al. (2008), log prices with independent noise are simulated as

$$\tilde{p}_{t,i} = p_{t,i} + \epsilon_{t,i}, \quad \epsilon_{t,i} \sim N(0, \sigma^2_\omega), \quad \sigma^2_\omega = \xi^2 \operatorname{var}(r_t). \tag{39}$$

The error term is added every second to the log prices simulated from the four DGPs. The variance of the microstructure error is proportional to the daily variance calculated from the pure (noise-free) daily returns. We set the noise-to-signal ratio to $\xi^2 = 0.001$, the same value used in Barndorff-Nielsen et al. (2008) and close to the value in Bandi and Russell (2008).

4.1.2 Dependent noise

Following Hansen, Large, and Lunde (2008), we simulate log prices with dependent noise as

$$\tilde{p}_{t,i} = p_{t,i} + \epsilon_{t,i}, \quad \epsilon_{t,i} \sim N(\mu_{\epsilon_{t,i}}, \sigma^2_\omega), \quad \mu_{\epsilon_{t,i}} = \sum_{l=1}^{\varphi} \left(1 - \frac{l}{\varphi}\right)\left(p_{t,i-l} - p_{t,i-1-l}\right), \quad \sigma^2_\omega = \xi^2 \operatorname{var}(r_t), \tag{40}$$

where $\varphi = 20$, which makes the error term correlated with returns over the past 20 seconds (steps). If past returns are positive (negative), the noise term tends to be positive (negative). All other settings, such as $\sigma^2_\omega$ and $\xi^2$, are the same as in the independent error case.

4.2 True Volatility and Comparison Criteria

The RV of Andersen et al. (2003) and Barndorff-Nielsen and Shephard (2002), the flat-top realized kernel (RKF) of Barndorff-Nielsen et al. (2008), and the non-negative realized kernel (RKN) of Barndorff-Nielsen et al. (2011) serve as the benchmarks for comparison. Section A.1 of the Appendix provides a brief review of these estimators. We assess the ability of several ex post variance estimators to estimate the daily quadratic variation ($QV_t$) of the four DGPs. $QV_t$ is computed as the sum of the squared intraday pure returns at the highest frequency (one second):

$$\sigma^2_t \equiv \sum_{i=1}^{23400} r^2_{t,i}. \tag{41}$$

The competing ex post daily variance estimators, generically labeled $\hat{\sigma}^2_t$, are compared on the basis of root mean squared error (RMSE) and bias, defined as

$$\text{RMSE}(\hat{\sigma}^2_t) = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(\hat{\sigma}^2_t - \sigma^2_t\right)^2}, \tag{42}$$
$$\text{Bias}(\hat{\sigma}^2_t) = \frac{1}{T} \sum_{t=1}^{T} \left(\hat{\sigma}^2_t - \sigma^2_t\right). \tag{43}$$

The coverage probability estimates report the frequency with which the confidence intervals, or the density intervals of the Bayesian nonparametric estimators, contain the true ex post variance $\sigma^2_t$. The 95% confidence intervals of $RV_t$, $RK^F_t$, and $RK^N_t$ rely on their asymptotic distributions, which are given in Equations (49), (52), and (56). We take the bias into account when computing the 95% confidence interval using $RK^N_t$. The estimation of integrated quarticity is crucial in determining the confidence intervals for the realized kernels. We consider two versions of quarticity. The first uses the true (infeasible) $IQ_t$, calculated as

$$IQ^{\text{true}}_t = 23400 \sum_{i=1}^{23400} \sigma^4_{t,i}, \tag{44}$$

where $\sigma^2_{t,i}$ refers to the spot variance simulated at the highest frequency. The second estimates $IQ_t$ using the tri-power quarticity ($TPQ_t$) estimator; see formula (54). The confidence interval based on $IQ^{\text{true}}_t$ is the infeasible case, and the confidence interval calculated using $TPQ_t$ is the feasible case. For each day, 5000 MCMC draws are collected after a burn-in of 1000 to compute the Bayesian posterior quantities. A 0.95 density interval is given by the 0.025 and 0.975 sample quantiles of the MCMC draws of $\sum_{i=1}^{n_t} \sigma^2_{t,i}$.

4.3 No Microstructure Noise

In Table 2, $\hat{V}_t$ has a slightly smaller RMSE in twelve out of sixteen categories. A paired t-test shows that most of these differences in MSE are significant as well.
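The comparison criteria can be stated compactly in code. The sketch below implements Equations (42)-(43), the coverage frequency, and a paired t-test on squared errors of the kind reported in Table 2. It is our illustration: the synthetic "true" variances and the two estimators are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for a large number of days such as T = 5000.

```python
import math
import numpy as np

def rmse(est, true):
    """Eq. (42)."""
    return float(np.sqrt(np.mean((est - true) ** 2)))

def bias(est, true):
    """Eq. (43)."""
    return float(np.mean(est - true))

def coverage(lower, upper, true):
    """Share of days whose interval [lower, upper] contains the true variance."""
    return float(np.mean((lower <= true) & (true <= upper)))

def paired_mse_ttest(est_a, est_b, true):
    """Paired t-test of equal mean squared error for two estimators.
    Two-sided p-value from a normal approximation (our simplification, fine for large T)."""
    d = (est_a - true) ** 2 - (est_b - true) ** 2
    t_stat = d.mean() / (d.std(ddof=1) / math.sqrt(d.size))
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return float(t_stat), p_value

# Illustration with synthetic "true" variances and two hypothetical estimators.
rng = np.random.default_rng(0)
T = 5000
true = rng.gamma(2.0, 0.5, size=T)
est_a = true + rng.normal(0.0, 0.1, size=T)
est_b = true + rng.normal(0.0, 0.2, size=T)
print(rmse(est_a, true), rmse(est_b, true), bias(est_a, true))
print(paired_mse_ttest(est_a, est_b, true))
```

A negative t-statistic means the first estimator has the smaller mean squared error.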
For example, with five-minute SV2F data, $\hat{V}_t$ reduces the RMSE by more than 5% relative to $RV_t$. This is remarkable given that $RV_t$ is the gold standard in the no-noise setting. The bias and the coverage probabilities (not reported) of the 95% confidence intervals of $RV_t$ and the 0.95 density intervals of $\hat{V}_t$ show that both estimators perform well. In the absence of microstructure noise, the Bayesian nonparametric estimator is competitive with its classical counterpart $RV_t$: $\hat{V}_t$ offers a smaller estimation error and better finite sample performance than $RV_t$ when the data frequency is low, and the performance of both estimators improves as the sampling frequency increases.

Table 2 RMSE of $RV_t$, blocked $RV_t$, and $\hat{V}_t$ (no microstructure noise case)

Data freq.   Estimator               GARCH       SV1F        SV1FJ       SV2F
5-minute     $RV_t$                  0.12352     0.21226     0.21471     0.45601
             $RV_t$ (M = 26 block)   0.12485     0.21588     0.21680     0.44368
             $\hat{V}_t$             0.11866***  0.20116***  0.20659***  0.43095**
1-minute     $RV_t$                  0.05368     0.09283     0.09771     0.23296
             $RV_t$ (M = 78 block)   0.05397     0.09335     0.09875     0.23120
             $\hat{V}_t$             0.05323***  0.09190***  0.10051     0.22802*
30-second    $RV_t$                  0.03886     0.06530     0.06741*    0.14178
             $RV_t$ (M = 130 block)  0.03906     0.06539     0.06771     0.14184
             $\hat{V}_t$             0.03867***  0.06495***  0.07276     0.13970
10-second    $RV_t$                  0.02177     0.03601     0.03662*    0.09535
             $RV_t$ (M = 260 block)  0.02176     0.03601     0.03677     0.09587
             $\hat{V}_t$             0.02171**   0.03589***  0.04722     0.09596

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$, blocked $RV_t$, and the Bayesian nonparametric estimator $\hat{V}_t$ under different frequencies and DGPs. Microstructure noise is not considered. A paired t-test is used to test whether the difference between the means of $(RV_t - V_t)^2$ and $(\hat{V}_t - V_t)^2$ is equal to zero. Bold entries denote the smallest values. Significance at *p < 0.05; **p < 0.01; ***p < 0.001.

We also compare the Bayesian nonparametric estimator with the blocked RV of Mykland and Zhang (2009) (see footnote 9). Table 2 reports the RMSE of blocked RV with block size $n_t^{3/4}$ (the resulting value of M is shown in the table). The RMSE of $\hat{V}_t$ remains the lowest in twelve out of sixteen cases.

We also conduct a robustness analysis to check how sensitive the results are to the choice of priors. We consider different hyperparameters for the priors of $\mu_t$ and $\alpha$, as well as a calibration of the prior of $\sigma_{t,i}^2$ based on only one day of data (Equation (12)). Table 3 summarizes the RMSE of $\hat{V}_t$ under the alternative priors for the SV2F data (the entries with $v_{0,t}^{1d}, s_{0,t}^{1d}$ correspond to one day of prior calibration). None of the results changes by more than 1% under the alternative priors, and $\hat{V}_t$ consistently outperforms $RV_t$ in the 5-minute, 1-minute, and 30-second categories.
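The values of M in Table 2 (26, 78, 130, and 260 for $n_t$ = 78, 390, 780, and 2340 returns in a 6.5-hour day) coincide with the largest divisor of $n_t$ that does not exceed $n_t^{3/4}$. The sketch below reproduces them under that assumption; the divisor rule is our inference from the table rather than a rule stated in the text, the function names are hypothetical, and the blocked estimator itself is not implemented.

```python
SECONDS_PER_DAY = 23400  # 6.5-hour trading day, the one-second grid of Eq. (41)


def returns_per_day(interval_seconds):
    """Number of intraday returns n_t at a given sampling interval in seconds."""
    return SECONDS_PER_DAY // interval_seconds


def block_parameter(n):
    """Largest divisor of n not exceeding n ** 0.75 (a rule inferred from Table 2)."""
    cap = n ** 0.75
    return max(d for d in range(1, n + 1) if n % d == 0 and d <= cap)


if __name__ == "__main__":
    for label, seconds in [("5-minute", 300), ("1-minute", 60), ("30-second", 30), ("10-second", 10)]:
        n = returns_per_day(seconds)
        print(f"{label}: n_t = {n}, n_t^(3/4) = {n ** 0.75:.1f}, M = {block_parameter(n)}")
```

Requiring M to divide $n_t$ keeps every block the same length, which is one plausible reason the reported values fall slightly below $n_t^{3/4}$ at the higher frequencies.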
Table 3 Prior robustness check

Estimator     Prior of $\mu_t$   Prior of $\alpha_t$   Prior of $\sigma_{t,i}^2$          RMSE

Panel A: 5-minute return
$RV_t$        –                  –                     –                                  0.45601
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.43095
$\hat{V}_t$   $N(0, 0.05n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.42919
$\hat{V}_t$   $N(0, 0.002n_t)$   Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.42953
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(2,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.42975
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(8,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.42782
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{1d}, s_{0,t}^{1d})$   0.43199

Panel B: 1-minute return
$RV_t$        –                  –                     –                                  0.23296
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.22802
$\hat{V}_t$   $N(0, 0.05n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.22739
$\hat{V}_t$   $N(0, 0.002n_t)$   Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.22686
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(2,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.22691
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(8,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.22613
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{1d}, s_{0,t}^{1d})$   0.22695

Panel C: 30-second return
$RV_t$        –                  –                     –                                  0.14178
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.13970
$\hat{V}_t$   $N(0, 0.05n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.14059
$\hat{V}_t$   $N(0, 0.002n_t)$   Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.14012
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(2,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.14096
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(8,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.14003
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{1d}, s_{0,t}^{1d})$   0.14029

Panel D: 10-second return
$RV_t$        –                  –                     –                                  0.09535
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.09596
$\hat{V}_t$   $N(0, 0.05n_t)$    Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.09631
$\hat{V}_t$   $N(0, 0.002n_t)$   Gamma(4,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.09546
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(2,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.09610
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(8,8)            $IG(v_{0,t}^{3d}, s_{0,t}^{3d})$   0.09644
$\hat{V}_t$   $N(0, 0.01n_t)$    Gamma(4,8)            $IG(v_{0,t}^{1d}, s_{0,t}^{1d})$   0.09528

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$ and the Bayesian nonparametric estimator $\hat{V}_t$ under different priors. $v_{0,t}^{3d}$ denotes that 3 days of data are used to calibrate the prior parameters, while $v_{0,t}^{1d}$ denotes that one day of data is used. The data are generated from SV2F. Microstructure noise is not considered.
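The priors varied in Table 3 enter the model in two places, which the following sketch illustrates under stated assumptions. Under a Dirichlet process prior, the expected number of distinct clusters among n observations is $E[K \mid \alpha, n] = \sum_{i=1}^{n} \alpha/(\alpha + i - 1)$, so a larger concentration parameter α implies less pooling. For a cluster of returns that share one variance, an inverse-gamma prior gives a conjugate update; here IG(shape, scale) with zero-mean normal returns is assumed, which may differ from the paper's parameterization $IG(v_{0,t}, s_{0,t})$ and ignores $\mu_t$. All function names are hypothetical, and this is not the paper's sampler.

```python
def expected_clusters(alpha, n):
    """E[K | alpha, n] = sum_{i=1}^{n} alpha / (alpha + i - 1) under a DP prior."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def ig_posterior(shape, scale, cluster_returns):
    """Conjugate update of sigma^2 ~ IG(shape, scale) given zero-mean normal returns."""
    m = len(cluster_returns)
    ss = sum(r * r for r in cluster_returns)
    return shape + m / 2.0, scale + ss / 2.0


def ig_mean(shape, scale):
    """Mean of an IG(shape, scale) distribution; finite only when shape > 1."""
    if shape <= 1.0:
        raise ValueError("the inverse-gamma mean requires shape > 1")
    return scale / (shape - 1.0)


def pooled_variance(clusters, shape, scale):
    """Sum over clusters of cluster size times the posterior mean of that cluster's variance."""
    total = 0.0
    for returns in clusters:
        post_shape, post_scale = ig_posterior(shape, scale, returns)
        total += len(returns) * ig_mean(post_shape, post_scale)
    return total


if __name__ == "__main__":
    # Assumption: Gamma(a, 8) is shape a, rate 8, so the prior means of alpha are a / 8.
    for a in (2, 4, 8):
        print(f"alpha = {a / 8}: E[K] for 390 one-minute returns = {expected_clusters(a / 8, 390):.2f}")
```

If Gamma(a, b) in Table 3 is instead read with a scale parameter, the implied values of α, and hence the expected number of clusters, are much larger.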
We also check whether the benefits of pooling persist when returns are irregularly spaced. Following Barndorff-Nielsen et al. (2011), the arrival times of observed prices are simulated from a Poisson process. Table 4 shows that the Bayesian nonparametric estimator $\hat{V}_t$ has a lower RMSE than $RV_t$ in fourteen of the sixteen cases for this irregularly spaced DGP; the exceptions are the SV1FJ data at λ = 30 and λ = 10.

Table 4 RMSE of $RV_t$ and $\hat{V}_t$ (irregularly spaced case)

Data freq.   Estimator     GARCH     SV1F      SV1FJ     SV2F
λ = 300      $RV_t$        0.15890   0.34273   0.32518   0.70050
             $\hat{V}_t$   0.14724   0.31481   0.30301   0.67717
λ = 60       $RV_t$        0.14903   0.15044   0.14741   0.63304
             $\hat{V}_t$   0.14297   0.14611   0.14524   0.57660
λ = 30       $RV_t$        0.06989   0.09738   0.10187   0.27621
             $\hat{V}_t$   0.06827   0.09604   0.10355   0.26985
λ = 10       $RV_t$        0.03399   0.06432   0.06575   0.23201
             $\hat{V}_t$   0.03380   0.06392   0.06851   0.23070

Notes: We follow Barndorff-Nielsen et al. (2011) to simulate irregularly spaced prices. The arrival times of observations are simulated from a Poisson process. The parameter λ in the Poisson process governs the trading frequency of the simulated data. For example, λ = 30 means that transactions arrive every 30 seconds on average. Bold entries denote the smallest values.
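A minimal sketch of this irregular-sampling scheme, assuming λ is the mean waiting time in seconds between arrivals (as in the Table 4 notes) and that each observed price is the latest one-second price at or before the arrival time (previous-tick sampling). The constant-volatility random walk stands in for the paper's DGPs, the function names are hypothetical, and the exact procedure of Barndorff-Nielsen et al. (2011) is not reproduced.

```python
import math
import random

SECONDS = 23400  # one 6.5-hour trading day on a one-second grid


def simulate_log_prices(daily_vol, rng):
    """Illustrative one-second log-price path with constant volatility (not a paper DGP)."""
    step_sd = daily_vol / math.sqrt(SECONDS)
    prices = [0.0]
    for _ in range(SECONDS):
        prices.append(prices[-1] + rng.gauss(0.0, step_sd))
    return prices


def poisson_arrival_times(mean_gap, rng):
    """Arrival times within the trading day, with exponential gaps of mean `mean_gap` seconds."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_gap)
        if t > SECONDS:
            return times
        times.append(t)


def previous_tick_returns(prices, times):
    """Log returns between consecutive observations, using the last price at or before each arrival."""
    observed = [prices[0]] + [prices[math.floor(s)] for s in times]
    return [b - a for a, b in zip(observed, observed[1:])]
```

For example, `previous_tick_returns(simulate_log_prices(0.01, rng), poisson_arrival_times(30.0, rng))` with `rng = random.Random(0)` yields roughly 780 irregularly spaced returns, whose sum of squares is the day's RV.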
4.4 Independent and Dependent Microstructure Noise

In this section, we compare $RV_t$, $RK_t^F$, $\hat{V}_t$, and $\hat{V}_{t,MA(1)}$ under independent microstructure noise. Table 5 reports the RMSE of each estimator for different sampling frequencies and DGPs. For five-minute data, $RV_t$ and $\hat{V}_t$ produce smaller errors in estimating $\sigma_t^2$ than $RK_t^F$ and $\hat{V}_{t,MA(1)}$. However, increasing the sampling frequency results in a larger bias from the microstructure noise. As a result, $RK_t^F$ and $\hat{V}_{t,MA(1)}$ become more accurate as the data frequency increases. Compared to $RK_t^F$, $\hat{V}_{t,MA(1)}$ has a smaller RMSE in all cases except the 30-second and 10-second SV2F returns.

Table 5 RMSE of $RV_t$, $RK_t^F$, $\hat{V}_t$, and $\hat{V}_{t,MA(1)}$ (independent microstructure error case)

Data freq.   Estimator               GARCH       SV1F        SV1FJ       SV2F
5-minute     $RV_t$                  0.16003     0.29182     0.30651     0.47783
             $RK_t^F$                0.22988     0.42318     0.43993     0.84100
             $\hat{V}_t$             0.15640     0.28464     0.29858     0.46117
             $\hat{V}_{t,MA(1)}$     0.21636     0.38776     0.40729     0.74828
1-minute     $RV_t$                  0.48607     0.85374     0.94598     0.63983
             $RK_t^F$                0.11157     0.20184     0.20822     0.46655
             $\hat{V}_t$             0.48735     0.85547     0.94689     0.63808
             $\hat{V}_{t,MA(1)}$     0.10592     0.18787     0.19539     0.41176
30-second    $RV_t$                  0.95855     1.69544     1.87445     1.20299
             $RK_t^F$                0.08483     0.15200     0.15743     0.27201
             $\hat{V}_t$             0.96016     1.69798     1.87569     1.20332
             $\hat{V}_{t,MA(1)}$     0.07906     0.14017     0.15232     0.27595
10-second    $RV_t$                  2.86639     5.06382     5.60527     3.57263
             $RK_t^F$                0.05575     0.10097     0.10683     0.16989
             $\hat{V}_t$             2.86858     5.06745     5.60757     3.57263
             $\hat{V}_{t,MA(1)}$     0.05387     0.09621     0.10555     0.20857

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$, $RK_t^F$, and the Bayesian nonparametric estimators $\hat{V}_t$ and $\hat{V}_{t,MA(1)}$, based on returns at different frequencies simulated from four DGPs. The price is contaminated with white noise. Bold entries denote the smallest values.
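The frequency effect in Table 5 follows from a standard calculation: if each observed log price equals the efficient price plus i.i.d. noise with variance $\omega^2$, every observed return picks up $\varepsilon_i - \varepsilon_{i-1}$, so the noise adds about $2n\omega^2$ to RV and induces a first-order autocovariance of about $-\omega^2$ in the returns (an MA(1) structure). The simulation below illustrates both effects with an assumed constant-volatility price and Gaussian noise; the noise level and function names are illustrative assumptions, not the paper's calibration.

```python
import math
import random


def noisy_returns(n, daily_vol, noise_sd, rng):
    """n returns of a constant-volatility efficient price observed with i.i.d. noise."""
    step_sd = daily_vol / math.sqrt(n)
    prev_noise = rng.gauss(0.0, noise_sd)
    returns = []
    for _ in range(n):
        noise = rng.gauss(0.0, noise_sd)
        # observed return = efficient increment + (eps_i - eps_{i-1})
        returns.append(rng.gauss(0.0, step_sd) + noise - prev_noise)
        prev_noise = noise
    return returns


def realized_variance(returns):
    return sum(r * r for r in returns)


def first_autocovariance(returns):
    """Average of r_i * r_{i-1}; the returns have (close to) zero mean."""
    return sum(a * b for a, b in zip(returns[1:], returns)) / (len(returns) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    daily_vol, noise_sd, days = 0.01, 0.0005, 50
    for n in (78, 390, 780, 2340):
        mean_rv = sum(realized_variance(noisy_returns(n, daily_vol, noise_sd, rng)) for _ in range(days)) / days
        approx = daily_vol ** 2 + 2 * n * noise_sd ** 2
        print(f"n = {n}: mean RV = {mean_rv:.3e}, QV + 2*n*omega^2 = {approx:.3e}")
```

The negative first-order autocovariance is the feature that an MA(1) specification for the observed returns is designed to absorb.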
As Table 6 shows, $\hat{V}_{t,MA(1)}$ has the best overall finite sample coverage among the alternatives, except for the SV2F data. For the GARCH, SV1F, and SV1FJ data, the coverage probabilities of its 0.95 density intervals are within one percentage point of the nominal 95% level at every frequency, and within 0.5 percentage points at the one-minute and higher frequencies. Note that the density intervals are trivial to obtain from the MCMC output and do not require the calculation of $IQ_t$. The coverage probabilities of the infeasible and feasible confidence intervals of the realized kernels are not as good as those of $\hat{V}_{t,MA(1)}$. Moreover, $RK_t^F$ requires larger samples for good coverage, whereas the density intervals of $\hat{V}_{t,MA(1)}$ perform well for both low- and high-frequency returns.
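The contrast with the classical route can be made concrete. The sketch below builds the usual asymptotic interval for RV, $RV_t \pm 1.96\sqrt{2\,IQ_t/n}$, once with the true $IQ_t$ (infeasible) and once with a tri-power quarticity estimate (feasible), and measures coverage on simulated constant-volatility days, for which $IQ_t$ equals the squared integrated variance (consistent with the scaling of Eq. (44)). It assumes the standard Barndorff-Nielsen and Shephard form of the RV interval and the usual TPQ scaling, which may differ in detail from Equations (49) and (54); the realized-kernel intervals (52) and (56) are not reproduced, and the function names are hypothetical.

```python
import math
import random

# mu_{4/3} = E|Z|^{4/3} for Z ~ N(0, 1)
MU_43 = 2 ** (2 / 3) * math.gamma(7 / 6) / math.gamma(1 / 2)


def tripower_quarticity(returns):
    """n * mu_{4/3}^{-3} * n/(n-2) * sum of |r_i|^{4/3} |r_{i-1}|^{4/3} |r_{i-2}|^{4/3}."""
    n = len(returns)
    a = [abs(r) ** (4 / 3) for r in returns]
    s = sum(a[i] * a[i - 1] * a[i - 2] for i in range(2, n))
    return n * (n / (n - 2)) * s / MU_43 ** 3


def rv_interval(returns, iq, z=1.959964):
    """RV +/- z * sqrt(2 * IQ / n), the usual asymptotic interval for RV."""
    n = len(returns)
    rv = sum(r * r for r in returns)
    half = z * math.sqrt(2.0 * iq / n)
    return rv - half, rv + half


def simulate_coverage(days, n, daily_vol, seed=0):
    """Coverage of infeasible (true IQ) and feasible (TPQ) intervals, constant volatility."""
    rng = random.Random(seed)
    iv = daily_vol ** 2   # integrated variance of the day
    true_iq = iv ** 2     # IQ equals IV^2 when volatility is constant
    step_sd = daily_vol / math.sqrt(n)
    hits_infeasible = hits_feasible = 0
    for _ in range(days):
        r = [rng.gauss(0.0, step_sd) for _ in range(n)]
        lo, hi = rv_interval(r, true_iq)
        hits_infeasible += lo <= iv <= hi
        lo, hi = rv_interval(r, tripower_quarticity(r))
        hits_feasible += lo <= iv <= hi
    return hits_infeasible / days, hits_feasible / days
```

For example, `simulate_coverage(2000, 78, 0.01)` returns the two coverage rates for five-minute returns, which can be compared with the nominal 95% level.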
Table 6 Coverage probability (independent microstructure error case)

Data freq.   Interval estimator      GARCH (%)   SV1F (%)    SV1FJ (%)   SV2F (%)
5-minute     $RV_t$                  87.60       85.00       84.42       21.56
             $RK_t^F$-Infeasible     87.84       87.66       87.94       93.48
             $RK_t^F$-Feasible       84.28       96.20       83.68       97.72
             $\hat{V}_t$             81.84       78.50       77.10       18.12
             $\hat{V}_{t,MA(1)}$     94.02       94.40       94.14       89.74
1-minute     $RV_t$                  0.46        0.82        0.78        5.24
             $RK_t^F$-Infeasible     88.50       89.78       89.02       93.32
             $RK_t^F$-Feasible       99.30       97.76       95.26       97.86
             $\hat{V}_t$             0.42        0.72        0.56        4.48
             $\hat{V}_{t,MA(1)}$     95.06       95.18       94.66       86.60
30-second    $RV_t$                  0.00        0.00        0.02        1.72
             $RK_t^F$-Infeasible     89.80       90.46       90.74       92.80
             $RK_t^F$-Feasible       77.44       99.48       99.52       97.94
             $\hat{V}_t$             0.00        0.00        0.00        1.54
             $\hat{V}_{t,MA(1)}$     95.00       95.18       94.84       85.94
10-second    $RV_t$                  0.00        0.00        0.00        0.04
             $RK_t^F$-Infeasible     92.08       92.68       92.90       92.10
             $RK_t^F$-Feasible       99.98       99.98       99.98       98.62
             $\hat{V}_t$             0.00        0.00        0.00        0.04
             $\hat{V}_{t,MA(1)}$     94.92       95.34       95.32       82.24

Notes: This table reports the coverage probabilities of the 95% confidence intervals of $RV_t$ and $RK_t^F$ and of the 0.95 density intervals of $\hat{V}_t$ and $\hat{V}_{t,MA(1)}$, based on 5000 days of results for different DGPs. The price is contaminated with white noise.

The last experiment considers the performance of the estimators under dependent noise. We compare $RK_t^N$, $RV_t$, $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$; their RMSEs are reported in Table 7.
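For reference, a simplified non-flat-top realized kernel with Parzen weights, $K = \gamma_0 + 2\sum_{h=1}^{H} k\big(h/(H+1)\big)\gamma_h$ with $\gamma_h = \sum_j r_j r_{j-h}$, is sketched below in the spirit of Barndorff-Nielsen et al. (2011). It assumes the bandwidth H is given and omits end-point jittering and bandwidth selection, so it is not the exact $RK_t^N$ used in the tables. The noise simulator uses i.i.d. noise purely for illustration, and all names are hypothetical.

```python
import random


def parzen(x):
    """Parzen weight function used by non-flat-top realized kernels."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0


def autocov(returns, h):
    """Realized autocovariance gamma_h = sum_j r_j * r_{j-h}."""
    return sum(returns[j] * returns[j - h] for j in range(h, len(returns)))


def realized_kernel_parzen(returns, H):
    """gamma_0 + 2 * sum_{h=1}^{H} k(h / (H + 1)) * gamma_h, with k the Parzen function."""
    rk = autocov(returns, 0)
    for h in range(1, H + 1):
        rk += 2.0 * parzen(h / (H + 1)) * autocov(returns, h)
    return rk


def simulate_noisy_returns(n, daily_vol, noise_sd, rng):
    """Illustrative returns: constant-volatility efficient price plus i.i.d. Gaussian noise."""
    step_sd = daily_vol / n ** 0.5
    prev = rng.gauss(0.0, noise_sd)
    out = []
    for _ in range(n):
        cur = rng.gauss(0.0, noise_sd)
        out.append(rng.gauss(0.0, step_sd) + cur - prev)
        prev = cur
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    r = simulate_noisy_returns(2340, 0.01, 0.0005, rng)
    print(f"RV = {realized_kernel_parzen(r, 0):.3e}, RK (H = 34) = {realized_kernel_parzen(r, 34):.3e}")
```

Setting H = 0 recovers RV, which makes the role of the kernel weights easy to see: on noisy high-frequency returns, the weighted negative first-order autocovariance offsets most of the noise contribution to $\gamma_0$.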
Again, $RV_t$ and $\hat{V}_t$ perform poorly when high-frequency data are used. In every case in Table 7, a version of the Bayesian estimator has the smallest RMSE. At the 30-second frequency, $\hat{V}_{t,MA(1)}$ ranks first, followed by $\hat{V}_{t,MA(2)}$ and $RK_t^N$. For 10-second returns, $\hat{V}_{t,MA(2)}$ provides the smallest error. Compared to $RK_t^N$, $\hat{V}_{t,MA(1)}$ and $\hat{V}_{t,MA(2)}$ provide substantial improvements for 30-second returns, and $\hat{V}_{t,MA(2)}$ does so for 10-second returns. For instance, at 30 seconds, reductions in the RMSE of 10% or more are common, while at the 10-second frequency $\hat{V}_{t,MA(2)}$ reduces the RMSE by 25% or more.

Table 7 RMSE of $RV_t$, $RK_t^N$, $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$ (dependent microstructure error case)

Data freq.   Estimator               GARCH       SV1F        SV1FJ       SV2F
5-minute     $RV_t$                  0.21825     0.39266     0.41585     0.58520
             $RK_t^N$                0.23575     0.44343     0.45080     0.89975
             $\hat{V}_t$             0.21505     0.38582     0.40767     0.54581
             $\hat{V}_{t,MA(1)}$     0.22493     0.40316     0.42100     0.83691
             $\hat{V}_{t,MA(2)}$     0.29260     0.54051     0.57714     1.18875
1-minute     $RV_t$                  0.84121     1.48399     1.60189     1.6954
             $RK_t^N$                0.14158     0.25780     0.26987     0.52030
             $\hat{V}_t$             0.84318     1.48663     1.60261     1.67740
             $\hat{V}_{t,MA(1)}$     0.11558     0.20443     0.21297     0.50769
             $\hat{V}_{t,MA(2)}$     0.13732     0.24891     0.26161     0.62325
30-second    $RV_t$                  1.66229     2.95397     3.19560     3.37090
             $RK_t^N$                0.11918     0.21559     0.22306     0.42729
             $\hat{V}_t$             1.66480     2.95765     3.19689     3.36058
             $\hat{V}_{t,MA(1)}$     0.08889     0.15848     0.16931     0.34572
             $\hat{V}_{t,MA(2)}$     0.10531     0.18916     0.19313     0.39269
10-second    $RV_t$                  4.40694     7.81961     8.49852     7.85934
             $RK_t^N$                0.09850     0.18004     0.18376     0.34594
             $\hat{V}_t$             4.41003     7.82481     8.49935     7.85507
             $\hat{V}_{t,MA(1)}$     0.16456     0.30833     0.30465     0.89045
             $\hat{V}_{t,MA(2)}$     0.06940     0.12804     0.13592     0.25182

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$, $RK_t^N$, and the Bayesian nonparametric estimators $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$, based on returns at different frequencies simulated from four DGPs. The observed prices contain microstructure noise that is dependent on returns. Bold entries denote the smallest values.
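The percentage reductions quoted above can be checked directly from the Table 7 entries as $100\,(1 - \mathrm{RMSE}_{\mathrm{Bayes}} / \mathrm{RMSE}_{RK^N})$; the dictionary layout and names below are only a convenience for this illustration.

```python
# RMSEs from Table 7 (dependent noise); columns: GARCH, SV1F, SV1FJ, SV2F
RK_N = {"30-second": [0.11918, 0.21559, 0.22306, 0.42729],
        "10-second": [0.09850, 0.18004, 0.18376, 0.34594]}
V_MA1 = {"30-second": [0.08889, 0.15848, 0.16931, 0.34572],
         "10-second": [0.16456, 0.30833, 0.30465, 0.89045]}
V_MA2 = {"30-second": [0.10531, 0.18916, 0.19313, 0.39269],
         "10-second": [0.06940, 0.12804, 0.13592, 0.25182]}


def pct_reduction(candidate, benchmark):
    """Percentage RMSE reduction of `candidate` relative to `benchmark`, per DGP."""
    return [100.0 * (1.0 - c / b) for c, b in zip(candidate, benchmark)]


if __name__ == "__main__":
    for freq in ("30-second", "10-second"):
        for name, table in (("MA(1)", V_MA1), ("MA(2)", V_MA2)):
            cuts = pct_reduction(table[freq], RK_N[freq])
            print(freq, name, [round(x, 1) for x in cuts])
```

The reductions are roughly 19% to 26% for $\hat{V}_{t,MA(1)}$ and 8% to 13% for $\hat{V}_{t,MA(2)}$ at 30 seconds, and 26% to 30% for $\hat{V}_{t,MA(2)}$ at 10 seconds; at 10 seconds, $\hat{V}_{t,MA(1)}$ is less accurate than $RK_t^N$.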
Table 8 shows that, at the one-minute and 30-second frequencies, $\hat{V}_{t,MA(1)}$ and $\hat{V}_{t,MA(2)}$ have a smaller absolute bias than the other estimators, and at the 10-second frequency $\hat{V}_{t,MA(2)}$ has the smallest absolute bias. Table 9 reports the coverage probabilities of all five estimators. For the GARCH, SV1F, and SV1FJ data, the coverage of the $\hat{V}_{t,MA(2)}$ density intervals is very close to the nominal level at every data frequency; for the SV2F data, it ranges from about 82% to 90%.

Table 8 Bias of $RV_t$, $RK_t^N$, $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$ (dependent microstructure error case)

Data freq.   Estimator               GARCH       SV1F        SV1FJ       SV2F
5-minute     $RV_t$                  0.16032     0.28455     0.30733     0.17262
             $RK_t^N$                0.01349     0.02985     0.03232     0.00733
             $\hat{V}_t$             0.16005     0.28359     0.30534     0.16275
             $\hat{V}_{t,MA(1)}$     0.01471     0.02665     0.02819     −0.00104
             $\hat{V}_{t,MA(2)}$     0.05581     0.10305     0.11604     0.03956
1-minute     $RV_t$                  0.81057     1.42504     1.54563     0.87166
             $RK_t^N$                0.02421     0.04351     0.04360     0.01839
             $\hat{V}_t$             0.81269     1.42805     1.54689     0.86954
             $\hat{V}_{t,MA(1)}$     0.00822     0.01401     0.01359     −0.01044
             $\hat{V}_{t,MA(2)}$     0.01694     0.03179     0.02977     −0.00588
30-second    $RV_t$                  1.61481     2.85837     3.10192     1.72912
             $RK_t^N$                0.02791     0.04940     0.05114     0.02369
             $\hat{V}_t$             1.61731     2.86219     3.10359     1.72853
             $\hat{V}_{t,MA(1)}$     0.00721     0.01253     0.00856     −0.01302
             $\hat{V}_{t,MA(2)}$     0.01074     0.01972     0.01796     −0.01155
10-second    $RV_t$                  4.32800     7.65381     8.34221     4.67328
             $RK_t^N$                0.04034     0.07209     0.07321     0.04327
             $\hat{V}_t$             4.33106     7.65902     8.34351     4.67462
             $\hat{V}_{t,MA(1)}$     0.11026     0.20188     0.20173     0.13648
             $\hat{V}_{t,MA(2)}$     0.00634     0.01300     0.00850     −0.01896

Notes: This table reports the bias estimates from 5000 daily ex post variances using $RV_t$, $RK_t^N$, and the Bayesian nonparametric estimators $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$, based on returns at different frequencies simulated from four DGPs. The observed prices contain microstructure noise that is dependent on returns. Bold entries denote the smallest values.
Table 9 Coverage probability (dependent microstructure error case)

Data freq.   Interval estimator      GARCH (%)   SV1F (%)    SV1FJ (%)   SV2F (%)
5-minute     $RV_t$                  76.22       74.00       73.12       21.14
             $RK_t^N$-Infeasible     87.26       87.62       87.64       76.72
             $RK_t^N$-Feasible       91.16       91.34       92.02       96.42
             $\hat{V}_t$             66.00       63.62       62.44       16.74
             $\hat{V}_{t,MA(1)}$     93.96       94.26       94.28       89.84
             $\hat{V}_{t,MA(2)}$     94.36       94.60       94.22       90.06
1-minute     $RV_t$                  0.00        0.00        0.10        0.06
             $RK_t^N$-Infeasible     90.02       90.40       89.98       71.70
             $RK_t^N$-Feasible       99.80       99.80       99.70       99.46
             $\hat{V}_t$             0.00        0.00        0.04        0.04
             $\hat{V}_{t,MA(1)}$     94.64       94.92       94.72       87.08
             $\hat{V}_{t,MA(2)}$     94.58       94.92       94.30       86.92
30-second    $RV_t$                  0.00        0.00        0.00        0.00
             $RK_t^N$-Infeasible     91.50       91.72       91.26       70.94
             $RK_t^N$-Feasible       100.00      100.00      100.00      99.96
             $\hat{V}_t$             0.00        0.00        0.00        0.00
             $\hat{V}_{t,MA(1)}$     95.00       95.24       94.76       85.18
             $\hat{V}_{t,MA(2)}$     94.96       94.66       94.78       85.80
10-second    $RV_t$                  0.00        0.00        0.00        0.00
             $RK_t^N$-Infeasible     91.90       92.44       92.30       69.72
             $RK_t^N$-Feasible       100.00      100.00      100.00      100.00
             $\hat{V}_t$             0.00        0.00        0.00        0.00
             $\hat{V}_{t,MA(1)}$     64.70       65.00       68.00       78.74
             $\hat{V}_{t,MA(2)}$     94.48       95.20       95.14       82.06

Notes: This table reports the coverage probabilities of the 95% confidence intervals of $RV_t$ and $RK_t^N$ and of the 0.95 density intervals of the Bayesian nonparametric estimators $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$, based on 5000 days of results. The observed prices contain microstructure noise that is dependent on returns.
The observed prices contain microstructure noise that is dependent on returns.

Figures 1–3 display histograms of the posterior mean of the number of clusters in three settings: the DPM for five-minute SV1F returns (no noise), the DPM-MA(1) for one-minute SV1FJ returns (independent noise), and the DPM-MA(2) for 30-second SV2F returns (dependent noise). The figures show substantial pooling. For example, in the one-minute SV1FJ case, most daily estimates of $V_t$ are formed from one to five pooled groups of data, whereas the realized kernel treats the 390 observations as separate groups. This degree of pooling can substantially improve the accuracy of the Bayesian estimator.

Figure 1. Posterior mean of the number of clusters. Model: DPM. Data: five-minute returns without microstructure noise from SV1F.

Figure 2. Posterior mean of the number of clusters. Model: DPM-MA(1). Data: one-minute returns with independent noise from SV1FJ.

Figure 3. Posterior mean of the number of clusters. Model: DPM-MA(2). Data: 30-second returns with dependent noise from SV2F.

4.5 Tick Time Sampling

Following Griffin and Oomen (2008), the prices simulated from the DGPs described in Section 4.1 are discretized to tick prices. Let $p^h_{t,i}$ denote the observed prices and $\omega$ the probability of a price change: $p^h_{t,i} = p_{t,i}$ with probability $\omega$; otherwise, $p^h_{t,i} = p^h_{t,i-1}$.
The probability $\omega$ is set to 0.2. Tick-time returns are formed from prices sampled at every k-th price change. The sampling frequencies k = 60, k = 12, and k = 6 roughly match the 5-minute, 1-minute, and 30-second data considered in the previous examples. The RMSEs of the estimators based on tick-time sampled returns are reported in Table 10. Panel A of Table 10 compares $RV_t$ and $\hat V_t$ in the no-noise setting, and Panel B compares $RK_t^F$ and $\hat V_{t,MA(1)}$ when independent microstructure noise is present. In ten out of twelve no-noise cases and in all twelve cases with microstructure noise, the Bayesian nonparametric estimators have lower RMSEs than their classical counterparts. When the tick prices are contaminated with noise, switching from $RK_t^F$ to $\hat V_{t,MA(1)}$ reduces the RMSE by between 6.75% and 36.70%.

Table 10. RMSE of $RV_t$, $RK_t^F$, $\hat V_t$, and $\hat V_{t,MA(1)}$ in tick time

| Data freq. | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| Panel A: No microstructure noise | | | | | |
| 60-tick | $RV_t$ | 0.11566 | 0.21887 | 0.21663 | 0.49652 |
| | $\hat V_t$ | 0.11175 | 0.21175 | 0.21081 | 0.47565 |
| 12-tick | $RV_t$ | 0.05237 | 0.09744 | 0.10194 | 0.27255 |
| | $\hat V_t$ | 0.05184 | 0.09661 | 0.10474 | 0.26895 |
| 6-tick | $RV_t$ | 0.03866 | 0.07047 | 0.07393 | 0.17821 |
| | $\hat V_t$ | 0.03842 | 0.07013 | 0.07787 | 0.17642 |
| Panel B: Independent microstructure noise | | | | | |
| 60-tick | $RK_t^F$ | 0.23958 | 0.43907 | 0.44927 | 0.87832 |
| | $\hat V_{t,MA(1)}$ | 0.20988 | 0.39083 | 0.39199 | 0.69395 |
| 12-tick | $RK_t^F$ | 0.11549 | 0.20695 | 0.20794 | 0.64542 |
| | $\hat V_{t,MA(1)}$ | 0.10545 | 0.18643 | 0.18875 | 0.40857 |
| 6-tick | $RK_t^F$ | 0.08666 | 0.15397 | 0.15639 | 0.31230 |
| | $\hat V_{t,MA(1)}$ | 0.08080 | 0.13965 | 0.14555 | 0.25059 |

Notes: This table reports the RMSE of estimating 5000 daily ex post variances, using $RV$ and $\hat V$ in the no-noise case and $RK^F$ and $\hat V_{MA(1)}$ in the independent-noise case, based on tick-time sampled returns. Bold entries denote the smallest values.

Table 11 compares the RMSE of the Bayesian nonparametric estimators under the two sampling schemes. As Panel B of Table 11 shows, tick-time $\hat V_{t,MA(1)}$ has the lower RMSE in eight out of twelve cases. However, in Panel A, with no microstructure noise, calendar-time sampling is uniformly better.
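The tick-time scheme of Section 4.5 (observed price updates with probability $\omega$, then sampling at every k-th price change) can be sketched in NumPy as follows. This is an illustration rather than the code behind Tables 10 and 11: the function names are hypothetical, the efficient log-price is a plain constant-volatility random walk on a one-second grid instead of one of the Section 4.1 DGPs, and a "price change" is taken to mean that the observed price differs from its previous value.

```python
import numpy as np


def tick_prices(p, omega, rng):
    """Discretize an efficient log-price path into observed tick prices.

    With probability omega the observed price updates to the efficient
    price p[i]; otherwise it repeats the previous observed price.
    """
    p = np.asarray(p, dtype=float)
    ph = np.empty_like(p)
    ph[0] = p[0]
    update = rng.random(p.size) < omega
    for i in range(1, p.size):
        ph[i] = p[i] if update[i] else ph[i - 1]
    return ph


def tick_time_returns(ph, k):
    """Returns between observed prices sampled at every k-th price change."""
    # Positions i where the observed price differs from position i - 1.
    change_idx = np.flatnonzero(np.diff(ph) != 0) + 1
    # Start at the first observation, then keep every k-th change point.
    sampled_idx = np.concatenate(([0], change_idx[k - 1::k]))
    return np.diff(ph[sampled_idx])


# Illustrative use: one day of a constant-volatility random walk
# (daily variance 1.0 in percentage units) on a one-second grid.
rng = np.random.default_rng(42)
n = 23_400
p = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))
ph = tick_prices(p, omega=0.2, rng=rng)
for k in (60, 12, 6):
    r = tick_time_returns(ph, k)
    print(f"k={k}: {r.size} returns, RV = {np.sum(r ** 2):.3f}")
```

With these illustrative settings roughly 0.2 × 23,400 = 4,680 price changes occur per day, so k = 60, 12, and 6 give about 78, 390, and 780 returns, i.e. about one every 5 minutes, 1 minute, and 30 seconds.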
Table 11. RMSE of $\hat V_t$ and $\hat V_{t,MA(1)}$ in calendar time and tick time

| Sampling | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| Panel A: No microstructure noise | | | | | |
| 5-minute | $\hat V_t$ | 0.11125 | 0.20625 | 0.20928 | 0.45238 |
| 60-tick | $\hat V_t$ | 0.11175 | 0.21175 | 0.21081 | 0.47565 |
| 1-minute | $\hat V_t$ | 0.05069 | 0.09195 | 0.10075 | 0.18466 |
| 12-tick | $\hat V_t$ | 0.05184 | 0.09661 | 0.10474 | 0.26895 |
| 30-second | $\hat V_t$ | 0.03667 | 0.06597 | 0.07485 | 0.14292 |
| 6-tick | $\hat V_t$ | 0.03842 | 0.07013 | 0.07787 | 0.17642 |
| Panel B: Independent microstructure noise | | | | | |
| 5-minute | $\hat V_{t,MA(1)}$ | 0.21020 | 0.38103 | 0.40211 | 0.73568 |
| 60-tick | $\hat V_{t,MA(1)}$ | 0.20988 | 0.39083 | 0.39199 | 0.69395 |
| 1-minute | $\hat V_{t,MA(1)}$ | 0.10304 | 0.19050 | 0.19861 | 0.40527 |
| 12-tick | $\hat V_{t,MA(1)}$ | 0.10545 | 0.18643 | 0.18875 | 0.40857 |
| 30-second | $\hat V_{t,MA(1)}$ | 0.07847 | 0.14306 | 0.15133 | 0.28697 |
| 6-tick | $\hat V_{t,MA(1)}$ | 0.08080 | 0.13965 | 0.14555 | 0.25059 |

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using the Bayesian nonparametric volatility estimators $\hat V$ and $\hat V_{MA(1)}$ in calendar time and tick time. Bold entries denote the smaller value between calendar time and tick time.
In summary, these simulations show that the Bayesian estimate of ex post variance is very competitive with existing classical alternatives under different sampling schemes.
5 Bayesian Nonparametric Estimates of Stock Market Variance

For each day, 5000 MCMC draws are retained after discarding 10,000 burn-in draws and are used to estimate posterior moments. All prior settings are the same as in the simulations.

5.1 IBM

We first consider estimating and forecasting volatility using a long calendar span of IBM equity returns. One-minute IBM price records from January 3, 1998 to February 16, 2016 were downloaded from the Kibot website.¹⁰ We start the sample on January 3, 2001 because the relatively small number of transactions before 2000 yields many zero intraday returns. Days with fewer than five hours of trading are removed, which leaves 3764 days in the sample. Log-prices are placed on a one-minute grid using the price associated with the closest time stamp that is less than or equal to the grid time. Five-minute and one-minute percentage log returns from 9:30 to 16:00 (EST) are constructed by taking the difference between consecutive grid log-prices and multiplying by 100. Overnight returns are ignored, so the first intraday return is formed using the daily opening price instead of the previous day's close. This procedure generates 293,520 five-minute returns and 1,467,848 one-minute returns.

We use a filter to remove errors and outliers caused by abnormal price records. The filter targets cases in which the price jumps up or down but quickly moves back to its original range, which suggests a recording error. If $|r_{t,i}| + |r_{t,i+1}| > 8\,\mathrm{var}_t(r_{t,i})$ and $|r_{t,i} + r_{t,i+1}| < 0.05\%$, we replace $r_{t,i}$ and $r_{t,i+1}$ by $r'_{t,i} = r'_{t,i+1} = 0.5\,(r_{t,i} + r_{t,i+1})$. The filter adjusts 0 five-minute returns and 70 one-minute returns (70/1,467,848 = 0.00477%). From these data, several versions of daily $\hat V_t$, $RV_t$, and $RK_t$ are computed. Daily returns are open-to-close returns, matching the time interval of the variance estimates.
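The grid construction described above (previous-tick sampling onto a one-minute grid, then 100 times the log-price differences) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: time stamps are taken to be seconds since midnight, the function names and the sample records are hypothetical, and grid points that precede the first record are filled with the first recorded price as a stand-in for the opening price. The outlier filter is not included.

```python
import numpy as np


def previous_tick_log_prices(times, prices, grid):
    """Log-price at each grid time from the last record at or before that time.

    Grid points before the first record take the first recorded price
    (assumption: used here as a stand-in for the daily opening price).
    """
    idx = np.searchsorted(np.asarray(times), grid, side="right") - 1
    idx = np.clip(idx, 0, None)
    return np.log(np.asarray(prices, dtype=float)[idx])


def percentage_log_returns(log_prices):
    """Differences of consecutive grid log-prices, scaled by 100."""
    return 100.0 * np.diff(log_prices)


# One-minute grid from 9:30 to 16:00 in seconds since midnight (391 points,
# hence 390 returns); every fifth point gives the five-minute grid (78 returns).
grid_1m = np.arange(9.5 * 3600, 16 * 3600 + 1, 60)
grid_5m = grid_1m[::5]

# Hypothetical price records for the first few minutes of a day.
times = np.array([34_200.0, 34_230.5, 34_290.0, 34_400.2])
prices = np.array([100.00, 100.02, 99.98, 100.05])
print(percentage_log_returns(previous_tick_log_prices(times, prices, grid_1m[:5])))
```

`np.searchsorted(..., side="right") - 1` returns, for each grid time, the index of the last record whose time stamp is less than or equal to it, which is exactly the previous-tick rule.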
For each of the estimators, we follow exactly the methods used in the simulation section.

5.1.1 Ex post variance estimation

Figure 4 displays a volatility signature plot, which shows the relationship between the average of each volatility estimator and the sampling frequency. The RV based on 10-minute returns serves as the unbiased benchmark because low-frequency returns are less influenced by market microstructure noise. Regardless of the sampling frequency, the average of the Bayesian nonparametric estimator is closer to 1.0 than that of RV. The plot stabilizes at sampling intervals of 3.9 minutes and longer.

Figure 4. Signature plot of $RV_t$ and $\hat V_t$ (IBM data).

Table 12 reports summary statistics for several estimators. Overall, the Bayesian and classical estimators are very close. Both the realized kernel and the moving-average DPM estimators reduce the average level of daily variance, indicating the presence of significant market microstructure noise. Based on this and on an analysis of the ACF of the high-frequency returns, we use $\hat V_{t,MA(1)}$ for the five-minute data and $\hat V_{t,MA(4)}$ for the one-minute data in the remainder of the analysis. Figures 5 and 6 compare these estimators with the kernel estimators; except for extreme values, they are very similar.

Figure 5. $RK_t^F$ and $\hat V_{t,MA(1)}$ based on five-minute IBM returns.

Figure 6. $RK_t^N$ and $\hat V_{t,MA(4)}$ based on one-minute IBM returns.

Table 12. Summary statistics: IBM

| Frequency | Data | Mean | Median | Var. | Skew. | Kurt. | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Daily | $r_t$ | 0.0673 | 0.0656 | 1.6046 | 0.2069 | 8.3059 | −6.4095 | 12.2777 |
| | $r_t^2$ | 1.6091 | 0.4352 | 18.9654 | 13.9087 | 387.4812 | 0.0000 | 150.7429 |
| 5-minute | $RV_t$ | 1.8353 | 0.9458 | 11.9867 | 9.5887 | 148.2622 | 0.1032 | 76.2901 |
| | $RV_{t,M=26}^{\mathrm{block}}$ | 1.8403 | 0.9506 | 12.1820 | 9.7116 | 151.8262 | 0.1003 | 77.2906 |
| | $RK_t^F$ | 1.6613 | 0.8447 | 9.3647 | 8.5539 | 124.8480 | 0.0375 | 71.9626 |
| | $RK_t^N$ | 1.6670 | 0.8476 | 8.8872 | 8.0467 | 109.1098 | 0.0556 | 66.3995 |
| | $\hat V_t$ | 1.7805 | 0.9286 | 10.3994 | 8.3839 | 116.7700 | 0.1068 | 70.2477 |
| | $\hat V_{t,MA(1)}$ | 1.6656 | 0.8424 | 9.2105 | 7.2318 | 77.7981 | 0.0275 | 52.3102 |
| | $\hat V_{t,MA(2)}$ | 1.6969 | 0.8467 | 10.0917 | 8.4800 | 118.7351 | 0.0137 | 72.2059 |
| 1-minute | $RV_t$ | 2.0004 | 1.0468 | 13.5019 | 10.5704 | 202.6835 | 0.1535 | 103.8773 |
| | $RV_{t,M=78}^{\mathrm{block}}$ | 2.0045 | 1.0478 | 13.6307 | 10.6737 | 206.9232 | 0.1551 | 105.0582 |
| | $RK_t^F$ | 1.7952 | 0.9163 | 10.8043 | 8.3092 | 113.5727 | 0.1006 | 73.8576 |
| | $RK_t^N$ | 1.7425 | 0.8973 | 9.6499 | 7.7187 | 94.7830 | 0.0897 | 60.2024 |
| | $\hat V_t$ | 1.9649 | 1.0322 | 12.9922 | 10.6422 | 206.7584 | 0.1517 | 102.6389 |
| | $\hat V_{t,MA(1)}$ | 1.8417 | 0.9211 | 11.3720 | 7.6213 | 87.5668 | 0.1156 | 64.1797 |
| | $\hat V_{t,MA(2)}$ | 1.7894 | 0.8979 | 10.9147 | 8.5039 | 121.6750 | 0.1040 | 74.7890 |
| | $\hat V_{t,MA(3)}$ | 1.7393 | 0.8824 | 9.6571 | 7.8283 | 101.5650 | 0.0986 | 61.4764 |
| | $\hat V_{t,MA(4)}$ | 1.7105 | 0.8704 | 9.1269 | 7.3825 | 84.7413 | 0.0964 | 57.2552 |

Notes: This table reports summary statistics of the ex post variance estimators based on five-minute and one-minute returns, along with summary statistics of the daily return and the daily squared return. The number of daily observations is 3764.

Interval estimates for two subperiods are shown in Figures 7 and 8. A clear disadvantage of the kernel-based confidence interval is that it can include negative values for ex post variance. By construction, the Bayesian interval does not, and it tends to be substantially shorter on volatile days. Results for the log variance¹¹ are also shown, and some differences remain on that scale.

Figure 7. High volatility period: $RK_t^F$ and $\hat V_{t,MA(1)}$ calculated using five-minute IBM returns. Top: variance; bottom: log-variance.
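The contrast drawn above between the two kinds of interval can be illustrated with a short sketch. It assumes that the 0.95 density interval is the equal-tailed interval formed from the 2.5% and 97.5% quantiles of the MCMC draws of $V_t$ (a highest-density interval is another possibility), and it uses made-up numbers throughout: gamma draws stand in for the posterior sample and the kernel standard error is hypothetical, as are the function names.

```python
import numpy as np


def density_interval(draws, level=0.95):
    """Equal-tailed posterior interval from MCMC draws of V_t."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(draws, [tail, 1.0 - tail])
    return float(lo), float(hi)


def symmetric_interval(estimate, std_err, z=1.959963984540054):
    """Normal-approximation interval: estimate +/- z * standard error."""
    return estimate - z * std_err, estimate + z * std_err


# Hypothetical right-skewed posterior draws for one volatile day
# (gamma draws stand in for MCMC output; their mean is 1.0).
rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=0.5, size=5000)
print(density_interval(draws))        # both endpoints positive
print(symmetric_interval(1.0, 0.7))   # lower endpoint below zero
```

Because every draw of a variance is positive, any quantile-based interval stays positive and can be asymmetric around the posterior mean, whereas an interval of the form estimate ± 1.96 × standard error goes below zero whenever the standard error exceeds about half the estimate.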
Figure 8. Low volatility period: $RK_t^F$ and $\hat V_{t,MA(1)}$ calculated using five-minute IBM returns. Top: variance; bottom: log-variance.

The degree of pooling in the Bayesian estimators is shown in Figures 9 and 10. As expected, there are more groups at the higher one-minute frequency; in this case there are, on average, about three to seven distinct groups of intraday variance parameters.

Figure 9. Posterior mean of the number of clusters (based on 3764-day results from DPM-MA(1) using five-minute IBM returns).

Figure 10. Posterior mean of the number of clusters (based on 3764-day results from DPM-MA(4) using one-minute IBM returns).

5.1.2 Ex post variance modeling and forecasting

Does the Bayesian estimator correctly recover the time-series dynamics of volatility? To investigate this, we estimate several versions of the heterogeneous autoregressive (HAR) model introduced by Corsi (2009), a popular model that captures the strong dependence in daily ex post variance. For $\hat V_t$, the HAR model is

$$\hat V_t = \beta_0 + \beta_1 \hat V_{t-1} + \beta_2 \hat V_{t-1|t-5} + \beta_3 \hat V_{t-1|t-22} + \epsilon_t, \tag{45}$$

where $\hat V_{t-1|t-h} = \frac{1}{h}\sum_{l=1}^{h} \hat V_{t-l}$ and $\epsilon_t$ is the error term. $\hat V_{t-1}$, $\hat V_{t-1|t-5}$, and $\hat V_{t-1|t-22}$ correspond to the daily, weekly, and monthly variance measures up to time $t-1$.
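The regressors in (45) are lagged averages over 1, 5, and 22 days, so an in-sample OLS fit of the HAR model can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, the first 22 observations are used only as lags, and the example series is synthetic.

```python
import numpy as np


def har_design(v):
    """Regressors and target for the HAR model (45).

    The row for day t (t >= 22) is [1, v[t-1], mean(v[t-5:t]), mean(v[t-22:t])],
    i.e. the daily, weekly (5-day), and monthly (22-day) lagged measures;
    the target is v[t].
    """
    v = np.asarray(v, dtype=float)
    t_idx = np.arange(22, v.size)
    daily = v[t_idx - 1]
    weekly = np.array([v[t - 5:t].mean() for t in t_idx])
    monthly = np.array([v[t - 22:t].mean() for t in t_idx])
    X = np.column_stack([np.ones(t_idx.size), daily, weekly, monthly])
    return X, v[t_idx]


def fit_har(v):
    """OLS estimates (beta_0, ..., beta_3) and the in-sample R^2."""
    X, y = har_design(v)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)


# Illustrative use on a synthetic positive series standing in for V_hat_t.
rng = np.random.default_rng(7)
v_hat = np.exp(rng.normal(0.0, 0.5, 1000))
beta, r2 = fit_har(v_hat)
print(beta, r2)
```

The same function applies unchanged to a series of $RV_t$ or $RK_t$ values.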
Similar specifications are obtained by replacing $\hat V_t$ with $RV_t$ or $RK_t$. Bollerslev, Patton, and Quaedvlieg (2016) extend the HAR model to the HARQ model by taking the asymptotic theory of $RV_t$ into account. The HARQ model for $RV_t$ is

$$RV_t = \beta_0 + \left(\beta_1 + \beta_{1Q}\, RQ_{t-1}^{1/2}\right) RV_{t-1} + \beta_2 RV_{t-1|t-5} + \beta_3 RV_{t-1|t-22} + \epsilon_t. \tag{46}$$

The loading on $RV_{t-1}$ is no longer constant but varies with the measurement error, which is captured by $RQ_{t-1}$: the model responds more strongly to $RV_{t-1}$ when measurement error is low and less strongly when it is high. Bollerslev, Patton, and Quaedvlieg (2016) provide evidence that the HARQ model outperforms the HAR model in forecasting.¹²

An advantage of our Bayesian approach is that we have the full finite-sample posterior distribution of $V_t$. In the Bayesian nonparametric framework there is no need to estimate the integrated quarticity $IQ_t$ with $RQ_t$; instead, the variance, standard deviation, or other features of the posterior of $V_t$ can be estimated directly from the MCMC output. Replacing $RQ_{t-1}$ with $\widehat{\mathrm{var}}(V_{t-1})$, the modified HARQ model for $\hat V_t$ is defined as

$$\hat V_t = \beta_0 + \left(\beta_1 + \beta_{1Q}\, \widehat{\mathrm{var}}(V_{t-1})^{1/2}\right) \hat V_{t-1} + \beta_2 \hat V_{t-1|t-5} + \beta_3 \hat V_{t-1|t-22} + \epsilon_t, \tag{47}$$

where $\widehat{\mathrm{var}}(V_{t-1})^{1/2}$ is an MCMC estimate of the posterior standard deviation of $V_{t-1}$.

Table 13 displays the OLS estimates and the $R^2$ for several model specifications. Within each class of model, the coefficient estimates are comparable across variance measures, and the Bayesian variance estimates clearly display the same type of time-series dynamics as the realized kernel estimates.

Table 13. HAR and HARQ model regression results based on IBM ex post variance estimators

Data freq.: 5-minute

| Parameter | HAR, $RK_t^F$ | HAR, $\hat V_{t,MA(1)}$ | HARQ, $RK_t^F$ | HARQ, $\hat V_{t,MA(1)}$ |
|---|---|---|---|---|
| $\beta_0$ | 0.1322 (0.0374) | 0.1224 (0.0375) | 0.1015 (0.0382) | −0.0142 (0.0393) |
| $\beta_1$ | 0.1926 (0.0196) | 0.2506 (0.0196) | 0.2341 (0.0224) | 0.4629 (0.0283) |
| $\beta_2$ | 0.5649 (0.0332) | 0.4802 (0.0329) | 0.5664 (0.0331) | 0.4298 (0.0328) |
| $\beta_3$ | 0.1598 (0.0281) | 0.1927 (0.0282) | 0.1422 (0.0289) | 0.1482 (0.0281) |
| $\beta_{1Q}$ | – | – | −0.0012 (0.0003) | −0.0202 (0.0020) |
| $R^2$ (%) | 57.74 | 59.55 | 57.90 | 60.66 |

Data freq.: 1-minute

| Parameter | HAR, $RK_t^N$ | HAR, $\hat V_{t,MA(4)}$ | HARQ, $RK_t^N$ | HARQ, $\hat V_{t,MA(4)}$ |
|---|---|---|---|---|
| $\beta_0$ | 0.1246 (0.0365) | 0.1308 (0.0376) | 0.0065 (0.0367) | −0.0402 (0.0388) |
| $\beta_1$ | 0.2493 (0.0195) | 0.2455 (0.0196) | 0.4464 (0.0242) | 0.5294 (0.0284) |
| $\beta_2$ | 0.5435 (0.0318) | 0.5198 (0.0321) | 0.5033 (0.0312) | 0.4521 (0.0317) |
| $\beta_3$ | 0.1331 (0.0265) | 0.1558 (0.0271) | 0.0708 (0.0263) | 0.0821 (0.0270) |
| $\beta_{1Q}$ | – | – | −0.0031 (0.0002) | −0.0334 (0.0025) |
| $R^2$ (%) | 62.71 | 60.34 | 64.39 | 62.19 |

Notes: This table reports OLS regression results for the HAR and HARQ models. The top panel is based on $RK_t^F$ and $\hat V_{t,MA(1)}$ calculated from five-minute returns, and the bottom panel on one-minute $RK_t^N$ and $\hat V_{t,MA(4)}$. Standard errors of the coefficients are in parentheses.
Sample period: January 3, 2001 to February 16, 2016, 3764 observations.

Finally, Table 14 reports out-of-sample root mean squared forecast errors (RMSFEs) of HAR and HARQ models using both classical and Bayesian estimators. The out-of-sample period runs from January 3, 2005 to February 16, 2016 (2773 observations), and model parameters are re-estimated as new data arrive. To mimic a real-time forecasting setting, the prior hyperparameters $\nu_{0,t}$ and $s_{0,t}$ are set based on intraday data from days $t$ and $t-1$.¹³

Table 14. Out-of-sample forecasts of IBM volatility

| Dependent variable | Regressors | HAR | HARQ |
|---|---|---|---|
| Panel A: 5-minute returns | | | |
| 5-minute $RK_t^F$ | $RK_t^F$ | 1.84113 | 1.84444 |
| | $\hat V_{t,MA(1)}$ | 1.84042 | 1.81152 |
| 5-minute $\hat V_{t,MA(1)}$ | $RK_t^F$ | 1.86130 | 1.86642 |
| | $\hat V_{t,MA(1)}$ | 1.85546 | 1.83054 |
| Panel B: 1-minute returns | | | |
| 1-minute $RK_t^N$ | $RK_t^N$ | 1.87539 | 1.82881 |
| | $\hat V_{t,MA(4)}$ | 1.87215 | 1.82548 |
| 1-minute $\hat V_{t,MA(4)}$ | $RK_t^N$ | 1.94106 | 1.88974 |
| | $\hat V_{t,MA(4)}$ | 1.93202 | 1.87276 |

Notes: This table reports the RMSFE of forecasting next-period ex post variance using both classical and Bayesian nonparametric variance estimators, for both HAR and HARQ models. The forecasting target is the dependent variable one period out of sample. On each day, the model parameters are re-estimated using all data up to that day. Out-of-sample period: January 3, 2005 to February 16, 2016, 2773 days. Bold entries denote the smallest value in a column.
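The expanding-window exercise behind Table 14 can be sketched as follows: before each out-of-sample day, the HAR (or HARQ-type) regression is re-estimated by OLS on all earlier days and used for a one-step-ahead forecast. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical; `x` is the series used for the regressors and `y` the dependent variable (they may differ, as in Table 14); and `q`, when given, is a measurement-uncertainty series such as the posterior standard deviation of $V_t$, which enters through the interaction term $q_{t-1} x_{t-1}$ as in (47).

```python
import numpy as np


def har_rows(x, y, q=None):
    """HAR regressors built from series x, with targets taken from y.

    Row for day t (t >= 22): [1, x[t-1], (q[t-1] * x[t-1]), mean(x[t-5:t]),
    mean(x[t-22:t])]; the bracketed interaction term is included only when
    q is given (HARQ-type loading).
    """
    x = np.asarray(x, dtype=float)
    t_idx = np.arange(22, x.size)
    cols = [np.ones(t_idx.size), x[t_idx - 1]]
    if q is not None:
        cols.append(np.asarray(q, dtype=float)[t_idx - 1] * x[t_idx - 1])
    cols.append(np.array([x[t - 5:t].mean() for t in t_idx]))
    cols.append(np.array([x[t - 22:t].mean() for t in t_idx]))
    return np.column_stack(cols), np.asarray(y, dtype=float)[t_idx]


def expanding_rmsfe(x, y, first_oos, q=None):
    """RMSFE of one-step-ahead forecasts of y[first_oos:], re-estimating the
    coefficients by OLS on all earlier days before each forecast."""
    X, target = har_rows(x, y, q)
    start = first_oos - 22  # row index of the first out-of-sample day
    if start < X.shape[1]:
        raise ValueError("first_oos leaves too few in-sample rows")
    errors = []
    for j in range(start, target.size):
        beta, *_ = np.linalg.lstsq(X[:j], target[:j], rcond=None)
        errors.append(target[j] - X[j] @ beta)
    return float(np.sqrt(np.mean(np.square(errors))))


# Illustrative use with synthetic aligned daily series (hypothetical data).
rng = np.random.default_rng(3)
v_hat = np.exp(rng.normal(0.0, 0.5, 600))  # stand-in for V_hat_t
post_sd = 0.2 * v_hat                      # stand-in for posterior s.d. of V_t
print(expanding_rmsfe(v_hat, v_hat, first_oos=300))
print(expanding_rmsfe(v_hat, v_hat, first_oos=300, q=post_sd))
```

Passing a realized kernel series as `y` and the Bayesian series as `x` (or vice versa) reproduces the cross-combinations in the first two columns of Table 14.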
Open in new tab Table 14 Out-of-sample forecasts of IBM volatility Dependent variable Regressors HAR HARQ Panel A: 5-minute return  5-minute RKtF RKtF 1.84113 1.84444 V^t,MA(1) 1.84042 1.81152  5-minute V^t,MA(1) RKtF 1.86130 1.86642 V^t,MA(1) 1.85546 1.83054 Panel B: 1-minute return  1-minute RKtN RKtN 1.87539 1.82881 V^t,MA(4) 1.87215 1.82548  1-minute V^t,MA(4) RKtN 1.94106 1.88974 V^t,MA(4) 1.93202 1.87276 Dependent variable Regressors HAR HARQ Panel A: 5-minute return  5-minute RKtF RKtF 1.84113 1.84444 V^t,MA(1) 1.84042 1.81152  5-minute V^t,MA(1) RKtF 1.86130 1.86642 V^t,MA(1) 1.85546 1.83054 Panel B: 1-minute return  1-minute RKtN RKtN 1.87539 1.82881 V^t,MA(4) 1.87215 1.82548  1-minute V^t,MA(4) RKtN 1.94106 1.88974 V^t,MA(4) 1.93202 1.87276 Notes: This table reports the RMSFE of forecasting next period ex post variance using both classical and Bayesian nonparametric variance estimator. Both HAR and HARQ models are considered. The forecasting target is the dependent variable one period out-of-sample. On each day, the model parameters are re-estimated using all the data up to that day. Out of sample period: January 3, 2005 to February 16, 2016, 2773 days. Bold entries denote the smallest value in a column. Open in new tab The first column of Table 14 reports the data frequency and the dependent variable used in the HAR/HARQ model. The second column records the data used to construct the right-hand-side regressors. In this manner, we consider all the possible combinations of how RKtN is forecast by lags of RKtN or V^t,MA and similarly for forecasting V^t,MA ⁠. All of the specifications produce similar RMSFE. In all cases, the Bayesian variance measure forecasts itself and the realized kernel better. 5.2 SPDR S&P 500 ETF Transaction and National Best Bid and Offer data for SPDR S&P 500 ETF (SPY) was supplied by Tickdata. We follow the same method of Barndorff-Nielsen et al. 
(2011) to clean both the transaction and quote datasets, and we form grid returns at 5-minute, 1-minute, 30-second, and 10-second frequencies from transaction prices. The sample period runs from July 1, 2014 to June 29, 2016 and excludes days with fewer than six trading hours. The final dataset has 498 days of intraday observations.

Table 15 displays summary statistics of the daily variance estimators for SPY returns. As the sampling frequency increases, the sample averages of the different variance estimators move closer to the sample variance of daily returns. Figures 11 and 12 display box plots of the daily variance estimates from the classical and Bayesian estimators for the 5-minute and 30-second data. Several points stand out. First, both estimators recover the same general pattern of volatility over this period. Second, the Bayesian density intervals are often shorter than their classical counterparts and asymmetric. Although there is general agreement, the high-variance day of June 24 reveals some differences, particularly in Figure 12. Finally, both estimates become more accurate with the higher-frequency 30-second data, and both revise the variance estimate for June 24 substantially downward.

Figure 11. $RV_t$ and $\hat V_t$ based on five-minute SPY returns in June 2016. Top: variance; bottom: log-variance.

Figure 12. $RK_t^F$ and $\hat V_{t,\mathrm{MA}(1)}$ based on 30-second SPY returns in June 2016. Top: variance; bottom: log-variance.

Table 15 Summary statistics: SPY

Data                     Mean   Median     Var.    Skew.    Kurt.     Min.     Max.
Daily
  r_t                  0.0188   0.0452   0.4980  −0.6969   6.5418  −4.2837   2.7084
  r_t^2                0.4984   0.1634   1.3633   8.8640 118.4977   0.0000  18.3506
5-minute
  RV_t                 0.5287   0.2959   1.5263  16.2802 320.3198   0.0358  25.2066
  RV_t^block, M=26     0.5297   0.2920   1.5736  16.4718 325.8798   0.0356  25.6964
  RK_t^F               0.4900   0.2754   0.5812   7.8004  97.4299   0.0094  11.5382
  RK_t^N               0.4917   0.2792   0.6305   8.7319 119.7854   0.0168  12.7330
  V̂_t                  0.5212   0.2901   1.3682  15.7278 304.5983   0.0367  23.5902
  V̂_t,MA(1)            0.5245   0.2756   1.3761  15.3403 294.2938   0.0160  23.4621
  V̂_t,MA(2)            0.5162   0.2900   0.8368  10.9368 175.3430   0.0145  16.1662
1-minute
  RV_t                 0.5209   0.3154   0.5902   9.3401 138.3005   0.0521  12.8912
  RV_t^block, M=78     0.5214   0.3174   0.5910   9.3483 138.5217   0.0520  12.9052
  RK_t^F               0.5126   0.3036   0.8742  12.6004 219.7084   0.0344  17.4842
  RK_t^N               0.5062   0.2999   0.8818  13.0819 232.7427   0.0281  17.8050
  V̂_t                  0.5170   0.3153   0.5766   9.2625 136.6305   0.0522  12.7063
  V̂_t,MA(1)            0.5095   0.2984   0.9136  13.1729 235.1810   0.0383  18.1655
  V̂_t,MA(2)            0.5025   0.2940   0.9899  13.5608 244.4456   0.0344  19.0570
30-second
  RV_t                 0.5034   0.3099   0.4525   7.2251  89.5632   0.0563  10.1295
  RV_t^block, M=130    0.5297   0.2920   1.5736   16.471 325.8798   0.0356  25.6964
  RK_t^F               0.5092   0.3109   0.6869  10.9490 177.4869   0.0411  14.7520
  RK_t^N               0.5066   0.3094   0.7906  12.2661 211.2367   0.0356  16.4858
  V̂_t                  0.5008   0.3087   0.4466   7.1778  88.5730   0.0566  10.0356
  V̂_t,MA(1)            0.5007   0.3017   0.6056   9.9364 152.7216   0.0436  13.3641
  V̂_t,MA(2)            0.4997   0.2917   0.6780  10.6858 170.9371   0.0394  14.5134
10-second
  RV_t                 0.4984   0.3103   0.4172   7.0322  85.1760   0.0741   9.6097
  RV_t^block, M=260    0.4984   0.3106   0.4170   7.0330  85.2038   0.0741   9.6077
  RK_t^F               0.5021   0.3165   0.5179   8.7480 124.3016   0.0533  11.7691
  RK_t^N               0.5051   0.3128   0.5845   9.6693 145.9337   0.0447  12.9900
  V̂_t                  0.4962   0.3099   0.4109   6.9512  83.4540   0.0741   9.4866
  V̂_t,MA(1)            0.4921   0.3067   0.4220   6.9830  83.8263   0.0470   9.6152
  V̂_t,MA(2)            0.4943   0.3066   0.4271   6.9312  83.1514   0.0441   9.6627

Notes: This table reports the summary statistics of ex post variance estimators based on 5-minute, 1-minute, 30-second, and 10-second SPY returns, along with the summary statistics of the daily return and the daily squared return. Sample period: July 2, 2014 to June 28, 2016.

6 Conclusion
This article offers a new exact finite sample approach to estimating ex post variance using Bayesian nonparametric methods. The proposed approach benefits ex post variance estimation in two respects. First, observations with similar variance levels can be pooled to increase accuracy. Second, exact finite sample inference is available directly, without additional assumptions about a higher-frequency DGP. Bayesian nonparametric variance estimators are introduced for the cases of no noise, heteroskedastic noise, and serially correlated microstructure noise. Monte Carlo simulation results show that the proposed approach can increase the accuracy of ex post variance estimation and provides reliable finite sample inference. Applications to real equity returns show that the new estimators conform closely to the RV and kernel estimators, both in their average statistical properties and in their time series characteristics. The Bayesian estimators can be used with confidence and have several benefits relative to existing methods: they capture asymmetric density intervals, always remain positive, and do not rely on estimating the integrated quarticity.

Appendix
A.1 Existing Ex Post Volatility Estimation
A.1.1 RV
Let $r_{t,i}$ denote the $i$-th intraday return on day $t$, $i = 1, \ldots, n_t$, where $n_t$ is the number of intraday returns on day $t$. RV is defined as
$$RV_t = \sum_{i=1}^{n_t} r_{t,i}^2, \tag{48}$$
and $RV_t \xrightarrow{p} IV_t$ as $n_t \to \infty$ (Andersen et al., 2001). Barndorff-Nielsen and Shephard (2002) derive the asymptotic distribution of $RV_t$ as
$$\sqrt{\frac{n_t}{2\,IQ_t}}\,(RV_t - IV_t) \xrightarrow{d} N(0,1), \quad \text{as } n_t \to \infty, \tag{49}$$
where $IQ_t$ denotes the integrated quarticity, which can be estimated by the realized quarticity
$$RQ_t = \frac{n_t}{3}\sum_{i=1}^{n_t} r_{t,i}^4 \xrightarrow{p} IQ_t, \quad \text{as } n_t \to \infty. \tag{50}$$

A.1.2 Flat-top realized kernel
Barndorff-Nielsen et al.
(2008) introduced the flat-top realized kernel ($RK_t^F$), which is the optimal estimator when the microstructure error is a white noise process:
$$RK_t^F = \sum_{i=1}^{n_t}\tilde r_{t,i}^2 + \sum_{h=1}^{H} k\!\left(\frac{h-1}{H}\right)(\gamma_{-h} + \gamma_h), \quad \gamma_h = \sum_{i=1}^{n_t}\tilde r_{t,i}\,\tilde r_{t,i-h}, \tag{51}$$
where $H$ is the bandwidth and $k(x)$ is a kernel weight function. The preferred kernel function is the second-order Tukey–Hanning kernel, and the preferred bandwidth is $H^* = c\,\xi\sqrt{n_t}$, where $\xi^2 = \omega^2/\sqrt{IQ_t}$ denotes the noise-to-signal ratio and $\omega^2$ is the variance of the microstructure noise. Following Bandi and Russell (2008), $\omega^2$ can be estimated by $RV_t/(2n_t)$. Because $RV_t$ based on 10-minute returns is less sensitive to microstructure noise, it can serve as a proxy for $\sqrt{IQ_t}$. For the second-order Tukey–Hanning kernel, $c = 5.74$. Given this kernel and $H^* = c\,\xi\sqrt{n_t}$, Barndorff-Nielsen et al. (2008) show that the asymptotic distribution of $RK_t^F$ is
$$n_t^{1/4}(RK_t^F - IV_t) \xrightarrow{d} MN\left\{0,\; 4\,IQ_t^{3/4}\,\omega\left(c\,k_{\bullet}^{0,0} + 2c^{-1}k_{\bullet}^{1,1}\frac{IV_t}{\sqrt{IQ_t}} + c^{-3}k_{\bullet}^{2,2}\right)\right\}, \tag{52}$$
where $MN$ denotes a mixture of normal distributions, and $k_{\bullet}^{0,0} = 0.219$, $k_{\bullet}^{1,1} = 1.71$, and $k_{\bullet}^{2,2} = 41.7$ for the second-order Tukey–Hanning kernel. Although $\omega^2$ can be estimated by $RV_t/(2n_t)$, Barndorff-Nielsen et al. (2008) suggest a better, less biased estimator,
$$\check\omega^2 = \exp\left[\log(\hat\omega^2) - RK_t/RV_t\right]. \tag{53}$$
The estimation of $IQ_t$ is more sensitive to microstructure noise. The tri-power quarticity ($TPQ_t$) developed by Barndorff-Nielsen and Shephard (2006) can be used to estimate $IQ_t$:
$$TPQ_t = n_t\,\mu_{4/3}^{-3}\sum_{i=1}^{n_t-2}|\tilde r_{t,i}|^{4/3}\,|\tilde r_{t,i+1}|^{4/3}\,|\tilde r_{t,i+2}|^{4/3}, \tag{54}$$
where $\mu_{4/3} = 2^{2/3}\,\Gamma(7/6)/\Gamma(1/2)$. Replacing $IV_t$, $\omega^2$, and $IQ_t$ in Equation (52) with $RK_t^F$, $\check\omega^2$, and $TPQ_t$ gives the asymptotic variance of $RK_t^F$.

A.1.3 Non-negative realized kernel
The flat-top realized kernel discussed in the previous subsection assumes that the error term is white noise. This assumption is restrictive: in practice, the error term can be serially dependent or dependent on returns. Another drawback of $RK_t^F$ is that it may produce negative variance estimates, albeit very rarely.
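The classical quantities in Equations (48), (50), and (51) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the article: the autocovariances use only in-window returns with $\gamma_{-h} = \gamma_h$ (the flat-top kernel as defined also draws on returns just outside the day), the bandwidth plugs $\hat\omega^2 = RV_t/(2n_t)$ and a sparse-grid RV standing in for $\sqrt{IQ_t}$ into $H^* = c\,\xi\sqrt{n_t}$ with $c = 5.74$, and the simulated day (constant volatility, $IV_t = 1$, iid Gaussian noise) is hypothetical. The helper names are likewise illustrative.

```python
import numpy as np


def realized_variance(r):
    """Equation (48): sum of squared intraday returns."""
    return float(np.sum(np.square(r)))


def realized_quarticity(r):
    """Equation (50): RQ_t = (n_t / 3) * sum of fourth powers."""
    return float(len(r) / 3.0 * np.sum(np.power(r, 4)))


def tukey_hanning2(x):
    """Second-order Tukey-Hanning weight, k(x) = sin^2(pi/2 * (1 - x)^2) on [0, 1]."""
    return float(np.sin(np.pi / 2.0 * (1.0 - x) ** 2) ** 2)


def autocov(r, h):
    """gamma_h from in-window returns only: sum over i of r_i * r_{i-h}."""
    return float(np.dot(r[h:], r[:len(r) - h]))


def flat_top_kernel(r, H):
    """Equation (51), assuming gamma_{-h} = gamma_h (no out-of-window returns)."""
    rk = autocov(r, 0)
    for h in range(1, H + 1):
        rk += tukey_hanning2((h - 1) / H) * 2.0 * autocov(r, h)
    return rk


def bandwidth(r, r_sparse, c=5.74):
    """H* = c * xi * sqrt(n); omega^2 ~ RV / (2n), sqrt(IQ) proxied by sparse RV."""
    n = len(r)
    xi2 = (realized_variance(r) / (2 * n)) / realized_variance(r_sparse)
    return max(1, int(np.ceil(c * np.sqrt(xi2 * n))))


# Hypothetical day: constant volatility, IV = 1, iid Gaussian noise.
rng = np.random.default_rng(42)
n, iv, omega2 = 4680, 1.0, 1e-4
p = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(iv / n), n))])
p_obs = p + rng.normal(0.0, np.sqrt(omega2), n + 1)
r_obs = np.diff(p_obs)
r_sparse = np.diff(p_obs[::60])          # coarse grid as the low-noise proxy
H = bandwidth(r_obs, r_sparse)
rv, rk = realized_variance(r_obs), flat_top_kernel(r_obs, H)
print(f"H = {H}, RV = {rv:.3f}, RK_F = {rk:.3f}")
```

In this setting one would expect RV to sit well above $IV_t = 1$ (its noise bias is roughly $2n_t\omega^2$), while $RK_t^F$ stays near 1. Since nothing in the weighting forces the flat-top estimate to be non-negative, it can occasionally fall below zero, which is the drawback that motivates the non-negative kernel below.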
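Barndorff-Nielsen et al. (2011) further introduced the non-negative realized kernel ($RK_t^N$), which is more robust to these assumptions about the error term and is calculated as
$$RK_t^N = \sum_{h=-H}^{H} k\!\left(\frac{h}{H+1}\right)\gamma_h, \quad \gamma_h = \sum_{i=|h|+1}^{n_t}\tilde r_{t,i}\,\tilde r_{t,i-|h|}. \tag{55}$$
The optimal choice of $H$ is $H^* = c\,\xi^{4/5}n_t^{3/5}$, and the preferred kernel weight function is the Parzen kernel (16), which implies $c = 3.5134$. $\xi^2$ can be estimated with the same method as in the calculation of $RK_t^F$. Barndorff-Nielsen et al. (2011) show that, with $H^* = c\,\xi^{4/5}n_t^{3/5}$, the asymptotic distribution of $RK_t^N$ is
$$n_t^{1/5}(RK_t^N - IV_t) \xrightarrow{d} MN(\kappa, 4\kappa^2), \tag{56}$$
where $\kappa = \kappa_0(IQ_t\,\omega)^{2/5}$ and $\kappa_0 = 0.97$ for the Parzen kernel; $\omega$ and $IQ_t$ can be estimated using Equations (53) and (54).

A.2 Adjustment to DPM-MA(1) Estimator
Let $p_{t,i}$ denote the latent intraday price and $\epsilon_{t,i}$ the microstructure noise, which is independently distributed and heteroskedastic. The observed intraday price $\tilde p_{t,i}$ is
$$\tilde p_{t,i} = p_{t,i} + \epsilon_{t,i}, \quad E(\epsilon_{t,i}) = 0, \quad \mathrm{var}(\epsilon_{t,i}) = \omega_{t,i}^2. \tag{57}$$
The log-return process is
$$\tilde r_{t,i} = \tilde p_{t,i} - \tilde p_{t,i-1} = p_{t,i} - p_{t,i-1} + \epsilon_{t,i} - \epsilon_{t,i-1} = r_{t,i} + \epsilon_{t,i} - \epsilon_{t,i-1}, \tag{58}$$
where $\tilde r_{t,i}$ and $r_{t,i}$ are the observed return and the pure return, with $\sigma_{t,i}^2 = \mathrm{var}(r_{t,i})$. The variance and first autocovariance of $\{\tilde r_{t,i}\}_{i=1}^{n_t}$ are
$$\mathrm{var}(\tilde r_{t,i}) = \sigma_{t,i}^2 + \omega_{t,i}^2 + \omega_{t,i-1}^2, \tag{59}$$
$$\mathrm{cov}(\tilde r_{t,i}, \tilde r_{t,i-1}) = -\omega_{t,i-1}^2. \tag{60}$$
Consider the following heteroskedastic MA(1) model for the observed $\tilde r_{t,i}$,
$$\tilde r_{t,i} = \mu_t + \theta_t\eta_{t,i-1} + \eta_{t,i}, \quad \eta_{t,i} \sim N(0, \delta_{t,i}^2), \tag{61}$$
which will be used to recover an estimate of the ex post variance of the pure return process, $V_t = \sum_{i=1}^{n_t}\sigma_{t,i}^2$. The corresponding moments of this process are
$$\mathrm{var}(\tilde r_{t,i}) = \theta_t^2\delta_{t,i-1}^2 + \delta_{t,i}^2, \tag{62}$$
$$\mathrm{cov}(\tilde r_{t,i}, \tilde r_{t,i-1}) = \theta_t\delta_{t,i-1}^2. \tag{63}$$
Equating (59) and (62) gives
$$\sigma_{t,i}^2 + \omega_{t,i}^2 + \omega_{t,i-1}^2 = \theta_t^2\delta_{t,i-1}^2 + \delta_{t,i}^2. \tag{64}$$
Equating (60) and (63) gives
$$-\omega_{t,i-1}^2 = \theta_t\delta_{t,i-1}^2 \quad\text{and}\quad -\omega_{t,i}^2 = \theta_t\delta_{t,i}^2. \tag{65}$$
By Equation (65), the sum of $\delta_{t,i}^2$ over $i = 1, \ldots, n_t$ is
$$\sum_{i=1}^{n_t}\delta_{t,i}^2 = -\frac{1}{\theta_t}\sum_{i=1}^{n_t}\omega_{t,i}^2. \tag{66}$$
Substituting both terms of Equation (65) into Equation (64) yields
$$\sigma_{t,i}^2 + \omega_{t,i}^2 + \omega_{t,i-1}^2 = -\theta_t\omega_{t,i-1}^2 - \frac{\omega_{t,i}^2}{\theta_t}, \tag{67}$$
$$\sigma_{t,i}^2 + \left(1 + \frac{1}{\theta_t}\right)\omega_{t,i}^2 + (1+\theta_t)\,\omega_{t,i-1}^2 = 0. \tag{68}$$
Summing Equation (68) over $i = 1, \ldots, n_t$ gives
$$\sum_{i=1}^{n_t}\sigma_{t,i}^2 + \left(1 + \frac{1}{\theta_t}\right)\sum_{i=1}^{n_t}\omega_{t,i}^2 + (1+\theta_t)\sum_{i=1}^{n_t}\omega_{t,i-1}^2 = 0, \tag{69}$$
$$V_t = -\left(1 + \frac{1}{\theta_t}\right)\sum_{i=1}^{n_t}\omega_{t,i}^2 - (1+\theta_t)\sum_{i=1}^{n_t}\omega_{t,i-1}^2. \tag{70}$$
The ratio of Equation (70) to Equation (66) is
$$\frac{V_t}{\sum_{i=1}^{n_t}\delta_{t,i}^2} = \frac{-\left(1 + \frac{1}{\theta_t}\right)\sum_{i=1}^{n_t}\omega_{t,i}^2 - (1+\theta_t)\sum_{i=1}^{n_t}\omega_{t,i-1}^2}{-\frac{1}{\theta_t}\sum_{i=1}^{n_t}\omega_{t,i}^2} \tag{71}$$
$$= \frac{(1+\theta_t)^2\sum_{i=1}^{n_t-1}\omega_{t,i}^2 + (1+\theta_t)\,\omega_{t,n_t}^2 + (\theta_t + \theta_t^2)\,\omega_{t,0}^2}{\sum_{i=1}^{n_t-1}\omega_{t,i}^2 + \omega_{t,n_t}^2} \tag{72}$$
$$= (1+\theta_t)^2, \quad \text{if } \omega_{t,n_t} = \omega_{t,0}. \tag{73}$$
Finally, we have
$$(1+\theta_t)^2\sum_{i=1}^{n_t}\delta_{t,i}^2 = V_t, \quad \text{if } \omega_{t,n_t} = \omega_{t,0}. \tag{74}$$

A.3 Adjustment to DPM-MA(2) Estimator
If the observed intraday price $\tilde p_{t,i}$ is
$$\tilde p_{t,i} = p_{t,i} + \epsilon_{t,i} - \rho\,\epsilon_{t,i-1}, \quad E(\epsilon_{t,i}) = 0, \quad \mathrm{var}(\epsilon_{t,i}) = \omega_{t,i}^2, \tag{75}$$
then the log-return process is
$$\tilde r_{t,i} = \tilde p_{t,i} - \tilde p_{t,i-1} = r_{t,i} + \epsilon_{t,i} - (1+\rho)\,\epsilon_{t,i-1} + \rho\,\epsilon_{t,i-2}. \tag{76}$$
Using the following heteroskedastic MA(2) model for $\tilde r_{t,i}$,
$$\tilde r_{t,i} = \mu_t + \theta_{1t}\eta_{t,i-1} + \theta_{2t}\eta_{t,i-2} + \eta_{t,i}, \quad \eta_{t,i} \sim N(0, \delta_{t,i}^2), \tag{77}$$
it can be shown that the adjustment term is
$$(1 + \theta_{1t} + \theta_{2t})^2\sum_{i=1}^{n_t}\delta_{t,i}^2 = V_t, \quad \text{if } \omega_{t,n_t-1} = \omega_{t,0} \text{ and } \omega_{t,n_t} = \omega_{t,-1}. \tag{78}$$
Similar results hold for higher-order MA models.

A.4 Proof of Theorem 1
We show that $E[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}]$ can be expressed as realized variance, $RV_t$, plus a bias term, and that this bias term converges to zero. To derive a suitable expression for $E[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}]$, it is useful to introduce the allocation variables used in the MCMC sampler in Section 1.3:
$$E\left[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}\right] = E\left[\sum_{i=1}^{n_t}\sigma_{t,i}^2 \,\middle|\, \{r_{t,i}\}_{i=1}^{n_t}\right] = E_{\{s_{t,i}\}_{i=1}^{n_t} \mid \{r_{t,i}\}_{i=1}^{n_t}}\left\{E\left[\sum_{i=1}^{n_t}\sigma_{t,i}^2 \,\middle|\, \{r_{t,i}\}_{i=1}^{n_t}, \{s_{t,i}\}_{i=1}^{n_t}\right]\right\}.$$
Conditional on the allocation variables $\{s_{t,i}\}_{i=1}^{n_t}$, we can write $\sigma_t^2 = \sum_{i=1}^{n_t}\sigma_{t,i}^2 = \sum_{k=1}^{K_t} m_{t,k}\,\psi_{t,k}$, where $K_t$ is the number of distinct values of $\{s_{t,i}\}_{i=1}^{n_t}$, $m_{t,k} = \sum_{i=1}^{n_t} 1(s_{t,i} = k)$, and $\psi_{t,k} \sim IG\left(\nu_{0,t} + m_{t,k}/2,\; s_{0,t} + \sum_{s_{t,i}=k} r_{t,i}^2/2\right)$.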
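It follows that
$$\begin{aligned}
E\left[\sum_{i=1}^{n_t}\sigma_{t,i}^2 \,\middle|\, \{r_{t,i}\}_{i=1}^{n_t}, \{s_{t,i}\}_{i=1}^{n_t}\right] &= \sum_{k=1}^{K_t} m_{t,k}\,\frac{s_{0,t} + \sum_{s_{t,i}=k} r_{t,i}^2/2}{\nu_{0,t} + m_{t,k}/2 - 1} \\
&= s_{0,t}\sum_{k=1}^{K_t}\frac{m_{t,k}}{\nu_{0,t} + m_{t,k}/2 - 1} + \sum_{k=1}^{K_t}\left(1 - \frac{\nu_{0,t} - 1}{\nu_{0,t} + m_{t,k}/2 - 1}\right)\sum_{s_{t,i}=k} r_{t,i}^2 \\
&= \sum_{i=1}^{n_t} r_{t,i}^2 + s_{0,t}\sum_{k=1}^{K_t}\frac{m_{t,k}}{\nu_{0,t} + m_{t,k}/2 - 1} - \sum_{k=1}^{K_t}\frac{\nu_{0,t} - 1}{\nu_{0,t} + m_{t,k}/2 - 1}\sum_{s_{t,i}=k} r_{t,i}^2 \\
&= RV_t + B(\{s_{t,i}\}_{i=1}^{n_t}),
\end{aligned}$$
where
$$B(\{s_{t,i}\}_{i=1}^{n_t}) = \sum_{k=1}^{K_t}\frac{s_{0,t}\,m_{t,k} - (\nu_{0,t} - 1)\sum_{s_{t,i}=k} r_{t,i}^2}{\nu_{0,t} + m_{t,k}/2 - 1} = \sum_{k=1}^{K_t}\frac{m_{t,k}}{\nu_{0,t} + m_{t,k}/2 - 1}\left[s_{0,t} - (\nu_{0,t} - 1)\frac{1}{m_{t,k}}\sum_{s_{t,i}=k} r_{t,i}^2\right].$$
The posterior mean can therefore be expressed as
$$E\left[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}\right] = RV_t + E_{\{s_{t,i}\}_{i=1}^{n_t} \mid \{r_{t,i}\}_{i=1}^{n_t}}\left[B(\{s_{t,i}\}_{i=1}^{n_t})\right]. \tag{79}$$
To show consistency, we use the result that $RV_t$ is a consistent estimator of quadratic variation under our conditions (Andersen et al., 2003). To see that the bias term $E_{\{s_{t,i}\} \mid \{r_{t,i}\}}[B(\{s_{t,i}\})]$ converges in probability to zero, notice that
$$E_{\{s_{t,i}\} \mid \{r_{t,i}\}}\left[B(\{s_{t,i}\})\right] = \sum_{k=1}^{K_t}\frac{m_{t,k}}{\nu_{0,t} + m_{t,k}/2 - 1}\left[s_{0,t} - (\nu_{0,t} - 1)\frac{1}{m_{t,k}}\sum_{s_{t,i}=k}\sigma_{t,i}^2\right] < 2\left[K_t\,s_{0,t} - (\nu_{0,t} - 1)\sum_{k=1}^{K_t}\frac{1}{m_{t,k}}\sum_{s_{t,i}=k}\sigma_{t,i}^2\right].$$
Clearly, this converges in probability to zero if $K_t\,s_{0,t} \to 0$, because $\frac{1}{m_{t,k}}\sum_{s_{t,i}=k}\sigma_{t,i}^2 \to 0$ in probability for all $s_{t,1:n_t}$ (since $\sup(\tau_{t,j+1} - \tau_{t,j}) \to 0$). We know that $E[K_t] \approx M\log(1 + n_t/M)$, and the result follows from the assumption that $s_{0,n_t} = O(n_t^{-\alpha})$ for $\alpha > 0$.

Footnotes
* We are grateful for comments from Federico M. Bandi, an Associate Editor, two anonymous referees, Silvia Gonçalves, and seminar participants at the 2016 NBER-NSF Seminar on Bayesian Inference in Econometrics and Statistics and the 2016 Rimini Conference in Economics and Finance. J.M.M. thanks the SSHRC of Canada for financial support.
1 For a good survey of the key concepts, see Andersen and Benzoni (2008); for an in-depth treatment, see Aït-Sahalia and Jacod (2014).
2 Many estimators of integrated volatility calculated from high-frequency data, such as realized kernels, use pooling to reduce microstructure noise but are not explicitly viewed as shrinkage estimators.
3 Polson and Roberts (1994) is an early work that used RV concepts and Bayesian methods to estimate diffusion processes for stock returns.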
4 This infinite mixture is related to finite mixtures that have been used to approximate distributions; see, for example, Kim, Shephard, and Chib (1998).
5 An MA(q) model of stationary and ergodic microstructure noise would not allow for staleness (Bandi, Pirino, and Reno, 2017) or flat trading (Phillips and Yu, 2007).
6 Restrictions on the MA coefficients: all roots of $1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q = 0$ lie outside the unit circle.
7 If $\tilde r_t = \theta_1\eta_{t-1} + \cdots + \theta_q\eta_{t-q} + \eta_t$, then under their assumptions the bias-corrected estimate of ex post variance is $RV_{MAq} = (1 + \theta_1 + \cdots + \theta_q)^2\sum_{i=1}^{n_t}\hat\eta_i^2$, where $\hat\eta_i$ denotes a fitted residual.
8 The function s-exp is defined as $\text{s-exp}(x) = \exp(x)$ if $x \le x_0$ and $\text{s-exp}(x) = \exp(x_0)\,\frac{x_0}{x_0 - x_0^2 + x^2}$ if $x > x_0$, with $x_0 = \log(1.5)$.
9 The blocked RV is calculated as $RV_t^{block} = M\Delta_t\sum_i \frac{M}{2}\,\frac{\Gamma\left(\frac{M-1}{2}\right)}{\Gamma\left(\frac{M+1}{2}\right)}\,\sigma_{\tau_i}^2$, where $\sigma_{\tau_i}^2 = \frac{1}{\Delta_t(M-1)}\sum_{t_j \in (\tau_i, \tau_{i+1})}(r_{t_j} - \bar r_{\tau_i})^2$, $\bar r_{\tau_i} = \frac{1}{M}\sum_{t_j \in (\tau_i, \tau_{i+1})} r_{t_j}$, and $M$ is the block size.
10 http://www.kibot.com
11 The 95% confidence intervals using $\log(RV_t)$, $\log(RK_t^F)$, and $\log(RK_t^N)$ are based on the asymptotic distributions in Barndorff-Nielsen and Shephard (2002), Barndorff-Nielsen et al. (2008), and Barndorff-Nielsen et al. (2011).
12 A drawback of this specification is that the coefficient on $RV_{t-1}$ can be negative and produce a negative forecast of next period's variance. To avoid this, when $\beta_1 + \beta_{1Q}RQ_{t-1}^{1/2} < 0$, it is set to 0.
13 Data from day $t + 1$ would not be available in a real-time scenario. Using only data from day $t$ to set $\nu_{0,t}$ and $s_{0,t}$ gives very similar results.

References
Aït-Sahalia Y., Jacod J. 2014. High-Frequency Financial Econometrics. Princeton, NJ: Princeton University Press.
Aït-Sahalia Y., Mancini L. 2008. Out of Sample Forecasts of Quadratic Variation. Journal of Econometrics 147: 17–33.
Aït-Sahalia Y., Mykland P. A., Zhang L. 2011. Ultra High Frequency Volatility Estimation with Dependent Microstructure Noise. Journal of Econometrics 160: 160–175.
Andersen T., Bollerslev T. 1998. Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts. International Economic Review 39: 885–905.
Andersen T. G., Benzoni L. 2008. Realized Volatility. Working Paper Series WP-08-14. Chicago, IL: Federal Reserve Bank of Chicago.
Andersen T. G., Bollerslev T., Diebold F. X., Ebens H. 2001. The Distribution of Realized Stock Return Volatility. Journal of Financial Economics 61: 43–76.
Andersen T. G., Bollerslev T., Diebold F. X., Labys P. 2001. The Distribution of Realized Exchange Rate Volatility. Journal of the American Statistical Association 96: 42–55.
Andersen T. G., Bollerslev T., Diebold F. X., Labys P. 2003. Modeling and Forecasting Realized Volatility. Econometrica 71: 579–625.
Andersen T. G., Bollerslev T., Meddahi N. 2011. Realized Volatility Forecasting and Market Microstructure Noise. Journal of Econometrics 160: 220–234.
Bandi F. M., Pirino D., Reno R. 2017. EXcess Idle Time. Econometrica 85: 1793–1846.
Bandi F. M., Russell J. R. 2008. Microstructure Noise, Realized Variance, and Optimal Sampling. Review of Economic Studies 75: 339–369.
Barndorff-Nielsen O. E., Hansen P. R., Lunde A., Shephard N. 2008. Designing Realized Kernels to Measure the Ex Post Variation of Equity Prices in the Presence of Noise. Econometrica 76: 1481–1536.
Barndorff-Nielsen O. E., Hansen P. R., Lunde A., Shephard N. 2009. Realized Kernels in Practice: Trades and Quotes. The Econometrics Journal 12: C1–C32.
Barndorff-Nielsen O. E., Hansen P. R., Lunde A., Shephard N. 2011. Multivariate Realised Kernels: Consistent Positive Semi-Definite Estimators of the Covariation of Equity Prices with Noise and Non-Synchronous Trading. Journal of Econometrics 162: 149–169.
Barndorff-Nielsen O. E., Shephard N. 2002. Estimating Quadratic Variation Using Realized Variance. Journal of Applied Econometrics 17: 457–477.
Barndorff-Nielsen O. E., Shephard N. 2006. Econometrics of Testing for Jumps in Financial Economics Using Bipower Variation. Journal of Financial Econometrics 4: 1–30.
Bollen B., Inder B. 2002. Estimating Daily Volatility in Financial Markets Utilizing Intraday Data. Journal of Empirical Finance 9: 551–562.
Bollerslev T., Patton A., Quaedvlieg R. 2016. Exploiting the Errors: A Simple Approach for Improved Volatility Forecasting. Journal of Econometrics: 1–18.
Brown L. D., Zhao L. H. 2012. A Geometrical Explanation of Stein Shrinkage. Statistical Science 27: 24–30.
Chernov M., Gallant A. R., Ghysels E., Tauchen G. 2003. Alternative Models for Stock Price Dynamics. Journal of Econometrics 116: 225–257.
Corradi V., Distaso W., Swanson N. R. 2009. Predictive Density Estimators for Daily Volatility Based on the Use of Realized Measures. Journal of Econometrics 150: 119–138.
Corsi F. 2009. A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics 7: 174–196.
Escobar M. D., West M. 1994. Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association 90: 577–588.
Ferguson T. S. 1973. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics 1: 209–230.
Goncalves S., Meddahi N. 2009. Bootstrapping Realized Volatility. Econometrica 77: 283–306.
Griffin J. E., Oomen R. C. A. 2008. Sampling Returns for Realized Variance Calculations: Tick Time or Transaction Time? Econometric Reviews 27: 230–253.
Hansen P., Large J., Lunde A. 2008. Moving Average-Based Estimators of Integrated Variance. Econometric Reviews 27: 79–111.
Hansen P. R., Lunde A. 2006. Realized Variance and Market Microstructure Noise. Journal of Business & Economic Statistics 24: 127–161.
Huang X., Tauchen G. 2005. The Relative Contribution of Jumps to Total Price Variance. Journal of Financial Econometrics 3: 456–499.
Kalli M., Griffin J. E., Walker S. G. 2011. Slice Sampling Mixture Models. Statistics and Computing 21: 93–105.
Kim S., Shephard N., Chib S. 1998. Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models. Review of Economic Studies 65: 361–393.
Maheu J. M., McCurdy T. H. 2002. Nonlinear Features of Realized FX Volatility. Review of Economics and Statistics 84: 668–681.
Mykland P. A., Zhang L. 2009. Inference for Continuous Semimartingales Observed at High Frequency. Econometrica 77: 1403–1445.
Phillips P. C. B., Yu J. 2007. Information Loss in Volatility Measurement with Flat Price Trading. Cowles Foundation Discussion Paper 1598.
Polson N. G., Roberts G. O. 1994. Bayes Factors for Discrete Observations from Diffusion Processes. Biometrika 81: 11–26.
Sethuraman J. 1994. A Constructive Definition of Dirichlet Priors. Statistica Sinica 4: 639–650.
Zhang L., Mykland P. A., Aït-Sahalia Y. 2005. A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data. Journal of the American Statistical Association 100: 1394–1411.
Zhou B. 1996. High Frequency Data and Volatility in Foreign Exchange Rates. Journal of Business & Economic Statistics 14: 45–52.

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Journal of Financial Econometrics, Oxford University Press

Bayesian Nonparametric Estimation of Ex Post Variance

Journal of Financial Econometrics, Advance Article, Nov 1, 2019

Publisher: Oxford University Press
ISSN: 1479-8409
eISSN: 1479-8417
DOI: 10.1093/jjfinec/nbz034

Abstract

Variance estimation is central to many questions in finance and economics. Until now, ex post variance estimation has been based on infill asymptotic assumptions that exploit high-frequency data. This article offers a new exact finite sample approach to estimating ex post variance using Bayesian nonparametric methods. In contrast to the classical counterpart, the proposed method exploits pooling over high-frequency observations with similar variances. Bayesian nonparametric variance estimators are introduced and discussed for the cases of no noise, heteroskedastic noise, and serially correlated microstructure noise. Monte Carlo simulation results show that the proposed approach can increase the accuracy of variance estimation. Applications to equity data and comparisons with realized variance and realized kernel estimators are included.

Volatility is an indispensable quantity in finance and a key input into asset pricing, risk management, and portfolio management. In the last two decades, researchers have taken advantage of high-frequency data to estimate ex post variance from intraperiod returns. Barndorff-Nielsen and Shephard (2002) and Andersen et al. (2003) formalized the idea of using high-frequency data to measure the volatility of lower-frequency returns. They show that realized variance (RV) is a consistent estimator of quadratic variation under ideal conditions. Unlike parametric volatility models, in which the model specification matters, RV is a model-free estimate of quadratic variation: it is valid under a wide range of spot volatility dynamics.¹ RV provides an accurate measure of ex post variance if there is no market microstructure noise. In practice, however, observed high-frequency prices are inevitably contaminated by noise, and returns are no longer uncorrelated. In this case, RV is a biased and inconsistent estimator (Hansen and Lunde, 2006; Aït-Sahalia, Mykland, and Zhang, 2011).
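The bias can be seen in a small simulation. When the noise is iid with variance $\omega^2$ and independent of the efficient price, $E(RV_t) = IV_t + 2n_t\omega^2$, so the distortion grows with the number of returns, that is, with the sampling frequency. The Python sketch below assumes constant volatility, a single 6.5-hour day on a 1-second grid, and Gaussian noise; the parameter values and the helper `rv_at` are illustrative, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(7)
N, iv, omega2 = 23_400, 1.0, 4e-5   # assumed: 1-second grid, IV = 1, noise variance omega^2
efficient = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(iv / N), N))])
observed = efficient + rng.normal(0.0, np.sqrt(omega2), N + 1)   # iid Gaussian noise


def rv_at(prices, step):
    """Realized variance from prices sampled every `step` ticks."""
    r = np.diff(prices[::step])
    return float(np.sum(r ** 2))


for step in (1, 10, 60, 300):
    n = N // step
    print(f"every {step:>3} s: n = {n:>5}, RV = {rv_at(observed, step):.3f}, "
          f"IV + 2 n omega^2 = {iv + 2 * n * omega2:.3f}")
```

Sparse sampling (for example, every 300 seconds) shrinks the bias term but discards most of the data, which is the trade-off the noise-robust estimators discussed next try to avoid.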
The impact of market microstructure noise on forecasting is explored in Aït-Sahalia and Mancini (2008) and Andersen, Bollerslev, and Meddahi (2011). Several approaches have been proposed to estimate ex post variance under microstructure noise. Zhou (1996) first introduced the idea of using a kernel-based method to estimate ex post variance. Barndorff-Nielsen et al. (2008) formally developed the realized kernel and showed how to use it in practice in a later article (Barndorff-Nielsen et al., 2009). Another approach is the subsampling method of Zhang, Mykland, and Aït-Sahalia (2005). Hansen, Large, and Lunde (2008) showed how a time series model can be used to filter out market microstructure noise and obtain corrected estimates of ex post variance. A robust version of the predictive density of integrated volatility is derived in Corradi, Distaso, and Swanson (2009). Although bootstrap refinements are explored in Goncalves and Meddahi (2009), all distributional results in this literature rely on infill asymptotics.

Much of the literature has focused on the asymptotic properties of adaptations of realized variation that are robust to market microstructure noise. However, an argument can be made against the direct use of realized variation even without noise, if the time between observations does not converge to zero. Realized variation is the sum of squared intraperiod returns, and each component of that sum is an unbiased estimator of the corresponding intraperiod integrated volatility. It is well understood that unbiased estimators based on a single observation can be suboptimal in terms of mean squared error and risk (see, for example, Brown and Zhao, 2012). Shrinkage estimators, which pool information from related estimates, are one way to construct estimators with better properties.
This suggests that estimators of the integrated volatility with smaller mean squared error than realized variation can be constructed by summing shrunken estimates of the intraperiod integrated volatility.2 When will there be substantial differences? Intuitively, shrinkage estimates work well when the unbiased estimators are noisy and information can be usefully pooled. If high-frequency data had no noise, we would expect the difference to decrease as the sampling frequency increases (and the benefit of pooling to disappear asymptotically). In reality, high-frequency financial data include noise, and it is less clear that substantial differences will disappear asymptotically. In our simulation experiments, we find evidence that the difference in mean squared error persists at high frequency. We use a Bayesian hierarchical approach to achieve shrinkage by pooling the information from related estimates. To our knowledge, this idea has not been used in the estimation of volatility using high-frequency data. We assume that the intraperiod integrated volatilities over short periods are exchangeable, which implies that they can be modeled as conditionally independent draws from a prior distribution. The choice of this distribution has a strong effect on the form of pooling, so we infer it from the data using Bayesian nonparametric methods rather than choosing a parametric family (such as the generalized inverse Gaussian distribution). We model intraperiod returns with a Dirichlet process mixture (DPM) model. This is a countably infinite mixture of distributions that facilitates the clustering of return observations into distinct groups sharing the same variance parameter. Our proposed method benefits variance estimation in at least two respects. First, observations with a common value of intraperiod variance can be pooled into the same group, leading to a more precise estimate.
The pooling is done endogenously along with estimation of the other model parameters. Second, the Bayesian nonparametric model delivers exact finite sample inference regarding ex post variance or transformations such as the logarithm. As such, uncertainty around the estimate of ex post volatility is readily available from the predictive density. Unlike the existing asymptotic theory, which may give confidence intervals that contain negative values for variance, density intervals always lie on the positive real line and can accommodate asymmetry. By extending key results in Hansen, Large, and Lunde (2008), we adapt the DPM model to deal with returns contaminated with heteroskedastic noise and serially correlated noise. Mykland and Zhang (2009) considered links between local parametric inference and high-frequency financial data analysis. Their approach assumes that quantities such as volatility are constant over blocks of returns, which can lead to more efficient estimation and to the definition of new estimators. Our method can be seen as a generalization of their blocked RV estimator that uses partitioning ideas from Bayesian nonparametrics to define clusters rather than blocks of returns. Our approach finds these clusters endogenously and so does not restrict them to blocks of returns that are consecutive in time. Monte Carlo simulation results show the Bayesian approach to be a very competitive alternative. Overall, pooling can lead to more precise estimates of ex post variance and better coverage frequencies. These results are robust to different prior settings, irregularly spaced prices, and tick time sampling. We show that the new variance estimators can be used with confidence and effectively recover both the average statistical features of daily ex post variance and its time series properties. Two applications to real-world data, with comparisons to RV and kernel-based estimators, are included. This article is organized as follows.
The Bayesian nonparametric model, daily variance estimator, and model estimation methods are discussed in Section 1. Section 2 extends the Bayesian nonparametric model to deal with heteroskedastic and serially correlated microstructure noise. The consistency of the estimator is considered in Section 3. Section 4 provides an extensive simulation study and comparison of the estimators. Applications to IBM and S&P 500 ETF data are found in Section 5. Section 6 concludes, and an Appendix follows.

1 Bayesian Nonparametric Ex Post Variance Estimation

In this section, we introduce a Bayesian nonparametric ex post volatility estimator. After defining the daily variance that is to be estimated conditional on the data, the discussion moves to the DPM model, which provides the framework of the proposed estimator. This section deals with returns without microstructure noise; an estimator suitable for returns with microstructure noise is given in Section 2.

1.1 Model of High-Frequency Returns

First, we consider the case with no market microstructure noise. We are interested in estimating the integrated volatility over fixed periods (which for simplicity will subsequently be called a day) using high-frequency intraday log returns. We assume that there are $n_t$ intraday log returns for the $t$-th day, recorded at times $\tau_{t,1}, \tau_{t,2}, \ldots, \tau_{t,n_t}$ and denoted $r_{t,1}, \ldots, r_{t,n_t}$. The model for each log return is

$$ r_{t,i} = \mu_t + \sigma_{t,i} z_{t,i}, \qquad z_{t,i} \overset{iid}{\sim} N(0,1), \qquad i = 1, \ldots, n_t, \tag{1} $$

where $\mu_t$ is constant in day $t$ and $0 < \sigma_{t,i} < \infty$ for all $i$.3 We make no assumptions on the stochastic process generating $\sigma_{t,i}^2$. For example, volatility may have jumps, undergo structural change, or possess long memory. Given this assumed discrete data generating process (DGP), we cannot separate continuous and discrete (jump) components, a distinction commonly made in the literature (Barndorff-Nielsen and Shephard, 2006).
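A small simulation makes the estimation target concrete. The Python sketch below draws days of returns from Equation (1) with a hypothetical volatility path that shifts level halfway through the day (structural change is one of the features the model allows), computes the ex post variance $V_t = \sum_i \sigma_{t,i}^2$ of Equation (3), and compares it with the average of $\sum_i r_{t,i}^2$ across replications. Under Equation (1), $E[\sum_i r_{t,i}^2 \mid \{\sigma_{t,i}^2\}] = V_t + n_t\mu_t^2$, so the sum of squared returns targets $V_t$ up to a term that is negligible when $\mu_t$ is small. The path, the drift, the seed, and $n_t = 78$ (five-minute returns over a 6.5-hour day) are illustrative assumptions.

```python
import random


def simulate_day(sigmas, mu, rng):
    """Intraday returns r_i = mu + sigma_i * z_i with z_i iid N(0, 1), as in Equation (1)."""
    return [mu + s * rng.gauss(0.0, 1.0) for s in sigmas]


# Hypothetical volatility path: a level shift halfway through a 78-return day.
n = 78
sigmas = [0.1] * (n // 2) + [0.2] * (n - n // 2)
mu = 0.01

# Ex post variance, Equation (3): the sum of the intraday variances.
v_true = sum(s * s for s in sigmas)

# Average sum of squared returns over many simulated days with the same path.
rng = random.Random(7)
reps = 10_000
rv_draws = [sum(r * r for r in simulate_day(sigmas, mu, rng)) for _ in range(reps)]
mean_rv = sum(rv_draws) / reps

print(v_true, v_true + n * mu ** 2, mean_rv)
```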
The daily return is

$$ r_t = \sum_{i=1}^{n_t} r_{t,i}, \tag{2} $$

and it follows that, conditional on the unknown realized volatility path $F_t \equiv \{\sigma_{t,i}^2\}_{i=1}^{n_t}$, the ex post variance is

$$ V_t \equiv \operatorname{Var}(r_t \mid F_t) = \sum_{i=1}^{n_t} \sigma_{t,i}^2. \tag{3} $$

In our Bayesian setting, $V_t$ is the target to estimate conditional on the data $\{r_{t,i}\}_{i=1}^{n_t}$.

1.2 A Bayesian Model with Pooling

In this section, we discuss a nonparametric prior for the model of Equation (1) that allows for pooling over common values of $\sigma_{t,i}^2$. The DPM model is a Bayesian nonparametric mixture model that has been used in density estimation and for modeling unknown hierarchical effects, among many other applications. A key advantage of the model is that it naturally incorporates parameter pooling. Our nonparametric model has the following hierarchical form:

$$ r_{t,i} \mid \mu_t, \sigma_{t,i} \overset{ind}{\sim} N(\mu_t, \sigma_{t,i}^2), \qquad i = 1, \ldots, n_t, \tag{4} $$
$$ \sigma_{t,i}^2 \mid G_t \overset{iid}{\sim} G_t, \tag{5} $$
$$ G_t \mid G_{0,t}, \alpha_t \sim DP(\alpha_t, G_{0,t}), \tag{6} $$
$$ G_{0,t} \equiv IG(v_{0,t}, s_{0,t}), \tag{7} $$

where the base measure is the inverse-gamma distribution, denoted $IG(v, s)$, which has mean $s/(v-1)$ for $v > 1$. The return mean $\mu_t$ is assumed to be constant over $i$. The Dirichlet process was formally introduced by Ferguson (1973) and is a distribution over distributions. Note that each day has its own independent DPM model, indexed by $t$; we do not pursue pooling over the $G_t$. A draw from $DP(\alpha_t, G_{0,t})$ is an almost surely discrete distribution that is centered around the base distribution $G_{0,t}$, in the sense that $E[G_t(B)] = G_{0,t}(B)$ for any set $B$. The concentration parameter $\alpha_t > 0$ governs how closely a draw $G_t$ resembles $G_{0,t}$: larger values of $\alpha_t$ lead to $G_t$ being closer to $G_{0,t}$. Since the realizations $G_t$ are discrete, a sample from $\sigma_{t,i}^2 \mid G_t \sim G_t$ has a positive probability of repeated values. This has led to the use of DPMs for clustering problems. If $K_n$ is the number of distinct values in a sample of size $n$, then $E[K_n] \approx \alpha_t \log(1 + n/\alpha_t)$. Therefore, the number of distinct values grows logarithmically with sample size, and larger values of $\alpha_t$ tend to lead to more distinct values.
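The logarithmic growth of $K_n$ can be checked by simulation. Integrating out $G_t$ gives the Pólya urn (Chinese restaurant) scheme: with a continuous base measure, the $(i+1)$-th draw is a new value with probability $\alpha_t/(\alpha_t + i)$, so the exact expectation is $E[K_n] = \sum_{i=0}^{n-1} \alpha_t/(\alpha_t + i)$. The Python sketch below compares the simulated mean of $K_n$, this exact sum, and the approximation $\alpha_t\log(1 + n/\alpha_t)$ quoted above. The values $n = 78$ and $\alpha_t \in \{0.25, 1, 5\}$, the seed, and the function names are illustrative assumptions, not the article's settings. In these settings the approximation sits somewhat below the exact value, but it tracks how $K_n$ grows with $n$ and $\alpha_t$.

```python
import math
import random


def simulate_num_clusters(n, alpha, rng):
    """Number of distinct values K_n among n draws from G ~ DP(alpha, G0), G0 continuous.

    Uses the Polya urn (Chinese restaurant) scheme with G integrated out:
    draw i + 1 (i = 0, ..., n - 1) opens a new cluster with probability alpha / (alpha + i).
    """
    k = 0
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k


def expected_clusters_exact(n, alpha):
    """E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


def expected_clusters_approx(n, alpha):
    """The approximation alpha * log(1 + n / alpha) quoted in the text."""
    return alpha * math.log(1.0 + n / alpha)


rng = random.Random(1)
n = 78  # e.g. five-minute returns over a 6.5-hour trading day
for alpha in (0.25, 1.0, 5.0):
    sims = [simulate_num_clusters(n, alpha, rng) for _ in range(4000)]
    print(alpha, sum(sims) / len(sims),
          expected_clusters_exact(n, alpha), expected_clusters_approx(n, alpha))
```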
In fact, as $\alpha_t \to \infty$, $G_t \to G_{0,t}$, which implies that every $r_{t,i}$ has a unique $\sigma_{t,i}^2$ drawn from the centering distribution. In this model, the inverse-gamma distribution is used as the centering base measure because it is the standard conjugate choice and leads to relatively simple computational schemes for inference. In the case of $\alpha_t \to \infty$, there is no pooling, and we have a setting very close to the classical counterpart discussed above. However, for finite $\alpha_t$, pooling can take place. The other extreme is complete pooling as $\alpha_t \to 0$, in which one common variance is shared by all observations, so that $\sigma_{t,i}^2 = \sigma_{t,1}^2$ for all $i$. Since $\alpha_t$ plays an important role in pooling, we place a prior on it and estimate it along with the other model parameters for each day. A stick-breaking representation (Sethuraman, 1994) of the DPM in Equation (5) is given as follows.4

$$ p(r_{t,i} \mid \mu_t, \Psi_t, w_t) = \sum_{j=1}^{\infty} w_{t,j}\, N(r_{t,i} \mid \mu_t, \psi_{t,j}^2), \tag{8} $$
$$ w_{t,j} = v_{t,j} \prod_{l=1}^{j-1} (1 - v_{t,l}), \tag{9} $$
$$ v_{t,j} \overset{iid}{\sim} \text{Beta}(1, \alpha_t), \tag{10} $$

where $N(\cdot \mid \cdot, \cdot)$ denotes the density of the normal distribution, $\Psi_t = \{\psi_{t,1}^2, \psi_{t,2}^2, \ldots\}$ is the set of unique values of $\sigma_{t,i}^2$, $w_t = \{w_{t,1}, w_{t,2}, \ldots\}$, and $w_{t,j}$ is the weight associated with the $j$-th component. This formulation of the model facilitates posterior sampling, which is discussed in the next section.

Because our focus is on intraday returns, and the number of observations in a day can be small, especially at lower frequencies such as five minutes, the prior should be chosen carefully. It is straightforward to show that the prior predictive distribution of $\sigma_{t,i}^2$ is $G_{0,t}$. For $\sigma_{t,i}^2 \sim IG(v_{0,t}, s_{0,t})$, the mean and variance of $\sigma_{t,i}^2$ are

$$ E(\sigma_{t,i}^2) = \frac{s_{0,t}}{v_{0,t} - 1} \quad \text{and} \quad \operatorname{var}(\sigma_{t,i}^2) = \frac{s_{0,t}^2}{(v_{0,t} - 1)^2 (v_{0,t} - 2)}. \tag{11} $$

Solving these two equations gives

$$ v_{0,t} = \frac{[E(\sigma_{t,i}^2)]^2}{\operatorname{var}(\sigma_{t,i}^2)} + 2 \quad \text{and} \quad s_{0,t} = E(\sigma_{t,i}^2)(v_{0,t} - 1). \tag{12} $$

We use the sample statistics $\widehat{\operatorname{var}}(r_{t,i})$ and $\widehat{\operatorname{var}}(r_{t,i}^2)$, calculated from three days of intraday returns (days $t - 1$, $t$, and $t + 1$), to set the values of $E(\sigma_{t,i}^2)$ and $\operatorname{var}(\sigma_{t,i}^2)$, and then use Equation (12) to find $v_{0,t}$ and $s_{0,t}$. A shrinkage prior $N(0, v^2)$ is used for $\mu_t$, since $\mu_t$ is expected to be close to zero. The prior variance of $\mu_t$ is adjusted according to the data frequency: $v^2 = \zeta^2/n_t$, where $n_t$ is the number of intraday returns. Finally, $\alpha_t \sim \text{Gamma}(a, b)$. See Table 1 for the prior settings. For a finite dataset $i = 1, \ldots, n_t$, our target is the posterior moment

$$ E\big[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}\big] = E\Big[\sum_{i=1}^{n_t} \sigma_{t,i}^2 \,\Big|\, \{r_{t,i}\}_{i=1}^{n_t}\Big]. \tag{13} $$

Note that, assuming $\mu_t$ is small, the posterior mean of $V_t$ can also be considered as the posterior mean of RV, $RV_t = \sum_{i=1}^{n_t} r_{t,i}^2$. $RV_t$ treats each $\sigma_{t,i}^2$ separately and so corresponds to no pooling. Mykland and Zhang (2009) discuss the use of blocks of high-frequency data in volatility estimation, and our method can be seen as a generalization of their approach. We allow returns with the same variance to form groups flexibly and do not require the returns in one group to be consecutive in time. Another distinction is that our approach allows the group size to vary over clusters and to be determined endogenously, while Mykland and Zhang (2009) use one fixed block size for all blocks, preset by the econometrician. Furthermore, unlike standard blocking, the proposed method is invariant to permutations of the returns, since the DPM model assumes exchangeable data.

Table 1 Prior specifications of models

| Model | $\mu_t$ | $\sigma_{t,i}^2$ | $\Theta_t$ | $\alpha_t$ |
|---|---|---|---|---|
| DPM | $N(0, v^2)$ | $IG(v_{0,t}, s_{0,t})$ | – | Gamma(2, 8) |
| DPM-MA(q) | $N(0, v^2)$ | $IG(v_{0,t}, s_{0,t})$ | $N(0, I)1\{\lvert\Theta_t\rvert\}$ | Gamma(2, 8) |

Note: $v_{0,t}$ and $s_{0,t}$ are calculated using Equation (12); $1\{\lvert\Theta_t\rvert\}$ denotes the invertibility condition for the MA(q) model; $v^2 = \zeta^2/n_t$, where $\zeta^2 = 0.01$ and $n_t$ is the number of intraday returns.
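As a prior-predictive illustration of Section 1.2, the Python sketch below calibrates $IG(v_{0,t}, s_{0,t})$ by Equation (12), checks the result against the moments in Equation (11), and draws one realization of $G_t \sim DP(\alpha_t, G_{0,t})$ from the stick-breaking weights of Equations (9) and (10), truncated once the unallocated stick length falls below a tolerance. Sampling $\sigma_{t,i}^2$ from that realization, as in Equation (5), produces ties, and those ties are the clusters over which variances are pooled. The helper names, the truncation rule, the use of `statistics.variance` (an $n-1$ denominator) for $\widehat{\operatorname{var}}$, and the numbers in the usage lines are assumptions for illustration; posterior sampling in the article uses the slice sampler of Section 1.3, not this truncation.

```python
import random
import statistics


def ig_hyperparameters(mean_sigma2, var_sigma2):
    """Equation (12): IG(v0, s0) matching a target mean and variance of sigma^2."""
    v0 = mean_sigma2 ** 2 / var_sigma2 + 2.0
    s0 = mean_sigma2 * (v0 - 1.0)
    return v0, s0


def ig_moments(v0, s0):
    """Equation (11): mean and variance of IG(v0, s0); the variance needs v0 > 2."""
    return s0 / (v0 - 1.0), s0 ** 2 / ((v0 - 1.0) ** 2 * (v0 - 2.0))


def calibrate_from_returns(pooled_returns):
    """Set E(sigma^2) from var-hat(r) and var(sigma^2) from var-hat(r^2), as the text
    describes; pooled_returns would hold the intraday returns of days t-1, t, t+1."""
    mean_sigma2 = statistics.variance(pooled_returns)
    var_sigma2 = statistics.variance([r * r for r in pooled_returns])
    return ig_hyperparameters(mean_sigma2, var_sigma2)


def stick_breaking_draw(alpha, v0, s0, rng, tol=1e-10):
    """Truncated draw of G ~ DP(alpha, IG(v0, s0)): weights w_j and atoms psi_j^2."""
    weights, atoms = [], []
    remaining = 1.0  # prod_{l<j} (1 - v_l), the unallocated stick length
    while remaining > tol:
        v = rng.betavariate(1.0, alpha)   # Equation (10)
        weights.append(v * remaining)     # Equation (9)
        remaining *= 1.0 - v
        # An IG(v0, s0) draw: s0 / Gamma(shape=v0, scale=1) has mean s0 / (v0 - 1).
        atoms.append(s0 / rng.gammavariate(v0, 1.0))
    return weights, atoms


rng = random.Random(3)
v0, s0 = ig_hyperparameters(2.0, 8.0)                  # hypothetical target moments
weights, atoms = stick_breaking_draw(1.0, v0, s0, rng)
sigma2 = rng.choices(atoms, weights=weights, k=78)     # Equation (5), 78 returns
print(v0, s0, len(atoms), len(set(sigma2)))           # ties in sigma2 = pooled clusters
```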
1.3 Model Estimation

Estimation relies on Markov chain Monte Carlo (MCMC) techniques. We apply the slice sampler of Kalli, Griffin, and Walker (2011), along with Gibbs sampling, to estimate the DPM model. The slice sampler provides an elegant way to deal with the infinite number of states in Equation (8). It introduces auxiliary variables $u_{t,1:n_t} = \{u_{t,1}, \ldots, u_{t,n_t}\}$ that randomly truncate the state space to a finite set at each MCMC iteration but marginally deliver draws from the desired posterior. The joint distribution of $r_{t,i}$ and the auxiliary variable $u_{t,i}$ is given by

$$ f(r_{t,i}, u_{t,i} \mid w_t, \mu_t, \Psi_t) = \sum_{j=1}^{\infty} 1(u_{t,i} < w_{t,j})\, N(r_{t,i} \mid \mu_t, \psi_{t,j}^2), \tag{14} $$

and integrating out $u_{t,i}$ recovers Equation (8). It is convenient to rewrite the model in terms of a latent state variable $s_{t,i} \in \{1, 2, \ldots\}$ that maps each observation to its associated component and parameter, $\sigma_{t,i}^2 = \psi_{t,s_{t,i}}^2$. Observations with a common state share the same variance parameter. For a finite dataset, the number of states (clusters) is finite, and the states are labeled $1, \ldots, K$. Note that the number of clusters $K$ is not fixed over the MCMC iterations: a new cluster with variance $\psi_{t,K+1}^2 \sim G_{0,t}$ can be created if the existing clusters do not fit an observation well, and clusters sharing a similar variance can be merged into one. The joint posterior is proportional to

$$ p(\mu_t) \prod_{j=1}^{K} \big[p(\psi_{t,j}^2)\big]\, p(\alpha_t) \prod_{i=1}^{n_t} 1(u_{t,i} < w_{t,s_{t,i}})\, N(r_{t,i} \mid \mu_t, \psi_{t,s_{t,i}}^2). \tag{15} $$

Each MCMC iteration contains the following sampling steps.

1. $\pi(\mu_t \mid r_{t,1:n_t}, \{\psi_{t,j}^2\}_{j=1}^K, s_{t,1:n_t}) \propto p(\mu_t) \prod_{i=1}^{n_t} p(r_{t,i} \mid \mu_t, \psi_{t,s_{t,i}}^2)$.
2. $\pi(\psi_{t,j}^2 \mid r_{t,1:n_t}, s_{t,1:n_t}, \mu_t) \propto p(\psi_{t,j}^2) \prod_{i: s_{t,i} = j} p(r_{t,i} \mid \mu_t, \psi_{t,j}^2)$ for $j = 1, \ldots, K$.
3. $\pi(v_{t,j} \mid s_{t,1:n_t}) \propto \text{Beta}(v_{t,j} \mid a_{t,j}, b_{t,j})$ with $a_{t,j} = 1 + \sum_{i=1}^{n_t} 1(s_{t,i} = j)$ and $b_{t,j} = \alpha_t + \sum_{i=1}^{n_t} 1(s_{t,i} > j)$; then update $w_{t,j} = v_{t,j} \prod_{l<j} (1 - v_{t,l})$ for $j = 1, \ldots, K$.
4. $\pi(u_{t,i} \mid w_t, s_{t,1:n_t}) \propto 1(0 < u_{t,i} < w_{t,s_{t,i}})$ for $i = 1, \ldots, n_t$.
5. Find the smallest $K$ such that $\sum_{j=1}^{K} w_{t,j} > 1 - \min(u_{t,1:n_t})$.
6. $\pi(s_{t,i} = j \mid r_{t,1:n_t}, \mu_t, \{\psi_{t,j}^2\}_{j=1}^K, u_{t,1:n_t}, K) \propto 1(u_{t,i} < w_{t,j})\, p(r_{t,i} \mid \mu_t, \psi_{t,j}^2)$ for $j = 1, \ldots, K$ and $i = 1, \ldots, n_t$.
7. $\pi(\alpha_t \mid K) \propto p(\alpha_t)\, p(K \mid \alpha_t)$.

In Step 1, $\mu_t$ is common to all returns, and this is a standard Gibbs step given the conjugate prior. Step 2 is a standard Gibbs step for each variance parameter $\psi_{t,j}^2$ based on the data assigned to cluster $j$. The remaining steps are standard for slice sampling of DPM models. In Step 7, $\alpha_t$ is sampled following Escobar and West (1994). Steps 1–7 give one iteration of the posterior sampler. After discarding a suitable burn-in, $M$ additional samples $\{\theta^{(m)}\}_{m=1}^M$ are collected, where $\theta = \{\mu_t, \psi_{t,1}^2, \ldots, \psi_{t,K}^2, s_{t,1:n_t}, \alpha_t\}$. Posterior moments of interest can be estimated from sample averages of the MCMC output.

1.4 Ex Post Variance Estimator

Conditional on the parameter vector $\theta$, the estimate of $V_t$ is

$$ E[V_t \mid \theta] = \sum_{i=1}^{n_t} \psi_{t,s_{t,i}}^2. \tag{16} $$

The posterior mean of $V_t$ is obtained by integrating out all parameters and distributional uncertainty. $E[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}]$ is estimated as

$$ \hat V_t = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{n_t} \sigma_{t,i}^{2(m)}, \tag{17} $$

where $\sigma_{t,i}^{2(m)} = \psi_{t,s_{t,i}^{(m)}}^{2(m)}$. Other features of the posterior distribution of $V_t$ can be obtained similarly. For instance, a $(1 - \alpha)$ probability density interval for $V_t$ is given by the quantiles of $\sum_{i=1}^{n_t} \sigma_{t,i}^{2(m)}$ associated with probabilities $\alpha/2$ and $1 - \alpha/2$. Conditional on the model and prior, these are exact finite sample estimates, in contrast to the classical estimator, which relies on infill asymptotics to derive confidence intervals. If $\log(V_t)$ is the quantity of interest, the estimator of $E[\log(V_t) \mid \{r_{t,i}\}_{i=1}^{n_t}]$ is

$$ \widehat{\log(V_t)} = \frac{1}{M} \sum_{m=1}^{M} \log\Big(\sum_{i=1}^{n_t} \sigma_{t,i}^{2(m)}\Big). \tag{18} $$

As before, posterior quantiles of $\log(V_t)$ can be estimated from the MCMC output.

2 Bayesian Estimator under Microstructure Error

An early approach to dealing with market microstructure noise was to prefilter returns with a time series model (Andersen et al., 2001; Bollen and Inder, 2002; Maheu and McCurdy, 2002). Hansen, Large, and Lunde (2008) show that prefiltering results in a bias in RV that can easily be corrected. We apply these insights to moving average specifications to account for noisy high-frequency returns. A significant difference is that we allow for heteroskedasticity in the noise process.

2.1 DPM-MA(1) Model

Microstructure noise turns the intraday return process into an autocorrelated process. First, consider the case in which the error is white noise:

$$ \tilde p_{t,i} = p_{t,i} + \epsilon_{t,i}, \qquad \epsilon_{t,i} \sim N(0, \omega_{t,i}^2), \tag{19} $$

where $\tilde p_{t,i}$ denotes the observed log-price with error, $p_{t,i}$ is the unobserved fundamental log-price, and $\omega_{t,i}^2$ is the heteroskedastic noise variance. Given this structure, it can be shown that the return series $\tilde r_{t,i} = \tilde p_{t,i} - \tilde p_{t,i-1}$ has nonzero first-order autocorrelation but zero higher-order autocorrelation. That is, $\operatorname{cov}(\tilde r_{t,i+1}, \tilde r_{t,i}) = -\omega_{t,i}^2$ and $\operatorname{cov}(\tilde r_{t,i+j}, \tilde r_{t,i}) = 0$ for $j \geq 2$. This suggests a moving average model of order one (MA(1)).5 Combining the MA(1) parameterization with our Bayesian nonparametric framework yields the DPM-MA(1) model:

$$ \tilde r_{t,i} = \mu_t + \theta_t \eta_{t,i-1} + \eta_{t,i}, \qquad \eta_{t,i} \sim N(0, \delta_{t,i}^2), \tag{20} $$
$$ \delta_{t,i}^2 \mid G_t \sim G_t, \tag{21} $$
$$ G_t \mid G_{0,t}, \alpha_t \sim DP(\alpha_t, G_{0,t}), \tag{22} $$
$$ G_{0,t} \equiv IG(v_{0,t}, s_{0,t}). \tag{23} $$

The innovations $\eta_{t,i}$ are heteroskedastic. Note that the conditional mean of $\tilde r_{t,i}$ now contains a moving average term in addition to the constant $\mu_t$. The MA parameter $\theta_t$ is constant over $i$ but changes with the day $t$. The prior is $\theta_t \sim N(m_\theta, v_\theta^2)1\{|\theta_t| < 1\}$, which makes the MA model invertible. The initial error $\eta_{t,0}$ is set to zero. Other model settings are the same as for the DPM model of Section 1.
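The MA(1) structure implied by Equation (19) can be verified numerically. The Python sketch below uses homoskedastic efficient returns and noise (a simplifying assumption; the article allows $\omega_{t,i}^2$ and $\delta_{t,i}^2$ to vary), estimates the first two autocovariances of the observed returns, and fits an invertible MA(1) by matching $\gamma_0$ and $\gamma_1$, a method-of-moments stand-in for the article's MCMC. It also evaluates $(1+\theta)^2\sigma_\eta^2$, the scaling that appears in Section 2.4: for an MA(1) this equals $\gamma_0 + 2\gamma_1$, which cancels the noise contribution $2\omega^2 - 2\omega^2$ and leaves the variance of the efficient return. The parameter values, seed, and function names are illustrative assumptions.

```python
import math
import random


def noisy_returns(n, sigma2_e, omega2, seed=0):
    """Observed returns from p~_i = p_i + eps_i (Equation (19)), with efficient
    returns e_i ~ N(0, sigma2_e) and noise eps_i ~ N(0, omega2), all independent."""
    rng = random.Random(seed)
    eps_prev = rng.gauss(0.0, math.sqrt(omega2))
    out = []
    for _ in range(n):
        eps = rng.gauss(0.0, math.sqrt(omega2))
        out.append(rng.gauss(0.0, math.sqrt(sigma2_e)) + eps - eps_prev)
        eps_prev = eps
    return out


def autocov(x, lag):
    """Sample autocovariance at the given lag (divisor len(x))."""
    n = len(x)
    m = sum(x) / n
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / n


def ma1_moment_fit(x):
    """Invertible MA(1) fit x_i = eta_i + theta * eta_{i-1} matching gamma0 and gamma1.

    Requires 0 < |rho1| < 0.5, where rho1 = gamma1 / gamma0."""
    g0, g1 = autocov(x, 0), autocov(x, 1)
    rho1 = g1 / g0
    # Root of rho1 * theta^2 - theta + rho1 = 0 with |theta| < 1.
    theta = (1.0 - math.sqrt(1.0 - 4.0 * rho1 * rho1)) / (2.0 * rho1)
    sigma2_eta = g0 / (1.0 + theta * theta)
    return theta, sigma2_eta


r = noisy_returns(200_000, sigma2_e=1.0, omega2=0.5, seed=42)
theta, sigma2_eta = ma1_moment_fit(r)
scaled = (1.0 + theta) ** 2 * sigma2_eta  # per-return analogue of the Equation (29) scaling
print(autocov(r, 1), autocov(r, 2), theta, scaled)  # roughly -0.5, 0, -0.27, 1.0
```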
Section 2.4 shows how estimates from the DPM-MA(1) specification can be used to recover an estimate of the ex post variance $V_t$ of the true return process.

2.2 DPM-MA(q) Model

For lower sampling frequencies, such as one minute or longer, first-order autocorrelation is the main effect of market microstructure noise. As such, the MA(1) model will be sufficient for many applications. However, at higher sampling frequencies, the dependence may be stronger. To allow for a more complex effect of the noise process on returns, consider MA(q − 1) noise:

$$ \tilde p_{t,i} = p_{t,i} + \epsilon_{t,i} - \rho_1 \epsilon_{t,i-1} - \cdots - \rho_{q-1} \epsilon_{t,i-q+1}, \qquad \epsilon_{t,i} \sim N(0, \omega_{t,i}^2). \tag{24} $$

For returns, this leads to the following DPM-MA(q) model:

$$ \tilde r_{t,i} = \mu_t + \sum_{j=1}^{q} \theta_{t,j} \eta_{t,i-j} + \eta_{t,i}, \qquad \eta_{t,i} \sim N(0, \delta_{t,i}^2), \tag{25} $$
$$ \delta_{t,i}^2 \mid G_t \sim G_t, \tag{26} $$
$$ G_t \mid G_{0,t}, \alpha_t \sim DP(\alpha_t, G_{0,t}), \tag{27} $$
$$ G_{0,t} \equiv IG(v_{0,t}, s_{0,t}). \tag{28} $$

The joint prior of $\Theta_t = (\theta_{t,1}, \ldots, \theta_{t,q})$ is $N(M_\Theta, V_\Theta)1\{\Theta_t\}$, where the indicator imposes invertibility,6 and the presample errors are set to $(\eta_{t,0}, \ldots, \eta_{t,-(q-1)}) = (0, \ldots, 0)$.

2.3 Model Estimation

We discuss estimation of the DPM-MA(1) model; the approach extends easily to the DPM-MA(q) model. The main difference from the DPM model is that the conditional mean parameters $\mu_t$ and $\theta_t$ require a Metropolis–Hastings (MH) step to sample from their conditional posteriors. The remaining MCMC steps are essentially the same. As before, let $\psi_{t,j}^2$ denote the unique values of $\delta_{t,i}^2$; each MCMC iteration then samples from the following conditional distributions.

1. $\pi(\mu_t \mid \tilde r_{t,1:n_t}, \{\psi_{t,j}^2\}_{j=1}^K, \theta_t, s_{t,1:n_t}) \propto p(\mu_t) \prod_{i=1}^{n_t} N(\tilde r_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi_{t,s_{t,i}}^2)$.
2. $\pi(\theta_t \mid \tilde r_{t,1:n_t}, \mu_t, \{\psi_{t,j}^2\}_{j=1}^K, s_{t,1:n_t}) \propto p(\theta_t) \prod_{i=1}^{n_t} N(\tilde r_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi_{t,s_{t,i}}^2)$.
3. $\pi(\psi_{t,j}^2 \mid \tilde r_{t,1:n_t}, \mu_t, \theta_t, s_{t,1:n_t}) \propto p(\psi_{t,j}^2) \prod_{i: s_{t,i} = j} N(\tilde r_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi_{t,j}^2)$ for $j = 1, \ldots, K$.
4. $\pi(v_{t,j} \mid s_{t,1:n_t}) \propto \text{Beta}(v_{t,j} \mid a_{t,j}, b_{t,j})$ with $a_{t,j} = 1 + \sum_{i=1}^{n_t} 1(s_{t,i} = j)$ and $b_{t,j} = \alpha_t + \sum_{i=1}^{n_t} 1(s_{t,i} > j)$; then update $w_{t,j} = v_{t,j} \prod_{l<j} (1 - v_{t,l})$ for $j = 1, \ldots, K$.
5. $\pi(u_{t,i} \mid w_t, s_{t,1:n_t}) \propto 1(0 < u_{t,i} < w_{t,s_{t,i}})$ for $i = 1, \ldots, n_t$.
6. Find the smallest $K$ such that $\sum_{j=1}^{K} w_{t,j} > 1 - \min(u_{t,1:n_t})$.
7. $\pi(s_{t,i} = j \mid \tilde r_{t,1:n_t}, \mu_t, \theta_t, \{\psi_{t,j}^2\}_{j=1}^K, u_{t,1:n_t}, K) \propto 1(u_{t,i} < w_{t,j})\, N(\tilde r_{t,i} \mid \mu_t + \theta_t \eta_{t,i-1}, \psi_{t,j}^2)$ for $j = 1, \ldots, K$ and $i = 1, \ldots, n_t$.
8. $\pi(\alpha_t \mid K) \propto p(\alpha_t)\, p(K \mid \alpha_t)$.

In Steps 1 and 2, the likelihood requires the sequential calculation of the lagged error, $\eta_{t,i-1} = \tilde r_{t,i-1} - \mu_t - \theta_t \eta_{t,i-2}$, which precludes a Gibbs sampling step. Therefore, $\mu_t$ and $\theta_t$ are sampled using an MH step with a random walk proposal, calibrated to achieve an acceptance rate between 0.3 and 0.5.

2.4 Ex Post Variance Estimator under Microstructure Error

Hansen, Large, and Lunde (2008) showed that prefiltering with an MA model results in a bias in the RV estimator.7 In the Appendix, it is shown that the Hansen, Large, and Lunde (2008) bias correction provides an accurate adjustment to our Bayesian estimator in the context of heteroskedastic noise. From the DPM-MA(1) model, the posterior mean of $V_t$ under independent microstructure error is

$$ \hat V_{t,MA(1)} = \frac{1}{M} \sum_{m=1}^{M} \big(1 + \theta_t^{(m)}\big)^2 \sum_{i=1}^{n_t} \delta_{t,i}^{2(m)}, \tag{29} $$

where $\delta_{t,i}^{2(m)} = \psi_{t,s_{t,i}^{(m)}}^{2(m)}$. The log and square root of $V_t$, as well as density intervals, can be estimated in the same way as for the Bayesian nonparametric ex post variance estimator without microstructure error. In the case of higher-order autocorrelation, the adjusted posterior estimate of $V_t$ from the DPM-MA(q) model is

$$ \hat V_{t,MA(q)} = \frac{1}{M} \sum_{m=1}^{M} \Big(1 + \sum_{j=1}^{q} \theta_{t,j}^{(m)}\Big)^2 \sum_{i=1}^{n_t} \delta_{t,i}^{2(m)}. \tag{30} $$

Simulation evidence on these estimators is presented in Section 4.

3 Consistency

Each of the estimators (posterior means) of the integrated volatility in Equation (3) discussed above can be shown fairly easily to be consistent as the sampling frequency increases. The posterior mean of $V_t$ can be shown to equal a consistent estimator plus a bias term that goes to zero in probability as $n_t \to \infty$. We provide the proof for the case with no market microstructure noise.

Theorem 1. Suppose that $p$ is an arbitrage-free price process with zero mean, that $\sup_j (\tau_{t,j+1} - \tau_{t,j}) \to 0$ as $n_t \to \infty$, and that $s_{0,n_t} = O(n_t^{-\alpha})$ for some $\alpha > 0$. Then $E[V_t \mid \{r_{t,i}\}_{i=1}^{n_t}]$ is a consistent estimator of the integrated volatility.

See the Appendix for the proof.
A similar argument can be used for the MA processes, with the residuals from the MA process replacing the returns. Hansen, Large, and Lunde (2008) argue that scaling $RV_t$ avoids the inconsistency of realized volatility under market microstructure noise.

4 Simulation Results

4.1 Data Generating Process

We consider four DGPs commonly used in the literature. The first is the GARCH(1,1) diffusion introduced by Andersen and Bollerslev (1998). The log-price follows

$$ dp(t) = \mu\,dt + \sigma(t)\,dW_p(t), \tag{31} $$
$$ d\sigma^2(t) = \alpha\big(\beta - \sigma^2(t)\big)\,dt + \gamma\,\sigma^2(t)\,dW_\sigma(t), \tag{32} $$

where $W_p(t)$ and $W_\sigma(t)$ are two independent Wiener processes. The parameter values follow Andersen and Bollerslev (1998): $\mu = 0.03$, $\alpha = 0.035$, $\beta = 0.636$, and $\gamma = 0.144$, which were estimated using foreign exchange data. Following Huang and Tauchen (2005), the second and third DGPs are a one-factor stochastic volatility diffusion (SV1F) and a one-factor stochastic volatility diffusion with jumps (SV1FJ). SV1F is given by

$$ dp(t) = \mu\,dt + \exp\big(\beta_0 + \beta_1 v(t)\big)\,dW_p(t), \tag{33} $$
$$ dv(t) = \alpha v(t)\,dt + dW_v(t), \tag{34} $$

and the price process for SV1FJ is

$$ dp(t) = \mu\,dt + \exp\big(\beta_0 + \beta_1 v(t)\big)\,dW_p(t) + dJ(t), \tag{35} $$

where $\operatorname{corr}(dW_p(t), dW_v(t)) = \rho$, and $J(t)$ is a compound Poisson process with jump intensity $\lambda$ and jump size $\delta \sim N(0, \sigma_J^2)$. We adopt the parameter settings of Huang and Tauchen (2005) and set $\mu = 0.03$, $\beta_0 = 0.0$, $\beta_1 = 0.125$, $\alpha = -0.1$, $\rho = -0.62$, $\lambda = 0.014$, and $\sigma_J^2 = 0.5$. The final DGP is the two-factor stochastic volatility diffusion (SV2F) of Chernov et al. (2003) and Huang and Tauchen (2005):8

$$ dp(t) = \mu\,dt + \text{s-exp}\big(\beta_0 + \beta_1 v_1(t) + \beta_2 v_2(t)\big)\,dW_p(t), \tag{36} $$
$$ dv_1(t) = \alpha_1 v_1(t)\,dt + dW_{v_1}(t), \tag{37} $$
$$ dv_2(t) = \alpha_2 v_2(t)\,dt + \big(1 + \psi v_2(t)\big)\,dW_{v_2}(t), \tag{38} $$

where $\operatorname{corr}(dW_p(t), dW_{v_1}(t)) = \rho_1$ and $\operatorname{corr}(dW_p(t), dW_{v_2}(t)) = \rho_2$. The parameter values in SV2F, taken from Huang and Tauchen (2005), are $\mu = 0.03$, $\beta_0 = -1.2$, $\beta_1 = 0.04$, $\beta_2 = 1.5$, $\alpha_1 = -0.00137$, $\alpha_2 = -1.386$, $\psi = 0.25$, and $\rho_1 = \rho_2 = -0.3$. Data are simulated using a basic Euler discretization at one-second frequency for all four DGPs.
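The Euler scheme can be sketched for the GARCH(1,1) diffusion of Equations (31) and (32). The Python code below measures time in trading days, so each one-second step has $\Delta = 1/23{,}400$; this time scaling, the seed, the function names, and the floor that keeps the Euler variance update positive are assumptions rather than details stated in the text. The function returns the one-second log-price path together with the end-of-day variance, which can initialize the following day's simulation, and `realized_variance` sums squared returns sampled every `step` seconds (every 300 seconds gives the 78 five-minute returns of a 6.5-hour day).

```python
import math
import random


def simulate_garch_diffusion_day(sigma2_0, n_steps=23_400, mu=0.03, alpha=0.035,
                                 beta=0.636, gamma=0.144, rng=None):
    """Euler discretization of Equations (31)-(32) over one trading day.

    Time is measured in days, so dt = 1 / n_steps (one second when n_steps = 23,400).
    Returns the log-price path (n_steps + 1 points) and the end-of-day variance.
    """
    rng = rng if rng is not None else random.Random(0)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    sigma2 = sigma2_0
    log_p = [0.0]
    for _ in range(n_steps):
        dw_p = rng.gauss(0.0, 1.0) * sqrt_dt  # independent Wiener increments
        dw_s = rng.gauss(0.0, 1.0) * sqrt_dt
        log_p.append(log_p[-1] + mu * dt + math.sqrt(sigma2) * dw_p)
        sigma2 += alpha * (beta - sigma2) * dt + gamma * sigma2 * dw_s
        sigma2 = max(sigma2, 1e-12)           # guard against a negative Euler step
    return log_p, sigma2


def realized_variance(log_p, step):
    """Sum of squared returns computed from every `step`-th point of the path."""
    grid = log_p[::step]
    return sum((b - a) ** 2 for a, b in zip(grid, grid[1:]))


log_p, sigma2_end = simulate_garch_diffusion_day(sigma2_0=0.636, rng=random.Random(5))
qv_1s = realized_variance(log_p, 1)      # one-second sum of squared returns
rv_5min = realized_variance(log_p, 300)  # 78 five-minute returns
print(qv_1s, rv_5min, sigma2_end)
```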
Assuming a daily trading time of 6.5 hours (23,400 seconds), we first simulate the log-price every second. We then compute 5-minute, 1-minute, 30-second, and 10-second intraday returns by differencing the log-price every 300, 60, 30, and 10 steps, respectively. The initial volatility level on day $t$ (such as $v_1$ and $v_2$ in SV2F) is set equal to the last volatility value on the previous day, $t - 1$. $T = 5000$ days of intraday returns are simulated from each of the four DGPs and used to report sampling properties of the volatility estimators. In each case, 500 initial days are dropped from the simulation to remove dependence on the start-up conditions.

4.1.1 Independent noise

Following Barndorff-Nielsen et al. (2008), log-prices with independent noise are simulated as

$$ \tilde p_{t,i} = p_{t,i} + \epsilon_{t,i}, \qquad \epsilon_{t,i} \sim N(0, \sigma_\omega^2), \qquad \sigma_\omega^2 = \xi^2 \operatorname{var}(r_t). \tag{39} $$

The error term is added every second to the log-prices simulated from the four DGPs. The variance of the microstructure error is proportional to the daily variance calculated using the noise-free daily returns. We set the noise-to-signal ratio $\xi^2 = 0.001$, the same value used in Barndorff-Nielsen et al. (2008) and close to the value in Bandi and Russell (2008).

4.1.2 Dependent noise

Following Hansen, Large, and Lunde (2008), we simulate log-prices with dependent noise as

$$ \tilde p_{t,i} = p_{t,i} + \epsilon_{t,i}, \qquad \epsilon_{t,i} \sim N(\mu_{\epsilon_{t,i}}, \sigma_\omega^2), \qquad \mu_{\epsilon_{t,i}} = \sum_{l=1}^{\varphi} \Big(1 - \frac{l}{\varphi}\Big)\big(p_{t,i-l} - p_{t,i-1-l}\big), \qquad \sigma_\omega^2 = \xi^2 \operatorname{var}(r_t), \tag{40} $$

where $\varphi = 20$, which makes the error term correlated with returns over the past 20 seconds (steps). If past returns are positive (negative), the noise term tends to be positive (negative). All other settings, such as $\sigma_\omega^2$ and $\xi^2$, are the same as in the independent error case.

4.2 True Volatility and Comparison Criteria

The RV of Andersen et al. (2003) and Barndorff-Nielsen and Shephard (2002), the flat-top realized kernel ($RK^F$) of Barndorff-Nielsen et al. (2008), and the non-negative realized kernel ($RK^N$) of Barndorff-Nielsen et al. (2011) serve as the benchmarks for comparison. Section A.1 of the Appendix provides a brief review of these estimators. We assess the ability of several ex post variance estimators to estimate the daily quadratic variation ($QV_t$) of the four DGPs. $QV_t$ is computed as the sum of the squared noise-free intraday returns at the highest (one-second) frequency:

$$ \sigma_t^2 \equiv \sum_{i=1}^{23400} r_{t,i}^2. \tag{41} $$

The competing ex post daily variance estimators, generically labeled $\hat\sigma_t^2$, are compared on the basis of root mean squared error (RMSE) and bias, defined as

$$ \text{RMSE}(\hat\sigma_t^2) = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \big(\hat\sigma_t^2 - \sigma_t^2\big)^2}, \tag{42} $$
$$ \text{Bias}(\hat\sigma_t^2) = \frac{1}{T} \sum_{t=1}^{T} \big(\hat\sigma_t^2 - \sigma_t^2\big). \tag{43} $$

Coverage probabilities report the frequency with which the confidence intervals, or the density intervals of the Bayesian nonparametric estimators, contain the true ex post variance $\sigma_t^2$. The 95% confidence intervals of $RV_t$, $RK_t^F$, and $RK_t^N$ rely on their asymptotic distributions, which are given in Equations (49), (52), and (56). For $RK_t^N$, the bias is taken into account when computing the 95% confidence interval. The estimation of integrated quarticity is crucial in determining the confidence intervals for the realized kernels. We consider two versions of quarticity. The first uses the true (infeasible) $IQ_t$, calculated as

$$ IQ_t^{true} = 23400 \sum_{i=1}^{23400} \sigma_{t,i}^4, \tag{44} $$

where $\sigma_{t,i}^2$ refers to the spot variance simulated at the highest frequency. The second estimates $IQ_t$ using the tri-power quarticity ($TPQ_t$) estimator; see Equation (54). The confidence interval based on $IQ_t^{true}$ is the infeasible case, and the confidence interval calculated using $TPQ_t$ is the feasible case. For each day, 5000 MCMC draws are collected after a burn-in of 1000 draws to compute the Bayesian posterior quantities. A 0.95 density interval is given by the 0.025 and 0.975 sample quantiles of the MCMC draws of $\sum_{i=1}^{n_t} \sigma_{t,i}^2$.

4.3 No Microstructure Noise

In Table 2, $\hat V_t$ has a slightly smaller RMSE in twelve out of sixteen categories. A paired t-test shows that most of these differences in MSE are significant as well.
For example, for the five-minute SV2F data, $\hat V_t$ reduces the RMSE by over 5%. This is remarkable given that $RV_t$ is the gold standard in the no-noise setting. Bias and coverage probabilities (not displayed) for the 95% confidence intervals of $RV_t$ and the 0.95 density intervals of $\hat V_t$ show both estimators to perform well. Under no microstructure noise, the Bayesian nonparametric estimator is competitive with its classical counterpart $RV_t$. $\hat V_t$ offers smaller estimation error and better finite sample results than $RV_t$ when the data frequency is low. The performance of both $RV_t$ and $\hat V_t$ improves as the sampling frequency increases.

Table 2 RMSE of $RV_t$, blocked $RV_t$, and $\hat V_t$ (no microstructure noise case)

| Data freq. | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| 5-minute | $RV_t$ | 0.12352 | 0.21226 | 0.21471 | 0.45601 |
| | $RV_{t,M=26}^{block}$ | 0.12485 | 0.21588 | 0.21680 | 0.44368 |
| | $\hat V_t$ | 0.11866*** | 0.20116*** | 0.20659*** | 0.43095** |
| 1-minute | $RV_t$ | 0.05368 | 0.09283 | 0.09771 | 0.23296 |
| | $RV_{t,M=78}^{block}$ | 0.05397 | 0.09335 | 0.09875 | 0.23120 |
| | $\hat V_t$ | 0.05323*** | 0.09190*** | 0.10051 | 0.22802* |
| 30-second | $RV_t$ | 0.03886 | 0.06530 | 0.06741* | 0.14178 |
| | $RV_{t,M=130}^{block}$ | 0.03906 | 0.06539 | 0.06771 | 0.14184 |
| | $\hat V_t$ | 0.03867*** | 0.06495*** | 0.07276 | 0.13970 |
| 10-second | $RV_t$ | 0.02177 | 0.03601 | 0.03662* | 0.09535 |
| | $RV_{t,M=260}^{block}$ | 0.02176 | 0.03601 | 0.03677 | 0.09587 |
| | $\hat V_t$ | 0.02171** | 0.03589*** | 0.04722 | 0.09596 |

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$, blocked $RV_t$, and the Bayesian nonparametric estimator $\hat V_t$ under different frequencies and DGPs. Microstructure noise is not considered. A paired t-test is used to test whether the difference in the means of $(RV_t - V_t)^2$ and $(\hat V_t - V_t)^2$ is equal to zero. Bold entries denote the smallest values. Significance at *p < 0.05; **p < 0.01; ***p < 0.001.

The comparison between the Bayesian nonparametric estimator and the blocked RV of Mykland and Zhang (2009) is also considered.9 Table 2 reports the RMSE of blocked RV with block size $n_t^{3/4}$. The RMSE of $\hat V_t$ remains the lowest in twelve out of sixteen cases. A robustness analysis is conducted to check how sensitive the results are to the choice of priors. We consider different hyperparameters for the priors of $\mu_t$ and $\alpha_t$, as well as a calibration of the prior of $\sigma_{t,i}^2$ (Equation (12)) based on only one day of data. Table 3 summarizes the RMSE of $\hat V_t$ under alternative priors for SV2F data (see the entries $v_{0,t}^{1d}, s_{0,t}^{1d}$ for the one-day prior calibration). No result changes by more than 1% under the new priors, and $\hat V_t$ consistently outperforms $RV_t$ in the 5-minute, 1-minute, and 30-second categories.
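The posterior summaries and comparison criteria used in this section can be sketched as follows. Given MCMC draws $V_t^{(m)} = \sum_i \sigma_{t,i}^{2(m)}$ for one day, the Python code computes the posterior mean of Equation (17), the log-scale estimate of Equation (18), and an equal-tailed 0.95 density interval from sample quantiles. Across days it computes the RMSE and bias of Equations (42) and (43), the coverage frequency of the intervals, and a paired t-statistic for the difference in squared errors of two estimators, as in the notes to Table 2. The function names, the linear-interpolation quantile rule, and the large-sample normal approximation used for the p-value are assumptions; the article does not specify them.

```python
import math
import statistics


def empirical_quantile(xs, q):
    """Sample quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def posterior_summary(v_draws, level=0.95):
    """v_draws[m] = sum_i sigma_{t,i}^2 at MCMC draw m (after burn-in)."""
    v_hat = statistics.fmean(v_draws)                             # Equation (17)
    log_v_hat = statistics.fmean([math.log(v) for v in v_draws])  # Equation (18)
    tail = (1.0 - level) / 2.0
    interval = (empirical_quantile(v_draws, tail), empirical_quantile(v_draws, 1.0 - tail))
    return v_hat, log_v_hat, interval


def rmse(est, truth):
    """Equation (42)."""
    return math.sqrt(statistics.fmean([(e - s) ** 2 for e, s in zip(est, truth)]))


def bias(est, truth):
    """Equation (43)."""
    return statistics.fmean([e - s for e, s in zip(est, truth)])


def coverage(intervals, truth):
    """Fraction of days whose interval (lower, upper) contains the true variance."""
    return statistics.fmean([1.0 if lo <= s <= hi else 0.0
                             for (lo, hi), s in zip(intervals, truth)])


def paired_t(est_a, est_b, truth):
    """t-statistic and two-sided normal-approximation p-value for
    H0: E[(a - truth)^2 - (b - truth)^2] = 0."""
    d = [(a - s) ** 2 - (b - s) ** 2 for a, b, s in zip(est_a, est_b, truth)]
    t = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p


# Usage with hypothetical numbers: five posterior draws of V_t for one day.
print(posterior_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```

With $T = 5000$ days the normal approximation to the t distribution is very close, which is why the sketch avoids a dependency on a statistics package.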
Table 3 Prior robustness check

| Estimator | Prior of $\mu_t$ | Prior of $\alpha_t$ | Prior of $\sigma_{t,i}^2$ | RMSE |
|---|---|---|---|---|
| Panel A: 5-minute return | | | | |
| $RV_t$ | – | – | – | 0.45601 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.43095 |
| $\hat{V}_t$ | N(0,0.05nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.42919 |
| $\hat{V}_t$ | N(0,0.002nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.42953 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(2,8) | IG(v0,t3d,s0,t3d) | 0.42975 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(8,8) | IG(v0,t3d,s0,t3d) | 0.42782 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t1d,s0,t1d) | 0.43199 |
| Panel B: 1-minute return | | | | |
| $RV_t$ | – | – | – | 0.23296 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.22802 |
| $\hat{V}_t$ | N(0,0.05nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.22739 |
| $\hat{V}_t$ | N(0,0.002nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.22686 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(2,8) | IG(v0,t3d,s0,t3d) | 0.22691 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(8,8) | IG(v0,t3d,s0,t3d) | 0.22613 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t1d,s0,t1d) | 0.22695 |
| Panel C: 30-second return | | | | |
| $RV_t$ | – | – | – | 0.14178 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.13970 |
| $\hat{V}_t$ | N(0,0.05nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.14059 |
| $\hat{V}_t$ | N(0,0.002nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.14012 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(2,8) | IG(v0,t3d,s0,t3d) | 0.14096 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(8,8) | IG(v0,t3d,s0,t3d) | 0.14003 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t1d,s0,t1d) | 0.14029 |
| Panel D: 10-second return | | | | |
| $RV_t$ | – | – | – | 0.09535 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.09596 |
| $\hat{V}_t$ | N(0,0.05nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.09631 |
| $\hat{V}_t$ | N(0,0.002nt) | Gamma(4,8) | IG(v0,t3d,s0,t3d) | 0.09546 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(2,8) | IG(v0,t3d,s0,t3d) | 0.09610 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(8,8) | IG(v0,t3d,s0,t3d) | 0.09644 |
| $\hat{V}_t$ | N(0,0.01nt) | Gamma(4,8) | IG(v0,t1d,s0,t1d) | 0.09528 |

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$ and the Bayes nonparametric estimator $\hat{V}_t$ under different priors. v0,t3d denotes that three days of data are used to calibrate the prior parameters, while v0,t1d denotes that one day of data is used. The data are generated from SV2F. Microstructure noise is not considered.
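Table 3 varies the prior on $\alpha_t$. If $\alpha_t$ is read as the concentration parameter of the Dirichlet process in the DPM (an assumption here, as is reading Gamma(a,b) as shape a and rate b), the effect of these priors on pooling can be gauged from the standard result that $n$ draws from a Dirichlet process with concentration $\alpha$ occupy on average $E[K_n \mid \alpha] = \sum_{i=1}^{n} \alpha/(\alpha + i - 1)$ distinct clusters. The minimal Python sketch below evaluates this prior expectation; the function names and the choice n = 78 (five-minute returns over a 6.5-hour day) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def expected_clusters(alpha: float, n: int) -> float:
    """E[K_n | alpha] = sum_{i=1}^{n} alpha / (alpha + i - 1) for a Dirichlet process."""
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1)))


def prior_expected_clusters(shape: float, rate: float, n: int,
                            draws: int = 5_000, seed: int = 0) -> float:
    """Monte Carlo average of E[K_n | alpha] over alpha ~ Gamma(shape, rate).

    ASSUMPTION: Gamma(a, b) is read as shape a and rate b (mean a / b).
    """
    rng = np.random.default_rng(seed)
    alphas = rng.gamma(shape, 1.0 / rate, size=draws)  # NumPy takes scale = 1 / rate
    i = np.arange(1, n + 1)
    # One row per alpha draw; summing over observations gives E[K_n | alpha].
    per_draw = np.sum(alphas[:, None] / (alphas[:, None] + i - 1), axis=1)
    return float(per_draw.mean())


if __name__ == "__main__":
    n = 78  # hypothetical: five-minute returns over a 6.5-hour trading day
    for a, b in [(2, 8), (4, 8), (8, 8)]:  # the alpha priors varied in Table 3
        print(f"Gamma({a},{b}): E[K | alpha=a/b] = {expected_clusters(a / b, n):.2f}, "
              f"E[K] = {prior_expected_clusters(a, b, n):.2f}")
```

At $\alpha = 0.5$, the prior mean of Gamma(4,8) under the shape-rate reading, the formula gives about 3.2 clusters for n = 78 and about 4.0 for n = 390, so the prior by itself already favours a handful of pooled groups; the posterior cluster counts also depend on the data.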
We also check whether the benefits of pooling persist when returns are irregularly spaced. Following Barndorff-Nielsen et al. (2011), the arrival times of observed prices are simulated from a Poisson process. Table 4 shows that the Bayes nonparametric estimator $\hat{V}_t$ has a lower RMSE than $RV_t$ for this irregularly spaced DGP in 14 of the 16 cases; the exceptions are SV1FJ at λ = 30 and λ = 10.

Table 4 RMSE of $RV_t$ and $\hat{V}_t$ (irregularly spaced case)

| Data freq. | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| λ = 300 | $RV_t$ | 0.15890 | 0.34273 | 0.32518 | 0.70050 |
| | $\hat{V}_t$ | 0.14724 | 0.31481 | 0.30301 | 0.67717 |
| λ = 60 | $RV_t$ | 0.14903 | 0.15044 | 0.14741 | 0.63304 |
| | $\hat{V}_t$ | 0.14297 | 0.14611 | 0.14524 | 0.57660 |
| λ = 30 | $RV_t$ | 0.06989 | 0.09738 | 0.10187 | 0.27621 |
| | $\hat{V}_t$ | 0.06827 | 0.09604 | 0.10355 | 0.26985 |
| λ = 10 | $RV_t$ | 0.03399 | 0.06432 | 0.06575 | 0.23201 |
| | $\hat{V}_t$ | 0.03380 | 0.06392 | 0.06851 | 0.23070 |

Notes: We follow Barndorff-Nielsen et al. (2011) to simulate irregularly spaced prices. The arrival times of observations are simulated from a Poisson process. The parameter λ in the Poisson process governs the trading frequency of the simulated data; for example, λ = 30 means that transactions arrive every 30 seconds on average.

4.4 Independent and Dependent Microstructure Noise

In this section, we compare $RV_t$, $RK_t^F$, $\hat{V}_t$, and $\hat{V}_{t,MA(1)}$ when the microstructure noise is independent. Table 5 shows the RMSE of the various estimators for different sampling frequencies and DGPs. For five-minute data, $RV_t$ and $\hat{V}_t$ produce smaller errors in estimating $\sigma_t^2$ than $RK_t^F$ and $\hat{V}_{t,MA(1)}$. However, the bias that microstructure noise induces in $RV_t$ and $\hat{V}_t$ grows as the sampling frequency increases, so $RK_t^F$ and $\hat{V}_{t,MA(1)}$ become the more accurate estimators at higher frequencies. Compared to $RK_t^F$, $\hat{V}_{t,MA(1)}$ has a smaller RMSE in all cases except the 30-second and 10-second SV2F returns.

Table 5 RMSE of $RV_t$, $RK_t^F$, $\hat{V}_t$, and $\hat{V}_{t,MA(1)}$ (independent microstructure error case)

| Data freq. | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| 5-minute | $RV_t$ | 0.16003 | 0.29182 | 0.30651 | 0.47783 |
| | $RK_t^F$ | 0.22988 | 0.42318 | 0.43993 | 0.84100 |
| | $\hat{V}_t$ | 0.15640 | 0.28464 | 0.29858 | 0.46117 |
| | $\hat{V}_{t,MA(1)}$ | 0.21636 | 0.38776 | 0.40729 | 0.74828 |
| 1-minute | $RV_t$ | 0.48607 | 0.85374 | 0.94598 | 0.63983 |
| | $RK_t^F$ | 0.11157 | 0.20184 | 0.20822 | 0.46655 |
| | $\hat{V}_t$ | 0.48735 | 0.85547 | 0.94689 | 0.63808 |
| | $\hat{V}_{t,MA(1)}$ | 0.10592 | 0.18787 | 0.19539 | 0.41176 |
| 30-second | $RV_t$ | 0.95855 | 1.69544 | 1.87445 | 1.20299 |
| | $RK_t^F$ | 0.08483 | 0.15200 | 0.15743 | 0.27201 |
| | $\hat{V}_t$ | 0.96016 | 1.69798 | 1.87569 | 1.20332 |
| | $\hat{V}_{t,MA(1)}$ | 0.07906 | 0.14017 | 0.15232 | 0.27595 |
| 10-second | $RV_t$ | 2.86639 | 5.06382 | 5.60527 | 3.57263 |
| | $RK_t^F$ | 0.05575 | 0.10097 | 0.10683 | 0.16989 |
| | $\hat{V}_t$ | 2.86858 | 5.06745 | 5.60757 | 3.57263 |
| | $\hat{V}_{t,MA(1)}$ | 0.05387 | 0.09621 | 0.10555 | 0.20857 |

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$, $RK_t^F$, and the Bayesian nonparametric estimators $\hat{V}_t$ and $\hat{V}_{t,MA(1)}$, based on returns at different frequencies simulated from four DGPs. The price is contaminated with white noise.

As can be seen in Table 6, $\hat{V}_{t,MA(1)}$ has the best finite-sample coverage among all the alternatives in nearly every case, the main exception being the SV2F data. For the other three DGPs, the coverage probabilities of its 0.95 density intervals are within one percentage point of the nominal level for five-minute data and within 0.5 percentage points at every higher frequency. Note that the density intervals are trivial to obtain from the MCMC output and do not require computing $IQ_t$. Outside the SV2F design, the coverage of the realized kernel intervals, whether infeasible or feasible, is generally further from the nominal level than that of $\hat{V}_{t,MA(1)}$. Moreover, $RK_t^F$ requires larger samples for good coverage, while the density intervals of $\hat{V}_{t,MA(1)}$ perform well for both low- and high-frequency returns.

Table 6 Coverage probability (independent microstructure error case)

| Data freq. | Interval estimator | GARCH (%) | SV1F (%) | SV1FJ (%) | SV2F (%) |
|---|---|---|---|---|---|
| 5-minute | $RV_t$ | 87.60 | 85.00 | 84.42 | 21.56 |
| | $RK_t^F$ (infeasible) | 87.84 | 87.66 | 87.94 | 93.48 |
| | $RK_t^F$ (feasible) | 84.28 | 96.20 | 83.68 | 97.72 |
| | $\hat{V}_t$ | 81.84 | 78.50 | 77.10 | 18.12 |
| | $\hat{V}_{t,MA(1)}$ | 94.02 | 94.40 | 94.14 | 89.74 |
| 1-minute | $RV_t$ | 0.46 | 0.82 | 0.78 | 5.24 |
| | $RK_t^F$ (infeasible) | 88.50 | 89.78 | 89.02 | 93.32 |
| | $RK_t^F$ (feasible) | 99.30 | 97.76 | 95.26 | 97.86 |
| | $\hat{V}_t$ | 0.42 | 0.72 | 0.56 | 4.48 |
| | $\hat{V}_{t,MA(1)}$ | 95.06 | 95.18 | 94.66 | 86.60 |
| 30-second | $RV_t$ | 0.00 | 0.00 | 0.02 | 1.72 |
| | $RK_t^F$ (infeasible) | 89.80 | 90.46 | 90.74 | 92.80 |
| | $RK_t^F$ (feasible) | 77.44 | 99.48 | 99.52 | 97.94 |
| | $\hat{V}_t$ | 0.00 | 0.00 | 0.00 | 1.54 |
| | $\hat{V}_{t,MA(1)}$ | 95.00 | 95.18 | 94.84 | 85.94 |
| 10-second | $RV_t$ | 0.00 | 0.00 | 0.00 | 0.04 |
| | $RK_t^F$ (infeasible) | 92.08 | 92.68 | 92.90 | 92.10 |
| | $RK_t^F$ (feasible) | 99.98 | 99.98 | 99.98 | 98.62 |
| | $\hat{V}_t$ | 0.00 | 0.00 | 0.00 | 0.04 |
| | $\hat{V}_{t,MA(1)}$ | 94.92 | 95.34 | 95.32 | 82.24 |

Notes: This table reports the coverage probabilities of 95% confidence intervals using $RV_t$ and $RK_t^F$, and of 0.95 density intervals using $\hat{V}_t$ and $\hat{V}_{t,MA(1)}$, based on 5000-day results for different DGPs. The price is contaminated with white noise.

The last experiment considers the performance of the estimators under dependent noise, comparing $RK_t^N$, $RV_t$, $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$. The RMSE of the estimators is reported in Table 7.
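The breakdown of $RV_t$ and $\hat{V}_t$ at high frequencies follows from a simple calculation. If the observed log price is the efficient price plus i.i.d. noise with variance $\sigma_u^2$, each observed return contains the difference of two noise terms, so $E[RV_t] = \sigma_t^2 + 2n\sigma_u^2$ for $n$ returns, and the bias grows linearly with the sampling frequency. The Python sketch below checks this by simulation. It is a minimal illustration under stated assumptions (constant spot variance, Gaussian i.i.d. noise independent of returns), not one of the paper's four DGPs or its dependent-noise design; the function name and parameter values are hypothetical.

```python
import numpy as np


def simulate_rv_bias(n_obs: int, daily_var: float = 1.0, noise_sd: float = 0.02,
                     days: int = 2_000, seed: int = 0) -> float:
    """Average of RV minus the true daily variance when prices carry i.i.d. noise.

    ASSUMPTIONS (illustrative only): Brownian efficient log price with constant
    variance, observed at n_obs + 1 equally spaced times, plus i.i.d. Gaussian
    noise with standard deviation noise_sd that is independent of returns.
    """
    rng = np.random.default_rng(seed)
    step_sd = np.sqrt(daily_var / n_obs)
    increments = rng.normal(0.0, step_sd, size=(days, n_obs))
    efficient = np.hstack([np.zeros((days, 1)), np.cumsum(increments, axis=1)])
    observed = efficient + rng.normal(0.0, noise_sd, size=(days, n_obs + 1))
    rv = np.sum(np.diff(observed, axis=1) ** 2, axis=1)  # realized variance per day
    return float(rv.mean() - daily_var)


if __name__ == "__main__":
    for n in (78, 390, 780):  # roughly 5-minute, 1-minute and 30-second grids
        sim = simulate_rv_bias(n)
        print(f"n={n}: simulated bias {sim:.4f}, theory {2 * n * 0.02 ** 2:.4f}")
```

With $\sigma_u = 0.02$ and a unit daily variance, moving from 78 to 780 returns raises the average bias from about 0.06 to about 0.62, in line with $2n\sigma_u^2$.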
Again, $RV_t$ and $\hat{V}_t$ perform poorly when high-frequency data are used. For every DGP and every sampling frequency in the table, a version of the Bayesian estimator has the smallest RMSE. The $\hat{V}_{t,MA(1)}$ estimator ranks best when the return frequency is 30 seconds, followed by $\hat{V}_{t,MA(2)}$ and $RK_t^N$. For 10-second returns, $\hat{V}_{t,MA(2)}$ provides the smallest error. Compared to $RK_t^N$, the Bayesian MA estimators can provide significant improvements for 30- and 10-second returns. For instance, at 30 seconds, reductions in the RMSE of 10% or more are common, while at the 10-second frequency $\hat{V}_{t,MA(2)}$ reduces the RMSE by 25% or more. By contrast, $\hat{V}_{t,MA(1)}$ is less accurate than $RK_t^N$ at the 10-second frequency.

Table 7 RMSE of $RV_t$, $RK_t^N$, $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$ (dependent microstructure error case)

| Data freq. | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| 5-minute | $RV_t$ | 0.21825 | 0.39266 | 0.41585 | 0.58520 |
| | $RK_t^N$ | 0.23575 | 0.44343 | 0.45080 | 0.89975 |
| | $\hat{V}_t$ | 0.21505 | 0.38582 | 0.40767 | 0.54581 |
| | $\hat{V}_{t,MA(1)}$ | 0.22493 | 0.40316 | 0.42100 | 0.83691 |
| | $\hat{V}_{t,MA(2)}$ | 0.29260 | 0.54051 | 0.57714 | 1.18875 |
| 1-minute | $RV_t$ | 0.84121 | 1.48399 | 1.60189 | 1.6954 |
| | $RK_t^N$ | 0.14158 | 0.25780 | 0.26987 | 0.52030 |
| | $\hat{V}_t$ | 0.84318 | 1.48663 | 1.60261 | 1.67740 |
| | $\hat{V}_{t,MA(1)}$ | 0.11558 | 0.20443 | 0.21297 | 0.50769 |
| | $\hat{V}_{t,MA(2)}$ | 0.13732 | 0.24891 | 0.26161 | 0.62325 |
| 30-second | $RV_t$ | 1.66229 | 2.95397 | 3.19560 | 3.37090 |
| | $RK_t^N$ | 0.11918 | 0.21559 | 0.22306 | 0.42729 |
| | $\hat{V}_t$ | 1.66480 | 2.95765 | 3.19689 | 3.36058 |
| | $\hat{V}_{t,MA(1)}$ | 0.08889 | 0.15848 | 0.16931 | 0.34572 |
| | $\hat{V}_{t,MA(2)}$ | 0.10531 | 0.18916 | 0.19313 | 0.39269 |
| 10-second | $RV_t$ | 4.40694 | 7.81961 | 8.49852 | 7.85934 |
| | $RK_t^N$ | 0.09850 | 0.18004 | 0.18376 | 0.34594 |
| | $\hat{V}_t$ | 4.41003 | 7.82481 | 8.49935 | 7.85507 |
| | $\hat{V}_{t,MA(1)}$ | 0.16456 | 0.30833 | 0.30465 | 0.89045 |
| | $\hat{V}_{t,MA(2)}$ | 0.06940 | 0.12804 | 0.13592 | 0.25182 |

Notes: This table reports the RMSE of estimating 5000 daily ex post variances using $RV_t$, $RK_t^N$, and the Bayesian nonparametric estimators $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$, based on returns at different frequencies simulated from four DGPs. The observed prices contain microstructure noise that is dependent on returns.

Table 8 shows that, at the 1-minute and 30-second frequencies, $\hat{V}_{t,MA(1)}$ and $\hat{V}_{t,MA(2)}$ have a smaller absolute bias than all the other estimators. At 10 seconds only $\hat{V}_{t,MA(2)}$ does, as the bias of $\hat{V}_{t,MA(1)}$ rises to between 0.11 and 0.20. Table 9 reports the coverage probabilities of all five estimators. For the GARCH, SV1F, and SV1FJ designs, the coverage of the 0.95 density intervals of $\hat{V}_{t,MA(2)}$ lies between 94.2% and 95.2% at every data frequency, very close to the nominal level; under SV2F it falls to between 82% and 90%.

Table 8 Bias of $RV_t$, $RK_t^N$, $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$ (dependent microstructure error case)

| Data freq. | Estimator | GARCH | SV1F | SV1FJ | SV2F |
|---|---|---|---|---|---|
| 5-minute | $RV_t$ | 0.16032 | 0.28455 | 0.30733 | 0.17262 |
| | $RK_t^N$ | 0.01349 | 0.02985 | 0.03232 | 0.00733 |
| | $\hat{V}_t$ | 0.16005 | 0.28359 | 0.30534 | 0.16275 |
| | $\hat{V}_{t,MA(1)}$ | 0.01471 | 0.02665 | 0.02819 | −0.00104 |
| | $\hat{V}_{t,MA(2)}$ | 0.05581 | 0.10305 | 0.11604 | 0.03956 |
| 1-minute | $RV_t$ | 0.81057 | 1.42504 | 1.54563 | 0.87166 |
| | $RK_t^N$ | 0.02421 | 0.04351 | 0.04360 | 0.01839 |
| | $\hat{V}_t$ | 0.81269 | 1.42805 | 1.54689 | 0.86954 |
| | $\hat{V}_{t,MA(1)}$ | 0.00822 | 0.01401 | 0.01359 | −0.01044 |
| | $\hat{V}_{t,MA(2)}$ | 0.01694 | 0.03179 | 0.02977 | −0.00588 |
| 30-second | $RV_t$ | 1.61481 | 2.85837 | 3.10192 | 1.72912 |
| | $RK_t^N$ | 0.02791 | 0.04940 | 0.05114 | 0.02369 |
| | $\hat{V}_t$ | 1.61731 | 2.86219 | 3.10359 | 1.72853 |
| | $\hat{V}_{t,MA(1)}$ | 0.00721 | 0.01253 | 0.00856 | −0.01302 |
| | $\hat{V}_{t,MA(2)}$ | 0.01074 | 0.01972 | 0.01796 | −0.01155 |
| 10-second | $RV_t$ | 4.32800 | 7.65381 | 8.34221 | 4.67328 |
| | $RK_t^N$ | 0.04034 | 0.07209 | 0.07321 | 0.04327 |
| | $\hat{V}_t$ | 4.33106 | 7.65902 | 8.34351 | 4.67462 |
| | $\hat{V}_{t,MA(1)}$ | 0.11026 | 0.20188 | 0.20173 | 0.13648 |
| | $\hat{V}_{t,MA(2)}$ | 0.00634 | 0.01300 | 0.00850 | −0.01896 |

Notes: This table reports the bias estimates from 5000 daily ex post variances using $RV_t$, $RK_t^N$, and the Bayesian nonparametric estimators $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$, based on returns at different frequencies simulated from four DGPs. The observed prices contain microstructure noise that is dependent on returns.

Table 9 Coverage probability (dependent microstructure error case)

| Data freq. | Interval estimator | GARCH (%) | SV1F (%) | SV1FJ (%) | SV2F (%) |
|---|---|---|---|---|---|
| 5-minute | $RV_t$ | 76.22 | 74.00 | 73.12 | 21.14 |
| | $RK_t^N$ (infeasible) | 87.26 | 87.62 | 87.64 | 76.72 |
| | $RK_t^N$ (feasible) | 91.16 | 91.34 | 92.02 | 96.42 |
| | $\hat{V}_t$ | 66.00 | 63.62 | 62.44 | 16.74 |
| | $\hat{V}_{t,MA(1)}$ | 93.96 | 94.26 | 94.28 | 89.84 |
| | $\hat{V}_{t,MA(2)}$ | 94.36 | 94.60 | 94.22 | 90.06 |
| 1-minute | $RV_t$ | 0.00 | 0.00 | 0.10 | 0.06 |
| | $RK_t^N$ (infeasible) | 90.02 | 90.40 | 89.98 | 71.70 |
| | $RK_t^N$ (feasible) | 99.80 | 99.80 | 99.70 | 99.46 |
| | $\hat{V}_t$ | 0.00 | 0.00 | 0.04 | 0.04 |
| | $\hat{V}_{t,MA(1)}$ | 94.64 | 94.92 | 94.72 | 87.08 |
| | $\hat{V}_{t,MA(2)}$ | 94.58 | 94.92 | 94.30 | 86.92 |
| 30-second | $RV_t$ | 0.00 | 0.00 | 0.00 | 0.00 |
| | $RK_t^N$ (infeasible) | 91.50 | 91.72 | 91.26 | 70.94 |
| | $RK_t^N$ (feasible) | 100.00 | 100.00 | 100.00 | 99.96 |
| | $\hat{V}_t$ | 0.00 | 0.00 | 0.00 | 0.00 |
| | $\hat{V}_{t,MA(1)}$ | 95.00 | 95.24 | 94.76 | 85.18 |
| | $\hat{V}_{t,MA(2)}$ | 94.96 | 94.66 | 94.78 | 85.80 |
| 10-second | $RV_t$ | 0.00 | 0.00 | 0.00 | 0.00 |
| | $RK_t^N$ (infeasible) | 91.90 | 92.44 | 92.30 | 69.72 |
| | $RK_t^N$ (feasible) | 100.00 | 100.00 | 100.00 | 100.00 |
| | $\hat{V}_t$ | 0.00 | 0.00 | 0.00 | 0.00 |
| | $\hat{V}_{t,MA(1)}$ | 64.70 | 65.00 | 68.00 | 78.74 |
| | $\hat{V}_{t,MA(2)}$ | 94.48 | 95.20 | 95.14 | 82.06 |

Notes: This table reports the coverage probabilities of 95% confidence intervals of $RV_t$ and $RK_t^N$, and of 0.95 density intervals of the Bayesian nonparametric estimators $\hat{V}_t$, $\hat{V}_{t,MA(1)}$, and $\hat{V}_{t,MA(2)}$, based on 5000-day results. The observed prices contain microstructure noise that is dependent on returns.

Figures 1–3 display histograms of the posterior mean of the number of clusters in three different settings: the DPM for five-minute SV1F returns (no noise), the DPM-MA(1) for one-minute SV1FJ returns (independent noise), and the DPM-MA(2) for 30-second SV2F returns (dependent noise). The figures show substantial pooling. For example, in the one-minute SV1FJ case, most daily variance estimates are formed from one to five pooled groups of data, rather than from 390 separate observations, which is what the realized kernel uses. This level of pooling can lead to significant improvements for the Bayesian estimator.

Figure 1 Posterior mean of the number of clusters. Model: DPM. Data: five-minute returns without microstructure noise from SV1F.

Figure 2 Posterior mean of the number of clusters. Model: DPM-MA(1). Data: one-minute returns with independent noise from SV1FJ.

Figure 3 Posterior mean of the number of clusters. Model: DPM-MA(2). Data: 30-second returns with dependent noise from SV2F.

4.5 Tick Time Sampling

Following Griffin and Oomen (2008), the prices simulated from the DGPs described in Section 4.1 are discretized to tick prices. Let $p_{t,i}^h$ denote the observed prices and ω the probability of a price change. Then $p_{t,i}^h = p_{t,i}$ with probability ω; otherwise, $p_{t,i}^h = p_{t,i-1}^h$.
ω is set to 0.2. Tick-time returns are formed from prices sampled at every k-th price change. The chosen frequencies are k = 60, k = 12, and k = 6, which roughly match the 5-minute, 1-minute, and 30-second data considered in the previous examples.

Table 10 reports the RMSEs of the estimators based on tick-time-sampled returns. Panel A compares RVt and V^t in the no-noise setting, and Panel B compares RKtF and V^t,MA(1) when independent microstructure noise is present. In ten of the twelve no-noise cases and in all twelve cases with microstructure noise, the Bayesian nonparametric estimators have lower RMSEs than their classical counterparts. When the tick prices are contaminated with noise, switching from RKtF to V^t,MA(1) reduces the RMSE by 6.75% to 36.70%.

Table 10 RMSE of RVt, RKtF, V^t, and V^t,MA(1) in tick time

Data freq.  Estimator   GARCH    SV1F     SV1FJ    SV2F
Panel A: No microstructure noise
60-tick     RVt         0.11566  0.21887  0.21663  0.49652
            V^t         0.11175  0.21175  0.21081  0.47565
12-tick     RVt         0.05237  0.09744  0.10194  0.27255
            V^t         0.05184  0.09661  0.10474  0.26895
6-tick      RVt         0.03866  0.07047  0.07393  0.17821
            V^t         0.03842  0.07013  0.07787  0.17642
Panel B: Independent microstructure noise
60-tick     RKtF        0.23958  0.43907  0.44927  0.87832
            V^t,MA(1)   0.20988  0.39083  0.39199  0.69395
12-tick     RKtF        0.11549  0.20695  0.20794  0.64542
            V^t,MA(1)   0.10545  0.18643  0.18875  0.40857
6-tick      RKtF        0.08666  0.15397  0.15639  0.31230
            V^t,MA(1)   0.08080  0.13965  0.14555  0.25059

Notes: This table reports the RMSE of 5000 daily ex post variance estimates using RV and V^ in the no-noise case and RKF and V^MA(1) in the independent-noise case, based on tick-time-sampled returns. Bold entries denote the smallest values.

Table 11 compares the RMSEs of the Bayesian nonparametric estimators under the two sampling schemes. In Panel B of Table 11, the tick-time V^t,MA(1) has the lower RMSE in eight of the twelve cases. In Panel A, however, with no microstructure noise, calendar-time sampling is uniformly better.
Table 11 RMSE of V^t and V^t,MA(1) in calendar time and tick time

Data freq.   Estimator    GARCH    SV1F     SV1FJ    SV2F
Panel A: No microstructure noise
5-minute     V^t          0.11125  0.20625  0.20928  0.45238
60-tick      V^t          0.11175  0.21175  0.21081  0.47565
1-minute     V^t          0.05069  0.09195  0.10075  0.18466
12-tick      V^t          0.05184  0.09661  0.10474  0.26895
30-second    V^t          0.03667  0.06597  0.07485  0.14292
6-tick       V^t          0.03842  0.07013  0.07787  0.17642
Panel B: Independent microstructure noise
5-minute     V^t,MA(1)    0.21020  0.38103  0.40211  0.73568
60-tick      V^t,MA(1)    0.20988  0.39083  0.39199  0.69395
1-minute     V^t,MA(1)    0.10304  0.19050  0.19861  0.40527
12-tick      V^t,MA(1)    0.10545  0.18643  0.18875  0.40857
30-second    V^t,MA(1)    0.07847  0.14306  0.15133  0.28697
6-tick       V^t,MA(1)    0.08080  0.13965  0.14555  0.25059

Notes: This table reports the RMSE of 5000 daily ex post variance estimates from the Bayesian nonparametric volatility estimators V^ and V^MA(1) in calendar time and tick time. Bold entries denote the smaller value of calendar time and tick time.

In summary, these simulations show that the Bayesian estimate of ex post variance is highly competitive with existing classical alternatives under different sampling schemes.
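The percentages quoted for Table 10 and the win count quoted for Table 11 can be recomputed directly from the tabulated values. The sketch below simply transcribes Panel B of both tables; the dictionary names are hypothetical. Computed from the rounded table entries, the smallest RMSE reduction comes out at about 6.76%, marginally above the 6.75% in the text, presumably because the text works from unrounded RMSEs.

```python
# RMSEs transcribed from Table 10, Panel B, and Table 11, Panel B.
# Keys are the tick frequencies k; column order: GARCH, SV1F, SV1FJ, SV2F.
RK_F_TICK = {
    60: (0.23958, 0.43907, 0.44927, 0.87832),
    12: (0.11549, 0.20695, 0.20794, 0.64542),
    6: (0.08666, 0.15397, 0.15639, 0.31230),
}
V_MA1_TICK = {
    60: (0.20988, 0.39083, 0.39199, 0.69395),
    12: (0.10545, 0.18643, 0.18875, 0.40857),
    6: (0.08080, 0.13965, 0.14555, 0.25059),
}
# Calendar-time V^t,MA(1) at the matching frequency (5-minute, 1-minute, 30-second).
V_MA1_CAL = {
    60: (0.21020, 0.38103, 0.40211, 0.73568),
    12: (0.10304, 0.19050, 0.19861, 0.40527),
    6: (0.07847, 0.14306, 0.15133, 0.28697),
}


def pct_reductions(baseline, candidate):
    """Percentage RMSE reduction from baseline to candidate, cell by cell."""
    return [
        100.0 * (b - c) / b
        for k in baseline
        for b, c in zip(baseline[k], candidate[k])
    ]


gains = pct_reductions(RK_F_TICK, V_MA1_TICK)
tick_wins = sum(
    t < c for k in V_MA1_TICK for t, c in zip(V_MA1_TICK[k], V_MA1_CAL[k])
)
print(f"RMSE reduction: {min(gains):.2f}% to {max(gains):.2f}%")  # 6.76% to 36.70%
print(f"tick time wins in {tick_wins} of 12 cases")  # 8 of 12
```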
5 Bayesian Nonparametric Estimates of Stock Market Variance

For each day, 5000 MCMC draws are retained after 10,000 burn-in draws are discarded, and these are used to estimate posterior moments. All prior settings are the same as in the simulations.

5.1 IBM

We first consider estimating and forecasting volatility over a long calendar span of IBM equity returns. One-minute IBM price records from January 3, 1998 to February 16, 2016 were downloaded from the Kibot website.10 We start the sample on January 3, 2001, because the relatively small number of transactions before the year 2000 yields many zero intraday returns. Days with fewer than five hours of trading are removed, which leaves 3764 days in the sample. Log-prices are placed on a one-minute grid using the price with the closest time stamp that is less than or equal to the grid time. Five-minute and one-minute percentage log returns from 9:30 to 16:00 (EST) are constructed as 100 times the difference between consecutive grid log-prices. Overnight returns are ignored, so the first intraday return is formed from the daily opening price rather than from the previous day's close. This procedure generates 293,520 five-minute returns and 1,467,848 one-minute returns.

We use a filter to remove errors and outliers caused by abnormal price records. Specifically, we target cases in which the price jumps up or down but quickly moves back to its original range, which suggests a recording error. If $|r_{t,i}| + |r_{t,i+1}| > 8\,\mathrm{var}_t(r_{t,i})$ and $|r_{t,i} + r_{t,i+1}| < 0.05\%$, we replace $r_{t,i}$ and $r_{t,i+1}$ by $r'_{t,i} = r'_{t,i+1} = 0.5 \times (r_{t,i} + r_{t,i+1})$. The filter adjusts no five-minute returns and 70 one-minute returns (70/1,467,848 = 0.00477%). From these data, several versions of daily V^t, RVt, and RKt are computed. Daily returns are open-to-close returns, so they match the time interval of the variance estimates.
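The grid construction and the reversal filter described above can be sketched in a few lines. This is a minimal sketch, not the code used for the paper: the helper names are hypothetical, time stamps are assumed to be sorted seconds after the open, and because the text does not say how $\mathrm{var}_t(r_{t,i})$ is estimated, the day-t scale entering the threshold is left as an input (`scale`) for the caller to supply. The 0.05% bound becomes 0.05 because returns are already expressed in percent.

```python
import bisect
import math


def previous_tick_grid(times, prices, grid):
    """Price at each grid time: the last observation with time stamp <= grid time."""
    out = []
    for g in grid:
        j = bisect.bisect_right(times, g) - 1
        if j < 0:
            raise ValueError(f"no observation at or before grid time {g}")
        out.append(prices[j])
    return out


def pct_log_returns(grid_prices):
    """Percentage log returns: 100 times the change in consecutive log prices."""
    logs = [math.log(p) for p in grid_prices]
    return [100.0 * (b - a) for a, b in zip(logs, logs[1:])]


def reversal_filter(returns, scale, mult=8.0, tol=0.05):
    """Average out adjacent returns that jump and then immediately revert.

    scale: caller-supplied day-t dispersion (assumption; stands in for var_t).
    tol:   0.05 because returns are in percent (0.05%).
    """
    out = list(returns)
    n_adjusted = 0
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        if abs(a) + abs(b) > mult * scale and abs(a + b) < tol:
            out[i] = out[i + 1] = 0.5 * (a + b)
            n_adjusted += 2
            i += 2  # do not reuse a return that was just adjusted
        else:
            i += 1
    return out, n_adjusted


times = [0, 5, 30, 61, 119]  # hypothetical seconds after the open
prices = [100.0, 101.0, 102.0, 103.0, 104.0]
grid_prices = previous_tick_grid(times, prices, grid=[0, 60, 120])
print(grid_prices)  # [100.0, 102.0, 104.0]
filtered, n = reversal_filter([0.01, -0.02, 0.6, -0.58, 0.03], scale=0.05)
print(n)  # 2
```

Replacing the pair by its average leaves the two-period return, and hence the day's open-to-close return, unchanged.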
For each of the estimators, we follow exactly the methods used in the simulation section.

5.1.1 Ex post variance estimation

Figure 4 displays a volatility signature plot, which shows how the average of each variance estimator changes with the sampling frequency. RV based on 10-minute returns is treated as an unbiased benchmark, because low-frequency returns are less affected by market microstructure noise. At every sampling frequency, the average of the Bayesian nonparametric estimator is closer to 1.0 than that of RV. The plot stabilizes for sampling intervals of about 3.9 minutes and longer.

Figure 4 Signature plot of RVt and V^t (IBM data).

Table 12 reports summary statistics for several estimators. Overall, the Bayesian and classical estimators are very close. Both the realized kernel and the moving-average DPM estimators lower the average level of daily variance relative to RV, which indicates the presence of significant market microstructure noise. Based on this and on an analysis of the ACF of the high-frequency returns, we use V^t,MA(1) for the five-minute data and V^t,MA(4) for the one-minute data in the remainder of the analysis. Figures 5 and 6 compare these estimators with the realized kernels; apart from the extreme values, they are very similar.

Figure 5 RKtF and V^t,MA(1) based on five-minute IBM returns.

Figure 6 RKtN and V^t,MA(4) based on one-minute IBM returns.

Table 12 Summary statistics: IBM

Frequency  Data            Mean    Median  Var.     Skew.    Kurt.     Min.     Max.
Daily      rt              0.0673  0.0656  1.6046   0.2069   8.3059    −6.4095  12.2777
           rt^2            1.6091  0.4352  18.9654  13.9087  387.4812  0.0000   150.7429
5-minute   RVt             1.8353  0.9458  11.9867  9.5887   148.2622  0.1032   76.2901
           RVt,M=26block   1.8403  0.9506  12.1820  9.7116   151.8262  0.1003   77.2906
           RKtF            1.6613  0.8447  9.3647   8.5539   124.8480  0.0375   71.9626
           RKtN            1.6670  0.8476  8.8872   8.0467   109.1098  0.0556   66.3995
           V^t             1.7805  0.9286  10.3994  8.3839   116.7700  0.1068   70.2477
           V^t,MA(1)       1.6656  0.8424  9.2105   7.2318   77.7981   0.0275   52.3102
           V^t,MA(2)       1.6969  0.8467  10.0917  8.4800   118.7351  0.0137   72.2059
1-minute   RVt             2.0004  1.0468  13.5019  10.5704  202.6835  0.1535   103.8773
           RVt,M=78block   2.0045  1.0478  13.6307  10.6737  206.9232  0.1551   105.0582
           RKtF            1.7952  0.9163  10.8043  8.3092   113.5727  0.1006   73.8576
           RKtN            1.7425  0.8973  9.6499   7.7187   94.7830   0.0897   60.2024
           V^t             1.9649  1.0322  12.9922  10.6422  206.7584  0.1517   102.6389
           V^t,MA(1)       1.8417  0.9211  11.3720  7.6213   87.5668   0.1156   64.1797
           V^t,MA(2)       1.7894  0.8979  10.9147  8.5039   121.6750  0.1040   74.7890
           V^t,MA(3)       1.7393  0.8824  9.6571   7.8283   101.5650  0.0986   61.4764
           V^t,MA(4)       1.7105  0.8704  9.1269   7.3825   84.7413   0.0964   57.2552

Notes: This table reports summary statistics of the ex post variance estimators based on five-minute and one-minute returns, together with those of the daily return and the daily squared return. The number of daily observations is 3764.

Interval estimates for two subperiods are shown in Figures 7 and 8. A clear disadvantage of the kernel-based confidence interval is that it can include negative values for ex post variance. The Bayesian interval cannot, by construction, and it tends to be considerably shorter on volatile days. Results for the log variance11 are also provided; some differences remain.

Figure 7 High volatility period: RKtF and V^t,MA(1) calculated using five-minute IBM returns. Top: variance; bottom: log-variance.
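Why a posterior interval stays positive while a symmetric interval need not can be illustrated with draws. An equal-tailed credible interval built from quantiles of positive draws of Vt has positive endpoints, whereas an "estimate ± 1.96 standard deviations" interval can cross zero when the distribution is right-skewed. In the sketch below the draws are hypothetical lognormal samples standing in for MCMC output, and the symmetric interval is a generic normal approximation built from the same draws, not the realized kernel's asymptotic interval. The text does not say here whether the figures use equal-tailed or highest-density intervals, so the equal-tailed choice is an assumption.

```python
import math
import random
import statistics


def quantile(sorted_draws, q):
    """Linearly interpolated q-quantile of an already sorted sample."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return (1.0 - frac) * sorted_draws[lo] + frac * sorted_draws[hi]


def credible_interval(draws, level=0.95):
    """Equal-tailed posterior interval from MCMC draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(s, tail), quantile(s, 1.0 - tail)


def symmetric_interval(draws, z=1.96):
    """Normal-approximation interval: mean +/- z * standard deviation."""
    m, sd = statistics.fmean(draws), statistics.stdev(draws)
    return m - z * sd, m + z * sd


# Hypothetical right-skewed posterior draws of V_t (stand-in for MCMC output).
rng = random.Random(7)
draws = [rng.lognormvariate(0.0, 1.0) for _ in range(20000)]

ci = credible_interval(draws)
sym = symmetric_interval(draws)
log_ci = credible_interval([math.log(d) for d in draws])  # log-variance scale
print("credible:", ci)    # both endpoints positive
print("symmetric:", sym)  # lower endpoint is negative here
print("log scale:", log_ci)
```

Because quantiles are preserved by monotone transformations (up to interpolation), the log-variance interval is essentially the log of the variance interval's endpoints, which is one reason the Bayesian intervals are asymmetric on the variance scale.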
Figure 8 Low volatility period: RKtF and V^t,MA(1) calculated using five-minute IBM returns. Top: variance; bottom: log-variance.

Figures 9 and 10 show the degree of pooling in the Bayesian estimators. As expected, there are more groups at the higher one-minute frequency; in this case there are, on average, about three to seven distinct groups of intraday variance parameters.

Figure 9 Posterior mean of the number of clusters (based on 3764-day results from DPM-MA(1) using five-minute IBM returns).

Figure 10 Posterior mean of the number of clusters (based on 3764-day results from DPM-MA(4) using one-minute IBM returns).

5.1.2 Ex post variance modeling and forecasting

Does the Bayesian estimator correctly recover the time-series dynamics of volatility? To investigate this, we estimate several versions of the heterogeneous autoregressive (HAR) model introduced by Corsi (2009), a popular model that captures the strong dependence in ex post daily variance. For V^t, the HAR model is

$$\hat{V}_t = \beta_0 + \beta_1 \hat{V}_{t-1} + \beta_2 \hat{V}_{t-1|t-5} + \beta_3 \hat{V}_{t-1|t-22} + \epsilon_t, \qquad (45)$$

where $\hat{V}_{t-1|t-h} = \frac{1}{h}\sum_{l=1}^{h} \hat{V}_{t-l}$ and $\epsilon_t$ is the error term. $\hat{V}_{t-1}$, $\hat{V}_{t-1|t-5}$, and $\hat{V}_{t-1|t-22}$ are the daily, weekly, and monthly variance measures up to time $t-1$.
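A minimal sketch of how the regressors in (45) can be built from a daily variance series and fitted by OLS, using only the Python standard library. The helpers `har_design`, `ols`, and `simulate_har` are hypothetical, and the simulated series with coefficients (0.1, 0.3, 0.4, 0.2) is purely illustrative, not the IBM data or the paper's estimation code.

```python
import random


def har_design(v):
    """HAR regressors (1, V_{t-1}, V_{t-1|t-5}, V_{t-1|t-22}) and targets V_t."""
    X, y = [], []
    for t in range(22, len(v)):
        X.append([1.0, v[t - 1], sum(v[t - 5:t]) / 5.0, sum(v[t - 22:t]) / 22.0])
        y.append(v[t])
    return X, y


def ols(X, y):
    """OLS via the normal equations, solved by Gaussian elimination with pivoting."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (A[r][k] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta


def simulate_har(n, beta, noise_sd, rng):
    """Hypothetical HAR data-generating process started at V = 1."""
    v = [1.0] * 22
    while len(v) < n:
        t = len(v)
        x = (1.0, v[t - 1], sum(v[t - 5:t]) / 5.0, sum(v[t - 22:t]) / 22.0)
        v.append(sum(b * xi for b, xi in zip(beta, x)) + rng.gauss(0.0, noise_sd))
    return v


true_beta = (0.1, 0.3, 0.4, 0.2)
v = simulate_har(20000, true_beta, 0.1, random.Random(1))
X, y = har_design(v)
beta_hat = ols(X, y)
print([round(b, 3) for b in beta_hat])  # close to (0.1, 0.3, 0.4, 0.2)
```

The weekly and monthly regressors are plain moving averages of past values, so the HAR model is a restricted AR(22) that can be fitted by ordinary least squares.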
Similar specifications are obtained by replacing V^t with RVt or RKt. Bollerslev, Patton, and Quaedvlieg (2016) extend the HAR model to the HARQ model by taking the asymptotic theory of RVt into account. The HARQ model for RVt is

$$RV_t = \beta_0 + \left(\beta_1 + \beta_{1Q}\, RQ_{t-1}^{1/2}\right) RV_{t-1} + \beta_2 RV_{t-1|t-5} + \beta_3 RV_{t-1|t-22} + \epsilon_t. \qquad (46)$$

The loading on $RV_{t-1}$ is no longer constant but varies with the measurement error, which is captured by $RQ_{t-1}$: the model responds more strongly to $RV_{t-1}$ when the measurement error is low and less strongly when it is high. Bollerslev, Patton, and Quaedvlieg (2016) provide evidence that the HARQ model outperforms the HAR model in forecasting.12

An advantage of our Bayesian approach is that we have the full finite-sample posterior distribution of $V_t$. In the Bayesian nonparametric framework there is no need to estimate $IQ_t$ with $RQ_t$; instead, the variance, standard deviation, or other features of $V_t$ can be estimated directly from the MCMC output. Replacing $RQ_{t-1}$ with $\widehat{\mathrm{var}}(V_{t-1})$, we define the modified HARQ model for V^t as

$$\hat{V}_t = \beta_0 + \left(\beta_1 + \beta_{1Q}\, \widehat{\mathrm{var}}(V_{t-1})^{1/2}\right) \hat{V}_{t-1} + \beta_2 \hat{V}_{t-1|t-5} + \beta_3 \hat{V}_{t-1|t-22} + \epsilon_t, \qquad (47)$$

where $\widehat{\mathrm{var}}(V_{t-1})^{1/2}$ is an MCMC estimate of the posterior standard deviation of $V_{t-1}$. Table 13 displays the OLS estimates and the R2 for several model specifications. The coefficient estimates are comparable within each class of model, and the Bayesian variance estimates display the same type of time-series dynamics as the realized kernel estimates.

Table 13 HAR and HARQ model regression results based on IBM ex post variance estimators

Data freq.: 5-minute
                HAR                      HARQ
Parameter       RKtF       V^t,MA(1)     RKtF       V^t,MA(1)
β0              0.1322     0.1224        0.1015     −0.0142
                (0.0374)   (0.0375)      (0.0382)   (0.0393)
β1              0.1926     0.2506        0.2341     0.4629
                (0.0196)   (0.0196)      (0.0224)   (0.0283)
β2              0.5649     0.4802        0.5664     0.4298
                (0.0332)   (0.0329)      (0.0331)   (0.0328)
β3              0.1598     0.1927        0.1422     0.1482
                (0.0281)   (0.0282)      (0.0289)   (0.0281)
β1Q             –          –             −0.0012    −0.0202
                                         (0.0003)   (0.0020)
R2 (%)          57.74      59.55         57.90      60.66

Data freq.: 1-minute
                HAR                      HARQ
Parameter       RKtN       V^t,MA(4)     RKtN       V^t,MA(4)
β0              0.1246     0.1308        0.0065     −0.0402
                (0.0365)   (0.0376)      (0.0367)   (0.0388)
β1              0.2493     0.2455        0.4464     0.5294
                (0.0195)   (0.0196)      (0.0242)   (0.0284)
β2              0.5435     0.5198        0.5033     0.4521
                (0.0318)   (0.0321)      (0.0312)   (0.0317)
β3              0.1331     0.1558        0.0708     0.0821
                (0.0265)   (0.0271)      (0.0263)   (0.0270)
β1Q             –          –             −0.0031    −0.0334
                                         (0.0002)   (0.0025)
R2 (%)          62.71      60.34         64.39      62.19

Notes: This table reports OLS regression results for the HAR and HARQ models. The top panel is based on RKtF and V^t,MA(1) calculated from five-minute returns, and the bottom panel on one-minute RKtN and V^t,MA(4). Standard errors of the coefficients are in parentheses. Sample period: January 3, 2001 to February 16, 2016, 3764 observations.

Finally, Table 14 reports out-of-sample root mean squared forecast errors (RMSFEs) of the HAR and HARQ models based on both classical and Bayesian estimators. The out-of-sample period runs from January 3, 2005 to February 16, 2016 (2773 observations), and the model parameters are re-estimated as new data arrive. To mimic a real-time forecasting setting, the prior hyperparameters $\nu_{0,t}$ and $s_{0,t}$ are set from intraday data on days t and t − 1.13

Table 14 Out-of-sample forecasts of IBM volatility

Dependent variable    Regressors    HAR       HARQ
Panel A: 5-minute returns
5-minute RKtF         RKtF          1.84113   1.84444
                      V^t,MA(1)     1.84042   1.81152
5-minute V^t,MA(1)    RKtF          1.86130   1.86642
                      V^t,MA(1)     1.85546   1.83054
Panel B: 1-minute returns
1-minute RKtN         RKtN          1.87539   1.82881
                      V^t,MA(4)     1.87215   1.82548
1-minute V^t,MA(4)    RKtN          1.94106   1.88974
                      V^t,MA(4)     1.93202   1.87276

Notes: This table reports the RMSFE of forecasting next-period ex post variance using both classical and Bayesian nonparametric variance estimators, for both the HAR and HARQ models. The forecasting target is the dependent variable one period out of sample. On each day, the model parameters are re-estimated using all the data up to that day. Out-of-sample period: January 3, 2005 to February 16, 2016, 2773 days. Bold entries denote the smallest value in a column.
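The expanding-window exercise behind Table 14 can be sketched as follows, including the modified HARQ term in (47), whose extra regressor is the posterior standard deviation of $V_{t-1}$ multiplied by $\hat V_{t-1}$. This is a minimal sketch under stated assumptions: `posterior_summary`, `harq_row`, `least_squares`, and `expanding_window_rmsfe` are hypothetical helpers; V^t is taken to be the posterior mean; and the per-day "MCMC draws" are stand-in lognormal samples around a simulated log-AR(1) variance path rather than output of the DPM sampler.

```python
import math
import random
import statistics


def posterior_summary(draws):
    """Posterior mean and standard deviation of V_t from its MCMC draws."""
    return statistics.fmean(draws), statistics.stdev(draws)


def harq_row(v, sd, t, use_q=True):
    """Regressors for V_t; the sd_{t-1} * V_{t-1} column carries beta_1Q in (47)."""
    row = [1.0, v[t - 1]]
    if use_q:
        row.append(sd[t - 1] * v[t - 1])
    return row + [sum(v[t - 5:t]) / 5.0, sum(v[t - 22:t]) / 22.0]


def least_squares(X, y):
    """OLS via the normal equations and Gaussian elimination with pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, k):
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def expanding_window_rmsfe(v, sd, first_oos, use_q=True):
    """Refit on all days before t, forecast V_t one step ahead, return the RMSFE."""
    errors = []
    for t in range(first_oos, len(v)):
        X = [harq_row(v, sd, s, use_q) for s in range(22, t)]
        beta = least_squares(X, v[22:t])
        forecast = sum(b * x for b, x in zip(beta, harq_row(v, sd, t, use_q)))
        errors.append(v[t] - forecast)
    return math.sqrt(statistics.fmean([e * e for e in errors]))


# Hypothetical data: a log-AR(1) variance path, with lognormal draws standing
# in for each day's MCMC output.
rng = random.Random(3)
log_v, v_hat, sd_hat = 0.0, [], []
for _ in range(400):
    log_v = 0.95 * log_v + 0.3 * rng.gauss(0.0, 1.0)
    m, s = posterior_summary([rng.lognormvariate(log_v, 0.2) for _ in range(200)])
    v_hat.append(m)
    sd_hat.append(s)

rmsfe_har = expanding_window_rmsfe(v_hat, sd_hat, first_oos=300, use_q=False)
rmsfe_harq = expanding_window_rmsfe(v_hat, sd_hat, first_oos=300, use_q=True)
print(f"HAR RMSFE {rmsfe_har:.4f}, HARQ RMSFE {rmsfe_harq:.4f}")
```

Each forecast uses only information available before day t, which is what "re-estimated as new data arrive" requires; on this toy data the two RMSFEs need not rank the way they do in Table 14.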
The first column of Table 14 reports the data frequency and the dependent variable of the HAR/HARQ model, and the second column reports the estimator used to construct the right-hand-side regressors. This covers every combination: the realized kernel is forecast with lags of either the realized kernel or V^t,MA, and likewise for forecasting V^t,MA. All of the specifications produce similar RMSFEs. In every case, however, using the Bayesian variance measure as the regressor yields a lower RMSFE than using the realized kernel, whether the target is the Bayesian measure itself or the realized kernel.

5.2 SPDR S&P 500 ETF

Transaction and National Best Bid and Offer data for the SPDR S&P 500 ETF (SPY) were supplied by Tickdata. We follow the procedure of Barndorff-Nielsen et al.
(2011) to clean both the transaction and quote datasets and form grid returns at 5-minute, 1-minute, 30-second, and 10-second frequencies using transaction prices. The sample period is from July 1, 2014 to June 29, 2016 and excludes days with fewer than six trading hours. The final dataset has 498 days of intraday observations.

Table 15 displays summary statistics for the daily variance estimators of SPY returns. As the sampling frequency increases, the sample averages of the different variance estimators move closer to the sample variance of daily returns.

Figures 11 and 12 display box plots of the daily variance estimates from the classical and Bayesian estimators for the 5-minute and 30-second data. Several points are worth noting. First, both estimators recover the same general pattern of volatility over this period. Second, the Bayesian density interval is often shorter than its classical counterpart and is asymmetric. Although the estimators generally agree, they differ somewhat on the high-variance day of June 24, particularly in Figure 12. Finally, both estimates become more accurate with the higher-frequency 30-second data, and both revise the variance estimate for June 24 substantially downward.

Figure 11 RVt and V^t based on five-minute SPY returns in June 2016. Top: variance; bottom: log-variance.

Figure 12 RKtF and V^t,MA(1) based on 30-second SPY returns in June 2016. Top: variance; bottom: log-variance.

Table 15 Summary statistics: SPY

Frequency   Data             Mean    Median  Var.    Skew.    Kurt.     Min.     Max.
Daily       rt               0.0188  0.0452  0.4980  −0.6969  6.5418    −4.2837  2.7084
            rt^2             0.4984  0.1634  1.3633  8.8640   118.4977  0.0000   18.3506
5-minute    RVt              0.5287  0.2959  1.5263  16.2802  320.3198  0.0358   25.2066
            RVt,M=26block    0.5297  0.2920  1.5736  16.4718  325.8798  0.0356   25.6964
            RKtF             0.4900  0.2754  0.5812  7.8004   97.4299   0.0094   11.5382
            RKtN             0.4917  0.2792  0.6305  8.7319   119.7854  0.0168   12.7330
            V^t              0.5212  0.2901  1.3682  15.7278  304.5983  0.0367   23.5902
            V^t,MA(1)        0.5245  0.2756  1.3761  15.3403  294.2938  0.0160   23.4621
            V^t,MA(2)        0.5162  0.2900  0.8368  10.9368  175.3430  0.0145   16.1662
1-minute    RVt              0.5209  0.3154  0.5902  9.3401   138.3005  0.0521   12.8912
            RVt,M=78block    0.5214  0.3174  0.5910  9.3483   138.5217  0.0520   12.9052
            RKtF             0.5126  0.3036  0.8742  12.6004  219.7084  0.0344   17.4842
            RKtN             0.5062  0.2999  0.8818  13.0819  232.7427  0.0281   17.8050
            V^t              0.5170  0.3153  0.5766  9.2625   136.6305  0.0522   12.7063
            V^t,MA(1)        0.5095  0.2984  0.9136  13.1729  235.1810  0.0383   18.1655
            V^t,MA(2)        0.5025  0.2940  0.9899  13.5608  244.4456  0.0344   19.0570
30-second   RVt              0.5034  0.3099  0.4525  7.2251   89.5632   0.0563   10.1295
            RVt,M=130block   0.5297  0.2920  1.5736  16.471   325.8798  0.0356   25.6964
            RKtF             0.5092  0.3109  0.6869  10.9490  177.4869  0.0411   14.7520
            RKtN             0.5066  0.3094  0.7906  12.2661  211.2367  0.0356   16.4858
            V^t              0.5008  0.3087  0.4466  7.1778   88.5730   0.0566   10.0356
            V^t,MA(1)        0.5007  0.3017  0.6056  9.9364   152.7216  0.0436   13.3641
            V^t,MA(2)        0.4997  0.2917  0.6780  10.6858  170.9371  0.0394   14.5134
10-second   RVt              0.4984  0.3103  0.4172  7.0322   85.1760   0.0741   9.6097
            RVt,M=260block   0.4984  0.3106  0.4170  7.0330   85.2038   0.0741   9.6077
            RKtF             0.5021  0.3165  0.5179  8.7480   124.3016  0.0533   11.7691
            RKtN             0.5051  0.3128  0.5845  9.6693   145.9337  0.0447   12.9900
            V^t              0.4962  0.3099  0.4109  6.9512   83.4540   0.0741   9.4866
            V^t,MA(1)        0.4921  0.3067  0.4220  6.9830   83.8263   0.0470   9.6152
            V^t,MA(2)        0.4943  0.3066  0.4271  6.9312   83.1514   0.0441   9.6627

Notes: This table reports the summary statistics of ex post variance estimators based on 5-minute, 1-minute,
30-second, and 10-second SPY returns, along with the summary statistics of the daily return and the daily squared return. Sample period: July 2, 2014 to June 28, 2016.

6 Conclusion

This article offers a new exact finite-sample approach to estimating ex post variance using Bayesian nonparametric methods. The proposed approach benefits ex post variance estimation in two respects. First, observations with similar variance levels can be pooled together to increase accuracy. Second, exact finite-sample inference is available directly, without relying on additional assumptions about a higher-frequency DGP. Bayesian nonparametric variance estimators are introduced for three cases: no noise, heteroskedastic microstructure noise, and serially correlated microstructure noise. Monte Carlo simulation results show that the proposed approach can increase the accuracy of ex post variance estimation and provide reliable finite-sample inference. Applications to real equity returns show that the new estimators conform closely to the RV and realized kernel estimators, both in their average statistical properties and in their time-series characteristics. The Bayesian estimators can be used with confidence and have several benefits relative to existing methods: they can capture asymmetric density intervals, always remain positive, and do not rely on estimating integrated quarticity.

Appendix

A.1 Existing Ex Post Volatility Estimation

A.1.1 RV

Let $r_{t,i}$ denote the $i$-th intraday return on day $t$, $i = 1, \ldots, n_t$, where $n_t$ is the number of intraday returns on day $t$. RV is defined as

$$RV_t = \sum_{i=1}^{n_t} r_{t,i}^2, \tag{48}$$

and $RV_t \xrightarrow{p} IV_t$ as $n_t \to \infty$ (Andersen et al., 2001). Barndorff-Nielsen and Shephard (2002) derive the asymptotic distribution of $RV_t$ as

$$\frac{n_t^{1/2}}{\sqrt{2\,IQ_t}}\left(RV_t - IV_t\right) \xrightarrow{d} N(0,1), \quad \text{as } n_t \to \infty, \tag{49}$$

where $IQ_t$ is the integrated quarticity, which can be estimated by the realized quarticity ($RQ_t$), defined as

$$RQ_t = \frac{n_t}{3} \sum_{i=1}^{n_t} r_{t,i}^4 \xrightarrow{p} IQ_t, \quad \text{as } n_t \to \infty. \tag{50}$$

A.1.2 Flat-top realized kernel

Barndorff-Nielsen et al.
(2008) introduced the flat-top realized kernel ($RK_t^F$), which is the optimal estimator if the microstructure error is a white noise process. RKtF=∑i=1ntr˜t,i2+∑h=1Hk(h&minu