Combining importance sampling and temporal difference control variates to simulate Markov Chains


Publisher: Association for Computing Machinery
Copyright: © 2004 by ACM Inc.
ISSN: 1049-3301
DOI: 10.1145/974734.974735

Abstract

It is well known that, in estimating performance measures of a stochastic system, a good importance sampling (IS) distribution can give orders of magnitude of variance reduction, while a bad one may lead to large, even infinite, variance. In this paper we study how this sensitivity of the estimator variance to the importance sampling change of measure may be "dampened" by combining importance sampling with a stochastic-approximation-based temporal difference (TD) method. We consider a finite-state-space discrete-time Markov chain (DTMC) with one-step transition rewards and an absorbing set of states, and focus on estimating the cumulative expected reward to absorption starting from any state. In this setting we develop sufficient conditions under which the estimate resulting from the combined approach has a mean squared error that asymptotically equals zero, even when the estimate formed using only the importance sampling change of measure has infinite variance. In particular, we consider the problem of estimating a small buffer-overflow probability in a queueing network, where the change of measure suggested in the literature is shown to have infinite variance under certain parameters, and where an appropriate combination of the IS and TD methods is empirically seen to converge much faster than naive simulation.
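
The combined estimator described in the abstract can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the paper's algorithm or its queueing example: it assumes a toy birth-death chain (the names B, p_up, q_up, alpha and the drift-swapping change of measure are hypothetical choices), learns an approximate value function V by likelihood-ratio-weighted TD(0) along the importance-sampled paths, and uses the per-step temporal-difference terms of a frozen copy of V as control variates on the IS estimate of the reward to absorption.

import random

# Toy absorbing DTMC (an illustrative assumption, not the paper's example):
# a birth-death chain on states 0..B, with 0 and B absorbing.  With p_up < 0.5
# the chain drifts down, so reaching B before 0 from a low start state is rare.
# A reward of 1 is earned on the transition into B, so the expected cumulative
# reward to absorption J(i) equals the overflow probability P_i(hit B before 0).
B = 20            # hypothetical "buffer size"
p_up = 0.3        # original up-probability; down-probability is 1 - p_up
q_up = 1 - p_up   # IS change of measure: swap the drift so overflow is likely

def step(state, up_prob):
    """Sample one transition; return (next_state, reward)."""
    nxt = state + 1 if random.random() < up_prob else state - 1
    return nxt, (1.0 if nxt == B else 0.0)

def one_step_lr(state, nxt):
    """One-step likelihood ratio dP/dP' for the move state -> nxt."""
    return p_up / q_up if nxt == state + 1 else (1 - p_up) / (1 - q_up)

def simulate(start, n_paths, alpha=0.01, seed=1):
    """Estimate J(start) by IS with a TD(0)-learned value function as control variate."""
    random.seed(seed)
    V = [0.0] * (B + 1)               # value-function approximation; 0 at absorbing states
    estimates = []
    for _ in range(n_paths):
        V_cv = list(V)                # freeze V for this path's control variate
        x, L, corr = start, 1.0, 0.0
        while 0 < x < B:
            nxt, r = step(x, q_up)    # sample under the IS measure
            lr = one_step_lr(x, nxt)
            L *= lr                   # running likelihood ratio along the path
            # control-variate-corrected increment: likelihood ratio times TD term
            corr += L * (r + V_cv[nxt] - V_cv[x])
            # off-policy TD(0) update of the live value function
            V[x] += alpha * lr * (r + V[nxt] - V[x])
            x = nxt
        estimates.append(V_cv[start] + corr)   # combined IS + TD estimator
    return sum(estimates) / len(estimates), V

if __name__ == "__main__":
    est, V = simulate(start=5, n_paths=20000)
    print("estimated overflow probability from state 5:", est)

Freezing V at the start of each path keeps the control-variate correction a telescoping sum, so each path's estimate remains conditionally unbiased while the live V continues to adapt; when V approximates the true reward to absorption well, the per-step TD terms are small and the effect of large likelihood ratios is damped, which is the behaviour the abstract attributes to the combined approach.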

Journal

ACM Transactions on Modeling and Computer Simulation (TOMACS), Association for Computing Machinery

Published: Jan 1, 2004
