Apprenticeship learning via inverse reinforcement learning


Association for Computing Machinery — Jul 4, 2004



Datasource: Association for Computing Machinery
Copyright: © 2004 by ACM Inc.
ISBN: 1-58113-838-5
DOI: 10.1145/1015330.1015430

Abstract

Pieter Abbeel and Andrew Y. Ng, Computer Science Department, Stanford University, Stanford, CA 94305, USA (pabbeel@cs.stanford.edu, ang@cs.stanford.edu)

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where here performance is measured with respect to the expert's unknown reward function.
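To make the abstract's idea concrete, the following is a minimal sketch of feature-expectation matching with a projection-style update, the mechanism this approach is built on. The tiny randomly generated MDP, the feature map phi, the simulated "expert" policy, and the helper names (feature_expectations, optimal_policy) are all hypothetical placeholders for illustration, not the paper's driving domain or exact algorithm.

import numpy as np

gamma = 0.9
n_states, n_actions, n_features = 4, 2, 3

# Transition model P[a, s, s'] and feature map phi[s] (assumed, for illustration).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
phi = rng.random((n_states, n_features))

def feature_expectations(policy, n_iter=200):
    """mu(pi) = E[sum_t gamma^t phi(s_t)] for a deterministic policy (state -> action)."""
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])  # policy-specific transitions
    mu = np.zeros((n_states, n_features))
    for _ in range(n_iter):                       # fixed point of mu = phi + gamma * P_pi mu
        mu = phi + gamma * P_pi @ mu
    return mu[0]                                  # start in state 0 for simplicity

def optimal_policy(w, n_iter=200):
    """Value iteration for the linear reward R(s) = w . phi(s)."""
    R = phi @ w
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = R[:, None] + gamma * np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# "Expert" feature expectations; here simulated from a fixed policy for illustration.
mu_E = feature_expectations(np.array([0, 1, 0, 1]))

# Drive the learner's feature expectations toward mu_E.
policy = np.array([0, 0, 0, 0])
mu_bar = feature_expectations(policy)
for i in range(30):
    w = mu_E - mu_bar                             # current guess of the reward weights
    t = np.linalg.norm(w)
    print(f"iter {i}: ||mu_E - mu_bar|| = {t:.4f}")
    if t < 1e-3:
        break
    policy = optimal_policy(w)                    # best response to the current reward guess
    mu = feature_expectations(policy)
    d = mu - mu_bar
    if d @ d < 1e-12:
        break
    # Orthogonally project mu_E onto the line through mu_bar and mu.
    mu_bar = mu_bar + (d @ (mu_E - mu_bar)) / (d @ d) * d

Because the reward is linear in the features, matching feature expectations guarantees matching expected reward, which is why the learner can perform close to the expert without ever recovering the expert's true weight vector.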
