Web: http://arxiv.org/abs/2106.08229

Jan. 12, 2022, 2:10 a.m. | Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

cs.LG updates on arXiv.org

We present a new behavioural distance over the state space of a Markov
decision process, and demonstrate the use of this distance as an effective
means of shaping the learnt representations of deep reinforcement learning
agents. While existing notions of state similarity are typically difficult to
learn at scale due to high computational cost and lack of sample-based
algorithms, our newly-proposed distance addresses both of these issues. In
addition to providing detailed theoretical analysis, we provide empirical
evidence that learning this distance alongside the value function yields
structured and informative …

