Web: http://arxiv.org/abs/2205.01965

May 5, 2022, 1:12 a.m. | Lorenzo Steccanella, Anders Jonsson

Source: cs.LG updates on arXiv.org

This paper presents a novel state representation for reward-free Markov
decision processes. The idea is to learn, in a self-supervised manner, an
embedding space where distances between pairs of embedded states correspond to
the minimum number of actions needed to transition between them. Unlike
previous methods, our approach requires no domain knowledge and learns
from offline, unlabeled data. We show how this representation can be
leveraged to learn goal-conditioned policies, providing a notion of similarity
between states …
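
To make the target of such an embedding concrete, here is a minimal sketch in Python. It is not the paper's training procedure: it only computes minimum action distances by breadth-first search over a small, hypothetical set of observed transitions, then measures how well a hand-picked embedding reproduces them. The function names, the 5-state corridor, and the squared-error measure are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def min_action_distances(transitions, n_states):
    """All-pairs minimum number of actions, via BFS over observed transitions."""
    adjacency = [[] for _ in range(n_states)]
    for state, next_state in transitions:
        adjacency[state].append(next_state)

    dist = np.full((n_states, n_states), np.inf)
    for start in range(n_states):
        dist[start, start] = 0
        queue = deque([start])
        while queue:
            state = queue.popleft()
            for next_state in adjacency[state]:
                if np.isinf(dist[start, next_state]):
                    dist[start, next_state] = dist[start, state] + 1
                    queue.append(next_state)
    return dist


def embedding_error(embedding, dist):
    """Mean squared gap between embedded distances and minimum action distances."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    embedded_dist = np.linalg.norm(diff, axis=-1)
    reachable = np.isfinite(dist)  # ignore pairs with no observed path
    return float(np.mean((embedded_dist[reachable] - dist[reachable]) ** 2))


# Hypothetical toy data: a 5-state corridor with moves in both directions.
transitions = [(i, i + 1) for i in range(4)] + [(i + 1, i) for i in range(4)]
dist = min_action_distances(transitions, 5)

# Hand-picked 1-D embedding that places state i at coordinate i.
embedding = np.arange(5, dtype=float).reshape(-1, 1)
print(embedding_error(embedding, dist))
```

In this toy corridor the hand-picked embedding matches the minimum action distances exactly, so the error is zero. A learned representation would instead have to approximate this property from sampled trajectories, without access to the full transition graph.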

Tags: arxiv, learning, reinforcement, reinforcement learning, representation, representation learning, state

