Oct. 6, 2022, 1:16 a.m. | Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud

cs.CL updates on arXiv.org

Reinforcement learning (RL) in long-horizon, sparse-reward tasks is notoriously difficult and requires many training steps. A standard way to speed up learning is to leverage additional reward signals, shaping the reward to better guide the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward. In this paper, we leverage this idea and propose an automated reward shaping …
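Since the truncated abstract centres on reward shaping, the following is a minimal sketch of classical potential-based reward shaping (Ng et al., 1999) for intuition only; it is not the paper's automated, language-based method, and the potential function phi and the toy goal-distance signal below are illustrative assumptions.

    # Minimal sketch of potential-based reward shaping (Ng et al., 1999).
    # NOT the paper's automated, language-based method; `phi` and the toy
    # goal-distance signal are illustrative assumptions.

    def shaped_reward(r, s, s_next, phi, gamma=0.99):
        """Augment the sparse environment reward r with the shaping term
        F = gamma * phi(s') - phi(s), which is known to preserve the
        optimal policy while providing denser intermediate feedback."""
        return r + gamma * phi(s_next) - phi(s)

    def phi(state):
        # Hypothetical potential for a grid goal-reaching task:
        # negative Manhattan distance to an assumed goal at (10, 10).
        goal = (10, 10)
        return -abs(state[0] - goal[0]) - abs(state[1] - goal[1])

    # Usage: even with zero environment reward along the way, moving
    # closer to the goal yields a positive shaped reward.
    assert shaped_reward(0.0, (0, 0), (1, 0), phi) > 0

The design point this illustrates: shaping densifies the reward signal without changing which policy is optimal, which is exactly the property one wants when the underlying task reward is sparse.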
