April 27, 2022, 1:12 a.m. | Matthew Chang, Arjun Gupta, Saurabh Gupta

cs.LG updates on arXiv.org

This paper tackles the problem of learning value functions from undirected
state-only experience (state transitions without action labels, i.e., (s, s', r)
tuples). We first theoretically characterize the applicability of Q-learning in
this setting. We show that tabular Q-learning in discrete Markov decision
processes (MDPs) learns the same value function under any arbitrary refinement
of the action space. This theoretical result motivates the design of Latent
Action Q-learning or LAQ, an offline RL method that can learn effective value
functions from state-only …
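The refinement claim in the abstract can be illustrated with a small sketch. It assumes that "refinement" means splitting each original action into one or more copies with identical transitions and rewards; that reading, the toy 3-state MDP, and the helper name `q_value_iteration` are all hypothetical and not taken from the paper. It also substitutes Q-value iteration (the fixed point that tabular Q-learning converges to) for sampled Q-learning, so the comparison is deterministic.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, iters=500):
    """Compute Q for a deterministic MDP.

    P[s, a] is the next state and R[s, a] the reward for taking a in s.
    """
    n_states, n_actions = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)       # state values under the greedy policy
        Q = R + gamma * V[P]    # Bellman optimality backup for every (s, a)
    return Q

# Hypothetical MDP: 3 states, 2 actions, state 2 is absorbing with zero reward.
P = np.array([[1, 2], [2, 0], [2, 2]])
R = np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]])

# Assumed refinement: action 0 becomes refined actions {0, 1},
# action 1 becomes refined actions {2, 3, 4}.
refine = np.array([0, 0, 1, 1, 1])
P_ref = P[:, refine]
R_ref = R[:, refine]

V = q_value_iteration(P, R).max(axis=1)
V_ref = q_value_iteration(P_ref, R_ref).max(axis=1)

# The state-value functions agree even though the Q-tables differ in shape.
print(np.allclose(V, V_ref))
```

The Q-tables have different widths (2 versus 5 columns), but taking the maximum over actions collapses the duplicated columns, which is why the state values coincide in this construction.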

