Learning Value Functions from Undirected State-only Experience. (arXiv:2204.12458v1 [cs.LG])
April 27, 2022, 1:12 a.m. | Matthew Chang, Arjun Gupta, Saurabh Gupta
cs.LG updates on arXiv.org
This paper tackles the problem of learning value functions from undirected
state-only experience (state transitions without action labels, i.e., (s, s', r)
tuples). We first theoretically characterize the applicability of Q-learning in
this setting. We show that tabular Q-learning in discrete Markov decision
processes (MDPs) learns the same value function under any arbitrary refinement
of the action space. This theoretical result motivates the design of Latent
Action Q-learning or LAQ, an offline RL method that can learn effective value
functions from state-only …
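The refinement claim in the abstract can be illustrated on a toy problem. The sketch below is not the authors' code or the LAQ method; it assumes a hypothetical deterministic 3-state chain MDP and one simple kind of refinement, in which every original action is split into several copies that behave identically. The names `q_learning`, `step`, and `COPIES` are illustrative only. Under these assumptions, the state values V(s) = max_a Q(s, a) learned with the original and the refined action labels should agree.

```python
import numpy as np

def q_learning(transitions, n_states, n_actions, gamma=0.9, alpha=0.5, sweeps=500):
    """Tabular Q-learning swept repeatedly over a fixed batch of (s, a, r, s') tuples."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
    return Q

N_STATES = 3   # hypothetical chain: states 0, 1, 2
COPIES = 3     # assumed refinement: each original action becomes 3 identical copies

def step(s, a):
    """Action 0 moves left, action 1 moves right; reward 1 whenever the next state is 2."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next

# Batch with the original two action labels.
base = [(s, a, *step(s, a)) for s in range(N_STATES) for a in range(2)]

# Same transitions, relabelled with a refined action space of size 2 * COPIES.
refined = [
    (s, a * COPIES + j, *step(s, a))
    for s in range(N_STATES)
    for a in range(2)
    for j in range(COPIES)
]

V_base = q_learning(base, N_STATES, 2).max(axis=1)
V_refined = q_learning(refined, N_STATES, 2 * COPIES).max(axis=1)

print("V (original actions):", V_base)
print("V (refined actions): ", V_refined)
```

The Q-tables differ in shape (2 versus 6 columns), but the maximum over actions in each state is the same, which is the quantity a value function learned from relabelled state-only data would need to preserve.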