all AI news
Safe Deep RL in 3D Environments using Human Feedback. (arXiv:2201.08102v1 [cs.LG])
Jan. 21, 2022, 2:10 a.m. | Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike
cs.LG updates on arXiv.org arxiv.org
Agents should avoid unsafe behaviour during both training and deployment.
This typically requires a simulator and a procedural specification of unsafe
behaviour. Unfortunately, a simulator is not always available, and procedurally
specifying constraints can be difficult or impossible for many real-world
tasks. A recently introduced technique, ReQueST, aims to solve this problem by
learning a neural simulator of the environment from safe human trajectories,
then using the learned simulator to efficiently learn a reward model from human
feedback. However, it …
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US