Web: http://arxiv.org/abs/2201.10447

Jan. 26, 2022, 2:11 a.m. | Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li

cs.LG updates on arXiv.org

Temporal difference (TD) learning is a widely used method for policy evaluation
in reinforcement learning. Although many TD learning methods have been developed
in recent years, little attention has been paid to preserving privacy, and most
existing approaches may raise data-privacy concerns for users. To enable
policies with complex representational abilities, in this paper we consider
preserving privacy in TD learning with nonlinear value function approximation.
This is challenging because such a nonlinear problem is …

arxiv learning optimization stochastic
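For readers unfamiliar with the setting, the sketch below shows plain semi-gradient TD(0) with a nonlinear (one-hidden-layer) value function approximator, which is the kind of estimator the abstract refers to. It is only an illustration of the baseline setting, not the paper's privacy-preserving algorithm: the toy environment, reward signal, network size, and hyperparameters are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch of semi-gradient TD(0) with a one-hidden-layer (nonlinear)
# value function approximator. Illustrative only; NOT the privacy-preserving
# method from the paper. The transition model, reward, and hyperparameters
# below are assumed for demonstration.

rng = np.random.default_rng(0)

STATE_DIM, HIDDEN = 4, 16
W1 = rng.normal(scale=0.1, size=(HIDDEN, STATE_DIM))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(scale=0.1, size=HIDDEN)

def value(s):
    """V(s) = w2 . tanh(W1 s + b1); returns the value and the hidden activations."""
    h = np.tanh(W1 @ s + b1)
    return w2 @ h, h

def td0_update(s, r, s_next, gamma=0.99, lr=1e-2):
    """One semi-gradient TD(0) step: theta += lr * delta * grad V(s)."""
    global W1, b1, w2
    v, h = value(s)
    v_next, _ = value(s_next)                # no gradient through the bootstrap target
    delta = r + gamma * v_next - v           # TD error
    grad_w2 = h                              # dV/dw2
    grad_pre = w2 * (1.0 - h ** 2)           # backprop through tanh
    grad_W1 = np.outer(grad_pre, s)          # dV/dW1
    grad_b1 = grad_pre                       # dV/db1
    w2 += lr * delta * grad_w2
    W1 += lr * delta * grad_W1
    b1 += lr * delta * grad_b1
    return delta

# Toy usage: evaluate a fixed policy on assumed random-walk transitions.
for _ in range(1000):
    s = rng.normal(size=STATE_DIM)
    s_next = s + 0.1 * rng.normal(size=STATE_DIM)
    r = float(s[0])                          # assumed reward signal
    td0_update(s, r, s_next)
```

A privacy-preserving variant would modify how these gradient-style updates are computed or released (for example, by perturbing them), but the specific mechanism studied in the paper is not reproduced here.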
