Feb. 13, 2024, 5:44 a.m. | Pouya Hamadanian, Arash Nasr-Esfahany, Malte Schwarzkopf, Siddartha Sen, Mohammad Alizadeh

cs.LG updates on arXiv.org

We study online reinforcement learning (RL) in non-stationary environments, where a time-varying exogenous context process affects the environment dynamics. Online RL is challenging in such environments due to "catastrophic forgetting" (CF): the agent tends to forget prior knowledge as it trains on new experiences. Prior approaches that mitigate this issue either assume task labels (which are often unavailable in practice) or use off-policy methods that suffer from instability and poor performance.
We present Locally Constrained Policy Optimization (LCPO), an online …
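The abstract is truncated and does not spell out LCPO's mechanism. As a hedged illustration only, the sketch below shows one common way to counter catastrophic forgetting in this setting: an on-policy gradient step augmented with a KL-divergence anchor that penalizes drift of the policy's outputs on stored states from earlier contexts. All names here (`anchored_update`, the anchor buffer, the weight `lam`) are illustrative assumptions, not the paper's actual algorithm.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(P, Q):
    """Mean KL(P || Q) over a batch of categorical distributions."""
    return float(np.mean(np.sum(P * (np.log(P + 1e-12) - np.log(Q + 1e-12)), axis=1)))

class LinearPolicy:
    """Toy softmax policy over linear features: pi(a|s) = softmax(s @ W)."""
    def __init__(self, n_feat, n_act):
        self.W = rng.normal(scale=0.1, size=(n_feat, n_act))
    def probs(self, S):
        return softmax(S @ self.W)

def anchored_update(pi, S_new, A_new, adv, S_anchor, P_anchor, lam, lr=0.1):
    """One policy-gradient step with a KL anchor on old-context states.

    Objective: -E[adv * log pi(a|s)]  +  lam * KL(P_anchor || pi(.|S_anchor)).
    Gradients are analytic for the linear-softmax policy.
    """
    # Policy-gradient term on the new experience batch.
    P = pi.probs(S_new)
    onehot = np.eye(P.shape[1])[A_new]
    g_pg = S_new.T @ (adv[:, None] * (P - onehot)) / len(S_new)
    # KL anchor term: d/dW KL(P_anchor || softmax(S W)) = S^T (Q - P_anchor).
    Q = pi.probs(S_anchor)
    g_kl = S_anchor.T @ (Q - P_anchor) / len(S_anchor)
    pi.W -= lr * (g_pg + lam * g_kl)

# Demo: the anchored policy drifts less on old-context states.
n_feat, n_act = 4, 3
pi0 = LinearPolicy(n_feat, n_act)
S_anchor = rng.normal(size=(32, n_feat))
P_anchor = pi0.probs(S_anchor)          # snapshot of prior behavior

S_new = rng.normal(size=(64, n_feat))
A_new = rng.integers(0, n_act, size=64)
adv = rng.normal(size=64)

pi_free = copy.deepcopy(pi0)            # plain updates (lam = 0)
pi_anch = copy.deepcopy(pi0)            # anchored updates (lam > 0)
for _ in range(50):
    anchored_update(pi_free, S_new, A_new, adv, S_anchor, P_anchor, lam=0.0)
    anchored_update(pi_anch, S_new, A_new, adv, S_anchor, P_anchor, lam=5.0)

drift_free = mean_kl(P_anchor, pi_free.probs(S_anchor))
drift_anch = mean_kl(P_anchor, pi_anch.probs(S_anchor))
```

Under these toy assumptions, `drift_anch` ends up well below `drift_free`: the anchor term keeps the policy's behavior on old-context states close to the snapshot while the policy-gradient term adapts to the new batch.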

