June 29, 2022, 1:11 a.m. | Srivas Chennu, Andrew Maher, Jamie Martin, Subash Prabanantham

stat.ML updates on arXiv.org

Real-world applications of reinforcement learning for recommendation and
experimentation face a practical challenge: the relative reward of different
bandit arms can evolve over the lifetime of the learning agent. To deal with
these non-stationary cases, the agent must forget some historical knowledge, as
it may no longer be relevant to minimising regret. We present a solution to
handling non-stationarity that is suitable for deployment at scale, to provide
business operators with automated adaptive optimisation. Our solution aims to
provide interpretable …
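
To make the idea of "forgetting" concrete, here is a minimal sketch of one common way a bandit agent can discount old evidence: exponentially decaying the success and failure counts of a Beta-Bernoulli Thompson sampler. This is not the authors' method, which the truncated abstract does not specify; the class name `DiscountedBernoulliBandit`, the `discount` parameter, and the uniform Beta(1, 1) prior are all assumptions chosen for illustration.

```python
import random


class DiscountedBernoulliBandit:
    """Thompson sampling with exponential forgetting (illustrative sketch only)."""

    def __init__(self, n_arms, discount=0.95, seed=None):
        if not 0.0 < discount <= 1.0:
            raise ValueError("discount must be in (0, 1]")
        self.discount = discount            # hypothetical forgetting factor
        self.successes = [0.0] * n_arms     # discounted success counts per arm
        self.failures = [0.0] * n_arms      # discounted failure counts per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior)
        # and play the arm with the largest sample.
        samples = [
            self.rng.betavariate(1.0 + s, 1.0 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Decay the evidence for every arm so old observations fade,
        # then add the new binary reward (0 or 1) to the played arm.
        for i in range(len(self.successes)):
            self.successes[i] *= self.discount
            self.failures[i] *= self.discount
        self.successes[arm] += reward
        self.failures[arm] += 1.0 - reward

    def posterior_mean(self, arm):
        s, f = self.successes[arm], self.failures[arm]
        return (1.0 + s) / (2.0 + s + f)
```

With `discount=1.0` this reduces to ordinary Thompson sampling that never forgets. Smaller values shorten the agent's effective memory, so the estimate for an arm whose reward rate changes tracks the new rate more quickly, at the cost of noisier estimates when rewards are in fact stationary.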

