Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations. (arXiv:2204.04773v2 [stat.ML] UPDATED)
May 27, 2022, 1:11 a.m. | Hongju Park, Mohamad Kazem Shirani Faradonbeh
Source: stat.ML updates on arXiv.org
Contextual bandits are canonical models for sequential decision-making under
uncertainty in environments with time-varying components. In this setting, the
expected reward of each bandit arm consists of the inner product of an unknown
parameter with the context vector of that arm. Classical bandit settings rely
heavily on the assumption that the contexts are fully observed, while the study
of the richer model of imperfectly observed contextual bandits remains immature. This
work considers Greedy reinforcement learning policies that take actions as if …
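
The linear reward model described in the abstract can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' policy: the abstract is truncated before it specifies how the Greedy policy uses the observations, so the additive Gaussian observation noise, the plug-in estimate `theta_hat`, and all function names below are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def expected_reward(theta, context):
    """Expected reward of an arm: inner product of the unknown
    parameter theta with that arm's context vector."""
    return dot(theta, context)


def observe(context, noise_sd, rng):
    """Imperfect observation of a context.
    ASSUMPTION: additive Gaussian noise; the abstract does not
    state the observation model."""
    return [x + rng.gauss(0.0, noise_sd) for x in context]


def greedy_arm(theta_hat, observed_contexts):
    """Pick the arm whose estimated reward is largest, with no
    explicit exploration. theta_hat is a hypothetical current
    estimate of the unknown parameter."""
    scores = [dot(theta_hat, x) for x in observed_contexts]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, 0.0]                      # unknown in practice
    contexts = [[0.2, 0.9], [0.8, 0.1]]     # true contexts, one per arm
    noisy = [observe(x, 0.1, rng) for x in contexts]
    arm = greedy_arm(theta, noisy)          # using theta as a stand-in estimate
    print("chosen arm:", arm)
    print("its expected reward:", expected_reward(theta, contexts[arm]))
```

In a full learning loop, `theta_hat` would be re-estimated from the rewards seen so far (for example by least squares) before each call to `greedy_arm`; that step is omitted here because the truncated abstract does not describe it.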