Jan. 7, 2022, 2:10 a.m. | Rahul Singh, Abhishek Gupta, Ness B. Shroff

cs.LG updates on arXiv.org

We consider reinforcement learning (RL) in Markov Decision Processes in which
an agent repeatedly interacts with an environment that is modeled by a
controlled Markov process. At each time step $t$, the agent earns a reward and
incurs a cost vector consisting of $M$ costs. We design model-based RL
algorithms that maximize the cumulative reward earned over a time horizon of
$T$ time-steps, while simultaneously ensuring that the average values of the
$M$ cost expenditures are bounded by agent-specified thresholds
$c^{ub}_i,i=1,2,\ldots,M$. …
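
The objective described above can be sketched in symbols. The notation is assumed for illustration and does not appear in the excerpt: $r_t$ is the reward at step $t$, $c_{i,t}$ is the $i$-th cost at step $t$, and $\pi$ is the agent's policy. The excerpt also does not say whether the constraints are meant in expectation or along the sample path; the expectation form below is one possible reading.

$$
\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{t=1}^{T} r_t\right]
\quad \text{subject to} \quad
\frac{1}{T}\,\mathbb{E}_{\pi}\left[\sum_{t=1}^{T} c_{i,t}\right] \le c^{ub}_i,
\qquad i = 1, 2, \ldots, M.
$$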

