April 24, 2024, 4:43 a.m. | Matias Alvo, Daniel Russo, Yash Kanoria

cs.LG updates on arXiv.org

arXiv:2306.11246v2 Announce Type: replace
Abstract: We argue that inventory management presents unique opportunities for reliably applying and evaluating deep reinforcement learning (DRL). Toward reliable application, we emphasize and test two techniques. The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment, as is common with generic policy gradient methods. Our second technique involves aligning policy (neural) network architectures with the structure of …
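
The abstract's first idea, descending the gradient of realized cost on fixed demand samples rather than deploying a randomized policy, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-parameter base-stock policy with zero lead time and a newsvendor-style per-period cost, and every name here (`hindsight_cost_and_grad`, `train_base_stock`, the holding cost `h`, the shortage cost `p`, the level `S`) is hypothetical. The derivative is written out by hand instead of relying on an automatic-differentiation library.

```python
import random


def hindsight_cost_and_grad(S, demands, h=1.0, p=4.0):
    """Average per-period cost of order-up-to level S on a fixed demand trace,
    together with its pathwise derivative with respect to S.

    Assumed toy cost: h per unit left over, p per unit short.
    """
    cost, grad = 0.0, 0.0
    for d in demands:
        if S >= d:
            cost += h * (S - d)   # leftover stock
            grad += h
        else:
            cost += p * (d - S)   # unmet demand
            grad -= p
    n = len(demands)
    return cost / n, grad / n


def train_base_stock(traces, S0=0.0, lr=0.5, epochs=200, seed=0):
    """Stochastic gradient descent over demand traces, one trace per step.

    The policy itself is deterministic; the only randomness is which
    recorded trace is replayed next.
    """
    rng = random.Random(seed)
    S = S0
    for epoch in range(epochs):
        step = lr / (1 + epoch) ** 0.5   # decaying step size
        order = traces[:]
        rng.shuffle(order)
        for trace in order:
            _, g = hindsight_cost_and_grad(S, trace, )
            S -= step * g
    return S


# Synthetic demand traces (an assumption for illustration only).
_rng = random.Random(1)
traces = [[_rng.uniform(0, 10) for _ in range(50)] for _ in range(40)]
S_star = train_base_stock(traces)
```

For this toy cost the minimizer is known in closed form as the p/(h+p) quantile of demand, which gives a simple check on the learned level. With a neural policy and a multi-period simulator, the hand-written derivative would presumably be replaced by automatic differentiation through the simulated cost.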

