May 10, 2024, 4:41 a.m. | Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli

cs.LG updates on

arXiv:2405.05630v1 Announce Type: new
Abstract: Importance sampling (IS) is a fundamental technique underlying a large body of off-policy reinforcement learning (RL) approaches. Policy gradient (PG) methods, in particular, benefit significantly from IS, which enables the effective reuse of previously collected samples and thus increases sample efficiency. Classically, however, IS is employed in RL as a passive tool for re-weighting historical samples. In contrast, the statistical community employs IS as an active tool combined with the use of behavioral distributions that allow the reduction of …

