March 18, 2024, 4:42 a.m. | Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu

cs.LG updates on arXiv.org

arXiv:2310.07297v3 Announce Type: replace
Abstract: Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow, requiring tens to hundreds of iterative inference steps for a single action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior …
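The abstract sketches a score-regularized policy update: a deterministic policy is trained to maximize the critic's Q-value while a frozen, pretrained diffusion behavior model supplies a score function that steers the policy gradient toward the behavior distribution. Below is a minimal, hedged sketch of that idea in PyTorch. The interfaces `policy`, `critic`, and `behavior_score`, the toy noise schedule, and the weight `eta` are illustrative assumptions, not the paper's actual API.

```python
import torch

def score_regularized_loss(policy, critic, behavior_score, states, eta=1.0):
    """Hedged sketch of a score-regularized policy update.

    Assumed interfaces (hypothetical, not the paper's code):
      policy(s)                 -> deterministic action
      critic(s, a)              -> Q-value estimate
      behavior_score(a_t, s, t) -> score of the frozen diffusion behavior
                                   model at noisy action a_t and time t
    """
    actions = policy(states)

    # Q-maximization term (negated because we minimize the loss).
    q_loss = -critic(states, actions).mean()

    # Behavior regularization: perturb the action as in diffusion training,
    # query the pretrained score, and inject it into the policy gradient.
    t = torch.rand(states.shape[0], device=states.device)  # random diffusion time
    sigma = t.view(-1, 1)                                  # toy noise schedule (assumption)
    noisy = actions + sigma * torch.randn_like(actions)
    score = behavior_score(noisy, states, t).detach()      # no grad through the frozen model

    # Minimizing this term pushes actions along +score, i.e. toward
    # high-density regions of the behavior distribution.
    reg_loss = -(score * actions).sum(dim=-1).mean()

    return q_loss + eta * reg_loss
```

Detaching the score keeps the diffusion model frozen as a pure regularizer, which is what allows the extracted policy to act in a single forward pass instead of tens to hundreds of denoising steps.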

