Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

May 7, 2024, 4:44 a.m. | David M. Bossens

cs.LG updates on arXiv.org arxiv.org

arXiv:2308.11267v2 Announce Type: replace
Abstract: The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the …
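
The abstract's step of "computing the worst-case dynamics based on value estimates for each state" can be illustrated with a minimal sketch. It assumes an L1 uncertainty set of radius `budget` around a nominal next-state distribution, which is one common choice and not necessarily the set or procedure used in the paper. The function name `worst_case_l1` and all of its parameters are hypothetical, introduced only for illustration.

```python
import numpy as np

def worst_case_l1(p_nominal, values, budget):
    """Sketch: pick the next-state distribution within an L1 ball of radius
    `budget` around `p_nominal` that minimises the expected value.
    Assumes an L1 uncertainty set; this is an illustrative choice."""
    p = np.asarray(p_nominal, dtype=float).copy()
    v = np.asarray(values, dtype=float)
    worst = int(np.argmin(v))  # state with the lowest value estimate
    # Moving mass m between two states changes the L1 distance by 2*m,
    # so at most budget/2 can be shifted onto the worst state.
    to_move = min(budget / 2.0, 1.0 - p[worst])
    p[worst] += to_move
    # Take the same amount of mass away from the highest-value states first.
    for s in np.argsort(-v):
        if to_move <= 0:
            break
        if s == worst:
            continue
        take = min(p[s], to_move)
        p[s] -= take
        to_move -= take
    return p

# Example: three successor states with value estimates 1.0, 0.0 and 2.0.
nominal = np.array([0.5, 0.3, 0.2])
values = np.array([1.0, 0.0, 2.0])
adversarial = worst_case_l1(nominal, values, budget=0.2)
print(adversarial, adversarial @ values)
```

The greedy shift works here because the objective is linear in the distribution: under the L1 assumption, the most damaging perturbation moves probability from the best-valued successors to the worst-valued one.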
