Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

April 3, 2024, 4:42 a.m. | Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.02108v1 Announce Type: new
Abstract: We present two Policy Gradient-based methods with general parameterization in the context of infinite horizon average reward Markov Decision Processes. The first approach employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of the order $\tilde{\mathcal{O}}(T^{3/5})$. The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$. Both results significantly improve on the previous state of the art for this problem, which achieves a regret of $\tilde{\mathcal{O}}(T^{3/4})$.
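
To give a rough sense of how far apart the three rates in the abstract are, the sketch below evaluates only their polynomial parts at a few horizons. It is an illustration, not anything from the paper: the constants and logarithmic factors hidden by the $\tilde{\mathcal{O}}$ notation are ignored, and the names `RATES`, `regret_scale`, and `average_regret_scale` are hypothetical helpers introduced here.

```python
# Illustrative only: compares T**exponent for the three regret rates quoted
# in the abstract. Constants and log factors hidden by O-tilde are ignored,
# so these numbers show growth trends, not actual regret values.

# Assumed labels; the exponents are the ones stated in the abstract.
RATES = {
    "previous state of the art, T^(3/4)": 0.75,
    "Implicit Gradient Transport, T^(3/5)": 0.6,
    "Hessian-based, T^(1/2)": 0.5,
}


def regret_scale(T: int, exponent: float) -> float:
    """Polynomial part of a regret bound: T raised to the given exponent."""
    return float(T) ** exponent


def average_regret_scale(T: int, exponent: float) -> float:
    """Per-step version of the same quantity: T**(exponent - 1)."""
    return regret_scale(T, exponent) / T


if __name__ == "__main__":
    for T in (10**4, 10**6, 10**8):
        print(f"T = {T}")
        for label, exponent in RATES.items():
            total = regret_scale(T, exponent)
            per_step = average_regret_scale(T, exponent)
            print(f"  {label}: total ~ {total:.3g}, per step ~ {per_step:.3g}")
```

Because every exponent is below 1, the per-step quantity shrinks as $T$ grows in all three cases; the smaller the exponent, the faster it shrinks.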
