Sept. 23, 2022, 1:12 a.m. | Haibin Zhou, Zichuan Lin, Junyou Li, Deheng Ye, Qiang Fu, Wei Yang

cs.LG updates on arXiv.org

We study the adaptation of soft actor-critic (SAC) from continuous action spaces
to discrete action spaces. We revisit vanilla SAC and provide an in-depth
analysis of its Q-value underestimation and performance instability issues
when applied to discrete settings. We therefore propose entropy-penalty and
double average Q-learning with Q-clip to address these issues. Extensive
experiments on typical benchmarks with discrete action space, including Atari
games and a large-scale MOBA game, show the efficacy of our proposed method.
Our code is …
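
The abstract does not spell out the update rules, so the following is only a minimal sketch of the background it relies on: in a discrete action space, the soft state value used by SAC can be computed as an exact expectation over actions, V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)). The `combine` switch contrasts the usual clipped double-Q minimum with a plain mean of the two critics; reading "double average Q-learning" as that mean is an assumption made here, not something the abstract confirms. The function names and toy numbers are hypothetical, and the entropy-penalty and Q-clip components are not shown.

```python
import math

def soft_state_value(probs, q1, q2, alpha, combine="min"):
    """Soft value V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    probs  : action probabilities pi(.|s) for one state
    q1, q2 : per-action estimates from two critics
    combine: "min" (clipped double-Q) or "mean" (assumed reading of
             "double average"; not confirmed by the abstract)
    """
    value = 0.0
    for p, a, b in zip(probs, q1, q2):
        if p <= 0.0:
            continue  # zero-probability actions contribute nothing
        q = min(a, b) if combine == "min" else 0.5 * (a + b)
        value += p * (q - alpha * math.log(p))
    return value

def td_target(reward, done, gamma, next_value):
    """One-step target r + gamma * V(s'), with no bootstrap at episode end."""
    return reward + gamma * (0.0 if done else next_value)

# Toy numbers (hypothetical): two actions, two critics that disagree.
probs = [0.5, 0.5]
q1 = [1.0, 3.0]
q2 = [2.0, 2.0]
v_min = soft_state_value(probs, q1, q2, alpha=0.0, combine="min")
v_mean = soft_state_value(probs, q1, q2, alpha=0.0, combine="mean")
```

Because a per-action minimum can never exceed the per-action mean, `v_min` is at most `v_mean` for any inputs, which illustrates one way a minimum over critics could push value estimates downward.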

