Feb. 12, 2024, 5:41 a.m. | Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor

cs.LG updates on arXiv.org

\textit{Std} $Q$-target (\textit{SQT}) is a \textit{conservative}, actor-critic, ensemble, $Q$-learning-based algorithm built on a single key $Q$-formula: the standard deviation of the $Q$-networks, which acts as an "uncertainty penalty" and serves as a minimalistic solution to the problem of \textit{overestimation} bias. We implement \textit{SQT} on top of the TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms DDPG, TD3 and TD7 on seven popular MuJoCo and Bullet tasks. Our results demonstrate the superiority of \textit{SQT}'s $Q$-target formula over \textit{TD3}'s $Q$-target formula as a \textit{conservative} solution …
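The abstract names the ingredients of the \textit{SQT} target (an ensemble of $Q$-networks whose standard deviation serves as an uncertainty penalty) but this excerpt does not state the exact formula. The sketch below is therefore an assumption, not the authors' method: it subtracts a hypothetical coefficient `alpha` times the ensemble's standard deviation from the ensemble mean, and contrasts this with TD3's clipped double-$Q$ target, which takes the minimum over its target critics. The function names, `alpha`, `gamma`, and the single-transition scalar interface are all illustrative.

```python
import statistics
from typing import Sequence


def td3_target(reward: float, next_qs: Sequence[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """TD3-style clipped double-Q target: bootstrap from the minimum critic estimate."""
    bootstrap = 0.0 if done else gamma * min(next_qs)
    return reward + bootstrap


def std_penalized_target(reward: float, next_qs: Sequence[float], alpha: float = 1.0,
                         gamma: float = 0.99, done: bool = False) -> float:
    """Hypothetical std-penalized target (assumed form, not the paper's exact formula).

    next_qs holds one next-state Q estimate per ensemble member.
    The target bootstraps from mean(next_qs) - alpha * std(next_qs), so the more
    the ensemble members disagree, the lower (more conservative) the target.
    """
    mean_q = statistics.fmean(next_qs)   # ensemble mean
    std_q = statistics.pstdev(next_qs)   # ensemble (population) standard deviation
    bootstrap = 0.0 if done else gamma * (mean_q - alpha * std_q)
    return reward + bootstrap


if __name__ == "__main__":
    estimates = [10.0, 12.0, 14.0]  # made-up critic outputs for one transition
    print("TD3 target:          ", td3_target(1.0, estimates))
    print("std-penalized target:", std_penalized_target(1.0, estimates, alpha=1.0))
```

When all critics agree the penalty vanishes and the target reduces to the plain ensemble mean; as they disagree, the target drops below the mean. With the made-up estimates above, TD3 bootstraps from the minimum (10), while the std-penalized sketch bootstraps from the mean (12) minus one standard deviation. How strongly to penalize is governed by `alpha`, which is a free, illustrative parameter here.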
