Sept. 30, 2022, 1:14 a.m. | Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

stat.ML updates on arXiv.org

Among the reasons hindering reinforcement learning (RL) applications to
real-world problems, two factors are critical: limited data and the mismatch
between the testing environment (real environment in which the policy is
deployed) and the training environment (e.g., a simulator). This paper attempts
to address these issues simultaneously with distributionally robust offline RL,
where we learn a distributionally robust policy using historical data obtained
from the source environment by optimizing against a worst-case perturbation
thereof. In particular, we move beyond tabular …
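
The abstract describes the core idea: a robust Bellman backup that evaluates each action against the worst-case transition model in an uncertainty set around the nominal (source-environment) dynamics. The paper itself moves beyond the tabular setting, but a tabular sketch makes the mechanism concrete. The snippet below is a minimal illustration, not the paper's algorithm: it assumes a total-variation uncertainty set of radius `rho` around a nominal transition model `P`, and all names (`P`, `R`, `rho`, `worst_case_expectation`) are illustrative.

```python
# Minimal sketch of distributionally robust value iteration on a small
# tabular MDP with a total-variation uncertainty set (an assumption for
# illustration; the paper considers offline data and function approximation).
import numpy as np
from scipy.optimize import linprog

def worst_case_expectation(p, v, rho):
    """min_{q in simplex, ||q - p||_1 <= rho} q @ v, solved as a small LP."""
    n = len(p)
    # Variables: q (n) and slack t (n) with t >= |q - p|.
    c = np.concatenate([v, np.zeros(n)])
    A_ub = np.block([
        [np.eye(n), -np.eye(n)],      # q - t <= p
        [-np.eye(n), -np.eye(n)],     # -q - t <= -p  (i.e. t >= p - q)
        [np.zeros((1, n)), np.ones((1, n))],  # sum(t) <= rho
    ])
    b_ub = np.concatenate([p, -p, [rho]])
    A_eq = np.concatenate([np.ones((1, n)), np.zeros((1, n))], axis=1)  # sum(q) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.fun

def robust_value_iteration(P, R, gamma=0.95, rho=0.1, iters=200):
    """P: (S, A, S) nominal transitions, R: (S, A) rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Robust Bellman backup: expectation under the worst-case
                # transition kernel within the TV ball around P[s, a].
                Q[s, a] = R[s, a] + gamma * worst_case_expectation(P[s, a], V, rho)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

The policy greedy with respect to this robust `Q` hedges against perturbations of the training dynamics, which is the intuition behind optimizing against a worst-case perturbation of the source environment.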

Tags: approximation, arXiv, function, linear, offline, reinforcement learning
