April 12, 2024, 4:41 a.m. | Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama

cs.LG updates on arXiv.org

arXiv:2404.07465v1 Announce Type: new
Abstract: In this paper, we investigate an offline reinforcement learning (RL) problem where datasets are collected from two domains. In this scenario, having datasets with domain labels facilitates efficient policy training. However, in practice, the task of assigning domain labels can be resource-intensive or infeasible at a large scale, leading to a prevalence of domain-unlabeled data. To formalize this challenge, we introduce a novel offline RL problem setting named Positive-Unlabeled Offline RL (PUORL), which incorporates domain-unlabeled …
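The abstract is cut off before it describes how PUORL is addressed, but the positive-unlabeled framing suggests training a binary domain classifier from the domain-labeled ("positive") and domain-unlabeled transitions with a PU risk estimator, then using its predictions during policy training. Below is a minimal, hypothetical sketch using the non-negative PU (nnPU) risk of Kiryo et al. (2017); the names `DomainClassifier`, `nnpu_loss`, the logistic surrogate loss, and the class prior `pi_p` are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainClassifier(nn.Module):
    """Predicts the binary domain of a transition from its features."""
    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw logit; > 0 means "labeled (positive) domain".
        return self.net(x).squeeze(-1)

def nnpu_loss(logits_pos: torch.Tensor,
              logits_unl: torch.Tensor,
              pi_p: float) -> torch.Tensor:
    """Non-negative PU risk (Kiryo et al., 2017) with the logistic
    surrogate: ell(z, +1) = softplus(-z), ell(z, -1) = softplus(z).
    pi_p is the (assumed known) prior of the positive domain among
    the unlabeled transitions."""
    risk_pos = F.softplus(-logits_pos).mean()        # positives labeled +1
    risk_pos_as_neg = F.softplus(logits_pos).mean()  # positives labeled -1
    risk_unl_as_neg = F.softplus(logits_unl).mean()  # unlabeled labeled -1
    # Estimated negative-class risk, clamped at zero: this clamp is the
    # nnPU correction that keeps the risk estimate from going negative.
    neg_risk = risk_unl_as_neg - pi_p * risk_pos_as_neg
    return pi_p * risk_pos + torch.clamp(neg_risk, min=0.0)

# One illustrative training step on batches of transition features,
# where x_pos comes from the labeled domain and x_unl is domain-unlabeled;
# pi_p = 0.5 is an arbitrary placeholder prior.
# clf = DomainClassifier(feat_dim=x_pos.shape[-1])
# opt = torch.optim.Adam(clf.parameters(), lr=3e-4)
# loss = nnpu_loss(clf(x_pos), clf(x_unl), pi_p=0.5)
# opt.zero_grad(); loss.backward(); opt.step()
```

Once trained, such a classifier could assign pseudo-domain labels to the unlabeled transitions, after which any off-the-shelf offline RL algorithm could be run per domain; whether the paper does exactly this is not stated in the truncated abstract.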

