Nov. 17, 2022, 2:13 a.m. | Shinto Eguchi

stat.ML updates on arXiv.org arxiv.org

This paper presents a new application of information geometry to
reinforcement learning, focusing on dynamic treatment regimes. In a standard
framework of reinforcement learning, a Q-function is defined as the conditional
expectation of a reward given a state and an action in the single-stage
setting. We introduce an equivalence relation, called policy equivalence,
on the space of all Q-functions. A class of information divergences is
defined on the Q-function space for every stage. The main objective …
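
As a rough illustration of the single-stage Q-function described above, the sketch below estimates Q(s, a) = E[R | S = s, A = a] by averaging observed rewards for each state-action pair. The function names, the toy data, and the greedy action rule are assumptions made for illustration only; they are not taken from the paper, and the sketch does not implement the divergence-based method the abstract refers to.

```python
from collections import defaultdict


def estimate_q(samples):
    """Estimate Q(s, a) = E[R | S = s, A = a] by averaging observed rewards."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in samples:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def greedy_policy(q):
    """For each state, pick the action with the largest estimated Q-value.

    This greedy rule is a common convention, assumed here for illustration.
    """
    best = {}
    for (state, action), value in q.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}


# Hypothetical toy data: (state, action, reward) triples.
samples = [
    ("mild", "drug_a", 1.0),
    ("mild", "drug_a", 0.0),
    ("mild", "drug_b", 1.0),
    ("severe", "drug_a", 1.0),
    ("severe", "drug_b", 0.0),
]

q = estimate_q(samples)
policy = greedy_policy(q)
print(q)
print(policy)
```

The tabular average only works when states and actions are discrete and each pair is observed at least once; continuous states would need a regression model in its place.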

