Web: http://arxiv.org/abs/2201.10542

Jan. 26, 2022, 2:11 a.m. | Akshay Mete, Rahul Singh, P. R. Kumar

cs.LG updates on arXiv.org

We consider the problem of controlling a stochastic linear system with
quadratic costs when the system parameters are unknown to the agent, a
setting known as the adaptive LQG control problem. We re-examine an approach called
"Reward-Biased Maximum Likelihood Estimate" (RBMLE) that was proposed more than
forty years ago, and which predates the "Upper Confidence Bound" (UCB) method
as well as the definition of "regret". It simply added a term favoring
parameters with larger rewards to the estimation criterion. We propose …

