Web: http://arxiv.org/abs/2203.08957

June 17, 2022, 1:11 a.m. | Zifan Wang, Yi Shen, Michael M. Zavlanos

cs.LG updates on arXiv.org

We consider an online stochastic game with risk-averse agents whose goal is
to learn optimal decisions that minimize the risk of incurring significantly
high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk
measure that the agents can estimate using bandit feedback in the form of the
cost values of only their selected actions. Since the distributions of the cost
functions depend on the actions of all agents, which are generally unobservable,
they are themselves unknown …
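
To make the risk measure concrete, here is a minimal sketch of an empirical CVaR computed from sampled costs. It is not the estimator from the paper, which works from bandit feedback and is not given in this excerpt. The helper name `empirical_cvar` and the convention that `alpha` is the fraction of worst outcomes to average are assumptions made for illustration; other texts parameterize CVaR by the confidence level instead.

```python
import math


def empirical_cvar(costs, alpha):
    """Hypothetical helper: mean of the worst ceil(alpha * n) sampled costs.

    Assumed convention: alpha in (0, 1] is the tail fraction, so a smaller
    alpha means a more risk-averse agent and alpha = 1 gives the plain mean.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not costs:
        raise ValueError("need at least one cost sample")
    k = math.ceil(alpha * len(costs))          # number of tail samples kept
    worst = sorted(costs, reverse=True)[:k]    # the k largest costs
    return sum(worst) / k


samples = [1.0, 2.0, 3.0, 4.0, 10.0]
tail_risk = empirical_cvar(samples, 0.4)   # averages the two largest costs
mean_cost = empirical_cvar(samples, 1.0)   # risk-neutral special case
```

In this sketch the single large cost dominates the tail average while barely moving the mean, which is the sense in which CVaR penalizes "significantly high costs".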


