Distributed No-Regret Learning for Multi-Stage Systems with End-to-End Bandit Feedback
April 9, 2024, 4:41 a.m. | I-Hong Hou
cs.LG updates on arXiv.org arxiv.org
Abstract: This paper studies multi-stage systems with end-to-end bandit feedback. In such systems, each job must pass through multiple stages, each managed by a different agent, before generating an outcome. Each agent controls only its own action and observes only the final outcome of the job. It has neither knowledge of nor control over the actions taken by agents in the next stage. The goal of this paper is to develop distributed online learning algorithms that achieve …
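To make the setting concrete, here is a minimal Python sketch of a two-stage system in which each stage's agent learns only from the job's final outcome. It is not the paper's algorithm: each agent simply runs a standard EXP3 bandit learner independently, and the chain topology (one agent per stage), the `MEANS` reward table, the exploration rate `gamma`, and the names `Exp3Agent` and `simulate` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical mean outcome for each (stage-1 action, stage-2 action) pair.
MEANS = [[0.2, 0.3],
         [0.4, 0.9]]


class Exp3Agent:
    """Standard EXP3 learner; it sees only its own action and a reward in [0, 1]."""

    def __init__(self, n_actions, gamma, rng):
        self.n = n_actions
        self.gamma = gamma
        self.rng = rng
        self.weights = [1.0] * n_actions
        self.last = None  # (action, probability) of the most recent choice

    def probs(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def act(self):
        p = self.probs()
        action = self.rng.choices(range(self.n), weights=p)[0]
        self.last = (action, p[action])
        return action

    def update(self, reward):
        # Importance-weighted reward estimate for the action actually played.
        action, prob = self.last
        estimate = reward / prob
        self.weights[action] *= math.exp(self.gamma * estimate / self.n)
        # Rescale by the largest weight; this leaves probs() unchanged
        # and keeps the weights from overflowing over long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(n_rounds, seed=0, gamma=0.1):
    """Run the two-stage chain; return (average outcome, list of agents)."""
    rng = random.Random(seed)
    agents = [Exp3Agent(2, gamma, rng), Exp3Agent(2, gamma, rng)]
    total = 0.0
    for _ in range(n_rounds):
        a1 = agents[0].act()  # stage 1 acts without knowing stage 2's choice
        a2 = agents[1].act()  # stage 2 acts on its own as well
        outcome = 1.0 if rng.random() < MEANS[a1][a2] else 0.0
        # End-to-end bandit feedback: both agents receive only the final outcome.
        for agent in agents:
            agent.update(outcome)
        total += outcome
    return total / n_rounds, agents


if __name__ == "__main__":
    average, _ = simulate(5000)
    print(f"average outcome over 5000 rounds: {average:.3f}")
```

From the first agent's point of view, the reward of each action depends on what the second agent is currently doing, so the environment it faces is non-stationary. That coupling is what makes the distributed setting harder than a single-agent bandit, and this sketch makes no claim about the regret of the combined system.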