April 9, 2024, 4:41 a.m. | I-Hong Hou

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.04509v1 Announce Type: new
Abstract: This paper studies multi-stage systems with end-to-end bandit feedback. In such systems, each job needs to go through multiple stages, each managed by a different agent, before generating an outcome. Each agent can only control its own action and learn the final outcome of the job. It has neither knowledge nor control on actions taken by agents in the next stage. The goal of this paper is to develop distributed online learning algorithms that achieve …

abstract agent arxiv control cs.lg cs.ni distributed feedback job learn managed multiple paper stage studies systems through type

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote