March 26, 2024, 4:45 a.m. | Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

cs.LG updates on arXiv.org arxiv.org

arXiv:2312.03009v2 Announce Type: replace-cross
Abstract: Current evaluation protocols predominantly assess physical reasoning in stationary scenes, creating a gap in evaluating agents' abilities to interact with dynamic events. While contemporary methods allow agents to modify initial scene configurations and observe consequences, they lack the capability to interact with events in real time. To address this, we introduce I-PHYRE, a framework that challenges agents to simultaneously exhibit intuitive physical reasoning, multi-step planning, and in-situ intervention. Here, intuitive physical reasoning refers to a …

abstract agents arxiv capability consequences cs.ai cs.cv cs.lg cs.ro current dynamic evaluation events gap interactive observe reasoning type

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Reporting & Data Analytics Lead (Sizewell C)

@ EDF | London, GB

Data Analyst

@ Notable | San Mateo, CA