all AI news
Tur[k]ingBench: A Challenge Benchmark for Web Agents
March 19, 2024, 4:50 a.m. | Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi
cs.CV updates on arXiv.org arxiv.org
Abstract: Recent chatbots have demonstrated impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages, where text is intertwined with other modalities and tasks are accomplished in the form of various complex interactions. Can state-of-the-art multi-modal models generalize to such complex domains?
To address this question, we introduce TurkingBench, a benchmark of tasks formulated as …
abstract agents arxiv benchmark challenge chatbots cs.ai cs.cl cs.cv cs.hc example form however humans raw spend tasks text type web world
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
2 days, 10 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Field Sample Specialist (Air Sampling) - Eurofins Environment Testing – Pueblo, CO
@ Eurofins | Pueblo, CO, United States
Camera Perception Engineer
@ Meta | Sunnyvale, CA