all AI news
Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
Feb. 22, 2024, 5:46 a.m. | Te-Lin Wu, Alex Spangher, Pegah Alipoormolabashi, Marjorie Freedman, Ralph Weischedel, Nanyun Peng
cs.CV updates on arXiv.org arxiv.org
Abstract: The ability to sequence unordered events is an essential skill to comprehend and reason about real world task procedures, which often requires thorough understanding of temporal common sense and multimodal information, as these procedures are often communicated through a combination of texts and images. Such capability is essential for applications such as sequential task planning and multi-source instruction summarization. While humans are capable of reasoning about and sequencing unordered multimodal procedural instructions, whether current machine …
abstract arxiv capability combination common sense cs.cl cs.cv events images information knowledge multimodal reason sense sequencing temporal through type understanding world
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
#13721 - Data Engineer - AI Model Testing
@ Qualitest | Miami, Florida, United States
Elasticsearch Administrator
@ ManTech | 201BF - Customer Site, Chantilly, VA