MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting | allainews.com

April 16, 2024, 4:50 a.m. | Avinash Anand, Janak Kapuriya, Apoorv Singh, Jay Saraf, Naman Lal, Astha Verma, Rushali Gupta, Rajiv Shah

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.08704v1 Announce Type: new
Abstract: While Large Language Models (LLMs) can achieve human-level performance in various tasks, they continue to face challenges when it comes to effectively tackling multi-step physics reasoning tasks. To identify the shortcomings of existing models and facilitate further research in this area, we curated a novel dataset, MM-PhyQA, which comprises well-constructed, high schoollevel multimodal physics problems. By evaluating the performance of contemporary LLMs that are publicly available, both with and without the incorporation of multimodal elements …

abstract arxiv challenges cs.ai cs.cl dataset face human identify image language language models large language large language models llms multimodal novel performance physics prompting question reasoning research tasks type

More from arxiv.org / cs.CL updates on arXiv.org

Sparse is Enough in Fine-tuning Pre-trained Large Language Models 22 hours ago | arxiv.org

arxiv cs.ai cs.cl cs.lg +6

On the Learnability of Watermarks for Language Models 22 hours ago | arxiv.org

abstract arxiv cs.cl cs.cr +17

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization 22 hours ago | arxiv.org

abstract arxiv capabilities cs.ai +14

Evaluating Generative Ad Hoc Information Retrieval 22 hours ago | arxiv.org

abstract advances arxiv cs.cl +19

Language Models As Semantic Indexers 22 hours ago | arxiv.org

arxiv cs.cl cs.ir cs.lg +4

Large language models can accurately predict searcher preferences 22 hours ago | arxiv.org

abstract arxiv cs.ai cs.cl +16

On the Reliability of Watermarks for Large Language Models 22 hours ago | arxiv.org

abstract arxiv become bots +28

A Watermark for Large Language Models 22 hours ago | arxiv.org

abstract arxiv cs.cl cs.cr +16

CreoleVal: Multilingual Multitask Benchmarks for Creoles 22 hours ago | arxiv.org

abstract annotated data arxiv benchmarks +14

AI Engineer Intern, Agents

@ Occam AI | US

View on ai-jobs.net

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Lead Data Modeler

@ Sherwin-Williams | Cleveland, OH, United States

View on ai-jobs.net