Feb. 21, 2024, 5:46 a.m. | Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

cs.CV updates on arXiv.org

arXiv:2311.16476v2 Announce Type: replace
Abstract: Geometry problem solving (GPS) is a challenging mathematical reasoning task that requires multimodal understanding, fusion, and reasoning. Existing neural solvers treat GPS as a vision-language task but fall short in representing geometry diagrams, which carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, integrated with two new modules: a multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic pre-training (SSP) to implement …
