Feb. 21, 2024, 5:46 a.m. | Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

cs.CV updates on arXiv.org arxiv.org

arXiv:2311.16476v2 Announce Type: replace
Abstract: Geometry problem solving (GPS) is a challenging mathematical reasoning task requiring multi-modal understanding, fusion, and reasoning. Existing neural solvers treat GPS as a vision-language task but fall short in representing geometry diagrams that carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, integrated with two new modules: a multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic pre-training (SSP) to implement …
