April 16, 2024, 4:48 a.m. | Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Jing Ma

cs.CV updates on arXiv.org

arXiv:2404.09486v1 Announce Type: cross
Abstract: Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work on investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in …
