March 7, 2024, 5:45 a.m. | Emanuele Vivoli, Joan Lafuente Baeza, Ernest Valveny Llobet, Dimosthenis Karatzas

cs.CV updates on arXiv.org arxiv.org

arXiv:2403.03719v1 Announce Type: new
Abstract: This work explores a closure task in comics, a medium where visual and textual elements are intricately intertwined. Specifically, Text-cloze refers to the task of selecting the correct text to use in a comic panel, given its neighboring panels. Traditional methods based on recurrent neural networks have struggled with this task due to limited OCR accuracy and inherent model limitations. We introduce a novel Multimodal Large Language Model (Multimodal-LLM) architecture, specifically designed for Text-cloze, achieving …

abstract arxiv comics cs.cv medium multimodal networks neural networks panel panels recurrent neural networks text textual transformer type visual work

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

AIML - Sr Machine Learning Engineer, Data and ML Innovation

@ Apple | Seattle, WA, United States

Senior Data Engineer

@ Palta | Palta Cyprus, Palta Warsaw, Palta remote