Feb. 27, 2024, 5:51 a.m. | Yintao Tai, Xiyang Liao, Alessandro Suglia, Antonio Vergari

cs.CL updates on arXiv.org

arXiv:2401.03321v2 Announce Type: replace
Abstract: Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they cannot be used for generative tasks such as free-form question answering. In this work, we introduce PIXAR, the first pixel-based autoregressive …


