April 2, 2024, 7:51 p.m. | Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala

cs.CL updates on arXiv.org

arXiv:2404.00439v1
Abstract: Applying natural language processing models to PDF documents is pivotal for many business applications, yet training models for this purpose remains a challenge for businesses because of specific hurdles. These include the complexity of PDF formats, which requires parsing text and layout information to curate training data, and the lack of privacy-preserving annotation tools. This paper introduces DOCMASTER, a unified platform designed for PDF document annotation, model training, and inference, …

