April 9, 2024, 4:51 a.m. | Arthur Amalvy (LIA), Vincent Labatut (LIA), Richard Dufour (LS2N - équipe TALN)

cs.CL updates on arXiv.org

arXiv:2310.10118v3 Announce Type: replace
Abstract: While recent pre-trained transformer-based models can perform named entity recognition (NER) with great accuracy, their limited range remains an issue when applied to long documents such as whole novels. To alleviate this issue, a solution is to retrieve relevant context at the document level. Unfortunately, the lack of supervision for such a task means one has to settle for unsupervised approaches. Instead, we propose to generate a synthetic context retrieval training dataset using Alpaca, an …

