March 21, 2024, 4:48 a.m. | Michal Mochtak, Peter Rupnik, Nikola Ljubešić

cs.CL updates on arXiv.org

arXiv:2309.09783v2 Announce Type: replace
Abstract: The paper presents a new training dataset of sentences in 7 languages, manually annotated for sentiment, which is used in a series of experiments focused on training a robust sentiment identifier for parliamentary proceedings. The paper also introduces the first domain-specific multilingual transformer language model for political science applications, which was additionally pre-trained on 1.72 billion words from the proceedings of 27 European parliaments. We present experiments demonstrating how the additional pre-training on parliamentary data …

