The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings
March 21, 2024, 4:48 a.m. | Michal Mochtak, Peter Rupnik, Nikola Ljubešić
cs.CL updates on arXiv.org
Abstract: The paper presents a new training dataset of sentences in 7 languages, manually annotated for sentiment, which are used in a series of experiments focused on training a robust sentiment identifier for parliamentary proceedings. The paper additionally introduces the first domain-specific multilingual transformer language model for political science applications, which was additionally pre-trained on 1.72 billion words from parliamentary proceedings of 27 European parliaments. We present experiments demonstrating how the additional pre-training on parliamentary data …