Oct. 12, 2022, 1:17 a.m. | Sosuke Nishikawa, Ikuya Yamada, Yoshimasa Tsuruoka, Isao Echizen

cs.CL updates on arXiv.org arxiv.org

We present a multilingual bag-of-entities model that effectively boosts the
performance of zero-shot cross-lingual text classification by extending a
multilingual pre-trained language model (e.g., M-BERT). It leverages the
multilingual nature of Wikidata: entities that represent the same concept
across languages share a single unique identifier. This allows entities
described in different languages to be represented with shared embeddings,
so a model trained on entity features in a resource-rich language can be
applied directly to other languages. Our experimental results …
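The core idea can be illustrated with a minimal sketch (hypothetical data and embedding table, not the authors' implementation): entity mentions in any language are resolved to language-independent Wikidata QIDs, so documents in different languages that mention the same concepts produce identical bag-of-entities features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared embedding table keyed by Wikidata QID.
# In the paper's setting these would be learned entity embeddings.
entity_emb = {
    "Q183": rng.normal(size=4),   # Germany
    "Q7397": rng.normal(size=4),  # software
    "Q30": rng.normal(size=4),    # United States
}

def bag_of_entities(qids, table, dim=4):
    """Average the shared embeddings of the entities detected in a document."""
    vecs = [table[q] for q in qids if q in table]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# An English and a German document mentioning the same concepts resolve
# to the same QIDs, hence to the same feature vector.
en_doc = ["Q183", "Q7397"]  # "Germany ... software"
de_doc = ["Q183", "Q7397"]  # "Deutschland ... Software"

assert np.allclose(bag_of_entities(en_doc, entity_emb),
                   bag_of_entities(de_doc, entity_emb))
```

Because the feature space is shared across languages, a classifier trained on these vectors in one language transfers to the others with no additional training, which is the zero-shot cross-lingual setting the abstract describes.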
