CWTM: Leveraging Contextualized Word Embeddings from BERT for Neural Topic Modeling
March 7, 2024, 5:48 a.m. | Zheng Fang, Yulan He, Rob Procter
cs.CL updates on arXiv.org
Abstract: Most existing topic models rely on bag-of-words (BOW) representations, which limits their ability to capture word-order information and leads to challenges with out-of-vocabulary (OOV) words in new documents. Contextualized word embeddings, by contrast, excel at word sense disambiguation and effectively address the OOV issue. In this work, we introduce a novel neural topic model called the Contextualized Word Topic Model (CWTM), which integrates contextualized word embeddings from BERT. The model is capable of learning …
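The two BOW limitations the abstract names can be seen in a few lines of plain Python. The sketch below is illustrative only (it is not the paper's code): a bag-of-words vector collapses word order and silently drops any word outside its fixed vocabulary.

```python
# Minimal sketch (not CWTM itself): why bag-of-words loses word order
# and breaks on out-of-vocabulary (OOV) words.
from collections import Counter

def bow_vector(doc: str, vocab: list[str]) -> list[int]:
    """Count in-vocabulary words; OOV words are silently dropped."""
    counts = Counter(w for w in doc.split() if w in vocab)
    return [counts[w] for w in vocab]

vocab = ["dog", "bites", "man"]

# Word order is lost: both sentences map to the same vector.
print(bow_vector("dog bites man", vocab))  # [1, 1, 1]
print(bow_vector("man bites dog", vocab))  # [1, 1, 1]

# OOV words vanish: "cat" contributes nothing to the representation.
print(bow_vector("cat bites man", vocab))  # [0, 1, 1]
```

Contextualized embeddings from a model such as BERT sidestep both problems: each token's vector depends on its position and context, and subword tokenization means unseen words still receive a representation.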