March 14, 2022, 1:11 a.m. | Maarten Grootendorst

cs.CL updates on arXiv.org

Topic models can be useful tools for discovering latent topics in collections of
documents. Recent studies have shown the feasibility of approaching topic modeling
as a clustering task. We present BERTopic, a topic model that extends this
process by extracting coherent topic representations through the development of
a class-based variation of TF-IDF. More specifically, BERTopic generates
document embeddings with pre-trained transformer-based language models, clusters
these embeddings, and finally generates topic representations with the
class-based TF-IDF procedure. BERTopic generates coherent topics …
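The class-based TF-IDF step can be illustrated with a minimal sketch. All documents in one cluster are treated as a single "class document," and each term t in class c is weighted as tf(t,c)/n(c) · log(1 + A / f(t)). Here n(c) is the number of words in class c, f(t) is the frequency of t across all classes, and A is the average number of words per class. This exact weighting is an assumption for illustration, since the abstract does not spell it out. The cluster assignments are supplied by hand in place of the real embedding and clustering steps, and the names `c_tf_idf` and `top_words` are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Class-based TF-IDF sketch (assumed weighting, see lead-in).

    Each class (cluster) is treated as one concatenated document.
    """
    # Term counts per class, using naive lowercase whitespace tokenization.
    tf = {
        c: Counter(w for doc in docs for w in doc.lower().split())
        for c, docs in docs_per_class.items()
    }
    # Frequency of each term summed over all classes.
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    # A: average number of words per class.
    avg_words = sum(sum(counts.values()) for counts in tf.values()) / len(tf)

    weights: dict[str, dict[str, float]] = {}
    for c, counts in tf.items():
        n = sum(counts.values())  # words in this class
        weights[c] = {
            t: (f / n) * math.log(1 + avg_words / total[t])
            for t, f in counts.items()
        }
    return weights


def top_words(weights: dict[str, dict[str, float]], k: int = 3) -> dict[str, list[str]]:
    """Return the k highest-weighted terms per class as a crude topic representation."""
    return {
        c: [t for t, _ in sorted(w.items(), key=lambda item: -item[1])[:k]]
        for c, w in weights.items()
    }


# Hypothetical clusters, standing in for the output of embedding + clustering.
clusters = {
    "0": ["the cat sat on the mat", "a cat and a kitten"],
    "1": ["the stock market fell", "market prices and stock trading"],
}
topics = top_words(c_tf_idf(clusters))
print(topics)
```

Terms frequent in one cluster but rare across the others ("stock" and "market" in cluster 1) rise to the top. A common word such as "the" is down-weighted because it appears in every cluster. A real pipeline would also remove stop words, which is why "a" still ranks highly in this toy example.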
