Jan. 31, 2022, 8:35 p.m. | Diogo A.P. Nunes

Towards Data Science - Medium towardsdatascience.com

Exploring Large Collections of Documents with Unsupervised Topic Modelling — Part 2/4

Understanding document distribution with clustering

In this series of posts, we explore large collections of unlabelled documents using topic modelling. We assume that we know nothing about the contents of the corpus beyond its context. Our aim is to finish the exploration with new, quantified knowledge about what the corpus discusses.
