Sept. 13, 2023, 11:40 p.m. | Antonio Jimenez Caballero

Towards Data Science - Medium towardsdatascience.com

Document Topic Extraction with Large Language Models (LLM) and the Latent Dirichlet Allocation (LDA) Algorithm

A guide on how to efficiently extract topics from large documents using Large Language Models (LLM) and the Latent Dirichlet Allocation (LDA) algorithm.


Introduction

I was developing a web application for chatting with PDF files that could process large documents of more than 1,000 pages. But before starting a conversation with the document, I wanted the application to give the user …
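The teaser does not show the author's code. The following minimal pure-Python sketch shows one plausible way to combine the two pieces. It assumes the PDF has already been split into text chunks. LDA, fit here with collapsed Gibbs sampling instead of a library, finds a keyword list for each topic. A hypothetical prompt then asks an LLM to turn those keywords into a readable description. The function names, stopword list, hyperparameters, and prompt wording are illustrative assumptions, not the article's method, and no real LLM API is called.

```python
import random


def tokenize(text, stopwords):
    """Lowercase, strip punctuation, and drop stopwords and non-alphabetic tokens."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [w for w in tokens if w.isalpha() and w not in stopwords]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists (e.g. one per PDF chunk).
    Returns (topic_word, vocab) where topic_word[k][v] counts word v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # n_dk
    topic_word = [[0] * V for _ in range(num_topics)]     # n_kw
    topic_total = [0] * num_topics                        # n_k
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return topic_word, vocab


def top_words(topic_word, vocab, n=5):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


def build_topic_prompt(topics):
    """Hypothetical prompt asking an LLM to describe the LDA keyword lists."""
    lines = [f"Topic {i + 1}: {', '.join(words)}" for i, words in enumerate(topics)]
    return (
        "The following keyword lists were extracted from a document with LDA.\n"
        + "\n".join(lines)
        + "\nDescribe in one sentence what the document is about."
    )


# Hypothetical chunks standing in for pages of a long PDF.
chunks = [
    "The bank raised interest rates and the loan market reacted.",
    "Interest rates, loans and mortgages depend on the central bank.",
    "The striker scored a goal and the team won the match.",
    "The team played the match and the keeper saved a penalty.",
]
STOPWORDS = {"the", "and", "a", "on"}
docs = [tokenize(c, STOPWORDS) for c in chunks]
topic_word, vocab = lda_gibbs(docs, num_topics=2)
prompt = build_topic_prompt(top_words(topic_word, vocab, n=4))
print(prompt)  # pass this string to whichever LLM client you use
```

One reason to split the work this way is cost. Only the short keyword lists reach the LLM, so the prompt stays small no matter how many pages the document has. For real documents, a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would usually replace the hand-written sampler.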
