April 21, 2024, 6:07 a.m. | /u/RiseWarm

Machine Learning www.reddit.com

I have about 4M newspaper articles, and I want to train word embeddings and topic models on them. I have Colab Pro+, but its high-RAM runtime only offers around 60 GB of RAM.

The runtime just crashes when I try to train anything on those 4M articles. The only approach I can think of is to load the data from disk batch by batch and feed it to the model. Is that the right way to do it? I really have no experience here, so I would love to hear your experience and suggestions.
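To make the batch-by-batch idea concrete, here is a minimal sketch using only the Python standard library. It assumes the articles are stored in a plain-text file with one article per line and that whitespace splitting is an acceptable tokenizer; the names `ArticleStream`, `batched`, and `demo` are hypothetical and made up for illustration. Only one line (or one batch) is held in memory at a time.

```python
import itertools
import os
import tempfile


class ArticleStream:
    """Restartable iterable that yields one tokenized article at a time.

    Assumption: the corpus is a UTF-8 text file with one article per line.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every pass so a trainer can iterate for several epochs.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.lower().split()  # naive whitespace tokenizer
                if tokens:  # skip blank lines
                    yield tokens


def batched(iterable, size):
    """Yield lists of at most `size` items without loading the whole input."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch


def demo():
    """Write a tiny corpus to a temporary file and stream it in batches of 2."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "articles.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("First article text\nSecond article here\n\nThird one\n")
        stream = ArticleStream(path)
        return [len(batch) for batch in batched(stream, 2)]
```

Because `ArticleStream` re-opens the file each time it is iterated, it can be passed to any trainer that accepts a restartable iterable of token lists and makes several passes over the data; whether a given embedding or topic-modeling library accepts such an iterable should be checked in its documentation.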

