Dec. 18, 2023, 3:26 p.m. | /u/masc98

Machine Learning www.reddit.com

Hey everyone, ML engineer here. I've trained LLMs from scratch in real life on a decent amount of data, and I have a point I'd like to discuss with you.

As you know, LLMs are trained at very large scale, in terms of both model size and data.

As you know from ML basics, we want to avoid data leakage: the test set must NOT contain samples that the model saw in the training set. That would be cheating.
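To make the leakage idea concrete, here is a minimal sketch of how one might check a test set for samples that also appear in the training set. It is not from the original post: the function names (`normalize`, `fingerprint`, `find_leaked`) and the toy samples are hypothetical, and it only catches exact duplicates after light normalization, not paraphrases or partial overlaps.

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences do not hide an otherwise identical sample.
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    # Hash the normalized text so large corpora can be compared
    # by storing short digests instead of full strings.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_leaked(train, test):
    # Return the test samples whose fingerprint also occurs in train.
    train_hashes = {fingerprint(sample) for sample in train}
    return [sample for sample in test if fingerprint(sample) in train_hashes]


# Toy data (assumed for illustration only).
train_samples = ["The cat sat on the mat.", "LLMs are trained on large corpora."]
test_samples = ["the cat  sat on the mat.", "A held-out sentence."]

leaked = find_leaked(train_samples, test_samples)
print(leaked)  # the first test sample matches a training sample
```

At LLM scale, exact matching like this is only a first pass; near-duplicate detection would need a fuzzier technique, which this sketch does not attempt.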

Someday, maybe if …

