March 3, 2024, 10:17 a.m. | /u/Personal-Trainer-541

Deep Learning www.reddit.com

Hi there,

I've created a video [here](https://youtu.be/hL4ZnAWSyuU) where I talk about the three most commonly used tokenizers when training LLMs: (1) BPE (byte-pair encoding), (2) WordPiece, and (3) SentencePiece.

I hope it's useful to some of you out there. Feedback is more than welcome! :)

deeplearning explained feedback llm
