May 21, 2022, 11:10 a.m. | /u/CodeAllDay1337

Machine Learning | www.reddit.com

Hi all,

I'm relatively new to the field of NLP and while reading a blog post from 2015 [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy, I was wondering about this part of the "Further Reading" section:

>Currently it seems that word-level models work better than character-level models, but this is surely a temporary thing.

Aren't most state-of-the-art models these days using some kind of vocabulary, i.e. whole words or at least sub-words? Text in the wild …
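(Not part of the original question, just an illustrative sketch of the sub-word idea being asked about: most current models use a learned sub-word vocabulary, e.g. BPE, rather than whole words or raw characters. The snippet below assumes the Hugging Face `transformers` library and GPT-2's byte-level BPE tokenizer, which the post does not mention; exact token pieces depend on the vocabulary.)

```python
# Sketch: how a modern sub-word (BPE) tokenizer splits text.
# Common words tend to map to single tokens; rare or long words are
# broken into several sub-word pieces, so nothing is out-of-vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # byte-level BPE vocab

for text in ["The cat sat on the mat.", "Antidisestablishmentarianism"]:
    pieces = tokenizer.tokenize(text)
    print(f"{text!r} -> {pieces}")
```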

machinelearning tokenization
