[D] Character-level vs. word-level tokenization
May 21, 2022, 11:10 a.m. | /u/CodeAllDay1337
Machine Learning (www.reddit.com)
I'm relatively new to NLP. While reading Andrej Karpathy's 2015 blog post [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), I started wondering about this part of the "Further Reading" section:
>Currently it seems that word-level models work better than character-level models, but this is surely a temporary thing.
Aren't most state-of-the-art models these days using some kind of vocabulary, i.e. whole words or at least sub-words? Text in the wild …
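To make the trade-off in the question concrete, here is a minimal sketch (plain Python, no NLP libraries; the sample sentence is my own) contrasting the two granularities: character-level tokenization gives a tiny vocabulary but long sequences, while word-level gives short sequences but a large vocabulary that can't cover unseen words. Sub-word schemes like BPE sit between the two.

```python
# Contrast character-level and word-level tokenization of the same string.
text = "tokenization splits text into units"

# Character-level: every character is a token.
# Vocabulary is tiny (letters + space), but the sequence is long.
char_tokens = list(text)

# Word-level: whitespace-delimited words are tokens.
# Sequence is short, but the vocabulary must contain every word,
# and any unseen word at inference time becomes out-of-vocabulary.
word_tokens = text.split()

print("char:", len(char_tokens), "tokens,", len(set(char_tokens)), "vocab items")
print("word:", len(word_tokens), "tokens,", len(set(word_tokens)), "vocab items")
```

Sub-word tokenizers (BPE, WordPiece, SentencePiece) resolve the tension by keeping a fixed-size vocabulary of frequent fragments, so rare words decompose into known pieces instead of becoming out-of-vocabulary.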