[P] Testing different popular GPT tokenizers
May 19, 2023, 1:45 p.m. | /u/dxg39
Machine Learning www.reddit.com
Turns out most of them cannot reproduce their input exactly after an encode/decode round trip.
https://github.com/skeskinen/hf-tokenizer-testing
Does it matter whether a tokenizer can reproduce its input exactly? I guess this is subjective, but I'd say it's at least a nice feature, and one that (perhaps surprisingly?) most tokenizers out there don't seem to have.
I wrote this for myself on a quest to find a tokenizer …
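To make the round-trip property concrete, here is a minimal sketch of the kind of check being described: encode a string, decode it again, and compare with the original. It is not the code from the linked repository. The helper `failed_round_trips` and the two toy tokenizers are hypothetical names invented for illustration, chosen so the example runs with the standard library alone.

```python
from typing import Callable, Iterable, List, Sequence


def failed_round_trips(
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    samples: Iterable[str],
) -> List[str]:
    """Return every sample for which decode(encode(sample)) != sample."""
    return [s for s in samples if decode(encode(s)) != s]


# Toy lossless tokenizer (assumption for illustration): one token per UTF-8 byte.
def byte_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def byte_decode(ids: Sequence[int]) -> str:
    return bytes(ids).decode("utf-8")


# Toy lossy tokenizer (assumption for illustration): lowercases the text and
# splits on whitespace, so case and exact spacing are lost.
def lossy_encode(text: str) -> List[str]:
    return text.lower().split()


def lossy_decode(tokens: Sequence[str]) -> str:
    return " ".join(tokens)


samples = ["hello world", "Hello  World", "tabs\tand\nnewlines", "naïve café"]

byte_failures = failed_round_trips(byte_encode, byte_decode, samples)
lossy_failures = failed_round_trips(lossy_encode, lossy_decode, samples)

print("byte-level failures:", byte_failures)
print("lossy failures:", lossy_failures)
```

The same helper should work with any tokenizer that exposes an encode/decode pair, by passing those two callables in place of the toy ones. Inputs with unusual whitespace, mixed case, or non-ASCII characters tend to be the ones that expose normalization steps.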