April 25, 2024, 3:12 p.m. | Maykeye

DEV Community dev.to

Modern tokenizers come in all form! Some, like qwen, support ~150 000 tokens.


Byte level models support 256 tokens.


Can we go lower?


(There should be "you were so busy asking if you could" meme, but dev.to complains)


But the answer is yes. MambaBit comes with just 2 tokens. One token for bit 0, one token for bit 1. That's it. Yet somehow it still produces something which is not completely random.


Behold the most cursed becomes



Behold the most …

dev form llm machinelearning meme modern qwen support token tokens

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne