March 19, 2024, 7:35 p.m. | /u/timtom85

Machine Learning www.reddit.com

My intuition is that tokens get gradually enriched as we move through the layers, but that would mean the early layers need to store far less information per token than the later ones.

Wouldn't it make sense to start out with (relatively) low-dimensional embeddings, and then project or extend them into progressively higher dimensions, until they reach their final size?
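
To make the question concrete, here's a minimal PyTorch sketch of one way this could look: run a few layers at a narrow width, then linearly project the residual stream up to a wider dimension before the next group of layers. Everything here is an illustrative assumption, not an established architecture; the `GrowingWidthTransformer` name, the widths, and the stage counts are all made up for the example.

```python
# Sketch only: a transformer whose hidden width grows between stages.
# All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class GrowingWidthTransformer(nn.Module):
    def __init__(self, vocab_size=1000, widths=(128, 256, 512),
                 layers_per_stage=2, n_heads=4):
        super().__init__()
        # Token embeddings start at the narrowest width.
        self.embed = nn.Embedding(vocab_size, widths[0])
        self.stages = nn.ModuleList()
        self.proj = nn.ModuleList()
        for i, w in enumerate(widths):
            layer = nn.TransformerEncoderLayer(
                d_model=w, nhead=n_heads, dim_feedforward=4 * w,
                batch_first=True)
            self.stages.append(
                nn.TransformerEncoder(layer, num_layers=layers_per_stage))
            # Learned linear projection widens the residual stream
            # between stages (the simplest choice, not the only one).
            if i + 1 < len(widths):
                self.proj.append(nn.Linear(w, widths[i + 1]))
        self.head = nn.Linear(widths[-1], vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)              # (batch, seq, widths[0])
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.proj):
                x = self.proj[i](x)         # widen for the next stage
        return self.head(x)                 # (batch, seq, vocab_size)

if __name__ == "__main__":
    model = GrowingWidthTransformer()
    logits = model(torch.randint(0, 1000, (2, 16)))
    print(logits.shape)  # torch.Size([2, 16, 1000])
```

The per-stage `nn.Linear` is just the cheapest way to widen; zero-padding the new dimensions would be a parameter-free alternative, and where to place the widening steps is exactly the open question being asked.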

