March 19, 2024, 7:35 p.m. | /u/timtom85

Machine Learning www.reddit.com

My intuition is that tokens get gradually enriched as we move through the layers, but that would mean the early layers need to store far less information per token than the later ones.

Wouldn't it make sense to start out with (relatively) low-dimensional embeddings, and then project or extend them into progressively higher dimensions, until they reach their final size?
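
To make the question concrete, here's a minimal PyTorch sketch of one way this could look: run a few layers at a narrow width, then linearly project the residual stream up to a wider dimension before the next group of layers. Everything here is an illustrative assumption, not an established architecture; the `GrowingWidthTransformer` name, the widths, and the stage counts are all made up for the example.

```python
# Sketch only: a transformer whose hidden width grows between stages.
# All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class GrowingWidthTransformer(nn.Module):
    def __init__(self, vocab_size=1000, widths=(128, 256, 512),
                 layers_per_stage=2, n_heads=4):
        super().__init__()
        # Token embeddings start at the narrowest width.
        self.embed = nn.Embedding(vocab_size, widths[0])
        self.stages = nn.ModuleList()
        self.proj = nn.ModuleList()
        for i, w in enumerate(widths):
            layer = nn.TransformerEncoderLayer(
                d_model=w, nhead=n_heads, dim_feedforward=4 * w,
                batch_first=True)
            self.stages.append(
                nn.TransformerEncoder(layer, num_layers=layers_per_stage))
            # Learned linear projection widens the residual stream
            # between stages (the simplest choice, not the only one).
            if i + 1 < len(widths):
                self.proj.append(nn.Linear(w, widths[i + 1]))
        self.head = nn.Linear(widths[-1], vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)              # (batch, seq, widths[0])
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.proj):
                x = self.proj[i](x)         # widen for the next stage
        return self.head(x)                 # (batch, seq, vocab_size)

if __name__ == "__main__":
    model = GrowingWidthTransformer()
    logits = model(torch.randint(0, 1000, (2, 16)))
    print(logits.shape)  # torch.Size([2, 16, 1000])
```

The per-stage `nn.Linear` is just the cheapest way to widen; zero-padding the new dimensions would be a parameter-free alternative, and where to place the widening steps is exactly the open question being asked.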

