July 22, 2023, 1:42 p.m. | /u/Qdr-91

Machine Learning www.reddit.com

As far as I know and have studied, each token is mapped from a high-dimensional, discrete token space into a continuous, lower-dimensional space where words are embedded meaningfully based on their relationships in the training data. So a 1000-token text produces 1000 vectors.
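For concreteness, here is a minimal sketch of that per-token picture as I understand it, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (an arbitrary example model, not one mentioned above):

```python
# Minimal sketch: one continuous vector per token.
# Assumes the Hugging Face `transformers` library and "bert-base-uncased"
# purely as an example model choice.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Vector databases store embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (1, num_tokens, hidden_size) -- one vector per token, so a
# 1000-token text would give 1000 vectors of size 768 with this model.
print(outputs.last_hidden_state.shape)
```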


Now, for vector databases (correct me if I'm wrong), people are storing fixed-size vectors for texts of varying lengths. For example, take two sentences, one with 1000 tokens and the other with 10 tokens: each produces one …
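If I understand the usual setup correctly, the per-token vectors get pooled into a single fixed-size vector before storage. A rough sketch of mean pooling, which I believe is one common choice (the helper below is my own illustration, not from any particular library):

```python
import torch

def mean_pool(token_vectors: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average each text's token vectors, ignoring padding positions."""
    # token_vectors: (batch, num_tokens, hidden_size)
    # attention_mask: (batch, num_tokens), 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()
    summed = (token_vectors * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts  # (batch, hidden_size): one fixed-size vector per text

# A 10-token sentence and a 1000-token sentence both come out as a single
# hidden_size-dimensional vector, which is what would go into the vector database.
```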

