May 16, 2022, 2:43 p.m. | /u/igaloly

Natural Language Processing www.reddit.com

I have 1 billion documents.

Each document has a field with a vector named `embedding`.

Each embedding has 768 dimensions.

I want to compute the mean vector of this batch of documents.

To illustrate what I mean by a mean vector:

Assume I have 3 documents.

One embedding per document gives 3 embeddings: [1, 2], [3, 4], [5, 6]

The mean vector of this batch of documents is the element-wise average: [(1+3+5)/3, (2+4+6)/3] -> [3, 4]
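The definition above can be sketched in a few lines of Python. This is only a minimal illustration, not a tuned solution for 1 billion documents: the function name `mean_vector` is hypothetical, and it assumes the embeddings can be read as a stream of equal-length numeric sequences. It keeps one running sum per dimension, so memory stays constant no matter how many documents pass through.

```python
from typing import Iterable, List, Sequence


def mean_vector(embeddings: Iterable[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of a stream of equal-length vectors.

    Hypothetical helper: only one running sum per dimension is held in
    memory, so the input can be a generator over many documents.
    """
    sums: List[float] = []
    count = 0
    for vec in embeddings:
        if count == 0:
            # Size the accumulator from the first vector seen.
            sums = [0.0] * len(vec)
        elif len(vec) != len(sums):
            raise ValueError("all embeddings must have the same dimension")
        for i, x in enumerate(vec):
            sums[i] += x
        count += 1
    if count == 0:
        raise ValueError("cannot take the mean of zero embeddings")
    return [s / count for s in sums]


# The 3-document example from the post.
print(mean_vector([[1, 2], [3, 4], [5, 6]]))  # [3.0, 4.0]
```

Because the mean is just a sum divided by a count, the same idea should extend to parallel work: each shard or worker could return its per-dimension sums and its document count, and the partial results could then be added together before the final division.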

**What's the most time-efficient way I can …

