June 25, 2023, 6:31 a.m. | /u/Lockonon3

Data Science www.reddit.com

I'm looking for different ways to summarize documents with vector embeddings

* centroid of word2vec embeddings
* doc2vec, but the distributed bag-of-words (DBOW) variant, since word order doesn't really matter for this particular task
* the [CLS] embedding from BERT (rough sketches of all three are below)
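
For concreteness, here's a minimal sketch of what I mean by each, assuming gensim for word2vec/doc2vec and Hugging Face transformers for the BERT [CLS] vector; the toy corpus, model names, and hyperparameters are placeholders, not recommendations:

```python
import numpy as np
import torch
from gensim.models import Word2Vec, Doc2Vec
from gensim.models.doc2vec import TaggedDocument
from transformers import AutoModel, AutoTokenizer

# Toy corpus: each document already reduced to a bag of words.
docs = [["vector", "embedding", "document", "summary", "tfidf"],
        ["word2vec", "centroid", "average", "tokens", "document"]]

# 1) Centroid of word2vec embeddings: average the vectors of a document's words.
w2v = Word2Vec(sentences=docs, vector_size=100, min_count=1, epochs=40)

def centroid(tokens, model):
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

doc_vec_centroid = centroid(docs[0], w2v)

# 2) doc2vec in distributed bag-of-words mode (dm=0), which ignores word order.
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(docs)]
d2v = Doc2Vec(tagged, dm=0, vector_size=100, min_count=1, epochs=40)
doc_vec_dbow = d2v.dv[0]

# 3) BERT: take the [CLS] token's last-layer hidden state as the document vector.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
inputs = tok(" ".join(docs[0]), return_tensors="pt", truncation=True)
with torch.no_grad():
    doc_vec_cls = bert(**inputs).last_hidden_state[:, 0, :].squeeze(0)
```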

To keep things economical, I plan to keep only the top 20 tf-idf words of each document, so word order is completely arbitrary anyway (quick sketch of that step below).
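
A minimal sketch of that filtering step, assuming scikit-learn's TfidfVectorizer; the corpus and the cutoff of 20 are just illustrative:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["raw text of the first document ...",
          "raw text of the second document ..."]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)      # (n_docs, n_terms) sparse matrix
terms = vectorizer.get_feature_names_out()

def top_k_words(doc_row, k=20):
    # Keep the k terms with the highest tf-idf weight in this document.
    row = doc_row.toarray().ravel()
    top = np.argsort(row)[::-1][:k]
    return [terms[i] for i in top if row[i] > 0]

# Each document becomes an (unordered) list of its top-20 tf-idf words.
filtered_docs = [top_k_words(tfidf[i]) for i in range(tfidf.shape[0])]
```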
