Sept. 28, 2022, 9:38 a.m. | /u/eternalmathstudent

Computer Vision www.reddit.com

I understand CBOW and skip-gram, their respective architectures, and the intuition behind these models to a good extent. However, I have the following two burning questions:

1. Consider **CBOW** with **4 context words**: why does the input layer use **4 full-vocabulary-length one-hot vectors** to represent these 4 words and then average them? Why can't it be just **1 vocabulary-length vector with 4 ones** (in other words, a **4-hot vector**)? (See the sketch after this list.)
2. **CBOW** takes context words as input and predicts a …
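On question 1, a quick numerical check may help (a minimal sketch I'm adding for illustration, not from the post; the matrix `W`, toy sizes `V` and `d`, and the word indices are all made up). Because the input-to-hidden step of CBOW is linear, averaging the projections of 4 one-hot vectors gives exactly the same hidden vector as projecting a single count-based "4-hot" vector once and dividing by 4:

```python
import numpy as np

# Sketch: with 4 one-hot context vectors, CBOW's input layer averages
# the corresponding rows of the embedding matrix W. A single "4-hot"
# (count) vector times W gives the *sum* of those same rows, i.e. the
# same vector up to a factor of 4, so the two encodings are equivalent
# in the linear input layer.

rng = np.random.default_rng(0)
V, d = 10, 5                  # toy vocabulary size and embedding dimension
W = rng.normal(size=(V, d))   # input-to-hidden (embedding) matrix

context_ids = [1, 3, 3, 7]    # 4 context word indices (note the repeat)

# Four one-hot vectors, projected then averaged
one_hots = np.eye(V)[context_ids]      # shape (4, V)
h_avg = (one_hots @ W).mean(axis=0)    # CBOW hidden layer

# One count ("4-hot") vector, projected once then rescaled
four_hot = one_hots.sum(axis=0)        # shape (V,); entry at index 3 is 2
h_4hot = (four_hot @ W) / 4            # divide by 4 to recover the average

assert np.allclose(h_avg, h_4hot)
print("averaged one-hots == scaled 4-hot projection")
```

So, as far as the math goes, a 4-hot encoding would carry the same information, with one caveat the sketch hints at: a strictly *binary* 4-hot vector (all ones) would discard how many times a context word repeats, whereas the averaged one-hot formulation preserves multiplicity. The separate one-hot vectors are mostly a pedagogical device; practical implementations skip the one-hot multiplication entirely and do direct embedding-row lookups.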

computervision word2vec
