Sept. 28, 2022, 9:38 a.m. | /u/eternalmathstudent

Deep Learning www.reddit.com

I understand CBOW and skip-gram, their respective architectures, and the intuition behind the models to a good extent. However, I have the following two burning questions:

1. Consider **CBOW** with **4 context words**: why does the input layer use **4 full-vocabulary-length one-hot vectors** to represent these 4 words and then average them? Why can't it just be **1 vocabulary-length vector with 4 ones** (in other words, a **4-hot vector**)? (See the sketch after this list.)
2. **CBOW** takes context words as input and predicts a …
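
Regarding question 1, a minimal NumPy sketch (not from the post; the toy sizes `V`, `D`, the matrix `W`, and the context indices are made up for illustration) showing that averaging 4 one-hot vectors before the projection gives the same hidden layer as a single "4-hot" vector scaled by 1/4, as long as the 4 context words are distinct:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 10, 3                       # toy vocabulary size and embedding dimension
W = rng.normal(size=(V, D))        # input-side embedding matrix
context = [1, 4, 7, 9]             # indices of the 4 context words

# Standard CBOW input: average of 4 one-hot vectors, projected through W.
one_hots = np.eye(V)[context]      # shape (4, V)
h_avg = one_hots.mean(axis=0) @ W  # shape (D,)

# Alternative: one V-length "4-hot" vector with ones at the context
# indices, scaled by 1/4 so it equals the average above.
four_hot = np.zeros(V)
four_hot[context] = 1.0
h_4hot = (four_hot / 4) @ W

print(np.allclose(h_avg, h_4hot))  # True: the two inputs are equivalent
```

So for distinct context words the two formulations are mathematically the same up to a constant; the one-hot view is mainly pedagogical. Two practical caveats: a k-hot vector cannot encode a word that appears more than once in the context window, and real implementations never materialize one-hot vectors at all; they just look up the relevant rows of `W` and average them.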

deeplearning word2vec
