Jan. 30, 2024, 12:55 p.m. | /u/ez613

Machine Learning www.reddit.com

Hello!



Is it feasible to set up the model's weights in such a way that the output of the final softmax layer, prior to any training, mirrors the distribution of tokens in the training data?

My initial thought is to initialize all weights and biases to zero (so every logit is zero and the softmax initially outputs a uniform distribution), and then set the final layer's bias to a pre-computed vector of the log of the observed token probabilities, so that the softmax reproduces that distribution before any training. I haven't come across this approach in my research thus far …
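For what it's worth, a minimal sketch of this idea in PyTorch might look like the following, zeroing only the final projection's weights and setting its bias to the log of the token frequencies. The vocabulary size, hidden size, and `token_counts` here are made-up placeholders, not anything from an actual corpus:

```python
import torch
import torch.nn as nn

# Hypothetical setup: an LM head over a vocabulary of size V.
# `token_counts` stands in for token frequencies counted over the training data.
V, d_model = 50_000, 768
token_counts = torch.rand(V) + 1e-3            # placeholder counts
token_probs = token_counts / token_counts.sum()

lm_head = nn.Linear(d_model, V)

with torch.no_grad():
    lm_head.weight.zero_()                      # zero weights: logits no longer depend on the input
    lm_head.bias.copy_(torch.log(token_probs))  # bias = log of observed token probabilities

# With zero weights, the logits equal the bias for every input,
# so softmax(logits) reproduces the empirical token distribution.
x = torch.randn(4, d_model)
probs = torch.softmax(lm_head(x), dim=-1)
print(torch.allclose(probs[0], token_probs, atol=1e-5))  # True
```

Since the probabilities sum to one, `softmax(log(p)) == p`, so only the final bias needs to carry the prior; the rest of the network can keep a standard random initialization.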

