June 18, 2022, 12:56 p.m. | /u/imhungryforp1zza

Data Science www.reddit.com

I have a toy dataset with a number of features. Some features are continuous, others are one-hot encoded. I've set around 1% of each feature to NaN so I can test different imputation methods, such as sklearn's KNNImputer (a rough sketch of the setup is below).
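Roughly, the masking step looks like this (the column names and distributions are just placeholders, not my actual data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy frame with a mix of continuous and one-hot encoded features
df = pd.DataFrame({
    "age": rng.normal(40, 10, 1000),                      # continuous
    "income": rng.normal(50_000, 15_000, 1000),           # continuous
    "color_red": rng.integers(0, 2, 1000).astype(float),  # one-hot
})

# Knock out ~1% of each feature at random so the imputers have work to do
df_missing = df.copy()
for col in df_missing.columns:
    mask = rng.random(len(df_missing)) < 0.01
    df_missing.loc[mask, col] = np.nan
```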

The one-hot encoded features are treated as continuous values, which means that instead of being assigned 0 or 1, they get the average of their neighbors (e.g. if the three closest neighbors' values are [0, 1, 1], the imputed value will be about 0.67 rather than 0 or 1).
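A minimal sketch that reproduces the behavior with KNNImputer; the thresholding at the end is just one possible workaround (rounding one-hot columns back to {0, 1} after imputation), not necessarily the best approach:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Columns: [continuous feature, one-hot encoded feature]
X = np.array([
    [1.0, 0.0],
    [1.1, 1.0],
    [0.9, 1.0],
    [1.0, np.nan],  # one-hot value to impute
])

imputer = KNNImputer(n_neighbors=3)
X_imputed = imputer.fit_transform(X)

# The missing one-hot entry becomes the mean of its neighbors [0, 1, 1] ~= 0.67
print(X_imputed[-1, 1])

# Post-hoc fix: threshold the one-hot columns back to 0/1
onehot_cols = [1]  # hypothetical: indices of the one-hot encoded columns
X_imputed[:, onehot_cols] = (X_imputed[:, onehot_cols] >= 0.5).astype(float)
```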

