A Benchmark and Taxonomy of Categorical Encoders
Towards Data Science - Medium towardsdatascience.com
New. Comprehensive. Extendable.
Image created by author with recraft.ai

A large share of datasets contain categorical features. For example, out of 665 datasets on the UC Irvine Machine Learning Repository [1], 42 are fully categorical and 366 are reported as mixed. However, distance-based ML models require features in a numerical format. Categorical encoders replace the categories in such features with real numbers.
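To make the idea of replacing categories with numbers concrete, here is a minimal sketch of two common encoders, ordinal and one-hot, written in plain Python. The function names `ordinal_encode` and `one_hot_encode` and the toy `colors` feature are illustrative assumptions, not part of the article's benchmark, and the article's own taxonomy may define these encoders differently.

```python
def ordinal_encode(values):
    """Map each distinct category to a real-valued code (sorted order)."""
    categories = sorted(set(values))
    code_of = {cat: float(i) for i, cat in enumerate(categories)}
    return [code_of[v] for v in values], categories


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one slot per category."""
    categories = sorted(set(values))
    index_of = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[index_of[v]] = 1.0
        rows.append(row)
    return rows, categories


# Hypothetical toy feature, used only for illustration.
colors = ["red", "green", "blue", "green"]

ordinal_codes, ordinal_categories = ordinal_encode(colors)
one_hot_rows, one_hot_categories = one_hot_encode(colors)

print(ordinal_categories)  # ['blue', 'green', 'red']
print(ordinal_codes)       # [2.0, 1.0, 0.0, 1.0]
print(one_hot_rows[0])     # [0.0, 0.0, 1.0]
```

The ordinal variant yields a single numeric column but imposes an arbitrary order on the categories, which a distance-based model will treat as meaningful. The one-hot variant avoids that at the cost of one column per category.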
A variety of categorical encoders exist, but there have been few attempts to compare them on many datasets, …