April 7, 2024, 9:38 a.m. | /u/Franzese

Data Science www.reddit.com

For a project I am working on, we have been developing two competing models. Having access to the codebase, I noticed that the other model, which has been accepted for production because of its seemingly better results, has data leakage (information from the test data is used during training). Synthetic data generation is done on the entire dataset, and other feature engineering, such as standardising the values, is also done on the entire dataset.
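
To make the leakage concrete, here is a minimal sketch in plain Python (standard library only) of the leak-free order of operations: split first, then generate synthetic rows and fit the standardisation statistics on the training split only. The toy dataset, the function names, and the interpolation-based oversampler are hypothetical illustrations, not the team's code. The oversampler is a simplified stand-in loosely modelled on SMOTE, not SMOTE itself.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle and split rows before any preprocessing touches them."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardiser(values):
    """Learn mean and population standard deviation from training values only."""
    return statistics.mean(values), statistics.pstdev(values)


def standardise(values, mean, std):
    """Apply previously fitted statistics; never refit on the data being transformed."""
    return [(v - mean) / std for v in values]


def oversample_minority(rows, minority_label, n_new, seed=0):
    """Hypothetical, simplified SMOTE-like oversampler: each synthetic value is a
    random interpolation between two existing minority rows. Call it on the
    training split only, so no test row can shape the synthetic data."""
    rng = random.Random(seed)
    minority = [x for x, y in rows if y == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append((a + t * (b - a), minority_label))
    return rows + synthetic


# Toy dataset (hypothetical): (feature, label) pairs; label 1 is the minority class.
data = [(float(i), 1 if i % 3 == 0 else 0) for i in range(40)]

# 1. Split first.
train, test = train_test_split(data)
# 2. Generate synthetic rows from the training split only.
train = oversample_minority(train, minority_label=1, n_new=5)
# 3. Fit scaling statistics on the training split only.
#    The leaky variant would fit on [x for x, _ in data], letting test rows shift mean/std.
mean, std = fit_standardiser([x for x, _ in train])
# 4. Reuse those statistics, unchanged, on the test split.
train_scaled = standardise([x for x, _ in train], mean, std)
test_scaled = standardise([x for x, _ in test], mean, std)
```

In scikit-learn, the same discipline is commonly enforced by placing the scaler inside a `Pipeline`, so it is refit only on the training portion of each split; imbalanced-learn's `Pipeline` does the same for resampling steps.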

I brought this up in the group chat once, but it …
