July 11, 2022, 11:27 a.m. | /u/blessedorcursed

Machine Learning www.reddit.com

A lot of recent DL models for tabular data use some form of pre-training to improve robustness and performance on small or noisy datasets. That's why I've decided to write a [deep-dive blog post](https://syslog.ravelin.com/fraud-detection-with-minimum-labels-semi-supervised-learning-d2f8e7136da6) on the VIME paper, which was one of the first to propose pre-training tasks specific to tabular data.

It comes with an accompanying [repo](https://github.com/aruberts/blogs/tree/main/vime) that contains all the code and notebooks. From some personal testing that I've done, pre-training is the most valuable does improve the …
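For anyone who hasn't read the paper: VIME's self-supervised step corrupts a random subset of cells by swapping in values drawn from the same column, then trains an encoder to predict which cells were corrupted (mask estimation) and what the original values were (feature estimation). Below is a rough, dependency-free sketch of just the corruption step. It is not the code from the blog or the repo; the function name `vime_corrupt` and its parameters are my own placeholders.

```python
import random


def vime_corrupt(rows, p_mask, seed=0):
    """Corrupt a table (list of equal-length rows) in the style of VIME.

    Returns (mask, corrupted_rows). mask[i][j] is 1 where the cell value
    actually changed. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])

    # Shuffle each column independently, so replacement values follow
    # that feature's empirical marginal distribution.
    shuffled_cols = []
    for j in range(d):
        col = [row[j] for row in rows]
        rng.shuffle(col)
        shuffled_cols.append(col)

    corrupted, mask = [], []
    for i in range(n):
        new_row, mask_row = [], []
        for j in range(d):
            # Pick each cell for corruption with probability p_mask.
            use_shuffled = rng.random() < p_mask
            value = shuffled_cols[j][i] if use_shuffled else rows[i][j]
            new_row.append(value)
            # A swapped-in value can equal the original, so the mask
            # records real changes rather than the raw coin flips.
            mask_row.append(int(value != rows[i][j]))
        corrupted.append(new_row)
        mask.append(mask_row)
    return mask, corrupted
```

During pre-training, the corrupted rows are the encoder's input, and the mask and the original rows are the two targets. After that, the encoder is reused for the downstream supervised or semi-supervised task.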
