Web: https://www.reddit.com/r/datascience/comments/sf0r7h/guidance_on_how_to_start/

Jan. 28, 2022, 9:07 p.m. | /u/SOULFULLLL

Data Science reddit.com

I have a data frame arriving next week that I need to start working on, and my first step will be to clean it. What do you usually look for when cleaning a data set? Duplicates and formatting problems, and what else?

I need guidance on how to start and what to look for.
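
As one possible starting point, the checks named above (duplicates, formatting problems) plus missing values and unparseable numbers can be sketched in pandas. This is a minimal sketch, not a complete cleaning routine: the `audit` helper, the column names `name` and `amount`, and the sample values are all made up for illustration.

```python
import pandas as pd


def audit(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column: dtype, missing count, distinct non-missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=True),
    })


# Made-up example data; the real columns will differ.
df = pd.DataFrame({
    "name": ["Ann", "ann ", "Bob", None],
    "amount": ["10", "10", "x", "7"],
})

report = audit(df)

# Rows that repeat an earlier row exactly, across all columns.
n_exact_duplicates = int(df.duplicated().sum())

# Formatting problems: stray whitespace and inconsistent case can hide
# values that are really the same.
name_clean = df["name"].str.strip().str.lower()

# Values that cannot be parsed as numbers become NaN with errors="coerce".
amount_num = pd.to_numeric(df["amount"], errors="coerce")
n_bad_amount = int(amount_num.isna().sum() - df["amount"].isna().sum())
```

Comparing `df["name"].nunique()` with `name_clean.nunique()` gives a quick signal of how many distinct values exist only because of formatting differences.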

Also, when you remove identical rows, how do you make sure they are true duplicates and not separate records that just happen to be identical?
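
Whether identical rows are true duplicates usually depends on whether the data carries a unique key; without one, the data alone cannot settle the question. The sketch below assumes a hypothetical `order_id` key column and made-up values, and inspects the repeated rows before dropping anything.

```python
import pandas as pd

# Made-up data; "order_id" is a hypothetical unique key.
df = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "item": ["pen", "pen", "ink", "ink"],
    "qty": [2, 2, 1, 1],
})

# keep=False marks every copy, so all repeated rows can be reviewed together.
exact_dupes = df[df.duplicated(keep=False)]

# Rows that match on the value columns only. Orders 2 and 3 look identical
# here but have different keys, so they are separate records.
same_values = df[df.duplicated(subset=["item", "qty"], keep=False)]

# Drop only rows that repeat in every column, key included.
deduped = df.drop_duplicates(keep="first")
```

If no key column exists, a timestamp or source file name can sometimes play the same role. Otherwise it is safer to keep a count of how often each row occurred than to drop the repeats silently.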

