May 17, 2022, 4:43 p.m. | Bvolodarskiy

Towards Data Science - Medium towardsdatascience.com


In my previous articles (post one and post two), I described how to handle homogeneous data sources stored as Apache Parquet files of moderate size (~500 MB). But what if you need to deal with Big Data? How can you test it with Great Expectations on AWS? And how can you compare two non-homogeneous datasets? In this article, I explore one way to do just that.

Challenges

The Provectus Data Engineering …

