Web: https://towardsdatascience.com/improve-apache-spark-performance-with-the-s3-magic-committer-257a34e367af?source=rss----7f60cf5620c9---4

Jan. 27, 2022, 3:30 p.m. | Jean Yves

Towards Data Science - Medium towardsdatascience.com

Achieve up to 65% performance gain using the latest S3 magic committer from Spark 3.2 and Hadoop 3.3!

Most Apache Spark users overlook the choice of an S3 committer (the protocol Spark uses when writing output to S3) because the topic is complex and documentation is scarce. Yet this choice has a major impact on performance whenever Spark writes data to S3. Since, for AWS users, a large portion of Spark job time is spent writing to S3, …
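
To make the committer choice concrete, here is a minimal sketch of the Spark properties generally used to select the S3A magic committer. It assumes Spark 3.2+ with Hadoop 3.3+ and the spark-hadoop-cloud module on the classpath; the property names follow Spark's cloud integration documentation rather than this article, and the names `MAGIC_COMMITTER_CONF` and `to_submit_args` are hypothetical helpers introduced only for illustration.

```python
# Sketch only: properties commonly used to opt in to the S3A "magic" committer.
# Assumption: Spark 3.2+ / Hadoop 3.3+ with the spark-hadoop-cloud module available.
MAGIC_COMMITTER_CONF = {
    # Select the magic committer and enable magic paths on S3A.
    "spark.hadoop.fs.s3a.committer.name": "magic",
    "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
    # Route s3a:// output through the S3A committer factory.
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    # Bind Spark SQL and Parquet writes to the Hadoop path output committers.
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(MAGIC_COMMITTER_CONF)))
```

In a PySpark application, the same key/value pairs could instead be passed one by one to `SparkSession.builder.config(key, value)` before the session is created. Whether all five properties are needed depends on the Spark and Hadoop builds in use, so treat the list as a starting point to verify against your distribution.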

Tags: apache, apache spark, data engineering, hadoop, performance, s3, spark
