April 30, 2022, 7:18 p.m. | /u/yoi12321

Data Science www.reddit.com

I just joined a team that has their entire modeling suite in Scala and Spark, leveraging the DataFrame API. For some of our data that makes sense: we have datasets in the hundreds of terabytes. But other datasets are actually in the megabytes, and things that should take a second sometimes take 20 minutes, mostly due to the overhead of all the Spark jobs that get created.

To my management this isn't a big deal; none of this stuff …
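To make the overhead concrete, here's a rough sketch of the kind of size-based branch I have in mind (hypothetical, not our actual code; the 100 MB cutoff, the object name, and `rowCount` are made up for illustration): keep Spark for the terabyte-scale inputs, but handle the megabyte-scale ones with plain Scala on the driver so no Spark jobs are scheduled at all.

```scala
import scala.io.Source

// Hypothetical sketch, not our real pipeline: route tiny inputs around
// Spark entirely, and only spin up Spark for genuinely large data.
object SmallDataBypass {
  // Arbitrary cutoff for illustration: anything under 100 MB is "small".
  val SmallFileThresholdBytes: Long = 100L * 1024 * 1024

  def rowCount(path: String): Long = {
    val file = new java.io.File(path)
    if (file.length() < SmallFileThresholdBytes) {
      // Megabyte-scale input: a single-threaded pass on the driver
      // finishes in well under a second, with zero Spark jobs created.
      val src = Source.fromFile(file)
      try src.getLines().size.toLong
      finally src.close()
    } else {
      // Terabyte-scale input: this is where Spark actually earns its keep.
      val spark = org.apache.spark.sql.SparkSession.builder()
        .appName("large-input-count")
        .getOrCreate()
      spark.read.textFile(path).count()
    }
  }
}
```

The point isn't this particular helper; it's that the per-job scheduling cost Spark imposes dominates the actual work once the data fits comfortably in memory on one machine.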

datascience scala spark
