July 13, 2023, 9:57 p.m. | /u/rirhun

Machine Learning www.reddit.com

Hello all,

In our organization, we currently use Metaflow as our managed training infrastructure, with the `@batch` decorator for compute. Through Batch, we also have access to multi-node parallel jobs (the `@parallel` decorator) for distributed training, and we've used them to great effect for fine-tuning some LLMs.

We are now thinking of adopting Ray Train, since it seems to be gaining a lot of traction.

Wondering how Ray Train compares to Metaflow (AWS Batch) and …
