[D] Ray vs. AWS Batch for Distributed Training
July 13, 2023, 9:57 p.m. | /u/rirhun
Machine Learning www.reddit.com
In our organization, we are currently using Metaflow as our managed training infrastructure, leveraging the `@batch` decorator for compute. Through Batch, we also have access to multi-node parallel jobs (the `@parallel` decorator) for distributed training, and we've used it to great effect for fine-tuning some LLMs.
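For readers unfamiliar with the setup described above, a minimal sketch of a Metaflow flow combining `@batch` and `@parallel` might look like the following. This is an illustrative skeleton, not the poster's actual code: the flow name, resource numbers, and fan-out width are assumptions, and it requires Metaflow configured against AWS Batch to actually run.

```python
# Hypothetical Metaflow flow sketch: @parallel on a @batch step asks
# AWS Batch for a multi-node parallel job, one node per parallel task.
from metaflow import FlowSpec, step, batch, parallel, current


class FineTuneFlow(FlowSpec):

    @step
    def start(self):
        # num_parallel controls how many nodes the @parallel step gets
        # (4 is an arbitrary example value).
        self.next(self.train, num_parallel=4)

    @batch(gpu=1, memory=64000)  # illustrative resource request
    @parallel
    @step
    def train(self):
        # Each node can read its rank/world size from the current
        # context to initialize e.g. torch.distributed.
        rank = current.parallel.node_index
        world_size = current.parallel.num_nodes
        print(f"node {rank} of {world_size}")
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    FineTuneFlow()
```

Run with `python finetune_flow.py run` once a Batch compute environment is configured; without AWS credentials this skeleton only serves to show the decorator layering.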
We are now considering adopting Ray Train, since it is popular and gaining a lot of traction.
Wondering how Ray Train compares to Metaflow (AWS Batch) and …
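For comparison, the Ray Train equivalent of a multi-node job is typically a `TorchTrainer` with a `ScalingConfig`. A minimal sketch (worker count and training body are placeholder assumptions, and it requires a running Ray cluster):

```python
# Hypothetical Ray Train sketch: Ray starts num_workers processes
# across the cluster and wires up the torch.distributed process group.
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each worker executes this function; Ray Train has already set up
    # the distributed process group, so model/optimizer code goes here.
    pass


if __name__ == "__main__":
    ray.init()  # or ray.init(address="auto") on an existing cluster
    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    )
    result = trainer.fit()
```

The practical difference is where the orchestration lives: with Metaflow/Batch each node is a Batch job in a multi-node parallel group, while Ray Train schedules workers onto a long-lived (or autoscaled) Ray cluster.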