Oct. 29, 2023, 9:24 a.m. | /u/TimeInterview5482

Machine Learning www.reddit.com

I come from computer vision, where convnets are relatively small in size and parameter count yet perform quite well (e.g. the ResNet family, YOLO, etc.).

Now I am moving into NLP, and transformer-based architectures tend to be huge, so I have trouble fitting them in memory.
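
For context, the standard single-GPU workarounds (mixed precision, gradient accumulation, gradient checkpointing) only get me so far. A minimal sketch of what I mean, assuming the Hugging Face transformers library on top of PyTorch (the model name and toy data below are just placeholders):

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.gradient_checkpointing_enable()  # trade extra compute for less activation memory
model.cuda().train()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = GradScaler()  # mixed precision roughly halves activation memory
accum_steps = 8        # simulate a large batch with small micro-batches

# Toy data standing in for a real corpus
texts = ["an example sentence"] * 32
labels = torch.zeros(32, dtype=torch.long)

for step in range(0, len(texts), 4):  # micro-batches of 4
    enc = tokenizer(texts[step:step + 4], return_tensors="pt", padding=True).to("cuda")
    with autocast():
        loss = model(**enc, labels=labels[step:step + 4].cuda()).loss / accum_steps
    scaler.scale(loss).backward()
    if (step // 4 + 1) % accum_steps == 0:  # optimizer step every accum_steps micro-batches
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

Tricks like these let bert-base-sized models train on a single consumer GPU, but they clearly won't scale to the really big models, hence the question.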

What infrastructure do you use to train these models (GPT-2, BERT, or even bigger ones)? Cloud computing, HPC, etc.?

