Web: http://arxiv.org/abs/2110.08633

Jan. 26, 2022, 2:11 a.m. | Kabir Nagrecha, Arun Kumar

cs.LG updates on arXiv.org arxiv.org

Training deep learning (DL) models that do not fit into the memory of a
single GPU is a vexed process, forcing users to procure multiple GPUs to adopt
model-parallel execution. Unfortunately, sequential dependencies in neural
architectures often block efficient multi-device training, leading to
suboptimal performance. We present 'model spilling', a technique aimed at
models such as Transformers and CNNs to move groups of layers, or shards,
between DRAM and GPU memory, thus enabling arbitrarily large models to be
trained even …
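
Below is a minimal, illustrative PyTorch sketch of the spilling idea described in the abstract, not the authors' actual system: the model is partitioned into shards that live in DRAM, and each shard is moved onto the GPU only for its own forward step, then spilled back to free GPU memory. The names SpilledModel and shard_size are assumptions for illustration, and backward-pass handling and the scheduling optimizations the paper targets are omitted.

import torch
import torch.nn as nn

# Sketch of layer-group "spilling": only the active shard resides in GPU memory.
class SpilledModel(nn.Module):
    def __init__(self, layers, shard_size=4, device=None):
        super().__init__()
        # Group consecutive layers into shards that can be moved as a unit.
        self.shards = nn.ModuleList(
            nn.Sequential(*layers[i:i + shard_size])
            for i in range(0, len(layers), shard_size)
        )
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

    def forward(self, x):
        x = x.to(self.device)
        for shard in self.shards:
            shard.to(self.device)   # promote this shard from DRAM into GPU memory
            x = shard(x)
            shard.to("cpu")         # spill it back to DRAM, freeing GPU memory
        return x

# Usage: a stack of layers far larger than one GPU's memory still runs shard by shard.
layers = [nn.Linear(1024, 1024) for _ in range(64)]
model = SpilledModel(layers, shard_size=8)
out = model(torch.randn(2, 1024))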

arxiv, deep learning, model
