Nov. 22, 2023, 9:28 a.m. | /u/lexected

r/MachineLearning | www.reddit.com

**TL;DR:** Organize your neurons into a tree to get 78x faster inference (theoretical limit is 341x).

This was demonstrated on BERT-base, where this change preserved 96% of its downstream GLUE performance. For a quick comparison, DistilBERT offers 1.6x acceleration while preserving 97% of GLUE performance.
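For intuition, here is a minimal NumPy sketch of the conditional tree descent behind fast feedforward networks (FFFs), the mechanism the paper swaps in for BERT's dense feedforward layers. All names here (`fff_forward`, `w_in`, `b_in`, `w_out`) are made up for illustration, and this simplification uses ReLU where the paper uses GeLU and omits the soft routing used during training; the linked repo has the real, optimized implementation.

```python
import numpy as np

def fff_forward(x, w_in, b_in, w_out, depth):
    """Single-token inference through a fast-feedforward-style layer.

    The 2**depth - 1 neurons live in an implicit heap-ordered binary tree;
    only the `depth` neurons on one root-to-leaf path are ever evaluated,
    instead of all of them as in a dense feedforward layer.
    """
    y = np.zeros(w_out.shape[1])
    node = 0  # start at the root of the tree
    for _ in range(depth):
        c = x @ w_in[node] + b_in[node]   # this node's pre-activation
        y += max(c, 0.0) * w_out[node]    # ReLU'd contribution to the output
        # the sign of the pre-activation picks which child to descend into
        node = 2 * node + (2 if c > 0 else 1)
    return y

# Toy usage: depth 12 gives 2**12 - 1 = 4095 neurons but touches only 12
# per inference, hence the quoted theoretical limit of 4095 / 12 ≈ 341x.
rng = np.random.default_rng(0)
d_model, depth = 768, 12
n_nodes = 2**depth - 1
y = fff_forward(rng.standard_normal(d_model),
                rng.standard_normal((n_nodes, d_model)),
                rng.standard_normal(n_nodes),
                rng.standard_normal((n_nodes, d_model)),
                depth)
```

The gap between the theoretical 341x and the measured 78x comes down to how well conditional, branchy memory access can be implemented on current hardware and BLAS libraries, which is a point the paper itself discusses.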

This is a [HuggingFace Featured Paper from 11/21/2023](https://huggingface.co/papers/2311.10770).

Paper: [https://arxiv.org/abs/2311.10770](https://arxiv.org/abs/2311.10770)

Code: [https://github.com/pbelcak/UltraFastBERT](https://github.com/pbelcak/UltraFastBERT)

Model: [https://huggingface.co/pbelcak/UltraFastBERT-1x11-long](https://huggingface.co/pbelcak/UltraFastBERT-1x11-long)

Abstract:

>Language models only really need to use an exponential fraction of their neurons for individual inferences.
>
>As proof, we …
