[R] LongLoRA: New method extends LLaMA2 7B to 100k context length, 70B to 32k context length on a single 8× A100 machine
Sept. 22, 2023, 2:22 p.m. | /u/Successful-Western27
Machine Learning www.reddit.com
A new paper proposes [LongLoRA](https://arxiv.org/pdf/2309.12307.pdf), **a fine-tuning approach that can extend LLaMA2 7B to 100k context length and the 70B model to 32k context length on a single 8× A100 machine.**
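For readers unfamiliar with LoRA (which LongLoRA builds on): instead of updating a full pretrained weight matrix `W`, LoRA freezes `W` and trains only a low-rank update `B @ A`, which is what makes fine-tuning this cheap. A minimal NumPy sketch of the idea (the function name, shapes, and `alpha` scaling convention here are illustrative, not taken from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a LoRA adapter.

    x: (batch, d_in); W: (d_out, d_in) frozen pretrained weight;
    A: (r, d_in) and B: (d_out, r) are the only trainable matrices.
    The effective weight is W + (alpha / r) * B @ A.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Tiny demo: with B initialized to zero (the standard LoRA init),
# the adapted layer is exactly the frozen base layer.
d_in, d_out, r = 64, 32, 4
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))
x = rng.standard_normal((2, d_in))
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Because only `A` and `B` (rank `r`, here 4) receive gradients, the trainable parameter count drops from `d_in * d_out` to `r * (d_in + d_out)`.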
Here are my highlights from the paper:
The big one, of course: LongLoRA efficiently fine-tunes large language models for much longer context lengths.
Key points: …
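The paper's core efficiency trick is shifted sparse attention (S²-Attn): during fine-tuning, attention is computed only within local token groups, and half the attention heads have their tokens shifted by half a group so information still flows across group boundaries. A minimal NumPy sketch of just the shift-and-group step (the function name and the `(seq_len, heads, head_dim)` layout are my own illustration, not the paper's implementation):

```python
import numpy as np

def s2_attn_groups(x, group_size, shift=True):
    """Prepare tokens for group-local attention, S2-Attn style.

    x: (seq_len, num_heads, head_dim). Half the heads are rolled
    by half a group, so their groups straddle the boundaries of the
    other heads' groups; attention then runs within each group.
    Returns (num_groups, group_size, num_heads, head_dim).
    """
    seq_len, num_heads, head_dim = x.shape
    assert seq_len % group_size == 0, "seq_len must divide into groups"
    out = x.copy()
    if shift:
        half = num_heads // 2
        # Roll the second half of the heads by half a group along the
        # sequence axis; at inference the shift is undone / not needed.
        out[:, half:, :] = np.roll(out[:, half:, :], -(group_size // 2), axis=0)
    return out.reshape(seq_len // group_size, group_size, num_heads, head_dim)
```

Within each group, attention cost is quadratic only in `group_size`, not in the full sequence length, which is what lets a single 8× A100 node reach 100k-token training contexts.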