March 28, 2024, 9:21 p.m. | /u/kessa231

r/MachineLearning | www.reddit.com

Just skimmed the official LoRA implementation code and I'm curious about something.
For example, in their Embedding module they declare and use the LoRA parameters like this:

self.lora_A = nn.Parameter(self.weight.new_zeros((r, num_embeddings)))
self.lora_B = nn.Parameter(self.weight.new_zeros((embedding_dim, r)))
...
# (lora_B @ lora_A) is (embedding_dim, num_embeddings); transposing it matches
# the weight's (num_embeddings, embedding_dim) layout
self.weight.data -= (self.lora_B @ self.lora_A).transpose(0, 1) * self.scaling
...
# lora_A.T is (num_embeddings, r), so it can serve as an r-dim embedding table
after_A = F.embedding(
    x, self.lora_A.transpose(0, 1), self.padding_idx, self.max_norm,
    self.norm_type, self.scale_grad_by_freq, self.sparse
)
result += (after_A @ self.lora_B.transpose(0, 1)) * self.scaling
...
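
To make the shapes concrete, here is a minimal standalone sketch of that forward path (all dims here are made up for illustration, not taken from the repo):

import torch
import torch.nn.functional as F

num_embeddings, embedding_dim, r, scaling = 10, 8, 4, 1.0
weight = torch.randn(num_embeddings, embedding_dim)  # frozen embedding table
lora_A = torch.randn(r, num_embeddings)              # same shapes as in the repo
lora_B = torch.randn(embedding_dim, r)

x = torch.randint(0, num_embeddings, (2, 5))         # (batch, seq_len) of token ids

after_A = F.embedding(x, lora_A.transpose(0, 1))     # (2, 5, r)
update = after_A @ lora_B.transpose(0, 1) * scaling  # (2, 5, embedding_dim)
result = F.embedding(x, weight) + update
print(result.shape)                                  # torch.Size([2, 5, 8])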

So why don't they just declare them like this and use them without the transposes?

self.lora_A = nn.Parameter(self.weight.new_zeros((r, embedding_dim)))
self.lora_B …
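
Concretely, the transpose-free version I have in mind looks something like this (a toy sketch with made-up dims; lora_A_alt / lora_B_alt are just illustrative names, not from the repo). It stores the same two factors pre-transposed, which gives numerically identical results:

import torch
import torch.nn.functional as F

num_embeddings, embedding_dim, r, scaling = 10, 8, 4, 1.0
x = torch.randint(0, num_embeddings, (2, 5))

# official layout, with transposes in the forward pass
lora_A = torch.randn(r, num_embeddings)
lora_B = torch.randn(embedding_dim, r)
official = F.embedding(x, lora_A.T) @ lora_B.T * scaling

# transpose-free layout: same factors, stored pre-transposed
lora_A_alt = lora_A.T.clone()  # (num_embeddings, r), usable by F.embedding directly
lora_B_alt = lora_B.T.clone()  # (r, embedding_dim)
alt = F.embedding(x, lora_A_alt) @ lora_B_alt * scaling

print(torch.allclose(official, alt))  # True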
