June 4, 2024, 4:20 p.m. | /u/30299578815310

Machine Learning www.reddit.com

I recently saw these papers on fusing the probability distributions of LLMs that have different vocabularies:


[https://arxiv.org/pdf/2404.12715v2](https://arxiv.org/pdf/2404.12715v2) - this one uses a training-free method: transform each model's distribution into a shared space, average them, and then cast the result back into a distribution over a chosen model's vocabulary
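
To make that concrete, here's a rough sketch of what "transform to a shared space, average, cast back" could look like. This is my own simplification, not the paper's actual construction: I'm using token surface strings as the shared space, and `fuse_distributions` and its arguments are just illustrative names.

```python
import numpy as np

def fuse_distributions(dists, vocabs, target_vocab):
    """Fuse next-token distributions from models with different vocabularies.

    dists:        list of 1-D arrays of probabilities, one per model
    vocabs:       list of token-string lists, aligned with dists
    target_vocab: token-string list of the model we cast back to
    """
    # 1. Transform: move each distribution into a shared space keyed by the
    #    token's surface string (a crude stand-in for the paper's shared space).
    shared = []
    for dist, vocab in zip(dists, vocabs):
        space = {}
        for p, tok in zip(dist, vocab):
            space[tok] = space.get(tok, 0.0) + float(p)
        shared.append(space)

    # 2. Average the distributions in the shared space (missing tokens count as 0).
    all_tokens = set().union(*(s.keys() for s in shared))
    averaged = {t: np.mean([s.get(t, 0.0) for s in shared]) for t in all_tokens}

    # 3. Cast back: keep only tokens in the target vocab and renormalize.
    fused = np.array([averaged.get(tok, 0.0) for tok in target_vocab])
    total = fused.sum()
    return fused / total if total > 0 else fused

# Toy usage: two models that share some but not all tokens.
p_a = np.array([0.7, 0.2, 0.1])   # model A over ["the", "cat", "dog"]
p_b = np.array([0.5, 0.3, 0.2])   # model B over ["the", "dog", "bird"]
fused = fuse_distributions([p_a, p_b],
                           [["the", "cat", "dog"], ["the", "dog", "bird"]],
                           target_vocab=["the", "cat", "dog"])
```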


[https://openreview.net/forum?id=jiDsk12qcz](https://openreview.net/forum?id=jiDsk12qcz) - this one trains a target LLM on the probability distributions of other LLMs (with some special logic to handle vocabulary differences)
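
For the second approach, the training signal is presumably some form of distillation against the other LLMs' distributions. A minimal sketch, assuming the vocab-alignment step has already mapped a source model's probabilities onto the target model's vocabulary (the `distillation_loss` helper and the temperature are my assumptions, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, temperature=1.0):
    """KL(teacher || student) over an already-aligned vocabulary.

    student_logits: (batch, vocab) raw logits from the target LLM being trained
    teacher_probs:  (batch, vocab) probabilities from a source LLM, already
                    projected onto the target vocabulary
    """
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div expects log-probabilities for the input and probabilities for the target
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")
```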


The first paper is of particular interest because it claims to outperform the …
