Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation

Feb. 28, 2024, 5:44 a.m. | Jiachen Zhao, Wenlong Zhao, Andrew Drozdov, Benjamin Rozonoyer, Md Arafat Sultan, Jay-Yoon Lee, Mohit Iyyer, Andrew McCallum

cs.LG updates on arXiv.org

arXiv:2311.08640v3 Announce Type: replace-cross
Abstract: We study semi-supervised sequence generation tasks, where the few labeled examples are too scarce to finetune a model, and meanwhile, few-shot prompted large language models (LLMs) exhibit room for improvement. In this paper, we present the discovery that a student model distilled from a few-shot prompted LLM can commonly generalize better than its teacher to unseen examples on such tasks. We find that the student is able to learn a general pattern from the high-quality …


