June 4, 2024, 4:54 a.m. | Fangcong Yin, Xi Ye, Greg Durrett

cs.CL updates on arXiv.org

arXiv:2406.01563v1 Announce Type: new
Abstract: Recent work in interpretability shows that large language models (LLMs) can be adapted for new tasks in a learning-free way: it is possible to intervene on LLM representations to elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods. We introduce …

