June 21, 2024, 4:42 a.m. | Johannes Treutlein, Dami Choi, Jan Betley, Cem Anil, Samuel Marks, Roger Baker Grosse, Owain Evans

cs.CL updates on arXiv.org

arXiv:2406.14546v1 Announce Type: new
Abstract: One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across …
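To make the setup concrete, here is a hedged, hypothetical sketch (not code from the paper) of how an inductive out-of-context reasoning (OOCR) evaluation could be constructed: a latent rule is never stated explicitly, each finetuning document carries only a single implicit hint (one input/output pair), and held-out questions test whether a finetuned model can verbalize or apply the latent rule. The function names, file name, and data format below are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical OOCR-style data construction sketch (illustrative only).
import json
import random


def latent_rule(x: int) -> int:
    """The hidden rule; it never appears verbatim in any training document."""
    return 3 * x + 7


def make_training_documents(n_docs: int, seed: int = 0) -> list[dict]:
    """Each document reveals only one (input, output) pair -- an implicit hint."""
    rng = random.Random(seed)
    docs = []
    for _ in range(n_docs):
        x = rng.randint(-100, 100)
        docs.append({
            "prompt": f"What is f({x})?",
            "completion": str(latent_rule(x)),
        })
    return docs


def make_eval_questions() -> list[str]:
    """Held-out questions that require verbalizing or applying the latent rule."""
    return [
        "In one sentence, describe the function f in closed form.",
        "What is f(1000)?",  # outside the finetuning input range
        "Is f(x) ever negative? Explain briefly.",
    ]


if __name__ == "__main__":
    docs = make_training_documents(n_docs=500)
    with open("oocr_finetune.jsonl", "w") as fh:
        for doc in docs:
            fh.write(json.dumps(doc) + "\n")
    print(f"Wrote {len(docs)} finetuning documents.")
    print("Evaluation questions:", make_eval_questions())
```

The point of the sketch is the distributional structure: no single document states the rule, so answering the evaluation questions requires aggregating evidence that is only implicit and scattered across many documents.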

