Feb. 9, 2024, 5:47 a.m. | Fenia Christopoulou, Guchun Zhang, Gerasimos Lampouras

cs.CL updates on arXiv.org (arxiv.org)

Large pre-trained language models have recently been expanded and applied to programming language tasks with great success, often through further pre-training of a strictly natural-language model, where training sequences typically contain both natural and (linearised) programming language. Such approaches effectively map both modalities of the sequence into the same embedding space. However, programming language keywords (e.g. "while") often have very strictly defined semantics. As such, transfer learning from their natural language usage may not necessarily be beneficial to their code application …
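As a rough illustration of why one might want to separate the two modalities at the embedding level, the sketch below keeps a shared embedding table for ordinary tokens but routes code keywords (such as "while") through their own table, so their representations are not forced to inherit natural-language usage. The vocabulary split, the ids, and the `ModalityAwareEmbedding` module are illustrative assumptions, not the mechanism proposed in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary: a handful of token ids stand in for
# programming-language keywords ("while", "for", "def", ...).
VOCAB_SIZE = 32000
CODE_KEYWORD_IDS = torch.tensor([101, 102, 103])  # illustrative ids only


class ModalityAwareEmbedding(nn.Module):
    """Shared embedding table plus a separate table for code keywords.

    A minimal sketch of the general idea of modality-aware token
    representations, not the paper's actual method.
    """

    def __init__(self, vocab_size: int, dim: int, keyword_ids: torch.Tensor):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, dim)
        self.keyword = nn.Embedding(vocab_size, dim)
        mask = torch.zeros(vocab_size, dtype=torch.bool)
        mask[keyword_ids] = True
        self.register_buffer("is_keyword", mask)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        shared_vecs = self.shared(token_ids)
        keyword_vecs = self.keyword(token_ids)
        # Pick the keyword-specific vector wherever the token is a code keyword.
        use_kw = self.is_keyword[token_ids].unsqueeze(-1)
        return torch.where(use_kw, keyword_vecs, shared_vecs)


emb = ModalityAwareEmbedding(VOCAB_SIZE, dim=256, keyword_ids=CODE_KEYWORD_IDS)
ids = torch.tensor([[5, 101, 7]])  # a mix of natural-language and keyword ids
print(emb(ids).shape)  # torch.Size([1, 3, 256])
```

In this sketch the choice between tables is a hard lookup on token id; whether and how to share or split parameters across modalities is exactly the kind of design question the abstract raises.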

