Feb. 9, 2024, 5:47 a.m. | Fenia Christopoulou Guchun Zhang Gerasimos Lampouras

cs.CL updates on arXiv.org

Large pre-trained language models have recently been expanded and applied to programming language tasks with great success, often through further pre-training of a strictly natural-language model, where training sequences typically contain both natural and (linearised) programming language. Such approaches effectively map both modalities of the sequence into the same embedding space. However, programming language keywords (e.g. "while") often have very strictly defined semantics. As such, transfer learning from their natural language usage may not necessarily be beneficial to their code application …
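
The abstract is cut off before the method is described, but the core idea it motivates is that tokens shared between natural language and code (such as "while") may deserve representations that depend on the modality they appear in, rather than a single shared vector. Below is a minimal, illustrative sketch of that general idea, not the authors' implementation: a per-modality embedding table in PyTorch, with all class and variable names being assumptions made for illustration.

```python
# Illustrative sketch only: separate embedding tables per modality, so a
# token id used in code receives a different vector than in natural language.
import torch
import torch.nn as nn

class ModalityRelativeEmbedding(nn.Module):
    """Looks up token embeddings from a per-modality table.

    Modality ids: 0 = natural language, 1 = programming language.
    The two-table design and all names here are hypothetical.
    """

    def __init__(self, vocab_size: int, dim: int, num_modalities: int = 2):
        super().__init__()
        # One embedding table per modality; tokens occurring in both
        # modalities therefore get distinct vectors.
        self.tables = nn.ModuleList(
            nn.Embedding(vocab_size, dim) for _ in range(num_modalities)
        )

    def forward(self, token_ids: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, modality_ids: (batch, seq_len)
        out = self.tables[0](token_ids)
        for m, table in enumerate(self.tables[1:], start=1):
            mask = (modality_ids == m).unsqueeze(-1)
            out = torch.where(mask, table(token_ids), out)
        return out

# Example: the same token id maps to different vectors in NL vs. code context.
emb = ModalityRelativeEmbedding(vocab_size=100, dim=8)
tokens = torch.tensor([[5, 5]])    # same token twice
modality = torch.tensor([[0, 1]])  # natural language, then code
vectors = emb(tokens, modality)
print(torch.allclose(vectors[0, 0], vectors[0, 1]))  # False (in general)
```

This separation lets the code-side representation of a keyword be learned from its strictly defined programming semantics, rather than being anchored to its natural-language usage.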
