April 15, 2024, 4:47 a.m. | Rajarshi Haldar, Julia Hockenmaier

cs.CL updates on arXiv.org

arXiv:2404.08018v1 Announce Type: cross
Abstract: Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for the task of code summarization, the performance of these models on individual examples often depends on the amount of (subword) token overlap between the code and the corresponding reference natural language descriptions in the dataset. This token overlap arises because the reference descriptions in …

