Aug. 10, 2023, 4:44 a.m. | Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, David Wagner

cs.LG updates on arXiv.org

We propose and release a new dataset of vulnerable source code. We curate the
dataset by crawling security issue websites and extracting vulnerability-fixing
commits and source code from the corresponding projects. Our new dataset
contains 18,945 vulnerable functions spanning 150 CWEs and 330,492
non-vulnerable functions extracted from 7,514 commits. Our dataset covers 295
more projects than all previous datasets combined.
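
The function counts above imply a strongly imbalanced dataset. As a minimal sketch, the arithmetic can be checked in Python; the constant and function names here are illustrative assumptions, not part of the released dataset or its tooling, and only the two counts come from the abstract.

```python
# Counts as stated in the abstract; everything else is illustrative.
VULNERABLE_FUNCTIONS = 18_945
NON_VULNERABLE_FUNCTIONS = 330_492


def class_balance(positives: int, negatives: int) -> tuple[float, float]:
    """Return (fraction of positives, negatives per positive)."""
    total = positives + negatives
    return positives / total, negatives / positives


vulnerable_fraction, negatives_per_positive = class_balance(
    VULNERABLE_FUNCTIONS, NON_VULNERABLE_FUNCTIONS
)

print(f"vulnerable fraction: {vulnerable_fraction:.2%}")
print(f"non-vulnerable per vulnerable: {negatives_per_positive:.1f}")
```

With these counts, vulnerable functions make up about 5.4% of the 349,437 functions in total, or roughly 17 non-vulnerable functions for each vulnerable one.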


Combining our new dataset with previous datasets, we present an analysis of
the challenges and promising research directions of using deep learning …

