$\textit{L+M-24}$: Building a Dataset for Language + Molecules @ ACL 2024
March 5, 2024, 2:51 p.m. | Carl Edwards, Qingyun Wang, Lawrence Zhao, Heng Ji
cs.CL updates on arXiv.org
Abstract: Language-molecule models have emerged as an exciting direction for molecular discovery and understanding. However, training these models is challenging due to the scarcity of molecule-language pair datasets. At this point, datasets have been released which are 1) small and scraped from existing databases, 2) large but noisy and constructed by performing entity linking on the scientific literature, and 3) built by converting property prediction datasets to natural language using templates. In this document, we detail …