May 3, 2024, 4:15 a.m. | Youngmin Lee, Andrew S. I. D. Lang, Duoduo Cai, R. Stephen Wheat

cs.CL updates on arXiv.org

arXiv:2405.00949v1 Announce Type: cross
Abstract: This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks. Employing a uniform training methodology, we assessed three well-known models (RoBERTa, BART, and LLaMA) on their ability to predict molecular properties using the Simplified Molecular Input Line Entry System (SMILES) as a universal molecular representation format. Our comparative analysis involved pre-training 18 configurations of these models, with varying parameter sizes and dataset scales, followed by fine-tuning …
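
As an illustration of the kind of fine-tuning workflow the abstract describes (tokenize SMILES strings, then fit a pre-trained transformer to a molecular property), here is a minimal sketch. It is not the authors' code: it assumes the Hugging Face transformers library, uses the generic roberta-base checkpoint as a stand-in for the paper's own pre-trained models, and feeds two made-up SMILES/target pairs purely for illustration.

# Minimal sketch (not the paper's code): fine-tune a RoBERTa-style model
# on SMILES strings for molecular property regression.
# Assumes Hugging Face transformers; model name and data are placeholders.
import torch
from torch.utils.data import Dataset
from transformers import (
    RobertaTokenizerFast,
    RobertaForSequenceClassification,
    Trainer,
    TrainingArguments,
)

class SmilesRegressionDataset(Dataset):
    # Wraps SMILES strings and one numeric property value per molecule.
    def __init__(self, smiles, targets, tokenizer, max_length=128):
        self.encodings = tokenizer(
            smiles, truncation=True, padding="max_length", max_length=max_length
        )
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.targets[idx], dtype=torch.float)
        return item

# Placeholder data: two SMILES strings with made-up property values.
train_smiles = ["CCO", "c1ccccc1"]
train_targets = [0.46, 2.13]

# "roberta-base" stands in for the paper's own pre-trained checkpoints;
# num_labels=1 with problem_type="regression" gives a single-value regression head.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=1, problem_type="regression"
)

train_dataset = SmilesRegressionDataset(train_smiles, train_targets, tokenizer)

training_args = TrainingArguments(
    output_dir="smiles-finetune",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    logging_steps=1,
)

Trainer(model=model, args=training_args, train_dataset=train_dataset).train()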
