Feb. 2, 2024, 9:41 p.m. | Shaghayegh Sadeghi Alan Bui Ali Forooghi Jianguo Lu Alioune Ngom

cs.CL updates on arXiv.org arxiv.org

Purpose: Large Language Models (LLMs) like ChatGPT and LLaMA are increasingly recognized for their potential in the field of cheminformatics, particularly in interpreting Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs can decode SMILES strings into vector representations, providing a novel approach to understanding chemical graphs.
Methods: We investigate the performance of ChatGPT and LLaMA in embedding SMILES strings. Our evaluation focuses on two key applications: molecular property (MP) prediction and drug-drug …

analysis chatgpt comparative analysis cs.ai cs.cl cs.lg decode embedding embeddings language language models large language large language models line llama llms novel q-bio.bm simplified standard strings vector

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Software Engineer, Machine Learning (Tel Aviv)

@ Meta | Tel Aviv, Israel

Senior Data Scientist- Digital Government

@ Oracle | CASABLANCA, Morocco