May 9, 2024, 4:42 a.m. | Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das

cs.LG updates on arXiv.org arxiv.org

arXiv:2405.04912v1 Announce Type: cross
Abstract: Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFormer, an autoregressive molecular string generator that is trained on more than 1.1B chemical SMILES. GP-MoLFormer uses a 46.8M parameter transformer decoder model …

abstract arxiv cs.lg datasets foundation foundation model general generative language modeling paradigm physics.chem-ph property q-bio.bm relations scale strings success tasks tool training transformer transformer-based models transformers type

Senior Machine Learning Engineer

@ GPTZero | Toronto, Canada

ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)

@ HelloBetter | Remote

Doctoral Researcher (m/f/div) in Automated Processing of Bioimages

@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena

Seeking Developers and Engineers for AI T-Shirt Generator Project

@ Chevon Hicks | Remote

GN SONG MT Market Research Data Analyst 11

@ Accenture | Bengaluru, BDC7A

GN SONG MT Market Research Data Analyst 09

@ Accenture | Bengaluru, BDC7A