Feb. 1, 2024, 12:46 p.m. | Katharine Fisher Michael Herbst Youssef Marzouk

stat.ML updates on arXiv.org arxiv.org

Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density function theory (DFT) data. We report that multitask surrogates can predict at CC level accuracy with a reduction to data generation cost by over an order of magnitude. Of note, our approach allows the training set to …

cluster data data data data sources function physics.chem-ph process regression report stat.ml theory training

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US