Feb. 12, 2024, 5:46 a.m. | Siru Ouyang, Zhuosheng Zhang, Bing Yan, Xuan Liu, Yejin Choi, Jiawei Han, Lianhui Qin

cs.CL updates on arXiv.org

Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in chemistry. Unlike the simple chemistry tasks (e.g., molecule classification) addressed in previous studies, complex chemistry problems require not only vast knowledge and precise calculation but also compositional reasoning about rich, dynamic interactions among different concepts (e.g., temperature changes). Our study shows that even advanced LLMs, such as GPT-4, can easily fail in a variety of ways. Interestingly, the errors often stem not from …

cs.AI cs.CL

