LAiW: A Chinese Legal Large Language Models Benchmark
Feb. 20, 2024, 5:52 a.m. | Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang
cs.CL updates on arXiv.org
Abstract: General and legal domain LLMs have demonstrated strong performance on various LegalAI tasks. However, current evaluations of these LLMs in LegalAI are defined by computer science experts and lack consistency with the logic of legal practice, which makes it difficult to judge their practical capabilities. To address this challenge, we build LAiW, the first Chinese legal LLM benchmark based on the logic of legal practice. To align with the thinking …