Feb. 5, 2024, 6:48 a.m. | Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Han Ma, Yaofei Duan, Yanlan Kang, Songhua Yang, Baoyu Fan

cs.CL updates on arXiv.org

ChatGPT and other general large language models (LLMs) have achieved remarkable success, but they have also raised concerns about the misuse of AI-generated text. Existing AI-generated text detection models, such as those based on BERT and RoBERTa, are prone to in-domain over-fitting, leading to poor out-of-domain (OOD) detection performance. In this paper, we first collected Chinese text responses to questions from multiple domains, generated by human experts and 9 types of LLMs, and further created a dataset that mixed human-written …
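
As a rough illustration of the kind of detector the abstract refers to, here is a minimal sketch of a RoBERTa-based binary classifier for human vs. AI-generated Chinese text using Hugging Face Transformers. The checkpoint name, label assignment, and overall setup are illustrative assumptions, not the paper's actual method or dataset.

```python
# Minimal sketch (assumption): a RoBERTa-style binary classifier for
# human-written vs. AI-generated Chinese text, in the spirit of the
# BERT/RoBERTa detectors the abstract mentions. Checkpoint and labels
# are illustrative only; the head is untrained until fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "hfl/chinese-roberta-wwm-ext"  # an example public Chinese RoBERTa checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,  # assumed labels: 0 = human-written, 1 = AI-generated
)

def detect(text: str) -> str:
    """Classify a single text as human-written or AI-generated."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    label = logits.argmax(dim=-1).item()
    return "AI-generated" if label == 1 else "human-written"

if __name__ == "__main__":
    # Output is not meaningful until the classification head is fine-tuned
    # on a mixed human/LLM corpus such as the one described in the paper.
    print(detect("这是一个示例句子。"))
```

Such a detector is typically fine-tuned on in-domain data, which is exactly where the over-fitting and poor OOD generalization discussed in the abstract arise.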
