June 6, 2024, 4:52 a.m. | Farhad Nooralahzadeh, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, Kurt Stockinger

cs.CL updates on arXiv.org arxiv.org

arXiv:2406.03170v1 Announce Type: new
Abstract: The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains largely unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL pairs over 35 big databases with varying levels of complexity for both English and German.
We evaluate the performance of …

