Sept. 19, 2023, 8:10 a.m. | /u/InterviewIntrepid889

Posted in r/MachineLearning (www.reddit.com)

Paper: [https://arxiv.org/abs/2309.09400](https://arxiv.org/abs/2309.09400)

Hugging Face datasets: [https://huggingface.co/datasets/uonlp/CulturaX](https://huggingface.co/datasets/uonlp/CulturaX)

Abstract:

>The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, especially the recent state-of-the-art models, they are often not fully disclosed. Creating training data for high-performing …
