April 5, 2024, 11 p.m. | Sana Hassan

MarkTechPost www.marktechpost.com

State-of-the-art language models require vast amounts of text data for pretraining, often on the order of trillions of words, which poses a challenge for smaller languages that lack such extensive resources. While leveraging multilingual data is a logical solution, it is commonly viewed as problematic due to the “curse of multilingualism.” Despite some research exploring the benefits […]


The post Poro 34B: A 34B Parameter AI Model Trained for 1T Tokens of Finnish, English, and Programming languages, Including 8B Tokens of Finnish-English …
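Since Poro 34B is released as an open model, it can in principle be queried like any causal language model through Hugging Face transformers. The sketch below is a minimal, hedged illustration: the repository id "LumiOpen/Poro-34B" is an assumption not stated in this post, and the prompt is an arbitrary Finnish example chosen to reflect the model's Finnish-English training mix.

```python
# Minimal sketch: loading Poro 34B as a standard causal LM with transformers.
# The repo id "LumiOpen/Poro-34B" is assumed, not confirmed by the post above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LumiOpen/Poro-34B"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~68 GB of weights in bf16 for 34B params
    device_map="auto",           # shard across available GPUs (requires accelerate)
)

# A Finnish prompt; the model is also trained on English text and code.
prompt = "Suomen pääkaupunki on"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```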

