Oct. 1, 2022, 3:55 p.m. | /u/2600_yay

Natural Language Processing www.reddit.com

Does anyone have any leads on a large-ish 'simple English' dataset? Anything north of a few GB would be great. The [`simple.wikipedia.org`](https://simple.wikipedia.org/wiki/Main_Page) corpus is too small for my current needs. Also, given the crowd-sourced nature of Wikipedia articles, edits, etc., I'm not confident that the articles on that Wikipedia are actually written in a consistent 'simple English' style. (I don't yet have any qualitative evidence of this; it's purely observational from years of exploring `simple.wikipedia.org` alongside 'regular Wikipedia', but some …

languagetechnology wikipedia
