Feb. 7, 2024, 3:40 p.m.

Techmeme www.techmeme.com


Mozilla Foundation:

An in-depth look at Common Crawl, the 9.5PB web crawl archive dating back to 2008 and run by a small nonprofit, its role in generative AI, its dataset, and more

Common Crawl's Impact on Generative AI; Common Crawl's mission: Enabling others to work like Google; Common Crawl's data: Machine scale analysis

