Aug. 21, 2023, 10:04 p.m. | Wes Davis

The Verge - All Posts www.theverge.com


Illustration by Alex Castro / The Verge


The New York Times has blocked OpenAI’s web crawler, meaning that OpenAI can’t use content from the publication to train its AI models. If you check the NYT’s robots.txt page, you can see that the NYT disallows GPTBot, the crawler that OpenAI introduced earlier this month. Based on the Internet Archive’s Wayback Machine, it appears NYT blocked the crawler as early as August 17th.



Screenshot by Jay Peters / The Verge …

ai models alex check crawler gptbot illustration meaning openai publication robots the new york times web web crawler

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US