Aug. 21, 2023, 10:04 p.m. | Wes Davis

The Verge - All Posts www.theverge.com


Illustration by Alex Castro / The Verge


The New York Times has blocked OpenAI’s web crawler, meaning that OpenAI can’t use content from the publication to train its AI models. If you check the NYT’s robots.txt page, you can see that the NYT disallows GPTBot, the crawler that OpenAI introduced earlier this month. Based on the Internet Archive’s Wayback Machine, it appears the NYT blocked the crawler as early as August 17th.
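For readers who want to see how a robots.txt rule like this is evaluated, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not the NYT's actual file or anything OpenAI has published; the sketch only shows how a `Disallow: /` rule scoped to the GPTBot user agent would be read by a crawler that honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# this is NOT a copy of the NYT's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; any path on the site is covered by "Disallow: /".
gptbot_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", "https://example.com/article")
other_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")

print(f"GPTBot allowed: {gptbot_allowed}")
print(f"SomeOtherBot allowed: {other_allowed}")
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to download the file before calling `can_fetch()`.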



Screenshot by Jay Peters / The Verge …

