Public AI Training Datasets Are Rife With Licensing Errors | allainews.com

Nov. 8, 2023, 2 p.m. | Edd Gent

IEEE Spectrum (spectrum.ieee.org)

Large language models feed on big data from publicly available training sets, but most of the sets are of doubtful legal status.

The scope of the problem has been demonstrated by the newly launched Data Provenance Initiative, which brings together a multi-institutional team of machine-learning and legal experts led by researchers at the Massachusetts Institute of Technology and Cohere for AI, a nonprofit research lab created by the AI company Cohere.

The group audited more than 1,800 …