March 13, 2024, 7:28 a.m. | /u/lapurita

Machine Learning

I occasionally see people talking about different companies having valuable data, which gives them a competitive advantage when creating an LLM. For example, Twitter is often mentioned, but how do we know that Twitter data was not included in the dataset for GPT-4? I know for a fact that Twitter was hilariously easy to scrape before Elon bought it, so it wouldn't surprise me at all if OpenAI has every single tweet made up to 2022 in their datasets. Same …

