March 13, 2024, 7:28 a.m. | /u/lapurita

Machine Learning

I occasionally see people say that certain companies hold valuable data, which gives them a competitive advantage when creating an LLM. Twitter is often mentioned as an example, but how do we know that Twitter data was not included in the dataset for GPT-4? I know for a fact that Twitter was hilariously easy to scrape before Elon Musk bought it, so it wouldn't surprise me at all if OpenAI has every single tweet made up to 2022 in their datasets. Same …


