According to PANews, Elon Musk recently discussed the limitations of current AI models in a live conversation with Stagwell Chairman Mark Penn. Musk stated that AI training has nearly exhausted real-world data, claiming that the cumulative sum of human knowledge was effectively used up as training material last year. This view echoes that of former OpenAI Chief Scientist Ilya Sutskever, who suggested at the NeurIPS machine learning conference that the AI industry has reached a 'data peak,' necessitating a shift in model development strategies.
Musk highlighted synthetic data as a means to supplement real data, enabling AI to learn by generating its own data and assessing the results. This approach is already being adopted by tech giants such as Microsoft, Meta, OpenAI, and Anthropic. For instance, Microsoft's Phi-4 model and Google's Gemma model both combine real and synthetic data in training. Gartner has predicted that by 2024, approximately 60% of the data used in AI and analytics projects would be synthetically generated.
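To make the generate-and-self-assess idea concrete, the following is a minimal, hypothetical Python sketch. It is not drawn from any of the companies named above: generate_candidate() and score_candidate() are placeholder functions standing in for real model calls, and the loop simply keeps candidates whose self-assigned score clears a threshold.

    # Minimal sketch of a generate-and-self-assess loop for synthetic data.
    # generate_candidate() and score_candidate() are hypothetical stand-ins
    # for actual language-model calls; a real pipeline would replace them
    # with model inference and a proper grading step.

    import random

    SEED_PROMPTS = [
        "Explain photosynthesis in one sentence.",
        "Summarize the causes of the 2008 financial crisis.",
        "Describe how a hash table works.",
    ]

    def generate_candidate(prompt: str) -> str:
        """Hypothetical stand-in: a real system would sample a model completion."""
        return f"[model-generated answer to: {prompt}]"

    def score_candidate(prompt: str, answer: str) -> float:
        """Hypothetical stand-in: a real system would have the model (or a
        separate reward/critic model) grade the output, e.g. on a 0.0-1.0 scale."""
        return random.random()

    def build_synthetic_dataset(prompts, samples_per_prompt=4, threshold=0.7):
        """Generate several candidates per prompt and keep only those the
        self-assessment step rates at or above the threshold."""
        dataset = []
        for prompt in prompts:
            candidates = [generate_candidate(prompt) for _ in range(samples_per_prompt)]
            scored = [(score_candidate(prompt, c), c) for c in candidates]
            dataset.extend(
                {"prompt": prompt, "answer": c, "score": s}
                for s, c in scored if s >= threshold
            )
        return dataset

    if __name__ == "__main__":
        synthetic = build_synthetic_dataset(SEED_PROMPTS)
        print(f"Kept {len(synthetic)} synthetic examples for fine-tuning.")

Real pipelines differ mainly in how the scoring step is implemented, for example a separate reward model or rule-based checks, but the generate, filter, and retrain loop is the common shape.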
A key advantage of synthetic data is cost. AI startup Writer, for example, spent about $700,000 to develop its Palmyra X 004 model, which relies almost entirely on synthetic data; by comparison, a similarly sized OpenAI model reportedly cost around $4.6 million to develop. However, synthetic data also carries risks, including reduced model creativity, amplified output bias, and potential model collapse: if the data used to generate synthetic samples is itself biased, those biases propagate into and compound through the generated results.
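That compounding risk can be illustrated with a toy simulation, again assumed for illustration rather than taken from the article: treat a 'model' as a simple Gaussian distribution, refit it each generation only on the samples a biased filter keeps, and watch the mean drift while diversity shrinks.

    # Toy simulation of bias amplification and diversity loss when a model
    # is repeatedly retrained on a biased subset of its own outputs.
    # The "model" is just a 1-D Gaussian; the biased self-assessment keeps
    # only samples above the current mean.

    import random
    import statistics

    def run_generations(n_generations=5, samples_per_gen=10_000):
        mean, std = 0.0, 1.0  # generation-0 "model"
        for gen in range(1, n_generations + 1):
            samples = [random.gauss(mean, std) for _ in range(samples_per_gen)]
            kept = [x for x in samples if x > mean]  # biased filter
            mean, std = statistics.mean(kept), statistics.stdev(kept)
            print(f"generation {gen}: mean={mean:+.3f}, std={std:.3f}")

    if __name__ == "__main__":
        run_generations()

Running this shows the mean climbing and the standard deviation collapsing toward zero within a few generations, mirroring the concern that biased synthetic data compounds across training rounds.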