According to Deep Tide TechFlow, TechCrunch reported on January 9 that Musk, in a livestreamed conversation with Stagwell chairman Mark Penn, said the cumulative sum of human knowledge available as AI training data was essentially exhausted in 2024. The view echoes the 'data peak' argument made by OpenAI's former chief scientist Ilya Sutskever at the NeurIPS conference in December.

Musk believes that synthetic data will be a key path for the future development of AI. Tech giants including Microsoft, Google, Meta, OpenAI, and Anthropic have already adopted synthetic data in training their flagship AI models: Microsoft's newly open-sourced Phi-4, Google's Gemma models, Anthropic's Claude 3.5 Sonnet, and Meta's latest Llama series all use synthetic data for training or fine-tuning.

From a cost perspective, the AI startup Writer built its Palmyra X 004 model almost entirely on synthetic data at a cost of only about $700,000, far below the roughly $4.6 million development cost of a comparable OpenAI model. However, research shows that synthetic data can lead to model collapse: model outputs become less diverse and creative, and biases are exacerbated, because the biases and limitations of the original training data are amplified during the synthesis process. Gartner estimated that about 60% of the data used in AI and analytics projects in 2024 would be synthetically generated.
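The model-collapse effect mentioned above can be illustrated with a toy simulation (a hypothetical sketch, not the methodology of any study cited here): repeatedly fit a simple generative model to samples drawn from the previous generation of the model itself, and estimation noise compounds until the fitted distribution loses its spread, a stylized analogue of outputs becoming less diverse.

```python
# Toy illustration of model collapse with a 1-D Gaussian as the
# "generative model". Each generation fits mean/std to the current
# data, then replaces the data entirely with the model's own samples
# (a fully synthetic training regime). The small per-generation
# estimation error accumulates and variance drifts toward zero.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 50        # small sample size per generation exaggerates the effect
n_generations = 300

data = rng.normal(loc=0.0, scale=1.0, size=n_samples)  # "real" data
initial_std = data.std()

for _ in range(n_generations):
    mu, sigma = data.mean(), data.std()                 # fit the model
    data = rng.normal(loc=mu, scale=sigma, size=n_samples)  # train on own output

final_std = data.std()
print(f"std: {initial_std:.3f} -> {final_std:.3f}")  # spread collapses over generations
```

The numbers (50 samples, 300 generations) are chosen purely to make the collapse visible quickly; with larger samples the same drift occurs, just more slowly.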