At CES 2025, which opened this morning, NVIDIA founder and CEO Jensen Huang delivered a landmark keynote laying out the future of AI and computing: from the Token concept at the heart of generative AI, to the new Blackwell-architecture GPUs, to an AI-driven digital future, a speech whose cross-domain reach will profoundly influence the entire industry.
1) From Generative AI to Agentic AI: The beginning of a new era
The birth of the Token: as the core driving force of generative AI, tokens transform text into knowledge, breathe life into images, and open a new form of digital expression.
AI's evolutionary path: from perceptual AI to generative AI to Agentic AI capable of reasoning, planning, and acting, AI technology keeps reaching new heights.
The Transformer revolution: since its introduction in 2018, this architecture has redefined how computing is done, completely disrupting the traditional tech stack.
2) Blackwell GPU: Breaking performance limits
The next-generation GeForce RTX 50 series: based on the Blackwell architecture, with 92 billion transistors, 4,000 TOPS of AI performance, and 4 PetaFLOPS of AI compute, three times the previous generation.
The fusion of AI and graphics: for the first time, programmable shaders are combined with neural networks, introducing neural texture compression and neural material shading for stunning rendering results.
Affordable high performance: the RTX 5070 laptop delivers RTX 4090 performance at $1,299, pushing high-performance computing into the mainstream.
3) Multi-domain expansion of AI applications
Enterprise-grade AI agents: NVIDIA provides tools such as NeMo and Llama Nemotron to help enterprises build digital employees capable of autonomous reasoning, enabling intelligent management and services.
Physical AI: Through the Omniverse and Cosmos platforms, AI integrates into industrial, autonomous driving, and robotics fields, redefining global manufacturing and logistics.
Future computing scenarios: NVIDIA is bringing AI from the cloud to personal devices and within enterprises, covering all computing needs from developers to ordinary users.
Here are the main points of Jensen Huang's speech:
This is where intelligence is born, a brand-new kind of factory: a generator of tokens. Tokens are the building blocks of AI, opening a new frontier and marking the first step into an extraordinary world. Tokens transform words into knowledge and breathe life into images; they turn creativity into videos and help us navigate any environment safely; they teach robots to move like masters and inspire us to celebrate victories in new ways. In our moments of greatest need, tokens can even bring inner peace. They give the digital world meaning, helping us better understand the world, predict potential dangers, and find ways to heal the threats within us. They can make our visions come true and restore all that we have lost.
All of this began in 1993, when NVIDIA launched its first product, the NV1. We wanted to build a computer that could do what ordinary computers could not, making it possible to put a game console inside a PC. Then, in 1999, NVIDIA invented the programmable GPU, kicking off more than 20 years of technological advances that made modern computer graphics possible. Six years later, we launched CUDA, which exposed the GPU's programmability through a rich set of algorithms. The technology was hard to explain at first, but by 2012 the success of AlexNet had validated CUDA's potential and set off breakthrough progress in AI.
Since then, AI has developed at an astonishing rate. From perceptual AI to generative AI, to Agentic AI capable of perception, reasoning, planning, and action, AI's capabilities have continuously improved. In 2018, Google launched Transformers, and the world of AI truly took off. Transformers not only completely changed the landscape of AI but also redefined the entire computing field. We realized that machine learning is not just a new application or business opportunity, but a fundamental revolution in computing methods. From manually writing instructions to optimizing neural networks with machine learning, every layer of the tech stack has undergone significant changes.
Today, AI applications are ubiquitous. Whether understanding text, images, and sound, or translating amino-acid sequences and physics, AI can handle it. Almost every AI application can be reduced to three questions: what modality of information did it learn from? What modality does it translate? What modality does it generate? This fundamental framing drives every AI-powered application.
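To make that three-questions framing concrete, here is a tiny illustrative sketch in Python; the Modality enum and the example applications are our own illustration, not anything NVIDIA ships.

```python
# Huang's framing: every AI application maps modalities to modalities.
from enum import Enum, auto

class Modality(Enum):
    TEXT = auto()
    IMAGE = auto()
    VIDEO = auto()
    AUDIO = auto()
    AMINO_ACIDS = auto()
    ROBOT_ACTIONS = auto()

# (modality learned from, modality translated from, modality generated)
applications = {
    "chatbot":          (Modality.TEXT, Modality.TEXT, Modality.TEXT),
    "image captioning": (Modality.IMAGE, Modality.IMAGE, Modality.TEXT),
    "protein design":   (Modality.AMINO_ACIDS, Modality.TEXT, Modality.AMINO_ACIDS),
    "robot imitation":  (Modality.VIDEO, Modality.VIDEO, Modality.ROBOT_ACTIONS),
}

for name, (learned, source, target) in applications.items():
    print(f"{name}: learned {learned.name}, "
          f"translates {source.name} -> generates {target.name}")
```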
All of these achievements are inseparable from GeForce. GeForce brought AI to the masses, and now AI is coming home to GeForce. With real-time ray tracing, we can render graphics with stunning fidelity. With DLSS frame generation, AI can even predict frames that were never rendered. Of 33 million pixels, only 2 million are actually computed; the rest are predicted and generated by AI. This remarkable technique shows the power of AI, making computing far more efficient and pointing to endless possibilities ahead.
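The pixel numbers quoted above can be sanity-checked with a little arithmetic. A minimal sketch, assuming the 33 million pixels refer to one rendered 4K frame plus three AI-generated frames (the frame split is our assumption; the 2-million and 33-million figures are from the keynote):

```python
# Back-of-the-envelope check of the DLSS pixel claim.
width, height = 3840, 2160          # 4K resolution
frames = 4                          # 1 rendered + 3 generated (assumed)
total_pixels = width * height * frames
computed_pixels = 2_000_000         # figure quoted in the keynote

print(f"total pixels:   {total_pixels / 1e6:.1f} M")              # ~33.2 M
print(f"computed share: {computed_pixels / total_pixels:.1%}")    # ~6%
```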
This is why so many amazing things are happening now. We have driven the development of AI with GeForce, and now AI is completely revolutionizing GeForce. Today, we announce the next generation product—the RTX Blackwell family. Let’s take a look together.
This is the all-new GeForce RTX 50 series, based on the Blackwell architecture. This GPU is a performance monster: 92 billion transistors, 4,000 TOPS of AI performance, and 4 PetaFLOPS of AI compute, three times the previous-generation Ada architecture. All of this is needed to generate the stunning pixels I just showed. It also delivers 380 ray-tracing TFLOPS, for the most beautiful possible image quality on the pixels that must be computed, along with 125 shader TFLOPS. The card uses Micron GDDR7 memory running at 1.8 TB/s, double the previous generation.
We can now combine AI workloads with computer graphics workloads. An extraordinary feature of this generation of products is that programmable shaders can also handle neural networks. This has led us to invent neural texture compression and neural material shading. These technologies learn textures and compression algorithms through AI, ultimately generating stunning image effects that only AI can achieve.
Even in mechanical design, this graphics card is a marvel. It features a dual-fan design, and the entire graphics card resembles a giant fan, with the internal voltage regulation module being state-of-the-art. Such outstanding design is entirely attributed to the efforts of the engineering team.
Next, the performance comparison. The well-known RTX 4090, priced at $1,599, is the core investment of a home PC entertainment center. The RTX 50 series now delivers that class of performance starting at just $549: the RTX 5070 matches the RTX 4090, and the flagship RTX 5090 doubles the RTX 4090's performance.
Even more astonishingly, we have put this high-performance GPU into laptops. The RTX 5070 laptop is priced at $1299, yet has the performance of the RTX 4090. This design integrates AI and computer graphics technology, achieving high efficiency and high performance.
The future of computer graphics is neural rendering, the fusion of AI and computer graphics. The Blackwell series can even deliver this in laptops just 14.9 mm thick; the entire lineup, from the RTX 5070 to the RTX 5090, fits into ultra-thin laptops.
GeForce has driven the popularization of AI, and now AI is completely transforming GeForce. This is a mutual promotion of technology and intelligence, and we are moving towards a higher realm.
The three Scaling Laws of AI
Next, let's talk about the development direction of AI.
1) Pre-training Scaling Law
The AI industry is scaling up at an accelerating pace, driven by a powerful empirical rule known as the 'Scaling Law': the more training data, the larger the model, and the more compute invested, the more capable the model becomes. This rule has been validated again and again by researchers and practitioners.
Data growth itself is accelerating exponentially. It is estimated that in the coming years humanity will produce more data each year than in all of prior human history combined, and this data is increasingly multimodal, spanning video, images, and sound. This vast corpus can be used to train AI's foundational knowledge systems, laying a solid base of knowledge for AI.
2) Post-training Scaling Law
In addition, two other Scaling Laws are also emerging.
The second Scaling Law is the 'Post-training Scaling Law,' which involves technologies such as reinforcement learning and human feedback. In this way, AI generates answers based on human queries and continuously improves from human feedback. This reinforcement learning system helps AI refine skills in specific areas through high-quality prompts, for example, becoming better at solving math problems or performing complex reasoning.
The future of AI is not just perception and generation; it is a process of continuous self-improvement and boundary-breaking, like having a mentor or coach who gives you feedback after each attempt. Through testing, feedback, and self-correction, AI can progress through similar reinforcement learning and feedback mechanisms. In the post-training phase, reinforcement learning combined with synthetic data generation resembles self-practice: AI can take on complex, verifiable problems, such as proving theorems or solving geometry problems, and continuously refine its answers through reinforcement learning. This post-training demands enormous computing power, but it can ultimately produce extraordinary models.
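A minimal, hypothetical sketch of that verify-and-improve loop follows. The model and verify functions are stand-ins rather than any vendor's API, and a real system would update the model's weights from the reward instead of merely recording it.

```python
# Sketch of post-training on verifiable problems: propose, verify, reward.
import random

def model(prompt: str, temperature: float) -> str:
    """Stand-in for a language model; returns a candidate answer."""
    return f"answer({prompt}, t={temperature:.2f}, r={random.random():.3f})"

def verify(prompt: str, answer: str) -> float:
    """Stand-in for a verifier (e.g., a theorem checker): 1.0 if correct."""
    return 1.0 if random.random() > 0.7 else 0.0

def post_train(prompts, steps=3) -> float:
    rewards = []
    for _ in range(steps):
        for p in prompts:
            a = model(p, temperature=0.8)
            r = verify(p, a)   # automated feedback replaces a human grader
            rewards.append(r)
            # a real system would update the model weights from r here
    return sum(rewards) / len(rewards)

print("mean reward:", post_train(["prove_theorem_1", "solve_geometry_2"]))
```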
3) Test-Time Scaling Law
The Test-Time Scaling Law is also gradually emerging. This law shows its unique potential when AI is actually in use: during inference, AI can dynamically allocate resources, no longer limited to optimizing parameters but instead deciding how to allocate compute in order to produce the high-quality answers required.
This process is more like deliberate reasoning than direct, one-shot inference. AI can break a problem into multiple steps, generate multiple candidate solutions, evaluate them, and finally select the best one. This kind of long-horizon reasoning significantly enhances model capability.
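One simple embodiment of test-time scaling is best-of-N sampling: spend more inference compute by drawing several candidate solutions and keeping the highest-scoring one. The sketch below is illustrative; the random scorer stands in for a learned reward model or verifier.

```python
# Best-of-N sampling: more inference compute -> better expected answers.
import random

def generate(prompt: str) -> str:
    return f"solution-{random.randint(0, 9999)}"   # stand-in for a model

def score(prompt: str, solution: str) -> float:
    return random.random()   # placeholder for a reward model or verifier

def best_of_n(prompt: str, n: int) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda s: score(prompt, s))

# Raising n is exactly the compute-allocation knob the law describes.
print(best_of_n("route the delivery trucks", n=8))
```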
We have watched this technology evolve, from ChatGPT to GPT-4 and now to today's Gemini Pro; all of these systems have advanced step by step through pre-training, post-training, and test-time scaling. Achieving these breakthroughs demands enormous computing power, which is the core value of NVIDIA's Blackwell architecture.
The latest on the Blackwell architecture
The Blackwell system is in full production, and its performance is astonishing. Today, every cloud service provider is deploying these systems, which are produced by 45 factories worldwide, supporting up to 200 configurations, including liquid cooling, air cooling, x86 architecture, and NVIDIA Grace CPU versions.
The core NVLink system itself weighs as much as 1.5 tons and contains 600,000 parts, roughly the complexity of 20 cars, connected by 5,000 cables totaling 2 miles of copper wire. The entire manufacturing process is extremely complex, but the goal is to meet the ever-expanding demand for computing.
Compared with the previous-generation architecture, Blackwell improves performance per watt by 4x and performance per dollar by 3x. That means models can be trained at 3x the scale for the same cost, and what sits behind these improvements is the generation of AI tokens: the tokens consumed by ChatGPT, Gemini, and all kinds of AI services, which form the foundation of future computing.
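The cost claim is simple arithmetic; a quick sketch with illustrative numbers (the budget figure is arbitrary):

```python
# If perf/dollar improves 3x, the same budget buys 3x the training compute.
hopper_perf_per_dollar = 1.0      # normalized baseline (previous generation)
blackwell_perf_per_dollar = 3.0   # 3x claim from the keynote
budget = 10_000_000               # same dollar budget for both (assumed)

hopper_flops = budget * hopper_perf_per_dollar
blackwell_flops = budget * blackwell_perf_per_dollar
print(f"compute at equal cost: {blackwell_flops / hopper_flops:.0f}x")  # 3x
```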
On this basis, NVIDIA is advancing a new computing paradigm, neural rendering, which perfectly merges AI with computer graphics. Seventy-two Blackwell GPUs working as one form the world's largest chip-scale system, delivering up to 1.4 ExaFLOPS of AI floating-point performance with a staggering 1.2 PB/s of memory bandwidth, equivalent to the sum of all global internet traffic. This supercomputing capability lets AI handle more complex reasoning tasks while significantly reducing costs, laying the groundwork for more efficient computing.
AI Agent System and Ecosystem
Looking to the future, the reasoning process of AI is no longer a simple single-step response but is closer to 'internal dialogue.' Future AI will not only generate answers but will also reflect, reason, and continuously optimize. With the increased generation rate of AI tokens and reduced costs, the quality of AI services will significantly improve, meeting a broader range of application needs.
To help enterprises build AI systems capable of autonomous reasoning, NVIDIA provides three key building blocks: NVIDIA NeMo, NIM AI microservices, and acceleration libraries. By packaging complex CUDA software and deep-learning models into containerized services, enterprises can deploy these AI models on any cloud platform and rapidly develop domain-specific AI agents, such as service tools that support enterprise management or digital employees that interact with users.
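For flavor, here is a hedged sketch of what calling one such containerized model service can look like. NIM microservices expose an OpenAI-compatible HTTP API, but the endpoint, port, and model name below are illustrative assumptions for a hypothetical local deployment, not guaranteed defaults.

```python
# Calling a locally deployed, OpenAI-compatible model microservice.
import json
import urllib.request

payload = {
    "model": "llama-3.1-nemotron-70b-instruct",   # hypothetical deployed model
    "messages": [{"role": "user", "content": "Summarize today's tickets."}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed local endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```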
These models open up new possibilities for enterprises, not only lowering the development threshold for AI applications but also pushing the entire industry to take solid steps toward Agentic AI (autonomous AI). Future AI will become digital employees, easily integrated into enterprise tools like SAP and ServiceNow, providing intelligent services to customers in various environments. This is the next milestone in AI expansion and the core vision of NVIDIA's technology ecosystem.
Then there is training and evaluation. In the future, these AI agents will essentially work alongside your employees, completing tasks for you as digital labor. Introducing these specialized agents into your company is therefore like onboarding a new employee. We provide toolkits that help these AI agents learn the company's unique language, vocabulary, business processes, and ways of working. You give them examples of the desired work output, they attempt to generate responses, and you provide feedback and evaluation, and so on. You also set guardrails, stating clearly what actions they may not perform and what they may not say, and controlling what information they can access. This entire digital-employee pipeline is called NeMo. In a sense, every company's IT department will become the HR department for AI agents.
Today, IT departments manage and maintain a large amount of software; in the future, they will manage, nurture, onboard, and improve a large number of digital agents serving the company.
In addition, we provide many open-source blueprints for the ecosystem to use, which users can freely modify, covering all kinds of agents. Today we also announced something very cool and smart: a new family of models based on Llama, the NVIDIA Llama Nemotron language foundation model series.
Llama 3.1 is a phenomenal model. Downloads of Meta's Llama 3.1 number in the hundreds of millions, and it has spawned roughly 60,000 derivative models. This is one of the core reasons almost every enterprise and industry has started working on AI. We realized that Llama models could be fine-tuned even better for enterprise use cases, so, drawing on our expertise and capabilities, we fine-tuned them into the Llama Nemotron family of open models.
These models come in different sizes: the small models (Nano) respond quickly; the mainstream super models (Super) cover general use; and the ultra-large models (Ultra) can serve as teacher models, evaluating other models, generating answers and judging their quality, or acting as teachers for knowledge distillation. All of these models are online now.
These models perform excellently, ranking near the top in areas such as conversation, instruction following, and information retrieval, making them very well suited to AI-agent workloads worldwide.
Our collaboration with the ecosystem is also very close, including ServiceNow, SAP, and Siemens in industrial AI. Companies such as Cadence and Perplexity are doing excellent work: Perplexity has disrupted search, while Codeium builds AI assistants for software engineers. There are 30 million software engineers worldwide, and AI assistants will greatly boost their productivity, making this the next huge application area for AI services. There are one billion knowledge workers globally, and AI agents may be the next robotics industry, with trillion-dollar potential.
AI Agent Blueprint
Next, I will showcase some AI Agent blueprints completed in collaboration with partners.
AI agents are the new digital workforce, capable of assisting or replacing humans in completing tasks. NVIDIA's Agentic AI building blocks, NIM pretrained models, and the NeMo framework help organizations easily develop and deploy AI agents, which can be trained as domain-specific task experts.
Here are four examples:
Research Assistant Agent: reads complex documents such as lectures, journals, and financial reports, and generates interactive podcasts for easier learning;
Software Security AI Agent: helps developers continuously scan software for vulnerabilities and recommends appropriate fixes;
Virtual Laboratory AI Agent: accelerates compound design and screening, quickly identifying promising drug candidates;
Video Analytics AI Agent: built on NVIDIA's Metropolis blueprint, analyzes data from billions of cameras to produce interactive search, summaries, and reports, for example monitoring traffic flow and facility processes and suggesting improvements.
The arrival of the physical AI era
We hope to bring AI from the cloud to every corner, including inside companies and onto personal PCs. NVIDIA is working to turn Windows WSL 2 (Windows Subsystem for Linux 2) into the preferred platform for AI support, letting developers and engineers more conveniently use NVIDIA's AI technology stack, including language models, image models, animation models, and more.
Additionally, NVIDIA launched Cosmos, the first physical world foundation model development platform, focusing on understanding the dynamic characteristics of the physical world such as gravity, friction, inertia, spatial relationships, and causality. It can generate videos and scenes that comply with physical laws, widely applied in training and validation for robotics, industrial AI, and multimodal language models.
Cosmos provides physical simulation by connecting to NVIDIA Omniverse, generating realistic simulation results. This combination is core technology for developing robotics and industrial applications.
NVIDIA's industrial strategy is based on three computing systems:
DGX systems for training AI;
AGX systems for deploying AI;
Digital twin systems for reinforcement learning and AI optimization;
Through the collaborative work of these three systems, NVIDIA has promoted the development of robotics and industrial AI, building a digital world for the future; rather than being a three-body problem, we have a 'three-computer' solution.
Let me show you three examples of NVIDIA's vision for robotics.
1) Applications of industrial visualization
Currently, there are millions of factories and hundreds of thousands of warehouses globally, forming the backbone of a $50 trillion manufacturing industry. In the future, all of it needs to be software-defined, automated, and infused with robotics. We are working with KION, a leading global supplier of warehouse automation solutions, and Accenture, the world's largest professional services provider with a deep focus on digital manufacturing, to create some very special solutions together. We go to market the way other software and technology platforms do, through developers and ecosystem partners, and more and more ecosystem partners are connecting to the Omniverse platform, because everyone wants to visualize the future of industry. Within that $50 trillion of global GDP, there is enormous waste and enormous opportunity for automation.
Here is an example of KION and Accenture's collaboration with us:
KION (a supply chain solutions company), Accenture (a global leader in professional services), and NVIDIA are bringing physical AI to the trillion-dollar warehouse and distribution center market. Managing efficient warehouse logistics means navigating a complex web of decisions influenced by constantly changing variables: daily and seasonal demand fluctuations, space constraints, labor supply, and the integration of diverse robots and automation systems. Today, predicting the operational key performance indicators (KPIs) of a physical warehouse is nearly impossible.
To solve these problems, KION is adopting Mega, an NVIDIA Omniverse blueprint, to build industrial digital twins for testing and optimizing robot fleets. First, KION's warehouse management solution assigns tasks to the industrial AI brains in the digital twin, such as moving goods from buffer locations to shuttle storage. The robot fleet executes those tasks in the simulated physical warehouse inside Omniverse, perceiving and reasoning to plan its next steps and act. The digital twin uses sensor simulation, so each robot brain can see the state after a task executes and decide its next action. Under Mega's precise tracking, the whole cycle continues while operational KPIs such as throughput, efficiency, and utilization are measured, all before anything is changed in the physical warehouse.
With NVIDIA, KION and Accenture are redefining the future of industrial autonomy.
In the future, every factory will have a digital twin fully synchronized with the physical factory. You can use Omniverse and Cosmos to generate a multitude of future scenarios; AI then picks the scenarios with the best KPIs and sets them as the constraints for the physical deployment and the AI's programming logic.
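In schematic form, that generate-scenarios-and-pick-the-best loop could look like the following sketch; the simulator stub stands in for an Omniverse/Mega rollout, and the configuration fields and KPI names are illustrative assumptions.

```python
# Simulate candidate warehouse configurations in a digital twin,
# measure KPIs, and keep the best one for physical deployment.
import random

def simulate(config: dict) -> dict:
    """Stand-in for a digital-twin rollout; returns measured KPIs."""
    return {
        "throughput": random.uniform(0.5, 1.0) * config["robots"],
        "utilization": random.uniform(0.6, 0.95),
    }

candidates = [{"robots": n, "layout": f"layout-{n}"} for n in range(4, 13)]
best = max(candidates, key=lambda c: simulate(c)["throughput"])
print("deploy configuration:", best)  # becomes the physical-warehouse plan
```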
2) Self-driving cars
The autonomous driving revolution has arrived. After years of development, the success of both Waymo and Tesla has proven the maturity of autonomous driving technology. Our solutions provide three types of computer systems for this industry: systems for training AI (such as DGX systems), systems for simulation testing and generating synthetic data (such as Omniverse and Cosmos), and in-car computer systems (such as AGX systems). Almost all major automotive companies globally are collaborating with us, including Waymo, Zoox, Tesla, as well as the world's largest electric vehicle company BYD. Companies like Mercedes, Lucid, Rivian, Xiaomi, and Volvo, which are set to launch innovative models, are also involved. Aurora is using NVIDIA technology to develop autonomous driving trucks.
Every year, 100 million cars are manufactured, and there are one billion cars on the global roads, totaling trillions of miles driven each year. These will gradually achieve high automation or full automation. This industry is expected to become the first robotics industry worth trillions of dollars.
Today we announce the next-generation in-vehicle computer, Thor. It is a general-purpose robotics computer, able to process massive amounts of data from cameras, high-resolution radar, lidar, and other sensors. Thor succeeds Orin, the industry standard, with 20 times its compute, and it is now in full production. In addition, NVIDIA's DriveOS is the first AI computing operating system certified to the highest functional safety standard, ISO 26262 ASIL-D.
Autonomous driving data factories
NVIDIA utilizes Omniverse AI models and the Cosmos platform to create autonomous driving data factories, significantly expanding training data through synthetic driving scenarios. This includes:
OmniMap: Integrating maps and geospatial data to build drivable 3D environments;
Neural Reconstruction Engine: Generating high-fidelity 4D simulation environments from sensor logs and creating scene variants for training data;
Edify 3DS: Searching or generating new assets from asset libraries to create scenes for simulation.
With these technologies, we have expanded thousands of driving scenarios into billions of miles of data for the development of safer, more advanced autonomous driving systems.
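The scale-up from thousands of scenarios to billions of miles is multiplicative; a back-of-the-envelope sketch with purely illustrative numbers (none of these figures are NVIDIA's):

```python
# Each recorded scenario is multiplied by synthetic variants
# (weather, lighting, traffic, sensor pose) to grow training mileage.
recorded_scenarios = 5_000        # assumed captured drives
variants_per_scenario = 50_000    # assumed synthetic permutations
miles_per_scenario = 5            # assumed average scenario length

synthetic_miles = recorded_scenarios * variants_per_scenario * miles_per_scenario
print(f"synthetic miles: {synthetic_miles / 1e9:.2f} B")  # ~1.25 B miles
```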
3) General Robotics
The era of general robotics is approaching, and the key to a breakthrough in this field is training. For humanoid robots, imitation data is relatively hard to obtain, but NVIDIA's Isaac GR00T offers a solution: it generates massive datasets through simulation and, combined with the multiverse simulation engine of Omniverse and Cosmos, supports policy training, validation, and deployment.
For example, developers can teleoperate robots through Apple Vision Pro, capturing data without a physical robot and teaching task motions in a risk-free environment. Omniverse's domain randomization and 3D-to-real scene expansion then grow these captures into exponentially larger datasets, providing vast resources for robot learning.
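Domain randomization itself is a simple idea: resample the simulated world's parameters every episode so a policy trained in simulation cannot overfit to any one configuration and transfers better to the messier real world. A minimal sketch, with parameter ranges chosen purely for illustration (not Isaac defaults):

```python
# Sample fresh physics and rendering parameters for each training episode.
import random

def sample_domain() -> dict:
    return {
        "friction":          random.uniform(0.4, 1.2),
        "mass_scale":        random.uniform(0.8, 1.2),
        "light_lux":         random.uniform(100, 2000),
        "camera_jitter_deg": random.uniform(0.0, 3.0),
    }

for episode in range(3):
    domain = sample_domain()
    # a real pipeline would rebuild the simulated scene from `domain`
    # and roll out the policy to collect one more training trajectory
    print(f"episode {episode}: {domain}")
```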
In summary, whether it is industrial visualization, autonomous driving, or general robotics, NVIDIA's technology is leading the future transformation in the fields of physical AI and robotics.
Finally, I have one more important thing to show you, and it all goes back to a project we started internally ten years ago called Project DIGITS, officially the Deep Learning GPU Intelligence Training System, or DIGITS for short.
Before the official release, we shortened the name to DGX to harmonize with the company's RTX, AGX, OVX, and other product lines. The launch of the DGX-1 truly changed the direction of AI development and marked a milestone in NVIDIA's push into AI.
The DGX-1 revolution
The original intent of the DGX-1 was to give researchers and startups an out-of-the-box AI supercomputer. Traditional supercomputers required users to build dedicated facilities and to design and construct complex infrastructure. The DGX-1, by contrast, was a supercomputer made for AI development that required none of that: ready to use out of the box.
I remember delivering the first DGX-1 in 2016 to a startup called OpenAI. Elon Musk, Ilya Sutskever, and many NVIDIA engineers were there, and together we celebrated its arrival. That machine significantly advanced AI computing.
Today, AI is everywhere, no longer confined to research institutions and startup labs. As I said at the beginning, AI has become a brand-new way of doing computing and building software. Every software engineer, every creative artist, even every ordinary computer user needs an AI supercomputer. But I always wished the DGX-1 could be a little smaller.
Latest AI supercomputer
Here is NVIDIA's latest AI supercomputer. It still belongs to Project DIGITS, and we are looking for a better name; suggestions are welcome. It is a truly amazing device.
This supercomputer runs NVIDIA's complete AI software stack, including DGX Cloud. It can serve as a cloud supercomputer, a high-performance workstation, or even a desktop analytics workstation. Most importantly, it is based on a new chip we developed in secret, codenamed GB10, the smallest Grace Blackwell we have ever built.
I have the chip here in my hand; let me show you what is inside. It was developed in collaboration with MediaTek, the world's leading SoC company. The CPU SoC is custom-built for NVIDIA and connects to the Blackwell GPU through NVLink chip-to-chip interconnect. This little chip is now in full production, and we expect the supercomputer to officially launch around May.
We even offer a 'double the compute' configuration: two of these devices can be linked via ConnectX with GPUDirect support. It is a complete supercomputing solution, able to meet all kinds of needs in AI development, analytics, and industrial applications.
In addition, Huang announced three new Blackwell system chips entering mass production, the world's first physical AI foundation model, and breakthroughs in three major robotics fields: agentic AI robots, humanoid robots, and self-driving cars.