At CES 2025, which opened this morning, NVIDIA founder and CEO Jensen Huang delivered a milestone keynote laying out the future of AI and computing. From the core concept of AI token generation, to the launch of the new Blackwell-architecture GPUs, to an AI-driven digital future, the keynote will have a profound, cross-domain impact on the entire industry.
1) From generative AI to agentic AI: The dawn of a new era
The birth of tokens: As the core driving force behind generative AI, tokens turn words into knowledge, breathe life into images, and open new forms of digital expression.
AI evolution path: From perception AI to generative AI, and now to agentic AI capable of reasoning, planning, and acting, AI keeps reaching new heights.
The Transformer revolution: Since its introduction in 2018, this technology has redefined how we compute, fundamentally overturning the traditional tech stack.
2) Blackwell GPU: Breaking performance limits
The new-generation GeForce RTX 50 series: based on the Blackwell architecture, with 92 billion transistors and 4,000 AI TOPS (4 PetaFLOPS of AI compute), three times the performance of the previous generation.
The fusion of AI and graphics: achieving for the first time the combination of programmable shaders and neural networks, launching neural texture compression and material shading technology, delivering stunning rendering effects.
Affordable high performance: an RTX 5070 laptop delivers RTX 4090-class performance at $1299, making high-performance computing more accessible.
3) Multi-domain expansion of AI applications
Enterprise-grade AI agents: NVIDIA provides tools such as NeMo and Llama Nemotron to help enterprises build digital employees capable of autonomous reasoning, enabling intelligent management and services.
Physical AI: Through the Omniverse and Cosmos platforms, AI extends into industrial, autonomous-driving, and robotics applications, redefining global manufacturing and logistics.
Future computing scenarios: NVIDIA is bringing AI from the cloud to personal devices and enterprises, covering all computing needs from developers to ordinary users.
The following are the main points from Jensen Huang's speech:
This is the birthplace of intelligence, a brand-new kind of factory: a generator of tokens. Tokens are the building blocks of AI, opening a new frontier and taking the first step into an extraordinary world. Tokens turn words into knowledge and breathe life into images; they turn creativity into video, helping us safely navigate any environment; they teach robots to move like masters and inspire us to celebrate victories in new ways. When we need it most, tokens can bring a sense of calm. They give digits meaning, helping us better understand the world, anticipate the dangers around us, and find cures for the threats within. They can bring our visions to life and restore what we have lost.
All this began in 1993, when NVIDIA launched its first product, the NV1. We wanted to build computers that could do things ordinary computers could not, making it possible to have a game console inside a PC. Then, in 1999, NVIDIA invented the programmable GPU, kicking off more than 20 years of progress that made modern computer graphics possible. Six years later we launched CUDA, opening the GPU's programmability to a rich set of algorithms. CUDA was initially hard to explain, but in 2012 the success of AlexNet validated its potential and propelled breakthrough developments in AI.
Since then, AI has advanced at an astonishing pace. From perception AI to generative AI, and now to agentic AI that can perceive, reason, plan, and act, AI's capabilities keep growing. In 2018, Google launched the Transformer, and the world of AI truly took off. The Transformer not only fundamentally changed the AI landscape; it redefined computing itself. We realized that machine learning is not just a new application or business opportunity but a fundamental revolution in how we compute. From hand-written instructions to machine-learned, optimized neural networks, every layer of the tech stack has changed dramatically.
Today, AI applications are everywhere. Whether it is understanding text, images, and sound, or translating amino-acid sequences and physics, AI can handle it all. Almost every AI application can be distilled into three questions: What modality of information did it learn from? What modality did it translate to? What modality does it generate? This fundamental input-to-output framing underlies every AI-driven application.
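To make that framing concrete, here is a small illustrative sketch in Python; the applications and modality labels are examples chosen for illustration, not a product list from the keynote:

```python
# Each AI application described by the "three questions" framing:
# what modality it learned from, and what modality it generates.
applications = {
    "chat assistant":     ("text",        "text"),
    "image generation":   ("text",        "image"),
    "speech recognition": ("audio",       "text"),
    "protein structure":  ("amino acids", "3D structure"),
    "video generation":   ("text",        "video"),
}

for app, (learned, generated) in applications.items():
    print(f"{app:20s} {learned:12s} -> {generated}")
```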
All of these achievements rest on GeForce. GeForce brought AI to the masses, and now AI is coming home to GeForce. With real-time ray tracing we can render graphics with stunning fidelity. Through DLSS, AI goes further with frame generation, predicting frames that were never rendered. Of 33 million pixels, only 2 million are computed; the rest are predicted and generated by AI. This remarkable technique shows how AI makes computation far more efficient, and hints at what is possible next.
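A rough reconstruction of that pixel arithmetic, assuming it refers to DLSS 4 at 4K (3840 x 2160, about 8.3 million pixels), with three frames AI-generated for every frame rendered and the rendered frame itself upscaled from roughly quarter resolution:

$$4 \times (3840 \times 2160) \approx 33.2\,\text{M pixels}, \qquad \frac{8.3\,\text{M}}{4} \approx 2\,\text{M computed} \;\Rightarrow\; \frac{2\,\text{M}}{33.2\,\text{M}} \approx 6\%$$

On those assumptions, only about one displayed pixel in sixteen is computed conventionally; the rest are inferred.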
This is why so many amazing things are happening right now. We have utilized GeForce to drive the development of AI, and now AI is fundamentally revolutionizing GeForce. Today, we announce the next-generation products—the RTX Blackwell family. Let’s take a look together.
This is the brand-new GeForce RTX 50 series, based on the Blackwell architecture. This GPU is a performance monster: 92 billion transistors, 4,000 TOPS of AI performance, and 4 PetaFLOPS of AI compute, three times the previous-generation Ada architecture. All of that exists to generate the stunning pixels I just showed. It also delivers 380 ray-tracing teraflops for the most beautiful, computation-heavy pixels, plus 125 shader teraflops. The card uses GDDR7 memory from Micron running at 1.8 TB/s, twice the previous generation.
We can now combine AI workloads with computer graphics workloads; an extraordinary feature of this generation of products is that programmable shaders can also handle neural networks. This has led us to invent Neural Texture Compression and Neural Material Shading. These technologies learn textures and compression algorithms through AI, ultimately generating stunning visual effects that only AI can achieve.
Even in mechanical design, this graphics card is a miracle. It features a dual-fan design, and the entire graphics card looks like a giant fan, with the internal voltage regulation module being state-of-the-art. Such an excellent design is entirely due to the efforts of the engineering team.
Next, the performance comparison. The well-known RTX 4090, at $1599, is the core investment of a home PC entertainment rig. The RTX 50 series now starts at just $549 with the RTX 5070, which matches RTX 4090 performance, and scales up through the lineup to the RTX 5090, which delivers twice the performance of the RTX 4090.
Even more astonishingly, we have put this high-performance GPU into laptops. The RTX 5070 laptop priced at $1299 has the performance of an RTX 4090. This design combines AI and computer graphics technology, achieving high efficiency and high performance.
The future of computer graphics will be neural rendering— the fusion of AI and computer graphics. The Blackwell series can even be realized in laptops that are only 14.9 millimeters thick, with the entire series from RTX 5070 to RTX 5090 adaptable for ultra-thin laptops.
GeForce has driven the popularity of AI, and now AI has completely transformed GeForce in return. This is a mutual promotion of technology and intelligence, as we move towards a higher realm.
Three types of AI Scaling Laws
Next, let's talk about the direction of AI development.
1) Pre-training Scaling Law
The AI industry is scaling rapidly, guided by a powerful empirical rule known as the scaling law. Repeatedly validated by researchers and industry, it says that as training data, model size, and computational power grow, model capability improves accordingly.
The speed of data growth is accelerating exponentially. It is estimated that in the coming years, humanity will produce more data annually than the total produced in all previous human history. This data is becoming multimodal, including forms such as video, images, and sounds. This vast amount of data can be used to train the foundational knowledge systems of AI, laying a solid knowledge foundation for AI.
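The keynote gives no formula, but one widely cited formulation of the pre-training scaling law from the literature (Hoffmann et al.'s Chinchilla work) models loss as a function of parameter count $N$ and training tokens $D$:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $E$ is the irreducible loss and $A$, $B$, $\alpha$, $\beta$ are fitted constants; loss falls predictably as both model and data scale.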
2) Post-training Scaling Law
In addition, two other types of Scaling Laws are emerging.
The second scaling law is the post-training scaling law, which involves techniques such as reinforcement learning from human feedback. Here, AI generates answers to human queries and improves continuously from human feedback. Such reinforcement learning systems use high-quality prompts to help AI refine skills in specific domains, for example becoming better at solving math problems or carrying out complex reasoning.
The future of AI is not just about perception and generation; it is a process of continuous self-improvement and breaking boundaries. It is akin to having a mentor or coach, providing feedback after you complete tasks. Through testing, feedback, and self-improvement, AI can also progress through similar reinforcement learning and feedback mechanisms. This post-training phase of reinforcement learning combined with synthetic data generation techniques is similar to a self-practice process. AI can face complex and verifiable challenges, such as proving theorems or solving geometric problems, continuously optimizing its answers through reinforcement learning. Although this post-training requires immense computing power, it ultimately creates extraordinary models.
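As a minimal sketch of that feedback loop, the toy Python below uses a stand-in "reward model" and a REINFORCE update to shift a two-option policy toward the preferred answer. All names and numbers are illustrative assumptions, not any production RLHF implementation:

```python
import math
import random

# Toy RLHF-style loop: the policy samples an answer, a stand-in "reward
# model" (trained from human feedback in real systems) scores it, and a
# REINFORCE step nudges the policy toward preferred answers.
ANSWERS = ["show the steps", "just the result"]
REWARDS = {"show the steps": 1.0, "just the result": 0.2}  # reward-model proxy

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [0.0, 0.0]
lr = 0.5
for _ in range(200):
    p = softmax(logits)
    i = random.choices(range(len(ANSWERS)), weights=p)[0]  # sample an answer
    reward = REWARDS[ANSWERS[i]]                           # feedback score
    baseline = sum(pj * REWARDS[a] for pj, a in zip(p, ANSWERS))
    advantage = reward - baseline
    for j in range(len(ANSWERS)):      # d log p(i) / d logit_j = 1[j=i] - p_j
        logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - p[j])

print(softmax(logits))  # probability mass shifts toward the preferred answer
```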
3) Test-Time Scaling Law
The test-time scaling law is also emerging. It shows its unique potential when AI is actually in use: during inference, AI can dynamically allocate resources, no longer limited to improving parameters, and instead decide how much computation to spend to produce the high-quality answers required.
This process resembles deliberate, step-by-step reasoning rather than a direct, one-shot response. AI can break a problem into multiple steps, generate several candidate solutions, evaluate them, and select the best. This long-form reasoning markedly improves model capability.
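A minimal sketch of that generate-evaluate-select pattern (best-of-N sampling, one common form of test-time scaling); the toy "generator" and "verifier" below are stand-ins, not any production system:

```python
import random

def generate_candidate(problem):
    # Stand-in for sampling an LLM: a noisy guess at the answer.
    return problem["truth"] + random.choice([-10, -1, 0, 0, 1, 10])

def verify(problem, answer):
    # Stand-in verifier: scores by closeness. Real verifiers might
    # re-check derivation steps or execute generated code instead.
    return -abs(problem["truth"] - answer)

problem = {"text": "17 * 24 = ?", "truth": 17 * 24}

# More samples means more inference-time compute and a better pick.
for n in (1, 4, 16, 64):
    best = max((generate_candidate(problem) for _ in range(n)),
               key=lambda a: verify(problem, a))
    print(f"N={n:2d} -> answer {best}")
```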
We have watched this technology evolve, from ChatGPT to GPT-4 and now Gemini Pro; all of these systems have progressed through pre-training, post-training, and test-time scaling. Achieving such breakthroughs demands immense computing power, which is the core value of NVIDIA's Blackwell architecture.
The latest on the Blackwell architecture
The Blackwell systems are now in full production, and their performance is astonishing. Today, every cloud service provider is deploying these systems, produced by 45 factories worldwide, supporting up to 200 configurations, including liquid cooling, air cooling, x86 architecture, and NVIDIA Grace CPU versions.
A core component, the NVLink rack system itself, weighs 1.5 tons and contains 600,000 parts, the complexity of roughly 20 cars, connected by 2 miles of copper wiring across 5,000 cables. Manufacturing it is extraordinarily complex, but the goal is to meet the ever-expanding demand for computing.
Compared with the previous-generation architecture, Blackwell delivers 4x performance per watt and 3x performance per dollar. In effect, at the same cost, the scale of model training can triple; and the metric behind all these improvements is AI token generation. Tokens are what ChatGPT, Gemini, and AI services of every kind consume and produce, and they form the basis of future computing.
On this foundation, NVIDIA is pushing a new computing paradigm, neural rendering, blending AI with computer graphics. Seventy-two Blackwell GPUs linked by NVLink act as the world's largest single chip, providing up to 1.4 ExaFLOPS of AI floating-point performance and an astonishing 1.2 PB/s of memory bandwidth, on the order of total global internet traffic. This supercomputing power lets AI handle more complex reasoning tasks while significantly reducing cost, laying the foundation for more efficient computing.
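As a sanity check on those figures, assuming the ExaFLOPS number is low-precision (FP4) AI compute spread evenly across the rack:

$$\frac{1.4\ \text{EFLOPS}}{72\ \text{GPUs}} \approx 19.4\ \text{PFLOPS per GPU}$$

which is consistent with the roughly 20 PFLOPS of FP4 AI compute commonly quoted per Blackwell GPU.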
AI Agent systems and ecosystems
Looking ahead, the reasoning process of AI will no longer be a simple one-step response but will be closer to 'internal dialogue.' Future AI will not only generate answers but also reflect, reason, and continuously optimize. As the generation rate of AI tokens increases and costs decrease, the quality of AI services will significantly improve, meeting broader application needs.
To help enterprises build AI systems capable of autonomous reasoning, NVIDIA provides three key layers of tooling: NVIDIA NeMo, NIM AI microservices, and acceleration libraries. By packaging complex CUDA software and deep-learning models into containerized services, enterprises can deploy these models on any cloud and rapidly develop domain-specific AI agents, such as service tools for enterprise management or digital employees that interact with users.
These models open new possibilities for enterprises, not only lowering the development threshold for AI applications but also pushing the entire industry to make solid steps towards agentic AI (autonomous AI). The future AI will become digital employees, seamlessly integrating into enterprise tools like SAP and ServiceNow, providing intelligent services to customers in various environments. This is the next milestone in AI expansion and the core vision of NVIDIA's technology ecosystem.
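Because NIM microservices expose an OpenAI-compatible endpoint, querying a containerized model deployed this way can look like the hedged sketch below; the base URL, port, and model id are illustrative assumptions about a local deployment, not fixed values:

```python
from openai import OpenAI

# Hedged sketch: a NIM container typically serves an OpenAI-compatible
# API, so any standard client can talk to it. The endpoint and model id
# below are placeholders for whatever you actually deploy.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-for-local",       # hosted endpoints require a real key
)

resp = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # example deployed model id
    messages=[{"role": "user",
               "content": "Summarize this quarter's support tickets."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```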
Training and evaluation systems. In the future, these AI agents will essentially work alongside your employees, completing tasks for you as a digital workforce. Bringing these specialized agents into your company is therefore like onboarding new employees. We provide toolkits that help AI agents learn your company's particular language, vocabulary, business processes, and workflows. You give them examples of the desired work output, they attempt to produce it, and you provide feedback, run evaluations, and so on. You also set guardrails, stating clearly what they must not do and must not say, and controlling what information they can access. This entire digital-employee pipeline is called NeMo. In a sense, every company's IT department will become the HR department for its AI agents.
Today, IT departments manage and maintain a vast amount of software; in the future, they will manage, train, onboard, and improve numerous digital agents to serve the company. Therefore, IT departments will gradually evolve into the HR departments for AI agents.
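One concrete, hedged illustration of those guardrails: NVIDIA's open-source NeMo Guardrails library lets you declare what an agent may not do or say. The config directory and the blocked-topic behavior below are illustrative assumptions, not a shipped configuration:

```python
from nemoguardrails import LLMRails, RailsConfig

# Hedged sketch: load a rails configuration (YAML plus Colang files)
# that declares blocked topics and allowed data access, then wrap the
# underlying LLM with it. "./hr_agent_config" is a hypothetical path.
config = RailsConfig.from_path("./hr_agent_config")
rails = LLMRails(config)

reply = rails.generate(messages=[
    {"role": "user", "content": "Share the salary of employee #4521."}
])
print(reply["content"])  # a blocked topic returns the configured refusal
```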
Additionally, we provide many open-source blueprints for the ecosystem, which users are free to modify. We have blueprints for many kinds of agents. Today we are also announcing something very cool and clever: a brand-new family of models built on Llama, the NVIDIA Llama Nemotron language foundation model family.
Llama 3.1 is a phenomenon. Meta's Llama 3.1 has been downloaded approximately 350,650,000 times and has spawned some 60,000 derivative models. It is one of the core reasons almost every enterprise and industry has begun exploring AI. Recognizing that Llama could be fine-tuned far better for enterprise use cases, we used our expertise and capabilities to tune it into the Llama Nemotron open model suite.
These models come in different sizes: small Nano models respond quickly; the mainstream Super model is a general-purpose workhorse; and the ultra-large Ultra model can serve as a teacher model, evaluating other models' answers, judging their quality, and acting as the source for knowledge distillation. All of these models are online now.
These models perform exceptionally well, ranking near the top on chat, instruction-following, and retrieval benchmarks, which makes them well suited to AI agent workloads worldwide.
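To make the teacher-student idea concrete, here is a toy Python sketch of knowledge distillation under stated assumptions (a three-option output and hand-picked teacher logits). It illustrates the pattern, not NVIDIA's pipeline:

```python
import math

# Toy knowledge distillation: the student is trained to match the
# teacher's softened output distribution. All numbers are illustrative.
def softmax(logits, T=1.0):
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

teacher_logits = [4.0, 1.5, 0.5]   # teacher's view of 3 answer options
student_logits = [0.0, 0.0, 0.0]
T, lr = 2.0, 1.0                   # temperature T softens the targets

for _ in range(500):
    t = softmax(teacher_logits, T)
    s = softmax(student_logits, T)
    # Gradient of KL(t || s) w.r.t. the student logits is (s - t) / T.
    for j in range(3):
        student_logits[j] -= lr * (s[j] - t[j]) / T

print([round(p, 3) for p in softmax(student_logits, T)])  # ~ teacher's dist
```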
We also work closely with the ecosystem: with ServiceNow, with SAP, and with Siemens on industrial AI. Companies like Cadence and Perplexity are doing excellent work as well. Perplexity is disrupting search, while coding tools such as Codeium target the roughly 30 million software engineers worldwide; AI assistants will dramatically raise developer productivity, the next huge application area for AI services. With a billion knowledge workers globally, AI agents may well be the next robotics industry, an opportunity likely worth trillions of dollars.
AI Agent Blueprint
Next, let's showcase some AI Agent blueprints completed in collaboration with partners.
AI Agents are the new digital workforce, able to assist or stand in for humans on tasks. NVIDIA's agentic AI building blocks, NIM pretrained models, and the NeMo framework help organizations easily develop and deploy AI Agents, which can be trained as experts for domain-specific tasks.
Here are four examples:
Research Assistant Agent: Capable of reading complex documents like lectures, journals, financial reports, etc., and generating interactive podcasts for learning;
Software Security AI Agent: Helping developers continuously scan for software vulnerabilities and prompt appropriate actions;
Virtual Laboratory AI Agent: Accelerating compound design and screening, quickly finding potential drug candidates;
Video Analysis AI Agent: Based on NVIDIA Metropolis Blueprint, analyzing data from billions of cameras to generate interactive searches, summaries, and reports. For example, monitoring traffic flow, facility processes, providing improvement suggestions, etc.;
The arrival of the physical AI era
We hope to bring AI from the cloud to every corner, including within companies and personal PCs. NVIDIA is working to transform Windows WSL 2 (Windows Subsystem for Linux) into the preferred platform for supporting AI. This will make it easier for developers and engineers to leverage NVIDIA's AI technology stack, including language models, image models, animation models, etc.
In addition, NVIDIA has launched Cosmos, the first development platform for physical-world foundation models, focused on understanding the dynamics of the physical world: gravity, friction, inertia, spatial relationships, and cause and effect. It can generate videos and scenes that obey physical laws, with broad applications in training and validating robots, industrial AI, and multimodal language models.
By connecting to NVIDIA Omniverse for physics-grounded simulation, Cosmos produces realistic, physically plausible results. This combination is the core technology for developing robotics and industrial applications.
NVIDIA's industrial strategy is based on three computing systems:
DGX systems for training AI;
AGX systems for deploying AI;
Digital twin systems for reinforcement learning and AI optimization;
Through the collaboration of these three systems, NVIDIA has propelled the development of robotics and industrial AI, building the future digital world; rather than a three-body problem, we have a 'three-computer' solution.
NVIDIA's robotics vision is best shown through three examples.
1) Applications of industrial visualization
Today there are millions of factories and hundreds of thousands of warehouses worldwide, the backbone of a $50 trillion manufacturing industry. All of it will need to become software-defined and automated, with robotics woven in. We are working with KION, a leading warehouse automation solutions provider, and Accenture, the world's largest professional services firm, with a focus on digital manufacturing, to create some very special solutions together. Our go-to-market, as with other software and technology platforms, runs through developers and ecosystem partners, and more and more ecosystem partners are connecting to the Omniverse platform, because everyone wants to visualize the future of industry, and in that $50 trillion of global GDP there is so much waste and so much opportunity for automation.
Let's look at this example of the collaboration between KION and Accenture:
KION (a supply chain solutions company), Accenture (a global leader in professional services), and NVIDIA are bringing physical AI into the trillion-dollar warehouse and distribution-center market. Running efficient warehouse logistics means navigating a complex web of decisions shaped by constantly shifting variables: daily and seasonal demand swings, space constraints, labor supply, and the integration of diverse robotic and automation systems. Today, predicting the key performance indicators (KPIs) of a physical warehouse's operations is nearly impossible.
To tackle these challenges, KION is adopting Mega, an NVIDIA Omniverse blueprint, to build industrial digital twins for testing and optimizing robot fleets. First, KION's warehouse management solution assigns tasks to the industrial AI brains inside the digital twin, such as moving goods from buffer locations into shuttle storage. The robot fleets carry out these tasks in the simulated warehouse environment in Omniverse, using perception and reasoning to plan their next moves and then act. The digital twin supplies simulated sensor data, so the robot brains can see the state after each action and decide what to do next. Under Mega's precise tracking, this loop runs continuously while operational KPIs such as throughput, efficiency, and utilization are measured, all before any change is made to the physical warehouse.
With NVIDIA's help, KION and Accenture are redefining the future of industrial autonomy.
In the future, each factory will have a digital twin that is fully synchronized with the actual factory. You can leverage Omniverse and Cosmos to generate numerous future scenarios, and AI will determine the optimal KPI scenarios and deploy them as constraints and AI programming logic for the actual factory.
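The loop just described (assign tasks, act in simulation, perceive through simulated sensors, measure KPIs) can be caricatured in a few lines of Python; every class and number below is an illustrative stand-in, not part of Mega:

```python
import random

class DigitalTwin:
    # Toy warehouse twin: tracks completed tasks and elapsed sim ticks.
    def __init__(self):
        self.completed, self.ticks = 0, 0
    def step(self, action):
        self.ticks += 1
        if action == "move_tote" and random.random() > 0.1:  # 10% congestion loss
            self.completed += 1
    def sensor_frame(self):
        return {"congestion": random.random()}  # stand-in for simulated sensors

def robot_policy(frame):
    # Perceive, then decide: wait out heavy congestion, otherwise work.
    return "wait" if frame["congestion"] > 0.8 else "move_tote"

twin = DigitalTwin()
for _ in range(1000):                     # run the what-if scenario
    twin.step(robot_policy(twin.sensor_frame()))

throughput = twin.completed / twin.ticks  # KPI measured entirely in simulation
print(f"simulated throughput: {throughput:.2%}")
```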
2) Autonomous vehicles
The autonomous driving revolution has arrived. After years of development, the successes of both Waymo and Tesla have proven the maturity of autonomous driving technology. Our solutions provide the industry with three types of computing systems: systems for training AI (like DGX systems), systems for simulating tests and generating synthetic data (like Omniverse and Cosmos), and in-vehicle computing systems (like AGX systems). Almost all major automotive companies globally are collaborating with us, including Waymo, Zoox, Tesla, and the world's largest electric vehicle company BYD. Companies like Mercedes, Lucid, Rivian, Xiaomi, and Volvo are also set to launch innovative models. Aurora is using NVIDIA technology to develop autonomous trucks.
Every year, 100 million cars are manufactured, with 1 billion cars on the global roads, covering a total mileage of trillions of miles each year. These will gradually achieve high levels of automation or full automation. This industry is expected to become the first robotic industry worth trillions of dollars.
Today, we announce the launch of the next-generation in-vehicle computer, Thor. It is a universal robotic computer capable of processing vast amounts of data from cameras, high-resolution radar, lidar, and other sensors. Thor is an upgrade of the current industry standard, Orin, with 20 times the computing power and is now in full production. Meanwhile, NVIDIA's Drive OS is the first AI computing operating system certified to meet the highest functional safety standards (ISO 26262 ASIL D).
Autonomous Driving Data Factory
NVIDIA uses Omniverse and the Cosmos platform to create autonomous-driving data factories, massively expanding training data with synthetic driving scenarios. This includes:
OmniMap: Integrating maps and geospatial data to construct drivable 3D environments;
Neural Reconstruction Engine: Generating high-fidelity 4D simulation environments using sensor logs, and creating scene variants for training data;
Edify 3DS: Searching or generating new assets from asset libraries to create scenes for simulation.
Through these technologies, we expand thousands of driving scenarios into billions of miles of data for the development of safer and more advanced autonomous driving systems.
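In spirit, fanning a few recorded drives out into many training variants looks like the sketch below; the parameter lists and log names are invented placeholders, not the actual pipeline:

```python
import itertools
import random

# Hypothetical variation axes for expanding recorded drives into many
# synthetic training scenarios (crude stand-in for the data factory).
weather = ["clear", "rain", "fog", "snow"]
time_of_day = ["dawn", "noon", "dusk", "night"]
traffic = ["light", "dense", "stop_and_go"]

base_logs = ["log_0001", "log_0002"]  # recorded sensor drives (stand-ins)

variants = [
    {"log": log, "weather": w, "time": t, "traffic": tr}
    for log, w, t, tr in itertools.product(base_logs, weather, time_of_day, traffic)
]
print(len(variants), "scenario variants from", len(base_logs), "drives")
print(random.choice(variants))
```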
3) General robotics
The era of general robotics is approaching, and the key to a breakthrough is training. For humanoid robots, imitation data is comparatively hard to collect, but NVIDIA's Isaac GR00T offers a solution: it generates vast datasets in simulation and, combined with the multiverse simulation engine built on Omniverse and Cosmos, supports policy training, validation, and deployment.
For example, developers can teleoperate robots through Apple Vision Pro, capturing demonstrations without a physical robot and teaching task motions in a risk-free environment. Using Omniverse's domain randomization and 3D-to-real scene scaling, this yields an exponentially growing dataset, a vast resource for robot learning.
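A hedged sketch of that multiplication, turning a single teleoperated demonstration into a much larger synthetic dataset via randomization; the demo format and noise scales are invented for illustration:

```python
import random

# One human demonstration: a short list of (gripper x, y, z) waypoints.
demo = [
    (0.10, 0.00, 0.30), (0.10, 0.00, 0.10), (0.25, 0.05, 0.10),
]

def randomize(traj, pos_noise=0.01):
    # Jitter waypoints and shift the scene's start pose so a single demo
    # covers many nearby situations (a crude domain randomization).
    dx, dy = random.uniform(-0.05, 0.05), random.uniform(-0.05, 0.05)
    return [(x + dx + random.gauss(0, pos_noise),
             y + dy + random.gauss(0, pos_noise),
             z + random.gauss(0, pos_noise)) for (x, y, z) in traj]

dataset = [randomize(demo) for _ in range(10_000)]
print(len(dataset), "synthetic trajectories from 1 demonstration")
```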
In summary, whether in industrial visualization, autonomous driving, or general robotics, NVIDIA's technology is leading the future transformation of physical AI and robotics.
Finally, I have one more important thing to show you, and all of it traces back to a project we started internally a decade ago called Project DIGITS: the Deep learning GPU Intelligence Training System, DIGITS for short.
Before launch, we shortened the name to DGX to harmonize with the company's RTX, AGX, OVX, and other product lines. The arrival of the DGX-1 truly changed the direction of AI development, a milestone in NVIDIA's AI journey.
The revolutionary DGX-1
The original intent of the DGX-1 was to give researchers and startups an out-of-the-box AI supercomputer. Previous supercomputers required users to build dedicated facilities and design complex infrastructure around them. The DGX-1 was a supercomputer purpose-built for AI development that required none of that: just plug it in and go.
I still remember delivering the first DGX-1 in 2016 to a startup called OpenAI. Elon Musk, Ilya Sutskever, and many NVIDIA engineers were there, and we celebrated its arrival together. That machine significantly advanced AI computing.
Today, AI is ubiquitous, no longer confined to research institutions and startup labs. As I said at the outset, AI has become a whole new way of doing computing and building software. Every software engineer, every creative artist, indeed everyone who uses a computer as a tool needs an AI supercomputer. But I always wished the DGX-1 were a little smaller.
The latest AI supercomputer
Here is NVIDIA's latest AI supercomputer. It is still part of Project DIGITS, and we are looking for a better name; suggestions are welcome. It is a truly astonishing device.
This supercomputer runs NVIDIA's complete AI software stack, including DGX Cloud. It can serve as a cloud supercomputer, a high-performance workstation, or even a desktop analytics workstation. Most importantly, it is based on a new chip we developed in secret, codenamed GB110, the smallest Grace Blackwell we make.
I have a chip here to show you its internal design. It was developed in collaboration with MediaTek, one of the world's leading SoC companies. The CPU SoC is custom-built for NVIDIA and connects to the Blackwell GPU over NVLink chip-to-chip interconnect. This little chip is now in full production. We expect the supercomputer to officially launch around May.
We even offer a 'double compute' configuration that links two of these devices over ConnectX networking with GPUDirect support. It is a complete supercomputing solution for AI development, analytics, and industrial applications.
Also announced today: three new Blackwell systems entering volume production, the world's first physical-AI foundation model, and breakthroughs across three robotics fronts: agentic AI robots, humanoid robots, and autonomous vehicles.