Has Nvidia (NVDA.O) hit a new roadblock before its Blackwell GPUs are even officially available? After production issues a few months ago, the artificial intelligence giant's Blackwell processors are now overheating when installed in high-capacity server racks, according to The Information.
The challenges have led to design changes and delays, raising concerns among major customers including Google (GOOGL.O), Meta Platforms (META.O) and Microsoft (MSFT.O) about timely deployment of Blackwell servers, the report said.
Citing people familiar with the matter, The Information reported that Blackwell GPUs designed for AI and high-performance computing (HPC) overheat in servers that house 72 processors, racks that can draw up to 120kW of power each.
As a result, Nvidia has reportedly revised its server rack design several times, because overheating not only degrades GPU performance but can also damage the hardware.
Customers such as Google, Meta and Microsoft rely on these GPUs to train their state-of-the-art large language models. An Nvidia spokesperson told Reuters that the company is working closely with cloud service providers and described the design adjustments as a routine part of the development process.
Such adjustments are common in large-scale technology rollouts, but they have caused delays that could push expected shipping timelines back further, Tom's Hardware reports.
According to Tom's Hardware, the final revision of Blackwell did not enter mass production until the end of October, with shipments expected to begin at the end of January. It remains to be seen whether the latest overheating issues will delay Blackwell shipments further.
This is by no means the first time Nvidia has run into problems with Blackwell. A few months ago, the GPUs were reported to have a design flaw that hurt processor yields. The flaw was linked to TSMC's (TSM.N) CoWoS advanced packaging and was eventually resolved by changing the GPU's mask.
However, Nvidia CEO Jensen Huang in October dismissed rumors that TSMC was to blame, stressing that TSMC had helped Nvidia recover from the problem and resume manufacturing at "incredible speed." He also described demand for Blackwell as "crazy."
Source: Jinshi Data