By Eugene Cheah

Compiled by: J1N, Techub News

Falling AI compute costs will spur a wave of startups innovating on cheap resources.

Last year, amid a tight supply of AI compute, H100 rentals peaked at $8 per hour; today the market is oversupplied and the price has fallen below $2. Companies that signed compute rental contracts early in the boom are reselling their reserved capacity to avoid letting it go to waste, and the market's broad shift toward open-source models has cut demand for training new ones. With H100 supply now far exceeding demand, renting is more cost-effective than buying, and buying new H100s is no longer a profitable investment.

A brief history of the AI race

GPU rental prices soared: the H100 started at about $4.70 per hour and peaked above $8, as founders raced to train their AI models in time to convince investors and close the next funding round.

ChatGPT launched in November 2022 on the A100 series of GPUs. In March 2023, NVIDIA released the new H100 series, promoted as 3x the performance of the A100 at only 2x the price.

This was hugely attractive to AI startups, because GPU performance directly determines the speed and scale of the models they can build. The H100's power meant these companies could develop faster, larger, and more efficient models than before, and perhaps even catch up with or surpass industry leaders like OpenAI, provided they had the capital to buy or rent enough H100s.

Given the H100's much-improved performance and the fierce competition in AI, many startups poured money into snapping up H100s to accelerate model training. The surge in demand drove rental prices from the initial $4.70 per hour to more than $8.

These startups were willing to pay high rents because they were racing to train models quickly, catch investors' attention in the next funding round, and secure the hundreds of millions of dollars needed to keep scaling.

For compute centers (farms) holding large numbers of H100s, rental demand was so strong it was "money at your doorstep": AI startups were desperate to rent H100s to train their models and even willing to pay rent in advance, so farms could lease out GPUs at $4.70 per hour (or more) on long-running commitments.

By these calculations, if a farm could keep renting GPUs out at this price, the payback period on an H100 (the time to recover its purchase cost) would be under 1.5 years, after which each GPU would generate more than $100,000 in net cash flow over the rest of its service life.
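A quick back-of-envelope check of that payback claim (a minimal sketch; the utilization figure is an illustrative assumption, not from the article, and operating costs are ignored, as in the article's rough framing):

```python
# Payback-period sketch for one H100 at the boom-era rental rate.
# Utilization is an illustrative assumption; opex is ignored.
CAPEX = 50_000       # H100 purchase price (article figure)
RATE = 4.70          # $/hour rental (article figure)
UTILIZATION = 0.85   # assumed fraction of hours actually rented

annual_revenue = RATE * 24 * 365 * UTILIZATION          # ~$35k/year
payback_years = CAPEX / annual_revenue
remaining_cashflow = annual_revenue * (5 - payback_years)  # 5-year lifespan

print(f"payback: {payback_years:.2f} years")               # ~1.4 years
print(f"cash flow after payback: ${remaining_cashflow:,.0f}")  # >$100k
```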

As demand for H100s and other high-performance GPUs kept surging, GPU farm investors saw huge margins: they not only embraced this business model but made even larger follow-on investments in GPUs to capture more profit.

(Image: "The Folly of Tulips", painted after the first speculative bubble in recorded history, in which tulip prices rose through 1634 and collapsed in February 1637.)

As demand for AI and big-data processing grew, demand for high-performance GPUs, especially NVIDIA's H100, surged. To support these compute-intensive workloads, companies worldwide initially invested roughly $600 billion in hardware and infrastructure: buying GPUs, building data centers, and so on. Supply-chain delays kept H100 prices high for most of 2023, above $4.70 per hour unless the buyer prepaid a large deposit. By early 2024, as more suppliers entered the market, the H100 rental price fell to about $2.85, and I began receiving promotional emails of all kinds, a sign of intensifying competition as supply grew.

While the initial H100 rental price ranged from $8 to $16 per hour, by August 2024 auction-based rates had fallen to $1 to $2 per hour, and market prices are on track to drop 40% or more per year, far below NVIDIA's projection that rates would hold at $4 per hour for four years. This rapid decline poses a financial risk to anyone who just bought new GPUs at high prices: they may never recoup their cost through leasing.

What is the return on capital for investing $50,000 in one H100?

Setting aside power and cooling costs, an H100 costs about $50,000 to buy, with an expected service life of five years. There are two main ways to lease it out: short-term on-demand rental, which commands higher prices but fluctuates, and long-term reservation, which earns less per hour but is stable. The following analysis works through both modes to see whether an investor can recover costs and turn a profit within five years.

Short-term on-demand rental

Rental price thresholds and the corresponding returns:

  • Above $2.85: beats the stock market's IRR; the investment is profitable.

  • Below $2.85: returns fall short of simply investing in the stock market.

  • Below $1.65: the investment is expected to lose money.

The "mixed price" model predicts that rents may fall to 50% of current prices in the next five years. If the rental price remains at $4.50 per hour, the return on investment (IRR) is more than 20%, which is profitable; but when the price drops to $2.85 per hour, the IRR is only 10% and the return is significantly reduced. If the price falls below $2.85, investment returns could even be lower than stock market gains, and when prices fall below $1.65, investors are at serious risk of losing money, especially for those who recently purchased an H100 server.

Note: the "blended price" is an assumption that the H100 rental price gradually declines to half its current level over the next five years. Even this is arguably optimistic, since market prices are currently falling by more than 40% per year, so building some decline into the estimate is only reasonable.
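To make the IRR math concrete, here is a minimal sketch of the blended-price calculation (the utilization figure and the zero-opex simplification are my assumptions, not the article's):

```python
# IRR sketch for a $50k H100 under the "blended price" assumption
# (rent declines linearly to 50% of the starting rate over 5 years).
# Requires: pip install numpy-financial
import numpy_financial as npf

HOURS_PER_YEAR = 24 * 365
CAPEX = 50_000        # purchase price of one H100 (article figure)
UTILIZATION = 0.80    # assumed fraction of hours actually rented

def irr_for(start_rate: float, years: int = 5) -> float:
    """IRR given a starting $/hr rate that decays linearly to 50% by year 5."""
    cashflows = [-CAPEX]
    for year in range(1, years + 1):
        rate = start_rate * (1 - 0.5 * year / years)  # blended-price decay
        cashflows.append(rate * HOURS_PER_YEAR * UTILIZATION)
    return npf.irr(cashflows)

for rate in (4.50, 2.85, 1.65):
    print(f"${rate:.2f}/hr -> IRR {irr_for(rate):6.1%}")
# Consistent with the article's thresholds: comfortably profitable at $4.50,
# marginal around $2.85, and a negative IRR (a loss) at $1.65.
```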

Long-term rental contracts (3 years and above)

During the AI boom, many established infrastructure providers drew on past experience, especially the boom-and-bust cycles of GPU rental prices in the early cryptocurrency and Ethereum proof-of-work era. So in 2023 they locked in profits by selling high-priced prepaid rental contracts running 3 to 5 years, typically charging more than $4 per hour and requiring 50% to 100% of the rent up front. As AI demand surged, especially in image generation, base-model companies signed these contracts at high prices to finish their target models quickly, sharpen their competitiveness, and be first to market on the latest GPU clusters. But once training finished, these companies no longer needed the GPUs; locked into contracts they could not easily exit, they chose to resell the leased capacity to recover part of the cost. The result is a large volume of resold GPU capacity on the market, swelling supply and weighing on rental prices and the supply-demand balance.

The current H100 value chain

Note: the value chain (also called value chain analysis or the value chain model) was proposed by Michael Porter in his 1985 book Competitive Advantage. Porter argued that for an enterprise to develop a unique competitive advantage and create higher added value in its products and services, its strategy should structure the business as a series of value-adding processes; that series of processes is the "value chain".

The H100 value chain runs from hardware all the way to AI inference models; its participants fall roughly into the following categories:

  • Hardware vendors working with Nvidia

  • Data Center Infrastructure Providers and Partners

  • Venture capital funds, large companies, and startups planning to build a base model (or that have already built one)

  • Capacity resellers: Runpod, SFCompute, Together.ai, Vast.ai, GPUlist.ai, etc.

The current H100 value chain spans hardware suppliers, data center providers, AI model developers, capacity resellers, and AI inference service providers. The main pressure on the market comes from two directions: capacity resellers continuously reselling or re-leasing idle H100 capacity, and the widespread adoption of "good enough" open-weight models such as Llama 3, which cuts demand for H100s. Together these have produced an H100 oversupply and downward pressure on prices.

Market Trends: The Rise of Open-Weight Models

Open-weight models are models whose weights are publicly distributed for free; despite often lacking a formal open-source license, they are widely used commercially.

Demand for these models is driven by two main factors: the emergence of open models at a scale comparable to GPT-4 (such as LLaMA 3 and DeepSeek-V2), and the maturation and widespread adoption of fine-tuned small (8-billion-parameter) and medium (70-billion-parameter) models.

As these open models mature, companies can easily adopt them to cover most AI application needs, especially inference and fine-tuning. They may trail proprietary models on some benchmarks, but they perform well enough for most commercial use cases, so as open-weight models spread, market demand for inference and fine-tuning is growing rapidly.

Open-weight models also offer three key advantages:

First, they are flexible: users can fine-tune a model for a specific domain or task, adapting it to different application scenarios. Second, they are reliable: the weights will not be updated without notice, unlike some proprietary models whose silent updates cause development headaches, so users can trust the model they deploy. Finally, they preserve security and privacy: enterprises can guarantee that their prompts and customer data never leak through a third-party API endpoint, reducing data-privacy risk. These advantages are driving the continued growth and broad adoption of open models, especially for inference and fine-tuning.

Demand shift for small and medium-sized model creators

Small and medium-sized model creators are companies or startups that lack the capability, or the intention, to train large base models (such as 70B-parameter models) from scratch. With the rise of open models, many have realized that fine-tuning an existing open model is far more cost-effective than training a new one from scratch, so more and more companies fine-tune rather than train their own. This greatly reduces demand for compute resources like the H100.

Fine-tuning is much cheaper than training from scratch: it requires far less compute than pretraining a base model. Training a large base model typically takes 16 or more H100 nodes, while fine-tuning usually needs only 1 to 4, as the rough comparison below illustrates. This industry-wide shift spares small and medium companies the need for large clusters, directly reducing their reliance on H100 compute.
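A back-of-envelope way to see the size of that gap (a sketch using the common ~6 x parameters x tokens estimate of transformer training FLOPs; the token counts and utilization figure below are illustrative assumptions, not from the article):

```python
# Back-of-envelope FLOPs comparison: pretraining vs. fine-tuning.
# Uses the common ~6 * params * tokens estimate of transformer training
# compute. Token counts and utilization are illustrative assumptions.

H100_FLOPS = 1e15 * 0.4  # ~1 PFLOP/s peak (dense BF16) at ~40% utilization

def h100_hours(params: float, tokens: float) -> float:
    """Approximate H100-hours to train a `params`-size model on `tokens` tokens."""
    total_flops = 6 * params * tokens
    return total_flops / H100_FLOPS / 3600

pretrain = h100_hours(params=70e9, tokens=15e12)  # 70B pretrain, ~15T tokens
finetune = h100_hours(params=70e9, tokens=1e9)    # 70B fine-tune, ~1B tokens
print(f"pretrain : {pretrain:,.0f} H100-hours")
print(f"fine-tune: {finetune:,.0f} H100-hours ({pretrain / finetune:,.0f}x less)")
```

Under these assumptions the pretrain comes out in the millions of H100-hours while the fine-tune is in the hundreds, which is why fine-tuners get by with 1 to 4 nodes.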

Additionally, investment in base-model creation has dried up. In 2023, many small and medium-sized companies tried to create new base models; today, unless a project brings genuine innovation (say, a better architecture or support for hundreds of languages), almost no new base-model projects get started. With open models as strong as Llama 3 already on the market, small companies struggle to justify building new ones, and investor interest and funding have shifted toward fine-tuning rather than training from scratch, further reducing the need for H100 resources.

Finally, there is excess capacity from reserved nodes. Many companies made long-term H100 reservations during the 2023 peak, but after shifting to fine-tuning they found the reserved nodes unnecessary; some hardware was already dated by the time it arrived. These unused H100 nodes are now being resold or re-leased, further adding to market supply and deepening the oversupply.

In short, the spread of fine-tuning, the decline in new base-model projects among small and medium-sized creators, and the surplus of reserved nodes have together cut H100 demand sharply and worsened the oversupply.

Other factors increasing GPU supply and reducing demand

Large model creators move away from public cloud platforms

Large AI model creators such as Facebook, X.AI, and OpenAI are gradually shifting from public clouds to self-built private compute clusters, for several reasons. First, existing public cloud offerings (clusters on the order of 1,000 nodes) can no longer meet their needs for training ever-larger models. Second, owning beats renting financially: data centers and servers are assets that lift company valuations, while public-cloud leases are pure expense and build no assets. Moreover, these companies have ample resources and expert teams, and can even acquire small data-center firms to help build and run these systems, so they no longer depend on the public cloud. As they leave, market demand for cloud compute falls, and the capacity they vacate can re-enter the market and add to supply.

Vast.ai is essentially a free market system where suppliers from all over the world compete with each other

Idle and delayed H100s come online at the same time

Idle and delayed H100s arriving at once have swelled market supply and pushed prices down. Platforms such as Vast.ai run a free-market model in which suppliers worldwide compete on price. Many 2023 deployments were held up by H100 delivery delays; those delayed H100s are only now entering the market, alongside new H200 and B200 hardware and idle capacity from startups and enterprises. Owners of small and medium clusters (typically 8 to 64 nodes), squeezed by low utilization and dwindling funds, want to recoup costs as fast as possible by renting capacity out cheaply, so they compete for customers via fixed rates, auction systems, or free-market pricing. Auctions and free-market pricing in particular push suppliers to undercut one another to keep their resources rented, dragging prices down across the whole market.

Cheaper GPU alternatives

Another major factor: once H100 pricing exceeds your budget, there are plenty of alternatives for AI inference infrastructure, especially if you are running smaller models; there is no need to pay the H100's Infiniband premium.

Nvidia Market Segmentation

The emergence of cheaper alternatives to the H100 for inference directly erodes H100 demand. While the H100 excels at training and fine-tuning, many cheaper GPUs are perfectly adequate for inference (running models), especially smaller ones. Because inference does not need the H100's high-end features, such as Infiniband networking, users can choose more economical options and save money.

Nvidia itself also serves the inference market with alternatives such as the L40S, a GPU designed for inference with roughly one-third the H100's performance at one-fifth the price. The L40S is weaker for multi-node training, but for single-node inference and fine-tuning on small clusters it is plenty, giving users a more cost-effective option.
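Normalizing those rough figures makes the trade-off explicit (simple arithmetic on the article's numbers, not vendor benchmarks):

```python
# Performance-per-dollar comparison using the article's rough figures:
# L40S ~= 1/3 the H100's performance at ~= 1/5 the price.
h100_perf, h100_price = 1.0, 1.0    # H100 as the normalized baseline
l40s_perf, l40s_price = 1 / 3, 1 / 5

h100_value = h100_perf / h100_price  # performance per dollar
l40s_value = l40s_perf / l40s_price

print(f"L40S perf-per-dollar vs H100: {l40s_value / h100_value:.2f}x")  # ~1.67x
```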

H100 Infiniband Cluster Performance Configuration Table (August 2024)

AMD and Intel Alternative Suppliers

In addition, AMD and Intel have launched lower-priced accelerators, such as AMD's MI300 and Intel's Gaudi 3. These perform well on inference and single-node tasks, cost less than the H100, and offer more memory and compute. They have not been fully proven in large multi-node cluster training, but for inference they are mature enough to be a serious H100 alternative.

These cheaper GPUs have been shown to handle most inference workloads, including inference and fine-tuning on common model architectures such as LLaMA 3, so once compatibility issues are resolved, users can switch to them to cut costs. In short, these inference alternatives are steadily displacing the H100, especially for small-scale inference and fine-tuning, further reducing H100 demand.

GPU usage in Web3 is declining

With the shift in the cryptocurrency market, GPU use in crypto mining has declined and large numbers of GPUs have flowed into the cloud market. Hardware limits keep these cards out of complex AI training, but they handle simpler inference well; for budget-constrained users running smaller models (under roughly 10B parameters), they are a very cost-effective choice, as the memory estimate below suggests. With optimization, they can even serve large models at lower cost than H100 nodes.
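A rough sense of why sub-10B models fit on such cards (a sketch using the standard bytes-per-parameter sizing; the 20% overhead factor is my assumption):

```python
# Rough VRAM estimate for serving a model at different quantization levels.
# Bytes per parameter: fp16 = 2, int8 = 1, int4 = 0.5. The +20% overhead
# for KV cache and activations is an illustrative assumption.
def vram_gb(params_billions: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    # (billions of params) * (bytes per param) comes out directly in GB
    return params_billions * bytes_per_param * overhead

for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"10B model @ {name}: ~{vram_gb(10, bpp):.0f} GB VRAM")
# ~24 GB (fp16), ~12 GB (int8), ~6 GB (int4): within reach of ex-mining
# consumer cards such as the 24 GB RTX 3090.
```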

After the AI compute rental bubble, what does the market look like now?

The problem with entering the market now: new public-cloud H100 clusters are late to the market, may never be profitable, and could leave some investors with heavy losses.

H100 public-cloud clusters that are new to the market face a profitability squeeze. Set the rental price too low (below about $2.25 per hour) and revenue may not cover operating costs; set it too high ($3 or more) and customers go elsewhere, leaving capacity idle. Clusters that arrive late also missed the early $4-per-hour rates, making cost recovery difficult, so investors face a real risk of never turning a profit. This makes cluster investment very hard to justify and may leave some investors with significant losses.
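The squeeze is easy to see in a two-threshold sketch (all inputs are illustrative assumptions chosen to land near the article's ~$2.25 operating floor; they are not the article's own figures):

```python
# Two-threshold sketch for a late-entrant H100 cluster. All inputs below
# are illustrative assumptions, not figures from the article.
CAPEX_PER_GPU = 50_000   # H100 purchase price (article figure)
OPEX_PER_HR = 1.60       # assumed power/cooling/colo/staff cost per GPU-hour
UTILIZATION = 0.70       # assumed fraction of hours actually billed
LIFETIME_HRS = 24 * 365 * 5

# Threshold 1: the rate below which revenue fails to cover operating costs.
opex_floor = OPEX_PER_HR / UTILIZATION                       # ~$2.29/hr

# Threshold 2: the rate needed to also recover the purchase price in 5 years.
full_breakeven = (CAPEX_PER_GPU + OPEX_PER_HR * LIFETIME_HRS) \
                 / (LIFETIME_HRS * UTILIZATION)              # ~$3.92/hr

print(f"operating floor : ${opex_floor:.2f}/hr")
print(f"full break-even : ${full_breakeven:.2f}/hr")
```

Under these assumptions, anything a late entrant can actually charge in today's $1-$2 market sits below even the operating floor, which is the article's point.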

Early adopters’ benefits: Medium-sized or large model creators who signed long-term lease contracts early on have recovered their costs and achieved profitability

Medium and large model creators extracted value from long-term H100 leases whose cost was already covered at financing time. Even where capacity went underused, these companies captured the clusters' value, for current and future model training, through the fundraising market. Unused capacity can still earn additional income through resale or sub-leasing; that resale lowers market prices, but it softens their losses, and the net effect on the ecosystem is positive.

After the bubble bursts: Affordable H100 could accelerate adoption of open source AI

Low-cost H100s will drive open-source AI forward. As prices fall, AI developers and hobbyists can run and fine-tune open-weight models far more cheaply, broadening adoption. If closed-source models (such as GPT5++) deliver no major technological breakthrough, the gap between open and closed models will keep narrowing, fueling AI application development. And as inference and fine-tuning get cheaper, they may trigger a new wave of AI applications and accelerate the market as a whole.

Conclusion: Don’t buy a brand new H100

If you invest in brand-new H100s now, you will most likely lose money. Investment makes sense only in special circumstances: a project that can buy discounted H100s, has cheap electricity, or has an AI product competitive enough to keep the hardware busy. If you are weighing an investment, the money is better placed elsewhere, or simply in the stock market, for a better rate of return.