a16z 'Disciple' Kuzco Practical Guide Part Two: From Solo Operations to Cluster Deployment

There is still half a month of preparation time before Epoch Two begins.
Written by: J1N, Techub News
Introduction: From Epoch One to Two
Kuzco is a network for large language model (LLM) compute mining. It was selected for the fall cohort of a16z's Crypto Startup Accelerator (CSX), which launched in New York on September 9 this year. Projects selected for the program receive at least $500,000 in investment from a16z, along with guidance and support from the a16z operating team. The accelerator program has now ended.
On November 16, Kuzco announced that the first-phase (Epoch One) incentive plan would end on November 18, 2024: all operations would be suspended, data snapshots would be permanently stored, and the final point rankings would be published on a new leaderboard.
According to the official disclosure, Epoch One launched on March 6, 2024, and the device count peaked above 8,000. The network ran the 8B version of Meta's Llama-3 large language model and served over 1 trillion tokens of inference in total.
Kuzco also announced that it will disclose financing information and its development roadmap in the coming weeks, and that the second-phase (Epoch Two) incentive plan will launch on December 9. Epoch Two brings new features: higher throughput and reliability on NVIDIA hardware; encouragement for users to connect top-tier computing devices such as the A100 and H100; and support for more image generation models and multimodal vision language models (VLM).
There is still half a month of preparation time before Epoch Two begins. This article will explore:
Sharing personal mining practices and results, from solo machines to clusters.
Showcasing the entire process of obtaining funding through research and practice and building high-spec machines.
Exploring the matching of hardware configuration with project needs and answering common investor questions.
Review of Epoch One: Solo Operations
Configuration
The author's hardware includes RTX-series graphics cards (2060, 2070S, 3080, 4060, 4060Ti, and 4 x 4070S) as well as two Apple devices (M2 and M3). These devices are spread across several desktop hosts, laptops, and one dedicated mining machine.
Cost
It is worth noting that the author bought these graphics cards over the years for gaming, not specifically for mining. Hardware purchase costs are therefore excluded from the cost calculation, and only the mining machines' actual electricity costs are counted. The example below is the mining machine assembled in the first article (a16z 'Disciple' Kuzco Practical Guide: How to Efficiently Conduct AI Computing Mining?).
Configuration of the mining machine:
Motherboard: Z490 (to be replaced with an industrial board later)
CPU: 10th Gen i9
Graphics Cards: 2060, 2070S, 3080, 4060Ti, 4070S
Hand-assembled Mining Machine
The following chart shows the mining machine's power consumption for October and November: 564 kWh in total, earning approximately 600 million points (KZO Points). All of the author's machines combined earned about 1.1 billion points. The actual electricity cost depends on each person's local electricity rates, so these figures are provided for reference only.
Far right of the chart: about 1 billion points obtained in total
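Since the electricity cost depends on the local rate, the arithmetic can be sketched as below. The 564 kWh and roughly 600 million points come from the figures above; the three tariffs are hypothetical placeholders, not values from the author, and should be replaced with your own rate.

```python
# Figures reported above for the hand-assembled mining machine.
ENERGY_KWH = 564          # October + November consumption
POINTS = 600_000_000      # approximate KZO points earned by this machine


def electricity_cost(energy_kwh: float, rate_per_kwh: float) -> float:
    """Return the electricity cost in the same currency as rate_per_kwh."""
    return energy_kwh * rate_per_kwh


def points_per_kwh(points: float, energy_kwh: float) -> float:
    """Return how many points were earned per kWh consumed."""
    return points / energy_kwh


if __name__ == "__main__":
    # Hypothetical tariffs (currency units per kWh); substitute your own.
    for rate in (0.08, 0.15, 0.30):
        cost = electricity_cost(ENERGY_KWH, rate)
        print(f"rate {rate:.2f}/kWh -> cost {cost:.2f}")
    print(f"points per kWh: {points_per_kwh(POINTS, ENERGY_KWH):,.0f}")
```

Dividing the cost by the points earned gives a cost per point, which is the number to compare across machines or electricity plans.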
Preparing for Epoch Two: Cluster Deployment
Drawing on what was shared in the first article and on hands-on experience in device assembly, debugging, and environment deployment, the author secured some financial support and put all of it into assembling high-performance mining machines, in order to scale up computing power and improve operational efficiency.
From solo operations to cluster deployment
Configuration and selection logic of high-spec machines
Building on the practical experience of Epoch One, the author has optimized the motherboard, CPU, graphics card, power supply, platform, and network configuration, choosing hardware combinations that work better together. This improves overall stability, security, and efficiency. Hardware selection also takes second-hand market liquidity into account: parts that resell easily reduce the real investment cost and give later participants more cost-effective options.
Motherboard
The author chooses industrial motherboards over the mainstream B85, weighing performance, stability, and cost-effectiveness.
In terms of performance, running Kuzco's Llama-3 model requires starting multiple Docker containers. Running them in parallel consumes a lot of CPU resources, and the CPUs compatible with B85 cannot meet this demand.
In addition, industrial motherboards have clear advantages in long-term stable operation, high-temperature resistance, and manufacturer warranty, and they are also more liquid in the second-hand market. For the author, this makes them the best choice.
Graphics Card
The author chose the 4070S as the main graphics card for the following reasons:
Advantages of AI computing performance: Compared with 30-series graphics cards, 40-series cards improve far more in AI computing than they do in gaming. In the author's view, the core reason is that AI computing power depends mainly on the number of CUDA cores, which the author reports to be significantly higher on 40-series cards than on 30-series cards.
Energy efficiency advantage: The author tested multiple GPUs in detail and calculated each card's token throughput per watt.
4060Ti (160W): 0.125 Tokens/W
3080 (330W): 0.22 Tokens/W
4090 (450W): 0.26 Tokens/W
4070S (220W): 0.38 Tokens/W
In these tests, the 4070S strikes the best balance between performance and power consumption. Its higher energy efficiency directly reduces electricity costs, making it the most cost-effective choice.
Prices and liquidity in the second-hand market: As a mid-to-high-end graphics card, the 4070S is liquid and holds its value well in the second-hand market. This further reduces the cost of holding the equipment and leaves room for future hardware upgrades.
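The efficiency comparison above can be reproduced with a few lines of Python. This is a minimal sketch that assumes "Tokens/W" means tokens per second per watt at the listed power; the article does not state the time unit, so the implied throughput values are illustrative only.

```python
# (listed power in W, reported efficiency in tokens per watt), from the list above.
GPUS = {
    "4060Ti": (160, 0.125),
    "3080": (330, 0.22),
    "4090": (450, 0.26),
    "4070S": (220, 0.38),
}


def implied_throughput(power_w: float, tokens_per_watt: float) -> float:
    """Throughput implied by efficiency x power.

    Assumption: the efficiency figure is tokens per second per watt.
    """
    return power_w * tokens_per_watt


def rank_by_efficiency(gpus: dict) -> list:
    """Return card names sorted from most to least efficient."""
    return sorted(gpus, key=lambda name: gpus[name][1], reverse=True)


if __name__ == "__main__":
    for name in rank_by_efficiency(GPUS):
        power_w, eff = GPUS[name]
        rate = implied_throughput(power_w, eff)
        print(f"{name}: {eff} tokens/W, about {rate:.1f} tokens/s at {power_w} W")
```

Under that assumption the 4090 still has the highest raw throughput, while the 4070S delivers the most tokens for each watt consumed, which is the metric that drives electricity cost.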
CPU
As mentioned earlier, running Kuzco's Llama-3 requires starting multiple Docker containers, which takes up significant CPU resources. In multi-card operation, CPU usage may reach 80%-90%, so multi-core and multi-threading capability is particularly important. A high-performance, multi-threaded, stable CPU supports this parallel workload and keeps the whole mining process stable and efficient.
A 13th Gen i5 can reach 70%+ utilization when the graphics cards are under full load
Network Environment
The soft router is the box in the figure
The network environment is also crucial in mining. Even with high-performance graphics cards, an unoptimized network severely reduces computing power. In the author's tests, insufficient internet speed may cut computing power by 30%, and low-quality network nodes may make it impossible to connect to the Kuzco network at all. Neither is acceptable for mining. To solve these issues, the author uses a soft router (a software-based router), which is easy to configure, runs with almost no manual intervention after setup, and in theory supports an unlimited number of connected devices. For specific setup steps, readers should consult relevant materials based on their own needs.
Power Supply
The classic Great Wall 2000W "nuclear bomb" power supply
When selecting a power supply, pay special attention to peak power consumption. Although the rated power consumption of 7 x 4070S is only 1540W, the author still uses dual 2000W power supplies, for a total of 4000W. This is not a waste of resources but a matter of operating stability and safety.
Graphics cards can show power spikes during operation: at certain moments their actual draw may reach 1.5 times the rated power consumption or more, then drop back to normal levels. If the power supply cannot handle such a peak, its forced shutdown protection may trigger, which can damage the graphics cards. This is a serious threat to the normal operation of the mining machine.
4070S power consumption while running
Take the 4070S as an example: its rated power consumption is 220W, but its peak may exceed 400W, so seven cards may peak at over 3000W in total. Dual 2000W power supplies are configured to keep the machine stable under those peaks. This matters even more for users running multiple 4090s: a single 4090 is rated at 450W, but its peak may reach 770W. In multi-card 4090 builds, two power supplies may not be enough, and three are usually needed to keep the system stable.
4090 Power Consumption Performance
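The power supply sizing logic can be written down as a small calculation. This is a rough sketch under stated assumptions: it uses 400W as the per-card peak for the 4070S (the text says the peak may exceed this), assumes seven cards for the 4090 case (the article does not give a count), treats all cards as peaking at the same moment, and ignores the CPU, motherboard, and fans.

```python
import math


def psu_plan(cards: int, rated_w: int, peak_w: int, psu_w: int = 2000):
    """Return (total rated W, total peak W, number of PSUs needed for the peak).

    Worst case: every card hits its peak simultaneously.
    GPU load only; the rest of the system is not counted.
    """
    rated_total = cards * rated_w
    peak_total = cards * peak_w
    psus_needed = math.ceil(peak_total / psu_w)
    return rated_total, peak_total, psus_needed


if __name__ == "__main__":
    # 7 x 4070S: rated 220 W, peak taken as 400 W (lower bound from the text).
    print(psu_plan(7, 220, 400))   # (1540, 2800, 2)
    # 7 x 4090 (card count is an assumption): rated 450 W, peak 770 W.
    print(psu_plan(7, 450, 770))   # (3150, 5390, 3)
```

Sizing against the rated total alone would suggest a single 2000W unit for seven 4070S cards; sizing against the peak is what leads to two units, and to three for the same number of 4090s.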
Supplement
The author will not go into detail on BIOS settings, hardware compatibility, or remote management. Plenty of free tutorials are available online, and following them solves most problems. Readers should pick the ones that match their own hardware configuration and needs.
Risks and Rewards
Now for the question everyone cares about most: how much can be mined per day? Frankly, there is no clear answer, because risks and rewards always coexist. The author's view is this: whether in crypto or in traditional industries, once a project's daily returns can be calculated accurately, the significant profits have most likely already been taken. The exception is when you hold some monopolistic resource, such as extremely low electricity costs or very cheap mining equipment, which gives you an edge on returns. Such resources are not available to everyone.
The author chooses devices with good liquidity precisely to reduce investment risk and cost pressure. In Kuzco mining, costs are concentrated in hardware depreciation and electricity, so the maximum loss is limited to these two items. If you cannot participate at low cost, the investment makes little sense. It should be stressed that mining by nature has no clear expected return, and that uncertainty is also where its potential lies.
In the author's subjective judgment, this track has huge market prospects: on one hand, Kuzco has received investment support from a16z; on the other hand, demand for large language models is expanding rapidly. Think about it: is there anyone who will not end up using an LLM? Players such as OpenAI (ChatGPT), Meta (Llama), and Musk's xAI have all attracted large amounts of funding, which clearly indicates the industry's growth potential.
For ordinary people, participating directly in the AI industry is not easy. On one hand, the technical threshold for AI is high; on the other hand, training AI models requires massive resources and funding that most people cannot afford. By joining an AI computing network through Kuzco, however, ordinary people can take part in this high-growth field at a controlled cost, contributing to AI computing while also earning returns.
In addition, Bitcoin is about to break $100,000, having risen from $16,000 in 2022, and a rally of that size carries significant retracement risk. Anyone who buys tokens of AI projects directly faces similarly high volatility. By comparison, participating in an AI computing network is a more robust choice: costs are clear and controllable, and it offers entry into the high-growth AI track at relatively low risk. In the current environment, this is one of the practical ways for ordinary people to enter the AI field.