According to TechFlow, Chainbase, a full-chain data network, recently announced that it has open-sourced Theia-Llama-3.1-8B, a large language model designed specifically for the crypto field, on HuggingFace. Chainbase says the model outperforms mainstream models on perplexity and BERT score, and that its understanding of the crypto world exceeds that of most mainstream open-source models.
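
For readers unfamiliar with the perplexity metric cited here, the following is a minimal sketch of how it is computed from per-token probabilities: perplexity is the exponential of the negative mean log-probability, so lower is better. The probability values and the names `general_model` and `domain_model` are hypothetical illustrations, not Chainbase's evaluation data.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical probabilities that two models assign to the same three tokens
# of a crypto-related sentence (made-up numbers, for illustration only).
general_model = [math.log(0.10), math.log(0.05), math.log(0.20)]
domain_model = [math.log(0.40), math.log(0.25), math.log(0.50)]

# The model that assigns higher probability to the text gets lower perplexity.
print(perplexity(general_model) > perplexity(domain_model))
```

A model that assigns probability 0.25 to every token has a perplexity of 4, which is why the metric is often read as an "effective number of choices" per token.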

The Chainbase team built what it describes as the first specialized Web3 dataset, which covers a range of data from the top 2,000 projects on CoinMarketCap. The dataset was filtered both manually and algorithmically to ensure that the training data is accurate, diverse, and domain-specific. Using this dataset, the team fine-tuned the model efficiently with LoRA and used tools such as DeepSpeed to accelerate training. In addition, the model has been quantized to the Q8 GGUF format, which greatly reduces memory usage and improves inference speed.
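
To give a rough sense of why LoRA and 8-bit quantization save resources, here is a back-of-the-envelope sketch. It is not Chainbase's training or export code. The LoRA rank of 16, the 4096 x 4096 layer size, the round figure of 8 billion parameters, and the block layout (32 int8 values plus one 2-byte scale per block, in the style of Q8_0) are all assumptions made for illustration.

```python
BLOCK = 32  # weights per quantization block (assumed, Q8_0-style layout)

def lora_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix:
    two low-rank factors, B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

def quantize_block(weights):
    """Symmetric 8-bit quantization: one float scale plus int8 values."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127 if amax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate float weights from the scale and int8 values."""
    return [scale * v for v in q]

def bytes_fp16(n_params):
    """Storage for 16-bit weights: 2 bytes each."""
    return n_params * 2

def bytes_q8(n_params, block=BLOCK):
    """Storage for block-quantized weights: `block` int8 values plus one
    2-byte scale per block."""
    n_blocks = -(-n_params // block)  # ceiling division
    return n_blocks * (block + 2)

# LoRA: hypothetical 4096 x 4096 projection with rank 16.
full = 4096 * 4096
extra = lora_params(4096, 4096, 16)
print(f"LoRA trains {extra:,} of {full:,} weights ({100 * extra / full:.2f}%)")

# Quantization: round-trip one block and compare storage for ~8e9 parameters.
weights = [(i - 16) / 10 for i in range(BLOCK)]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_err:.5f} (scale {scale:.5f})")
print(f"fp16: {bytes_fp16(8_000_000_000) / 1e9:.1f} GB, "
      f"Q8: {bytes_q8(8_000_000_000) / 1e9:.1f} GB")
```

Under these assumptions the 8-bit layout needs a little over half the storage of 16-bit weights, and the rounding error per weight is bounded by half the block's scale.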

Theia-Llama-3.1-8B is reported to be Chainbase's first attempt at a large model for the crypto field. The model is already deployed in TheiaChat, Chainbase's interactive demo application, which currently has more than 300,000 daily active users.