Overcoming Unique Challenges in Our Binance Ledger Migration
Main Takeaways
Our Binance Ledger migration aimed to solve the hot account problem inherent to our former relational database server.
A crypto exchange operates 24/7, 365 days of the year with no maintenance window to leverage, unlike a regular exchange, which has daily break hours.
Migrating to a new Binance Ledger had to occur online, keep users' assets SAFU, and incur no business impact at all, so the experience for end users remained seamless.
We chose a gradual, account-by-account migration strategy instead of a cut-over, all-at-once strategy as employed by the gh-ost tool.
Learn more about our Binance Ledger migration and the tools and techniques used throughout the process.
Binance Ledger underpins our technical operations and processes millions of daily transactions across a vast user base. You can learn more about the system, its goals, and its challenges in our How Binance Ledger Powers Your Binance Experience blog. Our migration process from the old to the new version faced a typical challenge: how could we upgrade the engine on the fly while the airplane is still in flight? We had to migrate our users' assets, and keeping funds SAFU was our top priority.
Binance's Key Migration Challenges
The following list of challenges had to be addressed to achieve our established goals:
Ensure the full correctness of the new ledger
Be able to detect any fund issue and fix it promptly and accurately
Incur no downtime for upstreams
Comparing Our Migration Mission with Online Database DDL
Before we dive into our solution's details, let's look at the common problem of performing an online DDL (data definition language) for a large table. What exactly is DDL? Well, imagine a table with hundreds of millions of rows where we need to add another column. We want to do this online with no disruption to the business.
The gh-ost tool is widely used to solve this problem, and you can see how it works in the following diagram.
The process essentially consists of two phases:
The sync phase, which continues until the new table is fully identical to the original table. There are two kinds of data to be synced:
Existing data
Incremental data (new data written to the original table while the migration is in progress)
The cutover phase, which swaps the original table with the new one without interrupting any ongoing transactions.
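The two phases can be sketched with a toy in-memory model. This is only an illustration of the sync-then-cutover idea, not how gh-ost is implemented: the `catalog` dictionary, the `orders` table, the `change_log` tuple format, and the `note` column are all hypothetical names chosen for the example.

```python
def sync(existing_rows, change_log, transform):
    """Sync phase: build the shadow table from existing plus incremental data."""
    # Existing data: copy every row into the shadow table under the new schema.
    shadow = {key: transform(row) for key, row in existing_rows.items()}
    # Incremental data: replay, in order, the writes that landed during the copy.
    for op, key, row in change_log:
        if op == "upsert":
            shadow[key] = transform(row)
        elif op == "delete":
            shadow.pop(key, None)
    return shadow


def cutover(catalog, table_name, shadow):
    """Cutover phase: point the table name at the shadow table in one step."""
    catalog[table_name] = shadow


def add_note_column(row):
    """The schema change itself: add a nullable 'note' column (hypothetical)."""
    return {**row, "note": None}


# Rows present when the copy started (hypothetical 'orders' table).
catalog = {"orders": {1: {"amount": 10}, 2: {"amount": 20}}}
# Writes recorded while the copy was running.
change_log = [("upsert", 3, {"amount": 30}), ("delete", 1, None)]

shadow = sync(catalog["orders"], change_log, add_note_column)
cutover(catalog, "orders", shadow)
print(catalog["orders"])
```

Note that the cutover here replaces the whole table at once, which is exactly the all-at-once property that matters when the rows are user balances.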
Where Binance Ledger's Problems Differed
Despite some similarities, Binance Ledger had some inherently unique challenges in our online migration mission.
Firstly, Binance backend systems operate in a distributed environment, whereas the online database DDL runs in a monolithic environment. Secondly, we couldn't afford to adopt the all-at-once cutover approach because the data represented our users' assets. Lastly, we needed to ensure that all relevant services worked as a whole before starting the mass migration.
As in the prior online DDL example, there were also two phases to our migration:
The sync phase, where a dedicated replicator service was specially built to sync the balances from the old ledger to the new ledger
The account-by-account cutover phase
Phase-Based Approaches
The task was large, and as they say, Rome wasn't built in a day. The divide and conquer approach often works like a charm in the face of a large and complex problem domain.
Phase 1: Replication
The why
We can summarize our thought process in two main points:
We modeled Binance Ledger as a new slave joining the existing MySQL cluster, which powers the current ledger system. By leveraging replication techniques, we could keep the users' balances fully in sync asynchronously.
We could then route production traffic verbatim to Binance Ledger to validate its correctness and robustness. Even if things went sour in this phase, there would be no impact on us or our users.
The what
Below, we've illustrated the overall replication pipeline. The critical path to pay attention to is:
Transfer → Ledger → Binance Ledger replicator → Binance Ledger
The how
We broke the replication process down into two separate steps:
We dumped a snapshot of the ledger DB and imported it into the Binance Ledger.
We replicated the binlog of the ledger DB from the point in time at which the snapshot was dumped.
Eventually, the data of balances and balance logs would be kept in full sync between the old ledger and the Binance Ledger, which can further be validated by the full reconciliation module.
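As a rough sketch of the snapshot-then-replay idea, the following assumes a simplified event format: each logged change carries a monotonically increasing `position`, an `account`, an `asset`, and a signed `delta`. These field names and the functions `import_snapshot` and `replay` are hypothetical; a real MySQL binlog stream looks different, and the actual replicator service is not described in this post.

```python
from decimal import Decimal


def import_snapshot(snapshot):
    """Step 1: load the dumped balances into the new ledger."""
    return dict(snapshot["balances"])


def replay(new_ledger, events, after_position):
    """Step 2: apply changes logged after the snapshot position, in order.

    Returns the last position applied so replication can resume from there.
    """
    last_applied = after_position
    for event in sorted(events, key=lambda e: e["position"]):
        if event["position"] <= after_position:
            continue  # already reflected in the snapshot
        key = (event["account"], event["asset"])
        new_ledger[key] = new_ledger.get(key, Decimal("0")) + event["delta"]
        last_applied = event["position"]
    return last_applied


# Snapshot taken at log position 2 (hypothetical data).
snapshot = {"position": 2, "balances": {("A", "BTC"): Decimal("1.5")}}
events = [
    {"position": 1, "account": "A", "asset": "BTC", "delta": Decimal("0.5")},
    {"position": 2, "account": "A", "asset": "BTC", "delta": Decimal("1.0")},
    {"position": 3, "account": "A", "asset": "BTC", "delta": Decimal("-0.25")},
    {"position": 4, "account": "B", "asset": "USDT", "delta": Decimal("100")},
]

new_ledger = import_snapshot(snapshot)
last_applied = replay(new_ledger, events, snapshot["position"])
print(new_ledger, last_applied)
```

Skipping events at or before the snapshot position is what keeps a change from being counted twice, once in the snapshot and once in the replay.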
The when
Binance Ledger went live in early August 2022. Afterward, we kicked off the replication process, which lasted until the middle of November 2022. This was an important period for us, as the new ledger system's correctness needed to be fully validated. This step could not be skipped before going ahead with the next migration phase.
Ultimately, we found no issues and carried out several release routines to get more comfortable with the situation. The three-month process wasn't particularly fast, but it was necessary for our SAFU goal.
Phase 2: Online Migration
The why
To migrate hundreds of millions of accounts, we built a customized migration job.
The what
Below, we've depicted the core migration flow for one account:
Here are some key notes to keep in mind:
1. The account system maintains the ownership mapping for each account.
   Account A → ledger
   Account B → Binance Ledger
   Account C → forbidden
2. Before migrating an account, we checked for pending concurrent transactions. If any existed, the account was skipped to reduce business impact.
3. We changed the ownership mapping from ledger to forbidden, disallowing any further balance updates and thus making the account immutable.
4. We reconciled balances between the old ledger and the Binance Ledger.
5. We changed the ownership mapping from forbidden to Binance Ledger, allowing future balance updates to be routed directly to the Binance Ledger.
According to our performance metrics, it took an average of 150 ms to get from step 3 to step 5. In theory, users could not conduct any transaction during this 150 ms migration window. In practice, there were zero impacted transactions.
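The per-account flow above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the actual migration job: the `Owner` enum, `migrate_account`, `route`, and the in-memory dictionaries are hypothetical, and what happens on a balance mismatch (here the account stays frozen and is reported) is an assumption, since the post does not say.

```python
from enum import Enum


class Owner(Enum):
    LEDGER = "ledger"
    BINANCE_LEDGER = "binance_ledger"
    FORBIDDEN = "forbidden"


def migrate_account(account, ownership, old_ledger, new_ledger, pending):
    """Migrate one account; returns 'migrated', 'skipped', or 'mismatch'."""
    # Skip accounts that are not on the old ledger or have pending transactions.
    if ownership.get(account) is not Owner.LEDGER or account in pending:
        return "skipped"
    # Freeze the account so its balances cannot change during the check.
    ownership[account] = Owner.FORBIDDEN
    # Reconcile the replicated balances against the old ledger.
    if old_ledger.get(account) != new_ledger.get(account):
        return "mismatch"  # assumption: stay frozen until someone investigates
    # Hand ownership to the new ledger; future updates are routed there.
    ownership[account] = Owner.BINANCE_LEDGER
    return "migrated"


def route(account, ownership):
    """Return the ledger that should handle a balance update for this account."""
    owner = ownership[account]
    if owner is Owner.FORBIDDEN:
        raise PermissionError(f"account {account} is frozen for migration")
    return owner


ownership = {"A": Owner.LEDGER, "B": Owner.LEDGER, "C": Owner.LEDGER}
old_ledger = {"A": {"BTC": "1.0"}, "B": {"ETH": "2.0"}, "C": {"BTC": "3.0"}}
new_ledger = {"A": {"BTC": "1.0"}, "B": {"ETH": "2.0"}, "C": {"BTC": "2.9"}}
pending = {"B"}  # accounts with in-flight transactions

results = {
    account: migrate_account(account, ownership, old_ledger, new_ledger, pending)
    for account in ["A", "B", "C"]
}
print(results)
```

Because the forbidden state sits between the two owners, no balance update can be accepted by either ledger while the balances are being compared, which is what makes the comparison meaningful.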
The Execution
At Binance, we advocate the principle of "good execution over meticulous planning." Solid execution is vital to our success, and fund safety is always our top priority. We adopted a gradual migration strategy over a period of three weeks to catch issues as early as possible, which in turn helped reduce the magnitude of any negative impact.
The Reconciliation Process
Reconciliation is hugely important for detecting potential balance anomalies promptly and from an unbiased perspective. We can conduct the process in near real time and take prompt action before things worsen. Two types of reconciliation modules were developed specifically for the online migration process: real-time and full.
Real-time
The transaction-level-based reconciliation process is built to detect any fund issue in real time.
Full
We can perform a full reconciliation periodically based on the snapshots synced to the data warehouse. This process ensures all balances are the same between the old ledger and the Binance Ledger.
For example, let's say we have 10 million users still residing on the old ledger. We can use this full reconciliation to verify that the balances and balance logs are the same between the old ledger and the Binance Ledger.
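A full reconciliation of this kind boils down to diffing two snapshots. The sketch below assumes each snapshot is a mapping from a hypothetical `(account, asset)` key to a balance, and it covers balances only. Balance logs would need a similar comparison, and the real module runs against the data warehouse rather than in-memory dictionaries.

```python
from decimal import Decimal


def full_reconcile(old_snapshot, new_snapshot):
    """Return every key whose balance differs between the two snapshots.

    A key missing from one side is reported with None for that side.
    """
    mismatches = {}
    for key in sorted(old_snapshot.keys() | new_snapshot.keys()):
        old_value = old_snapshot.get(key)
        new_value = new_snapshot.get(key)
        if old_value != new_value:
            mismatches[key] = (old_value, new_value)
    return mismatches


old_snapshot = {
    ("A", "BTC"): Decimal("1.25"),
    ("B", "USDT"): Decimal("100"),
    ("C", "ETH"): Decimal("3"),
}
new_snapshot = {
    ("A", "BTC"): Decimal("1.25"),
    ("B", "USDT"): Decimal("99.5"),
}

report = full_reconcile(old_snapshot, new_snapshot)
print(report)
```

Iterating over the union of keys matters here: comparing only the keys of one side would miss balances that exist in one ledger but were never replicated to the other.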
Wrapping the Migration Process Up
In a nutshell, the mission was accomplished by 1) using replication techniques to validate the correctness of the new Binance Ledger and 2) implementing an account-by-account migration strategy to upgrade the engine slowly, safely, and surely.
We believe the aforementioned online migration paradigm can be reused in similar tasks. If the processes and topics discussed have piqued your interest, why not consider joining the team? We're always looking for dedicated individuals with fresh perspectives on our daily challenges at Binance.