Main topics of the post:
Binance uses machine learning models to monitor suspicious activity on the platform.
An issue known as feature staleness (outdated or inaccurate feature values) can negatively impact the performance of such models, causing them to make inaccurate predictions or classifications.
Our streaming pipeline – or the process of continuously feeding the model with real-time data – consists of two parts: data processing and data delivery.
Data processing is divided into three categories: stream computing, data ingestion, and data sinking.
Our AI risk team of machine learning engineers and data scientists works around the clock to combat fraud and protect Binance users. To achieve this, they use AI-driven solutions that can identify and respond to potential threats such as peer-to-peer (P2P) scams, payment detail theft, and account takeover (ATO) attacks, to name a few.
In this article, we'll cover how our AI risk team uses a pipeline to ensure real-time responses, as well as what happens behind the scenes.
If you're not familiar with machine learning, we recommend reading the following section for a basic overview of some of the terms we'll be using throughout this article.
Batch vs. streaming
Typically, machine learning engineers use two types of pipelines: batch and streaming. Both have their pros and cons, depending on what the situation demands.
Batch pipelines, as the name implies, process data in batches. Generally, engineers use them to process large volumes of data.
On the other hand, streaming pipelines process data in real time as it is collected. This makes them ideal for situations that require an almost instantaneous response; for example, detecting a hacker before they can withdraw funds from a stolen account.
Both pipelines are equally important. Streaming pipelines are great for providing real-time responses, while batch pipelines are better suited to handling large volumes of data.
In the case of fraud prevention, we need to prioritize real-time data to avoid a situation called “model staleness,” which refers to outdated or inaccurate machine learning models.
What does staleness mean?
Just as people can become less effective at a task if they are not up to date with the latest information or techniques, machine learning models can also become less accurate if they are not regularly updated with fresh data.
One thing you don't want is for a model designed to prevent fraud to become obsolete. The consequences range from the model incorrectly labeling a legitimate transaction as fraudulent to failing to identify a compromised account. Therefore, we use a streaming pipeline to ensure that fraud prevention models work with real-time data.
Feature processing for an account takeover (ATO) model
Let's take an example from our ATO model, which we train to identify accounts that criminals have taken over with malicious intent. One of the features this model uses is the number of transactions a given customer made in the last minute.
Hackers tend to follow a sequential pattern, carrying out a large number of operations, such as withdrawals, in a short period. Our system must identify this pattern as quickly as possible when a potential threat emerges. That means minimizing the delay between the moment a user takes an action and the moment that user's activity data is processed by our models. Just a few seconds can be the difference between stopping a hacker and a user losing all their money.
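To make this concrete, here is a minimal sketch of how a "transactions in the last minute" counter could be maintained per user. This is an illustrative Python example, not our production Flink job; the TxnWindow class and its method names are hypothetical.

```python
import time
from collections import deque, defaultdict

class TxnWindow:
    """Counts transactions per user over a rolling 60-second window (illustrative only)."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # user_id -> timestamps of recent transactions

    def record(self, user_id, ts=None):
        """Record one transaction event for the user."""
        self.events[user_id].append(ts if ts is not None else time.time())

    def count_last_minute(self, user_id, now=None):
        """Return how many transactions the user made within the window."""
        now = now if now is not None else time.time()
        window = self.events[user_id]
        # Evict events that have fallen out of the rolling window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        return len(window)

# Usage: feed withdrawal events as they arrive, then query the feature.
window = TxnWindow()
for _ in range(5):
    window.record("user-123")
print(window.count_last_minute("user-123"))  # -> 5
```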
For more information on how delayed features affect model performance, you can refer to this LinkedIn Engineering article: Near real-time features for near real-time personalization.
The role of batch processing
Note that the impact of feature staleness can vary depending on the model or feature in question. Some features, for example, are relatively stable. In the case of the ATO model mentioned above, we would also need to retrieve the user's withdrawal data over the last 30 days to calculate a ratio based on the most recent transactions.
In this case, batch processing over longer intervals, such as hourly or daily runs, is acceptable, despite the added staleness that comes from waiting for data to arrive in data warehouses and from running batch jobs only periodically.
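As a hedged illustration of that batch side, the sketch below aggregates a user's withdrawals over the trailing 30 days so a ratio against the real-time count can be computed later. The DataFrame layout, column names, and function names are assumptions made for this example.

```python
import pandas as pd

def thirty_day_withdrawals(withdrawals: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Per-user withdrawal counts over the 30 days ending at `as_of`.
    Assumed columns: user_id, event_time."""
    recent = withdrawals[
        (withdrawals["event_time"] > as_of - pd.Timedelta(days=30))
        & (withdrawals["event_time"] <= as_of)
    ]
    return (
        recent.groupby("user_id")
        .size()
        .rename("withdrawals_30d")
        .reset_index()
    )

def recent_to_baseline_ratio(recent_count: int, withdrawals_30d: int) -> float:
    """Compare the real-time transaction count with the user's 30-day daily average."""
    daily_avg = max(withdrawals_30d / 30.0, 1e-9)  # guard against division by zero
    return recent_count / daily_avg
```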
Balancing freshness and latency
Ultimately, the choice between batch and stream pipelines should be made based on the specific use case requirements and capabilities in question. Carefully considering these factors allows us to build effective fraud prevention systems that protect our users.
Using a streaming pipeline allows us to prioritize freshness over latency for time-sensitive features. The diagram above illustrates this need: the transaction count retrieved for the feature should be three instead of two.
That's why a pipeline for real-time machine learning is crucial to our risk team's day-to-day operations.
Streaming pipeline details
The Binance AI risk team's real-time machine learning pipeline mainly consists of two parts:
Data processing (top of diagram)
Data delivery (bottom of diagram)
Data processing
For data processing, we can divide our streaming pipeline (Flink jobs) into three categories based on their responsibilities:
Stream computing: feature engineering
Data ingestion: feature ingestion
Data sinking: data enrichment
Stream computing
The stream computing component of the pipeline is responsible for near real-time feature engineering, the process of extracting features from raw data.
It precomputes the features that our machine learning models will use for online prediction. The streaming pipeline supports two processing methods, time-based and event-based; a short sketch contrasting them follows the list below.
Time-based. Computes the number of transactions every 10 minutes. This introduces some staleness but keeps latency low.
Event-based. Computes features as each event arrives. This reduces staleness but slightly increases latency.
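Here is a minimal, non-Flink sketch that contrasts the two modes. The event stream layout, the 10-minute bucket size, and the function names are assumptions for illustration.

```python
from collections import defaultdict

def time_based_counts(events, bucket_seconds=600):
    """Time-based: one count per user per 10-minute bucket.
    The feature is refreshed only when a bucket closes, so it carries some staleness."""
    buckets = defaultdict(int)
    for user_id, ts in events:  # events are (user_id, unix_timestamp) pairs
        buckets[(user_id, int(ts // bucket_seconds))] += 1
    return buckets

def event_based_counts(events):
    """Event-based: update the running count on every incoming event,
    keeping the feature fresh at the cost of per-event work."""
    counts = defaultdict(int)
    for user_id, _ts in events:
        counts[user_id] += 1
        yield user_id, counts[user_id]  # feature value emitted immediately
```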
Wherever possible, we prefer not to compute features on demand at request time, and here's why:
There is a trade-off between latency and staleness. Computing features as online requests arrive limits the processing logic to lightweight approaches. Although this reduces staleness, it increases prediction latency.
Independent scaling is challenging because the prediction and feature-processing services depend on each other.
On-demand processing driven by request volume creates unpredictable scaling pressure.
On-demand processing is not compatible with our model monitoring (for training-serving skew, a discrepancy that can occur between a model's performance during training and its effectiveness when deployed) and feature monitoring solutions, because the features are not stored in a central database, i.e., a feature store.
Data ingestion
The data ingestion component is responsible for ingesting features from Kafka into our machine learning platform's feature store in near real time. Feature stores are centralized databases that house commonly used features; they play an essential role in machine learning pipelines. You can learn more about them in the following articles: An in-depth look at our machine learning capability and Using MLOps to build a real-time end-to-end machine learning pipeline.
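Below is a hedged, simplified sketch of that ingestion step: consuming computed features from a Kafka topic and upserting them into a feature store, with Redis standing in for the store. The topic name, broker address, and key layout are assumptions, not our actual configuration.

```python
import json

import redis
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "risk.features.ato",                      # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
feature_store = redis.Redis(host="localhost", port=6379)

for message in consumer:
    feature = message.value  # e.g. {"user_id": "123", "txn_count_1m": 5}
    key = f"ato:{feature['user_id']}"
    # Upsert each feature field under the user's key for low-latency reads at prediction time.
    feature_store.hset(key, mapping={k: v for k, v in feature.items() if k != "user_id"})
```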
Data sinking
The data sinking component is primarily responsible for sinking real-time events to a specific destination, such as highly distributed file systems (e.g., S3) or other external databases like Elasticsearch, depending on the project's requirements.
For our AI risk team, there are generally two data enrichment patterns that can be applied to real-time data in Kafka, depending on the use case (a short sketch follows the list below):
Static data. For example, retrieving a list of popular makers from S3 for P2P-related business projects in Flink jobs. The reference data is static and needs to be updated less than once a month.
Dynamic data. Real-time exchange rates (BTC to USD, for example) are obtained from external databases such as Redis. A per-record lookup keeps latency low and accuracy high when the reference data changes.
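The sketch below illustrates both patterns outside of Flink for brevity: a static list loaded from S3 once (or refreshed on a slow cycle) and a per-record Redis lookup for a live exchange rate. The bucket, object key, and Redis key names are illustrative assumptions.

```python
import json

import boto3
import redis

s3 = boto3.client("s3")
rates = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Static data: loaded once (or refreshed on a slow cycle) from S3.
popular_makers = set(
    json.loads(
        s3.get_object(Bucket="risk-reference-data", Key="p2p/popular_makers.json")["Body"].read()
    )
)

def enrich(event: dict) -> dict:
    """Dynamic data: a per-record Redis lookup keeps the exchange rate fresh."""
    rate = rates.get("fx:BTCUSD")  # hypothetical key holding the BTC/USD rate
    return {
        **event,
        "maker_is_popular": event.get("maker_id") in popular_makers,
        "btc_usd_rate": float(rate) if rate is not None else None,
    }
```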
Data delivery
The data delivery component of the pipeline is responsible for online prediction and batch processing.
Online prediction. Occurs when requests arrive through Decision Hub (our risk team's internal rules engine). The relevant service queries the feature store to retrieve the features and sends them to the machine learning model for scoring. Our AI risk team has 20+ machine learning models designed to address different business requirements. A minimal sketch of this path follows the list below.
Batch processing. Although it may introduce a delay of up to a few days, it plays an important role in complementing the features computed in real time.
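Here is a hedged sketch of the online prediction path: features are read from the feature store and passed to a model that returns a risk score. The key scheme, feature names, and the model object (assumed to expose a scikit-learn-style predict_proba) are assumptions for illustration.

```python
import redis

feature_store = redis.Redis(host="localhost", port=6379, decode_responses=True)

def score_ato_risk(user_id: str, model) -> float:
    """Fetch precomputed features for the user and return the model's risk score."""
    features = feature_store.hgetall(f"ato:{user_id}")  # e.g. {"txn_count_1m": "5", ...}
    # Order and cast the features the way the (hypothetical) model expects.
    vector = [
        float(features.get("txn_count_1m", 0)),
        float(features.get("withdrawals_30d", 0)),
    ]
    return float(model.predict_proba([vector])[0][1])  # probability the account is compromised
```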
Final considerations
It is important to note that the crypto market operates 24/7, unlike traditional financial markets which have opening and closing times. Every second, there is a continuous flow of new data (withdrawals, deposits, trades, etc.) that requires us to be on the lookout for criminals who are trying to steal user funds or personal information.
Our AI risk team has been working tirelessly to develop and maintain a sophisticated AI system that can effectively flag suspicious activity. Thanks to the efforts of this team, we can work quickly to protect compromised Binance accounts from potential losses or mitigate the damage as much as possible.
Stay tuned for more information about our machine learning efforts, or check out some of our previous articles below. Interested in a machine learning career at Binance? Check out Binance Engineering on our careers page for open positions.
Further reading
An in-depth look at our machine learning capability
Using MLOps to build a real-time end-to-end machine learning pipeline
A Feature Engineering Case Study on Consistency and Fraud Detection