Last night's move in BTC and ETH was well timed; we have not had such a smooth trend in a long time.
This time let's talk about how to use Binance's historical trade records.
Binance officially shares a lot of trade data. There is a GitHub page where you can see in detail which categories of data are available and how to download them. Here I mainly cover downloading, organizing, and using the aggregated trade records, aggTrades.
aggTrades are aggregated trade records: Binance merges several consecutive trades executed at the same time, in the same direction, and at the same price into one record. It is a little like a K-line, except that there is only a single price and no high, low, and so on. For example, if a large market order is placed and many small resting orders are eaten to fill it, the fills that arrive within the same few milliseconds are aggregated together; put another way, it is whatever the matching engine can fill in one pass.
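To make the record layout concrete, here is a small sketch of one hypothetical aggTrades row. The column names follow the schema published with Binance's data dumps (check their GitHub page for the authoritative list); the values themselves are made up for illustration.

```python
import pandas as pd

# One hypothetical aggTrades row; values are invented, column names
# follow the schema of Binance's public aggTrades dumps.
sample = pd.DataFrame([{
    "agg_trade_id": 1103275001,      # id of the aggregated record
    "price": 20150.5,                # the single execution price
    "quantity": 0.75,                # summed quantity of the merged fills
    "first_trade_id": 2205100001,    # first raw trade merged in
    "last_trade_id": 2205100004,     # last raw trade merged in
    "transact_time": 1654041600123,  # millisecond epoch timestamp
    "is_buyer_maker": False,         # True if the buyer was the maker
}])

# How many raw trades were merged into this one record: 4
n_merged = int(sample["last_trade_id"].iloc[0]
               - sample["first_trade_id"].iloc[0] + 1)
```

The first/last trade ids are what let you tell a single large fill apart from one raw trade, even though both appear as one row.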
The advantage of aggTrades is that you get millisecond-level trade records without the sheer data volume of raw trades (the original, unaggregated records), which makes it a good compromise for high-frequency work.
Quasi-high-frequency strategies generally use this kind of aggregated data. Truly high-frequency traders usually use raw trades and order-book data, which are very large and inconvenient to process; if you do not need them, skip them, since they both consume resources and complicate the programming.
aggTrades can be used to synthesize K-lines at any short interval, such as 5, 10, or 15 seconds, and such K-lines are still usable for some intraday strategies.
For example, below is the 1-second K-line chart posted earlier.
There is also high-frequency arbitrage. Plain arbitrage is hard to profit from now and quite risky; at higher frequencies there may still be some opportunities.
Tick-level data like this can also be used to synthesize alternative K-lines, such as equal-volume bars. Building them from aggTrades is much more accurate than building them from minute K-lines, and CTA strategies built this way may complement ordinary K-line strategies. I may write about how to synthesize such bars in a future article.
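As a taste of what such alternative bars look like, here is a minimal sketch of equal-turnover ("dollar") bars built from aggTrades-style rows. The `price`/`quantity` column names match the dump schema; the cutting rule (boundary trades are not split across bars) is a simplification of my own, not code from the attachment.

```python
import pandas as pd

def dollar_bars(trades: pd.DataFrame, bar_value: float) -> pd.DataFrame:
    """Group trades into bars of roughly equal quote turnover.

    `trades` is assumed to have `price` and `quantity` columns, as in
    the aggTrades dumps. A sketch only: a trade that straddles a bar
    boundary is assigned wholly to one bar rather than split.
    """
    turnover = (trades["price"] * trades["quantity"]).cumsum()
    bar_id = (turnover // bar_value).astype(int)   # which bar each trade falls in
    grouped = trades.groupby(bar_id)
    return pd.DataFrame({
        "open": grouped["price"].first(),
        "high": grouped["price"].max(),
        "low": grouped["price"].min(),
        "close": grouped["price"].last(),
        "volume": grouped["quantity"].sum(),
    })
```

With a constant `bar_value`, busy periods produce many bars and quiet periods few, which is exactly the sampling property that makes these bars interesting.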
Now for the code (most of it is in the attachment, since it is too long to post inline). The download code is Binance's own example with slight changes, plus my processing code.
The example uses monthly data (Binance also provides daily files). All the code is single-threaded and synchronous, with no async or multiprocessing: downloads like this run once a month or once every n days, so you can simply wait a while, and there is no need to make it complicated.
Slightly stale data is generally fine for basic backtesting, because high-frequency backtests mainly play an analytical role. What matters most is the actual traded price, since slippage can be large, a resting order may not fill, a follow-up order can be missed, the exchange can be delayed, and so on.
Later on there is also code to download real-time data, that is, to pull today's aggTrades and quickly verify a market move you missed. Binance's daily files lag by several days, so if you need data urgently you have to fetch it yourself via the exchange API.
1. Download
The first part of the code is the two files agg.py and utility.py; then run
python3 agg.py -y 2022 -m 6 -t um -folder /your specified path
This example command downloads all USDT-margined perpetual aggTrades for June 2022; it usually takes about ten minutes. The files arrive compressed, so the next step is decompression.
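For reference, the monthly files live at predictable URLs on Binance's public data host, so the downloader reduces to building a URL and fetching it. The sketch below assumes the `data.binance.vision` path layout documented in Binance's public-data repository; it is my simplified stand-in for agg.py, not the attachment code itself.

```python
import os
import urllib.request

# Assumed layout of Binance's public data host for USDT-margined
# perpetual aggTrades; check their GitHub page if it has changed.
BASE = "https://data.binance.vision/data/futures/um/monthly/aggTrades"

def monthly_url(symbol: str, year: int, month: int) -> str:
    """Build the public download URL for one month of aggTrades."""
    fname = f"{symbol}-aggTrades-{year}-{month:02d}.zip"
    return f"{BASE}/{symbol}/{fname}"

def download_month(symbol: str, year: int, month: int, folder: str) -> str:
    """Fetch one monthly zip into `folder`, skipping files already there."""
    url = monthly_url(symbol, year, month)
    dest = os.path.join(folder, url.rsplit("/", 1)[-1])
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Skipping files that already exist is what makes it safe to rerun the loop over all symbols after an interrupted download.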
2. Unzip
Decompression uses unzip.py, a very short file with only two functions: one for decompressing monthly data and one for daily data.
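The monthly half of that helper amounts to walking the folder and extracting each archive in place. A minimal sketch with the standard library (my stand-in for unzip.py, assuming the `SYMBOL-aggTrades-YYYY-MM.zip` naming of the dumps):

```python
import zipfile
from pathlib import Path

def unzip_monthly(folder: str) -> list[str]:
    """Extract every monthly aggTrades zip in `folder` next to itself.

    Returns the names of the extracted members so the caller can feed
    them straight into the csv-to-pickle step.
    """
    extracted = []
    for zp in sorted(Path(folder).glob("*-aggTrades-*.zip")):
        with zipfile.ZipFile(zp) as zf:
            zf.extractall(zp.parent)       # csv lands beside the zip
            extracted.extend(zf.namelist())
    return extracted
```

The daily variant would be identical apart from the glob pattern.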
Unzipping yields CSV files. The problem with CSV is that it takes more disk space and loads slowly, so the next step converts the CSV files to pickle. Whether to compress the pickles depends on your situation; with a large drive, skipping compression is fine. Perpetual data runs roughly 50 GB per month, so plan accordingly. A fast external SSD helps here: 1 TB now costs only a few hundred yuan, which is already cheap, and it loads quickly during backtests.
3. Convert CSV to pickle
The conversion code is in csv_to_pkl.py. As mentioned above, the data is easier to store and use after conversion.
After converting I delete the CSV files manually, since they would otherwise occupy the disk; there is no code for this step. As noted before, these are very low-frequency operations and do not need full automation. The example code uses gzip level-2 compression, which generally brings one month of data under about 10 GB.
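The core of that conversion is one read and one write. A sketch of what csv_to_pkl.py does, assuming pandas' dict form of the `compression` argument for the gzip level (available in recent pandas); the exact function names are mine:

```python
import pandas as pd
from pathlib import Path

def csv_to_pkl(csv_path: str, level: int = 2) -> str:
    """Convert one aggTrades csv to a gzip-compressed pickle.

    `level=2` trades compression ratio for speed, matching the
    level-2 setting mentioned above. Returns the output path.
    """
    df = pd.read_csv(csv_path)
    out = str(Path(csv_path).with_suffix(".pkl.gz"))
    # pandas accepts a dict here to pass gzip's compresslevel through
    df.to_pickle(out, compression={"method": "gzip", "compresslevel": level})
    return out
```

Reading it back is just `pd.read_pickle(out)`; the `.gz` suffix lets pandas infer the compression.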
Besides taking up space, CSV also loads slowly, so the pickle format is recommended. Its performance is good overall, but the key point is compatibility: on cloud servers and with many other open-source packages, pickle is better supported than feather and the like.
Okay, now that the data is ready, you can start studying strategies.
4. Real-time data download
As mentioned before, the files on Binance's data server lag by a day or two. If you need to check how your strategy would behave under current market conditions, but the real prices of some coin have not been posted yet, the code below downloads aggTrades back to a specified time. I usually pull the most recent 12 hours of data to see how a strategy might perform.
If you use the code below, update it to the latest start time first.
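The backfill boils down to paging the futures REST endpoint forward by aggregate-trade id until the target end time. The sketch below assumes the `GET /fapi/v1/aggTrades` endpoint and its short field names from Binance's futures API docs (`a` = agg id, `T` = ms timestamp, etc.); the fetch function is injectable so the paging logic can be tested offline, a design choice of mine rather than the attachment's.

```python
import json
import time
import urllib.parse
import urllib.request

FAPI = "https://fapi.binance.com/fapi/v1/aggTrades"   # assumed endpoint

def fetch_page(symbol, start_ms=None, from_id=None, limit=1000):
    """One REST request; fields per the futures API docs
    (a=agg id, p=price, q=qty, f/l=first/last trade id,
    T=ms timestamp, m=is buyer maker)."""
    params = {"symbol": symbol, "limit": limit}
    if from_id is not None:
        params["fromId"] = from_id
    elif start_ms is not None:
        params["startTime"] = start_ms
    url = FAPI + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as r:
        return json.loads(r.read())

def backfill(symbol, start_ms, end_ms, fetch=fetch_page):
    """Page forward from `start_ms` to `end_ms`, following agg ids."""
    out, from_id = [], None
    while True:
        page = fetch(symbol, start_ms=start_ms, from_id=from_id)
        if not page:
            break
        out.extend(t for t in page if t["T"] <= end_ms)
        if page[-1]["T"] > end_ms or len(page) < 1000:
            break                           # reached end time or last page
        from_id = page[-1]["a"] + 1         # resume after the last id seen
        time.sleep(0.1)                     # stay well under the rate limit
    return out
```

Twelve hours of a busy pair can be tens of thousands of pages, which is exactly why this stays a run-and-wait script rather than anything clever.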
The downloaded data looks like this; refer to Binance's documentation on recent (aggregate) trades for the meaning of each field.
With raw data like this, you can use df.resample(bar_size).agg() to aggregate it into K-lines of any interval you need, and then do whatever you like with them.
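The resample-and-agg step above can be sketched as follows. The tick values are synthetic; with real aggTrades you would first set the index from the millisecond timestamp column.

```python
import pandas as pd

# A few synthetic ticks; with real data, build the index from the
# millisecond timestamp column first, e.g.
# df.index = pd.to_datetime(df["transact_time"], unit="ms")
idx = pd.to_datetime([0, 200, 1200, 1900, 2500], unit="ms")
df = pd.DataFrame(
    {"price": [10.0, 11.0, 10.5, 10.8, 11.2],
     "quantity": [1.0, 2.0, 1.0, 1.0, 3.0]},
    index=idx,
)

# 1-second bars: OHLC from price, volume from summed quantity
bars = df.resample("1s").agg({"price": "ohlc", "quantity": "sum"})
bars.columns = ["open", "high", "low", "close", "volume"]
```

Swap `"1s"` for `"5s"`, `"15s"`, or any other pandas offset alias to get the short K-lines discussed earlier.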
Finally
That is basically how I prepared the aggTrades data used earlier. Competition among strategies is getting ever more intense, and finer-grained data lets you develop more categories of strategies. Complementarity between strategies is the key to running multiple strategies.
If you need the code above, you can contact me to get it.