Among the crypto events taking place in Paris this week, alongside the highly anticipated Ethereum conference EthCC, the 2023 DeFi Security Summit has also drawn attention. At the summit, Felix, security services manager for OpenZeppelin EMEA, gave a talk on testing and security, which BlockBeats has compiled and translated as follows:
Hi, I'm Felix from OpenZeppelin. You may know us for our smart contract security libraries, but our services extend far beyond that, including auditing, monitoring, and much more. Today I want to talk to you about one topic: testing.
This is an important question we face today: if we achieve 100% test coverage, do we actually get security? And if so, how much: a little or a lot? I will try to answer that in this talk. But before we dive in, let's take a step back and ask why we test at all.
Generally speaking, the purpose of testing is verification: you want to check that your codebase implements a specific piece of functionality, or behaves the way you expect it to.
That is the original motivation for testing, but if you dig a little deeper, you'll find it serves other purposes too. Beyond unit tests, you may also want integration tests or end-to-end tests, and these are "descriptive" in nature: they show what the normal execution paths through your codebase look like and how potential end users will interact with it. That descriptive quality is part of testing's value as well. But does it actually help with security?
We hope it improves security, but we can't answer that question on this slide, so let's set it aside and come back to it later. For now, let's talk about test quality. As your test suite matures, you will want to judge how good it is, and there are several different quality metrics for that.
The first metric is "functionality", meaning test coverage of the codebase, usually expressed as line coverage or branch coverage. It is a useful reference number. Assume that in our example this number reaches 100%, so we are doing well on that front.
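For reference, Foundry can report these numbers directly; a typical invocation looks like this:

```bash
# Report line/statement/branch/function coverage for the test suite
forge coverage
```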
"Descriptive" means that you have some interesting test cases, not just very small unit tests, but some very interesting end-to-end test cases. Now the question is again, do we need an additional metric, or have we achieved perfect security by accident? Let's look at a small example: Here is a very small ERC20 token I wrote:
As you can see in the upper part of the slide, Foundry (a Solidity development framework) reports 100% test coverage. As the lower part of the slide shows, the token has three use cases: minting, burning, and transferring. All of them work and all are covered by the tests. So we have done a good job on both "functionality" and "descriptiveness". Are we safe?
No. The token has an open burn vulnerability: anyone can burn any user's tokens. That is a critical vulnerability, and the security is effectively 0%. You have to fix it; you cannot deploy in this form. So what is missing?
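The original slide isn't reproduced here, but a minimal sketch of such a token, written against OpenZeppelin's ERC20 base contract, might look like this (a hypothetical reconstruction, not the speaker's actual code):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import {ERC20} from "@openzeppelin/contracts/token/ERC20/ERC20.sol";

// Hypothetical reconstruction of the token from the talk.
contract VulnerableToken is ERC20 {
    constructor() ERC20("Vulnerable", "VULN") {
        _mint(msg.sender, 1_000_000e18); // mint the initial supply to the deployer
    }

    // BUG: "open burn" -- anyone can burn any holder's tokens, because the
    // function neither restricts the caller nor checks an allowance.
    function burn(address from, uint256 amount) external {
        _burn(from, amount);
    }
}
```

Minting, burning, and transferring all "work", so line coverage can reach 100% while the missing access control goes completely unnoticed.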
What is missing is a negative test case. In Foundry syntax, a function whose name starts with "testFail" checks for a specific failure case: the test passes only if the call reverts. A typical negative test here checks that an arbitrary address cannot burn 1,000 tokens belonging to another user. Obviously, you want that to be impossible, so you write this negative test. This is just a simple example; let's look at the general principle.
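A sketch of what that missing negative test could look like in 2023-era Foundry syntax (names, addresses, and the import path are illustrative; VulnerableToken is the hypothetical token sketched above):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import {Test} from "forge-std/Test.sol";
import {VulnerableToken} from "./VulnerableToken.sol"; // hypothetical path

contract VulnerableTokenTest is Test {
    VulnerableToken token;
    address alice = address(0xA11CE);
    address attacker = address(0xBAD);

    function setUp() public {
        token = new VulnerableToken();
        token.transfer(alice, 1000e18); // give alice a balance worth attacking
    }

    // A "testFail" test passes only if the call reverts. An arbitrary address
    // must not be able to burn another user's tokens, so we expect this burn
    // to revert. Against the open-burn token above it does NOT revert, so the
    // test fails and exposes the vulnerability.
    function testFailBurnFromArbitraryAddress() public {
        vm.prank(attacker); // next call is made from the attacker's address
        token.burn(alice, 1000e18);
    }
}
```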
When people say they are doing testing, most of what we typically see falls into the left-hand category: functional testing, focused on coverage. That is part of testing, and people do it very well, but it is not directly related to security. If you want to do security testing, you need to look at the right-hand side.
There you need to ask yourself: have I tested the application's behavior thoroughly enough? For example, for access-controlled functions, have I actually tested that an arbitrary address cannot interact with the contract in a particular way? This kind of application-aware testing is what we call security testing here.
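Foundry's fuzzing makes the "arbitrary address" part of that question concrete. A hypothetical property test, added to the test contract sketched above, might look like this:

```solidity
// Fuzz test: Foundry calls this with many random `caller` values,
// approximating the property "no arbitrary address can burn another user's
// tokens". It fails against the vulnerable token above and should pass once
// the burn function is properly access-controlled.
function testBurnRevertsForArbitraryCaller(address caller) public {
    vm.assume(caller != alice); // alice spending her own balance is out of scope
    vm.prank(caller);           // impersonate the fuzzed address
    vm.expectRevert();          // the burn must revert for the test to pass
    token.burn(alice, 1e18);
}
```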
Now I want to take a step back and consider another question: how is testing handled in theory versus in practice?
In theory, the figures published by IBM suggest we should invest heavily in testing, because it pays off. If you find problems early, in the design phase, the implementation phase, or a dedicated testing phase, the cost of fixing them is relatively low. In the deployment phase, the cost rises sharply. That is not unique to blockchain systems, but here the relative cost figure of "100" you see for the deployment phase can be amplified even further by the TVL a protocol holds.
This principle applies to any software system, so in theory everyone should test as well as possible; the economics clearly say it pays off. In practice, though, testing is not glamorous. Testing is not "cool". Nobody says, "I'm really passionate about QA testing; I'm a tester." All the "cool" stories are basically about hackers. Of course, getting hacked is bad for your protocol, but hacking is "cool", while testing is a necessary yet unpopular chore.
So how do we solve this problem? Here's my secret to making testing attractive: treat hacking as part of testing.
To do really good security testing, start with a "while true" loop, because you can iterate this cycle forever. In each iteration, your goal is to create a brand-new proof-of-concept (POC) exploit that tries to attack the protocol you are developing. Either you verify that the attack does not work, or you find something that needs to be fixed. Either way, you then integrate the POC as a test case into your security test suite.
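As a sketch of what one iteration of that loop can produce, here is a hypothetical POC-style test built on the open-burn token from earlier: the body replays the attack end to end, and the assertion states the security property that must survive it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import {Test} from "forge-std/Test.sol";
import {VulnerableToken} from "./VulnerableToken.sol"; // hypothetical path

contract OpenBurnPocTest is Test {
    VulnerableToken token;
    address victim = address(0xBEEF);
    address attacker = address(0xBAD);

    function setUp() public {
        token = new VulnerableToken();
        token.transfer(victim, 1000e18);
    }

    // The POC: the attacker tries to burn the victim's entire balance.
    // Whether or not the call reverts, the property we assert is that the
    // victim's balance is untouched. This test fails against the vulnerable
    // token (the attack succeeds) -- that is the signal to fix the code --
    // and after the fix it stays in the suite as a regression test.
    function testPocAttackerCannotBurnVictimBalance() public {
        uint256 balanceBefore = token.balanceOf(victim);
        vm.prank(attacker);
        try token.burn(victim, balanceBefore) {} catch {}
        assertEq(token.balanceOf(victim), balanceBefore, "victim balance must be untouched");
    }
}
```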
By applying a hacker's mindset to your daily work, you make testing more appealing. This is not just the unit testing you know, or the boring testing typical of college courses. Security testing like this is a modern application of hacker thinking, channeled into an artifact that can be incorporated into your test suite.
That is a nice theory, but let's add a practical angle. Look at which projects were exploited at the code level this year. I'm not talking about basic mistakes like leaked private keys, but about incidents that happened because there were bugs in the code. Had the testing approach above been used, these thefts might have been prevented. With the funds stolen this year reaching $250 million, I think that is a strong argument for integrating this kind of security testing into the test suite.
If you are a decision maker, you probably don't want to remember all the details; you just want a simple metric. Then count the number of interesting test cases in your test suite that are based on proof-of-concept exploits, and use that as your metric for security testing. In general, 100% test coverage is about functional testing; it says nothing about security. If you want to do security testing, you have to test what the code must not allow. And to do security testing well, you need to adopt a hacker's mindset and truly make POC exploits part of your test suite.
Unfortunately, this is not a silver bullet that fixes everything once and for all. It is a good starting point within your project; beyond that, it is best to work with your security partner to strengthen this testing approach further.