Hashing refers to the process of generating a fixed-size output from a variable-size input. This is done using mathematical formulas known as hash functions (implemented as hashing algorithms).

Although not all hash functions involve cryptography, cryptographic hash functions are at the heart of cryptocurrencies. Thanks to them, blockchains and other types of distributed systems are able to achieve significant levels of data integrity and security.

Both conventional and cryptographic hash functions are deterministic. This means that as long as the input does not change, the hashing algorithm will always produce the same output (also known as a digest or hash).

Typically, cryptocurrency hashing algorithms are designed as one-way functions, meaning they cannot be easily reversed without large amounts of time and computing resources. In other words, it's quite easy to create an output from an input, but relatively difficult to go in the opposite direction (to generate the input from the output alone). Overall, the harder it is to find the input, the more secure the hashing algorithm is considered.


How does a hash function work?

Different hash functions produce outputs of different sizes, but the output size of any given hashing algorithm is always constant. For example, the SHA-256 algorithm can only produce 256-bit outputs, while SHA-1 will always generate a 160-bit hash.
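The fixed output size can be checked with a short sketch using Python's standard hashlib module. The helper name digest_bits is only an illustration, not something from this article; it hashes a 1-byte input and a 1 MB input with each algorithm and reports the digest size in bits.

```python
import hashlib

def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the size in bits of the digest `algorithm` produces for `data`."""
    return len(hashlib.new(algorithm, data).digest()) * 8

for name in ("sha1", "sha256", "sha512"):
    # A 1-byte input and a 1 MB input give digests of the same size.
    print(name, digest_bits(name, b"a"), digest_bits(name, b"a" * 1_000_000))

# sha1 160 160
# sha256 256 256
# sha512 512 512
```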

To illustrate, let's run the words Binance and binance through the SHA-256 hashing algorithm (the one used in Bitcoin).

SHA-256

Input      Output (256 bits)

Binance    f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191

binance    59bba357145ca539dcd1ac957abc1ec58339ddcae7f5e8b5da0c36624784b2


Note how a minor change (here the case of the first letter) resulted in a totally different hash value. Since we are using SHA-256, the outputs will always have a fixed size of 256 bits (or 64 hexadecimal characters), regardless of the size of the input. Additionally, no matter how many times one applies this algorithm to these two words, each output will remain the same.
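The comparison above can be reproduced with a minimal sketch based on Python's hashlib. It assumes the words are encoded as UTF-8 before hashing, which the article does not state; the helper name sha256_hex is illustrative. The last lines count how many of the 256 output bits differ between the two digests.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` (UTF-8 encoded) as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

upper = sha256_hex("Binance")
lower = sha256_hex("binance")
print(upper)
print(lower)

# Fixed size: 256 bits = 32 bytes = 64 hexadecimal characters.
print(len(upper), len(lower))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex("Binance") == upper)

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(upper, 16) ^ int(lower, 16)).count("1")
print(differing_bits)
```

For a well-behaved hash function, roughly half of the output bits are expected to flip when the input changes even slightly.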

On the other hand, if we run the same inputs through the SHA-1 hashing algorithm, we obtain the following results:

SHA-1

Input      Output (160 bits)

Binance    7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1

binance    e58605c14a76ff98679322cca0eae7b3c4e08936


The acronym SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms as well as the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.


Why are they important?

Conventional hash functions have a wide range of use cases, including database lookups, large file analyses, and data management. Cryptographic hash functions, on the other hand, are widely used in information security applications, such as message authentication and digital fingerprinting. When it comes to Bitcoin, cryptographic hash functions are an essential part of the mining process and also play a role in generating new addresses and keys.

The true power of hashing is revealed when it comes to processing huge amounts of information. For example, one can run a large file or data set through a hash function and then use its output to quickly verify the accuracy and integrity of the data. This is possible because of the deterministic nature of hash functions: the same input will always result in the same condensed output (hash), so any change to the data shows up as a different hash. Such a technique removes the need to store and "remember" large amounts of data.
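As a minimal sketch of this kind of integrity check, the following Python code hashes a file in chunks and compares the result with a previously recorded hash. The function name sha256_file, the chunk size, and the temporary demo file are assumptions made for the example.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a temporary file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some large data set")
    demo_path = tmp.name

reference = sha256_file(demo_path)               # recorded once, e.g. published
unchanged = sha256_file(demo_path) == reference  # same data, same hash

with open(demo_path, "ab") as handle:            # simulate corruption or tampering
    handle.write(b"!")
tampered = sha256_file(demo_path) != reference   # any change alters the hash

os.remove(demo_path)
print(unchanged, tampered)  # True True
```

Only the 64-character reference hash needs to be kept to check the file later, not a second copy of the data.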

Hashing is particularly useful in the context of blockchain technology. The Bitcoin blockchain has several operations that involve hashing, most of which occur in the mining process. In fact, almost all cryptocurrency protocols rely on hashing to link and condense groups of transactions into blocks, as well as to produce cryptographic links between each block to ultimately create a blockchain.
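The two roles described above, condensing transactions and linking blocks, can be sketched in a few lines of Python. This is a toy model under stated assumptions: it uses double SHA-256 and the "duplicate the last hash when the count is odd" rule as Bitcoin does, but the block here contains only a previous hash and a Merkle root, whereas a real block header has more fields. All function names and the sample transactions are hypothetical.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the construction Bitcoin uses for block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions: list) -> bytes:
    """Condense a non-empty list of transactions (bytes) into one 32-byte hash."""
    level = [sha256d(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def block_hash(previous_hash: bytes, transactions: list) -> bytes:
    """Hash of a toy block: the previous block's hash plus this block's Merkle root."""
    return sha256d(previous_hash + merkle_root(transactions))

GENESIS_PREVIOUS = bytes(32)  # placeholder "previous hash" for the first block

block_1 = block_hash(GENESIS_PREVIOUS, [b"alice->bob:1", b"bob->carol:2"])
block_2 = block_hash(block_1, [b"carol->dave:3"])

# Changing a transaction in block 1 changes its hash, and therefore block 2's.
forged_1 = block_hash(GENESIS_PREVIOUS, [b"alice->bob:100", b"bob->carol:2"])
forged_2 = block_hash(forged_1, [b"carol->dave:3"])
print(block_1 != forged_1, block_2 != forged_2)  # True True
```

Because each block hash depends on the previous one, altering an old transaction would require recomputing every later block.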


Cryptographic hash functions

A hash function that deploys cryptographic techniques can be defined as a cryptographic hash function. Typically, breaking a cryptographic hash function requires a myriad of brute-force attempts. For an individual to "reverse" a cryptographic hash function, they would need to guess the input by trial and error until the corresponding output is produced. However, there is also the possibility of different inputs producing the exact same output, in which case a collision occurs.
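The trial-and-error approach can be illustrated with a deliberately tiny input space. The sketch below assumes the secret is a 4-digit numeric string (a hypothetical example, chosen so the search finishes instantly) and tries every candidate until one hashes to the target.

```python
import hashlib
from itertools import product
from string import digits

def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a string as hex."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def brute_force_pin(target_hash: str, length: int = 4):
    """Try every numeric string of `length` digits until one hashes to the target."""
    for attempts, candidate in enumerate(product(digits, repeat=length), start=1):
        guess = "".join(candidate)
        if sha256_hex(guess) == target_hash:
            return guess, attempts
    return None, 10 ** length

target = sha256_hex("4821")  # hypothetical secret, known to the attacker only as a hash
guess, attempts = brute_force_pin(target)
print(guess, attempts)  # 4821 4822
```

With only 10,000 possible inputs the search is trivial. The same loop over arbitrary inputs has no practical chance of success, which is what makes the function one-way in practice.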

Technically, a cryptographic hash function needs to have three properties to be considered effectively secure: collision resistance, preimage resistance, and second preimage resistance.

Before discussing each property, let's summarize their logic in three short sentences.

  • Collision resistance: It is infeasible to find two distinct inputs that produce the same hash as output.

  • Preimage resistance: It is infeasible to "reverse" the hash function (find the input from a given output).

  • Second preimage resistance: It is infeasible to find a second input that collides with a specified input.


Collision resistance

As mentioned, a collision occurs when different inputs produce the exact same hash. So a hash function is considered collision-resistant until someone finds a collision. Note that collisions will always exist for any hash function because the possible inputs are infinite, while the possible outputs are finite.

In other words, a hash function is collision-resistant when the chance of finding a collision is so low that it would require millions of years of computation. So while there are no collision-free hash functions, some of them are strong enough to be considered robust (e.g. SHA-256).
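To see why the output size matters, here is a sketch that weakens SHA-256 on purpose by keeping only the first 2 bytes (16 bits) of each digest. The truncation and the helper names are assumptions made for the demonstration; nothing here breaks the full 256-bit function.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """First `n_bytes` of SHA-256: a deliberately weak hash for demonstration."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash "0", "1", "2", ... until two different inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1

first, second, tries = find_collision()
print(first, second, tries)
```

With only 65,536 possible outputs, a collision is guaranteed within 65,537 inputs and typically appears after a few hundred. Each extra output bit makes the search harder, and at 256 bits it becomes infeasible in practice.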

Among the different SHA algorithms, SHA-0 and SHA-1 are no longer considered secure because collisions have been found. Currently, the SHA-2 and SHA-3 groups are considered collision-resistant.


Preimage resistance

The property of preimage resistance is related to the concept of one-way functions. A hash function is considered preimage-resistant when there is a very low probability of someone finding the input that generated a particular output.

Note that this property is different from the previous one, because here a hypothetical attacker would try to guess the input by looking at a given output. A collision, on the other hand, occurs when someone finds two different inputs that generate the same output, but it doesn't specifically matter which input was used.

The preimage resistance property is valuable for protecting data because simply hashing a message can prove its authenticity, without having to disclose its content. In practice, many service providers and web applications store and use hashes generated from passwords rather than plain text passwords.
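A minimal sketch of password storage along these lines is shown below. Note the assumptions: rather than a single plain hash, it uses a random salt and PBKDF2-HMAC-SHA256 from Python's standard library, a common hardening choice that this article does not itself specify. The iteration count and function names are illustrative.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch; tune for real use

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these two values are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # random salt so equal passwords give different hashes
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse")          # hypothetical password
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

If the stored values leak, preimage resistance is what keeps an attacker from reading the passwords directly; they are left with guessing candidates one by one.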


Second preimage resistance

To simplify, we can say that second preimage resistance lies somewhere between the other two properties. A second preimage attack occurs when someone is able to find a specific input that generates the same output as another input they already know.

In other words, a second preimage attack involves finding a collision, but instead of looking for two random inputs that generate the same hash, we look for an input that generates the same hash as another specific input.

Therefore, any collision-resistant hash function is also resistant to second preimage attacks, because a second preimage attack always involves finding a collision. However, one can still perform a preimage attack on a collision-resistant function, because it involves finding a single input from a single output.
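The difference between the two attacks can be sketched on a deliberately weakened hash, here the first 2 bytes of SHA-256 (an assumption made so the search finishes quickly). The code fixes one known input and searches for a different input with the same truncated hash; the names and the max_tries limit are illustrative.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """First `n_bytes` of SHA-256: a deliberately weak hash for demonstration."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_second_preimage(known_input: bytes, n_bytes: int = 2, max_tries: int = 2_000_000):
    """Search for a different input whose truncated hash matches `known_input`'s."""
    target = truncated_hash(known_input, n_bytes)
    for counter in range(max_tries):
        candidate = str(counter).encode()
        if candidate != known_input and truncated_hash(candidate, n_bytes) == target:
            return candidate, counter + 1
    return None, max_tries

known = b"Binance"
other, tries = find_second_preimage(known)
print(other, tries)
```

Because the target is fixed in advance, this search needs on the order of 65,536 attempts for a 16-bit output, whereas finding any two colliding inputs on the same weakened hash typically takes only a few hundred. That gap is why a second preimage is the harder attack.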


Mining

There are many steps in Bitcoin mining that involve hash functions, such as checking balances, linking transaction inputs and outputs, and hashing transactions within a block to form a Merkle Tree. But one of the main reasons the Bitcoin blockchain is secure is that miners must perform a myriad of hashing operations in order to find a valid solution for the next block.

Specifically, a miner must try many different inputs when creating a hash value for their candidate block. In essence, they will only be able to validate their block if they generate an output hash that starts with a certain number of zeros. The number of zeros is what determines the mining difficulty, and it varies according to the hash rate devoted to the network.
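The search described above can be sketched as a simplified proof-of-work loop. This is an illustration under stated assumptions, not Bitcoin's actual procedure: it counts leading zero hexadecimal digits, whereas Bitcoin compares the hash as a number against a target, and the block data, nonce encoding, and function name are hypothetical. Double SHA-256 is used because that is what Bitcoin applies to block headers.

```python
import hashlib

def mine(block_data: bytes, zero_hex_digits: int, max_nonce: int = 10_000_000):
    """Search for a nonce whose block hash starts with `zero_hex_digits` zeros."""
    prefix = "0" * zero_hex_digits
    for nonce in range(max_nonce):
        # The nonce is the only part of the input that changes between attempts.
        payload = block_data + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None, None

nonce, digest = mine(b"candidate block", zero_hex_digits=4)
print(nonce, digest)
```

Each additional zero hex digit multiplies the expected number of attempts by 16, while checking a claimed solution still takes a single hash computation.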

In this case, the hash rate represents the amount of computing power being invested in Bitcoin mining. If the network hash rate increases, the Bitcoin protocol will automatically raise the mining difficulty so that the average time needed to mine a block remains close to 10 minutes. Conversely, if several miners decide to stop mining, causing the hash rate to drop significantly, the mining difficulty will be lowered, making it easier to mine (until the average block time returns to 10 minutes).
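The adjustment rule can be sketched as a simple proportion. The 10-minute target comes from the text above; the 2016-block retarget interval and the factor-of-4 limit per adjustment are details of the Bitcoin protocol that this article does not mention. The function name and the sample difficulty values are hypothetical.

```python
TARGET_BLOCK_SECONDS = 10 * 60   # 10 minutes per block
RETARGET_INTERVAL = 2016         # blocks between adjustments in Bitcoin
MAX_FACTOR = 4                   # each adjustment is limited to 4x either way

def adjust_difficulty(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how fast the last RETARGET_INTERVAL blocks were mined."""
    expected_seconds = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL
    # Clamp the measured timespan so difficulty moves by at most a factor of 4.
    clamped = min(max(actual_seconds, expected_seconds / MAX_FACTOR),
                  expected_seconds * MAX_FACTOR)
    return old_difficulty * expected_seconds / clamped

# Hash rate went up: blocks arrived every 8 minutes, so difficulty rises.
print(adjust_difficulty(100.0, 8 * 60 * RETARGET_INTERVAL))   # 125.0

# Hash rate went down: blocks arrived every 20 minutes, so difficulty falls.
print(adjust_difficulty(100.0, 20 * 60 * RETARGET_INTERVAL))  # 50.0
```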

Note that miners do not need to check for possible collisions because there are multiple hashes that they can generate as valid output (starting with a specific number of zeros). So there are several possible solutions for a certain block, and miners must find one - according to the threshold determined by the mining difficulty.

Since Bitcoin mining is a cost-intensive task, miners have no reason to cheat the system, as doing so would lead to significant financial losses. The more miners join a blockchain, the bigger and stronger it gets.


To conclude

There is no doubt that hash functions are essential tools in computing, especially when dealing with massive amounts of data. When combined with cryptography, hashing algorithms can be quite versatile, offering security and authentication in many different ways. As such, cryptographic hash functions are vital to almost all cryptocurrency networks, so understanding their properties and working mechanisms is certainly helpful for anyone interested in blockchain technology.