What is a Merkle Tree?

A Merkle Tree is a cryptographic structure used to securely verify the validity and source of specific content within a large data set. It does this by representing the entire data set, no matter how much information it contains, through a single fixed-size value called a “hash.”

Note: You can transform any type of content into a hash. For example, the paragraph above can be represented as the following fixed-size hash value:

b280ea449f14c128518b490e5f36a1e6f88e17e2d49341ba92f0b38ef23dcbe03d131e0630833092093c226a3f694329dd542e540dd6bc087ee2b632fc2325de

You can even confirm this by recreating the hash here! Keep a close eye on how the hash value changes entirely with even the slightest change to the paragraph (including blank spaces).
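As a rough illustration of this behavior, here is a minimal Python sketch. It assumes SHA-512, chosen only because it produces a 128-character hexadecimal value like the one above; the article does not state which algorithm generated that hash, and the sample sentences and the `hash_text` helper are hypothetical.

```python
import hashlib


def hash_text(text: str) -> str:
    """Return the SHA-512 digest of `text` as a hexadecimal string."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


# Two sample inputs (hypothetical) that differ by a single trailing space.
original = "A Merkle Tree represents a data set through a single hash."
modified = original + " "

original_hash = hash_text(original)
modified_hash = hash_text(modified)

# Both hashes have the same fixed size, but their contents are unrelated.
print(len(original_hash), original_hash)
print(len(modified_hash), modified_hash)
```

Running it prints two values of identical length that look nothing alike, even though the inputs differ by one blank space.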

How can large amounts of data be summarized into a single hash?

This is where the idea of a “tree” comes into play. Keep in mind that a large data set is usually composed of multiple data subsets. For example, Bitso has a data set that contains all the assets we hold under custody, made up of data subsets that represent the balance each of our clients holds at Bitso.

The Merkle Tree method creates an individual “leaf” for each data subset, and each “leaf” has a hash that represents the information contained in it. When we gather several “leaves,” we build a data “branch.” This “branch” is also represented by its own hash, created by combining its “leaf” hashes and hashing the result into a new single fixed-size value. By combining all “branch” hashes in the same way, we arrive at the final “tree” hash, known as the Merkle Root Hash, which represents all the content contained in our complete data structure.
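The process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Bitso's actual implementation: it assumes SHA-256, pairs hashes two at a time, and duplicates the last hash when a level has an odd number of entries (one common convention among several). The `merkle_root` function and the sample balances are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves: list[str]) -> str:
    """Reduce a list of leaf contents to a single Merkle Root Hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")

    # Step 1: every leaf is represented by the hash of its content.
    level = [sha256_hex(leaf.encode("utf-8")) for leaf in leaves]

    # Step 2: combine hashes pairwise into branch hashes until one remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: repeat the last hash to complete the pair.
            level.append(level[-1])
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical client balances acting as the data subsets (leaves).
balances = ["alice:1.5 BTC", "bob:0.2 BTC", "carol:3.0 BTC", "dave:0.7 BTC"]
root = merkle_root(balances)
print(root)
```

Changing any single balance, even by one character, produces a different root, which is what lets one hash stand in for the whole data set.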

Why is this so relevant?

With this method, it’s possible to show that a specific data subset is indeed contained within the larger data structure without revealing its content or the content in other data subsets. In other words, you can verify that your data leaf “exists” in the Merkle Tree.
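To make this more concrete, here is a minimal Python sketch of how such a check could work. It is an assumption-laden illustration rather than Bitso's actual procedure: it uses SHA-256, duplicates the last hash on odd-sized levels, and all function names (`build_levels`, `make_proof`, `verify`) and sample balances are hypothetical. The verifier receives only their own leaf, a short list of sibling hashes, and the Merkle Root Hash.

```python
import hashlib


def h(data: str) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_levels(leaves):
    """Return every level of the tree, from leaf hashes up to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        current = list(levels[-1])
        if len(current) % 2 == 1:
            current.append(current[-1])  # assumed convention for odd levels
        levels.append(
            [h(current[i] + current[i + 1]) for i in range(0, len(current), 2)]
        )
    return levels


def make_proof(levels, index):
    """Collect the sibling hash (and its side) at each level below the root."""
    proof = []
    for level in levels[:-1]:
        padded = (level + [level[-1]]) if len(level) % 2 == 1 else level
        sibling = index ^ 1  # the neighbor in the same pair
        side = "left" if sibling < index else "right"
        proof.append((padded[sibling], side))
        index //= 2  # move up to the parent position
    return proof


def verify(leaf, proof, root):
    """Recompute the path from `leaf` to the root using only sibling hashes."""
    current = h(leaf)
    for sibling_hash, side in proof:
        if side == "left":
            current = h(sibling_hash + current)
        else:
            current = h(current + sibling_hash)
    return current == root


# Hypothetical client balances acting as leaves.
balances = ["alice:1.5", "bob:0.2", "carol:3.0", "dave:0.7", "erin:4.1"]
levels = build_levels(balances)
root = levels[-1][0]

proof = make_proof(levels, 2)  # proof for carol's leaf
print(verify("carol:3.0", proof, root))
print(verify("carol:9.9", proof, root))
```

Note that the proof contains only hashes, never the other clients' balances, and its length grows with the height of the tree rather than with the total number of leaves.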