Chris Natoli

Bitcoin for mathematicians, part 3: The blockchain

published 4 January 2018

This series of blog posts is an exposition of Satoshi Nakamoto’s original whitepaper from 2008, supplemented by bitcoin.org’s developer guide. Since there are already plenty of Bitcoin explainers for laypeople, the intended audience of this series is more niche, so that mathematicians could benefit from a brief yet still precise explanation of Bitcoin.

Chains

I base the following definition on a descriptivist reading of Satoshi’s whitepaper.

definition: A chain on $A_1\times\cdots\times A_n$ is a finite sequence $$(a_1^1,\ldots,a_n^1),(a_1^2,\ldots,a_n^2),\ldots,(a_1^m,\ldots,a_n^m)$$ where $a_i^k\in A_i$ and for some function $f:A_1\times\cdots\times A_n\to A_1$, we have for all $i\lt m$, $f(a_0^i,\ldots,a_n^i)=a_1^{i+1}$.

For example, if we partition the Fibonacci sequence into pairs, then (1,1), (2,3), (5,8), (13,21), (34,55), … is a chain on $\mathbb{N}\times\mathbb{N}$ where $f:(a_1,a_2)\mapsto a_1+a_2$. Similarly, if a bitcoin were in fact a chain of electronic signatures as described in the whitepaper, then it would be a chain $(s_1,k_1),(s_2,k_2),\ldots,(s_m,k_m)$ on $S\times K$, where $S$ is the space of payers’ signatures and $K$ is the space of payees’ public keys. The function $(s_i,k_i)\mapsto s_{i+1}$ linking one pair to the next is the signing function of a digital signature scheme. Since both the payee’s public key and the payer’s signature are appended to the end of the chain – i.e., end of the coin – each time the coin is passed along, every signature in the coin could be verified as the paper suggests.

The importance of chains is that if someone tries to maliciously edit an earlier tuple in the chain, they would have to update every subsequent tuple according to $f$. This will prove useful in the blockchain to prevent double-payments.

Blocks

Every time a user constructs a transaction, they broadcast it to a network of miners, who collect transactions over a period of time and construct a block out of them. A block is a tuple containing the following (in addition to some technical data for bookkeeping):

1. An ordered list of the transactions that were collected. Note that if Alice sends Bob some bitcoins and Bob has not yet spent them in another transaction collected into this block, then those bitcoins are classfied as unspent transaction output for Bob to spend in a later block.
2. The merkle root of the transactions: From the list of transactions, create a list of transaction ids, i.e., the sha256-hashes of each transaction’s data. Consider the latter list as the leaves of a binary tree in which each node is the hash of the concatenation of its children (if any leaf does not have a sibling, it’s concatenated with itself). This hash tree is called a merkle tree, its root the merkle root.
3. An integer called a nonce, to be explained below.
4. A hash based on the previous block.

The block’s header is an 80-byte string concatenating the last three items and some additional data. The hash in item 4 is the $(\mathrm{sha}_{256}\circ \mathrm{sha}_{256})$-hash of the previous block’s header. Thus the blockchain is the chain of blocks linked together by $\mathrm{sha}_{256}\circ \mathrm{sha}_{256}\circ\pi$, where $\pi$ is the projection map from the entire block onto the block header. As we’ll see below, the blockchain serves as public ledger recording the history of all bitcoin transactions.

Miners cannot simply construct a block and tack it onto the existing chain. Instead, the hash of the block header must be less than a given integer $t\in\{1,\ldots,2^{256}\}$ called the target threshold that sets the probability of mining a successful hash to $\frac{t}{2^{256}}$.1 If the hash exceeds the threshold, then the miner must increment the nonce and try again until they find an appropriate nonce – at which point they append the block and broadcast the new chain to the network – or until a competing miner finds such a nonce. Since hashes are one-way, it is computationally intractable to find a successful pre-image, i.e., a nonce such that when the block header is hashed, the hash is below the threshold. The threshold is set periodically so that it takes about two weeks to mine 2,016 blocks. This system is called proof of work, since it forces miners to compute, on average, $\frac{2^{256}}{t}$ blocks before finding a successful one, hence proving their labor.

To incentivize mining, miners are rewarded $50\cdot 2^{-\lfloor n/210,000\rfloor}$ bitcoin for every successful block, where $n$ is the number of blocks mined so far – i.e, the reward started at 50 bitcoins and is halved after every 210,000 blocks. This reward is recorded as the first transaction of every block, called the coinbase transaction. It cannot be spent for another 100 blocks. Miners also collect all transaction fees from blocks they successfully mine. There is a predetermined maximum number of bitcoins to be mined, namely 21,000,000, so eventually transaction fees will be the only incentive. Although a transaction fee is optional for the payer and payee, picking up that transaction is also optional for the miner, so it behooves the payer and payee to agree on a positive transaction fee.

Why are transactions assembled into a merkle tree, rather than simply concatenated into a long tuple that is then hashed for the block header? The benefit of a merkle tree is that if one wishes to verify that a transaction has been recorded into the blockchain, they only need to know the merkle root of the right block and the intermediate nodes along the branch needed to re-compute the merkle root, rather than all transaction ids in that block. This is called simplified payment verification.

Forking

Chains as defined above must be linearly ordered, so what stops the blockchain from forking into two or more branches? For example, suppose two miners start with the same blockchain but collect different sets of transactions and mine their next block at the same time. Alternatively, suppose someone tries to erase history by truncating the blockchain and mining new blocks off the shorter chain.

Such scenarios are prevented by the simple rule that all nodes on the network consider the longest branch to be the correct blockchain. This immediately eliminates the possibility of revising history since the truncated chain is shorter. Also, if two miners simultaneously produce different blocks as the next block, other miners adopt whichever blockchain they see first and continue mining. Eventually, one of the branches will be longer than the other, at which point all nodes switch to the longer branch.

Consensus around the longest chain, combined with the proof-of-work requirement, makes it difficult to tamper with transactions already recorded in the blockchain. Suppose Eve wants to double-spend some amount of bitcoin, first to Alice and then to Bob. If she’s already paid Alice, then miners will reject the later payment to Bob, since they can look through old transactions and see that the output from which Eve is paying Bob has already been used. Eve must therefore edit the block in which she paid Alice, erase that transaction, mine a new hash below the threshold, and then mine a new, successful hash for every subsequent block since they’re chained together. Meanwhile, miners are continually mining new blocks, which Eve must catch up with after reconstructing the extant chain. Given how computationally expensive mining is, Eve cannot produce a longer chain unless she controls 51% of the mining nodes, at which point it’s likely more profitable to mine blocks than double-spend.

Footnotes

1. Satoshi actually writes in the whitepaper that the hash must start with a given number of leading zeros, which restricts $t$ to live in $\{2^n:0\lt n\le 256\}$. However, his reference implementation permits the wider range for $t$ described above.^