Chris Natoli

Bitcoin for mathematicians, part 2: Transactions

published 31 December 2017

( previous part | next part )

This series of blog posts is an exposition of Satoshi Nakamoto’s original whitepaper from 2008, supplemented by bitcoin.org’s developer guide. Since there are already plenty of Bitcoin explainers for laypeople, the intended audience of this series is more niche, so that mathematicians could benefit from a brief yet still precise explanation of Bitcoin.

Some preliminary definitions

definition: A one-way function is a function $f:X\to Y$ that is computable in polynomial time but for every probabilistic polynomial-time $F:Y\to X$ (i.e., $F$ is computable in polynomial time with an oracle for random bits), the probability that $F$ is a right-inverse for $f$ is negligible.

Morally speaking, one-way functions are easy to compute but hard to find pre-images of. Note that one-way functions have nothing to do with injectivity. Also note that although several functions are believed to be one-way, there isn’t yet any proof that one-way functions exist, since this would prove $P\ne NP$.

definition: A (cryptographic) hash function is a one-way function $f:X\to Y$ where the outputs are all of a fixed size, i.e., $Y$ is finite. Often, an output of a hash function is also called a hash.

E.g., all outputs of $f:\mathbb{N}\to\{0,\ldots,2^{256}-1\}$ are 256 bits long. sha256 is a hash function developed by the nsa with this domain and codomain.

definition: A trapdoor one-way function is a one-way function $f:X\times K\to Y$ such that for any $k\in K$, there is some corresponding $t_k$ and a probabilistic polynomial-time $F:K\times Y\times\{t_k\}\to Y$ such that for all $x\in X$, we have $F(k,f(x,k),t_k)=x'$ and $f(x',k)=f(x,k)$.

This demands unpacking: Morally speaking, a trapdoor one-way function is easy to right-invert only if one knows the trapdoor $t_k$. E.g., let $X,K$ be the set of all prime numbers, let $Y=\mathbb{N}$, and let $f(x,k)=xk$. If $x$ and $k$ are sufficiently large, then factoring $xk$ cannot be done in polynomial time, unless you know $k$. Here, $t_k=k$ and $F(k,y,t_k)=\frac{y}{k}$, hence $F(k,xk,t_k)=x$.

The classic use of trapdoor one-way functions is in public-key cryptography. Suppose Alice wants to send Bob a secret message. Then Bob generates the keypair $(k,t_k)$ and publishes the public key $k$ and the function $f(\cdot,k)$. Alice then encrypts her message $m$ and sends Bob $f(m,k)$ over a possibly insecure channel. Only Bob can decrypt the message since only he possesses the private key $t_k$ (assuming $t_k$ is not polynomial-time computable from $k$, as it is in the prime number example above). Injectivity of $f(\cdot,k):X\to Y$ is clearly beneficial for public-key encryption systems but (curiously enough) not necessary.

definition: A digital signature scheme is a collection of pairs of functions $(s_k,v_{k'})$ indexed by keypairs $(k,k')$ where the signing function $s_k:X\to Y$ and the verification function $v_{k'}:X\times Y\to\{T,F\}$ satisfy the following:

  1. $s_k$ and $v_{k'}$ are polynomial-time computable,
  2. $\Pr(v_{k'}(x,s_k(x))=T)=1$,
  3. for every probabilistic polynomial-time algorithm with input $k'$ and an oracle to access $s_k$, the probability of outputting some pair $(x,y)$ such that $v_{k'}(x,y)=T$ is negligible.

Morally speaking, signing and verifying signatures are easy but forging someone else’s signature on your own messages is hard without knowing their private key. Suppose Bob wants to ensure a message from Alice is actually from her. Alice generates a keypair $(k,k')$, keeping $k$ private and publishing $k'$. She then sends Bob both the message $x$ and her signature $s_k(x)$. If he confirms that $v_{k'}(x,s_k(x))=T$, then he can be (almost) sure that the message is indeed from Alice. If however $v_{k'}(x,s_k(x))=F$, then it is (almost) surely a forgery. One such scheme called ecdsa uses elliptic curves – a topic for another blog post.

Constructing a simple transaction

Every transaction has at least one input and one output. Suppose Bob has already received $b$ bitcoins from Alice and happens to be sending the exact same amount of bitcoin to Carol. To do so, Bob will construct a transaction linking his money received from Alice (input) to his payment to Carol (output).

First, Bob needs to know where to send his payment, so Carol generates a bitcoin address for this particular transaction by randomly selecting a private key $k\in\{0,\ldots,2^{256}-1\}$ and then computing the address $$a=(e\circ\mathrm{sha}_{256}\circ f)(k),$$ where $f$ is a one-way function related to ecdsa and $e$ just appends some extra information. The address functions somewhat like the public key corresponding to $k$.1 She then sends the address to Bob, who can now construct the output component of his transaction, namely, the pair $(b,a)$.2 Since only Carol has the private key associated with $a$, only she is able to spend that money later on.

The input component of Bob’s transaction is simply the transaction id $t$ of the transaction in which Alice paid him, i.e., the sha256-hash of the entirety of Alice’s transaction. Finally, Bob signs the transaction data – i.e., the tuple of inputs and outputs $(t,(b,a))$3 – using ecdsa with the private key from his transaction with Alice. The signature serves two functions: it guarantees that the payment is not a forgery, and it guarantees that Bob is indeed the owner of the money sent by Alice.

To summarize, if Carol wants to send $b$ bitcoins to David, she requests an address $a'$ from David and then constructs the transaction $\big(\mathrm{sha}_{256}(t,(b,a)),(b,a')\big)$, where the first item in the tuple is the transaction id of Bob’s transaction. Finally, she signs the transaction with her private key $k$, which is associated with the address $a$ of Bob’s payment and hence unlocks the money she received from Bob.

Transactions with multiple inputs and multiple outputs

Suppose Bob received $i_0$ bitcoins from Alice, $i_1$ from Afsana, and $i_2$ from Amarika, and wants to pay $b_0$ to Carol and $b_1$ to Camila, keeping $c$ in change, where $i_0+i_1+i_2\ge b_0+b_1+c$. (The difference $(i_0+i_1+i_2)-(b_0+b_1+c)$ is called the transaction fee, to be explained in the next part of this series.)

Bob will package all of this into a single transaction with three inputs and three outputs. Let $t_0$ be the transaction id of Bob’s transaction with Alice, $t_1$ with Afsana, and $t_2$ with Amarika. (Note that for anonymity, Bob generated a different private key and thus a different address for each transaction with Alice, Afsana, and Amarika. In general, Bob generates a new private key every time he receives money.) Let $a_0$ be the address Carol generates for Bob and $a_1$ be Camila’s. Again for anonymity, Bob generates his own address $a_2$ to which his change will be sent. The transaction is then $$T=\Big(\underbrace{\big(t_0,t_1,t_2\big)}_{\text{inputs}},\underbrace{\big((b_0,a_0),(b_1,a_1),(c,a_2)\big)}_{\text{outputs}}\Big).$$

Note that if, say, Camila wants to spend the $b_1$ bitcoins she received from Bob, the input of her transaction needs to be not only the transaction id $\mathrm{sha}_{256}(T)$ but also the index of her specific output, namely $1$. So her transaction might look something like $$\big(\underbrace{(\mathrm{sha}_{256}(T),1)}_{\text{input}},\underbrace{(b_1,a')}_{\text{output}}\big).$$ In fact, the input component of every transaction must mention both the transaction id and the output index, but I have suppressed the latter in the above exposition for simplicity.

Why must Bob specify his change? The software implementing Bitcoin requires that any time a previous output is referenced as an input in a new transaction, all bitcoins from that output must be spent. I.e., outputs are discrete lump sums that can only be divided by a transaction that takes that lump sum as an input and has multiple outputs. So if Bob wants to spend only two-thirds of the $b$ bitcoins he receives from Alice, he must append an additional output to his transaction that redirects $\frac b3$ bitcoins to himself. If he only specifies a single output of $\frac{2b}{3}$ bitcoins, say as payment to Carol, then the remaining $\frac b3$ bitcoins are considered part of the transaction fee.

Note how the cryptographic structure of a transaction simultaneously provides anonymity and precludes various types of fraud: Since payees always randomly generate a new private key for any transaction, there is (at least in theory) no data linking one transaction to the next or linking transactions to the payee’s identity. Also, since an amount of bitcoin already sent to a given address can only be spent by signing that expenditure with the private key associated with that address, and since forging signatures is computationally intractable, stealing bitcoins is practically impossible – that is, unless one hacks a cryptocurrency exchange or a wallet to steal users’ private keys.

What is a bitcoin? Where do bitcoins live?

First, a bitcoin is actually an abstraction for users’ convenience. Bitcoin software deals only with smaller, atomic units called satoshis, where a single satoshi is equal to 10–8 bitcoins.

Given the foregoing exposition, a descriptivist answer to the question What are bitcoins? is that they are integer amounts of satoshis living in transaction outputs. However, a prescriptivist answer based on Satoshi’s whitepaper is strikingly different.

He writes,

We define an electronic coin as a chain of digital signatures. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin.

However, as far as I can tell, there is no object in Bitcoin software that models a unit of currency as a chain of digital signatures, with signatures continually added to the end of it after every transaction. If this were true, then bitcoins (or at least, individual satoshis) would have an independent existence outside of transactions, and each transaction would refer, perhaps by id numbers of some sort, to the individual satoshis involved. Furthermore, transactions would need an extra step to append a new signature to all these satoshis. But this is not the case. That said, one could inspect the blockchain and extract chains (or maybe directed acyclic graphs) of digital signatures, following a particular lump of satoshis across all transactions involving them.

The answer to the question What is a bitcoin? has implications for another question posed following Bitcoin’s fork4 into two distinct currencies on 1 August 2017: Which is the true heir of Satoshi’s currency, Bitcoin or Bitcoin Cash? The currency that is today called Bitcoin adopted a space-saving revision to the format of transactions, allowing nodes to forget older signatures. Consequently, a chain of signatures tracing the history of a single satoshi cannot be recovered from the blockchain. Bitcoin Cash, however, did not adopt this revision and instead simply increased the maximum size of a transaction. Prescriptivists therefore consider Bitcoin Cash to be the true heir, since it obeys Satoshi’s (outdated but alluringly rigorous) prescription that a bitcoin is defined as a chain of electronic signatures.

Footnotes

  1. Technically, $f(k)$ is considered the public key.^
  2. Technically, the output component is $b$ combined with instructions that only whoever controls the private key $k$ associated with $\mathrm{sha}_{256}(f(k))$ – the public key hash, which Bob obtains by inverting $e$ – can spend money from this transaction.^
  3. Other data is actually thrown into this tuple, but it’s technical.^
  4. This fork is explored further in part 4 of this series.^