• Posted by Konstantin 25.07.2017

    Every student of computer science who has managed to keep even a tiny shred of attention in their algorithms course should know that sorting n numbers is a task that requires \Omega(n \log n) time in general. There are some special cases, such as sorting small integers, where you can use counting sort or radix sort to beat this baseline, but as long as your numbers are hypothetically arbitrarily large, you are stuck with the \Omega(n \log n) lower bound. Right?

    Well, not really. One thing that many algorithms courses tend to skim over rather briefly is the discussion of the choice of the computation model, under which the algorithm of interest is supposed to run. In particular, the \Omega(n \log n) bound for sorting holds for the comparison-only model of computation — the abstract situation where the algorithm may only perform pairwise comparisons of the numbers to be sorted. No arithmetic, bit-shifts or anything else your typical processor is normally trained to do is allowed. This is, obviously, not a very realistic model for a modern computer.

    Let us thus consider a different computation model instead, which allows our computer to perform any of the basic arithmetic or bitwise operations on numbers in constant time. In addition, to be especially abstract, let us also assume that our computer is capable of handling numbers of arbitrary size. This is the so-called unit-cost RAM model.

    It turns out that in this case one can sort arbitrarily large numbers in linear time. The method for achieving this (presented in the work of W. Paul and J. Simon, not to be confused with Paul Simon) is completely impractical, yet quite insightful and amusing (in the geeky sense). Let me illustrate it here.

    Paul-and-Simon Sorting

    The easiest way to show an algorithm is to step it through an example. Let us therefore consider the example task of sorting the following array of three numbers:

    a = [5, 3, 9]

    Representing the same numbers in binary:

    [101, 11, 1001]

    Our algorithm starts with a linear pass to find the bit-width of the largest number in the array. In our case the largest number is 9 and has 4 bits:

    bits = max(x.bit_length() for x in a)      # bits = 4 (bit-width of 9)
    n = len(a)                                 # n = 3

    Next the algorithm will create a (4+1)\cdot 3^2 = 45-bit number A of the following binary form:

     1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9}

    where {9}, {3} and {5} denote the 4-bit representations of the corresponding numbers. In simple terms, we need to pack each array element repeated n times together into a single number. It can be computed in linear time using, for example, the following code:

    temp, A = 0, 0
    for x in a:
        # temp holds one copy of each element with a leading 1-bit, the copies
        # spaced n*(bits+1) bits apart: 1{5} 0...0 1{3} 0...0 1{9}
        temp = (temp<<(n*(bits+1))) + (1<<bits) + x
    for i in range(n):
        # n copies of temp, each shifted by bits+1, interleave into A
        A = (A<<(bits+1)) + temp

    The result is 23834505373497, namely:

    101011010110101100111001110011110011100111001

    Next, we need to compute another 45-bit number B, which will also pack all the elements of the array n times; this time, however, they will be separated by 0-bits and interleaved as follows:

     0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}

    This again can be done in linear time:

    temp, B = 0, 0
    for x in a:
        # temp packs the whole array once, with 0-bit separators: 0{5} 0{3} 0{9}
        temp = (temp<<(bits+1)) + x
    for i in range(n):
        # B is n back-to-back copies of temp
        B = (B<<(n*(bits+1))) + temp

    The result is 5610472248425, namely:

    001010001101001001010001101001001010001101001

    Finally, here comes the magic trick: we subtract B from A. Observe how with this single operation we now actually perform all pairwise subtractions of the numbers in the array:

    A = 1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9} 
    B = 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}

    Consider what happens to the bits separating all the pairs. If the number on top is greater than or equal to the number at the bottom of the pair, the subtraction does not need to borrow from the separating bit to its left, so that bit of the result stays 1. However, whenever the number on top is less than the number at the bottom, the borrow zeroes that bit out:

    A   = 1 {5} 1 {5} 1 { 5} 1 { 3} 1 {3} 1 { 3} 1 {9} 1 {9} 1 {9} 
    B   = 0 {5} 0 {3} 0 { 9} 0 { 5} 0 {3} 0 { 9} 0 {5} 0 {3} 0 {9}
    A-B = 1 {0} 1 {2} 0 {12} 0 {14} 1 {0} 0 {10} 1 {4} 1 {6} 1 {0}

    The same in binary (the groups delimited by vertical bars correspond to repetitions of the original array elements in the number A):

    A   = 1 0101 1 0101 1 0101|1 0011 1 0011 1 0011|1 1001 1 1001 1 1001
    B   = 0 0101 0 0011 0 1001|0 0101 0 0011 0 1001|0 0101 0 0011 0 1001
    A-B = 1 0000 1 0010 0 1100|0 1110 1 0000 0 1010|1 0100 1 0110 1 0000
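
    To convince yourself that only the separator bits matter, here is a tiny sanity check on individual pairs (an illustrative snippet of mine, not part of the original construction): the separator bit of ((1 << bits) | x) - y survives exactly when x >= y, because the subtraction borrows from it only when y > x.

    # Illustrative check (4-bit values, as in the example above): the separator
    # bit of ((1 << bits) | x) - y stays 1 exactly when x >= y.
    bits = 4
    for x, y in [(5, 5), (5, 3), (5, 9), (3, 9), (9, 3)]:
        d = ((1 << bits) | x) - y
        print(x, ">=", y, ":", (d >> bits) & 1)    # prints 1, 1, 0, 0, 1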
    

    Each "separator" bit of A-B is effectively the result of a comparison of every array element with every other. Let us now extract these bits using a bitwise AND and sum them within each group. It takes another couple of linear passes:

    x = (A-B) >> bits                      # align separator bits to positions 0, 5, 10, ...
    mask, result = 0, 0
    for i in range(n):
        mask = (mask<<(n*(bits+1))) + 1    # one 1-bit at the bottom of each n*(bits+1)-bit group
    for i in range(n):
        result += x & mask                 # add one separator bit to each group's count
        x = x >> (bits+1)

    The result is now the following number:

    result = 10|000000000000001|000000000000011

    It is a packed binary representation of the array r = [2, 1, 3]. The number 2 here tells us that there are two elements in a which are less than or equal to a[0]=5. Similarly, the number 1 says that there is only one element less than or equal to a[1]=3, and the number 3 means there are three elements less than or equal to a[2]=9. In other words, this is an array of ranks, which tells us how the original array elements should be rearranged into sorted order:

    r = [result >> (n*(bits+1)*(n-i-1)) & ((1<<(n*(bits+1)))-1) 
                                              for i in range(n)]
    a_sorted = [None]*n
    for i in range(n):
        a_sorted[r[i]-1] = a[i]
    

    And voilà, the sorted array! As presented above, the method only works for arrays of distinct non-negative integers. However, with some modifications it can be adapted to arbitrary arrays of integers or floats. This is left as an exercise for the reader.
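
    For convenience, the steps above can be assembled into a single function. The sketch below is my own consolidation of the code fragments in this post (the name paul_simon_sort is made up for illustration); it assumes distinct non-negative integers, as discussed:

    def paul_simon_sort(a):
        # A sketch assembling the steps above; assumes distinct non-negative integers.
        n = len(a)
        bits = max(x.bit_length() for x in a)
        # Pack A: each element repeated n times, each copy preceded by a 1-bit.
        temp = A = 0
        for x in a:
            temp = (temp << (n * (bits + 1))) + (1 << bits) + x
        for i in range(n):
            A = (A << (bits + 1)) + temp
        # Pack B: the whole array repeated n times, elements preceded by 0-bits.
        temp = B = 0
        for x in a:
            temp = (temp << (bits + 1)) + x
        for i in range(n):
            B = (B << (n * (bits + 1))) + temp
        # Subtract and sum the separator bits within each group to get the ranks.
        diff = (A - B) >> bits
        mask = result = 0
        for i in range(n):
            mask = (mask << (n * (bits + 1))) + 1
        for i in range(n):
            result += diff & mask
            diff >>= bits + 1
        r = [(result >> (n * (bits + 1) * (n - i - 1))) & ((1 << (n * (bits + 1))) - 1)
             for i in range(n)]
        a_sorted = [None] * n
        for i in range(n):
            a_sorted[r[i] - 1] = a[i]
        return a_sorted

    print(paul_simon_sort([5, 3, 9]))   # [3, 5, 9]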

    The General Implications

    There are several things one can learn from the "Paul-and-Simon sort". Firstly, it shows the immense power of the unit-cost RAM computational model. By packing arbitrary amounts of data into a single register of unlimited size, we may force our imaginary computer to perform enormously complex parallel computations in a single step. Indeed, it is known that PSPACE-complete problems can be solved in polynomial time in the unlimited-precision RAM model. This, however, assumes that the machine can do arbitrary arithmetic operations. If you limit it to only additions, subtractions and multiplications (but not divisions or bit-shifts), you still cannot sort integers faster than \Omega(n \log n) even using infinitely-sized registers (this is the main result of Paul and Simon's article that inspired this post). Not obvious, is it?

    Secondly, of course, real computers can usually only perform constant-time operations on registers of a fixed size. This is formalized in the w-bit word-RAM model, and in this model the "Paul and Simon sort" degrades from an O(n) into an O(n^3) algorithm (with O(n^2) memory consumption): the packed numbers A and B occupy O(n^2) machine words, so each of the O(n) big-number operations above now costs O(n^2) word operations. This is a nice illustration of how the same algorithm can have different complexity based on the chosen execution model.

    The third thing that the "Paul and Simon sort" highlights very clearly is the power of arithmetic operations on packed values and bitstrings. In fact, this idea has been applied to derive practically usable integer sorting algorithms with nearly-linear complexity. The paper by Han & Thorup expresses the idea quite well:

    [Image: excerpt from Han & Thorup, "Integer Sorting in O(n sqrt(log log n)) Expected Time and Linear Space".]

    In case you need the full code of the step-by-step explanation presented above, here it is.


  • Posted by Konstantin 09.07.2017

    The Dark Side of the Bitcoin

    Recall that Bitcoin is a currency, i.e. a technology which aims to provide a store of value along with a payment medium. With all due respect to its steadily growing adoption, it would be fair to note that it is currently not very good at fulfilling either of these two functions. Firstly, it is not a very reliable store of value due to the extreme volatility of its price. Secondly, and most importantly, it is a mediocre payment medium because it is slow and expensive.

    A typical transfer costs around $2 nowadays and takes about an hour for a full confirmation (or longer, if you pay a smaller fee). When you need to transfer a million dollars, this looks like a reasonable deal. When you buy a chocolate bar at a grocery store (something one probably does more often than transferring a million), it is unacceptable. Any plain old bank's payment card would offer a faster and cheaper solution, which is ironic, given that Bitcoin was meant to be all friendly, distributed and free (as in freedom) while banks are, as we all know, evil empires hungry for our money, flesh and souls.

    The irony does not end here. The evil banks typically provide some useful services in exchange for the fees they collect, such as an online self-service portal, 24h support personnel, cash handling and ATMs, some security guarantees, interest on deposits, etc. The friendly Bitcoin offers nothing of this kind. What is Bitcoin wasting our money on, then? Electricity, mainly! The Proof of Work (PoW) algorithm employed in Bitcoin's blockchain requires the computation of quintillions of random, meaningless hashes to "confirm" payments. The "miner" nodes running the Bitcoin network are collectively performing more than 5 000 000 000 000 000 000 (five quintillion, or five exa-) hash computations every second, continuously consuming as much electricity as the whole country of Turkmenistan. The situation is even worse if you consider that Bitcoin is just one of many "coins" built upon the PoW algorithm (Ethereum and Litecoin being two other prominent examples), and their overall power consumption is only growing by the day.

    Just think of it: most of the $2 fee a Bitcoin user needs to pay for a transaction will neither end up as someone's wage nor make a return on investment in someone's pocket. Instead, it will burn up in fossil fuels which generate power for the "miners", wasting precious resources of our planet, contributing to global warming and pushing poor polar bears faster towards extinction. Is all this mayhem at least a "necessary evil"? Sadly, it is not.

    The Unnecessary Evil

    Formally speaking, Proof of Work is an algorithm for achieving consensus among a distributed set of nodes which collectively maintain a common blockchain. Is it the only such algorithm? Of course not! Many alternative methods exist, and most of them (if not all) are both faster and less energy-hungry. In fact, the only valuable property of PoW is its ingenious simplicity. In terms of implementation it may very well be among the simplest distributed blockchain consensus algorithms ever invented.

    It is natural that a successful pioneering technology (such as Bitcoin) is originally built from simple blocks. Progress comes in small steps and you cannot innovate on all fronts at once, after all. There must come a time, however, when the limitations of the initially chosen basic blocks become apparent and the technology gets upgraded to something more efficient. With more than $1 billion in electricity bills paid by Bitcoin users last year for the inefficiency of PoW, Bitcoin has long passed this turning point, in my opinion.

    Unfortunately, due to its pioneering status, enormous inertia, ongoing hype and the high stakes involved, Bitcoin continues to roll on its old wooden proof-of-work wheels with no improvement in sight, somehow still being perceived as the leader in the brave new world of cryptocurrencies.

    Are nearly-instant and nearly-free payment along with energy efficiency too much to ask from a real "currency of the future"? I do not think so. In fact, Bitcoin could be such a currency, if only it could switch from the evil Proof of Work to a different, fast and eco-friendly consensus algorithm.

    Which algorithm could it be? Let me offer you an overview of some of the current options I am personally aware of, so you could decide for yourself.

    The Eco-Friendly Blockchain Consensus

    Consider a network of many nodes, which needs to maintain a common state for a chain of blocks. There seem to be roughly three general categories of algorithms which the nodes could employ for their purpose: Proof of Authority (PoA), Nakamoto Consensus, and Byzantine Fault Tolerance (BFT). Let us consider them in order.

    Proof of Authority

    Perhaps the most straightforward solution would be to nominate a fixed subset of nodes as "authoritative", and let any of them append new blocks by signing them cryptographically. To avoid conflicting updates, nodes may agree on a predefined round-robin signing order, honestly randomize their waiting intervals, or use some kind of deterministic lottery for selecting the signer of the next block, etc.
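
    As a rough illustration of the round-robin variant, the expected signer of each block can be derived directly from the block height. The sketch below is my own simplification and is not taken from any particular PoA implementation:

    # Minimal sketch: deterministic round-robin choice of the block signer.
    AUTHORITIES = ["alice", "bob", "carol"]      # a fixed, pre-agreed set of signers

    def expected_signer(block_height):
        return AUTHORITIES[block_height % len(AUTHORITIES)]

    def is_valid(block_height, signer):
        # Accept a block only if it was signed by the scheduled authority.
        return signer == expected_signer(block_height)

    print(expected_signer(7), is_valid(7, "bob"))    # bob True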

    As this approach relies on a fixed subset of (reasonably) trusted nodes, it does not look robust and secure enough for a proper worldwide distributed blockchain. For example, in the limit case of a single trusted party it is equivalent to using a single service provider such as a bank. Nonetheless, it is a convenient baseline and an important primitive, actually applicable to a wide range of real-life blockchain deployments. By relying on a set of well-behaving parties, a PoA blockchain actually sidesteps most of the complexities of a real distributed algorithm, and can thus be made to perform much faster than any of the "truly distributed" algorithms.

    The Ethereum software provides an implementation of this approach for those who want to run private chains. PeerCoin relies on the PoA principle by having "checkpoint blocks" signed regularly by a trusted authority. Finally, the Delegated Proof of Stake algorithm makes PoA work on a larger scale by relying on voting. It is probably one of the most interesting practical implementations of the idea.

    Delegated Proof of Stake

    Delegated Proof of Stake (DPoS) is a consensus algorithm implemented in Graphene-based blockchains (BitShares, Steem, EOS). It is a variant of Proof of Authority where the small set of authoritative delegate nodes is elected by voting. When electing the delegates, each node can cast a number of votes proportional to its account value (or "stakeholder share"), thus "delegating its stake in the network". The elected authorities then participate in a simple and fast round-robin block confirmation, with each node given a two-second window for confirming the next block.
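
    For illustration, the stake-weighted election of delegates could be sketched roughly as follows (my own simplification; the names and balances are made up and the real protocols are considerably more elaborate):

    from collections import Counter

    # Sketch of a stake-weighted delegate election: each vote is weighted by the
    # voter's balance, and the top-k candidates become the signing authorities.
    def elect_delegates(votes, balances, k):
        # votes: {voter: candidate}, balances: {voter: stake}
        tally = Counter()
        for voter, candidate in votes.items():
            tally[candidate] += balances[voter]
        return [c for c, _ in tally.most_common(k)]

    votes = {"v1": "alice", "v2": "bob", "v3": "alice", "v4": "carol"}
    balances = {"v1": 10, "v2": 50, "v3": 5, "v4": 20}
    print(elect_delegates(votes, balances, 2))   # ['bob', 'carol']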

    The security of DPoS hinges on the assumption that the nodes with the most stake in the system should generally manage to elect a set of reasonable authorities, and that in case of errors, the misbehaving authorities will not cause too much trouble and will be quickly voted out. At the same time, being internally a PoA implementation, DPoS-based blockchains are an order of magnitude faster in terms of transaction throughput than any other currently running public blockchains. Notably, they can also naturally support fee-less transactions.

    Nakamoto Consensus

    Consider a variation of PoA where there are no pre-selected trusted nodes (i.e. all nodes may participate in the algorithm). Each time a new block needs to be added to the chain, let us pick the node who will gain the right to add it according to some deterministic "lottery" system. Consensus can then be achieved by simply verifying that the resulting blockchain conforms to the lottery rules at all times, and conflicting chains are resolved by always preferring the "harder" chain (according to some notion of "hardness").

    The infamous Proof of Work is one example of such a method: the "lottery" here is based on the ability of a node to find a suitable nonce value, and the "hardness" is simply the length of the chain. Such "lottery" methods are sometimes referred to as "Nakamoto consensus algorithms". In terms of efficiency, Nakamoto consensus algorithms are among the slowest consensus algorithms.
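
    In code, the PoW "lottery ticket" check boils down to something like the following sketch (a toy simplification; Bitcoin's actual header format, double hashing and difficulty encoding are omitted):

    import hashlib

    # Simplified PoW lottery: a node "wins" when it finds a nonce such that the
    # hash of the block data falls below the difficulty target.
    def wins_lottery(block_data, nonce, target):
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") < target

    target = 1 << 240                      # toy difficulty: ~16 leading zero bits
    nonce = 0
    while not wins_lottery(b"block payload", nonce, target):
        nonce += 1
    print("winning nonce:", nonce)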

    Several alternatives to the "PoW lottery" have been proposed. Let us review some of them.

    Proof of Stake

    Proof of Stake (PoS), first implemented in the Nxt cryptocurrency, is a Nakamoto consensus technique where nodes with a greater balance on their account are given a higher chance to "win the lottery" and sign the next block. The actual technique used in Nxt is the following: before signing a block, every node obtains a pseudo-random "lottery ticket number" x by hashing the last block's data with its own identifier. If this number is smaller than

        \[\alpha \cdot \text{(account balance)}\cdot \text{(time since last block)},\]

    (where \alpha is a block-specific constant), the node gets the right to sign the next block. The higher the node's balance, the higher the probability that it will get a chance to sign. The rationale is that nodes with larger balances have more at stake, are more motivated to behave honestly, and thus need to be given more opportunities to participate in generating the blockchain.
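
    A rough sketch of this check might look as follows (my own simplification of the scheme described above; the hashing details and the value of \alpha are placeholders rather than Nxt's actual parameters):

    import hashlib

    # Simplified Nxt-style Proof-of-Stake check: the node may sign when its
    # pseudo-random "ticket" falls below a threshold proportional to its balance
    # and to the time elapsed since the last block.
    def may_sign(last_block_hash, node_id, balance, seconds_since_last, alpha):
        ticket = int.from_bytes(
            hashlib.sha256(last_block_hash + node_id).digest()[:8], "big")
        return ticket < alpha * balance * seconds_since_last

    print(may_sign(b"\x00" * 32, b"node-42", balance=1000,
                   seconds_since_last=30, alpha=10**12))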

    Proof of Stake is typically considered the primary alternative to Proof of Work without all the wasteful computation, and it should, in principle, be possible to transition a whole blockchain from the latter to the former. In fact, this is what may well happen to Ethereum eventually.

    Proof of Space

    In Proof of Space (PoSpace), a consensus mechanism implemented in Burstcoin, the "miners" must first pre-generate a set of "lottery ticket numbers" in a particular manner, save these numbers on a hard drive and commit the hash (the Merkle tree root) of this complete ticket set to the blockchain. Then, similarly to Proof of Stake, by hashing the last block's data, a miner deterministically picks one of his own "lottery tickets" for the next block. If the value of this ticket, discounted by the number of tickets in possession, is small enough, the miner gets the right to sign the block. The more tickets a miner generates and stores, the better his chances. When signing the block, the miner must present a couple of special hashes which he can only know if he constantly stores his complete set of tickets (or fully recomputes a large part of it every time, which is impractical). Consequently, instead of spending energy on the "mining" process, the nodes must constantly dedicate a certain amount of disk space to the algorithm.
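
    A very rough sketch of the ticket-picking part is given below (my own simplification; real Burstcoin "plotting" and the Merkle-tree commitments are considerably more involved):

    import hashlib

    # Rough sketch of the PoSpace lottery: the miner pre-generates and stores a
    # set of tickets; the last block's hash deterministically picks one of them,
    # and its value, discounted by the number of stored tickets, must be small.
    def generate_tickets(miner_seed, count):
        return [hashlib.sha256(miner_seed + i.to_bytes(4, "big")).digest()
                for i in range(count)]

    def may_sign(tickets, last_block_hash, target):
        pick = int.from_bytes(hashlib.sha256(last_block_hash).digest(), "big")
        value = int.from_bytes(tickets[pick % len(tickets)], "big") // len(tickets)
        return value < target

    tickets = generate_tickets(b"miner-1", 10_000)   # kept on disk in reality
    print(may_sign(tickets, b"hash of the previous block", target=1 << 240))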

    Although it is probably among the less widely known methods, from both a technical and a practical standpoint it is one of the most interesting techniques, in my opinion. Note how it combines the properties of PoS (speed and energy efficiency) with those of PoW (ownership of a real-world resource as a proxy for decentralization).

    Proof of Burn

    The idea behind Proof of Burn is to allow the nodes to generate their "lottery ticket numbers" by irretrievably transferring some coins to a nonexistent address and taking the hash of the resulting transaction. The resulting hash, scaled by the amount of coins burned, can then be used to gain the right to sign blocks just like in other Nakamoto lottery systems. The act of wasting coins is meant to be a virtual analogue of spending electricity on PoW mining, without actually spending it. Blockchains based purely on Proof of Burn do not seem to exist at the moment. However, the technique can be used alongside PoW, PoS or other approaches.
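
    For illustration, such a burn-based lottery could be sketched as follows (my own simplification; the transaction encoding and the scaling rule are made-up placeholders):

    import hashlib

    # Sketch of a Proof-of-Burn lottery: the hash of the burn transaction, scaled
    # down by the amount of coins destroyed, serves as the lottery ticket.
    def may_sign(burn_tx_bytes, coins_burned, target):
        ticket = int.from_bytes(hashlib.sha256(burn_tx_bytes).digest(), "big")
        return ticket // coins_burned < target    # burning more => smaller ticket

    print(may_sign(b"txid: 100 coins sent to an unspendable address",
                   coins_burned=100, target=1 << 250))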

    Proof of Elapsed Time

    Presumably, some Intel processors have specialized instructions for emitting signed tokens, which prove that a given process called a particular function a certain period of time ago. The Hyperledger project proposes to build a consensus algorithm around those. Each "miner" will gain the right to sign a block after it waits for a certain period of time. The token which proves that the miner did in fact wait the allotted time would act as a winning lottery ticket. I do not see how this method could work outside of the trusted Intel-only environment or how it is better than a trivialized Proof of Stake (not sure I even understood the idea correctly), but I could not help mentioning it here for completeness' sake.

    Hybrid Nakamoto Consensus Systems

    Some systems interleave PoW and PoS confirmations, or add PoA signatures from time to time to lock the chain or speed up block confirmations. In fact, it is not too hard to invent nearly arbitrary combinations of delegation, voting, payments, authorities and lotteries.

    Byzantine Fault Tolerance

    The Practical Byzantine Fault Tolerance (PBFT) algorithm offers an alternative solution to the consensus problem. Here the blockchain state is tracked by a set of "bookkeeping" nodes, which constantly broadcast all changes among themselves and consider a change reliably replicated when it is signed and confirmed by a given quorum (e.g. 2/3) of the bookkeepers. Algorithms of this type can be shown to be reliable if no more than a third of the nodes are dishonest. Ripple, Stellar and Antshares are examples of blockchains based on such techniques. This approach allows much higher transaction throughput than Nakamoto consensus (PoW, PoS, PoSpace), yet it still lags behind the speed of PoA or DPoS.
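
    For illustration, the quorum rule at the heart of such algorithms can be sketched as follows (a simplification of my own; real PBFT involves several rounds of message exchange):

    # Sketch of the quorum rule: a change counts as committed once at least 2/3
    # of the bookkeeping nodes have signed it.
    def is_committed(signers, bookkeepers):
        valid = set(signers) & set(bookkeepers)
        return 3 * len(valid) >= 2 * len(bookkeepers)

    bookkeepers = {"n1", "n2", "n3", "n4"}
    print(is_committed({"n1", "n2", "n4"}, bookkeepers))   # True: 3 of 4 >= 2/3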
