Four Years Remaining

The Blockchain Consensus Problem

Posted by Konstantin 09.07.2017 No Comments

The Dark Side of the Bitcoin

Recall that Bitcoin is a currency, i.e. it is a technology, which aims to provide a store of value along with a payment medium. With all due respect to its steadily growing adoption, it would be fair to note that it is not very good at fulfilling either of these two functions currently. Firstly, it is not a very reliable store of value due to extreme volatility in the price. Secondly, and most importantly, it is a mediocre payment medium because it is slow and expensive.

A typical transfer costs around $2 nowadays and takes about an hour for a full confirmation (or longer, if you pay a smaller fee). When you need to transfer a million dollars, this looks like a reasonable deal. When you buy a chocolate bar at a grocery store (something one probably does more often than transferring a million), it is unacceptable. Any plain old bank's payment card would offer a faster and cheaper solution, which is ironic, given that Bitcoin was meant to be all friendly, distributed and free (as in freedom) while banks are, as we all know, evil empires hungry for our money, flesh and souls.

The irony does not end here. The evil banks typically provide some useful services in exchange for the fees they collect, such as an online self-service portal, 24h support personnel, cash handling and ATMs, some security guarantees, interests on deposits, etc. The friendly Bitcoin offers nothing of this kind. What is Bitcoin wasting our money on then? Electricity, mainly! The Proof of Work (PoW) algorithm employed in the Bitcoin's blockchain requires the computation of quintillions of random, meaningless hashes to "confirm" payments. The "miner" nodes, running the Bitcoin's network are collectively performing more than 5 000 000 000 000 000 000 (five quintillion or five exa-) hash computations every second, continuously consuming as much electricity as the whole country of Turkmenistan. The situation is even worse if you consider that Bitcoin is just one of many other "coins" built upon the PoW algorithm (Ethereum and Litecoin being the two other prominent examples), and their overall power consumption is only growing with each day.

Just think of it: most of the $2 fee a Bitcoin user needs to pay for a transaction will neither end up as someone's wage nor make a return on investment in someone's pocket. Instead, it will burn up in fossil fuels which generate power for the "miners", wasting precious resources of our planet, contributing to global warming and pushing poor polar bears faster towards extinction. Is all this mayhem at least a "necessary evil"? Sadly, it is not.

The Unnecessary Evil

Formally speaking, Proof of Work is an algorithm for achieving consensus among a distributed set of nodes which collectively maintain a common blockchain. Is it the only such algorithm? Of course not! Many alternative methods exist, most of them (if not all) are both faster and less energy-hungry. In fact, the only valuable property of PoW is its ingenious simplicity. In terms of implementation it may very well be among the simplest distributed blockchain consensus algorithms ever to be invented.

It is natural that a successful pioneering technology (such as the Bitcoin) is originally built from simple blocks. Progress comes in small steps and you cannot innovate on all fronts at once, after all. There must come a time, however, when the limitations of the initially chosen basic blocks become apparent and the technology gets upgraded to something more efficient. With more than $1 billion dollars in electricity bills paid by Bitcoin users last year for the inefficiency of PoW, Bitcoin has long surpassed this turning point, in my opinion.

Unfortunately, due to its pioneering status, enormous inertia, ongoing hype and the high stakes involved, Bitcoin continues to roll on its old wooden proof-of-work wheels with no improvement in sight, somewhy still being perceived as the leader in the brave new world of cryptocurrencies.

Are nearly-instant and nearly-free payment along with energy efficiency too much to ask from a real "currency of the future"? I do not think so. In fact, Bitcoin could be such a currency, if only it could switch from the evil Proof of Work to a different, fast and eco-friendly consensus algorithm.

Which algorithm could it be? Let me offer you an overview of some of the current options I am personally aware of, so you could decide for yourself.

The Eco-Friendly Blockchain Consensus

Consider a network of many nodes, which needs to maintain a common state for a chain of blocks. There seem to be roughly three general categories of algorithms which the nodes could employ for their purpose: Proof of Authority (PoA), Nakamoto Consensus, and Byzantine Fault Tolerance (BFT). Let us consider them in order.

Proof of Authority

Perhaps the most straightforward solution would be to nominate a fixed subset of nodes as "authoritative", and let any of them append new blocks by signing them cryptographically. To avoid conflicting updates, nodes may agree on a predefined round-robin signing order, honestly randomize their waiting intervals, or use some kind of a deterministic lottery for selecting the signer for next block, etc.

As this approach relies on a fixed subset of (reasonably) trusted nodes, it does not look robust and secure enough for a proper worldwide distributed blockchain. For example, in the limit case of a single trusted party it is equivalent to using a single service provider such as a bank. None the less, it is a convenient baseline and an important primitive, actually applicable to a wide range of real-life blockchain deployments. By relying on a set of well-behaving parties, a PoA blockchain actually sidesteps most of the complexities of a real distributed algorithm, and can thus be made to perform much faster than any of the "truly distributed" algorithms.

The Ethereum software provides an implementation of this approach for those who want to run private chains. PeerCoin relies on the PoA principle by having "checkpoint blocks" signed regularly by a trusted authority. Finally, the Delegated Proof of Stake algorithm makes PoA work on a larger scale by relying on voting. It is probably one of the most interesting practical implementations of the idea.

Delegated Proof of Stake

Delegated Proof of Stake (DPoS) is a consensus algorithm implemented in Graphene-based blockchains (BitShares, Steem, EOS). It is a variant of Proof of Authority, where the small set of authoritative delegate nodes is elected by voting. When electing the delegates, each node can cast the number of votes, proportional to their account value (or "stakeholder share"), thus "delegating their stake in the network". The elected authorities then participate in a simple and fast round-robin block confirmation with each node given a two second window for confirming the next block.

The security of DPoS hinges on the assumption that the nodes with the most stake in the system should generally manage to elect a set of reasonable authorities, and in case of errors, the misbehaving authorities will not cause too much trouble and will be quickly voted out. At the same time, being internally a PoA implementation, the DPoS-based blockchains are by an order of magnitude faster in terms of transaction throughput than any other currently running public blockchains. Notably, they can also naturally support fee-less transactions.

Nakamoto Consensus

Consider the variation of PoA, where there are no pre-selected trusted nodes (i.e. all nodes may participate in the algorithm). Each time a new block needs to be added to the chain, let us pick the node who will gain the right to add it according to some deterministic "lottery" system. The consensus can then be achieved by simply verifying that the resulting blockchain is conforming to the lottery rules at all times, and the conflicting chains are resolved by always preferring the "harder" chain (according to some notion of "hardness").

For example, the infamous Proof-of-Work is an example of such a method. The "lottery" here is based on the ability of a node to find a suitable nonce value. The "hardness" is simply the length of the chain. Such "lottery" methods are sometimes referred to as "Nakamoto consensus algorithms". In terms of efficiency, Nakamoto consensus algorithms are among the slowest consensus algorithms.

Several alternatives to the "PoW lottery" have been proposed. Let us review some of them.

Proof of Stake

Proof of Stake (PoS), first implemented in the Nxt cryptocurrency, is a Nakamoto consensus technique, where the nodes with a greater balance on their account are given a higher chance to "win the lottery" and sign the next block. The actual technique used in Nxt is the following: before signing a block every node obtains a pseudo-random "lottery ticket number" $x$ by hashing the last block data with its own identifier. If this number is smaller than

$\alpha \cdot \text{(account balance)}\cdot \text{(time since last block)},$

(where $\alpha$ is a block-specific constant), the node gets the right to sign the next block. The higher the node's balance, the higher is the probability it will get a chance to sign. The rationale is that nodes with larger balances have more at stake, are more motivated to behave honestly, and thus need to be given more opportunities to participate in generating the blockchain.

Proof of Stake is typically considered as the primary alternative to Proof of Work without all the wasteful computation, and it should, in principle, be possible to transition the whole blockchain from the latter to the former. In fact, this is what may probably happen to Ethereum eventually.

Proof of Space

In Proof of Space (PoSpace), a consensus mechanism implemented in Burstcoin, the "miners" must first pre-generate a set of "lottery ticket numbers" in a particular manner for themselves, save these numbers on a hard drive and commit the hash (the Merkle tree root) of this complete ticket set to the blockchain. Then, similarly to Proof of Stake, by hashing the last block's data, a miner deterministically picks one of his own "lottery tickets" for the next block. If the value of this ticket, discounted by the number of tickets in possession, is small enough, the miner gets the right to sign the block. The more tickets a miner generates and stores, the better are his chances. When signing the block, the miner must present a couple of special hashes which he can only know if he constantly stores his complete set of tickets (or fully recomputes a large part of it every time, which is impractical). Consequently, instead of spending energy on the "mining" process, the nodes must constantly dedicate a certain amount of disk space to the algorithm.

Although it is probably among the less widely known methods, from both technical and practical standpoint, it is one of the most interesting techniques, in my opinion. Note how it combines the properties of PoS (speed and energy efficiency) with those of PoW (ownership of a real-world resource as a proxy for decentralization).

Proof of Burn

The idea behind Proof of Burn is to allow the nodes to generate their "lottery ticket numbers" by irretrievably transferring some coins to a nonexistent address and taking the hash of the resulting transaction. The resulting hash, scaled by the amount of coins burned, can then be used to gain the right to sign blocks just like in other Nakamoto lottery systems. The act of wasting coins is meant to be a virtual analogue of spending electricity on PoW mining, without actually spending it. Blockchains based purely on Proof of Burn do not seem to exist at the moment. However, the technique can be used alongside PoW, PoS or other approaches.

Proof of Elapsed Time

Presumably, some Intel processors have specialized instructions for emitting signed tokens, which prove that a given process called a particular function a certain period of time ago. The Hyperledger project proposes to build a consensus algorithm around those. Each "miner" will gain the right to sign a block after it waits for a certain period of time. The token which proves that the miner did in fact wait the allotted time, would act as a winning lottery ticket. I do not see how this method could work outside of the trusted Intel-only environment or how is it better than a trivialized Proof of Stake (not sure I even understood the idea correcty), but I could not help mentioning it here for completeness' sake.

Hybrid Nakamoto Consensus Systems

Some systems interleave PoW and PoS confirmations, or add PoA signatures from time to time to lock the chain or speed-up block confirmations. In fact, it is not too hard to invent nearly arbitrary combinations of delegation, voting, payments, authorities and lotteries.

Byzantine Fault Tolerance

The Practical Byzantine Fault Tolerance (PBFT) algorithm offers an alternative solution to the consensus problem. Here the blockchain state is tracked by a set of "bookkeeping" nodes, which constantly broadcast all changes among themselves and consider a change reliably replicated when it is signed and confirmed by given quorum (e.g. 2/3) of the bookkeepers. The algorithms of this type can be shown to be reliable if no more than a third of the nodes are dishonest. The Ripple, Stellar and Antshares are examples of blockchains based on such techniques. This algorithm allows much higher transaction throughputs than Nakamoto consensus (PoW, PoS, PoSpace), yet it still lags behind the speed of PoA or DPoS.

Tags: Bitcoin, Blockchain, Computer science, Cryptography, Distributed computing, Explanation
Blockchain in Simple Terms

Posted by Konstantin 29.03.2017 1 Comment
The following is an expanded version of an explanatory comment I posted here.

Alice's Diary

Alice decided to keep a diary. For that she bought a notebook, and started filling it with lines like:
1. Bought 5 apples.
2. Called mom.
  ....
3. Gave Bob $250.
4. Kissed Carl.
5. Ate a banana.
  ...
Alice did her best to keep a meticulous account of events, and whenever she had a discussion with friends about something that happened earlier, she would quickly resolve all arguments by taking out the notebook and demonstrating her records. One day she had a dispute with Bob about whether she lent him $250 earlier or not. Unfortunately, Alice did not have her notebook at hand at the time of the dispute, but she promised to bring it tomorrow to prove Bob owed her money.

Bob really did not want to return the money, so that night he got into Alice's house, found the notebook, found line 132 and carefully replaced it with "132. Kissed Dave". The next day, when Alice opened the notebook, she did not find any records about money being given to Bob, and had to apologize for making a mistake.

Alice's Blockchain

A year later Bob's conscience got to him and he confessed his crime to Alice. Alice forgave him, but decided to improve the way she kept the diary, to avoid the risk of forging records in the future. Here's what she came up with. The operating system Linups that she was using had a program named md5sum, which could convert any text to its hash - a strange sequence of 32 characters. Alice did not really understand what the program did with the text, it just seemed to produce a sufficiently random sequence. For example, if you entered "hello" into the program, it would output "b1946ac92492d2347c6235b4d2611184", and if you entered "hello " with a space at the end, the output would be "1a77a8341bddc4b45418f9c30e7102b4".

Alice scratched her head a bit and invented the following way of making record forging more complicated to people like Bob in the future: after each record she would insert the hash, obtained by feeding the md5sum program with the text of the record and the previous hash. The new diary now looked as follows:
1. 0000 (the initial hash, let us limit ourselves with just four digits for brevity)
2. Bought 5 apples.
3. 4178 (the hash of "0000" and "Bought 5 apples")
4. Called mom.
5. 2314 (the hash of "4178" and "Called mom")
  ...
  4492
6. Gave Bob $250.
  1010 (the hash of "4492" and "Gave Bob $250")
7. Kissed Carl.
  8204 (the hash of "1010" and "Kissed Carl")
  ...
Now each record was "confirmed" by a hash. If someone wanted to change the line 132 to something else, they would have to change the corresponding hash (it would not be 1010 anymore). This, in turn, would affect the hash of line 133 (which would not be 8204 anymore), and so on all the way until the end of the diary. In order to change one record Bob would have to rewrite confirmation hashes for all the following diary records, which is fairly time-consuming. This way, hashes "chain" all records together, and what was before a simple journal became now a chain of records or "blocks" - a blockchain.

Proof-of-Work Blockchain

Time passed, Alice opened a bank. She still kept her diary, which now included serious banking records like "Gave out a loan" or "Accepted a deposit". Every record was accompanied with a hash to make forging harder. Everything was fine, until one day a guy named Carl took a loan of $1000000. The next night a team of twelve elite Chinese diary hackers (hired by Carl, of course) got into Alice's room, found the journal and substituted in it the line "143313. Gave out a $1000000 loan to Carl" with a new version: "143313. Gave out a $10 loan to Carl". They then quickly recomputed all the necessary hashes for the following records. For a dozen of hackers armed with calculators this did not take too long.

Fortunately, Alice saw one of the hackers retreating and understood what happened. She needed a more secure system. Her new idea was the following: let us append a number (called "nonce") in brackets to each record, and choose this number so that the confirmation hash for the record would always start with two zeroes. Because hashes are rather unpredictable, the only way to do it is to simply try out different nonce values until one of them results in a proper hash:
1. 0000
2. Bought 5 apples (22).
3. 0042 (the hash of "0000" and "Bought 5 apples (22)")
4. Called mom (14).
5. 0089 (the hash of "0042" and "Called mom (14)")
  ...
  0057
6. Gave Bob $250 (33).
  0001
7. Kissed Carl (67).
  0093 (the hash of "0001" and "Kissed Carl (67)")
  ...
To confirm each record one now needs to try, on average, about 50 different hashing operations for different nonce values, which makes it 50 times harder to add new records or forge them than previously. Hopefully even a team of hackers wouldn't manage in time. Because each confirmation now requires hard (and somewhat senseless) work, the resulting method is called a proof-of-work system.

Distributed Blockchain

Tired of having to search for matching nonces for every record, Alice hired five assistants to help her maintain the journal. Whenever a new record needed to be confirmed, the assistants would start to seek for a suitable nonce in parallel, until one of them completed the job. To motivate the assistants to work faster she allowed them to append the name of the person who found a valid nonce, and promised to give promotions to those who confirmed more records within a year. The journal now looked as follows:
1. 0000
2. Bought 5 apples (29, nonce found by Mary).
3. 0013 (the hash of "0000" and "Bought 5 apples (29, nonce found by Mary)")
4. Called mom (45, nonce found by Jack).
5. 0089 (the hash of "0013" and "Called mom (45, nonce found by Jack)")
  ...
  0068
6. Gave Bob $250 (08, nonce found by Jack).
  0028
7. Kissed Carl (11, nonce found by Mary).
  0041
  ...
A week before Christmas, two assistants came to Alice seeking for a Christmas bonus. Assistant Jack, showed a diary where he confirmed 140 records and Mary confirmed 130, while Mary showed a diary where she, reportedly, confirmed more records than Jack. Each of them was showing Alice a journal with all the valid hashes, but different entries! It turns out that ever since having found out about the promotion the two assistants were working hard to keep their own journals, such that all nonces would have their names. Since they had to maintain the journals individually they had to do all the work confirming records alone rather than splitting it among other assistants. This of course made them so busy that they eventually had to miss some important entries about Alice's bank loans.

Consequently, Jacks and Mary's "own journals" ended up being shorter than the "real journal", which was, luckily, correctly maintained by the three other assistants. Alice was disappointed, and, of course, did not give neither Jack nor Mary a promotion. "I will only give promotions to assistants who confirm the most records in the valid journal", she said. And the valid journal is the one with the most entries, of course, because the most work has been put into it!

After this rule has been established, the assistants had no more motivation to cheat by working on their own journal alone - a collective honest effort always produced a longer journal in the end. This rule allowed assistants to work from home and completely without supervision. Alice only needed to check that the journal had the correct hashes in the end when distributing promotions. This way, Alice's blockchain became a distributed blockchain.

Bitcoin

Jack happened to be much more effective finding nonces than Mary and eventually became a Senior Assistant to Alice. He did not need any more promotions. "Could you transfer some of the promotion credits you got from confirming records to me?", Mary asked him one day. "I will pay you $100 for each!". "Wow", Jack thought, "apparently all the confirmations I did still have some value for me now!". They spoke with Alice and invented the following way to make "record confirmation achievements" transferable between parties.

Whenever an assistant found a matching nonce, they would not simply write their own name to indicate who did it. Instead, they would write their public key. The agreement with Alice was that the corresponding confirmation bonus would belong to whoever owned the matching private key:
1. 0000
2. Bought 5 apples (92, confirmation bonus to PubKey61739).
3. 0032 (the hash of "0000" and "Bought 5 apples (92, confirmation bonus to PubKey61739)")
4. Called mom (52, confirmation bonus to PubKey55512).
5. 0056 (the hash of "0032" and "Called mom (52, confirmation bonus to PubKey55512)")
  ...
  0071
6. Gave Bob $250 (22, confirmation bonus to PubKey61739).
  0088
7. Kissed Carl (40, confirmation bonus to PubKey55512).
  0012
  ...
To transfer confirmation bonuses between parties a special type of record would be added to the same diary. The record would state which confirmation bonus had to be transferred to which new public key owner, and would be signed using the private key of the original confirmation owner to prove it was really his decision:
1. 0071
2. Gave Bob $250 (22, confirmation bonus to PubKey6669).
  0088
3. Kissed Carl (40, confirmation bonus to PubKey5551).
  0012
  ...
  0099
4. TRANSFER BONUS IN RECORD 132 TO OWNER OF PubKey1111, SIGNED BY PrivKey6669. (83, confirmation bonus to PubKey4442).
  0071
In this example, record 284 transfers bonus for confirming record 132 from whoever it belonged to before (the owner of private key 6669, presumably Jack in our example) to a new party - the owner of private key 1111 (who could be Mary, for example). As it is still a record, there is also a usual bonus for having confirmed it, which went to owner of private key 4442 (who could be John, Carl, Jack, Mary or whoever else - it does not matter here). In effect, record 284 currently describes two different bonuses - one due to transfer, and another for confirmation. These, if necessary, can be further transferred to different parties later using the same procedure.

Once this system was implemented, it turned out that Alice's assistants and all their friends started actively using the "confirmation bonuses" as a kind of an internal currency, transferring them between each other's public keys, even exchanging for goods and actual money. Note that to buy a "confirmation bonus" one does not need to be Alice's assistant nor register anywhere. One just needs to provide a public key.

This confirmation bonus trading activity became so prominent that Alice stopped using the diary for her own purposes, and eventually all the records in the diary would only be about "who transferred which confirmation bonus to whom". This idea of a distributed proof-of-work-based blockchain with transferable confirmation bonuses is known as the Bitcoin.

Smart Contracts

But wait, we are not done yet. Note how Bitcoin is born from the idea of recording "transfer claims", cryptographically signed by the corresponding private key, into a blockchain-based journal. There is no reason we have to limit ourselves to this particular cryptographic protocol. For example, we could just as well make the following records:
1. Transfer bonus in record 132 to whoever can provide signatures, corresponding to PubKey1111 AND PubKey3123.
This would be an example of a collective deposit, which may only be extracted by a pair of collaborating parties. We could generalize further and consider conditions of the form:
1. Transfer bonus in record 132 to whoever first provides $x$ , such that $f(x) = \text{true}$ .
Here $f(x)$ could be any predicate describing a "contract". For example, in Bitcoin the contract requires $x$ to be a valid signature, corresponding to a given public key (or several keys). It is thus a "contract", verifying the knowledge of a certain secret (the private key). However, $f(x)$ could just as well be something like:

$f(x) = \text{true, if }x = \text{number of bytes in record #42000},$

which would be a kind of a "future prediction" contract - it can only be evaluated in the future, once record 42000 becomes available. Alternatively, consider a "puzzle solving contract":

$f(x) = \text{true, if }x = \text{valid, machine-verifiable}$

$\qquad\qquad\text{proof of a complex theorem},$

Finally, the first part of the contract, namely the phrase "Transfer bonus in record ..." could also be fairly arbitrary. Instead of transferring "bonuses" around we could just as well transfer arbitrary tokens of value:
1. Whoever first provides $x$ , such that $f(x) = \text{true}$ will be DA BOSS.
  ...
2. $x=42$ satisifes the condition in record 284.
  Now and forever, John is DA BOSS!
The value and importance of such arbitrary tokens will, of course, be determined by how they are perceived by the community using the corresponding blockchain. It is not unreasonable to envision situations where being DA BOSS gives certain rights in the society, and having this fact recorded in an automatically-verifiable public record ledger makes it possible to include the this knowledge in various automated systems (e.g. consider a door lock which would only open to whoever is currently known as DA BOSS in the blockchain).

Honest Computing

As you see, we can use a distributed blockchain to keep journals, transfer "coins" and implement "smart contracts". These three applications are, however, all consequences of one general, core property. The participants of a distributed blockchain ("assistants" in the Alice example above, or "miners" in Bitcoin-speak) are motivated to precisely follow all rules necessary for confirming the blocks. If the rules say that a valid block is the one where all signatures and hashes are correct, the miners will make sure these indeed are. If the rules say that a valid block is the one where a contract function needs to be executed exactly as specified, the miners will make sure it is the case, etc. They all seek to get their confirmation bonuses, and they will only get them if they participate in building the longest honestly computed chain of blocks.

Because of that, we can envision blockchain designs where a "block confirmation" requires running arbitrary computational algorithms, provided by the users, and the greedy miners will still execute them exactly as stated. This general idea lies behind the Ethereum blockchain project.

There is just one place in the description provided above, where miners have some motivational freedom to not be perfectly honest. It is the decision about which records to include in the next block to be confirmed (or which algorithms to execute, if we consider the Ethereum blockchain). Nothing really prevents a miner to refuse to ever confirm a record "John is DA BOSS", ignoring it as if it never existed at all. This problem is overcome in modern blockchains by having users offer additional "tip money" reward for each record included in the confirmed block (or for every algorithmic step executed on the Ethereum blockchain). This aligns the motivation of the network towards maximizing the number of records included, making sure none is lost or ignored. Even if some miners had something against John being DA BOSS, there would probably be enough other participants who would not turn down the opportunity of getting an additional tip.

Consequently, the whole system is economically incentivised to follow the protocol, and the term "honest computing" seems appropriate to me.
Tags: Bitcoin, Computer science, Cryptography, Distributed computing, Economics, Explanation
Bitcoin-based Timestamping

Posted by Konstantin 20.01.2014 No Comments
Bitcoin is a cryptographic currency, that has gained a lot of hype in the last year. From a technical perspective, it is simply a distributed timestamping scheme, fully dedicated to establishing the order of monetary transactions, by creating a long block chain.

Adding a new block to the block chain requires extremely expensive distributed computations. Thus, in terms of the amount of energy, invested by the users worldwide into its creation, the block chain is, at the moment, probably the most expensive computer-generated file in human history. A monument to the raw "computation for the sake of computation". The Bitcoin network by now includes hundreds of thousands of users, most of whom keep a full copy of the block chain and contribute to its further growth.

Timestamping hash, published in a paper

All that means that including any information into the block chain can act as a solid timestamp, proving the existence of this information at a particular point in time. It is nearly impossible to fake or revoke. Even if the Bitcoin network would cease working, the block chain would probably be kept around at least as a curious artifact (as well as an object of interest for data miners) . The idea is equivalent to a popular practice of timestamping information by publishing it in a widely distributed newspaper. However, publishing in a popular newspaper may be costly, while getting transactions into the block chain is nearly free and accessible to anyone.

Consequently, it seems obvious that sooner or later the bitcoin block chain must begin to be used for timestamping things other than transactions. Because trusted timestamping is a big deal. Everyone in Estonia knows that.

Unfortunately, it seems that although the idea has been mentioned before, there do not seem to be any convenient services developed for that, apart from BTProof, which is somewhat too simplistic, given the potential importance of the task at hand. In an attempt to perhaps inspire someone to consider imlementing a more serious service of this kind, let me give a brief overview of the ways to get your data into the block chain.

Smuggling your data into the block chain

If only Bitcoin transactions were allowed to have textual descriptions assigned to them, the task would be trivial: any piece of information you want to timestamp could be simply mentioned in the description of a transaction. However, this functionality is not part of the Bitcoin protocol, so we have to use tricks. At least three different techniques are possible here.

1. Specifying your data as a destination address.

Each Bitcoin transaction includes a "destination address", which is a 34-character string in hex-like encoding. This address may be specified arbitrarily. Thus, by transferring any amount to an address, which itself is the hash of the information you need timestamped, you will have the fact of information's existence mentioned in the block chain for future generations to behold. This is the idea behind BTProof. There are several problems with this method. Most importantly, anything you transfer to a nonexistent address will get lost forever, with no one being able to claim it. This makes the process non-free, because you cannot have transactions of size 0. Moreover, very small transactions are unfavoured by the Bitcoin network and take a long time to get verified. Finally, leaving "unclaimed" transactions forever hanging in the block chain is somewhat indecent in the first place, isn't it?

The abovementioned drawbacks may be addressed using multi-signature transactions. Those are a special type of transactions, which allow the funds to be claimed by any one of several addresses. In this case one address can be used to encode the hash and another one — to reclaim the funds spent in the transaction back. This concept has been suggested as the way to carry arbitrary data on top of Bitcoin in the MasterCoin project.

2. Specifying your data as a destination private key.

Rather than converting the data you need timestamped into a (non-existent) address, you can turn it into a private key. You can then perform two transactions. The first one transfers funds to an address, corresponding to this private key, and next one uses the private key to withdraw the funds back to you. As there is a fixed mapping from your data to the private key to the address that the funds went through, you have just included a trace of your data into the block chain. This method is mentioned here. The drawback is the need for two transactions, and the overall complexity of the scheme.

3. Specifying your data in the script.

Finally, the last two places in a bitcoin transaction, which allow custom data, are the two "script" fields. Namely, the act of depositing funds to a transaction in Bitcoin is not as simple as providing the target address. Instead, it is a script, that, when executed, is supposed to check the right of the receiver to obtain the funds. Similarly, the act of withdrawing funds from Bitcoin is expressed by a script that proves the rights of the owner.

For example, a typical "deposition script" looks as follows:
```
OP_DUP 
OP_HASH160
OP_PUSHDATAx <target_address>
OP_EQUALVERIFY
OP_CHECKSIG
```
This script means that in order to withdraw the funds, the receiver must push on the stack a signature of the transaction, followed by his public key. The script then starts executing by first duplicating (OP_DUP) the top value on the stack, the public key. It then applies a hash function (OP_HASH160) to the top value on the stack (this converts the public key to an address). Then another value is pushed onto the stack (OP_PUSHDATAx). Next, two top values are popped and checked for equality (OP_EQUALVERIFY). This verifies, that the receiver's address matches <target_address>. Finally, the OP_CHECKSIG command pops another two values from the stack (those are the signature and the public key now, remember), and verifies the correctness of the signature.

The beauty of the system is that it lets you create various rules for claiming funds apart from simply owning a private key to an address. For example, it is possible to create transactions which require multiple parties to collude to withdraw them. Or you may require the receiver of the funds to solve a puzzle. Or you may even put the funds up for anyone to take freely, etc.

What is important for our purposes, however, is that the scripting language is rather flexible. In particular, it lets you add useless commands, such as "push this data onto stack, then drop the top value from stack, then continue as normal:"
```
OP_PUSHDATAx <any_data> 
OP_DROP 
OP_HASH160
OP_PUSHDATAx <target_address>
OP_EQUALVERIFY
OP_CHECKSIG
```
This logic could be included in either the "depositing" or the "receiving" script, letting you essentially provide arbitrary "notes" in transactions and thus do timestamp data in the most reasonable way. This lets you timestamp using a single transaction, which recurrently transfers any amount from an address to itself.

Unfortunately, it seems that the freedom of scripting has been severely limited in the recent versions of Bitcoin software. Namely, transactions with any nonstandard scripts are simply declined from inclusion into a block (at least, none of my attempts to try this out succeeded). Even the fate of multi-signature transactions, mentioned in point 1 (which are just a particular kind of a script) is not completely clear. In any case, the Bitcoin specification will most probably evolve to eventually allow storing dedicated data packets in the block chain without the need to resort to hacks. And if not Bitcoin, perhaps such functionality will become part of one of the competing similar cryptocurrencies.

It seems unreasonable to run a huge distributed timestamping algorithm, and not let people use it for general-purpose timestamping, doesn't it?

Update: Given the recent problems related to the transaction malleability aspect of the protocol, it is easy to predict that the freedom of scripting will probably be limited even further in the future. However, eventually support must be added for storing a custom nonce into the signed transaction (as it seems to be the only reasonable way to make transactions uniquely identifiable despite malleability of their hash). That nonce would be a perfect candidate for general-purpose timestamping purposes.
Tags: Computer science, Cryptography, Distributed computing, Project, Timestamping
Parasitic Computing

Posted by Konstantin 09.09.2008 3 Comments

Every time you visit this page, a piece of Javascript code will run within your browser, render a small part of the picture below (which is, for the sake of beauty and simplicity, a fragment of the Mandelbrot fractal) and submit the resulting pixels to the server. After 100 visits the whole picture will be complete (and the rendering restarts). If I hadn't told you that, you wouldn't have the slightest chance of noticing how this page steals your CPU cycles, and that is why one might refer to such practice as parasitic or leech computing.

$Mandelbrot fractal$
The Mandelbrot fractal

In this simple example, I am probably not winning much by outsourcing the rendering procedure. The computation of each pixel requires about 800 arithmetic operations on average, and this is comparable to the overhead imposed by the need to communicate the results back to the server via HTTP. However, if I chose to render somewhat larger chunks of the image at higher precision, the gains would be much more significant. Additionally, the script could be written so that it would keep running continuously for as long as you are staying at the page, thus sacrificing the user experience somewhat, yet blatantly robbing you of CPU power.

It seems that this approach to distributed computing has not reached the masses yet. I believe, however, that we are going to see the spread of such parasitic code someday, because it is the second easiest way to monetize website traffic. Indeed, we are already used to watching ads in return for free service. Moreover, quite a lot of the ads are rather heavy Flash applications that spend your CPU cycles with the sole purpose of annoying you. Now, if someone replaced that annoying Flashing banner with a script, that computed something useful behind the scenes, you wouldn't be too disappointed, would you? And that someone could then sell his website traffic not in terms of "banner displays", but in terms of "CPU seconds". Or, well, he could sell both.

Of course, not every distributed computation can be easily implemented within such an environment. Firstly, it should be possible to divide the problem into a large number of independent parts: this is precisely the case when you need to compute the values of a certain function f for a large number of parameters. The Mandelbrot example above fits this description. Here is one other similar problem. Less obviously, various other tasks could be fit within the framework with the help of the Map-Reduce trick.

Secondly, the computation of each value f(x) should be reasonably complex, preferably superlinear, i.e. Ω(n^2) or worse. Otherwise, the overhead of sending the inputs (which is O(n)) would offset the benefits too much.

Thirdly, the description of the function f should be reasonably compact, otherwise the overhead of transferring it to each visitor would be too costly. Note, however, that this issue slightly depends on the kind of traffic being leeched upon: if a website has a small number of dedicated users, each user would only need to download the function definition once and refer to the cached version on his subsequent visits to the site.

Finally, the function f, as well as its inputs and outputs must be public. This restriction severely limits the use of the approach. For example, although numerous data analysis tasks could satisfy the above conditions, in many practical contexts the data is private and it is thus not possible to openly distribute it to arbitrary visitors of an arbitrary website.

Besides the theoretical difficulties, there are some technical issues that need to be solved before the whole thing can work, such as the security aspects (you can't trust the results!), implementation (Linear Algebra libraries for Javascript or Flash, please?), ethical concerns and some more.

Nonetheless, the whole thing still looks rather promising to me, and is at least as worthy of academic and industrial attention, as are all of these overhyped Grid, P2P and SOA technologies around.

PS: By the way, I find the topic well-suitable for a proper student project/thesis.

Tags: Distributed computing, Javascript, Project, Web