• Posted by Konstantin 28.06.2018 No Comments

    I had to replace my SIM card today. The old one was mini-sized and I needed a micro-sized one. My previous SIM card stayed with me for about 20 years or so. I have gone through high school, university, and traveled around to countless places with it. I changed my phone several times, starting from an older, bulkier Nokia through several candy-bar Nokias and a couple of smartphones. My personal computer evolved from a 200MHz Pentium with 16MB RAM to a 2.8GHz quad-core with 32GB RAM. But my SIM card always stayed the same. Let me dedicate this post to the memory of this glorious tiny computer - the most long-lasting and reliable piece of computing equipment I have ever used.

    My retiring SIM card. The mobile operator changed its name from Radiolinja to Elisa in 2000, but the SIM card did not care.

    What is a SIM card?

    Your SIM card is just another smart card, technologically similar to any of the "chip-and-pin" bank cards in your wallet, your ID card and even your contactless public transport tickets (which communicate using the same set of protocols, just wirelessly). Note, however, that banks started to switch from magnetic-stripe to "chip-and-pin" credit cards and countries began issuing smart-card-based identity documents only about 15 years ago or so. SIM cards, on the other hand, were already in wide use much earlier, long before the term "smart card" became a popular buzzword.

    All smart cards are built to serve one primary purpose - identification. The card stores a secret key (which is practically impossible to retrieve without disassembling the card using ultra-high-tech lab equipment), and provides a way to execute the following challenge-response protocol: you send a random string (a challenge) to the card. The card encrypts the string using the stored key and returns the response. The correctness of this response can now be verified by a second party (e.g. the mobile operator, the bank or a website using ID-card authentication). The actual details of the computation differ among the various smart cards. For example, some use symmetric cryptography while others use asymmetric cryptography. Some cards provide additional services, such as creating digital signatures or storing information on the card. Nonetheless, identification is always the core function. This explains the name of the SIM card: Subscriber Identity Module. It is a module that identifies you (the subscriber) to the network provider.
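
    To make the idea concrete, here is a tiny illustrative Python sketch of the challenge-response principle. It is not the actual GSM algorithm, and a real card never reveals its key like this - the point is merely that a correct response proves possession of the secret key:

    import hmac, hashlib, os

    SECRET_KEY = b"sixteen byte key"   # stored inside the card and known to the operator

    def card_respond(challenge: bytes) -> bytes:
        # What the card does: derive the response from the challenge and the key
        return hmac.new(SECRET_KEY, challenge, hashlib.sha256).digest()

    # The verifier (e.g. the mobile operator) computes the same value and compares
    challenge = os.urandom(16)
    response = card_respond(challenge)
    expected = hmac.new(SECRET_KEY, challenge, hashlib.sha256).digest()
    print("Identified!" if hmac.compare_digest(response, expected) else "Impostor!")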

    What is inside a SIM card?

    The best way to understand what is inside a SIM card (or any other type of smart card) is to connect it to your computer and "talk" to it. Many laptops have integrated smart-card readers nowadays, so if you find a suitable frame for your nano/micro/mini-SIM, you may simply put it into the reader just as you would do with an ID or a bank card.

    Old mini-SIM in a frame from a newer card

    Note, though, that if your frame is flimsy and does not fit perfectly (as is the case with mine in the photo above), you run the risk of losing the chip somewhere in the depths of the card reader slot before you even manage to slide it in completely. Hence, in my case, a tiny cross-style USB card reader was a more convenient and reliable option:

    SIM in a frame in a reader

    Plug it in, wait until the system recognizes the device, and we are ready to talk. There are many tools for "talking" to the card on a low level, but one of the best for educational exploration is, in my opinion, the program CardPeek. Let us fire it up. At startup it asks for the card reader to use (note that you can use it to explore both contact and contactless cards, if you have the necessary reader):

    Select reader screen

    We can now click "Analyzer → GSM SIM", provide the PIN, wait a bit, and have the program extract a wealth of information stored on the card:

    SIM card analyzed

    Fun, right? Let us now see where all this data came from and what it actually means.

    How does it work?

    At the hardware level, a smart card has a very simple connector with six active pins, which are designed for sending and receiving data to and from the card:

    Smart card connectors (source)

    Four pins (VCC, GND, CLK, I/O) are used for the basic half-duplex synchronous serial communication. RST is the reset pin, and VPP is used for supplying the higher voltage when (re)programming the card. When you first connect to the card, the protocol requires zeroing the "reset" pin, to which the card replies by sending a fixed sequence of bytes identifying the card and its capabilities. This sequence is known as the card's ATR ("Answer To Reset") string. You can see this string listed as the "cold ATR" entry in the screenshot named "SIM card analyzed" above.

    Besides providing pre-packaged analysis scripts for various kinds of smart cards, CardPeek lets you send custom commands to the card directly through its Lua scripting interface. This is what the "Command" text input at the bottom of the screen is for. Let us now switch to the "logs" tab and try sending some commands. First of all, we need to establish the connection to the card. This is done via the card.connect() call:

    card.connect() example

    The ATR string is received by the reader when the connection is first established. We can obtain it via card.last_atr() and print it out to the log window in hex-encoded form using log.print and bytes:format (the documentation for all these APIs is available here):

    ATR example

    As we see, the ATR for my card happens to be 3BBA9400401447473352533731365320 (in hex). If you search the web, you will find that this particular ATR is known to be a signature of Elisa SIM cards. It is not a random string, though, and every byte has a meaning (a small parsing sketch follows the list below). In particular:

    • 3B is the "initial byte", and this particular value identifies the smart card as a SIM card.
    • BA is the "format" byte. Its first four bits (1011) tell us that we have to expect fields TA1, TB1 and TD1 to follow. The last four bits denote the number 10 - the number of "historical bytes" at the end of the ATR.
    • 94 is the field TA1, specifying the clock rate of the serial protocol.
    • 00 is the field TB1, specifying the programming voltage (apparently, the card is not re-programmable).
    • 40 tells us that we have to read out another byte field TC2 (this is in the left-side part of the byte, 4) and that the card uses the T=0 protocol (this is in the right-side part, 0).
    • 14 is the promised TC2 field (not sure what it is meant for).
    • the last 10 bytes are the "historical bytes", providing card manufacturer-specific information.
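
    As promised, here is a small parsing sketch (plain Python, not a CardPeek/Lua script) that decodes the first bytes of this particular ATR along the lines of the breakdown above. It is by no means a full ISO 7816-3 parser - the interface-byte handling is simplified to what this single ATR needs:

    atr = bytes.fromhex("3BBA9400401447473352533731365320")

    ts, t0 = atr[0], atr[1]
    num_historical = t0 & 0x0F     # low nibble: number of historical bytes
    names = ["TA1", "TB1", "TC1", "TD1"]
    present = [names[i] for i in range(4) if t0 & (0x10 << i)]
    print(f"TS = {ts:02X} (initial byte)")
    print(f"T0 = {t0:02X}: fields {present} follow, {num_historical} historical bytes")
    print("Historical bytes:", " ".join(f"{b:02X}" for b in atr[-num_historical:]))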

    Greeting your Card

    Now that we are connected, we can send various commands to the card. Let us proceed by example. The first command we might want to send is "VERIFY CHV", which is essentially greeting the card by providing our PIN1 code ("CHV" stands for "Card Holder Verification").

    VERIFY CHV (source)

    Every smart card command starts with a two-byte identifier (for example, A0 20 is the (hex) identifier of the VERIFY CHV command). It is followed by two parameter bytes P1 and P2. For example, the parameter P1 for VERIFY CHV is always 0, and P2 must indicate the number of the PIN we are submitting (i.e. 1 for PIN1, 2 for PIN2). Next comes P3, a byte specifying the length of the data which follows. For VERIFY CHV the data is the provided PIN itself, and it is always 8 bytes long. If the PIN is shorter than 8 bytes, it must be padded with FF bytes. The PIN itself is encoded in plain ASCII (i.e. 1234 would be 31 32 33 34).
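
    For illustration, here is a small sketch (plain Python again, rather than a CardPeek command) that assembles such a VERIFY CHV command for an arbitrary PIN, following the layout just described:

    def verify_chv_apdu(pin: str, chv_number: int = 1) -> str:
        data = pin.encode("ascii").ljust(8, b"\xff")            # ASCII PIN, padded with FF
        apdu = bytes([0xA0, 0x20, 0x00, chv_number, 0x08]) + data
        return " ".join(f"{b:02X}" for b in apdu)

    print(verify_chv_apdu("1234"))   # A0 20 00 01 08 31 32 33 34 FF FF FF FF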

    Now, supposing my PIN1 is, in fact, "1234", I can authenticate myself to the card via CardPeek as follows:

    sw, resp = card.send(bytes.new(8, "A0 20 00 01 08 31 32 33 34 FF FF FF FF"))

    Here, card.send is the sending command and bytes.new(8, ...) constructs an array of 8-bit bytes from a hex string (see CardPeek reference).

    The sw and resp are the two components of the T=0 protocol response. For VERIFY CHV we only care that sw is equal to 9000, which means "OK". Note how this is printed in the log.

    VERIFY CHV example

    Beware that if you do not receive a 9000 response, it means that the card denied your PIN for some reason. Trying to submit a wrong PIN three times in a row will block the card.

    Reading the Data

    Now that we have identified ourselves to the card, let us try to read the data stored on it. The data on the card is organized in a hierarchy of files. It is this exact hierarchy that you can observe as the output of the "Analyzer" script. The root file is called "MF"; it has the ICCID, TELECOM and GSM sub-files, which, in turn, have a number of predefined sub-files themselves, and so on. The names are just conventions; the card itself uses two-byte identifiers for each file. For example, "MF" is actually 3F 00, "TELECOM" is 7F 10, etc. While the card is connected you can navigate around the file structure just like you would in a normal operating system using the cd command, except that in smart-card lingo the corresponding command is called SELECT.

    The binary form of the SELECT command is A0 A4 00 00 02 {x} {y}, where {x} {y} is the file identifier. Just like before, A0 A4 is the command code, 00 00 are the ignored P1 and P2 parameters, and 02 tells us that exactly two bytes must follow.

    Consequently, if we wanted to select the file "MF (3f 00)/TELECOM (7F 10)/ADN (6F 3A)", which contains the address book records, we could achieve it by sending the following sequence of commands via CardPeek:

    card.send(bytes.new(8, "A0 A4 00 00 02 3F 00"))
    card.send(bytes.new(8, "A0 A4 00 00 02 7F 10"))
    card.send(bytes.new(8, "A0 A4 00 00 02 6F 3A"))

    Selecting files is a common task, and CardPeek provides a convenient shorthand: card.select("#7f10"). For some cards (not mine, unfortunately) it should also be possible to do things like card.select("/7f10/6f3a").

    Once we have selected the ADN ("Abbreviated Dialing Numbers") file, we may read out the individual phone numbers from it using the READ RECORD command. The procedure is complicated by the fact that READ RECORD needs to be provided the "record size" as one of its parameters, which, in turn, must be taken from the response data of the last SELECT command, and this must be obtained via the GET RESPONSE command. The complete example would therefore be:

    -- Select /TELECOM/ADN
    card.select("#7F10")
    sw, resp = card.select("#6F3A")
    
    -- Read file metadata
    GET_RESPONSE = "A0 C0 00 00"
    sw, resp = card.send(bytes.new(8, GET_RESPONSE, bit.AND(sw, 0xff)))
    
    -- Read out first record in the file
    sw, resp = card.read_record('.', 1, resp:get(14))
    
    -- Print the record to the log
    log.print(log.INFO, resp:format("%P"))

    Reading out a phone record

    Note that instead of printing the output to the log via log.print you could also show a message box:

    ui.readline(resp:format("%P"))

    or append a new node to the tree in the "card view" tab:

    nodes.append(nodes.root(), {classname="block", 
                                label="Name", 
                                val=resp:format("%P"), 
                                size=#resp})
    

    In fact, at this point you should go and read the script $HOME/.cardpeek/scripts/gsm (beta).lua. It contains the code we ran in the beginning of this post to analyze the card. The script simply sends the relevant commands to the card and appends all responses as nodes to the tree.

    Authentication

    While data storage is a useful capability of a SIM card, its main purpose is subscriber authentication. Thus, our acquaintance with the SIM card would be incomplete without checking the corresponding function out as well. It is quite simple:

    RUN GSM ALGORITHM (source)

    That is, the process is the following: we send the byte sequence A0 88 00 00 10, followed by a 16-byte-long challenge string (which is normally given by the mobile operator when the phone joins the network). The SIM card responds with 12 bytes, of which the first 4 we should send back to the mobile operator for verification, and the remaining 8 are used as a cipher key.
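
    In GSM terminology these two parts are usually called SRES (the signed response) and Kc (the session cipher key). A trivial Python sketch of the split, using a made-up placeholder response, just to make the byte layout explicit:

    response = bytes.fromhex("00112233445566778899AABB")   # placeholder, not a real card response
    sres, kc = response[:4], response[4:]
    print("SRES (sent back to the operator):", sres.hex().upper())
    print("Kc (cipher key):", kc.hex().upper())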

    Before we can use the RUN GSM ALGORITHM command we need to verify our PIN1 (already done) and SELECT the GSM (7F 20) file:

    RUN_GSM_ALGO = "A0 88 00 00 10"
    GET_RESPONSE = "A0 C0 00 00"
    DF_GSM = "#7F20"
    
    CHALLENGE_STRING = "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    
    -- Select GSM file
    card.select(DF_GSM)
    
    -- Send the challenge data
    sw, resp = card.send(bytes.new(8, RUN_GSM_ALGO, CHALLENGE_STRING))
    
    -- Read 12-byte-long response
    sw, RESPONSE_STRING = card.send(bytes.new(8, GET_RESPONSE, 12))
    log.print(log.INFO, RESPONSE_STRING:format("%D"))

    RUN GSM ALGORITHM example

    And that is it. I hope you learned to understand and appreciate your SIM card a bit more today.

  • Posted by Konstantin 18.04.2018 No Comments

    Every once in a while I find myself on a public network where all access besides HTTP and HTTPS is blocked by the firewall. This is extremely inconvenient, as I routinely need to access SSH, VPN or other ports besides HTTP(S). Over time I have developed a reasonably fast and simple way of overcoming the restriction whenever I need it. Let me document it here.

    Google Cloud Shell

    There are probably hundreds of cloud providers nowadays, each of them trying to outcompete the others by offering a better, cheaper, faster, or more diverse set of services. One killer feature of the Google Cloud platform is its cloud shell, which gives you command-line access to a tiny Linux virtual machine directly from their webpage for free:

    Once you are logged into Google Cloud platform you may open the shell window via this button

    Even if you do not have any serious use for a cloud provider, the cloud shell alone is a good reason to get an account on the Google Cloud platform: whenever I find myself locked out of SSH behind a paranoid firewall, I can still SSH into any of my servers via the cloud shell. This works even when your access is limited to an HTTP proxy server.

    Once upon a time there was a great service named koding.com, which also provided free access to a Linux console via HTTP. Unfortunately, they have changed their pricing model since then and do not seem to have any similar free offerings anymore. If you know any alternative services that offer a web-based shell access to a Linux VM for free, do post them in the comments.

    OpenVPN via HTTPS

    Sometimes SSH access offered by the cloud shell is not enough. For example, I would often need to access the company's VPN server. It runs on port 1194 and in a properly paranoid network this port is, of course, also blocked. The way to sneak through this restriction is the following.

    1. Launch a server in the cloud, running an OpenVPN service on port 443 (which corresponds to HTTPS). Even the most paranoid firewalls would typically let HTTPS traffic through, because otherwise they would block most of the web for their users.
    2. Connect to that VPN server and tunnel all traffic through it to the outside world.
    3. Now we are free to connect anywhere we please. In particular, we may open a VPN tunnel to the company's server from within that "outer" VPN tunnel.
    4. At this point I would sometimes SSH into a machine behind the company's VPN and never cease to be amused by the concept of having a SSH tunnel within a VPN tunnel within another VPN tunnel.

    Let us now go through all these steps in detail.

    Setting up an OpenVPN server

    We start by launching a machine in the cloud. You are free to choose any cloud provider here, but as we are already using Google's cloud shell (we are working behind a paranoid firewall, remember), it makes sense to launch the server from Google's cloud as well. This can be as simple as copy-pasting the following command into the same cloud shell prompt:

    gcloud compute instances create openvpn-server --zone=europe-west3-a --machine-type=f1-micro --tags=https-server --image=ubuntu-1604-xenial-v20180405 --image-project=ubuntu-os-cloud --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=openvpn-server

    (Obviously, detailed documentation of Google cloud functionality is way beyond the scope of this blog post. All the necessary references and tutorials are rather easy to find, though.) You may play with some of the settings passed to the command above; however, the choice of the ubuntu-1604-*** image is important, because the script from the next part was only ever tested on that Linux version. The chosen machine type (f1-micro) is the cheapest and should cost around 5 euros per month (if you keep it running constantly), or a matter of cents if you only use it for a few hours.

    Launching a machine in the cloud

    Once the machine is up, we SSH into it by typing:

    gcloud compute ssh openvpn-server

    Here we'll need to install and configure the OpenVPN server. This may be a fairly lengthy process of following step-by-step instructions from, for example, this well-written tutorial. Luckily, I've gone through this already and wrote all the steps down into a replayable script, which seems to work fine so far with the chosen Linux image. Of course, there's no guarantee it will continue working forever (some rather loose configuration editing is hard-coded there). However, as we have just launched a throwaway virtual server, the worst that can happen is the need to throw that server away if it breaks. (Do not run the script blindly on a machine you care about, though.) So let's just download and run it:

    curl -s https://gist.githubusercontent.com/konstantint/08ab09202b68e4e3542622e99d21a82e/raw/1a3ee68008d5b565565ebb8c126ae68a8cebe549/ovpn_setup.sh | bash -s

    Once completed, the script prints the filename "/home/<username>/client-configs/files/client1.ovpn". This is the name of the file which we need to transfer back to our computer. A clumsy, yet fast and straightforward way is to simply copy-paste its contents from the shell into a local text file:

    cat /home/your_username/client-configs/files/client1.ovpn

    We then select all the output starting from the first lines of the file

    client
    dev tun
    proto tcp
    ...

    all the way down to

    ...
    -----END OpenVPN Static key V1-----
    </tls-auth>

    (holding "shift", scrolling and clicking the mouse helps).

    We then create a new file (on the local machine), name it client1.ovpn (for example), paste the copied text and save. That's it, we have successfully set up an OpenVPN server running on port 443. Type exit in the cloud shell to log out of the server as we don't need to configure anything there.

    Setting up an OpenVPN client

    Next we must set up an OpenVPN client on the local computer. I am using a Windows laptop, hence the instructions are Windows-specific, although the logic for Linux or Mac should be rather similar. First, install OpenVPN. The nicest way to do it in Windows is via Chocolatey. Open cmd.exe with administrative privileges and:

    1. Install Chocolatey, if you still don't have it (trust me, it's a good piece of software to have):

    @"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"

    2. Now install OpenVPN (if you still don't have it):

    choco install -y openvpn

    3. Launch OpenVPN GUI (Windows button + type "OpenV" + Enter), right-click on the newly appeared tray icon, select "Import File..." and choose the client1.ovpn file we created:

    Import OVPN file

    4. Once you've done it, the OpenVPN tray menu will offer you a "Connect" option (or a "client1" submenu with a "Connect" option if you have other connections configured already). Click it, observe the connection dialog, wait until the tray icon becomes green, and congratulations, all your traffic is now tunneled through port 443 of the cloud machine you launched some minutes ago.

    OpenVPN client connected

    You may verify the effect by googling the words "my ip". You are now also free to connect to any ports or services you need.

    Tunnel in a Tunnel

    As I mentioned in the beginning, having freed myself from the firewalls of a paranoid network administrator, I would then sometimes need to connect to a corporate or a university VPN. This happens to be surprisingly easy (this part is, however, Windows-specific - I am not sure what the equivalent steps would look like on Linux or Mac, although I'm sure it should be possible).

    1. OpenVPN uses a virtual network tunnel adapter to forward traffic. Initially it only installs one such adapter, but if you want to run a tunnel within a tunnel you will need to add a second one. This is done by simply running C:\Program Files\TAP-Windows\bin\addtap.bat with administrator privileges. It only needs to be done once, of course (unless you need to run a tunnel within a tunnel within a tunnel - then you need to add a third TAP adapter by running addtap.bat again).
    2. Now running a VPN within a VPN is simply a matter of asking OpenVPN to "Connect" to VPNs in the appropriate order. As we are already connected to client1, we simply connect to another profile without disconnecting the first one - this will happily forward a tunnel through an existing tunnel. Fun, right?

    VPN via VPN

    Cleaning Up

    If you only needed the VPN temporarily, do not forget to destroy the cloud machine before going home - otherwise you'll have to pay the unnecessary costs of keeping a server up. Destroying servers is simple. Just go back to the cloud shell where we launched the server and run:

    gcloud compute instances delete openvpn-server

    That's it. You are back at the mercy of the firewalls.

  • Posted by Konstantin 17.03.2018 3 Comments

    I randomly stumbled upon a Quora question "Can you write a program for adding 10 numbers" yesterday. The existing answers competed in geeky humor and code golf, so I could not help adding another take on the problem.

    Can you write a program for adding 10 numbers?

    The question offers a great chance to illustrate how to properly develop software solutions to real-life problems such as this one.

    First things first - let us analyze the requirements posed by the customer. They are rather vague, as usual. It is not clear what “numbers” we need to add, where and how these “numbers” should come from, what is really meant by “adding”, what we should do with the result, what platform the software is supposed to be running on, what the service guarantees are, how many users are expected, etc.

    Of course, we do not want to discover that we misunderstood some of the requirements late in the development cycle, as this could potentially require us to re-do all of the work. To avoid such unpleasant surprises we should be planning for a general, solid, enterprise-grade solution to the problem. After a short meeting of the technical committee we decided to pick C# as the implementation platform. It is OS-independent and has many powerful features which should cover any possible future needs. For example, if the customer decided to switch to a cluster-based, parallel implementation later along the way, we’d quickly have this base covered. Java could also be a nice alternative, but, according to the recent developer surveys, C# development pays more.

    The Architecture

    Let us start by modeling the problem on a higher level. The customer obviously needs to process (“add”) some data (“10 numbers”). Without getting into too much detail, this task can be modeled as follows:

    interface IInputProvider {}
    interface IOutput {}
    interface ISolution {
        IOutput add10(IInputProvider input);    
    }

    Note how we avoid specifying the actual sources of input and output yet. Indeed, we really don’t know where the “10 numbers” may be coming from in the future - these could be read from standard input, sent from the Internet, delivered by homing pigeons, or teleported via holographic technology of the future - all these options are easily supported by simply implementing IInputProvider appropriately.

    Of course, we need to do something about the output once we obtain it, even though the customer forgot to mention this part of the problem. This means we will also have to implement the following interface:

    interface IOutputConsumer {
        void consumeOutput(IOutput output);
    }

    And that is it - our general solution architecture! Let us start implementing it now.

    The Configuration

    The architecture we work with is completely abstract. An actual solution would need to provide implementations for the IInputProvider, IOutputConsumer and ISolution interfaces. How do we specify which classes are implementing these interfaces? There are many possibilities - we could load this information from a database, for example, and create a dedicated administrative interface for managing the settings. For reasons of brevity, we’ll illustrate a simplistic XML-based factory method pattern.

    Namely, we shall describe the necessary implementations in the XML file config.xml as follows:

    <Config>
        <InputProvider class="Enterprise.NumberSequenceProvider"/>
        <OutputConsumer class="Enterprise.PeanoNumberPrinter"/>
        <Solution class="Enterprise.TenNumbersAddingSolution"/>
    </Config>

    A special SolutionFactory class can now load this configuration and create the necessary object instances. Here’s a prototype implementation:

    class SolutionFactory {
        private XDocument cfg;
        public SolutionFactory(string configFile) {
            cfg = XDocument.Load(configFile);
        }
        public IInputProvider GetInputProvider() {
            return Instantiate<IInputProvider>("InputProvider");
        }
        public IOutputConsumer GetOutputConsumer() {
            return Instantiate<IOutputConsumer>("OutputConsumer");
        }
        public ISolution GetSolution() {
            return Instantiate<ISolution>("Solution");
        }
        private T Instantiate<T>(string elementName) {
            var typeName = cfg.Root.Element(elementName)
                                   .Attribute("class").Value;
            return (T)Activator.CreateInstance(Type.GetType(typeName));
        }
    }

    Of course, in a real implementation we would also worry about specifying the XML Schema for our configuration file, and make sure it is possible to override the (currently hard-coded) “config.xml” file name with an arbitrary URI using command-line parameters or environment variables. In many real-life enterprise solutions in Java, for example, even the choice of the XML parsing library would need to be configured and initialized using its own factory pattern. I omit many of such (otherwise crucial) details for brevity here.

    I am also omitting the unit-tests, which, of course, should be covering every single method we are implementing.

    The Application

    Now that we have specified the architecture and implemented the configuration logic, let us put it all together into a working application. Thanks to our flexible design, the main application code is extremely short and concise:

    class Program {
        static void Main(string[] args) {
            var sf = new SolutionFactory("config.xml");
            var ip = sf.GetInputProvider();
            var oc = sf.GetOutputConsumer();
            var sol = sf.GetSolution();
            var op = sol.add10(ip);
            oc.consumeOutput(op);
        }
    }

    Amazing, right? Well, it does not really work yet, of course, because we still need to implement the core interfaces. However, at this point we may conclude the work of the senior architect and assign the remaining tasks of filling in the blanks to the main engineering team.

    The Inputs and Outputs

    Now that we have set up the higher-level architecture, we may think a bit more specifically about the algorithm we plan to implement. Recall that we need to “add 10 numbers”. We don’t really know what these “numbers” should be - they could be real numbers, complex numbers, Roman numerals or whatnot, so we have to be careful and not rush into making strict assumptions yet. Let’s just say that a “number” is something that can be added to another number:

    interface INumber {
        INumber add(INumber other);
    }

    We’ll leave the implementation of this interface to our mathematicians on the team later on.

    At this step we can also probably make the assumption that our IInputProvider implementation should somehow give access to ten different instances of an INumber. We don’t know how these instances are provided - in the worst case each of them may be obtained using a completely different method and at completely different times. Consequently, one possible template for an IInputProvider could be the following:

    interface ITenNumbersProvider: IInputProvider {
        INumber GetNumber1();
        INumber GetNumber2();
        INumber GetNumber3();
        INumber GetNumber4();
        INumber GetNumber5();
        INumber GetNumber6();
        INumber GetNumber7();
        INumber GetNumber8();
        INumber GetNumber9();
        INumber GetNumber10();
    }

    Note how, by avoiding the use of array indexing, we force the compiler to require that any implementation of our ITenNumbersProvider interface indeed provides exactly ten numbers. For brevity, however, let us refactor this design a bit:

    enum NumberOfANumber {
        ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN
    }
    interface ITenNumbersProvider: IInputProvider {
        INumber GetNumber(NumberOfANumber noan);
    }

    By listing the identities of our “numbers” in an enum we still get some level of compile-time safety, although it is not as strong any more, because an enum is, internally, just an integer. However, we got rid of unnecessary repetitions, which is a good thing. Refactoring is an important aspect of enterprise software development, you see.

    The senior architect looked at the proposed interface at one of our regular daily stand-ups, and was concerned with the chosen design. “Your interface assumes you can provide immediate access to any of the ten numbers”, he said. But what if the numbers cannot be provided simultaneously and will be arriving at unpredictable points in time? If this were the case, an event-driven design would be much more appropriate:

    delegate void NumberHandler(NumberOfANumber id, INumber n);
     
    interface IAsynchronousInputProvider: IInputProvider {
        void AddNumberListener(NumberHandler handler);
    }

    The adding subsystem would then simply subscribe to receive events about the incoming numbers and handle them as they come in.

    “This is all good and nice”, responded the mathematician, “but for efficient implementation of the addition algorithm we might need to have all ten numbers available at the same time”. “Ah, software design 101”, says the senior architect. We simply install an adapter class. It would pool the incoming data until we have all of it, thus converting the IAsynchronousInputProvider, used for feeding the data, into an ITenNumbersProvider, needed by the mathematician:

    class SynchronizationAdapter: ITenNumbersProvider {
       private Dictionary<NumberOfANumber, INumber> nums;
       private ManualResetEvent allDataAvailableEvent;
     
       public SynchronizationAdapter(IAsynchronousInputProvider ainput){
           nums = new Dictionary<NumberOfANumber, INumber>();
           allDataAvailableEvent = new ManualResetEvent(false);
           ainput.AddNumberListener(this.HandleIncomingNumber);
       }
       private void HandleIncomingNumber(NumberOfANumber id, INumber n){
           nums[id] = n;
           if (Enum.GetValues(typeof(NumberOfANumber))
                   .Cast<NumberOfANumber>()
                   .All(k => nums.ContainsKey(k)))
                allDataAvailableEvent.Set();
       }
       public INumber GetNumber(NumberOfANumber noan) {
           allDataAvailableEvent.WaitOne();
           return nums[noan];
       }
    }

    Now the mathematician can work on his addition logic without having to know anything about the way the numbers are coming in. Convenient, isn’t it?

    Note that we are still only providing the input interface specification (along with an adapter) here. The actual implementation has to wait until our mathematicians come up with an implementation of INumber and the data engineers decide on how to obtain ten of these in the most optimal way.

    But what about IOutput? Let us assume that we expect to output a single number. This means that INumber must itself already be an instance of IOutput:

    interface INumber: IOutput {
       INumber add(INumber other);
    }

    No need to implement anything, we just add an interface tag to INumber! See how object-oriented design techniques allow us to save development time!

    The Order of Addition

    OK, so we now have a concept of an INumber which has a (binary) addition operation defined, an ITenNumbersProvider which can provide ten INumber instances (conveniently abstracting away the IAsynchronousInputProvider which actually obtains the numbers), and our goal is to add them up to get an IOutput which is itself an INumber. Sounds easy, right? Not so fast! How exactly are we going to add these numbers? After all, maybe in some cases adding (((a+b)+c)+d)+… can be less efficient or precise than (a+(b+(c+(d+…. Or maybe the optimal addition strategy is to start from the middle and then add numbers in some order? There do exist nontrivial ways to add up numbers, you know. To accommodate any possible options in the future (so that we wouldn’t have to rewrite the code unnecessarily), we should design our solution in a way that would let us switch our addition strategy easily, should we discover a better algorithm. One way to do it is by abstracting the implementation behind the following interface:

    interface IAdditionStrategy {
       INumber fold(Func<NumberOfANumber, INumber> elements,
                    Func<INumber, INumber, INumber> op); 
    }

    You see, it is essentially a functor, which gets a way to access our set of numbers (via an accessor function) along with a binary operator “op”, and “folds” this operator along the number set in any way it deems necessary. This particular piece was designed by Harry, who is a huge fan of functional programming. He was somewhat disappointed when we decided not to implement everything in Haskell. Now he can show how everyone was wrong. Indeed, the IAdditionStrategy is a core element of our design, after all, and it happens to look like a fold-functor which takes functions as inputs! “I told you we had to go with Haskell!”, says Harry! It would allow us to implement all of our core functionality with a much higher level of polymorphism than that of a simplistic C# interface!

    The Solution Logic

    So, if we are provided with the ten numbers via ITenNumbersProvider and an addition strategy via IAdditionStrategy, the implementation of the solution becomes a very simple matter:

    class TenNumbersAddingSolution: ISolution {
       private IAdditionStrategy strategy;
       public TenNumbersAddingSolution() {
           strategy = ...
       }
       public IOutput add10(IInputProvider input) {
           var tenNumbers = new SynchronizationAdapter(
                          (IAsynchronousInputProvider)input);
           return strategy.fold(i => tenNumbers.GetNumber(i), 
                                (x,y) => x.add(y));
       }
    }

    We still need to specify where to take the implementation of the IAdditionStrategy from, though. This would be a good place to refactor our code by introducing a dependency injection configuration framework such as the Autofac library. However, to keep this text as short as possible, I am forced to omit this step. Let us simply add the “Strategy” field to our current config.xml as follows:

    <Config>
        ...
        <Solution class="Enterprise.TenNumbersAddingSolution">
            <Strategy class="Enterprise.AdditionStrategy"/>
        </Solution>
    </Config>

    We could now load this configuration setting from the solution class:

        ...
        public TenNumbersAddingSolution() {
            var cfg = XDocument.Load("config.xml");
            var typeName = cfg.Root
                   .Element("Solution")
                   .Element("Strategy")
                   .Attribute("class").Value;
            strategy = (IAdditionStrategy)Activator
                   .CreateInstance(Type.GetType(typeName));
        }
        ...

    And voilà, we have our solution logic in place. We still need to implement INumber, IAdditionStrategy, ITenNumbersProvider and IOutputConsumer, though. These are the lowest-level tasks that will force us to make the most specific decisions and thus determine the actual shape of our final product. These will be done by the most expert engineers and mathematicians, who understand how things actually work inside.

    The Numbers

    How should we implement our numbers? As this was not specified, we should probably start with the simplest possible option. One of the most basic number systems from the mathematician’s point of view is that of Peano natural numbers. It is also quite simple to implement, so let’s go for it:

    class PeanoInteger: INumber {
        public PeanoInteger Prev { get; private set; }
        public PeanoInteger(PeanoInteger prev) { Prev = prev; }
        public INumber add(INumber b) {
            if (b == null) return this;
            else return new PeanoInteger(this)
                    .add(((PeanoInteger)b).Prev);
        }
    }

    Let us have IOutputConsumer print out the given Peano integer as a sequence of “1”s to the console:

    class PeanoNumberPrinter: IOutputConsumer {
        public void consumeOutput(IOutput p) {
            for (var x = (PeanoInteger)p; x != null; x = x.Prev)
                 Console.Write("1");
            Console.WriteLine();
        }
    }

    Finally, our prototype IAdditionStrategy will be adding the numbers left to right. We shall leave the option of considering other strategies for later development iterations.

    class AdditionStrategy: IAdditionStrategy {
        public INumber fold(Func<NumberOfANumber, INumber> elements,
                            Func<INumber, INumber, INumber> op) {
           return Enum.GetValues(typeof(NumberOfANumber))
                      .Cast<NumberOfANumber>()
                      .Select(elements).Aggregate(op);
        }
    }

    Take a moment to contemplate the beautiful abstraction of this functional method once again. Harry’s work, no doubt!

    The Input Provider

    The only remaining piece of the puzzle is the source of the numbers, i.e. the IAsynchronousInputProvider interface. Its implementation is a fairly arbitrary choice at this point - most probably the customer will want to customize it later, but for the purposes of our MVP we shall implement a simple sequential asynchronous generator of Peano numbers {1, 2, 3, …, 10}:

    class NumberSequenceProvider: IAsynchronousInputProvider {
        private event NumberHandler handler;
        private ManualResetEvent handlerAvailable;
     
        public NumberSequenceProvider() {
            handlerAvailable = new ManualResetEvent(false);
            new Thread(ProduceNumbers).Start();
        }
        public void AddNumberListener(NumberHandler nh) {
            handler += nh;
            handlerAvailable.Set();
        }
        private void ProduceNumbers() {
            handlerAvailable.WaitOne();
            PeanoInteger pi = null;
            foreach (var v in Enum.GetValues(typeof(NumberOfANumber))
                                  .Cast<NumberOfANumber>()) {
                    pi = new PeanoInteger(pi);
                    handler(v, pi);
            }
        }
    }

    Note that we have to be careful to not start publishing the inputs before the number processing subsystem attaches to the input producer. To achieve that we rely on the event semaphore synchronization primitive. At this point we can clearly see the benefit of choosing a powerful, enterprise-grade platform from the start! Semaphores would look much clumsier in Haskell, don’t you think, Harry? (Harry disagrees)

    So here we are - we have a solid, enterprise-grade, asynchronous, configurable implementation for an abstractly defined addition of abstractly defined numbers, using an abstract input-output mechanism.

    $> dotnet run
    1111111111111111111111111111111111111111111111111111111

    We do need some more months to ensure full test coverage, update our numerous UML diagrams, write documentation for users and API docs for developers, work on packaging and installers for various platforms, arrange marketing and sales for the project (logo, website, Facebook page, customer relations, all that, you know), and attract investors. Investors could then propose to pivot the product into a blockchain-based, distributed solution. Luckily, thanks to our rock solid design abstractions, this would all boil down to reimplementing just a few of the lower-level interfaces!

    Software engineering is fun, isn’t it?

    The source code for the developed solution is available here.

  • Posted by Konstantin 06.12.2017 6 Comments

    Early stopping is a technique that is very often used when training neural networks, as well as with some other iterative machine learning algorithms. The idea is quite intuitive - let us measure the performance of our model on a separate validation dataset during the training iterations. We may then observe that, despite constant score improvements on the training data, the model's performance on the validation dataset would only improve during the first stage of training, reach an optimum at some point and then start getting worse with further iterations.

    The early stopping principle

    It thus seems reasonable to stop training at the point when the minimal validation error is achieved. Training the model any further only leads to overfitting. Right? The reasoning sounds solid and, indeed, early stopping is often claimed to improve generalization in practice. Most people seem to take the benefit of the technique for granted. In this post I would like to introduce some skepticism into this view or at least illustrate that things are not necessarily as obvious as they may seem from the diagram with the two lines above.

    How does Early Stopping Work?

    To get a better feeling of what early stopping actually does, let us examine its application to a very simple "machine learning model" - the estimation of the mean. Namely, suppose we are given a sample of 50 points \mathbf{x}_i from a normal distribution with unit covariance and we need to estimate the mean \mathbf{w} of this distribution.

    Sample

    The maximum likelihood estimate of \mathbf{w} can be found as the point which has the smallest sum of squared distances to all the points in the sample. In other words, "model fitting" boils down to finding the minimum of the following objective function:

        \[f_\mathrm{train}(\mathrm{w}) := \sum_{i=1}^{50} \Vert \mathbf{x}_i - \mathbf{w}\Vert^2\]

    As our estimate is based on a finite sample, it, of course, won't necessarily be exactly equal to the true mean of the distribution, which I chose in this particular example to be exactly (0,0):

    Sample mean as a minimum of the objective function

    The circles in the illustration above are the contours of the objective function, which, as you might guess, is a paraboloid bowl. The red dot marks its bottom and is thus the solution to our optimization problem, i.e. the estimate of the mean we are looking for. We may find this solution in various ways. For example, a natural closed-form analytical solution is simply the mean of the training set. For our purposes, however, we will be using the gradient descent iterative optimization algorithm. It is also quite straightforward: start with any point (we'll pick (-0.5, 0) for concreteness' sake) and descend in small steps until we reach the bottom of the bowl:

    Gradient descent

    Let us now introduce early stopping into the fitting process. We will split our 50 points randomly into two separate sets: 40 points will be used to fit the model and 10 will form the early stopping validation set. Thus, technically, we now have two different objective functions to deal with:

        \[f_\mathrm{fit}(\mathrm{w}) := \sum_{i=1}^{40} \Vert \mathbf{x}_i - \mathbf{w}\Vert^2\]

    and

        \[f_\mathrm{stop}(\mathrm{w}) := \sum_{i=41}^{50} \Vert \mathbf{x}_i - \mathbf{w}\Vert^2.\]

    Each of those defines its own "paraboloid bowl", both slightly different from the original one (because they are based on different subsets of the data):

    Fitting and early stopping objectives

    As our algorithm descends towards the red point, we will be tracking the value of f_\mathrm{stop} at each step along the way:

    Gradient descent with validation

    With a bit of imagination you should see in the image above how the validation error decreases as the yellow trajectory approaches the purple dot and then starts to increase after some point midway. The spot where the validation error achieves its minimum (and thus the result of the early stopping algorithm) is shown by the green dot on the figure below:

    Early stopping

    In a sense, the validation function now acts as a kind of a "guardian", preventing the optimization from converging towards the bottom of our main objective. The algorithm is forced to settle on a model, which is neither an optimum of f_\mathrm{fit} nor of f_\mathrm{stop}. Moreover, both f_\mathrm{fit} and f_\mathrm{stop} use less data than f_\mathrm{train}, and are thus inherently a worse representation of the problem altogether.

    So, by applying early stopping we effectively reduced our training set size, used an even less reliable dataset to abort training, and settled on a solution which is not an optimum of anything at all. Sounds rather stupid, doesn't it?

    Indeed, observe the distribution of the estimates found with (blue) and without (red) early stopping in repeated experiments (each time with a new random dataset):

    Solutions found with and without early stopping

    As we see, early stopping greatly increases the variance of the estimate and adds a small bias towards our optimization starting point.
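
    For the curious, here is a minimal numpy sketch of a single run of this experiment. The step size, iteration count and random seed are my own illustrative choices, not necessarily the ones behind the figures above:

    import numpy as np

    rng = np.random.RandomState(0)
    points = rng.randn(50, 2)                 # 50 points, true mean (0, 0)
    fit, stop = points[:40], points[40:]      # 40 for fitting, 10 for early stopping

    def grad(w, data):                        # gradient of sum ||x_i - w||^2
        return 2 * (len(data) * w - data.sum(axis=0))

    w = np.array([-0.5, 0.0])                 # starting point
    best_w, best_val = w, np.sum((stop - w) ** 2)
    for _ in range(1000):
        w = w - 0.001 * grad(w, fit)          # gradient descent on the fitting set
        val = np.sum((stop - w) ** 2)         # validation error on the stopping set
        if val < best_val:                    # keep the best point seen along the way
            best_w, best_val = w, val

    print("Early-stopped estimate:", best_w)
    print("Plain sample mean:     ", points.mean(axis=0))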

    Finally, let us see how the quality of the fit depends on the size of the validation set:

    Fit quality vs validation set size

    Here the y axis shows the squared distance of the estimated point to the true value (0,0); smaller is better (the dashed line is the expected distance of a randomly picked point from the data). The x axis shows all possible sizes of the validation set. We see that using no early stopping at all (x=0) results in the best expected fit. If we do decide to use early stopping, then for best results we should split the data approximately equally into training and validation sets. Interestingly, there does not seem to be much difference in whether we pick 30%, 50% or 70% of the data for the validation set - the validation set seems to play just as much of a role in the final estimate as the training data.

    Early Stopping with Non-convex Objectives

    The experiment above seems to demonstrate that early stopping should be almost certainly useless (if not harmful) for fitting simple convex models. However, it is never used with such models in practice. Instead, it is most often applied to the training of multilayer neural networks. Could it be the case that the method somehow becomes useful when the objective is highly non-convex? Let us run a small experiment, measuring the benefits of early stopping for fitting a convolutional neural network on the MNIST dataset. For simplicity, I took the standard example from the Keras codebase, and modified it slightly. Here is the result we get when training the most basic model:

    MNIST - Basic

    The y axis depicts log-loss on the 10k MNIST test set, the x axis shows the proportion of the 60k MNIST training set set aside for early stopping. Ignoring small random measurement noise, we may observe that using early stopping with about 10% of the training data does seem to convey a benefit. Thus, contrary to our previous primitive example, when the objective is complex, early stopping does work as a regularization method. Why and how does it work here? Here's one intuition I find believable (there are alternative possible explanations and measurements, none of which I find too convincing or clear, though): stopping the training early prevents the algorithm from walking too far away from the initial parameter values. This limits the overall space of models and is vaguely analogous to suppressing the norm of the parameter vector. In other words, early stopping resembles an ad-hoc version of \ell_p regularization.

    Indeed, observe how the use of early stopping affects the results of fitting the same model with a small \ell_2-penalty added to the objective:

    MNIST - L2

    All of the benefits of early stopping are gone now, and the baseline (non-early-stopped, \ell_2-regularized) model is actually better overall than it was before. Let us now try an even more heavily regularized model by adding dropout (instead of the \ell_2 penalty), as is customary for deep neural networks. We can observe an even cleaner result:

    MNIST - Dropout

    Early stopping is again not useful at all, and the overall model is better than all of our previous attempts.
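
    If you would like to experiment with this yourself, here is a minimal sketch of how early stopping is typically wired up in modern Keras. This is not the actual code behind the plots above - the architecture and all the parameter values are purely illustrative:

    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

    model = keras.Sequential([
        keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # validation_split sets aside the given fraction of the training data;
    # EarlyStopping aborts training once the validation loss stops improving
    # and restores the best weights seen so far.
    early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                               restore_best_weights=True)
    model.fit(x_train, y_train, epochs=50, validation_split=0.1,
              callbacks=[early_stop])
    print(model.evaluate(x_test, y_test))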

    Conclusion: Do We Need Early Stopping?

    Given the reasoning and the anecdotal experimental evidence above, I personally tend to think that beliefs in the usefulness of early stopping (in the context of neural network training) may be well overrated. Even if it may improve generalization for some nonlinear models, you would most probably achieve the same effect more reliably using other regularization techniques, such as dropout or a simple \ell_2 penalty.

    Note, though, that there is a difference between early stopping in the context of neural networks and, say, boosting models. In the latter case early stopping is actually more explicitly limiting the complexity of the final model and, I suspect, might have a much more meaningful effect. At least we can't directly carry over the experimental examples and results in this blog post to that case.

    Also note that, no matter whether early stopping helps or harms the generalization of the trained model, it is still a useful heuristic for deciding when to stop a lengthy training process automatically if we simply need results that are good enough.

     

  • Posted by Konstantin 25.07.2017 3 Comments

    Every student of computer science who has managed to keep even a tiny shred of attention in their algorithms course should know that sorting n numbers is a task that requires at least \Omega(n \log n) time in general. There are some special cases, such as sorting small integers, where you can use counting sort or radix sort to beat this baseline, but as long as your numbers are hypothetically arbitrarily large, you are stuck with the \Omega(n \log n) lower bound. Right?

    Well, not really. One thing that many algorithms courses tend to skim over rather briefly is the discussion of the choice of the computation model, under which the algorithm of interest is supposed to run. In particular, the \Omega(n \log n) bound for sorting holds for the comparison-only model of computation — the abstract situation where the algorithm may only perform pairwise comparisons of the numbers to be sorted. No arithmetic, bit-shifts or anything else your typical processor is normally trained to do is allowed. This is, obviously, not a very realistic model for a modern computer.

    Let us thus consider a different computation model instead, which allows our computer to perform any of the basic arithmetic or bitwise operations on numbers in constant time. In addition, to be especially abstract, let us also assume that our computer is capable of handling numbers of arbitrary size. This is the so-called unit-cost RAM model.

    It turns out that in this case one can sort arbitrarily large numbers in linear time. The method for achieving this (presented in the work of W. Paul and J. Simon, not to be confused with Paul Simon) is completely impractical, yet quite insightful and amusing (in the geeky sense). Let me illustrate it here.

    Paul-and-Simon Sorting

    The easiest way to show an algorithm is to step it through an example. Let us therefore consider the example task of sorting the following array of three numbers:

    a = [5, 3, 9]

    Representing the same numbers in binary:

    [101, 11, 1001]

    Our algorithm starts with a linear pass to find the bit-width of the largest number in the array. In our case the largest number is 9 and has 4 bits:

    bits = max(x.bit_length() for x in a)      # bits = 4 (bit-width of the largest element)
    n = len(a)                                 # n = 3

    Next the algorithm will create a (4+1)\cdot 3^2 = 45-bit number A of the following binary form:

     1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9}

    where {9}, {3} and {5} denote the 4-bit representations of the corresponding numbers. In simple terms, we need to pack each array element repeated n times together into a single number. It can be computed in linear time using, for example, the following code:

    temp, A = 0, 0
    for x in a:
        temp = (temp<<(n*(bits+1))) + (1<<bits) + x
    for i in range(n):
        A = (A<<(bits+1)) + temp

    The result is 23834505373497, namely:

    101011010110101100111001110011110011100111001

    Next, we need to compute another 45-bit number B, which will also pack all the elements of the array n times, however this time they will be separated by 0-bits and interleaved as follows:

     0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}

    This again can be done in linear time:

    temp, B = 0, 0
    for x in a:
        temp = (temp<<(bits+1)) + x
    for i in range(n):
        B = (B<<(n*(bits+1))) + temp

    The result is 5610472248425, namely:

    001010001101001001010001101001001010001101001

    Finally, here comes the magic trick: we subtract B from A. Observe how with this single operation we now actually perform all pairwise subtractions of the numbers in the array:

    A = 1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9} 
    B = 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}

    Consider what happens to the bits separating all the pairs. If the number on top is greater than or equal to the number on the bottom of the pair, the corresponding separating bit on the left will not be borrowed from in the subtraction, and the corresponding bit of the result will be 1. However, whenever the number on the top is less than the number on the bottom, the resulting bit will be zeroed out due to borrowing:

    A   = 1 {5} 1 {5} 1 { 5} 1 { 3} 1 {3} 1 { 3} 1 {9} 1 {9} 1 {9} 
    B   = 0 {5} 0 {3} 0 { 9} 0 { 5} 0 {3} 0 { 9} 0 {5} 0 {3} 0 {9}
    A-B = 1 {0} 1 {2} 0 {12} 0 {14} 1 {0} 0 {10} 1 {4} 1 {6} 1 {0}

    The same in binary (highlighted groups correspond to repetitions of the original array elements in the number A):

    A   = 1 0101 1 0101 1 0101|1 0011 1 0011 1 0011|1 1001 1 1001 1 1001
    B   = 0 0101 0 0011 0 1001|0 0101 0 0011 0 1001|0 0101 0 0011 0 1001
    A-B = 1 0000 1 0010 0 1100|0 1110 1 0000 0 1010|1 0100 1 0110 1 0000
    

    Each "separator" bit of A-B is effectively the result of a comparison of every array element with every other. Let us now extract these bits using a bitwise AND and sum them within each group. It takes another couple of linear passes:

    x = (A-B) >> bits
    mask, result = 0, 0
    for i in range(n):
        mask = (mask<<(n*(bits+1))) + 1
    for i in range(n):
        result += x & mask
        x = x >> (bits+1)

    The result is now the following number:

    result = 10|000000000000001|000000000000011

    It is a packed binary representation of the array r = [2, 1, 3]. The number 2 here tells us that there are two elements in a which are less than or equal to a[0]=5. Similarly, the number 1 says that there is only one element less than or equal to a[1]=3, and the number 3 means there are three elements less than or equal to a[2]=9. In other words, this is an array of ranks, which tells us how the original array elements should be rearranged into sorted order:

    r = [result >> (n*(bits+1)*(n-i-1)) & ((1<<(n*(bits+1)))-1) 
                                              for i in range(n)]
    a_sorted = [None]*n
    for i in range(n):
        a_sorted[r[i]-1] = a[i]
    

    And voilà, the sorted array! As presented above, the method would only work for arrays consisting of distinct non-negative integers. However, with some modifications it can be adapted to arbitrary arrays of integers or floats. This is left as an exercise to the reader.

    The General Implications

    There are several things one can learn from the "Paul-and-Simon sort". Firstly, it shows the immense power of the unit-cost RAM computational model. By packing arbitrary amounts of data into a single register of unlimited size, we may force our imaginary computer to perform enormously complex parallel computations in a single step. Indeed, it is known that PSPACE-complete problems can be solved in polynomial time in the unlimited-precision RAM model. This, however, assumes that the machine can do arbitrary arithmetic operations. If you limit it to only additions, subtractions and multiplications (but not divisions or bit-shifts), you still cannot sort integers faster than \Omega(n \log n) even using infinitely-sized registers (this is the main result of Paul and Simon's article that inspired this post). Not obvious, is it?

    Secondly, real computers can, of course, usually only perform constant-time operations on registers of a fixed size. This is formalized in the w-bit word-RAM model, and in this model the "Paul and Simon sort" degrades from an O(n) into an O(n^3) algorithm (with O(n^2) memory consumption). This is a nice illustration of how the same algorithm can have different complexity depending on the chosen execution model.

    The third thing that the "Paul and Simon sort" highlights very clearly is the power of arithmetic operations on packed values and bitstrings. In fact, this idea has been applied to derive practically usable integer sorting algorithms with nearly-linear complexity. The paper by Han & Thorup expresses the idea quite well:

    Excerpt from Han & Thorup, "Integer Sorting in O(n sqrt(log log n)) Expected Time and Linear Space".

    In case you need the full code of the step-by-step explanation presented above, here it is.
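
    For convenience, here is also the same step-by-step code consolidated into a single function. It is a minimal sketch assuming, as discussed above, an array of distinct non-negative integers (the function name is chosen just for this post):

    def paul_simon_sort(a):
        n = len(a)
        bits = max(x.bit_length() for x in a)
        # Pack n copies of each element, each prefixed by a 1-bit separator.
        temp, A = 0, 0
        for x in a:
            temp = (temp << (n*(bits+1))) + (1 << bits) + x
        for i in range(n):
            A = (A << (bits+1)) + temp
        # Pack n copies of the whole array, with 0-bit separators.
        temp, B = 0, 0
        for x in a:
            temp = (temp << (bits+1)) + x
        for i in range(n):
            B = (B << (n*(bits+1))) + temp
        # A single subtraction performs all pairwise comparisons at once.
        x = (A - B) >> bits
        mask, result = 0, 0
        for i in range(n):
            mask = (mask << (n*(bits+1))) + 1
        for i in range(n):
            result += x & mask
            x = x >> (bits+1)
        # Unpack the ranks and place the elements into their sorted positions.
        r = [(result >> (n*(bits+1)*(n-i-1))) & ((1 << (n*(bits+1))) - 1)
             for i in range(n)]
        a_sorted = [None]*n
        for i in range(n):
            a_sorted[r[i]-1] = a[i]
        return a_sorted

    print(paul_simon_sort([5, 3, 9]))   # [3, 5, 9]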


  • Posted by Konstantin 09.07.2017 No Comments

    The Dark Side of the Bitcoin

    Recall that Bitcoin is a currency, i.e. a technology which aims to provide a store of value along with a payment medium. With all due respect to its steadily growing adoption, it would be fair to note that it is currently not very good at fulfilling either of these two functions. Firstly, it is not a very reliable store of value due to extreme volatility in the price. Secondly, and most importantly, it is a mediocre payment medium because it is slow and expensive.

    A typical transfer costs around $2 nowadays and takes about an hour for a full confirmation (or longer, if you pay a smaller fee). When you need to transfer a million dollars, this looks like a reasonable deal. When you buy a chocolate bar at a grocery store (something one probably does more often than transferring a million), it is unacceptable. Any plain old bank's payment card would offer a faster and cheaper solution, which is ironic, given that Bitcoin was meant to be all friendly, distributed and free (as in freedom) while banks are, as we all know, evil empires hungry for our money, flesh and souls.

    The irony does not end here. The evil banks typically provide some useful services in exchange for the fees they collect, such as an online self-service portal, 24h support personnel, cash handling and ATMs, some security guarantees, interest on deposits, etc. The friendly Bitcoin offers nothing of this kind. What is Bitcoin wasting our money on then? Electricity, mainly! The Proof of Work (PoW) algorithm employed in Bitcoin's blockchain requires the computation of quintillions of random, meaningless hashes to "confirm" payments. The "miner" nodes running Bitcoin's network are collectively performing more than 5 000 000 000 000 000 000 (five quintillion or five exa-) hash computations every second, continuously consuming as much electricity as the whole country of Turkmenistan. The situation is even worse if you consider that Bitcoin is just one of the many "coins" built upon the PoW algorithm (Ethereum and Litecoin being two other prominent examples), and their overall power consumption is only growing with each day.

    Just think of it: most of the $2 fee a Bitcoin user needs to pay for a transaction will neither end up as someone's wage nor make a return on investment in someone's pocket. Instead, it will burn up in fossil fuels which generate power for the "miners", wasting precious resources of our planet, contributing to global warming and pushing poor polar bears faster towards extinction. Is all this mayhem at least a "necessary evil"? Sadly, it is not.

    The Unnecessary Evil

    Formally speaking, Proof of Work is an algorithm for achieving consensus among a distributed set of nodes which collectively maintain a common blockchain. Is it the only such algorithm? Of course not! Many alternative methods exist, and most of them (if not all) are both faster and less energy-hungry. In fact, the only valuable property of PoW is its ingenious simplicity. In terms of implementation it may very well be among the simplest distributed blockchain consensus algorithms ever invented.

    It is natural that a successful pioneering technology (such as the Bitcoin) is originally built from simple blocks. Progress comes in small steps and you cannot innovate on all fronts at once, after all. There must come a time, however, when the limitations of the initially chosen basic blocks become apparent and the technology gets upgraded to something more efficient. With more than a billion dollars in electricity bills paid by Bitcoin users last year for the inefficiency of PoW, Bitcoin has long surpassed this turning point, in my opinion.

    Unfortunately, due to its pioneering status, enormous inertia, ongoing hype and the high stakes involved, Bitcoin continues to roll on its old wooden proof-of-work wheels with no improvement in sight, somehow still being perceived as the leader in the brave new world of cryptocurrencies.

    Are nearly-instant and nearly-free payments, along with energy efficiency, too much to ask from a real "currency of the future"? I do not think so. In fact, Bitcoin could be such a currency, if only it could switch from the evil Proof of Work to a different, fast and eco-friendly consensus algorithm.

    Which algorithm could it be? Let me offer you an overview of some of the current options I am personally aware of, so you could decide for yourself.

    The Eco-Friendly Blockchain Consensus

    Consider a network of many nodes, which needs to maintain a common state for a chain of blocks. There seem to be roughly three general categories of algorithms which the nodes could employ for their purpose: Proof of Authority (PoA), Nakamoto Consensus, and Byzantine Fault Tolerance (BFT). Let us consider them in order.

    Proof of Authority

    Perhaps the most straightforward solution would be to nominate a fixed subset of nodes as "authoritative", and let any of them append new blocks by signing them cryptographically. To avoid conflicting updates, nodes may agree on a predefined round-robin signing order, honestly randomize their waiting intervals, or use some kind of a deterministic lottery for selecting the signer of the next block, etc.

    As this approach relies on a fixed subset of (reasonably) trusted nodes, it does not look robust and secure enough for a proper worldwide distributed blockchain. For example, in the limit case of a single trusted party it is equivalent to using a single service provider such as a bank. None the less, it is a convenient baseline and an important primitive, actually applicable to a wide range of real-life blockchain deployments. By relying on a set of well-behaving parties, a PoA blockchain actually sidesteps most of the complexities of a real distributed algorithm, and can thus be made to perform much faster than any of the "truly distributed" algorithms.

    The Ethereum software provides an implementation of this approach for those who want to run private chains. PeerCoin relies on the PoA principle by having "checkpoint blocks" signed regularly by a trusted authority. Finally, the Delegated Proof of Stake algorithm makes PoA work on a larger scale by relying on voting. It is probably one of the most interesting practical implementations of the idea.

    Delegated Proof of Stake

    Delegated Proof of Stake (DPoS) is a consensus algorithm implemented in Graphene-based blockchains (BitShares, Steem, EOS). It is a variant of Proof of Authority, where the small set of authoritative delegate nodes is elected by voting. When electing the delegates, each node can cast a number of votes proportional to its account value (or "stakeholder share"), thus "delegating its stake in the network". The elected authorities then participate in a simple and fast round-robin block confirmation, with each node given a two-second window for confirming the next block.

    The security of DPoS hinges on the assumption that the nodes with the most stake in the system should generally manage to elect a set of reasonable authorities, and that in case of errors the misbehaving authorities will not cause too much trouble and will be quickly voted out. At the same time, being internally a PoA implementation, the DPoS-based blockchains are an order of magnitude faster in terms of transaction throughput than any other currently running public blockchain. Notably, they can also naturally support fee-less transactions.

    Nakamoto Consensus

    Consider the variation of PoA where there are no pre-selected trusted nodes (i.e. all nodes may participate in the algorithm). Each time a new block needs to be added to the chain, let us pick the node that will gain the right to add it according to some deterministic "lottery" system. Consensus can then be achieved by simply verifying that the resulting blockchain conforms to the lottery rules at all times, with conflicting chains resolved by always preferring the "harder" chain (according to some notion of "hardness").

    The infamous Proof-of-Work is an example of such a method. The "lottery" here is based on the ability of a node to find a suitable nonce value. The "hardness" is simply the length of the chain. Such "lottery" methods are sometimes referred to as "Nakamoto consensus algorithms". In terms of efficiency, Nakamoto consensus algorithms are among the slowest consensus algorithms.

    Several alternatives to the "PoW lottery" have been proposed. Let us review some of them.

    Proof of Stake

    Proof of Stake (PoS), first implemented in the Nxt cryptocurrency, is a Nakamoto consensus technique, where the nodes with a greater balance on their account are given a higher chance to "win the lottery" and sign the next block. The actual technique used in Nxt is the following: before signing a block every node obtains a pseudo-random "lottery ticket number" x by hashing the last block data with its own identifier. If this number is smaller than

        \[\alpha \cdot \text{(account balance)}\cdot \text{(time since last block)},\]

    (where \alpha is a block-specific constant), the node gets the right to sign the next block. The higher the node's balance, the higher is the probability it will get a chance to sign. The rationale is that nodes with larger balances have more at stake, are more motivated to behave honestly, and thus need to be given more opportunities to participate in generating the blockchain.
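
    A minimal sketch of such a lottery check is shown below. It is purely illustrative: the choice of SHA-256, the eight-byte ticket, and the function and parameter names are arbitrary here and do not correspond to the actual Nxt implementation.

    import hashlib

    def may_sign_block(last_block_hash, node_id, balance, seconds_since_last_block, alpha=1):
        # Derive a pseudo-random "lottery ticket" from the last block and the node's identity.
        ticket = int.from_bytes(hashlib.sha256(last_block_hash + node_id).digest()[:8], 'big')
        # The node wins if its ticket falls below a target that grows with
        # its balance and with the time elapsed since the last block.
        return ticket < alpha * balance * seconds_since_last_block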

    Proof of Stake is typically considered the primary alternative to Proof of Work without all the wasteful computation, and it should, in principle, be possible to transition the whole blockchain from the latter to the former. In fact, this is what may well happen to Ethereum eventually.

    Proof of Space

    In Proof of Space (PoSpace), a consensus mechanism implemented in Burstcoin, the "miners" must first pre-generate a set of "lottery ticket numbers" in a particular manner for themselves, save these numbers on a hard drive and commit the hash (the Merkle tree root) of this complete ticket set to the blockchain. Then, similarly to Proof of Stake, by hashing the last block's data, a miner deterministically picks one of his own "lottery tickets" for the next block. If the value of this ticket, discounted by the number of tickets in possession, is small enough, the miner gets the right to sign the block. The more tickets a miner generates and stores, the better are his chances. When signing the block, the miner must present a couple of special hashes which he can only know if he constantly stores his complete set of tickets (or fully recomputes a large part of it every time, which is impractical). Consequently, instead of spending energy on the "mining" process, the nodes must constantly dedicate a certain amount of disk space to the algorithm.

    Although it is probably among the less widely known methods, from both a technical and a practical standpoint it is one of the most interesting techniques, in my opinion. Note how it combines the properties of PoS (speed and energy efficiency) with those of PoW (ownership of a real-world resource as a proxy for decentralization).
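
    A heavily simplified sketch of the idea (not Burstcoin's actual plotting and verification scheme: the hashes, the "commitment" stand-in for the Merkle tree root and all names below are made up for illustration):

    import hashlib

    def h(data):
        return hashlib.sha256(data).digest()

    def generate_tickets(node_seed, count):
        # Pre-generate the node's "lottery tickets" and keep them on disk.
        return [h(node_seed + i.to_bytes(8, 'big')) for i in range(count)]

    def commitment(tickets):
        # Stand-in for the Merkle tree root committed to the blockchain.
        return h(b''.join(tickets))

    def may_sign(tickets, last_block_hash, target):
        # The last block deterministically picks one of the node's tickets.
        index = int.from_bytes(h(last_block_hash), 'big') % len(tickets)
        value = int.from_bytes(tickets[index], 'big')
        # Discount by the number of tickets stored: more disk space, better odds.
        return value // len(tickets) < target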

    Proof of Burn

    The idea behind Proof of Burn is to allow the nodes to generate their "lottery ticket numbers" by irretrievably transferring some coins to a nonexistent address and taking the hash of the resulting transaction. The resulting hash, scaled by the amount of coins burned, can then be used to gain the right to sign blocks just like in other Nakamoto lottery systems. The act of wasting coins is meant to be a virtual analogue of spending electricity on PoW mining, without actually spending it. Blockchains based purely on Proof of Burn do not seem to exist at the moment. However, the technique can be used alongside PoW, PoS or other approaches.

    Proof of Elapsed Time

    Presumably, some Intel processors have specialized instructions for emitting signed tokens, which prove that a given process called a particular function a certain period of time ago. The Hyperledger project proposes to build a consensus algorithm around those. Each "miner" will gain the right to sign a block after it waits for a certain period of time. The token which proves that the miner did in fact wait the allotted time would act as a winning lottery ticket. I do not see how this method could work outside of the trusted Intel-only environment or how it is better than a trivialized Proof of Stake (not sure I even understood the idea correctly), but I could not help mentioning it here for completeness' sake.

    Hybrid Nakamoto Consensus Systems

    Some systems interleave PoW and PoS confirmations, or add PoA signatures from time to time to lock the chain or speed up block confirmations. In fact, it is not too hard to invent nearly arbitrary combinations of delegation, voting, payments, authorities and lotteries.

    Byzantine Fault Tolerance

    The Practical Byzantine Fault Tolerance (PBFT) algorithm offers an alternative solution to the consensus problem. Here the blockchain state is tracked by a set of "bookkeeping" nodes, which constantly broadcast all changes among themselves and consider a change reliably replicated when it is signed and confirmed by a given quorum (e.g. 2/3) of the bookkeepers. Algorithms of this type can be shown to be reliable if no more than a third of the nodes are dishonest. Ripple, Stellar and Antshares are examples of blockchains based on such techniques. This approach allows much higher transaction throughput than Nakamoto consensus (PoW, PoS, PoSpace), yet it still lags behind the speed of PoA or DPoS.


  • Posted by Konstantin 02.05.2017 2 Comments

    I happen to use the Amazon cloud machines from time to time for various personal and work-related projects. Over the years I've accumulated a terabyte or so of data files there. Those are mostly useless intermediate results or expired back-ups, which should be deleted and forgotten, but I could not gather the strength for that. "What if those datafiles happen to be of some archaeological interest 30 years from now?", I thought. Keeping them just lying there on an Amazon machine is, however, a waste of money - it would be cheaper to download them all onto a local hard drive and tuck it somewhere into a dark dry place.

    But what would be the fastest way to download a terabyte of data from the cloud? Obviously, large downstream bandwidth is important here, but so is a smart choice of the transfer technology. To my great surprise, googling did not provide me with a simple and convincing answer. A question posted to StackOverflow did not receive any informative replies and even got downvoted for reasons beyond my understanding. It is the year 2017, but downloading a file is still not an obvious matter, apparently.

    Unhappy with such a state of affairs, I decided to compare some of the standard ways of downloading a file from a cloud machine. Although the resulting measurements are very configuration-specific, I believe the overall results might still generalize to a wider scope.

    Experimental Setup

    Consider the following situation:

    • An m4.xlarge AWS machine (which is claimed to have "High" network bandwidth) located in the EU (Ireland) region, with an SSD storage volume (400 Provisioned IOPS) attached to it.
    • A 1GB file with random data, generated on that machine using the following command:
      $ dd if=/dev/urandom of=file.dat bs=1M count=1024
    • The file needs to be transferred to a university server located in Tartu (Estonia). The server has a decently high network bandwidth and uses a mirrored-striped RAID for its storage backend.

    Our goal is to get the file from the AWS machine into the university server in the fastest time possible. We will now try eight different methods for that, measuring the mean transfer time over 5 attempts for each method.

    File Download Methods

    One can probably come up with hundreds of ways for transferring a file. The following eight are probably the most common and reasonably easy to arrange.

    1. SCP (a.k.a. SFTP)

    • Server setup: None (the SSH daemon is usually installed on a cloud machine anyway).
    • Client setup: None (if you can access a cloud server, you have the SSH client installed already).
    • Download command:

      scp -i ~/.ssh/id_rsa.amazon \
               ubuntu@$REMOTE_IP:/home/ubuntu/file.dat .

    2. RSync over SSH

    • Server setup: sudo apt install rsync (usually installed by default).
    • Client setup: sudo apt install rsync (usually installed by default).
    • Download command:

      rsync -havzP --stats \
            -e "ssh -i $HOME/.ssh/id_rsa.amazon" \
            ubuntu@$REMOTE_IP:/home/ubuntu/file.dat .

    3. Pure RSync

    • Server setup:
      Install RSync (usually already installed):

      sudo apt install rsync

      Create /etc/rsyncd.conf with the following contents:

      pid file = /var/run/rsyncd.pid
      lock file = /var/run/rsync.lock
      log file = /var/log/rsync.log
      
      [files]
      path = /home/ubuntu

      Run the RSync daemon:

      sudo rsync --daemon
    • Client setup: sudo apt install rsync (usually installed by default).
    • Download command:

      rsync -havzP --stats \
            rsync://$REMOTE_IP/files/file.dat .

    4. FTP (VSFTPD+WGet)

    • Server setup:
      Install VSFTPD:

      sudo apt install vsftpd

      Edit /etc/vsftpd.conf:

      listen=YES
      listen_ipv6=NO
      # pasv_address must be set to the public IP of the AWS machine
      pasv_address=52.51.172.88

      Create password for the ubuntu user:

      sudo passwd ubuntu

      Restart vsftpd:

      sudo service vsftpd restart
    • Client setup: sudo apt install wget (usually installed by default).
    • Download command:

      wget ftp://ubuntu:somePassword@$REMOTE_IP/file.dat

    5. FTP (VSFTPD+Axel)

    Axel is a command-line tool which can download through multiple connections thus increasing throughput.

    • Server setup: See 4.
    • Client setup: sudo apt install axel
    • Download command:

      axel -a ftp://ubuntu:somePassword@$REMOTE_IP/home/ubuntu/file.dat

    6. HTTP (NginX+WGet)

    • Server setup:
      Install NginX:

      sudo apt install nginx

      Edit /etc/nginx/sites-enabled/default, add into the main server block:

      location /downloadme {
          alias /home/ubuntu;
          gzip on;
      }

      Restart nginx:

      sudo service nginx restart
    • Client setup: sudo apt install wget (usually installed by default).
    • Download command:

      wget http://$REMOTE_IP/downloadme/file.dat

    7. HTTP (NginX+Axel)

    • Server setup: See 6.
    • Client setup: sudo apt install axel
    • Download command:

      axel -a http://$REMOTE_IP/downloadme/file.dat

    8. AWS S3

    The last option we try is first transferring the files onto an AWS S3 bucket, and then downloading from there using S3 command-line tools.

    • Server setup:
      Install and configure AWS command-line tools:

      sudo apt install awscli
      aws configure

      Create an S3 bucket:

      aws --region us-east-1 s3api create-bucket \
          --acl public-read-write --bucket test-bucket-12345 \
          --region us-east-1

      We create the bucket in the us-east-1 region because the S3 tool seems to have a bug at the moment which prevents one from using it in the EU regions.

      Next, we transfer the file to the S3 bucket:

      aws --region us-east-1 s3 cp file.dat s3://test-bucket-12345
    • Client setup:
      Install and configure AWS command-line tools:

      sudo apt install awscli
      aws configure
    • Download command:

      aws --region us-east-1 s3 cp s3://test-bucket-12345/file.dat .

    Results

    Here are the measurement results. In the case of the S3 method we report the total time needed to upload from the server to S3 and download from S3 to the local machine. Note that I did not bother to fine-tune any of the settings - it may very well be possible that some of the methods can be sped up significantly by configuring the servers appropriately. Consider the results below as indicative of the "out of the box" performance of the corresponding approaches.

    Although S3 comes up as the fastest method (and might be even faster if it worked out of the box with the European datacenter), RSync is only marginally slower, yet it is easier to use, usually requires no additional set-up and handles incremental downloads very gracefully. I would thus summarize the results as follows:

    Whenever you need to download large files from the cloud, consider RSync over SSH as the default choice.


  • Posted by Konstantin 29.03.2017 1 Comment

    The following is an expanded version of an explanatory comment I posted here.

    Alice's Diary

    Alice decided to keep a diary. For that she bought a notebook, and started filling it with lines like:

    1. Bought 5 apples.
    2. Called mom.
      ....
    132. Gave Bob $250.
    133. Kissed Carl.
    134. Ate a banana.
      ...

    Alice did her best to keep a meticulous account of events, and whenever she had a discussion with friends about something that happened earlier, she would quickly resolve all arguments by taking out the notebook and demonstrating her records. One day she had a dispute with Bob about whether she had lent him $250 earlier or not. Unfortunately, Alice did not have her notebook at hand at the time of the dispute, but she promised to bring it the next day to prove that Bob owed her money.

    Bob really did not want to return the money, so that night he got into Alice's house, found the notebook, found line 132 and carefully replaced it with "132. Kissed Dave". The next day, when Alice opened the notebook, she did not find any records about money being given to Bob, and had to apologize for making a mistake.

    Alice's Blockchain

    A year later Bob's conscience got to him and he confessed his crime to Alice. Alice forgave him, but decided to improve the way she kept the diary, to avoid the risk of forging records in the future. Here's what she came up with. The operating system Linups that she was using had a program named md5sum, which could convert any text to its hash - a strange sequence of 32 characters. Alice did not really understand what the program did with the text, it just seemed to produce a sufficiently random sequence. For example, if you entered "hello" into the program, it would output "b1946ac92492d2347c6235b4d2611184", and if you entered "hello " with a space at the end, the output would be "1a77a8341bddc4b45418f9c30e7102b4".

    Alice scratched her head a bit and invented the following way of making record forging more complicated for people like Bob in the future: after each record she would insert the hash, obtained by feeding the md5sum program with the text of the record and the previous hash. The new diary now looked as follows:

      0000 (the initial hash; let us limit ourselves to just four digits for brevity)
    1. Bought 5 apples.
      4178 (the hash of "0000" and "Bought 5 apples")
    2. Called mom.
      2314 (the hash of "4178" and "Called mom")
      ...
      4492
    132. Gave Bob $250.
      1010 (the hash of "4492" and "Gave Bob $250")
    133. Kissed Carl.
      8204 (the hash of "1010" and "Kissed Carl")
      ...

    Now each record was "confirmed" by a hash. If someone wanted to change line 132 to something else, they would have to change the corresponding hash (it would not be 1010 anymore). This, in turn, would affect the hash of line 133 (which would not be 8204 anymore), and so on all the way until the end of the diary. In order to change one record Bob would have to rewrite the confirmation hashes for all the following diary records, which is fairly time-consuming. This way, hashes "chain" all records together, and what was before a simple journal has now become a chain of records, or "blocks" - a blockchain.
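
    A tiny sketch of Alice's scheme in code (using md5, as in the story, and keeping only four characters of each hash; the actual hash values will of course differ from the made-up ones above):

    import hashlib

    def short_hash(prev_hash, record):
        # Hash the previous hash together with the record text, keep 4 characters.
        return hashlib.md5((prev_hash + record).encode()).hexdigest()[:4]

    records = ["Bought 5 apples.", "Called mom.", "Gave Bob $250.", "Kissed Carl."]
    prev = "0000"                      # the initial hash
    diary = []
    for record in records:
        prev = short_hash(prev, record)
        diary.append((record, prev))   # each record followed by its confirmation hash

    # Forging one record now invalidates every hash that follows it,
    # because each hash depends on the previous one.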

    Proof-of-Work Blockchain

    Time passed, and Alice opened a bank. She still kept her diary, which now included serious banking records like "Gave out a loan" or "Accepted a deposit". Every record was accompanied with a hash to make forging harder. Everything was fine, until one day a guy named Carl took a loan of $1000000. The next night a team of twelve elite Chinese diary hackers (hired by Carl, of course) got into Alice's room, found the journal and replaced the line "143313. Gave out a $1000000 loan to Carl" with a new version: "143313. Gave out a $10 loan to Carl". They then quickly recomputed all the necessary hashes for the following records. For a dozen hackers armed with calculators this did not take too long.

    Fortunately, Alice saw one of the hackers retreating and understood what had happened. She needed a more secure system. Her new idea was the following: let us append a number (called a "nonce") in brackets to each record, and choose this number so that the confirmation hash for the record always starts with two zeroes. Because hashes are rather unpredictable, the only way to do it is to simply try out different nonce values until one of them results in a proper hash:

      0000
    1. Bought 5 apples (22).
      0042 (the hash of "0000" and "Bought 5 apples (22)")
    2. Called mom (14).
      0089 (the hash of "0042" and "Called mom (14)")
      ...
      0057
    132. Gave Bob $250 (33).
      0001
    133. Kissed Carl (67).
      0093 (the hash of "0001" and "Kissed Carl (67)")
      ...

    To confirm each record one now needs to try, on average, about 50 different hashing operations for different nonce values, which makes it 50 times harder to add new records or forge them than previously. Hopefully, even a team of hackers would not manage to do it in time. Because each confirmation now requires hard (and somewhat senseless) work, the resulting method is called a proof-of-work system.
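
    The nonce search itself is just a short loop. A sketch in the same toy setting (truncated md5 hashes, a two-zero prefix as the difficulty target):

    import hashlib

    def short_hash(prev_hash, record):
        return hashlib.md5((prev_hash + record).encode()).hexdigest()[:4]

    def find_nonce(prev_hash, record_text):
        # Try nonces until the confirmation hash starts with two zeroes.
        nonce = 0
        while True:
            candidate = "%s (%d)." % (record_text, nonce)
            confirmation = short_hash(prev_hash, candidate)
            if confirmation.startswith("00"):
                return nonce, confirmation
            nonce += 1

    print(find_nonce("0000", "Bought 5 apples"))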

    Distributed Blockchain

    Tired of having to search for matching nonces for every record, Alice hired five assistants to help her maintain the journal. Whenever a new record needed to be confirmed, the assistants would start searching for a suitable nonce in parallel, until one of them completed the job. To motivate the assistants to work faster she allowed them to append the name of the person who found a valid nonce, and promised to give promotions to those who confirmed the most records within a year. The journal now looked as follows:

      0000
    1. Bought 5 apples (29, nonce found by Mary).
      0013 (the hash of "0000" and "Bought 5 apples (29, nonce found by Mary)")
    2. Called mom (45, nonce found by Jack).
      0089 (the hash of "0013" and "Called mom (45, nonce found by Jack)")
      ...
      0068
    132. Gave Bob $250 (08, nonce found by Jack).
      0028
    133. Kissed Carl (11, nonce found by Mary).
      0041
      ...

    A week before Christmas, two assistants came to Alice seeking a Christmas bonus. Assistant Jack showed a diary in which he had confirmed 140 records and Mary 130, while Mary showed a diary in which she, reportedly, had confirmed more records than Jack. Each of them was showing Alice a journal with all the valid hashes, but different entries! It turned out that ever since finding out about the promotion the two assistants had been working hard to keep their own journals, such that all the nonces would carry their own names. Since they had to maintain their journals individually, each of them had to do all the work of confirming records alone rather than splitting it among the other assistants. This, of course, made them so busy that they eventually had to miss some important entries about Alice's bank loans.

    Consequently, Jack's and Mary's "own journals" ended up being shorter than the "real journal", which was, luckily, correctly maintained by the three other assistants. Alice was disappointed and, of course, gave neither Jack nor Mary a promotion. "I will only give promotions to assistants who confirm the most records in the valid journal", she said. And the valid journal is the one with the most entries, of course, because the most work has been put into it!

    After this rule had been established, the assistants had no more motivation to cheat by working on their own journals alone - a collective honest effort always produced a longer journal in the end. This rule allowed the assistants to work from home and completely without supervision. Alice only needed to check that the journal had the correct hashes in the end when distributing promotions. This way, Alice's blockchain became a distributed blockchain.

    Bitcoin

    Jack happened to be much more effective at finding nonces than Mary and eventually became a Senior Assistant to Alice. He did not need any more promotions. "Could you transfer some of the promotion credits you got from confirming records to me?", Mary asked him one day. "I will pay you $100 for each!". "Wow", Jack thought, "apparently all the confirmations I did still have some value for me now!". They spoke with Alice and invented the following way to make "record confirmation achievements" transferable between parties.

    Whenever an assistant found a matching nonce, they would not simply write their own name to indicate who did it. Instead, they would write their public key. The agreement with Alice was that the corresponding confirmation bonus would belong to whoever owned the matching private key:

      0000
    1. Bought 5 apples (92, confirmation bonus to PubKey61739).
      0032 (the hash of "0000" and "Bought 5 apples (92, confirmation bonus to PubKey61739)")
    2. Called mom (52, confirmation bonus to PubKey55512).
      0056 (the hash of "0032" and "Called mom (52, confirmation bonus to PubKey55512)")
      ...
      0071
    132. Gave Bob $250 (22, confirmation bonus to PubKey61739).
      0088
    133. Kissed Carl (40, confirmation bonus to PubKey55512).
      0012
      ...

    To transfer confirmation bonuses between parties a special type of record would be added to the same diary. The record would state which confirmation bonus had to be transferred to which new public key owner, and would be signed using the private key of the original confirmation owner to prove it was really his decision:

      0071
    132. Gave Bob $250 (22, confirmation bonus to PubKey6669).
      0088
    133. Kissed Carl (40, confirmation bonus to PubKey5551).
      0012
      ...
      0099
    284. TRANSFER BONUS IN RECORD 132 TO OWNER OF PubKey1111, SIGNED BY PrivKey6669. (83, confirmation bonus to PubKey4442).
      0071

    In this example, record 284 transfers the bonus for confirming record 132 from whoever it belonged to before (the owner of private key 6669, presumably Jack in our example) to a new party - the owner of private key 1111 (who could be Mary, for example). As it is still a record, there is also the usual bonus for having confirmed it, which went to the owner of private key 4442 (who could be John, Carl, Jack, Mary or whoever else - it does not matter here). In effect, record 284 now describes two different bonuses - one due to the transfer, and another for its confirmation. These, if necessary, can be further transferred to different parties later using the same procedure.

    Once this system was implemented, it turned out that Alice's assistants and all their friends started actively using the "confirmation bonuses" as a kind of internal currency, transferring them between each other's public keys and even exchanging them for goods and actual money. Note that to buy a "confirmation bonus" one does not need to be Alice's assistant or register anywhere. One just needs to provide a public key.

    This confirmation bonus trading activity became so prominent that Alice stopped using the diary for her own purposes, and eventually all the records in the diary would only be about "who transferred which confirmation bonus to whom". This idea of a distributed proof-of-work-based blockchain with transferable confirmation bonuses is known as the Bitcoin.

    Smart Contracts

    But wait, we are not done yet. Note how Bitcoin is born from the idea of recording "transfer claims", cryptographically signed by the corresponding private key, into a blockchain-based journal. There is no reason we have to limit ourselves to this particular cryptographic protocol. For example, we could just as well make the following records:

    1. Transfer bonus in record 132 to whoever can provide signatures, corresponding to PubKey1111 AND PubKey3123.

    This would be an example of a collective deposit, which may only be extracted by a pair of collaborating parties. We could generalize further and consider conditions of the form:

    1. Transfer bonus in record 132 to whoever first provides x, such that f(x) = \text{true}.

    Here f(x) could be any predicate describing a "contract". For example, in Bitcoin the contract requires x to be a valid signature, corresponding to a given public key (or several keys). It is thus a "contract", verifying the knowledge of a certain secret (the private key). However, f(x) could just as well be something like:

        \[f(x) = \text{true, if }x = \text{number of bytes in record #42000},\]

    which would be a kind of a "future prediction" contract - it can only be evaluated in the future, once record 42000 becomes available. Alternatively, consider a "puzzle solving contract":

        \[f(x) = \text{true, if }x = \text{valid, machine-verifiable proof of a complex theorem}.\]

    Finally, the first part of the contract, namely the phrase "Transfer bonus in record ..." could also be fairly arbitrary. Instead of transferring "bonuses" around we could just as well transfer arbitrary tokens of value:

    284. Whoever first provides x, such that f(x) = \text{true} will be DA BOSS.
      ...
    x=42 satisfies the condition in record 284.
    Now and forever, John is DA BOSS!

    The value and importance of such arbitrary tokens will, of course, be determined by how they are perceived by the community using the corresponding blockchain. It is not unreasonable to envision situations where being DA BOSS gives certain rights in society, and having this fact recorded in an automatically-verifiable public ledger makes it possible to include this knowledge in various automated systems (e.g. consider a door lock which would only open to whoever is currently known as DA BOSS in the blockchain).
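
    As a toy sketch of this idea (not how Bitcoin Script or Ethereum actually represent contracts), a "contract" can literally be a predicate function attached to a record, and a claim succeeds whenever the submitted value satisfies it. The contract names and predicates below are made up:

    import hashlib

    # Each "contract" is just a predicate f(x).
    contracts = {
        "hash-puzzle": lambda x: hashlib.sha256(x.encode()).hexdigest().startswith("0000"),
        "da-boss":     lambda x: x == "42",
    }

    def try_claim(contract_name, x):
        # A claim succeeds if the submitted value satisfies the contract's predicate.
        return contracts[contract_name](x)

    print(try_claim("da-boss", "42"))       # True: John becomes DA BOSS
    print(try_claim("hash-puzzle", "42"))   # almost certainly False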

    Honest Computing

    As you see, we can use a distributed blockchain to keep journals, transfer "coins" and implement "smart contracts". These three applications are, however, all consequences of one general, core property. The participants of a distributed blockchain ("assistants" in the Alice example above, or "miners" in Bitcoin-speak) are motivated to precisely follow all rules necessary for confirming the blocks. If the rules say that a valid block is the one where all signatures and hashes are correct, the miners will make sure these indeed are. If the rules say that a valid block is the one where a contract function needs to be executed exactly as specified, the miners will make sure it is the case, etc. They all seek to get their confirmation bonuses, and they will only get them if they participate in building the longest honestly computed chain of blocks.

    Because of that, we can envision blockchain designs where a "block confirmation" requires running arbitrary computational algorithms, provided by the users, and the greedy miners will still execute them exactly as stated. This general idea lies behind the Ethereum blockchain project.

    There is just one place in the description provided above where miners have some motivational freedom not to be perfectly honest. It is the decision about which records to include in the next block to be confirmed (or which algorithms to execute, if we consider the Ethereum blockchain). Nothing really prevents a miner from refusing to ever confirm a record "John is DA BOSS", ignoring it as if it never existed at all. This problem is overcome in modern blockchains by having users offer an additional "tip money" reward for each record included in the confirmed block (or for every algorithmic step executed on the Ethereum blockchain). This aligns the motivation of the network towards maximizing the number of records included, making sure none are lost or ignored. Even if some miners had something against John being DA BOSS, there would probably be enough other participants who would not turn down the opportunity of getting an additional tip.

    Consequently, the whole system is economically incentivised to follow the protocol, and the term "honest computing" seems appropriate to me.


  • Posted by Konstantin 28.03.2017 No Comments

    Consider the following question:

    Which of the following two statements is logically true?

    1. All planets of the Solar System orbit the Sun. The Earth orbits the Sun. Consequently, the Earth is a planet of the Solar System.
    2. God is the creator of all things which exist. The Earth exists. Consequently, God created the Earth.

    I've seen this question or variations of it pop up as "provocative" posts in social networks several times. At times they might invite lengthy discussions, where the participants would split into camps - some claim that the first statement is true, because Earth is indeed a planet of the Solar System and God did not create the Earth. Others would laugh at the stupidity of their opponents and argue that, obviously, only the second statement is correct, because it makes a valid logical implication, while the first one does not.

    Not once, however, have I ever seen a proper formal explanation of what is happening here. And although it is fairly trivial (once you know it), I guess it is worth writing up. The root of the problem here is the difference between implication and provability - something I myself remember struggling a bit to understand when I first encountered these notions in a course on mathematical logic years ago.

    Indeed, any textbook on propositional logic will tell you in one of the first chapters that you may write

        \[A \Rightarrow B\]

    to express the statement "A implies B". A chapter or so later you will learn that there is also a possibility to write

        \[A \vdash B\]

    to express a confusingly similar statement that "B is provable from A". To confirm your confusion, another chapter down the road you should discover that A \Rightarrow B is the same as \vdash A \Rightarrow B, which, in turn, is logically equivalent to A \vdash B. Therefore, indeed, whenever A \Rightarrow B is true, A \vdash B is true, and vice-versa. Is there a difference between \vdash and \Rightarrow then, and why do we need the two different symbols at all? The "provocative" question above provides an opportunity to illustrate this.

    The spoken language is rather informal, and there can be several ways of formally interpreting the same statement. Both statements in the puzzle are given in the form "A, B, consequently C". Here are at least four different ways to put them formally, which make the two statements true or false in different ways.

    The Pure Logic Interpretation

    Anyone who has enough experience solving logic puzzles would know that both statements should be interpreted as abstract claims about provability (i.e. deducibility):

        \[A, B \vdash C.\]

    As mentioned above, this is equivalent to

        \[(A\,\&\, B) \Rightarrow C.\]

    or

        \[\vdash (A\,\&\, B) \Rightarrow C.\]

    In this interpretation the first statement is wrong and the second is a correct implication.

    The Pragmatic Interpretation

    People who have less experience with math puzzles would often assume that they should not exclude their common sense knowledge from the task. The corresponding formal statement of the problem then becomes the following:

        \[[\text{common knowledge}] \vdash (A\,\&\, B) \Rightarrow C.\]

    In this case both statements become true. The first one is true simply because the consequent C is true on its own, given common knowledge (the Earth is indeed a planet) - the antecedents and provability do not play any role at all. The second is true because it is a valid reasoning, independently of the common knowledge.

    This type of interpretation is used in rhetorical phrases like "If this is true, I am a Dutchman".

    The Overly Strict Interpretation

    Some people may prefer to believe that a logical statement should only be deemed correct if every single part of it is true and logically valid. The two claims must then be interpreted as follows:

        \[([\text{common}] \vdash A)\,\&\, ([\text{common}] \vdash B)\,\&\, (A, B\vdash C).\]

    Here the issue of provability is combined with the question about the truthfulness of the facts used. Both statements are false - the first fails on logic, and the second on facts (assuming that God creating the Earth is not part of common knowledge).

    The Oversimplified Interpretation

    Finally, people very unfamiliar with strict logic would sometimes tend to ignore the words "consequently", "therefore" or "then", interpreting them as a kind of an extended synonym for "and". In their minds the two statements could be regarded as follows:

        \[[\text{common}] \vdash A\,\&\, B\,\&\, C.\]

    From this perspective, the first statement becomes true and the second (again, assuming the aspects of creation are not commonly known) is false.

    Although the author of the original question most probably did really assume the "pure logic" interpretation, as is customary for such puzzles, note how much leeway there can be when converting a seemingly simple phrase in English to a formal statement. In particular, observe that questions about provability, where you deliberately have to abstain from relying on common knowledge, may be different from questions about facts and implications, where common sense may (or must) be assumed and you can sometimes skip the whole "reasoning" part if you know the consequent is true anyway.

    Here is a quiz question to check whether you understood what I meant to explain.

    "The sky is blue, and therefore the Earth is round." True or false?


  • Posted by Konstantin 21.03.2017 No Comments

    Ever since Erwin Schrödinger described a thought experiment, in which a cat in a sealed box happened to be "both dead and alive at the same time", popular science writers have been relying on it heavily to convey the mysteries of quantum physics to the layman. Unfortunately, instead of providing any useful intuition, this example has laid a solid base for a whole bunch of misconceptions. Having read or heard something about the strange cat, people tend to jump to profound conclusions, such as "according to quantum physics, cats can be both dead and alive at the same time" or "the notion of a conscious observer is important in quantum physics". All of these are wrong, as is the image of a cat which is "both dead and alive at the same time". The corresponding Wikipedia page does not stress this fact well enough, hence I thought the Internet might benefit from yet another explanatory post.

    The Story of the Cat

    The basic notion in quantum mechanics is a quantum system. Pretty much anything could be modeled as a quantum system, but the most common examples are elementary particles, such as electrons or photons. A quantum system is described by its state. For example, a photon has polarization, which could be vertical or horizontal. Another prominent example of a particle's state is its wave function, which represents its position in space.

    There is nothing special about saying that things have state. For example, we may say that any cat has a "liveness state", because it can be either "dead" or "alive". In quantum mechanics we would denote these basic states using the bra-ket notation as |\mathrm{dead}\rangle and |\mathrm{alive}\rangle. The strange thing about quantum mechanical systems, though, is the fact that quantum states can be combined together to form superpositions. Not only could a photon have a purely vertical polarization \left|\updownarrow\right\rangle or a purely horizontal polarization \left|\leftrightarrow\right\rangle, but it could also be in a superposition of both vertical and horizontal states:

        \[\left|\updownarrow\right\rangle + \left|\leftrightarrow\right\rangle.\]

    This means that if you asked the question "is this photon polarized vertically?", you would get a positive answer with 50% probability - in the other 50% of cases the measurement would report the photon as horizontally polarized. This is not, however, the same kind of uncertainty that you get from flipping a coin. The photon is not either horizontally or vertically polarized. It is both at the same time.
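
    For the record, the 50% figure comes from the Born rule once the state is written with its normalization factor (which I omit elsewhere in this post for brevity); here |\psi\rangle simply denotes the normalized superposition:

        \[\left|\psi\right\rangle = \frac{1}{\sqrt{2}}\left(\left|\updownarrow\right\rangle + \left|\leftrightarrow\right\rangle\right), \qquad P(\text{vertical}) = |\langle\updownarrow|\psi\rangle|^2 = \frac{1}{2}.\]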

    Amazed by this property of quantum systems, Schrödinger attempted to construct an example, where a domestic cat could be considered to be in the state

        \[|\mathrm{dead}\rangle + |\mathrm{alive}\rangle,\]

    which means being both dead and alive at the same time. The example he came up with, in his own words (citing from Wikipedia), is the following:

    A cat is penned up in a steel chamber, along with the following device (which must be secured against direct interference by the cat): in a Geiger counter, there is a tiny bit of radioactive substance, so small, that perhaps in the course of the hour one of the atoms decays, but also, with equal probability, perhaps none; if it happens, the counter tube discharges and through a relay releases a hammer that shatters a small flask of hydrocyanic acid. If one has left this entire system to itself for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The first atomic decay would have poisoned it.

    The idea is that after an hour of waiting, the radioactive substance must be in the state

        \[|\mathrm{decayed}\rangle + |\text{not decayed}\rangle,\]

    the poison flask should thus be in the state

        \[|\mathrm{broken}\rangle + |\text{not broken}\rangle,\]

    and the cat, consequently, should be

        \[|\mathrm{dead}\rangle + |\mathrm{alive}\rangle.\]

    Correct, right? No.

    The Cat Ensemble

    Superposition, which is being "in both states at once" is not the only type of uncertainty possible in quantum mechanics. There is also the "usual" kind of uncertainty, where a particle is in either of two states, we just do not exactly know which one. For example, if we measure the polarization of a photon, which was originally in the superposition \left|\updownarrow\right\rangle + \left|\leftrightarrow\right\rangle, there is a 50% chance the photon will end up in the state \left|\updownarrow\right\rangle after the measurement, and a 50% chance the resulting state will be \left|\leftrightarrow\right\rangle. If we do the measurement, but do not look at the outcome, we know that the resulting state of the photon must be either of the two options. It is not a superposition anymore. Instead, the corresponding situation is described by a statistical ensemble:

        \[\{\left|\updownarrow\right\rangle: 50\%, \quad\left|\leftrightarrow\right\rangle: 50\%\}.\]

    Although it may seem that the difference between a superposition and a statistical ensemble is a matter of terminology, it is not. The two situations are truly different and can be distinguished experimentally. Essentially, every time a quantum system is measured (which happens, among other things, every time it interacts with a non-quantum system) all the quantum superpositions are "converted" to ensembles - concepts native to the non-quantum world. This process is sometimes referred to as decoherence.

    Now recall the Schrödinger's cat. For the cat to die, a Geiger counter must register a decay event, triggering a killing procedure. The registration within the Geiger counter is effectively an act of measurement, which will, of course, "convert" the superposition state into a statistical ensemble, just like in the case of a photon which we just measured without looking at the outcome. Consequently, the poison flask will never be in a superposition of being "both broken and not". It will be one or the other, just like any non-quantum object should. Similarly, the cat will also end up being either dead or alive - you just cannot know exactly which option it is before you peek into the box. Nothing special or quantum'y about this.

    The Quantum Cat

    "But what gives us the right to claim that the Geiger counter, the flask and the cat in the box are "non-quantum" objects?", an attentive reader might ask here. Could we imagine that everything, including the cat, is a quantum system, so that no actual measurement or decoherence would happen inside the box? Could the cat be "both dead and alive" then?

    Indeed, we could try to model the cat as a quantum system with |\mathrm{dead}\rangle and |\mathrm{alive}\rangle being its basis states. In this case the cat indeed could end up in the state of being both dead and alive. However, this would not be its most exciting capability. Way more surprisingly, we could then kill and revive our cat at will, back and forth, by simply measuring its liveness state appropriately. It is easy to see how this model is unrepresentative of real cats in general, and the worry about them being able to be in superposition is just one of the many inconsistencies. The same goes for the flask and the Geiger counter, which, if considered to be quantum systems, get the magical abilities to "break" and "un-break", "measure" and "un-measure" particles at will. Those would certainly not be a real-world flask or counter anymore.

    The Cat Multiverse

    There is one way to bring quantum superposition back into the picture, although it requires some rather abstract thinking. There is a theorem in quantum mechanics, which states that any statistical ensemble can be regarded as a partial view of a higher-dimensional superposition. Let us see what this means. Consider a (non-quantum) Schrödinger's cat. As it might be hopefully clear from the explanations above, the cat must be either dead or alive (not both), and we may formally represent this as a statistical ensemble:

        \[\{\left|\text{dead}\right\rangle: 50\%, \quad\left|\text{alive}\right\rangle: 50\%\}.\]

    It turns out that this ensemble is mathematically equivalent in all respects to a superposition state of a higher order:

        \[\left|\text{Universe A}, \text{dead}\right\rangle + \left|\text{Universe B}, \text{alive}\right\rangle,\]

    where "Universe A" and "Universe B" are some abstract, unobservable "states of the world". The situation can be interpreted by imagining two parallel universes: one where the cat is dead and one where it is alive. These universes exist simultaneously in a superposition, and we are present in both of them at the same time, until we open the box. When we do, the universe superposition collapses to a single choice of the two options and we are presented with either a dead, or a live cat.

    Yet, although the universes happen to be in a superposition here, existing both at the same time, the cat itself remains completely ordinary, being either totally dead or fully alive, depending on the chosen universe. The Schrödinger's cat is just a cat, after all.
