• Posted by Konstantin 02.05.2017 2 Comments

    I happen to use the Amazon cloud machines from time to time for various personal and work-related projects. Over the years I've accumulated a terabyte or so of data files there. Those are mostly useless intermediate results or expired back-ups, which should be deleted and forgotten, but I could not gather the strength for that. "What if those datafiles happen to be of some archaelogical interest 30 years from now?", I thought. Keeping them just lying there on an Amazon machine is, however, a waste of money - it would be cheaper to download them all onto a local hard drive and tuck it somewhere into a dark dry place.

    But what would be the fastest way to download a terabyte of data from the cloud? Obviously, large downstream bandwidth is important here, but so should be a smart choice of the transfer technology. To my great suprise, googling did not provide me with a simple and convincing answer. A question posted to StackOverflow did not receive any informative replies and even got downvoted for reasons beyond my understanding. It's year 2017, but downloading a file is still not an obvious matter, apparently.

    Unhappy with such state of affairs I decided to compare some of the standard ways for downloading a file from a cloud machine. Although the resulting measurements are very configuration-specific, I believe the overall results might still generalize to a wider scope.

    Experimental Setup

    Consider the following situation:

    • An m4.xlarge AWS machine (which is claimed to have "High" network bandwidth) located in the EU (Ireland) region, with an SSD storage volume (400 Provisioned IOPS) attached to it.
    • A 1GB file with random data, generated on that machine using the following command:
      $ dd if=/dev/urandom of=file.dat bs=1M count=1024
    • The file needs to be transferred to a university server located in Tartu (Estonia). The server has a decently high network bandwidth and uses a mirrored-striped RAID for its storage backend.

    Our goal is to get the file from the AWS machine into the university server in the fastest time possible. We will now try eight different methods for that, measuring the mean transfer time over 5 attempts for each method.

    File Download Methods

    One can probably come up with hundreds of ways for transferring a file. The following eight are probably the most common and reasonably easy to arrange.

    1. SCP (a.k.a. SFTP)

    • Server setup: None (the SSH daemon is usually installed on a cloud machine anyway).
    • Client setup: None (if you can access a cloud server, you have the SSH client installed already).
    • Download command:

      scp -i ~/.ssh/id_rsa.amazon \
               ubuntu@$REMOTE_IP:/home/ubuntu/file.dat .

    2. RSync over SSH

    • Server setup: sudo apt install rsync (usually installed by default).
    • Client setup: sudo apt install rsync (usually installed by default).
    • Download command:

      rsync -havzP --stats \
            -e "ssh -i $HOME/.ssh/id_rsa.amazon" \
            ubuntu@$REMOTE_IP:/home/ubuntu/file.dat .

    3. Pure RSync

    • Server setup:
      Install RSync (usually already installed):

      sudo apt install rsync

      Create /etc/rsyncd.conf with the following contents:

      pid file = /var/run/rsyncd.pid
      lock file = /var/run/rsync.lock
      log file = /var/log/rsync.log
      
      [files]
      path = /home/ubuntu

      Run the RSync daemon:

      sudo rsync --daemon
    • Client setup: sudo apt install rsync (usually installed by default).
    • Download command:

      rsync -havzP --stats \
            rsync://$REMOTE_IP/files/file.dat .

    4. FTP (VSFTPD+WGet)

    • Server setup:
      Install VSFTPD:

      sudo apt install vsftpd

      Edit /etc/vsftpd.conf:

      listen=YES
      listen_ipv6=NO
      pasv_address=52.51.172.88   # The public IP of the AWS machine

      Create password for the ubuntu user:

      sudo passwd ubuntu

      Restart vsftpd:

      sudo service vsftpd restart
    • Client setup: sudo apt install wget (usually installed by default).
    • Download command:

      wget ftp://ubuntu:somePassword@$REMOTE_IP/file.dat

    5. FTP (VSFTPD+Axel)

    Axel is a command-line tool which can download through multiple connections thus increasing throughput.

    • Server setup: See 4.
    • Client setup: sudo apt install axel
    • Download command:

      axel -a ftp://ubuntu:somePassword@$REMOTE_IP/home/ubuntu/file.dat

    6. HTTP (NginX+WGet)

    • Server setup:
      Install NginX:

      sudo apt install nginx

      Edit /etc/nginx/sites-enabled/default, add into the main server block:

      location /downloadme {
          alias /home/ubuntu;
          gzip on;
      }

      Restart nginx:

      sudo service nginx restart
    • Client setup: sudo apt install wget (usually installed by default).
    • Download command:

      wget http://$REMOTE_IP/downloadme/file.dat

    7. HTTP (NginX+Axel)

    • Server setup: See 6.
    • Client setup: sudo apt install axel
    • Download command:

      axel -a http://$REMOTE_IP/downloadme/file.dat

    8. AWS S3

    The last option we try is first transferring the files onto an AWS S3 bucket, and then downloading from there using S3 command-line tools.

    • Server setup:
      Install and configure AWS command-line tools:

      sudo apt install awscli
      aws configure

      Create an S3 bucket:

      aws --region us-east-1 s3api create-bucket \
          --acl public-read-write --bucket test-bucket-12345 \
          --region us-east-1

      We create the bucket in the us-east-1 region because the S3 tool seems to have a bug at the moment which prevents from using it in the eu regions.

      Next, we transfer the file to the S3 bucket:

      aws --region us-east-1 s3 cp file.dat s3://test-bucket-12345
    • Client setup:
      Install and configure AWS command-line tools:

      sudo apt install awscli
      aws configure
    • Download command:

      aws --region us-east-1 s3 cp s3://test-bucket-12345/file.dat .

    Results

    Here are the measurement results. In case of the S3 method we report the total time needed to upload from the server to S3 and download from S3 to the local machine. Note that I did not bother to fine-tune any of the settings - it may very well be possible that some of the methods can be sped up significantly by configuring the servers appropriately. Consider the results below to indicate the "out of the box" performance of the corresponding approaches.

    Although S3 comes up as the fastest method (and might be even faster if it worked out of the box with the european datacenter), RSync is only marginally slower, yet it is easier to use, requires usually no additional set-up and handles incremental downloads very gracefully. I would thus summarize the results as follows:

    Whenever you need to download large files from the cloud, consider RSync over SSH as the default choice.

    Tags: , , , ,

  • Posted by Konstantin 27.07.2016 2 Comments

    While writing the previous post I was thinking of coming up with a small fun illustration for Aframe. I first heard about AFrame at the recent European Innovation Academy - a team-project-based entrepreneurship summer school. The team called MemVee was aiming to develop an AFrame-based site which would allow students to design and view interactive "Memory Palaces" - three-dimensional spaces with various objects related to their current study topics, organized in a way that simplifies remembering things for visual learners. Although I have never viewed a "Memory Palace" as something beyond a fantastic concept from a Sherlock Holmes TV episode, I am a visual learner myself and understand the importance of such illustrations. In fact, I try to use and preach graphical references in my teaching practice whenever I find the time and opportunity:

    • In this lecture the concept of a desk is used as a visual metaphor of "structuring the information" as well as to provide an outline to the talk.
    • Here and here an imaginary geographical map is used in a similar context.
    • For the computer graphics course I had to develop some "slides" as small OpenGL apps for visualizing the concepts during the lecture. This has been later taken to extreme heights in the practical materials designed by Raimond-Hendrik, who went on to give this course (alongside with a seminar) in the following years. Unfortunately the materials are closed for non-participants (yet I still hope they will be opened some day, do you read this, Raimond?), but the point is that every single notion has a tiny WebGL applet made to illustrate it interactively.
    • Once I tried to make a short talk about computer graphics, where the slides would be positioned on the walls of a 3D maze, so that to show them I'd have to "walk through the maze", like in a tiny first-person shooter game. Although this kind of visualization was not at all useful as a learning aid (as it did not structure anything at all), it none the less looked cool and was very well appreciated by the younger audience of the talk, to whom it was aimed at.

    I have lost the sources of that last presentation to a computer error and decided to recreate a similar "maze with slides" with AFrame. The night was long and I got sucked into the process to the point of making an automated tool. You upload your slides, and it generates a random maze with your slides hanging on the walls. It is utterly useless, but the domain name "slideamaze.com" was free and I could not resist the pun.

    Check it out. If you are into programming-related procrastination, try saving the "mazes" generated by the tool on your computer and editing the A-frame code to, say, add monsters or other fun educational tools into the maze.

    Tags: , , , , , ,

  • Posted by Konstantin 22.01.2015 31 Comments

    Yesterday I happened to attend a discussion about the security and privacy of information stored locally in Skype and Thunderbird profiles. It turns out, if you obtain a person's Skype profile directory, you will be able to log in as him without the need to know the password. In addition, Dominique made a remark that Skype does not really delete the messages that are marked as "removed" in the chat window. I found that curious and decided to take a closer look.

    Indeed, there is a bunch of *.dat files in the chatsync subdirectory of the Skype's profile, which preserve all messages along with all their edits or deletions. Unfortunately, the *.dat files are in some undocumented binary format, and the only tool I found for reading those lacks in features. However, hacking up a small Python parser according to what is known about the format, along with a minimalistic GUI is a single evening's exercise, and I happened to be in the mood for some random coding.

    Skype Chatsync Viewer

    Skype Chatsync Viewer

    Now, if you want to check out what was that message you or your conversation partner wrote before it was edited or deleted, this package will help. If you are not keen on installing Python packages, here is a standalone Windows executable.

    Tags: , , , , , , ,

  • Posted by Konstantin 12.03.2014 No Comments

    Whenever you write a program, you want this program to behave correctly and do what you want it to do. Thus, programming always goes together with the mental act of proving to yourself (and sometimes to other people as well), that the code you write is correct. Most often this "proof" is implicit, dissolved in the way you write your code and comment it. In fact, in my personal opinion, "good code" is exactly the one, where a human-reviewer is able to verify its correctness without too much effort.

    It is natural to use computers to help us verify correctness. Everyone who has ever programmed in a strictly-typed programming language, such as Java or Haskell, is familiar with the need to specify types of variables and functions and follow strict rules of type-safety. But of course, ensuring type-safety is just the compiler's way to help you ensure some basic claims about the program, such as "this variable will always contain an integer" or "this function will always be invoked with exactly three parameters".

    This is very convenient, yet can be somewhat limiting or annoying at times. After all, type-safety requires you to write code that "can be type-checked". Although very often this is expected of "good code" anyway, there are situations where you would like some more flexibility. For this reason some languages impose no type-safety rules at all (e.g. Python or Javascript), and some languages let you disable type-checking for parts of code.

    Rather than disabling the type checker, another principled way to allow more flexibility is to make the type-checker smarter. This is the promise of dependent types. In principle, a language, which supports dependent types, would let you make much more detailed statements about your program and have your program automatically checked for correctness with respect to those statements. Rather than being limited to primitive claims like "this variable is an integer", the use of dependent types enables you to assert things like "this is a sorted list", or "this is an odd integer", and so on up to nearly arbitrary level of detail, in the form of a type annotation. At least that much I found out during a course at the recent winter school.

    The course was based on the Agda programming language, and the first thing I decided to try implementing in Agda is a well-typed version of the following simple function:

    f t = if t then true else 0

    It might look like something trivial, yet most traditional strictly typed languages would not let you write this. After all, a function has to have a return type, and it has to be either a Boolean or an Integer, but not both. In this case, however, we expect our function to have a parameter-dependent type:

    f : (t : Bool) → if t then Bool else Integer

    Given that Agda is designed to support dependent types, how complicated could it be to implement such a simple function? It turns out, it takes a beginner more than an hour of thinking and a couple of consultations with the specialists in the field. The resulting code will include at least three different definitions of "if-then-else" statements and, I must admit, some aspects of it are still not completely clear to me.

    IF-THEN-ELSE in Agda

    IF-THEN-ELSE in Agda, including all the boilerplate code

    This is the longest code I've ever had to write to specify a simple if-then-else statement. The point of this blog post is to share the amusement and suggest you to check out Agda if you are in the mood for some puzzle-solving.

    As for dependent types, I guess those are not becoming mainstream any time soon.

    Tags: , , , , ,

  • Posted by Konstantin 03.09.2012 2 Comments

    For many people, the ability for learning and adaptation seems like something unique, extremely complicated and mysterious. Indeed, those are the abilities we almost exclusively associate with high levels of intelligence and knowledge. This is, however, an illusion. Although adaptive behaviour might indeed look complex, it is not necessarily driven by "intelligent" mechanisms. One of the best illustrations of this is a fully-fledged self-learning machine made from plain matchboxes.

    A Tic-Tac-Toe machine made by James Bridle

    A Tic-Tac-Toe machine by James Bridle

    The idea for such a machine was first introduced in 1960 by Donald Michie, who devised a simple self-learning algorithm for Tic-Tac-Toe (reminiscent of what is now known to be Reinforcement Learning). Due to lack of appropriate computing power, he implemented it "in hardware" using 300 or so matchboxes.

    The idea of the machine is simple. There is a matchbox corresponding to each game position, where the "computer" has to make a move. The matchbox contains colored beads, each color corresponding to a particular move. The decision is made by picking a random bead from the matchbox. Initially (when the machine is "untrained"), there is an equal number of beads of each color, and the machine thus makes equiprobably random turns. After each game, however, the machine is "punished" by removing beads, corresponding to losing turns, or "rewarded" by adding beads, corresponding to winning turns. Thus, after several games, the machine will adapt its strategy towards a winning one.

    The idea was popularized by Martin Gardner in one of his Scientific American articles (later published in the book "The Unexpected Hanging and Other Mathematical Diversions"). Gardner invented a simple game of  "Hexapawn", and derived a matchbox machine for it, which only required as little as 19 matchboxes. He also suggests in his article, however, to create a matchbox machine for "Mini-checkers" - checkers played on a 4x4 board. Ever since I saw this article some 20 or so years ago I was thinking of making one. This summer, while teaching a machine learning course in a summer school in Kiev, I actually made one. I could use it to both fulfil my ages-old desire as well as a teaching aid. You can make one too now, if you are interested.

    The Mini-checkers Machine

    The rules of mini-checkers are exactly like those of usual checkers, with three modifications:

    • The game is played on a 4x4 field. White is the first one to move. Machine plays for black.
    • Whenever both players get a King, the game immediately ends in a draw.
    • The King must always move to the furthest possible position in the chosen direction.

    To make the machine, you first have to buy and empty 24 matchboxes. Next, print out and stick the 24 game positions onto the boxes. Draw on each box all the possible black's moves as arrows using colored markers. Finally, for  each colored arrow, add 2 beads of the same color into the matchbox. That's it, your machine is ready to play.

    The Mini-checkers machine

    The Mini-checkers machine

    The game proceeds as already described: whenever the machine (the black player) has to make a decision (i.e. whenever it has to make a move and there is more than one possibility), find the matchbox with the current position depicted on it, shake it, and pick a random bead. This will tell you the decision of the machine. If the corresponding matchbox is empty, the machine forfeits. You should keep the matchboxes, corresponding to the moves that were made, open until the end of the game.

    Once the game is over, the machine is "taught":

    • If the machine won, do nothing.
    • If the game was a draw, remove the bead corresponding to the machine's last move from the matchbox, unless it was the last bead of that color in the box.
    • If the machine lost, remove all the beads, corresponding to the machine's last move, from the last matchbox.

    It takes about 30 games or so for the machine to actually learn to play well enough. Of course, a human would understand the strategy much earlier, but it's fun none the less.

    Playing with the machine will immediately lead you towards two important questions:

    • How efficient is the suggested learning procedure? Can it be improved and generalized?
    • How do you make a matchbox machine for a more complex game without having to manage thousands of matchboxes.

    As far as I know, contemporary machine learning has only partial answers to both of them.

    Tags: , , , ,

  • Posted by Konstantin 23.10.2009 No Comments

    The Alt+Tab key combination is perhaps one of the most well-known keyboard shortcuts in Windows, with the only competitors for the throne being Ctrl+C and Ctrl+V. And no matter, whether you are used to alt-tabbing for working purposes or simply as a means of efficient undercover procrastination, unless you are a complete novice you probably have this skill at the level of basic instincts.

    Unfortunately, there are cases where the instinct becomes inconvenient. Namely, whenever you use an application that displays multiple documents in separate tabs (like Firefox or Notepad++) or in separate child windows (like R), you are expected to use Ctrl+Tab rather than Alt+Tab to switch among documents. However, most of the time switching among documents is subjectively perceived as nothing essentially different than switching among programs, hence the fact that Alt+Tab won't work normally for that case is highly unintuitive. The typical case with me is that I would accidentally use Alt+Tab attempting to switch between the editor and console in R and unexpectedly find a completely different window in front of me, which is quite annoying.

    Although I am pretty sure I am not the only one to experience this kind of frustration, it is surprising that there does not seem to be any easily available solution to this trivial issue known to google. Thus, considering that the whole problem can be solved to a fair extent by simply translating Alt keypresses into Ctrl in a smart way, I've made a smallish program that does exactly that.

    I'm quite happy with the result and can't help sharing it with you.

    Download: Binary, Source.

    Tags: , , ,