• Posted by Konstantin 02.01.2024 No Comments

    This article has been cross-posted to Hackster.

    What is the best way to learn (or teach) introductory robotics? What could be the "hello world" project in this field - one that is simple for a beginner to get started with, complex enough to be exciting and qualify as a "robot", and deep enough to allow further creative extensions and experiments? Let us try solving this problem logically, step by step!

    First things first: what is a "robot"?

    Axiom number one: for anything to feel like a robot, it must have some moving parts. Ideally, it should have the freedom to move around, and probably the easiest way to achieve that is to equip it with two wheels. Could concepts other than a two-wheeler qualify as the simplest robot? Perhaps, but let us proceed with this one as a reasonably justified assumption.

    The wheels

    To drive the two upcoming wheels we will need two motors along with all that driver circuitry, ugh! But wait, what about those small continuous rotation servos! They are simple, cheap, and you just need to wire them to a single microcontroller pin. Some of them are even conveniently sold along with a LEGO-style wheel!

    The microcontroller

    Rule number two: it is not a real robot unless it has a programmable brain, like a microcontroller. But which microcontroller is the best for a beginner in 2024? Some fifteen years ago the answer would certainly have had the "Arduino" keyword in it. Nowadays, I'd say, the magic keyword is "CircuitPython". Programming just cannot get any simpler than plugging the board into your computer and editing code.py on the newly appeared external drive. No need for "compiling", "flashing" or any of the "install these serial port drivers to fix obscure error messages" user experiences (especially valuable if you need to give a workshop in a school computer class). Moreover, the code to control a wheel looks as straightforward as motor.throttle = 0.5.

    OK, CircuitPython it is, but which specific CircuitPython-compatible microcontroller should we choose? We need something cheap and well-supported by the manufacturer and the community. The Raspberry Pi Pico would be a great match due to its popularity, price and feature set, except it is a bit too bulky (as will become clear below). Check out the Seeedstudio Xiao Series instead. The Xiao RP2040 is built upon the same chip as the Raspberry Pi Pico and is just as cheap, but comes in a more compact size. Interestingly, as there is a whole series of pin-compatible boards with the same shape, we can later try other models if we do not like this one. Moreover, we are not even locked into a single vendor, as the QT Py series from Adafruit shares the same form factor. How cool is that? (I secretly hope this particular footprint will get more adoption among other manufacturers.)

    The body

    OK, cool. We have our wheels and a microcontroller to steer them. We now need to build the body for our robot and wire things together somehow. What would be the simplest way to achieve that?

    Let us start with the wiring, as here we know an obviously correct answer - a breadboard. No simpler manual wiring technology has been invented so far, right? Which breadboard model and shape should we pick? This will depend on how we decide to build the frame for our robot, so let us put this question aside for a moment.

    How do we build the frame? The options seem infinite: we can design it in a CAD tool and laser cut it from acrylic sheets, print on a 3D printer, use some profile rails, wooden blocks, LEGO bricks,... just ask Google for some inspiration:

    Ugh, this starts to feel too complicated and possibly inaccessible to a normal schoolkid... But wait, what is that on the back of our breadboard? Is it sticky tape? Could we just... stick... our motors to it directly and use the breadboard itself as the frame?

    If we now try out a few breadboard models, we will quickly discover that the widespread, "mini-sized", 170-point breadboard has the perfect size to host our two servos:

    By an extra lucky coincidence, the breadboard even has screw holes in the right places to properly attach the motors! If we now hot glue the servo connectors to the side as shown above, we can wire them comfortably, just like any other components on the breadboard itself. Finally, there is just enough space in the back to pack all the servo wires and stick on a 3xAAA battery box, which happens to be enough to power both our servos and the Xiao RP2040:

    This picture shows the FS90R servos. These work just as well as the GeekServo ones shown in the picture above.

    ... and thus, the BreadboardBot is born! Could a breadboard with two motors and a battery stuck to its back qualify as the simplest, lowest-tech-ever robot platform? I think it could. Note that the total cost of this whole build, including the microcontroller, is under $20 (under $15 if you order at the right discounts from the right places).

    But what can it do?

    Line following

    Rule number three: it is not a real robot if it is not able to sense the environment and react to it somehow. Hence, we need some sensors, preferably ones that would help us steer the robot (wheels are our only "actuators" so far). What are the simplest steering techniques in introductory robotics? I'd say line following and obstacle avoidance!

    Let us do line following first, then. Oh, but how do we attach the line tracking sensors - should we have built some complicated frame after all? Nope, by yet another happy design coincidence, we can just plug them right into our breadboard:

    Thanks to the magic of CircuitPython, all it takes now to teach our robot to (even if crudely) follow a line is just three lines of code (boilerplate definitions excluded):

    while True:
      servo_left.throttle = 0.1 * line_left.value
      servo_right.throttle = -0.1 * line_right.value
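    For the curious, the excluded boilerplate could look roughly like this (a sketch: the pin assignments are assumptions that depend on your wiring; the libraries come from the standard CircuitPython bundle):

    import board
    import pwmio
    import digitalio
    from adafruit_motor import servo

    # Two continuous rotation servos, one PWM pin each (D0/D1 assumed)
    servo_left = servo.ContinuousServo(pwmio.PWMOut(board.D0, frequency=50))
    servo_right = servo.ContinuousServo(pwmio.PWMOut(board.D1, frequency=50))

    # Two digital line tracking sensors (D2/D3 assumed)
    line_left = digitalio.DigitalInOut(board.D2)
    line_left.direction = digitalio.Direction.INPUT
    line_right = digitalio.DigitalInOut(board.D3)
    line_right.direction = digitalio.Direction.INPUT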

    Obstacle avoidance

    OK, a line follower implemented with a three-line algorithm is cool, but can we do anything else? Sure - look how conveniently the ultrasonic distance sensor fits onto our breadboard:

    Add a few more lines of uncomplicated Python code and you get a shy robot that can follow a line but will also happily go roaming around your room, steering away from any walls or obstacles:

    Note how you can hear the sonar working in the video (not normally audible).
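    The avoidance logic itself stays just as short. A minimal sketch, assuming the adafruit_hcsr04 driver and hypothetical trigger/echo pins D6/D7:

    import time
    import board
    import adafruit_hcsr04

    sonar = adafruit_hcsr04.HCSR04(trigger_pin=board.D6, echo_pin=board.D7)

    while True:
      try:
        if sonar.distance < 10:          # an obstacle closer than ~10 cm
          servo_left.throttle = -0.1     # back off, turning away from it
          servo_right.throttle = -0.3
        else:
          servo_left.throttle = 0.1      # otherwise just drive ahead
          servo_right.throttle = -0.1
      except RuntimeError:               # the sonar occasionally times out
        pass
      time.sleep(0.05)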

    ... and more

    There is still enough space on the breadboard to add a button and a buzzer:

    ... or splash some color with a Xiao RGB matrix:

    ... or plug in an OLED screen to give your BreadboardBot a funny animated face:
    The one in the image above also has a DHT11 sensor plugged into it, so it shows humidity and temperature while doing its line following. Why? Because it can!

    Tired of programming and just want a remote-controllable toy? Sure, plug in an HC06 Bluetooth serial module, program one last time, and go steer the robot yourself with a smartphone:
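    The steering program can be as trivial as reading one-character commands from the serial port. A sketch (the command protocol here is made up for illustration; the HC06 defaults to 9600 baud):

    import board
    import busio

    uart = busio.UART(board.TX, board.RX, baudrate=9600)

    throttles = {b"F": (0.3, -0.3),   # forward
                 b"L": (0.0, -0.3),   # turn left
                 b"R": (0.3, 0.0),    # turn right
                 b"S": (0.0, 0.0)}    # stop

    while True:
      cmd = uart.read(1)              # returns None when nothing was received
      if cmd in throttles:
        servo_left.throttle, servo_right.throttle = throttles[cmd]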

    Other microcontroller boards, BLE, Wi-Fi and FPV

    The Xiao RP2040 is great, but it has no built-in wireless connectivity! Plug in its ESP32-based cousin, the Xiao ESP32S3, and you can steer your robot with a Bluetooth gamepad, or via Wi-Fi, by opening a webpage served by the robot itself!
    If you have the Xiao ESP32S3 Sense (the version with a microphone and camera), that webpage can even show you the live first-person view feed of the robot's camera (conveniently positioned to look forward):

    In case you were wondering, the resulting FPV bot is not a very practical solution for spying on your friends, as the microcontroller heats up a lot and consumes the battery way too fast, but it is still a fun toy and an educational project.

    Finally, although the Xiao boards fit quite well with all the other gadgets on the mini breadboard, you are not limited to that choice either and can try plugging in any other appropriately-sized microcontroller board. Here is, for example, a BreadboardBot with an M5Stack Atom Lite running a line-follower program implemented as an ESPHome config (which you can flash over the air!).


    Conclusion

    The examples above are just the tip of the iceberg. Have any other sensors lying around from those "getting started with Arduino" kits, which you never found a sufficiently exciting use for? Maybe plugging them into the BreadboardBot is the chance to shine they have been waiting for all this time! Teaching the robot to recognize familiar faces and follow them? Solving a maze? Following voice commands? Self-balancing? Programming coordinated movement of multiple robots? Interfacing the robot with your smart home or one of those fancy AI chatbots? Organizing competitions among multiple BreadboardBots? Vacuuming your desk? The sky (and your free time) is the limit here!

    Consider making one and spreading the joy of breadboard robotics!

    Detailed build instructions along with wiring diagrams and example code are available on the project's Github page. Contributions are warmly welcome!


  • Posted by Konstantin 21.10.2019 No Comments

    Can't help sharing this lovely example, illustrating the way Python interns small integers, the lower-level use of the id function, as well as the power of the ctypes module.

    >>> import ctypes
    >>> def deref(addr, type_):
    ...     return ctypes.cast(addr, ctypes.POINTER(type_))
    ...
    >>> deref(id(29), ctypes.c_int)[4] = 100
    >>> 29
    100
    >>> 29*100
    10000
    >>> 25+4
    100

    Note that depending on the version of Python the value of the integer may be stored at either offset [4] or [6].


  • Posted by Konstantin 17.03.2018 3 Comments

    I randomly stumbled upon a Quora question "Can you write a program for adding 10 numbers" yesterday. The existing answers competed in geeky humor and code golf, so I could not help adding another take on the problem.

    Can you write a program for adding 10 numbers?

    The question offers a great chance to illustrate how to properly develop software solutions to real-life problems such as this one.

    First things first - let us analyze the requirements posed by the customer. They are rather vague, as usual. It is not clear what “numbers” we need to add, where and how these “numbers” are supposed to come from, what is really meant by “adding”, what we should do with the result, what platform the software is supposed to be running on, what the service guarantees are, how many users are expected, etc.

    Of course, we do not want to discover that we misunderstood some of the requirements late in the development cycle, as this could potentially require us to re-do all of the work. To avoid such unpleasant surprises we should be planning for a general, solid, enterprise-grade solution to the problem. After a short meeting of the technical committee we decided to pick C# as the implementation platform. It is OS-independent and has many powerful features which should cover any possible future needs. For example, if the customer decided to switch to a cluster-based, parallel implementation later along the way, we’d quickly have this base covered. Java could also be a nice alternative, but, according to the recent developer surveys, C# development pays more.

    The Architecture

    Let us start by modeling the problem on a higher level. The customer obviously needs to process (“add”) some data (“10 numbers”). Without getting into too much detail, this task can be modeled as follows:

    interface IInputProvider {}
    interface IOutput {}
    interface ISolution {
        IOutput add10(IInputProvider input);    
    }

    Note how we avoid specifying the actual sources of input and output yet. Indeed, we really don’t know where the “10 numbers” may be coming from in the future - these could be read from standard input, sent from the Internet, delivered by homing pigeons, or teleported via holographic technology of the future - all these options are easily supported by simply implementing IInputProvider appropriately.

    Of course, we need to do something about the output once we obtain it, even though the customer forgot to mention this part of the problem. This means we will also have to implement the following interface:

    interface IOutputConsumer {
        void consumeOutput(IOutput output);
    }

    And that is it - our general solution architecture! Let us start implementing it now.

    The Configuration

    The architecture we work with is completely abstract. An actual solution would need to provide implementations for the IInputProvider, IOutputConsumer and ISolution interfaces. How do we specify which classes are implementing these interfaces? There are many possibilities - we could load this information from a database, for example, and create a dedicated administrative interface for managing the settings. For reasons of brevity, we’ll illustrate a simplistic XML-based factory method pattern.

    Namely, we shall describe the necessary implementations in the XML file config.xml as follows:

    <Config>
        <InputProvider class="Enterprise.NumberSequenceProvider"/>
        <OutputConsumer class="Enterprise.PeanoNumberPrinter"/>
        <Solution class="Enterprise.TenNumbersAddingSolution"/>
    </Config>

    A special SolutionFactory class can now load this configuration and create the necessary object instances. Here’s a prototype implementation:

    class SolutionFactory {
        private XDocument cfg;
        public SolutionFactory(string configFile) {
            cfg = XDocument.Load(configFile);
        }
        public IInputProvider GetInputProvider() {
            return Instantiate<IInputProvider>("InputProvider");
        }
        public IOutputConsumer GetOutputConsumer() {
            return Instantiate<IOutputConsumer>("OutputConsumer");
        }
        public ISolution GetSolution() {
            return Instantiate<ISolution>("Solution");
        }
        private T Instantiate<T>(string elementName) {
            var typeName = cfg.Root.Element(elementName)
                                   .Attribute("class").Value;
            return (T)Activator.CreateInstance(Type.GetType(typeName));
        }
    }

    Of course, in a real implementation we would also worry about specifying the XML Schema for our configuration file, and make sure it is possible to override the (currently hard-coded) “config.xml” file name with an arbitrary URI using command-line parameters or environment variables. In many real-life enterprise solutions in Java, for example, even the choice of the XML parsing library would need to be configured and initialized using its own factory pattern. I omit many such (otherwise crucial) details for brevity here.

    I am also omitting the unit-tests, which, of course, should be covering every single method we are implementing.

    The Application

    Now that we have specified the architecture and implemented the configuration logic, let us put it all together into a working application. Thanks to our flexible design, the main application code is extremely short and concise:

    class Program {
        static void Main(string[] args) {
            var sf = new SolutionFactory("config.xml");
            var ip = sf.GetInputProvider();
            var oc = sf.GetOutputConsumer();
            var sol = sf.GetSolution();
            var op = sol.add10(ip);
            oc.consumeOutput(op);
        }
    }

    Amazing, right? Well, it does not really work yet, of course, because we still need to implement the core interfaces. However, at this point we may conclude the work of the senior architect and assign the remaining tasks of filling in the blanks to the main engineering team.

    The Inputs and Outputs

    Now that we have set up the higher-level architecture, we may think a bit more specifically about the algorithm we plan to implement. Recall that we need to “add 10 numbers”. We don’t really know what these “numbers” should be - they could be real numbers, complex numbers, Roman numerals or whatnot, so we have to be careful and not rush into making strict assumptions yet. Let’s just say that a “number” is something that can be added to another number:

    interface INumber {
        INumber add(INumber other);
    }

    We’ll leave the implementation of this interface to our mathematicians on the team later on.

    At this step we can also probably make the assumption that our IInputProvider implementation should somehow give access to ten different instances of an INumber. We don’t know how these instances are provided - in the worst case each of them may be obtained using a completely different method and at completely different times. Consequently, one possible template for an IInputProvider could be the following:

    interface ITenNumbersProvider: IInputProvider {
        INumber GetNumber1();
        INumber GetNumber2();
        INumber GetNumber3();
        INumber GetNumber4();
        INumber GetNumber5();
        INumber GetNumber6();
        INumber GetNumber7();
        INumber GetNumber8();
        INumber GetNumber9();
        INumber GetNumber10();
    }

    Note how, by avoiding the use of array indexing, we force the compiler to require that any implementation of our ITenNumbersProvider interface indeed provides exactly ten numbers. For brevity, however, let us refactor this design a bit:

    enum NumberOfANumber {
        ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN
    }
    interface ITenNumbersProvider: IInputProvider {
        INumber GetNumber(NumberOfANumber noan);
    }

    By listing the identities of our “numbers” in an enum we still get some level of compile-time safety, although it is not as strong any more, because an enum is, internally, just an integer. However, we got rid of unnecessary repetitions, which is a good thing. Refactoring is an important aspect of enterprise software development, you see.

    The senior architect looked at the proposed interface at one of our regular daily stand-ups and was concerned with the chosen design. “Your interface assumes you can provide immediate access to any of the ten numbers”, he said. “But what if the numbers cannot be provided simultaneously and will be arriving at unpredictable points in time?” If that were the case, an event-driven design would be much more appropriate:

    delegate void NumberHandler(NumberOfANumber id, INumber n);
     
    interface IAsynchronousInputProvider: IInputProvider {
        void AddNumberListener(NumberHandler handler);
    }

    The adding subsystem would then simply subscribe to receive events about the incoming numbers and handle them as they come in.

    “This is all good and nice”, responded the mathematician, “but for an efficient implementation of the addition algorithm we might need to have all ten numbers available at the same time”. “Ah, software design 101”, says the senior architect. “We simply install an adapter class.” It would pool the incoming data until all of it becomes available, thus converting the IAsynchronousInputProvider, used for feeding the data, into the ITenNumbersProvider needed by the mathematician:

    class SynchronizationAdapter: ITenNumbersProvider {
       private Dictionary<NumberOfANumber, INumber> nums;
       private ManualResetEvent allDataAvailableEvent;
     
       public SynchronizationAdapter(IAsynchronousInputProvider ainput){
           nums = new Dictionary<NumberOfANumber, INumber>();
           allDataAvailableEvent = new ManualResetEvent(false);
           ainput.AddNumberListener(this.HandleIncomingNumber);
       }
       private void HandleIncomingNumber(NumberOfANumber id, INumber n){
           nums[id] = n;
           if (Enum.GetValues(typeof(NumberOfANumber))
                   .Cast<NumberOfANumber>()
                   .All(k => nums.ContainsKey(k)))
                allDataAvailableEvent.Set();
       }
       public INumber GetNumber(NumberOfANumber noan) {
           allDataAvailableEvent.WaitOne();
           return nums[noan];
       }
    }

    Now the mathematician can work on his addition logic without having to know anything about the way the numbers are coming in. Convenient, isn’t it?

    Note that we are still only providing the input interface specification (along with an adapter) here. The actual implementation has to wait until our mathematicians come up with an implementation of INumber and the data engineers decide on how to obtain ten of these in the most optimal way.

    But what about IOutput? Let us assume that we expect to output a single number. This means that INumber must itself already be an instance of IOutput:

    interface INumber: IOutput {
       INumber add(INumber other);
    }

    No need to implement anything, we just add an interface tag to INumber! See how object-oriented design techniques allow us to save development time!

    The Order of Addition

    OK, so we now have a concept of an INumber which has a (binary) addition operation defined, an ITenNumbersProvider which can provide ten INumber instances (conveniently abstracting away the IAsynchronousInputProvider which actually obtains the numbers), and our goal is to add them up to get an IOutput which is itself an INumber. Sounds easy, right? Not so fast! How exactly are we going to add these numbers? After all, maybe in some cases adding (((a+b)+c)+d)+… can be less efficient or precise than (a+(b+(c+(d+…. Or maybe the optimal addition strategy is to start from the middle and then add numbers in some order? There do exist nontrivial ways to add up numbers, you know. To accommodate any possible options in the future (so that we wouldn’t have to rewrite the code unnecessarily), we should design our solution in a way that would let us switch our addition strategy easily, should we discover a better algorithm. One way to do it is by abstracting the implementation behind the following interface:

    interface IAdditionStrategy {
       INumber fold(Func<NumberOfANumber, INumber> elements,
                    Func<INumber, INumber, INumber> op); 
    }

    You see, it is essentially a functor, which gets a way to access our set of numbers (via an accessor function) along with a binary operator “op”, and “folds” this operator along the number set in any way it deems necessary. This particular piece was designed by Harry, who is a huge fan of functional programming. He was somewhat disappointed when we decided not to implement everything in Haskell. Now he can show how everyone was wrong. Indeed, the IAdditionStrategy is a core element of our design, after all, and it happens to look like a fold-functor which takes functions as inputs! “I told you we had to go with Haskell!”, says Harry! It would allow us to implement all of our core functionality with a much higher level of polymorphism than that of a simplistic C# interface!

    The Solution Logic

    So, if we are provided with the ten numbers via ITenNumbersProvider and an addition strategy via IAdditionStrategy, the implementation of the solution becomes a very simple matter:

    class TenNumbersAddingSolution: ISolution {
       private IAdditionStrategy strategy;
       public TenNumbersAddingSolution() {
           strategy = ...
       }
       public IOutput add10(IInputProvider input) {
           var tenNumbers = new SynchronizationAdapter(
                          (IAsynchronousInputProvider)input);
           return strategy.fold(i => tenNumbers.GetNumber(i), 
                                (x,y) => x.add(y));
       }
    }

    We still need to specify where to take the implementation of the IAdditionStrategy from, though. This would be a good place to refactor our code by introducing a dependency injection configuration framework such as the Autofac library. However, to keep this text as short as possible, I am forced to omit this step. Let us simply add the “Strategy” field to our current config.xml as follows:

    <Config>
        ...
        <Solution class="Enterprise.TenNumbersAddingSolution">
            <Strategy class="Enterprise.AdditionStrategy"/>
        </Solution>
    </Config>

    We could now load this configuration setting from the solution class:

        ...
        public TenNumbersAddingSolution() {
            var cfg = XDocument.Load("config.xml");
            var typeName = cfg.Root
                   .Element("Solution")
                   .Element("Strategy")
                   .Attribute("class").Value;
            strategy = (IAdditionStrategy)Activator
                   .CreateInstance(Type.GetType(typeName));
        }
        ...

    And voilà, we have our solution logic in place. We still need to implement INumber, IAdditionStrategy, ITenNumbersProvider and IOutputConsumer, though. These are the lowest-level tasks that will force us to make the most specific decisions and thus determine the actual shape of our final product. These will be done by the most expert engineers and mathematicians, who understand how things actually work inside.

    The Numbers

    How should we implement our numbers? As this was not specified, we should probably start with the simplest possible option. One of the most basic number systems from the mathematician’s point of view is that of Peano natural numbers. It is also quite simple to implement, so let’s go for it:

    class PeanoInteger: INumber {
        public PeanoInteger Prev { get; private set; }
        public PeanoInteger(PeanoInteger prev) { Prev = prev; }
        public INumber add(INumber b) {
            if (b == null) return this;
            else return new PeanoInteger(this)
                    .add(((PeanoInteger)b).Prev);
        }
    }

    Let us have IOutputConsumer print out the given Peano integer as a sequence of “1”s to the console:

    class PeanoNumberPrinter: IOutputConsumer {
        public void consumeOutput(IOutput p) {
            for (var x = (PeanoInteger)p; x != null; x = x.Prev)
                 Console.Write("1");
            Console.WriteLine();
        }
    }

    Finally, our prototype IAdditionStrategy will be adding the numbers left to right. We shall leave the option of considering other strategies for later development iterations.

    class AdditionStrategy: IAdditionStrategy {
        public INumber fold(Func<NumberOfANumber, INumber> elements,
                            Func<INumber, INumber, INumber> op) {
           return Enum.GetValues(typeof(NumberOfANumber))
                      .Cast<NumberOfANumber>()
                      .Select(elements).Aggregate(op);
        }
    }

    Take a moment to contemplate the beautiful abstraction of this functional method once again. Harry’s work, no doubt!

    The Input Provider

    The only remaining piece of the puzzle is the source of the numbers, i.e. the IAsynchronousInputProvider interface. Its implementation is a fairly arbitrary choice at this point - most probably the customer will want to customize it later, but for the purposes of our MVP we shall implement a simple sequential asynchronous generator of Peano numbers {1, 2, 3, …, 10}:

    class NumberSequenceProvider: IAsynchronousInputProvider {
        private event NumberHandler handler;
        private ManualResetEvent handlerAvailable;
     
        public NumberSequenceProvider() {
            handlerAvailable = new ManualResetEvent(false);
            new Thread(ProduceNumbers).Start();
        }
        public void AddNumberListener(NumberHandler nh) {
            handler += nh;
            handlerAvailable.Set();
        }
        private void ProduceNumbers() {
            handlerAvailable.WaitOne();
            PeanoInteger pi = null;
            foreach (var v in Enum.GetValues(typeof(NumberOfANumber))
                                  .Cast<NumberOfANumber>()) {
                    pi = new PeanoInteger(pi);
                    handler(v, pi);
            }
        }
    }

    Note that we have to be careful not to start publishing the inputs before the number processing subsystem attaches to the input producer. To achieve that we rely on the event semaphore synchronization primitive. At this point we can clearly see the benefit of choosing a powerful, enterprise-grade platform from the start! Semaphores would look much clumsier in Haskell, don’t you think, Harry? (Harry disagrees)

    So here we are - we have a solid, enterprise-grade, asynchronous, configurable implementation for an abstractly defined addition of abstractly defined numbers, using an abstract input-output mechanism.

    $> dotnet run
    1111111111111111111111111111111111111111111111111111111

    We do need some more months to ensure full test coverage, update our numerous UML diagrams, write documentation for users and API docs for developers, work on packaging and installers for various platforms, arrange marketing and sales for the project (logo, website, Facebook page, customer relations, all that, you know), and attract investors. Investors could then propose to pivot the product into a blockchain-based, distributed solution. Luckily, thanks to our rock solid design abstractions, this would all boil down to reimplementing just a few of the lower-level interfaces!

    Software engineering is fun, isn’t it?

    The source code for the developed solution is available here.


  • Posted by Konstantin 25.07.2017 3 Comments

    Every student of computer science who has managed to keep even a tiny shred of attention at their algorithms course should know that sorting n numbers is a task that requires at least \Omega(n \log n) time in general. There are some special cases, such as sorting small integers, where you can use counting sort or radix sort to beat this baseline, but as long as your numbers are hypothetically arbitrarily large, you are stuck with the \Omega(n \log n) lower bound. Right?

    Well, not really. One thing that many algorithms courses tend to skim over rather briefly is the discussion of the choice of the computation model, under which the algorithm of interest is supposed to run. In particular, the \Omega(n \log n) bound for sorting holds for the comparison-only model of computation — the abstract situation where the algorithm may only perform pairwise comparisons of the numbers to be sorted. No arithmetic, bit-shifts or anything else your typical processor is normally trained to do is allowed. This is, obviously, not a very realistic model for a modern computer.

    Let us thus consider a different computation model instead, which allows our computer to perform any of the basic arithmetic or bitwise operations on numbers in constant time. In addition, to be especially abstract, let us also assume that our computer is capable of handling numbers of arbitrary size. This is the so-called unit-cost RAM model.

    It turns out that in this case one can sort arbitrarily large numbers in linear time. The method for achieving this (presented in the work of W. Paul and J. Simon, not to be confused with Paul Simon) is completely impractical, yet quite insightful and amusing (in the geeky sense). Let me illustrate it here.

    Paul-and-Simon Sorting

    The easiest way to show an algorithm is to step it through an example. Let us therefore consider the example task of sorting the following array of three numbers:

    a = [5, 3, 9]

    Representing the same numbers in binary:

    [101, 11, 1001]

    Our algorithm starts with a linear pass to find the bit-width of the largest number in the array. In our case the largest number is 9 and has 4 bits:

    bits = max(x.bit_length() for x in a)      # bits = 4
    n = len(a)                                 # n = 3

    Next the algorithm will create a (4+1)\cdot 3^2 = 45-bit number A of the following binary form:

     1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9}

    where {9}, {3} and {5} denote the 4-bit representations of the corresponding numbers. In simple terms, we need to pack each array element repeated n times together into a single number. It can be computed in linear time using, for example, the following code:

    temp, A = 0, 0
    for x in a:
        temp = (temp<<(n*(bits+1))) + (1<<bits) + x
    for i in range(n):
        A = (A<<(bits+1)) + temp

    The result is 23834505373497, namely:

    101011010110101100111001110011110011100111001

    Next, we need to compute another 45-bit number B, which will also pack all the elements of the array n times, however this time they will be separated by 0-bits and interleaved as follows:

     0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}

    This again can be done in linear time:

    temp, B = 0, 0
    for x in a:
        temp = (temp<<(bits+1)) + x
    for i in range(n):
        B = (B<<(n*(bits+1))) + temp

    The result is 5610472248425, namely:

    001010001101001001010001101001001010001101001

    Finally, here comes the magic trick: we subtract B from A. Observe how with this single operation we now actually perform all pairwise subtractions of the numbers in the array:

    A = 1 {5} 1 {5} 1 {5} 1 {3} 1 {3} 1 {3} 1 {9} 1 {9} 1 {9} 
    B = 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9} 0 {5} 0 {3} 0 {9}

    Consider what happens to the bits separating all the pairs. If the number on top is greater than or equal to the number at the bottom of the pair, the corresponding separator bit will not be borrowed from in the subtraction, and the corresponding bit of the result will be 1. However, whenever the number on top is less than the number at the bottom, the separator bit will be zeroed out due to borrowing:

    A   = 1 {5} 1 {5} 1 { 5} 1 { 3} 1 {3} 1 { 3} 1 {9} 1 {9} 1 {9} 
    B   = 0 {5} 0 {3} 0 { 9} 0 { 5} 0 {3} 0 { 9} 0 {5} 0 {3} 0 {9}
    A-B = 1 {0} 1 {2} 0 {12} 0 {14} 1 {0} 0 {10} 1 {4} 1 {6} 1 {0}

    The same in binary (highlighted groups correspond to repetitions of the original array elements in the number A):

    A   = 1 0101 1 0101 1 0101|1 0011 1 0011 1 0011|1 1001 1 1001 1 1001
    B   = 0 0101 0 0011 0 1001|0 0101 0 0011 0 1001|0 0101 0 0011 0 1001
    A-B = 1 0000 1 0010 0 1100|0 1110 1 0000 0 1010|1 0100 1 0110 1 0000
    

    Each "separator" bit of A-B is effectively the result of a comparison of every array element with every other. Let us now extract these bits using a bitwise AND and sum them within each group. It takes another couple of linear passes:

    x = A-B >> bits
    mask, result = 0, 0
    for i in range(n):
        mask = (mask<<(n*(bits+1))) + 1
    for i in range(n):
        result += x & mask
        x = x >> (bits+1)

    The result is now the following number:

    result = 10|000000000000001|000000000000011

    It is a packed binary representation of the array r = [2, 1, 3]. The number 2 here tells us that there are two elements in a that are less than or equal to a[0]=5. Similarly, the number 1 says that there is only one element less than or equal to a[1]=3, and the number 3 means there are three elements less than or equal to a[2]=9. In other words, this is an array of ranks, which tells us how the original array elements should be rearranged into sorted order:

    r = [result >> (n*(bits+1)*(n-i-1)) & ((1<<(n*(bits+1)))-1) 
                                              for i in range(n)]
    a_sorted = [None]*n
    for i in range(n):
        a_sorted[r[i]-1] = a[i]
    

    And voilà, the sorted array! As presented above, the method would only work for arrays consisting of distinct non-negative integers. However, with some modifications it can be adapted to arbitrary arrays of integers or floats. This is left as an exercise to the reader.
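    For convenience, here is the whole procedure assembled into a single function - a sketch that simply glues together the steps above (and hence, just like them, assumes distinct non-negative integers):

    def paul_simon_sort(a):
        n = len(a)
        bits = max(x.bit_length() for x in a)
        # A: each element, prefixed by a 1-bit, repeated n times
        temp, A = 0, 0
        for x in a:
            temp = (temp<<(n*(bits+1))) + (1<<bits) + x
        for i in range(n):
            A = (A<<(bits+1)) + temp
        # B: the whole array, elements prefixed by 0-bits, repeated n times
        temp, B = 0, 0
        for x in a:
            temp = (temp<<(bits+1)) + x
        for i in range(n):
            B = (B<<(n*(bits+1))) + temp
        # A single subtraction performs all pairwise comparisons at once
        x = (A-B) >> bits
        mask, result = 0, 0
        for i in range(n):
            mask = (mask<<(n*(bits+1))) + 1
        for i in range(n):
            result += x & mask
            x = x >> (bits+1)
        # Unpack the ranks and permute the array accordingly
        r = [(result >> (n*(bits+1)*(n-i-1))) & ((1<<(n*(bits+1)))-1)
             for i in range(n)]
        a_sorted = [None]*n
        for i in range(n):
            a_sorted[r[i]-1] = a[i]
        return a_sorted

    print(paul_simon_sort([5, 3, 9]))   # [3, 5, 9]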

    The General Implications

    There are several things one can learn from the "Paul-and-Simon sort". Firstly, it shows the immense power of the unit-cost RAM computational model. By packing arbitrary amounts of data into a single register of unlimited size, we may force our imaginary computer to perform enormously complex parallel computations in a single step. Indeed, it is known that PSPACE-complete problems can be solved in polynomial time in the unlimited-precision RAM model. This, however, assumes that the machine can do arbitrary arithmetic operations. If you limit it to only additions, subtractions and multiplications (but no divisions or bit-shifts), you still cannot sort integers faster than \Omega(n \log n) even using infinitely-sized registers (this is the main result of the Paul and Simon article that inspired this post). Not obvious, is it?

    Of course, real computers can usually only perform constant-time operations on registers of a fixed size. This is formalized in the w-bit word-RAM model, and in this model the "Paul and Simon sort" degrades from an O(n) into an O(n^3) algorithm (with O(n^2) memory consumption). This is a nice illustration of how the same algorithm can have different complexity based on the chosen execution model.

    The third thing that the "Paul and Simon sort" highlights very clearly is the power of arithmetic operations on packed values and bitstrings. In fact, this idea has been applied to derive practically usable integer sorting algorithms of nearly-linear complexity. The paper by Han & Thorup expresses the idea quite well:

    Excerpt from Han & Thorup, "Integer Sorting in O(n sqrt(log log n)) Expected Time and Linear Space".

    In case you need the full code of the step-by-step explanation presented above, here it is.


  • Posted by Konstantin 27.07.2016 3 Comments

    While writing the previous post I was thinking of coming up with a small fun illustration for A-Frame. I first heard about A-Frame at the recent European Innovation Academy - a team-project-based entrepreneurship summer school. The team called MemVee was aiming to develop an A-Frame-based site which would allow students to design and view interactive "Memory Palaces" - three-dimensional spaces with various objects related to their current study topics, organized in a way that simplifies remembering things for visual learners. Although I have never viewed a "Memory Palace" as something beyond a fantastic concept from a Sherlock Holmes TV episode, I am a visual learner myself and understand the importance of such illustrations. In fact, I try to use and preach graphical references in my teaching practice whenever I find the time and opportunity:

    • In this lecture the concept of a desk is used as a visual metaphor of "structuring the information" as well as to provide an outline to the talk.
    • Here and here an imaginary geographical map is used in a similar context.
    • For the computer graphics course I had to develop some "slides" as small OpenGL apps for visualizing the concepts during the lecture. This has later been taken to extreme heights in the practical materials designed by Raimond-Hendrik, who went on to give this course (along with a seminar) in the following years. Unfortunately the materials are closed for non-participants (yet I still hope they will be opened some day - do you read this, Raimond?), but the point is that every single notion has a tiny WebGL applet made to illustrate it interactively.
    • Once I tried to make a short talk about computer graphics where the slides were positioned on the walls of a 3D maze, so that to show them I had to "walk through the maze", like in a tiny first-person shooter game. Although this kind of visualization was not at all useful as a learning aid (as it did not structure anything at all), it nonetheless looked cool and was very well appreciated by the younger audience of the talk, at whom it was aimed.

    I have lost the sources of that last presentation to a computer error and decided to recreate a similar "maze with slides" with AFrame. The night was long and I got sucked into the process to the point of making an automated tool. You upload your slides, and it generates a random maze with your slides hanging on the walls. It is utterly useless, but the domain name "slideamaze.com" was free and I could not resist the pun. [Update from 2018 - the domain expired and the project migrated to slideamaze.ing.ee.]

    Check it out. If you are into programming-related procrastination, try saving the "mazes" generated by the tool on your computer and editing the A-Frame code to, say, add monsters or other fun educational tools into the maze.


  • Posted by Konstantin 25.07.2016 No Comments

    The web started off with the simple idea of adding hyperlinks to words within text documents. The hyperlinks would let the reader easily "jump" from one document or page to another, thus obviating the need to read the text sequentially, page by page, like a book. Suddenly, the information available to a computer user expanded from single documents into a whole network of disparate, interconnected pages. The old-school process of reading the pages in a fixed order became the process of browsing this network.

    Hyperlinks could be added to the corresponding words by annotating these words with appropriate mark-up tags. For example, "this sentence" would become "this <a href="other_document">sentence</a>". This looked ugly on a text-only screen, hence browsers were soon born - applications which could render such mark-up in a nicer way. And once you view your text through a special application which knows how to interpret mark-up (a serious advance back in the 90s), you do not have to limit yourself to only tagging your text with hyperlinks - text appearance (such as whether it should be bold, italic, large or small) could just as well be specified using the same kind of mark-up.

    Fast-forward 25 years, and most documents on the web consist nearly entirely of mark-up - some have nearly no text at all. Browsers are now extremely complex systems whose job goes way beyond turning hyperlinks into underlined words. They run code, display graphics and show videos. The web experience becomes more graphical and interactive with each year.

    There is one area, however, that the web has not yet fully embraced - 3D graphics. The first attempts to enrich the web experience with 3D go back as far as 1994, when VRML was born. It picked up some traction in the scientific community - projects were made which used VRML to, say, visualize molecules. Unfortunately, the common web developers of the time mostly regarded VRML as an arcane technology irrelevant to their tasks, and the layman user would not care to install a heavyweight VRML plug-in in order to view a molecule in the browser. Now, if it were possible to make, say, an addictive 3D shooter game with VRML, things would probably have been different - a critical mass of users would be tempted to install the plug-in to play the game, and the developers would become tempted to learn the arcane tech to develop a selling product for that critical mass. Apparently, no such selling games were ever created with VRML.

    It took about fifteen years for the browsers to develop native support for 3D rendering technology. It is now indeed possible to create addictive 3D games which run in the browser. And although a whole market for in-browser 3D applications has been born, the common web developer of our time still regards 3D as an arcane technology irrelevant to their tasks. It requires writing code in an unfamiliar API, and it does not seem to interoperate well with the rest of your webpage. The successors of VRML still look like rather niche products for a specialized audience.

    I have recently discovered the A-Frame project and I have a feeling that this might finally bring 3D into the web as a truly common primitive. It interoperates smoothly with HTML and Javascript, it works out of the box on most browsers, it supports 3D virtual reality headsets, and it relies on an extremely intuitive Entity-Component approach to modeling (just like the Unity game engine, if you know what that means). Let me show you by example what I mean:

    <a-scene>
        <a-cube color="#d22" rotation="0 13 0">
            <a-animation attribute="position"
                         dur="1000"
                         easing="ease-in-out-quad"
                         direction="alternate"
                         to="0 2 0"
                         repeat="indefinite">
             </a-animation>
         </a-cube>
    </a-scene>
    

    This piece of markup can be included anywhere within your HTML page; you can use your favourite Javascript framework to add interactivity or even create the scene dynamically. You are not limited to any fixed set of entities or behaviours - adding new components is quite straightforward. The result seems to work in most browsers, even the mobile ones.

    Of course, things are not perfect: A-Frame's version number is 0.2.0 at the moment, and there are some rough edges to be polished and components to be developed. None the less, the next time you need to include a visualization on your webpage, try using D3 with A-Frame, for example. It's quite enjoyable and feels way more natural than any of the 3D-web technologies I've tried so far.


  • Posted by Konstantin 18.06.2015 No Comments

    The development of proper GPU-based implementations of neural network training methods in the recent years has led to a steady growth of exciting practical examples of their potential. Among others, the topic of face recognition (not to be confused with face detection) is on a steady rise. Some 5 years ago or so, decent face recognition tools were limited to Google Picasa and Facebook, some research labs and a few commercial products, often branded with the word "Biometrics" (which somehow seems to be growing out of fashion nowadays).

    Today things are different. Given a decently large labeled dataset, some knowledge of machine learning, familiarity with the GPU-based "deep learning" tools, and sufficient perseverance, one can design a reasonably impressive face recognition system in a matter of several weeks or months at most. Consequently, now is the time when we can see recognition-based startups receive millions in funding. Just some months ago Microsoft came out with its own public face detection and recognition API (which got its fair share of publicity in the form of a funny age-guessing tool).

    The Hype Cycle


    Hence, the growth in popularity and use of face recognition is apparent. Given that the initially overinflated hype around the whole deep learning buzzword seems to have more-or-less settled down to reality, this time the growth is realistic. We are probably on the "enlightenment" segment of the hype cycle here, very close to reaching actual productivity (which is not without issues, though).

    Tambet and Ardi of our institute's neuroscience research group seem to have caught the noospheric trend and are working on a neural-network-based face recognizer with the initial aim of using it to sort and organize historical photos from the Estonian National Archives.

    During a Skype University Hackathon, which happened in April, I had the chance to join forces with them to present their efforts in the form of a fun public web app. The idea was to let people search for similar faces in the Estonian archive photos. The resulting site was called "teisik.ee" ("doppelganger" in Estonian). Although it does not seem to exactly fulfil the "doppelganger finding" purpose, it does manage to identify persons known to the system from the database surprisingly well.

     

    Output from teisik.ee


    Having observed that finding matches to celebrities, even if they are not perfect matches, is entertaining in its own way, we also managed to put up a second version of the service (all within that same weekend!), codenamed celebritymatch.me. This app lets you search the dataset of celebrity faces for those which are apparently similar (according to the opinion of our neural network, at least). Try it - it is not perfect, but rather fun.

    The implementation of the recognition system is rather straightforward for anyone who knows what a convolutional network is (and otherwise pretty impossible to grasp in full), hence I won't go into much technical detail. It is implemented using Caffe and consists of three consecutive convolutional layers (with ReLU and max-pooling), followed by a fully-connected hidden layer, which is then fully connected to a softmax classification output layer. The outputs of the penultimate layer (of size 64) are used as the feature representation of the face. Those feature representations are then used to find Euclidean-distance-wise nearest neighbors in the database of faces. The future plan is to apply the probably smarter FaceNet approach to network training for the same use case. The webapp is done using Flask.
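    The nearest-neighbor lookup at the end is as simple as it sounds. A sketch (the variable names are mine; the network is assumed to have already produced the 64-dimensional feature vectors):

    import numpy as np

    # embeddings: an (N, 64) array of penultimate-layer outputs for the
    # N database faces; query: the 64-dimensional embedding of a new face
    def nearest_faces(query, embeddings, k=5):
        dists = np.linalg.norm(embeddings - query, axis=1)  # Euclidean distances
        return np.argsort(dists)[:k]  # indices of the k closest database faces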

    Right after the hackathon Tambet was invited to give an interview about the project on Estonian television. If you understand Estonian, check it out, it is very good.

     


  • Posted by Konstantin 21.04.2015 No Comments
    "Hello world" in Flask

    Over the recent years I happen to have made several small personal projects using Python's Flask web framework. The framework is designed to provide a very minimalistic "bottom-up" approach. It feels slightly less cluttered and imposing than some of the popular alternatives, thus fitting nicely the kind of projects a single person might typically want to hack up in a spare weekend. The minimalism of Flask does not mean it is somehow limited or unsuitable for larger projects - perhaps on the contrary, the small size of the framework means there are fewer restrictions on what you can do with it and how.

    What a small framework needs in order to be applied comfortably beyond its 6-line "Hello World" use case, however, is a decent starter project template that would include some of the most common bells and whistles. And indeed, there happen to be several such templates available. I used parts of them over time in my own projects, however I always ended up redoing/rewriting/renaming bits and pieces to fit my personal aesthetic needs. Eventually I got tired of renaming and packaged a Flask application template the way I consistently prefer to use it. I am not sure whether it is objectively better or worse than the alternatives. None the less, if at some point you decide to give Flask a try, let me suggest you try this template of mine as your point of origin.
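    For reference, the 6-line "Hello World" use case in question (presumably the one in the screenshot above) looks like this:

    from flask import Flask
    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello World!"

    Everything beyond that - the common bells and whistles mentioned above - is exactly what a starter template is supposed to provide.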


  • Posted by Konstantin 12.04.2015 1 Comment
    The first axiom of human bananology

    There is a popular claim that "human DNA is 50% similar to the DNA of a banana", which is often cited in the context of "science popularization" as well as in various "OMG did you know that" articles. It sounds funny, scientific and "plausible", hence I've seen many smart people repeat it, perhaps as a joke, without giving it too much thought. It is worth a thought, though.

    Note that depending on how you phrase the statement, it may imply different things, some of them more and some less exciting. The following examples are completely different in their meaning:

    1. If you change 50% of human DNA you can obtain the DNA of a banana.
    2. 50% of human DNA nucleotides are present in the DNA of a banana.
    3. 50% of human DNA regions have approximate matches in the banana DNA (or vice versa, which would be a different statement).
    4. 50% of human genes have orthologs among banana genes (or vice versa).

    The first one is obviously false, due to the fact that the total length of the human DNA is about 10 times that of the banana. You could include the whole banana sequence verbatim into a human genome, and it would only make up 10% of it. The second one is also false because, strictly speaking, not 50% but all 100% of human DNA nucleotides are present in the DNA of a banana. Indeed, any two organisms on Earth have their DNA composed as a sequence of exactly the same four nucleotides. Moreover, if you start comparing random positions of two random DNAs, you will get a match once every four attempts by pure chance. There's a 25% basepair-wise similarity of any DNA to a random sequence!

    The last two (or four, if you include the "reversed" versions) claims are more informative. In fact, claim #4 is probably the most interesting and is what could be meant if the presumed "50% similarity" was indeed ever found. Given the wide availability of genomic data, this claim can be checked to some extent by anyone with a computer and a desire to finally make sure how much of a banana he is, after all. Let us do it.

    What proportion of human genes could be very similar to banana genes?

    Although there is a lot of data about gene orthology among various organisms, as far as humans and bananas are concerned, I could not find any proper precomputed alignments. Creating a full-genome alignment for two organisms from scratch is a procedure way too tedious for a single Sunday evening, so let us try a simplified measurement approach instead. Consider all possible 20-nucleotide reads taken from the gene-associated regions in the reference human genome. We shall say that a human gene "is somewhat bananas" if at least 5% of its 20-bp reads match somewhere in the banana genome. Given this definition, we shall ask: what proportion of the known human genes "are somewhat bananas"?

    At this point some passionate readers could argue that, for example, 20-nucleotide reads are not long/short enough for the purposes of this question, or that the cutoff of 5% is too low or too high, or that approximate matching should be used instead along with some proper string alignment techniques, etc. As noted, we shall leave those aspects to dedicated researchers in the field of human bananology.
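    In code, the measurement could look roughly like this (a sketch: read_fasta and the file names are made-up placeholders, and a straightforward in-memory k-mer set like the one below would be rather hungry on a whole genome):

    K, CUTOFF = 20, 0.05

    def reads(seq, k=K):
        return [seq[i:i+k] for i in range(len(seq) - k + 1)]

    banana_kmers = set()
    for sequence in read_fasta("banana_genome.fa"):    # hypothetical helper
        banana_kmers.update(reads(sequence))

    somewhat_bananas, total = 0, 0
    for gene in read_fasta("human_gene_regions.fa"):
        if len(gene) < 1000:
            continue                  # only consider genes of at least 1000bp
        total += 1
        gene_reads = reads(gene)
        hits = sum(r in banana_kmers for r in gene_reads)
        if hits >= CUTOFF * len(gene_reads):
            somewhat_bananas += 1     # this gene "is somewhat bananas"

    print(somewhat_bananas, "out of", total)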

    The computation took about an hour to run and came back with the following conclusion:

    Only 33 out of the 24624 human genes (of at least 1000bp in length) are "somewhat bananas".

    In other words, no, you are not "50% similar" to a banana according to any simple definition of such similarity. Not even 1% similar! Of course, there could still be means to torture the data and squeeze the "50%" value out, but those must be some quite nontrivial means and the interpretation of the resulting similarity would be far from straightforward.


  • Posted by Konstantin 05.04.2015 4 Comments

    When it comes to data analysis, there are hundreds of exciting approaches: simple summary statistics and hypothesis tests, various clustering methods, linear and nonlinear regression or classification techniques, neural networks of various types and depths, decision rules and frequent itemsets, feature extractors and dimension reductors, ensemble methods, bayesian approaches and graphical models, logic-based approaches and fuzzy stuff, ant colonies, genetic algorithms and other optimization methods, monte-carlo algorithms, sampling and density estimation, logic-based and graph methods. Don't even get me started on the numerous visualization techniques.

    This sheer number of options is, however, both a blessing and a curse at the same time. In many practical situations just having those methods at your disposal may pose more problems than solutions. First you need to pick one of the approaches that might possibly fit your purpose. Then you will try to adapt it appropriately, spend several iterations torturing the data only to obtain very dubious first results, come to the conclusion that most probably you are doing something wrong, reconvince yourself that you need to try harder in that direction, spend some more iterations testing various parameter settings. Nothing works as you want it to, so you start everything from scratch with another method to find yourself obtaining new, even more dubious results, torturing the data even further, getting tired of that and finally settling on something "intermediately decent", which "probably makes sense", although you are not so sure any more and feel frustrated.

    I guess the life of a statistician was probably way simpler back in the days when you could run a couple of t-tests, or an F-test from a linear regression, and call it a day. In fact, it seems that many experimental (e.g. wetlab) scientists still live in that kind of world when it comes to analyzing their experimental results. The world of t-tests is cozy and safe. They don't get you frustrated. Unfortunately, t-tests can feel ad-hockish, because they force you to believe that something "is normally distributed". Also, in practice they are mainly used to confirm the obvious rather than discover something new from the data. A simple scatterplot will most often be better than a t-test as an analysis method. Hence, I am not a big fan of t-tests. However, I do have my own favourite statistical method, which always feels cozy and safe, and never gets me frustrated. I tend to apply it whenever I see a chance. It is the Fisher exact test, in the particular context of feature selection.

    My appreciation of it stems from my background in bioinformatics and some experience with motif detection in particular. Suppose you have measured the DNA sequences for a bunch of genes. What can you do to learn something new about the sequence structure from that data? One of your best bets is to first group your sequences according to some known criteria. Suppose you know from previous experiments that some of the genes are cancer-related whereas others are not. As soon as you have specified those groups, you can start making observations like the following: "It seems that 10 out of my 20 cancer-related genes have the subsequence GATGAG in their DNA code. The same sequence is present in only 5 out of 100 non-cancer-related ones. How probable would it be to obtain similar counts of GATGAG if the two groups were picked randomly?" If the probability of getting those counts at random is very low, then obviously there is something fishy about GATGAG and cancer - perhaps they are related. To compute this probability you will need the hypergeometric distribution, and the resulting test (i.e. the question "how probable is this situation in a random split?") is known as Fisher's exact test.
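    An observation like the one above is a one-liner to check. A sketch using scipy with the GATGAG counts from the example:

    from scipy.stats import fisher_exact

    # 10 of 20 cancer-related genes contain the motif, versus 5 of 100 others
    table = [[10, 10],    # cancer-related: with GATGAG, without
             [ 5, 95]]    # non-cancer-related: with GATGAG, without
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    print(p_value)        # how probable such a skewed split is at random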

    This simple logic (with a small addition of a multiple testing correction on top) has worked wonders for finding actually important short sequences in DNA. Of course, it is not limited to sequence search. One of our research group's most popular web tools uses the same approach to discover functional annotations that are "significantly overrepresented" in a given group of genes. The same approach can be used to construct decision trees, and in pretty much any other "supervised learning" situation where you have groups of objects and want to find binary features of those objects associated with the groups.

    Although in general the Fisher test is just one particular measure of association, it is, as I noted above, rather "cozy and comfortable". It does not force me to make any weird assumptions, there is no "ad-hoc" aspect to it, it is simple to compute and, most importantly, in my experience it nearly always produces "relevant" results.

    Words overrepresented in the speeches of Greek MPs

    A week ago Ilya, Alex and I happened to take part in a small data analysis hackathon, dedicated to the analysis of speech transcripts from the European Parliament. Somewhat analogously to DNA sequences, speeches can be grouped in various ways: you can group them by the speaker who gave them, by the country, gender or political party of that speaker, by the month or year when the speech was given, or by any combination of such groupings. The obvious "features" of a speech are words, which can be either present or not present in it. Once you view the problem this way, the task of finding group-specific words becomes self-evident, and the Fisher test is the natural solution to it. We implemented this idea and extracted "country-specific" and "time-specific" words from the speeches (other options were left out due to time constraints). As is usually the case with my favourite method, the obtained results look relevant, informative and, when shown in the form of a word cloud, fun. Check them out.

    The complete source code of the analysis scripts and the visualization application is available on Github.

     
