• Posted by Konstantin 02.01.2024 No Comments

    This article has been cross-posted to Hackster.

    What is the best way to learn (or teach) introductory robotics? What could be the "hello world" project in this field, that would be simple for a beginner to get started with, complex enough to be exciting and qualify as a "robot", and deep enough to allow further creative extensions and experiments? Let us try solving this problem logically, step by step!

    First things first: what is a "robot"?

    Axiom number one: for anything to feel like a robot, it must have some moving parts. Ideally, it could have the freedom move around, and probably the easiest way to achieve that would be to equip it with two wheels. Could concepts other than a two-wheeler qualify as the simplest robot? Perhaps, but let us proceed with this one as a reasonably justified assumption.

    The wheels

    To drive the two upcoming wheels we will need two motors along with all that driver circuitry, ugh! But wait, what about those small continuous rotation servos! They are simple, cheap, and you just need to wire them to a single microcontroller pin. Some of them are even conveniently sold along with with a LEGO-style wheel!

    The microcontroller

    Rule number two: it is not a real robot, unless it has a programmable brain, like a microcontroller. But which microcontroller is the best for a beginner in year 2024? Some fifteen years ago the answer would certainly have the "Arduino" keyword in it. Nowadays, I'd say, the magic keyword is "CircuitPython". Programming just cannot get any simpler than plugging it into your computer and editing code.py on the newly appeared external drive. No need for "compiling", "flashing" or any of the "install these serial port drivers to fix obscure error messages" user experiences (especially valuable if you need to give a workshop at a school computer class). Moreover, the code to control a wheel looks as straightforward as motor.throttle = 0.5.

    OK, CircuitPython it is, but what specific CircuitPython-compatible microcontroller should we choose? We need something cheap and well-supported by the manufacturer and the community. The Raspberry Pi Pico would be a great match due to its popularity, price and feature set, except it is a bit too bulky (as will become clear below). Check out the Seeedstudio Xiao Series instead. The Xiao RP2040 is built upon the same chip as the Raspberry Pi Pico, is just as cheap, but has a more compact size. Interestingly, as there is a whole series of pin-compatible boards with the same shape, we can later try other models if we do not like this one. Moreover, we are not even locked into a single vendor, as the QT Py series from Adafruit is also sharing the same form factor. How cool is that? (I secretly hope this particular footprint would get more adoption by other manufacturers).

    The body

    OK, cool. We have our wheels and a microcontroller to steer them. We now need to build the body for our robot and wire things together somehow. What would be the simplest way to achieve that?

    Let us start with the wiring, as here we know an obviously correct answer - a breadboard. No simpler manual wiring technology has been invented so far, right? Which breadboard model and shape should we pick? This will depend on how we decide to build the frame for our robot, so let us put this question aside for a moment.

    How do we build the frame? The options seem infinite: we can design it in a CAD tool and laser cut it from acrylic sheets, print on a 3D printer, use some profile rails, wooden blocks, LEGO bricks,... just ask Google for some inspiration:

    Ugh, this starts to feel too complicated and possibly inaccessible to a normal schoolkid.... But wait, what is it there on the back of our breadboard? Is it sticky tape? Could we just... stick... our motors to it directly and use the breadboard itself as the frame?

    If we now try out a few breadboard models, we will quickly discover that the widespread, "mini-sized", 170-pin breadboard has the perfect size to host our two servos:

    By an extra lucky coincindence, the breadboard even has screw holes in the right places to properly attach the motors! If we now hot glue the servo connectors to the side as shown above, we can wire them comfortably, just as any other components on the breadboard itself. Finally, there is just enough space in the back to pack all the servo wires and stick a 3xAAA battery box, which happens to be enough to power both our servos and the Xiao RP2040:

    This picture shows the FS90R servos. These work just as well as the GeekServo ones shown in in the picture above.

    ... and thus, the BreadboardBot is born! Could a breadboard with two motors and a battery stuck to its back qualify as the simplest, lowest-tech ever robot platform? I think it could. Note that the total cost of this whole build, including the microcontroller is under $20 (under $15 if you order at the right discounts from the right places).

    But what can it do?

    Line following

    Rule number 3: it is not a real robot if it is not able to sense the environment and react to it somehow. Hence, we need some sensors, preferably ones that would help us steer the robot (wheels are our only "actuators" so far). What are the simplest steering techniques in introductory robotics? I'd say line following and obstacle avoidance!

    Let us do line following first then. Oh, but how do we attach the line tracking sensors, we should have built some complicated frame after all, right? Nope, by a yet another happy design coincidence, we can just plug them right here into our breadboard:

    Thanks to the magic of CircuitPython, all it takes now to teach our robot to (even if crudely) follow a line is just three lines of code (boilerplate definitions excluded):

    while True:
      servo_left.throttle = 0.1 * line_left.value
      servo_right.throttle = -0.1 * line_right.value

    Obstacle avoidance

    OK, a line follower implemented with a three-line algorithm is cool, can we do anything else? Sure, look how conveniently the ultrasonic distance sensor fits onto our breadboard:

    Add a few more lines of uncomplicated Python code and you get a shy robot that can follow a line but will also happily go roaming around your room, steering away from any walls or obstacles:

    Note how you can hear the sonar working in the video (not normally audible).

    ... and more

    There is still enough space on the breadboard to add a button and a buzzer:

    .. or splash some color with a Xiao RGB matrix:

    .. or plug in an OLED screen to give your BreadboardBot a funny animated face:
    The one on the image above also had the DHT11 sensor plugged into it, so it shows humidity and temperature while it is doing its line following. Why? Because it can!

    Tired of programming and just want a remote-controllable toy? Sure, plug in a HC06 bluetooth serial module, program one last time, and go steer the robot yourself with a smartphone:

    Other microcontroller boards, BLE, Wifi and FPV

    Xiao RP2040 is great, but it has no built-in wireless connectivity! Plug in its ESP32-based cousin Xiao ESP32S3 and you can steer your robot with a Bluetooth gamepad, or via Wifi, by opening a webpage served by the robot itself!
    If you have the Xiao ESP32S3 Sense (the version with a microphone and camera), that webpage can even show you the live first-person view feed of the robot's camera (conveniently positioned to look forward):

    In case you were wondering, the resulting FPV bot is not a very practical solution for spying on your friends, as the microcontroller heats up a lot and consumes the battery way too fast, but it is still a fun toy and an educational project.

    Finally, although the Xiao boards fit quite well with all other gadgets on the mini breadboard you are not limited by that choice either and can try plugging in any other appropriately-sized microcontroller board. Here is, for example, a BreadboardBot with an M5Stack Atom Lite running a line-follower program implemented as an ESPHome config (which you can flash over the air!).


    Conclusion

    The examples above are just the tip of the iceberg. Have any other sensors lying around from those "getting started with Arduino" kits that you did not find sufficiently exciting uses for so far? Maybe plugging them into the BreadboardBot is the chance to shine they were waiting for all this time! Teaching the robot to recognize familiar faces and follow them? Solving a maze? Following voice commands? Self-balancing? Programming coordinated movement of multiple robots? Interfacing the robot with your smart home or one of these fancy AI chatbots? Organizing competitions among multiple BreadboardBots? Vacuuming your desk? The sky (and your free time) is the limit here!

    Consider making one and spreading the joy of breadboard robotics!

    Detailed build instructions along with wiring diagrams and example code are available on the project's Github page. Contributions are warmly welcome!

    Tags: , , , , , , ,

  • Posted by Konstantin 21.10.2019 No Comments

    Can't help sharing this lovely example, illustrating the way Python interns small integers, the lower level use of the id function as well as the power of the ctypes module.

    import ctypes
    def deref(addr, type_):
        return ctypes.cast(addr, ctypes.POINTER(type_))
    
    deref(id(29), ctypes.c_int)[4] = 100
    > 29
    100
    
    > 29*100
    10000
    
    > 25+4
    100

    Note that depending on the version of Python the value of the integer may be stored at either offset [4] or [6].

    Tags: , , , ,

  • Posted by Konstantin 17.03.2018 3 Comments

    I have randomly stumbled upon a Quora question "Can you write a program for adding 10 numbers" yesterday. The existing answers competed in geeky humor and code golf, so I could not help adding another take on the problem.

    Can you write a program for adding 10 numbers?

    The question offers a great chance to illustrate how to properly develop software solutions to real-life problems such as this one.

    First things first - let us analyze the requirements posed by the customer. They are rather vague, as usual. It is not clear what “numbers” we need to add, where and how should these “numbers” come from, what is really meant under “adding”, what should we do with the result, what platform the software is supposed to be running on, what are the service guarantees, how many users are expected, etc.

    Of course, we do not want to discover that we misunderstood some of the requirements late in the development cycle, as this could potentially require us to re-do all of the work. To avoid such unpleasant surprises we should be planning for a general, solid, enterprise-grade solution to the problem. After a short meeting of the technical committee we decided to pick C# as the implementation platform. It is OS-independent and has many powerful features which should cover any possible future needs. For example, if the customer would decide to switch to a cluster-based, parallel implementation later along the way, we’d quickly have this base covered. Java could also be a nice alternative, but, according to the recent developer surveys, C# development pays more.

    The Architecture

    Let us start by modeling the problem on a higher level. The customer obviously needs to process (“add”) some data (“10 numbers”). Without getting into too much detail, this task can be modeled as follows:

    interface IInputProvider {}
    interface IOutput {}
    interface ISolution {
        IOutput add10(IInputProvider input);    
    }

    Note how we avoid specifying the actual sources of input and output yet. Indeed, we really don’t know where the “10 numbers” may be coming from in the future - these could be read from standard input, sent from the Internet, delivered by homing pigeons, or teleported via holographic technology of the future - all these options are easily supported by simply implementing IInputProvider appropriately.

    Of course, we need to do something about the output once we obtain it, even though the customer forgot to mention this part of the problem. This means we will also have to implement the following interface:

    interface IOutputConsumer {
        void consumeOutput(IOutput output);
    }

    And that is it - our general solution architecture! Let us start implementing it now.

    The Configuration

    The architecture we work with is completely abstract. An actual solution would need to provide implementations for the IInputProviderIOutputConsumer and ISolution interfaces. How do we specify which classes are implementing these interfaces? There are many possibilities - we could load this information from a database, for example, and create a dedicated administrative interface for managing the settings. For reasons of brevity, we’ll illustrate a simplistic XML-based factory method pattern.

    Namely, we shall describe the necessary implementations in the XML file config.xml as follows:

    <Config>
        <InputProvider class="Enterprise.NumberSequenceProvider"/>
        <OutputConsumer class="Enterprise.PeanoNumberPrinter"/>
        <Solution class="Enterprise.TenNumbersAddingSolution"/>
    </Config>

    A special SolutionFactory class can now load this configuration and create the necessary object instances. Here’s a prototype implementation:

    class SolutionFactory {
        private XDocument cfg;
        public SolutionFactory(string configFile) {
            cfg = XDocument.Load(configFile);
        }
        public IInputProvider GetInputProvider() {
            return Instantiate<IInputProvider>("InputProvider");
        }
        public IOutputConsumer GetOutputConsumer() {
            return Instantiate<IOutputConsumer>("OutputConsumer");
        }
        public ISolution GetSolution() {
            return Instantiate<ISolution>("Solution");
        }
        private T Instantiate<T>(string elementName) {
            var typeName = cfg.Root.Element(elementName)
                                   .Attribute("class").Value;
            return (T)Activator.CreateInstance(Type.GetType(typeName));
        }
    }

    Of course, in a real implementation we would also worry about specifying the XML Schema for our configuration file, and make sure it is possible to override the (currently hard-coded) “config.xml” file name with an arbitrary URI using command-line parameters or environment variables. In many real-life enterprise solutions in Java, for example, even the choice of the XML parsing library would need to be configured and initialized using its own factory pattern. I omit many of such (otherwise crucial) details for brevity here.

    I am also omitting the unit-tests, which, of course, should be covering every single method we are implementing.

    The Application

    Now that we have specified the architecture and implemented the configuration logic, let us put it all together into a working application. Thanks to our flexible design, the main application code is extremely short and concise:

    class Program {
        static void Main(string[] args) {
            var sf = new SolutionFactory("config.xml");
            var ip = sf.GetInputProvider();
            var oc = sf.GetOutputConsumer();
            var sol = sf.GetSolution();
            var op = sol.add10(ip);
            oc.consumeOutput(op);
        }
    }

    Amazing, right? Well, it does not really work yet, of course, because we still need to implement the core interfaces. However, at this point we may conclude the work of the senior architect and assign the remaining tasks of filling in the blanks to the the main engineering team.

    The Inputs and Outputs

    Now that we have set up the higher-level architecture, we may think a bit more specifically about the algorithm we plan to implement. Recall that we need to “add 10 numbers”. We don’t really know what these “numbers” should be - they could be real numbers, complex numbers, Roman numerals or whatnot, so we have to be careful and not rush into making strict assumptions yet. Let’s just say that a “number” is something that can be added to another number:

    interface INumber {
        INumber add(INumber other);
    }

    We’ll leave the implementation of this interface to our mathematicians on the team later on.

    At this step we can also probably make the assumption that our IInputProviderimplementation should somehow give access to ten different instances of an INumber. We don’t know how these instances are provided - in the worst case each of them may be obtained using a completely different method and at completely different times. Consequently, one possible template for an IInputProvider could be the following:

    interface ITenNumbersProvider: IInputProvider {
        INumber GetNumber1();
        INumber GetNumber2();
        INumber GetNumber3();
        INumber GetNumber4();
        INumber GetNumber5();
        INumber GetNumber6();
        INumber GetNumber7();
        INumber GetNumber8();
        INumber GetNumber9();
        INumber GetNumber10();
    }

    Note how, by avoiding the use of array indexing, we force the compiler to require that any implementation of our ITenNumbersProvider interface indeed provides exactly ten numbers. For brevity, however, let us refactor this design a bit:

    enum NumberOfANumber {
        ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN
    }
    interface ITenNumbersProvider: IInputProvider {
        INumber GetNumber(NumberOfANumber noan);
    }

    By listing the identities of our “numbers” in an enum we still get some level of compile-time safety, although it is not as strong any more, because enum is, internally, just an integer. However, we god rid of unnecessary repetitions, which is a good thing. Refactoring is an important aspect of enterprise software development, you see.

    The senior architect looked at the proposed interface at one of our regular daily stand-ups, and was concerned with the chosen design. “Your interface assumes you can provide immediate access to any of the ten numbers”, he said. But what if the numbers cannot be provided simultaneously and will be arriving at unpredictable points in time? If this were the case, an event-driven design would be much more appropriate:

    delegate void NumberHandler(NumberOfANumber id, INumber n);
     
    interface IAsynchronousInputProvider: IInputProvider {
        void AddNumberListener(NumberHandler handler);
    }

    The adding subsystem would then simply subscribe to receive events about the incoming numbers and handle them as they come in.

    “This is all good and nice”, responded the mathematician, “but for efficient implementation of the addition algorithm we might need to have all ten numbers available at the same time”. “Ah, software design 101”, says the senior architect. We simply install an adapter class. It would pool the incoming data until we have all of it, thus converting the IAsynchronousInputProvider, used for feeding the data, into an ITenNumbersProvider, needed by the mathematician:

    class SyncronizationAdapter: ITenNumbersProvider {
       private Dictionary<NumberOfANumber, INumber> nums;
       private ManualResetEvent allDataAvailableEvent;
     
       public SynchronizationAdapter(IAsynchronousInputProvider ainput){
           nums = new Dictionary<NumberOfANumber, INumber>();
           allDataAvailableEvent = new ManualResetEvent(false);
           ainput.AddNumberListener(this.HandleIncomingNumber);
       }
       private void HandleIncomingNumber(NumberOfANumber id, INumber n){
           nums[id] = n;
           if (Enum.GetValues(typeof(NumberOfANumber))
                   .Cast<NumberOfANumber>()
                   .All(k => nums.ContainsKey(k)))
                allDataAvailableEvent.Set();
       }
       public INumber GetNumber(NumberOfANumber noan) {
           allDataAvailableEvent.WaitOne();
           return nums[noan];
       }
    }

    Now the mathematician can work on his addition logic without having to know anything about the way the numbers are coming in. Convenient, isn’t it?

    Note that we are still only providing the input interface specification (along with an adapter) here. The actual implementation has to wait until our mathematicians come up with an implementation of INumber and the data engineers decide on how to obtain ten of these in the most optimal way.

    But what about IOutput? Let us assume that we expect to output a single number. This means that INumber must itself already be an instance of IOutput:

    interface INumber: IOutput {
       INumber add(INumber other);
    }

    No need to implement anything, we just add an interface tag to INumber! See how object-oriented design techniques allow us to save development time!

    The Order of Addition

    OK, so we now have a concept of an INumber which has a (binary) addition operation defined, an ITenNumbersProvider which can provide ten INumber instances (conveniently abstracting away the IAsynchrhonousInputProvider which actually obtains the numbers), and our goal is to add them up to get an IOutput which is itself an INumber. Sounds easy, right? Not so fast! How exactly are we going to add these numbers? After all, maybe in some cases adding ((a+b)+c)+d)… can be less efficient or precise than (a+(b+(c+(d…. Or maybe the optimal addition strategy is to start from the middle and then add numbers in some order? There do exist nontrivial ways to add up numbers, you know. To accommodate for any possible options in the future (so that we wouldn’t have to rewrite the code unnecessarily), we should design our solution in a way that would let us switch our addition strategy easily, should we discover a better algorithm. One way to do it is by abstracting the implementation behind the following interface:

    interface IAdditionStrategy {
       INumber fold(Func<NumberOfANumber, INumber> elements,
                    Func<INumber, INumber, INumber> op); 
    }

    You see, it is essentially a functor, which gets a way to access our set of numbers (via an accessor function) along with a binary operator “op”, and “folds” this operator along the number set in any way it deems necessary. This particular piece was designed by Harry, who is a huge fan of functional programming. He was somewhat disappointed when we decided not to implement everything in Haskell. Now he can show how everyone was wrong. Indeed, the IAdditionStrategy is a core element of our design, after all, and it happens to look like a fold-functor which takes functions as inputs! “I told you we had to go with Haskell!”, says Harry! It would allow us to implement all of our core functionality with a much higher level of polymorphism than that of a simplistic C# interface!

    The Solution Logic

    So, if we are provided with the ten numbers via ITenNumbersProvider and an addition strategy via IAdditionStrategy, the implementation of the solution becomes a very simple matter:

    class TenNumbersAddingSolution: ISolution {
       private IAdditionStrategy strategy;
       public TenNumbersAddingSolution() {
           strategy = ...
       }
       public IOutput add10(IInputProvider input) {
           var tenNumbers = new SynchronizationAdapter(
                          (IAsynchronousInputProvider)input);
           return strategy.fold(i => tenNumbers.GetNumber(i), 
                                (x,y) => x.add(y));
       }
    }

    We still need to specify where to take the implementation of the IAdditionStrategy from, though. This would be a good place to refactor our code by introducing a dependency injection configuration framework such as the Autofac library. However, to keep this text as short as possible, I am forced to omit this step. Let us simply add the “Strategy” field to our current config.xml as follows:

    <Config>
        ...
        <Solution class="Enterprise.TenNumbersAddingSolution">
            <Strategy class="Enterprise.AdditionStrategy"/>
        </Solution>
    </Config>

    We could now load this configuration setting from the solution class:

        ...
        public TenNumbersAddingSolution() {
            var cfg = XDocument.Load("config.xml");
            var typeName = cfg.Root
                   .Element("Solution")
                   .Element("Strategy")
                   .Attribute("class").Value;
            strategy = (IAdditionStrategy)Activator
                   .CreateInstance(Type.GetType(typeName));
        }
        ...

    And voilà, we have our solution logic in place. We still need to implement INumberIAdditionStrategyITenNumbersProvider and IOutputConsumer, though. These are the lowest-level tasks that will force us to make the most specific decisions and thus determine the actual shape of our final product. These will be done by the most expert engineers and mathematicians, who understand how things actually work inside.

    The Numbers

    How should we implement our numbers? As this was not specified, we should probably start with the simplest possible option. One of the most basic number systems from the mathematician’s point of view is that of Peano natural numbers. It is also quite simple to implement, so let’s go for it:

    class PeanoInteger: INumber {
        public PeanoInteger Prev { get; private set; }
        public PeanoInteger(PeanoInteger prev) { Prev = prev; }
        public INumber add(INumber b) {
            if (b == null) return this;
            else return new PeanoInteger(this)
                    .add(((PeanoInteger)b).Prev);
        }
    }

    Let us have IOutputConsumer print out the given Peano integer as a sequence of “1”s to the console:

    class PeanoNumberPrinter: IOutputConsumer {
        public void consumeOutput(IOutput p) {
            for (var x = (PeanoInteger)p; x != null; x = x.Prev)
                 Console.Write("1");
            Console.WriteLine();
        }
    }

    Finally, our prototype IAdditionStrategy will be adding the numbers left to right. We shall leave the option of considering other strategies for later development iterations.

    class AdditionStrategy: IAdditionStrategy {
        public INumber fold(Func<NumberOfANumber, INumber> elements,
                            Func<INumber, INumber, INumber> op) {
           return Enum.GetValues(typeof(NumberOfANumber))
                      .Cast<NumberOfANumber>()
                      .Select(elements).Aggregate(op);
        }
    }

    Take a moment to contemplate the beautiful abstraction of this functional method once again. Harry’s work, no doubt!

    The Input Provider

    The only remaining piece of the puzzle is the source of the numbers, i.e. the IAsynchronousInputProvider interface. Its implementation is a fairly arbitrary choice at this point - most probably the customer will want to customize it later, but for the purposes of our MVP we shall implement a simple sequential asynchronous generator of Peano numbers {1, 2, 3, …, 10}:

    class NumberSequenceProvider: IAsynchronousInputProvider {
        private event NumberHandler handler;
        private ManualResetEvent handlerAvailable;
     
        public NumberSequenceProvider() {
            handlerAvailable = new ManualResetEvent(false);
            new Thread(ProduceNumbers).Start();
        }
        public void AddNumberListener(NumberHandler nh) {
            handler += nh;
            handlerAvailable.Set();
        }
        private void ProduceNumbers() {
            handlerAvailable.WaitOne();
            PeanoInteger pi = null;
            foreach (var v in Enum.GetValues(typeof(NumberOfANumber))
                                  .Cast<NumberOfANumber>()) {
                    pi = new PeanoInteger(pi);
                    handler(v, pi);
            }
        }
    }

    Note that we have to be careful to not start publishing the inputs before the number processing subsystem attaches to the input producer. To achieve that we rely on the event semaphore synchronization primitive. At this point we can clearly see the benefit of choosing a powerful, enterprise-grade platform from the start! Semaphores would look much clumsier in Haskell, don’t you think, Harry? (Harry disagrees)

    So here we are - we have a solid, enterprise-grade, asynchronous, configurable implementation for an abstractly defined addition of abstractly defined numbers, using an abstract input-output mechanism.

    $> dotnet run
    1111111111111111111111111111111111111111111111111111111

    We do need some more months to ensure full test coverage, update our numerous UML diagrams, write documentation for users and API docs for developers, work on packaging and installers for various platforms, arrange marketing and sales for the project (logo, website, Facebook page, customer relations, all that, you know), and attract investors. Investors could then propose to pivot the product into a blockchain-based, distributed solution. Luckily, thanks to our rock solid design abstractions, this would all boil down to reimplementing just a few of the lower-level interfaces!

    Software engineering is fun, isn’t it?

    The source code for the developed solution is available here.

    Tags: , , , , ,

  • Posted by Konstantin 27.07.2016 3 Comments

    While writing the previous post I was thinking of coming up with a small fun illustration for Aframe. I first heard about AFrame at the recent European Innovation Academy - a team-project-based entrepreneurship summer school. The team called MemVee was aiming to develop an AFrame-based site which would allow students to design and view interactive "Memory Palaces" - three-dimensional spaces with various objects related to their current study topics, organized in a way that simplifies remembering things for visual learners. Although I have never viewed a "Memory Palace" as something beyond a fantastic concept from a Sherlock Holmes TV episode, I am a visual learner myself and understand the importance of such illustrations. In fact, I try to use and preach graphical references in my teaching practice whenever I find the time and opportunity:

    • In this lecture the concept of a desk is used as a visual metaphor of "structuring the information" as well as to provide an outline to the talk.
    • Here and here an imaginary geographical map is used in a similar context.
    • For the computer graphics course I had to develop some "slides" as small OpenGL apps for visualizing the concepts during the lecture. This has been later taken to extreme heights in the practical materials designed by Raimond-Hendrik, who went on to give this course (alongside with a seminar) in the following years. Unfortunately the materials are closed for non-participants (yet I still hope they will be opened some day, do you read this, Raimond?), but the point is that every single notion has a tiny WebGL applet made to illustrate it interactively.
    • Once I tried to make a short talk about computer graphics, where the slides would be positioned on the walls of a 3D maze, so that to show them I'd have to "walk through the maze", like in a tiny first-person shooter game. Although this kind of visualization was not at all useful as a learning aid (as it did not structure anything at all), it none the less looked cool and was very well appreciated by the younger audience of the talk, to whom it was aimed at.

    I have lost the sources of that last presentation to a computer error and decided to recreate a similar "maze with slides" with AFrame. The night was long and I got sucked into the process to the point of making an automated tool. You upload your slides, and it generates a random maze with your slides hanging on the walls. It is utterly useless, but the domain name "slideamaze.com" was free and I could not resist the pun. [Update from 2018 - the domain expired and the project migrated to slideamaze.ing.ee.]

    Check it out. If you are into programming-related procrastination, try saving the "mazes" generated by the tool on your computer and editing the A-frame code to, say, add monsters or other fun educational tools into the maze.

    Tags: , , , , , ,

  • Posted by Konstantin 18.06.2015 No Comments

    The developments of proper GPU-based implementations of neural network training methods in the recent years have lead to a steady growth of exciting practical examples of their potential. Among others, the topic of face recognition (not to be confused with face detection) is on the steady rise. Some 5 years ago or so, decent face recognition tools were limited to Google Picasa and Facebook, some research labs and a few commercial products, often branded with the word "Biometrics" (that somehow seems to grow out of fashion nowadays).

    Today things are different. Given a decently large labeled dataset, some knowledge of machine learning, familiarity with the GPU-based "deep learning" tools, and sufficient perseverence, one can design a reasonably impressive face recognizer system in a matter of several weeks or months at most. Consequently, now is the time when we can see recognition-based startups receive millions in funding. Just some months ago Microsoft came out with their own public face detection and recognition API (which got its fair share of publicity in the form of a funny age-guessing tool).

    The Hype Cycle

    The Hype Cycle

    Hence, the growth in popularity and use of face recognition is apparent. Given that the initially overinflated hype around the whole deep learning buzzword seems to have more-or-less settled down to reality, this time the growth is realistic. We are probably on the "enlightenment" segment of the hype cycle here, very close to reaching actual productivity (which is not without issues, though).

    Tambet and Ardi of our institute's neuroscience research group seemed to have caught the noospheric trend and are working on a neural-network based face recognizer with the initial aim of using it to sort and organize historical photos from the Estonian National Archives.

    During a Skype University Hackathon, which happened in April I had the chance to join forces with them to present their efforts in the form of a fun public web app. The idea was to let people search for similar faces in the Estonian archive photos. The resulting site was called "teisik.ee" ("doppelganger" in Estonian). Although it does not seem to exactly fulfil the "doppelganger finding" purpose, it does manage to identify persons known to the system from the database surprisingly well.

     

    Output from teisik.ee

    Output from teisik.ee

    Having observed that finding matches to celebrities, even if they are not perfect matches, is entertaining in its own way, we also managed to put up a second version of the service (all within that same weekend!), codenamed celebritymatch.me. This app lets you search the dataset of celebrity faces for those which are apparently similar (according to the opinion of our neural network, at least). Try it, it is not perfect, but rather fun.

    The implementation of the recognition system is rather straightforward for anyone who knows what a convolutional network is and otherwise pretty impossible to grasp in full, hence I won't go into much technical detail. It is implemented using Caffe and consists of three consequtive convolutional layers (with ReLU and max-pooling), followed by a fully-connected hidden layer, which is then fully-connected to a softmax classification output layer. The outputs of the penultimate layer (of size 64) are used as the feature representation of the face. Those feature representations are then used to find Euclidean-distance-wise nearest neighbors in the database of faces. The future plans are to later apply the probably smarter FaceNet approach to network training for the same use case. The webapp is done using Flask.

    Right after the hackathon Tambet was invited to give an interview about the project on Estonian television. If you understand Estonian, check it out, it is very good.

     

    Tags: , , , , , ,

  • Posted by Konstantin 21.05.2015 No Comments

    Here's a curious quote from Alan Turing's famous paper from 1950:

    Overwhelming evidence

    Makes you appreciate how seriously one person's wishful thinking, coupled with dedication and publicity skills, may sometimes affect the scientific world.

     

    Tags: , , , ,

  • Posted by Konstantin 12.04.2015 1 Comment
    The first axiom of human bananology

    There is a popular claim that "human DNA is 50% similar to the DNA of a banana", which is often cited in the context of "science popularization" as well as in the various "OMG did you know that" articles. It sounds funny, scientific and "plausible", hence I've seen many smart people repeat it, perhaps as a joke, without giving it too much thought. It is worth giving a thought, though.

    Note that depending on how you phrase the statement, it may imply different things, some of them could be more, and some less exciting. The following examples are completely different in their meaning:

    1. If you change 50% of human DNA you can obtain the DNA of a banana.
    2. 50% of human DNA nucleotides are present in the DNA of a banana.
    3. 50% of human's DNA regions have approximate matches in the banana DNA (or vice versa, which would be a different statement)
    4. 50% of human genes have orthologs among banana genes (or vice versa).

    The first one is obviously false, due to the fact that the total length of the human DNA is about 10 times that of the banana. You could include the whole banana sequence verbatim into a human genome, and it would only make 10% of it. The second one is also false, because, strictly speaking, not 50% but all 100% of human DNA nucleotides are also present in the DNA of a banana. Indeed, any two organisms on Earth have their DNA composed as a sequence of exactly the same four nucleotides. Moreover, if you start comparing random positions of two random DNA's, you will get a match once every four attempts by pure chance. There's a 25% basepair-wise similarity of any DNA to a random sequence!

    The last two (or four, if you include the "reversed" versions) claims are more informative. In fact, claim #4 is probably the most interesting and is what could be meant if the presumed "50% similarity" was indeed ever found. Given the wide availability of genomic data, this claim be checked to some extent by anyone with a computer and a desire to finally make sure, how much of a banana he is, after all. Let us do it.

    What proportion of human genes could be very similar to banana genes?

    Although there is a lot of data about gene orthology among various organisms, as far as humans and bananas are concerned, I could not find any proper precomputed alignments. Creating a full-genome alignment for two organisms from scratch is a procedure way too tedious for a single Sunday's evening, so let us try a simplified measurement approach instead. Consider all possible 20-nucleotide reads taken from the gene-associated regions in the reference human genome. We shall say that a human gene "is somewhat bananas" if at least 5% of its 20-bp reads match somewhere on the banana genome. Given this definition, we shall ask: what proportion of the known human genes "are somewhat bananas"?

    At this point some passionate readers could argue that, for example, 20-nucleotide reads are not long/short enough for the purposes of this question, or that the cutoff of 5% is too low or too high, or that approximate matching should be used instead along with some proper string alignment techniques, etc. As noted, we shall leave those aspects to dedicated researchers in the field of human bananology.

    The computation took about an hour to run and came back with the following conclusion:

    Only 33 out of the 24624 human genes (of at least 1000bp in length) are "somewhat bananas".

    In other words, no, you are not "50% similar" to a banana according to any simple definition of such similarity. Not even 1% similar! Of course, there could still be means to torture the data and squeeze the "50%" value out, but those must be some quite nontrivial means and the interpretation of the resulting similarity would be far from straightforward.

    Tags: , , ,

  • Posted by Konstantin 15.03.2015 2 Comments

    Anyone who has had to deal with scientific literature must have encountered Postscript (".ps") files. Although the popularity of this format is gradually fading behind the overwhelming use of PDFs, you can still find .ps documents on some major research paper repositores, such as arxiv.org or citeseerx. Most people who happen to produce those .ps or .eps documents, do it using auxiliary tools, most commonly while typesetting their papers in LaTeX, or while preparing images for those papers using a vector graphics editor (e.g. Inkscape). As a result, Postscript tends to be regarded by the majority of its users as some random intermediate file format, akin to any of the myriad of other vector graphics formats.

    I find this situation unfortunate and unfair towards Postscript. After all, PS is not your ordinary vector graphics format. It is a fully-fledged Turing-complete programming language, that is well thought through and elegant in its own ways. If it were up to me, I would include a compulsory lecture on Postscript into any modern computer science curriculum. Let me briefly show you why.

    Stack-based programming

    Firstly, PostScript is perhaps the de-facto standard example of a proper purely stack-based language in the modern world. Other languages of this group are nowadays either dead, too simpletoo niche, or not practical. Like any stack language, it looks a bit unusual, yet it is simple to reason about and its complete specification is decently short. Let me show you some examples:

    2 2 add     % 2+2 (the two numbers are pushed to the stack,
                % then the add command pops them and replaces with
                % the resulting value 4)
    /x 2 def                  % x := 2 (push symbol "x", push value 2,
                              %         pop them and create a definition)
    /y x 2 add def            % y := x + 2 (you get the idea)
    (Hello!) show             % print "Hello!"
    x 0 gt {(Yes) show} if    % if (x > 0) print "Yes"

    Adding in a couple of commands that specify font parameters and current position on the page, we may write a perfectly valid PS file that would perform arithmetic operations, e.g:

    /Courier 10 selectfont   % Font we'll be using
    72 720 moveto            % Move cursor to position (72pt, 720pt)
                             % (0, 0) is the lower-left corner
    (Hello! 2+2=) show
    2 2 add                  % Compute 2+2
    ( ) cvs                  % Convert the number to a string.
                             % (First we had to provide a 1-character
                             % string as a buffer to store the result)
    show                     % Output "4"

    Computer graphics

    Postscript has all the basic facilities you'd expect from a programming language: numbers, strings, arrays, dictionaries, loops, conditionals, basic input/output. In addition, being primarily a 2D graphics language, it has all the standard graphics primitives. Here's a triangle, for example:

    newpath           % Define a triangle
        72 720 moveto
        172 720 lineto
        72 620 lineto
    closepath
    gsave             % Save current path
    10 setlinewidth   % Set stroke width
    stroke            % Stroke (destroys current path)
    grestore          % Restore saved path again
    0.5 setgray       % Set fill color
    fill              % Fill

    Postscript natively supports the standard graphics transformation stack:

    /triangle {       % Define a triangle of size 1 n the 0,0 frame
        newpath
            0 0 moveto
            1 0 lineto
            0 1 lineto
        closepath
        fill
    } def
    
    72 720 translate      % Move origin to 72, 720
    gsave                 % Push current graphics transform
        -90 rotate        % Draw a rotated triangle
        72 72 scale       % .. with 1in dimensions
        triangle
    grestore              % Restore back to non-scaled, non-rotated frame
    gsave
        100 0 translate   % Second triangle will be next to the first one
        32 32 scale       % .. smaller than the first one
        triangle          % .. and not rotated
    grestore

    Here is the result of the code above:

    Two triangles

    Two triangles

    The most common example of using a transformation stack is drawing fractals:

    /triangle {
        newpath
            0 0 moveto
            100 0 lineto
            0 -100 lineto
        closepath
        fill
    } def
    
    /sierpinski {
        dup 0 gt
        {
            1 sub
            gsave 0.5 0.5 scale dup sierpinski grestore
            gsave 50 0 translate 0.5 0.5 scale dup sierpinski grestore
            gsave 0 -50 translate 0.5 0.5 scale sierpinski grestore
        }
        { pop triangle }
        ifelse
    } def
    72 720 translate  % Move origin to 72, 720
    5 5 scale
    5 sierpinski
    Sierpinski triangle

    Sierpinski triangle

    With some more effort you can implement nonlinear dynamic system (Mandelbrot, Julia) fractals, IFS fractals, or even proper 3D raytracing in pure PostScript. Interestingly, some printers execute PostScript natively, which means all of those computations can happen directly on your printer. Moreover, it means that you can make a document that will make your printer print infinitely many pages. So far I could not find a printer that would work that way, though.

    System access

    Finally, it is important to note that PostScript has (usually read-only) access to the information on your system. This makes it possible to create documents, the content of which depends on the user that opens it or machine where they are opened or printed. For example, the document below will print "Hello, %username", where %username is your system username:

    /Courier 10 selectfont
    72 720 moveto
    (Hi, ) show
    (LOGNAME) getenv {} {(USERNAME) getenv pop} ifelse show
    (!) show

    I am sure, for most people, downloading a research paper from arxiv.org that would greet them by name, would probably seem creepy. Hence this is probably not the kind of functionality one would exploit with good intentions. Indeed, Dominique has an awesome paper that proposes a way in which paper reviewers could possibly be deanonymized by including user-specific typos in the document. Check out the demo with the source code.

    I guess this is, among other things, one of the reasons we are going to see less and less of Postscript files around. But none the less, this language is worth checking out, even if only once.

    Tags: , , , ,

  • Posted by Konstantin 09.03.2015 No Comments

    Playing cards are a great tool for modeling, popularizing and explaining various mathematical and algorithmic notions. Indeed, a deck of cards is a straightforward example for a finite set, a discrete distribution or a character string. Shuffling and dealing cards represents random sampling operations. A card hand denotes information possessed by a party. Turning card face down or face up looks a lot like bit flipping. Finally, card game rules describe an instance of a particular algorithm or a protocol. I am pretty sure that for most concepts in maths or computer science you can find some card game or a card trick that is directly related to it. Here are some examples of how you can simulate a simple computing machineillustrate inductive reasoning or explain map-reduce, in particular.

    Cryptographers seem to be especially keen towards playing cards. Some cryptographic primitives could have been inspired by them. There are decently secure ciphers built upon a deck of cards. Finally, there are a couple of very enlightening card-based illustrations of such nontrivial cryptographic concepts as zero-knowledge proofs and voting protocols. The recent course on secure two-party computation given by abhi shelat at the last week's EWSCS extended my personal collection of such illustrations with another awesome example — a secure two-party protocol for computing the AND function. As I could not find a description of this protocol anywhere in the internet (and abhi did not know who is the author), I thought it was worth being written up in a blog post here (along with a small modification of my own).

    The Tinder Game

    Consider the situation, where Alice and Bob want to find out, whether they are both interested in each other (just as if they were both users of the Tinder app). More formally, suppose that Alice has her private opinion about Bob represented as a single bit (where "0" means "not interested" and "1" means "interested"). Equivalently, Bob has his private opinion about Alice represented in the same way. They need to find out whether both bits are "1". However, if it is not the case, they would like to keep their opinions private. For example, if Alice is not interested in Bob, Bob would prefer Alice not to know that he is all over her. Because, you know, opinion asymmetry may lead to awkward social dynamics when disclosed, at least among college students.

    The following simple card game happens to solve their problem. Take five cards, so that three of them are red and two are black. The cards of one color must be indistinguishable from each other (e.g. you can't simply take three different diamonds for the reds). Deal one black and one red card to Alice, one black and one red card to Bob. Put the remaining red card on the table face up.

    Initial table configuration

    Initial table configuration

    Now Alice will put her two cards face down to the left of the open card, and Bob will put his two cards to the right of the open card:

    Alice and Bob played

    Alice and Bob played

    The rules for Alice and Bob are simple: if you like the other person, put your red card next to the central red card. Otherwise, put your black card next to the central one. For example, if both Alice and Bob like each other, they would play their cards as follows (albeit still face down):

    What Alice and Bob would play if they both liked each other

    What Alice and Bob would play if they both liked each other

    Now the middle card is also turned face down, and all five cards are collected, preserving their order:

    Five cards collected, preserving order

    Five cards collected, preserving order

    Next, Alice cuts this five-card deck, moving some number of cards from the top to the bottom (Bob should not know exactly how many). Then Bob cuts the deck (also making sure that Alice does not know how many cards he cuts). The result is equivalent to applying some cyclic rotation to the five card sequence yet neither Bob nor Alice knows how many cards were shifted in total.

    The five cards can now be opened by dealing them in order onto the table to find out the result. If there happen to be three red cards one after another in a row or two black cards one after another, Alice and Bob both voted "yes". Here is one example of such a situation. It is easy to see that it is a cyclic shift of the configuration with three red aces in the middle shown above.

    Both voted "yes"

    Both voted "yes"

    Otherwise, if neither three red aces nor two black aces are side by side, we may conclude that one or both of the players voted "no". However, there is absolutely no way to find out anything more specific than that. Here is an example:

    No mutual affection (or no affection at all)

    No mutual affection (or no affection at all)

    Obviously, this result could not be obtained as a cyclic shift of the configuration with three aces clumped together. On the other hand, it could have been obtained as a cyclic shift from any of the three other alternatives ("yes-no", "no-no" and "no-yes"), hence if Alice voted "no" she will have no way of figuring out what was Bob's vote. Thus, playing cards along with a cryptographic mindset helped Alice and Bob discover their mutual affection or the lack of it without the risk of awkward situations. Isn't it great?

    Throwing in a Zero-Knowledge Proof

    There are various ways Alice or Bob can try to "break" this protocol while still trying to claim honesty. One possible example is shown below:

    Alice is trying to cheat

    Alice is trying to cheat

    Do you see what happened? Alice has put her ace upside down, which will later allow her to understand what was Bob's move (yet she can easily pretend that turning a card upside down was an honest mistake). Although the problem is easily solved by picking a deck with asymmetric backs, for the sake of example, let us assume that such a solution is somewhy unsuitable. Perhaps there are no requisite decks at Alice and Bob's disposal, or they need to have symmetric backs for some other reasons. This offers a nice possibility for us to practice even more playing card cryptography and try to secure the original algorithm from such "attacks" using a small imitation of a zero-knowledge proof for turn correctness.

    Try solving it yourself before proceeding.

    Read more...

    Tags: , , , , , ,

  • Posted by Konstantin 22.01.2015 38 Comments

    Update from year 2017: The tool described in this post DOES NOT WORK with recent versions of Skype. Either these versions stopped saving removed messages altogether, or they are doing it in a novel manner not recognized by the tool.

    In other words - you would only recover "removed" messages if you are running older version of Skype (or these messages were sent at the time you were using that older version).

    Yesterday I happened to attend a discussion about the security and privacy of information stored locally in Skype and Thunderbird profiles. It turns out, if you obtain a person's Skype profile directory, you will be able to log in as him without the need to know the password. In addition, Dominique made a remark that Skype does not really delete the messages that are marked as "removed" in the chat window. I found that curious and decided to take a closer look.

    Indeed, there is a bunch of *.dat files in the chatsync subdirectory of the Skype's profile, which preserve all messages along with all their edits or deletions. Unfortunately, the *.dat files are in some undocumented binary format, and the only tool I found for reading those lacks in features. However, hacking up a small Python parser according to what is known about the format, along with a minimalistic GUI is a single evening's exercise, and I happened to be in the mood for some random coding.

    Skype Chatsync Viewer

    Skype Chatsync Viewer

    Now, if you want to check out what was that message you or your conversation partner wrote before it was edited or deleted, this package will help. If you are not keen on installing Python packages, here is a standalone Windows executable.

    Tags: , , , , , , ,