• Posted by Konstantin 15.03.2015 2 Comments

    Anyone who has had to deal with scientific literature must have encountered Postscript (".ps") files. Although the popularity of this format is gradually fading behind the overwhelming use of PDFs, you can still find .ps documents on some major research paper repositores, such as arxiv.org or citeseerx. Most people who happen to produce those .ps or .eps documents, do it using auxiliary tools, most commonly while typesetting their papers in LaTeX, or while preparing images for those papers using a vector graphics editor (e.g. Inkscape). As a result, Postscript tends to be regarded by the majority of its users as some random intermediate file format, akin to any of the myriad of other vector graphics formats.

    I find this situation unfortunate and unfair towards Postscript. After all, PS is not your ordinary vector graphics format. It is a fully-fledged Turing-complete programming language, that is well thought through and elegant in its own ways. If it were up to me, I would include a compulsory lecture on Postscript into any modern computer science curriculum. Let me briefly show you why.

    Stack-based programming

    Firstly, PostScript is perhaps the de-facto standard example of a proper purely stack-based language in the modern world. Other languages of this group are nowadays either dead, too simpletoo niche, or not practical. Like any stack language, it looks a bit unusual, yet it is simple to reason about and its complete specification is decently short. Let me show you some examples:

    2 2 add     % 2+2 (the two numbers are pushed to the stack,
                % then the add command pops them and replaces with
                % the resulting value 4)
    /x 2 def                  % x := 2 (push symbol "x", push value 2,
                              %         pop them and create a definition)
    /y x 2 add def            % y := x + 2 (you get the idea)
    (Hello!) show             % print "Hello!"
    x 0 gt {(Yes) show} if    % if (x > 0) print "Yes"

    Adding in a couple of commands that specify font parameters and current position on the page, we may write a perfectly valid PS file that would perform arithmetic operations, e.g:

    /Courier 10 selectfont   % Font we'll be using
    72 720 moveto            % Move cursor to position (72pt, 720pt)
                             % (0, 0) is the lower-left corner
    (Hello! 2+2=) show
    2 2 add                  % Compute 2+2
    ( ) cvs                  % Convert the number to a string.
                             % (First we had to provide a 1-character
                             % string as a buffer to store the result)
    show                     % Output "4"

    Computer graphics

    Postscript has all the basic facilities you'd expect from a programming language: numbers, strings, arrays, dictionaries, loops, conditionals, basic input/output. In addition, being primarily a 2D graphics language, it has all the standard graphics primitives. Here's a triangle, for example:

    newpath           % Define a triangle
        72 720 moveto
        172 720 lineto
        72 620 lineto
    closepath
    gsave             % Save current path
    10 setlinewidth   % Set stroke width
    stroke            % Stroke (destroys current path)
    grestore          % Restore saved path again
    0.5 setgray       % Set fill color
    fill              % Fill

    Postscript natively supports the standard graphics transformation stack:

    /triangle {       % Define a triangle of size 1 n the 0,0 frame
        newpath
            0 0 moveto
            1 0 lineto
            0 1 lineto
        closepath
        fill
    } def
    
    72 720 translate      % Move origin to 72, 720
    gsave                 % Push current graphics transform
        -90 rotate        % Draw a rotated triangle
        72 72 scale       % .. with 1in dimensions
        triangle
    grestore              % Restore back to non-scaled, non-rotated frame
    gsave
        100 0 translate   % Second triangle will be next to the first one
        32 32 scale       % .. smaller than the first one
        triangle          % .. and not rotated
    grestore

    Here is the result of the code above:

    Two triangles

    Two triangles

    The most common example of using a transformation stack is drawing fractals:

    /triangle {
        newpath
            0 0 moveto
            100 0 lineto
            0 -100 lineto
        closepath
        fill
    } def
    
    /sierpinski {
        dup 0 gt
        {
            1 sub
            gsave 0.5 0.5 scale dup sierpinski grestore
            gsave 50 0 translate 0.5 0.5 scale dup sierpinski grestore
            gsave 0 -50 translate 0.5 0.5 scale sierpinski grestore
        }
        { pop triangle }
        ifelse
    } def
    72 720 translate  % Move origin to 72, 720
    5 5 scale
    5 sierpinski
    Sierpinski triangle

    Sierpinski triangle

    With some more effort you can implement nonlinear dynamic system (Mandelbrot, Julia) fractals, IFS fractals, or even proper 3D raytracing in pure PostScript. Interestingly, some printers execute PostScript natively, which means all of those computations can happen directly on your printer. Moreover, it means that you can make a document that will make your printer print infinitely many pages. So far I could not find a printer that would work that way, though.

    System access

    Finally, it is important to note that PostScript has (usually read-only) access to the information on your system. This makes it possible to create documents, the content of which depends on the user that opens it or machine where they are opened or printed. For example, the document below will print "Hello, %username", where %username is your system username:

    /Courier 10 selectfont
    72 720 moveto
    (Hi, ) show
    (LOGNAME) getenv {} {(USERNAME) getenv pop} ifelse show
    (!) show

    I am sure, for most people, downloading a research paper from arxiv.org that would greet them by name, would probably seem creepy. Hence this is probably not the kind of functionality one would exploit with good intentions. Indeed, Dominique has an awesome paper that proposes a way in which paper reviewers could possibly be deanonymized by including user-specific typos in the document. Check out the demo with the source code.

    I guess this is, among other things, one of the reasons we are going to see less and less of Postscript files around. But none the less, this language is worth checking out, even if only once.

    Tags: , , , ,

  • Posted by Konstantin 09.05.2011 2 Comments

    I have recently realized that my HP 8440p laptop has a built-in "Qualcomm un2420 Broadband Module" device, also known as a "3G modem". For some reason no drivers were preinstalled for it on my system, and with the SIM card slot concealed behind the battery, it was not something I noticed immediately. Once the drivers were installed, the operating system had no problem recognizing the new "Mobile Broadband Connection" opportunity, and with the SIM card in the slot, I could connect to the Internet via 3G, yay.

    Knowing that there is more to mobile communication than Internet access, I was wondering whether I could do anything else, like voice calls or SMS. Unfortunately, my attempts of finding any reasonable software packages, which would open up the power of 3G to me at the click of a button, failed. Instead, however, I discovered that it is actually quite easy to communicate with the modem directly. It turns out you can control your shiny bleeding-edge 3.5G device by sending plain old AT commands to it over a serial port. This is the same protocol that the wired grandpa-modems have been speaking since the 70s and it is fun to see this language was kept along all the way into the wireless century.

    Let me show you how it works. Try to follow along — this is kinda fun.

    Finding your Modem

    If your computer does not have a built-in 3G modem, chances are high your garden variety cellphone does (not to mention smartphones, of course). If it is the case, then:

    • Switch on the Bluetooth receiver on your phone (for older Nokias this is usually in the "Settings -> Connectivity -> Bluetooth" menu).
    • On your computer, go to "Devices and Printers", click "Add a device", wait until your phone appears on the list, double click it and follow the instructions to establish the connection. (I'm talking about Windows 7 here, but the procedure should be similar for most modern OSs).
    • Once the computer recognizes your phone and installs the necessary drivers, it will appear as an icon in the "Devices" window. Double click it to open the "Properties" window, and make sure there is a "Standard Modem over Bluetooth link" function or something similar in the list.

      Cellphone Functions

      Cellphone Functions

    • Double-click that "modem" entry, a new properties window opens. Browse along in it, and find the COM port number that was assigned to the modem.

      Bluetooth modem at port COM9

      Bluetooth modem at port COM9

    If you do have a 3G modem bundled with your laptop (and you have the drivers installed), open the Device manager ("Control Panel -> Device Manager"), find the modem in the list, double click to open the "Properties" page, and browse to the "Modem" tab to find the COM port number.

    Laptop 3G modem in Device Manager

    Laptop 3G modem in Device Manager

    Connecting to the Modem

    Next thing - connect to the COM port. In Windows use PuTTY to do it. In Linux use minicom. Don't worry about the settings — the defaults should do.

    PuTTY connection dialog

    PuTTY connection dialog

    Once the connection starts, you will get a blank screen with nothing but a cursor. Try typing "AT+CGMI" followed by a <RETURN>. Note that, depending on the settings of your device and the terminal program, you might not see your letters being typed. If so, you will have to reconfigure the terminal (enable "local echo"). But for now, just type the command. You should get the name of the manufacturer in response. You can also get the word "ERROR" instead. This means that your modem is ready to talk to you, but it either does not support the "AT+CGMI" command OR requires you to enter the PIN code first. We'll get to it in a second. If you get no response at all, you must have connected to the wrong COM port.

    Terminal sessions with a Qualcomm (left) and Nokia (right) 3G modems

    Terminal sessions with a Qualcomm (left) and Nokia (right) 3G modems

    You can get more information about the device using the "AT+CGMM", "AT+CGMR", "AT+CGSN" commands. Try those.

    Authentication

    To do anything useful, you need to authenticate yourself by entering the PIN (if you use your cellphone over bluetooth, you most probably already entered it and no additional authentication is needed). You can check whether you need to enter PIN by using the command "AT+CPIN?" (note the question mark). If the response is "+CPIN: READY", your SIM card is already unlocked. Otherwise the response will probably be "+CPIN: SIM PIN", indicating that a PIN is expected to unlock the card. You enter the pin using the "AT+CPIN=<your pin>" command. Please note, that if you enter incorrect PIN three times, YOUR SIM CARD WILL BE BLOCKED (and you will have to go fetch your PUK code to unblock it), so be careful here.

    Entering the PIN

    Entering the PIN

    Doing Stuff

    Now the fun starts: you can try dialing, sending and receiving messages and do whatever the device lets you do. The (nonexhaustive) list of most interesting commands is available here. Not all of them will be supported by your device, though. For example, I found out that the laptop's 3G modem won't let me dial numbers, whilst this was not a problem for a cellphone connected over bluetooth (try the command "ATD<your number>;" (e.g. "ATD5550010;")). On the other hand, the 3G modem lets me list received messages using the "AT+CMGL" command, while the phone refused to do it.

    One useful command to know about is "AT+CUSD", which lets you send USSD messages (those "*1337#" codes) to the mobile service provider. For example, the prepaid SIM card I bought for my computer allows to buy a "daily internet ticket" (unlimited high-speed internet for 12 hours for 1 euro) via the "*135*78#" code. Here's how this can be done via terminal.

    Sending an USSD code and receiving an SMS

    Sending an USSD code and receiving an SMS

    We first send "AT+CUSD=1,*135*78#" command, which is equivalent to dialing "*135*78#" on the phone. The modem immediately shows us the operator's response ("You will shortly receive an SMS with information..."). We then list new SMS messages using the "AT+CMGL" command. There is one message, which is presented to us in the PDU encoding. A short visit to the online PDU decoder lets us decrypt the message - it simply says that the ticket is activated. Nice.

    Sending an SMS

    Finally, here's how you can send a "flash" SMS (i.e. the one which does not get saved at the receiver's phone by default and can thus easily confuse people. Try sending one of those at night - good fun). We start with an ATZ to reset the modem, just in case. Then we set message format to "text mode" using the "AT+CMGF=1" command (the alternative default is the "PDU mode", in which we would have to type SMS messages encoded in PDU). Then we set message parameters using the "AT+CSMP" command (the last 16 is responsible for the message being 'flash'). Finally, we start sending the message using the "AT+CMGS=<phone number>" command. We finish typing the message using <Ctrl+Z> and off it goes.

    Sending a "Flash" SMS

    Sending a "Flash" SMS

    For more details, refer to this tutorial or the corresponding specifications.

    All in all, it should be fairly easy to make a simple end-user interface for operating the 3G modem, it is strange that there is not much free software available which could provide this functionality. If you find one or decide to make one yourself, tell me.

    Tags: , , , ,

  • Posted by Konstantin 22.01.2009 2 Comments

    I bet most of you haven't heard of MDX, because it seems to be a technology that is difficult to stumble upon. It is not covered in university courses, not mentioned in the headlines of PC-magazine articles, and the corresponding google search returns about 20000 results, which makes it thousands (and even tens of millions) times less popular than most other related keywords. This is completely unfair. I believe MDX is compulsory knowledge for everyone who is educated in databases and data analysis enough to appreciate the virtues of both SQL queries and Excel-like spreadsheet processing. A famous quote by Alan Perlis says: "A language that doesn't affect the way you think about programming, is not worth knowing". In these terms, MDX is a language well worth knowing.

    MDX stands for Multidimensional Expressions. It is a language for specifying computations and performing queries on a multidimensional database, which effectively provides the computing power of spreadsheets in the form of a query language. Although it is not possible to explain all of MDX in a blog post, I'll try to show the gist of it in a couple of examples. I hope it will raise at least some interest in those, who are patient enough to read to the end. To understand MDX you need some experience with spreadsheets, so I'll assume you do and use the analogies with Excel.

    Multidimensional data. You use spreadsheets to work with tabular data, i.e. something like that:

    Tartu Tallinn
    Jan 21 -1.0 °C -2.1 °C
    Jan 22 -0.2 °C -0.2 °C
    Jan 23 0.3 °C -0.2 °C
    Average temperature

    This is a two-dimensional grid of numbers. Each point in the grid can be indexed by a tuple (Date, City). A multidimensional database is essentially the same grid but without the limitation of two dimensions. For instance, if there are several methods of measuring temperature, you can add a new dimension and index each cell by a tuple (Date, City, MeasurementMethod). And if you wish to keep both the average temperature and humidity, you just add a fourth dimension MeasureType to the database and store both, etc. Once your database becomes multidimensional it becomes impossible to display all of it as a two-dimensional table, so you will have to explore it by slices. For example, if the database did indeed contain 4 dimensions (Date, City, MeasurementMethod, MeasureType) then the above table would be a slice of it for (MeasureType = Temperature, MeasurementMethod = Usual). The MDX query corresponding to the slice would look as follows:

      select
        { City.Tartu, City.Tallinn } on columns,
        { Date.Jan.21, Date.Jan.22, Date.Jan.23 } on rows
      where (MeasureType.Temperature, MeasurementMethod.Usual)

    Cell calculations. The second thing you use spreadsheets for is to compute new cell values from the existing ones. For example, you might wish to augment the table above with a column showing the temperature difference between Tartu and Tallinn and a row showing the average temperature over three days:

    Tartu Tallinn Difference
    Jan 21 -1.0 °C -2.1 °C 1.1 °C
    Jan 22 -0.2 °C -0.2 °C 0.0 °C
    Jan 23 0.3 °C -0.2 °C 0.5 °C
    Average -0.3 °C -0.83 °C 0.53 °C
    Average temperature

    To do that in Excel you would have to enter a formula for each cell in the new column and row. In MDX you analogously write:

      create member City.Difference as (City.Tartu - City.Tallinn)
      create member Date.Jan.Average as 
             (Avg({ Date.Jan.21, Date.Jan.22, Date.Jan.23 }))

    Note that once you have defined the new members, they apply to any slice of your data. I.e., you have just defined the way to compute City.Difference and Date.Jan.Average for all existing measure types and measurement methods. Moreover, many of the useful aggregate computations are already predefined for you. For example, the MDX server would implicitly understand that Date.Jan denotes be the aggregation (e.g. average) of values over all the days of January, and Date is the aggregation of values over all the dates ever. Similarly, City denotes the aggregation over all cities. Thus, selecting the average temperature over all cities in January is a matter of requesting

      select
      where (Date.Jan, City, MeasureType.Temperature, 
             MeasurementMethod.Usual)

    Filters, orders and more. You often need to query the data for things like "cities with the highest average temperature in January", or request to "order days by the temperature difference between Tartu and Tallinn". This is where both the power and complexity of MDX becomes visible. The first query would look as follows:

      select
        Filter(City.Members, Date.Jan == Max(Date.Jan)) on columns
      where (MeasureType.Temperature, MeasurementMethod.Usual)

    Notice how much more expressive this is in comparison to the equivalent query you would have to come up with in SQL. Besides slicing, filtering and ordering, an MDX server can support lots of various generic data processing and analysis functions (here, the exact capabilities depend on the vendor). Thus, when properly tuned, even the tasks such as "selecting differentially expressed genes that play a significant role in a linear model for predicting cancer stage from microarray expression data" could become a matter of a single concise query.

    Pretty cool, don't you think?

    Tags: , , , , ,

  • Posted by Konstantin 13.12.2008 No Comments

    It is somewhat sad to see that the Scalable Vector Graphics (SVG) format, despite its considerable age and maturity, has not yet gained too much popularity in the internet, with a lot of Adobe Flash all over instead. Here are some points you should know about it, so that maybe you consider taking a couple of hours to get acquainted with it one day.

    1. SVG is an open source vector graphics format.
    2. SVG supports most of what you'd expect from a 2D graphics language: cubic splines, bezier curves, gradients, nested matrix transformations, reusable symbols, etc.
    3. SVG is XML-based and rather straightforward. If you need a picture with a line and two circles, you write a <line> tag and two <circle> tags:
      <svg xmlns="http://www.w3.org/2000/svg">
          <line x1="0" y1="0" x2="100" y2="100" 
                stroke-width="2" stroke="black"/>
          <circle cx="0" cy="0" r="50"/>
          <circle cx="100" cy="100" r="20" 
                  fill="red" stroke="black"/>
      </svg>
    4. Most vector graphics editors can write SVG. For example, Inkscape is one rather usable open-source software piece.
    5. SVG supports Javascript. Basically, if you know HTML and Javascript, you are ready to write SVG by hand, because SVG is also just XML + Javascript. This provides considerable freedom of expression.
    6. SVG can be conveniently embedded into HTML webpages and is supported out-of-the-box by most modern browsers.

    My personal interest towards SVG is related to the observation, that it seems very suitable for creating interactive data visualizations (charts, plots, graphs) right in the browser. And although the existing codebase devoted to these tasks can't be called just enormous, I'm sure it will grow and gain wider adoption. Don't miss it!

    Tags: , , , ,

  • Posted by Konstantin 24.10.2008 No Comments

    This week I had a chance to participate in a project meeting related to soft computing. A part of the meeting was a brainstorm-like session, dedicated to the generation of relevant topics for future research in the field. I was listening very carefully, writing everything down, and trying to generalize somewhat. By the end of the meeting my generalization attempts succeeded and now I can easily generate relevant topics in arbitrary numbers without any external help. In an act of openness, unprecedented elsewhere in scientific circles, I'm sharing this secret technology with you here.

    Fresh Topics in Soft Computing
    Generate more topics: Easy, Difficult

    For the curious: here is the source code.

    Tags: , , ,