• Posted by Konstantin 10.01.2012 No Comments

    It is not uncommon for a long-running scientific study or experiment to produce results which are, at best, uninteresting. The measured effect may be too weak to be reported on convincingly given the data at hand. Nonetheless, resources have been put into it, many man-months have been spent, and thus a paper must be published. The researcher must therefore present his results in a way convincing enough for the reviewers to be lulled into acceptance.

    The following are the three best methods for doing that (and I have seen those being used in practice). Next time you read someone's paper (or write your own), keep them in mind.

    1. Use an irrelevant (and preferably strict) hypothesis test.
      Suppose you want to show that a set of measurements in one group differs from the set of measurements in the other group. The typical approach here is the t-test or the Wilcoxon test, both of which detect whether elements in one group are on average greater than those in the other group. If, however, you find that these tests fail on your data (i.e., there is no easily detectable difference in measurement magnitudes), why not try something like the Kolmogorov-Smirnov test, which checks whether the distributions of the two groups differ in any way? That is a much stricter condition. In fact the tiniest outlier in your data will easily get you a low p-value and thus something to stick in the face of a reviewer. If even the KS test does not work, try testing something even less relevant, such as whether your data is normally distributed. Most probably it is not, so here's your low p-value! Remember: the smaller your p-values, the better your paper!
    2. Avoid significance testing completely
      If you can't get a low p-value anywhere, do not worry. Significance testing is going somewhat out of fashion nowadays anyway, so it is possible to avoid it and still sound convincing. If one group of measurements has 40% of successes and the other has 42%, why not simply present those two numbers as obvious proof that the second group is better? Using ratios is also a smart idea. Say some baseline algorithm has a 1% chance of success. You now test your algorithm and discover that out of 10 trials it had 1 success. That means your algorithm has just demonstrated a 10% success rate, which is ten times better than the baseline! Finally, ROC curves can often be used to hide the fact that your dataset is too small to support any conclusions. No one ever really checks those for significance.
    3. Sweep multiple testing under the carpet
      If you are analyzing a dataset with 1000 attributes and 50 datapoints, it is not really surprising if one of those attributes seems "interesting" (e.g. highly correlated with the target effect) purely by chance: there is often nothing significant in finding one out of a thousand. However, if you only mention that one (or perhaps 10-50) of the original attributes, your results will magically become significant and no reviewer will be able to catch your cheating.
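    To see how cheap the tricks in points 2 and 3 are, here is a minimal Python sketch. It is only an illustration under stated assumptions: the data is pure Gaussian noise generated with a fixed seed, and the function names are made up for this example.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def chance_of_at_least_one_success(p_success, trials):
    """Probability that a method with the given success rate scores >= 1 hit."""
    return 1.0 - (1.0 - p_success) ** trials


def best_noise_correlation(n_attributes=1000, n_points=50, seed=0):
    """Largest |correlation| between a random target and pure-noise attributes."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_attributes):
        attribute = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(attribute, target)))
    return best


if __name__ == "__main__":
    # Point 2: how often would the 1% baseline itself get 1+ hits in 10 trials?
    print("baseline luck: %.3f" % chance_of_at_least_one_success(0.01, 10))
    # Point 3: the most "interesting" of 1000 attributes that are all noise.
    print("best |r| among noise attributes: %.2f" % best_noise_correlation())
```

    The first number shows that the 1% baseline would itself score at least one success in ten trials about 9.6% of the time, so the "ten times better" claim rests on very little. The second shows that the best of a thousand meaningless attributes can look respectably correlated with the target on only 50 datapoints.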

    There are certainly more, and I'll keep the post updated if I come up with a worthy addition. If you have something to add, please do comment.

  • Posted by Konstantin 09.05.2011 1 Comment

    I have recently realized that my HP 8440p laptop has a built-in "Qualcomm un2420 Broadband Module" device, also known as a "3G modem". For some reason no drivers were preinstalled for it on my system, and with the SIM card slot concealed behind the battery, it was not something I noticed immediately. Once the drivers were installed, the operating system had no problem recognizing the new "Mobile Broadband Connection" opportunity, and with the SIM card in the slot, I could connect to the Internet via 3G, yay.

    Knowing that there is more to mobile communication than Internet access, I was wondering whether I could do anything else, like voice calls or SMS. Unfortunately, my attempts to find any reasonable software package that would open up the power of 3G to me at the click of a button failed. Instead, however, I discovered that it is actually quite easy to communicate with the modem directly. It turns out you can control your shiny bleeding-edge 3.5G device by sending plain old AT commands to it over a serial port. This is the same protocol that the wired grandpa-modems have been speaking since the early 80s, and it is fun to see that this language was kept all the way into the wireless century.

    Let me show you how it works. Try to follow along — this is kinda fun.

    Finding your Modem

    If your computer does not have a built-in 3G modem, chances are high your garden variety cellphone does (not to mention smartphones, of course). If it is the case, then:

    • Switch on the Bluetooth receiver on your phone (for older Nokias this is usually in the "Settings -> Connectivity -> Bluetooth" menu).
    • On your computer, go to "Devices and Printers", click "Add a device", wait until your phone appears on the list, double click it and follow the instructions to establish the connection. (I'm talking about Windows 7 here, but the procedure should be similar for most modern OSs).
    • Once the computer recognizes your phone and installs the necessary drivers, it will appear as an icon in the "Devices" window. Double click it to open the "Properties" window, and make sure there is a "Standard Modem over Bluetooth link" function or something similar in the list.

      Cellphone Functions

    • Double-click that "modem" entry, a new properties window opens. Browse along in it, and find the COM port number that was assigned to the modem.

      Bluetooth modem at port COM9

    If you do have a 3G modem bundled with your laptop (and you have the drivers installed), open the Device manager ("Control Panel -> Device Manager"), find the modem in the list, double click to open the "Properties" page, and browse to the "Modem" tab to find the COM port number.

    Laptop 3G modem in Device Manager

    Connecting to the Modem

    Next thing - connect to the COM port. In Windows use PuTTY to do it. In Linux use minicom. Don't worry about the settings — the defaults should do.

    PuTTY connection dialog

    Once the connection starts, you will get a blank screen with nothing but a cursor. Try typing "AT+CGMI" followed by a <RETURN>. Note that, depending on the settings of your device and the terminal program, you might not see your letters being typed. If so, you will have to reconfigure the terminal (enable "local echo"). But for now, just type the command. You should get the name of the manufacturer in response. You can also get the word "ERROR" instead. This means that your modem is ready to talk to you, but it either does not support the "AT+CGMI" command OR requires you to enter the PIN code first. We'll get to it in a second. If you get no response at all, you must have connected to the wrong COM port.

    Terminal sessions with a Qualcomm (left) and Nokia (right) 3G modems

    You can get more information about the device using the "AT+CGMM", "AT+CGMR", "AT+CGSN" commands. Try those.
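    If you would rather script this than type into a terminal, the exchange can be sketched in Python. This is only a sketch under assumptions: `send_at` is a made-up helper that works with any object offering `write(bytes)` and `readline()`, which is the interface a pyserial `serial.Serial("COM9", timeout=1)` object provides. The `FakeModem` class and its canned reply are invented so that the example runs without hardware.

```python
def send_at(port, command, max_lines=20):
    """Send one AT command and collect the reply lines up to OK/ERROR.

    `port` is anything with write(bytes) and readline() -> bytes, for
    instance a pyserial Serial object opened on the modem's COM port.
    """
    port.write((command + "\r").encode("ascii"))
    reply = []
    for _ in range(max_lines):
        raw = port.readline()
        if not raw:  # read timed out: the modem has nothing more to say
            break
        line = raw.decode("ascii", errors="replace").strip()
        if not line or line == command:  # skip blank lines and the echo
            continue
        reply.append(line)
        if line == "OK" or "ERROR" in line:  # final result code
            break
    return reply


class FakeModem:
    """Hypothetical stand-in for a serial port (no real device involved)."""

    ANSWERS = {"AT+CGMI": "QUALCOMM INCORPORATED"}  # made-up canned reply

    def __init__(self):
        self._pending = []

    def write(self, data):
        command = data.decode("ascii").strip()
        answer = self.ANSWERS.get(command)
        if answer:
            lines = [command, "", answer, "", "OK"]
        else:
            lines = [command, "", "ERROR"]
        self._pending = [(line + "\r\n").encode("ascii") for line in lines]

    def readline(self):
        return self._pending.pop(0) if self._pending else b""


if __name__ == "__main__":
    modem = FakeModem()
    for cmd in ("AT+CGMI", "AT+CGMM"):
        print(cmd, "->", send_at(modem, cmd))
```

    With a real device you would pass the opened serial port instead of `FakeModem()`. The read timeout matters: without it, `readline()` would block forever on a port that never answers.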

    Authentication

    To do anything useful, you need to authenticate yourself by entering the PIN (if you use your cellphone over Bluetooth, you have most probably already entered it and no additional authentication is needed). You can check whether you need to enter the PIN by using the command "AT+CPIN?" (note the question mark). If the response is "+CPIN: READY", your SIM card is already unlocked. Otherwise the response will probably be "+CPIN: SIM PIN", indicating that a PIN is expected to unlock the card. You enter the PIN using the "AT+CPIN=<your pin>" command. Please note that if you enter an incorrect PIN three times, YOUR SIM CARD WILL BE BLOCKED (and you will have to go fetch your PUK code to unblock it), so be careful here.
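    Given how costly a wrong guess is, a script should check the SIM state before sending any PIN. A minimal sketch of that decision in Python, assuming the reply lines have already been read from the modem; the function names are made up for this example:

```python
def sim_state(reply_lines):
    """Return the state reported by AT+CPIN?, e.g. "READY" or "SIM PIN"."""
    for line in reply_lines:
        if line.startswith("+CPIN:"):
            return line[len("+CPIN:"):].strip()
    return None  # no +CPIN line at all (e.g. the modem answered ERROR)


def unlock_command(reply_lines, pin):
    """Return the AT command that unlocks the SIM, or None if none is needed.

    Only the "SIM PIN" state is handled; any other state is left to a
    human, because blind retries are what gets a SIM card blocked.
    """
    if sim_state(reply_lines) != "SIM PIN":
        return None
    if not (pin.isdigit() and 4 <= len(pin) <= 8):
        raise ValueError("a SIM PIN consists of 4 to 8 digits")
    return "AT+CPIN=" + pin


if __name__ == "__main__":
    print(unlock_command(["+CPIN: READY", "OK"], "0000"))
    print(unlock_command(["+CPIN: SIM PIN", "OK"], "0000"))
```

    The PIN "0000" is of course a placeholder. The sketch deliberately sends nothing itself, so a typo in the script cannot burn one of your three attempts.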

    Entering the PIN

    Doing Stuff

    Now the fun starts: you can try dialing, sending and receiving messages and do whatever the device lets you do. The (nonexhaustive) list of most interesting commands is available here. Not all of them will be supported by your device, though. For example, I found out that the laptop's 3G modem won't let me dial numbers, whilst this was not a problem for a cellphone connected over bluetooth (try the command "ATD<your number>;" (e.g. "ATD5550010;")). On the other hand, the 3G modem lets me list received messages using the "AT+CMGL" command, while the phone refused to do it.

    One useful command to know about is "AT+CUSD", which lets you send USSD messages (those "*1337#" codes) to the mobile service provider. For example, the prepaid SIM card I bought for my computer allows me to buy a "daily internet ticket" (unlimited high-speed internet for 12 hours for 1 euro) via the "*135*78#" code. Here's how this can be done via the terminal.

    Sending a USSD code and receiving an SMS

    We first send the "AT+CUSD=1,*135*78#" command, which is equivalent to dialing "*135*78#" on the phone. The modem immediately shows us the operator's response ("You will shortly receive an SMS with information..."). We then list new SMS messages using the "AT+CMGL" command. There is one message, which is presented to us in the PDU encoding. A short visit to the online PDU decoder lets us decode the message: it simply says that the ticket is activated. Nice.
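    The core of what such a PDU decoder does to the message text is not complicated: with the default alphabet, the characters are 7-bit values ("septets") packed back to back into bytes. Here is a minimal Python sketch of the unpacking step. It is deliberately incomplete: it assumes you have already cut the user-data part out of the PDU (the header with the service centre number, sender and timestamp is not parsed), and it maps septets straight to ASCII, which matches the GSM default alphabet for basic Latin letters and digits but not for every symbol.

```python
def unpack_septets(hex_payload, count):
    """Unpack `count` 7-bit characters from hex-encoded packed user data."""
    data = bytes.fromhex(hex_payload)
    # The packing is little-endian: the first character occupies the lowest
    # 7 bits of the first byte, the next one continues right after it.
    bits = int.from_bytes(data, "little")
    return "".join(chr((bits >> (7 * i)) & 0x7F) for i in range(count))


if __name__ == "__main__":
    print(unpack_septets("E8329BFD4697D9EC37", 10))  # prints "hellohello"
```

    The character count is needed because it cannot always be recovered from the byte length alone; in a real PDU it is stored in the length field just before the user data.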

    Sending an SMS

    Finally, here's how you can send a "flash" SMS (i.e. the one which does not get saved at the receiver's phone by default and can thus easily confuse people. Try sending one of those at night - good fun). We start with an ATZ to reset the modem, just in case. Then we set message format to "text mode" using the "AT+CMGF=1" command (the alternative default is the "PDU mode", in which we would have to type SMS messages encoded in PDU). Then we set message parameters using the "AT+CSMP" command (the last 16 is responsible for the message being 'flash'). Finally, we start sending the message using the "AT+CMGS=<phone number>" command. We finish typing the message using <Ctrl+Z> and off it goes.

    Sending a "Flash" SMS
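    The whole sequence is short enough to generate from a script. Below is a hedged Python sketch that only builds the list of strings to send, in order. Two things in it are assumptions rather than something taken from the session above: the first three AT+CSMP values (17, 167, 0) are a commonly used default, the post only vouches for the final 16, and the phone number is written in quotes as a string parameter.

```python
CTRL_Z = "\x1a"  # the character that terminates the message body


def flash_sms_commands(number, text):
    """Return the strings to send, in order, for a 'flash' SMS in text mode."""
    return [
        "ATZ",                    # reset the modem, just in case
        "AT+CMGF=1",              # switch from PDU mode to text mode
        "AT+CSMP=17,167,0,16",    # message parameters; the last 16 = flash
        'AT+CMGS="%s"' % number,  # start a message to this number
        text + CTRL_Z,            # the body, finished with Ctrl+Z
    ]


if __name__ == "__main__":
    for line in flash_sms_commands("5550010", "Boo!"):
        print(repr(line))
```

    Every entry except the last should be followed by a carriage return and a wait for the modem's answer. After AT+CMGS the modem shows a prompt instead of "OK", and only then expects the message body.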

    For more details, refer to this tutorial or the corresponding specifications.

    All in all, it should be fairly easy to make a simple end-user interface for operating the 3G modem; it is strange that there is not much free software available that provides this functionality. If you find one or decide to make one yourself, tell me.

  • Posted by Swen 27.11.2008 4 Comments

    Note: This is again a post for sharing personal computer-taming experiences, which aims to help others who might stumble upon similar problems. Also note that this is a guest posting kindly presented by Swen Laur.

    Python

    We as dumb users often hurt ourselves through ignorance. One of many such traps is installing Python on Mac OS X Leopard. First, for most users a new install is completely unnecessary, since Leopard comes with Python pre-installed. However, as the top Google hits kindly suggest installing MacPython, we as ignorant users follow the advice. As a result, there will be two versions of Python on the computer. Now if we install setuptools and easy_install, spooky things start to happen: packages that are installed through easy_install are suddenly inaccessible and nothing seems to work properly.

    The explanation behind this phenomenon is simple though a bit technical. Obviously, after having installed MacPython, we have two distributions of Python in our computer:

    • /System/Library/Frameworks/Python.framework/Versions/Current/bin/
    • /Library/Frameworks/Python.framework/Versions/Current/bin/

    which are accessible from the command line through the links in the directories /usr/bin and /usr/local/bin, respectively. On installation, MacPython modifies .bash_login file so that the new path is

    /Library/Frameworks/Python.framework/Versions/Current/bin/:${PATH}
    and thus the command line invocation of python is translated to

    /Library/Frameworks/Python.framework/Versions/Current/bin/python

    which, naturally, corresponds to MacPython.

    As a second clue, note that the setuptools and easy_install packages come with Leopard's Python itself. More precisely, the corresponding command line tool is /usr/bin/easy_install. However, if you install setuptools for your new MacPython, the corresponding binary will be /usr/local/bin/easy_install. Now note that in the default configuration, /usr/bin goes before /usr/local/bin in the PATH variable. As a direct consequence, each invocation of easy_install puts packages into Apple Python (into the directory /Library/Python/<version>/site-packages, to be precise), and these packages are naturally not accessible to the MacPython that you normally execute from a shell. MacPython searches for packages in the directory

    /Library/Frameworks/Python.framework/Versions/<version>/lib/python2.5/site-packages.
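    When two Pythons fight like this, the quickest diagnosis is to ask the interpreter itself where it lives and where it installs packages. A small sketch, written for a current Python 3 (the `sysconfig` module used here did not exist yet in the Python 2.5 discussed in this post); the function name is made up:

```python
import sys
import sysconfig


def python_report():
    """Describe which interpreter is running and where its packages go."""
    return {
        "executable": sys.executable,  # the binary the shell resolved to
        "prefix": sys.prefix,          # root of this Python installation
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in python_report().items():
        print("%-14s %s" % (key, value))
```

    Running this once with each `python` binary you can find makes the mismatch visible: if the site-packages directory printed here is not the one your installer writes to, the freshly installed packages will not be importable.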

    As a simple solution, we must change the configuration of easy_install. For that we have to edit the configuration file .pydistutils.cfg located in your home directory. It should contain the following lines

    [install]
    install_lib=/Library/Frameworks/Python.framework/Versions/Current/lib/python$py_version_short/site-packages/
    install_scripts =/Library/Frameworks/Python.framework/Versions/Current/bin

    The first setting specifies where all Python modules should be installed and the second specifies where all executables should be located (this directory comes first in the PATH and thus MacPython executables are the first to be executed). After that one should reinstall setuptools and easy_install. During the installation of setuptools, the program might complain. In this case you should silence it by modifying PYTHONPATH. As a result, easy_install is located in

    /Library/Frameworks/Python.framework/Versions/Current/bin 

    and thus the packages are put into correct directory. As such, we solve the problem with easy_install but are still vulnerable to other easter-eggs coming from the fact that we do not use the Apple distribution of Python. In particular, you cannot do any user-interface related programming, since the corresponding PyObjC module is missing from non-Apple Python distributions.

    Another, more complex solution is to remove MacPython cleanly. The latter is not an easy thing to do, since package management of open-source tools is more than unsatisfactory in Mac OS X (in fact, open-source tools are often difficult to install and uninstall under many non-Linux operating systems). The situation is even worse when you have installed some other Apple packages (.mpkg files) to extend basic configuration of MacPython. In the following, we describe how to do a successful uninstall in such cases. This procedure is not for the fainthearted — if you have a time-machine copy of the system state before installing MacPython, then it might be easier to revert to this state.

    The key to a successful uninstallation is a basic understanding of what a package is and how Installer.app works. First, note that a package is nothing more than a directory (one can view its contents by right-clicking the package) with the following structure

    Contents/Archive.bom
    Contents/Info.plist
    Contents/Packages/..
    Contents/PkgInfo
    Contents/Resources

    The file Info.plist provides an XML description of how to install the package. The file Archive.bom contains a semi-binary description of all files that will be copied to the various install locations. The directories Packages and Resources contain sub-packages and files to be copied to the system. Installer.app reads Info.plist and Archive.bom to create the necessary files and then copies the package, without the Resources directory, to /Library/Receipts. The receipts are in principle enough to do a clean uninstall if the files are not used by other programs. Normal practice for Apple software requires that all files be stored in the directory /Applications/Installed-Application.app, or somewhere in /Library if the corresponding piece of software is shared by several applications. Mostly, the corresponding directories are stored in /Library/Frameworks. The integrity of files installed under /System is not guaranteed during system updates, and therefore sane persons do not store files there. Open-source software often ignores this practice and tries to write into the directories /usr/local/bin and /usr/bin (extremely dangerous).

    To uninstall a package, we first have to find out where it was written. The key IFPkgFlagDefaultLocation or IFPkgRelocatedPath in Info.plist contains the default or the actual installation directory, respectively. With the command line tool lsbom we can also see the list of all files and directories that were installed. To be precise, lsbom prints paths relative to the install directory.
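    Looking up those keys by hand gets tedious with several receipts, so here is a small Python sketch that does it with the standard `plistlib` module. The function name is made up, and the sample plist in the demo is invented for illustration rather than taken from a real MacPython receipt.

```python
import plistlib


def install_location(info_plist_bytes):
    """Return the directory a package was installed into, if recorded.

    The relocated path (the actual location) wins over the default one.
    """
    info = plistlib.loads(info_plist_bytes)
    return info.get("IFPkgRelocatedPath") or info.get("IFPkgFlagDefaultLocation")


if __name__ == "__main__":
    # Hypothetical receipt content, just to exercise the function.
    sample = plistlib.dumps({"IFPkgFlagDefaultLocation": "/Library/Frameworks"})
    print(install_location(sample))
```

    For a real receipt you would read the bytes from the Info.plist file inside the package directory under /Library/Receipts and then prefix the relative paths printed by lsbom with the returned directory.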

    Now applying this knowledge, we can deduce that MacPython consists of five packages, which create a directory /Library/Frameworks/Python.framework
    and write the following files

    idle
    idle2.5
    pydoc
    pydoc2.5
    python
    python-config
    python2.5
    python2.5-config
    pythonw
    pythonw2.5
    smtpd.py
    smtpd2.5.py

    to the directory /usr/local/bin in addition to /Application/MacPython <>. Hence, if we remove these files, we restore the initial state unless we installed some other packages. In this case the procedure can be more tedious.

    Finally, we should restore the old ~/.bash_login from the file ~/.bash_login.pysave.

    To summarize, uninstallation of packages is more difficult than application-bundles (yet nevertheless, it is not hopeless). But still, I think applications without installers are the way to go.

    Important update: How to install scipy from source code

    Scipy is a nice extension of Python that provides the power of Matlab functions. For some odd reason the maintainers of the scipy package suggest installing a non-Apple distribution of Python. In other words, they force you to abandon the PyObjC library, which is a core graphical library on Mac OS X. Fortunately, it is still possible to have both by installing the scipy package from source. There are some hiccups in the procedure but in general it is doable.

    1. Install UMFPACK together with the UFconfig and AMD libraries from source
      1. Download the source files from http://www.cise.ufl.edu/research/sparse/umfpack/
      2. Adjust the UFconfig configuration file to your computer. The flag

        UMFPACK_CONFIG = -DNBLAS

        is a safe, though somewhat slow, choice.

      3. Run make and pray
      4. Copy libumfpack.a and libamd.a to /usr/local/lib
      5. Copy all files from the Include directories of UMFPACK, UFconfig and AMD:
        umfpack*.h
        UFconfig.h
        amd.h
        amd_internal.h
    2. Download and install the numpy and scipy packages. You might change the compilation target by typing the following command in the shell before compiling and installing:

      export MACOSX_DEPLOYMENT_TARGET=10.4

      The target value 10.5 is not a good choice, since it automatically creates compile problems (joys of using freeware). Note that the current tarball scipy-0.7.0b1 is incomplete (joys of using freeware) and thus you have to use the svn trunk for that. But otherwise, the installation guide http://www.scipy.org/Installing_SciPy/Mac_OS_X is correct.

    Important update: An update to the update

    Seems that they have now fixed the 10.5 target and you can use the command

    export MACOSX_DEPLOYMENT_TARGET=10.5

    before compiling if you have Leopard.

  • Posted by Konstantin 10.10.2008 4 Comments

    Note: This is a brief transcript of actions related to my recent hard disk crash recovery. I'm publishing it here in the hope it might be of use to someone someday, just as all those similar posts out there are every so often of great use to me. So if you, dear reader, are not really interested in replacing a hard disk of your DELL PC, you better save your time and skip to the next post.

    A couple of days ago I accidentally managed to test what happens if you throw a running 3-year-old DELL XPS M1210 laptop from a height of about 1.3 meters against a solid floor. The result, fortunately, was radically different from what one might expect after observing that painful inelastic crash. After switching the screen off for a period of 5 seconds or so (the reasons behind this behavior being a complete mystery to me) the laptop happily resumed as if nothing had happened. A later examination revealed that the only part that was damaged was the hard disk, where a number of files became unreadable.

    I presumed that it must have been the result of a strong head crash. Common knowledge suggests that a head-crashed hard disk had better be replaced, even if it continues functioning. This is because new head crashes are much more probable after the first one has happened. Thus, I went to the shop and got myself a new average-grade 160 GB SATA hard disk (these are surprisingly cheap nowadays, btw). I also bought a small external case to put my old drive into, so that I could continue using it as an external storage medium.

    In principle, the whole procedure could have ended here: I could have taken the old hard disk out, put the new one in, installed the OS onto the new disk from a CD and copied my files from the old disk with the help of the external case. However, there was one thing I didn't like about this approach. If I were to restart with a blank OS, I'd have to bother installing the proper drivers as well as several pieces of software that were shipped with the factory preinstallation of Windows XP (in particular, that enormously useless yet really impressive Logitech Video Effects webcam feature). I knew, however, that there was a way to restore the disk to the factory setting so I went on to figure out how to do it for a new hard disk. In hindsight, the whole procedure is actually rather straightforward, at least if you appreciate the virtues of the dd command.

    DELL System Restore Partition

    It seems that most DELL laptops nowadays come with a "DELL System Restore" feature, which means that there is a hidden partition on the hard disk that stores the factory-installed Ghost image of the system drive. If at boot time you press Ctrl+F11, you get the opportunity to write this image to disk thus destroying all your data but bringing the computer to its retail state.

    The hidden partition is labeled DellRestore and besides the image file it contains a bootable DOS operating system (yes, real DOS!), the program for restoring the Ghost image (restore.exe) and an autoexec.bat script that invokes this program on boot. So, if I'm not mistaken, by pressing Ctrl+F11 you just direct the laptop to boot the DellRestore partition.

    In theory, if you mirror the partition structure of your old hard drive on the new one and copy the data of the DellRestore partition, it should be possible to simply put in the new drive and press Ctrl+F11 for a proper restoration. However, my new drive was larger in capacity than the old one so I didn't want to mirror the partition structure exactly and I ended up with a somewhat more complicated procedure. The following is an approximate list of steps I went through. Don't try repeating them unless you really understand everything, though. You are also strongly advised to visit this site first.

    The Walkthrough

    1. I needed two bootable CDs here: one Knoppix Live CD and one DOS bootable CD.
    2. First, keep the old (damaged) hard disk in the laptop, put the new disk into the external case, but don't plug it in yet.
    3. Boot the Knoppix CD and, once it starts, switch to the command line (Ctrl+Alt+F1)
    4. Ensure that the laptop's hard disk is seen as /dev/sda and examine its partition table:
      # cfdisk /dev/sda
    5. It should have 3 partitions: DellUtility (sda1), the main NTFS partition (sda2) and DellRestore (sda3). Write down their sizes and quit cfdisk.
    6. Plug the new disk (the one in the external case) into the USB port. Ensure that the OS detects it: type
      # dmesg
      and examine the output.
    7. Type
      # cfdisk /dev/sdb
      and create a partition table for the new disk that mimics that of the old one. In my case I created the first DellUtility partition with the same size and type as sda1. I then added one NTFS partition (for the main Windows OS) and a small Linux partition (just for fun), choosing the sizes of both to my liking. Finally, I created the fourth partition with the same size and type as sda3. See the figure.

      New Drive's Partition Table

    8. Copy the data of sda1 to sdb1 and sda3 to sdb4.
      # dd if=/dev/sda1 of=/dev/sdb1 bs=1024
      # dd if=/dev/sda3 of=/dev/sdb4 bs=1024
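    9. Copy the boot code of the master boot record (its first 446 bytes) from the old drive to the new one. The partition table that follows the boot code on the new drive is left untouched.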
      # dd if=/dev/sda of=/dev/sdb bs=446 count=1
    10. Now, shut down Knoppix
      # shutdown -h now
      remove the old disk from the laptop, take the new disk from the case and put it into the laptop. Now boot the DOS CD.
    11. When in DOS, go to the bat\ subdirectory on disk C: (this is the DellRestore partition):
      > cd c:\bat
    12. Execute restore.bat
      > restore.bat
      This should initiate the restore process and write the factory image to your second partition.
    13. After the restore process completed, the system for some reason still would not boot. I booted Knoppix, opened the partition table (cfdisk /dev/sda) and discovered that the bootable flag of the second partition was gone after the restoration process. Making the partition bootable again and restarting fixed the problem.
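    Steps 9 and 13 both revolve around the layout of the first 512-byte sector of the disk: 446 bytes of boot code, then four 16-byte partition entries, then a two-byte signature. The Python sketch below illustrates that layout on plain byte strings. It is an illustration only: the function names are made up and it does not touch real devices (for that, dd and cfdisk as used above are the right tools).

```python
import struct

BOOT_CODE_SIZE = 446   # bytes of boot code at the start of the sector
ENTRY_SIZE = 16        # size of one partition table entry
SECTOR_SIZE = 512


def copy_boot_code(old_sector, new_sector):
    """Mimic `dd bs=446 count=1`: boot code from old, everything else from new."""
    return old_sector[:BOOT_CODE_SIZE] + new_sector[BOOT_CODE_SIZE:]


def partition_entries(sector):
    """Decode the four primary partition entries of an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        start = BOOT_CODE_SIZE + i * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # Bytes 8-15 hold the first sector and the sector count (little-endian).
        first_sector, sectors = struct.unpack_from("<II", raw, 8)
        entries.append({
            "bootable": raw[0] == 0x80,  # the flag that went missing in step 13
            "type": raw[4],              # partition type code
            "first_sector": first_sector,
            "sectors": sectors,
        })
    return entries


if __name__ == "__main__":
    # Two made-up sectors: an "old" one with boot code, a "new" one with a table.
    old = b"\x11" * BOOT_CODE_SIZE + bytes(64) + b"\x55\xaa"
    entry = bytes([0x80, 0, 0, 0, 0x07, 0, 0, 0]) + struct.pack("<II", 2048, 4096)
    new = bytes(BOOT_CODE_SIZE) + bytes(16) + entry + bytes(32) + b"\x55\xaa"
    for number, part in enumerate(partition_entries(copy_boot_code(old, new)), 1):
        print(number, part)
```

    This also shows why the block size in step 9 is 446 and not 512: copying the whole sector would overwrite the partition table created in step 7 with the old drive's layout.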

    That's it.

  • Posted by Konstantin 08.10.2008 No Comments

    I received a number of "why" and "how" questions regarding the pri.ee domain name of this site and I thought the answers are worth a post. The technically savvy audience can safely skip it, though.

    The pri.ee subdomain is reserved by EENet for private individuals who have an Estonian ID code. The registration is free of charge and very simple: you just fill in a short form and wait a day or two until your application is processed. As a result you end up with a simple affiliation-free way of designating your site. Of course, it does not have the bling of a www.your-name.com, but I find it quite appropriate for an aspiring blog (and besides, I'm just too greedy and lazy to bother paying for the privilege of a flashy name for my homepage).

    Now on to the "how" part. The only potentially tricky issue of the registration process is the need to fill in the "Name servers" field. Why do you need that and why can't you just directly provide the IP address of the server where you host your site? Well, if you registered the specific IP of your server with EENet, you would have to contact EENet every time your hosting provider changed, right? In addition, you would need to bother EENet about any subdomain (i.e. <whatever>.yourname.pri.ee) you might want to add in the future. Certainly not the most convenient option. Therefore, instead of providing an IP address directly, you specify a reference to an intermediate server, which will perform the mapping of your domain name (and any subdomains) to IP addresses. That's how the internet domain naming system actually works.

    So which name server should you choose? Most reasonable hosting providers (that is, the ones that allow you to host arbitrary domains) let you use their name servers for mapping your domain name. The exact server names depend on the provider and you should consult the documentation. For example, if you were hosting your site at 110mb.com (which is here just an arbitrarily chosen example of a reasonable free web hosting I'm aware of), the corresponding name servers would be ns1.110mb.com and ns2.110mb.com.

    However, using the name server of your provider is, to my mind, not the best option. In most cases the provider will not allow you to add subdomains and if you change your hosting you'll probably lose access to the name server, too. Thus, a smarter choice would be to manage your domain names yourself using an independent name server. Luckily enough, there are several name servers out there that you can use completely free of charge (or for a symbolic donation): EveryDNS and EditDNS are two examples of such services that I know of.

    After you register an account with, say, EveryDNS, you can specify the EveryDNS nameservers (ns1.everydns.net, ..., ns4.everydns.net) in the pri.ee domain registration form. You are now free to configure arbitrary address records for yourname.pri.ee or <whatever>.yourname.pri.ee to your liking.
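    The two-level lookup described above can be mimicked with a toy model in Python. Everything in it is invented for illustration: the dictionaries stand in for the registry and for your DNS account, "blog" is a hypothetical subdomain, and the IP addresses come from the 192.0.2.0/24 range that is reserved for documentation. Real resolution of course happens over the network, not in dictionaries.

```python
# What the registry stores: only the names of your name servers.
REGISTRY = {
    "yourname.pri.ee": ["ns1.everydns.net", "ns2.everydns.net"],
}

# What you configure yourself in your DNS account (hypothetical records).
YOUR_RECORDS = {
    "yourname.pri.ee": "192.0.2.10",
    "blog.yourname.pri.ee": "192.0.2.20",
}

# Both name servers answer from the same set of records.
NAME_SERVERS = {
    "ns1.everydns.net": YOUR_RECORDS,
    "ns2.everydns.net": YOUR_RECORDS,
}


def resolve(hostname):
    """Find the registered zone for a hostname, then ask its name servers."""
    labels = hostname.split(".")
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in REGISTRY:
            for server in REGISTRY[zone]:
                records = NAME_SERVERS.get(server, {})
                if hostname in records:
                    return records[hostname]
            return None  # the zone exists but has no record for this name
    return None  # nobody registered a matching zone


if __name__ == "__main__":
    for name in ("yourname.pri.ee", "blog.yourname.pri.ee", "other.pri.ee"):
        print(name, "->", resolve(name))
```

    The point of the indirection is visible in the data: moving to another hosting provider or adding a subdomain only changes `YOUR_RECORDS`, which you control, and never the registry entry.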

    To summarize, here is how one can get a reasonable website with a pri.ee domain name for free:

    1. Register with a reasonable web hosting provider
      • 110mb.com is one simple free option (with an exception that they charge $10 once if you need MySQL)
      • other options
    2. Register a DNS account
    3. Fill out this form.
      • If you chose EveryDNS in step 2, state ns1.everydns.net, ns2.everydns.net, ns3.everydns.net, ns4.everydns.net as your name servers.
      • Wait for a day or two.
    4. Suppose you applied for yourname.pri.ee (the domain is still free, by the way!), then:
      • Add this domain in your hosting's control panel and upload your website.
      • Add this domain to your DNS account.
        • You can add an A ("address") record mapping yourname.pri.ee to an IP address.
        • Alternatively, you can add an NS ("name server") record referring yourname.pri.ee further to ns1.110mb.com (or whatever name server your hosting provider offers).
    5. Profit!
