• Posted by Konstantin 23.01.2012 2 Comments

    Visualization is a very powerful method for data analysis. Very often, plotting a bunch of scatterplots, barplots, heatmaps, animations or other kinds of imagery is enough to see with your own eyes whether there are any interesting patterns in the data (which often means you have nearly solved the problem) or not (which means you should prepare for a long battle with the data, one that might not end successfully for you).

    Visualization is powerful because by visualizing data you essentially "plug it" directly into your brain's processing engine, using the visual interface the brain happens to support. Just as you need to convert data into CSV or XLS format to load it into Excel, you need a 2D image or an animation to load data into your brain - it is that simple.

    This view suggests two immediate developments. Firstly, why don't we use the other "interfaces" that our brain has with the outside world for data processing? Could converting data into something that sounds, feels, tastes or smells be a useful method for exploiting our brain's analytic capabilities even further? Obviously, visual input has the most impact, simply because the retina is an immediate part of the brain. However, auditory signals, for example, seem to have a powerful processing system in our brain dedicated to them too.

    Secondly, if we can appreciate how much our brain is capable of extracting from a single image, why don't we try to automate such an approach? Modern computer vision has reached sufficient maturity to be capable of extracting fairly complex informative features from images. This suggests that a particular 2D plot of a dataset can be used as a kind of informative "data fingerprint" which, when processed by a computer vision-driven engine, could be analyzed for the presence of "visible" patterns and for visual similarity to other datasets.

    The fun part is that there has been some research done in this direction. Consider the paper "Computer Vision for Music Identification" by Yan Ke et al. The authors propose to convert pieces of music into spectrogram images. Those spectrogram images can then be compared to each other using methods of computer vision, resulting in an efficient similarity metric, usable for search and identification of musical pieces. The authors claim to achieve 95% precision at 90% recall, which compares favourably to alternative methods. I think it would be exciting to see more such techniques applied in a wider range of areas.

     

    Representing audio as pictures, figure from (Y.Ke, 2005)
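
    As a toy illustration of the "audio becomes an image" step, here is a small sketch in base R. The signal is synthetic and the windowing deliberately naive; a real system would use a proper audio library, but the spectrogram idea is the same.

    sample_rate <- 8000
    time_s <- seq(0, 2, by = 1 / sample_rate)
    signal <- sin(2 * pi * 440 * time_s) + sin(2 * pi * (200 + 300 * time_s) * time_s)

    window_size <- 256
    hop <- 128
    hann <- 0.5 - 0.5 * cos(2 * pi * (0:(window_size - 1)) / (window_size - 1))

    # Short-time Fourier transform: slide a window over the signal, FFT each frame
    starts <- seq(1, length(signal) - window_size + 1, by = hop)
    spec <- sapply(starts, function(s) {
      frame <- signal[s:(s + window_size - 1)] * hann
      Mod(fft(frame))[1:(window_size / 2)]   # keep the positive-frequency half
    })

    # Time on the x-axis, frequency bin on the y-axis, log-magnitude as colour
    image(x = starts / sample_rate, y = 1:(window_size / 2), z = t(log(spec + 1e-9)),
          xlab = "time (s)", ylab = "frequency bin")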



  • Posted by Konstantin 16.01.2012 No Comments

    This post presumes you are familiar with PCA.

    Consider the following experiment. First we generate a random vector (signal) as a sequence of random 5-element repeats. That is, something like

    (0.5, 0.5, 0.5, 0.5, 0.5,   0.9, 0.9, 0.9, 0.9, 0.9,   0.2, 0.2, 0.2, 0.2, 0.2,   ... etc ... )

    In R we could generate it like that:

    num_steps <- 50
    step_length <- 5
    initial_vector <- c()
    for (i in 1:num_steps) {
      # Each step is one uniform random value, repeated step_length times
      initial_vector <- c(initial_vector, rep(runif(1), step_length))
    }

    Here's a visual depiction of a possible resulting vector:

    Initial random vector

    plot(initial_vector), zoomed in

    Next, we shall create a dataset, where each element will be a randomly shifted copy of this vector:

    library(magic) # Necessary for the shift() function
    dataset <- c()
    for (i in 1:1000) {
      shift_by <- floor(runif(1) * num_steps * step_length) # Pick a random shift
      new_instance <- shift(initial_vector, shift_by)       # Generate a shifted instance
      dataset <- rbind(dataset, new_instance)               # Append to the data
    }

    Finally, let's apply Principal Component Analysis to this dataset:

    pca = prcomp(dataset)

    Question: what do the top principal components look like? Guess first, then check your guess with the sketch below.
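
    A minimal way to peek at the answer yourself (the columns of pca$rotation hold the principal directions):

    plot(pca$rotation[, 1], type = "l", xlab = "position", ylab = "loading")  # first component
    lines(pca$rotation[, 2], col = "red")                                     # second component
    lines(pca$rotation[, 3], col = "blue")                                    # third component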



  • Posted by Konstantin 10.01.2012 No Comments

    It is not uncommon for a long-running scientific study or experiment to produce results which are, at best, uninteresting. The measured effect may be too weak to be reported convincingly given the data at hand. Nonetheless, resources have been put into it, many man-months have been spent, and thus a paper must be published. The researcher must therefore present his results in a way convincing enough for the reviewers to be lulled into acceptance.

    The following are the three best methods for doing that (and I have seen those being used in practice). Next time you read someone's paper (or write your own), keep them in mind.

    1. Use an irrelevant (and preferably strict) hypothesis test.
      Suppose you want to show that a set of measurements in one group differs from the set of measurements in the other group. The typical approach here is the t-test or the Wilcoxon test, both of which detect whether elements in one group are on average greater than those in the other group. If, however, you find that these tests fail on your data (i.e., there is no easily detectable difference in measurement magnitudes), try something like the Kolmogorov-Smirnov test, which checks whether the distributions of the two groups differ at all. That is a much stricter condition: the tiniest outlier in your data will easily get you a low p-value and thus something to stick in the face of a reviewer. If even the KS test did not work, test something even less relevant, such as whether your data is normally distributed. Most probably it is not - here's your low p-value! Remember: the smaller your p-values, the better your paper!
    2. Avoid significance testing completely
      If you can't get a low p-value anywhere, do not worry. Significance testing is going somewhat out of fashion nowadays anyway, so it is possible to avoid it and still sound convincing. If one group of measurements has 40% successes and the other has 42%, why not simply present those two numbers as obvious proof that the second group is better? Using ratios is also a smart idea. Say, some baseline algorithm has a 1% chance of success. You now test your algorithm and discover that out of 10 trials it had 1 success. That means your algorithm has just demonstrated a 10% success rate, which is ten times better than the baseline! Finally, ROC curves can often be used to hide the fact that your data is too small to support any conclusions. No one ever really checks those for significance.
    3. Sweep multiple testing under the carpet
      If you are analyzing a dataset with 1000 attributes and 50 datapoints, it is not very surprising if one of those attributes seems "interesting" (e.g. highly correlated with the target effect) purely by chance - there is usually nothing significant in finding one such attribute out of a thousand. However, if you only mention that one attribute (or perhaps 10-50 of them) and keep quiet about the thousand you actually tested, your results will magically become significant and no reviewer will be able to catch your cheating. The simulation below shows how easy this is to arrange.
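
    A small R sketch of the last point: a thousand completely random attributes, fifty datapoints and a completely random target, yet the single "best" attribute looks convincingly correlated if you hide the other 999 tests.

    set.seed(1)
    n_points <- 50
    n_attrs <- 1000
    X <- matrix(rnorm(n_points * n_attrs), nrow = n_points)  # purely random "attributes"
    y <- rnorm(n_points)                                      # purely random "target"

    correlations <- apply(X, 2, cor, y = y)
    best <- which.max(abs(correlations))
    correlations[best]               # typically an impressive-looking correlation
    cor.test(X[, best], y)$p.value   # "significant", as long as the other 999 tests stay unmentioned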

    There are certainly more, and I'll keep the post updated if I come up with a worthy addition. If you have something to add, please do comment.


  • Posted by Konstantin 10.12.2011 No Comments

    It seems to be popular among some of my (very scientifically-minded) friends to somewhat provocatively defend the claim that "fundamentally, there is no difference between a scientific worldview and a superstitious or religious one". The typical justification is that the desire of one person to believe in the scientific methodology and trust only the "material world" has the same roots as the desire of another person to believe in the supernatural. Consequently, it would be unfair to claim that one set of beliefs must have priority over the other. Coming from a well-versed narrator, this line of argument can be made very convincing and leave one wondering: if the choice of basic beliefs is indeed just a matter of taste and cultural tradition, why does the "scientific worldview" still seem somewhat more "practical"? Is it just an illusion? The following is an attempt to show that it is not, and that there are some simple reasons for the objectivist worldview to be superior.

    In order to avoid misunderstanding, so common in philosophical discourses, we must start by defining some very basic terms.

    • First, an obvious yet necessary definition of us. By "us" I mean just myself and you, the reader. The existence of me requires no proof for me and I presume this is symmetric with you. The existence of you I can observe personally when we have a chance to meet. Thus, we might agree there is this thing called "us".
    • By a subjective reality I mean everything perceivable by me. This, again, requires no proof for me. I presume that there is an equivalent subjective reality for you. Most importantly, although our two realities may not necessarily be equal, they have a lot of things in common. For example, the text of this post obviously belongs to both. Thus, the set of things shared by our subjective realities shall be referred to as (our) objective reality.
    • By an individual we shall mean some entity, capable of communicating with us. That is, an individual is something capable of perceiving our signals and reacting to them. We do not assume any other properties for an individual - it may be a person, a fairy, or an alien. As long as we can communicate in some way, it fits the definition.
    • By an individual's worldview we shall mean the set of all possible reactions of that individual to all possible stimuli. That is, I declare the individual's worldview to be exactly defined by its observable behaviour. In order for two individuals to have differing worldviews, they must behave (or, to be more precise, communicate) differently in at least one situation.

    Communication can take multiple forms. Let us limit ourselves mainly to verbal communication. Consideration of nonverbal worldviews would complicate the discussion for no good reason (those would be things such as musical and culinary tastes).

    So, suppose we have an individual. We can talk to it, and it can answer something. Its worldview is then just the set of answers to the questions in a given context (-Do fairies exist? -Yes/No/I don't know; -Is this liquid poisonous? -Yes/No/I don't know; etc).

    Obviously, according to such a definition, every person in this world is an individual and has its own unique worldview. The worldviews of some people are perhaps "more scientific" and those of others are perhaps "more superstitious". In the following I am going to show that there is one worldview which is "most informative" for the purposes of our communication with that individual, and this will be precisely the "scientific" one.

    • By a basic stimulus we shall mean a stimulus to which an individual responds in a way that is reproducible to some extent. Any other stimuli (those that provoke a purely random response) are essentially uninformative and we would have no use for them in communication with the individual. The simplest example of such reproducible behaviour is a constant response. For example, most normal people, when pointed to a glass of milk and asked "Is this milk?", will reproducibly answer "Yes". Thus, "Is this milk?" is a basic stimulus.
    • Next, we shall say that the basic stimulus is objective, if there is at least some observable property of (our) objective reality, changing which would (again, reproducibly) influence the reaction to that stimulus. Any non-objective stimuli are of no interest to us, because they bear no relation to observable objective reality, and thus we have no way of interpreting the answer.
      For example, an individual might constantly respond "Boo!" to the query-phrase "Baa?", which means that "Baa?" is a basic stimulus. However, we have no way of associating the answer "Boo!" with anything material, and thus have no way of understanding this response. Hence, from the point of view of communication, the query "Baa?" makes no sense. Similarly, if a person answers the question "Do fairies exist?" with "No", no matter what, we shall call this question non-objective.

    Now that we are done with the preliminary definitions, let me state the main claims.

    • We say that a worldview is ideally scientific, if it has the maximum possible set of basic objective stimuli (and any reactions to them).

    Most probably no human has an ideally scientific worldview, simply because no human has explored all of the quirks of our objective reality. But each individual's worldview can be regarded as more or less scientific according to the number of its basic objective stimuli. Conversely,

    • We say that a worldview is ideally dogmatic, if it has the maximum possible set of basic non-objective stimuli.

    An ideally dogmatic worldview is, of course, meaningless, as it basically means that the individual has a fixed response to any question. However, we can say that each person's worldview has its own degree of dogmatism.

    Here lies the answer to the difference between a "scientific" and a "superstitious" (i.e. dogmatic or religious) worldview. The former is the only worldview which makes sense for establishing communication between individuals (irrespective of what "realities" each of them might live in or believe in). This is the reason such a worldview stands out as the "practical" one.

    It is important to understand that, in principle, there is nothing in this exposition that excludes the possibility for some people to actually have "their own" part of reality. For example, if person X tells me that he sees little green men all around, it may very well be something truly objective in his part of reality, which is, for some reason, not shared with me. Consequently, I will have to regard his statement as non-objective and as long as I do so, such a statement will be useless for communication and thus non-scientific (for the reality, which includes me and person X). I will thus say to him: "I do not refuse to believe in you seeing those little green men, but as long as I myself cannot observe their existence, discussing them with me is non-scientific, i.e. uninformative".


  • Posted by Konstantin 04.12.2011 1 Comment

    There is one rule of thumb that I find quite useful and happen to use fairly often. It is probably not widely known nor described in textbooks (I stumbled upon it on my own), so I regularly have to explain it. Next time I'll just point to this post.

    The rule is the following: a proportion estimate obtained on a sample of n points should only be trusted up to an error of \frac{1}{\sqrt{n}}.

    For example, suppose that you read in the newspaper that "25% of students like statistics". Now, if this result has been obtained from a survey of 64 participants, you should actually interpret the answer as 0.25\pm\frac{1}{\sqrt{64}}, that is, 0.25\pm 0.125, which means that the actual percentage lies somewhere between 12.5% and 37.5%.

    As another example, in machine learning you often see cases where someone evaluates two classification algorithms on a test set of, say, 400 instances, measures that the first algorithm has an accuracy of 90% and the second an accuracy of, say, 92%, and boldly claims the dominance of the second algorithm. At this point, without going deeply into statistics, it is easy to figure that 1/\sqrt{400} is somewhere around 5%, hence the difference between 90% and 92% is not significant enough to celebrate.
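
    A quick sanity check of this in R, using base R's prop.test (treating the 92% as 368 correct predictions out of 400, an assumption made purely for the sake of the example):

    n <- 400
    1 / sqrt(n)                 # the rule of thumb: 0.05

    # A proper 95% confidence interval for an observed accuracy of 368/400 = 92%,
    # via the Wilson interval with continuity correction that prop.test uses:
    prop.test(368, n)$conf.int  # roughly (0.89, 0.94) -- the 90% result is well inside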

    The Explanation

    The derivation of the rule is fairly straightforward. Consider a Bernoulli-distributed random variable with parameter p. We then take an i.i.d. sample of size n, and use it to estimate \hat p:

        \[\hat p = \frac{1}{n}\sum_i X_i\]

    The 95% confidence interval for this estimate, computed using the normal approximation, is then:

        \[\hat p \pm 1.96\sqrt{\frac{p(1-p)}{n}}\]

    What remains is to note that 1.96\approx 2 and that \sqrt{p(1-p)} \leq 0.5. By substituting those two approximations we immediately get that the interval is at most

        \[\hat p \pm \frac{1}{\sqrt{n}}\]

    Limitations

    It is important to understand the limitations of the rule. In the cases where the true proportion is around p=0.5 and n is large enough for the normal approximation to make sense (n=20 is already good), the one-over-square-root-of-n rule is very close to a true 95% confidence interval.

    When the true proportion is closer to 0 or 1, however, \sqrt{p(1-p)} is not close to 0.5 anymore, and the rule of thumb results in a conservatively large interval.

    In particular, the true 95% confidence interval for p=0.9 will be nearly two times smaller (\approx 0.6/\sqrt{n}). For p=0.99 the actual interval is five times smaller (\approx 0.2/\sqrt{n}). However, given the simplicity of the rule, the fact that the true p is rarely so close to 1, and the idea that it never hurts to be slightly conservative in statistical estimates, I'd say the one-over-a-square-root-of-n rule is a practically useful tool in most situations.

    Use in Machine Learning

    The rule is quite useful for quickly interpreting performance indicators of machine learning models, such as precision, accuracy or recall. However, you should make sure you understand which proportion is actually being computed for each metric. Suppose we are evaluating a machine learning model on a test set of 10000 elements, 400 of which were classified as "positive" by the model. We measure the accuracy of the model by computing the proportion of correct predictions on the whole set of 10000 elements. Thus, the n here is 10000 and we should expect the confidence interval of the resulting value to be under one percentage point. However, the precision of the model is measured by computing the proportion of correct predictions among the 400 positives. Here n is actually 400 and the confidence interval will be around 0.05, i.e. five percentage points.

    The rule can also be used to choose the size of the test set for your model. Despite the popular tradition, setting aside a fixed fraction of your full dataset for testing (e.g. a "75%/25% split") is arbitrary and misguided. Instead, it is the absolute size of the test sample that you should care most about. For example, if you want your precision estimates to be within a 1% range, you only need to make sure your test set includes around 10000 positives. Using an extra million examples for this estimate would increase its quality a bit, but you might be better off leaving those examples for model training instead.
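
    Inverting the rule gives a handy sizing formula: to trust a proportion up to \pm e you need roughly n = 1/e^2 examples of the relevant kind. A minimal sketch (the function name is mine):

    required_test_size <- function(desired_error) ceiling(1 / desired_error^2)
    required_test_size(0.01)   # 10000 examples for a +/- 1% estimate
    required_test_size(0.05)   # 400 examples for a +/- 5% estimate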


  • Posted by Konstantin 28.10.2011 No Comments

    I do not know who the author is, but I think this is great:

    Self-referential question


  • Posted by Konstantin 22.10.2011 No Comments

    This year I have been lucky to be taking part in the Robotex robot-building course. Despite being an awful time-sink, it is also uniquely enlightening. Our team has a blog documenting the progress. If you think you might be interested in what "making a small robot" means, and what kinds of problems come up along the way, do take a peek:

    http://robotex.ing.ee/


  • Posted by Konstantin 12.10.2011 16 Comments

    The Receiver Operating Characteristic Area Under the Curve (ROC AUC) is a measure of classifier performance, which is widely used in machine learning. Unfortunately, the obscure way the concept is explained in most sources makes it fairly hard to grasp its intuitive meaning. The name "area under the curve" itself is ill-conceived and utterly useless in helping the intuition. The aim of this post is to aid those struggling with the concept, and also to present a simple and intuitive interpretation of the ROC AUC metric as the "average positive rank", which I have so far not seen stated explicitly elsewhere.

    To facilitate explanation, let us consider a hypothetical classification problem: classifying chocolate bars as "very good" or "just OK" (we presume a chocolate bar may not be considered "bad", by the very definition of chocolate). Suppose we have a sample of particular chocolate bar types, to which we have carefully assigned the proper classes, e.g.:

    Item             Class
    Mars             0
    Milka            1
    Kalev            0
    Lindt            1
    Ferrero-Rocher   1
    Laima            0
    Hershey's        0

    where class label "1" denotes "very good" and "0" means "just OK". Suppose we have derived a classifier which, based on some features of a chocolate bar, such as its chocolate content, packaging, country of origin, etc, attempts to assign a "goodness score" which should resemble our notion of goodness. For concreteness' sake, let us say the classifier assigned the following scores:

    Item             True class   Assigned score
    Milka            1            8
    Mars             0            5
    Lindt            1            3
    Ferrero-Rocher   1            3
    Kalev            0            1
    Laima            0            -3
    Hershey's        0            -5

    How do we assess the goodness of such a classifier? The "ROC approach" suggests we do the following. First, we shall plot a ROC curve. To do that, we order the chocolate bars according to their assigned score (this is already done in the table above) and use this ordering to write out the vector of the class labels (the "sorted prediction vector"):

    The sorted prediction vector, from the highest rated item's true class (left) to the lowest rated item's (right):

    1 0 1 1 0 0 0

    Obviously, the best possible classifier would order the instances so that all the ones would be on the left of all the zeroes. The particular classifier we chose here looks good, but not perfect.

    Next, we use the obtained vector to plot the ROC curve. To make this plot, forget all of the "True Positive Rate" and "False Positive Rate" nonsense you might have read about in most textbooks, and follow this simple recipe:

    1. Draw a square with sides of length 1. Denote its lower left corner as the "origin", its horizontal axis as "the negatives" and its vertical axis as "the positives". Split the negatives axis into as many parts as there are examples of class "0" in the dataset. Split the positives axis into as many parts as there are examples of class "1" in your dataset. In our case, the square will look as follows:

      The ROC square

    2. Next, take the sorted prediction vector, and draw a path, starting from the origin. For each "1" in the sorted prediction vector, the path will move one step "up", and for each "0" in the vector, the path will move one step "right" (note that if you have multiple predictions with the same score, make all the corresponding "up" and "right" moves at once by drawing the corresponding diagonal). In our case, the path goes ("up", "right", "up", "up", "right", "right", "right") and hence looks as follows:

      ROC curve

    Voilà, this is your ROC curve. Obviously, the "perfect" classifier would make all the "up" steps first and all the "right" steps last, hence the ideal ROC curve looks like a perfect corner. A random classifier would mix its "up" and "right" steps randomly, and the curve would most probably follow a diagonal-ish path. The one-number summary of the "goodness" of such a path is the "area under the curve" (coloured gray here), which will be 1.0 for the ideal case and somewhere around 0.5 for a random classifier. In our case the area is 10/12 ~ 0.83.
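
    Here is a minimal R sketch of that computation on the chocolate example (ties between a positive and a negative, which would require the diagonal treatment, do not occur here):

    sorted_labels <- c(1, 0, 1, 1, 0, 0, 0)   # the sorted prediction vector from above
    n_pos <- sum(sorted_labels == 1)
    n_neg <- sum(sorted_labels == 0)

    # Every "right" step (a 0) adds a column of cells whose height equals the
    # number of 1s seen so far; summing these columns gives the area in cells.
    column_heights <- cumsum(sorted_labels == 1)[sorted_labels == 0]
    sum(column_heights) / (n_pos * n_neg)     # 10/12 = 0.8333...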

    Here comes the important question — what does this number actually mean? One good answer is given by the following probabilistic interpretation:

    The area under ROC curve specifies the probability that, when we draw one positive and one negative example at random, the decision function assigns a higher value to the positive than to the negative example.

    There is, however, yet another, to my mind even more intuitive, interpretation. Define the rank of a positive example as the proportion of negative examples "to the right of it" in the sorted prediction vector. For example, if the sorted prediction vector is (1, 0, 0, 0, 0), then the rank of the sole positive item there is 1.0 (all negative examples are to the right). If the sorted prediction vector is (0, 0, 1, 0, 0), the rank is 0.5 (half of the negative examples are to the right), and if the vector is (0, 0, 0, 0, 1), the rank is 0. In our chocolate bar example, the ranks of the positive examples are the following:

    Sorted prediction vector (highest rated item first): 1 0 1 1 0 0 0
    Ranks of the positive examples, in the same order: 1.0, 0.75, 0.75

    The "rank" thus denotes where the classifier positions a given positive item within the set of negative examples: closer to the "left" (rank 1) or closer to the "right" (rank 0).

    Now, the average rank of all the positive examples seems like a reasonable metric of classifier performance, doesn't it? In fact, it is exactly the ROC AUC! A ROC AUC value of 0.80 can thus be interpreted intuitively as follows: "the average positive example has about 20% of the negative examples scored higher than it". This, so far, seems to me personally to be the simplest interpretation of the ROC AUC. So write it down:

    ROC AUC is the average positive rank.
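
    As a sanity check, a small R sketch computing the average positive rank on the chocolate example and recovering the same 10/12 as the area under the staircase:

    sorted_labels <- c(1, 0, 1, 1, 0, 0, 0)          # sorted prediction vector, best score first
    positive_positions <- which(sorted_labels == 1)
    negative_positions <- which(sorted_labels == 0)

    # Rank of a positive example = proportion of negatives to the right of it
    ranks <- sapply(positive_positions, function(p) mean(negative_positions > p))
    ranks          # 1.00 0.75 0.75
    mean(ranks)    # 0.8333...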


  • Posted by Konstantin 09.05.2011 2 Comments

    I have recently realized that my HP 8440p laptop has a built-in "Qualcomm un2420 Broadband Module" device, also known as a "3G modem". For some reason no drivers were preinstalled for it on my system, and with the SIM card slot concealed behind the battery, it was not something I noticed immediately. Once the drivers were installed, the operating system had no problem recognizing the new "Mobile Broadband Connection" opportunity, and with the SIM card in the slot, I could connect to the Internet via 3G, yay.

    Knowing that there is more to mobile communication than Internet access, I was wondering whether I could do anything else, like voice calls or SMS. Unfortunately, my attempts at finding any reasonable software package that would open up the power of 3G to me at the click of a button failed. Instead, however, I discovered that it is actually quite easy to communicate with the modem directly. It turns out you can control your shiny bleeding-edge 3.5G device by sending plain old AT commands to it over a serial port. This is the same protocol that the wired grandpa-modems have been speaking since the 70s, and it is fun to see that this language has been carried along all the way into the wireless century.

    Let me show you how it works. Try to follow along — this is kinda fun.

    Finding your Modem

    If your computer does not have a built-in 3G modem, chances are high your garden variety cellphone does (not to mention smartphones, of course). If it is the case, then:

    • Switch on the Bluetooth receiver on your phone (for older Nokias this is usually in the "Settings -> Connectivity -> Bluetooth" menu).
    • On your computer, go to "Devices and Printers", click "Add a device", wait until your phone appears on the list, double click it and follow the instructions to establish the connection. (I'm talking about Windows 7 here, but the procedure should be similar for most modern OSs).
    • Once the computer recognizes your phone and installs the necessary drivers, it will appear as an icon in the "Devices" window. Double click it to open the "Properties" window, and make sure there is a "Standard Modem over Bluetooth link" function or something similar in the list.

      Cellphone Functions


    • Double-click that "modem" entry, a new properties window opens. Browse along in it, and find the COM port number that was assigned to the modem.

      Bluetooth modem at port COM9


    If you do have a 3G modem bundled with your laptop (and you have the drivers installed), open the Device manager ("Control Panel -> Device Manager"), find the modem in the list, double click to open the "Properties" page, and browse to the "Modem" tab to find the COM port number.

    Laptop 3G modem in Device Manager


    Connecting to the Modem

    Next thing - connect to the COM port. In Windows use PuTTY to do it. In Linux use minicom. Don't worry about the settings — the defaults should do.

    PuTTY connection dialog


    Once the connection starts, you will get a blank screen with nothing but a cursor. Try typing "AT+CGMI" followed by a <RETURN>. Note that, depending on the settings of your device and the terminal program, you might not see your letters being typed. If so, you will have to reconfigure the terminal (enable "local echo"). But for now, just type the command. You should get the name of the manufacturer in response. You can also get the word "ERROR" instead. This means that your modem is ready to talk to you, but it either does not support the "AT+CGMI" command OR requires you to enter the PIN code first. We'll get to it in a second. If you get no response at all, you must have connected to the wrong COM port.

    Terminal sessions with a Qualcomm (left) and Nokia (right) 3G modems


    You can get more information about the device using the "AT+CGMM", "AT+CGMR", "AT+CGSN" commands. Try those.

    Authentication

    To do anything useful, you need to authenticate yourself by entering the PIN (if you use your cellphone over Bluetooth, you have most probably already entered it and no additional authentication is needed). You can check whether you need to enter a PIN using the command "AT+CPIN?" (note the question mark). If the response is "+CPIN: READY", your SIM card is already unlocked. Otherwise the response will probably be "+CPIN: SIM PIN", indicating that a PIN is expected to unlock the card. You enter the PIN using the "AT+CPIN=<your pin>" command. Please note that if you enter an incorrect PIN three times, YOUR SIM CARD WILL BE BLOCKED (and you will have to go fetch your PUK code to unblock it), so be careful here.

    Entering the PIN


    Doing Stuff

    Now the fun starts: you can try dialing, sending and receiving messages and do whatever the device lets you do. The (nonexhaustive) list of most interesting commands is available here. Not all of them will be supported by your device, though. For example, I found out that the laptop's 3G modem won't let me dial numbers, whilst this was not a problem for a cellphone connected over bluetooth (try the command "ATD<your number>;" (e.g. "ATD5550010;")). On the other hand, the 3G modem lets me list received messages using the "AT+CMGL" command, while the phone refused to do it.

    One useful command to know about is "AT+CUSD", which lets you send USSD messages (those "*1337#" codes) to the mobile service provider. For example, the prepaid SIM card I bought for my computer allows me to buy a "daily internet ticket" (unlimited high-speed internet for 12 hours for 1 euro) via the "*135*78#" code. Here's how this can be done via the terminal.

    Sending an USSD code and receiving an SMS


    We first send the "AT+CUSD=1,*135*78#" command, which is equivalent to dialing "*135*78#" on the phone. The modem immediately shows us the operator's response ("You will shortly receive an SMS with information..."). We then list new SMS messages using the "AT+CMGL" command. There is one message, which is presented to us in the PDU encoding. A short visit to an online PDU decoder lets us decode the message - it simply says that the ticket is activated. Nice.

    Sending an SMS

    Finally, here's how you can send a "flash" SMS (i.e. one which does not get saved on the receiver's phone by default and can thus easily confuse people - try sending one of those at night, good fun). We start with an ATZ to reset the modem, just in case. Then we set the message format to "text mode" using the "AT+CMGF=1" command (the default alternative is "PDU mode", in which we would have to type SMS messages encoded in PDU). Then we set the message parameters using the "AT+CSMP" command (the last 16 is responsible for the message being 'flash'). Finally, we start sending the message using the "AT+CMGS=<phone number>" command. We finish typing the message with <Ctrl+Z> and off it goes.

    Sending a "Flash" SMS
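
    For reference, the whole exchange looks roughly like this - a sketch, not a verbatim transcript: the phone number is made up, and the first three AT+CSMP values are typical defaults, so check your modem's documentation before copying them.

    ATZ                      (reset the modem, just in case)
    OK
    AT+CMGF=1                (switch to text mode)
    OK
    AT+CSMP=17,167,0,16      (message parameters; the final 16 marks the SMS as "flash")
    OK
    AT+CMGS="5550010"        (start composing a message to the given number)
    > Boo! Sleep well.       (type the text, finish with <Ctrl+Z>)
    +CMGS: 12                (the modem acknowledges with a message reference number)
    OK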


    For more details, refer to this tutorial or the corresponding specifications.

    All in all, it should be fairly easy to make a simple end-user interface for operating the 3G modem; it is strange that there is not much free software available providing this functionality. If you find some, or decide to make it yourself, tell me.


  • Posted by Konstantin 16.03.2011 No Comments

    This is a (slightly modified) write-up of a part of a lecture I did for the "Welcome to Computer Science" course last semester.

    Part I. Humans Discover the World

    How it all started

    Millions of years ago humans were basically monkeys. Our ape-like ancestors enjoyed a happy existence in the great wide world of Nature. Their life was simple, their minds were devoid of thought, and their actions were guided by simple cause-and-effect mechanisms. Although for a modern human it might seem somewhat counterintuitive or even hard to imagine, the ability to think or understand is, in fact, completely unnecessary to successfully survive in this world. As long as a living creature knows how to properly react to the various external stimuli, it will do just fine. When an ape sees something scary — ape runs. When an ape sees something tasty — ape eats. When an ape sees another ape — ape acts according to whatever action pattern is wired into its neural circuits. Does the ape understand what is happening and make choices? Not really — it is all about rather basic cause and effect.

    As time went by, evolution blessed our ape-like ancestors with some extra brain tissue. Now they could develop more complicated reaction mechanisms and, in particular, they started to remember things. Note that, in my terminology here, "remembering" is not the same as "learning". Learning is about simple adaptation. For example, an animal can learn that a particular muscle movement is necessary to get up on a tree — a couple of failed attempts will rewire its neural circuit to perform this action as necessary. One does not even need a brain to learn — the concentration of proteins in a bacterium will adjust to fit the particular environment, essentially demonstrating a learning ability. "Remembering", however, requires some analytical processing.

    Remembering

    It is easy to learn to flex a particular finger muscle whenever you feel like climbing up a tree, but it is a totally different matter to actually notice that you happen to perform this action regularly, for some reason. It is even more complicated to see that the same finger action is performed both when you climb a tree and when you pick a banana. Recognizing such analogies between actions and events is not plain "learning" any more. It is not about fine-tuning the way a particular cause-and-effect reflex works. It is a kind of information processing. As such, it requires a certain amount of "memory" to store information about previous actions, and some pattern analysis capability to be able to detect similarities, analogies and patterns in the stored observations. Those are precisely the functions that were taken over by the "extra" brain tissue.

    So, the apes started "remembering", noticing analogies and making generalizations. Once the concept of "grabbing" is recognized as a recurring pattern, the idea of grabbing a stone instead of a tree branch is not far away. Further development of the brain led to better "remembering" capabilities and more and more patterns discovered in the surrounding world, which eventually led to the birth of symbolic processing in our brains.

    Symbols

    What is "grabbing"? It is an abstract notion, a recurring pattern, recognized by one of our brain circuits. The fact that we have this particular circuit allows us to recognize further occurrences of "grabbing" and generalize this idea in numerous ways. Hence, "grabbing" is just a symbol, a neural entity that helps our brains to describe a particular regularity in our lives.

    As time went by, prehistoric humans became aware (or, let me say "became conscious") of more and more patterns, and developed more symbols. More symbols meant better awareness of the surrounding world and its capabilities (hence, the development of tools), more elaborate communication abilities (hence, the birth of language), and, recursively, better analytic abilities (because using symbols, you can search for patterns among patterns).

    Symbols are immensely useful. Symbols are our way of being aware of the world, our way of controlling this world, our way of living in this world. The best thing about them is that they are easily spread. It may have taken centuries of human analytical power to note how the Sun moves along the sky, and how a shadow can be used to track time. But once this pattern has been discovered, it can be recorded and used infinitely. We are then free to go searching for other new exciting patterns. They are right in front of us, we just need to look hard. This makes up an awesome game for the humankind to play — find patterns, get rewards by gaining more control of the world, achieve better life quality, feel good, everyone wins! Not surprisingly, humans have been actively playing this game since the beginning of time. This game is what defines humankind, this is what drives its modern existence.

    Science

    Galileo's experiment

    "All things fall down" — here's an apparently obvious pattern, which is always here, ready to be verified. And yet it took humankind many years to discover even its most basic properties. It seems that the europeans, at least, did not care much about this essential phenomenon until the XVIIth century. Only after going through millenia self-deception, followed by centuries of extensive aggression, devastating epidemics, and wild travels, the europeans found the time to sit down and just look around. This is when Galileo found out that, oh gosh, stuff falls down. Moreover, it does so with the same velocity independently of its size. In order to illustrate this astonishing fact he had to climb on to the tower of Pisa, throw steel balls down and measure the fall time using his own heartbeat.

    In fact, the late Renaissance was most probably the time when Europeans finally became aware of the game of science (after all, this is also a pattern that had to be discovered). People opened their eyes and started looking around. They understood that there were patterns waiting to be discovered: ones you could not see well, ones you had to look hard for. Naturally, and somewhat ironically, the sky was the place they looked towards the most.

    Patterns in the Sky

    Tycho Brahe

    Tycho Brahe, a contemporary of Galileo, was a rich nobleman. Like many other rich noblemen of his time, he was excited about the sky and spent his nights as an astronomer. He truly believed there were patterns in planetary motions, and even though he could not see them immediately, he carefully recorded the daily positions of the stars and planets, resulting in a vast dataset of observations. The basic human "remembering" ability was not enough anymore — the data had to be stored on an external medium. Tycho carefully guarded his measurements, hoping to discover as much as possible himself, but he was not the one to find the pattern of planetary motion. His assistant, Johannes Kepler, got hold of the data after Tycho's death. He studied the data and came up with three simple laws which described the movements of planets around the Sun. The laws were somewhat weird (the planets are claimed to sweep equal areas along an ellipse for no apparent reason), but they fit the data well.

    Kepler's Laws

    This story perfectly mirrors basic human pattern discovery. In the basic case, a human first observes the world, then uses his brain to remember the observations, analyze them, find a simple regularity, and come up with an abstract summarizing symbol. Here the exact same procedure is performed on a larger scale: a human performs observations, a paper medium is used to store them, another human's mind is used to perform the analysis, and the result is a set of summarizing laws.

    Isaac Newton

    A hundred years later still, Isaac Newton looked hard at both Galileo's and Kepler's models and managed to summarize them further into a single equation: the law of gravity. It is somewhat weird (everything is claimed to be attracted to everything for no apparent reason), but it fits the data well. And the game is not over yet: three centuries later we are still looking hard, trying to understand gravity.

    Where are we going

    As we play the game, we gradually run out of the "obvious" patterns. Detecting new laws of nature and society becomes more and more complicated. Tycho Brahe had to delegate his "memory" capabilities to paper. In the 20th century, the advent of automation helped us to delegate not only "memory", but the observation process itself. Astronomers do not have to sit at their telescopes and manually write down stellar positions anymore — automated radar arrays keep a constant watch on the sky. The same is true of most other scientific disciplines, to various extents. The last part of this puzzle which is not fully automated yet is the analysis part. Not for long...

    Part II. Computers Discover the World

    Manufactured life

    Vacuum tube

    The development of electricity was the main industrial highlight of the XIXth century. One particularly important invention of that century was an incredibly versatile electrical device called the vacuum tube. A lightbulb is a vacuum tube. A neon lamp is a vacuum tube. A CRT television set is a vacuum tube. But, all the fancy glowing stuff aside, the most important function of a vacuum tube turned out to be its ability to act as an electric current switch. Essentially, it allowed one to hardwire a very simple program:

    if (wire1) then (output=wire2) else (output=wire3)

    It turns out that by wiring thousands of such simple switches together, it is possible to implement arbitrary algorithms. Those algorithms can take input signals, perform nontrivial transformations of those signals, and produce appropriate outputs. But the ability to process inputs and produce nontrivial reactions is, in fact, the key factor distinguishing the living beings from lifeless matter. Hence, religious, spiritual, philosophical and biological aspects aside, the invention of electronic computing was the first step towards manufacturing life.

    Of course, the first computers were not at all like our fellow living beings. They could not see or hear, nor walk or talk. They could only communicate via signals on electrical wires. They could not learn — there were no mechanisms to automatically rewire the switches in response to outside stimuli. Neither could they recognize and "remember" patterns in their inputs. In general, their hardwired algorithms seemed somewhat too simple and too predictable in comparison to living organisms.

    Transistors

    But development went on at an astonishing pace. The 1940s gave us the most important invention of the XXth century: the transistor. A transistor provides the same switching logic as a vacuum tube, but is tiny and power-efficient. Computers with millions and billions of transistors became gradually available. Memory technologies caught up: bytes grew into kilobytes, megabytes, gigabytes and terabytes (expect to see a cheap petabyte drive at your local computer store in less than 5 years). The advent of networking and the Internet, multicore and multiprocessor technologies followed closely. Nowadays the potential for creating complex, "nontrivial" lifelike behaviour is not limited so much by the hardware capabilities. The only problem left to solve is wiring the right program.

    Reasoning

    The desire to manufacture "intelligence" surfaced early on in the history of computing. A device that can be programmed to compute must surely be programmable to "think" too. This was the driving slogan for computer science research through most of the 1950s-1980s. The main idea was that "thinking", a capability defining human intellectual superiority over fellow mammals, was mainly related to logical reasoning.

    "Socrates is a man. All men are mortal. => Hence, Socrates is mortal."

    As soon as we teach computers to perform such logical inferences, they will become capable of "thinking". Many years of research have been put into this area, and not in vain. By now, computers are indeed quite successful at performing logical inference, playing games and searching for solutions to complex discrete problems. But the catch is, this "thinking" does not feel like proper "intelligence". It is still just a dumb preprogrammed cause-and-effect thing.

    The Turing Test

    Alan Turing

    A particular definition of "thinking" was provided by Alan Turing in his Turing test: let us define intelligence as the capability of imitating a human in a conversation, so well as to be indistinguishable from a real human. This is a hard goal to pursue. It obviously cannot be achieved by a bare logical inference engine. In order to imitate a human, the computer has to know what a human knows, and that is a whole lot of knowledge. So, perhaps intelligence could be achieved by formalizing most of human knowledge within a powerful logical inference engine? This has been done, and done fairly well, but sadly, the result still does not resemble real intelligence.

    Reasoning by Analogy

    Optical character recognition

    While hundreds of computer science researchers were struggling hard to create the ultimate knowledge-based logical system, real-life problems were waiting to be solved. No matter how good the computer became at solving abstract logical puzzles, it seemed helpless when faced with some of the most basic human tasks. Take, for example, character recognition. A single glimpse at a line of handwritten characters is enough for a human to recognize the letters (unless it is my handwriting, of course). But what logical inference should the computer perform to do the same? Obviously, humans do not perform this task using reasoning, but rely on intuition instead. How can we "program" intuition?

    The only practical way to automate character recognition turned out to be rather simple, if not to say dumb. Just store many examples of actual handwritten characters. Whenever you need to recognize a character, find the closest match in that database and voilà! Of course, there are details which I sweep under the carpet, but the essence is here: recognition of characters can only be done by "training" on a dataset of actual handwritten characters. The key part of this "training" lies, in turn, in recognizing (or defining) the analogies among letters. Thus, the "large" task of recognizing characters is reduced to the somewhat "smaller" task of finding out which letters are similar, and what features make them similar. But this is pattern recognition, not unlike the rudimentary "remembering" ability of our early human ancestors.
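
    The essence of that approach fits in a few lines of R. This is only a toy sketch with made-up feature vectors rather than real character images, but the "find the closest stored example" logic is all there:

    set.seed(1)
    stored_examples <- matrix(rnorm(200 * 16), nrow = 200)        # 200 stored "characters", 16 features each
    stored_labels   <- sample(letters[1:5], 200, replace = TRUE)  # the letter each one is known to be

    recognize <- function(new_example) {
      # Euclidean distance to every stored example; answer with the closest one's label
      distances <- apply(stored_examples, 1, function(ex) sqrt(sum((ex - new_example)^2)))
      stored_labels[which.min(distances)]
    }

    recognize(rnorm(16))   # "recognizes" a random 16-feature input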

    The Meaning of Life

    Please, observe and find the regularity in the following list:

    • An ape observes its actions, recognizes regularities, and learns to purposefully grab things.
    • Galileo observes falling bodies, recognizes regularities, and learns to predict the falling behaviour.
    • Tycho Brahe observes stars, Johannes Kepler recognizes regularities, and learns to predict planetary motion.
    • Isaac Newton observes various models, recognizes regularities, and develops a general model of gravity.
    • Computer observes handwritten characters, recognizes regularities, and learns to recognize characters.
    • Computer observes your mailbox, recognizes regularities, and learns to filter spam.
    • Computer observes natural language, recognizes regularities, and learns to translate.
    • Computer observes biological data, recognizes regularities, and discovers novel biology.

    Unexpectedly for us, we have stumbled upon a phenomenon which, when implemented correctly, really "feels" like true intelligence. Hence, intelligence is not about logical inference nor extensive knowledge. It is all about the skill of recognizing regularities and patterns. Humans have evolved from preprogrammed cause-and-effect reflexes through simple "remembering" all the way towards fairly sophisticated pattern analysis. Computers now are following a similar path and are gradually joining us in The Game. There is still a long way to go, but we have a clear direction: The Intelligence, achieving which means basically "winning" The Game. If anything at all, this is the purpose of our existence - discovering all the regularities in the surrounding world for the sake of total domination of Nature. And we shall use the best intelligence we can craft to achieve it (unless we all die prematurely, of course, which would be sad, but someday some other species would appear to take a shot at the game).

    Epilogue. Strong AI

    There is a curious concept in the philosophical realms of computer science — "The Strong AI Hypothesis". It relates to the distinction between manufacturing "true consciousness" (so-called "strong AI") and creating "only a simulation of consciousness" (the "weak AI"). Although it is impossible to distinguish the two experimentally, there seems to be an emotional urge to make the distinction. This usually manifests in argumentation of the following kind: "System X is not true artificial intelligence, because it is a preprogrammed algorithm; Humans will never create true AI, because, unlike us, a preprogrammed algorithm will never have free will; etc."

    Despite the seemingly unscientific nature of the issue, there is a way to look at it rationally. It is probably true that we shall never grant "true intelligence" or "consciousness" to anything which acts according to an algorithm that is, in some sense, predictable or understandable by us. On the other hand, every complex system that we ever create has to be made according to clearly understandable blueprints. The proper way of phrasing the "Strong AI" question is therefore the following: is it possible to create a system which is built according to "simple" blueprints, and yet whose behaviour is beyond our comprehension?

    Cellular automaton

    The answer to this question is not immediately clear, but my personal opinion is that it is a strong "yes". There are at least three kinds of approaches known nowadays which provide a means for us to create something "smarter" than us. Firstly, using everything fractal, cellular, and generally chaotic is a simple recipe for producing incomprehensibly complex behaviour from trivial rules. The problem with this approach, however, is that there is no good methodology for crafting any useful functions into a chaotic system.

     

    The second candidate is anything neural — obviously the choice of Mother Nature. Neural networks have the same property of being able to demonstrate behaviour, which is not immediately obvious from the neurons or the connections among them. We know how to train some types of networks and we have living examples to be inspired by. Nonetheless, it is still hard to actually "program" neural networks. Hence, the third and the most promising approach — general machine learning and pattern recognition.

    The idea of a pattern recognition-based system is to use a simple algorithm, accompanied by a huge dataset. Note that the distinction between the "algorithm" and the "dataset" here draws a clear boundary between two parts of a single system. The "algorithm" is the part which we need to understand and include in our "blueprints". The "data" is the remaining part, which we do not care knowing about. In fact, the data can be so vast, that no human is capable of grasping it completely. Collecting petabytes is no big deal these days any more. This is a perfect recipe for making a system which will be capable of demonstrating behaviour that would be unpredictable and "intelligent" enough for us to call it "free will".

    Think of it...
