• Posted by Margus 25.10.2009 5 Comments

    That is, if we are lucky and everything is done according to the proper tradition of doing statistics at a 1% significance level. In practice, things are probably even worse (many use 5%, for instance), but this is what you would expect even if everyone used proper methodology.

    Think about it...


  • Posted by Konstantin 23.10.2009 No Comments

    The Alt+Tab key combination is perhaps one of the best-known keyboard shortcuts in Windows, with the only competitors for the throne being Ctrl+C and Ctrl+V. And no matter whether you alt-tab for work or simply as a means of efficient undercover procrastination, unless you are a complete novice you probably have this skill at the level of a basic instinct.

    Unfortunately, there are cases where the instinct becomes inconvenient. Namely, whenever you use an application that displays multiple documents in separate tabs (like Firefox or Notepad++) or in separate child windows (like R), you are expected to use Ctrl+Tab rather than Alt+Tab to switch among documents. However, most of the time switching among documents feels no different from switching among programs, so the fact that Alt+Tab does not work in that case is highly unintuitive. My typical case is that I accidentally press Alt+Tab while trying to switch between the editor and the console in R and unexpectedly find a completely different window in front of me, which is quite annoying.

    I am pretty sure I am not the only one to experience this kind of frustration, so it is surprising that Google does not seem to know of any easily available solution to this trivial issue. Since the whole problem can be solved to a fair extent by simply translating Alt keypresses into Ctrl in a smart way, I've made a smallish program that does exactly that.

    I'm quite happy with the result and can't help sharing it with you.

    Download: Binary, Source.


  • Posted by Konstantin 11.10.2009 2 Comments

    I've recently stumbled upon a simple observation, which does not seem to be common knowledge and yet looks quite enlightening. Namely: polynomials provide an excellent way of modeling data in an order-agnostic manner.

    Situations where you need to represent data in an order-agnostic way are actually fairly common. Suppose that you are given a traditional sample x_1, x_2, \dots, x_n and are faced with the task of devising a generic function of the sample, which may depend only on the values in the sample, but not on the ordering of these values. Alternatively, you might need to prove that a given statistic is invariant under all permutations of the sample. Finally, you might simply wish to have a convenient mapping for your feature vectors that would lose the ordering information, but nothing else.

    The most common way of addressing this problem is sorting the sample and working with the order statistics x_{(1)}, x_{(2)}, \dots, x_{(n)} instead of the original values. This is not always convenient. Firstly, the mapping of the original sample to the corresponding vector of order statistics (i.e. the sorting operation) is quite complicated to express mathematically. Secondly, the condition that the vector of order statistics is always sorted is not very pleasant to work with. A much better idea is to represent your data as a polynomial of the form

        \[p_x(z) = (z+x_1)(z+x_2)\dots(z+x_n)\,.\]

    This will immediately provide you with a marvellous tool: the roots of p_x are -x_1,\dots,-x_n, and two polynomials p_x and p_y are equal if and only if they have the same roots (counted with multiplicity), which means, in our case, that the samples x_1,\dots,x_n and y_1,\dots,y_n are equal up to a reordering.

    Now in order to actually represent the polynomial we can either directly compute its coefficients

        \[p_x(z) = z^n + a_1z^{n-1} + \dots + a_n\,,\]

    or calculate its values at any n different points (e.g. at 0,1,\dots,n-1), which is enough to pin down a monic polynomial of degree n. In either case we end up with the same amount of data as we had originally (i.e. n values), but the new representation is order-agnostic and has, arguably, much nicer properties than the order statistics vector.
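
    As a rough illustration, here is a minimal Python sketch of both representations. The function names poly_coefficients and poly_values are made up for this example, and the code assumes integer (or otherwise exact) sample values, since with floating-point numbers rounding may make the result depend slightly on the order of multiplication. Note that a_1 comes out as the sum of the sample and a_n as its product.

```python
from math import prod  # available in Python 3.8+


def poly_coefficients(sample):
    """Return [1, a_1, ..., a_n] for p_x(z) = (z + x_1)...(z + x_n).

    Coefficients are listed from the highest power of z down to the
    constant term. Each pass multiplies the current polynomial by one
    linear factor (z + x), so the whole loop takes O(n^2) operations.
    """
    coeffs = [1]
    for x in sample:
        times_z = coeffs + [0]                   # current polynomial times z
        times_x = [0] + [x * c for c in coeffs]  # current polynomial times x
        coeffs = [a + b for a, b in zip(times_z, times_x)]
    return coeffs


def poly_values(sample, points=None):
    """Evaluate p_x at the given points (default: 0, 1, ..., n-1)."""
    if points is None:
        points = range(len(sample))
    return [prod(z + x for x in sample) for z in points]


a = [3, 1, 2]
b = [2, 3, 1]   # the same values in a different order
c = [1, 1, 4]   # same sum as a, but a different sample

print(poly_coefficients(a))   # [1, 6, 11, 6]
print(poly_coefficients(b))   # [1, 6, 11, 6]
print(poly_coefficients(c))   # [1, 6, 9, 4]
print(poly_values(a))         # [6, 24, 60]
```

    Both a and b map to the same vector, while c, which shares the sum but not the values, does not.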

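    It is not without its own problems, of course. Firstly, it is more expensive to compute than a sort: expanding the product in the straightforward way takes O(n^2) time, and although FFT-based multiplication can reduce this to O(n \log^2 n), sorting needs only O(n \log n). Secondly, not every polynomial has n real-valued roots, so not every coefficient vector corresponds to a real-valued sample. And thirdly, the interpretation of the new "feature vector" is not necessarily intuitive or meaningful. Nonetheless, it's a trick to consider.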
