• Posted by Konstantin 09.09.2008 3 Comments

    Every time you visit this page, a piece of JavaScript code will run within your browser, render a small part of the picture below (which is, for the sake of beauty and simplicity, a fragment of the Mandelbrot fractal) and submit the resulting pixels to the server. After 100 visits the whole picture will be complete (and the rendering restarts). If I hadn't told you that, you wouldn't have the slightest chance of noticing how this page steals your CPU cycles, and that is why one might refer to such a practice as parasitic or leech computing.

    The Mandelbrot fractal

    In this simple example, I am probably not gaining much by outsourcing the rendering procedure. The computation of each pixel requires about 800 arithmetic operations on average, and this is comparable to the overhead imposed by the need to communicate the results back to the server via HTTP. However, if I chose to render somewhat larger chunks of the image at higher precision, the gains would be much more significant. Additionally, the script could be written so that it kept running continuously for as long as you stay on the page, thus sacrificing the user experience somewhat, yet blatantly robbing you of CPU power.
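    For the curious, here is a minimal sketch of the kind of script described above. It is only an illustration under my own assumptions: the image region, the chunk layout, the /submit_pixels endpoint and the parameter encoding are all invented and need not match the actual code running on this page.

    ```javascript
    // Render one horizontal strip of a Mandelbrot image and send the escape-time
    // values back to the server. Everything here (image region, endpoint name,
    // parameter encoding) is a hypothetical sketch, not the script on this page.
    function renderChunk(chunkIndex, totalChunks, width, height, maxIter) {
      var rowsPerChunk = Math.ceil(height / totalChunks);
      var startRow = chunkIndex * rowsPerChunk;
      var endRow = Math.min(startRow + rowsPerChunk, height);
      var pixels = [];
      for (var y = startRow; y < endRow; y++) {
        for (var x = 0; x < width; x++) {
          // Map the pixel to a point c in the complex plane, roughly [-2,1] x [-1.5,1.5].
          var cRe = -2.0 + 3.0 * x / width;
          var cIm = -1.5 + 3.0 * y / height;
          var zRe = 0, zIm = 0, iter = 0;
          // Iterate z <- z^2 + c until divergence or the iteration limit.
          while (zRe * zRe + zIm * zIm <= 4 && iter < maxIter) {
            var tmp = zRe * zRe - zIm * zIm + cRe;
            zIm = 2 * zRe * zIm + cIm;
            zRe = tmp;
            iter++;
          }
          pixels.push(iter); // escape time; the server maps it to a colour
        }
      }
      return pixels;
    }

    // Submit the computed strip back to the server.
    function submitChunk(chunkIndex, pixels) {
      var req = new XMLHttpRequest();
      req.open("POST", "/submit_pixels", true);
      req.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      req.send("chunk=" + chunkIndex + "&data=" + pixels.join(","));
    }

    submitChunk(42, renderChunk(42, 100, 400, 300, 500));
    ```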

    It seems that this approach to distributed computing has not reached the masses yet. I believe, however, that we are going to see the spread of such parasitic code someday, because it is the second easiest way to monetize website traffic. Indeed, we are already used to watching ads in return for free service. Moreover, quite a lot of the ads are rather heavy Flash applications that spend your CPU cycles with the sole purpose of annoying you. Now, if someone replaced that annoying flashing banner with a script that computed something useful behind the scenes, you wouldn't be too disappointed, would you? And that someone could then sell his website traffic not in terms of "banner displays", but in terms of "CPU seconds". Or, well, he could sell both.

    Of course, not every distributed computation can be easily implemented within such an environment. Firstly, it should be possible to divide the problem into a large number of independent parts: this is precisely the case when you need to compute the values of a certain function f for a large number of parameters. The Mandelbrot example above fits this description. Here is one other similar problem. Less obviously, various other tasks could be fitted into the framework with the help of the Map-Reduce trick, as sketched below.
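    To make this "embarrassingly parallel" structure a bit more concrete, here is a hedged sketch of the Map-Reduce framing: the parameter space is split into independent chunks (one per visitor), each visitor maps f over its chunk, and the server folds the partial results together. All function names and the toy computation are made up for illustration.

    ```javascript
    // Split a list of parameters into independent chunks, one chunk per visitor.
    function splitIntoTasks(parameters, chunkSize) {
      var tasks = [];
      for (var i = 0; i < parameters.length; i += chunkSize) {
        tasks.push(parameters.slice(i, i + chunkSize));
      }
      return tasks;
    }

    // The "map" step, executed in a visitor's browser.
    function mapStep(f, chunk) {
      return chunk.map(f);
    }

    // The "reduce" step, executed on the server as results arrive.
    function reduceStep(partialResults, combine, initial) {
      return partialResults.reduce(combine, initial);
    }

    // Example: summing x*x over x = 0..999 as ten independent tasks of 100 values each.
    var params = [];
    for (var x = 0; x < 1000; x++) params.push(x);
    var partials = splitIntoTasks(params, 100).map(function (chunk) {
      return mapStep(function (v) { return v * v; }, chunk)
        .reduce(function (a, b) { return a + b; }, 0);
    });
    var total = reduceStep(partials, function (a, b) { return a + b; }, 0); // 332833500
    ```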

    Secondly, the computation of each value f(x) should be reasonably complex, preferably superlinear, say Ω(n^2) or worse. Otherwise, the overhead of sending the inputs (which is O(n)) would offset the benefits too much.

    Thirdly, the description of the function f should be reasonably compact, otherwise the overhead of transferring it to each visitor would be too costly. Note, however, that this issue depends somewhat on the kind of traffic being leeched upon: if a website has a small number of dedicated users, each user would only need to download the function definition once and refer to the cached version on his subsequent visits to the site.

    Finally, the function f, as well as its inputs and outputs, must be public. This restriction severely limits the use of the approach. For example, although numerous data analysis tasks could satisfy the above conditions, in many practical contexts the data is private and it is thus not possible to openly distribute it to arbitrary visitors of an arbitrary website.

    Besides the theoretical difficulties, there are some technical issues that need to be solved before the whole thing can work, such as security (you can't trust the results!), implementation (linear algebra libraries for JavaScript or Flash, please?), ethical concerns, and more.
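    On the "you can't trust the results" part, the usual remedy in volunteer computing is redundancy: hand the same work unit to several unrelated visitors and accept a result only once enough of them agree. A toy sketch of that idea, with invented names and data, might look as follows.

    ```javascript
    // Accept a work unit's result only after minVotes identical submissions.
    function acceptByMajority(submissions, minVotes) {
      var counts = {};
      for (var i = 0; i < submissions.length; i++) {
        var key = submissions[i];
        counts[key] = (counts[key] || 0) + 1;
        if (counts[key] >= minVotes) {
          return key; // enough independent visitors agreed on this value
        }
      }
      return null; // no consensus yet: keep handing the unit out to new visitors
    }

    // E.g. require three matching answers before trusting a pixel chunk.
    var accepted = acceptByMajority(["17,3,255", "17,4,255", "17,3,255", "17,3,255"], 3);
    ```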

    Nonetheless, the whole thing still looks rather promising to me, and is at least as worthy of academic and industrial attention as all of the overhyped Grid, P2P and SOA technologies around.

    PS: By the way, I find the topic well suited for a proper student project or thesis.



  • Posted by Konstantin 03.09.2008 4 Comments

    Geeks all over the world have just gained a new hot topic to flame or panic about. Web designers now have to verify their applications and websites against yet another browser. Developers have learned about a new open-source embeddable JavaScript engine. All the normal people will have the choice of yet another, hopefully well-made, browser to work with. Thus groweth the Church of Google.

    What is it to Google? Apart from the obvious increase in the user base and the potential to advertise (pardon, suggest) websites right in the address bar, wide distribution of Chrome should somewhat increase the amount of user-generated traffic flowing into Google servers. Indeed, the default configuration of Chrome, equipped with that marvelous auto-suggestion feature, seems to start sending stuff out as soon as you type your first character into the address line.

    Although the term "privacy violation" is the first one to pop out, let's keep that aside for now. The really interesting question concerns the nature of this constant influx of half-typed URLs and "search terms", annotated with timestamps and host IPs. Firstly, it most certainly contains additional value over whatever is already indexed in the web: global events, current social trends, new websites, ideas and random creative thoughts leave a mark on your address line. It therefore makes sense to look into this data and search for patterns. Secondly, the volume of this data stream is probably quite large whilst the noise is significant, hence it does not pay off to store it somewhere. And that's where we reach a nice observation: for many purposes you don't need to store this data.

    If the bandwidth of the stream is constantly high, you can afford to throw it away. If at any moment you need data, just turn on the sniffer ("put your bucket in the stream") and you'll collect megabytes of interesting stuff in a matter of seconds, if not less. A simplistic version of this kind of "stream analysis" looks as follows: you ask Google "what are people reading right now?", it listens to the stream for a second and responds with something meaningful; I believe much cooler things can be thought of. Anyway, the important point is that no "global" indexing or data collection is needed to perform this service, just a thumb on the pulse of the web.
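    As a toy illustration of the "bucket" idea (everything here is made up, including the stream source), such one-off sampling could amount to counting whatever flows through during a short window and returning the most frequent items, storing nothing permanently.

    ```javascript
    // Listen to a stream for a short window and return the most frequent items.
    // Nothing is kept beyond this single window's counts; readNextItem is a
    // stand-in for whatever actually delivers the half-typed queries.
    function sampleStream(readNextItem, windowMs) {
      var counts = {};
      var deadline = Date.now() + windowMs;
      while (Date.now() < deadline) {
        var item = readNextItem();
        if (item === null) break; // stream dried up (only happens with the mock below)
        counts[item] = (counts[item] || 0) + 1;
      }
      return Object.keys(counts).sort(function (a, b) {
        return counts[b] - counts[a];
      }).slice(0, 10);
    }

    // A mock stream, just to make the example runnable.
    var mock = ["olympics", "new browser", "olympics", "fractal", "olympics"];
    var pos = 0;
    var topItems = sampleStream(function () {
      return pos < mock.length ? mock[pos++] : null;
    }, 1000); // -> ["olympics", "new browser", "fractal"]
    ```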


  • Posted by Konstantin 01.09.2008 10 Comments

    There is no easy way to explain why a person who has spent 11 years at school, 4 years on a bachelor's degree and 3 years on a master's would then decide to enter PhD studies, risking yet another 4 years of his life, if not more. Unlike the case with the bachelor's or master's, there does not seem to be any social pressure encouraging one to get a PhD. Neither is there much economic motivation, because getting a PhD does not guarantee higher pay. Finally, although the PhD degree is indeed a damn cool thing to have, it is doubtful whether people possessing it enjoy life more than all the rest do.

    Nonetheless, a PhD is a required attribute of anyone aspiring to an academic career, and it is regarded as a qualitatively higher step on the education ladder. So the question is: what makes this qualitative difference, and which issues should one focus on most during those 4 years? Here's my initial guess:

    Your doctoral studies did not go in vain if:

    1. You can generate a publishable paper in a month, a really good paper in 2-3 months,
    2. You can write a convincing grant/project proposal and you know when and where to submit it,
    3. You have realistic but useful ideas for future work and research,
    4. You know how to supervise/direct/collaborate with others and be actually useful at it,
    5. You know the most important people in your field, and they know you,
    6. You are good at lecturing or other kinds of oral presentation,
    7. You know what to do after you defend.

    I'm sure there's something missing, but this list already contains enough complicated aims. Hopefully, this blog will help me with points 3, 5 and 7 of the above agenda, as well as keep reminding me that I don't want to lose these 4 years for nothing. It's day 1 today. 1460 days left. Yay!
