Four Years Remaining

Strong Typing

Posted by Konstantin 12.03.2014 No Comments
Whenever you write a program, you want this program to behave correctly and do what you want it to do. Thus, programming always goes together with the mental act of proving to yourself (and sometimes to other people as well), that the code you write is correct. Most often this "proof" is implicit, dissolved in the way you write your code and comment it. In fact, in my personal opinion, "good code" is exactly the one, where a human-reviewer is able to verify its correctness without too much effort.

It is natural to use computers to help us verify correctness. Everyone who has ever programmed in a strictly-typed programming language, such as Java or Haskell, is familiar with the need to specify types of variables and functions and follow strict rules of type-safety. But of course, ensuring type-safety is just the compiler's way to help you ensure some basic claims about the program, such as "this variable will always contain an integer" or "this function will always be invoked with exactly three parameters".

This is very convenient, yet can be somewhat limiting or annoying at times. After all, type-safety requires you to write code that "can be type-checked". Although very often this is expected of "good code" anyway, there are situations where you would like some more flexibility. For this reason some languages impose no type-safety rules at all (e.g. Python or Javascript), and some languages let you disable type-checking for parts of code.

Rather than disabling the type checker, another principled way to allow more flexibility is to make the type-checker smarter. This is the promise of dependent types. In principle, a language, which supports dependent types, would let you make much more detailed statements about your program and have your program automatically checked for correctness with respect to those statements. Rather than being limited to primitive claims like "this variable is an integer", the use of dependent types enables you to assert things like "this is a sorted list", or "this is an odd integer", and so on up to nearly arbitrary level of detail, in the form of a type annotation. At least that much I found out during a course at the recent winter school.

The course was based on the Agda programming language, and the first thing I decided to try implementing in Agda is a well-typed version of the following simple function:
```
f t = if t then true else 0
```
It might look like something trivial, yet most traditional strictly typed languages would not let you write this. After all, a function has to have a return type, and it has to be either a Boolean or an Integer, but not both. In this case, however, we expect our function to have a parameter-dependent type:
```
f : (t : Bool) → if t then Bool else Integer
```
Given that Agda is designed to support dependent types, how complicated could it be to implement such a simple function? It turns out, it takes a beginner more than an hour of thinking and a couple of consultations with the specialists in the field. The resulting code will include at least three different definitions of "if-then-else" statements and, I must admit, some aspects of it are still not completely clear to me.

IF-THEN-ELSE in Agda, including all the boilerplate code

This is the longest code I've ever had to write to specify a simple if-then-else statement. The point of this blog post is to share the amusement and suggest you to check out Agda if you are in the mood for some puzzle-solving.

As for dependent types, I guess those are not becoming mainstream any time soon.
Tags: Computer science, Fun, Procrastination, Programming, Theory, Unclear
Bitcoin-based Timestamping

Posted by Konstantin 20.01.2014 No Comments
Bitcoin is a cryptographic currency, that has gained a lot of hype in the last year. From a technical perspective, it is simply a distributed timestamping scheme, fully dedicated to establishing the order of monetary transactions, by creating a long block chain.

Adding a new block to the block chain requires extremely expensive distributed computations. Thus, in terms of the amount of energy, invested by the users worldwide into its creation, the block chain is, at the moment, probably the most expensive computer-generated file in human history. A monument to the raw "computation for the sake of computation". The Bitcoin network by now includes hundreds of thousands of users, most of whom keep a full copy of the block chain and contribute to its further growth.

Timestamping hash, published in a paper

All that means that including any information into the block chain can act as a solid timestamp, proving the existence of this information at a particular point in time. It is nearly impossible to fake or revoke. Even if the Bitcoin network would cease working, the block chain would probably be kept around at least as a curious artifact (as well as an object of interest for data miners) . The idea is equivalent to a popular practice of timestamping information by publishing it in a widely distributed newspaper. However, publishing in a popular newspaper may be costly, while getting transactions into the block chain is nearly free and accessible to anyone.

Consequently, it seems obvious that sooner or later the bitcoin block chain must begin to be used for timestamping things other than transactions. Because trusted timestamping is a big deal. Everyone in Estonia knows that.

Unfortunately, it seems that although the idea has been mentioned before, there do not seem to be any convenient services developed for that, apart from BTProof, which is somewhat too simplistic, given the potential importance of the task at hand. In an attempt to perhaps inspire someone to consider imlementing a more serious service of this kind, let me give a brief overview of the ways to get your data into the block chain.

Smuggling your data into the block chain

If only Bitcoin transactions were allowed to have textual descriptions assigned to them, the task would be trivial: any piece of information you want to timestamp could be simply mentioned in the description of a transaction. However, this functionality is not part of the Bitcoin protocol, so we have to use tricks. At least three different techniques are possible here.

1. Specifying your data as a destination address.

Each Bitcoin transaction includes a "destination address", which is a 34-character string in hex-like encoding. This address may be specified arbitrarily. Thus, by transferring any amount to an address, which itself is the hash of the information you need timestamped, you will have the fact of information's existence mentioned in the block chain for future generations to behold. This is the idea behind BTProof. There are several problems with this method. Most importantly, anything you transfer to a nonexistent address will get lost forever, with no one being able to claim it. This makes the process non-free, because you cannot have transactions of size 0. Moreover, very small transactions are unfavoured by the Bitcoin network and take a long time to get verified. Finally, leaving "unclaimed" transactions forever hanging in the block chain is somewhat indecent in the first place, isn't it?

The abovementioned drawbacks may be addressed using multi-signature transactions. Those are a special type of transactions, which allow the funds to be claimed by any one of several addresses. In this case one address can be used to encode the hash and another one — to reclaim the funds spent in the transaction back. This concept has been suggested as the way to carry arbitrary data on top of Bitcoin in the MasterCoin project.

2. Specifying your data as a destination private key.

Rather than converting the data you need timestamped into a (non-existent) address, you can turn it into a private key. You can then perform two transactions. The first one transfers funds to an address, corresponding to this private key, and next one uses the private key to withdraw the funds back to you. As there is a fixed mapping from your data to the private key to the address that the funds went through, you have just included a trace of your data into the block chain. This method is mentioned here. The drawback is the need for two transactions, and the overall complexity of the scheme.

3. Specifying your data in the script.

Finally, the last two places in a bitcoin transaction, which allow custom data, are the two "script" fields. Namely, the act of depositing funds to a transaction in Bitcoin is not as simple as providing the target address. Instead, it is a script, that, when executed, is supposed to check the right of the receiver to obtain the funds. Similarly, the act of withdrawing funds from Bitcoin is expressed by a script that proves the rights of the owner.

For example, a typical "deposition script" looks as follows:
```
OP_DUP 
OP_HASH160
OP_PUSHDATAx <target_address>
OP_EQUALVERIFY
OP_CHECKSIG
```
This script means that in order to withdraw the funds, the receiver must push on the stack a signature of the transaction, followed by his public key. The script then starts executing by first duplicating (OP_DUP) the top value on the stack, the public key. It then applies a hash function (OP_HASH160) to the top value on the stack (this converts the public key to an address). Then another value is pushed onto the stack (OP_PUSHDATAx). Next, two top values are popped and checked for equality (OP_EQUALVERIFY). This verifies, that the receiver's address matches <target_address>. Finally, the OP_CHECKSIG command pops another two values from the stack (those are the signature and the public key now, remember), and verifies the correctness of the signature.

The beauty of the system is that it lets you create various rules for claiming funds apart from simply owning a private key to an address. For example, it is possible to create transactions which require multiple parties to collude to withdraw them. Or you may require the receiver of the funds to solve a puzzle. Or you may even put the funds up for anyone to take freely, etc.

What is important for our purposes, however, is that the scripting language is rather flexible. In particular, it lets you add useless commands, such as "push this data onto stack, then drop the top value from stack, then continue as normal:"
```
OP_PUSHDATAx <any_data> 
OP_DROP 
OP_HASH160
OP_PUSHDATAx <target_address>
OP_EQUALVERIFY
OP_CHECKSIG
```
This logic could be included in either the "depositing" or the "receiving" script, letting you essentially provide arbitrary "notes" in transactions and thus do timestamp data in the most reasonable way. This lets you timestamp using a single transaction, which recurrently transfers any amount from an address to itself.

Unfortunately, it seems that the freedom of scripting has been severely limited in the recent versions of Bitcoin software. Namely, transactions with any nonstandard scripts are simply declined from inclusion into a block (at least, none of my attempts to try this out succeeded). Even the fate of multi-signature transactions, mentioned in point 1 (which are just a particular kind of a script) is not completely clear. In any case, the Bitcoin specification will most probably evolve to eventually allow storing dedicated data packets in the block chain without the need to resort to hacks. And if not Bitcoin, perhaps such functionality will become part of one of the competing similar cryptocurrencies.

It seems unreasonable to run a huge distributed timestamping algorithm, and not let people use it for general-purpose timestamping, doesn't it?

Update: Given the recent problems related to the transaction malleability aspect of the protocol, it is easy to predict that the freedom of scripting will probably be limited even further in the future. However, eventually support must be added for storing a custom nonce into the signed transaction (as it seems to be the only reasonable way to make transactions uniquely identifiable despite malleability of their hash). That nonce would be a perfect candidate for general-purpose timestamping purposes.
Tags: Computer science, Cryptography, Distributed computing, Project, Timestamping
Atmospheric Semantics of Scientific Presentations

Posted by Konstantin 01.04.2013 1 Comment

CrapCon is a fun evening session, which traditionally takes place at the annual Estonian Winter School on Computer Science (EWSCS), where participants of EWSCS are welcome to make short random nonsensical talks, prepared earlier on the same day during the lectures.

The following is my presentation from this year's CrapCon. The first part of it, as is customary for most CrapCon talks, makes fun of the topics being taught during the school, hence it might not make much sense outside the context of the event. The second part is fairly general (and totally serious!), though.

If you find that amusing, certainly check out the talk by Taivo Lints, which is, I think, among the best examples of computer-science-related abstract humor.

Tags: Computer science, Fun, Video
On Gödel's Theorems and the Universe

Posted by Konstantin 27.01.2013 7 Comments

Most results that are published in mathematics and natural sciences are highly technical and obscure to anyone outside the particular area of research. Consequently, most scientific publications and discoveries do not get any media attention at all. This is a problem both for the public (which would benefit from knowing a bit more about our world), the science in general (which would benefit from wider dissemination) and the scientists themselves (which would certainly appreciate if their work reached more than a couple of other people). The issue is usually addressed by the researchers trying to popularize their findings — provide simple interpretations, which could be grasped by the layman without the need to understand the foundations. Unfortunately, if the researcher was lucky to grab enough media attention, chances are high the process of "popularization" will get completely out of hands, and the message received by the public will have nothing to do with the original finding.

There are some theorems in pure mathematics, which happened to suffer especially seriously from this phenomenon. My two most favourite examples are the Heizenberg's uncertainty principle and Gödel's incompleteness theorems. Every once in a while I stumble on pieces, written by people who have zero knowledge of basic physics (not even mentioning quantum mechanics or Fourier theory), but claim to have clearly understood the meaning and the deep implications of the uncertainty principle upon our daily lives. Even more popular are Gödel's theorems. "Science proves that some things in this world can never be understood or predicted by us, mere humans!!!" is a typical take on Gödel by the press. Oh, and it is loved by religion enthusiasts!

This post is my desperate attempt to bring some sanity back into this world by providing a (yet another, but maybe slightly different) popular explanation for Gödel's ideas.

Gödel's theorems explained (with enough context)

Kurt Gödel

In our daily lives we primarily use symbols to communicate and describe the world. Those may be formulas in mathematics or simply sentences in the English language. Symbol-based systems can be used in different ways. From the perspective of arts and humanities, one is welcome to construct arbitrary sequences of words and sentences, as long as the result is "beautiful", "inspiring" or "seems smart and convincing". From a more technically-minded point of view, however, one is only welcome to operate with "true and logical" statements. However, it is not at all obvious which statements should be universally considered "true and logical". There are two ways the problem can be resolved.

Firstly, we can determine the truth by consulting "reality". For example, the phrase "All people are mortal, hence Socrates is mortal" is true because, indeed (avoiding a lecture-worth of formalities or so), in all possible universes where there are people and they are mortal, Socrates will always also be mortal.

Secondly, we can simply postulate a set of statements, which we shall universally regard as "the true ones". To distinguish those postulated truths from the "actual truths", mentioned in the paragraph above, let us call them "logically correct statements". That is, we can hypothetically write down an (infinite) list of all the "logically correct" sentences in some hypothetical "Book of All Logically Correct Statements". From that time on, every claim not in the book will be deemed "illogical" and hence "false" (the question of whether it is false in reality would, of course, depend on the goodness and relevance of the book we created).

This is really convenient, because now in order to check the correctness of the claim about Socrates we do not have to actually visit all the possible universes. It suffices to look it up in our book! If the book is any good, it will provide good answers, so the concept of "logical correctness" may be close or even equivalent to "actual truth". And in principle, there must exist "The Actual Book of Truth", which would list precisely the "actually true" statements.

Unfortunately, infinite books are somewhat inconvenient to print and carry around. Luckily, we can try to represent an infinite book in a finite way using algorithms. We have all seen how a short algorithm is capable of producing an infinite amount of nonsense, haven't we? Thus, instead of looking for an (infinitely-sized) "book of truth", we should look for a (finitely-sized) "algorithm of truth", which will be either capable of generating the whole book, or checking for any given statement, whether it belongs to the book.

Frege's propositional calculus

For clarity and historical reasons, the algorithms for describing "books of truths" are usually developed and written down using a particular "programming language", which consists of "axioms" and "inference rules". Here is one of the simplest examples of how a "program" might look like.

This is where a natural question arises: even if we assume that we have access to "the actual book of truth", will it really be possible to compress it down to a finite algorithm? Isn't it too much to ask? It turns out it is. There is really no way to create a compact representation for any but the most trivial "books of truths".

Besides the "asking too much" intuition, there are at least two other simple reasons why this is impossible. Firstly, most of the readers here know that there exist problems, that are in principle uncomputable. Those are usually about statements, the correctness of which cannot be checked in finite time by a finite algorithm, with the most famous example being the halting problem, as well as everything related to uncomputable numbers.

Naturally, if we can't even use a finite algorithm to list "all truly halting programs", our hopes for getting a finite description for any more complicated "book of truths" is gone. But there is another funny reason. Once we have decided to define the book with all true statements and devise an algorithm for generating it, we must not forget to include into the book some statements about the book and the algorithm themselves. In particular, we must consider the following phrase: "This sentence will not be included in the book generated by our algorithm."

If our algorithm does include this sentence in the generated book, the claim turns out to be false. As a result, our precious book would have false statements in it: it would be inconsistent. On the other hand, if our algorithm produces a book which does not include the above sentence, the sentence turns out to be true and the book is incomplete. It is just the liar's paradox.

And so it is — we have to put up with the situation, where it is impossible to produce a finite representation for all true statements without stumbling into logical paradoxes. Or, alternatively, that there are mathematical statements, which cannot be checked or proven within a finite time.

This describes what is known as "Gödel's first incompleteness theorem" (namely, the claim that in most cases our algorithm is only capable of producing either an incomplete or an inconsistent "book"). A curious corollary may be derived from it, which is known as the "Gödel's second incompleteness theorem". The theorem says that if our "book-generating algorithm" happens to include into it exactly the phrase "This book is consistent", it must necessarily be a false claim, and the algorithm must actually be inconsistent. This is often interpreted that no "honest" theory is capable of claiming it's own correctness (there are caveats, though).

Gödel's theorems have several interesting implications for the foundations of mathematics. None the less, those are still theorems about the fundamental impossibility of using a finite algorithm for generating a book of logically correct statements without stumbling into logical paradoxes and infinite loops. Nothing more and nothing less. They have nothing to do with the experimental sciences: note how we had to cut away all connections to reality in the very first paragraphs of the explanation. And thus, Gödel's theorems are not related to the causes of "human inability to grasp the universe", "failure or success of startup business plans", or anything else of that kind.

Tags: Computer science, Definition, Logic, Mathematics, Philosophy, Theory, Weltschmertz
A Brief History of Intelligence

Posted by Konstantin 16.03.2011 No Comments
This is a (slightly modified) write-up of a part of a lecture I did for the "Welcome to Computer Science" course last semester.

Part I. Humans Discover the World

How it all started

Millions of years ago humans were basically monkeys. Our ape-like ancestors enjoyed a happy existence in the great wide world of Nature. Their life was simple, their minds were devoid of thought, and their actions were guided by simple cause-and-effect mechanisms. Although for a modern human it might seem somewhat counterintuitive or even hard to imagine, the ability to think or understand is, in fact, completely unnecessary to succesfully survive in this world. As long as a living creature knows how to properly react to the various external stimuli, it will do just fine. When an ape sees something scary — ape runs. When an ape seems something tasty — ape eats. When an ape sees another ape — ape acts according to whatever action pattern is wired into its neural circuits. Does the ape understand what is happening and make choices? Not really — it is all about rather basic cause and effect.

As time went by, evolution blessed our ape-like ancestors with some extra brain tissue. Now they could develop more complicated reaction mechanisms and, in particular, they started to remember things. Note that, in my terminology here, "remembering" is not the same as "learning". Learning is about simple adaptation. For example, an animal can learn that a particular muscle movement is necessary to get up on a tree — a couple of failed attempts will rewire its neural circuit to perform this action as necessary. One does not even need a brain to learn — the concentration of proteins in a bacteria will adjust to fit the particular environment, essentially demonstrating a learning ability. "Remembering", however, requires some analytical processing.

Remembering

It is easy to learn to flex a particular finger muscle whenever you feel like climbing up a tree, but it is a totally different matter to actually note that you happen to regularly perform this action somewhy. It is even more complicated to see that the same finger action is performed both when you climb a tree and when you pick a banana. Recognizing such analogies between actions and events is not plain "learning" any more. It is not about fine-tuning the way a particular cause-and-effect reflex is working. It is a kind of information processing. As such, it requires a certain amount of "memory" to store information about previous actions and some pattern analysis capability to be able to detect similarities, analogies and patterns in the stored observations. Those are precisely the functions that were taken over by the "extra" brain tissue.

So, the apes started "remembering", noticing analogies and making generalization. Once the concept of "grabbing" is recognized as a recurring pattern, the idea of grabbing a stone instead of a tree branch is not far away. Further development of the brain lead to better "remembering" capabilities, more and more patterns discovered in the surrounding world, which eventually lead to the birth of symbolic processing in our brains.

Symbols

What is "grabbing"? It is an abstract notion, a recurring pattern, recognized by one of our brain circuits. The fact that we have this particular circuit allows us to recognize further occurrences of "grabbing" and generalize this idea in numerous ways. Hence, "grabbing" is just a symbol, a neural entity that helps our brains to describe a particular regularity in our lives.

As time went by, prehistoric humans became aware (or, let me say "became conscious") of more and more patterns, and developed more symbols. More symbols meant better awareness of the surrounding world and its capabilities (hence, the development of tools), more elaborate communication abilities (hence, the birth of language), and, recursively, better analytic abilities (because using symbols, you can search for patterns among patterns).

Symbols are immensely useful. Symbols are our way of being aware of the world, our way of controlling this world, our way of living in this world. The best thing about them is that they are easily spread. It may have taken centuries of human analytical power to note how the Sun moves along the sky, and how a shadow can be used to track time. But once this pattern has been discovered, it can be recorded and used infinitely. We are then free to go searching for other new exciting patterns. They are right in front of us, we just need to look hard. This makes up an awesome game for the humankind to play — find patterns, get rewards by gaining more control of the world, achieve better life quality, feel good, everyone wins! Not surprisingly, humans have been actively playing this game since the beginning of time. This game is what defines humankind, this is what drives its modern existence.

Science

Galelio's experiment

"All things fall down" — here's an apparently obvious pattern, which is always here, ready to be verified. And yet it took humankind many years to discover even its most basic properties. It seems that the europeans, at least, did not care much about this essential phenomenon until the XVIIth century. Only after going through millenia self-deception, followed by centuries of extensive aggression, devastating epidemics, and wild travels, the europeans found the time to sit down and just look around. This is when Galileo found out that, oh gosh, stuff falls down. Moreover, it does so with the same velocity independently of its size. In order to illustrate this astonishing fact he had to climb on to the tower of Pisa, throw steel balls down and measure the fall time using his own heartbeat.

In fact, the late Renaissance was most probably the time when europeans finally became aware of the game of science (after all, this is also a pattern that had to be discovered). People opened their eyes and started looking around. They understood that there are patterns waiting to be discovered. You couldn't see them well, and you had to look hard. Naturally, and somewhat ironically, the sky was the place they looked towards the most.

Patterns in the Sky

Tycho Brahe

Tycho Brahe, a contemporary of Galileo, was a rich nobleman. As many other rich noblemen of his time, he was excited about the sky and spent his nights as an astronomer. He truly believed there are patterns in planetary motions, and even though he could not see them immediately, he carefully recorded daily positions of the stars and planets, resulting in a vast dataset of observations. The basic human "remembering" ability was not enough anymore — the data had to be stored on an external medium. Tycho carefully guarded his measurements, hoping to discover as much as possible himself, but he was not the one to find the pattern of planetary motion. His assistant, Johannes Kepler got a hold of the data after Tycho's death. He studied the data and came up with three simple laws which described the movements of planets around the Sun. The laws were somewhat weird (the planets are claimed to sweep equal areas along an ellipse for no apparent reason), but they fit the data well.

Kepler's Laws

This story perfectly mirrors basic human pattern discovery. There, a human first observes the world, then uses his brain to remember the observations, analyze them, find a simple regularity, and come up with an abstract summarizing symbol. Here the exact same procedure is performed on a larger scale: a human performs observations, a paper medium is used to store them, another human's mind is used to perform the analysis, the result is a set of summarizing laws.

Isaac Newton

Still a hundred years later, Isaac Newton looked hard at both Galileo's and Kepler's models and managed to summarize them further into a single equation: the law of gravity. It is somewhat weird (everything is claimed to be attracted to everything for no apparent reason), but it fits the data well. And the game is not over yet, three centuries later we are still looking hard to understand Gravity.

Where are we going

As we play the game, we gradually run out of the "obvious" patterns. Detecting new laws of nature and society becomes more and more complicated. Tycho Brahe had to delegate his "memory" capabilities to paper. In the 20th century, the advent of automation helped us to delegate not only "memory", but the observation process itself. Astronomers do not have to sit at their telescopes and manually write down stellar positions anymore — automated radar arrays keep a constant watch on the sky. Same is true of most other science disciplines to various extents. The last part of this puzzle which is not fully automated yet is the analysis part. Not for long...

Part II. Computers Discover the World

Manufactured life

Vacuum tube

The development of electricity was the main industrial highlight of the XIXth century. One particularly important invention of that century was an incredibly versatile electrical device called the vacuum tube. A lightbulb is a vacuum tube. A neon lamp is a vacuum tube. A CRT television set is a vacuum tube. But, all the fancy glowing stuff aside, the most important function of a vacuum tube turned out to be its ability to act as an electric current switcher. Essentially, it allowed to hardwire a very simple program:

if (wire1) then (output=wire2) else (output=wire3)

It turns out that by wiring thousands of such simple switches together, it is possible to implement arbitrary algorithms. Those algorithms can take input signals, perform nontrivial transformations of those signals, and produce appropriate outputs. But the ability to process inputs and produce nontrivial reactions is, in fact, the key factor distinguishing the living beings from lifeless matter. Hence, religious, spiritual, philosophical and biological aspects aside, the invention of electronic computing was the first step towards manufacturing life.

Of course, the first computers were not at all like our fellow living beings. They could not see or hear, nor walk or talk. They could only communicate via signals on electrical wires. They could not learn — there was no mechanisms to automaticaly rewire the switches in response to outside stimuli. Neither could they recognize and "remember" patterns in their inputs. In general, their hardwired algorithms seemed somewhat too simple and too predictable in comparison to living organisms.

Transistors

But development went on with an astonishing pace. The 1940's gave us the most important invention of the XXth century: the transistor. A transistor provides the same switching logic as a vacuum tube, but is tiny and power-efficient. Computers with millions and billions of transistors became gradually available. Memory technologies caught up: bytes grew into kilobytes, megabytes, gigabytes and terabytes (expect to see a cheap petabyte drive at your local computer store in less than 5 years). The advent of networking and the Internet, multicore and multiprocessor technologies followed closely. Nowadays the potential for creating complex, "nontrivial" lifelike behaviour is not limited so much by the hardware capabilities. The only problem left to solve is wiring the right program.

Reasoning

The desire to manufacture "intelligence" surfaced early on in the history of computing. A device that can be programmed to compute, must be programmable to "think" too. This was the driving slogan for computer science research in most of the 1950-1980s. The main idea was that "thinking", a capability defining human intellectual superiority over fellow mammals, was mainly related to logical reasoning.

"Socrates is a man. All men are mortal. => Hence, Socrates is mortal."

As soon as we teach computers to perform such logical inferences, they will become capable of "thinking". Many years of research have been put in to this area and it was not in vain. By now, computers are indeed quite successful at performing logical inference, playing games and searching for solutions of complex discrete problems. But the catch is, this "thinking" does not feel like being proper "intelligence". It is still just a dumb preprogrammed cause-and-effect thing.

The Turing Test

Alan Turing

A particular definition of "thinking" was provided by Alan Turing in his Turing test: let us define intelligence as a capability of imitating a human in a conversation, so that it would be indistinguishable from a real human. This is a hard goal to pursue. It obviously cannot be achieved by a bare logical inference engine. In order to imitate a human, computer has to know what a human knows, and that is a whole lot of knowledge. So, perhaps intelligence could be achieved by formalizing most of human knowledge within a powerful logical inference engine? This has been done, and done fairly well, but sadly, this still does not resemble real intelligence.

Reasoning by Analogy

Optical character recognition

While hundreds of computer science researchers were struggling hard to create the ultimate knowledge-based logical system, real-life problems were waiting to be solved. No matter how good the computer became at solving abstract logical puzzles, he seemed helpless when faced with some of the most basic human tasks. Take, for example, character recognition. A single glimpse at a line of handwritten characters is enough for a human to recognize the letters (unless it is my handwriting, of course). But what logical inference should the computer do to perform it? Obviously, humans do not perform this task using reasoning, but rely on intuition instead. How can we "program" intuition?

The only practical way to automate character recognition turned out to be rather simple, if not to say dumb. Just store many examples of actual handwritten characters. Whenever you need to recognize a character, find the closest match in that database and voila! Of course, there are details which I sweep under the carpet, but the essence is here: recognition of characters can only be done by "training" on a dataset of actual handwritten characters. The key part of this "training" lies, in turn, in recognizing (or defining) the analogies among letters. Thus, the "large" task of recognizing characters is reduced to a somewhat "smaller" task of finding out which letters are similar, and what features make them similar. But this is pattern recognition, not unlike the rudimentary "remembering" ability of the early human ancestors.

The Meaning of Life

Please, observe and find the regularity in the following list:
- An ape observes its actions, recognizes regularities, and learns to purposefully grab things.
- Galileo observes falling bodies, recognizes regularities, and leans to predict the falling behaviour.
- Tycho Brahe observes stars, Johannes Keper recognizes regularities, and learns to predict planetary motion.
- Isaac Newton observes various models, recognizes regularities, and develops a general model of gravity.
- Computer observes handwritten characters, recognizes regularities, and learns to recognize characters.
- Computer observes your mailbox, recognizes regularities, and learns to filter spam.
- Computer observes natural language, recognizes regularities, and learns to translate.
- Computer observes biological data, recognizes regularities, and discovers novel biology.
Unexpectedly for us, we have stumbled upon a phenomemon, which, when implemented correctly, really "feels" like true intelligence. Hence, intelligence is not about logical inference nor extensive knowledge. It is all about the skill of recognizing regularities and patterns. Humans have evolved from preprogrammed cause-and-effect reflexes through simple "remembering" all the way towards fairly sophisticated pattern analysis. Computers now are following a similar path and are gradually joining us in The Game. There is still a long way to go, but we have a clear direction: The Intelligence, achieving which means basically "winning" The Game. If anything at all, this is the purpose of our existence - discovering all the regularities in the surrounding world for the sake of total domination of Nature. And we shall use the best intelligence we can craft to achieve it (unless we all die prematurely, of course, which would be sad, but someday some other species would appear to take a shot at the game).

Epilogue. Strong AI

There is a curious concept in the philosophical realms of computer science — "The Strong AI Hypothesis". It relates to the distinction between manufacturing "true consciousness" (so-called "strong AI") and creating "only a simulation of consciousness" (the "weak AI"). Although it is impossible to distinguish the two experimentally, there seems to be an emotional urge to make the distinction. This usually manifests in argumentation of the following kind: "System X is not true artificial intelligence, because it is a preprogrammed algorithm; Humans will never create true AI, because, unlike us, a preprogrammed algorithm will hever have free will; etc."

Despite the seemingly unscientific nature of the issue, there is a way to look at it rationally. It is probably true that we shall never admit "true intelligence" nor "consciousness" to anything which acts according to an algorithm which is, in some sense, predictable or understandable by us. On the other hand, every complex system that we ever create, has to be made according to clearly understandable blueprints. The proper way of phrasing the "Strong AI" question is therefore the following: is it possible to create a system, which is built according to "simple" blueprints, and yet the behaviour of which is beyond our comprehension.

Cellular automaton

The answer to this question is not immediately clear, but my personal opinion is that it is a strong "yes". There are at least three kinds of approaches known nowadays, which provide a means for us to create something "smarter" than us. Firstly, using everything fractal, cellular, and generally chaotic is a simple recipe for producing uncomprehensibly complex behaviour from trivial rules. The problem with this approach, however, is that there is no good methodology for crafting any useful functions into a chaotic system.

The second candidate is anything neural — obviously the choice of Mother Nature. Neural networks have the same property of being able to demonstrate behaviour, which is not immediately obvious from the neurons or the connections among them. We know how to train some types of networks and we have living examples to be inspired by. Nonetheless, it is still hard to actually "program" neural networks. Hence, the third and the most promising approach — general machine learning and pattern recognition.

The idea of a pattern recognition-based system is to use a simple algorithm, accompanied by a huge dataset. Note that the distinction between the "algorithm" and the "dataset" here draws a clear boundary between two parts of a single system. The "algorithm" is the part which we need to understand and include in our "blueprints". The "data" is the remaining part, which we do not care knowing about. In fact, the data can be so vast, that no human is capable of grasping it completely. Collecting petabytes is no big deal these days any more. This is a perfect recipe for making a system which will be capable of demonstrating behaviour that would be unpredictable and "intelligent" enough for us to call it "free will".

Think of it...
Tags: Artificial intelligence, Computer science, Machine learning, Philosophy, Science
What are pseudorandom numbers and what are they used for?

Posted by Swen 25.08.2009 No Comments

There seems to be a wide split of opinions between theoreticians and practitioners. Namely, some theoreticians define pseudorandomness in terms of complexity theory. As a result, the corresponding constructions are inherently slow and thus rejected by practitioners. This short blog post tries to address some common misconceptions.

A random number generator by xkcd

Formally, a pseudorandom generator is a deterministic function $f:\mathcal{S}\to\mathcal{R}$ which takes a (relatively short) seed $s$ and converts it into an element of $\mathcal{R}$ . In most cases, we as computer scientists are interested in functions $f:\{0,1\}^n\to\{0,1\}^\ell$ . However, there are other reasonable domains, too. For instance, when performing Monte-Carlo integration in the region $[0,1]$ , we would be interested in the functions of type $f:\{0,1\}^n\to[0,1]^\ell$ .

It is important to note that any function of type $f:\mathcal{S}\to\mathcal{R}$ is a pseudorandom generator. However, not all of those functions are equally good for all purposes. There are two main properties that are used to discriminate between various pseudorandom functions. First, we can talk about the efficiency of a pseudorandom generator, i.e., how fast can we compute $f(s)$ . Second, we can talk about the statistical properties of the output $f(s)$ .

Before going into the details, let us establish why anyone should use pseudorandom generators at all. There is an interesting controversy in the computer science, as well in statistics. Many algorithms assume that it is possible to use uniformly distributed numbers. For example, Monte-Carlo integration is based on the law of large numbers:

$\Pr[\lim_n\frac{1}{N}\sum_{i=1}^N g(x_i)=\int_0^1g(x)dx] = 1$

whenever $x_1,\ldots,x_N$ are taken independently and uniformly from the range $[0,1]$ . Similarly, one can provide theoretical bounds for the randomized version of quicksort, provided that we can draw elements uniformly from a set $\{1,\ldots,K\}$ . However, computers are mostly made of deterministic parts and it turns out to be really difficult to automatically collect uniformly distributed bits. A design of a device that would solve this problem is far from trivial.

The first official solution to this problem was posed in 1927 when a student of Karl Pearson published a table of random numbers. Later such tables were built by the RAND Corporation. That is, the function $f$ was explicitly specified through a big table. Of course, such a table is useful only if it can be used as a "reliable" source of random numbers. In particular, the value of

$\frac{1}{N}\sum_{i=1}^N g(x_i)$

should be as close to $\int_0^1 g(x)dx$ as possible. Since there are infinite number of functions, we cannot actually check this condition. Instead, statisticians performed a series of tests on the table to verify that the sequence $x_1,x_2,\ldots$ looks as close to random as possible. If we extend this concept properly, we get the modern quantification of pseudorandomness.

Formally, a function $f:\mathcal{S}\to\mathcal{R}$ is a $(t,\varepsilon)$ -secure pseudorandom generator if for any $t$ -time algorithm $A$ :

$|\Pr [r\gets\mathcal{R}: A(r)=1]-\Pr[s\gets\mathcal{S}: A(f(s))=1]|\leq \varepsilon$

where the probabilities are taken over the uniform choices of $s$ and $r$ . In more intuitive terms, if you replace the randomness $r$ with $f(s)$ and your algorithm runs in time $t$ , then the switch generates no easily observable discrepancies.

As an illustrative example, consider a probabilistic combinatorial search algorithm $B$ that runs in time $t_1$ and outputs a solution that can be verified in time $t_2$ . In this case, the use of a $(t_1+t_2,\varepsilon)$ -secure pseudorandom generator within $B$ instead of pure randomness would decrease the probability of $B$ finding a solution by at most $\varepsilon$ . Indeed, otherwise we could construct a $(t_1+t_2)$ -time algorithm $C$ that outputs 1, if $B$ finds the solution and 0 otherwise, and use it to discern the pseudorandom generator from true randomness with success at least $\varepsilon$ . This would contradict the fact that we have a true $(t_1+t_2,\varepsilon)$ -secure pseudorandom generator. Similar argument can be proven also for the Monte-Carlo integration algorithm. Note that parameters $t$ and $\epsilon$ can be arbitrary real numbers. For the combinatorial search algorithm that takes 3 weeks CPU time, you might use a $(3\ \text{week}, 0.01)$ -secure pseudorandom generator.

There are essentially two reasonable complaints about the pseudorandom generators. First, since obtaining random bits is hard, we have not solved the problem completely, as we must still get the seed from somewhere. This is indeed a valid problem. For instance, the standard rand() function in C is known to fail the NIST statistical test and thus you might actually observe inconsistencies when using rand() directly or generating a seed for a more complex function with rand(). The latter does not mean that rand() is not a pseudorandom generator, rather that its quality might be low for certain applications. As a complete solution, you would like to get a function $f:\{1\}\to\mathcal{R}$ . For some tasks, such as Monte-Carlo integration of certain functions the corresponding solution is known (see multidimensional integration and sparse grids).

However, we still do not know how to do it in general, i.e., how to convert randomized algorithms into deterministic ones without significant losses in their performance. The corresponding area of research is known as derandomization. Second, one might argue that we could prove directly that by replacing $r$ with $f(s)$ , the performance of the algorithm does not deteriorate much. However, this is not an easy task and it is usually not a valid option for usual mortals.

To summarize, whenever you use a certain function $f$ to extend a random seed in your implementation you actually have to believe that it does not affect performance, i.e., $f$ is a pseudorandom generator with appropriate $(t,\varepsilon)$ parameter values. Whether you should use $f(x)=1$ , rand() in C, or something more elaborate, depends on the application.

Tags: Computer science, Derandomization, Statistics, Theory
Artificial Intuition

Posted by Konstantin 07.12.2008 7 Comments
Logic versus Statistics

Consider the two algorithms presented below.

Algorithm 1:
```
   If, for a given brick B,
      B.width(cm) * B.height(cm) * B.length(cm) > 1000
   Then the brick is heavy
```
Algorithm 2:
```
   If, for a given male person P,
      P.age(years) + P.weight(kg) * 4 - P.height(cm) * 2 > 100
   Then the person might have health problems
```
Note that the two algorithms are quite similar, at least from the point of view of the machine executing them: in both cases a decision is produced by performing some simple mathematical operations with a given object. The algorithms are also similar in their behaviour: both work well on average, but can make mistakes from time to time, when given an unusual person, or a rare hollow brick. However, there is one crucial difference between them from the point of view of a human: it is much easier to explain how the algorithm "works'' in the first case, than it is in the second one. And this is what in general distinguishes traditional "logical" algorithms from the machine learning-based approaches.

Of course, explanation is a subjective notion: something, which looks like a reasonable explanation to one person, might seem incomprehensible or insufficient to another one. In general, however, any proper explanation is always a logical reduction of a complex statement to a set of "axioms". An "axiom" here means any "obvious" fact that requires no further explanations. Depending on the subjective simplicity of the axioms and the obviousness of the logical steps, the explanation can be judged as being good or bad, easy or difficult, true or false.

Here is, for example, an explanation of Algorithm 1, that would hopefully satisfy most readers:
- The volume of a rectangular object can be computed as its width*height*length. (axiom, i.e. no further explanation needed)
- A brick is a rectangular object. (axiom)
- Thus, the volume of a brick can be computed as its width*height*length. (logical step)
- The mass of a brick is its volume times the density. (axiom)
- We consider the density of a brick to be at least 1g/cm³ and we consider a brick heavy if it weighs at least 1 kg. (axiom)
- Thus a brick is heavy if its mass > width*height*length > 1000. (logical step, end of explanation)
If you try to deduce a similar explanation for Algorithm 2 you will probably stumble into problems: there are no nice and easy "axioms" to start with, unless, at least, you are really deep into modeling body fat and can assign a meaning to the sum of a person's age with his weight. Things become even murkier if you consider a typical linear classification algorithm used in OCR systems for deciding whether a given picture contains the handwritten letter A or not. The algorithm in its most simple form might look as follows:
```
   If 
   Then there is a letter A on the picture,
```
where $a_{ij}$ are some real numbers that were obtained using an obscure statistical procedure from an obscure dataset of pre-labeled pictures. There is really no good way to explain why the values of $a_{ij}$ are what they are and how this algorithm gets the result, other than to present the dataset of pictures it was trained upon and state that "well, these are all pictures of the letter A, therefore our algorithm detects the letter A on pictures".

Note that, in a sense, such an "explanation" uses each picture of the letter A from the training set as an axiom. However, these axioms are not the kind of statements we used to justify Algorithm 1. The evidence they provide is way too weak for traditional logical inferences. Indeed, the fact that one known image has a letter A on it does not help much in proving that some other given image has an A too. Yet, as there are many of these "weak axioms", one statistical inference step can combine them into a well-performing algorithm. Notice how different this step is from the traditional logical steps, which typically derive each "strong" fact from a small number of other "strong" facts.

So to summarize again: there are two kinds of algorithms, logical and statistical.
The former ones are derived from a few strong facts and can be logically explained. Very often you can find the exact specifications of such algorithms in the internet. The latter ones are based on a large number of "weak facts" and rely on induction rather than logical (i.e. deductive) explanation. Their exact specification (e.g. the actual values for the parameters $a_{ij}$ used in that OCR classifier) does not make as much general sense as the description of classical algorithms. Instead, you would typically find general principles for constructing such algorithms.

The Human Aspect

What I find interesting, is that the mentioned dichotomy stems more from human psychology than mathematics. After all, the "small" logical steps as well as the "big" statistical inference steps are all just "steps" from the point of view of maths and computation. The crucial difference is mainly due to a human aspect. The logical algorithms, as well as all of the logical decisions we make in our life, is what we often call "reason" or "intelligence". We make decisions based on reasoning many times a day, and we could easily explain the small logical steps behind each of them. But even more often do we make the kind of reason-free decisions that we call "intuitive". Take, for example, visual perception and body control. We do these things by analogy with our previous experiences and cannot really explain the exact algorithm. Professional intuition is another nice example. Suppose a skilled project manager says "I have doubts about this project because I've seen a lot of similar projects and all of them failed". Can he justify his claim? No, no matter how many examples of "similar projects" he presents, none of them will be considered as reasonable evidence from the logical point of view. Is his decision valid? Most probably yes.

Thus, the aforementioned classes of logical (deductive) and statistical (inductive) algorithms seem to directly correspond to reason and intuition in the human mind. But why do we, as humans, tend to consider intuition to be inexplicable and thus make "less sense" than reason? Note that the formal difference between the two classes of algorithms is that in the former case the number of axioms is small and the logical steps are "easy". We are therefore capable of representing the separate axioms and the small logical steps in our minds somehow. However, when the number of axioms is virtually unlimited and the statistical step for combining them is way more complicated, we seem to have no convenient way of tracking them consciously due to our limited brain capacity. This is somewhat analogous to how we can "really understand" why 1+1=2, but will have difficulties trying to grasp the meaning of 121*121=14641. Instead, the corresponding inductive computations can be "wired in" to the lower, unconscious level of our neural tissue by learning patterns from experience.

The Consequences

There was a time at the dawn of computer science, when much hope was put in the area of Artificial Intelligence. There, people attempted to devise "intelligent" algorithms based on formal logic and proofs. The promise was that in a number of years the methods of formal logic would develop to such heights, that would allow computer algorithms to attain "human" level of intelligence. That is, they would be able to walk like humans, talk like humans and do a lot of other cool things that we humans do. Half a century has passed and this still didn't happen. Computer science has seen enormous progress, but we have not found an algorithm based on formal logic that could imitate intuitive human actions. I believe that we never shall, because devising an algorithm based on formal logic actually means understanding and explaining an action in terms of a fixed number of axioms.

Firstly, it is unreasonable to expect that we can precisely explain much of the real world, because, strictly speaking, there exist mathematical statements that can't in principle be explained. Secondly, and most importantly, this expectation contradicts the assumption that most "truly human" actions are intuitive, i.e. we are simply incapable of understanding them.

Now what follows is a strange conclusion. There is no doubt that sooner or later computers will get really good at performing "truly human" actions, the trend is clear already. But, contradictory to our expectations, the fact that we shall create a machine that acts like a human will not really bring us closer to understanding how "a human" really "works". In other words, we shall never create Artificial Intelligence. What we are creating now, whether we want it or not, is Artificial Intuition.
Tags: Artificial intelligence, Computer science, Machine learning, Philosophy, Statistics