In the recent weeks I had to give a few introductory lectures on supervised machine learning within Jaak's data mining course. I also provided the students some home assignments, and there it was especially tricky to find a simple, fun and discussable exercise, which might help to form some intuition regarding the inherent difficulties of learning from data such as overfitting, multiple testing, and the like. The following is what I came up with and I find it a rather successful creation. It is remarkable that of the 80 students participating in the course only 4 or so came up with the correct answer 🙂
After the lecture on supervised machine learning at the University of Mordor, the teacher, Sauron, gave the students a dataset of the following form:
1) ABACDAFXYZ -> good 2) CEGXEAELWMN -> good 3) NUWAB -> bad ... 20) QRELAZMNPCXA -> good
The inputs were seemingly arbitrary strings of latin characters: these were the terrible spells known only to Sauron and chosen by Sauron at random from his Great Book of Terrible Spells. The output was a binary variable, classifying each spell as good or bad. There were 20 observations in total in the dataset.
The students were given a task: on the basis of this data, come up with a generic spell classifier. Sauron promised to test their result on the next lecture as follows: he will pick another random terrible spell from the book and require each student to make a prediction. The student whose prediction is wrong will have to jump down from the tower as a punishment.
The first student, Aghargh, who happened to be slacking on the lecture, couldn't come up with any smart ways to solve the task, so he ended up just counting the proportion of "good" and "bad" spells in the dataset. Having observed that 16 of the 20 spells were "good", he decided to predict "good" when asked.
The second student, Bughrorc, chose a smarter way. Firstly, he converted each string into a vector of binary attributes — one attribute for each letter, having value "1" if that letter was present in the corresponding spell and "0" otherwise. He then split the data randomly into a training set (15 instances) and a test set (5 instances), and attempted training various classifiers using the MordorMiner software. After some experimenting he finally found five classifiers that could predict all of the training examples correctly. One of these also predicted all of the testing examples correctly. He decided to use this classifier on the forthcoming test.
On the day of testing, Sauron asked the students to classify the spell
YOZAZA. Aghargh, as he decided, provided the answer "good". Bughrorc's classifier analyzed the string and claimed that the answer should be "bad"
Which of the students, do you think, might have a better chance of not jumping from the tower? Why? Can you quantify your answer?