
A PCA Puzzle

Posted by Konstantin 16.01.2012
This post presumes you are familiar with PCA.

Consider the following experiment. First, we generate a random vector (a signal) made of a sequence of random values, each repeated five times. That is, something like

(0.5, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.2, 0.2, ... etc ... )

In R, we could generate it like this:
```
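num_steps = 50     # number of distinct random levels
step_length = 5    # each level is repeated this many times
initial_vector = c()
for (i in 1:num_steps) {
  # Append one uniform random value, repeated step_length times
  initial_vector = c(initial_vector, rep(runif(1), step_length))
}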
```
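The same signal can also be built without growing a vector inside a loop. A minimal vectorized sketch using base R's `rep()` with its `each` argument; the `set.seed()` call is an addition for reproducibility and is not part of the original experiment:

```r
num_steps = 50
step_length = 5
set.seed(1)  # added for reproducibility only

# Draw one random level per step, then repeat each level step_length times
levels = runif(num_steps)
initial_vector = rep(levels, each = step_length)

length(initial_vector)  # 250
```

Because `runif(50)` consumes the random stream in the same order as fifty calls to `runif(1)`, this produces the same vector as the loop above for a given seed.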
Here's a visual depiction of a possible resulting vector:

plot(initial_vector), zoomed in

Next, we create a dataset in which each row is a randomly (cyclically) shifted copy of this vector:
```
library(magic) # Necessary for the shift() function
dataset = c()
for (i in 1:1000) {
  shift_by = floor(runif(1)*num_steps*step_length) # Pick a random shift
  new_instance = shift(initial_vector, shift_by)   # Generate a shifted instance
  dataset = rbind(dataset, new_instance);          # Append to data
}
```
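If the magic package is not available, a cyclic shift is easy to write in base R. In the sketch below, `circular_shift()` is a hypothetical helper (not part of any package) that rotates a vector to the right by `k` positions; whether this matches the direction convention of `magic::shift()` is an assumption, but since the shifts are drawn at random, the direction does not matter for this experiment. The sketch also preallocates the matrix instead of growing it with `rbind()`:

```r
num_steps = 50
step_length = 5
n = num_steps * step_length
initial_vector = rep(runif(num_steps), each = step_length)

# Hypothetical helper: rotate x to the right by k positions, wrapping around
circular_shift = function(x, k) {
  n = length(x)
  k = k %% n
  if (k == 0) return(x)
  c(x[(n - k + 1):n], x[1:(n - k)])
}

# Preallocate one row per shifted copy of the signal
num_instances = 1000
dataset = matrix(0, nrow = num_instances, ncol = n)
for (i in 1:num_instances) {
  shift_by = sample(0:(n - 1), 1)  # uniform random shift in 0..n-1
  dataset[i, ] = circular_shift(initial_vector, shift_by)
}
```

For example, `circular_shift(1:5, 2)` returns `4 5 1 2 3`.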
Finally, let's apply Principal Component Analysis to this dataset:
```
pca = prcomp(dataset)
```
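Once you have a guess, you can inspect the result. A minimal sketch, assuming the dataset is built as above (regenerated here with a base-R cyclic shift so the snippet runs on its own): `prcomp()` stores the principal directions as the columns of `pca$rotation` and the component standard deviations in `pca$sdev`, so plotting the first few columns of `pca$rotation` shows what the top components look like.

```r
num_steps = 50
step_length = 5
n = num_steps * step_length
initial_vector = rep(runif(num_steps), each = step_length)

# 1000 randomly rotated copies (base-R stand-in for magic::shift)
dataset = t(sapply(1:1000, function(i) {
  k = sample(0:(n - 1), 1)
  if (k == 0) initial_vector
  else c(initial_vector[(n - k + 1):n], initial_vector[1:(n - k)])
}))

pca = prcomp(dataset)

# Share of total variance explained by each of the first five components
explained = pca$sdev^2 / sum(pca$sdev^2)
print(round(explained[1:5], 3))

# Each column of pca$rotation is one principal component (a length-n vector)
matplot(pca$rotation[, 1:3], type = "l", lty = 1,
        xlab = "position", ylab = "loading",
        main = "Top 3 principal components")
```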
Question: what do the top principal components look like? Guess first, then read below for the correct answer.

Tags: Data analysis, Fun, Puzzle
