I've recently stumbled upon a simple observation, which does not seem to be common knowledge and yet looks quite enlightening. Namely: polynomials provide an excellent way of modeling data in an order-agnostic manner.
The situations when you need to represent data in an order-agnostic way are actually fairly common. Suppose that you are given a traditional sample $(x_1, x_2, \dots, x_n)$ and are faced with the task of devising a generic function of the sample, which could only depend on the values in the sample, but not on the ordering of these values. Alternatively, you might need to prove that a given statistic is constant with respect to all permutations of the sample. Finally, you might simply wish to have a convenient mapping for your feature vectors that would lose the ordering information, but nothing else.
The most common way of addressing this problem is sorting the sample and working with the order statistics instead of the original values. This is not always convenient. Firstly, the mapping of the original sample to the corresponding vector of order statistics (i.e. the sorting operation) is quite complicated to express mathematically. Secondly, the condition that the vector of order statistics is always sorted is not very pleasant to work with. A much better idea is to represent your data as a polynomial of the form

$$p(z) = (z - x_1)(z - x_2)\cdots(z - x_n).$$
This will immediately provide you with a marvellous tool: two such polynomials $p$ and $q$ are equal if and only if their roots (counted with multiplicities) are equal, which means, in our case, that the samples $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ are equal up to a reordering.
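To see this property in action, here is a tiny sanity check in Python (my own illustration, using sympy, which the post itself does not rely on): expanding the product for two reorderings of the same sample yields literally the same polynomial, while a different sample yields a different one.

```python
from sympy import symbols, expand

z = symbols('z')

# Two orderings of the sample {1, 2, 3} give identical polynomials...
p = expand((z - 3) * (z - 1) * (z - 2))
q = expand((z - 2) * (z - 3) * (z - 1))
print(p)       # z**3 - 6*z**2 + 11*z - 6
print(p == q)  # True

# ...while a genuinely different sample gives a different polynomial.
r = expand((z - 2) * (z - 2) * (z - 1))
print(p == r)  # False
```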
Now in order to actually represent the polynomial we can either directly compute its coefficients,

$$p(z) = z^n + a_{n-1} z^{n-1} + \dots + a_1 z + a_0,$$

or calculate its values at any $n$ different points (e.g. at $z = 0, 1, \dots, n-1$). In either case we end up with the same amount of data as we had originally (i.e. $n$ values), but the new representation is order-agnostic and has, arguably, much nicer properties than the order statistics vector.
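As a quick illustration of the second variant (a minimal sketch in plain Python, my own choice of names and sample values), here is the "values at fixed points" representation: evaluate the product at $z = 0, 1, \dots, n-1$ and use the resulting $n$ numbers as the order-agnostic summary of the sample.

```python
def poly_values(sample):
    """Evaluate p(z) = (z - x_1)...(z - x_n) at z = 0, 1, ..., n-1."""
    n = len(sample)
    values = []
    for z in range(n):
        v = 1.0
        for x in sample:
            v *= (z - x)
        values.append(v)
    return values

# The representation depends only on the multiset of values, not on their order.
# (With real-valued data, compare with a tolerance rather than exact equality.)
print(poly_values([3.5, 1.5, 2.5]))  # [-13.125, -1.875, 0.375]
print(poly_values([2.5, 3.5, 1.5]))  # identical
```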
It is not without its own problems, of course. Firstly, it requires $O(n^2)$ time to compute in the straightforward way, which is worse than sorting. Secondly, not every polynomial will have real-valued roots, so not every vector of coefficients (or values) corresponds to an actual sample. And thirdly, the interpretation of the new "feature vector" is not necessarily intuitive or meaningful. Nonetheless, it is a trick worth considering.
This trick has been used for decades in coding theory and cryptography. To illustrate it, here is a small exercise for the reader. Assume that you have to keep track of a list of items which changes dynamically. There is no upper limit on the size of the list at intermediate stages, but you are promised that in the end it will never exceed 5 elements, and all elements fit into 32 bits. Design a storage scheme that uses only five 32-bit integers, supports additions and legitimate removals (i.e. removals of elements that are actually present), and allows you to restore the final state of the list.
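If you want a starting point, here is a rough sketch of the direction I would take (my own construction, not necessarily the intended solution, and deliberately generous with the word size: it stores five residues modulo a 61-bit Mersenne prime rather than five genuine 32-bit integers). The idea is to maintain the values of the product polynomial at five fixed points modulo a prime, multiplying by $(z_i - x)$ on addition and by its modular inverse on removal; reconstructing the actual elements from the final five values is left as the remaining part of the exercise.

```python
# Sketch: fingerprint the current list by the values of
#     P(z) = prod over items x of (z - x)   (mod p)
# at five fixed points. The list may grow arbitrarily large in between,
# yet the storage is always exactly five residues.

P = (1 << 61) - 1                            # the Mersenne prime 2**61 - 1
POINTS = [(1 << 32) + i for i in range(5)]   # points above the 32-bit item range,
                                             # so that (z - x) is never 0 mod P

class PolyStore:
    def __init__(self):
        self.values = [1] * len(POINTS)      # the empty product

    def add(self, x):
        self.values = [v * (z - x) % P for v, z in zip(self.values, POINTS)]

    def remove(self, x):
        # A legitimate removal: x was added earlier, so (z - x) is invertible.
        self.values = [v * pow(z - x, -1, P) % P for v, z in zip(self.values, POINTS)]

# The five stored values depend only on the final multiset of items:
a = PolyStore()
for x in [7, 42, 7, 1000, 3]:
    a.add(x)
a.remove(42)
a.remove(7)          # final state: {7, 1000, 3}

b = PolyStore()
for x in [3, 7, 1000]:
    b.add(x)

print(a.values == b.values)  # True
```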