Posted by Konstantin, 26.09.2012

    The issues related to scientific publishing, peer review and funding always make for popular discussion topics at conferences. In fact, the ongoing ECML PKDD 2012 had a whole workshop where researchers could discuss some of their otherwise interesting results that were hard or impossible to publish. The rejection reasons ranged from "a negative result" and "too small to be worthy of publication" to "lack of theoretical justification". The overall consensus seemed to be that this is indeed a problem, at least in the field of machine learning.

    The gist of the problem is the following. Machine learning relies a lot on computational experiments - empirically measuring the performance of methods in various contexts. The current mainstream methodology suggests that such experiments should primarily play a supportive role, either demonstrating a general theoretical statement or simply measuring the exact magnitude of an otherwise obvious benefit. This, unfortunately, leaves no room for "unexpected" experimental results, where the measured behaviour of a method either contradicts, or at least is not explained by, the available theory. Including such results in papers is very difficult, if not impossible, as they get criticised heavily by the reviewers. A reviewer expects all results in the paper to make sense. If anything is strange, it should either be explained or is better disregarded as a mistake. This is a natural part of the quality assurance process in science as a whole.

    Quite often, though, unexpected results in computational experiments do happen. They typically have little relevance to the main topic of the paper, and the burden of explaining them can be just too large for a researcher to take on. It is way easier to either drop the corresponding measurement or find a dataset that behaves "nicely". As a result, a lot of relevant information about such cases never sees the light of day. Thus, again and again, other researchers keep stumbling on similar unexpected results, and keep shelving them away.

    The problem would not exist if researchers cared to, say, write up such results as blog posts or tech reports on ArXiv, thus making the knowledge available. However, even formulating the unexpected discoveries in writing, let alone digging any deeper into them, is often regarded as a waste of time that won't earn the researcher much (if any) credit. Indeed, due to how scientific funding works nowadays, the only kind of credit that counts for a scientist is (co-)authoring a publication in a "good" journal or conference.

    I believe that with time, science will evolve to naturally accommodate such smaller pieces of research into its process (mini-, micro-, nano-publications?), providing the necessary incentives for researchers to expose, rather than shelve, their "unexpected" results. Meanwhile, though, other methods could be employed, and one of the ideas I find interesting is a concept I'd call "co-authorship licensing".

    Instead of ignoring a "small", "insignificant", or "unexpected" result, the researcher should consider publishing it as either a blog post or a short (yet properly written) tech report. He should then add an explicit requirement that the material may be referred to, cited, or used as-is in a "proper" publication (a journal or conference paper), on the condition that the author of the post is included in the author list of the paper.

    I feel there could be multiple benefits to such an approach. Firstly, it non-invasively addresses the drawbacks of the current science funding model. If being cited as a co-author is the only real credit that counts in the scientific world, why not use it explicitly, thus allowing researchers to effectively "trade" smaller pieces of research? Secondly, it enables a meaningful separation of work. "Doing research" and "publishing papers" are two very different types of activities. Some scientists, who are good at producing interesting experimental results or observations, can be completely helpless when it comes to getting their results published. On the other hand, those who are extremely talented at presenting and organizing results into high-quality papers may often prefer the actual experimentation to be done by someone else. Currently, the two activities have to be performed by the same person or, at best, by people working in the same lab. Otherwise, if the obtained results are not immediately "properly" published, there is no incentive for researchers to expose them. "Co-authorship licensing" could provide this incentive, acting as an open call for collaboration at the same time. (In fact, the somewhat ugly "licensing" term could be replaced with a friendlier equivalent, such as "open collaboration invitation". I do feel, though, that it is more important to stress that others are allowed to collaborate than that someone is invited to.)

    I'll conclude with three hypothetical examples.

    • A Bachelor's student makes a nice empirical study of System X in his thesis, but has no idea how to turn it into a journal article. He publishes his work on ArXiv under a "co-authorship license", where it is found by a PhD student working in this area, who was lacking exactly those results for his next paper.
    • A data miner at company X, as a side-effect of his work, ends up with a large-scale evaluation of learning algorithm Y on an interesting dataset. He puts those results up as a "co-authorship licensed" report. It is discovered by a researcher who is preparing a review paper about algorithm Y and is happy to include such results.
    • A bioinformatician discovers unexpected behaviour of algorithm X on a particular dataset. He writes up his findings as a blog post with a "co-authorship license", where they are discovered by a machine learning researcher who is capable of explaining the results, putting them in context, and turning them into an interesting paper.

    It seems to me that without "co-authorship licensing", the situations above would lead to nothing productive, as they do nowadays.

    Of course, all this will only make sense once many people give it a thought. Unfortunately, no one reads this blog 🙂
