Four Years Remaining

Data Engineering

Posted by Konstantin 07.09.2015 2 Comments

Research and engineering always go hand in hand. Their relationship is so tight that often people cease to see the difference. Indeed, both are development activities that bring along technical advances. Many researchers work in engineering companies, many engineers do what is essentially research, is there even a need to see the difference? Quite often there is. Let me highlight this difference.

Research (sometimes synonymous with "science") is, ideally, an experimental activity, which aims to explore some space of possibilities in order to hopefully come up with a certain "understanding" of what those possibilities are or how they should be used. Researchers primarily deliver written reports, summarizing their findings. There is absolutely no guarantee that those findings would be "interesting" or "useful" to any degree - indeed, a research project by definition starts off in a position of uncertainty. As a result, for practical, commercial or investment purposes, research is always a risky activity. It may take a long time, it will produce mostly text, and there is no guarantee that the results will be of use. The upside of this risk is, of course, the chance to stumble upon unique innovation, which may eventually bring considerable benefits. Later.

Engineering is a construction activity, where existing knowledge and tools are put together to deliver actual products. In order for this process to be successful, experimentation and uncertainty must be, ideally, brought down to a minimum. Thus, when compared to pure research, engineering projects are low risk endeavors, where the expected outputs are known in advance.

In simple terms - researchers answer questions whilst engineers build things, and by this definition those occupations are, obviously, very different. This difference is often apparent in material fields, such as construction or electronics, where the engineers can be distinguished from the researchers by the kind of tools they would mostly hold in their hands and the places they would spend their time the most. With computational fields this visual difference does not exist - both engineers and researchers spend most of their time behind a computer screen. The tools and approaches are still different, but you won't spot this unless you are in the field. I did not know the difference between Software Engineering and Computer Science until I had the chance to try both.

Things become even more confusing when data mining gets involved. The popularity of projects, which focus on building data-driven intelligent systems, is ever growing. As a result, more and more companies seem to be eager to embrace this magical world by hiring "data scientists" to do "data science" for them. The irony of the situation is that most of those companies are engineering businesses (e.g. software developer firms) and, as such, they would not (or at least should not) normally consider hiring anyone with the word "scientist" in the job title. Because scientists are not too famous for bringing stable income, engineers are.

The term "data science" is a vague one, but I think it is quite succinct in that it captures the exploratory aspect that is inherent in general-purpose data analysis, as well as the current state of the art in the field. Although there are some good high level tools for a wide range of "simple" machine learning tasks nowadays, as soon as you want to try something more exotic, you are often on your own, faced with uncertainty and the need to experiment before you can even try to "build" anything. For a typical engineering company such uncertainty is not a good thing to rely upon.

It does not mean that one cannot engineer data-driven systems nowadays, it means that in reality most of the companies, whether they know it or not, need a very particular kind of "data scientists". They need specialists with a good knowledge of simple reliable tools and are capable of applying them to various data formats. Those, who would perhaps avoid excessive experimentation in favor of simple, scalable, working solutions, even if those are somehow simplistic, suboptimal and do not employ custom-designed forty-layer convolutional networks with inception blocks, which require several months to debug and train. Those, who might not know much about concentration inequalities but would be fluent in data warehousing and streaming. There's actually a name for such people: "Data engineers".

There is nothing novel about the use of such terminology, yet I still regularly encounter way too much misunderstanding around this topic over and over again.

In particular, I would expect way more of the "Data engineering" curricula to appear at universities alongside the more or less standard "Data science" ones. The difference between the two would be slight, but notable - pretty much of the same order as the difference between our "Computer science" and "Software engineering" master's programmes, for example.

Tags: Computer science, Data science, Definition, Philosophy, Weltschmertz

M	T	W	T	F	S	S
« Jul				Jan »
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Data Engineering

Calendar

Recent Comments

Archives