Thursday, July 13, 2017

Cargo Cult Computer Science

I recently attended a presentation by a post-graduate student that I thought was a little bit funny. The presentation was about the experiments they had done on classifying classical music. At the end of the presentation, they proudly declared that algorithm X could identify the composer of a piece (one of Vivaldi, Bach or Mozart) from half a second of music.

The first query I raised was: how many notes are you going to get in half a second? Classical music tends to have a relatively relaxed pace (at least, compared to the music I enjoy), so I doubt there would be more than one or two notes in each sample. The response was that algorithm X is really good at classification, so half a second is enough.

The second query I raised was as follows: there was only one piece from each composer in each sample, and the Vivaldi was entirely strings, the Mozart was entirely piano, and the Bach was a mixture of instruments. How do they know that the algorithm didn't just learn to classify instruments?

This is similar to the famous example from the early days of neural networks, when perceptrons were being trained to distinguish photographs that contained tanks from those that did not. After some very good results at the start of the project, the network failed utterly on a second batch of images. The failure was traced to the fact that the photographs with tanks had been developed using a slightly different process from the one used for the photographs without tanks. That resulted in a slight difference in the overall brightness of the photographs. The neural network had simply learned to distinguish between lighter and darker photographs.
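To make the failure mode concrete, here is a minimal sketch of a "classifier" that looks only at mean brightness. Everything in it is an assumption made for illustration: the photographs are synthetic noise with no tank drawn in them at all, the brightness values (0.55, 0.45, 0.50) are invented, and the helper names are hypothetical. It is not a reconstruction of the original experiment.

```python
import random
from statistics import mean

def fake_photo(base_brightness, rng, n_pixels=256):
    # Synthetic pixel intensities clipped to [0, 1]; no tank is drawn at all.
    return [min(1.0, max(0.0, rng.gauss(base_brightness, 0.1)))
            for _ in range(n_pixels)]

def make_batch(n, tank_brightness, no_tank_brightness, rng):
    # Returns (photo, label) pairs; label 1 means "tank", 0 means "no tank".
    batch = []
    for i in range(n):
        label = i % 2
        base = tank_brightness if label else no_tank_brightness
        batch.append((fake_photo(base, rng), label))
    return batch

def fit_threshold(batch):
    # Place the threshold midway between the mean brightness of each class.
    tank = mean(mean(photo) for photo, label in batch if label == 1)
    other = mean(mean(photo) for photo, label in batch if label == 0)
    return (tank + other) / 2, tank > other

def accuracy(batch, threshold, tank_is_brighter):
    correct = 0
    for photo, label in batch:
        predicted = int((mean(photo) > threshold) == tank_is_brighter)
        correct += predicted == label
    return correct / len(batch)

rng = random.Random(0)
# Assumed values: in the first batch the tank photos come out slightly lighter.
first_batch = make_batch(200, 0.55, 0.45, rng)
# In the second batch both classes are developed the same way.
second_batch = make_batch(200, 0.50, 0.50, rng)

threshold, tank_is_brighter = fit_threshold(first_batch)
first_accuracy = accuracy(first_batch, threshold, tank_is_brighter)
second_accuracy = accuracy(second_batch, threshold, tank_is_brighter)
print(f"first batch: {first_accuracy:.2f}, second batch: {second_accuracy:.2f}")
```

Under these assumptions the brightness threshold should score close to perfectly on the first batch and close to chance on the second, even though it knows nothing about tanks. A trivial baseline like this is a cheap way to find out whether a confound alone could explain an impressive result.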

Now, the people who were looking for tanks did one thing right: they tested their algorithm with more data. The post-graduate student at the start of this story didn't do that. They just looked at the results they got, which fit their expectations, and stopped there. That meant that the conclusions they were drawing were not supported by the evidence.
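The same kind of check could have been applied to the music experiment. The sketch below is purely illustrative and rests on loud assumptions: each half-second snippet is reduced to a single made-up "timbre" number that depends only on the instrumentation of the piece, the values 0.2, 0.5 and 0.8 are invented, and the nearest-centroid classifier is a stand-in for algorithm X, not the student's actual method.

```python
import random
from statistics import mean

rng = random.Random(1)

def snippets(timbre, n):
    # One number per half-second snippet: a hypothetical feature that is
    # dominated by instrumentation rather than by the composer's style.
    return [rng.gauss(timbre, 0.05) for _ in range(n)]

# Assumed timbre values: strings ~0.2, mixed ~0.5, piano ~0.8.
original_pieces = [("Vivaldi", 0.2), ("Mozart", 0.8), ("Bach", 0.5)]
# Hypothetical new pieces by the same composers, with different instruments.
unseen_pieces = [("Vivaldi", 0.5), ("Mozart", 0.2), ("Bach", 0.8)]

def dataset(pieces, n=100):
    return [(x, composer) for composer, timbre in pieces
            for x in snippets(timbre, n)]

def fit_centroids(data):
    # Nearest-centroid classifier: remember the mean feature of each composer.
    labels = {y for _, y in data}
    return {label: mean(x for x, y in data if y == label) for label in labels}

def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

# Evaluation 1: shuffle snippets from the same three pieces into train/test.
data = dataset(original_pieces)
rng.shuffle(data)
train, test = data[:240], data[240:]
centroids = fit_centroids(train)
random_split_accuracy = accuracy(centroids, test)

# Evaluation 2: test on pieces the classifier has never heard.
unseen_piece_accuracy = accuracy(centroids, dataset(unseen_pieces))

print(f"same pieces: {random_split_accuracy:.2f}")
print(f"unseen pieces: {unseen_piece_accuracy:.2f}")
```

With these invented numbers the classifier looks excellent when tested on held-out snippets of the pieces it trained on, and falls apart on new pieces, because it has learned the instruments and nothing about the composers. Real audio features are far richer than one number, so the gap would be less stark in practice, but the design choice is the point: hold out whole pieces, not just snippets.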

The American physicist Richard Feynman famously spoke of "Cargo Cult Science". This is research that has the superficial form of science, but does not follow the rigor expected of the scientific method.

The scientific method is a process that has developed over many centuries. It demands a rigor and self-criticism intended to prevent erroneous conclusions from being drawn. It requires scientists to be completely honest with themselves, and to consider every objection to their research method and every factor that could be influencing their results. The scientific method is supposed to stop researchers from seeing only what they want to see, so that they see reality instead. The post-graduate student did not do this, and so their conclusions are not necessarily valid.

I've seen this in a lot of papers in computer science, and in more than a few post-graduate theses. Experiments are performed, results are gathered, and the authors confidently espouse conclusions about the value of their approach. Yet they never consider what else could explain those results. They never consider whether their data is biased in some way, or whether their method is flawed so that certain results are favoured over others.

I think there are several reasons this occurs. Students in computer science are not necessarily trained in the scientific method, so they can hardly be blamed for not following it. It is human nature that researchers want their approach, their new algorithm, to work, so they develop a kind of wilful blindness to the flaws in their experimental approach. Finally, and more insidiously, researchers are under immense pressure to publish: "publish or perish" applies in computer science just as much as any other field of academia. It is only through publishing papers that researchers gain employment, get promotion, and secure research funding. There is, then, a system set up to favour rapid and uncritical publication of supportive results and to suppress unfavourable results. We have created a system that favours Cargo Cult Computer Science.

Computer science, if it is to remain worthy of the appellation "science", must fully embrace the scientific method. This means being rigorous and being self-critical. The consequences of not doing so could be severe for everyone in the field.