I am reposting this after it was lost during the Blogger meltdown last week.
In a previous post, I mentioned some challenges in reviewing computational intelligence papers. In this post, I list what I consider to be the minimum requirements for computational intelligence papers. These are the things that I look for when I review a paper, and if they aren't there, I reject it.
1. Define all variables in equations
While most computational intelligence papers have mathematics in them, a disappointingly large number of them do not define the variables in their equations. Or, if they do, they define them some distance from the equation itself. If I am reading your paper, I want to understand the maths, and I can't do that if I can't quickly find the meaning of each variable.
2. Use more than one data set to test an algorithm
If your paper describes a new algorithm, or even an improvement on an existing algorithm, it must be tested on more than one data set. The No Free Lunch theorem tells us that for any algorithm there will be some data sets on which it performs well and others on which it performs poorly. While publication bias means that poor results often go unreported, I do expect results over more than one data set.
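As a minimal sketch of what I mean (assuming scikit-learn; the data sets and classifier below are stand-ins for your own), the same algorithm can be evaluated over several data sets in one loop:

```python
# Sketch: evaluate the same algorithm on several data sets, not just one.
# Assumes scikit-learn; swap in your own algorithm and data sets.
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

datasets = {
    "iris": load_iris(return_X_y=True),
    "wine": load_wine(return_X_y=True),
    "breast_cancer": load_breast_cancer(return_X_y=True),
}

for name, (X, y) in datasets.items():
    clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
    )
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```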
3. Investigate more than one set of parameters
For any algorithm, some sets of parameters will yield better performance than others. This means that if your study used only a single set of parameters, you cannot tell whether different parameters would have given better results. For new algorithms in particular, it is useful to show how sensitive the performance of the algorithm is to its parameters.
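A sensitivity sweep can be as simple as the following sketch (again assuming scikit-learn; the parameter being varied and the values tried are purely illustrative):

```python
# Sketch: vary one parameter and report how sensitive performance is to it.
# Assumes scikit-learn; the data set and parameter values are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

for hidden in (5, 10, 20, 50, 100):
    clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0),
    )
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"hidden units = {hidden:3d}: mean accuracy {scores.mean():.3f}")
```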
4. Clearly describe how the parameters were chosen
This is a particular problem with papers that describe applications of algorithms. Even if you only list the parameters that gave the best performance, you should still describe how you chose them. Choosing parameters by trial and error is fine, but you must say in the paper that that was how you chose them. Also, LIST THE PARAMETERS IN THE PAPER! Being able to replicate experiments is at the very heart of science, and if you don't say what your parameters were, your experiments can't be replicated.
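One way to make the selection procedure explicit and reportable is a grid search over cross-validation, as in this sketch (assuming scikit-learn; the grid itself is illustrative), which prints the chosen parameters so they can be listed in the paper:

```python
# Sketch: choose parameters by an explicit, reportable procedure (grid search
# over cross-validation), then print the chosen values so they can be listed
# in the paper. Assumes scikit-learn; the grid is illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "hidden_layer_sizes": [(10,), (20,), (50,)],
    "alpha": [1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0), param_grid, cv=5
)
search.fit(X, y)
print("Chosen parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
```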
5. Use multiple partitions of the data set
Neural network papers are particularly bad for this. Often, the algorithm is trained on one subset of the data (the training set) and then tested on the remaining data (the testing or validation set, depending on who is writing the paper). Sometimes the data is divided into subsets randomly, sometimes it is not. There are two problems with this approach. Firstly, it is entirely possible that the data has been partitioned in a way that happens to suit the algorithm, so that the reported performance is due to the partitioning of the data rather than to the algorithm itself. Secondly, if the training parameters of the algorithm are chosen to maximise performance over the testing set, that is equivalent to training on the testing set: the testing set is no longer independent.
A better way (and my preferred technique) is to use k-fold cross-validation together with an independent validation data set. In this technique, a validation data set is either randomly extracted from, or sourced separately from, the training data set. The training data is then divided into k subsets, and the algorithm is trained on k-1 of them; the remaining subset is used to evaluate the algorithm's performance. This is repeated k times, with a different subset held out as the evaluation set each time, which has the effect of training and testing over the entire training data set. The cross-validation results are used to select the parameters, and the final performance is assessed over the validation data set. Since the validation data set is used neither to select the parameters nor to train the algorithm, it remains statistically independent.
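Here is that procedure sketched in code, assuming scikit-learn; the split size, parameter grid, and classifier are placeholders:

```python
# Sketch of the procedure described above, assuming scikit-learn.
# 1) Hold out an independent validation set.
# 2) Select parameters by k-fold cross-validation on the remaining data.
# 3) Report final performance on the held-out validation set only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

# Independent validation set, never touched during parameter selection.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# k-fold cross-validation (k=5 here) on the training data to pick parameters.
search = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0),
    {"hidden_layer_sizes": [(10,), (20,), (50,)]},
    cv=5,
)
search.fit(X_train, y_train)

# Final, statistically independent estimate of performance.
print("Chosen parameters:", search.best_params_)
print(f"Validation accuracy: {search.best_estimator_.score(X_val, y_val):.3f}")
```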
6. The final testing / validation set must be independent
This means that the data in the validation set must be from a separate process to that which produced the data used for the k-fold cross-validation. If you are training an ANN to recognise speech, the validation set should be from a different speaker to those it was trained on, or at least from a different recording. If you are training an ANN to recognise spatial features, the validation data should come from a different area or different survey to the data that was used to train the ANN.
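In code, this amounts to splitting by group (speaker, recording, survey area) rather than by individual record. A sketch using scikit-learn's group-aware splitters; the features, labels, and speaker IDs below are hypothetical placeholders:

```python
# Sketch: keep whole groups (e.g. speakers, surveys) together, so the
# validation data comes from groups the network never saw during training.
# Assumes scikit-learn; X, y and the speaker IDs are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # placeholder features
y = rng.integers(0, 2, size=200)          # placeholder labels
speakers = rng.integers(0, 10, size=200)  # placeholder group labels (speaker IDs)

# Hold out entire speakers as the independent validation set.
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(outer.split(X, y, groups=speakers))

# Group-aware k-fold cross-validation within the training speakers.
inner = GroupKFold(n_splits=5)
for fold_train, fold_test in inner.split(
    X[train_idx], y[train_idx], groups=speakers[train_idx]
):
    print(f"fold: {len(fold_train)} training rows, {len(fold_test)} test rows")
```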
7. Compare a new algorithm with an existing algorithm
If you are claiming that your new algorithm works well and is highly accurate, then you need to back that up by comparing it against the performance of an existing, preferably well-known, algorithm. You don't need to perform the experiments with the existing algorithm yourself (although it is good if you do); you can point to previously published results. But a comparison must be carried out.
8. Comparisons of performance must be done in a statistically sound manner
This means that you can't just look at two numbers (two means) and say that your algorithm is better because its mean accuracy is higher than that of an existing algorithm. Comparisons must be done using statistical tests; that is, I want to know whether the results are significantly different. If you say that the results are significantly different, then you must also specify what statistical test was performed. For example, it is best to say something along the lines of "the accuracy of the Bogon 2000 algorithm was significantly higher than that of the Wibble 12 algorithm (two-tailed t-test, p=0.01)".
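As one example of such a test, a paired t-test can be run over the per-fold accuracies of the two algorithms on the same cross-validation folds. A sketch assuming scikit-learn and scipy (the two classifiers here stand in for "your algorithm" and "an existing algorithm"):

```python
# Sketch: compare two algorithms on the same cross-validation folds and test
# whether the difference in accuracy is statistically significant.
# Assumes scikit-learn and scipy; the classifiers and data set are stand-ins.
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # same folds for both

scores_new = cross_val_score(MLPClassifier(max_iter=2000, random_state=0), X, y, cv=cv)
scores_old = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

t_stat, p_value = ttest_rel(scores_new, scores_old)  # paired, two-tailed by default
print(f"new: {scores_new.mean():.3f}, existing: {scores_old.mean():.3f}, p = {p_value:.3f}")
```

Bear in mind that p-values computed over overlapping cross-validation folds are only approximate; the point is that whatever test you use must be named in the paper.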
If you don't want to follow these principles, that's fine, as long as you explain in your paper (or review rejoinder) why you didn't do that. I'm quite prepared to be shown to be wrong.
Monday, May 16, 2011