Tuesday, May 24, 2011

Evolving Connectionist Systems

An interesting family of neural networks is Evolving Connectionist Systems (ECoS). These were invented by Professor Nik Kasabov around 1998. ECoS are constructive networks, that is, they do not start with a fixed structure but instead grow (add neurons) as training data is presented to them. The advantages of this are:
  1. they are fast learning, as they learn the data as it is presented, rather than iteratively
  2. they are hard to over-train, as new data is accommodated by adding new neurons to the network
This makes ECoS networks very well suited to so-called "online" learning, where a stream of data is incoming and must be modeled as it arrives. Neurons are added when the current training example is either novel to the network (it has not seen something similar) or the network is not able to accurately model it.
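
To make the growth rule concrete, here is a toy sketch in Python. It is not EFuNN or SECoS: the class name, the two thresholds, the learning rate and the nearest-neuron prediction are all simplifying assumptions of mine, chosen only to show the "add a neuron when the example is novel or badly modelled, otherwise adapt" idea in a single pass over a data stream.

```python
import math


class TinyConstructiveNet:
    """Toy one-pass learner (hypothetical, not a published ECoS algorithm)."""

    def __init__(self, novelty_threshold=0.5, error_threshold=0.1, learning_rate=0.5):
        self.novelty_threshold = novelty_threshold  # max distance still counted as "similar"
        self.error_threshold = error_threshold      # max tolerated output error
        self.learning_rate = learning_rate
        self.centres = []   # one input-space prototype per neuron
        self.outputs = []   # output value attached to each neuron

    def _winner(self, x):
        # Index of, and distance to, the neuron closest to x.
        distances = [math.dist(x, centre) for centre in self.centres]
        index = min(range(len(distances)), key=distances.__getitem__)
        return index, distances[index]

    def predict(self, x):
        index, _ = self._winner(x)
        return self.outputs[index]

    def learn_one(self, x, y):
        if self.centres:
            index, distance = self._winner(x)
            novel = distance > self.novelty_threshold
            inaccurate = abs(self.outputs[index] - y) > self.error_threshold
            if not (novel or inaccurate):
                # Familiar and well modelled: nudge the winning neuron towards the example.
                rate = self.learning_rate
                self.centres[index] = [c + rate * (xi - c)
                                       for c, xi in zip(self.centres[index], x)]
                self.outputs[index] += rate * (y - self.outputs[index])
                return
        # First example, novel example, or poorly modelled example: grow the network.
        self.centres.append(list(x))
        self.outputs.append(y)


# A made-up stream: each example is seen exactly once, in arrival order.
stream = [([0.0, 0.0], 0.0), ([0.1, 0.0], 0.0), ([1.0, 1.0], 1.0), ([0.05, 0.0], 1.0)]
net = TinyConstructiveNet()
for x, y in stream:
    net.learn_one(x, y)
```

In this stream the second example only adapts the first neuron, the third is novel, and the fourth is familiar in input space but badly modelled, so the network ends up with three neurons.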

The first ECoS was the Evolving Fuzzy Neural Network (EFuNN). Later ECoS include the Simple Evolving Connectionist System (SECoS), which is really an EFuNN with the fuzzy logic elements removed, and the Evolving Clustering Method (ECM). EFuNN and SECoS both have rule extraction algorithms associated with them, by which fuzzy rules can be extracted from a trained EFuNN or SECoS network. This makes ECoS very useful for data mining, especially in an online application area.

I wrote a review of ECoS technology a couple of years ago, in this paper. An online reprint is available here. I also maintain a website of resources on ECoS networks at: ecos.watts.net.nz.

Research on ECoS networks is continuing, especially at Prof. Kasabov's lab KEDRI. Nowadays, ECoS research is focused on spiking neuron models, that is, neurons that include a temporal aspect to their activation, much as biological neurons do.

Wednesday, May 18, 2011

Modelling distribution of jellyfish with ANN

A new paper first-authored by David Pontin, my ex-PhD student from Lincoln University. This describes how he used an MLP to model the presence and absence of a species of stinging jellyfish (Physalia physalis) at New Zealand beaches.

There are a couple of interesting points about this paper. Firstly, because there have been no surveys of Physalia distribution, a surrogate data set was used. This data set consisted of stings recorded by lifeguards of Surf Lifesaving New Zealand. Because lifeguards treat jellyfish stings and each incident has to be recorded, and because Physalia is the only stinging organism in New Zealand waters, a fairly large data set was available on the presence of these jellyfish. Predictions were made from oceanic variables such as wave height and direction, and wind speed and direction.

Secondly, the data was carefully cleaned: since stings of swimmers were used as the surrogate for Physalia presence, times when there were no swimmers at the beach were excluded from the data set. While this introduced a small missing-not-at-random bias, it also removed a large number of false absences: if an example was recorded as an absence, then it was because there were no stings recorded, not because there was no one in the water.
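
The cleaning rule can be sketched in a few lines of Python. The record layout and field names ("swimmers", "stings") are hypothetical and the three records are made up for illustration; the paper's actual data format is not described here.

```python
# Made-up patrol records; field names are assumptions, not the paper's schema.
records = [
    {"beach": "A", "swimmers": 0, "stings": 0},   # nobody in the water
    {"beach": "A", "swimmers": 40, "stings": 0},  # a trustworthy absence
    {"beach": "B", "swimmers": 25, "stings": 3},  # a presence
]


def label_presence(records):
    """Drop records with no swimmers, then label the rest as presence/absence."""
    labelled = []
    for record in records:
        if record["swimmers"] == 0:
            # No one could have been stung, so zero stings says nothing about the jellyfish.
            continue
        labelled.append({**record, "present": record["stings"] > 0})
    return labelled


cleaned = label_presence(records)
```

Only the second and third records survive, so every remaining absence was observed with people actually in the water.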

Thirdly, an analysis of the contributions of each input of the ANN was performed. This showed which of the oceanic variables contributed the most to the presence of Physalia. This analysis indicated that there may be a hitherto unknown spawning ground for this species in the Tasman Sea.

Finally, and this is in many ways the focus of the paper, the contribution analysis of the ANN was compared with the results of input contribution analysis by an evolutionary algorithm.

Overall, this is a nice little paper that neatly sums up David's work and contributes to the understanding of the behaviour of Physalia. This shows how useful computational intelligence is to ecological applications, an area where there is, in my opinion, enormous potential for computational intelligence researchers to make real, meaningful contributions.

Monday, May 16, 2011

Minimum Requirements for Computational Intelligence Papers

I am reposting this after it was lost during the Blogger meltdown last week.


In a previous post, I mentioned some challenges in reviewing computational intelligence papers. In this post, I list what I consider to be the minimum requirements for computational intelligence papers. These are the things that I look for when I review a paper, and if they aren't there, I reject it.

1. Define all variables in equations

While most computational intelligence papers have mathematics in them, a disappointingly large number of them do not define the variables in their equations. Or, if they do, they define them some distance from the equation itself. If I am reading your paper, I want to understand the maths, and I can't do that if I can't quickly find the meaning of each variable.


2. Use more than one data set to test an algorithm

If your paper describes a new algorithm, or even an improvement on an existing algorithm, it must be tested on more than one data set. The No Free Lunch theorem tells us that, for every algorithm, there are some data sets on which it will perform well and some on which it will perform poorly. While publication bias means that poor results often do not get reported, I do expect results over more than one data set.


3. Investigate more than one set of parameters

For any algorithm, there will be one set of parameters that yields better performance than others. This means that if your study only utilised a single set of parameters, you cannot tell whether you might have gotten better results using different parameters. For new algorithms, it is useful to show how sensitive the performance of the algorithm is to its parameters.
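
As a rough illustration of what I mean, here is a small Python sketch of a parameter sweep that records the score for every combination rather than just the best one. The function names, the parameter names and the scoring function are all placeholders I made up; in a real study `toy_score` would be replaced by "train the algorithm with these parameters and return its cross-validated performance".

```python
from itertools import product


def sweep(evaluate, grid):
    """Evaluate every combination of parameter values and keep all the results."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(**params)))
    return results


def toy_score(learning_rate, hidden_units):
    # Placeholder score with an arbitrary peak; NOT a real training run.
    return round(1.0 - abs(learning_rate - 0.1) - abs(hidden_units - 8) / 100, 3)


# Hypothetical parameter grid.
grid = {"learning_rate": [0.01, 0.1, 0.5], "hidden_units": [4, 8, 16]}
results = sweep(toy_score, grid)
best_params, best_score = max(results, key=lambda item: item[1])

# Reporting the whole table shows how sensitive the score is to each parameter.
for params, score in results:
    print(params, score)
```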


4. Clearly describe how the parameters were chosen

This is a particular problem with papers that describe applications of algorithms. In short, even if you only list the parameters that gave the best performance, you should still describe how you chose them. Choosing parameters by trial-and-error is fine, but you must say in the paper that that was how you chose your parameters. Also, LIST THE PARAMETERS IN THE PAPER! Being able to replicate experiments is at the very heart of science, and if you don't say what your parameters were, your experiments can't be replicated.


5. Use multiple partitions of the data set

Neural network papers are particularly bad for this. Often, the algorithm will be trained on one subset of the data (the training set) then tested on the remaining data (the testing or validation set, depending on who's writing the paper). Sometimes the data is divided into subsets randomly, sometimes it is not. There are two problems with this approach: firstly, it is entirely possible that the data is partitioned in a way that is particularly good for the algorithm, that is, the reported performance of the algorithm is due to the partitioning of the data, rather than the algorithm itself; secondly, if the training parameters of the algorithm are chosen to maximise the performance over the testing set, that is equivalent to training over the testing set: that is, the testing set is no longer independent.

A better way (and my preferred technique) is to use k-fold cross-validation with an independent validation data set. In this technique, a validation data set is either randomly extracted or sourced separately from the training data set. The training data set is then divided into k subsets, and the algorithm trained over k-1 of the subsets. The remaining subset is then used to evaluate the performance of the algorithm. This is repeated k times, with a different subset held out as the evaluation set each time. This has the effect of training and testing over the entire training data set. The results over the cross-validation are used to select the parameters, and the final performance is assessed over the validation data set. Since the validation data set is not used to select the parameters or to train the algorithm, it remains statistically independent.
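
The procedure above can be sketched in plain Python as follows. This is a minimal sketch under my own assumptions: `train_fn` and `score_fn` are hypothetical callables standing in for whatever algorithm and performance measure you use, and the final model is retrained on the whole cross-validation set before being scored once on the validation set.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists; each item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


def cross_validate(train_fn, score_fn, data, params, k=5):
    """Mean score over k folds for one parameter setting."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = train_fn([data[i] for i in train_idx], **params)
        scores.append(score_fn(model, [data[i] for i in test_idx]))
    return sum(scores) / len(scores)


def select_and_validate(train_fn, score_fn, cv_data, validation_data, candidates, k=5):
    """Pick parameters by cross-validation, then score once on the untouched validation set."""
    best = max(candidates,
               key=lambda params: cross_validate(train_fn, score_fn, cv_data, params, k))
    final_model = train_fn(cv_data, **best)
    return best, score_fn(final_model, validation_data)
```

The important design point is that `validation_data` appears only in the last line: it plays no part in training or in choosing the parameters.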


6. The final testing / validation set must be independent

This means that the data in the validation set must be from a separate process to that which produced the data used for the k-fold cross-validation. If you are training an ANN to recognise speech, the validation set should be from a different speaker to those it was trained on, or at least from a different recording. If you are training an ANN to recognise spatial features, the validation data should come from a different area or different survey to the data that was used to train the ANN.
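
One simple way to enforce this is to split by source rather than by individual example. The Python sketch below assumes each sample carries some identifier of the process that produced it (a speaker, a recording, a survey area); the function name, the field names and the sample data are made up for illustration.

```python
def split_by_group(samples, group_of, held_out_groups):
    """Send every sample from a held-out group to validation, the rest to development."""
    held_out_groups = set(held_out_groups)
    development, validation = [], []
    for sample in samples:
        target = validation if group_of(sample) in held_out_groups else development
        target.append(sample)
    return development, validation


# Made-up speech samples tagged with a hypothetical speaker identifier.
samples = [
    {"speaker": "s1", "clip": "a"},
    {"speaker": "s1", "clip": "b"},
    {"speaker": "s2", "clip": "c"},
    {"speaker": "s3", "clip": "d"},
]
development, validation = split_by_group(samples, lambda s: s["speaker"], ["s3"])
```

The development set is then used for the k-fold cross-validation, and no speaker in the validation set has ever been seen during training or parameter selection.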


7. Compare a new algorithm with an existing algorithm

If you are claiming that your new algorithm works well and is highly accurate, then you need to prove that by comparing it against the performance of an existing, preferably well-known, algorithm. You don't need to perform the experiments with the existing algorithm yourself (although it is good if you do); you can point to previously published results. But a comparison must be carried out.


8. Comparisons of performance must be done in a statistically sound manner

This means that you can't just look at two numbers (two means) and say that your algorithm is better because the mean accuracy is higher than that of an existing algorithm. Comparisons must be done using statistical tests, that is, I want to know whether the results are significantly different. If you say that the results are significantly different, then you must also specify what statistical test was performed. For example, it is best to say something along the lines of "the accuracy of the Bogon 2000 algorithm was significantly higher than that of the Wibble 12 algorithm (two-tailed t-test, p=0.01)".
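
To show what such a comparison looks like in practice, here is a short Python sketch. Note that it is not a t-test: to keep it free of external libraries I have swapped in an exact two-sided permutation test on the difference of means, which answers the same question for small samples. The accuracy values are invented for illustration, one per cross-validation fold for each of the two made-up algorithms.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact two-sided permutation test on the difference of means.

    Returns the fraction of ways of relabelling the pooled results that give
    a difference at least as large as the one observed.
    """
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    extreme = count = 0
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if difference >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / count


# Invented per-fold accuracies, NOT real results.
bogon_2000 = [0.91, 0.93, 0.92, 0.94, 0.90]
wibble_12 = [0.85, 0.86, 0.84, 0.87, 0.83]
p_value = permutation_test(bogon_2000, wibble_12)  # 2 of 252 relabellings, about 0.0079
```

Whatever test is used, the paper should name it and report the p-value alongside the means, exactly as in the example sentence above.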


If you don't want to follow these principles, that's fine, as long as you explain in your paper (or review rejoinder) why you didn't do that. I'm quite prepared to be shown to be wrong.