Tuesday, August 14, 2012

Identifying bats with ANN

An interesting paper has just come out in the Journal of Applied Ecology that describes how artificial neural networks (ANNs) were used to identify thirty-four European bat species from their echolocation calls. This is a challenging problem, because calls within a single species can vary considerably depending on what the bat is doing. For example, the calls a bat uses while hunting differ from those it uses while commuting to a hunting ground. The work described in this paper has several good features.

Firstly, they used a hierarchy of multilayer perceptron (MLP) ensembles to identify the species. The first level of MLPs identified which of six geographic regions the bat came from. A second level then identified the genus (out of seven). Finally, an ensemble of species-specific MLPs identified the species itself.
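The routing through the hierarchy can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes each trained MLP is wrapped as a function returning a score per class, that ensemble members are combined by averaging their scores, and that each region has its own genus ensemble and each genus its own species ensemble. All names and labels (`classify_call`, `region_A`, `sp_1b` and so on) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a trained MLP: feature vector -> score per class.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


def ensemble_scores(members: List[Classifier], x: Sequence[float]) -> Dict[str, float]:
    """Average the class scores of all ensemble members (an assumed combination rule)."""
    totals: Dict[str, float] = {}
    for member in members:
        for label, score in member(x).items():
            totals[label] = totals.get(label, 0.0) + score
    return {label: total / len(members) for label, total in totals.items()}


def best(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def classify_call(
    x: Sequence[float],
    region_ensemble: List[Classifier],
    genus_ensembles: Dict[str, List[Classifier]],
    species_ensembles: Dict[str, List[Classifier]],
) -> Tuple[str, str, str]:
    """Route one call's features through region -> genus -> species.

    Each level's decision is used only to pick the ensemble for the next level.
    """
    region = best(ensemble_scores(region_ensemble, x))
    genus = best(ensemble_scores(genus_ensembles[region], x))
    species = best(ensemble_scores(species_ensembles[genus], x))
    return region, genus, species


def fixed(scores: Dict[str, float]) -> Classifier:
    """Toy classifier that ignores its input and always returns the same scores."""
    return lambda x: scores


# Toy hierarchy with made-up labels and scores.
regions = [fixed({"region_A": 0.8, "region_B": 0.2}),
           fixed({"region_A": 0.6, "region_B": 0.4})]
genera = {"region_A": [fixed({"genus_1": 0.7, "genus_2": 0.3})],
          "region_B": [fixed({"genus_2": 0.9, "genus_3": 0.1})]}
species = {"genus_1": [fixed({"sp_1a": 0.4, "sp_1b": 0.6})],
           "genus_2": [fixed({"sp_2a": 0.5, "sp_2b": 0.5})],
           "genus_3": [fixed({"sp_3a": 1.0})]}

result = classify_call([0.0], regions, genera, species)
print(result)  # ('region_A', 'genus_1', 'sp_1b')
```

Because each level only selects the next ensemble, a wrong regional or genus decision cannot be corrected later on: the correct species network is simply never consulted.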

Secondly, they trained the MLPs on a large data set and performed a thorough analysis to identify the most significant acoustic features. Rather than cramming every feature into the MLPs and hoping for the best, they used only the twenty-four most significant ones.
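To illustrate what keeping only the most significant features means in practice, here is a minimal sketch that ranks features by a Fisher score (between-class scatter divided by within-class scatter) and keeps the top k. I don't know which significance analysis the authors actually used, so the Fisher score is a swapped-in stand-in, the function names are hypothetical, and the toy data are made up. In the paper's setting k would be 24.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[str]) -> float:
    """Between-class scatter over within-class scatter for a single feature."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    overall = mean(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    if within == 0:
        # Perfect separation scores highest; a constant feature scores zero.
        return float("inf") if between > 0 else 0.0
    return between / within


def top_k_features(X: Sequence[Sequence[float]], labels: Sequence[str], k: int) -> List[int]:
    """Indices of the k features that best separate the classes, best first."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 2 does not.
X = [[1.0, 5.0, 0.0],
     [1.2, 5.1, 1.0],
     [3.0, 5.0, 0.0],
     [3.1, 4.9, 1.0]]
labels = ["a", "a", "b", "b"]
print(top_k_features(X, labels, k=2))  # [0, 1]
```

Running `top_k_features` with one-vs-rest labels, for example `["target" if y == s else "rest" for y in labels]` for a species `s`, would give a different feature subset for each species, which is the per-species selection suggested later in this post.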

Finally, they incorporated the classifiers into software called iBatsID that is freely available for anyone to use.

The authors report a range of classification accuracies across the species, from a high of 100% to a low of 56.5%. They say that "This is almost certainly the results of our eANN [ensemble ANN] dealing with many more species". I think they're wrong about that, because the point of using ensembles is that the individual members can be highly specialised for a particular class. I suspect the problem is instead that the features they selected were not as useful for the poorly recognised species: rather than using the same twenty-four parameters for all thirty-four species, they might have obtained better results by selecting acoustic parameters for each species separately.

Also, judging from the diagram (Figure 3 in the paper), the outputs of the regional and genus networks appear to be used only to decide which group of species MLPs to use. An alternative would have been to use the outputs of the regional and genus MLPs as additional input features for the following levels (similar to my approach in this paper). That would have added more information to the classification process and would probably have boosted accuracy.
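Feeding the regional and genus outputs forward as inputs, rather than using them only for routing, can be sketched as follows. This is a minimal sketch under assumed names: each network is treated as a function from a feature vector to class scores, a single genus network and a single species network stand in for the paper's ensembles, and the toy networks are made up purely to show the data flow.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a trained MLP: feature vector -> score per class.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


def augment(x: Sequence[float], scores: Dict[str, float], labels: Sequence[str]) -> List[float]:
    """Append upstream class scores to the features, in a fixed label order."""
    return list(x) + [scores.get(label, 0.0) for label in labels]


def cascade(
    x: Sequence[float],
    region_net: Classifier, region_labels: Sequence[str],
    genus_net: Classifier, genus_labels: Sequence[str],
    species_net: Classifier,
) -> str:
    """Pass each level's soft outputs forward as extra inputs to the next level."""
    x_genus = augment(x, region_net(x), region_labels)
    x_species = augment(x_genus, genus_net(x_genus), genus_labels)
    species_scores = species_net(x_species)
    return max(species_scores, key=species_scores.get)


# Toy networks (hypothetical): each downstream net reads the two scores
# appended by the level above it, so upstream information reaches it directly.
def region_net(x: Sequence[float]) -> Dict[str, float]:
    return {"region_A": 0.9, "region_B": 0.1}


def genus_net(x: Sequence[float]) -> Dict[str, float]:
    return {"genus_1": x[-2], "genus_2": x[-1]}


def species_net(x: Sequence[float]) -> Dict[str, float]:
    return {"sp_1a": x[-1], "sp_1b": x[-2]}


print(cascade([0.5], region_net, ["region_A", "region_B"],
              genus_net, ["genus_1", "genus_2"], species_net))  # sp_1b
```

The downstream networks would need to be trained on these augmented vectors. A common precaution with this kind of stacking is to generate the upstream scores on held-out data, so that the later levels do not learn to over-trust outputs that look more confident on the training set than they will on new recordings.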

A final problem with this paper is that the authors have left out many of the technical details about how the ANNs were constructed and trained, and about exactly how the different levels of the hierarchy interacted. This is probably because it is an ecology paper rather than an ANN paper.

Overall, it's an interesting application, and I'm looking forward to seeing more work done on this problem in the future.