Thursday, April 26, 2012

There's no un-gameable metric

I've been a bit quiet on the blog front lately, mostly because I've been working like a dog on several projects, including writing tools for ecological modelling, re-working some websites, and fulfilling my duties both as guest editor of my special issue of Evolving Systems on Applications of Evolving Connectionist Systems, and as vice-chair of the IEEE CIS Social Media Subcommittee. It was also school holidays the last two weeks here in South Australia, and I was able to spend some quality time with my little girl.

Never fear, I'm working on several new blog posts on a variety of topics, including: the relationship between computational intelligence and data mining; further thoughts on doing a PhD (a follow-up to this post); my thoughts on the value of a computational intelligence degree; and my thoughts on collaborating with other researchers. The topic of today's post, though, is assessing academics and universities.

My alma mater has been in the New Zealand news lately (see here and here) after the release of a report by accounting firm KPMG that suggests that Otago gamed the New Zealand government's research assessment process to give itself a higher score than it was entitled to.

The Performance-Based Research Funding (PBRF) framework rates the research outputs of eligible staff and uses those ratings, along with metrics of institutional performance such as the number of research degrees completed, to assign an overall score to the institution. Staff can be rated as R (research inactive - bad for this exercise), C (research active / good), B (very good) or A (world-class). The fewer R's and C's an institution has, and the more B's and A's, the better the institution's score. Something like 25-30% of an institution's income will be determined by this score. There is also the huge marketing advantage of scoring highly relative to the other universities: in the first PBRF round in 2004, Auckland University made much of the fact that their staff were, on average, ranked highest in the country, while Otago made much of being ranked highest as an institution. This is despite the government of the day clearly saying that PBRF wasn't supposed to be used for such comparisons, or as a management tool.

Eagle-eyed readers may have noticed the term "eligible staff" in the previous paragraph: it is this facet of the process that Otago is accused of gaming.

The accusation is that Otago inappropriately classified staff it knew would get low scores as ineligible for assessment, and thus artificially boosted its ranking compared to other New Zealand institutions. Otago is also accused of firing, or pushing into retirement, staff based on their anticipated PBRF score. The vice-chancellor denies these accusations, and the whole thing is turning into a "he said / she said" situation.
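To see why excluding weak performers from assessment is so tempting, consider a toy version of the arithmetic. The weights below are entirely made up for illustration (they are not PBRF's actual weightings, which I don't have to hand), but the mechanism is the same: if low-rated staff are declared ineligible, they drop out of the denominator, and the average score jumps without anyone's research improving.

```python
# Hypothetical quality weights per rating; NOT the real PBRF weightings.
WEIGHTS = {"A": 10, "B": 6, "C": 2, "R": 0}

def avg_score(ratings):
    """Average quality score across the staff who are assessed."""
    return sum(WEIGHTS[r] for r in ratings) / len(ratings)

staff = ["A", "B", "B", "C", "R", "R"]
print(avg_score(staff))  # 4.0 when everyone is assessed

# Declare the R-rated staff "ineligible" and they vanish from the denominator:
eligible_only = [r for r in staff if r != "R"]
print(avg_score(eligible_only))  # 6.0 -- a 50% boost, with no new research
```

The point of the sketch is just that any ratio-style metric is sensitive to who gets counted, which is exactly where the eligibility question bites.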

Did Otago really do this? I honestly don't know. I do know that when I was working at Otago in 2004 (the first PBRF assessment round), I was assessed fully, and fairly, even though it would have been pretty easy to classify me as ineligible for assessment. I don't think my score in PBRF at the time was particularly helpful to their overall ranking, but maybe it wasn't too harmful, either.

My point is, this entire drama shows that there is no metric of academic performance, whether of an individual, an institution, or a publication, that can't be gamed. That is, there is no metric that can't be manipulated so that an individual, institution or publication gets a higher score than they otherwise would. Journals can boost their impact factor by asking authors to cite articles from within that publication (and I have had editors ask me to do this). Individuals can boost their h-index by self-citations, or by organising a special issue and asking every author to cite a review article they have written. Institutions can raise their assessment by head-hunting the top performers in their fields, or by hiding staff from assessment.

Some might argue that it is only prudent to game metrics whenever possible: after all, the future employment prospects of an academic, or the future financial security of an institution (and, therefore, the job security of its staff), depends on getting a good score on whatever metric is being used. As long as no rules are being broken, and the questions are being answered honestly, what's the harm? If there is wiggle-room, or room for gaming of the metric, isn't it the assessor's fault for designing an inexact metric? Others might argue that adherence to the spirit of the assessment is more important, and more fair, and that gaming should be avoided.
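The h-index gaming is easy to make concrete. An author's h-index is the largest h such that they have h papers with at least h citations each. The sketch below (plain Python, with made-up citation counts) shows how a handful of well-placed self-citations can nudge it upward.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

papers = [10, 8, 5, 4, 3]           # citation counts per paper (invented)
print(h_index(papers))              # 4: four papers with >= 4 citations each

# Three self-citations aimed at the two weakest papers lift the index:
boosted = [10, 8, 5, 4 + 1, 3 + 2]  # -> [10, 8, 5, 5, 5]
print(h_index(boosted))             # 5
```

Note that the self-citations only pay off because they target papers sitting just below the threshold; citing the already-strong papers would change nothing, which is precisely why this kind of gaming is hard to detect from the metric alone.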

This all means that there is no one metric you can use to assess an academic. You have to look at the entire picture: their publication count; where they have published; what fields they have published in; how much teaching they have done; their teaching assessments; the quality of their institution; and their service to their institution(s), to professional societies, and to the community. I hope that one day I will rate highly in all of those areas, but for now, don't judge me by my h-index alone.