A couple of days ago, I described a simple method for stopping fraudulent co-authorships aimed at gaming the h-index computation. Today, I would like to point out yet another problem with the way Google Scholar Citations currently computes the h-index. I call this the problem of meaningless self citations.

It has been known for a long time that some researchers, under pressure from their employers and driven by their own desire to increase their publication counts, publish the same work in several journals and/or conferences. They change the title and some of the content here and there, but the main ideas of the paper remain largely intact.

Reviewers are to blame for letting such papers pass through their hands without proper scrutiny. This has allowed many researchers to inflate their published body of work dramatically and to increase the chances of those works being cited, thereby boosting their h-indices. As already pointed out, this problem can easily be sorted out by strengthening the review process.

The problem of meaningless self citations is, however, more subtle, in the sense that the papers may actually be genuine research outputs enriching the body of knowledge with groundbreaking results. Within their reference lists, though, the authors deliberately cite many of their own papers in order to increase the citation counts of those papers. Because Google Scholar treats all citations equally, self-cited papers easily accumulate more citations as the researcher and his/her co-authors write more and more papers.

At the end of the day, the researchers in a particular authorship network will all witness rapidly increasing h-indices. These numbers will in no way reflect the actual impact of the researchers. The system is clearly being gamed through meaningless self citations. There should therefore be a way of dealing with this problem, and I would like to ignite a debate on this issue by offering a suggestion.

One option would be to exclude self citations completely. But that would be too harsh on authors whose new research genuinely builds on the impact of their previous work. Such cases need to be recognised.

I, therefore, suggest that for a particular paper, every citation in which at least one of the citing authors was involved in the work being cited should be weighted by a factor of 0.25. Google Scholar also keeps track of each researcher's co-authors. These days, it is very common for researchers to cite the works of their previous co-authors just to boost their friends' citation counts. These meaningless friend citations are very difficult to track, and some of them may even be genuine. They should, therefore, be weighted by a higher factor, say 0.5. If we let C denote the citation count for a particular paper, the citation count equation would finally look like this:

C = T + 0.5*F + 0.25*S, where T denotes third party citations, and F and S denote friend citations and self citations respectively.
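To make the proposal concrete, here is a minimal Python sketch of how such an adjusted count might be computed. It assumes that each citing paper is represented simply by its set of author names, and that a set of known co-authors of the cited paper's authors is available. Google Scholar does keep such a list, but the function names and data layout here are hypothetical and are not any Google Scholar API. The sketch also computes an h-index over the adjusted counts using the usual definition: the largest h such that h papers each have at least h citations.

```python
from __future__ import annotations

# Weights from the proposed formula C = T + 0.5*F + 0.25*S.
FRIEND_WEIGHT = 0.5
SELF_WEIGHT = 0.25


def classify_citation(cited_authors: set[str], citing_authors: set[str],
                      coauthors: set[str]) -> str:
    """Label one citation as 'self', 'friend' or 'third_party'.

    Assumption: a citation is a self citation if the citing and cited papers
    share at least one author, and a friend citation if the citing paper has
    an author who is a known previous co-author of the cited paper's authors.
    """
    if cited_authors & citing_authors:
        return "self"
    if coauthors & citing_authors:
        return "friend"
    return "third_party"


def adjusted_citation_count(cited_authors: set[str],
                            citing_papers: list[set[str]],
                            coauthors: set[str]) -> float:
    """Compute C = T + 0.5*F + 0.25*S for a single cited paper."""
    t = f = s = 0
    for citing_authors in citing_papers:
        kind = classify_citation(cited_authors, citing_authors, coauthors)
        if kind == "self":
            s += 1
        elif kind == "friend":
            f += 1
        else:
            t += 1
    return t + FRIEND_WEIGHT * f + SELF_WEIGHT * s


def h_index(counts: list[float]) -> int:
    """Largest h such that h papers each have at least h (adjusted) citations."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical example: a paper by Alice, whose only known co-author is Bob.
    paper_authors = {"Alice"}
    alice_coauthors = {"Bob"}
    citing = [{"Alice", "Carol"}, {"Bob"}, {"Dave"}, {"Eve"}]
    print(adjusted_citation_count(paper_authors, citing, alice_coauthors))  # 2.75
```

In this toy example the paper has four citations in total, but only the two independent ones count in full, so its adjusted count is 2.75 rather than 4. Note that the sketch checks for self citations before friend citations, so a citing paper that includes both an original author and a former co-author is treated as a self citation and receives the lower weight.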

This formula gives the most weight to third party citations, that is, citations from people who are not connected to the researcher and who cite the paper independently. As I mentioned, the purpose of this post is to initiate a debate on this issue. Feel free to share your views.