Sunday 24 January 2016

Data parasites vs idea parasites

In their editorial, Longo and Drazen (yes, the now-infamous NEJM editorial "Data Sharing", 21 January 2016) raise an interesting, though misdirected, concern about data sharing. They articulate the merits of data sharing correctly when they raise the moral issue of honoring the collective sacrifice of the patients who put themselves at risk to generate the data, but they conveniently ignore the public funding part when they raise the abominable research/data parasites issue. Because of this oversight, the piece seems focused on the assumed, unrequited right of the clinicians who generated the data to authorship in every subsequent study, instead of honoring the public funding that allowed it to happen or the so-called moral imperative to honor the patients. Their concern is founded in reality when they observe that an independent researcher may not fully understand the nuances of a study design or its parameters, but it takes a wrong turn when they advocate the continued reaping of rewards from a publicly funded project in which the patients put themselves at risk. The assumed importance of data generation is overtly inflated and takes centre stage so as to claim full ownership.

While the data generated may be of extreme importance, and collaborations arising out of new hypotheses may be well-intentioned and thoughtfully done in the proposed manner, this in no way represents the only mechanism for doing good or meaningful research. In the field of mass spectrometry based proteomics, several groups have generated data for free public use to facilitate better algorithm development without asking for anything in return. These groups were funded by public money, and they contributed back to science and society in general. A privately funded project, however, may fall outside the purview of my counterclaims, and the authors' views may well hold true there. Even then, it is the funding body that becomes the claimant of data ownership, if it so decides.

The authors also make an incorrect and audacious suggestion about starting from a totally new hypothesis, in effect arguing that one should not build upon the work of others. It is borne out of the malicious (in their opinion only, of course, not mine) practice of data scientists using the original investigators' data to disprove their hypotheses or claims. A prime example of anti-science. Oh, did it arise out of frustration that many NEJM retractions could be because of enthusiastic data parasites who happened to recheck the claims? I don't know. Science has always progressed by building on the work of predecessors, lest we keep reinventing the wheel. Using the suggested modus operandi would cripple clinical research, and even the discovery of biomarkers would become nearly impossible, let alone the development of a drug. The suggested remedy for the assumed data parasitism problem is another lame pitch for authorship. The authors probably forgot, or never heard, this remark by Carl Sagan: "If you wish to make an apple pie from scratch, you must first invent the universe."

In the lines that follow, I revisit their four-point agenda in this new light:
First, starting with a novel idea that isn't an obvious extension of previous work is somewhat of a fallacy in itself. If it were obvious enough, the main group generating the data could (or should) have done it. Everything in science seems obvious once the idea is shared. On the pretext of data, the generators are trying to be idea parasites.
Second, identifying potential collaborators isn't always necessary, but researchers do collaborate when they need to. Adding the person who generated the data as a collaborator is not wrong unless one starts to force it. It is obvious enough that they aren't happy with citations. Only authorship counts in their opinion.
Third, working together to solve a problem is not a novel idea and has always been promoted by funding bodies and scientific administration, but it is the right course only when required. Just because someone generated the data does not automatically give him insight into how another researcher sees a scientific problem that can be addressed using that data. He may, however, be of great help, and the person posing the question is the best placed to recognize this. Some researchers may actually be using data incompetently by not involving those who generated it, but such researchers are devaluing their own research. Is the data parasitism concept even truly valid in this scenario?
Fourth, fighting for a co-authorship makes the irony come full circle, because the authors now want to become idea/authorship parasites on the so-called data parasites. Once an idea parasite generates a valuable dataset from public funding and patients' sacrifice, he/she can cash in on it continuously by piggybacking on the ideas, hard work, funding and sacrifices of others.

Conflict of interest: I am a data parasite by their definition. I thrive on public data.

(PS: I will update the post with proper references and links. Check back in a few days if you are interested.)