Sunday 27 November 2016

Coder and Programmer

I asked this question in one interview (update: several, OK maybe all) of prospective bioinformatics students, and to my surprise, a lot of them said they didn't have a clue what a pragma1 is! This doesn't mean they didn't know the concept; they were just unaware of the technical term/jargon. Once I told them, most of them knew what I was talking about.
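
(For readers hitting the same jargon gap: a pragma is a directive to the compiler or interpreter rather than a statement of the program's logic - Perl's use strict; and C's #pragma lines are the classic examples. Below is a minimal, purely illustrative Python sketch of the nearest equivalents; the file name and function are hypothetical, and the real discussion is still coming in that later post.)

    # pragma_demo.py - a minimal, hypothetical sketch of pragma-like directives in Python.
    # Python has no literal #pragma keyword; the closest analogues are "future statements"
    # (directives to the compiler) and special comments that particular tools honour.

    from __future__ import annotations   # compiler directive: postpone evaluation of
                                          # type annotations (PEP 563)

    import os  # noqa: F401  <- tells flake8-style linters to ignore the
               #                "imported but unused" warning on this line

    def legacy_parser(text):  # pragma: no cover  <- tells coverage.py to leave
        # this function out of test-coverage reports
        return text.split()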

Let me give a little background first. I am a biologist turned self-taught programmer who followed the book from the first page to the last (mostly because I couldn't grasp what it meant), started programming, failed and tried again, then went to web-based forums to gain some sense as I learnt to tackle problems of increasing difficulty. I had a pretty good problem (read: very hard!!!) during my PhD (for which I started learning to program), one with enough complexity to make me sweat more and more and more...

Most students don't read the book; they go straight to the web and learn coding, not programming. Yes, there's a difference. Diffuse though it may seem, experienced programmers will immediately recognize what I am referring to. We start by coding and accumulate nifty tricks on our way to becoming a programmer. As coders, we may use a less-than-efficient algorithm that wastes RAM or CPU. Programmers know what works best given the resources at hand (CPU/RAM/type of OS, or portability issues), maintainability (can the original developer, and anyone else, keep the code working and fix bugs without too much pain?) and future development (is the code extensible in a way that allows new features to be added without a lot of refactoring?).

While coders are great at solving small, unit-sized problems very quickly, programmers tend to think more about the problem, identifying the abstract problem type and the ways (or algorithms) to deal with it. Some examples are sorting (too basic and well known), array-array comparisons, etc. In my opinion, programmers design and write code for an optimal solution, while coders do it the one way they know, or google their way to whatever they are supposed to do. Also, programmers are great at debugging2, even though they may not be the fastest at coding or the most elegant (though they do generally come up with nifty and elegant solutions). Coders may be great at execution, but they may find it difficult to debug code, more so when it was written by somebody else.
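
To make the contrast concrete, here is a small Python sketch of an array-array comparison done two ways: the nested scan a coder might reach for first, and the set-based lookup a programmer would switch to once the inputs grow. The function names and sample lists are mine, purely for illustration.

    # common_elements.py - a hypothetical sketch contrasting two ways to find
    # the values shared by two lists (an "array-array comparison").

    def common_naive(a, b):
        # Coder-style: re-scan b for every element of a.
        # Simple and fine for small inputs, but O(len(a) * len(b)) comparisons.
        return [x for x in a if x in b]

    def common_hashed(a, b):
        # Programmer-style: build a set from b once, then use O(1) average lookups.
        # Uses a little more memory, but roughly linear time overall.
        seen = set(b)
        return [x for x in a if x in seen]

    if __name__ == "__main__":
        a = [3, 1, 4, 1, 5, 9, 2, 6]
        b = [2, 7, 1, 8, 2, 8]
        print(common_naive(a, b))    # [1, 1, 2]
        print(common_hashed(a, b))   # [1, 1, 2]

Both versions give the same answer; the difference only shows up when the lists hold millions of entries, which is exactly where the programmer's choice of data structure pays off.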

With a thoughtful process behind acquiring new sub-skills in programming, and an effort to write and debug code in a better, more optimal way, any coder can aspire to become a programmer.

Footnotes:
1. I'll come back to pragmas in a later post. They deserve a dedicated post of their own.
2. As they say - to err is human, to debug divine!

Sunday 24 January 2016

Data parasites vs idea parasites

In their editorial, Longo and Drazen (yeah, the now-infamous NEJM editorial, Data Sharing, 21st Jan 2016) raise an interesting, though misdirected, concern about data sharing. While they articulate the merits of data sharing correctly when they raise the moral issue of honoring the collective sacrifice of the patients who put themselves at risk to generate the data, they conveniently ignore the public funding part when they incorrectly raise the abominable research/data-parasites issue. Due to this oversight, the piece seems to be focused on the assumed unrequited authorship rights, in every subsequent study, of the clinicians who generated the data, instead of honoring the public funding which allowed it to happen or the so-called moral imperative to honor the patients. Their concern is founded in reality when they observe that an independent researcher may not fully understand the nuances of a study design or its parameters, but it takes a wrong turn when they advocate continued reaping of the rewards of a publicly funded project in which the patients put themselves at risk. The assumed importance of data generation is overly inflated and takes centre stage so as to claim full ownership.

While the data generated may be of extreme importance, and collaborations arising out of newer hypotheses may be well intentioned and thoughtfully done in the proposed manner, this in no way represents the only mechanism for doing good or meaningful research. In the field of mass-spectrometry-based proteomics, several groups have been generating data for free public use to facilitate better algorithm development, without asking for anything in return. These groups were funded by public money, and they contributed back to science and society in general. A privately funded project, however, may be outside the purview of my counterclaims, and the authors' views may well hold true there. Then again, in that case the funding body becomes the claimant of data ownership, if it so decides.

The authors also make an incorrect, audacious suggestion about having totally new hypotheses, arguing that one should not build upon the work of others. It arises from the malicious (of course in their opinion only, not mine) practice of data scientists using their data to disprove their hypotheses or claims. A prime example of anti-science. Oh, did it arise out of frustration that many NEJM retractions could be because of enthusiastic data parasites who happened to recheck their claims? I don't know. Science has always progressed by building upon the work of predecessors, lest we keep reinventing the wheel. Using the suggested modus operandi would cripple clinical research; even the discovery of biomarkers would become near impossible, let alone the development of a drug. The suggested recourse to the assumed data-parasitism problem is another lame pitch for authorship. The authors probably forgot, or never heard, this remark by Carl Sagan - "If you wish to make an apple pie from scratch, you must first invent the universe."

In the lines that follow, I rebut their four-point agenda in this new light -
First, starting with a novel idea that isn't an obvious extension of previous work is somewhat of a fallacy in itself. If it were obvious enough, the main group generating the data could/should have done it. Everything in science seems obvious once the idea is shared. On the pretext of data, the generators are trying to be idea parasites.
Second, identifying potential collaborators isn't always necessary, but researchers do collaborate when they need to. Adding the person who generated the data as a collaborator is not wrong unless one starts to force it. It's obvious enough: they aren't happy with citations; only authorship counts in their opinion.
Third, working together to solve a problem is not a novel idea and has always been promoted by funding bodies and scientific administration, but it is appropriate only when required. Just because someone generated the data does not automatically give them insight into how another researcher sees a scientific problem that can be addressed using that data. It is noted, however, that they may be of great help, and the person posing the question is the best person to recognize this. Some researchers may actually be using data incompetently by not involving those who generated it, but such researchers are devaluing their own research. Is the data-parasitism concept even truly valid in this scenario?
Fourth, fighting for co-authorship makes the irony come full circle, because the authors now want to become idea/authorship parasites on the so-called data parasites. Once an idea parasite generates a valuable dataset from public funding and patients' sacrifice, he/she can rake in returns on it continuously by piggybacking on the ideas, hard work, funding and sacrifices of others.

Conflict of interest: I am a data parasite by their definition. I thrive on public data.

(PS: I will update the post with proper references and links. Recheck in a few days if you are interested.)