Sunday 27 November 2016

Coder and Programmer

I asked this question during one interview (update: several, OK maybe all) of prospective bioinformatics students, and to my surprise, a lot of them said they didn't have a clue what a pragma1 is! This doesn't mean they didn't know about the concept; they were simply unaware of the technical term/jargon. Once I told them, most of them knew what I was talking about.
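For the uninitiated, here is a minimal sketch in Perl (the language I work in) of the two pragmas most beginners meet first; a pragma is a compile-time directive that changes how the code after it is treated, rather than importing ordinary functions:

    #!/usr/bin/perl
    # Pragmas are compile-time directives: they change how Perl treats the
    # code that follows rather than providing new functions to call.
    use strict;     # forbid undeclared variables and other unsafe constructs
    use warnings;   # enable compile-time and run-time warnings

    my $sequence = "ATGCGT";          # declared with 'my', as strict requires
    print length($sequence), "\n";    # prints 6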

Let me give a little background first. I am a biologist turned self-taught programmer who followed the book from the first page to the last (mostly because I couldn't grasp what it meant), started programming, failed and tried again, then went on to web-based forums to gain some sense as I learnt to tackle problems of increasing difficulty. I had a pretty good problem (read: very hard!) during my PhD (for which I started learning to program), which had enough complexity to make me sweat more and more and more and more...

Most students don't read the book; they go straight to the web and learn coding, not programming. Yes, there's a difference. Diffuse though it may seem, experienced programmers will immediately recognize what I am referring to. We start by coding and accumulate nifty tricks on our way to becoming a programmer. We may use a less-than-efficient algorithm in terms of RAM or CPU. Programmers know what works best given the resources at hand (CPU/RAM/type of OS or portability issues), maintainability (can the original developer and anyone else keep the code working and fix bugs without too much pain?) and future development (is it extensible in a way that allows new features to be added without a lot of refactoring?).
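To make the RAM point concrete, here is a small Perl sketch under assumed inputs (the file name is made up): a coder's first instinct is often to slurp an entire file into an array, while a programmer mindful of memory streams it line by line.

    use strict;
    use warnings;

    # Slurping pulls the whole file into RAM at once -- fine for small files,
    # risky for a multi-gigabyte sequence file:
    #   my @lines = <$fh>;
    # Streaming reads one line at a time and keeps memory use flat:
    open(my $fh, '<', 'proteins.fasta') or die "Cannot open proteins.fasta: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^>/;   # count FASTA headers without holding the file in memory
    }
    close($fh);
    print "Sequences: $count\n";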

While coders are great at doing small chunks of unit problems very quickly, programmers tend to think more about the problem, identifying the abstract problem type and ways (or algorithms) to deal with it. Some examples are sorting (too basic and well known), array-array comparisons, etc. In my opinion, programmers design and write code for an optimal solution, while coders try to do it the one way they know or google their way through the stuff they are supposed to do. Also, programmers are great at debugging2, even though they may not be the fastest at coding or the most elegant (though they generally do come up with nifty and elegant solutions). Coders may be great at execution, but they may find it difficult to debug code, more so when it was written by somebody else.
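To illustrate the array-array comparison example, here is a hedged Perl sketch (the accession IDs are placeholders, not real data): a coder might compare the two lists with nested loops, while a programmer would index one list in a hash and make a single pass over the other.

    use strict;
    use warnings;

    my @list_a = qw(P12345 Q8N158 O75970 P04637);   # placeholder accession IDs
    my @list_b = qw(P04637 Q9Y6K9 P12345);

    # Nested loops compare every pair of elements: O(n*m) comparisons.
    # Indexing one list in a hash and scanning the other once is O(n+m):
    my %in_a   = map { $_ => 1 } @list_a;
    my @common = grep { $in_a{$_} } @list_b;

    print "Common: @common\n";   # Common: P04637 P12345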

With a thoughtful process behind acquiring newer sub-skills in programming, and by making an effort to write and debug code in a better, more optimal way, any coder can aspire to become a programmer.

Footnotes:
1. I'll come back to pragmas in a later post. They deserve a dedicated post of their own.
2. As they say - to err is human, to debug divine!

Sunday 24 January 2016

Data parasites vs idea parasites

In their editorial, Longo and Drazen (yeah, the now-infamous NEJM editorial, Data Sharing, 21st Jan 2016) raise an interesting, though misdirected, concern about data sharing. While they articulate the merits of data sharing correctly when they raise the moral issue of honoring the collective sacrifice of the patients who put themselves at risk to generate the data, they conveniently ignore the public funding part when they incorrectly raise the abominable research/data parasites issue. Due to this oversight, the piece seems focused on the assumed unrequited authorship rights of the clinicians who generated the data in every subsequent study, instead of honoring the public funding which allowed it to happen or the so-called moral imperative to honor the patients. Their concern is founded in reality when they observe that an independent researcher may not fully understand the nuances of a study design or its parameters; but it takes a wrong turn when they advocate continued reaping of the rewards of a publicly funded project in which the patients put themselves at risk. The assumed importance of data generation is overtly inflated and takes centre stage so as to claim full ownership.

While the data generated may be of extreme importance, and collaborations arising out of newer hypotheses are well-intentioned and thoughtfully done in the proposed manner, it in no way represents the only mechanism for doing good or meaningful research. In the field of mass spectrometry based proteomics, several groups have generated data for free public use to facilitate better algorithm development without asking for anything in return. These groups were funded by public money, and they contributed back to science and society in general. A privately funded project, however, may be outside the purview of my counterclaims, and the authors' views may well hold true there. Yet even then, the funding body becomes the claimant of data ownership, if it so decides.

The authors also make an incorrect, audacious suggestion about requiring totally new hypotheses, arguing in effect that one should not build upon the work of others. It is borne out of the malicious (in their opinion only, of course, not mine) practice of data scientists using their data to disprove their hypotheses or claims. A prime example of anti-science. Oh, did it arise out of frustration that many NEJM retractions could be because of enthusiastic data parasites who happened to recheck their claims? I don't know. Science has always progressed by building upon the work of predecessors, lest we keep reinventing the wheel. Using the suggested modus operandi would cripple clinical research; even the discovery of biomarkers would become near impossible, let alone the development of a drug. The suggested recourse to the assumed data parasitism problem is another lame pitch for authorship. The authors probably forgot/never heard this remark by Carl Sagan - "If you wish to make an apple pie from scratch, you must first invent the universe."

In the lines that follow, I rebut their four-point agenda in this new light -
One, starting with a novel idea that isn't an obvious extension of previous work is somewhat of a fallacy in itself. If it were obvious enough, the main group generating the data could/should have done it. Everything in science seems obvious once the idea is shared. On the pretext of data, the generators are trying to be idea parasites.
Second, identifying potential collaborators isn't always necessary, but researchers do collaborate when they need to. Adding the data-generating person as a collaborator is not wrong unless one starts to force it. It's obvious enough that they aren't happy with citations; only authorship counts in their opinion.
Third, working together to solve a problem is not a novel idea and has always been promoted by funding bodies and scientific administration, but it applies only when required. Just because someone generated data does not automatically give them insight into how another researcher sees a scientific problem that can be addressed using that data. It is noted, however, that the data generator may be of great help, and the question poser is the best person to recognize this. Some researchers may actually be using data incompetently by not involving those who generated it, but such researchers are down-valuing their own research. Is the data parasitism concept even truly valid in this scenario?
Fourth, fighting for a co-authorship makes the irony come full circle, because the authors now want to become idea/authorship parasites on the so-called data parasites. Once an idea parasite generates a valuable dataset from public funding and patients' sacrifice, he/she can cash in on it continuously by piggybacking on the ideas, hard work, funding and sacrifices of others.

Conflict of interest: I am a data parasite by their definition. I thrive on public data.

(PS: I will update the post with proper references and links. Recheck in a few days if you are interested.)

Thursday 4 September 2014

Writing short blog posts

Short is sweet. Well... generally, and more so due to the lack of time we have versus the amount of information available these days. In the digital age, information availability is not as big a problem as the time available for assimilation. I have read several blogs of note, and some of them, despite being good (in content and writing style), require a lot of effort to read in one attempt. This could be a transient phase in which I am struggling to manage my time properly. The conclusion I drew from this experience was to write short blog posts.

So how does this work? What if I have more to say? And does the reader really care?
If the reader is invested in your blog and reads it regularly, it is highly likely that he/she will read posts even if they are long. But to attract and retain new readers, catch their attention quickly with good content, flow and lucid writing. That's what most adept writers know how to do. But AFAIK, most such writers write long, engaging novels with attention to detail. An example is The Da Vinci Code.

This seems counter-intuitive. How does one balance being succinct and engaging at the same time? I am no expert in writing, and my experience is based on whatever I have read: blog posts, news articles, research papers, reviews, novels, fiction and non-fiction books, etc. Follow a scientific-article-like format: a short and to-the-point intro to the theme. Add a bit of history and your own unique perspective on the topic. Do not repeat content much (some writers don't agree with this, and I too break this rule of thumb as needed) and be as clear and unambiguous as possible. Try to wrap up in a few paragraphs; maybe four is more than enough.

Lastly, nothing beats good content. More so in the era of short attention spans versus mountains of information. Be short, but valuable. I attempt to better my writing using this format (in short, manageable chunks) and hopefully write a few things worth reading in the process.

Abbreviations:
AFAIK=As far as I know

Wednesday 7 May 2014

Why Biologists Should Program and How I Survived It

In my opinion, biologists are subjective, programmers are objective (I have been both). Why should a biologist learn to program? Given enough funding, they1 can always hire a programmer, right? Yes, but they will never understand the wizardry a bioinformatician (a pet one in this case) has put in to achieve the result, or whether it (the contribution, also sometimes read as authorship) can be measured in terms of time devoted or the novelty of the method devised or applied.

Programming is the zenith of learning and adventure, where knowledge, skill and fun meet. Yes, I am being geeky here. With the little experience I have with Perl (which is not really considered a great programming language by many folks), I have learnt to think about science in a more systematic and less obfuscated way. Programming sharpens your logical thinking and helps you design better research studies2 and test them as objectively as possible (only if you are honest enough to admit your shortcomings). When you combine descriptive biological thinking with the objectivity and techniques of programming, you gain a new understanding of the scientific world. Most people think it is too difficult to learn to program. I would say it is a lot easier but also a lot harder than you think. Contradictory? Yes, deliberately. It is easier to start if you take baby steps and brave through the period where you do seemingly easy but not practically useful exercises. This period can give you a sense of being totally lost, and you will keep asking: (i) why? and (ii) will I ever be able to make a real program? ...usually followed by the negative self-answers: not my cup of tea / I know other tools on the web that can do it / I have a bioinformatician friend. The art of programming is slow at the beginning and takes some time (and a lot of dedication and honesty) to reach the point where you start making small usable scripts, like parsing files or finding patterns (TF binding sites, restriction sites in a sequence, etc.).
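As a flavour of the kind of small usable script I mean, here is a minimal Perl sketch (the sequence is a toy example, not real data) that finds EcoRI restriction sites (GAATTC) in a DNA string and reports their 1-based positions:

    use strict;
    use warnings;

    my $dna  = "AGCTGAATTCGGATCCTTGAATTCAA";   # toy sequence for illustration
    my $site = "GAATTC";                        # EcoRI recognition site

    # The /g modifier resumes matching where the last match ended,
    # so pos() returns the offset just past each hit.
    while ($dna =~ /\Q$site\E/g) {
        my $start = pos($dna) - length($site) + 1;   # convert to a 1-based start
        print "EcoRI site at position $start\n";     # prints positions 5 and 19
    }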

Once you get over this period through constant practice, it starts to feel like fun. That's when your creativity starts to flow and you design programs in your head and can't wait to do the next cool thing. On the practical side, it is suggested that biologists must learn to program, as the next big boom in science is expected to come from dry-lab methods due to the data avalanche.

I too had a tough time starting to program (in 2007), but after some 7-odd years (2014), I can write software worth thousands of lines of code3 (ProteoStats), taking months of development time. I took up programming as part of my PhD coursework, where my PhD adviser taught Perl. By the time I could understand the command line, the course was over (it was a concise one with 3 classes). I sulked and almost believed I can't program... ever! Then I tried with a book, learnt a little in a few weeks, but my interest again waned as I was not in the habit of sitting at a computer for long hours (which can't be said for the current generation, I think). So, in effect, although I did learn some syntax, I couldn't put it to any use. The problems I tried were either too easy and non-useful or too hard for me, and often I remained clueless about what it means to program. I must also admit that my woes were compounded because I was lost and didn't know when or why to program. Just like any other biologist, I thought that I could do it (some task X) using a web-based tool. Programming just seemed like a waste of time when others could do it for me.

Then one day, I was given a real problem which couldn't be solved by any tool. It was pretty complex and seemingly not cut out for me. I started learning again, quite earnestly, and though it again took some time to come to grips with it, it finally dawned upon me: programming isn't about knowing all the syntax, or being able to do everything at once (or in one attempt). It can be built up by doing small, inane exercises, just like maths. Your problem-solving skills, not your familiarity with the complete syntax, will eventually define whether you can program. All programmers, no matter how adept, Google for answers/syntax/tricks. The idea is to know enough to put to use in small problems, and then keep adding one little tool, nuance, trick or cool piece of code slowly, one at a time. Small successes make you want to program more, and then, my friend, gradually you become a programmer!

Some resources:
Look at Scratch4, a visual programming language developed at MIT which is good for beginners and children.

References:
1. No gender bias here, so I avoided the use of he/she.
2. both theoretical and experimental studies
3. http://sourceforge.net/projects/mssuite/files/ProteoStats/
4. http://scratch.mit.edu/

Thursday 23 January 2014

Google Chrome as a Media Player

Ever had the problem of running so many different programs simultaneously on your laptop that it stops responding? Well, that's a common situation for computer science and bioinformatics people. You may have a browser with dozens of tabs, a music player, an R console, your favorite IDE (text editor), maybe a Word document (if you are writing a manuscript/report), and probably an Excel sheet too.
A nifty feature of the Google Chrome browser is that it can double up as a media player for you. I have not yet tested other browsers, which may also be able to do this, but since I am a heavy Chrome user, I will focus exclusively on that.

Chrome as a music player-
It can easily play mp3 files (I have tested this). This saves me from opening another application (a music player) on an already burdened system. I think it might be able to play other formats too (wma, wav, etc.). If you test/have tested, please add those in the comments section.

Chrome as a video player-
I was just wondering if Chrome can also double up as my video player. Lo and behold, yes it can. I tried mp4 files downloaded from Coursera, an online MOOC platform. It played the files easily. It could also run mkv files but failed with avi files. Not too great, since most videos we (rather, I) have might be in avi format, but pretty good for smartphone/iPod videos (mp4 format).

Try it and let me know !

Update:
1. I forgot to mention that although playing media files is a good feature of Chrome, it lacks the additional functionality some of us may want, like playing in a loop, making and playing a playlist, etc.
2. Chrome can also be used to view common image files (JPG, GIF etc).
3. You can use Chrome for viewing your text/XML files.

Thursday 16 January 2014

Stages of a PhD progression

This is a fun take on how PhD progression occurs, mainly in the Indian context (more specifically, my grad institute), but I presume similarities may occur worldwide. It is written in the context of offering your own ideas or discussing/arguing/contesting what the mentor suggests. (Take it with a pinch of salt.)

Generally, it is of 5 years' duration here - mentors won't allow 4 years easily, nor would you be able to complete the work in that time. Sometimes, even 6 or 7 years can pass by without you noticing.

So here are the Stages of the PhD progression-

1st year - You are naive and don't know what is good or not (scientific-project-wise). You work on all of the mentor's ideas even if they are trash. You don't know when to speak up.

2nd year- You have started to feel you should speak up against some of the (silly) ideas but don't know when. And whenever you do, you get trashed.

3rd year- Having had enough of side tracks, you get bold and start speaking up against (almost) all the ideas the boss suggests. Not knowing when to say what, you always argue (not good for getting a PhD in the long run).

4th year- You develop tact and know when to say what, but it's too late; the damage was done last year. Now that you want a PhD, you try to play ball, using the new-found wisdom just to get it done. The adviser also realizes this and never fails to mention that you need to do XYZ before you can graduate.

5th year- Both of you understand each other like husband and wife. No pretenses work. Conversations are less animated and most of the talking is done by body language only. Both are in "you don't kill me, I won't kill you" mode. The mentor is sharp enough to know you can't take that risk anyway, and tries to get done a few things you were always avoiding but now can't refuse.

6th and 7th year- You are still here? Either you haven't seen it coming (bad planning/tough project) or your mentor is plain greedy. Why hire a post-doc when you can get the work done through a miserable student who will do anything to get a PhD? You think: I have no idea what to do next; let's get anything done so I can graduate. You do almost everything you are asked to, and you argue about as little as you did in the 1st year.

Tab to Search in Google Chrome

Imagine you are a Google Chrome user (of course you are! What? Still using the browser that should not be named here? Please leave my blog immediately!).

You want to do a YouTube search for your favorite music video; let's assume PSY Gangnam Style1.
So you open Google Chrome, type youtube in the omnibox and press Enter. Then you click on the YouTube site in the results and search for PSY Gangnam Style there. Alternatively, you type YouTube PSY Gangnam Style or just PSY Gangnam Style in the omnibox and click on a search result to go to the page. Sounds familiar?

Google Chrome has a feature to cut down on the number of clicks you make here. It is called Tab to Search (of course it's not new, not now at least), and it helps you search a specific site directly from the omnibox. All you need to do is type the site name (youtube), and when the name appears in the suggestions, press the Tab key. If Chrome recognizes the site's search API, you will see a "Search YouTube" prompt. Replace the site name (i.e. youtube) with LinkedIn, Scholar, PubMed, UniProt, NCBI, Yahoo or whatever site you want to search. Type your query and you reach the YouTube search results directly, instead of having to go to the Google results, then the site, and then searching there.

Chrome will not recognize a site for the Tab to Search feature at first. Once you visit the site a few times, Chrome will recognize its search API and you can search it easily.

I thought this feature might be well known, but I was recently surprised to find that I was wrong. Most computationally inclined people may know it already, though.

For in-depth details, visit Chrome's page explaining this feature.


References:
1. Here is the link in case you still can't search it that way: http://www.youtube.com/watch?v=CH1XGdu-hzQ