Wednesday 7 May 2014

Why Biologists Should Program and How I Survived It

In my opinion, biologists are subjective and programmers are objective (I have been both). Why should a biologist learn to program? Given enough funding, they [1] can always hire a programmer, right? Yes, but they will never understand the wizardry a bioinformatician (a pet one in this case) has put in to achieve the result, nor whether that contribution (sometimes read as authorship) should be measured by the time devoted or by the novelty of the method devised or applied.

Programming is the zenith of learning and adventure, where knowledge, skill and fun meet. Yes, I am being geeky here. From my own limited experience with Perl (which many folks do not consider a great programming language), I have learnt to think about science in a more systematic and less obfuscated way. Programming sharpens your logical thinking and helps you design better research studies [2] and test them as objectively as possible (only if you are honest enough to admit their shortcomings). When you combine descriptive biological thinking with the objective techniques of programming, you gain a new understanding of the scientific world.

Most people think it is too difficult to learn to program. I would say it is a lot easier but also a lot harder than you think. Contradictory? Yes, deliberately. It is easier to start if you take baby steps and brave the period in which you do seemingly easy but not practically useful exercises. This period can leave you feeling totally lost, and you will keep asking: (i) Why? and (ii) Will I be able to make a real program? The usual negative self-answers follow: not my cup of tea / I know other tools on the web that can do it / I have a bioinformatician friend. The art of programming is slow at the beginning and takes some time (and a lot of dedication and honesty) to perfect, up to the point where you start making small usable scripts for tasks like parsing files and finding patterns (TF binding sites, restriction sites in a sequence, etc.).
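To give a feel for how small such a "small usable script" can be, here is a minimal sketch that finds restriction sites in a sequence. It is written in Python purely for illustration (my own scripts were in Perl), and the function name find_sites and the example sequence are made up for this post. GAATTC is the recognition site of the EcoRI enzyme.

```python
def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of `site`."""
    # Compare in upper case so that mixed-case sequences still match.
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so that overlapping matches are not missed.
        start = sequence.find(site, start + 1)
    return positions


# Made-up example sequence (an assumption for illustration only).
dna = "ttGAATTCaacgGAATTCgg"
print(find_sites(dna, "GAATTC"))
```

It only looks for exact matches on one strand, so a real tool would still need to handle the reverse strand and degenerate sites, but as a first useful exercise it is about the right size.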

Once you get over this period through constant practice, it feels like fun. That's where your creativity starts to flow: you design programs in your head and can't wait to do the next cool thing. On the practical side, biologists are often advised to learn to program because the next big boom in science is expected to come from dry-lab methods, driven by the data avalanche.

I too had a tough time starting to program (in 2007), but after some seven-odd years (2014), I can write software running to thousands of lines of code (ProteoStats) [3] that takes months of development time. I took up programming as part of my PhD coursework, where my PhD adviser taught Perl. By the time I could understand the command line, the course was over (it was a concise one with 3 classes). I sulked and almost believed I can't program... ever! Then I tried with a book and learnt a little in a few weeks, but my interest waned again as I was not used to sitting at a computer for long hours (which can't be said of the current generation, I think). So, in effect, although I did learn some syntax, I couldn't put it to any use. The problems I tried were either too easy and not useful or too hard for me, and I often remained clueless about what it means to program. I must also admit that my woes were compounded because I was lost and didn't know when or why to program. Just like any other biologist, I too thought that I could do it (some task X) using a web-based tool. Programming just seemed like a waste of time when others could do it for me.

Then one day, I was given a real problem which couldn't be solved by any tool. It was pretty complex and seemingly not cut out for me. So I started learning again, quite earnestly. It again took some time to come to grips with it, and then it dawned upon me: programming isn't about knowing all the syntax, or being able to do everything at once (or rather, in one attempt). It can be built up by doing small, inane exercises, just like maths. Your problem-solving skills, rather than your familiarity with the complete syntax, will eventually define whether you can program. All programmers, no matter how adept, Google for answers/syntax/tricks. The idea is to know enough to put to use on small problems and then keep adding one little tool, nuance, trick or cool piece of code at a time. Small successes make you want to program more, and then, my friend, you gradually become a programmer!

Some resources:
Look at Scratch [4], a visual programming language developed at MIT that is good for beginners and children.

References:
1. No gender bias intended, so I avoided the use of he/she.
2. Both theoretical and experimental studies.
3. http://sourceforge.net/projects/mssuite/files/ProteoStats/
4. http://scratch.mit.edu/