How I started to learn programming

For the Chinese version, see here: 学习编程

In this post, I want to share my experience of learning to program, which has brought great joy to my life.

Earlier, at Tsinghua University (during my bachelor's), I took a C++ course but didn't take it seriously, so afterwards I still couldn't write code at all. In March and April of 2016, we had the Bioinformatics module (in our Master's program), which included a couple of R programming lectures. That's when I began to learn programming seriously.

Currently I am working with R and teaching myself Python. My advice: learn R if you are interested in data analysis (for example, if you are a biologist working with high-throughput data); otherwise, it makes more sense to start with Python, which I find far more intuitive than R and C++. I tried a Coursera course on R programming in 2013 but gave up halfway through, because I didn't realize that the power of R lies in manipulating large datasets, and I had no statistical training at that time.

Resources

As I mentioned above, I had a few hands-on lectures on R programming, which were basically where I started to learn programming. We were provided with R scripts, which I felt were all I needed. It was more efficient to work through these scripts on my own than to be guided by other people. You can find two of them here: Introduction to R: the cell is very different from our world, and Basic string and DNA sequence handling. We had another lecture about transcriptome analysis, but I couldn't find the script on the internet.

A Little Book of R for Bioinformatics is another great starting point. It's a great introduction to the R language as well as bioinformatics (mainly genome analysis), so it also gives you some idea about why we use R. The author (Avril Coghlan) has written more booklets on using R: Biomedical Statistics, Time Series and Multivariate Analysis.

Once you have learned the basics of R, you should build the habit of using the comprehensive built-in help system. At first it was a bit difficult for me to read the help pages because they are often very concise. But after a while, I got used to them and began to appreciate how powerful they are. When you are not familiar with a package, reading its documentation and the help pages of its functions can be really helpful.
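Here is a minimal sketch of a few ways to reach the built-in help from the R console. The functions and the package I picked (`mean`, `sd`, `stats`) are only illustrative choices that ship with every R installation; swap in whatever you are actually working with.

```r
# Open the help page of a function you already know by name
?mean
help("mean")

# Search the installed documentation when you only know a keyword
# (??regression is the shorthand for a single keyword)
help.search("regression")

# List the documented functions of a package (here the bundled 'stats' package)
help(package = "stats")

# Run the examples from the bottom of a help page
example(mean)

# List the long-form tutorials (vignettes) of the installed packages
vignette()

# Peek at a function's arguments without opening the full page
args(sd)
```

The "Usage", "Arguments" and "Examples" sections of a help page are usually the quickest to read; running the examples line by line helped me more than reading the description.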

As always, Google is your friend. GitHub can also be very useful when you are using packages hosted there: developers explain how to install their packages, and you can also study the source code on GitHub.

I learned Python from scratch on Codecademy. It's quite a good interactive introduction to the basics of Python.

UNIX Fundamentals is a great tutorial for the UNIX filesystem and shell commands.

Practice to learn

The best way to learn programming is by doing it. You can try building games (toy games xD), work on simple projects like building a website, or just solve mathematical (/biological/...) problems, which can be easily found online.

Here are a few assignments we had in the Bioinformatics module: homework 1, homework 2, homework 3. A few friends and I did the assignments independently and then discussed them in our Lernkreis (study group). Both steps were very beneficial and full of aha moments. Each time, I discovered something that I hadn't fully understood.

R packages that I love

ggplot2: the best package for data visualization ever. With the package you can easily draw plots that are ready for publication.

dplyr: data manipulation. For example, you can efficiently select a subset of rows or columns in a data frame, add new columns that are functions of existing columns, and so on.

knitr: write reports in R. Documentation is very important because it saves time in the long run, both for the writers themselves and for readers who are working on similar tasks. As a great tool for documentation, knitr lets you combine text and code smoothly.

RColorBrewer: the greatest toy in R :) I am obsessed with the package because I share its aesthetic standards.

tidyr: a great tool to "tidy" data. It works great with ggplot2.
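To give a feeling for how these packages fit together, here is a small sketch using the `iris` dataset that ships with R. The dataset, the variable names (`flowers`, `petals`, `petals_long`) and the `Petal.Ratio` column are my own illustrative choices. It also assumes a reasonably recent tidyr, since `pivot_longer()` was added in tidyr 1.0.0; older code would use `gather()` instead.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Built-in example data: 150 iris flowers, 4 measurements plus Species
flowers <- as_tibble(iris)

# dplyr: keep a subset of rows and columns, then add a derived column
petals <- flowers %>%
  filter(Species != "setosa") %>%
  select(Species, Petal.Length, Petal.Width) %>%
  mutate(Petal.Ratio = Petal.Length / Petal.Width)

# tidyr: reshape from one column per measurement to one row per measurement
petals_long <- petals %>%
  pivot_longer(cols = -Species, names_to = "measurement", values_to = "value")

# ggplot2: the long format maps directly onto aesthetics and facets;
# scale_fill_brewer() takes its colours from the RColorBrewer palettes
p <- ggplot(petals_long, aes(x = Species, y = value, fill = Species)) +
  geom_boxplot() +
  facet_wrap(~ measurement, scales = "free_y") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()
print(p)
```

The reshaping step is the reason tidyr and ggplot2 go so well together: once every measurement sits in its own row, a single `facet_wrap()` call is enough to get one panel per measurement.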


I hope this post helps you a little if you are a beginner like me and are interested in programming :) Happy coding, beautiful nerds!