How to learn scripting languages

I am assuming that you are already relatively comfortable with R or other computer programming languages but in case you are not…

I believe that effectively studying and learning code actually requires a very different approach than studying other subjects. For this reason, I have decided to make a list of approaches/ideas that I think are useful when learning a scripting language like R and that might not be that obvious if you are relatively new to this:

1) Avoid the “copy and paste” approach as much as possible

Although copying and pasting may help you to avoid typing errors, it can also interfere with your learning process for two reasons:

Typing errors can help you gain experience in writing code as R provides informative feedback when you make such mistakes. Making and correcting typing errors is an important skill to develop, particularly when you are typing a lot of code for your own data analysis.
Copying and pasting code may give you the impression that you know what you are doing when, in reality, you probably do not fully understand what the individual lines of code are actually doing. Furthermore, this problem will only get worse as you deal with increasingly long and complicated scripts.
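
As a small illustration of the first point, the sketch below misspells “mean” as “maen” on purpose and captures the message that R returns. The vector “x” and the use of “tryCatch” (only there so that the script keeps running after the error) are my additions for illustration.

```r
x = c(2, 4, 6) # a small vector of numbers

# "maen" is a deliberate typo for "mean"; tryCatch captures the error
# message instead of stopping the script
msg = tryCatch(maen(x), error = function(e) conditionMessage(e))
msg # the message names the function that R could not find ("maen")

fixed = mean(x) # same call with the spelling corrected
fixed
```

Reading the message closely (here, the misspelled function name) is usually the fastest route to the fix.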

2) Study code line-by-line:

The intention of this material is that you follow along, execute the code, and compare your output with ours because “just as one cannot learn martial arts by watching Bruce Lee movies, you can’t learn to program by only reading a book. You have to get in there and throw some punches and, likewise, take some hits” (adapted from McElreath 2020).

In other words, I would like you to run one line of code at a time and make sure that you understand why the output is what it is. If things are not clear, it is important to spend more time with that piece of the code. Here are some tricks that often help in understanding a particular piece of code:

Break a line of code into its components and try to understand the individual pieces. For example, say you are trying to understand the last line in the code below:

tmp=read.csv('final edited.csv',as.is=T)
tmp1=tmp[,grep('w',colnames(tmp))]

The last line of code consists of two functions nested within each other (“colnames” inside “grep”), whose result is then used to subset specific columns. To better understand what is going on, you can break the last line of code into multiple pieces to see how they operate individually.

tmp=read.csv('final edited.csv',as.is=T)

names1=colnames(tmp) 
colnumbers=grep('w',names1) 
tmp1=tmp[,colnumbers]

Finally, it is SUPER helpful to annotate each line as you find out what these pieces do. This keeps you from feeling overwhelmed by so much information and turns the script into a useful resource when adapting this code for other purposes.

tmp=read.csv('final edited.csv',as.is=T)

names1=colnames(tmp) #get names of columns
colnumbers=grep('w',names1) #returns the indexes of the columns that contain "w"
tmp1=tmp[,colnumbers] #selects just the columns that contain "w" in their name

Note: I don’t provide the data “final edited.csv” here because it is not super critical for you to be able to follow this example.
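
If you would like to run the example anyway, one option is to build a small stand-in data frame. The column names and values below are made up for illustration and are not the contents of “final edited.csv”.

```r
# hypothetical stand-in for the data in "final edited.csv"
tmp = data.frame(id = 1:3,
                 w1 = c(0.2, 0.5, 0.3),
                 w2 = c(0.8, 0.5, 0.7),
                 size = c(10, 12, 9))

names1 = colnames(tmp) #get names of columns
colnumbers = grep('w', names1) #indexes of the columns that contain "w"
tmp1 = tmp[, colnumbers] #selects just the columns that contain "w" in their name
colnames(tmp1) #only "w1" and "w2" remain
```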

Perform mini-experiments: create a simpler example in which you can tinker with the code and see what happens to the output. For instance, if it is not clear what the function “grep” does in the example above, you could try modifying it in multiple ways to gain a better understanding of how it works:

vec=c('denis','matt','john','sandra') # create a simpler vector of names
ind=grep('a',vec); ind #these are the items that contain "a" in the vector "vec"

## [1] 2 4

vec[ind] #let's check if this is true

## [1] "matt"   "sandra"

What happens if I change “a” to “e”?

vec=c('denis','matt','john','sandra') # create a simpler vector of names
ind=grep('e',vec); ind #these are the items that contain "e" in the vector "vec"

## [1] 1

vec[ind] #let's check if this is true

## [1] "denis"

In a way, this is not that different from experiments we perform in science, in which we are trying to understand how things work!
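
Two further experiments in the same spirit, offered only as suggestions: searching for a letter that does not appear in any name, and asking “grep” to return the matching items themselves (through its “value” argument) rather than their positions.

```r
vec = c('denis','matt','john','sandra') # same vector of names as before

none = grep('z', vec) # no name contains "z"
length(none) # 0, because grep returns an empty vector when nothing matches

matched = grep('a', vec, value = TRUE) # value=TRUE returns the items, not the indexes
matched # "matt" and "sandra"
```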

3) Take your time learning how to code:

It is important to realize that it takes time to learn how to code. What this implies is that you should not rush to get things done if you want to master this skill. In particular, everybody goes through some level of struggle and frustration when learning how to code.

Importantly, never skim over an R script! Skim reading might be useful to get up-to-date with the news or to read a novel but it is not useful for programming. For computer programming, you really have to pay attention to the details. For example, a single misplaced comma can completely change the instructions that are being given to the computer, leading to hours of frustration trying to figure out what went wrong.
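
To make the comma example concrete, here is a small sketch with a made-up matrix: moving the comma from one side of the index to the other switches from selecting a row to selecting a column.

```r
m = matrix(1:6, nrow = 2) # 2 rows and 3 columns, filled column by column

row1 = m[1, ] # comma after the 1: first row
col1 = m[, 1] # comma before the 1: first column

row1 # 1 3 5
col1 # 1 2
```

Both lines run without any error or warning, which is exactly why this kind of mistake is easy to miss when skimming.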

Once you run into a problem, it might be tempting to immediately ask for help, but I would strongly advise first giving yourself at least 30 minutes to try to figure it out. Learning how to troubleshoot problems and tinker with things until you know what went wrong or how to do something new is a very important skill to develop.

4) Use the internet to find answers:

Everybody (from novice to more experienced users) relies on the internet when they don’t understand something. It is likely that other people have already asked about (and received useful answers to) the problem that you are facing. However, finding the exact piece of information that you need might be hard, especially if you don’t use the correct terms/keywords. Learning how to search for the information that you need is a skill that also takes practice. In particular, Stack Overflow and existing cheatsheets (e.g., https://www.rstudio.com/resources/cheatsheets/) can be very useful resources.

Finally, knowing how to post your question to others is very important:

Make the problem as simple as possible to explain and to reproduce. Few people are going to be willing to help if first they have to understand the >100 lines of code that you have given them.
If the problem is an error message that you don’t understand, providing code that is able to reproduce the error is often critical for others to be able to help you.
If the problem is a particular task that you are trying to perform, it is very useful to have simple examples of the type of input that you have and the type of output that you want.
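
As a rough sketch of what such a question could include, the code below creates a tiny input, writes out the desired output by hand, and uses “dput” to print code that others can paste to recreate the input. The data and the task (summing “value” within each “id”) are invented for illustration.

```r
# tiny made-up input that anyone can recreate by running this line
input = data.frame(id = c(1, 1, 2), value = c(10, 20, 30))

# desired output, typed by hand: one row per id with the sum of "value"
wanted = data.frame(id = c(1, 2), value = c(30, 30))

dput(input) # prints R code that rebuilds "input", ready to paste into a question
```

A question built this way fits in a few lines and lets others test their answer against “wanted” before replying.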

5) Use it or lose it:

If you have ever studied a foreign language, you know what I mean. Like any other language, if you don’t use R regularly (i.e., every day), you will lose it. Even if what you want to do is straightforward and would only take a minute in Excel, use it as an opportunity to practice R instead. If you follow this suggestion, you will slowly internalize several commonly used commands in R, making your interaction with R much more enjoyable and your work much more efficient and reproducible! Furthermore, becoming proficient at simple tasks in R will definitely help you tackle more complicated tasks when the time comes.

References

McElreath, R. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second Edition. CRC Press.