1 Basic concepts in R

1.1 Variables

As explained in the slides, you can create a variable and assign it a value using either the <- or the = operator.
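A minimal sketch of both assignment styles is given here. The variable names and values are made up for illustration only.

```r
# Assign with the arrow operator (the conventional style in R)
x <- 5

# Assign with the equals sign (also valid at the top level)
y = 10

# Variables can hold text as well as numbers
gene_name <- "geneA"   # hypothetical gene name

print(x)
print(y)
print(gene_name)
```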

1.1.1 Arithmetic operations

As we also covered in the last session, you can perform many arithmetic operations on numeric variables.
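As a small sketch, here are the common arithmetic operators applied to two numeric variables. The names a and b and their values are only placeholders.

```r
a <- 7
b <- 2

a + b     # addition: 9
a - b     # subtraction: 5
a * b     # multiplication: 14
a / b     # division: 3.5
a ^ b     # exponentiation: 49
a %% b    # remainder after division (modulo): 1
a %/% b   # integer division: 3
```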

1.2 Vectors

You can create a vector directly by using the c() function. Here c stands for combine: it concatenates its arguments into a single vector.

With the vector n, you can try some of the basic mathematical and statistical functions like log(), mean(), max() and so on, as explained in the slides, to get the hang of it!
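Here is a small sketch of what this could look like. The values stored in n are an assumption (any numeric values would do).

```r
# Create a numeric vector with c(); these values are made up for illustration
n <- c(2, 4, 8, 16, 32)

length(n)   # number of elements: 5
mean(n)     # arithmetic mean: 12.4
max(n)      # largest value: 32
min(n)      # smallest value: 2
sum(n)      # sum of all elements: 62
log(n)      # natural logarithm of each element
log2(n)     # base-2 logarithm of each element
n * 2       # arithmetic is applied to every element of the vector
```

Notice that functions like log() return one value per element, while mean() and max() summarize the whole vector into a single value.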

1.3 String operations

Here I will introduce some built-in functions to manipulate strings! Some of these functions can be very handy when you work with tables that contain different strings and you want to search or edit them!

Remember to use ? on the different functions to understand them better.

Note: Almost all of these string functions are much more powerful than what the basic examples show. When you look at the help page of many of these functions, you will see pattern as an argument. This stands for a regular expression (REGEX) pattern, which you can learn to use to capture string patterns in large texts or tables! I will not go further into this, as REGEX alone could be a course on its own!
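The sketch below shows a handful of base R string functions. Which functions to show is my own selection, and the vector genes contains made-up names, so treat it as an illustration rather than a complete list.

```r
# A toy character vector (hypothetical names)
genes <- c("geneA_human", "geneB_mouse", "geneC_human")

nchar(genes)                      # number of characters in each string: 11 11 11
toupper(genes)                    # convert to upper case
paste("sample", 1:3, sep = "_")   # join strings: "sample_1" "sample_2" "sample_3"
substr(genes, 1, 5)               # characters 1 to 5: "geneA" "geneB" "geneC"
strsplit(genes, "_")              # split at the underscore (returns a list)

grepl("human", genes)             # does each string contain "human"? TRUE FALSE TRUE
grep("human", genes)              # positions of the matches: 1 3
sub("_human", "", genes)          # replace the first match in each string with nothing

# A small taste of REGEX: "^gene[AB]" matches strings that start with geneA or geneB
grepl("^gene[AB]", genes)         # TRUE TRUE FALSE
```

Try for example ?grep or ?substr to see the other arguments these functions accept.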

2 Reading in files to dataframes

You can generally import a CSV or TSV file into the R environment using the read.table() function. This automatically creates an object of type data.frame, which stores tabular data (like an Excel sheet) in a structured way!

You can download the files here: genes and metadata.

Note: If you run ?read.table, you will see that this function has many other arguments that let you read in a file exactly the way you want. For example, skip can be used to skip a certain number of lines at the start of your file! This is helpful if, for example, you have a counts table with some comment lines at the beginning!
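To keep this sketch self-contained, it first writes a tiny tab-separated file to a temporary location and then reads it back in. The file contents, the column names and the object name counts are all made up for illustration. With the course files you would pass their path to read.table() instead.

```r
# Write a small toy TSV file with two free-text lines before the header
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "Toy counts table for illustration only",
  "Created by hand",
  "Gene\tSample1\tSample2",
  "geneA\t10\t12",
  "geneB\t0\t5",
  "geneC\t7\t3"
), tsv_file)

# skip = 2 ignores the first two lines,
# header = TRUE uses the next line as column names,
# sep = "\t" tells R that the columns are separated by tabs
counts <- read.table(tsv_file, header = TRUE, sep = "\t", skip = 2,
                     stringsAsFactors = FALSE)

class(counts)   # "data.frame"
head(counts)    # show the first rows
```

For a comma-separated file you would use sep = "," (or the related function read.csv()).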

2.1 Looking inside a dataframe

  1. You can use functions like names(), which gives you all the column names of a data frame, and summary(), which summarizes each column of the data.frame based on the data it contains.

  2. You can access the different columns in the data.frame using the $ sign. The result is a vector of the values in that column. You can think of this as selecting a particular column in an Excel sheet!

  3. The functions row.names() and colnames() can be used to get or set the row headers and column headers, as the names suggest! This is useful, for example, if the file you imported did not have headers or row names. In the metadata dataframe, we did not include row names! So, we add row names based on one of the columns!

  4. You can find out what kind of datatype your object is by using either class() or str(), which stands for the structure of your R object!
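The four points above can be sketched as follows. The metadata data frame built here is a toy stand-in with made-up columns and values, not the actual course file.

```r
# A toy metadata table (hypothetical samples and values)
metadata <- data.frame(
  Sample = c("S1", "S2", "S3", "S4"),
  Health = c("yes", "no", "yes", "no"),
  Age    = c(34, 51, 29, 46),
  stringsAsFactors = FALSE
)

names(metadata)      # column names: "Sample" "Health" "Age"
summary(metadata)    # per-column summary

metadata$Age         # one column as a vector: 34 51 29 46

# Use the Sample column as row names
row.names(metadata) <- metadata$Sample
row.names(metadata)  # "S1" "S2" "S3" "S4"
colnames(metadata)   # same as names() for a data frame

class(metadata)      # "data.frame"
str(metadata)        # compact overview of the structure and column types
```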

2.2 Difference between a matrix and a dataframe

  1. It is good to know the difference between a matrix and a dataframe! A matrix can only hold values of a single datatype (for example all numeric), whereas a dataframe can hold a different datatype in each column. You can convert between these formats by using the as.matrix() or as.data.frame() functions!

  2. You can also access and modify the contents of a dataframe or a matrix by using the syntax [row,column]. Here you can specify either the numbers or the names of the rows and columns.

  3. Similarly, you can also access entire rows with the [row,] syntax or entire columns with the [,column] syntax.

  4. You can also subset your dataframes with the help of the c() function, by giving several row or column numbers (or names) inside the brackets.

  5. With this syntax, you can also change the contents by assigning a new value with <- or =.
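A short sketch of the points in this list, using a toy counts table whose gene names, sample names and values are made up for illustration:

```r
# A toy counts table with row names (hypothetical data)
counts <- data.frame(
  Sample1 = c(10, 0, 7),
  Sample2 = c(12, 5, 3),
  row.names = c("geneA", "geneB", "geneC")
)

# Convert between the two formats
counts_mat <- as.matrix(counts)         # all values now share one datatype (numeric)
counts_df  <- as.data.frame(counts_mat)
is.matrix(counts_mat)                   # TRUE
is.data.frame(counts_df)                # TRUE

# Access a single value by number or by name
counts[2, 1]                            # 0
counts["geneB", "Sample1"]              # 0

# Access an entire row or an entire column
counts[2, ]                             # the row for geneB
counts[, "Sample2"]                     # the Sample2 column as a vector: 12 5 3

# Subset several rows at once with c()
counts[c(1, 3), ]                       # rows 1 and 3
counts[c("geneA", "geneC"), "Sample2"]  # 12 3

# Change a value
counts["geneB", "Sample1"] <- 4
counts
```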

2.3 Adding and removing from a dataframe

  1. You can add and remove row(s) or column(s) of a dataframe manually with <-

  2. or with cbind() or rbind(), standing for column-bind and row-bind. In this case you make a new dataframe/vector and you bind it to the existing data.

  3. You can remove a row or column from a dataframe by using the - sign in front of its number!
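The following sketch walks through these options on a toy table. All names and values are hypothetical.

```r
# A toy counts table (hypothetical data)
counts <- data.frame(
  Gene    = c("geneA", "geneB", "geneC"),
  Sample1 = c(10, 0, 7),
  Sample2 = c(12, 5, 3),
  stringsAsFactors = FALSE
)

# Add a column manually with <-
counts$Sample3 <- c(8, 1, 9)

# Add a column with cbind() (column-bind)
counts <- cbind(counts, Sample4 = c(11, 2, 6))

# Add a row with rbind() (row-bind); the new row needs the same column names
new_row <- data.frame(Gene = "geneD", Sample1 = 4, Sample2 = 4,
                      Sample3 = 4, Sample4 = 4, stringsAsFactors = FALSE)
counts <- rbind(counts, new_row)
dim(counts)         # 4 rows, 5 columns

# Remove the second row and the third column with the - sign
counts_small <- counts[-2, -3]
dim(counts_small)   # 3 rows, 4 columns
counts_small
```

Note that the - sign does not change counts itself. You get a smaller copy, which you can keep under a new name or assign back to counts.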

2.4 Checking for NA values

  1. There are ways to check for NA values in your dataset! For example, you can use the is.na() function, and in combination with which() it will tell you exactly where the NA values are!

  2. You can also remove rows that contain NA by using na.omit().
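As a small sketch, here is a toy table with two missing values. The data are made up for illustration.

```r
# A toy counts table with missing values (hypothetical data)
counts <- data.frame(
  Sample1 = c(10, NA, 7),
  Sample2 = c(12, 5, NA),
  row.names = c("geneA", "geneB", "geneC")
)

is.na(counts)                          # TRUE wherever a value is missing
sum(is.na(counts))                     # total number of NA values: 2
which(is.na(counts))                   # positions, counted column by column: 2 6
which(is.na(counts), arr.ind = TRUE)   # row and column number of each NA

# Keep only the rows without any NA
counts_complete <- na.omit(counts)
counts_complete                        # only geneA is left
```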

2.5 Looking for specific values in a dataframe and subsetting

  1. You can use the %in% operator to look for specific values in your dataframe.
  2. Let’s say you are looking for particular genes in your dataset; then you can use this operator to subset as well.
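A minimal sketch of this idea, with a made-up table and a made-up list of genes of interest:

```r
# A toy counts table (hypothetical data)
counts <- data.frame(
  Gene    = c("geneA", "geneB", "geneC", "geneD"),
  Sample1 = c(10, 0, 7, 4),
  stringsAsFactors = FALSE
)

# The genes we want to look for; geneX is not in the table
genes_of_interest <- c("geneB", "geneD", "geneX")

# For each gene in the table: is it one of the genes of interest?
counts$Gene %in% genes_of_interest    # FALSE TRUE FALSE TRUE

# Use the logical vector to keep only the matching rows
subset_counts <- counts[counts$Gene %in% genes_of_interest, ]
subset_counts

# The other way around: which genes of interest are present in the table?
genes_of_interest %in% counts$Gene    # TRUE TRUE FALSE
```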

3 Factors

  1. Factors are very important for statistical calculations. They basically represent the different levels (categories) that are present in your data. For example, Health in our metadata is a factor with two levels, yes and no. It is basically a binary factor. But you can have many levels as well!

  2. If there is a column in your dataframe that you want to make a factor, you can do this with the as.factor() function.

  3. The function table() tabulates observations and can be used to create bar plots quickly.
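Here is a sketch of these three points. The metadata table is a toy stand-in with made-up samples; only the Health column with its yes and no values follows the description above.

```r
# A toy metadata table (hypothetical samples)
metadata <- data.frame(
  Sample = c("S1", "S2", "S3", "S4", "S5"),
  Health = c("yes", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Turn the Health column into a factor
metadata$Health <- as.factor(metadata$Health)
class(metadata$Health)     # "factor"
levels(metadata$Health)    # "no" "yes" (levels are sorted alphabetically)
nlevels(metadata$Health)   # 2

# Count the observations per level
health_counts <- table(metadata$Health)
health_counts              # no: 2, yes: 3

# A quick bar plot of the counts
barplot(health_counts, xlab = "Health", ylab = "Number of samples")
```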

4 Basic plotting

  1. I also want to show you how you can do some basic plotting using the functions plot(), boxplot() and hist().
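As a rough sketch, the three plotting functions can be tried on simulated data. The values below are random numbers generated only for illustration, and the names expr_values and group are hypothetical.

```r
# Simulate toy data: 50 values and a two-level grouping (not real measurements)
set.seed(1)   # makes the random numbers reproducible
expr_values <- rnorm(50, mean = 10, sd = 2)
group <- rep(c("yes", "no"), each = 25)

# Scatter plot of the values in the order they appear
plot(expr_values, main = "Values by index",
     xlab = "Index", ylab = "Value")

# Box plot comparing the two groups
boxplot(expr_values ~ group, main = "Values per group",
        xlab = "Health", ylab = "Value")

# Histogram showing the distribution of all values
hist(expr_values, main = "Distribution of values", xlab = "Value")
```

Arguments like main, xlab and ylab set the title and axis labels, and they work the same way in all three functions.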

Hope you guys learnt some new ways of handling data in R and, most importantly, had fun. You need to remember: