RNA sequencing data analysis using R and the Artemis HPC

This course is an introduction to RNA-seq data analysis using the Artemis HPC and R. We will cover the processes of:

  1. Obtaining sequencing data (in FASTQ or another format)
  2. Generating a count table
  3. Generating a list of differentially expressed genes
  4. Pathway analysis basics

Target audience

Students and staff in the life sciences who would like to analyse their own RNA-seq data.

Prerequisites

You will need:

  1. Your own laptop, with R, RStudio, Bioconductor and several other key libraries installed.
  2. A University of Sydney Unikey (to access the Artemis HPC).
  3. A text editor, such as Sublime Text, Notepad++ (Windows only), Visual Studio Code or Atom.
  4. A terminal application, such as the built-in terminal on a Mac or Linux machine, or Git Bash on Windows.

Introductory slides

Outline

1. Assessing the quality of your sequencing data
     What is the first step of any sequencing data analysis?
2. Building a genome index
     What is the first step of mapping data?
     How do I find reference genomes and transcriptomes for my species?
3. Map reads
     How do I map my data on the Artemis HPC?
     What is a PBS script?
     How do I interpret the PBS logs?
     How do I interpret the mapping logs?
4. Loading a count table into R
     How do we get our count table into R?
     Was our data generated in a strand-specific manner?
5. Exploratory data analysis of a count table
     How do we filter out lowly expressed genes?
     How can we reveal batch effects in our data?
6. Differential gene expression analysis
     How can we carry out DGEA on a count table?
     How can we make volcano plots and Venn diagrams in R?
     How do we annotate our count table?
     How can we make publication-quality graphics?
7. Basic pathway analysis using ToppGene
     How do we quickly assess what the gene list we obtained actually means?
     What online tools can we use for pathway analysis?
     What are the limitations of online pathway analysis tools?