Key Points

Assessing the quality of your sequencing data
  • Running FastQC to assess read quality is the first step in any sequencing data analysis that starts from FASTQ files.
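One of the reports FastQC produces is per-base sequence quality. To illustrate what that metric summarises, here is a minimal Python sketch (not FastQC itself) that decodes quality strings and averages them at each read position. It assumes Phred+33 encoding and plain four-line FASTQ records. The function name and the two reads are hypothetical examples.

```python
# Minimal sketch (not FastQC itself): mean Phred quality at each read position,
# the quantity behind FastQC's "Per base sequence quality" report.
# Assumes Phred+33 encoding and uncompressed four-line FASTQ records.
from io import StringIO


def mean_quality_per_position(handle):
    """Return the mean Phred score at each position across all reads."""
    totals, counts = [], []
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        handle.readline()              # sequence line (not needed here)
        handle.readline()              # '+' separator line
        quality = handle.readline().rstrip("\n")
        for pos, char in enumerate(quality):
            if pos == len(totals):     # first read long enough to reach pos
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - 33  # Phred+33: '!' = 0, 'I' = 40
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads; the second has low-quality bases at its 3' end.
reads = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII#!\n")
print(mean_quality_per_position(reads))  # [40.0, 40.0, 21.0, 20.0]
```

A drop in mean quality towards the 3' end, as in the second toy read, is the kind of pattern the per-base quality plot makes visible.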

Building a genome index
  • Mapping RNA-seq data requires a splice-aware mapper, because many reads span exon-exon junctions.

  • The first step of mapping sequencing data is to build a genome index.

  • This involves choosing the appropriate reference genome and annotation files, and making sure the chromosome names in them match.
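A chromosome naming mismatch (for example Ensembl-style `1` in one file and UCSC-style `chr1` in the other) is easy to check before building the index. The following is a minimal Python sketch, assuming an uncompressed FASTA and GTF. The helper names and the tiny inline files are hypothetical; real files would be opened from disk.

```python
# Minimal sketch: compare sequence names in a reference FASTA with the
# seqnames (first column) of a GTF annotation. Inline files are hypothetical.
from io import StringIO


def fasta_names(handle):
    """Names from FASTA header lines ('>chr1 description' -> 'chr1')."""
    return {line[1:].split()[0] for line in handle if line.startswith(">")}


def gtf_seqnames(handle):
    """First tab-separated column of every non-comment, non-blank GTF line."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t")[0])
    return names


fasta = StringIO(">chr1 some description\nACGT\n>chr2\nGGCC\n")
gtf = StringIO(
    "#!genome-build example\n"
    "1\tsource\texon\t1\t4\t.\t+\t.\tgene_id \"g1\";\n"
    "chr2\tsource\texon\t1\t4\t.\t-\t.\tgene_id \"g2\";\n"
)
fa, gt = fasta_names(fasta), gtf_seqnames(gtf)
print("only in GTF:", sorted(gt - fa))    # only in GTF: ['1']
print("only in FASTA:", sorted(fa - gt))  # only in FASTA: ['chr1']
```

Any name reported as "only in GTF" means annotation on that sequence will not match the genome, which is worth fixing before indexing.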

Map reads
  • We use STAR to map the reads on Artemis.
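After mapping, STAR writes a per-sample summary to `Log.final.out`. Below is a minimal Python sketch for pulling out the input read count and the uniquely mapped percentage. It assumes the file's `name | value` line layout, and the inline log excerpt is made up.

```python
# Minimal sketch: read STAR's Log.final.out into a dict and pull out the
# uniquely mapped percentage. Assumes "name |<tab>value" lines; the excerpt
# below is made up.
from io import StringIO


def parse_star_log(handle):
    """Map each 'name | value' line to {name: value}; skip section titles."""
    stats = {}
    for line in handle:
        if "|" not in line:
            continue  # e.g. "UNIQUE READS:" section headings
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats


log = StringIO(
    "                          Number of input reads |\t1000000\n"
    "                                    UNIQUE READS:\n"
    "                   Uniquely mapped reads number |\t900000\n"
    "                        Uniquely mapped reads % |\t90.00%\n"
)
stats = parse_star_log(log)
unique_pct = float(stats["Uniquely mapped reads %"].rstrip("%"))
print(stats["Number of input reads"], unique_pct)  # 1000000 90.0
```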

Loading a count table into R
  • We have set up an RStudio project and loaded our count table into R.

  • We have established that the data were generated using a strand-specific protocol.
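One common way to check strandedness is to compare the three count columns in STAR's `ReadsPerGene.out.tab` (written when STAR is run with `--quantMode GeneCounts`). Column 2 holds unstranded counts, column 3 counts for the 1st read strand aligned with the RNA (like htseq-count `-s yes`), and column 4 counts for the 2nd read strand (like `-s reverse`). Whether the lesson used this file is an assumption. The sketch below sums those columns while skipping the `N_` summary rows, and applies hypothetical 0.9/0.1 cut-offs.

```python
# Minimal sketch: infer library strandedness from STAR's ReadsPerGene.out.tab.
# Columns: gene ID, unstranded, 1st read strand = RNA strand (htseq-count -s yes),
# 2nd read strand = RNA strand (htseq-count -s reverse).
from io import StringIO


def strand_totals(handle):
    """Sum count columns 2-4 over genes, skipping the N_* summary rows."""
    totals = [0, 0, 0]
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("N_"):
            continue  # N_unmapped, N_multimapping, N_noFeature, N_ambiguous
        for i in range(3):
            totals[i] += int(fields[i + 1])
    return dict(zip(["unstranded", "yes", "reverse"], totals))


def guess_strandedness(totals, high=0.9, low=0.1):
    """Classify by the share of stranded counts in column 3 (hypothetical cut-offs)."""
    share = totals["yes"] / (totals["yes"] + totals["reverse"])
    if share >= high:
        return "stranded (yes)"
    if share <= low:
        return "reverse-stranded"
    return "unstranded"


# Hypothetical excerpt of a ReadsPerGene.out.tab file.
table = StringIO(
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t20\t500\t480\n"
    "N_ambiguous\t30\t5\t6\n"
    "geneA\t100\t2\t98\n"
    "geneB\t200\t5\t195\n"
)
totals = strand_totals(table)
print(totals)                      # {'unstranded': 300, 'yes': 7, 'reverse': 293}
print(guess_strandedness(totals))  # reverse-stranded
```

For an unstranded library, columns 3 and 4 would each hold roughly half of the gene-assigned counts.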

Exploratory data analysis of a count table
  • We must filter out lowly expressed genes prior to differential gene expression analysis (DGEA).

  • Before any DGEA, we must use unsupervised learning techniques to check that no batch effects or other confounding factors affect the experiment as a whole.

  • PCA and hierarchical clustering can be used for this.

  • If the PCA and/or clustering reveal problems, the differential expression analysis will still report a list of genes, but that list will most likely NOT be reliable or accurate unless these issues are taken into account.

  • There are tools in R to account for both known batch effects (e.g. samples collected at different time points) and unexplained ones.
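The filtering and clustering steps above can be sketched in a few lines of Python. This is not the lesson's R code. It uses a simple counts-per-million (CPM) threshold, a log2(CPM + 1) transform and average-linkage clustering on Euclidean distances, all chosen for illustration; PCA is not shown. Sample names, counts and thresholds are hypothetical.

```python
# Minimal sketch: CPM-based filtering of lowly expressed genes, then
# average-linkage hierarchical clustering of samples on log2(CPM + 1).
# Thresholds, the log transform and the toy counts are all hypothetical.
import math
from itertools import combinations


def cpm(counts):
    """counts: {gene: [count per sample]} -> {gene: [counts per million]}."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for g, row in counts.items()}


def keep_expressed(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    return [g for g, values in cpm(counts).items()
            if sum(v >= min_cpm for v in values) >= min_samples]


def average_linkage(vectors):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean Euclidean distance over all sample pairs across the two clusters.
            d = sum(math.dist(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
counts = {
    "geneA": [100, 110, 10, 12],
    "geneB": [10, 12, 100, 95],
    "geneC": [1_000_000, 1_000_000, 1_000_000, 1_000_000],
    "geneD": [0, 1, 0, 0],  # lowly expressed: should be filtered out
}
kept = keep_expressed(counts)
values = cpm(counts)
log_cpm = {g: [math.log2(v + 1) for v in values[g]] for g in kept}
vectors = {s: [log_cpm[g][j] for g in kept] for j, s in enumerate(samples)}

print("kept:", kept)
for a, b, d in average_linkage(vectors):
    print(a, "+", b, f"at distance {d:.2f}")
```

In this toy data the replicates of each condition merge first. A sample that joined the "wrong" group would be the kind of warning sign these key points describe.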

Differential gene expression analysis
  • We have carried out differential gene expression analysis with limma.

  • We can visualise the results using both limma's built-in plotting functions and external R packages.
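limma's `topTable` reports adjusted p-values, using the Benjamini-Hochberg (BH) method by default. To show what that adjustment does, here is a minimal Python re-implementation of BH followed by a simple up/down call. The p-values, log fold changes and the 0.05 cut-off are hypothetical, and this does not reproduce limma's moderated t-statistics.

```python
# Minimal sketch of Benjamini-Hochberg (BH) adjustment, plus a simple
# up/down/not-significant call. p-values, logFCs and alpha are hypothetical.


def bh_adjust(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the original p-values (and <= 1).
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_genes(log_fc, adj_p, alpha=0.05):
    """Label each gene 'up', 'down' or 'ns' from its logFC sign and adjusted p-value."""
    labels = []
    for fc, p in zip(log_fc, adj_p):
        if p >= alpha:
            labels.append("ns")
        else:
            labels.append("up" if fc > 0 else "down")
    return labels


pvals = [0.01, 0.20, 0.03, 0.005]
log_fc = [1.5, -0.3, 0.8, -2.1]
adj = bh_adjust(pvals)
print([round(p, 4) for p in adj])  # [0.02, 0.2, 0.04, 0.02]
print(call_genes(log_fc, adj))     # ['up', 'ns', 'up', 'down']
```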

Basic pathway analysis using ToppGene
  • Exploratory pathway analysis can be performed using a wide range of online tools.

  • ToppGene lets us quickly see which pathways and biological functions are enriched among our differentially expressed genes.

  • If a formal pathway analysis is needed, limma functions such as goana and camera fit neatly into the same workflow.
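Over-representation tests such as limma's `goana` are based on a one-sided hypergeometric test. The question is how surprising the overlap between the DE genes and a gene set is, given all the genes tested (the universe). A minimal Python sketch of that calculation for a single gene set follows. The gene names are hypothetical, and it does not implement `camera`, which is a competitive gene set test that accounts for inter-gene correlation.

```python
# Minimal sketch of an over-representation test for one gene set using the
# one-sided hypergeometric tail probability. Gene names are hypothetical.
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def enrichment_p(de_genes, gene_set, universe):
    """p-value for gene_set being over-represented among de_genes."""
    de = de_genes & universe
    members = gene_set & universe
    overlap = len(de & members)
    return hypergeom_sf(overlap, len(universe), len(members), len(de))


universe = {f"gene{i}" for i in range(20)}               # all tested genes
pathway = {"gene0", "gene1", "gene2", "gene3", "gene4"}  # hypothetical gene set
de_genes = {"gene0", "gene1", "gene2", "gene10"}         # hypothetical DE genes
print(round(enrichment_p(de_genes, pathway, universe), 4))  # 0.032
```

In practice the test is repeated for every gene set and the resulting p-values are corrected for multiple testing.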

Glossary

FIXME