Assessing the quality of your sequencing data
|
|
Building a genome index
|
Mapping RNA-seq data requires using splicing-aware mappers.
The first step of mapping sequencing data is to build a genome index.
This involves identifying which reference genome file and annotation you need, and making sure the chromosome names in them match.
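The chromosome-name check mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not the lesson's own method: the helper names `fasta_sequence_names` and `gtf_sequence_names` are hypothetical, and the two tiny inputs are made up so the sketch runs without any real files. It assumes the usual conventions that a FASTA header's name is the first token after `>` and that a GTF stores the sequence name in its first tab-separated column.

```python
import io


def fasta_sequence_names(handle):
    """Collect sequence names from the '>' header lines of a FASTA file."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_sequence_names(handle):
    """Collect sequence names from the first column of a GTF file."""
    names = set()
    for line in handle:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Made-up example inputs: the annotation uses "1" where the genome uses "chr1".
fasta = io.StringIO(">chr1 example description\nACGT\n>chr2\nGGCC\n")
gtf = io.StringIO(
    "#!made-up annotation\n"
    '1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "g1";\n'
    'chr2\tsrc\tgene\t1\t4\t.\t-\t.\tgene_id "g2";\n'
)

fasta_names = fasta_sequence_names(fasta)
gtf_names = gtf_sequence_names(gtf)

# Names used by the annotation that the genome does not contain.
missing_from_fasta = sorted(gtf_names - fasta_names)
shared = sorted(gtf_names & fasta_names)

print("Shared names:", shared)
print("In annotation but not in genome:", missing_from_fasta)
```

With real data, the two `io.StringIO` objects would be replaced by open file handles. Any name reported as missing from the genome suggests the two files follow different naming schemes and should be reconciled before the index is built.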
|
Map reads
|
|
Loading a count table into R
|
We have loaded our count table into R, and set up an RStudio project
We have identified that the data was generated in a strand-specific manner
|
Exploratory data analysis of a count table
|
We must filter out lowly expressed genes prior to differential gene expression analysis (DGEA)
Prior to any DGEA, we must take advantage of unsupervised learning techniques to ensure that no batch effects or other confounding issues affect our experiment as a whole
PCA and hierarchical clustering can be used to achieve this
If the PCA and/or clustering reveal issues, the differential expression analysis will still report a list of genes, but that list will most likely NOT be reliable or accurate unless these issues are taken into account
There are tools in R to account for both known batch effects (e.g. different time points) and unexplained ones
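To make the low-expression filter mentioned above concrete, here is a minimal sketch in Python. It is an illustration of one common rule of thumb (keep a gene if its counts per million exceed a threshold in at least a minimum number of samples), not necessarily the filter the lesson applies in R. The function names, the thresholds `min_cpm=1.0` and `min_samples=2`, and the toy count table are all assumptions chosen for the example.

```python
def counts_per_million(counts):
    """Convert raw counts (gene -> list of per-sample counts) to CPM.

    Assumes every sample has a non-zero library size.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts in each sample (column sums).
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [1e6 * c / size for c, size in zip(row, library_sizes)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(value > min_cpm for value in cpm[gene]) >= min_samples
    }


# Made-up count table: four samples, each with a library size of 1,000,000,
# so that CPM values equal the raw counts and are easy to check by eye.
counts = {
    "geneA": [500000, 400000, 600000, 500000],
    "geneB": [500000, 599999, 400000, 499998],
    "geneC": [0, 1, 0, 2],
}

kept = filter_low_expression(counts)
print("Genes kept:", sorted(kept))
```

Here `geneC` exceeds 1 CPM in only one sample, so it is dropped, while `geneA` and `geneB` are retained. The thresholds should be chosen with the experimental design in mind, for example setting `min_samples` to the size of the smallest group.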
|
Differential gene expression analysis
|
In this section, we have carried out differential gene expression analysis
We can visualise the results using both the plotting functions built into limma and external R packages
|
Basic pathway analysis using ToppGene
|
Exploratory pathway analysis can be performed using a wide range of online tools
ToppGene allows us to quickly assess which pathways and functions are enriched in our gene list
If a formal pathway analysis needs to be carried out, tools like goana and camera nicely fit within the limma ecosystem
|