Quality Control

Learning Objectives

FastQC

Next Generation Sequencing can produce a large number of reads in each experiment, giving low-cost and in-depth information about the underlying RNA or DNA sample. However, every platform will produce errors (incorrect nucleotides in the sequence). Hence, quality control is an important step in data analysis. FastQC provides several modules to assess the quality of sequencing data, including sequence quality, GC content, per base sequence content, and adapter content.

To run FastQC we will need to load the FastQC module:

module load fastqc/0.11.9

Now that we have it loaded we can run FastQC on our sequencing data:

fastqc *.fastq.gz
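As an optional variant of the command above, the sketch below writes the reports into a separate directory and processes two files at once. The directory name `fastqc_results` and the thread count are assumptions chosen for illustration; `-o` and `-t` are FastQC's output-directory and threads options, and FastQC expects the output directory to exist already.

```shell
# Hypothetical output directory and thread count; adjust both for your system.
outdir="fastqc_results"
threads=2

# FastQC does not create the output directory itself, so make it first.
mkdir -p "$outdir"

# Only run when fastqc is available and there are FASTQ files to process.
if command -v fastqc >/dev/null 2>&1 && ls ./*.fastq.gz >/dev/null 2>&1; then
    # -o: where to write the .html and .zip reports
    # -t: number of files processed simultaneously
    fastqc -o "$outdir" -t "$threads" ./*.fastq.gz
else
    echo "Skipping: fastqc is not on PATH or no .fastq.gz files are here" >&2
fi
```

With this variant the `.html` and `.zip` reports land in `fastqc_results` rather than next to the FASTQ files, so the paths used in the rest of this page would need that prefix.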

After running FastQC, you will notice several files in your directory:

ls
SRR15607266_pass_1_fastqc.html  SRR15607266_pass_1.fastq.gz     SRR15607266_pass_2_fastqc.zip
SRR15607266_pass_1_fastqc.zip   SRR15607266_pass_2_fastqc.html  SRR15607266_pass_2.fastq.gz
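The `.zip` archives hold the same results in text form, which can be handy for a quick check from the terminal. Each archive contains a `summary.txt` with one tab-separated line per module (status, module name, input file). The sketch below lists the modules that did not pass; the function name `flagged_modules` and the scratch file `summary_1.txt` are hypothetical names introduced here.

```shell
# Print the modules whose status is not PASS, given a FastQC summary.txt.
# Columns are tab-separated: status, module name, input file name.
flagged_modules() {
    awk -F'\t' '$1 != "PASS" { print $1 ": " $2 }' "$1"
}

# Usage sketch: copy summary.txt out of the archive into a scratch file.
# unzip -p writes the named archive member to stdout.
if [ -f SRR15607266_pass_1_fastqc.zip ]; then
    unzip -p SRR15607266_pass_1_fastqc.zip \
        SRR15607266_pass_1_fastqc/summary.txt > summary_1.txt
    flagged_modules summary_1.txt
fi
```

This is only a quick text summary; the plots in the `.html` report are still needed to understand why a module was flagged.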

To investigate the quality of our sequence, we will need to view the .html file that was produced. Navigate to the OnDemand Tab and click on Files > Home Directory:

Now navigate to this workshop’s directory, right-click on SRR15607266_pass_1_fastqc.html, and select Open Link in New Tab:

FastQC Output

Before we dive into these plots, let’s discuss what a “good” quality control plot looks like and what a poor one looks like.

Sequence Quality

GC Content

Per base sequence content

Adapters in Sequence

So How About Our Own Data?

Here we note that our quality scores are high and no adapters are present in our sample. However, the GC content and the Per base sequence content metrics have failed. This could be due to human contamination or overrepresented sequences such as polyA tails.
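One way to follow up on a failed module is to look at the raw table behind it. Inside each archive, `fastqc_data.txt` stores every module as a block that starts with `>>Module name` and ends with `>>END_MODULE`. The sketch below prints one such block; the function name `show_module` and the scratch file `fastqc_data_1.txt` are hypothetical names introduced here.

```shell
# Print one module's block from a FastQC fastqc_data.txt file.
# $1: module name exactly as written after ">>", $2: path to fastqc_data.txt
show_module() {
    awk -v name="$1" '
        index($0, ">>" name) == 1 { printing = 1 }   # block header found
        printing { print }                            # emit lines inside the block
        /^>>END_MODULE/ { printing = 0 }              # block finished
    ' "$2"
}

# Usage sketch: extract the data file, then list overrepresented sequences.
if [ -f SRR15607266_pass_1_fastqc.zip ]; then
    unzip -p SRR15607266_pass_1_fastqc.zip \
        SRR15607266_pass_1_fastqc/fastqc_data.txt > fastqc_data_1.txt
    show_module "Overrepresented sequences" fastqc_data_1.txt
fi
```

If the listed sequences are long runs of a single base, that would be consistent with the polyA explanation; other hits may point toward contamination instead.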
