Intro to RNA-Seq using Tufts Galaxy
The introductory slides give an overview of RNA sequencing technologies and our workflow.
Dataset
Our dataset is from the publication:
Chang et al. Next-Generation Sequencing Reveals HIV-1-Mediated Suppression of T Cell Activation and RNA Processing and Regulation of Noncoding RNA Expression in a CD4+ T Cell Line. mBio 2011. doi: 10.1128/mBio.00134-11
HIV infects CD4+ T cells, the same cells that are critical to mounting an immune response against the virus.
The experiment compares the mRNA produced by mock-infected and HIV-infected CD4+ T cells at both 12 hr and 24 hr after infection.
The raw reads from the study have been downsampled to 1 million reads per file in order to speed up computation. The full dataset is available from NCBI under accession SRP013224.
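Outside Galaxy, the read count of a downsampled file could be checked with a few lines of Python. This is only a sketch, not part of the tutorial's workflow: it assumes the files are gzip-compressed FASTQ with exactly four lines per read, the function name `count_fastq_reads` is hypothetical, and the demo file is made up so that the example runs on its own.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count reads in a gzip-compressed FASTQ file, assuming four lines per read."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


# Write a tiny made-up FASTQ file (two reads) so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")

n_reads = count_fastq_reads(demo_path)
print(n_reads)
```

Pointing the same function at one of the downsampled files from this tutorial would be expected to report about 1 million reads.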
The following steps will walk you through how to run the tools. In each step certain parameters are set. If a parameter option appears on the screen but this tutorial doesn't mention how to set it, leave it at the default. There are questions throughout, which serve to guide you through the results and check your understanding.
Create a new history
- Click the + at the top of the history panel
- To rename the history, click the Unnamed History box, type rnaseq day 1, and press Enter
Import the raw data from a shared data library on our server
- On the top menu bar, click Shared Data and select Data Libraries
- Select chang_2011_2rep
- Just under the top menu bar, next to the search bar, click Export to History and select As a Collection. You'll see a list of fastq files.
- Click Continue to add all files as a list to our current history (No need to click on individual files)
- Name the collection chang_2011
- Click Create list
- Click on Galaxy Tufts on the top left to go back to the main panel.
You’ll see the collection (or list) chang_2011 in your history.
View Fastq files
- Click on the list chang_2011 to expand it. You’ll see 11 sequencing files with the extension fastq.gz
- Click on the eye icon on the first sequence file HIV_12hr_rep1 and look at the fastq reads
The first 4 lines constitute the first sequencing read:
@SRR497699.30343179.1 HWI-EAS39X_10175_FC61MK0_4_117_4812_10346 length=75
CAGATGGCCGCAGAGGAAGCCATGAAGGCCCTGCATGGGGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGAC
+
IIIIGIIHFIIIIBIIDII>IIDHIIHDIIIGIFIIEIGIBDDEFIG<EIEGEEG;<DB@A8CC7<><C@BBDDB
- Sequence identifier
- Sequence
- + (optionally followed by the sequence identifier again)
- Quality string
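The four-line layout above can be sketched in Python. This is a minimal illustration, not something Galaxy requires you to run: the helper names `parse_fastq_record` and `phred_scores` are hypothetical, and the quality decoding assumes the Phred+33 (Sanger) encoding, which is consistent with characters such as 7 and < appearing in the quality string shown above.

```python
def parse_fastq_record(lines):
    """Split the four lines of one FASTQ record into named fields."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality string must have the same length")
    # Keep only the read name: drop the leading '@' and any description text.
    return {"id": header[1:].split()[0], "sequence": sequence, "quality": quality}


def phred_scores(quality, offset=33):
    """Convert quality characters to integer scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# The first read of HIV_12hr_rep1, copied from the tutorial text.
record = parse_fastq_record([
    "@SRR497699.30343179.1 HWI-EAS39X_10175_FC61MK0_4_117_4812_10346 length=75",
    "CAGATGGCCGCAGAGGAAGCCATGAAGGCCCTGCATGGGGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGAC",
    "+",
    "IIIIGIIHFIIIIBIIDII>IIDHIIHDIIIGIFIIEIGIBDDEFIG<EIEGEEG;<DB@A8CC7<><C@BBDDB",
])
scores = phred_scores(record["quality"])
print(record["id"], len(record["sequence"]), min(scores), max(scores))
```

Each character of the quality string corresponds to one base of the sequence, so the two strings always have the same length. Under the Phred+33 assumption, a higher score means a lower estimated probability that the base call is wrong.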