Skip to the content.

Setup

Approximate time: 20 minutes

Goals

Log into the HPC cluster’s On Demand interface

whuo01@login.cluster.tufts.edu's password:

[whuo01@login001 ~]$ This indicates you are logged in to the login node.

Compute node allocation

srun --pty -t 3:00:00 --mem 16G -N 1 -n 4 bash

Once you hit enter, you will see something like below showing that the job is queued:

[whuo01@login001 ~]$ srun --pty -t 3:00:00  --mem 16G  -N 1 -n 4 bash
srun: job 55918493 queued and waiting for resources

If wait times are very long, you can try a different partitions by adding, e.g. -p interactive before bash. Or, if you are you registered for the workshop, you can use following option before bash: -p preempt --reservation=bioworkshop. This reservation will be available for one week after the workshop start. You can press Ctrl-C to cancel your request and try again with different options, e.g.:

[whuo01@login001 ~]$ srun --pty -t 3:00:00  --mem 16G  -N 1 -n 4 -p interactive bash
[whuo01@pcomp45 ~]$

The success is indicated by the change of node name after your username. Here it was changed from login001 to pcomp45. This is an indication that you may proceed to the next step. Note: If you go through this workshop in multiple steps, you will have to rerun this step each time you log in.

Course data

cd /cluster/tufts/bio/tools/training/intro-to-rnaseq/users/

**Note: If you have a project directory for your lab, you may use this instead. These are located in /cluster/tufts with names like /cluster/tufts/labname/username/. If you don’t know whether you have project space, please email tts-research@tufts.edu.

Result:

intro-to-RNA-seq/
├── ERP004763_info.txt                 <-- sample description
├── raw_data                           <-- Folder with fastq files
│   ├── sample_info.txt
│   ├── SNF2
│   │   ├── ERR458500.fastq.gz         <-- gzip compressed fastq files
│   │   ├── ERR458501.fastq.gz
│   │   ├── ERR458502.fastq.gz
│   │   ├── ERR458503.fastq.gz
│   │   ├── ERR458504.fastq.gz
│   │   ├── ERR458505.fastq.gz
│   │   └── ERR458506.fastq.gz
│   └── WT
│       ├── ERR458493.fastq.gz
│       ├── ERR458494.fastq.gz
│       ├── ERR458495.fastq.gz
│       ├── ERR458496.fastq.gz
│       ├── ERR458497.fastq.gz
│       ├── ERR458498.fastq.gz
│       └── ERR458499.fastq.gz
└── scripts                           <-- Folder with all commands
    ├── fastqc.sh
    ├── featurecounts.sh
    ├── intro.R
    ├── sbatch_star_align_individual.sh
    ├── sbatch_star_align.sh
    └── sbatch_star_align_SNF2.sh

4 directories, 22 files

Data for the class

Publication: Statistical Models for RNA-seq Data Derived From a Two-Condition 48-replicate Experiment.

Purpose: The experiment seeks to compare a wild type Saccharomyces cerevisiae with a mutant that contains a knock-out in the gene SNF2. The purpose of the study is to analyze variability in sequencing replicates.

Project access number: PRJEB5348

Samples: The WT folder contains 7 sequencing files from a wild type yeast sample, SNF2 contains 7 sequencing files from a yeast sample with a knock-out mutation in the gene SNF2. Note that for the workshop purposes we are treating the 7 sequencing files as if they originate from separate biological replicates.

Organism: Saccharomyces cerevisiae

Sequencing: Illumina HiSeq, Single End, 50bp read length

Workshop Schedule