nf-core/differentialabundance

nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files.

Create the working directory

mkdir -p /cluster/tufts/workshop/UTLN/differentialabundance

Reference genome GTF

In the previous RNA-seq workshop, we set save_reference so that all reference data would be saved for future use. Today we can reuse the GTF file for the human genome.

ls -1 /cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/genome/

You can see the GRCh38 reference genome's GTF and FASTA files, along with the newly created STAR index and RSEM folders, which can be reused in future RNA-seq analyses.

Homo_sapiens.GRCh38.111.gtf
Homo_sapiens.GRCh38.dna.primary_assembly.fa
Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai
Homo_sapiens.GRCh38.dna.primary_assembly.fa.sizes
Homo_sapiens.GRCh38.dna.primary_assembly.filtered.bed
Homo_sapiens.GRCh38.dna.primary_assembly.filtered.gtf
genome.transcripts.fa
index/
rsem/

Let's create a soft link to the GTF file in our differentialabundance folder.

cd /cluster/tufts/workshop/UTLN/differentialabundance
ln -s /cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/genome/Homo_sapiens.GRCh38.111.gtf .

Gene expression count matrix

In the output folder from the RNA-seq workshop, you can use ls to find the count file we need, salmon.merged.gene_counts.tsv.

ls -1 /cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/*.tsv

/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.gene_counts.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.gene_counts_length_scaled.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.gene_counts_scaled.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.gene_lengths.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.gene_tpm.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.transcript_counts.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.transcript_lengths.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.transcript_tpm.tsv
/cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/tx2gene.tsv

We can create a soft link to salmon.merged.gene_counts.tsv in our differentialabundance folder.

cd /cluster/tufts/workshop/UTLN/differentialabundance
ln -s /cluster/tufts/workshop/UTLN/rnaseq/rnaseqOut/star_salmon/salmon.merged.gene_counts.tsv .
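A soft link is created even when its target path is mistyped, so it can be worth confirming that each link resolves before launching the pipeline. The sketch below is one way to do that. The helper name report_link and the stand-in file names are hypothetical, and the demonstration runs in a scratch directory rather than in the workshop folder.

```shell
# report_link NAME: say whether NAME is a working soft link, a dangling one,
# or not a soft link at all. (Hypothetical helper, not part of the pipeline.)
report_link() {
  if [ ! -L "$1" ]; then
    echo "$1: not a soft link"
  elif [ -e "$1" ]; then
    # -e follows the link, so it only succeeds when the target exists.
    echo "$1: ok"
  else
    echo "$1: dangling"
  fi
}

# Demonstration with stand-in files in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch real_target.gtf
ln -s real_target.gtf good.gtf       # target exists
ln -s no_such_file.tsv broken.tsv    # target does not exist

report_link good.gtf      # prints "good.gtf: ok"
report_link broken.tsv    # prints "broken.tsv: dangling"
```

In the workshop directory, the same helper could be called on Homo_sapiens.GRCh38.111.gtf and salmon.merged.gene_counts.tsv.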

samplesheet.csv

sample	treatment	replicate	batch
GFPkd_1	GFPkd	1	A
GFPkd_2	GFPkd	2	A
GFPkd_3	GFPkd	3	A
PRMT5kd_1	PRMT5kd	1	A
PRMT5kd_2	PRMT5kd	2	A
PRMT5kd_3	PRMT5kd	3	A

You can copy my samplesheet.csv to your working directory.

cd /cluster/tufts/workshop/UTLN/differentialabundance
cp /cluster/tufts/workshop/shared/samplesheet.csv .
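Because observations_id_col is set to sample in the arguments further down, the values in the sample column need to match column names in the count matrix. The following is a minimal sketch of that check. It assumes the sample sheet is comma-separated (as the .csv extension suggests) and uses a stand-in matrix header whose first two columns, gene_id and gene_name, are an assumption about the rnaseq output layout. Everything runs in a scratch directory.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Comma-separated copy of the sample sheet shown above (assumed delimiter).
cat > samplesheet.csv <<'EOF'
sample,treatment,replicate,batch
GFPkd_1,GFPkd,1,A
GFPkd_2,GFPkd,2,A
GFPkd_3,GFPkd,3,A
PRMT5kd_1,PRMT5kd,1,A
PRMT5kd_2,PRMT5kd,2,A
PRMT5kd_3,PRMT5kd,3,A
EOF

# Stand-in for the first line of salmon.merged.gene_counts.tsv.
printf 'gene_id\tgene_name\tGFPkd_1\tGFPkd_2\tGFPkd_3\tPRMT5kd_1\tPRMT5kd_2\tPRMT5kd_3\n' > counts_header.tsv

# Sample IDs from the sheet (first column, header row skipped).
tail -n +2 samplesheet.csv | cut -d, -f1 | LC_ALL=C sort > sheet_ids.txt

# Column names from the matrix header, one per line.
head -n 1 counts_header.tsv | tr '\t' '\n' | LC_ALL=C sort > matrix_cols.txt

# IDs that appear in the sheet but not in the matrix header.
missing=$(LC_ALL=C comm -23 sheet_ids.txt matrix_cols.txt)
if [ -z "$missing" ]; then
  echo "all sample IDs found in the matrix header"
else
  echo "missing from matrix: $missing"
fi
```

To run it on the real files, replace counts_header.tsv with salmon.merged.gene_counts.tsv and drop the two stand-in steps.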

contrast.csv

id	variable	reference	target	blocking
PRMT5kd_vs_GFPkd	treatment	GFPkd	PRMT5kd

You can copy my contrast.csv to your working directory.

cd /cluster/tufts/workshop/UTLN/differentialabundance
cp /cluster/tufts/workshop/shared/contrast.csv .
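Each contrast names a variable from the sample sheet plus a reference and a target level of that variable, so a misspelled level would leave the comparison without samples. The sketch below checks that both levels occur in the treatment column. It assumes comma-separated files and hard-codes treatment as column 2 of the sample sheet. The blocking field is left empty, as in the table above.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Comma-separated copies of the two tables shown above (assumed delimiter).
cat > samplesheet.csv <<'EOF'
sample,treatment,replicate,batch
GFPkd_1,GFPkd,1,A
GFPkd_2,GFPkd,2,A
GFPkd_3,GFPkd,3,A
PRMT5kd_1,PRMT5kd,1,A
PRMT5kd_2,PRMT5kd,2,A
PRMT5kd_3,PRMT5kd,3,A
EOF
cat > contrast.csv <<'EOF'
id,variable,reference,target,blocking
PRMT5kd_vs_GFPkd,treatment,GFPkd,PRMT5kd,
EOF

# Distinct values of the treatment column (column 2 of the sample sheet).
tail -n +2 samplesheet.csv | cut -d, -f2 | sort -u > levels.txt

# Check every contrast row: reference and target must both be known levels.
contrast_status=ok
tail -n +2 contrast.csv > contrast_rows.csv
while IFS=, read -r id variable reference target blocking; do
  for level in "$reference" "$target"; do
    if ! grep -qxF "$level" levels.txt; then
      echo "contrast $id: '$level' is not a value of $variable"
      contrast_status=bad
    fi
  done
done < contrast_rows.csv
echo "contrast check: $contrast_status"    # prints "contrast check: ok"
```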

Open OnDemand

Click differentialabundance in Bioinformatics Apps.

Arguments

Number of hours: 2
Select cpu partition: batch
Reservation for class, training, workshop: Bioinformatics Workshop
Version: 1.4.0
Working Directory: /cluster/tufts/workshop/UTLN/differentialabundance ## Change this to your own directory
outdir: DEGout
study_type: rnaseq
input: samplesheet.csv
contrasts: contrast.csv
matrix: salmon.merged.gene_counts.tsv
observations_id_col: sample
observations_name_col: sample
differential_min_fold_change: 1.5
deseq2_vs_method: rlog
gsea_run: true
gsea_gene_sets: /cluster/tufts/workshop/shared/gsea/h.all.v2023.2.Hs.symbols.gmt.txt
shinyngs_build_app: true
report_title: PRMT5kd vs. GFPkd
report_author: Yucheng Zhang ## You can put your name as the author
gtf: Homo_sapiens.GRCh38.111.gtf
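For readers who want to reproduce the run outside Open OnDemand, the pipeline arguments above can be collected in a Nextflow params file. This is only a sketch: how the Open OnDemand app launches the pipeline is not shown here, the file name params.yaml is arbitrary, and the command is printed rather than executed. The form fields for hours, partition, reservation and working directory belong to the job request, not to the pipeline, so they are left out.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Pipeline parameters copied from the Arguments list above.
cat > params.yaml <<'EOF'
outdir: DEGout
study_type: rnaseq
input: samplesheet.csv
contrasts: contrast.csv
matrix: salmon.merged.gene_counts.tsv
gtf: Homo_sapiens.GRCh38.111.gtf
observations_id_col: sample
observations_name_col: sample
differential_min_fold_change: 1.5
deseq2_vs_method: rlog
gsea_run: true
gsea_gene_sets: /cluster/tufts/workshop/shared/gsea/h.all.v2023.2.Hs.symbols.gmt.txt
shinyngs_build_app: true
report_title: "PRMT5kd vs. GFPkd"
report_author: "Yucheng Zhang"
EOF

# Assumed command-line equivalent; printed only, not run.
launch_cmd="nextflow run nf-core/differentialabundance -r 1.4.0 -profile tufts -params-file params.yaml"
echo "$launch_cmd"
```

Keeping the parameters in a file avoids quoting problems with values that contain spaces, such as the report title.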

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.4.0
------------------------------------------------------
Core Nextflow options
  runName                     : maniac_mcclintock
  containerEngine             : singularity
  container                   : [RMARKDOWNNOTEBOOK:biocontainers/r-shinyngs:1.8.4--r43hdfd78af_0]
  launchDir                   : /cluster/tufts/workshop/UTLN/differentialabundance
  workDir                     : /cluster/tufts/workshop/UTLN/differentialabundance/work
  projectDir                  : /cluster/tufts/biocontainers/nf-core/pipelines/nf-core-differentialabundance/1.4.0/1_4_0
  userName                    : yzhang85
  profile                     : tufts
  configFiles                 : 

Input/output options
  input                       : samplesheet.csv
  contrasts                   : contrast.csv
  outdir                      : DEGout

Abundance values
  matrix                      : salmon.merged.gene_counts.tsv
  affy_cel_files_archive      : null
  querygse                    : null

Affy input options
  affy_cdfname                : null

Differential analysis
  differential_min_fold_change: 1.5

Limma specific options (microarray only)
  limma_spacing               : null
  limma_block                 : null
  limma_correlation           : null

GSEA
  gsea_run                    : true
  gsea_gene_sets              : /cluster/tufts/workshop/shared/gsea/h.all.v2023.2.Hs.symbols.gmt.txt

Shiny app settings
  shinyngs_shinyapps_account  : null
  shinyngs_shinyapps_app_name : null

Reporting options
  report_file                 : /cluster/tufts/biocontainers/nf-core/pipelines/nf-core-differentialabundance/1.4.0/1_4_0/assets/differentialabundance_report.Rmd
  logo_file                   : /cluster/tufts/biocontainers/nf-core/pipelines/nf-core-differentialabundance/1.4.0/1_4_0/docs/images/nf-core-differentialabundance_logo_light.png
  css_file                    : /cluster/tufts/biocontainers/nf-core/pipelines/nf-core-differentialabundance/1.4.0/1_4_0/assets/nf-core_style.css
  citations_file              : /cluster/tufts/biocontainers/nf-core/pipelines/nf-core-differentialabundance/1.4.0/1_4_0/CITATIONS.md
  report_title                : PRMT5kd vs. GFPkd
  report_author               : Yucheng Zhang
  report_description          : null

Institutional config options
  config_profile_description  : The Tufts University HPC cluster profile provided by nf-core/configs.
  config_profile_contact      : Yucheng Zhang
  config_profile_url          : https://it.tufts.edu/high-performance-computing

Max job request options
  max_cpus                    : 72
  max_memory                  : 120 GB
  max_time                    : 7d

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/differentialabundance for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7568000

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/differentialabundance/blob/master/CITATIONS.md

[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -

[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -
[-        ] process > NFCORE_DIFFERENTIALABUNDANC... -

.
.
.

executor >  slurm (14)
[3c/aa1431] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[73/374104] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[64/cc51c4] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[c8/7b9eb7] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[bf/0ac8f6] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[f3/85ca6e] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[1c/37c98b] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[3b/8585ca] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[12/f7dac7] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[c3/a75051] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[3f/7f671c] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[51/a37574] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[21/f74ad3] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
[28/fbb019] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔
-[nf-core/differentialabundance] Pipeline completed successfully-
Completed at: 27-Mar-2024 14:02:23
Duration    : 25m 13s
CPU hours   : 0.5
Succeeded   : 14

Cleaning up...

Check the output files

Under the output folder, you will see the following subfolders:

other
shinyngs_app
tables
plots
report
pipeline_info

* Under the report folder, you will find an HTML file, which is the report.
* Under the shinyngs_app/ folder, you will find a subfolder containing app.R, the Shiny app for interactive visualization. You can open app.R with the Open OnDemand shinyngs app.
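A quick way to confirm that a run produced everything listed above is to test for each subfolder. The sketch below uses a hypothetical helper, list_missing_outputs, and demonstrates it on a mock output folder in which shinyngs_app is deliberately left out.

```shell
# list_missing_outputs DIR: print each expected subfolder that is absent
# from DIR. (Hypothetical helper; folder names taken from the list above.)
list_missing_outputs() {
  for sub in other shinyngs_app tables plots report pipeline_info; do
    [ -d "$1/$sub" ] || echo "$sub"
  done
}

# Mock output folder in a scratch directory, without shinyngs_app.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/DEGout/other" "$demo_dir/DEGout/tables" \
         "$demo_dir/DEGout/plots" "$demo_dir/DEGout/report" \
         "$demo_dir/DEGout/pipeline_info"

list_missing_outputs "$demo_dir/DEGout"    # prints "shinyngs_app"
```

For the workshop run, the call would be list_missing_outputs DEGout from the working directory; no output means all six subfolders are present.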
