
Approximate time: 20 minutes

Learning Objectives

VEP overview

VEP adds annotations from a number of sources to each variant that we upload. Below is a subset of the most commonly used annotations.

- Pathogenicity predictions: Computational predictions of whether a variant will affect protein function. Various algorithms are available (SIFT, PolyPhen2, CADD, etc.)

These consequences are then binned into impact groups: HIGH, MODERATE, LOW, and MODIFIER. For the full mapping of consequence to impact, see the VEP documentation.
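To make the binning concrete, here is a minimal sketch of a consequence-to-impact lookup. It hard-codes only a handful of consequence terms rather than the full table, and the `impact_of` helper name is hypothetical, introduced just for this illustration.

```python
# Partial consequence-to-impact lookup. This is a small illustrative subset;
# the VEP documentation holds the complete table.
CONSEQUENCE_IMPACT = {
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "missense_variant": "MODERATE",
    "inframe_deletion": "MODERATE",
    "synonymous_variant": "LOW",
    "intron_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}


def impact_of(consequence: str) -> str:
    """Return the impact bin for a consequence term, or 'UNKNOWN' if it is not in this subset."""
    return CONSEQUENCE_IMPACT.get(consequence, "UNKNOWN")


print(impact_of("missense_variant"))  # MODERATE
```

Returning `UNKNOWN` for terms outside the subset keeps the sketch from silently mislabeling a consequence it does not know about.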

We’ll run VEP on the VCF that we produced and analyze the variant consequences.

Download the VCF

First, we’ll download the VCF from the cluster to our local computer.

  1. Go back to https://ondemand.cluster.tufts.edu
  2. In the top grey menu, click Files and select Home Directory.

  3. Select intro-to-ngs/results/na12878.vcf

  4. Click Download
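Before uploading, it can be worth checking that the download is a plausible VCF. The sketch below counts variant records (lines that are not header lines starting with `#`). The `count_vcf_records` helper is hypothetical, and the toy VCF text is made up for illustration; it is not taken from na12878.vcf.

```python
import io


def count_vcf_records(lines):
    """Count variant records: non-empty lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Toy VCF content for illustration only (not the workshop data).
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t60\tPASS\t.\n"
)
print(count_vcf_records(toy_vcf))  # 2

# With the real download, something like this should work,
# assuming the file sits in the current directory:
# with open("na12878.vcf") as handle:
#     print(count_vcf_records(handle))
```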

Run VEP

  1. In a web browser tab, navigate to https://useast.ensembl.org/Tools/VEP

     Note: VEP can also be run on the command line on our HPC, which produces a text file (txt or vcf). You are welcome to ask for instructions on running command-line VEP. For analysis of a single VCF, the web server is recommended because of its visualization tools.

  2. In the Species section choose Human (Homo sapiens) (should be the default)

  3. In the Input data section choose Or upload file: and navigate to the downloaded file na12878.vcf

  4. Under Transcript database to use select RefSeq transcripts

  5. Click Run
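For reference, a command-line run roughly equivalent to the web form settings above might look like the sketch below. It rests on several assumptions: that a `vep` executable is on your PATH, that a RefSeq cache is installed locally, and that the output name `na12878.vep.txt` is acceptable (it is a made-up name). How VEP is actually set up on the cluster is not covered here, so treat this as a starting point, not the workshop's method.

```python
import os
import shutil
import subprocess


def build_vep_command(vcf_path, out_path):
    """Assemble a VEP command line as an argument list (nothing is run here)."""
    return [
        "vep",
        "--input_file", vcf_path,
        "--output_file", out_path,
        "--species", "homo_sapiens",
        "--cache",           # assumes a locally installed cache
        "--refseq",          # use RefSeq transcripts, as selected in the web form
        "--symbol",          # add gene symbols
        "--sift", "b",       # report both prediction and score
        "--polyphen", "b",
        "--af",              # 1000 Genomes global allele frequency
        "--tab",             # tab-delimited output
    ]


cmd = build_vep_command("na12878.vcf", "na12878.vep.txt")
print(" ".join(cmd))

# Run only when VEP is installed and the input file is present.
if shutil.which("vep") and os.path.exists("na12878.vcf"):
    subprocess.run(cmd, check=True)
```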

Viewing VEP results

When your job is done, click View Results

<img src="../img/vep_results_1.png" width="900">

Our goal is to identify variants that change the coding sequence. We can see in the Coding Consequences box on the right that 20% of the variants are missense, which means that they change an amino acid in the protein encoded by the transcript.

Filtering VEP consequences

Under Filters, choose Consequence + is + missense_variant and click Add. You should see 1 row. Here is a subset of the interesting columns:

| Location | Allele | Consequence | IMPACT | SYMBOL | BIOTYPE | Amino_acids |
|---|---|---|---|---|---|---|
| 10:94842866-94842866 | G | missense_variant | MODERATE | CYP2C19 | protein_coding | I/V |

| Existing_variation | SIFT | PolyPhen | AF | Clinical Significance |
|---|---|---|---|---|
| rs3758581,CM983294 | tolerated(0.38) | benign(0.05) | 0.9515 | |
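The same filter can be applied outside the browser to a tab-delimited results table. The sketch below assumes a header row with the column names shown above; a file downloaded from VEP may carry extra header lines that need handling first. The first data row is the variant from this lesson, while the second row and the `filter_by_consequence` helper are made up for illustration.

```python
import csv
import io

# Row 1 is the variant shown above. Row 2 is a made-up placeholder so the
# filter has something to exclude.
tsv = (
    "Location\tAllele\tConsequence\tIMPACT\tSYMBOL\n"
    "10:94842866-94842866\tG\tmissense_variant\tMODERATE\tCYP2C19\n"
    "1:100-100\tT\tintron_variant\tMODIFIER\tTOY_GENE\n"
)


def filter_by_consequence(handle, term):
    """Return the rows whose Consequence column contains the given term."""
    reader = csv.DictReader(handle, delimiter="\t")
    # A row can list several comma-separated consequence terms.
    return [row for row in reader if term in row["Consequence"].split(",")]


hits = filter_by_consequence(io.StringIO(tsv), "missense_variant")
for row in hits:
    print(row["Location"], row["SYMBOL"], row["IMPACT"])
```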

Based on the annotations, one can conclude that this variant is unlikely to cause disease. This is consistent with what we know about NA12878 being a healthy individual.

Though the variant does change the amino acid from I to V, both SIFT and PolyPhen suggest that this change does not alter protein function. Furthermore, there is no ClinVar report associated with this variant. Finally, the maximum allele frequency found for this variant in the 1000 Genomes database is 0.95, meaning it is a very common variant and unlikely to be pathogenic.
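The reasoning above can be written as a small check that combines the four lines of evidence. This is only a sketch of the logic in this lesson, not a clinical classification method. The function names are hypothetical, and the 0.05 cutoff for calling a variant "common" is an assumption chosen for illustration.

```python
import re


def parse_prediction(value):
    """Split a value like 'tolerated(0.38)' into ('tolerated', 0.38).

    Returns (None, None) when the value is empty or not in that form.
    """
    match = re.fullmatch(r"\s*([A-Za-z_]+)\((\d*\.?\d+)\)\s*", value or "")
    if not match:
        return None, None
    return match.group(1), float(match.group(2))


def looks_benign(sift, polyphen, clin_sig, af, common_af=0.05):
    """True when all four lines of evidence point away from pathogenicity.

    common_af is an assumed cutoff for a 'common' variant, not a VEP setting.
    """
    sift_label, _ = parse_prediction(sift)
    polyphen_label, _ = parse_prediction(polyphen)
    return (
        sift_label == "tolerated"
        and polyphen_label == "benign"
        and not clin_sig.strip()       # no clinical significance reported
        and af is not None
        and af >= common_af
    )


# Values taken from the CYP2C19 row above.
print(looks_benign("tolerated(0.38)", "benign(0.05)", "", 0.9515))  # True
```

Requiring every condition to hold is deliberately conservative: a variant missing any one annotation is not labeled benign by this check.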

Summary
