Background

SARS-CoV-2 Spike Protein

SARS-CoV-2 is a virus with an RNA genome, contained within an envelope that is studded with spike proteins
Spike proteins bind the host cell receptor angiotensin-converting enzyme 2 (ACE2) [129] to enter cells
As such, the spike protein is under selective pressure to change its receptor binding domain so that it can infect cells more readily

More Info

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has an RNA genome, which encodes the proteins necessary for its replication inside cells. The genome is contained in an envelope, which is studded with envelope and Spike proteins. The virus enters host cells when the receptor binding domain of the Spike protein binds to the host cell receptor angiotensin-converting enzyme 2 (ACE2) [129]; the ACE2–virus complex then enters the cell, where downstream steps in the virus lifecycle take place. Viruses have a high mutation rate, so it is not surprising that the Spike protein is under selective pressure to change its receptor binding domain in order to infect cells more readily.

Variants of Concern (VOC)

Mutations in the spike protein have been noted to result in higher ACE2 affinity/transmissibility
Here we note five residues in the spike protein and how they have changed across different SARS-CoV-2 variants
Today we are studying the delta variant

More Info

There are several variants of the originally characterized SARS-CoV-2 sequence. They contain 5 major mutations with respect to the original sequence, shown here. For example, take T478: spike protein residue 478 is threonine (T) in the original sequence and changes to lysine (K) in the delta variant, a substitution written as T478K. These spike protein mutations have been shown to result in higher ACE2 affinity and transmissibility, among other phenotypes that make these variants an increased threat to public health. Today, we'll study a delta variant sample and compare these positions to the originally characterized sequence using bioinformatics methods.
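Because every mutation in this comparison follows the same pattern (reference residue, position, variant residue), the notation is easy to handle programmatically. Below is a minimal Python sketch, assuming protein sequences are plain one-letter amino acid strings with 1-based residue numbering. The function names are hypothetical illustrations, not part of any tool used in this tutorial.

```python
import re
from typing import NamedTuple

# Single-residue substitutions are written <reference residue><position><variant residue>, e.g. "T478K".
SUBSTITUTION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


class Substitution(NamedTuple):
    ref: str       # residue in the originally characterized sequence
    position: int  # 1-based residue number in the spike protein
    alt: str       # residue found in the variant


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'T478K' into reference residue, position and variant residue."""
    match = SUBSTITUTION_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    ref, position, alt = match.groups()
    return Substitution(ref, int(position), alt)


def has_reference_residue(protein: str, sub: Substitution) -> bool:
    """True if the protein carries sub.ref at the 1-based position sub.position."""
    return 1 <= sub.position <= len(protein) and protein[sub.position - 1] == sub.ref


def apply_substitution(protein: str, sub: Substitution) -> str:
    """Return a copy of the protein with the variant residue at sub.position."""
    if not has_reference_residue(protein, sub):
        raise ValueError(f"expected {sub.ref} at position {sub.position}")
    i = sub.position - 1  # convert the 1-based residue number to a 0-based string index
    return protein[:i] + sub.alt + protein[i + 1:]
```

For example, `parse_substitution("T478K")` gives `Substitution(ref='T', position=478, alt='K')`. Checking the reference residue before substituting catches off-by-one numbering mistakes early, which is the most common error when comparing positions between sequences.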

SARS-CoV-2 Resources

The US National Center for Biotechnology Information (NCBI) hosts repositories for many types of biomedical and genomics data.
We will use it to download our SARS-CoV-2 reference genome and the raw next generation sequencing data

More Info

The US National Center for Biotechnology Information hosts repositories for many types of biomedical and genomics data. Today we'll retrieve reference data from the Nucleotide repository, which contains, among other things, sequence records for SARS-CoV-2 genomes, and sequencing reads from the Sequence Read Archive (SRA), where raw next generation sequencing data can easily be obtained for reanalysis.
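Reference records in the Nucleotide repository can be fetched through NCBI's E-utilities `efetch` endpoint. Below is a minimal Python sketch using only the standard library. It assumes the reference is the RefSeq SARS-CoV-2 record NC_045512.2 (Wuhan-Hu-1); this tutorial does not name its reference accession, so substitute the one you actually use. Raw reads from the SRA are normally downloaded with NCBI's SRA Toolkit rather than `efetch`, so this sketch covers only the reference genome.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def nucleotide_fasta_url(accession: str) -> str:
    """Build an efetch URL that returns a single Nucleotide (nuccore) record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def download_fasta(accession: str, out_path: str) -> None:
    """Fetch the FASTA record over the network and save it to out_path."""
    with urlopen(nucleotide_fasta_url(accession)) as response:
        text = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(text)
```

Calling `download_fasta("NC_045512.2", "reference.fasta")` would save the assumed reference genome locally. If you script many requests, keep NCBI's E-utilities usage limits in mind.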

Viral Genome Next Generation Sequencing (NGS)

RNA is extracted from the sample; it contains both virus and host RNA
Virus-specific primers are used to capture SARS-CoV-2 RNA and not host RNA
The captured RNA is transcribed into complementary DNA (cDNA) and amplified
Adapters are added so the fragments can attach to the flowcell
The sequence of the fragments is determined using fluorescently tagged nucleotides
- They emit a colored light signal when they attach to a base on the fragment
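The transcription step relies on base pairing. Reverse transcriptase builds a DNA strand complementary to the viral RNA, pairing RNA U with DNA A. The toy Python sketch below shows only that sequence logic. It ignores primer binding, fragmentation and the enzyme chemistry, and the function names are hypothetical.

```python
# Base that pairs with each RNA base when a complementary DNA strand is built.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Ordinary DNA base pairing, used to build the second cDNA strand.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """First-strand cDNA (written 5'->3') for an RNA template given 5'->3'."""
    # Pair each base, then reverse so the new strand also reads 5'->3'.
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(rna.upper()))


def second_strand(cdna: str) -> str:
    """Complementary DNA strand (5'->3'); it matches the original RNA with T in place of U."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna.upper()))
```

For example, `first_strand_cdna("AUGC")` returns `"GCAT"`, and `second_strand("GCAT")` returns `"ATGC"`, the original RNA sequence written in DNA letters. This is why reads from the double-stranded cDNA can be compared directly against a DNA reference genome.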

More Info

Let's take a moment to go through the viral NGS workflow. Total RNA is extracted from the sample; this contains both virus and host RNA. At this point, a decision is made about the downstream protocol based on whether the virus is known. In the case we will study, the authors are interested in sequencing SARS-CoV-2, which has a known sequence, so they can design virus-specific primers that capture SARS-CoV-2 and not the host RNA, and transcribe it into complementary DNA (cDNA) that can be prepped for sequencing. Importantly, the primers they design limit what they capture, so they must design the primers to bind in regions that they expect to be conserved between viral variants. An amplification step has to be performed, because current NGS technologies require a high input DNA amount and the amount of viral genome is several orders of magnitude lower. Then the preparation for NGS sequencing begins. We start with viral cDNA fragments that can be sequenced the same way as any other DNA on an NGS sequencer. Sequencing adapters are added (these are the blue rectangles) that allow the cDNA fragments to attach to the sequencing flowcell. The sequence of the fragments is determined using fluorescently tagged nucleotides that emit a colored light signal when they attach to a base on the fragment being sequenced. It is this optical signal that the instrument reads to determine which base in the sequence was read.
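To see why a few orders of magnitude of missing input is manageable, we can use the idealized exponential amplification model N_n = N_0 (1 + E)^n. Here N_0 is the starting copy number, n is the number of cycles and E is the per-cycle efficiency (E = 1 means exact doubling). This assumes PCR-style cycling, which the workflow above does not spell out. The input amounts below are hypothetical, and real reactions plateau well before this model says they should.

```python
import math


def copies_after(cycles: int, start_copies: float, efficiency: float = 1.0) -> float:
    """Idealized yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_needed(start_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles for the idealized yield to reach target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= start_copies:
        return 0
    # Solve start * (1 + E)^n >= target for n, then round up to a whole cycle.
    return math.ceil(math.log(target_copies / start_copies) / math.log(1.0 + efficiency))
```

For example, going from a hypothetical 1,000 viral copies to 10^9 copies needs `cycles_needed(1e3, 1e9)` = 20 cycles at perfect efficiency, or 22 cycles at 90% efficiency. Because growth is exponential, a millionfold gap costs only about 20 cycles.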

Types of Read Data

We often have two kinds of data: single-end and paired-end
Single-end: each DNA fragment is sequenced from one end only
Paired-end: each DNA fragment is sequenced from both ends
- Paired-end data is useful when sequencing highly repetitive sequences. Today we will be working with paired-end data
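Paired-end data usually arrives as two FASTQ files, one for each end of the fragments. The i-th record in each file holds the two mates from the same fragment. Below is a minimal Python sketch that walks both files in lockstep and checks that the mates line up. It assumes plain four-line FASTQ records and assumes mates share a read name up to an optional /1 or /2 suffix. Naming conventions vary between instruments and download tools, and the function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (read name, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text; lines may be an open file handle."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        body = [next(it, None) for _ in range(3)]  # sequence, '+' line, quality
        if None in body or not header.startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {header!r}")
        seq, plus, qual = (line.rstrip("\n") for line in body)
        if not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def fragment_id(name: str) -> str:
    """Drop any description and a trailing /1 or /2 so both mates share one ID."""
    first_word = name.split()[0]
    return first_word[:-2] if first_word.endswith(("/1", "/2")) else first_word


def iter_pairs(r1: Iterable[str], r2: Iterable[str]) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield (mate1, mate2) tuples, failing loudly if the two files fall out of sync."""
    for mate1, mate2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if mate1 is None or mate2 is None:
            raise ValueError("the two files contain different numbers of reads")
        if fragment_id(mate1[0]) != fragment_id(mate2[0]):
            raise ValueError(f"mates out of sync: {mate1[0]!r} vs {mate2[0]!r}")
        yield mate1, mate2
```

With open file handles, `for mate1, mate2 in iter_pairs(open(path_1), open(path_2))` visits each fragment once with both of its ends. Downstream aligners use this pairing, together with the expected distance between mates, to place reads that fall in repetitive regions.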
