two paradigms of De Novo assembly - Answers 1. Overlap/layout/consensus
2. De Bruijn graphs
Overlap/layout/consensus genome assembly steps - Answers Step #1: Compare all reads to
each other to find those that overlap (read 5' -> 3')
Step #2: Create overlap graph arranging reads according to their overlaps
Step #3: Find unique path through the graph
Step #4: Assemble overlapping reads by aligning the reads and deriving consensus
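Steps #1 and #2 above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular assembler does it: the function names (`overlap_length`, `overlap_graph`) and the `min_len` cutoff are made up for this example, and it only finds exact suffix-to-prefix matches (real reads need error-tolerant alignment).

```python
from itertools import permutations

def overlap_length(a, b, min_len=3):
    """Length of the longest suffix of a that exactly matches a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The overlap is valid only if the rest of a continues as a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def overlap_graph(reads, min_len=3):
    """All-vs-all comparison: map (read_a, read_b) -> overlap length."""
    graph = {}
    for a, b in permutations(reads, 2):
        olen = overlap_length(a, b, min_len)
        if olen > 0:
            graph[(a, b)] = olen
    return graph

reads = ["ACGTTG", "TTGCAT", "CATGG"]
graph = overlap_graph(reads)
```

Note that `permutations(reads, 2)` visits every ordered pair of reads, which is the all-vs-all cost that makes this approach expensive for large numbers of short reads.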
De Bruijn graph assembly steps - Answers Step #1: Tally kmers
Step #2: Create graph of kmer overlap, where kmers are nodes and overlaps between them are
edges. More complex than an overlap graph
Step #3: Find unique path through the graph
Step #4: Synthesize path into a consensus sequence
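The four steps above can be sketched as a toy example. This is a minimal illustration under simplifying assumptions (error-free reads, no repeats, no reverse complements), following the cards' convention that kmers are nodes; the function names `tally_kmers`, `kmer_graph`, and `walk_unique_path` are invented for this sketch.

```python
from collections import Counter, defaultdict

def tally_kmers(reads, k):
    """Step 1: count how often each kmer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_graph(reads, k):
    """Step 2: kmers are nodes; an edge joins kmers that overlap by k-1 bases."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            edges[read[i:i + k]].add(read[i + 1:i + k + 1])
    return edges

def walk_unique_path(edges, start):
    """Steps 3 and 4: follow the path while it is unambiguous, then spell it out."""
    path = [start]
    seen = {start}
    node = start
    while len(edges.get(node, ())) == 1:
        (node,) = edges[node]
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        path.append(node)
    # Each step along the path contributes one new base.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])

reads = ["ACGTT", "CGTTG"]
counts = tally_kmers(reads, 3)
edges = kmer_graph(reads, 3)
contig = walk_unique_path(edges, "ACG")
```

No read is ever compared against another read here: the graph is built from each read's own consecutive kmers.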
Overlap/layout/consensus genome assembly requires... - Answers all-vs-all comparison of
reads
overlap/layout/consensus genome assembly was made for... - Answers Sanger and 454
sequencing but has reemerged for PacBio and other long-read techniques
Unlike the all-vs-all comparisons needed for overlap genome assembly, De Bruijn needs to... -
Answers split reads up into kmers, without comparing all reads with each other
Problems with De Bruijn assembly - Answers - Consider errors: make the graph even more
complicated with bubbles, dead ends
- Consider repeats: parts of the graph with no unique path through them
- Graph broken on each side, forming contigs
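To see why errors complicate the graph, the sketch below counts the kmers that a single wrong base adds. The sequences and the helper name `kmer_set` are made up for illustration; the point is only that one substitution in the middle of a read can introduce up to k kmers that do not exist in the true sequence, and those extra nodes are what show up as bubbles or dead ends.

```python
def kmer_set(seq, k):
    """Return the set of all kmers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

k = 3
true_seq = "ACGTTGCATGG"
error_read = "ACGTTACATGG"  # same sequence with one substitution (G -> A at index 5)

# Kmers that exist only because of the sequencing error.
novel = kmer_set(error_read, k) - kmer_set(true_seq, k)
```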
what was one of the first widely used de novo assembly programs - Answers Velvet
Velvet assembly steps - Answers #1: De Bruijn graph construction
#2: graph simplification (unbranched paths collapsed into single nodes)
#3: tip removal (dead end paths removed)
#4: bubble popping (bubbles caused by sequencing errors are resolved)
Low contig counts and high N50s - Answers are good
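N50 is the contig length at which the contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    # Add contigs from longest to shortest until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0

example = n50([80, 70, 50, 40, 30, 20])
```

For these lengths the total is 290, so half is 145; the two longest contigs (80 + 70 = 150) already cover that, giving an N50 of 70.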
Oxford Nanopore - Answers sequencing technology that produces long reads used for genome assembly
How does Oxford Nanopore work? - Answers • Single molecule
• DNA strand translocated through a membrane embedded pore
• DNA in the pore changes current across the membrane in sequence-specific ways
• Can read RNA directly
• Improvements are fast and furious
Drawbacks to Oxford Nanopore - Answers • Accuracy can be low-ish (95-98%) (BUT LONG
READS)
• Depends on base-calling software, always improving
• Errors are less random than PacBio
• More coverage doesn't always fix them
• Read lengths are limited only by preparation method
• Can be multiple Mb long
Oxford Nanopore- MinION - Answers • Very small & portable (USB device)
• Inexpensive (~$1000/MinION)
• Up to 50 Gbp per flow cell (theoretical)
• Results in real time (24hr run typical)
two ways to assemble a genome using MinION - Answers 1. Create an assembly using Illumina
data, then refine using MinION reads
2. Create an assembly using MinION reads alone
Canu assembly was lower quality - Answers in our exercise
Genome annotation - Answers The process of identifying the locations of genes and all of the
coding regions in a genome and determining what those genes do
Methods of annotation - Answers 1. From first principles: algorithmic rules
2. Direct experimental data in the literature (e.g., Swiss-Prot database)
3. From orthology / homology to previously annotated sequences