GBD summary
L2 eukaryotic genome
Diploid = 2 genomes in a cell, slightly different
1 genome = haplotype, 2 genomes = genotype
Polyploid = more than 2 genomes in a cell (can be because multiple nuclei)
Most plants are polyploid
Chromosome:
1. DNA double helix wraps around histones -> nucleosomes
2. Nucleosomes fold into a chromatin fibre
3. Fibre folds further into the chromosome
“The human genome” is a reference genome -> hgX = human genome build X (e.g. hg19, hg38)
Differences between human genomes:
- Structural differences
- Different bases
- Certain pieces that others don’t have at all
Future: graph genome
Karyotype = chromosome count and what the chromosomes look like (= length, banding pattern, centromere
position), as seen under a light microscope for a given organism
Banding pattern = Giemsa banding = G-banding
- AT-rich gives dark region -> gene poor
- GC-rich gives light region -> open chromatin -> transcriptionally active
Locus = where on banding pattern -> long arm = q, short arm = p; bands are numbered outward from the centromere
Locus name: chromosome number, arm, region, band, ., sub-band (e.g. 7q31.2 = chromosome 7, long arm, region 3, band 1, sub-band 2)
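The naming scheme above can be taken apart mechanically. A minimal sketch, assuming only simple labels of the form chromosome + arm + region digit + band digit + optional sub-band (the function name `parse_locus` and the regular expression are illustrative, not a standard tool, and ranges or "ter"/"cen" labels are not handled):

```python
import re

# Simple band label such as "7q31.2" or "Xp11" (assumed format, see lead-in).
LOCUS_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")

def parse_locus(label):
    """Split a band label into chromosome, arm, region, band and sub-band."""
    match = LOCUS_RE.match(label)
    if match is None:
        raise ValueError(f"not a simple band label: {label!r}")
    chromosome, arm, region, band, subband = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short" if arm == "p" else "long",
        "region": int(region),
        "band": int(band),
        "subband": subband,  # kept as text; None when the label has no sub-band
    }

print(parse_locus("7q31.2"))
```

The sub-band is kept as text because it can have more than one digit.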
Cell lines: not very trustworthy, because old and manipulated
HeLa cell line: very old + heavily altered genome
Simple sequence repeats = SSRs = almost perfect repeats of short sequences, mostly located in
centromeres, can be involved in disease
Jumping genes = transposable elements = TEs;
move around in the genome, important in evolution
- DNA transposon = cut + paste
- Retrotransposon = copy + paste,
via RNA intermediate + reverse transcriptase
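The difference between the two mechanisms can be shown on a toy string. This is only a sketch: the genome is a plain Python string, the element is marked as "TE", and the RNA intermediate and reverse transcription step are not modelled (the copy is simply inserted). Function names and coordinates are made up for illustration.

```python
def cut_and_paste(genome, start, end, target):
    """DNA-transposon style: excise genome[start:end] and reinsert it.

    target is an index into the sequence that remains after excision.
    """
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]

def copy_and_paste(genome, start, end, target):
    """Retrotransposon style: the original stays, a copy lands at target."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]

genome = "aaaaTEccccgggg"
moved = cut_and_paste(genome, 4, 6, 8)     # element leaves its old site
copied = copy_and_paste(genome, 4, 6, 12)  # element is now present twice

print(moved, len(moved))
print(copied, len(copied))
```

Cut + paste keeps the genome length constant, while copy + paste makes the genome grow with every jump.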
Nucleotide content differs: within/between species
- GC% = GC content = number of G+C as a fraction of the sequence length
- CpG = CG dinucleotide, C followed by G
- CpG island = region with a lot of CpGs and high GC%
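The first two definitions translate directly into code. A minimal sketch in plain Python, assuming the sequence is given as a string of A/C/G/T on one strand (the function names are illustrative):

```python
def gc_content(seq):
    """Number of G + C divided by the sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

def count_cpg(seq):
    """Number of CG dinucleotides (C directly followed by G)."""
    return seq.upper().count("CG")

example = "ATGCGCGTTA"
print(gc_content(example))  # 5 of 10 bases are G or C
print(count_cpg(example))   # CG occurs at two positions
```

Note that CpG is read along one strand: "GC" does not count as a CpG.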
In humans CpG occurs less often than expected -> about 1% of dinucleotides instead of the expected 4%
-> because methylated C is prone to deamination (C->T), so CpG mutates to TpG over time
Most CpGs are methylated (except in CpG islands -> usually promoter of housekeeping gene)
-> methylation is what leads to the deamination
If CpG island is methylated -> gene is repressed
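Where the "expected 4%" comes from: if C and G occurred independently, the CpG frequency would be f(C) x f(G). The sketch below assumes a GC content of about 0.42 with C and G equally frequent (an illustrative value, not taken from these notes), and also compares observed and expected CpG frequency in a given sequence. Function names are made up for illustration.

```python
def expected_cpg_from_gc(gc):
    """Expected CpG frequency if C and G are independent and equally frequent."""
    return (gc / 2) ** 2

def cpg_observed_expected(seq):
    """Return (observed, expected) CpG frequency for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")
    observed = seq.count("CG") / (n - 1)  # n - 1 dinucleotide positions
    expected = (seq.count("C") / n) * (seq.count("G") / n)
    return observed, expected

print(expected_cpg_from_gc(0.42))  # roughly 0.044, i.e. about 4%

obs, exp = cpg_observed_expected("ACGT" * 5)  # toy sequence, CpG-rich
print(obs / exp)                               # ratio above 1: no depletion

obs, exp = cpg_observed_expected("CATG" * 5)  # same base composition, no CG
print(obs, exp)
```

An observed/expected ratio well below 1 is the depletion described above; CpG islands are regions where the ratio stays high.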
In bacteria, archaea and viruses: (weak) relation between gene number and complexity
-> Not for eukaryotes
Hard to decide how many genes a eukaryote has: alternative splicing etc.
Gene = genomic sequence (DNA or RNA) that encodes a functional element (protein or RNA) and is
coherent. Genes might overlap.
Gene families: similar genes found in different organisms -> Hox genes: body plan
Genome duplication: leads to genome evolution -> more complexity -> back to diploid
Homology
- Homologs = shared ancestry
- Ortholog = by speciation
- Paralog = by duplication
L3 Sequencing
First generation sequencing:
- Sanger
o Chain termination technique = dideoxynucleotides lack the 3’ hydroxyl group -> chain stops there,
so fragment length + label show which base is at each position (radio- or fluorescent labelling)
o Reads of 800-1000 nucleotides
o Very accurate but low throughput
- Maxam-Gilbert
o Radiolabelled DNA -> base-specific chemical treatment to break the strand
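The idea behind Sanger chain termination can be sketched as a toy model: every possible terminated fragment is represented by its length and its labelled last base, and sorting by length (what electrophoresis does) gives back the sequence. This is a simplification under stated assumptions: it works directly on the newly synthesised strand, ignores the template/complement relationship and assumes every termination position is observed. Function names are illustrative.

```python
import random

def terminated_fragments(strand):
    """One fragment per position: (fragment length, labelled terminal base)."""
    return [(i + 1, base) for i, base in enumerate(strand)]

def read_from_fragments(fragments):
    """Sort fragments by length and read off the terminal labels."""
    return "".join(base for _, base in sorted(fragments))

strand = "GATTACAGGC"
fragments = terminated_fragments(strand)
random.Random(0).shuffle(fragments)  # fragments come out of the reaction unordered

print(read_from_fragments(fragments))
```

Because each length occurs once, sorting on length alone is enough to recover the order of the bases.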
Second generation sequencing = next generation sequencing:
- Illumina
o Bridge amplification
o Single-end read = fragment sequenced from one end only
o Paired-end read = fragment sequenced from both ends
- Ion Torrent
o Ion semiconductor sequencing
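What single-end and paired-end Illumina reads look like relative to the original fragment can be sketched as follows. Assumptions: the fragment is given as its forward strand, reads are error free, and the second read of a pair is reported as the reverse complement of the fragment's far end (as it is read from the opposite strand). Names and the example fragment are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base and reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

def single_end_read(fragment, read_length):
    """Read only from one end of the fragment."""
    return fragment[:read_length]

def paired_end_reads(fragment, read_length):
    """Read from both ends; read 2 comes from the opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2

fragment = "ACGTTGCAAGGCTTAACCGG"
print(single_end_read(fragment, 5))
print(paired_end_reads(fragment, 5))
```

The two reads of a pair point towards each other, and the unsequenced middle of the fragment gives a known approximate distance between them, which helps with mapping and assembly.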
Third generation sequencing: long reads
- Pacific Biosciences (PacBio)
o Single molecule, real time sequencing = SMRT
- Oxford Nanopore