Topics | Descriptions |
|
Module 1: Fundamentals
|
Chapter 1 Comparative genomics: an emerging field
|
Key words & concepts
- Alignment: from small sequences to whole genome
- Homology: paralog vs. ortholog
- Synteny: chromosomal synteny vs. synteny block
- Phylogeny: gene phylogeny vs. organismal phylogeny
- Species & speciation
Overview
A genome is a linear molecular structure that carries
genetic information. It is built from four nucleotides
(A, T, C, and G). Genetic information is scattered across
the genome and is encoded by
these four "letters" in different orders and
combinations. One fundamental question is how genetic
information is encoded in the genome, large parts of which
are sometimes called "dark matter".
A genome contains a large number of different types
of parts: genes and regulatory elements, sometimes
collectively called functional elements. Genes, in
particular protein-coding genes, are relatively
straightforward to identify. In contrast, other types
of functional elements, such as enhancers and insulators,
are harder to detect. Regulatory elements dictate
gene expression. In particular, they determine where
and when a gene is expressed, and how strongly
it is expressed.
Presently, an effective way to understand a genome is
comparative genomics, in which two or more genomes are
compared for similarities and differences.
Comparative analysis can be done at the DNA level and
at the protein level. Using bioinformatics programs,
sequences are aligned and the alignments are examined
for their evolutionary relationship. Are they
homologous, that is, do they share a common ancestor?
Comparative analysis can also be done for genomes at
different evolutionary distances, ranging from genomes of
different strains of a species to those of species that are
distantly related. Differences between genomes (i.e.,
genotypes) can therefore be linked to functional
consequences, or phenotypes.
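To make the idea of examining an alignment concrete, here is a minimal Python sketch that computes percent identity between two sequences that have already been aligned. The function name, the "-" gap character, and the choice to skip gapped columns are assumptions made for this illustration; real alignment programs report identity in their own ways, and high identity alone does not prove homology.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they
    must have the same length. (Hypothetical helper for illustration.)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Eight ungapped columns, seven of them identical.
    print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # 87.5
```

The same function works on aligned protein sequences, since it only compares characters column by column.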
Objectives
Understand why comparative genomics is important for
studying the G2P (genotype-to-phenotype) relationship, and
how comparative genomics is carried out.
Readings
- Primer: Comparative genomics (by Ross Hardison, 2003).
- Review: Comparative genomics (by Webb Miller et al., 2004).
|
Chapter 2 DNA sequencing technologies: the driving force
|
Key words & concepts
- First-generation
- Second-generation
- 454 sequencing
- Solexa sequencing
- SOLiD sequencing
- Third-generation
Overview
The ability to determine how nucleotides are ordered in a DNA molecule is an essential first
step for understanding the composition and function of
a genome. Over the last 40 years, many different methods have been developed. The Sanger sequencing method,
invented by Fred Sanger, who won his second Nobel Prize
for it, was the key method used in the Human Genome
Project. Second (or "next")-generation DNA sequencing methods
were introduced a few years after the completion of
the Human Genome Project. Because different sequencing
methods produce reads of different lengths, which can be
either paired-end or single-end, they can be used to
address different questions. Third-generation sequencing methods are
on the horizon for production use.
Popular DNA sequencing methods will be described in
this lesson, primarily because the sequence reads from
each method have distinct characteristics and therefore
need separate methods for handling and analysis.
Objectives
Master the file format used by each sequencing method and
how the files can be processed for further analysis.
Readings
- Review: Applications of new sequencing technologies for transcriptome analysis (by Morozova and Marra, 2009)
- Review: The potential and challenges of nanopore sequencing (by Branton et al., 2008)
- File format: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants (by Cock et al., 2009).
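As an illustration of processing a sequencing file format, the Python sketch below reads simple FASTQ records and decodes their quality strings. It assumes the common four-line layout (header starting with "@", sequence, separator starting with "+", quality string) and the Sanger encoding, in which each quality character is the Phred score plus 33. The names `FastqRecord` and `parse_fastq` are hypothetical; multi-line records and the Solexa scoring scheme are not handled.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: List[int]  # one Phred score per base


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text.

    offset=33 is the Sanger encoding (an assumption of this sketch);
    files from Illumina pipelines 1.3 to 1.7 would need offset=64.
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(stripped, None)
        plus = next(stripped, None)
        qual = next(stripped, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        scores = [ord(ch) - offset for ch in qual]
        yield FastqRecord(header[1:], seq, scores)


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.quality)
        # read1 ACGT [40, 40, 20, 0]
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the example list.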
|
Chapter 3 Bioinformatics: the enabling force
|
Key words & concepts
- Alignment
- Short sequence alignment
- Whole genome alignment
- Database searches (BLAST)
- Homolog identification
- Synteny
- Synteny block identification (OrthoCluster)
- Synteny breakpoint
- Genome visualization
- dotplot
- Circos
- Genome browser & synteny browser (GBrowse & GBrowse_syn)
Overview
Bioinformatics is a sister field of genomics and
emerged almost simultaneously with it. The amount
of information generated in large genome sequencing
and related functional genomics projects can no longer
be recorded in the notebooks that traditional
biologists are comfortable with. Computers, including both
hardware and software, are needed for genome
information storage, management, retrieval,
analysis, and reporting.
In this lecture, I will give an overview of bioinformatics tools
developed in the last 10 years or so for DNA
alignment, for synteny identification, and for display
using genome browsers.
Objectives
Be comfortable with selecting computer programs
appropriate for different conditions.
Readings
- Bioinformatics: alive and kicking (by Stein, 2008)
- BWA (by Li et al., 2010)
- OrthoCluster (by Zeng et al., 2008) & OrthoClusterDB (by Ng et al., 2009)
- gbrowse (by Lincoln Stein et al., 2010)
- Circos: An information aesthetic for comparative genomics (by Krzywinski et al., 2009)
|
Chapter 4 Resources for comparative genomics
|
Key words & concepts
- Primary and clade-specific databases
- NCBI genome resources
- Ensembl genome resources
- UCSC genome resources
- Model organism databases (MODs)
Overview
Sequence results of publicly funded genome
projects are deposited in publicly accessible databases,
which makes data retrieval convenient. In this
lecture, three major public genome data resources will
be described. Although the data stored in these databases
are essentially identical, each database has its own
unique bioinformatics tools, which make it
useful for certain purposes.
Objectives
Be familiar with the architecture of each of the
three public genome data resources and learn how
to retrieve data effectively.
Readings
- Touring Ensembl: A practical guide to genome browsing (by Spudich and Fernandez-Suarez, 2010)
- Entrez Gene: gene-centered information at NCBI (by Maglott et al., 2007)
|
Chapter 5 The Human Genome Project
|
- The Human Genome Project
|
|
Module 2: Functional elements: identification & function
|
Chapter 6 Gene
|
Key words & concepts
- Gene definition
- Protein-coding gene
- Exon
- Intron
- 5' & 3' UTRs
- Promoter
- Alternative splicing & isoform
- Non-coding gene
- Gene prediction
- Comparative prediction
- Transcriptome-based prediction
- Genes in operons
- Gene duplication
- Gene birth and death
Overview
The gene is a concept in motion. When Gregor Mendel first published
his studies on pea plants in 1866, he did not use the term "gene"
because it had not yet been coined. The term was coined by the Danish
botanist Wilhelm Johannsen in 1909, when the physical basis of
the gene was still
unknown. In 1910, Thomas Hunt Morgan's work on fruit flies
showed that genes sit on chromosomes, leading to the idea of genes
as beads on a string. In 1941, George Beadle and Edward Tatum
introduced the concept that one gene makes one enzyme. Shortly afterwards, in
1944, Oswald Avery and colleagues found that genes are made of DNA.
In 1953, James Watson and Francis Crick published the chemical
structure of DNA, and Crick later formulated the central dogma of molecular
biology. The concept that a gene is a contiguous segment of DNA was
overturned when Richard Roberts and Phillip Sharp discovered that
genes can be split into segments, leading to the idea that one
gene can make several proteins. Genes do not have to code for
proteins: RNA genes include rRNA and tRNA genes. Work by Victor Ambros and Gary
Ruvkun on the nematode C. elegans led to the discovery
of the first microRNA gene in 1993.
The gene content of a genome is also dynamic. Within a genome, new genes are
born and existing genes can die. The birth and death of genes
can be detected by comparing the genomes of closely related
organisms.
Objectives
Readings
- Origins, evolution and phenotypic impact of new genes (by Kaessmann, Genome Research, 2010)
- What is a gene, post-ENCODE? History and updated definition (by Gerstein et al., Genome Research, 2007)
- The origin of new genes: glimpses from the young and old (by Long et al., Nature Reviews Genetics, 2003)
|
Chapter 7 Ultraconserved elements
|
- Identification
- Function
|
Chapter 8 Functional elements: cis-regulatory elements
|
- Transcription factor binding sites (TFBSs)
- Promoter
- Enhancer
- Insulator
- Finding motifs: ChIP-chip & ChIP-SEQ
|
Chapter 9 ENCODE & modENCODE projects
|
- Pilot project (1% of the human genome)
- ENCODE: functional elements in humans
- modENCODE: functional elements in D. melanogaster and C. elegans
|
Chapter 10 Synteny blocks
|
- Chromosomal synteny
- Synteny blocks
- Perfect synteny blocks
- Imperfect synteny blocks
- "Ultraconserved" synteny blocks
- Synteny breakpoints
|
Chapter 11 Genome rearrangement events & genome evolution
|
- Deletion
- Insertion
- Inversion
- Transposition
- Translocation
|
|
Module 3: Intra-species comparison
|
Chapter 12 Genome variations
Shortest woman & tallest man
Shortest man & tallest woman
|
Key words & concepts
- Types of GVs
- Formation of GVs
- Duplication
- Non-homologous recombination
- GVs and disease conditions
- Personalized genomics & medicine
Readings
- New York Times: Adventures in Very Recent Evolution (by Nicholas Wade, 2010)
- New York Times: Scientists Cite Fastest Case of Human Evolution (by Nicholas Wade, 2010)
|
Chapter 13 From SNP to HapMap
|
- Types of SNPs
- Density and genome distribution
- Impact on genes
- Coding regions
- Regulatory regions
- Haplotype
- The HapMap Project
|
Chapter 14 Structural variation (SV)
|
Key words & concepts
- Comparative genomic hybridization (CGH)
- Copy number variation (CNV)
- Balanced rearrangement (BRE)
- Inversion
- Transposition and translocation
Readings
- Copy Number Variation in Human Health, Disease, and Evolution (by Zhang et al., Annual Review, 2009)
|
Chapter 15 Loss-of-function variations
|
Key words & concepts
- Identification
- Validation
- Buffering of genetic variation
Readings
- Initial sequence of the chimpanzee genome and comparison with the human genome
Note: Although this paper describes
differences between two species (human & chimpanzee),
differences of this kind also exist between human
individuals. The paper reported many
human disease genes in the chimpanzee genome.
- Principles for the Buffering of Genetic Variation (by Hartman et al., Science, 2001)
|
Chapter 16 GWAS (genome-wide association studies)
|
Readings
- Genomewide Association Studies and Assessment of the Risk of Disease (by Manolio, NEJM, 2010)
|
Chapter 17 Personalized genomes & The 1000 Genomes Project
|
- Personalized genomes
- James Watson
- Craig Venter
- Yan Huang ("An Asian")
- A Korean
- Desmond Tutu
- The 1000 Genomes Project
|
|
Module 4: Inter-species comparison
|
Chapter 18 Gene family: contraction and expansion
|
- Gene family classification
- Comparative gene family classification
- Stable gene family (e.g., ABC transporters)
- Dynamic gene family (e.g., chemosensory genes)
|
Chapter 19 Transcription factor and gene battery
|
- Classification of transcription factors
- Example: RFX gene family
- Example: RFX gene battery
|
Chapter 20 Horizontal gene transfer
|
Readings
- Nature Review Focus: Horizontal gene transfer (2005)
- Lateral gene transfer and the nature of bacterial innovation (by Ochman et al., Nature, 2000)
- Lateral gene transfer between Archaea and Bacteria (by Nelson et al., Nature, 1999)
|
Chapter 21 Virulence factors & drug targets
|
Readings
- Carbon metabolism of intracellular bacterial pathogens and possible links to virulence (Eisenreich et al., Nature Reviews Microbiology, 2010)
|
Chapter 22 Metagenomics
|
Key words & concepts
- Environmental
- Hot spring
- Ocean
- Sludge
- Soil
- Organismal
Readings
- Primer: Metagenomics
|
Chapter 23 What makes us human?
|
Key words & concepts
- Human vs. animals
- Human vs. chimpanzee
- Human vs. Neandertal
- Human vs. human
Overview
Objectives
Readings
- A Draft Sequence of the Neandertal Genome (by Green et al., Science, 2010)
- An RNA gene expressed during cortical development evolved rapidly in humans (by Pollard et al., Nature, 2006)
- Evolution at two levels in humans and chimpanzees (by King and Wilson, Science, 1975) Note: This study was regarded as the first contribution to comparative genomics (Sean Carroll, PLoS Biology, 2005).
Listenings
- Dr. Katherine Pollard: What makes us human?
|
Chapter 24 The Genome 10K Project
|
Key words & concepts
- Ancestral state reconstruction
- Comparative genomics
- Molecular evolution
- Species conservation
- Vertebrate biology
Overview
This large-scale project was proposed in
anticipation of a precipitous drop in costs and an
increase in sequencing efficiency.
Objectives
Be aware of this large-scale project and the resources that will be available for comparative genomics during
and after the completion of this project.
Readings
- Genome 10K: A Proposal to Obtain Whole-Genome Sequence for 10 000 Vertebrate Species (by Haussler et al., 2009)
|
|
Module 5: Student projects & presentations
|
There will be multiple presentation sessions.
|
Overview
Students will be divided into groups of two.
Each group will propose a comparative genomics project
at the beginning of the course, which will be carried
out during the course. At the end of the course, each
group will present its project. One student will
focus on background and motivation, while the second
student will focus on results and interpretation.
|