# Welcome to part 2 of the lemur lab. Instead of moving between the browser and the other software, we've decided to put the whole lab here within R. You can actually answer the questions within this document (which is just a text file) as well, and then save it (as "lastname.lemur2.R"). Just be sure to offset the answers with "## your fist name:" so as not to confuse the program and for us to keep track of who is who. So, for instance, we might write: ##How many internal edges are there on a tree with 4 taxa? ## #and I'd fill in, after the ##: ##How many internal edges are there on a tree with 4 taxa? ##Arne: if the tree is unrooted, there are (2n-3) = 5 edges total. n of those edges must be external (leading to the n species) so there must be 2n-3-n or n-3 internal edges. For n=4, that would be 1. #Save this file often. Ok, let't get started. Anything that is not set off by # is something you should execute in R. You do this by moving your cursor to that line (or highlighting a block of lines) and then hitting (command="apple" on macs). That copies that line or block into the console window and includes a 'return', which executes it. You can also type directly into the console window and hit return to execute. Now, hit a couple of times to move between this window and the console window (which should be empty at the moment) to make sure you get the swing of things. #The first thing you want to do is to make sure you have the relevant packages #This is the package for manipulating trees and data for trees. #From now on, execute anything that does not have a # in front of it, so both of the next two lines... install.packages("ape") #grabs it from the online CRAN library - choose the Canada BC one (it is actually hosted by SFU) library(ape) #puts it in memory for you # Move to the console window (upper left corner on most macs) to see that this command got executed. #now lets laod and plot some trees - how about one from part 1? #load the first tree from question 4 #remember to execute... tree1_question4<-"(((A:1,B:1):2,(C:2,D:2):1,E:1):1,((F:1,G:1):1,(H:3,I:3):1):1);" test.tree=read.tree(text=tree1_question4) #is the tree stored in memory? Let's find out by executing the thing you have called test.tree test.tree #Is it fully resolved or not? How do you know? plot(test.tree) #were you right about the tree being fully resolved (bifurcating)? # Get the sequences from the data file you have downloaded from the website into R. The way to get your file into R is to click down on the fasta file you downloaded from the browser and then drag it into the space between the "" signs in the command below (this is a mac thing, so ask if it makes no sense). # That will give you the path name. You should see something like "/Users/Education/Desktop/yourfoldername/cytb_aligned_fasta.txt". cb<-read.dna(file.choose(),format="sequential") #this is the new format that seems, mysteriously, to work #another way to do this using dialogue boxes to search for the file is to use file.choose() instead of the name. This will let you go search for the file directly cb<-read.dna(file=file.choose(),format="sequential") #this is the new format that seems, mysteriously, to work #Now execute it (remember - get your cursor into the line and press ). #you now should have a file called "cb" that contains the same data that you also have open in Se-Al. Viewing it in Se-Al is a lot better, so we'll go back and forth here. #check to see that they are all there: dim() gives the dimensions of the matrix you've loaded, and the name of the file gives more information dim(cb) ##How many sequences do you have? ## ##How long is the cytb gene? ## ##Wikipedia states that the cytb protein is "about 400 amino acids long" Does that make sense with your data Why (not)? ## #Now, let's check to see which of these sites show variation across the lemurs. These are called 'segregating sites' ss<-seg.sites(cb); head(ss) ##Go back to your Se-Al alignment - do they agree on the position of the first segregating site? What bp (base pairs) do you see segregating here? Is this a transition or a transversion? ## #Let's check and record the GC content of your data, just because you can and I used to be interested in this. There is a bias towards AT substitutions in mtDNA. #the next command gives some data on the file cb ##What is the GC content? ## #Now, let's get the genetic distances between the species. We'll create three sets, based on increasingly more complex models. #raw dist is simply the percent of sites that differ between species, with no correction for multiple hits. raw<-dist.dna(cb, model="raw", as.matrix=TRUE) #let's look at some of these distances. The next command just give the first 5 rows and 5 columns raw[1:5,1:5] #jc69 corrects for multiple hits, but treats all substitutions as equally (un)likely jc69<-dist.dna(cb, model="JC69", as.matrix=TRUE) #take a look jc69[1:5,1:5] ##Which set of distances are greater? ## #plot the two sets of distances against each other, add the 1:1 line plot(raw, jc69);abline(0,1) ##Do the modeled distances show a straight one-to-one pattern with percent difference? ##Can you explain why you see this pattern? #try a much more complex model, which considers transitions and transversions and codon position tn93<-dist.dna(cb, model="TN93", gamma=2, as.matrix=TRUE) #plot the two _corrected_ distances plot(jc69, tn93); abline(0,1) ##Why is there so much more scatter now? ## #Build a distance tree - let's use the most complicated model. NJ is an algorithm that tries to minimize both the total length of the tree and any mismatch between the data and the tree - so that the distance read along the branches of the tree (called the 'patristic distance') and the distances in the matrix are as close as possible. tn_tree<-nj(tn93); #Get some information on the tree file tn_tree; #Note that these distance trees are unrooted #now plot it. The first command isn't necessary, but it gives a title to the tree plot so you don't forget what you are looking at quartz("neighbor-joining using tn_93");plot.phylo(tn_tree, no.margin=TRUE, cex=0.75); ##(You may have to resize the window a bit to get to see the whole tree.) Is Microcebus monophyletic on this tree? Why (not)? ## #Ok, let's try to root the tree. There are two designated outgroups, species that do not live on Madagascar: Nycticebus.pygmaeus and Otolemur.crassicaudatus ##Just looking at the tree, do you think you can root the tree such that these are the (exclusive) outgroups? Why (not)? ## #Root on each in turn and take a look at the trees rootnp<-root(tn_tree, "Nycticebus.pygmaeus", resolve.root=TRUE) quartz("rooted on N. pygmaeus");plot.phylo(rootnp, no.margin=TRUE, cex=0.75) rootoc<-root(tn_tree, "Otolemur.crassicaudatus", resolve.root=TRUE) quartz("rooted on O. crassicaudatus");plot.phylo(rootoc, no.margin=TRUE, cex=0.75) #Cytb evoles really fast. You can see this by checking how many segregating sites there are: try typing "length(ss)" in the command line. Something like 40% of the sites have at least one polymorphism among the species #I think we should drop these two distant outgroups species, and rebuild the tree, rooted on Daubentonia Madagascarensis. #first, drop the sequences for the first two species (the distant outgroups) from your DNA file cbnew<-cb[3:56,] #check that two species were dropped: cbnew #make a single new distance tree tn_new<-nj(dist.dna(cbnew, model="TN93", gamma=2)) #root it on Daubentonia tn_new<-root(tn_new, "Daubentonia.madagascariensis") #you can plot this if you want to... #now, make 100 bootstrap trees, and record (in bootstraps) how often each split from the first tree is found among the 100 trees bootstraps<-boot.phylo(tn_new,cbnew,function(xx) nj(dist.dna(xx, model="TN93", gamma=2, as.matrix=TRUE))) #plot the original tree, with the bootstrap values as labels quartz("root_bootstrap"); plot.phylo(tn_new, no.margin=TRUE, cex=0.75); nodelabels(bootstraps, cex=0.66, frame="none", adj=0) ##Are the highest bootstaps nearer the tips, or nearer the root? Why might that be? (Ignore the 100 bootstrap at the base of the tree, as it is there by definition - all trees are rooted on the same species.) ## ##Which genus really seems to show no sign of being monophyletic? Now look closely at the path that links them. Is the evidence strong that the genus ISN'T monophyletic? Why (not)? ## ##If you look at the Lemurs_tree.pdf file, you can see that Phaner and Daubentonia are on the longest unbroken branches, but nowhere near each other. In the cytb tree you made, Phaner seems to be outside all the other lemurs, near Daubentonia. Why might this be? ##