Rescale tree

Most of the phylogenetic trees are scaled by evolutionary distance (substitution/site). In ggtree, users can re-scale a phylogenetic tree by any numerical variable inferred by evolutionary analysis (e.g. dN/dS).

beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
beast_tree
## 'beast' S4 object that stored information of
##   '/tmp/Rtmpmw360M/Rinst2ce3898b1a3/ggtree/examples/MCC_FluA_H3.tree'.
## 
## ...@ tree: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##   'height',  'height_0.95_HPD',  'height_median',    'height_range', 'length',
##   'length_0.95_HPD', 'length_median',    'length_range', 'posterior',    'rate',
##   'rate_0.95_HPD',   'rate_median',  'rate_range'.
p1 <- ggtree(beast_tree, mrsd='2013-01-01') + theme_tree2() +
    ggtitle("Divergence time")
p2 <- ggtree(beast_tree, branch.length = 'rate') + theme_tree2() +
    ggtitle("Substitution rate")
multiplot(p1, p2, ncol=2)

mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="ggtree")
mlc_tree <- read.codeml_mlc(mlcfile)
p1 <- ggtree(mlc_tree) + theme_tree2() +
    ggtitle("nucleotide substitutions per codon")
p2 <- ggtree(mlc_tree, branch.length='dN_vs_dS') + theme_tree2() +
    ggtitle("dN/dS tree")
multiplot(p1, p2, ncol=2)

In addition to specify branch.length in tree visualization, users can change branch length stored in tree object by using rescale_tree function.

beast_tree2 <- rescale_tree(beast_tree, branch.length = 'rate')
ggtree(beast_tree2) + theme_tree2()

Zoom on a portion of tree

ggtree provides gzoom function that similar to zoom function provided in ape. This function plots simultaneously a whole phylogenetic tree and a portion of it. It aims at exploring very large trees.

library("ape")
data(chiroptera)
library("ggtree")
gzoom(chiroptera, grep("Plecotus", chiroptera$tip.label))

Zoom in selected clade of a tree that was already annotated with ggtree is also supported.

groupInfo <- split(chiroptera$tip.label, gsub("_\\w+", "", chiroptera$tip.label))
chiroptera <- groupOTU(chiroptera, groupInfo)
p <- ggtree(chiroptera, aes(color=group)) + geom_tiplab() + xlim(NA, 23)
gzoom(p, grep("Plecotus", chiroptera$tip.label), xmax_adjust=2)

Color tree

In ggtree, coloring phylogenetic tree is easy, by using aes(color=VAR) to map the color of tree based on a specific variable (numeric and category are both supported).

ggtree(beast_tree, aes(color=rate)) +
    scale_color_continuous(low='darkgreen', high='red') +
    theme(legend.position="right")

User can use any feature (if available), including clade posterior and dN/dS etc., to scale the color of the tree.

Annotate clades

ggtree implements geom_cladelabel layer to annotate a selected clade with a bar indicating the clade with a corresponding label.

The geom_cladelabel layer accepts a selected internal node number. To get the internal node number, please refer to Tree Manipulation vignette.

set.seed(2015-12-21)
tree = rtree(30)
p <- ggtree(tree) + xlim(NA, 6)

p+geom_cladelabel(node=45, label="test label") +
    geom_cladelabel(node=34, label="another clade") 

Users can set the parameter, align = TRUE, to align the clade label, and use the parameter, offset, to adjust the position.

p+geom_cladelabel(node=45, label="test label", align=TRUE, offset=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, offset=.5)

Users can change the color of the clade label via the parameter color.

p+geom_cladelabel(node=45, label="test label", align=T, color='red') +
    geom_cladelabel(node=34, label="another clade", align=T, color='blue')

Users can change the angle of the clade label text and relative position from text to bar via the parameter offset.text.

p+geom_cladelabel(node=45, label="test label", align=T, angle=270, hjust='center', offset.text=.5) +
    geom_cladelabel(node=34, label="another clade", align=T, angle=45) 

The size of the bar and text can be changed via the parameters barsize and fontsize respectively.

p+geom_cladelabel(node=45, label="test label", align=T, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
    geom_cladelabel(node=34, label="another clade", align=T, angle=45, fontsize=8) 

Users can also use geom_label to label the text.

p+ geom_cladelabel(node=34, label="another clade", align=T, geom='label', fill='lightblue') 

Highlight clades

ggtree implements geom_hilight layer, that accepts an internal node number and add a layer of rectangle to highlight the selected clade.

nwk <- system.file("extdata", "sample.nwk", package="ggtree")
tree <- read.tree(nwk)
ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=17, fill="darkgreen", alpha=.6)

ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=23, fill="darkgreen", alpha=.6)

Another way to highlight selected clades is setting the clades with different colors and/or line types as demonstrated in Tree Manipulation vignette.

Highlight balances

In addition to geom_hilight, ggtree also implements geom_balance which is designed to highlight neighboring subclades of a given internal node.

ggtree(tree) + 
  geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
  geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)

labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)

geom_cladelabel is designed for labelling Monophyletic (Clade) while there are related taxa that are not form a clade. ggtree provides geom_strip to add a strip/bar to indicate the association with optional label (see the issue).

ggtree(tree) + geom_tiplab() + geom_strip(5, 7, barsize=2, color='red') + geom_strip(6, 12, barsize=2, color='blue')

taxa connection

Some evolutionary events (e.g. reassortment, horizontal gene transfer) can be modeled by a simple tree. ggtree provides geom_taxalink layer that allows drawing straight or curved lines between any of two nodes in the tree, allow it to represent evolutionary events by connecting taxa.

ggtree(tree) + geom_tiplab() + geom_taxalink('A', 'E') + geom_taxalink('F', 'K', color='red', arrow=grid::arrow(length = grid::unit(0.02, "npc")))

Tree annotation with analysis of R packages

annotating tree with ape bootstraping analysis

library(ape)
data(woodmouse)
d <- dist.dna(woodmouse)
tr <- nj(d)
bp <- boot.phylo(tr, woodmouse, function(xx) nj(dist.dna(xx)))
tree <- apeBoot(tr, bp)
ggtree(tree) + geom_label(aes(label=bootstrap)) + geom_tiplab()

annotating tree with phangorn output

library(phangorn)
treefile <- system.file("extdata", "pa.nwk", package="ggtree")
tre <- read.tree(treefile)
tipseqfile <- system.file("extdata", "pa.fas", package="ggtree")
tipseq <- read.phyDat(tipseqfile,format="fasta")
fit <- pml(tre, tipseq, k=4)
fit <- optim.pml(fit, optNni=FALSE, optBf=T, optQ=T,
                 optInv=T, optGamma=T, optEdge=TRUE,
                 optRooted=FALSE, model = "GTR")

phangorn <- phyPML(fit, type="ml")
ggtree(phangorn) + geom_text(aes(x=branch, label=AA_subs, vjust=-.5))

Tree annotation with output from evolution software

In ggtree, we implemented several parser functions to parse output from commonly used software package in evolutionary biology, including:

Evolutionary evidences inferred by these software packages can be used for further analysis in R and annotating phylogenetic tree directly in ggtree. For more details, please refer to the Tree Data Import vignette.

Tree annotation with user specified annotation

the %<+% operator

In addition to parse commonly used software output, ggtree also supports annotating a phylogenetic tree using user’s own data.

Suppose we have the following data that associate with the tree and would like to attach the data in the tree.

nwk <- system.file("extdata", "sample.nwk", package="ggtree")
tree <- read.tree(nwk)
p <- ggtree(tree)

dd <- data.frame(taxa  = LETTERS[1:13], 
                 place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
                 value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
## you don't need to order the data
## data was reshuffled just for demonstration
dd <- dd[sample(1:13, 13), ]
row.names(dd) <- NULL
print(dd)
taxa place value
C GZ 60.9
H HK 82.2
I CZ 71.3
D GZ 109.3
G HK 78.5
K CZ 70.1
J CZ 74.3
M NA 75.6
E GZ 82.0
A GZ 70.7
L CZ 68.6
F HK 84.5
B GZ 72.7

We can imaging that the place column stores the location that we isolated the species and value column stores numerical values (e.g. bootstrap values).

We have demonstrated using the operator, %<%, to update a tree view with a new tree. Here, we will introduce another operator, %<+%, that attaches annotation data to a tree view. The only requirement of the input data is that its first column should be matched with the node/tip labels of the tree.

After attaching the annotation data to the tree by %<+%, all the columns in the data are visible to ggtree. As an example, here we attach the above annotation data to the tree view, p, and add a layer that showing the tip labels and colored them by the isolation site stored in place column.

p <- p %<+% dd + geom_tiplab(aes(color=place)) + 
       geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
p+theme(legend.position="right")

Once the data was attached, it is always attached. So that we can add other layers to display these information easily.

p + geom_text(aes(color=place, label=place), hjust=1, vjust=-0.4, size=3) +
    geom_text(aes(color=place, label=value), hjust=1, vjust=1.4, size=3)

phylo4d

phylo4d was defined in the phylobase package, which can be employed to integrate user’s data with phylogenetic tree. phylo4d was supported in ggtree and the data stored in the object can be used directly to annotate the tree.

dd2 <- dd[, -1]
rownames(dd2) <- dd[,1]
require(phylobase)
tr2 <- phylo4d(tree, dd2)
ggtree(tr2) + geom_tiplab(aes(color=place)) + 
    geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)

jplace file format

ggtree provides write.jplace() function to store user’s own data and associated newick tree to a single jplace file, which can be parsed directly in ggtree and user’s data can be used to annotate the tree directly. For more detail, please refer to the Tree Data Import vignette.

Advance tree annotation

Advance tree annotation including visualizing phylogenetic tree with associated matrix and multiple sequence alignment; annotating tree with subplots and images (especially PhyloPic). For details and examples, please refer to the Advance Tree Annotation vignette.

Interactive tree annotation

Interactive tree annotation is also possible, please refer to https://guangchuangyu.github.io/2016/06/identify-method-for-ggtree.

References

1. Bouckaert, R. et al. BEAST 2: A software platform for bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).

2. Berger, S. A., Krompass, D. & Stamatakis, A. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology 60, 291–302 (2011).

3. Pond, S. L. K., Frost, S. D. W. & Muse, S. V. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).

4. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).

5. Boussau, B. et al. Genome-scale coestimation of species and gene trees. Genome Res. 23, 323–330 (2013).

6. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. Pplacer: Linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).

7. Marazzi, B. et al. Locating evolutionary precursors on a phylogenetic tree. Evolution 66, 3918–3930 (2012).

8. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics btu033 (2014). doi:10.1093/bioinformatics/btu033

9. Höhna, S. et al. Probabilistic graphical model representation in phylogenetics. Syst Biol 63, 753–771 (2014).