Annotate clades
ggtree (Yu et al. 2017) implements the geom_cladelabel layer, which annotates a selected clade with a bar and a corresponding label.
The geom_cladelabel layer takes the number of an internal node. To find internal node numbers, please refer to the Tree Manipulation vignette.
library(ggtree)
## note: 2015-12-21 is evaluated as arithmetic, so this is set.seed(1982)
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 6)
p + geom_cladelabel(node=45, label="test label") +
geom_cladelabel(node=34, label="another clade")
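As a minimal sketch of how an internal node number might be looked up, the example below uses getMRCA from the ape package to find the most recent common ancestor of two tips, and prints node numbers on the tree with geom_text2. The tip names "t1" and "t2" and the variable names are chosen for illustration only.

```r
library(ape)
library(ggtree)

set.seed(1982)
tree <- rtree(30)  # tips are labelled t1 ... t30

# Internal node that is the most recent common ancestor of two tips
mrca_node <- getMRCA(tree, c("t1", "t2"))

# Show the number of every internal node next to the node itself
p_nodes <- ggtree(tree) +
  geom_text2(aes(subset = !isTip, label = node), hjust = -.3) +
  geom_tiplab()
p_nodes
```

Any number printed on the plot can then be passed as the node argument of geom_cladelabel.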
Users can set the parameter align = TRUE to align the clade labels, and use the parameter offset to adjust their position.
p + geom_cladelabel(node=45, label="test label", align=TRUE, offset=.5) +
geom_cladelabel(node=34, label="another clade", align=TRUE, offset=.5)
Users can change the color of the clade label via the parameter color.
p + geom_cladelabel(node=45, label="test label", align=TRUE, color='red') +
geom_cladelabel(node=34, label="another clade", align=TRUE, color='blue')
Users can change the angle of the clade label text, and the position of the text relative to the bar via the parameter offset.text.
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5) +
geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45)
The size of the bar and text can be changed via the parameters barsize and fontsize, respectively.
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45, fontsize=8)
Users can also use geom_label, instead of plain text, to draw the label.
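A short sketch of the geom_label variant follows. It assumes that geom_cladelabel accepts geom = "label" together with a fill colour for the label box; the tree is rebuilt here so the block runs on its own.

```r
library(ggtree)

set.seed(1982)
tree <- ape::rtree(30)
p <- ggtree(tree) + xlim(NA, 6)

# Draw the clade label inside a filled box instead of as plain text
p_label <- p + geom_cladelabel(node = 34, label = "another clade",
                               align = TRUE, geom = "label", fill = "lightblue")
p_label
```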
Annotate clades for unrooted tree
ggtree provides geom_cladelabel2 for labeling clades of unrooted layout trees.
Labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)
geom_cladelabel is designed for labelling monophyletic groups (clades), but related taxa do not always form a clade. ggtree provides geom_strip to add a strip/bar that indicates the association, with an optional label (see the issue).
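The sketch below illustrates geom_strip on a random tree. The two tips passed to each call mark the ends of the strip; the tip names, labels, and offset values are arbitrary choices for demonstration.

```r
library(ggtree)

set.seed(1982)
tree <- ape::rtree(30)  # tips are labelled t1 ... t30

# Each strip spans the tips between the two named taxa,
# whether or not they form a clade
p_strip <- ggtree(tree) + geom_tiplab() + xlim(NA, 6) +
  geom_strip("t10", "t30", barsize = 2, color = "red",
             label = "group 1", offset = 0.5, offset.text = 0.1) +
  geom_strip("t1", "t18", barsize = 2, color = "blue",
             label = "group 2", offset = 0.5, offset.text = 0.1)
p_strip
```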
Highlight clades
ggtree implements the geom_hilight layer, which accepts an internal node number and adds a rectangle layer to highlight the selected clade.
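## use the 13-tip sample tree, in which nodes 17, 21 and 23 are internal nodes
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=17, fill="darkgreen", alpha=.6)
ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=23, fill="darkgreen", alpha=.6)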
Another way to highlight selected clades is to draw them with different colors and/or line types, as demonstrated in the Tree Manipulation vignette.
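As a rough sketch of that alternative, groupClade can tag the branches of selected clades with a group variable, which is then mapped to color and line type. The node numbers 17 and 21 refer to the 13-tip sample tree shipped with treeio.

```r
library(treeio)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)

# Assign branches under nodes 17 and 21 to separate groups;
# all remaining branches fall into group 0
grouped <- groupClade(tree, c(17, 21))

p_group <- ggtree(grouped, aes(color = group, linetype = group))
p_group
```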
Highlight balances
In addition to geom_hilight, ggtree also implements geom_balance, which is designed to highlight neighboring subclades of a given internal node.
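A minimal sketch of geom_balance on the sample tree is given below; the node numbers, colors, and the extend value are illustrative choices.

```r
library(treeio)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)

# Highlight the two subclades on either side of each selected internal node
p_balance <- ggtree(tree) +
  geom_balance(node = 16, fill = "steelblue", color = "white",
               alpha = 0.6, extend = 1) +
  geom_balance(node = 19, fill = "darkgreen", color = "white",
               alpha = 0.6, extend = 1)
p_balance
```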
Highlight clades for unrooted tree
ggtree provides geom_hilight_encircle to highlight clades in unrooted layout trees.
Taxa connection
Some evolutionary events (e.g. reassortment, horizontal gene transfer) cannot be modeled by a simple tree. ggtree provides the geom_taxalink layer, which draws straight or curved lines between any two nodes in the tree, allowing it to represent such events by connecting taxa.
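The following sketch links two pairs of tips on the sample tree with geom_taxalink. The taxa pairs are arbitrary, and the arrow styling is only one possible choice.

```r
library(treeio)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)  # tips are labelled A ... M

# Connect pairs of taxa; the second link is red and ends in an arrow head
p_link <- ggtree(tree) + geom_tiplab() +
  geom_taxalink("A", "E") +
  geom_taxalink("F", "K", color = "red",
                arrow = grid::arrow(length = grid::unit(0.02, "npc")))
p_link
```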
Tree annotation with output from evolution software
The treeio package implements several parser functions to parse output from commonly used software in evolutionary biology.
Here, we use BEAST (Bouckaert et al. 2014) output as an example. For details, please refer to the Importer vignette.
library(treeio)
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast <- read.beast(file)
ggtree(beast, aes(color=rate)) +
geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2) +
geom_nodelab(aes(x=branch, label=round(posterior, 2)), vjust=-.5, size=3) +
scale_color_continuous(low="darkgreen", high="red") +
theme(legend.position=c(.1, .8))
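Before mapping variables such as rate or posterior, it can help to check which annotation fields the parser extracted. The sketch below uses get.fields from treeio for that purpose.

```r
library(treeio)

file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(file)

# Names of the per-node annotations available for aes() mappings
fields <- get.fields(beast)
print(fields)
```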
Tree annotation with user specified annotation
User data can be integrated to annotate a phylogenetic tree at different levels. The treeio package implements full_join methods to combine tree data with a phylogenetic tree object. The tidytree package supports linking tree data to a phylogeny using tidyverse verbs. ggtree supports mapping external data to a phylogeny for visualization and annotation on the fly.
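As a sketch of the first approach, the example below joins a small data frame to a tree with full_join, assuming the treeio method matches rows on a label column. The trait column is made-up data for illustration.

```r
library(treeio)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)

# Hypothetical per-tip data; 'label' must match the tip labels of the tree
trait <- data.frame(label = tree$tip.label,
                    trait = seq_along(tree$tip.label))

# Combine the tree and the data into a single tree object
joined <- full_join(tree, trait, by = "label")
joined
```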
The %<+% operator
Suppose we have the following data associated with the tree and would like to attach it to the tree.
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p <- ggtree(tree)
dd <- data.frame(taxa = LETTERS[1:13],
place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
## you don't need to order the data
## data was reshuffled just for demonstration
dd <- dd[sample(1:13, 13), ]
row.names(dd) <- NULL
taxa | place | value |
---|---|---|
D | GZ | 78.4 |
K | CZ | 72.7 |
C | GZ | 83.0 |
H | HK | 102.6 |
E | GZ | 75.3 |
M | NA | 67.1 |
J | CZ | 70.4 |
A | GZ | 51.5 |
B | GZ | 56.6 |
L | CZ | 79.6 |
F | HK | 55.9 |
I | CZ | 68.0 |
G | HK | 86.1 |
We can imagine that the place column stores the location where each species was isolated and the value column stores numerical values (e.g. bootstrap values).
We have demonstrated using the operator %<% to update a tree view with a new tree. Here, we introduce another operator, %<+%, that attaches annotation data to a tree view. The only requirement of the input data is that its first column matches the node/tip labels of the tree.
After attaching the annotation data to the tree with %<+%, all the columns in the data are visible to ggtree. As an example, here we attach the above annotation data to the tree view p, and add a layer that shows the tip labels colored by the isolation site stored in the place column.
p <- p %<+% dd + geom_tiplab(aes(color=place)) +
geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
p + theme(legend.position="right")
Once the data are attached, they stay attached, so we can easily add other layers to display this information.
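The sketch below shows this in a self-contained form: the data are attached once, and a later layer reuses the place and value columns. The seed and the text layer's positioning values are arbitrary.

```r
library(treeio)
library(ggtree)

set.seed(2015)
nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)
dd <- data.frame(taxa = LETTERS[1:13],
                 place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
                 value = round(abs(rnorm(13, mean = 70, sd = 10)), digits = 1))

# Attach the data once ...
p <- ggtree(tree) %<+% dd + geom_tiplab(aes(color = place))

# ... then any later layer can use the attached columns
p_more <- p + geom_text(aes(color = place, label = value),
                        hjust = 1, vjust = 1.4, size = 3)
p_more
```

Internal nodes have no matching row in dd, so their value is NA and no text is drawn for them.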
Visualize tree with associated matrix
The gheatmap function is designed to visualize a phylogenetic tree with a heatmap of an associated matrix.
In the following example, we visualize a tree of H3 influenza viruses with their associated genotypes.
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
genotype <- read.table(genotype_file, sep="\t", stringsAsFactors=FALSE)
colnames(genotype) <- sub("\\.$", "", colnames(genotype))
p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_treescale(x=2008, y=1, offset=2)
p <- p + geom_tiplab(size=2)
gheatmap(p, genotype, offset=5, width=0.5, font.size=3, colnames_angle=-45, hjust=0) +
scale_fill_manual(breaks=c("HuH3N2", "pdm", "trig"), values=c("steelblue", "firebrick", "darkgreen"))
The width parameter controls the width of the heatmap, and the offset parameter controls the distance between the tree and the heatmap, for instance to leave space for tip labels.
For a time-scaled tree, as in this example, it is common to display the x axis by using theme_tree2. With this solution, however, the heatmap is just another layer and will change the x axis. To overcome this issue, we implemented scale_x_ggtree to set a more reasonable x axis.
Visualize tree with multiple sequence alignment
With the msaplot function, users can visualize a multiple sequence alignment together with a phylogenetic tree, as demonstrated below:
fasta <- system.file("examples/FluA_H3_AA.fas", package="ggtree")
msaplot(ggtree(beast_tree), fasta)
A specific slice of the alignment can also be displayed by specifying the window parameter.
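A brief sketch of the window parameter follows. The range c(150, 175) is an arbitrary choice of alignment columns for illustration.

```r
library(treeio)
library(ggtree)

beast_file <- system.file("examples/MCC_FluA_H3.tree", package = "ggtree")
beast_tree <- read.beast(beast_file)
fasta <- system.file("examples/FluA_H3_AA.fas", package = "ggtree")

# Show only alignment columns 150 to 175 next to the tree
p_msa <- msaplot(ggtree(beast_tree), fasta, window = c(150, 175))
p_msa
```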
Plot tree with associated data
To associate a phylogenetic tree with different types of plots produced from user data, ggtree provides the facet_plot function, which accepts an input data.frame and a geom function to draw the input data. The data are displayed in an additional panel of the plot.
tr <- rtree(30)
d1 <- data.frame(id=tr$tip.label, val=rnorm(30, sd=3))
p <- ggtree(tr)
p2 <- facet_plot(p, panel="dot", data=d1, geom=geom_point, aes(x=val), color='firebrick')
d2 <- data.frame(id=tr$tip.label, value=abs(rnorm(30, mean=100, sd=50)))
facet_plot(p2, panel='bar', data=d2, geom=geom_segment, aes(x=0, xend=value, y=y, yend=y), size=3, color='steelblue') + theme_tree2()
Plot tree with images and subplots
Please refer to the following vignettes:
References
Bouckaert, Remco, Joseph Heled, Denise Kühnert, Tim Vaughan, Chieh-Hsi Wu, Dong Xie, Marc A. Suchard, Andrew Rambaut, and Alexei J. Drummond. 2014. “BEAST 2: A Software Platform for Bayesian Evolutionary Analysis.” PLoS Comput Biol 10 (4):e1003537. https://doi.org/10.1371/journal.pcbi.1003537.
Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.