Introduction

library(ggtree)
library(ggtreeDendro)
library(aplot)
scale_color_subtree <- ggtreeDendro::scale_color_subtree

Clustering is a very important method for classifying items into different categories and for inferring function, since similar objects tend to behave similarly. More than 200 Bioconductor packages implement clustering algorithms or employ clustering methods for omics data analysis.

Although these methods are important for data analysis, their visualization is quite limited. Most of the packages can only visualize the hierarchical tree structure using stats:::plot.hclust(). This package is designed to visualize hierarchical tree structures with associated data (e.g., clinical information collected with the samples) using the powerful, in-house developed ggtree package.

This package implements a set of autoplot() methods to display tree structures. We will implement more autoplot() methods to support more objects. The output of each autoplot() method is a ggtree object, which can be further annotated by adding layers using ggplot2 syntax. Integrating associated data to annotate the tree is also supported through the ggtreeExtra package.

Here are some demonstrations of using autoplot() methods to visualize common hierarchical clustering tree objects.

hclust and dendrogram objects

These two classes are defined in the stats package.

d <- dist(USArrests)

hc <- hclust(d, "ave")
den <- as.dendrogram(hc)

p1 <- autoplot(hc) + geom_tiplab()
p2 <- autoplot(den)
plot_list(p1, p2, ncol=2)
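
As a quick check of how the two classes relate, the following sketch uses only base R (the stats package, no plotting packages). It inspects the class of each object and confirms that converting an hclust object to a dendrogram keeps the leaf order. The variable names leaf_order_hc and leaf_order_den are illustrative only.

```r
# Rebuild the objects from the example above using only base R (stats).
d <- dist(USArrests)
hc <- hclust(d, "ave")
den <- as.dendrogram(hc)

# Each object carries its own S3 class; S3 generics such as autoplot()
# pick a method based on this class.
class(hc)
class(den)

# Leaf labels in plotting order, taken from each representation.
leaf_order_hc <- hc$labels[hc$order]
leaf_order_den <- labels(den)

# TRUE when both representations list the leaves in the same order.
all(leaf_order_hc == leaf_order_den)
```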

linkage object

The class linkage is defined in the mdendro package.

library("mdendro")
lnk <- linkage(d, digits = 1, method = "complete")
autoplot(lnk, layout = 'circular') + geom_tiplab() + 
  scale_color_subtree(4) + theme_tree()
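
The argument 4 passed to scale_color_subtree() asks for the tree to be colored by four subtrees. As a rough illustration of what such a grouping looks like, the sketch below applies stats::cutree() to a base R hclust object. That scale_color_subtree() derives its groups in the same way is an assumption here, and the hclust object is only a stand-in for the mdendro linkage object.

```r
# Base R stand-in for the linkage example: complete-linkage clustering.
d <- dist(USArrests)
hc <- hclust(d, method = "complete")

# Cut the tree into k = 4 groups. The result is a named integer vector
# giving a group label for every leaf (here, every US state).
groups <- cutree(hc, k = 4)

# Group sizes, and the group assigned to the first few states.
table(groups)
head(groups)
```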

agnes, diana and twins objects

These classes are defined in the cluster package.

library(cluster)
x1 <- agnes(mtcars)
x2 <- diana(mtcars)

p1 <- autoplot(x1) + geom_tiplab()
p2 <- autoplot(x2) + geom_tiplab()
plot_list(p1, p2, ncol=2)
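
The heading lists twins alongside agnes and diana because, in the cluster package, objects returned by agnes() and diana() also inherit from the twins class. The sketch below checks this and converts both results to hclust with as.hclust(), which cluster provides for twins objects. It only inspects the objects and draws nothing; the names hc1 and hc2 are illustrative.

```r
library(cluster)

x1 <- agnes(mtcars)  # agglomerative nesting
x2 <- diana(mtcars)  # divisive analysis

# Both results inherit from "twins" in addition to their own class.
class(x1)
class(x2)

# Agglomerative and divisive coefficients stored on the fitted objects.
x1$ac
x2$dc

# twins objects can be converted to hclust for use with base R tools.
hc1 <- as.hclust(x1)
hc2 <- as.hclust(x2)
```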

pvclust object

The pvclust class is defined in the pvclust package.

library(pvclust)
data(Boston, package = "MASS")

set.seed(123)
result <- pvclust(Boston, method.dist="cor", method.hclust="average", nboot=1000, parallel=TRUE)
## Creating a temporary cluster...done:
## socket cluster with 71 nodes on host 'localhost'
## Multiscale bootstrap... Done.
autoplot(result, label_edge=TRUE, pvrect = TRUE) + geom_tiplab()

The pvclust object contains two types of p-values: the AU (Approximately Unbiased) p-value and the BP (Bootstrap Probability) value. These values are automatically labelled on the tree.
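
To make the BP value more concrete, here is a deliberately simplified base R sketch of a bootstrap probability: resample the rows, recluster the columns, and count how often a chosen cluster reappears. It is not pvclust's multiscale bootstrap and does not compute AU values. The helper names clusters_of(), cluster_columns() and has_cluster(), the USArrests data, the target cluster, and the 200 replicates are all illustrative assumptions.

```r
# Return the leaf labels of every cluster (internal node) in an hclust tree.
clusters_of <- function(hc) {
  members <- vector("list", nrow(hc$merge))
  for (i in seq_len(nrow(hc$merge))) {
    parts <- lapply(hc$merge[i, ], function(j) {
      # Negative entries are single leaves; positive entries are earlier merges.
      if (j < 0) hc$labels[-j] else members[[j]]
    })
    members[[i]] <- sort(unlist(parts))
  }
  members
}

# Cluster the columns of x using correlation distance and average linkage.
cluster_columns <- function(x) {
  hclust(as.dist(1 - cor(x)), method = "average")
}

# Does the tree contain a cluster with exactly these leaves?
has_cluster <- function(hc, target) {
  any(vapply(clusters_of(hc), identical, logical(1), sort(target)))
}

set.seed(123)
x <- as.matrix(USArrests)
target <- c("Assault", "Murder")  # illustrative cluster to track
n_boot <- 200

hits <- replicate(n_boot, {
  idx <- sample(nrow(x), replace = TRUE)  # resample rows with replacement
  has_cluster(cluster_columns(x[idx, , drop = FALSE]), target)
})

# Fraction of bootstrap trees that contain the target cluster.
bp <- mean(hits)
bp
```

A value of bp close to 1 means the chosen cluster reappears in nearly every resampled tree, which is the intuition behind the BP labels.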

Session information

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] pvclust_2.2-0      cluster_2.1.6      mdendro_2.2.1      aplot_0.2.3       
## [5] ggtreeDendro_1.8.0 ggtree_3.14.0      yulab.utils_0.1.7 
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9         utf8_1.2.4         generics_0.1.3     tidyr_1.3.1       
##  [5] prettydoc_0.4.1    ggplotify_0.1.2    lattice_0.22-6     digest_0.6.37     
##  [9] magrittr_2.0.3     evaluate_1.0.1     grid_4.4.1         fastmap_1.2.0     
## [13] jsonlite_1.8.9     ape_5.8            purrr_1.0.2        fansi_1.0.6       
## [17] scales_1.3.0       lazyeval_0.2.2     jquerylib_0.1.4    cli_3.6.3         
## [21] rlang_1.1.4        munsell_0.5.1      tidytree_0.4.6     withr_3.0.2       
## [25] cachem_1.1.0       yaml_2.3.10        tools_4.4.1        parallel_4.4.1    
## [29] dplyr_1.1.4        colorspace_2.1-1   ggplot2_3.5.1      vctrs_0.6.5       
## [33] R6_2.5.1           gridGraphics_0.5-1 lifecycle_1.0.4    fs_1.6.4          
## [37] ggfun_0.1.7        treeio_1.30.0      pkgconfig_2.0.3    pillar_1.9.0      
## [41] bslib_0.8.0        gtable_0.3.6       glue_1.8.0         Rcpp_1.0.13       
## [45] highr_0.11         xfun_0.48          tibble_3.2.1       tidyselect_1.2.1  
## [49] knitr_1.48         farver_2.1.2       htmltools_0.5.8.1  nlme_3.1-166      
## [53] patchwork_1.3.0    labeling_0.4.3     rmarkdown_2.28     compiler_4.4.1