scTreeViz is a package for interactive visualization and exploration of single-cell RNA sequencing data. scTreeViz provides methods for exploring hierarchical features (e.g. clusters in single-cell data at different resolutions, or taxonomic hierarchy in single cell datasets), while supporting other useful data visualization charts like heatmaps for expression and scatter plots for dimensionality reductions like UMAP or TSNE.
library(scTreeViz)
library(Seurat)
library(SC3)
library(scran)
library(scater)
library(clustree)
library(igraph)
library(scRNAseq)
The first step in using the scTreeViz package is to wrap datasets into TreeViz objects. The TreeViz class extends SummarizedExperiment and provides methods to interactively operate on the underlying hierarchy and on the count or expression matrices. In this section, we show several ways to generate a TreeViz object, either from existing single-cell packages (SingleCellExperiment or Seurat) or from a raw count matrix and cluster hierarchy.
SingleCellExperiment
A number of single-cell datasets are available as SingleCellExperiment objects through the scRNAseq package. For this use case, we use the ZeiselBrainData dataset. In addition, we calculate the dimensionality reductions UMAP, TSNE and PCA with the functions provided in the scater package.
# load dataset
sce <- ZeiselBrainData()
# Normalization
sce <- logNormCounts(sce)
# calculate UMAP, TSNE and PCA
sce <- runUMAP(sce)
sce <- runTSNE(sce)
sce <- runPCA(sce)
We provide the createFromSCE function to create a TreeViz object from a SingleCellExperiment object. Here, the workflow works in two ways.
If no cluster information is available in the colData of the SingleCellExperiment object, we create clusters at different resolutions using the WalkTrap algorithm by calling an internal function generate_walktrap_hierarchy and use this cluster information for visualization.
treeViz <- createFromSCE(sce, reduced_dim = c("UMAP","PCA","TSNE"))
#> [1] "1.cluster1" "2.cluster2" "3.cluster3" "4.cluster4" "5.cluster5"
#> [6] "6.cluster6" "7.cluster7" "8.cluster8" "9.cluster9" "10.cluster10"
#> [11] "11.cluster11" "12.cluster12" "13.cluster13" "14.cluster14" "samples"
plot(treeViz)
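The internals of generate_walktrap_hierarchy are not shown here, so the following is only a minimal sketch of the general idea, assuming the igraph functions cluster_walktrap and cut_at: Walktrap yields a merge dendrogram, and cutting it at several community counts gives nested clusters at different resolutions. The example graph (Zachary's karate club, standing in for a cell-cell neighbor graph) and the chosen cluster counts are illustrative assumptions, not what the package uses.

```r
library(igraph)

# Small built-in graph used as a stand-in for a cell-cell neighbor graph
# (assumption: the real workflow builds its graph from the expression data).
g <- make_graph("Zachary")

# Walktrap community detection returns a hierarchical merge structure.
wt <- cluster_walktrap(g)

# Cut the dendrogram at several community counts (illustrative values)
# to obtain one membership vector per resolution.
ks <- c(2, 4, 8)
memberships <- lapply(ks, function(k) cut_at(wt, no = k))
hierarchy <- as.data.frame(setNames(memberships, paste0("cluster", ks)))

# One row per node, one column per resolution.
head(hierarchy)
```

Because every column comes from the same dendrogram, each finer cluster falls entirely inside one coarser cluster, which is the property a tree-based view of the clusters relies on.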
If cluster information is provided in the colData of the object, then the user should set the flag parameter check_coldata to TRUE and provide the prefix of the columns where cluster information is stored.
# Forming clusters
set.seed(1000)
for (i in seq(10)) {
  clust.kmeans <- kmeans(reducedDim(sce, "TSNE"), centers = i)
  sce[[paste0("clust", i)]] <- factor(clust.kmeans$cluster)
}
treeViz <- createFromSCE(sce, check_coldata = TRUE, col_regex = "clust")
plot(treeViz)
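To make the role of the column prefix concrete, here is a small base R sketch of how a pattern such as "clust" can pick out cluster columns from per-cell metadata. The data frame cell_data is a hypothetical stand-in for colData(sce), and both the anchored pattern and the coarse-to-fine ordering are assumptions for illustration; how createFromSCE matches and orders columns internally is not shown in this document.

```r
# Hypothetical per-cell metadata standing in for colData(sce).
cell_data <- data.frame(
  tissue = rep(c("cortex", "hippocampus"), times = 10),
  clust1 = factor(rep(1, 20)),
  clust2 = factor(rep(1:2, each = 10)),
  clust3 = factor(rep(1:4, each = 5))
)

# Select the columns whose names start with the prefix.
cluster_cols <- grep("^clust", colnames(cell_data), value = TRUE)

# Order the selected columns from coarse to fine by number of clusters.
n_clusters <- vapply(cell_data[cluster_cols], nlevels, integer(1))
cluster_cols <- cluster_cols[order(n_clusters)]
cluster_cols
```

A distinctive prefix helps here: a pattern that also matches unrelated metadata columns would pull them into the hierarchy.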
Note: In both cases the user needs to provide the names of the dimensionality reductions present in the object as a parameter.
Seurat
We use the dataset pbmc_small available through Seurat to create a TreeViz object.
data(pbmc_small)
pbmc <- pbmc_small
We then preprocess the data and find clusters at different resolutions.
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- NormalizeData(pbmc)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, vars.to.regress = "percent.mt")
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0), print.output = 0, save.SNN = TRUE)
pbmc
The measurements for the dimensionality reduction methods we want to visualize are also added to the object via native functions in Seurat. Since PCA is already added, we calculate TSNE and UMAP.
# pbmc<- RunTSNE(pbmc)
pbmc<- RunUMAP(pbmc, dims=1:3)
Reductions(pbmc)
We use the createFromSeurat function to create a TreeViz object from a Seurat object. In addition to the object, we pass the names of the dimensionality reductions present in the object as a vector parameter to indicate that these measurements should be added to the TreeViz object for visualization. If a requested reduced dimension is not present, it is simply ignored.
treeViz<- createFromSeurat(pbmc, check_metadata = TRUE, reduced_dim = c("umap","pca","tsne"))
#> [1] "6.cluster6" "10.cluster10" "11.cluster11" "samples"
#> [1] "umap" "pca" "tsne"
plot(treeViz)
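A TreeViz object can also be created from a raw count matrix and a cluster hierarchy with createTreeViz.
n <- 64
# create a hierarchy: level i splits the n samples into 2^i clusters
df <- data.frame(cluster0 = rep(1, n))
for (i in seq_len(5)) {
  df[[paste0("cluster", i)]] <- rep(seq_len(2^i), each = ceiling(n / 2^i), length.out = n)
}
# generate a count matrix with 100 features (rows) and n samples (columns)
counts <- matrix(rpois(100 * n, lambda = 10), ncol = n, nrow = 100)
colnames(counts) <- seq_len(n)
# create a `TreeViz` object
treeViz <- createTreeViz(df, counts)
plot(treeViz)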
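Before passing a hand-built hierarchy to createTreeViz, it can help to confirm that it really forms a tree, meaning every cluster at one level sits under exactly one cluster at the level above. The helper is_nested below is a hypothetical base R check written for illustration, not part of scTreeViz; it assumes the columns are ordered from coarsest to finest, as in the example above.

```r
# Rebuild the example hierarchy: level i splits 64 samples into 2^i clusters.
n <- 64
df <- data.frame(cluster0 = rep(1, n))
for (i in seq_len(5)) {
  df[[paste0("cluster", i)]] <- rep(seq_len(2^i), each = ceiling(n / 2^i), length.out = n)
}

# Hypothetical helper: TRUE if every cluster in column j + 1 has exactly
# one parent cluster in column j, for all adjacent column pairs.
is_nested <- function(hier) {
  all(vapply(seq_len(ncol(hier) - 1), function(j) {
    parents_per_child <- tapply(hier[[j]], hier[[j + 1]],
                                function(p) length(unique(p)))
    all(parents_per_child == 1)
  }, logical(1)))
}

is_nested(df)

# Number of clusters at each level of the hierarchy.
vapply(df, function(x) length(unique(x)), integer(1))
```

Clusterings obtained independently at each resolution (for example, separate k-means runs) are not guaranteed to pass such a check.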
Start the app from the treeViz object we created. This adds a facetZoom to navigate the cluster hierarchy, a heatmap of the top n most variable genes from the dataset, where n is selected by the user, and one scatter plot for each of the reduced dimensions.
app <- startTreeviz(treeViz, top_genes = 500)
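As a rough illustration of what "top n most variable genes" means, the base R sketch below ranks genes by their variance across cells and keeps the first n. The function top_variable_genes and the toy matrix are hypothetical, and plain per-gene variance is an assumption: the exact criterion startTreeviz applies for top_genes is not described here.

```r
set.seed(1)

# Toy count matrix: 100 genes (rows) by 64 cells (columns).
counts <- matrix(rpois(100 * 64, lambda = 10), nrow = 100, ncol = 64)
rownames(counts) <- paste0("gene", seq_len(100))

# Hypothetical helper: names of the n genes with the largest variance
# across cells, most variable first.
top_variable_genes <- function(mat, n) {
  gene_var <- apply(mat, 1, var)
  ranked <- names(sort(gene_var, decreasing = TRUE))
  ranked[seq_len(min(n, nrow(mat)))]
}

top5 <- top_variable_genes(counts, 5)
top5
```

Restricting the heatmap to a subset like this keeps the display manageable when the dataset has many thousands of genes.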
Users can also use the interface to explore the same dataset using different visualizations available through Epiviz.
Users can also add gene box plots, either from the frontend application or from the R session. In the following example, we visualize the 5th, 50th and 500th most variable genes as box plots.
Users need to select the Add Visualization -> Gene Box Plot option from the menu and then select the desired gene using the search pane in the dialog box that appears.
Users can also select the gene from the R session by using the plotGene command followed by the gene name.
app$plotGene(gene="AIF1")
After exploring the dataset, this command closes the websocket connection.
app$stop_app()
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scRNAseq_2.7.2 igraph_1.2.7
#> [3] clustree_0.4.3 ggraph_2.0.5
#> [5] scater_1.22.0 ggplot2_3.3.5
#> [7] scran_1.22.0 scuttle_1.4.0
#> [9] SingleCellExperiment_1.16.0 SC3_1.22.0
#> [11] SeuratObject_4.0.2 Seurat_4.0.5
#> [13] scTreeViz_1.0.0 SummarizedExperiment_1.24.0
#> [15] Biobase_2.54.0 GenomicRanges_1.46.0
#> [17] GenomeInfoDb_1.30.0 IRanges_2.28.0
#> [19] S4Vectors_0.32.0 BiocGenerics_0.40.0
#> [21] MatrixGenerics_1.6.0 matrixStats_0.61.0
#> [23] epivizr_2.24.0 BiocStyle_2.22.0
#>
#> loaded via a namespace (and not attached):
#> [1] rappdirs_0.3.3 rtracklayer_1.54.0
#> [3] scattermore_0.7 tidyr_1.1.4
#> [5] bit64_4.0.5 knitr_1.36
#> [7] irlba_2.3.3 DelayedArray_0.20.0
#> [9] data.table_1.14.2 rpart_4.1-15
#> [11] KEGGREST_1.34.0 RCurl_1.98-1.5
#> [13] AnnotationFilter_1.18.0 doParallel_1.0.16
#> [15] generics_0.1.1 GenomicFeatures_1.46.0
#> [17] ScaledMatrix_1.2.0 cowplot_1.1.1
#> [19] RSQLite_2.2.8 RANN_2.6.1
#> [21] proxy_0.4-26 future_1.22.1
#> [23] bit_4.0.4 spatstat.data_2.1-0
#> [25] xml2_1.3.2 httpuv_1.6.3
#> [27] assertthat_0.2.1 viridis_0.6.2
#> [29] xfun_0.27 hms_1.1.1
#> [31] jquerylib_0.1.4 evaluate_0.14
#> [33] promises_1.2.0.1 DEoptimR_1.0-9
#> [35] fansi_0.5.0 restfulr_0.0.13
#> [37] progress_1.2.2 dbplyr_2.1.1
#> [39] DBI_1.1.1 htmlwidgets_1.5.4
#> [41] spatstat.geom_2.3-0 purrr_0.3.4
#> [43] ellipsis_0.3.2 RSpectra_0.16-0
#> [45] backports_1.2.1 dplyr_1.0.7
#> [47] bookdown_0.24 biomaRt_2.50.0
#> [49] deldir_1.0-6 sparseMatrixStats_1.6.0
#> [51] vctrs_0.3.8 ensembldb_2.18.0
#> [53] ROCR_1.0-11 abind_1.4-5
#> [55] withr_2.4.2 cachem_1.0.6
#> [57] ggforce_0.3.3 sys_3.4
#> [59] robustbase_0.93-9 checkmate_2.0.0
#> [61] sctransform_0.3.2 GenomicAlignments_1.30.0
#> [63] prettyunits_1.1.1 goftest_1.2-3
#> [65] cluster_2.1.2 ExperimentHub_2.2.0
#> [67] lazyeval_0.2.2 crayon_1.4.1
#> [69] labeling_0.4.2 edgeR_3.36.0
#> [71] pkgconfig_2.0.3 tweenr_1.0.2
#> [73] nlme_3.1-153 vipor_0.4.5
#> [75] ProtGenerics_1.26.0 rlang_0.4.12
#> [77] globals_0.14.0 lifecycle_1.0.1
#> [79] miniUI_0.1.1.1 filelock_1.0.2
#> [81] BiocFileCache_2.2.0 rsvd_1.0.5
#> [83] AnnotationHub_3.2.0 polyclip_1.10-0
#> [85] lmtest_0.9-38 rngtools_1.5.2
#> [87] graph_1.72.0 Matrix_1.3-4
#> [89] zoo_1.8-9 beeswarm_0.4.0
#> [91] pheatmap_1.0.12 ggridges_0.5.3
#> [93] png_0.1-7 viridisLite_0.4.0
#> [95] rjson_0.2.20 bitops_1.0-7
#> [97] KernSmooth_2.23-20 Biostrings_2.62.0
#> [99] blob_1.2.2 DelayedMatrixStats_1.16.0
#> [101] doRNG_1.8.2 stringr_1.4.0
#> [103] parallelly_1.28.1 beachmat_2.10.0
#> [105] scales_1.1.1 memoise_2.0.0
#> [107] magrittr_2.0.1 plyr_1.8.6
#> [109] ica_1.0-2 zlibbioc_1.40.0
#> [111] compiler_4.1.1 dqrng_0.3.0
#> [113] BiocIO_1.4.0 RColorBrewer_1.1-2
#> [115] rrcov_1.6-0 fitdistrplus_1.1-6
#> [117] Rsamtools_2.10.0 XVector_0.34.0
#> [119] listenv_0.8.0 patchwork_1.1.1
#> [121] pbapply_1.5-0 MASS_7.3-54
#> [123] mgcv_1.8-38 tidyselect_1.1.1
#> [125] stringi_1.7.5 highr_0.9
#> [127] yaml_2.2.1 BiocSingular_1.10.0
#> [129] locfit_1.5-9.4 ggrepel_0.9.1
#> [131] grid_4.1.1 sass_0.4.0
#> [133] tools_4.1.1 future.apply_1.8.1
#> [135] parallel_4.1.1 bluster_1.4.0
#> [137] foreach_1.5.1 metapod_1.2.0
#> [139] gridExtra_2.3 farver_2.1.0
#> [141] Rtsne_0.15 digest_0.6.28
#> [143] BiocManager_1.30.16 FNN_1.1.3
#> [145] shiny_1.7.1 Rcpp_1.0.7
#> [147] BiocVersion_3.14.0 later_1.3.0
#> [149] RcppAnnoy_0.0.19 WriteXLS_6.3.0
#> [151] OrganismDbi_1.36.0 httr_1.4.2
#> [153] AnnotationDbi_1.56.0 colorspace_2.0-2
#> [155] XML_3.99-0.8 tensor_1.5
#> [157] reticulate_1.22 splines_4.1.1
#> [159] uwot_0.1.10 RBGL_1.70.0
#> [161] statmod_1.4.36 spatstat.utils_2.2-0
#> [163] graphlayouts_0.7.1 plotly_4.10.0
#> [165] xtable_1.8-4 jsonlite_1.7.2
#> [167] epivizrServer_1.22.0 tidygraph_1.2.0
#> [169] R6_2.5.1 pillar_1.6.4
#> [171] htmltools_0.5.2 mime_0.12
#> [173] glue_1.4.2 fastmap_1.1.0
#> [175] BiocParallel_1.28.0 BiocNeighbors_1.12.0
#> [177] interactiveDisplayBase_1.32.0 class_7.3-19
#> [179] codetools_0.2-18 epivizrData_1.22.0
#> [181] pcaPP_1.9-74 mvtnorm_1.1-3
#> [183] utf8_1.2.2 lattice_0.20-45
#> [185] bslib_0.3.1 spatstat.sparse_2.0-0
#> [187] tibble_3.1.5 curl_4.3.2
#> [189] ggbeeswarm_0.6.0 leiden_0.3.9
#> [191] magick_2.7.3 survival_3.2-13
#> [193] limma_3.50.0 rmarkdown_2.11
#> [195] munsell_0.5.0 e1071_1.7-9
#> [197] GenomeInfoDbData_1.2.7 iterators_1.0.13
#> [199] reshape2_1.4.4 gtable_0.3.0
#> [201] spatstat.core_2.3-0