Here, we demonstrate BANKSY analysis on 10x Visium data of the human dorsolateral prefrontal cortex from Maynard et al (2018). The data comprise 12 samples obtained from 3 subjects, with manual annotation of the layers in each sample. We will focus on 4 of the 12 samples from subject 3, demonstrating multi-sample analysis with BANKSY.

library(Banksy)
library(SummarizedExperiment)
library(SpatialExperiment)
library(Seurat)

library(scater)
library(cowplot)
library(ggplot2)

1 Loading the data

We fetch the data for all 12 DLPFC samples with the spatialLIBD package. This might take awhile.

library(spatialLIBD)
library(ExperimentHub)

ehub <- ExperimentHub::ExperimentHub()
spe <- spatialLIBD::fetch_data(type = "spe", eh = ehub)

After the download is completed, we trim the SpatialExperiment object, retaining only the counts and some metadata such as the sample identifier and pathology annotations. This saves some memory.

imgData(spe) <- NULL
assay(spe, "logcounts") <- NULL
reducedDims(spe) <- NULL
rowData(spe) <- NULL
colData(spe) <- DataFrame(
    sample_id = spe$sample_id,
    clust_annotation = factor(
        addNA(spe$layer_guess_reordered_short),
        exclude = NULL, labels = seq(8)
    ),
    in_tissue = spe$in_tissue,
    row.names = colnames(spe)
)
invisible(gc())

Next, subset spe to samples from the last subject (samples 151673, 151674, 151675, 151676). This stores each sample in its own SpatialExperiment object, all placed in a list.

sample_names <- as.character(151673:151676)
spe_list <- lapply(sample_names, function(x) spe[, spe$sample_id == x])
rm(spe)
invisible(gc())

2 Data preprocessing

Using Seurat, we perform basic normalisation of the data, and select the top 2000 highly variable features from each sample. Other methods for normalisation and feature selection may also be used. We take the union of these features for downstream analysis.

#' Normalize data
seu_list <- lapply(spe_list, function(x) {
    x <- as.Seurat(x, data = NULL)
    NormalizeData(x, scale.factor = 5000, normalization.method = 'RC')
})

#' Compute HVGs
hvgs <- lapply(seu_list, function(x) {
    VariableFeatures(FindVariableFeatures(x, nfeatures = 2000))
})
hvgs <- Reduce(union, hvgs)

#' Add data to SpatialExperiment and subset to HVGs
aname <- "normcounts"
spe_list <- Map(function(spe, seu) {
    assay(spe, aname) <- GetAssayData(seu)
    spe[hvgs,]
    }, spe_list, seu_list)
rm(seu_list)
invisible(gc())

3 Running BANKSY

To run BANKSY across multiple samples, we first compute the BANKSY neighborhood feature matrices for each sample separately. We use k_geom=6 corresponding to the first-order neighbors in 10x Visium assays (k_geom=18 corresponding to first and second-order neighbors may also be used).

compute_agf <- FALSE
k_geom <- 6
spe_list <- lapply(spe_list, computeBanksy, assay_name = aname, 
                   compute_agf = compute_agf, k_geom = k_geom)

We then merge the samples to perform joint dimensional reduction and clustering:

spe_joint <- do.call(cbind, spe_list)
rm(spe_list)
invisible(gc())

When running multi-sample BANKSY PCA, the group argument may be provided. This specifies the grouping variable for the cells or spots across the samples. Features belonging to cells or spots corresponding to each level of the grouping variable will be z-scaled separately. In this case, sample_id in colData(spe_joint) gives the grouping based on the sample of origin.

lambda <- 0.2
use_agf <- FALSE
spe_joint <- runBanksyPCA(spe_joint, use_agf = use_agf, lambda = lambda, group = "sample_id", seed = 1000)

Run UMAP on the BANKSY embedding:

spe_joint <- runBanksyUMAP(spe_joint, use_agf = use_agf, lambda = lambda, seed = 1000)

Finally, obtain cluster labels for spots across all 4 samples. We use connectClusters for visual comparison of the manual annotations and BANKSY clusters.

res <- 0.7
spe_joint <- clusterBanksy(spe_joint, use_agf = use_agf, lambda = lambda, resolution = res, seed = 1000)
cnm <- sprintf("clust_M%s_lam%s_k50_res%s", as.numeric(use_agf), lambda, res)
spe_joint <- connectClusters(spe_joint)

Once joint clustering is performed, we split the samples into their own SpatialExperiment objects:

spe_list <- lapply(sample_names, function(x) spe_joint[, spe_joint$sample_id == x])
rm(spe_joint)
invisible(gc())

As an optional step, we smooth the cluster labels of each sample separately. This can be done if smooth spatial domains are expected in the biological sample or tissue in question.

spe_list <- lapply(spe_list, smoothLabels, cluster_names = cnm, k = 6L, verbose = FALSE)
names(spe_list) <- paste0("sample_", sample_names)

The raw and smoothed cluster labels are stored in the colData slot of each SingleCellExperiment or SpatialExperiment object.

#> DataFrame with 6 rows and 5 columns
#>                      sample_id clust_annotation in_tissue
#>                    <character>         <factor> <logical>
#> AAACAAGTATCTCCCA-1      151673                3      TRUE
#> AAACAATCTACTAGCA-1      151673                1      TRUE
#> AAACACCAATAACTGC-1      151673                7      TRUE
#> AAACAGAGCGACTCCT-1      151673                3      TRUE
#> AAACAGCTTTCAGAAG-1      151673                5      TRUE
#> AAACAGGGTCTATATT-1      151673                6      TRUE
#>                    clust_M0_lam0.2_k50_res0.7 clust_M0_lam0.2_k50_res0.7_smooth
#>                                      <factor>                          <factor>
#> AAACAAGTATCTCCCA-1                          3                                 3
#> AAACAATCTACTAGCA-1                          1                                 1
#> AAACACCAATAACTGC-1                          7                                 7
#> AAACAGAGCGACTCCT-1                          3                                 3
#> AAACAGCTTTCAGAAG-1                          6                                 6
#> AAACAGGGTCTATATT-1                          9                                 9

4 Parsing BANKSY output

We can compare BANKSY clusters to pathology annotations using several cluster comparison measures such as the adjusted Rand index (ARI) or normalized mutual information (NMI) with compareClusters. The function computes the selected comparison measure between all pairs of cluster labels:

compareClusters(spe_list$sample_151673, func = 'ARI')
#>                                   clust_annotation clust_M0_lam0.2_k50_res0.7
#> clust_annotation                             1.000                      0.535
#> clust_M0_lam0.2_k50_res0.7                   0.535                      1.000
#> clust_M0_lam0.2_k50_res0.7_smooth            0.554                      0.883
#>                                   clust_M0_lam0.2_k50_res0.7_smooth
#> clust_annotation                                              0.554
#> clust_M0_lam0.2_k50_res0.7                                    0.883
#> clust_M0_lam0.2_k50_res0.7_smooth                             1.000

We evaluate the ARI and NMI for each sample:

ari <- sapply(spe_list, function(x) as.numeric(tail(compareClusters(x, func = "ARI")[, 1], n = 1)))
ari
#> sample_151673 sample_151674 sample_151675 sample_151676 
#>         0.554         0.557         0.495         0.516

nmi <- sapply(spe_list, function(x) as.numeric(tail(compareClusters(x, func = "NMI")[, 1], n = 1)))
nmi
#> sample_151673 sample_151674 sample_151675 sample_151676 
#>         0.667         0.672         0.636         0.635

Visualise pathology annotation and BANKSY cluster on spatial coordinates with the ggspavis package:

# Use scater:::.get_palette('tableau10medium')
pal <- c(
    "#729ECE", "#FF9E4A", "#67BF5C", "#ED665D", "#AD8BC9",
    "#A8786E", "#ED97CA", "#A2A2A2", "#CDCC5D", "#6DCCDA"
)

plot_bank <- lapply(spe_list, function(x) {
    df <- cbind.data.frame(
        clust=colData(x)[[sprintf("%s_smooth", cnm)]], spatialCoords(x))
    ggplot(df, aes(x=pxl_row_in_fullres, y=pxl_col_in_fullres, col=clust)) +
        geom_point(size = 0.5) + 
        scale_color_manual(values = pal) +
        theme_classic() + 
        theme(
            legend.position = "none",
            axis.text.x=element_blank(),
            axis.text.y=element_blank(),
            axis.ticks=element_blank(),
            axis.title.x=element_blank(),
            axis.title.y=element_blank()) +
        labs(title = "BANKSY clusters") +
        coord_equal()
})

plot_anno <- lapply(spe_list, function(x) {
    df <- cbind.data.frame(
        clust=colData(x)[['clust_annotation']], spatialCoords(x))
    ggplot(df, aes(x=pxl_row_in_fullres, y=pxl_col_in_fullres, col=clust)) +
        geom_point(size = 0.5) + 
        scale_color_manual(values = pal) +
        theme_classic() + 
        theme(
            legend.position = "none",
            axis.text.x=element_blank(),
            axis.text.y=element_blank(),
            axis.ticks=element_blank(),
            axis.title.x=element_blank(),
            axis.title.y=element_blank()) +
        labs(title = sprintf("Sample %s", x$sample_id[1])) +
        coord_equal()
})

plot_list <- c(plot_anno, plot_bank)

plot_grid(plotlist = plot_list, ncol = 4, byrow = TRUE)

Visualize joint UMAPs for each sample:

umap_bank <- lapply(spe_list, function(x) {
    plotReducedDim(x,
        "UMAP_M0_lam0.2",
        colour_by = sprintf("%s_smooth", cnm),
        point_size = 0.5
    ) +
        theme(legend.position = "none") +
        labs(title = "BANKSY clusters")
})

umap_anno <- lapply(spe_list, function(x) {
    plotReducedDim(x,
        "UMAP_M0_lam0.2",
        colour_by = "clust_annotation",
        point_size = 0.5
    ) +
        theme(legend.position = "none") +
        labs(title = sprintf("Sample %s", x$sample_id[1]))
})

umap_list <- c(umap_anno, umap_bank)

plot_grid(plotlist = umap_list, ncol = 4, byrow = TRUE)

5 Session information

Vignette runtime:

#> Time difference of 2.747394 mins

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ExperimentHub_2.14.0        AnnotationHub_3.14.0       
#>  [3] BiocFileCache_2.14.0        dbplyr_2.5.0               
#>  [5] spatialLIBD_1.17.10         cowplot_1.1.3              
#>  [7] scater_1.34.0               ggplot2_3.5.1              
#>  [9] harmony_1.2.1               Rcpp_1.0.13                
#> [11] data.table_1.16.2           scran_1.34.0               
#> [13] scuttle_1.16.0              Seurat_5.1.0               
#> [15] SeuratObject_5.0.2          sp_2.1-4                   
#> [17] SpatialExperiment_1.16.0    SingleCellExperiment_1.28.0
#> [19] SummarizedExperiment_1.36.0 Biobase_2.66.0             
#> [21] GenomicRanges_1.58.0        GenomeInfoDb_1.42.0        
#> [23] IRanges_2.40.0              S4Vectors_0.44.0           
#> [25] BiocGenerics_0.52.0         MatrixGenerics_1.18.0      
#> [27] matrixStats_1.4.1           Banksy_1.2.0               
#> [29] BiocStyle_2.34.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] bitops_1.0-9             spatstat.sparse_3.1-0    doParallel_1.0.17       
#>   [4] httr_1.4.7               RColorBrewer_1.1-3       tools_4.4.1             
#>   [7] sctransform_0.4.1        DT_0.33                  utf8_1.2.4              
#>  [10] R6_2.5.1                 lazyeval_0.2.2           uwot_0.2.2              
#>  [13] withr_3.0.2              gridExtra_2.3            progressr_0.15.0        
#>  [16] cli_3.6.3                spatstat.explore_3.3-3   fastDummies_1.7.4       
#>  [19] labeling_0.4.3           sass_0.4.9               spatstat.data_3.1-2     
#>  [22] ggridges_0.5.6           pbapply_1.7-2            Rsamtools_2.22.0        
#>  [25] dbscan_1.2-0             aricode_1.0.3            dichromat_2.0-0.1       
#>  [28] sessioninfo_1.2.2        parallelly_1.38.0        attempt_0.3.1           
#>  [31] maps_3.4.2               limma_3.62.0             pals_1.9                
#>  [34] RSQLite_2.3.7            BiocIO_1.16.0            generics_0.1.3          
#>  [37] ica_1.0-3                spatstat.random_3.3-2    dplyr_1.1.4             
#>  [40] Matrix_1.7-1             ggbeeswarm_0.7.2         fansi_1.0.6             
#>  [43] abind_1.4-8              lifecycle_1.0.4          yaml_2.3.10             
#>  [46] edgeR_4.4.0              SparseArray_1.6.0        Rtsne_0.17              
#>  [49] paletteer_1.6.0          grid_4.4.1               blob_1.2.4              
#>  [52] promises_1.3.0           dqrng_0.4.1              crayon_1.5.3            
#>  [55] miniUI_0.1.1.1           lattice_0.22-6           beachmat_2.22.0         
#>  [58] mapproj_1.2.11           KEGGREST_1.46.0          magick_2.8.5            
#>  [61] pillar_1.9.0             knitr_1.48               metapod_1.14.0          
#>  [64] rjson_0.2.23             future.apply_1.11.3      codetools_0.2-20        
#>  [67] leiden_0.4.3.1           glue_1.8.0               spatstat.univar_3.0-1   
#>  [70] vctrs_0.6.5              png_0.1-8                spam_2.11-0             
#>  [73] gtable_0.3.6             rematch2_2.1.2           cachem_1.1.0            
#>  [76] xfun_0.48                S4Arrays_1.6.0           mime_0.12               
#>  [79] survival_3.7-0           RcppHungarian_0.3        iterators_1.0.14        
#>  [82] tinytex_0.53             fields_16.3              statmod_1.5.0           
#>  [85] bluster_1.16.0           fitdistrplus_1.2-1       ROCR_1.0-11             
#>  [88] nlme_3.1-166             bit64_4.5.2              filelock_1.0.3          
#>  [91] RcppAnnoy_0.0.22         bslib_0.8.0              irlba_2.3.5.1           
#>  [94] vipor_0.4.7              KernSmooth_2.23-24       colorspace_2.1-1        
#>  [97] DBI_1.2.3                tidyselect_1.2.1         bit_4.5.0               
#> [100] compiler_4.4.1           curl_5.2.3               BiocNeighbors_2.0.0     
#> [103] DelayedArray_0.32.0      plotly_4.10.4            rtracklayer_1.66.0      
#> [106] bookdown_0.41            scales_1.3.0             lmtest_0.9-40           
#> [109] rappdirs_0.3.3           stringr_1.5.1            digest_0.6.37           
#> [112] goftest_1.2-3            spatstat.utils_3.1-0     rmarkdown_2.28          
#> [115] benchmarkmeData_1.0.4    RhpcBLASctl_0.23-42      XVector_0.46.0          
#> [118] htmltools_0.5.8.1        pkgconfig_2.0.3          highr_0.11              
#> [121] fastmap_1.2.0            rlang_1.1.4              htmlwidgets_1.6.4       
#> [124] UCSC.utils_1.2.0         shiny_1.9.1              farver_2.1.2            
#> [127] jquerylib_0.1.4          zoo_1.8-12               jsonlite_1.8.9          
#> [130] BiocParallel_1.40.0      mclust_6.1.1             config_0.3.2            
#> [133] RCurl_1.98-1.16          BiocSingular_1.22.0      magrittr_2.0.3          
#> [136] GenomeInfoDbData_1.2.13  dotCall64_1.2            patchwork_1.3.0         
#> [139] munsell_0.5.1            viridis_0.6.5            reticulate_1.39.0       
#> [142] leidenAlg_1.1.4          stringi_1.8.4            zlibbioc_1.52.0         
#> [145] MASS_7.3-61              plyr_1.8.9               parallel_4.4.1          
#> [148] listenv_0.9.1            ggrepel_0.9.6            deldir_2.0-4            
#> [151] Biostrings_2.74.0        sccore_1.0.5             splines_4.4.1           
#> [154] tensor_1.5               locfit_1.5-9.10          igraph_2.1.1            
#> [157] spatstat.geom_3.3-3      RcppHNSW_0.6.0           reshape2_1.4.4          
#> [160] ScaledMatrix_1.14.0      XML_3.99-0.17            BiocVersion_3.20.0      
#> [163] evaluate_1.0.1           golem_0.5.1              BiocManager_1.30.25     
#> [166] foreach_1.5.2            httpuv_1.6.15            RANN_2.6.2              
#> [169] tidyr_1.3.1              purrr_1.0.2              polyclip_1.10-7         
#> [172] benchmarkme_1.0.8        future_1.34.0            scattermore_1.2         
#> [175] rsvd_1.0.5               xtable_1.8-4             restfulr_0.0.15         
#> [178] RSpectra_0.16-2          later_1.3.2              viridisLite_0.4.2       
#> [181] tibble_3.2.1             GenomicAlignments_1.42.0 memoise_2.0.1           
#> [184] beeswarm_0.4.0           AnnotationDbi_1.68.0     cluster_2.1.6           
#> [187] shinyWidgets_0.8.7       globals_0.16.3

Multi-sample analysis (10x Visium Human DLPFC)

29 October 2024

Contents

1 Loading the data

2 Data preprocessing

3 Running BANKSY

4 Parsing BANKSY output

5 Session information