This tutorial provides an example analysis for modelling gene detection pattern as outlined in R.Li et al, 2018. The goal of this tutorial is to provide an overview of the cell type classification and visualization tasks by learning a low dimensional embedding through a class of gene detection models: that is BFA and Binary PCA.
The following workflow summarizes a typical dimensionality reduction procedure performed by BFA or Binary PCA.
Let’s start with the installation
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("scBFA")
next we can load dependent packages
The example dataset is generated from our scRNA-seq pre-DC/cDC dataset sourced from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89232 After performing all quality control procedure of genes and cells (as outlined in the paper), we then select 500 most variable genes for illustration purpose. The example dataset consists of 950 cells and 500 genes
For illustration, here we construct a singleCellExperiment class to be the input of BFA and Binary PCA.
Let \(N\) stands for number of cells, \(G\) stands for the number of genes, and \(K\) stands for the number of latent dimensions.
A bfa model object computes the following parameters after fitting the gene detection matrix.
We choose 3 as number of latent dimensions and project the gene detection matrix on the embedding space.
We then visualize the low dimensional embedding of BFA in tSNE space. Points are colored by their corresponding cell types.
set.seed(5)
df = as.data.frame(bfa_model$ZZ)
df$celltype = celltype
p1 <- ggplot(df,aes(x = V1,y = V2,colour = celltype))
p1 <- p1 + geom_jitter(size=2.5,alpha = 0.8)
colorvalue <- c("#43d5f9","#24b71f","#E41A1C", "#ffc935","#3d014c","#39ddb2",
"slateblue2","maroon","#f7df27","palevioletred1","olivedrab3",
"#377EB8","#5043c1","blue","aquamarine2","chartreuse4",
"burlywood2","indianred1","mediumorchid1")
p1 <- p1 + xlab("tsne axis 1") + ylab("tsne axis 2")
p1 <- p1 + scale_color_manual(values = colorvalue)
p1 <- p1 + theme(panel.background = element_blank(),
legend.position = "right",
axis.text=element_blank(),
axis.line.x = element_line(color="black"),
axis.line.y = element_line(color="black"),
plot.title = element_blank()
)
p1
We then visualize the low dimensional embedding of Binary PCA in tSNE space. Points are colored by their corresponding cell types.
df = as.data.frame(bpca$x[,c(1:2)])
colnames(df) = c("V1","V2")
df$celltype = celltype
p1 <- ggplot(df,aes(x = V1,y = V2,colour = celltype))
p1 <- p1 + geom_jitter(size=2.5,alpha = 0.8)
colorvalue <- c("#43d5f9","#24b71f","#E41A1C", "#ffc935","#3d014c","#39ddb2",
"slateblue2","maroon","#f7df27","palevioletred1","olivedrab3",
"#377EB8","#5043c1","blue","aquamarine2","chartreuse4",
"burlywood2","indianred1","mediumorchid1")
p1 <- p1 + xlab("tsne axis 1") + ylab("tsne axis 2")
p1 <- p1 + scale_color_manual(values = colorvalue)
p1 <- p1 + theme(panel.background = element_blank(),
legend.position = "right",
axis.text=element_blank(),
axis.line.x = element_line(color="black"),
axis.line.y = element_line(color="black"),
plot.title = element_blank()
)
p1
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scBFA_1.20.0 ggplot2_3.5.1
#> [3] zinbwave_1.28.0 SingleCellExperiment_1.28.0
#> [5] SummarizedExperiment_1.36.0 Biobase_2.66.0
#> [7] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
#> [9] IRanges_2.40.0 S4Vectors_0.44.0
#> [11] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
#> [13] matrixStats_1.4.1
#>
#> loaded via a namespace (and not attached):
#> [1] RcppAnnoy_0.0.22 splines_4.4.1 later_1.3.2
#> [4] tibble_3.2.1 polyclip_1.10-7 XML_3.99-0.17
#> [7] fastDummies_1.7.4 lifecycle_1.0.4 edgeR_4.4.0
#> [10] globals_0.16.3 lattice_0.22-6 MASS_7.3-61
#> [13] magrittr_2.0.3 limma_3.62.0 plotly_4.10.4
#> [16] sass_0.4.9 rmarkdown_2.28 jquerylib_0.1.4
#> [19] yaml_2.3.10 httpuv_1.6.15 Seurat_5.1.0
#> [22] sctransform_0.4.1 spam_2.11-0 sp_2.1-4
#> [25] spatstat.sparse_3.1-0 reticulate_1.39.0 cowplot_1.1.3
#> [28] pbapply_1.7-2 DBI_1.2.3 RColorBrewer_1.1-3
#> [31] ADGofTest_0.3 abind_1.4-8 zlibbioc_1.52.0
#> [34] Rtsne_0.17 pspline_1.0-20 purrr_1.0.2
#> [37] GenomeInfoDbData_1.2.13 ggrepel_0.9.6 irlba_2.3.5.1
#> [40] listenv_0.9.1 spatstat.utils_3.1-0 genefilter_1.88.0
#> [43] goftest_1.2-3 RSpectra_0.16-2 spatstat.random_3.3-2
#> [46] annotate_1.84.0 fitdistrplus_1.2-1 parallelly_1.38.0
#> [49] leiden_0.4.3.1 codetools_0.2-20 DelayedArray_0.32.0
#> [52] tidyselect_1.2.1 UCSC.utils_1.2.0 farver_2.1.2
#> [55] spatstat.explore_3.3-3 jsonlite_1.8.9 progressr_0.15.0
#> [58] ggridges_0.5.6 survival_3.7-0 tools_4.4.1
#> [61] ica_1.0-3 Rcpp_1.0.13 glue_1.8.0
#> [64] gridExtra_2.3 SparseArray_1.6.0 xfun_0.48
#> [67] DESeq2_1.46.0 dplyr_1.1.4 numDeriv_2016.8-1.1
#> [70] withr_3.0.2 fastmap_1.2.0 fansi_1.0.6
#> [73] digest_0.6.37 R6_2.5.1 mime_0.12
#> [76] colorspace_2.1-1 scattermore_1.2 tensor_1.5
#> [79] spatstat.data_3.1-2 RSQLite_2.3.7 copula_1.1-4
#> [82] utf8_1.2.4 tidyr_1.3.1 generics_0.1.3
#> [85] data.table_1.16.2 httr_1.4.7 htmlwidgets_1.6.4
#> [88] S4Arrays_1.6.0 uwot_0.2.2 pkgconfig_2.0.3
#> [91] gtable_0.3.6 blob_1.2.4 lmtest_0.9-40
#> [94] XVector_0.46.0 pcaPP_2.0-5 htmltools_0.5.8.1
#> [97] dotCall64_1.2 SeuratObject_5.0.2 scales_1.3.0
#> [100] png_0.1-8 spatstat.univar_3.0-1 knitr_1.48
#> [103] reshape2_1.4.4 nlme_3.1-166 cachem_1.1.0
#> [106] zoo_1.8-12 stringr_1.5.1 KernSmooth_2.23-24
#> [109] parallel_4.4.1 miniUI_0.1.1.1 softImpute_1.4-1
#> [112] AnnotationDbi_1.68.0 pillar_1.9.0 grid_4.4.1
#> [115] vctrs_0.6.5 RANN_2.6.2 promises_1.3.0
#> [118] xtable_1.8-4 cluster_2.1.6 evaluate_1.0.1
#> [121] mvtnorm_1.3-1 cli_3.6.3 locfit_1.5-9.10
#> [124] compiler_4.4.1 rlang_1.1.4 crayon_1.5.3
#> [127] future.apply_1.11.3 labeling_0.4.3 plyr_1.8.9
#> [130] stringi_1.8.4 viridisLite_0.4.2 deldir_2.0-4
#> [133] BiocParallel_1.40.0 gsl_2.1-8 munsell_0.5.1
#> [136] Biostrings_2.74.0 lazyeval_0.2.2 spatstat.geom_3.3-3
#> [139] Matrix_1.7-1 RcppHNSW_0.6.0 stabledist_0.7-2
#> [142] patchwork_1.3.0 bit64_4.5.2 future_1.34.0
#> [145] KEGGREST_1.46.0 statmod_1.5.0 shiny_1.9.1
#> [148] highr_0.11 ROCR_1.0-11 igraph_2.1.1
#> [151] memoise_2.0.1 bslib_0.8.0 bit_4.5.0