scaeData User Guide
scaeData 1.2.0
scaeData is a complementary package to the Bioconductor package SingleCellAlleleExperiment. It contains three datasets for testing functions in SingleCellAlleleExperiment: pbmc_5k, pbmc_10k and pbmc_20k.
The raw FASTQs for all three datasets were sourced from publicly accessible datasets provided by 10x Genomics.
After downloading the raw data, the scIGD Snakemake workflow was used to perform HLA allele typing and to generate allele-specific quantification from the scRNA-seq data using donor-specific references.
From Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scaeData")
Alternatively, a development version is available on GitHub and can be installed via:
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("AGImkeller/scaeData", build_vignettes = TRUE)
The datasets within scaeData are accessible using the scaeDataGet() function:
library("scaeData")
pbmc_5k <- scaeDataGet("pbmc_5k")
pbmc_10k <- scaeDataGet("pbmc_10k")
For example, we can view pbmc_20k:
pbmc_20k <- scaeDataGet("pbmc_20k")
## Retrieving barcode identifiers for **pbmc 20k** dataset...DONE
## Retrieving feature identifiers for **pbmc 20k** dataset...DONE
## Retrieving quantification matrix for **pbmc 20k** dataset...DONE
pbmc_20k
## $dir
## [1] "/home/biocbuild/.cache/R/ExperimentHub/"
##
## $barcodes
## [1] "5e87d79b9f122_9525"
##
## $features
## [1] "5e87d7285a8ab_9526"
##
## $matrix
## [1] "5e87d4c3f2c2c_9527"
# Build the full path to each cached file
cells_path <- file.path(pbmc_20k$dir, pbmc_20k$barcodes)
features_path <- file.path(pbmc_20k$dir, pbmc_20k$features)
mat_path <- file.path(pbmc_20k$dir, pbmc_20k$matrix)
# Read the barcodes, the feature identifiers and the sparse count matrix
cells <- utils::read.csv(cells_path, sep = "", header = FALSE)
features <- utils::read.delim(features_path, header = FALSE)
mat <- Matrix::readMM(mat_path)
# Rows are cell barcodes, columns are features
rownames(mat) <- cells$V1
colnames(mat) <- features$V1
head(mat)
## 6 x 62760 sparse Matrix of class "dgTMatrix"
## [[ suppressing 34 column names 'ENSG00000279928.2', 'ENSG00000228037.1', 'ENSG00000142611.17' ... ]]
##
## AAACCCAAGAAACACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAACTCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAATTGC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACAAGG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACAGGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACCGCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AAACCCAAGAAACACT . . . ......
## AAACCCAAGAAACTCA . . . ......
## AAACCCAAGAAATTGC . . . ......
## AAACCCAAGAACAAGG . . . ......
## AAACCCAAGAACAGGA . . . ......
## AAACCCAAGAACCGCA . . . ......
##
## .....suppressing 62726 columns in show(); maybe adjust options(max.print=, width=)
## ..............................
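The read-in steps above can be wrapped in a small helper. The following is only a sketch: the function name read_scae_matrix() and the toy file names are hypothetical and not part of scaeData, and the toy files are written to a temporary directory so that the example runs without downloading anything. It assumes, as in the output above, that the matrix rows correspond to barcodes and the columns to features.

```r
library(Matrix)

# Hypothetical helper (not part of scaeData): read the barcode, feature and
# matrix files from one directory and return a labelled sparse matrix.
read_scae_matrix <- function(dir, barcodes, features, matrix) {
  cells <- utils::read.csv(file.path(dir, barcodes), sep = "", header = FALSE)
  feats <- utils::read.delim(file.path(dir, features), header = FALSE)
  mat <- Matrix::readMM(file.path(dir, matrix))
  # Fail early if the identifier files do not match the matrix dimensions
  stopifnot(nrow(mat) == nrow(cells), ncol(mat) == nrow(feats))
  rownames(mat) <- cells$V1
  colnames(mat) <- feats$V1
  mat
}

# Toy data: 3 made-up barcodes x 2 made-up features, written to a temp directory
tmp <- tempfile("scae_toy_")
dir.create(tmp)
writeLines(c("AAAC", "AAAG", "AAAT"), file.path(tmp, "barcodes.txt"))
writeLines(c("GENE1", "GENE2"), file.path(tmp, "features.txt"))
toy <- Matrix::sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(5, 2),
                            dims = c(3, 2))
Matrix::writeMM(toy, file.path(tmp, "matrix.mtx"))

toy_mat <- read_scae_matrix(tmp, "barcodes.txt", "features.txt", "matrix.mtx")
toy_mat
```

With a downloaded dataset, the same helper could be called as read_scae_matrix(pbmc_20k$dir, pbmc_20k$barcodes, pbmc_20k$features, pbmc_20k$matrix).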
A SingleCellAlleleExperiment object, scae for short, can be generated using the read_allele_counts() function from the SingleCellAlleleExperiment package.
Each dataset has a corresponding lookup table, which is used to create the additional data layers during object generation. The lookup tables are shipped in the package’s extdata directory:
lookup <- read.csv(system.file("extdata", "pbmc_20k_lookup_table.csv", package="scaeData"))
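Only the file name of the pbmc_20k lookup table is given here. To find the lookup tables for the other datasets, one option is to list the CSV files in the extdata directory rather than guessing their names. The sketch below assumes that the lookup tables are stored there as .csv files; the variable names are illustrative.

```r
# Locate the extdata directory of scaeData; system.file() returns ""
# if the package is not installed.
extdata_dir <- system.file("extdata", package = "scaeData")

# List the CSV files found there (assumption: lookup tables are .csv files)
lookup_files <- if (nzchar(extdata_dir)) {
  list.files(extdata_dir, pattern = "\\.csv$", full.names = TRUE)
} else {
  character(0)
}
basename(lookup_files)

# Inspect the structure of the first table, if any was found
if (length(lookup_files) > 0) {
  lookup_preview <- utils::read.csv(lookup_files[1])
  str(lookup_preview)
}
```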
library("SingleCellAlleleExperiment")
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
scae_20k <- read_allele_counts(pbmc_20k$dir,
                               sample_names = "example_data",
                               filter_mode = "no",
                               lookup_file = lookup,
                               barcode_file = pbmc_20k$barcodes,
                               gene_file = pbmc_20k$features,
                               matrix_file = pbmc_20k$matrix,
                               verbose = TRUE)
## Filtering performed on default value at 0 UMI counts.
## Data Read_in completed
## Generating SCAE object: Extending rowData with new classifiers
## Generating SCAE object: Filtering at 0 UMI counts.
## Generating SCAE object: Aggregating alleles corresponding to the same gene
## Generating SCAE object: Aggregating genes corresponding to the same functional groups
## SingleCellAlleleExperiment object completed
scae_20k
## class: SingleCellAlleleExperiment
## dim: 62772 1746519
## metadata(0):
## assays(1): counts
## rownames(62772): ENSG00000279928.2 ENSG00000228037.1 ... HLA_class_I
## HLA_class_II
## rowData names(3): Ensembl_ID NI_I Quant_type
## colnames(1746519): AAACCCAAGAAACACT AAACCCAAGAAACTCA ...
## TTTGTTGTCTTTGCTA TTTGTTGTCTTTGGAG
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## ---------------
## Including a total of 29 immune related features:
## Allele-level information (17): A*02:01:01:01 A*24:02:01:01 ...
## DPB1*03:01:01:01 DPB1*11:01:01:01
## Immune genes (10): HLA-A HLA-B ... HLA-DPA1 HLA-DPB1
## Functional level information (2): HLA_class_I HLA_class_II
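The object summary lists two classification columns in rowData, NI_I and Quant_type. As a minimal sketch, they can be cross-tabulated to see how many features fall into each layer. The helper name summarise_layers() is hypothetical, no particular label values are assumed, and the accessors used are the generic ones from SummarizedExperiment.

```r
# Hypothetical helper (not part of SingleCellAlleleExperiment): cross-tabulate
# the two classification columns listed under "rowData names" above.
summarise_layers <- function(row_data) {
  table(NI_I = row_data$NI_I,
        Quant_type = row_data$Quant_type,
        useNA = "ifany")
}

# Runs only if the scae_20k object from the previous step exists
if (exists("scae_20k")) {
  print(summarise_layers(SummarizedExperiment::rowData(scae_20k)))
  # Dimensions of the single "counts" assay
  print(dim(SummarizedExperiment::assay(scae_20k, "counts")))
}
```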
Please refer to the vignette and documentation of SingleCellAlleleExperiment for further work with this kind of data container.
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellAlleleExperiment_1.2.0 SingleCellExperiment_1.28.0
## [3] SummarizedExperiment_1.36.0 Biobase_2.66.0
## [5] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
## [7] IRanges_2.40.0 S4Vectors_0.44.0
## [9] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
## [11] matrixStats_1.4.1 scaeData_1.2.0
## [13] BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.46.0 xfun_0.48 bslib_0.8.0
## [4] lattice_0.22-6 vctrs_0.6.5 tools_4.4.1
## [7] generics_0.1.3 parallel_4.4.1 curl_5.2.3
## [10] tibble_3.2.1 fansi_1.0.6 AnnotationDbi_1.68.0
## [13] RSQLite_2.3.7 blob_1.2.4 pkgconfig_2.0.3
## [16] Matrix_1.7-1 dbplyr_2.5.0 lifecycle_1.0.4
## [19] GenomeInfoDbData_1.2.13 compiler_4.4.1 Biostrings_2.74.0
## [22] codetools_0.2-20 htmltools_0.5.8.1 sass_0.4.9
## [25] yaml_2.3.10 pillar_1.9.0 crayon_1.5.3
## [28] jquerylib_0.1.4 BiocParallel_1.40.0 DelayedArray_0.32.0
## [31] cachem_1.1.0 abind_1.4-8 mime_0.12
## [34] ExperimentHub_2.14.0 AnnotationHub_3.14.0 tidyselect_1.2.1
## [37] digest_0.6.37 dplyr_1.1.4 purrr_1.0.2
## [40] bookdown_0.41 BiocVersion_3.20.0 grid_4.4.1
## [43] fastmap_1.2.0 SparseArray_1.6.0 cli_3.6.3
## [46] magrittr_2.0.3 S4Arrays_1.6.0 utf8_1.2.4
## [49] withr_3.0.2 filelock_1.0.3 UCSC.utils_1.2.0
## [52] rappdirs_0.3.3 bit64_4.5.2 rmarkdown_2.28
## [55] XVector_0.46.0 httr_1.4.7 bit_4.5.0
## [58] png_0.1-8 memoise_2.0.1 evaluate_1.0.1
## [61] knitr_1.48 BiocFileCache_2.14.0 rlang_1.1.4
## [64] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.25
## [67] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.52.0