The R package eds provides a single function, readEDS, for efficiently reading Alevin’s EDS format for single-cell count data into R, using the sparse matrix format from the Matrix package.
Note: eds provides a low-level function, readEDS, which most users will not need to call directly. Most users and developers will likely prefer tximport (for importing matrices) or tximeta (for easy conversion to SingleCellExperiment objects). This package exists primarily to streamline the dependency graph of other packages.
EDS stands for Efficient single cell binary Data Storage, a format for cell-by-feature count matrices.
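To give a feel for why such a format can be compact, the sketch below encodes one cell's counts as a presence flag per gene, packed into bytes, plus only the nonzero values, and then decodes them again. That per-cell layout is an assumption drawn from the EDS project rather than something this vignette states. The helpers `encode_cell()` and `decode_cell()` are hypothetical: they are not part of eds, and they do not reproduce the on-disk details (bit order, numeric type, gzip compression) that readEDS handles.

```r
# Hypothetical helpers, NOT part of eds. They model an ASSUMED EDS-like
# per-cell record: one presence bit per gene plus the nonzero counts only.
# Bit order, value type and compression of real Alevin files are not modelled.

encode_cell <- function(counts) {
  present <- counts != 0
  # packBits(type = "raw") needs a multiple of 8 bits, so pad with FALSE
  pad <- rep(FALSE, (-length(present)) %% 8)
  list(flags = packBits(c(present, pad), type = "raw"),
       values = counts[present])
}

decode_cell <- function(record, n.genes) {
  # rawToBits() undoes packBits(); keep only the first n.genes bits
  present <- as.integer(rawToBits(record$flags))[seq_len(n.genes)] == 1L
  counts <- numeric(n.genes)
  counts[present] <- record$values
  counts
}

counts <- c(0, 3, 0, 0, 1, 0, 0, 0, 0, 7)  # toy counts for 10 genes
rec <- encode_cell(counts)
length(rec$flags)  # 2 bytes of flags for 10 genes
rec$values         # only the 3 nonzero counts are stored
identical(decode_cell(rec, length(counts)), counts)  # round trip
```

Under this assumed layout the flag block costs about one byte per eight genes for every cell, so the saving over a dense vector grows as fewer genes are detected per cell.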
For more details on the EDS format see the following repository:
The following example is the same as the one found in ?readEDS. First, we point to the EDS files output by Alevin:
library(tximportData)
library(eds)
## Loading required package: Matrix
dir0 <- system.file("extdata",package="tximportData")
samps <- list.files(file.path(dir0, "alevin"))
dir <- file.path(dir0,"alevin",samps[3],"alevin")
quant.mat.file <- file.path(dir, "quants_mat.gz")
barcode.file <- file.path(dir, "quants_mat_rows.txt")
gene.file <- file.path(dir, "quants_mat_cols.txt")
readEDS() requires the number of cells and genes, which we obtain by reading the associated barcode and gene files. Again, note that more convenient functions for reading Alevin data are tximport (for matrices) and tximeta (for easy conversion to SingleCellExperiment objects).
cell.names <- readLines(barcode.file)
gene.names <- readLines(gene.file)
num.cells <- length(cell.names)
num.genes <- length(gene.names)
Finally, reading in the sparse matrix is accomplished with:
mat <- readEDS(
numOfGenes=num.genes,
numOfOriginalCells=num.cells,
countMatFilename=quant.mat.file)
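The steps above can be bundled into a small function that also attaches gene and barcode names to the result. This is a minimal sketch: `read_alevin_eds()` is a hypothetical helper, not part of eds, and it assumes that readEDS() returns a sparse matrix with genes in rows and cells in columns, an assumption the code checks with stopifnot() before assigning names. For real analyses, tximport or tximeta remain the better choice.

```r
library(eds)     # provides readEDS()
library(Matrix)  # sparse matrix classes and colSums() methods

# Hypothetical helper (not part of eds): read one Alevin output directory
# containing quants_mat.gz, quants_mat_rows.txt and quants_mat_cols.txt.
read_alevin_eds <- function(dir) {
  cell.names <- readLines(file.path(dir, "quants_mat_rows.txt"))  # barcodes
  gene.names <- readLines(file.path(dir, "quants_mat_cols.txt"))  # genes
  mat <- readEDS(
    numOfGenes = length(gene.names),
    numOfOriginalCells = length(cell.names),
    countMatFilename = file.path(dir, "quants_mat.gz"))
  # Assumption: genes are rows and cells are columns; fail loudly otherwise
  stopifnot(nrow(mat) == length(gene.names),
            ncol(mat) == length(cell.names))
  dimnames(mat) <- list(gene.names, cell.names)
  mat
}

# Usage with the example data shipped in tximportData
dir0 <- system.file("extdata", package = "tximportData")
samps <- list.files(file.path(dir0, "alevin"))
mat.named <- read_alevin_eds(file.path(dir0, "alevin", samps[3], "alevin"))
head(Matrix::colSums(mat.named))  # total counts per cell barcode
```

Keeping the names on the matrix makes downstream subsetting by gene or barcode straightforward, which is the main convenience tximport adds on top of readEDS.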
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] eds_1.8.0 Matrix_1.7-1 tximportData_1.33.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.48
## [5] lattice_0.22-6 cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1
## [9] rmarkdown_2.28 lifecycle_1.0.4 cli_3.6.3 grid_4.4.1
## [13] sass_0.4.9 jquerylib_0.1.4 compiler_4.4.1 tools_4.4.1
## [17] evaluate_1.0.1 bslib_0.8.0 Rcpp_1.0.13 yaml_2.3.10
## [21] rlang_1.1.4 jsonlite_1.8.9