This vignette will demonstrate a full single-cell lineage analysis workflow,
with particular emphasis on the processes of lineage reconstruction and
pseudotime inference. We will make use of the slingshot
package proposed in
(Street et al. 2017) and show how it may be applied in a broad range of settings.
The goal of slingshot
is to use clusters of cells to uncover global structure
and convert this structure into smooth lineages represented by one-dimensional
variables, called “pseudotime.” We provide tools for learning cluster
relationships in an unsupervised or semi-supervised manner and constructing
smooth curves representing each lineage, along with visualization methods for
each step.
The minimal input to slingshot
is a matrix representing the cells in a
reduced-dimensional space and a vector of cluster labels. With these two
inputs, we then:
getLineages
function.getCurves
function.We will work with two simulated datasets in this vignette. The first was
generated by the splatter
package (Zappia, Phipson, and Oshlack 2017), using parameters inferred from
the single-cell dataset contained in the HSMMSingleCell
package. This dataset
is contained in a SingleCellExperiment
object (Lun and Risso 2017) and
will be used to demonstrate a full “start-to-finish” workflow. It can be loaded
via the splatterPath
dataset, provided with slingshot
. Below, we provide
code showing how the dataset was generated.
library(HSMMSingleCell, quietly = TRUE)
library(splatter, quietly = TRUE)
# load data
data("HSMM_expr_matrix")
counts <- round(HSMM_expr_matrix)
# estimate simulation parameters
params <- splatEstimate(counts)
# set path-related parameters
params <- setParams(params, update = list(batchCells = 200,
path.nonlinearProb = .5,
de.prob = .6, de.facLoc = 1,
de.facScale = 2))
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
## anyMissing, rowMedians
##
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
##
## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following objects are masked from 'package:base':
##
## aperm, apply
# simulate
sim <- splatSimulate(params, method = "paths")
## Getting parameters...
## Creating simulation object...
## Simulating library sizes...
## Simulating gene means...
## Simulating path endpoints...
## Simulating path steps...
## Simulating BCV...
## Simulating counts...
## Simulating dropout (if needed)...
## Done!
The second dataset consists of a \(140\) cells \(\times\) \(2\) dimensions
matrix of coordinates, along with cluster labels generated by \(k\)-means
clustering. This dataset represents a bifurcating lineage trajectory and it will
allow us to demonstrate some of the additional functionality offered by
slingshot
.
library(slingshot, quietly = FALSE)
data("slingshotExample")
dim(rd) # data representing cells in a reduced dimensional space
## [1] 140 2
length(cl) # vector of cluster labels
## [1] 140
To begin our analysis of the single lineage dataset, we need to reduce the dimensionality of our data and filtering out uninformative genes is a typical first step. This will greatly improve the speed of downstream analyses, while keeping the loss of information to a minimum.
For the gene filtering step, we retained any genes robustly expressed in at least enough cells to constitute a cluster, making them potentially interesting cell-type marker genes. We set this minimum cluster size to 10 cells and define a gene as being “robustly expressed” if it has a simulated count of at least 10 reads.
# filter genes down to potential cell-type markers
# at least M (15) reads in at least N (15) cells
geneFilter <- apply(assays(sim)$counts,1,function(x){
sum(x >= 15) >= 15
})
sim <- sim[geneFilter, ]
Another important early step in most RNA-Seq analysis pipelines is the choice of
normalization method. This allows us to remove unwanted technical or biological
artifacts from the data, such as batch, sequencing depth, cell cycle effects,
etc. In practice, it is valuable to compare a variety of normalization
techniques and compare them along different evaluation criteria, for which we
recommend the scone
package (Cole and Risso 2018).
Since we are working with simulated data, we have the advantage of knowing what effects are present. In this case, we know that each simulated cell had a unique expected library size and we will apply quantile normalization in order to remove this effect.
FQnorm <- function(counts){
rk <- apply(counts,2,rank,ties.method='min')
counts.sort <- apply(counts,2,sort)
refdist <- apply(counts.sort,1,median)
norm <- apply(rk,2,function(r){ refdist[r] })
rownames(norm) <- rownames(counts)
return(norm)
}
assays(sim)$norm <- FQnorm(assays(sim)$counts)
The fundamental assumption of slingshot
is that cells which are
transcriptionally similar will be close to each other in some
reduced-dimensional space. Since we use Euclidean distances in constructing
lineages and measuring pseudotime, it is important to have a low-dimensional
representation of the data.
There are many methods available for this task and we will intentionally avoid
the issue of determining which is the “best” method, as this likely depends on
the type of data, method of collection, upstream computational choices, and many
other factors. We will demonstrate two methods of dimensionality reduction:
principal components analysis (PCA) and diffusion maps (via the destiny
package, (Angerer et al. 2015)).
When performing PCA, we do not scale the genes by their variance because we do not believe that all genes are equally informative. We want to find signal in the robustly expressed, highly variable genes, not dampen this signal by forcing equal variance across genes. When plotting, we make sure to set the aspect ratio, so as not to distort the perceived distances.
pca <- prcomp(t(log1p(assays(sim)$norm)), scale. = FALSE)
rd1 <- pca$x[,1:2]
plot(rd1, col = topo.colors(100)[sim$Step], pch=16, asp = 1)
library(destiny, quietly = TRUE)
##
## Attaching package: 'destiny'
## The following object is masked from 'package:SummarizedExperiment':
##
## distance
## The following object is masked from 'package:GenomicRanges':
##
## distance
## The following object is masked from 'package:IRanges':
##
## distance
dm <- DiffusionMap(t(log1p(assays(sim)$norm)))
rd2 <- cbind(DC1 = dm$DC1, DC2 = dm$DC2)
plot(rd2, col = topo.colors(100)[sim$Step], pch=16, asp = 1)
We will add both dimensionality reductions to the SingleCellExperiment
object,
but continue our analysis focusing on the PCA results.
reducedDims(sim) <- SimpleList(PCA = rd1, DiffMap = rd2)
The final input to slingshot
is a vector of cluster labels for the cells. If
this is not provided, slingshot
will treat the data as a single cluster and
fit a standard principal curve. However, we recommend clustering the cells even
in datasets where only a single lineage is expected, as it allows for the
potential discovery of novel branching events.
The clusters identified in this step will be used to determine the global structure of the underlying lineages (that is, their number, when they branch off from one another, and the approximate locations of those branching events). This is different than the typical goal of clustering single-cell data, which is to identify all biologically relevant cell types present in the dataset. For example, when determining global lineage structure, there is no need to distinguish between immature and mature neurons since both cell types will, presumably, fall along the same segment of a lineage.
For our analysis, we implement two clustering methods which similarly assume
that Euclidean distance in a low-dimensional space reflect biological
differences between cells: Gaussian mixture modeling and \(k\)-means. The former
is implemented in the mclust
package (Scrucca et al. 2016) and features an automated
method for determining the number of clusters based on the Bayesian information
criterion (BIC).
library(mclust, quietly = TRUE)
## Package 'mclust' version 5.4.1
## Type 'citation("mclust")' for citing this R package in publications.
cl1 <- Mclust(rd1)$classification
colData(sim)$GMM <- cl1
library(RColorBrewer)
plot(rd1, col = brewer.pal(9,"Set1")[cl1], pch=16, asp = 1)
While \(k\)-means does not have a similar functionality, we have shown in (Street et al. 2017) that simultaneous principal curves are quite robust to the choice of \(k\), so we select a \(k\) of 4 somewhat arbitrarily. If this is too low, we may miss a true branching event and if it is too high or there is an abundance of small clusters, we may begin to see spurious branching events.
cl2 <- kmeans(rd1, centers = 4)$cluster
colData(sim)$kmeans <- cl2
plot(rd1, col = brewer.pal(9,"Set1")[cl2], pch=16, asp = 1)
At this point, we have everything we need to run slingshot
on our simulated
dataset. This is a two-step process composed of identifying the
global lineage structure with a cluster-based minimum spanning tree (MST) and
fitting simultaneous principal curves to describe each lineage.
These two steps can be run separately with the getLineages
and getCurves
functions, or together with the wrapper function, slingshot
(recommended). We
will use the combined method for the analysis of the splatter
dataset, but
demonstrate the usage of the individual functions later, on the slingshot
dataset.
The slingshot
wrapper function performs both steps of lineage inference in a
single call. The necessary inputs are a reduced dimensional matrix of
coordinates and a set of cluster labels. These can be separate objects or, in
the case of the splatter
data, elements contained in a SingleCellExperiment
object.
To run slingshot
with the dimensionality reduction produced by PCA and cluster
labels identified by Gaussian mixutre modeling, we would do the following:
sce <- slingshot(sim, clusterLabels = 'GMM', reducedDim = 'PCA')
## Using full covariance matrix
As noted above, if no clustering results are provided, it is assumed that all
cells are part of the same cluster and a single curve will be constructed. If no
dimensionality reduction is provided, slingshot
will use the first element of
the list returned by reducedDims
.
The output is a SingleCellExperiment
object with slingshot
results
incorporated. Most of the output is added to the metadata in the form of a list
and is accessible via metadata(sce)$slingshot
. Additionally, all inferred
pseudotime variables (one per lineage) are added to the colData
. To extract
all slingshot
results in a single object, we can use the SlingshotDataSet
function. This can be useful for visualization, as several plotting methods for
SlingshotDataSet
objects are included with the package. Below, we visuzalize
the inferred lineage for the splatter
data with points colored by pseudotime.
summary(sce$slingPseudotime_1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 16.92 31.70 37.94 52.80 142.78
colors <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)
plot(reducedDims(sce)$PCA, col = colors[cut(sce$slingPseudotime_1,breaks=100)], pch=16, asp = 1)
lines(SlingshotDataSet(sce), lwd=2)
We can also see how the lineage structure was intially estimated by the
cluster-based minimum spanning tree by using the type
argument.
plot(reducedDims(sce)$PCA, col = brewer.pal(9,'Set1')[sce$GMM], pch=16, asp = 1)
lines(SlingshotDataSet(sce), lwd=2, type = 'lineages')
After running slingshot
, an interesting next step may be to find genes that
change their expression over the course of development. We demonstrate one
possible method for this type of analysis on the \(1,000\) most variable genes.
We will regress each gene on the pseudotime variable we have generated, using a general additive model (GAM). This allows us to detect non-linear patterns in gene expression.
require(gam)
## Loading required package: gam
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.16
t <- sce$slingPseudotime_1
# for time, only look at the 1,000 most variable genes
Y <- log1p(assays(sim)$norm)
var1K <- names(sort(apply(Y,1,var),decreasing = TRUE))[1:1000]
Y <- Y[var1K,]
# fit a GAM with a loess term for pseudotime
gam.pval <- apply(Y,1,function(z){
d <- data.frame(z=z, t=t)
tmp <- gam(z ~ lo(t), data=d)
p <- summary(tmp)[4][[1]][1,5]
p
})
We can then pick out the top genes based on p-value and visualize their expression over developmental time with a heatmap.
require(clusterExperiment)
## Loading required package: clusterExperiment
topgenes <- names(sort(gam.pval, decreasing = FALSE))[1:100]
heatdata <- assays(sim)$norm[rownames(assays(sim)$norm) %in% topgenes,
order(t, na.last = NA)]
heatclus <- sce$GMM[order(t, na.last = NA)]
ce <- ClusterExperiment(heatdata, heatclus, transformation = log1p)
plotHeatmap(ce, clusterSamplesData = "orderSamplesValue",visualizeData = 'transformed')
Here, we provide further details and highlight some additional functionality of
the slingshot
package. We will use the included slingshotExample
dataset for
illustrative purposes. This dataset was designed to represent cells in a low
dimensional space and comes with a set of cluster labels generated by \(k\)-means
clustering. Rather than constructing a full SingleCellExperiment
object, which
requires gene-level data, we will use the low-dimensional matrix of coordinates
directly and provide the cluster labels as an additional argument.
The getLineages
function takes as input an n
\(\times\) p
matrix and a vector of
clustering results of length n
. It maps connections between adjacent clusters
using a minimum spanning tree (MST) and identifies paths through these
connections that represent lineages. The output of this function is a
SlingshotDataSet
containing the inputs as well as the inferred MST
(represented by an adjacency matrix) and lineages (ordered vectors of cluster
names).
This analysis can be performed in an entirely unsupervised manner or in a
semi-supervised manner by specifying known initial and terminal point clusters.
If we do not specify a starting point, slingshot
selects one based on
parsimony, maximizing the number of clusters shared between lineages before a
split. If there are no splits or multiple clusters produce the same parsimony
score, the starting cluster is chosen arbitrarily.
In the case of our simulated data, slingshot
selects Cluster 1 as the starting
cluster. However, we generally recommend the specification of an initial cluster
based on prior knowledge (either time of sample collection or established gene
markers). This specification will have no effect on how the MST is constructed,
but it will impact how branching curves are constructed.
lin1 <- getLineages(rd, cl, start.clus = '1')
## Using full covariance matrix
lin1
## class: SlingshotDataSet
##
## Samples Dimensions
## 140 2
##
## lineages: 2
## Lineage1: 1 2 3 5
## Lineage2: 1 2 3 4
##
## curves: 0
plot(rd, col = brewer.pal(9,"Set1")[cl], asp = 1, pch = 16)
lines(lin1, lwd = 3)
At this step, slingshot
also allows for the specification of known endpoints.
Clusters which are specified as terminal cell states will be constrained to have
only one connection when the MST is constructed (ie., they must be leaf nodes).
This constraint could potentially impact how other parts of the tree are drawn,
as shown in the next example where we specify Cluster 3 as an endpoint.
lin2 <- getLineages(rd, cl, start.clus= '1', end.clus = '3')
## Using full covariance matrix
plot(rd, col = brewer.pal(9,"Set1")[cl], asp = 1, pch = 16)
lines(lin2, lwd = 3, show.constraints = TRUE)
This type of supervision can be useful for ensuring that results are consistent with prior biological knowledge. Specifically, it can prevent known terminal cell fates from being categorized as transitory states.
There are a few additional arguments we could have passed to getLineages
for
greater control:
dist.fun
is a function for computing distances between clusters. The default
is squared distance between cluster centers normalized by their joint covariance
matrix.omega
is a granularity parameter, allowing the user to set an upper
limit on connection distances. This can be useful for identifying outlier
clusters which are not a part of any lineage.After constructing the MST, getLineages
identifies paths through the tree to
designate as lineages. At this stage, a lineage will consist of an ordered set
of cluster names, starting with the root cluster and ending with a leaf. The
output of getLineages
is a SlingshotDataSet
which holds all of the inputs as
well as a list of lineages and some additional information on how they were
constructed. This object is then used as the input to the getCurves
function.
In order to model development along these various lineages, we will construct
smooth curves with the function getCurves
. Using smooth curves based on all
the cells eliminates the problem of cells projecting onto vertices of piece-wise
linear trajectories and makes slingshot
more robust to noise in the clustering
results.
In order to construct smooth lineages, getCurves
follows an iterative process
similar to that of principal curves presented in (Hastie and Stuetzle 1989). When there is
only a single lineage, the resulting curve is simply the principal curve through
the center of the data, with one adjustment: the initial curve is constructed
with the linear connections between cluster centers rather than the first
prinicpal component of the data. This adjustment adds stability and typically
hastens the algorithm’s convergence.
When there are two or more lineages, we add an additional step to the algorithm: averaging curves near shared cells. Both lineages should agree fairly well on cells that have yet to differentiate, so at each iteration we average the curves in the neighborhood of these cells. This increases the stability of the algorithm and produces smooth branching lineages.
crv1 <- getCurves(lin1)
crv1
## class: SlingshotDataSet
##
## Samples Dimensions
## 140 2
##
## lineages: 2
## Lineage1: 1 2 3 5
## Lineage2: 1 2 3 4
##
## curves: 2
## Curve1: Length: 15.045 Samples: 100.61
## Curve2: Length: 15.126 Samples: 103.49
plot(rd, col = brewer.pal(9,"Set1")[cl], asp = 1, pch = 16)
lines(crv1, lwd = 3)
The output of getCurves
is an updated slingshotDataSet
which now contains
the simultaneous principal curves and additional information on how they were
fit. The pseudotime
function extracts a cells-by-lineages matrix of pseudotime
values for each cell along each lineage, with NA
values for cells that were
not assigned to a particular lineage. The curveWeights
function extracts a
similar matrix of weights assigning each cell to one or more lineages.
The curves objects can be accessed using the curves
function, which will
return a list of principal_curve
objects. These objects consist of the
following slots:
s
: the matrix of points that make up the curve. These correspond to the
orthogonal projections of the data points.ord
: indices which can be used to
put the cells along a curve in order based their projections.lambda
: arclengths along the curve from the beginning to each cell’s
projection. A matrix of these values is returned by the slingPseudotime
function.dist_ind
: the squared distances between data points and their projections
onto the curve.dist
: the sum of the squared projection distances.w
: the vector of weights along this lineage. Cells that were unambiguously
assigned to this lineage will have a weight of 1
, while cells assigned to
other lineages will have a weight of 0
. It is possible for cells to have
weights of 1
(or very close to 1) for multiple lineages, if they are
positioned before a branching event. A matrix of these values is returned by the
slingCurveWeights
function.sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] splines stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] clusterExperiment_2.2.0 gam_1.16
## [3] foreach_1.4.4 RColorBrewer_1.1-2
## [5] mclust_5.4.1 destiny_2.12.0
## [7] splatter_1.6.0 SingleCellExperiment_1.4.0
## [9] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
## [11] BiocParallel_1.16.0 matrixStats_0.54.0
## [13] GenomicRanges_1.34.0 GenomeInfoDb_1.18.0
## [15] IRanges_2.16.0 S4Vectors_0.20.0
## [17] slingshot_1.0.0 bigmemory_4.5.33
## [19] Biobase_2.42.0 BiocGenerics_0.28.0
## [21] princurve_2.1.3 BiocStyle_2.10.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 RSQLite_2.1.1 AnnotationDbi_1.44.0
## [4] htmlwidgets_1.3 grid_3.5.1 trimcluster_0.1-2.1
## [7] RNeXML_2.1.2 munsell_0.5.0 codetools_0.2-15
## [10] miniUI_0.1.1.1 withr_2.1.2 colorspace_1.3-2
## [13] knitr_1.20 uuid_0.1-2 zinbwave_1.4.0
## [16] pspline_1.0-18 robustbase_0.93-3 vcd_1.4-4
## [19] VIM_4.7.0 TTR_0.23-4 NMF_0.21.0
## [22] GenomeInfoDbData_1.2.0 bit64_0.9-7 rhdf5_2.26.0
## [25] rprojroot_1.3-2 xfun_0.4 ggthemes_4.0.1
## [28] diptest_0.75-7 R6_2.3.0 doParallel_1.0.14
## [31] RcppEigen_0.3.3.4.0 locfit_1.5-9.1 manipulateWidget_0.10.0
## [34] flexmix_2.3-14 bitops_1.0-6 assertthat_0.2.0
## [37] promises_1.0.1 scales_1.0.0 nnet_7.3-12
## [40] gtable_0.2.0 phylobase_0.8.4 rlang_0.3.0.1
## [43] genefilter_1.64.0 akima_0.6-2 scatterplot3d_0.3-41
## [46] lazyeval_0.2.1 checkmate_1.8.5 abind_1.4-5
## [49] BiocManager_1.30.3 rgl_0.99.16 yaml_2.2.0
## [52] reshape2_1.4.3 crosstalk_1.0.0 backports_1.1.2
## [55] httpuv_1.4.5 tools_3.5.1 bookdown_0.7
## [58] gridBase_0.4-7 ggplot2_3.1.0 proxy_0.4-22
## [61] stabledist_0.7-1 Rcpp_0.12.19 plyr_1.8.4
## [64] progress_1.2.0 zlibbioc_1.28.0 purrr_0.2.5
## [67] RCurl_1.95-4.11 prettyunits_1.0.2 viridis_0.5.1
## [70] zoo_1.8-4 haven_1.1.2 cluster_2.0.7-1
## [73] magrittr_1.5 data.table_1.11.8 RSpectra_0.13-1
## [76] openxlsx_4.1.0 lmtest_0.9-36 mvtnorm_1.0-8
## [79] whisker_0.3-2 gsl_1.9-10.3 hms_0.4.2
## [82] mime_0.6 evaluate_0.12 xtable_1.8-3
## [85] smoother_1.1 XML_3.98-1.16 rio_0.5.10
## [88] readxl_1.1.0 gridExtra_2.3 compiler_3.5.1
## [91] tibble_1.4.2 crayon_1.3.4 htmltools_0.3.6
## [94] pcaPP_1.9-73 later_0.7.5 tidyr_0.8.2
## [97] howmany_0.3-1 DBI_1.0.0 MASS_7.3-51
## [100] fpc_2.1-11.1 boot_1.3-20 Matrix_1.2-14
## [103] ade4_1.7-13 car_3.0-2 bindr_0.1.1
## [106] igraph_1.2.2 forcats_0.3.0 pkgconfig_2.0.2
## [109] bigmemory.sri_0.1.3 rncl_0.8.3 registry_0.5
## [112] laeken_0.4.6 numDeriv_2016.8-1 locfdr_1.1-8
## [115] foreign_0.8-71 sp_1.3-1 xml2_1.2.0
## [118] annotate_1.60.0 rngtools_1.3.1 pkgmaker_0.27
## [121] webshot_0.5.1 XVector_0.22.0 bibtex_0.4.2
## [124] stringr_1.3.1 digest_0.6.18 copula_0.999-18
## [127] ADGofTest_0.3 softImpute_1.4 cellranger_1.1.0
## [130] rmarkdown_1.10 dendextend_1.9.0 edgeR_3.24.0
## [133] curl_3.2 kernlab_0.9-27 shiny_1.1.0
## [136] modeltools_0.2-22 nlme_3.1-137 jsonlite_1.5
## [139] bindrcpp_0.2.2 Rhdf5lib_1.4.0 carData_3.0-2
## [142] viridisLite_0.3.0 limma_3.38.0 pillar_1.3.0
## [145] lattice_0.20-35 httr_1.3.1 DEoptimR_1.0-8
## [148] survival_2.43-1 xts_0.11-1 glue_1.3.0
## [151] zip_1.0.0 prabclus_2.2-6 iterators_1.0.10
## [154] glmnet_2.0-16 bit_1.1-14 class_7.3-14
## [157] stringi_1.2.4 HDF5Array_1.10.0 blob_1.1.1
## [160] memoise_1.1.0 dplyr_0.7.7 e1071_1.7-0
## [163] ape_5.2
Angerer, Philipp, Laleh Haghverdi, Maren Büttner, Fabian Theis, Carsten Marr, and Florian Büttner. 2015. “Destiny: Diffusion Maps for Large-Scale Single-Cell Data in R.” Bioinformatics 32 (8). Ingolstädter Landstraße 1, Neuherberg, Germany: Helmholtz-Zentrum München:1241. https://doi.org/10.1093/bioinformatics/btv715.
Cole, Michael, and Davide Risso. 2018. Scone: Single Cell Overview of Normalized Expression Data.
Hastie, Trevor, and Werner Stuetzle. 1989. “Principal Curves.” Journal of the American Statistical Association 84 (406):502–16.
Lun, Aaron, and Davide Risso. 2017. SingleCellExperiment: S4 Classes for Single Cell Data.
Scrucca, Luca, Michael Fop, Thomas Brendan Murphy, and Adrian E. Raftery. 2016. “mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models.” The R Journal 8 (1):205–33.
Street, Kelly, Davide Risso, Russell B Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2017. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” bioRxiv. Cold Spring Harbor Laboratory. https://doi.org/10.1101/128843.
Zappia, Luke, Belinda Phipson, and Alicia Oshlack. 2017. “Splatter: Simulation of Single-Cell RNA Sequencing Data.” Genome Biology 18 (September):174. https://doi.org/10.1186/s13059-017-1305-0.