An Introduction to the openCyto package

  1. Introduction

The openCyto package is designed to facilitate the application of automated gating methods in a sequential way to mimic the construction of a manual gating scheme.

1.1. Manual gating

Traditionally, scientists have to draw the gates for each individual sample on each 2-D projection (2 channels) within flowJo. Alternatively, they can draw template gates on one sample, replicate them to the other samples, and then manually inspect the gates on each sample and correct them where necessary. Either way is time-consuming and subjective, and thus not suitable for the large data sets generated by high-throughput flow cytometry, CyTOF, or “cross-lab” data analysis.

Here is one xml workspace (manual gating scheme) exported from flowJo.

flowDataPath <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(flowDataPath, pattern = "manual.xml", full.names = TRUE)
wsfile
## [1] "/home/biocbuild/bbs-3.20-bioc/R/site-library/flowWorkspaceData/extdata/manual.xml"

Using the CytoML package, we can load it into R,

library(CytoML)
ws <- open_flowjo_xml(wsfile)

apply the manual gates defined in the xml to the raw FCS files,

gs <- flowjo_to_gatingset(ws, name = "T-cell", subset = 1)

and then visualize the gating hierarchy

gh <- gs[[1]]
plot(gh)
[Figure: plot-manual-GatingHierarchy]

and the `gates`:

library(ggcyto)
autoplot(gh)
[Figure: plot-manual-gates]

This is a gating scheme for a `T cell` panel, which tries to identify `T cell` sub-populations. We can achieve the same results by using the automated gating pipeline provided by this package.

1.2. Automated Gating

flowCore, flowClust, and other packages provide many different gating methods to detect cell populations and draw gates automatically.

The flowWorkspace package provides the GatingSet as an efficient data structure to store, query and visualize the hierarchical gated data.

By taking advantage of these tools, the openCyto package builds an automated gating pipeline from a gatingTemplate, which is essentially the same kind of hierarchical gating scheme that scientists already use.

  2. Create gating templates

2.1. Template format

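First of all, we need to describe the gating hierarchy in a spreadsheet (saved as plain text, e.g. csv). This spreadsheet must have the following columns:

- alias: the name used to label the cell population; the path formed by the alias and its ancestor nodes must be unique.
- pop: the population pattern, `+/-` or `+/-+/-`, indicating which side of a 1-D gate or which quadrant of a 2-D gate to keep.
- parent: the alias of the parent population.
- dims: a comma-separated list of the channel or marker names (one or two) used for gating.
- gating_method: the name of the gating function (e.g. flowClust).
- gating_args: the named arguments passed to the gating function.
- collapseDataForGating: when TRUE, the data is collapsed (within the groups given by groupBy, if any) before gating, and the resulting gate is replicated across the collapsed samples.
- groupBy: how the samples are grouped.
- preprocessing_method: the name of the preprocessing function (e.g. prior_flowClust).
- preprocessing_args: the named arguments passed to the preprocessing function.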

2.2. Example template

Here is an example of a gating template.

library(openCyto)
library(data.table)
gtFile <- system.file("extdata/gating_template/tcell.csv", package = "openCyto")
dtTemplate <- fread(gtFile)
dtTemplate
##             alias    pop    parent        dims   gating_method
##            <char> <char>    <char>      <char>          <char>
##  1:     nonDebris      +      root       FSC-A gate_mindensity
##  2:      singlets      + nonDebris FSC-A,FSC-H     singletGate
##  3:         lymph      +  singlets FSC-A,SSC-A       flowClust
##  4:           cd3      +     lymph         CD3 gate_mindensity
##  5:             * -/++/-       cd3     cd4,cd8 gate_mindensity
##  6: activated cd4     ++  cd4+cd8-    CD38,HLA gate_mindensity
##  7: activated cd8     ++  cd4-cd8+    CD38,HLA gate_mindensity
##  8:      CD45_neg      -  cd4+cd8-      CD45RA gate_mindensity
##  9:     CCR7_gate      +  CD45_neg        CCR7       flowClust
## 10:             * +/-+/-  cd4+cd8- CCR7,CD45RA         refGate
## 11:             * +/-+/-  cd4-cd8+ CCR7,CD45RA gate_mindensity
##               gating_args collapseDataForGating groupBy preprocessing_method
##                    <char>                <lgcl>   <int>               <char>
##  1:                                          NA      NA                     
##  2:                                          NA      NA                     
##  3: K=2,target=c(1e5,5e4)                    NA      NA      prior_flowClust
##  4:                                        TRUE       4                     
##  5:     gate_range=c(1,3)                    NA      NA                     
##  6:                                          NA      NA  standardize_flowset
##  7:                                          NA      NA  standardize_flowset
##  8:     gate_range=c(2,3)                    NA      NA                     
##  9:           neg=1,pos=1                    NA      NA                     
## 10:    CD45_neg:CCR7_gate                    NA      NA                     
## 11:                                          NA      NA                     
##     preprocessing_args
##                 <lgcl>
##  1:                 NA
##  2:                 NA
##  3:                 NA
##  4:                 NA
##  5:                 NA
##  6:                 NA
##  7:                 NA
##  8:                 NA
##  9:                 NA
## 10:                 NA
## 11:                 NA

Each row usually corresponds to one cell population and the gating method used to obtain it. Below we explain, row by row, how this gating template is derived from the manual gating scheme.
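
Because every row names a population and its parent, the rows together encode a tree. The following is a minimal sketch in base R, not the package's own code: it rebuilds the first four rows of the example as a data frame, resolves the full path of a population, and writes the table as csv. The helper `full_path` and the file name `tcell_minimal.csv` are hypothetical names introduced only for this illustration.

```r
# First four rows of the example template, using the same column names.
tmpl <- data.frame(
  alias                 = c("nonDebris", "singlets", "lymph", "cd3"),
  pop                   = "+",
  parent                = c("root", "nonDebris", "singlets", "lymph"),
  dims                  = c("FSC-A", "FSC-A,FSC-H", "FSC-A,SSC-A", "CD3"),
  gating_method         = c("gate_mindensity", "singletGate",
                            "flowClust", "gate_mindensity"),
  gating_args           = c("", "", "K=2,target=c(1e5,5e4)", ""),
  collapseDataForGating = c("", "", "", "TRUE"),
  groupBy               = c("", "", "", "4"),
  preprocessing_method  = c("", "", "prior_flowClust", ""),
  preprocessing_args    = "",
  stringsAsFactors      = FALSE
)

# Hypothetical helper (not part of openCyto): follow the parent column
# upwards until root is reached, collecting the aliases on the way.
full_path <- function(alias, tmpl) {
  path <- character(0)
  while (alias != "root") {
    path  <- c(alias, path)
    alias <- tmpl$parent[match(alias, tmpl$alias)]
  }
  paste0("/", paste(path, collapse = "/"))
}

full_path("cd3", tmpl)

# Save the table in the plain-text (csv) form described above.
csv_file <- file.path(tempdir(), "tcell_minimal.csv")
write.csv(tmpl, csv_file, row.names = FALSE)
```

The path returned for `cd3` has the same form as the parent paths that appear in the expanded rows later in this section.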

2.2.1. “nonDebris”

dtTemplate[1,]
##        alias    pop parent   dims   gating_method gating_args
##       <char> <char> <char> <char>          <char>      <char>
## 1: nonDebris      +   root  FSC-A gate_mindensity            
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
##                   <lgcl>   <int>               <char>             <lgcl>
## 1:                    NA      NA                                      NA

2.2.2. “singlets”

dtTemplate[2,]
##       alias    pop    parent        dims gating_method gating_args
##      <char> <char>    <char>      <char>        <char>      <char>
## 1: singlets      + nonDebris FSC-A,FSC-H   singletGate            
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
##                   <lgcl>   <int>               <char>             <lgcl>
## 1:                    NA      NA                                      NA

2.2.3. “lymphocytes”

dtTemplate[3,]
##     alias    pop   parent        dims gating_method           gating_args
##    <char> <char>   <char>      <char>        <char>                <char>
## 1:  lymph      + singlets FSC-A,SSC-A     flowClust K=2,target=c(1e5,5e4)
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
##                   <lgcl>   <int>               <char>             <lgcl>
## 1:                    NA      NA      prior_flowClust                 NA

2.2.4. “cd3+” (T cells)

dtTemplate[4,]
##     alias    pop parent   dims   gating_method gating_args
##    <char> <char> <char> <char>          <char>      <char>
## 1:    cd3      +  lymph    CD3 gate_mindensity            
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
##                   <lgcl>   <int>               <char>             <lgcl>
## 1:                  TRUE       4                                      NA

This is similar to the nonDebris gate, except that we set collapseDataForGating to TRUE, which tells the pipeline to collapse all samples into one and apply gate_mindensity to the collapsed data on the CD3 dimension. Once the gate is generated, it is replicated across all samples. This is only useful when individual samples do not have enough events to deduce the gate; here we do it purely as a proof of concept.
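
The idea of collapsing can be sketched in base R, under loud assumptions: the data are simulated, and `min_density_cut` is a hypothetical, simplified stand-in for a minimum-density gate (the lowest point of a kernel density estimate within a search range), not the implementation behind gate_mindensity.

```r
set.seed(1)

# Three simulated samples with few events each:
# a negative mode near 0 and a positive mode near 4.
samples <- replicate(
  3,
  c(rnorm(40, mean = 0, sd = 0.5), rnorm(40, mean = 4, sd = 0.5)),
  simplify = FALSE
)

# Hypothetical stand-in for a minimum-density gate: the x position where
# the kernel density estimate is lowest inside [lower, upper].
min_density_cut <- function(x, lower, upper) {
  d      <- density(x)
  inside <- d$x >= lower & d$x <= upper
  d$x[inside][which.min(d$y[inside])]
}

# Collapse all samples into one and derive a single cut point from the pool.
pooled  <- unlist(samples)
cd3_cut <- min_density_cut(pooled, lower = 1, upper = 3)

# Replicate the same cut point across every sample.
pos_fraction <- sapply(samples, function(x) mean(x > cd3_cut))
pos_fraction
```

Pooling gives the density estimate three times as many events to work with, which is the reason the option helps when individual samples are sparse.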

2.2.5. CD4 and CD8

The fifth row specifies pop as cd4+/-cd8+/-, which will be expanded into 6 rows.

dtTemplate[5,]
##     alias    pop parent    dims   gating_method       gating_args
##    <char> <char> <char>  <char>          <char>            <char>
## 1:      * -/++/-    cd3 cd4,cd8 gate_mindensity gate_range=c(1,3)
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
##                   <lgcl>   <int>               <char>             <lgcl>
## 1:                    NA      NA                                      NA

The first two rows are two 1-D gates that will be generated by gating_method on each dimension (cd4 and cd8) independently:

##     alias    pop                        parent   dims   gating_method
##    <char> <char>                        <char> <char>          <char>
## 1:   cd4+      + /nonDebris/singlets/lymph/cd3    cd4 gate_mindensity
## 2:   cd8+      + /nonDebris/singlets/lymph/cd3    cd8 gate_mindensity
##          gating_args collapseDataForGating groupBy preprocessing_method
##               <char>                <char>  <char>               <char>
## 1: gate_range=c(1,3)                                                   
## 2: gate_range=c(1,3)                                                   
##    preprocessing_args
##                <char>
## 1:                   
## 2:

Then another 4 rows are 4 rectangleGates that correspond to the 4 quadrants in the 2-D projection (cd4 vs cd8).

##       alias    pop                        parent    dims gating_method
##      <char> <char>                        <char>  <char>        <char>
## 1: cd4+cd8+     ++ /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
## 2: cd4-cd8+     -+ /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
## 3: cd4+cd8-     +- /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
## 4: cd4-cd8-     -- /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
##                                                              gating_args
##                                                                   <char>
## 1: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 2: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 3: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 4: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
##                   <char>  <char>               <char>             <char>
## 1:                                                                      
## 2:                                                                      
## 3:                                                                      
## 4:

As we see here, "refGate" in gating_method indicates that these gates are constructed from the gate coordinates of the previous two 1-D gates. Those 1-D gates are thus treated as “reference gates”, referred to in gating_args by a colon-separated string of their aliases, "cd4+:cd8+" (shown above with full paths).
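
To make the relationship between the two 1-D reference gates and the four quadrants concrete, here is a small base R sketch. The cut points and events are made up, and the code only illustrates the idea; it is not how refGate is implemented.

```r
# Hypothetical cut points standing in for the 1-D cd4+ and cd8+ gates.
ref_gates <- c(cd4 = 2.0, cd8 = 1.5)

# The four quadrant names implied by pop = "+/-+/-" on dims = "cd4,cd8".
signs <- expand.grid(cd4 = c("+", "-"), cd8 = c("+", "-"),
                     stringsAsFactors = FALSE)
quadrants <- paste0("cd4", signs$cd4, "cd8", signs$cd8)
quadrants

# Made-up events, assigned to quadrants using only the two cut points.
events <- data.frame(cd4 = c(3.1, 0.4, 2.7, 0.9),
                     cd8 = c(2.2, 2.8, 0.3, 0.1))
label <- paste0(
  "cd4", ifelse(events$cd4 > ref_gates[["cd4"]], "+", "-"),
  "cd8", ifelse(events$cd8 > ref_gates[["cd8"]], "+", "-")
)
table(factor(label, levels = quadrants))
```

No new thresholds are estimated for the quadrants: each one is fully determined by the two reference cut points, which is why the four rows need nothing more than the aliases of the reference gates in gating_args.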

Alternatively, we can write out these 6 rows explicitly in the spreadsheet. The compact representation is recommended, however, unless the user wants finer control over how the gating is done. For instance, sometimes we need to use different gating_methods to generate the 1-D gates on cd4 and cd8. Or cd8 gating may need to depend on cd4 gating, i.e. the parent of cd8+ is cd4+ (or cd4-) instead of cd3. Sometimes we want a customized alias instead of the quadrant-like name (x+y+) that is generated automatically (e.g. the "activated cd4" and "activated cd8" rows, rows 6 and 7 of the gating template).

  3. Load gating template

After the gating template is defined in the spreadsheet, it can be loaded into R:

gt_tcell <- gatingTemplate(gtFile)
gt_tcell
## --- Gating Template: default
## 	with  29  populations defined

Besides looking at the spreadsheet, we can examine the gating scheme by visualizing it:

plot(gt_tcell)
[Figure: plot-gt]

As we can see, the gating scheme has been expanded as we described above. All the **colored** arrows originate from a `parent` population and the **grey** arrows originate from a `reference` population (or gate).

  4. Run the gating pipeline

Once we are satisfied with the gating template, we can apply it to the actual flow data.

4.1. Load the raw data

First of all, we load the raw FCS files into R as a cytoset with load_cytoset_from_fcs and create an empty GatingSet object from it.

fcsFiles <- list.files(flowDataPath, pattern = "CytoTrol", full.names = TRUE)
cs  <- load_cytoset_from_fcs(fcsFiles)
cf <- realize_view(cs[[1]])
gs <- GatingSet(cs)
gs
## A GatingSet with 2 samples

4.2. Compensation

Then, we compensate the data. If we have compensation controls (i.e. singly stained samples), we can calculate the compensation matrix by using the flowStats::spillover function. Here we simply use the compensation matrix defined in the flowJo workspace.

compMat <- gh_get_compensations(gh)
compensate(gs, compMat)
## A GatingSet with 2 samples
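
What compensation does numerically can be shown with a toy example in base R. The spillover values and event intensities below are invented for illustration; only the two channel names are taken from this vignette. The sketch assumes the usual convention in which each row of the spillover matrix describes how one dye is distributed over the detectors.

```r
# Hypothetical 2 x 2 spillover matrix: rows are dyes, columns are detectors.
spill <- matrix(c(1.00, 0.15,
                  0.08, 1.00),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("V450-A", "V545-A"),
                                c("V450-A", "V545-A")))

# Made-up true signals for three events (one row per event).
true_signal <- matrix(c(1000,   0,
                           0, 500,
                         800, 300),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, colnames(spill)))

# Spillover mixes the channels ...
observed <- true_signal %*% spill

# ... and compensation undoes the mixing with the inverse matrix.
compensated <- observed %*% solve(spill)

all.equal(compensated, true_signal)
```

With real data the true signals are unknown, so the quality of the result depends entirely on how well the spillover matrix was estimated.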

Here is one example showing the compensation outcome:

## A cytoset with 2 samples.
## 
##   column names:
##     V545-A, V450-A
## 
## cytoset has been subsetted and can be realized through 'realize_view()'.
[Figure: compensate_plot]

4.3. Transformation

All of the stained channels need to be transformed properly before the gating. Here we use the flowCore::estimateLogicle method to determine the logicle transformation.

chnls <- parameters(compMat)
trans <- estimateLogicle(gs[[1]], channels = chnls)
gs <- transform(gs, trans)
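
Why a transformation is needed at all can be illustrated in base R with the inverse hyperbolic sine. Note that this is not the logicle transformation estimated above; asinh is used here only as a simpler function with the same qualitative behaviour (roughly linear near zero, roughly logarithmic for large values, and defined for negative values). The intensities and the `cofactor` value are arbitrary choices for this sketch.

```r
# Made-up compensated intensities, including a negative value.
x <- c(-100, 0, 100, 1e3, 1e4, 1e5)

# Arbitrary scale factor chosen for this illustration.
cofactor <- 150

# Simplified stand-in for a logicle-like transformation.
y <- asinh(x / cofactor)

data.frame(raw = x, transformed = round(y, 2))
```

On the transformed scale the dim and bright populations occupy comparable ranges, which is what density-based gating methods need in order to find a sensible cut point.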

Here is one example showing the transformation outcome:

[Figure: transformation_plot]

4.5. Gating

Now we can apply the gating template to the data:

gt_gating(gt_tcell, gs)

Optionally, we can run the pipeline in parallel to speed up gating, e.g.

gt_gating(gt_tcell, gs, mc.cores=2, parallel_type = "multicore")

4.6. Hide nodes

After gating, the hierarchy contains some extra populations that the pipeline generated automatically (e.g. the reference gates used by refGate).

plot(gs[[1]])
[Figure: plot_afterGating]

We can hide these populations if we are not interested in them:

nodesToHide <- c("cd8+", "cd4+",
                 "cd4-cd8-", "cd4+cd8+",
                 "cd4+cd8-/HLA+", "cd4+cd8-/CD38+",
                 "cd4-cd8+/HLA+", "cd4-cd8+/CD38+",
                 "CD45_neg/CCR7_gate", "cd4+cd8-/CD45_neg",
                 "cd4-cd8+/CCR7+", "cd4-cd8+/CD45RA+")
lapply(nodesToHide, function(thisNode) gs_pop_set_visibility(gs, thisNode, FALSE))

4.7. Rename nodes

And rename the populations:

gs_pop_set_name(gs, "cd4+cd8-", "cd4")
gs_pop_set_name(gs, "cd4-cd8+", "cd8")
plot(gs[[1]])
[Figure: plot_afterHiding]

4.8. Visualize the gates

autoplot(gs[[1]])
[Figure: plotGate_autoGate]

4.9. Apply a gating method without csv template

Sometimes it will be helpful (especially when working with data that is already gated) to be able to interact with the GatingSet directly without the need to write the complete csv gating template. We can apply each automated gating method using the same fields as in the gatingTemplate, but provided as arguments to the gs_add_gating_method function. The populations added by each of these calls to gs_add_gating_method can be removed sequentially by gs_remove_gating_method, which will remove all populations added by the prior call to gs_add_gating_method. These two functions allow for interactive stagewise prototyping of a gatingTemplate.

For example, suppose we wanted to add a CD38-/HLA- sub-population to the cd4+cd8- population. We could do this as follows:

gs_add_gating_method(gs, alias = "non-activated cd4",
                         pop = "--",
                         parent = "cd4",
                         dims = "CD38,HLA",
                         gating_method = "tailgate")
plot(gs[[1]])
[Figure: gt_add_gating_method]

The addition of this population can then easily be undone by a call to gs_remove_gating_method:

gs_remove_gating_method(gs)
plot(gs[[1]])
[Figure: gs_remove_gating_method]

  5. Conclusion

The openCyto package allows users to specify their gating schemes and gate the data in a data-driven fashion. It frees scientists from labor-intensive manual gating routines and increases the speed, reproducibility, and objectivity of the data analysis work.