1 Introduction

To analyze RNA-seq count data, there are several ways/methods for each steps like

Transforming/scaling of the count data,
QC by clustering the samples using PCA, hierarchical clustering or multidimensional scaling
Most importantly identification of differentially expressed.

For each of these steps, there are different packages or tools whose input and output formats are very different. Therefore it is very difficult to use all these features from different packages in a study.

Figure 1: Input and output data structures of different methods to idetify differentially expressed genes

The broadSeq package simplifies the process of including many Bioconductor packages for RNA-seq data and evaluating their performance.

Figure 2: Flowchart

The silent features of broadSeq are

Single input format : SummarizedExperiment
Single output format : as base data.frame
Visualization : Using ggplot2 and ggpubr packages for publication ready figures
Easy and advanced PCA analysis
Function oriented interface to include in existing pipeline.
Differential expression
- Comparison of different methods

2 Reading the data

library(broadSeq)
library(ggplot2)

Here we read gene expression data which is quantified by salmon and processed by tximport package. The salmon generates count data, “abundance” (TPM) and gene length values. Therefore we may rename the corresponding assay slot ‘abundance’ to ‘TPM’.

These data belongs to this studies reference!!

In this data, there are gene expression values from first molar tissue of three different developing rodent species. The tissues are also from four different developmental time points.

se <- readRDS(system.file("extdata","rat_vole_mouseSE_salmon.rds", package = "broadSeq"))
SummarizedExperiment::assayNames(se)
#> [1] "counts"      "abundance"   "avgTxLength" "vst"

2.1 Sample metadata

as.data.frame(colData(se)) %>% dplyr::count(stage,species) %>% tidyr::spread(stage,n)
#>   species Bud Cap Late Cap Bell
#> 1   Mouse   7   7        7    7
#> 2     Rat   5   5        5    5
#> 3    Vole   7   7        7    7

se$stage <- factor(se$stage,levels = c("Bud","Cap","Late Cap","Bell"))

2.2 Filtering out low expression genes

assays(se)[["counts"]][,5] %>% ggpubr::ggdensity(y = "count")+
    ggplot2::geom_vline(xintercept = 10)+ggplot2::scale_x_log10()


keep <- (assays(se)[["counts"]] >= 3) %>% rowSums() >= 5 
# smallest Group Size is 5
table(keep)
#> keep
#> FALSE  TRUE 
#>   689  5089

3 Normalization

edgeR provides two different methods ; CPM and TMM. But it does not use normalized values for Differential Expression.

It also does not normalizes count values rather it normalizes the library sizes using “TMM” method in ‘normLibSizes’. Either use or not use normLibSizes(). edgeR::cpm() will generate normalized expression values.

3.1 CPM

se <- broadSeq::normalizeEdgerCPM(se ,method = "none",cpm.log = TRUE )
## The normalized values are added with the assay name "logCPM"
SummarizedExperiment::assayNames(se)
#> [1] "counts"      "abundance"   "avgTxLength" "vst"         "logCPM"

3.2 TMM

se <- broadSeq::normalizeEdgerCPM(se , method = "TMM", cpm.log = FALSE )
## The normalized values are added with the assay name "TMM"
SummarizedExperiment::assayNames(se)
#> [1] "counts"      "abundance"   "avgTxLength" "vst"         "logCPM"     
#> [6] "TMM"

3.3 access

assays(se)[["counts"]][1:5,1:5]
#>       ME16-E3M1L ME16-E5M1L ME16-E6M1R ME16-E6M1L ME16-E7M1R
#> Axin2       1080       2844       3129       2747       1850
#> Mmp14       2031       8334       8328       8626       6147
#> Timp1         60         96        209        265         66
#> Pax9        3063       6011       7177       6438       3794
#> Irx2         519       1237       1160       1056        802
assays(se)[["TMM"]][1:5,1:5]
#>       ME16-E3M1L ME16-E5M1L ME16-E6M1R ME16-E6M1L ME16-E7M1R
#> Axin2  266.12685  485.34661  496.17821  460.53011  432.54157
#> Mmp14  500.46634 1422.24988 1320.60470 1446.13495 1437.20705
#> Timp1   14.78483   16.38301   33.14198   44.42682   15.43121
#> Pax9   754.76533 1025.81522 1138.08597 1079.32029  887.06093
#> Irx2   127.88874  211.10188  183.94590  177.03669  187.51262
assays(se)[["logCPM"]][1:5,1:5]
#>       ME16-E3M1L ME16-E5M1L ME16-E6M1R ME16-E6M1L ME16-E7M1R
#> Axin2   8.031398   8.883492   8.916061   8.810157   8.733173
#> Mmp14   8.941266  10.433573  10.327398  10.459897  10.464346
#> Timp1   3.907562   4.037786   5.032632   5.451326   3.969329
#> Pax9    9.533527   9.962371  10.112899  10.037992   9.768499
#> Irx2    6.977147   7.684397   7.487015   7.433479   7.529533

4 Transformation

DESeq2 provides three different transformations

4.1 VST

variance stabilizing transformation (VST)

se <- broadSeq::transformDESeq2(se,method = "vst"  )
#> using 'avgTxLength' from assays(dds), correcting for library size

4.2 Normalized counts transformation

se <- broadSeq::transformDESeq2(se, method = "normTransform"  )
#> Warning in broadSeq::transformDESeq2(se, method = "normTransform"): For length correction assayname must match with avgTxLength
#> 
#> using 'avgTxLength' from assays(dds), correcting for library size

4.3 rlog

regularized log

se <- broadSeq::transformDESeq2(se, method = "rlog")

SummarizedExperiment::assayNames(se)
#> [1] "counts"        "abundance"     "avgTxLength"   "vst"          
#> [5] "logCPM"        "TMM"           "normTransform"

4.4 Comparision

Boxplot of different transforms for each sample.

p <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse" ], 
                           assayName = "counts", fill = "stage", 
                           yscale = "log2")+ rremove("x.text")

p1 <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse"], 
                           assayName = "vst", fill = "stage")+ rremove("x.text")

p2 <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse"], 
                           assayName = "TMM", fill = "stage", 
                           yscale = "log10")+ rremove("x.text")

p3 <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse"], 
                           assayName = "logCPM", fill = "stage")+ rremove("x.text")


ggarrange(p,p1,p2,p3, common.legend = TRUE, labels = c("A","B","C"))

Plot standard deviations versus means expression

if (requireNamespace("vsn", quietly = TRUE)) {
    library("vsn")
    x <- meanSdPlot(
        log2(assays(se[, se$species == "Rat"])[["counts"]]+1),
        plot = FALSE)
    print(x$gg +ggtitle(label = "log2(n+1) "))
    
    x <- meanSdPlot(
        assays(se[, se$species == "Rat"])[["vst"]],
        plot = FALSE)
    
    print(x$gg +ggtitle(label = "Vst"))
    
    x <- meanSdPlot(
        assays(se[, se$species == "Rat"])[["logCPM"]],
        plot = FALSE)
    print(x$gg + ggtitle(label = "logCPM"))
}

5 Visualization of gene Expression

## Multiple assay of a single gene
broadSeq::assay_plot(se, feature = c("Shh"), 
           assayNames = c("counts","logCPM","vst","TMM"),
           x = "stage", fill="species", add="dotplot", palette = "npg")


## Expression of multiple genes from a single assay
broadSeq::genes_plot(se, 
                     features = c("Shh","Edar"), 
                     facet.by = "symbol",
                     x = "stage", assayName = "vst", fill="species", palette = "jco")

5.1 Pre-defined or custom color palette based on journals

Scientific journal palettes from ggsci R package, e.g.: “npg”, “aaas”, “lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty” can be passed for coloring or filling by groups from metadata.

jco <- broadSeq::genes_plot(se[,se$species == "Mouse"], 
                     features = c("Shh"), facet.by = "symbol", assayName =  "logCPM",
                     x = "stage",  fill="stage", add="dotplot", xlab = "", 
                     title = "Journal of Clinical Oncology", palette = "jco") 

npg <- broadSeq::genes_plot(se[,se$species == "Mouse"], 
                     features = c("Shh"), facet.by = "symbol",assayName =  "logCPM",
                     x = "stage", fill="stage", add="dotplot", xlab = "",
                     title = "Nature Publishing Group", palette = "npg") 

aaas <- broadSeq::genes_plot(se[,se$species == "Mouse"], 
                     features = c("Shh"), facet.by = "symbol", assayName = "logCPM",
                     x = "stage", fill="stage", add="dotplot", xlab = "",
                     title = "Science", palette = "aaas")

nejm <- broadSeq::genes_plot(se[,se$species == "Mouse"], 
                     features = c("Shh"), facet.by = "symbol", assayName = "logCPM",
                     x = "stage", fill="stage", add="dotplot",  xlab = "",
                     title = "New England Journal of Medicine",palette = "nejm")

# ggarrange(jco,npg,aaas,nejm, 
#           common.legend = TRUE,legend = "none",
#           labels = c("A","B","C","D"))

ggarrange(jco+ggpubr::rotate_x_text(), npg+ggpubr::rotate_x_text(),
          aaas+ggpubr::rotate_x_text(),nejm+ggpubr::rotate_x_text(),
          nrow = 1, common.legend = TRUE,legend = "none",
          labels = c("A","B","C","D")) %>% 
    annotate_figure( top = text_grob("Color palette"))

6 QC with Clustering

6.1 MDS plot

Classical multidimensional scaling is based on measuring the distance between the samples.

Popular function plotMDS from limma does not work with SummarizedExperiment object. Here broadSeq provides this function through package cmdscale {stats}.

broadSeq::plot_MDS(se, scaledAssay = "vst", ntop=500, 
                   color = "species", shape = "stage", 
                   ellipse=TRUE, legend = "bottom")
#> Only using 500 most variable genes.

MDS is cool but it is not possible to know/visualize top variable genes with their meta data which is stored in se object

head(rowData(se))
#> DataFrame with 6 rows and 8 columns
#>             mouse_gene_id      symbol       Class chromosome_name
#>               <character> <character>    <factor>     <character>
#> Axin2  ENSMUSG00000000142       Axin2 Dispensable              11
#> Mmp14  ENSMUSG00000000957       Mmp14 Tissue                   14
#> Timp1  ENSMUSG00000001131       Timp1 Dispensable               X
#> Pax9   ENSMUSG00000001497        Pax9 Progression              12
#> Irx2   ENSMUSG00000001504        Irx2 Dispensable              13
#> Col1a1 ENSMUSG00000001506      Col1a1 Dispensable              11
#>        start_position   gene_biotype        vole_gene_id       Rnor_gene_id
#>             <integer>    <character>         <character>        <character>
#> Axin2       108811175 protein_coding Mglareolus_00009775 ENSRNOG00000055010
#> Mmp14        54669069 protein_coding Mglareolus_00038784 ENSRNOG00000010947
#> Timp1        20736405 protein_coding Mglareolus_00039838 ENSRNOG00000010208
#> Pax9         56738552 protein_coding Mglareolus_00000242 ENSRNOG00000008826
#> Irx2         72776939 protein_coding Mglareolus_00017335 ENSRNOG00000012742
#> Col1a1       94827050 protein_coding Mglareolus_00042400 ENSRNOG00000003897

Other methods can help to visualize gene information along with clustering information.

6.2 Hierarchical clustering and Heatmap

p_vst <- broadSeq::plotHeatmapCluster(
    se,
    scaledAssay = "vst",
    annotation_col = c("species", "stage"),
    annotation_row = c("Class","gene_biotype"),
    ntop = 30, show_geneAs = "symbol",
    cluster_cols = TRUE, cluster_rows = FALSE,
    show_rownames = TRUE, show_colnames = FALSE,
    main = "Top 30 variable gene vst"
)
#> Only using 30 most variable genes.

6.3 PCA plot

6.3.1 prcompTidy

Perform Principal Components Analysis with function broadSeq::prcompTidy() which returns a list of four data.frame objects:

pc_scores,
eigen_values,
loadings (eigen vectors) and
the original data.

Compute PCA using any assay

computedPCA_logCPM <- broadSeq::prcompTidy(se, scaledAssay = "logCPM", ntop = 500)
#> Only using 500 most variable genes.
## PCA based on vst values
computedPCA_vst <- broadSeq::prcompTidy(se, scaledAssay = "vst", ntop = 500)
#> Only using 500 most variable genes.

6.3.2 Plot

6.3.2.1 logCPM

plotAnyPC(computedPCA = computedPCA_logCPM,
          x = 1, y = 2, color = "species", shape = "stage",
          legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument

6.3.2.2 VST

Here PC 2 and 3 are used which can cluster the samples by both species and developmental factors.

pca_vst <- plotAnyPC(computedPCA = computedPCA_vst,
            x = 2, y = 3,  color = "species", shape = "stage", 
            legend = "bottom") 
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument

6.3.2.3 Other PCs

computedPCA_vst$eigen_values %>%
        dplyr::filter(var_exp >= 2) %>%
    ggbarplot(x="PC",y="var_exp", label = TRUE, label.pos = "out")

It can be checked if there are other PCs to explain considerable variance. PC3 can be useful to see variance due to different developmental time points.

pca_vst_2_3 <-plotAnyPC(computedPCA = computedPCA_vst,
                x = 2, y = 3,  
                color = "species", shape = "stage", legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
# pca_vst_2_3

PC3 captures beautifully the variance in gene expression due to developmental stages.

6.3.2.4 Gene loading

computedPCA_vst %>% broadSeq::getFeatureLoadRanking(keep = c("symbol","Class")) %>% head()
#>    symbol        Class     loading  PC Rank
#> 1    Rpl4        Other -0.10384659 PC1    1
#> 2 Hsp90b1        Other -0.09727451 PC1    2
#> 3    Atrx Dev. process -0.09419071 PC1    3
#> 4   Rpl19        Other -0.09347617 PC1    4
#> 5     Ncl        Other -0.09181188 PC1    5
#> 6    Rpl3        Other -0.08743885 PC1    6

#  Top 5 genes of PC2

computedPCA_vst$loadings %>% top_n(5,abs(PC2)  ) %>% dplyr::select(gene,PC2)
#>                  gene        PC2
#> Aadacl2fm3 Aadacl2fm3  0.1635445
#> Rack1           Rack1 -0.1452180
#> Rpl11           Rpl11 -0.1474743
#> Rpl8             Rpl8 -0.1511175
#> Rplp0           Rplp0 -0.1586286

pca_vst_loading <- computedPCA_vst %>% 
    broadSeq::getFeatureLoadRanking(keep = c("symbol","Class"), topN = 50, pcs=1:10) %>% 
    dplyr::count(Class, PC) %>%
    ggbarplot(
        x = "PC", y = "n", fill = "Class",
        legend = "bottom", palette = c("red","blue","orange","purple","white","grey")
    ) 
# pca_vst_loading

6.3.2.5 Biplot

# By default it plots top 2 genes from each PC axis
pca_vst_bi <- broadSeq::biplotAnyPC(computedPCA = computedPCA_vst, 
            x = 1, y = 2, genesLabel = "symbol", 
            color = "species", shape = "stage", 
            legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
# pca_vst_bi

ggarrange(
    ggarrange(pca_vst_bi+ggtitle(label =  ""),
          pca_vst_2_3+ggtitle(label =  ""), common.legend = TRUE),
    pca_vst_loading, nrow = 2)

6.3.2.6 User defined genes

Now plotting top 5 genes from PC3

# Top 5 genes of PC3
biplotAnyPC(computedPCA = computedPCA_vst,x = 2, y = 3, 
            color = "species", shape = "stage",
            genes= computedPCA_vst$loadings %>% 
                top_n(5,abs(PC3)) %>% pull(gene),
            genesLabel = "symbol")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument


## Plot progression gene "Shh" 
biplotAnyPC(computedPCA = computedPCA_vst,x = 2, y = 3, 
            color = "species", shape = "stage",
            genes=c("Shh"),
            genesLabel = "symbol")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument

7 Compare Differential expression

7.1 Data

In this example we will use RNA-seq expression data from developing mouse molar tissue. First we read the data as a SummarizedExperiment object.

se <- readRDS(system.file("extdata","rat_vole_mouseSE_salmon.rds", package = "broadSeq"))

# To reduce the run time, subset of the data used here
se <- se[,colData(se)$species == "Mouse"]

7.1.1 Gene information

head(rownames(se))
#> [1] "Axin2"  "Mmp14"  "Timp1"  "Pax9"   "Irx2"   "Col1a1"
head(rowData(se))
#> DataFrame with 6 rows and 8 columns
#>             mouse_gene_id      symbol       Class chromosome_name
#>               <character> <character>    <factor>     <character>
#> Axin2  ENSMUSG00000000142       Axin2 Dispensable              11
#> Mmp14  ENSMUSG00000000957       Mmp14 Tissue                   14
#> Timp1  ENSMUSG00000001131       Timp1 Dispensable               X
#> Pax9   ENSMUSG00000001497        Pax9 Progression              12
#> Irx2   ENSMUSG00000001504        Irx2 Dispensable              13
#> Col1a1 ENSMUSG00000001506      Col1a1 Dispensable              11
#>        start_position   gene_biotype        vole_gene_id       Rnor_gene_id
#>             <integer>    <character>         <character>        <character>
#> Axin2       108811175 protein_coding Mglareolus_00009775 ENSRNOG00000055010
#> Mmp14        54669069 protein_coding Mglareolus_00038784 ENSRNOG00000010947
#> Timp1        20736405 protein_coding Mglareolus_00039838 ENSRNOG00000010208
#> Pax9         56738552 protein_coding Mglareolus_00000242 ENSRNOG00000008826
#> Irx2         72776939 protein_coding Mglareolus_00017335 ENSRNOG00000012742
#> Col1a1       94827050 protein_coding Mglareolus_00042400 ENSRNOG00000003897

7.1.2 Sample information

The sample metadata

head(colData(se))
#> DataFrame with 6 rows and 4 columns
#>                 day        info    stage  species
#>            <factor> <character> <factor> <factor>
#> ME16-E3M1L      E16  ME16-E3M1L     Bell    Mouse
#> ME16-E5M1L      E16  ME16-E5M1L     Bell    Mouse
#> ME16-E6M1R      E16  ME16-E6M1R     Bell    Mouse
#> ME16-E6M1L      E16  ME16-E6M1L     Bell    Mouse
#> ME16-E7M1R      E16  ME16-E7M1R     Bell    Mouse
#> ME16-E8M1R      E16  ME16-E8M1R     Bell    Mouse
table(colData(se)$stage)
#> 
#>      Bud      Cap Late Cap     Bell 
#>        7        7        7        7

7.2 Differential Expression

There are four developmental time points; Bud, Cap, Late Cap and Bell (from embryonic days 13 to 16) with seven replicates each. Here, in this example, DE genes between Bud and Cap stages will be identified.

7.2.1 Function pattern

In broadSeq, the names of DE method has similar pattern like “use_”+“method name”. All these method has same signature of input arguments.

Here we will see use_NOIseq() method to apply NOISeq::noiseqbio() function.

result_Noiseq <- 
    use_NOIseq(se = se, 
           colData_id = "stage", control = "Bud", treatment = "Cap",
           rank = TRUE, 
           r = 10) # r is an argument of NOISeq::noiseqbio
#> Computing Z values...
#> Filtering out low count features...
#> 4654 features are to be kept for differential expression analysis with filtering method 1
#> [1] "r = 1"
#> [1] "r = 2"
#> [1] "r = 3"
#> [1] "r = 4"
#> [1] "r = 5"
#> [1] "r = 6"
#> [1] "r = 7"
#> [1] "r = 8"
#> [1] "r = 9"
#> [1] "r = 10"
#> Computing probability of differential expression...
#> p0 = 0.3809764779521
#> Probability
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>  0.0000  0.3169  0.7234  0.6102  0.9102  1.0000    1124

head(result_Noiseq)
#>          Bud_mean   Cap_mean    theta prob   log2FC rank
#> Dlx2    888.94171 375.731990 4.449016    1 1.242385    1
#> Irf6    511.38324 215.511591 4.491907    1 1.246639    2
#> Ptch2   358.98412 151.233857 3.525978    1 1.247139    3
#> Spp1     43.80853   1.627490 3.780321    1 4.750491    4
#> Fgf3     92.59420   1.270972 4.002669    1 6.186918    5
#> Sostdc1 993.75851 448.122284 4.155289    1 1.149003    6

pg <- broadSeq::genes_plot(se, x = "stage", assayName =  "counts",  
           features = result_Noiseq %>% dplyr::filter(rank <5) %>% rownames(),
           fill="stage", facet.by = "symbol",
           palette="jco", add = "dotplot")+rotate_x_text()

pg_sc <- ggscatter(result_Noiseq, x="Bud_mean", y="Cap_mean",color = "prob")+ 
    scale_x_log10()+scale_y_log10()

pg+pg_sc

7.2.2 Available methods

Following implementations of popular methods are available here.

# limma 
?use_limma_trend(se, colData_id, control, treatment, rank = FALSE, ...)
?use_limma_voom(se, colData_id, control, treatment, rank = FALSE, ...)

# edgeR 
?use_edgeR_exact(se, colData_id, control, treatment, rank = FALSE, ...)
?use_edgeR_GLM(se, colData_id, control, treatment, rank = FALSE, ...)

# deseq2
?use_deseq2(se, colData_id, control, treatment, rank = FALSE, ...)

# DELocal
?use_DELocal(se, colData_id, control, treatment, rank = FALSE, ...)

# noiseq
?use_NOIseq(se, colData_id, control, treatment, rank = FALSE, ...) 

# EBSeq 
?use_EBSeq(se, colData_id, control, treatment, rank = FALSE, ...)

# samseq
?use_SAMseq(se, colData_id, control, treatment, rank = FALSE, ...)

Advanced users can pass package specific arguments through ‘…’ .

7.3 Compare DE results

It is possible to execute all DE methods together and get aggregated results in a data.frame. broadSeq::use_multDE() should be used for it.

# First define a named list of functions
funs <- list(limma_trend = use_limma_trend, limma_voom = use_limma_voom,
             edgeR_exact = use_edgeR_exact, edgeR_glm = use_edgeR_GLM,
             deseq2 = use_deseq2, 
             DELocal = use_DELocal, noiseq = use_NOIseq, 
             EBSeq = use_EBSeq) 


multi_result <- broadSeq::use_multDE(
    se = se, 
    deFun_list = funs, return.df = TRUE,  
    colData_id = "stage", control = "Bud", treatment = "Cap", 
    rank = TRUE)
#> Now executing >>  limma_trend
#> ####
#> Now executing >>  limma_voom
#> ####
#> Now executing >>  edgeR_exact
#> ####
#> Now executing >>  edgeR_glm
#> ####
#> Now executing >>  deseq2
#> ####
#> factor levels were dropped which had no samples
#> estimating size factors
#> using 'avgTxLength' from assays(dds), correcting for library size
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 22 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> Now executing >>  DELocal
#> ####
#> Default 1Mb neighborhood will be used
#> factor levels were dropped which had no samples
#> using 'avgTxLength' from assays(dds), correcting for library size
#> Now executing >>  noiseq
#> ####
#> Now executing >>  EBSeq
#> ####

The column names of the resultant data.frame are prefixed with corresponding function names.

head(multi_result)
#>               limma_trend_logFC limma_trend_AveExpr limma_trend_t
#> 1110008P14Rik        0.54405160           4.8117022     3.6578280
#> 1110032A03Rik       -0.13874330           5.3785105    -1.1189797
#> 1110032F04Rik        1.49491410           3.0666526     7.0648546
#> 1110051M20Rik       -0.08607705           6.8048345    -1.8949926
#> 1110059G10Rik        0.11686705           5.4687249     0.9131006
#> 1300017J02Rik       -1.75373449           0.3862759    -3.1448352
#>               limma_trend_P.Value limma_trend_adj.P.Val limma_trend_B
#> 1110008P14Rik        2.797602e-03          0.0130149310     -2.410387
#> 1110032A03Rik        2.829681e-01          0.4526549814     -6.717686
#> 1110032F04Rik        7.519122e-06          0.0001281578      3.676603
#> 1110051M20Rik        8.005768e-02          0.1777077549     -5.657093
#> 1110059G10Rik        3.774584e-01          0.5365202649     -6.925361
#> 1300017J02Rik        7.568554e-03          0.0283599915     -3.404915
#>               limma_trend_rank limma_voom_logFC limma_voom_AveExpr limma_voom_t
#> 1110008P14Rik             1242       0.55012790          4.7963921    3.4912966
#> 1110032A03Rik             3612      -0.13598068          5.3683453   -1.0766122
#> 1110032F04Rik              339       1.52521100          3.0082492    5.5457607
#> 1110051M20Rik             2603      -0.08377041          6.8010984   -1.5379391
#> 1110059G10Rik             4065       0.12743802          5.4591759    0.9788081
#> 1300017J02Rik             1542      -2.79755481         -0.4115494   -3.3164034
#>               limma_voom_P.Value limma_voom_adj.P.Val limma_voom_B
#> 1110008P14Rik       3.185938e-03         0.0140952159    -2.450562
#> 1110032A03Rik       2.982906e-01         0.4602197395    -6.781351
#> 1110032F04Rik       5.160299e-05         0.0005491014     2.094132
#> 1110051M20Rik       1.444281e-01         0.2758695513    -6.580324
#> 1110059G10Rik       3.428565e-01         0.5101789153    -6.904758
#> 1300017J02Rik       4.577596e-03         0.0189053550    -1.996826
#>               limma_voom_rank edgeR_exact_logFC edgeR_exact_logCPM
#> 1110008P14Rik            1306        0.56742197          4.8655704
#> 1110032A03Rik            3745       -0.11220434          5.3968467
#> 1110032F04Rik             543        1.62986496          3.3283271
#> 1110051M20Rik            3025       -0.08359159          6.8073364
#> 1110059G10Rik            3883        0.13179863          5.4882085
#> 1300017J02Rik            1399       -2.12756839          0.8642446
#>               edgeR_exact_PValue edgeR_exact_FDR edgeR_exact_rank
#> 1110008P14Rik       1.000530e-03    5.231732e-03             1105
#> 1110032A03Rik       4.437472e-01    6.609702e-01             3878
#> 1110032F04Rik       2.602324e-09    4.409451e-08              341
#> 1110051M20Rik       3.282945e-01    5.379710e-01             3526
#> 1110059G10Rik       3.735760e-01    5.881532e-01             3670
#> 1300017J02Rik       8.116018e-03    3.044647e-02             1540
#>               edgeR_glm_logFC edgeR_glm_logCPM edgeR_glm_LR edgeR_glm_PValue
#> 1110008P14Rik      0.56740347        4.8655850   11.7410747     6.113571e-04
#> 1110032A03Rik     -0.11219402        5.3968485    0.6026420     4.375717e-01
#> 1110032F04Rik      1.62985892        3.3283632   35.7072150     2.293132e-09
#> 1110051M20Rik     -0.08358111        6.8073360    0.9752985     3.233623e-01
#> 1110059G10Rik      0.13181500        5.4882097    0.8224682     3.644595e-01
#> 1300017J02Rik     -2.12795630        0.8641698    6.5759873     1.033637e-02
#>               edgeR_glm_FDR edgeR_glm_rank deseq2_baseMean
#> 1110008P14Rik  3.429535e-03           1030       147.18768
#> 1110032A03Rik  6.195270e-01           4081       218.60073
#> 1110032F04Rik  3.840497e-08            345        49.16736
#> 1110051M20Rik  5.007092e-01           3731       595.51100
#> 1110059G10Rik  5.431548e-01           3877       232.78793
#> 1300017J02Rik  3.670776e-02           1627        12.81403
#>               deseq2_log2FoldChange deseq2_lfcSE deseq2_stat deseq2_pvalue
#> 1110008P14Rik            0.50451366    0.1657664   3.0435219  2.338264e-03
#> 1110032A03Rik            0.19082266    0.1940145   0.9835483  3.253377e-01
#> 1110032F04Rik            1.58701594    0.2811161   5.6454104  1.647877e-08
#> 1110051M20Rik            0.43358853    0.1816105   2.3874642  1.696506e-02
#> 1110059G10Rik            0.08091388    0.1977231   0.4092283  6.823722e-01
#> 1300017J02Rik           -2.81096557    1.0793250  -2.6043737  9.204233e-03
#>                deseq2_padj deseq2_rank DELocal_relative.logFC DELocal_P.Value
#> 1110008P14Rik 1.108404e-02        1055               17.83827    0.2729166121
#> 1110032A03Rik 4.972536e-01        3272               42.92129    0.1221318350
#> 1110032F04Rik 3.225187e-07         255               48.05240    0.0007963788
#> 1110051M20Rik 5.449085e-02        1557               88.53803    0.1739847417
#> 1110059G10Rik 8.038971e-01        4245               11.50263    0.7104000042
#> 1300017J02Rik 3.336348e-02        1380              -17.16339    0.0080219554
#>               DELocal_adj.P.Val DELocal_B DELocal_rank noiseq_Bud_mean
#> 1110008P14Rik        0.62788658 -4.658483         2740      33.8722312
#> 1110032A03Rik        0.41032892 -4.279768         1972      39.6798287
#> 1110032F04Rik        0.01675251 -1.976905         1891      14.3689100
#> 1110051M20Rik        0.49469006 -4.450382         1367     107.2818570
#> 1110059G10Rik        0.95442262 -5.011704         3056      45.9373517
#> 1300017J02Rik        0.07509916 -2.959888         2777       0.6001355
#>               noiseq_Cap_mean noiseq_theta noiseq_prob noiseq_log2FC
#> 1110008P14Rik       23.331126    0.6555759   0.8852864    0.53784711
#> 1110032A03Rik       43.774221   -0.1974162   0.4118575   -0.14167571
#> 1110032F04Rik        4.693828    1.3985890   0.9774037    1.61411366
#> 1110051M20Rik      115.980525   -0.2726831   0.5827917   -0.11247646
#> 1110059G10Rik       42.887052    0.1415338   0.1180432    0.09912552
#> 1300017J02Rik        2.506642   -0.9268528   0.9510382   -2.06239542
#>               noiseq_rank   EBSeq_PPEE EBSeq_PPDE EBSeq_Status EBSeq_Direction
#> 1110008P14Rik        1872 0.0315029178 0.96849708           DE    Bud Over Cap
#> 1110032A03Rik        3472 0.9747575436 0.02524246           EE    Bud Over Cap
#> 1110032F04Rik         505 0.0001226596 0.99987734           DE    Bud Over Cap
#> 1110051M20Rik        3056 0.9830170118 0.01698299           EE    Bud Over Cap
#> 1110059G10Rik        4102 0.9690535947 0.03094641           EE    Bud Over Cap
#> 1300017J02Rik        1040 0.0220131318 0.97798687           DE    Bud Over Cap
#>               EBSeq_rank      mouse_gene_id        symbol Class chromosome_name
#> 1110008P14Rik       1219 ENSMUSG00000039195 1110008P14Rik Other               2
#> 1110032A03Rik       4261 ENSMUSG00000037971 1110032A03Rik Other               9
#> 1110032F04Rik        744 ENSMUSG00000046999 1110032F04Rik Other               3
#> 1110051M20Rik       4582 ENSMUSG00000040591 1110051M20Rik Other               2
#> 1110059G10Rik       4093 ENSMUSG00000032551 1110059G10Rik Other               9
#> 1300017J02Rik       1186 ENSMUSG00000033688 1300017J02Rik Other               9
#>               start_position   gene_biotype        vole_gene_id
#> 1110008P14Rik       32267109 protein_coding Mglareolus_00032142
#> 1110032A03Rik       50674128 protein_coding Mglareolus_00020124
#> 1110032F04Rik       68776919 protein_coding Mglareolus_00010961
#> 1110051M20Rik       91105413 protein_coding Mglareolus_00006101
#> 1110059G10Rik      122774154 protein_coding Mglareolus_00025510
#> 1300017J02Rik      103127720 protein_coding Mglareolus_00023537
#>                     Rnor_gene_id
#> 1110008P14Rik ENSRNOG00000022681
#> 1110032A03Rik ENSRNOG00000030962
#> 1110032F04Rik ENSRNOG00000009803
#> 1110051M20Rik ENSRNOG00000014798
#> 1110059G10Rik ENSRNOG00000004135
#> 1300017J02Rik ENSRNOG00000009434
# nrow(multi_result) == nrow(se)
colnames(multi_result)
#>  [1] "limma_trend_logFC"      "limma_trend_AveExpr"    "limma_trend_t"         
#>  [4] "limma_trend_P.Value"    "limma_trend_adj.P.Val"  "limma_trend_B"         
#>  [7] "limma_trend_rank"       "limma_voom_logFC"       "limma_voom_AveExpr"    
#> [10] "limma_voom_t"           "limma_voom_P.Value"     "limma_voom_adj.P.Val"  
#> [13] "limma_voom_B"           "limma_voom_rank"        "edgeR_exact_logFC"     
#> [16] "edgeR_exact_logCPM"     "edgeR_exact_PValue"     "edgeR_exact_FDR"       
#> [19] "edgeR_exact_rank"       "edgeR_glm_logFC"        "edgeR_glm_logCPM"      
#> [22] "edgeR_glm_LR"           "edgeR_glm_PValue"       "edgeR_glm_FDR"         
#> [25] "edgeR_glm_rank"         "deseq2_baseMean"        "deseq2_log2FoldChange" 
#> [28] "deseq2_lfcSE"           "deseq2_stat"            "deseq2_pvalue"         
#> [31] "deseq2_padj"            "deseq2_rank"            "DELocal_relative.logFC"
#> [34] "DELocal_P.Value"        "DELocal_adj.P.Val"      "DELocal_B"             
#> [37] "DELocal_rank"           "noiseq_Bud_mean"        "noiseq_Cap_mean"       
#> [40] "noiseq_theta"           "noiseq_prob"            "noiseq_log2FC"         
#> [43] "noiseq_rank"            "EBSeq_PPEE"             "EBSeq_PPDE"            
#> [46] "EBSeq_Status"           "EBSeq_Direction"        "EBSeq_rank"            
#> [49] "mouse_gene_id"          "symbol"                 "Class"                 
#> [52] "chromosome_name"        "start_position"         "gene_biotype"          
#> [55] "vole_gene_id"           "Rnor_gene_id"

7.3.0.1 Similarity of methods

DE methods are clustered based on their ranking of genes

clusters <- multi_result %>% dplyr::select(ends_with("rank")) %>% t() %>% dist() %>% hclust()
plot(clusters,main =  "distance: Euclidean")

7.3.1 Plots

7.3.1.1 Volcano

Up regulated genes should be red or hot colors

multi_result %>% broadSeq::volcanoPlot(
    pValName = "deseq2_padj",
    lFCName = "deseq2_log2FoldChange",
    labelName = "symbol",
    palette = "lancet" ,
    selectedLabel =
        multi_result %>% dplyr::arrange(deseq2_padj) %>% pull(symbol) %>% head()
)


multi_result %>% broadSeq::volcanoPlot(
    pValName = "deseq2_padj",
    lFCName = "deseq2_log2FoldChange",
    labelName = "symbol",
    palette = c("purple","orange","grey"),
    selectedLabel = list(criteria = "(`x` > 5 | `x` < -2) & (`y` > 10)")
) +xlim(-7.5,7.5)

Session Info

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] broadSeq_1.0.0              SummarizedExperiment_1.36.0
#>  [3] Biobase_2.66.0              GenomicRanges_1.58.0       
#>  [5] GenomeInfoDb_1.42.0         IRanges_2.40.0             
#>  [7] S4Vectors_0.44.0            BiocGenerics_0.52.0        
#>  [9] MatrixGenerics_1.18.0       matrixStats_1.4.1          
#> [11] ggpubr_0.6.0                ggplot2_3.5.1              
#> [13] dplyr_1.1.4                 BiocStyle_2.34.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.4.1           bitops_1.0-9            ggplotify_0.1.2        
#>   [4] tibble_3.2.1            R.oo_1.26.0             XML_3.99-0.17          
#>   [7] lifecycle_1.0.4         rstatix_0.7.2           rprojroot_2.0.4        
#>  [10] edgeR_4.4.0             doParallel_1.0.17       lattice_0.22-6         
#>  [13] NOISeq_2.50.0           backports_1.5.0         magrittr_2.0.3         
#>  [16] limma_3.62.0            sass_0.4.9              rmarkdown_2.28         
#>  [19] jquerylib_0.1.4         yaml_2.3.10             ggtangle_0.0.3         
#>  [22] cowplot_1.1.3           DBI_1.2.3               RColorBrewer_1.1-3     
#>  [25] pkgload_1.4.0           abind_1.4-8             zlibbioc_1.52.0        
#>  [28] Rtsne_0.17              purrr_1.0.2             R.utils_2.12.3         
#>  [31] yulab.utils_0.1.7       circlize_0.4.16         seriation_1.5.6        
#>  [34] GenomeInfoDbData_1.2.13 enrichplot_1.26.0       ggrepel_0.9.6          
#>  [37] tidytree_0.4.6          testthat_3.2.1.1        genefilter_1.88.0      
#>  [40] pheatmap_1.0.12         annotate_1.84.0         codetools_0.2-20       
#>  [43] DelayedArray_0.32.0     DOSE_4.0.0              tidyselect_1.2.1       
#>  [46] shape_1.4.6.1           aplot_0.2.3             UCSC.utils_1.2.0       
#>  [49] farver_2.1.2            TSP_1.2-4               jsonlite_1.8.9         
#>  [52] GetoptLong_1.0.5        Formula_1.2-5           randomcoloR_1.1.0.1    
#>  [55] survival_3.7-0          iterators_1.0.14        foreach_1.5.2          
#>  [58] tools_4.4.1             treeio_1.30.0           sechm_1.14.0           
#>  [61] Rcpp_1.0.13             glue_1.8.0              gridExtra_2.3          
#>  [64] SparseArray_1.6.0       xfun_0.48               DESeq2_1.46.0          
#>  [67] qvalue_2.38.0           ca_0.71.1               withr_3.0.2            
#>  [70] BiocManager_1.30.25     fastmap_1.2.0           fansi_1.0.6            
#>  [73] caTools_1.18.3          digest_0.6.37           DELocal_1.6.0          
#>  [76] R6_2.5.1                gridGraphics_0.5-1      colorspace_2.1-1       
#>  [79] GO.db_3.20.0            gtools_3.9.5            RSQLite_2.3.7          
#>  [82] R.methodsS3_1.8.2       ggsci_3.2.0             utf8_1.2.4             
#>  [85] tidyr_1.3.1             generics_0.1.3          data.table_1.16.2      
#>  [88] httr_1.4.7              S4Arrays_1.6.0          pkgconfig_2.0.3        
#>  [91] gtable_0.3.6            blob_1.2.4              registry_0.5-1         
#>  [94] ComplexHeatmap_2.22.0   XVector_0.46.0          brio_1.1.5             
#>  [97] clusterProfiler_4.14.0  htmltools_0.5.8.1       carData_3.0-5          
#> [100] bookdown_0.41           fgsea_1.32.0            clue_0.3-65            
#> [103] blockmodeling_1.1.5     scales_1.3.0            png_0.1-8              
#> [106] EBSeq_2.4.0             ggfun_0.1.7             knitr_1.48             
#> [109] reshape2_1.4.4          rjson_0.2.23            nlme_3.1-166           
#> [112] curl_5.2.3              cachem_1.1.0            GlobalOptions_0.1.2    
#> [115] stringr_1.5.1           KernSmooth_2.23-24      parallel_4.4.1         
#> [118] AnnotationDbi_1.68.0    desc_1.4.3              pillar_1.9.0           
#> [121] grid_4.4.1              vctrs_0.6.5             gplots_3.2.0           
#> [124] car_3.1-3               xtable_1.8-4            cluster_2.1.6          
#> [127] evaluate_1.0.1          magick_2.8.5            tinytex_0.53           
#> [130] cli_3.6.3               locfit_1.5-9.10         compiler_4.4.1         
#> [133] rlang_1.1.4             crayon_1.5.3            ggsignif_0.6.4         
#> [136] labeling_0.4.3          forcats_1.0.0           plyr_1.8.9             
#> [139] fs_1.6.4                stringi_1.8.4           BiocParallel_1.40.0    
#> [142] munsell_0.5.1           Biostrings_2.74.0       lazyeval_0.2.2         
#> [145] V8_6.0.0                GOSemSim_2.32.0         Matrix_1.7-1           
#> [148] RcppEigen_0.3.4.0.2     patchwork_1.3.0         bit64_4.5.2            
#> [151] KEGGREST_1.46.0         statmod_1.5.0           highr_0.11             
#> [154] igraph_2.1.1            broom_1.0.7             memoise_2.0.1          
#> [157] bslib_0.8.0             ggtree_3.14.0           fastmatch_1.1-4        
#> [160] bit_4.5.0               ape_5.8                 gson_0.1.0

Using broadSeq to analyze RNA-seq data

29 October 2024