To analyze RNA-seq count data, there are several ways/methods for each steps like
Transforming/scaling of the count data,
QC by clustering the samples using PCA, hierarchical clustering or multidimensional scaling
Most importantly identification of differentially expressed.
For each of these steps, there are different packages or tools whose input and output formats are very different. Therefore it is very difficult to use all these features from different packages in a study.
The broadSeq package simplifies the process of including many Bioconductor packages for RNA-seq data and evaluating their performance.
The silent features of broadSeq are
data.frame
ggplot2
and ggpubr
packages for publication ready
figureslibrary(broadSeq)
library(ggplot2)
Here we read gene expression data which is quantified by salmon
and
processed by tximport package. The salmon
generates count data,
“abundance” (TPM) and gene length values. Therefore we may rename the corresponding
assay slot ‘abundance’ to ‘TPM’.
These data belongs to this studies reference!!
In this data, there are gene expression values from first molar tissue of three different developing rodent species. The tissues are also from four different developmental time points.
se <- readRDS(system.file("extdata","rat_vole_mouseSE_salmon.rds", package = "broadSeq"))
SummarizedExperiment::assayNames(se)
#> [1] "counts" "abundance" "avgTxLength" "vst"
as.data.frame(colData(se)) %>% dplyr::count(stage,species) %>% tidyr::spread(stage,n)
#> species Bud Cap Late Cap Bell
#> 1 Mouse 7 7 7 7
#> 2 Rat 5 5 5 5
#> 3 Vole 7 7 7 7
se$stage <- factor(se$stage,levels = c("Bud","Cap","Late Cap","Bell"))
assays(se)[["counts"]][,5] %>% ggpubr::ggdensity(y = "count")+
ggplot2::geom_vline(xintercept = 10)+ggplot2::scale_x_log10()
keep <- (assays(se)[["counts"]] >= 3) %>% rowSums() >= 5
# smallest Group Size is 5
table(keep)
#> keep
#> FALSE TRUE
#> 689 5089
edgeR provides two different methods ; CPM and TMM. But it does not use normalized values for Differential Expression.
It also does not normalizes count values rather it normalizes the library sizes using “TMM” method in ‘normLibSizes’. Either use or not use normLibSizes(). edgeR::cpm() will generate normalized expression values.
se <- broadSeq::normalizeEdgerCPM(se ,method = "none",cpm.log = TRUE )
## The normalized values are added with the assay name "logCPM"
SummarizedExperiment::assayNames(se)
#> [1] "counts" "abundance" "avgTxLength" "vst" "logCPM"
se <- broadSeq::normalizeEdgerCPM(se , method = "TMM", cpm.log = FALSE )
## The normalized values are added with the assay name "TMM"
SummarizedExperiment::assayNames(se)
#> [1] "counts" "abundance" "avgTxLength" "vst" "logCPM"
#> [6] "TMM"
assays(se)[["counts"]][1:5,1:5]
#> ME16-E3M1L ME16-E5M1L ME16-E6M1R ME16-E6M1L ME16-E7M1R
#> Axin2 1080 2844 3129 2747 1850
#> Mmp14 2031 8334 8328 8626 6147
#> Timp1 60 96 209 265 66
#> Pax9 3063 6011 7177 6438 3794
#> Irx2 519 1237 1160 1056 802
assays(se)[["TMM"]][1:5,1:5]
#> ME16-E3M1L ME16-E5M1L ME16-E6M1R ME16-E6M1L ME16-E7M1R
#> Axin2 266.12685 485.34661 496.17821 460.53011 432.54157
#> Mmp14 500.46634 1422.24988 1320.60470 1446.13495 1437.20705
#> Timp1 14.78483 16.38301 33.14198 44.42682 15.43121
#> Pax9 754.76533 1025.81522 1138.08597 1079.32029 887.06093
#> Irx2 127.88874 211.10188 183.94590 177.03669 187.51262
assays(se)[["logCPM"]][1:5,1:5]
#> ME16-E3M1L ME16-E5M1L ME16-E6M1R ME16-E6M1L ME16-E7M1R
#> Axin2 8.031398 8.883492 8.916061 8.810157 8.733173
#> Mmp14 8.941266 10.433573 10.327398 10.459897 10.464346
#> Timp1 3.907562 4.037786 5.032632 5.451326 3.969329
#> Pax9 9.533527 9.962371 10.112899 10.037992 9.768499
#> Irx2 6.977147 7.684397 7.487015 7.433479 7.529533
DESeq2 provides three different transformations
variance stabilizing transformation (VST)
se <- broadSeq::transformDESeq2(se,method = "vst" )
#> using 'avgTxLength' from assays(dds), correcting for library size
se <- broadSeq::transformDESeq2(se, method = "normTransform" )
#> Warning in broadSeq::transformDESeq2(se, method = "normTransform"): For length correction assayname must match with avgTxLength
#>
#> using 'avgTxLength' from assays(dds), correcting for library size
regularized log
se <- broadSeq::transformDESeq2(se, method = "rlog")
SummarizedExperiment::assayNames(se)
#> [1] "counts" "abundance" "avgTxLength" "vst"
#> [5] "logCPM" "TMM" "normTransform"
Boxplot of different transforms for each sample.
p <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse" ],
assayName = "counts", fill = "stage",
yscale = "log2")+ rremove("x.text")
p1 <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse"],
assayName = "vst", fill = "stage")+ rremove("x.text")
p2 <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse"],
assayName = "TMM", fill = "stage",
yscale = "log10")+ rremove("x.text")
p3 <- broadSeq::sampleAssay_plot(se[, se$species=="Mouse"],
assayName = "logCPM", fill = "stage")+ rremove("x.text")
ggarrange(p,p1,p2,p3, common.legend = TRUE, labels = c("A","B","C"))
Plot standard deviations versus means expression
if (requireNamespace("vsn", quietly = TRUE)) {
library("vsn")
x <- meanSdPlot(
log2(assays(se[, se$species == "Rat"])[["counts"]]+1),
plot = FALSE)
print(x$gg +ggtitle(label = "log2(n+1) "))
x <- meanSdPlot(
assays(se[, se$species == "Rat"])[["vst"]],
plot = FALSE)
print(x$gg +ggtitle(label = "Vst"))
x <- meanSdPlot(
assays(se[, se$species == "Rat"])[["logCPM"]],
plot = FALSE)
print(x$gg + ggtitle(label = "logCPM"))
}
## Multiple assay of a single gene
broadSeq::assay_plot(se, feature = c("Shh"),
assayNames = c("counts","logCPM","vst","TMM"),
x = "stage", fill="species", add="dotplot", palette = "npg")
## Expression of multiple genes from a single assay
broadSeq::genes_plot(se,
features = c("Shh","Edar"),
facet.by = "symbol",
x = "stage", assayName = "vst", fill="species", palette = "jco")
Scientific journal palettes from ggsci
R package, e.g.: “npg”, “aaas”,
“lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty” can
be passed for coloring or filling by groups from metadata.
jco <- broadSeq::genes_plot(se[,se$species == "Mouse"],
features = c("Shh"), facet.by = "symbol", assayName = "logCPM",
x = "stage", fill="stage", add="dotplot", xlab = "",
title = "Journal of Clinical Oncology", palette = "jco")
npg <- broadSeq::genes_plot(se[,se$species == "Mouse"],
features = c("Shh"), facet.by = "symbol",assayName = "logCPM",
x = "stage", fill="stage", add="dotplot", xlab = "",
title = "Nature Publishing Group", palette = "npg")
aaas <- broadSeq::genes_plot(se[,se$species == "Mouse"],
features = c("Shh"), facet.by = "symbol", assayName = "logCPM",
x = "stage", fill="stage", add="dotplot", xlab = "",
title = "Science", palette = "aaas")
nejm <- broadSeq::genes_plot(se[,se$species == "Mouse"],
features = c("Shh"), facet.by = "symbol", assayName = "logCPM",
x = "stage", fill="stage", add="dotplot", xlab = "",
title = "New England Journal of Medicine",palette = "nejm")
# ggarrange(jco,npg,aaas,nejm,
# common.legend = TRUE,legend = "none",
# labels = c("A","B","C","D"))
ggarrange(jco+ggpubr::rotate_x_text(), npg+ggpubr::rotate_x_text(),
aaas+ggpubr::rotate_x_text(),nejm+ggpubr::rotate_x_text(),
nrow = 1, common.legend = TRUE,legend = "none",
labels = c("A","B","C","D")) %>%
annotate_figure( top = text_grob("Color palette"))
Classical multidimensional scaling is based on measuring the distance between the samples.
Popular function plotMDS from limma does not work with
SummarizedExperiment object. Here broadSeq provides this function
through package cmdscale {stats}
.
broadSeq::plot_MDS(se, scaledAssay = "vst", ntop=500,
color = "species", shape = "stage",
ellipse=TRUE, legend = "bottom")
#> Only using 500 most variable genes.
MDS is cool but it is not possible to know/visualize top variable genes
with their meta data which is stored in se
object
head(rowData(se))
#> DataFrame with 6 rows and 8 columns
#> mouse_gene_id symbol Class chromosome_name
#> <character> <character> <factor> <character>
#> Axin2 ENSMUSG00000000142 Axin2 Dispensable 11
#> Mmp14 ENSMUSG00000000957 Mmp14 Tissue 14
#> Timp1 ENSMUSG00000001131 Timp1 Dispensable X
#> Pax9 ENSMUSG00000001497 Pax9 Progression 12
#> Irx2 ENSMUSG00000001504 Irx2 Dispensable 13
#> Col1a1 ENSMUSG00000001506 Col1a1 Dispensable 11
#> start_position gene_biotype vole_gene_id Rnor_gene_id
#> <integer> <character> <character> <character>
#> Axin2 108811175 protein_coding Mglareolus_00009775 ENSRNOG00000055010
#> Mmp14 54669069 protein_coding Mglareolus_00038784 ENSRNOG00000010947
#> Timp1 20736405 protein_coding Mglareolus_00039838 ENSRNOG00000010208
#> Pax9 56738552 protein_coding Mglareolus_00000242 ENSRNOG00000008826
#> Irx2 72776939 protein_coding Mglareolus_00017335 ENSRNOG00000012742
#> Col1a1 94827050 protein_coding Mglareolus_00042400 ENSRNOG00000003897
Other methods can help to visualize gene information along with clustering information.
p_vst <- broadSeq::plotHeatmapCluster(
se,
scaledAssay = "vst",
annotation_col = c("species", "stage"),
annotation_row = c("Class","gene_biotype"),
ntop = 30, show_geneAs = "symbol",
cluster_cols = TRUE, cluster_rows = FALSE,
show_rownames = TRUE, show_colnames = FALSE,
main = "Top 30 variable gene vst"
)
#> Only using 30 most variable genes.
Perform Principal Components Analysis with function
broadSeq::prcompTidy()
which returns a list of four data.frame
objects:
pc_scores,
eigen_values,
loadings (eigen vectors) and
the original data.
Compute PCA using any assay
computedPCA_logCPM <- broadSeq::prcompTidy(se, scaledAssay = "logCPM", ntop = 500)
#> Only using 500 most variable genes.
## PCA based on vst values
computedPCA_vst <- broadSeq::prcompTidy(se, scaledAssay = "vst", ntop = 500)
#> Only using 500 most variable genes.
plotAnyPC(computedPCA = computedPCA_logCPM,
x = 1, y = 2, color = "species", shape = "stage",
legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
Here PC 2 and 3 are used which can cluster the samples by both species and developmental factors.
pca_vst <- plotAnyPC(computedPCA = computedPCA_vst,
x = 2, y = 3, color = "species", shape = "stage",
legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
computedPCA_vst$eigen_values %>%
dplyr::filter(var_exp >= 2) %>%
ggbarplot(x="PC",y="var_exp", label = TRUE, label.pos = "out")
It can be checked if there are other PCs to explain considerable variance. PC3 can be useful to see variance due to different developmental time points.
pca_vst_2_3 <-plotAnyPC(computedPCA = computedPCA_vst,
x = 2, y = 3,
color = "species", shape = "stage", legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
# pca_vst_2_3
PC3 captures beautifully the variance in gene expression due to developmental stages.
computedPCA_vst %>% broadSeq::getFeatureLoadRanking(keep = c("symbol","Class")) %>% head()
#> symbol Class loading PC Rank
#> 1 Rpl4 Other -0.10384659 PC1 1
#> 2 Hsp90b1 Other -0.09727451 PC1 2
#> 3 Atrx Dev. process -0.09419071 PC1 3
#> 4 Rpl19 Other -0.09347617 PC1 4
#> 5 Ncl Other -0.09181188 PC1 5
#> 6 Rpl3 Other -0.08743885 PC1 6
# Top 5 genes of PC2
computedPCA_vst$loadings %>% top_n(5,abs(PC2) ) %>% dplyr::select(gene,PC2)
#> gene PC2
#> Aadacl2fm3 Aadacl2fm3 0.1635445
#> Rack1 Rack1 -0.1452180
#> Rpl11 Rpl11 -0.1474743
#> Rpl8 Rpl8 -0.1511175
#> Rplp0 Rplp0 -0.1586286
pca_vst_loading <- computedPCA_vst %>%
broadSeq::getFeatureLoadRanking(keep = c("symbol","Class"), topN = 50, pcs=1:10) %>%
dplyr::count(Class, PC) %>%
ggbarplot(
x = "PC", y = "n", fill = "Class",
legend = "bottom", palette = c("red","blue","orange","purple","white","grey")
)
# pca_vst_loading
# By default it plots top 2 genes from each PC axis
pca_vst_bi <- broadSeq::biplotAnyPC(computedPCA = computedPCA_vst,
x = 1, y = 2, genesLabel = "symbol",
color = "species", shape = "stage",
legend = "bottom")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
# pca_vst_bi
ggarrange(
ggarrange(pca_vst_bi+ggtitle(label = ""),
pca_vst_2_3+ggtitle(label = ""), common.legend = TRUE),
pca_vst_loading, nrow = 2)
Now plotting top 5 genes from PC3
# Top 5 genes of PC3
biplotAnyPC(computedPCA = computedPCA_vst,x = 2, y = 3,
color = "species", shape = "stage",
genes= computedPCA_vst$loadings %>%
top_n(5,abs(PC3)) %>% pull(gene),
genesLabel = "symbol")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
## Plot progression gene "Shh"
biplotAnyPC(computedPCA = computedPCA_vst,x = 2, y = 3,
color = "species", shape = "stage",
genes=c("Shh"),
genesLabel = "symbol")
#> Warning in seq_len(computedPCA$pc_scores[, dottedArg$shape]): first element
#> used of 'length.out' argument
In this example we will use RNA-seq expression data from developing mouse molar tissue. First we read the data as a SummarizedExperiment
object.
se <- readRDS(system.file("extdata","rat_vole_mouseSE_salmon.rds", package = "broadSeq"))
# To reduce the run time, subset of the data used here
se <- se[,colData(se)$species == "Mouse"]
head(rownames(se))
#> [1] "Axin2" "Mmp14" "Timp1" "Pax9" "Irx2" "Col1a1"
head(rowData(se))
#> DataFrame with 6 rows and 8 columns
#> mouse_gene_id symbol Class chromosome_name
#> <character> <character> <factor> <character>
#> Axin2 ENSMUSG00000000142 Axin2 Dispensable 11
#> Mmp14 ENSMUSG00000000957 Mmp14 Tissue 14
#> Timp1 ENSMUSG00000001131 Timp1 Dispensable X
#> Pax9 ENSMUSG00000001497 Pax9 Progression 12
#> Irx2 ENSMUSG00000001504 Irx2 Dispensable 13
#> Col1a1 ENSMUSG00000001506 Col1a1 Dispensable 11
#> start_position gene_biotype vole_gene_id Rnor_gene_id
#> <integer> <character> <character> <character>
#> Axin2 108811175 protein_coding Mglareolus_00009775 ENSRNOG00000055010
#> Mmp14 54669069 protein_coding Mglareolus_00038784 ENSRNOG00000010947
#> Timp1 20736405 protein_coding Mglareolus_00039838 ENSRNOG00000010208
#> Pax9 56738552 protein_coding Mglareolus_00000242 ENSRNOG00000008826
#> Irx2 72776939 protein_coding Mglareolus_00017335 ENSRNOG00000012742
#> Col1a1 94827050 protein_coding Mglareolus_00042400 ENSRNOG00000003897
The sample metadata
head(colData(se))
#> DataFrame with 6 rows and 4 columns
#> day info stage species
#> <factor> <character> <factor> <factor>
#> ME16-E3M1L E16 ME16-E3M1L Bell Mouse
#> ME16-E5M1L E16 ME16-E5M1L Bell Mouse
#> ME16-E6M1R E16 ME16-E6M1R Bell Mouse
#> ME16-E6M1L E16 ME16-E6M1L Bell Mouse
#> ME16-E7M1R E16 ME16-E7M1R Bell Mouse
#> ME16-E8M1R E16 ME16-E8M1R Bell Mouse
table(colData(se)$stage)
#>
#> Bud Cap Late Cap Bell
#> 7 7 7 7
There are four developmental time points; Bud, Cap, Late Cap and Bell (from embryonic days 13 to 16) with seven replicates each. Here, in this example, DE genes between Bud and Cap stages will be identified.
In broadSeq, the names of DE method has similar pattern like “use_”+“method name”. All these method has same signature of input arguments.
Here we will see use_NOIseq()
method to apply NOISeq::noiseqbio()
function.
result_Noiseq <-
use_NOIseq(se = se,
colData_id = "stage", control = "Bud", treatment = "Cap",
rank = TRUE,
r = 10) # r is an argument of NOISeq::noiseqbio
#> Computing Z values...
#> Filtering out low count features...
#> 4654 features are to be kept for differential expression analysis with filtering method 1
#> [1] "r = 1"
#> [1] "r = 2"
#> [1] "r = 3"
#> [1] "r = 4"
#> [1] "r = 5"
#> [1] "r = 6"
#> [1] "r = 7"
#> [1] "r = 8"
#> [1] "r = 9"
#> [1] "r = 10"
#> Computing probability of differential expression...
#> p0 = 0.3809764779521
#> Probability
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0.0000 0.3169 0.7234 0.6102 0.9102 1.0000 1124
head(result_Noiseq)
#> Bud_mean Cap_mean theta prob log2FC rank
#> Dlx2 888.94171 375.731990 4.449016 1 1.242385 1
#> Irf6 511.38324 215.511591 4.491907 1 1.246639 2
#> Ptch2 358.98412 151.233857 3.525978 1 1.247139 3
#> Spp1 43.80853 1.627490 3.780321 1 4.750491 4
#> Fgf3 92.59420 1.270972 4.002669 1 6.186918 5
#> Sostdc1 993.75851 448.122284 4.155289 1 1.149003 6
pg <- broadSeq::genes_plot(se, x = "stage", assayName = "counts",
features = result_Noiseq %>% dplyr::filter(rank <5) %>% rownames(),
fill="stage", facet.by = "symbol",
palette="jco", add = "dotplot")+rotate_x_text()
pg_sc <- ggscatter(result_Noiseq, x="Bud_mean", y="Cap_mean",color = "prob")+
scale_x_log10()+scale_y_log10()
pg+pg_sc
Following implementations of popular methods are available here.
# limma
?use_limma_trend(se, colData_id, control, treatment, rank = FALSE, ...)
?use_limma_voom(se, colData_id, control, treatment, rank = FALSE, ...)
# edgeR
?use_edgeR_exact(se, colData_id, control, treatment, rank = FALSE, ...)
?use_edgeR_GLM(se, colData_id, control, treatment, rank = FALSE, ...)
# deseq2
?use_deseq2(se, colData_id, control, treatment, rank = FALSE, ...)
# DELocal
?use_DELocal(se, colData_id, control, treatment, rank = FALSE, ...)
# noiseq
?use_NOIseq(se, colData_id, control, treatment, rank = FALSE, ...)
# EBSeq
?use_EBSeq(se, colData_id, control, treatment, rank = FALSE, ...)
# samseq
?use_SAMseq(se, colData_id, control, treatment, rank = FALSE, ...)
Advanced users can pass package specific arguments through ‘…’ .
It is possible to execute all DE methods together and get aggregated results in a data.frame.
broadSeq::use_multDE()
should be used for it.
# First define a named list of functions
funs <- list(limma_trend = use_limma_trend, limma_voom = use_limma_voom,
edgeR_exact = use_edgeR_exact, edgeR_glm = use_edgeR_GLM,
deseq2 = use_deseq2,
DELocal = use_DELocal, noiseq = use_NOIseq,
EBSeq = use_EBSeq)
multi_result <- broadSeq::use_multDE(
se = se,
deFun_list = funs, return.df = TRUE,
colData_id = "stage", control = "Bud", treatment = "Cap",
rank = TRUE)
#> Now executing >> limma_trend
#> ####
#> Now executing >> limma_voom
#> ####
#> Now executing >> edgeR_exact
#> ####
#> Now executing >> edgeR_glm
#> ####
#> Now executing >> deseq2
#> ####
#> factor levels were dropped which had no samples
#> estimating size factors
#> using 'avgTxLength' from assays(dds), correcting for library size
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 22 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> Now executing >> DELocal
#> ####
#> Default 1Mb neighborhood will be used
#> factor levels were dropped which had no samples
#> using 'avgTxLength' from assays(dds), correcting for library size
#> Now executing >> noiseq
#> ####
#> Now executing >> EBSeq
#> ####
The column names of the resultant data.frame are prefixed with corresponding function names.
head(multi_result)
#> limma_trend_logFC limma_trend_AveExpr limma_trend_t
#> 1110008P14Rik 0.54405160 4.8117022 3.6578280
#> 1110032A03Rik -0.13874330 5.3785105 -1.1189797
#> 1110032F04Rik 1.49491410 3.0666526 7.0648546
#> 1110051M20Rik -0.08607705 6.8048345 -1.8949926
#> 1110059G10Rik 0.11686705 5.4687249 0.9131006
#> 1300017J02Rik -1.75373449 0.3862759 -3.1448352
#> limma_trend_P.Value limma_trend_adj.P.Val limma_trend_B
#> 1110008P14Rik 2.797602e-03 0.0130149310 -2.410387
#> 1110032A03Rik 2.829681e-01 0.4526549814 -6.717686
#> 1110032F04Rik 7.519122e-06 0.0001281578 3.676603
#> 1110051M20Rik 8.005768e-02 0.1777077549 -5.657093
#> 1110059G10Rik 3.774584e-01 0.5365202649 -6.925361
#> 1300017J02Rik 7.568554e-03 0.0283599915 -3.404915
#> limma_trend_rank limma_voom_logFC limma_voom_AveExpr limma_voom_t
#> 1110008P14Rik 1242 0.55012790 4.7963921 3.4912966
#> 1110032A03Rik 3612 -0.13598068 5.3683453 -1.0766122
#> 1110032F04Rik 339 1.52521100 3.0082492 5.5457607
#> 1110051M20Rik 2603 -0.08377041 6.8010984 -1.5379391
#> 1110059G10Rik 4065 0.12743802 5.4591759 0.9788081
#> 1300017J02Rik 1542 -2.79755481 -0.4115494 -3.3164034
#> limma_voom_P.Value limma_voom_adj.P.Val limma_voom_B
#> 1110008P14Rik 3.185938e-03 0.0140952159 -2.450562
#> 1110032A03Rik 2.982906e-01 0.4602197395 -6.781351
#> 1110032F04Rik 5.160299e-05 0.0005491014 2.094132
#> 1110051M20Rik 1.444281e-01 0.2758695513 -6.580324
#> 1110059G10Rik 3.428565e-01 0.5101789153 -6.904758
#> 1300017J02Rik 4.577596e-03 0.0189053550 -1.996826
#> limma_voom_rank edgeR_exact_logFC edgeR_exact_logCPM
#> 1110008P14Rik 1306 0.56742197 4.8655704
#> 1110032A03Rik 3745 -0.11220434 5.3968467
#> 1110032F04Rik 543 1.62986496 3.3283271
#> 1110051M20Rik 3025 -0.08359159 6.8073364
#> 1110059G10Rik 3883 0.13179863 5.4882085
#> 1300017J02Rik 1399 -2.12756839 0.8642446
#> edgeR_exact_PValue edgeR_exact_FDR edgeR_exact_rank
#> 1110008P14Rik 1.000530e-03 5.231732e-03 1105
#> 1110032A03Rik 4.437472e-01 6.609702e-01 3878
#> 1110032F04Rik 2.602324e-09 4.409451e-08 341
#> 1110051M20Rik 3.282945e-01 5.379710e-01 3526
#> 1110059G10Rik 3.735760e-01 5.881532e-01 3670
#> 1300017J02Rik 8.116018e-03 3.044647e-02 1540
#> edgeR_glm_logFC edgeR_glm_logCPM edgeR_glm_LR edgeR_glm_PValue
#> 1110008P14Rik 0.56740347 4.8655850 11.7410747 6.113571e-04
#> 1110032A03Rik -0.11219402 5.3968485 0.6026420 4.375717e-01
#> 1110032F04Rik 1.62985892 3.3283632 35.7072150 2.293132e-09
#> 1110051M20Rik -0.08358111 6.8073360 0.9752985 3.233623e-01
#> 1110059G10Rik 0.13181500 5.4882097 0.8224682 3.644595e-01
#> 1300017J02Rik -2.12795630 0.8641698 6.5759873 1.033637e-02
#> edgeR_glm_FDR edgeR_glm_rank deseq2_baseMean
#> 1110008P14Rik 3.429535e-03 1030 147.18768
#> 1110032A03Rik 6.195270e-01 4081 218.60073
#> 1110032F04Rik 3.840497e-08 345 49.16736
#> 1110051M20Rik 5.007092e-01 3731 595.51100
#> 1110059G10Rik 5.431548e-01 3877 232.78793
#> 1300017J02Rik 3.670776e-02 1627 12.81403
#> deseq2_log2FoldChange deseq2_lfcSE deseq2_stat deseq2_pvalue
#> 1110008P14Rik 0.50451366 0.1657664 3.0435219 2.338264e-03
#> 1110032A03Rik 0.19082266 0.1940145 0.9835483 3.253377e-01
#> 1110032F04Rik 1.58701594 0.2811161 5.6454104 1.647877e-08
#> 1110051M20Rik 0.43358853 0.1816105 2.3874642 1.696506e-02
#> 1110059G10Rik 0.08091388 0.1977231 0.4092283 6.823722e-01
#> 1300017J02Rik -2.81096557 1.0793250 -2.6043737 9.204233e-03
#> deseq2_padj deseq2_rank DELocal_relative.logFC DELocal_P.Value
#> 1110008P14Rik 1.108404e-02 1055 17.83827 0.2729166121
#> 1110032A03Rik 4.972536e-01 3272 42.92129 0.1221318350
#> 1110032F04Rik 3.225187e-07 255 48.05240 0.0007963788
#> 1110051M20Rik 5.449085e-02 1557 88.53803 0.1739847417
#> 1110059G10Rik 8.038971e-01 4245 11.50263 0.7104000042
#> 1300017J02Rik 3.336348e-02 1380 -17.16339 0.0080219554
#> DELocal_adj.P.Val DELocal_B DELocal_rank noiseq_Bud_mean
#> 1110008P14Rik 0.62788658 -4.658483 2740 33.8722312
#> 1110032A03Rik 0.41032892 -4.279768 1972 39.6798287
#> 1110032F04Rik 0.01675251 -1.976905 1891 14.3689100
#> 1110051M20Rik 0.49469006 -4.450382 1367 107.2818570
#> 1110059G10Rik 0.95442262 -5.011704 3056 45.9373517
#> 1300017J02Rik 0.07509916 -2.959888 2777 0.6001355
#> noiseq_Cap_mean noiseq_theta noiseq_prob noiseq_log2FC
#> 1110008P14Rik 23.331126 0.6555759 0.8852864 0.53784711
#> 1110032A03Rik 43.774221 -0.1974162 0.4118575 -0.14167571
#> 1110032F04Rik 4.693828 1.3985890 0.9774037 1.61411366
#> 1110051M20Rik 115.980525 -0.2726831 0.5827917 -0.11247646
#> 1110059G10Rik 42.887052 0.1415338 0.1180432 0.09912552
#> 1300017J02Rik 2.506642 -0.9268528 0.9510382 -2.06239542
#> noiseq_rank EBSeq_PPEE EBSeq_PPDE EBSeq_Status EBSeq_Direction
#> 1110008P14Rik 1872 0.0315029178 0.96849708 DE Bud Over Cap
#> 1110032A03Rik 3472 0.9747575436 0.02524246 EE Bud Over Cap
#> 1110032F04Rik 505 0.0001226596 0.99987734 DE Bud Over Cap
#> 1110051M20Rik 3056 0.9830170118 0.01698299 EE Bud Over Cap
#> 1110059G10Rik 4102 0.9690535947 0.03094641 EE Bud Over Cap
#> 1300017J02Rik 1040 0.0220131318 0.97798687 DE Bud Over Cap
#> EBSeq_rank mouse_gene_id symbol Class chromosome_name
#> 1110008P14Rik 1219 ENSMUSG00000039195 1110008P14Rik Other 2
#> 1110032A03Rik 4261 ENSMUSG00000037971 1110032A03Rik Other 9
#> 1110032F04Rik 744 ENSMUSG00000046999 1110032F04Rik Other 3
#> 1110051M20Rik 4582 ENSMUSG00000040591 1110051M20Rik Other 2
#> 1110059G10Rik 4093 ENSMUSG00000032551 1110059G10Rik Other 9
#> 1300017J02Rik 1186 ENSMUSG00000033688 1300017J02Rik Other 9
#> start_position gene_biotype vole_gene_id
#> 1110008P14Rik 32267109 protein_coding Mglareolus_00032142
#> 1110032A03Rik 50674128 protein_coding Mglareolus_00020124
#> 1110032F04Rik 68776919 protein_coding Mglareolus_00010961
#> 1110051M20Rik 91105413 protein_coding Mglareolus_00006101
#> 1110059G10Rik 122774154 protein_coding Mglareolus_00025510
#> 1300017J02Rik 103127720 protein_coding Mglareolus_00023537
#> Rnor_gene_id
#> 1110008P14Rik ENSRNOG00000022681
#> 1110032A03Rik ENSRNOG00000030962
#> 1110032F04Rik ENSRNOG00000009803
#> 1110051M20Rik ENSRNOG00000014798
#> 1110059G10Rik ENSRNOG00000004135
#> 1300017J02Rik ENSRNOG00000009434
# nrow(multi_result) == nrow(se)
colnames(multi_result)
#> [1] "limma_trend_logFC" "limma_trend_AveExpr" "limma_trend_t"
#> [4] "limma_trend_P.Value" "limma_trend_adj.P.Val" "limma_trend_B"
#> [7] "limma_trend_rank" "limma_voom_logFC" "limma_voom_AveExpr"
#> [10] "limma_voom_t" "limma_voom_P.Value" "limma_voom_adj.P.Val"
#> [13] "limma_voom_B" "limma_voom_rank" "edgeR_exact_logFC"
#> [16] "edgeR_exact_logCPM" "edgeR_exact_PValue" "edgeR_exact_FDR"
#> [19] "edgeR_exact_rank" "edgeR_glm_logFC" "edgeR_glm_logCPM"
#> [22] "edgeR_glm_LR" "edgeR_glm_PValue" "edgeR_glm_FDR"
#> [25] "edgeR_glm_rank" "deseq2_baseMean" "deseq2_log2FoldChange"
#> [28] "deseq2_lfcSE" "deseq2_stat" "deseq2_pvalue"
#> [31] "deseq2_padj" "deseq2_rank" "DELocal_relative.logFC"
#> [34] "DELocal_P.Value" "DELocal_adj.P.Val" "DELocal_B"
#> [37] "DELocal_rank" "noiseq_Bud_mean" "noiseq_Cap_mean"
#> [40] "noiseq_theta" "noiseq_prob" "noiseq_log2FC"
#> [43] "noiseq_rank" "EBSeq_PPEE" "EBSeq_PPDE"
#> [46] "EBSeq_Status" "EBSeq_Direction" "EBSeq_rank"
#> [49] "mouse_gene_id" "symbol" "Class"
#> [52] "chromosome_name" "start_position" "gene_biotype"
#> [55] "vole_gene_id" "Rnor_gene_id"
DE methods are clustered based on their ranking of genes
clusters <- multi_result %>% dplyr::select(ends_with("rank")) %>% t() %>% dist() %>% hclust()
plot(clusters,main = "distance: Euclidean")
Up regulated genes should be red or hot colors
multi_result %>% broadSeq::volcanoPlot(
pValName = "deseq2_padj",
lFCName = "deseq2_log2FoldChange",
labelName = "symbol",
palette = "lancet" ,
selectedLabel =
multi_result %>% dplyr::arrange(deseq2_padj) %>% pull(symbol) %>% head()
)
multi_result %>% broadSeq::volcanoPlot(
pValName = "deseq2_padj",
lFCName = "deseq2_log2FoldChange",
labelName = "symbol",
palette = c("purple","orange","grey"),
selectedLabel = list(criteria = "(`x` > 5 | `x` < -2) & (`y` > 10)")
) +xlim(-7.5,7.5)
Session Info
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] broadSeq_1.0.0 SummarizedExperiment_1.36.0
#> [3] Biobase_2.66.0 GenomicRanges_1.58.0
#> [5] GenomeInfoDb_1.42.0 IRanges_2.40.0
#> [7] S4Vectors_0.44.0 BiocGenerics_0.52.0
#> [9] MatrixGenerics_1.18.0 matrixStats_1.4.1
#> [11] ggpubr_0.6.0 ggplot2_3.5.1
#> [13] dplyr_1.1.4 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.1 bitops_1.0-9 ggplotify_0.1.2
#> [4] tibble_3.2.1 R.oo_1.26.0 XML_3.99-0.17
#> [7] lifecycle_1.0.4 rstatix_0.7.2 rprojroot_2.0.4
#> [10] edgeR_4.4.0 doParallel_1.0.17 lattice_0.22-6
#> [13] NOISeq_2.50.0 backports_1.5.0 magrittr_2.0.3
#> [16] limma_3.62.0 sass_0.4.9 rmarkdown_2.28
#> [19] jquerylib_0.1.4 yaml_2.3.10 ggtangle_0.0.3
#> [22] cowplot_1.1.3 DBI_1.2.3 RColorBrewer_1.1-3
#> [25] pkgload_1.4.0 abind_1.4-8 zlibbioc_1.52.0
#> [28] Rtsne_0.17 purrr_1.0.2 R.utils_2.12.3
#> [31] yulab.utils_0.1.7 circlize_0.4.16 seriation_1.5.6
#> [34] GenomeInfoDbData_1.2.13 enrichplot_1.26.0 ggrepel_0.9.6
#> [37] tidytree_0.4.6 testthat_3.2.1.1 genefilter_1.88.0
#> [40] pheatmap_1.0.12 annotate_1.84.0 codetools_0.2-20
#> [43] DelayedArray_0.32.0 DOSE_4.0.0 tidyselect_1.2.1
#> [46] shape_1.4.6.1 aplot_0.2.3 UCSC.utils_1.2.0
#> [49] farver_2.1.2 TSP_1.2-4 jsonlite_1.8.9
#> [52] GetoptLong_1.0.5 Formula_1.2-5 randomcoloR_1.1.0.1
#> [55] survival_3.7-0 iterators_1.0.14 foreach_1.5.2
#> [58] tools_4.4.1 treeio_1.30.0 sechm_1.14.0
#> [61] Rcpp_1.0.13 glue_1.8.0 gridExtra_2.3
#> [64] SparseArray_1.6.0 xfun_0.48 DESeq2_1.46.0
#> [67] qvalue_2.38.0 ca_0.71.1 withr_3.0.2
#> [70] BiocManager_1.30.25 fastmap_1.2.0 fansi_1.0.6
#> [73] caTools_1.18.3 digest_0.6.37 DELocal_1.6.0
#> [76] R6_2.5.1 gridGraphics_0.5-1 colorspace_2.1-1
#> [79] GO.db_3.20.0 gtools_3.9.5 RSQLite_2.3.7
#> [82] R.methodsS3_1.8.2 ggsci_3.2.0 utf8_1.2.4
#> [85] tidyr_1.3.1 generics_0.1.3 data.table_1.16.2
#> [88] httr_1.4.7 S4Arrays_1.6.0 pkgconfig_2.0.3
#> [91] gtable_0.3.6 blob_1.2.4 registry_0.5-1
#> [94] ComplexHeatmap_2.22.0 XVector_0.46.0 brio_1.1.5
#> [97] clusterProfiler_4.14.0 htmltools_0.5.8.1 carData_3.0-5
#> [100] bookdown_0.41 fgsea_1.32.0 clue_0.3-65
#> [103] blockmodeling_1.1.5 scales_1.3.0 png_0.1-8
#> [106] EBSeq_2.4.0 ggfun_0.1.7 knitr_1.48
#> [109] reshape2_1.4.4 rjson_0.2.23 nlme_3.1-166
#> [112] curl_5.2.3 cachem_1.1.0 GlobalOptions_0.1.2
#> [115] stringr_1.5.1 KernSmooth_2.23-24 parallel_4.4.1
#> [118] AnnotationDbi_1.68.0 desc_1.4.3 pillar_1.9.0
#> [121] grid_4.4.1 vctrs_0.6.5 gplots_3.2.0
#> [124] car_3.1-3 xtable_1.8-4 cluster_2.1.6
#> [127] evaluate_1.0.1 magick_2.8.5 tinytex_0.53
#> [130] cli_3.6.3 locfit_1.5-9.10 compiler_4.4.1
#> [133] rlang_1.1.4 crayon_1.5.3 ggsignif_0.6.4
#> [136] labeling_0.4.3 forcats_1.0.0 plyr_1.8.9
#> [139] fs_1.6.4 stringi_1.8.4 BiocParallel_1.40.0
#> [142] munsell_0.5.1 Biostrings_2.74.0 lazyeval_0.2.2
#> [145] V8_6.0.0 GOSemSim_2.32.0 Matrix_1.7-1
#> [148] RcppEigen_0.3.4.0.2 patchwork_1.3.0 bit64_4.5.2
#> [151] KEGGREST_1.46.0 statmod_1.5.0 highr_0.11
#> [154] igraph_2.1.1 broom_1.0.7 memoise_2.0.1
#> [157] bslib_0.8.0 ggtree_3.14.0 fastmatch_1.1-4
#> [160] bit_4.5.0 ape_5.8 gson_0.1.0