Project: Example bumphunter.
This report is meant to help explore a set of genomic regions and was generated using the regionReport (Collado-Torres, Jaffe, and Leek, 2015) package. While the report is rich, it is only meant to start the exploration of the results and to exemplify some of the code used to do so. If you need a more in-depth analysis for your specific data set, you might want to use the customCode argument.
Most plots were made using ggplot2 (Wickham, 2009).
## knitrBootstrap and device chunk options
load_install('knitr')
opts_chunk$set(bootstrap.show.code = FALSE, dev = device)
if(!outputIsHTML) opts_chunk$set(bootstrap.show.code = FALSE, dev = device, echo = FALSE)
#### Libraries needed
## Bioconductor
load_install('bumphunter')
load_install('derfinder')
load_install('derfinderPlot')
load_install('GenomeInfoDb')
load_install('GenomicRanges')
load_install('ggbio')
## Transcription database to use by default
if(is.null(txdb)) {
load_install('TxDb.Hsapiens.UCSC.hg19.knownGene')
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
}
## CRAN
load_install('ggplot2')
if(!is.null(theme)) theme_set(theme)
load_install('grid')
load_install('gridExtra')
load_install('knitr')
load_install('RColorBrewer')
load_install('mgcv')
load_install('whisker')
load_install('DT')
load_install('devtools')
## Working behind the scenes
# load_install('knitcitations')
# load_install('rmarkdown')
## Optionally
# load_install('knitrBootstrap')
#### Code setup
## For ggplot
tmp <- regions
names(tmp) <- seq_len(length(tmp))
regions.df <- as.data.frame(tmp)
regions.df$width <- width(tmp)
rm(tmp)
## Special subsets: need at least 3 points for a density plot
keepChr <- table(regions.df$seqnames) > 2
regions.df.plot <- subset(regions.df, seqnames %in% names(keepChr[keepChr]))
if(hasSignificant) {
## Keep only those sig
regions.df.sig <- regions.df[significantVar, ]
keepChr <- table(regions.df.sig$seqnames) > 2
regions.df.sig <- subset(regions.df.sig, seqnames %in% names(keepChr[keepChr]))
}
## Find which chrs are present in the data set
chrs <- levels(seqnames(regions))
## areaVar initialize
areaVar <- NULL
p2a <- ggplot(regions.df.plot, aes(x=log10(width), colour=seqnames)) +
geom_line(stat='density') + labs(title='Density of region lengths') +
xlab('Region width (log10)') + scale_colour_discrete(limits=chrs) +
theme(legend.title=element_blank())
p2a
This plot shows the density of the region lengths for all regions.
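The chromosome filter and the per-chromosome density behind this plot can be sketched in base R. This is a minimal illustration on a made-up data frame (`toy` and its values are hypothetical, not the report's `regions.df`), and `stats::density()` is used as a stand-in for what `geom_line(stat='density')` estimates.

```r
## Hypothetical stand-in for regions.df: chromosome and region width only
toy <- data.frame(
    seqnames = factor(c(rep('chr1', 5), rep('chr2', 2), rep('chr3', 4))),
    width = c(120, 340, 95, 1500, 60, 800, 45, 210, 77, 5000, 33)
)

## Keep only chromosomes with at least 3 regions, as in the report code
keepChr <- table(toy$seqnames) > 2
toy.plot <- subset(toy, seqnames %in% names(keepChr[keepChr]))

## One kernel density estimate of log10(width) per remaining chromosome
dens <- lapply(
    split(log10(toy.plot$width), droplevels(toy.plot$seqnames)),
    density
)
names(dens)
```

Here chr2 has only two regions, so it is dropped before any density is estimated.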
for(i in seq_len(length(densityVars))) {
densityVarName <- names(densityVars[i])
densityVarName <- ifelse(is.null(densityVarName), densityVars[i], densityVarName)
cat(knit_child(text = whisker.render(templateDensityInUse, list(varName = densityVars[i], densityVarName = densityVarName)), quiet = TRUE), sep = '\n')
}
p3aarea <- ggplot(regions.df.plot[is.finite(regions.df.plot[, 'area']), ], aes(x=area, colour=seqnames)) +
geom_line(stat='density') + labs(title='Density of Area') +
xlab('Area') + scale_colour_discrete(limits=chrs) +
theme(legend.title=element_blank())
p3aarea
This plot shows the density of the Area for all regions.
p3avalue <- ggplot(regions.df.plot[is.finite(regions.df.plot[, 'value']), ], aes(x=value, colour=seqnames)) +
geom_line(stat='density') + labs(title='Density of Value') +
xlab('Value') + scale_colour_discrete(limits=chrs) +
theme(legend.title=element_blank())
p3avalue
This plot shows the density of the Value for all regions.
p3aclusterL <- ggplot(regions.df.plot[is.finite(regions.df.plot[, 'clusterL']), ], aes(x=clusterL, colour=seqnames)) +
geom_line(stat='density') + labs(title='Density of Cluster Length') +
xlab('Cluster Length') + scale_colour_discrete(limits=chrs) +
theme(legend.title=element_blank())
p3aclusterL
This plot shows the density of the Cluster Length for all regions.
The following plots were made using ggbio (Yin, Cook, and Lawrence, 2012) which in turn uses ggplot2 (Wickham, 2009). For more details check plotOverview in derfinderPlot (Collado-Torres, Jaffe, and Leek, 2015).
This plot shows the genomic locations of the regions found in the analysis. The significant regions are highlighted. The area variable is normally shown on top of each chromosome, but it was skipped here because there was no applicable variable.
## Annotate regions with bumphunter
if(is.null(annotation)) {
genes <- annotateTranscripts(txdb = txdb)
annotation <- matchGenes(x = regions, subject = genes)
}
## Warning: Calling species() on a TxDb object is *deprecated*.
## Please use organism() instead.
## Make the plot
plotOverview(regions=regions, annotation=annotation, type='annotation', base_size=overviewParams$base_size, areaRel=overviewParams$areaRel, legend.position=c(0.97, 0.12))
This genomic overview plot shows the annotation region type for the regions as determined using bumphunter (Jaffe, Murakami, Lee, Leek, et al., 2012). Note that the regions are shown only if the annotation information is available. Below is a table of the actual number of results per annotation region type.
annoReg <- table(annotation$region, useNA='always')
annoReg.df <- data.frame(Region=names(annoReg), Count=as.vector(annoReg))
if(outputIsHTML) {
kable(annoReg.df, format = 'markdown', align=rep('c', 3))
} else {
kable(annoReg.df)
}
Region | Count |
---|---|
upstream | 10 |
promoter | 0 |
overlaps 5’ | 0 |
inside | 0 |
overlaps 3’ | 0 |
close to 3’ | 0 |
downstream | 5 |
covers | 0 |
NA | 0 |
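The counting step behind this table can be tried on a small example. The sketch below uses a hypothetical `region` vector with only four of the region types, and it assumes the real `annotation$region` column is a factor, which would explain why types with no hits still appear with a count of 0.

```r
## Hypothetical annotation labels; in the report they come from matchGenes()
region <- factor(
    c('upstream', 'downstream', 'upstream', NA, 'inside'),
    levels = c('upstream', 'promoter', 'inside', 'downstream')
)

## useNA = 'always' adds an NA row even when no value is missing
annoReg <- table(region, useNA = 'always')
annoReg.df <- data.frame(Region = names(annoReg), Count = as.vector(annoReg))
annoReg.df
```

Unused factor levels such as `promoter` are kept with a count of 0, and the trailing NA row counts regions without annotation.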
This genomic overview plot shows the annotation region type for the statistically significant regions. Note that the regions are shown only if the annotation information is available. Plot skipped because there are no significant regions.
Below is a table summarizing the number of genomic states per region as determined using derfinder (Collado-Torres, Nellore, Frazee, Wilks, et al., 2016).
## Construct genomic state object
genomicState <- makeGenomicState(txdb = txdb, chrs = chrs, verbose = FALSE)
## 'select()' returned 1:1 mapping between keys and columns
## Annotate regions by genomic state
annotatedRegions <- annotateRegions(regions, genomicState$fullGenome, verbose = FALSE)
## Genomic states table
info <- do.call(rbind, lapply(annotatedRegions$countTable, function(x) { data.frame(table(x)) }))
colnames(info) <- c('Number of Overlapping States', 'Frequency')
info$State <- gsub('\\..*', '', rownames(info))
rownames(info) <- NULL
if(outputIsHTML) {
kable(info, format = 'markdown', align=rep('c', 4))
} else {
kable(info)
}
Number of Overlapping States | Frequency | State |
---|---|---|
0 | 15 | exon |
1 | 15 | intergenic |
0 | 15 | intron |
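To see how the table above is assembled, the same summarizing code can be run on a small stand-in for `annotatedRegions$countTable`. The `countTable` below is hypothetical (three regions, made-up overlap counts); only the reshaping logic mirrors the report.

```r
## Hypothetical count table: one row per region, one column per genomic state
countTable <- data.frame(
    exon = c(0, 1, 0),
    intergenic = c(1, 0, 1),
    intron = c(0, 0, 0)
)

## Tabulate each state separately, then stack the per-state tables
info <- do.call(rbind, lapply(countTable, function(x) { data.frame(table(x)) }))
colnames(info) <- c('Number of Overlapping States', 'Frequency')

## rbind() names rows like 'exon.1', 'exon.2' or plain 'intron';
## dropping everything after the first dot recovers the state name
info$State <- gsub('\\..*', '', rownames(info))
rownames(info) <- NULL
info
```

Each state contributes one row per distinct overlap count, so the frequencies within a state add up to the number of regions.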
The following is a Venn diagram showing how many regions overlap known exons, introns, and intergenic segments, none of them, or several of these groups.
## Venn diagram for all regions
venn <- vennRegions(annotatedRegions, counts.col = 'blue',
main = 'Regions overlapping genomic states')
Below is an interactive table with the top 15 regions (out of 15). The regions are not ranked because no p-value information was provided. Inf and -Inf are shown as 1e100 and -1e100 respectively. Use the search function to find your region of interest or sort by one of the columns.
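The replacement of infinite values mentioned above can be illustrated with a short vectorized sketch. The helper `capInf` and the `toy` data frame are hypothetical and are not part of regionReport. Unlike the report's own function, this variant only touches `Inf` and `-Inf` and leaves `NA` values alone.

```r
## Hypothetical helper: replace Inf / -Inf by +cap / -cap, keep everything else
capInf <- function(x, cap = 1e100) {
    inf.idx <- is.infinite(x)
    x[inf.idx] <- sign(x[inf.idx]) * cap
    x
}

toy <- data.frame(
    name = c('a', 'b', 'c'),
    value = c(1.5, Inf, -Inf),
    stringsAsFactors = FALSE
)

## Apply the helper to numeric columns only
numCols <- vapply(toy, is.numeric, logical(1))
toy[numCols] <- lapply(toy[numCols], capInf)
toy
```

Large finite placeholders keep the sign of the original value, so sorting the table by that column still puts those rows at the extremes.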
## Add annotation information
regions.df <- cbind(regions.df, annotation)
## Rank by p-value (first pvalue variable supplied)
if(hasPvalueVars){
topRegions <- head(regions.df[order(regions.df[, pvalueVars[1]],
decreasing = FALSE), ], nBestRegions)
topRegions <- cbind(data.frame('pvalueRank' = seq_len(nrow(topRegions))),
topRegions)
} else {
topRegions <- head(regions.df, nBestRegions)
}
## Clean up -Inf, Inf if present
## More details at https://github.com/ramnathv/rCharts/issues/259
replaceInf <- function(df, colsubset=seq_len(ncol(df))) {
for(i in colsubset) {
inf.idx <- !is.finite(df[, i])
if(any(inf.idx)) {
inf.sign <- sign(df[inf.idx, i])
df[inf.idx, i] <- inf.sign * 1e100
}
}
return(df)
}
topRegions <- replaceInf(topRegions, which(sapply(topRegions, function(x) {
class(x) %in% c('numeric', 'integer')})))
## Make the table
greptext <- 'value$|area$|mean|log2FoldChange'
greppval <- 'pvalues$|qvalues$|fwer$'
if(hasPvalueVars) {
greppval <- paste0(paste(pvalueVars, collapse = '$|'), '$|', greppval)
}
if(hasDensityVars) {
greptext <- paste0(paste(densityVars, collapse = '$|'), '$|', greptext)
}
for(i in which(grepl(greppval, colnames(topRegions)))) topRegions[, i] <- format(topRegions[, i], scientific = TRUE)
if(outputIsHTML) {
datatable(topRegions, options = list(pagingType='full_numbers', pageLength=10, scrollX='100%'), rownames = FALSE) %>% formatRound(which(grepl(greptext, colnames(topRegions))), digits)
} else {
## Only print the top part if your output is a PDF file
df_top <- head(topRegions, 20)
for(i in which(grepl(greptext, colnames(topRegions)))) df_top[, i] <- round(df_top[, i], digits)
kable(df_top)
}
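As a small illustration of the ranking and column selection used for the table, the sketch below orders a made-up data frame by p-value and shows which columns the two regular expressions pick up. The `toy` data and column values are hypothetical; the patterns are the defaults from the code above.

```r
## Hypothetical regions table with one p-value column
toy <- data.frame(
    area = c(3.14159, 2.71828, 1.41421),
    pvalues = c(0.03, 0.0004, 0.2),
    width = c(10L, 20L, 30L)
)

## Rank by p-value, smallest first, and prepend the rank
top <- toy[order(toy$pvalues, decreasing = FALSE), ]
top <- cbind(data.frame(pvalueRank = seq_len(nrow(top))), top)

## Default patterns: columns to round vs. columns shown in scientific notation
greptext <- 'value$|area$|mean|log2FoldChange'
greppval <- 'pvalues$|qvalues$|fwer$'
roundCols <- grepl(greptext, colnames(top))
pvalCols <- grepl(greppval, colnames(top))

colnames(top)[roundCols]
colnames(top)[pvalCols]
```

The `$` anchors matter here: `pvalueRank` and `pvalues` do not end in `value`, so neither is picked up for rounding.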
This report was generated in path /tmp/RtmpW7MgwM/Rbuild7d0c1207131/regionReport/vignettes using the following call to renderReport():
## renderReport(regions = regions, project = "Example bumphunter",
## pvalueVars = NULL, densityVars = c(Area = "area", Value = "value",
## `Cluster Length` = "clusterL"), significantVar = NULL,
## outdir = ".", output = "bumphunterExampleOutput", device = "png",
## template = "regionReportBumphunter.Rmd")
Date the report was generated.
## [1] "2016-05-15 21:56:24 PDT"
Wallclock time spent generating the report.
## Time difference of 1.913 mins
R session information.
## Session info -----------------------------------------------------------------------------------------------------------
## setting value
## version R version 3.3.0 (2016-05-03)
## system x86_64, linux-gnu
## ui X11
## language en_US:
## collate C
## tz <NA>
## date 2016-05-15
## Packages ---------------------------------------------------------------------------------------------------------------
## package * version date source
## AnnotationDbi * 1.34.2 2016-05-16 Bioconductor
## AnnotationHub 2.4.2 2016-05-16 Bioconductor
## BSgenome 1.40.0 2016-05-16 Bioconductor
## Biobase * 2.32.0 2016-05-16 Bioconductor
## BiocGenerics * 0.18.0 2016-05-16 Bioconductor
## BiocInstaller 1.22.2 2016-05-16 Bioconductor
## BiocParallel 1.6.2 2016-05-16 Bioconductor
## BiocStyle * 2.0.2 2016-05-16 Bioconductor
## Biostrings 2.40.0 2016-05-16 Bioconductor
## DBI 0.4-1 2016-05-08 CRAN (R 3.3.0)
## DEFormats 1.0.2 2016-05-16 Bioconductor
## DESeq2 1.12.2 2016-05-16 Bioconductor
## DT * 0.1 2015-06-09 CRAN (R 3.3.0)
## Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
## GGally 1.0.1 2016-01-14 CRAN (R 3.3.0)
## GenomeInfoDb * 1.8.2 2016-05-16 Bioconductor
## GenomicAlignments 1.8.0 2016-05-16 Bioconductor
## GenomicFeatures * 1.24.2 2016-05-16 Bioconductor
## GenomicFiles 1.8.0 2016-05-16 Bioconductor
## GenomicRanges * 1.24.0 2016-05-16 Bioconductor
## Hmisc 3.17-4 2016-05-02 CRAN (R 3.3.0)
## IRanges * 2.6.0 2016-05-16 Bioconductor
## Matrix 1.2-6 2016-05-02 CRAN (R 3.3.0)
## OrganismDbi 1.14.0 2016-05-16 Bioconductor
## R6 2.1.2 2016-01-26 CRAN (R 3.3.0)
## RBGL 1.48.0 2016-05-16 Bioconductor
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.3.0)
## RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
## RJSONIO 1.3-0 2014-07-28 CRAN (R 3.3.0)
## RSQLite 1.0.0 2014-10-25 CRAN (R 3.3.0)
## Rcpp 0.12.5 2016-05-14 CRAN (R 3.3.0)
## RefManageR 0.10.13 2016-04-04 CRAN (R 3.3.0)
## Rsamtools 1.24.0 2016-05-16 Bioconductor
## S4Vectors * 0.10.0 2016-05-16 Bioconductor
## SummarizedExperiment 1.2.2 2016-05-16 Bioconductor
## TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2 2016-05-04 Bioconductor
## VariantAnnotation 1.18.1 2016-05-16 Bioconductor
## XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
## XVector 0.12.0 2016-05-16 Bioconductor
## acepack 1.3-3.3 2014-11-24 CRAN (R 3.3.0)
## annotate 1.50.0 2016-05-16 Bioconductor
## backports 1.0.2 2016-03-18 CRAN (R 3.3.0)
## bibtex 0.4.0 2014-12-31 CRAN (R 3.3.0)
## biomaRt 2.28.0 2016-05-16 Bioconductor
## biovizBase 1.20.0 2016-05-16 Bioconductor
## bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
## bumphunter * 1.12.0 2016-05-16 Bioconductor
## checkmate 1.7.4 2016-04-08 CRAN (R 3.3.0)
## chron 2.3-47 2015-06-24 CRAN (R 3.3.0)
## cluster 2.0.4 2016-04-18 CRAN (R 3.3.0)
## codetools 0.2-14 2015-07-15 CRAN (R 3.3.0)
## colorspace 1.2-6 2015-03-11 CRAN (R 3.3.0)
## data.table 1.9.6 2015-09-19 CRAN (R 3.3.0)
## derfinder * 1.6.4 2016-05-16 Bioconductor
## derfinderHelper 1.6.3 2016-05-16 Bioconductor
## derfinderPlot * 1.6.3 2016-05-16 Bioconductor
## devtools * 1.11.1 2016-04-21 CRAN (R 3.3.0)
## dichromat 2.0-0 2013-01-24 CRAN (R 3.3.0)
## digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
## doRNG 1.6 2014-03-07 CRAN (R 3.3.0)
## edgeR 3.14.0 2016-05-16 Bioconductor
## ensembldb 1.4.2 2016-05-16 Bioconductor
## evaluate 0.9 2016-04-29 CRAN (R 3.3.0)
## foreach * 1.4.3 2015-10-13 CRAN (R 3.3.0)
## foreign 0.8-66 2015-08-19 CRAN (R 3.3.0)
## formatR 1.4 2016-05-09 CRAN (R 3.3.0)
## genefilter 1.54.2 2016-05-16 Bioconductor
## geneplotter 1.50.0 2016-05-16 Bioconductor
## ggbio * 1.20.1 2016-05-16 Bioconductor
## ggplot2 * 2.1.0 2016-03-01 CRAN (R 3.3.0)
## graph 1.50.0 2016-05-16 Bioconductor
## gridExtra * 2.2.1 2016-02-29 CRAN (R 3.3.0)
## gtable 0.2.0 2016-02-26 CRAN (R 3.3.0)
## highr 0.6 2016-05-09 CRAN (R 3.3.0)
## htmltools 0.3.5 2016-03-21 CRAN (R 3.3.0)
## htmlwidgets 0.6 2016-02-25 CRAN (R 3.3.0)
## httpuv 1.3.3 2015-08-04 CRAN (R 3.3.0)
## httr 1.1.0 2016-01-28 CRAN (R 3.3.0)
## interactiveDisplayBase 1.10.3 2016-05-16 Bioconductor
## iterators * 1.0.8 2015-10-13 CRAN (R 3.3.0)
## jsonlite 0.9.20 2016-05-10 CRAN (R 3.3.0)
## knitcitations 1.0.7 2015-10-28 CRAN (R 3.3.0)
## knitr * 1.13 2016-05-09 CRAN (R 3.3.0)
## knitrBootstrap 1.0.0 2015-12-16 CRAN (R 3.3.0)
## labeling 0.3 2014-08-23 CRAN (R 3.3.0)
## lattice 0.20-33 2015-07-14 CRAN (R 3.3.0)
## latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
## limma 3.28.4 2016-05-16 Bioconductor
## locfit * 1.5-9.1 2013-04-20 CRAN (R 3.3.0)
## lubridate 1.5.6 2016-04-06 CRAN (R 3.3.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
## markdown 0.7.7 2015-04-22 CRAN (R 3.3.0)
## matrixStats 0.50.2 2016-04-24 CRAN (R 3.3.0)
## memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
## mgcv * 1.8-12 2016-03-03 CRAN (R 3.3.0)
## mime 0.4 2015-09-03 CRAN (R 3.3.0)
## munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
## nlme * 3.1-128 2016-05-10 CRAN (R 3.3.0)
## nnet 7.3-12 2016-02-02 CRAN (R 3.3.0)
## org.Hs.eg.db * 3.3.0 2016-05-04 Bioconductor
## pkgmaker 0.22 2014-05-14 CRAN (R 3.3.0)
## plyr 1.8.3 2015-06-12 CRAN (R 3.3.0)
## qvalue 2.4.2 2016-05-16 Bioconductor
## regionReport * 1.6.5 2016-05-16 Bioconductor
## registry 0.3 2015-07-08 CRAN (R 3.3.0)
## reshape 0.8.5 2014-04-23 CRAN (R 3.3.0)
## reshape2 1.4.1 2014-12-06 CRAN (R 3.3.0)
## rmarkdown 0.9.6 2016-05-01 CRAN (R 3.3.0)
## rngtools 1.2.4 2014-03-06 CRAN (R 3.3.0)
## rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
## rtracklayer 1.32.0 2016-05-16 Bioconductor
## scales 0.4.0 2016-02-26 CRAN (R 3.3.0)
## shiny 0.13.2 2016-03-28 CRAN (R 3.3.0)
## stringi 1.0-1 2015-10-22 CRAN (R 3.3.0)
## stringr 1.0.0 2015-04-30 CRAN (R 3.3.0)
## survival 2.39-4 2016-05-11 CRAN (R 3.3.0)
## whisker * 0.3-2 2013-04-28 CRAN (R 3.3.0)
## withr 1.0.1.9000 2016-05-05 Github (jimhester/withr@bd42181)
## xtable 1.8-2 2016-02-05 CRAN (R 3.3.0)
## yaml 2.1.13 2014-06-12 CRAN (R 3.3.0)
## zlibbioc 1.18.0 2016-05-16 Bioconductor
Pandoc version used: 1.16.0.2.
This report was created with regionReport (Collado-Torres, Jaffe, and Leek, 2015) using rmarkdown (Allaire, Cheng, Xie, McPherson, et al., 2016) while knitr (Xie, 2014) and DT (Xie, 2015) were running behind the scenes. whisker (de Jonge, 2013) was used for creating templates for the pvalueVars and densityVars.
Citations made with knitcitations
(Boettiger, 2015). The BibTeX file can be found here.
[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 0.9.6. 2016. URL: https://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.7. 2015. URL: https://CRAN.R-project.org/package=knitcitations.
[3] L. Collado-Torres, A. E. Jaffe and J. T. Leek. derfinderPlot: Plotting functions for derfinder. https://github.com/leekgroup/derfinderPlot - R package version 1.6.3. 2015. URL: http://www.bioconductor.org/packages/derfinderPlot.
[4] L. Collado-Torres, A. E. Jaffe and J. T. Leek. “regionReport: Interactive reports for region-based analyses”. In: F1000Research 4 (2015), p. 105. DOI: 10.12688/f1000research.6379.1. URL: http://f1000research.com/articles/4-105/v1.
[5] L. Collado-Torres, A. Nellore, A. C. Frazee, C. Wilks, et al. “Flexible expressed region analysis for RNA-seq with derfinder”. In: bioRxiv (2016). DOI: 10.1101/015370. URL: http://biorxiv.org/content/early/2016/05/07/015370.
[6] A. E. Jaffe, P. Murakami, H. Lee, J. T. Leek, et al. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies”. In: International journal of epidemiology 41.1 (2012), pp. 200–209. DOI: 10.1093/ije/dyr238.
[7] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. ISBN: 978-0-387-98140-6. URL: http://ggplot2.org.
[8] Y. Xie. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.1. 2015. URL: https://CRAN.R-project.org/package=DT.
[9] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[10] T. Yin, D. Cook and M. Lawrence. “ggbio: an R package for extending the grammar of graphics for genomic data”. In: Genome Biology 13.8 (2012), p. R77.
[11] E. de Jonge. whisker: mustache for R, logicless templating. R package version 0.3-2. 2013. URL: https://CRAN.R-project.org/package=whisker.