Covers the creation of ScanMiRAnno objects, setting up the shiny app, and using the wrappers.
scanMiRApp 1.12.0
scanMiRApp offers a shiny interface to the scanMiR package, as well as convenience functions to simplify its use with common annotations.
Both the shiny app and the convenience functions rely on objects of the class ScanMiRAnno, which contain the different pieces of annotation relating to a species and genome build. Annotations for human (GRCh38), mouse (GRCm38) and rat (Rnor_6) can be obtained as follows:
library(scanMiRApp)
# anno <- ScanMiRAnno("Rnor_6")
# for this vignette, we'll work with a lightweight fake annotation:
anno <- ScanMiRAnno("fake")
anno
## Genome: /home/biocbuild/bbs-3.20-bioc/tmpdir/Rtmpzcgmw2/filec0f4b4d80be6b
## Annotation: Fake falsus (fake1)
## Models: KdModelList of length 1
You can also build your own ScanMiRAnno object by providing the function with the different components (minimally, a BSgenome and an ensembldb object; see ?ScanMiRAnno for more information). For minimal functioning with the shiny app, the models slot additionally needs to be populated with a KdModelList (see the corresponding vignette of the scanMiR package for more information).
In addition, ScanMiRAnno objects can contain pre-compiled scans and aggregations, which are especially meant to speed up the shiny application. These should be saved as IndexedFst files, indexed by transcript and by miRNA respectively, and stored in the scan and aggregated slots of the object.
The transcript (or UTR) sequence for any (set of) transcript(s) in the annotation can be obtained with:
seq <- getTranscriptSequence("ENSTFAKE0000056456", anno)
seq
## DNAStringSet object of length 1:
## width seq names
## [1] 688 CGTATTAAATTTAGCAAGGTTCC...ACCTTCAGATTTCAGCAGACTAG ENSTFAKE0000056456
Binding sites of a given miRNA on a transcript can be visualized with:
plotSitesOnUTR(tx="ENSTFAKE0000056456", annotation=anno, miRNA="hsa-miR-155-5p")
## Prepare miRNA model
## Get Transcript Sequence
## Scan
This will fetch the sequence, perform the scan, and plot the results.
The runFullScan function can be used to launch the scan for all miRNAs on all protein-coding transcripts (or their UTRs) of a genome. These scans can then be used to speed up the shiny app (see below). They can simply be launched as:
m <- runFullScan(anno)
## Loading annotation
## Extracting transcripts
## Scanning with 1 thread(s)
m
## GRanges object with 2 ranges and 4 metadata columns:
## seqnames ranges strand | type log_kd p3.score note
## <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <Rle>
## [1] ENSTFAKE0000056456 281-288 * | 8mer -4868 12 TDMD?
## [2] ENSTFAKE0000056456 482-489 * | 7mer-m8 -3702 0 -
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Multi-threading can be enabled through the ncores argument. See ?runFullScan for more options.
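As a minimal sketch of the multi-threaded variant, the call below repeats the scan on the fake annotation used above. The value 2 is an arbitrary choice for illustration, and the object name m2 is ours; the toy annotation is too small for the extra thread to make a noticeable difference.

```r
library(scanMiRApp)

# lightweight fake annotation, as in the rest of this vignette
anno <- ScanMiRAnno("fake")

# same scan as above, spread over two threads (arbitrary illustrative value)
m2 <- runFullScan(anno, ncores=2)
length(m2)
```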
The enrichedMirTxPairs function identifies miRNA-target enrichments (which could indicate sponge- or cargo-like behaviors) by means of a binomial model. The model estimates the probability of observing the given number of binding sites for a pair, given the total number of binding sites for the miRNA (across all transcripts) and for the transcript (across all miRNAs) in question. The output is a data.frame indicating, for each pair passing some lenient filtering, the transcript, the miRNA, the number of 7mer/8mer sites, and the binomial log(p-value) of the combination. We strongly recommend further filtering this list by expression of the transcript in the system of interest; otherwise, some transcripts with very low expression (and hence biologically irrelevant) might come up as strongly enriched.
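To give an intuition for this kind of test, here is a base-R sketch of one possible binomial formulation. It is not the code used by enrichedMirTxPairs, and the exact parametrization in the package may differ. The count matrix, the transcript and miRNA names, and the choice of using each transcript's share of all sites as the success probability are assumptions made for illustration.

```r
# Toy counts of 7mer/8mer sites (made-up numbers):
# rows are transcripts, columns are miRNAs
counts <- matrix(c(12, 1, 0,
                    2, 3, 1,
                    1, 2, 3),
                 nrow=3, byrow=TRUE,
                 dimnames=list(c("tx1", "tx2", "tx3"),
                               c("miR-A", "miR-B", "miR-C")))

txTotal  <- rowSums(counts)  # sites per transcript, across all miRNAs
mirTotal <- colSums(counts)  # sites per miRNA, across all transcripts

# assumed null model: a site of any miRNA lands on a transcript with
# probability equal to that transcript's share of all sites
pExpected <- txTotal / sum(counts)

# log P(X >= observed), with X ~ Binomial(miRNA total, transcript share)
logp <- sapply(colnames(counts), function(mir) {
  pbinom(counts[, mir] - 1, size=mirTotal[[mir]], prob=pExpected,
         lower.tail=FALSE, log.p=TRUE)
})
dimnames(logp) <- dimnames(counts)

# pairs sorted from most to least enriched
head(sort(setNames(as.vector(logp),
                   outer(rownames(logp), colnames(logp), paste, sep=":"))))
```

In this toy example the tx1/miR-A pair, which carries most of the sites of miR-A, receives the lowest log(p-value). Note that nothing in such a test accounts for expression, which is why the expression-based filtering recommended above remains necessary.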
The features of the shiny app are organized into two main components:
Transcript- (or sequence-) centered features are available in the search in gene/sequence tab. These allow, for instance, scanning custom sequences or selected transcript sequences for miRNA binding sites, visualizing the sites on the transcript, and visualizing the sequence pairing of specific matches.
miRNA-centered features are available in the miRNA-based tab, which shows the general binding specificity of a given miRNA. If the ScanMiRAnno object contains aggregated data (see below), the tab also shows the top predicted targets for the miRNA.
A ScanMiRAnno object is the minimal input for the shiny app, and multiple such objects can be provided in the form of a named list:
scanMiRApp( list( nameOfAnnotation=anno ) )
Launched with this object, the app will not have access to any pre-compiled scans or to aggregated data. Scans will therefore be performed on the fly, and hence more slowly, and the top targets based on aggregated repression estimates (in the miRNA-based tab) will not be available. To provide this additional information, you first need to prepare the objects as IndexedFst files. Assuming you’ve saved (or downloaded) the scans as scan.rds and the aggregated data as aggregated.rds, you can re-save them as IndexedFst (here in the folder out_path) and add them to the anno object as follows:
# not run
anno <- ScanMiRAnno("Rnor_6")
saveIndexedFst(readRDS("scan.rds"), "seqnames", file.prefix="out_path/scan")
saveIndexedFst(readRDS("aggregated.rds"), "miRNA",
               file.prefix="out_path/aggregated")
anno$scan <- loadIndexedFst("out_path/scan")
anno$aggregated <- loadIndexedFst("out_path/aggregated")
# then launch the app
scanMiRApp(list(Rnor_6=anno))
The same could be done for multiple ScanMiRAnno objects. If scanMiRApp is launched without any annotation argument, it will generate anno objects for the three base species (without any pre-compiled data).
Multi-threading can be enabled in the shiny app by calling scanMiRApp() (or the underlying scanMiRserver()) with the BP argument, e.g.:
scanMiRApp(..., BP=BiocParallel::MulticoreParam(ncores))
where ncores is the number of threads to use. This will enable multi-threading for the scanning functions, which makes a big difference when scanning for many miRNAs at a time. In addition, multi-threading can be used to read the IndexedFst files, which is enabled by the nthreads argument of the loadIndexedFst function. However, since reading is already quite fast with a single core, improvements there are typically marginal.
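The reading side can be tried out on a small scale with the sketch below, which writes a toy table as IndexedFst files to a temporary folder and reads it back with two threads. The table contents, the toy_scan file prefix and the thread count are made up for illustration; real scans would be saved as shown in the previous section.

```r
library(scanMiR)

# toy table standing in for pre-compiled scan results (made-up values)
d <- data.frame(
  seqnames=factor(rep(c("tx1", "tx2", "tx3"), each=4)),
  start=seq_len(12),
  log_kd=-seq_len(12)
)

# hypothetical output prefix inside the session's temporary folder
prefix <- file.path(tempdir(), "toy_scan")
saveIndexedFst(d, "seqnames", file.prefix=prefix)

# read the files back using two threads (arbitrary illustrative value)
toy <- loadIndexedFst(prefix, nthreads=2)
toy
```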
By default, the app has a caching system: if a user launches the same scan with the same parameters twice, the results are re-used instead of re-computed. The cache has a maximum size (by default 10MB) per user, beyond which older cache items are removed. The cache size can be changed through the maxCacheSize argument.
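A possible way of raising the limit is sketched below. It assumes that maxCacheSize is expressed in bytes, which is worth checking against ?scanMiRApp before relying on it. The value shown and the annotation name are arbitrary, and the call is wrapped in interactive() so that the app is only launched from an interactive session.

```r
library(scanMiRApp)

# lightweight fake annotation, as in the rest of this vignette
anno <- ScanMiRAnno("fake")

# assumption: maxCacheSize is given in bytes, so 50e6 would allow
# roughly 50MB of cached results per user
if (interactive()) {
  scanMiRApp(list(fake=anno), maxCacheSize=50e6)
}
```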
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0 IRanges_2.40.0
## [4] S4Vectors_0.44.0 BiocGenerics_0.52.0 fstcore_0.9.18
## [7] scanMiRApp_1.12.0 scanMiR_1.12.0 BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 magrittr_2.0.3
## [3] magick_2.8.5 shinyjqui_0.4.1
## [5] GenomicFeatures_1.58.0 farver_2.1.2
## [7] rmarkdown_2.28 BiocIO_1.16.0
## [9] zlibbioc_1.52.0 vctrs_0.6.5
## [11] memoise_2.0.1 Rsamtools_2.22.0
## [13] RCurl_1.98-1.16 tinytex_0.53
## [15] htmltools_0.5.8.1 S4Arrays_1.6.0
## [17] progress_1.2.3 AnnotationHub_3.14.0
## [19] curl_5.2.3 SparseArray_1.6.0
## [21] sass_0.4.9 bslib_0.8.0
## [23] htmlwidgets_1.6.4 httr2_1.0.5
## [25] plotly_4.10.4 cachem_1.1.0
## [27] GenomicAlignments_1.42.0 mime_0.12
## [29] lifecycle_1.0.4 pkgconfig_2.0.3
## [31] Matrix_1.7-1 R6_2.5.1
## [33] fastmap_1.2.0 GenomeInfoDbData_1.2.13
## [35] MatrixGenerics_1.18.0 shiny_1.9.1
## [37] digest_0.6.37 colorspace_2.1-1
## [39] AnnotationDbi_1.68.0 shinycssloaders_1.1.0
## [41] RSQLite_2.3.7 labeling_0.4.3
## [43] seqLogo_1.72.0 filelock_1.0.3
## [45] fansi_1.0.6 httr_1.4.7
## [47] abind_1.4-8 compiler_4.4.1
## [49] withr_3.0.2 bit64_4.5.2
## [51] BiocParallel_1.40.0 DBI_1.2.3
## [53] highr_0.11 biomaRt_2.62.0
## [55] rappdirs_0.3.3 DelayedArray_0.32.0
## [57] waiter_0.2.5 rjson_0.2.23
## [59] tools_4.4.1 httpuv_1.6.15
## [61] fst_0.9.8 glue_1.8.0
## [63] restfulr_0.0.15 promises_1.3.0
## [65] grid_4.4.1 generics_0.1.3
## [67] gtable_0.3.6 tidyr_1.3.1
## [69] ensembldb_2.30.0 data.table_1.16.2
## [71] hms_1.1.3 xml2_1.3.6
## [73] utf8_1.2.4 XVector_0.46.0
## [75] stringr_1.5.1 BiocVersion_3.20.0
## [77] pillar_1.9.0 later_1.3.2
## [79] rintrojs_0.3.4 dplyr_1.1.4
## [81] BiocFileCache_2.14.0 lattice_0.22-6
## [83] rtracklayer_1.66.0 bit_4.5.0
## [85] tidyselect_1.2.1 Biostrings_2.74.0
## [87] knitr_1.48 bookdown_0.41
## [89] ProtGenerics_1.38.0 SummarizedExperiment_1.36.0
## [91] xfun_0.48 shinydashboard_0.7.2
## [93] Biobase_2.66.0 matrixStats_1.4.1
## [95] DT_0.33 stringi_1.8.4
## [97] UCSC.utils_1.2.0 lazyeval_0.2.2
## [99] yaml_2.3.10 evaluate_1.0.1
## [101] codetools_0.2-20 tibble_3.2.1
## [103] BiocManager_1.30.25 cli_3.6.3
## [105] xtable_1.8-4 munsell_0.5.1
## [107] jquerylib_0.1.4 Rcpp_1.0.13
## [109] dbplyr_2.5.0 png_0.1-8
## [111] XML_3.99-0.17 parallel_4.4.1
## [113] ggplot2_3.5.1 blob_1.2.4
## [115] prettyunits_1.2.0 AnnotationFilter_1.30.0
## [117] bitops_1.0-9 pwalign_1.2.0
## [119] txdbmaker_1.2.0 viridisLite_0.4.2
## [121] scales_1.3.0 scanMiRData_1.11.0
## [123] purrr_1.0.2 crayon_1.5.3
## [125] rlang_1.1.4 cowplot_1.1.3
## [127] KEGGREST_1.46.0