Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
Bioinformatics Research Group in Epidemiolgy
(http://www.creal.cat/brge.htm)
Illumina Infinium HumanMethylation 450K BeadChip assay has become a standard tool to analyse methylation in human samples. Developed in 2011, it has already been used in projects such as The Cancer Genome Atlas (TCGA). Their 450.000 probes provide a good overall image of the methylation state of the genome, being one of the reasons of its success.
Given its complex design1, many Bioconductor packages have been developed to assess normalization and pre-processing issues (e.g. minfi (Aryee et al. 2014) or lumi (Du, Kibbe, and Lin 2008)). In addition, these packages can detect differentially methylated probes (DMPs) and differentially methylated regions (DMRs). However, the interfaces are not very intuitive and several scripting steps are usually required.
MEAL aims to facilitate the analysis of Illumina Methylation 450K chips. DMPs and DMRs detection algorithms are included, along with new plotting facilities. Besides, two wrapper functions allow performing whole methylome analysis or region analysis. Two additional features are adjustment of models for SNPs and tools to manage genomic-like variables.
MEAL implements three new classes: MethylationSet
, AnalysisResults
and AnalysisRegionResults
. MethylationSet
is a class derived from Bioconductor eSet
. All the information of the experiment is contained in this class: a beta values matrix, phenotypic information and annotation.
AnalysisResults
and AnalysisRegionResults
contain the results of whole methylome analysis and region analysis respectively as well as the raw data of the most significant features. With this design, the analyses and the reporting are split. Given that all the plots and the results can be obtained from these objects, we can do the analyses, store the objects and then do the report.
minfiData and IlluminaHumanMethylation450kanno.ilmn12.hg19 packages are required in order to run the examples of this vignette. minfiData contains MsetEx
, a MethylSet
from the minfi package.
library(MEAL)
library(MultiDataSet)
library(minfiData)
library(GenomicRanges)
MEAL does not contain preprocessing capabilities so these steps should be done previously. As a result, before analysing MsetEx
, probes that does not measure CpGs should be previously filtered. The next code uses a minfi function to remove not CpGs probes.
MsetExFilt <- dropMethylationLoci(MsetEx)
MethylationSet
construction requires a beta values matrix and a data.frame with the phenotypes. The following code extract both objects from a minfi
object. To speed up the example, only 30.000 probes will be used. In order to obtain reproducible results, the seed used for random number will be set to 0.
set.seed(0)
betas <- getBeta(MsetExFilt)
betas <- betas[sample(1:nrow(betas), 30000), ]
phenotypes <- pData(MsetExFilt)
With prepareMethylationSet
, a MethylationSet
is obtained. It contains only the samples having beta values and phenotypes and the probes containing annotation.
set <- prepareMethylationSet(matrix = betas, phenotypes = phenotypes,
annotation = "IlluminaHumanMethylation450kanno.ilmn12.hg19")
set
## MethylationSet (storageMode: lockedEnvironment)
## assayData: 29999 features, 6 samples
## element names: meth
## protocolData: none
## phenoData
## rowNames: 5723646052_R02C02 5723646052_R04C01 ...
## 5723646053_R06C02 (6 total)
## varLabels: Sample_Name Sample_Well ... filenames (13 total)
## varMetadata: labelDescription
## featureData
## featureNames: cg17501828 cg15560884 ... cg14273923 (29999 total)
## fvarLabels: chromosome position ... DHS (17 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: IlluminaHumanMethylation450kanno.ilmn12.hg19
prepareMethylationSet
is the main entry point to use the package and it is worth describing its arguments and functionalities. The first two arguments (matrix and phenotypes) are the matrix of beta values and the data.frame of phenotypes. It should be noticed that an AnnotatedDataFrame
can be used as phenotypes (during the construction of the object, phenotypes data.frame is coerced to an AnnotatedDataFrame
). In any case, beta values matrix contains CpGs at rows and samples at columns and both must be named. The same applies to phenotypes data.frame but rows must contain samples and columns variables.
annotation can be a character or a data.frame (or an AnnotatedDataFrame
). If character, it loads an annotation package and uses it in the object. At the moment, only IlluminaHumanMethylation450kanno.ilmn12.hg19 is supported. The following parameters (chromosome, position, genes and group) allow the specification of a custom annotation. If annotation is a data.frame, these parameters should match column names of the annotation data.frame. Chromosome equals to chromosome name (e.g. chr1) and position to position coordinates. Genes and group are optional and equal to the genes near the probe and to the position of the probe in relation to the gene. Default argument values match names in IlluminaHumanMethylation450kanno.ilmn12.hg19 annotation package. Finally, if annotation is not specified, IlluminaHumanMethylation450kanno.ilmn12.hg19 package is used.
minfi
objects can be directly used in prepareMethylationSet
. If the object contains phenotypes, there is no need to supply it. Otherwise, it will be compulsory.
This section (probe analysis) and the following (region analysis) describe the two main parts of the workflow. Despite they can be run as independent functions, it is recommended to use the general function DAPipeline
, which is explained below. DAPipeline
performs both analyses and it is designed in a more user-friendly way.
The initial approach when studying methylation was to find differentially methylated probes. The analysis chosen was to fit a linear model for each of the probes, taking into account the variable of interest as well as some covariates. Probes were usually sorted by statistical parameters such as coefficients p-value.
This approach has been implemented in MEAL but with some changes. As was proposed in (Du et al. 2010), M-values (logit2 transformation of beta values) are used to fit the model. In addition, a robust linear regression is used instead of a normal linear regression. This kind of regression is more robust to outliers and can obtain more accurate coefficients.
Before analysing the data, a quick look at the phenotypes will be done:
pData(set)
## Sample_Name Sample_Well Sample_Plate Sample_Group
## 5723646052_R02C02 GroupA_3 H5 NA GroupA
## 5723646052_R04C01 GroupA_2 D5 NA GroupA
## 5723646052_R05C02 GroupB_3 C6 NA GroupB
## 5723646053_R04C02 GroupB_1 F7 NA GroupB
## 5723646053_R05C02 GroupA_1 G7 NA GroupA
## 5723646053_R06C02 GroupB_2 H7 NA GroupB
## Pool_ID person age sex status Array Slide
## 5723646052_R02C02 NA id3 83 M normal R02C02 5723646052
## 5723646052_R04C01 NA id2 58 F normal R04C01 5723646052
## 5723646052_R05C02 NA id3 83 M cancer R05C02 5723646052
## 5723646053_R04C02 NA id1 75 F cancer R04C02 5723646053
## 5723646053_R05C02 NA id1 75 F normal R05C02 5723646053
## 5723646053_R06C02 NA id2 58 F cancer R06C02 5723646053
## Basename
## 5723646052_R02C02 ../extdata/5723646052/5723646052_R02C02
## 5723646052_R04C01 ../extdata/5723646052/5723646052_R04C01
## 5723646052_R05C02 ../extdata/5723646052/5723646052_R05C02
## 5723646053_R04C02 ../extdata/5723646053/5723646053_R04C02
## 5723646053_R05C02 ../extdata/5723646053/5723646053_R05C02
## 5723646053_R06C02 ../extdata/5723646053/5723646053_R06C02
## filenames
## 5723646052_R02C02 ../extdata/5723646052/5723646052_R02C02
## 5723646052_R04C01 ../extdata/5723646052/5723646052_R04C01
## 5723646052_R05C02 ../extdata/5723646052/5723646052_R05C02
## 5723646053_R04C02 ../extdata/5723646053/5723646053_R04C02
## 5723646053_R05C02 ../extdata/5723646053/5723646053_R05C02
## 5723646053_R06C02 ../extdata/5723646053/5723646053_R06C02
summary(pData(set))
## Sample_Name Sample_Well Sample_Plate Sample_Group
## Length:6 Length:6 Mode:logical Length:6
## Class :character Class :character NA's:6 Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Pool_ID person age sex
## Mode:logical Length:6 Min. :58.00 Length:6
## NA's:6 Class :character 1st Qu.:62.25 Class :character
## Mode :character Median :75.00 Mode :character
## Mean :72.00
## 3rd Qu.:81.00
## Max. :83.00
## status Array Slide
## Length:6 Length:6 Min. :5.724e+09
## Class :character Class :character 1st Qu.:5.724e+09
## Mode :character Mode :character Median :5.724e+09
## Mean :5.724e+09
## 3rd Qu.:5.724e+09
## Max. :5.724e+09
## Basename filenames
## Length:6 Length:6
## Class :character Class :character
## Mode :character Mode :character
##
##
##
The most interesting variables are person, age, sex and status. Age is numeric but the other ones, that should be factors, are indeed characters. The following analysis will be performed using the variable status, which defines samples of normal or cancer tissues.
Probe analysis can be done with DAProbe
, which needs a MethylationSet
and a matrix model.
mod <- model.matrix(~as.factor(status), data = pData(set))
proberes <- DAProbe(set = set, model = mod, method = "ls", coefficient = 2)
head(proberes)
## intercept as.factor(status)normal sd t
## cg25937714 0.7305902 -0.5912205 0.019011867 -15.18472
## cg06340552 0.6815428 -0.6019749 0.013449414 -14.01203
## cg20450979 0.4797533 -0.4153843 0.009589855 -13.44973
## cg21245277 0.6422945 -0.5370338 0.015934441 -13.36087
## cg13355248 0.8088785 -0.7467653 0.013335034 -13.35681
## cg19333963 0.7364919 -0.6581963 0.015003345 -12.96130
## P.Value adj.P.Val chromosome position
## cg25937714 5.264155e-07 0.005426774 chr3 44036492
## cg06340552 9.574154e-07 0.005426774 chr4 142054329
## cg20450979 1.297019e-06 0.005426774 chr4 176923542
## cg21245277 1.362233e-06 0.005426774 chr2 185463218
## cg13355248 1.365300e-06 0.005426774 chr17 78449872
## cg19333963 1.704865e-06 0.005426774 chr19 1467979
## genes group
## cg25937714
## cg06340552 RNF150;RNF150 1stExon;5'UTR
## cg20450979 GPM6A;GPM6A;GPM6A;GPM6A 5'UTR;5'UTR;1stExon;1stExon
## cg21245277 ZNF804A;ZNF804A 1stExon;5'UTR
## cg13355248 NPTX1 1stExon
## cg19333963 APC2 Body
DAProbe
returns a data.frame with the results of the linear analysis. In the example, method is set to “ls” (normal linear regression) in order to speed the tutorial, but default is robust regression. Coefficient indicates the coefficients of the model matrix whose results will be returned. If coefficient is a vector, a list of data.frames is returned.
MEAL includes three region analysis algorithms: bumphunter (Jaffe et al. 2012), DMRcate (Peters et al. 2015) and blockFinder. More information about these methods can be found at the corresponding packages.
DARegion
needs a MethylationSet
and a matrix model (the same that DAProbe
):
regionres <- DARegion(set = set, model = mod, coefficient = 2)
names(regionres)
## [1] "bumphunter" "blockFinder" "DMRcate"
head(regionres$bumphunter)
## chr start end value area cluster indexStart
## 23969 chr6 29521023 29521756 -2.286410 13.718458 21494 23809
## 24390 chr6 118228078 118228871 -3.790524 11.371572 22512 25102
## 23124 chr4 142054254 142054329 -4.827935 9.655870 19467 21552
## 16223 chr10 1779432 1780184 -3.143640 9.430920 2715 2987
## 16907 chr11 30607068 30607360 -4.664209 9.328417 4439 4905
## 21929 chr20 61340107 61340542 -3.080326 9.240979 16372 18133
## indexEnd L clusterL
## 23969 23814 6 6
## 24390 25104 3 3
## 23124 21553 2 2
## 16223 2989 3 3
## 16907 4906 2 2
## 21929 18135 3 3
head(regionres$blockFinder)
## chr start end value area cluster indexStart
## 210 chr6 31904709 32411670 0.1379759 6.484868 1207 21656
## 207 chr6 28962770 30017803 0.1132343 4.982309 1207 21373
## 715 chr12 113345598 113417284 -1.3327353 3.998206 2228 6498
## 490 chr15 25230200 25513277 0.1477724 3.546538 2466 8428
## 35 chr1 152538857 153068236 0.2331429 3.497144 165 1690
## 212 chr6 33037444 33232191 0.1478054 3.103914 1207 21783
## indexEnd L clusterL
## 210 21734 47 250
## 207 21447 44 250
## 715 6500 3 12
## 490 8451 24 27
## 35 1704 15 33
## 212 21822 21 250
head(regionres$DMRcate)
## coord no.cpgs minfdr Stouffer
## 1369 chr6:29521023-29521756 6 2.087206e-41 2.858691e-07
## 208 chr10:118031870-118033115 4 3.848536e-54 8.255219e-07
## 1450 chr6:118228078-118228871 3 2.120233e-80 5.655719e-06
## 1324 chr5:168727752-168728076 3 5.861509e-53 8.606307e-06
## 612 chr16:51184205-51185110 3 2.247997e-25 4.416790e-05
## 750 chr19:12305854-12306303 3 9.694360e-34 5.004852e-05
## maxbetafc meanbetafc
## 1369 -0.5047504 -0.3571560
## 208 -0.6028006 -0.5332195
## 1450 -0.6021535 -0.4881543
## 1324 -0.4638198 -0.4453066
## 612 -0.5757717 -0.3976473
## 750 -0.5706350 -0.3859342
DARegion
returns a list of data.frames with the results of the different methods. methods parameter can be used to select the methods desired. If a method is not chosen, a NA value is returned in this position. Because DMRcate uses results from a linear model regression, these results can be computed using DAProbe
and passed in the argument proberes.
Bumphunter and blockFinder can calculate p-values for the regions differentially detected using bootstraping. num_permutations arguments sets the number of permutations that will be done to estimate these p-values. By default it is set to 0, because its computation requires a lot of memory and it is time consuming.
Finally, it should be said that blockFinder method has been adapted from minfi package and it needs its annotation package (IlluminaHumanMethylation450kanno.ilmn12.hg19). If another annotation is used, blockFinder use is discouraged.
The first approach when studying methylation changes is to get an overall picture of the methylation state throughout the genome. This kind of analysis is performed by the function DAPipeline
. The first analysis will be to evaluate the effect of cancer in methylation. Given that this variable is a character, it should be converted to factor using variable_types:
res <- DAPipeline(set = set, variable_names = "status", variable_types = "categorical",
probe_method = "ls", verbose = FALSE)
## Your contrast returned 2135 individually significant probes. We recommend the default setting of pcutoff in dmrcate().
## Fitting chr1...
## Fitting chr10...
## Fitting chr11...
## Fitting chr12...
## Fitting chr13...
## Fitting chr14...
## Fitting chr15...
## Fitting chr16...
## Fitting chr17...
## Fitting chr18...
## Fitting chr19...
## Fitting chr2...
## Fitting chr20...
## Fitting chr21...
## Fitting chr22...
## Fitting chr3...
## Fitting chr4...
## Fitting chr5...
## Fitting chr6...
## Fitting chr7...
## Fitting chr8...
## Fitting chr9...
## Fitting chrX...
## Fitting chrY...
## Demarcating regions...
## Done!
res
## class: AnalysisResults
## original class: MethylationSet
## features(2135): cg25937714 cg06340552 ... cg05073901 cg16246713
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
## 5723646053_R05C02 5723646053_R06C02
## variables: status
## model variables names: status
## covariables: none
## Number of bumps: 26175
## Number of blocks: NULL
## Number of regions: 314
## Number of differentially methylated probes: 2135
DAPipeline
generates a AnalysisResults
objects containing probe and region analysis results. Most of the parameters of this function are arguments of DAProbe
and DARegion
.
On the other hand, there are four important parameters in this function: variable_names, variable_types, covariable_names and covariable_types. These parameters define the variables that will be used as active variable (for which the results will be presented) and the variables used as covariates (variables that will enter in the model but not in the results). Class of the variables in set can be changed using variable_types and covariable_types. Available types are categorical (factor), continuous (numeric) and the three genetic models (dominant, recessive and additive).
When variables are defined in this way, the linear model created is additive: all the variables are summed. It is also possible to use a linear model containing interaction and other more complex features using the equation parameter:
complexres <- DAPipeline(set = set, variable_names = c("status", "sex"),
variable_types = c("categorical", "categorical"),
probe_method = "ls", num_var = 3, verbose = FALSE,
equation = "~ status:sex + status + sex")
## Your contrast returned no individually significant probes. Set pcutoff manually in dmrcate() to return DMRs, but be warned there is an increased risk of Type I errors.
## Fitting chr1...
## Fitting chr10...
## Fitting chr11...
## Fitting chr12...
## Fitting chr13...
## Fitting chr14...
## Fitting chr15...
## Fitting chr16...
## Fitting chr17...
## Fitting chr18...
## Fitting chr19...
## Fitting chr2...
## Fitting chr20...
## Fitting chr21...
## Fitting chr22...
## Fitting chr3...
## Fitting chr4...
## Fitting chr5...
## Fitting chr6...
## Fitting chr7...
## Fitting chr8...
## Fitting chr9...
## Fitting chrX...
## Fitting chrY...
## Your contrast returned 152 individually significant probes. We recommend the default setting of pcutoff in dmrcate().
## Fitting chr1...
## Fitting chr10...
## Fitting chr11...
## Fitting chr12...
## Fitting chr13...
## Fitting chr14...
## Fitting chr15...
## Fitting chr16...
## Fitting chr17...
## Fitting chr18...
## Fitting chr19...
## Fitting chr2...
## Fitting chr20...
## Fitting chr21...
## Fitting chr22...
## Fitting chr3...
## Fitting chr4...
## Fitting chr5...
## Fitting chr6...
## Fitting chr7...
## Fitting chr8...
## Fitting chr9...
## Fitting chrX...
## Fitting chrY...
## Demarcating regions...
## Done!
## Your contrast returned no individually significant probes. Set pcutoff manually in dmrcate() to return DMRs, but be warned there is an increased risk of Type I errors.
## Fitting chr1...
## Fitting chr10...
## Fitting chr11...
## Fitting chr12...
## Fitting chr13...
## Fitting chr14...
## Fitting chr15...
## Fitting chr16...
## Fitting chr17...
## Fitting chr18...
## Fitting chr19...
## Fitting chr2...
## Fitting chr20...
## Fitting chr21...
## Fitting chr22...
## Fitting chr3...
## Fitting chr4...
## Fitting chr5...
## Fitting chr6...
## Fitting chr7...
## Fitting chr8...
## Fitting chr9...
## Fitting chrX...
## Fitting chrY...
complexres
## class: AnalysisResults
## original class: MethylationSet
## features(152): cg27069132 cg23896353 ... cg04407470 cg03661529
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
## 5723646053_R05C02 5723646053_R06C02
## variables: status, sex
## model variables names: statusnormal, sexM, statusnormal:sexM
## covariables: none
## Number of bumps: 26257, 26476, 26799
## Number of blocks: NULL, NULL, NULL
## Number of regions: 0, 37, 0
## Number of differentially methylated probes: 0, 152, 0
When using equation, number of active variables must be explicit. They will be selected from the left of the equation to the right. Model matrices are created using model.matrix
function, so if we introduce interactions, they will be after the single variables. In our example, status and sex both had two levels. Therefore, a dummy variable is created for each of the samples and the interaction is the third column. Consequently, num_var must be 3.
In addition, it should be noticed that variables used in interactions must be included alone in the equation and must be present in the variable_names arguments. Covariables can still be used by setting them in covariable_names, being added to the linear model.
There are several functions that allow to access analysis results:
#Bumphunter
head(bumps(res)[[1]])
## chr start end value area cluster indexStart
## 23969 chr6 29521023 29521756 -2.286410 13.718458 21494 23809
## 24390 chr6 118228078 118228871 -3.790524 11.371572 22512 25102
## 23124 chr4 142054254 142054329 -4.827935 9.655870 19467 21552
## 16223 chr10 1779432 1780184 -3.143640 9.430920 2715 2987
## 16907 chr11 30607068 30607360 -4.664209 9.328417 4439 4905
## 21929 chr20 61340107 61340542 -3.080326 9.240979 16372 18133
## indexEnd L clusterL
## 23969 23814 6 6
## 24390 25104 3 3
## 23124 21553 2 2
## 16223 2989 3 3
## 16907 4906 2 2
## 21929 18135 3 3
#BlockFinder
head(blocks(res)[[1]])
## [1] NA
#DMRcate
head(dmrCate(res)[[1]])
## coord no.cpgs minfdr Stouffer
## 1369 chr6:29521023-29521756 6 2.087206e-41 2.858691e-07
## 208 chr10:118031870-118033115 4 3.848536e-54 8.255219e-07
## 1450 chr6:118228078-118228871 3 2.120233e-80 5.655719e-06
## 1324 chr5:168727752-168728076 3 5.861509e-53 8.606307e-06
## 612 chr16:51184205-51185110 3 2.247997e-25 4.416790e-05
## 750 chr19:12305854-12306303 3 9.694360e-34 5.004852e-05
## maxbetafc meanbetafc
## 1369 -0.5047504 -0.3571560
## 208 -0.6028006 -0.5332195
## 1450 -0.6021535 -0.4881543
## 1324 -0.4638198 -0.4453066
## 612 -0.5757717 -0.3976473
## 750 -0.5706350 -0.3859342
#Probe
head(probeResults(res)[[1]])
## [1] 0.7305902 0.6815428 0.4797533 0.6422945 0.8088785 0.7364919
#Region
names(regionResults(res))
## [1] "DMRCate" "Bumphunter" "BlockFinder"
All these functions return a list, even if it contains only one data.frame. regionResults
contains a list with the results of the three per region methods. Export of the results is simplified with the function exportResults
.
exportResults(res, dir = "./results")
This function creates csv files with all the results. If more than one variable is present, a subfolder for each variable is created.
AnalysisResults
incorporates many plotting facilities. plotFeature
plots the beta values distribution of a CpG. plotFeature
accepts a number with the CpG index or character with the name of a CpG.
plotFeature(res, 1)
Probe results can be used to plot a QQ-plot and a Manhattan plot. In the second one, CpGs inside a range can be highlighted by passing a GenomicRanges object with the range of interest.
plotQQ(res)
range <- GRanges(seqnames = Rle("chr1"),
ranges = IRanges(1000000, end = 10000000))
plotEWAS(res, range = range)
In figure 3, the red line of the Manhattan plot is the significance threshold using Bonferroni.
A study of a region can be performed with DARegionAnalysis
. Options are very similar to that of DAPipeline
. In this situation, status will be the active variable and age will be a covariable. A GenomicRange must be supplied in order to delimit the region.
range <- GRanges(seqnames = Rle("chr12"),
ranges = IRanges(70000000, end = 80000000))
region <- DARegionAnalysis(set = set, variable_names = "status",
variable_types = "categorical",
covariable_names = "age", range = range,
verbose = FALSE)
## Your contrast returned 6 individually significant probes; a small but real effect. Consider manually setting the value of pcutoff to return more DMRs, but be warned that doing this increases the risk of Type I errors.
## Fitting chr12...
## Demarcating regions...
## Done!
## 'nperm' >= set of all permutations: complete enumeration.
## Set of permutations < 'minperm'. Generating entire set.
region
## class: AnalysisRegionResults
## original class: MethylationSet
## features(44): cg11054631 cg20090301 ... cg01585322 cg17278249
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
## 5723646053_R05C02 5723646053_R06C02
## variables: status
## model variables names: status
## covariables: age
## Number of bumps: 41
## Number of blocks: 0
## Number of regions: 0
## Number of differentially methylated probes: 6
## Relevant snps(0):
## Snps Variance: NA
## Range:
## Chr: chr12 start: 70000000 end: 80000000
## R2: 0.471
## RDA P-value: 0.1
## Region P-value: 0.353
## Global R2: 0.419
DARegionAnalysis
generates a AnalysisRegionResults
, an heir of AnalysisResults
, so they share getter functions.
#Bumphunter
head(bumps(region)[[1]])
## chr start end value area cluster indexStart indexEnd
## 38 chr12 72667493 72667493 -3.008385 3.008385 20 20 20
## 27 chr12 77756101 77756101 1.886755 1.886755 37 37 37
## 15 chr12 72783379 72783379 1.501490 1.501490 21 21 21
## 29 chr12 78637464 78637464 1.478663 1.478663 39 39 39
## 31 chr12 78883559 78883559 1.451280 1.451280 41 41 41
## 12 chr12 72335228 72335228 1.374072 1.374072 17 17 17
## L clusterL
## 38 1 1
## 27 1 1
## 15 1 1
## 29 1 1
## 31 1 1
## 12 1 1
#BlockFinder
head(blocks(region)[[1]])
## data frame with 0 columns and 0 rows
#DMRcate
head(dmrCate(region)[[1]])
## [1] coord no.cpgs minfdr Stouffer maxbetafc meanbetafc
## <0 rows> (or 0-length row.names)
#Probe
head(probeResults(region)[[1]])
## [1] 0.8027510 0.3021116 0.8120773 0.4836036 0.1894649 0.3268627
#Region
names(regionResults(region))
## [1] "DMRCate" "Bumphunter" "BlockFinder"
Distribution of coefficients of the region can be plotted using the plotRegion
function. .
plotRegion(region)
In figure 4, green points are significant probes (p-value smaller than 0.05) while red points are non significant probes (p-value greater than 0.05). Blue lines are set at 0.05, a minimal difference to be considered differentially methylated.
To evaluate the correlation of the methylation probes with the phenotype of interest, a redundancy analysis (RDA) is run. RDA is a multivariate analogue of linear regression. In our case, we will fit our matrix of methylation probes to a matrix of variables: \[ Y = A*X + \epsilon \] where Y are the methylation values of the probes of our region, A the matrix used to fit X to Y and X the matrix with the variables to fit. We can transform this equation to: \[ Y = \hat{Y} + Y_r \] where \(\hat{Y}\) is the fitted matrix Y, or the part of Y that can be explained by X and \(Y_r\) is the residual Y, or the part of Y that cannot be explained by X. The variance of the matrix Y can be split between \(\hat{Y}\) and \(Y_r\), and an R2 can be computed, representing the amount of variance of Y that is explained by X.
In the print of the results, the R2 and the p-value from the RDA analysis are returned. A plot can be done with plotRDA
:
plotRDA(region)
In figure 5, points represent samples grouped by the active variables. Red crosses are the CpGs and CpG names are the CpGs most correlated to the RDA axes.
In our example, SNPs data was not available. An example with SNPs can be found in caseExample vignette.
Aryee, Martin J, Andrew E Jaffe, Hector Corrada-Bravo, Christine Ladd-Acosta, Andrew P Feinberg, Kasper D Hansen, and Rafael A Irizarry. 2014. “Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays.” Bioinformatics (Oxford, England) 30 (10): 1363–9. doi:10.1093/bioinformatics/btu049.
Du, Pan, Warren A Kibbe, and Simon M Lin. 2008. “lumi: a pipeline for processing Illumina microarray.” Bioinformatics (Oxford, England) 24 (13): 1547–8. doi:10.1093/bioinformatics/btn224.
Du, Pan, Xiao Zhang, Chiang-Ching Huang, Nadereh Jafari, Warren A Kibbe, Lifang Hou, and Simon M Lin. 2010. “Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.” BMC Bioinformatics 11 (January): 587. doi:10.1186/1471-2105-11-587.
Jaffe, Andrew E, Peter Murakami, Hwajin Lee, Jeffrey T Leek, M Daniele Fallin, Andrew P Feinberg, and Rafael A Irizarry. 2012. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies.” International Journal of Epidemiology 41 (1): 200–209. doi:10.1093/ije/dyr238.
Peters, Timothy, Michael Buckley, Aaron Statham, Ruth Pidsley, Katherine Samaras, Reginald Lord, Susan Clark, and Peter Molloy. 2015. “De novo identification of differentially methylated regions in the human genome.” Epigenetics & Chromatin 8 (1): 6. doi:10.1186/1756-8935-8-6.
More information can be found at this minfi tutorial↩