1 Introduction

Illumina Infinium HumanMethylation 450K BeadChip assay has become a standard tool to analyse methylation in human samples. Developed in 2011, it has already been used in projects such as The Cancer Genome Atlas (TCGA). Their 450.000 probes provide a good overall image of the methylation state of the genome, being one of the reasons of its success.

Given its complex design¹, many Bioconductor packages have been developed to assess normalization and pre-processing issues (e.g. minfi (Aryee et al. 2014) or lumi (Du, Kibbe, and Lin 2008)). In addition, these packages can detect differentially methylated probes (DMPs) and differentially methylated regions (DMRs). However, the interfaces are not very intuitive and several scripting steps are usually required.

MEAL aims to facilitate the analysis of Illumina Methylation 450K chips. DMPs and DMRs detection algorithms are included, along with new plotting facilities. Besides, two wrapper functions allow performing whole methylome analysis or region analysis. Two additional features are adjustment of models for SNPs and tools to manage genomic-like variables.

1.1 Data types

MEAL implements three new classes: MethylationSet, AnalysisResults and AnalysisRegionResults. MethylationSet is a class derived from Bioconductor eSet. All the information of the experiment is contained in this class: a beta values matrix, phenotypic information and annotation.

AnalysisResults and AnalysisRegionResults contain the results of whole methylome analysis and region analysis respectively as well as the raw data of the most significant features. With this design, the analyses and the reporting are split. Given that all the plots and the results can be obtained from these objects, we can do the analyses, store the objects and then do the report.

1.2 Example data

minfiData and IlluminaHumanMethylation450kanno.ilmn12.hg19 packages are required in order to run the examples of this vignette. minfiData contains MsetEx, a MethylSet from the minfi package.

library(MEAL)
library(MultiDataSet)
library(minfiData)
library(GenomicRanges)

2 Read data

MEAL does not contain preprocessing capabilities so these steps should be done previously. As a result, before analysing MsetEx, probes that does not measure CpGs should be previously filtered. The next code uses a minfi function to remove not CpGs probes.

MsetExFilt <- dropMethylationLoci(MsetEx)

MethylationSet construction requires a beta values matrix and a data.frame with the phenotypes. The following code extract both objects from a minfi object. To speed up the example, only 30.000 probes will be used. In order to obtain reproducible results, the seed used for random number will be set to 0.

set.seed(0)
betas <- getBeta(MsetExFilt)
betas <- betas[sample(1:nrow(betas), 30000), ]
phenotypes <- pData(MsetExFilt)

With prepareMethylationSet, a MethylationSet is obtained. It contains only the samples having beta values and phenotypes and the probes containing annotation.

set <- prepareMethylationSet(matrix = betas, phenotypes = phenotypes, 
                                    annotation = "IlluminaHumanMethylation450kanno.ilmn12.hg19")
set

## MethylationSet (storageMode: lockedEnvironment)
## assayData: 29999 features, 6 samples 
##   element names: meth 
## protocolData: none
## phenoData
##   rowNames: 5723646052_R02C02 5723646052_R04C01 ...
##     5723646053_R06C02 (6 total)
##   varLabels: Sample_Name Sample_Well ... filenames (13 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: cg17501828 cg15560884 ... cg14273923 (29999 total)
##   fvarLabels: chromosome position ... DHS (17 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: IlluminaHumanMethylation450kanno.ilmn12.hg19

prepareMethylationSet is the main entry point to use the package and it is worth describing its arguments and functionalities. The first two arguments (matrix and phenotypes) are the matrix of beta values and the data.frame of phenotypes. It should be noticed that an AnnotatedDataFrame can be used as phenotypes (during the construction of the object, phenotypes data.frame is coerced to an AnnotatedDataFrame). In any case, beta values matrix contains CpGs at rows and samples at columns and both must be named. The same applies to phenotypes data.frame but rows must contain samples and columns variables.

annotation can be a character or a data.frame (or an AnnotatedDataFrame). If character, it loads an annotation package and uses it in the object. At the moment, only IlluminaHumanMethylation450kanno.ilmn12.hg19 is supported. The following parameters (chromosome, position, genes and group) allow the specification of a custom annotation. If annotation is a data.frame, these parameters should match column names of the annotation data.frame. Chromosome equals to chromosome name (e.g. chr1) and position to position coordinates. Genes and group are optional and equal to the genes near the probe and to the position of the probe in relation to the gene. Default argument values match names in IlluminaHumanMethylation450kanno.ilmn12.hg19 annotation package. Finally, if annotation is not specified, IlluminaHumanMethylation450kanno.ilmn12.hg19 package is used.

2.1 Compatibility with minfi

minfi objects can be directly used in prepareMethylationSet. If the object contains phenotypes, there is no need to supply it. Otherwise, it will be compulsory.

3 Probe analysis

This section (probe analysis) and the following (region analysis) describe the two main parts of the workflow. Despite they can be run as independent functions, it is recommended to use the general function DAPipeline, which is explained below. DAPipeline performs both analyses and it is designed in a more user-friendly way.

The initial approach when studying methylation was to find differentially methylated probes. The analysis chosen was to fit a linear model for each of the probes, taking into account the variable of interest as well as some covariates. Probes were usually sorted by statistical parameters such as coefficients p-value.

This approach has been implemented in MEAL but with some changes. As was proposed in (Du et al. 2010), M-values (logit2 transformation of beta values) are used to fit the model. In addition, a robust linear regression is used instead of a normal linear regression. This kind of regression is more robust to outliers and can obtain more accurate coefficients.

Before analysing the data, a quick look at the phenotypes will be done:

pData(set)

##                   Sample_Name Sample_Well Sample_Plate Sample_Group
## 5723646052_R02C02    GroupA_3          H5           NA       GroupA
## 5723646052_R04C01    GroupA_2          D5           NA       GroupA
## 5723646052_R05C02    GroupB_3          C6           NA       GroupB
## 5723646053_R04C02    GroupB_1          F7           NA       GroupB
## 5723646053_R05C02    GroupA_1          G7           NA       GroupA
## 5723646053_R06C02    GroupB_2          H7           NA       GroupB
##                   Pool_ID person age sex status  Array      Slide
## 5723646052_R02C02      NA    id3  83   M normal R02C02 5723646052
## 5723646052_R04C01      NA    id2  58   F normal R04C01 5723646052
## 5723646052_R05C02      NA    id3  83   M cancer R05C02 5723646052
## 5723646053_R04C02      NA    id1  75   F cancer R04C02 5723646053
## 5723646053_R05C02      NA    id1  75   F normal R05C02 5723646053
## 5723646053_R06C02      NA    id2  58   F cancer R06C02 5723646053
##                                                  Basename
## 5723646052_R02C02 ../extdata/5723646052/5723646052_R02C02
## 5723646052_R04C01 ../extdata/5723646052/5723646052_R04C01
## 5723646052_R05C02 ../extdata/5723646052/5723646052_R05C02
## 5723646053_R04C02 ../extdata/5723646053/5723646053_R04C02
## 5723646053_R05C02 ../extdata/5723646053/5723646053_R05C02
## 5723646053_R06C02 ../extdata/5723646053/5723646053_R06C02
##                                                 filenames
## 5723646052_R02C02 ../extdata/5723646052/5723646052_R02C02
## 5723646052_R04C01 ../extdata/5723646052/5723646052_R04C01
## 5723646052_R05C02 ../extdata/5723646052/5723646052_R05C02
## 5723646053_R04C02 ../extdata/5723646053/5723646053_R04C02
## 5723646053_R05C02 ../extdata/5723646053/5723646053_R05C02
## 5723646053_R06C02 ../extdata/5723646053/5723646053_R06C02

summary(pData(set))

##  Sample_Name        Sample_Well        Sample_Plate   Sample_Group      
##  Length:6           Length:6           Mode:logical   Length:6          
##  Class :character   Class :character   NA's:6         Class :character  
##  Mode  :character   Mode  :character                  Mode  :character  
##                                                                         
##                                                                         
##                                                                         
##  Pool_ID           person               age            sex           
##  Mode:logical   Length:6           Min.   :58.00   Length:6          
##  NA's:6         Class :character   1st Qu.:62.25   Class :character  
##                 Mode  :character   Median :75.00   Mode  :character  
##                                    Mean   :72.00                     
##                                    3rd Qu.:81.00                     
##                                    Max.   :83.00                     
##     status             Array               Slide          
##  Length:6           Length:6           Min.   :5.724e+09  
##  Class :character   Class :character   1st Qu.:5.724e+09  
##  Mode  :character   Mode  :character   Median :5.724e+09  
##                                        Mean   :5.724e+09  
##                                        3rd Qu.:5.724e+09  
##                                        Max.   :5.724e+09  
##    Basename          filenames        
##  Length:6           Length:6          
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##

The most interesting variables are person, age, sex and status. Age is numeric but the other ones, that should be factors, are indeed characters. The following analysis will be performed using the variable status, which defines samples of normal or cancer tissues.

Probe analysis can be done with DAProbe, which needs a MethylationSet and a matrix model.

mod <- model.matrix(~as.factor(status), data = pData(set))
proberes <- DAProbe(set = set, model = mod, method = "ls", coefficient = 2)
head(proberes)

##            intercept as.factor(status)normal          sd         t
## cg25937714 0.7305902              -0.5912205 0.019011867 -15.18472
## cg06340552 0.6815428              -0.6019749 0.013449414 -14.01203
## cg20450979 0.4797533              -0.4153843 0.009589855 -13.44973
## cg21245277 0.6422945              -0.5370338 0.015934441 -13.36087
## cg13355248 0.8088785              -0.7467653 0.013335034 -13.35681
## cg19333963 0.7364919              -0.6581963 0.015003345 -12.96130
##                 P.Value   adj.P.Val chromosome  position
## cg25937714 5.264155e-07 0.005426774       chr3  44036492
## cg06340552 9.574154e-07 0.005426774       chr4 142054329
## cg20450979 1.297019e-06 0.005426774       chr4 176923542
## cg21245277 1.362233e-06 0.005426774       chr2 185463218
## cg13355248 1.365300e-06 0.005426774      chr17  78449872
## cg19333963 1.704865e-06 0.005426774      chr19   1467979
##                              genes                       group
## cg25937714                                                    
## cg06340552           RNF150;RNF150               1stExon;5'UTR
## cg20450979 GPM6A;GPM6A;GPM6A;GPM6A 5'UTR;5'UTR;1stExon;1stExon
## cg21245277         ZNF804A;ZNF804A               1stExon;5'UTR
## cg13355248                   NPTX1                     1stExon
## cg19333963                    APC2                        Body

DAProbe returns a data.frame with the results of the linear analysis. In the example, method is set to “ls” (normal linear regression) in order to speed the tutorial, but default is robust regression. Coefficient indicates the coefficients of the model matrix whose results will be returned. If coefficient is a vector, a list of data.frames is returned.

4 Region analysis

MEAL includes three region analysis algorithms: bumphunter (Jaffe et al. 2012), DMRcate (Peters et al. 2015) and blockFinder. More information about these methods can be found at the corresponding packages.

DARegion needs a MethylationSet and a matrix model (the same that DAProbe):

regionres <- DARegion(set = set, model = mod, coefficient = 2)
names(regionres)

## [1] "bumphunter"  "blockFinder" "DMRcate"

head(regionres$bumphunter)

##         chr     start       end     value      area cluster indexStart
## 23969  chr6  29521023  29521756 -2.286410 13.718458   21494      23809
## 24390  chr6 118228078 118228871 -3.790524 11.371572   22512      25102
## 23124  chr4 142054254 142054329 -4.827935  9.655870   19467      21552
## 16223 chr10   1779432   1780184 -3.143640  9.430920    2715       2987
## 16907 chr11  30607068  30607360 -4.664209  9.328417    4439       4905
## 21929 chr20  61340107  61340542 -3.080326  9.240979   16372      18133
##       indexEnd L clusterL
## 23969    23814 6        6
## 24390    25104 3        3
## 23124    21553 2        2
## 16223     2989 3        3
## 16907     4906 2        2
## 21929    18135 3        3

head(regionres$blockFinder)

##       chr     start       end      value     area cluster indexStart
## 210  chr6  31904709  32411670  0.1379759 6.484868    1207      21656
## 207  chr6  28962770  30017803  0.1132343 4.982309    1207      21373
## 715 chr12 113345598 113417284 -1.3327353 3.998206    2228       6498
## 490 chr15  25230200  25513277  0.1477724 3.546538    2466       8428
## 35   chr1 152538857 153068236  0.2331429 3.497144     165       1690
## 212  chr6  33037444  33232191  0.1478054 3.103914    1207      21783
##     indexEnd  L clusterL
## 210    21734 47      250
## 207    21447 44      250
## 715     6500  3       12
## 490     8451 24       27
## 35      1704 15       33
## 212    21822 21      250

head(regionres$DMRcate)

##                          coord no.cpgs       minfdr     Stouffer
## 1369    chr6:29521023-29521756       6 2.087206e-41 2.858691e-07
## 208  chr10:118031870-118033115       4 3.848536e-54 8.255219e-07
## 1450  chr6:118228078-118228871       3 2.120233e-80 5.655719e-06
## 1324  chr5:168727752-168728076       3 5.861509e-53 8.606307e-06
## 612    chr16:51184205-51185110       3 2.247997e-25 4.416790e-05
## 750    chr19:12305854-12306303       3 9.694360e-34 5.004852e-05
##       maxbetafc meanbetafc
## 1369 -0.5047504 -0.3571560
## 208  -0.6028006 -0.5332195
## 1450 -0.6021535 -0.4881543
## 1324 -0.4638198 -0.4453066
## 612  -0.5757717 -0.3976473
## 750  -0.5706350 -0.3859342

DARegion returns a list of data.frames with the results of the different methods. methods parameter can be used to select the methods desired. If a method is not chosen, a NA value is returned in this position. Because DMRcate uses results from a linear model regression, these results can be computed using DAProbe and passed in the argument proberes.

Bumphunter and blockFinder can calculate p-values for the regions differentially detected using bootstraping. num_permutations arguments sets the number of permutations that will be done to estimate these p-values. By default it is set to 0, because its computation requires a lot of memory and it is time consuming.

Finally, it should be said that blockFinder method has been adapted from minfi package and it needs its annotation package (IlluminaHumanMethylation450kanno.ilmn12.hg19). If another annotation is used, blockFinder use is discouraged.

5 Whole methylome analysis

The first approach when studying methylation changes is to get an overall picture of the methylation state throughout the genome. This kind of analysis is performed by the function DAPipeline. The first analysis will be to evaluate the effect of cancer in methylation. Given that this variable is a character, it should be converted to factor using variable_types:

res <- DAPipeline(set = set, variable_names = "status", variable_types = "categorical", 
                      probe_method = "ls", verbose = FALSE)

## Your contrast returned 2135 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

## Demarcating regions...

## Done!

res

## class: AnalysisResults 
## original class: MethylationSet 
## features(2135): cg25937714 cg06340552 ... cg05073901 cg16246713
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
##   5723646053_R05C02 5723646053_R06C02
## variables:  status 
## model variables names:  status 
## covariables: none 
## Number of bumps: 26175 
## Number of blocks: NULL 
## Number of regions: 314 
## Number of differentially methylated probes: 2135

DAPipeline generates a AnalysisResults objects containing probe and region analysis results. Most of the parameters of this function are arguments of DAProbe and DARegion.

On the other hand, there are four important parameters in this function: variable_names, variable_types, covariable_names and covariable_types. These parameters define the variables that will be used as active variable (for which the results will be presented) and the variables used as covariates (variables that will enter in the model but not in the results). Class of the variables in set can be changed using variable_types and covariable_types. Available types are categorical (factor), continuous (numeric) and the three genetic models (dominant, recessive and additive).

When variables are defined in this way, the linear model created is additive: all the variables are summed. It is also possible to use a linear model containing interaction and other more complex features using the equation parameter:

complexres <- DAPipeline(set = set, variable_names = c("status", "sex"),
                             variable_types = c("categorical", "categorical"), 
                             probe_method = "ls", num_var = 3, verbose = FALSE,
                             equation = "~ status:sex + status + sex")

## Your contrast returned no individually significant probes. Set pcutoff manually in dmrcate() to return DMRs, but be warned there is an increased risk of Type I errors.

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

## Your contrast returned 152 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

## Demarcating regions...

## Done!

## Your contrast returned no individually significant probes. Set pcutoff manually in dmrcate() to return DMRs, but be warned there is an increased risk of Type I errors.

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

complexres

## class: AnalysisResults 
## original class: MethylationSet 
## features(152): cg27069132 cg23896353 ... cg04407470 cg03661529
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
##   5723646053_R05C02 5723646053_R06C02
## variables:  status, sex 
## model variables names:  statusnormal, sexM, statusnormal:sexM 
## covariables: none 
## Number of bumps: 26257, 26476, 26799 
## Number of blocks: NULL, NULL, NULL 
## Number of regions: 0, 37, 0 
## Number of differentially methylated probes: 0, 152, 0

When using equation, number of active variables must be explicit. They will be selected from the left of the equation to the right. Model matrices are created using model.matrix function, so if we introduce interactions, they will be after the single variables. In our example, status and sex both had two levels. Therefore, a dummy variable is created for each of the samples and the interaction is the third column. Consequently, num_var must be 3.

In addition, it should be noticed that variables used in interactions must be included alone in the equation and must be present in the variable_names arguments. Covariables can still be used by setting them in covariable_names, being added to the linear model.

5.1 Results access

There are several functions that allow to access analysis results:

#Bumphunter
head(bumps(res)[[1]])

##         chr     start       end     value      area cluster indexStart
## 23969  chr6  29521023  29521756 -2.286410 13.718458   21494      23809
## 24390  chr6 118228078 118228871 -3.790524 11.371572   22512      25102
## 23124  chr4 142054254 142054329 -4.827935  9.655870   19467      21552
## 16223 chr10   1779432   1780184 -3.143640  9.430920    2715       2987
## 16907 chr11  30607068  30607360 -4.664209  9.328417    4439       4905
## 21929 chr20  61340107  61340542 -3.080326  9.240979   16372      18133
##       indexEnd L clusterL
## 23969    23814 6        6
## 24390    25104 3        3
## 23124    21553 2        2
## 16223     2989 3        3
## 16907     4906 2        2
## 21929    18135 3        3

#BlockFinder
head(blocks(res)[[1]])

## [1] NA

#DMRcate
head(dmrCate(res)[[1]])

##                          coord no.cpgs       minfdr     Stouffer
## 1369    chr6:29521023-29521756       6 2.087206e-41 2.858691e-07
## 208  chr10:118031870-118033115       4 3.848536e-54 8.255219e-07
## 1450  chr6:118228078-118228871       3 2.120233e-80 5.655719e-06
## 1324  chr5:168727752-168728076       3 5.861509e-53 8.606307e-06
## 612    chr16:51184205-51185110       3 2.247997e-25 4.416790e-05
## 750    chr19:12305854-12306303       3 9.694360e-34 5.004852e-05
##       maxbetafc meanbetafc
## 1369 -0.5047504 -0.3571560
## 208  -0.6028006 -0.5332195
## 1450 -0.6021535 -0.4881543
## 1324 -0.4638198 -0.4453066
## 612  -0.5757717 -0.3976473
## 750  -0.5706350 -0.3859342

#Probe
head(probeResults(res)[[1]])

## [1] 0.7305902 0.6815428 0.4797533 0.6422945 0.8088785 0.7364919

#Region
names(regionResults(res))

## [1] "DMRCate"     "Bumphunter"  "BlockFinder"

All these functions return a list, even if it contains only one data.frame. regionResults contains a list with the results of the three per region methods. Export of the results is simplified with the function exportResults.

exportResults(res, dir = "./results")

This function creates csv files with all the results. If more than one variable is present, a subfolder for each variable is created.

5.2 Plotting

AnalysisResults incorporates many plotting facilities. plotFeature plots the beta values distribution of a CpG. plotFeature accepts a number with the CpG index or character with the name of a CpG.

plotFeature(res, 1)

Figure 1: Beta values of cg25937714 split by cancer status

Probe results can be used to plot a QQ-plot and a Manhattan plot. In the second one, CpGs inside a range can be highlighted by passing a GenomicRanges object with the range of interest.

plotQQ(res)

Figure 2: QQplot of the analysis.

range <- GRanges(seqnames = Rle("chr1"), 
                                ranges = IRanges(1000000, end = 10000000))
plotEWAS(res, range = range)

Figure 3: Manhattan plot with the CpGs of the range highlighted

In figure 3, the red line of the Manhattan plot is the significance threshold using Bonferroni.

6 Range analysis

A study of a region can be performed with DARegionAnalysis. Options are very similar to that of DAPipeline. In this situation, status will be the active variable and age will be a covariable. A GenomicRange must be supplied in order to delimit the region.

range <- GRanges(seqnames = Rle("chr12"), 
                                ranges = IRanges(70000000, end = 80000000))
region <- DARegionAnalysis(set = set, variable_names = "status", 
                               variable_types = "categorical", 
                               covariable_names = "age", range = range, 
                               verbose = FALSE)

## Your contrast returned 6 individually significant probes; a small but real effect. Consider manually setting the value of pcutoff to return more DMRs, but be warned that doing this increases the risk of Type I errors.

## Fitting chr12...

## Demarcating regions...

## Done!

## 'nperm' >= set of all permutations: complete enumeration.

## Set of permutations < 'minperm'. Generating entire set.

region

## class: AnalysisRegionResults 
## original class: MethylationSet 
## features(44): cg11054631 cg20090301 ... cg01585322 cg17278249
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
##   5723646053_R05C02 5723646053_R06C02
## variables:  status 
## model variables names:  status 
## covariables: age 
## Number of bumps: 41 
## Number of blocks: 0 
## Number of regions: 0 
## Number of differentially methylated probes: 6 
## Relevant snps(0):
## Snps Variance:  NA 
## Range:
##  Chr:  chr12     start:  70000000    end:  80000000 
## R2:  0.471 
## RDA P-value:  0.1 
## Region P-value:  0.353 
## Global R2:  0.419

DARegionAnalysis generates a AnalysisRegionResults, an heir of AnalysisResults, so they share getter functions.

#Bumphunter
head(bumps(region)[[1]])

##      chr    start      end     value     area cluster indexStart indexEnd
## 38 chr12 72667493 72667493 -3.008385 3.008385      20         20       20
## 27 chr12 77756101 77756101  1.886755 1.886755      37         37       37
## 15 chr12 72783379 72783379  1.501490 1.501490      21         21       21
## 29 chr12 78637464 78637464  1.478663 1.478663      39         39       39
## 31 chr12 78883559 78883559  1.451280 1.451280      41         41       41
## 12 chr12 72335228 72335228  1.374072 1.374072      17         17       17
##    L clusterL
## 38 1        1
## 27 1        1
## 15 1        1
## 29 1        1
## 31 1        1
## 12 1        1

#BlockFinder
head(blocks(region)[[1]])

## data frame with 0 columns and 0 rows

#DMRcate
head(dmrCate(region)[[1]])

## [1] coord      no.cpgs    minfdr     Stouffer   maxbetafc  meanbetafc
## <0 rows> (or 0-length row.names)

#Probe
head(probeResults(region)[[1]])

## [1] 0.8027510 0.3021116 0.8120773 0.4836036 0.1894649 0.3268627

#Region
names(regionResults(region))

## [1] "DMRCate"     "Bumphunter"  "BlockFinder"

6.1 Plotting

Distribution of coefficients of the region can be plotted using the plotRegion function. .

plotRegion(region)

Figure 4: Plot of the differential methylation for each CpG of the region.

In figure 4, green points are significant probes (p-value smaller than 0.05) while red points are non significant probes (p-value greater than 0.05). Blue lines are set at 0.05, a minimal difference to be considered differentially methylated.

6.2 Redundancy analysis

To evaluate the correlation of the methylation probes with the phenotype of interest, a redundancy analysis (RDA) is run. RDA is a multivariate analogue of linear regression. In our case, we will fit our matrix of methylation probes to a matrix of variables: \[ Y = A*X + \epsilon \] where Y are the methylation values of the probes of our region, A the matrix used to fit X to Y and X the matrix with the variables to fit. We can transform this equation to: \[ Y = \hat{Y} + Y_r \] where \(\hat{Y}\) is the fitted matrix Y, or the part of Y that can be explained by X and \(Y_r\) is the residual Y, or the part of Y that cannot be explained by X. The variance of the matrix Y can be split between \(\hat{Y}\) and \(Y_r\), and an R² can be computed, representing the amount of variance of Y that is explained by X.

In the print of the results, the R² and the p-value from the RDA analysis are returned. A plot can be done with plotRDA:

plotRDA(region)

Figure 5: RDA plot of the region.

In figure 5, points represent samples grouped by the active variables. Red crosses are the CpGs and CpG names are the CpGs most correlated to the RDA axes.

7 SNPs Data

In our example, SNPs data was not available. An example with SNPs can be found in caseExample vignette.

References

Aryee, Martin J, Andrew E Jaffe, Hector Corrada-Bravo, Christine Ladd-Acosta, Andrew P Feinberg, Kasper D Hansen, and Rafael A Irizarry. 2014. “Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays.” Bioinformatics (Oxford, England) 30 (10): 1363–9. doi:10.1093/bioinformatics/btu049.

Du, Pan, Warren A Kibbe, and Simon M Lin. 2008. “lumi: a pipeline for processing Illumina microarray.” Bioinformatics (Oxford, England) 24 (13): 1547–8. doi:10.1093/bioinformatics/btn224.

Du, Pan, Xiao Zhang, Chiang-Ching Huang, Nadereh Jafari, Warren A Kibbe, Lifang Hou, and Simon M Lin. 2010. “Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.” BMC Bioinformatics 11 (January): 587. doi:10.1186/1471-2105-11-587.

Jaffe, Andrew E, Peter Murakami, Hwajin Lee, Jeffrey T Leek, M Daniele Fallin, Andrew P Feinberg, and Rafael A Irizarry. 2012. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies.” International Journal of Epidemiology 41 (1): 200–209. doi:10.1093/ije/dyr238.

Peters, Timothy, Michael Buckley, Aaron Statham, Ruth Pidsley, Katherine Samaras, Reginald Lord, Susan Clark, and Peter Molloy. 2015. “De novo identification of differentially methylated regions in the human genome.” Epigenetics & Chromatin 8 (1): 6. doi:10.1186/1756-8935-8-6.

Introduction to MEAL

Carlos Ruiz, Carles Hernández, Juan R. González

Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
Bioinformatics Research Group in Epidemiolgy
(http://www.creal.cat/brge.htm)

2016-05-30

Package version: MEAL 1.2.3

Contents

1 Introduction

1.1 Data types

1.2 Example data

2 Read data

2.1 Compatibility with minfi

3 Probe analysis

4 Region analysis

5 Whole methylome analysis

5.1 Results access

5.2 Plotting

6 Range analysis

6.1 Plotting

6.2 Redundancy analysis

7 SNPs Data

References

Introduction to MEAL

Carlos Ruiz, Carles Hernández, Juan R. González

Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain Bioinformatics Research Group in Epidemiolgy (http://www.creal.cat/brge.htm)

2016-05-30

Package version: MEAL 1.2.3

Contents

1 Introduction

1.1 Data types

1.2 Example data

2 Read data

2.1 Compatibility with minfi

3 Probe analysis

4 Region analysis

5 Whole methylome analysis

5.1 Results access

5.2 Plotting

6 Range analysis

6.1 Plotting

6.2 Redundancy analysis

7 SNPs Data

References

Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
Bioinformatics Research Group in Epidemiolgy
(http://www.creal.cat/brge.htm)