Package: xcms
Authors: Johannes Rainer, Michael Witting
Modified: 2021-10-26 14:51:41
Compiled: Sun Nov 21 08:14:16 2021

1 Introduction

Metabolite identification is an important step in non-targeted metabolomics and requires different steps. One involves the use of tandem mass spectrometry to generate fragmentation spectra of detected metabolites (LC-MS/MS), which are then compared to fragmentation spectra of known metabolites. Different approaches exist for the generation of these fragmentation spectra, whereas the most used is data dependent acquisition (DDA) also known as the top-n method. In this method the top N most intense m/z values from a MS1 scan are selected for fragmentation in the next N scans before the cycle starts again. This method allows to generate clean MS2 fragmentation spectra on the fly during acquisition without the need for further experiments, but suffers from poor coverage of the detected metabolites (since only a limited number of ions are fragmented).

Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters MSe don’t use such a preselection, but rather fragment all detected molecules at once. They are using alternating schemes with scan of low and high collision energy to collect MS1 and MS2 data. Using this approach, there is no problem in coverage, but the relation between the precursor and fragment masses is lost leading to chimeric spectra. Sequential Window Acquisition of all Theoretical Mass Spectra (or SWATH [1]) combines both approaches through a middle-way approach. There is no precursor selection and acquisition is independent of acquired data, but rather than isolating all precusors at once, defined windows (i.e. ranges of m/z values) are used and scanned. This reduces the overlap of fragment spectra while still keeping a high coverage.

This document showcases the analysis of two small LC-MS/MS data sets using xcms. The data files used are reversed-phase LC-MS/MS runs from the Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH acquisition mode. For comparison a DDA file from the same sample is included.

2 Analysis of DDA data

Below we load the example DDA data set using the readMSData function from the MSnbase package.

library(xcms)

dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
                        package = "msdata")
dda_data <- readMSData(dda_file, mode = "onDisk")

The variable dda_data contains now all MS1 and MS2 spectra from the specified mzML file. The number of spectra for each MS level is listed below.

table(msLevel(dda_data))

## 
##    1    2 
## 1504 2238

For the MS2 spectra we can get the m/z of the precursor ion with the precursorMz function. Below we first filter the data set by MS level, extract the precursor m/z and call head to just show the first 6 elements. For easier readability we use the forward pipe operator %>% from the magrittr package.

library(magrittr)

dda_data %>%
    filterMsLevel(2L) %>%
    precursorMz() %>%
    head()

##  F1.S1570  F1.S1588  F1.S1592  F1.S1594  F1.S1595  F1.S1596 
## 130.96578 388.25426  89.93779  83.99569 371.22409 388.25226

With the precursorIntensity function it is also possible to extract the intensity of the precursor ion.

dda_data %>%
    filterMsLevel(2L) %>%
    precursorIntensity() %>%
    head()

## F1.S1570 F1.S1588 F1.S1592 F1.S1594 F1.S1595 F1.S1596 
##        0        0        0        0        0        0

Some manufacturers (like Sciex for the present test data) don’t define/export the precursor intensity and thus either NA or 0 is reported. We can however use the estimatePrecursorIntensity function from the xcms package to determine the precursor intensity for a MS 2 spectrum based on the intensity of the respective ion in the previous MS1 scan (note that with method = "interpolation" the precursor intensity would be defined based on interpolation between the intensity in the previous and subsequent MS1 scan). Below we estimate the precursor intensities, on the full data (for MS1 spectra a NA value is reported). Note also that we use xcms:: to call the function from the xcms package, because a function with the same name is also implemented in the Spectra package, which would however not support OnDiskMSnExp objects as input.

prec_int <- xcms::estimatePrecursorIntensity(dda_data)

We next set the precursor intensity in the spectrum metadata of dda_data. So that it can be extracted later with the precursorIntensity function.

fData(dda_data)$precursorIntensity <- prec_int

dda_data %>%
    filterMsLevel(2L) %>%
    precursorIntensity() %>%
    head()

##  F1.S1570  F1.S1588  F1.S1592  F1.S1594  F1.S1595  F1.S1596 
## 0.9691072 3.0772917 0.3885723 0.3215049 1.6329483 4.4057989

Next we perform the chromatographic peak detection on the MS level 1 data with the findChromPeaks method. Below we define the settings for a centWave-based peak detection and perform the analysis.

cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
                     peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp)

In total 112 peaks were identified in the present data set.

The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the corresponding MS2 spectra measured. Thus, for some of the ions (identified as MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the annotation of the respective MS1 chromatographic peaks (or MS1 features after a correspondence analysis). Spectra for identified chromatographic peaks can be extracted with the chromPeakSpectra method. MS2 spectra with their precursor m/z and retention time within the rt and m/z range of the chromatographic peak are returned. Parameter return.type allows to define in which format these are returned. With return.type = "List" or return.type = "Spectra" the data is represented by a Spectra object from the Spectra.

library(Spectra)
dda_spectra <- chromPeakSpectra(
    dda_data, msLevel = 2L, return.type = "Spectra")
dda_spectra

## MSn data (Spectra) with 150 spectra in a MsBackendMzR backend:
##            msLevel     rtime scanIndex
##          <integer> <numeric> <integer>
## F1.S1812         2   237.869      1812
## F1.S1846         2   241.299      1846
## F1.S2446         2   326.583      2446
## F1.S2450         2   327.113      2450
## F1.S2502         2   330.273      2502
## ...            ...       ...       ...
## F1.S5110         2   574.725      5110
## F1.S5115         2   575.255      5115
## F1.S5272         2   596.584      5272
## F1.S5236         2   592.424      5236
## F1.S5266         2   596.054      5266
##  ... 38 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML

By default chromPeakSpectra returns all spectra associated with a MS1 chromatographic peak, but parameter method allows to choose and return only one spectrum per peak (have a look at the ?chromPeakSpectra help page for more details). Also, it would be possible to extract MS1 spectra for each peak by specifying msLevel = 1L in the call above (e.g. to evaluate the full MS1 signal at the peak’s apex position).

In the example above we selected to return the data as a Spectra object. Spectra variables "peak_id" and "peak_index" contain the identifiers and the index (in the chromPeaks matrix) of the chromatographic peaks the MS2 spectrum is associated with.

dda_spectra$peak_id

##   [1] "CP004" "CP004" "CP005" "CP006" "CP006" "CP008" "CP008" "CP011" "CP011"
##  [10] "CP012" "CP012" "CP013" "CP013" "CP013" "CP013" "CP014" "CP014" "CP014"
##  [19] "CP014" "CP018" "CP022" "CP022" "CP022" "CP022" "CP025" "CP025" "CP025"
##  [28] "CP025" "CP026" "CP026" "CP026" "CP026" "CP033" "CP033" "CP034" "CP034"
##  [37] "CP034" "CP034" "CP034" "CP035" "CP035" "CP035" "CP041" "CP041" "CP041"
##  [46] "CP042" "CP042" "CP042" "CP042" "CP042" "CP043" "CP048" "CP048" "CP050"
##  [55] "CP050" "CP050" "CP050" "CP051" "CP051" "CP051" "CP052" "CP052" "CP052"
##  [64] "CP054" "CP055" "CP055" "CP056" "CP056" "CP056" "CP057" "CP057" "CP057"
##  [73] "CP057" "CP057" "CP061" "CP061" "CP061" "CP061" "CP065" "CP065" "CP066"
##  [82] "CP066" "CP067" "CP067" "CP069" "CP069" "CP069" "CP069" "CP070" "CP070"
##  [91] "CP070" "CP070" "CP070" "CP072" "CP072" "CP072" "CP073" "CP074" "CP074"
## [100] "CP074" "CP074" "CP075" "CP075" "CP075" "CP077" "CP077" "CP077" "CP079"
## [109] "CP079" "CP079" "CP079" "CP080" "CP080" "CP080" "CP081" "CP087" "CP087"
## [118] "CP087" "CP087" "CP087" "CP089" "CP089" "CP089" "CP090" "CP090" "CP090"
## [127] "CP092" "CP092" "CP094" "CP094" "CP095" "CP095" "CP095" "CP096" "CP096"
## [136] "CP096" "CP097" "CP097" "CP097" "CP099" "CP099" "CP099" "CP099" "CP099"
## [145] "CP100" "CP100" "CP100" "CP101" "CP102" "CP102"

Note also that with return.type = "List" a list parallel to the chromPeaks matrix would be returned, i.e. each element in that list would contain the spectra for the chromatographic peak with the same index. This data representation might eventually simplify further processing.

We next use the MS2 information to aid in the annotation of a chromatographic peak. As an example we use a chromatographic peak of an ion with an m/z of 304.1131 which we extract in the code block below.

ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)

##             mz    mzmin    mzmax      rt   rtmin   rtmax    into     intb
## CP057 304.1133 304.1126 304.1143 425.024 417.985 441.773 13040.8 12884.14
##           maxo sn sample
## CP057 3978.987 79      1

A search of potential ions with a similar m/z in a reference database (e.g. Metlin) returned a large list of potential hits, most with a very small ppm. For two of the hits, Flumazenil (Metlin ID 2724) and Fenamiphos (Metlin ID 72445) experimental MS2 spectra are available. Thus, we could match the MS2 spectrum for the identified chromatographic peak against these to annotate our ion. Below we extract all MS2 spectra that were associated with the candidate chromatographic peak using the ID of the peak in the present data set.

ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$peak_id == ex_id]
ex_spectra

## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
##            msLevel     rtime scanIndex
##          <integer> <numeric> <integer>
## F1.S3505         2   418.926      3505
## F1.S3510         2   419.306      3510
## F1.S3582         2   423.036      3582
## F1.S3603         2   423.966      3603
## F1.S3609         2   424.296      3609
##  ... 38 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML

There are 5 MS2 spectra representing fragmentation of the ion(s) measured in our candidate chromatographic peak. We next reduce this to a single MS2 spectrum using the combineSpectra method employing the combinePeaks function to determine which peaks to keep in the resulting spectrum (have a look at the ?combinePeaks help page for details). Parameter f allows to specify which spectra in the input object should be combined into one.

ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
                              peaks = "intersect", minProp = 0.8,
                              intensityFun = median, mzFun = median,
                              f = ex_spectra$peak_id)
ex_spectrum

## MSn data (Spectra) with 1 spectra in a MsBackendDataFrame backend:
##            msLevel     rtime scanIndex
##          <integer> <numeric> <integer>
## F1.S3505         2   418.926      3505
##  ... 38 more variables/columns.
## Processing:
##  Switch backend from MsBackendMzR to MsBackendDataFrame [Sun Nov 21 08:14:20 2021]

Mass peaks from all input spectra with a difference in m/z smaller 20 ppm (parameter ppm) were combined into one peak and the median m/z and intensity is reported for these. Due to parameter minProp = 0.8, the resulting MS2 spectrum contains only peaks that were present in 80% of the input spectra.

A plot of this consensus spectrum is shown below.

plotSpectra(ex_spectrum)

Figure 1: Consensus MS2 spectrum created from all measured MS2 spectra for ions of chromatographic peak CP53

We could now match the consensus spectrum against a database of MS2 spectra. In our example we simply load MS2 spectra for the two compounds with matching m/z exported from Metlin. For each of the compounds MS2 spectra created with collision energies of 0V, 10V, 20V and 40V are available. Below we import the respective data and plot our candidate spectrum against the MS2 spectra of Flumanezil and Fenamiphos (from a collision energy of 20V). To import files in MGF format we have to load the MsBackendMgf R package which adds MGF file support to the Spectra package. This package can be installed with BiocManager::install("RforMassSpectrometry/MsBackendMgf").

Prior plotting we normalize our experimental spectra.

norm_fun <- function(z, ...) {
    z[, "intensity"] <- z[, "intensity"] /
        max(z[, "intensity"], na.rm = TRUE) * 100
    z
}
ex_spectrum <- addProcessing(ex_spectrum, FUN = norm_fun)

library(MsBackendMgf)
flumanezil <- Spectra(
    system.file("mgf", "metlin-2724.mgf", package = "xcms"),
    source = MsBackendMgf())

## Start data import from 1 files ... done

fenamiphos <- Spectra(
    system.file("mgf", "metlin-72445.mgf", package = "xcms"),
    source = MsBackendMgf())

## Start data import from 1 files ... done

par(mfrow = c(1, 2))
plotSpectraMirror(ex_spectrum, flumanezil[3], main = "against Flumanezil",
                  ppm = 40)
plotSpectraMirror(ex_spectrum, fenamiphos[3], main = "against Fenamiphos",
                  ppm = 40)

Mirror plots for the candidate MS2 spectrum against Flumanezil (left) and Fenamiphos (right). The upper panel represents the candidate MS2 spectrum, the lower the target MS2 spectrum. Matching peaks are indicated with a dot.

Figure 2: Mirror plots for the candidate MS2 spectrum against Flumanezil (left) and Fenamiphos (right)
The upper panel represents the candidate MS2 spectrum, the lower the target MS2 spectrum. Matching peaks are indicated with a dot.

Our candidate spectrum matches Fenamiphos, thus, our example chromatographic peak represents signal measured for this compound. In addition to plotting the spectra, we can also calculate similarities between them with the compareSpectra method (which uses by default the normalized dot-product to calculate the similarity).

compareSpectra(ex_spectrum, flumanezil, ppm = 40)

## [1] 4.520957e-02 3.283806e-02 2.049379e-03 3.374354e-05

compareSpectra(ex_spectrum, fenamiphos, ppm = 40)

## [1] 0.1326234432 0.4879399946 0.7198406271 0.3997922658 0.0004876129
## [6] 0.0028408885 0.0071030051 0.0053809736

Clearly, the candidate spectrum does not match Flumanezil, while it has a high similarity to Fenamiphos. While we performed here the MS2-based annotation on a single chromatographic peak, this could be easily extended to the full list of MS2 spectra (returned by chromPeakSpectra) for all chromatographic peaks in an experiment. See also here.

In the present example we used only a single data file and we did thus not need to perform a sample alignment and correspondence analysis. These tasks could however be performed similarly to plain LC-MS data, retention times of recorded MS2 spectra would however also be adjusted during alignment based on the MS1 data. After correspondence analysis (peak grouping) MS2 spectra for features can be extracted with the featureSpectra function which returns all MS2 spectra associated with any chromatographic peak of a feature.

Note also that this workflow can be included into the Feature-Based Molecular Networking FBMN to match MS2 spectra against GNPS. See here for more details and examples.

3 SWATH data analysis

In this section we analyze a small SWATH data set consisting of a single mzML file with data from the same sample analyzed in the previous section but recorded in SWATH mode. We again read the data with the readMSData function. The resulting object will contain all recorded MS1 and MS2 spectra in the specified file.

swath_file <- system.file("TripleTOF-SWATH",
                          "PestMix1_SWATH.mzML",
                          package = "msdata")

swath_data <- readMSData(swath_file, mode = "onDisk")

Below we determine the number of MS level 1 and 2 spectra in the present data set.

table(msLevel(swath_data))

## 
##    1    2 
##  444 3556

As described in the introduction, in SWATH mode all ions within pre-defined isolation windows are fragmented and MS2 spectra measured. The definition of these isolation windows (SWATH pockets) is imported from the mzML files and stored in the object’s fData (which provides additional annotations for each individual spectrum). Below we inspect the respective information for the first few spectra. The upper and lower isolation window m/z can be extracted with the isolationWindowLowerMz and isolationWindowUpperMz.

head(fData(swath_data)[, c("isolationWindowTargetMZ",
                           "isolationWindowLowerOffset",
                           "isolationWindowUpperOffset",
                           "msLevel", "retentionTime")])

##          isolationWindowTargetMZ isolationWindowLowerOffset
## F1.S2000                  208.95                      21.95
## F1.S2001                  244.05                      14.15
## F1.S2002                  270.85                      13.65
## F1.S2003                  299.10                      15.60
## F1.S2004                  329.80                      16.10
## F1.S2005                  367.35                      22.45
##          isolationWindowUpperOffset msLevel retentionTime
## F1.S2000                      21.95       2       200.084
## F1.S2001                      14.15       2       200.181
## F1.S2002                      13.65       2       200.278
## F1.S2003                      15.60       2       200.375
## F1.S2004                      16.10       2       200.472
## F1.S2005                      22.45       2       200.569

head(isolationWindowLowerMz(swath_data))

## [1] 187.0 229.9 257.2 283.5 313.7 344.9

head(isolationWindowUpperMz(swath_data))

## [1] 230.9 258.2 284.5 314.7 345.9 389.8

In the present data set we use the value of the isolation window target m/z to define the individual SWATH pockets. Below we list the number of spectra that are recorded in each pocket/isolation window.

table(isolationWindowTargetMz(swath_data))

## 
## 163.75 208.95 244.05 270.85  299.1  329.8 367.35 601.85 
##    444    445    445    445    445    444    444    444

We have thus 1,000 MS2 spectra measured in each isolation window.

3.1 Chromatographic peak detection in MS1 and MS2 data

Similar to a conventional LC-MS analysis, we perform first a chromatographic peak detection (on the MS level 1 data) with the findChromPeaks method. Below we define the settings for a centWave-based peak detection and perform the analysis.

cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
                     peakwidth = c(3, 30))
swath_data <- findChromPeaks(swath_data, param = cwp)

Next we perform a chromatographic peak detection in the MS level 2 data of each isolation window. We use the findChromPeaksIsolationWindow function employing the same peak detection algorithm reducing however the required signal-to-noise ratio. The isolationWindow parameter allows to specify which MS2 spectra belong to which isolation window and hence defines in which set of MS2 spectra chromatographic peak detection should be performed. While the default value for this parameter uses isolation windows provided by calling isolationWindowTargetMz on the object, it would also be possible to manually define the isolation windows, e.g. if the corresponding information is not available in the input mzML files.

cwp <- CentWaveParam(snthresh = 3, noise = 10, ppm = 10,
                     peakwidth = c(3, 30))
swath_data <- findChromPeaksIsolationWindow(swath_data, param = cwp)

The findChromPeaksIsolationWindow function added all peaks identified in the individual isolation windows to the chromPeaks matrix containing already the MS1 chromatographic peaks. These newly added peaks can be identified by the value of the "isolationWindow" column in the corresponding row in chromPeakData, which lists also the MS level in which the peak was identified.

chromPeakData(swath_data)

## DataFrame with 368 rows and 6 columns
##        ms_level is_filled isolationWindow isolationWindowTargetMZ
##       <integer> <logical>        <factor>               <numeric>
## CP01          1     FALSE              NA                      NA
## CP02          1     FALSE              NA                      NA
## CP03          1     FALSE              NA                      NA
## CP04          1     FALSE              NA                      NA
## CP05          1     FALSE              NA                      NA
## ...         ...       ...             ...                     ...
## CP364         2     FALSE          601.85                  601.85
## CP365         2     FALSE          601.85                  601.85
## CP366         2     FALSE          601.85                  601.85
## CP367         2     FALSE          601.85                  601.85
## CP368         2     FALSE          601.85                  601.85
##       isolationWindowLowerMz isolationWindowUpperMz
##                    <numeric>              <numeric>
## CP01                      NA                     NA
## CP02                      NA                     NA
## CP03                      NA                     NA
## CP04                      NA                     NA
## CP05                      NA                     NA
## ...                      ...                    ...
## CP364                  388.8                  814.9
## CP365                  388.8                  814.9
## CP366                  388.8                  814.9
## CP367                  388.8                  814.9
## CP368                  388.8                  814.9

Below we count the number of chromatographic peaks identified within each isolation window (the number of chromatographic peaks identified in MS1 is 62).

table(chromPeakData(swath_data)$isolationWindow)

## 
## 163.75 208.95 244.05 270.85  299.1  329.8 367.35 601.85 
##      2     38     32     14    105     23     61     31

We thus successfully identified chromatographic peaks in the different MS levels and isolation windows, but don’t have any actual MS2 spectra yet. These have to be reconstructed from the available chromatographic peak data which we will done in the next section.

3.2 Reconstruction of MS2 spectra

Identifying the signal of the fragment ions for the precursor measured by each MS1 chromatographic peak is a non-trivial task. The MS2 spectrum of the fragment ion for each MS1 chromatographic peak has to be reconstructed from the available MS2 signal (i.e. the chromatographic peaks identified in MS level 2). For SWATH data, fragment ion signal should be present in the isolation window that contains the m/z of the precursor ion and the chromatographic peak shape of the MS2 chromatographic peaks of fragment ions of a specific precursor should have a similar retention time and peak shape than the precursor’s MS1 chromatographic peak.

After detection of MS1 and MS2 chromatographic peaks has been performed, we can reconstruct the MS2 spectra using the reconstructChromPeakSpectra function. This function defines an MS2 spectrum for each MS1 chromatographic peak based on the following approach:

Identify MS2 chromatographic peaks in the isolation window containing the m/z of the ion (the MS1 chromatographic peak) that have approximately the same retention time than the MS1 chromatographic peak (the accepted difference in retention time can be defined with the diffRt parameter).
Extract the MS1 chromatographic peak and all MS2 chromatographic peaks identified by the previous step and correlate the peak shapes of the candidate MS2 chromatographic peaks with the shape of the MS1 peak. MS2 chromatographic peaks with a correlation coefficient larger than minCor are retained.
Reconstruct the MS2 spectrum using the m/z of all above selected MS2 chromatographic peaks and their intensity; each MS2 chromatographic peak selected for an MS1 peak will thus represent one mass peak in the reconstructed spectrum.

To illustrate this process we perform the individual steps on the example of Fenamiphos (exact mass 303.105800777 and m/z of [M+H]+ adduct 304.113077). As a first step we extract the chromatographic peak for this ion.

fenamiphos_mz <- 304.113077
fenamiphos_ms1_peak <- chromPeaks(swath_data, mz = fenamiphos_mz, ppm = 2)
fenamiphos_ms1_peak

##            mz    mzmin    mzmax      rt   rtmin   rtmax     into     intb
## CP34 304.1124 304.1121 304.1126 423.945 419.445 428.444 10697.34 10688.34
##          maxo  sn sample
## CP34 2401.849 618      1

Next we identify all MS2 chromatographic peaks that were identified in the isolation window containing the m/z of Fenamiphos. The information on the isolation window in which a chromatographic peak was identified is available in the chromPeakData (which contains arbitrary additional annotations to each individual chromatographic peak).

keep <- chromPeakData(swath_data)$isolationWindowLowerMz < fenamiphos_mz &
        chromPeakData(swath_data)$isolationWindowUpperMz > fenamiphos_mz

We also require the retention time of the MS2 chromatographic peaks to be similar to the retention time of the MS1 peak and extract the corresponding peak information.

keep <- keep &
    chromPeaks(swath_data)[, "rtmin"] < fenamiphos_ms1_peak[, "rt"] &
    chromPeaks(swath_data)[, "rtmax"] > fenamiphos_ms1_peak[, "rt"]

fenamiphos_ms2_peak <- chromPeaks(swath_data)[which(keep), ]

In total 24 MS2 chromatographic peaks match all the above condition. Next we extract their corresponding ion chromatograms, as well as the ion chromatogram of the MS1 peak. In addition we have to filter the object first by isolation window, keeping only spectra that were measured in that specific window and to specify to extract the chromatographic data from MS2 spectra (with msLevel = 2L).

rtr <- fenamiphos_ms1_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms1_peak[, c("mzmin", "mzmax")]
fenamiphos_ms1_chr <- chromatogram(swath_data, rt = rtr, mz = mzr)

rtr <- fenamiphos_ms2_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms2_peak[, c("mzmin", "mzmax")]
fenamiphos_ms2_chr <- chromatogram(
    filterIsolationWindow(swath_data, mz = fenamiphos_mz),
    rt = rtr, mz = mzr, msLevel = 2L)

We can now plot the extracted ion chromatogram of the MS1 and the extracted MS2 data.

plot(rtime(fenamiphos_ms1_chr[1, 1]),
     intensity(fenamiphos_ms1_chr[1, 1]),
     xlab = "retention time [s]", ylab = "intensity", pch = 16,
     ylim = c(0, 5000), col = "blue", type = "b", lwd = 2)
#' Add data from all MS2 peaks
tmp <- lapply(fenamiphos_ms2_chr@.Data,
              function(z) points(rtime(z), intensity(z),
                                 col = "#00000080",
                                 type = "b", pch = 16))

Figure 3: Extracted ion chromatograms for Fenamiphos from MS1 (blue) and potentially related signal in MS2 (grey)

Next we can calculate correlations between the peak shapes of each MS2 chromatogram with the MS1 peak. We perform the correlation below for one of the MS2 chromatographic peaks. Note that, because spectra are recorded consecutively, the retention times of the individual data points will differ for the MS2 and MS1 chromatographic data and data points have thus to be matched (aligned) before performing the correlation analysis. This is done automatically by the correlate function. See the help for the align method for more information on alignment options.

correlate(fenamiphos_ms2_chr[1, 1],
          fenamiphos_ms1_chr[1, 1], align = "approx")

## Warning: '.local' is deprecated.
## Use 'compareChromatograms' instead.
## See help("Deprecated")

## [1] 0.9997871

After identifying the MS2 chromatographic peaks with shapes of enough high similarity to the MS1 chromatographic peaks, an MS2 spectrum could be reconstructed based on the m/z and intensities of the MS2 chromatographic peaks.

The reconstructChromPeakSpectra function performs the above analysis for each individual MS1 chromatographic peak in a SWATH data set. Below we reconstruct MS2 spectra for our example data requiring a peak shape correlation higher than 0.9 between the candidate MS2 chromatographic peak and the target MS1 chromatographic peak. Again, we use return.type = "Spectra" to return the results as a Spectra object (instead to the default, but older/obsolete MSpectra object).

swath_spectra <- reconstructChromPeakSpectra(swath_data, minCor = 0.9,
                                             return.type = "Spectra")
swath_spectra

## MSn data (Spectra) with 62 spectra in a MsBackendDataFrame backend:
##       msLevel     rtime scanIndex
##     <integer> <numeric> <integer>
## 1           2   239.458        NA
## 2           2   240.358        NA
## 3           2   329.577        NA
## 4           2   329.771        NA
## 5           2   346.164        NA
## ...       ...       ...       ...
## 58          2   551.735        NA
## 59          2   551.735        NA
## 60          2   575.134        NA
## 61          2   575.134        NA
## 62          2   574.942        NA
##  ... 20 more variables/columns.
## Processing:
##  Merge 1 Spectra into one [Sun Nov 21 08:14:46 2021]

As a result we got a Spectra object of length equal to the number of MS1 peaks in our data. A peaksCount of 0 indicates that no MS2 spectrum could be defined based on the used settings. For reconstructed spectra additional annotations are available such as the IDs of the MS2 chromatographic peaks from which the spectrum was reconstructed ("ms2_peak_id") as well as the correlation coefficient of their chromatographic peak shape with the precursor’s shape ("ms2_peak_cor"). Metadata column "peak_id" contains the ID of the MS1 chromatographic peak:

swath_spectra$ms2_peak_id

## CharacterList of length 62
## [[1]] character(0)
## [[2]] character(0)
## [[3]] CP063
## [[4]] CP105
## [[5]] CP153
## [[6]] character(0)
## [[7]] character(0)
## [[8]] character(0)
## [[9]] character(0)
## [[10]] character(0)
## ...
## <52 more elements>

swath_spectra$peak_id

##  [1] "CP01" "CP02" "CP03" "CP04" "CP05" "CP06" "CP07" "CP08" "CP09" "CP10"
## [11] "CP11" "CP12" "CP13" "CP14" "CP15" "CP16" "CP17" "CP18" "CP19" "CP20"
## [21] "CP21" "CP22" "CP23" "CP24" "CP25" "CP26" "CP27" "CP28" "CP29" "CP30"
## [31] "CP31" "CP32" "CP33" "CP34" "CP35" "CP36" "CP37" "CP38" "CP39" "CP40"
## [41] "CP41" "CP42" "CP43" "CP44" "CP45" "CP46" "CP47" "CP48" "CP49" "CP50"
## [51] "CP51" "CP52" "CP53" "CP54" "CP55" "CP56" "CP57" "CP58" "CP59" "CP60"
## [61] "CP61" "CP62"

We next extract the MS2 spectrum for our example peak most likely representing [M+H]+ ions of Fenamiphos using its chromatographic peak ID:

fenamiphos_swath_spectrum <- swath_spectra[
    swath_spectra$peak_id == rownames(fenamiphos_ms1_peak)]

We can now compare the reconstructed spectrum to the example consensus spectrum from the DDA experiment in the previous section (variable ex_spectrum) as well as to the MS2 spectrum for Fenamiphos from Metlin (with a collision energy of 10V). For better visualization we normalize also the peak intensities of the reconstructed SWATH spectrum with the same function we used for the experimental DDA spectrum.

fenamiphos_swath_spectrum <- addProcessing(fenamiphos_swath_spectrum,
                                           norm_fun)

par(mfrow = c(1, 2))
plotSpectraMirror(fenamiphos_swath_spectrum, ex_spectrum,
     ppm = 50, main = "against DDA")
plotSpectraMirror(fenamiphos_swath_spectrum, fenamiphos[2],
     ppm = 50, main = "against Metlin")

Mirror plot comparing the reconstructed MS2 spectrum for Fenamiphos (upper panel) against the measured spectrum from the DDA data and the Fenamiphhos spectrum from Metlin.

Figure 4: Mirror plot comparing the reconstructed MS2 spectrum for Fenamiphos (upper panel) against the measured spectrum from the DDA data and the Fenamiphhos spectrum from Metlin

If we wanted to get the EICs for the MS2 chromatographic peaks used to generate this MS2 spectrum we can use the IDs of these peaks which are provided with $ms2_peak_id of the result spectrum.

pk_ids <- fenamiphos_swath_spectrum$ms2_peak_id[[1]]
pk_ids

##  [1] "CP199" "CP201" "CP211" "CP208" "CP200" "CP202" "CP217" "CP215" "CP205"
## [10] "CP212" "CP221" "CP223" "CP213" "CP207" "CP220"

With these peak IDs available we can extract their retention time window and m/z ranges from the chromPeaks matrix and use the chromatogram function to extract their EIC. Note however that for SWATH data we have MS2 signal from different isolation windows. Thus we have to first filter the swath_data object by the isolation window containing the precursor m/z with the filterIsolationWindow to subset the data to MS2 spectra related to the ion of interest. In addition, we have to use msLevel = 2L in the chromatogram call because chromatogram extracts by default only data from MS1 spectra.

rt_range <- chromPeaks(swath_data)[pk_ids, c("rtmin", "rtmax")]
mz_range <- chromPeaks(swath_data)[pk_ids, c("mzmin", "mzmax")]

pmz <- precursorMz(fenamiphos_swath_spectrum)[1]
swath_data_iwindow <- filterIsolationWindow(swath_data, mz = pmz)

## Warning in object[keep]: Removed preprocessing results

ms2_eics <- chromatogram(swath_data_iwindow, rt = rt_range,
                         mz = mz_range, msLevel = 2L)

Each row of this ms2_eics contains now the EIC of one of the MS2 chromatographic peaks.

As a second example we analyze the signal from an [M+H]+ ion with an m/z of 376.0381 (which would match Prochloraz). We first identify the MS1 chromatographic peak for that m/z and retrieve the reconstructed MS2 spectrum for that peak.

prochloraz_mz <- 376.0381

prochloraz_ms1_peak <- chromPeaks(swath_data, msLevel = 1L,
                                  mz = prochloraz_mz, ppm = 5)
prochloraz_ms1_peak

##            mz   mzmin    mzmax      rt   rtmin   rtmax     into     intb
## CP22 376.0373 376.037 376.0374 405.046 401.446 409.546 3664.051 3655.951
##          maxo  sn sample
## CP22 897.3923 278      1

prochloraz_swath_spectrum <- swath_spectra[
    swath_spectra$peak_id == rownames(prochloraz_ms1_peak)]

In addition we identify the corresponding MS1 peak in the DDA data set, extract all measured MS2 chromatographic peaks and build the consensus spectrum from these.

prochloraz_dda_peak <- chromPeaks(dda_data, msLevel = 1L,
                                  mz = prochloraz_mz, ppm = 5)
prochloraz_dda_peak

##             mz    mzmin    mzmax      rt   rtmin   rtmax     into     intb
## CP034 376.0385 376.0378 376.0391 405.715 400.346 410.555 5088.528 5078.727
##           maxo  sn sample
## CP034 1350.633 436      1

The retention times for the chromatographic peaks from the DDA and SWATH data match almost perfectly. Next we get the MS2 spectra for this peak.

prochloraz_dda_spectra <- dda_spectra[
    dda_spectra$peak_id == rownames(prochloraz_dda_peak)]
prochloraz_dda_spectra

## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
##            msLevel     rtime scanIndex
##          <integer> <numeric> <integer>
## F1.S3253         2   401.438      3253
## F1.S3259         2   402.198      3259
## F1.S3306         2   404.677      3306
## F1.S3316         2   405.127      3316
## F1.S3325         2   405.877      3325
##  ... 38 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML

In total 5 spectra were measured, some with a relatively high number of peaks. Next we combine them into a consensus spectrum.

prochloraz_dda_spectrum <- combineSpectra(
    prochloraz_dda_spectra, FUN = combinePeaks, ppm = 20,
    peaks = "intersect", minProp = 0.8, intensityFun = median, mzFun = median,
    f = prochloraz_dda_spectra$peak_id)

## Backend of the input object is read-only, will change that to an 'MsBackendDataFrame'

At last we load also the Prochloraz MS2 spectra (for different collision energies) from Metlin.

prochloraz <- Spectra(
    system.file("mgf", "metlin-68898.mgf", package = "xcms"),
    source = MsBackendMgf())

## Start data import from 1 files ... done

To validate the reconstructed spectrum we plot it against the corresponding DDA spectrum and the MS2 spectrum for Prochloraz (for a collision energy of 10V) from Metlin.

prochloraz_swath_spectrum <- addProcessing(prochloraz_swath_spectrum, norm_fun)
prochloraz_dda_spectrum <- addProcessing(prochloraz_dda_spectrum, norm_fun)

par(mfrow = c(1, 2))
plotSpectraMirror(prochloraz_swath_spectrum, prochloraz_dda_spectrum,
                  ppm = 40, main = "against DDA")
plotSpectraMirror(prochloraz_swath_spectrum, prochloraz[2],
                  ppm = 40, main = "against Metlin")

Mirror plot comparing the reconstructed MS2 spectrum for Prochloraz (upper panel) against the measured spectrum from the DDA data and the Prochloraz spectrum from Metlin.

Figure 5: Mirror plot comparing the reconstructed MS2 spectrum for Prochloraz (upper panel) against the measured spectrum from the DDA data and the Prochloraz spectrum from Metlin

The spectra fit relatively well. Interestingly, the peak representing the precursor (the right-most peak) seems to have a slightly shifted m/z value in the reconstructed spectrum.

Similar to the DDA data, the reconstructed MS2 spectra from SWATH data could be used in the annotation of the MS1 chromatographic peaks.

4 Outlook

Currently, spectra data representation, handling and processing is being re-implemented as part of the RforMassSpectrometry initiative aiming at increasing the performance of methods and simplifying their use. Thus, parts of the workflow described here will be changed (improved) in future.

Along with these developments, improved matching strategies for larger data sets will be implemented as well as functionality to compare Spectra directly to reference MS2 spectra from public annotation resources (e.g. Massbank or HMDB). See for example here for more information.

Regarding SWATH data analysis, future development will involve improved selection of the correct MS2 chromatographic peaks considering also correlation with intensity values across several samples.

5 Session information

sessionInfo()

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MsBackendMgf_1.2.0     magrittr_2.0.1         pander_0.6.4          
##  [4] Spectra_1.4.1          MassSpecWavelet_1.60.0 waveslim_1.8.2        
##  [7] pheatmap_1.0.12        faahKO_1.34.0          MsFeatures_1.2.0      
## [10] xcms_3.16.1            MSnbase_2.20.2         ProtGenerics_1.26.0   
## [13] S4Vectors_0.32.3       mzR_2.28.0             Rcpp_1.0.7            
## [16] Biobase_2.54.0         BiocGenerics_0.40.0    BiocParallel_1.28.1   
## [19] BiocStyle_2.22.0      
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.0                    bitops_1.0-7               
##  [3] matrixStats_0.61.0          progress_1.2.2             
##  [5] doParallel_1.0.16           RColorBrewer_1.1-2         
##  [7] GenomeInfoDb_1.30.0         tools_4.1.2                
##  [9] bslib_0.3.1                 utf8_1.2.2                 
## [11] R6_2.5.1                    affyio_1.64.0              
## [13] DBI_1.1.1                   colorspace_2.0-2           
## [15] prettyunits_1.1.1           tidyselect_1.1.1           
## [17] compiler_4.1.2              preprocessCore_1.56.0      
## [19] DelayedArray_0.20.0         bookdown_0.24              
## [21] sass_0.4.0                  scales_1.1.1               
## [23] DEoptimR_1.0-9              robustbase_0.93-9          
## [25] affy_1.72.0                 stringr_1.4.0              
## [27] digest_0.6.28               rmarkdown_2.11             
## [29] XVector_0.34.0              pkgconfig_2.0.3            
## [31] htmltools_0.5.2             MatrixGenerics_1.6.0       
## [33] highr_0.9                   fastmap_1.1.0              
## [35] limma_3.50.0                rlang_0.4.12               
## [37] impute_1.68.0               farver_2.1.0               
## [39] jquerylib_0.1.4             generics_0.1.1             
## [41] jsonlite_1.7.2              mzID_1.32.0                
## [43] dplyr_1.0.7                 RCurl_1.98-1.5             
## [45] GenomeInfoDbData_1.2.7      MALDIquant_1.20            
## [47] Matrix_1.3-4                munsell_0.5.0              
## [49] fansi_0.5.0                 MsCoreUtils_1.6.0          
## [51] lifecycle_1.0.1             vsn_3.62.0                 
## [53] stringi_1.7.5               yaml_2.2.1                 
## [55] MASS_7.3-54                 SummarizedExperiment_1.24.0
## [57] zlibbioc_1.40.0             plyr_1.8.6                 
## [59] grid_4.1.2                  parallel_4.1.2             
## [61] crayon_1.4.2                lattice_0.20-45            
## [63] hms_1.1.1                   magick_2.7.3               
## [65] knitr_1.36                  pillar_1.6.4               
## [67] GenomicRanges_1.46.1        codetools_0.2-18           
## [69] XML_3.99-0.8                glue_1.5.0                 
## [71] evaluate_0.14               pcaMethods_1.86.0          
## [73] BiocManager_1.30.16         vctrs_0.3.8                
## [75] foreach_1.5.1               gtable_0.3.0               
## [77] RANN_2.6.1                  purrr_0.3.4                
## [79] clue_0.3-60                 assertthat_0.2.1           
## [81] ggplot2_3.3.5               xfun_0.28                  
## [83] ncdf4_1.17.1                tibble_3.1.6               
## [85] iterators_1.0.13            IRanges_2.28.0             
## [87] cluster_2.1.2               ellipsis_0.3.2

References

1. Ludwig C, Gillet L, Rosenberger G, Amon S, Collins BC, Aebersold R: Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Molecular systems biology 2018, 14:e8126.

LC-MS/MS data analysis with xcms

21 November 2021

Package