Working with MSnSets

library(MSnbase)
library(camprotR)
library(dplyr)

Introduction

What is an MSnSet? To quote from MSnbase:

The MSnSet class is derived from the Biobase::eSet class and mimics the Biobase::ExpressionSet class classically used for microarray data.

This class description is dense and hard to follow for the uninitiated. The MSnbase package already includes a vignette describing MSnSets, but it may also be hard for beginners to understand.

Here, I will describe an MSnSet in my own words.

An MSnSet is a container, loosely comparable to a list (specifically, it is an S4 object), that holds information about an MS experiment.

To better understand MSnSets we first need to define some terminology. In a quantitative proteomics experiment we are analysing ‘samples’ from different experimental conditions via MS, e.g. comparing ‘samples’ from cells treated with a drug versus a control. The quantitative data we eventually obtain consists of measurements of ‘features’. In this context, features can be PSMs, peptides, or proteins.

MSnSets contain multiple objects of different types:

The quantitative data (‘assay data’; in a numeric matrix)
The associated ‘feature data’ for each row in the quantitative data (in a Biobase::AnnotatedDataFrame)
The associated ‘phenotype data’ (sample metadata describing the MS experiment) for each column in the quantitative data (in a Biobase::AnnotatedDataFrame)

These underlying objects must have a specific structure:

The number of rows in assayData must match the number of rows in featureData and the row names must match exactly.
The number of columns in assayData must match the number of rows in phenoData and the column/row names must match exactly.
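
The two rules above can be checked on plain R objects before any MSnSet exists. The following is a minimal sketch using base R only; the toy_* objects and their contents are invented for illustration and are not part of MSnbase or camprotR.

```r
# Toy data (hypothetical): 3 features (PSMs) quantified in 2 samples
toy_exprs <- matrix(
  c(10, 20, 30, 40, 50, 60),
  nrow = 3,
  dimnames = list(c("psm1", "psm2", "psm3"), c("s1", "s2"))
)

# One row of feature metadata per row of toy_exprs
toy_fdata <- data.frame(
  sequence = c("AAK", "CCR", "DDK"),
  row.names = c("psm1", "psm2", "psm3")
)

# One row of sample metadata per column of toy_exprs
toy_pdata <- data.frame(
  treatment = c("ctrl", "trt"),
  row.names = c("s1", "s2")
)

# Rule 1: row names of the assay data match row names of the feature data
rows_ok <- identical(rownames(toy_exprs), rownames(toy_fdata))

# Rule 2: column names of the assay data match row names of the sample data
cols_ok <- identical(colnames(toy_exprs), rownames(toy_pdata))

c(rows_ok = rows_ok, cols_ok = cols_ok)
```

identical() is used rather than == because it also fails when the two sets of names differ in length or order.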

Figure: Dimension requirements for the assayData (a.k.a. expression data), featureData, and phenoData (a.k.a. sample data) slots. Adapted from the MSnbase vignette.

Conveniently, the MSnbase package comes with some example MSnSets. In this vignette we will explore the msnset MSnSet. This data set is from an iTRAQ 4-plex experiment wherein BSA and Enolase have been spiked into a background of Erwinia proteins. See ?msnset for more information.

Exploring an MSnSet

My favourite way to have a look at anything in R is to use the str() function to explore an object’s structure. Here we look at msnset, which is an MSnSet with 9 ‘slots’ that each contain some sort of object.

str(msnset, max.level = 2)
#> Formal class 'MSnSet' [package "MSnbase"] with 9 slots
#>   ..@ experimentData   :Formal class 'MIAPE' [package "MSnbase"] with 30 slots
#>   ..@ processingData   :Formal class 'MSnProcess' [package "MSnbase"] with 10 slots
#>   ..@ qual             :'data.frame':    220 obs. of  7 variables:
#>   ..@ assayData        :<environment: 0x55d014466db0> 
#>   ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#>   ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#>   ..@ annotation       : chr "No annotation"
#>   ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#>   ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot

Each slot in msnset is described in detail in the ‘MSnSet slots’ section below. For now we will only concern ourselves with the assayData, featureData, and phenoData slots.

assayData

The assayData slot contains the quantitative data from the experiment, i.e. how much of each feature (spectra/PSM, peptide, or protein) was detected in each sample. This is the essential part of an MSnSet. All other slots are optional.

We can extract this information from the MSnSet into a numeric matrix with the exprs() function.

msnset_exprs <- exprs(msnset)

Let’s look at its structure.

str(msnset_exprs)
#>  num [1:55, 1:4] 1348 740 27638 31893 26144 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:55] "X1" "X10" "X11" "X12" ...
#>   ..$ : chr [1:4] "iTRAQ4.114" "iTRAQ4.115" "iTRAQ4.116" "iTRAQ4.117"

It is a 55 by 4 numeric matrix. It contains reporter ion intensities from 4 iTRAQ tags across 55 different PSMs.

Each column of this matrix refers to an iTRAQ tag which corresponds to an individual ‘sample’. Each row of this matrix corresponds to a ‘feature’ which in this case is a PSM. The numbers indicate the intensity of the reporter ion from a particular tag (i.e. sample) in a particular PSM.
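
The same information can be read from the MSnSet itself without extracting the matrix first. This is a small sketch, assuming the msnset example data can be loaded with data(msnset); dim(), featureNames() and sampleNames() are the accessors listed in the ‘MSnSet slots’ section.

```r
library(MSnbase)
data(msnset)

# number of features (rows) and samples (columns)
dim(msnset)

# row names and column names of the assay data
head(featureNames(msnset))
sampleNames(msnset)

# reporter ion intensities of the first three PSMs in the 114 channel
exprs(msnset)[1:3, "iTRAQ4.114"]
```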

featureData

The featureData slot contains metadata about the ‘features’ (e.g. PSMs, peptides, proteins).

We can extract this information from the MSnSet into a data.frame with the fData() function.

msnset_fdata <- fData(msnset)

Let’s look at its structure.

str(msnset_fdata)
#> 'data.frame':    55 obs. of  15 variables:
#>  $ spectrum           : int  1 10 11 12 13 14 15 16 17 18 ...
#>  $ ProteinAccession   : Factor w/ 40 levels "BSA","ECA0172",..: 1 18 35 30 17 9 37 38 32 24 ...
#>  $ ProteinDescription : Factor w/ 39 levels "30S ribosomal subunit protein S7",..: 12 21 6 13 35 28 10 11 20 24 ...
#>  $ PeptideSequence    : Factor w/ 47 levels "AADALLLK","AAGHDGK",..: 30 46 35 37 34 10 16 43 7 41 ...
#>  $ file               : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ retention.time     : num  1149 1503 1664 1664 1664 ...
#>  $ precursor.mz       : num  521 574 402 568 488 ...
#>  $ precursor.intensity: num  3449020 7849420 41253600 23549500 13025200 ...
#>  $ charge             : int  2 3 2 2 2 2 3 2 2 2 ...
#>  $ peaks.count        : int  1922 1376 1571 2397 2574 1829 1875 2928 1371 2075 ...
#>  $ tic                : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ ionCount           : num  2.64e+07 2.45e+07 2.31e+08 2.47e+08 2.07e+08 ...
#>  $ ms.level           : int  2 2 2 2 2 2 2 2 2 2 ...
#>  $ acquisition.number : int  2 11 12 13 14 15 16 17 18 19 ...
#>  $ collision.energy   : num  40 40 40 40 40 40 40 40 40 40 ...

It is a 55 by 15 data.frame. It contains metadata about each ‘feature’, which are PSMs in this case. The type of metadata included is entirely arbitrary and there can be as many or as few columns as you want.

Each column of this data.frame refers to a particular type of metadata. Each row corresponds to a ‘feature’, which in this case is a PSM. Thus, the number of rows in featureData is the same as the number of rows in assayData. Also note that the row names of featureData exactly match the row names of assayData.

rownames(exprs(msnset)) == rownames(fData(msnset))
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
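
With many features, a vector of 55 (or 5000) TRUE values is hard to scan. As a sketch of a more compact check, identical() collapses the comparison into a single TRUE or FALSE; this again assumes msnset is loaded via data(msnset).

```r
library(MSnbase)
data(msnset)

# a single logical value: TRUE only if every row name matches, in order
names_match <- identical(rownames(exprs(msnset)), rownames(fData(msnset)))
names_match
```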

phenoData

The phenoData slot contains metadata about the ‘samples’.

We can extract this information from the MSnSet into a data.frame with the pData() function.

msnset_pdata <- pData(msnset)

Let’s look at its structure.

str(msnset_pdata)
#> 'data.frame':    4 obs. of  2 variables:
#>  $ mz       : num  114 115 116 117
#>  $ reporters: Factor w/ 1 level "iTRAQ4": 1 1 1 1

It is a 4 by 2 data.frame. It contains metadata about each ‘sample’, which are iTRAQ tags in this case. The type of metadata included is entirely arbitrary and there can be as many or as few columns as you want.

Each column of this data.frame refers to a particular type of metadata. Each row corresponds to a ‘sample’, which in this case is an iTRAQ tag. Thus, the number of rows in phenoData is the same as the number of columns in assayData. Also note that the row names of phenoData exactly match the column names of assayData.

colnames(exprs(msnset)) == rownames(pData(msnset))
#> [1] TRUE TRUE TRUE TRUE
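
One practical consequence of these matching rules is that an MSnSet can be subset like a matrix, and the feature and sample metadata are subset along with the quantitative data. The sketch below assumes msnset is loaded via data(msnset); msnset_small is an illustrative name.

```r
library(MSnbase)
data(msnset)

# keep the first 10 PSMs (rows) and the first 2 iTRAQ channels (columns)
msnset_small <- msnset[1:10, 1:2]

dim(exprs(msnset_small))  # quantitative data: 10 rows, 2 columns
nrow(fData(msnset_small)) # feature metadata: one row per remaining PSM
nrow(pData(msnset_small)) # sample metadata: one row per remaining channel
```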

Making an MSnSet

In the previous section we explored a small example MSnSet supplied with MSnbase. Here we will construct our own MSnSet. A small PSMs.txt Proteome Discoverer (PD) table from a TMT 10-plex experiment is provided with the camprotR package, and we will turn it into an MSnSet.

The input data

Let’s have a look at our PSM data from PD. It is a data.frame.

str(psm_tmt_total)
#> 'data.frame':    5000 obs. of  50 variables:
#>  $ PSMs.Workflow.ID                 : int  -219 -219 -219 -219 -219 -219 -219 -219 -219 -219 ...
#>  $ PSMs.Peptide.ID                  : int  2151442 2803862 3972302 1821657 3972362 3972741 1821147 5621814 3973096 5921937 ...
#>  $ Checked                          : chr  "False" "False" "False" "False" ...
#>  $ Confidence                       : chr  "High" "High" "High" "High" ...
#>  $ Identifying.Node                 : chr  "Sequest HT (A2)" "Sequest HT (A2)" "Sequest HT (A2)" "Sequest HT (A2)" ...
#>  $ PSM.Ambiguity                    : chr  "Unambiguous" "Unambiguous" "Unambiguous" "Unambiguous" ...
#>  $ Sequence                         : chr  "VASTLTEEGGGGGGGGGSVAPKPPR" "ELYVAADEASIAPILAEAQAHFGR" "NFPNAIEHTLQWAR" "CLEPLPQEQGNMEYTK" ...
#>  $ Annotated.Sequence               : chr  "vASTLTEEGGGGGGGGGSVAPkPPR" "eLYVAADEASIAPILAEAQAHFGR" "nFPNAIEHTLQWAR" "cLEPLPQEQGNMEYTk" ...
#>  $ Modifications                    : chr  "N-Term(TMT6plex); K22(TMT6plex)" "N-Term(TMT6plex)" "N-Term(TMT6plex)" "N-Term(TMT6plex); C1(Carbamidomethyl); K16(TMT6plex)" ...
#>  $ Number.of.Proteins               : int  1 1 1 1 2 1 1 1 1 2 ...
#>  $ Master.Protein.Accessions        : chr  "Q8TF68" "Q8NFF5" "P22314" "Q9H583" ...
#>  $ Protein.Accessions               : chr  "Q8TF68" "Q8NFF5" "P22314" "Q9H583" ...
#>  $ Number.of.Missed.Cleavages       : int  0 0 0 0 0 0 1 0 2 1 ...
#>  $ Charge                           : int  3 3 3 3 2 2 3 3 3 4 ...
#>  $ Original.Precursor.Charge        : int  3 3 3 3 2 2 3 3 3 4 ...
#>  $ Delta.Score                      : num  0.706 0.661 0.433 0.512 0.539 ...
#>  $ Delta.Cn                         : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Rank                             : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ Search.Engine.Rank               : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ mz.in.Da                         : num  885 924 643 799 912 ...
#>  $ MHplus.in.Da                     : num  2653 2771 1926 2395 1823 ...
#>  $ Theo.MHplus.in.Da                : num  2653 2771 1926 2395 1823 ...
#>  $ Delta.M.in.ppm                   : num  1.3 -0.21 -1.46 -1.56 -1.62 -1.32 -1.78 0.45 4.65 -1.87 ...
#>  $ Delta.mz.in.Da                   : num  0.00115 -0.0002 -0.00094 -0.00125 -0.00147 -0.00149 -0.00146 0.0003 0.00526 -0.00121 ...
#>  $ Intensity                        : num  6107200 1377135 34854708 1579551 4543815 ...
#>  $ Activation.Type                  : chr  "CID" "CID" "CID" "CID" ...
#>  $ MS.Order                         : chr  "MS2" "MS2" "MS2" "MS2" ...
#>  $ Isolation.Interference.in.Percent: num  0 10.79 3.49 73.84 14.02 ...
#>  $ SPS.Mass.Matches.in.Percent      : int  100 100 70 60 90 40 90 100 80 100 ...
#>  $ Average.Reporter.SN              : num  104.6 55.5 133.6 275.1 248.1 ...
#>  $ Ion.Inject.Time.in.ms            : num  12.002 9.874 0.951 9.193 6.601 ...
#>  $ RT.in.min                        : num  53.4 97.2 74.7 69.4 74.7 ...
#>  $ First.Scan                       : int  21503 46588 34172 29793 34178 34219 29738 37783 34263 30609 ...
#>  $ Spectrum.File                    : chr  "anja_lopit_total_rep1_f13_200825080010.raw" "anja_lopit_total_rep1_f14_200825100213.raw" "anja_lopit_total_rep1_f17_200825160824.raw" "anja_lopit_total_rep1_f11_200825055807.raw" ...
#>  $ File.ID                          : chr  "F2.6" "F2.7" "F2.10" "F2.5" ...
#>  $ Abundance.126                    : num  465.7 96.8 196.8 609.2 361.7 ...
#>  $ Abundance.127N                   : num  341.9 82.2 127.3 546 335.7 ...
#>  $ Abundance.127C                   : num  156.6 92.3 122.5 358.2 227.2 ...
#>  $ Abundance.128N                   : num  8.3 81.8 92.9 396.9 148.1 ...
#>  $ Abundance.128C                   : num  10.4 50.7 103.4 190.2 143.4 ...
#>  $ Abundance.129N                   : num  12.4 26.9 102.8 140.9 165.6 ...
#>  $ Abundance.129C                   : num  11 25.3 140.4 102.4 198.9 ...
#>  $ Abundance.130N                   : num  12.2 18.2 77.5 114.1 212.7 ...
#>  $ Abundance.130C                   : num  17.2 11 48.2 199.8 466.5 ...
#>  $ Abundance.131                    : num  11.4 72.3 337.6 103.4 245.6 ...
#>  $ Quan.Info                        : logi  NA NA NA NA NA NA ...
#>  $ XCorr                            : num  7.95 6.25 3.93 2.58 4.56 3.74 4.89 2.69 4.55 3.6 ...
#>  $ Number.of.Protein.Groups         : int  1 1 1 1 1 1 1 1 1 2 ...
#>  $ Percolator.q.Value               : num  5.22e-05 5.22e-05 5.22e-05 5.22e-05 5.22e-05 ...
#>  $ Percolator.PEP                   : num  1.73e-10 1.76e-06 4.08e-06 2.24e-05 5.37e-06 ...

This data.frame contains 5000 PSMs. We have quantitative data for each PSM (the Abundance columns) and metadata for each PSM (all the other columns).

assayData

As before, the single essential part of an MSnSet is the assayData slot which contains the quantitative data from your experiment.

In this case, it should contain a numeric matrix with 5000 rows corresponding to the 5000 PSMs and 10 columns corresponding to the 10 TMT tags.

First we extract the columns with the quantitative data and convert them to a numeric matrix.

# abundance columns in TMT PD output start with 'Abundance.'
# (anchor the pattern and escape the dot so it is matched literally)
abundance_cols <- grep("^Abundance\\.", colnames(psm_tmt_total), value = TRUE)

tmt_exprs <- as.matrix(psm_tmt_total[, abundance_cols])

Then we remove the ‘Abundance.’ prefix from the column names to make them more concise.

# update the column names to remove the 'Abundance.' prefix
colnames(tmt_exprs) <- sub("^Abundance\\.", "", colnames(tmt_exprs))
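
A note on the pattern: in a regular expression an unescaped `.` matches any single character, so a loose pattern can pick up more columns than intended. The sketch below uses base R on invented column names (the last one, AbundanceXRatio, is hypothetical and does not occur in PD output) to show the difference between a loose and an anchored, escaped pattern.

```r
# Hypothetical column names; the last one is invented to show a false match
cols <- c("Abundance.126", "Abundance.127N", "Average.Reporter.SN", "AbundanceXRatio")

loose  <- grepl("Abundance.", cols)     # '.' matches any character
strict <- grepl("^Abundance\\.", cols)  # literal dot, anchored at the start

cols[loose]   # also picks up the invented "AbundanceXRatio"
cols[strict]  # only the two true abundance columns

# stripping the prefix with the strict pattern
short_names <- sub("^Abundance\\.", "", cols[strict])
short_names
```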

Lastly, we use the unique PSMs.Peptide.ID column to define unique row names. This is important for extracting and combining data down the line. Row names must be unique!

# use PSMs.Peptide.ID, which are unique, to define rownames
rownames(tmt_exprs) <- psm_tmt_total$PSMs.Peptide.ID
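
Because everything downstream relies on unique row names, it can be worth confirming uniqueness before using an identifier column this way. The following base R sketch uses invented identifiers; falling back to make.unique() is only one possible remedy, shown here as an assumption rather than a recommendation from camprotR or MSnbase.

```r
# Toy stand-in for an identifier column (hypothetical values, one duplicate)
ids <- c(2151442, 2803862, 3972302, 2803862)

# anyDuplicated() returns 0 when all values are unique,
# otherwise the index of the first duplicate
has_duplicates <- anyDuplicated(ids) > 0
has_duplicates

# one possible fallback: append a suffix to repeated identifiers
unique_ids <- make.unique(as.character(ids))
unique_ids
```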

Our quantitative data are now ready.

featureData

Now we construct a data.frame with metadata for each PSM to go into the featureData slot of our MSnSet.

In this case, it should be a data.frame with 5000 rows corresponding to the 5000 PSMs and any number of columns.

First we extract the columns with the metadata of interest. Here we want everything but the Abundance columns and the unique IDs.

# get all columns except the Abundance columns identified earlier and the ID column
metadata_cols <- setdiff(colnames(psm_tmt_total), c(abundance_cols, "PSMs.Peptide.ID"))

tmt_fdata <- psm_tmt_total[, metadata_cols]

Again, we use the unique PSMs.Peptide.ID column to define unique row names. These must match the row names of tmt_exprs!

# use PSMs.Peptide.ID, which are unique, to define rownames
rownames(tmt_fdata) <- psm_tmt_total$PSMs.Peptide.ID

Our metadata are now ready.

phenoData

Lastly, we construct a data.frame with metadata for each TMT 10-plex tag, to go into the phenoData slot of our MSnSet.

In this case, it should be a data.frame with 10 rows corresponding to the 10 TMT tags and any number of columns.

First we construct an empty data.frame with 10 rows.

tmt_pdata <- data.frame(matrix(nrow = 10, ncol = 0))

Then we can add some metadata. In this example we will just add some fake sample names and fake treatment conditions.

tmt_pdata$sample <- paste0("sample", 1:10)
tmt_pdata$treatment <- rep(c("trt", "ctrl"), each = 5)

The rownames must be identical to the column names of tmt_exprs.

rownames(tmt_pdata) <- colnames(tmt_exprs)
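
Assigning row names by position, as done here, is safe because the metadata were created in channel order. When a sample sheet comes from elsewhere and already carries channel names, it may be safer to reorder it by name instead. A minimal base R sketch, with an invented sample_sheet and only three channels:

```r
# Channel order as it would appear in the assay data columns
channels <- c("126", "127N", "127C")

# Hypothetical sample sheet whose rows are in a different order
sample_sheet <- data.frame(
  treatment = c("ctrl", "trt", "trt"),
  row.names = c("127C", "126", "127N")
)

# reorder the rows by name so they line up with the channels
aligned <- sample_sheet[channels, , drop = FALSE]

identical(rownames(aligned), channels)
aligned
```

drop = FALSE keeps the result a data.frame even though there is only one column.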

Make the MSnSet

Now we construct the MSnSet. As long as we have set up the underlying data properly, this step is the easiest!

tmt_msnset <- MSnSet(exprs = tmt_exprs, fData = tmt_fdata, pData = tmt_pdata)
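
The same constructor call can be tried end to end on a tiny, self-contained example. This sketch builds a toy MSnSet from invented data (the toy_* objects are hypothetical) and then checks it with validObject() from the methods package, which signals an error if the object is not valid.

```r
library(MSnbase)

# Toy inputs (hypothetical): 3 PSMs quantified in 2 samples
toy_exprs <- matrix(
  c(10, 20, 30, 40, 50, 60),
  nrow = 3,
  dimnames = list(c("psm1", "psm2", "psm3"), c("s1", "s2"))
)
toy_fdata <- data.frame(
  sequence = c("AAK", "CCR", "DDK"),
  row.names = rownames(toy_exprs)
)
toy_pdata <- data.frame(
  treatment = c("ctrl", "trt"),
  row.names = colnames(toy_exprs)
)

toy_msnset <- MSnSet(exprs = toy_exprs, fData = toy_fdata, pData = toy_pdata)

# returns TRUE (invisibly) for a valid object, otherwise raises an error
validObject(toy_msnset)

# features by samples
dim(toy_msnset)
```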

Let’s have a look at its structure.

str(tmt_msnset, max.level = 2)
#> Formal class 'MSnSet' [package "MSnbase"] with 9 slots
#>   ..@ experimentData   :Formal class 'MIAPE' [package "MSnbase"] with 30 slots
#>   ..@ processingData   :Formal class 'MSnProcess' [package "MSnbase"] with 10 slots
#>   ..@ qual             :'data.frame':    0 obs. of  0 variables
#> Formal class 'data.frame' [package "methods"] with 4 slots
#>   ..@ assayData        :<environment: 0x55d0271dc2a8> 
#>   ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#>   ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#>   ..@ annotation       : chr(0) 
#>   ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#>   ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot

As before we can access the different slots as follows.

# access the quantitative data
head(exprs(tmt_msnset))
#>           126  127N  127C  128N  128C  129N  129C  130N  130C   131
#> 2151442 465.7 341.9 156.6   8.3  10.4  12.4  11.0  12.2  17.2  11.4
#> 2803862  96.8  82.2  92.3  81.8  50.7  26.9  25.3  18.2  11.0  72.3
#> 3972302 196.8 127.3 122.5  92.9 103.4 102.8 140.4  77.5  48.2 337.6
#> 1821657 609.2 546.0 358.2 396.9 190.2 140.9 102.4 114.1 199.8 103.4
#> 3972362 361.7 335.7 227.2 148.1 143.4 165.6 198.9 212.7 466.5 245.6
#> 3972741  52.6  49.6  39.8  40.5  14.6   5.3   6.4   0.6   7.4  12.2

# access the PSM metadata
head(fData(tmt_msnset))
#>         PSMs.Workflow.ID Checked Confidence Identifying.Node PSM.Ambiguity
#> 2151442             -219   False       High  Sequest HT (A2)   Unambiguous
#> 2803862             -219   False       High  Sequest HT (A2)   Unambiguous
#> 3972302             -219   False       High  Sequest HT (A2)   Unambiguous
#> 1821657             -219   False       High  Sequest HT (A2)   Unambiguous
#> 3972362             -219   False       High  Sequest HT (A2)   Unambiguous
#> 3972741             -219   False       High  Sequest HT (A2)   Unambiguous
#>                          Sequence        Annotated.Sequence
#> 2151442 VASTLTEEGGGGGGGGGSVAPKPPR vASTLTEEGGGGGGGGGSVAPkPPR
#> 2803862  ELYVAADEASIAPILAEAQAHFGR  eLYVAADEASIAPILAEAQAHFGR
#> 3972302            NFPNAIEHTLQWAR            nFPNAIEHTLQWAR
#> 1821657          CLEPLPQEQGNMEYTK          cLEPLPQEQGNMEYTk
#> 3972362             SSTVGLVTLNDMK             sSTVGLVTLNDMk
#> 3972741         AFTHTAQYDEAISDYFR         aFTHTAQYDEAISDYFR
#>                                                Modifications Number.of.Proteins
#> 2151442                      N-Term(TMT6plex); K22(TMT6plex)                  1
#> 2803862                                     N-Term(TMT6plex)                  1
#> 3972302                                     N-Term(TMT6plex)                  1
#> 1821657 N-Term(TMT6plex); C1(Carbamidomethyl); K16(TMT6plex)                  1
#> 3972362                      N-Term(TMT6plex); K13(TMT6plex)                  2
#> 3972741                                     N-Term(TMT6plex)                  1
#>         Master.Protein.Accessions Protein.Accessions Number.of.Missed.Cleavages
#> 2151442                    Q8TF68             Q8TF68                          0
#> 2803862                    Q8NFF5             Q8NFF5                          0
#> 3972302                    P22314             P22314                          0
#> 1821657                    Q9H583             Q9H583                          0
#> 3972362                    Q14320     Q9Y247; Q14320                          0
#> 3972741                    P31939             P31939                          0
#>         Charge Original.Precursor.Charge Delta.Score Delta.Cn Rank
#> 2151442      3                         3      0.7057        0    1
#> 2803862      3                         3      0.6608        0    1
#> 3972302      3                         3      0.4326        0    1
#> 1821657      3                         3      0.5116        0    1
#> 3972362      2                         2      0.5395        0    1
#> 3972741      2                         2      0.6016        0    1
#>         Search.Engine.Rank  mz.in.Da MHplus.in.Da Theo.MHplus.in.Da
#> 2151442                  1  885.1497     2653.434          2653.431
#> 2803862                  1  924.4902     2771.456          2771.457
#> 3972302                  1  642.6767     1926.016          1926.018
#> 1821657                  1  799.0715     2395.200          2395.204
#> 3972362                  1  912.0196     1823.032          1823.035
#> 3972741                  1 1132.5432     2264.079          2264.082
#>         Delta.M.in.ppm Delta.mz.in.Da  Intensity Activation.Type MS.Order
#> 2151442           1.30        0.00115  6107200.5             CID      MS2
#> 2803862          -0.21       -0.00020  1377135.0             CID      MS2
#> 3972302          -1.46       -0.00094 34854708.0             CID      MS2
#> 1821657          -1.56       -0.00125  1579550.6             CID      MS2
#> 3972362          -1.62       -0.00147  4543815.0             CID      MS2
#> 3972741          -1.32       -0.00149   795471.4             CID      MS2
#>         Isolation.Interference.in.Percent SPS.Mass.Matches.in.Percent
#> 2151442                          0.000000                         100
#> 2803862                         10.790270                         100
#> 3972302                          3.490231                          70
#> 1821657                         73.839900                          60
#> 3972362                         14.018580                          90
#> 3972741                          0.000000                          40
#>         Average.Reporter.SN Ion.Inject.Time.in.ms RT.in.min First.Scan
#> 2151442               104.6                12.002   53.4312      21503
#> 2803862                55.5                 9.874   97.1653      46588
#> 3972302               133.6                 0.951   74.6628      34172
#> 1821657               275.1                 9.193   69.4258      29793
#> 3972362               248.1                 6.601   74.6711      34178
#> 3972741                22.8                34.009   74.7432      34219
#>                                      Spectrum.File File.ID Quan.Info XCorr
#> 2151442 anja_lopit_total_rep1_f13_200825080010.raw    F2.6        NA  7.95
#> 2803862 anja_lopit_total_rep1_f14_200825100213.raw    F2.7        NA  6.25
#> 3972302 anja_lopit_total_rep1_f17_200825160824.raw   F2.10        NA  3.93
#> 1821657 anja_lopit_total_rep1_f11_200825055807.raw    F2.5        NA  2.58
#> 3972362 anja_lopit_total_rep1_f17_200825160824.raw   F2.10        NA  4.56
#> 3972741 anja_lopit_total_rep1_f17_200825160824.raw   F2.10        NA  3.74
#>         Number.of.Protein.Groups Percolator.q.Value Percolator.PEP
#> 2151442                        1       5.220569e-05   1.733935e-10
#> 2803862                        1       5.221000e-05   1.761000e-06
#> 3972302                        1       5.221000e-05   4.083000e-06
#> 1821657                        1       5.221000e-05   2.239000e-05
#> 3972362                        1       5.221000e-05   5.374000e-06
#> 3972741                        1       5.221000e-05   2.095000e-05

# access the sample metadata
head(pData(tmt_msnset))
#>       sample treatment
#> 126  sample1       trt
#> 127N sample2       trt
#> 127C sample3       trt
#> 128N sample4       trt
#> 128C sample5       trt
#> 129N sample6      ctrl

Extracting results from an MSnSet

The code below shows briefly how to save/export the data within an MSnSet.

Using write.exprs() from MSnbase is the easiest way. Use the fDataCols argument to specify which featureData columns to add to the right of the quantitative data (specify as column names, column numbers, or a logical vector). The other arguments are the same as write.table().

MSnbase::write.exprs(
  tmt_msnset, 
  file = "results.csv",
  fDataCols = c("Percolator.q.Value", "Master.Protein.Accessions"),
  sep = ",", row.names = FALSE, col.names = TRUE
)

Alternatively, you can combine the results manually.

results <- merge(
  exprs(tmt_msnset), # extract PSM quantitative data
  fData(tmt_msnset), # extract PSM metadata
  by = 0 # join by rownames
)
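
A detail worth knowing about merge(..., by = 0): the row names end up in a new first column called Row.names, and with the default sort = TRUE the rows are ordered by that key. Since the rows of the quantitative data and the feature metadata are already aligned, a column bind is an alternative that keeps the original order. The base R sketch below illustrates both on invented toy data.

```r
# Toy stand-ins (hypothetical) for exprs() and fData() output
toy_exprs <- matrix(
  c(10, 20, 30, 40, 50, 60),
  nrow = 3,
  dimnames = list(c("psm1", "psm2", "psm3"), c("s1", "s2"))
)
toy_fdata <- data.frame(
  sequence = c("AAK", "CCR", "DDK"),
  row.names = c("psm1", "psm2", "psm3")
)

# merge on row names: the key becomes a column named "Row.names"
merged <- merge(toy_exprs, toy_fdata, by = 0)
colnames(merged)

# alternative: bind columns directly, after confirming the rows line up
stopifnot(identical(rownames(toy_exprs), rownames(toy_fdata)))
combined <- cbind(
  id = rownames(toy_exprs),
  as.data.frame(toy_exprs),
  toy_fdata
)
colnames(combined)
```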

And then use the writexl package to save to Excel.

writexl::write_xlsx(results, path = "results.xlsx")

MSnSet slots

This section contains a detailed description of each MSnSet slot.

assayData

Contains the quantitative data from the experiment, i.e. how much of each feature (e.g. PSM, peptide, protein) was detected in each sample. This is the essential part of an MSnSet.

Access with exprs(MSnSet)
Object = a matrix of expression values, access the dimensions with dim(MSnSet)
- row names = feature names (e.g. PSMs, peptides, proteins), must be unique (e.g. UniProt accessions in protein groups like Q12345;Q98765), access with featureNames(MSnSet)
- column names = sample names, must be unique, access with sampleNames(MSnSet)
- data = expression values e.g. SILAC ratios, peptide intensities, etc.

featureData

Optional. Contains metadata about the features (e.g. proteins, peptides, PSMs). For example for protein features this object might contain the protein names, their lengths, isoelectric points, number of transmembrane domains, associated GO terms, etc.

Access the overall object with featureData(MSnSet)
Access the underlying data.frames with fData(MSnSet) and fvarMetadata(MSnSet).
Object = a Biobase::AnnotatedDataFrame, which is comprised of 2 data.frames
data.frame 1 (fData)
- row names = feature names (e.g. PSMs, peptides, proteins), names must be unique (e.g. UniProt accessions in protein groups like Q12345;Q98765)
- column names = short name of feature parameter e.g. transmem, access with fvarLabels(MSnSet)
- data = can be numeric, character, factor, boolean
data.frame 2 (varMetadata, optional)
- row names = short name of feature parameter e.g. transmem
- column = a single column called labelDescription
- data = character, full description/name of the feature parameters e.g. Number of transmembrane domains
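
As a brief illustration of the accessors named above, the following sketch inspects the featureData of the msnset example data (assuming it is loaded with data(msnset)).

```r
library(MSnbase)
data(msnset)

# the overall object is an AnnotatedDataFrame
class(featureData(msnset))

# data.frame 1: column names of the feature metadata
fvarLabels(msnset)

# data.frame 2: one row per feature metadata column, with its description
head(fvarMetadata(msnset))
```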

phenoData

Optional. Contains metadata about each sample, usually relating to the experimental design, e.g. replicates, tissues, animals, treatments, etc.

Access the overall object with phenoData(MSnSet)
Access the underlying data.frames with pData(MSnSet) and varMetadata(MSnSet).
Object = a Biobase::AnnotatedDataFrame, which is comprised of 2 data.frames.
data.frame 1 (pData)
- row names = sample names, must be unique
- column names = short name of sample metadata parameters e.g. trt, access with MSnSet$
- data = can be numeric, character, factor, boolean
data.frame 2 (varMetadata, optional)
- row names = short name of sample metadata parameters e.g. trt
- column = a single column called labelDescription
- data = character, full description of sample metadata parameters e.g. Drug treatment
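
The `$` shortcut mentioned above reads a sample metadata column straight from the MSnSet. A short sketch on the msnset example data (assuming it is loaded with data(msnset)):

```r
library(MSnbase)
data(msnset)

# shortcut: sample metadata column accessed directly from the MSnSet
msnset$mz

# equivalent, via the extracted data.frame
pData(msnset)$mz

# descriptions of the sample metadata columns
varMetadata(msnset)
```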

protocolData

Optional. Contains equipment-generated information about the protocols used for each sample. The number of rows and the row names must match the number of columns and column names of assayData.

Access the overall object with protocolData(MSnSet)
Access the underlying data.frames with pData(protocolData(MSnSet)) and varMetadata(protocolData(MSnSet)).
Object = a Biobase::AnnotatedDataFrame, which is comprised of 2 data.frames
data.frame 1 (pData)
- row names = sample names, must be unique
- column names = short name of equipment-generated parameter e.g. ms_model
- data = can be numeric, character, factor, boolean
data.frame 2 (varMetadata, optional)
- row names = name of equipment-generated parameter e.g. ms_model
- column = a single column called labelDescription
- data = character, full description/name of the equipment generated parameters e.g. MS Model

experimentData

Optional. Contains descriptive information about the experiment and the experimenter.

Access the overall object with experimentData(MSnSet)
Object = an MSnbase::MIAPE object (the proteomics counterpart of Biobase::MIAME; see the str() output above), which is essentially a list of several characters and lists.
- name = character, contains experimenter name, access with expinfo(MSnSet)
- lab = character, lab where experiment was conducted, access with expinfo(MSnSet)
- contact = character, contact info for experimenter and/or lab, access with expinfo(MSnSet)
- url = character, URL for experiment, access with expinfo(MSnSet)
- title = character, single-sentence experiment title, access with expinfo(MSnSet)
- abstract = character, abstract describing the experiment, access with abstract(MSnSet)
- See ?MIAPE and Biobase::MIAME for info about other (probably unnecessary) sub-objects.

processingData

Contains the version of MSnbase used to construct the MSnSet and also a log of what processes have been applied to the MSnSet.

Access the overall object with processingData(MSnSet)
Object = an MSnProcess object, which contains several sub-objects that can be accessed using processingData(MSnSet)@
- files
- processing
- merged
- cleaned
- removedPeaks
- smoothed
- trimmed
- normalised
- MSnbaseVersion
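
For example, the processing log and the recorded MSnbase version can be read with the `@` operator. A small sketch on the msnset example data (assuming it is loaded with data(msnset)):

```r
library(MSnbase)
data(msnset)

proc <- processingData(msnset)
class(proc)

# log of the processing steps applied so far
proc@processing

# version of MSnbase recorded when the object was created
proc@MSnbaseVersion
```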

Charlotte Dawson

2024-07-01
