This function takes Proteome Discoverer output (peptide groups or PSMs), read from the exported .txt file into a data.frame, and filters out features according to the criteria listed below.

The function performs the following steps:

  1. Remove features without a master protein

  2. (Optional) Remove features without a unique master protein (i.e. keep only features where Number.of.Protein.Groups == 1)

  3. (Optional) Remove features matching a cRAP protein

  4. (Optional) Remove features matching any protein associated with a cRAP protein (see below)

  5. Remove features without quantification values (only if TMT or silac is TRUE and level = "peptide")
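
The first three steps can be illustrated on a toy data.frame in base R. This is only a sketch of the filtering logic, not the package's implementation: the `Sequence` column, the accessions P11111, P22222 and P33333, and the "; " separator between accessions are assumptions made for illustration.

```r
# Hypothetical toy input; only the columns needed for steps 1-3 are included.
pd <- data.frame(
  Sequence = c("AAK", "CCR", "DDK", "EEK"),
  Master.Protein.Accessions = c("P11111", "", "P22222; P33333", "P02768"),
  Protein.Accessions = c("P11111", "", "P22222; P33333", "P02768"),
  Number.of.Protein.Groups = c(1, 0, 2, 1),
  stringsAsFactors = FALSE
)
crap_proteins <- c("P02768")

# Step 1: drop features with no master protein (NA or empty string).
master <- pd$Master.Protein.Accessions
pd <- pd[!is.na(master) & master != "", , drop = FALSE]

# Step 2: keep only features assigned to a single protein group.
pd <- pd[pd$Number.of.Protein.Groups == 1, , drop = FALSE]

# Step 3: drop features whose protein matches include a cRAP accession.
# Assumption: accessions within a cell are separated by ";".
matches <- strsplit(pd$Protein.Accessions, "\\s*;\\s*")
hits_crap <- vapply(matches, function(x) any(x %in% crap_proteins), logical(1))
pd <- pd[!hits_crap, , drop = FALSE]
```

Only the feature with a single, non-cRAP master protein remains in `pd` after the three filters.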

parse_features(
  data,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = FALSE,
  TMT = FALSE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = NULL,
  filter_associated_crap = TRUE
)

Arguments

data

data.frame. Generated from the .txt file output by Proteome Discoverer.

master_protein_col

string. Name of column containing master proteins.

protein_col

string. Name of column containing all protein matches.

unique_master

logical. Filter out features without a unique master protein.

silac

logical. Is the experiment a SILAC experiment?

TMT

logical. Is the experiment a TMT experiment?

level

string. Type of input file, must be either "peptide" or "PSM".

filter_crap

logical. Filter out features which match a cRAP protein.

crap_proteins

character vector. The cRAP protein accessions, for example c("P02768"), which is serum albumin.

filter_associated_crap

logical. Filter out features which match an associated cRAP protein (see Details).

Value

Returns a data.frame with the filtered Proteome Discoverer output.

Details

Associated cRAP proteins are proteins which share at least one feature with a cRAP protein. It has been observed that the cRAP database does not contain all possible cRAP proteins. For example, some features can be assigned to a keratin which is not in the provided cRAP database.

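In the example below, f1 matches a cRAP protein, so protein1 and protein2 become associated cRAP proteins. Using filter_associated_crap = TRUE will therefore filter out f2 (which matches protein1) and f3 (which matches protein2) in addition to f1, regardless of the value in the Master.Protein.Accessions column.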

feature  Protein.Accessions         Master.Protein.Accessions
f1       protein1, protein2, cRAP   protein1
f2       protein1, protein3         protein3
f3       protein2                   protein2
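
The associated cRAP logic described above can be reproduced on this toy table with base R. This is a minimal sketch under stated assumptions rather than the function's actual code: accessions are assumed to be separated by a comma or semicolon, and the variable names (`features`, `accessions`, `associated_crap`, `filtered`) are hypothetical.

```r
# Toy table mirroring the example above.
features <- data.frame(
  feature = c("f1", "f2", "f3"),
  Protein.Accessions = c("protein1, protein2, cRAP",
                         "protein1, protein3",
                         "protein2"),
  Master.Protein.Accessions = c("protein1", "protein3", "protein2"),
  stringsAsFactors = FALSE
)
crap_proteins <- c("cRAP")

# Split each Protein.Accessions entry into individual accessions.
# Assumption: separators are "," or ";" with optional surrounding spaces.
accessions <- strsplit(features$Protein.Accessions, "\\s*[,;]\\s*")

# Features that match a cRAP protein directly.
is_crap <- vapply(accessions, function(x) any(x %in% crap_proteins), logical(1))

# Associated cRAP proteins: non-cRAP accessions sharing a feature with a cRAP protein.
associated_crap <- setdiff(unique(unlist(accessions[is_crap])), crap_proteins)

# Features that match any associated cRAP protein.
is_associated <- vapply(accessions, function(x) any(x %in% associated_crap), logical(1))

# Keep only features matching neither a cRAP nor an associated cRAP protein.
filtered <- features[!(is_crap | is_associated), , drop = FALSE]
```

Here only f1 matches cRAP directly, `associated_crap` holds protein1 and protein2, and all three features are removed, so `filtered` has no rows.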

Examples