Analysis of expression proteomics data in R
Overview
These materials focus on expression proteomics, which aims to characterise the protein diversity and abundance in a particular system. You will learn about the bioinformatic analysis steps involved when working with these kind of data, in particular several dedicated proteomics Bioconductor packages, part of the R programming language. We will use a real-world dataset obtained from a tandem mass tag (TMT) mass spectrometry experiment. We cover the basic data structures used to store and manipulate protein abundance data, how to do quality control and filtering of the data, as well as several visualisations. Finally, we include statistical analysis of differential abundance across sample groups (e.g. control vs. treated) and further evaluation and biological interpretation of the results via gene ontology analysis.
You will learn about:
- How mass spectrometry can be used to quantify protein abundance and some of the methods used for peptide quantitation.
- The bioinformatics steps involved in processing and analysing expression proteomics data.
- How to assess the quality of your data, deal with missing values and summarise PSM-level (peptide-spectrum match) data to protein-level.
- How to perform differential expression analysis to compare protein abundances between different groups of samples.
Target Audience
Proteomics practitioners or data analysts/bioinformaticians that would like to learn how to use R to analyse proteomics data.
Prerequisites
- Basic understanding of mass spectometry.
- Watch this video for an excellent overview.
- A working knowledge of R and the tidyverse.
- Familiarity with other Bioconductor data classes, such as those used for RNA-seq analysis, is useful but not required.
Exercises
Exercises in these materials are labelled according to their level of difficulty:
Level | Description |
---|---|
Exercises in level 1 are simpler and designed to get you familiar with the concepts and syntax covered in the course. | |
Exercises in level 2 combine different concepts together and apply it to a given task. | |
Exercises in level 3 require going beyond the concepts and syntax introduced to solve new problems. |
Instructors
This workshop will be run by:
- Charlotte Hutchings - Cambridge Centre for Proteomics, University of Cambridge
- Lisa Breckels - Cambridge Centre for Proteomics, University of Cambridge
- Tom Smith - MRC Laboratory of Molecular Biology, Cambridge
- Alistair Hines - Cambridge Centre for Proteomics, University of Cambridge
- Kieran McCaskie - Cambridge Centre for Proteomics, University of Cambridge
Previous instructors include:
- Thomas Krueger - Department of Biochemistry, University of Cambridge
- Charlotte S. Dawson - Cambridge Centre for Proteomics, University of Cambridge
Citation
Please cite these materials if:
- You adapted or used any of them in your own teaching.
- These materials were useful for your research work.
For example, you can cite us in the methods section of your paper: “We carried out our analyses based on the recommendations in Hutchings and Breckels (2024)”.
You can cite these materials as:
Hutchings C, Breckels LM (2024) “CambridgeCentreForProteomics/course_expression_proteomics: Analysis of expression proteomics data in R”, https://cambridgecentreforproteomics.github.io/course_expression_proteomics
Or in BibTeX format:
@Misc{,
author = {Charlotte Hutchings and Lisa M Breckels},
title = {CambridgeCentreForProteomics/course_expression_proteomics: Analysis of expression proteomics data in R},
month = {November},
year = {2024},
url = {https://cambridgecentreforproteomics.github.io/course_expression_proteomics}
}
Other key references
Data analysis workflow
Hutchings C, Dawson CS, Krueger T, Lilley KS, Breckels LM. A Bioconductor workflow for processing, evaluating, and interpreting expression proteomics data [version 2; peer review: 3 approved]. F1000Research 2024, 12:1402 https://f1000research.com/articles/12-1402/v2
The QFeatures
R/Bioconductor package.
Gatto L, Vanderaa C: QFeatures: Quantitative features for mass spectrometry data. R package version 1.12.0. 2023. Reference Source
The limma
R/Bioconductor package
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
Case-study data
Queiroz, R.M.L., Smith, T., Villanueva, E., Marti-Solano, M., Monti, M., Pizzinga, M., Mirea, D.-M., Ramakrishna, M., Harvey, R.F., Dezi, V., Thomas, G.H., Willis, A.E. & Lilley, K.S. (2019) Comprehensive identification of RNA–protein interactions in any organism using orthogonal organic phase separation (OOPS). Nature Biotechnology. 37 (2), 169–178. doi:10.1038/s41587-018-0001-2.
Mass spectrometry-based proteomics:
Dupree, E.J., Jayathirtha, M., Yorkey, H., Mihasan, M., Petre, B.A. & Darie, C.C. (2020) A Critical Review of Bottom-Up Proteomics: The Good, the Bad, and the Future of This Field. Proteomes. 8 (3), 14. doi:10.3390/proteomes8030014.
Obermaier, C., Griebel, A. & Westermeier, R. (2021) Principles of protein labeling techniques. In: A. Posch (ed.). Proteomic Profiling: Methods and Protocols. Methods in Molecular Biology. New York, NY, Springer US. pp. 549–562. doi:10.1007/978-1-0716-1186-9_35.
Rainer, L.G., Sebastian Gibb, Johannes (n.d.) Chapter 5 Quantitative data | R for Mass Spectrometry. https://rformassspectrometry.github.io/book/sec-quant.html.
Acknowledgements
- Thank you to Hugo Tavares for coordinating this course and his valuable input in developing and testing this material.
- Thomas Kruger and Charlotte S. Dawson for their input and guidance writing this material and the f1000 workflow A Bioconductor workflow for processing, evaluating, and interpreting expression proteomics data.
- Prof. Kathryn Lilley, group head and director of Cambridge Centre for Proteomics at the Department of Biochemistry, University of Cambridge.
- The
QFeatures
andlimma
R/Bioconductor packages are fundamental to this workflow, please cite them alongside the course if you use this material. Thank you to Laurent Gatto and Christophe Vanderaa for providing exemplary software for proteomics. - Thank you to the R for Mass Spectrometry team for providing excellent material in particular the R for Mass Spectrometry Book by Laurent Gatto, Sebastian Gibb and Johannes Rainer.