Bioinformatics lecture notes

Introduction to proteomic data analysis


Last updated 4 months ago

While DNA encodes the "instructions of life", the proteins derived from the translation of RNA molecules are the entities that put these instructions into practice. So far in the class we have focused on DNA and protein strings without discussing how scientists have figured out what these strings are. In this chapter, we will focus on one experimental approach that can be used to uncover the amino-acid sequence of proteins.

We will start by focusing on a special type of protein: cyclic non-ribosomal peptides. These are short chains of amino acids assembled by dedicated enzymes rather than by the ribosome (hence "non-ribosomal"), and they are called cyclic because the chain closes on itself into a ring. Many of these molecules are used as antibiotics, anti-cancer drugs, and tumor suppressors, among other applications.
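Because a ring has no first or last amino acid, a cyclic peptide written as a string can start at any position. For example, the hypothetical peptide NQEL (one-letter amino-acid codes) could equally be written QELN, ELNQ or LNQE. The short Python sketch below lists these rotations and picks the alphabetically smallest one as a canonical form, so that two strings describing the same ring compare equal. The function names are our own, chosen for illustration, and the sketch assumes the ring is always read in the same direction.

```python
def rotations(peptide):
    """All strings describing the same cyclic peptide, one per starting amino acid.
    Assumes the ring is always read in the same direction."""
    return [peptide[i:] + peptide[:i] for i in range(len(peptide))]


def canonical_form(peptide):
    """Pick the alphabetically smallest rotation as a representative of the ring,
    so that two strings describing the same cyclic peptide compare equal."""
    return min(rotations(peptide), default=peptide)


print(rotations("NQEL"))                                  # ['NQEL', 'QELN', 'ELNQ', 'LNQE']
print(canonical_form("QELN") == canonical_form("LNQE"))   # True
```

Choosing the smallest rotation is just one convenient convention: any rule that maps every rotation of a ring to the same string would serve equally well.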

One experimental technique used to determine the amino-acid sequence of these non-ribosomal peptides is mass spectrometry, described in more detail below.

Mass spectrometry

Mass spectrometry is an analytic technique used to measure the mass-to-charge ratio (m/z) of molecules. At a high level, a mixture of molecules is first ionized, giving each molecule an electric charge so that it can be steered by electric and magnetic fields. The charged molecules are then "sprayed" into a chamber, where electric and/or magnetic fields are applied perpendicularly to the direction in which the particles move. How far the trajectory of each particle deviates from a straight line depends on the particle's speed, its mass-to-charge ratio, and the strength of the applied field. The mass of a particle determines its inertia, while its charge determines the force the field exerts on it, so (for a given speed and field strength) the deviation is inversely proportional to the mass-to-charge ratio: heavy particles with a low charge deviate only a little, while light, highly charged particles deviate the most.
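To make the relationship between mass, charge, and deviation concrete, here is a minimal sketch assuming the simplest textbook setup: a uniform magnetic field of strength B perpendicular to the motion of the ions, and all ions travelling at the same speed v. Under these assumptions an ion of mass m and charge q follows a circular arc of radius r = mv/(qB), so a larger radius means a smaller deviation, and the radius depends on mass and charge only through m/q. The function name `radius_of_curvature`, the default speed and field strength, and the example ion masses are hypothetical values chosen for illustration, not parameters of any real instrument.

```python
# Physical constants in SI units
DALTON_KG = 1.66053906660e-27          # mass of 1 dalton (unified atomic mass unit), in kg
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of a single proton, in coulombs


def radius_of_curvature(mass_da, charge_z, speed_m_s=1.0e5, field_t=1.0):
    """Radius (in metres) of the arc followed by an ion moving perpendicular to a
    uniform magnetic field: r = m * v / (q * B). A larger radius means a smaller
    deviation from a straight line. Assumes all ions share the same speed."""
    mass_kg = mass_da * DALTON_KG
    charge_c = charge_z * ELEMENTARY_CHARGE_C
    return mass_kg * speed_m_s / (charge_c * field_t)


# Hypothetical ions: (mass in daltons, number of elementary charges)
ions = [(1000, 1), (1000, 2), (500, 1), (500, 2)]
for mass, z in ions:
    r = radius_of_curvature(mass, z)
    print(f"mass={mass} Da, z={z}, m/z={mass / z:.0f}, radius={r:.3f} m")
```

Note that the 1000 Da ion carrying two charges and the 500 Da ion carrying one charge follow the same arc, because they share m/z = 500. This is why the instrument measures the mass-to-charge ratio rather than the mass itself.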

A detector at the end of the chamber captures the particles and records both where each one lands (its distance from the center, which reflects its mass-to-charge ratio) and how many particles hit each location, yielding a mass spectrum such as the one shown below.

Mass spectrum. The x axis represents the inferred mass-to-charge ratio, while the y axis records the abundance of particles with a specific mass-to-charge ratio. Image source:
https://commons.wikimedia.org/wiki/File:Masspectrum.jpg
A sketch of a mass spectrometer. A "spray" of molecules is deflected by an electromagnetic field as it moves towards a detector. The magnitude of the deflection depends on the mass-to-charge ratio of the molecules.
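The detector output can be thought of as a histogram, much like the mass spectrum in the first figure: each particle is reduced to the m/z value inferred from where it landed, nearby values are grouped into bins, and the abundance at each m/z is the number of particles in that bin. The sketch below illustrates only this idea. The function name `build_spectrum`, the bin width of 0.1 m/z units, and the detector hits are all made up, and this is not how any particular instrument processes its signal.

```python
from collections import Counter


def build_spectrum(detected_mz, resolution=0.1):
    """Turn individual detector hits (one m/z value per particle) into a spectrum:
    a sorted list of (m/z, abundance) pairs. Each hit is assigned to the nearest
    multiple of `resolution`, and hits assigned to the same value form one peak."""
    counts = Counter(round(mz / resolution) for mz in detected_mz)
    return [(round(bin_index * resolution, 6), abundance)
            for bin_index, abundance in sorted(counts.items())]


# Made-up detector hits, expressed as m/z values
hits = [57.02, 57.03, 57.04, 71.03, 71.04, 113.08, 113.09, 113.11]
for mz, abundance in build_spectrum(hits):
    # Draw each peak as a horizontal bar whose length is its abundance
    print(f"m/z {mz:>6}: {'|' * abundance} ({abundance})")
```

The bin width plays the role of the instrument's resolution: a narrower bin separates particles with close m/z values into distinct peaks, while a wider bin merges them into one.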