From peptides to theoretical spectra

Introduction

Before trying to figure out how to go from experimental signal to the sequence of a peptide, we need to make sure we can model the process that yields the experimental signal. For simplicity, everything we discuss in this chapter rests on the assumption that the mass to charge ratios of peptide fragments are additive. For example, we assume that the mass to charge ratio of the peptide AG is equal to the sum of the mass to charge ratios of the two amino acids A and G. This is not necessarily true in the real world, but it is a reasonable approximation that allows us to come up with relatively simple algorithms.

In order to use a mass spectrometer to figure out the sequence of a protein or peptide, it is necessary to "blast" the peptide to pieces; otherwise all we would find out is its mass to charge ratio. The total mass to charge ratio of the peptide does provide some information, since it constrains the possible length of the peptide. For a total mass M, the shortest peptide that has that mass would be composed of a string of tryptophan (W) amino acids, as tryptophan is the "heaviest" amino acid, with a mass to charge ratio of 186. Because the length must be a whole number, the peptide has at least $L_{min} = \lceil M/186 \rceil$ amino acids. The longest peptide that has mass M would be composed of a string of glycine (G), the "lightest" amino acid, with a mass to charge ratio of 57, so the peptide has at most $L_{max} = \lfloor M/57 \rfloor$ amino acids.
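As a minimal sketch of these length bounds, the snippet below rounds to whole numbers of amino acids using integer arithmetic. The function name `length_bounds` is a hypothetical helper chosen for illustration; the two masses are the values quoted above.

```python
# Integer masses quoted in the text: glycine is the lightest amino acid,
# tryptophan the heaviest.
MASS_G = 57
MASS_W = 186


def length_bounds(total_mass):
    """Return (shortest, longest) possible peptide length for an integer total mass."""
    # Even a peptide made only of W needs this many residues (ceiling division).
    shortest = (total_mass + MASS_W - 1) // MASS_W
    # A peptide made only of G cannot have more residues than this (floor division).
    longest = total_mass // MASS_G
    return shortest, longest


print(length_bounds(476))  # (3, 8)
```

For a total mass of 476 the peptide must therefore have between 3 and 8 amino acids.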

However, for a given mass M, there could be many different sequences of amino acids that add up to M, i.e., knowing only the mass M of the peptide, we cannot figure out its sequence. We argue that, by measuring the masses of the fragments produced by breaking up the peptide into random pieces, we can constrain the set of possible sequences that are consistent with the mass M, in many cases resolving the sequence of the peptide of interest.
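To get a feel for how many sequences share one mass, here is a rough sketch that counts linear amino acid strings adding up to a given integer mass. It assumes the standard rounded masses for the 20 amino acids and counts ordered strings (so GA and AG are different); the name `count_sequences` is hypothetical, and cyclic peptides would need extra care to avoid counting rotations separately.

```python
# Standard integer masses of the 20 amino acids (assumed to match the
# mass table used in this chapter).
MASSES = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_sequences(total_mass):
    """Count linear amino acid strings whose masses add up to total_mass."""
    counts = [0] * (total_mass + 1)
    counts[0] = 1  # the empty string is the only "sequence" of mass 0
    for m in range(1, total_mass + 1):
        # A string of mass m ends in some amino acid of mass a,
        # preceded by any string of mass m - a.
        counts[m] = sum(counts[m - a] for a in MASSES.values() if a <= m)
    return counts[total_mass]


print(count_sequences(128))  # 4  (GA, AG, K, Q)
```

Even a mass as small as 128 is ambiguous, and the count grows quickly as the mass increases.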

Modeling peptide fragmentation

First, some assumptions. We assume we are looking for the sequence of a peptide P of mass M. Our experiment starts with many copies of P, which are fragmented at random; the mass to charge ratio of each fragment is then measured with a mass spectrometer. We will also assume that each copy of the peptide is fragmented into at most two pieces. This is not true in the real world, but this assumption simplifies the algorithms we will develop.

It's time for an example. Let's assume we have a circular peptide KLFPWFNQYV, shown in the figure below.

Letters organized in a circle representing a cyclic peptide.

We assume that the fragmentation process breaks up the peptide into exactly two pieces, which for a circular peptide means cutting it at two locations. The process is random, so different copies of the peptide may break at different locations, as shown below.

Multiple circles containing letters representing circular peptides. Some circles are broken at two locations to demonstrate the random breakage process.

To make it easier to work through examples, we'll switch to a shorter peptide: SELF. Fragmenting this cyclic peptide results in the following set of fragments: S, E, L, F, SE, EL, LF, FS, SEL, ELF, LFS, FSE, plus the intact peptide SELF (assuming that multiple copies of this peptide have been broken up at all possible locations that yield two pieces each). Note that we included fragments that "wrap around" the end of the string, such as FS and LFS, since the peptide is circular. We will call the set of fragments generated (or, more precisely, the set of their masses) the theoretical spectrum of the peptide. We call this spectrum "theoretical" because it represents what we expect to see in the output of the instrument, rather than the actual signal we have measured. In what follows, we will refer to the spectrum interchangeably as the set of sub-peptides or the set of their masses.
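The enumeration of fragments can be sketched in a few lines. This is only an illustration under the assumptions above (every contiguous piece of the circle is produced, and the intact peptide is listed once); the function name `cyclic_fragments` is hypothetical.

```python
def cyclic_fragments(peptide):
    """List every fragment of a cyclic peptide, including wrap-around pieces."""
    n = len(peptide)
    # Doubling the string lets a slice run past the end and wrap to the start.
    doubled = peptide + peptide
    fragments = []
    for length in range(1, n):      # proper fragments: 1 to n - 1 amino acids
        for start in range(n):      # each position on the circle can start a fragment
            fragments.append(doubled[start:start + length])
    fragments.append(peptide)       # the intact peptide, listed once
    return fragments


print(cyclic_fragments("SELF"))
# ['S', 'E', 'L', 'F', 'SE', 'EL', 'LF', 'FS', 'SEL', 'ELF', 'LFS', 'FSE', 'SELF']
```

A cyclic peptide with n amino acids yields n(n - 1) + 1 fragments this way, which gives 13 for SELF.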

The following table includes the masses (or mass-to-charge ratios) of all amino acids.

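| G | A | S | P | V | T | C | I | L | N | D | K | Q | E | M | H | F | R | Y | W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57 | 71 | 87 | 97 | 99 | 101 | 103 | 113 | 113 | 114 | 115 | 128 | 128 | 129 | 131 | 137 | 147 | 156 | 163 | 186 |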

Adding up these values, we can compute the masses of the theoretical spectrum of peptide SELF.

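| Fragment | Sum of amino acid masses | Mass |
|---|---|---|
| S | 87 | 87 |
| E | 129 | 129 |
| L | 113 | 113 |
| F | 147 | 147 |
| SE | 87 + 129 | 216 |
| EL | 129 + 113 | 242 |
| LF | 113 + 147 | 260 |
| FS | 147 + 87 | 234 |
| SEL | 87 + 129 + 113 | 329 |
| ELF | 129 + 113 + 147 | 389 |
| LFS | 113 + 147 + 87 | 347 |
| FSE | 147 + 87 + 129 | 363 |
| SELF | 87 + 129 + 113 + 147 | 476 |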
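The same sums can be computed programmatically. The sketch below assumes additive integer masses, as in the rest of this chapter, and only includes the four amino acids that occur in SELF; `theoretical_spectrum` is a hypothetical name, and the result is sorted for readability.

```python
# Masses of the four amino acids that occur in SELF.
MASSES = {"S": 87, "E": 129, "L": 113, "F": 147}


def theoretical_spectrum(peptide, masses=MASSES):
    """Return the sorted fragment masses of a cyclic peptide."""
    n = len(peptide)
    residue_masses = [masses[aa] for aa in peptide]
    # Doubling the list lets a slice wrap around the end of the circle.
    doubled = residue_masses + residue_masses
    spectrum = []
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(doubled[start:start + length]))
    spectrum.append(sum(residue_masses))  # mass of the intact peptide
    return sorted(spectrum)


print(theoretical_spectrum("SELF"))
# [87, 113, 129, 147, 216, 234, 242, 260, 329, 347, 363, 389, 476]
```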

If you look carefully at the masses in the amino acid table, you will see that amino acids I (isoleucine) and L (leucine) have the same mass, as do K (lysine) and Q (glutamine). That means that a mass spectrometer alone cannot tell I from L, or K from Q, so we will never be able to figure out the exact sequence of a peptide that contains any of these amino acids. Nonetheless, you will see that we can learn a whole lot about peptides from their spectrum, even if some ambiguity remains unresolved.
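One way to spot these collisions is to group the amino acids by mass. The snippet below is a small sketch that assumes the standard integer masses for the 20 amino acids; the helper name `indistinguishable_groups` is hypothetical.

```python
from collections import defaultdict

# Standard integer masses, paired position by position with the one-letter codes.
LETTERS = "GASPVTCILNDKQEMHFRYW"
VALUES = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114,
          115, 128, 128, 129, 131, 137, 147, 156, 163, 186]
MASSES = dict(zip(LETTERS, VALUES))


def indistinguishable_groups(masses):
    """Map each mass shared by several amino acids to the amino acids that share it."""
    by_mass = defaultdict(list)
    for amino_acid, mass in masses.items():
        by_mass[mass].append(amino_acid)
    # Keep only masses that more than one amino acid maps to.
    return {m: sorted(group) for m, group in by_mass.items() if len(group) > 1}


print(indistinguishable_groups(MASSES))  # {113: ['I', 'L'], 128: ['K', 'Q']}
```

With integer masses, 20 amino acids collapse to 18 distinct values, which is why algorithms in this setting often work with masses rather than letters.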
