From peptides to theoretical spectra
Last updated
Before trying to work out how to go from an experimental signal to the sequence of a peptide, we need to make sure we can model the process that yields that signal. For simplicity, everything we discuss in this chapter rests on the assumption that the mass-to-charge ratios of the fragments of a peptide are additive. For example, we assume that the mass-to-charge ratio of the peptide AG equals the sum of the mass-to-charge ratios of the two amino acids A and G. This is not necessarily true in the real world, but it is a reasonable approximation that allows us to come up with relatively simple algorithms.
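The additivity assumption can be written down in a few lines of Python. This is only a sketch: the names `RESIDUE_MASS` and `peptide_mass` are made up for illustration, and the integer masses are the ones used in this chapter.

```python
# Integer residue masses used in this chapter (dictionary name is illustrative).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def peptide_mass(peptide):
    """Mass of a peptide under the additivity assumption:
    the sum of the masses of its amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


print(peptide_mass("AG"))  # 71 + 57 = 128
```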
In order to use a mass spectrometer to figure out the sequence of a protein or peptide, it is necessary to "blast" the peptide to pieces; otherwise all we would find out is its total mass-to-charge ratio. That total does provide some information, because it constrains the possible length of the peptide. For a total mass $M$, the shortest peptide with that mass would be a string of tryptophan (W) residues, as tryptophan is the "heaviest" amino acid, with a mass-to-charge ratio of 186. In other words, if $n$ is the number of amino acids in the peptide, then $n \geq \lceil M / 186 \rceil$. The longest peptide with mass $M$ would be a string of glycine (G) residues, the "lightest" amino acid, with a mass-to-charge ratio of 57: $n \leq \lfloor M / 57 \rfloor$.
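As a small worked example, the two length bounds can be computed directly. The function name `length_bounds` is an assumption made for this sketch, and the value 476 is the mass of the peptide SELF used later in this chapter. The bounds are necessary conditions only: not every length in the range need correspond to an actual peptide of that mass.

```python
HEAVIEST = 186  # tryptophan (W)
LIGHTEST = 57   # glycine (G)


def length_bounds(total_mass):
    """Return (shortest, longest) possible peptide length for a total mass."""
    shortest = -(-total_mass // HEAVIEST)  # ceiling division with integers
    longest = total_mass // LIGHTEST       # floor division
    return shortest, longest


print(length_bounds(476))  # (3, 8)
```

SELF has four amino acids, which falls inside the range of 3 to 8.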
However, for a given mass M, many different sequences of amino acids may add up to M; that is, knowing only the mass M of the peptide, we cannot determine its sequence. We argue that, by measuring the masses of the fragments produced by breaking the peptide at random locations, we can constrain the set of possible sequences consistent with the mass M, in many cases resolving the sequence of the peptide of interest.
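To get a feel for how many sequences share the same mass, one can count them with a short dynamic program. This is a sketch under stated assumptions: it counts linear strings over the 20 amino acids (so rotations of a cyclic peptide are counted separately), it uses the integer masses from this chapter, and the names `RESIDUE_MASS` and `count_peptides` are illustrative.

```python
# Integer residue masses used in this chapter (dictionary name is illustrative).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_peptides(total_mass):
    """Number of linear amino-acid strings whose masses add up to total_mass."""
    ways = [0] * (total_mass + 1)
    ways[0] = 1  # the empty string is the only string of mass 0
    for m in range(1, total_mass + 1):
        # A string of mass m ends in some amino acid of mass <= m.
        ways[m] = sum(ways[m - mass] for mass in RESIDUE_MASS.values() if mass <= m)
    return ways[total_mass]


print(count_peptides(128))  # 4: K, Q, AG and GA
```

Even at mass 128 there are already four candidates, and the count grows quickly with the mass.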
First, some assumptions. We assume we are looking for the sequence of a peptide P of mass M. Our experiment starts with many copies of P, which are fragmented at random; the mass-to-charge ratio of each fragment is then measured with a mass spectrometer. We will also assume that each copy of the peptide is fragmented into at most two pieces. This is not true in the real world, but the assumption simplifies the algorithms we will develop.
It's time for an example. Let's assume we have a circular peptide KLFPWFNQYV, shown in the figure below.
We assume that the fragmentation process breaks a copy of the peptide into exactly two pieces. The process is random, so different copies of the peptide may break at different locations, as shown below.
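The random fragmentation of a circular peptide can be mimicked with a short simulation. This is a minimal sketch, not a model of real instrument behavior: it assumes every pair of cut positions is equally likely, and the function name `fragment_cyclic` is hypothetical.

```python
import random


def fragment_cyclic(peptide, rng):
    """Cut a circular peptide at two distinct random bonds; return both pieces."""
    n = len(peptide)
    i, j = sorted(rng.sample(range(n), 2))  # two distinct cut positions, i < j
    inner = peptide[i:j]                    # residues between the two cuts
    outer = peptide[j:] + peptide[:i]       # the rest, wrapping around the end
    return inner, outer


rng = random.Random(0)  # fixed seed so that a run can be repeated
for _ in range(5):
    print(fragment_cyclic("KLFPWFNQYV", rng))
```

Each call plays the role of one copy of the peptide; repeating it many times yields fragments starting at every position of the circle.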
To make it easier to work through examples, we'll switch to a shorter peptide: SELF. Fragmenting this cyclic peptide results in the following set of fragments: S, E, L, F, SE, EL, LF, FS, SEL, ELF, LFS, FSE, SELF (assuming that multiple copies of this peptide have been broken up at all possible locations that yield two pieces each). Note that we considered fragments that "wrap around" the end of the string since the peptide is circular. We will call the set of fragments generated (or more precisely, their masses), the theoretical spectrum of the peptide. We call this spectrum "theoretical" because it represents what we expect to see in the output of the instrument, rather than the actual signal we have measured. Throughout the following, we will interchangeably refer to the spectrum as the set of sub-peptides, or the set of masses.
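The list of fragments can be generated mechanically. The sketch below (the function name `cyclic_subpeptides` is an illustrative choice) handles the wrap-around by reading substrings from the peptide written twice in a row.

```python
def cyclic_subpeptides(peptide):
    """All fragments of a circular peptide, plus the whole peptide itself."""
    n = len(peptide)
    doubled = peptide + peptide  # lets a slice run past the end and wrap around
    fragments = []
    for length in range(1, n):
        for start in range(n):
            fragments.append(doubled[start:start + length])
    fragments.append(peptide)  # the whole peptide is counted once
    return fragments


print(cyclic_subpeptides("SELF"))
# ['S', 'E', 'L', 'F', 'SE', 'EL', 'LF', 'FS', 'SEL', 'ELF', 'LFS', 'FSE', 'SELF']
```

A circular peptide of length n has n fragments of each length from 1 to n - 1, plus the whole peptide, for n(n - 1) + 1 fragments in total (13 for SELF).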
The following table includes the masses (or mass-to-charge ratios) of all amino acids.
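| Amino acid | G | A | S | P | V | T | C | I | L | N | D | K | Q | E | M | H | F | R | Y | W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mass | 57 | 71 | 87 | 97 | 99 | 101 | 103 | 113 | 113 | 114 | 115 | 128 | 128 | 129 | 131 | 137 | 147 | 156 | 163 | 186 |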
Adding up these values, we can compute the masses of the theoretical spectrum of peptide SELF.
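| Fragment | Amino acid masses | Total mass |
|---|---|---|
| S | 87 | 87 |
| E | 129 | 129 |
| L | 113 | 113 |
| F | 147 | 147 |
| SE | 87 + 129 | 216 |
| EL | 129 + 113 | 242 |
| LF | 113 + 147 | 260 |
| FS | 147 + 87 | 234 |
| SEL | 87 + 129 + 113 | 329 |
| ELF | 129 + 113 + 147 | 389 |
| LFS | 113 + 147 + 87 | 347 |
| FSE | 147 + 87 + 129 | 363 |
| SELF | 87 + 129 + 113 + 147 | 476 |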
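The same computation can be sketched in Python. The names `RESIDUE_MASS` and `cyclic_spectrum` are illustrative, and the function returns the masses in increasing order, which is a presentation choice rather than something the definition requires.

```python
# Integer residue masses used in this chapter (dictionary name is illustrative).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def cyclic_spectrum(peptide):
    """Sorted masses of all fragments of a circular peptide, plus the whole peptide."""
    n = len(peptide)
    doubled = peptide + peptide  # lets a slice wrap around the end
    masses = []
    for length in range(1, n):
        for start in range(n):
            fragment = doubled[start:start + length]
            masses.append(sum(RESIDUE_MASS[aa] for aa in fragment))
    masses.append(sum(RESIDUE_MASS[aa] for aa in peptide))  # the whole peptide
    return sorted(masses)


print(cyclic_spectrum("SELF"))
# [87, 113, 129, 147, 216, 234, 242, 260, 329, 347, 363, 389, 476]
```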
If you look carefully at the table of amino acid masses, you will see that amino acids I (isoleucine) and L (leucine) have the same mass, as do K (lysine) and Q (glutamine). That means that a mass spectrometer alone can never tell us the exact sequence of a peptide that contains any of these amino acids: swapping I for L, or K for Q, leaves the spectrum unchanged. Nonetheless, you will see that we can learn a great deal about peptides from their spectra, even if some ambiguity remains unresolved.
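The ambiguous pairs can be found by grouping the amino acids by mass. This is a small sketch using the integer masses from this chapter; the variable names are illustrative.

```python
from collections import defaultdict

# Integer residue masses used in this chapter (dictionary name is illustrative).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Collect the amino acids that share each mass.
by_mass = defaultdict(list)
for aa, mass in RESIDUE_MASS.items():
    by_mass[mass].append(aa)

# Keep only the masses shared by more than one amino acid.
ambiguous = {mass: aas for mass, aas in by_mass.items() if len(aas) > 1}
print(ambiguous)  # {113: ['I', 'L'], 128: ['K', 'Q']}
```

With these pairs merged, the 20 amino acids correspond to only 18 distinct masses.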