An Introduction to Protein Identification
What is protein identification?
As an integral part of proteomics research, protein identification means determining which protein is present in a sample, or characterizing that protein. But how can one identify a protein? The most direct answer is to determine its amino acid sequence, but the process is time-consuming and laborious: it requires digesting the protein with several proteases, running gels, measuring masses, and more. After all those steps, one still has to check whether the protein of interest has already been identified.
Why do we need protein identification?
Proteomics emerged on the foundation of the Human Genome Project, and the discipline aims to determine the presence and quantity of proteins. Like gene sequencing, protein identification (determining the amino acid sequence of a protein) is important for building a systematic understanding of key biological questions, for example protein structure, protein function, and protein evolution.
How do we do protein identification?
The main tool for identifying proteins is mass spectrometry, with the resulting data interpreted by database searching, de novo sequencing, or peptide sequence tags. Identification is done in two main ways:
- Peptide Mass Fingerprinting (PMF). PMF takes the peptide masses derived from a spectrum and checks them against a database of predicted peptide masses. These predicted masses are obtained by digesting, typically in silico, a list of well-documented protein sequences. If a protein sequence has a significant number of predicted masses that match the experimental values, there is a good chance that the given protein is present in the sample.
- Tandem MS. Tandem MS usually applies collision-induced dissociation. This process breaks peptides along the peptide backbone, and the resulting fragmentation makes it possible to compare the observed fragment masses with a database of predicted masses.
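The peptide mass fingerprinting idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production search engine: it assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), monoisotopic residue masses, neutral peptide masses, and a fixed mass tolerance. The function names and the toy database entries (`protA`, `protB`) are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Simplified in silico trypsin digest: cut after K/R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing peptide with no K/R at the end
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.2):
    """Number of observed masses within tol Da of any predicted peptide mass."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - pred) <= tol for pred in predicted)
        for obs in observed_masses
    )


def rank_candidates(observed_masses, database, tol=0.2):
    """Rank database proteins by how many observed masses they explain."""
    scores = [(name, count_matches(observed_masses, seq, tol))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy database with made-up sequences, for illustration only.
database = {"protA": "MKTAYIAKQR", "protB": "GGSLLKPEWR"}
observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
print(rank_candidates(observed, database))
```

Real search tools score matches statistically rather than by a raw count, and they also account for missed cleavages and modifications; the raw count is used here only to keep the sketch short.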
In addition to Tandem MS, there is a closely related technique: peptide fragmentation fingerprinting, referred to as PFF. This approach uses the fragmentation pattern generated from a single peptide, itself a product of enzymatic digestion, to identify that peptide.
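To make the fragment comparison concrete, the sketch below predicts the singly charged b and y ions that backbone fragmentation of a peptide would produce and counts how many observed peaks they explain. It is a rough illustration only: it assumes monoisotopic masses, charge 1+, no modifications, and a simple shared-peak count as the score. The function names and the example peptides are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Predicted m/z of singly charged b ions (N-terminal) and y ions (C-terminal)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):  # one cleavage per backbone bond
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


def shared_peak_count(observed_mz, peptide, tol=0.5):
    """Number of observed peaks within tol of any predicted b or y ion."""
    b_ions, y_ions = fragment_ions(peptide)
    predicted = b_ions + y_ions
    return sum(
        any(abs(mz - pred) <= tol for pred in predicted)
        for mz in observed_mz
    )


# Simulate a spectrum from the peptide STK and score two candidate peptides.
b_ions, y_ions = fragment_ions("STK")
spectrum = b_ions + y_ions
for candidate in ("STK", "TSK"):
    print(candidate, shared_peak_count(spectrum, candidate))
```

The two candidates have the same total mass, so a mass fingerprint alone could not tell them apart; the fragment pattern can, which is the practical advantage of fragmentation-based identification.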
Whatever the identification method, challenges will be encountered: the scale and speed of identification, the exponential growth of protein databases, the accelerating generation of mass spectrometry data, and nonspecific digestion and post-translational modifications when identifying proteins in complex samples. To handle those challenges, new advances in proteomics are continually being developed, and some are already in use.