Untitled Document
 
     
Homology Modeling

The following document gives a basic look into homology modeling. It is based on personal experience and trial and error.

TEMPLATE IDENTIFICATION

Template identification is a crucial step in homology modeling. To use an art analogy, imagine yourself as a sculptor trying to build a certain vase out of clay. The clay can be thought of as the atoms and the vase as the correctly folded protein. To build this vase you would want to look at other, similar vases, and ideally you would find a nearly identical one to work from. The difficulty is that the vase you base yours on may be too big or too small, or may have an extra handle.

Proteins can be thought of in a similar way. A template may be too small, too large, too complex, or not similar enough. For proteins, the first step in identifying a template is to find structures with significant sequence similarity to your query sequence. Ideally you want templates with more than 40% sequence identity, since these can give models whose quality approaches that of experimental structures. Below 40% modeling becomes more difficult, and identities of roughly 20-35% fall in what is often called the twilight zone. At this level the alignment is a very important factor, if not the most important one. In general, the alignment of your sequence with the sequence of your template structure has the biggest effect on the final model.
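
The identity figures above can be made concrete with a small helper. This is a minimal sketch, assuming percent identity is counted over alignment columns where neither sequence has a gap (programs differ on the denominator), and the band names and cutoffs are rule-of-thumb assumptions rather than fixed limits.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q == t)
    return 100.0 * identical / len(pairs)


def identity_band(pid: float) -> str:
    """Rule-of-thumb bands; the cutoffs are assumptions, not hard limits."""
    if pid > 40.0:
        return "high"        # the ideal range described above
    if pid >= 35.0:
        return "moderate"
    if pid >= 20.0:
        return "twilight"    # roughly 20-35% identity
    return "very low"


if __name__ == "__main__":
    pid = percent_identity("ACDE-G", "ACDQHG")
    print(pid, identity_band(pid))  # 80.0 high
```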

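A good first approach to finding a template is a sequence-based search. A protein-protein BLAST (blastp) is a good way to go. Even better is PSI-BLAST, an iterative extension of gapped BLAST that builds a position-specific scoring matrix from the hits of each round and searches again with it, which lets it detect more distant homologs.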

BLAST searches your sequence against a database of sequences. Since we want a structural template, we want only the sequences of proteins whose structures are known, so we must limit the search to the Protein Data Bank (PDB) or some other structural repository.
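
Restricting the search to structures can be sketched as a BLAST+ command line. This assumes the NCBI BLAST+ tools are installed locally and that NCBI's preformatted `pdbaa` database (PDB protein sequences) has been downloaded; the file names and E-value cutoff are placeholders. Selecting the PDB database on the NCBI BLAST web page does the same job without a local install.

```python
import subprocess
from typing import List


def blast_command(query_fasta: str, out_path: str, db: str = "pdbaa",
                  iterations: int = 1, evalue: float = 1e-3) -> List[str]:
    """Build a BLAST+ command that searches only a structure database.

    iterations == 1 gives a plain blastp search; iterations > 1 switches to
    psiblast, which re-searches with a profile built from earlier rounds.
    """
    program = "blastp" if iterations == 1 else "psiblast"
    cmd = [program, "-query", query_fasta, "-db", db,
           "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]
    if iterations > 1:
        cmd += ["-num_iterations", str(iterations)]
    return cmd


def run_blast(cmd: List[str]) -> None:
    # Needs BLAST+ on PATH and a local copy of the database named in cmd.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(" ".join(blast_command("query.fasta", "hits.tsv")))
```

`-outfmt 6` asks for tab-separated output, which is easy to filter afterwards.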

The results of BLAST are a list of hits with their percent identity and statistical scores (E-values). You want the structure with the highest sequence identity over the largest span, and one that is large enough. If your query sequence is 300 residues long and a template has 90% identity over only 10 residues, it does not help much, except perhaps for modeling that specific portion of the protein.
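
The "highest identity over the largest span" criterion can be sketched by parsing BLAST+ tabular output and weighing identity against query coverage. This assumes the default `-outfmt 6` column order; the minimum coverage of 50% and the identity-times-coverage ranking are assumed heuristics, not a standard.

```python
from dataclasses import dataclass
from typing import List

# Default column order of BLAST+ "-outfmt 6":
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


@dataclass
class Hit:
    template: str   # sseqid, the PDB entry/chain that was hit
    pident: float   # percent identity over the aligned region
    qstart: int     # first aligned query residue (1-based)
    qend: int       # last aligned query residue
    evalue: float

    def coverage(self, query_length: int) -> float:
        """Fraction of the query spanned by this hit."""
        return (self.qend - self.qstart + 1) / query_length


def parse_outfmt6(text: str) -> List[Hit]:
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10])))
    return hits


def rank_templates(hits: List[Hit], query_length: int,
                   min_coverage: float = 0.5) -> List[Hit]:
    """Drop short hits, then sort by identity weighted by coverage (assumed heuristic)."""
    kept = [h for h in hits if h.coverage(query_length) >= min_coverage]
    return sorted(kept, key=lambda h: h.pident * h.coverage(query_length),
                  reverse=True)
```

With a 300-residue query, a hit of 90% identity spanning 10 residues has a coverage of about 0.03 and is dropped, matching the example above.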

This leads to the issue of using multiple templates. Ideally you would base your model on a single template, but this is not always possible. For example, your sequence may be composed of two distinct domains, in which case more than one template will be necessary.
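
One way to choose templates for a multi-domain query is a greedy coverage pass over the hits. This is a sketch under assumed rules: each template is described by the query residues it covers (1-based, inclusive) and its identity, and a template is only added if it covers at least `min_new` residues that no earlier choice covers. Both the greedy rule and the threshold are assumptions.

```python
from typing import List, Tuple

# One candidate template: (template id, query start, query end, percent identity),
# with 1-based inclusive query coordinates taken from a search such as BLAST.
Segment = Tuple[str, int, int, float]


def _uncovered(seg: Segment, covered: List[bool]) -> int:
    """Number of query residues in seg not yet covered by a chosen template."""
    _, start, end, _ = seg
    return sum(1 for i in range(start, end + 1) if not covered[i])


def pick_templates(segments: List[Segment], query_length: int,
                   min_new: int = 20) -> List[str]:
    """Greedy coverage: repeatedly take the template adding the most uncovered
    residues (ties go to higher identity) until none adds at least min_new."""
    covered = [False] * (query_length + 1)  # index 0 unused
    remaining = list(segments)
    chosen = []
    while remaining:
        best = max(remaining, key=lambda s: (_uncovered(s, covered), s[3]))
        if _uncovered(best, covered) < min_new:
            break
        chosen.append(best[0])
        for i in range(best[1], best[2] + 1):
            covered[i] = True
        remaining.remove(best)
    return chosen
```

Because coverage is ranked first, a slightly longer template can win over a shorter one with higher identity for the same domain; swap the key order if identity matters more for your protein.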

In short, BLAST is a great way to get started and often gives the best results. If BLAST fails to find good structures, or if you want to solidify your findings, a search of Pfam with HMMER is another good place to look.

Pfam is essentially a dynamic database of protein families and their multiple sequence alignments, with each family represented by a profile hidden Markov model built with HMMER. It is great for identifying certain protein domains or fold families. For instance, your sequence may belong to a certain fold family, say the Ig fold family. You could then look at the structures in this family and at their proposed functions, structures, etc. This gives you a basis of comparison for your structure. It is not only a good way to find templates but also a way to assess your final model against other family members.
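
Reading a Pfam search result can be sketched as follows, assuming HMMER3's `hmmscan` was run against a local, pressed copy of `Pfam-A.hmm` with `--tblout`, so each non-comment row starts with the family name, its accession, the query name and accession, and then the full-sequence E-value. The 1e-5 cutoff is an arbitrary assumption; hmmscan's `--cut_ga` option, which applies Pfam's curated gathering thresholds, is an alternative.

```python
from typing import List, Tuple


def significant_families(tblout_text: str,
                         max_evalue: float = 1e-5) -> List[Tuple[str, str, float]]:
    """Return (family name, accession, full-sequence E-value) for Pfam hits
    at or below max_evalue, best first."""
    families = []
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=18)
        name, accession, evalue = fields[0], fields[1], float(fields[4])
        if evalue <= max_evalue:
            families.append((name, accession, evalue))
    return sorted(families, key=lambda fam: fam[2])
```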

Another method for template identification is FUGUE. FUGUE is a sequence-structure homology recognition method: it compares your sequence with structural profiles built using environment-specific substitution tables, so it draws on structural information beyond plain sequence comparison. This makes it good for solidifying your BLAST results or for finding structures that a sequence search missed. Some proteins share 20% or less sequence identity yet have nearly identical structures, and FUGUE is one way to find these more elusive templates.

If all of the above fails, perhaps you have a novel fold, or a homology model cannot be built. One last approach is to consider what is already known or proposed about your protein, such as its function, and look for structures of proteins with a similar function. To complement this, you can run a secondary structure prediction with a number of algorithms using PELE, available at the SDSC Biology Workbench. This will help determine which secondary structural elements you may have, which you can then compare with any proposed template you find.
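
Comparing a prediction with a proposed template can be sketched as a per-column agreement score. This assumes the predicted states and the template's states (for example from a DSSP assignment of the template) have already been laid out on the same alignment columns, with '-' at gaps; the eight-to-three state reduction below is one common convention, not the only one.

```python
# Common reduction of DSSP's eight states to three (an assumed convention;
# some tools map G and I to coil instead).
_DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp: str) -> str:
    """Map a DSSP string to H/E/C, keeping '-' as an alignment gap."""
    return "".join(ch if ch == "-" else _DSSP_TO_3.get(ch, "C") for ch in dssp)


def ss_agreement(predicted: str, template: str) -> float:
    """Percent of alignment columns (no gap in either string) with the same state.

    Both strings must already be laid out on the same alignment columns.
    """
    if len(predicted) != len(template):
        raise ValueError("strings must cover the same alignment columns")
    cols = [(p, t) for p, t in zip(predicted, template) if p != "-" and t != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(p == t for p, t in cols) / len(cols)
```

Low agreement concentrated in one region is a hint to revisit the alignment or the template choice there.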

 

ALIGNMENT

Alignment of your sequence with your template has one of the biggest, if not the biggest, impacts on your structure. Ideally you would like an alignment with few gaps and high sequence identity, but this is not always possible. Complicating the situation, there are several different alignment routines. Each of them uses a scoring (substitution) matrix to determine how likely one residue is to line up with another; common ones are BLOSUM, GONNET, PAM, and IDENTITY. By brute force, you can try the different matrices in the alignment program and see which gives the best sequence identity while minimizing gaps.

Another important factor is which regions of the protein matter most. If a site is important for function it will likely be conserved, so you may want to focus the attention of your alignment on that area. Perhaps there is a cluster of acidic residues that the template and your sequence share; if so, an alignment that puts these regions together would be more believable.
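
The brute-force comparison of matrices can be sketched as a scoring step applied to the alignments your program produced with each matrix. The selection rule here (highest identity first, then fewest gap openings) is an assumption; a conserved functional site lining up correctly should still override it.

```python
from typing import Dict, Tuple


def alignment_stats(aq: str, at: str) -> Tuple[float, int]:
    """Return (percent identity over ungapped columns, number of gap openings)."""
    pairs = [(a, b) for a, b in zip(aq, at) if a != "-" and b != "-"]
    pid = 100.0 * sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0
    gap_opens = 0
    for seq in (aq, at):
        in_gap = False
        for ch in seq:
            if ch == "-" and not in_gap:
                gap_opens += 1
            in_gap = ch == "-"
    return pid, gap_opens


def best_alignment(candidates: Dict[str, Tuple[str, str]]) -> str:
    """Name of the candidate with the highest identity, then fewest gap openings."""
    def key(name: str) -> Tuple[float, int]:
        pid, gaps = alignment_stats(*candidates[name])
        return pid, -gaps
    return max(candidates, key=key)
```

Here `candidates` maps a matrix name to the (query, template) alignment obtained with it; the strings are whatever your alignment program returns.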

Another factor to consider is secondary structure. If your template has an alpha helix and your sequence aligns there but has a proline in the middle of the helix, you may want to change the alignment, either by using a different matrix or by adjusting the alignment manually. Biochemical rules still apply. Secondary structure prediction programs such as PELE are a good way to approximate the secondary structure of your protein, and aligning secondary structure elements can greatly improve the accuracy of your model.
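
The proline check can be automated along an alignment. This sketch assumes the template's secondary structure is given as one H/E/C character per alignment column, and treats prolines within the first three helix positions as tolerated (proline is common in the first turn of a helix); that cutoff is an assumed rule of thumb.

```python
from typing import List


def prolines_in_helices(aligned_query: str, template_ss: str,
                        skip_first: int = 3) -> List[int]:
    """Return 1-based query residue numbers of prolines that fall inside a
    template helix, ignoring the first `skip_first` helix columns."""
    if len(aligned_query) != len(template_ss):
        raise ValueError("query and template SS must cover the same columns")
    flagged = []
    query_pos = 0   # residue number in the ungapped query
    helix_run = 0   # how far into the current template helix we are
    for q, s in zip(aligned_query, template_ss):
        if q != "-":
            query_pos += 1
        helix_run = helix_run + 1 if s == "H" else 0
        if q == "P" and s == "H" and helix_run > skip_first:
            flagged.append(query_pos)
    return flagged
```

Each flagged residue is a place where shifting the alignment by a few positions, or opening a gap, may give a more plausible model.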

There are several other alignment parameters, including how heavily to penalize gaps (typically with separate penalties for opening and extending a gap). By experimenting with a variety of combinations you can find the best fit for your situation.
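
To see how the gap penalty shapes an alignment, here is a minimal Needleman-Wunsch global alignment. It is a simplified sketch: it uses a single linear gap penalty rather than the separate open/extend penalties of real alignment programs, and scores residues with an IDENTITY-style matrix (+1 for a match, 0 otherwise) as an assumption.

```python
from typing import Callable, Tuple


def identity_score(a: str, b: str) -> int:
    """IDENTITY-style scoring: +1 for a match, 0 for a mismatch."""
    return 1 if a == b else 0


def global_align(s: str, t: str, gap: float = -1.0,
                 score: Callable[[str, str], float] = identity_score
                 ) -> Tuple[float, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(s), len(t)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,   # gap in t
                          F[i][j - 1] + gap)   # gap in s
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif j == 0 or (i > 0 and F[i][j] == F[i - 1][j] + gap):
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    print(global_align("ACDEF", "ACF"))
```

Making `gap` more negative (for example -3.0) pushes the aligner toward fewer gaps and more mismatches; passing a different `score` function is where a BLOSUM or PAM lookup would go.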

 

BUILDING THE MODEL

The actual building of the model is rather straightforward. Programs like MODELLER, which is implemented in molecular modeling packages such as DS Modeling and InsightII from Accelrys, have this built in. The alignment of your sequence with the template structure is submitted to MODELLER, and the program builds a structure based on that alignment by satisfying spatial restraints derived from it. You can also add custom restraints to the model, for instance if you know that certain cysteines form disulfide bridges or which prolines are cis prolines.
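
MODELLER reads the alignment from a file in its PIR-style format. This is a minimal sketch for one template and one target, assuming you know the template's chain and first and last residue numbers; the 10-field header lines follow the layout shown in MODELLER's tutorials, but check them against the manual for your version. All codes and names here are hypothetical.

```python
def pir_entry(code: str, header: str, aligned_seq: str, width: int = 75) -> str:
    """One PIR entry: '>P1;code', the header line, then the sequence ending in '*'."""
    body = aligned_seq + "*"
    lines = [f">P1;{code}", header]
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines)


def modeller_alignment(template_code: str, chain: str, first_res: int,
                       last_res: int, template_aln: str,
                       target_code: str, target_aln: str) -> str:
    """Two-entry alignment file: template structure first, then the target."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    # structureX:code:start:chain:end:chain:name:source:resolution:R-factor
    tmpl_header = (f"structureX:{template_code}:{first_res}:{chain}:"
                   f"{last_res}:{chain}:::-1.00:-1.00")
    target_header = f"sequence:{target_code}:::::::0.00: 0.00"
    return (pir_entry(template_code, tmpl_header, template_aln) + "\n\n"
            + pir_entry(target_code, target_header, target_aln) + "\n")


if __name__ == "__main__":
    print(modeller_alignment("1abc", "A", 1, 5, "ACD-EF", "query", "ACDKEF"))
```

The file can then be given to MODELLER's AutoModel class through its `alnfile`, `knowns` (template code) and `sequence` (target code) arguments; older releases spell the class `automodel`. Custom restraints such as known disulfide bridges are added in the MODELLER script itself, which this sketch does not cover.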

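Before building your model, it is a very good idea to make sure your protein has a force field assigned to it and that the potentials are set correctly. A force field is basically a set of definitions of atom and bond characteristics (atom types, partial charges, equilibrium bond lengths and angles, force constants, and so on) together with the energy functions that use them.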
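
As an illustration of what a force field defines, here are three typical energy terms in the style of CHARMM and AMBER: a harmonic bond stretch, a 12-6 Lennard-Jones term, and Coulomb electrostatics. This is a simplified sketch; the functional forms are standard, but any parameter values you feed it in testing are placeholders, not values from a real force field.

```python
# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2); force fields differ
# slightly in the exact value they use.
COULOMB_KCAL = 332.06


def bond_energy(r: float, r0: float, k: float) -> float:
    """Harmonic bond stretch E = k (r - r0)^2, CHARMM/AMBER style (no 1/2)."""
    return k * (r - r0) ** 2


def lennard_jones(r: float, epsilon: float, rmin: float) -> float:
    """12-6 van der Waals term; equals -epsilon at r = rmin."""
    ratio = rmin / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def coulomb(r: float, qi: float, qj: float, dielectric: float = 1.0) -> float:
    """Electrostatic energy (kcal/mol) for charges in e and distance in Angstrom."""
    return COULOMB_KCAL * qi * qj / (dielectric * r)


def total_energy(bonds, pairs) -> float:
    """bonds: (r, r0, k) tuples; pairs: (r, epsilon, rmin, qi, qj) nonbonded tuples."""
    energy = sum(bond_energy(r, r0, k) for r, r0, k in bonds)
    energy += sum(lennard_jones(r, eps, rmin) + coulomb(r, qi, qj)
                  for r, eps, rmin, qi, qj in pairs)
    return energy
```

If atom types or charges are missing or wrong, these terms are computed from bad parameters, which is why checking the force field assignment before building and refining matters.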

Once the model is built it can be further refined and analyzed.

 

REFINEMENT AND ANALYSIS

When building a model, the general question is what a given sequence's structure will look like. Once we have a model, we can ask the reverse question: given a structure, does this sequence make sense? A program called Verify3D does this and gives a statistical measure of how good our structure is. It identifies, residue by residue, regions of low probability and thus flags misfolded regions of the protein. The program gives a low threshold and a high threshold. A score below the low threshold almost certainly means an incorrect structure, and a score above the high threshold means an almost certainly correct structure. Homology models usually fall somewhere in between; again, similarity to the template will greatly affect this.
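
The residue-level idea can be sketched as follows: smooth per-residue profile scores over a sliding window, report runs that fall below a cutoff, and place a total score relative to the program's low and high thresholds. This is not Verify3D's own code; the 21-residue window, the cutoff, and the thresholds are assumptions to be replaced with the values your program reports.

```python
from typing import List, Tuple


def window_average(scores: List[float], window: int = 21) -> List[float]:
    """Average each residue's score over a centred window, truncated at chain ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def low_regions(scores: List[float], cutoff: float,
                window: int = 21) -> List[Tuple[int, int]]:
    """1-based (start, end) residue runs whose smoothed score is below cutoff."""
    runs: List[Tuple[int, int]] = []
    start = None
    smoothed = window_average(scores, window)
    for pos, value in enumerate(smoothed, start=1):
        if value < cutoff and start is None:
            start = pos
        elif value >= cutoff and start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:
        runs.append((start, len(smoothed)))
    return runs


def judge_total(total: float, low: float, high: float) -> str:
    """Place a total score relative to the program's low and high thresholds."""
    if total < low:
        return "below low threshold: probably incorrect"
    if total > high:
        return "above high threshold: probably correct"
    return "in between: typical of homology models"
```

Running the same check on a model before and after a refinement step, and comparing the flagged regions and the total score, is one simple way to make that before/after judgment quantitative.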

Since this is a quantitative method, it gives a good measure of whether a given refinement was beneficial or detrimental.



     

 


This site is maintained by Arzhang Fallahi
Last Updated: August 2, 2004