Predicting Disorder

Predicting Intrinsically Disordered Regions of Protein Chains
A support document for FirstGlance in Jmol.
Purpose I: Crystallized residues are missing or have a high temperature. Quite often some part of the protein sequence that was crystallized is missing in the 3D model, or has a higher temperature than the rest of the structure (see Local Uncertainty in the Views tab).

Empty basket at missing residues.

Missing Residues are reported by FirstGlance in Jmol. Regions with missing residues will be marked with "empty baskets". Quite often, segments that are missing or that have a high temperature are predicted to be intrinsically disordered. When missing, such segments are less likely to be resolved when the same protein sequence is crystallized under other conditions.

Purpose II: Residues were deleted before crystallization. Crystallization success is often improved by deleting flexible portions of a protein chain, especially intrinsically disordered portions. (Another reason for deletion is that a region is labile to degradation.) You may like to know whether portions of a chain that were deleted for a crystallization experiment are predicted to be intrinsically disordered. On the other hand, it is sometimes possible to obtain diffraction-quality crystals that include intrinsically disordered portions, in which case those disordered portions will likely be missing from the crystallographic model, unless they are stabilized by crystal contacts. (Example: in 3b0z, flexibility of the N terminus appears to be functionally important, and it is predicted to be intrinsically disordered. Nevertheless, it formed a helix in the crystal, stabilized by crystal contacts.)

For an introduction to intrinsic disorder in proteins, see the Proteopedia article Intrinsically Disordered Protein. There you will also find a list of five or more prediction servers. Below are instructions for using one of these servers, FoldIndex. If the results are important to you, try other servers to see how well they agree.

Before you begin the procedure below, take a careful look at the Missing Residues report in FirstGlance in Jmol.

Copy the sequence of the chain of interest. In FirstGlance, under the leftmost tab (usually labeled with the PDB code), click on Sequences. (These links will take you directly to sequences. In contrast, the links in the Resources Tab will take you to a general page on the PDB entry, where you will have to find the sequences.)
- Note that the sequences available from OCA, PDB-Europe, and PDB-USA (RCSB) span only the sequence range used for structure determination, which is quite often not the full-length sequence of the molecule in question. These crystallized sequences are suitable for Purpose I above. The FASTA sequences available from OCA are most convenient for copying.
- Full length sequences are available from UniProt. These are suitable for Purpose II above.

Go to the FoldIndex Server.

Paste the sequence into the box. (FoldIndex will ignore the FASTA description line, beginning ">".)
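If you want to check what you copied before pasting it, the following is a minimal Python sketch, not part of FirstGlance or FoldIndex. It drops the FASTA description line, joins the residue lines, and reports the length. The function name `fasta_to_sequence` and the example record are made up for illustration, and the sketch assumes a single-record FASTA text.

```python
def fasta_to_sequence(fasta_text: str) -> str:
    """Return the residues of a single-record FASTA text as one unbroken string."""
    stripped_lines = (line.strip() for line in fasta_text.splitlines())
    # Skip blank lines and the description line, which begins with ">".
    residues = "".join(
        line for line in stripped_lines if line and not line.startswith(">")
    )
    # Remove any spaces that were copied along with the residues.
    return "".join(residues.split())


# Made-up record for illustration only; it is not a real PDB or UniProt entry.
example = """>hypothetical_chain_A made-up description line
MKTAYIAKQR QISFVKSHFS
RQLEERLGLI
"""

sequence = fasta_to_sequence(example)
print(sequence)
print(len(sequence))
```

The length printed here is the same kind of number you will need later when comparing the crystallized sequence with the full-length sequence.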

Click the Process button. Now you have your results.
- Caution! Don't look for predicted disorder by sequence number! Why? FoldIndex numbers the sequence starting at one. However, the first residue in a chain is often not numbered one in PDB files. Worse, the sequence numbering in PDB files is sometimes not consecutive because of numbering according to an ancestral sequence, leading to insertions and deletions with bizarre numbering. Keep reading ...
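To illustrate the caution above, here is a small Python sketch with entirely hypothetical residue labels. It shows how a position counted from one (as FoldIndex counts) can differ from the residue label in a PDB file when numbering starts elsewhere, includes insertion letters, or skips numbers. The list `pdb_labels` and the function `label_at_position` are assumptions made for this example.

```python
# Hypothetical residue labels in chain order, invented for illustration.
# Numbering starts at 16, has insertions (18A, 18B), and skips 20 and 21.
pdb_labels = ["16", "17", "18", "18A", "18B", "19", "22", "23"]


def label_at_position(labels, position):
    """Return the residue label at a 1-based sequence position."""
    if not 1 <= position <= len(labels):
        raise IndexError(f"position {position} is outside 1..{len(labels)}")
    return labels[position - 1]


for position in (1, 4, 7):
    print(position, "->", label_at_position(pdb_labels, position))
```

In this made-up chain, position 4 carries the label 18A and position 7 carries the label 22, which is why searching by sequence rather than by number is the safer approach.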

Purpose I: Crystallized residues are missing.

To find a segment that FoldIndex predicts to be disordered, use the sequence capability of the Find.. tool (under the Tools Tab).
1. Quick check: Copy the segment predicted to be disordered and see if it is found. If it is, then none of it is missing in the model! As an example, we'll use 2ace. FoldIndex marks predicted disorder in red. Here we have copied the first red segment:
  
  Be sure to remove the spaces when you paste the sequence into the slot in the Find.. tool:
  - Incorrect query: sequence=KPWSGVWNA STYPNNCQQY VDEQFPGFSG SEMWNPNRE
  - Correct query: sequence=KPWSGVWNASTYPNNCQQYVDEQFPGFSGSEMWNPNRE
  If the red sequence is found, then none of the residues in that sequence are missing in the 3D model, and it was not disordered in the crystal.
2. If the red sequence is not found, some or all of the red sequence is missing from the 3D model. If even one residue in the sequence is missing, the sequence will not be found.
  
  To locate the missing residues in the 3D model, copy a short sequence of 4-5 residues that precedes or follows the putatively disordered segment. Use this sequence in the Find.. tool.
  
  Be sure to remove the spaces when you paste the sequence into the slot in the Find.. tool:
  - Incorrect query: sequence=RIM HY
  - Correct query: sequence=RIMHY
  Once you locate the general area with a preceding or following sequence fragment, you will see an "empty basket" (marking missing residues) nearby. Touch or click on the end(s) of the chain near that basket to report the residues at the "broken end(s)".
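The query preparation described above can be sketched in Python. This is only an illustration under stated assumptions: `make_find_query` and `flanking_fragments` are hypothetical helper names, and the toy sequence is made up. The first helper removes spaces from a copied segment and adds the `sequence=` prefix shown in the examples above. The second returns the short fragments that precede and follow a segment in a sequence you have as text.

```python
def make_find_query(copied_segment: str) -> str:
    """Remove all whitespace and prefix the result with 'sequence='."""
    return "sequence=" + "".join(copied_segment.split())


def flanking_fragments(full_sequence: str, segment: str, flank: int = 5):
    """Return (preceding, following) fragments of up to `flank` residues."""
    start = full_sequence.find(segment)
    if start == -1:
        raise ValueError("segment not found in sequence")
    end = start + len(segment)
    return full_sequence[max(0, start - flank):start], full_sequence[end:end + flank]


# Segments copied with spaces, as in the examples above.
print(make_find_query("KPWSGVWNA STYPNNCQQY VDEQFPGFSG SEMWNPNRE"))
print(make_find_query("RIM HY"))

# Made-up toy sequence, used only to show the flanking fragments.
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
before, after = flanking_fragments(toy_sequence, "QISFVKSHFS")
print(make_find_query(before), make_find_query(after))
```

A plain substring search behaves like the quick check in step 1: if even one residue of the segment is absent from the text being searched, the segment is not found.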
Purpose II: Residues were deleted before crystallization.

To find out whether residues were deleted from the full length sequence before crystallization, check the lengths of the sequences.
1. To get the length of the experimental crystallized sequence, click on Sequences under the Molecule Information Tab, and then click on the desired chain under OCA. Note the sequence length.
2. The full length is given in UniProt. Under the Molecule Information Tab, click on Sequences, and then in the lower left (help) panel of FirstGlance, under full-length genomic sequences, click on the chain of interest under UniProt. This will take you to the Sequences section (orange arrow below in snapshot), where you will find the full length (red arrow below in snapshot):
  
  In our example, 2ace, a protein chain of length 537 was crystallized, but the full length sequence is 586 amino acids. Note that in some cases, an expression tag may have been added to the crystallized sequence.
3. What portions were deleted (or added?) before crystallization? This can best be answered by aligning the crystallized sequence with the full-length sequence.
  - While you are at UniProt, click the FASTA button (magenta arrow in the snapshot above) and copy the full-length FASTA sequence.
  - At UniProt, click on the Align tab at the top of the page (red arrow in the snapshot below).
  - Paste the two FASTA sequences into the box. (You can enlarge the box by dragging the lower right corner to help see what you are doing.) The box should look similar to this. (A blank line between the FASTA sequences is OK but not necessary).
4. After clicking the Run Align button (below the box), the resulting alignment shows (in the example of 2ace) that the first 21 residues and the last 28 residues were deleted in the crystallized chain.
  
  FoldIndex does not make predictions for the first and last 25 residues due to the size of its scanning interval (51 residues by default). In a different case having larger deletions, the predictions of FoldIndex for the full-length sequence would be more useful.
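The arithmetic behind the last two steps can be sketched in Python. This is a rough illustration, not a replacement for the alignment: it assumes the crystallized sequence is an exact, contiguous piece of the full-length sequence, with no expression tag and no mutations. The function names and the toy sequences are hypothetical. The second function assumes a centered window with an odd number of residues, which is consistent with the 51-residue interval and the 25 unpredicted residues at each end stated above.

```python
def terminal_deletions(full_length: str, crystallized: str):
    """Return (n_terminal, c_terminal) counts of residues absent from the
    crystallized sequence, assuming it is an exact substring of full_length."""
    start = full_length.find(crystallized)
    if start == -1:
        raise ValueError("crystallized sequence is not an exact substring")
    return start, len(full_length) - start - len(crystallized)


def unpredicted_per_end(window: int = 51) -> int:
    """Residues at each end with no prediction for a centered odd window."""
    return (window - 1) // 2


# Made-up toy sequences: 3 residues removed at the start, 4 at the end.
toy_full = "MSG" + "KTAYIAKQRQ" + "LEHH"
toy_crystallized = "KTAYIAKQRQ"
print(terminal_deletions(toy_full, toy_crystallized))

# Consistency check using the 2ace numbers given in the text above.
print(586 - 537 == 21 + 28)
print(unpredicted_per_end(51))
```

When the exact-substring assumption fails, for instance when an expression tag was added, the alignment described above remains the way to see what was deleted or added.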
See also How To Align Protein Sequences and Display Multiple Sequence Alignments.

Suggestions for improvement? Please contact