FAQ for "insufficient data" in ConSurf

What does "insufficient data" mean?

ConSurf is unable to assign a meaningful conservation grade to amino acids for which the data are insufficient. The most common reason for insufficient data is that there are too few sequences in the multiple sequence alignment -- either too few altogether, or too few that extend into the region of certain amino acids. Another reason may be inadequate diversity in the multiple sequence alignment.

If fewer than 6 amino acids occupy a position in the multiple sequence alignment (that is, fewer than 6 sequences have a residue rather than a gap at that position), ConSurf deems the position to have insufficient data. Even when this minimum is satisfied, if the confidence interval for a sequence position spans four or more color grades, the position is deemed to have insufficient data.
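The two rules above can be sketched in a few lines of Python. This is only an illustration, not ConSurf's actual code: the function names, the set of gap characters, and the inclusive way of counting how many grades an interval spans (grades 3 through 6 counted as four grades) are all assumptions made for the example.

```python
# Illustrative sketch only; not ConSurf's implementation.
GAP_CHARS = {"-", "."}   # assumption: characters treated as gaps
MIN_RESIDUES = 6         # from the text: fewer than 6 residues -> insufficient data
MAX_GRADE_SPAN = 4       # from the text: interval spanning 4 or more grades -> insufficient data


def residues_at(column):
    """Count the sequences that have a residue (not a gap) in one alignment column."""
    return sum(1 for c in column if c not in GAP_CHARS)


def has_insufficient_data(column, grade_low, grade_high):
    """Apply both rules to one alignment column.

    column      -- string with one character per sequence in the alignment
    grade_low   -- color grade (1-9) at one end of the confidence interval
    grade_high  -- color grade (1-9) at the other end
    """
    if residues_at(column) < MIN_RESIDUES:
        return True
    # Assumption: the span is counted inclusively, so 3..6 spans four grades.
    span = abs(grade_high - grade_low) + 1
    return span >= MAX_GRADE_SPAN


# A position present in only 4 of 30 sequences fails the first rule.
print(has_insufficient_data("AAAA" + "-" * 26, 1, 1))
# A well-populated position with a wide confidence interval fails the second rule.
print(has_insufficient_data("ACDEFGHIKL", 3, 6))
```

Under these assumptions, a position is flagged if either rule applies; the residue count is checked first because the confidence interval is of little use when so few sequences cover the position.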

Here is more information about ConSurf's calculation and determination of "insufficient data".

Why show conservation grades at all when there are insufficient data?

By default, residues with insufficient data will be colored accordingly. The default color is yellow.
Conservation grades calculated for residues with insufficient data can be shown by changing the display from the default (Display #1) to one of the other options. Click on the Display link, and the other display options will be shown in PE's lower left frame.

In most cases, conservation grades calculated by ConSurf will not be meaningful for positions with insufficient data. However, there may be exceptions. Consider, for example, a region of hypervariability with conservation grades of 1. An amino acid in the middle of this region might be deleted in, for example, all but 4 of 30 sequences in the multiple sequence alignment. Here it may make sense to show this "insufficient data" position as conservation grade 1. There may be other cases where external biological data support a calculated conservation grade despite it being judged to have insufficient data by ConSurf. Therefore, ConSurf offers the flexibility to show calculated conservation grades even for positions with insufficient data, a flexibility which must be used with caution.

There may also be cases where a few residues with insufficient data are not part of a region of interest, but are distracting when left colored as insufficient data. Because of their apparent irrelevance to the biological point to be made, an author may elect to hide the fact that these positions have insufficient data (but should mention that fact in the legend to the figure in the spirit of full disclosure). Again, this flexibility must be used with caution.

How can I reduce the number of amino acids with insufficient data?

If the PSI-BLAST search found more homologous sequences than you used in your ConSurf calculation, increasing the number of sequences used will generally reduce the number of positions with insufficient data. Try UniProt (as an alternative to SWISS-PROT) to obtain a larger number of sequences. You could also try multiple PSI-BLAST iterations to see whether this increases the number of sequences found (although the additional sequences will generally be more distantly related to your protein).

Feedback to Eric Martz.