PLoS Comput Biol. 2018 Dec 31;14(12):e1006237.
doi: 10.1371/journal.pcbi.1006237. eCollection 2018 Dec.

Statistical investigations of protein residue direct couplings

Andrew F Neuwald et al. PLoS Comput Biol. 2018.

Abstract

Protein Direct Coupling Analysis (DCA), which predicts residue-residue contacts based on covarying positions within a multiple sequence alignment, has been remarkably effective. This suggests that there is more to learn from sequence correlations than is generally assumed, and calls for deeper investigations into DCA and perhaps into other types of correlations. Here we describe an approach that enables such investigations by measuring, as an estimated p-value, the statistical significance of the association between residue-residue covariance and structural interactions, either internal or homodimeric. Its application to thirty protein superfamilies confirms that direct coupling (DC) scores correlate with 3D pairwise contacts with very high significance. This method also permits quantitative assessment of the relative performance of alternative DCA methods, and of the degree to which they detect direct versus indirect couplings. We illustrate its use to assess, for a given protein, the biological relevance of alternative conformational states, to investigate the possible mechanistic implications of differences between these states, and to characterize subtle aspects of direct couplings. Our analysis indicates that direct pairwise correlations may be largely distinct from correlated patterns associated with functional specialization, and that the joint analysis of both types of correlations can yield greater power. Data, programs, and source code are freely available at http://evaldca.igs.umaryland.edu.
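To make the idea of testing the association between coupling scores and structural contacts concrete, here is a minimal sketch that asks how surprising it is for the top-ranked DC pairs to be 3D contacts. It uses a plain hypergeometric tail probability, which is a swapped-in textbook test and not the S-score statistic described in the paper. The function names, the `top_n` parameter, and the toy data are all hypothetical.

```python
from math import comb

def hypergeom_tail(total, contacts, drawn, hits):
    """P(X >= hits) when `drawn` pairs are sampled without replacement
    from `total` pairs, of which `contacts` are true 3D contacts."""
    upper = min(drawn, contacts)
    favorable = sum(
        comb(contacts, k) * comb(total - contacts, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(total, drawn)

def contact_enrichment(dc_scores, contact_pairs, top_n):
    """Count contacts among the top_n highest-scoring pairs and return
    (hits, p-value). Assumes top_n <= len(dc_scores)."""
    ranked = sorted(dc_scores, key=dc_scores.get, reverse=True)
    hits = sum(1 for pair in ranked[:top_n] if pair in contact_pairs)
    n_contacts = sum(1 for pair in dc_scores if pair in contact_pairs)
    return hits, hypergeom_tail(len(dc_scores), n_contacts, top_n, hits)

# Toy data (hypothetical): six column pairs, two of which are contacts.
toy_scores = {(0, 2): 0.9, (1, 3): 0.8, (0, 3): 0.3,
              (1, 4): 0.2, (2, 4): 0.1, (0, 4): 0.05}
toy_contacts = {(0, 2), (1, 3)}
hits, p = contact_enrichment(toy_scores, toy_contacts, top_n=2)
print(hits, p)
```

In the toy case both top-ranked pairs are contacts, so the tail probability is 1/15. A fixed `top_n` is a simplification; the captions under Figures refer to optimal cut points, which this sketch does not attempt to reproduce.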

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Empirical values of Ŝ as a function of S yielded by 100,000 randomly shuffled DCA arrays (blue dots connected by lines), and by 100,000 DCA arrays derived from column-permuted MSAs, in which both the order of the columns and the order of the residues within each column were randomly permuted (red triangles connected by lines).
Solid straight lines represent agreement of Ŝ with S, and the dashed curves represent an error range of two standard deviations. Results are shown for six of the domains listed in Table 2, designated by their corresponding pdb identifiers 3fhkF, 1ijxA, 4cmlA, 1olzA, 1k30A, 1wznA, ordered by increasing numbers of sequences in their corresponding MSAs. For 1wznA, the additional data points (faint red triangles connected by a dashed line) correspond to an MSA of 5,117 sequences randomly drawn from the original MSA.
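The column-permuted null described in this caption can be sketched as follows: shuffle the residues within each alignment column, then shuffle the column order. This is only an illustration under stated assumptions; the function name `column_permute_msa`, the representation of the MSA as a list of equal-length strings, and the toy alignment are hypothetical, and the authors' own programs may differ in detail.

```python
import random

def column_permute_msa(msa, rng):
    """Return a permuted copy of an MSA (list of equal-length strings).

    Residues are shuffled independently within each column, which keeps
    per-column composition but breaks between-column covariation; the
    column order is then shuffled as well.
    """
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)      # permute residues within the column
    rng.shuffle(columns)      # permute the order of the columns
    return ["".join(row) for row in zip(*columns)]

# Toy alignment (hypothetical): four sequences, five columns.
toy_msa = ["ACDEF", "ACDEG", "TCDKF", "TCNKG"]
permuted = column_permute_msa(toy_msa, random.Random(0))
print(permuted)
```

Passing an explicit `random.Random` instance keeps the permutation reproducible, which is convenient when many null alignments are generated.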
Fig 2. S as a function of 3D distance ranges defining distinguished residue pairs.
See discussion in text. A. The S-scores obtained for distance ranges spanning zero to 16 Å. Column pairs corresponding to residue-to-residue distances below the indicated range were excluded from the analysis. B. Detailed plot of the span 2 to 5 Å. Each distance range covers 0.25 Å and is labeled by its upper limit.
Fig 3. S, SF and PPV scores as a function of various 3D structural coordinates for each of five protein domains.
Structures are ordered by the average of their scores over four methods: CCM (black lines), EVC (cyan lines), GSF (red lines) and PCV (green lines). Below the name of each domain are shown the mean values of F and of the optimal cut points X for the S-scores. The constant cut point values of x = F × ℓ are shown between the PPV and SF plots. The value of r (the maximum 3D distance defining contacting pairs) is 5 Å.
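A PPV at a fixed cut point can be computed as the fraction of the x highest-scoring pairs whose 3D distance is at most r. The sketch below assumes that reading, with x = F × ℓ rounded to an integer and r = 5 Å as in the caption. The caption does not define ℓ, so treating it as a length parameter such as the number of aligned columns is an assumption, and the function name and toy values are hypothetical.

```python
def ppv_at_cut(dc_scores, distances, F, ell, r=5.0):
    """Positive predictive value among the top x = round(F * ell) pairs.

    dc_scores: dict mapping (i, j) column pairs to DC scores.
    distances: dict mapping the same pairs to 3D distances in angstroms.
    A pair counts as a true positive when its distance is <= r.
    """
    if not dc_scores:
        raise ValueError("dc_scores is empty")
    x = max(1, round(F * ell))
    top = sorted(dc_scores, key=dc_scores.get, reverse=True)[:x]
    true_pos = sum(1 for pair in top if distances[pair] <= r)
    return true_pos / len(top)

# Toy values (hypothetical): with ell = 4 and F = 0.5 the cut point is x = 2.
toy_scores = {(0, 2): 0.9, (1, 3): 0.8, (0, 3): 0.1}
toy_dist = {(0, 2): 4.2, (1, 3): 7.5, (0, 3): 12.0}
ppv = ppv_at_cut(toy_scores, toy_dist, F=0.5, ell=4)
print(ppv)
```

Here one of the two top-ranked pairs lies within 5 Å, so the PPV is 0.5.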
Fig 4. Regression analysis of S for 60 Ran GTPase structures versus their crystal structure resolutions.
The coefficient of determination is R² = 0.00005, indicating that crystal structure resolution fails to explain the variability of S around its mean. The same R4 family MSA and parameters were used here as for the analyses in Table 4.
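For a simple linear regression with an intercept, the coefficient of determination equals the squared Pearson correlation between the two variables, so a value near zero means the fitted line explains almost none of the variance. A minimal sketch of that calculation follows; the function name and the toy numbers are hypothetical and are not the 60 Ran structures analyzed in the figure.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x, computed as the
    squared Pearson correlation (valid for one predictor plus intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Toy data (hypothetical): the second response has no linear trend in x.
resolution = [1.0, 2.0, 3.0, 4.0]
perfect = [2.0, 4.0, 6.0, 8.0]
flat = [1.0, -1.0, -1.0, 1.0]
print(r_squared(resolution, perfect), r_squared(resolution, flat))
```

The first call gives an R² of 1 and the second an R² of 0, the two extremes between which the reported 0.00005 sits very close to the latter.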
Fig 5. Residues in Ran involved in interacting pairs within the transition state structure (pdb: 1k5g) [37].
Sidechains of residues in RanBP1 contacting Ran are labeled in (A) and shown in magenta with dot clouds. The sidechain of Ran Lys130, which plays a role in the stimulation of GTP hydrolysis by RanGAP [37], is indicated. The GTP transition state analog and sidechains of Ran’s catalytic (active site) residues are represented as cyan and red sticks, respectively. A PyMOL session file corresponding to this figure is available at our website.
A. Sidechains of residue pairs contributing to the higher S for Ran in the transition state than in the ground state (pdb: 1k5d). These residues are represented as yellow spheres, except for the pivot point residues Phe90 and Val14, which are shown as bright blue spheres, and for two of the unlabeled catalytic residues shown in red (Thr24 and Thr42).
B. Ran residues forming pairs whose interactions remain stable over diverse conformational forms (shown as orange and bright blue spheres). These diverse forms include the Ran-RanBP1-RanGAP transition (pdb: 1k5g) and ground (pdb: 1k5d) states; Ran bound to its exchange factor, RCC1 (pdb: 1i2m); Ran bound to GDP (pdb: 3gj0); Ran bound to Ntf1 and GDP (pdb: 1a2k); and Ran bound to RanBP1 and CRM1 (pdb: 4hb2).

References

    1. Lunt B, Szurmant H, Procaccini A, Hoch JA, Hwa T, Weigt M. Inference of direct residue contacts in two-component signaling. Methods Enzymol. 2010;471:17–41. doi: 10.1016/S0076-6879(10)71002-8
    2. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci U S A. 2009;106(1):67–72. doi: 10.1073/pnas.0805923106
    3. Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Three-dimensional structures of membrane proteins from genomic sequencing. Cell. 2012;149(7):1607–21. doi: 10.1016/j.cell.2012.04.012
    4. Jones DT, Buchan DW, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28(2):184–90. doi: 10.1093/bioinformatics/btr638
    5. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One. 2011;6(12):e28766. doi: 10.1371/journal.pone.0028766
