TY - JOUR
T1 - Information-theoretic analysis of the reference state in contact potentials used for protein structure prediction
AU - Solis, Armando D.
AU - Rackovsky, Shalom R.
PY - 2010
Y1 - 2010
AB - Using information-theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information-based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasichemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information-theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. Results also suggest to use only sequences of similar composition in deriving contact potentials, to tailor the contact potential specifically for a test sequence.
KW - Contact potential
KW - Empirical potentials
KW - Hydrophobicity scale
KW - Information theory
KW - Protein structure prediction
KW - Query-specific potential
KW - Reference state
KW - Sequence-specific potential
KW - Threading
UR - http://www.scopus.com/inward/record.url?scp=77951241485&partnerID=8YFLogxK
U2 - 10.1002/prot.22652
DO - 10.1002/prot.22652
M3 - Article
C2 - 20034109
AN - SCOPUS:77951241485
SN - 0887-3585
VL - 78
SP - 1382
EP - 1397
JO - Proteins: Structure, Function and Bioinformatics
JF - Proteins: Structure, Function and Bioinformatics
IS - 6
ER -