TY - JOUR
T1 - Fundamental amino acid mass distributions and entropy costs in proteomes
AU - Lehmann, Jean
AU - Libchaber, Albert
AU - Greenbaum, Benjamin D.
N1 - Publisher Copyright: © 2016 Elsevier Ltd
PY - 2016/12/7
Y1 - 2016/12/7
AB - We examine whether the frequency of amino acids across an organism's proteome is primarily determined by optimization to function or other factors, such as the structure of the genetic code. Considering all available proteins together, we first point out that the frequency of an amino acid in a proteome negatively correlates with its mass, suggesting that the genome preserves a fundamental distribution ruled by simple energetics. Given the universality of such distributions, one can use outliers, cysteine and leucine, to identify amino acids that deviate from this simple rule for functional purposes and examine those functions. We quantify the strength of such selection as the entropic cost outliers pay to defy the mass-frequency relation. Codon degeneracy of an amino acid partially explains the correlation between mass and frequency: light amino acids being typically encoded by highly degenerate codon families, with the exception of arginine. While degeneracy may be a factor in hard wiring the relationship between mass and frequency in proteomes, it does not provide a complete explanation. By examining extremophiles, we are able to show that this law weakens with temperature, likely due to protein stability considerations, thus the environment is essential.
KW - Amino acid statistics
KW - Extremophiles
KW - Genetic code origin
KW - Genome evolution
UR - http://www.scopus.com/inward/record.url?scp=84988941069&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2016.08.011
DO - 10.1016/j.jtbi.2016.08.011
M3 - Article
C2 - 27544420
AN - SCOPUS:84988941069
SN - 0022-5193
VL - 410
SP - 119
EP - 124
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
ER -