Polymorphism, shared functions and convergent evolution of genes with sequences coding for polyalanine domains
- PMID: 14519685
- DOI: 10.1093/hmg/ddg329
Abstract
Mutations causing expansions of polyalanine domains are responsible for nine hereditary diseases. Other GC-rich sequences coding for some polyalanine domains were found to be polymorphic in humans. These observations prompted us to identify all sequences in the human genome coding for polyalanine stretches longer than four alanines and to establish their degree of polymorphism. We identified 494 annotated human proteins containing 604 polyalanine domains. Thirty-two percent (31/98) of tested sequences coding for more than seven alanines were polymorphic. The length of the polyalanine-coding sequence and its GCG or GCC repeat content are the major predictors of polymorphism. GCG codons are over-represented in human polyalanine-coding sequences. Our data suggest that GCG and GCC codons play a key role in the appearance and polymorphism of polyalanine-coding sequences. Grouping polyalanine-containing proteins in Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans by shared function shows that the majority are involved in transcriptional regulation. Phylogenetic analyses of the HOX, GATA and EVX protein families demonstrate that polyalanine domains arose independently in different members of these families, suggesting that convergent molecular evolution may have played a role. Finally, polyalanine domains in vertebrates are conserved between mammals and are rarer and shorter in Gallus gallus and Danio rerio. Together, our results show that the polymorphic nature of sequences coding for polyalanine domains makes them prime candidates for mutations in hereditary diseases and suggest that they have appeared in many different protein families through convergent evolution.
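As an illustration of the kind of screen the abstract describes, the following minimal Python sketch finds polyalanine stretches longer than four alanines in a protein sequence and tallies the alanine codons that encode one such stretch. It is not the authors' pipeline: the function names (`find_polyalanine_runs`, `alanine_codon_counts`), the toy sequences, and the assumption that the coding sequence is in frame and starts at the first codon are all hypothetical choices made for this example.

```python
import re
from collections import Counter

MIN_RUN = 5  # "longer than four alanines" read as five or more (assumption)


def find_polyalanine_runs(protein, min_run=MIN_RUN):
    """Return (start, length) for each run of at least min_run consecutive 'A'."""
    pattern = re.compile(r"A{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(protein)]


def alanine_codon_counts(cds, start, length):
    """Count codons encoding a run that begins at residue index `start`.

    Assumes `cds` is an in-frame coding sequence whose first codon
    corresponds to residue 0 of the protein (hypothetical setup).
    """
    codons = [cds[3 * i:3 * i + 3] for i in range(start, start + length)]
    return Counter(codons)


# Toy data, invented for illustration only.
protein = "MSAAAAAAGTAAAAK"
cds = "ATGTCC" + "GCG" * 3 + "GCC" * 2 + "GCA" + "GGCACC" + "GCT" * 4 + "AAA"

runs = find_polyalanine_runs(protein)
for start, length in runs:
    counts = alanine_codon_counts(cds, start, length)
    print(start, length, dict(counts))
```

In this toy input the six-alanine stretch is reported, while the four-alanine stretch falls below the threshold and is skipped. The per-run codon tally is the quantity one would inspect for GCG or GCC content.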