Chem. Senses 26: 151–159, 2001
© Oxford University Press 2001
Updating the str and srj (stl) Families of Chemoreceptors in Caenorhabditis Nematodes Reveals Frequent Gene Movement Within and Between Chromosomes
Department of Entomology, University of Illinois at Urbana-Champaign, 505 S. Goodwin, Urbana, IL 61801, USA
Correspondence to be sent to: H. M. Robertson, Department of Entomology, University of Illinois at Urbana-Champaign, 505 S. Goodwin, Urbana, IL 61801, USA. e-mail hughrobe{at}uiuc.edu
Abstract
The seven transmembrane receptor (str) and srj (renamed from stl) families of chemoreceptors have been updated and the genes formally named following completion of the Caenorhabditis elegans genome sequencing project. Analysis of gene locations revealed that 84% of the 320 genes and pseudogenes in these two families reside on the large chromosome V. Movements to other chromosomes, especially chromosome IV, have nevertheless been relatively common, but only one has led to further gene family diversification. Comparisons with homologs in C. briggsae indicated that 22.5% of these genes have been newly formed by gene duplication since the species split, while also showing that four have been lost by large deletions. These patterns of gene evolution are similar to those revealed by analysis of the equally large srh family of chemoreceptors, and are likely to reflect general features of nematode genome dynamics. Thus large random deletions presumably balance the rapid proliferation of genes and their degeneration into pseudogenes, while gene movement within and between chromosomes keeps these nematode genomes in flux.
Introduction
The nematode Caenorhabditis elegans has a large repertoire of chemoreceptor genes (Troemel et al., 1995
Analysis of the large srh gene family, with 214 genes and 90 pseudogenes, confirmed that these patterns of gene evolution were common (Robertson, 2000), although in this case seven intron gains were inferred within the family. Analysis of the patterns of DNA deletions within the srh family showed that removal of pseudogenes probably results from the common occurrence of large deletions. Completion of sequencing of the C. elegans genome, and comparison with srh chemoreceptor orthologs in the partially completed sequence of the C. briggsae genome, also revealed that some genes have been completely lost from the C. elegans genome, while perhaps 28% of the srh family chemoreceptors in C. elegans have been newly formed since the split with C. briggsae. Finally, 82% of the srh family genes and pseudogenes occur on the large chromosome V; mapping of gene location on a phylogenetic tree revealed that movements to other chromosomes have been common (27 altogether), but only twice have led to amplification of new gene lineages on other chromosomes.
With completion of the sequencing of the C. elegans genome (C. elegans Sequencing Consortium, 1998; C. elegans Genome Consortium, 1999) it is now possible to provide a complete description and formal naming of the str family, as well as the related family previously called stl but here renamed srj. Phylogenetic analysis of these two gene families confirms several of the genome dynamics inferred from the srh family, including loss of C. briggsae orthologs, recent formation of many genes within C. elegans and the frequent occurrence of movements of genes between chromosomes. In addition, preliminary analysis of gene location within chromosome V revealed frequent gene movement within it.
Materials and methods
The public DNA databases were searched using TBLASTN for relatives of all major gene lineages of the str and srj (stl) families (Robertson, 1998
Results
The updated str and srj families
The updated str (seven transmembrane receptor) family consists of 189 genes and 74 pseudogenes. The related srj family consists of 39 genes and 18 pseudogenes. This family was previously called the stl family; however, this gene name has been reserved, so this family is being renamed in the sr (serpentine receptor) gene name series initiated by Troemel et al. (1995). In addition, 23 homologs in these two families are available from the partial genome sequence of the congener C. briggsae (see below). Conceptual translations were aligned with the previous dataset (Robertson, 1998), but for the phylogenetic analysis an alignment generated with Clustal X was employed. Phylogenetic analysis of this large dataset of 343 protein sequences is difficult. Maximum parsimony (MP) analysis, which was employed for these two families previously, using the heuristic algorithm of PAUP* yielded six equally parsimonious trees 37 647 steps long with a consistency index of 0.15 and required 7 days on a 300 MHz G3 Power Macintosh computer to examine >2 trillion trees. However, this island of very similar trees was found only once, leaving the possibility that shorter trees exist. Therefore neighbor-joining (NJ) was employed, followed by the heuristic minimum evolution (ME) algorithm of PAUP*, which examined >5 million rearrangements using tree-bisection-and-reconnection branch swapping, resulting in a tree 0.2% shorter. This tree is shown in four sections in Figure 1, with the srj family designated as the outgroup based on its location in preliminary phylogenetic analyses of the entire superfamily.
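The dataset size quoted above can be checked with a short tally. The following is a minimal sketch that uses only the counts stated in the text; the dictionary layout and variable names are illustrative assumptions, not part of the original analysis.

```python
# Gene and pseudogene counts for C. elegans as quoted in the text.
# The data structure and names are illustrative only.
families = {
    "str": {"genes": 189, "pseudogenes": 74},
    "srj": {"genes": 39, "pseudogenes": 18},
}
briggsae_homologs = 23  # C. briggsae homologs included in the alignment

# Total members (genes plus pseudogenes) per family.
family_totals = {
    name: counts["genes"] + counts["pseudogenes"]
    for name, counts in families.items()
}

# All C. elegans genes and pseudogenes in the two families.
elegans_total = sum(family_totals.values())

# Number of protein sequences entering the phylogenetic analysis.
dataset_size = elegans_total + briggsae_homologs

print(family_totals)
print(elegans_total, dataset_size)
```

The two family totals sum to the 320 genes and pseudogenes discussed throughout, and adding the C. briggsae homologs gives the 343 sequences used for tree building.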
The genes and pseudogenes are given formal names in the str and srj series according to their location in this phylogenetic tree (gene fragments encoding less than half of the typical amino acid length of these receptors were excluded from the pseudogene set). Genes C42D4.5, C50C10.7 and M7.13 in the large (DN)P subfamily have already been named str-1, -2 and -3 (Troemel et al., 1997; Dwyer et al., 1998; Peckol et al., 1999) and the odr-10 name is also retained for gene C53B7.5 in the small odr-10 subfamily (Sengupta et al., 1996).
Two new subfamilies are recognized in the str family. The DA subfamily consists of two divergent annotated genes on cosmid B0213 identified by the PSI-BLASTP search. The D(SP) subfamily, consisting of many newly identified genes in two overlapping yeast artificial chromosomes (YACs; Y9C9 and Y17G9), was previously a small basal lineage of the large (DN)P subfamily. In addition, two highly divergent proteins (STR-4/W06D12.4 and STR-5/Y40H7A.1) were identified in the PSI-BLASTP search that are distantly related to the DP subfamily in both the NJ and MP trees, but there was no bootstrap support for this relationship so they have not been assigned to a subfamily. Similarly, STR-94/F07C3.8 is highly divergent, does not cluster confidently with the EP subfamily and has a different placement in MP trees [at the base of the (DE)P subfamily] both in Robertson (1998) and the present analysis, so it has not been assigned to a subfamily. Otherwise all the subfamilies are as recognized in Robertson (1998); all were supported by bootstrapping at the 70% level and most at the 95% level, as they were with MP on the original dataset. Also as before, there was little bootstrap support for the relationships of the subfamilies within the str family. Within the larger str subfamilies, the basal architecture was also seldom well supported by bootstrapping, and commonly was somewhat different in the most parsimonious trees identified.
Chromosomal location
As in the nuclear receptor superfamily (Sluder et al., 1999) and the srh family (Robertson, 2000), the vast majority, 267 (84%), of these 320 genes and pseudogenes are located on the large chromosome V, with just 40 on IV, seven on X, three on II, two on III and one on I. Mapping of these gene locations on the phylogenetic tree allowed inference of inter-chromosomal gene movements; these are indicated by roman numerals above the middle of the appropriate tree branch in Figure 1. Even the canonical odr-10/C53B7.5 gene is one of these, a recent gene duplication from str-112/F10D2.4 that moved to the X chromosome.
Three additional aspects of this analysis are remarkable. First, 20 of the chromosome IV genes are clustered on overlapping YACs and cosmids and comprise most of the newly recognized D(SP) subfamily (str-152 to -175). These all appear to have resulted from duplication of an ancestral gene that moved from chromosome V a long time ago and formed this subfamily. Remarkably, four members of the subfamily now reside on chromosome V, and each appears to have involved a separate movement back to chromosome V, judged both on phylogenetic grounds [their independent clustering within the D(SP) subfamily is strongly supported by bootstrapping] and by their widely separated locations on chromosome V.
Second, one other movement to chromosome IV in subfamily D(SA) has led to the formation of one gene and three pseudogenes on cosmid C34D4 (str-48 to -51); however, just five other movements have led to gene duplications at the new chromosomal location, all pairs, and four of them include a pseudogene. The remaining non-chromosome V genes and pseudogenes are all singletons. Altogether 14 movements to chromosome IV, six to X, three to II, two to III and one to I were inferred, making a total of 30 movements between chromosomes in the two families [including the return of the four D(SP) subfamily genes to chromosome V]. The independence of all of these gene movements was strongly supported by bootstrapping in the NJ/ME analysis and they were also clearly separate in the MP analysis, commonly occurring in different subfamilies or divergent gene lineages within the larger subfamilies.
Third, a further enigmatic feature of these gene movements is that three independent movements to chromosome IV have resulted in genes on cosmid C42D4 [str-44 in the D(SA) subfamily and str-1 and str-249 in the (DN)P subfamily], while another led to formation of the tandem cluster on the overlapping cosmid C34D4 mentioned above in the D(SA) subfamily. The independence of these events in the phylogenetic tree is convincing; they are not adjacent genes on these two cosmids, but it seems remarkable that they are so closely linked within 64 kbp on a chromosome of 10.7 Mbp. This is the only obvious instance of a possible hotspot for gene insertion on to a new chromosome: the other chromosome IV genes are fairly evenly distributed across the chromosome, as are those on chromosomes X, III and II.
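The chromosomal counts and inferred inter-chromosomal movements reported in this section can be cross-checked with a short tally. This is a minimal sketch using only numbers stated in the text; the variable names are illustrative assumptions.

```python
# Genes and pseudogenes per chromosome, as reported in the text.
genes_per_chromosome = {"V": 267, "IV": 40, "X": 7, "II": 3, "III": 2, "I": 1}

# Inferred movements away from chromosome V, by destination chromosome.
movements_from_v = {"IV": 14, "X": 6, "II": 3, "III": 2, "I": 1}

# Separate returns of D(SP) subfamily genes from chromosome IV to V.
returns_to_v = 4

total_genes = sum(genes_per_chromosome.values())
genes_off_v = total_genes - genes_per_chromosome["V"]
total_movements = sum(movements_from_v.values()) + returns_to_v

print(total_genes, genes_off_v, total_movements)
```

The per-chromosome counts sum to the 320 genes and pseudogenes in the two families, and the movements sum to the 30 inter-chromosomal events reported. The 40 chromosome IV members arising from 14 movements reflect the amplification of the D(SP) subfamily and the smaller cluster on cosmid C34D4.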
Gene movement within chromosome V
Gene movement within a chromosome was obvious in the original dataset (Robertson, 1998), but there is no simple way to quantify gene movement within a chromosome, in part because complete contigs are not yet available. Nevertheless, it is possible to undertake a preliminary analysis using the provisional chromosomal locations provided by the Entrez genome server at NCBI (http://www.ncbi.nlm.nih.gov/pmgifs/genomes/6239.html). The approximate location of all chromosome V genes in Mbp from the left end of this 20.6 Mbp chromosome is given after the gene name in Figure 1 (some clones/genes are not yet entered into this database, but their position was ascertained through overlap with those that are). The genes are fairly evenly distributed along the length of this chromosome and simple inspection of these locations shows that genes must have moved around on this chromosome frequently. For example, within the well-resolved srj family, at least 22 movements around chromosome V can be inferred from the tree. Even the terminal lineage of several genes on cosmid T03D3 and neighboring cosmids (C31B8, F37B4 and Y45G12A) involves several non-contiguous genes. Similarly high rates of movement within chromosome V are revealed by well-supported regions of the major str subfamilies. On the other hand, some lineages such as the entire EP subfamily have remained in the same region of chromosome V, even if not all the genes remain contiguous. As noted above, the same is true for most of the D(SP) subfamily on chromosome IV.
Intron evolution
Mapping of intron losses from the ancestral condition of eight introns within the str family on the phylogenetic tree was not as simple as before (Robertson, 1998), because the independence of many losses inferred on basal branches within the large subfamilies is seldom supported by bootstrapping (Figure 1). Details of relationships within the subfamilies were somewhat different in this NJ/ME tree from the MP trees obtained earlier and now, and as before there are regions of the large D(SA) and (DN)P subfamilies where some relatively minor rearrangements would reduce inferred numbers of intron losses. On the other hand, there is sometimes underestimation of likely intron losses, for example in that gene str-94/F07C3.8 probably lost introns c and d independently of the EP subfamily ancestor. In the str family, 177 losses were inferred in Figure 1, with 150 of these being convincingly independent; the independence of 122 was supported by bootstrapping at the 70% level (no matter how the trees are rearranged, more losses than those whose independence was supported by bootstrapping must be inferred to explain the current distribution of introns). In the srj family, 28 of the 30 inferred losses in Figure 1 were convincingly independent and supported by bootstrapping, yielding a total of 178 intron losses for the two families.
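The intron loss totals follow from simple addition of the per-family counts. The sketch below restates that arithmetic using only numbers given in the text; the key names are illustrative assumptions.

```python
# Intron loss counts per family, as reported in the text.
intron_losses = {
    "str": {"inferred": 177, "convincingly_independent": 150},
    "srj": {"inferred": 30, "convincingly_independent": 28},
}

# Losses inferred directly from the tree in Figure 1.
total_inferred = sum(f["inferred"] for f in intron_losses.values())

# Losses regarded as convincingly independent events.
total_independent = sum(
    f["convincingly_independent"] for f in intron_losses.values()
)

print(total_inferred, total_independent)
```

The combined total of convincingly independent losses matches the 178 quoted for the two families, which is lower than the raw number of losses inferred from the tree.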
Only one intron gain was previously noted within the str family (Robertson, 1998), but two more were recognized here. First, a new homolog from C. briggsae, G21D19.g near the base of the D(SA) subfamily, has a new intron called intron n between the positions of introns e and f. The C. elegans ortholog has been lost, so it is unclear if this is a unique addition within C. briggsae. Second, pseudogene str-117/T26H5.a* in the str subfamily appears to have a novel intron near the C-terminus, beyond the position of intron h, that was not recognized in the original reconstruction (it is named o). In addition, a minor adjustment to the timing of acquisition of introns j, k and l near the base of the srj family has been made; the most parsimonious mapping suggests that they were acquired after the first lineage of the family diverged (Figure 1).
C. briggsae homologs
Twenty-three homologs of the str and srj family members have been identified among the 8 Mbp or 8% of the C. briggsae genome available; their relationships are shown in Figure 1. In my previous analysis (Table 1 in Robertson, 1998), the levels of similarity of orthologs between the two species within the str family varied rather widely, from 68–87% amino acid identity for the 11 genes on C. briggsae cosmid G47M22 to 57–61% for three other orthologous pairs where at least one was a pseudogene. In contrast, orthologous pairs in the srh family revealed less variability in levels of amino acid identity, averaging 68% (range 56–77%) (Robertson, 2000). Five newly recognized orthologs in the str and srj families (Table 1) have divergence levels more in line with these (64–78%), and an additional eight orthologous gene pairs in the srd and two smaller families have identities ranging from 53 to 74% (H. M. Robertson, unpublished). I have therefore chosen 70% amino acid identity as an average value for the divergence of C. elegans/C. briggsae chemoreceptor orthologous pairs. Inspection of Figure 1 shows that many closely related pairs, some triplets and even two quadruples and sextuples of chemoreceptors in the str and srj families within C. elegans are more closely related to each other than this (branches highlighted in bold). Altogether 72 gene duplications were inferred to have occurred within C. elegans since the species split, forming 22.5% of the two families.
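The 70% identity criterion can be illustrated with a small helper that scores a pair of aligned protein sequences. This is a minimal sketch under stated assumptions: the text does not say exactly how identity was calculated, so skipping gapped columns is an assumption, the function and constant names are hypothetical, and the two short sequences are made-up toy data rather than real receptors.

```python
ORTHOLOG_IDENTITY_THRESHOLD = 70.0  # average ortholog identity chosen in the text


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues over aligned columns without gaps.

    Assumes both sequences come from the same alignment and use '-'
    for gaps. Ignoring gapped columns is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


def more_similar_than_orthologs(seq_a: str, seq_b: str) -> bool:
    """True if a within-species pair exceeds the ortholog threshold,
    which would suggest a duplication after the species split."""
    return percent_identity(seq_a, seq_b) > ORTHOLOG_IDENTITY_THRESHOLD


# Toy aligned fragments (hypothetical, not real STR or SRJ sequences).
toy_a = "MSLLKIAQTV"
toy_b = "MSLLRIAQ-V"
identity = percent_identity(toy_a, toy_b)
recent = more_similar_than_orthologs(toy_a, toy_b)

# Share of the two families formed by the 72 inferred duplications.
duplication_share = 100 * 72 / 320

print(round(identity, 1), recent, duplication_share)
```

In the analysis itself, recent duplications were identified from branches on the phylogenetic tree; pairwise identity is used here only to show how the threshold separates pairs that are closer than typical orthologs.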
These orthologous comparisons also show that these genes are under considerable selective pressure, because the frequency of synonymous or silent changes (Ks) (base changes that do not change the encoded amino acids) was always far greater than that for non-synonymous or replacement changes (Ka). Nevertheless, four of the C. briggsae genes have no orthologs in C. elegans, so these have apparently been lost by large deletions. The orthologs of G21D19.f# and g were apparently lost as part of a large deletion that also removed the orthologs of G21D19b, c, d and e in the srh family (Robertson, 2000
Discussion
Complete description of these two large families of chemoreceptors allows consideration of several new aspects of their molecular evolution not systematically addressed previously (Robertson, 1998
Preliminary analysis of chromosomal location within chromosome V shows that these genes have moved around frequently within the chromosome, leading to an even distribution of gene numbers along its length. Presumably most of these transpositions involve a mechanism similar to that of movement between chromosomes. Inversions might also be involved in movements within chromosome V, but their frequency and importance are unknown in nematodes. Extensive simulation studies to develop null models would be required to determine whether movement within chromosomes is more common than between them. It does appear, however, that movements to other chromosomes usually lead to loss of the gene, rather than the formation of new gene lineages so frequently seen within chromosome V.
As before (Robertson, 1998, 2000), comparisons with C. briggsae provide information about the patterns of gene evolution in these chemoreceptor families and genome dynamics in general. Almost a quarter (22.5%) of str and srj family genes and pseudogenes in C. elegans appear to have been newly formed by gene duplications since the species split. This process is clearly not limited to these and other chemoreceptors, because 60% of the C. elegans genome consists of gene families that have been found in nematodes but not in yeast, Drosophila or mammals (C. elegans Sequencing Consortium, 1998; C. elegans Genome Consortium, 1999; Rubin et al., 2000). As expected with the finishing of the C. elegans genome sequencing project, orthologs were found for several of the C. briggsae genes identified before (Robertson, 1998), as well as some new ones. Nevertheless, four C. briggsae members of these two families do not have orthologs in C. elegans; they have apparently been lost by three large deletions. As was true for the srh family (Robertson, 2000), examination of pseudogenes and gene fragments also revealed the common occurrence of large deletions in the C. elegans genome. Presumably it is these kinds of event that remove the many newly forming pseudogenes and maintain the small size of this nematode genome in the face of rampant gene duplication.
Acknowledgments
I thank the Genome Sequencing Centers at Washington University, St Louis, USA and the Sanger Centre, Cambridge, UK, for communication of DNA sequence data prior to publication, John Spieth and Richard Durbin for their assistance in annotating these nematode genes, and John Spieth and Chris Michelsen for assistance in analyzing chromosomal locations. This work was supported by NSF grant IBN 96-04095. The amino acid alignment file has been submitted to the EMBL alignment database (ftp://ftp.ebi.ac.uk/pub/databases/embl/align/) with accession number ds42124. Gene and protein alignments are also available from the author at hughrobe{at}uiuc.edu.
References
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Bargmann, C.I. (1998) Neurobiology of the Caenorhabditis elegans genome. Science, 282, 2028–2033.
C. elegans Genome Consortium (1999) How the worm was won. Trends Genet., 15, 51–58.
C. elegans Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282, 2012–2018.
Dwyer, N.D., Troemel, E.R., Sengupta, P. and Bargmann, C.I. (1998) Odorant receptor localization to olfactory cilia is mediated by ODR-4, a novel membrane-associated protein. Cell, 93, 455–466.
Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G. and Gibson, T.J. (1998) Multiple sequence alignment with Clustal X. Trends Biochem. Sci., 23, 403–405.
Nei, M. and Gojobori, T. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3, 418–426.
Peckol, E.L., Zallen, J.A., Yarrow, J.C. and Bargmann, C.I. (1999) Sensory activity affects sensory axon development in C. elegans. Development, 126, 1891–1902.
Robertson, H.M. (1998) Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss. Genome Res., 8, 449–463.
Robertson, H.M. (2000) The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res., 10, 192–203.
Rubin, G.M. and 54 coauthors (2000) Comparative genomics of the eukaryotes. Science, 287, 2204–2215.
Sengupta, P., Chou, J.H. and Bargmann, C.I. (1996) odr-10 encodes a seven transmembrane domain olfactory receptor required for responses to the odorant diacetyl. Cell, 84, 899–909.
Sluder, A.E., Mathews, S.W., Hough, D., Yin, V.P. and Maina, C.V. (1999) The nuclear receptor superfamily has undergone extensive proliferation and diversification in nematodes. Genome Res., 9, 103–120.
Swofford, D.L. (1998) PAUP*: Phylogenetic Analysis Using Parsimony and Other Methods, Version 4. Sinauer Press, New York.
Troemel, E.R., Chou, J.H., Dwyer, N.D., Colbert, H.A. and Bargmann, C.I. (1995) Divergent seven transmembrane receptors are candidate chemosensory receptors in C. elegans. Cell, 83, 207–218.
Troemel, E.R., Kimmel, B.E. and Bargmann, C.I. (1997) Reprogramming chemotaxis responses: sensory neurons define olfactory preferences in C. elegans. Cell, 91, 161–169.
Accepted September 21, 2000