Chem. Senses 27: 293-298, 2002
© Oxford University Press 2002
SYMPOSIUM: Proceedings of a Symposium on Functional Genomics in Neural Systems
Subtraction-coupled Custom Microarray Analysis for Gene Discovery and Gene Expression Studies in the CNS
1 Interdepartmental Program in Neuroscience, University of California, Los Angeles, Los Angeles CA 90094, USA
2 Program in Neurogenetics, Department of Neurology, University of California, Los Angeles, Los Angeles CA 90094, USA
Correspondence to be sent to: Daniel H. Geschwind, UCLA Neurology, Reed Neurologic Research Center, 710 Westwood Plaza, Los Angeles, CA 90095-1769, USA. e-mail: dhg@ucla.edu
Abstract
The revolution in our knowledge about the genomes of organisms gives rise to the question, what do we do with this information? The development of techniques allowing high-throughput analysis of RNA and protein expression, such as cDNA microarrays, provides for genome-wide analysis of gene expression. These analyses will help bridge the gap between systems and molecular neuroscience. This review discusses the advantages of using a subtractive hybridization technique, such as representational difference analysis, to generate a custom cDNA microarray enriched for genes relevant to investigating complex, heterogeneous tissues such as those involved in the chemical senses. Real and hypothetical examples of these experiments are discussed. Benefits of this approach over traditional microarray techniques include having a more relevant clone set, the potential for gene discovery and the creation of a new tool to investigate similar systems. Potential pitfalls include PCR artifacts and the need for sequencing. However, these disadvantages can be overcome so that the coupling of subtraction techniques to microarray screening can be a fruitful approach to a variety of experimental systems.
Introduction
DNA microarrays are a powerful technique for the simultaneous measurement
of the expression of thousands of genes. Such studies can quickly yield a
genome-wide description of RNA levels in a given cell or tissue at a given
point in time, or a genetic characterization of a tissue's response to
experimental manipulation. Information gleaned from these studies can generate
working hypotheses for molecular pathways essential to a given biological
process, or potential drug targets for therapies. The utility of the large
volumes of data generated, however, depends upon proper experimental design at
many levels. An important choice often not given due consideration is which
microarray or clone set one should use. Frequently, this choice is made based
primarily on convenience and availability, rather than focusing on assaying a
set of genes important to the process at hand. This review discusses the
advantages of using a subtractive hybridization technique, such as a
representational difference analysis (RDA), to generate a custom microarray
enriched for genes relevant to the experimental process being investigated. We
will begin with a hypothetical example and then discuss a specific
implementation of the technique to investigate neural stem cells
(Geschwind et al., 2001). RDA-coupled microarrays provide unique, focused
arrays that allow for novel gene discovery and that become a reusable tool
for the analysis of similar systems.
A hypothetical example: the motivations for RDA-coupled microarrays
In the mid-1830s, Müller developed the doctrine of specific nerve
energies: `the same stimulus, for example, electricity, may act simultaneously
on all the organs of sense - all are sensible to its action; but the nerve
of each sense is affected in a different way - becomes the seat of a
different sensation' (Müller and Baly, 1838). In other words, any signal
carried by a nerve of a
specific sense will be interpreted as a stimulus of that sense. Thus,
mechanical stimulation of the retina, for example by rubbing the eyeballs,
will result in a perceived visual experience of seeing stars. Likewise,
although electrical activity is similar across modalities, it always results
in modality-specific sensations. This simple theory is the basis of our modern
understanding of the working of the senses. Even mechanical or electrical
stimulation of a spot on the somatosensory cortex will result in perceived
sensations arriving from the appropriate region of the body
(Penfield, 1959). Thus, a
major question in sensation research concerns what is inherently particular
about one primary sensory cortex that allows for signals arriving there to be
interpreted as they are. One hypothesis is that differences in gene
expression, which reflect or underlie differences in neuronal identity and
circuitry, contribute to Müller's doctrine. If that is so, then we should
be able to find genes that are differentially expressed between the different
sensory cortices. In this case, our system of interest is olfaction and,
hence, we wish to uncover genes that define the olfactory cortex and
distinguish it from other sensory cortices. Hypothetically, one approach that
could be taken would be to perform a microarray experiment comparing the gene
expression in the olfactory cortex to primary sensory cortices from the other
senses to identify genes enriched in the olfactory cortex.
A brief introduction to microarrays
Microarrays are solid surfaces such as glass slides, on which small
quantities of cDNAs or oligonucleotides complementary to thousands of genes
have been deposited in ordered arrays, with each spot or group of spots
representing one gene. These ordered arrays can be used for the simultaneous
measurement of the expression of thousands of genes from a given tissue by
hybridizing to the slide a radioactively or fluorescently labeled probe
derived from mRNA. A common method involves synthesizing and labeling cDNA
from two tissues with different fluorescent dyes and hybridizing both probes
onto the same array. This allows for a direct comparison of gene expression in
the two tissues, for example the olfactory cortex versus pooled RNA from the
other primary sensory cortices. The advantage of a two-color system over a
radioactive or a one-color system is that it controls for irregularities in
spotting of the slide and allows the direct comparison of two tissues since
they are co-hybridized onto the same spots. Such microarrays have been used
successfully for such varied tasks as studying development in
Drosophila, profiling of tumors, and genetic characterization of
neural progenitors (Dhanasekaran et al., 2001; Furlong et al., 2001;
Geschwind et al., 2001). A more general review of microarray
applications in neuroscience can be found in Luo and Geschwind (2001).
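A two-color comparison of this kind is commonly summarized as one log ratio per spot. The following is a minimal sketch, not the procedure used in the studies cited here: the function name `log2_ratios`, the channel names and the global total-intensity normalization are all assumptions made for illustration.

```python
import math

def log2_ratios(channel_a, channel_b):
    """Per-spot log2(channel_a / channel_b) after a global normalization.

    channel_a, channel_b: background-corrected, strictly positive spot
    intensities for the two dyes, in the same spot order (hypothetical inputs).
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("both channels must cover the same spots")
    if min(channel_a) <= 0 or min(channel_b) <= 0:
        raise ValueError("intensities must be strictly positive")
    # Assumption: most spots are unchanged, so total signal should be equal
    # in both channels; rescale channel_a to match channel_b.
    scale = sum(channel_b) / sum(channel_a)
    return [math.log2((a * scale) / b) for a, b in zip(channel_a, channel_b)]

# Toy intensities: the third spot is enriched in channel_a.
print(log2_ratios([100.0, 100.0, 400.0], [100.0, 100.0, 100.0]))
```

Because both samples are hybridized to the same spot, uneven spotting affects numerator and denominator alike and cancels in the ratio.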
Commercial versus custom arrays
The primary advantages of commercial arrays are their wide availability and general applicability. Most have been tested in a variety of systems and are not biased towards one particular system. However, there are two main disadvantages to using a prefabricated array. First, many of the genes on the array may be irrelevant. In our example involving the olfactory cortex, the tissues of interest are neural, yet a prefabricated array may contain many genes specific to non-neural tissue. For most 10 000-15 000 gene arrays, only 25-60% of the spots show measurable hybridization using neural tissue, suggesting that a significant proportion of the genes being assayed are not relevant to the experiment. Second, until whole-genome arrays become available, it is quite likely that the array used in an experiment will lack genes that may be essential for the process being studied. Prefabricated arrays will be biased towards more common transcripts, as rare transcripts are less likely to be present in databases and available for arraying. Therefore, experiments with prefabricated arrays may provide only an incomplete picture of the genes behind the process being studied, with many key genes being excluded from the analysis and many irrelevant genes being included.
Custom arrays include more relevant genes
One option is to generate a cDNA library from the tissue being studied, for example the olfactory cortex, and to print an array from that library. This custom array will include almost all of the genes expressed in the tissue of interest, filling the holes that would be left by a prefabricated array. However, it would also include many genes that are expressed in the tissue but are not truly of interest to the question being addressed. For example, this library would likely include many clones for genes commonly expressed in all cells, such as basic metabolic and `housekeeping' genes. This means that, as with the prefabricated array, an array made from a cDNA library would include a significant proportion of genes not relevant to the experimental question. Furthermore, an array printed from a cDNA library has one large disadvantage relative to a prefabricated array: the identity of the clones is unknown without sequencing. One does not want to spend time and effort sequencing an entire library of genes, many of which are not relevant. What is needed is a way to focus this library on the condition of interest, narrowing the breadth of the genes assayed so that a more limited number of highly relevant genes is measured.
Custom arrays can be generated from the products of subtractions to enrich for genes of interest
One solution is found in the RDA technique
(Lisitsyn and Wigler, 1993; Hubank and Schatz, 1994). RDA
is one of a family of techniques, such as suppression subtractive
hybridization (Diatchenko et al., 1996), that couple hybridization-based
deselection of common cDNAs
to differential PCR amplification to enrich for differentially expressed
transcripts from two populations of nucleic acids. Typically, two populations
of mRNA are separately transcribed into cDNA, restriction digested and ligated
to primers for PCR amplification. The two populations are then mixed and put
through iterative rounds of amplification and subtraction of cross-hybridizing
products. In each round, an excess of one population, called the driver, is
used to remove identical transcripts from the second, less concentrated
population, called the tester. This results in a sequential enrichment for
clones more abundant in the tester population than the driver population. The
products are then shotgun cloned or size-selected and cloned into a bacterial
vector to create a cDNA library. Normally, two libraries are created, with the
roles of tester and driver reversed, creating, in our example, one library
enriched for genes prevalent in the olfactory cortex and one library enriched
for genes prevalent in other primary cortices. There is an implicit trade-off
between the degree of subtraction and the complexity of the subtracted
mixture. The more rounds of subtraction performed, the higher the differential
expression, but the fewer the unique products. The fewer the rounds, the more
unique species identified, but the larger the number of false
positives (i.e. equally expressed clones remaining in the subtracted library).
One can make the best of this situation by empirically determining the optimal
number of rounds to maximize differential expression, while detecting the
highest number of differentially expressed clones.
Subtraction alone may miss subtle differences in complex tissues
When RDA is done alone, three to four rounds of subtraction are typically conducted and a handful of differentially expressed clones are identified. To detect a larger number of differentially expressed genes, or perhaps more subtle differences, one can perform fewer rounds of subtraction and assay the surviving clones using microarrays. The two libraries resulting from our hypothetical subtraction are printed onto a glass slide or filter, resulting in a microarray that contains exclusively genes present in the olfactory cortex and other sensory cortices, and that is biased towards those that are differentially expressed. Following differential hybridization experiments, the spots with the greatest difference in signal intensity are sequenced and identified. This offers several advantages over the use of prefabricated arrays. In addition to the previously mentioned advantages of having a more focused array, this methodology allows for the inclusion of novel genes on the array, thus leading to gene discovery. This ability to include novel or unknown transcripts is one of the greatest strengths of RDA-coupled microarrays. Finally, the array created becomes a tool to study similar systems. Does the olfactory bulb also express some of these olfactory cortex-specific genes? Once the RDA-coupled microarray has been designed, it can be made in quantity and used in a variety of related experiments.
Proof of principle for RDA-coupled microarrays
It is evident that RDA-coupled microarrays provide an elegant manner in
which to begin a comprehensive study of the most important genetic differences
between two complex conditions or tissues, such as the differences that may
underlie Müller's doctrine of specific nerve energies. But is such an
approach feasible? Since the introduction of this combination of techniques to
investigate the differences between malignant and non-malignant sarcomas
(Welford et al., 1998), several laboratories have used just such an approach
successfully, including Boeuf's comparison of brown and white preadipocytes,
and our laboratory's characterization of neural stem cells
(Boeuf et al., 2001; Geschwind et al., 2001). However, several important
questions arise when performing
these sorts of studies. First, how should one analyze these arrays: how
does one select which clones to sequence, i.e. what constitutes a hit on a
microarray? What sorts of analyses are appropriate to these sorts of arrays?
How much sequencing should one do? Are there methods to limit the sequencing
of redundant clones? Second, what are the potential pitfalls, and what is
now known about RDA from doing these experiments? Are the most common clones
in each library the most differentially expressed? What sort of artifacts may
arise from using this system? Finally, and most importantly, do RDA-coupled
microarrays fulfill their promises: do they provide a more focused array?
Do they allow for the discovery of novel genes? Do they provide a tool for the
study of similar systems?
Following the publication of our study of neural progenitors
(Geschwind et al., 2001), we had the opportunity to sequence all clones in
one of our
libraries of RDA products. This has allowed us retrospectively to assess the
quality of our procedures and offer insight into the benefits and
idiosyncrasies of RDA-coupled microarrays. Here, we briefly describe the
experiment and discuss the lessons learned from the subsequent sequencing to
answer the questions posed.
RDA and microarrays for the study of neural progenitors
Neural stem cells are pluripotent, self-renewing cells that may have the
potential to give rise to the three major cell types of the brain: neurons,
astrocytes and oligodendrocytes. Essential to the normal development of the
brain, neural stem cells are thought to continue to produce progeny throughout
the lifespan to replace lost cells, notably in the olfactory bulbs. As they
could potentially prove a renewable resource to replace lost or damaged tissue
associated with neurodegenerative disorders such as Parkinson's disease, some
spinocerebellar ataxias, and Alzheimer's disease, it is of great interest to
understand the genes responsible for maintaining these cells in an
undifferentiated state and the genes responsible for commitment to specific
lineages (Gage, 2000). To
discover genes that are enriched in neural stem cell populations, we performed
RDA followed by a cDNA microarray experiment
(Geschwind et al., 2001), as illustrated in Figure 1. The experiment was
undoubtedly successful in that it has
yielded a long list of candidate genes important in the maintenance and
development of neural stem cells. Here, we will present some of the
conclusions from a further in-depth analysis of the technique itself.
How should one select which clones to sequence?
Effectiveness of selection criteria
To select clones for further sequencing and analysis, we required that the mean ratio of signal intensities be at least 1.5 and that the ratio ranked in the top 16% in at least two of three replicates. By requiring a ratio in the top 16% of all clones in two of three replicates, we eliminated clones whose high mean ratio arose solely from one spuriously high replicate, thereby emphasizing those clones that yielded consistently high ratios. These criteria were empirically determined to yield a manageable number of clones for follow-up with in situ hybridizations and Northern blots. The libraries used to create the microarray provide a readily available source of probes to use for these confirmatory studies, demonstrating another advantage of a custom array. Two major conclusions can be drawn from this: first, if one is planning to follow up the microarray with confirmation by a separate technique, it is better to use more inclusive criteria and, second, criteria based on consistent expression across replicates are fairly effective at eliminating false positives.
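The two selection criteria can be expressed compactly in code. This is only an illustrative sketch under stated assumptions: the function name `select_clones`, the input layout (one list of per-replicate ratios per clone) and the rank-based handling of ties are hypothetical and are not taken from the original analysis.

```python
import math

def select_clones(ratios, min_mean_ratio=1.5, top_fraction=0.16, min_replicates=2):
    """Return indices of clones that pass both criteria.

    ratios: list of rows, one per clone, each row holding the signal ratio
    measured in every replicate (hypothetical layout).
    """
    n_clones = len(ratios)
    n_reps = len(ratios[0])
    # Number of clones that count as "top 16%" within one replicate.
    n_top = max(1, math.ceil(top_fraction * n_clones))
    top_counts = [0] * n_clones
    for rep in range(n_reps):
        # Rank clones by their ratio in this replicate, highest first.
        order = sorted(range(n_clones), key=lambda i: ratios[i][rep], reverse=True)
        for i in order[:n_top]:
            top_counts[i] += 1
    selected = []
    for i, row in enumerate(ratios):
        mean_ratio = sum(row) / n_reps
        if mean_ratio >= min_mean_ratio and top_counts[i] >= min_replicates:
            selected.append(i)
    return selected

# Toy data: clone 0 is consistently high; clone 1 has one spurious value.
toy = [[3.0, 3.0, 3.0], [1.0, 1.0, 10.0]]
toy += [[0.9 + 0.02 * i] * 3 for i in range(2, 10)]
print(select_clones(toy))
```

In the toy data, clone 1 has the highest mean ratio but is rejected because it ranks near the top in only one replicate, which is the behavior the replicate criterion is meant to produce.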
RDA libraries are highly redundant. Systematic sequencing of the library results in diminishing returns, but allows for identification of a greater number of rare transcripts
Large-scale sequencing of a library can require significant time and resources, and RDA libraries are highly redundant. Clones with identical sequence can be grouped into contigs. While half of the contigs were discovered in the first five plates (480 clones) sequenced during a systematic sequencing of the library, 10-20% of the clones in each of the additional 18 plates still represented new contigs. Thus, sequencing continued to be fruitful throughout the library. How much sequencing to do therefore becomes a choice between the desire to identify rare transcripts and the time and effort involved in sequencing. If time and resources are limited, it is best to allow the microarray to decide which clones are a priority to sequence by selecting the most differentially expressed for further analysis.
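The diminishing return from systematic sequencing can be tracked plate by plate as the fraction of clones that fall into a contig not seen before. The sketch below assumes 96 clones per plate (consistent with 480 clones in five plates); the function name `new_contig_rate` and the contig labels are hypothetical.

```python
def new_contig_rate(contig_ids, plate_size=96):
    """Fraction of clones on each plate that belong to a not-yet-seen contig.

    contig_ids: contig label of every sequenced clone, in sequencing order
    (hypothetical input).
    """
    seen = set()
    rates = []
    for start in range(0, len(contig_ids), plate_size):
        plate = contig_ids[start:start + plate_size]
        new_on_plate = 0
        for contig in plate:
            if contig not in seen:
                seen.add(contig)
                new_on_plate += 1
        rates.append(new_on_plate / len(plate))
    return rates

# Toy example with three clones per "plate".
print(new_contig_rate(["a", "a", "b", "a", "b", "c"], plate_size=3))
```

A rate that levels off well above zero, as reported here, suggests that further plates would still uncover rare transcripts.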
A hybridization experiment can reduce the amount of sequencing necessary, but at a cost
The goal of this experiment, a `repeating clones' experiment
(Welford et al., 1998; Geschwind et al., 2001), was to reduce the amount of
sequencing necessary
by eliminating spots on the array representing clones that had already been
sequenced. Thirty-five of the differentially expressed clones in the first two
sequenced plates were re-hybridized onto the array to identify genes already
sequenced. While 85% of the clones were correctly classified, there were four
rare transcripts, up-regulated in the NS library relative to the DC, that were
missed because they gave spurious hybridization signals. Since this was
effectively a one-dye experiment, this result could be due to differences in
DNA concentration across spots. This stresses the importance of using
replication, even for this simple experiment, and the need to control for
variability in concentration of DNA. A second probe, based on the PCR primer
common to each clone, could be labeled with another dye, allowing a ratio of
probe to DNA to be calculated for each spot. When deciding to use this
approach, one must think in terms of a cost-benefit analysis. Redundant
sequencing will be reduced, but some rare transcripts may also be missed.
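The proposed two-dye control amounts to dividing the signal from the repeating-clone probe by the signal from the primer-based probe at each spot. A minimal sketch follows, assuming paired per-spot intensities; the function name `probe_to_dna_ratio` and the `min_dna_signal` cutoff are hypothetical.

```python
def probe_to_dna_ratio(clone_probe, primer_probe, min_dna_signal=0.0):
    """Per-spot ratio of clone-probe signal to primer-probe (DNA amount) signal.

    Spots whose primer-probe signal is not above min_dna_signal carry too
    little DNA to be judged and are returned as None.
    """
    ratios = []
    for clone_signal, dna_signal in zip(clone_probe, primer_probe):
        if dna_signal > min_dna_signal:
            ratios.append(clone_signal / dna_signal)
        else:
            ratios.append(None)
    return ratios

# Two spots with equal raw signal but a tenfold difference in spotted DNA.
print(probe_to_dna_ratio([100.0, 100.0, 50.0], [1000.0, 100.0, 0.0]))
```

The first two spots look identical in a one-dye experiment, yet their normalized ratios differ tenfold, which is the kind of concentration effect described above.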
What is now known about RDA from these experiments?
Are the most abundant clones the most differentially expressed?
In contrast to previous work (Welford et al., 1998), we did not find a
significant positive
correlation between the number of clones in a contig and the relative
expression of that clone. While it is clear that some of the contigs with
strong differential expression also have a great number of clones, it is also
clear that some contigs with no differential expression have many clones. This
suggests that there are perhaps two factors influencing the number of clones
representing a contig in the library. One would be the enrichment resulting
from the RDA subtraction and the other would be the prevalence of the sequence
in the starting populations, with sequences that are extremely common
surviving the mild subtraction of a two-round RDA. Another important
implication is that using the abundance of transcript in the library as a
gauge of relative expression, as is done with SAGE, could lead to an
overestimate of the biological importance of certain housekeeping transcripts
in the experimental system.
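One way to test for such a relationship is a rank correlation between the number of clones in each contig and its expression ratio. The sketch below uses Spearman's coefficient, which is an assumption made for illustration; the text does not state which statistic was used, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks of values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(clones_per_contig, expression_ratio):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(clones_per_contig), average_ranks(expression_ratio))

# Made-up values for five contigs, for illustration only.
print(spearman([40, 3, 12, 1, 25], [1.0, 2.5, 1.1, 3.0, 1.8]))
```

A coefficient near zero would be consistent with the two-factor explanation given above, in which clone number reflects both enrichment and starting abundance.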
PCR hybrids may create RDA artifacts
Recently, some authors have noted that RDA is vulnerable to certain types
of PCR artifacts (Hansen-Hagge et al., 2001). Notably, common repeat elements
in a sequence may
result in partial hybridizations that will survive the subtraction of RDA and
be amplified along with genuinely differentially expressed products. These
products can be detected by their partial sequence homology to a common
element, and partial novel sequence. Hansen-Hagge et al. have
suggested a novel ligation-mediated subtraction (LiMeS), which may resolve
these issues of amplification of PCR hybrids by introducing a ligation step
that necessitates full rather than partial hybrids prior to amplification
(Hansen-Hagge et al., 2001). To our knowledge, this technique has not yet
been tested
for the generation of a microarray, but may represent the next logical
improvement on RDA-coupled microarrays.
Do RDA-coupled microarrays fulfill their promises?
RDA-coupled microarrays create a more focused array
As stated above, in the typical microarray experiment only 25-60% of the spots show measurable hybridization. Using this RDA-coupled microarray, >90% of the spots showed consistent hybridization, demonstrating that a much larger proportion of the genes on the array were sufficiently expressed in the system of interest. Thus, most of the genes on the array are relevant to the experiment and many of them are low-abundance species, demonstrating the utility of this approach.
RDA-coupled microarrays facilitate gene discovery
Of the 455 contigs discovered through sequencing, 209 mapped clearly to UniGene clusters. Ignoring 22 contigs that may represent PCR hybrids, that leaves 49 that have only partial homology to UniGene clusters and 70 that did not map onto UniGene clusters. These novel genes would not have been present on a prefabricated array. Most importantly, six of the seven of these genes that we have followed up thus far appear to be strongly expressed in the ventricular zone of the developing mouse brain, suggesting a role for these novel genes in neural stem cells or their immediate progeny.
RDA-coupled microarrays provide a tool to study similar systems
In creating a custom microarray, one also creates a tool that can be shared
among researchers to investigate related systems to discover common patterns
of gene expression. In a collaborative experiment
(Terskikh et al., 2001) gene expression in hematopoietic stem cells was
compared to
whole bone marrow using the NS/DC array. By comparing the lists of genes
generated by this experiment and the genetic analysis of neural progenitors,
Terskikh et al. could create a short list of genes common to two
different populations of stem cells. Continuation of this approach could lead
to the definition of a core set of genes expressed in stem cells of all
tissues of the body. Thus the RDA-coupled microarray's utility may expand
beyond the experiment for which it was initially generated. Certainly, this
array contains a large number of genes relevant to neural development in many
systems and experimental paradigms. For example, this array could also be
amenable to the study of developing cells of the rostral migratory stream, to
examine the changes in their suite of gene expression as they mature into
olfactory bulb neurons (Rousselot et al., 1995).
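Because both experiments use the same array, comparing their hit lists reduces to a set intersection over clone or gene identifiers. A minimal sketch, in which the function name `shared_genes` and the placeholder identifiers are hypothetical:

```python
def shared_genes(hits_a, hits_b):
    """Identifiers present in both hit lists, sorted for stable output."""
    return sorted(set(hits_a) & set(hits_b))

# Placeholder identifiers, not real results from either study.
neural_hits = ["gene_A", "gene_B", "gene_C"]
hematopoietic_hits = ["gene_C", "gene_A", "gene_D"]
print(shared_genes(neural_hits, hematopoietic_hits))
```

Extending the same intersection across further stem cell populations is the route toward the core gene set proposed above.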
Conclusion
We have reviewed the use of RDA for the creation of custom microarrays and discussed the benefits, such as the improved focus of the array, the inclusion of novel transcripts and the creation of a tool to investigate similar systems, as well as the disadvantages, such as RDA artifacts and the need for sequencing. However, these disadvantages can be overcome and we believe that the coupling of subtraction techniques to microarray screening will be a fruitful approach in a variety of experimental systems.
Acknowledgments
The authors are grateful for the collaboration of Harley Kornblum and members of his laboratory, who have made significant contributions to the work described. We also thank M. Henson for advice on statistical analysis, the UCLA microarray core facility for arraying, N. Brown for training in Perl and MySQL, and M. Parkkonen and M. Sandhu for sequencing. J. Dougherty is supported by HHMI. Some of the work described was supported by MH60233 and NS41393 and a grant from the Ron Shapiro charitable foundation.
References
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Boeuf, S., Klingenspor, M., Van Hal, N.L., Schneider, T., Keijer, J. and Klaus, S. (2001) Differential gene expression in white and brown preadipocytes. Physiol. Genom., 7, 15-25.
Dhanasekaran, S.M., Barrette, T.R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K.J., Rubin, M.A. and Chinnaiyan, A.M. (2001) Delineation of prognostic biomarkers in prostate cancer. Nature, 412, 822-826.
Diatchenko, L., Lau, Y.F., Campbell, A.P., Chenchik, A., Moqadam, F., Huang, B., Lukyanov, S., Lukyanov, K., Gurskaya, N., Sverdlov, E.D. and Siebert, P.D. (1996) Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc. Natl Acad. Sci. USA, 93, 6025-6030.
Furlong, E.E., Andersen, E.C., Null, B., White, K.P. and Scott, M.P. (2001) Patterns of gene expression during Drosophila mesoderm development. Science, 293, 1629-1633.
Gage, F.H. (2000) Mammalian neural stem cells. Science, 287, 1433-1438.
Geschwind, D.H., Ou, J., Easterday, M.C., Dougherty, J.D., Jackson, R.L., Chen, Z., Antoine, H., Terskikh, A., Weissman, I.L., Nelson, S.F. and Kornblum, H.I. (2001) A genetic analysis of neural progenitor differentiation. Neuron, 29, 325-339.
Hansen-Hagge, T.E., Trefzer, U., zu Reventlow, A.S., Kaltoft, K. and Sterry, W. (2001) Identification of sample-specific sequences in mammalian cDNA and genomic DNA by the novel ligation-mediated subtraction (Limes). Nucleic Acids Res., 29, E20.
Hubank, M. and Schatz, D.G. (1994) Identifying differences in mRNA expression by representational difference analysis of cDNA. Nucleic Acids Res., 22, 5640-5648.
Lisitsyn, N. and Wigler, M. (1993) Cloning the differences between two complex genomes. Science, 259, 946-951.
Luo, Z. and Geschwind, D.H. (2001) Microarray applications in neuroscience. Neurobiol. Dis., 8, 183-193.
Müller, J. and Baly, W. (1838) Elements of Physiology. Taylor & Walton, London, p. 819.
Penfield, W. (1959) The interpretive cortex. Science, 179, 1719-1725.
Rousselot, P., Lois, C. and Alvarez-Buylla, A. (1995) Embryonic (PSA) N-CAM reveals chains of migrating neuroblasts between the lateral ventricle and the olfactory bulb of adult mice. J. Comp. Neurol., 351, 51-61.
Taniguchi, M., Miura, K., Iwao, H. and Yamanaka, S. (2001) Quantitative assessment of DNA microarrays - comparison with Northern blot analyses. Genomics, 71, 34-39.
Terskikh, A.V., Easterday, M.C., Li, L., Hood, L., Kornblum, H.I., Geschwind, D.H. and Weissman, I.L. (2001) From hematopoiesis to neuropoiesis: evidence of overlapping genetic programs. Proc. Natl Acad. Sci. USA, 98, 7934-7939.
Welford, S.M., Gregg, J., Chen, E., Garrison, D., Sorensen, P.H., Denny, C.T. and Nelson, S.F. (1998) Detection of differentially expressed genes in primary tumor tissues using representational differences analysis coupled to microarray hybridization. Nucleic Acids Res., 26, 3059-3065.
Accepted December 7, 2001