Title: High-throughput transcriptome and functional validation analysis
Abstract: Methods for correlating genes and gene function are provided. Such methods generally involve selecting a candidate gene that appears to be correlated with a particular cellular state or activity and then validating the role of the candidate gene in establishing that cellular state or activity. Certain methods utilize RNA interference techniques in the validation process.
Patent Number: 6,841,351 Issued on 01/11/2005 to Gan et al.
Inventors:
Gan; Li (San Francisco, CA);
Gonzalez-Zulueta; Mirella (Pacifica, CA);
Anton; Kristin (San Ramon, CA);
Wilson; Richa (San Francisco, CA);
Melcher; Thorsten (San Francisco, CA);
Chin; Daniel (Foster City, CA)
Assignee: AGY Therapeutics, Inc. (South San Francisco, CA)
Appl. No.: 027807
Filed: October 19, 2001
Current U.S. Class: 435/6; 435/91.1; 435/91.2; 436/501; 536/22.1
Intern'l Class: C12Q 001/68; C12P019/34; C07H021/04; G01N033/566
Field of Search: 435/6, 91.1, 91.2; 436/501; 536/22.1
References Cited
U.S. Patent Documents
6027876, Feb. 2000, Kreitman, 435/6.
6077686, Jun. 2000, Der, 435/69.
6124091, Sep. 2000, Petryshyn, 435/6.
6135942, Oct. 2000, Leptin, 535/23.
6251928, Jun. 2001, Nakao et al., 514/369.
6300110, Oct. 2001, Villeponteau, 435/194.
6312686, Nov. 2001, Staddon, 424/94.
Foreign Patent Documents
WO 00/44914, Aug. 2000, WO.
Other References
Bosher and Labouesse, (2000), Nat. Cell Biol., 2(2):E31-36.
Hamilton and Baulcombe, (1999), Science, 286(5441):950-952.
Hammond et al., (2001), Nat. Rev. Genet., 2(2):110-119.
Kumiko et al., (2000), FEBS Letters, 479:79-82.
Sharp and Zamore, (2000), Science, 287(5462):2431-2433.
Svoboda et al., (2000), Development, 127(19):4147-4156.
Tuschl et al., (1999), Genes Dev., 13(24):3191-3197.
Tavernarakis et al., (2000), Nat. Genet., 24(2):180-183.
Wianny and Zernicka-Goetz, (2000), Nat. Cell Biol., 2(2):70-75.
|
Primary Examiner: Whisenant; Ethan
Assistant Examiner: Chakrabarti; Arun K.
Attorney, Agent or Firm: Taylor; Rebecca D., Sherwood; Pamela J.
Bozicevic, Field & Francis LLP
Parent Case Text
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part application of U.S. patent
application Ser. No. 09/627,362, filed Jul. 28, 2000 now abandoned, which
claims the benefit of U.S. Provisional Application No. 60/146,640, filed
Jul. 30, 1999, both of which are incorporated herein in their entirety for
all purposes.
Claims
We claim:
1. A method for validating the effect of a candidate gene that is expressed
in a mammalian neural cell of interest, said method comprising:
(a) producing a candidate dsRNA which comprises at least 100 nucleotides
of said candidate gene;
(b) introducing said candidate dsRNA into a reference mammalian neural
cell; and
(c) validating the effect of said candidate gene by detecting an alteration
in a cellular activity or a cellular state in said reference mammalian
neural cell, wherein said alteration is the result of specific attenuation
of mRNA corresponding to said candidate gene in said reference mammalian neural
cell, indicating that said candidate gene plays a functional role in
mammalian neural cells.
2. The method of claim 1, wherein said step of producing the candidate
dsRNA comprises:
producing a cDNA corresponding to said candidate gene from an mRNA of said
mammalian neural cell of interest; and producing the candidate dsRNA from
said cDNA.
3. The method according to claim 2, further comprising:
producing a plurality of candidate cDNAs from said mammalian neural cell of
interest;
producing a plurality of candidate dsRNAs, each of which comprises at least
100 nucleotides of one of said candidate cDNAs;
introducing each of the candidate dsRNAs into a plurality of separate
reference mammalian neural cells having a gene expression profile similar
to that of said mammalian neural cell of interest;
and validating the effect of said candidate genes by testing for
alterations in a cellular activity or a cellular state in said reference
mammalian neural cell that result from attenuation of mRNA corresponding to
said candidate gene in said reference mammalian neural cell, wherein detection
of said alterations is indicative that said candidate gene plays a
functional role in said mammalian neural cells of interest.
4. The method of claim 3, wherein said step of producing a plurality of
candidate cDNAs comprises:
producing double-stranded cDNA from mRNA by reverse transcription;
producing cDNAs of a similar length by digesting said cDNA with a
restriction enzyme; and
producing a plasmid or PCR fragment from said cDNA after said digesting
step.
5. The method of claim 4, wherein the candidate dsRNA is produced by
transcribing said plasmid or PCR fragment.
6. The method of claim 4, wherein the restriction enzyme is selected from
the group consisting of Dpn I and Rsa I.
7. The method of claim 3, wherein said step of producing the plurality of
candidate dsRNAs comprises selecting a candidate cDNA that is expressed at
a detectably different level with respect to said reference mammalian
neural cell and said mammalian neural cell of interest, and said reference
mammalian neural cell and said mammalian neural cell of interest differ
with respect to a cellular characteristic that is detectable by said step
of testing for alterations in a cellular activity or a cellular state.
8. The method of claim 7, wherein the candidate cDNA is selected from a
normalized library prepared from said reference mammalian neural cells or
said mammalian neural cell of interest and is present in low abundance in
the normalized library.
9. The method of claim 7, wherein the candidate cDNA is a differentially
expressed cDNA selected from a subtracted library that is enriched for
cDNAs that are differentially expressed with respect to said reference
mammalian neural cells or said mammalian neural cell of interest.
10. The method of claim 7, wherein said step of selecting the candidate
cDNA comprises:
preparing a tester-normalized cDNA library from test cells; a
driver-normalized cDNA library from control cells; a tester-subtracted
cDNA library which is enriched in one or more genes that are up-regulated
in the test cell relative to the control cell, and a
driver-subtracted cDNA library which is enriched in one or more genes that
are down-regulated in the test cell relative to the control cell; and
selecting a cDNA from the normalized libraries by contacting cDNAs from the
tester-normalized cDNA library with labeled probes derived from mRNA from
test cells and contacting cDNAs from the driver-normalized cDNA library
with labeled probes derived from mRNA from control cells under conditions
whereby probes specifically hybridize with complementary cDNAs to form a
first set of hybridization complexes; and detecting at least one
hybridization complex from the first set of hybridization complexes to
identify a cDNA that is present in low abundance.
11. The method of claim 7, wherein said step of selecting the candidate
cDNA comprises:
preparing a tester-normalized cDNA library from test cells; a
driver-normalized cDNA library from control cells; a tester-subtracted
cDNA library which is enriched in one or more genes that are up-regulated
in the test cell relative to the control cell, and a
driver-subtracted cDNA library which is enriched in one or more genes that
are down-regulated in the test cell relative to the control cell; and
selecting a cDNA from the subtracted libraries by contacting cDNAs from the
tester-subtracted cDNA library and contacting cDNAs from the
driver-subtracted cDNA library with a population of labeled probes under
conditions whereby probes from the population of probes specifically
hybridize with complementary cDNAs to form a second set of hybridization
complexes, and wherein the population of labeled probes is derived from
mRNA from test cells and control cells; and detecting at least one
hybridization complex from the second set of hybridization complexes to
identify a cDNA that is differentially expressed above a threshold level
with respect to the subtracted libraries.
12. The method of claim 7, wherein the cellular characteristic is cell
health, the test cell is a diseased neural cell and the control cell is a
healthy neural cell, and the candidate gene is suspected of correlation
with a disease.
13. The method of claim 12, wherein the test cell is obtained from a mammal
that has had a stroke or is at risk for stroke.
14. The method of claim 7, wherein the cellular characteristic is cellular
differentiation and the candidate gene is suspected of correlation with
control of cellular differentiation.
15. The method of claim 7, wherein the candidate gene is endogenous to said
reference mammalian neural cell.
16. The method of claim 7, wherein the candidate gene is an
extrachromosomal gene in said reference mammalian neural cell.
17. The method of claim 12, wherein said reference mammalian neural cell is
a neuroblastoma cell.
18. The method of claim 17, wherein said reference mammalian neural cell
has increased sensitivity to N-methyl-D-aspartate, β-amyloid,
peroxide, oxygen-glucose deprivation, or combinations thereof, relative to
a normal mammalian neural cell.
19. The method of claim 18, wherein the detecting step comprises detecting
a decrease in cellular sensitivity to N-methyl-D-aspartate,
β-amyloid, peroxide, oxygen-glucose deprivation, or combinations
thereof, relative to a normal mammalian neural cell.
20. The method of claim 3, wherein the detecting step comprises detecting
modulation of ligand binding to a protein.
21. The method of claim 1, wherein said detecting comprises
determining whether the protein encoded by the candidate gene binds to
another protein to form a coimmunoprecipitating complex.
22. The method of claim 1, wherein the candidate dsRNA is at least 500
nucleotides in length.
23. The method of claim 1, wherein the candidate dsRNA is between 500 and
1100 nucleotides in length.
24. The method of claim 1, wherein said mammalian neural cell of interest
is a glial cell.
25. The method of claim 1, wherein said reference mammalian neural cell is
a glial cell.
Description
BACKGROUND
It has been estimated that a mammalian genome contains over 100,000 genes,
only a fraction of which are expressed in any particular cell or tissue.
Gene expression patterns, especially as reflected in the abundance of
mRNAs, vary according to cell or tissue type, with developmental or
metabolic state, in response to insult or injury, and as a consequence of
other genetic and environmental factors. Moreover, the pattern of
expression changes in a dynamic fashion over time with changes in cell
state and environment. The term "transcriptome" has been coined to
describe the set of all genes expressed, at any given time, under defined
conditions in a given tissue (Velculescu et al., 1997, Cell 88:243-51).
The detection of changes to the transcriptome can provide useful
information regarding the identity of genes and gene products important in
development, drug response, and, particularly, human disease processes.
However, methods now used for identifying changes in the transcriptome
suffer from a variety of deficiencies, e.g., they are expensive, require
relatively large quantities of starting material, and/or do not
efficiently identify low abundance transcripts important in mediating cell
processes.
While a change in the expression of a particular gene between different
cell states is evidence that the gene may be responsible for the
difference in cell states, it would be preferable that the putative role
assigned to the gene be validated. Such validation ideally would involve
an assay system in which one can interrogate what effect, if any,
modulation of expression of the gene has on a cellular state or cellular
activity. If modulation of expression were found to be correlated with a
change in cellular state or activity, this would substantiate the putative
role for the gene. Thus, there remains a need for high-throughput methods
for first identifying genes that appear to play a role in a particular
cellular state or activity and then validating that the gene does in fact
have such a role.
BRIEF SUMMARY OF THE INVENTION
One aspect of the present invention provides a method for identifying and
producing an active double-stranded RNA (dsRNA) that attenuates expression
of a desired gene in a cell. In one particular embodiment, the method for
identifying and producing an active dsRNA comprises:
(a) producing a plurality of cDNAs, wherein each cDNA comprises at least a
portion of a gene that is expressed in a cell;
(b) producing a candidate dsRNA from at least one of the cDNAs;
(c) introducing the candidate dsRNA into a reference cell having a gene
expression profile similar to that of the cell in step (a); and
(d) identifying an active dsRNA by determining whether the candidate dsRNA
attenuates expression of a desired gene in the reference cell.
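Steps (a) through (d) can be pictured as a simple screening loop. The following is a minimal sketch under stated assumptions, not the method as practiced: sequences are plain strings, step (c) (introducing the dsRNA into reference cells) is a bench step represented only by a dict of measured target-expression values, and the 50% attenuation cutoff and all names are hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch: names, the dict of "measured" expression values, and
# the 50% attenuation cutoff are illustrative assumptions only.

# Base-pairing map for RNA, used to build the antisense strand.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def cdna_to_dsrna(cdna: str) -> tuple[str, str]:
    """Step (b): return the (sense, antisense) RNA strands for a cDNA fragment."""
    sense = cdna.upper().replace("T", "U")
    antisense = sense.translate(RNA_COMPLEMENT)[::-1]  # reverse complement
    return sense, antisense


def attenuation(treated: float, untreated: float) -> float:
    """Fractional drop in target expression: 0.0 = none, 1.0 = complete."""
    return 1.0 - treated / untreated


def find_active(cdnas: dict[str, str],
                measured: dict[str, float],
                untreated: float,
                min_attenuation: float = 0.5) -> dict[str, tuple[str, str]]:
    """Steps (b)-(d): build one dsRNA per cDNA and keep those that attenuate.

    Step (c) happens at the bench; `measured` stands in for the
    target-expression readout obtained after introducing each dsRNA.
    """
    dsrnas = {name: cdna_to_dsrna(seq) for name, seq in cdnas.items()}
    return {name: strands for name, strands in dsrnas.items()
            if attenuation(measured[name], untreated) >= min_attenuation}


# Toy input; real candidate dsRNAs would be 100 nucleotides or longer.
active = find_active({"clone1": "ATGGCC", "clone2": "GGATCC"},
                     {"clone1": 20.0, "clone2": 90.0},
                     untreated=100.0)
print(list(active))  # ['clone1']
```

Because the active dsRNA is returned together with its strands, it can be re-synthesized from the corresponding cDNA, which mirrors the point made in the next paragraph that no separate oligonucleotide synthesis is needed.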
Moreover, methods of the present invention can also include producing the
identified active dsRNA from the corresponding cDNA of step (a). Since
methods of the present invention provide a library, preferably a
comprehensive library, of cDNA, once the active dsRNA has been identified
it can be readily synthesized by transcription of the corresponding cDNA.
Therefore, methods of the present invention do not require conventional
chemical oligonucleotide synthesis and/or availability of known gene
sequences to produce the active dsRNA.
Identification of the active dsRNA includes selecting a candidate gene and
determining whether a dsRNA comprising at least a portion of the candidate
gene is an active dsRNA, i.e., whether modulation of expression of the
candidate gene by the dsRNA in a reference cell has a functional effect in
the reference cell. The candidate gene is a gene that is expressed in a test
cell and/or a control cell, and/or is expressed at a detectably different
level with respect to the test cell and the control cell. The candidate
gene can be an endogenous gene of the reference cell, or it can be present
in the reference cell as an extrachromosomal gene. The test cell and
control cell differ with respect to a particular cellular characteristic
of interest. The active dsRNA alters a cellular activity or a cellular
state in the reference cell by modulating the expression of the candidate
gene.
Active dsRNA can be identified by a variety of methods, including by
introducing the candidate dsRNA into the reference cell and detecting an
alteration in a cellular activity or a cellular state in the reference
cell. The alteration in a cellular activity or a cellular state in the
reference cell indicates that the candidate gene plays a functional role
in the reference cell and that the candidate dsRNA is an active dsRNA.
Preferably, the candidate dsRNA is selected such that it is substantially
identical to at least a part of the candidate gene.
In one embodiment, the cellular characteristic is cell health, the test
cell is a diseased cell and the control cell is a healthy cell, and the
candidate gene is potentially correlated with a disease.
In another embodiment, the cellular characteristic is stage of development
and the test cell and the control cell are at different stages of
development, and the candidate gene is potentially correlated with
mediating the change between the different stages of development.
In yet another embodiment, the cellular characteristic is cellular
differentiation and the candidate gene is potentially correlated with
controlling cellular differentiation.
Preferably, the plurality of cDNAs used to synthesize dsRNA is produced
from at least one mRNA isolated from the cell. The isolated mRNA is
reverse transcribed by any method conventionally known to one skilled in
the art to produce the cDNA. Typically, the cDNA is then digested with one
or more, preferably two, restriction enzymes to produce a plurality of
cDNAs of similar length. In this manner, a more comprehensive cDNA library
is provided. In one particular embodiment of the present invention, the
restriction enzyme is selected from the group consisting of Dpn I and
Rsa I. A plasmid or PCR fragment is then generated from the digested cDNAs
by any conventional method known to one skilled in the art, and the
candidate dsRNA is then produced by transcription of the plasmid or the
PCR fragment.
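Why a 4-base cutter yields fragments of similar length can be seen with an in-silico digest. This is a minimal sketch, assuming only Rsa I is modeled (recognition site GTAC, blunt cut between T and A); Dpn I is left out because its cleavage of GATC depends on methylation. On a random sequence with equal base composition a 4-base site occurs about once every 4^4 = 256 bases, so fragments cluster around a few hundred bases. The input sequence is made up.

```python
from __future__ import annotations

# Assumption: only Rsa I is modeled; the input sequence is invented.
RSA_I_SITE = "GTAC"  # Rsa I recognition sequence (palindromic)
CUT_OFFSET = 2       # Rsa I cuts GT^AC, leaving blunt ends


def rsa_i_digest(seq: str) -> list[str]:
    """Return the fragments from a complete in-silico Rsa I digest of `seq`.

    GTAC is its own reverse complement, so scanning one strand finds the
    sites on both strands.
    """
    seq = seq.upper()
    cuts = []
    site = seq.find(RSA_I_SITE)
    while site != -1:
        cuts.append(site + CUT_OFFSET)
        site = seq.find(RSA_I_SITE, site + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


fragments = rsa_i_digest("AAGTACCCGTACTT")
print(fragments)                    # ['AAGT', 'ACCCGT', 'ACTT']
print([len(f) for f in fragments])  # [4, 6, 4]
```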
In another embodiment, the cDNA is produced from all mRNAs that are
isolated from the control cell. This provides a comprehensive cDNA library
which comprises at least a portion of substantially all genes that are
actively expressed in the cell.
Another aspect of the present invention provides a method for identifying
and validating the activity of an active dsRNA that attenuates expression
of a desired gene in a cell. The method generally comprises producing a candidate
dsRNA, introducing the candidate dsRNA into a reference cell and
identifying whether the candidate dsRNA is an active dsRNA by detecting an
alteration in a cellular activity or a cellular state in the reference
cell.
Yet another aspect of the present invention provides a high-throughput
method for correlating genes and gene function, said method comprising:
(a) producing a plurality of candidate dsRNAs from a plurality of cDNAs of
a control cell such that each candidate dsRNA comprises at least a portion
of a gene that is expressed in the control cell;
(b) introducing each of the candidate dsRNAs into a separate reference
cell, each reference cell having a gene expression profile similar to that
of the control cell in step (a); and
(c) identifying which candidate dsRNA is an active dsRNA by detecting an
alteration in a cellular activity or a cellular state in the reference
cell, such an alteration indicating that the gene corresponding to the
candidate dsRNA plays a functional role in the reference cell.
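In a high-throughput format, step (b) amounts to assigning each candidate dsRNA to its own well and step (c) to flagging wells whose readout departs from control wells. The sketch below assumes a 96-well layout, a numeric readout such as viability, and z-scores against wells transfected with a control dsRNA (as dsEGFP serves as a control in FIG. 6B). The |z| >= 3 cutoff is a common screening convention swapped in for illustration; none of these choices come from the patent.

```python
from __future__ import annotations

from statistics import mean, stdev

# Hypothetical sketch: plate layout, control-well z-scores and the |z| >= 3
# cutoff are assumed conventions, not taken from the patent.


def plate_layout(dsrna_names: list[str], rows: str = "ABCDEFGH",
                 cols: int = 12) -> dict[str, str]:
    """Step (b): assign each candidate dsRNA to its own well."""
    wells = [f"{row}{col}" for row in rows for col in range(1, cols + 1)]
    if len(dsrna_names) > len(wells):
        raise ValueError("more dsRNAs than wells on one plate")
    return dict(zip(wells, dsrna_names))


def call_hits(readouts: dict[str, float], control_wells: list[float],
              z_cutoff: float = 3.0) -> dict[str, float]:
    """Step (c): return {dsRNA: z-score} for readouts far from the controls."""
    mu, sd = mean(control_wells), stdev(control_wells)
    if sd == 0:
        raise ValueError("control wells show no variation")
    scores = {name: (value - mu) / sd for name, value in readouts.items()}
    return {name: z for name, z in scores.items() if abs(z) >= z_cutoff}


print(plate_layout(["dsA", "dsB"]))  # {'A1': 'dsA', 'A2': 'dsB'}
hits = call_hits({"dsA": 41.0, "dsB": 70.0},
                 control_wells=[40.0, 42.0, 38.0, 41.0, 39.0])
print(sorted(hits))  # ['dsB']
```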
In one embodiment, the plurality of cDNAs is produced from a plurality of
mRNAs as described herein. Preferably, each candidate dsRNA is
substantially identical to at least a portion of the candidate gene.
Detecting an alteration in a cellular activity or a cellular state in the
reference cell can involve a variety of methods. For example, one can
detect modulation of ligand binding to a protein, detect a change in
phenotype or determine whether the protein encoded by the candidate gene
binds to another protein to form a complex that can be
coimmunoprecipitated. Detecting a change in phenotype is particularly
useful when the reference cell is a part of an organism. In addition,
detecting an alteration in a cellular activity or a cellular state in the
reference cell can involve determining whether interference with
expression of the candidate gene in the reference cell is correlated with
alteration of a cellular activity or cellular state. Interference can be
achieved by introducing a double-stranded RNA into the reference cell that
can specifically hybridize to the candidate gene.
The candidate gene can be selected from a normalized library prepared from
cells of the same type as the test cell or the control cell. In one
particular embodiment, the candidate gene is present in low abundance in
the normalized library.
In another embodiment, the candidate gene is a differentially expressed
gene selected from a subtracted library that is enriched for genes that
are differentially expressed between the test cell and the control cell.
Preferably, the subtracted library is also normalized and the candidate
gene is one of the genes that are both present in low abundance and
differentially expressed in the subtracted and normalized library.
In one particular embodiment of the present invention, the candidate gene
is selected by a method comprising:
(i) preparing
(A) a tester-normalized cDNA library which is a normalized library prepared
from test cells;
(B) a driver-normalized cDNA library which is a normalized library prepared
from control cells;
(C) a tester-subtracted cDNA library which is enriched in one or more genes
that are up-regulated in the test cell relative to the control cell,
and
(D) a driver-subtracted cDNA library which is enriched in one or more genes
that are down-regulated in the test cell relative to the control
cell; and
(ii) identifying one or more clones from the normalized libraries and/or
the subtracted libraries,
wherein the candidate gene is one of the clones identified.
In one embodiment, identification of one or more clones from the normalized
libraries comprises:
(A) contacting clones from the tester-normalized cDNA library with labeled
probes derived from mRNA from test cells and contacting clones from the
driver-normalized cDNA library with labeled probes derived from mRNA from
control cells under conditions whereby probes specifically hybridize with
complementary clones to form a first set of hybridization complexes; and
(B) detecting at least one hybridization complex from the first set of
hybridization complexes to identify a clone from one of the normalized
libraries which is present in low abundance.
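Step (B) requires a rule for calling a clone "present in low abundance" from its hybridization signal. The patent does not give one, so this minimal sketch assumes that a clone is detected when its signal is at least twice background, and that low abundance means falling in the weakest quarter of detected clones. Both thresholds and the signal values are hypothetical; the function would be applied separately to each normalized library with its matching probe set.

```python
from __future__ import annotations

# Hypothetical sketch: the 2x-background detection rule and the
# "lowest 25% of detected clones" definition are assumptions.


def low_abundance_clones(signals: dict[str, float], background: float,
                         min_snr: float = 2.0,
                         low_fraction: float = 0.25) -> list[str]:
    """Return detected clones whose signal ranks in the lowest fraction."""
    detected = {clone: s for clone, s in signals.items()
                if s >= min_snr * background}
    if not detected:
        return []
    ranked = sorted(detected, key=detected.get)  # weakest signal first
    n_low = max(1, int(len(ranked) * low_fraction))
    return ranked[:n_low]


tester_signals = {"c1": 5.0, "c2": 30.0, "c3": 300.0, "c4": 3000.0, "c5": 25.0}
print(low_abundance_clones(tester_signals, background=10.0))  # ['c5']
```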
In another embodiment, identification of one or more clones from the
subtracted libraries comprises:
(A) contacting clones from the tester-subtracted cDNA library and
contacting clones from the driver-subtracted cDNA library with a
population of labeled probes under conditions whereby probes from the
population of probes specifically hybridize with complementary clones to
form a second set of hybridization
complexes, and wherein the population of labeled probes is derived from
mRNA from test cells and control cells; and
(B) detecting at least one hybridization complex from the second set of
hybridization complexes to identify a clone from one of the subtracted
libraries which is differentially expressed above a threshold level with
respect to the subtracted libraries.
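The "threshold level" in step (B) can be expressed as a fold change between the signal from probes derived from test-cell mRNA and the signal from probes derived from control-cell mRNA. This minimal sketch assumes a two-fold cutoff and a pseudocount of 1 to avoid division by zero; both values and the example signals are hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical sketch: two-fold cutoff and pseudocount are assumptions.


def differential_clones(test_probe: dict[str, float],
                        control_probe: dict[str, float],
                        min_fold: float = 2.0,
                        pseudocount: float = 1.0) -> dict[str, float]:
    """Return {clone: log2(test/control signal)} for clones past the cutoff.

    Positive values suggest up-regulation in test cells (tester-subtracted
    library); negative values suggest down-regulation (driver-subtracted).
    """
    cutoff = math.log2(min_fold)
    result = {}
    for clone, t_signal in test_probe.items():
        ratio = (t_signal + pseudocount) / (control_probe[clone] + pseudocount)
        lfc = math.log2(ratio)
        if abs(lfc) >= cutoff:
            result[clone] = lfc
    return result


print(differential_clones({"a": 399.0, "b": 100.0, "c": 9.0},
                          {"a": 99.0, "b": 110.0, "c": 79.0}))
# {'a': 2.0, 'c': -3.0}
```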
Methods of the present invention can be used with a wide variety of cells
and cell types. For example, in one embodiment the test cell is obtained
from a mammal that has had a stroke or is at risk for stroke. In another
embodiment, the test cell is obtained from a mammal that has a neurological
disorder or has developed phenotypes mimicking human neurological
disorders. The reference cell can be part of a cell culture, a tissue, an
organism, or an embryo, and can be, for example, a neural cell, a glial
cell, or a neuroblastoma cell. The reference cell can be a mammalian cell.
Preferably, the reference cell is a human cell or a cell from a model
system that is useful for investigating a variety of human diseases and/or
illnesses.
In one embodiment, the reference cell is useful as a model system for
investigating neurological disorders in humans. In one particular
embodiment, the reference cell has increased sensitivity to
N-methyl-D-aspartate, β-amyloid, peroxide, oxygen-glucose deprivation,
or combinations thereof. In such cases, the detecting step can comprise
detecting a decrease in cellular sensitivity to N-methyl-D-aspartate,
β-amyloid, peroxide, oxygen-glucose deprivation, or combinations
thereof.
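A decrease in sensitivity to an insult such as NMDA or oxygen-glucose deprivation is often reported as percent protection: the share of the insult-induced loss of viability that a treatment prevents. This normalization is a common convention assumed here, not one the patent specifies, and the viability values in the example are invented.

```python
def percent_protection(v_candidate: float, v_control: float,
                       v_no_insult: float) -> float:
    """Percent of the insult-induced viability loss prevented by a dsRNA.

    v_candidate: viability of cells given the candidate dsRNA, then the insult
    v_control:   viability of control-dsRNA cells given the same insult
    v_no_insult: viability of control cells never exposed to the insult
    """
    window = v_no_insult - v_control
    if window <= 0:
        raise ValueError("the insult did not reduce control viability")
    return 100.0 * (v_candidate - v_control) / window


print(percent_protection(70.0, 40.0, 100.0))  # 50.0
```

A value near 0 means no protection and a value near 100 means viability is restored to the uninsulted level.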
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows duplicate arrays probed using the "knock-down" methods of the
invention. Arrows show (A) presence of hybridization signal (triplicate
spots) and (B) reduction of signal due to inclusion of knock-down
polynucleotide during hybridization. This figure shows a portion (detail)
of a larger array.
FIG. 2. Clones representing a group that are upregulated in Rsa I, 6 h
(tester) as opposed to Rsa I, 0 h (driver) and are of low hybridization
signal (=low abundance) in tester and driver are increased in their signal
(abundance) under condition of Library ID "F" (normalized
tester-subtracted) and PCR cycles=21, 23, 25, 27. Libraries (L) and
numbers of amplification steps in the second PCR cycle (N) are indicated
by the shorthand "LN." For example, "A21" encodes a description of Library
ID "A" with second PCR cycle process length of 21 cycles.
FIG. 3. Clones representing a group that are upregulated in Rsa I, 6 h
(tester) as opposed to Rsa I, 0 h (driver) and are of low hybridization
signal (=low abundance) in tester and driver are increased in their signal
(abundance) under condition of Library IDs "C" through "F" (normalized
tester-subtracted), "H" through "K" (normalized driver-subtracted) and PCR
cycles=25. Clones from Library IDs "A" and "B" are essentially unchanged.
FIG. 4. Clones representing groups that are upregulated in Rsa I, 6 h
(tester) as opposed to Rsa I, 0 h (driver) and are of low, medium or high
tester hybridization signal are normalized in their signal under condition
of Library ID "B".
FIG. 5. A Western Blot showing inhibition of expression of eGFP (enhanced
Green Fluorescent Protein) by eGFP dsRNA in a neuroblastoma cell line
(AGYNB-010) harboring a plasmid encoding eGFP. The blot shows
inhibition of eGFP expression for cells transfected with eGFP dsRNA (i.e.,
dsRNA corresponding to the entire eGFP coding region; lanes 9 and 10) and
for cells transfected with eGFP dsRNA from the C-terminus (dsEGFP-C; lanes
6-8). Untransfected cells (mock cells; lanes 1-2) and cells transfected
with UCP-2 dsRNA (dsUCP2; lanes 3-5) served as controls and show little or
no inhibition of eGFP expression. Anti-MAP2 was used to assure equal
loading.
FIG. 6A. A Western Blot showing inhibition of endogenous PARP by PARP
dsRNA. Inhibition of endogenous PARP expression is observed for
neuroblastoma cells (AGYNB-010) transfected with PARP dsRNA prepared from
the C-terminus of PARP (dsPARP-C; lanes 3-6) or PARP dsRNA prepared from
the N-terminus of PARP (dsPARP-N; lanes 7-10). Control cells transfected
with UCP-2 dsRNA, in contrast, still express endogenous PARP (lanes 1-2).
Anti-MAP2 was used to assure equal loading.
FIGS. 6B-6D. Results showing that RNAi-mediated inhibition of PARP
expression induces resistance to oxygen glucose deprivation (OGD). FIGS.
6B and 6C show views of neuroblastoma cells (AGYNB-010 cells) subjected to
3 hours of OGD. Cell viability was assayed by staining with a fluorescent
dye that preferentially stains healthy cells rather than dead cells. Three
hours after initiation of OGD, cells transfected with dsPARP show
significantly less cell death (FIG. 6C) than control cells transfected
with dsEGFP (FIG. 6B). FIG. 6D is a chart showing that AGYNB-010 cells
transfected with dsPARP are rescued from cell death following 3 hours of
OGD, whereas control cells that are either untransfected (mock cells) or
transfected with dsEGFP show significant cell death after 3 hours of OGD.
FIGS. 7A-7C. Charts showing sensitivity of the AGYNB-010 neuroblastoma cell
line to β-amyloid (FIG. 7A), N-methyl-D-aspartate (NMDA) (FIG. 7B)
and oxygen glucose deprivation (OGD) (FIG. 7C).
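The Western blots of FIGS. 5 and 6A use anti-MAP2 to confirm equal loading. One common way to turn such blots into a number is to divide each target band by its loading-control band and compare treated lanes with control lanes (for example, dsUCP2 lanes). The sketch below assumes band intensities from densitometry; the numbers are invented and do not reflect the data in the figures.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical sketch: band intensities are invented densitometry values.


def normalized_level(target: list[float], loading: list[float]) -> float:
    """Mean target / loading-control (e.g., MAP2) ratio across replicate lanes."""
    if len(target) != len(loading) or not target:
        raise ValueError("need one loading-control value per target lane")
    return mean([t / l for t, l in zip(target, loading)])


def percent_inhibition(target: list[float], loading: list[float],
                       ctrl_target: list[float],
                       ctrl_loading: list[float]) -> float:
    """Percent drop in loading-normalized target signal versus control lanes."""
    treated = normalized_level(target, loading)
    control = normalized_level(ctrl_target, ctrl_loading)
    return 100.0 * (1.0 - treated / control)


print(percent_inhibition([20.0, 30.0], [100.0, 100.0],
                         [100.0, 100.0], [100.0, 100.0]))  # 75.0
```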
DETAILED DESCRIPTION
I. Definitions
As used in this specification and the appended claims, the singular forms
"a," "an" and "the" include plural references unless the content clearly
dictates otherwise.
Unless defined otherwise, all technical and scientific terms used herein
have the meaning commonly understood by a person skilled in the art to
which this invention belongs. The following references provide one of
skill with a general definition of many of the terms used in this
invention: Singleton et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR
BIOLOGY (2d ed. 1994); THE CAMBRIDGE DICTIONARY OF SCIENCE AND TECHNOLOGY
(Walker ed., 1988); THE GLOSSARY OF GENETICS, 5TH ED., R. Rieger et al.
(eds.), Springer Verlag (1991); and Hale & Margham, THE HARPER COLLINS
DICTIONARY OF BIOLOGY (1991).
Various biochemical and molecular biology methods are well known in the
art. For example, methods of isolation and purification of nucleic acids
are described in detail in WO 97/10365; WO 97/27317; Chapter 3 of
Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization
With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, (P.
Tijssen, ed.) Elsevier, N.Y. (1993); Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Press, N.Y. (1989); and Current
Protocols in Molecular Biology, (Ausubel, F. M. et al., eds.) John Wiley &
Sons, Inc., New York (1987-1999), including supplements such as supplement
46 (April 1999).
As used herein, the following terms have the meanings ascribed to them
unless specified otherwise:
The term "tissue," as used herein in the context of a source of mRNA and
cDNA, refers to any aggregation of morphologically or functionally related
cells, or cell systems, and thus includes cells (including in vitro
cultured cells), tissues, organs, and the like.
The term "library" as used herein, refers to a collection of
polynucleotides (usually in the form of double-stranded cDNA) derived from
mRNA of a particular tissue. The polynucleotides of a library may be, but
are not necessarily, cloned into a vector.
The terms "nucleic acid," "polynucleotide," and "oligonucleotide" are used
interchangeably herein and refer to a deoxyribonucleotide or ribonucleotide
polymer in either single- or double-stranded form and, unless otherwise
limited, encompass known analogs of natural nucleotides that hybridize
to nucleic acids in a manner similar to naturally occurring nucleotides.
Examples of such analogs include, without limitation, phosphorothioates,
phosphoramidates, methyl phosphonates, chiral-methyl phosphonates,
2'-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs). A
"subsequence" or "segment" refers to a sequence of nucleotides that
comprises a part of a longer sequence of nucleotides.
A "gene," for the purposes of the present disclosure, includes a DNA region
encoding a gene product (see infra). The region can also include DNA
regions that regulate the production of the gene product, whether or not
such regulatory sequences are adjacent to coding and/or transcribed
sequences. Accordingly, a gene can include, without limitation, promoter
sequences, terminators, translational regulatory sequences such as
ribosome binding sites and internal ribosome entry sites, enhancers,
silencers, insulators, boundary elements, replication origins, matrix
attachment sites and locus control regions.
"Gene expression" refers to the conversion of the information, contained in
a gene, into a gene product. A gene product can be the direct
transcriptional product of a gene (e.g., mRNA, tRNA, rRNA, antisense RNA,
ribozyme, structural RNA or any other type of RNA) or a protein produced
by translation of a mRNA. Gene products also include RNAs which are
modified, by processes such as capping, polyadenylation, methylation, and
editing, and proteins modified by, for example, methylation, acetylation,
phosphorylation, ubiquitination, ADP-ribosylation, myristoylation, and
glycosylation.
"Modulation" refers to a change in the level or magnitude of an activity or
process. The change can be either an increase or a decrease. For example,
modulation of gene expression includes both gene activation and gene
repression. Modulation can be assayed by determining any parameter that is
indirectly or directly affected by the expression of the target gene. Such
parameters include, e.g., changes in RNA or protein levels, changes in
protein activity, changes in product levels, changes in downstream gene
expression, changes in reporter gene transcription (luciferase, CAT,
β-galactosidase, β-glucuronidase, green fluorescent protein
(see, e.g., Misteli & Spector, Nature Biotechnology 15:961-964 (1997));
changes in signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g., cGMP,
cAMP, IP3, and Ca2+), and cell growth.
The term "complementary" means that one nucleic acid is identical to, or
hybridizes selectively to, another nucleic acid molecule. Selectivity of
hybridization exists when hybridization is more selective than would be
expected from a total lack of specificity. Typically, selective hybridization will occur
when there is at least about 55% identity over a stretch of at least 14-25
nucleotides, preferably at least 65%, more preferably at least 75%, and
most preferably at least 90%. Preferably, one nucleic acid hybridizes
specifically to the other nucleic acid. See M. Kanehisa, Nucleic Acids
Res. 12:203 (1984).
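The identity thresholds above can be screened computationally. The following is a minimal Python sketch, assuming an ungapped comparison in which every window of a chosen length in one sequence is compared with every equal-length window in the other; the function names and the default window of 20 nucleotides are hypothetical choices within the 14-25 nucleotide range stated above. When assessing hybridization, the caller would normally pass the reverse complement of the target strand.

```python
def best_window_identity(a: str, b: str, window: int = 20) -> float:
    """Highest percent identity between any ungapped window of `window`
    nucleotides in `a` and any equal-length window in `b`."""
    a, b = a.upper(), b.upper()
    if not 0 < window <= min(len(a), len(b)):
        raise ValueError("window must be between 1 and the shorter sequence length")
    best_matches = 0
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Count position-by-position matches for this pair of windows.
            matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            best_matches = max(best_matches, matches)
    return 100.0 * best_matches / window


def meets_identity_threshold(a: str, b: str, window: int = 20,
                             min_identity: float = 55.0) -> bool:
    """True if at least one window pair reaches `min_identity` percent."""
    return best_window_identity(a, b, window) >= min_identity


print(best_window_identity("ACGTACGTAC", "ACGTTTTTAC", window=10))  # 70.0
```

The exhaustive window scan is quadratic in sequence length and is meant only for short probes; raising `min_identity` to 65, 75, or 90 reproduces the increasingly preferred thresholds in the text.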
The term "exogenous" when used with reference to a molecule (e.g., a
nucleic acid) refers to a molecule that is not normally present in a cell,
but can be introduced into a cell by one or more genetic, biochemical or
other methods. Normal presence in the cell is determined with respect to
the particular developmental stage and environmental conditions of the
cell. Thus, for example, a molecule that is present only during embryonic
development of muscle is an exogenous molecule with respect to an adult
muscle cell. An exogenous molecule can comprise, for example, a
functioning version of a malfunctioning endogenous molecule or a
malfunctioning version of a normally-functioning endogenous molecule.
An exogenous molecule can be, among other things, a small molecule, such as
is generated by a combinatorial chemistry process, or a macromolecule such
as a protein, nucleic acid, carbohydrate, lipid, glycoprotein,
lipoprotein, polysaccharide, any modified derivative of the above
molecules, or any complex comprising one or more of the above molecules.
An exogenous molecule can be the same type of molecule as an endogenous
molecule, e.g., protein or nucleic acid (i.e., an exogenous gene),
provided that its sequence differs from that of the endogenous molecule.
Methods for the introduction of exogenous molecules into cells are known
to those of skill in the art and include, but are not limited to,
lipid-mediated transfer (i.e., liposomes, including neutral and cationic
lipids), electroporation, direct injection, cell fusion, particle
bombardment, calcium phosphate co-precipitation, DEAE-dextran-mediated
transfer and viral vector-mediated transfer.
By contrast, the term "endogenous," when used in reference to a molecule,
refers to a molecule that is normally present in a particular cell at a
particular developmental stage under particular environmental conditions.
The terms "identical" or percent "identity," in the context of two or more
nucleic acids or polypeptides, refer to two or more sequences or
subsequences that are the same or have a specified percentage of
nucleotides or amino acid residues that are the same, when compared and
aligned for maximum correspondence, as measured using a sequence
comparison algorithm such as those described below, or by visual
inspection.
The phrase "substantially identical," in the context of two nucleic acids,
refers to two or more sequences or subsequences that have at least 75%,
preferably at least 80% or 85%, more preferably at least 90%, 95% or
higher nucleotide identity, when compared and aligned for maximum
correspondence, as measured using a sequence comparison algorithm such as
those described below, or by visual inspection. Preferably,
the substantial identity exists over a region of the sequences that is at
least about 40-60 nucleotides in length, in other instances over a region
at least 60-80 nucleotides in length, in still other instances at least
90-100 nucleotides in length, and in yet other instances the sequences are
substantially identical over the full length of the sequences being
compared, such as the coding region of a gene.
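Once two sequences have been aligned, percent identity and the "substantially identical" test reduce to simple counting. The Python sketch below assumes pre-aligned, equal-length strings with "-" marking gaps, and divides identical positions by the number of alignment columns (excluding columns that are gaps in both). That denominator is one of several conventions in use, and the function names are hypothetical. The defaults of 75% identity over at least 40 columns mirror the lowest thresholds given above.

```python
def percent_identity(aligned_ref: str, aligned_test: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(r, t) for r, t in zip(aligned_ref.upper(), aligned_test.upper())
               if not (r == "-" and t == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for r, t in columns if r == t)
    return 100.0 * matches / len(columns)


def is_substantially_identical(aligned_ref: str, aligned_test: str,
                               min_identity: float = 75.0,
                               min_columns: int = 40) -> bool:
    """Apply an identity threshold over an aligned region of minimum length."""
    columns = sum(1 for r, t in zip(aligned_ref, aligned_test)
                  if not (r == "-" and t == "-"))
    return (columns >= min_columns
            and percent_identity(aligned_ref, aligned_test) >= min_identity)


print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))  # 77.8
```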
For sequence comparison, typically one sequence acts as a reference
sequence, to which test sequences are compared. When using a sequence
comparison algorithm, test and reference sequences are input into a
computer, subsequence coordinates are designated, if necessary, and
sequence algorithm program parameters are designated. The sequence
comparison algorithm then calculates the percent sequence identity for the
test sequence(s) relative to the reference sequence, based on the
designated program parameters.
Optimal alignment of sequences for comparison can be conducted, e.g., by
the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482
(1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol.
Biol. 48:443 (1970), by the search for similarity method of Pearson &
Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988), by computerized
implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in
the Wisconsin Genetics Software Package, Genetics Computer Group, 575
Science Dr., Madison, Wis.), or by visual inspection [see generally,
Current Protocols in Molecular Biology, (Ausubel, F. M. et al., eds.) John
Wiley & Sons, Inc., New York (1987-1999, including supplements such as
supplement 46 (April 1999))]. Sequence comparisons using these programs
are typically conducted with the default parameters specific to each
program.
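To make the global alignment approach concrete, here is a minimal Python sketch of the Needleman & Wunsch dynamic-programming algorithm with a linear gap penalty. The scores (+1 match, -1 mismatch, -1 per gap position) are illustrative assumptions and are not the defaults of GAP or any other program named above; production tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple:
    """Global alignment; returns (score, aligned_a, aligned_b) with '-' gaps."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap inserted in b
                              score[i][j - 1] + gap)   # gap inserted in a
    # Trace back from the last cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Replacing the global recurrence with one that floors each cell at zero and traces back from the highest-scoring cell would give the Smith & Waterman local variant.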
Another example of an algorithm that is suitable for determining percent
sequence identity and sequence similarity is the BLAST algorithm, which is
described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software
for performing BLAST analyses is publicly available through the National
Center for Biotechnology Information. This algorithm involves first
identifying high-scoring sequence pairs (HSPs) by identifying short words
of length W in the query sequence that either match or satisfy some
positive-valued threshold score T when aligned with a word of the same
length in a database sequence. T is referred to as the neighborhood word
score threshold (Altschul et al., supra). These initial neighborhood word
hits act as seeds for initiating searches to find longer HSPs containing
them. The word hits are then extended in both directions along each
sequence for as far as the cumulative alignment score can be increased.
Cumulative scores are calculated using, for nucleotide sequences, the
parameters M (reward score for a pair of matching residues; always > 0)
and N (penalty score for mismatching residues; always < 0). For amino
acid sequences, a scoring matrix is used to calculate the cumulative
score. Extension of the word hits in each direction is halted when the
cumulative alignment score falls off by the quantity X from its maximum
achieved value; the cumulative score goes to zero or below, due to the
accumulation of one or more negative-scoring residue alignments; or the
end of either sequence is reached. For identifying whether a nucleic acid
or polypeptide is within the scope of the invention, the default
parameters of the BLAST programs are suitable. The BLASTN program (for
nucleotide sequences) uses as defaults a word length (W) of 11, an
expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For
amino acid sequences, the BLASTP program uses as defaults a word length
(W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. The
TBLASTN program (which compares a protein query against a translated
nucleotide database) uses as defaults a word length (W) of 3, an
expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff &
Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).
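The seed-and-extend procedure described above can be sketched in a few lines of Python. This is a simplified, ungapped illustration, not NCBI's implementation: it uses exact word matches only (as is usual for nucleotide seeds, so no neighborhood words), the word length W = 11 and scores M = 5, N = -4 quoted above, and a hypothetical X-drop value of 20. Real BLAST programs add further heuristics and gapped extension.

```python
def find_word_hits(query: str, subject: str, w: int = 11) -> list:
    """(query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_hit(query: str, subject: str, i: int, j: int, w: int = 11,
               match: int = 5, mismatch: int = -4, x_drop: int = 20) -> tuple:
    """Ungapped extension of the seed at (i, j) in both directions.

    Each direction stops when the running score falls more than `x_drop`
    below the best score seen, reaches zero or below, or hits a sequence
    end. Returns (query_start, subject_start, length, score)."""
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    best = sum(pair_score(query[i + k], subject[j + k]) for k in range(w))
    # Extend to the right of the seed.
    running, right, k = best, 0, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        running += pair_score(query[i + w + k], subject[j + w + k])
        k += 1
        if running > best:
            best, right = running, k
        if running <= 0 or best - running > x_drop:
            break
    # Extend to the left, starting from the best right-extended score.
    running, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        running += pair_score(query[i - k - 1], subject[j - k - 1])
        k += 1
        if running > best:
            best, left = running, k
        if running <= 0 or best - running > x_drop:
            break
    return i - left, j - left, left + w + right, best
```

For example, `find_word_hits("GGGGACGTACGTACGCCCC", "TTTTACGTACGTACGAAAA")` yields the single seed `(4, 4)`, and extending it returns `(4, 4, 11, 55)` because every flanking position is a mismatch.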
In addition to calculating percent sequence identity, the BLAST algorithm
also performs a statistical analysis of the similarity between two
sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA
90:5873-5877 (1993)). One measure of similarity provided by the BLAST
algorithm is the smallest sum probability (P(N)), which provides an
indication of the probability with which a match between two nucleotide or
amino acid sequences would occur by chance. For example, a nucleic acid is
considered similar to a reference sequence if the smallest sum probability
in a comparison of the test nucleic acid to the reference nucleic acid is
less than about 0.1, more preferably less than about 0.01, and most
preferably less than about 0.001.
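The statistics behind such probabilities can be illustrated for the simplest case of a single ungapped HSP, using the Karlin-Altschul relation E = K·m·n·e^(-λS) and P = 1 - e^(-E), where m and n are the sequence lengths and S the raw score. The Python sketch below solves for λ numerically from the nucleotide scores M = 5 and N = -4, assuming equal base frequencies (so random bases match with probability 1/4). The constant K is harder to compute and is taken as a caller-supplied input; the value 0.1 in the usage line is a hypothetical placeholder. The sum statistics that BLAST uses to combine several HSPs into P(N) are not covered.

```python
import math


def solve_lambda(match: float = 5.0, mismatch: float = -4.0,
                 p_match: float = 0.25, tol: float = 1e-12) -> float:
    """Solve p*exp(lam*match) + (1-p)*exp(lam*mismatch) = 1 for lam > 0."""
    expected = p_match * match + (1.0 - p_match) * mismatch
    if match <= 0 or expected >= 0:
        raise ValueError("need a positive match score and a negative expected score")

    def f(lam: float) -> float:
        return (p_match * math.exp(lam * match)
                + (1.0 - p_match) * math.exp(lam * mismatch) - 1.0)

    lo, hi = 1e-9, 1.0
    while f(hi) <= 0:        # widen the bracket until f changes sign
        hi *= 2.0
    while hi - lo > tol:     # bisection: f < 0 below the root, f > 0 above it
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expect_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance HSPs with a score of at least `score`."""
    return k * m * n * math.exp(-lam * score)


def p_value(e: float) -> float:
    """Probability of at least one chance HSP: 1 - exp(-E)."""
    return -math.expm1(-e)


lam = solve_lambda()
print(p_value(expect_value(score=60, m=500, n=1_000_000, lam=lam, k=0.1)))
```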
Another indication that two nucleic acid sequences are substantially
identical is that the two molecules hybridize to each other under
stringent conditions. "Bind(s) substantially" refers to complementary
hybridization between a probe nucleic acid and a target nucleic acid and
embraces minor mismatches that can be accommodated by reducing the
stringency of the hybridization media to achieve the desired detection of
the target polynucleotide sequence. The phrase "hybridizing specifically
to" or "specifically hybridizing to", refers to the binding, duplexing, or
hybridizing of a molecule only to a particular nucleotide sequence under
stringent conditions when that sequence is present in a complex mixture
(e.g., total cellular DNA or RNA).
The term "stringent conditions" refers to conditions under which a probe or
primer will hybridize to its target subsequence, but to no other
sequences. Stringent conditions are sequence-dependent and will be
different in different circumstances. Longer sequences hybridize
specifically at higher temperatures. Generally, stringent conditions are
selected to be about 5 °C lower than the thermal melting point
(Tm) for the specific sequence at a defined ionic strength and pH. In
other instances, stringent conditions are chosen to be about 20 °C
or 25 °C below the melting temperature of the sequence and a probe
with exact or nearly exact complementarity to the target. As used herein,
the melting temperature is the temperature at which a population of
double-stranded nucleic acid molecules becomes half-dissociated into
single strands. Methods for calculating the Tm of nucleic acids are
well known in the art (see, e.g., Berger and Kimmel (1987) Methods in
Enzymology, vol. 152: Guide to Molecular Cloning Techniques, San Diego:
Academic Press, Inc. and Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, 2nd ed., vols. 1-3, Cold Spring Harbor Laboratory),
both incorporated herein by reference. As indicated by standard
references, a simple estimate of the Tm value can be calculated by
the equation Tm = 81.5 + 0.41(%G+C) when a nucleic acid is in aqueous
solution at 1 M NaCl (see, e.g., Anderson and Young, "Quantitative Filter
Hybridization," in Nucleic Acid Hybridization (1985)). Other references
include more sophisticated computations which take structural as well as
sequence characteristics into account for the calculation of Tm. The
melting temperature of a hybrid (and thus the conditions for stringent
hybridization) is affected by various factors such as the length and
nature (DNA, RNA, base composition) of the probe or primer and nature of
the target (DNA, RNA, base composition, present in solution or
immobilized, and the like), and the concentration of salts and other
components (e.g., the presence or absence of formamide, dextran sulfate,
polyethylene glycol). The effects of these factors are well known and are
discussed in standard references in the art, see e.g., Sambrook, supra,
and Ausubel, supra. Typically, stringent conditions will be those in which
the salt concentration is less than about 1.0 M Na ion, usually about
0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3, and
the temperature is at least about 30 °C for short probes or
primers (e.g., 10 to 50 nucleotides) and at least about 60 °C for
long probes or primers (e.g., greater than 50 nucleotides). Stringent
conditions can also be achieved with the addition of destabilizing agents
such as formamide.
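The simple Tm estimate quoted above, and a hybridization temperature set a fixed number of degrees below it, can be computed as in the following Python sketch. The function names are hypothetical. Because this formula has no terms for probe length or salt concentration, it is best regarded as a rough estimate for long duplexes in 1 M NaCl. For short oligonucleotides it overestimates Tm; a result above 100 °C signals that the sequence is outside the formula's useful range.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def tm_simple(seq: str) -> float:
    """Simple Tm estimate in deg C at 1 M NaCl: Tm = 81.5 + 0.41 * (%G+C)."""
    return 81.5 + 0.41 * gc_percent(seq)


def stringent_temperature(seq: str, below_tm: float = 5.0) -> float:
    """Hybridization temperature `below_tm` degrees under the estimated Tm
    (about 5 deg C in one embodiment; about 20 or 25 deg C in others)."""
    return tm_simple(seq) - below_tm
```

For a sequence with 50% G+C, `tm_simple` gives 81.5 + 20.5 = 102.0 °C, so `stringent_temperature(seq, 25.0)` returns about 77 °C.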
The term "detectably labeled" means that an agent (e.g., a probe) has been
conjugated with a label that can be detected by physical, chemical,
electromagnetic and other related analytical techniques. Examples of
detectable labels that can be utilized include, but are not limited to,
radioisotopes, fluorophores, chromophores, mass labels, electron dense
particles, magnetic particles, spin labels, molecules that emit
chemiluminescence, electrochemically active molecules, enzymes, cofactors,
and enzyme substrates.
II. Overview
The present invention provides methods for efficiently identifying and
characterizing genes that play important roles in cellular processes such
as aging and development, response to environmental challenges (e.g.,
injury or drug exposure), and pathologic processes. Specifically, the
methods disclosed herein permit the rapid and economical generation of
"libraries" of differentially expressed and low abundance sequences likely
to play roles in pathogenesis and treatment of human disease. Importantly,
the methods of the invention are well suited to use with very small
amounts of tissue. This permits comprehensive libraries to be produced
even when only a small amount of starting material is available.
The methods also include a process in which genes identified as being
present in low abundance and/or as being differentially expressed
("candidate genes") are functionally validated. This validation process
involves determining whether a candidate gene does in fact have a
functional effect in a cell by, for example, determining whether modulation
of expression of the candidate gene is correlated with an alteration in a
cellular activity or cellular state in the cell in which expression is
modulated.
Certain methods are performed using double-stranded RNA interference
(RNAi). In general, such methods involve introducing into a reference cell
or tissue a dsRNA that is substantially identical to at least a segment of
the candidate gene and then determining whether interference with
expression of the candidate gene is associated with an alteration of
cellular activity or state. Detection of such an alteration provides
evidence that the candidate gene is correlated with the particular
cellular state or process under investigation.
However, methods other than RNAi can be utilized to functionally validate
candidate genes identified in the libraries. Such methods include
interference with gene expression by use of antisense technology,
ribozymes and gene knock-out approaches. Additional approaches include
co-immunoprecipitation and epistasis investigations.
III. Preparation Of Libraries
Generally
In one aspect of the invention, cDNA libraries are prepared that are highly
enriched for gene sequences likely to play a role in the molecular and
cellular pathomechanisms of disease, or which are involved in other
important cellular processes. In one embodiment of the invention, four
related, or "cognate," libraries are prepared and selected sequences
analyzed. Although, in some embodiments of the invention, fewer than four
libraries are prepared, by screening multiple (e.g., four) libraries the
coverage of the transcriptome is maximized and the likelihood of
identifying low-abundance and differentially-expressed genes is increased.
Moreover, preparing four libraries facilitates the validation techniques
described infra.
Tissue Sources
The libraries of the invention are prepared using mRNA from pairs of
tissues that are of the same type, but which differ in one major
characteristic, such as disease state (e.g., diseased & normal brain
tissue), age (e.g., adult and fetal liver tissue), exposure to drugs,
state of differentiation, stage of development, or other state (e.g.,
stimulated & unstimulated; activated & unactivated). The tissue source may
be human or non-human. Typically the tissues are from a mammal such as a
human, non-human primate, rat, or mouse. In some embodiments, the tissues
are from an animal or tissue culture model of a human disease, e.g.,
stroke, Alzheimer's disease, and neuropathy. Examples of tissue pairs
useful for library preparation are shown in Table 1.
TABLE 1
Gene-expression state 1            | Gene-expression state 2
-----------------------------------+-----------------------------------
Diseased tissue                    | Normal tissue
  a) hypoxic/ischemic brain        |   a) healthy brain
  b) cirrhotic liver               |   b) healthy liver
  c) tumor                         |   c) normal tissue
  d) Alzheimer's brain             |   d) healthy brain
Drug-exposed tissue                | Non-drug-exposed tissue
  a) kainate-injected brain        |   a) saline-injected brain
  b) Zyprexa®-injected brain       |   b) saline-injected brain
  c) toxin-stimulated cell line    |   c) saline-stimulated cell line
Age/Tissue Type/etc.               | Age/Tissue Type/etc.
  a) mature brain                  |   a) fetal brain
  b) hippocampus                   |   b) cortex
  c) neurons                       |   c) glial cells
Although each of any group of four cognate libraries is prepared using the
same tissue pair, the libraries have different properties as a result of
differences in their construction. For each set of libraries, one tissue
in the pair is designated the "driver tissue," "control tissue," or simply
"control cell" (from which "driver" cDNA may be made) and the second
tissue in the pair is designated the "tester" tissue, "test tissue," or
simply "test cell" (from which "tester" cDNA may be made). For example, for
a pair in the same horizontal row of Table 1, the tissue in the first
column may be considered the tester and the tissue in the second column
may be considered the driver. For purposes of the invention, it is
entirely arbitrary which tissue is "driver" and which is "tester."
For ease of reference, the four cognate libraries are referred to herein
as: (1) driver-normalized, (2) tester-normalized, (3) driver-subtracted,
and (4) tester-subtracted. Libraries (1) and (2) are normalized, and thus
enriched in sequences corresponding to low abundance transcripts. In a
cognate group, Library 1 is made using one tissue of a pair (driver
tissue) and Library 2 is made using the specified tester tissue. Libraries
(3) and (4) are subtracted (or normalized and subtracted) libraries and
thus enriched in sequences that are differentially expressed between pairs
of tissue states. Libraries (3) and (4) of a cognate group are made using
both tissues in the tissue pair.
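The organization of one cognate group can be summarized as a small data structure. The Python sketch below is only a bookkeeping illustration, with a hypothetical `CognateLibrary` record: it records which tissue or tissues each of the four libraries is built from and what it is enriched for, as stated above. It does not model the direction of subtraction, which is not specified in this passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CognateLibrary:
    name: str
    tissues: tuple      # tissue(s) whose mRNA is used to build the library
    treatment: str      # "normalized" or "subtracted" (optionally also normalized)
    enriched_for: str


def cognate_group(driver: str, tester: str) -> list:
    """Describe the four cognate libraries built from one tissue pair."""
    both = (driver, tester)
    return [
        CognateLibrary("driver-normalized", (driver,), "normalized",
                       "low-abundance transcripts"),
        CognateLibrary("tester-normalized", (tester,), "normalized",
                       "low-abundance transcripts"),
        CognateLibrary("driver-subtracted", both, "subtracted",
                       "differentially expressed sequences"),
        CognateLibrary("tester-subtracted", both, "subtracted",
                       "differentially expressed sequences"),
    ]


for lib in cognate_group(driver="healthy brain", tester="hypoxic/ischemic brain"):
    print(lib.name, lib.tissues, lib.treatment)
```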
Several methods are known for making normalized and/or subtracted cDNA
libraries. Although certain methods are described or referred to in
Sections II(B)-(E), infra, the invention is not limited to embodiments in
which these methods are used. For example, the analytical methods
described in Section III may be used in combination with a variety of
normalization/subtraction approaches.
Preparation of Double-Stranded cDNA From Paired Tissue Samples
Double-stranded cDNA (dscDNA) is prepared from tissues using standard
protocols, i.e., by reverse transcription of messenger (poly(A)+) RNA
from a specified RNA source using a primer to produce single-stranded
cDNA. Methods for isolation of total or poly(A) RNA and for making cDNA
libraries are well known in the art, and are described in detail in
Ausubel and Sambrook (supra). In one embodiment, the library is made using
oligo(dT) primers for first strand synthesis. The single-stranded cDNA is
converted into double-stranded cDNA (dscDNA) using routine methods (see,
e.g., Ausubel supra).
Restriction Enzyme Digestion
In some embodiments of the invention, the dscDNA from each tissue source is
digested with one restriction enzyme or, in an alternative embodiment, the
dscDNA from each tissue source is separately digested with two or more
restriction enzymes, with different specificities, that cut at recognition
sequences found frequently in the dscDNA. Often, two enzymes are used (and
the discussion and examples below will refer to use of two enzymes). As
noted, the digestion with each of the two or more enzymes is carried out
separately (e.g., in separate reaction tubes). The digested fragments may
be combined later for further processing.
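Separate digestion with two enzymes can be simulated in silico to estimate fragment sizes before any wet-lab work. The Python sketch below assumes hypothetical example sites GATC (recognized by MboI) and CATG (recognized by NlaIII); the patent does not specify these enzymes here. As a simplification, it cuts at the 5' end of each recognition site and ignores the actual cut position and overhangs.

```python
import re


def digest(seq: str, site: str) -> list:
    """Fragment lengths after cutting `seq` at the 5' end of every `site`."""
    if not site:
        raise ValueError("recognition site must be non-empty")
    seq, site = seq.upper(), site.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def digest_separately(seq: str, sites: list) -> dict:
    """Digest the same dscDNA with each site in a separate 'reaction'."""
    return {site: digest(seq, site) for site in sites}


def mean_fragment_length(lengths: list) -> float:
    return sum(lengths) / len(lengths)


def expected_mean_fragment(site: str) -> int:
    """Expected spacing of a site in random sequence with equal base frequencies."""
    return 4 ** len(site)


print(digest_separately("AAGATCAACATGAA", ["GATC", "CATG"]))
# {'GATC': [2, 12], 'CATG': [8, 6]}
```

In random sequence with equal base frequencies, a 4-bp site is expected about every 4^4 = 256 bp, within the 100-500 bp range discussed in this section, whereas a 6-bp site (about 4096 bp) is not. Real cDNA composition deviates from this model, so the simulated means are only a guide.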
The dual digestion steps allow for the efficient generation of libraries
that are more comprehensive (e.g., containing more distinct species of
expressed or differentially expressed sequences) than libraries made by
other methods. The digestion is intended, in part, to generate fragments
in a size range that allows efficient hybridization during the annealing
steps of library construction. Only fragments of the target size range
will efficiently anneal under the conditions used, and non-annealing
molecules are excluded from amplification or cloning in some embodiments
of the invention. A further advantage of the dual digestion steps is that
by digesting with multiple (e.g., 2) enzymes with different specificities
as taught herein, the resulting libraries are more comprehensive.
According to the invention, the restriction enzymes are selected to
produce a calculated (or "predicted") average fragment size of between
about 100 and about 500 base pairs, preferably about 300-500 base pairs.
In addition, the two or more different enzymes should produce fragments
of similar lengths (e.g., so that each has a calculated average fragment
size within about 150 bases, more often about 100 bases, of the calculated
average fragment size of the other). Because PCR is generally more
efficient for shorter fragments, the use of fragments of similar length
also ensures non-biased PCR amplification between fragments resulting from
di