Collection of repeat proteins comprising repeat modules. Number: 7,417,130, from the United States Patent and Trademark Office (PTO)

Title: Collection of repeat proteins comprising repeat modules

Abstract: The present invention relates to collections of repeat proteins comprising repeat modules which are derived from one or more repeat units of a family of naturally occurring repeat proteins, to collections of nucleic acid molecules encoding said repeat proteins, to methods for the construction and application of such collections and to individual members of such collections.

Patent Number: 7,417,130 Issued on 08/26/2008 to Stumpp et al.


Inventors: Stumpp; Michael Tobias (Zurich, CH), Forrer; Patrick (Zurich, CH), Binz; Hans Kaspar (Zurich, CH), Pluckthun; Andreas (Zurich, CH)
Assignee: University of Zurich (Zurich, CH)
Appl. No.: 10/363,552
Filed: September 10, 2001
PCT Filed: September 10, 2001
PCT No.: PCT/EP01/10454
371(c)(1),(2),(4) Date: July 06, 2003
PCT Pub. No.: WO02/20565
PCT Pub. Date: March 14, 2002


Foreign Application Priority Data

Sep 08, 2000 [EP] 00119670

Current U.S. Class: 536/23.1 ; 435/91.1
Current International Class: C07H 21/04 (20060101); C07H 21/02 (20060101); C12N 15/11 (20060101)


References Cited [Referenced By]

Foreign Patent Documents
WO96/06166 Feb., 1996 WO
WO98/06845 Feb., 1998 WO
WO/00/34784 Jun., 2000 WO
WO01/61005 Aug., 2001 WO
WO01/85909 Nov., 2001 WO

Other References

Bork, Proteins: Structure, Function, and Genetics 17: 363 (1993). cited by examiner .
Kajava, J. Mol. Biol. 277: 519-527 (1998). cited by other .
Kobe et al., Nature 374(9): 183-186 (1995). cited by other .
Lehmann et al., Protein Eng. 13(1): 49-57 (2000) (Abstract PMID 10679530). cited by other .
Lehmann et al., Protein Eng. 15(5): 403-411 (2002) (Abstract PMID 12034860). cited by other .
Nygren et al., Current Opinion in Structural Biology 7: 463-469 (1997). cited by other .
Sedgwick et al., TIBS 24(8): 311-316 (1999). cited by other .
Zhang et al., J. Mol. Biol. 299: 1121-1132 (2000). cited by other .
Russ, William P. et al., "Knowledge-based potential functions in protein design", Current Opinion in Structural Biology, 2002, 12:447-452, Elsevier Science, New York, NY, USA. cited by other .
Forrer, Patrik et al., "A novel strategy to design binding molecules harnessing the modular nature of repeat proteins", FEBS Letters 539 (2003) 2-6, Elsevier Science B.V., New York, NY, USA. cited by other .
Groves, Matthew R. et al., "Topological characteristics of helical repeat proteins", Current Opinion in Structural Biology 1999, 9:383-389, Elsevier Science Ltd., New York, NY, USA. cited by other .
Kobe, Bostjan et al., "The leucine-rich repeat: a versatile binding motif", TIBS 19, Oct. 1994, p. 415, abstract, Elsevier Science, New York, NY, USA. cited by other.

Primary Examiner: Martinell; James
Attorney, Agent or Firm: Arent Fox

Claims



The invention claimed is:

1. A collection of nucleic acid molecules encoding a collection of repeat proteins, each repeat protein comprising a repeat domain, which comprises a set of consecutive repeat modules, wherein each of said repeat modules is derived from one or more repeat units, wherein said repeat units are ankyrin repeats and wherein said repeat units comprise framework residues, which contribute to the folding topology of said repeat unit or contribute to an interaction with a neighboring repeat unit, and target interaction residues, which contribute to an interaction with a target substance, wherein said repeat proteins differ from other repeat proteins in said collection in at least one amino acid position of the repeat modules, and wherein said derivation of each of said repeat modules is carried out by an analysis comprising the steps of (a) identifying said repeat units; (b) determining a repeat sequence motif by structural and sequence analysis of said repeat units, wherein said structural and sequence analysis includes the identification of said framework residues and said target interaction residues of said repeat units; and (c) constructing the repeat module so that it comprises the repeat sequence motif of (b).

2. The collection of claim 1, wherein each of said repeat modules has an amino acid sequence, wherein at least 70% of the amino acid residues are either (i) consensus amino acid residues deduced from the amino acid residues found at the corresponding positions on alignment of at least two repeat units; or (ii) the amino acid residues found at the corresponding positions in a repeat unit.

3. The collection of claim 1, wherein said set consists of between two and about 30 repeat modules.

4. The collection of claim 1 or 3, wherein said repeat modules are directly connected without an intervening amino acid sequence.

5. The collection of claim 1 or 3, wherein said repeat modules are connected by a peptide or polypeptide linker.

6. The collection of claim 1 or 3, wherein said repeat domain further comprises an N- and/or a C-terminal capping module having an amino acid sequence different from any one of said repeat modules.

7. The collection of claim 1, wherein said set consists of one type of repeat modules of the same length of said module and consisting of the same number and composition of the fixed amino acid positions.

8. The collection of claim 1, wherein said set consists of two different types of repeat modules, wherein each said type of repeat module is of the same length and consists of the same number and composition of the fixed amino acid positions, and wherein said two different types differ in at least the length or the number or the composition of the fixed amino acid positions.

9. The collection of claim 1, wherein said set comprises in said repeat domain a pair of two different types of consecutive repeat modules wherein each said type of repeat module is of the same length and consists of the same number and composition of the fixed amino acid positions, wherein said two different types differ in at least the length or the number or the composition of the fixed amino acid positions.

10. The collection of claim 1, wherein each of said repeat modules comprises the ankyrin repeat sequence motif D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1, (SEQ ID NO: 4) wherein 1 represents an amino acid residue selected from the group: A, D, E, F, H, I, K, L, M, N, Q, R, S, T, V, W and Y; wherein 2 represents an amino acid residue selected from the group: H, N and Y.

11. The collection of claim 10, wherein the collection comprises nucleic acids generated by exchange of nucleotides such that one or more of the amino acid residues in said consensus sequences are replaced by an amino acid residue found at the corresponding position on alignment of a repeat unit.

12. The collection of claim 1, 3 or 10, wherein said nucleic acid molecules comprise identical nucleic acid sequences of at least 9 nucleotides between said repeat modules.

13. The collection of claim 12, wherein said identical nucleic acid sequences allow assembly of said nucleic acid molecules by PCR technology.

14. The collection of claim 1, 3 or 10, wherein there are nucleic acid sequences between said modules, and each of said nucleic acid sequences between said modules comprises a restriction enzyme recognition sequence.

15. The collection of claim 1, 3 or 10, wherein there are nucleic acid sequences between said repeat modules and each of said nucleic acid sequences between said repeat modules comprises a nucleic acid sequence formed from cohesive ends created by two restriction enzymes.

16. A collection of recombinant nucleic acid molecules comprising a collection of nucleic acid molecules according to claim 1, 3 or 10.

Description



This application is a 371 of PCT/EP01/10454 filed on Sep. 10, 2001, which is hereby incorporated by reference.

The present invention relates to collections of repeat proteins comprising repeat modules which are derived from one or more repeat units of a family of naturally occurring repeat proteins, to collections of nucleic acid molecules encoding said repeat proteins, to methods for the construction and application of such collections and to individual members of such collections.

A number of documents are cited throughout this specification. The disclosure content of these documents is herewith incorporated by reference.

Protein-protein interactions, or more generally, protein-ligand interactions, play an important role in all organisms, and the understanding of the key features of recognition and binding is one focus of current biochemical research. Up to now, antibodies and the derivatives that have been elaborated from them have mainly been used in this field of research. However, antibody technology is afflicted with well-known disadvantages. For instance, antibodies can hardly be applied intracellularly due to the reductive environment in the cytoplasm. Thus, there exists a need for high affinity binding molecules with characteristics that overcome the restrictions of antibodies. Such molecules will most probably provide new solutions in medicine, biotechnology, and research, where intracellular binders will also become increasingly important in genomics.

Various efforts to construct novel binding proteins have been reported (reviewed in Nygren and Uhlen, 1997). The most promising strategy seemed to be a combination of limited library generation and screening or selection for the desired properties. Usually, existing scaffolds were recruited to randomise some exposed amino acid residues after analysis of the crystal structure. However, despite progress in terms of stability and expressibility, the affinities reported so far are considerably lower than the ones of antibodies (Ku and Schultz, 1995). A constraint might be the limitation to targets for which the crystal structure is known (Kirkham et al., 1999) or which are homologous to the original target molecule, so that no universal scaffold for binding has been identified so far. To increase the apparent affinity of binders after screening, several approaches have used multimerisation of single binders to take advantage of avidity effects.

Thus, the technical problem underlying the present invention is to identify novel approaches for the construction of collections of binding proteins.

The solution to this technical problem is achieved by providing the embodiments characterised in the claims. Accordingly, the present invention allows constructing collections of repeat proteins comprising repeat modules. The technical approach of the present invention, i.e. to derive said modules from the repeat units of naturally occurring repeat proteins, is neither provided nor suggested by the prior art.

Thus, the present invention relates to collections of nucleic acid molecules encoding collections of repeat proteins, each repeat protein comprising a repeat domain, which comprises a set of consecutive repeat modules, wherein each of said repeat modules is derived from one or more repeat units of one family of naturally occurring repeat proteins, wherein said repeat units comprise framework residues and target interaction residues, wherein said repeat proteins differ in at least one position corresponding to one of said target interaction residues.

In the context of the present invention, the term "collection" refers to a population comprising at least two different entities or members. Preferably, such a collection comprises at least 10⁵, more preferably more than 10⁷, and most preferably more than 10⁹ different members. A "collection" may as well be referred to as a "library" or a "plurality".

The term "nucleic acid molecule" refers to a polynucleotide molecule, which is a ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) molecule, either single stranded or double stranded. A nucleic acid molecule may either be present in isolated form, or be comprised in recombinant nucleic acid molecules or vectors.

The term "repeat proteins" refers to a (poly)peptide/protein comprising one or more repeat domains (FIG. 1). Preferably, each of said repeat proteins comprises up to four repeat domains. More preferably, each of said repeat proteins comprises up to two repeat domains. However, most preferably, each of the repeat proteins comprises one repeat domain. Furthermore, said repeat protein may comprise additional non-repeat protein domains (FIGS. 2a and 2b), (poly)peptide tags and/or (poly)peptide linker sequences (FIG. 1). The term "(poly)peptide tag" refers to an amino acid sequence attached to a (poly)peptide/protein, where said amino acid sequence is usable for the purification, detection, or targeting of said (poly)peptide/protein, or where said amino acid sequence improves the physicochemical behavior of said (poly)peptide/protein, or where said amino acid sequence possesses an effector function. Such (poly)peptide tags may be small polypeptide sequences, for example, Hisₙ (Hochuli et al., 1988; Lindner et al., 1992), myc, FLAG (Hopp et al., 1988; Knappik and Pluckthun, 1994), or Strep-tag (Schmidt and Skerra, 1993; Schmidt and Skerra, 1994; Schmidt et al., 1996). These (poly)peptide tags are all well known in the art and are fully available to the person skilled in the art. Additional non-repeat domains may be further moieties such as enzymes (for example enzymes like alkaline phosphatase), which allow the detection of said repeat proteins, or moieties which can be used for targeting (such as immunoglobulins or fragments thereof) and/or as effector molecules. The individual (poly)peptide tags, moieties and/or domains of a repeat protein may be connected to each other directly or via (poly)peptide linkers. The term "(poly)peptide linker" refers to an amino acid sequence which is able to link, for example, two protein domains, a (poly)peptide tag and a protein domain, or two sequence tags. Such linkers, for example glycine-serine-linkers of variable lengths (e.g. Forrer and Jaussi, 1998), are known to the person skilled in the relevant art.

In the context of the present invention, the term "(poly)peptide" relates to a molecule consisting of one or more chains of multiple, i.e. two or more, amino acids linked via peptide bonds.

The term "protein" refers to a (poly)peptide, where at least part of the (poly)peptide has, or is able to acquire, a defined three-dimensional arrangement by forming secondary, tertiary, or quaternary structures within and/or between its (poly)peptide chain(s). If a protein comprises two or more (poly)peptides, the individual (poly)peptide chains may be linked non-covalently or covalently, e.g. by a disulfide bond between two (poly)peptides. A part of a protein, which individually has, or is able to acquire, a defined three-dimensional arrangement by forming secondary or tertiary structures is termed "protein domain". Such protein domains are well known to the practitioner skilled in the relevant art.

The term "family of naturally occurring repeat proteins" refers to a group of naturally occurring repeat proteins, where the members of said group comprise similar repeat units. Protein families are well known to the person skilled in the art.

The term "repeat domain" refers to a protein domain comprising two or more consecutive repeat units (modules) as structural units (FIG. 1), wherein said structural units have the same fold, and stack tightly to create a superhelical structure having a joint hydrophobic core (for a review see Kobe and Kajava, 2000). The term "structural unit" refers to a locally ordered part of a (poly)peptide, formed by three-dimensional interactions between two or more segments of secondary structure that are near one another along the (poly)peptide chain. Such a structural unit comprises a structural motif. The term "structural motif" refers to a three-dimensional arrangement of secondary structure elements present in at least one structural unit. For example, the structural motif repetitively present in LRR proteins consists of a β-strand and an opposing antiparallel helical segment connected by a loop (FIG. 4a). Structural motifs are well known to the person skilled in the relevant art. Said structural units are alone not able to acquire a defined three-dimensional arrangement; however, their consecutive arrangement as repeat modules in a repeat domain leads to a mutual stabilization of neighbouring units resulting in said superhelical structure.

The term "repeat modules" refers to the repeated amino acid sequences of the repeat proteins encoded by the nucleic acid molecules of the collection of the present invention, which are derived from the repeat units (FIG. 3) of naturally occurring proteins. Each repeat module comprised in a repeat domain is derived from one or more repeat units of one family of naturally occurring repeat proteins.

Such "repeat modules" may comprise positions with amino acid residues present in all copies of the repeat module ("fixed positions") and positions with differing or "randomised" amino acid residues ("randomised positions").

The term "set of repeat modules" refers to the total number of repeat modules present in a repeat domain. Such "set of repeat modules" present in a repeat domain comprises two or more consecutive repeat modules, and may comprise just one type of repeat module in two or more copies, or two or more different types of modules, each present in one or more copies. The collection of repeat proteins according to the present invention may comprise repeat domains with identical number of repeat modules per corresponding repeat domain (i.e. one set with a fixed number of repeat modules), or may comprise repeat domains, which differ in the number of repeat modules per corresponding repeat domain (i.e. two or more sets with different numbers of repeat modules).

Preferably, the repeat modules comprised in a set are homologous repeat modules. In the context of the present invention, the term "homologous repeat modules" refers to repeat modules, wherein more than 70% of the framework residues of said repeat modules are homologous. Preferably, more than 80% of the framework residues of said repeat modules are homologous. Most preferably, more than 90% of the framework residues of said repeat modules are homologous. Computer programs to determine the percentage of homology between polypeptides, such as Fasta, Blast or Gap, are known to the person skilled in the relevant art.

Preferably, a repeat module of the present invention is derived from one repeat unit. This may refer to a situation where a collection of nucleic acid molecules, each molecule encoding a repeat domain of the invention, is obtained by random mutagenesis of a nucleic acid molecule encoding a naturally occurring repeat domain. Thus, said repeat domain of the present invention comprises a set of repeat modules, wherein each of said modules is derived from the corresponding repeat unit of said naturally occurring repeat domain. Methods for random mutagenesis of nucleic acid molecules such as error-prone PCR (Wilson and Keefe, 2000) or DNA shuffling (Volkov and Arnold, 2000) are well known to the person skilled in the relevant art. In another situation, a single naturally occurring repeat unit may be used to derive a repeat sequence motif of the present invention.

More preferably, a repeat module of the present invention is derived from one or more repeat units. This may refer to a situation where two or more homologous nucleic acid molecules, each encoding a naturally occurring repeat domain, are subjected to DNA recombination or random chimeragenesis (Volkov and Arnold, 2000). Thus, said repeat domain of the present invention comprises a set of repeat modules, wherein each of said modules is derived from one or more corresponding repeat units of said homologous naturally occurring repeat domains. Preferably, said homologous nucleic acid molecules possess a DNA sequence identity of at least 75%. More preferably said sequence identity is at least 85%.

Most preferably, a repeat module of the present invention is derived from two or more repeat units, where two or more homologous repeat units are used to derive a repeat sequence motif of the present invention. Descriptions of such a derivation process are presented in the examples.

The term "a repeat module derived from one or more repeat units" refers to (i) a process comprising the analysis of one or more repeat units of naturally occurring repeat proteins and the deduction of a repeat module. This process may comprise the steps of: (a) identifying naturally occurring repeat units; (b) determining an initial repeat sequence motif by sequence alignments; (c) refining the repeat sequence motif by sequence analysis and structural analysis of said repeat units; (d) constructing a repeat module according to the repeat sequence motif of (c); or (ii) a process comprising the process of (i) followed by further evolution of the repeat module by random mutagenesis or random chimeragenesis.
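Step (b) above, determining an initial repeat sequence motif by sequence alignment, can be illustrated with a minimal sketch. The code below is not the procedure of the examples; it assumes the repeat units are already aligned to equal length, keeps a residue as a fixed position when it reaches an arbitrarily chosen frequency threshold (`min_fraction`, an assumption), and marks every other position with a placeholder. The four input sequences are invented toy data, not sequences of naturally occurring proteins, and the structural refinement of step (c) is not modelled.

```python
from collections import Counter


def consensus_motif(aligned_units, min_fraction=0.75, placeholder="x"):
    """Deduce a simple consensus motif from pre-aligned, equal-length repeat units.

    A column becomes a fixed position if its most common residue occurs in at
    least `min_fraction` of the units; otherwise it is marked with `placeholder`.
    The threshold is an illustrative assumption, not a value from the patent.
    """
    if not aligned_units:
        raise ValueError("at least one repeat unit is required")
    if len({len(unit) for unit in aligned_units}) != 1:
        raise ValueError("repeat units must be aligned to the same length")

    motif = []
    for column in zip(*aligned_units):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            motif.append(residue)      # conserved: candidate fixed position
        else:
            motif.append(placeholder)  # variable: candidate randomised position
    return "".join(motif)


# Invented toy alignment (hypothetical sequences, for illustration only).
toy_units = [
    "DKDGNTPLHLAA",
    "DENGRTPLHLAA",
    "DSYGSTALHLAA",
    "NADGWTPLHIAA",
]

if __name__ == "__main__":
    print(consensus_motif(toy_units))
```

Raising or lowering `min_fraction` changes how many positions end up fixed, which is one reason a real derivation would follow this step with the sequence and structural analysis described in step (c).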

The term "repeat unit" refers to amino acid sequences comprising sequence motifs of one or more naturally occurring proteins, wherein said "repeat units" are found in multiple copies, and which exhibit a defined folding topology common to all said motifs determining the fold of the protein. Such repeat units comprise framework residues (FIG. 4d) and interaction residues (FIG. 4e). Examples of such repeat units include leucine-rich repeat units, ankyrin repeat units, armadillo repeat units, tetratricopeptide repeat units, HEAT repeat units, and leucine-rich variant repeat units (reviewed in Kobe & Deisenhofer, 1994; Groves & Barford, 1999; Marino et al., 2000; Kobe, 1996). Naturally occurring proteins containing two or more such repeat units are referred to as "naturally occurring repeat proteins". The amino acid sequences of the individual repeat units of a repeat protein may have a significant number of mutations, substitutions, additions and/or deletions when compared to each other, while still substantially retaining the general pattern, or motif, of the repeat units.

Preferably, the repeat units used for the deduction of a repeat sequence motif are homologous repeat units, wherein the repeat units comprise the same structural motif and wherein more than 70% of the framework residues of said repeat units are homologous. Preferably, more than 80% of the framework residues of said repeat units are homologous. Most preferably, more than 90% of the framework residues of said repeat units are homologous.

The term "repeat sequence motif" refers to an amino acid sequence, which is deduced from one or more repeat units. Such repeat sequence motifs comprise framework residue positions and target interaction residue positions. Said framework residue positions correspond to the positions of framework residues of said repeat units. Said target interaction residue positions correspond to the positions of target interaction residues of said repeat units. Such repeat sequence motifs comprise fixed positions and randomized positions. The term "fixed position" refers to an amino acid position in a repeat sequence motif, wherein said position is set to a particular amino acid. Most often, such fixed positions correspond to the positions of framework residues. The term "randomized position" refers to an amino acid position in a repeat sequence motif, wherein two or more amino acids are allowed at said amino acid position. Most often, such randomized positions correspond to the positions of target interaction residues. However, some positions of framework residues may also be randomized. Amino acid sequence motifs are well known to the practitioner in the relevant art.
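The distinction between fixed and randomized positions can be made concrete with the ankyrin repeat sequence motif recited in claim 10 (SEQ ID NO: 4). The following sketch is only an illustration: the motif string and the residue sets for the symbols 1 and 2 are taken from claim 10, while the function names and the sampling routine are assumptions introduced here. The computed diversity is a count of possible amino acid sequences per module, not a statement about the size of any physical collection.

```python
import random

# Motif and allowed residues as recited in claim 10 (SEQ ID NO: 4).
MOTIF = "D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1"
ALLOWED = {
    "1": "ADEFHIKLMNQRSTVWY",  # 17 residues allowed at positions marked 1
    "2": "HNY",                # 3 residues allowed at positions marked 2
}


def randomized_positions(motif):
    """Return the 1-based positions of the motif that are randomized."""
    return [i for i, symbol in enumerate(motif, start=1) if symbol in ALLOWED]


def module_diversity(motif):
    """Count the distinct amino acid sequences one repeat module can take."""
    total = 1
    for symbol in motif:
        if symbol in ALLOWED:
            total *= len(ALLOWED[symbol])
    return total


def sample_module(motif, rng):
    """Draw one repeat module: fixed positions are copied, others are sampled."""
    return "".join(
        rng.choice(ALLOWED[symbol]) if symbol in ALLOWED else symbol
        for symbol in motif
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the illustration repeatable
    print("motif length:", len(MOTIF))
    print("randomized positions:", randomized_positions(MOTIF))
    print("sequences per module:", module_diversity(MOTIF))
    print("sequences for two modules:", module_diversity(MOTIF) ** 2)
    print("example module:", sample_module(MOTIF, rng))
```

Because the modules of a repeat domain are randomized independently in this sketch, the count for a domain of n modules is the per-module count raised to the power n.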

The term "folding topology" refers to the tertiary structure of said repeat units. The folding topology will be determined by stretches of amino acids forming at least parts of α-helices or β-sheets, or amino acid stretches forming linear polypeptides or loops, or any combination of α-helices, β-sheets and/or linear polypeptides/loops.

The term "consecutive" refers to an arrangement, wherein said modules are arranged in tandem.

In repeat proteins, there are at least 2, usually about 2 to 6, more usually at least about 6, frequently 20 or more repeat units. For the most part, the repeat proteins are structural proteins and/or adhesive proteins, being present in prokaryotes and eukaryotes, including vertebrates and non-vertebrates. An analogy of ankyrin proteins to antibodies has been suggested (Jacobs and Harrison, 1998).

In most cases, said repeat units will exhibit a high degree of sequence identity (same amino acid residues at corresponding positions) or sequence similarity (amino acid residues being different, but having similar physicochemical properties), and some of the amino acid residues might be key residues being strongly conserved in the different repeat units found in naturally occurring proteins.
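The two measures named above, sequence identity and sequence similarity, can be sketched for a pair of aligned repeat units as follows. This is a simplified illustration, not the behaviour of programs such as Fasta, Blast or Gap: the grouping of amino acids by physicochemical properties in `GROUPS` is an assumption chosen for the example, and the two input sequences are invented toy data.

```python
# Assumed physicochemical grouping (illustrative only; other groupings exist).
GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("ST"),     # small hydroxyl
    set("NQ"),     # amide
    set("DE"),     # acidic
    set("KRH"),    # basic
    set("G"),
    set("P"),
    set("C"),
]


def same_group(a, b):
    """True if two residues fall into the same assumed physicochemical group."""
    return any(a in group and b in group for group in GROUPS)


def identity_and_similarity(unit_a, unit_b):
    """Return (identity, similarity) fractions for two aligned repeat units.

    identity:   same residue at corresponding positions
    similarity: different residues that share an assumed physicochemical group
    """
    if len(unit_a) != len(unit_b) or not unit_a:
        raise ValueError("units must be non-empty and aligned to equal length")
    identical = sum(a == b for a, b in zip(unit_a, unit_b))
    similar = sum(a != b and same_group(a, b) for a, b in zip(unit_a, unit_b))
    n = len(unit_a)
    return identical / n, similar / n


if __name__ == "__main__":
    # Invented toy repeat units (hypothetical, for illustration only).
    ident, simil = identity_and_similarity("DKDGNTPLHLAA", "DENGRTPLHIAA")
    print(f"identity: {ident:.2f}, similarity: {simil:.2f}")
```

Restricting the comparison to the framework residue positions of the two units would give a rough stand-in for the percentage thresholds on framework residues mentioned elsewhere in this description, under the same assumptions.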

However, a high degree of sequence variability by amino acid insertions and/or deletions, and/or substitutions between the different repeat units found in naturally occurring proteins will be possible as long as the common folding topology is maintained.

Methods for directly determining the folding topology of repeat proteins by physicochemical means such as X-ray crystallography, NMR or CD spectroscopy, are well known to the practitioner skilled in the relevant art. Methods for identifying and determining repeat units or repeat sequence motifs, or for identifying families of related proteins comprising such repeat units or motifs, such as homology searches (BLAST etc.) are well established in the field of bioinformatics, and are well known to the practitioner in such art. The step of refining an initial repeat sequence motif may comprise an iterative process.

Crystal structures have been reported for ankyrin-type repeats (Bork, 1993; Huxford et al., 1998, see FIGS. 2g and 2h), the ribonuclease inhibitor (RI) of the leucine-rich repeat (LRR) superfamily (Kobe and Deisenhofer, 1993, see FIG. 2c) and other LRR proteins (see FIG. 2d to 2f). Inspection of these structures revealed an elongated shape in the case of the ankyrin repeats, or a horseshoe shape in the case of the leucine-rich repeats giving rise to an extraordinarily large surface.

The term "framework residues" relates to amino acid residues of the repeat units, or the corresponding amino acid residues of the repeat modules, which contribute to the folding topology, i.e. which contribute to the fold of said repeat unit (or module) or which contribute to the interaction with a neighboring unit (or module). Such contribution might be the interaction with other residues in the repeat unit (module) (FIG. 4d), or the influence on the polypeptide backbone conformation as found in α-helices or β-sheets, or amino acid stretches forming linear polypeptides or loops. The term "target interaction residues" refers to amino acid residues of the repeat units, or the corresponding amino acid residues of the repeat modules, which contribute to the interaction with target substances. Such contribution might be the direct interaction with the target substances (FIG. 4e), or the influence on other directly interacting residues, e.g. by stabilising the conformation of the (poly)peptide of said repeat unit (module) to allow or enhance the interaction of said directly interacting residues with said target. Such framework and target interaction residues may be identified by analysis of the structural data obtained by the physicochemical methods referred to above, or by comparison with known and related structural information well known to practitioners in structural biology and/or bioinformatics.

The term "interaction with said target substances" may be, without being limited to, binding to a target, involvement in a conformational change or a chemical reaction of said target, or activation of said target.

A "target" may be an individual molecule such as a nucleic acid molecule, a (poly)peptide/protein, a carbohydrate, or any other naturally occurring molecule, including any part of such individual molecule, or complexes of two or more of such molecules. The target may be a whole cell or a tissue sample, or it may be any non-naturally occurring molecule or moiety.

The term "differ in at least one position" refers to a collection of repeat proteins, which have at least one position where more than one amino acid may be found. Preferably, such positions are randomised. The term "randomised" refers to positions of the repeat modules, which are variable within a collection and are occupied by more than one amino acid residue in the collection. Preferably, the randomised positions vary additionally between repeat modules within one repeat domain. Preferably, such positions may be fully randomised, i.e. being occupied by the full set of naturally occurring, proteinogenic amino acid residues. More preferably, such positions may be partially randomised, i.e. being occupied by a subset of the full set of naturally occurring amino acid residues. Subsets of amino acid residues may be sets of amino acid residues with common physicochemical properties, such as sets of hydrophobic, hydrophilic, acidic, basic, aromatic, or aliphatic amino acids, subsets comprising all except for certain non-desired amino acid residues, such as sets not comprising cysteines or prolines, or subsets comprising all amino acid residues found at the corresponding position in naturally occurring repeat proteins. The randomisation may be applied to some, preferably to all of the target interaction residues. Methods for making "randomised" repeat proteins such as by using oligonucleotide-directed mutagenesis of the nucleic acid sequences encoding said repeat proteins (e.g. by using mixtures of mononucleotides or trinucleotides (Virnekas et al., 1994)), or by using error-prone PCR during synthesis of said nucleic acid sequences, are well known to the practitioner skilled in the art.

In a preferred embodiment, each of said repeat modules has an amino acid sequence, wherein at least 70% of the amino acid residues correspond either (i) to consensus amino acid residues deduced from the amino acid residues found at the corresponding positions of at least two naturally occurring repeat units; or (ii) to the amino acid residues found at the corresponding positions in a naturally occurring repeat unit.

A "consensus amino acid residue" may be found by aligning two or more repeat units based on structural and/or sequence homology determined as described above, and by identifying one of the most frequent amino acid residues for each position in said units (an example is shown in FIGS. 5a and 5b). Said two or more repeat units may be taken from the repeat units comprised in a single repeat protein, or from two or more repeat proteins. If two or more amino acid residues are found with a similar probability in said two or more repeat units, the consensus amino acid may be one of the most frequently found amino acids or a combination of said two or more amino acid residues.
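As a rough illustration of how a consensus may be deduced and how the "at least 70%" criterion could be checked, the following Python sketch takes the most frequent residue per alignment column and then scores a candidate module against it. The aligned fragments are hypothetical, and reporting ties as a set of residues is one possible reading of "a combination of said two or more amino acid residues".

```python
from collections import Counter


def consensus(units):
    """Most frequent residue(s) per column of equal-length aligned units.

    Each position is returned as a sorted list; tied residues are all kept.
    """
    if len({len(u) for u in units}) != 1:
        raise ValueError("units must be aligned to the same length")
    result = []
    for column in zip(*units):
        counts = Counter(column)
        top = max(counts.values())
        result.append(sorted(r for r, c in counts.items() if c == top))
    return result


def fraction_matching(module, cons):
    """Fraction of module positions occupied by a consensus residue."""
    hits = sum(res in allowed for res, allowed in zip(module, cons))
    return hits / len(module)


# Hypothetical aligned 8-residue fragments (not taken from any real protein).
units = ["DGRTPLHL", "NGKTALHL", "DGRTPLHY", "DGKSPLHL"]
cons = consensus(units)
candidate = "DGRTALHL"
match = fraction_matching(candidate, cons)
print(["/".join(c) for c in cons])
print(f"{match:.1%} of positions match the consensus")
```

Here the third column is a tie between K and R, and the candidate module matches the consensus at seven of eight positions, which would satisfy a 70% threshold.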

Further preferred is a collection, wherein said set consists of between two and about 30 repeat modules.

More preferably, said set consists of between 6 and about 15 repeat modules.

In a yet further preferred embodiment of the present invention, said repeat modules are directly connected.

In the context of the present invention, the term "directly connected" refers to repeat modules, which are arranged as direct repeats in a repeat protein without an intervening amino acid sequence.

In a still further preferred embodiment, said repeat modules are connected by a (poly)peptide linker.

Thus, the repeat modules may be linked indirectly via a (poly)peptide linker as intervening sequence separating the individual modules. An "intervening sequence" may be any amino acid sequence which allows the individual modules to be connected without interfering with the folding topology or the stacking of the modules. Preferably, said intervening sequences are short (poly)peptide linkers of less than 10, and even more preferably, of less than 5 amino acid residues.

In a still further preferred embodiment of the collection of the present invention, each of said repeat proteins further comprises an N- and/or a C-terminal capping module (FIG. 1) having an amino acid sequence different from any one of said repeat modules.

The term "capping module" refers to a polypeptide fused to the N- or C-terminal repeat module of a repeat domain, wherein said capping module forms tight tertiary interactions with said repeat module thereby providing a cap that shields the hydrophobic core of said repeat module at the side not in contact with the consecutive repeat module from the solvent (FIG. 1).

Said N- and/or C-terminal capping module may be, or may be derived from, a capping unit (FIG. 3) or other domain found in a naturally occurring repeat protein adjacent to a repeat unit. The term "capping unit" refers to a naturally occurring folded (poly)peptide, wherein said (poly)peptide defines a particular structural unit which is N- or C-terminally fused to a repeat unit, wherein said (poly)peptide forms tight tertiary interactions with said repeat unit thereby providing a cap that shields the hydrophobic core of said repeat unit at one side from the solvent. Such capping units may have sequence similarities to said repeat sequence motif.

In a preferred embodiment, the present invention relates to a collection of nucleic acid molecules, wherein said repeat units are ankyrin repeat units.

The characteristics of ankyrin repeat proteins have been reviewed (Sedgwick and Smerdon, 1999) and one minimal folding unit has been investigated (Zhang and Peng, 2000). Ankyrin repeat proteins have been studied in some detail, and the data can be used to exemplify the construction of repeat proteins according to the present invention.

Ankyrin repeat proteins were identified in 1987 through sequence comparisons between four such proteins in Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans. Breeden and Nasmyth reported multiple copies of a repeat unit of approximately 33 residues in the sequences of swi6p, cdc10p, notch and lin-12 (Breeden and Nasmyth, 1987). The subsequent discovery of 24 copies of this repeat unit in the ankyrin protein led to the naming of this repeat unit as the ankyrin repeat (Lux et al., 1990). Later, this repeat unit was identified in several hundred proteins of different organisms and viruses (Bork, 1993; SMART database, Schultz et al., 2000). These proteins are located in the nucleus, the cytoplasm or the extracellular space. This is consistent with the fact that the ankyrin repeat domain of these proteins is independent of disulfide bridges and thus independent of the oxidation state of the environment. The number of repeat units per protein varies from two to more than twenty (SMART database, Schultz et al., 2000). A minimum number of repeat units seems to be required to form a stable folded domain (Zhang and Peng, 2000). On the other hand, there is also some evidence for an upper limit of six repeat units being present in one folded domain (Michaely and Bennet, 1993).

All tertiary structures of ankyrin repeat units determined so far share a characteristic fold (Sedgwick and Smerdon, 1999) composed of a β-hairpin followed by two antiparallel α-helices and ending with a loop connecting the repeat unit with the next one (FIG. 4c). Domains built of ankyrin repeat units are formed by stacking the repeat units into an extended and curved structure. This is illustrated by the structure of the mouse GA-binding protein beta 1 subunit in FIG. 2h.

Proteins containing ankyrin repeat domains often contain additional domains (SMART database, Schultz et al., 2000). While the latter domains have variable functions, the function of the ankyrin repeat domain is most often the binding of other proteins, as several examples show (Batchelor et al., 1998; Gorina and Pavletich, 1996; Huxford et al., 1999; Jacobs and Harrisson, 1999; Jeffrey et al., 2000). When analysing the repeat units of these proteins, the target interaction residues are mainly found in the β-hairpin and the exposed part of the first α-helix (FIG. 4c). These target interaction residues hence form a large contact surface on the ankyrin repeat domain. This contact surface is exposed on a framework built of stacked units of α-helix 1, α-helix 2 and the loop (FIG. 4c). For an ankyrin repeat protein consisting of five repeat units, this interaction surface contacting other proteins is approximately 1200 Å². Such a large interaction surface is advantageous to achieve high affinities to target molecules. The affinity of IκBα (which contains a domain of six ankyrin repeat units) to the NF-κB heterodimer, for example, is K_D = 3 nM (Malek et al., 1998), whereas the dissociation constant of human GA-binding protein beta 1 to its alpha unit is K_D = 0.78 nM (Suzuki et al., 1998). An advantage of the use of ankyrin repeat proteins according to the present invention over widely used antibodies is their potential to be expressed in a recombinant fashion in large amounts as soluble, monomeric and stable molecules (example 2).

Further preferred is a collection, wherein each of said repeat modules comprises the ankyrin repeat consensus sequence DxxGxTPLHLAaxx.+-..+-..+-..+-..+-..+-..+-..+-..+-..+-.GpxpaVpxLLpxGA.+-..+-..+-..+-..+-.DVNAx (SEQ ID NO: 1), wherein "x" denotes any amino acid, ".+-." denotes any amino acid or a deletion, "a" denotes an amino acid with an apolar side chain, and "p" denotes a residue with a polar side chain. Most preferred is a collection, wherein one or more of the positions denoted "x" are randomised.

Particularly preferred is a collection, wherein each of said repeat modules comprises the ankyrin repeat consensus sequence DxxGxTPLHLAxxxGxxxVVxLLLxxGADVNAx (SEQ ID NO: 2), wherein "x" denotes any amino acid.

Even more preferred is a diverse collection, wherein each of said repeat modules comprises the ankyrin repeat sequence motif DxxGxTPLHLAxxxGxxxIVxVLLxxGADVNAx (SEQ ID NO: 3), wherein "x" denotes any amino acid.

Yet more preferred is a diverse collection, wherein each of said repeat modules comprises the ankyrin repeat sequence motif D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1 (SEQ ID NO: 4), wherein 1 represents an amino acid residue selected from the group: A, D, E, F, H, I, K, L, M, N, Q, R, S, T, V, W and Y; wherein 2 represents an amino acid residue selected from the group: H, N and Y.
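The partially randomised motif of SEQ ID NO: 4 lends itself to a short worked example. The Python sketch below counts how many distinct module sequences the motif admits and draws one example module. Uniform sampling at each randomised position and the fixed random seed are assumptions of this sketch; the specification does not state how residues are distributed in an actual library.

```python
import random

MOTIF = "D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1"   # SEQ ID NO: 4 as written above
ALLOWED = {
    "1": "ADEFHIKLMNQRSTVWY",   # residues listed for positions marked 1
    "2": "HNY",                 # residues listed for positions marked 2
}


def module_diversity(motif, allowed):
    """Number of distinct module sequences the motif can encode."""
    total = 1
    for symbol in motif:
        if symbol in allowed:          # fixed positions contribute a factor of 1
            total *= len(allowed[symbol])
    return total


def sample_module(motif, allowed, rng):
    """Draw one module, choosing uniformly at each randomised position."""
    return "".join(rng.choice(allowed[s]) if s in allowed else s for s in motif)


per_module = module_diversity(MOTIF, ALLOWED)
rng = random.Random(0)                 # fixed seed so the sketch is repeatable
example = sample_module(MOTIF, ALLOWED, rng)

print(len(MOTIF), "residues per module")
print(per_module, "variants per module")
print(per_module ** 3, "theoretical variants for three modules")
print(example)
```

With six positions drawn from 17 residues and one position drawn from 3, a single module admits 17^6 x 3 = 72,412,707 sequences, so the theoretical diversity grows very quickly with the number of modules per repeat domain.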

In a further preferred embodiment, the present invention relates to a collection of nucleic acid molecules, wherein said repeat units are leucine-rich repeats (LRR).

The characteristics and properties of the LRR repeat have been reviewed (Kobe and Deisenhofer, 1994). LRR proteins have been studied in some detail, and the data can be used to exemplify the behaviour of repeat proteins.

LRR proteins have been identified by their highly conserved consensus of leucine or other hydrophobic residues at positions 2, 5, 7, and 12 (FIG. 4b). However, the significance of this amino acid distribution pattern was only understood when the first structure of an LRR protein, the ribonuclease inhibitor, was solved (FIG. 2c). Recently, further LRR crystal structures have been elucidated (FIGS. 2d-2f). A structure of a typical ankyrin repeat domain protein is shown for comparison (FIG. 2g). A single LRR is postulated to always correspond to a β-strand and an antiparallel α-helix (a unique α/β fold, FIG. 4a), surrounding a core made up from leucine or other aliphatic residues only (Kajava, 1998). The overall shape of ribonuclease inhibitor (RI), an LRR protein, could be described as a horseshoe (FIG. 2c) formed by 15 tandem homologous repeats of strictly alternating A-type (29 amino acids) and B-type (28 amino acids) LRR. The alternating nature of the protein was already recognised when the sequence was analysed (FIG. 5a, (Lee et al., 1988)).

Interestingly, mammalian RI are characterised by their extreme affinity to their target proteins. For the binding of RNase A to human RI, a K_i = 5.9 × 10⁻¹⁴ M (Kobe and Deisenhofer, 1996) was reported, whereas angiogenin was found to be inhibited with K_i = 7.1 × 10⁻¹⁶ M by pig RI (Lee et al., 1989), making this one of the strongest interactions known between proteins. Even the best-binding antibodies feature affinities only up to 1.5 × 10⁻¹¹ M (Yang et al., 1995). To better understand the outstanding affinity, two RI were co-crystallised with their target proteins. Subsequent analysis of the crystal structures showed that the interactions are mainly electrostatic (Kobe and Deisenhofer, 1996) and the involved amino acids were predominantly found emanating from the inner β-sheet and the loop connecting each unit to its α-helix (FIG. 4b, Kobe and Deisenhofer, 1995). Moreover, the width of the horseshoe-like fold can change slightly to accommodate the target protein (Kobe and Deisenhofer, 1994). The interface between target and inhibitor consists of a "patch-work" of interactions and the tight association originates from the large buried surface area (about 2550 Å²) when the target protein is bound inside the horseshoe, rather than shape complementarity (Kobe and Deisenhofer, 1996).

When comparing the detailed binding of RNase A and angiogenin (two molecules with only 30% sequence identity) to RI, significant differences became apparent (Chen and Shapiro, 1997). Whereas largely the same residues were involved on the side of RI, the residues of the target protein were not homologous or used different types of bonding (Papageorgiou et al., 1997). In other words, RI evolved in a way which allowed it to bind and inhibit different target molecules by relying on a large number of contacts presented in correct geometrical orientation, rather than optimal complementarity of the residues. This is the basis for a design of new binding molecules, which will have new binding specificities. The shape seems to be predestined for the recognition of large surfaces thereby allowing a much greater variety of random amino acids to generate a library as compared to the relatively small "variable" domains of antibodies. However, the loops of antibodies seem to be superior if small haptens or deep clefts have to be recognised. In addition, not only the repeats themselves can be varied but also their number depending on the target molecules.

Further preferred is a collection, wherein each of said modules comprises the LRR consensus sequence xLxxLxLxxN.+-.xaxx.+-.a.+-..+-..+-..+-.a.+-..+-.a.+-..+-.x.+-..+-.(SEQ ID NO: 5), wherein "x" denotes any amino acid, "a" denotes an aliphatic amino acid, and ".+-." denotes any amino acid or a deletion.

The term "aliphatic amino acid" refers to an amino acid taken from the list of Ala, Gly, Ile, Leu and Val.

Particularly preferred is a collection, wherein at least one of said modules comprises the LRR consensus sequence XLExLxLxxCxLTxxxCxxLxxaLxxxx (SEQ ID NO: 6), wherein "x" denotes any amino acid, and "a" denotes an aliphatic amino acid (A-type LRR).

Particularly preferred is furthermore a collection, wherein at least one of said modules comprises the LRR consensus sequence XLxELxLxxNxLGDxGaxxLxxxLxxPxx (SEQ ID NO: 7), wherein "x" denotes any amino acid, and "a" denotes an aliphatic amino acid (B-type LRR).

Most preferred is a collection, wherein one or more of the positions denoted "x" and/or ".+-." are randomised.

Further preferred is a collection, wherein the cysteine residue at position 10 in the A-type LRR consensus sequence is replaced by a hydrophilic amino acid residue, and wherein the cysteine residue at position 17 is replaced by a hydrophobic amino acid residue.

A hydrophilic amino acid residue may be taken from the list of Ser, Thr, Tyr, Gln, and Asn.

A hydrophobic amino acid residue may be taken from the list of Ala, Ile, Leu, Met, Phe, Trp, and Val.
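The cysteine replacement described above can be sketched in a few lines of Python. The sketch checks that positions 10 and 17 (counted from 1) of the A-type consensus of SEQ ID NO: 6 are indeed cysteines and substitutes residues from the two lists. The choice of Ser and Ala is arbitrary and serves only as an illustration.

```python
A_TYPE = "XLExLxLxxCxLTxxxCxxLxxaLxxxx"   # SEQ ID NO: 6 as written above

HYDROPHILIC = set("STYQN")     # Ser, Thr, Tyr, Gln, Asn
HYDROPHOBIC = set("AILMFWV")   # Ala, Ile, Leu, Met, Phe, Trp, Val


def replace_cysteines(consensus, at_10, at_17):
    """Swap the Cys at 1-based positions 10 and 17 for the given residues."""
    if at_10 not in HYDROPHILIC:
        raise ValueError("position 10 expects a residue from the hydrophilic list")
    if at_17 not in HYDROPHOBIC:
        raise ValueError("position 17 expects a residue from the hydrophobic list")
    residues = list(consensus)
    for position, new in ((10, at_10), (17, at_17)):
        if residues[position - 1] != "C":
            raise ValueError(f"no cysteine at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Ser and Ala are arbitrary picks from the two lists, for illustration only.
variant = replace_cysteines(A_TYPE, "S", "A")
print(variant)
```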

Compared to single-chain Fv or conventional antibodies, several advantages can be enumerated. Whereas disulfide bridges are crucial for the stability of most antibodies (Proba et al., 1997), no disulfide bonds are required in LRR proteins, which makes intracellular applications possible.

Therefore, new binding molecules can be generated for application in a reducing environment. This could become an enormously powerful tool in elucidating the function of the numerous proteins identified by the genome sequencing projects by direct inhibition in the cytosol. As large amounts of expressed and correctly folded proteins are required for many applications in biotechnology, production in E. coli is preferable but very difficult for antibodies, which evolved in the oxidising extracellular environment. In contrast, folding or refolding of RI variants is more efficient, as they are naturally found in the cytosol (see Example 1).

In a further preferred embodiment of a collection according to the present invention, one or more of the amino acid residues in an ankyrin or LRR repeat module as described above are exchanged by an amino acid residue found at the corresponding position in a corresponding naturally occurring repeat unit.

Preferably, up to 30% of the amino acid residues are exchanged, more preferably, up to 20%, and most preferably, up to 10% of the amino acid residues are exchanged.

Particularly preferred is a collection, wherein said set consists of one type of repeat modules.

The term "type of repeat module" refers to the characteristics of a module determined by the length of the module, the number and composition of its "fixed positions" as well as of its "randomised positions". "Different types of modules" may differ in one or more of said characteristics.

Further preferred is a collection, wherein said set consists of two different types of repeat modules.

In a still further preferred embodiment, the present invention relates to a collection, wherein said set comprises two different types of consecutive repeat modules as pairs in said repeat proteins.

Most preferred is a collection, wherein said two different types of modules are based on said A-type LRR and B-type LRR.

Further preferred is a collection, wherein the amino acid sequences of the repeat modules comprised in said set are identical for each said type except for the randomised residues.

Yet further preferred is a collection, wherein the nucleic acid sequences encoding the copies of each said type are identical except for the codons encoding amino acid residues at positions being randomised.

Particularly preferred is a collection, wherein the nucleic acid molecules encoding said repeat proteins comprise identical nucleic acid sequences of at least 9 nucleotides between said repeat modules.

Said "identical nucleic acid sequences of at least 9 nucleotides" may be part of the end of only one repeat module, or be formed by the ends of two adjacent repeat modules, or may be part of a (poly)peptide linker connecting two repeat modules.

In a further preferred collection according to the present invention, the nucleic acid molecules encoding said repeat proteins comprise identical nucleic acid sequences of at least 9 nucleotides between said pairs.

Said "identical nucleic acid sequences of at least 9 nucleotides" may be part of the end of only one pair of repeat modules, or be formed by the ends of two adjacent pairs of repeat modules, or may be part of a (poly)peptide linker connecting two pairs of repeat modules.

Most preferable is a collection, wherein each of the nucleic acid sequences between said modules, or said pairs, comprises a restriction enzyme recognition sequence. The term "restriction enzyme recognition sequence" refers to a nucleic acid sequence being recognised and cleaved by a restriction endonuclease. Said restriction enzyme recognition sequence may be divided symmetrically between the 3' and 5' ends (e.g. 3 nucleotides of a 6 base pair recognition sequence on both ends), or non-symmetrically (e.g. 2 nucleotides on one end, 4 on the corresponding end).

Particularly preferred is a collection, wherein each of the nucleic acid sequences between said modules, or said pairs, comprises a nucleic acid sequence formed from cohesive ends created by two compatible restriction enzymes.

The term "compatible restriction enzymes" refers to restriction enzymes having different recognition sequences but forming compatible cohesive ends when cleaving double stranded DNA. After re-ligation of sticky-end double-stranded DNA fragments produced by two compatible restriction enzymes, the product DNA no longer exhibits the recognition sequence of either restriction enzyme.
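The behaviour of compatible restriction enzymes can be shown with a small Python sketch. BamHI (G^GATCC) and BglII (A^GATCT) are used here as a familiar pair producing the same 5'-GATC overhang; they are chosen for illustration and are not named in this specification. The two DNA fragments are hypothetical, and only the top strand is modelled, which is sufficient because both sites are palindromic.

```python
# Illustrative enzyme pair (an assumption of this sketch): recognition site
# and cut offset on the top strand, counted from the start of the site.
SITES = {"BamHI": ("GGATCC", 1), "BglII": ("AGATCT", 1)}


def cut(seq, enzyme):
    """Split the top strand at the enzyme's first site; return (left, right)."""
    site, offset = SITES[enzyme]
    pos = seq.index(site) + offset
    return seq[:pos], seq[pos:]


def overhang(enzyme):
    """Single-stranded 5' overhang left by a palindromic 6-bp cutter."""
    site, offset = SITES[enzyme]
    return site[offset:len(site) - offset]


module_a = "ATGCCGGATCCTT"   # hypothetical fragment carrying a BamHI site
module_b = "TTAGATCTGGCAT"   # hypothetical fragment carrying a BglII site

left, _ = cut(module_a, "BamHI")     # keep the part upstream of the cut
_, right = cut(module_b, "BglII")    # keep the part downstream of the cut
ends_compatible = overhang("BamHI") == overhang("BglII")

joined = left + right                # top strand after ligation
sites_left = [name for name, (site, _) in SITES.items() if site in joined]
print(ends_compatible, joined, sites_left)
```

The ligated junction reads GGATCT, which is cleaved by neither enzyme, so a junction formed this way stays intact when further modules are added using the same pair of enzymes.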

In a further most preferred embodiment of the collection of the present invention, said identical nucleic acid sequences allow a PCR-based assembly of the nucleic acid molecules encoding said repeat proteins.

In a most preferred embodiment of the collection according to the present invention, said repeat proteins comprise one or more pairs of modules based on said A-type LRR and B-type LRR, wherein each of said pairs has the sequence RLE1L1L112DLTEAG4KDLASVLRSNPSLREL3LS3NKLGDAGVRLLLQGLLDPGT (SEQ ID NO: 8), wherein 1 represents an amino acid residue selected from the group: D, E, N, Q, S, R, K, W and Y; wherein 2 represents an amino acid residue selected from the group: N, S and T; wherein 3 represents an amino acid residue selected from the group: G, S, D, N, H and T; and wherein 4 represents an amino acid residue selected from the group: L, V and M.

Most preferably, each of said pairs of modules is encoded by the nucleic acid molecule

CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAG CTC ACC GAG GCC GGC 444 AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG CTG CTG GAC CCC GGC ACG (SEQ ID NO: 9)

wherein 111 represents a codon encoding an amino acid residue selected from the group: D, E, N, Q, S, R, K, W and Y; wherein 222 represents a codon encoding an amino acid residue selected from the group: N, S and T; wherein 333 represents a codon encoding an amino acid residue selected from the group: G, S, D, N, H and T; and wherein 444 represents a codon encoding an amino acid residue selected from the group: L, V and M.
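As a consistency check, the following Python sketch translates the nucleic acid sequence of SEQ ID NO: 9 with the standard genetic code and compares it with the amino acid sequence of SEQ ID NO: 8, carrying the placeholders 111, 222, 333 and 444 through as 1, 2, 3 and 4. Both sequences are copied as they appear in this text. The sketch only reports where they differ and makes no assumption about which version is intended.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PLACEHOLDERS = {"111": "1", "222": "2", "333": "3", "444": "4"}

SEQ_ID_8 = "RLE1L1L112DLTEAG4KDLASVLRSNPSLREL3LS3NKLGDAGVRLLLQGLLDPGT"
SEQ_ID_9 = (
    "CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAG CTC ACC GAG GCC GGC 444 "
    "AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 "
    "CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG "
    "CTG CTG GAC CCC GGC ACG"
)


def translate(codons):
    """Translate space-separated codons, keeping randomised placeholders."""
    return "".join(
        PLACEHOLDERS[c] if c in PLACEHOLDERS else CODON_TABLE[c]
        for c in codons.split()
    )


protein = translate(SEQ_ID_9)
# (1-based position, residue in SEQ ID NO: 8, residue translated from SEQ ID NO: 9)
mismatches = [
    (i, listed, translated)
    for i, (listed, translated) in enumerate(zip(SEQ_ID_8, protein), start=1)
    if listed != translated
]
print(protein)
print(mismatches)
```

With the sequences as reproduced here, both are 57 positions long and agree everywhere except position 11, where the codon GAG translates to E while SEQ ID NO: 8 lists D.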

In another preferred embodiment one or more of the amino acid residues in at least one pair of modules as listed above are exchanged by an amino acid residue found at the corresponding position in a naturally occurring LRR.

In yet another preferred embodiment, one or more of the amino acid codons in at least one pair of modules as listed above are exchanged by a codon encoding an amino acid residue found at the corresponding position in a naturally occurring LRR. Preferably, up to 30% of the amino acid residues, or amino acid codons, respectively, are exchanged, more preferably, up to 20%, and most preferably, up to 10% are exchanged.

In a further preferred embodiment, the present invention relates to a collection of recombinant nucleic acid molecules comprising a collection of nucleic acid molecules according to the present invention.

In the context of the present invention, the term "recombinant nucleic acid molecule" refers to a RNA or DNA molecule which comprises a nucleic acid sequence encoding said repeat protein and further nucleic acid sequences, e.g. non-coding sequences.

In a still further preferred embodiment, the present invention relates to a collection of vectors comprising a collection of nucleic acid molecules according to the present invention, or a collection of recombinant nucleic acid molecules according to the present invention.

A vector according to the present invention may be a plasmid, phagemid, cosmid, or a virus- or bacteriophage-based vector, and may be a cloning or sequencing vector, or preferably an expression vector, which comprises all elements required for the expression of nucleic acid molecules from said vector, either in prokaryotic or eukaryotic expression systems. Vectors for cloning, sequencing and expressing nucleic acid molecules are well known to any one of ordinary skill in the art. The vectors containing the nucleic acid molecules of the invention can be transferred into the host cell by well-known methods, which vary depending on the type of cellular host. For example, calcium chloride transfection is commonly utilised for prokaryotic cells, whereas, e.g., calcium phosphate or DEAE-Dextran mediated transfection or electroporation may be used for other cellular hosts; see Sambrook et al. (1989). Such vectors may comprise further genes such as marker genes which allow for the selection of said vector in a suitable host cell and under suitable conditions. Preferably, the nucleic acid molecules of the invention are operatively linked to expression control sequences allowing expression in prokaryotic or eukaryotic cells. Expression of said nucleic acid molecules comprises transcription of the polynucleotide into a translatable mRNA. Regulatory elements ensuring expression in eukaryotic cells, preferably mammalian cells, are well known to those skilled in the art. They usually comprise regulatory sequences ensuring initiation of transcription and, optionally, a poly-A signal ensuring termination of transcription and stabilization of the transcript, and/or an intron further enhancing expression of said nucleic acid molecule. Additional regulatory elements may include transcriptional as well as translational enhancers, and/or naturally-associated or heterologous promoter regions.
Possible regulatory elements permitting expression in prokaryotic host cells comprise, e.g., the pL, lac, trp or tac promoter in E. coli, and examples for regulatory elements permitting expression in eukaryotic host cells are the AOX1 or GAL1 promoter in yeast or the CMV-, SV40-, RSV-promoter (Rous sarcoma virus), CMV-enhancer, SV40-enhancer or a globin intron in mammalian and other animal cells. Besides elements which are responsible for the initiation of transcription, such regulatory elements may also comprise transcription termination signals, such as the SV40-poly-A site or the tk-poly-A site, downstream of the nucleic acid molecule. Furthermore, depending on the expression system used, leader sequences capable of directing the (poly)peptide to a cellular compartment or secreting it into the medium may be added to the coding sequence of the nucleic acid molecule of the invention and are well known in the art. The leader sequence(s) is (are) assembled in appropriate phase with translation, initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein, or a portion thereof, into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including a C- or N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product. In this context, suitable expression vectors are known in the art such as Okayama-Berg cDNA expression vector pcDV1 (Pharmacia), pCDM8, pRc/CMV, pcDNA1, pcDNA3 (Invitrogen), pSPORT1 (GIBCO BRL) or pCI (Promega) or more preferably pTFT74 (Ge et al., 1995) or a member of the pQE series (Qiagen). Furthermore, the present invention relates to vectors, particularly plasmids, cosmids, viruses and bacteriophages used conventionally in genetic engineering that comprise the polynucleotide of the invention. Preferably, said vector is an expression vector.
Methods which are well known to those skilled in the art can be used to construct recombinant viral vectors; see, for example, the techniques described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, N.Y. (1989) and Ausubel et al., Current Protocols in Molecular Biology, Green Publishing Associates and Wiley Interscience, N.Y. (1989).

Furthermore, the invention relates to a collection of host cells comprising a collection of nucleic acid molecules according to the present invention, a collection of recombinant nucleic acid molecules according to the present invention, or a collection of vectors according to the present invention.

In the context of the present invention the term "host cell" may be any of a number of cells commonly used in the production of heterologous proteins, including but not limited to bacteria, such as Escherichia coli (Ge et al., 1995), or Bacillus subtilis (Wu et al., 1993a), fungi, such as yeasts (Horwitz et al., 1988; Ridder et al., 1995) or filamentous fungi (Nyyssonen et al., 1993), plant cells (Hiatt, 1990; Hiatt and Ma, 1993; Whitelam et al., 1994), insect cells (Potter et al., 1993; Ward et al., 1995), or mammalian cells (Trill et al., 1995).

In another embodiment, the present invention relates to a collection of repeat proteins encoded by a collection of nucleic acid molecules according to the present invention, by a collection of vectors according to the present invention, or produced by a collection of host cells according to the present invention.

Furthermore, the present invention relates to a method for the construction of a collection of nucleic acid molecules according to the present invention, comprising the steps of (a) identifying a repeat unit from a repeat protein family; (b) identifying framework residues and target interaction residues in said repeat unit; (c) deducing at least one type of repeat module comprising framework residues and randomised target interaction residues from at least one member of said repeat protein family; and (d) constructing nucleic acid molecules each encoding a repeat protein comprising two or more copies of said at least one type of repeat module deduced in step (c).

The ways in which this method may be carried out are explained above in connection with the embodiment of the collection of nucleic acid molecules of the present invention. Two such modes are illustrated in the example.

In a preferred embodiment of this method, said at least one repeat module deduced in step (c) has an amino acid sequence, wherein at least 70% of the amino acid residues correspond either (i) to consensus amino acid residues deduced from the amino acid residues found at the corresponding positions of at least two naturally occurring repeat units; or (ii) to the amino acid residues found at the corresponding positions in a naturally occurring repeat unit.
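The 70% criterion can be checked mechanically once the repeat units are aligned. The following Python sketch shows one possible reading, in which a module is compared position by position against a single reference, which is either (i) a consensus deduced from several aligned units or (ii) one unit taken as is. The sequences are invented toy strings, not naturally occurring repeat units, and the function names and the majority-vote consensus rule are assumptions.

```python
from collections import Counter


def consensus(aligned_units):
    """Most frequent residue at each position of equally long, aligned units."""
    length = len(aligned_units[0])
    if any(len(unit) != length for unit in aligned_units):
        raise ValueError("aligned units must all have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_units)
    )


def fraction_matching(module, reference):
    """Fraction of positions at which module and reference carry the same residue."""
    if len(module) != len(reference):
        raise ValueError("module and reference must have the same length")
    matches = sum(a == b for a, b in zip(module, reference))
    return matches / len(module)


def meets_threshold(module, reference, threshold=0.70):
    """True if at least `threshold` of the module's residues match the reference."""
    return fraction_matching(module, reference) >= threshold


# Hypothetical aligned toy units (not real repeat units).
units = ["GADVNA", "GADVNA", "GKDVNS"]
reference = consensus(units)                     # option (i): consensus of several units

print(reference)
print(meets_threshold("GADINA", reference))      # 5 of 6 positions match
print(meets_threshold("GKEINS", reference))      # 2 of 6 positions match
print(meets_threshold("GKDVNA", units[2]))       # option (ii): a single unit as reference
```

A majority vote is only one way to deduce a consensus; ties are resolved here by whichever residue `Counter` encountered first, which a real analysis would need to handle deliberately.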

Further preferred is a method for the production of a collection of (poly)peptides/proteins according to the present invention, comprising the steps of (a) providing a collection of host cells according to the present invention; and (b) expressing the collection of nucleic acid molecules comprised in said host cells.

Particularly preferred is a method for obtaining a repeat protein having a predetermined property, comprising the steps of (a) providing a collection of repeat proteins according to the present invention; and (b) screening said collection and/or selecting from said collection to obtain at least one repeat protein having said predetermined property.

The diverse collection of repeat proteins may be provided by several methods, depending on the screening and/or selection system being used. Such methods include display on the surface of bacteriophages (WO 90/02809; Smith, 1985; Kay et al., 1996; Dunn, 1996) or bacterial cells (WO 93/10214), ribosomal display (WO 91/05058; WO 98/48008; Hanes et al., 1998), display on plasmids (WO 93/08278), the use of covalent RNA-repeat protein hybrid constructs (WO 00/32823), and intracellular expression with selection/screening, for example by protein complementation assay (WO 98/341120; Pelletier et al., 1998). In all these methods, the repeat proteins are provided by expression of a corresponding collection of nucleic acid molecules. The repeat proteins are then screened, and one or more repeat proteins having the desired property are identified via the genetic information connected to them.

In the context of the present invention the term "predetermined property" refers to a property, which one of the repeat proteins out of the

