Title: Data repository and method for promoting network storage of data

Abstract: In general, the invention features methods by which more than one client program connected to a network stores the same data item on a storage device of a data repository connected to the network. In one aspect, the method comprises encrypting the data item using a key derived from the content of the data item, determining a digital fingerprint of the data item, and storing the data item on the storage device at a location or locations associated with the digital fingerprint. In a second aspect, the method comprises determining a digital fingerprint of the data item, testing for whether the data item is already stored in the repository by comparing the digital fingerprint of the data item to the digital fingerprints of data items already in storage in the repository, and challenging a client that is attempting to deposit a data item already stored in the repository, to ascertain that the client has the full data item.

Patent Number: 7,412,462 Issued on 08/12/2008 to Margolus,   et al.


Inventors: Margolus; Norman H. (Somerville, MA), Knight, Jr.; Thomas F. (Belmont, MA), Floyd; Jered J. (Cambridge, MA), Hartman; Sam (Cambridge, MA), Homsy, II; George E. (Cambridge, MA)
Assignee: Burnside Acquisition, LLC (Cambridge, MA)
Appl. No.: 09/785,535
Filed: February 16, 2001


Related U.S. Patent Documents

Application Number    Filing Date    Patent Number    Issue Date
60183466              Feb., 2000

Current U.S. Class: 707/200 ; 705/18; 705/51; 707/10; 709/203; 713/176; 713/180; 713/181
Current International Class: G06F 17/30 (20060101)
Field of Search: 707/10,204,200 713/189,193,200-202,176,180,181 709/203 705/51,18


References Cited

U.S. Patent Documents
4641274 February 1987 Swank
4864616 September 1989 Pond et al.
RE34954 May 1995 Haber et al.
5532920 July 1996 Hartrick et al.
5579501 November 1996 Lipton et al.
5594227 January 1997 Deo
5765152 June 1998 Erickson
5778395 July 1998 Whiting et al.
5781901 July 1998 Kuzma
5852666 December 1998 Miller et al.
5914938 June 1999 Brady et al.
5915025 June 1999 Taguchi et al.
5931947 August 1999 Burns et al.
5940507 August 1999 Cane et al.
5978791 November 1999 Farber et al.
5990810 November 1999 Williams
6041411 March 2000 Wyatt
6052688 April 2000 Thorsen
6122631 September 2000 Berbec et al.
6148342 November 2000 Ho
6205533 March 2001 Margolus
6272492 August 2001 Kay
6308325 October 2001 Dobbek
6374266 April 2002 Shnelvar
6415280 July 2002 Farber et al.
6415302 July 2002 Garthwaite et al.
6430618 August 2002 Karger et al.
6526418 February 2003 Midgley et al.
6532542 March 2003 Thomlinson et al.
6535867 March 2003 Waters
6549992 April 2003 Armangau et al.
6557102 April 2003 Wong et al.
6584466 June 2003 Serbinis et al.
6601172 July 2003 Epstein
2003/0028761 February 2003 Platt
Foreign Patent Documents
0774715 May., 1997 EP
1 049 988 Sep., 2002 EP
1 049 989 May., 2003 EP
99/09480 Feb., 1999 WO
WO 01/18633 Mar., 2001 WO
WO 01/61563 Aug., 2001 WO

Other References

Tridgell et al., The Rsync Algorithm, Jun. 18, 1996, Department of Computer Science Australian National University Canberra, pp. 1-6. cited by examiner .
Chaum et al., "Untraceable Electronic Cash", Advances in Cryptology CRYPTO '88, Springer-Verlag, pp. 319-327 (1998). cited by other .
Feige et al., "Zero-Knowledge Proofs of Identity," Journal of Cryptology 1:77-94 (1988). cited by other .
Margolus, Crystalline Computation, Chapter 18 of Feynman and Computation (A. Hey, ed.), Perseus Books, pp. 267-305 (1999). cited by other .
National Institute of Standards and Technology, NIST FIPS PUB 180-1, "Secure Hash Standard", U.S. Department of Commerce (Apr. 1995). cited by other .
Nowicki, "NFS: Network File System Protocol Specification" Network Working Group RFC1094, Sun Microsystems, Inc. (Mar. 1989). cited by other .
Preface from FWKCS(TM) Contents_Signature System, Version 2.05, Copyright Frederick W. Kantor (Apr. 26, 1996). cited by other .
Rabin, "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance," Journal of the ACM, vol. 36, No. 2, pp. 335-348 (Apr. 1989). cited by other .
Rivest, "The MD4 Message Digest Algorithm," Network Working Group RFC1186, MIT Laboratory for Computer Science (Oct. 1990). cited by other .
Karger et al., "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web," Laboratory for Computer Science and Department of Mathematics, MIT, Cambridge, MA. cited by other .
Berners-Lee et al., "Universal Document Identifiers," available at http://www.webhistory.org/www-talk.1992/0032.html (Mar. 11, 1992). cited by other .
Bowman et al., "Harvest: A Scalable, Customizable Discovery and Access System," Technical Report CU-CS-732-94, Dept. of Comp. Science, Univ. of Colorado (Aug. 1994). cited by other .
Browne et al., "Location-Independent Naming for Virtual Distributed Software Repositories," available at www.netlib.org/utk/papers/lefn/main.html (Nov. 11, 1994). cited by other .
Crespo et al., "Archival Storage for Digital Libraries," Procs. of the Third ACM Conf. on Digital Libraries, pp. 69-78 (ISBN: 0-89791-965-3) (1998). cited by other .
Heckel, "A Technique for Isolating Differences Between Files," Communications of the ACM, vol. 21, No. 4 (Apr. 1978). cited by other .
Kaliski, "PKCS #1: RSA Encryption," Mar. 1998, The Internet Society, Request for Comments 2313, pp. 1-19, http://www.ietf.org/rfc/rfc2313.txt. cited by other .
Kantor, FWKCS(TM) Contents_Signature System, Version 1.18 (Sep. 11, 1992). cited by other .
Rabin, "Fingerprinting by Random Polynomials," Center for Research in Computing Technology, Harvard University, Technical Report TR-15-81 (1981). cited by other .
Rivest, "The MD5 Message-Digest Algorithm," Network Working Group, Request for Comments: 1321, MIT Lab for Comp. Science and RSA Data Security, Inc. (Apr. 1992). cited by other .
Sollins, "Functional Requirements for Uniform Resource Names," Network Working Group, Request for Comments: 1737, MIT Lab. For Comp. Science (Dec. 1994). cited by other .
Williams, "An Introduction to Digest Algorithms," available at ftp.rocksoft.com:/pub/rocksoft/., (Sep. 1994). cited by other.

Primary Examiner: Pham; Hung Q
Attorney, Agent or Firm: Fish & Richardson P.C.

Parent Case Text



CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application Ser. No. 60/183,466, filed Feb. 18, 2000.
Claims



What is claimed is:

1. A method by which a plurality of client programs connected to a network deposit data items into a data repository connected to the network and avoid repeated storage of duplicated data items, the method comprising: depositing a data item in the data repository for a depositing client program, the depositing including determining a digital fingerprint from the data item using a hash function that produces digital fingerprints having a pseudorandom distribution; comparing the determined digital fingerprint from the deposited data item to digital fingerprints of data items already stored in the data repository; establishing from the comparing of digital fingerprints, without comparing the entire contents of the deposited data item to the entire contents of a data item already stored, whether a stored data item is identical to the deposited data item; and storing the deposited data item in the data repository if the deposited data item is not identical with any stored data item; wherein the stored data item is associated with a named object and an access authorization credential which is uniquely associated with a depositing client program or with a data repository user; wherein the access authorization credential is associated with the named object which comprises a digital fingerprint; wherein the named object is stored in a database; wherein the stored data item, in response to a request by a retrieving client program, is retrieved by using the access authorization credential to select the stored named object; retrieving the stored named object from the database; and using the digital fingerprint from the retrieved named object to return the stored data item; wherein the physical location or locations at which the stored data item is stored in the data repository are determined at least in part by the digital fingerprint.

2. The method of claim 1 wherein the depositing further comprises encrypting the deposited data item using a key derived from the content of the deposited data item.

3. The method of claim 2 wherein the encrypting of the deposited data item is performed by the depositing client program.

4. The method of claim 2 wherein if the deposited data item is identical to each of a plurality of data items deposited independently by a plurality of the depositing client programs and if an encryption key is independently derived from the content of each of the plurality of data items, then all of the independently derived encryption keys are the same.

5. The method of claim 2 wherein users are grouped into a plurality of families, and the depositing client program acts on behalf of a user, and the key derived from the content of the deposited data item is determined in part by which of the plurality of families the user belongs to.

6. The method of claim 1 wherein the stored data item is associated with each of a plurality of access-authorization credentials, each of which is uniquely associated with a distinct data repository user or client program.

7. The method of claim 1 wherein the stored named object is identified by information representative of the access-authorization credential.

8. The method of claim 7 wherein the information representative of the access-authorization credential is a cryptographic hash of all or part of the access-authorization credential.

9. The method of claim 8 wherein the cryptographic hash is an access identifier that uniquely identifies the stored data item for a particular user or client program.

10. The method of claim 9 wherein the data repository is the database and the physical locations at which the named-object is stored are based on an access identifier.

11. The method of claim 1 wherein the stored named object further comprises historical version information associating data items deposited at different times with different named object versions.

12. The method of claim 1 wherein named object history is preserved by creating a new version of the named object each time that a new data item is associated with it.

13. The method of claim 12, wherein a determination of which versions of the named object to delete is based in whole or in part on the times at which the versions were created, and the intervals between these times.

14. The method of claim 1 wherein a digital time stamp hash is prepared for each of a plurality of named objects to allow a property of these named objects to be proven at a later date.

15. The method of claim 14 wherein a random or other difficult to guess element is incorporated into the digital time stamp hash for each of the plurality of named objects, to prevent the property from being proven if this element is deleted.

16. The method of claim 1 wherein it is determined that the stored data item is no longer associated with any named object and the storage space used by the stored data item is reused.

17. The method of claim 1 further comprising a challenge step to ascertain that the depositing client program has the entirety of the data item being deposited.

18. The method of claim 1 wherein user identification is required when the depositing client program is acting on behalf of a depositor user and access to the data item being deposited for the depositor user can be shared with other users.

19. The method of claim 1 wherein identity information about a depositor user associated with the depositing client program is made available to the retrieval client program, to discourage unlawful sharing of proprietary information.

20. The method of claim 19 wherein the identity information is stored in an encrypted form that users with whom the depositor has shared access to the named object can both read and decrypt.

21. The method of claim 1 wherein when the identity of a depositor user associated with the depositing client program has not been verified, restrictions are placed on the use of the access authorization credential by retrieving programs associated with other users.

22. The method of claim 21 wherein the rate of retrieving data associated with the named object is limited.

23. The method of claim 1 wherein the deposit client program runs on a client machine and is a mirroring program which determines which data items to transmit to the data repository, and wherein that determination is based at least in part on the result of a comparison of digital fingerprints establishing that certain data items are not already stored in the data repository.

24. The method of claim 1 wherein an index data item comprises fingerprints that identify a plurality of data items stored in the data repository, and the index data item is the data item that is deposited in the data repository and associated with the named object and the named object is used by the retrieving client program to retrieve one of the plurality of stored data items.

25. The method of claim 1 wherein access to the named objects can be transferred between data repository users using the access authorization credential, without communicating with the data repository.

26. The method of claim 25 wherein at least one class of data repository users is not permitted to transfer access to their named objects to other users using access-authorization credentials individually associated with their named objects.

27. The method of claim 1 wherein different physical locations comprise different hard disk drives.

28. The method of claim 1 wherein the physical locations comprise data servers linked by a network.

29. The method of claim 1 wherein establishing from the comparing of digital fingerprints, without comparing the entire contents of the deposited data item to the entire contents of a data item already stored, whether a stored data item is identical to the deposited data item comprises transmitting over the network the digital fingerprint of the deposited data item rather than the deposited data item itself.

30. The method of claim 1 wherein the depositing client program comprises a file server.

31. The method of claim 1 wherein the named object represents a directory of a file system stored within the data repository.

32. The method of claim 1 wherein a structured item is split up into a plurality of data items with the divisions occurring at content dependent boundaries and the deposited data item is one of the plurality of data items.

33. The method of claim 1 wherein the deposited data item is one of a plurality of identical data items deposited independently by a plurality of the depositing client programs, and a corresponding plurality of retrieving client programs all share read access to the stored data item.

34. The method of claim 33 wherein a retrieving client program which does not possess an access authorization credential cannot read the stored data item.

35. The method of claim 1 wherein the data repository is the database.

36. The method of claim 1 wherein the depositing client program and the retrieving client program are the same program.

37. The method of claim 1 wherein there exists a defined protocol used by data repository client programs to communicate with the data repository and the defined protocol allows data repository clients to deposit data items without sending their full contents if identical data items are already stored in the data repository, and the defined protocol only allows data repository clients to retrieve data items indirectly, by using access authorization credentials to select named objects.
Description



BACKGROUND OF THE INVENTION

For almost as long as there have been computer networks, there have been schemes which allow computers to access each other's file systems over the network in much the same manner as they access their own local file system. The first widely used remote file access protocol was Sun Microsystems' network file system (NFS), which became very popular with the rise of Unix in the mid 1980's (see B. Nowicki, "NFS: Network File System Protocol Specification," Network Working Group RFC1094, March 1989). At about the same time, the SMB network file sharing protocol was developed by IBM for use with their PC's. Subsequent versions of SMB have become widely used on networked PC's running Microsoft Windows, and on their fileservers.

Keeping data in networked file systems allows users to access the same data environment from different workstations on the network, and greatly simplifies system administration and the sharing of public data. For these and other reasons, it is expected that network data repositories will become widely popular among PC users as soon as typical PC network connections become fast enough to make substantial remote storage of data practical. Indeed, some Web-based services which make specific types of user data accessible from any Web browser are already popular--for example, email services and appointment calendars. Servers for individuals' Web pages also follow the network-data model.

Many companies are offering additional Web-based services which store their data remotely, seeking new applications that will become popular. Some of these companies also offer substantial amounts of free network-based file storage. The greatest obstacle to the acceptance of these new network-based services has been slow network connections. Most computer users currently connect to the network through a telephone modem, which provides them with a connection that is about 1000 times slower than the I/O bandwidth to their local hard disk. This makes it relatively inconvenient to use remote network-based storage for most of the applications that these users now run on their local file system.

Some companies currently sell network-based backup services to PC users. For a fee, these companies provide a combination of PC software and networked storage space that allows users to keep a copy of their most important data remotely. For privacy, the PC software encrypts user data before sending it to be stored, using the user's individual public key. Some of these companies also offer Web-based access to backed-up data. Thus far, these companies have not achieved an appreciable penetration into the PC user market. Slow network connections, the cost and effort involved in obtaining and using such services, and a low perceived benefit attached to maintaining backups of file data, have been major obstacles. For the moment, most of the Gigabytes of programs and data that users accumulate remain exclusively on their local hard disks.

Use of network storage is also encouraged by techniques which speed up network file transfers. One such technique involves the concept of a "digital fingerprint" of a file, also called a "hash function", a "content signature" or a "message digest" (see R. L. Rivest, "MD4 Message Digest Algorithm," Network Working Group RFC1186, October 1990). A fingerprint is a fixed-length value obtained by mixing all of the bits of the file together in some prescribed deterministic manner--the same data always produces the same fingerprint. The fingerprint is used as a compact representative of the whole file: if two file fingerprints don't match, then the files are different. For a well designed fingerprint, the chance that any two actual files will ever have the same fingerprint can be made arbitrarily small. Such a fingerprint serves as a unique name for the file data.
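The properties described above (fixed length, determinism, and sensitivity to any change in the data) can be illustrated with a minimal sketch. It assumes SHA-256 from Python's standard library as the fingerprint function; the text itself cites MD4 only as an example, and the function name `fingerprint` is hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a fixed-length hex digest that stands in for the whole data item.

    SHA-256 is an assumption made for this sketch, not a choice stated in the text.
    """
    return hashlib.sha256(data).hexdigest()


a = fingerprint(b"the same bytes")
b = fingerprint(b"the same bytes")
c = fingerprint(b"the same bytez")

print(a == b)   # identical data always yields the identical fingerprint
print(a == c)   # a one-byte change yields a different fingerprint
print(len(a))   # the digest length does not depend on the input length
```

Because the digest has a fixed size, two parties can compare fingerprints instead of exchanging the files themselves.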

Fingerprints have been used for many years to avoid unnecessary file transfers. One application of this sort has been in Bulletin Board Systems (BBSs), which have used fingerprints since the early 1990's to avoid the communication cost of uploading file data that is already present in the BBS, but associated with a different file name. Fingerprints have also been used in BBSs to conserve storage space by not storing duplicate data (for an example of both uses, see Frederick W. Kantor's Content Signature software, FWKCS, which has been in use by bulletin boards such as Channel 1 since at least 1993). These BBSs maintain a table of fingerprints for all files already present. When a new file is uploaded for storage on the BBS, its fingerprint is taken. If the BBS already contains a file with the same fingerprint (regardless of the file's name) then the duplicate data is not stored. Similarly, a client computer wishing to store data into the BBS can compute the fingerprint of the file that it wishes to send, and send that first. If a file containing this data is already present in the BBS, then the client is informed and need not send anything.
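The BBS-style scheme just described can be sketched as follows. This is an illustrative model only, not the FWKCS software: the class `FingerprintStore`, the function `client_send`, and the use of SHA-256 are all assumptions introduced for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # SHA-256 is assumed here purely for illustration.
    return hashlib.sha256(data).hexdigest()


class FingerprintStore:
    """Toy store that keeps one copy of each distinct byte string."""

    def __init__(self):
        self._by_fingerprint = {}   # fingerprint -> stored data
        self._names = {}            # file name -> fingerprint

    def has(self, fp: str) -> bool:
        """Table lookup: is data with this fingerprint already present?"""
        return fp in self._by_fingerprint

    def link(self, name: str, fp: str) -> None:
        """Associate a new file name with data that is already stored."""
        if fp not in self._by_fingerprint:
            raise KeyError("no stored data with this fingerprint")
        self._names[name] = fp

    def upload(self, name: str, data: bytes) -> None:
        """Store the data only if its fingerprint is not yet in the table."""
        fp = fingerprint(data)
        self._by_fingerprint.setdefault(fp, data)
        self._names[name] = fp

    def stored_copies(self) -> int:
        return len(self._by_fingerprint)


def client_send(store: FingerprintStore, name: str, data: bytes) -> int:
    """Send the fingerprint first; return how many payload bytes were transmitted."""
    fp = fingerprint(data)
    if store.has(fp):
        store.link(name, fp)    # the data is already present, so nothing more is sent
        return 0
    store.upload(name, data)
    return len(data)


store = FingerprintStore()
print(client_send(store, "a.txt", b"hello"))   # first upload transmits the data
print(client_send(store, "b.txt", b"hello"))   # same content under another name transmits nothing
print(store.stored_copies())
```

Note that this sketch trusts any client that presents a matching fingerprint. The abstract's challenge step, which checks that a depositing client really holds the full data item, is not modeled here.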

D. A. Farber and R. D. Lachman, in U.S. Pat. No. 5,978,791 (Data processing system using substantially unique identifiers to identify data items, whereby identical data items have the same identifiers, filed October 1997) carry the idea of file fingerprints a step further, using them as the primary identifier for all data-items stored in a file system. In their scheme, not only are fingerprints used to avoid unnecessary transmission and duplicate-storage of file data (as in the BBS scheme mentioned above), but they also use fingerprints directly to gain read access to data. In this scheme, access to "licensed" data is controlled by associating explicit lists of licensees with specific data-items. Such a control mechanism doesn't scale well when applied to intellectual property protection in general. Any data-item added to the system which is copyrighted, for example, would have to have attached to it an explicit list of all users who are legally allowed to read it. Otherwise someone can give out access to the data-item to everyone that uses the file system by anonymously publishing the fingerprint of the data-item. Constructing an explicit legal-access list for each data-item is in general cumbersome, difficult and intrusive.

Furthermore, existing schemes which use fingerprints to identify redundant data and avoid unnecessary transmission and storage depend upon the storage system being able to examine previously stored data. If users independently encrypt their data for privacy, they can't take advantage of each other's data to save on transmission or on storage. If data is unencrypted, then the storage system maintainers have complete access to all user data. They may be tempted or coerced into looking at this data, and in some situations may be legally obliged to provide parts of it to third parties.

SUMMARY OF THE INVENTION

In general, the invention features a method by which more than one client program connected to a network stores the same data item on a storage device of a data repository connected to the network. The method comprises encrypting the data item using a key derived from the content of the data item, determining a digital fingerprint of the data item, and storing the data item on the storage device at a location or locations associated with the digital fingerprint.
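
The three steps of this method can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the key is taken to be a SHA-256 hash of the content, the fingerprint is taken over the encrypted item, the repository is a plain dictionary, and the cipher is a toy hash-based keystream that stands in for a real cipher and should not be used for actual protection. The names `content_key`, `keystream_xor`, and `deposit` are hypothetical.

```python
import hashlib


def content_key(data: bytes) -> bytes:
    """Derive the encryption key from the data item's own content."""
    return hashlib.sha256(b"content-key:" + data).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only, NOT secure).

    XORs the data with a SHA-256 counter keystream; applying it twice
    with the same key restores the original bytes.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[start:start + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def deposit(store: dict, data: bytes):
    """Encrypt with a content-derived key and store under the fingerprint.

    Returns (fingerprint, key). The key stays with the client; only the
    ciphertext and its fingerprint reach the store.
    """
    key = content_key(data)
    ciphertext = keystream_xor(key, data)
    fp = hashlib.sha256(ciphertext).hexdigest()
    store.setdefault(fp, ciphertext)  # identical items share one location
    return fp, key
```

Because the key depends only on the content, two clients that independently encrypt the same data item produce the same ciphertext and the same fingerprint, so the store holds a single copy without being able to read it.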

In preferred implementations, one or more of the following features may be incorporated. The method may further include testing for whether a data item is already stored in the repository by comparing a digital fingerprint of the data item to digital fingerprints of data items already in storage in the repository. The same digital fingerprint may be used for storing the data item on the storage device and for testing whether a data item is already stored in the repository. Encrypting of the data item may be performed by the client prior to transmitting the data item to the storage device. The method may further include encrypting the key and storing the encrypted key on the storage device or on another storage device connected to the network. A client or user specific key may be used to encrypt the key derived from the content of the data item. The key derived from the content of the data item may be the same for all copies of the data item stored in the repository. Users of the method may be grouped into families, and the key derived from the content of the data item may be the same for all copies of the data item stored in the repository by users in the same family, but may be different for users in different families. One or more additional copies or other forms of redundant information about the data items may be stored on the storage device or on other storage devices connected to the network for data integrity, availability, or accessibility purposes and not to provide separate storage of the data item for different client programs. The method may further include associating the data item with each of a plurality of access-authorization credentials, each of which is uniquely associated with a particular user or client program.
Associating of the data item with each of a plurality of access-authorization credentials may include storing a plurality of named objects, each named object comprising information representative of the data item paired with information representative of one of the access-authorization credentials. The information representative of the data item may be a digital fingerprint. The information representative of the access-authorization credential may be a cryptographic hash of all or part of the access-authorization credential. The cryptographic hash may be an access identifier that uniquely identifies the data item for a particular user or client program. The named object may be a data structure created by the client program. The named object may be a data structure created by a server program acting on behalf of the repository. The method may further include a client replacing an existing version of a data item stored on the storage device with a new version of that data item, by replacing the existing named object with a new named object. The method may further include a client retrieving a data item by accessing a named object using an access-authorization credential to select the named object, and using the contents of the named object to determine the location of the data item on the storage device. The named objects may further include version information associating different data items with different versions of the named object. A backup of data items stored on the storage device may be accomplished by preserving copies of the current versions of named objects in existence at the time of the backup. Data items associated with named objects may not be deleted from the repository, and wherein records are kept of the association between data items and names in order to define named objects, and wherein named objects may be backed up by preserving copies of the named object records in existence at the time of the backup. 
A plurality of backups may be made at spaced time intervals. The backup may be accomplished by declaring that after a prescribed moment in time a new version of each named object will be created the first time that a new data item is associated with it. The prescribed moment in time is determined separately for each named object. Copies of named objects may be preserved by creating a new version of each named object each time that a new data item is associated with it. Versions of named objects that are deemed unnecessary may be deleted. The determination of which versions of a named object to delete may be based in whole or in part on the times at which the versions were created, and the intervals between these times. The method may further include preparing a digital time stamp of a plurality of named objects to allow a property of these named objects to be proven at a later date. A random or other difficult to guess element may be incorporated into the time stamp hash for each named object, to prevent the property from being proven if this element is deleted. The method may further include determining that a data item stored on the storage device is not referenced by any named object, and reusing the storage space used to store the unreferenced data item. The method may further include altering one or more properties or parameters associated with an access-authorization credential to change the access rights of a client or user to the data item referenced by that credential. The method may further include a challenge step to ascertain that the client has the full data item. The challenge step may require that the client attempting to store a data item provide correct answers to inquiries as to the content of portions of the data item.
The data item content on which the challenge is based may be selected with a degree of randomness. Depositors may use the client to store data items in the repository, and at least some depositors may be required to provide identification upon storing at least some data items. Rules for when a depositor must provide identification may be selected in order to discourage unlawful distribution of access to the data item. There may be a greater degree of user identification or a higher likelihood that user identification will be required when the data item being stored by the depositor has been indicated to be shareable with other users. For a class of data items the items may only be shared if the depositor has provided adequate identification. Identity information about the depositor may be made available to anyone able to access the data item, to discourage unlawful sharing. The identity information may be stored in an encrypted form that the depositor and users subsequently accessing the shared data item can both read. The repository may not have access to the identity information about the depositor. There may be trial users of the repository, and the identity of such trial users may not have been well verified, but restrictions may be placed on sharing of data items deposited by such trial users. The method may further include limiting access to data items deposited by a poorly verified trial user. Limited access may be provided by limiting the aggregate bandwidth provided for such accesses. Limited access may be provided by limiting the number of simultaneous accesses to the data items. The client may have a directory structure for the data items, the data items may be stored in the repository, and the directory structure may not be evident to the repository maintainers.
The client program using the repository may determine which data items to deposit in the repository, and wherein that determination may be based at least in part on the result of a comparison of digital fingerprints establishing that certain data items are not in the repository. Mirroring software may be downloaded to the client using a bootstrap process, wherein a small bootstrap program may be downloaded and executed, and the bootstrap program may manage download and installation of the remainder of the mirroring software. The default for deciding what data items to mirror may be to mirror all data items. The mirroring may include making a determination of which data items need to be transmitted to the repository, and wherein that determination may be based primarily on a comparison of digital fingerprints for data items at the client and data items in the repository. The access-authorization credential may be determined in part by computing a hash involving elements of the pathname for a file on the client computer. The path name hash may be made unique to a client by introducing a reproducible but randomly chosen element into it. A data item may be represented as a composite of objects, and the component objects may be separately deposited in the repository. Lists of fingerprints for data-items making up a composite data-item may be deposited as an index data item, which can be given an object-name and used for obtaining access to any of the component data-items. A proof-of-deposit may be returned for each component deposit, and the proofs may be presented when the index data item is given an object-name. When transmitting a composite data-item, the client may use fingerprints to avoid retransmitting components following loss of communication. The composite data-item may be encrypted with a key that is only made available to the repository at the moment of access. 
An email message may be broken up into composite items in such a manner that the individual attachments may be separate component data-items. The physical location at which information about named-objects is stored may be based on access identifiers, to introduce reproducible pseudorandomness into the physical locations of the named-object data. Fingerprints may be determined directly from the data items, and this process produces randomly distributed numbers which can be used to introduce reproducible pseudorandomness into the physical locations of the data items. The repository may give the client a deposit receipt which allows the user to prove that the deposit occurred. An access identifier may be formed to provide proof of ownership of the data item stored in the repository, the access identifier may be formed by producing a one-way hash including identifying information chosen by the client program to identify the data item, and the one-way hash may not be reversed to permit the repository to discover the identity of the client program or user. The identifying information may be associated with the data item on the client. The identifying information may be derived at least in part from the path name of the data item on the client. User-identifying information may be provided to the repository as part of the access-authorization credential. At least some access-authorization credentials may be transferred between users without the use of the repository. At least one class of users may not be permitted to transfer access using access-authorization credentials.
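
The named-object features listed above (pairing a data-item fingerprint with a hash of an access-authorization credential, replacing a version, retrieving by credential, and reusing the space of unreferenced data items) can be sketched as follows. This is an assumed, simplified model: the repository is held in memory, SHA-256 serves as both the fingerprint and the one-way hash, credentials are opaque byte strings, and the names `Repository`, `deposit`, `retrieve`, and `collect_garbage` are hypothetical.

```python
import hashlib


def h(data: bytes) -> str:
    """One-way hash used for fingerprints and access identifiers (assumed)."""
    return hashlib.sha256(data).hexdigest()


class Repository:
    """Hypothetical in-memory repository with named objects."""

    def __init__(self):
        self.items = {}          # fingerprint -> data item
        self.named_objects = {}  # access identifier -> fingerprint

    def deposit(self, credential: bytes, data: bytes) -> None:
        fp = h(data)
        self.items.setdefault(fp, data)  # stored once, however many owners
        # The access identifier is a hash of the credential, so the
        # repository never holds the credential itself. Writing it again
        # replaces the existing named object with a new one.
        self.named_objects[h(credential)] = fp

    def retrieve(self, credential: bytes) -> bytes:
        # The credential selects the named object; its contents give the
        # location (here, the fingerprint) of the data item.
        return self.items[self.named_objects[h(credential)]]

    def collect_garbage(self) -> int:
        """Drop data items no named object references; return the count."""
        live = set(self.named_objects.values())
        dead = [fp for fp in self.items if fp not in live]
        for fp in dead:
            del self.items[fp]
        return len(dead)
```

In this sketch a credential could be, for example, a client secret combined with a path name, in the spirit of the path-name hash mentioned above; version history and backups are omitted.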

In a second aspect, the invention features another method by which more than one client program connected to a network stores the same data item on a storage device of a data repository connected to the network. The method comprises determining a digital fingerprint of the data item, testing for whether a data item is already stored in the repository by comparing the digital fingerprint of the data item to the digital fingerprints of data items already in storage in the repository, and challenging a client that is attempting to deposit a data item already stored in the repository, to ascertain that the client has the full data item.

In preferred implementations, one or more of the following features may be incorporated. The challenging may require that the client provide correct answers to inquiries as to the content of portions of the data item. The data item content on which the challenge is based may not easily be predicted by the user or client program. The data item content on which the challenge is based may be determined by the client program without the aid of the repository. Future access to the data item may be provided by creating an access-authorization credential which can be presented at a later time to prove that the challenge has been met for that data item. Each access authorization credential may be uniquely associated with an access owner. Each access authorization credential may include information sufficient to identify the access owner. The access authorization credential may include a fingerprint. The fingerprint may be different from the fingerprint used for testing whether the data item is already stored in the repository. The access authorization credential may be associated with a fingerprint in the repository. The access authorization credential may be associated directly with the data-item or with a record in the repository that is associated with the data-item. The record in the repository with which the access authorization credential is associated may be an access identifier that is associated with the credential by computation of a one way hash function. The access identifier may be stored in the repository and may be compared with a later hash of an access authorization credential to verify access permission to a named object. The access authorization credential may include information sufficient to respond to a challenge. The access authorization credential may include data proof information created during a challenge process that is sufficient to prove to the repository that the challenge was passed. 
This data proof information may include the actual challenge response, so that it can be directly verified against the data-item. At least some access-authorization credentials may be transferred between users without the aid of the repository. The usage of some access authorization credential may be restricted for at least one class of access owners. The access authorization credential may only be usable by the access owner. The aggregate bandwidth available to all users of the access authorization credential may be limited. At the time of deposit at least some data items may be associated with a minimum expiration time. At least some data items that expire may be removed and their storage space reused. The repository may keep track of which access owners have deposited a given data item. Upon an access owner informing the repository that a data item is no longer needed, the data item may be deleted or the expiration of the data item may be accelerated. The repository may truncate the list of depositors associated with a data-item, and may never accelerate the expiration of this data item. The method may further include encrypting the data item using a key derived from the content of the data item. Encrypting of the data item may be performed by the client prior to transmitting the data item to the storage device. The method may further include encrypting the key and storing the encrypted key on the storage device or on another storage device connected to the network. A client or user specific key may be used to encrypt the key derived from the content of the data item.
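
One way the challenge described in this aspect might work is sketched below. It is an assumption-laden illustration, not the specific protocol of the invention: here the repository selects the probed portions at random, the response is a SHA-256 hash over a nonce and the probed bytes, and the names `make_challenge`, `answer`, and `verify` are hypothetical.

```python
import hashlib
import secrets


def make_challenge(item_len: int, probes: int = 4, span: int = 16):
    """Pick random portions of the data item to ask about."""
    nonce = secrets.token_bytes(16)  # makes each challenge unique
    limit = max(1, item_len - span + 1)
    offsets = [secrets.randbelow(limit) for _ in range(probes)]
    return nonce, offsets, span


def answer(data: bytes, challenge) -> str:
    """Computed by whoever claims to hold the data item."""
    nonce, offsets, span = challenge
    digest = hashlib.sha256(nonce)
    for off in offsets:
        digest.update(data[off:off + span])
    return digest.hexdigest()


def verify(stored: bytes, challenge, response: str) -> bool:
    """The repository recomputes the answer from its own stored copy."""
    return secrets.compare_digest(answer(stored, challenge), response)
```

A client that knows only the fingerprint of a data item cannot compute the response, because the probed bytes are not predictable from the fingerprint.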

In a third aspect, the invention features a method by which more than one client program connected to a network stores the same data item on a storage device of a data repository connected to the network. The method comprises determining a digital fingerprint of the data item, storing the data item on the storage device at a location or locations associated with the digital fingerprint, associating the data item with each of a plurality of access-authorization credentials, each of which is uniquely associated with an access owner, and preparing a digital time stamp of a plurality of records associating data-items and credentials, to allow a property of these records to be proven at a later date.

In preferred implementations, one or more of the following features may be incorporated. Preparing the digital time stamp may include forming a time stamp hash, and a difficult to guess or random element may be incorporated into the time stamp hash, to prevent the property from being proven if this element is deleted. All data items in the repository may be time stamped if they remain in the repository for a sufficiently long time period.
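
The role of the random element in the time stamp hash can be illustrated with a small sketch. The construction shown is an assumption made for the example (a per-record salted SHA-256 leaf, and a single hash over all leaves as the published time stamp value); the record format and the names `leaf`, `timestamp_hash`, and `proves` are hypothetical.

```python
import hashlib
import secrets


def leaf(record: bytes, salt: bytes) -> bytes:
    """Hash one record together with its difficult-to-guess element."""
    return hashlib.sha256(salt + record).digest()


def timestamp_hash(leaves) -> str:
    """Combine all leaves into the value that would be time stamped."""
    digest = hashlib.sha256()
    for item in leaves:
        digest.update(item)
    return digest.hexdigest()


def proves(record: bytes, salt: bytes, leaves, published: str) -> bool:
    """Check that a record was covered by a previously published value."""
    return leaf(record, salt) in leaves and timestamp_hash(leaves) == published


# Example: two records associating an access identifier with a fingerprint.
records = [b"access-id-1|fp-aaa", b"access-id-2|fp-bbb"]
salts = [secrets.token_bytes(16) for _ in records]
leaves = [leaf(r, s) for r, s in zip(records, salts)]
published = timestamp_hash(leaves)
```

Once a record's salt has been deleted, nobody can recompute that record's leaf, so the property can no longer be proven for it, even by someone who guesses the record's contents.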

In a fourth aspect, the invention features a method for detecting the relative uniqueness of a data item in a repository of data items stored on a storage device at locations associated with their digital fingerprints. The method comprises determining a digital fingerprint of the data item, and determining (or approximating) the number of users with authorization credentials for the data item.

In preferred implementations, one or more of the following features may be incorporated. The data item may be a portion of the body of an e-mail message, and the method may be used to determine the relative uniqueness of the e-mail message in a large population of e-mail messages to determine the likelihood that the e-mail is spam. A decision as to whether a data item is a virus may be made by comparing the relative uniqueness of both the data item and other data items associated with the same application.
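
A minimal sketch of the relative-uniqueness test follows. It assumes that the repository can count the distinct holders of credentials for a given fingerprint, and it uses an arbitrary threshold; the names `HolderIndex`, `record`, `holder_count`, and `looks_like_bulk_mail`, and the threshold value itself, are hypothetical.

```python
import hashlib
from collections import defaultdict


class HolderIndex:
    """Hypothetical index of how many users hold each data item."""

    def __init__(self):
        self._holders = defaultdict(set)  # fingerprint -> set of users

    @staticmethod
    def _fp(body: bytes) -> str:
        return hashlib.sha256(body).hexdigest()

    def record(self, user: str, body: bytes) -> None:
        self._holders[self._fp(body)].add(user)

    def holder_count(self, body: bytes) -> int:
        # .get avoids creating an entry for items never deposited
        return len(self._holders.get(self._fp(body), ()))

    def looks_like_bulk_mail(self, body: bytes, threshold: int = 1000) -> bool:
        # A message body held by very many users is not unique, which
        # raises the likelihood that it is spam (threshold is assumed).
        return self.holder_count(body) >= threshold
```

The count is only one input to such a decision; widely held data items also include legitimate popular content.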

In a fifth aspect, the invention features a method for detecting whether a suspect data item is infected with a virus that has a uniform impact on an infected data item. The method comprises determining a digital fingerprint of the suspect data item, comparing the digital fingerprint of the suspect data item to the digital fingerprints of infected data items known to be infected with a virus that consistently affects the data item in the same manner, and basing a decision that the suspect data item contains the virus on there being a match between the fingerprint of the suspect data item and one or more of the fingerprints of the infected data items.

In preferred implementations, one or more of the following features may be incorporated. The method may further include collecting and providing usage statistics based on number of pointers to a data item in the repository. The usage statistics may be configured to provide marketing penetration information on the data item.

In a sixth aspect, the invention features a method by which more than one client connected to a network stores the same data item on a storage device of a data repository connected to the network. The method comprises determining a digital fingerprint of the data item, testing for whether a data item is already stored in the repository by comparing the digital fingerprint of the data item to the digital fingerprints of data items already in storage in the repository, and associating with a data item an informational tag which may be read by at least some client programs.

In preferred implementations, one or more of the following features may be incorporated. The informational tag may indicate at least one of the following: whether the data item contains spam, whether the data item contains or is a virus, whether the data item is copyrighted, by whom the data item is copyrighted, what royalty payment is due for the copyright. The method may further include the process of collecting royalties or other payments for use of a copyright on a data item based on the indication of whether a data item is copyrighted. The process may enable voluntary payment of such royalties or payments. At least some of the tags may be encrypted using the same key as for each data item, so that users with the data item can read the informational contents of the tag.

In a seventh aspect, the invention features a method by which more than one client connected to a network may store the same data item on a storage device of a data repository connected to the network, and wherein there is a public data repository and a private data repository. The method comprises determining a digital fingerprint of the data item, testing for whether a data item is already stored in the public repository by comparing the digital fingerprint of the data item to the digital fingerprints of data items already in storage in the public repository, and if the data item is present in the public repository, storing a named object in the public repository associating the client with the data item and relying on storage of the data item in the public repository; and if the data item is not present in the public repository, storing a named object in the private repository and relying on storage of the data item in the private repository.

In preferred implementations, one or more of the following features may be incorporated. The client may store a named object for the data item exclusively either in the public or the private repository. The data items may be widely circulated non-electronic media such as books or music, and the method may further include converting the widely circulated non-electronic media to a standardized electronic version, storing the standardized electronic version as a data item in the repository, promoting the availability of the standardized electronic version to users with the right to have access, whereby the likelihood of the data repository storing multiple, slightly-different electronic versions of the non-electronic media is reduced.

In an eighth aspect, the invention features a method by which a client connected to a network over a lower speed connection may provide higher speed access to a data item for application processing than is possible over the relatively low speed connection to the network. The method comprises determining a digital fingerprint of the data item, testing for whether the data item is already stored in a repository by comparing the digital fingerprint of the data item to digital fingerprints of data items already in the repository, only if the data item is not already in the repository, transferring the data item over the lower speed connection from the client to the repository, the repository being connected to the network over a higher speed connection than the client, making a higher speed connection between an application server and the data repository, executing an application on the application server to process the data item stored on the data repository, and returning at least some of the processed data to the client across the lower speed connection.

In preferred implementations, one or both of the data transfers to and from the client may be conducted in the background while other applications are running on the client.

In a ninth aspect, the invention features a method by which multiple clients browse content on a network such as the Internet. The method comprises each of the multiple clients accessing content on the network via one or more proxy servers, determining the digital fingerprint of an item of content passing through the proxy server, storing the item of content in a content repository connected to the proxy server at a location associated with the digital fingerprint, testing for whether a content data item is already stored in the repository by comparing the digital fingerprint of the content data item to the digital fingerprints of content data items already in storage in the repository, and associating a content data item already stored in the repository with an access authorization credential uniquely associated with an access owner.

In preferred implementations, one or more of the following features may be incorporated. The data repository may save substantially all content browsed by the clients, thereby preserving the content after it has been altered or removed from the network. The method may further include granting search engines access to the stored content data items or to information about the number of times that data items have been accessed or how recently the data items have been accessed.

In a tenth aspect, the invention features a method by which a plurality of clients connected to a network store the same broadcast data on a storage device of a data repository connected to the network, wherein the broadcast data comprises a sequence of frames or other fragments. The method comprises determining a digital fingerprint of each fragment, testing for whether the fragment is already stored in the repository by comparing a digital fingerprint of the fragment to digital fingerprints of fragments and other data items already in storage in the repository, and having only the client or clients that determine that a fragment is not stored in the repository transmit the fragment to the repository, whereby, because all but one or a small number of clients will not have to transmit the fragment to effect storage of the fragment in the repository, most of the clients are able to store the broadcast data in the repository without actually transmitting a significant fraction of the data to the repository.
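
The effect described in this aspect can be seen in a short sketch in which the repository is modeled as a dictionary keyed by fingerprint. SHA-256 and the function name `store_broadcast` are assumptions made for the example.

```python
import hashlib


def store_broadcast(repo: dict, frames):
    """Store a client's copy of a broadcast, sending only missing fragments.

    Returns (references, bytes_sent): the list of fragment fingerprints
    that lets this client read the broadcast back, and the number of
    bytes this client actually had to transmit.
    """
    references = []
    bytes_sent = 0
    for frame in frames:
        fp = hashlib.sha256(frame).hexdigest()
        if fp not in repo:       # test against fingerprints already stored
            repo[fp] = frame     # only now is the fragment transmitted
            bytes_sent += len(frame)
        references.append(fp)
    return references, bytes_sent
```

The first client to store a broadcast transmits every fragment; a later client receiving the same broadcast transmits only fingerprints and ends up with the same list of references.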

In preferred implementations, the broadcast data may be video and the fragments may be frames of video. The encrypting may be performed by cellular automata, and may include dividing a data-item into segments in which at least some bits in each segment are considered to be homologous, transforming disjoint groups of homologous bits by applying a state-permutation operation separately to each group, and changing which bits are considered to be homologous and repeating the process. The arrangement of bits into segments can be expressed as having a spatial interpretation, and the spatial origin of each segment may be shifted in a manner determined by an encryption key, with bits in different segments that have the same spatial coordinates considered to be homologous. An encryption key may be used to determine what state-permutation operation is applied to each group of homologous bits in each step. Coalescence may be used for backup/mirroring in which substantially all of a personal computer's data is backed up in this fashion. The method may provide a mirroring capability for a personal computer, and mirroring software with instructions for carrying out the aforesaid steps may be preconfigured on the personal computer upon purchase. The method may provide a mirroring capability for a personal computer, and mirroring software for carrying out the method may be initially configured to mirror essentially all data on the user's computer. The method may provide a mirroring capability for a wireless network device.

In an eleventh aspect, the invention features a method for selling a backup service for backing up or mirroring data on a client computer. The method comprises accepting an unlimited amount of backup or mirroring data from a plurality of client computers, and storing the data in one or more repositories to which the client computers are connected via a network, for free or at a charge substantially less than sufficient to cover the cost of operating the backup service, and charging a substantial fee, greater than the fee charged for accepting the data, for recovery of the data from the repositories.

In preferred implementations, one or more of the following features may be incorporated. The fee charged for recovery may be greater when the recovered data is provided quickly, either by express delivery of media containing the data or by delivery over a high-speed data connection. The recovery of data over a slow-speed data connection may be provided at no fee or at a charge substantially less than sufficient to cover the cost of operating the backup service. Data coalescence using digital fingerprints may be used to reduce the amount of data transmitted and stored during backup or mirroring. A charge may be made to third parties for high-speed network access to the client data resident on the repositories.

Other features and advantages of the various aspects of the invention will be apparent from the following detailed description and from the drawings.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram depicting a user's query to the repository to determine if data is present, and transmit it if necessary.

FIG. 2 is a block diagram depicting the creation of a named object to secure future read access to a data-item.

FIG. 3 is a block diagram depicting a read operation using a named object.

FIG. 4 depicts how a mirroring client can be downloaded and run on a user's computer with very little effort, time or user supervision.

FIG. 5 depicts the data-item encryption process, which produces an encrypted data-item that is user-independent.

FIG. 6 depicts a way to allow a user to prove ownership of a named-object, without requiring the repository to hold information from which it can identify the user.

FIG. 7 illustrates the steps involved in depositing a composite-item and associating it with a named-object.

FIG. 8 illustrates the steps involved in reading a portion of a composite-item.

FIG. 9 is a block diagram depicting a user's request that the repository modify a named object to point to new data in the storage.

FIG. 10 is a block diagram depicting an embodiment of the repository's timestamping service.

FIG. 11 is a block diagram depicting an encryption scheme based on a reversible cellular automaton.

DETAILED DESCRIPTION

This invention deals with the organization and operation of a network-based data repository and an associated data services business. This organization and method of operation are designed to make it both feasible and attractive for computer users with slow network connections to store a copy of their local file system data in remote network-connected storage. The same repository organization is also designed to provide efficient storage and data transmission for users with high-bandwidth network connections. This organization addresses feasibility and attractiveness not only in technical matters, but also in societal and legal matters, such as privacy and copyright.

The envisioned data repository consists of a set of data storage devices connected to the Internet, along with the hardwa

