Disk array system and a method of avoiding failure of the disk array system, Patent Number 7,028,216, from the United States Patent and Trademark Office (PTO)


Title: Disk array system and a method of avoiding failure of the disk array system

Abstract: The invention is intended to reduce, as much as possible, disk access during data transfer from a disk in which occurrence of failure is anticipated to a spare disk, so that occurrence of double failure is prevented in advance. When occurrence of failure is anticipated in a disk that configures a RAID group, the contents stored in that disk are copied to the spare disk. Simultaneously, another RAID group is paired with the above-described RAID group and a secondary volume is provided therein. Write requests are directed to the secondary volume. A differential bitmap manages the updated data. Data that has not been updated is read from the primary volume, and data that has already been updated is read from the secondary volume. When the data transfer is completed, the contents stored in the secondary volume are copied to the primary volume.

Patent Number: 7,028,216 Issued on 04/11/2006 to Aizawa,   et al.


Inventors: Aizawa; Masaki (Odawara, JP); Katsuragi; Eiju (Odawara, JP); Fukuoka; Mikio (Odawara, JP); Okamoto; Takeki (Odawara, JP)
Assignee: Hitachi, Ltd. (Tokyo, JP)
Appl. No.: 782925
Filed: February 23, 2004


Foreign Application Priority Data

Nov 26, 2003 [JP] 2003-395322

Current U.S. Class: 714/7 ; 711/114
Current International Class: G06F 11/00 (20060101)
Field of Search: 714/5-13,42,54,702,710,718,799,805 711/161-162,113-114,154,168


References Cited [Referenced By]

U.S. Patent Documents
5579474 November 1996 Kakuta et al.
5617425 April 1997 Anderson
6442711 August 2002 Sasamoto et al.
6615314 September 2003 Higaki et al.
6757782 June 2004 Higaki et al.
6859888 February 2005 Furuya et al.
2002/0066050 May 2002 Lerman et al.
2002/0162057 October 2002 Talagala
2003/0233613 December 2003 Ash et al.
2004/0148544 July 2004 Higaki et al.
2004/0148545 July 2004 Higaki et al.
2004/0153844 August 2004 Ghose et al.
2004/0158672 August 2004 Higaki et al.
Foreign Patent Documents
7-146760 Nov., 1993 JP
Primary Examiner: Le; Dieu-Minh
Attorney, Agent or Firm: Reed Smith LLP Fisher, Esq.; Stanley P. Marquez, Esq.; Juan Carlos A.

Claims



What is claimed is:

1. A disk array system comprising: a channel adapter for controlling data transfer with respect to a host device; a plurality of data disk drives configuring a RAID group, at least one spare disk drive provided as a spare for the data disk drives; a disk adapter for controlling data transfer with respect to the data disk drives and the spare disk drive; a cache memory used by the channel adapter and the disk adapter for storing data; a control memory used by the channel adapter and the disk adapter for storing control information; a backup storage provided separately from the data disk drives and the spare disk drive; a first control unit provided in the disk adapter for observing occurrence of access error with respect to the data disk drives, the first control unit, when the frequency of occurrence of the access error exceeds a predetermined threshold, copying data stored in the data disk drive exceeding the threshold to the spare disk drive via the cache memory; a second control unit provided in the disk adapter for processing access request directed to the RAID group during the copying process, the second control unit making the backup storage take over a write request directed to the RAID group; and a third control unit provided in the disk adapter for copying data written in the backup storage by the second control unit to the data disk drives and the spare disk drive other than the data disk drive exceeding the threshold when the copying process by the first control unit is finished.

2. A disk array system according to claim 1, wherein the second control unit processes a read request directed to the data disk drive exceeding the threshold based on the data stored in the data disk drives other than the data disk drive exceeding the threshold.

3. A disk array system according to claim 1, wherein the second control unit processes a read request directed to the data disk drives other than the data disk drive exceeding the threshold based on data copied in the backup storage.

4. A disk array system according to claim 1, wherein the second control unit is associated with differential management information for controlling data written in the backup storage, and determines based on the differential management information whether the read request directed to the RAID group is processed based on data stored in the data disk drives other than the data disk drive exceeding the threshold or based on data stored in the backup storage.

5. A disk array system according to claim 1, wherein the second control unit makes only the write request directed to the data disk drive exceeding the threshold out of write requests directed to the RAID group executed by the backup storage, and makes the write request directed to the each disk drive other than the data disk drive exceeding the threshold executed by the corresponding data disk drive.

6. A disk array system according to claim 1, wherein the second control unit makes a write request directed to the RAID group executed by the backup storage when a space more than a predetermined value is left in the backup storage and makes the write request directed to the RAID group executed by the RAID group when a space more than the predetermined value is not left in the backup storage.

7. A disk array system according to claim 1, wherein the first control unit recovers data in the data disk drive exceeding the threshold based on data stored in the data disk drives other than the data disk drive exceeding the threshold, and copies the recovered data to the spare disk drive.

8. A disk array system according to claim 1, wherein a manual instruction unit for making the first control unit execute copying process is provided.

9. A disk array system according to claim 1, wherein the first control unit and the second control unit can perform multiple operations, and the backup storage accepts write requests directed to each of the plurality of RAID groups.

10. A disk array system according to claim 1, wherein the backup storage can be implemented as at least one of another RAID groups having the same configuration as the RAID group described above, a logical volume, or a disk drive.

11. A method of avoiding failure of a disk array system comprising a channel adapter for controlling data transfer with respect to a host device, a plurality of data disk drives configuring a RAID group, at least one spare disk drive provided as a spare for the data disk drives, a disk adapter for controlling data transfer with respect to the data disk drives and the spare disk drives, a cache memory used by the channel adapter and the disk adapter for storing data, a control memory used by the channel adapter and the disk adapter for storing control information, a backup storage provided separately from the data disk drives and the spare disk drive comprising: a first step of observing occurrence of an access error with respect to the data disk drives and judging whether or not the frequency of occurrence of the access error exceeds a predetermined threshold; a second step of copying data stored in the data disk drive exceeding the threshold to the spare disk drive when the data disk drive exceeding the threshold is detected in the first step; a third step of associating the RAID group with the backup storage by starting the copying process in the first step; a fourth step of judging whether or not an access request directed to the RAID group has issued during the copying process in the first step; and a fifth step of writing data in the backup storage associated in the third step when issue of the access request is detected in the fourth step, and if the access request is a write request.

12. A method of avoiding failure of a disk array system according to claim 11, further comprising a sixth step of copying data written in the backup storage in the fifth step to the each disk drive other than the data disk drive exceeding the threshold and the spare disk drive when the copying process in the second step is finished.

13. A method of avoiding failure of a disk array system according to claim 11, wherein the fifth step comprises a step of, processing a read request based on data stored in the data disk drives other than the data disk drive exceeding the threshold when the access request detected in the fourth step is the read request directed to the data disk drive exceeding the threshold.

14. A method of avoiding failure of a disk array system according to claim 11, wherein the fifth step determines whether the read request directed to the RAID group detected in the fourth step is processed based on data stored in the data disk drives other than the data disk drive exceeding the threshold or based on data stored in the backup storage by utilizing the differential management information for controlling data stored in the backup storage.

15. A method of avoiding failure of a disk array system according to claim 11, wherein the fifth step makes only the write request directed to the data disk drive exceeding the threshold out of write requests directed to the RAID group detected in the fourth step executed by the backup storage, and makes a write request directed to the each disk drive other than the data disk drive exceeding the threshold executed by the corresponding data disk drive.

16. A method of avoiding failure of a disk array system according to claim 11, wherein the second step recovers data stored in the data disk drive exceeding the threshold based on data stored in the data disk drives other than the data disk drive exceeding the threshold, and copies the recovered data to the spare disk drive.

17. A method of using a disk drive in a disk array system comprising a plurality of disk drives configuring the RAID group, comprising: a faulty drive detecting step for observing occurrence of an access error with respect to the each disk drive configuring the RAID group, and when the frequency of the access error exceeds a predetermined threshold, determining that it is a faulty disk drive; a data copying step for copying data stored in the faulty disk drive to a normal disk drive other than the each disk drive configuring the RAID group when the faulty disk drive is detected in the fault disk drive detecting step, an access request detecting step for detecting whether or not an access request directed to the RAID group has issued during the copying process in the data copying step; and an access processing step for writing data relating to the write request to a normal disk drive different from the normal disk drive in which the data is copied when a write request is detected in the access request detecting step.

18. A method of using a disk drive in a disk array system according to claim 17, further comprising a data update step for copying data written in the normal disk drive in the access processing step to the each disk drive configuring the RAID group other than the faulty disk drive and the normal disk drive in which the data is copied when the data copy in the data copying step is finished.

19. A method of using a disk drive in a disk array system according to claim 17, wherein the access processing step recovers a requested data based on data stored in the each disk drive configuring the RAID group other than the faulty disk drive when a read request directed to the faulty disk drive is detected in the access request detecting step.

20. A method of using a disk drive in a disk array system according to claim 17, wherein the data copying step restores data stored in the faulty disk drive based on data stored in the each disk drive configuring the RAID group other than the faulty disk drive and copies the recovered data to the normal disk drives.
Description



CROSS-REFERENCE TO RELATED APPLICATIONS

This application relates to and claims priority from Japanese Patent Application No. 2003-395322, filed on Nov. 26, 2003, the entire disclosure of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a disk array system having a plurality of disk drives and a method of avoiding failure of the disk array system.

2. Description of the Related Art

The disk array system, which includes a number of disk drives arranged in an array, is configured based on RAID (Redundant Array of Independent, or Inexpensive, Disks). A logical volume, which is a logical storage area, is formed on the physical storage area provided by each disk drive. A host computer can read and write desired data by issuing a read command or a write command of a predetermined format to the disk array system.

Various defensive measures are taken in the disk array system to prevent loss of the data stored in the disk drives. One is the use of a RAID configuration. For example, employing a redundant storage configuration known as RAID levels 1 to 6 in the disk array system reduces the possibility of data loss. In addition, the disk array system can store identical data in a pair of logical volumes, a primary volume and a secondary volume, by duplicating the logical volume in the RAID configuration. Alternatively, in what is known as disaster recovery, a copy of the data is stored at a remote site located far from the local site, in anticipation of unforeseen events such as natural disasters. Data stored in the disk array system is also regularly backed up to a backup device such as a tape drive.

In addition, duplication of the physical structure is also employed in the disk array system. For example, the disk array system is multiplexed by providing a plurality of main units, such as host interface circuits for performing data communication with the host computer and lower-level interface circuits for performing data communication with each disk drive. A plurality of paths for connecting these main units and a plurality of power sources for supplying power to these main units are also provided.

In addition to these units, the disk array system may be provided with one or more spare disk drives. When a failure occurs in a disk drive in which data is stored, the data stored in the faulty disk drive is copied to the spare disk drive. For example, the data in the faulty disk drive is recovered by executing an inverse operation based on the data and parity stored dispersedly in the other disk drives (JP-A-7-146760). Subsequently, the faulty disk drive is removed and replaced with a new disk drive or a spare disk drive.

In the related art, when a failure occurs in a disk drive, the data stored in the faulty disk drive is recovered based on the data and parity stored in the other, normal disk drives, and the recovered data is then stored in the spare disk drive. In other words, data copy to the spare disk drive is not performed until a failure has actually occurred in a disk drive, so the start of data copy to the spare disk drive is delayed. In addition, since the data is recovered from the normal disk drives, it takes a long time to recover the data and a long time until the data copy is completed.

In addition, when a failure subsequently occurs in part of another normal disk drive, the data required for the inverse operation cannot be obtained, and thus the data of the faulty disk drive cannot be recovered. Even in a normal disk drive, the possibility of a partial failure increases as read and write operations are repeated. When two or more pieces of information (data, parity) cannot be read, the data cannot be recovered by the inverse operation and is lost.
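The inverse operation described above can be illustrated with a minimal sketch. It assumes single-parity striping in which the parity block is the byte-wise XOR of the data blocks; the patent text does not name the operation, so the XOR rule and the function names `xor_blocks` and `recover_missing` are illustrative assumptions. The sketch also shows why a second unreadable block in the same stripe makes recovery impossible.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (assumed parity rule)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(stripe: Sequence[Optional[bytes]]) -> bytes:
    """Rebuild the one unreadable block (None) in a stripe of data plus parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        # Two or more unreadable blocks: the inverse operation has no solution.
        raise ValueError(f"cannot recover: {len(missing)} blocks unreadable")
    return xor_blocks([block for block in stripe if block is not None])


# Three data blocks and their parity; the second data block is unreadable.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)
stripe = [data[0], None, data[2], parity]
recovered = recover_missing(stripe)
```

Every surviving block in the stripe has to be read to rebuild one lost block, which is why recovery after an actual failure loads the normal disk drives so heavily.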

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a disk array system which can transfer data from a disk drive in which occurrence of failure is anticipated to a spare disk drive in a safer manner than in the related art, and a method of avoiding failure of the disk array system. Another object of the invention is to provide a disk array system in which the possibility of occurrence of failure in a normal disk drive is reduced by reducing writing and reading to the normal disk drives other than the disk drive in which occurrence of failure is anticipated, and a method of avoiding failure of the disk array system. Other objects of the invention will be apparent from the description of the embodiments below.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram generally showing a disk array system according to an embodiment of the invention;

FIG. 2 is an explanatory drawing showing a configuration of a RAID configuration management table, in which FIG. 2A shows a state before execution of a sparing process, and FIG. 2B shows a state after execution of the sparing process;

FIG. 3 is an explanatory drawing showing a configuration of a pairing information management table, in which FIG. 3A shows a state before execution of the sparing process, and FIG. 3B shows a state after execution of the sparing process;

FIG. 4 is an explanatory drawing showing a configuration of a differential bitmap;

FIG. 5 is a schematic explanatory drawing generally showing a method of avoiding data failure according to a first embodiment;

FIG. 6 is a flowchart showing the sparing process;

FIG. 7 is a flowchart showing a procedure of manual sparing process;

FIG. 8 is a flowchart showing a data backup process;

FIG. 9 is a flowchart showing a feedback process of differential data;

FIG. 10 is a schematic explanatory drawing generally showing a method of avoiding data failure according to a second embodiment;

FIG. 11 is an explanatory drawing showing a work volume management table and the like, in which FIG. 11A shows a state before execution of the sparing process, FIG. 11B shows a state after execution of the sparing process, and FIG. 11C shows a storage configuration of the work volume;

FIG. 12 is a flowchart showing a data backup process;

FIG. 13 is a flowchart showing a feedback process of differential data;

FIG. 14 is a schematic explanatory drawing generally showing a method of avoiding data failure according to a third embodiment;

FIG. 15 is an explanatory drawing showing a management table, in which FIG. 15A shows a disk management table, FIG. 15B shows a work disk management table before execution of the sparing process, and FIG. 15C shows a work disk management table after execution of the sparing process;

FIG. 16 is an explanatory drawing showing a differential management table;

FIG. 17 is a flowchart showing a data backup process;

FIG. 18 is a flowchart showing a feedback process of differential data;

FIG. 19 is a schematic explanatory drawing generally showing a method of avoiding data failure according to a fourth embodiment;

FIG. 20 is an explanatory drawing showing extended states of management tables in which FIG. 20A shows an extended work volume management table, and FIG. 20B shows an extended work disk management table, respectively;

FIG. 21 is a schematic explanatory drawing generally showing a method of avoiding data failure according to a fifth embodiment;

FIG. 22 is a flowchart showing a method of the data backup process;

FIG. 23 is a flowchart showing another example of the data backup process;

FIG. 24 is a flowchart showing still another embodiment of the data backup process;

FIG. 25 is a schematic explanatory drawing generally showing a method of avoiding data failure according to a sixth embodiment;

FIG. 26 is a flowchart showing a sparing process;

FIG. 27 is a flowchart showing the data backup process;

FIG. 28 is a flowchart showing another example of the data backup process; and

FIG. 29 is a flowchart showing still another example of the data backup process.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

In order to solve the problem described above, a disk array system according to the present invention includes a channel adapter for controlling data transfer with respect to a host device, a plurality of data disk drives configuring a RAID group, at least one spare disk drive provided as a spare for the data disk drives, a disk adapter for controlling data transfer with respect to the data disk drives and the spare disk drive, a cache memory used by the channel adapter and the disk adapter for storing data, a control memory used by the channel adapter and the disk adapter for storing management information, a backup storage provided separately from the data disk drives and the spare disk drive, a first control unit provided in the disk adapter for observing occurrence of access error with respect to the data disk drives, the first control unit, when the frequency of occurrence of the access error exceeds a predetermined threshold, copying data stored in the data disk drive exceeding the threshold to the spare disk drive via the cache memory, a second control unit provided in the disk adapter for processing an access request directed to the RAID group during the copying process by the first control unit, the second control unit making the backup storage take over a write request directed to the RAID group, and a third control unit provided in the disk adapter for copying data written in the backup storage by the second control unit to the data disk drives and the spare disk drive other than the data disk drive exceeding the threshold when the copying process by the first control unit is finished.
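The roles of the second and third control units can be sketched as follows. This is a simplified model under stated assumptions, not the patented implementation: the RAID group and the backup storage are modeled as plain dictionaries keyed by block number, a Python set stands in for the differential bitmap mentioned in the abstract, and the class name `SparingSession` and its methods are hypothetical.

```python
from typing import Dict, Set


class SparingSession:
    """Redirects writes to backup storage while the copy to the spare runs."""

    def __init__(self, primary: Dict[int, bytes]) -> None:
        self.primary = primary             # stand-in for the RAID group
        self.backup: Dict[int, bytes] = {}  # stand-in for the backup storage
        self.updated: Set[int] = set()     # stand-in for the differential bitmap
        self.copying = True

    def write(self, block: int, data: bytes) -> None:
        if self.copying:
            # Second control unit: the backup storage takes over the write.
            self.backup[block] = data
            self.updated.add(block)
        else:
            self.primary[block] = data

    def read(self, block: int) -> bytes:
        # Updated blocks come from the backup; all others from the RAID group.
        if block in self.updated:
            return self.backup[block]
        return self.primary[block]

    def finish_copy(self) -> None:
        # Third control unit: feed the differential data back after the copy.
        for block in sorted(self.updated):
            self.primary[block] = self.backup[block]
        self.updated.clear()
        self.backup.clear()
        self.copying = False


session = SparingSession({0: b"old0", 1: b"old1"})
session.write(1, b"new1")
during = (session.read(0), session.read(1), session.primary[1])
session.finish_copy()
after = session.primary[1]
```

While the copy is running, the RAID group itself receives no writes in this model, which matches the stated aim of reducing access to the drives that are still healthy.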

The channel adapter stores data received from the host device in the cache memory. The channel adapter stores a command (read command, write command) received from the host device in the control memory. The disk adapter makes reference to the contents in the control memory, reads data received from the host device from the cache memory, and stores it in a predetermined data disk drive (in the case where it is a write command).

The disk adapter also makes reference to the contents of the control memory, reads data requested from the host device from the data disk drive, and stores it in the cache memory (in the case where it is a read command). The channel adapter reads the data stored in the cache memory and transmits it to the host device.

Data (including parity) is dispersedly stored in a plurality of data disk drives configuring the RAID group. For example, in the RAID 5, there is no parity-specific disk drive, and thus the parity is also stored dispersedly in the data disk drives as in the case of normal data. The backup storage is provided for processing a write request to the RAID group, and temporarily retains data directed to the RAID group. The backup storage may be implemented, for example, as another RAID group having the same configuration as the RAID group, as one or more logical volumes, or as one or more disk drives.

The first control unit observes occurrence of access errors in the data disk drives configuring the RAID group. The access error includes, for example, a data read error and a data write error. More specifically, the access error includes, for example, a case in which data could not be written due to scratches on the disk surface, a case in which data could not be read due to deterioration of magnetization on the disk surface, and a case in which data could not be written or read due to failure or deterioration of a head. When the frequency of occurrence of the access error exceeds a threshold, the first control unit copies data stored in the data disk drive in which the access error exceeding the threshold is detected to the spare disk drive. What should be noted here is that even when the access error exceeds the threshold, a failure that disables writing and reading has not necessarily occurred. Therefore, the first control unit can read data directly from the data disk drive in which the access error exceeding the threshold is detected and transfer it to the spare disk drive. When the data cannot be read directly from that data disk drive, the first control unit can read data and parity from the other, normal data disk drives, recover the data, and then store the recovered data in the spare disk drive.
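The recovery from the remaining data and parity can be sketched as follows. This is a minimal illustration assuming a RAID 5 style layout in which parity is the bytewise XOR of the data blocks in a stripe; the text does not spell out the parity operation, and the function names and sample bytes are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks):
    """Recover the unreadable block from the other drives' data and parity.

    Assumption: parity is the XOR of all data blocks in the stripe.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks plus one parity block.
d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks([d1, d2, d3])

# The drive holding d3 cannot be read; rebuild it from the rest.
recovered = rebuild_block([d1, d2, parity])
```

The recovered block would then be written to the spare disk drive in place of a direct copy.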

Even while the first control unit is in the course of the copying process to the spare disk drive, the host computer using the disk array system can access the RAID group to read or write desired data. When a write request directed to the RAID group is issued during the copying process of the first control unit, the second control unit redirects the write request to the backup storage. In other words, new data is stored not in the data disk drives configuring the RAID group but in the backup storage. Then, when the copying process of the first control unit is finished, the third control unit copies the data stored in the backup storage to the data disk drives other than the data disk drive in which the access error exceeding the threshold is detected, and to the spare disk drive.

There is a case in which a data read request is issued to the data disk drives configuring the RAID group while the first control unit is in the course of copying into the spare disk drive. When a read request is issued to the data disk drive in which the access error exceeding the threshold is detected, the second control unit can recover the requested data from data stored in the data disk drives other than the data disk drive exceeding the threshold. The second control unit provides the recovered data to the source of the read request.

In contrast, when a read request is issued to the respective data disk drive other than the data disk drive in which the access error exceeding the threshold is detected, the second control unit can read data stored in the backup storage and provide the read-out data to the source of the read request.

Referring now to FIG. 1 to FIG. 29, embodiments of the present invention will be described. The present embodiment is characterized as shown below.

According to one configuration, the second control unit, which processes access requests while data is being transferred to the spare disk drive, is associated with differential management information for managing data written in the backup storage. The second control unit determines the storage area corresponding to a read request from the host computer based on the differential management information. When reading of data recorded in the differential management information is requested, the second control unit reads the requested data from the backup storage and provides it to the host computer. In contrast, when reading of data which is not recorded in the differential management information is requested, the second control unit recovers the data based on data stored in the data disk drives other than the data disk drive exceeding the threshold, and provides the recovered data to the host computer.

In one configuration, of the write requests directed to the RAID group, the second control unit has the backup storage execute only those directed to the data disk drive in which the access error exceeding the threshold is detected. A write request directed to a data disk drive other than that drive is executed by the corresponding data disk drive.

In one configuration, the second control unit redirects the write request directed to the RAID group to the backup storage when free space exceeding a predetermined value is left in the backup storage. When the free space in the backup storage does not exceed the predetermined value, the second control unit makes the RAID group execute the write request directed to the RAID group.
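The space check in this configuration amounts to a simple routing decision. The sketch below is illustrative only; the function name, the return labels, and the unit of the space values are assumptions.

```python
def choose_write_target(backup_free_space, min_free_space):
    """Pick where a write directed to the RAID group is executed during sparing.

    Both arguments use the same hypothetical unit (for example, tracks).
    """
    if backup_free_space > min_free_space:
        return "backup_storage"  # enough room: redirect the write
    return "raid_group"          # not enough room: write to the RAID group itself


target_with_room = choose_write_target(backup_free_space=500, min_free_space=100)
target_when_full = choose_write_target(backup_free_space=100, min_free_space=100)
```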

In one configuration, the first control unit recovers data in the data disk drive in which an access error exceeding the threshold is detected based on data stored in the data disk drives other than the data disk drive in which the access error exceeding the threshold is detected. The first control unit makes the recovered data copied in the spare disk drive.

In one configuration, there is provided a manual instruction unit for manually starting the copying process controlled by the first control unit. In other words, even when the access error does not reach the predetermined threshold, the system administrator can, via the manual instruction unit, have the contents stored in any one of the data disk drives configuring the RAID group copied to the spare disk drive.

In one configuration, the first control unit and the second control unit can perform multiple operations. The backup storage is adapted to receive a plurality of write requests directed to the data disk drives in the RAID group.

For example, the present embodiment may be considered to be a method of avoiding failure of the disk array system. In other words, the present embodiment is a method of avoiding failure of the disk array system including a plurality of data disk drives configuring the RAID group, at least one spare disk drive provided as a spare for each of the data disk drives, and the backup storage provided separately from the data disk drives and the spare disk drive, the method including a first step to a fifth step shown below. In the first step, occurrence of an access error with respect to the data disk drives is observed and whether or not the frequency of occurrence of the access error exceeds a predetermined threshold is judged. In the second step, when the data disk drive exceeding the threshold is detected in the first step, data stored in the data disk drive exceeding the threshold is copied to the spare disk drive. In the third step, the RAID group and the backup storage are associated by starting copying process in the first step. In the fourth step, whether or not an access request directed to the RAID group has been issued is judged. In the fifth step, when issue of the access request is detected in the fourth step and the access request is a write request, data is written in the backup storage associated in the third step.

In addition, the present embodiment may be considered to be a method of using the disk drive in the disk array system. In other words, the present embodiment is a method of using the disk drive in the disk array system including a plurality of disk drives configuring the RAID group and includes the following steps. In a faulty drive detecting step, occurrence of an access error with respect to the data disk drives configuring the RAID group is observed, and when the frequency of the access error is detected to exceed a predetermined threshold, the data disk drive concerned is determined to be a faulty disk drive. In a data copying step, when the faulty disk drive is detected in the faulty drive detecting step, data stored in the faulty disk drive is copied to a normal disk drive different from the data disk drives configuring the RAID group. In an access request detecting step, whether or not an access request directed to the RAID group has been issued during the copying process in the data copying step is detected. In an access processing step, when a write request is detected in the access request detecting step, data relating to the write request is written to a normal disk drive different from the normal disk drive to which the data is copied.

1. First Embodiment

Referring to FIG. 1 to FIG. 9, an embodiment of the present invention will be described. FIG. 1 is a schematic block diagram showing a configuration of a disk array system 10.

The disk array system 10 is connected to a plurality of host computers 1 via a communication network CN1 so as to be capable of communicating with them in both directions. The communication network CN1 here includes, for example, a LAN (Local Area Network), a SAN (Storage Area Network), and the Internet. When using the LAN, data transfer between the host computer 1 and the disk array system 10 is executed according to the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol. When using the SAN, the host computer 1 and the disk array system 10 perform data transfer according to the Fibre Channel protocol. When the host computer 1 is a mainframe, data transfer is performed according to a communication protocol such as FICON (Fibre Connection: Registered Trademark), ESCON (Enterprise System Connection: Registered Trademark), ACONARC (Advanced Connection Architecture: Registered Trademark), or FIBARC (Fiber Connection Architecture: Registered Trademark).

Each host computer 1 is implemented, for example, as a server, a personal computer, a workstation, or a mainframe. For example, each host computer 1 is connected to a plurality of client terminals (not shown in the drawing) via a separate communication network. Each host computer 1 provides a service to each client terminal by writing/reading data to/from the disk array system 10 according to a request from each client terminal.

The disk array system 10 includes channel adapters (hereinafter referred to as CHA) 11, disk adapters (hereinafter referred to as DKA) 12, a shared memory 13, a cache memory 14, a switch unit 15, and disk drives 16, each of which will be described later. The CHA 11 and the DKA 12 are implemented by cooperation between a printed board on which a processor and a memory are mounted, and a control program.

The disk array system 10 is provided with a plurality of, for example, four or eight CHAs 11. The CHA 11 is provided according to the type of the host computer 1, such as a CHA for an open system or a CHA for a mainframe system. Each CHA 11 controls data transfer with respect to the host computer 1. Each CHA 11 is provided with a processor unit, a data communication unit, and a local memory unit (not shown).

Each CHA 11 receives, from the host computer 1 connected thereto, a command and data requesting writing or reading of data, and operates according to the command received from the host computer 1. The operation, including the operation of the DKA 12, will now be described. For example, upon reception of a data read request from the host computer 1, the CHA 11 stores the read command in the shared memory 13. The DKA 12 makes reference to the shared memory 13 as needed and, when an unprocessed read command is found, reads data from the disk drive 16 and stores it in the cache memory 14. The CHA 11 reads the data transferred to the cache memory 14, and transmits it to the host computer 1 which is the source of the command. Alternatively, upon reception of a data write request from the host computer 1, the CHA 11 stores the write command in the shared memory 13, and stores the received data in the cache memory 14. The DKA 12 stores the data stored in the cache memory 14 to a predetermined disk drive 16 according to the command stored in the shared memory 13.
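The division of labor between the CHA 11 and the DKA 12 can be illustrated with a small model. This is only a sketch under simplifying assumptions: the shared memory 13 is modeled as a command queue, the cache memory 14 and the disk drive 16 as dictionaries keyed by a hypothetical address, and all class and method names are invented for illustration.

```python
from collections import deque


class DiskArrayModel:
    """Toy model of the CHA/DKA hand-off via shared memory and cache memory."""

    def __init__(self):
        self.shared_memory = deque()  # pending commands (control information)
        self.cache_memory = {}        # address -> data
        self.disk_drive = {}          # address -> data

    def cha_receive_write(self, address, data):
        # CHA: store the data in cache and the write command in shared memory.
        self.cache_memory[address] = data
        self.shared_memory.append(("write", address))

    def cha_receive_read(self, address):
        # CHA: store the read command in shared memory.
        self.shared_memory.append(("read", address))

    def dka_process_pending(self):
        # DKA: consult shared memory and process every unprocessed command.
        while self.shared_memory:
            command, address = self.shared_memory.popleft()
            if command == "write":
                self.disk_drive[address] = self.cache_memory[address]
            else:
                self.cache_memory[address] = self.disk_drive[address]

    def cha_return_read(self, address):
        # CHA: read the staged data from cache and return it to the host.
        return self.cache_memory[address]


array = DiskArrayModel()
array.cha_receive_write(0x10, b"payload")
array.dka_process_pending()
array.cache_memory.clear()  # force the next read to come from the disk drive
array.cha_receive_read(0x10)
array.dka_process_pending()
read_back = array.cha_return_read(0x10)
```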

In the disk array system 10, a plurality of, for example, four or eight DKAs 12 are provided. Each DKA 12 controls data communication with respect to each disk drive 16, and includes a processor unit, a data communication unit, and a local memory (not shown). Each DKA 12 and each disk drive 16 are connected via a communication network CN2, such as a SAN, and perform data transfer in blocks according to the Fibre Channel protocol. Each DKA 12 observes the state of the disk drive 16 as needed, and the result of such observation is transmitted to the SVP 2 via an internal network CN3.

The disk array system 10 is provided with a plurality of disk drives 16. The disk drive 16 is implemented as a hard disk drive (HDD) or a semiconductor memory device, for example. Here, for example, a RAID group 17 may be configured of four disk drives 16. The RAID group 17 is a disk group implementing redundant storage of data according to, for example, RAID 5 (not limited to RAID 5). At least one logical volume 18 (LU), which is a logical storage area, may be set on a physical storage area provided by each RAID group 17.

The shared memory 13, which is an example of the "control memory", is configured, for example, of a non-volatile memory, and stores control information or management information. The cache memory 14 mainly stores data.

An SVP (Service Processor) 2 is a computer device for managing and observing the disk array system 10. The SVP 2 collects various environment information and performance information from each CHA 11 and each DKA 12 via the communication network CN3 provided in the disk array system 10. Information that the SVP 2 collects includes, for example, a system structure, a power source alarm, a temperature alarm, and an I/O speed (IOPS). The communication network CN3 is configured, for example, as a LAN. The system administrator can set the RAID configuration and perform blocking of various packages (CHA, DKA, disk drive, etc.) via a user interface provided by the SVP 2.

FIG. 2 is a schematic explanatory drawing showing the configuration of a RAID configuration management table T1 stored in the disk array system 10. The RAID configuration management table T1 is stored, for example, in the shared memory 13. The RAID configuration management table T1 coordinates, for example, the RAID group number (Group # in the drawing), the logical volume number (Volume # in the drawing), the disk drive number (Disk # in the drawing), and the RAID level with respect to each other. As with the other tables shown below, the characters and values shown in the table are intended to make description easier, and are different from those actually stored. An example of the content of the RAID configuration management table T1 will now be described. For example, in the RAID group 17 of the group number 1, three logical volumes 18 in total, with the volume numbers 1 to 3, are set. The RAID group 17 is configured of a total of four disk drives 16 specified by the disk numbers 1 to 4. The RAID group 17 specified by the group number 1 is operated by RAID 5.

In the present embodiment, as described later, when a sign of occurrence of failure in a certain disk drive 16 is detected, data written to the RAID group to which that disk drive 16 belongs is backed up to another RAID group (or logical volume or disk drive).

FIG. 2A shows a configuration before setting the RAID group 17 for backup, and FIG. 2B shows a configuration after setting the RAID group 17 for backup. As shown in FIG. 2A, no intended use is set to the RAID group 17 specified by the group number 5, and thus no logical volume is set at the beginning. When occurrence of failure is anticipated in any one of the disk drives 16 belonging to the RAID group 17 of the group number 1, the unused RAID group 17 specified by the group number 5 is used as the RAID group 17 for backup. To the RAID group 17 (#5) used for data backup, the same number of logical volumes (#13 to #15) as the logical volumes 18 (#1 to #3) set to the original RAID group 17 (#1) is set.
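The transition from FIG. 2A to FIG. 2B can be sketched as an update of a table like T1. The dictionary layout, the function name, the disk numbers of group 5, and the way new volume numbers are chosen are all assumptions made for illustration; the description itself notes that the values shown in the tables differ from those actually stored.

```python
# Hypothetical in-memory form of the RAID configuration management table T1.
raid_config = {
    1: {"volumes": [1, 2, 3], "disks": [1, 2, 3, 4], "level": "RAID5"},
    5: {"volumes": [], "disks": [17, 18, 19, 20], "level": "RAID5"},  # unused group
}


def reserve_backup_group(config, source_group, first_free_volume):
    """Find an unused RAID group and give it as many volumes as the source group."""
    volume_count = len(config[source_group]["volumes"])
    for group, entry in config.items():
        if group != source_group and not entry["volumes"]:
            entry["volumes"] = list(
                range(first_free_volume, first_free_volume + volume_count)
            )
            return group
    return None  # no unused group is available


backup_group = reserve_backup_group(raid_config, source_group=1, first_free_volume=13)
```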

FIG. 3 is a schematic explanatory drawing showing the configuration of a pairing information management table T2 stored in the disk array system 10. The pairing information management table T2 is stored in, for example, the shared memory 13, and manages the logical volumes 18 constituting each pair.

The pairing information management table T2 coordinates, for example, the primary volume number, the secondary volume number, the pairing status, and the differential bitmap with respect to each other. The pairing information management table T2 shown in FIG. 3A shows a state before setting the logical volumes 18 for data backup. In FIG. 3A, for example, a certain logical volume 18 (#4) is paired as a main side with another logical volume 18 (#7) as a sub side. The pairing status is "DUPLEX". "DUPLEX" means that the contents stored in the primary volume and the secondary volume are synchronized. The differential bitmap, which will be further described later, is information for managing the differential of data between the primary volume and the secondary volume.

FIG. 3B shows a case in which the RAID group 17 for data backup is set. The respective logical volumes 18 (#1 to #3) of the RAID group 17 (#1) are coordinated one-to-one with the respective logical volumes 18 (#13 to #15) set to the RAID group 17 (#5). In other words, in the example shown in FIG. 3B, the logical volume 18 (#1) pairs with the logical volume 18 (#13), the logical volume 18 (#2) pairs with the logical volume 18 (#14), and the logical volume 18 (#3) pairs with the logical volume 18 (#15). The pairing status of these pairs is not "DUPLEX", but "UPDATE DATA BEING BACKED UP". "UPDATE DATA BEING BACKED UP" means a state in which update data originally directed to the logical volumes 18 (#1 to #3) is being backed up to the logical volumes 18 (#13 to #15) which are the destination of the data backup. The state of "UPDATE DATA BEING BACKED UP" and the state of "DUPLEX" differ from each other in whether an initial copying process is performed. In normal duplication, the initial copying process is first performed to match the contents of the primary volume and of the secondary volume. However, in the state of "UPDATE DATA BEING BACKED UP", the initial copying process is not performed.
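The one-to-one pairing of FIG. 3B can be sketched as building rows of a table like T2. The row layout, the function name, and the track count are hypothetical; only the volume numbers and the status strings come from the description.

```python
BACKING_UP = "UPDATE DATA BEING BACKED UP"


def pair_for_backup(primary_volumes, secondary_volumes, tracks_per_volume):
    """Pair each primary volume with one secondary volume, without initial copy."""
    if len(primary_volumes) != len(secondary_volumes):
        raise ValueError("primary and secondary volume counts must match")
    return [
        {
            "primary": primary,
            "secondary": secondary,
            "status": BACKING_UP,
            # All bits start at 0: nothing has been redirected yet.
            "differential_bitmap": [0] * tracks_per_volume,
        }
        for primary, secondary in zip(primary_volumes, secondary_volumes)
    ]


pairing_table = pair_for_backup([1, 2, 3], [13, 14, 15], tracks_per_volume=4)
```

Because no initial copy is performed, the secondary volumes start empty and hold only the data written after pairing.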

FIG. 4 is an explanatory drawing showing a differential bitmap 20. In the present embodiment, as shown in FIG. 4A, the primary volume pairs with the secondary volume, and when data writing (update) to the primary volume is requested, this data is stored in the secondary volume. Assuming that data (#1) and data (#2) are updated, these data are stored in the secondary volume. Then, the differential bits corresponding to the update data are set to "1", respectively. The state in which "1" is set to the differential bit means that the data in the secondary volume has not yet been copied to the primary volume, in other words, that new data is stored in the secondary volume. Therefore, when data reading is requested and the differential bit corresponding to the requested data is set to "1", it can be determined that the data is stored in the secondary volume. In contrast, when the differential bit corresponding to the data to be read is set to "0", it can be determined that the requested data is stored in the primary volume.

As shown in FIG. 4B, the differential bitmap 20 is an aggregation of the differential bits. The differential bitmap 20 is an example of "differential management information". In the present embodiment, the respective differential bits correspond to the respective tracks on the disk. Therefore, the update management unit is the "track". When an update of data smaller than the update management unit is performed, all data in the track to which the update data belongs is read into the cache memory 14, and is combined with the update data on the cache memory 14. The track combined on the cache memory 14 is stored in the secondary volume and the corresponding differential bit is set to "1".
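The handling of an update smaller than one track can be sketched as a read-merge-write on a cache buffer. This is a minimal illustration: the track size, the byte-array volumes, and the function name are hypothetical, and the choice of which volume the track is staged from is an assumption, since the description does not state it.

```python
TRACK_SIZE = 8  # hypothetical track size in bytes, kept small for illustration


def write_partial_update(primary, secondary, diff_bits, offset, data):
    """Store a sub-track update in the secondary volume as a full track."""
    track = offset // TRACK_SIZE
    start = track * TRACK_SIZE
    rel = offset - start
    if rel + len(data) > TRACK_SIZE:
        raise ValueError("update must fit inside one track in this sketch")
    # Assumption: the newest copy of the track is in the secondary volume when
    # its differential bit is already 1, otherwise in the primary volume.
    source = secondary if diff_bits[track] == 1 else primary
    cache_buffer = bytearray(source[start:start + TRACK_SIZE])  # stage whole track
    cache_buffer[rel:rel + len(data)] = data                    # merge the update
    secondary[start:start + TRACK_SIZE] = cache_buffer          # store merged track
    diff_bits[track] = 1                                        # mark track updated


primary = bytearray(b"A" * 16)  # two tracks of existing data
secondary = bytearray(16)       # backup volume, initially empty
diff_bits = [0, 0]
write_partial_update(primary, secondary, diff_bits, offset=10, data=b"xy")
```

After the call, the second track of the secondary volume holds the existing data with the two updated bytes merged in, and the primary volume is untouched.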

FIG. 5 is a schematic explanatory drawing showing a method of avoiding failure according to the present embodiment. In the example shown in FIG. 5, it is assumed that occurrence of failure in the fourth disk drive 16 (#4) belonging to the RAID group 17(P) is anticipated. As described in detail later, when read errors or write errors in a disk drive exceed the threshold, it is judged that there is a risk of occurrence of failure in that disk drive, here the disk drive 16 (#4). Therefore, the contents stored in the disk drive 16 (#4) in which occurrence of failure is anticipated are first read out to the cache memory 14, and then copied from the cache memory 14 to the spare disk drive 16 (SP) (S1).

When data copy to the spare disk drive 16 (SP) is started, an unused RAID group is reserved out of a plurality of RAID groups 17 in the disk array system 10 (S2). Then, the RAID group 17(P) including the disk drive 16 (#4) in which the occurrence of failure is anticipated is paired as a main side with the unused RAID group 17(S) reserved in the step S2 as a sub side. A primary volume 18 (P) set to the main RAID group 17(P) and a secondary volume 18(S) set to the sub RAID group 17(S) constitute a pair (S3). Information on such pairing is registered in the pairing information management table T2.

When data writing is requested from the host computer 1 during data transfer to the spare disk drive 16(SP), the data is stored not in the primary volume 18(P), but in the secondary volume 18(S)(S4). When the data is stored in the secondary volume 18(S), the differential bit corresponding to the update data is set to "1", and managed by the differential bitmap 20 (S5).

When data reading is requested from the host computer 1 during data transfer to the spare disk drive 16 (SP), the DKA 12 makes reference to the differential bitmap 20 and determines in which of the primary volume 18(P) and the secondary volume 18(S) the data requested by the host computer 1 is stored. When the differential bit corresponding to the requested data is set to "0", the requested data is stored in the primary volume 18(P). Then, the DKA 12 reads the requested data from the primary volume 18(P), and copies it to the cache memory 14. The CHA 11 transmits the data transferred to the cache memory 14 to the host computer 1 (S6). On the other hand, when the differential bit corresponding to the data requested by the host computer 1 is set to "1", the requested data exists in the secondary volume 18(S). Then the DKA 12 reads the requested data from the secondary volume 18(S) and copies it to the cache memory 14. As described above, the CHA 11 transmits the data transferred to the cache memory 14 to the host computer 1 (S7).
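The handling of host requests during sparing (S4 to S7) can be sketched as follows. The class and its track-indexed lists are hypothetical; the sketch also omits the cache memory 14 and any parity-based recovery needed when the requested data resides on the drive in which failure is anticipated.

```python
class SparingAccess:
    """Toy model of request handling while data is copied to the spare drive."""

    def __init__(self, primary_tracks):
        self.primary = list(primary_tracks)
        self.secondary = [None] * len(self.primary)
        self.diff_bits = [0] * len(self.primary)

    def write(self, track, data):
        # S4: the update goes to the secondary volume, not the primary volume.
        self.secondary[track] = data
        # S5: record the redirected update in the differential bitmap.
        self.diff_bits[track] = 1

    def read(self, track):
        if self.diff_bits[track] == 1:
            return self.secondary[track]  # S7: newest data is in the secondary
        return self.primary[track]        # S6: data still lives in the primary


volume = SparingAccess(["p0", "p1", "p2"])
volume.write(1, "new1")
results = [volume.read(0), volume.read(1), volume.read(2)]
```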

When data transfer to the spare disk drive 16(SP) is completed, the DKA 12 makes reference to the differential bitmap 20 and copies the data backed up in the secondary volume 18(S) to the primary volume 18(P) (S8). More specifically, out of the disk drives 16 belonging to the main RAID group 17(P), the data stored in the secondary volume 18(S) is copied to the disk drives 16 (#1 to #3) other than the disk drive 16 (#4) in which failure is anticipated, and to the spare disk drive 16(SP). Needless to say, not all the data stored in the secondary volume 18(S) is copied to each of the disk drives 16 (#1 to #3) and the spare disk drive 16(SP). Only the necessary data is copied to the corresponding disk.
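The selection of what to copy back in S8 can be sketched as a plan built from the differential bitmap. The mapping from tracks to disk drives, the function name, and the "SP" label are hypothetical; the point illustrated is that only tracks whose bit is "1" are copied, and that data which belonged to the drive in which failure is anticipated goes to the spare drive instead.

```python
def plan_copy_back(diff_bits, track_to_disk, faulty_disk, spare_disk="SP"):
    """Return (track, destination disk) for every track that must be copied back."""
    plan = []
    for track, bit in enumerate(diff_bits):
        if bit != 1:
            continue  # not updated during sparing: nothing to copy
        disk = track_to_disk[track]
        destination = spare_disk if disk == faulty_disk else disk
        plan.append((track, destination))
    return plan


# Hypothetical layout: track i is held by disk drive i + 1; drive 4 is being replaced.
copy_plan = plan_copy_back(
    diff_bits=[1, 0, 1, 1],
    track_to_disk=[1, 2, 3, 4],
    faulty_disk=4,
)
```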

Subsequently, referring to FIG. 6, the copying process to the spare disk drive 16(SP) shown in S1 in FIG. 5 will be described. In the present embodiment, data copy to the spare disk drive 16(SP) may be referred to as "sparing". The flowchart shown in FIG. 6 shows an example of a "first control unit", a "first step", a "second step", a "faulty drive detecting step", and a "data copying step". The process shown in FIG. 6 is executed by the DKA 12. Each flowchart shows the outline of the process and differs from the actual computer program; the same applies to the respective flowcharts shown later.

The DKA 12 observes access errors (I/O errors) in each disk drive 16 (S11). When occurrence of an error is detected (YES in S11), the DKA 12 manages the number of occurrences of error for each type of error (S12). The DKA 12 can manage the access errors that have occurred by using an error management table T3 shown in FIG. 6. The number of occurrences of access error (N1 to N3 . . . ) is managed for each type (ET1 to ET3 . . . ), and thresholds Th1 to Th3 . . . are set for the respective types of access error. Although only one error management table is shown in FIG. 6, the error management is performed for each disk drive 16 being used.

The access error can be classified, for example, into read errors and write errors. The access error can also be classified, for example, into recoverable errors and unrecoverable errors. The recoverable error means an error of the type that recovery of data can easily be achieved by ECC (Error-Correcting Code). The unrecoverable error means an error of the type that cannot be recovered by the redundant data (ECC) attached to each data and hence must be recovered at the higher level (inverse operation using other data and parity). Detailed examples of the access error include, for example, a case in which data cannot be written due to existence of physical scratches on the disk surface, a case in which data cannot be read because magnetism on the disk surface is deteriorated, and a case in which data cannot be read and written due to failure of a magnetic head.

As shown on the lower side of the error management table T3, the thresholds Th are different between the recoverable error and the unrecoverable error. The threshold Th of the recoverable error is set to a relatively high value, and the threshold Th of the unrecoverable error is set to a relatively low value. Although at least three types of errors are shown and a threshold Th is set for each type of error in the error management table T3 in FIG. 6, this is shown only by way of example, and it is possible to limit the classification to only two types of errors: the recoverable error and the unrecoverable error. Alternatively, it is also possible to classify the errors in further detail and set thresholds Th for a number of types of errors, respectively, as shown in the error management table T3.

The DKA 12 makes reference to the error management table T3 and judges whether or not the frequency of occurrence of the access error exceeds the threshold Th for each of the disk drives 16 being used (S13). When the frequency of occurrence of the access error does not exceed the threshold Th (NO in S13), the process is terminated. On the other hand, when the frequency of occurrence of the access error exceeds the threshold Th (YES in S13), it means that occurrence of failure is anticipated in the corresponding disk drive 16. Then, the DKA 12 starts data transfer, copying the contents stored in the disk drive 16 in which occurrence of failure is anticipated (hereinafter, this drive may be referred to as the faulty drive) to the spare disk drive 16 (SP) (S14). The procedure in S14 is repeated until data transfer is completed (NO in S15). When data transfer to the spare disk drive 16 (SP) is completed (YES in S15), the processing is terminated.
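The per-type counting and threshold judgment of S12 and S13 can be sketched as follows. The class name, the two error types, and the threshold values are hypothetical; one such table would be kept for each disk drive 16 in use.

```python
from collections import Counter


class ErrorManagementTable:
    """Toy version of the error management table T3 for a single disk drive."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)  # error type -> threshold Th
        self.counts = Counter()             # error type -> occurrences N

    def record(self, error_type):
        # S12: count the occurrence under its error type.
        self.counts[error_type] += 1

    def failure_anticipated(self):
        # S13: the drive is judged faulty when any type exceeds its threshold.
        return any(
            self.counts[error_type] > threshold
            for error_type, threshold in self.thresholds.items()
        )


# Hypothetical thresholds: high for recoverable errors, low for unrecoverable ones.
table = ErrorManagementTable({"recoverable": 100, "unrecoverable": 3})
for _ in range(3):
    table.record("unrecoverable")
at_threshold = table.failure_anticipated()
table.record("unrecoverable")
over_threshold = table.failure_anticipated()
```

When `failure_anticipated()` becomes true, sparing (S14) would be started for that drive.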

In the processing described above, the threshold Th is set for each error type, and the disk drive is judged to be a faulty disk drive when the frequency of occurrence of the access error of any type exceeds the corresponding threshold Th. However, the invention is not limited thereto, and it is also possible to judge whether or not a disk drive is a faulty disk drive by comprehensively analyzing the access errors.

FIG. 7 shows processing in a case in which sparing is implemented manually via the SVP 2. The procedure shown in FIG. 7 is executed mainly by the cooperation of the SVP 2 and the DKA 12. This procedure includes a configuration corresponding to the "manual instruction unit".

The SVP 2 collects error information relating to the respective disk drives 16 from the respective DKAs 12 via the internal network CN3 (S21). The SVP 2 displays the collected error information on the terminal screen of the SVP 2 in response to a request from the system administrator, or automatically (S22). The SVP 2 (more specifically, the control program executed by the microprocessor of the SVP 2) judges whether or not the frequency of occurrence of access error exceeds the threshold Th for the respective disk drives 16 (S23). When a disk drive 16 in which the frequency of occurrence of access error exceeds the threshold Th is detected (YES in S23), the SVP 2 determines that this disk drive 16 is a faulty disk drive having a high risk of occurrence of failure in the future, and sends a warning to the system administrator (S24). This warning may be, for example, a display of a warning message, an audio output, or a flashing of a warning lamp. When there is no disk drive 16 in which the frequency of occurrence of access error exceeds the threshold Th (NO in S23), S24 is skipped.

The system administrator can issue an instruction to start sparing according to the warning issued in S24 or, even when no warning is issued, according to his/her own judgment. The instruction to start sparing by manual operation of the system administrator is given via the user interface of the SVP 2 (for example, input through a keyboard switch or a voice instruction). The DKA 12 judges whether or not the instruction to start sparing is issued by the system administrator (S25). When there is no start instruction by manual operation (NO in S25), whether or not the processing is to be terminated is determined (S26). For example, when the system administrator issues an instruction to terminate the processing by operating a menu or the like (YES in S26), the processing is terminated. When the system administrator does not issue an instruction to terminate the processing (NO in S26), the procedure goes back to S21 and collection of error information and so on are repeated.

When the system administrator issues an instruction to start sparing by manual operation (YES in S25), the contents of a disk drive 16 which the system administrator indicated or the disk drive 16 warned in S24, or the disk drive 16 which the system administrator indicated and a warned disk drive 16 are copied in the spare disk drive 16 (SP) (S27). When data transfer to the spare disk drive 16 (SP) is completed (YES in S28), the processing is terminated.

FIG. 8 is a flowchart showing a data backup process. The data backup process is activated by starting sparing, and is executed by the DKA 12. The processing shown in FIG. 8 is an example corresponding to a "second control unit", "third step" to "fifth step", an "access request detection step" and an "access processing step" respectively.

The DKA 12 monitors sparing, that is, whether or not data copy from the faulty disk drive 16 to the spare disk drive 16 (SP) has started (S31). When the start of sparing is detected (YES in S31), the DKA 12 judges whether or not an unused RAID group 17 exists (S32). When no unused RAID group 17 exists (NO in S32), the data backup area cannot be reserved, and the processing is terminated.

When an unused RAID group 17 is found (YES in S32), the DKA 12 pairs the RAID group 17 including the faulty disk drive 16, as the main side, with the found unused RAID group 17, as the sub side (S33). When a plurality of logical volumes 18 are set in the main RAID group 17, logical volumes 18 of the same number and the same sizes are set in the sub RAID group 17, and each logical volume 18 on the main side is paired with the corresponding logical volume 18 on the sub side.

The DKA 12 refers to the shared memory 13 as needed, and monitors whether or not an access request (read request or write request) is issued from the host computer 1 (S34). When no access request is issued from the host computer 1 (NO in S34), the DKA 12 judges whether or not the sparing is finished (S35). When the sparing is not finished (NO in S35), the procedure returns to S34. When the sparing is finished (YES in S35), the DKA 12 copies the data stored in the secondary volume 18 to the primary volume 18 (S36), deletes the pairing of the volumes (S37), and terminates the processing.

When an access request is issued from the host computer 1 during sparing (YES in S34), the DKA 12 judges whether or not the access request is a read request (indicated by READ in the drawing) (S38). When it is a read request (YES in S38), the DKA 12 refers to the differential bitmap 20 and judges whether or not the differential bit corresponding to the data requested to be read out is set to the value "1" (in the drawing, the case in which the differential bit is set to "1" is indicated by ON, and the case in which the differential bit is set to "0" is indicated by OFF) (S39).

When the differential bit is set to "1" (YES in S39), the requested data exists in the secondary volume 18. Therefore, the DKA 12 reads data from the secondary volume 18, and stores it in the cache memory 14 (S40). When the differential bit corresponding to the data requested to be read out is set to "0" (NO in S39), since the requested data exists in the primary volume 18, the DKA 12 reads data from the primary volume 18, and stores it in the cache memory 14 (S41). When the requested data is stored in the faulty disk drive 16, data is not read out directly from the faulty disk drive 16, but the requested data is recovered based on data stored in another normal disk drive 16.
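The recovery of data "based on data stored in another normal disk drive" can be illustrated with parity. The sketch below assumes a RAID 5 style layout with XOR parity, which the passage above does not state; the helper name `xor_blocks` and the sample byte strings are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe; d1 is assumed to sit on the faulty drive.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# Rebuild d1 from the surviving data blocks and the parity block,
# without reading the faulty drive.
recovered = xor_blocks([d0, d2, parity])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block, so the faulty drive need not be accessed during sparing.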

When the access request issued from the host computer 1 is a write request (NO in S38), the DKA 12 sets the differential bit corresponding to the data to be written (data to be updated) to "1" (S42), and stores the data to be written into the secondary volume 18 (S43).
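The handling of read and write requests during sparing (S38 to S43) can be sketched as follows. The class `PairedVolume`, its list-based volumes, and the one-bit-per-address bitmap granularity are assumptions made for illustration; the cache memory 14 is omitted.

```python
from typing import Any, List


class PairedVolume:
    """Hypothetical model of a primary/secondary volume pair with a differential bitmap."""

    def __init__(self, primary: List[Any]) -> None:
        self.primary = list(primary)             # primary volume (main side)
        self.secondary = [None] * len(primary)   # secondary volume (sub side)
        self.diff = [0] * len(primary)           # differential bitmap, one bit per address

    def read(self, addr: int) -> Any:
        if self.diff[addr] == 1:         # S39: bit ON, data was updated during sparing
            return self.secondary[addr]  # S40: read from the secondary volume
        return self.primary[addr]        # S41: read from the primary volume

    def write(self, addr: int, data: Any) -> None:
        self.diff[addr] = 1              # S42: mark the address as updated
        self.secondary[addr] = data      # S43: store the update in the secondary volume


vol = PairedVolume(["a", "b", "c"])
vol.write(1, "B")
```

After the write, a read of address 1 is served from the secondary volume, while addresses 0 and 2 are still served from the primary volume, and the primary volume itself is left unmodified while sparing is in progress.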

FIG. 9 is a flowchart showing the feedback processing of the differential data. The differential data feedback processing is executed by the DKA 12 when the sparing is finished. This processing corresponds to the details of S36 in FIG. 8, and is an example corresponding to a "third control unit", a "sixth step", and a "data update step".

The DKA 12 sets the feedback pointer to the first address of the logical volume (S51). The DKA 12 judges whether or not the differential bit corresponding to the address is set to "1" (S52). When the differential bit is set to "1" (YES in S52), the DKA 12 copies the data of this address from the secondary volume 18 to the primary volume 18 (S53). More specifically, the data read out from the secondary volume 18 is copied to the cache memory 14, and then copied from the cache memory 14 to the primary volume 18. When the data copy for one address is finished, the DKA 12 moves the feedback pointer to the next address (S54). The DKA 12 then judges whether or not the feedback of the differential data is completed (S55). In other words, the DKA 12 judges whether or not the feedback pointer has reached the terminal position. As long as the feedback of the differential data is not completed (NO in S55), the procedure from S52 to S54 is repeated.
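The feedback loop of S51 to S55 can be sketched as follows. The function name `feed_back_differential` and the list-based volumes are assumptions for illustration. The intermediate copy through the cache memory 14 is omitted, and the differential bits are left unchanged because the passage above does not say whether they are cleared.

```python
from typing import Any, List


def feed_back_differential(primary: List[Any], secondary: List[Any], diff: List[int]) -> int:
    """Copy every address whose differential bit is 1 from secondary to primary.

    Returns the number of addresses copied.
    """
    pointer = 0                                   # S51: start at the first address
    copied = 0
    while pointer < len(primary):                 # S55: stop at the terminal position
        if diff[pointer] == 1:                    # S52: was this address updated?
            primary[pointer] = secondary[pointer]  # S53: copy secondary to primary
            copied += 1
        pointer += 1                              # S54: advance the feedback pointer
    return copied


primary = ["a", "b", "c", "d"]
secondary = [None, "B", None, "D"]
diff = [0, 1, 0, 1]
count = feed_back_differential(primary, secondary, diff)
```

Only the addresses marked in the bitmap are copied, so the cost of the feedback grows with the number of updates made during sparing rather than with the size of the volume's data.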

Acc

