Method and system for performing subword permutation instructions for use in two-dimensional multimedia processing, Patent Number 7,092,526, from the United States Patent and Trademark Office (PTO)


Title: Method and system for performing subword permutation instructions for use in two-dimensional multimedia processing

Abstract: The method and system provide a set of permutation primitives for current and future 2-D multimedia programs which are based on decomposing images and objects into atomic units, then finding the permutations desired for the atomic units. The subword permutation instructions for these 2-D building blocks are also defined for larger subword sizes at successively higher hierarchical levels. The atomic unit can be a 2×2 matrix and four triangles contained within the 2×2 matrix. Each of the elements in the matrix can represent a subword of one or more bits. The permutations provide vertical, horizontal, diagonal, rotational, and other rearrangements of the elements in the atomic unit.
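The rearrangements named in the abstract can be illustrated with a minimal sketch. It assumes a 2×2 atomic unit [[a, b], [c, d]] stored as a flat tuple (a, b, c, d); the function names and the choice of which triangle to rotate are hypothetical and are not taken from the patent, which defines these operations as processor instructions rather than software routines.

```python
# Illustrative sketch only: all names here are hypothetical, not from the patent.
# A 2x2 atomic unit [[a, b], [c, d]] is stored as the flat tuple (a, b, c, d).

def swap_vertical(m):
    """Vertical rearrangement: exchange the two elements in each column."""
    a, b, c, d = m
    return (c, d, a, b)

def swap_horizontal(m):
    """Horizontal rearrangement: exchange the two elements in each row."""
    a, b, c, d = m
    return (b, a, d, c)

def swap_diagonal(m):
    """Diagonal rearrangement: exchange each element with its diagonal partner."""
    a, b, c, d = m
    return (d, c, b, a)

def rotate_clockwise(m):
    """Rotate all four elements one position clockwise around the matrix."""
    a, b, c, d = m
    return (c, a, d, b)

def rotate_triangle(m):
    """Rotate the triangle (a, b, c) by one position; d is left in place.
    Which three elements form the triangle is an assumption for illustration."""
    a, b, c, d = m
    return (c, a, b, d)

unit = (1, 2, 3, 4)  # represents [[1, 2], [3, 4]]
print(swap_vertical(unit))
print(swap_horizontal(unit))
print(swap_diagonal(unit))
print(rotate_clockwise(unit))
print(rotate_triangle(unit))
```

Because the functions only move elements and never inspect them, each element could itself be a larger subword or a nested 2×2 block, which is one way to picture the same permutations applied at successively higher hierarchical levels.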

Patent Number: 7,092,526 Issued on 08/15/2006 to Lee


Inventors: Lee; Ruby B. (Princeton, NJ)
Assignee: Teleputers, LLC (Princeton, NJ)
Appl. No.: 09/850,380
Filed: May 7, 2001


Current U.S. Class: 380/37 ; 708/520; 712/10
Current International Class: H04K 1/04 (20060101); G06F 15/00 (20060101); G06F 7/32 (20060101)
Field of Search: 708/100,520 712/1,10,20,16,24,200 380/28,37,42-47


References Cited

U.S. Patent Documents
4751733 June 1988 Delayaye et al.
Primary Examiner: Song; Hosuk
Attorney, Agent or Firm: Mathews, Shepherd, McKay & Bruneau, P.A.

Claims



What is claimed is:

1. A method for permuting two dimensional (2-D) data in a programmable processor comprising the steps of: decomposing said two dimensional data into at least one atomic element said two dimensional data being located in at least one source register said at least one atomic element of said two dimensional data is a 2×2 matrix and said two dimensional data is decomposed into data elements in said matrix; determining at least one permutation instruction for rearrangement of said data in said atomic element; said data elements being rearranged by said at least one permutation instruction, each of said data elements representing a subword having one or more bits; and applying said permutation instructions to said subwords and placing said permutated subwords into a destination register.

2. The method of claim 1 further comprising a triangle in said matrix, said data elements in said triangle being rearranged by said at least one permutation instruction.

3. The method of claim 2 wherein said permutation instruction rotates a first one of said data elements by one or more positions in said triangle.

4. The method of claim 1 wherein said permutation instruction swaps a first one of said data elements and a second one of said data elements, said first one of said data elements and said second one of said data elements being in the same column of said matrix.

5. The method of claim 1 wherein said permutation instruction swaps a first one of said data elements and a second one of said data elements, said first one of said data elements and said second one of said data elements being in the same row of said matrix.

6. The method of claim 1 wherein said permutation instruction swaps a first one of said data elements and a second one of said data elements, said first one of said data elements and said second one of said data elements being diagonal to one another in said matrix.

7. The method of claim 1 wherein said permutation instruction rotates a first one of said data elements by one or more positions in said matrix.

8. The method of claim 1 wherein said programmable processor is a microprocessor, digital signal processor, media processor, multimedia processor, cryptographic processor or programmable System-On-Chip (SOC).

9. The method of claim 1 wherein said permutation instruction alternately selects a first subword from a first column of said matrix and a second subword from said first column of said matrix and swaps the selected said first subword and the selected said second subword.

10. The method of claim 1 wherein said permutation instruction swaps a first subword in a first row of said matrix with a second subword in said first row of said matrix.

11. The method of claim 1 wherein said permutation instruction alternately selects a first subword from a first column of said matrix and a second subword from said first column of said matrix, swaps the selected said first subword and the selected said second subword and swaps the swapped first subword in a first row of said matrix with a third subword in said first row of said matrix or the swapped second subword in a second row of said matrix with a fourth subword in said second row of said matrix.

12. The method of claim 1 wherein said permutation instruction conditionally selects a first subword from a first column of said matrix and a second subword from said first column of said matrix dependent on a permutation control bit and swaps the selected said first subword and the selected said second subword.

13. The method of claim 1 wherein said permutation instruction conditionally swaps a first subword in a first row of said matrix with a second subword in said first row of said matrix dependent on a permutation control bit.

14. The method of claim 1 wherein said permutation instruction conditionally selects a first subword from a first column of said matrix and a second subword from said first column of said matrix dependent on a permutation control bit, swaps the selected said first subword and the selected said second subword and conditionally swaps the swapped first subword in a first row of said matrix with a third subword in said first row of said matrix or the swapped second subword in a second row of said matrix with a fourth subword in said second row of said matrix dependent on a permutation control bit.

15. The method of claim 1 wherein said permutation instruction defines a size of said subword, defines a subset of subwords in said sequence of subwords, swaps a first subword in said subset with a second subword in said subset and concatenates the swapped first subword and second subword.

16. The method of claim 1 wherein said permutation instruction conditionally concatenates one or more odd elements of a first said subword sequentially with one or more second odd elements of a second said subword.

17. The method of claim 16 wherein said odd elements of a first said subword and odd elements of a second said subword are 32-bit subwords, 16-bit subwords or 8-bit subwords and said first subword and said second subword are 64-bit subwords.

18. The method of claim 1 wherein said permutation instruction conditionally concatenates one or more first even elements of a first said subword sequentially with one or more second even elements of a second said subword.

19. The method of claim 18 wherein said even elements of said first said subword and said even elements of said second said subword are 32-bit subwords, 16-bit subwords or 8-bit subwords and said first subword and said second subword are 64-bit subwords.

20. The method of claim 1 wherein said permutation instructions for said atomic unit are defined for larger subword sizes at successively higher hierarchical levels.

21. A system for permuting two-dimensional (2-D) data in a programmable processor comprising: at least one source register containing said two dimensional data; means for decomposing said two dimensional data into at least one atomic element said at least one atomic element of said two dimensional data is a 2×2 matrix; a destination register; means for determining at least one permutation instruction for rearrangement of said data in said atomic element said two dimensional data is decomposed into data elements in said matrix, said data elements being rearranged by said at least one permutation instruction, each of said data elements representing a subword having one or more bits; and means for placing said permutated subwords into said destination register.

22. The system of claim 21 further comprising a triangle in said matrix, said data elements in said triangle being rearranged by said at least one permutation instruction.

23. The system of claim 22 wherein said permutation instruction rotates a first one of said data elements by one or more positions in said triangle.

24. The system of claim 21 wherein said permutation instruction swaps a first one of said data elements and a second one of said data elements, said first one of said data elements and said second one of said data elements being in the same column of said matrix.

25. The system of claim 21 wherein said permutation instruction swaps a first one of said data elements and a second one of said data elements, said first one of said data elements and said second one of said data elements being in the same row of said matrix.

26. The system of claim 21 wherein said permutation instruction swaps a first one of said data elements and a second one of said data elements, said first one of said data elements and said second one of said data elements being diagonal to one another in said matrix.

27. The system of claim 21 wherein said permutation instruction rotates a first one of said data elements by one or more positions in said matrix.

28. The system of claim 21 wherein said permutation instruction conditionally selects a first subword from a first column of said matrix and a second subword from said first column of said matrix dependent on a permutation control bit and swaps the selected said first subword and the selected said second subword.

29. The system of claim 21 wherein said permutation instruction conditionally swaps a first subword in a first row of said matrix with a second subword in said first row of said matrix dependent on a permutation control bit.

30. The system of claim 21 wherein said permutation instruction conditionally selects a first subword from a first column of said matrix and a second subword from said first column of said matrix dependent on a permutation control bit, swaps the selected said first subword and the selected said second subword and conditionally swaps the swapped first subword in a first row of said matrix with a third subword in said first row of said matrix or the swapped second subword in a second row of said matrix with a fourth subword in said second row of said matrix dependent on a permutation control bit.

31. The system of claim 21 wherein said permutation instruction defines a size of said subword, defines a subset of subwords in said sequence of subwords, swaps a first subword in said subset with a second subword in said subset and concatenates the swapped first subword and second subword.

32. The system of claim 21 wherein said permutation instruction conditionally concatenates one or more odd elements of a first said subword sequentially with one or more second odd elements of a second said subword.

33. The system of claim 21 wherein said odd elements of said first said subword and said odd elements of said second said subword are 32-bit subwords, 16-bit subwords or 8-bit subwords and said first subword and said second subword are 64-bit subwords.

34. The system of claim 21 wherein said permutation instruction conditionally concatenates one or more first even elements of a first said subword sequentially with one or more second even elements of a second said subword.

35. The system of claim 21 wherein said even elements of said first said subword and said even elements of said second said subword are 32-bit subwords, 16-bit subwords or 8-bit subwords and said first subword and said second subword are 64-bit subwords.

36. The system of claim 21 wherein said programmable processor is a microprocessor, digital signal processor, media processor, multimedia processor, cryptographic processor or programmable System-On-Chip (SOC).

37. The system of claim 21 wherein said permutation instructions for said atomic unit are defined for larger subword sizes at successively higher hierarchical levels.

38. A method for performing subword permutations in a programmable processor comprising the steps of: in response to a permutation instruction alternately selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register; and concatenating the selected said first subword and the selected said second subword into a third sequence of subwords in a third register wherein said permutation instruction comprises a parameter for determining the number of bits in said first subword and said second subword to be selected, a reference to a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords.

39. The method of claim 38 further comprising the step of repeating said alternately selecting step for each of said subwords in said first sequence of subwords and each of said subwords in said second sequence of subwords.

40. The method of claim 38 wherein each subword comprises one or more bits.

41. A method for performing subword permutation in a programmable processor comprising the steps of: swapping a first subword in a first register with a second subword in a sequence of subwords in a second register and concatenating the swapped said first subword and said second subword into a second sequence of subwords in a third register wherein said permutation instruction comprises a parameter for determining the number of bits in said first subword and said second subword to be swapped, a reference to a source register which contains said sequence of subwords and optionally a reference to a destination register which contains said second sequence of subwords.

42. The method of claim 41 further comprising the step of repeating said swapping step for each of said subwords in said sequence of subwords.

43. The method of claim 41 wherein each subword comprises one or more bits.

44. A method for performing subword permutation in a programmable processor comprising the steps of: in response to a permutation instruction alternately selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register; concatenating the selected said first subword and the selected said second subword into a third sequence of subwords in a third register; swapping a third subword in said third sequence of subwords with a fourth subword in said second sequence or said third sequence of subwords; and concatenating the swapped said third subword with the swapped said fourth subword into a fourth sequence of subwords.

45. The method of claim 44 further comprising the step of repeating said alternately selecting step for each of said subwords in said first sequence of subwords and repeating said swapping step for each of said subwords in said third sequence of subwords.

46. The method of claim 44 wherein said permutation instruction comprises a parameter for determining the number of bits to be selected and to be swapped, a reference to a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords or said fourth sequence of subwords.

47. The method of claim 44 wherein each subword comprises one or more bits.

48. A method for performing subword permutations in a programmable processor comprising the steps of: in response to a permutation instruction conditionally alternately selecting a first subword from a first sequence of subwords and a second subword from a second sequence of subwords dependent on permutation control bits; and concatenating the selected said first subword and the selected said second subword into a third sequence of subwords wherein said permutation instruction comprises a control bit configuration for determining said permutation control bits, a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords.

49. The method of claim 48 further comprising the step of repeating said conditionally selecting step for each of said subwords in said first sequence of subwords and each of said subwords in said second sequence of subwords.

50. The method of claim 48 wherein each subword comprises one or more bits.

51. A method for performing subword permutation in a programmable processor comprising the steps of: conditionally swapping a first subword with a second subword in a sequence of subwords dependent on permutation control bits in a first register and concatenating the swapped said first subword and said second subword into a second sequence of subwords in a second register wherein said permutation instruction comprises a control bit configuration for determining said permutation control bits, a reference to a source register which contains said sequence of subwords and optionally a reference to a destination register which contains said second sequence of subwords.

52. The method of claim 51 further comprising the step of repeating said conditionally swapping step for each of said subwords in said sequence of subwords.

53. The method of claim 51 wherein each subword comprises one or more bits.

54. A method for performing subword permutation in a programmable processor comprising the steps of: in response to a permutation instruction conditionally selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register dependent on permutation control bits; concatenating the selected said first subword and the selected said second subword into a third sequence of subwords in a third register; conditionally swapping a third subword in said third sequence of subwords with a fourth subword in said second sequence or said third sequence of subwords dependent on said permutation control bits; and concatenating the swapped said third subword with the swapped said fourth subword into a fourth sequence of subwords.

55. The method of claim 54 further comprising the step of repeating said conditionally selecting step for each of said subwords in said first sequence of subwords and repeating said conditionally swapping step for each of said subwords in said third sequence of subwords.

56. The method of claim 54 wherein said permutation instruction comprises a control bit configuration for determining said permutation control bits, a reference to a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords or said fourth sequence of subwords.

57. The method of claim 54 wherein each subword comprises one or more bits.

58. A method for performing subword permutation of a sequence of subwords in a programmable processor comprising the steps of: defining a size of said subword; defining a subset of subwords in said sequence of subwords; swapping a first subword in said subset in a first register with a second subword in a sequence of subwords in a second register and concatenating the swapped first subword and second subword into a second sequence of subwords in a third register; and repeating said swapping step for consecutive subsets of subwords wherein said permutation instruction comprises a parameter for indicating said size of said subword, a parameter for indicating a number of elements in each said subset; a parameter for indicating permutation configuration bits, a source register which contains said first sequence of subwords and optionally a reference to a destination register which contains said second sequence of subwords.

59. The method of claim 58 wherein each subword comprises one or more bits.

60. A system for performing subword permutations in a programmable processor comprising: in response to a permutation instruction, means for alternately selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register; and means for concatenating the selected said first subword and the selected said second subword into a third sequence of subwords in a third register wherein said permutation instruction comprises a parameter for determining the number of bits in said first subword and said second subword to be selected, a reference to a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords.

61. The system of claim 60 further comprising means for repeating said means for alternately selecting a first subword for each of said subwords in said first sequence of subwords and each of said subwords in said second sequence of subwords.

62. A system for performing subword permutation in a programmable processor comprising: means for swapping a first subword in a first register with a second subword in a sequence of subwords in a second register and concatenating the swapped said first subword and said second subword into a second sequence of subwords in a third register wherein said permutation instruction comprises a parameter for determining the number of bits in said first subword and said second subword to be swapped, a reference to a source register which contains said sequence of subwords and optionally a reference to a destination register which contains said second sequence of subwords.

63. The system of claim 62 further comprising means for repeating said means for swapping for each of said subwords in said sequence of subwords.

64. A system for performing subword permutation in a programmable processor comprising: in response to a permutation instruction, means for alternately selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register; means for concatenating the selected said first subword and the selected said second subword into a third sequence of subwords in a third register; means for swapping a third subword in said third sequence of subwords with a fourth subword in said second sequence or said third sequence of subwords; and means for combining the said third sequence of subwords with the swapped said fourth subword into a fourth sequence of subwords.

65. The system of claim 64 further comprising means for repeating said means for alternately selecting for each of said subwords in said first sequence of subwords and repeating said means for swapping for each of said subwords in said second or third sequence of subwords.

66. The system of claim 64 wherein said permutation instruction comprises a parameter for determining the number of bits to be selected and to be swapped, a reference to a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords or said fourth sequence of subwords.

67. A system for performing subword permutations in a programmable processor comprising the steps of: in response to a permutation instruction means for conditionally selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register dependant on permutation control bits; and means for concatenating the selected said first subword and the selected said second subword into a third sequence of subwords in a third register wherein said permutation instruction comprises a control bit configuration for determining said permutation control bits, a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords.

68. The system of claim 67 further comprising means for repeating said means for conditionally selecting for each of said subwords in said first sequence of subwords and each of said subwords in said second sequence of subwords.

69. A system for performing subword permutation in a programmable processor comprising: in response to a permutation instruction, means for conditionally swapping a first subword in a first register with a second subword in a sequence of subwords in a second register dependant on permutation control bits and concatenating the swapped said first subword and said second subword into a second sequence of subwords in a third register wherein said permutation instruction comprises a control bit configuration for determining said permutation control bits, a reference to a source register which contains said sequence of subwords and optionally a reference to a destination register which contains said second sequence of subwords.

70. The system of claim 69 further comprising means for repeating said means for conditionally swapping for each of said subwords in said sequence of subwords.

71. A system for performing subword permutation in a programmable processor comprising: in response to a permutation instruction, means for conditionally selecting a first subword from a first sequence of subwords in a first register and a second subword from a second sequence of subwords in a second register dependant on permutation control bits; means for concatenating the selected said first subword and the selected said second subword into a third sequence of subwords; means for conditionally swapping a third subword in said third sequence of subwords with a fourth subword in said second sequence or said third sequence of subwords dependant on said permutation control bits in a third register; and means for combining the third sequence of subwords with the swapped said fourth subword into a fourth sequence of subwords.

72. The system of claim 71 further comprising means for repeating said means for conditionally selecting each of said subwords in said first sequence of subwords and repeating said means for conditionally swapping for each of said subwords in said second or third sequence of subwords.

73. The system of claim 71 wherein said permutation instruction comprises a control bit configuration for determining said permutation control bits, a reference to a first source register which contains said first sequence of subwords, a reference to a second source register which contains said second sequence of subwords and optionally a reference to a destination register which contains said third sequence of subwords or said fourth sequence of subwords.

74. A system for performing subword permutation of a sequence of subwords in a programmable processor comprising: means for defining a size of said subword; means for defining a subset of subwords in said sequence of subwords; means for swapping a first subword in said subset in a first register with a second subword in a sequence of subwords in a second register and concatenating the swapped first subword and second subword into a second sequence of subwords; and means for repeating said swapping step for consecutive subsets of subwords.

75. The system of claim 74 wherein said permutation instruction comprises a parameter for indicating said size of said subword, a parameter for indicating a number of elements in each said subset; a parameter for indicating permutation configuration bits, a source register which contains said first sequence of subwords and optionally a reference to a destination register which contains said second sequence of subwords.
Description



BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to permuting subwords packed in registers in which the subwords can be re-arranged within a register and between registers for achieving parallelism in subsequent processing, such as two-dimensional multimedia processing.

2. Description of the Related Art

Efficient processing of multimedia information like images, video and graphics breaks both the sequential processing paradigm and the linear data processing paradigm inherent in the design of computers. Computers have been conventionally designed primarily to process linear sequences of data: memory is addressed as a linear sequence of bytes or words, and data is fetched into the programmable processor and processed sequentially. Efficient processing of pixel-oriented visual material is inherently parallel rather than sequential, and two-dimensional rather than linear (1-D).

Multimedia extensions have been added to general-purpose processors to accelerate the processing of different media types, see Ruby Lee, "Subword Parallelism with MAX-2", IEEE Micro, Vol. 16 No. 4, August 1996, pp. 51-59; IA-64 Application Developer's Architecture Guide, Intel Corporation, Order Number: 245188-001, May 1999. http://developer.intel.com/design/ia64; and AltiVec Extension to PowerPC Instruction Set Architecture Specification. Motorola, Inc., May 1998. http://www.motorola.com/AltiVec. Subword parallelism has been deployed by multimedia instructions in microprocessor architectures and in media processors to accelerate the processing of lower-precision data, like 16-bit audio samples or 8-bit pixel components. SIMD (Single Instruction Multiple Data) parallel processor techniques within a single processor have been referred to as microSIMD architecture, see Ruby Lee, "Efficiency of microSIMD Architectures and Index-Mapped Data for Media Processing", Proceedings of Media Processors 1999, IS&T/SPIE Symposium on Electronic Imaging: Science and Technology, January 1999, pp. 34-46. A subword-parallel (or microSIMD) instruction performs the same operation in parallel on multiple pairs of subwords packed into two registers, which are conventionally 32 to 128 bits wide in microprocessors and media processors. For example, a 64-bit word-oriented datapath can be partitioned into eight 8-bit subwords, or four 16-bit subwords, or two 32-bit subwords.

Conventional shift and rotate instructions have been used to move all the bits in a register by the same amount. Extract and deposit instructions, found in instruction-set architectures like PA-RISC, move one field using one or two instructions, as described in Ruby Lee, "Precision Architecture", IEEE Computer, Vol. 22, No. 1, January 1989, pp. 78-91. Early subword permutation instructions like mix and permute in the PA-RISC MAX-2 multimedia instructions were a first attempt to find efficient and general-purpose subword permutation primitives, as described in Ruby Lee, "Subword Parallelism with MAX-2", IEEE Micro, Vol. 16 No. 4, August 1996, pp. 51-59. The subwords in the source register are numbered and a permute instruction specifies the new ordering desired in terms of this numbering. The mux instruction in IA-64, described in IA-64 Application Developer's Architecture Guide, Intel Corporation, Order Number: 245188-001, May 1999. http://developer.intel.com/design/ia64, and the vperm instruction in AltiVec, described in AltiVec Extension to PowerPC Instruction Set Architecture Specification. Motorola, Inc., May 1998. http://www.motorola.com/AltiVec, are similar. There is a limit to the efficiency of the permute instruction for many subwords, since the number of control bits quickly exceeds the number of bits permuted. Permuting four subwords requires only 8 control bits, which can be encoded in the permute instruction itself. Beyond four elements and up to sixteen elements, any arbitrary permutation can still be performed with one instruction, by providing the control bits for the permutation in a second source register, rather than in the 32-bit instruction. Permuting 32 elements requires 160 bits, and permuting 64 elements requires 384 bits (n log2 n bits for n elements). Hence, permuting more than 16 elements cannot be achieved by a single instruction with two source registers, using this method of specifying permutations.
The problem is further complicated by the fact that image, video or graphics processing requires mapping of two-dimensional objects onto subwords in multiple registers and then permuting these subwords between registers.
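The control-bit counts quoted above follow from needing log2 n bits to name the source position of each of the n permuted subwords. The short Python sketch below reproduces that arithmetic; the function name permute_control_bits is hypothetical and is introduced here for illustration only.

```python
def permute_control_bits(n_elements: int) -> int:
    """Control bits needed to specify an arbitrary permutation of n subwords.

    Each of the n output positions names one of n source positions,
    which takes ceil(log2(n)) bits; (n - 1).bit_length() computes that
    value exactly with integer arithmetic.
    """
    bits_per_index = (n_elements - 1).bit_length()
    return n_elements * bits_per_index


# Counts for the element numbers discussed in the text.
for n in (4, 16, 32, 64):
    print(n, "elements:", permute_control_bits(n), "control bits")
```

With 16 elements the 64 control bits still fit in one 64-bit source register, whereas the counts for 32 and 64 elements do not, which is the limit described above.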

U.S. Pat. No. 5,673,321 describes a computer instruction (MIXxx) which selects subword items from two source registers in pre-defined ways, for example: MIXWL (Mix Word Left) concatenates the left half (32 bits) of register R1 with the left half of register R2. MIXWR (Mix Word Right) concatenates the right half of R1 with the right half of R2. MIXHL (Mix Half-word Left) concatenates in turn the first half-words of R1 and R2, followed by the third half-words of R1 and R2. MIXHR (Mix Half-word Right) concatenates in turn the second half-words of R1 and R2, followed by the fourth half-words of R1 and R2, and the like. The instruction also may contain other fields. For example, the MIXxx instructions described above may be used to transpose a 4×4 matrix of half-words contained in four registers R1, R2, R3, R4, each with 4 half-words. MIXBx selects alternate bytes from two source registers, R1 and R2, in two pre-defined ways: MIXBL alternates the 4 odd bytes of R1 with the 4 odd bytes of R2; MIXBR alternates the 4 even bytes of R1 with the 4 even bytes of R2. The MIXBL instruction may be used, for example, to unpack and pack bytes into and out of the more significant half of corresponding half-words. This instruction may be used to "unpack" a register with 8 bytes into 2 registers of 4 half-words each, with each byte being the more significant byte of each half-word. The MIXBL instruction may also be used to unpack and pack bytes into and out of the less significant half of corresponding half-words.

It is desirable to provide efficient subword permutation instructions that can be used for parallel execution for example in 2-D multimedia processing.

SUMMARY OF THE INVENTION

The present invention provides single-cycle instructions, which can be used to construct any type of permutation needed in two-dimensional (2-D) multimedia processing. The instructions can be used in a programmable processor, such as a digital signal processor, video signal processor, media processor, multimedia processor, cryptographic processor or programmable System-on-a-Chip (SOC).

The method and system provide a set of permutation primitives for current and future 2-D multimedia programs which are based on decomposing images and objects into atomic units, then finding the permutations desired for the atomic units. The subword permutation instructions for these 2-D building blocks are also defined for larger subword sizes at successively higher hierarchical levels. The atomic unit can be a 2×2 matrix and four triangles contained within the 2×2 matrix. Each of the elements in the matrix can represent a subword of one or more bits. The permutations provide vertical, horizontal, diagonal, rotational, and other rearrangements of the elements in the atomic unit.

The subword permutation primitives of the present invention include: CHECK, EXCHANGE, EXCHECK, CCHECK, CEXCHANGE, CEXCHECK, CMIX and PERMSET instructions. The CHECK instruction provides downward and upward swapping of elements. The CCHECK instruction provides conditional downward and upward swapping of elements dependent on permutation control bits. The EXCHANGE instruction provides right and left movement. The CEXCHANGE instruction provides conditional right and left movement. The EXCHECK instruction provides rotation of triangles of the matrix. The CEXCHECK instruction provides conditional rotation of triangles. The CMIX instruction provides conditional selection of elements from two source registers in predetermined ways. The PERMSET instruction allows the permutation of a smaller set of subwords to be repeated on other subwords in the source register, enabling symmetric permutations to be specified on many more elements, without increasing the number of permutation control bits. The EXCHANGE instruction is one example of the PERMSET instruction.

An initial alphabet (Alphabet A) of subword permutations is determined which comprises CMIX, PERMSET, CHECK and EXCHECK. Processors designed for high performance can implement Alphabet A, while very cost-sensitive processors can choose to implement a smaller set of instructions in a minimal alphabet, which can include the CMIX and PERMSET instructions. The instructions omitted from Alphabet A, CHECK and EXCHECK, can be composed from CMIX and PERMSET. All 24 permutations of a 2×2 matrix can be obtained using only instructions from Alphabet A, in a single cycle, in a processor with at least two permutation units.

The subword permutation primitives of the present invention enhance the use of subword parallelism by allowing in-place rearrangement of packed subwords across multiple registers, reducing the need for memory accesses with potentially costly cache misses. The alphabet of permutation primitives is easy to implement and is useful for 2-D multimedia processing and for other data-parallel computations using subword parallelism.

The invention will be more fully described by reference to the following drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the present invention, reference may be made to the accompanying drawings.

FIG. 1 is a schematic diagram of a system for implementing permutation instructions in accordance with an embodiment of the present invention.

FIG. 2 is a flow diagram of a method for permutation of subwords to be used in parallel processing.

FIG. 3A is a schematic diagram of an area mapping of a 4×4 matrix.

FIG. 3B is a schematic diagram of decomposition of the 4×4 matrix shown in FIG. 3A into four 2×2 matrices.

FIG. 4A is a schematic diagram of eight nearest neighbor movements for a pixel in a 2-D frame.

FIG. 4B is a schematic diagram of nearest neighbor movement for four 2×2 matrices.

FIG. 4C is a schematic diagram of nearest neighbor movements for a 2×2 matrix.

FIG. 5A is a schematic diagram of rotation of a 2×2 matrix.

FIG. 5B is a schematic diagram of eight permutations of a 2×2 matrix, representing the rotations of the four triangles contained in the 2×2 matrix.

FIG. 6 is a schematic diagram of a matrix transpose of a 4×4 matrix.

FIG. 7 is a schematic diagram of data rearrangements of a 2×2 matrix in which rows are changed into diagonals and diagonals are changed into columns.

FIG. 8A is a diagram of an initial "alphabet A" of subword permutation primitives.

FIG. 8B is a diagram of an alternate alphabet of subword permutation primitives.

DETAILED DESCRIPTION

Reference will now be made in greater detail to a preferred embodiment of the invention, an example of which is illustrated in the accompanying drawings. Wherever possible, the same reference numerals will be used throughout the drawings and the description to refer to the same or like parts.

FIG. 1 illustrates a schematic diagram of a system for implementing efficient permutation instructions 10 in accordance with the teachings of the present invention. Register file 12 includes source register 11a, source register 11b and destination register 11c. System 10 can provide different subword permutations of any one or two registers in register file 12. The same solution can be applied to different subword sizes of 2^i bits, for i = 0, 1, 2, ..., m, where n = 2^m bits. For a fixed word size of n bits, and 8-bit subwords, there are n/8 subwords to be permuted. For a fixed word size of n bits, and 1-bit subwords, there are n subwords to be permuted. For permutation instructions operating on two source registers, source register values to be permuted 13 from source register 11a and second source register values 15 from source register 11b are applied over datapaths to permutation functional unit 14. Source register values to be permuted 13 and 15 can be a sequence of bits or a sequence of subwords. For permutation instructions operating on one source register, source register values 13 from source register 11a and optionally permutation configuration bits 15 from source register 11b are sent over datapaths to permutation unit 14. Permutation functional unit 14 generates permutation result 16. Permutation result 16 can be an intermediate result if additional permutations are performed by permutation functional unit 14. For other instructions, arithmetic logic unit (ALU) 17 and shifter 18 receive source register values 13 from source register 11a and source register values 15 from source register 11b and generate a respective ALU result 20 and a shifter result 21 over a datapath to destination register 11c.
System 10 can be implemented in any programmable processor, for example, a conventional microprocessor, digital signal processor (DSP), cryptographic processor, multimedia processor, mediaprocessor, or programmable System-on-a-Chip (SOC) and can be used in developing processors or coprocessors for providing cryptography and multimedia operations.

FIG. 2 is a flow diagram of a method for permutation of subwords to be used in parallel processing 10 in accordance with the teachings of the present invention. In block 12, data to be permuted is decomposed into an atomic element. For example, the data to be permuted can comprise pixel-oriented data of images, graphics, video or animation which can be represented as two-dimensional (2-D) multimedia data. The data can be stored in memory of a programmable processor such as by using a 2-D array of pixels. The 2-D array of pixels can be, for example, an 8×8 matrix. For example, in MPEG-1 and MPEG-2 video decode and JPEG image decompression, a frequently computed function is a separable 2-D Inverse Discrete Cosine Transform (IDCT) on an 8×8 matrix. This involves eight 1-D IDCT functions on the columns, followed by eight identical 1-D IDCT functions on the rows.

The 8×8 matrix can be decomposed into four 4×4 matrices, each stored in four 64-bit registers, as shown in FIG. 3A, in which each element is a 16-bit subword. Each such 4×4 matrix can be further decomposed into four 2×2 matrices as shown in FIG. 3B. Matrices with dimensions that are a power of two can be successively decomposed into smaller matrices, and ultimately into the smallest 2×2 matrix. Accordingly, the smallest atomic unit for 2-D multimedia data, such as an image or a frame, is a 2×2 matrix. A 2-D object within a frame can also be decomposed into smaller blocks in which the smallest 2-D rectangular block is a 2×2 matrix of pixels.

A regular decomposable permutation on (2^m × 2^n) elements can be composed from permutations on (2^(m-1) × 2^(n-1)) elements. This decomposability can be repeated until a (2 × 2) block is reached or a (2^s × 2) block is reached for s > 1. This (2^s × 2) block can be further decomposed into (2 × 2) blocks. A square decomposable permutation on (2^m × 2^m) elements can be decomposed into permutations on (2^(m-1) × 2^(m-1)) elements. This decomposability can be repeated until basic (2 × 2) blocks are reached.

At the lowest level, referred to as the atomic unit, four pixels of a 2×2 matrix can be permuted. At the next higher level, a 2×2 matrix is permuted in which each element is now itself a 2×2 matrix, resulting in 4×4 actual elements. Accordingly, the atomic units can serve as permutation primitives for the entire frame. Alternatively, data to be permuted can be represented by non-rectangular objects. Non-rectangular objects can be decomposed into non-rectangular polygons. The smallest non-rectangular polygon is a triangle. A triangle is also an atomic unit.

In block 14 of FIG. 2, permute instructions are determined for rearrangement of data in the 2-D atomic units. A first set of data rearrangements of a 2×2 matrix is to swap elements vertically, horizontally and diagonally. FIG. 4A illustrates eight nearest-neighbor movements for a pixel in a 2-D frame. FIG. 4B illustrates the 9-element matrix of FIG. 4A as four 2×2 matrices which are outlined in bold. As shown in FIG. 4B, an element of a 2×2 matrix can move to its right (or left) neighbor, its downward (or upward) neighbor, or its diagonal right (or left) neighbor. FIG. 4C illustrates all possible nearest neighbor movements, for one or two pairs of elements, for a 2×2 matrix.

In a second set of data rearrangements, the four elements of a 2×2 matrix can be rotated clockwise by 1, 2 or 3 positions as shown in FIG. 5A. This is equivalent to rotating counter-clockwise by 3, 2 or 1 positions, respectively. Rotating by 2 positions is equivalent to swapping both the diagonal and anti-diagonal elements, as shown previously in FIG. 4C. Matrices 20a-20c illustrate up or down movements of elements. Matrices 21a-21c show right or left movements of elements. Matrices 22a-22c show diagonal or anti-diagonal movements of elements. Accordingly, a permutation instruction need only be defined for clockwise or anti-clockwise rotation by 1 position.

A 2×2 matrix contains four triangles, each of which can be rotated clockwise or anti-clockwise by 1 position. The eight resulting permutations of the 2×2 matrix are shown in FIG. 5B. Each of matrices 23b, 24b, 25b and 26b is an anti-clockwise rotation of respective triangles 23a, 24a, 25a and 26a. Each of matrices 23c, 24c, 25c and 26c is a clockwise rotation of respective triangles 23a, 24a, 25a and 26a.

In block 16 of FIG. 2, a sequence of the determined permutation instructions is performed for obtaining a desired permutation.

A CHECK instruction can be used as a permutation instruction for downward and upward swapping of elements. The CHECK instruction selects alternately from the corresponding subwords in two source registers for each position in a destination register. The instruction format for the CHECK instruction can be defined as: CHECK,x R1, R2, R3 wherein x is a parameter that specifies the number of bits for each swap operation, R1 is a reference to a source register which contains a first subword sequence, R2 is a reference to a source register which contains a second subword sequence and R3 is a reference to a destination register where the permuted subwords are placed. For example, R1 consists of eight bytes (64 bits): byte a, byte b, byte c, byte d, byte e, byte f, byte g and byte h, as shown in Table 1. R2 consists of byte A, byte B, byte C, byte D, byte E, byte F, byte G and byte H. In a CHECK,8 R1, R2, R3 instruction the first 8 bits (byte a) of register R1 are put into destination register R3, the second 8 bits (byte B) of register R2 are put into destination register R3, and the like, as shown in row 31. For a CHECK,16 R1, R2, R3 instruction the first 16 bits (byte a and byte b) of register R1 are put into register R3, the second 16 bits (byte C and byte D) of register R2 are put into register R3, and the like, as shown in row 32. For a CHECK,32 R1, R2, R3 instruction the first 32 bits (byte a, byte b, byte c and byte d) of register R1 are put into register R3 and the second 32 bits (byte E, byte F, byte G and byte H) of register R2 are put into register R3, as shown in row 33. The CHECK instruction can also be defined for 4-bit subwords, 2-bit subwords and 1-bit subwords. In general, it can be defined for subwords of size 2^i bits, for i = 0, 1, 2, ..., m, where n = 2^m bits and n is the word size, which is usually the width of the registers in bits.
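The CHECK selection pattern described above can be sketched in Python as follows. This is an illustrative software model only, not the hardware implementation: each 64-bit register is modelled as a list of eight one-byte labels as in Table 1, and the function name check and the BITS_PER_LABEL constant are assumptions introduced here.

```python
BITS_PER_LABEL = 8  # assumption: each list entry stands for one byte


def check(x, r1, r2):
    """Model of CHECK,x R1,R2,R3: return the contents of R3.

    Positions are grouped into x-bit subwords; even-numbered subwords
    are copied from r1 and odd-numbered subwords from r2, each staying
    in its original position.
    """
    step = x // BITS_PER_LABEL  # labels per x-bit subword
    return [(r1 if (i // step) % 2 == 0 else r2)[i] for i in range(len(r1))]


R1 = list("abcdefgh")
R2 = list("ABCDEFGH")

for x in (8, 16, 32):
    print(f"check,{x}:", " ".join(check(x, R1, R2)))
```

For x = 8, 16 and 32 the model yields the R3 contents listed in rows 31, 32 and 33 of Table 1.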

An EXCHANGE instruction can be used as a permutation instruction for right and left movement. The EXCHANGE instruction swaps adjacent subwords in each pair of consecutive subwords in a source register. The instruction format for the EXCHANGE instruction can be defined as: EXCHANGE,x R1, R3 wherein x is a parameter that specifies the number of bits for each swap operation, R1 is a reference to a source register which contains a subword sequence and R3 is a reference to a destination register where the permuted subwords are placed. In an EXCHANGE,8 R1, R3 instruction the first 8 bits of R1 (byte a) are exchanged with the second 8 bits of R1 (byte b), and the like, as shown in row 34. In an EXCHANGE,16 R1, R3 instruction the first 16 bits of R1 (byte a and byte b) are exchanged with the second 16 bits of R1 (byte c and byte d), and the like, as shown in row 35. In an EXCHANGE,32 R1, R3 instruction the first 32 bits of R1 (byte a, byte b, byte c and byte d) are exchanged with the second 32 bits of R1 (byte e, byte f, byte g and byte h), as shown in row 36.

The EXCHANGE instruction can also be defined for 4-bit subwords, 2-bit subwords and 1-bit subwords. In general, it can be defined for subwords of size 2^i bits, for i = 0, 1, 2, ..., m, where n = 2^m bits and n is the word size, which is usually the width of the registers in bits.
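The pairwise swap performed by EXCHANGE can be modelled in the same way. The sketch below assumes a register is a list of eight one-byte labels; the function name exchange and the BITS_PER_LABEL constant are hypothetical and are used here for illustration only.

```python
BITS_PER_LABEL = 8  # assumption: each list entry stands for one byte


def exchange(x, r1):
    """Model of EXCHANGE,x R1,R3: return the contents of R3.

    The register is split into consecutive pairs of x-bit subwords and
    the two subwords of every pair trade places.
    """
    step = x // BITS_PER_LABEL  # labels per x-bit subword
    out = []
    for base in range(0, len(r1), 2 * step):
        left = r1[base:base + step]
        right = r1[base + step:base + 2 * step]
        out += right + left
    return out


R1 = list("abcdefgh")

for x in (8, 16, 32):
    print(f"exchange,{x}:", " ".join(exchange(x, R1)))
```

For x = 8, 16 and 32 the model yields the R3 contents listed in rows 34, 35 and 36 of Table 1. Applying the same exchange twice restores the original register, since each pair is simply swapped back.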

An EXCHECK instruction can be used as a permutation instruction for rotation of a triangle of three elements within a 2×2 matrix and for other permutations. The EXCHECK instruction performs a CHECK instruction on two source registers followed by an EXCHANGE instruction on the result of the CHECK instruction. The instruction format for the EXCHECK instruction can be defined as EXCHECK,x R1, R2, R3 wherein x is a parameter that specifies the number of bits for each swap operation, R1 is a reference to a source register which contains a first subword sequence, R2 is a reference to a source register which contains a second subword sequence and R3 is a reference to a destination register where the permuted subwords are placed. In an EXCHECK,8 R1, R2, R3 instruction a CHECK instruction for R1 and R2 results in destination register R3 shown in row 31. An EXCHANGE instruction on register R3 shown in row 31 exchanges the first 8 bits (byte a) with the second 8 bits (byte B), and the like, giving row 37. In an EXCHECK,16 R1, R2, R3 instruction a CHECK instruction for R1 and R2 results in destination register R3 shown in row 32. An EXCHANGE instruction on register R3 shown in row 32 exchanges the first 16 bits (byte a and byte b) with the second 16 bits (byte C and byte D), and the like, giving row 38. In an EXCHECK,32 R1, R2, R3 instruction a CHECK instruction for R1 and R2 results in destination register R3 shown in row 33. An EXCHANGE instruction on register R3 shown in row 33 exchanges the first 32 bits (byte a, byte b, byte c and byte d) with the second 32 bits (byte E, byte F, byte G and byte H), giving row 39.

The EXCHECK instruction can also be defined for 4-bit subwords, 2-bit subwords and 1-bit subwords. In general, it can be defined for subwords of size 2^i bits, for i = 0, 1, 2, ..., m, where n = 2^m bits and n is the word size, which is usually the width of the registers in bits.
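Because EXCHECK is defined as a CHECK followed by an EXCHANGE at the same subword size, it can be modelled as a composition of two small helper functions. The Python sketch below is an illustration under the same assumptions as the register examples in the text (eight one-byte labels per register); the helper names check, exchange and excheck are hypothetical.

```python
BITS_PER_LABEL = 8  # assumption: each list entry stands for one byte


def check(x, r1, r2):
    """Even-numbered x-bit subwords from r1, odd-numbered ones from r2."""
    step = x // BITS_PER_LABEL
    return [(r1 if (i // step) % 2 == 0 else r2)[i] for i in range(len(r1))]


def exchange(x, r):
    """Swap the two x-bit subwords in every consecutive pair."""
    step = x // BITS_PER_LABEL
    out = []
    for base in range(0, len(r), 2 * step):
        out += r[base + step:base + 2 * step] + r[base:base + step]
    return out


def excheck(x, r1, r2):
    """Model of EXCHECK,x R1,R2,R3: CHECK, then EXCHANGE of the result."""
    return exchange(x, check(x, r1, r2))


R1 = list("abcdefgh")
R2 = list("ABCDEFGH")

for x in (8, 16, 32):
    print(f"excheck,{x}:", " ".join(excheck(x, R1, R2)))
```

For x = 8, 16 and 32 the composition yields the R3 contents listed in rows 37, 38 and 39 of Table 1. Reversing the operand order, as in excheck(8, R2, R1), gives b A d C f E h G.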

TABLE 1

Register contents:
  R1 = a b c d e f g h
  R2 = A B C D E F G H

Row     Instruction            Definition
row 31  check,8 R1,R2,R3       R3 = a B c D e F g H
row 32  check,16 R1,R2,R3      R3 = a b C D e f G H
row 33  check,32 R1,R2,R3      R3 = a b c d E F G H
row 34  exchange,8 R1,R3       R3 = b a d c f e h g
row 35  exchange,16 R1,R3      R3 = c d a b g h e f
row 36  exchange,32 R1,R3      R3 = e f g h a b c d
row 37  excheck,8 R1,R2,R3     R3 = B a D c F e H g
row 38  excheck,16 R1,R2,R3    R3 = C D a b G H e f
row 39  excheck,32 R1,R2,R3    R3 = E F G H a b c d

A MIX operation, defined in U.S. Pat. No. 5,673,321, hereby incorporated by reference into this application, can be used for swapping of diagonal elements. The MIX operation selects either all even elements, or all odd elements, from the two source registers. A MIXL instruction can be used to interleave the corresponding "even" elements from the two source registers, starting from the leftmost elements in each register. A MIXR instruction can be used to interleave the corresponding "odd" elements from the two source registers, ending with the rightmost elements in each register.

Table 2 defines MIXL and MIXR instructions, for three different subword sizes: 8 bits, 16 bits and 32 bits. Each letter in the register contents R1 and R2 represents an 8-bit subword, and each register holds a total of 64 bits.

TABLE 2

Register contents:
  R1 = a b c d e f g h
  R2 = A B C D E F G H

Instruction          Definition
MixL,8 R1,R2,R3      R3 = a A c C e E g G
MixR,8 R1,R2,R3      R3 = b B d D f F h H
MixL,16 R1,R2,R3     R3 = a b A B e f E F
MixR,16 R1,R2,R3     R3 = c d C D g h G H
MixL,32 R1,R2,R3     R3 = a b c d A B C D
MixR,32 R1,R2,R3     R3 = e f g h E F G H
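The MIXL and MIXR definitions in Table 2 can be reproduced with a short Python model. As before, this is a sketch under stated assumptions rather than the implementation of U.S. Pat. No. 5,673,321: registers are lists of eight one-byte labels, and the function name mix and its side argument are introduced here for illustration.

```python
BITS_PER_LABEL = 8  # assumption: each list entry stands for one byte


def mix(x, r1, r2, side):
    """Model of MixL,x / MixR,x R1,R2,R3: return the contents of R3.

    side == "L" takes the even-numbered x-bit subwords (starting with
    the leftmost), side == "R" the odd-numbered ones. Each selected
    subword of r1 is followed by the corresponding subword of r2.
    """
    step = x // BITS_PER_LABEL  # labels per x-bit subword
    start = 0 if side == "L" else step
    out = []
    for base in range(start, len(r1), 2 * step):
        out += r1[base:base + step] + r2[base:base + step]
    return out


R1 = list("abcdefgh")
R2 = list("ABCDEFGH")

for x in (8, 16, 32):
    for side in ("L", "R"):
        print(f"Mix{side},{x}:", " ".join(mix(x, R1, R2, side)))
```

The six printed lines correspond to the six rows of Table 2.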

One example of a decomposable permutation is the matrix transpose of a 2-D object, in which the matrix is flipped along its diagonal: rows become columns, and columns become rows. For example, an 8×8 matrix of 16-bit elements stored in 16 registers can be decomposed into four 4×4 matrices (FIG. 3a), each of which can be further decomposed into four 2×2 matrices (FIG. 3b). By transposing each of the 2×2 matrices, then transposing the larger 2×2 matrix, where each element is itself one of these 2×2 matrices, a matrix transpose of a 4×4 matrix can be obtained as shown in FIG. 6. The MIX instructions can be used to perform the hierarchical 2×2 matrix transpositions. The MIXL and MIXR instructions are first used in pairs at a subword size equal to the matrix element size. Thereafter, the MIXL and MIXR instructions are used at a subword size that is twice as large. Repeating this on each of the four 4×4 matrices determines the transpose of the original 8×8 matrix.
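The two-level use of MIXL and MIXR for a single 4×4 matrix can be sketched as follows. This is an illustration under stated assumptions rather than the patent's exact instruction sequence: each register is modeled as a list of four elements (one matrix row), the `group` argument gives the subword size in elements instead of bits, and the names `mix_l`, `mix_r` and `transpose4x4` are hypothetical.

```python
def mix_l(group, r1, r2):
    """Interleave the even-numbered groups of `group` elements from r1 and r2."""
    out = []
    for start in range(0, len(r1), 2 * group):
        out += r1[start:start + group] + r2[start:start + group]
    return out


def mix_r(group, r1, r2):
    """Interleave the odd-numbered groups of `group` elements from r1 and r2."""
    out = []
    for start in range(group, len(r1), 2 * group):
        out += r1[start:start + group] + r2[start:start + group]
    return out


def transpose4x4(rows):
    """Transpose a 4x4 matrix held as four 4-element registers."""
    r0, r1, r2, r3 = rows
    # Level 1: subword size equals the element size; transposes each 2x2 block.
    t0, t1 = mix_l(1, r0, r1), mix_r(1, r0, r1)
    t2, t3 = mix_l(1, r2, r3), mix_r(1, r2, r3)
    # Level 2: subwords twice as large; transposes the 2x2 arrangement of blocks.
    return [mix_l(2, t0, t2), mix_l(2, t1, t3),
            mix_r(2, t0, t2), mix_r(2, t1, t3)]


matrix = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
for row in transpose4x4(matrix):
    print(row)
```

In this model the transpose takes eight MIX operations: four at the element size and four at twice the element size.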

Table 3 systematically enumerates the permutations of area-mapped 2×2 matrices, illustrating that the subword permutation instructions defined above can perform the described permutations. R1 and R2 contain four 2×2 matrices. The leftmost matrix has been highlighted in bold to indicate the permutation of the first 2×2 matrix, which is labeled initially "a b" in R1 and "A B" in R2. The permutations are enumerated as follows: each of the 4 elements in a resulting 2×2 matrix can be in the top left corner in R3. Thereafter, each of the 3 remaining elements can be in the top right corner in R3. This gives 12 possibilities for the top row, which are used for the numeric numbering of the cases. The two remaining elements of each 2×2 matrix are in the bottom row in R4, and their two possible orderings give the (a) and (b) numbering in Table 3.

TABLE 3: All Permutations of Four Area-Mapped 2×2 Matrices

Operand registers: R1 = a b c d e f g h
                   R2 = A B C D E F G H

Case    Result registers        Instructions used     Type of data movement

a at top left
1(a)    R3 = a b c d e f g h    ;R3=R1                identity permutation
        R4 = A B C D E F G H    ;R4=R2
1(b)    R3 = a b c d e f g h    ;R3=R1                swap bottom row elements right-left
        R4 = B A D C F E H G    ;R4=exchange(R2)
2(a)    R3 = a B c D e F g H    ;R3=check(R1,R2)      swap right column elements up-down
        R4 = A b C d E f G h    ;R4=check(R2,R1)
2(b)    R3 = a B c D e F g H    ;R3=check(R1,R2)      rotate bottom-right triangle anti-clockwise
        R4 = b A d C f E h G    ;R4=excheck(R2,R1)
3(a)    R3 = a A c C e E g G    ;R3=mixL(R1,R2)       swap diagonal elements = transpose
        R4 = b B d D f F h H    ;R4=mixR(R1,R2)
3(b)    R3 = a A c C e E g G    ;R3=mixL(R1,R2)       rotate bottom-right triangle clockwise
        R4 = B b D d F f H h    ;R4=mixR(R2,R1)

b at top left
4(a)    R3 = b a d c f e h g    ;R3=exchange(R1)      swap top row elements right-left
        R4 = A B C D E F G H    ;R4=R2
4(b)    R3 = b a d c f e h g    ;R3=exchange(R1)      swap both rows' elements right-left
        R4 = B A D C F E H G    ;R4=exchange(R2)
5(a)    R3 = b B d D f F h H    ;R3=mixR(R1,R2)       rotate top-right triangle anti-clockwise
        R4 = A a C c E e G g    ;R4=mixL(R2,R1)
5(b)    R3 = b B d D f F h H    ;R3=mixR(R1,R2)       rotate anti-clockwise 1 element
        R4 = a A c C e E g G    ;R4=mixL(R1,R2)
6(a)    R3 = b A d C f E h G    ;R3=excheck(R2,R1)    rotate top-left triangle anti-clockwise
        R4 = a B c D e F g H    ;R4=check(R1,R2)
6(b)    R3 = b A d C f E h G    ;R3=excheck(R2,R1)    40a
        R4 = B a D c F e H g    ;R4=excheck(R1,R2)

A at top left
7(a)    R3 = A a C c E e G g    ;R3=mixL(R2,R1)       rotate top-left triangle clockwise
        R4 = b B d D f F h H    ;R4=mixR(R1,R2)
7(b)    R3 = A a C c E e G g    ;R3=mixL(R2,R1)       rotate clockwise 1 element
        R4 = B b D d F f H h    ;R4=mixR(R2,R1)
8(a)    R3 = A b C d E f G h    ;R3=check(R2,R1)      swap left column elements up-down
        R4 = a B c D e F g H    ;R4=check(R1,R2)
8(b)    R3 = A b C d E f G h    ;R3=check(R2,R1)      rotate bottom-left triangle clockwise
        R4 = B a D c F e H g    ;R4=excheck(R1,R2)
9(a)    R3 = A B C D E F G H    ;R3=R2                swap left and right column elements up-down
        R4 = a b c d e f g h    ;R4=R1
9(b)    R3 = A B C D E F G H    ;R3=R2                40b
        R4 = b a d c f e h g    ;R4=exchange(R1)

B at top left
10(a)   R3 = B a D c F e H g    ;R3=excheck(R1,R2)    rotate top-right triangle clockwise
        R4 = A b C d E f G h    ;R4=check(R2,R1)
10(b)   R3 = B a D c F e H g    ;R3=excheck(R1,R2)    40c
        R4 = b A d C f E h G    ;R4=excheck(R2,R1)
11(a)   R3 = B b D d F f H h    ;R3=mixR(R2,R1)       rotate bottom-left triangle anti-clockwise
        R4 = a A c C e E g G    ;R4=mixL(R1,R2)
11(b)   R3 = B b D d F f H h    ;R3=mixR(R2,R1)       swap anti-diagonal elements
        R4 = A a C c E e G g    ;R4=mixL(R2,R1)
12(a)   R3 = B A D C F E H G    ;R3=exchange(R2)      40d
        R4 = a b c d e f g h    ;R4=R1
12(b)   R3 = B A D C F E H G    ;R3=exchange(R2)      swap diagonal and anti-diagonal elements = rotate clockwise by 2
        R4 = b a d c f e h g    ;R4=exchange(R1)
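The enumeration in Table 3 can be cross-checked by simulation. The sketch below assumes byte-sized subwords, models each 64-bit register as a list of eight labels, and uses illustrative function names. It evaluates the instruction pair listed for each case and confirms that the leftmost 2×2 matrix takes each of the 4! = 24 possible arrangements exactly once.

```python
from itertools import permutations

R1, R2 = list("abcdefgh"), list("ABCDEFGH")


def check(r1, r2):
    """Even positions from r1, odd positions from r2 (byte subwords)."""
    return [(r1 if i % 2 == 0 else r2)[i] for i in range(len(r1))]


def exchange(r):
    """Swap the two bytes in each adjacent pair."""
    return [r[i ^ 1] for i in range(len(r))]


def excheck(r1, r2):
    return exchange(check(r1, r2))


def mix_l(r1, r2):
    """Interleave the even-position bytes of r1 and r2."""
    return [x for i in range(0, len(r1), 2) for x in (r1[i], r2[i])]


def mix_r(r1, r2):
    """Interleave the odd-position bytes of r1 and r2."""
    return [x for i in range(1, len(r1), 2) for x in (r1[i], r2[i])]


# (R3, R4) for each case, following the "Instructions used" column of Table 3.
cases = {
    "1(a)": (R1, R2),
    "1(b)": (R1, exchange(R2)),
    "2(a)": (check(R1, R2), check(R2, R1)),
    "2(b)": (check(R1, R2), excheck(R2, R1)),
    "3(a)": (mix_l(R1, R2), mix_r(R1, R2)),
    "3(b)": (mix_l(R1, R2), mix_r(R2, R1)),
    "4(a)": (exchange(R1), R2),
    "4(b)": (exchange(R1), exchange(R2)),
    "5(a)": (mix_r(R1, R2), mix_l(R2, R1)),
    "5(b)": (mix_r(R1, R2), mix_l(R1, R2)),
    "6(a)": (excheck(R2, R1), check(R1, R2)),
    "6(b)": (excheck(R2, R1), excheck(R1, R2)),
    "7(a)": (mix_l(R2, R1), mix_r(R1, R2)),
    "7(b)": (mix_l(R2, R1), mix_r(R2, R1)),
    "8(a)": (check(R2, R1), check(R1, R2)),
    "8(b)": (check(R2, R1), excheck(R1, R2)),
    "9(a)": (R2, R1),
    "9(b)": (R2, exchange(R1)),
    "10(a)": (excheck(R1, R2), check(R2, R1)),
    "10(b)": (excheck(R1, R2), excheck(R2, R1)),
    "11(a)": (mix_r(R2, R1), mix_l(R1, R2)),
    "11(b)": (mix_r(R2, R1), mix_l(R2, R1)),
    "12(a)": (exchange(R2), R1),
    "12(b)": (exchange(R2), exchange(R1)),
}

# Leftmost 2x2 matrix: (top-left, top-right, bottom-left, bottom-right).
leftmost = {name: (r3[0], r3[1], r4[0], r4[1]) for name, (r3, r4) in cases.items()}
all_covered = set(leftmost.values()) == set(permutations("abAB"))
print(len(cases), all_covered)
```

Because every case uses at most one instruction per result register, the model also reflects the observation that R3 and R4 can be generated independently of each other.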

The subword permutation instructions used to achieve each of the 2×2 block permutations are shown. If the processor has at least two permutation units, then each case in Table 3 can be executed in one cycle, since there are no dependencies between generating R3 and generating R4. This provides for the efficiency of these permutation primitives.

Each 2×2 matrix permutation is also labeled with one of the 20 data movements, including identity, described in FIGS. 4c, 5a and 5b. Four permutations in Table 3 are not labeled with a data movement and are instead marked 40a-40d. These permutations correspond to data rearrangements of a 2×2 matrix, described as changing rows into diagonals, and changing diagonals into columns, as shown in FIG. 7.

In an alternate embodiment, permutation instructions provide conditional swaps between the targeted subwords in two registers and between subwords in one register.

The instructions can be used for all different subword sizes of 2^i bits, for i=0, 1, 2, . . . n/2. A CCHECK instruction can be used as a permutation instruction for conditional downward and upward swapping of elements. The CCHECK instruction selects conditionally from the corresponding subwords in two source registers for each position in a destination register, dependent on a control bit. The instruction format for the CCHECK instruction can be defined as: CCHECK,0xxxxxxx R1,R2,R3 wherein control bits are denoted as "xxxxxxx", R1 is a reference to a source register which contains a first subword sequence, R2 is a reference to a source register which contains a second subword sequence and R3 is a reference to a destination register where the permuted subwords are placed. If the control bit is a 1, the CCHECK instruction swaps the corresponding elements in register R1 and register R2. If the control bit is a 0, the CCHECK instruction does not swap the corresponding elements in register R1 and register R2. A control bit can be used for each potential swap between a pair of subwords. For "CHECK,8", 4 control bits are used in the CCHECK instruction to specify if the right 1-byte subword of each pair of subwords in R1 should be swapped with the corresponding subword in R2. For "CHECK,16", 2 control bits are used in the CCHECK instruction to specify if the right 2-byte subword of each pair of subwords in R1 should be swapped with the corresponding subword in R2. For "CHECK,32", 1 control bit is used in the CCHECK instruction to specify if the right 4-byte subword of R1 should be swapped with that in R2. Table 4A illustrates a comparison between a CCHECK instruction and a CHECK instruction for different subword sizes.
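The conditional selection performed by CCHECK can be sketched as follows. This is a minimal model under several assumptions that the text does not confirm: a 64-bit register is a list of eight 1-byte labels, only the destination register R3 is modeled, and the control bits are supplied as a list ordered left to right with one bit per pair of subwords. The name `ccheck` and its argument order are illustrative.

```python
# Hypothetical model: a 64-bit register is a list of eight 1-byte labels.
R1 = list("abcdefgh")
R2 = list("ABCDEFGH")


def ccheck(bits, control, r1, r2):
    """Conditional CHECK: for each pair of subwords, a control bit of 1 takes
    the right subword of the pair from r2 instead of r1."""
    k = bits // 8  # bytes per subword
    if len(control) != len(r1) // (2 * k):
        raise ValueError("one control bit is needed per pair of subwords")
    out = list(r1)
    for pair, bit in enumerate(control):
        if bit:
            start = pair * 2 * k + k  # right subword of this pair
            out[start:start + k] = r2[start:start + k]
    return out


print(" ".join(ccheck(8, [1, 0, 0, 1], R1, R2)))
print(" ".join(ccheck(16, [0, 1], R1, R2)))
print(" ".join(ccheck(32, [1], R1, R2)))
```

With every control bit set to 1 this model produces the same result as the unconditional CHECK of Table 1, and with every control bit set to 0 it returns R1 unchanged.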

A CEXCHANGE instruction can be used as a permutation instruction for conditional right and left movement. The CEXCHANGE instruction conditionally swaps adjacent subwords in each pair of consecutive subwords in a source register, dependent on a control bit. The instruction format for the CEXCHANGE instruction can be defined as CEXCHANGE,0xxxxxxx R1,R3 wherein control bits are denoted as "xxxxxxx", R1 is a reference to a source register which contains a subword sequence and R3 is a reference to a destination register where the permuted subwords are placed.
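A conditional exchange of this kind can be sketched in the same style. The model below assumes a 64-bit register is a list of eight 1-byte labels and that the control bits are supplied as a list ordered left to right, one per pair of consecutive subwords; the bit ordering and the name `cexchange` are assumptions for illustration.

```python
# Hypothetical model: a 64-bit register is a list of eight 1-byte labels.
R1 = list("abcdefgh")


def cexchange(bits, control, r):
    """Conditional EXCHANGE: swap the two subwords of a pair when its control bit is 1."""
    k = bits // 8  # bytes per subword
    if len(control) != len(r) // (2 * k):
        raise ValueError("one control bit is needed per pair of subwords")
    out = list(r)
    for pair, bit in enumerate(control):
        if bit:
            s = pair * 2 * k  # start of this pair
            out[s:s + 2 * k] = r[s + k:s + 2 * k] + r[s:s + k]
    return out


print(" ".join(cexchange(8, [1, 0, 0, 1], R1)))
print(" ".join(cexchange(16, [0, 1], R1)))
print(" ".join(cexchange(32, [1], R1)))
```

With every control bit set to 1 the model matches the unconditional EXCHANGE rows of Table 1.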

The CEXCHANGE instruction can be used to represent a binary tree in which, at each level of the tree, the left subtree can be swapped with the right subtree. A subtree at level i is represented by a subword of size n/2^i, where the root of the binary tree is at level 0, and the leaves of the tree are at level lg(n). That is, the root node of the binary tree has 2 subtrees at level 1. The root node is represented by the whole word of size n bits. Level 1 of the tree is represented by 2 subwords, each of size n/2 bits. Level 2 of the binary tree is represented by 4 subwords, each of size n/4 bits. Level 3 of the binary tree is represented by 8 subwords, each of size n/8 bits and the like. The last (leaf) level of the tree is

