Field of the Invention
The present invention relates to memory systems, and more particularly
to shared memory multiprocessor systems in which one or more processors have a cache memory.
Background of the Invention
Computer systems commonly have a plurality of components, such as
processors, memories, and input/output devices. The components commonly are grouped
as component modules, each of which may contain one or more processors, memory,
and/or input/output devices. Component modules generally communicate with one another
in the form of "transactions" such as conventional "read" and "write" transactions.
For example, in a typical read transaction, one module will send signals to another
module identifying data that it needs to obtain and requesting that the identified
data be sent to it. The responding module then processes the request and returns the requested data.
In many computer systems, software running on the system is executed
by two or more main processor modules that share one or more main memories. Systems
of this type are often referred to as "shared memory multiprocessor" systems.
In shared memory multiprocessor systems, transactions generally are
transferred among processor modules and the main memory modules on a shared memory
interconnect, such as a conventional shared bus or a conventional crossbar switch.
Transactions are transmitted over the shared memory interconnect during "cycles,"
each cycle being a period of time during which a limited number of transactions
can be transferred on the shared memory interconnect. In a crossbar switch, each
module is ordinarily entitled to place a transaction on the switch and receive
a transaction from the switch during each cycle. This contrasts with conventional
shared buses, in which only a single module is entitled to send a transaction on
the bus during a given cycle.
Many conventional shared memory interconnects accommodate "split
transactions." In split transactions, a response need not immediately follow a
request. For example, a module may send a read transaction to another module on
a crossbar switch during a given cycle. The other module may be busy performing
other tasks for one or more cycles before it can process the read transaction.
Eventually, the responding module will process the transaction and return the requested
data during some future cycle. To correlate request transactions with their responses,
each request transaction generally contains a transaction identification code that
identifies the module initiating the transaction and distinguishes the transaction
from others issued by that module. Response transactions generally are sent with
the transaction identification code so that the response can be correlated with
the request transaction. For example, data returned in response to a read transaction
will identify the particular read transaction for which the data is being returned.
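The correlation of split-transaction responses with their requests can be sketched in Python as follows. This is only an illustration: the names `TransactionId`, `Requester`, `issue_read`, and `accept_response` are hypothetical, and the sketch assumes that a transaction identification code is simply a module identifier paired with a per-module sequence number.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Tuple


class TransactionId(NamedTuple):
    module: int    # module that initiated the transaction
    sequence: int  # distinguishes it from others issued by that module


@dataclass
class Requester:
    module: int
    next_sequence: int = 0
    pending: Dict[TransactionId, int] = field(default_factory=dict)

    def issue_read(self, address: int) -> TransactionId:
        """Issue a read and remember which address it was for."""
        tid = TransactionId(self.module, self.next_sequence)
        self.next_sequence += 1
        self.pending[tid] = address
        return tid

    def accept_response(self, tid: TransactionId, data: bytes) -> Tuple[int, bytes]:
        """Match a response, arriving in any later cycle, to its request."""
        address = self.pending.pop(tid)
        return address, data


requester = Requester(module=20)
first = requester.issue_read(0x100)
second = requester.issue_read(0x200)
# Responses may return in a different order than the requests were issued.
later = requester.accept_response(second, b"B")
earlier = requester.accept_response(first, b"A")
```

Because each response carries the code of the request it answers, the requester does not depend on the order in which responses arrive.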
In shared memory multiprocessor systems, the main processors generally
are coupled directly to the shared memory interconnect. Each main memory generally
is coupled to the shared memory interconnect through a separate main memory controller.
If the system has more than one main memory, each memory contains a mutually exclusive
set of data addresses. If a processor is to read data from a main memory or write
data to a main memory, it must communicate with the main memory controller for that main memory.
In addition to main memories within a system, each processor module
may also have a cache memory, which stores recently used data values for quick
access by the module. Ordinarily, a cache memory stores both the recently used
data and the addresses where these data items are stored in the relevant main memory.
When the module seeks data from an address in a main memory, it requests that data
from its cache memory using the address associated with the data. The cache memory
checks to see whether it holds data associated with that address. If so, the cache
memory returns the requested data directly to the processor. If the cache memory
does not contain the desired information (i.e., if a "cache miss" occurs), a regular
memory access ordinarily occurs. Cache memory is typically useful when main memory
(generally RAM) accesses are slow compared to the microprocessor speed. Cache memory
is faster than main RAM memory.
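The lookup described above can be illustrated with a minimal Python sketch. The class `SimpleCache` and its counters are hypothetical, and the sketch assumes an unbounded cache with no eviction, which a real cache memory would not have.

```python
from typing import Dict


class SimpleCache:
    """Maps main memory addresses to recently used data values."""

    def __init__(self, main_memory: Dict[int, int]) -> None:
        self.main_memory = main_memory
        self.lines: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        if address in self.lines:          # cache holds data for this address
            self.hits += 1
            return self.lines[address]
        self.misses += 1                   # cache miss: regular memory access
        value = self.main_memory[address]
        self.lines[address] = value        # keep the value for later reads
        return value


cache = SimpleCache({0x10: 7})
cache.read(0x10)  # miss, served from main memory
cache.read(0x10)  # hit, served from the cache
```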
In the case of a shared memory multi-processor system in which each
processor has a cache memory, the situation is somewhat more complex than for single
processor systems having a cache memory. In a multiprocessor system, the current
data corresponding to a particular main memory address may be stored in one or
more cache memories, and/or in a main memory. The data in a cache memory may have
been operated on by a processor, resulting in a value that is different from the
value stored in main memory. Thus, a "cache coherency scheme" must be implemented
to ensure that the current data value for any address is provided independent of
where that data value resides.
Typical coherency schemes involve a substantial amount of communication
among modules that can introduce unnecessary delays into the computer system. In
a typical coherency scheme, when data is requested by a module, each module having
a cache memory is notified that it must perform a "coherency check" of its cache
memory to determine whether it has data associated with the requested address and
report the results of its coherency check. The module must then perform the coherency
check and report the results. Notification of the modules that coherency checks
must be performed and reporting of the results by the modules can each result in delays.
In addition to reporting whether it has the requested data in its
cache memory, each module generally reports the status of the data stored in its
cache memory. For example, a module may report that it has a copy of the data and
its copy is "private" (i.e., the data is only available for use by that module)
or that the data is "shared" (i.e., the data may reside in more than one cache
memory at the same time). A module may also report whether its data is "clean"
(i.e., the same as the data associated with the same address stored in main memory)
or "dirty" (i.e., the data has been operated on after it was obtained). Ordinarily,
only one private-dirty copy of data is permitted at any given time. Generally,
coherency status information for a particular data line stored in a cache memory
is stored in a set of fields known as a "cache tag" for that data line.
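As a rough illustration of the status information a conventional cache tag carries, the following Python sketch models a tag with private/shared and clean/dirty fields. The names `Sharing`, `CacheTag`, and `private_dirty_holders` are hypothetical, and the field layout is an assumption rather than a format taken from any particular system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Sharing(Enum):
    PRIVATE = "private"
    SHARED = "shared"


@dataclass
class CacheTag:
    address: int       # main memory address of the data line
    sharing: Sharing   # private or shared
    dirty: bool        # True once the line was operated on after being obtained


def private_dirty_holders(caches: Dict[int, List[CacheTag]], address: int) -> List[int]:
    """List the modules reporting a private-dirty copy of `address`."""
    return [
        module
        for module, tags in caches.items()
        for tag in tags
        if tag.address == address and tag.sharing is Sharing.PRIVATE and tag.dirty
    ]


caches = {
    20: [CacheTag(0x40, Sharing.PRIVATE, dirty=True)],
    22: [CacheTag(0x80, Sharing.SHARED, dirty=False)],
}
holders = private_dirty_holders(caches, 0x40)
```

A coherency scheme that permits only one private-dirty copy would expect `holders` to contain at most one module for any address.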
A "coherent transaction" is any transaction that requires a check
of all memories to determine the source of the data to be delivered to the requesting
processor. Coherent transactions generally can be issued during any available cycle
of the shared memory interconnect. Some modules, however, may be busy internally
and unable to immediately perform a coherency check for the transaction, and cache
coherency checks may take several cycles to complete. To accommodate the rate at
which coherent transactions can be issued, modules sometimes have a cache coherency
queue for storing coherent transactions until a coherency check can be performed.
Depending on the rate at which coherency checks are performed, a large number
of coherent transactions may be stored in a particular module's coherency queue,
holding up execution of the transaction for a relatively long period of time.
The results of the coherency checks performed by each module are
analyzed and the most current data is provided to the module that requested the
data. For example, if no cache memories have a copy of the requested data, the
data will be supplied by main memory. If a module has a private-dirty copy, it
is either returned to a main memory controller that supplies the data or it is
supplied directly to the requesting module in a cache to cache copy. When the data
is supplied, each module typically updates the status of the data in its cache
memory. For example, if a private copy of data is supplied to a particular module,
other modules will ordinarily mark their copies of that data, if any, as invalid.
Accordingly, there is a need for a memory system having main memories
and cache memories that provides for cache coherency checking without undue delays.
US-A-5,297,269 concerns a cache coherency protocol for a multi-processor
computer system. The system comprises several central processing units connected
to one or more memories by a cross bar switch unit, and each main memory module
comprises a memory directory resident on the module in which several coherency
state fields for each data block stored in the memory module are contained to indicate
the coherency state of this data block. Coherency throughout the multi-processor
system for a data block stored within the memory module is enforced by making
a query to the memory directory upon each data transfer operation which affects
or may be affected by the coherency state of a data block. The system further provides
information indicating transitional states to expedite the processing of read
and write requests. The transition states indicate to the main memory module whether
an outstanding data operation regarding a particular block is being executed so
that the memory module can block or inhibit a subsequent read request for that
data block until the already-commenced operation has been completed.
Starting from the prior art discussed above, it is the object underlying
the present invention to provide an improved coherent memory system having a main
memory and cache memories that provides for cache coherency and does not add excessive
delays to coherent transactions.
This object is achieved by a memory system according to claim 1.
Broadly stated, the present invention encompasses a duplicate cache
tag memory system. The memory system has a main memory controller for supplying
data in response to transactions received by the main memory controller. The memory
system also has two or more modules, each having a cache memory for storing data
supplied to the module by the main memory controller. Each module sends transactions
to the memory controller to request data needed by the module that is not stored
in the cache memory of the module.
The memory system also includes a cache tag array coupled to the
main memory controller. The cache tag array includes a cache tag corresponding
to each data line stored in one of the cache memories of the modules. The cache
tag for a data line specifies the module that most recently requested the data
line from the main memory controller.
In the preferred embodiment, the cache tags also specify whether
the relevant data line is held private or shared.
Brief Description of the Drawing
FIG. 1 is a block diagram illustrating one embodiment of a computer
system utilizing a duplicate cache tag memory system according to the present invention.
Detailed Description of the Invention and Preferred Embodiments
The present invention encompasses an improved memory system for a
shared memory multiprocessor computer system in which one or more processor modules
and/or input/output modules have cache memories. According to the present invention,
the main memory controller for each main memory maintains a duplicate cache tag
array containing current information on the status of data lines from that main
memory that are stored in the cache memories. Thus, coherency checks can be performed
directly by the main memory controller. This eliminates the need for each processor
having a cache memory to perform a separate coherency check and to communicate
the results of its coherency checks to the main memory controller, and thereby
reduces delays associated with processing coherent transactions.
A computer system utilizing a memory system according to the present
invention is shown at 10 in FIG. 1.
Computer system 10 is a multiprocessor computer having a crossbar
switch 12 and a plurality of components coupled to crossbar switch 12. The components
include main memory controllers 14 and 16, input/output module 18, and processor
modules 20, 22, and 24. The components send transactions to one another on crossbar switch 12.
Transactions are received by and sent from crossbar switch 12 during
cycles. Each memory controller, input/output module, and processor module is entitled
to place a transaction on crossbar switch 12 and receive a transaction from crossbar
switch 12 during each cycle.
Each transaction placed on the crossbar switch includes transaction
identification information, or a transaction identification code, that identifies
the module that issued the transaction and distinguishes the transaction
from other pending transactions issued by that module. The transaction also identifies
the responding module to which the transaction is sent. The crossbar switch directs
the transactions received during a given cycle to the responding modules.
To accommodate burst transaction issue rates greater than the speed
at which transactions can be processed by responding modules, crossbar switch 12
includes queues associated with each module. Transactions being delivered by crossbar
switch 12 to a module are placed in the queue associated with that module and sent
to the module from the queue in first-in, first-out ("FIFO") order. Queues 30,
32, 36, 38 and 40 are associated, respectively, with main memory controllers 14
and 16, and processor modules 20, 22, and 24.
To accommodate the case in which more than one module sends transactions
to a particular module during a given cycle, each module is assigned a predetermined
priority. Transactions placed on the crossbar switch in the same cycle and directed
to the same module are placed in the responding module's queue in an order defined
by the priority of the sending modules. Thus, transactions received by the crossbar
switch during a single cycle and directed to a receiving module will be delivered
to the receiving module during separate cycles.
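The queueing and priority behaviour described above can be sketched in Python. The class `Crossbar` and its `cycle` method are hypothetical; the sketch assumes that a lower position in the priority list means higher priority and that each module receives at most one queued transaction per cycle.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Transaction = Tuple[int, int, str]  # (sending module, receiving module, payload)


class Crossbar:
    def __init__(self, priority: List[int]) -> None:
        # Earlier position in `priority` means higher priority (assumption).
        self.rank = {module: i for i, module in enumerate(priority)}
        self.queues: Dict[int, Deque[Transaction]] = {m: deque() for m in priority}

    def cycle(self, sent: List[Transaction]) -> List[Transaction]:
        """Accept this cycle's transactions, then deliver one per module."""
        # Same-cycle transactions are queued in order of sender priority.
        for txn in sorted(sent, key=lambda t: self.rank[t[0]]):
            self.queues[txn[1]].append(txn)
        delivered = []
        for queue in self.queues.values():
            if queue:
                delivered.append(queue.popleft())  # FIFO delivery
        return delivered


xbar = Crossbar(priority=[14, 20, 22])
# Modules 22 and 20 both send to controller 14 in the same cycle.
first_cycle = xbar.cycle([(22, 14, "read"), (20, 14, "read")])
second_cycle = xbar.cycle([])
```

In this sketch the two reads reach controller 14 in separate cycles, with the transaction from the higher-priority sender delivered first.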
Main memory controllers 14 and 16 are responsible for reading information
from the main memories 15 and 17, respectively, and storing information in their
respective main memories in a conventional manner. Main memory controllers 14 and
16 interface with memories 15 and 17 either directly or through conventional buses.
Processor modules 20, 22, and 24 are the main processors for computer
system 10. Software for the system may execute simultaneously on all processors.
Processor modules 20, 22, and 24 include cache memories 50, 52, and
54, respectively. Recently used data is stored in cache memories 50, 52, and 54
along with the main memory address for the data and information relating to the
coherency status of the data. The main memory address and coherency information
can be viewed as being "cache tags" associated with each data line stored in the cache memories.
In the preferred embodiment, the status of each cache line (i.e.,
a copy of a data line that is stored in a cache memory) is defined in one of five
ways: "Invalid," "Private," "Shared," "Private-Pending," or "Shared-Pending." A "private"
line is one that is allowed to reside in the cache memory of only one module and
to be used only by that module. If a module holds a data line private, the module
may use the data line in any way as it represents the current value of the data.
A "shared" line is one that is allowed to reside in more than one module's cache
at the same time. In order for the current value of a data line marked shared to
be changed by a module, the shared copies held by other modules must be marked
invalid and the module must obtain the data line private. "Invalid" indicates that
the data should not be used. The status of private-pending or shared-pending are
In addition to the cache tags of each processor module, each main
memory controller contains a duplicate cache tag array that stores the current
status of each cache line related to that main memory that is held by the processor
modules. Duplicate cache tag array 44 is associated with main memory controller
14. Duplicate cache tag array 46 is associated with main memory controller 16.
The duplicate cache tag array for each main memory contains a "duplicate
cache tag" corresponding to the cache tag associated with each data line from that
main memory that is stored in a cache memory. The duplicate cache tag includes
the main memory address associated with the data and the coherency status of the
cache line. The duplicate cache tag also contains information identifying the processor
that has each data line or the transaction identification code for a processor
that has requested the line. Thus, at any given time, each main memory has access
to a duplicate cache tag array having the current cache coherency status for each
data line associated with that main memory. There is no need for each module to
perform a cache coherency check for coherent transactions and report the results
to the main memory controllers.
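One possible in-memory representation of a duplicate cache tag array is sketched below in Python. The names `Status`, `DuplicateCacheTag`, and `DuplicateCacheTagArray` are hypothetical, and the choice of a dictionary keyed by main memory address is an assumption made for brevity, not a structure stated in this description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    INVALID = "invalid"
    PRIVATE = "private"
    SHARED = "shared"
    PRIVATE_PENDING = "private-pending"
    SHARED_PENDING = "shared-pending"


@dataclass
class DuplicateCacheTag:
    address: int                           # main memory address of the line
    status: Status = Status.INVALID        # coherency status of the cache line
    owner: Optional[int] = None            # module that has or requested the line
    transaction_id: Optional[int] = None   # code of a request still pending


class DuplicateCacheTagArray:
    def __init__(self) -> None:
        self._tags: Dict[int, DuplicateCacheTag] = {}

    def lookup(self, address: int) -> DuplicateCacheTag:
        # A line that was never requested is treated as invalid.
        return self._tags.setdefault(address, DuplicateCacheTag(address))


array_44 = DuplicateCacheTagArray()
tag = array_44.lookup(0xA)
tag.status, tag.owner = Status.PRIVATE, 20
```

With such an array, the main memory controller can answer a coherency question by a local lookup instead of polling each module.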
It should be noted that a "duplicate cache tag" will not be a precise
copy of the relevant cache tag. Rather, the duplicate cache tag is a duplicate
only in the sense that it should include all coherency status information contained
in the actual cache tag or, as will be explained below, more current information.
When a module needs a copy of a particular data line, it places a
read transaction requesting that data line on the crossbar switch. The read transaction
includes a transaction identification code and also specifies whether a shared
copy or a private copy of the data is desired. A read-private transaction is a
request for a private copy of data and a read-shared transaction is a request for
a shared copy of data. The read transaction is directed to the main memory controller
for the main memory that contains the specified memory address.
When a main memory controller receives a read transaction, it checks
the duplicate cache tag array to determine whether the data line is held in a cache
memory and the coherency status of the line. If no cache memory contains the data
line, the main memory controller supplies the line from main memory to the requesting module.
If the line is held shared in one or more cache memories, the main
memory controller supplies the data line from main memory. If the data is requested
shared (i.e., by a read-shared transaction), the data is simply sent to
the requesting module. If the data is requested private (i.e., by a read-private
transaction), the main memory controller supplies the data from main memory and
sends purge (i.e., invalidate) transactions to each module indicating that the
modules should mark their shared copy of the data as invalid. In some embodiments
of the present invention, the main memory controllers may keep a record of all
the modules that have a shared copy of a given data line. In such embodiments,
the main memory controller need only send purge transactions to the modules that
have a shared copy of the data line.
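The handling of a read for a line that no module holds private can be sketched as follows. The function `serve_read` is hypothetical, and the sketch models the embodiment in which the main memory controller keeps a record of the modules holding a shared copy, so that purge transactions go only to those modules.

```python
from typing import List, Set, Tuple

Action = Tuple[str, int]  # ("supply", module) or ("purge", module)


def serve_read(
    sharers: Set[int], requester: int, want_private: bool
) -> Tuple[List[Action], Set[int]]:
    """Serve a read for a line that no module holds private.

    `sharers` is the set of modules recorded as holding a shared copy
    (empty if the line is in no cache). Returns the actions taken and
    the updated set of modules holding the line.
    """
    actions: List[Action] = [("supply", requester)]  # data comes from main memory
    if not want_private:
        return actions, sharers | {requester}
    # A private copy was requested: the other shared copies must be invalidated.
    actions += [("purge", module) for module in sorted(sharers - {requester})]
    return actions, {requester}


shared_result = serve_read({20}, requester=22, want_private=False)
private_result = serve_read({20, 22}, requester=24, want_private=True)
```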
If the data line is held private in a module's cache memory, the
main memory controller sends a flush back (i.e., data return) transaction to the
relevant module along with the transaction identification code for the read transaction
requesting the data. In response to the flush back transaction, the module marks
its cache tag for the line as invalid and returns the line to the requesting main
memory controller along with the transaction identification code. When the line
is received by the main memory controller, it supplies the line to the requesting
module and updates the relevant address in main memory.
When it sends the flush back transaction to the relevant module,
the main memory controller updates the status of its duplicate cache tag to reflect
the requesting module as the owner of the data line and to indicate that the ownership
is pending, that is, the data has not yet been sent to the owner of the line. If
the data line was requested private, the status is recorded as pending private.
If the data line was requested shared, the status is recorded as pending shared.
Assuming no transactions requesting the data are received from other
modules, the duplicate cache tag is again updated when the data is received by
the main memory controller in response to the flush back. If the data line was
requested private, the duplicate cache tag is updated to reflect a private status,
rather than private pending, and the data is sent to the requesting module. If
the data line was requested shared, the duplicate cache tag is updated to reflect
a shared status, rather than pending shared, and the data is sent to the requesting module.
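The flush back sequence for the simple case, in which no other request for the line arrives in the meantime, might be sketched in Python as below. The functions `start_flush_back` and `finish_flush_back` and the dictionary-based tag are hypothetical simplifications introduced for illustration.

```python
from typing import Dict, Tuple

PENDING = {True: "private-pending", False: "shared-pending"}
FINAL = {"private-pending": "private", "shared-pending": "shared"}


def start_flush_back(tag: Dict, requester: int, want_private: bool, tid: int) -> Tuple:
    """Line is held private by tag['owner']: ask that module to return it."""
    message = ("flush_back", tag["owner"], tid)
    tag["owner"] = requester               # requester becomes the pending owner
    tag["status"] = PENDING[want_private]
    return message


def finish_flush_back(tag: Dict, memory: Dict[int, int], address: int, data: int) -> Tuple:
    """Returned line arrives and no other request intervened."""
    memory[address] = data                 # update the address in main memory
    tag["status"] = FINAL[tag["status"]]   # pending becomes private or shared
    return ("data", tag["owner"], data)


tag = {"owner": 20, "status": "private"}
memory = {0xA: 1}
sent = start_flush_back(tag, requester=22, want_private=True, tid=5)
reply = finish_flush_back(tag, memory, 0xA, data=7)
```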
If a transaction requesting the data is received from another module
before the data line is received in response to the flush back, a somewhat different
treatment is necessary. As noted, at the time the flush back is sent, the duplicate
cache tag is updated to reflect the requesting module as the current owner and
the transaction identification code is sent with the flush back. The transaction
identification code is also included with the data when the data is sent back to
the main memory controller. For the reasons explained below, this allows for the
case in which multiple requests for the same data line are pending.
When the main memory controller receives a second request for data,
the duplicate cache tag for the data has already been marked with a pending status
based on the first request for the data. At the time of the second request, the
duplicate cache tag is updated to reflect the new requesting module and whether
the data was requested private or shared by this new module. If the first requesting
module requested the data private, a flush back transaction is sent to the first
requesting module along with the transaction identification code identifying the
second request for the data. If the first requesting module requested the data
shared, a purge transaction is sent to the first requesting module. The first requesting
module will act on these transactions after it has received the requested data
and used it once.
When the data is received by the main memory controller in response
to the flush back for the first data request, the duplicate cache tag will already
have been updated to reflect the second data request. As noted above, however,
the data is returned to the main memory controller with the transaction identification
code for the first request. Thus, the main memory controller is able to properly
direct the data. When the data is received by the main memory controller, it therefore
sends the data to the first requesting module based on this information. Additionally,
if the first requesting module is receiving the data shared, the data is also sent
to the second requesting module.
If the first requesting module requested the data private, a flush
back transaction will have been received by the first requesting module before
the data is received. This out-of-order command sequence is used to indicate to
the first requesting module that the first requesting module may use the data once
and then must send the data back to the main memory controller.
If the first requesting module requested the data shared, a purge
transaction will have been received by the first requesting module before the
data is received. This out-of-order command sequence is used to indicate to the
first requesting module that it may use the data once and then must mark its copy invalid.
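The module-side behaviour for these out-of-order command sequences can be illustrated with the following Python sketch. The class `ModuleLine` is hypothetical, and the `use` callback standing in for the single permitted use of the data is an assumption made so the example can run.

```python
from typing import Callable, List, Optional, Tuple


class ModuleLine:
    """One cache line as seen by a requesting module (illustrative model)."""

    def __init__(self) -> None:
        self.status = "invalid"
        self.data: Optional[int] = None
        self.deferred: Optional[Tuple[str, Optional[int]]] = None
        self.outbox: List[Tuple] = []

    def receive_flush_back(self, tid: int) -> None:
        if self.status == "invalid":
            self.deferred = ("flush_back", tid)  # arrived before the data
        else:
            self._return_line(tid)

    def receive_purge(self) -> None:
        if self.status == "invalid":
            self.deferred = ("purge", None)      # arrived before the data
        else:
            self._invalidate()

    def receive_data(self, data: int, status: str, use: Callable[[int], int]) -> None:
        self.data, self.status = data, status
        if self.deferred is None:
            return
        self.data = use(self.data)               # the single permitted use
        kind, tid = self.deferred
        self.deferred = None
        if kind == "flush_back":
            self._return_line(tid)
        else:
            self._invalidate()

    def _return_line(self, tid: Optional[int]) -> None:
        self.outbox.append(("data", tid, self.data))
        self._invalidate()

    def _invalidate(self) -> None:
        self.status, self.data = "invalid", None


line = ModuleLine()
line.receive_flush_back(tid=7)                   # command precedes the data
line.receive_data(10, "private", use=lambda value: value + 1)
```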
If a third or subsequent request is received for previously requested
data, the same procedure is used. The duplicate cache tag is updated to reflect
the most recent request for the data. A flush back or purge transaction is sent
to the previous requestor, depending on whether the previous requestor requested
the data private or shared. If the previous requestor requested the data shared,
the data is sent to both the previous requestor and the more recent requestor when
it is received by the main memory controller.
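The controller-side bookkeeping for several outstanding requests for one line can be sketched in Python. The class `PendingLine` is hypothetical; it assumes the line starts out held private by one module, models only requests that arrive while data is still outstanding, and interprets a chain of shared requests as forwarding the returned data along the chain.

```python
from typing import Dict, List, Optional, Tuple


class PendingLine:
    """Tracks one data line that is initially held private by `holder`."""

    def __init__(self, holder: int) -> None:
        self.owner = holder
        self.status = "private"
        self.last_tid: Optional[int] = None
        self.requests: Dict[int, Tuple[int, bool]] = {}  # tid -> (module, private?)
        self.follow_on: Dict[int, int] = {}              # shared tid -> newer tid
        self.outbox: List[Tuple] = []

    def request(self, tid: int, module: int, want_private: bool) -> None:
        if self.status in ("private", "private-pending"):
            # Previous holder or requester must hand the line back for this request.
            self.outbox.append(("flush_back", self.owner, tid))
        else:
            # Previous requester wanted a shared copy: purge it and remember
            # that the returned data must also go to this newer request.
            self.outbox.append(("purge", self.owner))
            self.follow_on[self.last_tid] = tid
        self.requests[tid] = (module, want_private)
        self.owner, self.last_tid = module, tid
        self.status = "private-pending" if want_private else "shared-pending"

    def data_returned(self, tid: Optional[int]) -> None:
        """Data came back carrying `tid`; route it using that code."""
        while tid is not None:
            module, was_private = self.requests.pop(tid)
            self.outbox.append(("data", module))
            if tid == self.last_tid:
                self.status = "private" if was_private else "shared"
            tid = None if was_private else self.follow_on.pop(tid, None)


line_a = PendingLine(holder=20)
line_a.request(tid=1, module=22, want_private=True)
line_a.request(tid=2, module=24, want_private=False)  # arrives before the data
line_a.data_returned(1)   # from module 20, routed to module 22
line_a.data_returned(2)   # from module 22 after one use, routed to module 24
```

The duplicate cache tag always names the most recent requester, while the transaction identification code returned with the data tells the controller where that particular copy must go.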
As an example of the operation of the present invention, consider
the case of computer system 10. Assume that each cache tag for cache memories 50,
52 and 54 is initially marked invalid. Each duplicate cache tag will therefore
also be marked invalid.
During one cycle of crossbar switch 12, processor 20 sends a read
private transaction to main memory controller 14 indicating that it seeks to obtain
a private copy of a data line "A." Since data line A is stored only in main memory
15, main memory controller 14 supplies data line A to processor module 20 during
a subsequent cycle and updates duplicate cache tag array 44 to indicate that data
line A is held private by processor module 20.
Processor module 20 receives data line A. Data line A is stored in
cache memory 50 along with a cache tag indicating that the data is held private.
Processor module 20 is free to alter the value of data line A in its cache memory.
During a subsequent cycle, module 22 sends a read private transaction
to main memory controller 14 indicating that it seeks to obtain a private copy
of data line A. Main memory controller 14 checks duplicate cache tag array 44 and
determines that line A is held private by processor module 20. Main memory controller
14 sends a flush back A (or yield A) transaction to processor module 20 requesting
that module 20's private copy of A be sent back to main memory controller 14. The
transaction identification code indicating the read private transaction sent by
processor module 22 is also sent to processor module 20 along with the flush back
transaction. Main memory controller 14 also updates duplicate cache tag array
44 to indicate that processor module 22 owns data line A private pending.
When processor module 20 receives the flush back transaction, it
sends data line A to main memory controller 14 along with the transaction identification
code for the read private transaction of processor module 22 and marks its copy
of line A invalid.
Assuming that no further requests for data line A were received,
when main memory controller 14 receives data line A, it sends data line A to processor
module 22 along with the transaction identification code, and updates duplicate
cache tag array 44 to indicate that data line A is held private by module 22.
Now consider the case in which processor module 24 had requested
a shared copy of data line A before data line A was received by main memory controller
14 from processor module 20. In this case, when module 24's read-shared transaction
is received, the duplicate cache tag for data line A is updated to reflect module
24 as the owner of data line A with the status of "pending shared." A flush back
is sent to module 22 along with the transaction identification code for the read-shared
transaction sent by module 24.
When data line A is received by main memory controller 14 from processor
module 20, it is sent to processor module 22 based on the transaction identification
code returned with the data. Processor module 22 uses data line A once and then
returns it to main memory controller 14 along with the transaction identification
code for the read-shared transaction sent by module 24. Processor module 22 also
marks its copy of data line A invalid.
When main memory controller 14 receives data line A from processor
module 22, it sends data line A to processor module 24. It also updates the duplicate
cache tag for data line A to indicate that it is held shared by processor module
24. Cache memory 54 for processor module 24 is updated with data line A and a cache
tag indicating that the data is held shared.
It will be appreciated by those skilled in the art that main memories
may be updated with the current value of data each time a private copy of the data
is received by the relevant main memory controller.
The present invention has been described above using processor modules
with cache memories for illustrative purposes only. It will be appreciated by those
skilled in the art that the present invention is applicable to any other type of
module, such as an input/output module, having a cache memory.
Various modifications to the present invention will become apparent
to those skilled in the art from the foregoing description and accompanying drawings.
Accordingly, the present invention is to be limited solely by the scope of the claims.