| Dokumentenidentifikation |
EP1476807 17.01.2008 |
| EP-Veröffentlichungsnummer |
0001476807 |
| Titel |
ZEIGERIDENTIFIKATION UND AUTOMATISCHES DATENVORABRUFEN |
| Anmelder |
Sun Microsystems, Inc., Santa Clara, Calif., US |
| Erfinder |
TREMBLAY, Marc, Menlo Park, CA 94025, US; CHAUDHRY, Shailender, San Francisco, CA 94107, US |
| Vertreter |
derzeit kein Vertreter bestellt |
| DE-Aktenzeichen |
60317878 |
| Vertragsstaaten |
DE, FR, GB |
| Sprache des Dokument |
EN |
| EP-Anmeldetag |
21.02.2003 |
| EP-Aktenzeichen |
037431806 |
| WO-Anmeldetag |
21.02.2003 |
| PCT-Aktenzeichen |
PCT/US03/05194 |
| WO-Veröffentlichungsnummer |
2003073268 |
| WO-Veröffentlichungsdatum |
04.09.2003 |
| EP-Offenlegungsdatum |
17.11.2004 |
| EP date of grant |
05.12.2007 |
| Veröffentlichungstag im Patentblatt |
17.01.2008 |
| IPC-Hauptklasse |
G06F 9/38(2006.01)A, F, I, 20051017, B, H, EP
|
| IPC-Nebenklasse |
G06F 12/08(2006.01)A, L, I, 20051017, B, H, EP
G06F 9/35(2006.01)A, L, I, 20051017, B, H, EP
|
| Beschreibung[en] |
|
BACKGROUND
Field of the Invention
The invention related to prefetch techniques and, in particular,
to automatic prefetch based on detection by a processor of likely pointer values.
Description of the Related Art
Computer systems typically include, amongst other things,
a memory system and one or more processors and/or execution units. The memory system
serves as a repository of information, while a processor reads information from
the memory system, operates on it, and stores it back. As processor speeds and sizes
of memory systems have increased, the mismatch between the ability of the processor
to address arbitrary stored information and the ability of the memory system to
quickly provide it has increased. To address this mismatch, memory systems are typically
organized as a hierarchy using caching techniques that are well understood in the
art.
In general, caches can be used to reduce average latency
problems when accessing (e.g., reading or writing) main memory. A cache is typically
a small, specially configured, high-speed memory that represents a small portion
of the information represented in main memory. By placing the cache (small, relatively
fast, expensive memory) between main memory (large, relatively slow memory) and
the processor, the memory system as a whole is able to satisfy a substantial number
of requests from the processor at the speed of the cache, thereby reducing the overall
latency of the system. Some systems may define multiple levels of cache.
When the data requested by the processor is in the cache
(a "hit"), the request is satisfied at the speed of the cache. However, when the
data requested by the processor is not in the cache (a "miss"), the processor must
wait until the data is provided from the slower main memory, resulting in greater
latency. Typically, useful work is stalled while data is supplied from main memory.
As is well known in the art, the frequency of cache misses is much higher in some
applications or execution runs than in others. In particular, accesses for some
database systems tend to miss in the cache with higher frequency than some scientific
or engineering applications. In general, such variation in cache miss frequencies
can be traced to differing spatial and temporal locality characteristics of the
memory access sequences.
In some applications, particularly those characterized
by array accesses, hardware techniques can be employed to predict subsequent accesses.
Stride prediction techniques and associated hardware prefetch strategies are one
such example. However, in many applications, it is difficult for hardware to discern
and predict memory access sequences and software techniques may be alternatively
or additionally employed. For example, to increase the likelihood of cache hits
and thereby improve apparent memory access latency, some computer systems define
instructions for prefetching data from memory to cache. The assumption is that software
(e. g. , either the programmer or a compiler) may be in a reasonable position to
identify prefetch opportunities.
Unfortunately, for certain classes of applications, conventional
hardware and software prefetch techniques are not particularly effective. For example,
in some applications, performance is dictated by how well a processor can access
data represented in data structures that are traversed using pointers. Particularly
in complex data structures for which component objects are dynamically-allocated
and freed throughout execution and accesses do not exhibit a high degree of spatial
and temporal locality, access patterns may be difficult for conventional techniques
to discern. Data structures that are typically employed in relational database systems
often present such prefetch challenges.
The article "
Content-Based Prefetching: Initial Results" by Robert Cooksey et al, of the
University of Colorado
, describes examining the context of data as it is moved from memory to
caches and prefetching on the basis of data values that are likely to be addresses.
SUMMARY
According to one aspect of the present invention there
is provided a processor comprising: a register; at least one writable store having
one or more values encoded therein which define a subset of memory addresses; wherein
the subset of memory addresses comprises at runtime at least one range of addresses
from which memory is dynamically allocated; means for monitoring values generated
by executing instructions or operations; means for identifying those of said generated
values which are destined for the register; means for comparing the identified value
with the one or more values encoded in the writable store; and logic for initiating
a prefetch if said identified value corresponds with the one or more values defining
the subset of memory addresses.
According to a second aspect of the present invention there
is provided a method of automatically prefetching at least some data in a computer
system, the method comprising: selecting a subset of memory addresses which subset
comprises at runtime at least one range of addresses from which memory is dynamically
allocated; encoding one or more values, defining said subset of memory addresses,
into at least one writable store; executing an instruction or operation sequence
which generates values; identifying those of said generated values which are destined
for a register; comparing the identified value with the one or more values encoded
in the writable store; and initiating a prefetch if said identified value corresponds
with the one or more values defining the subset of memory addresses.
In the prior article by Cooksey et al there is no mechanism
for delimiting the prefetching by defining a subset of addresses using values in
a writable store.
Accordingly, techniques have been developed whereby likely
pointer values are identified at runtime and contents of corresponding storage location
can be prefetched into a cache hierarchy to reduce effective memory access latencies.
In particular, in some realizations, one or more writable stores are defined in
a processor architecture to delimit a portion or portions of a memory address space.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention may be better understood, and its
numerous objects, features, and advantages made apparent to those skilled in the
art by referencing the accompanying drawings.
FIG. 1 is block diagram depicting an illustrative processor architecture
that includes various levels of cache memory and in which techniques of the present
invention may be employed to provide automatic prefetch in accordance with some
embodiments of the present invention.
FIG. 2 depicts operation of a memory referencing value detection mechanism
in accordance with some embodiments of the present invention.
FIG. 3 depicts pipeline stages in the execution of instructions of an illustrative
processor and identifies a variety of suitable stages of in pipelined execution
at which an identification of likely pointer values destined for a register may
be performed.
FIG. 4 is a flow chart depicting operation of a memory referencing value
detection mechanism in accordance with some embodiments of the present invention.
The flow chart illustrates an illustrative exploitation in which one or more marker
registers are employed to delimit a heap.
Use of the same reference symbols in different drawings
indicates similar or identical items.
DESCRIPTION OF THE PREFERRED EMBODIMENT(S)
The description that follows presents a series of systems,
apparati, methods and techniques that facilitate automated prefetch based on detection
of memory referencing values in a processor. Exemplary realizations focus on detection
of likely pointer values destined for a processor register and include an ability
to delimit a relevant range of memory addresses values for automated prefetch using
a pair of writable stores. For simplicity of illustration, the description emphasizes
values that correspond to a single contiguous range of memory addresses; however,
more generally, any subset (including non-contiguous ranges) of addressable storage
may be identified in some realizations. One exemplary subset corresponds to a heap
from which memory may be dynamically allocated. However, more generally, desirable
subsets are exploitation-specific. Similarly, while much of the description herein
assumes a single processor, process or thread context, some realizations in accordance
with the present invention provide automated prefetch facilities customizable for
each processor of a multiprocessor, each process and/or each thread of execution.
Accordingly, in view of the above, and without limitation, certain exemplary exploitations
are now described.
FIG. 1 depicts functional units of an illustrative processor 100 that
includes a memory hierarchy for which some memory system latencies may be at least
be partially hidden using automated prefetch techniques in accordance with some
embodiments of the present invention. The memory hierarchy of processor
100 includes a data cache 101 associated with a load/store unit
110 as well as a next level cache 102, 102A, main memory
104 and any levels (not specifically shown) of additional cache or buffering.
The design of processor 100 is reminiscent of that of certain SPARC architecture
based processors and includes an illustrative set of execution units, namely load-store
unit 110, integer execution unit 130 and floating-point unit
150.
Note that descriptions and/or terminology consistent with
the SPARC architecture are used herein purely for illustrative purposes and, based
on the description herein, persons of ordinary skill in the art will appreciate
exploitations of the present invention suitable for a wide variety of processor
implementations and architectures. SPARC architecture based processors are available
from Sun Microsystems, Inc, Palo Alto, California. SPARC trademarks are used under
license and are trademarks or registered trademarks of SPARC International, Inc.
in the United States and other countries. Products bearing SPARC trademarks are
based upon an architecture developed by Sun Microsystems.
In the illustration of FIG. 1, storage for registers
120 (which may, in some implementations, include renaming facilities, a reorder
buffer or other out-of-order and/or speculative execution facilities) is integrated
with integer execution unit 130. Other implementations may employ other forms
or implementations of storage for architectural states. Suitable forms and storage
implementations are architecture- and processor implementation-specific. However,
in any case, facilities are provided in accordance with the present invention to
identify instructions (or operations) that target storage suitable for referencing
(or addressing into) memory and further to detect correspondence of values destined
for such storage with a definable subset (or portion) of such memory. Based on such
target identification and correspondence detection, a prefetch may be initiated.
For example, in the context of FIG. 1, a prefetch operation may be inserted
into an available position of load queue 111.
Note that a prefetch need not be initiated in all such
cases. For example, a prefetch that targets a line already resident in cache (e.g.,
already in data cache 101) will typically be forgone. Similarly, an already
queued load or store operation may obviate a prefetch if the queued load or store
operation targets the same cache line as the desired prefetch. In some implementations,
it may be desirable to forgo a prefetch when load queue depth or memory access bandwidth
is insufficient. In general, any of a variety of conventional load/store techniques
may be employed in a particular implementation and such techniques are well understood
in the art. In any case, when initiated, such an automatically-generated prefetch
need not (and typically does not) correspond to any prefetch instruction in an instruction
stream and the prefetch can be automatically-generated for instruction sequences
in which conventional stride-prediction techniques prove inadequate. That said,
nothing herein precludes combination of techniques or facilities in accordance with
the present invention with software- assisted prefetch (e.g., compiler inserted
prefetch instructions) and/or conventional hardware-assisted prefetch.
To access data in a typical load/store processor architecture,
program instructions (e.g., load-type instructions and store-type instructions)
will typically operate on a fully-specified pointer value stored in a register.
A load instruction,
LD [R21], R22
that loads into register R22 the contents of memory identified by the address stored
in register R21 is typical. In general, the pointer value that resides in R21 may
be the result of a previous load (e.g., in the case of a linked-list traversal),
may be the result of an address computation or may be the byproduct of some other
operation defined by a particular processor architecture. In each case handled by
a particular realization, we employ techniques in accordance with the present invention
to detect that a value destined for a register (e.g., for register R21) matches
a pattern that suggests that it is a pointer value.
While techniques in accordance with the present invention
are not necessarily limited to contiguous ranges of memory addresses, let alone
to a single contiguous range, many aspects will be understood in accordance with
detection patterns that correspond to a single contiguous range of memory addresses.
Suitable encoding and comparison techniques for such patterns are straight-forward.
Nonetheless, based on the description herein, persons of ordinary skill in the art
will appreciate suitable extensions to other patterns that may be employed to delimit
other subsets of addressable storage.
In view of the foregoing and without limitation, data identified
or referenced by pointers often reside in a region of memory that can be described
by a simple range. Indeed, one may determine or discover that most pointers to interesting
data in a heap appears in locations that correspond to the range of possible values
from 0xFFFF0000 to 0xFFFFFFFF. Of course, values that define the aforementioned
range are purely arbitrary. In general, such a range may correspond to current bounds
of the heap itself or to a particular portion thereof. Whatever the particular range,
values destined for registers of a processor may be compared against the range to
identify likely pointer values.
FIG. 2 illustrates operation of an illustrative memory referencing value
detection mechanism. At any of a variety of execution stages (e.g., 201A, 201B,
201C ... 201D) in a processor 200, an executing instruction or
operation may generate a value destined for a register. In general, a processor-specific
set of instructions, operations or variants thereof can be identified, which target
such registers (not specifically shown) of processor 200. Of course, in many
processor implementations, architectural techniques may be employed to rename registers
and/or to provide speculative or out-of-order execution. Accordingly, instructions
or operations that target registers of a processor will be understood to encompass
those instructions or operations that target architectural state of the processor,
regardless of how such state may be transiently or speculatively represented. Based
on the description herein, persons of ordinary skill in the art will appreciate
relevant sets of instructions or operations and relevant repositories of architectural
state for any particular processor exploitation of the present invention.
FIG. 2 illustrates comparison of values destined for a representation
220 of architectural state against a range 205 of addresses defined
by a pair of stores 230 that identify corresponding subset 205A of
memory 204 addressable by processor 200. As previously described,
subset 205A of addressable memory 204 may correspond to a heap or
to some other relevant subset of addressable memory. In some realizations, subset
205A may correspond to a range (or to ranges) of addresses employed in some
performance critical data structure. In some realizations, processor representation
of the pattern description (such as by stores 230) is writable by application
code so as to allow programmer control of the focus of an automatic prefetch mechanism.
In other realizations, definition of the pattern description may be under control
of an execution or operating environment. For example, the pattern description may
be maintained by a memory allocator to correspond to a current extent of a heap.
In general, potential pointer values may be produced at
any of a variety of execution stages (e.g., 201A, 201B, 201C ...
201D) and suitable stages are typically processor and instruction or operation
specific. FIG. 3 depicts pipeline stages in the execution of instructions
of an illustrative SPARC architecture based processor and identifies a variety of
suitable stages of pipelined execution at which an identification of likely pointer
values destined for a register may be performed. In particular, FIG. 3 illustrates
a pipeline with nine-stages: fetch 301, decode 302, grouping
303, execution 304, cache access 305, load miss 306,
integer pipe wait 307, trap resolution 308 and writeback
309. Integer instructions are executed and virtual addresses calculated in
execution stage 304. In stage 305, the data cache (e.g., data cache
101 and supporting data TLB 101A, see FIG. 1) is accessed
and hits and misses are determined. During later stages, a load miss enters a load
buffer (e.g., load queue 111, see FIG. 1), wait conditions and traps
(if any) are resolved and finally in stage 309, results are written to registers
(e.g., to register file 120, see FIG. 1) and instructions are
committed.
In the illustrative context of FIG. 3, comparison
with a pattern (or patterns) that correspond(s) to a subset of addressable memory
may be performed beginning at execution stage 304. In general, initiation
of a prefetch occurs as soon as possible thereafter (e.g., during execution stage
304 itself), or may be delayed until the register-destined value is actually
written to register storage in stage 309. In some realizations, the particular
stage of execution at which values suitable for comparison against the pattern or
range are available may vary from instruction-to-instruction (or operation-to-operation)
or from execution instance to execution instance. For example, a value loaded from
memory into register storage by an execution instance of a load operation may be
available after data cache access (stage 305) or later, such as after a cache line
fill. On the other hand, results of an arithmetic calculation (e.g., as part of
an address calculation) may be available at stage 304. In general, the particular
execution stages and on-chip computational structures at which register destined
result values may be compared against a pattern descriptive of a subset addressable
memory are instruction, processor and implementation dependent. Accordingly, the
particular timing of a check for likely pointer values and the placement of computational
structures to perform such checks will vary from realization to realization.
A wide variety of design alternatives may be employed in
the implementation of processor realizations in accordance with the present invention.
For example, in some realizations, likely pointer value checks may be performed
essentially indiscriminately on all register bound values. For example, in certain
processor architectures, result bus traffic may be observed. In other realizations,
likely pointer value checks may be implemented at certain computational structures
or pipeline stages. Alternatively, or in addition, results of certain instructions
or operations may be the focus likely pointer value checks. In such cases, likely
pointer value checks may coordinate with instruction dispatch and result commitment
logic. Whatever the particular design, a processor in accordance with some embodiments
of the present invention initiates prefetch of some data from memory based on detection,
by the processor, of a likely pointer value destined for a register or other representation
of architectural state.
FIG. 4 is a flow chart depicting operation of a memory referencing value
detection mechanism in accordance with some embodiments of the present invention.
As before, values are monitored at an appropriate computational structure or pipeline
stage of a processor exploitation of the invention. In the particular illustration
of FIG. 4, results of a subset of instructions are targeted for likely pointer
value detection. Therefore, two checks are emphasized. First, instructions or operations
that are part of a register targeting subset of all instructions or operations are
identified (401). Register targeting instructions are easily recognized in
most instructions set architectures. For example, in some processor implementation,
instructions or operations that target a register may be marked during (or as a
result of) decode. Of course, in some realizations (not specifically illustrated
in FIG. 4), a subset of register targeting instructions may be the focus
of likely pointer value detection. For example, certain arithmetic operations may
be predominately employed as a final (or later stage) step in address calculation
instruction sequences generated by compilers. Accordingly, some implementations
may focus likely pointer value detection on result values of such operations. Alternatively
or additionally, a subset of registers may be disproportionately used for storage
of pointer values and likely pointer value detection may be appropriately focused
on instructions that target such registers.
After identification (if any) of a relevant instruction
type (e.g., a register targeting instruction), particular result values are compared
against a pattern that is descriptive of an interesting subset of addressable memory.
The interesting subset can be entirely implementation specific. For example, the
flow chart of FIG. 4 illustrates an illustrative exploitation in which one
or more marker registers are employed to encode a pattern that delimits a heap.
Other portions of addressable memory may be desirable in other exploitations. In
general, comparison of a possible pointer value against a base and bound or base
and offset encoding are each quite simple, and for certain well behaved base, bound
and offsets, comparisons may be computed particularly efficiently. While a single
contiguous range of addressable memory locations is suitable for many implementations,
other implementations may provide plural ranges (and associated storage, such as
in the form of sets of marker registers 403) if desired. For example, some
realizations may support potentially distinct ranges for multiple threads or processes
or to support non-contiguous subsets of addressable memory.
In general, it is desirable for the particular subset (or
range) of addressable memory to be definable at runtime. FIG. 4 illustrates
a particular exploitation of such a runtime facility for maintaining a pattern (encoded
in one or more marker registers 403) to support detection of likely pointer
values that correspond to a current extent of a heap. In general, the contents of
such marker registers or other suitable encodings may be initialized to correspond
to an interesting subset of addressable memory and, in some realizations, such encodings
may be updates as the interesting subset changes (e.g., as a heap is expanded or
contracted). Accordingly, some realizations marker registers 403 are as storage
writable under program control, e.g., machine specific registers. Nonetheless, in
some exploitations for which an interesting subset of addressable storage may be
defined apart from a particular application of program, a fixed subset or range
encoding may be suitable. While some of the description herein has been in the context
of values that may be intercepted and compared at various pipeline stages, other
realizations in accordance with some embodiments of the present invention may scan
register or other archictectural state storage for likely pointer values.
While the invention has been described with reference to
various embodiments, it will be understood that these embodiments are illustrative
and that the scope of the invention is not limited to them. Many variations, modifications,
additions, and improvements are possible. For example, while much of the description
herein has focused on the illustrative context of prefetch and addressable memory,
techniques of the present invention may be applied to pre-executable activity for
any of a variety of addressable resources that may be employed in a computing environment.
As described, techniques of the present invention may be employed in combination
with other prefetch facilities, be they hardware- or software-oriented. Any of a
variety of suitable patterns may be encoded to identify a subset of addressable
storage and any of a variety of suitable physical structures may employed to encode
same.
Although the terms "instruction" and "operation" may be
used in some contexts to identify different things, e.g., instructions in accordance
with an instruction set and operations (microcode or otherwise), such distinctions
are not significant in the context of the present invention. Accordingly, while
the term instruction is typically used in the claims, persons of ordinary skill
in the art will understand the scope of such term shall include instructions, operations
or any similar functional encoding such as an opcode, bytecode, etc. in a general
sense, without regard to distinctions that others may employ. Similarly, while pointers
values (and therefore detection of likely pointer values) will generally be defined
in virtual address space, other definitions (e.g., physical addresses, linear addresses,
indexes, etc.) and encodings may be suitable for certain exploitations.
More generally, realizations in accordance with the present
invention have been described in the context of particular embodiments. These embodiments
are meant to be illustrative and not limiting. Accordingly, plural instances may
be provided for components described herein as a single instance. Boundaries between
various components, operations and data stores are somewhat arbitrary, and particular
operations are illustrated in the context of specific illustrative configurations.
Other allocations of functionality are envisioned and may fall within the scope
of claims that follow. Finally, structures and functionality presented as discrete
components in the exemplary configurations may be implemented as a combined structure
or component. These and other variations, modifications, additions, and improvements
may fall within the scope of the invention as defined in the claims that follow.
|
| Anspruch[de] |
Prozessor (100), der Folgendes umfasst:
ein Register (120);
wenigstens einen beschreibbaren Speicher (230) mit darin codierten ein
oder mehreren Werten, die eine Teilmenge (205A) von Speicheradressen definieren;
wobei die Teilmenge (205A) von Speicheradressen zur Laufzeit wenigstens einen Bereich
(205) von Adressen umfasst, der dynamisch vom Speicher zugewiesen wird;
Mittel zum Überwachen von durch die Ausführung von Befehlen oder Operationen
erzeugten Werten;
Mittel (401) zum Identifizieren derjenigen der genannten erzeugten Werte, die für
das Register (120) bestimmt sind;
Mittel (402) zum Vergleichen des identifizierten Wertes mit den in dem beschreibbaren
Speicher (230) codierten ein oder mehreren Werten; und
Logik zum Einleiten eines Prefetch (160), wenn der genannte identifizierte Wert
den die Teilmenge (205A) von Speicheradressen definierenden ein oder mehreren Werten
entspricht.
Prozessor nach Anspruch 1, der ferner Logik umfasst, die auf eine Adressberechnung
anspricht, wobei die den Prefetch (160) einleitende Logik auf einer Übereinstimmung
zwischen der berechneten Adresse und der Teilmenge (205A) von Speicheradressen basiert.
Prozessor nach Anspruch 1 oder 2, der ferner Logik umfasst, die auf
die Speicherung eines Wertes in dem Register (120) anspricht, wobei die den Prefetch
(120) einleitende Logik auf einer Übereinstimmung zwischen dem gespeicherten
Wert und der Teilmenge (205A) von Speicheradressen basiert.
Prozessor nach Anspruch 1 oder 2, der ferner Adressvergleicherlogik
umfasst, die den Prefetch (160) auf der Basis einer Übereinstimmung zwischen
dem für das Register bestimmten Wert und der Teilmenge (205A) von Speicheradressen
einleitet, umfassend Mittel zum Ausführen des Vergleichs in einer Pipeline-Stufe
zwischen der Adressberechnung und der Speicherung in dem Register (120).
Prozessor nach einem der vorherigen Ansprüche, wobei der für
das Register (120) bestimmte Wert für ein Architekturregister oder einen diesem
entsprechenden Umordnungspufferzustand bestimmt ist.
Prozessor nach einem der vorherigen Ansprüche, wobei das Register
(120) ein Register eines Betriebsregistersatzes ist.
Prozessor nach Anspruch 6, wobei der Betriebsregistersatz einem bestimmten
Thread oder Prozess entspricht und wobei die Teilmenge von Speicheradressen Positionen
entspricht, die von dem oder für den jeweiligen Thread oder Prozess dynamisch
zugewiesen wurden.
Prozessor nach einem der vorherigen Ansprüche, wobei der wenigstens
eine beschreibbare Speicher (230) ein Paar Register beinhaltet, deren Inhalt eine
zusammenhängende Reihe von Speicheradressen definiert.
Prozessor nach einem der vorherigen Ansprüche, wobei die Teilmenge
(205A) zur Laufzeit einen einem Freispeicher entsprechenden Bereich von Speicheradressen
abdeckt.
Prozessor nach einem der vorherigen Ansprüche mit der Aufgabe,
den Prefetch (160) ohne Vorliegen eines entsprechenden Prefetch-Befehls in einer
Befehlssequenz auszuführen.
Verfahren zum automatischen Prefetchen wenigstens einiger Daten in einem
Computersystem, wobei das Verfahren die folgenden Schritte beinhaltet:
Auswählen einer Teilmenge (205A) von Speicheradressen, die zur
Laufzeit wenigstens einen Bereich (205) von Adressen umfasst, der dynamisch vom
Speicher zugewiesen wird;
Codieren von ein oder mehreren Werten, die die genannte Teilmenge (205A)
von Speicheradressen definieren, zu wenigstens einem beschreibbaren Speicher (230);
Ausführen einer Befehls- oder Operationssequenz, die Werte erzeugt;
Identifizieren derjenigen der genannten erzeugten Werte, die für
ein Register (120) bestimmt sind;
Vergleichen des identifizierten Wertes mit den ein oder mehreren Werten,
die in dem beschreibbaren Speicher (230) codiert sind; und
Einleiten eines Prefetch (160), wenn der genannte identifizierte Wert
den die Teilmenge (205A) von Speicheradressen definierenden ein oder mehreren Werten
entspricht.
Verfahren nach Anspruch 11, das ferner das Ausführen eines Speicherzugriffsbefehls
beinhaltet, der Inhalt des Registers als Adresswert benutzt,
wobei eine vorherige Ausführung des Prefetch es zulässt, dass der Speicherzugriffsbefehl
aus dem Cache erledigt wird.
Verfahren nach Anspruch 11, wobei der wenigstens eine beschreibbare
Speicher ein Paar Register beinhaltet, die Grenzen von wenigstens einem zusammenhängenden
Teil der Teilmenge (205A) von adressierbarem Speicher codiert.
Verfahren nach Anspruch 11, das ferner das Initialisieren des wenigstens
einen beschreibbaren Speichers (230) beinhaltet, so dass er Grenzen des Freispeichers
entspricht.
Verfahren nach Anspruch 11, 13 oder 14, das ferner das Initialisieren
des wenigstens einen beschreibbaren Speichers (230) beinhaltet, so dass er einem
Bereich von Speicheradressen entspricht, der zum Speichern von Sperrzuständen
verwendet wird.
Verfahren nach Anspruch 11, 12, 13, 14 oder 15, das ferner das Prefetchen
von wenigstens einigen anderen Daten auf der Basis eines Prefetch-Befehls in der
Befehlssequenz beinhaltet.
Verfahren nach Anspruch 11, 12, 13, 14 oder 15, das ferner das Prefetchen
von wenigstens einigen anderen Daten auf der Basis einer Vorhersage von Speicherzugriffsschritten
beinhaltet.
Verfahren nach Anspruch 11, wobei das Identifizieren der genannten,
für ein Register bestimmten erzeugten Werte das Vergleichen von in einer Registerdatei
gespeicherten Datenwerten mit der Teilmenge von Speicheradressen beinhaltet.
Verfahren nach Anspruch 11 oder 18, wobei das Identifizieren der genannten,
für ein Register bestimmten erzeugten Werte das Scannen von in einer Registerdatei
oder einem Umordnungspuffer gespeicherten Werten und das Vergleichen der gescannten
Werte mit der Teilmenge von Speicheradressen beinhaltet.
Verfahren nach Anspruch 11, 18 oder 19, wobei der Prozessor spekulative
Ausführung unterstützt und wobei das Identifizieren der genannten, für
ein Register bestimmten erzeugten Werte das Vergleichen von Registerzuständen,
die nichtspekulativ werden, mit der Teilmenge von Speicheradressen beinhaltet.
Verfahren nach Anspruch 11, 18, 19 oder 20, wobei das Identifizieren
der genannten, für ein Register bestimmten erzeugten Werte das Vergleichen
von Adresswerten, die bei der Ausführung bestimmter Befehle eines Befehlssatzes
berechnet werden, mit der Teilmenge von Speicheradressen beinhaltet.
Verfahren nach Anspruch 11, 18, 19 oder 21, wobei die Teilmenge von
Speicheradressen durch Inhalt von wenigstens einem beschreibbaren Speicher des Prozessors
definiert wird.
Verfahren nach Anspruch 11, 18, 19, 20 oder 21, wobei die Teilmenge
von Speicheradressen ein oder mehrere zusammenhängende Bereiche von Speicheradressen
definiert.
Verfahren nach Anspruch 11, wobei die Teilmenge (205A) zur Laufzeit
wenigstens einen mit einem Freispeicher koextensiven Bereich von Adressen abdeckt.
Verfahren nach Anspruch 11, wobei die Teilmenge (205A) einen programmierbaren
Bereich von Adressen abdeckt.
|
| Anspruch[en] |
A processor (100) comprising:
a register (120);
at least one writable store (230) having one or more values encoded
therein which define a subset (205A) of memory addresses;
wherein the subset (205A) of memory addresses comprises at runtime at least one
range (205) of addresses which is dynamically allocated from memory ;
means for monitoring values generated by executing instructions or operations;
means (401) for identifying those of said generated values which are destined for
the register (120);
means (402) for comparing the identified value with the one or more values encoded
in the writable store (230); and
logic for initiating a prefetch (160) if said identified value corresponds with
the one or more values defining the subset (205A) of memory addresses.
The processor of claim 1, further comprising:
logic responsive to an address calculation, wherein the logic initiating
the prefetch (160) is based on a match between the calculated address and the subset
(205A) of memory addresses.
The processor of claim 1 or 2, further comprising:
logic responsive to storage of a value into the register (120), the
logic initiating the prefetch (160) based on a match between the stored value and
the subset (205A) of memory addresses.
The processor of claim 1 or 2, further comprising:
address match logic that initiates the prefetch (160) based on a match
between the value destine for the register and the subset (205A) of memory addresses,
comprising means for performing the match at a pipeline stage between address calculation
and storage to the register (120).
The processor of any one of the preceding claims,
wherein the value destined for the register (120) is destined for an architectural
register or any reorder buffer state corresponding thereto.
The processor of any one of the preceding claims,
wherein the register (120) is a register of an operative register set.
The processor of claim 6,
wherein the operative register set corresponds to a particular thread or process;
and
wherein the subset of memory addresses corresponds to locations dynamically allocated
by or for the particular thread or process.
The processor of any one of the preceding claims,
wherein the at least one writable store (230) includes a pair of registers whose
contents define a contiguous range of memory addresses.
The processor of any one of the preceding claims,
wherein, at runtime, the subset (205A) covers a range of memory addresses that correspond
to a heap.
The processor of any one of the preceding claims, adapted to perform
the prefetch (160) without presence of a corresponding prefetch instruction in an
instruction sequence.
A method of automatically prefetching at least some data in a computer
system, the method comprising:
selecting a subset (205A) of memory addresses which subset comprises
at runtime at least one range (205) of addresses which is dynamically allocated
from memory ;
encoding one or more values, defining said subset (205A) of memory addresses,
into at least one writable store (230);
executing an instruction or operation sequence which generates values;
identifying those of said generated values which are destined for a
register (120);
comparing the identified value with the one or more values encoded in
the writable store (230); and
initiating a prefetch (160) if said identified value corresponds with
the one or more values defining the subset (205A) of memory addresses.
The method of claim 11, further comprising:
executing a memory access instruction that uses contents of the register
as an address value, wherein prior performance of the prefetch allows the memory
access instruction to be serviced from cache.
The method of claim 11,
wherein the at least one writable store includes a pair of registers that encode
bounds of at least one contiguous portion of the subset (205A) of addressable memory.
The method of claim 11, further comprising:
initializing the at least one writable store (230) to correspond to
bounds of the heap.
The method of claim 11, 13, or 14, further comprising:
initializing the at least one writable store (230) to correspond to
a range of memory addresses used for storage of lock states.
The method of any one of claims 11, 12, 13, 14 or 15, further comprising:
prefetching at least some other data based on a prefetch instruction
in the instruction sequence.
The method of any one claims 11, 12, 13, 14 or 15, further comprising:
prefetching at least some other data based on a prediction of memory
access strides.
The method of claim 11,
wherein identifying said generated values which are destined for a register includes
comparing data values being stored into a register file against the subset of memory
addresses.
The method of claim 11 or 18,
wherein identifying said generated values which are destined for a register includes
scanning values
stored in a register file or reorder buffer and comparing the scanned values against
the subset of memory addresses.
The method of claim 11, 18 or 19,
wherein the processor supports speculative execution; and wherein identifying said
generated values which are destined for a register includes comparing against the
subset of memory addresses, register states that become non-speculative.
The method of any one of claims 11, 18, 19 or 20,
wherein identifying said generated values which are destined for a register includes
comparing against the subset of memory addresses, address values calculated on execution
of certain instructions of an instruction set.
The method of any one of claims 11, 18, 19, or 21,
wherein the subset of memory addresses is defined by contents of at least one writable
store of the processor.
The method of any one of claims 11, 18, 19, 20, or 21,
wherein the subset of memory addresses defines one or more contiguous ranges of
memory addresses.
The method as in claim 11, wherein, at runtime, the subset (205A) covers
at least one range of addresses coextensive with a heap.
The method as in claim 11, wherein the subset (205A) covers a programmable
range of addresses.
|
| Anspruch[fr] |
Un processeur (100) comprenant:
un registre (120),
au moins un espace mémoire inscriptible (230) possédant une
ou plusieurs valeurs codées dans celui-ci qui définissent un sous-ensemble
(205A) d'adresses mémoire,
où le sous-ensemble (205A) d'adresses mémoire comprend, lors de l'exécution,
au moins une plage (205) d'adresses qui est attribuée dynamiquement à
partir de la mémoire,
un moyen de surveiller des valeurs générées par l'exécution
d'instructions ou d'opérations,
un moyen (401) d'identifier les valeurs desdites valeurs générées
qui sont destinées au registre (120),
un moyen (402) de comparer la valeur identifiée à la une ou plusieurs
valeurs codées dans l'espace mémoire inscriptible (230), et
une logique destinée à lancer une prélecture (160) si ladite valeur
identifiée correspond à la une ou plusieurs valeurs définissant le
sous-ensemble (205A) d'adresses mémoire.
Le processeur selon la Revendication 1, comprenant en outre :
une logique réagissant à un calcul d'adresse, où la logique
lançant la prélecture (160) s'appuie sur une mise en correspondance entre
l'adresse calculée et le sous-ensemble (205A) d'adresses mémoire.
Le processeur selon la Revendication 1 ou 2, comprenant en outre :
une logique réagissant à la mise en mémoire d'une valeur
dans le registre (120), la logique lançant la prélecture (160) en s'appuyant
sur une mise en correspondance entre la valeur conservée en mémoire et
le sous-ensemble (205A) d'adresses mémoire.
Le processeur selon la Revendication 1 ou 2, comprenant en outre :
une logique de mise en correspondance d'adresses qui lance la prélecture
(160) en s'appuyant sur une mise en correspondance entre la valeur destinée
au registre et le sous-ensemble (205A) d'adresses mémoire, comprenant un moyen
d'exécuter la mise en correspondance à un étage de pipeline entre
le calcul d'adresse et la mise en mémoire dans le registre (120).
Le processeur selon l'une quelconque des Revendications précédentes,
où la valeur destinée au registre (120) est destinée à un registre
architectural ou à tout état de mémoire tampon de réordonnancement
correspondant à celui-ci.
Le processeur selon l'une quelconque des Revendications précédentes,
où le registre (120) est un registre d'un ensemble de registres opérationnels.
Le processeur selon la Revendication 6,
où l'ensemble de registres opérationnels correspond à un fil ou un
processus particulier, et
où le sous-ensemble d'adresses mémoire correspond à des emplacements
attribués dynamiquement par ou pour le fil ou le processus particulier.
Le processeur selon l'une quelconque des Revendications précédentes,
où le au moins un espace mémoire inscriptible (230) comprend une paire
de registres dont les contenus définissent une plage contiguë d'adresses
mémoire.
Le processeur selon l'une quelconque des Revendications précédentes,
où, lors de l'exécution, le sous-ensemble (205A) couvre une plage d'adresses
mémoire qui correspondent à un tas.
Le processeur selon l'une quelconque des Revendications précédentes,
adapté de façon à exécuter la prélecture (160) sans la
présence d'une instruction de prélecture correspondante dans une séquence
d'instructions.
Un procédé destiné à effectuer automatiquement une
prélecture d'au moins certaines données sur un système informatique,
le procédé comprenant :
la sélection d'un sous-ensemble (205A) d'adresses mémoire,
ce sous-ensemble comprenant, lors de l'exécution, au moins une plage (205)
d'adresses qui est attribuée dynamiquement à partir de la mémoire,
le codage d'une ou plusieurs valeurs, définissant ledit sous-ensemble
(205A) d'adresses mémoire, dans au moins un espace mémoire inscriptible
(230),
l'exécution d'une séquence d'instructions ou d'opérations
qui génère des valeurs,
l'identification des valeurs parmi lesdites valeurs générées
qui sont destinées à un registre (120),
la comparaison de la valeur identifiée à la une ou plusieurs
valeurs codées dans l'espace mémoire inscriptible (230), et
le lancement d'une opération de prélecture (160) si ladite
valeur identifiée correspond à la une ou plusieurs valeurs définissant
le sous-ensemble (205A) d'adresses mémoire.
Le procédé selon la Revendication 11, comprenant en outre
:
l'exécution d'une instruction d'accès à la mémoire
qui utilise les contenus du registre en tant que valeur d'adresse, où les performances
antérieures de la prélecture permettent à l'instruction d'accès
à la mémoire d'être exécutée à partir d'une mémoire
tampon.
Le procédé selon la Revendication 11,
où le au moins un espace mémoire inscriptible comprend une paire de registres
qui codent les limites d'au moins une partie contiguë du sous-ensemble (205A)
de mémoire adressable.
Le procédé selon la Revendication 11, comprenant en outre
:
l'initialisation du au moins un espace mémoire inscriptible (230)
de façon qu'il corresponde aux limites du tas.
Le procédé selon la Revendication 11, 13 ou 14, comprenant
en outre :
l'initialisation du au moins un espace mémoire inscriptible (230)
de façon qu'il corresponde à une plage d'adresses mémoire utilisée
pour la conservation en mémoire d'états de verrouillage.
Le procédé selon l'une quelconque des Revendications 11, 12,
13, 14 ou 15, comprenant en outre :
la prélecture d'au moins certaines autres données en s'appuyant
sur une instruction de prélecture de la séquence d'instructions.
Le procédé selon l'une quelconque des Revendications 11, 12,
13, 14 ou 15, comprenant en outre :
la prélecture d'au moins certaines autres données en s'appuyant
sur une prédiction d'enjambements d'accès à la mémoire.
Le procédé selon la Revendication 11,
où l'identification desdites valeurs générées qui sont destinées
à un registre comprend la comparaison de valeurs de données conservées
en mémoire dans un fichier de registre au sous-ensemble d'adresses mémoire.
Le procédé selon la Revendication 11 ou 18,
où l'identification desdites valeurs générées qui sont destinées
à un registre comprend le balayage de valeurs conservées en mémoire
dans un fichier de registre ou une mémoire tampon de réordonnancement
et une comparaison des valeurs balayées au sous-ensemble d'adresses mémoire.
Le procédé selon la Revendication 11, 18 ou 19,
où le processeur accepte une exécution spéculative, et où l'identification
desdites valeurs générées qui sont destinées à un registre
comprend une comparaison, par rapport au sous-ensemble d'adresses mémoire,
d'états de registre qui deviennent non spéculatifs.
Le procédé selon l'une quelconque des Revendications 11, 18,
19 ou 20,
où l'identification desdites valeurs générées qui sont destinées
à un registre comprend une comparaison, par rapport au sous-ensemble d'adresses
mémoire, de valeurs d'adresses calculées lors de l'exécution de certaines
instructions d'un jeu d'instructions.
Le procédé selon l'une quelconque des Revendications 11, 18,
19 ou 21,
où le sous-ensemble d'adresses mémoire est défini par les contenus
d'au moins un espace mémoire inscriptible du processeur.
Le procédé selon l'une quelconque des Revendications 11, 18,
19, 20 ou 21,
où le sous-ensemble d'adresses mémoire définit une ou plusieurs plages
contiguës d'adresses mémoire.
Le procédé selon la Revendication 11, où, lors de l'exécution,
le sous-ensemble (205A) couvre au moins une plage d'adresses coextensive avec un
tas.
Le procédé selon la Revendication 11, où, le sous-ensemble
(205A) couvre une plage programmable d'adresses.
|
|
|